The GEval API provides an endpoint for evaluating model-generated outputs with the G-Eval framework. It assesses how well a model's output meets custom criteria, following optional evaluation steps and taking other optional parameters (expected output, context, retrieval context) into account.
- G-Eval paper: https://arxiv.org/abs/2303.16634
- G-Eval Python implementation: https://github.com/confident-ai/deepeval
`POST /api/geval`

This endpoint evaluates a test case based on its input, actual output, and criteria, plus optional evaluation steps, expected output, context, and retrieval context. It uses OpenAI models to calculate an evaluation score and provides an explanation for the assigned score.
The API expects a JSON payload with the following fields:
- name: (String) The name of the evaluation or test case. Example: "order_relevance".
- input: (String) The input or question given to the model. Example: "Python course roadmap for beginners first module".
- actualOutput: (String) The actual output generated by the model. Example: "- module 1: python basics".
- criteria: (String, optional) The criteria against which the output is evaluated. Example: "check if the course has the correct order for the intended audience".
- expectedOutput: (String, optional) The expected output for comparison. Example: "module 1: python basics".
- evaluationSteps: (Array of Strings, optional) A list of step-by-step evaluation criteria. Example: ["Verify the order of modules", "Check if topics follow a logical progression for beginners"].
- context: (String, optional) Additional context or background information. Example: "Python is a fundamental programming language, and a roadmap for beginners should start with basics."
- retrievalContext: (String, optional) Additional information from retrieval context. Example: "Python is often taught with a clear progression from basics to advanced topics."
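Putting these fields together, a request body that uses every field might look like this (the values are illustrative, taken from the examples above):

```json
{
  "name": "order_relevance",
  "input": "Python course roadmap for beginners first module",
  "actualOutput": "- module 1: python basics",
  "criteria": "check if the course has the correct order for the intended audience",
  "expectedOutput": "module 1: python basics",
  "evaluationSteps": [
    "Verify the order of modules",
    "Check if topics follow a logical progression for beginners"
  ],
  "context": "Python is a fundamental programming language, and a roadmap for beginners should start with basics.",
  "retrievalContext": "Python is often taught with a clear progression from basics to advanced topics."
}
```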
Example: evaluating ordering against criteria and evaluation steps:

```bash
curl -X POST http://localhost:3001/api/geval \
  -H "Content-Type: application/json" \
  -d '{
    "name": "order_relevance",
    "input": "Python course roadmap for beginners first module",
    "actualOutput": "- module 1: python basics",
    "criteria": "check if the course has correct order for the required audience",
    "evaluationSteps": [
      "Ensure the order of topics follows a logical progression",
      "Check if the content is appropriate for beginners"
    ]
  }'
```
Example: evaluating accuracy against an expected output and context:

```bash
curl -X POST http://localhost:3001/api/geval \
  -H "Content-Type: application/json" \
  -d '{
    "name": "output_accuracy",
    "input": "What is the capital of France?",
    "actualOutput": "The capital of France is Paris.",
    "expectedOutput": "Paris",
    "evaluationSteps": [
      "Verify if the actual output matches the expected output",
      "Check if the response provides accurate information"
    ],
    "context": "Paris is the capital of France."
  }'
```
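For programmatic use, a minimal Python client sketch is shown below. It assumes the API is reachable at http://localhost:3001 as in the curl examples and that the `requests` package is installed; the `run_geval` helper is illustrative, not part of this project.

```python
import requests

BASE_URL = "http://localhost:3001"  # assumed local deployment, as in the curl examples

def run_geval(payload: dict) -> dict:
    """POST a test case to /api/geval and return the parsed JSON result."""
    response = requests.post(f"{BASE_URL}/api/geval", json=payload, timeout=60)
    response.raise_for_status()  # raise on HTTP errors instead of failing silently
    return response.json()

result = run_geval({
    "name": "order_relevance",
    "input": "Python course roadmap for beginners first module",
    "actualOutput": "- module 1: python basics",
    "criteria": "check if the course has correct order for the required audience",
})
print(result["score"], result["reason"])
```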
The API will respond with a JSON object containing the evaluation results. The fields include:
- score: (Number) The evaluation score, ranging from 0 to 1.
- reason: (String) A concise explanation for the assigned score.
Example response:

```json
{
  "score": 0.9,
  "reason": "The output is relevant and follows the expected order for beginners."
}
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | String | Yes | Name of the evaluation or test case. Example: "order_relevance". |
| input | String | Yes | The input given to the model. Example: "Python course roadmap for beginners first module". |
| actualOutput | String | Yes | The actual output generated by the model. Example: "- module 1: python basics". |
| criteria | String | No | Specific criteria for evaluation. Example: "check if the course has correct order for beginners". |
| expectedOutput | String | No | The expected output for comparison. Example: "module 1: python basics". |
| evaluationSteps | Array of Strings | No | Step-by-step criteria for evaluation. Example: ["Verify the order of modules", "Check progression"]. |
| context | String | No | Additional background information for evaluation. Example: "Python is a fundamental language". |
| retrievalContext | String | No | Retrieval context information. Example: "Python is often taught progressively from basics". |
| Field | Type | Description |
|---|---|---|
| score | Number | The final evaluation score, between 0 and 1. |
| reason | String | A concise explanation of the evaluation score and key observations. |
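Because the score is normalized to the 0–1 range, a common pattern is to gate automated test runs on a threshold. A minimal Python sketch follows; the 0.7 cutoff is an illustrative assumption, not part of the API:

```python
PASS_THRESHOLD = 0.7  # illustrative cutoff, not defined by the API; tune per metric

def passed(result: dict, threshold: float = PASS_THRESHOLD) -> bool:
    """Return True when a GEval result meets the chosen score threshold."""
    return result["score"] >= threshold

# Using the sample response above:
sample = {"score": 0.9, "reason": "The output is relevant and follows the expected order for beginners."}
assert passed(sample)
```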
With this API, you can automate evaluations of model-generated outputs against specific criteria and custom evaluation steps.
Founder of Pype, [email protected]
Licensed under the Apache License, Version 2.0; you may not use this file except in compliance with the License. You may obtain a copy of the License in [LICENSE](LICENSE). Portions of this project are derived from [DeepEval](https://github.com/confident-ai/deepeval/), licensed under the Apache License, Version 2.0.