ashish-tripathy-5/g-eval-nextjs

G-Eval TypeScript Implementation

Overview

The GEval API provides a method for evaluating model-generated outputs using the G-Eval framework. It assesses how well the model's output matches expected results based on custom criteria, evaluation steps, and other optional parameters.

G-Eval paper: https://arxiv.org/abs/2303.16634

G-Eval Python Implementation: https://github.com/confident-ai/deepeval

API Endpoint

POST /api/geval

This endpoint evaluates a test case based on an input and the model's actual output, together with optional criteria, evaluation steps, context, and retrieval context. It uses OpenAI models to compute an evaluation score and returns an explanation for the assigned score.

Request Format

The API expects a JSON payload with the following fields:

Required Fields

  • name: (String) The name of the evaluation or test case. Example: "order_relevance".
  • input: (String) The input or question given to the model. Example: "Python course roadmap for beginners first module".
  • actualOutput: (String) The actual output generated by the model. Example: "- module 1: python basics".

Optional Fields

  • criteria: (String) The criteria against which the output is evaluated. Example: "check if the course has the correct order for the intended audience".
  • expectedOutput: (String) The expected output for comparison. Example: "module 1: python basics".
  • evaluationSteps: (Array of Strings) A list of step-by-step evaluation criteria. Example: ["Verify the order of modules", "Check if topics follow a logical progression for beginners"].
  • context: (String) Additional context or background information. Example: "Python is a fundamental programming language, and a roadmap for beginners should start with basics."
  • retrievalContext: (String) Additional information from retrieval context. Example: "Python is often taught with a clear progression from basics to advanced topics."
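Under these field definitions, the request body can be typed in TypeScript. The following is an illustrative sketch (the interface and helper names are not part of the API), including a minimal runtime check for the three required fields:

```typescript
// Illustrative typing of the /api/geval request body.
interface GEvalRequest {
  name: string;
  input: string;
  actualOutput: string;
  criteria?: string;
  expectedOutput?: string;
  evaluationSteps?: string[];
  context?: string;
  retrievalContext?: string;
}

// Returns true only when the three required string fields are present.
function hasRequiredFields(body: Partial<GEvalRequest>): body is GEvalRequest {
  return (
    typeof body.name === "string" &&
    typeof body.input === "string" &&
    typeof body.actualOutput === "string"
  );
}
```

A caller can run `hasRequiredFields` before sending a request to fail fast on incomplete payloads.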

Example Request (with curl)

curl -X POST http://localhost:3001/api/geval \
-H "Content-Type: application/json" \
-d '{
  "name": "order_relevance",
  "input": "Python course roadmap for beginners first module",
  "actualOutput": "- module 1: python basics",
  "criteria": "check if the course has correct order for the required audience",
  "evaluationSteps": [
    "Ensure the order of topics follows a logical progression",
    "Check if the content is appropriate for beginners"
  ]
}'

Example Request (with expectedOutput)

curl -X POST http://localhost:3001/api/geval \
-H "Content-Type: application/json" \
-d '{
  "name": "output_accuracy",
  "input": "What is the capital of France?",
  "actualOutput": "The capital of France is Paris.",
  "expectedOutput": "Paris",
  "evaluationSteps": [
    "Verify if the actual output matches the expected output",
    "Check if the response provides accurate information"
  ],
  "context": "Paris is the capital of France."
}'
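The same requests can be issued from TypeScript. A minimal client sketch, assuming the local base URL from the curl examples above; the helper names here are illustrative, not part of the API:

```typescript
// Build the fetch options for a POST to /api/geval.
function gevalInit(body: object) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Usage sketch: assumes the dev server from the curl examples is running.
async function evaluate(body: object) {
  const res = await fetch("http://localhost:3001/api/geval", gevalInit(body));
  if (!res.ok) throw new Error(`GEval request failed: ${res.status}`);
  return res.json() as Promise<{ score: number; reason: string }>;
}
```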

Response Format

The API will respond with a JSON object containing the evaluation results. The fields include:

  • score: (Number) The evaluation score, ranging from 0 to 1.
  • reason: (String) A concise explanation for the assigned score.

Example Response

{
  "score": 0.9,
  "reason": "The output is relevant and follows the expected order for beginners."
}
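Because the score is normalized to the range 0 to 1, callers can gate test cases on a threshold. A small illustrative helper (the default cutoff of 0.5 is an assumption, not part of the API):

```typescript
// Decide pass/fail from a G-Eval score; the threshold is caller-chosen.
function passes(score: number, threshold = 0.5): boolean {
  if (score < 0 || score > 1) throw new RangeError("score must be in [0, 1]");
  return score >= threshold;
}
```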

Parameters Summary

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | String | Yes | Name of the evaluation or test case. Example: "order_relevance". |
| input | String | Yes | The input given to the model. Example: "Python course roadmap for beginners first module". |
| actualOutput | String | Yes | The actual output generated by the model. Example: "- module 1: python basics". |
| criteria | String | No | Specific criteria for evaluation. Example: "check if the course has correct order for beginners". |
| expectedOutput | String | No | The expected output for comparison. Example: "module 1: python basics". |
| evaluationSteps | Array of Strings | No | Step-by-step criteria for evaluation. Example: ["Verify the order of modules", "Check progression"]. |
| context | String | No | Additional background information for evaluation. Example: "Python is a fundamental language". |
| retrievalContext | String | No | Retrieval context information. Example: "Python is often taught progressively from basics". |

Response Summary

| Field | Type | Description |
| --- | --- | --- |
| score | Number | The final evaluation score, between 0 and 1. |
| reason | String | A concise explanation of the evaluation score and key observations. |

This API lets you automate evaluations of model-generated outputs against specific criteria and custom evaluation steps.

Author

Founder of Pype, [email protected]

License

Licensed under the Apache License, Version 2.0; you may not use this file except in compliance with the License. You may obtain a copy of the License in the LICENSE file. Portions of this project are derived from Deepeval, available at https://github.com/confident-ai/deepeval/, licensed under the Apache License, Version 2.0.
