
# Evaluation Metrics Glossary

## Response Completeness and Conciseness

Measures whether the LLM's response fully addresses the user's query (completeness) and whether it does so without unnecessary content (conciseness).

## Text Similarity Metrics

Compares the generated text to a reference or benchmark text to assess similarity.
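As an illustration, a minimal token-overlap F1 score (in the spirit of unigram-overlap metrics like ROUGE-1, though real implementations add stemming, tokenization rules, and other refinements) might look like:

```python
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a generated text and a reference text."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Count tokens appearing in both, respecting multiplicity.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Production evaluations typically rely on established implementations such as ROUGE, BLEU, or BERTScore rather than a hand-rolled overlap like this.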

## Question Answering Accuracy

Evaluates the accuracy of the LLM's responses to questions.
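A common concrete form of QA accuracy is normalized exact match: the fraction of predictions that equal the reference answer after lowercasing and punctuation stripping. A minimal sketch:

```python
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match_accuracy(predictions, references) -> float:
    """Fraction of predictions that exactly match their reference
    after normalization. Assumes the lists are non-empty and aligned."""
    matches = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return matches / len(references)
```

Benchmarks such as SQuAD pair exact match with a softer token-level F1 to give partial credit for near-miss answers.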

## Relevance

Determines how relevant the LLM's responses are to the given prompts.

## Hallucination Index

Assesses how often the LLM generates incorrect or nonsensical information.
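One simple way to operationalize this is the fraction of responses that contain at least one unsupported claim. The sketch below is a toy: it naively treats sentences as claims and assumes a hypothetical `is_supported` callable (for example, a fact-checking model or a knowledge-base lookup) that is not part of this glossary:

```python
def hallucination_index(responses, is_supported) -> float:
    """Fraction of responses containing at least one unsupported claim.

    `is_supported` is a hypothetical callable: claim -> bool.
    Sentence splitting on "." is a crude stand-in for claim extraction.
    """
    def has_hallucination(response: str) -> bool:
        claims = [s.strip() for s in response.split(".") if s.strip()]
        return any(not is_supported(c) for c in claims)

    return sum(has_hallucination(r) for r in responses) / len(responses)
```

Real hallucination evaluation usually grounds claims against retrieved evidence or uses an LLM judge, not exact string lookup.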

## Toxicity

Measures the level of offensive or harmful content in the LLM's responses.
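As a toy illustration, toxicity can be approximated by the fraction of tokens that match a blocklist; the `blocklist` argument here is an assumption of the sketch, and real systems use trained classifiers (Perspective-API-style models) rather than keyword matching:

```python
def toxicity_score(text: str, blocklist: set[str]) -> float:
    """Fraction of tokens found in a blocklist of offensive terms.
    `blocklist` is a hypothetical set of lowercase words."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(w.strip(".,!?") in blocklist for w in words)
    return hits / len(words)
```

Keyword matching misses context (negation, quotation, obfuscated spellings), which is why classifier-based scoring dominates in practice.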

## Task-Specific Metrics

Various metrics tailored to specific tasks the LLM performs, such as summarization or translation.


## Grammar and Syntax

Evaluates the correctness of language usage in the LLM's output.

## Truthfulness

Assesses the factual accuracy of the information provided by the LLM.

## Summary Capabilities

Measures the LLM's ability to produce concise and accurate summaries.

## Problem-Solving Capabilities

Evaluates the LLM's effectiveness in solving complex tasks.

## Coherence

Checks for logical flow and consistency in the LLM's text.

## Diversity

Measures the variety in the LLM's language and content generation.
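A standard proxy for diversity is distinct-n: the ratio of unique n-grams to total n-grams across a set of generations. A minimal sketch:

```python
def distinct_n(texts, n: int = 2) -> float:
    """Distinct-n: unique n-grams divided by total n-grams
    across a list of generated texts."""
    ngrams = []
    for text in texts:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

Higher values indicate less repetitive output; distinct-1 and distinct-2 are the variants most often reported.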

## Engagement

Assesses how engaging and interesting the LLM's responses are.

## Empathy

Evaluates the LLM's ability to respond considerately to emotional cues.

## Fairness

Measures the presence and severity of bias in the LLM's responses.

## Robustness

Assesses the LLM's performance across a wide range of inputs and conditions.

## Scalability

Evaluates how well the LLM performs as the size of data or complexity increases.

## Efficiency

Measures the computational resources required for the LLM to perform tasks.
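A basic efficiency measurement is generation throughput in tokens per second. The sketch below assumes a hypothetical `generate` callable that takes a prompt and returns a list of tokens:

```python
import time

def tokens_per_second(generate, prompt: str) -> float:
    """Time one generation call and return its token throughput.
    `generate` is a hypothetical callable: prompt -> list of tokens."""
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed
```

In practice, latency reports also separate time-to-first-token from steady-state throughput, and average over many runs to smooth out timing noise.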