
Data

This folder contains the datasets used in the project. The datasets are structured as follows:

HaluQuestQA

The dataset contains span-level annotations of errors in long-form answers according to our defined evaluation criteria:

  1. Question misconception: False assumptions made within the given question.
  2. Factuality: Accuracy and correctness of the answer with respect to verifiable facts.
  3. Relevance: Specificity and meaningfulness of the answer.
  4. Completeness: Answer comprehensiveness ensuring all question aspects are addressed.
  5. References: (Un)helpful examples, analogies, and external references (websites or links) in the answer.

A subset of the dataset is shown below: given a question and two possible answers (human and GPT-4), the {evaluation_criteria}_span column indicates the error spans in the answer for the respective evaluation criterion, and the {evaluation_criteria}_reason column gives the corresponding error justifications.

[Sample of the HaluQuestQA dataset]
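For a quick look at the annotation columns, a minimal loading sketch is given below. The file name HaluQuestQA.csv and the exact criterion strings used as column prefixes are assumptions; adapt the path, loader, and names to the files in this folder.

```python
# Minimal sketch: inspecting the span-level error annotations with pandas.
# NOTE: the file name and the criterion strings below are assumptions, not
# the repository's confirmed column names.
import pandas as pd

df = pd.read_csv("HaluQuestQA.csv")

criteria = [
    "question_misconception",
    "factuality",
    "relevance",
    "completeness",
    "references",
]

for criterion in criteria:
    span_col = f"{criterion}_span"      # annotated error spans for this criterion
    reason_col = f"{criterion}_reason"  # expert justification for each error span
    if span_col in df.columns and reason_col in df.columns:
        print(df[[span_col, reason_col]].head())
```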


Incomplete answers

This dataset consists of question-answer pairs with expert span-level annotations for the completeness aspect, along with justifications.

A subset of the dataset is shown below, where the instruction column contains the task instruction, the input column contains the question-answer pair to evaluate, and the output column contains the sentence-level tag [Complete/Incomplete] for the answer, along with a justification for any incompleteness.

[Sample of the incomplete_ans_data dataset]

Note: This dataset is used to train the error-feedback model.
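As a rough illustration, the instruction/input/output columns could be assembled into training prompts for the error-feedback model along these lines. The file name incomplete_ans_data.json is an assumption; the column names follow the description above.

```python
# Minimal sketch: turning instruction/input/output records into training text.
# NOTE: the file name is an assumption; only the column names come from the
# dataset description above.
import json

with open("incomplete_ans_data.json") as f:
    records = json.load(f)

def to_training_text(example):
    # Concatenate the task instruction, the question-answer pair to evaluate,
    # and the expected [Complete/Incomplete] tags with their justification.
    return (
        f"{example['instruction']}\n\n"
        f"{example['input']}\n\n"
        f"{example['output']}"
    )

print(to_training_text(records[0]))
```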


Preference data

The preference dataset consists of a question with two possible answers: one from humans and the other from GPT-4. Expert annotators choose the better answer based on our defined evaluation criteria.

A subset of the dataset is shown below, where given a question, the preferred responses are present in the preferred_response column and the rejected responses are present in the rejected_response column.

[Sample of the preference_data dataset]

Note: This dataset is used to optimize the refinement model with Direct Preference Optimization (DPO).
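As a sketch, the columns could be reshaped into the prompt/chosen/rejected layout commonly expected by DPO implementations (e.g. TRL's DPOTrainer). The file name preference_data.json and the question column name are assumptions; the preferred_response and rejected_response columns follow the description above.

```python
# Minimal sketch: reshaping the preference data for a DPO trainer.
# NOTE: the file name and the "question" column are assumptions; the
# preferred_response/rejected_response columns come from the description above.
import pandas as pd
from datasets import Dataset

df = pd.read_json("preference_data.json")

dpo_dataset = Dataset.from_dict({
    "prompt": df["question"].tolist(),
    "chosen": df["preferred_response"].tolist(),
    "rejected": df["rejected_response"].tolist(),
})

print(dpo_dataset[0])
```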


CC BY-SA 4.0

The datasets are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
