Overview

This repository contains the data and code for the paper Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction.

Dataset

In the dataset folder, we provide a human-constructed dataset (CR WSC-H) and a machine-constructed dataset (CR WSC-M), stored in wsc_273_annotated_final.csv and generated_modify_tq.csv respectively.

In CR WSC-H (wsc_273_annotated_final.csv), the first seven columns contain the basic question information and answers from the original WSC. Column Q holds the concept, and column R holds the modified text used when the original answers cannot be replaced with the concept directly. Columns H through M contain GPT-3's results and an analysis of each question.

In CR WSC-M (generated_modify_tq.csv), the text column contains the original question, the entity column contains the entities generated by the LLM, and the use column indicates whether those generated entities are adversarial enough.
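As a quick sanity check, the CSVs can be inspected with pandas. The snippet below is a minimal sketch of working with CR WSC-M: the column names (text, entity, use) follow the description above, but the sample rows are hypothetical placeholders, not actual dataset entries.

```python
import pandas as pd

# Hypothetical stand-in for generated_modify_tq.csv (CR WSC-M);
# the real rows live in the dataset folder.
df = pd.DataFrame({
    "text": [
        "The trophy doesn't fit into the brown suitcase because it is too large.",
        "The city councilmen refused the demonstrators a permit because they feared violence.",
    ],
    "entity": ["sculpture / container", "officials / protesters"],
    "use": [True, False],
})

# Keep only the concept-reversed questions judged adversarial enough.
adversarial = df[df["use"]]
print(len(adversarial))  # one of the two sample rows is marked usable
```

To work with the real file, replace the inline frame with pd.read_csv("dataset/generated_modify_tq.csv") and filter on the use column the same way.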

Code

In the code folder, we provide the code used to construct the dataset and to evaluate different methods.

In wsc_get_more.ipynb, we generate additional questions to construct the dataset.

In Model_wsc_H.ipynb and Model_wsc_M.ipynb, we evaluate the performance of different methods on CR WSC-H and CR WSC-M, respectively.

Citation

Please cite the following paper if you find our dataset and code helpful:

@misc{han2024conceptreversedwinogradschemachallenge,
      title={Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction}, 
      author={Kaiqiao Han and Tianqing Fang and Zhaowei Wang and Yangqiu Song and Mark Steedman},
      year={2024},
      eprint={2410.12040},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.12040}, 
}