Overview

This repository contains the data and code for the paper Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction.

Dataset

In the dataset folder, we provide a human-constructed dataset (CR WSC-H) and a machine-constructed dataset (CR WSC-M), stored in wsc_273_annotated_final.csv and generated_modify_tq.csv respectively.

In CR WSC-H (wsc_273_annotated_final.csv), the first seven columns contain the basic question information and answers from the original WSC. Column Q holds the concept, and column R holds the modified text used when the original answers cannot be replaced with the concept directly. Columns H through M contain GPT-3's results and an analysis of each question.

In CR WSC-M (generated_modify_tq.csv), the text column contains the original question, the entity column contains the entities generated by the LLM, and the use column indicates whether those generated entities are adversarial enough.
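As a quick sanity check, the CSVs can be inspected with pandas. The snippet below is a minimal sketch of working with CR WSC-M: the column names (text, entity, use) follow the description above, but the sample rows are hypothetical placeholders, not actual dataset entries.

```python
import pandas as pd

# Hypothetical stand-in for generated_modify_tq.csv (CR WSC-M);
# the real rows live in the dataset folder.
df = pd.DataFrame({
    "text": [
        "The trophy doesn't fit into the brown suitcase because it is too large.",
        "The city councilmen refused the demonstrators a permit because they feared violence.",
    ],
    "entity": ["sculpture / container", "officials / protesters"],
    "use": [True, False],
})

# Keep only the concept-reversed questions judged adversarial enough.
adversarial = df[df["use"]]
print(len(adversarial))  # one of the two sample rows is marked usable
```

To work with the real file, replace the inline frame with pd.read_csv("dataset/generated_modify_tq.csv") and filter on the use column the same way.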

Code

In the code folder, we provide the code used to construct the dataset and to evaluate different methods.

In wsc_get_more.ipynb, we generate additional questions to construct the dataset.

In Model_wsc_H.ipynb and Model_wsc_M.ipynb, we evaluate the performance of different methods on CR WSC-H and CR WSC-M, respectively.

Citation

Please cite the following paper if you find our dataset and code helpful:

@misc{han2024conceptreversedwinogradschemachallenge,
      title={Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction}, 
      author={Kaiqiao Han and Tianqing Fang and Zhaowei Wang and Yangqiu Song and Mark Steedman},
      year={2024},
      eprint={2410.12040},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.12040}, 
}