[ACL 2024] ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

🥳 Welcome! This codebase accompanies the paper ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models.

🚀 Introduction

This work introduces ValueBench, the first comprehensive psychometric benchmark for evaluating value orientations and value understanding in Large Language Models (LLMs). ValueBench collects data from 44 established psychometric inventories, encompassing 453 multifaceted value dimensions. We propose an evaluation pipeline grounded in realistic human-AI interactions to probe value orientations, along with novel tasks for evaluating value understanding in an open-ended value space.

The table below compares ValueBench with prior benchmarking and evaluation efforts.

Value Orientations

The evaluation pipeline is exemplified in the figure above. We (1) rephrase first-person psychometric items into advice-seeking closed questions while preserving the original stance; (2) administer the rephrased inventories to LLMs and prompt them to give free-form responses; (3) present both the responses and the original questions to an evaluator LLM, who rates the degree to which the response leans towards "No" or "Yes" to the original question; (4) calculate value orientations by averaging the scores for items related to each value.

🔑 Usage

An example of evaluating the value orientations of an LLM

python eval_value_orientation.py --test_model gpt-3.5-turbo --questionnaire NFCC2000,LTO

See the available models here and the available questionnaires here.

Citation

If you find ValueBench useful:

@article{ren2024valuebench,
      title={ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models}, 
      author={Yuanyi Ren and Haoran Ye and Hanjun Fang and Xin Zhang and Guojie Song},
      year={2024},
      journal={arXiv preprint arXiv:2406.04214},
      note={\url{https://github.com/Value4AI/ValueBench}}
}

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
assets		assets
data		data
models		models
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
eval_value_orientation.py		eval_value_orientation.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

[ACL 2024] ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

🚀 Introduction

Value Orientations

🔑 Usage

Citation

About

Releases

Packages

Languages

License

Value4AI/ValueBench

Folders and files

Latest commit

History

Repository files navigation

[ACL 2024] ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

🚀 Introduction

Value Orientations

🔑 Usage

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages