
Error when evaluating MMLU #91

Open
zjq0455 opened this issue Sep 4, 2024 · 2 comments

Comments

@zjq0455

zjq0455 commented Sep 4, 2024

I have added --tasks hendrycksTest to my command, but got this error:

```
Selected Tasks: ['hendrycksTest-college_medicine', 'hendrycksTest-high_school_macroeconomics', 'hendrycksTest-security_studies', 'hendrycksTest-computer_security', 'hendrycksTest-philosophy', 'hendrycksTest-moral_disputes', 'hendrycksTest-high_school_computer_science', 'hendrycksTest-virology', 'hendrycksTest-college_biology', 'hendrycksTest-business_ethics', 'hendrycksTest-college_computer_science', 'hendrycksTest-college_mathematics', 'hendrycksTest-electrical_engineering', 'hendrycksTest-high_school_government_and_politics', 'hendrycksTest-human_sexuality', 'hendrycksTest-conceptual_physics', 'hendrycksTest-us_foreign_policy', 'hendrycksTest-high_school_world_history', 'hendrycksTest-professional_medicine', 'hendrycksTest-jurisprudence', 'hendrycksTest-machine_learning', 'hendrycksTest-miscellaneous', 'hendrycksTest-college_physics', 'hendrycksTest-medical_genetics', 'hendrycksTest-college_chemistry', 'hendrycksTest-high_school_psychology', 'hendrycksTest-elementary_mathematics', 'hendrycksTest-anatomy', 'hendrycksTest-astronomy', 'hendrycksTest-international_law', 'hendrycksTest-human_aging', 'hendrycksTest-moral_scenarios', 'hendrycksTest-professional_psychology', 'hendrycksTest-world_religions', 'hendrycksTest-high_school_european_history', 'hendrycksTest-marketing', 'hendrycksTest-prehistory', 'hendrycksTest-formal_logic', 'hendrycksTest-logical_fallacies', 'hendrycksTest-professional_accounting', 'hendrycksTest-abstract_algebra', 'hendrycksTest-high_school_physics', 'hendrycksTest-high_school_geography', 'hendrycksTest-management', 'hendrycksTest-nutrition', 'hendrycksTest-clinical_knowledge', 'hendrycksTest-high_school_mathematics', 'hendrycksTest-global_facts', 'hendrycksTest-high_school_microeconomics', 'hendrycksTest-professional_law', 'hendrycksTest-econometrics', 'hendrycksTest-sociology', 'hendrycksTest-high_school_us_history', 'hendrycksTest-high_school_biology', 'hendrycksTest-high_school_chemistry', 'hendrycksTest-high_school_statistics', 'hendrycksTest-public_relations']
/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/load.py:1461: FutureWarning: The repository for hendrycks_test contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/hendrycks_test
You can avoid this message in future by passing the argument trust_remote_code=True.
Passing trust_remote_code=True will be mandatory to load this dataset from the next major release of datasets.
  warnings.warn(
Traceback (most recent call last):
  File "/root/zeroshot/eval_zero_shot.py", line 392, in <module>
    main()
  File "/root/zeroshot/eval_zero_shot.py", line 388, in main
    evaluate(lm, args, logger)
  File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/zeroshot/eval_zero_shot.py", line 183, in evaluate
    t_results = evaluator.simple_evaluate(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/zeroshot/lm_eval/utils.py", line 185, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/zeroshot/lm_eval/evaluator.py", line 66, in simple_evaluate
    task_dict = lm_eval.tasks.get_task_dict(task_names)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/zeroshot/lm_eval/tasks/__init__.py", line 342, in get_task_dict
    task_name_dict = {
                     ^
  File "/root/zeroshot/lm_eval/tasks/__init__.py", line 343, in <dictcomp>
    task_name: get_task(task_name)()
               ^^^^^^^^^^^^^^^^^^^^^
  File "/root/zeroshot/lm_eval/tasks/hendrycks_test.py", line 100, in __init__
    super().__init__(subject)
  File "/root/zeroshot/lm_eval/tasks/hendrycks_test.py", line 112, in __init__
    super().__init__()
  File "/root/zeroshot/lm_eval/base.py", line 412, in __init__
    self.download(data_dir, cache_dir, download_mode)
  File "/root/zeroshot/lm_eval/base.py", line 441, in download
    self.dataset = datasets.load_dataset(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/load.py", line 2582, in load_dataset
    builder_instance.download_and_prepare(
  File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/builder.py", line 1005, in download_and_prepare
    self._download_and_prepare(
  File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/builder.py", line 1767, in _download_and_prepare
    super()._download_and_prepare(
  File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/builder.py", line 1100, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/builder.py", line 1565, in _prepare_split
    split_info = self.info.splits[split_generator.name]
                 ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/splits.py", line 532, in __getitem__
    instructions = make_file_instructions(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/efficientqat/lib/python3.11/site-packages/datasets/arrow_reader.py", line 115, in make_file_instructions
    raise TypeError(f"Expected str 'name', but got: {type(name).__name__}")
TypeError: Expected str 'name', but got: NoneType
```

@BaohaoLiao

BaohaoLiao commented Sep 8, 2024

I also hit this issue.

It is because the mmlu dataset on the Hugging Face Hub was renamed to cais/mmlu. You need to change the dataset path at

DATASET_PATH = "hendrycks_test"
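A minimal sketch of the fix: the file path comes from the traceback above, but treat the exact line and surrounding code as an assumption about this repo's copy of lm_eval.

```python
# In /root/zeroshot/lm_eval/tasks/hendrycks_test.py, point the task at the
# renamed Hub repository. The old "hendrycks_test" id no longer resolves to
# a dataset with populated split metadata, which appears to be what produces
# the "Expected str 'name', but got: NoneType" error above.
DATASET_PATH = "cais/mmlu"  # was: DATASET_PATH = "hendrycks_test"
```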

@BaohaoLiao

Even though I can now run the code, I can't reproduce the reported results, even for the FP16 model. I'm not sure whether cais/mmlu is exactly the same as hendrycks_test.
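If you end up adapting task configs by hand while comparing the two datasets, note that each lm-eval task name in the list above is just `hendrycksTest-` plus the cais/mmlu subject (config) name. A small hypothetical helper (not part of lm_eval) to convert between them:

```python
def mmlu_config(task_name: str) -> str:
    """Strip the 'hendrycksTest-' prefix from an lm-eval task name to get
    the corresponding cais/mmlu config name. Hypothetical helper."""
    prefix = "hendrycksTest-"
    if not task_name.startswith(prefix):
        raise ValueError(f"not an MMLU task name: {task_name!r}")
    return task_name[len(prefix):]
```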
