The dataset list printed by OpenCompass does not match the list given on the website, so some datasets (e.g. cmmlu) cannot be tested #191
Comments
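We do support the cmmlu dataset; every dataset listed here is supported. What command did you run? |
cmmlu was not printed in the final report either. What name should be used for flores_100? I tried several names, but its result is always empty in the final report. |
Also, I would like to test the two translation datasets on this page, but neither can be run. What names should I use? https://evalscope.readthedocs.io/zh-cn/latest/get_started/supported_dataset.html Translation: Flores, IWSLT2017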
|
Also, for the output report, I suggest adding an average at the end. |
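A minimal sketch of the average requested above, assuming (hypothetically) that the per-dataset scores have already been collected into a plain dict mapping dataset name to score. The actual structure returned by the summarizer is not shown in this thread, and `average_score` is an illustrative helper, not part of EvalScope. Datasets whose result is empty are skipped rather than counted as zero.

```python
from typing import Dict, Optional


def average_score(scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Return the mean of the available scores, or None if there are none.

    `scores` is a hypothetical {dataset_name: score} mapping; a value of
    None stands for a dataset whose result is empty in the report.
    """
    available = [s for s in scores.values() if s is not None]
    if not available:
        return None
    return sum(available) / len(available)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = {"mmlu": 60.0, "ceval": 70.0, "flores_100": None}
    print(f"Average over available datasets: {average_score(example)}")
```

Skipping empty results keeps one unsupported dataset from dragging the average down, at the cost of hiding the gap; printing the list of skipped names alongside the average would make that visible.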
# Copyright (c) Alibaba, Inc. and its affiliates.
"""
1. Installation
EvalScope: pip install evalscope[opencompass]

2. Download dataset to data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip

3. Deploy model serving
swift deploy --model_type qwen2-1_5b-instruct

4. Run eval task
"""
from evalscope.backend.opencompass import OpenCompassBackendManager
from evalscope.run import run_task
from evalscope.summarizer import Summarizer


def run_swift_eval():
    # List all datasets
    # e.g. ['mmlu', 'WSC', 'DRCD', 'chid', 'gsm8k', 'AX_g', 'BoolQ', 'cmnli', 'ARC_e', 'ocnli_fc', 'summedits', 'MultiRC', 'GaokaoBench', 'obqa', 'math', 'agieval', 'hellaswag', 'RTE', 'race', 'ocnli', 'strategyqa', 'triviaqa', 'WiC', 'COPA', 'piqa', 'nq', 'mbpp', 'csl', 'Xsum', 'CB', 'tnews', 'ARC_c', 'afqmc', 'eprstmt', 'ReCoRD', 'bbh', 'CMRC', 'AX_b', 'siqa', 'storycloze', 'humaneval', 'cluewsc', 'winogrande', 'lambada', 'ceval', 'bustm', 'C3', 'lcsts']
    print(
        f"** All datasets from OpenCompass backend: {OpenCompassBackendManager.list_datasets()}"
    )

    # Prepare the config
    """
    Attributes:
        `eval_backend`: Defaults to 'OpenCompass'
        `datasets`: list, refer to `OpenCompassBackendManager.list_datasets()`
        `models`: list of dict, each dict must contain `path` and `openai_api_base`
            `path`: reuse the value of '--model_type' in the command line `swift deploy`
            `openai_api_base`: the base URL of swift model serving
        `work_dir`: str, the directory to save the evaluation results, logs and summaries. Defaults to 'outputs/default'
        Refer to `opencompass.cli.arguments.ApiModelConfig` for other optional attributes.
    """
    # Option 1: Use dict format
    # Args:
    #   path: The path of the model, it means the `model_type` for swift, e.g. 'llama3-8b-instruct'
    #   is_chat: True for chat model, False for base model
    #   key: The OpenAI api-key of the model api, default to 'EMPTY'
    #   openai_api_base: The base URL of the OpenAI API, it means the swift model serving URL.
    task_cfg = dict(
        eval_backend="OpenCompass",
        eval_config={
            "datasets": ["Xsum", "triviaqa", "cmmlu",
                         "OpenBookQA", "GaokaoBench", "flores_100",
                         "tnews", "WSC", "hellaswag",
                         "ceval", "mmlu", "math", "gsm8k",
                         "humaneval", "mbpp", "bbh"],
            "models": [
                {
                    "path": "qwen2-7b-instruct",  # Please make sure the model is deployed
                    "openai_api_base": "http://127.0.0.1:8000/v1/chat/completions",
                    "is_chat": True,
                    "batch_size": 16,
                },
            ],
            "work_dir": "outputs/qwen2_eval_result",
            "limit": 10,
        },
    )
    # Option 2: Use yaml file
    # task_cfg = 'examples/tasks/default_eval_swift_openai_api.yaml'
    # Option 3: Use json file
    # task_cfg = 'examples/tasks/default_eval_swift_openai_api.json'

    # Run task
    run_task(task_cfg=task_cfg)

    # [Optional] Get the final report with summarizer
    print(">> Start to get the report with summarizer ...")
    report_list = Summarizer.get_report_from_cfg(task_cfg)
    print(f"\n>>The report list: {report_list}")


if __name__ == "__main__":
    run_swift_eval()
This is how my code is written. |
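cmmlu has been added and is already merged into main; for now you can install from source: |
OK, thanks. So the other datasets have not been added yet, right?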
---Original Message---
Sent: Tuesday, November 12, 2024, 5:48 PM
Subject: Re: [modelscope/evalscope] The dataset list printed by OpenCompass does not match the list given on the website, so some datasets (e.g. cmmlu) cannot be tested (Issue #191)
cmmlu has been added and is already merged into main; for now you can install from source:
pip install ***@***.***
|
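These two datasets are not supported yet. Some dataset names in supported_dataset differ from the datasets that are actually supported at present; the documentation will be brought in line soon. |
OK, looking forward to them being merged soon.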
---Original Message---
Sent: Tuesday, November 12, 2024, 5:56 PM
Also, I would like to test the two translation datasets on this page, but neither can be run. What names should I use?
https://evalscope.readthedocs.io/zh-cn/latest/get_started/supported_dataset.html
Translation
Flores
IWSLT2017
These two datasets are not supported yet. Some dataset names in supported_dataset differ from the datasets that are actually supported at present; the documentation will be brought in line soon.
|
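Feel free to tell us which datasets you need and we will prioritize integrating them (at the moment, reusing OpenCompass benchmarks requires reproducing them against the latest models and verifying that the results align). |
Datasets that are not in OpenCompassBackendManager.list_datasets() are reported as unsupported, yet the documentation lists them as supported under opencompass. How should these datasets be tested?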
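One way to catch this mismatch before a run is to compare the requested names against the list the backend reports. The sketch below is illustrative only: `split_datasets` is a hypothetical helper, not an EvalScope API, and the `supported` list is passed in as plain data. In practice it would come from `OpenCompassBackendManager.list_datasets()`, as in the script earlier in this thread; the sample list here is abbreviated from the one shown in that script's comment.

```python
from typing import Iterable, List, Tuple


def split_datasets(requested: Iterable[str],
                   supported: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split requested names into (runnable, unsupported), preserving order.

    Matching in this helper is exact and case-sensitive.
    """
    supported_set = set(supported)
    runnable: List[str] = []
    unsupported: List[str] = []
    for name in requested:
        if name in supported_set:
            runnable.append(name)
        else:
            unsupported.append(name)
    return runnable, unsupported


if __name__ == "__main__":
    # Abbreviated sample; the real list would come from the backend.
    supported = ["mmlu", "ceval", "gsm8k", "obqa", "Xsum", "triviaqa"]
    requested = ["Xsum", "triviaqa", "cmmlu", "OpenBookQA", "flores_100", "mmlu"]
    runnable, unsupported = split_datasets(requested, supported)
    print("Will run:", runnable)
    print("Not reported by the backend:", unsupported)
```

Passing only the runnable names into `datasets` and logging the rest makes the empty entries in the final report easier to explain.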