
The dataset list printed by OpenCompass does not match the one on the website, so some datasets cannot be tested, e.g. cmmlu #191

Open
charliedream1 opened this issue Nov 9, 2024 · 11 comments

Comments

@charliedream1

For datasets that are not in OpenCompassBackendManager.list_datasets(), an unsupported-dataset message is raised, yet your documentation lists them as supported by the OpenCompass backend. How should these datasets be tested?

@Yunnglin
Collaborator

Yunnglin commented Nov 11, 2024

We do support the cmmlu dataset; all of the datasets listed here are supported. What command did you run?

@charliedream1
Author

cmmlu is also not printed in the final report. What name should I write for flores_100? I tried several names, but its result is always empty in the final report.

@charliedream1
Author

Also, I want to test these two translation datasets from the page below, but neither can be tested. What names should I write?

https://evalscope.readthedocs.io/zh-cn/latest/get_started/supported_dataset.html

Translation

  • Flores
  • IWSLT2017

@charliedream1
Author

Also, I suggest adding an overall average at the end of the output report.
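
For reference, a minimal sketch of computing such an average yourself from the summarizer output, assuming each entry in report_list is a dict whose numeric values are per-dataset scores (that structure is an assumption, not documented here):

from evalscope.summarizer import Summarizer

def print_overall_average(task_cfg):
    # Collect every numeric value found in the report entries and average them.
    report_list = Summarizer.get_report_from_cfg(task_cfg)  # same call as in the script below
    scores = [v for report in report_list if isinstance(report, dict)
              for v in report.values() if isinstance(v, (int, float))]
    if scores:
        print(f"Overall average over {len(scores)} scores: {sum(scores) / len(scores):.4f}")
    else:
        print("No numeric scores found in the report list.")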

@charliedream1
Author

# Copyright (c) Alibaba, Inc. and its affiliates.

"""
1. Installation
EvalScope: pip install evalscope[opencompass]

2. Download dataset to data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip

3. Deploy model serving
    swift deploy --model_type qwen2-1_5b-instruct

4. Run eval task
"""
from evalscope.backend.opencompass import OpenCompassBackendManager
from evalscope.run import run_task
from evalscope.summarizer import Summarizer


def run_swift_eval():

    # List all datasets
    # e.g.  ['mmlu', 'WSC', 'DRCD', 'chid', 'gsm8k', 'AX_g', 'BoolQ', 'cmnli', 'ARC_e', 'ocnli_fc', 'summedits', 'MultiRC', 'GaokaoBench', 'obqa', 'math', 'agieval', 'hellaswag', 'RTE', 'race', 'ocnli', 'strategyqa', 'triviaqa', 'WiC', 'COPA', 'piqa', 'nq', 'mbpp', 'csl', 'Xsum', 'CB', 'tnews', 'ARC_c', 'afqmc', 'eprstmt', 'ReCoRD', 'bbh', 'CMRC', 'AX_b', 'siqa', 'storycloze', 'humaneval', 'cluewsc', 'winogrande', 'lambada', 'ceval', 'bustm', 'C3', 'lcsts']
    print(
        f"** All datasets from OpenCompass backend: {OpenCompassBackendManager.list_datasets()}"
    )

    # Prepare the config
    """
    Attributes:
        `eval_backend`: Default to 'OpenCompass'
        `datasets`: list, refer to `OpenCompassBackendManager.list_datasets()`
        `models`: list of dict, each dict must contain `path` and `openai_api_base` 
                `path`: reuse the value of '--model_type' in the command line `swift deploy`
                `openai_api_base`: the base URL of swift model serving
        `work_dir`: str, the directory to save the evaluation results, logs and summaries. Default to 'outputs/default'
                
        Refer to `opencompass.cli.arguments.ApiModelConfig` for other optional attributes.
    """
    # Option 1: Use dict format
    # Args:
    #   path: The path of the model, it means the `model_type` for swift, e.g. 'llama3-8b-instruct'
    #   is_chat: True for chat model, False for base model
    #   key: The OpenAI api-key of the model api, default to 'EMPTY'
    #   openai_api_base: The base URL of the OpenAI API, it means the swift model serving URL.
    task_cfg = dict(
        eval_backend="OpenCompass",
        eval_config={
            "datasets": ["Xsum", "triviaqa", "cmmlu",
            "OpenBookQA", "GaokaoBench", "flores_100",
            "tnews", 'WSC', "hellaswag",
            "ceval", "mmlu", "math", "gsm8k",
            "humaneval", "mbpp", "bbh"],
            "models": [
                {
                    "path": "qwen2-7b-instruct",  # Please make sure the model is deployed
                    "openai_api_base": "http://127.0.0.1:8000/v1/chat/completions",
                    "is_chat": True,
                    "batch_size": 16,
                },
            ],
            "work_dir": "outputs/qwen2_eval_result",
            "limit": 10,
        },
    )

    # Option 2: Use yaml file
    # task_cfg = 'examples/tasks/default_eval_swift_openai_api.yaml'

    # Option 3: Use json file
    # task_cfg = 'examples/tasks/default_eval_swift_openai_api.json'

    # Run task
    run_task(task_cfg=task_cfg)

    # [Optional] Get the final report with summarizer
    print(">> Start to get the report with summarizer ...")
    report_list = Summarizer.get_report_from_cfg(task_cfg)
    print(f"\n>>The report list: {report_list}")


if __name__ == "__main__":
    run_swift_eval()

This is how my code is written.

@wangxingjun778
Collaborator

cmmlu has been added and merged into main; for now you can install from source:
pip install git+https://github.com/modelscope/evalscope.git@main
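
A quick way to confirm the new install picks up cmmlu, using the same list_datasets() helper from the script above:

from evalscope.backend.opencompass import OpenCompassBackendManager

# Should print True once the main-branch install is active.
print("cmmlu supported:", "cmmlu" in OpenCompassBackendManager.list_datasets())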

@charliedream1
Author

charliedream1 commented Nov 12, 2024 via email

@wangxingjun778
Collaborator

Also, I want to test these two translation datasets from the page below, but neither can be tested. What names should I write?

https://evalscope.readthedocs.io/zh-cn/latest/get_started/supported_dataset.html

Translation

  • Flores
  • IWSLT2017

These two datasets are not supported yet. Some dataset names in supported_dataset differ from the names that are actually supported right now; the documentation will be aligned with the code soon.
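
Until the docs are aligned, a small sketch that filters a requested dataset list down to the names the backend currently reports as supported (the filter_supported helper is just illustrative):

from evalscope.backend.opencompass import OpenCompassBackendManager

def filter_supported(requested):
    # Keep only names the backend reports as supported; print the rest so their
    # current names can be looked up.
    supported = set(OpenCompassBackendManager.list_datasets())
    missing = [d for d in requested if d not in supported]
    if missing:
        print(f"Not currently supported by the backend: {missing}")
    return [d for d in requested if d in supported]

datasets = filter_supported(["cmmlu", "flores_100", "IWSLT2017", "gsm8k"])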

@charliedream1
Author

charliedream1 commented Nov 12, 2024 via email

@wangxingjun778
Collaborator

wangxingjun778 commented Nov 12, 2024

OK, thanks. Are the other datasets also not yet added?


Feel free to file requests for the datasets you need, and we will prioritize adding them (reusing an OpenCompass benchmark currently requires reproducing and verifying the alignment results against the latest models).

@Yunnglin Yunnglin added the enhancement New feature or request label Nov 26, 2024