Where can I find how the metric for each task is calculated? #4

Open
HelloWorld4747 opened this issue Nov 12, 2023 · 2 comments

Comments

@HelloWorld4747

Hello,
I'd like to ask: where can I see how the metric for each task is calculated? Is there official documentation or an up-to-date paper?

Thanks!

@brightmart
Member

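The Agent benchmark follows the OPEN benchmark: the model under test competes head-to-head against a representative international model, and its winning record is computed.
Specifically, the model under test plays matches against 3.5. Each win earns 3 points, each draw 1 point, and each loss 0 points; the points are summed into a total score, which is then normalized. In short, this is a relative score measured against the same reference model.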
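To make the scoring concrete, here is a minimal Python sketch of the points-based scheme described above (win = 3, draw = 1, loss = 0, summed and then normalized). The thread does not say how normalization is done, so the sketch assumes the total is divided by the maximum attainable score (3 points × number of matches) and scaled to 0 to 100. The function name `normalized_score` and the outcome labels are hypothetical, not part of the benchmark's code.

```python
from collections import Counter

# Points per match outcome against the reference model: win 3, draw 1, loss 0.
POINTS = {"win": 3, "draw": 1, "loss": 0}


def normalized_score(outcomes: list[str]) -> float:
    """Hypothetical helper: total points against the reference model, scaled to 0-100.

    Assumption: normalization divides by the maximum attainable total
    (3 points x number of matches); the actual scheme is not specified.
    """
    if not outcomes:
        raise ValueError("need at least one match outcome")
    counts = Counter(outcomes)
    unknown = set(counts) - set(POINTS)
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    # Sum points over all matches.
    total = sum(POINTS[outcome] * n for outcome, n in counts.items())
    # Best possible result: every match is a win.
    max_total = POINTS["win"] * len(outcomes)
    return 100.0 * total / max_total


# Hypothetical example: 5 wins, 3 draws, 2 losses against the reference model.
results = ["win"] * 5 + ["draw"] * 3 + ["loss"] * 2
print(round(normalized_score(results), 2))  # prints 60.0
```

Because every model is scored against the same reference, a score above the draw-only baseline (about 33.3 under this assumed normalization) indicates the model tends to beat the reference rather than merely tie it.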

@YinSonglin1997

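Hi Mr. Xu, are the win/draw/loss scores assigned by human raters? My understanding is that in a head-to-head match both models answer the same question, but how is it decided which answer is better?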

3 participants