Added COT Metric and Adapter to MMLU Pro #3162
Conversation
when: "?" | ||
language: English | ||
|
||
- name: ifeval |
Reviewer: Don't delete IFEval.
```diff
@@ -135,7 +140,6 @@ run_groups:
   subgroups:
     - mmlu_pro
     - gpqa
-    - ifeval
```
Reviewer: Don't delete IFEval.
```diff
@@ -162,24 +166,7 @@ run_groups:
     - efficiency
     - general_information
   environment:
     main_name: exact_match # non-CoT
```
Reviewer: Don't delete the rest of the environment and taxonomy.
```yaml
- name: chain_of_thought_correct
  display_name: COT correct
  short_display_name: COT correct
  description: TBD.
```
Reviewer: Add description.
```diff
@@ -93,6 +93,11 @@ metrics:
     short_display_name: IFEval Strict Acc
     description: Fraction of instructions in the instance that are correctly followed.
     lower_is_better: false
+  - name: chain_of_thought_correct
+    display_name: COT correct
```
Reviewer: "Chain of thought correctness" or something more descriptive like that.
```python
    ),
    input_noun="Question",
    input_suffix="\nChoices: \n",
    reference_prefix="(A) ",
```
Reviewer: Delete `reference_prefix` (it defaults to `"A. "` when unspecified, which follows the paper).
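A hedged illustration of the difference between the two prefix styles; the exact rendering is up to the HELM multiple-choice adapter, this is schematic only:

```python
# Schematic only: shows how each prefix style labels the choices.
choices = ["choice one", "choice two"]

def render(choices, prefix_style):
    lines = []
    for i, choice in enumerate(choices):
        letter = chr(ord("A") + i)
        # Substitute the letter into the prefix template, e.g. "(A) " -> "(B) "
        lines.append(prefix_style.replace("A", letter) + choice)
    return "\n".join(lines)

print(render(choices, "(A) "))  # explicit reference_prefix from this PR
print(render(choices, "A. "))   # adapter default, matching the paper
```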
```python
    chain_of_thought_suffix="The correct answer is ",
    output_noun="",  # will be overwritten with output_prefix
    output_prefix="",
    global_suffix=(
```
Reviewer: Follow the paper - they don't use this suffix.

Reviewer: Delete `global_suffix`.
```python
    input_suffix="\nChoices: \n",
    reference_prefix="(A) ",
    chain_of_thought_prefix="Let's think step by step: ",
    chain_of_thought_suffix="The correct answer is ",
```
Reviewer: I think this results in adding the answer twice to the prompt, e.g. "The answer is (A). The correct answer is A". We need to deal with this somehow, probably in the adapter. I'm okay with deferring this fix to another pull request.
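To make the duplication concrete, a hypothetical in-context example as it would be assembled (values illustrative, not taken from the actual adapter output):

```python
# Hypothetical strings illustrating the duplicated answer statement.
cot_text = "Let's think step by step: ... The answer is (A)."
chain_of_thought_suffix = "The correct answer is "
example = cot_text + " " + chain_of_thought_suffix + "A"
print(example)
# Let's think step by step: ... The answer is (A). The correct answer is A
```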
```diff
-    metric_specs=get_exact_match_metric_specs(),
+    metric_specs=get_exact_match_metric_specs()
+    + [
+        MetricSpec(class_name="helm.benchmark.metrics.chain_of_thought_metric.ChainOfThoughtMetric", args={}),
```
Reviewer: Only add this metric if chain of thought is used.

Reviewer: Address this in GPQA as well.
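A minimal sketch of the suggested guard, assuming the run-spec builder receives a `use_chain_of_thought` flag (the flag name, helper name, and import paths are assumptions, not the final implementation):

```python
from typing import List

from helm.benchmark.metrics.common_metric_specs import get_exact_match_metric_specs
from helm.benchmark.metrics.metric import MetricSpec


def _mmlu_pro_metric_specs(use_chain_of_thought: bool) -> List[MetricSpec]:
    metric_specs: List[MetricSpec] = get_exact_match_metric_specs()
    if use_chain_of_thought:
        # Attach the CoT metric only when chain of thought is actually used.
        metric_specs = metric_specs + [
            MetricSpec(
                class_name="helm.benchmark.metrics.chain_of_thought_metric.ChainOfThoughtMetric",
                args={},
            )
        ]
    return metric_specs
```

The same guard would apply to the GPQA run spec, per the comment above.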
```diff
-    metric_specs=get_exact_match_metric_specs(),  # TODO: update this after cot metric is ready
+    metric_specs=get_exact_match_metric_specs()
+    + [
+        MetricSpec(class_name="helm.benchmark.metrics.chain_of_thought_metric.ChainOfThoughtMetric", args={}),
```
Reviewer: Only add this metric if chain of thought is used.

Reviewer: Somehow didn't catch this before, but please rename this file to `mmlu_pro_scenario.py` to match the convention.
Reviewer: Redundant.

Author: Adjusted lite_run_specs.py to include the CoT implementation of MMLU Pro.