Added Metric for COT #3159

Closed
wants to merge 16 commits into from

siyagoel
Contributor

Created a metric for GPQA chain-of-thought (CoT) prompting.

@siyagoel siyagoel requested a review from yifanmai November 14, 2024 08:14
Collaborator

@yifanmai yifanmai left a comment

Great! Requested some minor changes.

@@ -88,11 +88,6 @@ metrics:
    short_display_name: PEM
    description: Fraction of instances that the predicted output matches the prefix of a correct reference up to light processing.
    lower_is_better: false
  - name: ifeval_strict_accuracy
Collaborator

Don't remove this metric.

metric_specs=get_exact_match_metric_specs()
+ [
    MetricSpec(class_name="helm.benchmark.metrics.chain_of_thought_metric.ChainOfThoughtMetric", args={}),
],  # TODO: update this after cot metric is ready
Collaborator

Remove comment.



class ChainOfThoughtMetric(Metric):
    """Replacement for BasicGenerationMetric for AIRBench 2024."""
Collaborator

Update docstring to reflect what this metric does.

from helm.benchmark.metrics.metric_service import MetricService
from helm.benchmark.metrics.statistic import Stat

import re
Collaborator

Move this line to before from typing import List - see the imports section on PEP-8 style guide under "Imports should be grouped in the following order".
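For illustration, a minimal sketch of the import grouping that PEP 8 describes (standard library first, then third-party, then local application imports, with a blank line between groups). The local imports are shown as comments so the fragment runs on its own; the exact set of imports in the PR's file is not visible in this thread.

```python
# Group 1: standard library imports.
import re
from typing import List

# Group 2: related third-party imports (none needed in this fragment).

# Group 3: local application imports, shown as comments here so the
# fragment stays runnable without HELM installed.
# from helm.benchmark.metrics.metric_service import MetricService
# from helm.benchmark.metrics.statistic import Stat

# Trivial use of the imported names so the fragment is not import-only.
letters: List[str] = re.findall(r"[A-D]", "A, B, C, D")
```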

        return match.group(1)

    # If neither regex matches, return "N/A"
    return "N/A"
Collaborator

Return None if neither regex matches. Also update the type signature to Optional[str] to reflect this.
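A minimal sketch of what this change could look like. The regular expression below is hypothetical, since the PR's actual patterns are not shown in this thread; only the `Optional[str]` signature and the `None` fallback reflect the request.

```python
import re
from typing import Optional


def extract_answer(output_text: str) -> Optional[str]:
    """Return the extracted answer letter, or None if no pattern matches."""
    # Hypothetical pattern for illustration; the PR's real regexes may differ.
    match = re.search(r"answer is \(?([A-J])\)?", output_text)
    if match:
        return match.group(1)

    # No pattern matched: signal "no answer" with None instead of "N/A".
    return None
```

Callers can then distinguish a missing answer with `if extracted_answer is None` rather than comparing against a sentinel string.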

    output_text = request_state.result.completions[0].text

# Extract the answer using the updated logic
extracted_answer = extract_answer(output_text)
Collaborator

output_text could be uninitialized here. You can fix this by making the output_text initialization unconditional, and making the if condition an assert instead.
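A rough sketch of the suggested pattern, assuming the original guard checked that a result with completions exists. The `Completion`, `Result`, and `RequestState` classes here are simplified stand-ins written for this example, not HELM's actual classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Completion:  # stand-in for illustration only
    text: str


@dataclass
class Result:  # stand-in for illustration only
    completions: List[Completion] = field(default_factory=list)


@dataclass
class RequestState:  # stand-in for illustration only
    result: Optional[Result] = None


def get_output_text(request_state: RequestState) -> str:
    # Assert the precondition instead of branching on it, so that
    # output_text is always bound on every path that continues.
    assert request_state.result is not None and request_state.result.completions, (
        "expected a result with at least one completion"
    )
    output_text = request_state.result.completions[0].text
    return output_text
```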

        correct_answer = chr(65 + index)  # Translate index (0 -> A, 1 -> B, etc.)
        break

print(request_state.instance.id, correct_answer, extracted_answer)
Collaborator

Remove print.

if option.is_correct:
    correct_answer = chr(65 + index)  # Translate index (0 -> A, 1 -> B, etc.)
    break

Collaborator

Raise an exception after the for loop if there is no correct answer.
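One way this could be sketched, using a plain list of booleans in place of the PR's reference objects. The function name `find_correct_letter` and the choice of `ValueError` are assumptions made for this example.

```python
from typing import List, Optional


def find_correct_letter(is_correct_flags: List[bool]) -> str:
    """Map the first correct option to a letter (0 -> A, 1 -> B, etc.)."""
    correct_answer: Optional[str] = None
    for index, is_correct in enumerate(is_correct_flags):
        if is_correct:
            correct_answer = chr(65 + index)
            break

    # Fail loudly if the loop finished without finding a correct option.
    if correct_answer is None:
        raise ValueError("No correct answer found among the options")
    return correct_answer
```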

@yifanmai
Collaborator

I see that you made the requested changes in a new pull request #3162. In general, please make the requested changes to a pull request in the same pull request / branch, rather than creating a new pull request for every cycle of requested changes.

If you're unfamiliar with GitHub in general, I would suggest reading the GitHub documentation and / or the Git book.

@siyagoel siyagoel closed this Dec 6, 2024
@siyagoel
Contributor Author

siyagoel commented Dec 6, 2024

Redundant


2 participants