-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor evaluation to allow span based metrics #71
base: feature/new-datagen-and-eval
Are you sure you want to change the base?
Refactor evaluation to allow span based metrics #71
Conversation
tranguyen221
commented
Jan 30, 2023
- Implement the compare_span function in Evaluator class which takes lists of annotated_span and predicted_span and then generates the evaluation output in numbers
- Implement the get_overlap_ratio function in Span class to calculate the overlapping ratio between annotated and predicted offsets.
- Add the unittest for those functions
A general comment regarding the |
@melmatlis I agree, but suggest to wait with this until we finalize all the changes, in order not to make unnecessary conflicts. |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Publishing initial comments as discussed in our code review
|
||
|
||
class Evaluator: | ||
def __init__( | ||
self, | ||
verbose: bool = False, | ||
compare_by_io=True, | ||
entities_to_keep: Optional[List[str]] = None, | ||
entities_to_keep=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a list of entities? If the logic has changed, please update the name of the argument and its docstring
@@ -37,6 +40,25 @@ def __init__( | |||
self.entities_to_keep = entities_to_keep | |||
self.span_overlap_threshold = span_overlap_threshold | |||
|
|||
# setup a dict for storing the span metrics | |||
self.span_model_metrics = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on having this as a class/dataclass?
@@ -0,0 +1,156 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this file move into the evaluation
folder?
""" | ||
Given a predicted_span, get the best matchest annotated_span based on the overlap_threshold. | ||
Return a SpanOutput | ||
:param sample: InputSample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please align docstring with function signature
from presidio_evaluator.evaluation import SpanOutput | ||
|
||
|
||
def get_matched_gold(predicted_span: Span, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method used? If not, consider removing. If yes, is the logic aligned with the docstring?
"""Find the overlap between two ranges | ||
Find the overlap between two ranges. Return the overlapping values if | ||
present, else return an empty set(). | ||
Examples: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add params to docstring, change the example format (:Example:
) and add type hints
@@ -73,6 +74,14 @@ def intersect(self, other, ignore_entity_type: bool): | |||
return min(self.end_position, other.end_position) - max( | |||
self.start_position, other.start_position | |||
) | |||
|
|||
def get_overlap_ratio(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_overlap_ratio(self, other): | |
def get_overlap_ratio(self, other: "Span") -> float: |
I know we don't have type hints across the entire codebase, but let's try to update at least the methods we add to modernize the codebase.
from pathlib import Path | ||
from copy import deepcopy | ||
from difflib import SequenceMatcher |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove unused imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Trang, thanks for the PR. A lot of work went into the complicated modifications. Thank you for these.
May we please have a peer review session on the evaluator.py
file. I need some guidance on the code readability. Thank you
@@ -73,6 +74,14 @@ def intersect(self, other, ignore_entity_type: bool): | |||
return min(self.end_position, other.end_position) - max( | |||
self.start_position, other.start_position | |||
) | |||
|
|||
def get_overlap_ratio(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_overlap_ratio(self, other): | |
def get_overlap_ratio(self, other: Span): |
""" | ||
Calculates the ratio as: ratio = 2.0*M / T , where M = matches , T = total number of elements in both sequences | ||
""" | ||
nb_matches = self.intersect(other, ignore_entity_type = True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we always want to ignore_the entity type? Perhaps we should pass it as and argument to the function?
""" | ||
nb_matches = self.intersect(other, ignore_entity_type = True) | ||
total_characters = (self.end_position - self.start_position) + (other.end_position - other.start_position) | ||
return np.round((2*nb_matches/total_characters), 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any theoretical chance that total_characters
will be equal to 0?
[ | ||
(150, 153, "123", "A", 150, 153, "123", "A", True), | ||
(150, 153, "123", "B", 150, 153, "123", "A", False), | ||
(150, 153, "123", "A", 150, 153, "345", "A", False), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that the same range will have different entity values?
@@ -1,27 +1,30 @@ | |||
from collections import Counter | |||
from typing import List, Optional, Dict, Tuple | |||
from pathlib import Path | |||
from copy import deepcopy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tranguyen221 May we please do peer review on this file?
def __eq__(self, other): | ||
return ( | ||
self.output_type == other.output_type | ||
and self.overlap_score == other.overlap_score |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -22,57 +22,79 @@ def get_matched_gold(predicted_span: Span, | |||
overlap_score=0 | |||
) | |||
|
|||
def find_overlap(true_range, pred_range): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move this file to be part of the evaluation
directory as well? what is the logic of what is included/excluded from the directory?
def span_compute_actual_possible(results: dict) -> dict: | ||
""" | ||
Takes a result dict that has been output by compute metrics. | ||
Returns the results dict with actual, possible populated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose to update the doc string further and to explain what's "Actual" and "possible" refer to.
Add the formulas into the docstring as well
calculating precision and recall. | ||
""" | ||
|
||
actual = results["actual"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the schema you created, we should have a class named EvaluationResult. Are the results of type dict as an intermediate solution?