New "combined score" for entity linking when place of pub is known #277

Open
thobson88 opened this issue Sep 13, 2024 · 0 comments
Labels: enhancement, entity linking

Linker algorithm using place of publication & combined scores

The first two steps of the pipeline are unchanged:

  • identify toponyms (Recogniser)
  • rank candidates with DeezyMatch (Ranker)

The third step, disambiguation via the Linker, is new. We assume that the Wikidata ID (wqid) and the latitude/longitude (latlon) of the place of publication are known.

For each identified toponym:

  • if the place of publication wqid is found in the list of candidates and is not the prediction:
    • get the string match score for the place of publication (from the DeezyMatch scores computed by the Ranker)
    • if the string match score for the place of publication is greater than 0.01:
      • set the prediction to the place of publication
      • exit
  • attach the place of publication information to the sentence
  • for each candidate:
    • compute the REL+pub cross_cand_score
    • if the latlon coordinates or the Wikidata popularity for the candidate are not known/available:
      • set the combined_score equal to the cross_cand_score
    • otherwise (both the candidate's latlon coordinates and its popularity are available):
      • compute the popularity (i.e. the Wikidata "most popular" score)
      • compute the proximity
      • set the combined_score = cross_cand_score * max(popularity, proximity)
  • rank candidates by the combined score and set the prediction to be the top one
  • exit
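
The per-toponym steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Linker code: the `Candidate` class, `link_toponym`, and the `cross_cand_score` and `proximity` callables are hypothetical names, and the sketch assumes popularity and proximity are already normalised to comparable ranges. How the REL+pub score and the proximity are actually computed is not specified here, so both are passed in as functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Threshold on the string match score, taken from the description above.
PUB_MATCH_THRESHOLD = 0.01


@dataclass
class Candidate:
    """Hypothetical container for one candidate returned by the Ranker."""
    wqid: str
    string_match_score: float                     # DeezyMatch score from the Ranker
    latlon: Optional[Tuple[float, float]] = None  # None if coordinates are unavailable
    popularity: Optional[float] = None            # None if popularity is unavailable


def combined_score(cross_cand_score: float,
                   popularity: Optional[float],
                   proximity: Optional[float]) -> float:
    """Fall back to cross_cand_score when popularity or proximity is missing."""
    if popularity is None or proximity is None:
        return cross_cand_score
    return cross_cand_score * max(popularity, proximity)


def link_toponym(candidates: List[Candidate],
                 pub_wqid: str,
                 current_prediction: Optional[str],
                 cross_cand_score: Callable[[Candidate], float],
                 proximity: Callable[[Tuple[float, float]], float]) -> Optional[str]:
    """Return the predicted wqid for a single toponym."""
    by_wqid: Dict[str, Candidate] = {c.wqid: c for c in candidates}

    # Shortcut: prefer the place of publication if it is a candidate,
    # is not already the prediction, and its string match score is high enough.
    pub = by_wqid.get(pub_wqid)
    if pub is not None and current_prediction != pub_wqid:
        if pub.string_match_score > PUB_MATCH_THRESHOLD:
            return pub_wqid

    # Assumption: cross_cand_score already reflects the place of publication
    # having been attached to the sentence (the "REL+pub" score).
    scores: Dict[str, float] = {}
    for cand in candidates:
        prox = proximity(cand.latlon) if cand.latlon is not None else None
        scores[cand.wqid] = combined_score(cross_cand_score(cand), cand.popularity, prox)

    if not scores:
        return current_prediction
    # Rank by combined score and take the top candidate.
    return max(scores, key=scores.get)
```

Passing the two scoring functions in as arguments keeps the sketch independent of any particular REL or distance implementation; in practice they would presumably be methods on the Linker.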

See also: https://github.com/Living-with-machines/data-culture-newspapers/issues/17

@thobson88 thobson88 self-assigned this Sep 13, 2024
@thobson88 thobson88 added the enhancement and entity linking labels Sep 13, 2024
@thobson88 thobson88 changed the title from 'Implement a new "combined score" for entity linking when place of publication is known' to 'New "combined score" for entity linking when place of publication is known' Sep 13, 2024
@thobson88 thobson88 changed the title from 'New "combined score" for entity linking when place of publication is known' to 'New "combined score" for entity linking when place of pub is known' Sep 13, 2024