Using Named Entity Recognition as a Discovery Tool

(For a good, broad overview of Named Entity Recognition (NER), please see Wikipedia.)

Named Entity Recognition comprises two tasks:

Discovering names (Name Detection) in a source document
Classifying those names by the kind of entity to which they refer (people, places, organizations, dates, etc.)

In Named Entity Linking (NEL), a third task is added: associating a named entity with a referent in some authority database.
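As a minimal sketch of the linking step, assume the authority database can be reduced to a lookup table keyed by a normalized name. The `AUTHORITY` table, its identifiers, and the `link_entity` helper are all hypothetical, invented here for illustration. A real authority file would need fuzzier matching and disambiguation.

```python
# Hypothetical authority records; the identifiers are invented for illustration.
AUTHORITY = {
    "abraham lincoln": "person/0001",
    "new york": "place/0002",
}

def link_entity(name, authority=AUTHORITY):
    """Return the authority identifier for a name, or None if no record matches."""
    # Normalize case and collapse internal whitespace before the lookup.
    key = " ".join(name.lower().split())
    return authority.get(key)

linked = link_entity("Abraham  Lincoln")   # extra space is normalized away
unlinked = link_entity("Springfield")      # no record in the toy table
```

Returning `None` for an unmatched name keeps detection and linking separate: a name can be detected and classified even when no referent is found.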

In a typical pipeline, one program segments a source text into tokens (words, spaces, punctuation marks, etc.), and a second program groups tokens (or contiguous runs of tokens) into lexical items, based on orthography (such as capitalization) and syntax. This segmentation can be improved with machine learning, in which an algorithm is trained to recognize particular patterns as names.
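The two-stage pipeline described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not a production approach: the token pattern and the `tokenize` and `detect_names` helpers are hypothetical, and the second stage uses capitalization only, with no syntax or machine learning.

```python
import re

# Hypothetical token pattern: words, runs of whitespace, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")

def tokenize(text):
    """Stage 1: split text into word, whitespace, and punctuation tokens."""
    return TOKEN_RE.findall(text)

def detect_names(tokens):
    """Stage 2: group contiguous capitalized word tokens into candidate names."""
    names, current = [], []
    for tok in tokens:
        if tok.isspace():
            continue  # whitespace does not break a run of capitalized words
        if tok[0].isupper():
            current.append(tok)
        elif current:
            # A lowercase word or punctuation mark ends the current candidate.
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names

tokens = tokenize("The letter from Abraham Lincoln reached New York in May.")
candidates = detect_names(tokens)
# candidates == ["The", "Abraham Lincoln", "New York", "May"]
```

The sentence-initial "The" shows up as a false positive, which is the kind of error that syntactic information or a trained model is meant to reduce.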

The NER task is complicated when the source text is derived from uncorrected OCR. Basic tokenization is often hampered, as are orthographic pattern-matching and syntactic parsing.
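To make the OCR problem concrete, here is a small sketch that applies one naive orthographic pattern to a clean string and to a version with invented OCR-style errors. The pattern `NAME_RE` and both sample strings are assumptions made for illustration; real OCR noise is more varied.

```python
import re

# Naive orthographic pattern: one or more capitalized alphabetic words in a row.
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")

clean = "a letter from Abraham Lincoln to the governor"
# Invented OCR-style errors: a spurious space and the digit "1" in place of "l".
noisy = "a letter from Abra ham Linco1n to the governor"

clean_hits = NAME_RE.findall(clean)   # ["Abraham Lincoln"]
noisy_hits = NAME_RE.findall(noisy)   # ["Abra"]
```

Two single-character errors are enough to reduce the detected name to a fragment, since both the token boundaries and the capitalization pattern are disturbed.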
