Avoid exceptions in POS Tagging of sentence with many non-words
Do not add the final state (<END>) to the trellis diagram for sentence tagging if the end of the sentence has not yet been reached. This seems to solve a problem with tagging sentences that contain many non-words.
wartaal authored Nov 15, 2020
1 parent cadb349 commit a5fe138
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion HanTa/HanoverTagger.py
@@ -211,7 +211,7 @@ def tag_sent_viterbi(self, sent, casesensitive = True):
elif casesensitive:
cs = True
wprobs = dict(self.tag_word(w,casesensitive=cs,conditional=True))
- if len(wprobs) == 1 and 'UNKNOWN' in wprobs: #This should not occur but can result from erong settings
+ if len(wprobs) == 1 and 'UNKNOWN' in wprobs: #This should not occur but can result from wrong settings
wprobs = {}
row = {}
backpointer.append({})
@@ -232,6 +232,8 @@ def tag_sent_viterbi(self, sent, casesensitive = True):
for c, lp_tc in lp_t:
if c not in wprobs and len(wprobs) > 0:
continue
+ if c == '<END>': #2020-11-11 We are not in the last row, so adding state <END> makes no sense
+ continue
if len(wprobs) == 0: #If the word is unknown anything goes
lpwc = 0
else:
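The effect of the second hunk can be illustrated with a small, self-contained Viterbi sketch. This is not HanTa's code: the function `viterbi`, the dictionaries `transitions` and `emissions`, and all probabilities are hypothetical and chosen only to show the idea. As in the diff, an empty emission dictionary stands for an unknown word for which "anything goes", and `<END>` is skipped as a candidate state in every row that belongs to a word, so it can only be reached after the last word.

```python
import math

START, END = "<START>", "<END>"

def viterbi(emissions, transitions):
    """Return the most probable tag sequence.

    emissions:   one dict per word, tag -> log P(word | tag);
                 an empty dict marks an unknown word (any tag allowed).
    transitions: prev_tag -> {tag: log P(tag | prev_tag)}.
    """
    if not emissions:
        return []
    rows = [{START: 0.0}]
    backpointers = [{}]
    for wprobs in emissions:
        row, back = {}, {}
        for prev, lp_prev in rows[-1].items():
            for tag, lp_trans in transitions.get(prev, {}).items():
                if tag == END:
                    continue  # a word row is never the final state
                if wprobs and tag not in wprobs:
                    continue  # tag is impossible for this known word
                lp_emit = wprobs[tag] if wprobs else 0.0  # unknown word
                score = lp_prev + lp_trans + lp_emit
                if score > row.get(tag, -math.inf):
                    row[tag] = score
                    back[tag] = prev
        rows.append(row)
        backpointers.append(back)
    # Only now, after the last word, is the transition into <END> considered.
    best_tag, best_score = None, -math.inf
    for prev, lp_prev in rows[-1].items():
        lp_end = transitions.get(prev, {}).get(END)
        if lp_end is not None and lp_prev + lp_end > best_score:
            best_tag, best_score = prev, lp_prev + lp_end
    if best_tag is None:
        return []
    path = [best_tag]
    for back in reversed(backpointers[2:]):  # backpointers[1] points at START
        path.append(back[path[-1]])
    path.reverse()
    return path

# Hypothetical toy model with two tags, N and V.
transitions = {
    START: {"N": math.log(0.6), "V": math.log(0.4)},
    "N": {"N": math.log(0.3), "V": math.log(0.5), END: math.log(0.2)},
    "V": {"N": math.log(0.5), "V": math.log(0.2), END: math.log(0.3)},
}
emissions = [
    {"N": math.log(0.9), "V": math.log(0.1)},
    {},  # unknown word in the middle of the sentence
    {"N": math.log(0.2), "V": math.log(0.8)},
]
example_tags = viterbi(emissions, transitions)
print(example_tags)
```

Without the `tag == END` check, the unknown word in the middle would accept `<END>` as a candidate tag, and the following row would then try to continue from a state that has no outgoing transitions. That is presumably the situation the commit guards against, although the exact exception raised in HanTa is not stated in the commit.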

0 comments on commit a5fe138