#116 Allow token processing "middleware" #144
Open
nsantini wants to merge 17 commits into thisandagain:master from AmbitAI:master
Commits
588de8d spell checking words (nsantini)
90417ad using Levenshtein distance to spell check (nsantini)
131e4b8 checking if word is mispelled before correcting it (nsantini)
9d5218f checking for negating words backwards until end of token or afinn wor… (nsantini)
6a0521f spell checking negation words withouth afinn (nsantini)
08409bf refactorign into files, update unit test to match new findings (nsantini)
7e7dc9e Merging upstream to resolve PR conflicts (nsantini)
58b1a70 fixing variable rename (nsantini)
52e62cf moving negation strategy to language module (nsantini)
d08dfc2 adding unit test for backward search for negation (nsantini)
aaacd29 making spell check optional and adding unit test (nsantini)
ecc5de2 using nspell library for spell checking, and loading dictionary synch… (nsantini)
6a298f7 Adding readme section about spell checking (nsantini)
1f03b7d Making spell checking available for all languages (nsantini)
ef9ecda Documenting API for spell checking (nsantini)
d22e26f typo fixed plus readme enhanced about dictioinaries (nsantini)
9d25b4b more specific documentation around dictionaries (nsantini)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
var lev = require('levenshtein'); | ||
var tokenize = require('./tokenize'); | ||
|
||
/** | ||
* Finds the closest match between a statement and a body of words using | ||
* Levenshtein Distance | ||
* | ||
* @param {string} string Input string | ||
* @param {string|array} words Candidate words (an array, or a string to be tokenized) | ||
* @return {string} The closest word in the list | ||
*/ | ||
module.exports = function(string, words) { | ||
|
||
var shortest = words.toString().length; | ||
var bestFit = ''; | ||
|
||
if (typeof words === 'string') { | ||
words = tokenize(words); | ||
} | ||
|
||
words.forEach(function(word) { | ||
|
||
var distance = lev(string, word); | ||
|
||
if (distance < shortest) { | ||
bestFit = word; | ||
shortest = distance; | ||
} | ||
|
||
}); | ||
|
||
return bestFit; | ||
}; |
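The closest-match selection in this file can be tried without the `levenshtein` dependency. The following is a minimal sketch, not the code from this PR: `levenshteinDistance` and `closestWord` are hypothetical names, and the distance function is a plain dynamic-programming implementation assumed to be equivalent for this purpose.

```javascript
// Hypothetical helper: classic two-row dynamic-programming edit distance.
function levenshteinDistance(a, b) {
    var prev = [];
    for (var j = 0; j <= b.length; j++) {
        prev[j] = j;
    }
    for (var i = 1; i <= a.length; i++) {
        var curr = [i];
        for (var k = 1; k <= b.length; k++) {
            var cost = a[i - 1] === b[k - 1] ? 0 : 1;
            curr[k] = Math.min(
                prev[k] + 1,        // deletion
                curr[k - 1] + 1,    // insertion
                prev[k - 1] + cost  // substitution
            );
        }
        prev = curr;
    }
    return prev[b.length];
}

// Hypothetical helper: returns the candidate with the smallest distance.
// On a tie the earliest candidate wins, because the comparison is strict.
function closestWord(input, words) {
    var shortest = Infinity;
    var bestFit = '';
    words.forEach(function (word) {
        var distance = levenshteinDistance(input, word);
        if (distance < shortest) {
            bestFit = word;
            shortest = distance;
        }
    });
    return bestFit;
}

console.log(closestWord('amazng', ['among', 'amazing']));
```

Starting `shortest` at `Infinity` is a choice made for this sketch; the diff above seeds it with the length of the stringified word list instead.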
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
var spelling = require('./spelling'); | ||
|
||
/** | ||
* These words "flip" the sentiment of the following word. | ||
*/ | ||
var negators = { | ||
'cant': 1, | ||
'can\'t': 1, | ||
'dont': 1, | ||
'don\'t': 1, | ||
'doesnt': 1, | ||
'doesn\'t': 1, | ||
'not': 1, | ||
'non': 1, | ||
'wont': 1, | ||
'won\'t': 1, | ||
'isnt': 1, | ||
'isn\'t': 1 | ||
}; | ||
|
||
/** | ||
* Evaluates whether the current token is negated by a previous token | ||
* | ||
* @param {object} afinn AFINN word list, keyed by word | ||
* @param {array} tokens list of tokens being evaluated | ||
* @param {int} pos position of the current word in the tokens list | ||
* | ||
* @return {boolean} true if the token at pos is being negated, false otherwise | ||
*/ | ||
module.exports = function negated(afinn, tokens, pos) { | ||
while (pos--) { | ||
if (negators[tokens[pos]]) { | ||
return true; | ||
} | ||
var word = spelling.getSpellCheckedWord(tokens[pos]); | ||
if (negators[word]) { | ||
return true; | ||
} else if (afinn.hasOwnProperty(word)) { | ||
return false; | ||
} | ||
} | ||
return false; | ||
}; |
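The backward scan can be illustrated without the spell-checking step. This is a minimal sketch under stated assumptions: `isNegated`, `negatorSample`, and `afinnSample` are hypothetical names, and the scores are a small illustrative sample rather than the full AFINN list. It uses an own-property check so that a token such as `constructor` is not mistaken for a negator through the object prototype.

```javascript
// Illustrative samples only; not the lists used by the library.
var negatorSample = { 'not': 1, 'dont': 1, 'don\'t': 1, 'cant': 1, 'can\'t': 1 };
var afinnSample = { good: 3, bad: -3, like: 2 };

function has(obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key);
}

// Walks backwards from pos. A negator means the token is negated; a scored
// word reached first ends the search, since it starts a new sentiment unit.
function isNegated(afinn, tokens, pos) {
    while (pos--) {
        var token = tokens[pos];
        if (has(negatorSample, token)) {
            return true;
        }
        if (has(afinn, token)) {
            return false;
        }
    }
    return false;
}

// "like" is negated by "not", two tokens back.
console.log(isNegated(afinnSample, ['i', 'do', 'not', 'really', 'like', 'it'], 4));
// "good" is shielded from "not" by the scored word "bad".
console.log(isNegated(afinnSample, ['not', 'bad', 'but', 'good'], 3));
```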
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
var spellChecker = require ('spellchecker'); | ||
var distance = require('./distance'); | ||
|
||
/** | ||
* These two functions attempt to spell check and correct a given word, using | ||
* Levenshtein Distance to choose the most appropriate correction. | ||
* getSpellCheckedAfinnWord also looks for the word to be present on Afinn | ||
*/ | ||
module.exports = { | ||
getSpellCheckedAfinnWord: function (afinn, word) { | ||
if (!afinn.hasOwnProperty(word) && spellChecker.isMisspelled(word)) { | ||
var checked = spellChecker.getCorrectionsForMisspelling(word); | ||
if (checked.length === 0) { | ||
return word; | ||
} else { | ||
var closest = distance(word, checked); | ||
if (closest && afinn.hasOwnProperty(closest)) { | ||
return closest; | ||
} | ||
} | ||
} | ||
return word; | ||
}, | ||
|
||
getSpellCheckedWord: function (word) { | ||
if (spellChecker.isMisspelled(word)) { | ||
var checked = spellChecker.getCorrectionsForMisspelling(word); | ||
if (checked.length === 0) { | ||
return word; | ||
} else { | ||
var closest = distance(word, checked); | ||
if (closest) { | ||
return closest; | ||
} | ||
} | ||
} | ||
return word; | ||
} | ||
}; |
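The two functions above share most of their logic. One possible way to express it once is sketched below, with the spell checker and the closest-match picker passed in so the example runs without native dependencies. `correctWord`, `stubChecker`, and `pickFirst` are hypothetical; the stub only mimics the two `spellchecker` methods used in the diff, and `pickFirst` stands in for the Levenshtein-based `./distance` module.

```javascript
function has(obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key);
}

// If afinn is given, a correction is accepted only when it is an AFINN word;
// otherwise any closest correction is accepted.
function correctWord(checker, pickClosest, word, afinn) {
    if (afinn && has(afinn, word)) {
        return word;
    }
    if (!checker.isMisspelled(word)) {
        return word;
    }
    var candidates = checker.getCorrectionsForMisspelling(word);
    if (candidates.length === 0) {
        return word;
    }
    var closest = pickClosest(word, candidates);
    if (!closest || (afinn && !has(afinn, closest))) {
        return word;
    }
    return closest;
}

// Stub checker: knows a single misspelling and its suggestions.
var stubChecker = {
    corrections: { amazng: ['amazing', 'among'] },
    isMisspelled: function (word) {
        return has(this.corrections, word);
    },
    getCorrectionsForMisspelling: function (word) {
        return this.corrections[word] || [];
    }
};

// Stand-in picker: takes the first suggestion.
function pickFirst(word, candidates) {
    return candidates[0];
}

console.log(correctWord(stubChecker, pickFirst, 'amazng', { amazing: 4 }));
```

Passing the dependencies as arguments is a choice made for the sketch; it also makes the optional spell checking mentioned in the commit list easier to test in isolation.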
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,5 +36,9 @@ | |
}, | ||
"engines": { | ||
"node": ">=8.0" | ||
}, | ||
"dependencies": { | ||
"levenshtein": "^1.0.5", | ||
"spellchecker": "^3.4.3" | ||
} | ||
} |
When I originally forked the repo, this is where we checked whether the previous word was a negation and inverted the score. Now it seems negation is handled further down, so we end up negating the score twice. My change to deal with negating words may be redundant now.
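The double-negation concern can be made concrete with a toy scorer. This is a sketch under the assumption that two independent negation passes both run on the same token; `scoreTokens`, `flipIfPreviousNegates`, and the sample data are hypothetical and do not reproduce the library's scoring code.

```javascript
// Illustrative samples only.
var sampleScores = { good: 3 };
var sampleNegators = { not: true };

function hasOwn(obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key);
}

// One negation pass: flip the score if the previous token is a negator.
function flipIfPreviousNegates(tokens, pos, score) {
    return pos > 0 && hasOwn(sampleNegators, tokens[pos - 1]) ? -score : score;
}

// Sums token scores, applying the negation pass `passes` times per token.
function scoreTokens(tokens, passes) {
    var total = 0;
    tokens.forEach(function (token, pos) {
        if (!hasOwn(sampleScores, token)) {
            return;
        }
        var score = sampleScores[token];
        for (var i = 0; i < passes; i++) {
            score = flipIfPreviousNegates(tokens, pos, score);
        }
        total += score;
    });
    return total;
}

console.log(scoreTokens(['not', 'good'], 1)); // negation applied once
console.log(scoreTokens(['not', 'good'], 2)); // applied twice, the flips cancel
```

With a single pass "not good" scores -3; with two passes the flips cancel and the phrase scores 3, the same as "good" alone.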