Allow token processing "middleware" #116

mlucool · 2017-04-14T12:15:34Z

Hi,

Is it possible to add an option that first computes string distances to the words in the positive/negative list and then, if the similarity is above some threshold, categorizes the token as that word, so that spelling mistakes and casual writing styles are not lost?

e.g.

> sentiment('Cats are dumb');
{ score: -3,
  comparative: -1,
  tokens: [ 'cats', 'are', 'dumb' ],
  words: [ 'dumb' ],
  positive: [],
  negative: [ 'dumb' ] }
> sentiment('Cats are dumbbb');
{ score: 0,
  comparative: 0,
  tokens: [ 'cats', 'are', 'dumbbb' ],
  words: [],
  positive: [],
  negative: [] }

In this example, dumbbb is so close to dumb that it should be classified as such. A library like natural makes this easy:

require('natural').JaroWinklerDistance('dumb', 'dumbbb')
0.9333333333333333
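For reference, Jaro-Winkler similarity can also be computed without an extra dependency. The sketch below follows the textbook definition (match window of floor(max(|a|, |b|) / 2) - 1, prefix scale 0.1 over at most 4 characters, prefix boost applied only when the Jaro score exceeds 0.7); natural's own implementation may differ in edge cases. `fuzzyScore` and its 0.9 default threshold are hypothetical, chosen only to show how a similarity cut-off could map a misspelled token onto an AFINN entry.

```javascript
// Jaro similarity: share of matching characters, penalised by transpositions.
function jaro(a, b) {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const start = Math.max(0, i - range);
    const end = Math.min(b.length - 1, i + range);
    for (let j = start; j <= end; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Count matched characters that appear in a different order in the two strings.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;
  return (matches / a.length + matches / b.length + (matches - transpositions) / matches) / 3;
}

// Winkler's boost for a common prefix of up to 4 characters (scaling factor 0.1).
function jaroWinkler(a, b) {
  const j = jaro(a, b);
  let prefix = 0;
  while (prefix < 4 && prefix < a.length && prefix < b.length && a[prefix] === b[prefix]) prefix++;
  return j > 0.7 ? j + prefix * 0.1 * (1 - j) : j;
}

// Hypothetical helper: score of the most similar AFINN word, or 0 if nothing clears the threshold.
function fuzzyScore(token, afinn, threshold = 0.9) {
  let best = null;
  let bestSim = threshold;
  for (const word of Object.keys(afinn)) {
    const sim = jaroWinkler(token, word);
    if (sim >= bestSim) {
      best = word;
      bestSim = sim;
    }
  }
  return best === null ? 0 : afinn[best];
}

console.log(jaroWinkler('dumb', 'dumbbb').toFixed(4)); // 0.9333
console.log(fuzzyScore('dumbbb', { dumb: -3, good: 3 })); // -3
```

The threshold is the main tuning knob: too low and unrelated short words start to collide, too high and only near-identical spellings are caught.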

If adding natural is out of scope, perhaps a hook that lets someone inject it as a processing step could work too.

What do you think? Would this work?


thisandagain · 2017-04-16T20:49:38Z

Good question! Using edit distance for matching is a really interesting use case. I'm going to modify your title to make this a little more generic, but this is certainly something I'd be interested in supporting.

ghost · 2017-04-17T02:08:24Z

This is exactly what I'm seeing as well with casual comments and expressions on social media. +1 for this 👍

mlucool · 2017-04-17T15:46:22Z

It looks very easy to add. Here, just allow an optional callback, something like:

function middleware(text, value, wasNegated, afinn) {
    if (value !== 0) return value; // I can easily modify the score here
    // Search afinn for the closest word (findClosestWord is a placeholder, not an existing helper)
    const closest = findClosestWord(text, afinn);
    const score = afinn[closest] || 0;
    return wasNegated ? -score : score;
}
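To make the callback idea concrete, here is a minimal sketch of a scoring loop that accepts an optional middleware with the `(text, value, wasNegated, afinn)` signature proposed above. `scoreTokens`, the small `NEGATORS` set and the `collapseRepeats` example are all hypothetical and not part of the sentiment API; negation here only inspects the immediately preceding token.

```javascript
// Hypothetical negation words, for this sketch only.
const NEGATORS = new Set(['not', 'no', 'never', "don't", "isn't"]);

// Own-property lookup so tokens like "constructor" don't hit Object.prototype.
const lookup = (dict, word) =>
  Object.prototype.hasOwnProperty.call(dict, word) ? dict[word] : 0;

// Hypothetical scorer: dictionary lookup, sign flip after a negator,
// then the optional middleware gets the final say on each token's value.
function scoreTokens(tokens, afinn, middleware) {
  let score = 0;
  tokens.forEach((token, i) => {
    const wasNegated = i > 0 && NEGATORS.has(tokens[i - 1]);
    const base = lookup(afinn, token);
    let value = wasNegated ? -base : base;
    if (typeof middleware === 'function') {
      value = middleware(token, value, wasNegated, afinn);
    }
    score += value;
  });
  return score;
}

// Example middleware: for unknown tokens, drop trailing repeated letters one at a
// time ("dumbbb" -> "dumbb" -> "dumb") until a dictionary word matches.
function collapseRepeats(text, value, wasNegated, afinn) {
  if (value !== 0) return value;
  for (let end = text.length - 1; end > 0 && text[end] === text[end - 1]; end--) {
    const candidate = text.slice(0, end);
    if (Object.prototype.hasOwnProperty.call(afinn, candidate)) {
      return wasNegated ? -afinn[candidate] : afinn[candidate];
    }
  }
  return 0;
}

const afinn = { dumb: -3 };
console.log(scoreTokens(['cats', 'are', 'dumbbb'], afinn));                  // 0
console.log(scoreTokens(['cats', 'are', 'dumbbb'], afinn, collapseRepeats)); // -3
```

Because the middleware argument is optional, callers who don't pass one pay only a `typeof` check per token.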

Such a callback would allow a range of middleware, for example chaining several functions so that a different technique is tried whenever a simpler or faster one fails to find a match.
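One way to get that chaining behaviour, sketched under the assumption that every stage uses the same `(text, value, wasNegated, afinn)` signature: a hypothetical `chain` combinator runs the stages in order and stops as soon as one produces a non-zero score, so the dictionary scan only runs when the cheap clean-up fails. `stripSymbols` and `oneSubstitution` are illustrative stages, not existing helpers.

```javascript
// Hypothetical combinator: try each middleware in order until one returns a non-zero score.
function chain(...middlewares) {
  return (text, value, wasNegated, afinn) => {
    let current = value;
    for (const mw of middlewares) {
      if (current !== 0) break; // the dictionary or an earlier stage already scored it
      current = mw(text, current, wasNegated, afinn);
    }
    return current;
  };
}

const has = (dict, word) => Object.prototype.hasOwnProperty.call(dict, word);
const signed = (score, wasNegated) => (wasNegated ? -score : score);

// Cheap stage: keep only letters and apostrophes, then look the word up.
function stripSymbols(text, value, wasNegated, afinn) {
  const cleaned = text.replace(/[^a-z']/gi, '');
  return has(afinn, cleaned) ? signed(afinn[cleaned], wasNegated) : 0;
}

// Slower stage: scan the whole dictionary for a same-length word differing in exactly one letter.
function oneSubstitution(text, value, wasNegated, afinn) {
  for (const word of Object.keys(afinn)) {
    if (word.length !== text.length) continue;
    let diff = 0;
    for (let i = 0; i < word.length && diff < 2; i++) {
      if (word[i] !== text[i]) diff++;
    }
    if (diff === 1) return signed(afinn[word], wasNegated);
  }
  return 0;
}

const fallback = chain(stripSymbols, oneSubstitution);
console.log(fallback('dunb', 0, false, { dumb: -3 })); // -3
```

Ordering stages from cheapest to most expensive keeps the common case fast, which matters if this runs on every unscored token.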

tuxton · 2017-07-28T15:48:54Z

A different approach could be to filter the words through a spellchecker such as https://github.com/atom/node-spellchecker. I don't know how this would affect the benchmarks, but in my case it would be great, since I also apply other filters such as gender guessing and topic classification.

Cheers!

nsantini · 2017-09-12T02:19:49Z

Hi, I took the liberty of forking this great repo to add some features I needed, and they are in line with what's described in this issue.

I added node-spellchecker to check for typos, and also "levenshtein" to pick the spelling correction closest to the original word.
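Ranking spellchecker suggestions by edit distance can be sketched as follows. This is not the fork's code: the suggestion list is passed in directly (in practice it could come from node-spellchecker's `getCorrectionsForMisspelling`), and `levenshtein` / `closestCorrection` are illustrative names using the standard two-row dynamic-programming formulation with unit costs for insertions, deletions and substitutions.

```javascript
// Edit distance between a and b, keeping only the previous DP row in memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Pick the suggestion with the smallest edit distance; null when there are no suggestions.
function closestCorrection(word, suggestions) {
  let best = null;
  let bestDist = Infinity;
  for (const s of suggestions) {
    const d = levenshtein(word, s);
    if (d < bestDist) {
      best = s;
      bestDist = d;
    }
  }
  return best;
}

console.log(closestCorrection('dumbbb', ['dumbbell', 'dumb', 'numb'])); // dumb
```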

I also modified the "negation" feature to look backwards until either a negation word or another AFINN word is found, to cover cases like "not too bad".
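A minimal sketch of that backward search, assuming a small hypothetical negator list and a plain-object AFINN dictionary; `isNegated` and `score` are illustrative names, not the fork's implementation. A rule that only inspects the immediately preceding token would leave "not too bad" negative, while walking back past unscored words flips it.

```javascript
// Hypothetical, partial list of negation words.
const NEGATORS = new Set(['not', 'no', 'never', "don't", "isn't"]);

const inDict = (dict, word) => Object.prototype.hasOwnProperty.call(dict, word);

// Walk backwards from position i: a negator means "negated",
// reaching another scored word first ends the search.
function isNegated(tokens, i, afinn) {
  for (let k = i - 1; k >= 0; k--) {
    if (NEGATORS.has(tokens[k])) return true;
    if (inDict(afinn, tokens[k])) return false;
  }
  return false;
}

// Sum the scores of dictionary words, flipping the sign of negated ones.
function score(tokens, afinn) {
  let total = 0;
  tokens.forEach((token, i) => {
    if (!inDict(afinn, token)) return;
    total += isNegated(tokens, i, afinn) ? -afinn[token] : afinn[token];
  });
  return total;
}

console.log(score(['not', 'too', 'bad'], { bad: -3, good: 3 })); // 3
```

Stopping at the previous scored word keeps one negator from flipping every sentiment word that follows it, e.g. in "not good but bad" only "good" is negated.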

Feel free to check out the master branch at https://github.com/AmbitAI/sentiment

I'm happy to create a PR with part of the changes or the whole thing, depending on what's in line with the direction of the library.

thisandagain · 2018-02-08T17:41:58Z

@nsantini I think I would be interested in all three of these features, as long as they are added in a way that is optional (so as to preserve performance for those who need it). I'm curious to see how each of these features impacts the validation tests (make validate).

pdw207 · 2018-06-13T15:27:48Z

@thisandagain Is this issue still open? @nsantini Do you still want to create a PR with this feature? Let me know if it would be helpful for me to pitch in.

thisandagain · 2018-06-13T17:50:20Z

@pdw207 This is still open pending a PR. I'd be happy to review a PR from you if you want to pick up where @nsantini left off.

nsantini · 2018-06-13T18:26:32Z

PR #144

It has some merge conflicts. I'll try to solve them, but feel free to take over; I haven't looked into this for a while.

nsantini · 2018-06-13T18:38:46Z

Solved the merge conflicts, but somebody more familiar with the changes made since I forked needs to validate them :)

nsantini · 2018-06-13T19:01:49Z

So, the sync_negation test case is failing. It looks like the logic for handling sentence negation has changed since I forked, so my change negates the score twice. I'm not sure where to look for the new logic, though.

thisandagain added the question label Apr 16, 2017

thisandagain added feature help wanted and removed question labels Apr 16, 2017

thisandagain changed the title ~~Misspellings~~ Allow token processing "middleware" Apr 16, 2017
