Support for 2-grams #158

Open
cristiano-belloni opened this issue Oct 29, 2018 · 2 comments

@cristiano-belloni

cristiano-belloni commented Oct 29, 2018

Hello,
I'm trying to override the AFINN scores for 2-grams, but it doesn't seem to work:

sentiment.analyze( 'This stuff is made up', { extras: { 'made up': -1 } } )

{ score: 0,
  comparative: 0,
  tokens: [ 'this', 'is', 'made', 'up' ],
  words: [],
  positive: [],
  negative: [] }

The effect is more pronounced when a 2-gram should flip the overall score of a phrase. Here "fucking good" intensifies a positive word, but the two words are scored separately and the overall score is -1:

sentiment.analyze( 'This stuff is fucking good', { extras: { 'fucking good': 3 } } )
{ score: -1,
  comparative: -0.2,
  tokens: [ 'this', 'stuff', 'is', 'fucking', 'good' ],
  words: [ 'good', 'fucking' ],
  positive: [ 'good' ],
  negative: [ 'fucking' ] }
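
The numbers in this output are what you would expect if each word is looked up on its own and the scores are summed. Below is a minimal sketch of that unigram arithmetic. The names `miniLexicon` and `scoreUnigrams` are hypothetical, and the word scores (3 for `good`, -4 for `fucking`) are assumptions chosen to reproduce the output above, not values read from the library.

```javascript
// Hypothetical mini lexicon; the values are assumptions chosen to match the output above.
const miniLexicon = { good: 3, fucking: -4 };

// Score each token independently, the way a single-word lexicon lookup would.
function scoreUnigrams(text, lexicon) {
    const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
    let score = 0;
    for (const token of tokens) {
        if (Object.prototype.hasOwnProperty.call(lexicon, token)) {
            score += lexicon[token];
        }
    }
    // Comparative is the total score divided by the number of tokens.
    const comparative = tokens.length > 0 ? score / tokens.length : 0;
    return { score, comparative, tokens };
}

const unigramResult = scoreUnigrams('This stuff is fucking good', miniLexicon);
console.log(unigramResult.score, unigramResult.comparative); // -1 -0.2
```

Because the phrase 'fucking good' is never looked up as a unit, an `extras` entry keyed by the whole phrase has nothing to match against in this scheme.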

Would it be possible and a good idea to add support for overridden 2-grams?

@martin-richter-uk

You could possibly add something like this to your code:


import Sentiment from 'sentiment';

const negativePhrases = ['refund', 'drop in revenue'];
const positivePhrases = ['high-end', 'new product'];

export const analyzeSentiment = (text) => {
    const sentiment = new Sentiment();
    const result = sentiment.analyze(text);
    const lowerText = text?.toLowerCase() ?? '';

    // Negative phrases come first, so index < negativePhrases.length means "negative".
    [...negativePhrases, ...positivePhrases].forEach((phrase, index) => {
        const lowerPhrase = phrase.toLowerCase();
        // Add the phrase only if it occurs in the text and was not already scored as a word.
        if (lowerText.includes(lowerPhrase) && result.words.indexOf(lowerPhrase) === -1) {
            result.calculation.push({ [phrase]: index < negativePhrases.length ? -3 : 3 });
        }
    });

    // Note: this overwrites comparative with the mean of the scored entries
    // and leaves result.score unchanged.
    const values = result.calculation.map((obj) => Object.values(obj)[0]);
    result.comparative = average(values);
    return result;
};

// Mean of an array of numbers; returns 0 for a missing or empty array.
export const average = (arr) => {
    if (!arr || arr.length === 0) {
        return 0;
    }
    return arr.reduce((p, c) => p + c, 0) / arr.length;
};
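
One caveat with the substring check above: `includes` also matches inside longer words, so 'new product' would be found in 'renew production'. Below is a minimal sketch of a stricter check. `countPhrase` is a hypothetical helper, not part of the sentiment library; it escapes the phrase, tolerates extra whitespace between its words, and rejects matches that begin or end inside a longer word.

```javascript
// Hypothetical helper: count whole-phrase occurrences, case-insensitively.
function countPhrase(text, phrase) {
    // Escape regex metacharacters so the phrase is matched literally.
    const escaped = phrase.trim().replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // Allow any run of whitespace between the words of the phrase.
    const pattern = escaped.replace(/\s+/g, '\\s+');
    // Lookarounds reject matches that start or end inside a longer word.
    const regex = new RegExp(`(?<![A-Za-z0-9])${pattern}(?![A-Za-z0-9])`, 'gi');
    const matches = text.match(regex);
    return matches ? matches.length : 0;
}

console.log(countPhrase('We will renew production soon', 'new product')); // 0
console.log(countPhrase('A new product and another New  Product', 'new product')); // 2
console.log(countPhrase('A high-end phone', 'high-end')); // 1
```

Returning a count rather than a boolean also makes it possible to weight a phrase by how often it appears, if that is wanted.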

@Siddharth-Latthe-07

@cristiano-belloni The issue you're encountering arises because the sentiment library's default tokenizer does not recognize multi-word expressions (like "made up" or "fucking good") out of the box. The library processes the text word by word, so multi-word phrases in the extras dictionary aren't being matched correctly.
To handle multi-word expressions, you need to preprocess the text to identify and replace multi-word phrases with a single token before passing it to the sentiment analyzer.

Possible solution:

  1. Preprocessing Text for Multi-word Expressions
    1.a Preprocess the Text:
    Replace multi-word phrases with single tokens before analyzing the sentiment.
    1.b Analyze Sentiment:
    Pass the preprocessed text to the sentiment analyzer.

Sample JavaScript snippet that might help:

const Sentiment = require('sentiment');
const sentiment = new Sentiment();

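function preprocessText(text, multiWordPhrases) {
    // Replace multi-word phrases with single tokens.
    // Assumption: the tokenizer keeps '_' inside a token. If it strips or
    // splits on '_', choose a joiner that it preserves.
    for (const phrase of Object.keys(multiWordPhrases)) {
        const token = phrase.replace(/\s+/g, '_');
        // Escape regex metacharacters so the phrase is matched literally
        const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
        const regex = new RegExp(escaped, 'gi');
        // Use a function so '$' in the token is not treated as a replacement pattern
        text = text.replace(regex, () => token);
    }
    return text;
}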

function analyzeSentiment(text, extras) {
    const multiWordPhrases = extras;
    const preprocessedText = preprocessText(text, multiWordPhrases);

    // Adjust the extras object to match the preprocessed tokens
    const adjustedExtras = {};
    for (let phrase in multiWordPhrases) {
        const token = phrase.replace(/\s+/g, '_');
        adjustedExtras[token] = multiWordPhrases[phrase];
    }

    // Analyze the sentiment of the preprocessed text
    return sentiment.analyze(preprocessedText, { extras: adjustedExtras });
}

// Example usage
const text1 = 'This stuff is made up';
const extras1 = { 'made up': -1 };
console.log(analyzeSentiment(text1, extras1));

const text2 = 'This stuff is fucking good';
const extras2 = { 'fucking good': 3 };
console.log(analyzeSentiment(text2, extras2));
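
An alternative to rewriting the input text is to look up 2-grams directly and fall back to single words only when no phrase matches. The following is a minimal, dependency-free sketch of that idea and does not use the sentiment library's API. `scoreWithBigrams`, `wordScores`, and `phraseScores` are hypothetical names, and the word scores are assumptions chosen to match the outputs in the original report.

```javascript
// Hypothetical word and phrase lexicons; the values are illustrative assumptions.
const wordScores = { good: 3, fucking: -4 };
const phraseScores = { 'fucking good': 3, 'made up': -1 };

function scoreWithBigrams(text, words, phrases) {
    // Simple tokenizer: lowercase, keep letters, digits, apostrophes, and hyphens.
    const tokens = text.toLowerCase().match(/[a-z0-9'-]+/g) || [];
    const matched = [];
    let score = 0;
    let i = 0;
    while (i < tokens.length) {
        const bigram = i + 1 < tokens.length ? `${tokens[i]} ${tokens[i + 1]}` : null;
        if (bigram !== null && Object.prototype.hasOwnProperty.call(phrases, bigram)) {
            // A phrase match consumes both tokens, so neither is scored again on its own.
            score += phrases[bigram];
            matched.push(bigram);
            i += 2;
        } else {
            if (Object.prototype.hasOwnProperty.call(words, tokens[i])) {
                score += words[tokens[i]];
                matched.push(tokens[i]);
            }
            i += 1;
        }
    }
    const comparative = tokens.length > 0 ? score / tokens.length : 0;
    return { score, comparative, tokens, words: matched };
}

console.log(scoreWithBigrams('This stuff is fucking good', wordScores, phraseScores).score); // 3
console.log(scoreWithBigrams('This stuff is made up', wordScores, phraseScores).score); // -1
```

Matching is greedy from left to right, so when two candidate phrases overlap, the earlier one wins and the shared token is not reused.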

Hope this helps.
Thanks
