
Add parentheses to score expression #13

Merged 1 commit into quesurifn:v1.0.0-alpha on Oct 28, 2024

Conversation

bunny-therapist

Closes #10
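
For reference, this is the expression the PR touches. Below is a minimal sketch of the keyword-level score as given in the YAKE paper, assuming that formula is what yake-rust implements; the names keyword_score, term_scores, and keyword_tf are illustrative, not the crate's actual identifiers:

    // Keyword score from the YAKE paper (lower is better):
    //   S(kw) = prod(S(w)) / (TF(kw) * (1 + sum(S(w))))
    fn keyword_score(term_scores: &[f64], keyword_tf: f64) -> f64 {
        let prod: f64 = term_scores.iter().product();
        let sum: f64 = term_scores.iter().sum();
        // Without explicit parentheses, `prod / keyword_tf * (1.0 + sum)`
        // parses as `(prod / keyword_tf) * (1.0 + sum)`, i.e. it multiplies
        // by (1 + sum) instead of dividing by it -- a likely reading of
        // the bug this PR fixes.
        prod / (keyword_tf * (1.0 + sum))
    }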

quesurifn (Owner) commented Oct 28, 2024

@bunny-therapist Do you mind checking against the sample on their website?

http://yake.inesctec.pt/demo.html?doc=Sample1

It looks like the parentheses cause a decrease in accuracy.

quesurifn (Owner) commented Oct 28, 2024

@bunny-therapist Do you have a Discord where we can discuss this? I'm looking at this. The scores look better with this version, but the actual results seem slightly worse. There seems to be an increase in results containing adjectives, and I don't think that's what we want.

I'm thinking this could be a scenario where the fix you've posted is uncovering a deficiency somewhere else, but I'm not sure.

bunny-therapist (Author)

I am running it against their sample, but I am not getting the same results with either this PR or the pre-existing yake-rust code.

I am trying to create Python bindings for yake-rust so we can replace LIAAD/yake in our projects. For this reason, I am running tests comparing the yake-rust results to LIAAD/yake; that is how I am finding these issues.

Even with this PR, I am not getting the same results as LIAAD/yake or the homepage. However, I believe that is because there are more issues here. I think this PR fixes one issue, but there still seem to be issues related to relatedness and frequency (when I compare LIAAD/yake and yake-rust, the discrepancies appear to come from those two components).
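
For context, the per-term weight in the YAKE paper combines exactly those components. A sketch under the assumption that yake-rust follows the paper; the struct and field names below are made up for illustration and are not yake-rust's actual types:

    // Per-term weight from the YAKE paper:
    //   S(t) = (T_Rel * T_Position) / (T_Case + T_FNorm/T_Rel + T_Sentence/T_Rel)
    // `relatedness` (T_Rel) and `frequency` (T_FNorm) are the two components
    // suspected above of driving the remaining discrepancies.
    struct TermFeatures {
        case: f64,        // T_Case: casing aspect
        position: f64,    // T_Position: position of first occurrence
        frequency: f64,   // T_FNorm: normalized term frequency
        relatedness: f64, // T_Rel: relatedness to context
        sentences: f64,   // T_Sentence: spread across sentences
    }

    fn term_weight(f: &TermFeatures) -> f64 {
        (f.relatedness * f.position)
            / (f.case + f.frequency / f.relatedness + f.sentences / f.relatedness)
    }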

bunny-therapist (Author)

I am pretty sure this is also a bug: #12

But I think there may be more.

bunny-therapist (Author) commented Oct 28, 2024

Do we expect agreement with the yake homepage? If so, we should just use their scores for the tests in the future.

How exactly do you judge the accuracy? I don't know what to look for.

bunny-therapist (Author)

> @bunny-therapist Do you have a Discord where we can discuss this? I'm looking at this. The scores look better with this version, but the actual results seem slightly worse. There seems to be an increase in results containing adjectives, and I don't think that's what we want.
>
> I'm thinking this could be a scenario where the fix you've posted is uncovering a deficiency somewhere else, but I'm not sure.

No, I do not have a Discord. I have only used other people's Discord servers to discuss their projects; I am not a very experienced Discord user.

bunny-therapist (Author)

If I apply the changes from #12 together with these changes, we get:

        let results: Results = vec![
            ResultItem { raw: "data science".to_owned(), keyword: "data science".to_owned(), score: 0.0599 },
            ResultItem { raw: "Google Cloud Platform".to_owned(), keyword: "google cloud platform".to_owned(), score: 0.0656 },
            ResultItem { raw: "acquiring data science".to_owned(), keyword: "acquiring data science".to_owned(), score: 0.0735 },
            ResultItem { raw: "science community Kaggle".to_owned(), keyword: "science community kaggle".to_owned(), score: 0.0804 },
            ResultItem { raw: "acquiring Kaggle".to_owned(), keyword: "acquiring kaggle".to_owned(), score: 0.0924 },
            ResultItem { raw: "CEO Anthony Goldbloom".to_owned(), keyword: "ceo anthony goldbloom".to_owned(), score: 0.096 },
            ResultItem { raw: "Google Cloud".to_owned(), keyword: "google cloud".to_owned(), score: 0.1085 },
            ResultItem { raw: "Kaggle".to_owned(), keyword: "kaggle".to_owned(), score: 0.1178 },
            ResultItem { raw: "Google".to_owned(), keyword: "google".to_owned(), score: 0.1357 },
            ResultItem { raw: "machine learning".to_owned(), keyword: "machine learning".to_owned(), score: 0.1513 },
        ];
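
A tolerance-based comparison could pin results like the list above against LIAAD/yake's output. A minimal sketch; assert_scores_match and the tolerance parameter are hypothetical helpers, and some tolerance is needed since the reference scores are rounded:

    // Check a ranked (keyword, score) list against reference values,
    // e.g. scores copied from LIAAD/yake or the demo page.
    fn assert_scores_match(actual: &[(String, f64)], expected: &[(&str, f64)], tol: f64) {
        assert_eq!(actual.len(), expected.len(), "result counts differ");
        for ((kw, score), (exp_kw, exp_score)) in actual.iter().zip(expected) {
            assert_eq!(kw.as_str(), *exp_kw, "keyword order differs");
            assert!(
                (score - exp_score).abs() <= tol,
                "{kw}: got {score}, expected {exp_score}"
            );
        }
    }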

quesurifn (Owner) commented Oct 28, 2024

@bunny-therapist I can't, exactly. I'm going based on what I'd expect, so it's not objective by any measure. For me, though, the "science community kaggle" and "acquiring kaggle" results shouldn't be in that top-10 list. Everything else looks okay. The scoring still seems generally off, though it's better than before: we're now in the hundredths place, like the top results on their website, instead of the tenths.

When I originally released this, I remember noticing the issue but thinking the results were close enough.

To answer your question: yes, I think we should aim for their scores.

quesurifn (Owner) commented Oct 28, 2024

I think the play here is to branch off of main, maybe create a "v1.0.0" branch, and work towards 1:1 scores there. That way, if we start going in the wrong direction, we can easily just keep what's in main, because I think it works well enough even if we can't get to 1:1 scoring.

Let me know your thoughts. And yes, sorry for calling you out like that before. Coffee is kicking in now. I promise to be pleasant.

bunny-therapist (Author)

Working toward their scores sounds like a good idea. But we are merging this and looking into the other bug, right? Or do you mean to do that as part of that branch?

quesurifn (Owner) commented Oct 28, 2024

@bunny-therapist I'll create a v1.0.0-alpha branch and release this as that.

quesurifn changed the base branch from master to v1.0.0-alpha on October 28, 2024 at 18:49
bunny-therapist (Author)

Tomorrow I can post more about the other bugs I reported and the scores we get. I have to stop for today since it is getting late here.

quesurifn merged commit b400f1f into quesurifn:v1.0.0-alpha on Oct 28, 2024
quesurifn (Owner)

Thanks for your help. I'll release this today sometime.

Merging this pull request may close the issue: Likely bug in score normalization