Issue with punctuation and context building #14
Comments
@xamgore - just making sure you saw this. Maybe you found a simpler way to solve this?
The Python code is a mess, really no way for me to comprehend it right now 😄 Maybe on the weekend. I've seen that
Yeah, but the vocabulary is apparently different from the candidates. The "buffer/buffer_words" vec we have ends up with elements like "!" and "?", which then affect ctx.0/1 and thus wr and wl, and thus frequency and relatedness.
Hm, right. Can the candidate words have punctuation inside like
Based on the Python code, it skips words that are composed entirely of punctuation. So "abc!?def" would be ok, but "!?" would not.
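To make the rule concrete, here is a minimal sketch of such a check. The name `is_all_punctuation` is hypothetical, and treating "punctuation" as ASCII punctuation (the same 32 characters as Python's `string.punctuation`) is an assumption; the set yake-rust actually uses may differ.

```rust
/// Hypothetical helper, not taken from yake-rust or LIAAD/yake.
/// Returns true when `word` is non-empty and every character in it is
/// ASCII punctuation (assumed here to be the "exclude" set).
fn is_all_punctuation(word: &str) -> bool {
    !word.is_empty() && word.chars().all(|c| c.is_ascii_punctuation())
}
```

With this, "!?" counts as pure punctuation and would be skipped, while "abc!?def" does not, because it also contains letters.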
The created contexts contain punctuation symbols. If a word is just composed of punctuation symbols, it should be skipped and the buffer emptied.
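A rough sketch of the behaviour described above, not the actual yake-rust code: `count_cooccurrences`, `is_all_punctuation` and the `window` parameter are hypothetical names, and "punctuation" is assumed to mean ASCII punctuation. The point is only that a punctuation-only token never enters the buffer and also empties it, so no context is built across it.

```rust
use std::collections::HashMap;

/// Hypothetical helper: true when `word` is non-empty and consists only
/// of ASCII punctuation (an assumed stand-in for the "exclude" chars).
fn is_all_punctuation(word: &str) -> bool {
    !word.is_empty() && word.chars().all(|c| c.is_ascii_punctuation())
}

/// Counts (left word, right word) pairs seen within `window` preceding
/// words. A punctuation-only token is skipped and empties the buffer.
fn count_cooccurrences<'a>(
    words: &[&'a str],
    window: usize,
) -> HashMap<(&'a str, &'a str), usize> {
    let mut counts = HashMap::new();
    let mut buffer: Vec<&'a str> = Vec::new();

    for &word in words {
        if is_all_punctuation(word) {
            // Do not record the token, and break the context at this point.
            buffer.clear();
            continue;
        }
        // Every word still in the buffer is a left neighbour of `word`.
        for &left in &buffer {
            *counts.entry((left, word)).or_insert(0) += 1;
        }
        buffer.push(word);
        // Keep at most `window` words of left context.
        if buffer.len() > window {
            buffer.remove(0);
        }
    }
    counts
}
```

For the tokens `["a", "b", "!", "c", "d"]` with a window of 2, this records only the pairs ("a", "b") and ("c", "d"): "!" never shows up as a neighbour, and "b" and "c" are not linked across it.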
I fixed this in my branch here: bunny-therapist@a9c3a99
The relevant part in LIAAD/yake is here: https://github.com/LIAAD/yake/blob/master/yake/datarepresentation.py#L59
The "exclude" chars in LIAAD/yake are what is called "punctuation" in yake-rust.