Rewrite of spacy_install #243
I changed the installation.rmd. Now all that is left is to adapt the tests.
Unfortunately, it seems that

```r
library(spacyr)
txt <- "This: £ = GBP! 15% not! > 20 percent?"
spacy_tokenize(txt, remove_symbols = TRUE, padding = FALSE)
#> successfully initialized (spaCy Version: 3.6.1, language model: en_core_web_sm)
#> $text1
#>  [1] "This" ":" "=" "GBP" "!" "15" "%"
#>  [8] "not" "!" ">" "20" "percent" "?"
```

Created on 2023-08-31 with reprex v2.0.2

This is due to an upstream change in spaCy, though:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # the model reported in the spacyr output above
doc = nlp("This: £ = GBP! 15% not! > 20 percent?")
for t in doc:
    print(t, t.pos_)  # token text and its coarse-grained POS tag
```

```
This PRON
: PUNCT
£ PROPN
= X
GBP PROPN
! PROPN
15 NUM
% NOUN
not PART
! PUNCT
> X
20 NUM
percent NOUN
? PUNCT
```

To test if something is a symbol, you use |
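To make the upstream change concrete: assuming symbols are identified by the Universal POS tag `SYM`, none of the tokens in the spaCy output above would be caught by such a check, because `=` and `>` come back as `X` ("other"). The sketch below replays the printed (token, tag) pairs in plain Python so it runs without spaCy installed; `drop_symbols()` and `tagged_as()` are hypothetical helpers, not spacyr or spaCy functions.

```python
# (token, POS) pairs transcribed from the spaCy 3.6.1 output above
TAGGED = [
    ("This", "PRON"), (":", "PUNCT"), ("£", "PROPN"), ("=", "X"),
    ("GBP", "PROPN"), ("!", "PROPN"), ("15", "NUM"), ("%", "NOUN"),
    ("not", "PART"), ("!", "PUNCT"), (">", "X"), ("20", "NUM"),
    ("percent", "NOUN"), ("?", "PUNCT"),
]


def drop_symbols(tagged, symbol_tags=frozenset({"SYM"})):
    """Keep tokens whose POS tag is not in symbol_tags (assumed POS-based rule)."""
    return [tok for tok, pos in tagged if pos not in symbol_tags]


def tagged_as(tagged, pos):
    """Return the tokens that received a given POS tag."""
    return [tok for tok, p in tagged if p == pos]


if __name__ == "__main__":
    kept = drop_symbols(TAGGED)
    print(len(kept), "of", len(TAGGED), "tokens kept")  # prints: 14 of 14 tokens kept
    print(tagged_as(TAGGED, "X"))  # prints: ['=', '>']
```

Passing `symbol_tags=frozenset({"SYM", "X"})` would drop `=` and `>`, at the cost of also dropping any other token spaCy could not assign a real POS to.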
There are a few more inconsistencies that lead to failing tests. Where I think these are unrelated to this PR and instead stem from upstream changes, I skip them with |
This is really welcome, @JBGruber. Sorry it took me so long to get to it. I incremented the version to 1.3. It's full of breaking changes, but only to `spacy_install()`. So be it.
Addresses #236. I use some newer functions from `reticulate` that were introduced since `spacyr` was developed. The installation process is now significantly easier (though fewer options are available to users). By default, `spaCy` is installed in a virtual environment managed by `reticulate`, after checking whether a suitable Python binary is available and installing one if not.

I tested it on two Linux machines (Arch and Debian) and on Windows 10 and 11. On the Arch machine I also installed the GPU version. Apple silicon can also be used easily. The installation worked without hiccups (once the C dependencies and `reticulate` were updated).

I assume that most people can now run `spacy_install()` without prior knowledge or any specific setup. But I still want to document how to use a manual install by setting either `SPACY_PYTHON` or `RETICULATE_PYTHON`, which should also serve as a troubleshooting guide. I think this should be a vignette rather than the landing page of https://spacyr.quanteda.io (but that might be a different PR).

Let me know what you think 😁
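For the manual route, one quick sanity check before pointing `SPACY_PYTHON` or `RETICULATE_PYTHON` at an interpreter is to run a short script with that interpreter and confirm that spaCy and a language model are importable from it. This is a minimal sketch using only the Python standard library; `spacy_status()` is a hypothetical helper (not part of spacyr or spaCy), and it assumes the model is installed as an ordinary Python package, with `en_core_web_sm` taken from the reprex above.

```python
import importlib.util
import sys
from importlib.metadata import PackageNotFoundError, version


def spacy_status(model: str = "en_core_web_sm") -> dict:
    """Hypothetical helper: report what this interpreter can import."""
    status = {"python": sys.executable, "spacy": None, "model": None}
    # find_spec returns None when the package cannot be imported here
    if importlib.util.find_spec("spacy") is not None:
        try:
            status["spacy"] = version("spacy")
        except PackageNotFoundError:
            status["spacy"] = "unknown"  # importable, but no distribution metadata
    # spaCy models ship as regular packages, so the same lookup works for them
    if importlib.util.find_spec(model) is not None:
        status["model"] = model
    return status


if __name__ == "__main__":
    for key, value in spacy_status().items():
        print(f"{key}: {value}")
```

Running the file with the interpreter in question prints that interpreter's path, the installed spaCy version (or `None`), and whether the model package was found.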