
Unable to run sumy in Jupyter Notebook #217

Open
azamsharpschool opened this issue Aug 19, 2024 · 2 comments

Comments

@azamsharpschool

I have been trying, without success, to get sumy to work in Jupyter Notebook. It always throws an error when creating the Tokenizer.

Here is my Jupyter Notebook code:

```python
!python -c "import nltk; nltk.download('stopwords')"

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

text = "Your long text here..."
parser = PlaintextParser.from_string(text, Tokenizer("english"))
summarizer = LsaSummarizer()
summary = summarizer(parser.document, 3)  # Summarize to 3 sentences

for sentence in summary:
    print(sentence)
```

When I run this code I get the following error:


```
UnpicklingError                           Traceback (most recent call last)
Cell In[22], line 6
      3 from sumy.summarizers.lsa import LsaSummarizer
      5 text = "Your long text here..."
----> 6 parser = PlaintextParser.from_string(text, Tokenizer("english"))
      7 summarizer = LsaSummarizer()
      8 summary = summarizer(parser.document, 3)  # Summarize to 3 sentences

File ~/Desktop/sample_project/env/lib/python3.10/site-packages/sumy/nlp/tokenizers.py:160, in Tokenizer.__init__(self, language)
    157 self._language = language
    159 tokenizer_language = self.LANGUAGE_ALIASES.get(language, language)
--> 160 self._sentence_tokenizer = self._get_sentence_tokenizer(tokenizer_language)
    161 self._word_tokenizer = self._get_word_tokenizer(tokenizer_language)

File ~/Desktop/sample_project/env/lib/python3.10/site-packages/sumy/nlp/tokenizers.py:172, in Tokenizer._get_sentence_tokenizer(self, language)
    170 try:
    171     path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
--> 172     return nltk.data.load(path)
    173 except (LookupError, zipfile.BadZipfile) as e:
    174     raise LookupError(
    175         "NLTK tokenizers are missing or the language is not supported.\n"
    176         """Download them by following command: python -c "import nltk; nltk.download('punkt')"\n"""
    177         "Original error was:\n" + str(e)
    178     )
```
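The last frame shows that sumy builds the path `tokenizers/punkt/english.pickle` and hands it to `nltk.data.load`. Since the error is an `UnpicklingError` rather than the `LookupError` that sumy catches, the file appears to have been found but not loaded. A quick way to check which punkt resources the kernel can actually see is sketched here; `punkt_resources` is a hypothetical helper, and the `tokenizers/punkt_tab/<language>/` path is an assumption about how newer NLTK releases lay out that data.

```python
import nltk


def punkt_resources(language: str = "english") -> dict:
    """Map each punkt resource path to whether NLTK can find it from this kernel."""
    paths = [
        # Same template sumy's Tokenizer uses in the traceback (tokenizers.py:171)
        "tokenizers/punkt/%s.pickle" % language,
        # Assumed location of the newer punkt_tab data: one directory per language
        "tokenizers/punkt_tab/%s/" % language,
    ]
    found = {}
    for path in paths:
        try:
            nltk.data.find(path)  # raises LookupError when the resource is absent
            found[path] = True
        except LookupError:
            found[path] = False
    return found


print("nltk", nltk.__version__)
for path, ok in punkt_resources().items():
    print(f"{path}: {'found' if ok else 'missing'}")
```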

What can I do to fix this issue?

@devsdenepal

Yep, I'm also getting a module import error when using it in Jupyter Notebook:
[screenshot]

@miso-belica
Owner

Hi, it may be related to issue #216. Maybe try downloading the new punkt_tab data.
