punkt doesn't work as of nltk 3.8.2 #216

tomteecezint · 2024-08-13T10:50:47Z

punkt is loaded from a pickle file, which is not secure (CVE-2024-39705), so you have to use punkt_tab now.
This breaks _get_sentence_tokenizer, which still tries to load the punkt pickle.

To use the Tokenizer class I had to override _get_sentence_tokenizer like this:

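    import zipfile

    from nltk.tokenize.punkt import PunktTokenizer

    # Method override for sumy's Tokenizer class.
    def _get_sentence_tokenizer(self, language):
        """Override that loads punkt_tab via PunktTokenizer instead of the punkt pickle."""
        if language in self.SPECIAL_SENTENCE_TOKENIZERS:
            return self.SPECIAL_SENTENCE_TOKENIZERS[language]
        try:
            # PunktTokenizer reads the punkt_tab data rather than unpickling punkt.
            return PunktTokenizer(language)
        except (LookupError, zipfile.BadZipFile) as e:
            # Chain the original error so the traceback keeps the root cause.
            raise LookupError(
                "NLTK tokenizers are missing or the language is not supported.\n"
                """Download them with the following command: python -c "import nltk; nltk.download('punkt_tab')"\n"""
                "Original error was:\n" + str(e)
            ) from e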
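
If the same code also has to run on older NLTK installs, the lookup inside the override could be guarded. This is only a rough sketch: `load_punkt` is a hypothetical helper name, and it assumes `PunktTokenizer` can be imported only on NLTK releases that ship the punkt_tab format, so a failed import is treated as "old NLTK" and the legacy pickle path is used instead.

```python
def load_punkt(language="english"):
    """Return a Punkt sentence tokenizer for `language` on old or new NLTK.

    Hypothetical helper, not part of sumy or NLTK.
    """
    try:
        # Assumption: this class exists only on NLTK releases with punkt_tab.
        from nltk.tokenize.punkt import PunktTokenizer
    except ImportError:
        # Older NLTK: fall back to the legacy pickled model.
        import nltk

        return nltk.data.load("tokenizers/punkt/{}.pickle".format(language))
    return PunktTokenizer(language)
```

The fallback branch still unpickles punkt, so it is only appropriate where upgrading NLTK is not an option.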

Also change nltk.download('punkt') to nltk.download('punkt_tab').
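
To avoid downloading on every run, the download call can be wrapped in a check for the data first. A minimal sketch, assuming the resource is installed under `tokenizers/punkt_tab`; `ensure_punkt_tab` is a hypothetical helper name, and NLTK is imported inside the function only so that the helper can be defined before NLTK is installed.

```python
def ensure_punkt_tab(quiet=True):
    """Download punkt_tab only if it is missing.

    Returns True if a download was attempted, False if the data was already there.
    Hypothetical helper, not part of sumy or NLTK.
    """
    import nltk

    try:
        # Raises LookupError when the resource is not on any NLTK data path.
        nltk.data.find("tokenizers/punkt_tab")
    except LookupError:
        nltk.download("punkt_tab", quiet=quiet)
        return True
    return False
```

Calling `ensure_punkt_tab()` once before constructing the tokenizer should be enough.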

tomteecezint · 2024-08-14T08:20:03Z

See this thread - nltk/nltk#3293

miso-belica mentioned this issue Nov 30, 2024

Unable to run sumy in Jupyter Notebook #217
