punkt is loaded from a pickle file, which is not secure (CVE-2024-39705), so you have to use punkt_tab now.
This breaks _get_sentence_tokenizer.
To use the Tokenizer class, I had to override _get_sentence_tokenizer like this:
import zipfile

from nltk.tokenize.punkt import PunktTokenizer  # loads punkt_tab data, not the pickled punkt model

def _get_sentence_tokenizer(self, language):
    """We are overriding this as we need to replace punkt with punkt_tab in sumy."""
    if language in self.SPECIAL_SENTENCE_TOKENIZERS:
        return self.SPECIAL_SENTENCE_TOKENIZERS[language]
    try:
        return PunktTokenizer(language)
    except (LookupError, zipfile.BadZipfile) as e:
        raise LookupError(
            "NLTK tokenizers are missing or the language is not supported.\n"
            """Download them by following command: python -c "import nltk; nltk.download('punkt_tab')"\n"""
            "Original error was:\n" + str(e)
        )
Also change nltk.download('punkt') to nltk.download('punkt_tab').
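For the download change, a minimal sketch of fetching punkt_tab only when it is missing might look like the following. The helper name ensure_punkt_tab and its find/download parameters are hypothetical and not part of sumy or NLTK; they default to nltk.data.find and nltk.download, and the resource path assumes the English data is the one needed.

```python
def ensure_punkt_tab(find=None, download=None):
    """Download punkt_tab only if it is not already installed.

    `find` and `download` are hypothetical injection points that default to
    nltk.data.find and nltk.download, so the logic can be tried offline.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # Assumption: checking the English data is enough for this use case.
        find("tokenizers/punkt_tab/english/")
        return False
    except LookupError:
        download("punkt_tab")
        return True
```

Calling ensure_punkt_tab() once before constructing the tokenizer avoids hitting the network on every run; whether that fits depends on how your project installs NLTK data.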