Our system does not need to be trained on a particular set of documents, nor does it depend on dictionaries, an external corpus, the size of the text, language, or domain. Here you can compare keyword extraction across state-of-the-art unsupervised approaches such as TF.IDF, KP-Miner, RAKE, RAKE_nltk, YAKE, TextRank, SingleRank, ExpandRank, TopicRank, TopicalPageRank, PositionRank, and MultipartiteRank, and supervised methods such as KEA and WINGNUS.
Environment Required:
Python 3.6
Installation:
To install all the dependencies, run pip install -r requirements.txt
Additional Requirements:
python -m pip install pytextrank
python -m nltk.downloader stopwords
python -m nltk.downloader universal_tagset
python -m spacy download en_core_web_sm
python -m spacy link en_core_web_sm en
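Before running the models, it can help to confirm that the packages installed above are importable. The following is a minimal sketch, not part of the repository: the `missing_packages` helper and the `REQUIRED` list are illustrative assumptions, and the check does not verify that the NLTK data (stopwords, universal_tagset) has been downloaded.

```python
import importlib.util

# Import names of the packages installed in the steps above.
# This list is an assumption for illustration; adjust it to match requirements.txt.
REQUIRED = ["nltk", "spacy", "pytextrank", "en_core_web_sm"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```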
Usage:
To run the models, set the path and no_of_words variables in main.py: path is the path of the text file (.txt) to load, and no_of_words is the number of keywords required in the result.
Run python3 main.py to run all the models. It returns a dictionary with each model name as a key and that model's keyword list as the value.
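The flow described above (load a text file, run every model, collect the results in one dictionary) can be sketched roughly as follows. This is not the code in main.py: the two extractors are deliberately simple, hypothetical stand-ins for the real models, and the names `MODELS`, `run_all_models`, and `run_from_file` are assumptions made for illustration.

```python
import re
from collections import Counter


def frequency_keywords(text, no_of_words):
    """Hypothetical stand-in model: rank words by how often they occur."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word, _ in Counter(words).most_common(no_of_words)]


def first_seen_keywords(text, no_of_words):
    """Hypothetical stand-in model: rank words by first appearance."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(dict.fromkeys(words))[:no_of_words]


# Registry mapping a model name to its extractor function (illustrative only).
MODELS = {
    "Frequency": frequency_keywords,
    "FirstSeen": first_seen_keywords,
}


def run_all_models(text, no_of_words, models=MODELS):
    """Run every registered model and return {model name: keyword list}."""
    return {name: extract(text, no_of_words) for name, extract in models.items()}


def run_from_file(path, no_of_words):
    """Load the .txt file at `path` and run all models on its contents."""
    with open(path, encoding="utf-8") as handle:
        return run_all_models(handle.read(), no_of_words)
```

Keeping the models in a registry means a new extractor can be added without touching the loop that builds the result dictionary.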
Sample Usage:
path='./sampletext.txt'
no_of_words=30
Result:
{'KPMiner': ['yeah', 'okay', 'guys', 'rev ai', 'rev', 'live captions', 'think', 'company', 'speech', 'things', 'know', 'five years', 'captions', 'inaudible', 'accents', 'says', 'something', 'years', 'engineer', 'ellen', 'pretty', 'wanted', 'live', 'surendra', 'chance', 'people', 'great', 'recording', 'indian', 'months'],
 'TextRank': ['know', 'want', 'going', 'time', 'customer', 'think', 'Google', 'meeting', 'calls', 'custom', 'need', 'things', 'come', 'guys', 'lot', 'company', 'customers', 'got', 'transcription', 'vocabulary', 'people', 'engine', 'seconds', 'hours', 'use', 'speaker', 'tell', 'stream', 'job', 'let', 'model', 'way'],
 'SingleRank': ['own custom vocabulary words', 'custom vocabulary job', 'custom vocabulary id', 'way custom vocabulary', 'custom vocabulary support', 'same custom vocabulary', 'customer account id', 'customer vocabulary', 'custom models', 'customer id', 'rev customer', 'different customers', 'custom voices', 'big enough customer', 'customer support', 'real time streaming', 'medical customer', 'call center thing', 'large customers', 'customer cap effect', 'insignificant comfort customer', 'calls times', 'customer coverage changes', 'customer store', 'custom vocab', 'significant customer', 'customer cabinetry', 'custom terminology', 'referenced customers', 'whichever customer'],
 'TopicRank': ['customers', 'meetings', 'time', 'years', 'second one', 'call', 'speakers', 'third party streaming', 'hours', 'right', 'google', 'company', 'things', 'guys', 'model', 'large enterprises', 'people', 'job id', 'chance', 'way', 'lot', 'words', 'real time streaming', 'name', 'bot platform', 'transcription', 'speech maddix', 'number', 'available', 'military stuff'],
 'TopicalPageRank': ['custom vocabulary id', 'way custom vocabulary', 'custom vocabulary support', 'same custom vocabulary', 'customer account id', 'customer vocabulary', 'custom models', 'customer id', 'different customers', 'rev customer', 'custom voices', 'customer support', 'big enough customer', 'call center thing', 'real time streaming', 'medical customer', 'large customers', 'own custom', 'calls times', 'customer cap effect', 'customer coverage changes', 'customer store', 'significant customer', 'insignificant comfort customer', 'custom vocab', 'custom terminology', 'referenced customers', 'real time transcription', 'whichever customer', 'customers'],
 'PositionRank': ['rev customer', 'custom vocabulary id', 'way custom vocabulary', 'custom vocabulary support', 'same custom vocabulary', 'customer account id', 'custom models', 'customer vocabulary', 'real time streaming', 'customer id', 'different customers', 'third speech company', 'calls times', 'custom voices', 'customer support', 'big enough customer', 'company name', 'rev ai', 'own custom', 'speech engines', 'real time transcription', 'medical customer', 'call center thing', 'large customers', 'customer store', 'custom vocab', 'customer cap effect', 'significant customer', 'insignificant comfort customer', 'customer coverage changes'],
 'MultipartiteRank': ['customers', 'years', 'meetings', 'company', 'time', 'second one', 'google', 'call', 'speakers', 'third party streaming', 'right', 'guys', 'hours', 'large enterprises', 'things', 'people', 'seconds', 'stream', 'closed captions', 'real time streaming', 'rev ai', 'bot platform', 'chorus', 'speech maddix', 'lot', 'job id', 'model', 'available', 'way', 'name'],
 'Kea': ['yeah', 'okay', 'customers', 'guys', 'speakers', 'bot', 'streaming', 'rev', 'think', 'transcription', 'got', 'company', 'speech', 'stuff', 'talk', 'things', 'know', 'google', 'chorus', 'captions', 'inaudible', 'accents', 'says', 'something', 'use cases', 'years', 'asr', 'banking', 'enterprises', 'voice'],
 'WINGNUS': ['guys', 'speakers', 'chorus', 'google', 'bots', 'customers', 'stream', 'transcription', 'use cases', 'rev ai', 'real time streaming', 'years', 'customer vocabulary', 'enterprises', 'hours', 'asr engine', 'company', 'meetings', 'stuff', 'ellen', 'third speech company', 'kore', 'rev live captions', 'machine learning team', 'prem', 'year old startup', 'zoom', 'things', 'chance', 'natural language bots'],
 'RAKE': ['on-prem real time', 'swan soca accounted', 'vast cross section', 'fair job edits', 'physical channel separation', 'security press questionnaires', 'zillion security policies', 'complete end implementations', 'evens defend surendra', 'propose custom vocabulary', 'customer cap effect', 'insignificant comfort customer', 'dramatic supports bit', 'decision making process', 'call deflection classification', 'speech maddix prior', 'hundred concurrent calls', 'send duplicate content', 'language model updates', 'huge call centers', 'chorus notetaker joins', 'biggest deal breaker', 'clients job ids', 'employee employee efficiency', 'makes perfect sense', 'custom vocabulary support', 'hundred thousand hours', 'custom vocabulary job', 'front end team', 'health care clients', 'custom vocabulary id'],
 'RAKE_nltk': ['000 calls times two minutes times two minutes equals well', 'three seconds plus three seconds plus three seconds', 'customer facing real user facing things', 'already supporting 30 plus languages', 'five second call detached within seconds', 'new security guy like last week', 'seen like complete end implementations', 'actually human trips cause transcription', 'hosting four words per domain', 'build virtual assistant named cora', 'speech monthly using speech morphing', 'guys invited three different people', 'texas use case number one', 'transcribe people speaking british english', 'natural language processing machine learning', 'one year end annual payment', 'like pretty much everybody else', 'audio plus human transcription', 'word boosting would cause us', 'make one apa call pass', 'load like four different models', 'things like true casing', 'really vast cross section', 'volume based tiered model', 'evens defend surendra tried', 'six year old startup', 'things like 25 years', 'medical dictionary per se', 'loading every single time', 'language model updates specifically', 'entire knowledge ontology engine'],
 'pytextrank': ['said customer', 'different customers', 'rev customer', 'custom models', 'custom vocabulary', 'customer support', 'customers', 'real time', 'more time', 'customer vocab', 'calls', 'other things', 'api call', 'customer coverage', 'whichever customer', 'huge call centers', 'different models', 'said channel', 'time', 'custom voices', 'other companies', 'other words', 'different speakers', 'delete job id number', 'natural language words', 'things', 'different people', 'custom vocab', 'uk english model', 'real name', 'better things'],
 'YAKE': ['yeah', 'google', 'call', 'custom vocabulary', 'customer', 'time', 'english', 'calls', 'custom', 'meeting', 'vocabulary', 'customers', 'english model', 'hours', 'real time', 'model', 'guys', 'things', 'words', 'back', 'company', 'year', 'speaker', 'job', 'years', 'stream', 'asr', 'thing', 'streaming', 'rev']}
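Because the result is a plain dictionary of keyword lists, the models can be compared with a few lines of standard-library Python. The sketch below is one possible approach, not something the repository provides: `consensus_keywords` and `jaccard` are hypothetical helper names, and the small `results` dictionary is only an excerpt (the first three keywords of three models from the sample output above).

```python
from collections import Counter


def consensus_keywords(results, min_models=2):
    """Return (keyword, vote count) pairs for keywords proposed by at least
    `min_models` models, most-agreed first, ties broken alphabetically."""
    votes = Counter()
    for keywords in results.values():
        # Lower-case and de-duplicate so each model votes once per keyword.
        votes.update({keyword.lower() for keyword in keywords})
    agreed = [(keyword, count) for keyword, count in votes.items() if count >= min_models]
    return sorted(agreed, key=lambda item: (-item[1], item[0]))


def jaccard(keywords_a, keywords_b):
    """Jaccard overlap between two keyword lists (0.0 when both are empty)."""
    set_a = {keyword.lower() for keyword in keywords_a}
    set_b = {keyword.lower() for keyword in keywords_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Excerpt of the sample result: first three keywords of three models.
results = {
    "KPMiner": ["yeah", "okay", "guys"],
    "Kea": ["yeah", "okay", "customers"],
    "YAKE": ["yeah", "google", "call"],
}

print(consensus_keywords(results))                 # [('yeah', 3), ('okay', 2)]
print(jaccard(results["KPMiner"], results["Kea"]))  # 0.5
```

Exact string matching is a simplification: single-word and multi-word models (for example YAKE versus SingleRank) rarely produce identical strings, so a real comparison may need stemming or partial matching.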
akhil-bot/python-keyword-extractor-models