website-trigrams

A Python script that analyzes Wikipedia articles, extracts words related to operators, vehicles, and events, and identifies dates in sentences. It also displays frequently occurring trigrams and the 10 most frequent bigrams in the corpus, using the nltk, bs4, requests, and datetime libraries.

Functionality

The script fetches articles from multiple Wikipedia URLs, preprocesses the data, and creates a corpus. It then identifies and extracts specific words related to operators, vehicles, and events from the corpus using WordNet synsets.
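The fetch and preprocessing steps could look roughly like the sketch below. This is not the code from capstone.py: the function names (`fetch_paragraphs`, `preprocess`, `build_corpus`), the User-Agent string, and the choice to strip bracketed citation markers are all assumptions made for illustration, and the WordNet synset lookup is left out. The third-party imports sit inside `fetch_paragraphs` so that the text helpers can be tried without network access.

```python
import re


def fetch_paragraphs(url):
    """Download one article and return the text of its <p> elements."""
    # Imported here so preprocess() can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    # Hypothetical User-Agent; a descriptive one is polite toward Wikipedia.
    headers = {"User-Agent": "website-trigrams-example/0.1"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [p.get_text() for p in soup.find_all("p")]


def preprocess(text):
    """Lowercase the text, drop citation markers like [1], keep alphabetic tokens."""
    text = re.sub(r"\[\d+\]", "", text)
    return re.findall(r"[a-z]+", text.lower())


def build_corpus(urls):
    """Concatenate the tokens of every article into one flat list."""
    corpus = []
    for url in urls:
        for paragraph in fetch_paragraphs(url):
            corpus.extend(preprocess(paragraph))
    return corpus
```

Calling `build_corpus(urls)` with the script's `urls` list would then yield a single token list that the later steps can work on.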

The script also extracts dates from the sentences and checks for trigrams that occur more than three times in the corpus. Additionally, it displays the top 10 most frequent bigrams found in the text.
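A minimal sketch of the date and n-gram steps follows. It uses only the standard library (`collections.Counter` in place of nltk's n-gram helpers), so it is a stand-in for the technique rather than the script's actual code. The function names, the "Month D, YYYY" date format, and the sample sentence are assumptions; the source does not say which date formats capstone.py recognizes.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed format: "July 4, 1997". Other formats would need more patterns.
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def extract_dates(sentence):
    """Return the dates found in a sentence as datetime.date objects."""
    dates = []
    for match in DATE_PATTERN.findall(sentence):
        try:
            # %B matches English month names under the default C/English locale.
            dates.append(datetime.strptime(match, "%B %d, %Y").date())
        except ValueError:
            # Skip impossible dates such as "February 30, 2020".
            continue
    return dates


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def frequent_trigrams(tokens, threshold=3):
    """Trigrams that occur more than `threshold` times."""
    counts = Counter(ngrams(tokens, 3))
    return {gram: count for gram, count in counts.items() if count > threshold}


def top_bigrams(tokens, k=10):
    """The k most frequent bigrams with their counts."""
    return Counter(ngrams(tokens, 2)).most_common(k)
```

For example, `extract_dates("It landed on July 4, 1997.")` returns a one-element list holding `datetime.date(1997, 7, 4)`.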

Dependencies

The script is written in Python 3.10.11 and uses the following libraries:

- nltk: natural language processing tasks
- bs4: web scraping with BeautifulSoup
- requests: HTTP requests to fetch the Wikipedia articles
- datetime: working with dates (part of the Python standard library, so it does not need to be installed)

Usage

1. Make sure you have Python 3.10.11 installed on your system.
2. Install the required third-party libraries using pip:

       pip install nltk bs4 requests

3. Run the script by executing the following command in the terminal:

       python capstone.py

The script will analyze the Wikipedia articles and display the identified words, dates, trigrams, and the 10 most frequent bigrams. Feel free to modify the URLs in the urls list to analyze different Wikipedia articles.

arjanssuri/website-trigrams

Folders and files

Latest commit

History

Repository files navigation

website-trigrams

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages