Installation

Install the required packages with pip:

  pip install scikit-learn
  pip install nltk
  pip install scipy

Processing File

This program computes a TF-IDF-weighted term-document incidence matrix and writes a text file containing an inverted index.

Program Details

The program first reads the text files stored in the files directory of the project, then preprocesses each document by removing special characters and stopwords and by stemming the remaining words. It then computes the TF-IDF-weighted term-document incidence matrix and stores it in the project root directory as an npz (NumPy zipped) file. Finally, it builds a text file containing the inverted index, which maps each term to the documents that contain it.
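The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in preprocess.py: the sample documents, the `preprocess` helper, the use of scikit-learn's built-in English stopword list, and the temporary output directory are all assumptions made for the example. Note also that `TfidfVectorizer` produces one row per document and one column per term, which may differ from the orientation the project uses.

```python
import re
import tempfile
from pathlib import Path

from nltk.stem import PorterStemmer
from scipy import sparse
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, replace special characters, drop stopwords, stem the rest."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = [t for t in text.split() if t not in ENGLISH_STOP_WORDS]
    return " ".join(stemmer.stem(t) for t in tokens)


# Hypothetical stand-ins for the text files in the files directory.
docs = {
    "a.txt": "Cats are running in the garden!",
    "b.txt": "A cat ran across the garden.",
    "c.txt": "Stock markets fell sharply today.",
}

names = sorted(docs)
cleaned = [preprocess(docs[name]) for name in names]

# Sparse TF-IDF matrix: one row per document, one column per term.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(cleaned)

# Save to an npz file (a temporary directory here, so the sketch
# does not overwrite anything in the project root).
out_path = str(Path(tempfile.mkdtemp()) / "tfidf_matrix.npz")
sparse.save_npz(out_path, tfidf)

# Reload to confirm the round trip.
restored = sparse.load_npz(out_path)
print(tfidf.shape, restored.shape)
```

Stemming before vectorizing means that "cats" and "cat" share one column, which is why the first two sample documents end up with overlapping terms.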

Prerequisites

Python 3
os
re
scipy
nltk
sklearn

Usage

Place the input text files in the files directory at the root of the project before running preprocess.py.

To run the program, open a terminal window, navigate to the project's directory, and run:

python preprocess.py

Output

The program produces an npz file (tfidf_matrix.npz) containing the TF-IDF-weighted term-document incidence matrix and a text file (inverted_index.txt) containing the inverted index.
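To make the inverted index concrete, here is a small sketch using only the standard library. The token lists, the helper names, and the `term: doc1, doc2` line format are assumptions for illustration; the actual layout of the text file written by preprocess.py may differ.

```python
from collections import defaultdict


def build_inverted_index(tokenized_docs):
    """Map each term to the sorted list of documents that contain it."""
    index = defaultdict(set)
    for name, tokens in tokenized_docs.items():
        for token in tokens:
            index[token].add(name)
    return {term: sorted(doc_names) for term, doc_names in sorted(index.items())}


def format_index(index):
    """Render the index as one 'term: doc, doc' line per term (assumed format)."""
    return "\n".join(f"{term}: {', '.join(doc_names)}" for term, doc_names in index.items())


# Hypothetical preprocessed documents (already stemmed, stopwords removed).
tokenized = {
    "a.txt": ["cat", "run", "garden"],
    "b.txt": ["cat", "ran", "garden"],
    "c.txt": ["stock", "market", "fell"],
}

index = build_inverted_index(tokenized)
text = format_index(index)
print(text)
```

With an index like this, a query only needs to be scored against the documents listed under its terms rather than against the whole collection.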

Query File

This program calculates the cosine similarity between a query and the relevant documents and prints the scores to the console or terminal.

Prerequisites

Python 3
re
numpy
scipy
sklearn

Usage

To run the program, open a terminal window, navigate to the project's directory, and run:

python query.py

The program then asks for the path of the query file; enter the path at the prompt.

Output

The program prints the cosine similarity between the query and each relevant document, in decreasing order of similarity.
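The ranking described above could look something like the sketch below. It is an illustration under assumptions, not the contents of query.py: the corpus, the `rank` helper, and the choice to treat documents with a nonzero score as "relevant" are all hypothetical, and the vectorizer is fitted in place rather than loaded from the saved npz file.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical preprocessed corpus; names[i] labels row i of the matrix.
names = ["a.txt", "b.txt", "c.txt"]
corpus = ["cat run garden", "cat ran garden", "stock market fell"]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)


def rank(query):
    """Return (document, score) pairs with nonzero similarity, best first."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    order = np.argsort(-scores)  # indices sorted by decreasing score
    return [(names[i], float(scores[i])) for i in order if scores[i] > 0]


results = rank("cat run")
for name, score in results:
    print(f"{name}\t{score:.4f}")
```

The query must be transformed with the same fitted vocabulary as the documents; otherwise the columns of the query vector and the document matrix would not line up.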
