Text similarity using spaCy

Introduction

This text similarity tool suggests to the user how similar two news articles are. The main challenge is setting accurate limits that separate the different categories of similarity. In the programme I made, similarity results are split into three categories: similar, some similarities and not similar. Which result appears depends on the similarity score output by spaCy.
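
To make the three-way split concrete, here is a minimal sketch of mapping a score to a category. The function name and both cut-off values are hypothetical placeholders chosen for illustration; they are not the limits used in the programme.

```python
def categorise_similarity(score, similar_from=0.9, some_from=0.7):
    """Map a similarity score to one of the three result categories.

    The default cut-offs are hypothetical placeholders, not the limits
    tuned for the original programme.
    """
    if score >= similar_from:
        return "similar"
    if score >= some_from:
        return "some similarities"
    return "not similar"
```

With spaCy and a model installed, the score would typically come from `nlp(text_a).similarity(nlp(text_b))`, where `nlp` is the loaded pipeline.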

The programme has 2 modes: Online and Offline. Online mode lets the user pull news articles from the web to compare, whereas offline mode has the user upload the articles as pdf, docx (Word doc) or txt files.
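
For offline mode, reading the three file types could look roughly like the sketch below. It is an illustration only: `load_article_text` is a hypothetical helper, python-docx and pypdf are assumed libraries that this README does not name, and the PDF branch reads the embedded text layer rather than running OCR.

```python
from pathlib import Path


def load_article_text(path):
    """Return the plain text of a .txt, .docx or .pdf article."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    if suffix == ".docx":
        # Assumption: python-docx is installed (imported lazily).
        import docx
        document = docx.Document(str(path))
        return "\n".join(p.text for p in document.paragraphs)
    if suffix == ".pdf":
        # Assumption: pypdf is installed. This extracts the text layer
        # only; scanned PDFs would still need an OCR step.
        from pypdf import PdfReader
        reader = PdfReader(str(path))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

The third-party imports sit inside their branches so that plain txt files can be read without the optional libraries installed.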

Another feature is the option to use either spaCy's default large model or a spaCy-BERT model.
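
A rough sketch of how the model choice could be wired up is shown below. The helper names are hypothetical. `en_core_web_lg` is the large model linked under Pre-requisites, while `en_trf_bertbaseuncased_lg` is an assumed BERT package name from the spaCy 2.x era of spaCy-transformers; the README does not say which BERT package was used.

```python
MODEL_NAMES = {
    "large": "en_core_web_lg",
    # Assumption: a spaCy 2.x spacy-transformers BERT package.
    "bert": "en_trf_bertbaseuncased_lg",
}


def resolve_model_name(choice):
    """Translate a user-facing choice ("large" or "bert") to a package name."""
    try:
        return MODEL_NAMES[choice.lower()]
    except KeyError:
        raise ValueError(f"Unknown model choice: {choice!r}") from None


def load_pipeline(choice):
    """Load the selected spaCy pipeline (requires the package to be installed)."""
    import spacy  # imported lazily so the name lookup works without spaCy
    return spacy.load(resolve_model_name(choice))
```

Calling `load_pipeline("large")` would return a pipeline whose documents can then be compared with `doc_a.similarity(doc_b)`.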

[Project done in 2020]

Pre-requisites

spaCy with the large English model, en_core_web_lg (https://spacy.io/models/en#en_core_web_lg)
spaCy-transformers, to use BERT
PyQt5
OCR for pdf files

Check requirements.txt if anything is missing.
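
As a quick sanity check before running, something like the sketch below reports which of the listed packages cannot be imported. The module names are the usual import names for spaCy, spaCy-transformers and PyQt5; the OCR package is left out because the README does not name it, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names for the prerequisites listed above (OCR package not named).
REQUIRED_MODULES = ["spacy", "spacy_transformers", "PyQt5"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that are not importable in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```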

Run

I believe you only need to run the .py file (stackedwidgetVersion - spacy+BERT+OCR.py).

Others

I have added my own report (Report.pdf) for further reading.


HeChengHui/Text-similarity-using-spaCy
