Hate speech is a growing concern on online platforms. Identifying and combating it is important for maintaining healthy, respectful discussions. This Hate Speech Detection project classifies text as hate speech or non-hate speech using machine learning algorithms.
The project consists of a Flask backend with a web frontend to make the model accessible and user-friendly.
- Text Classification: Detects whether the input text contains hate speech or not.
- Machine Learning Model: Trained on labeled data for hate speech detection.
- Web Interface: A simple and interactive UI for users to test the model.
- Data Visualization: Graphical representation of predictions and results.
- Responsive Design: Mobile-friendly interface.
- Lightweight and Efficient: Processes text with minimal delay.
Frontend:
- HTML
- CSS
- JavaScript
Backend:
- Flask (Python)
Machine Learning:
- Scikit-learn (Python)
- Natural Language Processing (NLP)
- Pandas, NumPy
Deployment:
- Flask server for backend
- Streamlit for quick deployment (optional)
- Clone or download this repository to your local machine.
- Install all the libraries mentioned in the requirements.txt file with the command pip install -r requirements.txt
- Open your terminal/command prompt in the project directory and start the application by executing the command python app.py
- Go to your browser and type http://127.0.0.1:5000/ in the address bar.
- 🎉 Congrats! That's it!
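For orientation, a Flask backend of this kind could look roughly like the sketch below. It is not the project's actual app.py: the `/predict` route, the JSON field names, and the `classify` placeholder are all assumptions made for illustration, and the placeholder stands in for the trained model so the example runs without a model file.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(text):
    """Placeholder for the trained model: returns 1 (hate speech) or 0.

    Hypothetical stand-in so the sketch runs without a model file.
    A real backend would call the saved model's predict method here.
    """
    return int("hate" in text.lower())


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    # Accept a JSON body such as {"text": "..."}.
    payload = request.get_json(silent=True)
    text = payload.get("text", "") if isinstance(payload, dict) else ""
    if not isinstance(text, str) or not text.strip():
        return jsonify({"error": "text is required"}), 400
    return jsonify({"text": text, "label": classify(text)})


# To serve it locally on the address used in the steps above, call:
#     app.run(host="127.0.0.1", port=5000)
```

Keeping the classifier behind a single function means the web layer does not need to change when the model is retrained or swapped.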
The machine learning model used in this project is based on natural language processing (NLP) techniques. The following steps were used to train the model:
- Data Preprocessing: Cleaning and preparing the dataset (tokenization, removal of stop words, lemmatization and stemming).
- Vectorization: Using CountVectorizer and TF-IDF to convert text to numerical format.
- Model: Various classification models such as Logistic Regression, Naive Bayes, and Support Vector Machine (SVM) were evaluated.
- Evaluation: The best-performing model was selected based on accuracy, precision, recall, and F1-score.
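The steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's training script: the tiny dataset is made up, the `clean` helper is hypothetical, lemmatization and stemming are omitted (they would need an extra NLP library), only TF-IDF is shown, and picking the winner by F1-score is an assumption about how the four metrics were combined.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up examples for illustration only (0 = non-hate speech, 1 = hate speech).
texts = [
    "have a wonderful day everyone",
    "thanks for sharing this helpful article",
    "the match last night was fantastic",
    "looking forward to the weekend trip",
    "great work on the project team",
    "this recipe turned out delicious",
    "people like you are worthless and should disappear",
    "that group is disgusting and does not belong here",
    "go back where you came from you filthy lot",
    "they are vermin and deserve nothing",
    "nobody wants your kind around here",
    "your kind is stupid and worthless",
]
labels = [0] * 6 + [1] * 6


def clean(text):
    """Hypothetical cleaning step: lowercase and keep only letters and spaces."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    "linear_svm": LinearSVC(),
}

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

results = {}
for name, clf in candidates.items():
    # The vectorizer tokenizes the cleaned text and drops English stop words.
    model = make_pipeline(
        TfidfVectorizer(preprocessor=clean, stop_words="english"), clf
    )
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, predicted, average="binary", zero_division=0
    )
    results[name] = {
        "accuracy": float(accuracy_score(y_test, predicted)),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
    }

# Assumption: rank the candidates by F1-score.
best = max(results, key=lambda name: results[name]["f1"])
print(best, results[best])
```

With so few examples the scores are not meaningful; the point is the shape of the comparison loop, which stays the same on a real labeled dataset.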
You can find the model training script in the train_and_save_model.py file.
It handles data preprocessing, training, and saving the trained model.
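How the script stores the model is not stated here, so the following is only one plausible approach: a scikit-learn pipeline serialized with Python's `pickle` module. The file name `hate_speech_model.pkl` and the four training sentences are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training data for illustration (0 = non-hate speech, 1 = hate speech).
texts = [
    "have a lovely day",
    "you people are worthless",
    "thanks for the help",
    "that group is disgusting",
]
labels = [0, 1, 0, 1]

# Bundling the vectorizer and classifier means one file holds everything
# needed to score raw text later.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Hypothetical file name, written to a temporary directory for this demo.
model_path = Path(tempfile.mkdtemp()) / "hate_speech_model.pkl"
with model_path.open("wb") as handle:
    pickle.dump(model, handle)

# Later (for example when the backend starts), load the model and predict.
with model_path.open("rb") as handle:
    restored = pickle.load(handle)

sample = ["thanks, have a lovely day"]
print(restored.predict(sample))
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.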
- id: Unique identifier for the text.
- text: The actual text data.
- label: 0 for non-hate speech, 1 for hate speech.
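A small loader can check that a file follows this layout before training. The sketch below uses only the standard library and assumes the data is stored as CSV with a header row; the sample rows and the `load_rows` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample in the id/text/label layout described above.
SAMPLE_CSV = """id,text,label
1,Have a great day everyone,0
2,An example of an abusive message,1
"""


def load_rows(handle):
    """Read rows from a CSV file object and check that every label is 0 or 1."""
    rows = []
    for row in csv.DictReader(handle):
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r} for id {row['id']}")
        rows.append({"id": row["id"], "text": row["text"], "label": label})
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
texts = [r["text"] for r in rows]
labels = [r["label"] for r in rows]
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` to `load_rows` in place of the `io.StringIO` object.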