Data-Mining

Project 1 - Movie Recommendations

The dataset for this project contains Netflix titles (movies and series).

Part 1

In the first part of the project we explore the dataset and compute descriptive statistics about its content. Some of the statistics are:

  • Number of movies/series.
  • Country with the most content.
  • Year with the most content.
  • The popularity of each genre for every country.
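
As a rough illustration of how the statistics above can be computed, here is a minimal sketch using only the Python standard library. The field names (`type`, `country`, `release_year`, `genres`) and the sample rows are assumptions made for this example and may not match the columns of the actual dataset; in particular, "year" is taken here to mean release year.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; the real dataset's column names may differ.
catalog = [
    {"type": "Movie", "country": "United States", "release_year": 2019, "genres": ["Drama"]},
    {"type": "TV Show", "country": "United States", "release_year": 2020, "genres": ["Comedy"]},
    {"type": "Movie", "country": "India", "release_year": 2019, "genres": ["Drama", "Comedy"]},
    {"type": "Movie", "country": "United States", "release_year": 2019, "genres": ["Drama"]},
]

def summarize(rows):
    """Compute the four statistics listed above for a list of title records."""
    type_counts = Counter(row["type"] for row in rows)
    country_counts = Counter(row["country"] for row in rows)
    year_counts = Counter(row["release_year"] for row in rows)

    # Genre popularity per country: country -> Counter of genre occurrences.
    genre_by_country = defaultdict(Counter)
    for row in rows:
        genre_by_country[row["country"]].update(row["genres"])

    return {
        "type_counts": dict(type_counts),
        "top_country": country_counts.most_common(1)[0][0],
        "top_year": year_counts.most_common(1)[0][0],
        "genre_by_country": {c: dict(g) for c, g in genre_by_country.items()},
    }

stats = summarize(catalog)
```

In a notebook the same counts would more typically come from a DataFrame group-by; the plain-Python version is used here only to keep the sketch dependency-free.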

Part 2

We implement a recommendation system that recommends movies similar to a given movie. To represent each movie, we tried the following two representations:

To compute the similarity between the representations, we used:

PS: if the notebook cannot be opened on GitHub, you can view it via the Jupyter nbviewer:

Project 2 - Fake/True News Classification

Given a dataset of news articles, we train a model that classifies each article as fake or true. We try different ways to represent the text of each article, such as:

  • Bag Of Words
  • TF-IDF
  • Word2Vec
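
To make the first two representations concrete, the following is a minimal sketch of Bag Of Words and TF-IDF in plain Python. It assumes whitespace tokenization and the textbook weighting `count * log(N / df)`; libraries such as scikit-learn use smoothed variants by default, so their numbers will differ. The toy articles and function names are hypothetical, and Word2Vec is not sketched because it requires trained embeddings.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def tf_idf(docs):
    """Weight each count by idf = log(N / df), one common textbook variant."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(vec, idf)] for vec in counts]

# Hypothetical toy articles, not taken from the project's dataset.
articles = ["breaking news shocking claim", "official news report", "shocking shocking claim"]
vocab, bow = bag_of_words(articles)
_, weighted = tf_idf(articles)
```

Terms that appear in fewer documents receive a larger weight, which is why TF-IDF tends to emphasize distinctive words over common ones.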

We also train several models in order to compare their performance. The models that we used are:

  • Logistic Regression
  • Naive Bayes
  • Support Vector Machines (SVM)
  • Random Forest

Finally, we compare the performance of every representation/model combination.
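
The comparison can be organized as a simple grid over representations and models. The sketch below is only an outline of that loop: `compare_combinations` and `placeholder_score` are hypothetical names, and the placeholder scorer stands in for the real train-and-evaluate step, so its values are not results from this project.

```python
from itertools import product

def compare_combinations(representations, models, evaluate):
    """Score every (representation, model) pair and rank them best-first.

    `evaluate` is a hypothetical callable that trains the model on the given
    representation and returns a single score such as test accuracy.
    """
    results = [
        (rep, model, evaluate(rep, model))
        for rep, model in product(representations, models)
    ]
    return sorted(results, key=lambda r: r[2], reverse=True)

representations = ["Bag Of Words", "TF-IDF", "Word2Vec"]
models = ["Logistic Regression", "Naive Bayes", "SVM", "Random Forest"]

# Placeholder scorer so the sketch runs; it is NOT a real metric or result.
def placeholder_score(rep, model):
    return (len(rep) + len(model)) % 7 / 10

ranking = compare_combinations(representations, models, placeholder_score)
```

With three representations and four models, the grid contains twelve combinations, each of which would be trained and scored on the same train/test split for a fair comparison.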