This repository is my playground for machine learning topics. I use it to save the results of my experiments with Python and machine learning algorithms.
I hope you find something useful here. 🙂
First, get an understanding of the data. pandas is a great tool for this.
Example:
import pandas as pd
# Import dataset
data_set = pd.read_csv('my_dataset.csv')
# Have a look at the first few raw entries
print(data_set.head())
# Statistical information about the numeric columns (count, mean, std, min, quartiles, max)
print(data_set.describe())
The next step is to clean the data. This involves removing duplicates and entries with missing values, because both can cause problems when training the model.
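The cleaning step could look roughly like the sketch below. It assumes that dropping rows is acceptable for the dataset, which is not always the case (imputing missing values is a common alternative). The small DataFrame and its column names `feature_a` and `feature_b` are made up for illustration and stand in for the real `my_dataset.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for my_dataset.csv
data_set = pd.DataFrame({
    "feature_a": [1.0, 1.0, 2.0, np.nan, 4.0],
    "feature_b": ["x", "x", "y", "z", None],
})

# Remove exact duplicate rows (the first occurrence is kept)
deduplicated = data_set.drop_duplicates()

# Remove rows that contain at least one missing value
cleaned = deduplicated.dropna().reset_index(drop=True)

print(f"Rows before: {len(data_set)}, rows after: {len(cleaned)}")
```

Checking how many rows were dropped is a cheap sanity check: if most of the data disappears, removing entries is probably the wrong strategy for that dataset.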