This project uses the penguin dataset to predict the species of a penguin from certain features using an appropriate model.
About penguin dataset
It is a great introductory dataset for data exploration and visualization.
The dataset consists of 7 columns.
- species: penguin species (Chinstrap, Adélie, or Gentoo)
- culmen_length_mm: culmen length (mm)
- culmen_depth_mm: culmen depth (mm)
- flipper_length_mm: flipper length (mm)
- body_mass_g: body mass (g)
- island: island name (Dream, Torgersen, or Biscoe) in the Palmer Archipelago (Antarctica)
- sex: penguin sex
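What are culmen length & depth?
The culmen is the upper ridge of a bird's bill. Culmen length is measured along that ridge, and culmen depth is the thickness of the bill from top to bottom.
The description of the dataset and a detailed exploratory analysis are in the predicting_penguins.ipynb file. Please go through it.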
Workflow:-
Data exploration is done through different plots.
Understand the data and prepare the dataset for model fitting (label encoding, normalization).
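The two preparation steps named above can be illustrated with a small library-free sketch. It is only an assumption about what the notebook does: the sample values are made up, the helper names `label_encode` and `min_max_normalize` are hypothetical, and min-max scaling is just one common form of normalization. In practice scikit-learn's `LabelEncoder` and `MinMaxScaler` cover the same ground.

```python
# Toy values standing in for two penguin columns (made up for illustration).
species = ["Adelie", "Gentoo", "Chinstrap", "Adelie"]
flipper_length_mm = [181.0, 217.0, 195.0, 190.0]


def label_encode(values):
    """Map each distinct label to an integer, assigned in sorted order."""
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values], classes


def min_max_normalize(values):
    """Rescale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


encoded_species, classes = label_encode(species)
scaled_flipper = min_max_normalize(flipper_length_mm)

print(classes)          # label order used for the integer codes
print(encoded_species)  # one integer per row
print(scaled_flipper)   # values between 0.0 and 1.0
```

Scaling matters for a distance-based model because a column with large raw values, such as body_mass_g, would otherwise dominate the distance calculation.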
As seen from step 2, KNC (KNeighborsClassifier) has the best parameters for our dataset.
- KNeighborsClassifier is one of the simplest machine learning algorithms and is based on the supervised learning technique.
- The KNC algorithm assumes that a new data point is similar to the available data points and puts it into the category it resembles most.
- The KNC algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be assigned to a well-suited category.
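The idea described in the bullets above (store the data, then classify a new point by its most similar neighbours) can be sketched from scratch as follows. This is a minimal illustration, not the notebook's code: the function `knn_predict`, the choice of Euclidean distance, `k=3`, and the feature values are all assumptions. The project itself uses scikit-learn's `KNeighborsClassifier`.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training point.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    ]
    # Keep the k closest neighbours and count how often each label appears.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up, already normalized features: (culmen length, flipper length).
train_X = [
    (0.10, 0.15), (0.20, 0.10), (0.15, 0.20),  # short culmen, short flipper
    (0.80, 0.90), (0.85, 0.80), (0.90, 0.85),  # long culmen, long flipper
]
train_y = ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"]

prediction = knn_predict(train_X, train_y, query=(0.82, 0.88), k=3)
print(prediction)
```

There is no separate training phase here: all of the work happens at prediction time, which is why the algorithm needs the full training data kept in memory.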