Description: This GitHub repository contains the work submitted for the Foundation of Computer Science examination, the first exam of the Master's Degree in Data Science. The project analyzes the Dogs Adoptions dataset, which consists of three files stored in the 'files' folder: dogs.csv, dogTravel.csv, and NST-EST2021-POP.csv.
Project Tasks:
1. Identifying Non-Adoptable Dogs: Extracting dogs with a status that indicates non-adoptability.
2. Breed Analysis: Determining the count of dogs for each primary breed.
3. Mixed Breed Ratio: Calculating the ratio between Mixed Breed and non-Mixed Breed dogs for each primary breed, with consideration of secondary breeds.
4. Temporal Analysis: Establishing the earliest and latest posted timestamps for each primary breed.
5. Sex Imbalance by State: Computing the sex imbalance (difference between the number of male and female dogs) for each state and identifying the state with the largest imbalance.
6. Duration and Cost Analysis: Determining the average duration and cost of stay for each age and size pair.
7. Highly Traveled Dogs: Identifying dogs involved in at least 3 travels and listing their respective breeds.
8. Data Cleaning: Correcting the travels table so that the state is computed from the manual and found fields, with manual taking priority when available.
9. Travel-to-Population Ratio: Calculating the ratio between the number of travels and the population for each state.
10. Days Since Posted: Computing the number of days from the posted day to the day of last access for each dog.
11. Temporal Partitioning: Partitioning dogs based on the number of weeks from the posted day to the day of last access.
12. Duplicate Detection: Identifying duplicate records in the dogs dataset based on breed, sex, and textual description, with a refined approach that also treats descriptions with 90% word similarity as matching.
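
Tasks 8 and 12 leave some room for interpretation, so a minimal sketch in plain Python may help make them concrete. It is an illustration under stated assumptions, not the repository's implementation: the function names and the record keys `id`, `breed`, `sex`, and `description` are hypothetical, and "word similarity" is read here as Jaccard similarity on lowercase word sets, which is only one possible interpretation.

```python
import re


def resolve_state(manual, found):
    """Task 8 sketch: prefer the manual value, fall back to the found value.

    Empty strings, whitespace-only strings, and None count as missing.
    """
    manual = (manual or "").strip()
    found = (found or "").strip()
    return manual or found or None


def word_set(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9']+", (text or "").lower()))


def word_similarity(a, b):
    """Jaccard similarity of two descriptions (assumed metric)."""
    words_a, words_b = word_set(a), word_set(b)
    if not words_a and not words_b:
        return 1.0  # two empty descriptions are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def find_near_duplicates(dogs, threshold=0.9):
    """Task 12 sketch: pairs of ids with equal breed and sex and similar text.

    `dogs` is an iterable of dicts with the hypothetical keys
    'id', 'breed', 'sex', and 'description'.
    """
    # Group by (breed, sex) so only plausible candidates are compared.
    groups = {}
    for dog in dogs:
        groups.setdefault((dog["breed"], dog["sex"]), []).append(dog)

    pairs = []
    for group in groups.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                score = word_similarity(
                    group[i]["description"], group[j]["description"]
                )
                if score >= threshold:
                    pairs.append((group[i]["id"], group[j]["id"]))
    return pairs
```

Grouping by breed and sex first keeps the pairwise comparison small, since descriptions are only compared within a group. The pairwise loop is still quadratic in the group size, so a larger dataset might need a different strategy.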
Contribution: This project demonstrates the practical application of data science techniques to analyze and derive insights from a real-world dataset related to dog adoptions.