This repository contains Jupyter notebooks for several data science projects built on the PGA Tour dataset I created. Please read the file descriptions below.
- PGAtour.com Web Scraper - Contains code used to scrape the pgatour.com website for PGA tour player statistics from 2007-2017.
- pgatour_raw.db - SQLite database file containing the PGA Tour player data scraped by the PGAtour.com Web Scraper notebook.
- pgatour_raw.csv - CSV file containing the raw data from pgatour_raw.db.
- pgatour_cleaned.csv - CSV file containing a cleaned version of pgatour_raw.csv. The cleaning process is documented in the PGA Tour - EDA notebook in this repository.
- PGA Tour Machine Learning Project - Classification.ipynb - Contains a machine learning project focused on classifying players as tournament winners or non-winners.
- PGA Tour - EDA - Contains exploratory data analysis for the dataset in the pgatour_raw.db database file. This EDA includes data cleaning and formatting, feature investigation, and a thorough analysis of the PGA tour statistics collected over time.
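
For anyone who wants to inspect pgatour_raw.db outside the notebooks, the following is a minimal sketch using only the Python standard library. It is not the code from the notebooks: the function names `list_tables` and `export_table_to_csv` are hypothetical, and because the table names inside the database are not documented here, the sketch discovers them at runtime instead of assuming any.

```python
import csv
import os
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of `table` to `csv_path`; return the number of rows written."""
    # Only accept table names that actually exist in the database.
    if table not in list_tables(db_path):
        raise ValueError(f"table not found: {table!r}")
    quoted = '"' + table.replace('"', '""') + '"'
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cursor.description]
        count = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            for row in cursor:
                writer.writerow(row)
                count += 1
    return count


if __name__ == "__main__":
    db = "pgatour_raw.db"  # assumed to be in the current working directory
    if os.path.exists(db):
        for name in list_tables(db):
            print(name)
```

Once the table names are known, `export_table_to_csv("pgatour_raw.db", <table name>, "out.csv")` writes one table to a CSV file. Whether this reproduces pgatour_raw.csv exactly depends on how that file was originally generated.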
More files will be added to this repository as I continue to develop more project ideas associated with this dataset.