Repository Overview

The first part of this application scrapes NBA statistics from Basketball Reference. We use that data to train a machine learning model that generates time-series predictions of each player's chance of winning the MVP award in a given year.

Data Collection

We use Python to scrape data from Basketball Reference, a website that provides NBA statistics and player data. We collect data from the past 30 years.

Source: https://www.basketball-reference.com/
Datasets: MVP Data, Player Statistics, Team Statistics

Scraping MVP Data

The following block of code scrapes MVP data from the past 30 years (the 1991 to 2021 seasons, since range(1991, 2022) stops before 2022) and saves one HTML file per year. After scraping, we extract the relevant data from each HTML file and write it out as CSV.

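import requests

def Scrape_MVP():
    # range() excludes its end value, so this covers 1991 through 2021
    years = range(1991, 2022)
    url_start = "https://www.basketball-reference.com/awards/awards_{}.html"
    for year in years:
        url = url_start.format(year)
        data = requests.get(url)
        # Save the raw page so it can be parsed later without requesting it again
        with open("mvp/{}.html".format(year), "w+") as f:
            f.write(data.text)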
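
The loop above sends its requests back to back. One way to space them out is sketched below. The helper name fetch_with_delay, its parameters, and the 3-second default are assumptions for illustration and are not part of the repository. The fetch and sleep functions are passed in so the pacing logic can be tried without touching the network.

```python
import time
from typing import Callable, Dict, Iterable


def fetch_with_delay(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in turn, pausing between consecutive requests.

    Hypothetical helper: `fetch` is any function that maps a URL to page text.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Pause before every request except the first one
            sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages
```

With requests installed, a call such as fetch_with_delay(urls, lambda u: requests.get(u).text) would download the pages one at a time with a pause between them. A suitable delay depends on the site's own rate limits.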

Scraping NBA Player Statistics

The pages that contain NBA player statistics include dynamic content, which makes it hard to scrape all of the data we need to train our machine learning model. We address this by using a Selenium Chrome driver to load each page.
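
A minimal sketch of loading a dynamic page with Selenium and saving the rendered HTML is shown here. It assumes Selenium 4 and a chromedriver binary that matches the installed Chrome version. The function names, the scroll step, and the wait time are illustrative assumptions, not the repository's actual code.

```python
import time


def make_chrome_driver(chromedriver_path: str):
    """Start Chrome through Selenium 4 (needs the selenium package and chromedriver)."""
    # Imported here so the rest of the sketch can be read and tried without Selenium
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=chromedriver_path))


def save_rendered_page(driver, url: str, out_path: str, wait_seconds: float = 2.0) -> str:
    """Load `url` in the browser, give scripts time to run, and save the HTML."""
    driver.get(url)
    # Scroll to the bottom so content that loads on scroll has a chance to render
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(wait_seconds)
    html = driver.page_source
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

A call might look like save_rendered_page(driver, url, "player/2021.html"), where url is the player statistics page for that season and driver comes from make_chrome_driver. Call driver.quit() when finished so the browser process is closed.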

Preparing Datasets - CSV Files

After collecting the MVP data from the past 30 years, we iterate over every year in our 'mvp' folder and apply the following operations:

Create an empty list that will store one dataframe per year
Extract the relevant table from each HTML file by specifying its 'id' attribute
Use Pandas to read the HTML table as a dataframe
Concatenate the dataframes and create a CSV file in the "mvp" folder using the .to_csv() operation

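from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

def Parse_MVP(years=range(1991, 2022)):
    dfs = []
    for year in years:
        with open(f"mvp/{year}.html") as f:
            page = f.read()
        soup = BeautifulSoup(page, "html.parser")
        # Drop the extra header row that sits above the real column names
        soup.find('tr', class_="over_header").decompose()
        mvp_table = soup.find(id="mvp")
        # Wrap the HTML in StringIO; newer pandas versions deprecate literal strings
        mvp_df = pd.read_html(StringIO(str(mvp_table)))[0]
        mvp_df["Year"] = year
        dfs.append(mvp_df)
    # Combine every year into one dataframe and write it out
    mvps = pd.concat(dfs)
    mvps.to_csv("mvp/mvps.csv")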

Avoid overloading the website by sending too many requests.
Google Chrome Version: 102.0.5005.61
Selenium Chrome Driver: https://chromedriver.storage.googleapis.com/index.html?path=102.0.5005.61/

Run in the command line (macOS) to remove the quarantine attribute from the downloaded driver:

xattr -d com.apple.quarantine /Users/danieldayto/Downloads/chromedriver

Machine Learning

Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms.

Ridge Regression

Ridge regression is linear regression with L2 regularization. It modifies the loss function by adding a penalty proportional to the sum of the squared coefficients. The alpha parameter sets the strength of that penalty, and larger values shrink the coefficients further toward zero.
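
To make the penalty concrete, here is a small NumPy sketch of the closed-form ridge solution. It is an illustration under simplifying assumptions: it omits the intercept that scikit-learn's Ridge fits by default, and the toy numbers are made up.

```python
import numpy as np


def ridge_coefficients(X: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Minimise ||Xw - y||^2 + alpha * ||w||^2 (no intercept term)."""
    n_features = X.shape[1]
    # Closed-form solution: w = (X^T X + alpha * I)^-1 X^T y
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Made-up toy data with two nearly collinear features
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]])
y = np.array([1.0, 2.0, 3.0, 4.0])

w_small = ridge_coefficients(X, y, alpha=0.1)
w_large = ridge_coefficients(X, y, alpha=10.0)

# A larger alpha gives a coefficient vector with a smaller norm
print(np.linalg.norm(w_small), np.linalg.norm(w_large))
```

The only difference between the two fits is alpha, so comparing the two norms shows the shrinkage effect directly.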

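import pandas as pd
from sklearn.linear_model import Ridge

# `stats` (a dataframe) and `predictors` (a list of feature column names)
# are assumed to be defined earlier
# Train on every season before 2021 and test on the 2021 season
train = stats[stats["Year"] < 2021]
test = stats[stats["Year"] == 2021]
reg = Ridge(alpha=.1)
reg.fit(train[predictors], train["Share"])
predictions = reg.predict(test[predictors])
predictions = pd.DataFrame(predictions, columns=["Predictions"], index=test.index)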
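
The snippet above relies on a stats dataframe and a predictors list defined elsewhere. The self-contained sketch below runs the same split-by-year flow on a tiny made-up table, then places the actual and predicted vote share side by side. The rows, the "Player", "PTS" and "AST" columns, and the choice of predictors are assumptions for illustration and are not real statistics.

```python
import pandas as pd
from sklearn.linear_model import Ridge

# Made-up rows for illustration only; these are not real player statistics
stats = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E", "F"],
    "Year": [2019, 2019, 2020, 2020, 2021, 2021],
    "PTS": [30.0, 18.0, 28.0, 15.0, 29.0, 16.0],
    "AST": [8.0, 3.0, 7.0, 2.0, 9.0, 4.0],
    "Share": [0.9, 0.1, 0.8, 0.0, 0.85, 0.05],
})
predictors = ["PTS", "AST"]  # hypothetical feature columns

# Same split as above: earlier seasons for training, 2021 for testing
train = stats[stats["Year"] < 2021]
test = stats[stats["Year"] == 2021]

reg = Ridge(alpha=.1)
reg.fit(train[predictors], train["Share"])
predictions = pd.DataFrame(
    reg.predict(test[predictors]), columns=["Predictions"], index=test.index
)

# Put actual and predicted vote share side by side, highest prediction first
comparison = pd.concat([test[["Player", "Share"]], predictions], axis=1)
comparison = comparison.sort_values("Predictions", ascending=False)
print(comparison)
```

Reusing test.index for the predictions keeps each predicted value aligned with its player when the two frames are concatenated.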

ddayto21/NBA-Time-Series-Forecasts
