05 Dec 04:01

oseymour

9c11fab

v3.2.0 Latest

FBref

Added Saudi Pro League
Added logic to handle a case where some match pages have the date in a different element
Fixed an issue where some abandoned or forfeited matches have an "*" next to a team's score. Scores are now parsed as strings, not ints, to accommodate this.
Changed some of the logic in scrape_stats() to better handle Big 5 Leagues competition vs. not
Added a warning that prints if player stats tables don't load in time (usually because they're not present for a certain stat in a given year and league)
Added some tests
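
Since scores now come back as strings, downstream code that needs a number has to handle the optional "*" itself. A minimal sketch of one way to do that follows. The helper name split_score() is hypothetical and not part of ScraperFC, and the exact placement of the "*" (directly after the digits) is an assumption.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of ScraperFC.
# Assumes a score string is digits optionally followed by a single "*".
_SCORE_RE = re.compile(r"^\s*(\d+)\s*(\*?)\s*$")


def split_score(raw: str) -> Tuple[Optional[int], bool]:
    """Return (goals, flagged); flagged is True when an "*" follows the score."""
    match = _SCORE_RE.match(raw)
    if match is None:
        # Not a recognizable score (e.g. empty string), so report no value.
        return None, False
    return int(match.group(1)), match.group(2) == "*"
```

Keeping the flag separate from the number means the abandoned/forfeited marker is not silently lost when converting to int.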

Oddsportal

Removed this module's file (it was never imported)

Sofascore

Added Saudi Pro League
Added a scrape_match_shots() function
Changed some warning text if requests don't get status code 200
Added a common function to check match URL/IDs and then convert them to ID
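
The common check-and-convert function could look roughly like the sketch below. This is not the library's implementation: the name to_match_id() is hypothetical, and the assumption that a match URL carries its ID after "#id:" should be checked against real Sofascore URLs. It follows the convention from the v3.0.0 notes that URLs are strings and IDs are ints.

```python
import re
from typing import Union


def to_match_id(match: Union[str, int]) -> int:
    """Hypothetical helper: accept a match URL (str) or match ID (int), return the ID."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(match, bool):
        raise TypeError("match must be a URL string or an integer ID")
    if isinstance(match, int):
        return match
    if isinstance(match, str):
        # Assumption: the ID appears at the end of the URL as "#id:<digits>".
        found = re.search(r"#id:(\d+)", match)
        if found is None:
            raise ValueError(f"no match ID found in URL: {match!r}")
        return int(found.group(1))
    raise TypeError("match must be a URL string or an integer ID")
```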

Transfermarkt

Added code to always close cloudscrapers
Added tests
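
"Always close" usually means the close call runs even when a request raises. A sketch of that pattern using contextlib.closing is below. FakeScraper is a stand-in so the example runs without network access or third-party packages; it is an assumption that the real scraper object exposes a close() method in the same way.

```python
from contextlib import closing


class FakeScraper:
    """Stand-in for a scraper session; only mimics get() and close()."""

    def __init__(self) -> None:
        self.closed = False

    def get(self, url: str) -> str:
        # Simulate a request that fails partway through scraping.
        raise RuntimeError(f"could not fetch {url}")

    def close(self) -> None:
        self.closed = True


def fetch(scraper, url: str):
    # closing() calls scraper.close() on exit, whether or not get() raises.
    with closing(scraper):
        return scraper.get(url)
```

A try/finally block achieves the same thing; closing() just keeps the cleanup next to the place where the object is used.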

Docs

Added a code examples page (replaces the examples.ipynb notebook that was here)

CI/CD

Made some changes to the tox test envs
Changed how docs build to always build every file, even if it hasn't been changed
Added a parallel tox test env. This only works locally, unfortunately. It errored out on GitHub actions.

13 Nov 19:16

oseymour

v3.1.2

c01af74

v3.1.2

Fixed issue #46 (Capology timeout exception when looking for element)
Capology now scrapes clean column names for the current season.
- Previously, the column names for the current season also included any options from the dropdown menus and help hover icons in the column.
Added numpy docstring validation when building Sphinx docs

16 Sep 02:45

oseymour

v3.1.1

b70dd90

v3.1.1

Added a get_match_links() function to Transfermarkt that returns all match links for a given year and league
Updated the output of scrape_player_match_stats() in the Sofascore module to also return team name and team ID columns for each player.

02 Jul 00:22

oseymour

v3.1.0

c1260e4

v3.1.0

Removed the FiveThirtyEight module.
- See https://scraperfc.readthedocs.io/en/latest/fivethirtyeight.html for a really simple way to acquire the FiveThirtyEight data.
Added Copa Libertadores as a competition to the Sofascore module.
FBref
- Revamped the scrape_match() function.
- Updated the rate limiting in the FBref module due to FBref changing their bot rate limit speed.
- Also added a new ScraperFC exception class that should be raised when FBref has temporarily flagged your IP due to rate limit infringements.
Added linting and typechecking to Tox and GitHub Actions.
Added some new test cases for the FBref module.
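
The FBref rate-limiting item above combines two ideas: spacing requests out, and raising a dedicated exception when the site flags you anyway. A generic sketch of both is below. Everything here is illustrative: the names Throttle, RateLimitFlagged, and check_status are hypothetical, the interval is whatever the caller passes in, and treating HTTP 429 (Too Many Requests) as the "flagged" signal is an assumption about how the site responds.

```python
import time


class RateLimitFlagged(Exception):
    """Hypothetical exception; stands in for the new ScraperFC exception class."""


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        if self._last is not None:
            remaining = self._min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def check_status(status_code: int) -> None:
    # Assumption: a 429 response means the IP has been temporarily flagged.
    if status_code == 429:
        raise RateLimitFlagged("rate limit exceeded; wait before retrying")
```

Calling wait() before each request keeps the spacing logic in one place, and the injectable clock and sleep make it testable without real delays.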

17 Jun 22:58

oseymour

v3.0.0

5525b59

v3.0.0

Why the change?

This is a big update and it's not backwards compatible; some of you will have to rewrite small parts of your own code. I know this can be frustrating, so I want to explain why I'm making these changes. If you're not interested in the "why", feel free to skip to the Changelog below and see what the changes are!

A lot of the changes are non-codebase changes, things I should have done from Day 1: unit tests, CI pipelines for testing, docs, and builds, etc. Most of you won't care or see these unless you a) look for them or b) contribute code in the future.

The codebase changes fall into a few categories:

Making it easier for me to maintain the code moving forward. The code got pretty messy and hard for me to take care of.
Making the code run faster and more reliably.
Making it easier for community members (you!) to contribute new code.

Changelog

Now the part you've all been waiting for.

Shared functions

Moved the ScraperFC exceptions into their own file.
Got rid of the overly-complicated function to check years and leagues, get_source_comp_info(). This was a function from very early on in ScraperFC. It was poor architecting and was too much of a pain in the a$$ to fix before this. Now, each module has a comps dict in its .py file. Any checks to make sure year and league inputs are valid are done in the module functions.
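
To make the per-module comps dict idea concrete, here is a rough sketch of the pattern. The dict contents, the placeholder URLs, the InvalidLeague exception, and the check_league() helper are all made up for illustration; the real dicts and exception names live in the ScraperFC source.

```python
# Illustrative only: keys, URLs, and names below are placeholders.
comps = {
    "EPL": {"history_url": "https://example.com/epl/history"},
    "La Liga": {"history_url": "https://example.com/la-liga/history"},
}


class InvalidLeague(Exception):
    """Hypothetical exception raised for a league this module does not support."""


def check_league(league: str) -> dict:
    """Validate the league inside the module function and return its comps entry."""
    if not isinstance(league, str):
        raise TypeError("league must be a str")
    if league not in comps:
        valid = ", ".join(sorted(comps))
        raise InvalidLeague(f"{league!r} is not valid; options are: {valid}")
    return comps[league]
```

Keeping the dict next to the functions that use it means adding a league to one module is a one-file change.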

FBref

Updated the capitalization, I finally realized the "r" is lowercase 🤦‍♂️.
FBref.close() has been removed. Only 1 function used the Selenium driver and that function has been updated to open, use, and then close the driver without the user needing to call close().
Added FBref.get_valid_seasons(). This returns the valid seasons for a given competition, scraped directly from the competition's history page on FBref.
The year argument is no longer an int. This is a byproduct of adding get_valid_seasons(). The year is now a str and needs to match the year as it appears on the competition's history page on FBref. This will require a lot of user code changes but makes it far easier to assert the year is valid. See the year parameter page on ReadTheDocs for more details.
FBref.scrape_league_table() now returns all tables from the season's league table page. The first table should be the league table and then any tables after that vary by competition.
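
With year as a string, validation becomes a simple membership test against the seasons scraped from the history page. A sketch of that check is below. check_year() is a hypothetical helper, and the season strings in the usage note are assumed examples; in practice the valid values would come from get_valid_seasons().

```python
from typing import Iterable


def check_year(year: str, valid_seasons: Iterable[str]) -> str:
    """Hypothetical check: year must be a str that exactly matches a valid season."""
    if not isinstance(year, str):
        # Catches old-style int years such as 2024 early, with a clear message.
        raise TypeError(f"year must be a str, got {type(year).__name__}")
    seasons = list(valid_seasons)
    if year not in seasons:
        raise ValueError(f"{year!r} is not a valid season; options are: {seasons}")
    return year
```

For example, check_year("2023-2024", ["2022-2023", "2023-2024"]) passes, while check_year(2024, ...) raises a TypeError instead of failing later in the scrape.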

Understat

No longer need to call Understat.close(). The Understat module doesn't even need Selenium anymore! They embed a lot of the raw data as JSON in JS scripts right in the HTML.
As a result of getting the data in a different format, a lot of the functions have changed functionality or been deprecated in favor of new functions. Please read the ReadTheDocs page for this module.
Added Understat.get_valid_seasons().
The year argument is a string now. Write the year as it appears in the season dropdown on the Understat website. See the year parameter page on ReadTheDocs for more details.
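
For the curious, pulling JSON out of a script tag without Selenium can be sketched as below. This is not the module's actual code: the function name is hypothetical, and the JSON.parse('...') wrapper, the \xNN escaping, and the shotsData variable name in the usage note are assumptions about the page format.

```python
import json
import re


def extract_embedded_json(html: str, var_name: str):
    """Hypothetical helper: find `var <name> = JSON.parse('...')` and decode it."""
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*JSON\.parse\('(.*?)'\)",
        re.DOTALL,
    )
    found = pattern.search(html)
    if found is None:
        raise ValueError(f"no embedded data found for {var_name!r}")
    # Assumption: the payload is ASCII text with \xNN escapes for special
    # characters; encode("ascii") fails loudly if that does not hold.
    decoded = found.group(1).encode("ascii").decode("unicode_escape")
    return json.loads(decoded)
```

Given html = r"<script>var shotsData = JSON.parse('\x7B\x22h\x22\x3A\x5B\x5D\x7D')</script>", calling extract_embedded_json(html, "shotsData") returns a dict with a single key "h" mapped to an empty list.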

Sofascore

I switched from requests to the Botasaurus library. Requests was no longer returning accurate data but using Botasaurus fixes this.
I renamed a lot of the functions to more closely match the naming convention of the rest of the modules.
Just about the only complaint I ever heard about this module was that it wasn't automated enough; a lot of the functions required a match link as input but there was no way to get all of the match URLs for a given season. So....
- I've added a function to return basic info for all of the matches, Sofascore.get_match_dicts().
- You can use the match IDs in the output of this function as input to a lot of the other functions because they now take match URLs or match IDs as inputs. Match URLs must be strings, match IDs must be ints.

Transfermarkt

Removed Transfermarkt.close(). The Transfermarkt module now uses cloudscraper instead of a Selenium driver.
Added Transfermarkt.get_valid_seasons()
The year argument is a string now. Enter the string as it appears in the competition's season dropdown on the Transfermarkt website. See the year parameter page on ReadTheDocs for more details.

Capology

No longer need to call Capology.close(). Driver will be closed on its own when scraping is done.
Added Capology.get_valid_seasons().
The year argument is a string now. Write the year as it appears in the season dropdown on the Capology website. See the year parameter page on ReadTheDocs for more details.
Removed Capology.scrape_payrolls(). It ended up doing the same thing as Capology.scrape_salaries().

ClubELO

Minor changes to how invalid team names are detected. Shouldn't impact anything.

FiveThirtyEight

No longer need to call FiveThirtyEight.close(). Driver will be closed on its own when scraping is done.

"Behind the Scenes"

Unit tests
- Uses pytest and pytest-cov
- These are in the test folder at the root of the GitHub repository.
- There's a test file for each ScraperFC module.
Python packaging tooling changes
- tox: I've created tox environments for running the unit tests, building the docs, and building the package.
- GitHub Actions:
  - Every push now automatically runs the test suite and does a test build of the docs.
  - Tagged commits will trigger a workflow to build from that commit and upload to PyPI.
I've updated the layout of the documentation on Read the Docs.
I've updated the examples in Examples.ipynb in the GitHub repo to reflect all of the changes introduced in ScraperFC 3.0.

07 Dec 05:36

oseymour

v2.9.2

27d7a42

v2.9.2

Removed the webdriver_manager import in shared_functions.py because it's no longer required and not included in requirements.txt (v2.9.1, technically)
Renamed the Sofascore file, class, and import in __init__.py so they all align on capitalization
Updated FBRef.py for issues found during unit test development

15 Oct 21:04

oseymour

v2.9.0

0231af5

v2.9.0

Added RFPL as a scrape-able league for Understat
Fixed some residual bugs from the transition away from ChromeDriverManager and to the new get_source_comp_info() function
Added Oddsportal module (unstable)
Fixed #32

17 Aug 21:22

oseymour

v2.8.0

5d8d039

v2.8.0

Fixed #26
Removed Service and webdriver-manager from webdriver inits. New Selenium versions handle the driver binary automatically now.
Fixed issue where FBRef squad and opponent stats tables were filled with all NaNs

28 Nov 18:01

oseymour

v2.6.1

13aa917

v2.6.1

Fixed a bug in the FBRef module's scrape_stats() function where player and team IDs were not being parsed correctly

24 Nov 03:42

oseymour

v2.6.0

3f92d9f

v2.6.0

Double release since v2.5.0 was also tagged tonight.

2.5.0 fixed an issue with the matchweek vs. competition stage strings not being robustly handled. Issue #18, specifically.

2.6.0 fixed an issue that was found while testing 2.5.0, where the head coach appears in the player stats table after receiving a card (example). When the player IDs were being collected, the coach was skipped, and this led to a dimension mismatch when adding the IDs column to the player stats dataframes.

Releases: oseymour/ScraperFC
