Exporting NBA Stats for Fantasy

Analytics has taken over the sports industry, and data is crucial to fantasy sports leagues, including fantasy basketball. While most fantasy sports apps provide you with a wide variety of data and analytics, there are limitations to them, like not being able to customize how you view and interact with the data.

That’s why we found a way to pull the data ourselves so that we can analyze it exactly how we need to. We found an API on RapidAPI that provides NBA player stats and metadata, and we used that data to improve our fantasy basketball team. This article goes over the first step in the process: connecting to the API and setting up our data.

API Setup

The API we found is API-NBA, which is hosted on RapidAPI, a marketplace that hosts a wide variety of APIs covering a large range of topics. You can find it by navigating to rapidapi.com/hub, searching for “NBA”, and selecting API-NBA in the search results.

This brings you to a page where you can navigate through the different endpoints, view parameters and sample results for each, and get sample code snippets for the different endpoints. The code snippets come in a variety of different programming languages and frameworks, and in this article we’ll be using the Python requests library.

If you aren’t signed in yet, you’ll notice it says “Sign Up for Key” where there should be an API key. To get your API key, you’ll need to create a free RapidAPI account and select one of the subscription options. The APIs on RapidAPI are all subscription based, but many of them offer free tiers.

While API-NBA does offer a free tier with a limit of 100 requests per day, that limit is easy to hit when pulling stats game by game, so we’d recommend subscribing to either the Pro or Ultra options so that you don’t have to worry about going over.

Once you’re signed in and have selected a subscription option, you’ll see your own personal API key in the headers dictionary in the code snippet. However, since this API key is basically your password for accessing the API, we recommend storing it somewhere more secure than in plain text in your code. One alternative is saving it as an environment variable on your computer.

You can set this up in Windows by searching for “environment” in the Windows search bar, clicking the Environment Variables button, clicking the New button under the User variables section, and then entering NBA_API_KEY in the Variable name field and pasting your API key into the Variable value field.

Python can read that into a variable using the os package and plug it into our headers dictionary, which we’ll use during the API calls to authenticate access.
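As a minimal sketch of that step, the snippet below reads the key with os.environ and builds the headers dictionary. The header names and the host value are assumptions based on the usual RapidAPI snippet format, and build_headers is a hypothetical helper name, so check both against the snippet shown on your own API-NBA page.

```python
import os

# Assumption: host value as it typically appears in the RapidAPI snippet.
API_HOST = "api-nba-v1.p.rapidapi.com"


def build_headers(env_var="NBA_API_KEY"):
    """Read the API key from an environment variable and build the headers."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {
        "X-RapidAPI-Key": api_key,    # assumed header name
        "X-RapidAPI-Host": API_HOST,  # assumed header name
    }
```

Note that a newly created environment variable is usually only visible to programs started after it was saved, so you may need to restart your notebook or terminal.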

call_get_endpoint

One thing you’ll notice from the individual code snippets is that the code is very similar across the different endpoints, with just a few differences in the parameters and URL. We can simplify our code with a function that wraps the shared logic and uses function arguments to specify each endpoint.
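One possible shape for such a function is sketched below. The name call_get_endpoint comes from this article, but the signature, the BASE_URL value, and the timeout default are our assumptions rather than anything the API prescribes.

```python
import requests

# Assumption: base URL as it typically appears in the RapidAPI snippet.
BASE_URL = "https://api-nba-v1.p.rapidapi.com"


def call_get_endpoint(endpoint, headers, params=None, timeout=30):
    """Send a GET request to one endpoint and return the parsed JSON."""
    url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    response = requests.get(
        url, headers=headers, params=params or {}, timeout=timeout
    )
    # Raise an exception on 4xx/5xx instead of silently returning bad data.
    response.raise_for_status()
    return response.json()
```

With this in place, each endpoint call becomes a single line, for example call_get_endpoint("teams", headers), where headers is the dictionary holding your API key.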

With everything set up for the API connection, we’re ready to start calling it and transforming the data.

Dimension Tables

Rather than storing the data in one big flat format, the API structures the data like a relational database with fact and dimension tables. It defines IDs for each game, team, and player, so the data model has Teams, Players, and Games dimension tables linked by those IDs to a fact table of player stats by game.

We’ll start by recreating those dimension tables using the API and saving them as CSV files.

Teams

First, we’ll have a Teams dimension table that lists out each of the NBA teams. We can get this from the teams endpoint, which doesn’t take any parameters and simply returns all of the current NBA teams with some basic info on each.

Our Teams dimension table will include a TEAM_ID attribute to identify each NBA team along with other info like the team name, abbreviation, mascot, and city.
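A rough sketch of that transformation is shown below. It assumes the JSON payload keeps its records under a "response" key and that each team record has "id", "name", "code", "nickname", and "city" fields; those names and the build_teams_table helper are assumptions to verify against the sample results on the API page.

```python
import pandas as pd

TEAM_COLUMNS = ["TEAM_ID", "TEAM_NAME", "ABBREVIATION", "MASCOT", "CITY"]


def build_teams_table(payload):
    """Flatten a teams payload into the Teams dimension table."""
    rows = [
        {
            "TEAM_ID": team.get("id"),
            "TEAM_NAME": team.get("name"),
            "ABBREVIATION": team.get("code"),  # assumed field name
            "MASCOT": team.get("nickname"),    # assumed field name
            "CITY": team.get("city"),
        }
        for team in payload.get("response", [])
    ]
    return pd.DataFrame(rows, columns=TEAM_COLUMNS)
```

The resulting dataframe can then be written out with to_csv, for example build_teams_table(payload).to_csv("teams.csv", index=False), where the file name is our own choice.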

Players

Next, we’ll have a Players dimension table that lists out the individual NBA players. We can get this from the players endpoint, which takes a specific TEAM_ID as a parameter along with the desired season. We’ll use a for loop to iterate through each of the TEAM_ID values from our Teams dimension table and add everything to one big dataframe of all the players.

Our Players dimension table will include a PLAYER_ID attribute to identify each player along with other info like the team they belong to, first and last name, position, and jersey number.
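The loop could look something like the sketch below. Here fetch_players is a hypothetical callable that returns the players payload for one team and season (for instance, a thin wrapper around the API call), and the nested field names such as "leagues", "standard", "pos", and "jersey" are assumptions about the response shape.

```python
import pandas as pd

PLAYER_COLUMNS = [
    "PLAYER_ID", "TEAM_ID", "FIRST_NAME", "LAST_NAME", "POSITION", "JERSEY_NUMBER",
]


def build_players_table(team_ids, season, fetch_players):
    """Loop over every team and stack the players into one dataframe."""
    rows = []
    for team_id in team_ids:
        payload = fetch_players(team_id, season)
        for player in payload.get("response", []):
            # Assumed nesting: position and jersey sit under leagues.standard.
            league = (player.get("leagues") or {}).get("standard") or {}
            rows.append({
                "PLAYER_ID": player.get("id"),
                "TEAM_ID": team_id,
                "FIRST_NAME": player.get("firstname"),
                "LAST_NAME": player.get("lastname"),
                "POSITION": league.get("pos"),
                "JERSEY_NUMBER": league.get("jersey"),
            })
    return pd.DataFrame(rows, columns=PLAYER_COLUMNS)
```

Passing the fetch step in as an argument keeps the loop easy to test without spending API requests. Keep in mind that this approach makes one request per team, which matters on the free tier.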

Games

Lastly, we’ll have a Games dimension table that lists out all of the games on the schedule. We can get this from the games endpoint, and if we only use the current season as a parameter then it provides a list of all of the games for the whole year.
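Flattening the schedule might look like the sketch below. It assumes each game record has an "id", a start timestamp under "date" and "start", and team IDs under "teams", "home", and "visitors"; these field names and the build_games_table helper are assumptions to check against the sample results.

```python
import pandas as pd

GAME_COLUMNS = ["GAME_ID", "GAME_DATE", "HOME_TEAM_ID", "AWAY_TEAM_ID"]


def build_games_table(payload):
    """Flatten a games payload into the base Games dimension table."""
    rows = []
    for game in payload.get("response", []):
        teams = game.get("teams") or {}
        rows.append({
            "GAME_ID": game.get("id"),
            "GAME_DATE": (game.get("date") or {}).get("start"),
            "HOME_TEAM_ID": (teams.get("home") or {}).get("id"),
            "AWAY_TEAM_ID": (teams.get("visitors") or {}).get("id"),
        })
    games = pd.DataFrame(rows, columns=GAME_COLUMNS)
    # Keep only the calendar date as an ISO string (YYYY-MM-DD). The date is
    # taken in UTC here, so a late-night game can land on the next UTC day.
    games["GAME_DATE"] = pd.to_datetime(
        games["GAME_DATE"], utc=True
    ).dt.strftime("%Y-%m-%d")
    return games
```

If your league counts games by local US date, you may want to convert the timestamps to a US time zone before dropping the time portion.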

One attribute that the API doesn’t provide is the fantasy basketball week number that each game corresponds to. Our fantasy basketball league matchups are split up by week, where matchups start on Monday and end the following Sunday. Thus, if we know the date of a particular game, we can calculate the week number by the number of days since the first Monday of the league.

We defined a little function to do that math for us:
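A minimal sketch of such a function is below. The name get_week_number and the example start date are hypothetical; substitute the first Monday of your own league.

```python
from datetime import date


def get_week_number(game_date, first_monday):
    """Return the 1-based fantasy week that contains game_date.

    Weeks run Monday through Sunday, so every block of 7 days after
    first_monday starts a new week.
    """
    days_since_start = (game_date - first_monday).days
    return days_since_start // 7 + 1


# Hypothetical league start: Monday, October 21, 2024.
FIRST_MONDAY = date(2024, 10, 21)
```

With that start date, a game on Sunday, October 27 falls in week 1 and a game on Monday, October 28 falls in week 2. Dates before the first Monday produce week numbers of zero or below, which makes them easy to spot and filter out.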

We can then use that function to add the week number attribute based on the date for each game.

Our Games dimension table will include a GAME_ID attribute to identify each game along with other info like the game date, home and away teams, and that week number that we calculated.

Daily Stats

With all our dimension tables set up, we can put together our actual fact table of player stats by game. There’s a statistics endpoint that takes a specific GAME_ID as a parameter and returns all the game stats broken out by player.

Because of how this is set up, we’ll need to make a separate API call for each individual game, which would be a lot of API calls if we tried to re-pull everything every time we update the data. Instead, we’ll set it up to only pull data for games that we haven’t pulled yet.

We’ll probably only refresh the data once a day (at the most), so we decided to group everything by date and make a separate CSV file for each date.

That way, whenever we update the data, we just need to pull the dates that we don’t have files for yet.

get_daily_stats

To start, we’ll define a function that pulls all stats for a given date and saves them to a CSV.
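One way to sketch this function is shown below. The name get_daily_stats comes from this article, but the signature is our assumption: games is the Games table with GAME_DATE stored as an ISO string, fetch_game_stats is a hypothetical callable that returns the statistics payload for one GAME_ID, and the "player" and "team" nesting is an assumed response shape.

```python
from pathlib import Path

import pandas as pd


def get_daily_stats(stats_date, games, fetch_game_stats, out_dir="daily_stats"):
    """Pull player stats for every game on stats_date and save one CSV.

    Returns the path of the CSV, or None if no stats were returned.
    """
    game_ids = games.loc[games["GAME_DATE"] == stats_date, "GAME_ID"]
    rows = []
    for game_id in game_ids:
        payload = fetch_game_stats(game_id)
        for line in payload.get("response", []):
            row = {
                "GAME_ID": game_id,
                "PLAYER_ID": (line.get("player") or {}).get("id"),
                "TEAM_ID": (line.get("team") or {}).get("id"),
            }
            # Copy the flat stat fields (points, assists, ...) unchanged.
            row.update({
                key: value for key, value in line.items()
                if not isinstance(value, (dict, list))
            })
            rows.append(row)
    if not rows:
        # Write nothing, so the date still counts as missing next time.
        return None
    out_path = Path(out_dir) / f"{stats_date}.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    pd.DataFrame(rows).to_csv(out_path, index=False)
    return out_path
```

Naming each file after its date means the file names double as a record of which dates have already been pulled.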

Pull missing dates

As the season goes on, we want to fill in the missing dates without re-pulling dates with existing files, so we can define some code to get a list of missing dates and pull stats for each with our new get_daily_stats function.
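A possible version of that logic is sketched below. It assumes the daily files are named by ISO date (for example 2024-10-22.csv) and only considers dates strictly before today, so that games still in progress are not pulled early. The find_missing_dates name and the folder name are hypothetical.

```python
from datetime import date
from pathlib import Path


def find_missing_dates(game_dates, out_dir="daily_stats", today=None):
    """Return the past game dates that don't have a daily CSV file yet."""
    cutoff = (today or date.today()).isoformat()
    # File stems are the dates already pulled, e.g. "2024-10-22".
    existing = {path.stem for path in Path(out_dir).glob("*.csv")}
    # ISO date strings sort chronologically, so string comparison is safe.
    return sorted(
        d for d in set(game_dates) if d < cutoff and d not in existing
    )
```

Each date in the returned list can then be passed to get_daily_stats in a simple for loop.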

Putting Everything Together

While we’ve pulled all of the stats we need, they’re scattered across the different daily files, so our last step is to combine everything into one table. This is also a good time to add other attributes like the fantasy points calculation and week number label.

calculate_fantasy_points

The API provides raw NBA stats, so we need to define our own function to calculate fantasy points based on our specific league scoring rules.
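The sketch below shows the general idea. The scoring weights are placeholders rather than our league’s actual rules, and the stat keys such as "totReb" are assumed field names, so replace both with your own league settings and the names in your data.

```python
# Hypothetical scoring weights; replace with your league's rules.
SCORING = {
    "points": 1.0,
    "totReb": 1.2,  # assumed field name for total rebounds
    "assists": 1.5,
    "steals": 3.0,
    "blocks": 3.0,
    "turnovers": -1.0,
}


def calculate_fantasy_points(stat_line, scoring=SCORING):
    """Return the weighted sum of one player's stats for one game.

    Stats that are absent or None count as zero.
    """
    return sum(
        weight * float(stat_line.get(stat) or 0)
        for stat, weight in scoring.items()
    )
```

Keeping the weights in a dictionary means a rule change only requires editing one value rather than rewriting the formula.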

Combine daily files

We’ll put all the daily stats files together with a for loop and apply our new calculate_fantasy_points function to add the fantasy points column. We can also join in a few attributes from the Games dimension such as game date and week number for easier filtering later, and we’ll end by saving it all to our combined stats CSV file.
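A sketch of that final step follows. It assumes the daily files share a GAME_ID column, that the Games table has GAME_ID, GAME_DATE, and WEEK columns, and that points_fn is a function taking one stat line as a dictionary. The column names, the combine_daily_files helper, and the output file name are all our assumptions.

```python
from pathlib import Path

import pandas as pd


def combine_daily_files(daily_dir, games, points_fn, out_path="combined_stats.csv"):
    """Stack the daily CSVs, add fantasy points, and join game attributes."""
    frames = []
    for csv_path in sorted(Path(daily_dir).glob("*.csv")):
        frames.append(pd.read_csv(csv_path))
    if not frames:
        raise FileNotFoundError(f"No daily CSV files found in {daily_dir}")
    stats = pd.concat(frames, ignore_index=True)
    # Score each row by passing it to the scoring function as a dict.
    stats["FANTASY_POINTS"] = stats.apply(
        lambda row: points_fn(row.to_dict()), axis=1
    )
    # A left join keeps every stat line even if a game is missing from Games.
    combined = stats.merge(
        games[["GAME_ID", "GAME_DATE", "WEEK"]], on="GAME_ID", how="left"
    )
    combined.to_csv(out_path, index=False)
    return combined
```

Save the combined file outside the daily folder so that it isn’t picked up as a daily file on the next run.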

Conclusion

With the API-NBA API and a little Python code, we were able to set up our own little relational database of NBA stats. In the following posts, this database will be a foundation for improving our fantasy basketball team through more customized analysis and improved projections.

There’s a lot that we covered, so we’ll provide you with the full Jupyter notebook of Python code for you to download and try out yourself:

Pull NBA Data Download

If you found this guide helpful, feel free to drop a comment below with any questions or feedback. Your input helps us improve future posts and inspires new ideas.

Thanks for joining us today, and we hope to see you next time!