Pulling Data from API-NBA

In this post, we’ll kick off our fantasy basketball optimization project by pulling NBA stats using the API-NBA service, an API we found on RapidAPI that provides historical NBA stats by player and game. We’ll use the raw stats to calculate fantasy points by game, which is the backbone of our fantasy basketball optimization work.

This article outlines how we used Python to pull and re-format the data for our relational database of CSV files. Note that while the final pipeline implements an AWS Lambda function for this, we’ll demo the code in a Jupyter notebook for the scope of this article.

API Setup

To start, you can find the API by navigating to rapidapi.com/hub, searching for “NBA”, and selecting API-NBA in the search results.

This brings you to a page where you can navigate through the different endpoints, view parameters and sample results for each, and get sample code snippets. The snippets come in a variety of programming languages and frameworks; in this article we’ll be using the Python requests library.

If you aren’t signed in yet, you’ll notice it says “Sign Up for Key” where there should be an API key. To get your API key, you’ll need to create a free RapidAPI account and select one of the subscription options. The APIs on RapidAPI are all subscription based, but many of them offer free tiers.

While API-NBA does offer a free tier with a limit of 100 requests per day, that limit is low for this project because the stats pull needs a separate request for each game, so we’d recommend subscribing to either the Pro or Ultra option so that you don’t have to worry about going over.

Once you’re signed in and have selected a subscription option, you’ll see your own personal API key in the headers dictionary in the code snippet. However, since this API key is basically your password for accessing the API, we recommend storing it somewhere more secure than written out in your code. One alternative is saving it as an environment variable on your computer.

You can set this up in Windows by searching for “environment” in the Windows search bar, hitting the Environment Variables button, clicking the New button under the User variables section, and entering NBA_API_KEY in the Variable name field with your API key pasted into the Variable value field.

Python can read that into a variable using the os package and plug it into our headers dictionary, which we’ll use during the API calls to authenticate access.
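As a minimal sketch of that step, the snippet below reads the key from the NBA_API_KEY environment variable and builds the headers dictionary. The host name and header names are assumptions based on the usual RapidAPI snippet format, and build_headers is a hypothetical helper name, so copy the exact values from your own code snippet.

```python
import os

# Assumed host name; check the code snippet on the API page for the exact value.
API_HOST = "api-nba-v1.p.rapidapi.com"


def build_headers(env_var="NBA_API_KEY"):
    """Build the request headers from an API key stored in an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an unauthenticated request
        raise RuntimeError(f"Environment variable {env_var} is not set")
    # Header names are assumed to match the RapidAPI snippet format
    return {"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": API_HOST}
```

Note that a newly created environment variable is usually only visible to programs started afterwards, so you may need to restart Jupyter before Python can see it.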

call_get_endpoint

One thing you’ll notice from the individual code snippets is that the code is very similar across the different endpoints, with just a few differences in the parameters and URL. We can simplify our code with a function that standardizes this consistent code and uses function arguments to specify each endpoint.
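Here is a rough sketch of what such a function could look like. The base URL and the assumption that results sit under a "response" key in the JSON are ours, so compare them against the sample results shown for each endpoint.

```python
import requests

# Assumed base URL; take the real one from the RapidAPI code snippet.
BASE_URL = "https://api-nba-v1.p.rapidapi.com"


def call_get_endpoint(endpoint, headers, params=None, timeout=30):
    """Send a GET request to one endpoint and return its list of records."""
    url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    resp = requests.get(url, headers=headers, params=params or {}, timeout=timeout)
    # Raise an exception on HTTP errors (bad key, rate limit, and so on)
    resp.raise_for_status()
    # Assumption: the payload stores its records under the "response" key
    return resp.json().get("response", [])
```

With this in place, each endpoint becomes a one-line call such as call_get_endpoint("games", headers, {"season": 2023}), where the parameter names are again whatever the endpoint's documentation lists.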

With everything set up for the API connection, we’re ready to start calling it and transforming the data.

Dimension Tables

Rather than storing the data in one big flat format, the API structures the data like a relational database with fact and dimension tables. It defines IDs for each game, team, and player, and the full data model looks something like this:

We’ll start by recreating those dimension tables using the API and saving them as CSV files.

Teams

First, we’ll have a Teams dimension table that lists out each of the NBA teams. We can get this from the teams endpoint, which doesn’t take any parameters and simply pulls all of the current NBA teams with some basic info on each.

Our Teams dimension table will include a TEAM_ID attribute to identify each NBA team along with other info like the team name, abbreviation, mascot, and city.
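The sketch below shows one way to flatten the raw team records into that table and save it as a CSV. The input field names (id, name, code, nickname, city) are assumptions about the response format, and build_teams_rows and save_rows are hypothetical helper names. We use plain lists of dictionaries to keep the example dependency-free; the same rows could be loaded into a dataframe.

```python
import csv


def build_teams_rows(teams):
    """Flatten raw team records into rows for the Teams dimension table."""
    return [
        {
            "TEAM_ID": team["id"],
            "TEAM_NAME": team.get("name"),
            "ABBREVIATION": team.get("code"),    # assumed field name
            "MASCOT": team.get("nickname"),      # assumed field name
            "CITY": team.get("city"),
        }
        for team in teams
    ]


def save_rows(rows, path, fieldnames):
    """Write a list of row dictionaries to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Example record shaped like the assumed API response
sample = [{"id": 1, "name": "Atlanta Hawks", "code": "ATL",
           "nickname": "Hawks", "city": "Atlanta"}]
teams_rows = build_teams_rows(sample)
```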

Players

Next, we’ll have a Players dimension table that lists out the individual NBA players. We can get this from the players endpoint, which takes a specific TEAM_ID as a parameter along with the desired season. We’ll use a for loop to iterate through each of the TEAM_ID values from our Teams dimension table and add everything to one big dataframe of all the players.

Our Players dimension table will include a PLAYER_ID attribute to identify each player along with other info like the team they belong to, first and last name, position, and jersey number.
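A sketch of that loop might look like the following. Here fetch is a hypothetical callable that takes an endpoint name and a parameters dictionary and returns a list of records (for example, a thin wrapper around our endpoint helper). The parameter names (team, season) and the nested field names are assumptions about the API.

```python
def build_players_rows(team_ids, season, fetch):
    """Loop over teams and collect one row per player for the Players table."""
    rows = []
    for team_id in team_ids:
        # One API call per team; parameter names are assumed
        for player in fetch("players", {"team": team_id, "season": season}):
            # Assumption: position and jersey sit under leagues -> standard
            standard = (player.get("leagues") or {}).get("standard") or {}
            rows.append({
                "PLAYER_ID": player["id"],
                "TEAM_ID": team_id,
                "FIRST_NAME": player.get("firstname"),
                "LAST_NAME": player.get("lastname"),
                "POSITION": standard.get("pos"),
                "JERSEY_NUMBER": standard.get("jersey"),
            })
    return rows
```

Passing the fetch function in as an argument keeps the loop easy to try out with made-up records before spending real API requests on it.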

Games

Lastly, we’ll have a Games dimension table that lists out all of the games on the schedule. We can get this from the games endpoint, and if we only use the current season as a parameter then it provides a list of all of the games for the whole year.

One attribute that the API doesn’t provide is the fantasy basketball week number that each game corresponds to. Our fantasy basketball league matchups are split up by week, where matchups start on Monday and end the following Sunday. Thus, if we know the date of a particular game, we can calculate the week number by the number of days since the first Monday of the league.

We defined a little function to do that math for us:
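A minimal version of that calculation could look like this. The FIRST_MONDAY value is a placeholder for illustration, so swap in the first Monday of your own league.

```python
from datetime import date

# Hypothetical league start; replace with the first Monday of your league
FIRST_MONDAY = date(2023, 10, 23)


def get_week_number(game_date, first_monday=FIRST_MONDAY):
    """Return the 1-based fantasy week for a game date (weeks run Monday to Sunday)."""
    days_since_start = (game_date - first_monday).days
    # Days 0-6 fall in week 1, days 7-13 in week 2, and so on
    return days_since_start // 7 + 1
```

For example, a game on Tuesday, October 31 is 8 days after the placeholder start date, so it lands in week 2. Games before the first Monday come out as week 0 or lower, which makes them easy to filter out.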

We can then use that function to add the week number attribute based on the date for each game.

Our Games dimension table will include a GAME_ID attribute to identify each game along with other info like the game date, home and away teams, and the week number that we calculated.
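Putting those pieces together, a sketch of building the Games rows might look like this. The nested field names (date.start, teams.home, teams.visitors) are assumptions about the response format, and build_games_rows is a hypothetical helper name. This version simply keeps the calendar date from the timestamp string, so if the API reports times in UTC you may want to convert to your local time zone first so that late games stay on the right day.

```python
from datetime import date


def build_games_rows(games, first_monday):
    """Flatten raw game records into rows for the Games dimension table."""
    rows = []
    for game in games:
        # Assumption: date.start is an ISO timestamp such as 2023-10-31T23:30:00.000Z
        game_date = date.fromisoformat(game["date"]["start"][:10])
        rows.append({
            "GAME_ID": game["id"],
            "GAME_DATE": game_date.isoformat(),
            "HOME_TEAM_ID": game["teams"]["home"]["id"],
            "AWAY_TEAM_ID": game["teams"]["visitors"]["id"],
            # Same week math as before: whole weeks since the first Monday, plus one
            "WEEK_NUMBER": (game_date - first_monday).days // 7 + 1,
        })
    return rows
```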

Daily Stats

With all our dimension tables set up, we can put together our actual fact table of player stats by game. There’s a statistics endpoint that takes a specific GAME_ID as a parameter and returns all the game stats broken out by player.

Because of how this is set up, we’ll need to make a separate API call for each individual game, which would be a lot of API calls if we tried to re-pull everything every time we update the data. Instead, we’ll set it up to only pull data for games that we haven’t pulled yet.

We’ll probably only refresh the data once a day (at the most), so we decided to group everything by date and make a separate CSV file for each date.

That way, whenever we update the data, we just need to pull the dates that we don’t have files for yet.

get_daily_stats

To start, we’ll define a function that pulls all stats for a given date and saves them to a CSV.
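One possible shape for that function is sketched below. It takes the date as an ISO string, the GAME_ID values scheduled for that date (from the Games table), and a hypothetical fetch callable that wraps the API call. The endpoint path, the game parameter, and the stat field names are assumptions, so adjust them to match the sample results on the API page.

```python
import csv
from pathlib import Path

# Assumed stat field names; extend the list with whatever your scoring needs
STAT_FIELDS = ["points", "totReb", "assists", "steals", "blocks", "turnovers"]
ID_FIELDS = ["GAME_ID", "PLAYER_ID", "TEAM_ID"]


def get_daily_stats(game_date, game_ids, fetch, out_dir="daily_stats"):
    """Pull player stats for every game on one date and save them to one CSV."""
    rows = []
    for game_id in game_ids:
        # One API call per game; endpoint path and parameter name are assumed
        for rec in fetch("players/statistics", {"game": game_id}):
            row = {
                "GAME_ID": game_id,
                "PLAYER_ID": rec["player"]["id"],
                "TEAM_ID": rec["team"]["id"],
            }
            for field in STAT_FIELDS:
                row[field] = rec.get(field)
            rows.append(row)

    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    csv_path = out_path / f"{game_date}.csv"  # one file per date
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=ID_FIELDS + STAT_FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return csv_path
```

Naming each file after its date is what lets us check later which dates have already been pulled.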

Pull missing dates

As the season goes on, we want to fill in the missing dates without re-pulling dates with existing files, so we can define some code to get a list of missing dates and pull stats for each with our new get_daily_stats function.
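Here is a small sketch of the missing-dates check, assuming the daily files are named by ISO date as described above. We also assume that only dates before today should be pulled, so that games still in progress aren't saved with partial stats; get_missing_dates is a hypothetical helper name.

```python
from datetime import date
from pathlib import Path


def get_missing_dates(game_dates, out_dir, today=None):
    """Return sorted past game dates (ISO strings) that have no daily CSV yet."""
    today = today or date.today()
    # File names like 2023-10-24.csv tell us which dates are already pulled
    existing = {path.stem for path in Path(out_dir).glob("*.csv")}
    return sorted(
        d for d in set(game_dates)
        # ISO date strings sort chronologically, so string comparison is safe
        if d < today.isoformat() and d not in existing
    )
```

From there, the update is just a loop over the returned dates that calls get_daily_stats for each one with that date's GAME_ID values.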

Putting Everything Together

While we’ve pulled all of the stats we need, they’re scattered across the different daily files, so our last step is to combine everything. This is also a good time to add other attributes like the fantasy points calculation and week number label.

calculate_fantasy_points

The API provides raw NBA stats, so we need to define our own function to calculate fantasy points based on our specific league scoring rules.
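As an illustration, the sketch below scores a stat line with a weights dictionary. The weights shown are made-up example values, not our actual league rules, and the stat field names are assumptions about the API, so replace both with your own.

```python
# Hypothetical scoring weights for illustration; use your own league settings
SCORING = {
    "points": 1.0,
    "totReb": 1.2,
    "assists": 1.5,
    "steals": 3.0,
    "blocks": 3.0,
    "turnovers": -1.0,
}


def calculate_fantasy_points(stats, scoring=SCORING):
    """Return the weighted sum of a player's stats for one game."""
    total = 0.0
    for stat, weight in scoring.items():
        # Missing or empty values (e.g. a player who did not play) count as zero;
        # float() also handles numbers read back from a CSV as strings
        total += weight * float(stats.get(stat) or 0)
    return total
```

With these example weights, a line of 20 points, 10 rebounds, 5 assists, 1 steal, 2 blocks, and 3 turnovers scores 20 + 12 + 7.5 + 3 + 6 - 3 = 45.5 fantasy points.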

Combine daily files

We’ll put all the daily stats files together with a for loop and apply our new calculate_fantasy_points function to add the fantasy points column. We can also join in a few attributes from the Games dimension such as game date and week number for easier filtering later, and we’ll end by saving it all to our combined stats CSV file.
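A rough sketch of that loop follows. It assumes each daily file has a GAME_ID column and the same set of columns, that games_by_id is a dictionary keyed by GAME_ID as a string (values read from a CSV are strings), and that points_fn is the fantasy points function. The column and function names are illustrative.

```python
import csv
from pathlib import Path


def combine_daily_files(daily_dir, games_by_id, points_fn, out_path):
    """Merge all daily CSVs, add game attributes and fantasy points, and save."""
    combined = []
    for path in sorted(Path(daily_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                game = games_by_id.get(row["GAME_ID"], {})
                # Fall back to the file name if the game is missing from the lookup
                row["GAME_DATE"] = game.get("GAME_DATE", path.stem)
                row["WEEK_NUMBER"] = game.get("WEEK_NUMBER")
                row["FANTASY_POINTS"] = points_fn(row)
                combined.append(row)

    if combined:
        # Keep out_path outside daily_dir so it isn't picked up as a daily file
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(combined[0].keys()))
            writer.writeheader()
            writer.writerows(combined)
    return combined
```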

Conclusion

With API-NBA integrated, we now have the historical fantasy stats that will be the backbone of our fantasy basketball optimization project. In the following posts, we’ll showcase how we pulled it together with some other data sources to create a relational database for our analysis.

Feel free to download the full Jupyter notebook of Python code and try it out yourself:

Pull NBA Data Download

If you found this guide helpful, please drop a comment below with questions and/or feedback. Your input helps us improve future posts and inspire new ideas!

Thanks for joining us today, and check out our next post in the series: Pulling Data from the Sleeper API