Fantasy Basketball Optimization

We used data science and analytics to optimize our fantasy basketball team, eventually leading to a league championship! This series outlines how, including the full stats theory behind it and an automated data pipeline in the cloud.

The flow chart below gives a quick overview of the full data pipeline, with brief notes on the cloud infrastructure:

Problem Statement: The Lock-in Dilemma

In fantasy basketball, players have multiple games per week, but you can only “lock in” the points from one of those games. The hard part is that you have to decide whether to lock in a score before the player’s next game starts, so you’re constantly asking yourself, “Should I take the current points, or hold out for a better score?”

The lock-in concept also throws off our scoring expectations. Most fantasy apps provide projections for individual games, but if we’re good with our lock-ins, the final locked totals come in higher than the single-game projections. We refer to this gap as the “lock-in bias”.

We used data and analytics to address both challenges. First, we modeled a probability distribution for single game totals to estimate probabilities for our lock-in decisions. Then, we applied Monte Carlo simulation to create adjusted projections that account for the lock-in bias.

Project Goals

We summarized the project deliverables into 3 main objectives:

Better insights with tailored exploratory analysis and stats
Better planning through projections that adjust for lock-in bias
Better decisions by associating probabilities with lock-in options

Better Insights

While most fantasy apps provide a variety of different stats, they don’t always summarize the data exactly how we want it.

For instance, we wanted to see how our final locked totals compare to each player’s single game totals, a comparison the Sleeper fantasy app doesn’t provide. To get this, we pulled our stats data into a Power BI report and built our own customized table:

Due to the lock-in bias, our actual totals per player averaged 37.5% higher than the single game totals. This type of tailored analysis gave us new insights that helped us better manage our team.

Better Planning

Most fantasy apps provide player projections for individual games. However, as we learned with the lock-in bias, the actual totals are usually higher than these single game projections. Thus, we need to create new adjusted projections that actually account for this.

We did this with a small Monte Carlo simulation: we simulated fantasy points for every game in a week, estimated which game we would have locked in, and averaged the locked scores across many simulation runs.

With the adjusted projections, we get a much better idea of how many points to actually expect (and thus can better plan accordingly).
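
The simulation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the series: it assumes each game’s score is an independent draw from a normal distribution, and it uses a hypothetical lock rule (lock as soon as the chance of a better score later in the week drops below 50%). The function names, the threshold, and the example mean and standard deviation are made up for illustration.

```python
import random
from statistics import NormalDist, mean


def simulate_locked_score(mu, sigma, n_games, rng, threshold=0.5):
    """Simulate one week and return the score we would have locked in.

    Assumptions (illustrative only): scores are i.i.d. Normal(mu, sigma)
    with sigma > 0, and we lock when the probability of beating the
    current score in the remaining games falls below `threshold`.
    """
    dist = NormalDist(mu, sigma)
    score = mu
    for game in range(1, n_games + 1):
        score = rng.gauss(mu, sigma)  # simulated fantasy points for this game
        games_left = n_games - game
        # Chance that at least one remaining game beats this score.
        p_better = 1.0 - dist.cdf(score) ** games_left
        if p_better < threshold:
            return score
    # With a positive threshold the last game is always locked inside the
    # loop (p_better is 0 there); this return is only a fallback.
    return score


def adjusted_projection(mu, sigma, n_games, n_sims=10_000, seed=0):
    """Average the locked score over many simulated weeks."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return mean(
        simulate_locked_score(mu, sigma, n_games, rng) for _ in range(n_sims)
    )


if __name__ == "__main__":
    # Hypothetical player: 30 points per game on average, standard deviation 8.
    for n_games in (1, 2, 3, 4):
        print(n_games, round(adjusted_projection(30, 8, n_games), 1))
```

With a single game the adjusted projection stays close to the single-game mean, and it rises as the number of games in the week grows, which is the lock-in bias showing up in the simulation.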

Better Decisions

While some lock-in decisions are easy (either the player did really well or really badly in a particular game), not all of them are that simple. You often find yourself in the “lock-in dilemma” of whether to take the current total or wait for a better score.

To balance this, we started by modeling each player’s single game point totals as a normal distribution. Then, for a given point total with a given number of games left, we can approximate the probability of the player scoring better later in the week. If the probability is low, we should lock it in. If it’s high, we wait.
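
As a rough sketch of that calculation: if single game totals follow a normal distribution with mean μ and standard deviation σ, and the remaining games are treated as independent (an assumption added here for simplicity), the probability that at least one of n remaining games beats the current score x is 1 − Φ((x − μ)/σ)^n, where Φ is the standard normal CDF. The function names and the 50% threshold below are hypothetical choices for illustration, not the exact rule from the series.

```python
from statistics import NormalDist


def prob_better_later(current_score, mu, sigma, games_left):
    """Probability that at least one remaining game beats `current_score`.

    Assumes independent Normal(mu, sigma) game totals with sigma > 0.
    """
    if games_left <= 0:
        return 0.0  # no games left, so nothing can beat the current score
    p_one_game_not_better = NormalDist(mu, sigma).cdf(current_score)
    return 1.0 - p_one_game_not_better ** games_left


def should_lock(current_score, mu, sigma, games_left, threshold=0.5):
    """Lock in when the chance of a better score later is below `threshold`."""
    return prob_better_later(current_score, mu, sigma, games_left) < threshold


if __name__ == "__main__":
    # Hypothetical player averaging 30 points (std dev 8) with 2 games left.
    for score in (20, 30, 45):
        p = prob_better_later(score, 30, 8, 2)
        print(score, round(p, 3), "lock" if should_lock(score, 30, 8, 2) else "wait")
```

The threshold is a tuning knob: a lower value makes you more patient, while a higher value locks in sooner.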

With our lock-in decisions backed by probabilities, we can make smarter, data-driven choices – ultimately leading to more points at the end of the day.

AWS Infrastructure Setup

To fully automate our data collection and processing workflows, we implemented a serverless data pipeline using AWS cloud services. This setup allowed us to run the analysis daily (and even send automatic updates via email) without having to manually run it on a local computer every time.

Here’s a quick overview of the AWS cloud services we used to achieve this:

Automated Python workflows with AWS Lambda functions
Deployed code/packages to AWS Lambda using Docker and Amazon Elastic Container Registry (ECR)
Stored and accessed data as CSV files through Amazon S3 buckets
Scheduled daily triggers for Lambda functions using Amazon EventBridge
Securely managed API keys and other secrets using AWS Secrets Manager
Controlled access and permissions across services with AWS IAM
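
To make the Lambda and S3 pieces of the list above a little more concrete, here is a minimal sketch of a handler that reads a CSV file from an S3 bucket. It is not the pipeline’s actual code: the event shape (`bucket` and `key` fields), the helper name `load_csv_rows`, and the optional `s3_client` argument are assumptions for illustration. The only boto3 call used is the S3 client’s `get_object`, and boto3 is imported lazily so the sketch can be exercised locally with a stub client.

```python
import csv
import io
import json


def load_csv_rows(s3_client, bucket, key):
    """Read a CSV object from S3 and return its rows as dictionaries."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def lambda_handler(event, context, s3_client=None):
    """Hypothetical entry point invoked by a scheduled trigger.

    `event` is assumed to carry the bucket name and object key; the real
    pipeline may pass this information differently.
    """
    if s3_client is None:
        import boto3  # imported here so a stub client can be used in tests

        s3_client = boto3.client("s3")

    rows = load_csv_rows(s3_client, event["bucket"], event["key"])
    # Placeholder for the real analysis step (projections, lock-in odds, etc.).
    return {"statusCode": 200, "body": json.dumps({"row_count": len(rows)})}
```

Passing the client in as an argument is a design choice for this sketch only: it keeps the handler testable without AWS credentials.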

Blog Posts

With all the different aspects of the pipeline, we split it out across a series of blog posts (Python code is included in the individual posts):

Pulling Data from API-NBA
Pulling Data from the Sleeper API
Managing the Lock-in Spreadsheet
Exploratory Analysis in Power BI
Addressing the Lock-in Dilemma
Python Automation with AWS Lambda
Connecting Python to SharePoint
Sending Automated Emails with Python
Full Cloud Pipeline Deployment

Note that we’re still working on some of these, so the ones that don’t have links are still in progress. Subscribe below to get updates on when new posts are released.

Whether you’re a fantasy sports fan, an aspiring data analyst, or just someone curious about how data can drive better decisions, each of these posts offers a practical look at applying analytics in the real world. From API integrations to cloud automation and custom visualizations, this series breaks down the full pipeline step by step.

We hope you learn something from the series, and feel free to drop a comment if you do!