Apple Watch Analytics: Scatter Plots with Python and Jupyter Notebooks

Our Apple Watches track numerous health metrics during workouts, including running speed and calories burned. While the Apple Health app lets us view these metrics independently, we’re limited to only what the Health app offers.

What if we wanted to dive deeper into the relationship between them, like seeing how running speed affects calories burned?

Using Python, we can take the next step with this data by analyzing and visualizing it in new ways. In this article, we’ll guide you through this end-to-end process, including:

Exporting the data to your computer
Transforming raw XML format into a friendly CSV format
Creating an interactive scatter plot with a trendline

Ready to dive in? Let’s explore how to analyze your Apple Watch data in Python.

Exporting Data

To start, we’ll assume that you have an outdoor run workout tracked with your Apple Watch. Your watch will have tracked running speed and calories burned every few seconds throughout the workout.

This data is continuously synced to your iPhone while paired, allowing you to see a quick summary of these metrics in the Health app. However, to analyze the data with Python, we need to transfer the raw data to our computer.


We’ll use the Health app to export all of your health data in an XML format. Here’s how:

Open the Health app on your iPhone.
Scroll to the bottom of the screen and tap the “Export All Health Data” button.
When prompted, save the file to “My Files,” which will store it in iCloud Drive (iCloud Drive is a cloud-based file storage service that connects all your devices).

Name the file and hit save. This uploads the exported XML file to your iCloud Drive. Next, access your iCloud Drive files on your computer by visiting iCloud.com, logging in if necessary, navigating to the file, and downloading it to your computer.

Saving the export on the iPhone, and downloading it on the computer

Now that we’ve downloaded the XML file on our computer, we’re ready to start manipulating it with Python!

Full workflow of data moving across the different devices

XML to CSV

The XML format stores data in a hierarchical structure with tags and attributes, but for most data science tasks, we prefer a tabular format like a CSV file.

This is where Python comes in handy. Python can handle various data formats, allowing us to read the XML data, transform it into a tabular format, and save it as a CSV file. We’ll do this using a Jupyter notebook, which we’ll open and edit in Visual Studio Code.

In the XML export, each measurement is represented by a Record tag, with attributes detailing the measurement type, value, units, and date/time.

We need to extract these attributes into columns in our table, with each Record tag representing a row.

In Python, we’ll use the built-in xml package (specifically its xml.etree.ElementTree module) to read the XML file. Then, we’ll use the findall method to collect all Record tags in the file and add them to a Pandas DataFrame.
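Here is a minimal sketch of that step, assuming the standard-library xml.etree.ElementTree module. To keep it runnable on its own, it parses a tiny made-up XML string instead of the real export.xml; the record values and the function name records_to_dataframe are illustrative, not taken from the original code.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Tiny stand-in for export.xml; the values below are made up for illustration.
SAMPLE_XML = """<HealthData locale="en_US">
  <Record type="HKQuantityTypeIdentifierRunningSpeed" unit="mi/hr" value="6.2"
          startDate="2024-05-01 07:00:05 -0400" endDate="2024-05-01 07:00:05 -0400"/>
  <Record type="HKQuantityTypeIdentifierActiveEnergyBurned" unit="Cal" value="0.9"
          startDate="2024-05-01 07:00:00 -0400" endDate="2024-05-01 07:00:05 -0400"/>
</HealthData>"""


def records_to_dataframe(root: ET.Element) -> pd.DataFrame:
    """Turn every <Record> child of the root into one row; its attributes become columns."""
    # Collect the attribute dicts first and build the DataFrame once,
    # which is much faster than appending rows one at a time.
    rows = [dict(record.attrib) for record in root.findall("Record")]
    return pd.DataFrame(rows)


# For the real export, use: root = ET.parse("export.xml").getroot()
root = ET.fromstring(SAMPLE_XML)
records = records_to_dataframe(root)
```

Note that every attribute arrives as a string, so the value column still needs converting to numbers before any math.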

Once we have the data formatted as a Pandas DataFrame, saving it to a CSV file is pretty straightforward.
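As a sketch, a single to_csv call is enough. The small DataFrame and the file name apple_health_records.csv below are hypothetical placeholders for the table built from the Record tags.

```python
import pandas as pd

# Hypothetical two-row table standing in for the DataFrame built from Record tags.
records = pd.DataFrame({
    "type": [
        "HKQuantityTypeIdentifierRunningSpeed",
        "HKQuantityTypeIdentifierActiveEnergyBurned",
    ],
    "value": ["6.2", "0.9"],
})

# index=False keeps pandas' integer row index out of the CSV file.
records.to_csv("apple_health_records.csv", index=False)
```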

Creating the Scatter Plot

Now that we have our data in a tabular format, we’re ready to start analyzing it.

Join Energy Burned and Running Speed Records

Since our CSV data is just a big table of all records, we need to do a few more transformations to prepare it for the scatter plot. First, we’ll split the data into separate tables for energy burned and running speed. Then, we’ll join these tables on the time columns to create a final table with columns for both running speed and energy burned.

It’s important to note the difference in how each record type is calculated. For energy burned, the watch tracks the total calories burned over an interval of a few seconds. For running speed, it tracks an instantaneous snapshot at specific times (note how the start date and end date attributes are always equal in the raw data).

First, we extract the running speed records into their own table.
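One way to do this, sketched under a few assumptions: the CSV columns are named after the XML attributes (type, value, startDate, endDate), the running speed type is HKQuantityTypeIdentifierRunningSpeed, and dates look like "2024-05-01 07:00:05 -0400". The rows and the names speed and speed_mph are made up for illustration.

```python
import pandas as pd

# Hypothetical rows in the shape of the CSV produced earlier (values made up).
records = pd.DataFrame({
    "type": [
        "HKQuantityTypeIdentifierRunningSpeed",
        "HKQuantityTypeIdentifierActiveEnergyBurned",
        "HKQuantityTypeIdentifierRunningSpeed",
    ],
    "value": ["6.2", "0.9", "6.8"],
    "startDate": ["2024-05-01 07:00:05 -0400", "2024-05-01 07:00:00 -0400", "2024-05-01 07:00:10 -0400"],
    "endDate": ["2024-05-01 07:00:05 -0400", "2024-05-01 07:00:05 -0400", "2024-05-01 07:00:10 -0400"],
})

SPEED_TYPE = "HKQuantityTypeIdentifierRunningSpeed"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"

# Keep only running speed rows.
speed = records.loc[records["type"] == SPEED_TYPE, ["endDate", "value"]].copy()
# Speed is an instantaneous reading (startDate == endDate), so one timestamp is enough.
speed["time"] = pd.to_datetime(speed["endDate"], format=DATE_FORMAT)
speed["speed_mph"] = speed["value"].astype(float)
speed = speed[["time", "speed_mph"]].reset_index(drop=True)
```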

Since the energy burned records measure the total calories burned over an interval, the value depends on the interval length. To adjust for this, we’ll calculate the rate of calories burned per hour, which removes the dependency on interval length.
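A minimal sketch of the rate calculation: divide each interval's calories by its length in hours. The rows are made up and assumed to be already filtered to energy burned records. Dropping zero-length intervals, which would otherwise cause a division by zero, is an added safeguard and not necessarily part of the original analysis.

```python
import pandas as pd

# Hypothetical energy burned rows: each value is the total calories for its interval.
energy = pd.DataFrame({
    "startDate": ["2024-05-01 07:00:00 -0400", "2024-05-01 07:00:05 -0400", "2024-05-01 07:00:10 -0400"],
    "endDate": ["2024-05-01 07:00:05 -0400", "2024-05-01 07:00:10 -0400", "2024-05-01 07:00:10 -0400"],
    "value": ["0.9", "1.1", "0.0"],
})

DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"
energy["start"] = pd.to_datetime(energy["startDate"], format=DATE_FORMAT)
energy["end"] = pd.to_datetime(energy["endDate"], format=DATE_FORMAT)

# Interval length in hours.
energy["hours"] = (energy["end"] - energy["start"]).dt.total_seconds() / 3600
# Skip zero-length intervals to avoid dividing by zero.
energy = energy[energy["hours"] > 0].copy()
# Calories per hour no longer depends on how long each interval was.
energy["cal_per_hour"] = energy["value"].astype(float) / energy["hours"]
```

For example, 0.9 calories over a 5-second interval works out to 0.9 × 3600 / 5 = 648 calories per hour.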

Finally, we’ll join the two DataFrames into one. The running speed records align with the start and end times of the energy burned intervals, so we can join them by matching each speed timestamp to an energy burned interval’s end time.
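A sketch of that join using pandas merge, matching the speed time column to the energy end column. The column names follow the earlier sketches and the values are made up. An inner join keeps only the timestamps that appear in both tables.

```python
import pandas as pd

DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"

# Hypothetical outputs of the previous two steps (values made up).
speed = pd.DataFrame({
    "time": pd.to_datetime(["2024-05-01 07:00:05 -0400", "2024-05-01 07:00:10 -0400"], format=DATE_FORMAT),
    "speed_mph": [6.2, 6.8],
})
energy = pd.DataFrame({
    "end": pd.to_datetime(
        ["2024-05-01 07:00:05 -0400", "2024-05-01 07:00:10 -0400", "2024-05-01 07:00:15 -0400"],
        format=DATE_FORMAT,
    ),
    "cal_per_hour": [648.0, 792.0, 700.0],
})

# Match each speed reading to the energy interval that ends at the same moment.
joined = speed.merge(energy, left_on="time", right_on="end", how="inner")
joined = joined[["time", "speed_mph", "cal_per_hour"]]
```

If the timestamps ever drift by a second or so, pandas.merge_asof with a tolerance would be a more forgiving alternative to an exact match.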

Now that we have each speed record matched to an energy burned record, we’re ready to start plotting it.

Coding up the Plotly Scatter Plot

We can start by plotting all records as points on a scatter plot, with running speed on the x-axis and calories burned on the y-axis. We’ll use the Plotly package to create interactive plots where we can navigate around the axes, zoom in, and hover over data points for more information.

Our scatter plot shows a positive relationship between running speed and calories burned, indicating that as we run faster, we tend to burn calories at a higher rate.

Add the Trendline

To measure this relationship more precisely, we can add a trendline, which is a simple linear regression model with one variable. We’ll use the numpy.polyfit function to estimate the slope and intercept coefficients.
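Here is a small illustration of numpy.polyfit with deg=1, using made-up points that lie exactly on a line so the result is easy to check. polyfit returns coefficients from the highest power down, so for a straight line that is slope first, then intercept.

```python
import numpy as np

# Made-up points lying exactly on the line y = 140x - 140.
speed_mph = np.array([5.0, 6.0, 7.0, 8.0])
cal_per_hour = np.array([560.0, 700.0, 840.0, 980.0])

# deg=1 fits a least-squares straight line; coefficients come highest power first.
slope, intercept = np.polyfit(speed_mph, cal_per_hour, deg=1)
```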

These coefficients define the formula of our trendline: calories burned per hour = slope × running speed + intercept.

The slope coefficient tells us that if we increase our speed by 1 mph, we expect to burn an additional 138.4 calories per hour.

We can then plot the trendline on top of our scatter plot, so the fitted line and the raw data points appear together.

Conclusion

In this article, we outlined how to export Apple Watch data to XML, transform that XML to a CSV, join running speed and calories burned records together, and visualize the relationship between them with an interactive scatter plot. While the scatter plot is just one simple use case, exporting our data to Python opens up a whole new world of analysis possibilities!

If you want to dive deeper and access the full code used in this analysis, consider supporting us on Patreon. Your support helps us continue to provide detailed tutorials and expand our data science projects.