Export Apple Health Data and Visualise with Tableau
While I love working with online and public datasets, it’s always fun to work with my own data. My iPhone is always with me and has been recording walking distance and steps. I rarely left my apartment since the quarantine began in March, so I thought it would be interesting to see the change in my walking distance from 2016 to 2020.
Export Data
Steps to export in Apple Health
- Launch Health app on your iOS device
- Tap profile photo on the top right corner
- Scroll down and tap Export All Health Data
- Tap Export and wait
- Save data (I use AirDrop to transfer the file to my MacBook)
The process generates an export.zip file. One unzipped, it contains a folder called apple_health_export
with two .xml
files. export.xml
is the file that I will be working with.
Organise Data
XML (Extensible Markup Language) is a data file. It is formatted much like an HTML document but uses custom tags to define objects and the data within each object. Atom and VSCode can both read XML files.
Here is a snapshot of the export.xml
file. The record type, HKQuantityTypeIdentifierDietaryWater
, refers to my water intake. The entry also documents the source application, date, value, unit, etc.
Record type="HKQuantityTypeIdentifierDietaryWater" sourceName="Flo" sourceVersion="4.10.3.1" unit="mL" creationDate="2018-09-18 10:59:02 -0500" startDate="2018-09-17 11:00:00 -0500" endDate="2018-09-17 11:00:00 -0500" value="500">
<Record type="HKQuantityTypeIdentifierDietaryWater" sourceName="Shortcuts" sourceVersion="754" unit="mL" creationDate="2019-01-23 21:16:36 -0500" startDate="2019-01-23 21:16:36 -0500" endDate="2019-01-23 21:16:36 -0500" value="300"/> <
I needed to first transform the tag-based XML file to other formats that are easier to work with for me. xmltodict
is a Python module that makes working with XML feel like you are working with JSON, as stated in the documentation. I installed the module via pip using pip install xmltodict
command.
Here’s my code for parsing XML data and organising data frame.
import xmltodict
import json
import numpy as np
import pandas as pd
#Parse XML to dictionary
with open('export.xml') as f:
= xmltodict.parse(f.read())
health
#Extract desired data and turn into a Data Frame
= pd.DataFrame(health['HealthData']['Record'])
data
#Create two Data Frames for distance and steps walked
= data[data['@type'] == 'HKQuantityTypeIdentifierDistanceWalkingRunning'][['@value','@creationDate','@startDate','@endDate']]
distance = data[data['@type'] == 'HKQuantityTypeIdentifierStepCount'][['@value','@creationDate','@startDate','@endDate']]
step
#Turn the kilometers into float data type from object and others into datetime
'@value'] = distance['@value'].astype(float)
distance[for column in distance.columns:
if column != '@value':
= pd.to_datetime(distance[column])
distance[column]
#Create a new column to get the time differences
'@walkHours'] = distance['@endDate'] - distance['@startDate']
distance[
#Resample to aggregate data by day and merge kilometres and walking time dataframes
= pd.merge(
data = distance.resample("D",on='@startDate')['@walkHours'].sum().reset_index(),
left = distance.resample("D",on='@startDate')['@value'].sum().reset_index(),
right = '@startDate',
on = 'outer',
how = False).rename(columns={'@value':'@kms'})
indicator
#Same with step
'@value'] = step['@value'].astype(int)
step[for column in step.columns:
if column != '@value':
= pd.to_datetime(step[column])
step[column]
#Merge with earlier data
= data.merge(
data "D",on='@startDate')['@value'].sum().reset_index(),
step.resample(='@startDate',
on='outer').rename(columns={'@value':'@steps'})
how
#Turn HH:MM:SS into minutes.
'@walkMins'] = round((data['@walkHours'].dt.total_seconds() / 60), 2)
data[= data[data['@walkMins'] < 1440]
data
#Filter out incorrect entries
'@startDate'] = data['@startDate'].dt.tz_localize(None)
data[
#Export to Excel
'health.xlsx') data.to_excel(
Visualise Data
Lastly, I decided to use Tableau Public for visualisation as I enjoy its flexibility. Here is a static version of my dashboard. You can find the interactive version here. The design was inspired by Lindsey Poulter, who is absolutely genius with Tableau.
Conclusion
I expected to see more walking while I am travelling and much less after implementing social distancing rule. Surprisingly, I did not walk as much as I thought and the drop in 2020 was not as significant as expected. Wrapping up, I briefly explained the process of extracting Apple Health data, parsing XML file with Python, and visualising with Tableau Public.