
Analyzing Five Years of FitBit Data


Learn about data viz, ingestion data from an IoT device, and time series data — all while getting your steps in!


About five years ago, my wife got me a present: a FitBit. I hadn't worn a watch in a while and didn't really see the need, but it was nice to see how many steps I took, and we had a competition over who got the most steps in a day. It was fun. I've had a few FitBits since then, and I wear one most of the time. As it turns out, FitBit lets you export all of your data, so a few months ago I decided to see what kind of information I have stored there and what I can get from it.

The export process is painless, and I got a zip file containing a lot of JSON files. I was able to process those into a CSV file with my heart rate over time. Here is what that looked like:

CSV Data File

The file is just over 300MB and contains 9.42 million records, spanning the last five years.
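The flattening step above can be sketched roughly as follows. This is a minimal sketch, not FitBit's documented format: the glob pattern and the `dateTime`/`value.bpm` field names are assumptions about the export layout and may need adjusting for a real export.

```python
import csv
import glob
import json

def export_heart_rate_to_csv(json_glob, csv_path):
    """Flatten a directory of heart-rate JSON files into one CSV.

    Assumes each file holds a JSON array of records shaped like
    {"dateTime": "...", "value": {"bpm": ...}} -- adjust the field
    names if your export version differs.
    """
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["timestamp", "bpm"])
        # Sort the files so the CSV comes out in chronological order.
        for path in sorted(glob.glob(json_glob)):
            with open(path) as f:
                for record in json.load(f):
                    writer.writerow([record["dateTime"],
                                     record["value"]["bpm"]])
```

With millions of records, streaming file by file like this keeps memory flat instead of loading the whole export at once.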

The reason I looked into getting my FitBit data is that I'm playing with time series right now, and I wanted a realistic data set, one that contains dirty data. For example, even in the image above, you can see that the measurements aren't taken at consistent intervals. It looks like ten- and five-second intervals, but the spacing varies. I'm working on a time series feature for RavenDB, so this was a perfect testing ground for me. I loaded the data into RavenDB and got it down to just under 40MB in size.

I'm using Gorilla encoding as a first pass and then LZ4 to compress the data further. In a data set where the interval between measurements is stable, I can fit over 10,000 measurements in a single 2KB segment. For my heart rate data, with its irregular intervals, I can store an average of 672 entries per 2KB segment. Once I have the data in there, I can start actually looking at interesting patterns.
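To see why stable intervals compress so much better, here is a small sketch of the timestamp half of Gorilla-style encoding: each timestamp is stored as the change in its delta from the previous delta (delta-of-delta). This is an illustration of the general technique, not RavenDB's actual implementation.

```python
def delta_of_delta(timestamps):
    """Encode timestamps as delta-of-delta values.

    Regular intervals collapse to runs of zeros, which take almost
    no bits to store; irregular sampling (like the mixed 5- and
    10-second heart-rate intervals here) produces non-zero
    corrections and costs more bits per entry.
    """
    out = [timestamps[0]]  # first timestamp stored verbatim
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        out.append(delta - prev_delta)
        prev_delta = delta
    return out

# Perfectly regular 10-second samples: all zeros after the header.
regular = delta_of_delta([0, 10, 20, 30, 40])  # [0, 10, 0, 0, 0]

# Mixed 5s/10s sampling: every change of pace costs a correction.
mixed = delta_of_delta([0, 10, 15, 25, 30])    # [0, 10, -5, 5, -5]
```

That run of zeros in the regular case is why a stable data set packs over ten thousand entries into a segment while my dirty heart rate data manages only a few hundred.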

For example, consider the following query:

Data Query

Basically, I want to know how I'm doing in a global sense, just to have a place to start figuring things out. The output of this query is:

Query Output

These are interesting numbers. I don't know what I did to hit 177 BPM in 2016, but I'm not sure I like it.

What I do like is this number:

Query Performance

I then ran this query, going for a daily precision over all of 2016:

Data Query

And I got the following results in under 120 ms:

Query Results
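The daily-precision aggregation the query performs can be sketched in plain Python. This is a stand-in for illustration, not the database's query engine; the timestamp format is an assumption about the CSV produced earlier.

```python
from collections import defaultdict
from datetime import datetime

def daily_stats(rows):
    """Group (timestamp, bpm) rows by calendar day and compute
    min/max/average, mirroring a daily-precision aggregation.

    `rows` is an iterable of ("YYYY-MM-DD HH:MM:SS", bpm) pairs;
    the timestamp format is assumed, adjust for your own export.
    """
    per_day = defaultdict(list)
    for ts, bpm in rows:
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
        per_day[day].append(bpm)
    return {
        day: {
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for day, values in per_day.items()
    }
```

The point of doing this inside the database rather than in client code like this is exactly the result above: aggregating a year of raw measurements server-side comes back in milliseconds instead of shipping millions of rows over the wire.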

These are early days for this feature, but I was able to take that output and generate the following visualization (based on the query above):

Time Series Data Visualization

All of these results were generated on my laptop, and we haven't done any performance work yet. In fact, I'm posting about this feature because I was so excited to see queries working properly. The feature is still in its early stages.

But it is already quite cool.


Published at DZone with permission of
