
Analyzing Runkeeper Data With the ELK Stack

If you're an athlete of any kind, you're most likely tracking your activities using an application. Learn how to analyze and visualize your activity data with Logstash.


I know.

2017 is behind us and the time for summaries is over.

But indulge me.

I'd like to give a healthy start to 2018 by combining two of my favorite pastimes — running and analyzing random data sets — and providing a simple workflow for analyzing sports activity.

If you're an athlete of any kind, you're most likely tracking your activities using an application. Personally, I use Runkeeper to track my runs, and the occasional bike ride or hike, but there are many other applications that do the job quite well.

Most of these applications allow you to export your activity data, which means it can be analyzed using a data analysis tool of your choice.

Guess what tool I'm going to use?

Exporting Your Runkeeper Data

The initial step is, of course, to export the activity report.

This will be different in each application, but in Runkeeper, this is done from the application's Settings page. There, all you have to do is click the Export Data tab, select the time period you want to analyze, and then hit the Export Data button.

After a short while, a runkeeper-data-export-*.ZIP file is generated, and to download it, just click the Download Now button.

Unzip the file.

In the uncompressed folder, you will find a cardioActivities.csv that contains all of your cardio activities — runs, walks, rides, etc. For the sake of simplicity, I renamed it runkeeper.csv.

Shipping Into ELK

Our next step is to ship the data into ELK. To do this, we will need to configure Logstash to process the CSV file and ship it to Elasticsearch.

Taking a look at the file, you'll see that it consists of 12 columns. These columns will translate into fields when indexed in Elasticsearch.

Since I did not measure my heartbeat, use a route name, or enter any notes for the different activities, I'm going to delete these columns. The GPX File column is of no interest either, and I'm going to rename the Type column to Activity. Last but not least, I have no need for the top line defining the column names, since we will use Logstash to define these.

Our future fields are, therefore: Date, Activity, Distance (km), Duration, Average Pace, Average Speed (km/h), Calories Burned, and Climb (m).
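
If you'd rather script this cleanup than edit the file by hand, here is a minimal sketch using Python and pandas. It assumes pandas is installed and that the export file is still named cardioActivities.csv; the exact column names may vary slightly between Runkeeper export versions, so treat them as assumptions.

# clean_runkeeper.py - a rough sketch for preparing the Runkeeper CSV for Logstash.
# Assumes pandas is installed; column names may differ between export versions.
import pandas as pd

df = pd.read_csv("cardioActivities.csv")

# Drop the columns we have no data for (or no interest in)
df = df.drop(columns=["Average Heart Rate (bpm)", "Route Name", "Notes", "GPX File"],
             errors="ignore")

# Rename the Type column to Activity
df = df.rename(columns={"Type": "Activity"})

# Write the result without the header line, since Logstash defines the columns itself
df.to_csv("runkeeper.csv", index=False, header=False)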

Logstash Configuration File

The Logstash configuration file (/etc/logstash/conf.d/runkeeper-01.conf) uses the file input plugin to ship the CSV file. To process the file, we are using the CSV filter plugin, as well as the mutate plugin for mapping some of the fields as integers and the date plugin to define our timestamp field. A local Elasticsearch instance is defined as the output.

input {
  # Read the exported Runkeeper CSV file from the beginning on every run
  file {
    path => "/home/ubuntu/runkeeper.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Split each line into the fields we kept in the CSV
  csv {
    separator => ","
    columns => ["Date","Activity","Distance(km)","Duration","Average Pace","Average Speed (km/h)","Calories Burned","Climb(m)"]
  }
  # Map the numeric fields as integers
  mutate {
    convert => {
      "Distance(km)" => "integer"
      "Calories Burned" => "integer"
      "Average Speed (km/h)" => "integer"
      "Climb(m)" => "integer"
    }
  }
  # Use the activity date as the event timestamp
  date {
    match => [ "Date" , "MM/dd/yy hh:mm", "MM/dd/yy h:mm"]
    remove_field => [ "Date" ]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}

Once Logstash is started, the data should be indexed into separate Elasticsearch indices, one per Runkeeper activity date, since the elasticsearch output plugin defaults to daily, timestamp-based index names (logstash-YYYY.MM.dd).

To verify, cURL Elasticsearch with:

curl -XGET 'localhost:9200/_cat/indices?v&pretty'

You should see a list of the created indices:

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2017.11.23 B_rI8jimQxeuJq_Ekstakw   5   1          1            0      8.7kb          8.7kb
yellow open   logstash-2017.11.26 1ktPMaI5S12DnLNi3ys_tg   5   1          1            0      8.7kb          8.7kb
yellow open   logstash-2017.11.16 oaQP1mX7RCuX_TL8fhjMJA   5   1          1            0      8.7kb          8.7kb
yellow open   logstash-2017.06.19 YgAVNjRnQ46WVS_uV_UjHg   5   1          2            0     16.3kb         16.3kb
yellow open   logstash-2017.09.10 tQHOAMFTQxWoCjXGEOL3Dw   5   1          1            0      8.7kb          8.7kb
yellow open   logstash-2017.01.16 Izixk0EvTfC33EgBKPX97g   5   1          1            0      8.7kb          8.7kb
yellow open   logstash-2017.10.08 202CivWCQ72FBVe06gwxEg   5   1          1            0      8.7kb          8.7kb
yellow open   logstash-2017.04.06 6rU9Kym2TrapjxfWcxqv6w   5   1          1            0      8.7kb          8.7kb
yellow open   logstash-2017.10.01 c5Fg_G--TjeuERbAyhe9nQ   5   1          1            0      8.7kb          8.7kb
yellow open   logstash-2017.01.10 xdk1bEhrRPWBTd5rZYoH0g   5   1          1            0      8.7kb          8.7kb
yellow open   logstash-2017.08.21 CaogzquKTWSFvAIf6i8r5g   5   1          1            0      8.7kb          8.7kb
...
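
If you prefer to check the parsed fields programmatically rather than eyeball the index list, a quick sketch along these lines (assuming Python with the requests library and a local Elasticsearch on port 9200) pulls back a single document so you can confirm the numeric conversions and the timestamp:

# peek_runkeeper.py - fetch one indexed document to verify the parsed fields.
# Assumes the requests library and a local Elasticsearch instance on port 9200.
import json
import requests

resp = requests.get("http://localhost:9200/logstash-*/_search", params={"size": 1})
resp.raise_for_status()

hits = resp.json()["hits"]["hits"]
if hits:
    # Fields like "Distance(km)" should come back as numbers, not strings
    print(json.dumps(hits[0]["_source"], indent=2))
else:
    print("No documents found - check the Logstash output for errors.")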

In Kibana, you can now define the index pattern as logstash-* and view your activity going back a full year.

Adding some of the fields into the main display area, we get a nice overview of all the activities tracked by Runkeeper in 2017:

Shipping Into Logz.io

If you're using Logz.io, a few modifications need to be made to the Logstash configuration file: we need to add the Logz.io user token (retrieved from the Settings page in the Logz.io UI) and define Logz.io as the output destination.

The resulting configuration should look something like this:

input {
  file {
    path => "/home/ubuntu/runkeeper.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    separator => ","
    columns => ["Date","Activity","Distance(km)","Duration","Average Pace","Average Speed (km/h)","Calories Burned","Climb(m)"]
  }
  mutate {
    convert => {
      "Distance(km)" => "integer"
      "Calories Burned" => "integer"
      "Average Speed (km/h)" => "integer"
      "Climb(m)" => "integer"
    }
  }
  date {
    match => [ "Date" , "MM/dd/yy hh:mm", "MM/dd/yy h:mm"]
    remove_field => [ "Date" ]
  }
  # Add the Logz.io account token to every event
  mutate {
    add_field => { "token" => "tWMKrePSAcfaBSTPKLZeEXGCeiVMpuHb" }
  }
}

output {
  # Ship the events to the Logz.io listener over TCP as JSON lines
  tcp {
    host => "listener.logz.io"
    port => 5050
    codec => json_lines
  }
}

Analyzing the Runkeeper Data

Now that the data is indexed and parsed correctly, it's time for some fun.

Using a series of simple Kibana visualizations, we can build a nice dashboard that gives us an overview of our activities.

For example, a simple pie chart visualization shows that my favorite activity is...running!
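
Under the hood, that pie chart is just a terms aggregation on the activity type. If you want to run the same breakdown outside Kibana, a sketch like the following should work, assuming the requests library and the default Logstash mapping, which adds a .keyword sub-field to string fields such as Activity:

# activity_breakdown.py - count activities by type, like the Kibana pie chart.
# Assumes the requests library and the default Logstash mapping (Activity.keyword).
import requests

query = {
    "size": 0,
    "aggs": {
        "by_activity": {
            "terms": {"field": "Activity.keyword"}
        }
    }
}

resp = requests.post("http://localhost:9200/logstash-*/_search", json=query)
resp.raise_for_status()

for bucket in resp.json()["aggregations"]["by_activity"]["buckets"]:
    print(f"{bucket['key']}: {bucket['doc_count']} activities")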

A metric visualization gives us the number of activities conducted over the year — 95!

How about the average distance covered per activity?

A bar chart visualization gives us a depiction of the different activities per month:

Let's take a look at the change in distance covered over time. To do this, we'll use a line chart:

In case you're wondering, that huge drop in June was due to a knee injury. You can see the same thing in the bar chart above, where I switch from running to walking around that time.

And to summarize it all, we can use a data table aggregating metrics per month:
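
The same monthly summary can be reproduced with a date histogram and a few metric sub-aggregations. Here is a rough sketch, again assuming the requests library; on Elasticsearch 7 and later, you would use calendar_interval instead of interval:

# monthly_summary.py - aggregate activity metrics per month, like the Kibana data table.
# Assumes the requests library; on Elasticsearch 7+ replace "interval" with "calendar_interval".
import requests

query = {
    "size": 0,
    "aggs": {
        "per_month": {
            "date_histogram": {"field": "@timestamp", "interval": "month"},
            "aggs": {
                "total_distance": {"sum": {"field": "Distance(km)"}},
                "total_calories": {"sum": {"field": "Calories Burned"}},
                "avg_speed": {"avg": {"field": "Average Speed (km/h)"}}
            }
        }
    }
}

resp = requests.post("http://localhost:9200/logstash-*/_search", json=query)
resp.raise_for_status()

for bucket in resp.json()["aggregations"]["per_month"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"], "activities,",
          bucket["total_distance"]["value"], "km")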

Adding these visualizations and others into one dashboard, we get a nice overview of the last year:

Yes, analyzing your sports activity with Logstash, Elasticsearch, and Kibana is akin to bringing a gun to a swordfight. But while not a typical use case for the ELK Stack, it's a nice and extremely simple data analysis exercise, especially if you're a keen runner as I am.

It was also a nice opportunity to get acquainted with some of the new improvements in Kibana 6, soon to be introduced in Logz.io as well. Kudos to the folks at Elastic for their great work on improving the UX and UI, especially the dashboarding experience.

I wish you all a great, and active, 2018!

