
Loading and Analyzing Strava Runs Using PostgreSQL JSON Data Type


While mapping some Strava runs, I found that the API returned a lot of other data, which I discarded because I wasn’t sure what to keep. Enter: PostgreSQL’s JSON data type.


In my last post, I showed how to map Strava runs using data that I’d extracted from their /activities API, but the API returns a lot of other data that I discarded because I wasn’t sure what I should keep.

The API returns a nested JSON structure, so the easiest solution would be to save each run as an individual file. But I’ve always wanted to try out PostgreSQL’s JSON support (specifically, the binary JSONB type), and this seemed like a good opportunity.

Creating a JSON-Ready PostgreSQL Table

First up, we need to create a database in which we’ll store our Strava data. Let’s name it appropriately:

CREATE DATABASE strava;
\connect strava

Now we can create a table with a single field of the JSONB data type:

CREATE TABLE runs (
  id INTEGER NOT NULL,
  data JSONB
);

ALTER TABLE runs ADD PRIMARY KEY (id);
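
If you want to check that everything is wired up before loading real data, a throwaway row will do; the values below are made up, and the ->> operator extracts a field as text:

-- Throwaway row with made-up values, just to confirm the JSONB column works
INSERT INTO runs (id, data) VALUES (0, '{"name": "Test run", "distance": 5000.0}');

SELECT data->>'name' AS name, (data->>'distance')::FLOAT AS distance
FROM runs
WHERE id = 0;

-- Clean up before the real import
DELETE FROM runs WHERE id = 0;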

Easy enough. Now we’re ready to populate the table.

Importing Data From the Strava API

We can partially reuse the script from the last post, except that rather than saving to a CSV file, we’ll save to PostgreSQL using the psycopg2 library.

The script relies on a TOKEN environment variable. If you want to try this on your own Strava account, you’ll need to create an API application, which will give you an access token.

import requests
import os
import json
import psycopg2

# The Strava access token is read from the TOKEN environment variable
token = os.environ["TOKEN"]
headers = {'Authorization': "Bearer {0}".format(token)}

with psycopg2.connect("dbname=strava user=markneedham") as conn:
    with conn.cursor() as cur:
        page = 1
        while True:
            # Page through the athlete's activity summaries
            r = requests.get("https://www.strava.com/api/v3/athlete/activities?page={0}".format(page), headers = headers)
            response = r.json()

            if len(response) == 0:
                # An empty page means we've fetched every activity
                break
            else:
                for activity in response:
                    # Fetch the detailed version of each activity and store the raw JSON
                    r = requests.get("https://www.strava.com/api/v3/activities/{0}?include_all_efforts=true".format(activity["id"]), headers = headers)
                    json_response = r.json()
                    cur.execute("INSERT INTO runs (id, data) VALUES(%s, %s)", (activity["id"], json.dumps(json_response)))
                    conn.commit()
                page += 1
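
Once the script finishes, a couple of quick checks confirm the rows landed; this is just a sketch, and jsonb_pretty needs PostgreSQL 9.5 or later:

-- How many activities did we store?
SELECT COUNT(*) FROM runs;

-- Pretty-print one stored document to see what the API returned
SELECT jsonb_pretty(data) FROM runs LIMIT 1;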

Querying Strava

We can now write some queries against our newly imported data.
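
Two operators do most of the work in what follows: -> returns the value as jsonb, while ->> returns it as text, which is why numeric fields are cast to FLOAT below. A quick illustration, as a sketch:

-- -> keeps the jsonb type; ->> extracts plain text
SELECT data->'distance' AS distance_jsonb,
       data->>'distance' AS distance_text
FROM runs
LIMIT 1;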

My Quickest Runs

SELECT id, data->>'start_date' AS start_date, 
       (data->>'average_speed')::FLOAT AS speed 
FROM runs 
ORDER BY speed DESC 
LIMIT 5;
 
    id     |      start_date      | speed 
-----------+----------------------+-------
 649253963 | 2016-07-22T05:18:37Z | 3.736
 914796614 | 2017-03-26T08:37:56Z | 3.614
 653703601 | 2016-07-26T05:25:07Z | 3.606
 548540883 | 2016-04-17T18:18:05Z | 3.604
 665006485 | 2016-08-05T04:11:21Z | 3.604
(5 rows)
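
The speed column is Strava’s average_speed, which the API reports in metres per second. If pace is easier to read, a small variation (just a sketch) converts it to minutes per kilometre:

-- Fastest runs expressed as pace, assuming average_speed is metres per second
SELECT id, data->>'start_date' AS start_date,
       1000 / (data->>'average_speed')::FLOAT / 60 AS mins_per_km
FROM runs
WHERE (data->>'average_speed')::FLOAT > 0
ORDER BY mins_per_km
LIMIT 5;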

My Longest Runs

SELECT id, data->>'start_date' AS start_date, 
       (data->>'distance')::FLOAT AS distance
FROM runs
ORDER BY distance DESC
LIMIT 5;
 
    id     |      start_date      | distance 
-----------+----------------------+----------
 840246999 | 2017-01-22T10:20:33Z |  10764.1
 461124609 | 2016-01-02T08:42:47Z |  10457.9
 467634177 | 2016-01-10T18:48:47Z |  10434.5
 471467618 | 2016-01-16T12:33:28Z |  10359.3
 540811705 | 2016-04-10T07:26:55Z |   9651.6
(5 rows)

Runs This Year

SELECT COUNT(*)
FROM runs
WHERE data->>'start_date' >= '2017-01-01 00:00:00';
 
 count 
-------
    62
(1 row)
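
The string comparison works because start_date is stored in ISO 8601 format, so string order tracks chronological order. To be explicit about types, the value also casts cleanly to a timestamp; a sketch:

-- Equivalent query with an explicit timestamp cast
SELECT COUNT(*)
FROM runs
WHERE (data->>'start_date')::TIMESTAMPTZ >= '2017-01-01';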

Runs per Year

SELECT EXTRACT(YEAR FROM to_date(data->>'start_date', 'YYYY-mm-dd')) AS year, 
       COUNT(*) 
FROM runs 
GROUP BY year 
ORDER BY year;
 
 year | count 
------+-------
 2014 |    18
 2015 |   139
 2016 |   166
 2017 |    62
(4 rows)

That’s all for now. Next, I’m going to learn how to query segments, which are stored in a nested array inside the JSON document; there’s a rough sketch of the direction below, and a proper write-up will follow in a future post.
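
As a preview, PostgreSQL’s jsonb_array_elements function can unnest an array stored inside a document. This is only a sketch and assumes each detailed activity keeps its segment efforts under a segment_efforts key, with a nested segment object and a moving_time field:

-- Sketch: unnest the (assumed) segment_efforts array and pull out a few fields
SELECT id,
       effort->'segment'->>'name' AS segment,
       (effort->>'moving_time')::INT AS moving_time_seconds
FROM runs,
     jsonb_array_elements(data->'segment_efforts') AS effort
LIMIT 5;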


Topics:
database, tutorial, strava, json, postgresql

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

