
Python: Parsing a JSON HTTP Chunking Stream


How I parse a JSON HTTP chunking stream in Python using meetup.com's API to filter RSVPs for events I'm interested in.


I’ve been playing around with meetup.com’s API again and this time wanted to consume the chunked HTTP RSVP stream and filter RSVPs for events I’m interested in.

I use Python for most of my hacking these days, and when HTTP requests are required the requests library is my first port of call.

I started out with the following script:

import json

import requests

def stream_meetup_initial():
    # Yield each chunk as it arrives; chunk_size=None means "whatever size was received".
    uri = "http://stream.meetup.com/2/rsvps"
    response = requests.get(uri, stream=True)
    for chunk in response.iter_content(chunk_size=None):
        yield chunk

for raw_rsvp in stream_meetup_initial():
    print(raw_rsvp)
    try:
        rsvp = json.loads(raw_rsvp)
    except ValueError as e:
        # A chunk is not guaranteed to hold exactly one complete JSON object.
        print(e)
        continue

This mostly worked but I also noticed the following error from time to time:

No JSON object could be decoded
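
One plausible cause, assuming the chunk boundaries do not always line up with the JSON object boundaries, is that a single RSVP arrives split across two chunks. The following offline sketch simulates that with a made-up payload (the field names and the split offset are illustrative assumptions, not taken from the real stream). It is written for Python 3, where the exact error text differs from the Python 2 message above.

```python
import json

# Hypothetical RSVP payload; the field names are made up for illustration.
payload = b'{"response": "yes", "member": {"member_name": "Alice"}}\n'

# Simulate a transport that cuts the payload at an arbitrary byte offset.
chunks = [payload[:20], payload[20:]]

def try_parse(raw):
    """Return the parsed object, or None if raw is not a complete JSON document."""
    try:
        return json.loads(raw.decode("utf-8"))
    except ValueError:  # json.JSONDecodeError subclasses ValueError on Python 3
        return None

# Neither half parses on its own, but the reassembled payload does.
parsed_separately = [try_parse(chunk) for chunk in chunks]
parsed_joined = try_parse(b"".join(chunks))
```

Here both entries of `parsed_separately` are `None`, while `parsed_joined` is the full dictionary.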

Although less frequent, I also saw errors suggesting I was trying to parse an incomplete JSON object. I tweaked the function to keep a local buffer and only yield it once a chunk ended in a newline character:

def stream_meetup_newline():
    uri = "http://stream.meetup.com/2/rsvps"
    response = requests.get(uri, stream=True)
    # iter_content yields bytes, so buffer and compare as bytes.
    buffer = b""
    for chunk in response.iter_content(chunk_size=1):
        buffer += chunk
        if chunk.endswith(b"\n"):
            # Assume a newline marks the end of one JSON object.
            yield buffer
            buffer = b""

This mostly works, although I’m sure I’ve seen occasions where two JSON objects were yielded together and the call to ‘json.loads’ then failed. I haven’t been able to reproduce that, though.
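
A buffer that splits on every newline, rather than only checking how a chunk ends, would cope with a chunk that carries more than one object. This is a minimal Python 3 sketch under the assumption that the stream sends one JSON object per newline-terminated line; `iter_json_lines` is a hypothetical helper name, not part of requests.

```python
import json

def iter_json_lines(chunks):
    """Yield one decoded JSON object per newline-terminated line in chunks."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # A chunk may complete zero, one, or several lines.
        while b"\n" in pending:
            line, pending = pending.split(b"\n", 1)
            if line.strip():  # skip blank lines
                yield json.loads(line.decode("utf-8"))
    # Trailing bytes without a newline are treated as incomplete and dropped.
```

Any iterable of byte strings can be passed in, for example `response.iter_content(chunk_size=None)` from the earlier function, or a plain list when testing offline.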

A second read through the requests documentation made me realise I hadn’t read it very carefully the first time since we can make our lives much easier by using ‘iter_lines’ rather than ‘iter_content’:

r = requests.get('http://stream.meetup.com/2/rsvps', stream=True)
for raw_rsvp in r.iter_lines():
    # Skip empty keep-alive lines.
    if raw_rsvp:
        rsvp = json.loads(raw_rsvp)
        print(rsvp)

We can then process ‘rsvp’, keeping only the ones we’re interested in.
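
As a rough illustration of that last step, a filter might look like the sketch below. The field names (`response`, `group`, `group_topics`, `urlkey`) are assumptions about the shape of an RSVP and should be checked against the actual payload; `is_interesting` and `wanted_topics` are hypothetical names introduced here.

```python
def is_interesting(rsvp, wanted_topics):
    """Return True for a "yes" RSVP whose group lists at least one wanted topic.

    The keys used here are assumed, not confirmed against the real stream.
    """
    if rsvp.get("response") != "yes":
        return False
    topics = rsvp.get("group", {}).get("group_topics", [])
    return any(topic.get("urlkey") in wanted_topics for topic in topics)
```

Inside the loop above this would be used as `if is_interesting(rsvp, {"python"}): print(rsvp)`. Using `.get` with defaults means an RSVP that lacks the assumed keys is simply skipped rather than raising a `KeyError`.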



Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

