Pushpin Reliable Streaming

In this article, we take a look at how one team of developers is working to take some of the pain out of streaming data via an API.

By Justin Karneges · Oct. 10, 17 · Opinion

Earlier this month we discussed the challenges of pushing data reliably. Fanout products such as Pushpin (and Fanout Cloud, which runs Pushpin) do not entirely insulate developers from these challenges, as it is not possible to do so within our scope. However, we recently devised a way to reduce the pain involved.

Realtime APIs usually require receivers to juggle two data sources if they want to receive data reliably. For example, a client might listen for updates using a best-effort streaming API, and recover data using a REST API. So we thought, what if Pushpin could manage these two data sources, such that the client only needs to worry about one?
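To make the juggling concrete, here is a minimal Python sketch of what a client typically has to do on its own when the stream is best-effort. Everything in it is an assumption for illustration: the `ManualClient` class, the integer sequence numbers, and the `fetch_since` callback (a stand-in for a REST call that returns messages after a given sequence number) are hypothetical and not part of any Fanout API.

```python
from typing import Callable, List, Tuple

Message = Tuple[int, str]  # (sequence number, payload); hypothetical shape


class ManualClient:
    """Client that reconciles a best-effort stream with a REST recovery call."""

    def __init__(self, fetch_since: Callable[[int], List[Message]]) -> None:
        # fetch_since stands in for a REST request such as "give me everything after N".
        self.fetch_since = fetch_since
        self.last_seq = 0
        self.received: List[str] = []

    def on_stream_message(self, seq: int, payload: str) -> None:
        if seq <= self.last_seq:
            return  # duplicate or stale message, ignore it
        if seq > self.last_seq + 1:
            # Gap detected: pull the missing messages from the second data source.
            for missed_seq, missed_payload in self.fetch_since(self.last_seq):
                if missed_seq > self.last_seq:
                    self._accept(missed_seq, missed_payload)
            if seq <= self.last_seq:
                return  # the recovery call already covered this message
        self._accept(seq, payload)

    def _accept(self, seq: int, payload: str) -> None:
        self.last_seq = seq
        self.received.append(payload)


# Demo: messages 2 and 3 are dropped in transit and recovered via fetch_since.
log: List[Message] = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
client = ManualClient(lambda after: [m for m in log if m[0] > after])
client.on_stream_message(1, "a")
client.on_stream_message(4, "d")
print(client.received)  # ['a', 'b', 'c', 'd']
```

Every client of such an API needs some version of this sequencing and recovery logic, which is the burden the rest of this article tries to move elsewhere.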

Pushpin, now with pull

As a proxy server, Pushpin is uniquely positioned to be able to recover data from the backend server on the client’s behalf, and so we implemented a feature to do just that. The backend server provides a “recovery” URL that Pushpin can use to retrieve missing data.

Here’s a diagram of the process:

[Diagram: pushpin-reliability]

The recovery URL is specified in the Grip-Link response header when returning a hold instruction:

HTTP/1.1 200 OK
Grip-Hold: stream
Grip-Channel: fruit; prev-id=3
Grip-Link: </fruit/?after=3>; rel=next; timeout=120

Later on, if Pushpin detects a gap in the stream of published messages, or if the stream is idle for too long, then it will make a GET request to the recovery URL and dump the response into the stream. For more details, see the documentation.
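The behavior described above can be modeled in a few lines of Python. This is a toy model written for this article's explanation, not Pushpin's actual implementation: the `HeldStream` class, the `Recover` callback (standing in for the GET request to the recovery URL), and the string IDs are all assumptions. A real proxy also has to deal with items that arrive while a recovery request is in flight, which this sketch ignores.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a GET to the Grip-Link URL: takes the last ID the
# proxy delivered, returns (body to append, ID the stream is now caught up to).
Recover = Callable[[str], Tuple[str, str]]


class HeldStream:
    """Toy model of one held streaming response at the proxy."""

    def __init__(self, prev_id: str, recover: Recover) -> None:
        self.last_id = prev_id        # from "prev-id" in Grip-Channel
        self.recover = recover
        self.sent: List[str] = []     # everything written to the client

    def on_publish(self, item_id: str, prev_id: str, content: str) -> None:
        if prev_id != self.last_id:
            # The item does not follow the last one delivered, so something
            # was missed. Ask the backend instead of forwarding this item.
            self.run_recovery()
            return
        self.sent.append(content)
        self.last_id = item_id

    def on_idle_timeout(self) -> None:
        # Nothing arrived within the Grip-Link timeout: check for silent loss.
        self.run_recovery()

    def run_recovery(self) -> None:
        body, caught_up_id = self.recover(self.last_id)
        if body:
            self.sent.append(body)
        self.last_id = caught_up_id


# Backend state for the demo: the current data and its ID.
current = {"id": "c", "body": "third\n"}


def fake_recover(last_id: str) -> Tuple[str, str]:
    body = "" if last_id == current["id"] else current["body"]
    return body, current["id"]


stream = HeldStream(prev_id="a", recover=fake_recover)
stream.on_publish("c", "b", "third\n")   # item "b" was lost, so prev-id mismatches
stream.on_idle_timeout()                 # already caught up, nothing is added
print(stream.sent)                       # ['third\n']
```

The key point is that both triggers, a prev-id mismatch and an idle timeout, funnel into the same recovery call, and the client only ever sees the contents of `sent`.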

The end result is that the client always sees a perfect stream without any gaps. What’s interesting about this approach is that the overall architecture is still mostly publish-subscribe, so the backend plumbing remains scalable and easy to reason about. We just moved the recovery logic up one hop from the client to the edge.

Example walkthrough

Below is a simple PHP program (we’ll call it reliable-stream.php) that serves a stream of updates of a JSON blob.

<?php

$data = file_get_contents('data.json');
$id = hash('md5', $data);
$last_id = null;

header('Content-Type: text/plain');

if($_GET['recover'] == 'true') {
    // Grip-Last: {channel}; last-id={id}
    $grip_last = $_SERVER['HTTP_GRIP_LAST'];
    $pos = strpos($grip_last, 'last-id=');
    if($pos === false) {
        http_response_code(400);
        echo "Invalid Grip-Last header.\n";
        return;
    }
    $last_id = substr($grip_last, $pos + 8);
}

header('Grip-Hold: stream');
header('Grip-Channel: reliable-stream; prev-id=' . $id);
header('Grip-Link: </reliable-stream.php?recover=true>; rel=next; timeout=120');

if($last_id != $id) {
    echo $data;
}

?>

The program reads the JSON data from a file called data.json and returns the current data upon connect. Let’s assume the content of the file is {"text": "hello"} and make a request to the program, through Pushpin:

$ curl -i http://localhost:7999/reliable-stream.php

We get a response that provides the current data and then hangs open:

HTTP/1.1 200 OK
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.20
Connection: Transfer-Encoding
Transfer-Encoding: chunked
Content-Type: text/plain

{"text": "hello"}

The program doesn’t do any publishing of its own. Publishing is meant to be handled separately, whenever the data.json file is updated. The file’s entire content should be published to the reliable-stream channel, such that each line of the stream provides a whole replacement of the previous data. The update process could go like this:

$ md5sum data.json | cut -d ' ' -f 1 > lastid
$ echo '{"text": "new data"}' > data.json
$ pushpin-publish --id=`md5sum data.json | cut -d ' ' -f 1` \
    --prev-id=`cat lastid` reliable-stream @data.json
$ rm lastid

The above sequence of commands would cause additional output to the existing stream:

{"text": "new data"}

You’ll notice the use of IDs when holding and publishing, which is required by Pushpin’s reliability feature. In this example, we’re using MD5 hashes of the content as IDs.
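The ID bookkeeping from the shell commands can also be sketched in Python. The snippet below only builds the item to publish and does not send it. The `content_id` and `build_publish_item` helpers are hypothetical names, and the item layout (`channel`, `id`, `prev-id`, and an `http-stream` format carrying `content`) is an assumption based on Pushpin's HTTP publishing format, so check it against the Pushpin documentation before relying on it.

```python
import hashlib
import json


def content_id(content: bytes) -> str:
    """ID scheme used in this example: the MD5 hex digest of the content."""
    return hashlib.md5(content).hexdigest()


def build_publish_item(channel: str, old: bytes, new: bytes) -> dict:
    """Build one publish item linking the new content to the previous content."""
    return {
        "channel": channel,
        "id": content_id(new),        # ID of the data being published
        "prev-id": content_id(old),   # ID of the data it replaces
        # Assumed format key for streaming responses; verify against the docs.
        "formats": {"http-stream": {"content": new.decode("utf-8")}},
    }


old_data = b'{"text": "hello"}\n'
new_data = b'{"text": "new data"}\n'
item = build_publish_item("reliable-stream", old_data, new_data)
print(json.dumps({"items": [item]}, indent=2))
```

Because each item names both its own ID and the ID it follows, a receiver that knows the last ID it delivered can tell immediately whether an item is the next one in sequence.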

Now let’s get into the actual recovery mechanism. The example program specifies /reliable-stream.php?recover=true as the recovery URL. If Pushpin needs to ask for data that may have been missed, it will make a GET request to this URL. The program points back to itself, so it uses some conditional code to only return data if Pushpin needs it:

if($last_id != $id) {
    echo $data;
}

In other words, if the MD5 hash known by Pushpin ($last_id) matches the hash of the current data, then don’t return anything. How does the program know the last ID that Pushpin has seen? By reading the Grip-Last header provided by Pushpin in the request. The following block of code does some rudimentary parsing of that header, in order to populate the $last_id variable:

if($_GET['recover'] == 'true') {
    // Grip-Last: {channel}; last-id={id}
    $grip_last = $_SERVER['HTTP_GRIP_LAST'];
    $pos = strpos($grip_last, 'last-id=');
    if($pos === false) {
        http_response_code(400);
        echo "Invalid Grip-Last header.\n";
        return;
    }
    $last_id = substr($grip_last, $pos + 8);
}

To demonstrate how this works, let’s make a request directly to the program (not proxied by Pushpin) so we can see its output. Assume it’s being served on localhost port 8000:

$ curl -i http://localhost:8000/reliable-stream.php

The response:

HTTP/1.1 200 OK
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.20
Grip-Hold: stream
Grip-Channel: reliable-stream; prev-id=ea405059015cd95c7fac18b4aaeea653
Grip-Link: </reliable-stream.php?recover=true>; rel=next; timeout=120
Content-Length: 18
Content-Type: text/plain

{"text": "hello"}

As you can see, prev-id was set to the current hash. Now, let’s pretend to be Pushpin and set a Grip-Last header in a request, so we can see what comes back:

$ curl -i \
    -H "Grip-Last: reliable-stream; last-id=ea405059015cd95c7fac18b4aaeea653" \
    "http://localhost:8000/reliable-stream.php?recover=true"

The response:

HTTP/1.1 200 OK
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.20
Grip-Hold: stream
Grip-Channel: reliable-stream; prev-id=ea405059015cd95c7fac18b4aaeea653
Grip-Link: </reliable-stream.php?recover=true>; rel=next; timeout=120
Content-Length: 0
Content-Type: text/plain

Empty response body! As expected.

We mentioned earlier that each line of the stream provides a whole replacement of the data, by publishing the entire content of the data.json file whenever it changes. You can now see how a recovery request conforms to this expectation. If the hashes don’t match, then the entire content is appended to the stream, which is identical to what a successful publish would have looked like.
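For readers who want to see the recovery decision in isolation, here is a minimal Python sketch of the same logic as the PHP program. It is an illustration under assumptions, not a port of anything in Pushpin: `parse_last_id` and `recovery_body` are hypothetical helper names, and the parser assumes the `{channel}; last-id={id}` header shape shown in the PHP comment. Unlike the rudimentary substring approach, it splits on semicolons, so it would tolerate additional parameters after `last-id`.

```python
import hashlib
from typing import Optional


def parse_last_id(grip_last: str, channel: str) -> Optional[str]:
    """Extract last-id for `channel` from a value like 'chan; last-id=abc'."""
    name, _, params = grip_last.partition(";")
    if name.strip() != channel:
        return None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "last-id":
            return value
    return None


def recovery_body(data: bytes, grip_last: str,
                  channel: str = "reliable-stream") -> bytes:
    """Return the bytes to append to the stream for a recovery request."""
    last_id = parse_last_id(grip_last, channel)
    current_id = hashlib.md5(data).hexdigest()
    # Same hash: the proxy is up to date, so send nothing.
    # Different or missing hash: send the whole current content.
    return b"" if last_id == current_id else data


data = b'{"text": "hello"}\n'
up_to_date = "reliable-stream; last-id=" + hashlib.md5(data).hexdigest()
print(recovery_body(data, up_to_date))                         # b''
print(recovery_body(data, "reliable-stream; last-id=stale"))   # b'{"text": "hello"}\n'
```

The second call returns exactly the bytes a successful publish would have delivered, which is why the client cannot tell a recovered update from a pushed one.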

Note that the reliability feature is not limited to streams of whole data replacements. That’s merely what we’re doing in this example. The feature can work just as well with a stream of deltas.
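As a rough idea of what the delta case might look like on the backend, the Python sketch below keeps an ordered log of deltas and answers a recovery request with everything after the last ID the proxy has seen. The `DeltaLog` class, the integer IDs, and the one-delta-per-line framing are all assumptions made for this illustration; the article does not prescribe a storage model.

```python
from typing import List, Tuple


class DeltaLog:
    """In-memory log of deltas with integer IDs starting at 1 (hypothetical store)."""

    def __init__(self) -> None:
        self.deltas: List[str] = []

    def append(self, delta: str) -> Tuple[int, int]:
        """Store a delta and return the (id, prev_id) pair to publish it with."""
        self.deltas.append(delta)
        new_id = len(self.deltas)
        return new_id, new_id - 1

    def recover(self, last_id: int) -> Tuple[str, int]:
        """Return the missed deltas, one per line, and the ID to report as prev-id."""
        missed = self.deltas[last_id:]
        body = "".join(delta + "\n" for delta in missed)
        return body, len(self.deltas)


log = DeltaLog()
log.append("+apple")
log.append("+banana")
log.append("-apple")
print(log.recover(1))  # ('+banana\n-apple\n', 3)
print(log.recover(3))  # ('', 3)
```

The difference from the whole-replacement example is that recovery must return every missed delta in order, so the backend has to retain history rather than just the current state.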

A better API developer experience

Pushpin’s reliability feature greatly simplifies the client side:

  • The client doesn’t need to worry about sequencing.
  • The client doesn’t need to periodically sync with the server.
  • As long as a connection exists, the client can assume there are no gaps in the data.
  • A single request can be used to receive historical data and reliable pushed data going forward, so there is a single source of truth.
  • Resynchronizing after a disconnect only requires a single request.
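To illustrate how little is left for the client to do, here is a minimal Python sketch of a consumer for the example stream, where each line is a complete JSON document. The `follow` function is a hypothetical name, and the skipping of blank lines is a defensive assumption rather than documented behavior.

```python
import json
from typing import Iterable, Iterator


def follow(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield each complete JSON document from a line-delimited stream."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines, if any appear
        yield json.loads(line)


# Demo with canned lines standing in for the open HTTP response.
canned = [b'{"text": "hello"}\n', b"\n", b'{"text": "new data"}\n']
for state in follow(canned):
    print(state["text"])
# hello
# new data
```

With a live connection, `lines` could be any object that yields the response line by line, for example the response returned by `urllib.request.urlopen`. There is no sequence tracking, no periodic sync, and no second request path; after a disconnect the client simply makes the same request again.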

Most public realtime APIs today don’t work this way, but we think our approach could greatly improve the developer experience around such APIs.

Video

You may enjoy the following video, which covers the same material but shows the reliability feature in action.

This post originally appeared on the Fanout blog.


Published at DZone with permission of Justin Karneges, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
