Stream Processing Business Logic State Management Using REST APIs


Learn an architectural pattern for managing business logic state in stream processing topologies using REST APIs.



At Plantronics, our mission has expanded from building best-in-class enterprise headsets to relentlessly providing insights that improve audio communications. The data we collect from headsets is aggregated in our cloud SaaS offering, Plantronics Manager Pro. We also expose the data via APIs (see the links below), which help our partners, MSPs, and customers innovate and build tailored use cases with solutions, apps, etc.

Like most APIs, ours fall into two categories:

  1. REST pull — using APIs for historical analytics use cases.

  2. Real-time streaming — using PubNub messaging for real-time relay of device and call events from Plantronics headsets (Quick Disconnect, Mute, etc.), as in our call center wall board integration, e.g. the Agentq wall board.

Streaming Analytics Backend Architecture

The Streaming API architecture uses Storm topologies to subscribe to real-time events pushed from Hubs (Plantronics client software on a laptop/mobile platform) to a PubNub channel.

Real-time multi-app, multi-tenant architecture

Architecture Flow

The diagram above shows the command-and-control flow for setting up basic authorization of real-time data flow between PubNub, the Plantronics Cloud, and a partner application. For the basics of starting a real-time stream with Plantronics APIs, please look here.


API servers host the REST APIs and read/write to the persistence layer (MySQL, HBase, MongoDB).

1. A 3rd-party app calls the Streaming API over REST to set up a subscription interest for a specific real-time stream.

2. As part of the SUCCESS/201 response, the app gets PubNub subscription metadata (which pub/sub channel, keyset, etc.). The app is now ready to subscribe to the real-time events it is interested in.

3. The Plantronics cloud internally pushes the intent to subscribe to real-time events relevant to a tenant's Hubs. The Hub client software, which communicates with customer headsets (via Bluetooth or corded USB), then starts publishing real-time events to the Plantronics cloud through a negotiated PubNub channel/keyset.
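The handshake in steps 1 and 2 can be sketched in plain Java. The endpoint, field names, and returned values below are illustrative assumptions for this article, not the actual Plantronics API contract:

```java
// Hypothetical sketch of steps 1-2: the app asks the Streaming API for a
// subscription and receives PubNub metadata in the 201 response.
public class SubscribeFlow {

    // Metadata a 201 response might carry: which channel/keyset to subscribe on.
    record PubNubMetadata(String channel, String subscribeKey) {}

    // Stand-in for the REST call; a real app would POST its stream interest
    // (e.g. call events) and deserialize the JSON body of the response.
    static PubNubMetadata requestSubscription(String streamType) {
        // ... HTTP POST /streaming/subscriptions {"stream": streamType} ...
        return new PubNubMetadata("tenant-42-" + streamType, "sub-c-demo-key");
    }

    public static void main(String[] args) {
        PubNubMetadata meta = requestSubscription("call-events");
        // The app now hands meta.channel()/meta.subscribeKey() to the PubNub SDK.
        System.out.println(meta.channel() + " / " + meta.subscribeKey());
    }
}
```

Once the app holds the channel and keyset, everything after this point is ordinary PubNub subscription on the app side.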

Apache Storm Topologies

The topology is designed to subscribe to the PubNub cloud:

  • PubNub spouts — subscribe to PubNub channels for multiple tenants to listen for device/call stat events coming in from the Hubs. Each tenant's Hubs (running on customer laptops/mobiles) publish events across multiple PubNub channels, which load-balances the work across spouts and PubNub SDK threads.

  • PubNub bolts — each bolt maintains a cache of the relevant 3rd-party app subscriptions and the corresponding PubNub metadata. Bolts parse the event JSON from the spout, perform some aggregation, store any analytics stats needed in HBase, and push the event to all apps that have a subscription to that event stream.
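The bolt's cache-and-fan-out behavior can be sketched independently of the Storm and PubNub SDKs; the class and method names here are illustrative, and the HBase write and aggregation steps are elided:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Plain-Java sketch of the bolt's core job: keep a cache of app
// subscriptions per event stream and fan an incoming event out to
// every app channel subscribed to that stream.
public class EventFanOut {

    // streamType -> PubNub channels of apps subscribed to that stream
    private final Map<String, List<String>> appChannelsByStream = new ConcurrentHashMap<>();

    void subscribe(String streamType, String appChannel) {
        appChannelsByStream
            .computeIfAbsent(streamType, k -> new CopyOnWriteArrayList<>())
            .add(appChannel);
    }

    // Called per tuple: returns the channels the event must be published to
    // (a real bolt would also aggregate stats and write them to HBase here).
    List<String> targetsFor(String streamType) {
        return appChannelsByStream.getOrDefault(streamType, List.of());
    }
}
```

Keeping the subscription lookup in local memory is what makes the hot path fast; the question the rest of this article answers is how that cache stays current.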


With this background on the overall use case and the workflow for setting up real-time streaming for 3rd-party apps, we can turn to the challenges this article proposes an architectural pattern for.


This article focuses on business logic state management, not time series data aggregation state, which many articles already cover. Given the use case above, tenants can be subscribed and unsubscribed via APIs. A 3rd-party app might subscribe to and unsubscribe from the streams it is interested in. Tenants come and go. An application might reach end of life. So how do you update the cached state on Storm spouts/bolts? I researched many options against our requirements. Direct, very-low-latency DB access is not a requirement here, since this is a periodic cache update that controls when to start/stop reading from certain tenants, publishing to certain apps, and so on. Also, the real-time event-processing path should not be affected by DB reads/writes.


Dynamic state update using REST APIs

The above diagram illustrates the state management we zeroed in on, which has worked well for us. The following component descriptions explain the ideas and flow.

  1. Dynamic state manager: This singleton instance periodically kicks off a local cache update on an independent thread. The cache maintains the active apps and their event stream subscriptions, the tenants enabled for real-time and their event subscriptions, and other metadata needed for subscribing to tenants and publishing to partner applications. The dynamic state manager updates its state periodically using REST API calls to the API servers; it does not talk to the databases directly.

  2. Spout: The spout periodically checks the dynamic state manager and updates its local cache of which tenants it has a subscription for, so that it can add and delete tenants as needed. Use something like a regular Java timer to kick off this task, and run it on your stream processing framework's own thread (in this case, the Storm worker thread) to avoid synchronization issues.

  3. Bolt: The bolt uses Storm's tick tuple construct (see Storm's documentation) to periodically kick off updates of the apps and their subscriptions.
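A minimal sketch of the dynamic state manager, assuming a pluggable fetcher in place of the real REST call so the refresh logic stands alone; all class, method, and key names are illustrative:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Sketch of the dynamic state manager: its cache of "tenant -> subscribed
// streams" is refreshed via the REST API layer, never via direct DB access.
// The fetcher is injected so the (hypothetical) REST call can be stubbed.
public class DynamicStateManager {

    private final Supplier<Map<String, Set<String>>> restFetcher;
    private final AtomicReference<Map<String, Set<String>>> cache =
            new AtomicReference<>(Map.of());

    public DynamicStateManager(Supplier<Map<String, Set<String>>> restFetcher) {
        this.restFetcher = restFetcher;
    }

    // Kicked off periodically (e.g. by a ScheduledExecutorService). A single
    // atomic swap means readers never observe a half-updated cache.
    public void refresh() {
        cache.set(Map.copyOf(restFetcher.get()));
    }

    // Spouts call this on a timer, bolts on a tick tuple, to decide which
    // tenants to add/drop and which apps to publish to.
    public Set<String> streamsFor(String tenantId) {
        return cache.get().getOrDefault(tenantId, Set.of());
    }
}
```

The atomic-swap design is one way to honor the constraint above: the hot event path only ever reads an immutable snapshot, so no locking is needed between the refresh thread and the worker threads.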

This pattern has several advantages:

  • It avoids tight coupling with the database schema by keeping the interface contract at the REST API level.

  • Some of the APIs already exposed to external parties can be reused.

  • Testability improves: both the service layer and the Storm components can be tested using mock libraries.

  • Performance tuning can focus on the API server layer instead of being entangled with the Storm components, giving good isolation of sub-components for performance testing.


Opinions expressed by DZone contributors are their own.
