Real-time Visualization of Your Data Streams
Working with real-time data streams is a relatively new practice. While there are many tools that allow visualization and exploration of data at rest, there are very few tools to visualize and interact with data streams in motion.
Alooma is a real-time data platform. In fact, one of Alooma's greatest features is making streaming data pipelines more transparent and accessible. For a while we've been looking for a way to allow our product to better represent the real-time nature of our vision, answering the greatest question of all: "What data is flowing through my pipeline right now?"
Introducing Alooma Live
Alooma Live provides a visualization of your data streams, shows statistics about your data flow, extracts live samples, and lets you search and filter your data streams - all in real-time. Now you can easily dive into your data streams, validate integrations of new data sources, debug data streams, or just sit back and enjoy watching your data flow patterns.
Looking Into the Black Box
Traditionally, data pipelines are opaque black boxes that offer little insight into what is happening in real-time. We let a few of our customers see Alooma Live throughout the development process, and their reactions were inspiring. For the first time, our customers were able to see their data in motion. Once data has been loaded into the data warehouse it appears uniform, almost as if it arrived together as a batch. Seeing it flow in real-time reveals different patterns for different data sources: some flow continuously at a relatively uniform rate, while others flow in bursts. Alooma Live also shows you the proportions between different data streams, which can be surprising - you might sometimes find that the smallest streams are actually the most important ones.
While building Alooma Live, we tried to stick to two basic principles:
- Keep it real - Use real data flowing into our Kafka cluster.
- Keep it real-time - Show the data with minimal latency.
Therefore, the data samples, filtering, and statistics represent the actual state of the system and are updated in real-time.
We put a lot of thought into the stream visualization: the data comes from disparate sources where it is unorganized, and slowly converges into organized and well-defined routes, the same way data flows through the Alooma platform.
Visualizing Kafka in Real-time
To implement Alooma Live, we used real-time technologies both on the front-end and back-end.
For the back-end, we built a node.js application that consumes a Kafka topic. The application is in charge of both filtering the stream based on a user-defined query and emitting aggregated statistics about the data stream.
The implementation of both search and aggregation is very preliminary at the moment (we only allow text search), but as we get more feedback, we'll keep adding more advanced capabilities (such as a richer query language and user-defined aggregations).
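To make the two back-end responsibilities concrete, here is a minimal TypeScript sketch of text filtering plus aggregation over a batch of already-consumed messages. It is an illustration under assumptions, not Alooma's actual code: the `StreamMessage` shape, the `StreamStats` fields, and the `matchesQuery` and `processBatch` names are all hypothetical, and the Kafka consumer itself is left out.

```typescript
// Hypothetical message shape; the real event format is not described in this post.
interface StreamMessage {
  source: string;  // name of the data source the message came from
  payload: string; // raw message text
}

// Hypothetical aggregate statistics for one batch of messages.
interface StreamStats {
  total: number;                     // all messages seen in the batch
  matched: number;                   // messages that matched the query
  perSource: Record<string, number>; // count of all messages per source
}

// Plain case-insensitive substring match, standing in for "text search".
// An empty query matches every message.
function matchesQuery(message: StreamMessage, query: string): boolean {
  if (query === "") {
    return true;
  }
  return message.payload.toLowerCase().includes(query.toLowerCase());
}

// Filters a batch by the user's query and computes statistics in one pass.
function processBatch(
  messages: StreamMessage[],
  query: string
): { samples: StreamMessage[]; stats: StreamStats } {
  const stats: StreamStats = { total: 0, matched: 0, perSource: {} };
  const samples: StreamMessage[] = [];

  for (const message of messages) {
    stats.total += 1;
    stats.perSource[message.source] = (stats.perSource[message.source] || 0) + 1;

    if (matchesQuery(message, query)) {
      stats.matched += 1;
      samples.push(message);
    }
  }

  return { samples, stats };
}
```

In a setup like the one described, a function of this kind would be called on each batch read from the Kafka topic, with `samples` and `stats` then pushed to the browser.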
For the front-end, we used WebSockets to update the browser view in real-time. The data flow, statistics, metrics, and samples are all updated continuously as they are extracted and calculated.
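On the browser side, each WebSocket frame can be folded into the current view state. The sketch below assumes a hypothetical JSON update format (`LiveUpdate`) and hypothetical names (`ViewState`, `applyUpdate`, `MAX_SAMPLES`); the real wire format is not described in this post.

```typescript
// Hypothetical update format pushed by the server over the WebSocket.
type LiveUpdate =
  | { type: "stats"; total: number; matched: number }
  | { type: "sample"; payload: string };

// Hypothetical state backing the live view.
interface ViewState {
  total: number;
  matched: number;
  samples: string[]; // most recent sample first
}

// Cap on retained samples so the view does not grow without bound (assumed value).
const MAX_SAMPLES = 50;

// Returns a new state with one raw WebSocket frame applied.
// Malformed or unrecognized frames leave the state unchanged.
function applyUpdate(state: ViewState, raw: string): ViewState {
  let parsed: any;
  try {
    parsed = JSON.parse(raw);
  } catch (_err) {
    return state;
  }
  if (parsed === null || typeof parsed !== "object") {
    return state;
  }

  const update = parsed as LiveUpdate;
  switch (update.type) {
    case "stats":
      return { ...state, total: update.total, matched: update.matched };
    case "sample":
      return {
        ...state,
        samples: [update.payload, ...state.samples].slice(0, MAX_SAMPLES),
      };
    default:
      return state;
  }
}
```

In a browser, this could be wired to the standard `WebSocket` API by calling `applyUpdate` with `event.data` inside the socket's `onmessage` handler and re-rendering from the returned state. Keeping the update logic a pure function makes it easy to test without a live connection.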