Pushing IoT Data Gathering, Analysis, and Response to the Edge
Let's take a look at how edge computing is augmenting data ingestion and analysis for IoT in conjunction with the TICK stack.
The Internet of Things is not so much a thing as it is a concept. It's a concept that enables us to instrument our world with sensors and respond to the data coming in from those sensors in meaningful ways. It's about adding sensors to all the things in our world so that we can measure, analyze, visualize, predict, and react to the environment around those things.
The IoT is not a product because it's not a single thing. If that sounds fairly abstract, it is, but we'll figure it all out. In addition to all of that, there are multiple segments, or markets, within the IoT. The one most people are familiar with is consumer IoT. You may have a smart thermostat in your house, some smart switches, or other internet-connected devices and appliances. These are generally considered part of consumer IoT. Then there's the Industrial IoT, or IIoT. This segment includes things like smart buildings, industrial automation, and monitoring of industrial processes. This is a part of the IoT that most people never see, and rarely hear about, but it's where the most growth and innovation happens, and it's what we will focus on in this article.
An IoT Architecture
Almost any IoT architecture is going to involve a few basic components like sensors, a place to collect and store data, some way to visualize and interact with the data, and often some way for actions to be taken based on events in the data.
It's a fairly simple concept that's been around for a very long time: a sensor collects data and sends it to a server to store it. That data is then made available for analysis and, based on that, some action is taken. But as IoT deployments grow in size and complexity, the ability to simply have a sensor send all its data to a single, monolithic backend begins to become less and less practical.
First, the sheer amount of data quickly becomes overwhelming for any single server. Second, few systems can ingest sensor data at that volume. To illustrate this point, let's look at a modest-sized IoT deployment of, say, 10,000 sensors deployed across an enterprise. Each sensor takes a series of 5 readings every second. That's 50,000 readings per second streaming over the internet to a single backend server. If each reading is 1kB of data, that's 5kB per sensor per second, or 50MB/s of data overall.
It's fairly reasonable to assume that almost any competent backend system could handle this fairly modest amount of data. But now let's scale that to something that would actually go into production, and the numbers rapidly grow out of control. Sensors that collect 1,000 readings per second, at 1kB of data per reading, generate 1MB of data per second, per sensor. At 10,000 sensors, you're streaming 10GB of data per second. Still think it sounds reasonable to stream all of that data to a single backend system in real time? We need to look for alternatives.
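The arithmetic above can be sanity-checked with a few lines of Python; the constants mirror the deployment described in the text:

```python
# Back-of-envelope data rates for the hypothetical deployment above.
SENSORS = 10_000
READING_BYTES = 1_000  # 1 kB per reading

# Modest case: 5 readings per second per sensor
modest = SENSORS * 5 * READING_BYTES           # bytes/second overall
print(modest / 1e6, "MB/s")                    # 50.0 MB/s

# Production case: 1,000 readings per second per sensor
production = SENSORS * 1_000 * READING_BYTES   # bytes/second overall
print(production / 1e9, "GB/s")                # 10.0 GB/s
```

Even before accounting for protocol overhead, retries, and bursts, the production-scale number makes a single ingestion endpoint look untenable.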
Pushing Data Collection to the Edge
You could compress your data, but there's a compute overhead in doing so. You could scale back the frequency of data collection, but this could impact your ability to detect and respond to anomalies. Or, you could push your data collection, analysis, and response out from the data center or cloud to the edge.
In the scenario above, with 10,000 sensors, it would be reasonable to segment the deployment into groups, each group connecting to the internet via a gateway device. If you could turn each gateway device into a mini data collection, analysis, and response machine, that would help with the overall scaling problem. If each gateway device services as many as 1,000 sensors, each gateway sees a data rate of 5MB/s and can easily handle the load.
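Sizing the gateway tier is simple division; this short sketch makes the numbers explicit:

```python
# Split 10,000 sensors across gateways that each serve up to 1,000
# sensors, with each sensor producing 5 readings/second at 1 kB each.
SENSORS = 10_000
SENSORS_PER_GATEWAY = 1_000
PER_SENSOR_BPS = 5 * 1_000  # 5 kB/s per sensor

gateways = SENSORS // SENSORS_PER_GATEWAY               # 10 gateways
per_gateway_bps = SENSORS_PER_GATEWAY * PER_SENSOR_BPS  # 5 MB/s each
```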
Now you've got 10 gateway devices, each collecting data from its own 1,000 sensors. That solves your data-rate scaling problem, but it creates another one: distributed data. Your data now lives on 10 different devices, and you have no way to see aggregated data from all of your sensors. As always, the left hand giveth while the right hand taketh away!
Now, on top of your data collection and scaling problems, you've got a data aggregation problem! I know it sounds like I'm creating more problems than I am solving here, but there is a way to apply a solution across the entire deployment that makes all of these problems go away: the TICK Stack.
Deploying the TICK Stack to Solve the Scaling Problem
The TICK Stack is made up of four open-source software components designed specifically to make the collection, storage, management, visualization, and manipulation of time series data easy and scalable.
The 'T' in TICK stands for Telegraf. Telegraf is a plugin-based, high-performance data ingestion engine designed to gather incoming data streams from multiple sources and, in an extremely efficient manner, output those streams to data storage platforms.
One of those platforms, the one we're going to focus on here, is InfluxDB, the 'I.' InfluxDB is a Time Series Database designed from the ground up for performance and ease of use when dealing with time series data — and what is IoT data if not time series data?
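InfluxDB accepts points in its line protocol format (`measurement,tags fields timestamp`). As a rough sketch, here's how a sensor reading might be rendered into that format before being written to the database; the measurement, tag, and field names here are illustrative, not from the article:

```python
import time

def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Render one reading as an InfluxDB line-protocol point:
    measurement,tag1=v1,... field1=v1,... timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    ts_ns = ts_ns if ts_ns is not None else time.time_ns()
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol("environment",
                        {"sensor": "s42", "site": "plant1"},
                        {"temp_f": 72.5, "co2_ppm": 410},
                        ts_ns=1_555_000_000_000_000_000)
# environment,sensor=s42,site=plant1 co2_ppm=410,temp_f=72.5 1555000000000000000
```

In practice, Telegraf handles this serialization for you; the sketch above omits details such as escaping special characters and marking integer fields.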
The 'C' in TICK is for Chronograf, the data visualization and management front-end for the other components of the stack. Using Chronograf, you can quickly and easily build stunning dashboards for data monitoring.
These dashboards can help you easily monitor your sensor data and spot anomalies and trends in your data that you might otherwise miss.
The 'K' in TICK is Kapacitor. Kapacitor is our stream processing engine that runs alongside InfluxDB to do more complex data processing, process alerts, and so on.

Great, so how will this help? Well, one thing I've been working on lately is deploying the entire TICK Stack from edge to data center for complete IoT data collection, analysis, reporting, and alerting, and I can say that it has been wildly successful. I took a $30 Pine-64 LTS board, added a $35 7" touchscreen display and a $9 Bluetooth/Wi-Fi card, and built an edge device capable of collecting sensor data via Wi-Fi, wired Ethernet, or Bluetooth LE (I've since added a LoRa radio to it as well, just for fun).
That device, with a 32GB MicroSD card, collects sensor data and displays it on a local dashboard. In addition, it processes that data and sends alerts when the temperature from one of the sensors changes by more than 1ºF, or whenever the CO2 concentration in the room changes by more than 100ppm.
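In plain Python, the change-based alert rules just described might look something like the following sketch (the function and field names are mine; in the actual deployment, this logic runs as Kapacitor tasks):

```python
def changed_enough(prev, curr, temp_threshold=1.0, co2_threshold=100):
    """Compare consecutive readings and return alert labels when the
    temperature moves more than 1 degree F or CO2 more than 100 ppm."""
    alerts = []
    if abs(curr["temp_f"] - prev["temp_f"]) > temp_threshold:
        alerts.append("temperature change")
    if abs(curr["co2_ppm"] - prev["co2_ppm"]) > co2_threshold:
        alerts.append("co2 change")
    return alerts

prev = {"temp_f": 70.2, "co2_ppm": 450}
curr = {"temp_f": 71.5, "co2_ppm": 460}
alerts = changed_enough(prev, curr)  # ["temperature change"]
```

Running this comparison on the gateway means an alert can fire without a round trip to the backend at all.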
But I said I was going to conquer the distributed data problem, and with Kapacitor I have done exactly that. I've used Kapacitor to generate the alerts discussed above, but I'm also using Kapacitor to do some fairly sophisticated downsampling of the data. I'm collecting about 100 sensor readings every second on this device. But because I am handling the data visualization and anomaly detection on this edge device, I don't actually need to send this highly granular data to my backend system.
So, I'm downsampling my data before sending it back upstream. What is downsampling? It's reducing the granularity, or resolution, of the data while preserving its overall trends. Some refer to this as a "rollup" of the data. In my case, I'm taking the mean of the temperature data over a 5-minute window and rolling that data up to the cloud for long-term storage and further analysis.
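As a minimal illustration of what such a rollup does (Kapacitor performs the real one), here's a sketch that buckets per-second samples into 5-minute windows and keeps only each window's mean:

```python
from statistics import mean

def downsample(points, window_s=300):
    """Roll up (timestamp_seconds, value) samples into per-window means.
    Returns {window_start_seconds: mean_value}."""
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts - ts % window_s, []).append(value)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}

# 1 Hz temperature samples spanning two 5-minute windows
points = [(t, 70.0) for t in range(0, 300)] + [(t, 72.0) for t in range(300, 600)]
rollup = downsample(points)  # {0: 70.0, 300: 72.0}
```

Six hundred raw points become two, which is exactly the kind of reduction that makes the upstream link and backend store manageable.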
The second-by-second analysis of the temperature data is important locally, but it becomes less important on the backend system. Further, on the backend system, I am analyzing the aggregated data from multiple gateway devices collecting this data—each of which is doing its own local processing, visualization, and alerting—so overall trend analysis is more important. In this way, I am distributing the load for data collection, data processing and analysis, and action to the closest point to where the data is actually generated, while still preserving the data to a persistent backend data store for further, higher-level, analysis.
The best part of the whole thing? I'm deploying the exact same code base at all levels of the architecture. It's the same code, the same stack, from edge to data center, which reduces complexity and makes deployment easier and more economical.
It's a simple solution to a complex architecture designed to maximize data collection and scalability while reducing complexity and maintenance issues.
I say quite often that IoT data must be timely, accurate, and actionable in order to be useful. This architecture and use of the TICK Stack maximizes your ability to collect timely and accurate data, and to take action on that data as close as possible to the point of data generation, where it makes the most impact.