Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Building an IoT Backbone

DZone's Guide to

Building an IoT Backbone

From data ingestion and storage to the gateway to processing, see what infrastructure tools Hazelcast has been working on while taking the IoT journey.

· IoT Zone
Free Resource

Download Red Hat’s blueprint for building an open IoT platform—open source from cloud to gateways to devices.

The Internet of Things has evolved from the convergence of new mobility-enabling technologies such as the proliferation of public Wi-Fi access, cellular devices, RFID technologies, and micro electromechanical systems, with the latest computing technologies such as microservices, cloud computing, and, of course, increased human social consumption of mobile technologies. This convergence has allowed unstructured machine generated data to be analysed for insights that will drive the future.

The “Thing” in IoT could mean routinely used objects around you – a pacemaker, automobile with built-in sensors, a farm animal with a biochip transponder, tracking bands for kids and parents in theme parks, even light bulbs in your house.

Kevin Ashton, co-founder and executive director of the Auto-ID center at MIT, first used this term Internet of Things in a presentation he made to P&G in 1999. This is how he described IoT:

“Today computers — and, therefore, the Internet — are almost wholly dependent on human beings for information. Nearly all of the roughly 50 petabytes of data available on the Internet (do note that this was in 1999) were first captured and created by human beings by typing, pressing a record button, taking a digital picture or scanning a barcode.

The problem is, people have limited time, attention and accuracy — all of which means they are not very good at capturing data >about things in the real world. If we had computers that knew everything there was to know about things — using data they gathered without any help from us — we would be able to track and count everything and greatly reduce waste, loss and cost. We would know when things needed replacing, repairing or recalling and whether they were fresh or past their best.”

In today’s times, this pretty much translates into a world where anything can be connected to the internet and communicate in an intelligent fashion. It describes the future where everyday objects will be linked through wired or wireless networks, using Internet Protocol (IP) to connect to the internet and be able to identify themselves to other devices. This makes IoT even more significant because an object that can represent itself digitally becomes an object of something greater and more important significance.

This Is Fantastic; Is There a Problem?

When objects can sense the environment and communicate, they become tools for understanding complexity and to help the system respond to it swiftly. But this also presents some great challenges: objects that are connected through networks, churn out massive amount of data that flows to computers for analysis. In some cases, these analysis are required to be done in real time. For example: a supermarket offering dynamic pricing at specific locations and at specific times by sensing customer’s buying preferences and reacting in real time. Another example: sensor in heart monitor implant in a serious heart patient sending out raw data related to functioning of heart, every second to an external system for statistical review and help design preventive measures.

The challenges of handling such volumes of data are:

  • Ingesting massive data and keep up with what is coming. This could be hundreds of thousands or millions of devices, each updating periodically, say once every few seconds. This can amount of 100s of GBs of data per system.
  • Anomaly detection – recognize and validate the correctness of data being sent to the system.
  • Analysis, aggregation, extrapolation, and reaction to a situation – all in real time. For example, a Jet engine stopped on a plane – this needs to be immediately actioned.
  • Work with extremely low latency access to data.
  • Reporting – run on-demand or scheduled batch processes to generate reports.

IoT With Hazelcast

What does Hazelcast have to do with this whole IoT thing? Well, the answer is that Hazelcast provides tools and means to build infrastructure for IoT applications. But before we get into nuts and bolts of Hazelcast components, let us first understand the full IoT landscape.

The general IoT system scheme is: Producer -> Gateway -> Processing -> Storage (further processing)

Producer

  • Normally sensors, devices
  • They need to be connected to rest of system
  • Use protocols like HTTP, IP, MQTT, AMQP
  • Individual data events with timestamp from sensors. These are normally in the form of streams, therefore it sometimes requires time-series data store

Gateway

  • Core features
    • Ingest events from sensors, therefore must understand device’s protocols
    • Must absorb the big data volume in real time
  • Additional features
    • Filter “bad” data (broken sensors)
    • Enriching the data, adding metadata, adjusting data
    • Checking data identity (are those data if mine?)
    • Detecting duplicities (sensor sends event, had failure and didn’t get ACK, sends event again)

Processing

  • This is the heart of the IoT solution
  • Data coming from producers through their gateway requires to be processed, real-time in many cases
  • Programs written on top of processor’s API are used to perform cleansing, aggregation, combining, real-time analytics etc.

Storage

  • Processed data is stored for presentation (on dashboards) or further processing such as batch analysis by data analyst, machine learning, indexing etc…
  • Key requirements are to store high volumes of data and delivering high throughput. In general, data is stored in big distributed databases and used by batch processing tools

Hazelcast, with all its sorcery, can play a pivotal role in building infrastructure for IoT. It has components in its armory that facilitate processing and storing of massive scale of data in real time. If you are not already aware of Hazelcast, let us take a look at some of the important components in Hazelcast architecture:

  • Data Processing
    • Distributed Events, ExecutionCallback
    • EntryProcessor
    • ExecutorService
    • Topic, Queue
  • Data Storage
    • Map, Queue, Set, List, JCache
    • RingBuffer, ReliableTopic

Use Case

Time to get practical with a real life use case. Consider a cloud based IoT system where a cluster manages 100,000s of objects coming from buildings using building automation solutions. This is a critical cloud-based component of overall platform strategy rolling out on three continents.

The house-automation (not home, but house) system generates data from components such as sensors, like light switches, temperature sensors, etc… and is collected in a single global cloud cluster. This data can then further be managed using a web interface. A local on-site system connects the sensors with the cloud and every single element in the system becomes addressable using a unique (REST) URL which routes requests/commands to the right device (whatever this device will look like).

Some of the basic requirements to process and store this data:

  • Each sensor has its own stream of data going into the IoT system.
  • Hundreds of thousands of such sensors.
  • Sensor’s data must be collected in a time-series order so that various timestamp bound operations can be performed.
  • Every time sliced data point must be processed as soon as it enters the system.
  • Concurrent jobs to execute for each of the sensor’s data stream collected over a period of time and/or for each sensor data point coming into the system.

Hazelcast Solution

Because the data needs to be stored in time-series order, usual data structures like Map, Queue or JCache are not good choice. Hazelcast offers RingBuffer as a distributed data structure that allows to store data in a ring like structure. Ring Buffers can be thought of as circular arrays with a fixed capacity. See a representation:

Note: Take a look at the documentation for more details on RingBuffer

Use Hazelcast RingBuffer's data structure to store all data points streaming in for each of the sensor i.e. each sensor has its own dedicated RingBuffer. Since there is no limitation on the number of RingBuffers allowed in a Hazelcast cluster — you can have millions of them.

Now, let’s consider three possible ways of processing sensor data:

  1. Real-time processing: When a sensor data entry arrives at RingBuffer – fire event and run a job at every data entry
  2. Scheduled processing, near real-time: For a pre-configured number of events in the RingBuffer, we trigger a job, do some aggregation, dump the events to a data store and clear the RingBuffer
  3. Time-series data processing: Start a job that aggregates the events which were added in last 3 minutes. As soon as a new data entry arrives and a RingBuffer is created for that sensor, a job is scheduled for this RingBuffer that will run every three minutes. This job will do the aggregation and clean up, regardless of the number of subsequent entries added for that sensor.

Case 1

Use ExecutionCallback with RingBuffer to do something on successful add. Example of ExecutionCallback: http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#callback-when-task-completes

Similar principles of ExecutionCallback will apply on RingBuffer. See below for an example:

HazelcastInstance client = HazelcastClient.newHazelcastClient();
Ringbuffer ringbuffer = client.getRingbuffer("ring-buffer");
ringbuffer.addAsync("Some_Data", OverflowPolicy.OVERWRITE).andThen(new ExecutionCallback() {
        public void onResponse(Object o) {
<Your execution code when added successfully>                
        }

        public void onFailure(Throwable throwable) {
    <Your execution code when adding failed>
        }
    });

Case 2

Here is a pseudo code to demonstrate how a RingBuffer is used to store data points coming in from a sensor at different time intervals, and how those data points are aggregated.

The infinite for-loop checks for 10 data items at a time, does the aggregation and repeats itself. This is only a pseudo code, hence numbers used are hypothetical and for explanation use.

HazelcastInstance client = HazelcastClient.newHazelcastClient();
Ringbuffer ringbuffer = client.getRingbuffer("");
final AtomicLong sequenceId = new AtomicLong(ringbuffer.headSequence());
for(;;){
    Integer[] items = java.util.stream.Stream.generate(() -> ringbuffer.readManyAsync(
    sequenceId.incrementAndGet(), 1, 1, null)).limit(10).toArray(Integer[]::new);
    doCalculation(items);
}

Case 3

    Ringbuffer ringbuffer = client.getRingbuffer("Case3_Ringbuffer");
    long sequenceId = ringbuffer.tailSequence();
    //Using tailSequence here as we need to go backwards and read entries in
    //past 3 minutes
    for(;;){
        Integer items[] = new Integer[10];
        for(int k=0;k<10;k++) {
            Object item = ringbuffer.readOne(sequenceId);
            sequenceId--;

            if(item instanceof TimeEvent){
                // it is a timer event, so lets stop collecting and go to 'doCalculation'
                break;
            }else{
                items[k] = (Integer) item;
            }
        }
        doCalculation(items);
    }

And to add a timer event to RingBuffer at a certain time stamp, below code may run on a daemon thread.

for(;;){
    TimeEvent e = new TimeEvent();
    for(Ringbuffer ringbuffer : eventbuffer){
        if(time is 3 minutes since last add)
        ringbuffer.add(e);
    }
    Thread.sleep(1 minute);
}

Note:

  • To achieve higher concurrency, you could have as many runnables/callables as the number of ring buffers and all these callables/runnables will be executed by a bunch of threads
  • Hazelcast RingBuffer allows storing time series data as there is no need of maintaining the validity of data points based on their time-stamp. Ring Buffer will take care of this by overwriting old events with new events.

Tip: If you get an event per second and you want to process every minute, give the RingBuffer a capacity of 120, so that you have 60 seconds for the calculations before unprocessed data start to get overwritten.

There are a number of ways that one can design an IoT infrastructure with Hazelcast; there are already business critical IoT applications created with Hazelcast components, running worldwide.

Build an open IoT platform with Red Hat—keep it flexible with open source software.

Topics:
iot ,hazelcast ,data ingestion ,iot platform

Published at DZone with permission of Rahul Gupta, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}