The expected flood of data from billions of connected devices raises many challenges for how IoT solutions will be architected. Common design paradigms spanning device to cloud will allow more flexibility in how compute is best utilised for data analytics. A big challenge is where to place data analytics: on the device, at the edge, or in the cloud?
Just what role does data analytics play in the Internet of Things? Whilst there is no single definition of IoT, it is useful to define IoT from a Data Analytics (DA) perspective: “Applying algorithms to data from smart devices that leads to process (industrial IoT) and life (consumer IoT) optimisation.” IoT is the train that data analytics has been waiting for all these years: billions of devices producing data. A marriage made in heaven.
Where the Challenges Lie
Two frequent problem statements from developers working on data analytics use cases in IoT are 1: “I can’t place any analytics on the device or gateway as it just doesn’t have the resources to cope with the amount of data,” and 2: “How do we store and process all the data?”
Looking at 1, if developers classify the data at the device and edge into two buckets, namely (1) use-now data and (2) use-later data, then they can begin to perform local data reduction through pre-analytics. This will minimise data storage and transfer rates, and free up compute for device-based analytics.
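As a minimal sketch of this pre-analytics step (the threshold, window size, and class names here are illustrative assumptions of mine, not part of any defined standard), a device or edge node might split readings into the two buckets and summarise the routine data locally:

```python
import statistics
from collections import deque


class EdgePreFilter:
    """Sketch of device/edge pre-analytics: split readings into a
    'use now' bucket (forwarded immediately) and a 'use later'
    bucket (held locally and reduced to a summary), cutting
    storage and transfer rates."""

    def __init__(self, alert_threshold=80.0, window=10):
        self.alert_threshold = alert_threshold
        self.window = deque(maxlen=window)  # recent readings, for local context
        self.use_now = []    # actionable data, pushed to the cloud right away
        self.use_later = []  # routine data, kept locally until summarised

    def ingest(self, reading):
        self.window.append(reading)
        if reading >= self.alert_threshold:
            self.use_now.append(reading)   # actionable: forward now
        else:
            self.use_later.append(reading)  # routine: hold locally

    def summarise_later(self):
        """Local data reduction: replace the raw 'use later' points
        with a small summary record before (optionally) uploading."""
        if not self.use_later:
            return None
        summary = {
            "count": len(self.use_later),
            "mean": statistics.mean(self.use_later),
            "max": max(self.use_later),
        }
        self.use_later.clear()
        return summary
```

Feeding in five readings where only two cross the threshold leaves just those two queued for immediate transfer; the other three collapse into a single summary record, which is the data-reduction effect described above.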
Looking at 2, the first step for any data scientist after receiving a data set is to cleanse it. That this step exists at all suggests we don’t need all the data to make the decisions required for business impact, as there is still a lot of junk data being generated by these IoT devices.
Introducing Haze Computing
IoT edge gateways are now an essential part of IoT applications. But where does the edge begin and end? Compute exists all the way from the devices to the cloud, so why should the gateway be treated any differently? In fact, in many cases the pooled compute of the devices connected to a gateway can exceed what is available at the edge gateway itself. The challenge lies in how devices are configured for machine-to-machine (M2M) communication. It is predicted here that the classic gateway will be squeezed from above by the cloud and from below by devices, and that it may become distributed by design.
What is being proposed here is a dynamic model for your analytics applications, which I will call Haze Computing (named for its coverage from device to cloud), where you begin with a pooled view of your resources. Each DA app that you build analyses the local and global compute available to it across cloud, edge, and device(s), and the haze data management controllers (DMCs) aggregate and decide how and where analytics take place, dynamically.
By being a little more clever at the data source, one can reduce the amount of data both kept locally and pushed to the cloud, by designing a data-consistency-aware messaging service from cloud to device that serves a series of DMCs which stay in sync and take control of other messaging communication for IoT. Each IoT application for a single device would have its own cloud DMC, edge DMC, and device DMC. Their purpose is to manage the data’s 3Vs (velocity, volume, variety) and to apply analytics apps to the data stream at predefined intervals. This type of architecture ensures you can scale your applications and services across cloud, edge, and device.
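To make the placement decision concrete, here is an illustrative sketch of how a DMC might choose where an analytics app runs, given a pooled view of compute. The tier names, compute units, and latency figures are assumptions of mine, not a defined API:

```python
# Hypothetical DMC placement logic: pick the lowest tier (closest to
# the data source) that has enough free compute and meets the app's
# latency budget, falling back to the cloud, which can scale out.
TIERS = ["device", "edge", "cloud"]


def place_analytics(app_cost, free_compute, latency_budget_ms, tier_latency_ms):
    """Return the tier an analytics app should run on.

    app_cost         -- compute units the app needs (arbitrary scale)
    free_compute     -- dict of tier -> free compute units
    latency_budget_ms -- how quickly results must be available
    tier_latency_ms  -- dict of tier -> round-trip latency to the data
    """
    for tier in TIERS:
        if (free_compute[tier] >= app_cost
                and tier_latency_ms[tier] <= latency_budget_ms):
            return tier
    return "cloud"


free = {"device": 5, "edge": 50, "cloud": 10_000}   # arbitrary compute units
latency = {"device": 1, "edge": 10, "cloud": 120}   # round-trip ms

# A tiny, latency-critical app lands on the device; a heavier one is
# pushed to the edge; a compute-hungry one ends up in the cloud.
print(place_analytics(3, free, 5, latency))     # prints "device"
print(place_analytics(30, free, 50, latency))   # prints "edge"
print(place_analytics(500, free, 500, latency)) # prints "cloud"
```

The design choice worth noting is the ordering of `TIERS`: by always trying the tier nearest the data first, the controller biases towards local processing and only escalates when resources or latency demand it, which is exactly the data-reduction behaviour the haze model aims for.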
Figure 1: Haze Computing
This holistic design approach for IoT, shown at a high level in Figure 1, has many advantages, including:
The single view per IoT application ensures that security can be better managed across device, edge, and cloud. Security and privacy are still the main concerns for IoT practitioners across the industry, and applying a more holistic architecture design makes implementing next-generation security topologies much easier. One such topology is blockchain, of bitcoin fame. If the cloud application acts as the parent blockchain, it can spawn multiple sidechains at the edge, which in turn manage device-based sidechains, creating a security ecosystem that is automatic, consensus-based, and fully auditable.
If developers become more conscious of green computing paradigms and implement them at the haze level, where energy usage rates are much more visible, then best practices can be designed and built far more easily. Normally, the more data we process, store, and transfer, the more energy we consume. This architecture reduces energy use because each DMC acts as a data filter point.
Developers are often slightly behind the trends within the IoT landscape. One main reason is that we normally have experts in only one of the areas required to build full-breadth IoT applications, and there is a myriad of technologies with design practices that are not in sync. This architecture will allow developers to eliminate design chasms across edge, device, and cloud, and introduce simplicity into the IoT standards being driven by the IIC and OPC.
Cascading Analytics Model for IoT
Having the dynamic haze architecture outlined above will allow for a cascading model for analytics. Everyone has seen the impact apps have on our technology fingerprint as individuals. Analytics use cases are just apps, and like any app, there can be an app store to host them. Those apps can be installed all the way from personal computers to tablets, phones, and wearables. It’s still the same app at its core. A retail store assistant in a city may only require data from that city, whereas a regional manager may require the same application with data from the entire state.
With container-based technologies such as Docker now becoming mainstream, it is getting easier to move or expand your applications across your IoT application domain while they function exactly the same. These technologies ensure you can build your apps once and deploy them to many device types.
Analytics will need to be dynamic for IoT; this architecture allows analytics to move across the IoT compute spectrum
Decision making acts like a neural network for IoT analytics
Whilst IoT can be a huge source of both data and challenges that can be solved with that data, it can be easy for developers to get distracted. Regardless of whether it is consumer or industrial IoT, a good idea is to start small with a simple classifier at the device level. Build on this at the edge with some more advanced time-based analysis. Then, once you reach the cloud level, use the vast compute and applications you have there to run much more sophisticated machine learning algorithms on your data. Ensure, however, that this learning is fed back to your simpler models at the device and edge, so that your data models improve over time, applying iterative reinforcement learning down through your cascaded analytics model.
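A minimal sketch of that cascade follows; all thresholds, window sizes, and the quantile used for refitting are illustrative assumptions, and the "feedback" is simply the cloud stage recomputing a threshold and handing it back to the device stage:

```python
import statistics


class DeviceClassifier:
    """Device stage: the simplest possible classifier, a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def classify(self, reading):
        return "alert" if reading > self.threshold else "normal"


def edge_window_check(readings, window=5, factor=1.0):
    """Edge stage: time-based analysis. Flag the most recent window
    if its mean exceeds the series mean by factor x std deviation."""
    recent = readings[-window:]
    mean = statistics.mean(readings)
    stdev = statistics.pstdev(readings)
    return statistics.mean(recent) > mean + factor * stdev


def cloud_refit(history, quantile=0.95):
    """Cloud stage: 'learn' a better threshold from accumulated
    history (here just an empirical quantile) and return it so it
    can be pushed back down to the device -- the feedback loop."""
    ordered = sorted(history)
    idx = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[idx]


# Feedback in action: the cloud refits the threshold from history
# and the device classifier is updated in place.
device = DeviceClassifier(threshold=75.0)
history = [20, 21, 23, 22, 24, 26, 25, 27, 60, 61]
device.threshold = cloud_refit(history)
```

In a real deployment each stage would run on different hardware and the refit would involve a genuine learning algorithm, but the shape is the same: cheap decisions at the device, temporal context at the edge, heavyweight learning in the cloud, and parameters flowing back down.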
Finally, the single most important aspect of building any analytics application in IoT is to keep the customer at the centre of the discussions right from the start, a form of agile IoT. Too often the fragmented nature of how business is done means that a huge amount of insight is discovered, but if there isn’t a dollar value attached to it, then it will stay exactly that. The key is to ensure you can translate the “insight” into “impact”. This is where your customers will see the key role analytics plays in their strategic IoT future.
DENIS CANTY is the Lead Technologist for Data Science and IoT Strategy with Tyco’s Innovation Garage, focused on sensing and building analytics applications to solve customer challenges. He holds a Masters in Computer Science and a Masters in Microelectronic IC Design. Denis is passionate about promoting STEM careers for future generations. Denis is on Twitter @deniscanty, and his blog is at deniscanty.wordpress.com.
For more insights on IoT, get your free copy of the new DZone Guide to The Internet of Things, Volume III!