How to Ensure Security of Your Real-Time Data Analytics Pipelines


Protecting sensitive IoT data at the cloud edge, including streaming data pipelines and real-time big data analytics.


Fog computing, a paradigm that aims to bring cloud-like services closer to users and data sources, has become quite popular among IoT firms. It lets companies react to events faster and act on perishable insights, and it also helps them conserve core network bandwidth.

Moving computations to where IoT data (in all its variety and velocity) is generated has some downsides too. Devices at the edge usually have low computing capacity, and because of slow uplinks they may receive security updates after substantial delays.

Besides that, all sensor data in IoT networks usually flows through multiple vulnerable infrastructure components, which opens a wide attack surface.

How to Go About Protecting Streaming Data?

First, we must talk about the unique characteristics of IoT data:

1) It comes in large volumes as there are, typically, myriads of devices in a connected network that are constantly emitting events.

2) The datasets are heterogeneous as the devices producing them are of different types.

3) Each data object may come with both location and time stamps attached to it.

4) IoT data is often very noisy as the devices producing and transmitting it (weak in terms of computational power) tend to distort it.

Another key point is that information from IoT devices (say at a manufacturing plant) requires a quick reaction as opposed to general big data that can be stored in remote data centers for days. The insights extracted from IoT data are typically extremely perishable.

Fog computing was created specifically to address the latency issue in cloud-based IoT networks, but, as it turns out, processing lots of information at the edge, one or a few hops from the IoT devices, is quite risky.

If an attacker manages to break into an IoT network, which shouldn’t be too hard to do given the devices’ vulnerability, they would not only access sensitive data; they could potentially corrupt datasets, send them to the cloud and thus compromise a firm's entire IoT deployment.

As far as protection goes, one of the most recent (and promising) approaches has been locking both edge data and computations into a Trusted Execution Environment (TEE), thus preventing their direct contact with the complex and untrustworthy infrastructure components at the edge, such as commodity operating systems and user libraries.

Applying this strategy, however, also creates challenges: it’s tough to run high-throughput analytics inside a single, isolated TEE because the TEE is constrained by a small trusted computing base (TCB). Also, since verifying the processing necessarily involves elements outside the TEE, it’s hard to eliminate the risks completely.

Heejin Park et al. suggest resolving these challenges by implementing a streaming solution whose data plane incurs insignificant overhead and supports remote attestation via cloud verifiers.

The data plane, according to the researchers, has a very narrow interface, so it barely interacts with the vulnerable elements in the network. It encloses sensor data, low-level streaming algorithms (trusted primitives), and key runtime functions. Everything else, including scheduling and threading, is left outside the TEE in this scenario for the sake of security.

To handle the velocity of IoT data, a programming abstraction called uArrays is used (these are unbounded buffers that can encapsulate all analytics data).
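To make the idea of an unbounded buffer more concrete, here is a minimal sketch in Python. It is not the researchers' implementation: the class name `UnboundedBuffer`, the `seal` step, and the `windowed_sum` primitive are all hypothetical, and a real TEE data plane would be written in a systems language with its own memory management. The sketch only shows the shape of the abstraction: a producer appends to a contiguous buffer that grows on demand, and a low-level primitive consumes the buffer once it is complete.

```python
from array import array


class UnboundedBuffer:
    """Append-only buffer of 64-bit integers that grows on demand (hypothetical)."""

    def __init__(self):
        self._items = array("q")  # contiguous storage for signed 64-bit values
        self._sealed = False

    def append(self, value: int) -> None:
        if self._sealed:
            raise RuntimeError("buffer is sealed; no further appends allowed")
        self._items.append(value)

    def seal(self) -> None:
        """Mark the buffer as complete so a primitive can consume it."""
        self._sealed = True

    @property
    def sealed(self) -> bool:
        return self._sealed

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


def windowed_sum(buf: UnboundedBuffer) -> int:
    """Toy 'trusted primitive': consumes one sealed buffer and returns one value."""
    if not buf.sealed:
        raise RuntimeError("primitive can only consume a sealed buffer")
    return sum(buf)
```

Sealing the buffer before consumption is a design choice of this sketch: it keeps producers and consumers from touching the same buffer at once without needing locks inside the trusted code.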

To ensure processing verification, the solution supports cloud verifiers that attest to two things:

1) that the stream pipeline has low output delays;

2) that all the ingested data is processed according to the pipeline declaration.

The proposed system captures coarse-grained data flows, produces audit records, compresses them through a domain-specific encoding, and then sends them out to the cloud for verification.
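The audit-and-verify loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the encoding or verifier from the research: the `AuditRecord` fields, the operator numbering, the delta-plus-deflate encoding, and the `verify` function are all hypothetical. The edge side packs coarse-grained records compactly, and the cloud side decodes them and checks the two properties listed above: bounded output delay and conformance to the declared pipeline.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    op_id: int      # which declared operator ran (hypothetical numbering)
    batch_id: int   # which ingested batch it consumed
    ingest_ms: int  # when the batch entered the data plane
    emit_ms: int    # when the operator produced its output


# op_id, batch_id, ingest delta vs. previous record, processing delay
_FMT = "<HIII"


def encode(records):
    """Delta-encode timestamps, pack to fixed-width binary, then deflate.

    Assumes records are ordered by non-decreasing ingest_ms.
    """
    out, prev = bytearray(), 0
    for r in records:
        out += struct.pack(_FMT, r.op_id, r.batch_id,
                           r.ingest_ms - prev, r.emit_ms - r.ingest_ms)
        prev = r.ingest_ms
    return zlib.compress(bytes(out))


def decode(blob):
    """Inverse of encode(): rebuild absolute timestamps from the deltas."""
    prev, records = 0, []
    for op_id, batch_id, d_ingest, delay in struct.iter_unpack(_FMT, zlib.decompress(blob)):
        prev += d_ingest
        records.append(AuditRecord(op_id, batch_id, prev, prev + delay))
    return records


def verify(records, pipeline, ingested_batches, max_delay_ms):
    """Cloud-side check. Returns a list of violations; empty means both checks pass."""
    problems, seen = [], {}
    for r in records:
        seen.setdefault(r.batch_id, []).append(r.op_id)
        if r.emit_ms - r.ingest_ms > max_delay_ms:
            problems.append(f"batch {r.batch_id}: op {r.op_id} exceeded delay bound")
    for b in ingested_batches:
        if seen.get(b, []) != list(pipeline):
            problems.append(f"batch {b}: ran {seen.get(b, [])}, declared {list(pipeline)}")
    return problems
```

A verifier would call `verify(decode(blob), pipeline=[1, 2], ingested_batches=[...], max_delay_ms=100)` on each uploaded blob and treat any returned violation as a failed attestation. In a real deployment the blob would also need to be signed inside the TEE so the untrusted edge software cannot forge it; that step is omitted here.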

To maintain security, the data plane only ingests data through a trusted IO.
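The article does not say how the trusted IO path is built, and in practice it usually relies on hardware isolation of the peripheral rather than on software checks. As a loose software stand-in only, the sketch below accepts a reading only when it carries a valid HMAC tag computed with a key shared between the sensor and the data plane. The function names and the message layout (payload followed by a 32-byte tag) are assumptions made for this example.

```python
import hashlib
import hmac

TAG_LEN = 32  # SHA-256 digest size in bytes


def seal_reading(key: bytes, payload: bytes) -> bytes:
    """What a hypothetical sensor-side driver would send: payload plus MAC tag."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def ingest(key: bytes, message: bytes) -> bytes:
    """Return the payload only if its tag verifies; otherwise refuse it."""
    if len(message) < TAG_LEN:
        raise ValueError("message too short to carry a tag")
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched
    if not hmac.compare_digest(tag, expected):
        raise ValueError("tag mismatch: reading rejected")
    return payload
```

The point of the sketch is the policy, not the mechanism: data that cannot prove it arrived through the trusted path never reaches the trusted primitives.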

Because physical memory in TEEs is limited, the solution also needs its own memory allocator to deal with the high-velocity data streams in IoT networks. The allocator must be lightweight enough to suit a reduced TCB, produce compact memory layouts, and reclaim memory quickly.

Generic allocators, such as the ones used in popular streaming engines today, wouldn’t work as practically all of them involve optimizations far too complex for a TEE.
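One simple design that meets these requirements is a region (bump-pointer) allocator. The sketch below is an assumption about what "lightweight" could look like, not the allocator the researchers built; the class and method names are hypothetical. Allocation is a single pointer increment over a fixed arena, the layout is compact because blocks sit back to back, and everything allocated after a saved mark is reclaimed in constant time.

```python
class RegionAllocator:
    """Bump-pointer allocator over one fixed arena (hypothetical sketch)."""

    def __init__(self, capacity: int):
        self._arena = bytearray(capacity)  # fixed size, never resized
        self._top = 0                      # offset of the next free byte

    def alloc(self, size: int) -> memoryview:
        """Hand out the next `size` bytes of the arena as a writable view."""
        if size < 0 or self._top + size > len(self._arena):
            raise MemoryError("arena exhausted")
        start = self._top
        self._top += size
        return memoryview(self._arena)[start:start + size]

    def mark(self) -> int:
        """Remember the current position, e.g. at the start of a window."""
        return self._top

    def release_to(self, mark: int) -> None:
        """Reclaim everything allocated after `mark` in O(1)."""
        if not 0 <= mark <= self._top:
            raise ValueError("invalid mark")
        self._top = mark

    @property
    def used(self) -> int:
        return self._top
```

This fits streaming workloads because data in a window tends to become garbage all at once: a pipeline can take a mark when a window opens and release back to it when the window's results have been emitted. The trade-off is that individual blocks cannot be freed out of order.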


Real-time big data analytics applications are quickly becoming popular among IoT firms, and so is the fog (edge) computing paradigm.

Processing large streams of data near the cloud edge seems compelling because it spares firms from transmitting large volumes of data to the cloud. However, it also exposes IoT networks to risks: the devices at the edge are computationally weak, and a powerful adversary could easily exploit their vulnerabilities to gain access to sensitive information or even corrupt entire IoT deployments.

The approach of locking both data and edge computations into a TEE, if implemented properly, can help companies quickly resolve these pressing security issues.


Opinions expressed by DZone contributors are their own.
