Data Management for Industrial IoT

Table of Contents

What Is Industry 4.0?
Time-Series Data and Databases
Getting Started
Building With a Time-Series Database
Conclusion

Section 1

What Is Industry 4.0?

Advances in technology continue to drive change in industrial operations. As businesses seek to leverage these advances, it’s important to understand how different technologies impact their operations. Instrumentation, i.e., the use of sensors to measure different aspects of a process, is a key area in the evolution of the industrial internet of things (IIoT). Sensors and applications generate a lot of data, and that data has the potential to dramatically affect industrial processes. Businesses need to understand the characteristics and shape of that data, as well as how to effectively analyze and apply it to drive improvements.

One shared characteristic of data generated by instrumentation is that it is time-stamped. Time-series data, therefore, functions as a critical piece in industrial observability and optimization. Industrial operators need to understand this type of data and how to work with it to maximize its vast potential. In this Refcard, we’ll discuss one of the key infrastructure components that contributes to IIoT success: a time-series database.

To understand why the time-series database is so integral, we’ll take a brief look at what is being labeled as “industry 4.0” and how it relates to IIoT. Next, we’ll discuss the characteristics of time-series data and why you should replace legacy data historians in order to achieve industry 4.0 goals. Finally, we’ll turn our attention to getting started with open-source, time-series databases, how to ingest time-series data, and important considerations for building IIoT applications based on time-series data.

The industrial world is one that values consistency and predictability. The evolution of industrialization demonstrates this: What started with basic mechanization came to embrace the assembly line, followed by the use of computers and robots. Today, we’re in the midst of the fourth wave of the industrial revolution, in which autonomous systems fed with raw and trained data (i.e., machine learning) enhance manufacturing processes. The goal of automation at this level is to keep industrial production running efficiently and safely while minimizing downtime.

Industry 4.0 has a direct impact on IIoT because the technological tools and advances of industry 4.0 provide businesses with the data and analysis they need to make informed decisions about mission-critical processes. Instrumentation (i.e., putting sensors on things) is the heart of IIoT. These sensors generate large volumes of time-stamped data that tell businesses how their equipment, machines, and systems are functioning.

Time-series data, therefore, functions as a critical component of industry 4.0. Fortunately, the basic design principles behind industry 4.0 dovetail with the characteristics of time-series data:

Interconnection – The ability to have devices, sensors, and people connect and communicate with each other.
Information transparency – Interconnection allows for the collection of large amounts of data from all points of the manufacturing process. Making this data available to industrial operators provides them with an informed understanding that can aid in the identification of areas of innovation and improvement.
Technical assistance – The ability to aggregate and visualize the data collected with a centralized dashboard allows industrial operators to make informed decisions and solve urgent issues on the fly. Furthermore, centralized data views help industrial operators avoid conducting a range of tasks that are unpleasant or unsafe for them to perform.
Decentralized decisions – The ability for systems to perform their tasks autonomously based on data collected. These systems only require human input for exceptions.

Moving from 3.0 to 4.0

The shift to industry 4.0 is ongoing and presents many challenges. Oftentimes, it’s not possible or practical for businesses to implement wholesale upgrades to their industry 3.0 processes to make them compatible with industry 4.0 specifications. Instead, they upgrade one piece at a time, and this piecemeal process exacerbates pre-existing issues with legacy systems.

A quick look at a typical enterprise control system stack helps to illustrate this challenge. Industrial organizations of all sizes all around the world work with any number of solutions to digitally transform their manufacturing processes. The following is a simple depiction of the enterprise control system stack as described by ISA-95.

SCADA Systems

At the control systems level, a system of software and hardware components called Supervisory Control and Data Acquisition (SCADA) helps control machinery and systems in real time. These SCADA systems control processes locally by gathering and recording event data from sensors, valves, pumps, and motors. The SCADA system presents relevant data to the operator locally so that they can make decisions to keep machinery running optimally.

Many industries rely on SCADA systems — including energy producers, manufacturing, and food and beverage — to collect event data such as:

Instrument readings (e.g., flow rate, valve position, temperature)
Performance monitoring (e.g., units/hour, machine utilization vs. capacity, scheduled vs. unscheduled outages)
Environmental readings (e.g., weather, atmospheric conditions, groundwater contamination)
Production status (e.g., machine up/down, downtime reason tracking)

Section 2

Time-Series Data and Databases

Metric and event data generated by industrial processes, such as SCADA data, includes a value and a timestamp. This time-stamped data is the core of IIoT automation in industry 4.0 as it fuels the analytics, visualization, modeling, and prediction capabilities in IIoT use cases. This data is stored in a data historian, which is essentially a time-series database. However, industrial organizations need a data historian that can function in an industry 4.0 context in order to use that data effectively.

Characteristics of Time-Series Data

Data historians exist because time-series data typically gets used differently than other data workloads; it requires a database optimized to handle time-series data. With time-series data, the most recent data is the most relevant. As a result, time-series data uses data lifecycle management, summarization, and large range scans of many records in ways that other data types do not.

With time series, high-precision data tends to be retained for a short period of time. This data gets aggregated and downsampled to indicate long-term trends. This usually means that every data point that goes into the database gets deleted after a set period. However, not all time-series use cases maintain their time-series data the same way:

	DevOps use case	IIoT use case
Metrics	CPU load, disk I/O, database stats	Temperature, pressure, flow, valve state
Resolution	Seconds to minutes	(Sub) seconds
Retention	Weeks, then downsampled	5 to 10 years, no downsampling
Main goals	Incident detection, performance monitoring	Quality guarantee, Overall Equipment Effectiveness (OEE), predictive maintenance

Different business and use case goals affect both the resolution and retention of the data. Regardless of the use case, regularly deleting data in this type of lifecycle management process is difficult to implement on top of regular databases. A non-time-series database requires schemes for effectively evicting large sets of data and constantly summarizing that data at scale. 

Similarly, aggregation and downsampling performed on time-series data occur within a specific time window. This requires using a range of data points to perform a computation, like a percentile, to generate a summary of the underlying series for the user. This kind of workload is very difficult to optimize for a distributed key-value database.
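To make the windowed aggregation and retention ideas concrete, here is a minimal sketch in plain Python. It is not how any particular time-series database implements these features; the function names (percentile, downsample, apply_retention), the 60-second window, and the sample readings are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def downsample(points, window_s, aggregate):
    """Group (unix_seconds, value) points into fixed windows and summarize each one."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return {start: aggregate(values) for start, values in sorted(buckets.items())}


def apply_retention(points, now_s, max_age_s):
    """Keep only the points that are no older than max_age_s."""
    return [(ts, value) for ts, value in points if now_s - ts <= max_age_s]


# Hypothetical temperature readings taken every 15 seconds.
raw = [(0, 20.0), (15, 22.0), (30, 21.0), (45, 25.0), (60, 30.0), (75, 32.0)]

mean_per_minute = downsample(raw, 60, mean)
p95_per_minute = downsample(raw, 60, lambda values: percentile(values, 95))
recent = apply_retention(raw, now_s=100, max_age_s=60)

print(mean_per_minute, p95_per_minute, recent)
```

Even in this small form, every summary needs all the points in a window, and retention needs a scan by time. Those are the access patterns a time-series database is built around.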

These are some of the reasons that industrial operators rely on data historians to store and manage time-series data. However, it’s critical to have a data historian that can meet the evolving needs of modern industrial operations.

Replace Your Legacy Data Historian

The specialized nature of industrial equipment means that many industrial organizations use legacy data historians. These systems work well in the context of Industry 3.0 because they integrate with other operational technology systems. However, as industrial operations evolve, these legacy data historians cannot keep pace with the changes and technologies that companies seek to implement in an Industry 4.0 context.

There are many commercial data historian solutions available, yet they come with several challenges that impact their ability to function effectively with other industry 4.0 solutions. Some reasons you may want to replace your legacy data historian with an alternative solution, like an open-source, time-series database, include:

Cost – These solutions are expensive to set up and maintain, plus they charge annual license and support fees. Most installations of legacy data historians require custom development work to fit the needs of a specific business or process and may require external consulting resources. The proprietary nature of these systems means this work is time-consuming and expensive.
Vendor lock-in – These solutions are often Windows-based and do not offer a simple, open API to interface with other software. Therefore, you need to buy all integrations and components from a single vendor, locking you into a proprietary solution.
Scalability – Scalability issues can stem from both commercial and technical reasons. On the technical side, these legacy data historians were built with a limited dataset in mind. This creates problems when introducing advanced capabilities like artificial intelligence or machine learning (AI/ML), which require far more data to train models than legacy systems can handle.
Poor developer experience – Most legacy solutions have a traditional closed design with limited API support. As a result, it takes a lot of time and money to implement or integrate these systems. These closed-design solutions provide few built-in tools, have no developer community, and do not support a modular development approach, which limits developers’ ability to pick and choose the tools that best fit the needs of their organization.
Siloed data – SCADA makers may provide a data historian for their devices, but most industrial organizations that use a traditional manufacturing execution system (MES) consolidate all their data to a single on-premises data historian. However, the lack of a microservices architecture and open APIs, combined with extensive use of firewalls and subnets, typically keeps the data separated at the site level.

In short, without the ability to integrate with modern IT, cloud, or open-source software (OSS) solutions, legacy data historians do not provide the flexibility and connectivity necessary to evolve industrial operations. This significantly reduces the efficacy of these systems — and the data they contain — in an industry 4.0 context because the lack of interoperability inhibits innovation and limits observability.

Section 3

Getting Started

To effectively collect data from and for IIoT applications, we recommend using a data ingestion tool and a time-series database to handle all your data storage and management needs. You should also consider an open-source, time-series database as an industry-4.0-compatible replacement for your legacy data historian. We’ll look at some example open-source solutions and how to use them with your IIoT infrastructure.

Tagging Data

We’ve already discussed some of the unique characteristics of time-series data. Another important factor to consider when working with time-series data is the shape of the data you collect. Data in a relational database has a uniform shape defined by a schema. Time-series databases typically do not require a schema because the shape of the data can change.

Therefore, it’s important to think about how you want to organize your time-series data. One way to do this in a schema-less database is to use tags, which are a form of metadata.

In order to make effective and efficient use of time-series data, you need to make sure to collect time-stamped values with appropriate metadata. This metadata increases the speed and flexibility for graphing, querying, and alerting.

We can use an open-source, time-series database, like InfluxDB, as an example to see how metadata gets tied to time-series data. Consider the following data model:

<measurement name>,<tag set> <field set> <timestamp>

The characteristics of each component of the data model are:

Measurement name is a string
Tag set is a collection of key/value pairs where all values are strings
Field set is a collection of key/value pairs where the values can be int64, float64, bool, or string.

Often, time-series databases support data encoding beyond float64 values so that you can collect metadata associated with the time series that’s not limited to numeric values. This is helpful if you need to know if a valve is open or closed, for example.

Using multiple fields and tags for the same measurement optimizes data transmission, which is important for remote devices sending metrics. For example, if you collected battery metrics in InfluxDB, the data model might look something like this:

telemetry,product_id=1,battery_type=lead-acid voltage=10.1,current=-0.5,temperature=23.4 1464623548

Here, the measurement name is telemetry. There are two tags, product_id and battery_type, as well as three fields, voltage, current, and temperature. The final value is a Unix timestamp, here in seconds. Line protocol timestamps are plain integers, and the precision is specified when the data is written rather than as a suffix on the value.

Ingesting Data

Once you know how to format your data, use a data ingestion tool to ingest it into your instance of the time-series database. In our example, we will be using Telegraf, an open-source data ingestion tool. Telegraf is written in Go, compiles into a single binary with no external dependencies, and requires a minimal memory footprint.

The goal of an ingestion agent is to connect to data sources, which it can scrape for data to ingest. To simplify connectivity, an ingestion agent may use APIs or plugins to collect data from a range of sources, such as:

Databases: Connect to data sources like MongoDB, MySQL, Redis, and others to collect and send metrics.
Systems: Collect metrics from your modern stack of cloud platforms, containers, and orchestrators.
IoT sensors: Collect critical stateful data (pressure levels, temp levels, etc.) from IoT sensors and devices.
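The plugin idea can be sketched in a few lines of Python. This is an illustration of the general pattern only, not Telegraf's architecture or API: the IngestionAgent class, the two input functions, and the batch size are all hypothetical, and the "sources" return hard-coded readings in place of real connections.

```python
from typing import Callable, Dict, Iterable, List

Reading = Dict[str, object]


class IngestionAgent:
    """Polls registered input plugins and forwards readings to a sink in batches."""

    def __init__(self, sink: Callable[[List[Reading]], None], batch_size: int = 2):
        self._inputs: Dict[str, Callable[[], Iterable[Reading]]] = {}
        self._sink = sink
        self._batch_size = batch_size
        self._buffer: List[Reading] = []

    def register(self, name: str, plugin: Callable[[], Iterable[Reading]]) -> None:
        """Add an input plugin: any callable that returns readings when polled."""
        self._inputs[name] = plugin

    def collect_once(self, timestamp: int) -> None:
        """Poll every plugin once, tag readings with source and time, send full batches."""
        for name, plugin in self._inputs.items():
            for reading in plugin():
                self._buffer.append({"source": name, "time": timestamp, **reading})
        while len(self._buffer) >= self._batch_size:
            batch = self._buffer[: self._batch_size]
            self._buffer = self._buffer[self._batch_size :]
            self._sink(batch)

    def flush(self) -> None:
        """Send whatever is left in the buffer, even if it is not a full batch."""
        if self._buffer:
            self._sink(self._buffer)
            self._buffer = []


# Hypothetical inputs standing in for a sensor and a database.
def pressure_sensor() -> List[Reading]:
    return [{"pressure_kpa": 101.3}]


def database_stats() -> List[Reading]:
    return [{"connections": 12}, {"queries_per_s": 250}]


batches: List[List[Reading]] = []
agent = IngestionAgent(sink=batches.append, batch_size=2)
agent.register("sensor", pressure_sensor)
agent.register("database", database_stats)
agent.collect_once(timestamp=1464623548)
agent.flush()

print(batches)
```

Because each source sits behind the same small interface, adding a new data source means registering one more plugin rather than changing the agent.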

When looking for plugins, search for ones that leverage popular IIoT protocols and solutions. Some examples include:

Solution	Description
AMQP	Advanced Message Queuing Protocol (AMQP) is an open standard application layer protocol for message-oriented middleware. Companies often use it to stream data from IIoT processes through an AMQP 0-9-1 broker, like RabbitMQ.
Modbus	Modbus is often used to connect a plant or system supervisory computer with a remote terminal unit (RTU) in supervisory control and data acquisition (SCADA) systems. Modbus protocols include Modbus TCP and Modbus RTU/ASCII.
MQTT	The Message Queue Telemetry Transport (MQTT) protocol is a simple and lightweight messaging protocol ideal for IoT devices.
OPC-UA	The OPC-UA protocol is a leading protocol for connecting to industrial machinery.
lm-sensors	The lm-sensors package is a free, open-source software tool for Linux that provides tools and drivers for monitoring things like temperature, voltage, humidity, and fans.
Apache Kafka	Apache Kafka is a lightweight, open-source framework designed to handle real-time feeds. Kafka has higher throughput, superior reliability, and better replication characteristics than a lot of alternative solutions. With benefits like guaranteed ordering, highly efficient processing, zero message loss, and more, Kafka supports mission-critical IIoT use cases.
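Data arriving over these protocols often needs decoding before it becomes a usable field value. As one example, Modbus exposes data as 16-bit registers, so a 32-bit floating-point reading typically spans two registers. The sketch below assumes the high word comes first (big-endian word order), which is a device-specific choice you should confirm in the device documentation. The function names and register values are hypothetical.

```python
import struct


def registers_to_float(high_word: int, low_word: int) -> float:
    """Combine two 16-bit registers (assumed high word first) into a float32 value."""
    raw = struct.pack(">HH", high_word, low_word)
    return struct.unpack(">f", raw)[0]


def float_to_registers(value: float) -> tuple:
    """Inverse operation: split a float32 value into (high_word, low_word)."""
    raw = struct.pack(">f", value)
    return struct.unpack(">HH", raw)


# Hypothetical register pair read from a temperature sensor.
temperature = registers_to_float(0x41BB, 0x3333)
print(round(temperature, 2))
```

If a device uses the opposite word order, swap the two arguments before decoding. A wrong word order usually shows up as wildly implausible values.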

Section 4

Building With a Time-Series Database

Once you configure your time-series solution for data ingestion and storage, you can start to build applications and create graphs and visualizations. The possibilities here are endless, but the following are a few considerations for building applications that use time-series data, especially when working with an open-source, time-series database.

One thing to consider when working with an open-source solution is what language(s) it caters to. We recommend solutions with wide language support because this makes it easier for a larger number of developers to use the solution.

Visualization significantly helps you understand your time-series data. Make sure to spend some time thinking about how you want to present your data to maximize its effectiveness. Popular visualization tools, like Grafana and Seeq, or JavaScript frameworks, like Highcharts, enable you to create custom dashboards based on your data.

In addition to the IIoT protocols mentioned above, there are also IIoT platforms that provide native integrations with time-series databases. ThingWorx Kepware and HiveMQ, for example, are popular IIoT platforms that offer native integrations for a wide range of solutions from within their ecosystems. Balena, Losant, and Node-RED are additional IIoT platforms that simplify integrations through APIs.

Section 5

Conclusion

Developing and implementing industry 4.0 processes in your IIoT operations can unlock new capabilities, provide previously unavailable observability, and drive optimization. At the core of these processes is a continuous stream of sensor data that informs decisions and automated processes. When you can analyze and visualize all this data, it reveals aspects of industrial processes not previously observable.

Raw IIoT data is untapped potential. You must effectively analyze and leverage that data to drive improvements to your industrial operations. And to do that, you need tools that can handle data at scale. Equally important, these tools must offer agnostic interconnectivity with the modern IT, cloud, and OSS solutions that propel IIoT into industry 4.0. Limiting connectivity simply reduces the number of sources you can collect data from, resulting in an incomplete picture of your industrial operations.

By collecting and analyzing data from every aspect of your operation, you can optimize your IIoT processes and implement more accurate forecasting models and predictive maintenance schedules. Time-series data lies at the heart of this entire process, so select a time-series database with the tools, features, and interoperability necessary to leverage that data. That choice can directly affect your ability to evolve your IIoT operations in an industry 4.0 context.