Why the Industrial IoT World Needs Open Source to Innovate

Learn more about why IIoT needs open-source solutions.

This article is featured in the new DZone Guide to IoT: Connecting Devices & Data. Get your free copy for insightful articles, industry stats, and more!

The industrial world has a long history of modernizing its process controls to keep production running efficiently and safely while minimizing downtime. Yet many operators are locked into established data historian solutions that are costly and lack the means to support innovation and interoperability. In contrast, open-source software, which is built on the foundation of community, inherently provides diverse design perspectives not available from a single software vendor. It provides freedom from vendor lock-in, which leaves you free to integrate with other solutions. And finally, open-source software provides customization, allowing you to adapt the code to fit your ever-changing system requirements (which is not easy with proprietary systems). In this article, we will examine what the existing solutions lack and review a few open-source projects that operators should consider for future success.

Current Situation

Industrial organizations around the world, large and small, have been working with a number of solutions to digitally transform their manufacturing processes. Most use a system of software and hardware components called Supervisory Control and Data Acquisition (SCADA) to help control machinery and systems in a factory in real time. In particular, these systems control processes locally or remotely by gathering event data from sensors, valves, pumps, motors, and recordings. In addition, the relevant data is presented to the operator, who makes decisions about the machinery to keep it running optimally. Many industries, including energy production, manufacturing, and food and beverage, rely on SCADA systems to collect event data such as:

  • Instrument readings (flow rate, valve position, temperature)
  • Performance monitoring (units/hour, machine utilization vs. capacity, scheduled vs. unscheduled outages)
  • Environmental readings (weather, atmospheric conditions, groundwater contamination)
  • Production status (machine up/down, downtime reason tracking)

All process and event data include a value and a timestamp and are stored in a data historian to show trends per machine or across a collection of machines. A data historian is a time-series database, and as such, needs to allow for fast ingest and query of data in near real-time and provide compression of the data to minimize storage.
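
To make the storage model concrete, here is a minimal sketch of a historian-style store in Python. It is an illustration only, not how any commercial or open-source historian is implemented: the class name `TinyHistorian`, its methods, and the machine and metric names are all hypothetical. It keeps one time-ordered list of samples per machine and metric, answers range queries with a binary search, and averages a metric across machines.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from statistics import mean


class TinyHistorian:
    """Toy in-memory store: one time-ordered series per (machine, metric)."""

    def __init__(self):
        self._times = defaultdict(list)   # series key -> sorted timestamps
        self._values = defaultdict(list)  # series key -> values aligned with timestamps

    def write(self, machine, metric, timestamp, value):
        key = (machine, metric)
        # Find the slot that keeps timestamps sorted, even for late-arriving data.
        i = bisect_right(self._times[key], timestamp)
        self._times[key].insert(i, timestamp)
        self._values[key].insert(i, value)

    def query(self, machine, metric, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        key = (machine, metric)
        times = self._times[key]
        lo = bisect_left(times, start)
        hi = bisect_left(times, end)
        return list(zip(times[lo:hi], self._values[key][lo:hi]))

    def fleet_mean(self, metric, start, end):
        """Average a metric across every machine that reported it in the window."""
        values = [
            value
            for (machine, name) in list(self._times)
            if name == metric
            for _, value in self.query(machine, name, start, end)
        ]
        return mean(values) if values else None


historian = TinyHistorian()
historian.write("pump-1", "flow_rate", 20, 5.0)
historian.write("pump-1", "flow_rate", 10, 4.0)   # arrives late, still stored in order
historian.write("pump-2", "flow_rate", 15, 6.0)
print(historian.query("pump-1", "flow_rate", 0, 30))
print(historian.fleet_mean("flow_rate", 0, 30))
```

A real historian adds durable storage and compression on top of this idea, but the per-series, time-ordered layout is what makes trend queries cheap.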

There are many commercial data historian solutions, and several have been on the market for some time, yet all of them come with a number of challenges, primarily cost, vendor lock-in, and scalability.

Cost: These solutions are not cheap. They carry an annual license and support fee and are costly to set up and maintain. Moreover, custom development on top of these off-the-shelf products is common and may require outside consulting resources. And because these solutions are proprietary systems, that work is time-consuming and expensive.

Vendor Lock-in: These solutions are often Windows-based and do not offer a simple, open API for other software to interface with. This means you can integrate only with components bought from a single vendor, which locks you into a proprietary solution.

Scalability: Collecting event data from your equipment is just the beginning. True digital transformation requires more data sources and more analysis of the combined data to gain a better understanding of your systems. Doing so with the existing solutions will require the vendors to create (and charge for) new interfaces for data import. The good news is that this data is easy to export to spreadsheets. The bad news is that spreadsheets only give you a static view. What is required instead are modern dashboarding engines, which have made exporting large time series datasets obsolete. Ultimately, with all of this data coming in, you can no longer rely on manual techniques for analysis.

An Open-Source Alternative

To contend with these challenges, the industrial sector should consider new ways of optimizing operations, including trying open-source solutions. Previously, open source carried the stigma of being the cheap alternative to proprietary software. Today, open source is at the heart of innovation in organizations, as it allows developer teams to bring ideas to fruition quickly.


Built for developers, InfluxDB is an open-source time series database that provides high-throughput ingest, compression, and real-time querying of time-stamped data.

Efficiency and effectiveness have to start in the data structure, ensuring that time-stamped values are collected with the necessary precision and metadata to provide flexibility and speed in graphing, querying and alerting. The InfluxDB data model takes the following form:

<measurement name>,<tag set> <field set> <timestamp>

The measurement name is a string, the tag set is a collection of key/value pairs where all values are strings, and the field set is a collection of key/value pairs where the values can be int64, float64, bool, or string. Support for data encoding beyond float64 values means that metadata can be collected along with the time series, and not limited to only numeric values. In addition, there are no hard limits on the number of tags and fields.

Having multiple fields and tags under the same measurement optimizes the transmission of the data, which is important for remote devices sending metrics. The measurement name and tag sets are kept in an inverted index, which makes lookups for specific series very fast. Battery metrics reported by a remote device are one example of data that maps naturally onto InfluxDB line protocol.
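
The idea of an inverted index over measurement names and tag sets can be illustrated with a toy version. This sketch is not InfluxDB's actual index: the `SeriesIndex` class, the battery series, and the tag names are assumptions made for the example. Each tag key/value pair maps to the set of series that carry it, so a lookup is a handful of set intersections rather than a scan over every series.

```python
from collections import defaultdict


class SeriesIndex:
    """Toy inverted index: (tag key, tag value) -> set of series ids."""

    def __init__(self):
        self._series = []                        # series id -> (measurement, tags)
        self._by_measurement = defaultdict(set)  # measurement -> series ids
        self._by_tag = defaultdict(set)          # (key, value) -> series ids

    def add(self, measurement, tags):
        series_id = len(self._series)
        self._series.append((measurement, dict(tags)))
        self._by_measurement[measurement].add(series_id)
        for pair in tags.items():
            self._by_tag[pair].add(series_id)
        return series_id

    def find(self, measurement, **tags):
        """Return ids of series matching the measurement and every given tag."""
        matches = set(self._by_measurement.get(measurement, ()))
        for pair in tags.items():
            matches &= self._by_tag.get(pair, set())
        return sorted(matches)


index = SeriesIndex()
index.add("battery", {"device": "sensor-1", "site": "north"})
index.add("battery", {"device": "sensor-2", "site": "north"})
index.add("battery", {"device": "sensor-3", "site": "south"})
index.add("temperature", {"device": "sensor-1", "site": "north"})
print(index.find("battery", site="north"))                     # [0, 1]
print(index.find("battery", site="north", device="sensor-2"))  # [1]
```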



Factry Historian is a data collection platform for production systems. Factry uses InfluxDB to store time series data and has open-sourced two projects to collect and expose time series data using the OPC-UA standard. The OPCUA-to-InfluxDB open-source app polls a number of PLC values and writes the data into the database. If the database is unavailable, it buffers the data in a local database. The following configuration shows the types of values it is able to collect and store.

name = "temperature"
tags = { equipment = "TANK42" }
nodeId = "ns=3;s=PLC_TANKS.db103.16,r"
collectionType = "polled"
pollRate = 12 # samples / minute.
deadbandAbsolute = 0 # Absolute max difference for a value not to be collected
deadbandRelative = 0.0 # Relative max difference for a value not to be collected
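
The polling rate and deadband settings can be read as a simple filter, sketched below in Python. This is an interpretation of the configuration comments, not the app's actual code, and it rests on several assumptions: the function names are hypothetical, the relative deadband is taken as a fraction of the last collected value, the two deadbands are combined by taking the larger threshold, and a sample whose difference is less than or equal to the threshold is skipped.

```python
def poll_interval_seconds(poll_rate_per_minute):
    """Convert a poll rate in samples per minute into seconds between polls."""
    return 60.0 / poll_rate_per_minute


def apply_deadband(samples, deadband_absolute=0.0, deadband_relative=0.0):
    """Keep a sample only if it moved beyond the deadband since the last kept sample.

    samples: iterable of (timestamp, value) pairs in time order.
    """
    kept = []
    last = None
    for timestamp, value in samples:
        if last is not None:
            # Assumption: the effective threshold is the larger of the two deadbands.
            threshold = max(deadband_absolute, deadband_relative * abs(last))
            if abs(value - last) <= threshold:
                continue  # change is too small, do not collect
        kept.append((timestamp, value))
        last = value
    return kept


readings = [(0, 20.0), (5, 20.2), (10, 20.6), (15, 20.6), (20, 19.9)]
print(poll_interval_seconds(12))                        # 5.0
print(apply_deadband(readings, deadband_absolute=0.5))  # [(0, 20.0), (10, 20.6), (20, 19.9)]
```

Under these assumptions, a deadband of 0 still drops exact repeats of the last collected value, which keeps a flat signal from filling the database with identical points.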

Factry also open-sourced an OPC-UA server that exposes the stored data, so a SCADA system can connect to this server and read the data at the interval it needs.

The Factry solution can serve as a model for creating a data historian replacement, using open-source projects for data collection and storage.
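
The buffering behavior described earlier, where data is held in a local database while the main database is unavailable, is a store-and-forward pattern. The sketch below shows one way it could look in Python using the standard `sqlite3` module. It is not Factry's implementation: the `BufferedWriter` class is hypothetical, and `send` stands in for whatever function actually writes a batch of lines to the time series database and raises `ConnectionError` on failure.

```python
import sqlite3


class BufferedWriter:
    """Forward lines to `send`; park them in SQLite whenever `send` fails."""

    def __init__(self, send, path=":memory:"):
        self._send = send  # hypothetical callable: takes a list of lines
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS buffer ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, line TEXT NOT NULL)"
        )

    def write(self, line):
        """Return True if the line reached the database, False if it was buffered."""
        # Drain the backlog first so points reach the database in order.
        if self.flush():
            try:
                self._send([line])
                return True
            except ConnectionError:
                pass
        with self._db:  # commits the insert
            self._db.execute("INSERT INTO buffer (line) VALUES (?)", (line,))
        return False

    def pending(self):
        return self._db.execute("SELECT COUNT(*) FROM buffer").fetchone()[0]

    def flush(self):
        """Try to send the backlog; return True when the buffer is empty."""
        rows = self._db.execute("SELECT id, line FROM buffer ORDER BY id").fetchall()
        if not rows:
            return True
        try:
            self._send([line for _, line in rows])
        except ConnectionError:
            return False
        with self._db:
            self._db.execute("DELETE FROM buffer WHERE id <= ?", (rows[-1][0],))
        return True
```

Passing a file path instead of `:memory:` would let the backlog survive a restart of the collector, which is usually the point of buffering on disk.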

Grafana Labs

Grafana is an open-source tool for visualizing time series data. It connects directly to your data source to help you create visualizations that show data in near real time, instead of the stale data that you present with spreadsheets. Grafana has a number of visualization options to help you understand your data and is backed by a vibrant community that contributes dashboards and plugins for a variety of use cases.

The interesting thing about dashboards is that they help to fuel collaboration and inquisitiveness, which can result in even more efficiencies. With a simple point-and-click interface, you will find all users starting to dig into the data to find areas to improve upon.

Loud ML

Loud ML is an open-source deep learning API that makes it simple to prepare, train, and deploy machine learning models and crunch the data stored in a number of data stores. The user selects the time series to model and sets the model date ranges; Loud ML then builds the models and saves them for inference in production. You can add predictive capabilities and machine learning to your application in minutes and use it for:

  • Forecasting capacity, usage, and load imbalance for energy producers and suppliers
  • Forecasting demand for inventory and supply chain optimization
  • Predicting equipment failure for maintenance operations planning

The user interface makes it easy to start building powerful predictions of your operational data.
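
The workflow of selecting a series, setting a training date range, building a model, and then using it for inference can be illustrated with a deliberately simple stand-in. The sketch below does not use Loud ML's API or its deep learning models. It swaps in a seasonal-average baseline (the mean value for each hour of the day), and all function names and parameters are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def train_profile(samples, start, end, period, bucket):
    """Fit a seasonal profile: the mean value for each bucket within the period.

    samples: iterable of (timestamp_seconds, value) pairs
    start, end: training range, start <= timestamp < end
    period: season length in seconds (86400 for a daily pattern)
    bucket: bucket width in seconds (3600 for hourly buckets)
    """
    groups = defaultdict(list)
    for ts, value in samples:
        if start <= ts < end:
            groups[(ts % period) // bucket].append(value)
    return {slot: mean(values) for slot, values in groups.items()}


def predict(profile, ts, period, bucket):
    """Forecast the value at `ts`, or None if that slot was never seen in training."""
    return profile.get((ts % period) // bucket)


def is_anomalous(profile, ts, value, period, bucket, tolerance):
    """Flag a reading that strays more than `tolerance` from the forecast."""
    expected = predict(profile, ts, period, bucket)
    return expected is not None and abs(value - expected) > tolerance


DAY, HOUR = 86400, 3600
# Two days of made-up hourly load readings; day two runs 2 units higher.
history = [(d * DAY + h * HOUR, h + 2.0 * d) for d in range(2) for h in range(24)]
model = train_profile(history, 0, 2 * DAY, DAY, HOUR)
print(predict(model, 2 * DAY + 5 * HOUR, DAY, HOUR))  # 6.0
```

A learned model would capture trends and interactions that this baseline cannot, but the train-then-infer shape of the code is the same.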


Open-source tools are powerful and easy to use and have been shown to make a difference in various industries, leveling the playing field so that startups can compete with established players.

These solutions are quick to set up and get started with, but more importantly, they will work in your environment and produce improvements that quickly start to appear in your bottom line.
