Design Considerations for Real-Time Operational Data Architectures for Industry 4.0
We examine the rise of the Industry 4.0 era and how IoT and big data processes can help us meet its challenges.
The fourth Industrial Revolution brought cyber-physical systems to the manufacturing floor, leading to the production of data at an unprecedented volume. Most current statistical process control (SPC) systems were designed in the Industry 3.0 era, and they use only a fraction of the data produced on production lines to monitor production quality.
To reduce waste and increase yield, however, new-age manufacturing systems need real-time operational data replication and analysis. This article lists the key design considerations for a real-time operational data architecture.
Separate Storage and Compute
Building a centralized data lake or data warehouse to consolidate all machine and application data may be an easy first decision. However, it is worth carefully separating storage and compute to gain scalability and performance. Below are some of the benefits; a short sketch of the pattern follows the list.
Ability to store diverse data types – IoT in Industry 4.0 keeps introducing new data formats, and companies often realize a dataset's potential months or years after it is generated. Separating storage from compute allows organizations to onboard new data formats quickly without having to decide upfront how to process them.
Fast data replication – Most packaged data marts, where storage and compute are coupled, expect data to be prepared before it is ingested. Choosing platforms such as data lakes, where storage is decoupled from compute, allows companies to store any data format without preparation. Sometimes replication can be avoided altogether if a distributed query engine can work directly against distributed data sources.
Fast data retrieval – Once compute is separated from storage, it can be scaled up or down as needed. New-age distributed query engines come with sophisticated auto-scaling features.
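As an illustration, the sketch below uses PySpark to query machine data stored as Parquet files directly in object storage. It is a minimal sketch, not a reference implementation: the bucket path, column names, and the 85 °C spec limit are all hypothetical.

```python
# A minimal sketch of compute querying decoupled storage. The Spark cluster
# (compute) and the S3 bucket (storage) scale independently; new data can
# land in the bucket without touching the cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("line-quality-monitor").getOrCreate()

# Query raw machine data in place -- no upfront load into a warehouse.
readings = spark.read.parquet("s3a://plant-data/press-line/readings/")

# Count out-of-spec readings per machine (85.0 is a hypothetical limit).
out_of_spec = (
    readings
    .filter(readings.temperature_c > 85.0)
    .groupBy("machine_id")
    .count()
)
out_of_spec.show()
```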
Leverage Edge Computing
Unlimited computing is available in the cloud, which makes it tempting to send all the data there and let the cloud prepare it for consumption. At the outset, this looks like the right decision, since replication into the cloud is fast. However, replicated data is not immediately useful for consumption, as it still needs to be prepared. Moreover, continuous data preparation in the cloud is not cheap, and the cost grows in direct proportion to data volume.
Edge computing layers attached to the machines can solve this problem. Data can be curated and structured at the edge, close to the source, reducing the compute needed in the cloud and making the data ready for consumption much sooner. Whenever new intelligent machines are added to the production floor or the supply chain, computing cost goes up, so making the most of edge computing helps create a sustainable, cost-effective architecture for the industrial world. The sketch below illustrates the idea.
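A minimal sketch of edge-side curation, assuming each machine emits one raw JSON sample per second; the field names and the 60-sample window are hypothetical. Only the compact summary leaves the edge, so cloud compute and transfer costs stay flat as machines are added.

```python
import json
import statistics

def curate_window(raw_samples: list[str]) -> dict:
    """Reduce one window of raw sensor JSON to a compact, structured record."""
    records = [json.loads(s) for s in raw_samples]
    temps = [r["temperature_c"] for r in records]
    return {
        "machine_id": records[0]["machine_id"],
        "samples": len(temps),
        "temp_mean": round(statistics.fmean(temps), 2),
        "temp_max": max(temps),
    }

# One summary record per minute is sent to the cloud instead of 60 raw samples.
window = [json.dumps({"machine_id": "press-07", "temperature_c": 80 + i % 5})
          for i in range(60)]
print(curate_window(window))
```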
Metadata Management
There are multiple, well-documented advantages to maintaining metadata. Below are some of the advantages of centrally maintaining metadata in an industrial setting.
Distributed query systems – If metadata is centrally maintained, it can create a user experience of querying one single database even though the data is stored in distributed data marts. Frequently queried operational data can be stored in a database, while everything else is preserved in inexpensive object storage.
Data quality – Metadata can help validate data before it is ingested into data marts, leading to high data quality and reduced data-cleansing effort. Centralized metadata repositories, combined with API engines, can help enforce data quality throughout the enterprise.
Both of these advantages, distributed query systems and data quality, also help increase data retrieval speed. A short sketch of metadata-driven validation follows.
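This is a minimal sketch of validating records against central metadata before ingest. The registry here is a plain dict for illustration; a real deployment would fetch schemas from a metadata service, for example through an API engine. The dataset name, fields, and types are hypothetical.

```python
# Hypothetical central schema registry (would normally come from a service).
REGISTRY = {
    "press_readings": {
        "machine_id": str,
        "temperature_c": float,
        "cycle_count": int,
    }
}

def validate(dataset: str, record: dict) -> list[str]:
    """Return a list of violations; an empty list means the record may be ingested."""
    schema = REGISTRY[dataset]
    errors = [f"missing field: {f}" for f in schema if f not in record]
    errors += [
        f"bad type for {f}: expected {t.__name__}"
        for f, t in schema.items()
        if f in record and not isinstance(record[f], t)
    ]
    return errors

print(validate("press_readings",
               {"machine_id": "press-07", "temperature_c": "hot"}))
# -> ['missing field: cycle_count', 'bad type for temperature_c: expected float']
```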
Optimized File Formats
Optimized file formats are hugely popular in big data systems. However, their usefulness goes beyond big data storage: they also enable low-latency data transfers. For example, data can be converted to a columnar format close to the edge and transmitted to the destination, rather than sent in traditional formats such as CSV, XML, or JSON.
Using a columnar format, one can often reduce a data file to roughly one-fourth of its original size, and the reduced size expedites the transfer. If the destination is a data lake or a big data file system such as HDFS, these files can be stored without further formatting and are ready for immediate consumption. A minimal conversion sketch follows.
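A minimal sketch of the conversion step, assuming pyarrow is available on the edge device; the file names and their contents are hypothetical. The compressed columnar file is typically a fraction of the CSV size, which speeds up the transfer.

```python
import pyarrow.csv as pv
import pyarrow.parquet as pq

# Read the raw CSV produced by the machine...
table = pv.read_csv("press-07-readings.csv")

# ...and write it as compressed, columnar Parquet before transmission.
# A data lake or HDFS destination can consume this file as-is.
pq.write_table(table, "press-07-readings.parquet", compression="snappy")
```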
The points above are critical design considerations, whether you are creating a data system from scratch or implementing a packaged solution.