Mainframe Data Externalization: Breaking Data Silos

Mainframe data analytics is just as important as big data analytics. Learn about mainframe and legacy data externalization and see how to break down data silos.

We live in a data-driven world, and data plays a huge role in today’s businesses. The mainframe is one of the critical big data sources, but many enterprise platforms have no connectivity with it. Enterprises see tremendous opportunity in connecting the mainframe with newer platforms. Decision-makers need timely, accurate data to create value for the enterprise. To gain complete insight into the business, they must be able to access all relevant data, regardless of its source or originating format, and integrate it across social, mobile, and cloud technologies. Big data, which is usually unstructured, is grabbing headlines in the enterprise world. But the system-of-record data that exists on legacy mainframes is just as important, and big data analytics is incomplete without it. Like big data, mainframe data exists in large volumes in the form of transactional and log data. Organizations are investing more in solutions that integrate mainframe data with big data so that businesses can get a 360-degree view of the system by running analytics on a single platform.

The Need for Legacy Data Externalization

A large portion of the world’s enterprise data is still stored in mainframe systems, and organizations rely on this data for day-to-day operations. The business world knows that leveraging mainframe data in Hadoop can create a remarkable competitive advantage. Legacy data services can make mainframe data available across the enterprise, regardless of the platform, operating system, storage system, format, or original physical location of that data. That’s why organizations are now investing more in solutions that integrate mainframe data with other data sources and bring it onto a single platform for better analytics. Some of the reasons for mainframe data externalization are given below.

  • There are gaps between the mainframe and open-source platforms in data integration, expertise, and security.

  • Data analytics in the mainframe is cost-effective.

  • Externalized data can be visualized effectively using the latest technology.

  • The data formats differ: the mainframe stores data in EBCDIC, whereas most other platforms store it in ASCII.

  • The mainframe uses a variety of binary data types, such as packed decimal.

  • COBOL copybooks are typically used to define the layout of mainframe data files. These can be very complex and may contain logic such as nested OCCURS DEPENDING ON clauses.
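The format differences listed above can be illustrated with a minimal Python sketch. It assumes the text fields use EBCDIC code page 037 (Python's built-in `cp037` codec; other code pages are common) and that numeric fields use the usual packed-decimal (COMP-3) layout of two digits per byte with a trailing sign nibble. The function names and the `scale` parameter are illustrative only, not part of any mainframe tool.

```python
from decimal import Decimal


def decode_ebcdic(raw: bytes, codec: str = "cp037") -> str:
    """Decode an EBCDIC text field and strip trailing pad spaces.

    The code page is an assumption; sites may use a different one.
    """
    return raw.decode(codec).rstrip()


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Unpack a packed-decimal (COMP-3) field into a Decimal.

    Each byte holds two 4-bit digits; the last nibble is the sign:
    0xD means negative, 0xC or 0xF means positive or unsigned.
    `scale` is the number of implied decimal places from the copybook.
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()
    if sign not in (0x0C, 0x0D, 0x0F):
        raise ValueError(f"unexpected sign nibble: {sign:#x}")
    if any(digit > 9 for digit in nibbles):
        raise ValueError("invalid packed-decimal digit")
    value = int("".join(str(digit) for digit in nibbles) or "0")
    if sign == 0x0D:
        value = -value
    # Shift the implied decimal point into place.
    return Decimal(value).scaleb(-scale)


if __name__ == "__main__":
    print(decode_ebcdic(bytes.fromhex("C8C5D3D3D64040")))    # HELLO
    print(unpack_comp3(bytes.fromhex("12345C"), scale=2))    # 123.45
```

In practice the field offsets, lengths, and scales would be derived from the copybook rather than hard-coded by the caller.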

Approaches to Mainframe Data Externalization

Mainframe data externalization has become a trend, and organizations are investing more in it to gain business insights and improve customer service. Many enterprise and open-source solutions are available for data externalization, and choosing the right solution and tool requires a lot of research. The key trends in data services are to reduce corporate data silos to gain efficiency and productivity, to create a common data backbone for operational and informational use, and to create a bimodal IT in the enterprise modernization effort. Predominantly, mainframe data can be served in two ways:

  1. Data virtualization
  2. Data offloading

Data Virtualization

In today’s digital economy, companies need to quickly access data from different sources to gain real-time insights on enterprise applications. Data virtualization enables data structures that were designed independently to be leveraged together from a single source in real time, without data movement. With data virtualization, users have virtual access to mainframe data in real time without needing to know details such as how the data is formatted or where it is located. It eliminates the scalability bottlenecks associated with data connectors, as well as the cost and complexity of ETL.

Syncsort has contributed mainframe connector packages for Spark. Using the Spark mainframe connector, a framework can be developed to access mainframe data and other source data in real time.

Data Offloading

Data offloading is the traditional approach: data is moved periodically from the various source systems to a target system. It involves several phases, such as connect, extract, transform, and load. With evolving business needs, enterprises are searching for new ways to integrate mainframe data into big data analytics platforms. To create such solutions, enterprises can either adopt market-leading tools or build a customized data pipeline. A customized pipeline typically involves the following steps:

  • Export the VSAM files to flat files using the IDCAMS utility (REPRO).
  • Download DB2/IMS tables to flat files.
  • Convert the packed/binary data using SORT utilities.
  • Transfer the data via FTP, SFTP, or FTPS.
  • Schedule all the jobs to run at appropriate intervals as needed.
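
Once the flat file has landed on the target system, the transform phase can be sketched as follows. This is only an illustration under stated assumptions: the file holds fixed-length records without line terminators, packed and binary fields have already been converted to displayable characters on the mainframe side, and the record layout shown here (`account_id`, `customer_name`, `balance`) is hypothetical. A real layout would be taken from the COBOL copybook.

```python
import csv
import io
from typing import Iterator, List, NamedTuple


class Field(NamedTuple):
    name: str
    start: int   # zero-based offset within the record
    length: int  # field width in characters


# Hypothetical layout for illustration; derive the real one from the copybook.
LAYOUT = [
    Field("account_id", 0, 6),
    Field("customer_name", 6, 10),
    Field("balance", 16, 9),
]
RECORD_LENGTH = 25


def iter_records(data: bytes, record_length: int) -> Iterator[bytes]:
    """Yield fixed-length records from a flat-file dump."""
    if len(data) % record_length:
        raise ValueError("file size is not a multiple of the record length")
    for offset in range(0, len(data), record_length):
        yield data[offset:offset + record_length]


def parse_record(record: bytes, layout: List[Field], codec: str = "ascii") -> dict:
    """Slice one record into named fields and trim pad spaces.

    Pass codec="cp037" if the file was transferred in binary mode
    and is still EBCDIC-encoded (code page is an assumption).
    """
    text = record.decode(codec)
    return {f.name: text[f.start:f.start + f.length].strip() for f in layout}


def to_csv(data: bytes, layout: List[Field] = LAYOUT,
           record_length: int = RECORD_LENGTH) -> str:
    """Convert a fixed-width dump into CSV text ready for loading."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[f.name for f in layout],
                            lineterminator="\n")
    writer.writeheader()
    for record in iter_records(data, record_length):
        writer.writerow(parse_record(record, layout))
    return out.getvalue()
```

For example, `to_csv(b"000123JOHN DOE  000150.75")` yields a header row followed by one data row. The resulting CSV can then be loaded into the target platform by whichever scheduler runs the rest of the jobs.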

