Bringing Transaction and AI Data Worlds Closer: A Notion of Integrated Data Platforms
As the data landscape continuously evolves, enterprises struggle to break down their data silos and get the most out of their data.
By Vaibhav S Dantale, Subhendu Dey, Sandipan Sarkar, and Ram Ravishankar
The recent accelerated adoption of hybrid multi-cloud and cloud-native application architectures is rapidly changing the data architecture landscape, affecting both the transactional and the analytical data processing domains.
Several factors are driving the new world of transaction processing (OLTP) applications: composable applications, new programming models, microservices- and API-based application architectures, event-driven transaction management patterns, the need for data at the edge, and horizontally scalable cloud resources. While relational databases still rule the world, new database types such as document, key-value, wide-column, and graph stores are evolving fast by leveraging cloud infrastructure architectures. Data exchange simplified with JSON is another key factor driving the new data models.
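As a minimal illustration of JSON-simplified data exchange (the field names here are hypothetical), a single self-contained document can carry what a relational model would normalize across several tables:

```python
import json

# Hypothetical order document as exchanged between services. A
# relational model would split this into orders, items, and customers
# tables; as JSON it travels as one self-contained payload.
order_doc = {
    "orderId": "ORD-1001",
    "customer": {"id": "CUST-42", "name": "Acme Corp"},
    "items": [
        {"sku": "SKU-1", "qty": 2, "unitPrice": 9.99},
        {"sku": "SKU-7", "qty": 1, "unitPrice": 24.50},
    ],
}

payload = json.dumps(order_doc)   # serialize for the wire
received = json.loads(payload)    # deserialize on the consumer side
total = sum(i["qty"] * i["unitPrice"] for i in received["items"])
print(round(total, 2))            # 44.48
```

Because the document carries its own structure, the consumer needs no shared relational schema to compute over it.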
It is not only the new applications accelerated by digital transformation initiatives (largely due to Covid) that drive this shift; the fast-growing modernization of monoliths also adds fuel to the fire for designing data models the new way.
In the analytical processing (OLAP) domain, AI/ML, the quest for intelligent workflows, and the three Vs (Velocity, Volume, and Variety) are the major factors driving new data models and management systems.
All that said, the value for businesses will be realized when intelligence is infused directly into the transaction processing domain. Dynamically and instantly shaping customer experience with intelligence drawn from analytics is what enterprises are striving for.
What if there is a data platform that can bring the two domains close together thus influencing customer experience?
New-Normal 1: Distributed Transactional Processing Systems
The hyper-modular application architectures built with microservices and other cloud-native styles are fundamentally altering the data-gravity picture in an enterprise. In this new normal, transactional data no longer gravitates into huge monoliths but is distributed across multiple cloud ecosystems as well as the edge. This drives new transaction processing architectures that require polyglot database technology, chosen by the type of data to be processed and by various non-functional requirements. While relational database technologies are still highly preferred, the adoption of NoSQL, NewSQL, and graph databases, along with object-store mechanisms like S3, is increasing dramatically. Also, when a monolithic database is broken into micro databases to support a distributed application architecture, maintaining a single version of the truth, the master data, in one place becomes a challenge. Changes to master data need to be immediately available across all distributed applications, which shifts the need for Master Data Management (MDM) capability from analytical systems to transactional systems.
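The requirement that a master-data change reach every distributed application immediately can be sketched as a tiny publish/subscribe hub (all names here are hypothetical, and a real MDM system adds matching, survivorship, and delivery guarantees on top):

```python
# Toy master-data propagation sketch: a change to a golden record is
# published once and fanned out to every subscribed application's
# local copy, so all apps see the same version of the truth.
class MasterDataHub:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, local_store):
        self.subscribers.append(local_store)

    def publish(self, key, record):
        for store in self.subscribers:  # fan out the golden record
            store[key] = record

hub = MasterDataHub()
orders_app, billing_app = {}, {}
hub.subscribe(orders_app)
hub.subscribe(billing_app)

hub.publish("CUST-42", {"name": "Acme Corp", "tier": "gold"})
print(orders_app == billing_app)  # True: both apps hold the same record
```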
To govern data models in distributed systems, an enterprise-wide consistent definition of data models, or a Canonical Data Model (CDM), is also experiencing a shift-left trend; data cataloging is one way to achieve it.
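A canonical data model check can be sketched as follows (the entity, field names, and types are hypothetical): a catalog-published definition lets each distributed service validate its records against the shared shape before exchanging them.

```python
# Hypothetical canonical data model (CDM) entry, as a data catalog
# might publish it: the agreed field names and types for "Customer".
CDM_CUSTOMER = {"id": str, "name": str, "country": str}

def conforms_to_cdm(record: dict, cdm: dict) -> bool:
    """Shift-left governance check: does the record match the CDM?"""
    return set(record) == set(cdm) and all(
        isinstance(record[field], ftype) for field, ftype in cdm.items()
    )

good = {"id": "C-1", "name": "Acme", "country": "DE"}
bad = {"id": "C-2", "full_name": "Beta"}  # drifted field name

print(conforms_to_cdm(good, CDM_CUSTOMER))  # True
print(conforms_to_cdm(bad, CDM_CUSTOMER))   # False
```

Running such a check at the producing service, rather than downstream in the analytics pipeline, is what the shift-left trend amounts to in practice.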
Also, with the increasing trend of performing analytics on transactional systems to improve customer experience, database technologies capable of handling mixed workloads (OLTP and OLAP), such as in-memory databases, are in demand.
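As a toy illustration of a mixed OLTP/OLAP workload on a single engine, the sketch below uses an in-memory SQLite database as a stand-in for an in-memory hybrid database: transactional inserts and an analytical aggregate run against the same live data, with no ETL hop in between.

```python
import sqlite3

# In-memory database standing in for a mixed-workload engine.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP side: transactional inserts.
db.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EU", 100.0), ("EU", 50.0), ("US", 75.0)],
)
db.commit()

# OLAP side: analytics over the live transactional data.
rows = db.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 150.0), ('US', 75.0)]
```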
One of the key needs of the distributed transaction data model is a well-architected interoperability fabric. Such a capability will also be required to manage the coexistence of legacy systems with the evolving new applications and data systems. The fabric will need to enable this coexistence and interoperability while ensuring guaranteed data delivery and transaction completion. Architectural patterns like caching, change data capture, API-centric transactions, saga, ETL, data virtualization, and event-driven data management will be needed in that fabric to deliver consistency, including eventually consistent transactions, without compromising security, integrity, and performance.
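Of these patterns, the saga is worth a closer look. In a minimal sketch (the step names are hypothetical), each local transaction pairs with a compensating action that undoes it if a later step fails, yielding eventual consistency without a distributed lock:

```python
# Minimal saga sketch: run local transactions in order; on failure,
# execute the compensations of the completed steps in reverse order.
def run_saga(steps):
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):  # roll back what succeeded
            compensate()
        return "compensated"
    return "committed"

log = []

def reserve_stock():
    log.append("stock reserved")

def release_stock():
    log.append("stock released")

def charge_payment():
    raise RuntimeError("payment failed")  # simulated downstream failure

def refund_payment():
    log.append("payment refunded")

result = run_saga([(reserve_stock, release_stock),
                   (charge_payment, refund_payment)])
print(result, log)  # compensated ['stock reserved', 'stock released']
```

Note that only completed steps are compensated: the failed payment never ran, so no refund is issued.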
New-Normal 2: Intelligent Workflows Requiring Advanced Analytical Processing Systems
On the other side, the analytical systems architecture is also going through drastic changes. The emergence of intelligent workflows, which infuse AI into enterprise business processes, is a fundamental driver of that change. While traditional capabilities like real-time streaming, batch processing, data warehousing, data lakes, and analytics dashboards are here to stay, the growing maturity of enterprises and the adoption of AI/ML are triggering the need for more advanced data-engineering capabilities, and for business users to contribute on a day-to-day basis. Analytics models are data-hungry and cannot live with long data latency. Descriptive analytics wants to publish the latest state of the transactional world, and prescriptive models need retraining on the latest version of the truth to keep up with the ever-changing state of affairs. Stream analytics on data in transit is also gaining momentum, so real-time ingestion through streaming and an event hub is necessary.
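Stream analytics on data in transit can be illustrated with a sliding-window aggregate computed as events arrive, a greatly simplified stand-in for a streaming engine:

```python
from collections import deque

# Sliding-window average over an event stream: each incoming value
# updates a bounded buffer, and the aggregate is emitted immediately,
# before the data ever lands in a warehouse.
def windowed_avg(events, window=3):
    buf = deque(maxlen=window)  # deque drops the oldest value itself
    out = []
    for value in events:
        buf.append(value)
        out.append(round(sum(buf) / len(buf), 2))
    return out

stream = [10, 20, 30, 40]
print(windowed_avg(stream))  # [10.0, 15.0, 20.0, 30.0]
```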
To suit this change, data engineering and insight development will need capabilities such as a DataOps toolchain, a collaborative model development environment, advanced visualization, and AI lifecycle automation. To improve business-user contributions, capabilities like data cataloging, data virtualization, self-service, and advanced visualization will be required as well.
A Vision for Integrated Data Platform: Bringing the Transaction and Analytical Processing Worlds Closer
To sum it all up, from a functional requirements point of view, an integrated data platform is one that provides the capabilities required by both domains, with significant overlap in areas such as the integration layer, data storage, and master data management.
On the non-functional requirements front, apart from traditional NFRs, all functional components should be well integrated and available on a uniform platform; the platform should allow building a consistent architecture across hybrid cloud environments, be portable to any cloud or data center, and be extensible to include and integrate new technologies.
And on top of all this, common security and governance capabilities will be required across both domains.
IBM Cloud Pak for Data: One of The Right Choices for Integrated Data Platform
IBM Cloud Pak for Data is a fully integrated data platform that brings together the data technologies needed by both transactional and analytical systems, well integrated on the OpenShift Container Platform required for modern application and data architectures.
To cater to the needs of modern transactional systems, it offers database technologies spanning SQL, NoSQL, and NewSQL (including Db2, PostgreSQL, MongoDB, and CockroachDB), along with integration technologies: IBM DataStage for ETL, IBM Streams for real-time data integration, IBM Data Virtualization for a unified view across disparate databases, and IBM MDM for maintaining a single version of the truth.
To cater to the needs of advanced analytical systems, it has:
- Db2 Warehouse for in-memory data warehousing;
- A big data environment for the data lake;
- Watson Studio as an integrated development environment for collaborative model development;
- Watson OpenScale and Scale AI for monitoring model performance and automatically triggering retraining and redeployment as rolling upgrades;
- Watson Knowledge Catalog for data cataloging;
- And many other technologies...
OpenShift Container Platform as a foundation makes it portable across all market-leading cloud platforms as well as data centers. This enables enterprises to build consistent data architecture across a hybrid multi-cloud environment thereby reducing the management overheads to a large extent. This also helps to lay down an easy path for developing cloud-native applications.
The built-in data security capabilities, and the extensibility to advanced security tools like Guardium, provide enterprise-grade common security across application and analytics systems.
In this continuously evolving data landscape, enterprises struggle with data silos and with getting the most out of their data. Data is also becoming one of the key inhibitors to cloud adoption, and the challenges are aggravated further in multi- and hybrid-cloud environments. IBM Cloud Pak for Data, an integrated data platform, can address these challenges and lay a concrete foundation for data across the entire enterprise landscape.
With both systems, transactional and analytical, on one platform, it becomes easier to infuse intelligence directly into the transaction processing domain and improve customer experience dynamically and instantly.