Tech Talks With Tom Smith: Transformer Enables DataOps
Debugging ETL is like a self-inflicted root canal.
While attending the DataOps Summit, I had the opportunity to speak with Arvind Prabhakar, CTO, StreamSets. From Arvind’s perspective, the world has gone from a handful of large, managed apps to smaller, more specialized apps that are constantly changing and evolving with needed features and patches.
The integration paradigm of the past 10+ years, smart apps connected by dumb integrations, cannot be sustained. You need smart pipelines feeding data to apps, giving them the breathing room to evolve.
Developers need smart pipelines to decouple endpoints so one change doesn’t break the application. A smart fabric for all apps helps innovation and adds value. The value the application adds is based on how deeply it is integrated into the data fabric.
To match human classification, an application has to be placed in the context of the broader enterprise, with access to a feed of all of the data it needs. The burden shifts to how tightly the application is integrated into the fabric. Smart, instrumented pipelines enable that integration at a fraction of the cost and time.
Start DataOps by using smart pipelines, together with the practice of integrating over time and automated monitoring to ensure everything is working correctly. Drift will break automation. Smart pipelines manage drift. Traditional data integration approaches do not take drift into account. A smart pipeline gives you drift handling, enabling you to innovate and be more specialized. The data fabric will keep all of these applications functioning to provide a seamless experience. The vast majority of critical business logic driving business today resides in integrations; as such, the integrations need to be treated with the same reverence as code.
That’s why Arvind was excited about the announcement of StreamSets Transformer, a drag-and-drop UI tool to create native Apache Spark applications. Designed for a wide range of users, even those without specialized skills, Transformer enables the creation of pipelines for performing ETL, stream processing and machine-learning operations. Data engineers, scientists, architects, and operators gain visibility into the execution of Apache Spark while broadening use across the business.
Transformer brings DataOps to Spark, the dominant computation framework for ETL, ML, and AI. Spark is deeply ingrained in the enterprise as an analytic engine. Spark needs DataOps for the same reason as any other meaningful technology: when deployed at scale, it has to manage drift. As great as Spark is, it is susceptible to the requirements of tight coupling.
Transformer enables the implementation of Spark and the creation of self-defining pipelines without worrying about them breaking if a dataset changes. Smart pipelines automatically handle changes in the system.
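To make the idea of drift handling concrete, here is a minimal sketch in plain Python. It is not StreamSets or Transformer code and does not reflect how either product works internally; the function names `detect_drift` and `process` and the sample records are hypothetical, chosen only for illustration. The sketch assumes that "drift" means fields appearing in or disappearing from incoming records, and that a tolerant pipeline should record the change and keep going rather than fail.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def detect_drift(expected: Set[str], record: Dict[str, Any]) -> Tuple[Set[str], Set[str]]:
    """Return (added, missing) field names relative to the expected set."""
    fields = set(record)
    return fields - expected, expected - fields


def process(
    records: Iterable[Dict[str, Any]], expected_fields: Iterable[str]
) -> Tuple[List[Dict[str, Any]], List[Tuple[int, List[str], List[str]]]]:
    """Pass every record through, widening the known schema instead of failing."""
    known = set(expected_fields)
    drift_log: List[Tuple[int, List[str], List[str]]] = []
    output: List[Dict[str, Any]] = []
    for index, record in enumerate(records):
        added, missing = detect_drift(known, record)
        if added or missing:
            # Record the drift so it can be monitored, but do not stop.
            drift_log.append((index, sorted(added), sorted(missing)))
        # Accept new fields from this point forward.
        known |= added
        # Emit every field known so far, using None where a value is absent.
        output.append({name: record.get(name) for name in sorted(known)})
    return output, drift_log


# Hypothetical sample data: the second record gains a field, the third loses two.
sample = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b", "email": "b@example.com"},
    {"id": 3},
]
rows, drift = process(sample, ["id", "name"])
```

In this sketch a hard-coded pipeline would have raised an error on the second or third record; here both pass through, and `drift` holds a log of what changed and where, which is the kind of signal automated monitoring could act on.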
Today the idea of debugging ETL is like a self-inflicted root canal, and Transformer removes the pain in debugging complex applications built on Spark with progressive error detection and removal. It enables people to trust critical business metrics.
Customers had been operating blindly without realizing it. DataOps creates a line of sight from business owners to developers and data engineers. Transformer enables a new way of running ETL workloads that was previously not possible due to the scale and cost of error management. It lets teams accept changes with confidence and agility so they can stay competitive and get to the next level.