TechTalks With Tom Smith: What and Why of DataOps
DataOps enables the enterprise to get insights from analytics projects in weeks rather than months.
Shekhar Iyer, President, StreamSets, kicked off the first DataOps Summit with an observation about Amazon driving digital transformation. From Shekhar’s perspective, Amazon isn’t an online retailer; it’s a technology company that initially disrupted the retail book industry, and now it's the 1,600-pound gorilla radically changing any industry it targets.
The largest healthcare companies in the U.S. lost $30 billion in market cap when Amazon announced it was entering the healthcare market. Amazon is an example of a new breed of companies that “go fast and get things done right.”
Kirk Borne, Principal Data Scientist and Executive Advisor, Booz Allen Hamilton, shared his thoughts on “DataOps – Buzzword or Buzzworthy.” From Kirk’s perspective, the ability to blend and integrate data is a fundamental challenge for organizations given how much data we have today. He believes DataOps gives us the power to understand data through metadata. He defines DataOps as DevOps for all data activities, enabling data to deliver value at the point where it’s most needed.
Companies today have so much data; however, they are constrained because they don’t know what to do with it. This isn’t a technology problem; it’s a creativity problem. Companies and their employees need to be more creative in how they think about using the data they have, and other data they can access, to solve business problems and make their consumers’ lives simpler and easier.
Science is an iterative process. Learning is an iterative process. As such, machine learning (ML) and data science are iterative processes. Kirk encourages companies to adopt a culture of experimentation: think big, start small, learn fast, and build proofs of value. Ten Signs of Data Science Maturity addresses this. The learn-fast culture of DataOps helps to avoid data “oops.” Fail fast to learn fast.
Kirk wrapped up his presentation with his 10 Commandments of DataOps:
- Honor business value above all other goals.
- Begin with the end in mind – focus on the outcome you are trying to achieve with analytics.
- Know thy data – it’s like having a first date with your data.
- Data are like your children – love them all (clean, label, curate, blend, catalog, and expect big things from each one).
- Validation is a virtue, but generalization is vital. Don’t just build the model right. Build the right model. That’s DevOps and DataOps.
- Honor your data’s last-mile and first-mile challenges – integration of multiple data sources and extracting actionable intelligence.
- Keep it agile – lean, iterative, incremental, continuous learning.
- K.I.S.S. – Keep it simple and smart with reusable components and reusable business logic, APIs, microservices, and building blocks.
- Test early and often – fail fast in order to learn fast.
- Leave no team member behind – DataOps connects data collectors, data engineers, data scientists, data operations . . . all stakeholders.
Girish Pancha, CEO, StreamSets, posited that DataOps is the foundation upon which all software will be built in the future. Data’s role in the enterprise is changing. BI is under pressure from the unstructured and real-time data that the business needs to make informed decisions and to provide better user and customer experiences (UX/CX). DataOps can help bring order to data chaos, and it is critical to do so in an app-driven economy.
Data drift is a natural consequence of modern data infrastructures as we move from a transactional view of the world to an event-driven view of the world. Every piece of logic you apply to data is a piece of logic in an information supply chain. Data drift is unpredictable and unending; however, DataOps uses smart tools to discover data, detect changes, and monitor operations in an automated way, enabling developers to detect drift, act on it, harness it, and provide governance with a policy-driven perspective.
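To make the idea of detecting drift concrete, here is a minimal Python sketch that compares an incoming record against a baseline schema and reports fields that were added, removed, or changed type. It is an illustration only, not how StreamSets or any particular DataOps tool works; the function names (`infer_schema`, `detect_drift`) and the sample order records are hypothetical.

```python
from typing import Any, Dict, List

# A schema here is just field name -> Python type name (an assumption for
# illustration; real tools track much richer metadata).
Schema = Dict[str, str]


def infer_schema(record: Dict[str, Any]) -> Schema:
    """Map each field name to the name of its value's Python type."""
    return {field: type(value).__name__ for field, value in record.items()}


def detect_drift(expected: Schema, record: Dict[str, Any]) -> Dict[str, List[str]]:
    """Report fields added, removed, or retyped relative to the expected schema."""
    observed = infer_schema(record)
    shared = set(expected) & set(observed)
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(f for f in shared if expected[f] != observed[f]),
    }


# Hypothetical baseline taken from an earlier, known-good record.
baseline = infer_schema({"order_id": 1001, "amount": 19.99, "currency": "USD"})

# A later record where the upstream producer changed without notice.
incoming = {"order_id": "1002", "amount": 5.0, "channel": "mobile"}

report = detect_drift(baseline, incoming)
print(report)
```

In a pipeline, a report like this could feed a policy: for example, tolerate added fields but route records with removed or retyped fields to a review queue.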
DataOps enables developers to build smarter applications, engineering to be responsive, ops to be efficient and scale, and business owners to have confidence in data.
Several speakers at the summit shared how traditional methods of data analytics required six or more months, while DataOps delivered the same results in less than two weeks.