
Introduction to DataOps

For each new tech word, it's always good to get an introduction. Here's your entry point to DataOps.

Thomas Jardinet · May 17, 2019


DataOps

The term DataOps is currently gaining a lot of traction, and the solutions that support it have matured significantly. Let's discuss what DataOps is all about.

Why?

I can start by citing the first sentence of the Wikipedia page: "DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics." Beyond this condensed summary, this means that DataOps makes it possible to meet the data analytics needs of the business quickly and with reliable figures: quickly because the tools are industrialized and facilitate collaborative work, and reliably because automated controls give us confidence in the quality of the figures reported.

For Whom?

So, what are the necessary profiles and their associated roles? Here is the list:

  • Data Engineer: They are a kind of extended data architect. In addition to knowing how to manage a SQL database, they must be able to handle big data technologies as well as data ingestion flows. They are the ones who make the data available to the other actors in the DataOps chain.

  • Data Analyst: They are responsible for the report or visualization itself. They must therefore be able to build the visuals, clean the data, program, and produce statistics and machine learning modules, for example to estimate future figures.

  • Data Scientist: This is still an evolving role, but, to put it simply, they are an expert in the business domain with skills in statistics, machine learning, and mathematics, which they use to extract intelligence from data.

  • DataOps Engineer: Their role is to provide a unified platform between all stakeholders, and to orchestrate the data pipeline and automated data quality control.

How?

The idea, in the end, is to have two pipelines: a continuous data ingestion pipeline, and a pipeline for new developments, which meet during data production. Ideally, a unified platform is needed to handle all this and centralize people around the same tool. Tools such as DataKitchen or Saagie exist to monitor the data production chain. This chain, where the typical steps of data access, transformation, modeling, and visualization/reporting are performed, must be traceable from start to finish, and must also offer a unified view of the non-regression tests. The tests to implement are the typical ones we are used to, plus "statistical process control" (SPC) tests. These tests detect whether the returned metrics stay within their normal range: if you measure stock consumption in a factory, you do not normally expect it to increase by 50% in one month. SPC is a broad subject that would deserve a book-length treatment; for now, I'll simply point you to the Wikipedia article.
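To make the SPC idea concrete, here is a minimal sketch of such a check in Python, using the classic Shewhart control-chart rule (flag any value outside the historical mean plus or minus a few standard deviations). The function name and the stock-consumption numbers are illustrative, not from any particular tool:

```python
import statistics

def spc_check(history, new_value, n_sigma=3.0):
    """Return True if new_value sits inside the expected control band.

    The band is the historical mean +/- n_sigma standard deviations,
    the classic Shewhart control-chart rule.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    lower, upper = mean - n_sigma * stdev, mean + n_sigma * stdev
    return lower <= new_value <= upper

# Monthly stock consumption around 100 units: a sudden +50% jump
# should fail the check, while a normal month should pass.
history = [100, 102, 98, 101, 99, 103, 97, 100]
print(spc_check(history, 101))  # normal month
print(spc_check(history, 150))  # 50% jump, outside the band
```

In a DataOps pipeline, a check like this would run automatically after each data production cycle, alongside the usual non-regression tests.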

In terms of capabilities, you also need a personal sandbox for everyone, with one twist: each sandbox must contain a fresh local dataset. And, of course, all of this should be under version management! This allows you to properly manage the whole big data ecosystem you will orchestrate, from the recovery of the data to its final restitution to business people.
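One way to picture the "fresh local dataset" requirement is a small script that snapshots a reproducible sample of a dataset into a private sandbox directory. The paths, sample size, and CSV format below are purely illustrative; in practice the snapshot would come from the production ingestion pipeline and be tracked in version control alongside the code that uses it:

```python
import csv
import random
from pathlib import Path

def build_sandbox(source_csv, sandbox_dir, sample_size=1000, seed=42):
    """Copy a fresh, reproducible sample of a dataset into a sandbox."""
    Path(sandbox_dir).mkdir(parents=True, exist_ok=True)
    with open(source_csv, newline="") as f:
        reader = csv.reader(f)
        header, rows = next(reader), list(reader)
    random.seed(seed)  # same seed -> same sample, so runs are reproducible
    sample = random.sample(rows, min(sample_size, len(rows)))
    out = Path(sandbox_dir) / "dataset_sample.csv"
    with open(out, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(sample)
    return out
```

The fixed seed is the key design choice: two developers building the same sandbox from the same source get the same sample, which keeps test results comparable across environments.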

All this in order to set up a DataOps process, whose steps are as follows:

  • Sandbox management: As you already know from DevOps, the goal is to have an isolated development environment. But here you have data to manage too.

  • Develop: As in DevOps, you develop your features.

  • Orchestrate: You orchestrate all the binaries, all the code, and all the data to be manipulated.

  • Test: Then you test the whole thing.

  • Deploy: Like in DevOps!

  • Monitor: Still like in DevOps!
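The steps above can be sketched as a simple orchestration skeleton. The function and step names are placeholders of my own, not the API of any DataOps tool; a real pipeline would call actual sandboxing, build, test, and deployment tooling at each step:

```python
def run_dataops_cycle(steps):
    """Run the DataOps steps in order, stopping at the first failure.

    Each step is a (name, callable) pair; a callable returning False
    vetoes the rest of the cycle, like a failing stage in CI/CD.
    """
    for name, step in steps:
        print(f"-> {name}")
        if step() is False:
            print(f"!! {name} failed, aborting cycle")
            return False
    return True

# Placeholder step implementations, mirroring the list above.
cycle = [
    ("sandbox", lambda: True),
    ("develop", lambda: True),
    ("orchestrate", lambda: True),
    ("test", lambda: True),
    ("deploy", lambda: True),
    ("monitor", lambda: True),
]
run_dataops_cycle(cycle)
```

The fail-fast behavior matters: if the test step fails, bad figures never reach deployment, which is exactly the quality guarantee DataOps is after.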

I hope this article was useful as an introduction to DataOps and will help you in your DataOps adoption! You'll find very interesting articles on two DataOps vendors' websites:

  • https://www.saagie.com/resources/blog/

  • https://www.datakitchen.io/blog/

