Key Takeaways from the 2017 Gartner Market Guide for Data Preparation
Gartner's guide to data prep came out in December, so let's take a look at the two major takeaways orgs should keep an eye on for successful data-based initiatives.
The recently released 2017 Gartner Market Guide for Data Preparation estimates that Data Preparation is well on its way to becoming a $1B market, growing quickly from around $780M in 2016. The analyst firm’s Market Guide series is meant to cover new and emerging areas where software products and organizational requirements are still being defined, in what Gartner would deem a ‘fledgling’ area of the market. Even so, these guides can be a great resource for customers looking to understand and identify current and future business needs.
While everyone’s first instinct with a Market Guide might be to check the vendor profiles (where I’m proud to see Talend Data Preparation categorized among the 19 “stand-alone data preparation tools” with a detailed profile), I would recommend focusing on the thought leadership, market analysis, and advice the report provides. Customers should consider the commentary delivered by the report’s authors (Ehtisham Zaidi, Rita Sallam, and Shubhangi Vashistha) as food for thought on how to successfully expand the reach and value of Data Preparation within their organization.
What are some of the more prominent insights you should take away from this research? Let’s take a look.
From "Self-Service Data Preparation for Analytics" to Simply Data Preparation
The first thing that caught my attention is the evolution of the report’s name over the last three years. In March 2015, Gartner published the “Market Guide for Self-Service Data Preparation for Analytics”; in August 2016, its name was shortened to the “Market Guide for Self-Service Data Preparation”, highlighting the evolution of Data Preparation into a cross-functional data management activity that includes, but is not restricted to, analytics. Now, the new “Market Guide for Data Preparation” indicates that technologies have evolved beyond self-service into “platforms that enable data and analytics teams to build agile and searchable datasets at an enterprise scale”. In other words, as the market’s name has shortened over the years, its scope has widened. At the same time, the number of profiled vendors has shrunk from 36 to 28, as a result of tougher qualification criteria and market consolidation.
The report also highlights how data preparation is evolving from a toolkit for a select few business specialists into a pervasive activity that is predicted to be “used in more than 50% of new data integration efforts for analytics by 2020”. However, this evolution is still in its early stages: only a few vendors agreed to disclose an estimate of their number of customers and users, and among those, the average number of users per customer is in the range of 5 to 50, a figure that suggests Data Preparation has yet to go mainstream. The Data Preparation market might be at a tipping point where organizations must develop more formal deployment strategies so that usage can spread and scale across the enterprise.
From a Technology Tool to a Business Discipline
The report is particularly inspiring in drawing a bigger scope for Data Prep than the commonly known dimension of improving efficiency for data consumption: Gartner states “Data Preparation = Faster Time to Insight + Improved Trust”. Of course, Data Preparation tackles time-to-insight—a major pain point in data-driven initiatives—as we all know that most data workers spend more than 60% of their time simply preparing data for analysis, which leaves little time to derive thorough conclusions. I believe this confirms that organizations should consider equipping more employees with self-service data preparation tools to enable more effective data consumption and use.
But Data Preparation can deliver much more than the last mile of the information supply chain. The report also notes that the use of Data Preparation helps prevent data governance challenges and avoid maintaining multiple versions of the truth, which—in turn—results in improved trust in enterprise information.
This goal brings Data Prep beyond self-service. It has the potential to change and improve the relationship between the lines of business and IT operations by fostering better communication and collaboration, in the same way that DevOps is closing the gap between developer communities and operational teams. This enterprise-wide collaboration and broader organizational benefit can be done through the operationalization of data preparation wherein the ‘preparations’ performed by business users who ‘know the data best’ can be shared cross-functionally with other departments.
Let’s take the example of data ingestion. When a data scientist designs a predictive model, they might need to provision new datasets ad hoc for experimentation; with self-service data preparation, they can do so without waiting for support from IT or other colleagues. When their experimentation shows success, those new datasets become critical and need to be provisioned in a more professional way to a wider audience, with a defined Service Level Agreement, data latency, quality checks, and so on. This is when ad hoc data preparation becomes an enterprise asset that can be reused and operationalized upstream in what Gartner references as a “modern data pipeline”.
Not only does Gartner predict that this will have a profound impact on technologies, considering that “by 2023, machine-learning-augmented master data management, data quality, data preparation and data catalogs will converge into a single modern enterprise information management platform”; it also recommends how to turn this into reality in your organization. Suggestions in the report include:
- Assign responsibilities through the new role of the data engineer.
- Create a process for capturing and operationalizing the models created by business users.
- Bring the necessary governance and change management dimensions to secure and maximize the impact of your data preparation and data discovery initiatives.
Published at DZone with permission of Jean-Michel Franco, DZone MVB. See the original article here.