Executives' Perspectives on the Evolution of Data Management
Data management has evolved very rapidly since the introduction of Hadoop and big data.
To gather insights for DZone's Data Persistence Research Guide, scheduled for release in March 2016, we spoke to 16 executives from 13 companies who develop databases and manage persistent data in their own company or help clients do so.
Here's who we talked to:
Satyen Sangani, CEO, Alation | Sam Rehman, CTO, Arxan | Andy Warfield, Co-Founder/CTO, Coho Data | Rami Chahine, V.P. Product Management and Dan Potter, CMO, Datawatch | Eric Frenkiel, Co-Founder/CEO, MemSQL | Will Shulman, CEO, MongoLab | Philip Rathle, V.P. of Product, Neo Technology | Paul Nashawaty, Product Marketing and Strategy, Progress | Joan Wrabetz, CTO, Qualisystems | Yiftach Shoolman, Co-Founder and CTO and Leena Joshi, V.P. Product Marketing, Redis Labs | Partha Seetala, CTO, Robin Systems | Dale Lutz, Co-Founder, and Paul Nalos, Database Team Lead, Safe Software | Jon Bock, VP of Product and Marketing, Snowflake Computing
We’ve gone from gigabytes to zettabytes of data that is distributed across, and needs to be accessed from, several sources very quickly. Organizations are building cultures with data scientists, and data management now has visibility in the C-suite, with the Chief Data Officer reporting to the CEO or the CTO.
Here's what we heard when we asked, "How has data management evolved?":
There is a huge evolution and it is happening very rapidly. Relational databases with big iron are long gone. We're now focused on distributed data and accessing information from different sources.
It’s gone from having strict and rigorous requirements to being much more nimble and flexible. Databases and data model design used to require a lot of time and planning up front. Now they’re easy to change, deploy, and iterate. This agility has changed how people think about data management.
Data is not easily read or created by humans. For the last 10 to 15 years there’s been one predominant method: the relational database. Now, there’s a myriad of different tools. The number of things you can do has increased. More people are using data more broadly, but a lot of technocratic knowledge is required to use it well. We’ve gone from one million people using databases to five million. We need to make databases easier so they can be used by 50 million.
Bifurcation of the databases has caused an explosion of databases since Yahoo launched Hadoop in 2005. Each database has a use case it is solving: In-memory = Spark; Memory-centric = SAP HANA; Semi-traditional = Oracle; Analytics = Hadoop; and Compliance = Object store. And there are probably eight more variations in between. Amazon has all five. We are moving from solid-state storage to memory chips operating at web scale. We can store more data more effectively without requiring transactional semantics, like Oracle’s, which slow down access and analysis.
Size: we’ve gone from a couple of gigabytes to zettabytes. The next generation of databases will be concerned with fast aggregation and summarization to find the needle in the haystack. We will not treat data as preciously as we have in the past. There will be less data hoarding. Change management and machine learning will let the database know when a value didn’t change, so it just stores the timestamp. Real-time visualization helps see what’s important and records just the changes to data rather than the entire data stream.
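To make the "just record the changes" idea concrete, here is a minimal sketch in Python. The `ChangeLog` class and its method names are hypothetical, invented for illustration; they are not taken from any product mentioned in this article. When a reading repeats the previous value, only the last-seen timestamp is updated and no new record is stored.

```python
from typing import Any, Dict, List, Tuple


class ChangeLog:
    """Hypothetical change-only store: keeps a record only when a value changes."""

    def __init__(self) -> None:
        self._last_value: Dict[str, Any] = {}
        self._last_seen: Dict[str, float] = {}
        # Each entry is (key, new_value, timestamp_of_change).
        self.changes: List[Tuple[str, Any, float]] = []

    def record(self, key: str, value: Any, timestamp: float) -> bool:
        """Store the reading; return True only if it produced a new change record."""
        unchanged = key in self._last_value and self._last_value[key] == value
        # The timestamp is always refreshed, even when the value is unchanged.
        self._last_seen[key] = timestamp
        if unchanged:
            return False
        self._last_value[key] = value
        self.changes.append((key, value, timestamp))
        return True

    def last_seen(self, key: str) -> float:
        """Timestamp of the most recent reading for this key."""
        return self._last_seen[key]


log = ChangeLog()
log.record("temperature", 20, 1.0)  # first reading: stored
log.record("temperature", 20, 2.0)  # same value: only the timestamp moves
log.record("temperature", 21, 3.0)  # value changed: stored
```

After these three readings, `log.changes` holds two records rather than three, which is the storage saving the executive describes.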
Engineers are assuming more responsibilities formerly allocated to database administrators, such as managing Hadoop and Apache Spark. Developers are comfortable knowing more than one database. We're seeing the evolution of the data scientist who deals with different types of data. Database management is just one component of a larger skill set.
At the top, data diversity and new tools with new ways to represent the data. Data curation and indexing are taking hold. Organizations are building cultures with data scientists. At the bottom, it used to take weeks and months to put a new infrastructure in place. Today, it installs in minutes and is virtualized. The value is that business is able to move faster and developers can experiment more with new tools.
Ten years ago, business and industry perceived that data management had passionate experts who understood governance and architecture but struggled to explain the importance of investing in data management. That’s changed because so many technology and data companies that didn’t value a data management foundation were slow to adapt and compete. Now, data management has visibility in the C-suite, with the Chief Data Officer reporting to the CEO or CTO.
Three trends: specialization, JSON/API driven, and big data. There’s a trend to specialize, from the relational database to all sorts of options on the scene. JSON is now the lingua franca of the REST API. It has evolved into something easy to read for representing objects and tucking objects into other objects. There’s a push to adopt JSON in the persistence layer with a number of databases, including PostgreSQL. Big data results in the need to scale much larger than traditional databases. Most NoSQL databases are distributed, with a collection of nodes for large-scale persistence.
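As a small illustration of "tucking objects into other objects," the sketch below uses Python's standard `json` module. The `order` document and its field names are made up for this example and do not come from any of the interviews.

```python
import json

# A hypothetical order document: objects nested inside other objects,
# plus a list of objects, all expressed as plain JSON-compatible types.
order = {
    "id": 1,
    "customer": {
        "name": "Ada",
        "address": {"city": "London", "country": "UK"},
    },
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

# Serialize to the text form that would travel over a REST API...
encoded = json.dumps(order)

# ...and parse it back into nested dictionaries and lists.
decoded = json.loads(encoded)

# Nested values are reached by walking the object structure.
city = decoded["customer"]["address"]["city"]
total_qty = sum(item["qty"] for item in decoded["items"])
```

The same nested shape can be stored as-is in a document database or in a JSON column, which is part of why JSON is spreading into the persistence layer.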
Schema and constraints are big deals. We used to focus on storage and access semantics. Now, it’s more user experience driven. More usability-driven balanced with intelligence on the backend. It’s more focused on the front-end but you can’t forget about the backend.
There has been an explosion of new technologies to store and manage data; the design space of use cases and technology constraints is being thoroughly explored. Some notable examples include the development of vertical rather than horizontal storage of data in a traditional RDBMS, NoSQL (or more correctly no-schema) databases and graph databases like Neo4j, in-memory databases like eXtremeDB, XML databases like MarkLogic, and the move to the cloud and increasing database capacity while maintaining performance speeds. The growing variety of data structures solves specific types of data storage problems and data queries. Any claim of "one format or system to rule them all" only adds to the ever-growing list. We are often asked if our business is threatened by the introduction of "such and such" new open format, or industry standard format, or what have you. In truth, these new formats simply increase the number of formats we provide support for.
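The "vertical rather than horizontal" storage mentioned above is commonly understood as column-oriented versus row-oriented layout. The following is a toy sketch of that difference in plain Python, assuming every row has the same fields; the sample data and the `to_columns` helper are hypothetical, and a real columnar engine adds compression and on-disk layout that are not shown here.

```python
from typing import Any, Dict, List

# Row-oriented (horizontal) layout: one record per entry.
rows: List[Dict[str, Any]] = [
    {"id": 1, "region": "east", "sales": 100},
    {"id": 2, "region": "west", "sales": 250},
    {"id": 3, "region": "east", "sales": 75},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot records into a column-oriented (vertical) layout.

    Assumes every record has the same keys.
    """
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field touches every record in the row layout...
total_from_rows = sum(record["sales"] for record in rows)

# ...but only a single contiguous list in the column layout.
total_from_columns = sum(columns["sales"])
```

Both totals are the same; the difference is how much unrelated data has to be read to compute them, which is why column layouts suit analytic queries.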
The challenge of NoSQL is how to allow people to use it without relying on database administrators. Redis is schema-less and easy to maintain. Cassandra and MongoDB need a highly skilled professional to manage. Redis does full DBaaS, with everything running in memory to avoid latency. There's on-site install and the ability to manage remotely. It is zero-touch software: a platform that can host as many databases as needed without additional nodes, 100 databases in a second. It is built for microservices. Application managers no longer need to deal with a database administrator. There's a problem with consistent performance as you add nodes. Detect and eliminate noisy neighbors in the cloud from the cluster. Guarantee consistent performance to keep your clients happy.
Context depends on the nature of data and what’s being done with the data. How do you contain the data so analysis can be done quickly? It’s become much more complex. Smaller data sets need to provide low-latency access, make copies or calls, and keep track of them, so you can take data management actions.
How has data management evolved from your perspective?