This article is featured in the new DZone Guide to Data Persistence. Get your free copy for more insightful articles, industry statistics, and more.
To gauge the state of persistent data and databases in the “real world,” we interviewed 16 executives from 13 companies who are actively involved in databases and persistent data. All of the executives have extensive experience in data and data management.
Satyen Sangani CEO, Alation | Sam Rehman CTO, Arxan | Andy Warfield Co-founder/CTO, Coho Data | Rami Chahine V.P. Product Management, Datawatch | Dan Potter CMO, Datawatch | Eric Frenkiel Co-founder/CEO, MemSQL | Will Shulman CEO, mLab | Philip Rathle V.P. of Product, Neo Technology | Paul Nashawaty Product Marketing and Strategy, Progress | Joan Wrabetz CTO, Quali | Yiftach Shoolman Co-founder and CTO, Redis Labs | Leena Joshi V.P. Product Marketing, Redis Labs | Partha Seetala CTO, Robin Systems | Dale Lutz Co-founder, Safe Software | Paul Nalos Database Team Lead, Safe Software | Jon Bock V.P. of Product and Marketing, Snowflake Computing
Here’s what we learned from the executives:
Companies tend to use their own databases as well as those their clients are using. Service providers are agnostic with regard to the databases they use, and they have a good understanding of the specific strengths of each database. Specific mentions of non-proprietary databases included: MongoDB, Cassandra, Spark SQL, MySQL, PostgreSQL, Teradata, Vertica, Oracle, AWS RDS for Aurora, Geodatabase, Smallworld, and even Microsoft Excel.
There’s a consistent definition of persistent data as data that doesn't change across time, systems, and memory; data that is considered durable at rest with the coming and going of software and devices; master data that’s stable, set, and recoverable whether in flash or in memory.
The most important elements of a database depend on what is needed. Foremost is storing data in a durable form that maintains its asset properties while keeping it accessible. There’s a tradeoff between speed, scale, and usability. Ultimately, databases must be consistent, available, and able to tolerate partitions. They must also support a broad variety of data for aggregation, analysis, and reporting. Performance, scalability, and the ability to process more data more quickly are becoming more important as data becomes more prolific. Databases have bifurcated into what’s most relevant for the use case: “the consumerization of databases.”
There are six features critical to ensuring high availability and safeguarding against every type of failure or outage event: 1) in-memory replication; 2) multi-rack/zone/data center replication; 3) instant auto-failover; 4) AOF (append-only file) data persistence; 5) backup; and 6) multi-region/cloud replication.
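To make the fourth item concrete, here is a minimal sketch of append-only file persistence in Python. It is an illustration only, not how any particular database implements AOF: the class name `AppendOnlyStore`, the JSON-lines record format, and the choice to sync to disk on every write are all assumptions made for this example.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store (hypothetical) that logs every write to an append-only file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._replay()  # rebuild in-memory state from the existing log, if any
        self._log = open(self.path, "a", encoding="utf-8")

    def _replay(self):
        # Re-apply each logged operation in order to recover the last known state.
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as log:
            for line in log:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                if record["op"] == "set":
                    self.data[record["key"]] = record["value"]
                elif record["op"] == "del":
                    self.data.pop(record["key"], None)

    def _append(self, record):
        # Write the operation to the end of the file and force it to disk
        # before the in-memory state is changed.
        self._log.write(json.dumps(record) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())

    def set(self, key, value):
        self._append({"op": "set", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._append({"op": "del", "key": key})
        self.data.pop(key, None)

    def close(self):
        self._log.close()


def demo():
    """Write a few operations, 'restart' the store, and return the recovered state."""
    path = os.path.join(tempfile.mkdtemp(), "store.aof")
    store = AppendOnlyStore(path)
    store.set("a", 1)
    store.set("b", 2)
    store.delete("a")
    store.close()

    recovered = AppendOnlyStore(path)  # simulates a process restart
    state = dict(recovered.data)
    recovered.close()
    return state
```

Calling `demo()` rebuilds the store from the log alone, which is the property that makes this approach useful after a crash. Syncing on every write favors durability over throughput; a real system would likely make that policy configurable and compact the log periodically.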
Databases are enabling companies to use data to inform real-time decisions about their business, and to use predictive analytics to make those decisions better informed. The macro-trend is that more data is being analyzed in real time. The internet of connected things enables you to see how things interact. Applications are tending to use multiple databases to provide polyglot persistence.
There were a number of skills mentioned by executives that make someone good at working with databases. These include: understanding the proper design structure, knowing what’s in the database you’re working with, and understanding data science and what data scientists are looking for. As the number of databases grows, it’s important to understand the strengths and weaknesses of the different tools and to choose the right database for what you’re trying to accomplish. More Big Data jobs are requiring a broader set of skills.
Data management has evolved very rapidly since the introduction of Hadoop and Big Data. We’ve gone from gigabytes to zettabytes of data that is distributed across several sources and needs to be accessed from them very quickly. Organizations are building cultures around data, with data scientists and data management now visible in the C-suite and the Chief Data Officer reporting to the CEO or the CTO.
The obstacles to success are consistent with the growth of data and the growth of databases. Data resides in a number of different places and you need access to the data sources regardless of where they are. There’s an explosion of new database technologies, and someone in the organization needs to stay abreast of what’s available and what’s the best solution to the problem at hand: someone with broad data literacy across different databases and languages. Given the growth and variety of options, it’s rare for an enterprise to have the resources it needs to analyze Big Data itself. This has led to the growth of companies providing databases as a service (DBaaS), since these companies have the bandwidth to keep up with all of the latest technologies, know their strengths and weaknesses, and employ professionals who know the nuances of each database.
The only concerns around data management are the tremendous growth in the number of databases and the inherent complexity therein. Several people expressed concern that it’s more complex than it needs to be, as marketers create confusion with different terminology and have a tendency to overpromise and under-deliver. There’s agreement that many of the database options will coalesce over time, and there will be SQL and NoSQL options, with those in the know realizing that not all NoSQL databases are the same.
The future for databases involves consolidation around Big Data down to around 10 core technologies that make data easy to access and lead to more data-driven analytics and services. Data will be easier to access and use. More processing will be done on the edge to facilitate real-time computations and decision making. Polyglot persistence will ensure the safety of persistent data. Data science will improve research by defining the questions that need to be asked.
What developers need to keep in mind when working with different databases is consistent with what they need to keep in mind when working with all technologies: use best practices that are already established, proven, and tested; don’t reinvent the wheel if you already have the right technology for the job; understand how the data will be used so it’s in the right datastore and language for the required analysis; and there’s no such thing as “one size fits all,” so don’t become too attached to a single solution. Specific recommendations include: knowing SQL while learning as many other languages as you can; exploring JSON; getting up to speed on predictive analytics; and considering geospatial data, given the growth of mobile.
Other trends mentioned by the executives are the importance of supporting SQL, and understanding when one database is more cost efficient than another as the data quickly scales. Lastly, open source, and the role it plays, is a very big trend since open source has democratized the database layer.
The executives we spoke with are fully invested in the evolution of databases and data management and want to continue to lead its evolution and success to meet business and consumer needs. We’re interested in hearing from developers, and other IT professionals, to see if these insights offer real value. Is it helpful to hear others’ perspectives from an executive point of view?
Are their experiences and perspectives consistent with yours?
We welcome your feedback at firstname.lastname@example.org
For more insights on data tools and solutions, persisting data on mobile devices, and choosing a DBaaS, get your free copy of the new DZone Guide to Data Persistence!