To gather insights for DZone's Data Persistence Research Guide, scheduled for release in March 2016, we spoke to 16 executives from 13 companies who develop databases and manage persistent data in their own companies or help clients do so.
Here's who we talked to:
Satyen Sangani, CEO, Alation | Sam Rehman, CTO, Arxan | Andy Warfield, Co-Founder/CTO, Coho Data | Rami Chahine, V.P. Product Management and Dan Potter, CMO, Datawatch | Eric Frenkiel, Co-Founder/CEO, MemSQL | Will Shulman, CEO, MongoLab | Philip Rathle, V.P. of Product, Neo Technology | Paul Nashawaty, Product Marketing and Strategy, Progress | Joan Wrabetz, CTO, Qualisystems | Yiftach Shoolman, Co-Founder and CTO and Leena Joshi, V.P. Product Marketing, Redis Labs | Partha Seetala, CTO, Robin Systems | Dale Lutz, Co-Founder, and Paul Nalos, Database Team Lead, Safe Software | Jon Bock, VP of Product and Marketing, Snowflake Computing
Data resides in a number of different places, and you need to access those data sources regardless of where they are. There’s an explosion of new database technologies, and someone in the organization needs to stay abreast of what’s available and which solution best fits the problem. Organizations need broader data literacy across the different databases and languages. Given the growth and variety of options, it’s rare for an enterprise to have the resources to analyze big data themselves. This has led to the growth of companies providing database as a service (DBaaS), since these companies have the bandwidth to keep up with the latest technologies, know their strengths and weaknesses, and employ professionals who know the nuances of each database.
Here's what we heard when we asked, "What are the obstacles to success of data management?":
Data resides in a number of different places, and you need access to the proper data sources regardless of where they are, in a way that doesn’t compromise security. Access data in place to extract the desired result, and make sure any data transmitted uses a strong encryption protocol.
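The point about encrypting data in transit can be sketched with Python's standard `ssl` module. This is an illustrative example only, not something from the interviews; the strict minimum-version policy shown is an assumption about what "a strong encryption protocol" might mean in practice:

```python
import ssl

def strict_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses legacy protocol versions,
    so data pulled from a remote source is always encrypted in transit."""
    ctx = ssl.create_default_context()            # verifies certs and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # reject older TLS/SSL versions
    return ctx

ctx = strict_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # prints True
```

A context like this would be passed to whatever client library actually fetches the remote data, so the encryption policy is enforced in one place rather than per connection.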
An explosion of new technologies. You need to understand what’s real and what’s the best solution to the problem, and try to control complexity. A lot of open source tools are available, and people will become more familiar with them and start using them. Planned projects can become very complex when people adopt new technologies rather than using what they already have. Complexity and sprawl are difficult to control.
Data literacy is necessary: using the right databases and database technology for a particular problem. Understand the limitations and possibilities of the data. Be smart about answering questions: know which dataset can answer which questions, and know what you’re trying to accomplish.
Development and testing around production. You need to know how to test new applications and functions when working with persistent data versus ephemeral data. You need to work with real data, the data involved with the metrics around the infrastructure, and know how to deal with a persistent infrastructure.
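One common way to test safely around persistent data is to run destructive tests against a disposable copy rather than the real store. A minimal sketch using SQLite, where `make_test_copy` is a hypothetical helper invented for this example:

```python
import os
import shutil
import sqlite3
import tempfile

def make_test_copy(prod_db_path: str) -> str:
    """Copy a persistent database file to a throwaway location for testing."""
    fd, test_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    shutil.copyfile(prod_db_path, test_path)
    return test_path

# Stand-in "production" database holding a row of infrastructure metrics.
fd, prod = tempfile.mkstemp(suffix=".db")
os.close(fd)
with sqlite3.connect(prod) as conn:
    conn.execute("CREATE TABLE metrics (host TEXT, cpu REAL)")
    conn.execute("INSERT INTO metrics VALUES ('web-1', 0.42)")

# Run a destructive test operation against the copy, not the real data.
test_db = make_test_copy(prod)
with sqlite3.connect(test_db) as conn:
    conn.execute("DELETE FROM metrics")

# The persistent data survives untouched.
with sqlite3.connect(prod) as conn:
    rows = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
print(rows)  # prints 1
```

The same pattern scales up to snapshot-and-restore tooling on larger databases; the point is that tests exercise real data shapes without mutating the persistent source.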
Data volume, time to access, and cost of storage are all important. Compress data so it can be stored more cheaply. Know the memory technology involved, and consider the time it takes to move data to and from an application.
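The compression trade-off mentioned above can be sketched with Python's standard `zlib` module: higher compression levels shrink data further (cheaper storage) at the cost of more CPU time (slower access). The CSV-like payload is made up for illustration:

```python
import zlib

# Repetitive metrics data, the kind that compresses well.
payload = b"timestamp,host,cpu\n" + b"2016-03-01T00:00:00,web-1,0.42\n" * 5000

fast = zlib.compress(payload, level=1)   # cheap to compute, larger output
small = zlib.compress(payload, level=9)  # slower to compute, smaller output

print(len(payload), len(fast), len(small))
```

Because compression is lossless, `zlib.decompress(small)` recovers the original bytes exactly; the only question is where on the CPU-versus-storage curve a given workload should sit.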
We see the most problems with companies trying to DIY with big data. There’s a menagerie of different packages and alternatives that has led to a fragile environment. The brittle infrastructure leads to more downtime and slower development. Developers need to know the right tool for the job.
Business doesn’t appreciate the investment required. Investing in infrastructure has a series of cascading impacts. There are so many technologies today that you have to sympathize with the database administrator or decision maker trying to understand and make sense of the hundreds of options. Companies do not have the right programs in place to understand the different categories of solutions, and they can end up building something that doesn’t work or that is not the best choice. That knowledge must reach database decision makers throughout the organization.
Developers have a love/hate relationship with databases; managing them is not easy. With NoSQL, many machines have to be always up, redundant, backed up, scalable, and fast. The developer must work to ensure the database continues to perform. This is usually an ops job, which is making cloud-based hosting of databases very appealing. Because of the cloud and the tools available within it, developers are able to manage the database themselves. Everything is becoming more software defined (that’s the idea of the cloud). Developers still have to decide on configuration with regard to RAM and storage space, but the cloud enables them to manage the database themselves through an intuitive interface.
Some clients are becoming more concerned about the security aspects of data. Most important is access to the dataset when it's needed. Apple changed everyone's expectations of usability; they raised the bar. How quickly can I get data to the end user without leaks?
The biggest obstacles we see in our line of work are related to data silos, vendor lock-in, and non-open, non-extensible systems. Once customers overcome these, they must face data validation on loading, mapping complex schemas or data models, and finding suitable hardware or cloud infrastructure for their deployment. Additional issues are databases or hardware being retired, the amount of bad data, and the raw complexity of data models.
The context of containers: how do I bring my traditional database onto a container? We just started seeing this in 2015, and Robin enables the process.
What obstacles to the success of data management are you encountering?