To gather insights for DZone's Data Persistence Research Guide, scheduled for release in March 2016, we spoke to 16 executives from 13 companies who develop databases and manage persistent data in their own companies or help clients do so.
Here's who we talked to:
Satyen Sangani, CEO, Alation | Sam Rehman, CTO, Arxan | Andy Warfield, Co-Founder/CTO, Coho Data | Rami Chahine, V.P. Product Management and Dan Potter, CMO, Datawatch | Eric Frenkiel, Co-Founder/CEO, MemSQL | Will Shulman, CEO, MongoLab | Philip Rathle, V.P. of Product, Neo Technology | Paul Nashawaty, Product Marketing and Strategy, Progress | Joan Wrabetz, CTO, Qualisystems | Yiftach Shoolman, Co-Founder and CTO and Leena Joshi, V.P. Product Marketing, Redis Labs | Partha Seetala, CTO, Robin Systems | Dale Lutz, Co-Founder, and Paul Nalos, Database Team Lead, Safe Software | Jon Bock, VP of Product and Marketing, Snowflake Computing
The future for databases is consolidation around big data, with a rationalization down to 10 core technologies that make data easy to access and lead to more data-driven analytics and services. Data will be easier to access and use. More processing will be done on the edge to facilitate real-time computations and decision making. Polyglot persistence ensures the safety of persistent data. Data science expands into research by defining the questions that need to be asked.
Here's what we heard when we asked, "What’s the future for data and databases as the sources of data grow?"
In 2015, there were 4.4 zettabytes of data. In 2020, we’re projected to have 44 zettabytes of data. Moving data around to access information is not realistic: it is too slow and expensive. Accessing data in place is the future. Big data and new sources will consolidate. This consolidation will introduce new ways to access and manage information.
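To put the two figures above in perspective, the short Python sketch below works out the compound annual growth rate they imply. It assumes smooth year-over-year growth between the two data points, which the source does not claim; the function names are illustrative only.

```python
import math


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by two data points."""
    return (end / start) ** (1 / years) - 1


def doubling_time_years(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


# Figures quoted above: 4.4 ZB in 2015, 44 ZB projected for 2020.
rate = implied_annual_growth(4.4, 44.0, 2020 - 2015)
print(f"Implied annual growth: {rate:.1%}")
print(f"Implied doubling time: {doubling_time_years(rate):.2f} years")
```

Under that assumption, a tenfold increase over five years corresponds to data volume growing by more than half each year.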
Different technologies will be rationalized down to 10 core ones. The way we use data will change with a new generation of data services. Data that is easy to access leads to more data-driven analytics and services; for example, maps overlaid with rich data such as restaurants and stores. Data services will be combined into new applications. There is a new focus on data management, databases, and data processing technology. The burden of use has shifted from the user to the provider of the technology, which is now becoming a service managed by the provider (e.g. backup in the SaaS environment). Centers of excellence around data used to have to include experts in many areas; now companies can just focus on the core competency of the business.
Making data easier to use. Data allows us to learn faster. Gather data implicitly by watching what people do.
Transactional databases are going away, while in-memory and data stores will become the core sources. Each will be architected differently. More focus on storing and processing data on the edge. Throw away data after initial processing and just keep the results. We’ll go from collect -> analyze -> aggregate -> store and start with classification up front. Ultimately we’ll have quick up-front analytics, but the technology is not there yet. We’ll be shrinking the data size so it can be parsed across networks. Databases and networks integrate; systems become databases.
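The idea of classifying up front on the edge and keeping only the results can be sketched roughly as follows. This is a minimal illustration, not any vendor's implementation: the threshold, the two labels, and the `Summary` shape are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Summary:
    """Compact result kept after the raw readings are discarded."""
    label: str
    count: int
    mean: float
    maximum: float


def classify(value: float, threshold: float = 75.0) -> str:
    # Hypothetical up-front classification rule.
    return "alert" if value >= threshold else "normal"


def summarize_at_edge(readings: Iterable[float],
                      threshold: float = 75.0) -> Dict[str, Summary]:
    """Classify each reading as it arrives and keep only running aggregates."""
    buckets: Dict[str, Tuple[int, float, float]] = {}
    for value in readings:
        label = classify(value, threshold)
        count, total, maximum = buckets.get(label, (0, 0.0, float("-inf")))
        buckets[label] = (count + 1, total + value, max(maximum, value))
    # Only these small summaries would be sent on to central storage.
    return {
        label: Summary(label, count, total / count, maximum)
        for label, (count, total, maximum) in buckets.items()
    }


if __name__ == "__main__":
    for summary in summarize_at_edge([70.0, 80.0, 90.0, 60.0]).values():
        print(summary)
```

The raw readings never leave the function; only one small record per class does, which is what keeps the data sent across the network small.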
Faster, smarter technology becomes hybrid (on premises and in the cloud). It won’t happen quickly. Data lakes with a meta layer for metadata cataloguing enable communication with all sources (e.g. Tamr). We will always need multiple databases for multiple purposes.
In-memory becomes ubiquitous as memory prices go down and density goes up. Memory is one million times faster than disk. Companies become more comfortable with the cloud and hybrid solutions.
It’s cool working around automated ingestion tools and adding structure to unstructured data. Databases were the product of the material they were built from. Today everything can be flexible because of the availability of fast access memory. You see systems where the data can be more active. You can work with larger sets of data and ask unexpected questions of the data.
Real-time computation, less offline computation. Rather than pre-computing a recommendation offline, there will be a lot more real-time responsiveness. More use of connections. Google was the 36th search engine but the first to connect the dots. There will be companies doing this in every industry. There’s a trend toward hosting databases in the cloud, but it will take longer. The cloud can introduce latency and errors. There’s a lot of coordination when putting a set of systems in the cloud. Bandwidth to and from the cloud can be an obstacle.
Polyglot persistence. Use databases for their particular specialty. Cloud-based services empower people to use more complex databases (e.g. 30 different nodes talking to each other at once) managed by automation. More DevOps automation with DBaaS and distributed databases. Big data leads to more scalability and speed.
Data science expands out around research to define the questions that need to be asked. If you already know the question, then you use business intelligence (BI) to answer it. The distributed nature of data will continue to grow. There’s a place for SQL and NoSQL. We will find more creative ways to store data. Data signatures and data pedigrees will become more important, especially around IoT. There will be more user-generated content with more data, more apps, and more users. All of this data will provide a more reliable image of the real world.
We anticipate seeing more data, higher frequency, and an increase in real-time data and in the persistence of real-time measurements. To cope with this, people will leverage more of the cloud and as-a-service offerings. People will also demand more commoditization and end-user value from their data, which will lead to more integration, increasing spatial content including big formats like raster and LiDAR, and improved in-database functionality and processing tools, particularly surrounding spatial data. We’ll also see more ontology: finding fuzzy relationships or similarities of data.
Scale with the enterprise based on RAM and persistent data. RAM is 10X more expensive than SSD, so we can use Flash (SSD) as a RAM extender. 100,000 operations per second is good for 90% of use cases. All chip and memory vendors are including RAM and Flash together. Adding more nodes and more RAM will become cheaper as new technologies evolve. Use memory in a smart way.
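One way to picture Flash as a RAM extender is a two-tier store that keeps the most recently used keys in RAM and spills the rest to a slower tier. The sketch below is a toy model only: `TieredStore` is a hypothetical name, a plain dictionary stands in for the SSD, and least-recently-used eviction is an assumed policy rather than anything described by the interviewees.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier key-value store: a small RAM tier backed by a 'flash' tier."""

    def __init__(self, ram_capacity: int) -> None:
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()  # hot tier, ordered oldest to newest
        self.flash = {}           # stand-in for an SSD-backed tier

    def put(self, key, value) -> None:
        self.flash.pop(key, None)   # avoid keeping a stale copy on flash
        self.ram[key] = value
        self.ram.move_to_end(key)   # mark as most recently used
        self._evict()

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.flash:
            value = self.flash.pop(key)  # promote back into RAM on access
            self.ram[key] = value
            self._evict()
            return value
        raise KeyError(key)

    def _evict(self) -> None:
        # Spill least recently used entries to flash when RAM is over capacity.
        while len(self.ram) > self.ram_capacity:
            old_key, old_value = self.ram.popitem(last=False)
            self.flash[old_key] = old_value


if __name__ == "__main__":
    store = TieredStore(ram_capacity=2)
    for key in ("a", "b", "c"):
        store.put(key, key.upper())
    print(sorted(store.ram), sorted(store.flash))
```

The working set stays in the expensive tier while the total data set can grow to the size of the cheaper one, which is the cost trade-off the paragraph above describes.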
We’re seeing an interesting trend of NoSQL databases moving toward DBMS: infinite scalability with structure on top. Databases that expand across multiple geographies.
What's the future of data and databases from your perspective?