To gather insights for DZone's Data Persistence Research Guide, scheduled for release in March 2016, we spoke to 16 executives from 13 companies who develop databases and manage persistent data in their own companies or help clients do so.
Here's who we talked to:
Satyen Sangani, CEO, Alation | Sam Rehman, CTO, Arxan | Andy Warfield, Co-Founder/CTO, Coho Data | Rami Chahine, V.P. Product Management and Dan Potter, CMO, Datawatch | Eric Frenkiel, Co-Founder/CEO, MemSQL | Will Shulman, CEO, MongoLab | Philip Rathle, V.P. of Product, Neo Technology | Paul Nashawaty, Product Marketing and Strategy, Progress | Joan Wrabetz, CTO, Qualisystems | Yiftach Shoolman, Co-Founder and CTO and Leena Joshi, V.P. Product Marketing, Redis Labs | Partha Seetala, CTO, Robin Systems | Dale Lutz, Co-Founder, and Paul Nalos, Database Team Lead, Safe Software | Jon Bock, VP of Product and Marketing, Snowflake Computing
Persistent data is data that is considered durable at rest, surviving the coming and going of software and devices: master data that is stable, set and recoverable whether it sits in flash or in memory.
Here's what we heard when we asked, "How do you define persistent data?":
The opposite of dynamic—it doesn’t change and is not accessed very frequently.
Core information, also known as dimensional information in data warehousing. Demographics of entities—customers, suppliers, orders.
Master data that’s stable.
Data that exists from one instance to another. Data that exists across time independent of the systems that created it. Now there’s always a secondary use for data, so there’s more persistent data. A persistent copy may be made or it may be aggregated. The idea of persistence is becoming more fluid.
Stored in its actual format, where it stays, versus in-memory, where you have it once, close the file, and it's gone. You can retrieve persistent data again and again. Data that's written to disk; however, disk speed is a bottleneck for the database, so we're trying to move to memory because it's 16X faster.
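The disk-versus-memory distinction drawn above can be sketched in a few lines of Python; the file name and data are illustrative assumptions, not from any of the respondents:

```python
import json
import os

DATA_FILE = "orders.json"  # illustrative file name

# In-memory only: this dict vanishes when the process exits.
orders = {"1001": "shipped", "1002": "pending"}

# Persist: write the data to disk so it survives a restart.
with open(DATA_FILE, "w") as f:
    json.dump(orders, f)

# Later (or in a new process): retrieve it again and again.
with open(DATA_FILE) as f:
    restored = json.load(f)

assert restored == orders  # the persisted copy round-trips intact
os.remove(DATA_FILE)       # cleanup for the example
```

The trade-off the respondent describes is exactly this round trip: the disk write makes the data durable, but every read and write now pays disk latency instead of memory speed.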
Every client has their own threshold for criticality (e.g., financial services don't want to lose any debits or credits). Now, with much more data coming from machines and sensors, there is greater transactionality. The metadata is as important as the data itself; metadata must be transactional.
Non-volatile. Persists in the face of a power outage.
Any data stored in a way that it stays stored for an extended period, versus in-memory data: data stored in a system modeled and structured to endure power outages. Data that doesn't change at all.
Data considered durable at rest with the coming and going of hardware and devices. There’s a persistence layer at which you hold your data at risk.
Data that is set and recoverable whether it is backed by flash or by memory.
With persistent data, there is reasonable confidence that changes will not be lost and the data will be available later. Depending on the requirements, in-cloud or in-memory systems can qualify. We care most about the "data" part. If it’s data, we want to enable customers to read, query, transform, write, add-value, etc.
A way to persist data to disk or storage. There are multiple options for doing so, with one replica across data centers in any combination, with or without persistence: snapshot the data to disk, or snapshot the changes; write to disk every second, or on every write. Users can choose among all of these options. Persistence is part of a high-availability suite that provides replication and instant failover, registered over multiple clouds; we host thousands of instances over multiple data centers with only two node failures per day, and users can choose between multiple data centers and multiple geographies. We are the company behind Redis. Others treat it as a cache and not a database. With multiple nodes, data is written to disks; you can't do that with regular open source. If you don't do high availability, as recommended, you can lose your data.
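The snapshot-versus-per-write choices described above map to Redis's standard persistence settings. A minimal `redis.conf` fragment, with illustrative thresholds, might look like:

```
# RDB: snapshot the dataset to disk if at least 1 key
# changed in the last 60 seconds (thresholds are illustrative)
save 60 1

# AOF: log every write command; fsync once per second,
# or use "always" to fsync on every write
appendonly yes
appendfsync everysec
```

RDB snapshots correspond to "snapshot data to disk," while the append-only file with `appendfsync everysec` or `always` corresponds to "write to disk every one second or every write."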
Anything that goes to a relational or NoSQL database, or anything in between.
So, how do you define persistent data?