There’s a lot of data out there, to be sure, and people are constantly looking for ways to use it. But sheer volume is no longer the only thing we have to worry about. We’re now at the point where we need to deal with not just big data, but fast data, too.
As we develop new ways of gathering data, we’re having to work with that data as it arrives, rather than just amassing piles of it to deal with later. Take the Internet of Things, for example. IoT networks allow devices and sensors to speak with one another, but the data sent from a motion sensor won’t do much good if it sits around somewhere waiting to be analyzed. When data from a motion sensor is collected, it needs to be analyzed and acted on in the moment, as fast as it comes.
Big data has been a buzzword for some time now, and there are many, many database tools for dealing with the humongous amounts of data being collected. But most of those tools are limited when it comes to data that needs to be handled quickly.
Enter VoltDB: a database built for fast data needs. VoltDB is an in-memory NewSQL database developed by a team led by Dr. Michael Stonebraker, and the folks over at VoltDB believe this database is the answer to fast data problems. They claim it is fast, smart, and scalable, with the ability to handle streaming data and allow for real-time analysis, so when a sensor detects motion, that data can be used immediately from the database it’s sent to.
VoltDB uses a few different methods to work with fast data. First, it partitions database tables across the CPUs of a cluster of machines. Second, each transaction (an ad hoc SQL statement, or a stored procedure containing a mix of SQL and Java logic) can be executed against that data, in memory, at each partition. This lets VoltDB execute multiple transactions simultaneously, in parallel.
By analyzing and precompiling the data access logic in the stored procedures, VoltDB can distribute both the data and the processing associated with it to the individual partitions on the cluster. In this way, each partition contains a unique "slice" of the data and the data processing. Each node in the cluster can support multiple partitions.
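To make the partitioning idea concrete, here is a minimal conceptual sketch (in Python, and not VoltDB's actual API — the names and structures are illustrative): rows are hash-partitioned on a key column, so each partition owns a unique slice of the data, and any work keyed on that column can be routed straight to the one partition that holds it.

```python
# Conceptual sketch of hash partitioning, not VoltDB's real implementation.

NUM_PARTITIONS = 4

def partition_for(key):
    """Map a partitioning-column value to a partition index."""
    return hash(key) % NUM_PARTITIONS

# Each partition holds only its own "slice" of the table.
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def insert_row(key, row):
    # The row lives only on the partition its key hashes to.
    partitions[partition_for(key)][key] = row

def lookup(key):
    # A single-partition read touches exactly one partition.
    return partitions[partition_for(key)].get(key)

insert_row("sensor-17", {"reading": 0.42})
assert lookup("sensor-17") == {"reading": 0.42}
```

Because the data and the processing for a given key land on the same partition, a single-partition transaction never has to coordinate with the rest of the cluster.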
It also uses serialized processing per partition, allowing for consistency between transactions “without the overhead of locking, latching, and transaction logs.”
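The serialized-processing idea can be sketched as a partition that drains its transaction queue one item at a time (again, an illustrative Python model, not VoltDB's code): because only one transaction ever touches a partition's data at once, no locks or latches are needed for consistency within that partition.

```python
# Conceptual sketch: one partition, one serial stream of transactions.
from queue import Queue

class Partition:
    def __init__(self):
        self.data = {}        # this partition's slice of the data
        self.work = Queue()   # transactions waiting to run

    def submit(self, txn):
        self.work.put(txn)

    def run_all(self):
        # Transactions execute strictly one at a time, in arrival order,
        # so no locking is needed to keep the data consistent.
        results = []
        while not self.work.empty():
            txn = self.work.get()
            results.append(txn(self.data))
        return results

p = Partition()
p.submit(lambda data: data.update(count=1) or data["count"])
p.submit(lambda data: data.update(count=data["count"] + 1) or data["count"])
assert p.run_all() == [1, 2]
```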
When a procedure does require data from multiple partitions, one node acts as a coordinator: it hands out the necessary work to the other nodes, collects the results, and completes the task. This coordination makes multi-partitioned transactions slightly slower than single-partitioned transactions. However, transactional integrity is maintained, and the architecture of multiple parallel partitions keeps throughput at a maximum.
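The coordinator pattern above is essentially scatter-gather. A minimal sketch (hypothetical names, not VoltDB's API) shows why the multi-partition path costs more: the coordinator must fan the work out to every partition and merge the partial results, while the single-partition path touches exactly one.

```python
# Conceptual sketch of single- vs. multi-partition work routing.

# Two partitions, each owning a unique slice of a sensor table.
partitions = [
    {"sensor-1": 3, "sensor-2": 5},   # partition 0's slice
    {"sensor-3": 7},                  # partition 1's slice
]

def single_partition_count(partition_index):
    # Fast path: the work touches exactly one partition.
    return len(partitions[partition_index])

def multi_partition_count():
    # Coordinator path: scatter the request to every partition,
    # then gather and combine the partial results.
    partials = [len(p) for p in partitions]
    return sum(partials)

assert single_partition_count(0) == 2
assert multi_partition_count() == 3
```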
VoltDB scaling has also been designed to be simple, so that the database can be scaled without changes to schemas or application code. Nodes can even be added to the database while it is running.
VoltDB offers SQL, Java, and ACID support, and claims to run transactions faster than standard databases. Its architecture does not require special hardware; a few benchmarks from this article on VoltDB scaling came from instances of VoltDB running on virtual machines in Amazon’s cloud.
Fast data’s importance is growing rapidly as the Internet of Things gains traction, and VoltDB aims to solve the issue of handling streaming data in real-time. Find out more at VoltDB.com.
Also, make sure you check out the DZone Guide to Database and Persistence Management, coming March 9.