I sat down to talk to Richard Lamb, the President of SnappyData, and Jags Ramnarayan, the CTO. Their names may sound familiar: they came from Pivotal, where they were the big data / in-memory team. They have done some amazing work in a short time with their open source SnappyData product, which runs on Spark.
SnappyData uses the enterprise-proven GemFire technology to power an incredible analytics, OLTP and OLAP engine. The project itself defines it best as "SnappyData is a distributed in-memory data store for real-time operational analytics, delivering stream analytics, OLTP (online transaction processing) and OLAP (online analytical processing) in a single integrated cluster. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire XD (as an in-memory transactional store with scale-out SQL semantics)." As a previous user of GemFire XD in real-time streaming applications, I can attest to how cool this is. You can run SQL against in-memory data at lightning speed and easily write scalable applications that tap huge data sources. It includes high availability (HA), transactions and clustering.
They fixed the issues and shortcomings of GemFire XD and went further. They have an extended query optimizer, and they run the SnappyData code inside the same JVM as each Spark node. They are also working with MIT on their technology. One interesting piece is their approximate query engine, which can return approximate answers very quickly within a given error tolerance. That is great for sampling and for rapid display as part of a Lambda architecture application. I like this for dashboards that display a sample of the data rapidly streaming through the system. It is especially well suited to social data and logs, which don't have to be exact.
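To make the idea of "approximate answers within an error tolerance" concrete, here is a minimal sketch of the general technique: estimate an average from a uniform random sample, attach a confidence interval, and grow the sample until the interval is tight enough. This is not SnappyData's API or its actual algorithm (its engine may use stratified samples and other structures); the function names `approximate_mean` and `mean_within_tolerance` and the normal-approximation 95% interval are assumptions made purely for illustration.

```python
import math
import random
import statistics


def approximate_mean(data, sample_size, z=1.96, seed=None):
    """Hypothetical helper: estimate mean(data) from a uniform random sample.

    Returns (estimate, half_width), where half_width is the half-width of an
    approximate confidence interval (normal approximation; z=1.96 is ~95%).
    """
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)
    estimate = statistics.fmean(sample)
    n, big_n = sample_size, len(data)
    # Standard error of the sample mean, with finite population correction
    # (the correction shrinks the interval to 0 once the whole set is sampled).
    fpc = math.sqrt((big_n - n) / (big_n - 1))
    half_width = z * statistics.stdev(sample) / math.sqrt(n) * fpc
    return estimate, half_width


def mean_within_tolerance(data, rel_tolerance, start=100, seed=0):
    """Hypothetical helper: double the sample size until the interval's
    relative half-width is <= rel_tolerance, or all rows have been used.

    Returns (estimate, half_width, rows_sampled).
    """
    if len(data) < 2:
        return statistics.fmean(data), 0.0, len(data)
    n = min(max(start, 2), len(data))
    while True:
        est, hw = approximate_mean(data, n, seed=seed)
        if est != 0 and hw / abs(est) <= rel_tolerance:
            return est, hw, n
        if n == len(data):
            # Whole data set scanned: fall back to the exact answer.
            return statistics.fmean(data), 0.0, n
        n = min(n * 2, len(data))


if __name__ == "__main__":
    # Synthetic "event latency" stream, assumed for the demo only.
    gen = random.Random(42)
    events = [gen.expovariate(1 / 50.0) for _ in range(200_000)]
    est, hw, n = mean_within_tolerance(events, rel_tolerance=0.01)
    print(f"~{est:.2f} +/- {hw:.2f} from {n} of {len(events)} rows")
```

The design trade-off is the one that makes this attractive for dashboards: the cost depends on the sample size needed for the requested tolerance rather than on the size of the stream, so a looser tolerance means fewer rows scanned and a faster answer.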
More than 25 members of the GemFire team are working at SnappyData, and the company has funding from Pivotal but is a separate company. I am really excited about this project knowing that Jags and Richard are running it. They are brilliant guys, and I am expecting a lot from this project.
There are a lot of enterprise use cases that need this, so that enterprises can replace their proprietary data warehouses with open source Spark. The company will provide enterprise support and will add enhanced tools on top of the product for enterprise customers.
I am working on a follow-up to this article with an example application and a walkthrough of setup and usage, so stay tuned.
At airis.DATA, we do a lot of experimental application development to show our clients how to use advanced tools and new products and this is one I am excited to look at.
Check out this SnappyData presentation.