It was great talking with Monte Zweben, CEO of Splice Machine, about how he and his team are making artificial intelligence (AI) and machine learning (ML) data processing easier than constructing a lambda architecture.
Monte worked on machine learning systems for NASA, serves on the advisory board of the Dean of Computer Science at Carnegie Mellon University, and was Chairman of Rocket Fuel. At Rocket Fuel he saw first-hand how difficult it is to put together the three different engines a lambda architecture needs to run AI/ML computations:
- A batch layer for data analysis and building ML models, like Spark or Hive.
- A streaming engine for ingesting and analyzing streams in real time, like Kafka.
- An app serving layer for low-latency computing, which looks up and updates data in milliseconds for applications, like HBase.
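To make the division of labor between the three layers concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the general lambda pattern, not how Spark, Kafka, HBase, or Splice Machine work internally; the class names (`BatchLayer`, `SpeedLayer`, `ServingLayer`) and the sensor events are hypothetical.

```python
from collections import defaultdict


class BatchLayer:
    """Keeps the full event log and recomputes a complete view from scratch."""

    def __init__(self):
        self.master_log = []  # append-only list of (key, amount) events

    def append(self, event):
        self.master_log.append(event)

    def recompute_view(self):
        # Slow but complete: scan every event ever recorded.
        view = defaultdict(int)
        for key, amount in self.master_log:
            view[key] += amount
        return dict(view)


class SpeedLayer:
    """Keeps running totals for events that arrived after the last batch run."""

    def __init__(self):
        self.realtime_view = defaultdict(int)

    def ingest(self, event):
        key, amount = event
        self.realtime_view[key] += amount

    def reset(self):
        # Called once a new batch view covers these events.
        self.realtime_view.clear()


class ServingLayer:
    """Answers lookups by merging the batch view with the real-time view."""

    def __init__(self):
        self.batch_view = {}

    def load(self, view):
        self.batch_view = view

    def query(self, key, speed):
        return self.batch_view.get(key, 0) + speed.realtime_view.get(key, 0)


def run_demo():
    batch, speed, serving = BatchLayer(), SpeedLayer(), ServingLayer()

    # Every event goes to both the batch log and the speed layer.
    for event in [("sensor-a", 3), ("sensor-b", 5)]:
        batch.append(event)
        speed.ingest(event)

    # A batch job finishes: publish its view and clear the speed layer.
    serving.load(batch.recompute_view())
    speed.reset()

    # A new event arrives before the next batch run.
    late_event = ("sensor-a", 4)
    batch.append(late_event)
    speed.ingest(late_event)

    return serving.query("sensor-a", speed), serving.query("sensor-b", speed)


if __name__ == "__main__":
    print(run_demo())  # (7, 5)
```

Even in this toy version, every event has to be written to two places, and the hand-off between the batch view and the speed layer has to be coordinated by hand. In a real deployment each class is a separate distributed system.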
Specific limitations of lambda architectures that Splice Machine set out to address include:
- Complexity: Separate systems, often open-source projects, written in different languages, that all need to be integrated and maintained to remain operational.
- Specialized skills: Requires developers who program in multiple programming languages and distributed system paradigms (and these are not easy people to find).
- Loose coupling: Engines are loosely coupled, so changes to data in one layer take time and effort to be communicated to another layer in order to be considered in the analysis.
- Concurrency: Unlike operational databases, which are designed to handle concurrent users, lambda-based systems can make it difficult to ensure updates are applied correctly in the event of errors, power failures, and simultaneous updates.
- Resilience: Pipelines often need to be restarted, resulting in high latency and missed deadlines. Companies that experience these missed deadlines are flying blind and often must act on stale data.
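The concurrency point is easier to see next to what an operational database provides out of the box. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for a transactional SQL engine; it is not Splice Machine code, and the `stock` table, the location names, and the `move_units` helper are assumptions made for illustration. A multi-step update either commits as a whole or is rolled back when an error occurs partway through.

```python
import sqlite3


def move_units(conn, src, dst, units):
    """Move stock between two locations atomically; undo everything on error."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE stock SET units = units - ? WHERE location = ?",
                (units, src),
            )
            (remaining,) = conn.execute(
                "SELECT units FROM stock WHERE location = ?", (src,)
            ).fetchone()
            if remaining < 0:
                # Simulated mid-update failure: the first UPDATE must not stick.
                raise ValueError("not enough stock at source")
            conn.execute(
                "UPDATE stock SET units = units + ? WHERE location = ?",
                (units, dst),
            )
        return True
    except ValueError:
        return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (location TEXT PRIMARY KEY, units INTEGER)")
    conn.executemany(
        "INSERT INTO stock VALUES (?, ?)", [("warehouse", 100), ("store", 0)]
    )
    conn.commit()

    first = move_units(conn, "warehouse", "store", 60)   # succeeds
    second = move_units(conn, "warehouse", "store", 60)  # fails and rolls back

    levels = dict(conn.execute("SELECT location, units FROM stock"))
    conn.close()
    return first, second, levels


if __name__ == "__main__":
    print(demo())  # (True, False, {'warehouse': 40, 'store': 60})
```

In a lambda architecture the same logical update may touch several loosely coupled engines, so there is no single transaction boundary like the `with conn:` block to lean on.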
While big companies like Google, Facebook, and Netflix may have the computing power and data scientists necessary to do this work, few other companies have these resources. Splice Machine strives to make this simple for everyone else by providing an online predictive processing (OLPP) RDBMS with a lambda architecture underneath. By democratizing the lambda architecture with SQL, companies can scale and analyze petabytes of data in real time.
A few examples:
- Clearsense uses big data analysis of electronic medical records (EMRs) of patients in hospitals to predict sepsis or code blue events so that nurses can intervene and prevent these potentially deadly events from occurring.
- Mojix, an IoT company that uses RFID to track supply chain and inventory, had a lambda architecture that was too complex. Splice Machine provided them with a platform that enables them to ingest and analyze their data in real time as well as power their application.
- A leading financial services company is powering a customer service application with seven petabytes of data and is able to look up and update data in milliseconds.
OLPP is a new way to provide the power of the lambda architecture but with the familiarity and ubiquity of SQL.