High-Performance Analytics for the Data Lakehouse
New platform enables lakehouse analytics and reduces the cost of infrastructure by conducting analytics without ingesting data into the central warehouse.
CelerData, a unified analytics platform for the modern, real-time enterprise, has announced the latest version of its enterprise analytics platform.
“The data lakehouse has added critical capabilities to the data lake architecture by introducing ACID control, table formats, and data governance,” said James Li, CEO, CelerData. “However, analytics capabilities on the lakehouse are still limited and cost prohibitive. Most query engines struggle to support interactive ad-hoc queries, are not able to support real-time analytics, and fall apart when facing a large number of concurrent users.”
The new platform enables lakehouse users to conduct high-performance analytics without ingesting data into a central data warehouse. According to CelerData, queries complete three times faster and at a significantly lower cost.
Data lakehouse users can query across streaming and historical data in real time, without first waiting to combine streaming data into batches for analysis. This simplifies the data architecture and improves the timeliness of lakehouse analytics. The advanced query engine can support thousands of concurrent users at 10,000 QPS (queries per second), enabling new use cases.
I had the opportunity to interview Li Kang, VP of Strategy at CelerData, to learn more about the benefits of the platform evolution:
I am not familiar with the table formats Iceberg, Hudi, and Delta Lake. Can you provide a use case for each that I can relate to?
“Table formats are a way to organize data files. Files stored in the data lake don’t have standard metadata information like database tables do. Table formats try to bring database-like features to the data lake, such as tables, columns, transaction logs, etc. The result is that a data lake can be managed in a way similar to how a database is managed.”
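To make the idea concrete, here is a minimal Python sketch of what a table format adds on top of plain files: a schema and a transaction log that records which data files belong to the table at each version. The `ToyTable` class, its methods, and the file names are hypothetical and purely illustrative. This is not how Iceberg, Hudi, or Delta Lake are actually implemented, only the general shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical, simplified table format: column names plus an append-only log."""
    columns: list
    log: list = field(default_factory=list)  # one entry per committed transaction

    def commit(self, add=(), remove=()):
        """Record one atomic change to the set of data files; return the new version."""
        self.log.append({"add": list(add), "remove": list(remove)})
        return len(self.log)

    def files(self, version=None):
        """Replay the log up to `version` to find the files that make up the table."""
        live = []
        entries = self.log if version is None else self.log[:version]
        for entry in entries:
            live = [f for f in live if f not in entry["remove"]]
            live.extend(entry["add"])
        return live


# Illustrative file names; a real lake would hold many Parquet or ORC files.
table = ToyTable(columns=["order_id", "amount"])
v1 = table.commit(add=["part-0001.parquet", "part-0002.parquet"])
v2 = table.commit(add=["part-0003.parquet"], remove=["part-0001.parquet"])

current_files = table.files()            # files in the latest version
files_at_v1 = table.files(version=v1)    # files as of the first commit
print(current_files)
print(files_at_v1)
```

Because readers consult the log rather than listing the storage directory, a query engine always sees a consistent set of files, and earlier versions remain reconstructable. That is the database-like behavior the answer above describes.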
Why is it important to be able to perform high-performance analytics without ingesting data into a central data warehouse? Is this where the 3X query performance is achieved?
“Data ingestion has three side effects: It’s expensive because dedicated hardware resources are required to ingest data, and the cloud vendors may also charge for the network traffic. It’s time-consuming because, depending on the amount of data and hardware resources, this process can take minutes to hours. There is also a potential for data quality issues, such as inconsistency, because there is now a duplication of data in its original location and data warehouse.”
“People need data warehouses because data warehouses provide metadata information and great query performance. With CelerData and table formats, we can address these two concerns in the data lake.”
What are some examples of business problems you are solving, or will be able to solve for clients?
“CelerData can be used across industries for any customer who needs to analyze large amounts of data to drive business decisions. Examples of business problems we solve include real-time fraud detection, digital advertisement placement and performance analysis, retail promotion and product recommendations, supply chain management, social media platform analysis, and much more.”
Why will developers like this news? How will it make their lives simpler and easier?
“Modern applications increasingly need built-in analytics, so application developers, whether they are building SaaS, mobile, or enterprise applications, will appreciate a high-performance query engine that allows them to derive insight from large amounts of data easily, quickly, and cost-effectively.”
I hope you have learned something new from this interview with Li Kang at CelerData and will take this information and apply it to your software development career or hobby.