Simplify Big Data Analytics With AirMettle
100x faster SQL queries without warehouses. The software-defined approach runs on commodity hardware both on-premises and in the cloud.
Big data analytics continues to grow in importance, but working with massive datasets poses challenges. Moving petabytes of data repeatedly for analysis strains networks and budgets. Even then, irrelevant data can overwhelm analytics platforms, generating more expense without additional insight.
We met with AirMettle during the 53rd IT Press Tour. The company offers a new approach, integrating analytics into the data lake itself. Its software runs on commodity hardware, delivering faster insights without the overhead of traditional data warehouses. Several key features help developers simplify big data analytics:
Accelerated SQL Queries
AirMettle accelerates SQL directly on data lake storage, eliminating unnecessary data transfers. Built-in parallelism delivers 100x faster SELECT performance than native S3 analytics. This supports ad hoc analysis that data warehouses struggle with.
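The article does not document AirMettle's query interface, but the pushdown idea it describes can be sketched with a toy S3 Select-style filter: the "storage side" scans the object and returns only the matching rows, so only a fraction of the bytes ever cross the network. The function and data below are hypothetical illustrations, not AirMettle's API.

```python
import csv
import io

def storage_side_select(obj_bytes: bytes, column: str, value: str) -> bytes:
    """Hypothetical server-side SELECT: filter rows inside the storage
    layer so only matching records leave the data lake."""
    reader = csv.DictReader(io.StringIO(obj_bytes.decode()))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row[column] == value:
            writer.writerow(row)
    return out.getvalue().encode()

# A toy "object" in the data lake: web logs where 1% of requests fail.
obj = b"ts,status,path\n" + b"".join(
    f"{i},{'500' if i % 100 == 0 else '200'},/api\n".encode()
    for i in range(10_000)
)

errors_only = storage_side_select(obj, "status", "500")
print(len(errors_only), "bytes returned vs", len(obj), "bytes stored")
```

The design point is where the filter runs: the same predicate executed client-side would first require downloading the entire object.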
Focus on Relevant Data
Rather than analyzing entire objects, AirMettle summarizes and extracts relevant subsets before they leave storage. This massively reduces the volume developers must manage for a given query, making it feasible to use more historical data.
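Summarizing before data leaves storage can be sketched the same way: instead of shipping a year of raw events to the client, the storage layer reduces them to a compact aggregate. Everything below is a hypothetical illustration of the idea, not AirMettle's implementation.

```python
from collections import Counter

def storage_side_summary(records, key):
    """Hypothetical in-storage summarization: collapse raw records into
    a small aggregate before anything crosses the network."""
    return dict(Counter(r[key] for r in records))

# 100,000 raw events stay in the data lake...
raw = [{"day": i % 365, "user": f"u{i % 50}"} for i in range(100_000)]

# ...and only a 50-entry histogram travels to the analytics client.
per_user = storage_side_summary(raw, "user")
print(len(per_user), "summary rows instead of", len(raw), "raw records")
```

Because the reduction happens where the data already lives, querying years of history costs roughly the same network transfer as querying a day.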
Run on Commodity Hardware
As software-defined storage, AirMettle runs on any x86 server with SSDs. It integrates easily into existing infrastructure both on-premises and in the cloud, replacing only the high-performance storage tier. This makes it accessible for teams looking to optimize big data analysis.
Handle Diverse Data Types
AirMettle handles diverse data types, from video to complex scientific data, automatically detecting optimal ways to structure each for fast in-place processing. Support for open APIs like S3 and Arrow makes analytical results readily consumable.
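The article says AirMettle automatically detects an optimal in-place structure per data type but does not describe the mechanism. A minimal sketch of that kind of dispatch, with entirely hypothetical type-to-layout mappings, might look like this:

```python
# Hypothetical mapping from detected content type to an in-storage
# layout; the real platform's detection logic is not described here.
LAYOUTS = {
    "text/csv": "columnar (shredded records for parallel SELECT)",
    "application/x-parquet": "columnar (already analytics-friendly)",
    "video/mp4": "chunked by keyframe for frame-level extraction",
    "application/x-netcdf": "tiled n-dimensional arrays",
}

def choose_layout(content_type: str) -> str:
    """Fall back to opaque chunking when the type is unrecognized."""
    return LAYOUTS.get(content_type, "opaque fixed-size chunks")

print(choose_layout("text/csv"))
print(choose_layout("application/octet-stream"))
```

The fallback matters: unrecognized objects still get stored and served, just without format-aware acceleration.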
From Collaboration to Deployment
The platform spans the development lifecycle with shared workspaces, version control, code review, and CI/CD integration. Role-based access control and security features help manage access, while extensibility hooks support custom workflows.
AirMettle is launching in mid-2024 after four years of development driven by $4 million in investor funding. Early customers like Los Alamos National Laboratory validate both commercial and research applications. Upcoming reference deployments in verticals like finance, security, entertainment, and climate science showcase use cases.
How It Compares
For engineering teams struggling to extract value from rapidly growing data stores, AirMettle promises welcome simplicity. By moving analytics closer to the data lake, they aim to deliver deeper insights without the overhead enterprises suffer today. Early results suggest they can accelerate SQL queries up to 100x using only commodity infrastructure.
AirMettle stands apart from other vendors in the computational storage space, like Coho Data, in its pure software approach. As software-defined storage, it isn't tied to proprietary hardware and promises easier adoption and infrastructure integration. This aligns more closely with analytics platforms from public cloud providers, but AirMettle differs by running directly within on-premises storage infrastructure.
It also goes beyond cloud analytics offerings by supporting more flexible and granular in-place processing rather than query acceleration alone. AirMettle's ability to handle unstructured data in native formats helps address gaps left by analytical warehouses that rely on rigid schemas. Its performance claims of up to 100x faster analysis set it apart from existing analytics options that struggle to keep pace with rapidly growing data volumes. Early customer wins lend credence to the idea that AirMettle's integrated architecture could disrupt today's common separation of storage and insights.
"Our scientific large-scale simulations can generate hundreds of petabytes of highly dimensional floating-point data. However, the data associated with a scientific feature of interest can be orders of magnitude smaller than the written data, so a key challenge is quickly and efficiently finding what’s relevant in this sea of data. To optimize this process, we’ve been drawn towards computational storage — processing data in-place and near storage — to eliminate unnecessary data movement while maintaining parallelism and adequate data protection." — Gary Grider, High-Performance Computing Division Leader, Los Alamos National Lab.
"We are a leading SIEM company, spending tens of millions of dollars per year on our data warehouse for security analytics. Our costs keep rising as attacks get more sophisticated and customers demand more proof that their data is protected. By moving analytical processing directly into our data lake with AirMettle, we expect to save over $10 million annually on our largest application while enabling more advanced analytics by leveraging all of our log data." — Anonymous, Public SIEM Company.
"As our therapeutic simulations generate exponentially more data, AirMettle will allow us to extract insights at speeds previously impossible. By parallelizing computation where the data already resides, we can now mine years of archives for clues that could lead to healthcare breakthroughs." — Chief Data Officer, Biotechnology Startup.
If the combination of improved performance, cost savings, and easy access to diverse data in its native formats resonates, AirMettle may open new possibilities for organizations to navigate big data. Engineers searching for analytical freedom from warehouses and proprietary lock-in should keep an eye on their mid-2024 launch.