How To Simplify Building Interactive Apps on the Data Lake
Starburst provides new capabilities to build interactive analytics apps on the data lake with real-time ingestion, governance, sharing, and maintenance.
Data teams today face the challenge of building and scaling interactive, analytics-driven applications on complex, burdensome data lake architectures. Starburst unveiled new capabilities at AWS re:Invent 2023 to help enterprises overcome these hurdles by unifying real-time data ingestion, governance, sharing, and maintenance on an open, cost-efficient platform.
"We're seeing more and more customers building data applications leveraging this type of architecture," said Justin Borgman, CEO of Starburst. The company's core value proposition is providing flexibility and optionality by enabling access to data anywhere.
Real-Time Data Ingestion
A key ingredient missing from many data lake architectures is the ability to ingest streaming data in real time. Starburst now enables integration with Kafka to hydrate data lakes with live data feeds.
"With connectivity to Kafka, you can have data that's extremely fresh and make very fast business decisions in an automated way," said Borgman. This positions data teams to build interactive applications that leverage real-time insights.
Support for fully managed services such as Confluent Cloud is planned. By incorporating streaming data from Kafka, Starburst customers can power next-generation applications that depend on millisecond-level latency.
Automated Data Governance
Governing security and access controls is critical when handling sensitive information. Starburst automatically applies classifications to data using machine learning. Based on those classifications, policies can be enforced to grant or restrict access for specific users and groups.
"As new data lands in the lake, machine learning models in Gravity will automatically apply classifications for certain categories," Borgman explained. Gravity, Starburst's governance layer, then restricts access as needed.
"Now, as soon as PII lands in the lake, Gravity will be smart enough to identify and restrict access to that data," explained Borgman. This is extremely useful for teams working with personal data subject to privacy regulations.
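The classify-then-restrict flow described above can be illustrated with a minimal Python sketch. It is not Gravity's implementation or API: simple regular expressions stand in for the machine learning classifiers, and the names `PII_PATTERNS`, `POLICY`, `classify_column`, `can_read`, and `readable_columns` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical patterns standing in for the ML classifiers the article mentions.
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

# Hypothetical policy: the roles allowed to read columns carrying each tag.
POLICY = {"email": {"privacy_officer"}, "us_ssn": {"privacy_officer"}}


def classify_column(values):
    """Return the PII tags whose pattern matches every non-empty sample value."""
    samples = [v for v in values if v]
    tags = set()
    for tag, pattern in PII_PATTERNS.items():
        if samples and all(pattern.match(v) for v in samples):
            tags.add(tag)
    return tags


def can_read(role, column_tags):
    """A role may read a column only if it is allowed for every tag on it."""
    return all(role in POLICY.get(tag, set()) for tag in column_tags)


def readable_columns(role, table):
    """Classify each column of a newly landed table and keep the readable ones."""
    return [
        name
        for name, values in table.items()
        if can_read(role, classify_column(values))
    ]


if __name__ == "__main__":
    landed = {
        "customer_email": ["a@example.com", "b@example.org"],
        "order_total": ["19.99", "5.00"],
    }
    print(readable_columns("analyst", landed))
    print(readable_columns("privacy_officer", landed))
```

The point of the sketch is the ordering: classification runs when data lands, so the access decision is already in place before anyone queries the new column.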
For multinational corporations, Starburst facilitates a distributed architecture that keeps protected data within geographic boundaries. This prevents prohibited cross-border data transfers while still enabling analytics. Starburst connectors allow clusters in different regions to communicate while minimizing data movement.
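One way to picture analytics that respects geographic boundaries is to aggregate inside each region and move only the aggregates. The Python sketch below assumes that pattern for illustration; it does not describe how Starburst connectors work internally, and the function names and row fields (`customer`, `product`, `amount`) are hypothetical.

```python
from collections import Counter


def local_aggregate(rows):
    """Runs inside one region: reduce raw rows to per-product revenue totals."""
    totals = Counter()
    for row in rows:
        totals[row["product"]] += row["amount"]
    return totals


def global_report(regional_totals):
    """Runs centrally: merges aggregates only, so raw rows never leave a region."""
    combined = Counter()
    for totals in regional_totals:
        combined.update(totals)  # Counter.update adds values per key
    return dict(combined)


if __name__ == "__main__":
    eu_rows = [
        {"customer": "anna", "product": "widget", "amount": 10},
        {"customer": "ben", "product": "gadget", "amount": 5},
    ]
    us_rows = [{"customer": "carl", "product": "widget", "amount": 7}]
    print(global_report([local_aggregate(eu_rows), local_aggregate(us_rows)]))
```

Customer-level fields stay in the region where they were collected; only product totals cross the boundary.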
Data Maintenance Automations
Maintaining high performance and efficiency in a data lake as it scales is an arduous manual task. Starburst introduces capabilities like automated data compaction, vacuuming, and schema discovery to optimize infrastructure.
"Users can now maintain warehouse-like performance without adding brittle manual processes," said Borgman. This prevents bottlenecks, allowing teams to focus on innovation rather than infrastructure.
By abstracting away common management burdens, Starburst lets customers scale data lakes to petabytes of data while query volumes grow rapidly.
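To give a sense of what automated compaction has to decide, here is a minimal Python sketch of a compaction planner that groups small files into batches near a target size. It uses a generic first-fit-decreasing heuristic as an assumption for illustration, not Starburst's actual algorithm; `plan_compaction` and the 128 MB default are hypothetical.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Group files smaller than target_mb into batches totalling at most target_mb."""
    small = sorted((s for s in file_sizes_mb if s < target_mb), reverse=True)
    batches = []
    for size in small:
        # First-fit decreasing: put the file in the first batch that has room.
        for batch in batches:
            if sum(batch) + size <= target_mb:
                batch.append(size)
                break
        else:
            batches.append([size])
    # Rewriting a single file on its own gains nothing, so drop those batches.
    return [batch for batch in batches if len(batch) > 1]


if __name__ == "__main__":
    for batch in plan_compaction([100, 60, 40, 20, 8, 200]):
        print(f"merge {len(batch)} files totalling {sum(batch)} MB")
```

Each returned batch would be rewritten as one larger file, which reduces the number of files a query engine has to open per scan.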
Universal Data Sharing
Collaborating and sharing data drives growth but can be limited by proprietary ecosystems. Starburst's Gravity enables packaging datasets into shareable data products accessible to any user or third party.
"In Snowflake's case, you can only share data from Snowflake to Snowflake. Our approach allows sharing data from any database or lake to any customer," explained Borgman. This provides true openness.
The new functionality also enables monitoring and logging for shared data products. With the Starburst data-sharing approach, partners don't need to standardize on the same technology stack.
Self-Service Analytics With AI
Exploratory analytics is difficult without technical SQL expertise. Starburst is incorporating AI to enable natural language queries.
"New AI-powered experiences in Galaxy, like text-to-SQL processing, will enable data teams to offload basic exploratory analytics to business users, freeing up their time to build and scale data pipelines," Borgman said.
“You can say, give me last month’s sales results. It'll turn that into an SQL query,” said Borgman. This allows business teams to self-serve insights.
By leveraging large language models, Starburst offloads basic queries from overloaded data teams. The AI is only as good as the data it is trained on, and Starburst provides access to that data.
Optimized for the Cloud
Starburst emphasized continued tight integration with Amazon Web Services, including AWS Graviton processors and services like QuickSight, to power its platform. But it also runs on Microsoft Azure and Google Cloud.
“We’ve partnered with them in a number of different areas,” said Borgman of the AWS collaboration. He cited AWS Marketplace transactions as a key go-to-market focus.
Global energy leader Halliburton highlighted improved productivity by leveraging Starburst's new capabilities.
"Previously, it would take 2 to 3 weeks to get an answer to an ad hoc question. By embedding an LLM with Starburst’s data products architecture, data consumers can ask questions in plain language, have it converted to SQL, and get the answer back immediately," said Fahad Ahmad, Data Science Leader at Halliburton.
By combining Starburst’s structured data foundation with large language model augmentation, Halliburton gave all its users immediate answers to ad hoc questions that previously took weeks.
Architecting for Scale
Scalability was top of mind for Starburst, given its roots at Facebook. The platform relies on parallelism to scale out capacity as workloads grow. Multiple clusters can also access the same data.
“You could have different teams, departments, and use cases, using their own cluster,” explained Borgman. The separation of storage and compute tiers prevents data silos.
The platform’s connector architecture also facilitates hybrid and multi-cloud agility. Borgman suggests architects focus on data agnosticism and abstraction layers for future-proof designs.
Tips for Developers
For developers building interactive applications, Borgman recommends starting with Starburst Galaxy, its fully managed cloud data lake analytics service. Its free trial makes exploration easy.
He advises leveraging open table formats like Iceberg for storage optimization and Starburst for computational query power. Database and infrastructure agnosticism simplify development.
With large language models creating new potential and data volumes multiplying, Starburst is working to keep data accessible and insights flowing. Its expanded capabilities create a foundation for analytics-driven data applications on cost-efficient cloud data lakes. For enterprises plotting their next analytics chapter, Starburst aims to make the data lake a viable data warehouse alternative.