Self-Service Tech Simplifies Data Governance, Increases Data Engineering Productivity
Data scientists no longer have to struggle to harness data for business intelligence and data science. Dremio looks to change the current approach to data analytics.
Thanks to Tomer Shiran, co-founder and CEO, and Kelly Stirman, CMO at Dremio, for briefing me on their new self-service data platform. Working with existing data sources and business intelligence tools, the platform eliminates the need for traditional ETL, data warehouses, cubes, and aggregation tables, as well as the infrastructure, copies of data, and effort these systems entail. The platform combines consumer-grade ease of use with enterprise-grade security and governance and includes execution and caching technologies that accelerate analytical processing.
Despite promises of software designed to unlock the value of data, analysts and data scientists continue to struggle to harness data for business intelligence and data science. Dremio accelerates time to insight by empowering analysts and data scientists to be independent and self-directed in their use of data, from any source, at any scale, while preserving governance and security.
“In our personal lives, most people expect to get answers to questions in just a few seconds. But in the workplace, it can take months to answer a question,” said Tomer. “While tools like Tableau, Power BI, and Qlik provide a self-service model for visualization, Dremio is the first to provide a self-service experience for the rest of the data analytics stack, empowering business users and analysts to discover, explore, and analyze any data at any time, no matter where it is or how big it is.”
“Dremio is a new breed of data analytics platform that doesn’t require ETL, cubes, data warehouses, or even data virtualization tools to deliver self-service analytics to data analysts,” said Wayne Eckerson, founder and principal consultant, Eckerson Group. “The big data platform, designed from the ground up for the cloud and Hadoop, works with any BI product or data science tool, sits between users and data sources, eliminating the need for data movement. This speeds deployments and provides agile access to data.”
Dremio provides a future-proof strategy for data, allowing customers to choose the best tools for analysts and the right database technologies for applications without compromising on the ability to leverage data to power the business.
Key capabilities include the following.
- Apache Arrow execution engine. Dremio is the first Apache Arrow-based distributed query execution engine. This represents a breakthrough in performance for analytical workloads as it enables extreme hardware efficiency and minimizes serialization and deserialization of in-memory data buffers between Dremio and client technologies like Python, R, Spark, and other analytical tools. Arrow is also designed for GPU and FPGA hardware acceleration, making it a powerful paradigm for machine learning workloads.
- Native query push downs. Instead of performing full table scans for all queries, Dremio pushes processing down into the underlying data sources, maximizing efficiency and minimizing demands on operational systems. It rewrites SQL in the native query language of each data source, such as Elasticsearch, MongoDB, and HBase, and optimizes processing for file systems such as Amazon S3 and HDFS.
- Dremio Reflections™. This feature accelerates processing and isolates operational systems from analytical workloads by physically optimizing data for specific query patterns, including columnarizing, compressing, aggregating, sorting, partitioning, and co-locating data. It maintains multiple reflections of datasets, optimized for heterogeneous workloads, that are fully transparent to users. The query planner automatically selects the best reflections for each query, a breakthrough in performance that accelerates processing by up to a factor of 1,000.
- Comprehensive data lineage. Data Graph preserves a complete view of the end-to-end flow of data for analytical processing. Companies have full visibility into how data is accessed, transformed, joined, and shared across all sources and all analytical environments. This transparency facilitates data governance, security, knowledge management, and remediation activities.
- Self-service model. Designed with analysts and data scientists in mind, the model provides a powerful and intuitive interface for users to easily discover, curate, accelerate, and share data for specific needs without being dependent on IT. Users can also launch their favorite tools, including Tableau, Qlik, Power BI, and Jupyter Notebooks, directly.
- Built for the cloud. Dremio was designed for modern cloud infrastructure and is able to take advantage of elastic compute resources as well as object storage such as Amazon S3 for its Reflection Store. In addition, Dremio can analyze data from a wide variety of cloud-native and cloud-deployed data sources.
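To make the native query push-down item above more concrete, here is a minimal Python sketch of translating one simple filter into the native filter syntax of two of the sources named in the list. It is an illustration under stated assumptions, not Dremio's implementation: the `Predicate` class and both function names are hypothetical, and only equality and range comparisons are handled. The output shapes follow MongoDB's filter documents (`$gt`, `$lte`, and so on) and Elasticsearch's `term` and `range` queries.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical, simplified stand-in for one SQL WHERE clause such as `age > 30`.
@dataclass(frozen=True)
class Predicate:
    field: str
    op: str  # one of "=", ">", ">=", "<", "<="
    value: Any


# SQL comparison operators mapped to the range keywords both sources share.
_RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


def to_mongo_filter(p: Predicate) -> dict:
    """Build a MongoDB-style filter document for the predicate."""
    if p.op == "=":
        return {p.field: p.value}
    if p.op in _RANGE_OPS:
        return {p.field: {"$" + _RANGE_OPS[p.op]: p.value}}
    raise ValueError(f"unsupported operator: {p.op}")


def to_elasticsearch_query(p: Predicate) -> dict:
    """Build an Elasticsearch query DSL body for the predicate."""
    if p.op == "=":
        return {"query": {"term": {p.field: p.value}}}
    if p.op in _RANGE_OPS:
        return {"query": {"range": {p.field: {_RANGE_OPS[p.op]: p.value}}}}
    raise ValueError(f"unsupported operator: {p.op}")


if __name__ == "__main__":
    pred = Predicate("age", ">", 30)
    print(to_mongo_filter(pred))
    print(to_elasticsearch_query(pred))
```

The point of the translation is that the filter runs inside the source system, so only matching records travel back to the query engine rather than the full table.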
Because Dremio can run in the cloud, on premises, or as a service provisioned and managed in a Hadoop cluster, customers can deploy Dremio to meet their needs at any scale. Popular use cases include:
- BI on modern data, like Elasticsearch, S3, and MongoDB.
- Data acceleration, making even the largest data sets interactive in speed.
- Self-service data, making consumers of data more independent and less reliant on IT.
- Data lineage, tracking the full lineage of data through all analytical jobs across tools and users.
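As a rough illustration of the data lineage use case, the sketch below records which datasets each derived dataset was built from and then walks that graph to find every upstream source. It is a generic example and makes no claim about how Dremio's Data Graph works internally; the `LineageGraph` class, its methods, and the dataset names are all hypothetical.

```python
from typing import Dict, Set


class LineageGraph:
    """Hypothetical lineage tracker: maps each dataset to its direct inputs."""

    def __init__(self) -> None:
        self._parents: Dict[str, Set[str]] = {}

    def record(self, derived: str, *inputs: str) -> None:
        """Note that `derived` was produced from the given input datasets."""
        self._parents.setdefault(derived, set()).update(inputs)

    def upstream(self, dataset: str) -> Set[str]:
        """Return every dataset that `dataset` depends on, directly or not."""
        seen: Set[str] = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


if __name__ == "__main__":
    graph = LineageGraph()
    # Dataset names below are made up for the example.
    graph.record("sales_clean", "s3_sales_raw")
    graph.record("sales_by_region", "sales_clean", "mongo_customers")
    print(sorted(graph.upstream("sales_by_region")))
```

Walking the graph in this direction answers the governance question of which raw sources feed a given report; reversing the edges would answer which reports are affected when a source changes.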
“With more than one million customers and 270,000 servers across our 20 data centers, telemetry data about our infrastructure is a critical asset we use to remain competitive while providing a great experience to our customers,” said Vincent Terrasi, head of data, analytics, and CRM for OVH. “Dremio helps our data managers and analysts work with our data independently and effectively, and makes it available for analysis using Tableau Desktop and Tableau Server. We are proud to be a part of this important open source community.”
By working closely with partners, Dremio looks to change the current approach to data analytics by expanding the big data, business intelligence, and analytics ecosystem for the enterprise.
“Qlik is a pioneer in self-service BI and visual analytics,” said Hjalmar Gislason, VP of data at Qlik. “Dremio shares our vision of making analysts and data scientists increasingly independent and productive. I have been waiting for a solution like Dremio to emerge in the rapidly evolving landscape of modern data sources, and am excited about the benefits it will bring to our more than 40,000 customers.”
“With more than 100,000 curated datasets, Enigma is the leading provider of analysis-ready public data,” said Hicham Oudghiri, CEO of Enigma Technologies. “Customers rely on our open source intelligence to enrich their enterprise data to drive smarter decision making. Dremio’s approach for self-service data analytics can drive immense productivity in all types of organizations. We are excited to partner with this innovative open source company.”