Self-Service Analytics Using Dremio
Learn about performing data transformation and data analysis using Dremio and performing data visualization using Tableau.
Dremio, a self-service data platform, helps data analysts and data scientists discover, organize, accelerate, and share any data at any time, irrespective of volume, velocity, location, or structure. It lets business users access data from a variety of sources without relying on developers.
In this blog, let's discuss data transformation and data analysis using Dremio and data visualization using Tableau.
Prerequisites
Download and install Dremio.
Data Description
Online retail data with different product types, product prices, and quantities sold from December 2010 to December 2011 is used as a data source.
Sample data source:
Synopsis:
- Connect different data sources with Dremio
- Perform data transformation
- Create virtual datasets in Dremio
- Connect virtual datasets with BI tools
- Visualize results in Tableau
Connecting Different Data Sources With Dremio
The data source types available for data transformation are shown in the screenshot below:
To connect Amazon S3 data sources with Dremio, perform the following:
On the Data Source Types page, select the Amazon S3 data source.
Connect to the Amazon S3 location as shown in the screenshot below:
Connect to MySQL by providing the required credentials, as shown in the screenshot below:
Connect to Network Attached Storage (NAS) as shown in the screenshot below:
Performing Data Transformation
To transform data, perform the following:
Use the UNION function to merge data from the three data sources (S3, MySQL, and NAS) and load the result as a virtual dataset, as shown in the screenshot below:
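The merge described above can be sketched in SQL. The example below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Dremio, and the table names, column names, and sample rows are assumptions made up for the sketch, not taken from the original dataset. It uses UNION ALL so that every row is kept; plain UNION would additionally drop exact duplicate rows.

```python
import sqlite3

# In-memory SQLite stands in for Dremio here. The three tables are
# hypothetical placeholders for the S3, MySQL, and NAS sources.
conn = sqlite3.connect(":memory:")
for table in ("s3_sales", "mysql_sales", "nas_sales"):
    conn.execute(
        f"CREATE TABLE {table} (InvoiceNo TEXT, Description TEXT, "
        "Quantity INTEGER, UnitPrice REAL, CustomerID TEXT)"
    )

# Made-up sample rows, one per source.
conn.execute("INSERT INTO s3_sales VALUES ('536365', 'LANTERN', 6, 3.39, '17850')")
conn.execute("INSERT INTO mysql_sales VALUES ('536366', 'HAND WARMER', 6, 1.85, '17850')")
conn.execute("INSERT INTO nas_sales VALUES ('536367', 'BIRD ORNAMENT', 32, 1.69, '13047')")

# UNION ALL keeps every row; plain UNION would also remove exact duplicates.
merged = conn.execute(
    """
    SELECT * FROM s3_sales
    UNION ALL
    SELECT * FROM mysql_sales
    UNION ALL
    SELECT * FROM nas_sales
    ORDER BY InvoiceNo
    """
).fetchall()

for row in merged:
    print(row)
```

All three SELECT statements must return the same number of columns in the same order, which is why the placeholder tables share one schema.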
Because the price values are per unit, the total price must be calculated from the quantity.
Add Total_Price as a new field and calculate it as the unit price multiplied by the quantity, as shown in the diagram below:
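As a rough sketch, the calculated field amounts to multiplying the two columns. The snippet below again uses sqlite3 in place of Dremio; the column names Quantity and UnitPrice and the sample rows are assumptions, and only Total_Price comes from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Description TEXT, Quantity INTEGER, UnitPrice REAL)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("LANTERN", 6, 2.5), ("HAND WARMER", 4, 1.25)],
)

# Total_Price is derived per row from the unit price and the quantity.
rows = conn.execute(
    """
    SELECT Description, Quantity, UnitPrice,
           Quantity * UnitPrice AS Total_Price
    FROM sales
    ORDER BY Description
    """
).fetchall()

for row in rows:
    print(row)
```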
Aggregate the stock quantity and stock price per product in the source data, as shown in the diagram below:
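One way to picture this aggregation is a GROUP BY over the product column. This is a minimal sketch with sqlite3 standing in for Dremio; the column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Description TEXT, Quantity INTEGER, UnitPrice REAL)")
# Made-up rows: the same product appears on two invoices.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("LANTERN", 6, 2.5), ("LANTERN", 2, 2.5), ("HAND WARMER", 4, 1.25)],
)

# One output row per product, with summed quantity and summed total price.
per_product = conn.execute(
    """
    SELECT Description,
           SUM(Quantity) AS Total_Quantity,
           SUM(Quantity * UnitPrice) AS Total_Price
    FROM sales
    GROUP BY Description
    ORDER BY Description
    """
).fetchall()

for row in per_product:
    print(row)
```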
Round the total price values to two decimal places, as shown in the diagram below:
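In SQL, this step is typically a ROUND call with a precision of 2. The sketch below assumes hypothetical column names and sample values and runs against sqlite3 rather than Dremio.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Description TEXT, Quantity INTEGER, UnitPrice REAL)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("LANTERN", 6, 3.39), ("BIRD ORNAMENT", 7, 1.69)],
)

# ROUND(x, 2) trims floating-point noise from the multiplication.
rounded = conn.execute(
    """
    SELECT Description, ROUND(Quantity * UnitPrice, 2) AS Total_Price
    FROM sales
    ORDER BY Description
    """
).fetchall()

for row in rounded:
    print(row)
```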
Creating Virtual Datasets in Dremio
After transforming the data, create virtual datasets (views) in Dremio spaces to store the results by source.
The virtual dataset for purchases made by each customer is shown below:
The virtual dataset for the highest quantities sold per product is shown in the diagram below:
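Since a virtual dataset behaves like a view, the two datasets above can be approximated with ordinary SQL views. The sketch below is an assumption-laden illustration: it uses sqlite3's CREATE VIEW rather than Dremio's own statement, and the view names, column names, and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (CustomerID TEXT, Description TEXT, "
    "Quantity INTEGER, UnitPrice REAL)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("17850", "LANTERN", 6, 2.5),
        ("17850", "HAND WARMER", 4, 1.25),
        ("13047", "LANTERN", 2, 2.5),
    ],
)

# View 1: total purchase value per customer.
conn.execute(
    """
    CREATE VIEW customer_purchases AS
    SELECT CustomerID, SUM(Quantity * UnitPrice) AS Total_Price
    FROM sales
    GROUP BY CustomerID
    """
)

# View 2: total quantity sold per product.
conn.execute(
    """
    CREATE VIEW product_quantities AS
    SELECT Description, SUM(Quantity) AS Total_Quantity
    FROM sales
    GROUP BY Description
    """
)

# Views store the query, not the data, so they always reflect the source table.
top_customers = conn.execute(
    "SELECT * FROM customer_purchases ORDER BY Total_Price DESC"
).fetchall()
top_products = conn.execute(
    "SELECT * FROM product_quantities ORDER BY Total_Quantity DESC"
).fetchall()

print(top_customers)
print(top_products)
```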
Connecting Virtual Datasets With BI Tools
To connect the virtual datasets with BI tools, export the virtual dataset in .tds format to be used with BI tools such as Tableau, Qlik Sense, and Power BI, as shown in the diagrams below:
Visualizing Results in Tableau
Clicking the .tds file opens it in Tableau, where you can visualize the data.
Most purchases by customers:
Maximum number of products sold:
And that's it!
Published at DZone with permission of Rathnadevi Manivannan. See the original article here.