
Deciphering Data to Uncover Hidden Insights: Data Modeling


We take a look at how to use data modeling on a sample data set. In this tutorial, we'll use the Quick BI tool for demonstration purposes.


This multi-part series covers how to collect, wrangle, ingest, model, and visualize data from three viewpoints (conceptual, practical, and best practice).

In the first article of this series, we saw how to understand data conceptually through an example from the Banking, Financial Services, and Insurance (BFSI) domain. In the second article, we learned how to wrangle the data (i.e. clean it) according to your business scenario in Alibaba Cloud Quick BI.

In this article, I will share how to model the data according to your business scenario in Quick BI. We will be using Quick BI for this tutorial, so please ensure that you have registered for an Alibaba Cloud account. If you haven't, sign up for a free account through this link.

Model the Data (Conceptual)

A data model for one line of business is hardly appropriate for another. Oftentimes, the needs of different businesses, or even departments, do not closely align. Because of this, understanding and analyzing how an organization should collect, update, and store data becomes a critical problem.

In the previous articles, we saw how to collect, cleanse, and organize our data. However, we're still missing an essential part of big data analysis: data modeling. Data modeling has become a critical skill for business analysts involved in discovering, analyzing, and specifying changes to how software systems create and maintain information. To better understand data modeling, let's quickly recap the concepts of OLTP and OLAP.

What Is OLTP?

OLTP stands for Online Transaction Processing. An OLTP database is designed to serve as a persistent state store for front-end applications and can quickly execute transactional operations such as INSERT, UPDATE, and DELETE.

Some common tasks asked of OLTP systems include:

  1. Find the age of the current user given a user ID.
  2. Update the billing addresses for a set of clients.
  3. Delete users who have not accessed the page for more than a year.

These types of problems require a system that can look up and update one or more columns within one or many rows. The strength of OLTP systems is that they support fast writes. A typical OLTP workload includes frequent reads and writes, but the reads tend to look up specific values rather than scan all values to compute an aggregate.
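Each of the tasks above maps to a short SQL statement. Here is a minimal sketch, assuming a hypothetical users table with user_id, age, billing_address, and last_access columns (PostgreSQL-style date arithmetic):

    -- 1. Find the age of the current user given a user ID.
    SELECT age FROM users WHERE user_id = 42;

    -- 2. Update the billing addresses for a set of clients.
    UPDATE users SET billing_address = '1 New Street' WHERE user_id IN (7, 12, 19);

    -- 3. Delete users who have not accessed the page for more than a year.
    DELETE FROM users WHERE last_access < CURRENT_DATE - INTERVAL '1 year';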

Common OLTP systems are:

  1. MySQL
  2. PostgreSQL

Alibaba Cloud provides the ApsaraDB for RDS service for OLTP systems, including both MySQL and PostgreSQL databases.

What Is OLAP?

In contrast to an OLTP database, an OLAP database is designed to process large datasets quickly to answer questions about data.

Common use cases for an OLAP database are:

  1. What's the average cost spent on maintenance?
  2. What are a user's overall active hours?
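In SQL terms, both questions are aggregate scans over many rows. A minimal sketch, assuming hypothetical maintenance and sessions tables:

    -- What's the average cost spent on maintenance?
    SELECT AVG(cost) AS avg_maintenance_cost FROM maintenance;

    -- What are a user's overall active hours?
    SELECT user_id, SUM(active_hours) AS total_active_hours
    FROM sessions
    GROUP BY user_id;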

An OLAP database is optimized for scanning and performing computations across many rows of data for one or more columns. Since OLAP is optimized for analyzing data, basic transactional operations like writes and updates tend to be done in infrequent batches, typically once an hour or once a day. OLAP shines when it comes to reads and analytical calculations like aggregation. Several well-known OLAP systems are:

  1. AnalyticDB
  2. Teradata

Alibaba Cloud provides OLAP services with HybridDB and DataWorks.

What Is Data Modeling?

The term data model refers to two distinct but related concepts: a formal description of a data structure, and the way data is actually organized in a system.

Types of Data Modeling

  1. Conceptual Data Models: High-level, static business structures and concepts.
  2. Logical Data Models (LDMs): Entity types, data attributes, and relationships between entities.
  3. Physical Data Models (PDMs): The internal schema of the database design.

Data Modeling in Business Intelligence

Data modeling is the key to success in Business Intelligence (BI), and it is paramount that the process is business-centered. Data modeling in BI is a way of mapping out all the different data sources, how they fit together, and how data flows from one to another. We need to move the data from the places where each piece of software or application stores it into a centralized place (a data warehouse) where it can be queried and analyzed to derive business value.

What Is Data Warehousing?

Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources; it supports analytical reporting, structured and/or ad-hoc queries, and decision making.

Facts: A fact table consists of the facts of a business process. Each fact table record captures a measurement or a metric.

Dimensions: A dimension table contains the textual descriptors of the business. The fields of dimension tables are designed to satisfy two important requirements:

  1. Query constraining/grouping/filtering.
  2. Report labeling.

For example, if we were to examine "Sales Revenue by Product," we would most probably see this structure: Sales Revenue (Fact) and Product (Dimension).
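A minimal sketch of that structure, with hypothetical table and column names:

    -- Dimension: textual descriptors used for grouping, filtering, and labeling.
    CREATE TABLE dim_product (
        product_id   INT PRIMARY KEY,
        product_name VARCHAR(100),
        category     VARCHAR(50)
    );

    -- Fact: one row per measurement, keyed to its dimensions.
    CREATE TABLE fact_sales (
        sale_id    INT PRIMARY KEY,
        product_id INT REFERENCES dim_product (product_id),
        revenue    DECIMAL(12, 2)
    );

    -- "Sales Revenue by Product" is then a join plus an aggregate.
    SELECT d.product_name, SUM(f.revenue) AS sales_revenue
    FROM fact_sales f
    JOIN dim_product d ON d.product_id = f.product_id
    GROUP BY d.product_name;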

Schemas in Data Warehousing

A schema is a logical description of the entire database. It includes the name and description of records of all record types, including all associated data items and aggregates. Like a database, a data warehouse requires maintaining a schema. A database uses the relational model, while a data warehouse typically uses a Star or Snowflake schema.

Star Schema: In a Star Schema, each dimension is represented by a single dimension table, and each dimension table is connected directly to the Fact Table in a star-like structure.


Snowflake Schema: Unlike the Star Schema, the dimensions in a snowflake schema are normalized. The normalization splits up the data into additional tables.

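The difference is easiest to see in the dimension tables themselves. A sketch with hypothetical names:

    -- Star schema: the product dimension is a single denormalized table.
    CREATE TABLE dim_product (
        product_id    INT PRIMARY KEY,
        product_name  VARCHAR(100),
        category_name VARCHAR(50)   -- category details stored inline
    );

    -- Snowflake schema: the same dimension normalized into two tables.
    CREATE TABLE dim_category (
        category_id   INT PRIMARY KEY,
        category_name VARCHAR(50)
    );

    CREATE TABLE dim_product_snowflake (
        product_id   INT PRIMARY KEY,
        product_name VARCHAR(100),
        category_id  INT REFERENCES dim_category (category_id)
    );

Queries against a snowflake schema need an extra join per normalized level, which is one reason the star schema is usually preferred for reporting (see the best practices later in this article).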

Model the Data (Practical)

Before we get to the practical part of modeling the data, we need to understand the process flow of Quick BI.

The basic workflow of Quick BI is as follows:

  1. Create a data source.
  2. Create a dataset.
  3. Create a workbook, worksheet, or dashboard.
  4. Build a data portal.
  5. Subscribe to receive timely reports.

Note: Quick BI supports OLAP data modeling, including semantics such as dimensions (e.g. dates and locations), measures, star schemas, and snowflake schemas. It allows you to define calculated fields, and it lets you edit the existing SQL script of the data source to customize your dimensions and measures.

Use Case 1: ATM Analytics

We will look at our first use case, an ATM dataset. We have already ingested the data into Quick BI. To model the data, you first need to save it as a dataset.

Procedure for creating a dataset from a file:

  1. Select the Cube icon.
  2. Enter the name and specify the location.
  3. Click OK.
  4. The dataset is created.

Modeling the Data

We are going to model the data by editing the dataset to meet the requirements of the business scenario, for example by joining two tables or creating a calculated column.

  1. Select the dataset.
  2. Click the Edit icon.
  3. The dataset management page is displayed.

Now we are going to model the data. I will walk through several operations, such as changing the data type, creating a calculated column, and creating a hierarchy.

Changing the Dimension or Measure Name

  1. Click the Gear icon.
  2. Click Edit to change the name and add notes.
  3. Click OK.
  4. The name is changed.

Basic Functions:

  1. Click the Gear icon.
  2. Duplicate the dimension or measure values (create a copy of the column).
  3. Hide the analysis column (this hides the column in the report interface).
  4. Cancel all hidden columns (this unhides the hidden columns, making them available for reporting again).
  5. Delete (delete the column).

Creating a Calculated Column (Dimension)

  1. Click the Gear icon -> Create a Calculated Column, or click the '+' icon.
  2. Search for a function on the right-side pane for reference.
  3. Name the calculated column.
  4. Write the expression (you can add a column by simply double-clicking it in the left-side pane).
  5. Click OK.
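As a hypothetical example, assuming the ATM dataset has a transaction_amount column and that the expression field accepts SQL-style syntax, a calculated dimension that buckets transactions into bands might look like:

    -- Hypothetical calculated dimension: bucket transactions into bands.
    CASE
        WHEN transaction_amount < 100  THEN 'Low'
        WHEN transaction_amount < 1000 THEN 'Medium'
        ELSE 'High'
    END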

Joining the Dataset

Let's look at the ATM Master Details dataset.

  1. Ingest the ATM Master Details data into Quick BI.
  2. Click the Join icon.
  3. Select the Dataset Field from 'ATM.'
  4. Select the Other Dataset, ATM Master Details.
  5. Select the Join on Field.
  6. Click OK. 

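Conceptually, the join configured above is equivalent to a SQL join between the two tables. A sketch with hypothetical table and column names:

    -- Enrich each ATM transaction with its master details.
    SELECT t.*, m.atm_location, m.atm_type
    FROM atm AS t
    JOIN atm_master_details AS m
        ON m.atm_id = t.atm_id;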

Changing the Dimension Type

  1. Click the Gear icon.
  2. Select Change Dimension Type.
  3. Select Location.
  4. Choose your Country.


Creating a Calculated Column (Measure)

  1. Click the Gear icon -> Create a Calculated Column, or click the '+' icon.
  2. Search for a function on the right-side pane for reference.
  3. Name the calculated column.
  4. Write the expression (you can add a column by simply double-clicking it in the left-side pane).
  5. Click OK.
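As a hypothetical example, assuming the joined dataset has withdrawal_amount and reversal_amount columns, a calculated measure for the net amount dispensed could be as simple as:

    -- Hypothetical calculated measure: net amount dispensed per transaction.
    withdrawal_amount - reversal_amount

The aggregation type chosen for the measure (see "Changing the Aggregation Type" below) is then applied on top of this expression.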

Creating the Hierarchy

  1. Click the Gear icon.
  2. Select 'Create Level.'
  3. The hierarchy level is created.
  4. To add other columns to the hierarchy, select the column's Gear icon.
  5. Click 'Move to.'
  6. The column is added to the hierarchy.
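Drilling down a hierarchy such as Country > State > City corresponds to progressively finer GROUP BY clauses in the generated queries. A sketch with hypothetical columns:

    -- Level 1: aggregate at the top of the hierarchy.
    SELECT country, SUM(transaction_amount) AS total
    FROM atm
    GROUP BY country;

    -- Level 2: drill down one level.
    SELECT country, state, SUM(transaction_amount) AS total
    FROM atm
    GROUP BY country, state;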

Changing the Aggregation Type

  1. Click the Gear icon.
  2. Select Aggregations.
  3. Select one from the list (e.g. Average).
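Behind the scenes, this amounts to swapping the aggregate function applied to the measure. A hypothetical before and after:

    -- Aggregation type: Sum.
    SELECT atm_id, SUM(transaction_amount) FROM atm GROUP BY atm_id;

    -- Aggregation type: Average.
    SELECT atm_id, AVG(transaction_amount) FROM atm GROUP BY atm_id;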

Changing the Number Format

  1. Click the Gear icon.
  2. Select Number Format.
  3. Select one from the list (e.g. #,###0).


Use Case 2: Customer 360

We can apply the same concepts to our next use case, Customer 360. The steps are similar; we just need to swap in the Customer 360 data and perform the same basic data modeling tasks. With this understanding, we can model the data to fit our business scenario.

Model the Data (Best Practices)

  1. Validate the data types.
  2. Create calculated columns whenever necessary.
  3. Change location fields to the Location dimension type in Quick BI; otherwise, it will not detect them.
  4. Store numbers as numeric types (e.g. integers).
  5. Format strings as per the business scenario.
  6. Format dates as per the business scenario.
  7. The Star Schema is preferable to the Snowflake Schema.
  8. Use custom SQL to model the data (a sketch follows this list).
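For the last point, a hypothetical custom SQL script for the ATM use case might join, cast, and derive fields in a single pass before the data reaches the dataset:

    -- Hypothetical custom SQL: join, filter, and derive fields in one pass.
    SELECT
        t.atm_id,
        m.atm_location,
        CAST(t.transaction_amount AS DECIMAL(12, 2)) AS transaction_amount,
        CASE WHEN t.transaction_amount >= 1000 THEN 'High'
             ELSE 'Normal'
        END AS amount_band
    FROM atm AS t
    JOIN atm_master_details AS m
        ON m.atm_id = t.atm_id
    WHERE t.transaction_amount IS NOT NULL;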

Summary

If you followed the steps correctly, you should have successfully modeled your data in Quick BI as per the demands of your business. In my opinion, this is the toughest part of the process of deciphering data to uncover hidden insights. We will explore how to visualize our data and narrate data stories in future articles of this multi-part series. Stay tuned.
