Deciphering Data to Uncover Hidden Insights: Understanding the Data

In this article, we will walk you through the process of deciphering data in order to uncover hidden insights from your data.

"The best vision is insight." – Malcolm Forbes

By Ranjith Udayakumar, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

When it comes to data analytics for enterprises, nothing is more important than making accurate and reliable inferences from data. It is no surprise that enterprises are investing heavily in big data analytics as they can reap large profits with accurate insights. However, this is often easier said than done. Data collected from real-world applications is affected by many variables, making data prediction challenging. Regardless, data analytics remains essential for many, if not all, businesses around the world.

In this article, I will walk you through the process of deciphering data to uncover hidden insights.

Is This Article for Me?

This article is meant for everyone! This includes students who just want to familiarize themselves with general concepts, professional data analysts who want to learn new ways to analyze data, and business decision makers who want to know how to get better insights from business data.

Prerequisites

This article covers the overall process of deciphering data from conceptual, practical, and best practice perspectives. Anyone with valid data can use this article as a guide to get insights from data with the help of open-source technologies. However, in this tutorial, I'll be using Alibaba Cloud Quick BI.

To use Alibaba Cloud Quick BI, you need to do the following:

  1. Create an Alibaba Cloud account.
  2. Add a valid payment method to your account.
  3. Enroll in a free trial of Quick BI Pro in your console.

Overview of the Article

For this article, we are going to be looking at:

  1. Domain - BFSI (Banking, Financial Services, and Insurance)
  2. Modules - From Understanding Data to Visual Stories
  3. Use cases - ATM Analytics, Customer 360

We will be covering the entire process of deciphering data. The overall process involves:

  1. Understanding the data
  2. Wrangling the data according to your business scenario (if needed)
  3. Ingesting the data
  4. Modeling the data
  5. Visualizing the data 

Understanding the Data (Conceptual)

When it comes to big data, more data isn't necessarily better. Your data is only as good as your ability to understand and communicate it, which is why understanding the data is so essential.

Once you've got your data, you need to consider the following questions:

  1. What do you do with it?
  2. What should you look for?
  3. Which tools should you use?

You will need to address these questions for your data analysis to be effective. We will provide some generalized answers for the above questions in this article.

What Do You Do With It?

We should analyze the data to understand the domain it belongs to. With the domain in mind, we should ask the right questions of the data to get insights out of it. For example, if the data contains ATM location details, transaction types, the number of transactions, and transaction amounts, it clearly belongs to the BFSI domain.

After we determine the domain, we can decide what types of insights we can infer from the given data. We will do this in the practical section.

What Should You Look For?

We should look for some "interesting" insights. As we discussed earlier, we need to ask the right questions of the data to understand it better and decipher insights.

For example, let's assume you have some understanding of the BFSI domain. Then you should be able to differentiate the Facts (measures) from the Dimensions (everything other than measures) to get a clear picture of the data.

Next, we need to understand the facts and dimensions available and work out the right questions to ask about the given data. We will do this in the practical section.
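
As a rough illustration of the fact/dimension split, here is a minimal Python sketch that treats numeric columns as candidate facts and everything else as dimensions. The column names and sample rows are hypothetical and are not taken from any real dataset. The heuristic is only a starting point: a numeric column such as an age or a day of the month can still be a dimension, so domain knowledge has the final say.

```python
from numbers import Number


def split_facts_and_dimensions(rows):
    """Return (facts, dimensions) as lists of column names, in column order."""
    facts, dimensions = [], []
    for column in rows[0]:
        values = [row[column] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        (facts if is_numeric else dimensions).append(column)
    return facts, dimensions


# Hypothetical columns and made-up values, for illustration only.
sample_rows = [
    {"atm_location": "Branch 1", "transaction_type": "withdrawal",
     "transaction_count": 50, "transaction_amount": 123000},
    {"atm_location": "Branch 2", "transaction_type": "withdrawal",
     "transaction_count": 20, "transaction_amount": 45000},
]

facts, dimensions = split_facts_and_dimensions(sample_rows)
print("Facts:", facts)
print("Dimensions:", dimensions)
```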

What Tools Should You Use?

We need to choose the right tool to wrangle, process, and visualize the data effectively. There are a lot of tools available in the market, all of them with their own unique strengths.

When deploying on the cloud, I prefer using Alibaba Cloud Quick BI, which easily covers the majority of tasks needed at an affordable price.

  1. Quick BI allows you to perform data analytics, exploration, and reporting on large volumes of data with drag-and-drop features and a rich variety of visuals.
  2. Quick BI empowers enterprise users to view and explore data and make informed, data-driven decisions.

In this article, we are going to use Alibaba Cloud Quick BI to decipher the data and get insights out of it. We will explore how to do this in the practical section.

Understanding the Data (Practical)

As we discussed earlier, we are going to understand the data better with real use cases.

Use Case 1: ATM Analytics

Here we will use the data from an ATM Dataset.

What Do You Do With This Data?

As mentioned previously, we know that this data belongs to the BFSI domain. Specifically, this data describes ATM transactions. Now, before digging deeper, we need to understand the domain basics and how business users will see the data before proceeding to the next question.

What Should You Look For?

As we discussed earlier, we need to ask the right questions to understand the data better. We need to differentiate the Facts (measures) and Dimensions (other than the measures).

The Facts include:

  1. no_of_withdrawals
  2. no_of_cub_card_withdrawals
  3. no_of_other_card_withdrawals
  4. total_amount_withdrawn
  5. amount_withdrawn_cub_card
  6. amount_withdrawn_other_card

The Dimensions include:

  1. atm_name
  2. weekday
  3. festival_religion
  4. working_day
  5. holiday_sequence

After separating the facts and dimensions, we can now ask questions about the data. Questions may include:

  1. Total number of transactions.
  2. Total transaction amount.
  3. Top 5 ATMs by transaction volume.
  4. Top 5 ATMs by transaction amount.
  5. Lowest 5 ATMs by transaction volume.
  6. Lowest 5 ATMs by transaction amount.
  7. Number of different transactions by ATM.

These questions are key to deriving insights from the data. Without the right questions, we can't derive the value we need from the data.
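
To make these questions concrete, here is a minimal sketch of how a few of them could be answered in plain Python. The rows and ATM names are made up; only the column names come from the dataset. It also assumes that no_of_withdrawals stands in for the transaction count and total_amount_withdrawn for the transaction amount. In Quick BI the same aggregations would be built through the interface rather than in code.

```python
from collections import defaultdict

# Made-up rows for illustration; only the column names come from the dataset.
rows = [
    {"atm_name": "ATM A", "no_of_withdrawals": 50, "total_amount_withdrawn": 100000},
    {"atm_name": "ATM B", "no_of_withdrawals": 20, "total_amount_withdrawn": 90000},
    {"atm_name": "ATM A", "no_of_withdrawals": 30, "total_amount_withdrawn": 60000},
    {"atm_name": "ATM C", "no_of_withdrawals": 70, "total_amount_withdrawn": 40000},
]


def total(rows, fact):
    """Sum one fact over all rows."""
    return sum(row[fact] for row in rows)


def totals_by(rows, dimension, fact):
    """Sum one fact per distinct value of a dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[fact]
    return dict(totals)


def rank(totals, n, lowest=False):
    """Return the n (name, total) pairs with the highest (or lowest) totals."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=not lowest)[:n]


volume_by_atm = totals_by(rows, "atm_name", "no_of_withdrawals")
amount_by_atm = totals_by(rows, "atm_name", "total_amount_withdrawn")

print("Total transactions:", total(rows, "no_of_withdrawals"))
print("Total amount:", total(rows, "total_amount_withdrawn"))
print("Top 2 ATMs by volume:", rank(volume_by_atm, 2))
print("Lowest 2 ATMs by amount:", rank(amount_by_atm, 2, lowest=True))
```

With the real dataset, passing n=5 to rank gives the top 5 and lowest 5 lists from the questions above.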

Use Case 2: Customer 360

Here we will use the data from Customer360.

What Do You Do With It?

Similar to the previous use case, we know the data belongs to the BFSI domain, specifically bank customer details. Now, before digging deeper, we need to understand the domain basics and how business users will see the data before proceeding to the next question.

What Should You Look For?

Similarly, we need to differentiate the Facts (measures) and Dimensions (other than the measures).

The Facts are:

  1. Balance
  2. Duration
  3. Campaign
  4. Pdays
  5. Previous

The Dimensions are:

  1. Age
  2. Job
  3. Marital status
  4. Education
  5. Default
  6. Housing
  7. Loan
  8. Contact
  9. Day
  10. Month
  11. Poutcome
  12. Deposit

After separating the facts and dimensions, we can ask questions such as:

  1. Balance by job
  2. Balance by marital status
  3. Loan by age
  4. Loan by job
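
The questions above can be sketched in plain Python as follows. The customer rows are made up, and two readings are assumptions on my part: "Balance by job" is taken to mean the average balance per job, and "Loan by job" is taken to mean the number of customers with a loan per job, with the loan column holding "yes" or "no".

```python
from collections import defaultdict
from statistics import mean

# Made-up customers for illustration; values and categories are hypothetical.
customers = [
    {"job": "management", "marital": "married", "balance": 1000, "loan": "no"},
    {"job": "technician", "marital": "single", "balance": 500, "loan": "yes"},
    {"job": "management", "marital": "single", "balance": 3000, "loan": "yes"},
    {"job": "technician", "marital": "married", "balance": 1500, "loan": "yes"},
]


def average_by(rows, dimension, fact):
    """Average one fact per distinct value of a dimension."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[fact])
    return {key: mean(values) for key, values in groups.items()}


def count_where(rows, dimension, flag, value="yes"):
    """Count rows per dimension value where the flag column equals value."""
    counts = defaultdict(int)
    for row in rows:
        if row[flag] == value:
            counts[row[dimension]] += 1
    return dict(counts)


balance_by_job = average_by(customers, "job", "balance")
balance_by_marital = average_by(customers, "marital", "balance")
loan_by_job = count_where(customers, "job", "loan")

print("Balance by job:", balance_by_job)
print("Balance by marital status:", balance_by_marital)
print("Loan by job:", loan_by_job)
```

The same count_where helper would cover "Loan by age" if the rows carried an age column.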

These questions are key to deriving insights from the data. Let's now look at the best practices of understanding data.

Understanding the Data (Best Practices)

Here are some of the best practices when trying to make sense out of data, particularly data relating to the two use cases above.

  1. Determine the appropriate domain, and understand the domain basics.
  2. Always ask the right questions about the data.
    1. Which ATMs fall under the Transaction Volume Benchmark?
    2. Which ATMs fall under the Transaction Amount Benchmark?
    3. Which ATMs fall under the Hit Rate Benchmark?
    4. Which ATMs perform well irrespective of External Influences?
    5. Top Violators.
    6. Income or Profitability of ATMs.
  3. Have a clear understanding of Facts and Dimensions.
  4. Name the columns meaningfully.
    1. "Job" as "Job Category"
    2. "Marital" as "Marital Status"
    3. "pdays" as "Previous Days"
    4. "poutcome" as "Previous Outcome"
  5. Name the columns in sentence case, and always use spaces instead of underscores.
    1. "Job_Category" as "Job Category"
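
If you prepare the data in code before loading it into a BI tool, the two naming practices above could be applied with a small helper like the sketch below. The mapping reuses the examples from the list; the function name and the fallback rule (replace underscores with spaces and capitalize the first letter of each word) are my own assumptions, not part of Quick BI.

```python
# Explicit renames taken from the examples in the list above.
EXPLICIT_NAMES = {
    "job": "Job Category",
    "marital": "Marital Status",
    "pdays": "Previous Days",
    "poutcome": "Previous Outcome",
}


def display_name(column):
    """Return a readable display name for a raw column name."""
    explicit = EXPLICIT_NAMES.get(column.lower())
    if explicit:
        return explicit
    # Fallback: split on underscores and capitalize the first letter of
    # each word, leaving the rest of the word unchanged.
    words = [word for word in column.split("_") if word]
    return " ".join(word[0].upper() + word[1:] for word in words)


for raw in ["job", "pdays", "Job_Category", "total_amount_withdrawn"]:
    print(raw, "->", display_name(raw))
```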

Summary

I hope that this article gives you a better grasp of the basic principles of data analytics, specifically on understanding your data.

In the next article of this series, we will be exploring how to wrangle the data. Stay tuned!

Reference: Originally published on Alibaba Cloud

Topics:
data analytics, business intelligence, data science, big data

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
