10 Techniques to Boost Your Data Modeling

Data modeling is a crucial process in almost any business. In this post, we take a look at ten techniques that can help improve your data modeling skills.

By Shelby Blitz · May. 09, 17 · Opinion

With new possibilities for enterprises to easily access and analyze their data to improve performance, data modeling is morphing, too. More than arbitrarily organizing data structures and relationships, data modeling must connect with end user requirements and questions as well as offer guidance to help ensure the right data is being used in the right way for the right results. The ten techniques described below will help you enhance your data modeling and its value to your business.

1. Understand the Business Requirements and Results Needed

The goal of data modeling is to help an organization function better. As a data modeler who's collecting, organizing, and storing data for analysis, you can only achieve this goal by knowing what your enterprise needs. Correctly capturing those business requirements to know which data to prioritize, collect, store, transform, and make available to users is often the biggest data modeling challenge. So, we can’t say it enough: get a clear understanding of the requirements by asking people about the results they need from the data. Then start organizing your data with those ends in mind.

2. Visualize the Data to Be Modeled

Staring at countless rows and columns of alphanumeric entries is unlikely to bring enlightenment. Most people are far more comfortable with graphical representations of data that make anomalies quick to spot, or with intuitive drag-and-drop interfaces for rapidly inspecting and joining data tables. Data visualization approaches like these help you clean your data to make it complete, consistent, and free from error and redundancy. They also help you spot different data record types that correspond to the same real-life entity (“Customer ID” and “Client Ref.” for example) so that you can transform them to use common fields and formats, making it easier to combine different data sources.
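To make the “Customer ID” versus “Client Ref.” point concrete, here is a minimal sketch of mapping source-specific field names onto one common field before combining two sources. The alias table, the common name `customer_id`, and the sample rows are all assumptions made up for illustration.

```python
# Hypothetical mapping from source-specific field names to one common name.
FIELD_ALIASES = {
    "Customer ID": "customer_id",
    "Client Ref.": "customer_id",
}


def normalize_record(record):
    """Return a copy of the record with aliased field names replaced."""
    return {FIELD_ALIASES.get(name, name): value for name, value in record.items()}


# Two made-up sources that describe the same real-life customer.
crm_rows = [{"Customer ID": "C001", "City": "Austin"}]
billing_rows = [{"Client Ref.": "C001", "Balance": 120.0}]

# After normalization, both sources share the "customer_id" field.
combined = [normalize_record(row) for row in crm_rows + billing_rows]
```

Once every source uses the same field name, the records can be joined or grouped on that field without special cases per source.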

3. Start With Simple Data Modeling and Extend Afterwards

Data can become complex rapidly due to factors like size, type, structure, growth rate, and query language. Keeping data models small and simple at the start makes it easier to correct any problems or wrong turns. When you are sure your initial models are accurate and meaningful, you can bring in more datasets, eliminating any inconsistencies as you go. You should look for a tool that makes it easy to begin yet can support very large data models afterward, and that also lets you quickly “mash up” multiple data sources from different physical locations.

4. Break Business Enquiries Down Into Facts, Dimensions, Filters, and Order

Understanding how business questions can be defined by these four elements will help you organize data in ways that make it easier to provide answers. For example, suppose your enterprise is a retail company with stores in different locations, and you want to know which stores have sold the most of a specific product over the last year. In this case, the facts are the overall historical sales data (all sales of all products from all stores for each day over the past “N” years), the dimensions are “product” and “store location,” the filter is “previous 12 months,” and the order might be “top five stores in decreasing order of sales of the given product.” By organizing your data into separate tables for facts and for dimensions, you make it easier to find the top sales performers per sales period and to answer other business intelligence questions as well.
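The retail question above can be sketched with one fact table and two dimension tables, here using Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions chosen for illustration, and the 12-month window is passed in as fixed dates so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per store and per product.
    CREATE TABLE stores   (store_id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Fact table: one row per recorded sale.
    CREATE TABLE sales (
        store_id   INTEGER REFERENCES stores(store_id),
        product_id INTEGER REFERENCES products(product_id),
        sale_date  TEXT,     -- ISO date, so string comparison sorts correctly
        quantity   INTEGER
    );
""")
conn.executemany("INSERT INTO stores VALUES (?, ?)",
                 [(1, "Boston"), (2, "Denver"), (3, "Miami")])
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(10, "Widget"), (11, "Gadget")])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, 10, "2017-01-15", 5),
    (2, 10, "2017-02-01", 9),
    (2, 10, "2015-06-01", 50),   # outside the 12-month filter
    (3, 10, "2017-03-10", 7),
    (1, 11, "2017-03-11", 100),  # a different product
    (1, 10, "2017-04-02", 3),
])


def top_stores(conn, product_name, start_date, end_date, limit=5):
    """Rank store locations by units sold of one product within a date window."""
    query = """
        SELECT st.location, SUM(s.quantity) AS total_qty   -- fact, aggregated
        FROM sales s
        JOIN stores st  ON st.store_id = s.store_id        -- dimension: store
        JOIN products p ON p.product_id = s.product_id     -- dimension: product
        WHERE p.name = ?
          AND s.sale_date >= ? AND s.sale_date < ?         -- filter
        GROUP BY st.location
        ORDER BY total_qty DESC                            -- order
        LIMIT ?
    """
    return conn.execute(query, (product_name, start_date, end_date, limit)).fetchall()


ranking = top_stores(conn, "Widget", "2016-05-01", "2017-05-01")
```

Each of the four elements shows up as its own clause in the query, which is what makes the fact-and-dimension layout convenient for this kind of question.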

5. Use Just the Data You Need Rather Than All the Data Available

Computers working with huge datasets can quickly run into limits on memory and input-output speed. However, in many cases, only small portions of the data are needed to answer business questions. Ideally, you should be able to simply check boxes on-screen to indicate which parts of datasets are to be used, letting you avoid data modeling waste and performance issues.

6. Make Calculations in Advance to Prevent End User Disagreements

A key goal of data modeling is to establish one version of the truth, against which users can ask their business questions. While people may have different opinions on how an answer should be used, there should be no disagreement on the underlying data or the calculation used to get to the answer. For example, a calculation might be required to aggregate daily sales data to derive monthly figures which can then be compared to show best or worst months. Instead of leaving everyone to reach for their calculators or their spreadsheet applications (both common causes of user error), you can avoid problems by setting up this calculation in advance as part of your data modeling and making it available in the dashboard for end users.
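As a rough illustration of the daily-to-monthly example, the sketch below defines the aggregation once so that every user works from the same monthly figures. The function name and the sample sales amounts are assumptions for illustration only.

```python
from collections import defaultdict

# Made-up daily sales figures as (ISO date, amount) pairs.
daily_sales = [
    ("2017-01-03", 120.0),
    ("2017-01-17", 80.0),
    ("2017-02-05", 150.0),
    ("2017-02-20", 25.0),
    ("2017-03-09", 90.0),
]


def monthly_totals(daily):
    """Sum daily amounts per month, keyed by 'YYYY-MM'."""
    totals = defaultdict(float)
    for day, amount in daily:
        totals[day[:7]] += amount  # the first 7 characters are 'YYYY-MM'
    return dict(totals)


# One shared calculation that a dashboard could expose to all users.
totals = monthly_totals(daily_sales)
best_month = max(totals, key=totals.get)
worst_month = min(totals, key=totals.get)
```

Because the calculation lives in the model rather than in individual spreadsheets, any disagreement is about what the numbers mean, not about how they were added up.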

7. Verify Each Stage of Your Data Modeling Before Continuing

Each action should be checked before moving to the next step, starting with the data modeling priorities from the business requirements. For example, an attribute called the primary key must be chosen for a dataset, so that each record in the dataset can be identified uniquely by the value of the primary key in that record. Suppose you chose “ProductID” as a primary key for the historical sales dataset above. You can verify that this is satisfactory by comparing the total row count for “ProductID” in the dataset with the distinct (no duplicates) row count. If the two counts match, “ProductID” can be used to uniquely identify each record; if not (as will happen whenever the same product appears in more than one sales record), look for another primary key. The same technique can be applied to a join of two datasets to check that the relationship between them is either one-to-one or one-to-many and to avoid many-to-many relationships that lead to overly complex or unmanageable data models.
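The count comparison described above can be sketched in a few lines. The helper names, the extra “SaleID” column, and the sample rows are hypothetical and only serve to show the check passing and failing.

```python
def is_unique_key(rows, key):
    """True if the total row count equals the distinct count for this key."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))


def relationship(left_rows, right_rows, key):
    """Classify the join on `key` by checking uniqueness on each side.

    "one-to-many" is returned whichever side holds the unique key.
    """
    left_unique = is_unique_key(left_rows, key)
    right_unique = is_unique_key(right_rows, key)
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique or right_unique:
        return "one-to-many"
    return "many-to-many"


# Made-up datasets: a product list and a sales history.
products = [{"ProductID": 1}, {"ProductID": 2}]
sales = [
    {"SaleID": 100, "ProductID": 1},
    {"SaleID": 101, "ProductID": 1},  # product 1 sold twice
    {"SaleID": 102, "ProductID": 2},
]

product_id_ok = is_unique_key(sales, "ProductID")  # counts differ
sale_id_ok = is_unique_key(sales, "SaleID")        # counts match
join_kind = relationship(products, sales, "ProductID")
```

In this made-up data, “ProductID” fails the check for the sales dataset while the hypothetical “SaleID” passes, and the join between the two datasets comes out as one-to-many.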

8. Look for Causation, Not Just Correlation

Data modeling includes guidance in the way the modeled data is used. While empowering end users to access business intelligence for themselves is a big step forward, it is also important that they avoid jumping to wrong conclusions. For example, perhaps they see that sales of two different products appear to rise and fall together. Are sales of one product driving sales of the other one (a cause-and-effect relationship), or do they just happen to rise and fall together (simple correlation) because of another factor such as the economy or the weather? Confusing causation and correlation here could lead to targeting wrong or non-existent opportunities, and thus wasting business resources.
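One way to probe the two-products example is to compare the plain correlation between the two sales series with the partial correlation after controlling for a suspected common factor. This is only a sketch: the figures are invented so that a hypothetical `weather_index` drives both products, and a low partial correlation suggests, but does not prove, that neither product is driving the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented figures: both products follow the weather plus unrelated noise.
weather_index = [1, 2, 3, 4, 5, 6]
product_a_sales = [3, 3, 6, 8, 9, 13]
product_b_sales = [4, 7, 7, 10, 16, 19]

raw = pearson(product_a_sales, product_b_sales)
controlled = partial_correlation(product_a_sales, product_b_sales, weather_index)
```

With these numbers the raw correlation is high (about 0.94), yet it all but disappears once the weather is accounted for, which is the pattern to watch for before concluding that one product's sales drive the other's.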

9. Use Smart Tools to Do the Heavy Lifting

More complex data modeling may require coding or other actions to process data before analysis begins. However, if such “heavy lifting” can be done for you by a software application, this frees you from the need to learn about different programming languages and lets you spend time on other activities of value to your enterprise. A suitable software product can facilitate or automate all the different stages of data ETL (extracting, transforming, and loading). Data can be accessed visually without any coding required, different data sources can be brought together using a simple drag-and-drop interface, and data modeling can even be done automatically based on the query type.

10. Make Your Data Models Evolve

Data models in business are never carved in stone because data sources and business priorities change continually. Therefore, you must plan on updating or changing them over time. For this, store your data models in a repository that makes them easy to access for expansion and modification, and use a data dictionary or “ready reference” with clear, up-to-date information about the purpose and format of each type of data.
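A data dictionary can be as simple as a structure that records the purpose and format of each field, kept alongside the model so it can be updated as the model evolves. The sketch below is one possible shape; the field names, the `FieldSpec` class, and the checking function are assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One data dictionary entry: what a field means and how it is stored."""
    purpose: str
    py_type: type
    example: object


# Hypothetical data dictionary for a small sales model.
DATA_DICTIONARY = {
    "customer_id": FieldSpec("Unique customer identifier", str, "C001"),
    "sale_date": FieldSpec("Day of sale, ISO 8601 (YYYY-MM-DD)", str, "2017-05-09"),
    "quantity": FieldSpec("Units sold", int, 3),
}


def check_record(record):
    """List fields that are undocumented or do not match the documented type."""
    problems = []
    for name, value in record.items():
        spec = DATA_DICTIONARY.get(name)
        if spec is None:
            problems.append(f"{name}: not in data dictionary")
        elif not isinstance(value, spec.py_type):
            problems.append(f"{name}: expected {spec.py_type.__name__}")
    return problems


issues = check_record({"customer_id": "C001", "quantity": "3", "coupon": "X"})
```

Running a check like this when a new data source is added flags fields that the dictionary does not yet describe, which is a useful prompt to update it.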

Better Data Modeling Leads to Greater Business Benefit

Business performance in terms of profitability, productivity, efficiency, customer satisfaction, and more can benefit from data modeling that helps users quickly and easily get answers to their business questions. Key success factors for this include linking to organizational needs and objectives, using tools to speed up the steps in readying data for answers to all queries, and prioritizing simplicity and common sense. Once these conditions are met, you and your business, whether small, medium, or big, can expect your data modeling to bring you significant business value.

Published at DZone with permission of Shelby Blitz, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
