What Is Data Warehousing?

A look at the benefits of data warehousing for big data professionals, and the various ways data warehouses can aid with the ingestion and analysis of data.

By Garrett Alley · Jun. 12, 18 · Opinion

Today's enterprise relies on the effective collection, storage, integration, and analysis of data. These activities have moved to the heart of revenue generation, cost containment, and profit optimization. As such, it's no surprise that the amount of data generated, as well as the number and types of data sources, has exploded.

Data-driven companies require heavy-duty solutions for managing and analyzing large quantities of data across their organizations. These systems must be scalable, reliable, and secure enough for regulated industries, as well as flexible enough to support a wide variety of data types. These requirements go well beyond what a traditional transactional database is designed to handle. That's where the data warehouse comes in.

Breaking it Down: What Is a Data Warehouse, Anyway?

A data warehouse is a large-capacity repository that sits on top of multiple databases. Whereas the conventional database is optimized for a single data source, such as payroll information, the data warehouse is designed to handle a variety of data sources, such as sales data, data from marketing automation, real-time transactions, SaaS applications, SDKs, APIs, and more.

There are other differences, as well. For example, single-source databases are built for speed, employing online transaction processing (OLTP) to insert and update small amounts of data in each transaction. However, due to their structure, they do not lend themselves to advanced analytics. In contrast, a data warehouse uses online analytical processing (OLAP), which is designed for fast, sophisticated analysis across large volumes of data.
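
As a rough illustration of the two workload shapes, the sketch below uses Python's built-in `sqlite3` module. The `orders` table and its columns are hypothetical, and SQLite is used only because it ships with Python; it is not a data warehouse, so this shows the shape of each workload rather than any performance difference.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each touching a single row.
conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("east", 120.0))
conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("west", 80.0))
conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("east", 30.0))
conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (100.0, 2))
conn.commit()

# OLAP-style work: one read that scans and aggregates many rows.
totals_by_region = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
)
print(totals_by_region)
```

The writes each change one row, while the final query reads every row to answer a single analytical question.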

Databases and data warehouses do have some similarities, however. Both are repositories for large amounts of data, and both can be queried. Both also store data in tables, although they organize them differently: operational databases typically use normalized two-dimensional tables, while data warehouses often arrange tables around dimensions (for example, fact and dimension tables) or expose multidimensional OLAP cubes.
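
One common way the "multidimensional" idea shows up in practice is a star schema: a fact table of measurements joined to dimension tables that describe them. The sketch below is an assumption about how that might look, not something taken from a specific product; all table names, columns, and figures are made up, and `sqlite3` again stands in for a real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table plus two dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, revenue REAL);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO fact_sales VALUES
    (1, 1, 10.0), (1, 2, 20.0), (2, 1, 5.0), (2, 2, 15.0), (1, 1, 2.5);
""")

# Aggregate revenue along two dimensions at once: category and quarter.
rows = conn.execute("""
SELECT p.category, d.quarter, SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date AS d ON d.date_id = f.date_id
GROUP BY p.category, d.quarter
ORDER BY p.category, d.quarter
""").fetchall()

# Each (category, quarter) pair is one cell of a small two-dimensional cube.
cube = {(category, quarter): revenue for category, quarter, revenue in rows}
for cell, revenue in cube.items():
    print(cell, revenue)
```

Adding another dimension table (say, for region) would add another axis to the same query without changing the fact table's role.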

On-Premise and in the Cloud

Companies are increasingly moving away from on-premise data warehouses to the cloud, leveraging the cost savings and scalability managed services can provide. The architecture of these cloud-enabled data warehouses differs from that of their traditional, on-premise counterparts.

Traditional data-warehouse architecture is separated into three tiers: a bottom tier for the database server that holds data extracted from multiple data sources, a middle tier for the OLAP server (which transforms the data for analysis), and a top tier for the client tools used to query and report on it.
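
The three tiers can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not a real warehouse: the source systems (`crm_rows`, `billing_rows`), the function names, and the data are all hypothetical, and each function stands in for what would be a separate server or tool.

```python
from collections import defaultdict

# Hypothetical source systems feeding the warehouse.
crm_rows = [
    {"customer": "acme", "region": "east"},
    {"customer": "globex", "region": "west"},
]
billing_rows = [
    {"customer": "acme", "amount": 100.0},
    {"customer": "globex", "amount": 40.0},
    {"customer": "acme", "amount": 60.0},
]


def load_warehouse(crm, billing):
    """Bottom tier: pull rows from several sources into one integrated table."""
    region_of = {row["customer"]: row["region"] for row in crm}
    return [
        {
            "customer": b["customer"],
            "region": region_of[b["customer"]],
            "amount": b["amount"],
        }
        for b in billing
    ]


def build_summary(warehouse_rows):
    """Middle tier: aggregate the integrated rows for analysis."""
    summary = defaultdict(float)
    for row in warehouse_rows:
        summary[row["region"]] += row["amount"]
    return dict(summary)


def report(summary):
    """Top tier: a client-facing view of the aggregated data."""
    return [f"{region}: {total:.2f}" for region, total in sorted(summary.items())]


warehouse = load_warehouse(crm_rows, billing_rows)
summary = build_summary(warehouse)
lines = report(summary)
print("\n".join(lines))
```

Keeping the tiers as separate functions mirrors the separation of concerns: the reporting layer never touches the source systems directly.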

Cloud-based data warehouses are an entirely different animal. Their architecture varies tremendously among vendors. For example, Amazon's Redshift is essentially a cloud-based counterpart of the on-premise data warehouse. Google's BigQuery, by contrast, is serverless, so it manages computing resources dynamically and hides resource-management decisions from the user.

The cloud offers some distinct advantages:

  • It's managed. Instead of hiring your own data-warehousing team, a cloud data warehouse lets you outsource the management hassle to professionals who must meet service level agreements (SLAs).
  • It outperforms on-premise data warehouses. Cloud-based solutions offer superior reliability and speed. They are generally more secure than on-premise data warehouses, making them a good choice for the enterprise.
  • It's built for scale. Cloud-based data warehouses are elastic, so you can instantly add capacity.
  • It's more cost-effective. With cloud, you pay for what you use. Some providers charge by throughput. Others charge per hour per node. In every case, you avoid the mammoth costs incurred by an on-premise data warehouse that runs 24 hours a day, seven days a week.
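
To make the pricing models in the last point concrete, here is a back-of-the-envelope comparison. Every rate and usage figure below is invented for illustration and does not reflect any vendor's actual pricing; the function names are hypothetical as well.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 hours / 12 months)


def always_on_cost(nodes, rate_per_node_hour):
    """Cost of a cluster that runs around the clock for a month."""
    return nodes * rate_per_node_hour * HOURS_PER_MONTH


def per_hour_cost(nodes, rate_per_node_hour, hours_used):
    """Per-hour-per-node billing: pay only for the hours the nodes run."""
    return nodes * rate_per_node_hour * hours_used


def usage_cost(units_processed, rate_per_unit):
    """Usage-based billing: pay per unit of work processed."""
    return units_processed * rate_per_unit


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
around_the_clock = always_on_cost(nodes=4, rate_per_node_hour=0.25)
business_hours = per_hour_cost(nodes=4, rate_per_node_hour=0.25, hours_used=200)
pay_per_use = usage_cost(units_processed=50, rate_per_unit=5.0)

print(around_the_clock, business_hours, pay_per_use)
```

Which model comes out cheapest depends entirely on the workload, so it is worth plugging in your own usage estimates before comparing providers.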

Do You Need a Data Warehouse?

Some businesses and industries require more data analysis than others. For example, Amazon uses real-time data to adjust prices three or four times a day. Insurance companies track policies, sales, claims, payroll, and more. They also use machine learning to predict fraud. Gaming companies must track and react to user behavior in real time to enhance the player's experience. Data warehouses make all of these activities possible.

If your organization has or does any of the following, you're probably a good candidate for a data warehouse:

  • Multiple sources of disparate data.
  • Big-data analysis and visualization, both asynchronously and in real time.
  • Custom report generation/ad-hoc analysis.
  • Data mining.
  • Machine learning/AI.
  • Data science.

These activities and assets require more than the traditional single-source database can provide. They require an "industrial-strength" data warehouse.

Published at DZone with permission of Garrett Alley, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
