
When Is It Time for a Data Warehouse?






Without capable data analytics, it is practically impossible to compete in any contemporary business sector. But to do analysis right, you first need data that is centralized and easy to query. The problem many companies face is that their data is siloed: it resides in numerous different locations, each potentially with its own data model and query language. Perhaps each department in the company maintains its own dataset, or perhaps a whole new group of databases was gained in an acquisition. In either case, having as few as two disparate data stores can make analysis difficult.

Data Warehouses

The definitive solution to siloed data is a data warehouse (DWH), which unites separate data stores so that you can query them from a central location and connect business intelligence and visualization tools such as Tableau and Qlik. Most DWHs use massively parallel query processing, which divides control of a query among the servers in a setup. This differs from a search on a traditional database cluster, which uses a more centralized control structure, and it means that a practically unlimited number of servers can be added to a DWH setup.
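To make the idea of dividing a query among servers concrete, here is a minimal Python sketch of a scatter-gather aggregation. It is an illustration under stated assumptions, not how any particular DWH is implemented: the `SHARDS` data, the function names, and the use of threads in place of real servers are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for the rows held by one server.
SHARDS = [
    [("north", 120), ("south", 80)],
    [("north", 30), ("east", 50)],
    [("south", 20), ("east", 10)],
]


def partial_totals(rows):
    """Sum amounts per region on a single shard (the 'scatter' step)."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def merge_totals(partials):
    """Combine the per-shard results into one answer (the 'gather' step)."""
    merged = {}
    for partial in partials:
        for region, amount in partial.items():
            merged[region] = merged.get(region, 0) + amount
    return merged


def parallel_totals_by_region(shards):
    """Run the same aggregation on every shard concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_totals, shards))
    return merge_totals(partials)


if __name__ == "__main__":
    print(parallel_totals_by_region(SHARDS))
```

Because each shard only aggregates its own rows, adding another shard adds another worker rather than more load on a single coordinator, which is the property the paragraph above attributes to massively parallel setups.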

A primary advantage of a DWH is that operational and analytical workloads are maintained separately, the former in your original data stores and the latter in your DWH. This means that analytical operations don't add load to your operational databases and, in turn, that you can stockpile large amounts of historical data in your DWH. This stockpiling will eventually allow you to apply sophisticated analyses, such as machine learning, to "big" amounts of data.
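The separation described above can be sketched with two independent databases. The example below uses Python's built-in `sqlite3` module with in-memory databases as stand-ins; the table names, schema, and incremental-load rule are assumptions made for illustration and are not taken from any specific warehouse product.

```python
import sqlite3


def build_operational_db():
    """Create a tiny stand-in for an operational database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, day TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (day, amount) VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 15.0), ("2024-01-02", 7.5)],
    )
    conn.commit()
    return conn


def build_warehouse_db():
    """Create a separate store that accumulates history for analysis."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders_history (id INTEGER PRIMARY KEY, day TEXT, amount REAL)"
    )
    return conn


def load_into_warehouse(operational, warehouse):
    """Copy rows the warehouse has not seen yet; return how many were copied."""
    last_id = warehouse.execute(
        "SELECT COALESCE(MAX(id), 0) FROM orders_history"
    ).fetchone()[0]
    rows = operational.execute(
        "SELECT id, day, amount FROM orders WHERE id > ?", (last_id,)
    ).fetchall()
    warehouse.executemany("INSERT INTO orders_history VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return len(rows)


def daily_revenue(warehouse):
    """Analytical query that runs only against the warehouse copy."""
    return warehouse.execute(
        "SELECT day, SUM(amount) FROM orders_history GROUP BY day ORDER BY day"
    ).fetchall()


if __name__ == "__main__":
    op, wh = build_operational_db(), build_warehouse_db()
    load_into_warehouse(op, wh)
    print(daily_revenue(wh))
```

The operational database is touched only by the short incremental read in `load_into_warehouse`; every aggregate query afterwards hits the warehouse copy, which can keep growing with historical rows.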

There is no doubt that a DWH solution can scale with, and empower the growth of, a business and its data. So the significant question is: which data warehouse should you choose?


A recent entrant to the DWH market is TaranHouse, an economical solution intended for businesses that are just starting to feel the limitations of traditional relational solutions and are ready to experiment with the advantages a DWH can bring. In fact, such businesses would see instant speed gains just by dumping a single relational database and restoring it into TaranHouse. Consider, for example, that a moderately complex SQL query run against TaranHouse outperforms the same query run against a relational database by approximately a factor of five. And calling TaranHouse directly from Tableau in "live mode" is a completely different experience from calling a relational database or Tableau's embedded server (have a look at the video below to see the above TaranHouse scenarios in action).

TaranHouse can reside in a cloud of your choosing, on-prem, or a combination of the two. With TaranHouse cloud, your data is safe: all data is replicated to three data centers, and any node failure triggers an automatic failover. Unlike competing products that may require proprietary hardware, TaranHouse can be scaled horizontally without limit on commodity servers, for significant cost savings. Additionally, TaranHouse offers many user-selected parameters, such as the choice between row-based and column-based configurations, and between ODBC, JDBC, Python, and R connectors for BI tools. Finally, note that TaranHouse is designed to be combined with the Tarantool in-memory database for real-time monitoring alerts and dynamic decision-making.

For a free cloud trial, visit www.taranhouse.com.



