Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

ETL Tools and Analytics: A Match Made in Heaven

DZone's Guide to

ETL Tools and Analytics: A Match Made in Heaven

ETL alone is not enough. Once you've extracted, transferred, and loaded your data from one data warehouse to another, you need to be able to use the data to get insights.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Imagine this: you order an item of flat-packed furniture, only to have it turn up deconstructed and flat-packed, and with the instruction manual in, say, Korean. There's nothing wrong with those instructions — they make perfect sense when you know the language — but you can't make out a word of it. Without a way to decipher how it fits together, all you can do is optimistically fiddle about with the pieces and see if you can create something that makes sense — even if a few bits are screwed on wrong, and you're not sure it's really doing what it's supposed to.

This, essentially, is what happens when you put data through an ETL process. The data is organized using SQL, packed up and cleaned for transfer, and then deposited at the other end beautifully packaged — but in a language that simply doesn't work for analytics.

Because of this, some people have thrown up their hands and said, "Ah, ETL and analytics — they just can't mix!" ...which is a shame, because, in fact, ETL and analytics can make an excellent match. You just need to get them speaking the same language.

Can't Live Without 'Em

The fact of the matter is that ETL alone is not enough. Once you've extracted, transferred, and loaded your data from one data warehouse to another (or to a data mart, or a delimited flat file, or whatever), you need to be able to use that data to get vital insights about your business, and that means finding some way to link to a powerful visualization tool.

Matchmaker, Matchmaker...

One way of getting around this is to use an integrated tool that combines ETL with a wider BI platform, allowing you to draw data from the original warehouse directly into the system and feed it into a self-service, user-facing dashboard for querying and analysis.

Sisense's ElastiCubes, for example, can handle 99% of all ETL functions, skipping the need for an external tool completely for many types of business and data project.

While it works beautifully in most cases, there are some scenarios where a separate ETL will still be preferable.

If, for example, you're a massive global conglomerate with oceans of data to process, you're using multiple, complex data sources at volume, or you need to document every step of the process for compliance purposes, a built-in ETL tool might not have the capacity you need.

In these situations, you're likely to get much better performance by investing in external ETL capabilities. But where does that leave your analytics language barrier issue?

Well, you simply need something that can act as an interpreter. To return to our flat-pack analogy, a system that can jump in with a key to show you exactly how all the pieces come together so that you can swiftly pull all the pieces together into something elegant and functional without losing or misinterpreting anything along the way.

A Match Made in Data Heaven?

By using a sophisticated connector, for example, you can let the external ETL do its thing and then draw in data from the target data store once the process is complete. A solution like the ElastiCube can handle SQL without any trouble, allowing you to import the data you need into the BI platform and start feeding into beautiful visualizations in pretty much real time.

Self-Service Dashboards

Even better, from the end user's perspective, if you work with a genuinely self-service solution, the whole process should flow smoothly and beautifully, without them having to know anything about the complex translations and transitions going on behind the scenes.

The visualization tool you choose should be flexible and powerful enough to allow business users to build and manipulate their own dashboards and to run ad-hoc queries using any data sources they have permission to access, without sacrificing on either speed or granularity. They should also be able to do all of this autonomously, without having to reach out to IT for help.

The Extra Perks

Ideally, whichever tool you select to elevate your ETL into the realm of analytics, your data should stay put: it shouldn't need to be transferred yet again, to a third location.

For example, the way we work this at Sisense is by deploying a unique combination of columnar database and chip technology, so that only the data required for the calculation at hand is loaded into the system — even though the platform has full access to all data in all the sources it's been connected to, as soon as you need them. This data remains on-demand, and as questions are asked, is called up super-quick lending itself to faster and faster answers.

This keeps your data secure and means that you can store all your sensitive data locally if required for compliance reasons, without having to pay out for hugely expensive infrastructure to handle reporting and analysis.

In other words, finding a system that can talk to your data directly like this means improved performance, governance, and data security — as well as user-friendly dashboards that transform valuable data into invaluable insights.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
big data ,etl ,data analytics ,data visualization

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}