
Data Loading into Amazon Redshift Simplified: The Podcast, Part 1


In this post, covering the first part of our podcast, we discuss the business case for exporting query results from Treasure Data to Amazon Redshift.


You can hear the whole podcast at this link.

There are two sides to everything, and in the case of software feature development, there are always at least two stories to be told: that of the business person who requires the feature, and that of the developer who creates and maintains it.

Treasure Data has added support for exporting query results to Amazon Redshift. Loading data into Redshift via Treasure Data is dead easy: data can come from any source, ingestion is schemaless, and result export is as simple as adding your Redshift connection information and making one click.
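To make that concrete, here is a minimal sketch of the flow using the td-client-python library. The redshift:// result URL scheme, the connection details, and the database and table names are assumptions for illustration, so check Treasure Data’s current documentation before relying on them.

```python
# A minimal sketch: run a query on Treasure Data and push the results
# into an Amazon Redshift table in one step. The redshift:// URL scheme,
# cluster host, and table names are assumptions -- verify against TD docs.
import os

import tdclient

# Hypothetical Redshift connection details encoded as a result URL.
RESULT_URL = (
    "redshift://rs_user:rs_password@my-cluster.example.com:5439"
    "/analytics/page_views"
)

with tdclient.Client(apikey=os.environ["TD_API_KEY"]) as td:
    job = td.query(
        "my_database",  # TD database holding the raw events
        "SELECT page, COUNT(1) AS views FROM www_access GROUP BY page",
        type="presto",
        result_url=RESULT_URL,  # ship the query results to Redshift
    )
    job.wait()  # block until the query (and the export) completes
    print("Job finished with status:", job.status())
```

In the Treasure Data console, the same thing corresponds to the “add your Redshift information and one click” flow described above; the result URL is just the programmatic form of that configuration.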

In this post, covering the first part of our podcast, we’ll discuss the business case with Treasure Data Sales Engineer Prakhar Agarwal; in the second part, we’ll talk implementation specifics with architect and feature developer Sadayuki Furuhashi.


Below is a summary of the questions and answers.

Treasure Data: What are the main motivations for Amazon Redshift output from Treasure Data?

Prakhar: A lot of people use Redshift, and they prefer to use it as a parallel-processing database. But over time they realize that their dataset in Redshift is growing, and once the data has grown enough, the costs skyrocket.

Our idea is to use Treasure Data as an intermediate step: you store your raw data in TD, you run queries on TD, and you push the query results to Redshift. This process significantly reduces the cost of Redshift. It also helps in the scenario where you are looking at aggregated data in your BI tool on Redshift and want to drill into a specific detail: you pick a data point in your connected BI tool, and you can easily get all of the details for that one data point from Treasure Data.
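Sketched in code, that round trip might look like the following. The database, table, and column names, the drill-down key, and the Redshift result URL are all hypothetical placeholders, and the result_url parameter should be verified against the current td-client-python docs.

```python
# A sketch of the TD-as-intermediate-step pattern: keep raw data in TD,
# push only aggregates to Redshift, and drill back into TD for detail.
# All database, table, and column names and the result URL are hypothetical.
import os

import tdclient

REDSHIFT_URL = (
    "redshift://rs_user:rs_password@my-cluster.example.com:5439"
    "/analytics/campaign_rollup"
)

with tdclient.Client(apikey=os.environ["TD_API_KEY"]) as td:
    # Step 1: aggregate raw events on TD and export the rollup to Redshift,
    # where a BI tool can browse it cheaply.
    rollup = td.query(
        "raw_events",
        "SELECT campaign_id, COUNT(1) AS clicks FROM clicks GROUP BY campaign_id",
        type="presto",
        result_url=REDSHIFT_URL,
    )
    rollup.wait()

    # Step 2: when an analyst spots an interesting data point in the BI tool,
    # fetch the full detail for that one campaign from TD's raw store.
    detail = td.query(
        "raw_events",
        "SELECT * FROM clicks WHERE campaign_id = 'summer_sale' LIMIT 100",
        type="presto",
    )
    detail.wait()
    for row in detail.result():
        print(row)
```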

TD: What customer pain points motivated this move on our part?

Prakhar: It’s the ease of use our system brings, along with the various data export options it provides. Right now, it’s very difficult and tedious to load data into Redshift. For us, adding Redshift result export completes the analytics pipeline and enables customers to export data there simply and easily, in only one step.

Treasure Data simplifies this process: you keep your raw data while significantly reducing your Redshift cost. It’s a win-win for both sides.

TD: What would the target audience for Amazon Redshift result output be?

Prakhar: I would say any gaming company, and maybe some ad-tech companies, for example. Their data gets large over time, and their scaling and planning costs grow along with their Redshift cost. We recently had a call with a company whose Redshift costs had skyrocketed to $20k/month! Our solution comes at the perfect time.

TD: What kind of people are we talking to right now to grow the momentum around using Treasure Data result export into Amazon Redshift? Who (both individual roles and actual companies) could use Treasure Data as a funnel into Redshift?

Prakhar: As far as individual roles go, any person who’s building a data infrastructure: it could be a data engineer, a VP of Marketing, or even a VP of Engineering. Just yesterday, I was looking at companies that had both a VP of Engineering and a VP of Data. The big data guy is responsible for providing a big data platform for the company. These are the kinds of folks who think in terms of both technology and money. They don’t want to spend a lot of money on a solution that does only one thing. Treasure Data was less expensive than other options they were considering…and it also fulfills a lot of needs they were not expecting it to.

In terms of industries, it’s a lot of people who want to do high-performance queries, and a lot of people who are interested in monitoring or alert-based systems, including IoT. It’s costly to store all the data in Redshift, though, so they might use TD as a raw data store, aggregate the data over time, and then periodically push it to Redshift, where it can be consumed by a BI tool.

TD: Would this be similar to a “lambda architecture,” where we are using one store for short-term, low-latency data and another for longer-term, historical (but slightly higher-latency and slower) data queries?

Prakhar: Absolutely. And what will happen is that, over time, these two worlds will collide, so you’ll have one system that can do both things. We are moving in that direction now.
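To make the two-store idea concrete, here is a hedged routing sketch: aggregate dashboard queries go to Redshift (which is wire-compatible with PostgreSQL, so psycopg2 works as a client), while raw, detail-level queries go to Treasure Data. Hostnames, credentials, and table names are invented for illustration.

```python
# A hedged sketch of lambda-style query routing: aggregated, frequently-read
# data lives in Redshift; the full raw history stays in Treasure Data.
# Hostnames, credentials, and table names are all hypothetical.
import os

import psycopg2  # Redshift speaks the PostgreSQL wire protocol
import tdclient


def query_aggregates(sql):
    """Fast path: pre-aggregated rollups already pushed into Redshift."""
    conn = psycopg2.connect(
        host="my-cluster.example.com",
        port=5439,
        dbname="analytics",
        user="rs_user",
        password=os.environ["RS_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


def query_raw(sql):
    """Slow path: ad-hoc detail queries against the raw store in TD."""
    with tdclient.Client(apikey=os.environ["TD_API_KEY"]) as td:
        job = td.query("raw_events", sql, type="presto")
        job.wait()
        return list(job.result())


# Dashboards hit the rollup; an analyst's drill-down hits the raw history.
totals = query_aggregates("SELECT campaign_id, clicks FROM campaign_rollup")
detail = query_raw("SELECT * FROM clicks WHERE campaign_id = 'summer_sale'")
```

The split keeps the BI-facing store small and fast while the full history stays queryable; as Prakhar notes, the longer-term direction is a single system that does both.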

Note: This post originally appeared on the Treasure Data blog and was written by John Hammink.


Topics:
big data, amazon redshift, data loading

