Over a million developers have joined DZone.

Fun with TCP-H - AWS RedShift

DZone's Guide to

Fun with TCP-H - AWS RedShift

· Cloud Zone
Free Resource

MongoDB Atlas is a database as a service that makes it easy to deploy, manage, and scale MongoDB. So you can focus on innovation, not operations. Brought to you in partnership with MongoDB.

This is the third article I’ve done on the TPC-H benchmark, and part 2 on AWS RedShift. Read the first article here, and the first part on AWS Redshift here.

Previously, I covered loading data into a database in RedShift, and discovered (not unsurprisingly) that the performance of SQL Server on a laptop is pretty much the same as that on RedShift, for a 1GB dataset. No great surprises there.

This time, I’m using a 100GB data set. Having spent quite a while generating the dataset, and then getting it uploaded to S3, I’m now in a position to start the loads.

GZIP is King!

One of the great features of RedShift is that it’ll load GZIP’d datasets directly into the database, you just need to add the gzip parameter at the end. So the load statements below are the same as before, they now have gzip at the end, and I’m loading from a gzip file rather than the straight text version.

copy customer from ‘s3://oldnick-tpch/customer.tbl.gz’ CREDENTIALS ‘aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>’ delimiter ‘|’ gzip;
copy orders from ‘s3://oldnick-tpch/orders.tbl.gz’ CREDENTIALS ‘aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>’ delimiter ‘|’ gzip;
copy lineitem from ‘s3://oldnick-tpch/lineitem.tbl.gz’ CREDENTIALS ‘aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>’ delimiter ‘|’ gzip;
copy nation from ‘s3://oldnick-tpch/nation.tbl.gz’ CREDENTIALS ‘aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>’ delimiter ‘|’ gzip;
copy part from ‘s3://oldnick-tpch/part.tbl.gz’ CREDENTIALS ‘aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>’ delimiter ‘|’ gzip;
copy partsupp from ‘s3://oldnick-tpch/partsupp.tbl.gz’ CREDENTIALS ‘aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>’ delimiter ‘|’ gzip;
copy region from ‘s3://oldnick-tpch/region.tbl.gz’ CREDENTIALS ‘aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>’ delimiter ‘|’ gzip;
copy supplier from ‘s3://oldnick-tpch/supplier.tbl.gz’ CREDENTIALS ‘aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>’ delimiter ‘|’ gzip;

I quite like the AWS interface for loading the data, so while the loads above are running, I can go into the AWS RedShift management console and see the progress of the loads:


I can also see various performance metrics while the jobs are running. Shown below are some of the more interesting ones. It’s particularly interesting that the CPU Utilisation is pegged at 100% while the load is running. I’m guessing that this is due to the loads being GZIP’d, so there’ll be an overhead of decompression in there, aside from the overhead of the load itself.








The Results

Having then run the same query, it took longer to run (as expected).

100gb Dataset Time to Return (sec)
Redshift (1 node cluster) cold 3min 11 sec
Redshift (1 node cluster) warm 2min 47 sec

So, with 100 times the data, the time to execute is slightly over 100 times the time. However, based on the maxing out of the CPU and the IOPS, the spec of the Redshift environment probably needs to be a higher spec for a 100gb dataset.

Comments are welcome as I’m aware that this is a specific test, and should not be taken as a rounded evaluation of Redshift.


MongoDB Atlas is the best way to run MongoDB on AWS — highly secure by default, highly available, and fully elastic. Get started free. Brought to you in partnership with MongoDB.


Published at DZone with permission of Nick Haslam, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.


Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.


{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}