
Polyglot Persistence: Embrace the ETL


Over the past few years I’ve seen the emergence of polyglot persistence, i.e. using different data storage technologies for different kinds of data. In most situations we work out which store to use for which data up front.

For example, we might use MongoDB to store data about a customer's journey through our website, while simultaneously writing page view data through to something like Hadoop or Redshift:

[Figure: Etl1]

This works reasonably well, but when we first start collecting data it isn't always obvious how we will want to query it, and the storage technology we chose might not be the best fit for the queries we end up writing.

At that point it's interesting to think about whether it makes sense to add a stage to our data processing pipeline, where we write an ETL job to get the data into a more appropriate format:

[Figure: Etl2]
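As a rough illustration of that extra stage, here is a minimal sketch of an ETL job in Python. Everything in it is an assumption made for the example: the `journeys` documents stand in for what a document store might hold, an in-memory SQLite table stands in for the more query-friendly target, and the names `extract`, `transform`, `load` and `page_views` are hypothetical.

```python
import sqlite3

# Hypothetical source documents: one per customer journey,
# with the page views nested inside (document-store style).
journeys = [
    {"customer": "alice", "pages": [{"url": "/home", "ms": 1200},
                                    {"url": "/pricing", "ms": 3400}]},
    {"customer": "bob", "pages": [{"url": "/home", "ms": 800}]},
]


def extract(docs):
    """Yield source documents one at a time (stands in for a database cursor)."""
    yield from docs


def transform(doc):
    """Flatten one nested journey document into (customer, url, ms) rows."""
    return [(doc["customer"], page["url"], page["ms"]) for page in doc["pages"]]


def load(conn, rows):
    """Append the flattened rows to the query-friendly table."""
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)


def run_etl(docs):
    """Run extract -> transform -> load into a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (customer TEXT, url TEXT, ms INTEGER)")
    for doc in extract(docs):
        load(conn, transform(doc))
    conn.commit()
    return conn


conn = run_etl(journeys)
# A question that is awkward to ask of nested documents but trivial in SQL:
views_per_page = conn.execute(
    "SELECT url, COUNT(*) FROM page_views GROUP BY url ORDER BY url"
).fetchall()
print(views_per_page)
```

The point of the sketch is only the shape of the job: the original documents stay where they are, and the flattened copy exists purely to make a different kind of query easy.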

My first experience of doing this was when I created the ThoughtWorks graph, which involved transforming data into a graph so that I could find links between people.

Ashok and I followed a similar approach for a client we went on to work for, and it allowed us to answer questions that couldn’t be answered while the data was in its original format.
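To make the idea of reshaping data into a graph more concrete, here is a small sketch in Python. It is not the code behind the ThoughtWorks graph; the `assignments` rows, the rule that two people are linked when they share a project, and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque

# Hypothetical source rows: (person, project) pairs,
# as they might sit in a relational table.
assignments = [
    ("ann", "alpha"), ("ben", "alpha"),
    ("ben", "beta"), ("cat", "beta"),
    ("dan", "gamma"),
]


def build_graph(rows):
    """Link every pair of people who appear on the same project."""
    members = defaultdict(set)
    for person, project in rows:
        members[project].add(person)
    graph = defaultdict(set)
    for people in members.values():
        for person in people:
            graph[person] |= people - {person}
    return graph


def shortest_link(graph, start, goal):
    """Breadth-first search: return the chain of people, or None if unconnected."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(assignments)
print(shortest_link(graph, "ann", "cat"))  # ann and cat never shared a project
print(shortest_link(graph, "ann", "dan"))
```

"How is ann connected to cat?" would need a chain of self-joins against the original rows, whereas in the graph shape it is a single traversal.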

The main downside of this approach is that we now have to keep two data sources in sync. Still, it’s interesting to think about whether that trade-off is worthwhile if it helps us gain new insights or answer questions more quickly.
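One common way to limit the cost of keeping the two stores in sync is an incremental job that only copies what has changed since its last run. The sketch below assumes each source record carries an increasing `updated_at` value; the two dicts stand in for the real stores, and `sync` and `watermark` are hypothetical names. It does not handle deletions, which would need separate treatment.

```python
# Hypothetical source store: records keyed by id, each carrying an
# `updated_at` value that increases whenever the record changes.
source = {
    1: {"id": 1, "name": "alice", "updated_at": 10},
    2: {"id": 2, "name": "bob", "updated_at": 12},
}
derived = {}  # stands in for the second, query-friendly store


def sync(source, derived, watermark):
    """Copy records changed since `watermark`; return the new watermark."""
    changed = [r for r in source.values() if r["updated_at"] > watermark]
    for record in changed:
        # Upsert keyed by id, so re-running the job is harmless.
        derived[record["id"]] = dict(record)
    return max([watermark] + [r["updated_at"] for r in changed])


watermark = sync(source, derived, 0)          # first run copies everything
source[1] = {"id": 1, "name": "alice b", "updated_at": 15}
watermark = sync(source, derived, watermark)  # second run copies only record 1
print(watermark, derived[1]["name"])
```

Because the load step is an upsert, replaying the job after a failure leaves the derived store in the same state, which keeps the sync logic simple at the price of the derived data lagging slightly behind the source.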

I don’t have any experience of how this approach plays out over time, so I’d be interested in hearing how people have got on with it and whether it does or doesn’t work.


Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
