Over a million developers have joined DZone.

Polyglot Persistence: Embrace the ETL

· Java Zone

Learn more about Kotlin, a new programming language designed to solve problems that software developers face every day brought to you in partnership with JetBrains.

Over the past few years I’ve seen the emergence of polyglot persistence i.e. using different data storage technologies for different data and in most situations we work that out up front.

Etl1

For example we might use MongoDB to store data about a customer journey through our website but we might simultaneously write page view data through to something like Hadoop or Redshift:

This works reasonably well but sometimes it might not be immediately obvious how we want to query our data when we first start collecting it and our storage choice might not be the best for writing these queries.

An interesting thing to think about at this stage is whether it makes sense to add a stage to our data processing pipeline where we write an ETL job to get it into a more appropriate format:

Etl2

My initial experience doing this was when I created the ThoughtWorks graph which involved transforming data into a graph so that I could find links between people.

Ashok and I followed a similar approach for a client we went on to work for and it allowed us to find the answers to questions that couldn’t be answered when the data was in its original format.

The main down side to this approach is that we now have to keep two data sources in sync but it’s interesting to think about whether this trade off is worthwhile if it helps us gain new insights or find the answers to questions more quickly.

I don’t have any experience with how this approach plays out over time so I’d be interesting in hearing how people have got on with this approach/if it does or doesn’t work.

The Java Zone is brought to you in partnership with JetBrains.  Discover how powerful static code analysis and ergonomic design make development not only productive but also an enjoyable experience.

Topics:

Published at DZone with permission of Mark Needham , DZone MVB .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}