The Joys of Map Reduce Thanks to CouchDB

I was tasked with tracking everything a user could do with any of our products. Easy enough, right? I basically only needed to track every page view on our website, every gameplay call to our API, certain JavaScript function calls, all attempts to call incomplete but still publicly visible web and API features, and then just random user actions that are arbitrarily interesting. Oh, and let's not forget the part where we turn that big blob of crazy data into something meaningful for me and my cofounders. Ohhhhhh-kay.

How would I approach that problem in a relational database world? Well, step 1 would probably be to start crying, followed quickly by step 2, wetting my pants. I'd embarrass myself like so because the data that I described above is all different. Each class of data has different things we care about. For example, if it's a page view on the website, where did they come from? If it's an API call, who was making the call? If it's a random user action like a successful upload of a video, what'd they upload? The data is more different than it is similar.

With the data being that disparate, I could've explored a few different relational approaches. I could've come up with a table for each one of these user accesses, with maybe a base UserAccess table that I join against, and then have big switch statements for determining what I insert and select from. Or, I could just have one mega table that has 9 gazillion nullable columns. Perhaps I would've gone with a simple, completely generic table structure that stored all of the interesting parts of the data in XML, and then reached for the Wild Turkey when it came time to query that. I've tried all of these approaches before, and each one always seemed harder than it should've been and resulted in something that was very difficult to maintain.

Fortunately, I didn't have to engage in any of that idiocy because I have a little friend named CouchDB. As you might well know, CouchDB is non-relational and schema-free; it's one of those wacky NoSQL databases. It happens to serve us particularly well for great big blobs of data with differing structures, much like all of this user access data.

It's one thing to be adept at storing wacky data like this, but the hard part is really the analysis portion, where you select the data in such a way that sense can be made out of the whole thing. Fortunately for me and everyone who must put up with me, this is made easy via some wonderful tools borrowed from the world of functional programming. I'm speaking specifically of map and reduce. (If it's been a while since you brushed up on what map and reduce do, pay a visit to Mr. Wikipedia.)
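For anyone who'd rather skip the trip to Wikipedia, here is a minimal sketch of the two ideas in plain Python. The event names are made up for illustration; nothing here touches CouchDB yet.

```python
from functools import reduce

# Hypothetical stream of user-access kinds (illustrative names only).
events = ["page_view", "api_call", "page_view", "upload", "page_view"]

# map: transform each element on its own, here into 1 for a page view, else 0.
flags = list(map(lambda kind: 1 if kind == "page_view" else 0, events))

# reduce: fold the mapped values into a single result, here a running total.
page_views = reduce(lambda total, flag: total + flag, flags, 0)

print(flags)       # [1, 0, 1, 0, 1]
print(page_views)  # 3
```

The map step never looks at more than one element at a time, and the reduce step only ever combines values, which is what lets a database run both over a big pile of documents.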

Since each user access, no matter what kind, gets stored as a unique document inside of CouchDB, it's simple for me to write a map function that goes through each document and emits the fields I'm interested in. It's similarly simple to write a reduce function that accumulates all of that data and does something interesting with it, whether it's summing, averaging, or some funnel analysis. (I should note one awesome aspect of CouchDB: its native support for viewing your data via maps and reduces.)
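To make that concrete, here is a rough in-memory sketch of how a map function and a reduce function work together over documents with differing shapes. It is an emulation in Python under stated assumptions, not CouchDB's actual API: real CouchDB views are typically written in JavaScript and call emit(key, value), and a sum like this one is available there as the built-in _sum reduce. The documents, field names, and the run_view helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical user-access documents; each kind carries different fields.
docs = [
    {"_id": "1", "type": "page_view", "referrer": "google.com"},
    {"_id": "2", "type": "api_call", "caller": "game-client"},
    {"_id": "3", "type": "page_view", "referrer": "twitter.com"},
    {"_id": "4", "type": "upload", "media": "video"},
    {"_id": "5", "type": "page_view", "referrer": "google.com"},
]


def map_page_view_referrers(doc):
    """Yield (referrer, 1) for page views; skip every other document."""
    if doc.get("type") == "page_view" and "referrer" in doc:
        yield doc["referrer"], 1


def reduce_sum(values):
    """Collapse all values emitted for one key into their total."""
    return sum(values)


def run_view(documents, map_fn, reduce_fn):
    """Hypothetical helper: apply map_fn to each doc, group by key, reduce."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


referrer_counts = run_view(docs, map_page_view_referrers, reduce_sum)
print(referrer_counts)  # {'google.com': 2, 'twitter.com': 1}
```

Notice that the map function simply ignores the API call and the upload. Documents that lack the fields a view cares about never need a schema change or a nullable column; they just emit nothing.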

Once I had all of my data stored, I was able to produce a pretty impressive dashboard of exactly what our users are doing via 7 or 8 different map and reduce functions. The functions themselves were quite simple too, just two or three lines of code. Could I have recreated those same results in standard SQL? Sure. Would I have wanted to? Hells to the no. Relational databases are great for certain problems, but for flexibly structured data, I encourage everyone to dip their toes into the deep end of NoSQL and map reduce.

Published at DZone with permission of Cody Powell, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
