The Joys of Map Reduce Thanks to CouchDB


I was tasked with tracking everything a user could do with any of our products. Easy enough, right? I basically only needed to track every page view on our website, every gameplay call to our API, certain JavaScript function calls, all attempts to call incomplete but still publicly visible web and API features, and then just random user actions that are arbitrarily interesting. Oh, and let's not forget the part where we turn that big blob of crazy data into something meaningful for me and my cofounders. Ohhhhhh-kay.

How would I approach that problem in a relational database world? Well, step 1 would probably be to start crying, followed quickly by step 2, wetting my pants. I'd embarrass myself like so because the data that I described above is all different. Each class of data has different things we care about. For example, if it's a page view on the website, where did they come from? If it's an API call, who was making the call? If it's a random user action like a successful upload of a video, what'd they upload? The data is more different than it is similar.

With the data being that disparate, I could've explored a few different relational approaches. I could've come up with a table for each one of these user accesses, with maybe a base UserAccess table that I join against, and then have big switch statements for determining what I insert and select from. Or, I could just have one mega table that has 9 gazillion nullable columns. Perhaps I would've gone with a simple, completely generic table structure that stored all of the interesting parts of the data in XML, and then reached for the Wild Turkey when it came time to query that. I've tried all of these approaches before, and each one always seemed harder than it should've been and resulted in something that was very difficult to maintain.

Fortunately, I didn't have to engage in any of that idiocy because I have a little friend named CouchDB. As you might well know, CouchDB is non-relational and schema-free; it's one of those wacky NoSQL databases. It happens to serve us particularly well for great big blobs of data with differing structures, much like all of this user access data.
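To make "differing structures" concrete, here is a rough sketch of what three such user access documents might look like. Every field name below (`type`, `referrer`, `caller`, `video`, and so on) is an assumption made up for illustration, not the actual schema behind this post.

```javascript
// Hypothetical user-access documents. All field names are illustrative
// assumptions, not the author's real schema.
const accessDocs = [
  // A page view cares about where the visitor came from.
  { _id: "a1", type: "page_view", user: "u42", path: "/games", referrer: "https://example.com/" },
  // An API call cares about who was making the call.
  { _id: "a2", type: "api_call", user: "u42", endpoint: "/api/gameplay", caller: "ios-client" },
  // An arbitrary user action cares about what was uploaded.
  { _id: "a3", type: "user_action", user: "u7", action: "video_upload", video: "clip-001" },
];

// Each document carries only the fields that matter for its kind of access;
// there are no nullable columns to pad out.
for (const doc of accessDocs) {
  console.log(doc.type + ": " + Object.keys(doc).join(", "));
}
```

All three can sit side by side in the same database, since nothing forces them to share a set of columns.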

It's one thing to be adept at storing wacky data like this, but the hard part is really the analysis portion, where you select the data in such a way that sense can be made out of the whole thing. Fortunately for me and everyone who must put up with me, this is made easy via some wonderful tools borrowed from the world of functional programming. I'm speaking specifically of map and reduce. (If it's been a while since you brushed up on what map and reduce do, pay a visit to Mr. Wikipedia.)

Since each user access, no matter what kind, gets stored as a unique document inside of CouchDB, it's simple for me to write a map function that goes through each document and emits the fields I'm interested in. It's similarly simple to write a reduce function that accumulates all of that data and does something interesting with it, whether it's summing, averaging, or some funnel analysis. (I should note one awesome aspect of CouchDB - its native support for viewing your data via maps and reduces.)
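As a minimal sketch of that idea, the snippet below counts accesses by type. The `map` and `reduce` functions follow the signatures CouchDB uses for JavaScript views, but the documents, the `type` field, and the `runView` helper are assumptions for illustration. In CouchDB itself, `emit` is supplied by the view server; here it is a local stand-in so the example runs on its own.

```javascript
// Hypothetical documents, one per user access.
const sampleDocs = [
  { type: "page_view", path: "/games" },
  { type: "page_view", path: "/about" },
  { type: "api_call", endpoint: "/api/gameplay" },
  { type: "user_action", action: "video_upload" },
];

// Local stand-in for the emit() that CouchDB provides to map functions.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map: called once per document, emits the fields we are interested in.
function map(doc) {
  if (doc.type) {
    emit(doc.type, 1);
  }
}

// Reduce: accumulates the emitted values. Summing also works unchanged
// when rereduce is true, because a sum of partial sums is still the sum.
function reduce(keys, values, rereduce) {
  return values.reduce((total, n) => total + n, 0);
}

// Hypothetical helper that imitates a grouped view query in memory.
function runView(docs) {
  rows.length = 0;
  docs.forEach(map);
  const grouped = new Map();
  for (const { key, value } of rows) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of grouped) {
    result[key] = reduce(null, values, false);
  }
  return result;
}

// Counts per access type: page_view 2, api_call 1, user_action 1.
console.log(runView(sampleDocs));
```

The in-memory grouping only mimics what the database does; a real view is computed and indexed by CouchDB rather than by application code.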

Once I had all of my data stored, I was able to produce a pretty impressive dashboard of exactly what our users are doing via 7 or 8 different map and reduce functions. The functions themselves were quite simple too, just two or three lines of code. Could I have recreated those same results in standard SQL? Sure. Would I have wanted to? Hells to the no. Relational databases are great for certain problems, but for flexibly structured data, I encourage everyone to dip their toes into the deep end of NoSQL and map reduce.
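For a sense of how a handful of small functions like these is usually packaged, CouchDB keeps views in a design document, with each map function stored as a string and the reduce given either as a function or as one of the built-ins such as `_count` and `_sum`. The sketch below is an assumption about what such a document could look like; the names `access`, `by_type`, and `uploads_by_user`, and the fields the map functions read, are hypothetical.

```javascript
// A hypothetical design document holding two dashboard views.
// View names and document fields are illustrative assumptions.
const designDoc = {
  _id: "_design/access",
  views: {
    // How many accesses of each type were recorded?
    by_type: {
      map: "function (doc) { if (doc.type) { emit(doc.type, 1); } }",
      reduce: "_count",
    },
    // How many videos has each user uploaded?
    uploads_by_user: {
      map: "function (doc) { if (doc.action === 'video_upload') { emit(doc.user, 1); } }",
      reduce: "_sum",
    },
  },
};

// The JSON body that would be saved to the database as the design document.
console.log(JSON.stringify(designDoc, null, 2));
```

Once a document like this is saved, each view can be queried over HTTP under the design document's `_view` path, with `group=true` returning one reduced row per key.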



Published at DZone with permission of Cody Powell, DZone MVB. See the original article here.


