The Joys of Map Reduce Thanks to CouchDB
How would I approach that problem in a relational database world? Well, step 1 would probably be to start crying, followed quickly by step 2, wetting my pants. I'd embarrass myself like so because the data that I described above is all different. Each class of data has different things we care about. For example, if it's a page view on the website, where did they come from? If it's an API call, who was making the call? If it's a random user action like a successful upload of a video, what'd they upload? The data is more different than it is similar.
With the data being that disparate, I could've explored a few different relational approaches. I could've come up with a table for each one of these user accesses, with maybe a base UserAccess table that I join against, and then have big switch statements for determining what I insert and select from. Or, I could've just had one mega table with 9 gagillion nullable columns. Perhaps I would've gone with a simple, completely generic table structure that stored all of the interesting parts of the data in XML, and then reached for the Wild Turkey when it came time to query that. I've tried all of these approaches before, and each time it seemed harder than it should've been and resulted in something that was very difficult to maintain.
Fortunately, I didn't have to engage in any of that idiocy because I have a little friend named CouchDB. As you might well know, CouchDB is non-relational and schema-free; it's one of those wacky NoSQL databases. It happens to serve us particularly well for great big blobs of data with differing structures, much like all of this user access data.
It's one thing to be adept at storing wacky data like this, but the hard part is really the analysis portion, where you select the data in such a way that sense can be made out of the whole thing. Fortunately for me and everyone who must put up with me, this is made easy via some wonderful tools borrowed from the world of functional programming. I'm speaking specifically of map and reduce. (If it's been a while since you brushed up on what map and reduce do, pay a visit to Mr. Wikipedia.)
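For anyone who'd rather not leave the page for that refresher, here is a minimal sketch of the two ideas in plain Python. The sample list of access kinds is made up for illustration and nothing here touches CouchDB yet; it only shows that map transforms each item on its own and reduce folds the results into a single value.

```python
from functools import reduce

# Hypothetical sample data: the kind of each recorded user access.
accesses = ["page_view", "api_call", "page_view", "upload"]

# map: transform every item independently.
# Here each access becomes 1 if it is a page view, otherwise 0.
flags = list(map(lambda kind: 1 if kind == "page_view" else 0, accesses))

# reduce: fold all of the mapped values into one result (a running total).
page_views = reduce(lambda total, flag: total + flag, flags, 0)

print(flags)       # [1, 0, 1, 0]
print(page_views)  # 2
```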
Since each user access, no matter what kind, gets stored as a unique document inside of CouchDB, it's simple for me to write a map function that goes through each document and emits the fields I'm interested in. It's similarly simple to write a reduce function that accumulates all of that data and does something interesting with it, whether it's summing, averaging, or some funnel analysis. (I should note one awesome aspect of CouchDB: its native support for viewing your data via maps and reduces.)
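To make that concrete, here is a rough Python simulation of what such a view does. It is a sketch under assumptions, not the code behind the dashboard: real CouchDB views are typically written in JavaScript and run inside the database, and the document fields (type, referrer, api_key, file_kind) and the helper names map_access, reduce_count, and run_view are all hypothetical. The map step yields key/value pairs where CouchDB would call emit, and the reduce step is simplified to take only the values for one key.

```python
from collections import defaultdict

# Hypothetical documents: each access kind carries different fields,
# which a schema-free store accepts without complaint.
docs = [
    {"_id": "1", "type": "page_view", "referrer": "google.com"},
    {"_id": "2", "type": "api_call", "api_key": "abc123"},
    {"_id": "3", "type": "page_view", "referrer": "twitter.com"},
    {"_id": "4", "type": "upload", "file_kind": "video"},
]

def map_access(doc):
    """Yield (key, value) pairs, standing in for CouchDB's emit(key, value)."""
    if "type" in doc:
        yield doc["type"], 1

def reduce_count(values):
    """Collapse every value emitted for one key into a single count."""
    return sum(values)

def run_view(documents, map_fn, reduce_fn):
    """Simulate a grouped view: map each document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}

counts = run_view(docs, map_access, reduce_count)
print(counts)  # {'page_view': 2, 'api_call': 1, 'upload': 1}
```

Swapping in a different map function, say one that yields the referrer of each page view, gives a different report over the same documents without touching how they are stored.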
Once I had all of my data stored, I was able to produce a pretty impressive dashboard of exactly what our users are doing via 7 or 8 different map and reduce functions. The functions themselves were quite simple too, just two or three lines of code each. Could I have recreated those same results in standard SQL? Sure. Would I have wanted to? Hells to the no. Relational databases are great for certain problems, but for flexibly structured data, I encourage everyone to dip their toes into the deep end of NoSQL and map reduce.
Published at DZone with permission of Cody Powell, DZone MVB. See the original article here.