Almost ten years ago, Google changed the Big Data world when it released a paper on MapReduce, a model for working through immense amounts of data that gained immediate traction in the community and remained the norm for a decade. At its developer conference on Wednesday, however, Google followed a burgeoning trend by retiring MapReduce in favor of what it is calling Google Cloud Dataflow.
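For readers unfamiliar with the model, the core idea of MapReduce can be sketched in a few lines. This is purely an illustration of the programming model, not Google's distributed implementation: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step aggregates each group.

```python
# Minimal single-machine sketch of the MapReduce model (illustrative
# only, not Google's implementation). The classic example: word count.
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each group, here by summing the counts.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big ideas", "data flows"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"big": 2, "data": 2, "ideas": 1, "flows": 1}
```

In a real cluster the map and reduce steps run in parallel across many machines, with the shuffle moving data between them; the appeal of the model is that the programmer only writes the two simple functions.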
The company realizes, however, that time has moved on and that there is a need for a broader tool, one that allows for the ingestion, transformation, and analysis of data in ways that cover both streaming data and more traditional batch processing.
The service is also a response to an exponentially growing volume of data to sift through. Ten years ago, social media was not nearly as abundant, prolific, or widely used; wearable tech was barely a thought; and traditional statistical means of digging through data were more than enough to provide valuable insights. As Wired writes:
Long ago, with a sweeping software system called MapReduce, Google set the standard for processing “big data.” A tool that ran across hundreds of servers, MapReduce is what the company used to build the enormous index of webpages that underpins its search engine. Thanks to an open source clone of MapReduce, Hadoop, the rest of the world now crunches data in similar ways. But Hölzle says that Google no longer uses MapReduce. It now uses another system, Flume, aka FlumeJava, for this kind of massive “batch processing.”
The new service will allow Google to build more complex data pipelines, and it will also integrate with MillWheel stream processing. From VentureBeat:
It can either run a series of computing jobs, batch-style, or do constant work as data flows in. Engineers can start using the service in Google’s burgeoning public cloud. Google takes care of managing the thing.
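The batch-versus-streaming distinction described above can be made concrete with a toy sketch. This is a hypothetical illustration of the two modes, not the Dataflow API: a batch job computes one result over a complete dataset, while a streaming job updates a running result as each record arrives.

```python
# Toy contrast between batch and streaming processing (hypothetical
# illustration only, not Google Cloud Dataflow's actual API).

def batch_total(records):
    # Batch: the whole dataset is available up front; compute once.
    return sum(records)

def streaming_totals(record_stream):
    # Streaming: emit an updated running total as each record arrives.
    total = 0
    for record in record_stream:
        total += record
        yield total

events = [3, 1, 4]
print(batch_total(events))             # one answer at the end: 8
print(list(streaming_totals(events)))  # running answers: [3, 4, 8]
```

A unified service, as described in the article, lets the same pipeline logic run in either mode, so engineers do not have to maintain separate batch and streaming systems.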