High-Level Architecture for HDFS Slurper V2

The current HDFS Slurper was created as part of writing “Hadoop in Practice,” and it also happened to fulfill a need we had at work. In one sentence, the Slurper is a utility that copies files between Hadoop file systems. It’s particularly useful when you want to automate moving files from local disk to HDFS, and vice versa.
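To make that one-sentence description concrete, here is a minimal sketch of a single Slurper-style pass in Java. It uses `java.nio.file` on two local directories as a stand-in for Hadoop file systems, and the class name `SlurpSketch`, the `slurp` method, and the `._COPYING_` temporary suffix are all hypothetical. Copying under a temporary name and then renaming is a common pattern assumed here so that anything polling the destination never picks up a half-written file; it is not necessarily how the Slurper itself does it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SlurpSketch {
    /**
     * Copies every regular file in srcDir into dstDir. Each file is first written
     * under a hypothetical "._COPYING_" name and then renamed, so a reader polling
     * dstDir never sees a partial file. If deleteSource is true, the original is
     * deleted after the copy succeeds (turning the copy into a move).
     */
    public static List<Path> slurp(Path srcDir, Path dstDir, boolean deleteSource) throws IOException {
        Files.createDirectories(dstDir);
        List<Path> sources;
        try (Stream<Path> s = Files.list(srcDir)) {
            sources = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Path> copied = new ArrayList<>();
        for (Path src : sources) {
            Path name = src.getFileName();
            Path tmp = dstDir.resolve(name + "._COPYING_"); // in-progress name
            Path dst = dstDir.resolve(name);                 // final, visible name
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
            if (deleteSource) {
                Files.delete(src);
            }
            copied.add(dst);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempDirectory("local");
        Path dst = Files.createTempDirectory("hdfs-standin");
        Files.writeString(src.resolve("events.log"), "a\nb\n");
        // Move the file: it ends up in dst and is removed from src.
        System.out.println(slurp(src, dst, true));
    }
}
```

Passing `deleteSource = false` leaves the original in place, which is the “keep source files intact” behavior the v2 wish list asks for.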

While it has worked well for us, a few choice features would make it even more useful:

  • Filtering and projection, to drop records or fields from input files
  • Writing to multiple output files from a single input file
  • Keeping source files intact
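The three features above can be illustrated with a small, self-contained Java sketch. It is not the v2 design, only one way to combine the ideas under stated assumptions: each line of an input file is a record with tab-separated fields, filtering is a `Predicate` over those fields, projection keeps a chosen list of field indices, and a hypothetical `router` function names the output file(s) each record goes to. The source file is only read and is left in place. `SlurpPipelineSketch` and `process` are illustrative names, not part of the Slurper.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

public class SlurpPipelineSketch {
    /**
     * Reads srcFile line by line (assumed: one record per line, tab-separated),
     * drops records rejected by filter, keeps only the fields at the projection
     * indices, and appends each surviving record to every output file chosen by
     * router. srcFile is opened for reading only and is never moved or deleted.
     * Returns the number of records written to each output file.
     */
    public static Map<String, Integer> process(Path srcFile, Path outDir,
            Predicate<String[]> filter, int[] projection,
            Function<String[], List<String>> router) throws IOException {
        Files.createDirectories(outDir);
        Map<String, BufferedWriter> writers = new HashMap<>();
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(srcFile)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split("\t", -1);
                if (!filter.test(fields)) {
                    continue; // filtering: skip the whole record
                }
                String[] projected = new String[projection.length];
                for (int i = 0; i < projection.length; i++) {
                    projected[i] = fields[projection[i]]; // projection: keep selected fields
                }
                String record = String.join("\t", projected);
                // Routing sees the full record, so it can use fields the projection drops.
                for (String out : router.apply(fields)) {
                    BufferedWriter w = writers.get(out);
                    if (w == null) {
                        w = Files.newBufferedWriter(outDir.resolve(out));
                        writers.put(out, w);
                    }
                    w.write(record);
                    w.newLine();
                    counts.merge(out, 1, Integer::sum);
                }
            }
        } finally {
            for (BufferedWriter w : writers.values()) {
                w.close();
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("slurp");
        Path src = dir.resolve("access.tsv");
        Files.write(src, List.of("GET\t/a\t200", "POST\t/b\t500", "GET\t/c\t404"));
        Map<String, Integer> counts = process(src, dir.resolve("out"),
                f -> !f[0].equals("POST"),   // filter: drop POST requests
                new int[] {1, 2},            // projection: keep path and status
                f -> f[2].startsWith("2") ? List.of("ok.tsv", "all.tsv") : List.of("all.tsv"));
        System.out.println(counts);
    }
}
```

In the `main` example, the `POST` record is filtered out, the `200` record lands in both `ok.tsv` and `all.tsv`, and the `404` record lands only in `all.tsv`, while `access.tsv` itself is untouched.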

With that in mind, I’ve sketched out a high-level architecture for what v2 might look like (subject to change, of course).

Figure: Slurper v2 architecture

Published at DZone with permission of {{ articles[0].authors[0].realName }}, DZone MVB. (source)