
High-Level Architecture for HDFS Slurper V2


The current HDFS Slurper was created as part of writing “Hadoop in Practice,” and it happened to also fulfill a need we had at work. In one sentence, the Slurper is a utility that copies files between Hadoop file systems. It’s particularly useful when you want to automate moving files from local disk to HDFS, and vice versa.
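
The core "slurp" can be sketched in a few lines. This is a minimal illustration using plain `java.nio` as a stand-in (the real Slurper works against Hadoop `FileSystem` implementations, not `java.nio`, and the method name here is made up for the example):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch only: copy a file to a destination directory, then
// remove the source -- a move across file systems, which is what the
// Slurper automates between local disk and HDFS.
public class SlurpSketch {

    public static Path slurp(Path src, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        Path dest = destDir.resolve(src.getFileName());
        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
        Files.delete(src); // the source is consumed; keeping it intact is a v2 feature
        return dest;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("slurper", ".txt");
        Files.write(src, "hello".getBytes());
        Path destDir = Files.createTempDirectory("hdfs-stand-in");
        Path dest = slurp(src, destDir);
        System.out.println(Files.exists(src) + " " + new String(Files.readAllBytes(dest)));
    }
}
```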

While it has worked well for us, with the addition of a few choice features it could be even more useful:

  • Filter and projection, to remove or reduce data from input files
  • Write to multiple output files from a single input file
  • Keep source files intact
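
The three features above can be pictured as a small per-file pipeline: a filter drops unwanted records, a projection reshapes each surviving record and routes it to one or more outputs, and the source itself is never mutated. Here is a hypothetical sketch of that flow; the type and method names are illustrative, not the actual Slurper v2 API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the v2 feature set: filter, project, and fan out
// one input to multiple outputs, leaving the source untouched.
public class SlurperPipelineSketch {

    // A projected record plus the output file it should be routed to.
    public static final class Routed {
        public final String output;
        public final String record;
        public Routed(String output, String record) {
            this.output = output;
            this.record = record;
        }
    }

    // Apply the filter, then the projection; group results by destination.
    public static Map<String, List<String>> process(
            List<String> inputLines,
            Predicate<String> filter,
            Function<String, List<Routed>> project) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        for (String line : inputLines) {
            if (!filter.test(line)) continue;        // filtering: remove records
            for (Routed r : project.apply(line)) {   // projection: reduce/reshape, fan out
                outputs.computeIfAbsent(r.output, k -> new ArrayList<>()).add(r.record);
            }
        }
        return outputs; // the input list is never modified: source kept intact
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a,1", "b,2", "skip,3");
        Map<String, List<String>> out = process(
                source,
                line -> !line.startsWith("skip"),            // filter out unwanted rows
                line -> {                                    // project the key, route by it
                    String key = line.split(",")[0];
                    return Collections.singletonList(new Routed(key + ".out", key));
                });
        System.out.println(out);
    }
}
```

A single input file can thus feed several output files (one per routing key), while the filter and projection trim the data down in flight.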

As such, I have come up with a high-level architecture for what v2 may look like (subject to change, of course).

[Figure: Slurper v2 architecture]


Published at DZone with permission of Eric Gregory. See the original article here.
