Block Placement and Multi-Tenancy: Where Is Your Data?

With the ability to place data not only on specific nodes but also on specific drives within a node, MapR FS is really an elegant solution.

by Adam Diaz · Mar. 20, 17 · Big Data Zone · Opinion

Most people think of the Hadoop storage layer as one large hard drive. At a high level, that is a fair assumption. The real issue comes to light when one considers actual block placement in Hadoop. Many architects want to design multi-tenant systems with Hadoop as a core part of the design. HDFS, however, has a very particular strategy for block placement: within a single cluster, blocks cannot be restricted to a set of hosts (using a default build of HDFS) or even to a set of drives within a host.
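To make the placement point concrete, here is a minimal sketch of the default HDFS replica placement rule as the Apache Hadoop documentation describes it for a replication factor of three: the first replica goes on the writer's node, the second on a node in a different rack, and the third on a different node in that same remote rack. The `Node` type, the `place_replicas` function, and the six-node cluster are hypothetical names invented for illustration; this is a simplified model, not the actual HDFS implementation.

```python
import random
from collections import namedtuple

# Hypothetical model of a DataNode: a host name and the rack it sits in.
Node = namedtuple("Node", ["name", "rack"])


def place_replicas(nodes, writer, rng):
    """Pick three nodes for one block, in simplified default-policy style.

    Assumes the cluster has at least two racks and that every rack holds
    at least two nodes. Note that no tenant or owner is passed in: the
    choice depends only on where the writer is and on rack topology.
    """
    first = writer
    # Second replica: any node on a rack other than the writer's rack.
    second = rng.choice([n for n in nodes if n.rack != first.rack])
    # Third replica: a different node on the same rack as the second.
    third = rng.choice(
        [n for n in nodes if n.rack == second.rack and n != second]
    )
    return [first, second, third]


# Hypothetical six-node, two-rack cluster shared by two tenants.
nodes = [Node(f"dn{i}", f"rack{i % 2}") for i in range(6)]
rng = random.Random(0)

writer = nodes[0]
replicas = place_replicas(nodes, writer, rng)

# Write 20 blocks per tenant from random nodes and record where they land.
tenant_a_nodes = {
    n.name for _ in range(20)
    for n in place_replicas(nodes, rng.choice(nodes), rng)
}
tenant_b_nodes = {
    n.name for _ in range(20)
    for n in place_replicas(nodes, rng.choice(nodes), rng)
}
shared = tenant_a_nodes & tenant_b_nodes
```

In this model `shared` is the set of hosts that hold blocks from both tenants. Because nothing in the placement rule knows which tenant owns a block, the two tenants' data ends up interleaved on the same hosts and drives.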

This means that even though users might feel that HDFS permissions and POSIX-style ACLs protect data from unauthorized access, that protection only applies to people using the front door. As they say, locks are for honest people. Only users attempting to access your data via Hadoop-based methods will be blocked. The unscrupulous could still, for instance, log in to the nodes directly and read the blocks, because Hadoop stores them as ordinary files in the local Linux file system. Encryption of this data at rest is a partial answer to this dilemma, but this is a very new feature of Hadoop 3 not yet adopted by most distribution vendors.
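The "back door" is easy to picture. A DataNode keeps each block as a plain file named `blk_<id>` (with a companion `.meta` checksum file) under its configured data directory, so anyone with shell access and sufficient local privileges can read those files without ever talking to HDFS. The sketch below walks a directory tree and reads whatever block files it finds. The `find_block_files` helper and the throwaway directory layout are assumptions made for illustration; real DataNode directory trees are deeper and vary by version.

```python
import os
import re
import tempfile

# Block data files are named blk_<id>; the .meta checksum files are skipped.
BLOCK_FILE = re.compile(r"^blk_-?\d+$")


def find_block_files(data_dir):
    """Return sorted (path, size_in_bytes) pairs for block files under data_dir."""
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if BLOCK_FILE.match(name):
                path = os.path.join(root, name)
                found.append((path, os.path.getsize(path)))
    return sorted(found)


# Demo against a temporary directory that imitates a (simplified,
# hypothetical) DataNode data directory. No Hadoop installation is needed.
with tempfile.TemporaryDirectory() as data_dir:
    subdir = os.path.join(data_dir, "current", "finalized", "subdir0")
    os.makedirs(subdir)
    with open(os.path.join(subdir, "blk_1073741825"), "wb") as f:
        f.write(b"tenant A records")
    with open(os.path.join(subdir, "blk_1073741825_1001.meta"), "wb") as f:
        f.write(b"checksum")

    blocks = find_block_files(data_dir)
    names = [os.path.basename(path) for path, _size in blocks]
    sizes = [size for _path, size in blocks]
    # Plain file reads: no HDFS permission or ACL check is involved here.
    contents = []
    for path, _size in blocks:
        with open(path, "rb") as f:
            contents.append(f.read())
```

Nothing in this code uses a Hadoop client, which is the point: HDFS permissions and ACLs are enforced by the Hadoop services, while the bytes on disk are governed only by local operating system permissions.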

The classical answer has been to engage a third-party vendor for a more robust approach to Hadoop encryption at rest. At the end of the day, the block could still be accessed (restricting direct login to the DataNodes helps) and theoretically decrypted. I'm sure this could easily be the topic, or at least a subplot, of a spy movie.

The other alternative is to simply not use HDFS. MapR FS, for example, has none of these issues because it is essentially a true file system. Blocks are not the unit of replication. NameNode metadata is not an issue because the metadata is distributed, and one cannot access MapR FS or any of its components in any way other than the front door (via the MapR client).

Add the ability to place data not only on specific nodes but also on specific drives within a node, and MapR FS is a more elegant solution. For these reasons, true data isolation is guaranteed with MapR FS. With HDFS, the only way to guarantee data isolation is an entirely separate HDFS subsystem (another cluster). HDFS has the concept of namespaces, but that addresses the small-files issue and isn't an enforcement method for data placement on nodes or drives. A feature-by-feature comparison tends to leave HDFS wanting.

The lesson here is to study what is included and how components are configured before committing to Hadoop for a multi-tenant infrastructure.


Published at DZone with permission of Adam Diaz, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
