
Block Placement and Multi-Tenancy: Where Is Your Data?

With the ability to place data not only on specific nodes but also on specific drives within a node, MapR FS is really an elegant solution.

by Adam Diaz · Mar. 20, 17 · Opinion

Most folks tend to think of the Hadoop storage layer as one large hard drive. At a high level, I guess this is a fair assumption. The real issue comes to light when one considers actual block placement in Hadoop. Many architects want to design multi-tenant systems with Hadoop as a core part of the design. HDFS, however, has a very particular strategy for block placement: within a single cluster, blocks cannot be restricted to a set of hosts (using a default build of HDFS) or even to a set of drives within a host.
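To make the placement point concrete, here is a minimal Python sketch that loosely imitates the default HDFS replica placement rule (first replica on the writer's node, second on a node in a different rack, third on another node in the second replica's rack). The topology, node names, and function names are all hypothetical, and the real policy handles many more cases (full disks, writers outside the cluster, more than three replicas). The only point is that a tenant's blocks end up spread across every host.

```python
import random
from collections import Counter

# Hypothetical two-rack cluster; node and rack names are made up for illustration.
TOPOLOGY = {
    "rack1": ["n1", "n2", "n3"],
    "rack2": ["n4", "n5", "n6"],
}


def place_replicas(writer, topology, rng):
    """Choose three target nodes for one block, loosely following the default rule."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer  # replica 1: the node doing the write
    other_racks = [n for n in rack_of if rack_of[n] != rack_of[first]]
    second = rng.choice(other_racks)  # replica 2: any node in a different rack
    peers = [n for n in topology[rack_of[second]] if n != second]
    third = rng.choice(peers)  # replica 3: another node in replica 2's rack
    return [first, second, third]


def hosts_touched(num_blocks, topology, seed=0):
    """Count how many replicas of one tenant's data land on each host."""
    rng = random.Random(seed)
    all_nodes = [n for nodes in topology.values() for n in nodes]
    touched = Counter()
    for _ in range(num_blocks):
        writer = rng.choice(all_nodes)  # tasks run wherever the scheduler puts them
        for node in place_replicas(writer, topology, rng):
            touched[node] += 1
    return touched


touched = hosts_touched(500, TOPOLOGY)
print(sorted(touched.items()))
```

Nothing in this rule takes a tenant, a host whitelist, or a drive into account, which is why a second tenant on the same cluster shares physical disks with the first.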

This means that even though users might feel that both HDFS permissions and POSIX-style ACLs protect data from unauthorized access, this only applies to folks using the front door. As they say, locks are for honest people. Only users attempting to access your data via Hadoop-based methods will be blocked. The unscrupulous could still, for instance, log in to the nodes directly and read the blocks where they sit in the local Linux file system, because Hadoop itself stores its blocks as ordinary files there. Encryption of this data at rest is a partial answer to this dilemma, but it is a relatively recent addition to Hadoop (HDFS transparent encryption with encryption zones arrived in the 2.6 release line) and is not enabled by default.
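The "front door" argument can be illustrated with a toy model. The sketch below is not HDFS code: `ToyNameNode`, its methods, and the user names are hypothetical, and a temporary directory stands in for a DataNode data directory. The only detail borrowed from HDFS is that block files on disk carry names starting with `blk_`. It assumes the reader already has OS-level access to that directory, which on a real node would require a privileged or service account.

```python
import os
import tempfile


class ToyNameNode:
    """Hypothetical stand-in for the permission check done on the Hadoop access path."""

    def __init__(self, data_dir):
        self.data_dir = data_dir
        self.files = {}  # logical path -> (owner, block file name)

    def write(self, path, owner, payload):
        block_name = f"blk_{1073741825 + len(self.files)}"
        with open(os.path.join(self.data_dir, block_name), "wb") as fh:
            fh.write(payload)  # unencrypted bytes land on the local file system
        self.files[path] = (owner, block_name)

    def read(self, path, user):
        owner, block_name = self.files[path]
        if user != owner:  # the check only runs for callers who come this way
            raise PermissionError(f"{user} may not read {path}")
        with open(os.path.join(self.data_dir, block_name), "rb") as fh:
            return fh.read()


def read_blocks_directly(data_dir):
    """Read every block file straight from local disk, skipping the check above."""
    found = {}
    for name in sorted(os.listdir(data_dir)):
        if name.startswith("blk_"):
            with open(os.path.join(data_dir, name), "rb") as fh:
                found[name] = fh.read()
    return found


def demo():
    with tempfile.TemporaryDirectory() as data_dir:
        namenode = ToyNameNode(data_dir)
        namenode.write("/tenant_a/report.csv", "alice", b"secret,42\n")
        try:
            namenode.read("/tenant_a/report.csv", "mallory")
            front_door_blocked = False
        except PermissionError:
            front_door_blocked = True
        leaked = read_blocks_directly(data_dir)
    return front_door_blocked, leaked


if __name__ == "__main__":
    print(demo())
```

In this model the permission check and the bytes live in different places, so anyone who can reach the bytes without going through `read` never meets the check. Encrypting the payload before it is written would change what the direct reader sees, which is the partial answer described above.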

The classical answer has been to engage a third-party vendor for a more robust approach to Hadoop encryption at rest. At the end of the day, the block could still be accessed (restricting direct login to the data nodes helps) and theoretically decrypted. I'm sure this could easily be the topic, or at least a subplot, of a spy movie.

The other alternative is simply not to use HDFS. MapR FS, for example, has none of these issues because it is essentially a true file system. Blocks are not the unit of replication. NameNode metadata is not an issue because the metadata is distributed, and one cannot access MapR FS or any of its components in any way other than through the front door (via the MapR client).

Add the ability to place data not only on specific nodes but also on specific drives within a node, and MapR FS is a more elegant solution. For these reasons, true data isolation is guaranteed with MapR FS. With HDFS, the correct way to guarantee data isolation is an entirely separate HDFS subsystem (another cluster). HDFS has the concept of namespaces, but that addresses the small-files issue and isn't an enforcement method for data placement on nodes or drives. A feature-by-feature comparison tends to leave HDFS wanting.

The lesson here is to study carefully what is included and how components are configured before adopting Hadoop for a multi-tenant infrastructure.


Published at DZone with permission of Adam Diaz, DZone MVB.

Opinions expressed by DZone contributors are their own.
