Hadoop 2.6 and Native Encryption-at-Rest


At rest, as in not in motion; not REST as in web services. For a long time, the only real answers Hadoop had for encryption at rest were third-party tools or LUKS for whole-disk encryption. What I see customers asking for these days is encryption in motion (aka wire encryption), encryption at rest (at the HDFS layer), and policies that remove data from specific HDFS directories based on some business rule. The good news is that as of Hadoop 2.6 we have HDFS-6134 in play, so there is light at the end of the security tunnel.

The new transparent encryption is supported through the normal Hadoop FileSystem Java API, the libhdfs C API, and the WebHDFS (REST) API. The great news is that once it is set up, normal HDFS ACLs control read and write access, so while there is some administration up front, users do not take on a large new burden. This essentially means that existing third-party integrations should continue to work largely unchanged.

There is now a Key Management Server (KMS) that creates the keys used to encrypt “encryption zones”, which are simply directories in HDFS.

# run as super user (hdfs) or other authorized management user
hadoop key create Project1Key
hdfs crypto -createZone -keyName Project1Key -path /home/adam/projectzone

So how does all this happen? The design doc describes both the read and write paths, and it illustrates the read process.
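In outline, as I read the design, each file gets its own data encryption key (DEK). The DEK is encrypted with the zone key held by the KMS, and only that encrypted form (the EDEK) is stored with the file's metadata. On read, the client has the KMS decrypt the EDEK and then uses the resulting DEK to decrypt the file contents. The sketch below mimics that flow with nothing but `javax.crypto`. It does not use any Hadoop classes; `EnvelopeReadSketch`, `aesCtr`, and `randomBytes` are hypothetical names invented for illustration, and everything runs in one process, whereas in HDFS the zone key never leaves the KMS.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical illustration only: none of these names come from Hadoop.
class EnvelopeReadSketch {

    // Encrypts or decrypts with AES-CTR; mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE.
    static byte[] aesCtr(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-CTR operation failed", e);
        }
    }

    // Returns the requested number of cryptographically random bytes.
    static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        new SecureRandom().nextBytes(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // Stand-in for the zone key (assumption: in HDFS only the KMS holds this).
        byte[] zoneKey = randomBytes(16);

        // Write path: a fresh DEK per file, stored only in encrypted form (EDEK).
        byte[] dek = randomBytes(16);
        byte[] edekIv = randomBytes(16);
        byte[] edek = aesCtr(Cipher.ENCRYPT_MODE, zoneKey, edekIv, dek);
        byte[] fileIv = randomBytes(16);
        byte[] storedBytes = aesCtr(Cipher.ENCRYPT_MODE, dek, fileIv,
                "quarterly numbers".getBytes(StandardCharsets.UTF_8));

        // Read path: unwrap the EDEK (the KMS's job), then decrypt the file data.
        byte[] recoveredDek = aesCtr(Cipher.DECRYPT_MODE, zoneKey, edekIv, edek);
        byte[] plaintext = aesCtr(Cipher.DECRYPT_MODE, recoveredDek, fileIv, storedBytes);

        System.out.println(new String(plaintext, StandardCharsets.UTF_8)); // prints "quarterly numbers"
    }
}
```

The point of the indirection is that the storage layer only ever sees encrypted bytes and an encrypted key, so access to the raw blocks alone is not enough to read the data.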


So how does one actually use encryption zones? Cloudera has a great docs page on creating encryption zones, organized by the technology that sits on top of HDFS.

As with most new technology, the design has some potential issues, and the design doc calls them out. Although the spec lists some potential vulnerabilities, I would still say this is a massive step in the right direction. I also noted that this first version only uses AES-CTR.
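One reason AES-CTR is a natural fit for a filesystem is that it behaves like a stream cipher: the ciphertext is the same length as the plaintext, and a reader can start decrypting at any offset by advancing the counter instead of decrypting everything before it. The sketch below shows that property with plain `javax.crypto`. It is my own illustration, not Hadoop's code; `CtrSeekSketch`, `ivForBlock`, `encrypt`, and `decryptRange` are hypothetical names, and the all-zero key and IV in `main` are there only to keep the demo deterministic.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical illustration only: none of these names come from Hadoop.
class CtrSeekSketch {
    static final int BLOCK = 16; // AES block size in bytes

    // Adds blockIndex to the initial IV, treating the IV as a 128-bit big-endian counter.
    static byte[] ivForBlock(byte[] initIv, long blockIndex) {
        BigInteger sum = new BigInteger(1, initIv)
                .add(BigInteger.valueOf(blockIndex))
                .mod(BigInteger.ONE.shiftLeft(128));
        byte[] raw = sum.toByteArray(); // variable length, may carry a leading zero byte
        byte[] iv = new byte[BLOCK];
        int len = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - len, iv, BLOCK - len, len);
        return iv;
    }

    static Cipher ctr(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Encrypts the whole plaintext starting at counter position zero.
    static byte[] encrypt(byte[] key, byte[] initIv, byte[] plaintext) {
        try {
            return ctr(Cipher.ENCRYPT_MODE, key, initIv).doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Decrypts only ciphertext[offset, offset + length) without touching earlier blocks.
    static byte[] decryptRange(byte[] key, byte[] initIv, byte[] ciphertext, int offset, int length) {
        int skip = offset % BLOCK;   // bytes into the block where the range starts
        int start = offset - skip;   // block-aligned position to begin decrypting
        try {
            Cipher cipher = ctr(Cipher.DECRYPT_MODE, key, ivForBlock(initIv, offset / BLOCK));
            byte[] out = cipher.doFinal(ciphertext, start, skip + length);
            return Arrays.copyOfRange(out, skip, skip + length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // demo values only; never use a fixed key or IV for real data
        byte[] iv = new byte[16];
        byte[] plaintext = "0123456789abcdefghijklmnopqrstuvwxyz".getBytes(StandardCharsets.UTF_8);
        byte[] ciphertext = encrypt(key, iv, plaintext);

        byte[] slice = decryptRange(key, iv, ciphertext, 20, 5);
        System.out.println(new String(slice, StandardCharsets.UTF_8)); // prints "klmno"
    }
}
```

The trade-off is that CTR on its own provides confidentiality but no integrity check, which is worth keeping in mind when reading the vulnerabilities the design doc lists.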

It might seem like a small matter, but in the larger context of a security discussion, native encryption at rest is an important part of the Hadoop puzzle.


Published at DZone with permission of Adam Diaz, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

