Hadoop 2.6 and Native Encryption-at-Rest

At rest, as in not in motion, not REST as in web services. For a long time, the only real answer Hadoop had for encryption at rest was to leverage a third-party tool or consider LUKS for whole-disk encryption. What I see customers asking for these days is really encryption in motion (aka wire encryption), encryption at rest (at the HDFS layer), plus policies that remove data from specific directories in HDFS based upon some business rule. The good news is that as of Hadoop 2.6 we now have HDFS-6134 in play, so there is light at the end of the security tunnel.

This new transparent encryption is supported via the normal Hadoop FileSystem Java API, the libhdfs C API, and the WebHDFS (REST) API. The great news is that once it is set up, normal HDFS ACLs control read and write access, so while there is some administration up front, from a user's perspective there is not a terribly large new burden. This essentially means that third-party integration work should be left largely intact.
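
To make the transparency concrete, here is a minimal sketch of reading a file that lives in an encryption zone over WebHDFS. The hostname, port, user, and file path are placeholders (the path matches the example zone created below); the point is that this is the same OPEN call you would issue for an unencrypted file, and the response comes back as plaintext.

# the same WebHDFS OPEN call used for any other file; decryption is handled for you
curl -i -L "http://namenode.example.com:50070/webhdfs/v1/home/adam/projectzone/report.txt?op=OPEN&user.name=adam"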

There is now a Key Management Server (KMS) used to create the keys that encrypt “encryption zones,” which are simply directories in HDFS.

# run as the super user (hdfs) or other authorized management user
hadoop key create Project1Key
hdfs crypto -createZone -keyName Project1Key -path /home/adam/projectzone
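
Once the zone exists, day-to-day use looks like ordinary HDFS. A quick sketch of what that might look like, assuming a local file named report.txt and the zone path from above (note that -createZone expects the target directory to already exist and be empty):

# write and read exactly as you would for any other directory; encryption and decryption are transparent
hdfs dfs -put report.txt /home/adam/projectzone/
hdfs dfs -cat /home/adam/projectzone/report.txt

# confirm the zone and the key backing it (run as the hdfs superuser)
hdfs crypto -listZones
hadoop key list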

So how does all of this happen? The design doc describes both the read and write processes. Illustrated here is the read process:

[Figure: HDFS encrypted read process]
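
Roughly speaking, on a read the NameNode hands the client the file's encrypted data encryption key, the client asks the KMS to decrypt it, and the client then decrypts the blocks locally as it reads them. One way to see this from the command line, assuming the example zone above and a build that includes the /.reserved/raw namespace shipped alongside this feature: a normal read returns plaintext, while the raw view exposes the bytes as they actually sit on disk.

# normal read path: the client decrypts transparently
hdfs dfs -cat /home/adam/projectzone/report.txt

# raw view of the same file: ciphertext as stored (superuser only)
hdfs dfs -cat /.reserved/raw/home/adam/projectzone/report.txt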

So how does one use encryption zones in practice? Cloudera has a great docs page on creating encryption zones based upon the technology being used over HDFS.

As with most newly invented technology, the design doc also calls out some potential issues with the design. While there are some potential vulnerabilities noted in the spec, I would still say this is a massive step in the right direction. I also noted that this first version really only uses AES-CTR.

It might seem like a small matter, but in the larger context of a security discussion, native encryption at rest is an important piece of the Hadoop puzzle.

Published at DZone with permission of Adam Diaz, DZone MVB.
