
Deploy a Secure Enterprise Data Hub on Microsoft Azure (Part 1)

Learn how to use Cloudera Director, Microsoft Active Directory, Samba, and SSSD to deploy a secure EDH cluster for workloads in the public cloud.

The first line of security we recommend is authenticating users in Apache Hadoop. In most, if not all, RDBMSes, a user is given a username and a password to validate their identity, and this is a requirement for accessing any data managed by those systems. The goal is the same in Apache Hadoop. Since the Hadoop stack does not have its own authentication component, a Kerberos Key Distribution Center (KDC) is used as the mechanism to identify users.

Two implementations of a Kerberos KDC are supported on a CDH cluster: an MIT KDC installation, integration with the Kerberos KDC built into Microsoft Active Directory (AD), or both. We generally recommend the latter to our enterprise customers, so this blog focuses on a direct integration of CDH and the Active Directory KDC. This integration is favored because other tools used in the deployment will also communicate with Active Directory.

Active Directory

Active Directory is mainly known for its Domain Services (AD DS), an identity management service that authenticates users and groups. However, AD includes other powerful services, such as Certificate Services (AD CS) and AD DNS.

On May 6, 2016, my colleague Ben Spivey wrote a blog on securing a cluster on Amazon AWS, in which he covered the AD DS and AD CS services in depth. For more details, Ben’s blog is a good place to start. This blog spends more time on the AD DNS service.

Active Directory Domain Name System

Deploying a CDH cluster requires both forward and reverse name resolution for internal IP addresses. When deploying a cluster on-premises, this is usually done by your system administrator. When you deploy a cluster on Amazon AWS, this is automatically configured when you launch an EC2 instance.

A forward DNS lookup resolves a Fully Qualified Domain Name (FQDN) to an IP address, and a reverse DNS lookup does the opposite, resolving an IP address to an FQDN. Currently, Microsoft Azure does not provide reverse DNS lookup for internal private IP addresses. This will be covered later.
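To make the reverse direction concrete, here is a minimal sketch of how the name queried by a reverse (PTR) lookup is derived from an IPv4 address. The function name `reverse_name` and the address `10.0.0.4` are illustrative assumptions, not values from this deployment.

```shell
# Build the in-addr.arpa name that a reverse (PTR) lookup queries
# for an IPv4 address. The body runs in a subshell so the IFS and
# globbing changes do not leak into the calling shell.
reverse_name() (
    set -f          # disable globbing while the address is split
    IFS=.           # split the dotted quad on "."
    set -- $1       # $1..$4 now hold the four octets
    printf '%s.%s.%s.%s.in-addr.arpa\n' "$4" "$3" "$2" "$1"
)
```

Running `host -t PTR "$(reverse_name 10.0.0.4)"` asks the configured DNS server for that record directly, which is the same query that `host 10.0.0.4` performs on your behalf.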

There are many options for DNS when deploying on Azure. For example, you can install the supported BIND package for your Linux OS or use an existing Active Directory DNS service. This blog covers AD DNS in more detail.

If not already configured, ensure your AD administrator has properly configured a reverse DNS zone in the DNS Manager as seen below.

Reverse Zone

The important section in the figure above is the red box in the “Reverse Lookup Zones.” This illustrates the zone configured to host all the DNS objects for a particular subnet.

Forward Zone

This is a view of the “Forward Lookup Zones” for the CLOUDERA.MORANTUS.COM domain.

Also shown is a view of my OU tree, which has zero entries.

Azure Virtual Machine

I provisioned a VM in Azure with all the default DNS settings, and we will join it to our AD DS and DNS services.

As you can see, the hostname -f command displays a very long FQDN for my VM, and hostname -i gives us the IP address associated with the VM. Next, I did a forward DNS lookup using the host FQDN command, which resolved to the IP address. Then, I did a reverse DNS lookup using host IPaddress; as shown in the red box above, it did not locate a reverse entry for that IP address. A reverse lookup is a requirement for a CDH deployment. We’ll revisit this later.
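The manual check described above can be sketched as a small script. This is a hedged example, not the exact commands from the screenshot: the function names are hypothetical, and the parsing assumes the usual output lines of the `host` utility ("... has address ..." and "... domain name pointer ...").

```shell
# Extract the IPv4 address from a line such as
# "vm1.example.com has address 10.0.0.4".
parse_forward() {
    printf '%s\n' "$1" | awk '/ has address / { print $NF; exit }'
}

# Extract the FQDN from a line such as
# "4.0.0.10.in-addr.arpa domain name pointer vm1.example.com."
# and drop the trailing dot.
parse_reverse() {
    printf '%s\n' "$1" |
        awk '/ domain name pointer / { sub(/\.$/, "", $NF); print $NF; exit }'
}

# Report whether forward and reverse lookups agree for this host.
# Requires the hostname and host utilities and a reachable DNS server.
check_dns_round_trip() {
    fqdn=$(hostname -f)
    ip=$(parse_forward "$(host "$fqdn")")
    name=$(parse_reverse "$(host "$ip")")
    if [ "$name" = "$fqdn" ]; then
        echo "OK: $fqdn <-> $ip"
    else
        echo "MISMATCH: $fqdn -> $ip -> ${name:-no PTR record}"
        return 1
    fi
}
```

On a VM without a reverse entry, `check_dns_round_trip` would be expected to report a mismatch with no PTR record. The comparison is case-sensitive, so a server that returns the name in a different case would also be flagged.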

Samba

To let our RHEL 6.7 VM communicate with Active Directory, we need to configure a tool called Samba, a Linux-based utility that enables the integration of Linux systems with AD.

First, join the VM to AD with Samba. Ensure the DNS servers property for your Virtual Network in the 

Next, install packages needed to integrate with AD.

sudo yum install -y samba-common krb5-workstation openldap-clients


After that, configure the VM to point to the AD DNS server.

The nameserver is the IP address for the AD server. This can also be accomplished by running “service network restart” on the VM.
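As a rough illustration of the resolver configuration, the sketch below prints the two lines a resolv.conf file typically needs for this setup. The function name `render_resolv_conf` is hypothetical, and the nameserver address `10.0.0.4` is a placeholder for the IP address of your AD server.

```shell
# Print resolv.conf content for a DNS search domain and a nameserver.
# $1 = search domain, $2 = IP address of the AD DNS server (placeholder).
render_resolv_conf() {
    printf 'search %s\n' "$1"
    printf 'nameserver %s\n' "$2"
}
```

For example, `render_resolv_conf cloudera.morantus.com 10.0.0.4 | sudo tee /etc/resolv.conf` would write the file. Keep in mind that the DHCP client may regenerate /etc/resolv.conf, which is one reason to also set the DNS servers on the Virtual Network itself.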

Once that's done, configure Samba to join the AD domain and verify the entry in AD. The join must be executed as a privileged user; in this case, “jmorantus” is an admin account in Active Directory.

Note: You can ignore the failed DNS update error shown above. We need to create a Kerberos keytab with a privileged account to update/create DNS objects in AD. This step will be executed later.

As you can see above, we succeeded in joining our VM to the AD domain, and an AD object was created in the servers OU.
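The Samba configuration used for the join is not reproduced in the text, so the following is only a sketch of what a minimal smb.conf [global] section for an AD join commonly looks like. The function name `render_smb_conf` and the short domain name `CLOUDERA` are assumptions; the realm comes from the domain shown earlier.

```shell
# Print a minimal [global] section for joining an AD domain.
# $1 = short (NetBIOS) domain name, assumed here
# $2 = Kerberos realm, normally the AD domain in upper case
render_smb_conf() {
    cat <<EOF
[global]
   workgroup = $1
   realm = $2
   security = ads
   kerberos method = secrets and keytab
EOF
}

# Possible usage, run as a privileged user:
#   render_smb_conf CLOUDERA CLOUDERA.MORANTUS.COM | sudo tee /etc/samba/smb.conf
#   sudo net ads join -U jmorantus
#   sudo net ads testjoin
```

The `net ads testjoin` command is a quick way to confirm from the Linux side that the machine account is valid, complementing the check in the AD console.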

Configure the Kerberos krb5.conf file to generate a keytab file for updating DNS in AD.

Keytab File Generation

Then, update/create the forward and reverse DNS entries.

Here's a view of the Forward DNS entry added to AD DNS service. 

And here's a view of reverse DNS entry added to AD DNS service.

Note: Active Directory ages out DNS entries that it considers “inactive.” An additional process should be implemented to keep these entries “alive” in AD.
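The exact commands behind the DNS update are not reproduced in the text. One common approach, sketched here as an assumption, is to feed update commands to `nsupdate -g` after obtaining a Kerberos ticket from the keytab. The function name `render_dns_update`, the default TTL, and the sample host values are all illustrative.

```shell
# Print nsupdate commands that add an A record and the matching PTR record.
# $1 = FQDN, $2 = IPv4 address, $3 = TTL in seconds (defaults to 3600).
# The body runs in a subshell so the IFS change stays local.
render_dns_update() (
    fqdn=$1
    ip=$2
    ttl=${3:-3600}
    set -f
    IFS=.
    set -- $ip      # $1..$4 now hold the four octets
    ptr="$4.$3.$2.$1.in-addr.arpa"
    # The A and PTR records live in different zones, so each update
    # is sent as its own message.
    printf 'update add %s %s A %s\n' "$fqdn" "$ttl" "$ip"
    printf 'send\n'
    printf 'update add %s %s PTR %s.\n' "$ptr" "$ttl" "$fqdn"
    printf 'send\n'
)

# Possible usage (keytab path and principal are placeholders):
#   kinit -k -t /path/to/host.keytab 'PRINCIPAL@CLOUDERA.MORANTUS.COM'
#   render_dns_update "$(hostname -f)" "$(hostname -i)" | nsupdate -g
```

Re-running the same update on a schedule, for instance from cron, is one possible way to refresh the records before Active Directory ages them out.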

SSSD

The System Security Services Daemon (SSSD) is used to cache user and group information locally on a Linux system. This integration is also necessary to configure authorization with Apache Sentry for data access.

Now that SSSD is fully configured, we’ll verify we can read user information from AD.

Here you can see with SSSD stopped, the VM does not know user “scm-cloudera.” With SSSD running, the user information was pulled from AD. If you are looking for a commercial option, Cloudera also recommends Centrify.
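The SSSD configuration itself is not reproduced in the text. As a hedged sketch, a minimal sssd.conf that uses the AD provider might look like the output of the function below. The function name `render_sssd_conf` is hypothetical, and the choice of `id_provider = ad` is an assumption rather than the configuration used here.

```shell
# Print a minimal sssd.conf that resolves users and groups from AD.
# $1 = AD domain name, for example cloudera.morantus.com
render_sssd_conf() {
    cat <<EOF
[sssd]
config_file_version = 2
services = nss, pam
domains = $1

[domain/$1]
id_provider = ad
access_provider = ad
cache_credentials = true
EOF
}

# Possible usage:
#   render_sssd_conf cloudera.morantus.com | sudo tee /etc/sssd/sssd.conf
#   sudo chmod 600 /etc/sssd/sssd.conf
#   sudo service sssd restart
#   id scm-cloudera
```

SSSD refuses to start if sssd.conf is readable by other users, hence the `chmod 600`. The final `id` call mirrors the verification described above.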

Conclusion

You should now be able to configure a VM on Azure, join it to an AD domain, and create DNS entries in the AD DNS server. These steps should also work for other cloud providers and for on-premises deployments. In Part 2 of this series, we’ll cover creating a Kerberized cluster with Cloudera Director on Azure.

This article was originally published on Cloudera.

Published at DZone with permission of James Morantus, DZone MVB. See the original article here.
