Deploy a Secure Enterprise Data Hub on Microsoft Azure (Part 2)

Using Cloudera Director, you can deploy a cluster on Azure. There's a decent amount of configuration involved, but once it's done, your secure data hub will be ready.

In part one of this series, we covered all the prerequisites needed to deploy a CDH cluster on the Microsoft Azure cloud platform. In part two, we will cover the resources required on the Azure platform and actually deploy a cluster with Cloudera Director.

Cloudera Director Use Case

Cloudera Director simplifies cluster creation and shortens the time to an operational cluster in the cloud. It’s a great tool for running POCs in your organization. It’s also ideal for transient workloads in the cloud, where the exact compute resources and requirements are unknown.

Microsoft Azure Portal

I recommend reading the Cloudera Enterprise Reference Architecture for Azure Deployments and the Cloudera Director Getting Started on Microsoft Azure guide for recommendations and best practices.

The “Getting Started” document includes instructions for all the configurations and information required for Cloudera Director to deploy a cluster on the Azure Portal.

First, you will need to gather four pieces of Azure credential information for Cloudera Director: Subscription ID, Tenant ID, Client ID, and Client Secret. Second, the following resources are needed to configure Cloudera Director:

  • Resource Group: morantus-rg
  • Network Security Group (NSG): morantus-nsg
  • Virtual Network (Vnet): morantus-vnet
  • Availability Sets:
    • availmgmt: Cloudera Director.
    • availedge: Cloudera Manager.
    • availmgmt2: Cluster Management nodes.
    • availworker: Cluster DataNodes.

The names shown above (morantus-rg, morantus-nsg, morantus-vnet, and the four availability sets) are the Azure resources I have configured in my Azure Portal. These resources are explained in full detail in the Reference Architecture documentation.
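Before entering the four credential values into Cloudera Director, it can help to confirm that all of them are actually at hand. The following is a minimal sketch, assuming the values are held in environment variables. The variable names (AZURE_SUBSCRIPTION_ID and so on) and the function name are hypothetical and are not read by Cloudera Director itself.

```shell
# Print the names of any unset or empty credential variables and
# return non-zero if at least one is missing.
# The variable names below are illustrative assumptions.
missing_azure_credentials() {
    missing=""
    for name in AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID AZURE_CLIENT_ID AZURE_CLIENT_SECRET; do
        # Indirect lookup of the variable called $name (names are fixed above).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Strip the leading space before printing the list.
    printf '%s\n' "${missing# }"
    [ -z "$missing" ]
}
```

Running missing_azure_credentials in the shell where the variables are defined prints nothing but an empty line when all four are present.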

Active Directory

As discussed in part one, in order to join the nodes in the cluster to Active Directory Domain Service and create/update DNS entries in DNS Manager, we need some configurations in place to support our deployment.

The following are required on the AD side:

  • Properly configured DNS Server
  • Privileged user account to join nodes to AD DS and create/update DNS entries
  • Cloudera Manager user account to create Kerberos Service Principal Names (SPNs)

DNS Server

As you may recall, the Azure Cloud service does not provide reverse DNS natively. You must configure your own DNS server, such as BIND or AD. Active Directory is what I am using for my DNS.

In addition to the DNS configurations, there are user accounts and OU setups that are also required to be configured in AD. If you need a refresher, you can jump right into this blog, which covers similar operations when deploying to AWS.


Like Cloudera Manager, Cloudera Director relies on an RDBMS back-end to store all the metadata for the cluster. In this deployment, I am using a preconfigured MySQL database for this purpose. For more on MySQL configuration, go to the Cloudera documentation.

Cloudera Director Virtual Machine

We are now ready to provision a VM to install Cloudera Director. Step-by-step instructions are available here.  Pay close attention to the section where you specify a username for your VM. This account will be used to SSH to all nodes, and will also be the owner of resources provisioned by Cloudera Director to create the cluster.

In my deployment, I used “azuredirectoradmin” as the Azure portal privileged account for my VM and the cluster. Here are some key specs for my VM:

  • VM size: D3
  • OS Image: Cloudera CentOS 7.2
  • Storage: Standard
  • Public IP

In this deployment, I elected to assign a public IP for this VM. In an environment where a VPN or ExpressRoute connection is available, it’s not recommended to assign a public IP to any nodes in your cluster. We recommend that all master and worker nodes are placed in the same Network Security Group (NSG).

Configure VM to Install Director

In this section, we will configure the Cloudera Director VM to join the AD domain and create DNS entries for both forward and reverse DNS with FQDN resolution. These steps were explained in detail in part one of this series. Download, review, and run these scripts from GitHub.

Log in to your VM with the appropriate account and copy the downloaded files to the tmp directory. Switch to the “root” user to execute the director-addnsjoin.sh script, which also calls the OS-bootstrap.sh script. This separation is done on purpose to show which script accomplishes which function. The director-addnsjoin.sh script does the AD join, SSSD, and SAMBA configurations. The OS-bootstrap.sh script maintains the DNS records. There are two very important modifications to this script to allow secure DNS updates to the AD DNS Manager:

  • Added to acquire a Kerberos ticket to update DNS entries in AD
kinit -kt /etc/krb5.keytab "$princ"

  • Added -g to initiate a secure update to AD DNS
nsupdate -g "$nsupdatecmds"
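To make the forward and reverse DNS update concrete, here is a minimal sketch of how a command file for nsupdate could be assembled. It is not the content of OS-bootstrap.sh. The function names, the default TTL of 3600, and the host name and address in the usage note are assumptions for illustration.

```shell
# Reverse-lookup name for an IPv4 address, e.g. 10.0.0.4 -> 4.0.0.10.in-addr.arpa
reverse_ptr_name() {
    printf '%s\n' "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Print nsupdate commands that replace the A and PTR records for one host.
# Arguments: fully qualified host name, IPv4 address, optional TTL in seconds.
build_nsupdate_cmds() {
    fqdn=$1
    ip=$2
    ttl=${3:-3600}
    ptr=$(reverse_ptr_name "$ip")
    # Forward (A) record; sent on its own because it lives in the forward zone.
    printf 'update delete %s. A\n' "$fqdn"
    printf 'update add %s. %s A %s\n' "$fqdn" "$ttl" "$ip"
    printf 'send\n'
    # Reverse (PTR) record; sent separately because it lives in the reverse zone.
    printf 'update delete %s. PTR\n' "$ptr"
    printf 'update add %s. %s PTR %s.\n' "$ptr" "$ttl" "$fqdn"
    printf 'send\n'
}
```

For example, the output of build_nsupdate_cmds director.example.com 10.0.0.4 could be redirected to a file and, after the kinit step above has obtained a ticket, passed to nsupdate -g as the command file.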

A successful run should look like the following:

Successful Run

Java 8 and Cloudera Director Installation

Cloudera Director installs Java 7 by default. We will deploy this cluster using Java 8 instead. The following steps must be completed as the root user or a user with sudo privileges.

Install the Java 8 JDK and the JCE policy file for encryption. Oracle requires that you acknowledge you have read and accepted the Oracle license terms.

Configure Cloudera Director to use the MySQL database by editing its server properties file:

vi /etc/cloudera-director-server/application.properties
lp.database.type: mysql
lp.database.username: director
lp.database.password: director
lp.database.host: morantus-blog.cloudera.morantus.com
lp.database.port: 3306
lp.database.name: director

Start Cloudera Director:
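A minimal sketch of this step, assuming Cloudera Director was installed from the standard packages so that the service is named cloudera-director-server. The run helper and the DRY_RUN switch are hypothetical conveniences for rehearsing the commands without executing them.

```shell
# Execute a command, or only print it when DRY_RUN=1
# (DRY_RUN is an illustrative switch, not a Cloudera Director setting).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Start the Director server and report its status.
start_director() {
    run sudo service cloudera-director-server start
    run sudo service cloudera-director-server status
}
```

Calling start_director on the Director VM starts the service and then shows its status; with DRY_RUN=1 it only prints the two commands.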

Log in to the Director UI at http://director_publicIP:7189 with admin/admin as credentials, and accept the license agreement.

Note: after login, it’s a good idea to change the default password by clicking the dropdown next to the ‘admin’ username in the upper-right corner of the screen. 

Cloudera Director

Deploy Cluster With Cloudera Director

The remaining steps must be executed as the Cloudera Director admin user you created earlier. In my case, that’s the “azuredirectoradmin” account. All resources created by Cloudera Director in the Azure Portal will be owned by this account. The “root” user is not allowed to create resources on the Azure Portal.

First, we’ll need to create an SSH key as the “azuredirectoradmin” user on the VM where Cloudera Director is installed. This key will be added to our deployment configuration file, and Cloudera Director will install it on all the VMs it provisions. This will allow us to use passwordless SSH to the cluster nodes with this key.

Create the ssh key (do not enter a passphrase, keep all defaults):
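One way to do this non-interactively is sketched below, assuming an RSA key with an empty passphrase. The function name and the choice to refuse overwriting an existing key are illustrative, not part of the original procedure.

```shell
# Create a passphrase-less RSA key pair at the given path.
# Refuses to run if a key already exists there, so nothing is overwritten.
create_director_key() {
    key_path=$1
    if [ -e "$key_path" ]; then
        echo "key already exists: $key_path" >&2
        return 1
    fi
    mkdir -p "$(dirname "$key_path")"
    # -N "" sets an empty passphrase; -q suppresses the progress output.
    ssh-keygen -q -t rsa -N "" -f "$key_path"
}
```

Run as the “azuredirectoradmin” user, create_director_key "$HOME/.ssh/id_rsa" writes the private key to the default ssh-keygen location and the public key next to it with a .pub suffix.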

Configure the Cloudera Director Configuration File

Download and inspect the configuration file blog.azure.conf from GitHub, which we will use to create the cluster with Cloudera Director. There are a few sections I want to point out.

1. Cloud Provider

Here you specify your Azure credentials information such as subscriptionID, tenantID, clientID, and Client Secret. Consult this Azure resource and/or your Azure Portal administrator for this information.

2. SSH Login Key

Add the SSH key created earlier for passwordless login to the cluster:

3. Instance Templates

Define the VM profiles to use for each node type such as management, worker, and edge nodes. This is also the section where you specify all the resources created earlier in the Azure Portal like the ResourceGroup, Network Security Group, etc.

4. Bootstrap-script

In this section, we combine the same scripts used to prepare the Cloudera Director VM earlier. We also join each VM created by Cloudera Director to the AD domain and create forward and reverse DNS entries for FQDN name resolution. Finally, we manually install Java 8 on all the nodes.

Note: I am installing the MySQL JDBC Driver on all the nodes as well. The driver is required for all services that are backed by my preconfigured MySQL database. There’s an option to have Cloudera Director automatically create databases for you. As part of that process, it will install a version of the MySQL JDBC Driver for you. I like to control this process, so I do this installation manually.

5. databaseServers

Specify the database instance information if you want Cloudera Director to create the databases for you automatically on the fly.

6. Cloudera-manager

In this section, you specify all the configurations for Cloudera Manager. Since this is a secured deployment with authentication enabled, we will define the username and password used to connect to Active Directory and create the Kerberos Service Principals that secure our cluster. This step takes care of the integration with the Active Directory Kerberos service.

Instruct Cloudera Manager to NOT install Java 7:

unlimitedJce: true

Note: This should only be enabled if allowed in your country or jurisdiction.

Active Directory Details:

Here we specify all the configurations for the cluster: the services to install, whether to enable HDFS High Availability, database details for dependent services, and so on. There are more example configuration files on the Cloudera Director GitHub page.

At this point, you are now ready to create your cluster with Cloudera Director using the configuration file. Use the default username and password “admin/admin”.

Create the cluster:

cloudera-director bootstrap-remote /home/azuredirectoradmin/config/blog.azure.conf --lp.remote.username=admin --lp.remote.password=admin --lp.remote.hostAndPort=localhost:7189

Creating the Cluster

If you run into an error and need to restart the deployment, or if you want to delete your cluster, you can terminate it by running:
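The termination command mirrors the bootstrap-remote command shown earlier. Here is a minimal sketch, assuming the terminate-remote subcommand of the Cloudera Director client and the same configuration file and connection settings. The helper function and the DIRECTOR_* environment variables are hypothetical and only assemble the command line for review.

```shell
# Print the Cloudera Director client command for a remote action.
# Arguments: action (bootstrap-remote or terminate-remote), config file path.
# DIRECTOR_USER, DIRECTOR_PASSWORD, and DIRECTOR_HOSTPORT are illustrative
# overrides; the defaults match the values used in this article.
director_remote_cmd() {
    action=$1
    conf=$2
    case "$action" in
        bootstrap-remote|terminate-remote) ;;
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac
    printf 'cloudera-director %s %s --lp.remote.username=%s --lp.remote.password=%s --lp.remote.hostAndPort=%s\n' \
        "$action" "$conf" \
        "${DIRECTOR_USER:-admin}" "${DIRECTOR_PASSWORD:-admin}" "${DIRECTOR_HOSTPORT:-localhost:7189}"
}
```

For example, director_remote_cmd terminate-remote /home/azuredirectoradmin/config/blog.azure.conf prints the full termination command, which can then be run as the “azuredirectoradmin” user.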

A successful deployment will look like the following:

Azure Portal with all nodes

Cloudera Director Dashboard

Cloudera Manager Dashboard

Verify Kerberos is Enabled


You should now be able to configure the Azure Portal, provision a VM for Cloudera Director, join your VMs to an AD domain, create DNS entries in AD DNS server, and provision a Kerberized cluster with Cloudera Director using configuration files.

Published at DZone with permission of James Morantus, DZone MVB. See the original article here.