Deploy a Secure Enterprise Data Hub on Microsoft Azure (Part 2)
Using Cloudera Director, you can deploy a cluster on Azure. There's a fair amount of configuration involved, but once it's done, your secure data hub will be ready.
In part one of this series, we covered all the prerequisites needed to deploy a CDH cluster on the Microsoft Azure cloud platform. In part two, we will cover the resources required on the Azure platform and actually deploy a cluster with Cloudera Director.
Cloudera Director Use Case
Cloudera Director simplifies cluster creation and lessens the time to an operational cluster on the cloud. It’s a great tool for running POCs in your organization. It’s also ideal for transient workloads in the cloud, where the exact compute resources and requirements are unknown.
Microsoft Azure Portal
I recommend reading the Cloudera Enterprise Reference Architecture for Azure Deployments and the Cloudera Director Getting Started on Microsoft Azure for recommendations and best practices.
The “Getting Started” document includes instructions for all the configurations and information required for Cloudera Director to deploy a cluster on the Azure Portal.
First, you will need to gather four pieces of Azure credential information for Cloudera Director: Subscription ID, Tenant ID, Client ID, and Client Secret. Second, the following resources are also needed to configure Cloudera Director:
- Resource Group: morantus-rg
- Network Security Group (NSG): morantus-nsg
- Virtual Network (Vnet): morantus-vnet
- Availability Sets:
  - availmgmt: Cloudera Director.
  - availedge: Cloudera Manager.
  - availmgmt2: Cluster Management nodes.
  - availworker: Cluster DataNodes.
The values listed above are the names of the Azure resources I have configured in my Azure Portal. These resources are explained in full detail in the RA documentation.
As discussed in part one, in order to join the nodes in the cluster to Active Directory Domain Service and create/update DNS entries in DNS Manager, we need some configurations in place to support our deployment.
The following are required on the AD side:
- Properly configured DNS Server
- Privileged user account to join nodes to the AD domain and create/update DNS entries
- Cloudera Manager user account to create Kerberos Service Principal Names (SPNs)
As you may recall, the Azure Cloud service does not provide reverse DNS natively. You must configure your own DNS server, such as BIND or AD. Active Directory is what I am using for my DNS.
In addition to the DNS configurations, there are user accounts and OU setups that are also required to be configured in AD. If you need a refresher, you can jump right into this blog, which covers similar operations when deploying to AWS.
Similar to Cloudera Manager, Cloudera Director also relies on an RDBMS back end to store all the metadata for the cluster. In this deployment, I am using a preconfigured MySQL database for this purpose. For more on MySQL configuration, go to the Cloudera documentation.
Cloudera Director Virtual Machine
We are now ready to provision a VM to install Cloudera Director. Step-by-step instructions are available here. Pay close attention to the section where you specify a username for your VM. This account will be used to SSH to all nodes, and will also be the owner of resources provisioned by Cloudera Director to create the cluster.
In my deployment, I used “azuredirectoradmin” as the Azure portal privileged account for my VM and the cluster. Here are some key specs for my VM:
- VM size: D3
- OS Image: Cloudera CentOS 7.2
- Storage: Standard
- Public IP
In this deployment, I elected to assign a public IP for this VM. In an environment where a VPN or ExpressRoute connection is available, it’s not recommended to assign a public IP to any nodes in your cluster. We recommend that all master and worker nodes are placed in the same Network Security Group (NSG).
Configure VM to Install Director
In this section, we will configure the Cloudera Director VM to join the AD domain and create DNS entries for both forward and reverse DNS with FQDN resolution. These steps were explained in detail in part one of this series. Download, review, and run these scripts from GitHub.
Log in to your VM with the appropriate account and copy the downloaded files to the tmp directory. Switch to the “root” user to execute the director-addnsjoin.sh script, which also calls the OS-bootstrap.sh script. This separation is done on purpose to show which script accomplishes which function: director-addnsjoin.sh performs the AD join and the SSSD and Samba configurations, while OS-bootstrap.sh maintains the DNS records. There are two very important modifications to this script to allow secure DNS updates to the AD DNS Manager:
- Added to acquire a Kerberos ticket to update DNS entries in AD
    princ="host/$host.$domain"
    kinit -kt /etc/krb5.keytab "$princ"
- Added -g to initiate a secure update to AD DNS
    nsupdate -g "$nsupdatecmds"
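To make the DNS step more concrete, here is a minimal sketch of how the forward (A) and reverse (PTR) update commands handed to nsupdate might be assembled. This is not the content of OS-bootstrap.sh; the function names, the default TTL, and the way the values are passed in are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: assemble the forward (A) and reverse (PTR) updates for one host.
# Function names, the default TTL and the sample values are assumptions.

# Convert an IPv4 address to its reverse-lookup name,
# for example 10.0.1.5 -> 5.1.0.10.in-addr.arpa
reverse_name() {
  (
    # Subshell keeps the IFS change and the re-set positional parameters local.
    IFS=.
    set -- $1
    printf '%s.%s.%s.%s.in-addr.arpa\n' "$4" "$3" "$2" "$1"
  )
}

# Print nsupdate commands for host ($1), domain ($2), IP ($3) and optional TTL ($4).
build_nsupdate_cmds() {
  (
    fqdn="$1.$2"
    ip=$3
    ttl=${4:-3600}
    ptr=$(reverse_name "$ip")
    # Forward and reverse records live in different zones, so each gets its own "send".
    cat <<EOF
update delete $fqdn. A
update add $fqdn. $ttl A $ip
send
update delete $ptr. PTR
update add $ptr. $ttl PTR $fqdn.
send
EOF
  )
}
```

The output could be written to a file, for example `build_nsupdate_cmds "$host" "$domain" "$ip" > /tmp/nsupdate.cmds`, and that file passed to `nsupdate -g` after the `kinit` call has obtained a ticket from the host keytab.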
A successful run should look like below:
Java 8 and Cloudera Director Installation
Cloudera Director installs Java 7 by default. We will deploy this cluster using Java 8 instead. The following steps must be completed as the root user or a user with sudo privileges.
Install the Java 8 JDK and the JCE policy files for encryption. Oracle requires that you acknowledge you have read and accepted the Oracle license terms.
wget --no-cookies --no-check-certificate --header "Cookie: oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u60-b27/jdk-8u60-linux-x64.rpm" -O /tmp/jdk-8u60-linux-x64.rpm
yum -y install /tmp/jdk-8u60-linux-x64.rpm
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip -O /tmp/jce_policy-8.zip
unzip /tmp/jce_policy-8.zip -d /tmp/
unalias cp
cp -f /tmp/UnlimitedJCEPolicyJDK8/*.jar /usr/java/*/jre/lib/security/
Download and install Cloudera Director:
wget "http://archive.cloudera.com/director/redhat/7/x86_64/director/cloudera-director.repo" -O /etc/yum.repos.d/cloudera-director.repo
yum clean all
yum -y install cloudera-director-server cloudera-director-client
Configure Director to use your existing MySQL database. Modify these values to your own environment:
vi /etc/cloudera-director-server/application.properties
lp.database.type: mysql
lp.database.username: director
lp.database.password: director
lp.database.host: morantus-blog.cloudera.morantus.com
lp.database.port: 3306
lp.database.name: director
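If you would rather script this edit than make it in vi, the following is a minimal sketch. It assumes the file uses `key: value` lines as shown above, and that an existing entry for the same key (commented out or not) should be replaced. `set_prop` is a hypothetical helper written for this example, not something shipped with Cloudera Director.

```shell
#!/bin/sh
# Sketch only: set a "key: value" line in a properties file without opening an editor.
# set_prop is a hypothetical helper, not part of Cloudera Director.

set_prop() {
  (
    file=$1
    key=$2
    value=$3
    # Escape dots so a key such as lp.database.type is matched literally.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    tmp=$(mktemp) || exit 1
    # Keep every line except an existing (possibly commented-out) entry for this key.
    grep -v -E "^[[:space:]]*#?[[:space:]]*${key_re}[[:space:]]*[:=]" "$file" > "$tmp" || true
    # Append the new value at the end of the file.
    printf '%s: %s\n' "$key" "$value" >> "$tmp"
    # Write back through the original path so the file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
  )
}
```

A call such as `set_prop /etc/cloudera-director-server/application.properties lp.database.type mysql`, run as root and repeated for each of the six keys, would produce the same entries as the manual edit. Back up the file first, since the helper rewrites it.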
Start Cloudera Director:
service cloudera-director-server start
If you see any errors, view the log file located at /var/log/cloudera-director-server/application.log
Log in to the Director UI at http://director_publicIP:7189 using admin/admin as credentials, and accept the license agreement.
Note: after login, it’s a good idea to change the default password by clicking the dropdown next to the ‘admin’ username in the upper-right corner of the screen.
Deploy Cluster With Cloudera Director
The remaining steps must be executed as the Cloudera Director admin user you created earlier. In my case, that’s the “azuredirectoradmin” account. All resources created by Cloudera Director in the Azure Portal will be owned by this account. The “root” user is not allowed to create resources on the Azure Portal.
First, we’ll need to create an SSH key as the “azuredirectoradmin” user on the VM where Cloudera Director is installed. This key will be added to our deployment configuration file, and Cloudera Director will add it to all the VMs it provisions. This will allow us to use passwordless SSH to the cluster nodes with this key.
Create the SSH key (do not enter a passphrase, keep all defaults):
ssh-keygen -f ~/.ssh/director_azure_vm_key -t rsa
Configure the Cloudera Director Configuration File
Download and inspect the configuration file blog.azure.conf from GitHub, which we will use to create the cluster with Cloudera Director. There are a few sections I want to point out.
1. Cloud Provider
Here you specify your Azure credentials information such as subscriptionID, tenantID, clientID, and Client Secret. Consult this Azure resource and/or your Azure Portal administrator for this information.
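Before pasting these values into the configuration file, a quick sanity check can catch copy-and-paste mistakes. The sketch below relies on the fact that Azure subscription, tenant, and client (application) IDs are GUIDs, while the client secret is an opaque string. The environment variable names and function names are assumptions for this example; they are not read by Cloudera Director.

```shell
#!/bin/sh
# Sketch only: sanity-check the four Azure credential values before using them.
# The AZURE_* variable names and both function names are assumptions for this example.

# Succeeds when $1 looks like a GUID (8-4-4-4-12 hexadecimal digits).
is_guid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Returns 0 when all four values look plausible, 1 otherwise.
check_azure_credentials() {
  rc=0
  is_guid "${AZURE_SUBSCRIPTION_ID:-}" || { echo "subscription ID is missing or not a GUID" >&2; rc=1; }
  is_guid "${AZURE_TENANT_ID:-}" || { echo "tenant ID is missing or not a GUID" >&2; rc=1; }
  is_guid "${AZURE_CLIENT_ID:-}" || { echo "client ID is missing or not a GUID" >&2; rc=1; }
  [ -n "${AZURE_CLIENT_SECRET:-}" ] || { echo "client secret is empty" >&2; rc=1; }
  return "$rc"
}
```

This only checks the shape of the values; it does not contact Azure, so a well-formed but wrong ID will still pass.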
2. SSH Login Key
Add the SSH key created earlier for passwordless login to the cluster.
3. Instance Templates
Define the VM profiles to use for each node type such as management, worker, and edge nodes. This is also the section where you specify all the resources created earlier in the Azure Portal like the ResourceGroup, Network Security Group, etc.
In this section, we combine the same scripts used to prepare the Cloudera Director VM earlier. We also join each VM created by Cloudera Director to the AD domain and create forward and reverse DNS entries for FQDN resolution. Finally, we manually install Java 8 on all the nodes.
Note: I am installing the MySQL JDBC Driver on all the nodes as well. The driver is required for all services that are backed by my preconfigured MySQL database. There’s an option to have Cloudera Director automatically create databases for you. As part of that process, it will install a version of the MySQL JDBC Driver for you. I like to control this process, so I do this installation manually.
Specify the database instance information where Cloudera Director would automatically create the databases for you on the fly.
In this section, you specify all the configurations for Cloudera Manager. Since this is a secured deployment with authentication enabled, we will define our username and password to connect to Active Directory to create the Kerberos Service Principals to secure our cluster. This step will take care of the integration with the Active Directory Kerberos service.
Instruct Cloudera Manager to NOT install Java 7:
Connect to AD:
krbAdminUsername: "cloudera-scm@CLOUDERA.MORANTUS.COM"
krbAdminPassword: "PASSWORD"
Instruct Cloudera Manager to install the JCE policy files on the cluster.
Note: This should only be enabled if allowed in your country or jurisdiction.
Active Directory Details:
KDC_TYPE: "Active Directory"
KDC_HOST: "morantusad.cloudera.morantus.com"
SECURITY_REALM: "CLOUDERA.MORANTUS.COM"
KRB_MANAGE_KRB5_CONF: true
KRB_ENC_TYPES: "aes256-cts aes128-cts rc4-hmac"
AD_ACCOUNT_PREFIX: "cdh_"
AD_KDC_DOMAIN: "ou=serviceaccounts,ou=prod,ou=clusters,ou=hadoop,dc=CLOUDERA,dc=MORANTUS,dc=COM"
Here we specify all the configurations for the cluster: the services to install, whether to enable HDFS High Availability, database details for dependent services, and so on. There are more example configuration files on the Cloudera Director GitHub page.
At this point, you are ready to create your cluster with Cloudera Director using the configuration file. Use the default username and password “admin/admin”.
Create the cluster:
cloudera-director bootstrap-remote /home/azuredirectoradmin/config/blog.azure.conf --lp.remote.username=admin --lp.remote.password=admin --lp.remote.hostAndPort=localhost:7189
If you run into an error and need to restart the deployment, or if you want to delete your cluster, you can terminate it by running:
cloudera-director terminate-remote /home/azuredirectoradmin/config/blog.azure.conf --lp.remote.username=admin --lp.remote.password=admin --lp.remote.hostAndPort=localhost:7189
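The bootstrap and terminate commands differ only in the subcommand, so the shared arguments can be kept in one place. The following is a minimal sketch that prints either command line for review before you run it. `director_cmd` and the `DIRECTOR_*` variables are hypothetical names for this example; the flags and default values are the ones used in this post.

```shell
#!/bin/sh
# Sketch only: print the bootstrap-remote or terminate-remote command used in this post.
# director_cmd and the DIRECTOR_* variables are hypothetical names for this example.

DIRECTOR_CONF=${DIRECTOR_CONF:-/home/azuredirectoradmin/config/blog.azure.conf}
DIRECTOR_USER=${DIRECTOR_USER:-admin}
DIRECTOR_PASS=${DIRECTOR_PASS:-admin}
DIRECTOR_ADDR=${DIRECTOR_ADDR:-localhost:7189}

# Usage: director_cmd bootstrap|terminate
director_cmd() {
  case $1 in
    bootstrap|terminate) ;;
    *) echo "usage: director_cmd bootstrap|terminate" >&2; return 2 ;;
  esac
  # Only the subcommand changes; the config file and server flags are shared.
  printf 'cloudera-director %s-remote %s --lp.remote.username=%s --lp.remote.password=%s --lp.remote.hostAndPort=%s\n' \
    "$1" "$DIRECTOR_CONF" "$DIRECTOR_USER" "$DIRECTOR_PASS" "$DIRECTOR_ADDR"
}
```

Running `director_cmd terminate` prints the terminate command shown above. Once you have changed the default admin password, set `DIRECTOR_PASS` accordingly, and keep in mind that the printed line contains the password in clear text.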
A successful deployment will look like the following:
Azure Portal with all nodes
Cloudera Director dashboard
Cloudera Manager dashboard
Verify Kerberos is enabled
You should now be able to configure the Azure Portal, provision a VM for Cloudera Director, join your VMs to an AD domain, create DNS entries in AD DNS server, and provision a Kerberized cluster with Cloudera Director using configuration files.
Published at DZone with permission of James Morantus, DZone MVB. See the original article here.