
Apache Ranger and AWS EMR Automated Installation and Integration Series (5): Windows AD + Open-Source Ranger

This final article of the five-part series covers the last high-applicability scenario, Scenario 4: Windows AD + Open-Source Ranger.

By Laurence Geng · Dec. 16, 22 · Tutorial

Hopefully, you have enjoyed the previous four articles in this series. In this final article, we introduce the last high-applicability scenario: “Windows AD + Open-Source Ranger.”

1. Windows AD + Open-Source Ranger Solution Overview

1.1 Solution Architecture

Solution Architecture

In this solution, Windows AD acts as the authentication provider and stores all user account data, while Ranger acts as the authorization controller: it syncs account data from Windows AD so that privileges can be granted to those user accounts. Meanwhile, a series of Ranger plugins must be installed on the EMR cluster. These plugins check with the Ranger server to ensure the current user has permission to perform an action. The EMR cluster also syncs account data from Windows AD via SSSD so that users can log in to the nodes of the EMR cluster and submit jobs. End users can log in to the nodes of the EMR cluster via SSH with their Windows AD accounts. If Hue is available, they can also log in to Hue with the same account.

1.2 Ranger in Detail

Let’s deep dive into Ranger for more details; its architecture looks as follows:

Ranger Architecture

The installer will finish the following jobs:

  1. Install MySQL as the Policy DB for Ranger.
  2. Install Solr as the Audit Store for Ranger.
  3. Install Ranger Admin.
  4. Install Ranger UserSync.
  5. Install the HDFS Ranger Plugin.
  6. Install the Hive Ranger Plugin.

2. Installation and Integration

Generally, the installation and integration process can be divided into three stages: 

  1. Prerequisites
  2. All-In-One Install
  3. Create the EMR Cluster.

The following diagram illustrates the progress in detail:

Progress in Detail

At stage 1, we need to do some preparatory work. At stage 2, we start to install and integrate. There are two options at this stage: one is an all-in-one installation driven by a command-line-based workflow; the other is a step-by-step installation. In most cases, the all-in-one installation is the best choice; however, your installation workflow may be interrupted by unforeseen errors. If you want to continue installing from the last failed step, please try the step-by-step installation. If you want to retry a step with different argument values to find the right one, step-by-step is also the better choice. At stage 3, we need to create an EMR cluster. If you already have one, skip this job. In most cases, Ranger needs to be installed on an existing cluster rather than a new one. EMR-native Ranger cannot be installed on an existing cluster (because EMR-native Ranger plugins can only be installed when the cluster is created), but open-source Ranger does not have this limitation, so you are free to install it on an existing or a new EMR cluster.

There is a small overlap in the execution sequence between stages 2 and 3. At step 2.4, the installation pauses, and the installer prompts users to create their own cluster while it keeps monitoring the target cluster’s status. Once the cluster is ready, the installation resumes and performs the remaining actions.

As a design principle, the installer does not include any actions that create an EMR cluster. You should always create the cluster yourself because an EMR cluster can have unpredictable settings, e.g., application-specific (HDFS, YARN, etc.) configuration, step scripts, bootstrap scripts, and so on. Coupling Ranger’s installation with the EMR cluster’s creation is inadvisable.

Notes:

  1. The installer treats the local host as the Ranger server and installs all Ranger server components on it. For non-Ranger operations, e.g., installing plugins on the EMR cluster, it initiates remote operations via SSH. So, you can stay on the Ranger server to execute all command lines; there is no need to switch among multiple hosts.
  2. Although it is not required, we suggest always using the FQDN as the host address. Neither IP addresses nor hostnames without domain names are recommended.
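
The FQDN recommendation in the notes above can be checked before passing a host address to the installer. The following is a minimal sketch; the function name is_fqdn is hypothetical and not part of the installer, and the check is a simple heuristic (contains a dot and is not a bare IPv4 address):

```shell
#!/bin/sh
# Hypothetical helper (not part of the installer).
# Return 0 if the argument looks like an FQDN:
# it contains a dot and is not a dotted-decimal IPv4 address.
is_fqdn() (
    name="$1"
    case "$name" in
        ''|*[!A-Za-z0-9.-]*) return 1 ;;   # empty or unexpected characters
        *.*) ;;                             # contains at least one dot
        *) return 1 ;;                      # short hostname without a domain
    esac
    # Reject addresses such as 10.0.14.0 (digits and dots only)
    case "$name" in
        *[!0-9.]*) return 0 ;;
        *) return 1 ;;
    esac
)

# Example: warn when the local hostname is not fully qualified
# is_fqdn "$(hostname -f)" || echo "warning: hostname is not an FQDN"
```

For example, ip-10-0-14-0.cn-north-1.compute.internal passes the check, while ip-10-0-14-0 and 10.0.14.0 do not.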

2.1 Prerequisites

2.1.1 VPC Constraints

To integrate with Windows AD, EMR cluster nodes need to join the Windows domain (realm), which imposes a series of constraints on the VPC. Before installing, please ensure the hostname of each EC2 instance is no more than fifteen characters. This is a limitation of Windows AD; however, as AWS assigns DNS hostnames based on the IPv4 address, the limitation propagates to the VPC. If the VPC’s CIDR ensures that the IPv4 address has no more than nine digits, the assigned DNS hostname (for example, ip-10-0-14-0) stays within fifteen characters. Given this limitation, a recommended CIDR setting for the VPC is 10.0.0.0/16.
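
As a quick sanity check of the fifteen-character limit, the sketch below derives the short hostname that would be assigned for a given private IPv4 address and tests its length. The function names are hypothetical, and the ip-a-b-c-d naming form is assumed from the hostnames shown in this article:

```shell
#!/bin/sh
# Hypothetical helpers (not part of the installer).

# Build the short hostname (ip-a-b-c-d) from a private IPv4 address.
ec2_short_hostname() {
    printf 'ip-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Return 0 if the short hostname is no longer than 15 characters.
hostname_fits_ad_limit() (
    host=$(ec2_short_hostname "$1")
    [ "${#host}" -le 15 ]
)

# Example:
# hostname_fits_ad_limit 10.0.255.255 && echo "ok"
# hostname_fits_ad_limit 192.168.100.200 || echo "too long"
```

With a 10.0.0.0/16 network, even the longest address (10.0.255.255) yields a fifteen-character hostname, whereas an address such as 192.168.100.200 yields eighteen characters.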

Although we can change the default hostname after the EC2 instances are available, the hostname is used when the computers join the Windows AD directory, which happens during the creation of the EMR cluster, so modifying the hostname afterward does not work. (A possible workaround is to put the hostname-modifying actions into bootstrap scripts, but we haven’t tried it.) To change the hostname, please refer to the AWS documentation titled “Change the hostname of your Amazon Linux instance.”

2.1.2 Create Windows AD Server

First, we need to create a Windows AD server with PowerShell scripts. Create an EC2 instance with the Windows Server 2019 Base image (2016 is also tested and supported). Next, log in with an Administrator account, download the Windows AD installation script file (ad.ps1) from here, and save it to your desktop.

Next, press “Win + R” to open a run dialog, copy the following command line, and replace the parameter values with your own settings:

PowerShell
 
Powershell.exe -NoExit -ExecutionPolicy Bypass -File %USERPROFILE%\Desktop\ad.ps1 -DomainName <replace-with-your-domain> -Password <replace-with-your-password> -TrustedRealm <replace-with-your-realm>


The ad.ps1 has pre-defined default parameter values. The domain name is example.com, the password is Admin1234!, and the trusted realm is COMPUTE.INTERNAL. As a quick start, you can right-click the ad.ps1 file and select Run with PowerShell to execute it. 

Note: In us-east-1, you cannot run the script by right-clicking and selecting “Run with PowerShell” because the trusted realm in that region is EC2.INTERNAL, not the script’s default COMPUTE.INTERNAL. Set -TrustedRealm EC2.INTERNAL explicitly via the above command line instead.

After the script is executed, the computer will ask to restart. This is forced by Windows. Wait for the computer to restart, and then log in again as an Administrator so that the subsequent commands in the script file continue executing. Be sure to log in again; otherwise, part of the script will never run.

After logging in, we can open “Active Directory Users and Computers” from Start Menu -> Windows Administrative Tools -> Active Directory Users and Computers, or enter dsa.msc in the “Run” dialog, to check the created AD. If everything goes well, we will get the following AD directory:

AD Directory

Next, we need to check the DNS setting. An invalid DNS setting will result in installation failure. A common error when running scripts is “Ranger Server can't solve DNS of Cluster Nodes.” This problem is usually caused by an incorrect DNS forwarder setting. We can open the “DNS Manager” from Start Menu -> Windows Administrative Tools -> DNS, or enter dnsmgmt.msc from the “Run” dialog. Next, open the “Forwarders” tab. Normally, there is a record, and the IP address should be 10.0.0.2:

Forwarders Tab

10.0.0.2 is the default DNS server address for the 10.0.0.0/16 network in the VPC. According to the VPC documentation:

The Amazon DNS server does not reside within a specific subnet or Availability Zone in a VPC. It’s located at the address 169.254.169.253 (and the reserved IP address at the base of the VPC IPv4 network range, plus two) and fd00:ec2::253. For example, the Amazon DNS Server on a 10.0.0.0/16 network is located at 10.0.0.2. For VPCs with multiple IPv4 CIDR blocks, the DNS server IP address is located in the primary CIDR block.

The forwarder’s IP address usually comes from the “Domain name servers” setting of your VPC’s “DHCP Options Set,” whose default value is AmazonProvidedDNS. If that setting had been changed before the Windows AD server was created, the forwarder’s IP will be the changed value. This typically happens when Windows AD is re-installed in the same VPC: if you did not restore “Domain name servers” to AmazonProvidedDNS before the re-install, the forwarder’s IP will be the address of the previous Windows AD server, which may no longer exist. That is why the Ranger server or cluster nodes can’t resolve DNS names. We can simply change the forwarder IP to the default value, i.e., 10.0.0.2 in a 10.0.0.0/16 network.
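
To work out the expected forwarder address for a VPC other than 10.0.0.0/16, the “base of the network range, plus two” rule quoted above can be computed directly. This is a minimal sketch with a hypothetical function name; it assumes the argument is the VPC’s primary IPv4 CIDR block:

```shell
#!/bin/sh
# Hypothetical helper (not part of the installer).
# Print the Amazon-provided DNS address for a VPC CIDR:
# the network base address plus two.
vpc_dns_ip() (
    cidr="$1"
    ip=${cidr%/*}
    prefix=${cidr#*/}
    # Split the dotted address into four octets
    old_ifs=$IFS; IFS=.
    set -- $ip
    IFS=$old_ifs
    addr=$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
    # Build the netmask from the prefix length and clear the host bits
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    dns=$(( ($addr & $mask) + 2 ))
    printf '%d.%d.%d.%d\n' \
        $(( ($dns >> 24) & 255 )) $(( ($dns >> 16) & 255 )) \
        $(( ($dns >> 8) & 255 )) $(( $dns & 255 ))
)

# Example:
# vpc_dns_ip 10.0.0.0/16
```

For the 10.0.0.0/16 network used in this article, the function prints 10.0.0.2, which matches the forwarder record shown above.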

The other DNS-related configuration is the IPv4 DNS setting. Usually, its default setting is fine; it is shown here as a reference (in the cn-north-1 region):

DNS Setting

2.1.3 Create DHCP Options Set and Attach To VPC

Joining the Windows domain (realm) requires that nodes in the VPC can reach one another over the network and resolve each other’s domain names. So, the Windows AD server must be set as the DNS server in the “DHCP Options Sets” of the VPC. The following commands complete this job (run them on a Linux host that has the AWS CLI installed):

Shell
 
# run on a host which has installed aws cli
export REGION='<change-to-your-region>'
export VPC_ID='<change-to-your-vpc-id>'
export DNS_IP='<change-to-your-dns-ip>'

# resolve the domain name based on the region
if [ "$REGION" = "us-east-1" ]; then
    export DOMAIN_NAME="ec2.internal"
else
    export DOMAIN_NAME="$REGION.compute.internal"
fi

# create the dhcp options set and capture its id
dhcpOptionsId=$(aws ec2 create-dhcp-options \
    --region "$REGION" \
    --dhcp-configurations '{"Key":"domain-name","Values":["'"$DOMAIN_NAME"'"]}' '{"Key":"domain-name-servers","Values":["'"$DNS_IP"'"]}' \
    --tag-specifications "ResourceType=dhcp-options,Tags=[{Key=Name,Value=WIN_DNS}]" \
    --no-cli-pager \
    --query 'DhcpOptions.DhcpOptionsId' \
    --output text)

# attach the dhcp options set to the target vpc (same region as above)
aws ec2 associate-dhcp-options \
    --region "$REGION" \
    --dhcp-options-id "$dhcpOptionsId" \
    --vpc-id "$VPC_ID"


The following is a snapshot of the created DHCP options from the AWS web console:

DHCP Snapshot

The “Domain name” cn-north-1.compute.internal will be the domain-name part of the long hostname (FQDN). For the us-east-1 region, specify ec2.internal; for other regions, specify <region>.compute.internal.

Note: Do not set it to the domain name of Windows AD, i.e., example.com. In our example, they are two different things; if they are the same, joining the realm will fail. The “Domain name server” 10.0.13.40 is the private IP of the Windows AD server. The following is a snapshot of the VPC that is attached to this DHCP options set:

VPC Snapshot
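
Instead of checking the web console, the association can also be confirmed from the command line. The sketch below wraps `aws ec2 describe-vpcs` in a hypothetical helper; the function name and argument order are assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical helper (not part of the installer).
# Return 0 if the VPC is associated with the expected DHCP options set.
# Arguments: vpc id, expected dhcp options id, region.
vpc_uses_dhcp_options() (
    vpc_id="$1"; expected="$2"; region="$3"
    actual=$(aws ec2 describe-vpcs \
        --region "$region" \
        --vpc-ids "$vpc_id" \
        --query 'Vpcs[0].DhcpOptionsId' \
        --output text) || return 1
    [ "$actual" = "$expected" ]
)

# Example, reusing the variables from the commands above:
# vpc_uses_dhcp_options "$VPC_ID" "$dhcpOptionsId" "$REGION" && echo "attached"
```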

2.1.4 Create EC2 Instances as a Ranger Server

Next, we need to prepare an EC2 instance as the Ranger server. When creating this instance, please select the Amazon Linux 2 image and make sure that the instances and the cluster to be created can reach one another over the network.

As a best practice, it’s recommended to add the Ranger server to the ElasticMapReduce-master security group because Ranger works closely with the EMR cluster; it can be regarded as a master service that is not built into EMR. For Windows AD, we have to make sure its port 389 is reachable from the Ranger server and all nodes of the EMR cluster to be created. You can also add the Windows AD server to the ElasticMapReduce-master security group.
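
Whether port 389 is reachable can be tested from the Ranger server before starting the installation. The following sketch is one possible way to do it; the function name is hypothetical, and it assumes that bash and the coreutils timeout command are available on the host:

```shell
#!/bin/sh
# Hypothetical helper (not part of the installer).
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and on coreutils' timeout.
tcp_port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example, assuming AD_HOST holds the FQDN of the Windows AD server:
# tcp_port_open "$AD_HOST" 389 || echo "port 389 is not reachable"
```

A failure here usually points at a security group or network ACL rule rather than at Windows AD itself, but that is only a rule of thumb.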

2.1.5 Download Installer

After the EC2 instances are ready, pick the Ranger server, log in via SSH, and run the following commands to download the installer package:

Shell
 
sudo yum -y install git
git clone https://github.com/bluishglc/ranger-emr-cli-installer.git


2.1.6 Upload SSH Key File

As mentioned before, the installer runs on the local host (the Ranger server). To perform remote installation actions on the EMR cluster, it needs an SSH private key, so upload the key file to the Ranger server and make a note of its path; it will be the value of the variable SSH_KEY.
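
SSH clients generally refuse private keys that are readable by other users, so it can be worth checking the uploaded file’s permissions. This is a minimal sketch with a hypothetical function name; it assumes GNU stat, as found on Amazon Linux 2:

```shell
#!/bin/sh
# Hypothetical helper (not part of the installer).
# Return 0 if the file exists and grants no permissions to group or others
# (for example, mode 600 or 400).
ssh_key_is_private() (
    key="$1"
    [ -f "$key" ] || return 1
    mode=$(stat -c '%a' "$key") || return 1
    case "$mode" in
        [0-7]00) return 0 ;;
        *) return 1 ;;
    esac
)

# Example:
# ssh_key_is_private /home/ec2-user/key.pem || chmod 600 /home/ec2-user/key.pem
```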

2.1.7 Export Environment-Specific Variables

During the installation, the following environment-specific arguments will be passed more than once; it’s recommended to export them first, and then all the command lines will refer to these variables instead of literals.

Shell
 
export REGION='TO_BE_REPLACED'
export ACCESS_KEY_ID='TO_BE_REPLACED'
export SECRET_ACCESS_KEY='TO_BE_REPLACED'
export SSH_KEY='TO_BE_REPLACED'
export AD_HOST='TO_BE_REPLACED'


The variables are described below:

  • REGION: the AWS Region, e.g., cn-north-1, us-east-1, and so on.
  • ACCESS_KEY_ID: the AWS access key ID of your IAM account. Be sure your account has sufficient privileges; admin permissions are preferable.
  • SECRET_ACCESS_KEY: the AWS secret access key of your IAM account.
  • SSH_KEY: the path of the SSH private key file you just uploaded to the local host.
  • AD_HOST: the FQDN of the Windows AD server.

Please carefully replace the values of the above variables according to your environment, and remember to use the FQDN for hostnames such as AD_HOST. The following is an example:

Shell
 
export REGION='cn-north-1'
export ACCESS_KEY_ID='<change-to-your-aws-access-key-id>'
export SECRET_ACCESS_KEY='<change-to-your-aws-secret-access-key>'
export SSH_KEY='/home/ec2-user/key.pem'
export AD_HOST='ip-10-0-14-0.cn-north-1.compute.internal'
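
Because these variables are reused by every later command, a forgotten placeholder tends to surface only as a confusing failure much later. The sketch below is one way to catch that early; check_vars is a hypothetical helper, and the placeholder patterns it rejects (TO_BE_REPLACED and <...>) are taken from the examples in this article:

```shell
#!/bin/sh
# Hypothetical helper (not part of the installer).
# Return 0 only if every named variable is set, non-empty,
# and not one of the placeholder values used in this article.
check_vars() (
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        case "$value" in
            ''|TO_BE_REPLACED|'<'*'>')
                echo "Variable $name is not set properly" >&2
                missing=1 ;;
        esac
    done
    return "$missing"
)

# Example:
# check_vars REGION ACCESS_KEY_ID SECRET_ACCESS_KEY SSH_KEY AD_HOST || exit 1
```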


2.2 All-In-One Installation

2.2.1 Quick Start

Now, let’s start an all-in-one installation. Execute this command line:

Shell
 
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'ad' \
    --ad-host "$AD_HOST" \
    --ad-domain 'example.com' \
    --ad-domain-admin 'domain-admin' \
    --ad-domain-admin-password 'Admin1234!' \
    --ad-base-dn 'cn=users,dc=example,dc=com' \
    --ad-user-object-class 'person' \
    --ranger-plugins 'open-source-hdfs,open-source-hive'


For the parameter specification of the above command line, please refer to the appendix. Two options deserve a highlight: --ad-domain-admin and --ad-domain-admin-password. They only appear in the “Windows AD + Open-Source Ranger” solution, where they are needed to join the realm.

If everything goes well, the command line will execute steps 2.1 to 2.3 in the workflow diagram. This may take ten minutes or more, depending on the bandwidth of your network. Next, it will pause and prompt the user to enter the EMR cluster ID. If the target cluster already exists, we can fill in its ID immediately. If not, we should switch to the EMR web console to create it. Next, the command line asks the user to confirm whether Hue should be integrated with LDAP. If so, when the cluster is ready, the installer will update the EMR configuration with Hue-specific settings (this action will overwrite the existing EMR configuration).

Fill in the above two items, and enter “y” to confirm all inputs. The installation process will resume, and if the target EMR cluster is not ready yet, the command line will keep monitoring it until it enters the “WAITING” status. The following is a snapshot of the command line at this moment:

Command Line Snapshot

When the cluster is ready (its status is “WAITING”), the command line will continue to execute steps 2.4 to 2.6 of the workflow and end with an “ALL DONE!!” message.

2.2.2 Customization

Now that the all-in-one installation is done, let’s look at customization. Generally, this installer follows the principle of “Convention over Configuration.” Most parameters are preset with default values. An equivalent version of the above command line with the full parameter list is as follows:

Shell
 
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'ad' \
    --ad-host "$AD_HOST" \
    --ad-domain 'example.com' \
    --ad-domain-admin 'domain-admin' \
    --ad-domain-admin-password 'Admin1234!' \
    --ad-base-dn 'cn=users,dc=example,dc=com' \
    --ad-user-object-class 'person' \
    --ranger-plugins 'open-source-hdfs,open-source-hive' \
    --java-home '/usr/lib/jvm/java' \
    --skip-install-mysql 'false' \
    --skip-install-solr 'false' \
    --skip-configure-hue 'false' \
    --ranger-host $(hostname -f) \
    --ranger-version '2.1.0' \
    --mysql-host $(hostname -f) \
    --mysql-root-password 'Admin1234!' \
    --mysql-ranger-db-user-password 'Admin1234!' \
    --solr-host $(hostname -f) \
    --ranger-bind-dn 'cn=ranger,ou=services,dc=example,dc=com' \
    --ranger-bind-password 'Admin1234!' \
    --hue-bind-dn 'cn=hue,ou=services,dc=example,dc=com' \
    --hue-bind-password 'Admin1234!' \
    --restart-interval 30


The full-parameters version gives us a complete perspective of all custom options. In the following scenarios, you may change some of the options’ values:

  1. If you want to change the default organization name dc=example,dc=com or the default password Admin1234!, please run the full-parameters version and replace them with your own values.
  2. If you need to integrate with external facilities, e.g., an existing MySQL or Solr, please add the corresponding --skip-xxx-xxx options and set them to true.
  3. If you have other pre-defined Bind DNs for Hue, Ranger, and SSSD, please add the corresponding --xxx-bind-dn and --xxx-bind-password options to set them. Note: The Bind DNs for Hue, Ranger, and the domain-admin are created automatically when installing Windows AD, but they follow a fixed naming pattern, cn=hue|ranger|domain-admin,ou=services,<your-base-dn>, not the value given to the “--xxx-bind-dn” option. So, if you assign another DN with the “--xxx-bind-dn” option, you must create this DN yourself in advance. The installer does not create the DN assigned by the “--xxx-bind-dn” option because a DN is a tree path: to create it, every node in the path must be created, and implementing such a small but complicated function is not cost-effective.
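
To see what the fixed naming pattern produces for a domain other than example.com, the default Bind DNs can be derived from the domain name. This is a sketch under an assumption: it takes <your-base-dn> in the pattern to be the DN form of the AD domain (dc=example,dc=com for example.com), as the example values in this article suggest. Both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers (not part of the installer).

# Convert an AD domain name (example.com) to its DN form (dc=example,dc=com).
domain_to_dn() {
    printf 'dc=%s\n' "$(printf '%s' "$1" | sed 's/\./,dc=/g')"
}

# Build the default bind DN of a service account:
# cn=<service>,ou=services,<domain-dn>
default_bind_dn() {
    printf 'cn=%s,ou=services,%s\n' "$1" "$(domain_to_dn "$2")"
}

# Example:
# default_bind_dn ranger example.com
# default_bind_dn hue example.com
```

For example.com, the two example calls yield the same values passed to --ranger-bind-dn and --hue-bind-dn in the full-parameters command line above.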

2.3 Step-By-Step Installation

As an alternative, you can select the step-by-step installation instead of the all-in-one installation. The command line for each step is given below; for comments on each parameter, please refer to the appendix.

2.3.1 Init EC2

This step finishes some fundamental jobs, e.g., installing the AWS CLI, the JDK, and so on.

Shell
 
sudo sh ./ranger-emr-cli-installer/bin/setup.sh init-ec2 \
    --region "$REGION" \
    --access-key-id "$ACCESS_KEY_ID" \
    --secret-access-key "$SECRET_ACCESS_KEY"


2.3.2 Install Ranger

This step will install all server-side components of Ranger, including MySQL, Solr, Ranger Admin, and Ranger UserSync.

Shell
 
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-ranger \
    --region "$REGION" \
    --solution 'open-source' \
    --auth-provider 'ad' \
    --ad-domain 'example.com' \
    --ad-host "$AD_HOST" \
    --ad-base-dn 'cn=users,dc=example,dc=com' \
    --ad-user-object-class 'person' \
    --ranger-bind-dn 'cn=ranger,ou=services,dc=example,dc=com' \
    --ranger-bind-password 'Admin1234!'


2.3.3 Create EMR Cluster

For step-by-step installation, there is no interactive process for creating the EMR cluster, so feel free to create the cluster on the EMR web console. We have to wait until the cluster is completely ready (in “WAITING” status), then export the following environment-specific variables:

Shell
 
export EMR_CLUSTER_ID='TO_BE_REPLACED'


The following is a copy of the example:

Shell
 
export EMR_CLUSTER_ID='j-2S04VJZ5YQHZ4'
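
Rather than refreshing the EMR web console, the wait for the “WAITING” status can be scripted. The sketch below polls `aws emr describe-cluster`; the function name, the argument order, and the default interval of 30 seconds are assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical helper (not part of the installer).
# Poll the cluster state until it reaches WAITING; give up on terminal states.
# Arguments: cluster id, region, polling interval in seconds (default 30).
wait_for_emr_cluster() (
    cluster_id="$1"; region="$2"; interval="${3:-30}"
    while :; do
        state=$(aws emr describe-cluster \
            --region "$region" \
            --cluster-id "$cluster_id" \
            --query 'Cluster.Status.State' \
            --output text) || return 1
        echo "Cluster $cluster_id is $state"
        case "$state" in
            WAITING) return 0 ;;
            TERMINATING|TERMINATED|TERMINATED_WITH_ERRORS) return 1 ;;
        esac
        sleep "$interval"
    done
)

# Example:
# wait_for_emr_cluster "$EMR_CLUSTER_ID" "$REGION" && echo "cluster is ready"
```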


2.3.4 Install Ranger Plugins

This step installs the HDFS and Hive plugins on both the Ranger server side and the agent side (EMR nodes). This differs from the EMR-native Ranger solution, in which EMR installs the agent side on each node automatically. For open-source Ranger, we have to do this job ourselves via this installer.

Shell
 
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-ranger-plugins \
    --region "$REGION" \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'ad' \
    --ranger-plugins 'open-source-hdfs,open-source-hive' \
    --emr-cluster-id "$EMR_CLUSTER_ID"


2.3.5 Install SSSD

This step installs and configures SSSD on each node of the EMR cluster. We don’t need to log in to each node; stay on the local host and run the command line, and it will operate on the remote nodes via SSH.

Shell
 
sudo sh ./ranger-emr-cli-installer/bin/setup.sh install-sssd \
    --ssh-key "$SSH_KEY" \
    --solution 'open-source' \
    --auth-provider 'ad' \
    --ad-host "$AD_HOST" \
    --ad-domain 'example.com' \
    --ad-domain-admin 'domain-admin' \
    --ad-domain-admin-password 'Admin1234!' \
    --emr-cluster-id "$EMR_CLUSTER_ID"
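
After this step, a simple way to confirm that SSSD is working is to check, on a cluster node, whether the operating system can resolve Windows AD accounts. The following is a minimal sketch with hypothetical function names; whether accounts must be written with a domain suffix depends on the SSSD configuration, which is not covered here:

```shell
#!/bin/sh
# Hypothetical helpers (not part of the installer).

# Return 0 if the operating system can resolve the given account name.
os_user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Print the accounts that cannot be resolved; return non-zero if any is missing.
check_os_users() (
    rc=0
    for user in "$@"; do
        if ! os_user_exists "$user"; then
            echo "not resolvable: $user"
            rc=1
        fi
    done
    return "$rc"
)

# Example, run on a node of the EMR cluster:
# check_os_users example-user-1
```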


2.3.6 Configure Hue

This step updates the Hue configuration of EMR. As highlighted in the all-in-one installation, this overwrites the existing EMR configuration. If you have other customized EMR configurations, please skip this step; you can still manually merge the Hue configuration JSON file generated by the command line into your own JSON.

Shell
 
sudo sh ./ranger-emr-cli-installer/bin/setup.sh configure-hue \
    --region "$REGION" \
    --solution 'open-source' \
    --auth-provider 'ad' \
    --ad-host "$AD_HOST" \
    --ad-domain 'example.com' \
    --ad-base-dn 'cn=users,dc=example,dc=com' \
    --ad-user-object-class 'person' \
    --hue-bind-dn 'cn=hue,ou=services,dc=example,dc=com' \
    --hue-bind-password 'Admin1234!' \
    --emr-cluster-id "$EMR_CLUSTER_ID"


3. Verification

After the installation and integration are completed, it’s time to see whether Ranger works. The verification jobs are divided into two parts: HDFS and Hive. First, let us log in to Windows AD via a client, e.g., LDAPAdmin or Apache Directory Studio. Next, check all the DNs; they should look as follows:

All DN

Next, open the Ranger web console. The address is http://<YOUR-RANGER-HOST>:6080, and the default admin account/password is admin/admin. After logging in, open the “Users/Groups/Roles” page first and check whether the example users on Windows AD have been synchronized to Ranger as follows:

Synched

Log in to the master node of the EMR cluster and export the cluster ID because the subsequent command lines need this variable.

Shell
 
# run on master node of emr cluster
export EMR_CLUSTER_ID='TO_BE_REPLACED'


The following is a copy of the example:

Shell
 
# run on master node of emr cluster
export EMR_CLUSTER_ID='j-2S04VJZ5YQHZ4'


3.1 HDFS Access Control Verification

Usually, there is a set of pre-defined policies for the HDFS plugin after installation, as follows:

Predefined Policies

We have not configured any HDFS permissions for example-user-1, but if we log in to Hue with the account example-user-1, we will see that it can browse most directories and files on HDFS. This is because most directories and files have a+w permission. Please keep in mind that the HDFS r/w/x file mode attributes and Ranger-based permissions always take effect at the same time.

To verify if the HDFS plugin works, we select the “blacklist” mode to test. First, let’s create a directory named /ranger-test on HDFS and set example-user-1 as its owner:

Shell
 
# run on master node of emr cluster
sudo -u hdfs hdfs dfs -mkdir /ranger-test
sudo -u hdfs hdfs dfs -chown example-user-1:example-group /ranger-test
sudo -u hdfs hdfs dfs -chmod 700 /ranger-test


Next, let’s add a deny policy that denies example-user-1 read and write access to the ranger-test directory:

Read and Write

Any policy changes on the Ranger web console will sync to the agent side (EMR cluster nodes) within thirty seconds. We can run the following commands on the master node to check if the local policy file is updated:

Shell
 
# run on master node of emr cluster
for i in {1..10}; do
    printf "\n%100s\n\n"|tr ' ' '='
    sudo stat /etc/ranger/HDFS_${EMR_CLUSTER_ID}/policycache/hdfs_HDFS_${EMR_CLUSTER_ID}.json
    sleep 3
done
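
The loop above prints the file status ten times and leaves the comparison to the reader. As an alternative, the sketch below blocks until the file’s modification time changes or a timeout expires. The function name is hypothetical, it assumes GNU stat, and it must run in a shell that is allowed to read the policy cache directory (the loop above uses sudo for that reason):

```shell
#!/bin/sh
# Hypothetical helper (not part of the installer).
# Wait until the file's modification time changes,
# or until the timeout (in seconds, default 60) expires.
wait_for_file_update() (
    file="$1"; timeout_s="${2:-60}"
    before=$(stat -c '%Y' "$file") || return 1
    elapsed=0
    while [ "$elapsed" -lt "$timeout_s" ]; do
        sleep 1
        elapsed=$((elapsed + 1))
        now=$(stat -c '%Y' "$file") || return 1
        if [ "$now" != "$before" ]; then
            return 0
        fi
    done
    return 1
)

# Example, started in a root shell right before saving the policy in the Ranger web console:
# wait_for_file_update "/etc/ranger/HDFS_${EMR_CLUSTER_ID}/policycache/hdfs_HDFS_${EMR_CLUSTER_ID}.json" 60
```

Note that the function only detects changes made after it starts, so it should be launched before the policy is saved.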


Once the local policy file is up to date, the deny policy becomes effective. Next, log into Hue with the Windows AD account “example-user-1” created by the installer, open “File Browser,” click root directory “/”, and then click the “ranger-test” folder. We will get an error message: “Cannot access:/ranger-test:”

Effective Policy

Even though the current user example-user-1 is the owner of this folder, it is still blocked by the Ranger HDFS plugin. This means that HDFS access control is managed by Ranger.

Finally, remember to remove the “ranger-test” policy so example-user-1 has full privileges to access this folder because the following Hive verification will re-use this folder.

3.2 Hive Access Control Verification

Usually, there is a set of pre-defined policies for the Hive plugin after installation. To eliminate interference and keep verification simple, let’s remove them first:

Remove

Any policy changes on the Ranger web console will sync to the agent side (EMR cluster nodes) within thirty seconds. We run the following commands on the master node to check whether the local policy file is updated:

Shell
 
# run on master node of emr cluster
for i in {1..10}; do
    printf "\n%100s\n\n"|tr ' ' '='
    sudo stat /etc/ranger/HIVE_${EMR_CLUSTER_ID}/policycache/hiveServer2_HIVE_${EMR_CLUSTER_ID}.json
    sleep 3
done


Once the local policy file is up to date, the removal of all policies takes effect. Next, log in to Hue with the Windows AD account “example-user-1” created by the installer, open the Hive editor, and enter the following SQL to create a test table (change ‘/ranger-test’ if you use a different location, such as your own bucket):

SQL
 
-- run in hue hive editor
create table ranger_test (
  id bigint
)
row format delimited
stored as textfile location '/ranger-test';


Next, run it, and an error occurs:

Error Occurs

It shows that example-user-1 is blocked by database-related permissions, which proves the Hive plugin is working. Next, go back to Ranger and add a Hive policy named “all - database, table, column” as follows:

Added Hive Policy

It grants example-user-1 all privileges on all databases, tables, and columns. Check the policy file again on the master node with the previous command line. Once it is updated, go back to Hue and re-run the SQL; it will succeed as follows:

Successful Run

To double-check if example-user-1 has full read and write permissions on the table, we can run the following SQL:

SQL
 
insert into ranger_test(id) values(1);
insert into ranger_test(id) values(2);
insert into ranger_test(id) values(3);
select * from ranger_test;


The execution result is:

Execution Result

At this point, the Hive access control verification has passed.

4. Appendix

The following is the parameter specification:

  • --region: The AWS region.
  • --access-key-id: The AWS access key ID of your IAM account.
  • --secret-access-key: The AWS secret access key of your IAM account.
  • --ssh-key: The SSH private key file path.
  • --solution: The solution name. Accepted values: ‘open-source’ or ‘emr-native’.
  • --auth-provider: The authentication provider. Accepted values: ‘ad’ or ‘openldap’.
  • --openldap-host: The FQDN of the OpenLDAP host.
  • --openldap-base-dn: The Base DN of OpenLDAP. For example: ‘dc=example,dc=com’. Change it according to your env.
  • --openldap-root-cn: The cn of the root account. For example: ‘admin’. Change it according to your env.
  • --openldap-root-password: The password of the root account. For example: ‘Admin1234!’. Change it according to your env.
  • --ranger-bind-dn: The Bind DN for Ranger. For example: ‘cn=ranger,ou=services,dc=example,dc=com’. This should be an existing DN on Windows AD/OpenLDAP. Change it according to your env.
  • --ranger-bind-password: The password of the Ranger Bind DN. For example: ‘Admin1234!’. Change it according to your env.
  • --openldap-user-dn-pattern: The DN pattern for Ranger to search users on OpenLDAP. For example: ‘uid={0},ou=users,dc=example,dc=com’. Change it according to your env.
  • --openldap-group-search-filter: The filter for Ranger to search groups on OpenLDAP. For example: ‘(member=uid={0},ou=users,dc=example,dc=com)’. Change it according to your env.
  • --openldap-user-object-class: The user object class for Ranger to search users. For example: ‘inetOrgPerson’. Change it according to your env.
  • --hue-bind-dn: The Bind DN for Hue. For example: ‘cn=hue,ou=services,dc=example,dc=com’. This should be an existing DN on Windows AD/OpenLDAP. Change it according to your env.
  • --hue-bind-password: The password of the Hue Bind DN. For example: ‘Admin1234!’. Change it according to your env.
  • --example-users: The example users to be created on OpenLDAP and Kerberos to demo Ranger’s features. This parameter is optional. If omitted, no example users will be created.
  • --sssd-bind-dn: The Bind DN for SSSD. For example: ‘cn=sssd,ou=services,dc=example,dc=com’. This should be an existing DN on Windows AD/OpenLDAP. Change it according to your env.
  • --sssd-bind-password: The password of the SSSD Bind DN. For example: ‘Admin1234!’. Change it according to your env.
  • --ranger-plugins: The Ranger plugins to be installed, comma-separated for multiple values. For example: ‘open-source-hdfs,open-source-hive’. Change it according to your env.
  • --skip-configure-hue: Whether to skip configuring Hue. Accepted values: ‘true’ or ‘false’. The default value is ‘false’.
  • --skip-migrate-kerberos-db: Whether to skip migrating the Kerberos database. Accepted values: ‘true’ or ‘false’. The default value is ‘false’.