Virtual Clusters With Virtual Machines, Part 1
As a precursor to a tutorial on creating clusters with Hadoop, Spark, and Cassandra, take a look at how to create virtual machines that communicate.
In this and a subsequent article, I want to show you how to build clusters for technologies like Hadoop, Spark, and Cassandra on your local machine. I will use virtual machines (VMs) to represent nodes. Each VM emulates a node (server) in your cloud or on-premises deployment, so you can try out clusters without any servers and without changing anything on your production servers. This is helpful if you don’t have a development environment, or if you want to try something without affecting anyone on your development servers.
For this setup, you might need a powerful computer because at least two virtual machines must run at the same time. The power of the computer dictates how much data you can process and how you process it.
I will use Ubuntu Server 18.04.3 for the nodes, with Oracle VirtualBox 6.0.10 on a Windows 10 host operating system; in the next article, I'll install Spark 2.4.4. After this article, you will be able to set up any cluster-based technology. As long as you can create and run multiple virtual machines at the same time, you can use another Linux distribution, virtualization software, or host operating system. You can use a guest OS with a GUI, but dropping the GUI leaves more processing power for your workload, especially inside a VM. If you want to see UIs like the Spark Web UI or the Ambari console, you can forward ports and open them on your local machine.
Everything done here is also possible on the cloud, as long as the nodes can communicate with each other. It is also possible to build this virtual cluster across multiple computers, but that won’t be shown.
If you are familiar with installing VMs, you can skip to configuring the network.
Setting Up the VM Environment
Let’s start by downloading VirtualBox. You can download Oracle VM VirtualBox here. Select the package for your machine’s operating system (the host), not the one you will install as a guest. I am on Windows, so I downloaded from the Windows Host link and installed it with the default options. After the installation, there is one small check to make before creating a VM: verify that hardware virtualization is enabled on your computer. This option lives in the BIOS and varies from motherboard to motherboard, so search for your model if you don’t know how to enable it. Once it is enabled, you are ready to create a virtual machine.
If you don’t already have any VM files (such as VDI or VMDK), click New, which opens the dialog for creating a virtual machine. It has the following fields:
- Name is only a label; you can give the VM whatever name you want. I named the first one Ubuntu_Master to keep track of the VMs. The rest will be Ubuntu_Slave followed by a number.
- Machine Folder is the directory for keeping your VM’s files.
- Type is Linux in this case.
- Version is Ubuntu (64-bit). You can select another distribution if you want, but it may lack some software that you will then need to install manually.
I gave each VM 2 GB of RAM, and there will be three instances. A VM won’t claim all of its RAM as soon as you start it; it consumes RAM as it needs it. Still, the total you assign should not exceed your host machine’s RAM capacity. Also keep in mind that your host OS keeps running in the background, so don’t hand over every resource your computer has.
Next, decide how much storage each VM will have. That is completely up to you. If you don’t have an existing VM file, select Create a virtual hard disk now and leave all options on their defaults. If you do have one, just point to that file. If you want to try something big and you have the space on your local disk, you can assign a lot of storage, but be aware that all of this runs on one machine, so processing or analyzing many gigabytes of data probably won’t be possible or will take a very long time. This setup is for development and testing purposes only. After setting the size of your storage, you are ready to install your guest OS.
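The RAM budgeting above can be checked with a little arithmetic. The following Python sketch is only an illustration: the 16 GB host and the 4 GB reserved for the host OS are hypothetical values, not figures from this setup, so substitute your own.

```python
def ram_budget(host_ram_gb, vm_ram_gb, vm_count, host_reserve_gb=4.0):
    """Return (total RAM assigned to VMs, remaining headroom), both in GB.

    host_reserve_gb is an assumed amount kept back for the host OS;
    a negative headroom means the VMs are over-committed.
    """
    total_vm_ram = vm_ram_gb * vm_count
    headroom = host_ram_gb - host_reserve_gb - total_vm_ram
    return total_vm_ram, headroom


def fits(host_ram_gb, vm_ram_gb, vm_count, host_reserve_gb=4.0):
    """True if the planned VMs fit next to the reserved host memory."""
    _, headroom = ram_budget(host_ram_gb, vm_ram_gb, vm_count, host_reserve_gb)
    return headroom >= 0


if __name__ == "__main__":
    # Three 2 GB VMs, as in this article, on a hypothetical 16 GB host.
    total, headroom = ram_budget(host_ram_gb=16, vm_ram_gb=2, vm_count=3)
    print(f"VMs use {total} GB, headroom {headroom} GB, fits: {fits(16, 2, 3)}")
```

With these example numbers the three VMs take 6 GB in total, which leaves room to spare; on an 8 GB host with the same reserve they would not fit.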
To install Ubuntu Server, you first need to mount a bootable ISO. You can download it from here. After downloading it, go to VirtualBox, right-click the Ubuntu instance you just created (Ubuntu_Master in my case), and click Settings. From the left menu, select "Storage" and click Empty under Controller: IDE. You can select your ISO file by clicking the little blue CD icon. This is like putting a DVD into a DVD drive, except that it all happens virtually. Now you're ready to start the VM! Double-click it to start it. The first boot might take a while. After the installation is complete, you should unmount the ISO.
After starting the VM, you will see the Ubuntu Server installer. I left everything at the defaults and only checked the option to install openssh-server. I didn’t install any other software. For detailed installation information, you can visit Ubuntu’s official tutorial for installing Ubuntu Server.
You can repeat these steps to create as many VMs as you want, but I suggest cloning your VM within VirtualBox instead. There is only one option you should change while cloning: the MAC address policy. Set Policy to Generate new MAC addresses for all network adapters. Path is where the VM’s files will be copied. Leave the rest of the options at their defaults. You will probably want to change the cloned VM’s hostname, since it will be the same as the master's. Check here for changing hostname.
After installing Ubuntu Server, the next step is configuring the VMs' network. To let the VMs share the same network, create a NAT network in VirtualBox. From the upper menu, select File -> Preferences -> Network and add a new NAT network by clicking the green plus button on the right side. A new network will appear in the list. Right-click it and choose edit. Network Name is just an alias; name it however you want. For Network CIDR, enter any range that is a valid IP network, and enable DHCP.
Here you can also configure port forwarding to connect to a VM from your host OS. You can use this for software like Ambari, as I mentioned: connecting to localhost:8080 will direct you to the VM. It is also possible to add another port forward to make the VM accessible from the Internet; for that, you need a static IP, or you have to look up your IP address whenever it changes. This lets you connect to your virtual machine from the Internet without logging in to your host OS first. This might give you trouble, though.
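If you are unsure whether a CIDR value is well formed, Python's standard ipaddress module can check it before you type it into VirtualBox. This is a minimal sketch; the range 10.0.2.0/24 is only an example value, not a requirement of this setup.

```python
import ipaddress


def describe_network(cidr):
    """Validate a CIDR string and summarize the network it describes.

    Raises ValueError if the string is malformed or has host bits set
    (for example "10.0.2.5/24" instead of "10.0.2.0/24").
    """
    net = ipaddress.ip_network(cidr, strict=True)
    # For ordinary IPv4 prefixes, the network and broadcast addresses
    # are not assignable to hosts.
    if net.prefixlen < 31:
        usable = net.num_addresses - 2
    else:
        usable = net.num_addresses
    return {
        "network": str(net.network_address),
        "netmask": str(net.netmask),
        "usable_hosts": usable,
        "is_private": net.is_private,
    }


if __name__ == "__main__":
    # Example value only; pick whatever private range suits you.
    print(describe_network("10.0.2.0/24"))
```

A /24 like the example leaves 254 usable addresses, far more than the handful of VMs a single machine can run.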
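To confirm from the host that a forwarded port actually reaches the VM, you can try to open a TCP connection to it. The sketch below uses only Python's standard socket module; port 8080 on localhost is the example from the text, and a service must already be listening on the guest side for the check to succeed.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example usage from the host OS (assumes a rule forwarding host port 8080):
#   port_is_open("localhost", 8080)
```

A False result does not say which side is at fault: the forwarding rule may be missing, or the service inside the VM may not have started yet.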
Next, connect your VMs to this newly created network. Open each VM's settings by right-clicking it, and on the Network tab, set the "Attached to" option to NAT Network and pick your network (ClusterNetwork in my case). Lastly, expand the advanced options and set Promiscuous Mode to Allow All. With these settings, your VMs should be able to ping each other. You can add as many VMs as your machine can run. To add new VMs quickly, you can clone any VM from VirtualBox. If you clone VMs, don’t forget to change the new VMs' MAC addresses and hostnames. I suggest cloning so that you don't have to repeat the same operations, such as installing Java, on every VM you create. It is faster this way.
The virtual environment is ready now. You can continue as if you were on the cloud, or take a look at the next article.
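The ping test can also be scripted so that you can check every node in one go. This is a rough sketch that shells out to the system ping command; the IP addresses in the usage comment are hypothetical, so replace them with the addresses DHCP handed to your VMs.

```python
import platform
import subprocess


def build_ping_command(host, count=3):
    """Build the ping command line; Windows uses -n for the count, Linux uses -c."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def can_ping(host, count=3):
    """Return True if the system ping command exits successfully for host."""
    try:
        result = subprocess.run(
            build_ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is not installed, or it took too long to answer.
        return False
    return result.returncode == 0


def unreachable_nodes(addresses):
    """Return the addresses that did not answer ping."""
    return [addr for addr in addresses if not can_ping(addr)]


# Example usage from inside one VM (hypothetical addresses):
#   unreachable_nodes(["10.0.2.4", "10.0.2.5", "10.0.2.6"])
```

Run it from inside one of the VMs; an empty list means every listed node answered.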
Happy Coding :)