How to Create a Local Instance of Hadoop on Your Laptop for Practice
At the end of this eight-step process, we will have a local Hadoop instance on our laptop to practice with.
Here is what I learned last week about Hadoop installation: Hadoop sounds like a really big thing with a complex installation process, lots of clusters, hundreds of machines, and terabytes (if not petabytes) of data. But actually, you can download a simple JAR and run Hadoop with HDFS on your laptop for practice. It's very easy!
Let's download Hadoop, run it on our local laptop without too much clutter, then run a sample job on it.
Our plan:
- Set up JAVA_HOME (Hadoop is built on Java).
- Download the Hadoop tar.gz.
- Extract the Hadoop tar.gz.
- Set up the Hadoop configuration.
- Start and format HDFS.
- Upload files to HDFS.
- Run a Hadoop job on the uploaded files.
- Get back and print the results!
Sounds like a plan!
1. Set Up JAVA_HOME
As we said, Hadoop is built on Java, so we need JAVA_HOME set up.
➜ hadoop ls /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
➜ hadoop echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
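Note that the export only lasts for the current shell session. A minimal sketch for making it persistent on macOS, assuming you want the JDK 1.8 shown above (the built-in /usr/libexec/java_home helper resolves the JDK path for you):
$ # Resolve the JDK 1.8 home and persist it for future shells
$ echo 'export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)' >> ~/.zshrc
$ source ~/.zshrc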
2. Download Hadoop tar.gz
Next, we download Hadoop!
➜ hadoop curl http://apache.spd.co.il/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz --output hadoop.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
1 310M 1 3581k 0 0 484k 0 0:10:57 0:00:07 0:10:50 580k
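Before extracting, it doesn't hurt to verify the download. A quick sketch, assuming you compare the result against the checksum published for this release on the Apache Hadoop site:
$ # Compute the SHA-256 of the tarball and compare it with the published checksum
$ shasum -a 256 hadoop.tar.gz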
3. Extract Hadoop tar.gz
Now that we have the tar.gz on our laptop, let's extract it.
➜ hadoop tar xvfz ~/Downloads/hadoop-3.1.0.tar.gz
4. Set Up HDFS
Now, let's configure HDFS on our laptop:
➜ hadoop cd hadoop-3.1.0
➜ hadoop-3.1.0
➜ hadoop-3.1.0 vi etc/hadoop/core-site.xml
The configuration should be:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
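A hedged side note: if the Hadoop scripts later complain that they cannot find Java, you may also need to set JAVA_HOME in Hadoop's own environment file, etc/hadoop/hadoop-env.sh, using the same JDK path as in step 1:
$ vi etc/hadoop/hadoop-env.sh
# Inside hadoop-env.sh, uncomment/add:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home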
So, we configured the HDFS port — let's configure how many replicas we need. We are on a laptop, so we want only one replica for our data:
➜ hadoop-3.1.0 vi etc/hadoop/hdfs-site.xml
The hdfs-site.xml file holds the replication configuration. Here is what it should contain (hint: a replication factor of 1):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
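One more optional tweak, offered as an assumption rather than a required step: by default, HDFS stores its data under hadoop.tmp.dir, which resolves to a location under /tmp that the OS may clear on reboot. To keep your practice data around, you could pin it to a stable directory in core-site.xml (the path below is a hypothetical example; pick your own):
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/yourname/hadoop-data</value>
</property>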
Enable SSHD
Hadoop connects to nodes with SSH, so let's enable it on our Mac laptop:
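On a Mac, that means turning on the built-in Remote Login service, either via System Preferences > Sharing > Remote Login or from the terminal (requires admin rights):
$ sudo systemsetup -setremotelogin on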
You should be able to SSH with no pass:
➜ hadoop-3.1.0 ssh localhost
Last login: Wed May 9 17:15:28 2018
➜ ~
If you can't, generate an SSH key and authorize it:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
5. Start HDFS
Next, we start and format HDFS on our laptop:
➜ hadoop-3.1.0 bin/hdfs namenode -format
WARNING: /Users/tomer.bendavid/tmp/hadoop/hadoop-3.1.0/logs does not exist. Creating.
2018-05-10 22:12:02,493 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = Tomers-MacBook-Pro.local/192.168.1.104
➜ hadoop-3.1.0 sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
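To sanity-check that everything came up, you can list the running Java processes with jps; you should see a NameNode, a DataNode, and a SecondaryNameNode among them. In Hadoop 3.x, the NameNode web UI also listens on http://localhost:9870 by default:
➜ hadoop-3.1.0 jps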
6. Create Folders on HDFS
Next, we create a sample input folder on HDFS on our laptop:
➜ hadoop-3.1.0 bin/hdfs dfs -mkdir /user
2018-05-10 22:13:16,982 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜ hadoop-3.1.0 bin/hdfs dfs -mkdir /user/tomer
2018-05-10 22:13:22,474 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜ hadoop-3.1.0
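To confirm the folders were created, we can list them:
➜ hadoop-3.1.0 bin/hdfs dfs -ls /user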
Upload Test Data to HDFS
Now that we have HDFS up and running on our laptop, let's upload some files:
➜ hadoop-3.1.0 bin/hdfs dfs -put etc/hadoop input
2018-05-10 22:14:28,802 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: `input': No such file or directory: `hdfs://localhost:9000/user/tomer.bendavid/input'
The first attempt fails because the relative path input resolves to our HDFS home directory (/user/tomer.bendavid here), which we never created. With the absolute path, it works:
➜ hadoop-3.1.0 bin/hdfs dfs -put etc/hadoop /user/tomer/input
2018-05-10 22:14:37,526 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜ hadoop-3.1.0 bin/hdfs dfs -ls /user/tomer/input
2018-05-10 22:16:09,325 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - tomer.bendavid supergroup 0 2018-05-10 22:14 /user/tomer/input/hadoop
7. Run Hadoop Job
So, we have HDFS with files on our laptop. Now let's run a job on it: the bundled examples JAR includes a distributed grep that searches the uploaded XML files for strings matching the regex dfs[a-z.]+ and writes the matches to an output folder:
➜ hadoop-3.1.0 bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep /user/tomer/input/hadoop/*.xml /user/tomer/output1 'dfs[a-z.]+'
➜ hadoop-3.1.0 bin/hdfs dfs -cat /user/tomer/output1/part-r-00000
2018-05-10 22:22:29,118 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1 dfsadmin
1 dfs.replication
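The same examples JAR ships other small jobs that are handy for practice. As one more hedged example, a word count over the same input (the output2 folder name is arbitrary; note that a job fails if its output folder already exists):
➜ hadoop-3.1.0 bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar wordcount /user/tomer/input/hadoop /user/tomer/output2
➜ hadoop-3.1.0 bin/hdfs dfs -cat /user/tomer/output2/part-r-00000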
8. Get Back and Print Results
And that's it. We managed to set up a local Hadoop installation with HDFS for tests and ran a sample job on it. That is so cool!
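When you're done practicing, you can shut HDFS down cleanly:
➜ hadoop-3.1.0 sbin/stop-dfs.sh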