
How to Create a Local Instance of Hadoop on Your Laptop for Practice


At the end of this eight-step process, we'll have a local Hadoop instance on our laptop for tests so that we can practice with it.


Here is what I learned last week about Hadoop installation: Hadoop sounds like a really big thing with a complex installation process, lots of clusters, hundreds of machines, terabytes (if not petabytes) of data, etc. But actually, you can download a single archive and run Hadoop with HDFS on your laptop for practice. It's very easy!

Let's download Hadoop, run it on our local laptop without too much clutter, then run a sample job on it. At the end of this eight-step process, we'll have a local Hadoop instance on our laptop for tests so that we can practice with it.

Our plan:

  1. Set up JAVA_HOME (Hadoop is built on Java).
  2. Download Hadoop tar.gz.
  3. Extract Hadoop tar.gz.
  4. Set up Hadoop configuration.
  5. Start and format HDFS.
  6. Upload files to HDFS.
  7. Run a Hadoop job on these uploaded files.
  8. Get back and print results!

Sounds like a plan!

1. Set Up JAVA_HOME

As we said, Hadoop is built on Java, so we need JAVA_HOME set up.

➜  hadoop ls /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
➜  hadoop echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
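If you are not sure where your JDK lives, macOS ships a small helper that prints the path for you. A minimal sketch, assuming a 1.8 JDK is installed; adjust the version flag to whatever you actually have:

# Ask macOS for the home directory of an installed JDK (assuming 1.8 here).
/usr/libexec/java_home -v 1.8

# Export whatever path it printed (and persist it in your shell profile if you like).
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"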

2. Download Hadoop tar.gz

Next, we download Hadoop!

➜  hadoop curl http://apache.spd.co.il/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz --output hadoop.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  1  310M    1 3581k    0     0   484k      0  0:10:57  0:00:07  0:10:50  580k
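The transfer takes a while; the archive is around 310 MB. Once it finishes, it is worth checking that the download is not corrupted. A minimal sketch, assuming you compare the result by eye against the .sha512 file published next to the tarball on the Apache download page:

# Compute the SHA-512 of the downloaded archive and compare it manually
# with the checksum published by the Apache project.
shasum -a 512 hadoop.tar.gz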

3. Extract Hadoop tar.gz

Now that we have the tar.gz on our laptop, let's extract it (point tar at wherever the archive was saved; the curl command above named it hadoop.tar.gz in the current directory):

➜  hadoop tar xvfz ~/Downloads/hadoop-3.1.0.tar.gz

4. Set Up HDFS

Now, let's configure HDFS on our laptop:

➜  hadoop cd hadoop-3.1.0
➜  hadoop-3.1.0
➜  hadoop-3.1.0 vi etc/hadoop/core-site.xml

The configuration should be:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

So, we configured the HDFS endpoint. Next, let's set how many replicas we need. We are on a single laptop, so we want only one replica for our data:

➜  hadoop-3.1.0 vi etc/hadoop/hdfs-site.xml

The hdfs-site.xml file is where the replication factor is configured. Below is the configuration it should have (hint: 1):

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
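One optional tweak while we are in etc/hadoop: Hadoop's scripts read JAVA_HOME from etc/hadoop/hadoop-env.sh, so if the daemons later complain that JAVA_HOME is not set, you can pin it there as well. A minimal sketch, reusing the path from step 1:

# etc/hadoop/hadoop-env.sh -- add (or uncomment) the JAVA_HOME line,
# pointing at the same JDK we exported in step 1.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home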

Enable SSHD

Hadoop connects to nodes with SSH, so let's enable it on our Mac laptop:

Enable Remote Login under System Preferences > Sharing; this turns on the built-in SSH server (screenshot: http://cdn.osxdaily.com/wp-content/uploads/2011/09/enable-sftp-server-mac-os-x-lion.jpg).
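If you prefer the terminal over System Preferences, the same switch can be flipped with systemsetup; this is a hedged alternative, it needs sudo and applies to macOS only:

# Turn on macOS's built-in SSH server (the same thing as Sharing > Remote Login).
sudo systemsetup -setremotelogin on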

You should be able to SSH to localhost without a password:

➜  hadoop-3.1.0 ssh localhost
Last login: Wed May  9 17:15:28 2018
➜  ~

If you can't, generate a passwordless key and add it to your authorized keys:

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys
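To confirm the key is actually being picked up, you can force a non-interactive connection; if this prints ok without asking for a password, Hadoop's scripts will be able to SSH in as well:

# BatchMode makes ssh fail instead of prompting, so this is a clean pass/fail check.
ssh -o BatchMode=yes localhost echo ok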

5. Start HDFS

Next, we format the NameNode and then start HDFS on our laptop:

➜  hadoop-3.1.0 bin/hdfs namenode -format
WARNING: /Users/tomer.bendavid/tmp/hadoop/hadoop-3.1.0/logs does not exist. Creating.
2018-05-10 22:12:02,493 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Tomers-MacBook-Pro.local/192.168.1.104


➜  hadoop-3.1.0 sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
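Before moving on, it is worth a quick sanity check that the daemons are actually up. A minimal sketch; jps ships with the JDK, and 9870 is the default NameNode web UI port in Hadoop 3 (it was 50070 in Hadoop 2):

# List running Java processes; you should see NameNode, DataNode, and SecondaryNameNode.
jps

# The NameNode also serves a web UI, on localhost:9870 by default in Hadoop 3.
curl -s http://localhost:9870 > /dev/null && echo "NameNode UI is up"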

6. Create Folders on HDFS

Next, we create a home folder for our user on HDFS; it will hold our sample input:

➜  hadoop-3.1.0 bin/hdfs dfs -mkdir /user
2018-05-10 22:13:16,982 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜  hadoop-3.1.0 bin/hdfs dfs -mkdir /user/tomer
2018-05-10 22:13:22,474 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜  hadoop-3.1.0
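You can list the new directories back to confirm they exist:

# Recursively list /user; the tomer directory we just created should show up.
bin/hdfs dfs -ls -R /user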

Upload Test Data to HDFS

Now that we have HDFS up and running on our laptop, let's upload some files:

➜  hadoop-3.1.0 bin/hdfs dfs -put etc/hadoop input
2018-05-10 22:14:28,802 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: `input': No such file or directory: `hdfs://localhost:9000/user/tomer.bendavid/input'
➜  hadoop-3.1.0 bin/hdfs dfs -put etc/hadoop /user/tomer/input
2018-05-10 22:14:37,526 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜  hadoop-3.1.0 bin/hdfs dfs -ls /user/tomer/input
2018-05-10 22:16:09,325 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - tomer.bendavid supergroup          0 2018-05-10 22:14 /user/tomer/input/hadoop

Notice that the first put failed: a relative path like input resolves to the current user's HDFS home directory (/user/tomer.bendavid here), which we never created, so the second attempt uses the absolute path /user/tomer/input instead. The NativeCodeLoader warning that keeps appearing just means the optional native libraries aren't available for this platform; the built-in Java classes are used instead, which is fine for local practice.

7. Run Hadoop Job

So, we have HDFS with files on our laptop. Now let's run a job on it: the grep example from the bundled examples JAR, which extracts every string matching the given regular expression and counts how often each one appears:

➜  hadoop-3.1.0 bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep /user/tomer/input/hadoop/*.xml /user/tomer/output1 'dfs[a-z.]+'
➜  hadoop-3.1.0 bin/hdfs dfs -cat /user/tomer/output1/part-r-00000
2018-05-10 22:22:29,118 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1   dfsadmin
1   dfs.replication
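If you want to see everything the job wrote before cat-ing it, list the output directory; the results land in part-r-* files, typically alongside a _SUCCESS marker:

# List the job's output directory on HDFS.
bin/hdfs dfs -ls /user/tomer/output1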

8. Get Back and Print Results

The cat command above reads the results straight back out of HDFS, and that's it: we have a local Hadoop installation with HDFS for tests and have run a test job on it. That is so cool!
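When you are done practicing, you can shut the local cluster down cleanly with the matching stop script from the same sbin directory:

# Stops the NameNode, DataNode, and SecondaryNameNode started by start-dfs.sh.
sbin/stop-dfs.sh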

