
How to Create a Local Instance of Hadoop on Your Laptop for Practice


At the end of this eight-step process, we'll have a local Hadoop instance on our laptop for tests so that we can practice with it.


Here is what I learned last week about Hadoop installation: Hadoop sounds like a really big thing with a complex installation process, lots of clusters, hundreds of machines, terabytes (if not petabytes) of data, etc. But actually, you can download a single archive and run Hadoop with HDFS on your laptop for practice. It's very easy!

Let's download Hadoop, run it on our local laptop without too much clutter, then run a sample job on it. At the end of this eight-step process, we'll have a local Hadoop instance on our laptop for tests so that we can practice with it.

Our plan:

  1. Set up JAVA_HOME (Hadoop is built on Java).
  2. Download Hadoop tar.gz.
  3. Extract Hadoop tar.gz.
  4. Set up Hadoop configuration.
  5. Start and format HDFS.
  6. Upload files to HDFS.
  7. Run a Hadoop job on these uploaded files.
  8. Get back and print results!

Sounds like a plan!

1. Set Up JAVA_HOME

As we said, Hadoop is built on Java, so we need JAVA_HOME set up.

➜  hadoop ls /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
➜  hadoop echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
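If you are not sure where your JDK lives, macOS ships a small helper that prints the path for you. A minimal sketch, assuming a 1.8 JDK is installed; adjust the version flag to whatever you actually have:

# Ask macOS for the home directory of an installed JDK (assuming 1.8 here).
/usr/libexec/java_home -v 1.8

# Export whatever path it printed (and persist it in your shell profile if you like).
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"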

2. Download Hadoop tar.gz

Next, we download Hadoop!

➜  hadoop curl http://apache.spd.co.il/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz --output hadoop.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  1  310M    1 3581k    0     0   484k      0  0:10:57  0:00:07  0:10:50  580k
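The transfer takes a while; the archive is around 310 MB. Once it finishes, it is worth checking that the download is not corrupted. A minimal sketch, assuming you compare the result by eye against the .sha512 file published next to the tarball on the Apache download page:

# Compute the SHA-512 of the downloaded archive and compare it manually
# with the checksum published by the Apache project.
shasum -a 512 hadoop.tar.gz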

3. Extract Hadoop tar.gz

Now that we have the tar.gz on our laptop, let's extract it (point tar at wherever the archive was saved; the curl command above named it hadoop.tar.gz in the current directory):

➜  hadoop tar xvfz ~/Downloads/hadoop-3.1.0.tar.gz

4. Set Up HDFS

Now, let's configure HDFS on our laptop:

➜  hadoop cd hadoop-3.1.0
➜  hadoop-3.1.0
➜  hadoop-3.1.0 vi etc/hadoop/core-site.xml

The configuration should be:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

So, we configured the HDFS endpoint. Next, let's set how many replicas we need. We are on a single laptop, so we want only one replica for our data:

➜  hadoop-3.1.0 vi etc/hadoop/hdfs-site.xml

The hdfs-site.xml file is where the replication factor is configured. Below is the configuration it should have (hint: 1):

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
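One optional tweak while we are in etc/hadoop: Hadoop's scripts read JAVA_HOME from etc/hadoop/hadoop-env.sh, so if the daemons later complain that JAVA_HOME is not set, you can pin it there as well. A minimal sketch, reusing the path from step 1:

# etc/hadoop/hadoop-env.sh -- add (or uncomment) the JAVA_HOME line,
# pointing at the same JDK we exported in step 1.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home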

Enable SSHD

Hadoop connects to nodes with SSH, so let's enable it on our Mac laptop:

Enable Remote Login under System Preferences > Sharing; this turns on the built-in SSH server (screenshot: http://cdn.osxdaily.com/wp-content/uploads/2011/09/enable-sftp-server-mac-os-x-lion.jpg).
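If you prefer the terminal over System Preferences, the same switch can be flipped with systemsetup; this is a hedged alternative, it needs sudo and applies to macOS only:

# Turn on macOS's built-in SSH server (the same thing as Sharing > Remote Login).
sudo systemsetup -setremotelogin on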

You should be able to SSH to localhost without a password:

➜  hadoop-3.1.0 ssh localhost
Last login: Wed May  9 17:15:28 2018
➜  ~

If you can't, generate a passwordless key and add it to your authorized keys:

  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys
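To confirm the key is actually being picked up, you can force a non-interactive connection; if this prints ok without asking for a password, Hadoop's scripts will be able to SSH in as well:

# BatchMode makes ssh fail instead of prompting, so this is a clean pass/fail check.
ssh -o BatchMode=yes localhost echo ok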

5. Start HDFS

Next, we format the NameNode and then start HDFS on our laptop:

➜  hadoop-3.1.0 bin/hdfs namenode -format
WARNING: /Users/tomer.bendavid/tmp/hadoop/hadoop-3.1.0/logs does not exist. Creating.
2018-05-10 22:12:02,493 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Tomers-MacBook-Pro.local/192.168.1.104


➜  hadoop-3.1.0 sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
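Before moving on, it is worth a quick sanity check that the daemons are actually up. A minimal sketch; jps ships with the JDK, and 9870 is the default NameNode web UI port in Hadoop 3 (it was 50070 in Hadoop 2):

# List running Java processes; you should see NameNode, DataNode, and SecondaryNameNode.
jps

# The NameNode also serves a web UI, on localhost:9870 by default in Hadoop 3.
curl -s http://localhost:9870 > /dev/null && echo "NameNode UI is up"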

6. Create Folders on HDFS

Next, we create a home folder for our user on HDFS; it will hold our sample input:

➜  hadoop-3.1.0 bin/hdfs dfs -mkdir /user
2018-05-10 22:13:16,982 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜  hadoop-3.1.0 bin/hdfs dfs -mkdir /user/tomer
2018-05-10 22:13:22,474 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜  hadoop-3.1.0
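You can list the new directories back to confirm they exist:

# Recursively list /user; the tomer directory we just created should show up.
bin/hdfs dfs -ls -R /user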

Upload Test Data to HDFS

Now that we have HDFS up and running on our laptop, let's upload some files:

➜  hadoop-3.1.0 bin/hdfs dfs -put etc/hadoop input
2018-05-10 22:14:28,802 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: `input': No such file or directory: `hdfs://localhost:9000/user/tomer.bendavid/input'
➜  hadoop-3.1.0 bin/hdfs dfs -put etc/hadoop /user/tomer/input
2018-05-10 22:14:37,526 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜  hadoop-3.1.0 bin/hdfs dfs -ls /user/tomer/input
2018-05-10 22:16:09,325 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - tomer.bendavid supergroup          0 2018-05-10 22:14 /user/tomer/input/hadoop

Notice that the first put failed: a relative path like input resolves to the current user's HDFS home directory (/user/tomer.bendavid here), which we never created, so the second attempt uses the absolute path /user/tomer/input instead. The NativeCodeLoader warning that keeps appearing just means the optional native libraries aren't available for this platform; the built-in Java classes are used instead, which is fine for local practice.

7. Run Hadoop Job

So, we have HDFS with files on our laptop. Now let's run a job on it: the grep example from the bundled examples JAR, which extracts every string matching the given regular expression and counts how often each one appears:

➜  hadoop-3.1.0 bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep /user/tomer/input/hadoop/*.xml /user/tomer/output1 'dfs[a-z.]+'
➜  hadoop-3.1.0 bin/hdfs dfs -cat /user/tomer/output1/part-r-00000
2018-05-10 22:22:29,118 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1   dfsadmin
1   dfs.replication
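If you want to see everything the job wrote before cat-ing it, list the output directory; the results land in part-r-* files, typically alongside a _SUCCESS marker:

# List the job's output directory on HDFS.
bin/hdfs dfs -ls /user/tomer/output1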

8. Get Back and Print Results

The cat command above reads the results straight back out of HDFS, and that's it: we have a local Hadoop installation with HDFS for tests and have run a test job on it. That is so cool!
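When you are done practicing, you can shut the local cluster down cleanly with the matching stop script from the same sbin directory:

# Stops the NameNode, DataNode, and SecondaryNameNode started by start-dfs.sh.
sbin/stop-dfs.sh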

