How to Add a MapRed-Only Node to Hadoop

I was surprised that I couldn't google an answer to this, so I want to record my findings here. To add (a.k.a. commission) a node to a Hadoop cluster that should be used only for map-reduce tasks and not for storing data, you have several options:

  1. Do not start the datanode service on the node.
  2. If you’ve configured Hadoop to allow only nodes listed in its whitelist files to connect, add the node to the file pointed to by the property mapred.hosts but not to the file pointed to by dfs.hosts.
  3. Otherwise, add the node to the DFS blacklist, i.e. the file pointed to by the property dfs.hosts.exclude, and execute hadoop dfsadmin -refreshNodes on the namenode to apply it.

#3 is what I did as we weren’t using #2.
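
Option 3 could be sketched roughly as below. This is only an illustration, not the exact commands I ran: the host name and the exclude-file path are placeholders (the path defaults to a scratch file under /tmp so the sketch is harmless to try), and the script assumes it runs on the namenode with hadoop on the PATH. Point EXCLUDE_FILE at whatever file your dfs.hosts.exclude property names.

```shell
#!/bin/sh
# Sketch only: both values are placeholders, not taken from a real cluster.
NEW_NODE="${NEW_NODE:-mapred-only-node.example.com}"
EXCLUDE_FILE="${EXCLUDE_FILE:-/tmp/dfs.exclude}"  # should be the file named by dfs.hosts.exclude

# Add the host to the exclude file unless it is already listed (exact line match).
touch "$EXCLUDE_FILE"
if ! grep -qxF "$NEW_NODE" "$EXCLUDE_FILE"; then
    echo "$NEW_NODE" >> "$EXCLUDE_FILE"
fi

# Ask the namenode to re-read its host lists. Must be run on the namenode.
if command -v hadoop >/dev/null 2>&1; then
    hadoop dfsadmin -refreshNodes
else
    echo "hadoop not found on PATH; run 'hadoop dfsadmin -refreshNodes' on the namenode" >&2
fi
```

The grep guard keeps the script safe to re-run, since a host is appended only if it is not already in the file.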

When the datanode and tasktracker services start on the new node, they will try to register with the namenode and jobtracker. If the node is on the DFS exclude list, its datanode will not be allowed to connect and consequently won’t be used to store data, while map-reduce tasks will still be allowed to run on it.

You can set the (previously unset) property dfs.hosts.exclude in hdfs-site.xml without restarting the namenode service; it will pick it up anyway (likely when running -refreshNodes). Note that the property should contain the path to a file on the local filesystem of the namenode server.
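
For reference, the property entry might look something like what the following sketch generates. The exclude-file path and the name of the output file are assumptions made for illustration; the script only writes the fragment to a scratch file so you can review it and paste it inside the configuration element of hdfs-site.xml on the namenode.

```shell
#!/bin/sh
# Sketch only: both paths are placeholders chosen for illustration.
EXCLUDE_FILE="${EXCLUDE_FILE:-/tmp/dfs.exclude}"  # local path on the namenode server
FRAGMENT_FILE="${FRAGMENT_FILE:-/tmp/dfs-hosts-exclude-fragment.xml}"

# Write the property element to a scratch file; it does not touch hdfs-site.xml.
cat > "$FRAGMENT_FILE" <<EOF
<property>
  <name>dfs.hosts.exclude</name>
  <value>$EXCLUDE_FILE</value>
</property>
EOF

# Show the fragment for review.
cat "$FRAGMENT_FILE"
```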

Published at DZone with permission of
