Over a million developers have joined DZone.

How to Add MapRed-Only Node to Hadoop

· Cloud Zone

Learn about the benefits and drawbacks of microservices with best practices for your own architecture, brought to you in partnership with Iron.io.

I was surprised not to be able to google an answer to this so I want to record my findings here. To add (a.k.a. commission) a node to the Hadoop cluster that should be used only for map-reduce tasks and not for storing data, you have multiple options:

  1. Do not start the datanode service on the node.
  2. If you’ve configured Hadoop to allow only nodes on its whitelist files to connect to it and then add it to the file pointed to by the property mapred.hosts but not to the file in dfs.hosts.
  3. Otherwise add the node to the DFS’ blacklist, i.e. file pointed to by the property dfs.hosts.exclude and execute hadoop dfsadmin -refreshNodes on the namenode to apply it.

#3 is what I did as we weren’t using #2.

When the datanode and tasktracker services start on the new node, they will try to register with the namenode and jobtracker. If the node is on the DFS exclude list then its datanode will not be allowed to connect and consequently won’t be used to store data while map-reduce tasks will be allowed to run on it.

You can set (previously unset) property dfs.hosts.exclude in hdfs-site.xml without restarting the namenode service, it will pick it up anyway (likely when running -refreshNodes). Notice that it should contain path to a file on the local filesystem at the namenode server.

The Cloud Zone is brought to you in partnership with Iron.io.  Learn about best practices and common pitfalls for working with Iron.io. Avoid the dead ends, and take the enlightened path.


Published at DZone with permission of Jakub Holý , DZone MVB .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}