
Accessing Hadoop HDFS Data Using Node.js and the WebHDFS REST API

HDFS files are a popular means of storing data. Learn how to use Node.js and the WebHDFS RESTful API to manipulate HDFS data stored in Hadoop.

By Somanath Balakrishnan · Jan. 31, 19 · Tutorial

Apache Hadoop exposes services for accessing and manipulating HDFS content through the WebHDFS REST API. The official Hadoop WebHDFS documentation describes each operation in detail.

Available Services

The following services are available:

1) File and Directory Operations

    1.1 Create and Write to a File: CREATE (HTTP PUT)
    1.2 Append to a File: APPEND (HTTP POST)
    1.3 Open and Read a File: OPEN (HTTP GET)
    1.4 Make a Directory: MKDIRS (HTTP PUT)
    1.5 Rename a File/Directory: RENAME (HTTP PUT)
    1.6 Delete a File/Directory: DELETE (HTTP DELETE)
    1.7 Status of a File/Directory: GETFILESTATUS (HTTP GET)
    1.8 List a Directory: LISTSTATUS (HTTP GET)

2) Other File System Operations

    2.1 Get Content Summary of a Directory: GETCONTENTSUMMARY (HTTP GET)
    2.2 Get File Checksum: GETFILECHECKSUM (HTTP GET)
    2.3 Get Home Directory: GETHOMEDIRECTORY (HTTP GET)
    2.4 Set Permission: SETPERMISSION (HTTP PUT)
    2.5 Set Owner: SETOWNER (HTTP PUT)
    2.6 Set Replication Factor: SETREPLICATION (HTTP PUT)
    2.7 Set Access or Modification Time: SETTIMES (HTTP PUT)
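
For quick reference in code, the operation and HTTP verb pairs listed above can be kept in a small lookup table. This is only an illustrative sketch: the names WEBHDFS_HTTP_METHODS and httpMethodFor are hypothetical helpers, not part of Hadoop or of any npm module.

```javascript
// Hypothetical lookup table built from the operations listed above.
const WEBHDFS_HTTP_METHODS = {
  CREATE: "PUT",
  APPEND: "POST",
  OPEN: "GET",
  MKDIRS: "PUT",
  RENAME: "PUT",
  DELETE: "DELETE",
  GETFILESTATUS: "GET",
  LISTSTATUS: "GET",
  GETCONTENTSUMMARY: "GET",
  GETFILECHECKSUM: "GET",
  GETHOMEDIRECTORY: "GET",
  SETPERMISSION: "PUT",
  SETOWNER: "PUT",
  SETREPLICATION: "PUT",
  SETTIMES: "PUT",
};

// Returns the HTTP verb for a WebHDFS operation name (case-insensitive).
function httpMethodFor(op) {
  const method = WEBHDFS_HTTP_METHODS[String(op).toUpperCase()];
  if (!method) {
    throw new Error("Unknown WebHDFS operation: " + op);
  }
  return method;
}

console.log(httpMethodFor("liststatus")); // GET
```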

Enabling the WebHDFS API

Make sure the config parameter dfs.webhdfs.enabled is set to true in the hdfs-site.xml file (this config file can be found inside {your_hadoop_home_dir}/etc/hadoop).

<configuration>
    <property>
        .....
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
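
If you want to check this setting from a script, the sketch below pulls one property out of hdfs-site.xml text. It is a rough illustration only: readHdfsSiteProperty is a hypothetical helper, the XML is an inline sample rather than your real file, and the regular expressions assume simple, uncommented property blocks like the one above (a proper XML parser would be more robust).

```javascript
// Hypothetical helper: returns the <value> of the named property, or undefined.
// Assumes plain <property><name>..</name><value>..</value></property> blocks.
function readHdfsSiteProperty(xmlText, propertyName) {
  const propertyBlocks = xmlText.match(/<property>[\s\S]*?<\/property>/g) || [];
  for (const block of propertyBlocks) {
    const name = /<name>\s*([^<]*?)\s*<\/name>/.exec(block);
    const value = /<value>\s*([^<]*?)\s*<\/value>/.exec(block);
    if (name && value && name[1] === propertyName) {
      return value[1];
    }
  }
  return undefined;
}

// Inline sample standing in for the contents of hdfs-site.xml.
const sampleHdfsSite = `
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>`;

const webHdfsEnabled =
  readHdfsSiteProperty(sampleHdfsSite, "dfs.webhdfs.enabled") === "true";
console.log("WebHDFS enabled:", webHdfsEnabled);
```

To run this against a real installation, the file contents could be loaded with fs.readFileSync from the directory mentioned above.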

Connecting to WebHDFS From Node.js

This walkthrough assumes you are familiar with Node.js and package installation. The npm package "webhdfs" (from the node-webhdfs project) provides a wrapper that allows you to access the Hadoop WebHDFS APIs. The examples below also use the "request" module, so install both packages using npm:

npm install webhdfs request

After the above step, you can write a Node.js program to access this API. Below are a few steps to help you out.

Import Dependent Modules

Import the following external modules:

const WebHDFS = require("webhdfs");
const request = require("request");

Prepare Connection URL

Prepare the connection URL for the LISTSTATUS operation:

let url = "http://<<your hdfs host name here>>";
let port = 50070; // change this if your NameNode uses a different HTTP port (Hadoop 3.x defaults to 9870)
let dir_path = "<<path to hdfs folder>>"; // no leading slash needed; the prefix below already ends with one
let path = "/webhdfs/v1/" + dir_path + "?op=LISTSTATUS&user.name=hdfs";
let full_url = url + ":" + port + path;
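
String concatenation like the above is easy to get subtly wrong (doubled slashes, unescaped characters). As a minimal sketch, the same URL can be assembled with Node's built-in URL class. The function name buildWebHdfsUrl, its option names, and the host namenode.example.com are assumptions made for illustration; the port default and the user.name parameter mirror the values used above.

```javascript
// Hypothetical helper: builds a WebHDFS URL from its parts.
function buildWebHdfsUrl({ host, port = 50070, hdfsPath, op, user, params = {} }) {
  // Strip leading slashes, then percent-encode each path segment separately
  // so that spaces and characters such as "?" or "#" stay inside the path.
  const encodedPath = String(hdfsPath)
    .replace(/^\/+/, "")
    .split("/")
    .map(encodeURIComponent)
    .join("/");
  const url = new URL("http://" + host + ":" + port + "/webhdfs/v1/" + encodedPath);
  url.searchParams.set("op", op);
  if (user) {
    url.searchParams.set("user.name", user);
  }
  // Any extra operation-specific query parameters.
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

console.log(
  buildWebHdfsUrl({
    host: "namenode.example.com", // placeholder host
    hdfsPath: "/user/hdfs/data",
    op: "LISTSTATUS",
    user: "hdfs",
  })
);
// http://namenode.example.com:50070/webhdfs/v1/user/hdfs/data?op=LISTSTATUS&user.name=hdfs
```

The resulting string can be passed to request() in place of full_url.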

List a Directory

Access the API and get the results:

request(full_url, function(error, response, body) {
    if (!error && response.statusCode === 200) {
        console.log(".. response body..", body);
        // The body is JSON; FileStatuses.FileStatus is an array with one entry per file or subdirectory
        let listing = JSON.parse(body);
        let fileStatuses = listing.FileStatuses.FileStatus;
        let objLength = fileStatuses.length;
        console.log("..Number of entries in the folder: ", objLength);
    } else {
        console.log("..error occurred!..", error || response.statusCode);
    }
});

Here is the sample request and response of the LISTSTATUS API:

 https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#LISTSTATUS 
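
To show what can be done with that response beyond counting entries, here is a small sketch that splits a listing into files and directories. The sample body is abbreviated and modeled on the documented response shape (only pathSuffix, type, and length are kept), and summarizeListing is a hypothetical helper name.

```javascript
// Hypothetical helper: summarizes a LISTSTATUS response body (a JSON string).
function summarizeListing(body) {
  const statuses = JSON.parse(body).FileStatuses.FileStatus;
  const summary = { files: [], directories: [], totalBytes: 0 };
  for (const status of statuses) {
    if (status.type === "DIRECTORY") {
      summary.directories.push(status.pathSuffix);
    } else {
      summary.files.push(status.pathSuffix);
      summary.totalBytes += status.length;
    }
  }
  return summary;
}

// Abbreviated sample response; real entries carry more fields
// (owner, group, permission, modificationTime, and so on).
const sampleListingBody = JSON.stringify({
  FileStatuses: {
    FileStatus: [
      { pathSuffix: "a.patch", type: "FILE", length: 24930 },
      { pathSuffix: "bar", type: "DIRECTORY", length: 0 },
    ],
  },
});

console.log(summarizeListing(sampleListingBody));
```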

Get and Display Content of an HDFS File

Set the full HDFS path of the file to read:

let hdfs_file_name = "<<HDFS file path>>";

The code below connects to HDFS using the WebHDFS client instead of the request module used in the previous section:

let hdfs = WebHDFS.createClient({
    user: "<<user>>",
    host: "<<host/IP>>",
    port: 50070, // change this if you are using a different port
    path: "webhdfs/v1/"
});

The code below reads and displays the contents of an HDFS file:

let remoteFileStream = hdfs.createReadStream(hdfs_file_name);

remoteFileStream.on("error", function onError(err) { // handles errors during the read
    // Do something with the error
    console.log("...error: ", err);
});

let dataStream = []; // collects the chunks as they arrive

remoteFileStream.on("data", function onChunk(chunk) { // called for each chunk read
    // Do something with the data chunk
    dataStream.push(chunk);
    console.log("..chunk..", chunk);
});

remoteFileStream.on("finish", function onFinish() { // called when the read finishes
    console.log("..on finish..");
    // Chunks arrive as Buffers; join them and decode as text
    console.log("..file data..", Buffer.concat(dataStream).toString());
});

Here is the sample request and response of the OPEN API:

https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#OPEN 
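
The error branches shown earlier only log a generic message. When WebHDFS rejects a request, the response body is a JSON document with a RemoteException object, which is more useful to report. The sketch below formats such a body as it would arrive in the request callback; describeWebHdfsError is a hypothetical helper, and the sample body is modeled on the documented error format rather than captured from a live cluster.

```javascript
// Hypothetical helper: turns a WebHDFS error response into a readable message.
function describeWebHdfsError(statusCode, body) {
  try {
    const remote = JSON.parse(body).RemoteException;
    if (remote) {
      return "HTTP " + statusCode + ": " + remote.exception + ": " + remote.message;
    }
  } catch (parseError) {
    // The body was not valid JSON; fall through to the generic message.
  }
  return "HTTP " + statusCode + ": unexpected response";
}

// Sample error body modeled on the documented RemoteException format.
const sampleErrorBody = JSON.stringify({
  RemoteException: {
    exception: "FileNotFoundException",
    javaClassName: "java.io.FileNotFoundException",
    message: "File does not exist: /foo/a.patch",
  },
});

console.log(describeWebHdfsError(404, sampleErrorBody));
// HTTP 404: FileNotFoundException: File does not exist: /foo/a.patch
```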

How to Read All Files in a Directory

There is no single WebHDFS operation for this, but you can achieve it by combining the two operations above: list the directory with LISTSTATUS, then read each file in that directory one by one.
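
A minimal sketch of the first half of that combination follows: it turns a LISTSTATUS response into the list of full file paths to read, skipping subdirectories. The helper name filePathsFromListing and the sample directory /user/hdfs/data are assumptions for illustration; the sample body is abbreviated to the two fields the helper uses.

```javascript
// Hypothetical helper: full HDFS paths of the regular files in a listing.
function filePathsFromListing(dirPath, listStatusBody) {
  const statuses = JSON.parse(listStatusBody).FileStatuses.FileStatus;
  const base = dirPath.replace(/\/+$/, ""); // drop trailing slashes
  return statuses
    .filter((status) => status.type === "FILE")
    .map((status) => base + "/" + status.pathSuffix);
}

// Abbreviated sample LISTSTATUS body.
const sampleDirBody = JSON.stringify({
  FileStatuses: {
    FileStatus: [
      { pathSuffix: "part-0000.csv", type: "FILE" },
      { pathSuffix: "archive", type: "DIRECTORY" },
      { pathSuffix: "part-0001.csv", type: "FILE" },
    ],
  },
});

const filesToRead = filePathsFromListing("/user/hdfs/data/", sampleDirBody);
console.log(filesToRead);

// Each path could then be read with the client created earlier, for example:
// for (const filePath of filesToRead) {
//   hdfs.createReadStream(filePath).on("data", (chunk) => { /* ... */ });
// }
```

Reading the files sequentially rather than all at once keeps the number of open connections to the cluster small, at the cost of a longer total run time.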

Conclusion

You should now have a working idea of how to connect to HDFS and perform basic operations using Node.js and the WebHDFS module. All the best!
