Accessing Hadoop HDFS Data Using Node.js and the WebHDFS REST API

HDFS files are a popular means of storing data. Learn how to use Node.js and the WebHDFS RESTful API to manipulate HDFS data stored in Hadoop.

By Somanath Balakrishnan · Jan. 31, 19 · Tutorial

Apache Hadoop exposes services for accessing and manipulating HDFS content with the help of the WebHDFS REST API. The official documentation is available at https://hadoop.apache.org/docs/r1.0.4/webhdfs.html.

Available Services

Below are the available services; the general request URL format is shown after the lists:

1) File and Directory Operations

    1.1 Create and Write to a File: CREATE (HTTP PUT)
    1.2 Append to a File: APPEND (HTTP POST)
    1.3 Open and Read a File: OPEN (HTTP GET)
    1.4 Make a Directory: MKDIRS (HTTP PUT)
    1.5 Rename a File/Directory: RENAME (HTTP PUT)
    1.6 Delete a File/Directory: DELETE (HTTP DELETE)
    1.7 Status of a File/Directory: GETFILESTATUS (HTTP GET)
    1.8 List a Directory: LISTSTATUS (HTTP GET)

2) Other File System Operations

    2.1 Get Content Summary of a Directory: GETCONTENTSUMMARY (HTTP GET)
    2.2 Get File Checksum: GETFILECHECKSUM (HTTP GET)
    2.3 Get Home Directory: GETHOMEDIRECTORY (HTTP GET)
    2.4 Set Permission: SETPERMISSION (HTTP PUT)
    2.5 Set Owner: SETOWNER (HTTP PUT)
    2.6 Set Replication Factor: SETREPLICATION (HTTP PUT)
    2.7 Set Access or Modification Time: SETTIMES (HTTP PUT)
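
Each of the operations above maps to a plain HTTP request against the NameNode. The general URL format, taken from the official documentation, looks like this (user.name is only needed with simple/pseudo authentication):

http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OPERATION>&user.name=<USER>

For example, making a directory is an HTTP PUT with op=MKDIRS, and listing a directory is an HTTP GET with op=LISTSTATUS.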

Enabling the WebHDFS API

Make sure the config parameter dfs.webhdfs.enabled is set to true in the hdfs-site.xml file (this config file can be found inside {your_hadoop_home_dir}/etc/hadoop):

<configuration>
    <property>
        .....
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
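
Once the property is set (restart HDFS if it was already running), you can quickly check that WebHDFS is reachable. Below is a minimal sketch using the request module and the GETHOMEDIRECTORY operation listed above; the host name and port are placeholders, so adjust them for your cluster:

const request = require("request");

let check_url = "http://<<your hdfs host name here>>:50070/webhdfs/v1/?op=GETHOMEDIRECTORY&user.name=hdfs";

request(check_url, function(error, response, body) {
    if (!error && response.statusCode == 200) {
        console.log("..WebHDFS is enabled, home directory: ", body); // e.g. {"Path":"/user/hdfs"}
    } else {
        console.log("..WebHDFS is not reachable..", error);
    }
});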

Connecting to WebHDFS From Node.js

I am hoping you are familiar with Node.js and package installation; if not, please go through the Node.js documentation first. There is an npm module, webhdfs, that wraps the Hadoop WebHDFS APIs. You can install it using npm:

npm install webhdfs 

After the above step, you can write a Node.js program to access this API. Below are a few steps to help you out.

Import Dependent Modules

Below are external modules to be imported:

const WebHDFS = require("webhdfs");
const request = require("request");

Prepare Connection URL

Let us prepare the connection URL:

let url = "http://<<your hdfs host name here>>";
let port = 50070; // change here if you are using a different port
let dir_path = "<<path to hdfs folder>>";
let path = "/webhdfs/v1/" + dir_path + "?op=LISTSTATUS&user.name=hdfs";
let full_url = url + ':' + port + path;
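
For example, with a hypothetical host of namenode.example.com and a dir_path of user/hdfs/data (no leading slash, since the path prefix already ends with one), full_url would come out as:

http://namenode.example.com:50070/webhdfs/v1/user/hdfs/data?op=LISTSTATUS&user.name=hdfs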

List a Directory

Access the API and get the results:

request(full_url, function(error, response, body) {
    if (!error && response.statusCode == 200) {
        console.log(".. response body..", body);
        let jsonStr = JSON.parse(body);
        let myObj = jsonStr.FileStatuses.FileStatus;
        let objLength = Object.entries(myObj).length;
        console.log("..Number of files in the folder: ", objLength);
    } else {
        console.log("..error occurred!..");
    }
});

Here is the sample request and response of the LISTSTATUS API:

 https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#LISTSTATUS 
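
The FileStatus entries in that response also carry the name and type of each item, which comes in handy for the last section of this article. A minimal sketch, assuming the same request callback as above where body holds the JSON response:

let fileStatuses = JSON.parse(body).FileStatuses.FileStatus;

fileStatuses.forEach(function(status) {
    // pathSuffix is the file or directory name, type is either "FILE" or "DIRECTORY"
    console.log(status.pathSuffix, "-", status.type);
});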

Get and Display Content of an HDFS File

Assign the HDFS file name with a path:

let hdfs_file_name = '<<HDFS file path>>';

The below code will connect to HDFS using the WebHDFS client instead of the request module we used in the above section:

let hdfs = WebHDFS.createClient({
    user: "<<user>>",
    host: "<<host/IP>>",
    port: 50070, // change here if you are using a different port
    path: "/webhdfs/v1"
});

The code below reads and displays the contents of the HDFS file:

let remoteFileStream = hdfs.createReadStream(hdfs_file_name);

remoteFileStream.on("error", function onError(err) { // handles errors during the read
    // Do something with the error
    console.log("...error: ", err);
});

let dataStream = [];

remoteFileStream.on("data", function onChunk(chunk) { // called for each chunk that is read
    // Do something with the data chunk
    dataStream.push(chunk);
    console.log('..chunk..', chunk);
});

remoteFileStream.on("finish", function onFinish() { // called when the read finishes
    console.log('..on finish..');
    console.log('..file data..', Buffer.concat(dataStream).toString()); // join the chunks and print the file content as text
});

Here is the sample request and response of the OPEN API:

https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#OPEN 
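
If you prefer to stay with plain REST calls rather than the webhdfs client, the same file can also be fetched through the OPEN operation using the request module from earlier. A minimal sketch, reusing the url, port, and hdfs_file_name placeholders defined above (WebHDFS answers OPEN with a redirect to a DataNode, which request follows automatically for GET):

let open_url = url + ":" + port + "/webhdfs/v1/" + hdfs_file_name + "?op=OPEN&user.name=hdfs";

request(open_url, function(error, response, body) {
    if (!error && response.statusCode == 200) {
        console.log("..file content..", body);
    } else {
        console.log("..error occurred!..", error);
    }
});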

How to Read All Files in a Directory

This is not entirely straightforward, as there is no single operation for it, but we can achieve it by combining the two operations above: first list the directory, then read the files in that directory one by one, as shown in the sketch below.
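
Here is a minimal sketch of that approach, reusing the full_url, dir_path, and hdfs client placeholders from the earlier sections (fill them in for your cluster); it lists the directory with LISTSTATUS, then opens every entry of type FILE:

request(full_url, function(error, response, body) {
    if (error || response.statusCode != 200) {
        return console.log("..error occurred while listing the directory..", error);
    }
    let fileStatuses = JSON.parse(body).FileStatuses.FileStatus;
    fileStatuses.forEach(function(status) {
        if (status.type !== "FILE") { return; } // skip sub-directories
        let chunks = [];
        // dir_path is assumed to have no leading slash, as in the earlier section
        let stream = hdfs.createReadStream("/" + dir_path + "/" + status.pathSuffix);
        stream.on("error", function(err) { console.log("..error reading", status.pathSuffix, "..", err); });
        stream.on("data", function(chunk) { chunks.push(chunk); });
        stream.on("finish", function() {
            console.log("..file:", status.pathSuffix, "..content..", Buffer.concat(chunks).toString());
        });
    });
});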

Conclusion

I hope this gave you an idea of how to connect to HDFS and perform basic operations using Node.js and the webhdfs module. All the best!
