Accessing Hadoop HDFS Data Using Node.js and the WebHDFS REST API

HDFS files are a popular means of storing data. Learn how to use Node.js and the WebHDFS RESTful API to manipulate HDFS data stored in Hadoop.

By Somanath Balakrishnan · Jan. 31, 19 · Tutorial

Apache Hadoop exposes services for accessing and manipulating HDFS content through the WebHDFS REST API. The official documentation is available at https://hadoop.apache.org/docs/r1.0.4/webhdfs.html.

Available Services

The following services are available (a minimal request sketch follows the lists):

1) File and Directory Operations

    1.1 Create and Write to a File: CREATE (HTTP PUT)
    1.2 Append to a File: APPEND (HTTP POST)
    1.3 Open and Read a File: OPEN (HTTP GET)
    1.4 Make a Directory: MKDIRS (HTTP PUT)
    1.5 Rename a File/Directory: RENAME (HTTP PUT)
    1.6 Delete a File/Directory: DELETE (HTTP DELETE)
    1.7 Status of a File/Directory: GETFILESTATUS (HTTP GET)
    1.8 List a Directory: LISTSTATUS (HTTP GET)

2) Other File System Operations

    2.1 Get Content Summary of a Directory: GETCONTENTSUMMARY (HTTP GET)
    2.2 Get File Checksum: GETFILECHECKSUM (HTTP GET)
    2.3 Get Home Directory: GETHOMEDIRECTORY (HTTP GET)
    2.4 Set Permission: SETPERMISSION (HTTP PUT)
    2.5 Set Owner: SETOWNER (HTTP PUT)
    2.6 Set Replication Factor: SETREPLICATION (HTTP PUT)
    2.7 Set Access or Modification Time: SETTIMES (HTTP PUT)
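
Each of these operations maps onto a plain HTTP request against the NameNode. Here is a minimal sketch of one of them, MKDIRS, using the request module that appears later in this article; the host name, directory path, and user.name below are placeholder assumptions, not values from a real cluster:

const request = require("request");

// MKDIRS is an HTTP PUT; host, path, and user.name here are placeholders
let mkdirUrl = "http://namenode.example.com:50070/webhdfs/v1/tmp/new_dir" +
    "?op=MKDIRS&user.name=hdfs";

request.put(mkdirUrl, function(error, response, body) {
    // WebHDFS answers MKDIRS with a JSON boolean, e.g. {"boolean": true}
    if (!error && response.statusCode == 200) {
        console.log("..mkdirs result..", body);
    } else {
        console.log("..error occurred!..", error);
    }
});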

Enabling the WebHDFS API

Make sure the configuration parameter dfs.webhdfs.enabled is set to true in the hdfs-site.xml file (this config file can be found inside {your_hadoop_home_dir}/etc/hadoop):

<configuration>
    <property>
        .....
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

Connecting to WebHDFS From Node.js

I am hoping you are familiar with Node.js and package installation; if you are not, please review the basics of npm first. There is an npm module, "webhdfs," that wraps the Hadoop WebHDFS APIs. You can install it using npm:

npm install webhdfs 

After the above step, you can write a Node.js program to access this API. Below are a few steps to help you out.

Import Dependent Modules

The following external modules need to be imported:

const WebHDFS = require("webhdfs"); // WebHDFS client wrapper
const request = require("request"); // HTTP client for calling the REST API directly

Prepare Connection URL

Let us prepare the connection URL:

let url = "http://<<your hdfs host name here>>";
let port = 50070; // change this if you are using a different port
let dir_path = "<<path to hdfs folder>>";
let path = "/webhdfs/v1/" + dir_path + "?op=LISTSTATUS&user.name=hdfs";
let full_url = url + ':' + port + path;
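
For example, with a hypothetical NameNode host of namenode.example.com and a dir_path of user/hdfs/mydir, the assembled full_url would be:

http://namenode.example.com:50070/webhdfs/v1/user/hdfs/mydir?op=LISTSTATUS&user.name=hdfs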

List a Directory

Access the API and get the results:

request(full_url, function(error, response, body) {
    if (!error && response.statusCode == 200) {
        console.log("..response body..", body);
        let jsonStr = JSON.parse(body);
        let myObj = jsonStr.FileStatuses.FileStatus;
        let objLength = Object.entries(myObj).length;
        console.log("..number of files in the folder: ", objLength);
    } else {
        console.log("..error occurred!..");
    }
});

Here is the sample request and response of the LISTSTATUS API:

https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#LISTSTATUS
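
For reference, the response body that the parsing code above walks through has roughly the following shape; the values here are illustrative placeholders modeled on the documentation, not real output:

{
    "FileStatuses": {
        "FileStatus": [
            {
                "pathSuffix": "sample.txt",
                "type": "FILE",
                "length": 24930,
                "owner": "hdfs",
                "group": "supergroup",
                "permission": "644",
                "replication": 1,
                "blockSize": 134217728,
                "accessTime": 1320171722771,
                "modificationTime": 1320171722771
            }
        ]
    }
}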

Get and Display Content of an HDFS File

Assign the HDFS file name, including its path:

let hdfs_file_name = "<<HDFS file path>>";

The code below connects to HDFS using the WebHDFS client instead of the request module used in the previous section:

let hdfs = WebHDFS.createClient({
    user: "<<user>>",
    host: "<<host/IP>>",
    port: 50070, // change this if you are using a different port
    path: "/webhdfs/v1/"
});

The code below reads and displays the contents of an HDFS file:

let remoteFileStream = hdfs.createReadStream(hdfs_file_name);

remoteFileStream.on("error", function onError(err) {
    // handle errors raised while reading
    console.log("...error: ", err);
});

let dataStream = [];

remoteFileStream.on("data", function onChunk(chunk) {
    // collect each data chunk as it arrives
    dataStream.push(chunk);
    console.log("..chunk..", chunk);
});

remoteFileStream.on("finish", function onFinish() {
    // the whole file has been read
    console.log("..on finish..");
    console.log("..file data..", dataStream);
});
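
Note that dataStream collects raw Buffer chunks, so the final log prints buffers rather than text. If the file is plain text (assuming UTF-8 encoding), you can decode it inside the finish handler like this:

// inside onFinish: join the collected Buffer chunks and decode as UTF-8
let fileContent = Buffer.concat(dataStream).toString("utf8");
console.log("..file content..", fileContent);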

Here is the sample request and response of the OPEN API:

https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#OPEN

How to Read All Files in a Directory

This is not straightforward, as there is no direct method for it, but we can achieve it by combining the two operations above: listing the directory and then reading each file in it one by one.
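
Here is a rough sketch of that combination, reusing the url, port, dir_path, request, and hdfs variables defined in the earlier sections; treat it as an outline under those assumptions rather than production-ready code:

// list the directory over the raw REST API, then read each file via the client
let listUrl = url + ':' + port + "/webhdfs/v1/" + dir_path + "?op=LISTSTATUS&user.name=hdfs";

request(listUrl, function(error, response, body) {
    if (!error && response.statusCode == 200) {
        let files = JSON.parse(body).FileStatuses.FileStatus;
        files.forEach(function(file) {
            if (file.type !== "FILE") return; // skip sub-directories
            let chunks = [];
            let stream = hdfs.createReadStream("/" + dir_path + "/" + file.pathSuffix);
            stream.on("error", function(err) { console.log("...error: ", err); });
            stream.on("data", function(chunk) { chunks.push(chunk); });
            stream.on("finish", function() { // all chunks of this file received
                console.log("..file..", file.pathSuffix, Buffer.concat(chunks).toString("utf8"));
            });
        });
    } else {
        console.log("..error occurred!..");
    }
});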

Conclusion

I hope this gave you an idea of how to connect to HDFS and perform basic operations using Node.js and the webhdfs module. All the best!
