Simple Java Program to Append to a File in HDFS

Get the code and instructions needed to build a Java program to append to a file in HDFS, using Maven as the build tool.

In this article, I will present a Java program that appends to a file in HDFS, using Maven as the build tool.

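First, we need to add the Hadoop dependencies to the pom.xml (for example, the org.apache.hadoop:hadoop-client artifact, in a version that matches your cluster).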

Now, we need to import the following classes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.*;

We will use the org.apache.hadoop.conf.Configuration class to load the file system settings from the configuration files of the installed Hadoop cluster.

Let's now start with configuring the file system:

public FileSystem configureFileSystem(String coreSitePath, String hdfsSitePath) {
    FileSystem fileSystem = null;
    try {
        Configuration conf = new Configuration();
        // Enable append programmatically
        conf.setBoolean("dfs.support.append", true);
        // Load the cluster settings from core-site.xml and hdfs-site.xml
        Path coreSite = new Path(coreSitePath);
        Path hdfsSite = new Path(hdfsSitePath);
        conf.addResource(coreSite);
        conf.addResource(hdfsSite);
        fileSystem = FileSystem.get(conf);
    } catch (IOException ex) {
        // The caller receives null if the file system could not be created
        System.err.println("Error occurred while configuring FileSystem: " + ex.getMessage());
    }
    return fileSystem;
}

Make sure that the property dfs.support.append in hdfs-site.xml is set to true.

You can either set it manually by editing the hdfs-site.xml file or programmatically using:

conf.setBoolean("dfs.support.append", true);
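It helps to see why this flag matters to the program: a configuration lookup that finds no value returns null, and parsing null as a boolean yields false, so a check on this property treats append as disabled unless the property was set somewhere. The sketch below shows that parsing rule in isolation. It uses a plain Map as a stand-in for the Hadoop configuration, and the class and method names are hypothetical, not part of the original program.

```java
import java.util.HashMap;
import java.util.Map;

public class AppendFlagSketch {

    // Stand-in for a configuration lookup: Map.get returns null when the key was never set.
    static boolean isAppendEnabled(Map<String, String> settings) {
        String raw = settings.get("dfs.support.append");
        // parseBoolean returns true only for "true" (ignoring case); null yields false
        return Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        System.out.println(isAppendEnabled(settings)); // prints false: property not set

        settings.put("dfs.support.append", "true");
        System.out.println(isAppendEnabled(settings)); // prints true
    }
}
```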

Now that the file system is configured, we can access the files stored in HDFS.

Let's start with appending to a file in HDFS.

public String appendToFile(FileSystem fileSystem, String content, String dest) throws IOException {

    Path destPath = new Path(dest);
    if (!fileSystem.exists(destPath)) {
        System.err.println("File doesn't exist: " + dest);
        return "Failure";
    }

    // getBoolean treats an unset property as false instead of parsing a null string
    boolean isAppendable = fileSystem.getConf().getBoolean("dfs.support.append", false);
    if (!isAppendable) {
        System.err.println("Please set the dfs.support.append property to true");
        return "Failure";
    }

    // try-with-resources closes the writer and the underlying HDFS stream, even on failure
    try (FSDataOutputStream fsAppend = fileSystem.append(destPath);
         PrintWriter writer = new PrintWriter(fsAppend)) {
        writer.append(content);
        writer.flush();    // move buffered characters into the HDFS stream
        fsAppend.hflush(); // make the appended data visible to new readers
    }
    return "Success";
}
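One caveat with PrintWriter is that it encodes text with the platform default charset and does not throw IOException on a failed write (errors are only visible through checkError()). As an alternative sketch, the helper below writes UTF-8 text to any OutputStream and lets IOException propagate. The names AppendSketch, appendText, and appendTwiceAndRead are hypothetical. Since FSDataOutputStream is an OutputStream, the stream returned by fileSystem.append(destPath) could be passed to appendText; here a local temporary file opened in append mode stands in for HDFS so that the example runs without a cluster.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class AppendSketch {

    // Hypothetical helper: writes content as UTF-8 and flushes; the caller closes the stream.
    static void appendText(OutputStream out, String content) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(content);
        writer.flush(); // push the encoded bytes into the underlying stream
    }

    // Local stand-in for two HDFS appends: reopens the same file in append mode each time.
    static String appendTwiceAndRead(String first, String second) throws IOException {
        Path file = Files.createTempFile("append-sketch", ".txt");
        try {
            for (String part : new String[] {first, second}) {
                try (OutputStream out = new FileOutputStream(file.toFile(), true)) {
                    appendText(out, part);
                }
            }
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // The second write lands after the first instead of overwriting it.
        System.out.print(appendTwiceAndRead("first line\n", "second line\n"));
    }
}
```

With a real cluster, the same helper would sit inside a try-with-resources block around fileSystem.append(destPath), so that the HDFS stream is closed even when the write fails.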

To see whether the data has been correctly written to HDFS, let's write a method to read from HDFS and return the content as a String.

public String readFromHdfs(FileSystem fileSystem, String hdfsFilePath) {
    Path hdfsPath = new Path(hdfsFilePath);
    StringBuilder fileContent = new StringBuilder();
    // try-with-resources closes the reader and the underlying HDFS stream
    try (BufferedReader bfr = new BufferedReader(new InputStreamReader(fileSystem.open(hdfsPath)))) {
        String str;
        while ((str = bfr.readLine()) != null) {
            fileContent.append(str).append('\n');
        }
    } catch (IOException ex) {
        System.err.println("Could not read from HDFS: " + ex.getMessage());
    }
    return fileContent.toString();
}
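When comparing the returned string with what was appended, keep in mind that the loop above adds a newline after every line, so the result ends with "\n" even if the file does not, and "\r\n" endings come back as "\n". The sketch below isolates the same loop in a helper that takes a plain InputStream. The names ReadSketch and readAll are hypothetical, and decoding as UTF-8 is my assumption about the file's encoding. Since FSDataInputStream is an InputStream, the stream returned by fileSystem.open(hdfsPath) could be passed in; an in-memory stream stands in for HDFS here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadSketch {

    // Hypothetical helper: reads every line and terminates each one with "\n".
    static String readAll(InputStream in) throws IOException {
        StringBuilder content = new StringBuilder();
        // Closing the reader also closes the stream that was passed in
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // The input mixes "\r\n" with "\n" and has no trailing newline.
        byte[] data = "alpha\r\nbeta\ngamma".getBytes(StandardCharsets.UTF_8);
        String result = readAll(new ByteArrayInputStream(data));
        System.out.println(result.equals("alpha\nbeta\ngamma\n")); // prints true
    }
}
```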

Now that we have written to and read from the file in HDFS, it's time to close the file system.

public void closeFileSystem(FileSystem fileSystem){
    try {
        fileSystem.close();
    }
    catch (IOException ex){
        System.out.println("----------Could not close the FileSystem----------");
    }
}

Before executing the code, you should have Hadoop running on your system.

Go to HADOOP_HOME and run the following command:

./sbin/start-all.sh

For the complete program, refer to my GitHub repository.

Happy coding!

Topics:
hdfs, java program, database, tutorial, maven, appending

Published at DZone with permission of
