
Working With the Hadoop File System API


Learn how to use the File System API to create and write to a file in HDFS and to get an application to read a file from HDFS and write it back to the local file system.


Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in a number of ways. In this tutorial, we will use the File System API to create and write a file in HDFS, and then have an application read the file from HDFS and write it back to the local file system.

Let's get started!

1. Include the Dependencies

For an sbt project, we first need to include the Hadoop dependencies in build.sbt:

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-common" % "2.8.0",
  "org.apache.hadoop" % "hadoop-hdfs" % "2.8.0"
)

2. Configure

The next step is to configure the File System.

// Imports used across the snippets below
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * This method configures the file system.
 * @param coreSitePath path to core-site.xml in the Hadoop installation
 * @param hdfsSitePath path to hdfs-site.xml in the Hadoop installation
 * @return a configured Hadoop FileSystem instance (null if configuration fails)
 */
public FileSystem configureFilesystem(String coreSitePath, String hdfsSitePath) {
    FileSystem fileSystem = null;
    try {
        Configuration conf = new Configuration();
        // Load the cluster settings from the Hadoop configuration files
        conf.addResource(new Path(coreSitePath));
        conf.addResource(new Path(hdfsSitePath));
        fileSystem = FileSystem.get(conf);
        return fileSystem;
    } catch (Exception ex) {
        System.out.println("Error occurred while configuring the file system");
        ex.printStackTrace();
        return fileSystem;
    }
}
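For reference, a hypothetical call site is sketched below. The configuration file paths are assumptions and must point at your own Hadoop installation; if the files cannot be found, `FileSystem.get(conf)` falls back to the default `fs.defaultFS` of `file:///`, i.e. the local file system.

```java
FileSystem fileSystem = configureFilesystem(
        "/usr/local/hadoop/etc/hadoop/core-site.xml",   // assumed path
        "/usr/local/hadoop/etc/hadoop/hdfs-site.xml");  // assumed path

// Sanity check: "hdfs" means we are talking to the cluster,
// "file" means we fell back to the local file system.
System.out.println(fileSystem.getUri().getScheme());
```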

3. Read From/Write to HDFS

After configuring the File System, we are ready to start reading from HDFS or writing to HDFS.

Let's start by writing something to HDFS from our local file system. To perform this operation, we will use the copyFromLocalFile(Path src, Path dst) method of the File System API.

/**
 * Copies a local file into HDFS.
 * @param fileSystem the configured Hadoop FileSystem instance
 * @param sourcePath the sample input file on the local file system
 * @param destinationPath the path on HDFS where the input file will be written
 * @return Constants.SUCCESS on success, Constants.FAILURE otherwise
 */
public String writeToHDFS(FileSystem fileSystem, String sourcePath, String destinationPath) {
    try {
        Path inputPath = new Path(sourcePath);
        Path outputPath = new Path(destinationPath);
        fileSystem.copyFromLocalFile(inputPath, outputPath);
        return Constants.SUCCESS;
    } catch (IOException ex) {
        System.out.println("Some exception occurred while writing the file to HDFS");
        return Constants.FAILURE;
    }
}

Next, we can read a file from HDFS and store it on our local file system. To perform this operation, we can use the copyToLocalFile(Path src, Path dst) method of the File System API.

/**
 * Copies a file from HDFS to the local file system.
 * @param fileSystem the configured Hadoop FileSystem instance
 * @param hdfsStorePath the path on HDFS where the sample input file is present
 * @param localSystemPath the location on the local file system where the data read from HDFS will be written
 * @return Constants.SUCCESS on success, Constants.FAILURE otherwise
 */
public String readFileFromHdfs(FileSystem fileSystem, String hdfsStorePath, String localSystemPath) {
    try {
        Path hdfsPath = new Path(hdfsStorePath);
        Path localPath = new Path(localSystemPath);
        fileSystem.copyToLocalFile(hdfsPath, localPath);
        return Constants.SUCCESS;
    } catch (IOException ex) {
        System.out.println("Some exception occurred while reading the file from HDFS");
        return Constants.FAILURE;
    }
}
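The copy methods above move whole files. When finer control is needed, the File System API also exposes streaming reads and writes through FSDataOutputStream and FSDataInputStream. A minimal sketch follows; the method and variable names are our own, not part of the original tutorial.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Write a string directly to a new HDFS file, then read it back line by line.
public void streamExample(FileSystem fileSystem, String hdfsPath) throws Exception {
    Path path = new Path(hdfsPath);
    // create(path, true) overwrites the file if it already exists
    try (FSDataOutputStream out = fileSystem.create(path, true)) {
        out.writeBytes("hello from the File System API\n");
    }
    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(fileSystem.open(path)))) {
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
    }
}
```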

4. Close It Out

The final step is to close the File System after we are done reading from HDFS or writing to HDFS.

/**
 * Closes the FileSystem instance once all reads and writes are done.
 * @param fileSystem the Hadoop FileSystem instance to close
 */
public void closeFileSystem(FileSystem fileSystem) {
    try {
        fileSystem.close();
    } catch (Exception ex) {
        System.out.println("Unable to close the Hadoop file system: " + ex);
    }
}
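Putting the four steps together, a driver might look like the following sketch. The enclosing class name, all paths, and the Constants class referenced in the snippets above are assumptions, not part of a published API.

```java
public static void main(String[] args) {
    // Hypothetical class holding the methods defined in this tutorial
    HdfsExample example = new HdfsExample();

    FileSystem fileSystem = example.configureFilesystem(
            "/usr/local/hadoop/etc/hadoop/core-site.xml",   // assumed path
            "/usr/local/hadoop/etc/hadoop/hdfs-site.xml");  // assumed path

    // Copy a local file into HDFS, then copy it back out
    example.writeToHDFS(fileSystem, "/tmp/sample.txt", "/user/hadoop/sample.txt");
    example.readFileFromHdfs(fileSystem, "/user/hadoop/sample.txt", "/tmp/sample-copy.txt");

    example.closeFileSystem(fileSystem);
}
```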




Published at DZone with permission of Sangeeta Gulia, DZone MVB. See the original article here.

