Working With the Hadoop File System API

Learn how to use the File System API to create and write to a file in HDFS and to get an application to read a file from HDFS and write it back to the local file system.

By Sangeeta Gulia · Apr. 19, 17 · Big Data Zone · Tutorial

There are several ways to read data from and write data to the Hadoop Distributed File System (HDFS). This tutorial uses the File System API to write a file to HDFS, and then to read a file from HDFS and write it back to the local file system.

Let's get started!

1. Include the Dependencies

We first need to include the Hadoop dependencies. For an sbt project:

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-common" % "2.8.0",
  "org.apache.hadoop" % "hadoop-hdfs" % "2.8.0"
)

2. Configure

The next step is to configure the File System.

// Imports used by the methods in this article
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

    /**
     * Configures the file system from the cluster's configuration files.
     * @param coreSitePath local path to Hadoop's core-site.xml
     * @param hdfsSitePath local path to Hadoop's hdfs-site.xml
     * @return the configured FileSystem instance, or null if configuration failed
     */
    public FileSystem configureFilesystem(String coreSitePath, String hdfsSitePath) {
        try {
            Configuration conf = new Configuration();
            // Load the cluster settings (such as fs.defaultFS) from the two site files.
            conf.addResource(new Path(coreSitePath));
            conf.addResource(new Path(hdfsSitePath));

            return FileSystem.get(conf);
        } catch (Exception ex) {
            System.out.println("Error occurred while configuring the file system");
            ex.printStackTrace();
            return null; // callers must check for null before using the result
        }
    }
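
A mistyped path to either site file is easy to miss, because the symptom may only be a FileSystem that does not point at the cluster you expected. As a minimal sketch, the paths can be checked with plain JDK calls before they are handed to configureFilesystem. The class name HadoopConfigFiles and the method missingFiles are hypothetical and are not part of Hadoop or of the original article.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): pre-flight check for config file paths. */
final class HadoopConfigFiles {
    private HadoopConfigFiles() {}

    /** Returns the given paths that are not readable regular files, in input order. */
    static List<String> missingFiles(String... configPaths) {
        List<String> missing = new ArrayList<>();
        for (String configPath : configPaths) {
            Path file = Paths.get(configPath);
            // A directory or an unreadable file is as unusable as a missing one.
            if (!Files.isRegularFile(file) || !Files.isReadable(file)) {
                missing.add(configPath);
            }
        }
        return missing;
    }
}
```

A caller could run HadoopConfigFiles.missingFiles(coreSitePath, hdfsSitePath) first and stop with a clear message if the returned list is not empty.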

3. Read From/Write to HDFS

After configuring the File System, we are ready to start reading from HDFS or writing to HDFS.

Let's start by writing something to HDFS from our local file system. To perform this operation, we will use the void copyFromLocalFile(Path src, Path dst) method of the File System API.

    /**
     * Copies a file from the local file system to HDFS.
     * @param fileSystem the Hadoop FileSystem instance
     * @param sourcePath path of the local sample input file to write to HDFS
     * @param destinationPath path on HDFS where the sample input file will be written
     * @return Constants.SUCCESS if the copy succeeded, Constants.FAILURE otherwise
     */
    public String writeToHDFS(FileSystem fileSystem, String sourcePath, String destinationPath) {
        try {
            Path inputPath = new Path(sourcePath);
            Path outputPath = new Path(destinationPath);
            fileSystem.copyFromLocalFile(inputPath, outputPath);
            // Constants is a project class holding the status values; it is not shown in this article.
            return Constants.SUCCESS;
        } catch (IOException ex) {
            System.out.println("Some exception occurred while writing file to HDFS: " + ex);
            return Constants.FAILURE;
        }
    }
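
The method above returns Constants.SUCCESS or Constants.FAILURE, but the article does not show the Constants class. A minimal stand-in could look like the sketch below, which lets the snippets compile. The String values are assumptions chosen for illustration, not the author's actual definitions.

```java
/** Hypothetical stand-in for the Constants class referenced by the article's methods. */
final class Constants {
    // Assumed values: the original article does not show what these strings contain.
    static final String SUCCESS = "success";
    static final String FAILURE = "failure";

    private Constants() {
        // Holder for constants only; not meant to be instantiated.
    }
}
```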

Next, we can read a file from HDFS and store it on our local file system. To perform this operation, we can use the void copyToLocalFile(Path src, Path dst) method of the File System API.

    /**
     * Copies a file from HDFS to the local file system.
     * @param fileSystem the Hadoop FileSystem instance
     * @param hdfsStorePath path on HDFS where the sample input file is present
     * @param localSystemPath path on the local system to which the data read from HDFS will be written
     * @return Constants.SUCCESS if the copy succeeded, Constants.FAILURE otherwise
     */
    public String readFileFromHdfs(FileSystem fileSystem, String hdfsStorePath, String localSystemPath) {
        try {
            Path hdfsPath = new Path(hdfsStorePath);
            Path localPath = new Path(localSystemPath);
            fileSystem.copyToLocalFile(hdfsPath, localPath);
            return Constants.SUCCESS;
        } catch (IOException ex) {
            System.out.println("Some exception occurred while reading file from HDFS: " + ex);
            return Constants.FAILURE;
        }
    }
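
The two copy methods move whole files between the local disk and HDFS. To create a file and write to it directly, or to read its contents without a local copy, the stream-based create and open methods of FileSystem can be used instead. The following is a minimal sketch under that assumption. The class HdfsStreams and its method names are hypothetical, and readText buffers the whole file in memory, so it only suits small files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/** Hypothetical helper: stream-based alternative to the copy methods. */
final class HdfsStreams {
    private HdfsStreams() {}

    /** Creates the file (overwriting any existing one) and writes the text as UTF-8. */
    static void writeText(FileSystem fs, String file, String text) throws IOException {
        // The second argument of create(...) asks Hadoop to overwrite an existing file.
        try (FSDataOutputStream out = fs.create(new Path(file), true)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Reads the whole file into memory and decodes it as UTF-8. */
    static String readText(FileSystem fs, String file) throws IOException {
        try (FSDataInputStream in = fs.open(new Path(file));
             ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            // Copy in 4 KB chunks; "false" leaves closing to try-with-resources.
            IOUtils.copyBytes(in, buffer, 4096, false);
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Both methods let IOException propagate, so the caller decides how to report a failure instead of receiving a status string.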

4. Close It Out

The final step is to close the File System after we are done reading from HDFS or writing to HDFS.

    /**
     * Closes the FileSystem instance.
     * @param fileSystem the Hadoop FileSystem instance to close
     */
    public void closeFileSystem(FileSystem fileSystem) {
        try {
            fileSystem.close();
        } catch (Exception ex) {
            System.out.println("Unable to close Hadoop filesystem : " + ex);
        }
    }
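
Because FileSystem implements java.io.Closeable, the close step can also be left to a try-with-resources block. The sketch below ties the steps together under that assumption: it copies a local file to HDFS and back, then closes the file system automatically. The class HdfsRoundTrip and its method are hypothetical. It uses FileSystem.newInstance rather than FileSystem.get so that closing the instance should not affect other code that shares a cached FileSystem.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical example class: write to HDFS, read back, and close in one place. */
final class HdfsRoundTrip {
    private HdfsRoundTrip() {}

    /**
     * Copies localSource to hdfsPath, then copies hdfsPath back to localTarget.
     * The FileSystem is closed when the try block exits, even if a copy fails.
     */
    static void roundTrip(Configuration conf, String localSource,
                          String hdfsPath, String localTarget) throws IOException {
        try (FileSystem fileSystem = FileSystem.newInstance(conf)) {
            fileSystem.copyFromLocalFile(new Path(localSource), new Path(hdfsPath));
            fileSystem.copyToLocalFile(new Path(hdfsPath), new Path(localTarget));
        }
    }
}
```

The Configuration passed in would be built as in step 2, by adding core-site.xml and hdfs-site.xml as resources.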

Published at DZone with permission of Sangeeta Gulia, DZone MVB. See the original article here.
