Hadoop: Checking if a Machine Can Communicate with HDFS

I wanted to write a little program to check that one machine could communicate with an HDFS server running on another, so I adapted some code from the Hadoop wiki as follows:

package org.playground;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
 
import java.io.IOException;
 
public class HadoopDFSFileReadWrite {
 
    static void printAndExit(String str) {
        System.err.println( str );
        System.exit(1);
    }
 
    public static void main (String[] argv) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/Users/markneedham/Downloads/core-site.xml"));
 
        FileSystem fs = FileSystem.get(conf);
 
        Path inFile = new Path("hdfs://192.168.0.11/user/markneedham/explore.R");
        Path outFile = new Path("hdfs://192.168.0.11/user/markneedham/output-" + System.currentTimeMillis());
 
        // Check if input/output are valid
        if (!fs.exists(inFile))
            printAndExit("Input file not found");
        if (!fs.isFile(inFile))
            printAndExit("Input should be a file");
        if (fs.exists(outFile))
            printAndExit("Output already exists");
 
        // Read from and write to new file
        byte[] buffer = new byte[256];
        try ( FSDataInputStream in = fs.open( inFile ); FSDataOutputStream out = fs.create( outFile ) )
        {
            int bytesRead = 0;
            while ( (bytesRead = in.read( buffer )) > 0 )
            {
                out.write( buffer, 0, bytesRead );
            }
        }
        catch ( IOException e )
        {
            System.out.println( "Error while copying file" );
        }
    }
}
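
If you don't have a core-site.xml to hand, you can also point the client at the NameNode programmatically. Here's a minimal sketch of my own (not part of the program above), assuming the NameNode is listening on its default RPC port of 8020 on 192.168.0.11; adjust fs.defaultFS to match your cluster:

package org.playground;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
 
import java.io.IOException;
 
public class HadoopConnectionCheck {
 
    public static void main (String[] argv) throws IOException {
        Configuration conf = new Configuration();
        // Point directly at the NameNode instead of loading core-site.xml.
        // 8020 is the usual NameNode RPC port; change it if your cluster differs.
        conf.set("fs.defaultFS", "hdfs://192.168.0.11:8020");
 
        FileSystem fs = FileSystem.get(conf);
 
        // A cheap round trip that proves the client can actually talk to the cluster
        Path home = fs.getHomeDirectory();
        System.out.println("Connected. Home directory: " + home);
        System.out.println("Home directory exists: " + fs.exists(home));
    }
}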

I initially thought I only had the following dependency in my POM file:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.0</version>
</dependency>

But when I ran the program I got the following exception:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.fs.FSOutputSummer.<init>(Ljava/util/zip/Checksum;II)V
at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1553)
at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1582)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1614)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
at org.playground.HadoopDFSFileReadWrite.main(HadoopDFSFileReadWrite.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

Following the stack trace, I realised I'd made a mistake and had accidentally pulled in a dependency on hadoop-hdfs 2.4.1, which doesn't match the hadoop-common 2.7.0 above. If we didn't have the hadoop-hdfs dependency at all, we'd see this error instead:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at org.playground.HadoopDFSFileReadWrite.main(HadoopDFSFileReadWrite.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
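
Before fixing the POM it's worth confirming which jars the clashing classes are actually coming from. mvn dependency:tree shows this from the build side; the following minimal sketch (a diagnostic of my own, not part of the program above) shows it at runtime by asking the class loader where each class was loaded from:

package org.playground;
 
import org.apache.hadoop.fs.FSOutputSummer;
import org.apache.hadoop.hdfs.DistributedFileSystem;
 
import java.security.CodeSource;
 
public class WhichJar {
 
    public static void main (String[] argv) {
        // With mixed Hadoop versions on the classpath these two lines will
        // point at jars with different version numbers.
        printLocation(FSOutputSummer.class);
        printLocation(DistributedFileSystem.class);
    }
 
    private static void printLocation(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        System.out.println(clazz.getName() + " -> " +
                (source == null ? "unknown" : source.getLocation()));
    }
}

If the two locations report different version numbers, that mismatch is what's behind the NoSuchMethodError.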

Now let's add the hadoop-hdfs dependency at the matching version and make sure it all works as expected:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.0</version>
    <exclusions>
        <exclusion>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

When we run that, a new file is created in HDFS on the other machine with the current timestamp:

$ date +%s000
1446336801000
 
$ hdfs dfs -ls
...
-rw-r--r--   3 markneedham supergroup       9249 2015-11-01 00:13 output-1446337098257
...
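
To double-check from the Java side as well, we can read the file straight back off HDFS. This is another minimal sketch; the output path below is just the timestamped name from the listing above, so substitute whatever your run produced:

package org.playground;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
 
import java.io.IOException;
 
public class HadoopDFSFileReadBack {
 
    public static void main (String[] argv) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/Users/markneedham/Downloads/core-site.xml"));
 
        FileSystem fs = FileSystem.get(conf);
 
        // Example path: substitute the timestamped name your copy actually produced
        Path outFile = new Path("hdfs://192.168.0.11/user/markneedham/output-1446337098257");
 
        // Confirm the copy landed and report its size
        FileStatus status = fs.getFileStatus(outFile);
        System.out.println(outFile + " is " + status.getLen() + " bytes");
 
        // Print the first few bytes back to prove the contents round-tripped
        byte[] buffer = new byte[256];
        try ( FSDataInputStream in = fs.open( outFile ) )
        {
            int bytesRead = in.read( buffer );
            System.out.write( buffer, 0, Math.max( bytesRead, 0 ) );
            System.out.println();
        }
    }
}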

Published at DZone with permission of Mark Needham, DZone MVB.