Reading Large Lines Slower in JDK 7 and JDK 8

I recently ran into a case where a particular task (LineContainsRegExp) in an Apache Ant build file ran considerably slower in JDK 7 and JDK 8 than it did in JDK 6 for extremely long lines. Using a simple example adapted from the Java code behind the LineContainsRegExp task, I was able to determine that the slowness has nothing to do with the regular expression, but rather with reading characters from a file. The remainder of the post demonstrates this.

For my simple test, I first wrote a small Java class to write out a file that includes a line with as many characters as specified on the command line. The simple class, FileMaker, is shown next:

FileMaker.java

import static java.lang.System.out;

import java.io.FileWriter;

/**
 * Writes a file with a line that contains the number of characters provided.
 */
public class FileMaker
{
   /**
    * Create a file with a line that has the number of characters specified.
    *
    * @param arguments Command-line arguments where the first argument is the
    *    name of the file to be written and the second argument is the number
    *   of characters to be written on a single line in the output file.
    */
   public static void main(final String[] arguments)
   {
      if (arguments.length > 1)
      {
         final String fileName = arguments[0];
         final int maxRowSize = Integer.parseInt(arguments[1]);
         try
         {
            final FileWriter fileWriter = new FileWriter(fileName);
            for (int count = 0; count < maxRowSize; count++)
            {
               fileWriter.write('.');
            }
            fileWriter.flush();
            fileWriter.close();
         }
         catch (Exception ex)
         {
            out.println("ERROR: Cannot write file '" + fileName + "': " + ex.toString());
         }
      }
      else
      {
         out.println("USAGE: java FileMaker <fileName> <maxRowSize>");
         System.exit(-1);
      }
   }
}

The above Java class exists solely to generate a file with a line that has as many characters as specified. As listed, it writes no trailing \n, so the file holds exactly that many characters; the loop in Main below also counts the final end-of-stream read, so it reports one more than that. The next class demonstrates the difference in runtime behavior between Java 6 and Java 7. The code for this Main class is adapted from the Ant classes that perform the file reading used by LineContainsRegExp, without the regular expression matching. In other words, regular expression support is not included in my example, yet this class executes much more quickly for very long lines when run in Java 6 than when run in Java 7 or Java 8.

Main.java

import static java.lang.System.out;

import java.io.IOException;
import java.io.FileReader;
import java.io.Reader;
import java.util.concurrent.TimeUnit;

/**
 * Adapted from and intended to represent the basic character reading from file
 * used by the Apache Ant class org.apache.tools.ant.filters.LineContainsRegExp.
 */
public class Main
{
   private Reader in;
   private String line;

   public Main(final String nameOfFile)
   {
      if (nameOfFile == null || nameOfFile.isEmpty())
      {
         throw new IllegalArgumentException("ERROR: No file name provided.");
      }
      try
      {
         in = new FileReader(nameOfFile);
      }
      catch (Exception ex)
      {
         out.println("ERROR: " + ex.toString());
         System.exit(-1);
      }
   }
   

   /**
    * Read a line of characters through '\n' or end of stream and return that
    * line of characters with '\n'; adapted from readLine() method of Apache Ant
    * class org.apache.tools.ant.filters.BaseFilterReader.
    */
   protected final String readLine() throws IOException
   {
      int ch = in.read();

      if (ch == -1)
      {
         return null;
      }
        
      final StringBuilder line = new StringBuilder();

      while (ch != -1)
      {
         line.append ((char) ch);
         if (ch == '\n')
         {
            break;
         }
         ch = in.read();
      }

      return line.toString();
   }

   /**
    * Provides the next character in the stream; adapted from the method read()
    * in the Apache Ant class org.apache.tools.ant.filters.LineContainsRegExp.
    */
   public int read() throws IOException
   {
      int ch = -1;
 
      if (line != null)
      {
         ch = line.charAt(0);
         if (line.length() == 1)
         {
            line = null;
         }
         else
         {
            line = line.substring(1);
         }
      }
      else
      {
         for (line = readLine(); line != null; line = readLine())
         {
            if (line != null)
            {
               return read();
            }
         }
      }
      return ch;
   }

   /**
    * Process provided file and read characters from that file and display
    * those characters on standard output.
    *
    * @param arguments Command-line arguments; expect one argument which is the
    *    name of the file from which characters should be read.
    */
   public static void main(final String[] arguments) throws Exception
   {
      if (arguments.length > 0)
      {
         final long startTime = System.currentTimeMillis();
         out.println("Processing file '" + arguments[0] + "'...");
         final Main instance = new Main(arguments[0]);
         int characterInt = 0;
         int totalCharacters = 0;
         while (characterInt != -1)
         {
            characterInt = instance.read();
            totalCharacters++;
         }
         final long endTime = System.currentTimeMillis();
         out.println(
              "Elapsed Time of "
            + TimeUnit.MILLISECONDS.toSeconds(endTime - startTime)
            + " seconds for " + totalCharacters + " characters.");
      }
      else
      {
         out.println("ERROR: No file name provided.");
      }
   }
}

The runtime performance difference between Java 6 and Java 7 or Java 8 becomes more pronounced as the lines get longer. The next screen snapshot demonstrates running the example in Java 6 (indicated by "jdk1.6" being part of the path name of the java launcher) and then in Java 8 (no explicit path provided because Java 8 is my default JRE) against a freshly generated file called dustin.txt that includes a line with 1 million (plus one) characters.

Although a Java 7 example is not shown in the screen snapshot above, my tests have shown that Java 7 is similarly slow to Java 8 when processing very long lines. I have seen this in both Windows and RedHat Linux JVMs. As the example indicates, the Java 6 version, even for a million characters in a line, reads the file in what rounds to 0 seconds. When the same compiled-for-Java-6 class file is executed with Java 8, the average time to handle the 1 million characters is over 150 seconds (2 1/2 minutes). The same slowness applies when the class is executed in Java 7, and it persists even when the class is compiled with JDK 7 or JDK 8.
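To narrow down where the time goes, one option is to time the string handling separately from the file reading. The sketch below is a hypothetical harness (SubstringTiming and its method names are not part of the original test) that consumes an in-memory string one character at a time, once with the charAt(0) and substring(1) pattern used by Main.read() and once with a plain index. It is a crude timing with no warm-up, and the default length of 100,000 is an arbitrary choice, so treat the numbers as indicative only. It sticks to Java 6 language features so the same class file can be run on each JRE.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical timing harness (not part of the original test): consumes an
 * in-memory string one character at a time, with no file I/O involved.
 */
class SubstringTiming
{
   /** Builds a string of the requested number of '.' characters. */
   static String dots(final int length)
   {
      final char[] characters = new char[length];
      Arrays.fill(characters, '.');
      return new String(characters);
   }

   /** Mirrors Main.read(): look at charAt(0), then keep substring(1). */
   static int consumeWithSubstring(final String text)
   {
      String remaining = text.isEmpty() ? null : text;
      int count = 0;
      while (remaining != null)
      {
         remaining.charAt(0);
         count++;
         remaining = (remaining.length() == 1) ? null : remaining.substring(1);
      }
      return count;
   }

   /** Walks the same string with an index and never creates a new string. */
   static int consumeWithIndex(final String text)
   {
      int count = 0;
      for (int index = 0; index < text.length(); index++)
      {
         text.charAt(index);
         count++;
      }
      return count;
   }

   public static void main(final String[] arguments)
   {
      // Assumed default; pass a larger length on the command line to stress it.
      final int length = arguments.length > 0 ? Integer.parseInt(arguments[0]) : 100000;
      final String text = dots(length);

      long start = System.nanoTime();
      consumeWithSubstring(text);
      System.out.println("substring(1): "
         + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start) + " ms");

      start = System.nanoTime();
      consumeWithIndex(text);
      System.out.println("index:        "
         + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start) + " ms");
   }
}
```

If the substring(1) timing grows sharply with the length on Java 7 and Java 8 but not on Java 6, the string handling rather than the FileReader would be the place to look.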

Java 7 and Java 8 become disproportionately slower at reading file characters as the number of characters on a line increases. When I raise the 1 million character line to 10 million characters as shown in the next screen snapshot, Java 6 still reads those very fast (still rounded to 0 seconds), but Java 8 requires over 5 hours to complete the task! Going from over 150 seconds to over 5 hours is roughly a 120-fold increase for a line 10 times as long, which is closer to quadratic than to truly exponential growth.
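As a rough check on that growth rate, the two Java 8 timings quoted above can be turned into an estimated exponent k, assuming the time behaves like a power law (time proportional to n^k). Both timings are lower bounds ("over 150 seconds", "over 5 hours"), and two data points cannot distinguish one growth model from another, so this is only a back-of-the-envelope sketch. GrowthEstimate is a name made up for this illustration.

```java
/**
 * Estimates the exponent k in time ~ n^k from two (line length, seconds)
 * measurements. Illustrative only; the inputs are approximate lower bounds.
 */
class GrowthEstimate
{
   /** Returns k such that (n2 / n1)^k equals t2 / t1. */
   static double exponent(final double n1, final double t1,
                          final double n2, final double t2)
   {
      return Math.log(t2 / t1) / Math.log(n2 / n1);
   }

   public static void main(final String[] arguments)
   {
      // 1 million characters: about 150 seconds; 10 million: about 5 hours.
      final double k = exponent(1e6, 150.0, 1e7, 5 * 3600.0);
      System.out.println("Estimated exponent: " + k);
   }
}
```

With these inputs the estimate comes out just above 2, which is what a cost proportional to the square of the line length would produce.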

I don't know why Java 7 and Java 8 read a very long line from a file so much more slowly than Java 6 does, and I hope that someone else can explain this. While I have several ideas for working around the issue, I would like to understand the cause. Here are the observations that can be made based on my testing:

  • The issue appears to be a runtime (JRE) issue rather than a compile-time (JDK) issue, because even the file-reading class compiled with JDK 6 runs significantly slower in JRE 7 and JRE 8.
  • Both the Windows 8 and RedHat Linux JRE environments consistently indicated that the file reading is dramatically slower for very large lines in Java 7 and in Java 8 than in Java 6.
  • Processing time for reading very long lines in Java 7 and Java 8 grows much faster than the number of characters in the line: a line 10 times as long took roughly 120 times as long to read.
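One possible workaround can be sketched as follows. It rests on an assumption that the testing above does not establish: that the cost comes from read() calling line.substring(1) once per character. As of Java 7 update 6, String.substring copies the remaining characters into a new string instead of sharing the original character array, so that pattern copies almost the whole line again for every character returned. The hypothetical IndexedLineReader below (not an Ant class) keeps a position within the current line instead, and otherwise follows the same read() and readLine() contract as Main. It is written with Java 6 language features only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical variant of Main's reading logic that tracks a position in the
 * current line rather than shrinking the line with substring(1).
 */
class IndexedLineReader
{
   private final Reader in;
   private String line;
   private int position;

   IndexedLineReader(final Reader reader)
   {
      this.in = reader;
   }

   /** Returns text through '\n' or end of stream, or null at end of stream. */
   private String readLine() throws IOException
   {
      int ch = in.read();
      if (ch == -1)
      {
         return null;
      }
      final StringBuilder builder = new StringBuilder();
      while (ch != -1)
      {
         builder.append((char) ch);
         if (ch == '\n')
         {
            break;
         }
         ch = in.read();
      }
      return builder.toString();
   }

   /** Returns the next character, or -1 at end of stream. */
   public int read() throws IOException
   {
      if (line == null || position >= line.length())
      {
         line = readLine();
         position = 0;
         if (line == null)
         {
            return -1;
         }
      }
      return line.charAt(position++);
   }

   public static void main(final String[] arguments) throws IOException
   {
      // A StringReader stands in for the FileReader used in Main.
      final IndexedLineReader reader =
         new IndexedLineReader(new StringReader("ab\ncd"));
      int count = 0;
      while (reader.read() != -1)
      {
         count++;
      }
      System.out.println("Read " + count + " characters.");
   }
}
```

Each call to read() here does a constant amount of work once the line is in memory, so the total cost should grow linearly with the line length regardless of how substring behaves on a given JRE.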

Published at DZone with permission of Dustin Marx, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
