{{announcement.body}}
{{announcement.title}}

# Longest Common Substring Algorithm in Java

DZone 's Guide to

# Longest Common Substring Algorithm in Java

· Java Zone ·
Free Resource

Comment (21)

Save
{{ articles.views | formatCount}} Views

For jetwick I needed yet another string algorithm and stumbled over this cool and common problem: trying to find the longest substring of two strings. Be sure that you understand the difference to the LC sequence problem.

For example if we have two strings:

and

I’m peter goliswi

The algorithm should print out ‘ peter go’. The longest common substring algorithm can be implemented in an efficient manner with the help of suffic trees.

But in this post I’ll try to explain the bit less efficient ‘dynamic programming‘ version of the algorithm. Dynamic programming means that you can reuse already calculated information in a later step or you break the algorithm into parts to reuse information. To understand the algorithm you just need to fill the entries of an integer-array with the lengths of the identical substrings. Assume we use i for the horizontal string (please …) and j for the vertical string. Then the algorithm hits at some time i=19 and j=0 for one identical character ‘i’. Then the line

`num[i][j] = 1;`

is executed and saves the lengths of the 1 length identical substring.

`  please, peter go swimmingi 0000000000000000000100100' 0000000000000000000000000m 0000000000000000000011000  0000000100000100100000000p 1000000020000000000000000e 0010010003000000000000000t 0000000000400000000000000e 0010010001050000000000000r 0000000000006000000000000  0000000100000700100000000g 0000000000000080000000000o 0000000000000009000000000l 0100000000000000000000000i 0000000000000000000100100s 0001000000000000010000000w 0000000000000000002000000i 0000000000000000000300100`

Later on it hits the m characters and saves 1 two times to the array but then at i=7 and j=3 it starts our substring and saves 1 for the space character. Then some loops later it reaches i=8 and j=4  Now it reuses the already calculated “identical-length” of 1. It will do:

`num = 1 + num;`

and we get 2. So, we now know we have a substring with two 2 characters. And with

`if (num[i][j] > maxlen)`

we make sure that we overwrite the existing longest substring (stored in the StringBuilder) ONLY IF there is a longer substring found and either append the character (if it is the current substring in progress):

`sb.append(str1.charAt(i));`

or we can start a longer substring. See the java code (mainly taken from wikipedia) for yourself:

`public static String longestSubstring(String str1, String str2) {StringBuilder sb = new StringBuilder();if (str1 == null || str1.isEmpty() || str2 == null || str2.isEmpty())  return "";// ignore casestr1 = str1.toLowerCase();str2 = str2.toLowerCase();// java initializes them already with 0int[][] num = new int[str1.length()][str2.length()];int maxlen = 0;int lastSubsBegin = 0;for (int i = 0; i < str1.length(); i++) {for (int j = 0; j < str2.length(); j++) {  if (str1.charAt(i) == str2.charAt(j)) {    if ((i == 0) || (j == 0))       num[i][j] = 1;    else       num[i][j] = 1 + num[i - 1][j - 1];    if (num[i][j] > maxlen) {      maxlen = num[i][j];      // generate substring from str1 => i      int thisSubsBegin = i - num[i][j] + 1;      if (lastSubsBegin == thisSubsBegin) {         //if the current LCS is the same as the last time this block ran         sb.append(str1.charAt(i));      } else {         //this block resets the string builder if a different LCS is found         lastSubsBegin = thisSubsBegin;         sb = new StringBuilder();         sb.append(str1.substring(lastSubsBegin, i + 1));      }   }}}}return sb.toString();}`

From http://karussell.wordpress.com/2011/04/14/longest-common-substring-algorithm-in-java/

Topics:

Comment (21)

Save
{{ articles.views | formatCount}} Views

Opinions expressed by DZone contributors are their own.