A Generic Text Comparison Tool Implementation with LCS Approach
Join the DZone community and get the full member experience.
Join For FreeDetecting and showing differences of two texts (especially having
hundreds or thousands of lines) is a common problem. Using pure
java.lang.String class methods may be a solution, but the most important
issue for that kind of operations, "performance" will not be
satisfactory. We want an efficient solution which may have a view as
below:
![]() |
Text Difference Tool Example |
The problem contains two parts:
- Detecting differences of two texts: For detecting differences, an efficient dynamic algorithm of LCS (Longest Common Subsequence) used in this solution. This solution has O(text1WordCount * text2WordCount) complexity and coded as "longestCommonSubsequence" method below.
- Visualizing the differences: For visualizing, an HTML tag based approach is used, which marks new words of text2 with green color and old words of text1 with red color. This solution has O(changedWordsCount * (text1WordCount+text2WordCount)) complexity and coded as "markTextDifferences" method below.
Note1: For simplicity, "normalizeText" method is used for removing \n, \t and multiple space characters.
Note2: This class
was created as a Vaadin component. But "longestCommonSubsequence" is
pure generic and "markTextDifferences" method is generic on HTML based
visual components, so they can also be used with different frameworks.
import java.util.ArrayList; import com.vaadin.ui.CustomComponent; import com.vaadin.ui.Label; import com.vaadin.ui.Layout; import com.vaadin.ui.VerticalLayout; /** * Text comparison component which marks differences of two texts with colors. * * @author cb */ public class TextCompareComponent extends CustomComponent { private Layout mainLayout = new VerticalLayout(); private ArrayList<String> longestCommonSubsequenceList; private static final String INSERT_COLOR = "#99FFCC"; private static final String DELETE_COLOR = "#CB6D6D"; public TextCompareComponent(String text1, String text2) { text1 = normalizeText(text1); text2 = normalizeText(text2); this.longestCommonSubsequenceList = longestCommonSubsequence(text1, text2); String result = markTextDifferences(text1, text2, longestCommonSubsequenceList, INSERT_COLOR, DELETE_COLOR); Label label = new Label(result, Label.CONTENT_XHTML); mainLayout.addComponent(label); setCompositionRoot(mainLayout); } /** * Finds a list of longest common subsequences (lcs) of given two texts. * * @param text1 * @param text2 * @return - longest common subsequence list */ private ArrayList<String> longestCommonSubsequence(String text1, String text2) { String[] text1Words = text1.split(" "); String[] text2Words = text2.split(" "); int text1WordCount = text1Words.length; int text2WordCount = text2Words.length; int[][] solutionMatrix = new int[text1WordCount + 1][text2WordCount + 1]; for (int i = text1WordCount - 1; i >= 0; i--) { for (int j = text2WordCount - 1; j >= 0; j--) { if (text1Words[i].equals(text2Words[j])) { solutionMatrix[i][j] = solutionMatrix[i + 1][j + 1] + 1; } else { solutionMatrix[i][j] = Math.max(solutionMatrix[i + 1][j], solutionMatrix[i][j + 1]); } } } int i = 0, j = 0; ArrayList<String> lcsResultList = new ArrayList<String>(); while (i < text1WordCount && j < text2WordCount) { if (text1Words[i].equals(text2Words[j])) { lcsResultList.add(text2Words[j]); i++; j++; } else if (solutionMatrix[i + 1][j] >= solutionMatrix[i][j + 1]) { i++; } else { j++; } } return lcsResultList; } /** * Normalizes given string by deleting \n, \t and extra spaces. * * @param text - initial string * @return - normalized string */ private String normalizeText(String text) { text = text.trim(); text = text.replace("\n", " "); text = text.replace("\t", " "); while (text.contains(" ")) { text = text.replace(" ", " "); } return text; } /** * Returns colored inserted/deleted text representation of given two texts. * Uses longestCommonSubsequenceList to determine colored sections. * * @param text1 * @param text2 * @param lcsList * @param insertColor * @param deleteColor * @return - colored result text */ private String markTextDifferences(String text1, String text2, ArrayList<String> lcsList, String insertColor, String deleteColor) { StringBuffer stringBuffer = new StringBuffer(); if (text1 != null && lcsList != null) { String[] text1Words = text1.split(" "); String[] text2Words = text2.split(" "); int i = 0, j = 0, word1LastIndex = 0, word2LastIndex = 0; for (int k = 0; k < lcsList.size(); k++) { for (i = word1LastIndex, j = word2LastIndex; i < text1Words.length && j < text2Words.length;) { if (text1Words[i].equals(lcsList.get(k)) && text2Words[j].equals(lcsList.get(k))) { stringBuffer.append("<SPAN>" + lcsList.get(k) + " </SPAN>"); word1LastIndex = i + 1; word2LastIndex = j + 1; i = text1Words.length; j = text2Words.length; } else if (!text1Words[i].equals(lcsList.get(k))) { for (; i < text1Words.length && !text1Words[i].equals(lcsList.get(k)); i++) { stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" + deleteColor + "'>" + text1Words[i] + " </SPAN>"); } } else if (!text2Words[j].equals(lcsList.get(k))) { for (; j < text2Words.length && !text2Words[j].equals(lcsList.get(k)), j++) { stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" + insertColor + "'>" + text2Words[j] + " </SPAN>"); } } } } for (; word1LastIndex < text1Words.length; word1LastIndex++) { stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" + deleteColor + "'>" + text1Words[word1LastIndex] + " </SPAN>"); } for (; word2LastIndex < text2Words.length; word2LastIndex++) { stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" + insertColor + "'>" + text2Words[word2LastIndex] + " </SPAN>"); } } return stringBuffer.toString(); } }
Comparison (grammar)
Implementation
HTML
Framework
Vaadin
Mark (designation)
Space (architecture)
Algorithm
Published at DZone with permission of Cagdas Basaraner, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments