Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

A Generic Text Comparison Tool Implementation with LCS Approach

DZone's Guide to

A Generic Text Comparison Tool Implementation with LCS Approach

· Java Zone ·
Free Resource

Get the Edge with a Professional Java IDE. 30-day free trial.

Detecting and showing differences of two texts (especially having hundreds or thousands of lines) is a common problem. Using pure java.lang.String class methods may be a solution, but the most important issue for that kind of operations, "performance" will not be satisfactory. We want an efficient solution which may have a view as below: 

Text Difference Tool Example
The problem contains two parts: 
  • Detecting differences of two texts: For detecting differences, an efficient dynamic algorithm of LCS (Longest Common Subsequence) used in this solution. This solution has O(text1WordCount * text2WordCount) complexity and coded as "longestCommonSubsequence" method below.
  • Visualizing the differences: For visualizing, an HTML tag based approach is used, which marks new words of text2 with green color and old words of text1 with red color. This solution has O(changedWordsCount * (text1WordCount+text2WordCount)) complexity and coded as "markTextDifferences" method below.
Note1: For simplicity, "normalizeText" method is used for removing \n, \t and multiple space characters. 
Note2: This class was created as a Vaadin component. But "longestCommonSubsequence" is pure generic and "markTextDifferences" method is generic on HTML based visual components, so they can also be used with different frameworks.
    import java.util.ArrayList;
    import com.vaadin.ui.CustomComponent;
    import com.vaadin.ui.Label;
    import com.vaadin.ui.Layout;
    import com.vaadin.ui.VerticalLayout;
    /**
     * Text comparison component which marks differences of two texts with colors.
     *
     * @author cb
     */
    public class TextCompareComponent extends CustomComponent {
        private Layout mainLayout = new VerticalLayout();
        private ArrayList<String> longestCommonSubsequenceList;
        private static final String INSERT_COLOR = "#99FFCC";
        private static final String DELETE_COLOR = "#CB6D6D";
        public TextCompareComponent(String text1, String text2) {
           
            text1 = normalizeText(text1);
            text2 = normalizeText(text2);
            this.longestCommonSubsequenceList = longestCommonSubsequence(text1, text2);
            String result = markTextDifferences(text1, text2,
                longestCommonSubsequenceList, INSERT_COLOR, DELETE_COLOR);
           
            Label label = new Label(result, Label.CONTENT_XHTML);
            mainLayout.addComponent(label);
            setCompositionRoot(mainLayout);
        }
        /**
         * Finds a list of longest common subsequences (lcs) of given two texts.
         *
         * @param text1
         * @param text2
         * @return - longest common subsequence list
         */
        private ArrayList<String> longestCommonSubsequence(String text1, String text2) {
            String[] text1Words = text1.split(" ");
            String[] text2Words = text2.split(" ");
            int text1WordCount = text1Words.length;
            int text2WordCount = text2Words.length;
           
            int[][] solutionMatrix = new int[text1WordCount + 1][text2WordCount + 1];
           
            for (int i = text1WordCount - 1; i >= 0; i--) {
                for (int j = text2WordCount - 1; j >= 0; j--) {
                    if (text1Words[i].equals(text2Words[j])) {
                        solutionMatrix[i][j] = solutionMatrix[i + 1][j + 1] + 1;
                    }
                    else {
                        solutionMatrix[i][j] = Math.max(solutionMatrix[i + 1][j],
                            solutionMatrix[i][j + 1]);
                    }
                }
            }
           
            int i = 0, j = 0;
            ArrayList<String> lcsResultList = new ArrayList<String>();
            while (i < text1WordCount && j < text2WordCount) {
                if (text1Words[i].equals(text2Words[j])) {
                    lcsResultList.add(text2Words[j]);
                    i++;
                    j++;
                }
                else if (solutionMatrix[i + 1][j] >= solutionMatrix[i][j + 1]) {
                    i++;
                }
                else {
                    j++;
                }
            }
            return lcsResultList;
        }
       
        /**
         * Normalizes given string by deleting \n, \t and extra spaces.
         *
         * @param text - initial string
         * @return - normalized string
         */
        private String normalizeText(String text) {
           
            text = text.trim();
            text = text.replace("\n", " ");
            text = text.replace("\t", " ");
           
            while (text.contains("  ")) {
                text = text.replace("  ", " ");
            }
            return text;
        }
        /**
         * Returns colored inserted/deleted text representation of given two texts.
         * Uses longestCommonSubsequenceList to determine colored sections.
         *
         * @param text1
         * @param text2
         * @param lcsList
         * @param insertColor
         * @param deleteColor
         * @return - colored result text
         */
        private String markTextDifferences(String text1, String text2,
            ArrayList<String> lcsList, String insertColor, String deleteColor) {
            StringBuffer stringBuffer = new StringBuffer();
            if (text1 != null && lcsList != null) {
                String[] text1Words = text1.split(" ");
                String[] text2Words = text2.split(" ");
                int i = 0, j = 0, word1LastIndex = 0, word2LastIndex = 0;
                for (int k = 0; k < lcsList.size(); k++) {
                    for (i = word1LastIndex, j = word2LastIndex;
                        i < text1Words.length && j < text2Words.length;) {
                        if (text1Words[i].equals(lcsList.get(k)) &&
                            text2Words[j].equals(lcsList.get(k))) {
                            stringBuffer.append("<SPAN>" + lcsList.get(k) + " </SPAN>");
                            word1LastIndex = i + 1;
                            word2LastIndex = j + 1;
                            i = text1Words.length;
                            j = text2Words.length;
                        }
                        else if (!text1Words[i].equals(lcsList.get(k))) {
                            for (; i < text1Words.length &&
                                !text1Words[i].equals(lcsList.get(k)); i++) {
                                stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                                    deleteColor + "'>" + text1Words[i] + " </SPAN>");
                            }
                        } else if (!text2Words[j].equals(lcsList.get(k))) {
                            for (; j < text2Words.length &&
                                !text2Words[j].equals(lcsList.get(k)), j++) {
                                stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                                    insertColor + "'>" + text2Words[j] + " </SPAN>");
                            }
                        }
                    }
                }
                for (; word1LastIndex < text1Words.length; word1LastIndex++) {
                    stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                        deleteColor + "'>" + text1Words[word1LastIndex] + " </SPAN>");
                }
                for (; word2LastIndex < text2Words.length; word2LastIndex++) {
                    stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                        insertColor + "'>" + text2Words[word2LastIndex] + " </SPAN>");
                }
            }
            return stringBuffer.toString();
        }
    }

 
 

 

 

Get the Java IDE that understands code & makes developing enjoyable. Level up your code with IntelliJ IDEA. Download the free trial.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}