DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
11 Monitoring and Observability Tools for 2023
Learn more
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. A Generic Text Comparison Tool Implementation with LCS Approach

A Generic Text Comparison Tool Implementation with LCS Approach

Cagdas Basaraner user avatar by
Cagdas Basaraner
·
May. 03, 12 · Interview
Like (0)
Save
Tweet
Share
12.44K Views

Join the DZone community and get the full member experience.

Join For Free
Detecting and showing differences of two texts (especially having hundreds or thousands of lines) is a common problem. Using pure java.lang.String class methods may be a solution, but the most important issue for that kind of operations, "performance" will not be satisfactory. We want an efficient solution which may have a view as below: 

Text Difference Tool Example
The problem contains two parts: 
  • Detecting differences of two texts: For detecting differences, an efficient dynamic algorithm of LCS (Longest Common Subsequence) used in this solution. This solution has O(text1WordCount * text2WordCount) complexity and coded as "longestCommonSubsequence" method below.
  • Visualizing the differences: For visualizing, an HTML tag based approach is used, which marks new words of text2 with green color and old words of text1 with red color. This solution has O(changedWordsCount * (text1WordCount+text2WordCount)) complexity and coded as "markTextDifferences" method below.
Note1: For simplicity, "normalizeText" method is used for removing \n, \t and multiple space characters. 
Note2: This class was created as a Vaadin component. But "longestCommonSubsequence" is pure generic and "markTextDifferences" method is generic on HTML based visual components, so they can also be used with different frameworks.
    import java.util.ArrayList;
    import com.vaadin.ui.CustomComponent;
    import com.vaadin.ui.Label;
    import com.vaadin.ui.Layout;
    import com.vaadin.ui.VerticalLayout;
    /**
     * Text comparison component which marks differences of two texts with colors.
     *
     * @author cb
     */
    public class TextCompareComponent extends CustomComponent {
        private Layout mainLayout = new VerticalLayout();
        private ArrayList<String> longestCommonSubsequenceList;
        private static final String INSERT_COLOR = "#99FFCC";
        private static final String DELETE_COLOR = "#CB6D6D";
        public TextCompareComponent(String text1, String text2) {
           
            text1 = normalizeText(text1);
            text2 = normalizeText(text2);
            this.longestCommonSubsequenceList = longestCommonSubsequence(text1, text2);
            String result = markTextDifferences(text1, text2,
                longestCommonSubsequenceList, INSERT_COLOR, DELETE_COLOR);
           
            Label label = new Label(result, Label.CONTENT_XHTML);
            mainLayout.addComponent(label);
            setCompositionRoot(mainLayout);
        }
        /**
         * Finds a list of longest common subsequences (lcs) of given two texts.
         *
         * @param text1
         * @param text2
         * @return - longest common subsequence list
         */
        private ArrayList<String> longestCommonSubsequence(String text1, String text2) {
            String[] text1Words = text1.split(" ");
            String[] text2Words = text2.split(" ");
            int text1WordCount = text1Words.length;
            int text2WordCount = text2Words.length;
           
            int[][] solutionMatrix = new int[text1WordCount + 1][text2WordCount + 1];
           
            for (int i = text1WordCount - 1; i >= 0; i--) {
                for (int j = text2WordCount - 1; j >= 0; j--) {
                    if (text1Words[i].equals(text2Words[j])) {
                        solutionMatrix[i][j] = solutionMatrix[i + 1][j + 1] + 1;
                    }
                    else {
                        solutionMatrix[i][j] = Math.max(solutionMatrix[i + 1][j],
                            solutionMatrix[i][j + 1]);
                    }
                }
            }
           
            int i = 0, j = 0;
            ArrayList<String> lcsResultList = new ArrayList<String>();
            while (i < text1WordCount && j < text2WordCount) {
                if (text1Words[i].equals(text2Words[j])) {
                    lcsResultList.add(text2Words[j]);
                    i++;
                    j++;
                }
                else if (solutionMatrix[i + 1][j] >= solutionMatrix[i][j + 1]) {
                    i++;
                }
                else {
                    j++;
                }
            }
            return lcsResultList;
        }
       
        /**
         * Normalizes given string by deleting \n, \t and extra spaces.
         *
         * @param text - initial string
         * @return - normalized string
         */
        private String normalizeText(String text) {
           
            text = text.trim();
            text = text.replace("\n", " ");
            text = text.replace("\t", " ");
           
            while (text.contains("  ")) {
                text = text.replace("  ", " ");
            }
            return text;
        }
        /**
         * Returns colored inserted/deleted text representation of given two texts.
         * Uses longestCommonSubsequenceList to determine colored sections.
         *
         * @param text1
         * @param text2
         * @param lcsList
         * @param insertColor
         * @param deleteColor
         * @return - colored result text
         */
        private String markTextDifferences(String text1, String text2,
            ArrayList<String> lcsList, String insertColor, String deleteColor) {
            StringBuffer stringBuffer = new StringBuffer();
            if (text1 != null && lcsList != null) {
                String[] text1Words = text1.split(" ");
                String[] text2Words = text2.split(" ");
                int i = 0, j = 0, word1LastIndex = 0, word2LastIndex = 0;
                for (int k = 0; k < lcsList.size(); k++) {
                    for (i = word1LastIndex, j = word2LastIndex;
                        i < text1Words.length && j < text2Words.length;) {
                        if (text1Words[i].equals(lcsList.get(k)) &&
                            text2Words[j].equals(lcsList.get(k))) {
                            stringBuffer.append("<SPAN>" + lcsList.get(k) + " </SPAN>");
                            word1LastIndex = i + 1;
                            word2LastIndex = j + 1;
                            i = text1Words.length;
                            j = text2Words.length;
                        }
                        else if (!text1Words[i].equals(lcsList.get(k))) {
                            for (; i < text1Words.length &&
                                !text1Words[i].equals(lcsList.get(k)); i++) {
                                stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                                    deleteColor + "'>" + text1Words[i] + " </SPAN>");
                            }
                        } else if (!text2Words[j].equals(lcsList.get(k))) {
                            for (; j < text2Words.length &&
                                !text2Words[j].equals(lcsList.get(k)), j++) {
                                stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                                    insertColor + "'>" + text2Words[j] + " </SPAN>");
                            }
                        }
                    }
                }
                for (; word1LastIndex < text1Words.length; word1LastIndex++) {
                    stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                        deleteColor + "'>" + text1Words[word1LastIndex] + " </SPAN>");
                }
                for (; word2LastIndex < text2Words.length; word2LastIndex++) {
                    stringBuffer.append("<SPAN style='BACKGROUND-COLOR:" +
                        insertColor + "'>" + text2Words[word2LastIndex] + " </SPAN>");
                }
            }
            return stringBuffer.toString();
        }
    }

 
 

 

 

Comparison (grammar) Implementation HTML Framework Vaadin Mark (designation) Space (architecture) Algorithm

Published at DZone with permission of Cagdas Basaraner, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Key Elements of Site Reliability Engineering (SRE)
  • Top 10 Best Practices for Web Application Testing
  • DevOps vs Agile: Which Approach Will Win the Battle for Efficiency?
  • How To Use Linux Containers

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: