Extracting Knowledge From Legacy Applications

Sometimes, it's just not smart or feasible to maintain some legacy applications. Learn about extracting knowledge from these applications into a scalable repository.

We are part of the Legacy Modernization group (also known as the Smart Modernization CoE) at an India-based multinational company. Our charter is to build solutions and frameworks for modernizing legacy applications and products.

The first step in a legacy modernization exercise is to assess and unlock the legacy system. Our solution, the Legacy Analyzer, helps with exactly that: it extracts knowledge from the legacy application or product, which enables effective maintenance and the planning of modernization strategies. It is also a useful tool for product transitions and team onboarding.

This paper describes the challenges in getting to know legacy applications (more precisely, in extracting knowledge from legacy applications and products) and the solutions we built to handle those challenges.

Problem Faced

Legacy Applications have several inherent problems:

  • Monolithic and complex
  • Usually have outdated and inconsistent documentation
  • The SMEs who know the application are nearing retirement
  • Skills in legacy technologies are scarce on the market
  • Longer lead times to launch product enhancements and digitalization

These problems make application discovery a challenging task. Understanding an application’s functional components and business logic is the first step toward an optimal design for modernizing it.

Solution Approach/Remedial Action

  • Identification of the critical information needed
  • Developing/reusing the accelerators for knowledge extraction
  • Organizing the extracted application knowledge in a scalable content repository
  • Content enrichment through manual information capture (through discussions with SMEs/existing documents)

Let’s see each step in a bit more detail.

  • Identification of the critical information needed

The critical (and non-critical) information that the documentation must present is identified jointly by the business stakeholders and the modernization team. This is typically done through an initial discussion with the application SMEs or through a proof of concept (PoC).

Fig.1: Application discovery supports both the modernization decision and application documentation.

  • Developing/reusing the accelerators for knowledge extraction

    Based on prior experience, we come up with a solution design: either reuse previously developed accelerators or write new ones for the requirements identified. In most cases, we tweak existing scripts to cater to the new requirements, either by changing configuration files or by making minor modifications to the accelerators.

    The Legacy Analyzer decodes monolithic, business-critical legacy products and applications and helps with analysis and documentation by leveraging a suite of automation tools and accelerators.

    Fig.2: Legacy Analyzer accelerators.

    The key accelerators among these are:

    • Batch Analyzer, which takes the scheduler definitions as input and traces the entire batch processing flow of the application
    • Business/Functional Blueprint, a semi-automated process that builds the flow of the business logic from the code; a BA then enriches it manually to add business context to the document
    • Legacy Portfolio Assessment, a Q&A tool that assesses the application’s business and technical value
    • Data Analyzer, which builds the flow of the data variables across the programs and presents the information in the form of a field-backtracking tool
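
To make the Batch Analyzer idea concrete, here is a minimal sketch of tracing a batch flow from scheduler definitions. It is not the Legacy Analyzer's implementation: the `JOB <name> AFTER <predecessors>` line format, the job names, and the helper functions `parse_scheduler` and `batch_flow` are all assumptions made for illustration, since real schedulers each have their own definition syntax.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical scheduler export: one "JOB <name> [AFTER <pred1>,<pred2>]" per line.
SCHEDULER_DEFS = """\
JOB EXTRACT
JOB VALIDATE AFTER EXTRACT
JOB POSTING AFTER VALIDATE
JOB REPORTS AFTER POSTING,VALIDATE
"""


def parse_scheduler(text):
    """Return {job: set of predecessor jobs} from the assumed line format."""
    deps = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0] != "JOB":
            continue  # skip blank or unrecognized lines
        preds = set()
        if len(parts) >= 4 and parts[2] == "AFTER":
            preds = set(parts[3].split(","))
        deps[parts[1]] = preds
    return deps


def batch_flow(deps):
    """List jobs in an order that respects every predecessor relationship."""
    return list(TopologicalSorter(deps).static_order())


flow = batch_flow(parse_scheduler(SCHEDULER_DEFS))
print(" -> ".join(flow))
```

A topological sort is only one way to present the result; a real batch analysis would likely also need calendars, conditions, and the programs each job step invokes.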

    • Organizing the extracted application knowledge in a scalable content repository

    Presentation plays an equally important role in how useful the documentation is. The interface should be simple, easy to use, and maintainable. For these requirements, HTML is the best-suited format.

    HTML output has several advantages:

    • Interactive
    • Easy to navigate
    • Search capability (across various documents/sources), and so on.

    Fig.3: Legacy Analyzer’s customizable output formats.

    The format of the Legacy Analyzer output can be tailored to the customer’s needs. For example, for one client we produced the entire documentation in MS Excel.

    • Content enrichment through manual information capture

    No tool can capture the entire documentation of an application or module. Moreover, the knowledge that SMEs have gathered about the product or application over the years plays a vital role in understanding it, so we need to provide a mechanism for capturing this knowledge in the repository.

    For this reason, our documentation solution has always included ways of capturing knowledge gathered manually or through discussions with the SMEs. The pages of the content repository either have a simple “post-comments”-like interface for adding information, or they can be designed with a more detailed workflow (such as content modification followed by verification and approval).
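
As a rough illustration of the "modification, then verification/approval" workflow, the sketch below models a manually captured note as a small state machine. The status names, the `Note` class, and the `TRANSITIONS` table are hypothetical; the article does not describe how the repository actually stores or routes these comments.

```python
from dataclasses import dataclass, field

# Assumed workflow: a note is drafted, submitted for review, then approved
# or sent back to draft. These status names are illustrative only.
TRANSITIONS = {
    "draft": {"submitted"},
    "submitted": {"approved", "draft"},
    "approved": set(),
}


@dataclass
class Note:
    page: str      # repository page the note is attached to
    author: str
    text: str
    status: str = "draft"
    history: list = field(default_factory=list)

    def move_to(self, new_status, actor):
        """Apply a status change if the assumed workflow allows it."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.history.append((self.status, new_status, actor))
        self.status = new_status


note = Note(page="ACCTVAL", author="sme1",
            text="Validates account status before posting.")
note.move_to("submitted", actor="sme1")
note.move_to("approved", actor="reviewer1")
print(note.status, len(note.history))
```

Keeping the allowed transitions in a table makes it easy to start with the simple comment-only case (a single status) and add review steps later.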

    Let us share a few of our experiences from this knowledge extraction exercise.

    Case Study 1: A Core Banking Product

    Problem Statement: In addition to the inherent problems of a legacy application, this product had a meta layer in between: a set of programs, written in Assembler in this case, that acted as an interface between the COBOL code and resources such as files, databases, and memory.

    Solution Approach: Through SME interaction, we identified the list of critical information to be extracted. We already had our basic knowledge extraction accelerators at that time, but we had to rebuild them almost entirely. Although we found a market tool that supported analysis of this COBOL variation, its support was very preliminary, so we had to develop our own solution approach further for the knowledge extraction.

    In addition, we identified a built-in trace mechanism in the application and used it to produce the “dynamic” call flow of the application. This involved executing test cases with the trace feature on and parsing the trace dump to extract the call flow information.
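
The trace-parsing step might look something like the sketch below. The article does not describe the product's trace format, so the `ENTER <program>` / `EXIT <program>` lines, the program names, and the `extract_call_flow` helper are assumptions; the point is only to show how a call stack turns a flat trace into caller-to-callee relationships.

```python
from collections import defaultdict

# Hypothetical trace dump; the real trace format will differ.
TRACE_DUMP = """\
ENTER MAINPGM
ENTER ACCTVAL
EXIT ACCTVAL
ENTER POSTTXN
ENTER FILEIO
EXIT FILEIO
EXIT POSTTXN
EXIT MAINPGM
"""


def extract_call_flow(trace_text):
    """Rebuild caller -> callees edges from ENTER/EXIT events using a call stack."""
    calls = defaultdict(list)
    stack = []
    for line in trace_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore lines that do not fit the assumed format
        event, program = parts
        if event == "ENTER":
            # The program on top of the stack is the caller of this one.
            if stack and program not in calls[stack[-1]]:
                calls[stack[-1]].append(program)
            stack.append(program)
        elif event == "EXIT" and stack and stack[-1] == program:
            stack.pop()
    return dict(calls)


call_flow = extract_call_flow(TRACE_DUMP)
for caller, callees in call_flow.items():
    print(caller, "calls", ", ".join(callees))
```

Because the flow comes from executed test cases, it only covers the paths those tests exercise, which is why it complements rather than replaces static analysis of the code.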

    Fig.4: Control Flow within a program (highlighting the conditions as tool-tip comments).

    Case Study 2: An HRM Product

    Problem Statement: We had a clear list of information to be extracted, but, much like the meta layer in the first case study, here we faced a “closed development” environment: you attach the files used in a program through the application development framework, and the framework generates the code automatically. In effect, this was another form of the “meta-layer” situation.

    In addition, the customer wanted the knowledge artifacts to be represented in MS Excel in a predefined format.

    Solution Approach: We developed Perl scripts, because Perl helped both to manipulate the output files of the application platform’s utilities and to write the output in Excel format. We also mimicked the exact format (including styles, cell colors, etc.) of the manually produced Excel output.

    Fig.5: Screenshots showing application knowledge in MS Excel output.

    In addition, we were able to produce the control flow within the program (i.e., how control flows through the program’s sections and paragraphs) and represented this information in an interactively expandable/collapsible manner in the HTML output. The customer was quite impressed with this feature.
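
One way to get an expandable/collapsible control flow in HTML is sketched below, using the standard `<details>`/`<summary>` elements. This is an illustration in Python rather than the Perl scripts used on the project, and the simplified COBOL snippet, the regular expressions, and the `build_control_flow` and `render` helpers are assumptions; real COBOL (sections, `PERFORM ... THRU`, inline performs) needs a proper parser.

```python
import html
import re

# Simplified, hypothetical procedure division used only for illustration.
COBOL_SOURCE = """\
MAIN-PARA.
    PERFORM READ-INPUT.
    PERFORM PROCESS-REC.
    STOP RUN.
READ-INPUT.
    READ IN-FILE.
PROCESS-REC.
    PERFORM WRITE-OUT.
WRITE-OUT.
    WRITE OUT-REC.
"""

PARA_RE = re.compile(r"^([A-Z0-9-]+)\.\s*$")           # unindented "NAME."
PERFORM_RE = re.compile(r"^\s+PERFORM\s+([A-Z0-9-]+)")  # indented PERFORM


def build_control_flow(source):
    """Map each paragraph to the paragraphs it PERFORMs, in source order."""
    flow, current = {}, None
    for line in source.splitlines():
        match = PARA_RE.match(line)
        if match:
            current = match.group(1)
            flow[current] = []
            continue
        match = PERFORM_RE.match(line)
        if match and current is not None:
            flow[current].append(match.group(1))
    return flow


def render(flow, para, seen=()):
    """Render one paragraph as a nested, collapsible HTML list item."""
    label = html.escape(para)
    children = flow.get(para, [])
    if not children or para in seen:  # leaf, or a recursive PERFORM
        return f"<li>{label}</li>"
    inner = "".join(render(flow, child, seen + (para,)) for child in children)
    return f"<li><details><summary>{label}</summary><ul>{inner}</ul></details></li>"


flow = build_control_flow(COBOL_SOURCE)
page = "<ul>" + render(flow, "MAIN-PARA") + "</ul>"
print(page)
```

Using `<details>` keeps the output free of JavaScript, so the generated pages stay easy to host and maintain.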

    To comply with the customer’s needs, we also presented the program control flow in a sheet, with the paragraph and section names in this flow hyperlinked to individual sheets where the description or business processing of each paragraph or section is manually captured.
