
Automate Migration Assessment With XML Linter

Discover how the ZK team developed an XML linter that detects potential compatibility issues in existing codebases and assists migration decision-making.

By Rebecca Lai · Aug. 25, 23 · Tutorial

When people think of linting, the first thing that comes to mind is usually static code analysis for programming languages, but rarely for markup languages.

In this article, I would like to share how our team developed ZK Client MVVM Linter, an XML linter that automates migration assessment for our new Client MVVM feature in the upcoming ZK 10 release. The basic idea is to compile a catalog of known compatibility issues as lint rules, so that users can assess the potential issues flagged by the linter before committing to the migration.

For those unfamiliar with ZK: ZK is a Java framework for building enterprise applications, and ZUL (ZK User Interface Markup Language) is its XML-based language for simplifying user interface creation. By sharing our experience developing ZK Client MVVM Linter, we hope XML linters can find broader applications.
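
For instance, a trivial ZUL page might look like the snippet below (the component names and attributes are just for illustration):

XML
 
<window title="Hello ZK" border="normal">
  <button label="Say Hello"/>
</window>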

File Parsing

The Problem

Like other popular linters, our ZUL linter starts by parsing source code into an AST (abstract syntax tree). Although Java provides several libraries for XML parsing, the document trees they produce lose the original line and column numbers of the elements. Since the subsequent analysis stage needs this positional information to report compatibility issues precisely, our first task is to find a way to obtain the original line and column numbers and store them in the AST.
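
To illustrate the gap, a plain DOM parse (sketched below with a placeholder file name) produces nodes that carry no positional information at all:

Java
 
// A standard DOM parse for comparison: org.w3c.dom nodes expose no API for
// asking which line or column an element came from, so precise reporting
// would be impossible without extra bookkeeping.
Document document = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder()
    .parse(new File("example.zul")); // placeholder file name
Element root = document.getDocumentElement(); // tag name and attributes only, no line/column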

How We Address This

After exploring different online sources, we found a Stack Overflow solution that leverages the event-driven nature of the SAX parser to store the end position of each start tag in the AST. Its key observation is that the parser invokes the startElement method whenever it encounters the closing ‘>’ character of a start tag. Therefore, the parser position returned by the locator corresponds to the end position of the start tag, making the startElement method the perfect place to create new AST nodes and store their end positions.

Java
 
public static Document parse(File file) throws Exception {
  Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
  SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
  parser.parse(file, new DefaultHandler() {
    private Locator _locator;
    private final Stack<Node> _stack = new Stack<>();

    @Override
    public void setDocumentLocator(Locator locator) {
      _locator = locator;
      _stack.push(document);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
      // Create a new AST node
      Element element = document.createElement(qName);
      for (int i = 0; i < attributes.getLength(); i++)
        element.setAttribute(attributes.getQName(i), attributes.getValue(i));
      // Store its end position
      int lineNumber = _locator.getLineNumber(), columnNumber = _locator.getColumnNumber();
      element.setUserData("position", lineNumber + ":" + columnNumber, null);
      _stack.push(element);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      Node element = _stack.pop();
      _stack.peek().appendChild(element);
    }
  });
  return document;
}
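
As a quick sanity check, the stored positions can be read back through getUserData (the file name below is just a placeholder):

Java
 
// Parse a ZUL file and print the recorded end position of each start tag,
// e.g. "3:34" for line 3, column 34.
Document document = parse(new File("example.zul"));
NodeList elements = document.getElementsByTagName("*");
for (int i = 0; i < elements.getLength(); i++) {
  Element element = (Element) elements.item(i);
  System.out.println(element.getTagName() + " -> " + element.getUserData("position"));
}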


Building on the solution above, we implemented a more sophisticated parser capable of storing the position of each attribute. Our parser uses the end positions returned by the locator as reference points, reducing the task to finding each attribute's position relative to that end position. We started with a simple idea: iteratively find and remove the last occurrence of each attribute-value pair from a buffer. For example, if <elem attr1="value" attr2="value"> ends at 3:34 (line 3, column 34), our parser performs the following steps:

Plain Text
 
1. Initialize buffer = <elem attr1="value" attr2="value">
2. Find buffer.lastIndexOf("value") = 28 → Update buffer = <elem attr1="value" attr2="
3. Find buffer.lastIndexOf("attr2") = 21 → Update buffer = <elem attr1="value"
4. Find buffer.lastIndexOf("value") = 14 → Update buffer = <elem attr1="
5. Find buffer.lastIndexOf("attr1") =  7 → Update buffer = <elem
From steps 3 and 5, we can conclude that attr2 and attr1 start at 3:21 and 3:7, respectively.


Then, we further improved the mechanism to handle other formatting variations, such as a single start tag spanning multiple lines or multiple start tags on one line, by introducing two stacks, startIndexes and leadingSpaces, which record the buffer index where each new line starts and the number of leading spaces on each line. For example, consider a start tag that begins on line 1 and ends at 3:20 (line 3, column 20):

XML
 
<elem attr1="value
    across 2 lines"
    attr2 = "value">


Our parser will perform the following steps:

Plain Text
 
1. Initialize buffer = <elem attr1="value across 2 lines" attr2 = "value">
2. Initialize startIndexes = [0, 19, 35] and leadingSpaces = [0, 4, 4]
3. Find buffer.lastIndexOf("value") = 45
4. Find buffer.lastIndexOf("attr2") = 36
   → lineNumber = 3, startIndexes = [0, 19, 35], and leadingSpaces = [0, 4, 4]
   → columnNumber = 36 - startIndexes.peek() + leadingSpaces.peek() = 5
5. Find buffer.lastIndexOf("value across 2 lines") = 14
6. Find buffer.lastIndexOf("attr1") = 7
   → Update lineNumber = 1, startIndexes = [0], and leadingSpaces = [0]
   → columnNumber = 7 - startIndexes.peek() + leadingSpaces.peek() = 7
From steps 4 and 6, we can conclude that attr2 and attr1 start at 3:5 and 1:7, respectively.


This mechanism leads to the startElement implementation below:

Java
 
public void startElement(String uri, String localName, String qName, Attributes attributes) {
  // initialize buffer, startIndexes, and leadingSpaces
  int endLineNumber = _locator.getLineNumber(), endColNumber = _locator.getColumnNumber();
  for (int i = 0; _readerLineNumber <= endLineNumber; i++, _readerLineNumber++) {
    startIndexes.push(buffer.length());
    if (i > 0) _readerCurrentLine = _reader.readLine();
    buffer.append(' ').append((_readerLineNumber < endLineNumber ? _readerCurrentLine :
            _readerCurrentLine.substring(0, endColNumber - 1)).stripLeading());
    leadingSpaces.push(countLeadingSpaces(_readerCurrentLine));
  }
  _readerLineNumber--;
  // recover attribute positions
  int lineNumber = endLineNumber, columnNumber;
  Element element = document.createElement(qName);
  for (int i = attributes.getLength() - 1; i >= 0; i--) {
    String[] words = attributes.getValue(i).split("\\s+");
    for (int j = words.length - 1; j >= 0; j--)
      buffer.delete(buffer.lastIndexOf(words[j]), buffer.length());
    buffer.delete(buffer.lastIndexOf(attributes.getQName(i)), buffer.length());
    while (buffer.length() < startIndexes.peek()) {
      lineNumber--; leadingSpaces.pop(); startIndexes.pop();
    }
    columnNumber = leadingSpaces.peek() + buffer.length() - startIndexes.peek();
    Attr attr = document.createAttribute(attributes.getQName(i));
    attr.setUserData("position", lineNumber + ":" + columnNumber, null);
    element.setAttributeNode(attr);
  }
  // recover element position
  buffer.delete(buffer.lastIndexOf(element.getTagName()), buffer.length());
  while (buffer.length() < startIndexes.peek()) {
    lineNumber--; leadingSpaces.pop(); startIndexes.pop();
  }
  columnNumber = leadingSpaces.peek() + buffer.length() - startIndexes.peek();
  element.setUserData("position", lineNumber + ":" + columnNumber, null);
  _stack.push(element);
}
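
The snippet above relies on a countLeadingSpaces helper that is not shown here; a minimal sketch of such a helper could look like this:

Java
 
// Minimal sketch of countLeadingSpaces: counts the leading whitespace
// characters that stripLeading() removes, so they can be added back when
// computing column numbers.
private static int countLeadingSpaces(String line) {
  int count = 0;
  while (count < line.length() && Character.isWhitespace(line.charAt(count)))
    count++;
  return count;
}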


File Analysis

Now that we have a parser that converts ZUL files into ASTs, we are ready to move on to the file analysis stage. Our ZulFileVisitor class encapsulates the AST traversal logic and delegates the specific checking mechanisms to its subclasses. This design allows lint rules to be created easily by extending ZulFileVisitor and overriding the visit method for the node type the rule needs to inspect (visitElement or visitAttribute).

Java
 
public class ZulFileVisitor {
  private final Stack<Element> _currentPath = new Stack<>();

  // Exposes the element ancestor path so subclasses can inspect context.
  protected Stack<Element> getCurrentPath() {
    return _currentPath;
  }

  protected void report(Node node, String message) {
    System.err.println(node.getUserData("position") + " " + message);
  }

  protected void visit(Node node) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      Element element = (Element) node;
      _currentPath.push(element);
      visitElement(element);
      NamedNodeMap attributes = element.getAttributes();
      for (int i = 0; i < attributes.getLength(); i++)
        visitAttribute((Attr) attributes.item(i));
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++)
      visit(children.item(i));
    if (node.getNodeType() == Node.ELEMENT_NODE) _currentPath.pop();
  }

  protected void visitAttribute(Attr node) {}

  protected void visitElement(Element node) {}
}


Conclusion

The Benefits

For simple lint rules such as "row elements not supported," developing an XML linter may seem like overkill when manual checks would suffice. However, as the codebase expands or the number of lint rules grows over time, the advantages of linting quickly become noticeable compared to manual checks, which are both time-consuming and prone to human error.

Java
 
class SimpleRule extends ZulFileVisitor {
  @Override
  protected void visitElement(Element node) {
    if ("row".equals(node.getTagName()))
      report(node, "`row` not supported");
  }
}
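
Putting the pieces together, a rule can be run against a parsed file roughly like this (the file name is a placeholder, and the driver is assumed to live in the same package as the rules since visit is declared protected):

Java
 
// Parse a ZUL file with the SAX-based parser shown earlier,
// then walk the resulting AST with a lint rule.
Document document = parse(new File("example.zul"));
new SimpleRule().visit(document);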


On the other hand, complicated rules involving ancestor elements are where XML linters truly shine. Consider a lint rule that applies only to elements inside (or outside) certain ancestor elements, such as "row elements not supported outside rows elements." Our linter can efficiently identify the countless nesting variations that satisfy such a rule, which cannot be done manually or with a simple file search.

Java
 
class ComplexRule extends ZulFileVisitor {
  @Override
  protected void visitElement(Element node) {
    if ("row".equals(node.getTagName())) {
      boolean outsideRows = getCurrentPath().stream()
        .noneMatch(element -> "rows".equals(element.getTagName()));
      if (outsideRows) report(node, "`row` not supported outside `rows`");
    }
  }
}
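
For instance, given markup like the following (the nesting is purely illustrative), the rule reports the first row but not the second:

XML
 
<grid>
  <row/>            <!-- reported: `row` not supported outside `rows` -->
  <rows>
    <row/>          <!-- not reported: nested inside `rows` -->
  </rows>
</grid>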


Now It's Your Turn

Although XML linting is not yet widely adopted in the software industry, we hope ZK Client MVVM Linter, which automates migration assessment for us, demonstrates the benefits of XML linting and perhaps even helps you develop your own XML linter.


Opinions expressed by DZone contributors are their own.
