
Automate Migration Assessment With XML Linter

Discover how the ZK team developed an XML linter that detects potential compatibility issues in existing codebases and assists migration decision-making.

By Rebecca Lai · Aug. 25, 23 · Tutorial

When people think of linting, the first thing that comes to mind is usually static code analysis for programming languages, but rarely for markup languages.

In this article, I would like to share how our team developed ZK Client MVVM Linter, an XML linter that automates migration assessment for our new Client MVVM feature in the upcoming ZK 10 release. The basic idea is to compile a catalog of known compatibility issues as lint rules, so that users can assess the potential issues flagged by the linter before committing to the migration.

For those unfamiliar with ZK: ZK is a Java framework for building enterprise applications, and ZUL (ZK User Interface Markup Language) is its XML-based language for simplifying user interface creation. By sharing our experience developing ZK Client MVVM Linter, we hope XML linters can find broader applications.
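
For instance, a trivial ZUL page might look like the snippet below (the component names and attributes are just for illustration):

XML
 
<window title="Hello ZK" border="normal">
  <button label="Say Hello"/>
</window>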

File Parsing

The Problem

Like other popular linters, our ZUL linter starts by parsing source code into an AST (abstract syntax tree). Although Java provides several libraries for XML parsing, the document trees they produce lose the original line and column numbers of the elements. Since the subsequent analysis stage needs this positional information to report compatibility issues precisely, our first task is to find a way to obtain the original line and column numbers and store them in the AST.
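
To illustrate the gap, a plain DOM parse (sketched below with a placeholder file name) produces nodes that carry no positional information at all:

Java
 
// A standard DOM parse for comparison: org.w3c.dom nodes expose no API for
// asking which line or column an element came from, so precise reporting
// would be impossible without extra bookkeeping.
Document document = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder()
    .parse(new File("example.zul")); // placeholder file name
Element root = document.getDocumentElement(); // tag name and attributes only, no line/column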

How We Address This

After exploring different online sources, we found a Stack Overflow solution that leverages the event-driven nature of the SAX parser to store the end position of each start tag in the AST. Its key observation is that the parser invokes the startElement method whenever it encounters the closing ‘>’ character of a start tag. Therefore, the parser position returned by the locator corresponds to the end position of the start tag, making the startElement method the perfect place to create new AST nodes and store their end positions.

Java
 
public static Document parse(File file) throws Exception {
  Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
  SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
  parser.parse(file, new DefaultHandler() {
    private Locator _locator;
    private final Stack<Node> _stack = new Stack<>();

    @Override
    public void setDocumentLocator(Locator locator) {
      _locator = locator;
      _stack.push(document);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
      // Create a new AST node
      Element element = document.createElement(qName);
      for (int i = 0; i < attributes.getLength(); i++)
        element.setAttribute(attributes.getQName(i), attributes.getValue(i));
      // Store its end position
      int lineNumber = _locator.getLineNumber(), columnNumber = _locator.getColumnNumber();
      element.setUserData("position", lineNumber + ":" + columnNumber, null);
      _stack.push(element);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      Node element = _stack.pop();
      _stack.peek().appendChild(element);
    }
  });
  return document;
}
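
As a quick sanity check, the stored positions can be read back through getUserData (the file name below is just a placeholder):

Java
 
// Parse a ZUL file and print the recorded end position of each start tag,
// e.g. "3:34" for line 3, column 34.
Document document = parse(new File("example.zul"));
NodeList elements = document.getElementsByTagName("*");
for (int i = 0; i < elements.getLength(); i++) {
  Element element = (Element) elements.item(i);
  System.out.println(element.getTagName() + " -> " + element.getUserData("position"));
}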


Building on the solution above, we implemented a more sophisticated parser capable of storing the position of each attribute. Our parser uses the end positions returned by the locator as reference points, reducing the task to finding each attribute's position relative to that end position. We started with a simple idea: iteratively find and remove the last occurrence of each attribute-value pair from a buffer. For example, if <elem attr1="value" attr2="value"> ends at 3:34 (line 3, column 34), our parser performs the following steps:

Plain Text
 
1. Initialize buffer = <elem attr1="value" attr2="value">
2. Find buffer.lastIndexOf("value") = 28 → Update buffer = <elem attr1="value" attr2="
3. Find buffer.lastIndexOf("attr2") = 21 → Update buffer = <elem attr1="value"
4. Find buffer.lastIndexOf("value") = 14 → Update buffer = <elem attr1="
5. Find buffer.lastIndexOf("attr1") =  7 → Update buffer = <elem
From steps 3 and 5, we can conclude that attr2 and attr1 start at 3:21 and 3:7, respectively.


Then, we further improved the mechanism to handle other formatting variations, such as a single start tag spanning multiple lines or multiple start tags on one line, by introducing two stacks, startIndexes and leadingSpaces, which record the buffer index where each new line starts and the number of leading spaces on each line. For example, consider a start tag that begins on line 1 and ends at 3:20 (line 3, column 20):

XML
 
<elem attr1="value
    across 2 lines"
    attr2 = "value">


Our parser will perform the following steps:

Plain Text
 
1. Initialize buffer = <elem attr1="value across 2 lines" attr2 = "value">
2. Initialize startIndexes = [0, 19, 35] and leadingSpaces = [0, 4, 4]
3. Find buffer.lastIndexOf("value") = 45
4. Find buffer.lastIndexOf("attr2") = 36
   → lineNumber = 3, startIndexes = [0, 19, 35], and leadingSpaces = [0, 4, 4]
   → columnNumber = 36 - startIndexes.peek() + leadingSpaces.peek() = 5
5. Find buffer.lastIndexOf("value across 2 lines") = 14
6. Find buffer.lastIndexOf("attr1") = 7
   → Update lineNumber = 1, startIndexes = [0], and leadingSpaces = [0]
   → columnNumber = 7 - startIndexes.peek() + leadingSpaces.peek() = 7
From steps 4 and 6, we can conclude that attr2 and attr1 start at 3:5 and 1:7, respectively.


This mechanism leads to the startElement implementation below:

Java
 
public void startElement(String uri, String localName, String qName, Attributes attributes) {
  // initialize buffer, startIndexes, and leadingSpaces
  int endLineNumber = _locator.getLineNumber(), endColNumber = _locator.getColumnNumber();
  for (int i = 0; _readerLineNumber <= endLineNumber; i++, _readerLineNumber++) {
    startIndexes.push(buffer.length());
    if (i > 0) _readerCurrentLine = _reader.readLine();
    buffer.append(' ').append((_readerLineNumber < endLineNumber ? _readerCurrentLine :
            _readerCurrentLine.substring(0, endColNumber - 1)).stripLeading());
    leadingSpaces.push(countLeadingSpaces(_readerCurrentLine));
  }
  _readerLineNumber--;
  // recover attribute positions
  int lineNumber = endLineNumber, columnNumber;
  Element element = document.createElement(qName);
  for (int i = attributes.getLength() - 1; i >= 0; i--) {
    String[] words = attributes.getValue(i).split("\\s+");
    for (int j = words.length - 1; j >= 0; j--)
      buffer.delete(buffer.lastIndexOf(words[j]), buffer.length());
    buffer.delete(buffer.lastIndexOf(attributes.getQName(i)), buffer.length());
    while (buffer.length() < startIndexes.peek()) {
      lineNumber--; leadingSpaces.pop(); startIndexes.pop();
    }
    columnNumber = leadingSpaces.peek() + buffer.length() - startIndexes.peek();
    Attr attr = document.createAttribute(attributes.getQName(i));
    attr.setUserData("position", lineNumber + ":" + columnNumber, null);
    element.setAttributeNode(attr);
  }
  // recover element position
  buffer.delete(buffer.lastIndexOf(element.getTagName()), buffer.length());
  while (buffer.length() < startIndexes.peek()) {
    lineNumber--; leadingSpaces.pop(); startIndexes.pop();
  }
  columnNumber = leadingSpaces.peek() + buffer.length() - startIndexes.peek();
  element.setUserData("position", lineNumber + ":" + columnNumber, null);
  _stack.push(element);
}
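
The snippet above relies on a countLeadingSpaces helper that is not shown here; a minimal sketch of such a helper could look like this:

Java
 
// Minimal sketch of countLeadingSpaces: counts the leading whitespace
// characters that stripLeading() removes, so they can be added back when
// computing column numbers.
private static int countLeadingSpaces(String line) {
  int count = 0;
  while (count < line.length() && Character.isWhitespace(line.charAt(count)))
    count++;
  return count;
}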


File Analysis

Now that we have a parser that converts ZUL files into ASTs, we are ready to move on to the file analysis stage. Our ZulFileVisitor class encapsulates the AST traversal logic and delegates the specific checking mechanisms to its subclasses. This design allows lint rules to be created easily by extending ZulFileVisitor and overriding the visit method for the node type the rule needs to inspect (visitElement or visitAttribute).

Java
 
public class ZulFileVisitor {
  private final Stack<Element> _currentPath = new Stack<>();

  // Exposes the element ancestor path so subclasses can inspect context.
  protected Stack<Element> getCurrentPath() {
    return _currentPath;
  }

  protected void report(Node node, String message) {
    System.err.println(node.getUserData("position") + " " + message);
  }

  protected void visit(Node node) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      Element element = (Element) node;
      _currentPath.push(element);
      visitElement(element);
      NamedNodeMap attributes = element.getAttributes();
      for (int i = 0; i < attributes.getLength(); i++)
        visitAttribute((Attr) attributes.item(i));
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++)
      visit(children.item(i));
    if (node.getNodeType() == Node.ELEMENT_NODE) _currentPath.pop();
  }

  protected void visitAttribute(Attr node) {}

  protected void visitElement(Element node) {}
}


Conclusion

The Benefits

For simple lint rules such as "row elements not supported," developing an XML linter may seem like overkill when manual checks would suffice. However, as the codebase expands or the number of lint rules grows over time, the advantages of linting quickly become noticeable compared to manual checks, which are both time-consuming and prone to human error.

Java
 
class SimpleRule extends ZulFileVisitor {
  @Override
  protected void visitElement(Element node) {
    if ("row".equals(node.getTagName()))
      report(node, "`row` not supported");
  }
}
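
Putting the pieces together, a rule can be run against a parsed file roughly like this (the file name is a placeholder, and the driver is assumed to live in the same package as the rules since visit is declared protected):

Java
 
// Parse a ZUL file with the SAX-based parser shown earlier,
// then walk the resulting AST with a lint rule.
Document document = parse(new File("example.zul"));
new SimpleRule().visit(document);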


On the other hand, complicated rules involving ancestor elements are where XML linters truly shine. Consider a lint rule that applies only to elements inside (or outside) certain ancestor elements, such as "row elements not supported outside rows elements." Our linter can efficiently identify the countless nesting variations that satisfy such a rule, which cannot be done manually or with a simple file search.

Java
 
class ComplexRule extends ZulFileVisitor {
  @Override
  protected void visitElement(Element node) {
    if ("row".equals(node.getTagName())) {
      boolean outsideRows = getCurrentPath().stream()
        .noneMatch(element -> "rows".equals(element.getTagName()));
      if (outsideRows) report(node, "`row` not supported outside `rows`");
    }
  }
}
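
For instance, given markup like the following (the nesting is purely illustrative), the rule reports the first row but not the second:

XML
 
<grid>
  <row/>            <!-- reported: `row` not supported outside `rows` -->
  <rows>
    <row/>          <!-- not reported: nested inside `rows` -->
  </rows>
</grid>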


Now It's Your Turn

Although XML linting is not yet widely adopted in the software industry, we hope ZK Client MVVM Linter, which automates migration assessment for us, demonstrates the benefits of XML linting and perhaps even helps you develop your own XML linter.


Opinions expressed by DZone contributors are their own.
