XML Parsing Using Java org.w3c.dom

Explore XML parsing by using Java org.w3c.dom.

You will often have challenging XML data to parse. This example shows how to parse such data efficiently using Java's org.w3c.dom package.

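XML Data to Be Parsed:

<?xml version="1.0"?>
<COMMAND>
    <DATA>
        <TXNID>1234567891</TXNID>
        <TXNAMT>15.00</TXNAMT>
        <TXNID>1234567892</TXNID>
        <TXNAMT>15.00</TXNAMT>
        <TXNID>1234567893</TXNID>
        <TXNAMT>15.00</TXNAMT>
        <TXNID>1234567894</TXNID>
        <TXNAMT>15.00</TXNAMT>
    </DATA>
</COMMAND>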

Most XML data can easily be parsed through JAXB, but there are situations like the one above where we need to parse manually. Here the TXNID and TXNAMT elements repeat as flat siblings inside DATA, with no element wrapping each pair, so JAXB has no straightforward way to turn each pair into one clean object. This is where DOM parsers such as org.w3c.dom come into play.

In many cases, we will be parsing XML data generated at runtime rather than XML stored in a file. To keep this example simple, we will parse a string and not read anything from a file.

First, we get a DocumentBuilder object from DocumentBuilderFactory. DocumentBuilderFactory.newInstance() looks up an available parser implementation, and newDocumentBuilder() returns a DocumentBuilder backed by it. The parse method of DocumentBuilder then produces the Document, which we can finally traverse.

String xmlRecords = "<?xml version=\"1.0\"?>\n" +
                "<COMMAND>\n" +
                "    <DATA>\n" +
                "        <TXNID>1234567891</TXNID>\n" +
                "        <TXNAMT>15.00</TXNAMT>\n" +
                "        <TXNID>1234567892</TXNID>\n" +
                "        <TXNAMT>15.00</TXNAMT>\n" +
                "        <TXNID>1234567893</TXNID>\n" +
                "        <TXNAMT>15.00</TXNAMT>\n" +
                "        <TXNID>1234567894</TXNID>\n" +
                "        <TXNAMT>15.00</TXNAMT>\n" +
                "    </DATA>\n" +
                "</COMMAND>";

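// Needs: javax.xml.parsers.*, org.w3c.dom.Document, org.xml.sax.InputSource, java.io.StringReader
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource inputSource = new InputSource();
// Wrap the XML string in a reader so the parser can consume it without a file
inputSource.setCharacterStream(new StringReader(xmlRecords));
Document doc = documentBuilder.parse(inputSource);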
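
For readers who want to run the step above on its own, here is a minimal self-contained sketch. The class name XmlStringParser and the helper rootName are made up for this illustration, and the sample string is a shortened version of the article's data. Checked parser exceptions are rethrown unchecked only to keep the example short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original article.
class XmlStringParser {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Rethrown unchecked to keep the sketch short
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Convenience for the demo: the name of the document's root element.
    static String rootName(String xml) {
        return parse(xml).getDocumentElement().getNodeName();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<COMMAND>\n"
                + "    <DATA>\n"
                + "        <TXNID>1234567891</TXNID>\n"
                + "        <TXNAMT>15.00</TXNAMT>\n"
                + "    </DATA>\n"
                + "</COMMAND>";
        System.out.println(rootName(xml)); // prints COMMAND
    }
}
```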

Now that the parsed document is available in doc, we can traverse the elements of the XML DOM.
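
Before iterating, it can help to look at what the child list of DATA actually contains. The sketch below (the class DataChildren and the method childLabels are names invented for this illustration) collects the node name of every child of the first DATA element. With the default factory settings, the line breaks and indentation between elements show up as "#text" nodes, which is why the loops in this article skip every other node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical inspection helper, not part of the original article.
class DataChildren {

    // Returns the node name of each child of the first DATA element.
    // Text nodes (including whitespace-only ones) report the name "#text".
    static List<String> childLabels(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        NodeList children = doc.getElementsByTagName("DATA").item(0).getChildNodes();
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            labels.add(children.item(i).getNodeName());
        }
        return labels;
    }

    public static void main(String[] args) {
        String xml = "<COMMAND>\n"
                + "    <DATA>\n"
                + "        <TXNID>1234567891</TXNID>\n"
                + "        <TXNAMT>15.00</TXNAMT>\n"
                + "    </DATA>\n"
                + "</COMMAND>";
        System.out.println(childLabels(xml)); // prints [#text, TXNID, #text, TXNAMT, #text]
    }
}
```

The elements sit at the odd indices 1, 3, 5, and so on, with whitespace text nodes in between.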

Next, we get the DATA tag from the XML data and iterate through all of its child elements, which are a sequence of TXNID and TXNAMT tags. This iteration can be done in two ways.

Way 1:

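// DataItem and setValue are the author's own helpers; their definitions are not shown here
NodeList dataTag = doc.getElementsByTagName("DATA");
NodeList dataItems = dataTag.item(0).getChildNodes();
DataItem item = null;
List<DataItem> items = new LinkedList<>();
// Start at 1 and step by 2 to skip the whitespace text nodes between elements
for (int j = 1; j < dataItems.getLength(); j += 2) {
    if (dataItems.item(j).getNodeName().equalsIgnoreCase("TXNID")) {
        // A TXNID starts a new record
        item = new DataItem();
        items.add(item);
    }
    setValue(item, dataItems.item(j).getNodeName(), dataItems.item(j).getTextContent());
}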

Inside the for loop, we fetch the jth node from the list of child nodes of the DATA tag (dataItems). This gives the correct result, but it gets slower as the number of child nodes grows. The culprit is the item(j) method of NodeList.

A Node object holds references to its first child node, its parent node (referred to as the owner node), its previous sibling node, and its next sibling node. The NodeList object does not store the nodes in an ArrayList or LinkedList, so the item method has to follow the nextSibling references to walk through the list. Therefore, whenever you call item, it starts from index 0 and iterates until it reaches the jth sibling. We can solve this performance issue simply by keeping a reference to the current node and moving on to its next sibling.
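
To make the cost described above concrete, here is a rough sketch of what an index lookup over a sibling-linked child list looks like. This is not the code of any particular parser; itemByWalking, the two hop counters, and the class name SiblingWalk are invented for illustration, and index is assumed to be non-negative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration, not an actual NodeList implementation.
class SiblingWalk {

    // Stand-in for item(index) on a sibling-linked child list:
    // start at the first child and follow nextSibling `index` times.
    static Node itemByWalking(Node parent, int index) {
        Node current = parent.getFirstChild();
        for (int i = 0; i < index && current != null; i++) {
            current = current.getNextSibling();
        }
        return current;
    }

    // Sibling hops spent when a loop calls itemByWalking for every index 0..n-1.
    static long hopsForIndexedLoop(int childCount) {
        return (long) childCount * (childCount - 1) / 2;
    }

    // Sibling hops spent when the loop keeps the current node and steps forward once per child.
    static long hopsForSiblingLoop(int childCount) {
        return Math.max(0, childCount - 1);
    }

    // Demo helper: name of the root element's child at `index`, or null if out of range.
    static String nameAt(String xml, int index) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Node found = itemByWalking(root, index);
            return found == null ? null : found.getNodeName();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nameAt("<DATA><A/><B/><C/></DATA>", 2)); // prints C
        System.out.println(hopsForIndexedLoop(1000)); // prints 499500
        System.out.println(hopsForSiblingLoop(1000)); // prints 999
    }
}
```

Under this model, visiting 1,000 children by index costs 499,500 sibling hops, while walking the chain once costs 999.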

Way 2:

Node node = dataItems.item(1);
while (node != null) {
    if (node.getNodeName().equalsIgnoreCase("TXNID")) {
        // A TXNID starts a new record
        item = new DataItem();
        items.add(item);
    }
    setValue(item, node.getNodeName(), node.getTextContent());
    // Step twice: the sibling right after an element is the "\n" whitespace text node
    Node next = node.getNextSibling();
    node = (next == null) ? null : next.getNextSibling();
}

Here, instead of iterating through the nodes with the item method, we follow the nextSibling reference. If the nextSibling reference is null, there are no more siblings.
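
The article does not show DataItem or setValue, so the following is only a sketch of how the sibling-based traversal might look as a complete program. The DataItem fields, the setValue body, and the class name TxnParser are assumptions made for this example. It also differs from the listing above in one respect: rather than stepping twice to jump over whitespace, it checks getNodeType() and skips anything that is not an element, which should keep working if the XML arrives without line breaks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical complete example; DataItem and setValue are assumed shapes.
class TxnParser {

    // Assumed holder for one TXNID/TXNAMT pair.
    static final class DataItem {
        String txnId;
        String txnAmt;

        @Override
        public String toString() {
            return txnId + "=" + txnAmt;
        }
    }

    // Assumed helper: copies a tag's text into the matching field.
    static void setValue(DataItem item, String name, String value) {
        if ("TXNID".equalsIgnoreCase(name)) {
            item.txnId = value;
        } else if ("TXNAMT".equalsIgnoreCase(name)) {
            item.txnAmt = value;
        }
    }

    static List<DataItem> parseItems(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<DataItem> items = new ArrayList<>();
        Node data = doc.getElementsByTagName("DATA").item(0);
        if (data == null) {
            return items; // no DATA element, nothing to collect
        }
        DataItem item = null;
        // Follow the sibling chain once instead of calling item(j) repeatedly
        for (Node node = data.getFirstChild(); node != null; node = node.getNextSibling()) {
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes
            }
            if ("TXNID".equalsIgnoreCase(node.getNodeName())) {
                item = new DataItem(); // a TXNID starts a new record
                items.add(item);
            }
            if (item != null) {
                setValue(item, node.getNodeName(), node.getTextContent());
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String xml = "<COMMAND>\n    <DATA>\n"
                + "        <TXNID>1234567891</TXNID>\n        <TXNAMT>15.00</TXNAMT>\n"
                + "        <TXNID>1234567892</TXNID>\n        <TXNAMT>15.00</TXNAMT>\n"
                + "    </DATA>\n</COMMAND>";
        System.out.println(parseItems(xml));
    }
}
```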

I hope this article helps you in your coding.
