XML Parsing Using Java org.w3c.dom
Explore XML parsing by using Java org.w3c.dom.
You will often come across XML data that is awkward to parse. This example shows how to parse such data efficiently using Java's org.w3c.dom API.
XML Data to Be Parsed:
<?xml version="1.0"?>
<COMMAND>
<DATA>
<TXNID>1234567891</TXNID>
<TXNAMT>15.00</TXNAMT>
<TXNID>1234567892</TXNID>
<TXNAMT>15.00</TXNAMT>
<TXNID>1234567893</TXNID>
<TXNAMT>15.00</TXNAMT>
<TXNID>1234567894</TXNID>
<TXNAMT>15.00</TXNAMT>
</DATA>
</COMMAND>
Most XML data can easily be parsed with JAXB, but some situations, like the one above, call for manual parsing. Here the TXNID and TXNAMT tags repeat as flat siblings under DATA, with no wrapper element around each transaction, so JAXB cannot turn the data into a clean object mapping. This is where XML parsers such as org.w3c.dom come into play.
In many cases, we parse XML data that was generated at runtime rather than XML data stored in a file. To keep this example simple, we will not read the data from a file.
First, we obtain a DocumentBuilder object from DocumentBuilderFactory. The factory looks up an available implementation of DocumentBuilder and returns an instance of it. The parse method of DocumentBuilder then returns the Document, which we can finally traverse.
String xmlRecords = "<?xml version=\"1.0\"?>\n" +
"<COMMAND>\n" +
" <DATA>\n" +
" <TXNID>1234567891</TXNID>\n" +
" <TXNAMT>15.00</TXNAMT>\n" +
" <TXNID>1234567892</TXNID>\n" +
" <TXNAMT>15.00</TXNAMT>\n" +
" <TXNID>1234567893</TXNID>\n" +
" <TXNAMT>15.00</TXNAMT>\n" +
" <TXNID>1234567894</TXNID>\n" +
" <TXNAMT>15.00</TXNAMT>\n" +
" </DATA>\n" +
"</COMMAND>";
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource inputSource = new InputSource();
inputSource.setCharacterStream(new StringReader(xmlRecords));
Document doc = documentBuilder.parse(inputSource);
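For reference, the steps above can be put together as a small self-contained sketch with the imports they need. The class name XmlStringParser and the choice to wrap checked exceptions in an IllegalArgumentException are assumptions made for illustration, not part of the original example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is an assumption for this sketch.
final class XmlStringParser {

    // Parses an XML string into a DOM Document.
    // Checked parser exceptions are wrapped so callers do not have to declare them.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // InputSource can wrap any Reader, so no file is needed.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xmlRecords = "<?xml version=\"1.0\"?>\n"
                + "<COMMAND>\n"
                + " <DATA>\n"
                + "  <TXNID>1234567891</TXNID>\n"
                + "  <TXNAMT>15.00</TXNAMT>\n"
                + " </DATA>\n"
                + "</COMMAND>";
        Document doc = parse(xmlRecords);
        // Print the name of the root element of the parsed document.
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

If the XML comes from an untrusted source, it is usually worth configuring the factory's security features before creating the builder; that is left out here to stay close to the original snippet.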
Now that the parsed document is available in doc, we can walk through the elements of the XML DOM.
Next, we get the DATA tag from the XML data and iterate through all of its child elements, which are a sequence of TXNID and TXNAMT tags. This iteration can be done in two ways.
Way 1:
NodeList dataTag = doc.getElementsByTagName("DATA");
NodeList dataItems = dataTag.item(0).getChildNodes();
DataItem item = null;
List<DataItem> items = new LinkedList<>();
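// Children alternate between whitespace text nodes and elements, so the elements sit at odd indexes
for (int j = 1; j < dataItems.getLength(); j+=2) {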
if (dataItems.item(j).getNodeName().equalsIgnoreCase("TXNID")) {
item = new DataItem();
items.add(item);
}
setValue(item, dataItems.item(j).getNodeName(), dataItems.item(j).getTextContent());
}
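The snippet above uses a DataItem class and a setValue method that the article does not show. A minimal sketch of what they might look like follows; the field names, the DataItemMapper holder class, and the trimming of values are all assumptions for illustration. In the article, setValue is called unqualified, so it presumably lives in the same class as the parsing code.

```java
// Hypothetical value class: the article uses DataItem but does not define it.
final class DataItem {
    private String txnId;
    private String txnAmt;

    String getTxnId() { return txnId; }
    void setTxnId(String txnId) { this.txnId = txnId; }

    String getTxnAmt() { return txnAmt; }
    void setTxnAmt(String txnAmt) { this.txnAmt = txnAmt; }
}

// Hypothetical holder for setValue; an assumption made so the sketch compiles on its own.
final class DataItemMapper {

    // Copies the text of one tag into the matching field of the item.
    // Unknown node names (for example "#text" for whitespace nodes) are ignored.
    static void setValue(DataItem item, String nodeName, String value) {
        if (item == null || nodeName == null || value == null) {
            return;
        }
        if ("TXNID".equalsIgnoreCase(nodeName)) {
            item.setTxnId(value.trim());
        } else if ("TXNAMT".equalsIgnoreCase(nodeName)) {
            item.setTxnAmt(value.trim());
        }
    }
}
```

The amount is kept as a String here; converting it to a BigDecimal would be a reasonable next step for real transaction data.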
Inside the for loop in Way 1, we fetch the jth node from the list of child nodes of the DATA tag (dataItems). Although this gives the correct result, it gets slower as the list grows. The culprit is the item(j) method of NodeList.
A Node object holds references to its first child node, its parent node (referred to as the owner node), its previous sibling node, and its next sibling node. The NodeList object does not store the nodes in an ArrayList or LinkedList. As a result, the item method has to follow the nextSibling references to traverse the node list. Therefore, whenever you call the item method, it starts from index 0 and iterates until it reaches the jth sibling node. We can solve this performance issue simply by keeping a reference to the next sibling ourselves.
Way 2:
Node node = dataItems.item(1);
while (node != null) {
if (node.getNodeName().equalsIgnoreCase("TXNID")) {
item = new DataItem();
items.add(item);
}
setValue(item, node.getNodeName(), node.getTextContent());
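// Call getNextSibling() twice to skip the whitespace text node ("\n") between elements;
// the null check avoids a NullPointerException when no text node follows the last element
Node next = node.getNextSibling();
node = (next == null) ? null : next.getNextSibling();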
}
Here, instead of iterating through the nodes with the item method, we follow the nextSibling reference. When the nextSibling reference is null, there are no more siblings.
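Stepping over two siblings at a time assumes that exactly one whitespace text node sits between every pair of elements. As a variation, the sketch below walks the siblings one at a time and checks the node type instead, so it should also work when the XML has no line breaks. The class name SiblingWalk, the readPairs method, and the "id=amount" string format are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class; names and output format are assumptions for this sketch.
final class SiblingWalk {

    // Returns one "TXNID=TXNAMT" string per pair found under the first DATA element.
    static List<String> readPairs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> pairs = new ArrayList<>();
            Node data = doc.getElementsByTagName("DATA").item(0);
            if (data == null) {
                return pairs; // no DATA element, nothing to read
            }
            String currentId = null;
            // Follow the nextSibling references instead of calling item(j).
            for (Node n = data.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes, comments, and so on
                }
                String name = n.getNodeName();
                if ("TXNID".equalsIgnoreCase(name)) {
                    currentId = n.getTextContent().trim();
                } else if ("TXNAMT".equalsIgnoreCase(name) && currentId != null) {
                    pairs.add(currentId + "=" + n.getTextContent().trim());
                    currentId = null; // wait for the next TXNID
                }
            }
            return pairs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Checking getNodeType() costs one extra comparison per node, but it removes the dependence on how the XML happens to be indented.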
Hope this article helps you code.