Querying namespace-based XML data in C#

Working with regular XML files in C# isn't a complicated process. Things get harder when those files use namespaces, which often confuses people who have just started working with XML data manipulation.

XML namespaces were introduced to avoid ambiguous element and attribute names. For example, the same node can contain two kinds of identifier, each specific to a different domain. To avoid confusion, each identifier is declared as a member of a separate namespace.

Here is a simple XML file that uses namespaces to define some of its contents:

<orders xmlns:ord="urn:id" xmlns:cust="urn:cid">
  <order>
    <ord:id>223948778989</ord:id>
    <cust:id>32984029472</cust:id>
  </order>

  <order>
    <ord:id>34523234</ord:id>
    <cust:id>246542</cust:id>
  </order>

  <order>
    <ord:id>2434234234</ord:id>
    <cust:id>67352352</cust:id>
  </order>

  <order>
    <ord:id>56232</ord:id>
    <cust:id>352324234</cust:id>
  </order>
</orders>

As you can see, each order has two ID nodes, but each of them belongs to a separate namespace: one is for orders (ord) and one is for customers (cust).
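To make the distinction concrete, here is a minimal sketch that uses the XName and XNamespace types from LINQ to XML (the second approach covered in this article). The class and method names are made up for illustration. It shows that two names with the same local part are still different names when their namespace URIs differ.

```csharp
using System;
using System.Xml.Linq;

public static class NameComparisonSketch
{
    // Compares two names that share the local part "id"
    // but live in different namespaces.
    public static bool SameLocalNameIsSameName()
    {
        XNamespace ord = "urn:id";
        XNamespace cust = "urn:cid";

        XName orderId = ord + "id";
        XName customerId = cust + "id";

        return orderId == customerId;
    }

    public static void Main()
    {
        XNamespace ord = "urn:id";
        XName orderId = ord + "id";

        Console.WriteLine(orderId.LocalName);          // id
        Console.WriteLine(orderId.NamespaceName);      // urn:id
        Console.WriteLine(SameLocalNameIsSameName());  // False
    }
}
```

The prefix (ord or cust) never takes part in the comparison. Only the namespace URI and the local name do.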

I will cover two ways to handle XML data that contains elements in various namespaces: via XmlDocument (one of the most widely used approaches) and via LINQ to XML (LINQ stands for Language Integrated Query).

Let's start with the first method. If I want to select the node that represents a specific order ID, I can use the code shown below:

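using System.Diagnostics;
using System.Xml;

// Load the sample document from disk.
XmlDocument doc = new XmlDocument();
doc.Load(@"D:\Temporary\file.xml");

// Map each prefix used in the XPath query to its namespace URI.
XmlNamespaceManager manager = new XmlNamespaceManager(doc.NameTable);
manager.AddNamespace("ord", "urn:id");
manager.AddNamespace("cust", "urn:cid");

// Select the ord:id element whose value equals 34523234.
XmlNodeList list = doc.SelectNodes("/orders/order/ord:id[. = 34523234]", manager);

foreach (XmlNode node in list)
{
    Debug.Print(node.InnerText);
}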

First of all, I create an instance of XmlDocument and load the file, which has exactly the contents shown above. If the file had no namespaces, I could start selecting and iterating over nodes right away. However, two namespaces are declared, and to select nodes that are members of those namespaces I need an instance of XmlNamespaceManager that maps each prefix used in my queries to its namespace URI.

Note that when instantiating the XmlNamespaceManager, I pass the document's name table as the constructor parameter. The name table stores the element and attribute names used in the document as atomized strings, which makes name lookups during a query cheaper. You can read more about the XmlDocument.NameTable property in the .NET documentation.
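One detail that is easy to miss: the prefixes registered in the XmlNamespaceManager only have to map to the right URI. They do not have to match the prefixes written in the file. The sketch below illustrates this under a few assumptions: the XML is a trimmed, inline copy of the sample (with the customer element written as cust:id), and the class and method names are hypothetical.

```csharp
using System;
using System.Xml;

public static class PrefixSketch
{
    // Trimmed inline copy of the sample document (assumption: one order only).
    const string Xml =
        "<orders xmlns:ord=\"urn:id\" xmlns:cust=\"urn:cid\">" +
        "<order><ord:id>34523234</ord:id><cust:id>246542</cust:id></order>" +
        "</orders>";

    // Registers the order namespace under an arbitrary prefix and
    // counts the order ID nodes found with that prefix.
    public static int CountOrderIds(string prefix)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);

        XmlNamespaceManager manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace(prefix, "urn:id");

        return doc.SelectNodes("/orders/order/" + prefix + ":id", manager).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountOrderIds("ord"));  // 1 (same prefix as the file)
        Console.WriteLine(CountOrderIds("o"));    // 1 (different prefix, same URI)
    }
}
```

Both calls find the same node, because the XPath engine resolves the prefix through the manager and then matches on the namespace URI.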

Once the namespaces are registered, I can proceed with node selection. I am using a standard XPath query that selects all ord:id nodes whose value equals 34523234. An order ID is unique, so in my specific case I get a single node back. If I instead selected by customer ID and the same customer had placed several orders, the XmlNodeList instance would contain more nodes. Also note that I pass the XmlNamespaceManager instance as the second parameter to SelectNodes. That way, when the query references a node by prefix, the prefix can be resolved to the right namespace URI.
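The multi-node case can be sketched as follows. The data here is hypothetical: it is not the sample file, but a small inline document in which customer 246542 placed two orders, with the customer element written as cust:id. The class and method names are also made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class CustomerQuerySketch
{
    // Hypothetical data: customer 246542 placed two of the three orders.
    const string Xml =
        "<orders xmlns:ord=\"urn:id\" xmlns:cust=\"urn:cid\">" +
        "<order><ord:id>34523234</ord:id><cust:id>246542</cust:id></order>" +
        "<order><ord:id>56232</ord:id><cust:id>246542</cust:id></order>" +
        "<order><ord:id>2434234234</ord:id><cust:id>67352352</cust:id></order>" +
        "</orders>";

    // Returns the IDs of all orders placed by the given customer.
    public static List<string> OrderIdsForCustomer(long customerId)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);

        XmlNamespaceManager manager = new XmlNamespaceManager(doc.NameTable);
        manager.AddNamespace("ord", "urn:id");
        manager.AddNamespace("cust", "urn:cid");

        // Filter on the order element, then step down to its ord:id child.
        string xpath = "/orders/order[cust:id = " + customerId + "]/ord:id";

        List<string> ids = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath, manager))
        {
            ids.Add(node.InnerText);
        }
        return ids;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", OrderIdsForCustomer(246542)));  // 34523234, 56232
        Console.WriteLine(OrderIdsForCustomer(67352352).Count);             // 1
    }
}
```

Taking the customer ID as a number rather than a string keeps arbitrary text out of the XPath expression.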

If you try to run the node selection without passing the namespace manager, you will get an XPathException, because the query uses a namespace prefix whose URI is unknown.
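A minimal sketch of that failure, assuming a one-order inline document and hypothetical class and method names:

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class MissingManagerSketch
{
    // Runs a prefixed XPath query without a namespace manager and
    // reports whether an XPathException was thrown.
    public static bool ThrowsWithoutManager()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<orders xmlns:ord=\"urn:id\">" +
            "<order><ord:id>56232</ord:id></order>" +
            "</orders>");

        try
        {
            // No XmlNamespaceManager is passed, so "ord" cannot be resolved.
            doc.SelectNodes("/orders/order/ord:id");
            return false;
        }
        catch (XPathException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsWithoutManager());  // True
    }
}
```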

The LINQ to XML approach to working with nodes that are namespace members is a little different:

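using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;

XElement root = XElement.Load(@"D:\Temporary\file.xml");
// Only the namespace URI is needed; no prefix is registered.
XNamespace ord = "urn:id";

// For every order element, pick its child named id in the ord namespace.
IEnumerable<XElement> list = from c in root.Descendants("order")
                             select c.Element(ord + "id");

foreach (XElement element in list)
{
    Debug.Print(element.ToString());
}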

The XML data is loaded via XElement. I also declare an XNamespace instance for the namespace that holds the order ID. The URI is assigned as a simple string and no prefix is introduced. The query walks every order element below the root and, for each one, selects the child element that is named id and belongs to that namespace. The result printed in the output console looks like this:

<ord:id xmlns:ord="urn:id">223948778989</ord:id>
<ord:id xmlns:ord="urn:id">34523234</ord:id>
<ord:id xmlns:ord="urn:id">2434234234</ord:id>
<ord:id xmlns:ord="urn:id">56232</ord:id>
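The query above returns every order ID. To mirror the XPath example, which picked out a single order, a where clause can be added. The sketch below is one way to do that, assuming a trimmed inline copy of the sample (with the customer element written as cust:id) and hypothetical class and method names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinqFilterSketch
{
    // Trimmed inline copy of the sample document (assumption: two orders only).
    const string Xml =
        "<orders xmlns:ord=\"urn:id\" xmlns:cust=\"urn:cid\">" +
        "<order><ord:id>223948778989</ord:id><cust:id>32984029472</cust:id></order>" +
        "<order><ord:id>34523234</ord:id><cust:id>246542</cust:id></order>" +
        "</orders>";

    // Returns the customer IDs of all orders with the given order ID.
    public static List<string> CustomersForOrder(string orderId)
    {
        XElement root = XElement.Parse(Xml);
        XNamespace ord = "urn:id";
        XNamespace cust = "urn:cid";

        // The (string) cast reads the element's text, or null if it is missing.
        return (from o in root.Elements("order")
                where (string)o.Element(ord + "id") == orderId
                select (string)o.Element(cust + "id")).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CustomersForOrder("34523234")));  // 246542
        Console.WriteLine(CustomersForOrder("0").Count);                      // 0
    }
}
```

Casting to string instead of reading the Value property avoids a NullReferenceException when an order has no such child element.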

Both methods described here perform pretty much the same task. As a personal choice, I prefer LINQ because its queries are flexible and easy to modify for the target data.
