
Gemli - Do We Need An XPath-to-Object Mapper?


It dawned on me today that two fundamental problems in my day-to-day (and year-to-year) programming haven’t been adequately addressed by all of our innovations in languages and tooling: a) the absence of a flexible, late-bound XML-to-object-graph binding mechanism, and b) the absence of an XPath-like query mechanism for CLR object graphs.

Right now I want to focus on a), the absence of a flexible, late-bound XML-to-object-graph binding mechanism.

Microsoft .NET inherently supports XML serialization and deserialization of object graphs, but what I’ve found so far is that

  1. the mappings are defined in explicit C# code (with attributes) at compile-time,
  2. class members must be sibling elements or attributes of their parent and cannot be “relational” cousins,
  3. XML trees must have a fixed hierarchy pattern or else manual serialization/deserialization code directly inside the class is required,
  4. in deserialization, serialized content cannot be XmlElement objects, and standalone XML-serialized objects often cannot be appended as children to other XML trees because of XML/XSD namespace declarations,
  5. extra or missing CLR members, or extra or missing XML members, do not always get handled gracefully when values are mapped at runtime.

Let’s take a look at each of these problems.

The first problem is simple. You add [Serializable] to your objects (strictly speaking, XmlSerializer doesn’t require it), and you express how the members align with the XML using [XmlElement], etc. There’s nothing much wrong with this, and I don’t mind it, but because of 2) above, it’s pretty inflexible. For one thing, you cannot define BookName’s XML mapping with, say, [XmlElement("../../Books/Book[@ID={BookId}]/Name")], where {BookId} maps to the C# sibling member BookId, as a way of saying “map this member to cousin ‘Name’ in that other branch under Books”. The other issue with the first problem is that you cannot late-define a mapping of XML to an object without manual reflection and manual population of data. What if you have a POCO object that was never intended to be serialized?
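For reference, here is a minimal sketch of the compile-time approach using the standard System.Xml.Serialization.XmlSerializer. The Order type, its members, and the sample XML are hypothetical. The point is that [XmlElement] and [XmlAttribute] take a *name*, not a path, so every mapped member must sit directly on or under its parent element.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical type: the mapping is baked into the class at compile time.
// XmlSerializer needs a public type with a public parameterless constructor;
// [Serializable] is not required.
[XmlRoot("order")]
public class Order
{
    // Binds to an attribute on <order>.
    [XmlAttribute("bookId")]
    public int BookId { get; set; }

    // Binds only to a direct child element <bookName>. There is no way to say
    // "../../Books/Book[@ID={BookId}]/Name" here.
    [XmlElement("bookName")]
    public string BookName { get; set; }
}

public static class OrderMappingDemo
{
    public static Order Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using (var reader = new StringReader(xml))
        {
            return (Order)serializer.Deserialize(reader);
        }
    }
}
```

For example, `OrderMappingDemo.Parse("<order bookId=\"7\"><bookName>Dune</bookName></order>")` yields an Order with BookId 7 and BookName "Dune".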

(I already went over the second problem, as it aligns with the first problem.)

The third problem is that if you have “nephew” nodes that should map as siblings, you can accomplish this, but not without manually implementing IXmlSerializable. I went down this path for Gemli.Data 0.3 in order to load the O/RM mappings from XML, and I must say I am floored by how little confidence I have in my implementation. With IXmlSerializable you’re literally poking at stream data through a forward-only XmlReader, not working with logical constructs at all. I find that a thoroughly unacceptable way to deal with XML-to-object mapping, given that XML and CLR object graphs both express their structures and hierarchies. Perhaps I am spoiled by the W3C XML DOM (System.Xml.XmlDocument), which is not used in XML deserialization, and rightly so, I suppose, since object-mappable XML streams can be huge (gigabytes). Still, the IXmlSerializable interface, with just GetSchema(), ReadXml(XmlReader), and WriteXml(XmlWriter), leaves us with few options.
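To illustrate what “poking at stream data” looks like, here is a rough sketch of an IXmlSerializable implementation that flattens nested nodes into sibling members. The EntityMapping type and its `<entity>/<table>/<columns>` layout are made up for illustration; this is not Gemli.Data’s actual mapping format.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type that flattens
//   <entity><table name="..."/><columns><column name="..."/>...</columns></entity>
// into two sibling members. Attributes alone can't express the nested
// <column> nodes, so ReadXml walks the forward-only XmlReader by hand.
[XmlRoot("entity")]
public class EntityMapping : IXmlSerializable
{
    public string TableName { get; set; }
    public List<string> ColumnNames { get; } = new List<string>();

    // Conventionally returns null.
    public XmlSchema GetSchema() => null;

    public void ReadXml(XmlReader reader)
    {
        // The reader starts on <entity>; ReadXml must consume through </entity>.
        if (reader.IsEmptyElement)
        {
            reader.Read();
            return;
        }
        int entityDepth = reader.Depth;
        while (reader.Read())
        {
            // Stop on the </entity> that matches the start element.
            if (reader.NodeType == XmlNodeType.EndElement && reader.Depth == entityDepth)
                break;
            // No logical constructs here: we just watch element names go by.
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "table")
                TableName = reader.GetAttribute("name");
            else if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "column")
                ColumnNames.Add(reader.GetAttribute("name"));
        }
        reader.ReadEndElement(); // consume </entity>
    }

    public void WriteXml(XmlWriter writer)
    {
        // XmlSerializer has already written <entity>; write only its content.
        writer.WriteStartElement("table");
        writer.WriteAttributeString("name", TableName);
        writer.WriteEndElement();
        writer.WriteStartElement("columns");
        foreach (string column in ColumnNames)
        {
            writer.WriteStartElement("column");
            writer.WriteAttributeString("name", column);
            writer.WriteEndElement();
        }
        writer.WriteEndElement();
    }

    public static EntityMapping Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(EntityMapping));
        using (var reader = new StringReader(xml))
        {
            return (EntityMapping)serializer.Deserialize(reader);
        }
    }
}
```

Every new nesting rule means more depth bookkeeping in ReadXml, and nothing about the XML hierarchy is expressed declaratively.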

The fourth problem goes something like this. Say that you XML-serialized an object with rich XSD namespace semantics. Then you created an XmlDocument that contains a bunch of custom XML markup. You cannot simply drop your XML-serialized object into the XmlDocument as a child element, because your custom XML declares namespaces in its header. Runtime errors result because either the namespaces of your object are not understood (are not declared), or your object’s XML declares the namespaces and gets placed inside the body of the host XML. This looks like a fundamental problem with XML/XSD as currently used. Strictly speaking, though, the W3C Namespaces in XML recommendation does allow xmlns declarations on any element, so in practice the errors may instead come from the serializer’s <?xml ...?> declaration or from inserting a node that belongs to a different XmlDocument.
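As a sketch of the mechanics involved (not a general fix for namespace conflicts), the following serializes an object without an XML declaration and then grafts it into a host XmlDocument. The Widget type, the ImportDemo helper, and the host markup are hypothetical. XmlNode.AppendChild throws an ArgumentException when the node belongs to a different document, so XmlDocument.ImportNode has to copy it into the host’s context first.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical serializable type.
public class Widget
{
    public string Name { get; set; }
}

public static class ImportDemo
{
    // Serializes obj without an <?xml ...?> declaration, then imports the
    // resulting element into host and appends it under host's root element.
    public static XmlNode EmbedSerialized(object obj, XmlDocument host)
    {
        var serializer = new XmlSerializer(obj.GetType());
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var text = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            serializer.Serialize(writer, obj);
        }

        var serialized = new XmlDocument();
        serialized.LoadXml(text.ToString());

        // ImportNode copies the element (including the xmlns:xsi/xmlns:xsd
        // declarations XmlSerializer emits) into host's document context.
        XmlNode imported = host.ImportNode(serialized.DocumentElement, true);
        return host.DocumentElement.AppendChild(imported);
    }

    // Returns true if appending a node owned by another document throws.
    public static bool AppendWithoutImportThrows(XmlDocument host, XmlDocument other)
    {
        try
        {
            host.DocumentElement.AppendChild(other.DocumentElement);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

The nested element keeps its own namespace declarations, which is legal XML. What the workaround can’t do is reconcile the object’s mapping with arbitrary host markup. That still has to be done in code.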

On the fifth problem, where missing or extra members are not handled gracefully: this is actually an implementation detail and, truthfully, it’s not normally a problem, because XSD and .NET attributes can accommodate it (as xsd.exe-generated output shows). However, combined with resolutions to the previously mentioned problems, it is something to keep in mind. Sometimes the problem goes the other way around, actually: sometimes you need exceptions thrown if, for example, an XML element is missing. XSD and the mapping attributes in .NET define that, too. However, as far as I know there is no way to establish late bindings where such mapping behaviors can be defined at runtime.
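Here is a small sketch of the default XmlSerializer behavior. A missing element leaves its member at the default value, and an extra element is skipped but can be observed through the UnknownElement event. The Person type and sample XML are assumptions for illustration. If you need the stricter behavior, the handler could throw instead, and Deserialize then reports the failure as an InvalidOperationException.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical POCO used for illustration.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class UnknownElementDemo
{
    // Deserializes a <Person> document. Elements with no matching member are
    // skipped by XmlSerializer; here their names are recorded via UnknownElement.
    // Members whose elements are missing keep their default values.
    public static Person Parse(string xml, out List<string> unknownElements)
    {
        var unknown = new List<string>();
        var serializer = new XmlSerializer(typeof(Person));
        serializer.UnknownElement += (sender, e) => unknown.Add(e.Element.Name);

        using (var reader = new StringReader(xml))
        {
            var person = (Person)serializer.Deserialize(reader);
            unknownElements = unknown;
            return person;
        }
    }
}
```

Note that the handler is attached in code, per serializer instance. It is not something a mapping file can declare.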

Is There A Solution?

Actually, I’m not sure that there is a solution to these problems, or even a set of solutions that work together to address them. However, I am pondering whether it would be worth my time to start working on some tooling in Gemli to make Object/XML Mapping easier.

The scenario that had me pondering such tooling goes something like this. Suppose you have an XML node whose hierarchy maps to an object graph, and you aren’t yet sure how the XML hierarchy or naming convention is going to flow, i.e. whether it will be a child of this element, a child of that element, or at the root of the document. Let’s say that you want to express, in a hand-crafted, initially XSD-less XML structure, some kind of definition of an object graph and the values of its members. Let’s say that you already have some POCO objects that can accommodate the hierarchical structure of such an XML tree. At this juncture, what tools are available to perform the mapping? Without going into your POCO code and manually declaring the XML deserialization behaviors, there is no straightforward way of doing this.

The tooling I might build into Gemli, then, could work like this.

First, consider the following XML and POCO sample:

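```xml
<myBigFatXmlDocument>
  <events>
    <event name="Big Spectacular Event">
      <dateTime>01/01/2011 12:01 AM</dateTime>
      <attendees>
        <attendee><name>...</name></attendee>
      </attendees>
    </event>
  </events>
</myBigFatXmlDocument>
```

```csharp
using System.Collections.Generic;

// POCO!
public class PocoEvent
{
    public string Name { get; set; }
    public int AttendeeCount { get; set; }
    public List<string> Attendees { get; set; }
}
```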

By the way, I realize it seems like a stupid design to have AttendeeCount be something other than a lookup of Attendees.Count, but bear with me; I’m demonstrating something. Besides, what if you know that you have more attendees than you know the names of from the XML?

Second, I might cobble together a rough mapping document like this (this is NOT at all a proper definition proposal; it’s more of a “pseudo-code XML document” to get the point across):

```xml
<?xml version="1.0"?>
<clrType name="MyProject.PocoEvent" xmlElement="event">
  <clrMember name="Name" xpath="@name" />
  <clrMember name="AttendeeCount" xpath="count(.//attendee)" />
  <clrMember name="Attendees" xpath="./attendees/attendee/name"
             clrType="string" />
</clrType>
```

Third, the C# code that performs the loading might look like this:

```csharp
using System.Xml;

// load the XML data and the mapping document
var myXmlDocument = new XmlDocument();
myXmlDocument.Load("myXmlFile.xml");
var graphMapDocument = new XmlDocument();
graphMapDocument.Load("myObjectGraphMap.xml");

// get node of first event in the list (XPath positions are 1-based)
var eventXmlNode = myXmlDocument.SelectSingleNode("//events/event[1]");

// deserialization setup (proposed Gemli API; it doesn't exist yet)
var graphDeserializer = new Gemli.Xml.ObjectGraphSerializer<PocoEvent>();
// drill-down; just use the XmlNode I need for the job
XmlNode graphMap = graphMapDocument.SelectSingleNode("//clrType[@name='MyProject.PocoEvent']");

// work is done here
PocoEvent objectGraph = graphDeserializer.Deserialize(eventXmlNode, graphMap);
```

Assuming such tooling were implemented, what I have done is:

  1. declare an XML node that defines the data I want loaded into an object graph,
  2. declare an XPath-to-object mapping strategy in ridiculously easy-to-read semantics,
  3. load my POCO object graph without any manual declaration of XML serialization behavior outside of that simple XML mapping file, and
  4. demonstrate, albeit poorly, how an XPath function can define the initial value of AttendeeCount, which can then be manipulated independently of Attendees.Count (just assume this was a requirement).
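To make the idea more concrete, here is a rough prototype of what the core of such a serializer might do, assuming the pseudo-mapping format above. It reads each `<clrMember>`, evaluates its `xpath` against the data node with XPathNavigator.Evaluate, and assigns the result to the same-named public property via reflection. XPathObjectMapper and Map<T> are hypothetical names, not an existing Gemli API. The `clrType` attribute is ignored, and List<string> is the only collection type handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;
using System.Xml;
using System.Xml.XPath;

// Same shape as the POCO in this post.
public class PocoEvent
{
    public string Name { get; set; }
    public int AttendeeCount { get; set; }
    public List<string> Attendees { get; set; }
}

// Hypothetical prototype of an XPath-to-object mapper.
public static class XPathObjectMapper
{
    // For each <clrMember name="..." xpath="..."/> under mapNode, evaluates the
    // XPath relative to dataNode and assigns the result to the matching property.
    public static T Map<T>(XmlNode dataNode, XmlNode mapNode) where T : new()
    {
        var target = new T();
        XPathNavigator navigator = dataNode.CreateNavigator();

        foreach (XmlElement member in mapNode.SelectNodes("clrMember"))
        {
            PropertyInfo property = typeof(T).GetProperty(member.GetAttribute("name"));
            if (property == null || !property.CanWrite)
                continue; // tolerate mapping entries with no matching member

            object result = navigator.Evaluate(member.GetAttribute("xpath"));
            object value = ConvertResult(result, property.PropertyType);
            if (value != null)
                property.SetValue(target, value);
        }
        return target;
    }

    // Evaluate returns a double, bool, string, or XPathNodeIterator (node-set).
    private static object ConvertResult(object result, Type targetType)
    {
        if (result is XPathNodeIterator nodes)
        {
            var values = new List<string>();
            while (nodes.MoveNext())
                values.Add(nodes.Current.Value);

            if (targetType == typeof(List<string>))
                return values;
            // Scalar member: use the first matching node, if any.
            return values.Count == 0
                ? null
                : Convert.ChangeType(values[0], targetType, CultureInfo.InvariantCulture);
        }
        return Convert.ChangeType(result, targetType, CultureInfo.InvariantCulture);
    }
}
```

With the sample mapping, `count(.//attendee)` comes back as a double and is converted to int for AttendeeCount, while the `name` node-set becomes the Attendees list. An `<attendee>` with no `<name>` still counts toward AttendeeCount.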

It’s still not succinct enough. I could just as well have set up the API to take an XPath selection in .Deserialize() so that I wouldn’t have to call .SelectSingleNode(). For that matter, perhaps I could have written this block of code as succinctly as this, using a couple of do-it-all methods:

```csharp
PocoEvent objectGraph = new Gemli.Xml.ObjectGraphSerializer<PocoEvent>(".\\myObjectGraphMap.xml")
    .Deserialize(".\\myXmlFile.xml", "//events/event[1]");
```

It could very well be that something in .NET, perhaps even LINQ to XML, already offers this level of simplicity for XML-to-object mapping. But I highly doubt it.
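For comparison, here is roughly what the same mapping looks like when written by hand with LINQ to XML and the System.Xml.XPath.Extensions methods (XPathSelectElement, XPathSelectElements, XPathEvaluate). The XPath expressions are the ones from the pseudo-mapping file, and the LinqToXmlMapping class is a hypothetical helper. It works, but the mapping lives in compiled C# rather than in a late-bound map document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Same shape as the POCO in this post.
public class PocoEvent
{
    public string Name { get; set; }
    public int AttendeeCount { get; set; }
    public List<string> Attendees { get; set; }
}

// Hypothetical helper: a hand-written, compile-time mapping using LINQ to XML.
public static class LinqToXmlMapping
{
    public static PocoEvent LoadFirstEvent(XDocument document)
    {
        // XPath positions are 1-based, so [1] is the first event.
        XElement eventElement = document.XPathSelectElement("//events/event[1]");
        if (eventElement == null)
            return null;

        return new PocoEvent
        {
            Name = (string)eventElement.Attribute("name"),
            // count(...) evaluates to a boxed double.
            AttendeeCount = Convert.ToInt32((double)eventElement.XPathEvaluate("count(.//attendee)")),
            Attendees = eventElement.XPathSelectElements("./attendees/attendee/name")
                                    .Select(name => name.Value)
                                    .ToList()
        };
    }
}
```

Changing where `<event>` lives in the document, or adding a member, still means editing and recompiling this method.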

If someone out there reading this knows of some tooling that does this already, let me know. Otherwise, I am likely to get started on this in Gemli.



Published at DZone with permission of Jon Davis, DZone MVB. See the original article here.


