SmartXML: An Alternative to XPath for Complex XML Files

We'll discuss SmartXML, an XPath alternative for parsing complex XML files, converting them to SQL, and loading the results into a database seamlessly.

Luca Sanders

Jan. 30, 25 · Tutorial

Likes (0)

Comment

Save

360 Views

XML is one of the most widely used data formats, which in popularity can compete only with JSON. Still, very often, this format is used as an intermediate representation of data that needs to be transferred between two information systems. And like any intermediate representation the final storage point of XML is a database.

Usually, XPath is used to parse XML because it represents a set of functions that allows you to extract data from an XML tree. However, not all XML files are formed correctly, which creates great difficulties when using XPath.

Typical Problems When Working With XPath

Differences in node naming. You may have an array of documents with a similar logical structure, but they may have differences in the way node names are spelled.
Missing nodes. If bad XML generators are used on the server side, they may skip some nesting levels for some of the data in the resulting XML.
Object or array? XPath does not allow you to explicitly specify whether the contents of a particular node are an object or an array of objects.
Inability to extend syntax. XPath is just a node traversal tool with no syntax extension capability.

In this article, I will discuss a tool called SmartXML that solves these problems and allows you to upload complex XML documents to a database.

Project Structure

SmartXML uses an intermediate representation when processing data — SmartDOM. Unlike traditional DOM, this structure controls the level of element hierarchy and can complete its nodes.

SmartDOM consists of the declarative description itself and sets of rules for its transformation.

Three Examples of Documents With a Divergent Structure

Example 1

The document has a relatively correct structure. All sections have correct nesting.

    Plain Text
   
 

   <doc>
    <commonInfo>
        <supplyNumber>100480</supplyNumber>
        <supplyDate>2025-01-20</supplyDate>
    </commonInfo>
    <lots>
        <lot>
            <objects>
                <object>
                    <name>apples</name>
                    <price>3.25</price>
                    <currency>USD</currency>
                </object>
                <object>
                    <name>oranges</name>
                    <price>3.50</price>
                    <currency>USD</currency>
                </object>
            </objects>
        </lot>
        <lot>
            <objects>
                <object>
                    <name>bananas</name>
                    <price>2.50</price>
                    <currency>EUR</currency>
                </object>
                <object>
                    <name>strawberries</name>
                    <price>5.00</price>
                    <currency>USD</currency>
                </object>
                <object>
                    <name>grapes</name>
                    <price>3.75</price>
                    <currency>USD</currency>
                </object>
            </objects>
        </lot>
    </lots>
</doc>
  

Example 2

The nesting of sections is broken. The object does not have a parent.

    Plain Text
   
 

   <doc>
  <commonInfo>
      <supplyNumber>100593</supplyNumber>
      <date>2025-01-21</date>
  </commonInfo>
  <lots>
      <lot>
          <object>
              <name>raspberry</name>
              <price>7.50</price>
              <currency>USD</currency>
          </object>
      </lot>
  </lots>
</doc>
  

Example 3

The nesting is correct, but the node names do not match the other sections.

    Plain Text
   
 

   <doc>
  <commonInfo>
      <supplyNumber>100601</supplyNumber>
      <date>2025-01-22</date>
  </commonInfo>
  <lots>
      <lot>
          <objects>
              <obj>
                  <name>cherries</name>
                  <price>3.20</price>
                  <currency>EUR</currency>
              </obj>
              <obj>
                  <name>blueberries</name>
                  <price>4.50</price>
                  <currency>USD</currency>
              </obj>
              <obj>
                  <name>peaches</name>
                  <price>2.80</price>
                  <currency>USD</currency>
              </obj>
          </objects>
      </lot>
  </lots>
</doc>
  

As you can see, all three of these documents contain the same data but have different storage structures.

Intermediate Data View

The full structure of the SmartDOM view from data-templates.red:

    Plain Text
   
 

   #[
    sample: #[ ; section name
        supply_sample: #[ ; subsection name
            supply_number: none
            supply_date: none
            delivery_items: [
                item: [
                    name: none
                    price: none
                    currency: none
                ]
            ]
        ]
    ]
]
  

Project Setup

Create a project and set up a mapping between SmartDOM and XML tree nodes for each XML file.

Now, we need to specify how XML nodes are mapped to SmartDOM. This can be done either in the interface on the Rules tab or in the configuration file grow-rules.red, making it look as follows:

    Plain Text
   
   sample: [
    item: ["object" "obj"]
]

For correct linking of tables, we also need to specify the name of the tag from the root element, which should be passed to the descendant nodes. Without this, it will be impossible to link two tables.

Since we have a unique supply_number, it can be used as a similar key.

To do this, let's add it to the injection-rules.red rule:

    Plain Text
   
 

   sample: [
    inject-tag-to-every-children: [supply_number] 
    enumerate-nodes: [] 
    injection-tag-and-recipients: []
]
  

Now, it remains to create the necessary tables in the database and insert there the results of processing XML files:

    Plain Text
   
 

   PRAGMA foreign_keys = ON;

CREATE TABLE supply_sample (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    supply_number TEXT NOT NULL UNIQUE,
    supply_date TEXT NOT NULL
);

CREATE TABLE delivery_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    supply_number TEXT NOT NULL,
    name TEXT NOT NULL,
    price REAL NOT NULL,
    currency TEXT NOT NULL,
    FOREIGN KEY (supply_number) REFERENCES supply_sample(supply_number)
);
  

Result

The result of converting three XML files to SQL:

    Plain Text
   
 

   INSERT INTO supply_sample ("supply_number", "supply_date")
 VALUES ('100480', '2025-01-20');

INSERT INTO delivery_items ("supply_number", "name", "price", "currency")
 VALUES ('100480', 'apples', '3.25', 'USD'),
        ('100480', 'oranges', '3.50', 'USD'),
        ('100480', 'bananas', '2.50', 'EUR'),
        ('100480', 'strawberries', '5.00', 'USD'),
        ('100480', 'grapes', '3.75', 'USD');

--

INSERT INTO supply_sample ("supply_number", "supply_date")
 VALUES ('100593', '2025-01-21');

INSERT INTO delivery_items ("supply_number", "name", "price", "currency")
 VALUES ('100593', 'raspberry', '7.50', 'USD');

--

INSERT INTO supply_sample ("supply_number", "supply_date")
 VALUES ('100601', '2025-01-22');

INSERT INTO delivery_items ("supply_number", "name", "price", "currency")
 VALUES ('100601', 'cherries', '3.20', 'EUR'),
        ('100601', 'blueberries', '4.50', 'USD'),
        ('100601', 'peaches', '2.80', 'USD');
  

This is what the result looks like in tabular form:

Conclusion

So, we have demonstrated how you can parse and upload to a database quite complex XML files without writing program code. This solution can be useful for system analysts, as well as other people who often work with XML.

In the case of parsing using popular programming languages such as Python, we would have to process each separate file with a separate script, which would require more code and time.

You can learn more about the SmartXML project structure in the official documentation.

Data structure XML XPath Object (computer science) Processing

Opinions expressed by DZone contributors are their own.

Related

Trending