
SmartXML: An Alternative to XPath for Complex XML Files

We'll discuss SmartXML, an XPath alternative for parsing complex XML files, converting them to SQL, and loading the results into a database seamlessly.

By Luca Sanders · Jan. 30, 25 · Tutorial

XML is one of the most widely used data formats; only JSON rivals it in popularity. Very often, it serves as an intermediate representation for data that is transferred between two information systems. And as with any intermediate representation, the data usually ends up in a database.

XPath is the usual tool for parsing XML: it is a query language, with a set of built-in functions, for selecting and extracting data from an XML tree. However, not all XML files are structured consistently, which makes XPath difficult to use.

Typical Problems When Working With XPath

  • Differences in node naming. You may have an array of documents with a similar logical structure, but they may have differences in the way node names are spelled.
  • Missing nodes. If bad XML generators are used on the server side, they may skip some nesting levels for some of the data in the resulting XML.
  • Object or array? XPath does not allow you to explicitly specify whether the contents of a particular node are an object or an array of objects.
  • Inability to extend syntax. XPath is just a node traversal tool with no syntax extension capability.
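
To make the first two problems concrete, here is a minimal Python sketch built on the standard library's xml.etree.ElementTree, which supports a limited subset of XPath. The three inline documents, the ITEM_PATH constant, and the item_names helper are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down supply documents with divergent structure (illustrative only).
DOC_REGULAR = """
<doc><lots><lot><objects>
  <object><name>apples</name></object>
  <object><name>oranges</name></object>
</objects></lot></lots></doc>
"""

# The <objects> wrapper is missing: a skipped nesting level.
DOC_MISSING_LEVEL = """
<doc><lots><lot>
  <object><name>raspberry</name></object>
</lot></lots></doc>
"""

# Items are named <obj> instead of <object>: a naming difference.
DOC_RENAMED = """
<doc><lots><lot><objects>
  <obj><name>cherries</name></obj>
</objects></lot></lots></doc>
"""

# A path written against the first document only.
ITEM_PATH = "./lots/lot/objects/object/name"


def item_names(xml_text: str, path: str = ITEM_PATH) -> list:
    """Return the text of every node matched by `path`."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(path)]


if __name__ == "__main__":
    for label, doc in [("regular", DOC_REGULAR),
                       ("missing level", DOC_MISSING_LEVEL),
                       ("renamed", DOC_RENAMED)]:
        print(label, item_names(doc))
```

The path matches both items in the first document and nothing in the other two. No error is raised, so the missing data is easy to overlook.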

In this article, I will discuss SmartXML, a tool that addresses these problems and lets you load complex XML documents into a database.

Project Structure

SmartXML processes data through an intermediate representation called SmartDOM. Unlike a traditional DOM, this structure controls the level of element hierarchy and can fill in missing nodes.

SmartDOM consists of a declarative description of the target structure and sets of rules for transforming it.

Three Examples of Documents With a Divergent Structure

Example 1

The document has a relatively correct structure. All sections have correct nesting.

<doc>
    <commonInfo>
        <supplyNumber>100480</supplyNumber>
        <supplyDate>2025-01-20</supplyDate>
    </commonInfo>
    <lots>
        <lot>
            <objects>
                <object>
                    <name>apples</name>
                    <price>3.25</price>
                    <currency>USD</currency>
                </object>
                <object>
                    <name>oranges</name>
                    <price>3.50</price>
                    <currency>USD</currency>
                </object>
            </objects>
        </lot>
        <lot>
            <objects>
                <object>
                    <name>bananas</name>
                    <price>2.50</price>
                    <currency>EUR</currency>
                </object>
                <object>
                    <name>strawberries</name>
                    <price>5.00</price>
                    <currency>USD</currency>
                </object>
                <object>
                    <name>grapes</name>
                    <price>3.75</price>
                    <currency>USD</currency>
                </object>
            </objects>
        </lot>
    </lots>
</doc>


Example 2

The nesting of sections is broken: the object node sits directly under lot, without an objects parent.

<doc>
  <commonInfo>
      <supplyNumber>100593</supplyNumber>
      <date>2025-01-21</date>
  </commonInfo>
  <lots>
      <lot>
          <object>
              <name>raspberry</name>
              <price>7.50</price>
              <currency>USD</currency>
          </object>
      </lot>
  </lots>
</doc>


Example 3

The nesting is correct, but the node names differ from those in the other documents (obj instead of object).

<doc>
  <commonInfo>
      <supplyNumber>100601</supplyNumber>
      <date>2025-01-22</date>
  </commonInfo>
  <lots>
      <lot>
          <objects>
              <obj>
                  <name>cherries</name>
                  <price>3.20</price>
                  <currency>EUR</currency>
              </obj>
              <obj>
                  <name>blueberries</name>
                  <price>4.50</price>
                  <currency>USD</currency>
              </obj>
              <obj>
                  <name>peaches</name>
                  <price>2.80</price>
                  <currency>USD</currency>
              </obj>
          </objects>
      </lot>
  </lots>
</doc>


As you can see, all three documents contain the same kind of data but structure it differently.

Intermediate Data View

The full structure of the SmartDOM view from data-templates.red:

#[
    sample: #[ ; section name
        supply_sample: #[ ; subsection name
            supply_number: none
            supply_date: none
            delivery_items: [
                item: [
                    name: none
                    price: none
                    currency: none
                ]
            ]
        ]
    ]
]
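
For readers more at home in a general-purpose language, one way to read this template is as a record with two scalar fields and a list of item records. The Python sketch below is only an analogy under that assumption. The class names Item and SupplySample are hypothetical and are not part of SmartXML.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    # Mirrors the `item` block; None stands in for the template's `none`.
    name: Optional[str] = None
    price: Optional[str] = None
    currency: Optional[str] = None


@dataclass
class SupplySample:
    # Mirrors the `supply_sample` subsection of the `sample` section.
    supply_number: Optional[str] = None
    supply_date: Optional[str] = None
    delivery_items: list = field(default_factory=list)


if __name__ == "__main__":
    supply = SupplySample(supply_number="100593", supply_date="2025-01-21")
    supply.delivery_items.append(Item("raspberry", "7.50", "USD"))
    print(supply)
```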


Project Setup

Create a project and set up a mapping between SmartDOM and XML tree nodes for each XML file.

Now, we need to specify how XML nodes are mapped to SmartDOM. You can do this either on the Rules tab of the interface or by editing the configuration file grow-rules.red so that it looks as follows:

sample: [
    item: ["object" "obj"]
]
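
SmartXML's internals are not shown in this article, so the following Python sketch only illustrates the idea behind such a rule and is not the tool's implementation. The names ALIASES, CANONICAL, and collect_items are hypothetical. Each source tag is looked up in an alias table and treated as the canonical item node, wherever it appears in the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical Python counterpart of the rule  item: ["object" "obj"]
ALIASES = {"item": ["object", "obj"]}

# Reverse lookup: source tag name -> canonical name.
CANONICAL = {alias: name
             for name, aliases in ALIASES.items()
             for alias in aliases}

# Two illustrative documents: one lacks <objects>, one uses <obj>.
DOC_A = ("<doc><lots><lot><object><name>raspberry</name></object>"
         "</lot></lots></doc>")
DOC_B = ("<doc><lots><lot><objects><obj><name>cherries</name></obj>"
         "</objects></lot></lots></doc>")


def collect_items(xml_text: str) -> list:
    """Collect every node whose tag is an alias of `item`, at any depth."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.iter():
        if CANONICAL.get(node.tag) == "item":
            # Flatten the node's direct children into a tag -> text dict.
            items.append({child.tag: child.text for child in node})
    return items


if __name__ == "__main__":
    print(collect_items(DOC_A))
    print(collect_items(DOC_B))
```

Because the lookup ignores the position of the node, the same function handles both the renamed tag and the missing nesting level.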


To link the tables correctly, we also need to specify which tag from the root element should be passed down to the descendant nodes. Without this, it will be impossible to link the two tables.

Since supply_number is unique, it can serve as this key.

To do this, let's add it to the rules in injection-rules.red:

sample: [
    inject-tag-to-every-children: [supply_number] 
    enumerate-nodes: [] 
    injection-tag-and-recipients: []
]
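
Conceptually, this kind of injection copies a field from the parent record into each of its child records, so that every child row carries the key needed for the join. Here is a minimal Python sketch of that idea. The function inject_into_children and the SUPPLY dictionary are hypothetical, and this is not how SmartXML implements the rule internally.

```python
def inject_into_children(parent: dict, child_key: str, tags: list) -> list:
    """Return copies of the child records with the parent fields added."""
    injected = {tag: parent[tag] for tag in tags}
    # Build new dicts so the original child records are left untouched.
    return [{**injected, **child} for child in parent[child_key]]


# Illustrative parent record with two child items.
SUPPLY = {
    "supply_number": "100601",
    "supply_date": "2025-01-22",
    "delivery_items": [
        {"name": "cherries", "price": "3.20", "currency": "EUR"},
        {"name": "blueberries", "price": "4.50", "currency": "USD"},
    ],
}

if __name__ == "__main__":
    for row in inject_into_children(SUPPLY, "delivery_items", ["supply_number"]):
        print(row)
```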


All that remains is to create the necessary tables in the database and insert the results of processing the XML files:

PRAGMA foreign_keys = ON;

CREATE TABLE supply_sample (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    supply_number TEXT NOT NULL UNIQUE,
    supply_date TEXT NOT NULL
);

CREATE TABLE delivery_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    supply_number TEXT NOT NULL,
    name TEXT NOT NULL,
    price REAL NOT NULL,
    currency TEXT NOT NULL,
    FOREIGN KEY (supply_number) REFERENCES supply_sample(supply_number)
);
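
The schema relies on the foreign key to tie each delivery item to an existing supply. As a quick check, the sketch below creates the same tables in an in-memory SQLite database through Python's sqlite3 module and tries to insert an item whose supply_number has no parent row. The helper names open_database and orphan_item_is_rejected are assumptions made for this example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE supply_sample (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    supply_number TEXT NOT NULL UNIQUE,
    supply_date TEXT NOT NULL
);
CREATE TABLE delivery_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    supply_number TEXT NOT NULL,
    name TEXT NOT NULL,
    price REAL NOT NULL,
    currency TEXT NOT NULL,
    FOREIGN KEY (supply_number) REFERENCES supply_sample(supply_number)
);
"""


def open_database() -> sqlite3.Connection:
    """Create an in-memory database with foreign key checks enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def orphan_item_is_rejected(conn: sqlite3.Connection) -> bool:
    """Try to insert an item that refers to a supply that does not exist."""
    try:
        conn.execute(
            "INSERT INTO delivery_items (supply_number, name, price, currency) "
            "VALUES ('999999', 'kiwi', 1.0, 'USD')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    print(orphan_item_is_rejected(open_database()))
```

This is also why each supply_sample row has to be inserted before the delivery_items rows that refer to it.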


Result

The result of converting three XML files to SQL:

INSERT INTO supply_sample ("supply_number", "supply_date")
 VALUES ('100480', '2025-01-20');

INSERT INTO delivery_items ("supply_number", "name", "price", "currency")
 VALUES ('100480', 'apples', '3.25', 'USD'),
        ('100480', 'oranges', '3.50', 'USD'),
        ('100480', 'bananas', '2.50', 'EUR'),
        ('100480', 'strawberries', '5.00', 'USD'),
        ('100480', 'grapes', '3.75', 'USD');

--

INSERT INTO supply_sample ("supply_number", "supply_date")
 VALUES ('100593', '2025-01-21');

INSERT INTO delivery_items ("supply_number", "name", "price", "currency")
 VALUES ('100593', 'raspberry', '7.50', 'USD');

--

INSERT INTO supply_sample ("supply_number", "supply_date")
 VALUES ('100601', '2025-01-22');

INSERT INTO delivery_items ("supply_number", "name", "price", "currency")
 VALUES ('100601', 'cherries', '3.20', 'EUR'),
        ('100601', 'blueberries', '4.50', 'USD'),
        ('100601', 'peaches', '2.80', 'USD');
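
To inspect the loaded data without a GUI, the generated statements can be run against an in-memory SQLite database and joined back together. The Python sketch below does this for two of the three supplies, using the standard sqlite3 module. The names GENERATED_SQL, QUERY, and load_and_query are made up for this example.

```python
import sqlite3

SCHEMA = """
PRAGMA foreign_keys = ON;
CREATE TABLE supply_sample (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    supply_number TEXT NOT NULL UNIQUE,
    supply_date TEXT NOT NULL
);
CREATE TABLE delivery_items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    supply_number TEXT NOT NULL,
    name TEXT NOT NULL,
    price REAL NOT NULL,
    currency TEXT NOT NULL,
    FOREIGN KEY (supply_number) REFERENCES supply_sample(supply_number)
);
"""

# Statements copied from the generated output for supplies 100593 and 100601.
GENERATED_SQL = """
INSERT INTO supply_sample ("supply_number", "supply_date")
 VALUES ('100593', '2025-01-21');
INSERT INTO delivery_items ("supply_number", "name", "price", "currency")
 VALUES ('100593', 'raspberry', '7.50', 'USD');
INSERT INTO supply_sample ("supply_number", "supply_date")
 VALUES ('100601', '2025-01-22');
INSERT INTO delivery_items ("supply_number", "name", "price", "currency")
 VALUES ('100601', 'cherries', '3.20', 'EUR'),
        ('100601', 'blueberries', '4.50', 'USD'),
        ('100601', 'peaches', '2.80', 'USD');
"""

# Join each item to its supply through the injected supply_number.
QUERY = """
SELECT s.supply_number, s.supply_date, d.name, d.price, d.currency
FROM supply_sample AS s
JOIN delivery_items AS d ON d.supply_number = s.supply_number
ORDER BY d.id
"""


def load_and_query() -> list:
    """Create the tables, run the generated inserts, and return joined rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript(GENERATED_SQL)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in load_and_query():
        print(row)
```

The generated statements quote the prices as text, but the price column is declared REAL, so SQLite's type affinity stores them as floating-point numbers.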


This is what the result looks like in tabular form:

[Image: Results in tabular form]

Conclusion

We have demonstrated how to parse fairly complex XML files and load them into a database without writing any program code. This solution can be useful for system analysts, as well as for anyone else who often works with XML.

If we parsed these files with a popular programming language such as Python instead, we would have to handle each structural variant with its own script, which would require more code and time.

You can learn more about the SmartXML project structure in the official documentation.
