Loading XML into MongoDB

Learn how to export XML data to MongoDB using SmartXML ETL tools, simplifying the process and ensuring efficient data handling and storage.

By Luca Sanders · Feb. 12, 25 · Tutorial

There are many situations where you may need to export data from XML to MongoDB.

Although XML and the JSON-like BSON format used by MongoDB have much in common, they differ in a number of ways that make them non-interchangeable.

Therefore, when you face the task of exporting data from XML to MongoDB, you have two options:

  1. Write your own XML parsing scripts;
  2. Use ETL tools.


Although modern language models can write parsing scripts quite well in languages like Python, these scripts have a serious problem: they aren't unified. A language model will generate a separate script for each file type, so as soon as you have more than one type of XML, you have several parsing scripts to maintain.

This problem is usually solved with specialized ETL tools. In this article, we will look at an ETL tool called SmartXML. Although SmartXML also supports converting XML to a relational representation, we will only look at the process of uploading XML into MongoDB.

Real-world XML can be extremely large and complex. Since this is an introductory article, we will examine a situation in which:

  1. All XML files have the same structure;
  2. The logical model of the XML is the same as the storage model in MongoDB;
  3. Extracted fields don't need complex processing.

We'll cover more complex cases later, but first, let's examine a simple example:

XML
 
<marketingData>
    <customer>
        <name>John Smith</name>
        <email>john.smith@example.com</email>
        <purchases>
            <purchase>
                <product>Smartphone</product>
                <category>Electronics</category>
                <price>700</price>
                <store>TechWorld</store>
                <location>New York</location>
                <purchaseDate>2025-01-10</purchaseDate>
            </purchase>
            <purchase>
                <product>Wireless Earbuds</product>
                <category>Audio</category>
                <price>150</price>
                <store>GadgetStore</store>
                <location>New York</location>
                <purchaseDate>2025-01-11</purchaseDate>
            </purchase>
        </purchases>
        <importantInfo>
            <loyaltyStatus>Gold</loyaltyStatus>
            <age>34</age>
            <gender>Male</gender>
            <membershipID>123456</membershipID>
        </importantInfo>
        <lessImportantInfo>
            <browser>Chrome</browser>
            <deviceType>Mobile</deviceType>
            <newsletterSubscribed>true</newsletterSubscribed>
        </lessImportantInfo>
    </customer>
    <customer>
        <name>Jane Doe</name>
        <email>jane.doe@example.com</email>
        <purchases>
            <purchase>
                <product>Laptop</product>
                <category>Electronics</category>
                <price>1200</price>
                <store>GadgetStore</store>
                <location>San Francisco</location>
                <purchaseDate>2025-01-12</purchaseDate>
            </purchase>
            <purchase>
                <product>USB-C Adapter</product>
                <category>Accessories</category>
                <price>30</price>
                <store>TechWorld</store>
                <location>San Francisco</location>
                <purchaseDate>2025-01-13</purchaseDate>
            </purchase>
            <purchase>
                <product>Keyboard</product>
                <category>Accessories</category>
                <price>80</price>
                <store>OfficeMart</store>
                <location>San Francisco</location>
                <purchaseDate>2025-01-14</purchaseDate>
            </purchase>
        </purchases>
        <importantInfo>
            <loyaltyStatus>Silver</loyaltyStatus>
            <age>28</age>
            <gender>Female</gender>
            <membershipID>654321</membershipID>
        </importantInfo>
        <lessImportantInfo>
            <browser>Safari</browser>
            <deviceType>Desktop</deviceType>
            <newsletterSubscribed>false</newsletterSubscribed>
        </lessImportantInfo>
    </customer>
    <customer>
        <name>Michael Johnson</name>
        <email>michael.johnson@example.com</email>
        <purchases>
            <purchase>
                <product>Headphones</product>
                <category>Audio</category>
                <price>150</price>
                <store>AudioZone</store>
                <location>Chicago</location>
                <purchaseDate>2025-01-05</purchaseDate>
            </purchase>
        </purchases>
        <importantInfo>
            <loyaltyStatus>Bronze</loyaltyStatus>
            <age>40</age>
            <gender>Male</gender>
            <membershipID>789012</membershipID>
        </importantInfo>
        <lessImportantInfo>
            <browser>Firefox</browser>
            <deviceType>Tablet</deviceType>
            <newsletterSubscribed>true</newsletterSubscribed>
        </lessImportantInfo>
    </customer>
    <customer>
        <name>Emily Davis</name>
        <email>emily.davis@example.com</email>
        <purchases>
            <purchase>
                <product>Running Shoes</product>
                <category>Sportswear</category>
                <price>120</price>
                <store>FitShop</store>
                <location>Los Angeles</location>
                <purchaseDate>2025-01-08</purchaseDate>
            </purchase>
            <purchase>
                <product>Yoga Mat</product>
                <category>Sportswear</category>
                <price>40</price>
                <store>FitShop</store>
                <location>Los Angeles</location>
                <purchaseDate>2025-01-09</purchaseDate>
            </purchase>
        </purchases>
        <importantInfo>
            <loyaltyStatus>Gold</loyaltyStatus>
            <age>25</age>
            <gender>Female</gender>
            <membershipID>234567</membershipID>
        </importantInfo>
        <lessImportantInfo>
            <browser>Edge</browser>
            <deviceType>Mobile</deviceType>
            <newsletterSubscribed>false</newsletterSubscribed>
        </lessImportantInfo>
    </customer>
    <customer>
        <name>Robert Brown</name>
        <email>robert.brown@example.com</email>
        <purchases>
            <purchase>
                <product>Smartwatch</product>
                <category>Wearable</category>
                <price>250</price>
                <store>GadgetPlanet</store>
                <location>Boston</location>
                <purchaseDate>2025-01-07</purchaseDate>
            </purchase>
            <purchase>
                <product>Fitness Band</product>
                <category>Wearable</category>
                <price>100</price>
                <store>HealthMart</store>
                <location>Boston</location>
                <purchaseDate>2025-01-08</purchaseDate>
            </purchase>
        </purchases>
        <importantInfo>
            <loyaltyStatus>Silver</loyaltyStatus>
            <age>37</age>
            <gender>Male</gender>
            <membershipID>345678</membershipID>
        </importantInfo>
        <lessImportantInfo>
            <browser>Chrome</browser>
            <deviceType>Mobile</deviceType>
            <newsletterSubscribed>true</newsletterSubscribed>
        </lessImportantInfo>
    </customer>
</marketingData>


In this example, we will load into MongoDB only the fields that serve a practical purpose, rather than the entire XML.
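For comparison, the hand-written approach mentioned at the start might look like the sketch below. It is not part of SmartXML; it uses Python's standard xml.etree.ElementTree module on a shortened copy of the sample (one customer, one purchase), and the function name parse_customers is made up for illustration. Every tag path is hard-coded, which is why a script like this has to be rewritten for each new XML structure.

```python
import json
import xml.etree.ElementTree as ET

# Shortened copy of the sample above: one customer with one purchase.
SAMPLE_XML = """
<marketingData>
    <customer>
        <name>John Smith</name>
        <email>john.smith@example.com</email>
        <purchases>
            <purchase>
                <product>Smartphone</product>
                <category>Electronics</category>
                <price>700</price>
                <store>TechWorld</store>
                <location>New York</location>
                <purchaseDate>2025-01-10</purchaseDate>
            </purchase>
        </purchases>
        <importantInfo>
            <loyaltyStatus>Gold</loyaltyStatus>
            <age>34</age>
            <gender>Male</gender>
            <membershipID>123456</membershipID>
        </importantInfo>
        <lessImportantInfo>
            <browser>Chrome</browser>
        </lessImportantInfo>
    </customer>
</marketingData>
"""


def parse_customers(xml_text):
    """Build the target document by hand; every tag path is hard-coded."""
    root = ET.fromstring(xml_text)
    customers = []
    for c in root.findall("customer"):
        info = c.find("importantInfo")
        customers.append({
            "name": c.findtext("name"),
            "email": c.findtext("email"),
            "loyalty_status": info.findtext("loyaltyStatus"),
            "age": info.findtext("age"),
            "gender": info.findtext("gender"),
            "membership_id": info.findtext("membershipID"),
            # lessImportantInfo is deliberately not read.
            "purchases": [
                {
                    "product": p.findtext("product"),
                    "category": p.findtext("category"),
                    "price": p.findtext("price"),
                    "store": p.findtext("store"),
                    "location": p.findtext("location"),
                    "purchase_date": p.findtext("purchaseDate"),
                }
                for p in c.findall("purchases/purchase")
            ],
        })
    return {"customers": customers}


print(json.dumps(parse_customers(SAMPLE_XML), indent=3))
```

A second XML type with different tag names or nesting would need its own copy of parse_customers, which is the maintenance problem an ETL tool is meant to remove.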

Create a New Project 

It is recommended to create a new project from the GUI. This will automatically create the necessary folder structure and parsing rules. A full description of the project structure can be found in the official documentation.

All parameters described in this article can be configured in graphical mode, but for clarity, we will focus on the textual representation.

In addition to the config.txt file with project settings and the job.txt file for batch work, the project consists of:

  1. Template of intermediate internal SmartDOM view, located in the project folder templates/data-templates.red.
  2. Rules for processing and transformation of SmartDOM itself, located in the rules folder.

Let's consider the structure of data-templates.red:

Plain Text
 
#[
    sample: #[
        marketing_data: #[ 
            customers: [
                customer: [
                    name: none
                    email: none
                    purchases: [
                        purchase: [
                            product: none
                            category: none
                            price: none
                            store: none
                            location: none
                            purchase_date: none
                        ]
                    ]
                ]
            ]
        ]
    ]
]


Note

  1. sample is the name of the category; the name itself is arbitrary.
  2. marketing_data is the name of the subcategory. At least one subcategory (subtype) is required.
  3. The intermediate view names don't require exact matches with XML tag names. In this example, we intentionally used the snake_case style.

Extract Rules

The rules are located in the rules directory in the project folder.

When working with MongoDB, we are only interested in two rules:

  1. tags-matching-rules.red — sets the matches between the XML tag tree and SmartDOM
  2. grow-rules.red — describes the relationship between SmartDOM nodes and real XML nodes

Plain Text
 
sample: [
    purchase: ["purchase"]
    customer: ["customer"]
]


The key will be the name of the node in SmartDOM; the value will be an array containing the node spelling variants from the real XML file. In our example, these names are the same.
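The idea of one canonical node name with several spelling variants can be sketched in Python as follows. This only illustrates the concept and says nothing about how SmartXML implements it; the extra variants "Purchase" and "order" are hypothetical and do not occur in the sample file.

```python
# Hypothetical matching table: SmartDOM node name -> XML tag spelling variants.
# "Purchase" and "order" are invented variants, added only for illustration.
MATCHING_RULES = {
    "purchase": ["purchase", "Purchase", "order"],
    "customer": ["customer"],
}


def build_lookup(rules):
    """Invert the table so each XML spelling maps to one canonical name."""
    lookup = {}
    for canonical, variants in rules.items():
        for variant in variants:
            if variant in lookup and lookup[variant] != canonical:
                raise ValueError(f"{variant!r} is mapped to two different nodes")
            lookup[variant] = canonical
    return lookup


def canonical_name(tag, lookup):
    """Return the canonical node name for an XML tag, or None if it is not listed."""
    return lookup.get(tag)
```

With such a table, files that spell the same logical node differently can still be mapped onto a single intermediate structure.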

Ignored Tags

To avoid loading minor data into MongoDB, we create files in the ignores folder, one per section, each named after its section. These files list the tags to skip during extraction. For our example, we'll have a sample.txt file containing:

Plain Text
 
["marketingData" "customer" "lessImportantInfo" "browser"]
["marketingData" "customer" "lessImportantInfo" "deviceType"]
["marketingData" "customer" "lessImportantInfo" "newsletterSubscribed"]
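Each line of this file is a full path of tag names from the root down to the tag being skipped. A rough Python equivalent of path-based filtering, assuming the same three paths and using the standard xml.etree.ElementTree module, could look like the sketch below; the function name leaf_values is made up, and this is not SmartXML's own code.

```python
import xml.etree.ElementTree as ET

# The same paths as in sample.txt, written as tuples of tag names from the root.
IGNORED_PATHS = {
    ("marketingData", "customer", "lessImportantInfo", "browser"),
    ("marketingData", "customer", "lessImportantInfo", "deviceType"),
    ("marketingData", "customer", "lessImportantInfo", "newsletterSubscribed"),
}


def leaf_values(element, ignored, path=()):
    """Yield (path, text) for every leaf element whose path is not ignored."""
    path = path + (element.tag,)
    if path in ignored:
        return
    children = list(element)
    if not children:
        yield path, (element.text or "").strip()
    for child in children:
        yield from leaf_values(child, ignored, path)


xml_text = (
    "<marketingData><customer>"
    "<name>John Smith</name>"
    "<lessImportantInfo><browser>Chrome</browser></lessImportantInfo>"
    "</customer></marketingData>"
)
for tag_path, value in leaf_values(ET.fromstring(xml_text), IGNORED_PATHS):
    print("/".join(tag_path), "=", value)
```

Matching on the full path rather than on the bare tag name means a tag called browser elsewhere in the tree would still be extracted.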


As a result, after morphological analysis, the intermediate representation will take the following form:

Plain Text
 
customers: [
    customer: [
        name: "John Smith" 
        email: "john.smith@example.com" 
        loyalty_status: "Gold" 
        age: "34" 
        gender: "Male" 
        membership_id: "123456" 
        purchases: [
            purchase: [
                product: "Smartphone" 
                category: "Electronics" 
                price: "700" 
                store: "TechWorld" 
                location: "New York" 
                purchase_date: "2025-01-10"
            ]
        ]
    ]
]


Note that the morphological analysis shows only a minimal representation, containing data from the first nodes found.

Here's the JSON file that will be generated:

JSON
 
{
   "customers": [
      {
         "name": "John Smith",
         "email": "john.smith@example.com",
         "loyalty_status": "Gold",
         "age": "34",
         "gender": "Male",
         "membership_id": "123456",
         "purchases": [
            {
               "product": "Smartphone",
               "category": "Electronics",
               "price": "700",
               "store": "TechWorld",
               "location": "New York",
               "purchase_date": "2025-01-10"
            },
            {
               "product": "Wireless Earbuds",
               "category": "Audio",
               "price": "150",
               "store": "GadgetStore",
               "location": "New York",
               "purchase_date": "2025-01-11"
            }
         ]
      },
      {
         "name": "Jane Doe",
         "email": "jane.doe@example.com",
         "loyalty_status": "Silver",
         "age": "28",
         "gender": "Female",
         "membership_id": "654321",
         "purchases": [
            {
               "product": "Laptop",
               "category": "Electronics",
               "price": "1200",
               "store": "GadgetStore",
               "location": "San Francisco",
               "purchase_date": "2025-01-12"
            },
            {
               "product": "USB-C Adapter",
               "category": "Accessories",
               "price": "30",
               "store": "TechWorld",
               "location": "San Francisco",
               "purchase_date": "2025-01-13"
            },
            {
               "product": "Keyboard",
               "category": "Accessories",
               "price": "80",
               "store": "OfficeMart",
               "location": "San Francisco",
               "purchase_date": "2025-01-14"
            }
         ]
      },
      {
         "name": "Michael Johnson",
         "email": "michael.johnson@example.com",
         "loyalty_status": "Bronze",
         "age": "40",
         "gender": "Male",
         "membership_id": "789012",
         "purchases": [
            {
               "product": "Headphones",
               "category": "Audio",
               "price": "150",
               "store": "AudioZone",
               "location": "Chicago",
               "purchase_date": "2025-01-05"
            }
         ]
      },
      {
         "name": "Emily Davis",
         "email": "emily.davis@example.com",
         "loyalty_status": "Gold",
         "age": "25",
         "gender": "Female",
         "membership_id": "234567",
         "purchases": [
            {
               "product": "Running Shoes",
               "category": "Sportswear",
               "price": "120",
               "store": "FitShop",
               "location": "Los Angeles",
               "purchase_date": "2025-01-08"
            },
            {
               "product": "Yoga Mat",
               "category": "Sportswear",
               "price": "40",
               "store": "FitShop",
               "location": "Los Angeles",
               "purchase_date": "2025-01-09"
            }
         ]
      },
      {
         "name": "Robert Brown",
         "email": "robert.brown@example.com",
         "loyalty_status": "Silver",
         "age": "37",
         "gender": "Male",
         "membership_id": "345678",
         "purchases": [
            {
               "product": "Smartwatch",
               "category": "Wearable",
               "price": "250",
               "store": "GadgetPlanet",
               "location": "Boston",
               "purchase_date": "2025-01-07"
            },
            {
               "product": "Fitness Band",
               "category": "Wearable",
               "price": "100",
               "store": "HealthMart",
               "location": "Boston",
               "purchase_date": "2025-01-08"
            }
         ]
      }
   ]
}
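Note that every value in this JSON is a string, including age and price. If numeric values are needed for later queries, one option is a small conversion step before insertion. The sketch below is an assumption about what such a step could look like in plain Python, not a SmartXML feature; the choice of fields to convert (age to int, price to float) is also an assumption.

```python
import copy


def convert_types(document):
    """Return a copy of the document with age and price converted to numbers."""
    result = copy.deepcopy(document)
    for customer in result.get("customers", []):
        if "age" in customer:
            customer["age"] = int(customer["age"])
        for purchase in customer.get("purchases", []):
            if "price" in purchase:
                purchase["price"] = float(purchase["price"])
    return result


sample = {
    "customers": [
        {
            "name": "John Smith",
            "age": "34",
            "purchases": [{"product": "Smartphone", "price": "700"}],
        }
    ]
}
typed = convert_types(sample)
print(typed["customers"][0]["age"], typed["customers"][0]["purchases"][0]["price"])
```

Dates are left as strings here, since JSON itself has no date type.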


Configuring Connection to MongoDB

Since MongoDB doesn't support direct HTTP data insertion, an intermediary service will be required.

Let's install the dependencies: pip install flask pymongo.

The service itself:

Python
 
import traceback

from flask import Flask, request, jsonify
from pymongo import MongoClient

app = Flask(__name__)

# Connection to MongoDB
client = MongoClient('mongodb://localhost:27017')
db = client['testDB']
collection = db['testCollection']

@app.route('/insert', methods=['POST'])
def insert_document():
    # Flask parses the body as JSON if Content-Type is application/json;
    # silent=True returns None instead of raising on a missing or malformed body
    data = request.get_json(silent=True)

    if not data:
        return jsonify({"error": "Empty or invalid JSON payload"}), 400

    try:
        result = collection.insert_one(data)
        return jsonify({"insertedId": str(result.inserted_id)}), 200
    except Exception as e:
        # Log the full traceback on the server and return a short message to the client
        print(traceback.format_exc())
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(port=3000)


We'll set up the MongoDB connection settings in the config.txt file (see nosql-url):

Plain Text
 
job-number: 1
root-xml-folder: "D:/data/data-samples"
xml-filling-stat: false ; table: filling_percent_stat should exist
ignore-namespaces: false
ignore-tag-attributes: false
use-same-morphology-for-same-file-name-pattern: false
skip-schema-version-tag: true
use-same-morphology-for-all-files-in-folder: false
delete-data-before-insert: none
connect-to-db-at-project-opening: true
source-database: "SQLite" ; available values: PostgreSQL/SQLite
target-database: "SQLite" ; available values: PostgreSQL/SQLite/NoSQL

bot-chatID: ""
bot-token: ""
telegram-notifications: true

db-driver: ""
db-server: "127.0.0.1"
db-port: ""
db-name: ""
db-user: ""
db-pass: ""

sqlite-driver-name: "SQLite3 ODBC Driver"
sqlite-db-path: ""

nosql-url: "http://127.0.0.1:3000/insert"
append-subsection-name-to-nosql-url: false
no-sql-login: "" ; login and pass are empty
no-sql-pass: ""


Remember that MongoDB automatically creates the database and the collection on the first insert if they do not exist. However, this behavior can hide errors such as a misspelled database or collection name, so it is recommended not to rely on it by default.

Let's run the service itself:

Shell
 
python .\app.py
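Once the service is running, it can be checked independently of SmartXML. The sketch below uses only the Python standard library; the URL is the one the service above listens on, while the helper names build_insert_request and send_document and the test payload are made up for illustration.

```python
import json
import urllib.request

# The address the Flask service listens on (port 3000, route /insert).
INSERT_URL = "http://127.0.0.1:3000/insert"


def build_insert_request(document, url=INSERT_URL):
    """Prepare a POST request carrying the document as a JSON body."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_document(document, url=INSERT_URL):
    """Send the document and return the decoded JSON reply from the service."""
    with urllib.request.urlopen(build_insert_request(document, url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the service and MongoDB to be running):
# print(send_document({"customers": [{"name": "Test User"}]}))
```

If the call succeeds, the reply should contain an insertedId field, as returned by the /insert route.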


Next, click Parse, then Send JSON to NoSQL.

Now connect to the MongoDB console in any convenient way and execute the following commands:

Plain Text
 
show databases
admin   40.00 KiB
config  72.00 KiB
local   72.00 KiB
testDB  72.00 KiB
use testDB
switched to db testDB
db.testCollection.find().pretty()


The result should look like the following:

JSON
 
{
  _id: ObjectId('278e1b2c7c1823d4fde120ef'),
  customers: [
    {
      name: 'John Smith',
      email: 'john.smith@example.com',
      loyalty_status: 'Gold',
      age: '34',
      gender: 'Male',
      membership_id: '123456',
      purchases: [
        {
          product: 'Smartphone',
          category: 'Electronics',
          price: '700',
          store: 'TechWorld',
          location: 'New York',
          purchase_date: '2025-01-10'
        },
        {
          product: 'Wireless Earbuds',
          category: 'Audio',
          price: '150',
          store: 'GadgetStore',
          location: 'New York',
          purchase_date: '2025-01-11'
        }
      ]
    },
    {
      name: 'Jane Doe',
      email: 'jane.doe@example.com',
      loyalty_status: 'Silver',
      age: '28',
      gender: 'Female',
      membership_id: '654321',
      purchases: [
        {
          product: 'Laptop',
          category: 'Electronics',
          price: '1200',
          store: 'GadgetStore',
          location: 'San Francisco',
          purchase_date: '2025-01-12'
        },
        {
          product: 'USB-C Adapter',
          category: 'Accessories',
          price: '30',
          store: 'TechWorld',
          location: 'San Francisco',
          purchase_date: '2025-01-13'
        },
        {
          product: 'Keyboard',
          category: 'Accessories',
          price: '80',
          store: 'OfficeMart',
          location: 'San Francisco',
          purchase_date: '2025-01-14'
        }
      ]
    },
    {
      name: 'Michael Johnson',
      email: 'michael.johnson@example.com',
      loyalty_status: 'Bronze',
      age: '40',
      gender: 'Male',
      membership_id: '789012',
      purchases: [
        {
          product: 'Headphones',
          category: 'Audio',
          price: '150',
          store: 'AudioZone',
          location: 'Chicago',
          purchase_date: '2025-01-05'
        }
      ]
    },
    {
      name: 'Emily Davis',
      email: 'emily.davis@example.com',
      loyalty_status: 'Gold',
      age: '25',
      gender: 'Female',
      membership_id: '234567',
      purchases: [
        {
          product: 'Running Shoes',
          category: 'Sportswear',
          price: '120',
          store: 'FitShop',
          location: 'Los Angeles',
          purchase_date: '2025-01-08'
        },
        {
          product: 'Yoga Mat',
          category: 'Sportswear',
          price: '40',
          store: 'FitShop',
          location: 'Los Angeles',
          purchase_date: '2025-01-09'
        }
      ]
    },
    {
      name: 'Robert Brown',
      email: 'robert.brown@example.com',
      loyalty_status: 'Silver',
      age: '37',
      gender: 'Male',
      membership_id: '345678',
      purchases: [
        {
          product: 'Smartwatch',
          category: 'Wearable',
          price: '250',
          store: 'GadgetPlanet',
          location: 'Boston',
          purchase_date: '2025-01-07'
        },
        {
          product: 'Fitness Band',
          category: 'Wearable',
          price: '100',
          store: 'HealthMart',
          location: 'Boston',
          purchase_date: '2025-01-08'
        }
      ]
    }
  ]
}


Conclusion

In this example, we have seen how to automate the uploading of XML files to MongoDB without writing any parsing code. Although the example considers only one file, a single project can handle a huge number of file types and subtypes with different structures, and can perform quite complex manipulations, such as type conversion and the use of external services to process field values in real time. This allows not only extracting data from XML but also processing some of the values via external APIs, including large language models.
