XML Processing Made Easy with Ballerina

Let's take a look at a modern approach to handling XML as built-in functionality in a programming language.

By Anjana Fernando · Oct. 26, 20 · Tutorial

Introduction

The Ballerina programming language contains built-in support for XML data. It supports defining, validating, and manipulating XML directly from the language syntax itself. In this article, we will go through these features and how to use them effectively.

Creating and Manipulating XML

The most direct way to define an XML value in Ballerina is to use an XML literal.

xml movie = xml `<Movie>
                   <Name>Jurassic Park</Name>
                   <Year>1993</Year>
                   <Director>Steven Spielberg</Director>
                 </Movie>`;


The XML value above is created using the XML literal syntax. Because of this, the compiler identifies it specifically as an XML value and validates the literal. If the XML contains a mistake, such as mismatched start and end tags, we get an error at compile time, and the IDE highlights the value as invalid.

An XML value in Ballerina is structured as a sequence of singleton XML values. These singletons are XML elements, processing instructions, comments, and text. The following example shows how to create a single XML value by combining two XML elements. 

xml movie1 = xml `<Movie year="1993">
                    <Name>Jurassic Park</Name>
                    <Director>Steven Spielberg</Director>
                  </Movie>`;

xml movie2 = xml `<Movie year="1997">
                    <Name>Titanic</Name>
                    <Director>James Cameron</Director>
                  </Movie>`;

xml movieList = movie1 + movie2;


The above “movieList” contains an XML sequence of two XML element values. We can access individual items in the sequence similar to arrays by using the subscript operator in the following manner. 

xml m1 = movieList[0];
xml m2 = movieList[1];


The built-in function “length” can be used to get the number of items in the sequence. 

int n = movieList.length();


Other functions generally available for lists, such as “filter”, “forEach”, and “map”, can also be used for functional iteration.
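
As a minimal sketch of walking over a sequence, the program below uses the plain `foreach` statement rather than the library functions named above, since it needs no callback signature. The `main` function and the shortened movie elements are assumptions made for illustration.

```ballerina
import ballerina/io;

public function main() {
    // A two-item XML sequence built by concatenation, as in the earlier example.
    xml movieList = xml `<Movie year="1993"><Name>Jurassic Park</Name></Movie>` +
                    xml `<Movie year="1997"><Name>Titanic</Name></Movie>`;

    // Number of items in the sequence.
    io:println(movieList.length());

    // Visit each item of the sequence in order.
    foreach var item in movieList {
        io:println(item);
    }
}
```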

XML literals can also be interpolated with expressions that provide parts of their content. This is done with the syntax “${expr}”. The expression can be an in-scope variable, a function call, or any other expression that returns a value supported in that position. An example is shown below, where we use variables to provide an integer and a string value for the movie year and director, respectively.

int titanicYear = 1997;
string titanicDirector = "James Cameron";

xml movie2 = xml `<Movie year="${titanicYear}">
                    <Name>Titanic</Name>
                    <Director>${titanicDirector}</Director>
                  </Movie>`;


The language library also provides XML subtypes: “xml:Element”, “xml:ProcessingInstruction”, “xml:Comment”, and “xml:Text”. These types can be used when we need subtype-specific operations. Let’s create an XML element and set its child elements using this functionality.

import ballerina/lang.'xml;
...

'xml:Element movies = <'xml:Element> xml `<Movies/>`;
movies.setChildren(movieList);


The XML values can also be namespace qualified. The following code snippet shows how an XML namespace is defined for a given namespace prefix. 

xmlns "http://example.com/ns1" as ns1;
xml movies = xml `<ns1:Movies>${movieList}</ns1:Movies>`;


Here, we’ve defined the namespace prefix “ns1” as associated with the namespace URI “http://example.com/ns1”. Afterward, we have created a new XML element by associating it with the namespace in the “ns1” prefix. Also, we have simultaneously set the children of the element by using interpolation.

Similarly, we can set the default namespace of the XML values in the scope following its declaration by simply not setting a namespace prefix. 

xmlns "http://example.com/ns1";
xml movies = xml `<Movies>${movieList}</Movies>`;


In the example above, the “movies” XML element and its children will inherit the namespace defined above it since it has been declared as the default namespace. 

Accessing XML

Now that we have XML values in our code, let’s see how we can access and query the structure they represent.

Let’s start with attribute access of an XML element value. This is done in the format “xml_value.attr_name”. The following code snippet shows how we can extract the “year” attribute from the movie element we created earlier. 

string|error year = movie1.year;


Here, the attribute access expression returns a union of “string” and “error”. This is because, at runtime, if the attribute does not exist in the given XML element value, the expression returns an “error” value.
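
Since the result is a union, it has to be narrowed before it can be used as a string. The sketch below shows one way to do that with a type test. The `rating` attribute is hypothetical and deliberately absent from the element, to exercise the error branch described above.

```ballerina
import ballerina/io;

public function main() {
    xml movie1 = xml `<Movie year="1993"><Name>Jurassic Park</Name></Movie>`;

    // Attribute access yields string|error, so narrow the type before use.
    string|error year = movie1.year;
    if (year is string) {
        io:println("Year: ", year);
    } else {
        io:println("Attribute lookup failed: ", year);
    }

    // "rating" (a hypothetical attribute) is not set on the element,
    // so this access is expected to take the error branch.
    string|error rating = movie1.rating;
    if (rating is error) {
        io:println("No rating attribute on this element");
    }
}
```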

In the case of an XML attribute having a specific namespace, we can prefix the attribute name with the namespace prefix in the following manner.

xmlns "http://example.com/ns1" as ns1;

xml movie2 = xml `<ns1:Movie ns1:year="1997">
                    <Name>Titanic</Name>
                    <Director>James Cameron</Director>
                  </ns1:Movie>`;

string|error year = movie2.ns1:year;


Now let’s see how we access elements in an XML sequence value. As we saw earlier, an XML sequence can have multiple single XML values at the same level. Let’s see how we can extract specific elements in such a sequence using filter expressions. First, let’s define a set of XML values. 

xmlns "http://example.com/ns1" as ns1;

xml movie1 = xml `<Movie year="1993">
                   <Name>Jurassic Park</Name>
                   <Director>Steven Spielberg</Director>
                   </Movie>`;
xml movie2 = xml `<ns1:Movie ns1:year="1997">
                   <Name>Titanic</Name>
                   <Director>James Cameron</Director>
                   </ns1:Movie>`;
xml book1 = xml `<Book>
                   <Name>Harry Potter</Name>
                   <Author>J.K. Rowling</Author>
                   </Book>`;

xml person1 = xml `<Person>
                       <Name>Jack Smith</Name>
                       <BirthYear>1990</BirthYear>
                   </Person>`;

xml entries = movie1 + movie2 + book1 + person1;


Here, we have created an XML value “entries”, which contains a sequence of XML elements. Now let’s select all the elements that have the element name “Movie”. The syntax for this is “xml_val.<xml_name_pattern>”.

xml<'xml:Element> movieElements = entries.<Movie>;


Here, we have directly given “Movie” as the element name. The “movieElements” XML sequence will contain a single element, which is “movie1”. The XML element in “movie2” is not included because it is namespace qualified. Notice that we can also use the more constrained “xml<'xml:Element>” type for “movieElements” because filter expressions specifically return XML elements.

Now if we want to specifically access the movie element with a given namespace, we can use the following syntax. 

xml<'xml:Element> movieElements = entries.<ns1:Movie>;


In this case, “movieElements” will contain only the XML element in “movie2”. If we want to extract multiple elements with different names, we can separate the names with “|” in the filter expression.

xml<'xml:Element> moviesAndBooks = entries.<ns1:Movie|Movie|Book>;


The statement above extracts all the XML elements in “entries” that have either the “Movie” element name (with or without the “ns1” namespace) or the “Book” element name.

The XML name pattern can also be “*”, which is used to select all the XML elements in a sequence. 

xml<'xml:Element> allElements = entries.<*>;


The code above returns a new XML sequence that contains only the XML element values in the “entries” sequence.
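
To try the filter expressions side by side, the fragments above can be gathered into one small program. This is a sketch under a few assumptions: the elements are shortened, the results are typed as plain `xml` to avoid depending on how the element subtype is spelled in a given Ballerina version, and the `main` function and variable names are made up for illustration.

```ballerina
import ballerina/io;

xmlns "http://example.com/ns1" as ns1;

public function main() {
    xml entries = xml `<Movie year="1993"><Name>Jurassic Park</Name></Movie>` +
                  xml `<ns1:Movie ns1:year="1997"><Name>Titanic</Name></ns1:Movie>` +
                  xml `<Book><Name>Harry Potter</Name></Book>`;

    // Elements named Movie with no namespace.
    xml plainMovies = entries.<Movie>;
    // Elements named Movie in the ns1 namespace.
    xml nsMovies = entries.<ns1:Movie>;
    // Elements matching any of the listed names.
    xml moviesAndBooks = entries.<ns1:Movie|Movie|Book>;
    // Every element in the sequence.
    xml allElements = entries.<*>;

    // Print how many items each filter selected.
    io:println(plainMovies.length());
    io:println(nsMovies.length());
    io:println(moviesAndBooks.length());
    io:println(allElements.length());
}
```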

Next, let’s see how we can query child items in an XML value. This can be done with XML step expressions. This has a syntax and functionality that would be familiar if you have used XPath before. 

xml allChildItems = movie1/*;


Here, we read in all the child items in the XML value “movie1”. If we want to restrict this to only XML child elements, we will use the following syntax.

xml allChildElements = movie1/<*>;


We can drill down to any depth by chaining steps, with each step applied to the XML value returned by the previous one.

xml doc = xml `<Doc>${entries}</Doc>`;
xml allMoviesNames = doc/<Movie|ns1:Movie>/<Name>;


Here, we read all the movie names, using step expressions to select every element named “Movie”, with or without the “ns1” namespace, and then its “Name” child elements.

Also, we can search through all the descendants of an XML value to access the required items. The example below shows how this is done.

xml allNames = doc/**/<Name>;


With the “/**” syntax, the execution will search through all the descendants of the “doc” XML value and find all the elements with the name “Name”.
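
The step expressions covered in this section can be compared in a single self-contained program. The document below is a cut-down, namespace-free stand-in for “doc”, assumed here for brevity.

```ballerina
import ballerina/io;

public function main() {
    xml doc = xml `<Doc>
                     <Movie year="1993"><Name>Jurassic Park</Name></Movie>
                     <Book><Name>Harry Potter</Name></Book>
                   </Doc>`;

    // Child elements of Doc named Movie, then their Name children.
    xml movieNames = doc/<Movie>/<Name>;
    io:println(movieNames);

    // Every Name element among the descendants of Doc.
    xml allNames = doc/**/<Name>;
    io:println(allNames);

    // All child items of Doc, which may include text items between elements.
    xml childItems = doc/*;
    io:println(childItems.length());

    // Only the child elements of Doc.
    xml childElements = doc/<*>;
    io:println(childElements.length());
}
```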

Operations that are supported in XML filter expressions and step expressions can also be implemented using the functions available in the XML language library. 
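
As a rough illustration of that correspondence, the sketch below pairs a few expressions with library functions. It assumes the `elements`, `elementChildren`, and `children` functions of the `lang.xml` library as documented for recent Ballerina releases, so check that they are available in the version you are using.

```ballerina
import ballerina/io;

public function main() {
    xml entries = xml `<Movie><Name>Jurassic Park</Name></Movie>` +
                  xml `<Book><Name>Harry Potter</Name></Book>`;

    // Comparable to the filter expression entries.<Movie>
    xml movies = entries.elements("Movie");
    io:println(movies);

    // Comparable to the step expression movies/<Name>
    xml names = movies.elementChildren("Name");
    io:println(names);

    // Comparable to the step expression movies/*
    xml items = movies.children();
    io:println(items);
}
```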

XML and Language Integrated Queries

In Ballerina, we can combine the language integrated query features with XML processing to do advanced processing and transformation operations. Let’s take a look at a sample dataset and see how we can transform it to have a better representation. Here, we will be using a publicly available XML dataset, which contains the annual CO2 emissions of each country. A sample snippet of this data is shown below.

<Root>
   <data>
       <record>
           <field name="Country or Area" key="ABW">Aruba</field>
           <field name="Item" key="EN.ATM.CO2E.PC">CO2 emissions (metric tons per capita)</field>
           <field name="Year">1960</field>
           <field name="Value">204.620372249175</field>
       </record>
       <record>
           <field name="Country or Area" key="AFG">Afghanistan</field>
           <field name="Item" key="EN.ATM.CO2E.PC">CO2 emissions (metric tons per capita)</field>
           <field name="Year">1964</field>
           <field name="Value">0.0861736143685528</field>
       </record>
   </data>
</Root>


We want to transform the above dataset so that the XML element names themselves represent the meaning of their text values. Also, the source dataset contains some records with empty value fields, which we would like to skip. The final result should be similar to the dataset below.

<records>
   <record>
       <country>Aruba</country>
       <year>1960</year>
       <value>204.620372249175</value>
   </record>
   <record>
       <country>Afghanistan</country>
       <year>1964</year>
       <value>0.0861736143685528</value>
   </record>
</records>


The transformation above can be done with a single statement in Ballerina using its integrated query functionality. The full Ballerina source code used to implement the required transformation is shown below.

import ballerina/io;

public function main() returns @tainted error? {

   // Open the source dataset as a UTF-8 character channel.
   io:ReadableByteChannel rbc = check io:openReadableFile("/home/laf/Downloads/API_EN.ATM.CO2E.PC_DS2_en_xml_v2_1500418.xml");
   io:ReadableCharacterChannel rch = new (rbc, "UTF8");

   xml payload = check rch.readXml();

   // For each record under <data>, pick the country (field 0), year (field 2),
   // and value (field 3), skip records whose value is empty, and emit a new <record>.
   xml transformedData = xml `<records>
                               ${from var x in payload/<data>/<*>
                                 let var country = <xml> x/<'field>[0]/*
                                 let var year = <xml> x/<'field>[2]/*
                                 let var value = <xml> x/<'field>[3]/*
                                 where value.length() > 0
                                 select xml `<record>
                                                 <country>${country}</country>
                                                 <year>${year}</year>
                                                 <value>${value}</value>
                                             </record>`
                                }
                              </records>`;

   // Write the transformed document out and release both channels.
   io:WritableByteChannel wbc = check io:openWritableFile("/home/laf/Downloads/transformed.xml");
   io:WritableCharacterChannel wch = new (wbc, "UTF8");
   check wch.writeXml(transformedData);
   check wch.close();
   check rch.close();
}


As shown in the code above, we can mix and match various aspects of the language to create more powerful functionality. 

Summary

In this article, we have gone through the main aspects of XML handling in Ballerina. We provided an overview of how to create XML values, manipulate them, and access XML using the various features available in the language.

For more information on Ballerina and XML handling, refer to the following resources:

  • Ballerina by Example
  • Ballerina API Documentation

