XML Processing Made Easy with Ballerina

Let's take a look at a modern approach to handling XML as built-in functionality in a programming language.

By Anjana Fernando · Oct. 26, 20 · Tutorial

Introduction

The Ballerina programming language contains built-in support for XML data. It supports defining, validating, and manipulating XML directly from the language syntax itself. In this article, we will go through these features and how to use them effectively.

Creating and Manipulating XML

The first approach when defining an XML value in Ballerina is to use direct literals. 

xml movie = xml `<Movie>
                   <Name>Jurassic Park</Name>
                   <Year>1993</Year>
                   <Director>Steven Spielberg</Director>
                 </Movie>`;


The XML value above is created using the XML literal syntax. Because of this, the compiler identifies it specifically as an XML value and validates the literal. If the XML contains mistakes, such as mismatched start and end tags, we get an error at compile time, and the IDE highlights the value as invalid.

An XML value in Ballerina is structured as a sequence of singleton XML values. These singletons are XML elements, processing instructions, comments, and text. The following example shows how to create a single XML value by combining two XML elements. 

xml movie1 = xml `<Movie year="1993">
                    <Name>Jurassic Park</Name>
                    <Director>Steven Spielberg</Director>
                  </Movie>`;

xml movie2 = xml `<Movie year="1997">
                    <Name>Titanic</Name>
                    <Director>James Cameron</Director>
                  </Movie>`;

xml movieList = movie1 + movie2;


The above “movieList” contains an XML sequence of two XML element values. As with arrays, we can access individual items in the sequence using the subscript operator.

xml m1 = movieList[0];
xml m2 = movieList[1];


The built-in function “length” can be used to get the number of items in the sequence. 

int n = movieList.length();


Also, other functions generally available for lists, such as “filter”, “forEach”, and “map”, can be used for functional iteration operations.
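
As a minimal sketch of these iteration functions, the following narrows a sequence to its elements, filters them by attribute, and then visits each match. It assumes a recent Swan Lake release in which the “xml:” prefix is available without an explicit import; the helper name “moviesFromYear” and the sample data are illustrative only.

```ballerina
import ballerina/io;

// Hypothetical helper: keeps only the elements whose "year" attribute matches.
function moviesFromYear(xml items, string year) returns xml {
    // elements() drops text, comments, and processing instructions.
    xml<xml:Element> elements = items.elements();
    return elements.filter(function(xml:Element e) returns boolean {
        // getAttributes() returns a map<string>; a missing key yields ().
        return e.getAttributes()["year"] == year;
    });
}

public function main() {
    xml movieList = xml `<Movie year="1993"><Name>Jurassic Park</Name></Movie>` +
                    xml `<Movie year="1997"><Name>Titanic</Name></Movie>`;
    xml matches = moviesFromYear(movieList, "1997");
    // The foreach statement works on XML sequences just as it does on lists.
    foreach var item in matches {
        io:println(item);
    }
}
```

Narrowing to “xml<xml:Element>” first lets the callback take an element directly instead of handling every kind of XML item.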

XML literal values can also be interpolated with expressions that provide parts of their content. This is done with the syntax “${expr}”. The expression can be an in-scope variable, a function call, or any other expression that returns a value supported in that position. An example of this is shown below, where we use variables to provide an integer and a string value for the movie year and director respectively.

int titanicYear = 1997;
string titanicDirector = "James Cameron";

xml movie2 = xml `<Movie year="${titanicYear}">
                    <Name>Titanic</Name>
                    <Director>${titanicDirector}</Director>
                  </Movie>`;
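
Interpolation is not limited to variables. The sketch below places a function call inside a placeholder; “fullName” and “movieElement” are hypothetical helpers introduced only for this illustration.

```ballerina
import ballerina/io;

// Hypothetical helper whose result is interpolated into the literal below.
function fullName(string first, string last) returns string {
    return first + " " + last;
}

// Builds a <Movie> element from its parts; each ${...} placeholder is
// evaluated when the literal itself is evaluated.
function movieElement(string name, int year, string first, string last) returns xml {
    return xml `<Movie year="${year}">
                    <Name>${name}</Name>
                    <Director>${fullName(first, last)}</Director>
                </Movie>`;
}

public function main() {
    io:println(movieElement("Titanic", 1997, "James", "Cameron"));
}
```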


The language library also provides XML subtypes: “xml:Element”, “xml:ProcessingInstruction”, “xml:Comment”, and “xml:Text”. These types can be used when we need subtype-specific operations. Let’s create an XML element and set its child elements using this functionality.

import ballerina/lang.'xml;
...

'xml:Element movies = <'xml:Element> xml `<Movies/>`;
movies.setChildren(movieList);
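
The subtypes are also useful in type tests when a sequence mixes different kinds of items. The sketch below counts only the elements in a sequence. It assumes a recent Swan Lake release, where the prefix is written “xml:” without an import (older releases use the quoted “'xml:” form shown above); “countElements” is an illustrative name.

```ballerina
import ballerina/io;

// Counts the items of a sequence that are XML elements, ignoring
// comments, text, and processing instructions.
function countElements(xml seq) returns int {
    int count = 0;
    foreach var item in seq {
        if item is xml:Element {
            count += 1;
        }
    }
    return count;
}

public function main() {
    // A sequence of an element, a comment, and another element.
    xml mixed = xml `<Movie/><!--a comment--><Book/>`;
    io:println(countElements(mixed));
}
```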


The XML values can also be namespace qualified. The following code snippet shows how an XML namespace is defined for a given namespace prefix. 

xmlns "http://example.com/ns1" as ns1;
xml movies = xml `<ns1:Movies>${movieList}</ns1:Movies>`;


Here, we’ve associated the namespace prefix “ns1” with the namespace URI “http://example.com/ns1”. We then created a new XML element qualified with the namespace bound to the “ns1” prefix. At the same time, we set the children of the element by using interpolation.

Similarly, we can set the default namespace for the XML values in the scope following the declaration by omitting the namespace prefix.

xmlns "http://example.com/ns1";
xml movies = xml `<Movies>${movieList}</Movies>`;


In the example above, the “movies” XML element and its children will inherit the namespace defined above it since it has been declared as the default namespace. 

Accessing XML

Now that we have XML values in our code, let’s see how we can access and query the structure they represent.

Let’s start with attribute access of an XML element value. This is done in the format “xml_value.attr_name”. The following code snippet shows how we can extract the “year” attribute from the movie element we created earlier. 

string|error year = movie1.year;


Here, the attribute access expression returns a union of “string” and “error”. This is because, at runtime, if the attribute does not exist in the given XML element value, the expression returns an “error” value.
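
One way to handle this union is a type test that supplies a fallback when the attribute is missing. This is only a sketch; “yearOrDefault” and the sample elements are illustrative assumptions.

```ballerina
import ballerina/io;

// Returns the "year" attribute, or the given fallback when the access fails.
function yearOrDefault(xml movie, string fallback) returns string {
    string|error year = movie.year;
    // The type test narrows `year` to string in the first branch.
    return year is string ? year : fallback;
}

public function main() {
    xml withYear = xml `<Movie year="1993"><Name>Jurassic Park</Name></Movie>`;
    xml withoutYear = xml `<Movie><Name>Untitled</Name></Movie>`;
    io:println(yearOrDefault(withYear, "unknown"));
    io:println(yearOrDefault(withoutYear, "unknown"));
}
```

When a missing attribute should abort the surrounding function instead, “check movie.year” propagates the error to the caller.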

In the case of an XML attribute having a specific namespace, we can prefix the attribute name with the namespace prefix in the following manner.

xmlns "http://example.com/ns1" as ns1;

xml movie2 = xml `<ns1:Movie ns1:year="1997">
                    <Name>Titanic</Name>
                    <Director>James Cameron</Director>
                  </ns1:Movie>`;

string|error year = movie2.ns1:year;


Now let’s see how we access elements in an XML sequence value. As we saw earlier, an XML sequence can have multiple single XML values at the same level. Let’s see how we can extract specific elements in such a sequence using filter expressions. First, let’s define a set of XML values. 

xmlns "http://example.com/ns1" as ns1;

xml movie1 = xml `<Movie year="1993">
                   <Name>Jurassic Park</Name>
                   <Director>Steven Spielberg</Director>
                   </Movie>`;
xml movie2 = xml `<ns1:Movie ns1:year="1997">
                   <Name>Titanic</Name>
                   <Director>James Cameron</Director>
                   </ns1:Movie>`;
xml book1 = xml `<Book>
                   <Name>Harry Potter</Name>
                   <Author>J.K. Rowling</Author>
                   </Book>`;

xml person1 = xml `<Person>
                       <Name>Jack Smith</Name>
                       <BirthYear>1990</BirthYear>
                   </Person>`;

xml entries = movie1 + movie2 + book1 + person1;


Here, we have created an XML value “entries”, which contains a sequence of XML elements. Now let’s select all the elements that have the element name “Movie”. The syntax for this is “xml_val.<xml_name_pattern>”.

xml<'xml:Element> movieElements = entries.<Movie>;


Here, we have given “Movie” directly as the element name. The resulting “movieElements” sequence contains a single element, the one in “movie1”. The element in “movie2” is excluded because it is namespace qualified. Notice that we can also use the more constrained “xml<'xml:Element>” type for “movieElements”, because filter expressions specifically return XML elements.

Now if we want to specifically access the movie element with a given namespace, we can use the following syntax. 

xml<'xml:Element> movieElements = entries.<ns1:Movie>;


In this case, “movieElements” contains only the XML element in “movie2”. If we want to extract multiple elements with different names, we can separate the names with “|” in the filter expression.

xml<'xml:Element> moviesAndBooks = entries.<ns1:Movie|Movie|Book>;


The statement above extracts all the XML elements in “entries” named “Movie” (with or without the “ns1” namespace) or “Book”.

The XML name pattern can also be “*”, which is used to select all the XML elements in a sequence. 

xml<'xml:Element> allElements = entries.<*>;


The code above returns a new XML sequence that contains only the XML element values in the sequence “entries”.

Next, let’s see how we can query child items in an XML value. This can be done with XML step expressions. This has a syntax and functionality that would be familiar if you have used XPath before. 

xml allChildItems = movie1/*;


Here, we read in all the child items in the XML value “movie1”. If we want to restrict this to only XML child elements, we will use the following syntax.

xml allChildElements = movie1/<*>;


We can drill down to any level by chaining steps, since each step returns an XML value that the next step can operate on.

xml doc = xml `<Doc>${entries}</Doc>`;
xml allMoviesNames = doc/<Movie|ns1:Movie>/<Name>;


Here, we read in all the movie names by using step expressions that first select every element named “Movie”, with or without the “ns1” namespace, and then select their “Name” child elements.

Also, we can search through all the descendants of an XML value to access the required items. The example below shows how this is done.

xml allNames = doc/**/<Name>;


With the “/**” syntax, the execution will search through all the descendants of the “doc” XML value and find all the elements with the name “Name”.

Operations that are supported in XML filter expressions and step expressions can also be implemented using the functions available in the XML language library. 
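
As a rough sketch of that correspondence, the function below compares three navigation expressions with their function forms. It assumes the “elements”, “children”, and “elementChildren” functions of the XML language library in a recent Swan Lake release; “formsAgree” and the sample data are illustrative.

```ballerina
import ballerina/io;

// Returns true when each navigation expression and its function form
// produce equal results for the given sequence.
function formsAgree(xml entries) returns boolean {
    // Filter expression: elements of the sequence named "Movie".
    xml byFilter = entries.<Movie>;
    xml byFunction = entries.elements("Movie");

    // Step expressions: all child items, and child elements named "Name".
    xml childItems = entries/*;
    xml childElements = entries/<Name>;

    return byFilter == byFunction
        && childItems == entries.children()
        && childElements == entries.elementChildren("Name");
}

public function main() {
    xml entries = xml `<Movie><Name>Jurassic Park</Name></Movie><Book><Name>Harry Potter</Name></Book>`;
    io:println(formsAgree(entries));
}
```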

XML and Language Integrated Queries

In Ballerina, we can combine the language integrated query features with XML processing to do advanced processing and transformation operations. Let’s take a look at a sample dataset and see how we can transform it into a better representation. Here, we will be using a publicly available XML dataset, which contains the annual CO2 emissions of each country. A sample snippet of this data is shown below.

<Root>
   <data>
       <record>
           <field name="Country or Area" key="ABW">Aruba</field>
           <field name="Item" key="EN.ATM.CO2E.PC">CO2 emissions (metric tons per capita)</field>
           <field name="Year">1960</field>
           <field name="Value">204.620372249175</field>
       </record>
       <record>
           <field name="Country or Area" key="AFG">Afghanistan</field>
           <field name="Item" key="EN.ATM.CO2E.PC">CO2 emissions (metric tons per capita)</field>
           <field name="Year">1964</field>
           <field name="Value">0.0861736143685528</field>
       </record>
   </data>
</Root>


We want to transform the above dataset so that the XML element names themselves convey the meaning of their text values. Also, the source dataset contains some records with empty value fields, which we would like to skip. The final result should be similar to the dataset below.

<records>
   <record>
       <country>Aruba</country>
       <year>1960</year>
       <value>204.620372249175</value>
   </record>
   <record>
       <country>Afghanistan</country>
       <year>1964</year>
       <value>0.0861736143685528</value>
   </record>
</records>


The transformation above can be done with a single statement in Ballerina using its integrated query functionality. The full Ballerina source code for the transformation is shown below.

import ballerina/io;

public function main() returns @tainted error? {

   // Open the source dataset as a UTF-8 character channel.
   io:ReadableByteChannel rbc = check io:openReadableFile("/home/laf/Downloads/API_EN.ATM.CO2E.PC_DS2_en_xml_v2_1500418.xml");
   io:ReadableCharacterChannel rch = new (rbc, "UTF8");

   xml payload = check rch.readXml();

   // For each record, pick the country, year, and value fields by position,
   // skip records whose value is empty, and emit a simplified <record>.
   xml transformedData = xml `<records>
                               ${from var x in payload/<data>/<*>
                                 let var country = <xml> x/<'field>[0]/*
                                 let var year = <xml> x/<'field>[2]/*
                                 let var value = <xml> x/<'field>[3]/*
                                 where value.length() > 0
                                 select xml `<record>
                                                 <country>${country}</country>
                                                 <year>${year}</year>
                                                 <value>${value}</value>
                                             </record>`
                                }
                              </records>`;

   // Write the transformed document, then close both channels.
   io:WritableByteChannel wbc = check io:openWritableFile("/home/laf/Downloads/transformed.xml");
   io:WritableCharacterChannel wch = new (wbc, "UTF8");
   check wch.writeXml(transformedData);
   check wch.close();
   check rch.close();
}


As shown in the code above, we can mix and match various aspects of the language to create more powerful functionality. 
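
To experiment with the same query pattern without a file on disk, here is a minimal sketch that runs on inline data. The “item”, “entry”, and “entries” element names and the “toEntries” function are illustrative assumptions, not part of the dataset above, and the code assumes a recent Swan Lake release.

```ballerina
import ballerina/io;

// Wraps the content of each non-empty <item> in an <entry> element.
function toEntries(xml items) returns xml {
    xml entries = from var item in items.<item>
        // Bind the child items once so the filter and the result share them.
        let xml content = item/*
        // Skip items that have no content, as the article does for empty values.
        where content.length() > 0
        select xml `<entry>${content}</entry>`;
    return xml `<entries>${entries}</entries>`;
}

public function main() {
    xml items = xml `<item>Aruba</item><item/><item>Afghanistan</item>`;
    io:println(toEntries(items));
}
```

Building the sequence in a separate query and then interpolating it keeps the query readable; the article’s version embeds the query directly in the outer literal, which is equivalent.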

Summary

In this article, we have gone through the main aspects of XML handling in Ballerina. We provided an overview of how to create XML values, manipulate them, and access them using the various techniques available in the language.

For more information on Ballerina and XML handling, refer to the following resources:

  • Ballerina by Example
  • Ballerina API Documentation

