Parsing XML Into Scala Case Classes Using Xtract

Want to learn more about how to parse XML into Scala case classes? Check out this tutorial to learn how, using the xtract library.

In computing, the Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is one of the best-known data formats for transporting information reliably and conveniently from one system to another, and it uses a tag-based format for composing data. In real-world data processing, we often come across XML parsing problems.

Play-json is one of the easiest and most convenient ways to parse JSON (JavaScript Object Notation) data into Scala case classes, and it is widely used by organizations to read and write JSON. In one of my projects, I came across a use case for parsing XML data into Scala case classes for further processing. I was quite familiar with JSON parsing with Play-Json and Json4s, but XML parsing was a bit new to me, so I looked for ways to parse an XML document into a Scala case class. After some searching, I came across a few alternatives:

  • JAXB: “JAXB stands for Java architecture for XML binding. It is used to convert XML to a Java object and a Java object to XML.”
  • Scalaxb: “Scalaxb is an XML data-binding tool that supports XSD and WSDL, and as output, it generates Scala source files.”
  • Xtract: “Xtract is a Scala library for deserializing XML. It is heavily inspired by the combinators in the Play JSON library, in particular, the Reads[T] class.”

JAXB is geared toward Java classes, and, judging from the blog posts I read, Scalaxb did not look mature enough to adopt at the time. More importantly, both are best suited to cases where a schema is defined for your XML documents.

For my use case, no schema was defined, so I wanted a play-json-like library that could convert my XML data into Scala case classes. One of the solutions I found was the xtract library. As mentioned earlier, its readers are modeled closely on the Reads[T] class from play-json. This post looks only at parsing XML into Scala objects, so if you need the reverse conversion (Scala to XML), you can explore the xtract library yourself.

Let’s start exploring this library by parsing an XML object into a Scala case class.

The “build.sbt” file has to declare the dependency on the xtract library. The xtract library uses a few classes for functional syntax from play-json, so we have to provide the play-json dependency as well.
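
A minimal build.sbt along these lines might look as follows. This is only a sketch: the project name, the Scala version, and both library versions are assumptions and are not taken from the sample project, so check them against the releases you actually use.

```scala
// build.sbt (sketch): every version below is an assumption, not copied from the sample project
name := "xtract-sample" // hypothetical project name

scalaVersion := "2.12.4" // assumed; use a Scala version both libraries are published for

libraryDependencies ++= Seq(
  // xtract itself (assumed version; pick one that works with the play-json functional syntax)
  "com.lucidchart" %% "xtract" % "1.1.1",
  // supplies play.api.libs.functional.syntax._ used by the readers in this post (assumed version)
  "com.typesafe.play" %% "play-json" % "2.6.9"
)
```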

Sample XML Data

Here is a sample XML document with deeply nested elements; it describes a family tree:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Response>
    <person name="Raaj Kapoor" dob="14 December 1924" gender="male">
        <address street="Mumbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
        <wife name="Krishna Malhotra" dob="30 December 1930" gender="female"/>
        <kids name="Randheer Kapoor" dob="15 February 1947" gender="male">
            <address street="Mumbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
            <wife name="Babita" dob="NA" gender="female"/>
            <kids name="Karishma Kapoor" dob="25 June 1974" gender="female">
                <address street="Mumbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
                <husband name="Sanjay Kapoor" dob="NA" gender="male"/>
                <kids name="Samaira Kapoor" dob="NA" gender="female"/>
            </kids>
            <kids name="Kareena Kapoor" dob="21 September 1980" gender="female">
                <address street="Mumbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
                <husband name="Saif Ali Khan" dob="16 August 1970" gender="male"/>
                <kids name="Taimoor Ali Khan" dob="NA" gender="male"/>
            </kids>
        </kids>
        <kids name="Ritu Nanda" dob="30 October 1948" gender="female">
            <address street="Mumbai46" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
            <husband name="Ranjan Nanda" dob="NA" gender="male"/>
            <kids name="Nitasha Nanda" dob="NA" gender="female"/>
            <kids name="Nikhil Nanda" dob="NA" gender="male">
                <address street="Mumbai46" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
                <wife name="Shweta Bachchan Nanda" dob="80" gender="female"/>
                <kids name="Navya Naveli Nanda" dob="NA" gender="male"/>
                <kids name="Agastyle Nanda" dob="NA" gender="male"/>
            </kids>
        </kids>
        <kids name="Rishi Kappor" dob="4 September 1952" gender="male">
            <address street="Mumbai46" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
            <wife name="Neetu Singh Kapoor" dob="NA" gender="female"/>
            <kids name="Rishima Sahni" dob="NA" gender="female"/>
            <kids name="Ranveer Kapoor" dob="28 September 1982" gender="male"/>
        </kids>
        <kids name="Reema Jain" dob="NA" gender="male">
            <address street="Mumbai46" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
            <husband name="NA" dob="NA" gender="male"/>
            <kids name="Adar Jain" dob="NA" gender="male"/>
            <kids name="Arman Jain" dob="NA" gender="male"/>
        </kids>
        <kids name="Rajeev Kapoor" dob="NA" gender="male">
            <address street="Mumbai46mbai" city="Mumbai" state="Maharashtra" pin="36770047" country="India"/>
            <wife name="NA" dob="NA" gender="female"/>
        </kids>
    </person>
</Response>


XML Readers

Just like play-json, xtract lets us define readers that parse XML into Scala types. Here are the readers for the sample XML above.

import com.lucidchart.open.xtract.XmlReader._
import com.lucidchart.open.xtract.{XmlReader, __}
import play.api.libs.functional.syntax._

object Person {
  implicit val reader: XmlReader[Person] = (
    attribute[String]("name") and
      attribute[String]("dob") and
      attribute[String]("gender") and
      (__ \ "address").read[Address].optional and
      (__ \ "wife").lazyRead(first[Person]).optional and
      (__ \ "husband").lazyRead(first[Person]).optional and
      (__ \ "kids").lazyRead(seq[Person]).default(Nil)
    ) (apply _)
}

case class Person(
                   name: String,
                   dob: String,
                   gender: String,
                   address: Option[Address],
                   wife: Option[Person],
                   husband: Option[Person],
                   kids: Seq[Person]
                 )

object Address {
  implicit val reader: XmlReader[Address] = (
    attribute[String]("street") and
      attribute[String]("city") and
      attribute[String]("state") and
      attribute[String]("pin") and
      attribute[String]("country")
    ) (apply _)
}

case class Address(
                    street: String,
                    city: String,
                    state: String,
                    pin: String,
                    country: String
                  )

case class Response(
                     person: Seq[Person]
                   )

object Response {
  implicit val reader: XmlReader[Response] = (__ \ "person").read(seq[Person]).default(Nil).map(apply _)
}


Here are a few of the important combinators used to parse XML into Scala case classes:

  • Reading attributes (attribute): An XmlReader that extracts a value from an attribute of the input NodeSeq.
attribute[String]("state")


  • Reading nodes (read): Creates an XmlReader that reads the node(s) located at this XPath.
(__ \ "address").read[Address].optional


  • Reading nodes recursively (lazyRead): Same as read, but takes the reader as a lazy argument so that it can be used in recursive readers, such as a Person that contains other Person values.
(__ \ "wife").lazyRead(first[Person]).optional


  • Reading optional nodes (optional): Converts a reader into one that always succeeds with an Option (None if it would have failed). Any errors are dropped.


  • Reading nodes with default values (default): Uses a default value if the reader is unable to parse. The resulting reader always succeeds and drops any errors.


  • Reading sequences/lists (seq): Reads each node in the NodeSeq with a reader and succeeds with a PartialParseSuccess if any of the elements fail.
(__ \ "person").read(seq[Person]).default(Nil).map(apply _)
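
To see these combinators working together on something smaller than the family tree, here is a minimal sketch. The Member and Family classes and the inline XML are hypothetical and exist only for this illustration; the reader calls mirror the ones used in the readers above, and the code assumes the same xtract and play-json dependencies as the rest of the post.

```scala
import com.lucidchart.open.xtract.{XmlReader, __}
import com.lucidchart.open.xtract.XmlReader._
import play.api.libs.functional.syntax._

// Hypothetical models, used only for this illustration
case class Member(name: String, nickname: Option[String])

object Member {
  implicit val reader: XmlReader[Member] = (
    attribute[String]("name") and              // required attribute
      attribute[String]("nickname").optional   // None when the attribute is missing
    ) (apply _)
}

case class Family(surname: String, members: Seq[Member])

object Family {
  implicit val reader: XmlReader[Family] = (
    attribute[String]("surname") and
      (__ \ "member").read(seq[Member]).default(Nil) // falls back to Nil if the read fails
    ) (apply _)
}

object CombinatorDemo {
  // Inline sample data (hypothetical)
  val xml =
    <family surname="Sharma">
      <member name="Asha" nickname="Ashu"/>
      <member name="Ravi"/>
    </family>

  // toOption keeps the parsed value and drops any error details
  val parsed: Option[Family] = XmlReader.of[Family].read(xml).toOption
}
```

Here the second member has no nickname attribute, so the optional reader should yield None for it instead of failing the whole parse.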


XML Helper

This trait works as an XML helper for parsing XML data into Scala case classes.

import java.io.File
import com.knoldus.xtract.models._
import com.lucidchart.open.xtract.XmlReader
import scala.io.Source
import scala.xml.XML

/**
  * Provides functionality to parse XML data into Scala case classes.
  */
trait XmlHelper {

  def xtract(filePath: String): Option[Response] = {
    // Read the whole file and close the underlying handle afterwards
    val source = Source.fromFile(new File(filePath))
    val xmlData = try source.getLines().mkString("\n") finally source.close()
    println("***File to be parsed: ")
    println(xmlData)
    val xml = XML.loadString(xmlData)
    // toOption keeps the parsed value and drops any error details
    XmlReader.of[Response].read(xml).toOption
  }
}
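
The helper above collapses the result to an Option, so any parse errors are lost. If the errors matter, one option is to pattern match on the ParseResult instead. The sketch below assumes the ParseSuccess, PartialParseSuccess, and ParseFailure cases exposed by the xtract version used here; the describe function is a hypothetical helper and is not part of the sample project.

```scala
import com.lucidchart.open.xtract.{ParseFailure, ParseResult, ParseSuccess, PartialParseSuccess}

object ParseResultHandling {

  // Hypothetical helper: summarises a ParseResult without discarding the error count
  def describe[A](result: ParseResult[A]): String = result match {
    case ParseSuccess(value) =>
      s"parsed cleanly: $value"
    case PartialParseSuccess(value, errors) =>
      // a value was produced, but some parts of the input could not be read
      s"parsed with ${errors.size} error(s): $value"
    case ParseFailure(errors) =>
      s"failed with ${errors.size} error(s)"
  }
}
```

Inside the helper, this could be called as describe(XmlReader.of[Response].read(xml)) before deciding whether to fall back to toOption.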


Sample App

Here, we have a simple application that takes an XML file and parses it into a complex Scala object.

import com.knoldus.xtract.util.XmlHelper

object XtractSampleApp extends App with XmlHelper {
  val path = "src/main/resources/person.xml"
  val response = xtract(path)
  println("***RESPONSE: " + response)
}


Sample Scala Object After Parsing

After parsing the XML data, here is a sample outcome in the form of Scala case classes:

Some(Response(Vector(Person(Raaj Kapoor,14 December 1924,male,Some(Address(Mumbai,Mumbai,Maharashtra,36770047,India)),Some(Person(Krishna Malhotra,30 December 1930,female,None,None,None,Vector())),None,Vector(Person(Randheer Kapoor,15 February 1947,male,Some(Address(Mumbai,Mumbai,Maharashtra,36770047,India)),Some(Person(Babita,NA,female,None,None,None,Vector())),None,Vector(Person(Karishma Kapoor,25 June 1974,female,Some(Address(Mumbai,Mumbai,Maharashtra,36770047,India)),None,Some(Person(Sanjay Kapoor,NA,male,None,None,None,Vector())),Vector(Person(Samaira Kapoor,NA,female,None,None,None,Vector()))), Person(Kareena Kapoor,21 September 1980,female,Some(Address(Mumbai,Mumbai,Maharashtra,36770047,India)),None,Some(Person(Saif Ali Khan,16 August 1970,male,None,None,None,Vector())),Vector(Person(Taimoor Ali Khan,NA,male,None,None,None,Vector()))))), Person(Ritu Nanda,30 October 1948,female,Some(Address(Mumbai46,Mumbai,Maharashtra,36770047,India)),None,Some(Person(Ranjan Nanda,NA,male,None,None,None,Vector())),Vector(Person(Nitasha Nanda,NA,female,None,None,None,Vector()), Person(Nikhil Nanda,NA,male,Some(Address(Mumbai46,Mumbai,Maharashtra,36770047,India)),Some(Person(Shweta Bachchan Nanda,80,female,None,None,None,Vector())),None,Vector(Person(Navya Naveli Nanda,NA,male,None,None,None,Vector()), Person(Agastyle Nanda,NA,male,None,None,None,Vector()))))), Person(Rishi Kappor,4 September 1952,male,Some(Address(Mumbai46,Mumbai,Maharashtra,36770047,India)),Some(Person(Neetu Singh Kapoor,NA,female,None,None,None,Vector())),None,Vector(Person(Rishima Sahni,NA,female,None,None,None,Vector()), Person(Ranveer Kapoor,28 September 1982,male,None,None,None,Vector()))), Person(Reema Jain,NA,male,Some(Address(Mumbai46,Mumbai,Maharashtra,36770047,India)),None,Some(Person(NA,NA,male,None,None,None,Vector())),Vector(Person(Adar Jain,NA,male,None,None,None,Vector()), Person(Arman Jain,NA,male,None,None,None,Vector()))), Person(Rajeev Kapoor,NA,male,Some(Address(Mumbai46mbai,Mumbai,Maharashtra,36770047,India)),Some(Person(NA,NA,female,None,None,None,Vector())),None,Vector()))))))
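
Once the data is in case classes, ordinary Scala collection code can walk the tree. As a small sketch, the snippet below collects the names of all descendants of a person. To stay self-contained it uses a hypothetical, trimmed-down FamilyMember class with only name and kids; the same recursion would apply to the Person class defined earlier.

```scala
// Hypothetical, trimmed-down stand-in for Person (only the fields needed here)
final case class FamilyMember(name: String, kids: Seq[FamilyMember] = Nil)

object FamilyTree {

  // Depth-first list of every descendant's name, excluding the member itself
  def descendantNames(member: FamilyMember): Seq[String] =
    member.kids.flatMap(kid => kid.name +: descendantNames(kid))

  // A small slice of the family tree from the sample XML
  val raaj: FamilyMember = FamilyMember("Raaj Kapoor", Seq(
    FamilyMember("Randheer Kapoor", Seq(
      FamilyMember("Karishma Kapoor"),
      FamilyMember("Kareena Kapoor")
    )),
    FamilyMember("Ritu Nanda")
  ))
}
```

Calling FamilyTree.descendantNames(FamilyTree.raaj) returns the four names in depth-first order: Randheer Kapoor, Karishma Kapoor, Kareena Kapoor, Ritu Nanda.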


Running Application

Step 1. Clone the git repo from here:
Git repository for the sample project

Step 2. Run the application using the following command:

sbt run


After running the application with the above command, the parsed result is printed to the terminal. To explore further, you can experiment with the code and the sample XML.

Hope you enjoyed the post. In our next post, we will be looking more deeply into how the “xtract library” works and converts the XML data into Scala case classes.

Thanks for reading!

Topics:
java, tutorial, scala, xml, json, data, xtract, xtract library

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
