Data Integration With Ballerina

DZone 's Guide to

Data Integration With Ballerina

Let's take a look at Ballerina and explore how Ballerina Integration works as well as why to use it for data integration.

· Integration Zone ·
Free Resource

Data integration is a complex problem. Data flows to and from many directions via different sources such as databases, the cloud, legacy systems, ERPs, and on-premise applications. Data integration allows businesses to combine data residing in different sources to provide users a consolidated view. Data in a cloud-native world is shaped in many different ways and resides in many places. How you choose to use, integrate with, and analyze cloud-native data may be different than what you're used to. Learn how a new programming language, Ballerina, can be used to solve this problem.

What is Ballerina?

Ballerina is an open source, concurrent programming language that has both textual and graphical representations. It is designed for seamless integration of networked applications. Ballerina is strongly and statically typed with a union type system. Ballerina is an open source project under the Apache 2.0 license. This article content and code snippets are based on Ballerina 0.981.0 release.

How Ballerina Data Integration Works

Ballerina has a rich set of packages that expose endpoints to enable the connectivity between various external data sources such as SQL databases, NoSQL databases, CSV files, FTP files, Google spreadsheets, etc. For each of these data sources, there is a corresponding client endpoint which is running on top of the Ballerina VM.

Image title

Why Using Ballerina for Data Integration?

Ballerina language is specifically designed for integration domain and it allows fast and easy data integration due to following reasons.

Connector Support for Various Data Sources

Ballerina client endpoints are used to connect with external data sources, backend services or APIs. For data integration purposes ballerina provides several SQL and NoSQL endpoints to interact with tabular SQL and NoSQL data sources. Ballerina is equipped with following data endpoints as of now (Ballerina 0.981.0 version) Ballerina provides extension mechanisms for writing custom native/Ballerina client endpoints which can connect to any custom data sources if required.

  1. JDBC Endpoint— Built-in connector which connects with SQL based tabular data sources via JDBC drivers.
  2. MySQL Endpoint — SQL endpoint customized for MySQL DB.
  3. H2 Endpoint — SQL endpoint customized for H2 DB.
  4. MongoDB Endpoint— Connects to MongoDB and allows data find operations and manipulation operations like update, delete etc.
  5. Cassandra Endpoint— Used to connect Ballerina with Cassandra data source and update, select data.
  6. Redis Endpoint — To connect Ballerina with Redis data sources.
  7. FTP Endpoint — To connect to an FTP server and perform I/O operations.
  8. Google Spreadsheet Endpoint — to access the Google Spreadsheet API Version v4 through Ballerina.

A simple example in Ballerina that creates a table and inserting/selecting data is as follows:

import ballerina/config;
import ballerina/h2;
import ballerina/io;

type Student record {
    int age;
    string firstName;

endpoint h2:Client testDB {
    path: config:getAsString("DATABASE_PATH", default = "./h2-client"),
    name: config:getAsString("DATABASE_PATH", default = "testdb"),
    username: config:getAsString("DATABASE_USER", default = "SA"),
    password: config:getAsString("DATABASE_PASS", default = ""),
    poolOptions: { maximumPoolSize: 5 }

function main(string... args) {
    //Create Table in a in memory H2 Database
    var ret= testDB ->update("CREATE TABLE student(age INT, name VARCHAR(255))");
    //Insert data into the table
    ret = testDB ->update("INSERT INTO student(age, name) values (?, ?)", 10, "John");
    ret = testDB ->update("INSERT INTO student(age, name) values (?, ?)", 20, "Anne");
    //Select data from the table
    table<Student> tableStudent = check testDB ->select("SELECT * FROM student", 
                                                Student, loadToMemory = true);
    //Get the row count
    int count = tableStudent .count();
    //Convert table into json
    json jsonData = check <json>tableStudent;
    //Convert table to xml
    xml xmlData = check <xml>tableStudent;
    //Access each data record
    foreach s in tableStudent {

Built-In Transaction Support

Transactions plays an important role in data integration domain. A transaction is a series of data manipulation statements that must either fully complete or fully fail, leaving the system in a consistent state. Ballerina language supports both local and xa transactions for data and JMS connector actions. And one of the key differences is that Ballerina supports distributed transactions among multiple microservices. When multiple microservices are interacting with each other and if we need to tie them together in a single transaction, it is required to have a coordination protocol over the conventional two-phase commit as each microservice is running on a different process. In Ballerina, for transactions in microservices are using a protocol which can result in the joint outcome based on a coordinator. Ballerina provides syntax support for defining transaction boundaries easily and handling transaction failures and retries.

A simple code for a distributed transaction using Ballerina is as follows:

import ballerina/http;
import ballerina/log;
import ballerina/mysql;
import ballerina/config;

//Create mysql client endpoint with config for employee db.
endpoint mysql:Client employeeDB {
   host: config:getAsString("employeedb.host"),
   port: config:getAsInt("employeedb.port"),
   name: config:getAsString("employeedb.name"),
   username: config:getAsString("employeedb.user"),
   password: config:getAsString("employeedb.pass"),
   dbOptions: { useSSL: false }
//Create mysql client endpoint with config for salary db.
endpoint mysql:Client salaryDB {
   host: config:getAsString("salarydb.host"),
   port: config:getAsInt("salarydb.port"),
   name: config:getAsString("salarydb.name"),
   username: config:getAsString("salarydb.user"),
   password: config:getAsString("salarydb.pass"),
   dbOptions: { useSSL: false }

//Employee Add service
@http:ServiceConfig {
   basePath: "/hr"
service<http:Service> employeeService bind { port: 9090 } {
   @http:ResourceConfig {
       methods: ["POST"],
       path: "/employee"
   addEmployee(endpoint caller, http:Request req) {
       json payload = check req.getJsonPayload();
       string name = payload.name.toString();
       float salary = check <float> payload.salary;
       int age = check <int>payload.age;
       int key = -1;
       transaction with retries = 2 {
           //Update first table and get the generated key
           (int, string[]) retWithKey = check employeeDB->updateWithGeneratedKeys(
               "INSERT INTO Employee (name, age) VALUES (?,?)", (), name, age);
           var (count, ids) = retWithKey;
           string generatedKey = ids[0];           
           key = check <int>generatedKey;
           //Update the second table with previous key
           var ret = salaryDB->update("INSERT INTO Salary (id, month_salary)
                VALUES (?, ?)", key, salary);
       } onretry {
           log:printError("Transaction failed, retrying ...");
           key = -1;
       //Generate and send the response
       http:Response res = new;
       res.setPayload("Account added for: " + untaint name + " with ID:" + key + "\n");
       caller ->respond(res) but {
           error e => log:printError("Error in responding", err = e)

Ballerina Type System

In most of the traditional programming languages SQL result-sets, JSON data, XML data, etc. are not treated as first-class types. When using or manipulating these data, we have to use various external libraries or add-ons to get the work done. But Ballerina is designed with a sophisticated type system with first-class support for different data types and formats. So users can generate, manipulate, and convert from one type to another easily and with less number of code lines. The following are the basic types in Ballerina that are capable of handling different data types.

  • Value types — int, float, string, boolean, byte
  • table— Represents tabular data in Ballerina. (Ex: Data in a Result-set returned from a SQL query)
  • JSON — built-in type to represent JSON data
  • XML — built-in type to represent XML data
  • record— allows to define user-defined types
  • array — array of data
  • map — key value pairs

The table, JSON, and XML record types are highly useful when working with data integration scenarios. The table is a special type that is designed with the intention of being similar to the table of the relational database table. A table value contains an immutable set of column names and a mutable collection of rows. Each column name is a string and each row is a mapping that associates a value with every column name. The rows are unordered and allows duplicates. Ballerina table can be backed by an actual database table if it is generated from a SQL operation or users can create an in-memory table with some data. The advantage of table type is users can directly map their records types defined in the business logic into the data in database tables.

Here is how you can create an in-memory table in Ballerina.

import ballerina/io;

type Employee record {
   int id,
   string name,
   float salary,

function main(string... args) {
   table<Employee> tbEmployee = table {
       { primarykey id, name, salary },
       [ { 1, "Mary",  300.5 },
         { 2, "John",  200.5 },
         { 3, "Jim", 330.5 }
   io:print("Table Information: ");

In Ballerina, table can be directly converted into XML or JSON type and table can be mapped into record types where each row of tabular data is mapped into record. Also, JSON, map, record, etc. are inter-operable types and casting/converting allows transformation between these different types easily.

Data Streaming Support

In Ballerina, table to JSON and table to XML type conversions result in streamed data. With the data streaming functionality, when a service client makes a request, the result is streamed to the service client rather than building the full result in the server and returning it. This allows virtually unlimited payload sizes in the result, and the response is instantaneous to the client. There the result set corresponding to a particular query is converted to XML/JSON row by row and written to the wire as the conversion takes place upon a row.

Ability to Expose Data as Services via HTTP Service

Success of a business lies in its ability to integrate its data from across the organization and analyze it to make more informed decisions. So accessing data in a convenient way is a key requirement in any data integration scenario. APIs make this data exposure possible and REST is one of the most popular APIs to communicate with web, mobile, and cloud apps. With the rich, fast and easy HTTP REST service development support in Ballerina, it allows rapid data services exposure via REST APIs.

Following is a simple service that exposes data in the student table via the URL http://localhost:9090/dataservice/student as streamed JSON data.

import ballerina/h2;
import ballerina/log;
import ballerina/config;
import ballerina/http;

endpoint h2:Client testDB {
 path: config:getAsString("DATABASE_PATH", default = "./h2-client"),
 name: config:getAsString("DATABASE_PATH", default = "testdb"),
 username: config:getAsString("DATABASE_USER", default = "SA"),
 password: config:getAsString("DATABASE_PASS", default = ""),
 poolOptions: { maximumPoolSize: 5 }

service<http:Service> data bind { port: 9090 } {
   students(endpoint caller, http:Request req) {
       http:Response res = new;
       table tb = check testDB->select("SELECT * FROM student", ());
       json jsonData = check <json> tb;
       res.setPayload(untaint jsonData);
       caller->respond(res) but { error e => log:printError(
                          "Error sending response", err = e) };

Graphical Data Modeling With Tools

Ballerina provides both textual syntax and corresponding graphical syntax for any program. An entire Ballerina program can be viewed as a collection of sequence diagrams that interact with each other. This gives a better view and understanding of the integration scenario and database endpoints are represented as actors of the sequence diagram. This helps the developer to have a clear view on the entire data integration flow. Ballerina IDE tools and plugins support this diagram view for a given Ballerina code.
Image title

ballerina, cloud native, data-driven development, database, integration

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}