
Agile Message Management with C24 and MongoDB

AUTHORS

Matt Vickery - C24 Technologies & Incept5
Iain Porter - C24 Technologies & Incept5
Daniel Roberts - 10gen



ABSTRACT
C24 and mongoDB - Agile Message Management. This article presents and demonstrates a natural combination of two enterprise software products: C24 Technologies' C24 iO Studio and 10gen's mongoDB. Both technologies are introduced, their key feature sets outlined and a practical demonstration of their combined capability provided through working code. The primary driver for promoting these two technologies is that they form a powerful toolset that has underpinned agile software delivery for several significant enterprise applications requiring non-trivial messaging and data storage capability.

INTRODUCTION
Both C24 iO Studio and mongoDB are enterprise software products that lead the way in their respective technology fields. This article articulates their primary features and the reasons that the technology support they provide has become increasingly significant. It will become clear that a class-leading, robust message parsing and transformation capability coupled with a document-oriented database is a compelling technological combination.

Furthermore, both products are highly flexible from an architectural perspective. Application software can make use of both technologies through simple-to-use APIs, as shown in the sample project presented in this article. Both C24 iO and mongoDB have wide support amongst the Spring community and so both can be used in a Spring container context. Both products also enjoy support from the enterprise software community and can easily be integrated into, for example, SpringSource's Spring Integration, MuleSoft's Mule, Red Hat's Fuse and Apache's Camel integration platforms using adapters that those vendors have implemented and provide to their customer base.

This article presents the typical challenges facing technology architects who are building applications that must gain a competitive advantage through agile construction and rapid delivery, and it concentrates on advocating the right tools for that challenge.

TYPICAL DATA INTEGRATION CHALLENGES IN SOFTWARE SYSTEMS
Inherent in many Message Driven Architectures (MDA) is a messaging toolkit that must deal with both simple and complex messaging, whether it be, for example, FIX and SWIFT from the financial industry, ACORD from the insurance industry or SS7 from the telecommunications industry.

Message Parsing & Business Validation Rules
Building software around message standards (whether formal, de facto or custom) has generally proven to be highly error-prone and costly. Building message parsers is error-prone because the task is highly complex. Whilst message parsers are necessary to translate raw messages into well-defined message models, a parser without validation capability has limited use. Software is therefore also required to apply business rules to each message and so form a statement of its validity.

Building parsers and validation rule capabilities is costly because many of these standards are updated frequently, and implementations then have to keep pace with that change. Keeping pace generally means updating, testing and releasing code. Furthermore, for some message standards, compliance failure can be very costly; SWIFT compliance failure, for example, is calculated as a fixed-value fine per message.

Message Transformation
A typical MDA use case is one where messages must be transformed from one message type to another. Even with what appears to be direct field mapping, data often has to go through a process of cleansing, enrichment and type change. On a technical level, for example, many developers have had the experience of trying to resolve differences between values based on different type systems. Because the type systems differ, an XSD date type is different from a JDBC date type, which is different from a Java date type, which is different from an ISO-8601 date type. Further examples of type-system variance exist around numeric types; this can be particularly serious for financial applications.
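
To make the date example concrete, the following stand-alone Java fragment (illustrative only, not part of the sample project) shows how a single ISO-8601 date changes character as it crosses type systems:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateMismatch {
    public static void main(String[] args) throws ParseException {
        // An ISO-8601 date, as it might appear in an XSD-typed message field.
        String iso8601 = "2012-06-30";

        // Parsing it into a java.util.Date forces a time and a time zone onto
        // a value that had neither - the first source of subtle mismatches.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date javaDate = format.parse(iso8601);

        // A JDBC date (java.sql.Date) is a java.util.Date that is supposed to
        // carry no time component, so the conversion silently changes meaning.
        java.sql.Date jdbcDate = new java.sql.Date(javaDate.getTime());

        System.out.println(iso8601 + " -> " + javaDate + " -> " + jdbcDate);
    }
}
```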

Although source messages often require some form of validation, an entire set of rules is often also required to validate the transformed target messages. The most agile and efficient approach is to use a single mechanism for both.

The primary point of this section is that data transformation is never as easy as one might imagine. Tooling support provided by experts in the field not only gives a competitive advantage through improved time-to-market but also saves the costs associated with maintenance and with getting parsing and validation wrong.

Key Motivators
With a motive to provide agile, robust and rapid time-to-market software solutions, we advocate using an industry-leading messaging toolkit instead of building bespoke parser, transformation and validation rule capability.

Message Storage
Another key technology employed in a typical MDA solution is one that provides message persistence. Most enterprise applications are required to store messages for at least a short period of time. That storage requirement may be triggered on message entry to the system or after the message has been processed through business logic.

A number of short-term storage technologies and strategies are available for selection. Any candidate database must perform adequately, cope with schema change easily, scale out to commodity server infrastructure and not cost more than the servers on which it runs.

Regarding long-term storage, some applications used by the finance industry, for example, are required to store messages for a number of years in order to meet regulatory requirements. Fairly typical problems associated with this requirement are the cost of database licenses and servers (a distinct archive DBMS is typically employed), scalability for a growing data storage requirement and coping with a statically defined schema within an evolving business.

A number of vendors compete in this technology space. An important differentiator when choosing a technology, and probably the first to consider, is fitness for purpose. Although it is entirely possible to take a message, build a 4th-normal-form relational model and then derive a schema design from it, enterprises are beginning to demand technologies that are better suited to agile delivery and to storage in a native structure.

Following on from fitness for purpose, the chosen storage technology must support the performance requirements, schema change through business evolution and horizontal scalability (scale-out), and it must have a reasonable license cost.

BREAKING AWAY FROM THE RDBMS - Agile operational data stores with flexible dynamic schemas
Introduction
During the last fifteen to twenty years, the Relational Database Management System (RDBMS) has provided a capability that has seen it become the dominant storage technology in the enterprise application market. Vendor offerings in this space have become rich and plentiful; they include Oracle RDBMS, IBM DB2 and Sybase ASE amongst others. Other lesser-known vendors have also appeared within the last ten years but are founded upon the same roots: a typical relational model implementation along with a standards-based Structured Query Language (SQL) provision.

Static Versus Dynamic Schemas
Modern software development requires a highly efficient response to requirements change; this is usually achieved via an agile development process. Experience has proven that, to be most successful, agile development needs to be supplemented and facilitated with agile products, frameworks and tools.

A great example of this is the significant development advantage that can be gained from dynamic schema capabilities such as those provided by document databases. Rather than formally specifying tables and attributes in a static Data Definition Language (DDL) script, in a document database the document is the schema.

The dynamic schema capability also means that costly data migrations are not mandatory and maintenance burdens are eased.
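
As a brief illustration of "the document is the schema", the following sketch uses the mongoDB Java driver to store documents without any DDL; the database, collection and field names are invented for this example:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;

public class DynamicSchemaExample {
    public static void main(String[] args) throws Exception {
        DB db = new Mongo("localhost", 27017).getDB("trading");
        DBCollection orders = db.getCollection("orders"); // created on first use

        // No CREATE TABLE, no ALTER TABLE: the document defines its own shape.
        orders.insert(new BasicDBObject("symbol", "VOD.L").append("quantity", 1000));

        // A later document can simply carry an extra field; no migration is needed.
        orders.insert(new BasicDBObject("symbol", "BP.L")
                .append("quantity", 500)
                .append("currency", "GBP"));
    }
}
```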

Scalability
Most relational database vendors provide vertical scaling capability. Vertical scaling, or scale-up, has proven to be very costly because it is usually achieved by purchasing bigger servers with more CPUs, more RAM, faster storage and high-speed networking. The subsequent cost of migrating data to the new bigger, faster server is also sometimes substantial. If growing data requirements force a scale-out to multiple servers, the exercise becomes increasingly complex and, consequently, increasingly expensive. Furthermore, from a technical perspective, RDBMS solutions are table driven, which makes high-performance access along with sensible data distribution very difficult to achieve.

Conversely, document databases naturally support scale-out: data is generally co-located, and partitioning (or sharding) data across distributed nodes is generally far less complex. Furthermore, rather than requiring high-powered servers for scale-up, document databases can be scaled out using commodity hardware.


Performance
Regarding both document and relational databases, two aspects of performance are interesting. The first can be described by considering a query that must navigate a single object versus one that must traverse tables that need to be joined prior to projection. Reading data from a single relational table is very fast, but as soon as joins across tables are required, performance decreases significantly compared to the equivalent operation using a document database.


The second interesting aspect of performance can be described by considering a growing installation of commodity hardware servers being used to house big data sets. Because documents are stored without undergoing normalization, it is easy to distribute them across very large clusters of servers and provide linear scalability for both reads and writes.

ORM Layers
For some types of application, a key challenge facing the relational model (or at least its vendor implementations) is the much discussed and debated 'impedance mismatch'. This challenge is typically overcome by introducing a new layer that brokers between application objects and relational tables - the Object Relational Mapping (ORM) layer. ORM layers are key in the technology stack but exist only because two technologies do not naturally fit together. Rather cleverly, some RDBMS vendors extended their business around the necessity for ORM - consider Oracle's TopLink product, for example. Hibernate became a popular ORM product, but many developers who required reasonable levels of performance discovered that it performed very poorly compared to their application software - it became a bottleneck. Relational tables and joins have become such a key aspect of performance that very specialist knowledge is required to design schemas, scale horizontally, write (distributed and non-distributed) queries and tune databases around them. Performance is further complicated by the introduction of an ORM layer - it certainly cannot be ignored.

Non-relational Data
As an extreme example of an application that does not fit typical relational database facilities, a relational schema designed to store FpML or SEPA's ISO 20022 messages would require entities with thousands of attributes spread across a huge number of tables. This is completely impractical unless the chosen storage design treats the message as a complete entity or document.

Costly & Heavyweight Product Sets
As RDBMSs have matured, some large vendors have added extensive capability and built large organisations around their products. That capability now includes not just the core RDBMS but modelling and design tools, management and monitoring tools, BI tools, cluster management tools, analytic tools, public issue tracking and support tools and highly skilled professional services. This software tool capability has to be funded, and that is typically done through licensing. Some RDBMS products have become so complex that vendor professional services are required to design and tune them.

Architectural Results
Many customers requiring data storage for their documents are writing software against a technology that forces the impedance mismatch to be resolved. They resolve this mismatch by introducing an ORM layer in their application, merely because they are using two technologies that do not fit together. Furthermore, customers then purchase an RDBMS from a mainstream vendor that has built a global company around extended capability, much of which is not required for simple document storage. Expensive consultancy is often required to achieve the performance that customers need. Scalability is often restricted to scale-up because data has been normalised into tables that cannot easily be distributed; this, in turn, removes the potential for using commodity hardware to scale out.

A New Paradigm
The software industry is undergoing something of an evolution in its thinking around the fundamentals of storage technologies. For projects that require a different fit between application messaging and the underlying storage technology, document databases look very attractive and are gaining significant traction in today's agile-driven market. Several very interesting aspects arise from this:
  • Dynamic schemas fit well with agile development and lessen the project development and maintenance burden.
  • Object graph traversal type queries are computationally cheaper for document databases than the equivalent using relational structures.
  • Scale-out, using commodity hardware is a more cost effective approach than typical RDBMS scale-up.
  • An ORM layer is not necessary; there is no impedance mismatch problem to solve.
  • Large, expensive product sets are not required - purchase and use only what you need.
  • Queries can be expressed in much simpler terms; normalization is not exposed to the query writer, who would otherwise have to join tables together to rebuild documents (see the sketch below).
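As a sketch of that last point (hypothetical collection and field names, not from the sample project), a query that would require a join over a normalised schema collapses into a single field match against the document:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;

public class SimpleQueryExample {
    // SQL equivalent over a normalised schema:
    //   SELECT o.* FROM orders o
    //   JOIN counterparties c ON o.counterparty_id = c.id
    //   WHERE c.name = 'ACME'
    public static DBCursor ordersFor(DBCollection orders, String counterparty) {
        // The counterparty lives inside the order document, so the 'join'
        // collapses into a single dotted-field match.
        return orders.find(new BasicDBObject("counterparty.name", counterparty));
    }
}
```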

From RDBMS to Document Databases
There is no doubt that developers coming from an RDBMS and SQL background will be challenged (albeit briefly) in attempting to understand this new technology - it is a significant and fundamental paradigm shift. However, document databases are proving very compelling to developers because they deliver results quickly and support change instantly through dynamic schemas. Installation and setup is usually trivial.

C24 INTEGRATION OBJECTS (IO) - DATA MODELLING AND MANAGEMENT
C24 Technologies is a software house specialising in standards-based messaging and integration solutions aimed at the wholesale financial services markets. C24 Integration Objects (C24 iO) Studio is a data modelling, meta-data management, transformation and messaging integration toolkit based on Java binding technology.

C24 iO Studio can be found in production use at more than twenty blue chip financial services customers worldwide.

Major features that result in C24 iO Studio being one of the leading players in its market space are:
  • Graphical message-model construction using typical XSD-like syntax: complex types, simple types and attributes.
  • Graphical transformation construction using source and target messages. Drag-and-drop mapping links between source and target messages, and apply functions on those links to perform updates, enrichment and type conversions on message fields. A large palette of functions is available to transformation designers.
  • Out-of-the-box standards library support, meaning that you can start processing complex financial messages without writing a single line of custom parser code. Furthermore, financial messaging standards can be enforced using C24 iO validation rules, which are also included out-of-the-box. C24 Technologies maintains these standards libraries throughout the year; when new versions of the standards are published, the models are updated and released to the customer base.
  • Rich validation facilities: for standards libraries or custom models, a set of rich validation languages exists that goes well beyond the capabilities of technologies such as the XSD constraint language.
  • Each and every one of the standards-based message models is tested to a degree that would surprise even the most test-oriented developer.

10GEN MONGODB
MongoDB (from "humongous") is a scalable, high-performance, open source NoSQL database. 10gen has an extensive list of customers that use MongoDB across a number of vertical industries, including, but not limited to: SecondMarket, Athena Capital Research, Equilar, SAP, MTV and craigslist.

Key Features:
  • Document-oriented storage - JSON-style documents with dynamic schemas.
  • Full Index Support - Index on any attribute, just like you're used to.
  • Replication & High Availability - Mirror across LANs and WANs for scale & reliability.
  • Auto-Sharding - Scale horizontally without compromising functionality.
  • Querying - Rich, document-based queries.
  • Fast In-Place Updates - Atomic modifiers for contention-free performance.

C24 IO AND MONGODB TECHNICAL SAMPLE

Introduction

This section is a deep technical dive into a sample application that demonstrates one potential mechanism for coupling C24 iO and mongoDB. The sample uses a scenario that's manufactured to resemble a high-level business operation.

Scenario
Before the technical deep dive, it is useful to understand the scenario from a business perspective. The essence of it is that a client of a brokerage firm places orders and the broker then fills those orders. Each client order (NewOrderSingle) may give rise to one or more execution reports (ExecutionReport); orders can be filled with a single execution or with multiple individual executions.


Sample Messages
In order to support that scenario, a set of FIX NewOrderSingle and ExecutionReport messages, generated from a front-office simulator, will be saved into a mongoDB database. All of the messages used in this sample are provided as static data and are contained in two files that can be found in the sample project source (src/main/java/resources).

The ExecutionReports that are loaded represent simulated processing of the collection of NewOrderSingle messages. The core driver for this sample is to demonstrate financial messages being parsed by C24 iO, saved to a mongoDB database and then queried using mongoDB query facilities.

Inbound Message Delivery
In a typical production scenario, messages would be received as raw FIX via a JMS queue or a file reader; they then undergo a series of operations:
  1. Message Parsing - C24 iO binds each message to a C24 Java FIX object; this code is provided by the C24 FIX libraries and no custom code is required.
  2. Message Validation - Once parsed (bound), the message is validated to ensure that it is semantically correct.
  3. Message Transformation - Following validation, the message is converted from a C24 Java object to a mongoDB object.
  4. Message Persistence - The mongoDB object is then saved to mongoDB.

Sample Project Distribution
The sample project has been distributed in two forms; the source is available on Github at https://github.com/C24-Technologies/c24-sample-mongo-trading. The first distribution form is for Internet-enabled environments: cloning the Github project and running the usual 'mvn clean test' will download dependencies, compile all of the application code and run the integration test classes. The second distribution form provides the sample project as a package that can be run in a non-Internet-enabled environment. The package is distributed as a zip file that needs to be unpacked and run using the supplied ant build file or shell script. The ant build file contains several interesting targets: clean, compile, createNewOrders and createExecutionReports. Each of these targets needs to be run in turn in order to populate the database with the data necessary for the queries to be executed. The default target invokes all targets in the correct order automatically, so running 'ant' in the root directory of the project will complete the task. Running the shell script './run.sh' will achieve the same result.

Service Dependencies
Whichever distribution is executed, a mongoDB database must be running and available for service. The directory src/main/java/resources contains a database configuration file named mongoDB.properties. Connection parameters for the mongoDB database that you plan to use need to be configured in that file. Although you would always use authentication credentials in a production system, none are necessary for this sample. All that is required is the server (host) name, database name and port number.
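
A minimal mongoDB.properties along the following lines is sufficient; the exact property key names used by the sample are assumptions here:

```properties
# Hypothetical contents of mongoDB.properties - key names are assumptions.
mongodb.server=localhost
mongodb.port=27017
mongodb.database=c24-mongodb-sample
```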

Spring Framework C24 iO Configuration Classes
As the Spring Framework is a popular IoC offering, the C24 iO objects used in this sample are configured using Spring configuration classes. The class biz.c24.io.mongodb.fix.configuration.C24Configuration contains all of the C24 beans plus a Spring PropertyPlaceholderConfigurer (not shown) that provides access to an external property file containing the database details. Within this configuration class are several key bean creation methods; the use of these beans is discussed in the sections below.


Spring Framework mongoDB Configuration Classes
The Spring Framework creates all of the mongoDB Java objects used in this application during container instantiation. The configuration class is very simple and is as follows.
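
The original listing is not reproduced in this text, so the following is a hedged reconstruction based on the description below; the bean names follow the prose and the property keys are assumptions:

```java
import java.net.UnknownHostException;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoTemplate;

import com.mongodb.Mongo;

@Configuration
public class MongoDbConfiguration {

    // Values supplied by the PropertyPlaceholderConfigurer declared in C24Configuration.
    @Value("${mongodb.server}")   private String server;
    @Value("${mongodb.port}")     private int port;
    @Value("${mongodb.database}") private String database;

    @Bean
    public Mongo getMongoDB() throws UnknownHostException {
        return new Mongo(server, port); // connection to the mongoDB database
    }

    @Bean
    public MongoTemplate mongoDbTemplate() throws UnknownHostException {
        return new MongoTemplate(getMongoDB(), database);
    }
}
```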



The key mongoDB properties (database, port and server) have been loaded by the Spring PropertyPlaceholderConfigurer bean defined in the C24Configuration class. The Spring bean created by getMongoDB() creates a connection to the database, and the mongoDbTemplate bean provides access to the mongoDB database instance through the normal Spring template mechanism.

Running The Project
Inbound Message Delivery
For the purposes of this demonstration the data is loaded into mongoDB via two data loader classes:
  1. biz.c24.io.mongodb.fix.application.NewOrderSingleDataLoader
  2. biz.c24.io.mongodb.fix.application.ExecutionReportDataLoader
These classes load the data from the files in src/main/resources/data-fixture by reading a single line at a time.

Message Parsing
Parsing the String that represents the FIX message requires two classes:
  1. The source parser responsible for parsing the message
  2. The object class to populate
The C24 Parser
Each C24 iO parser extends the abstract class biz.c24.io.api.presentation.Source; the FIX parser class is FIXSource. A single instance of this class is required, so the default Spring bean creation options are used.
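
Within the C24Configuration class, that amounts to a bean definition along these lines (a sketch of the original listing, which is not reproduced here):

```java
@Bean
public FIXSource fixSource() {
    // Default singleton scope: one parser instance for the whole application.
    return new FIXSource();
}
```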


C24 iO biz.c24.io.api.data.Element objects are used to tell the C24 iO parsers (FIXSource in this sample project) which elements the caller wants the parser to extract from the message during parsing. These two element beans will be used to tell the FIXSource parser that the caller wants to receive objects representing a NewOrderSingleMessage and an ExecutionReportMessage.
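
The element beans might be declared as follows (a sketch; the generated element class names follow the usual C24 iO binding convention and are assumptions here):

```java
@Bean
public Element newOrderSingleElement() {
    // Element describing the FIX NewOrderSingle message type (class name assumed).
    return NewOrderSingleElement.getInstance();
}

@Bean
public Element executionReportElement() {
    // Element describing the FIX ExecutionReport message type (class name assumed).
    return ExecutionReportElement.getInstance();
}
```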


Two utility beans perform the role of using C24 iO code to parse and validate raw FIX messages; this is boilerplate code and so has been captured within a Spring-like C24 template class (C24ParseTemplate).
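
Wiring the two template beans together might look like this (a sketch; the constructor signature of C24ParseTemplateImpl is an assumption):

```java
@Bean
public C24ParseTemplate newOrderSingleParseTemplate() {
    return new C24ParseTemplateImpl(fixSource(), newOrderSingleElement());
}

@Bean
public C24ParseTemplate executionReportParseTemplate() {
    return new C24ParseTemplateImpl(fixSource(), executionReportElement());
}
```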



Message Validation
C24 iO validation rules are defined on each message model, although they can of course be shared or re-used. Validation rules are invoked through a validation manager. Again, a single instance is required, which gives rise to the following configuration.
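
That configuration is a single bean definition (a sketch, using the ValidationManager class from the C24 iO API):

```java
@Bean
public ValidationManager validationManager() {
    // Single shared instance used to apply each model's validation rules.
    return new ValidationManager();
}
```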


Message Transformation
Once a message has been parsed into a C24 Java ComplexDataObject, it is transformed into a mongoDB object prior to being persisted. See the method asMongoDBObject() in the class biz.c24.io.mongodb.fix.application.C24ParseTemplateImpl.

Message Persistence
Messages are persisted through use of Spring’s MongoTemplate.
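
At its simplest, that is a single template call (shown here as a sketch, using the collection name from later in the sample):

```java
// Save the converted document into the NewOrderSingles collection;
// the collection is created automatically if it does not yet exist.
mongoDbTemplate.save(mongoDbObject, "NewOrderSingles");
```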

Application Execution
Application code can be invoked in two different ways, firstly through an application launcher and secondly through an integration test. This section will follow application launcher code through the new order creation process.

The application invocation sequence is as follows:
  • Through the createNewOrders() method, the CreateNewOrderSingle class loads the Spring container through specification of a context loader. The only configuration that needs to be loaded explicitly is the MongoDbConfiguration class; C24Configuration is loaded as a direct dependency via an @Import({C24Configuration.class}) statement. The Spring context loader reads the two Spring @Configuration classes and creates all of the beans that have been defined.
  • The createNewOrders() method gets the mongoDB template bean and the C24ParseTemplate bean from the Spring container. The same method then loads a file containing a sample set of FIX NewOrderSingle messages from a source located on the project classpath. The method is now set up for work; a sketch of the remaining flow follows below.
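
The original code listing appeared here as an image and is not reproduced; the following is a hedged reconstruction of the flow described in these bullets, with the fixture file name and bean lookups as assumptions:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.data.mongodb.core.MongoTemplate;

import com.mongodb.DBObject;

import biz.c24.io.api.data.ComplexDataObject;
import biz.c24.io.api.data.ValidationManager;

public class CreateNewOrderSingle {

    public void createNewOrders() throws Exception {
        // Load the Spring container; C24Configuration is pulled in by the
        // @Import({C24Configuration.class}) statement on MongoDbConfiguration.
        ApplicationContext context =
                new AnnotationConfigApplicationContext(MongoDbConfiguration.class);

        MongoTemplate mongoDbTemplate = context.getBean(MongoTemplate.class);
        C24ParseTemplate parseTemplate =
                context.getBean("newOrderSingleParseTemplate", C24ParseTemplate.class);
        ValidationManager validationManager = context.getBean(ValidationManager.class);

        // One raw FIX NewOrderSingle message per line (fixture file name assumed).
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                getClass().getResourceAsStream("/data-fixture/new-order-singles.txt")));

        String rawFixMessage;
        while ((rawFixMessage = reader.readLine()) != null) {
            ComplexDataObject order = parseTemplate.bind(rawFixMessage);   // parse
            validationManager.validateByException(order);                  // validate
            DBObject mongoDbObject = parseTemplate.asMongoDBObject(order); // transform
            mongoDbTemplate.save(mongoDbObject, "NewOrderSingles");        // persist
        }
        reader.close();
    }
}
```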

  • The file containing the FIX messages holds multiple messages, one per line. The C24ParseTemplate bean is called to parse each raw FIX message into a C24 iO Java object; this is the bind() method invocation.
  • The validation manager validates that each message is semantically correct.
  • The C24ParseTemplate converts the C24 iO object into a mongoDB object.
  • The mongoDB template is invoked with two parameters: the mongoDB object and the name of the collection to which the new document object should be added.



The interesting sections in this last code snippet are the code that parses the raw FIX message and the code that writes it into the mongoDB database. The C24ParseTemplate code takes a raw string message, performs some basic checks and parses that message into a C24 iO Java object. There are two key methods in this class: the one that binds the string to the Java object and the one that converts the Java object into a mongoDB object.

The bind(...) method shows the simplicity of using C24 iO code to parse a raw string into a C24 iO Java object. The parser (or source) has a reader set on it, and the reader supplies the raw message as a string. At the moment that readObject(...) is called, the parser looks for an instance of the element in the string-format message and passes it back to the caller as a C24 iO Java object. Note that all C24 iO message objects extend the class ComplexDataObject; this provides the significant benefit that all C24 iO messages can be handled as a single type.


The other interesting method in this class is asMongoDBObject(...), which takes a C24 iO Java object (ComplexDataObject) and converts it to a mongoDB document object. C24 iO Java objects can be emitted in any format that is supported for a message: JSON, XML, CSV, Java source and so on. The C24ParseTemplate class also contains an example of how you could format any C24 iO Java object as XML - no transformation is necessary, this is merely a formatting option.
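
Putting the two methods together, C24ParseTemplateImpl might look broadly like the following sketch. The JSON-emission route and the JSONSink class name are assumptions based on the output formats listed above; the real class in the sample project may differ:

```java
import java.io.StringReader;
import java.io.StringWriter;

import com.mongodb.DBObject;
import com.mongodb.util.JSON;

import biz.c24.io.api.data.ComplexDataObject;
import biz.c24.io.api.data.Element;
import biz.c24.io.api.presentation.Source;

// Sketch of the template; in the sample project it implements C24ParseTemplate.
public class C24ParseTemplateImpl {

    private final Source source;   // e.g. the FIXSource bean
    private final Element element; // e.g. the NewOrderSingle element bean

    public C24ParseTemplateImpl(Source source, Element element) {
        this.source = source;
        this.element = element;
    }

    public ComplexDataObject bind(String rawMessage) throws Exception {
        // Set a reader supplying the raw message, then ask the parser to
        // extract an instance of the configured element from it.
        source.setReader(new StringReader(rawMessage));
        return source.readObject(element);
    }

    public DBObject asMongoDBObject(ComplexDataObject cdo) throws Exception {
        // Emit the C24 object as JSON, then hand the text to the mongoDB
        // driver's JSON parser; the JSONSink class name is an assumption.
        StringWriter writer = new StringWriter();
        new biz.c24.io.api.presentation.JSONSink(writer).writeObject(cdo);
        return (DBObject) JSON.parse(writer.toString());
    }
}
```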

The save() method of the mongoDB template is compiled code, distributed as part of Spring's mongoDB support library. The MongoTemplate contains all of the methods that you typically need for inserting, updating, deleting and querying a mongoDB database and its collections. The method call in this example accepts a mongoDB object and also the name of a collection; if the collection does not exist, it will be created on behalf of the caller. Once the save(...) method has been called, the FIX documents are present in the database, and a simple mongoDB findOne() query on the NewOrderSingles collection confirms this.
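
The same check can be run from Java (a sketch; the mongo shell's db.NewOrderSingles.findOne() achieves the same):

```java
// Fetch a single stored document back from the NewOrderSingles collection.
DBObject first = mongoDbTemplate.getCollection("NewOrderSingles").findOne();
System.out.println(first);
```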



Retrieving results through a simple query is useful for checking that data has been inserted whilst developing an application. However, when it comes to accessing production data, the task will be approached with different goals in mind.

SUMMARY
  • This article began by exploring the typical challenges for messaging toolkits and persistence mechanisms in today's software market, considering the necessity for low-cost, high-performance, scalable, robust and agile tools.
  • Two key enterprise technologies were introduced, C24 iO Studio and mongoDB, which together form a powerful partnership within Message Driven Architectures: a fully featured messaging toolkit and document-oriented data storage.
  • Key features of each product were explored, along with the driving forces that led to their existence in today's software market.
  • An example implementation using both technologies has been created, presented in this paper and distributed in two forms: for Internet-enabled and non-Internet-enabled environments.



C24 INTEGRATION OBJECTS (IO)
To learn more about C24 Technologies and C24 Integration Objects including datasheets, customer successes and reference implementations, please visit www.c24.biz.

MONGODB
To learn more about 10gen and mongoDB, please visit www.10gen.com and www.mongodb.org.
