
Software Pipelines in the Real World: Two SOA Performance Case Studies


Performance of service-oriented business applications is a constant concern and one that has led architects and developers to investigate a variety of architectural design options. This, the second in a series of articles by Cory Isaacson, explores one such option in detail by showing how the software pipelines technology and methodology can be leveraged in real world scenarios. Two case study examples are provided, each utilizing software pipelines in a different business context and each exploring different aspects of the software pipelines technology platform. This article further contrasts this approach to SOA performance optimization with traditional methods.

Introduction

The first article in this series (entitled “High Performance SOA with Software Pipelines”) demonstrated that traditional approaches to concurrent processing are not always a good fit for service-oriented applications because they do not account for key requirements such as order-of-execution, transaction priorities, or the ability to quickly reallocate resources to handle unpredictable workloads.

This article continues the discussion of software pipelines as an alternative design approach, by providing specific implementation examples. These examples shed some light on how software pipelines work in different business contexts, and how their use can affect service performance and throughput. The examples also highlight the manner in which software pipelines can change the way developers handle the distribution and prioritization of transactions across multiple hardware systems or multiple processor cores.

First we’ll look at a hypothetical example involving a network of automated teller machines (ATMs). This is followed by a customer case study that illustrates specific customer benefits. Both examples are presented with the implied context that the discussed environments are part of a service-oriented enterprise.



Banking ATM Case Study

Let's consider a simple example for processing transactions from a distributed network of banking automated teller machines. The ATMs must access a centralized back-end computing resource to process account-related transactions. The centralized computing facility for this application is an ideal situation for concurrent software pipelines because transaction volume is highly variable, response times are critical, and enforcement of key business rules is essential.

The business requirements for this example include:

Ensure that each account transaction is performed by an authorized individual.
Ensure that the transaction is valid (e.g., there are enough funds in an account to support a requested withdrawal).
Ensure that multiple transactions on a given single account are guaranteed to be performed sequentially (i.e., FIFO is mandatory, preventing a customer from overdrawing an account via near-simultaneous transactions).
If this central processing system were implemented using a traditional monolithic design, it would have a single centralized software application and all functions of that application would be tightly coupled. In other words, each transaction would be processed in a sequential order through all of the functions, and the functions would be self-contained within the application as shown in Figure 1.


Figure 1: A monolithic software application processes all transactions sequentially.
The simplicity of this design has several benefits:
It is very easy to implement.
All business rules are contained in a single set of code.
The sequence of transactions is guaranteed. (It is impossible for a subsequent withdrawal to happen before the debit of the first withdrawal has reduced the account balance.)
However, it is also obvious that such a design results in every user transaction having to wait for any previous transactions to complete. If volume scales dramatically, as in peak periods, and the input flow outstrips the capacity of this single software component to handle the load, a lot of customers will be waiting for transactions to process. All too often, customers who are kept waiting won’t be customers for long, an intolerable condition for a successful bank.

Using software pipelines, the processing task can be divided into logical units of work for concurrent processing. The first step in any pipeline analysis is therefore to decompose the process into the discrete steps that are required for processing. For this simple application, the steps of the business process are shown in Figure 2.


Figure 2: The first step in pipeline analysis is to decompose the process into discrete steps.
The steps are:
1. Authenticate the user.
2. Ensure the transaction is valid (e.g., if the transaction is a withdrawal, ensure sufficient funds exist to handle the transaction).
3. Update the ATM daily account record.
For our simple example, it is safe to perform the initial authentication of users in a separate pipeline. A different system can perform this work, and once the authentication returns, the remainder of the task can be handled. In fact, because we are not concerned with ordering at this stage, it is safe to have multiple pipelines for this single task as shown in Figure 3. Our interest is simply to process as many authentications as possible per unit of time, regardless of order.


Figure 3: Authentication requests can be handled asynchronously in a separate pipeline pool.
While this option can potentially speed performance, the bulk of the work (ATM account updates) is still a serial process. Because these steps are downstream from the authentication step, bottlenecks will occur unless the downstream processes can handle the same transaction load. To achieve a dramatic performance improvement, we must further analyze the process to determine other potential optimization points.
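
The authentication pool described above can be sketched with a thread pool. This is a minimal illustration, not the software pipelines product API: the `VALID_PINS` store, the request fields, and the function names are all hypothetical. It shows that authentication results can be collected in whatever order they complete, because ordering does not matter at this stage.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical credential store; a real system would call an authentication service.
VALID_PINS = {"acct-1001": "1234", "acct-2002": "9876"}

def authenticate(request):
    """Return (request, True) when the supplied PIN matches the stored one."""
    account, pin = request["account"], request["pin"]
    return request, VALID_PINS.get(account) == pin

def authenticate_all(requests, workers=4):
    """Authenticate requests concurrently; completion order is not guaranteed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(authenticate, r) for r in requests]
        return [f.result() for f in as_completed(futures)]

requests = [
    {"account": "acct-1001", "pin": "1234"},
    {"account": "acct-2002", "pin": "0000"},   # wrong PIN, will be refused
    {"account": "acct-1001", "pin": "1234"},
]
results = authenticate_all(requests)
authorized = [req for req, ok in results if ok]
```

Only the authorized requests would be passed on to the downstream validation and update steps.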

Once authentication is complete, the next step is to ensure that the requested transaction is valid. This requires evaluating the current account information to ensure that the transaction will not overdraw the account. It is possible to perform multiple validation transactions simultaneously without violating our key business requirement. The primary requirement is that no two transactions for the same account are performed out of sequence (or simultaneously).

This FIFO sequence can be a key bottleneck in a monolithic business application because it requires that all transactions be performed using a FIFO methodology. Software pipelines, on the other hand, enable developers to ensure the FIFO sequence while implementing a concurrent processing solution. The key is to establish multiple software pipelines, each responsible for processing a certain segment of the incoming transactions.

A pipelines distributor can be used to sort transactions based on their content so that all transactions for a given account number will go through the same pipeline. This approach enables each pipeline to maintain the FIFO order for the accounts that it services, yet to process transactions for only a portion of the entire list of accounts. By distributing the process in this way, we can process many transactions concurrently while guaranteeing a proper FIFO sequence within each pipeline. This guarantees that no two transactions for the same account can be performed out of sequence (or simultaneously).
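
A content-based distributor of this kind can be sketched as follows. The `Pipeline` and `Distributor` classes, the modulo routing rule, and the transaction fields are assumptions made for illustration; the article does not specify how the distributor maps accounts to pipelines. Each pipeline is a single worker thread with its own FIFO queue, so transactions for one account are never reordered or run simultaneously.

```python
import queue
import threading

class Pipeline(threading.Thread):
    """A single worker that processes its transactions strictly in FIFO order."""

    def __init__(self, name, handler):
        super().__init__(name=name, daemon=True)
        self.inbox = queue.Queue()
        self.handler = handler

    def run(self):
        while True:
            txn = self.inbox.get()
            if txn is None:          # sentinel: no more work
                break
            self.handler(self.name, txn)

class Distributor:
    """Routes every transaction for a given account to the same pipeline."""

    def __init__(self, pipelines):
        self.pipelines = pipelines

    def route(self, txn):
        # Assumed routing rule: account number modulo the pipeline count.
        index = txn["account"] % len(self.pipelines)
        self.pipelines[index].inbox.put(txn)

    def shutdown(self):
        for p in self.pipelines:
            p.inbox.put(None)
        for p in self.pipelines:
            p.join()

processed = {}                       # account -> sequence numbers in processing order
lock = threading.Lock()

def record(pipeline_name, txn):
    with lock:
        processed.setdefault(txn["account"], []).append(txn["seq"])

pipelines = [Pipeline(f"pipeline-{i}", record) for i in range(3)]
for p in pipelines:
    p.start()

distributor = Distributor(pipelines)
for seq in range(30):
    distributor.route({"account": 1000 + seq % 5, "seq": seq})
distributor.shutdown()
```

After shutdown, every list in `processed` is in ascending sequence order, even though the three pipelines ran concurrently.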

In this ATM example, our design includes a pipeline for each branch of the bank as shown in Figure 4. Each branch controls a subset of individual accounts, so each subset is allocated to a specific pipeline. Transactions are sorted based on their content (account number/branch ID) and delivered to the appropriate software pipelines. A pipeline distributor is used to perform this function. It reads all transactions and routes each transaction to the pipeline configured for that branch ID (e.g. Branch_1, Branch_2, etc.). A single pipelines pool is used to sequentially perform both the “validate” and “update” functions for each transaction received.


Figure 4: A separate pipeline pool can be used to enable concurrent processing of transactions.
The pipeline distributor evaluates the branch ID of each account number for each transaction, and routes the message to the appropriate branch pipeline for processing. Because each branch pipeline is configured for a FIFO order, it sequentially handles the transactions delegated to it, first validating the transaction, and then updating the ATM daily account record if the transaction is valid.
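
The branch-level flow can be sketched as below, under stated assumptions: the account data, the `validate`, `update`, and `distribute` helpers, and the use of a single-worker executor as a stand-in for a FIFO branch pipeline are all illustrative choices rather than the article's implementation. The example sends two near-simultaneous withdrawals against the same account to show why validation and update must run as one sequential unit.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: account -> branch ID and current balance.
accounts = {
    "A-100": {"branch": "Branch_1", "balance": 100},
    "B-200": {"branch": "Branch_2", "balance": 50},
}
daily_record = []   # stand-in for the ATM daily account record

def validate(txn):
    """A withdrawal is valid only if the account holds enough funds."""
    return accounts[txn["account"]]["balance"] >= txn["amount"]

def update(txn):
    accounts[txn["account"]]["balance"] -= txn["amount"]
    daily_record.append((txn["account"], txn["amount"]))

def process(txn):
    """Validate, then update, as one sequential unit inside a branch pipeline."""
    if validate(txn):
        update(txn)
        return "accepted"
    return "rejected"

# One single-worker executor per branch: tasks run in submission (FIFO) order.
branch_pipelines = {
    branch: ThreadPoolExecutor(max_workers=1)
    for branch in ("Branch_1", "Branch_2")
}

def distribute(txn):
    """Route the transaction to the pipeline of the branch that owns the account."""
    branch = accounts[txn["account"]]["branch"]
    return branch_pipelines[branch].submit(process, txn)

withdrawals = [
    {"account": "A-100", "amount": 80},
    {"account": "A-100", "amount": 80},   # near-simultaneous second withdrawal
    {"account": "B-200", "amount": 30},
]
futures = [distribute(t) for t in withdrawals]
outcomes = [f.result() for f in futures]
for pipeline in branch_pipelines.values():
    pipeline.shutdown()
```

Because both A-100 withdrawals pass through the same FIFO pipeline, the second one sees the reduced balance and is rejected, while the Branch_2 withdrawal proceeds independently.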

This example shows the processing of many branches in parallel, an approach that can result in a far greater number of completed transactions per unit of time. By distributing the load across multiple software pipelines, we can assign the appropriate hardware resources to each pipeline to enable linear or near-linear scalability as more pipelines are added.

If even greater scalability is needed for a specific portion of the process, a secondary pipelines pool can be added along with more hardware resources to boost performance in that area. For example, a very large branch of the bank may have in excess of 100,000 accounts. The peak volume of transactions for this large branch may not be sustained by this design. The answer is to create additional downstream pipelines, now dividing transactions by a range of account numbers (Account_1000_1999, Account_2000_2999, etc.) as shown in Figure 5.


Figure 5: Additional pipeline pools can be added to extend scalability at any point in the design.
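
The second level of distribution can be sketched as a routing function. The pipeline names follow the `Account_1000_1999` pattern used above; the fixed range width of 1,000, the `large_branches` set, and the function names are assumptions for illustration only.

```python
def range_pipeline(account_number, width=1000):
    """Name of the downstream pipeline serving this account's number range."""
    low = (account_number // width) * width
    return f"Account_{low}_{low + width - 1}"

def route(txn, large_branches=frozenset({"Branch_1"})):
    """Two-level routing: by branch first, then by account range for large branches."""
    branch = txn["branch"]
    if branch in large_branches:
        return branch, range_pipeline(txn["account"])
    return branch, None

first = route({"branch": "Branch_1", "account": 1500})
second = route({"branch": "Branch_1", "account": 2999})
small = route({"branch": "Branch_2", "account": 1500})
```

All transactions for one account still land in a single pipeline, so the FIFO guarantee holds at the second level as well.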


Customer Relationship Management Case Study

A large customer relationship management (CRM) outsourcer faced a major challenge: cutting IT costs while also developing a scalable platform that would support significant business expansion. The company is an outsourcing provider for a variety of customer relationship services including outbound marketing, customer data management, and inbound call management. As a result, they must process a very large amount of client data every day.

The existing solution was based on a 48 processor UNIX server that represented an investment of more than $1 million in hardware. This system was considered too costly for the size of the organization and would be even more expensive to scale in response to expected future demands. Another challenge with the existing solution was that it relied on a centralized database, which was becoming a performance bottleneck. The new environment would need to offer cost-effective scalability while also reducing contention on the central database.



Proof-of-concept Solution

The company decided to implement a proof-of-concept solution using software pipelines to determine how effective this design approach could be in addressing the two seemingly contradictory requirements. They selected a portion of their automotive dealer processing system to be implemented using software pipelines and concurrent processing with 10 Intel dual-processor (quad-core) servers for the deployment.

Software pipelines were used to distribute transactions based on a dealer ID with one pipeline for identifying vehicles that were in need of maintenance and another pipeline for processing repair orders. Figure 6 shows that the software pipelines design involved two pipelines for each dealer and a single database.


Figure 6: Software pipelines provide distributed and concurrent processing of dealer transactions.
To further enhance performance, each dealer's records were read into memory and results were written to the database only when all processing of data for a specific dealer was complete. This significantly lowered database contention and enabled further gains in performance throughput. The reduction in database usage also provided an opportunity to reduce costs in terms of database server hardware and database software license costs.
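
The write-once-per-dealer pattern can be sketched as follows. This is an illustration only: the in-memory SQLite database, the `maintenance` table, the 30,000-mile rule, and the record layout are hypothetical stand-ins, since the article does not describe the customer's schema or business rules.

```python
import sqlite3

def process_dealer(conn, dealer_id, records):
    """Process all of a dealer's records in memory, then write the results once."""
    results = []
    for vehicle, mileage in records:
        # Hypothetical rule: flag vehicles at or beyond 30,000 miles.
        results.append((dealer_id, vehicle, int(mileage >= 30000)))
    with conn:   # a single transaction per dealer, committed on success
        conn.executemany(
            "INSERT INTO maintenance (dealer_id, vehicle, due) VALUES (?, ?, ?)",
            results,
        )
    return len(results)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maintenance (dealer_id TEXT, vehicle TEXT, due INTEGER)")
written = process_dealer(conn, "dealer-7", [("VIN-1", 12000), ("VIN-2", 45000)])
due_count = conn.execute("SELECT SUM(due) FROM maintenance").fetchone()[0]
```

The database is touched once per dealer rather than once per record, which is the source of the reduced contention described above.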



Performance Improvements

The performance test used a data set of 150 records per dealer and simulated multiple dealers running concurrently. Figure 7 shows that, with 40 concurrent dealers, the software pipelines solution was five times faster than the existing dealer processing application. The existing application required nearly 22 hours to process the 6,000 records for 40 dealers, whereas the software pipelines solution processed the same 6,000 records in slightly less than 4 hours.


Figure 7: Performance of the software pipelines solution was five times faster than the existing solution.
Table 1 provides the detailed results of the tests. For the single dealer test that involved only 150 records, the software pipelines solution delivered an average time per record of 2.91 seconds, nearly twice as fast as the 5.38 seconds for the existing solution. As more concurrent dealers were added, the software pipelines solution was able to maintain an average time per record of less than 2.5 seconds.

The existing solution, however, did not scale as well. Its average time per record increased significantly when the number of concurrent dealers was increased from 20 to 40 dealers. The steep slope in the top line in Figure 7 highlights the scalability problem in the existing dealer application. With a total runtime of 78,533 seconds (nearly 22 hours), the existing solution took five times longer than the 14,216 seconds (just under 4 hours) for the software pipelines solution.
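
The headline figures can be checked with a few lines of arithmetic. The inputs are the numbers quoted above; the per-record averages assume that each total runtime covers all 6,000 records of the 40-dealer test, which is an interpretation rather than something the article states.

```python
RECORDS_PER_DEALER = 150
DEALERS = 40
EXISTING_SECONDS = 78_533     # existing solution, 40 concurrent dealers
PIPELINES_SECONDS = 14_216    # software pipelines solution, 40 concurrent dealers

total_records = RECORDS_PER_DEALER * DEALERS
speedup = EXISTING_SECONDS / PIPELINES_SECONDS

# Assumed interpretation: total runtime divided evenly across all records.
existing_per_record = EXISTING_SECONDS / total_records
pipelines_per_record = PIPELINES_SECONDS / total_records

existing_hours = EXISTING_SECONDS / 3600
pipelines_hours = PIPELINES_SECONDS / 3600

print(f"records processed:    {total_records}")
print(f"speedup:              {speedup:.2f}x")
print(f"existing per record:  {existing_per_record:.2f} s")
print(f"pipelines per record: {pipelines_per_record:.2f} s")
print(f"runtimes:             {existing_hours:.1f} h vs {pipelines_hours:.1f} h")
```

Under that assumption the speedup works out to roughly 5.5, and the pipelines average stays below the 2.5 seconds per record cited above.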


Table 1: Software pipelines offered consistent processing times even as the number of records grew large.


Conclusion

Software pipelines offer a simple way for business applications to implement concurrency while maintaining order of execution priorities and simplicity of application development. The examples in this article illustrated the following application of this technology:
Decompose the application process into discrete steps.
Determine what steps can be carried out concurrently based on business rules (e.g. no transactions for the same account can be performed out of order).
Establish a pipelines design that distributes the total pool of transactions across several pipelines while grouping together in a single pipeline all transactions that are dependent on each other in terms of execution sequence.
Balance the flow of transactions by adding extra pipelines pools, distributors and/or hardware resources wherever there is a bottleneck.

In modern service-oriented solutions, design optimization is a paramount consideration. Maximizing the throughput and overall efficiency of message exchanges is a prime design concern that deserves an investigation of all viable technologies and architectural options. This article focused on software pipelines as one of those options, capable of addressing performance issues by distributing processing loads across multiple pipelines in support of linear or near-linear scalability.

This article was originally published in The SOA Magazine (www.soamag.com), a publication officially associated with "The Prentice Hall Service-Oriented Computing Series from Thomas Erl" (www.soabooks.com). Copyright ©SOA Systems Inc. (www.soasystems.com)
