Software Pipelines in the Real World: Two SOA Performance Case Studies
Performance of service-oriented business applications is a constant concern and one that has led architects and developers to investigate a variety of architectural design options. This, the second in a series of articles by Cory Isaacson, explores one such option in detail by showing how the software pipelines technology and methodology can be leveraged in real world scenarios. Two case study examples are provided, each utilizing software pipelines in a different business context and each exploring different aspects of the software pipelines technology platform. This article further contrasts this approach to SOA performance optimization with traditional methods.
The first article in this series (entitled “High Performance SOA with Software Pipelines”) demonstrated that traditional approaches to concurrent processing are not always a good fit for service-oriented applications because they do not account for key requirements such as order-of-execution, transaction priorities, or the ability to quickly reallocate resources to handle unpredictable workloads.
This article continues the discussion of software pipelines as an alternative design approach by providing specific implementation examples. These examples show how software pipelines work in different business contexts and how their use can affect service performance and throughput. The examples also highlight how software pipelines can change the way developers handle the distribution and prioritization of transactions across multiple hardware systems or multiple processor cores.
First we’ll look at a hypothetical example involving a network of automated teller machines (ATMs). This is followed by a customer case study that illustrates specific customer benefits. Both examples are presented with the implied context that the discussed environments are part of a service-oriented enterprise.
Banking ATM Case Study
Let's consider a simple example of processing transactions from a distributed network of bank automated teller machines. The ATMs must access a centralized back-end computing resource to process account-related transactions. This centralized computing facility is an ideal candidate for concurrent software pipelines because transaction volume is highly variable, response times are critical, and key business rules must be strictly enforced.
The business requirements for this example include:
|•||Ensure that each account transaction is performed by an authorized individual.|
|•||Ensure that the transaction is valid (e.g., there are enough funds in an account to support a requested withdrawal).|
|•||Ensure that multiple transactions on a single account are performed sequentially (i.e., FIFO is mandatory, preventing a customer from overdrawing an account via near-simultaneous transactions).|
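Figure 1: A monolithic software application processes all transactions sequentially.
This monolithic approach has several advantages:
|•||It is very easy to implement.|
|•||All business rules are contained in a single set of code.|
|•||The sequence of transactions is guaranteed. (It is impossible for a subsequent withdrawal to happen before the debit of the first withdrawal has reduced the account balance.)|
Its drawback is that every transaction, for every account, must wait its turn in a single sequential stream, which limits throughput.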
Using software pipelines, the processing task can be divided into logical units of work for concurrent processing. The first step in any pipeline analysis is to decompose the process into the discrete steps it requires. For this simple application, the steps of the business process are shown in Figure 2.
Figure 2: The first step in pipeline analysis is to decompose the process into discrete steps.
|1.||Authenticate the user.|
|2.||Ensure the transaction is valid (e.g., if the transaction is a withdrawal, ensure sufficient funds exist to handle the transaction).|
|3.||Update the ATM daily account record.|
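As a minimal sketch, assuming a hypothetical in-memory `accounts` dictionary and a trivial PIN check (neither comes from the article), the three steps can be written as separate functions. The monolithic design of Figure 1 then simply calls them in order for every transaction:

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    account: str
    pin: str
    amount: int  # negative for a withdrawal, positive for a deposit

# Hypothetical stand-ins for the bank's back-end data (not from the article).
PINS = {"1001": "1234", "2001": "4321"}
accounts = {"1001": 100, "2001": 50}

def authenticate(tx: Transaction) -> bool:
    """Step 1: confirm the requester is authorized for this account."""
    return PINS.get(tx.account) == tx.pin

def validate(tx: Transaction) -> bool:
    """Step 2: reject a withdrawal that would overdraw the account."""
    return accounts[tx.account] + tx.amount >= 0

def update(tx: Transaction) -> None:
    """Step 3: apply the transaction to the account record."""
    accounts[tx.account] += tx.amount

def process_monolithic(transactions):
    """Figure 1 style: one sequential loop, so FIFO order holds trivially,
    but only one transaction is in progress at any moment."""
    results = []
    for tx in transactions:
        ok = authenticate(tx) and validate(tx)
        if ok:
            update(tx)
        results.append(ok)
    return results
```

Two back-to-back withdrawals of 80 from account 1001 return `[True, False]`: the second one sees the balance already reduced to 20, which is exactly the guarantee the FIFO requirement asks for.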
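Authenticating a user does not depend on the state of any account or on any other request, so authentication can be performed for many requests concurrently and in any order.
Figure 3: Authentication requests can be handled asynchronously in a separate pipeline pool.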
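Because authentication carries no ordering constraint, it can be handed to a pool of interchangeable workers. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` as a stand-in for the separate authentication pipeline pool of Figure 3; the `PINS` store and function names are hypothetical. In CPython, threads pay off here mainly when authentication waits on I/O (for example, a call to a credential service), so treat this as an illustration of structure rather than of speed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical credential store; the article does not specify one.
PINS = {"1001": "1234", "2001": "4321", "3001": "9999"}

def authenticate(request):
    """Check one (account, pin) request. It reads shared data but never
    modifies it, so any number of these calls can run at the same time."""
    account, pin = request
    return PINS.get(account) == pin

def authenticate_pool(requests, workers=4):
    """Run authentication in a separate pool of workers.
    Executor.map yields results in input order, even though the
    individual calls may complete in any order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(authenticate, requests))
```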
Once authentication is complete, the next step is to ensure that the requested transaction is valid. This requires evaluating the current account information to ensure that the transaction will not overdraw the account. It is possible to perform multiple validation transactions simultaneously without violating our key business requirement. The primary requirement is that no two transactions for the same account are performed out of sequence (or simultaneously).
Enforcing this FIFO sequence can become a key bottleneck in a monolithic business application, because the simplest way to guarantee it is to process every transaction, for every account, in a single sequential stream. Software pipelines, on the other hand, enable developers to ensure the FIFO sequence while implementing a concurrent processing solution. The key is to establish multiple software pipelines, each responsible for processing a certain segment of the incoming transactions.
A pipelines distributor can be used to sort transactions based on their content so that all transactions for a given account number go through the same pipeline. This approach enables each pipeline to maintain the FIFO order for the accounts that it services, while processing transactions for only a portion of the entire list of accounts. By distributing the process in this way, we can process many transactions concurrently while guaranteeing a proper FIFO sequence within each pipeline, so no two transactions for the same account can be performed out of sequence (or simultaneously).
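A minimal sketch of such a content-based distributor follows, assuming Python queues and threads stand in for pipelines. The `PipelineDistributor` class and its methods are hypothetical illustrations, not the API of any pipelines framework. Each pipeline is a FIFO queue drained by exactly one worker thread, and the account number is hashed with `zlib.crc32` so the same account always lands in the same pipeline.

```python
import queue
import threading
import zlib

class PipelineDistributor:
    """Routes each transaction to one of n FIFO pipelines by account number.
    One worker per pipeline keeps per-account order; separate pipelines
    run concurrently."""

    def __init__(self, n_pipelines, handler):
        self.handler = handler
        self.queues = [queue.Queue() for _ in range(n_pipelines)]
        self.workers = [
            threading.Thread(target=self._drain, args=(q,), daemon=True)
            for q in self.queues
        ]
        for w in self.workers:
            w.start()

    def route(self, account):
        # crc32 is stable across runs, unlike Python's salted str hash().
        return zlib.crc32(account.encode()) % len(self.queues)

    def submit(self, account, payload):
        self.queues[self.route(account)].put((account, payload))

    def _drain(self, q):
        while True:
            item = q.get()
            if item is None:  # shutdown sentinel, queued after all real work
                break
            self.handler(*item)

    def close(self):
        """Signal every pipeline to finish its backlog, then wait for it."""
        for q in self.queues:
            q.put(None)
        for w in self.workers:
            w.join()
```

Submitting 100 sequence-numbered transactions spread over 7 accounts into 4 pipelines, then calling `close()`, leaves each account's transactions handled in ascending order, even though the pipelines interleave with one another.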
In this ATM example, our design includes a pipeline for each branch of the bank, as shown in Figure 4. Each branch controls a subset of the individual accounts, so each subset is allocated to a specific pipeline. Transactions are sorted based on their content (account number/branch ID) and delivered to the appropriate software pipelines. A pipeline distributor is used to perform this function. It reads all transactions and routes each transaction to the pipeline configured for that branch ID (e.g., Branch_1, Branch_2, etc.). A single pipelines pool is used to sequentially perform both the “validate” and “update” functions for each transaction received.
Figure 4: A separate pipeline pool can be used to enable concurrent processing of transactions.
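The per-branch design can be sketched as follows, assuming transactions arrive as hypothetical `(branch_id, account, amount)` tuples and balances live in an in-memory dictionary. One thread per branch plays the role of a pipeline and performs validate and then update for each transaction in arrival order.

```python
import threading
from collections import defaultdict

# Hypothetical balances keyed by (branch_id, account_number).
balances = {("1", "1001"): 100, ("1", "1002"): 40, ("2", "2001"): 75}

def pipeline_name(branch_id):
    """Distributor rule: one pipeline per branch, named as in Figure 4."""
    return f"Branch_{branch_id}"

def distribute(transactions):
    """Sort incoming transactions into per-branch lists, keeping
    arrival order within each branch."""
    pipelines = defaultdict(list)
    for tx in transactions:
        pipelines[pipeline_name(tx[0])].append(tx)
    return pipelines

def run_pipeline(txs, results):
    """One pipeline: validate, then update, one transaction at a time.
    Only this thread touches this branch's accounts, so the pipeline
    needs no per-account locks."""
    for branch_id, account, amount in txs:
        key = (branch_id, account)
        ok = balances[key] + amount >= 0  # validate
        if ok:
            balances[key] += amount       # update
        results.append((account, amount, ok))

def process(transactions):
    """Run every branch pipeline concurrently and collect its results."""
    results = defaultdict(list)
    threads = [
        threading.Thread(target=run_pipeline, args=(txs, results[name]))
        for name, txs in distribute(transactions).items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(results)
```

The two branches run side by side, yet within `Branch_1` a second 60-unit withdrawal from account 1001 is still rejected because it is validated only after the first one has been applied.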
This design processes many branches in parallel, which can result in a far greater number of completed transactions per unit of time. By distributing the load across multiple software pipelines, we can assign the appropriate hardware resources to each pipeline to enable linear or near-linear scalability as more pipelines are added.
If even greater scalability is needed for a specific portion of the process, a secondary pipelines pool can be added along with more hardware resources to boost performance in that area. For example, a very large branch of the bank may have in excess of 100,000 accounts, and this design may not be able to sustain that branch's peak transaction volume. The answer is to create additional downstream pipelines that divide the branch's transactions by range of account numbers (Account_1000_1999, Account_2000_2999, etc.), as shown in Figure 5.
Figure 5: Additional pipeline pools can be added to extend scalability at any point in the design.
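The two-level routing rule can be sketched as a small function. The set of "large" branches and the range size of 1,000 accounts are hypothetical parameters chosen to match the pipeline names in Figure 5. Because every account number falls in exactly one range, each account still maps to a single downstream pipeline and keeps its FIFO order.

```python
# Hypothetical configuration: branches big enough to need a secondary pool.
LARGE_BRANCHES = {"7"}
RANGE_SIZE = 1000  # accounts per downstream pipeline, as in Figure 5

def route(branch_id, account_number):
    """Return (primary, secondary) pipeline names for one transaction.
    Small branches use only their branch pipeline; large branches are
    split further by account-number range."""
    primary = f"Branch_{branch_id}"
    if branch_id not in LARGE_BRANCHES:
        return primary, None
    low = (account_number // RANGE_SIZE) * RANGE_SIZE
    return primary, f"Account_{low}_{low + RANGE_SIZE - 1}"
```

For example, `route("7", 1500)` returns `("Branch_7", "Account_1000_1999")`, while `route("3", 1500)` returns `("Branch_3", None)`.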
Customer Relationship Management Case Study
A large customer relationship management (CRM) outsourcer faced a major challenge: it needed to cut IT costs while also developing a scalable platform that would support significant business expansion. The company provides a variety of outsourced customer relationship services, including outbound marketing, customer data management, and inbound call management, and as a result it must process a very large amount of client data every day.
The existing solution was based on a 48-processor UNIX server that represented an investment of more than $1 million in hardware. This system was considered too costly for the size of the organization and would be even more expensive to scale in response to expected future demands. Another challenge with the existing solution was that it relied on a centralized database, which was becoming a performance bottleneck. The new environment would need to offer cost-effective scalability while also reducing contention on the central database.
The company decided to implement a proof-of-concept solution using software pipelines to determine how effective this design approach could be in addressing the two seemingly contradictory requirements. They selected a portion of their automotive dealer processing system to be implemented using software pipelines and concurrent processing with 10 Intel dual-processor (quad-core) servers for the deployment.
Software pipelines were used to distribute transactions by dealer ID, with one pipeline for identifying vehicles in need of maintenance and another for processing repair orders. As Figure 6 shows, the software pipelines design used two pipelines for each dealer and a single database.
Figure 6: Software pipelines provide distributed and concurrent processing of dealer transactions.
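A sketch of the Figure 6 routing rule is shown below. The record format and the type names `"vehicle"` and `"repair_order"` are hypothetical; the article names the purposes of the two pipelines but not their identifiers. Records are keyed first by dealer ID and then by the kind of work, which yields two pipelines per dealer while preserving arrival order within each one. In the case study, all of these pipelines still wrote to a single database.

```python
from collections import defaultdict

# Hypothetical record types mapped to the two per-dealer pipelines.
PIPELINE_FOR_TYPE = {
    "vehicle": "maintenance",   # identify vehicles in need of maintenance
    "repair_order": "repair",   # process repair orders
}

def pipeline_key(record):
    """Route by dealer ID first, then by record type."""
    return (record["dealer_id"], PIPELINE_FOR_TYPE[record["type"]])

def distribute(records):
    """Group records into per-dealer, per-type pipelines in arrival order."""
    pipelines = defaultdict(list)
    for r in records:
        pipelines[pipeline_key(r)].append(r)
    return pipelines
```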
The performance test used a data set of 150 records per dealer and simulated multiple dealers running concurrently. Figure 7 shows that, with 40 concurrent dealers, the software pipelines solution completed its run five times faster than the existing dealer processing application. The existing application required nearly 22 hours to process the 6000 records for 40 dealers, whereas the software pipelines solution processed the same 6000 records in slightly less than 4 hours.
Figure 7: Performance of the software pipelines solution was five times faster than the existing solution.
The existing solution did not scale as well. Its average time per record increased significantly when the number of concurrent dealers was increased from 20 to 40, as the steep slope of the top line in Figure 7 shows. With a total runtime of 78,533 seconds (nearly 22 hours), the existing solution took more than five times longer than the 14,216 seconds (slightly under 4 hours) required by the software pipelines solution.
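The reported figures can be checked with a few lines of arithmetic. This uses only the totals given above (40 dealers, 150 records each, 78,533 and 14,216 seconds) and shows that the "five times" speedup is a rounded-down value of about 5.5.

```python
RECORDS = 40 * 150     # 40 dealers x 150 records each = 6,000 records
existing_s = 78_533    # existing solution, total runtime in seconds
pipelines_s = 14_216   # software pipelines solution, total runtime in seconds

print(f"existing:  {existing_s / 3600:.1f} h, {existing_s / RECORDS:.2f} s/record")
# existing:  21.8 h, 13.09 s/record
print(f"pipelines: {pipelines_s / 3600:.1f} h, {pipelines_s / RECORDS:.2f} s/record")
# pipelines: 3.9 h, 2.37 s/record
print(f"speedup:   {existing_s / pipelines_s:.2f}x")
# speedup:   5.52x
```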
Table 1: Software pipelines offered consistent processing times even as the number of records grew large.
Software pipelines give business applications a straightforward way to implement concurrency while preserving order-of-execution requirements and priorities, without complicating application development. The examples in this article illustrated the following steps for applying this technology:
|•||Decompose the application process into discrete steps.|
|•||Determine what steps can be carried out concurrently based on business rules (e.g. no transactions for the same account can be performed out of order).|
|•||Establish a pipelines design that distributes the total pool of transactions across several pipelines while grouping together in a single pipeline all transactions that are dependent on each other in terms of execution sequence.|
|•||Balance the flow of transactions by adding extra pipelines pools, distributors and/or hardware resources wherever there is a bottleneck.|
In modern service-oriented solutions, design optimization is a paramount consideration. Maximizing the throughput and overall efficiency of message exchanges is a prime design concern that deserves an investigation of all viable technologies and architectural options. This article focused on software pipelines as one of those options, capable of addressing performance issues by distributing processing loads across multiple pipelines in support of linear or near-linear scalability.
This article was originally published in The SOA Magazine (www.soamag.com), a publication officially associated with "The Prentice Hall Service-Oriented Computing Series from Thomas Erl" (www.soabooks.com). Copyright ©SOA Systems Inc. (www.soasystems.com)