Over a million developers have joined DZone.

How to Implement Complex Business Rules with a Powerful Computation Tool

· Java Zone

Learn more about the advantages of moving from a monolithic to microservices architecture.  Brought to you in partnership with IBM.

In an ever-changing business world, people nowadays often find themselves get stuck in various computing problems involving complex business rules. Consider the below example case:

1.  Based on the order details, find the top 10 greatest buyers in each month of 2012.

2.  Average the account balance of those customers who once bought all items.

3.  Compute the month-over-month growth rate of contract value settled by each sales man.

4.  Based on the daily trading statistics of multiple stocks, find the stocks having been rising for consecutive 5 days.

To most people, SQL may be their first idea in solving the complex computation over big data. It is true that SQL implements various typical big data computing with the abundant resources and a wide range of applications. However, when it comes to the complex business rules, SQL is quite inconvenient evidently, owing to the fact that SQL offers no direct support for the step-by-step mechanism, no support for explicit sets, ordered sets, and lacks the object reference.

No direct support for step-by-step computing. Streamlining, simplifying, and stepwise refinement is an important method to solve the complex problems. This best practice is also applicable to the business computing over unusually complex business rules. However, SQL does not provide the direct support for the step-by-step computing. In other words, SQL users have to compose one lengthy statement to solve any complex problem, adding difficulties to the coding design, comprehension, and maintenance.

Still the example case 1, based on the order details, find the top 10 greatest buyers in each month of 2012. The SQL solution is:

select Client from(

  select * from(

  select B.*,row_number() over(partition by month order by SumValue desc)

  rown from(

  select to_char(SellDate,'mm') month,Client,sum(Quantity*Amount) SumValue

   from contract

  where SellDate>=to_date('2012-01-01','yyyy-mm-dd')

  and SellDate<=to_date('2012-12-31','yyyy-mm-dd')

  group by to_char(SellDate,'mm'),Client order by month,client

  ) B


  where rown<=10


group by Client

having count(Client)=(

  select  count(distinct(to_char(SellDate,'mm')))

  from contract

  where SellDate>=to_date('2012-01-01','yyyy-mm-dd')

  and SellDate<=to_date('2012-12-31','yyyy-mm-dd')


As a computing tool outperforming others on handling the complex business rules, the step-by-step computation is expected as a basic function. For the same problem, esPoc can implement it like this:

Cell A1: Filter out the data for year 2012 by date

Cell A2: Group A1 result by month

Cell A3: Compute the monthly sales of each clients, based on the A2 result

Cell A4: Sort clients by sales in reverse order, based on the A3 result

Cell A5: Filter out the top 10 greatest buyers for each month, based on the A4 result

Cell A6: Retrieve the client IDs for each month, based on the A5 result

Cell A7: Find the top 10 buyers in each month by computing the intersection set for each month, based on the A6 result

As can be seen, esProc can split a complex problem into several components to complete in several simple steps. A difficult problem can be solved by simplifying and stepwise refinement.

No support for explicit set. The explicit sets are the variables of explicit type of set. Although SQL supports the algorithm of sets, no such a data type as set in SQL, not to mention the explicit set. Many algorithms must be implemented with special tactics.

The previous example case is also about the operations on set. The way to find the top 10 for each month is just the intersection operation on the top 10 buyers in each month. The esProc code is A6.isect(), representing the intersection - i.e. isect computation – to conduct over the top 10 for each month as what is computed in A6. The solution and business logic are identically matched, while solving that intersection job with SQL solution highly depends on brain-racking and zigzag ideas:

1. Group by month and then count the groups.

2. Group by CLIENT and count the occurrences of each CLIENT.

3. Select out those CLIENTs for which the counts in the 2 is equal to the count in the 1.

Obviously, even the experienced programmers may fail to figure out this cunning method, which is by far the simplest solution as far as we know.

Then, let’s check out another example to locate the contracts meeting the below conditions respectively:

1.  All contracts

2.  Contracts settled in 2012

3.  Ordered over 40 items (great quantity)

4.  Unit price is over 2000 (great unit price)

5.  Meeting the conditions 2 - 4

6.  Excluding those meeting the condition 5

SQL code is shown below:

1.  select SellDate,Quantity,Amount,Client from Contract

2.  select SellDate,Quantity,Amount,Client from Contract where to_char(SellDate,'yyyy')='2012'

3.  select SellDate,Quantity,Amount,Client from Contract where quantity>=40

4.  select SellDate,Quantity,Amount,Client from Contract where AMOUNT>=2000

5.  select SellDate,Quantity,Amount,Client from Contract where to_char(SellDate,'yyyy')='2012'  and quantity>=40 and AMOUNT>=2000

6.  select SellDate,Quantity,Amount,Client from Contract where not(to_char(SellDate,'yyyy')='2012'  and quantity>=40 and AMOUNT>=2000)

Unlike SQL, esProc supports the explicit set by nature, advocating the natural and clear thought of design, simpler coding, and higher readability. So easy can the above problem b solved with esProc, as its impressive code shows below:

Note: The A1-A6 is for the conditions 1-6.

The explicit set can be used in combination with the step-by-step computation. For example, based on the result of example case 1, compute the contract values per month for clients who are not the top 10 buyers. esProc users only need to add one statement: (A4\\A5).(~.sum(SumValue)).

By comparison, SQL users have to redesign the whole solution:

with A as (

  select SellDate,Quantity,Amount,Client from contract

  where SellDate>=to_date('2012-01-01','yyyy-mm-dd')

  and SellDate<=to_date('2012-12-31','yyyy-mm-dd')

),B as(

  select to_char(SellDate,'mm') month,Client,sum(Quantity*Amount) SumValue

  from A group by to_char(SellDate,'mm'),Client order by month,client


,C as(

  select B.*,row_number() over(partition by month order by SumValue desc)rown

  from B

),D as(

  select * from C where rown<=10


select sum(sumValue) monthValue from(

  select * from C


  select * from D

)group by month order by month

Lack of object reference. As we know, a group of relevant data can be regarded as an object. So, we can take advantage of object reference to present the data relation intuitively and straightforwardly. By doing this, the difficulty of association-related computing can be reduced untimely. The complexity of SQL majorly owes to its lack of support for the object reference. Programmers always feel tough to figure out an intuitive way to present the associating relations between data intuitively. As demonstrated in the example case 2, to find the Customer who once bought all items and average their respective account balances, the way of solving the problem can be:

1.  Find the respective orders corresponding to each client, and locate the items in the order details by referring to orders.

2.  Compute the distinct value for each item, then count the distinct to have the length of purchased item list for each client

3.  By simply comparing the lengths of purchased item list and the standard item list, we can find the clients meeting these conditions.

SQL solution is as follows:

with A as(

  select orders.order_id,orders.customer_id,orderdetails.item_id

  from orders left join orderdetails

  on orders.order_id=orderdetails.order_id

),B as(

  select customer_id,count(distinct(item_id))ItemCount

  from A group by customer_id

  having count(distinct(item_id))=(select count(item_id) from item)


select avg(customers.acct_balance) from B left join customers

on B.customer_id=customers.customer_id

As can be seen above, even the not-so-lengthy SQL statement above is hard to comprehend considering the several joins. Of all these abstruse points, the key point is “locating the corresponding orders for each client and the corresponding items by referring to the orders”. By comparison, thanks to the native esProc supports for the object reference, esProc users can use the operator “.” to access data directly. For these abstruse points, esProc can clearly put it like this: cGroup.(order_id).(item_id). In which, the “cGroup” represents each client, “.(order_id)” the orders corresponding to the clients, and “.(item_id)” the items corresponding to the orders.

To further compute and count the distinct value after associating, esProc code is cGroup.(order_id).(item_id).id().count().

Simple association, intuitive access, and clear logics – they are benefits of object-reference.

No support for ordered set. The complex business rules are often related to the ordered computation, such as the month-over-month comparison, link relative ratio comparison, data retrieval in the relative range, ranking in groups or the first or last ones computing. Take the example case 3 on computing the month-over-month growth rate of contract value for each sales man (ACTUALSALE). The SQL solution is:

with A as(

  select actualSale,Quantity*Amount sales,sellDate from contract

),B as(

  select actualSale,TO_NUMBER(to_char(SellDate,'yyyy')) year,

  TO_NUMBER(to_char(SellDate,'mm')) month,

  sum(sales) salesMonth

  from A group by actualSale,TO_NUMBER(to_char(SellDate,'yyyy')) ,


  order by actualSale,year,month

),C as(

  select actualSale,year, month, salesMonth,

  lag(salesMonth,1,0) over(order by actualSale,year,month) prev_salesMonth,

  lag(actualSale,1,0) over(order by actualSale) prev_actualSale

  from B


select actualSale,prev_actualSale,year, month,

   (case when prev_salesMonth!=0 and


  then ((salesMonth/prev_salesMonth)-1)

  else 0


from C

In which, the most difficult part is to present the month-over-month sales comparison for each sales man with SQL. With the native support for ordered set, esProc enable programmers to simply put this algorithm as salesMonth/salesMonth[-1]-1, in which the salesMonth represents the sales of this month, and salesMonth[-1] the sales of the previous month.

For another example, to find the top 3 salesman having achieved the greatest sales values, esProc users can simply put it like this: A2.sort(-sum(SALES)).select(#<3). For the computations related to range and order, such as the moving average of salesman in 3 months, esProc users can simply write: salesMonth{-1,1}.avg().

These algorithms are never easy for SQL to implement.

As can be seen, with the supports for step-by-step computation, object reference, and ordered set, esProc can implement the complex business rules easily.

Needless to say, some other computing languages may offer one or several above-mentioned features. For example, Java, C#, and other high-level languages support the step-by-step computation, object reference, and other features well. However, the high-level languages also have the obvious drawbacks: neither do they have enough basic function libraries, nor are their computing capabilities dedicated and purpose-designed. To have a clear insight about this fact, let’s try to compute the moving average with Java: 


  st=conn.prepareStatement("select sum(amount)amount, to_char(time,'MM')month from loan  where to_char(time,'yyyy')=to_char(sysdate,'yyyy') group by to_char(time,'MM') order by month",ResultSet.TYPE_SCROLL_INSENSITIVE,ResultSet.CONCUR_READ_ONLY);

  ResultSet rs=st.executeQuery();

  int size=rs.getFetchSize();

  for(int currentPos=1;currentPos<=size;currentPos++){


  float preAmount=-1,thisAmount=-1,nextAmount=-1;

  float avgAmount=-1;

  String month=rs.getString("month");


















  System.out.println(month+"  "+avgAmount);



  ……As can be seen above, Java and other high-level languages are fit for developing the basic function libraries or utilities, but unfit for the structural data computing. They are particularly awkward in implementing the computation involving complex business rules.

The R language is a scientific computing tool and characterized with the step-by-step computation, explicit set, objects reference, and ordered set. With the four advantages, R is more adept to solve the complex computation involving complex business rules. However, it is a pity that R users are mainly the scientific researchers because R syntax and function is too specialized for the average programmer to use. For example, in the example case 4, with the daily trading data, select the stocks which have been rising for 5 consecutive days from multiple stocks. The codes of R are:

01  library(gdata) #use excel function library

02  A1<- read.xls("e:\\data\\all.xlsx") #import data

03  A2<-subset(A1,as.POSIXlt(Date)>=as.POSIXlt('2012-06-01') & as.POSIXlt(Date)<=as.POSIXlt('2012-06-30')) #filter by date

04  A3 <- split(A2,A2$Code) #group by Code

05  A8<-list()

06  for(i in 1:length(A3)){

07    A3[[i]][order(as.numeric(A3[[i]]$Date)),] #sort by Date in each group

08    A3[[i]]$INC<-with(A3[[i]], Close-c(0,Close[- length (Close)])) #add a column, increased price

09    if(nrow(A3[[i]])>0){  #add a column, continuous increased days

10    A3[[i]]$CID[[1]]<-1

11    for(j in 2:nrow(A3[[i]])){

12    if(A3[[i]]$INC[[j]]>0 ){

13    A3[[i]]$CID[[j]]<-A3[[i]]$CID[[j-1]]+1

14    }else{

15    A3[[i]]$CID[[j]]<-0

16    }

17    } 

18    }

19    if(max(A3[[i]]$CID)>=5){  #stock max CID is bigger than 5

20    A8[[length(A8)+1]]<-A3[[i]]

21    }

22  }

23  A9<-lapply(A8,function(x) x$Code[[1]]) #finally,stock code

esProc features the agile syntax, strong presentation, and much friendlier mechanism to follow the natural thought pattern of programmers and help them solve the above problems effortlessly.

In addition, the code of esProc is presented in a grid, which is intuitive for presenting the step-by-step computation and convenient for code debugging. In esProc, users can set the break point, run step by step, and run to cursor. When debugging, users can monitor the computing result of the current cell, and review every executed cells at any time. All these conveniences are the result of true debugging.

R lacks the true invoking mechanism, and only provides the very inconvenient console command for calling. 

R is poor when integrating to other tools. Because it is designed for scientists to use on desktop, it is hard to integrate R with reporting tools or high-level languages like JAVA and C#.

The library function of R is quite good. But R language is poor in implementing the loops with acceptable performance. Looping for 100 million times would cost over 1 hour to complete. Considering the facts that the data traversal is very common for the business computing involving complex business rules, and not all jobs can be done with library function, R language currently is not a practical tool for handling big data with such poor performance.

esProc provides the JDBC interface for integration into the reporting tool or Java applications. Regarding the performance, esProc is comparable to the popular interpreting scripts like Perl and Python, and superior to R language by one order of magnitude. In addition, esProc allows for the inexpensive scale-out to meet the high demand on big data computing.

From Idea to Application gives you the architecture to quickly build, manage and run a range of applications (web, mobile, big data, new smart devices, etc.) on an open-standard, cloud-based platform. See why developers are using IBM Bluemix. Brought to you in partnership with IBM.


The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}