How To Handle 100k Rows Decision Table in Drools (Part 1)

I built a prototype to demonstrate how to handle decision tables with a very large number of rows at reasonable performance. The series covers three different solutions.

By Ryan ZhangCheng · Feb. 26, 21 · Analysis

Introduction

When a decision table grows to a very large number of rows, one of the biggest pain points is performance. In this article, I set up a prototype with a simple scenario that simulates the large decision table use case, and I provide three solutions built on Drools (a rules-oriented application framework). The focus is on decision table rule execution performance.

To keep the explanation focused on the core problem, I prepared two decision tables, with 10k and 100k rows of data, to simulate how a rules application makes decisions.

I provide the three solutions in three Git branches. If you prefer reading code, feel free to jump directly to the following GitHub links.

  • Rule-template-solution: use a rule template plus a raw-format XLS decision table
  • Precompile-rule-solution: use the kie-maven-plugin to precompile a formatted Drools decision table
  • Row-as-fact-solution: treat the large row data as facts instead of rules

For an overview, here is the performance comparison from my demonstration code:


Solution | Warm-up time | One rule execution time (run) | Rule compile time (package)
--- | --- | --- | ---
Rule Template + XLS | 99s | 9500ms | 1.5 mins
Precompile rules | 21s | 8500ms | 15 mins
Row as Fact instead of Rules | ~8s | 100 ms | 9s

10k Rows Decision Table Scenario

Solution | Warm-up time | One rule execution time (run) | Rule compile time (package)
--- | --- | --- | ---
Rule Template + XLS | 99s | 9500ms | 1.5 mins
Precompile rules | 21s | 8500ms | 15 mins
Row as Fact instead of Rules | ~8s | 100 ms | 9s

100K Rows Decision Table Scenario


(Warm-up time includes loading the rules, facts, and XLS file, creating the KieSession, etc.)

The performance difference is obvious. However, each solution has its pros and cons. If you are interested in this topic, let's dive into the details.

Scenario Briefing

Recently I helped a customer handle very large decision tables in Drools with reasonable performance. In industries such as insurance, healthcare, banking, and logistics, it's not rare to maintain a huge number of rules, keywords, or values that are used to compute a result.

Usually, the decision table is recommended for such scenarios because it's easy to understand and maintain from a business user perspective.

A decision table is very convenient for handling a large number of rules. However, performance is a big concern. As the previous tables suggest, the question is: does my rule framework (Drools, in this article) satisfy my performance requirement?

Compiling Rules

It can take minutes just to compile the rules, and it can take seconds to serve a single rule request. So in my setup, I prototyped a rule usage scenario that reproduces how a large decision table is fired.

Assume I have a decision table of keywords: when a keyword matches, the result is false; otherwise it is true.

When a ClientObject description matches a keyword in my list, clientObject.result is false; otherwise clientObject.result is true (note that true is the default value).
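To make the intended behavior concrete, here is a minimal plain-Java sketch of the same decision, outside of Drools. It assumes that a keyword must match the whole description (the way `String.matches` does); the class name `KeywordCheck` and the method `passes` are hypothetical names used only for this illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java illustration of the rule's semantics; this is not how Drools evaluates it.
class KeywordCheck {

    // Returns false when descr fully matches any keyword pattern, true otherwise.
    static boolean passes(String descr, List<Pattern> keywords) {
        for (Pattern keyword : keywords) {
            if (keyword.matcher(descr).matches()) {
                return false; // a keyword matched: the object is rejected
            }
        }
        return true; // default when nothing matches
    }

    public static void main(String[] args) {
        List<Pattern> keywords = List.of(
                Pattern.compile("DangerObject1.*"),
                Pattern.compile(".*DangerObject2"),
                Pattern.compile("DangerObject3"));

        System.out.println(passes("DangerObject1-crate", keywords));
        System.out.println(passes("SafeObject", keywords));
    }
}
```

The first call is rejected by the first pattern, and the second call falls through to the default. The interesting question for the rest of the article is what happens when the keyword list holds 100k entries and lives in a rule engine instead of a loop.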

Of course, real business decisions involve more complex rules. Drools has sophisticated ways to handle complex rules; for example, you can use DRL or integrate other rules into the same decision table. However, the problem we want to solve in this article is handling a very large number of rows, so I leave out the complex rule configurations.


Object Description | CheckResult
--- | ---
"DangerObject1.*" | false
".*DangerObject2" | false
"DangerObject3" | false
... (rows omitted, 100k in total) | false
"DangerObject10000.*" | false

100k Decision Table

For the purpose of comparison and efficiency, I provide two decision tables, 10kTable.xls and 100kTable.xls, which contain 10k rows and 100k rows respectively.

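Java

public class ClientObject {

   private int id;
   // Text that the keyword patterns are matched against
   private java.lang.String descr;
   // Defaults to true; a matching rule sets it to false
   private boolean pass = true;
   // Getters and setters are not shown here
}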

ClientObject
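The other snippets in this article call `setDescr`, `isPass`, and `setPass`, which the listing above does not show. A minimal sketch of the bean with those accessors might look as follows. The accessor bodies are an assumption based on standard JavaBean conventions, and the class is declared without `public` only so that it compiles in a single file next to the hypothetical `ClientObjectDemo`; in the rules project it would stay `public`.

```java
// Sketch of the fact class with the accessors the other snippets rely on.
class ClientObject {
    private int id;
    private String descr;
    private boolean pass = true; // "pass" is the default outcome

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getDescr() { return descr; }
    public void setDescr(String descr) { this.descr = descr; }

    public boolean isPass() { return pass; }
    public void setPass(boolean pass) { this.pass = pass; }
}

// Hypothetical demo class, not part of the original project.
class ClientObjectDemo {
    public static void main(String[] args) {
        ClientObject o1 = new ClientObject();
        o1.setDescr("DangerObject99999");
        // Stays true until something calls setPass(false)
        System.out.println(o1.isPass());
    }
}
```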


We also want to separate business rule code from application code. To do that, we set up two separate Maven projects:

  1. rules: stores and manages the rules logic
  2. myapp: maintains the generic application code

Shell

├── myapp
│   ├── pom.xml
│   └── src
│       ├── main
│       │   ├── java
│       │   │   └── com
│       │   │       └── myspace
│       │   │           └── App.java

└── rules
    ├── pom.xml
    └── src
        ├── main
        │   ├── java
        │   │   └── com
        │   │       └── myspace
        │   │           └── spreadsheet_decisiontable
        │   │               └── ClientObject.java
        │   └── resources
        │       ├── com
        │       │   └── myspace
        │       │       └── spreadsheet_decisiontable
        │       │           ├── 100kTable.xls.deactive
        │       │           ├── 10kTable.xls
        │       │           └── SimpleTemplate.drt
        │       └── META-INF
        │           ├── kmodule.xml

Project Setup

In production, we would usually run rules in a standalone process instead of embedding them in myapp. However, this setup focuses on rule execution performance, so we simplify it by running the rules in embedded mode, while still packaging the rules in a separate jar file (a.k.a. kjar). There should be no performance difference for rule execution between embedded and standalone mode.

As for CPU and memory, my laptop has 8 cores and 64 GB of memory. Because the decision table is large, some test runs take a lot of computing resources to compile and run. If the 100k decision table scenario takes too long for you, I suggest you run only the 10k scenario. The 100k decision table is deactivated by default; follow the README on GitHub to activate the 100k test.

Solution 1: Rule Template + Xls File

In Drools, we can use a rule template to handle Excel files.

In my example code, all you need to do is follow these two steps:

  1. Configure kmodule.xml and package the kjar in the Maven project (using the kie-maven-plugin)

XML

<?xml version="1.0" encoding="UTF-8"?>
<kmodule xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://www.drools.org/xsd/kmodule">
    <!-- <kbase name="DTableWithTemplateKB1" packages="com.myspace.spreadsheet_decisiontable">
        <ruleTemplate dtable="com/myspace/spreadsheet_decisiontable/10kTable.xls"
                      template="com/myspace/spreadsheet_decisiontable/SimpleTemplate.drt"
                      row="2" col="2"/>
        <ksession name="mykiesession1"/>
    </kbase> -->
    <kbase name="DTableWithTemplateKB2" packages="com.myspace.spreadsheet_decisiontable">
        <ruleTemplate dtable="com/myspace/spreadsheet_decisiontable/100kTable.xls"
                      template="com/myspace/spreadsheet_decisiontable/SimpleTemplate.drt"
                      row="2" col="2"/>
        <ksession name="mykiesession2"/>
    </kbase>
</kmodule>



  2. Trigger your rule in the client code like so:

Java

import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;
import com.myspace.spreadsheet_decisiontable.ClientObject;

// Get the KIE container from the classpath
KieContainer kieContainer = KieServices.Factory.get().getKieClasspathContainer();
// Create the KIE session declared in kmodule.xml
KieSession ksession = kieContainer.newKieSession("mykiesession2");
// Insert the fact
ClientObject o1 = new ClientObject();
o1.setDescr("DangerObject99999");
ksession.insert(o1);
// Fire rules
int fired = ksession.fireAllRules();
// Get the result
System.out.println("Is Object Pass:" + o1.isPass());



The decision table is in plain XLS format:

[Figure: the decision table in plain XLS format]


Rules configuration is managed in a rule template:

Java

template header
description
result

package com.myspace.spreadsheet_decisiontable;

import com.myspace.spreadsheet_decisiontable.ClientObject;

template "classification-rules"
description
result

rule "Categorize Objects_@{row.rowNumber}"
no-loop true
when
    clientObject: ClientObject(descr matches @{description})
then
    clientObject.setPass(@{result});
    System.out.println("@{result}");
end

end template


SimpleTemplate.drt

The syntax is almost self-explanatory. First, you declare a variable for each column (description and result); then you reference the variables in the DRL as @{description} and @{result}.

As long as you are familiar with the basic syntax of the Drools rule language, it's quite straightforward.

Note that the rule template generates one rule per row in the decision table. So if you have a 100k-row table, you end up with 100k rules in runtime memory.
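To illustrate what "one rule per row" means, the sketch below expands the rule body of the template once for each row using naive string substitution. This is only an illustration of the idea, not Drools' actual template engine; the class `TemplateExpansionSketch`, its `Row` type, and the `expand` method are hypothetical names, and the starting row number is an arbitrary assumption.

```java
import java.util.List;

// Illustration only: a naive stand-in for template expansion, not Drools' implementation.
class TemplateExpansionSketch {

    // One spreadsheet row holding the two template columns.
    static final class Row {
        final String description;
        final String result;

        Row(String description, String result) {
            this.description = description;
            this.result = result;
        }
    }

    // The rule body from the template, with its placeholders left in place.
    static final String RULE_TEMPLATE = String.join("\n",
            "rule \"Categorize Objects_@{row.rowNumber}\"",
            "no-loop true",
            "when",
            "    clientObject: ClientObject(descr matches @{description})",
            "then",
            "    clientObject.setPass(@{result});",
            "    System.out.println(\"@{result}\");",
            "end");

    // Produces one rule per row by substituting the placeholders.
    static String expand(List<Row> rows, int firstRowNumber) {
        StringBuilder drl = new StringBuilder();
        for (int i = 0; i < rows.size(); i++) {
            Row row = rows.get(i);
            drl.append(RULE_TEMPLATE
                            .replace("@{row.rowNumber}", String.valueOf(firstRowNumber + i))
                            .replace("@{description}", row.description)
                            .replace("@{result}", row.result))
               .append("\n\n");
        }
        return drl.toString();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("\"DangerObject1.*\"", "false"),
                new Row("\".*DangerObject2\"", "false"));
        // Two rows in, two complete rules out
        System.out.print(expand(rows, 2));
    }
}
```

With 100k rows, the same loop would emit 100k separate rules, each with its own regular expression constraint, which is the volume the engine then has to compile and hold in memory.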

Pros

This solution does not need a special header in the Excel data. The header and condition configuration lives in the rule template.

As you might have noticed, our decision table is plain and simple:

[Figure: simple decision table]


By contrast, let's see what a typical Drools domain header looks like for a spreadsheet decision table:

[Figure: typical Drools domain header]


So the first advantage is that the rule data is simple. A plain spreadsheet is easier to understand and maintain for any user who does not yet know the Drools domain syntax. You probably already maintained your business rules in this form before you adopted Drools for rule management.

The configuration is managed in a separate rule template file, which can be managed by a different person or team.

Secondly, the rule template makes rule conditions and actions very flexible: you can write multiple lines of logic just like any code block, instead of putting them in Excel columns, where multi-line content loses readability.

Also, you can store the rule data in a database or a CSV file if you like, since it's simple row data without meta information. Drools provides an interface to handle this.
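As a rough sketch of the CSV variant, the code below turns lines of an assumed `<pattern>,<result>` layout into row objects using only the standard library. The file layout, the class `CsvRuleDataSketch`, and its `Row` type are assumptions made for this example. Handing such rows to Drools' template support (the drools-templates module ships compilers such as `ObjectDataCompiler` for that purpose) is deliberately left out, because the demo project described here does not show that step.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: parse simple "<pattern>,<result>" lines into row objects.
class CsvRuleDataSketch {

    // One row of rule data: a keyword pattern and the result it produces.
    static final class Row {
        final String description;
        final boolean result;

        Row(String description, boolean result) {
            this.description = description;
            this.result = result;
        }
    }

    // Splits each line on its last comma, so the pattern itself may contain commas.
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            int comma = trimmed.lastIndexOf(',');
            if (comma < 0) {
                throw new IllegalArgumentException("Expected '<pattern>,<result>': " + line);
            }
            String description = trimmed.substring(0, comma).trim();
            boolean result = Boolean.parseBoolean(trimmed.substring(comma + 1).trim());
            rows.add(new Row(description, result));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = parse(List.of(
                "\"DangerObject1.*\",false",
                "\".*DangerObject2\",false"));
        for (Row row : rows) {
            System.out.println(row.description + " -> " + row.result);
        }
    }
}
```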

In some cases, users might want to customize rule data governance by developing their own solution for editing and managing rules. I would usually suggest the KIE Workbench, a sophisticated, feature-rich tool for rule authoring and governance. However, if you want to develop your own portal or integrate rule authoring into an existing application, you can edit the business rules as database rows and convert them to Drools rules through rule templates.

Cons

The disadvantage is also obvious from my point of view: a raw Excel file is not easy to version-control.

Believe it or not, it's vital to separate the business logic from application code in rules-oriented applications. Therefore, version control of rule data is both vital and difficult. The Drools community provides sophisticated, fully functional governance tools, such as the KIE Workbench, to manage rule authoring, version control, and so on, but the workbench recognizes neither the rule template nor the raw Excel format.

So basically you can't import them into the KIE Workbench or export them from it.

And the performance is not good. You can observe this by running the client code in my demonstration project:

Markdown

# Run 10K rows Decision Tables
```bash
cd rules
mvn clean package
# rule package is spreadsheet-decisiontable-1.1-SNAPSHOT.jar
cd ../myapp
mvn clean compile exec:java

Initial Kie Session elapsed time(ms): 8755
false
fired rules: 1 elapsed time(ms): 387
Is Object Pass:false
```

# Run 100K rows Decision Tables
```bash
cd rules
# It would take 1.5 mins to package the rule jar
mvn clean package
# rule package is spreadsheet-decisiontable-1.1-SNAPSHOT.jar
cd ../myapp
mvn clean compile exec:java
Initial Kie Session elapsed time(ms): 99290
false
fired rules: 1 elapsed time(ms): 9206
Is Object Pass:false
```
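The article does not show how the elapsed times in this output are measured. One common way to obtain such numbers is to wrap the call in a small timing helper, as in the sketch below. The helper `timed` and the class `ElapsedTimeSketch` are hypothetical, and the workload in `main` is a stand-in; in the demo it would be the session creation or the `fireAllRules()` call.

```java
import java.util.function.Supplier;

// Sketch of a timing helper; not taken from the demo project.
class ElapsedTimeSketch {

    // Runs the task once, prints "<label> elapsed time(ms): <n>", and returns the task's result.
    static <T> T timed(String label, Supplier<T> task) {
        long start = System.nanoTime();
        T result = task.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " elapsed time(ms): " + elapsedMs);
        return result;
    }

    public static void main(String[] args) {
        // Stand-in workload that returns a pretend "fired rules" count.
        int fired = timed("fired rules", () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            return sum > 0 ? 1 : 0;
        });
        System.out.println("fired: " + fired);
    }
}
```

A single measurement like this includes JIT warm-up and class loading, so the first call in a fresh JVM tends to look slower than later ones.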



So for a large decision table, this is clearly not an ideal solution, considering that a single rule execution can take several seconds, even though it is quite flexible and the rule data is easy to manage in an Excel file.

I think there are two reasons for the slow execution:

  1. There is a huge number of rules; in my case, there are 10k or 100k rules in the rule execution session.
  2. Converting row data into Drools rules is slow, which slows down the application and can cause a lot of JVM overhead.

Although this article mainly focuses on performance, I think rule governance is at least as important as performance, if not more so. Otherwise, I could simply handle this logic in plain Java without involving Drools.

I have also seen some users choose a partial solution:

  1. Separate the big row data from the rules engine
  2. Keep other rules data in the rules engine as usual

This can indeed fix the performance issue, by moving the challenge outside of the Drools domain.

Personally, I think it's far from an ideal solution as well, because it doesn't keep all the business rules in one place. The rules leak into, or let's say pollute, your generic application code.

Conclusion

For a rules-oriented application, a half barrel is like an empty barrel. It's very hard to manage the lifecycle of business rules, such as editing, version control, and deployment, if you don't use a rules application framework such as Drools.

I describe solution 2, the precompile rule solution, in my next article. The problems I want to fix are:

  1. Don't dynamically load Excel data at runtime; precompile it at build time instead
  2. Use a Drools spreadsheet decision table so that it can be version-controlled by the KIE Workbench

Published at DZone with permission of Ryan ZhangCheng. See the original article here.