Building a Boolean Logic Rules Engine in Neo4j

The first time I built a boolean logic rules engine, I forgot what sleep was and lost all sense of fear — but I managed to do it. Now, I'm teaching YOU how to build one!

Max De Marzi

Aug. 27, 17 · Tutorial

Like (3)

Save

7.1K Views

A boolean logic rules engine was the first project I did for Neo4j before I joined the company some five years ago. I was working for some start-up at the time but took a week off to play consultant. I had never built a rules engine before, but as far as I know, ignorance has never stopped anyone from trying. Neo4j shipped me to the client site and put me in a room with a projector and a white board where I live coded with an audience of developers staring at me, analyzing every keystroke and cringing at every typo and failed unit test. I forgot what sleep was, but I managed to figure it out... and I lost all sense of fear after that experience.

The data model chained together fact nodes with crisscrossing relationships, each chain containing the same path ID property that we followed until reaching an end node which triggered a rule. There were a few complications along the way and more complexity near the end for ordering and partial matches. The traversal ended up being some 40 lines of the craziest Gremlin code I ever wrote, but it worked. After the proof of concept, the project was rewritten using the Neo4j Java API because at the time only a handful of people could look at a 40-line Gremlin script and not shudder in horror. I think we're up to two handfuls now.

Over the years, a few projects have come up that needed rules that had to be evaluated. One example is a shopping cart that could trigger a discount if a set of items was purchased. Another is a job seeker with a set of skills that could match a job posting. One more could be a user with a set of tags that prompted an advertisement to be displayed.

Let's imagine we have two users. User "max" has attributes "a1" and "a2". User "alex" just has one attribute, "a3". A rule exists that is triggered by any user that has both "a1" and "a2" in its set or alternatively has "a3" but not "a4". I'm going to use the shorthand "&" for ANDs, "|" for ORs, and "!" for NOTs. So our rule would be "(a1 & a2) | (a3 & !a4)". We could model the data this way:

To see if a user triggers a rule, we would start by collecting their attributes and then from each attribute traversing in and out the AND and OR nodes looking backward and checking if we had (or didn't have, for NOTs) the attributes required. All relationships leading to the AND nodes would have been valid, but with just one of the relationships leading to the OR nodes. This is a valid way of doing it, but today I'm going to show you an alternative.

Instead of adding the AND/OR/NOT logic nodes to the graph, we are going to compress into Path nodes. We split our rule into two paths "(a1 & a2)" and "(a3 & !a4)" and our graph looks like this:

In order to find whether a user triggers a rule, we'll create a stored procedure that finds the user, gets their attributes, and for each attribute tries to find a set of rules:

@Procedure(name = "com.maxdemarzi.match.user", mode = Mode.READ)
    @Description("CALL com.maxdemarzi.match.user(username) - find matching rules")
    public Stream<NodeResult> matchUser(@Name("username") String username) throws IOException {
        // We start by finding the user
        Node user = db.findNode(Labels.User, "username", username);
        if (user != null) {
            // Gather all of their attributes in to two Sets (nodes, attribute ids)
            Set<Node> userAttributes = new HashSet<>();
            Collection<String> attributes = new HashSet<>();

            for (Relationship r : user.getRelationships(Direction.OUTGOING, RelationshipTypes.HAS)) {
                userAttributes.add(r.getEndNode());
                attributes.add((String)r.getEndNode().getProperty("id"));
            }
            // Find the rules
            Set<Node> rules = findRules(attributes, userAttributes);
            return rules.stream().map(NodeResult::new);
        }

        return null;
    }

The magic happens in the findRules method. For each attribute node, we traverse out the IN_PATH relationship and get the "path" property of the relationship. We split it into an array of strings each holding an attribute ID. We also split the "path" property into a character array using just the ANDs ("&") and NOTs ("!"). For each path, we check to see if we are missing a needed attribute or if we have an attribute we should not have. If either one of those is true, we fail the path we are on and try the next.

Set<Node> findRules(@Name("attributes") Collection<String> attributes, Set<Node> userAttributes) {
    Set<Node> rules = new HashSet<>();
    Set<Node> paths = new HashSet<>();

    for (Node attribute : userAttributes) {
        for (Relationship r : attribute.getRelationships(Direction.OUTGOING, RelationshipTypes.IN_PATH)) {
           // Get the "path" property
           String path = (String)r.getProperty("path");

           // Split it up by attribute and by & and !
           String[] ids = path.split("[!&]");
           char[] rels = path.replaceAll("[^&^!]", "").toCharArray();

           // Assume path is valid unless we find a reason to fail it
           boolean valid = true;

           // Since our starting attribute is how we got here we skip checking it.
           if (ids.length > 1) {
               for (int i = 0; i < rels.length; i++) {
                   if (rels[i] == '&') {
                       // Fail if attribute is not there
                       if (!attributes.contains(ids[1+i])) {
                           valid = false;
                           break;
                       }
                   } else {
                       // Fail if attribute is there but should NOT be
                       if (attributes.contains(ids[1+i])) {
                           valid = false;
                           break;
                        }                    
                   }
               }
           }
           // If we made it add it to the set of valid paths
           if (valid) {
               paths.add(r.getEndNode());
           }
       }
   }

   // For each valid path get the rules
   for (Node path : paths) {
       for (Relationship r : path.getRelationships(Direction.OUTGOING, RelationshipTypes.HAS_RULE)) {
           rules.add(r.getEndNode());
       }
   }
   return rules;
}

This makes our logic pretty easy to follow and we don't have to worry about chasing ANDs, ORs, or NOTs depending on which nodes we land on along the way. But how do we build this graph in the first place?

We'll create another stored procedure for this called rules.create that takes a rule ID and formula as input:

    // This procedure creates a rule, ex. ("Rule 1", "(0 & 1) | (2 & 3)")
    @Procedure(name = "com.maxdemarzi.rules.create", mode = Mode.WRITE)
    @Description("CALL com.maxdemarzi.rules.create(id, formula) - create a rule")
    public Stream<StringResult> create(@Name("id") String id, @Name("formula") String formula) throws IOException {
        // See if the rule already exists
        Node rule = db.findNode(Labels.Rule, "id", id);
        if (rule == null) {
            // Create the rule
            rule = db.createNode(Labels.Rule);
            rule.setProperty("id", id);
            rule.setProperty("formula", formula);

This next part is where the magic happens. We take our formula and create a boolean expression. In this boolean expression, we are going to create a Truth Table and use the Quine McCluskey algorithm to find the prime implicants, and Petrick's Method to find all minimum sum-of-products solutions, which become our "Path" nodes.

// Use the expression to find the required paths
BooleanExpression boEx = new BooleanExpression(formula);
boEx.doTabulationMethod();
boEx.doQuineMcCluskey();
boEx.doPetricksMethod();

Do you remember taking "Digital Logic" in college and thinking, I'm never going to use this stuff? ...Yeah. OK, now for each path, we'll check to see if another rule already created this path, and if not, we'll make one. This will let us trigger many rules if they share a common path.

            for (String path : boEx.getPathExpressions()) {
                Node pathNode = db.findNode(Labels.Path, "id", path);
                if (pathNode == null) {
                    // Create the path node if it doesn't already exist
                    pathNode = db.createNode(Labels.Path);
                    pathNode.setProperty("id", path);

We'll be generous here and for each attribute in the path we will either find it or create it. We will connect our path node to the first attribute in the path. All of our paths will be sorted the same way, so we will never have two equivalent Path nodes. If we have an "a1 & a2", we will never have an "a2 & a1" path.

                    // Create the attribute nodes if they don't already exist
                    String[] attributes = path.split("[!&]");
                    for (int i = 0; i < attributes.length; i++) {
                        String attributeId = attributes[i];
                        Node attribute = db.findNode(Labels.Attribute, "id", attributeId);
                        if (attribute == null) {
                            attribute = db.createNode(Labels.Attribute);
                            attribute.setProperty("id", attributeId);
                        }
                        // Create the relationship between the lead attribute node to the path node
                        if (i == 0) {
                            Relationship inPath = attribute.createRelationshipTo(pathNode, RelationshipTypes.IN_PATH);
                            inPath.setProperty("path", path);
                        }
                    }

Lastly, we'll connect our path node to the rule node we created earlier:

// Create a relationship between the path and the rule
pathNode.createRelationshipTo(rule, RelationshipTypes.HAS_RULE);

Alright. Let's create some sample data and try it. We will create two rules:

CALL com.maxdemarzi.rules.create('Rule 1', '(a1 & a2) | (a3 & a4)')
CALL com.maxdemarzi.rules.create('Rule 2', '(a1 & a2) | (a3 & !a4)')

...and create a few sample users and connect them to some attributes like so:

For example user "max" has attributes "a1" and "a2". So, when we run our match procedure:

CALL com.maxdemarzi.match.user('max') yield node return node

We will return both rules.

In a large production deployment, you may not want to store the user to attribute relationships or they may be transient like items in a temporary shopping cart, for example. In this case, you'll want to pass in the attributes directly and call it this way:

CALL com.maxdemarzi.match.attributes(['a1','a2']) yield node return node

I'll spare you the details, but the complete source code is on GitHub as always... and now you know one way to build a rules engine in Neo4j.

Neo4j Attribute (computing) Engine

Published at DZone with permission of Max De Marzi, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending

Building a Boolean Logic Rules Engine in Neo4j

The first time I built a boolean logic rules engine, I forgot what sleep was and lost all sense of fear — but I managed to do it. Now, I'm teaching YOU how to build one!

Related

Partner Resources