A Guide to Rules Engines for IoT: Flow Processing Engines
Check out this post to learn more about flow processing engines.
What Are Flow Processing Engines?
Flow-based programming is a programming paradigm that defines applications as networks of "black-box" processes. These processes, a.k.a. functions, are represented as nodes that exchange data across predefined connections by message passing. The nodes can be reconnected endlessly to form different applications without having to change their associated functions.
Flow-based programming (FBP) is, thus, naturally "component-oriented". Some of the benefits of FBP are:
- Change of connection wiring without rewriting components.
- Inherently concurrent — suited for the multi-core CPU world.
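The idea of rewiring without rewriting can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular engine is implemented: the node functions (`to_celsius`, `flag_hot`), the `run_flow` helper, and the 30 °C threshold are all hypothetical names and values chosen for the example.

```python
from typing import Callable, Dict, List

# A node is a black-box function: one message in, one message out.
Node = Callable[[dict], dict]


def to_celsius(msg: dict) -> dict:
    """Hypothetical node: convert a Fahrenheit reading to Celsius."""
    return {**msg, "temp_c": (msg["temp_f"] - 32) * 5 / 9}


def flag_hot(msg: dict) -> dict:
    """Hypothetical node: mark readings above an assumed 30 degree threshold."""
    return {**msg, "hot": msg["temp_c"] > 30}


def run_flow(nodes: Dict[str, Node], wires: Dict[str, List[str]],
             start: str, msg: dict) -> List[dict]:
    """Push a message through the graph; return what reaches the leaf nodes."""
    out = nodes[start](msg)
    targets = wires.get(start, [])
    if not targets:
        return [out]
    results: List[dict] = []
    for target in targets:
        results.extend(run_flow(nodes, wires, target, out))
    return results


nodes = {"convert": to_celsius, "flag": flag_hot}
wires = {"convert": ["flag"]}  # the wiring is plain data, separate from the functions
result = run_flow(nodes, wires, "convert", {"temp_f": 95.0})

# Rewiring: drop the connection and the same "convert" node becomes a leaf.
rewired = run_flow(nodes, {}, "convert", {"temp_f": 95.0})
```

Changing the `wires` dictionary changes the application, while `to_celsius` and `flag_hot` stay untouched.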
What Are Some Examples of Flow Processing Engines?
Yahoo! Pipes and Node-RED are two examples of rules engines built using flow-based programming. Flow-based programming has become even more popular with the introduction of “serverless” computing, where cloud applications can be built by chaining functions.
IBM’s OpenWhisk is an example of flow-based programming by chaining cloud functions (which IBM calls actions). Other serverless orchestration approaches, such as AWS Step Functions, are based on finite state machine rules engines.
Can Flow Processing Engines Model Complex Logic?
Flow-based programming has no notion of states and state transitions. Combining multiple non-binary outcomes of functions (observations) in a rule is possible, but it must be coded in every function where it is applied. That also implies that you have to branch at every function where you need to model a multiple-choice outcome. This leads to extremely busy flow graphs that are hard to follow, especially since the logic is expressed both in the functions themselves and in their “connectors”, that is, in the execution paths. The connectors therefore represent not only the information flow but also the decisions being taken.
Similar to decision trees, such a modeling approach suffers from exponential growth in the number of nodes as the complexity of the logic increases. What makes matters even worse is that, unlike in decision trees, we cannot track the function outcomes as states. There is no better illustration of this drawback than to look at a slightly more complex flow implemented in Node-RED and count the number of nodes and connectors. It is not unusual for simple use cases designed in Node-RED to have 30 or 40 nodes and connectors, which can hardly fit on one screen.
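To make the growth concrete, here is a back-of-the-envelope count in Python. It assumes the worst case, where every function branches into the same number of outcomes and no downstream nodes are shared between branches; real flows usually share some nodes, so treat the numbers as an upper bound. The function name `full_tree_size` is made up for this sketch.

```python
from typing import Tuple


def full_tree_size(outcomes: int, levels: int) -> Tuple[int, int]:
    """Count nodes and connectors in a fully branching flow.

    Level 0 holds one node; every node fans out into `outcomes`
    child nodes until `levels` levels of functions exist.
    """
    nodes = sum(outcomes ** level for level in range(levels))
    connectors = nodes - 1  # every node except the first has one incoming wire
    return nodes, connectors


# Print the size of a flow with three outcomes per function, level by level.
for levels in range(1, 5):
    print(levels, full_tree_size(3, levels))
```

Under these assumptions, three outcomes per function and four levels of functions already give 40 nodes and 39 connectors.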
Majority voting in flow engines is possible only if we introduce the concept of merging the outputs of different nodes into a separate merge node. Even so, it is still problematic, as it requires coding the majority rules within the function of that merge node.
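A merge node of this kind might look roughly like the following Python sketch. The class name `MajorityMerge`, its `on_message` method, and the strict more-than-half rule are assumptions made for illustration; they do not correspond to any specific engine's API.

```python
from collections import Counter
from typing import Dict, Optional


class MajorityMerge:
    """Hypothetical merge node: waits for all upstream nodes, then votes."""

    def __init__(self, expected_inputs: int) -> None:
        self.expected = expected_inputs
        self.votes: Dict[str, str] = {}  # latest outcome per upstream node

    def on_message(self, source: str, outcome: str) -> Optional[str]:
        """Record one upstream outcome; return the winner once all have arrived."""
        self.votes[source] = outcome
        if len(self.votes) < self.expected:
            return None  # still waiting for other inputs
        winner, count = Counter(self.votes.values()).most_common(1)[0]
        self.votes.clear()  # reset for the next round
        # Assumed rule: a strict majority is required, otherwise no decision.
        return winner if count > self.expected / 2 else None


merge = MajorityMerge(expected_inputs=3)
first = merge.on_message("sensor_a", "alarm")
second = merge.on_message("sensor_b", "ok")
decision = merge.on_message("sensor_c", "alarm")
```

Note that the voting rule lives inside the node's code, and the node has to remember earlier inputs between messages, so it is no longer a stateless function.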
Can Flow Processing Engines Model Time?
Flow engines can barely deal with any aspect of the time dimension, since FBP is stateless by design. In some limited use cases (which can hardly scale), you can merge streams within a time window.
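As a rough picture of what merging within a time window involves, the Python sketch below pairs events from two streams whose timestamps lie close together. The event shape, the function name `merge_within_window`, and the sample data are assumptions for illustration. The sketch works on lists that are already buffered, which is exactly the state a stateless engine does not naturally keep.

```python
from typing import List, Tuple

# An event is (timestamp in seconds, label); this shape is assumed for the sketch.
Event = Tuple[float, str]


def merge_within_window(stream_a: List[Event], stream_b: List[Event],
                        window_s: float) -> List[Tuple[Event, Event]]:
    """Pair events from two buffered streams that occur within window_s seconds."""
    pairs: List[Tuple[Event, Event]] = []
    for event_a in stream_a:
        for event_b in stream_b:
            if abs(event_a[0] - event_b[0]) <= window_s:
                pairs.append((event_a, event_b))
    return pairs


doors = [(0.0, "door_open"), (100.0, "door_open")]
motion = [(5.0, "motion")]
matched = merge_within_window(doors, motion, window_s=10.0)
```

Every comparison needs both streams held in memory, and the naive pairing grows with the product of the two buffer sizes, which hints at why this approach scales poorly.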
Are Flow Processing Engines Explainable?
Some of the things that make a rule engine explainable are:
- The intent of the rule should be clear to all users, developers, and business owners alike.
- There should be a compact representation of the logic.
- The engine should have simulation and debugging capabilities both during design time and at runtime.
For simple use cases, a flow-based data stream representation feels natural, at least from the perspective of the information flow. But any attempt to create complex logic using FBP makes validating the intended logic very difficult.
Understanding which decisions are taken by looking at the flow graph is a very difficult task. The main reason is that the logical representation is not compact, and validating the rules often requires streaming test data and then checking the function logs across all pipelines.
The logic is split between the flow pathways (as data travels between processing nodes) and the payload processing in each node, which might lead to different paths being taken after that node. Hence, debugging and rules validation become very tedious and error-prone. Moreover, we are never sure that all corner cases (the outputs as decisions from different inputs) are covered by a particular rule expressed using FBP; it almost looks as if validating FBP-based rules were an NP-hard problem.
Are Flow Processing Engines Adaptable?
Flow-based programming engines have reusable black-box nodes (functions). However, a partial update of any particular rule is nevertheless difficult and risky, because it usually implies major changes to the graph and revalidation of the rules. In a way, the main reason for this is that for most rules engines, and for FBP in particular, there is a high correlation between explainability and flexibility. On the other hand, flow-based rules engines are easy to extend with third-party services, and this extensibility is achieved in an elegant way.
Are Flow Processing Engines Easy to Operate?
Templating is very difficult to achieve, since special care needs to be taken when handling the payload transformations that happen as payloads are passed between processing nodes. Also, thresholds and branching logic are part of the same payload processing flow, which makes it very hard to abstract this logic. For the same reason, bulk upgrades are error-prone and risky.
Are Flow Processing Engines Scalable?
Flow-based programming engines are inherently concurrent since they have to distribute functional computations. They are also stateless, which means that the rules engine only needs to keep track of the current execution and further actions that need to be executed. On the other hand, if merging multiple outputs of different nodes is required in one rule, or when decision branching is introduced with different path executions, the rules engine will need to keep the snapshot (scope) of the rules execution somewhere.
Using Node-RED for IoT Application Development
Node-RED is very popular today in the maker community and is the de facto tool in the gateways of many industrial vendors. This has to do with its creators' decision to let different protocol streams come directly into nodes as input data events. This was done deliberately in order to simplify protocol termination and to allow payload normalization to be performed within Node-RED. But it’s a decision that acts as a double-edged sword.
On the one hand, this means that protocol-dependent data streams can be implemented by any third party and immediately used within the Node-RED environment. And as protocol transformation and payload normalization are very important in IoT deployments, Node-RED can be very valuable for edge deployments.
But on the other hand, it also makes Node-RED suffer from operability issues that are even bigger than those of other flow-based programming engines. It makes templating, for example, an order of magnitude more difficult: protocol transformation and payload normalization need to be part of the Node-RED template, together with threshold definitions and branching.
Though a good fit for edge deployments, an off-the-shelf Node-RED instance is not scalable for the cloud. Some vendors provide cloud solutions that implement sharding on top of Node-RED and externalize the protocol termination into a separate component. However, when taking such an approach, they might as well switch back to the more generic FBP engines.
This is an excerpt from our latest Benchmark Report on Rules Engines used in the IoT domain. You can have a look at a summary of the results or download the full report over here.
Published at DZone with permission of Veselin Pizurica, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.