Mule 4 — Thread Management and Self-Tuning Runtime
In Mule 4, the runtime engine is designed for non-blocking and asynchronous execution. The Mule 4 runtime is a "reactive" execution engine.
Mule 4 Execution Engine
The Mule 4 runtime is a "reactive" execution engine built for non-blocking and asynchronous execution. This new runtime manages and tunes different workloads automatically: each Mule event processor indicates what type of operation it performs, and that operation can be CPU-light, CPU-intensive, or IO-intensive.
Mule Event Processing Types
The Mule 4 runtime is event-based, and each event processor declares one of the processing types below. The processing type decides how a Mule component operates: how quickly it is expected to complete, and how many threads and which type of thread pool the runtime allocates to it.
- CPU-light:
  - This processing type is for quick operations, typically completing in around 10 ms or less.
  - By default, this processing type performs only non-blocking activities.
  - Examples are the Logger and HTTP Request components; these kinds of tasks do not perform any blocking activity.
  - While running a Mule 4 application, we can identify these processing types in the console log.
  - The strings CPU_LIGHT and CPU_LIGHT_ASYNC in the console log tell us which components are running on the CPU-light processing type, as in the illustrative log line after this list.
- CPU-intensive:
  - This processing type is not for quick operations; a task typically takes more than 10 ms.
  - These tasks should not perform any I/O activity.
  - The Transform Message component uses this processing type.
  - The CPU_INTENSIVE string in the Studio console log tells us which Mule components use the CPU-intensive processing type.
- Blocking IO:
  - This processing type is used for any operation where the Mule component has to wait for a response or otherwise blocks the thread.
  - Examples are a database SELECT operation or an SFTP read operation.
  - The strings BLOCKING or IO in the console logs indicate which Mule components use the Blocking IO processing type.
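For example, a Logger entry usually carries the processing type inside the thread name. The line below is only illustrative (the application and flow names are made up, and the exact layout depends on the Mule version and Log4j pattern), but the CPU_LITE marker in the thread name is the string to look for; components on the other processing types show CPU_INTENSIVE or BLOCKING/IO in the same position:

```text
INFO  2021-05-10 10:15:32,101 [[MuleRuntime].uber.04: [demo-app].demoFlow.CPU_LITE @63a1f2] ... LoggerMessageProcessor: request received
```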
Centralized Pools
Based on the Mule event processing types, the Mule 4 engine has three thread pools.
We can no longer manage or configure the thread pools at the application level; instead, the Mule 4 engine manages threading internally.
All three pools are centralized and managed at the runtime level. If any configuration change is required for a particular application's processing, it has to be handled at the runtime level by passing JVM parameters to the Mule runtime.
A Mule application uses threads from each pool based on the event. In this threading model, a single flow can use threads from different pools depending on the components it contains; for example, a flow can run the handshaking between processors on CPU_LITE threads, a Transform Message on a CPU_INTENSIVE thread, and a database query on a BLOCKING_IO thread.
Below are the three centralized thread pools, which map to the Mule event processing types:
- CPU_LITE
  - This thread pool is a small pool in the Mule 4 runtime engine. By default, it has only two threads for each available core.
  - This pool performs the handshaking between processors in a flow and handles only non-blocking I/O.
  - Bad code or misuse can make the CPU Light pool unresponsive or cause throughput to drop; the strings WAITING or BLOCKED (for example, in a thread dump or the console logs) help us identify the issue easily.
- CPU_INTENSIVE
  - This thread pool is also a small pool in the Mule 4 runtime engine. By default, it has only two threads for each available core.
  - However, this pool is backed by a work queue, which helps it accept more tasks.
  - This pool is used by the Transform Message component; if the transformation contains complex logic or a very large script, it may block threads, which can slow down processing.
- BLOCKING_IO
  - This is the biggest of the three pools. It is an elastic pool, which means it can grow up to a maximum limit (the limit varies based on the type of container/runtime system) according to the number of requests.
  - Transactional scopes and transactional flows use this pool, because most transaction managers require all steps of a transaction to be performed on a single thread.
  - Tasks running in this pool should spend most of their time in WAITING or BLOCKED states instead of performing CPU work, so that they do not compete with the work of the other pools.
Custom Thread Pools
Apart from the default three pools, the Mule 4 runtime uses some additional pools for specific purposes:
- NIO Selector: components use non-blocking IO as required; internally, the Java NIO Selector is what connectors and components rely on most of the time.
- Recurring Pools: some connectors or components create this type of custom pool for recurring tasks.
- GRIZZLY
  - In the Mule 4 runtime, this is one of the most heavily used custom thread pools; it is used by the HTTP components.
  - This is an NIO selector thread pool; Java NIO has the concept of a selector thread.
  - This pool is also configured at the runtime level and is shared by the applications deployed to that runtime.
  - GRIZZLY is divided into two pools: GRIZZLY (Shared) and GRIZZLY (Dedicated). The shared one is used by the HTTP Listener, and the HTTP Requester uses the dedicated one.
Thread Pool Configuration
The minimum size of each thread pool is determined by the CPU size and is decided when the runtime starts.
Here is the Mule 4 thread pool configuration, which depends on how much CPU and RAM we have configured.
Name of Pool | Minimum Size | Maximum Size | When the Pool Is Created by the Runtime
---|---|---|---
CPU_LITE | #cores | 2 * #cores | Mule runtime startup
CPU_INTENSIVE | #cores | 2 * #cores | Mule runtime startup
BLOCKING_IO | #cores | #cores + ((mem - 245760) / 5120) | Mule runtime startup
GRIZZLY (Shared) | #cores | #cores + 1 | Deployment of the first app using an HTTP Listener
GRIZZLY (Dedicated) | #cores | #cores + 1 | Deployment of each app using an HTTP Requester
Example of a Mule Container
For a Mule runtime sitting on a 2-core CPU with 1 GB of memory (machine or container), the following table shows the minimum and maximum values for each thread pool.
Example | Name of Pool | Minimum Size | Maximum Size
---|---|---|---
2-core CPU with 1 GB RAM | CPU_LITE | 2 | 4
 | CPU_INTENSIVE | 2 | 4
 | BLOCKING_IO | 2 | 151
 | GRIZZLY | 2 | 3
What to Know Before Customizing a Mule 4 Container
Based on the results of performance testing a Mule application, we may have to customize our Mule engine.
To make that decision, we need to understand each pool and how it is used; that knowledge is what allows fine-tuning a Mule server for better performance.
Mule 4 calculates the sizing of thread pools dynamically and automatically, and in most scenarios the defaults are optimal. Under most circumstances, MuleSoft does not recommend changing the default values. However, performance testing can reveal that the default thread pool sizing is insufficient, for example when a high number of HTTP requests is combined with a relatively low memory allocation.
Here are the pools and the event processors that typically run on them (based on the examples above):
Pool Name | Event Processors
---|---
CPU_LITE | Logger, non-blocking parts of HTTP Listener/Requester, handshaking between processors in a flow
CPU_INTENSIVE | Transform Message (DataWeave)
BLOCKING_IO | Database operations (for example, a SELECT), SFTP read, transactional scopes and flows
GRIZZLY (Shared) | HTTP Listener selector threads
GRIZZLY (Dedicated) | HTTP Requester selector threads
The thread pool sizing is changed in the following file of the Mule runtime: MULE_HOME/conf/schedulers-pools.conf
Mule 4 Container Configuration
The thread pools are automatically configured by Mule at startup, applying formulas that take available resources such as CPU and memory into account.
We can modify these global formulas by editing the MULE_HOME/conf/schedulers-pools.conf file in our local Mule instance.
In Mule, we have two scheduling strategies (the strategy is selected in schedulers-pools.conf, as shown in the snippet below):
- UBER: Unified scheduling strategy. (Default)
- DEDICATED: Separated pools strategy. (Legacy)
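As a minimal sketch of how the strategy is selected (property name as shipped in recent Mule 4 runtimes; verify it against your own schedulers-pools.conf):

```properties
# MULE_HOME/conf/schedulers-pools.conf
# Selects the scheduling strategy: UBER (default) or DEDICATED
org.mule.runtime.scheduler.threadPool.strategy=UBER
```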
UBER Scheduling Strategy
When the strategy is set to UBER, the following configuration applies (a worked example of the maxSize formula follows the listing):
org.mule.runtime.scheduler.uber.threadPool.coreSize=cores
org.mule.runtime.scheduler.uber.threadPool.maxSize=max(2, cores + ((mem - 245760) / 5120))
org.mule.runtime.scheduler.uber.workQueue.size=0
org.mule.runtime.scheduler.uber.threadPool.threadKeepAlive=30000
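To make the formula concrete, here is a small standalone Java sketch (not part of Mule) that mirrors the default maxSize calculation, assuming cores is the number of available processors and mem is the maximum JVM heap expressed in KB:

```java
public class UberPoolSizeEstimate {

    public static void main(String[] args) {
        // cores: number of CPU cores visible to the JVM
        int cores = Runtime.getRuntime().availableProcessors();

        // mem: maximum heap available to the JVM, in KB
        // (assumption: this is what the schedulers-pools.conf formula calls "mem")
        long memKb = Runtime.getRuntime().maxMemory() / 1024;

        // max(2, cores + ((mem - 245760) / 5120))
        long maxSize = Math.max(2, cores + ((memKb - 245760) / 5120));

        System.out.println("Estimated uber.threadPool.maxSize = " + maxSize);
    }
}
```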
DEDICATED Scheduling Strategy
When the strategy is set to DEDICATED, the parameters from the default UBER strategy are ignored.
To enable this configuration, uncomment the following parameters in our schedulers-pools.conf file:
org.mule.runtime.scheduler.cpuLight.threadPool.size=2*cores
org.mule.runtime.scheduler.cpuLight.workQueue.size=0
org.mule.runtime.scheduler.io.threadPool.coreSize=cores
org.mule.runtime.scheduler.io.threadPool.maxSize=max(2, cores + ((mem - 245760) / 5120))
org.mule.runtime.scheduler.io.workQueue.size=0
org.mule.runtime.scheduler.io.threadPool.threadKeepAlive=30000
org.mule.runtime.scheduler.cpuIntensive.threadPool.size=2*cores
org.mule.runtime.scheduler.cpuIntensive.workQueue.size=2*cores
Use Case for Container Tuning
- Issue: Calling Java code from DataWeave can lead to performance issues.
- Explanation: DataWeave ideally executes on CPU_INTENSIVE threads, since transformations should be processed in a non-blocking fashion. If the DataWeave script calls Java code, the underlying Java code is executed synchronously, which can leave CPU_INTENSIVE threads blocked for significant periods.
Solution 1
- Invoke the Java code through the Java module instead of calling it from DataWeave (see the sketch below).
- Use Java module 1.2.5 or higher, which supports executing Java code in the BLOCKING_IO thread pool.
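As a minimal sketch of Solution 1, the blocking logic can live in a plain Java class (the package, class, and method names below are hypothetical) that is then called from the flow with the Java module's invoke operation rather than from inside a DataWeave script:

```java
package com.example.integration; // hypothetical package

/**
 * Hypothetical helper class. With Java module 1.2.5 or higher, a static
 * method like this can be invoked from the flow via the Java module, so
 * the blocking work no longer ties up a CPU_INTENSIVE thread from inside
 * a DataWeave expression.
 */
public class CustomerLookup {

    public static String findCustomerName(String customerId) {
        // Placeholder for the blocking call (for example, a legacy API or
        // library call) that previously ran inside DataWeave.
        return "customer-" + customerId;
    }
}
```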
Solution 2
- Ideally, the DataWeave component executes on CPU_INTENSIVE, which has a limited number of threads (two threads per core).
- We can make DataWeave use the BLOCKING_IO pool instead of CPU_INTENSIVE, which gives us access to a larger number of threads.
- The BLOCKING_IO thread pool is also better suited for blocking operations.
- To make this change, the following argument must be passed to the JVM (for a standalone runtime, one way to set it is shown below):
-Dmule.dwScript.processingType=BLOCKING
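How the argument is passed depends on how the runtime is started. For an on-premise standalone runtime, a common option is an additional entry in MULE_HOME/conf/wrapper.conf (the index 99 below is only an example; use the next unused index in your file):

```properties
# MULE_HOME/conf/wrapper.conf
wrapper.java.additional.99=-Dmule.dwScript.processingType=BLOCKING
```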