Process Mining Key Elements
This article introduces process mining, explaining its key elements and practical applications for discovering and analyzing workflows using event data.
Process mining is a technique that helps organizations understand, analyze, and improve their processes. This article breaks it down into simple terms and explains how process mining can help users with little data mining background extract process-level metrics from their applications or tools.
Key Questions About Process Mining
The following questions give a high-level idea of process mining and help determine whether it’s a good fit for your analytics use case:
Why Process Mining?
Process mining provides a clear view of how processes actually run. It works on data extracted from systems such as ERP or CRM platforms, or from any event or transition log an application generates.
Why Now?
Digital processes are growing in number and becoming more complex, with many rule-based automated steps. Identifying end-to-end processes is hard when many systems are connected. Process mining not only discovers processes quickly across millions of events (in a distributed environment), but also provides tools to find bottlenecks and highways (the most frequently followed paths).
Why Data-Driven?
Process mining is a bottom-up approach. It discovers the process model from actual event data and provides tools to compare it with the expected process, which makes deviations easier to identify. Many BPM system users simply assume that all process instances comply with the designed business process and never monitor them.
Why Visualize Processes?
Visualizing workflows makes it easier for users without a technical background to do the analysis and to spot inefficiencies and compliance issues. Long process maps can be cut into smaller ones, and each sub-process can be analyzed separately. Visualization may not fit every use case, and automation may be needed to perform conformance checking or to trigger alerts when bottlenecks or anomalies appear.
Why Continuous Improvement?
Processes evolve. Process mining makes process owners’ lives easier by providing tools to continuously monitor and improve workflows.
Key Elements of Process Mining
Process mining has the following core elements:
1. Event Logs
Event logs are the input for process mining. Most algorithms expect at least three mandatory fields to discover a process: a unique case ID that ties together activities occurring at different timestamps, an activity name, and a timestamp. A process model can be discovered from any system that captures these three fields (some algorithms require only two mandatory fields). Additional attributes help with filtering the data.
Example
In an order-to-cash process, event logs might capture when an order was placed, processed, and shipped.
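The first thing most discovery tools do with such a log is group events by case ID and order them by timestamp, producing one trace per case. The sketch below shows that step under simple assumptions: the case IDs, activity names, and timestamps are illustrative, and `build_traces` is a hypothetical helper, not part of any process mining library.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative order-to-cash events: (case ID, activity name, timestamp).
# These three fields are the minimum most discovery algorithms expect.
EVENTS = [
    ("order-1", "Ship Order", "2024-12-09 10:00:00"),
    ("order-1", "Place Order", "2024-12-09 09:00:00"),
    ("order-2", "Place Order", "2024-12-09 09:10:00"),
    ("order-1", "Process Order", "2024-12-09 09:05:00"),
    ("order-2", "Process Order", "2024-12-09 09:15:00"),
]

def build_traces(events):
    """Group events by case ID and order each case's activities by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))
    # Sorting (timestamp, activity) tuples puts each case in chronological order.
    return {case: [activity for _, activity in sorted(evts)]
            for case, evts in by_case.items()}

traces = build_traces(EVENTS)
print(traces["order-1"])  # ['Place Order', 'Process Order', 'Ship Order']
```

The events are deliberately listed out of order: the timestamp, not the row position, determines the sequence within each case.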
2. Process Discovery
Process discovery involves creating visual models from event logs. This helps uncover how processes work.
Example
A manufacturing pipeline can be visualized as a process in which each device ID passes through a series of stations, each at a given time. Attributes such as a device’s Pass/Fail result can be aggregated at the variant level to see whether any specific variant fails more often.
3. Conformance Checking
This step compares the actual process (discovered from event logs) with the expected process model. Deviations can point to compliance issues or areas needing improvement. Aggregating the mean or median duration of each variant and monitoring it continuously also helps detect performance degradation.
Example
A bank might use conformance checking to ensure loan approvals follow regulatory guidelines.
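To make the idea concrete, here is a deliberately simplified, rule-based conformance check: the expected model is written as a set of allowed directly-follows transitions plus allowed start and end activities, and each trace is tested against it. The loan activities are hypothetical, and this is not the token-based replay or alignment techniques that process mining tools typically use; it only illustrates the "compare actual against expected" step.

```python
# Hypothetical expected model for a loan process: allowed directly-follows
# transitions plus the allowed start and end activities.
ALLOWED = {
    ("Submit Application", "Credit Check"),
    ("Credit Check", "Risk Review"),
    ("Risk Review", "Approve Loan"),
    ("Risk Review", "Reject Loan"),
}
START = "Submit Application"
ENDS = {"Approve Loan", "Reject Loan"}

def deviations(trace):
    """List every way a single trace departs from the expected model."""
    if not trace:
        return ["empty trace"]
    found = []
    if trace[0] != START:
        found.append(f"unexpected start: {trace[0]}")
    if trace[-1] not in ENDS:
        found.append(f"unexpected end: {trace[-1]}")
    # Check each consecutive pair of activities against the allowed transitions.
    for a, b in zip(trace, trace[1:]):
        if (a, b) not in ALLOWED:
            found.append(f"unexpected transition: {a} -> {b}")
    return found

compliant = ["Submit Application", "Credit Check", "Risk Review", "Approve Loan"]
skipped_check = ["Submit Application", "Risk Review", "Approve Loan"]
print(deviations(compliant))      # []
print(deviations(skipped_check))  # ['unexpected transition: Submit Application -> Risk Review']
```

Run over every case, the share of traces with no deviations gives a rough compliance rate that can be monitored over time.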
How Process Mining Differs from Data Mining
At first glance, process mining and data mining might seem similar because both analyze data. Below are some points that can help decide when to use process mining for a use case:
Focus on Processes
Data mining identifies patterns or trends in large datasets. For example, it might find a correlation between product sales and customer demographics.
Process mining, on the other hand, focuses specifically on workflows: how activities are carried out in sequence. For example, it can be used to study the steps that customers in a specific demographic take before placing an order. A full process journey can also visualize the post-order flow, including returns and customer service.
Event-Based Approach
Process mining uses event logs containing specific timestamps and sequences of activities. Data mining usually evaluates more aggregated data instead of raw events.
Goal
Data mining aims to predict future outcomes or classify data. Process mining aims to discover, monitor, and optimize processes.
Process Mining Algorithms
Why not simply sequence event logs by timestamp to understand processes, without using process mining? If your use case requires process discovery and analysis, it’s better to use process mining algorithms for the following reasons:
Handling Scale
Real-world processes involve millions of events, and click events on a website can number in the billions. Simply ordering activities by timestamp yields a huge number of distinct sequences, which makes it easy to draw wrong conclusions. Process mining algorithms generate a process model that is simpler because it removes weak connections. Most of these algorithms can run in a distributed environment.
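A common building block is the directly-follows graph: count how often each activity is immediately followed by another across all traces, then drop the weak edges. The sketch below assumes a simple pruning rule (keep an edge only if its count is at least `min_share` of the most frequent edge); real tools offer more sophisticated filters, and the website traces are illustrative.

```python
from collections import Counter

def directly_follows(traces):
    """Count how often activity a is directly followed by activity b across all traces."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts

def filter_weak_edges(dfg, min_share=0.1):
    """Drop edges whose count is below min_share of the most frequent edge (assumed rule)."""
    if not dfg:
        return {}
    threshold = min_share * max(dfg.values())
    return {edge: n for edge, n in dfg.items() if n >= threshold}

# Illustrative traces: 95 regular sessions and 5 rare ones with an unusual step.
traces = [["Browse", "Add to Cart", "Checkout"]] * 95 + \
         [["Browse", "Wishlist", "Checkout"]] * 5

dfg = directly_follows(traces)
simplified = filter_weak_edges(dfg, min_share=0.1)
print(simplified)
```

Because the graph is built from aggregated counts rather than from individual sequences, its size depends on the number of distinct activities, not on the number of events, which is what makes the approach practical at scale.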
Dealing With Variants
Processes don’t always follow a single path. Process mining algorithms identify variants and use them to provide an aggregated view. They also identify activities that run in parallel rather than sequentially.
Identifying Anomalies
Algorithms help discover inefficiencies, loops, and deviations.
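Loops often show up as rework: the same activity recorded more than once within one case. The following sketch flags such cases with a simple count per trace; the traces and the `find_rework` helper are hypothetical, and full discovery algorithms detect loops structurally in the model rather than only by counting repeats.

```python
from collections import Counter

def find_rework(traces):
    """Return, per case, the activities that occur more than once (loops or rework)."""
    result = {}
    for case_id, trace in traces.items():
        repeated = {a: n for a, n in Counter(trace).items() if n > 1}
        if repeated:
            result[case_id] = repeated
    return result

# Hypothetical traces; case 7 loops back to "Process Payment" after a failure.
traces = {
    6: ["Submit Order", "Process Payment", "Ship Order"],
    7: ["Submit Order", "Process Payment", "Payment Failed",
        "Process Payment", "Ship Order"],
}
print(find_rework(traces))  # {7: {'Process Payment': 2}}
```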
Auto Discovery
Process mining helps generate process models that can be visualized in a simpler form by removing weak edges and activities. It saves time and effort in the analysis of the discovered process.
Process Mining Beyond Formal Processes
Process mining is not limited to formal business workflows. Any application with event data can be analyzed using this technique. For example, in e-commerce, event logs can reveal customer navigation paths through a website. Visualizations such as Sankey diagrams can map user journeys and help analyze where sessions drop off.
Sample Event Data
Consider the following minimal sample event log:
| Case ID | Activity | Timestamp | Resource |
|---|---|---|---|
| 1 | Submit Order | 2024-12-09 09:00:00 | User1 |
| 1 | Process Payment | 2024-12-09 09:05:00 | User2 |
| 1 | Ship Order | 2024-12-09 10:00:00 | User3 |
| 2 | Submit Order | 2024-12-09 09:10:00 | User1 |
| 2 | Process Payment | 2024-12-09 09:15:00 | User2 |
| 2 | Quality Check | 2024-12-09 09:50:00 | User4 |
| 2 | Ship Order | 2024-12-09 11:00:00 | User3 |
Using this data, process mining can:
- Discover process model: Aggregate all dependencies and generate a simplified process map.
- Analyze highways and bottlenecks: Identify delays, e.g., the longer time between Process Payment and Ship Order in Case 2 (105 minutes, versus 55 minutes in Case 1).
- Evaluate performance: Slice and dice data by attributes like resources or timestamps to find bottlenecks.
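As a small illustration of the bottleneck analysis, the sketch below loads the sample table into plain Python tuples and measures, per case, the minutes between two activities. The helper name `minutes_between` is hypothetical; it uses the earliest occurrence of each activity in a case.

```python
from datetime import datetime

# The sample event log from the table: (case ID, activity, timestamp, resource).
LOG = [
    (1, "Submit Order", "2024-12-09 09:00:00", "User1"),
    (1, "Process Payment", "2024-12-09 09:05:00", "User2"),
    (1, "Ship Order", "2024-12-09 10:00:00", "User3"),
    (2, "Submit Order", "2024-12-09 09:10:00", "User1"),
    (2, "Process Payment", "2024-12-09 09:15:00", "User2"),
    (2, "Quality Check", "2024-12-09 09:50:00", "User4"),
    (2, "Ship Order", "2024-12-09 11:00:00", "User3"),
]

def minutes_between(log, source, target):
    """Minutes from the earliest `source` event to the earliest `target` event, per case."""
    earliest = {}
    for case_id, activity, ts, _resource in log:
        t = datetime.fromisoformat(ts)
        key = (case_id, activity)
        earliest[key] = min(earliest.get(key, t), t)
    cases = sorted({case_id for case_id, *_ in log})
    # Only cases that contain both activities produce a measurement.
    return {
        case: (earliest[(case, target)] - earliest[(case, source)]).total_seconds() / 60
        for case in cases
        if (case, source) in earliest and (case, target) in earliest
    }

print(minutes_between(LOG, "Process Payment", "Ship Order"))  # {1: 55.0, 2: 105.0}
```

Grouping the same measurement by the resource column instead of the case ID is one way to "slice and dice" the log when looking for bottlenecks.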
One of the most interesting use cases is finding the order of events that causes an aggregated metric to deviate from its normal trend. For example, the variant C->A->A->C->B might take more time on average than C->A->C->A->B, even though both contain the same activities.
In the example above, there are two variants:
- Submit Order -> Process Payment -> Ship Order
- Submit Order -> Process Payment -> Quality Check -> Ship Order
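The variants above, together with a per-variant time metric, can be derived directly from the sample log. This sketch (the helper name `variant_durations` is hypothetical) builds each case's activity sequence, treats it as the variant key, and averages the end-to-end case duration per variant, which is the same kind of aggregation used to compare orderings such as C->A->A->C->B and C->A->C->A->B on a larger log.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# The sample event log from the table (resource column omitted).
LOG = [
    (1, "Submit Order", "2024-12-09 09:00:00"),
    (1, "Process Payment", "2024-12-09 09:05:00"),
    (1, "Ship Order", "2024-12-09 10:00:00"),
    (2, "Submit Order", "2024-12-09 09:10:00"),
    (2, "Process Payment", "2024-12-09 09:15:00"),
    (2, "Quality Check", "2024-12-09 09:50:00"),
    (2, "Ship Order", "2024-12-09 11:00:00"),
]

def variant_durations(log):
    """Map each variant to (number of cases, mean case duration in minutes)."""
    cases = defaultdict(list)
    for case_id, activity, ts in log:
        cases[case_id].append((datetime.fromisoformat(ts), activity))
    durations = defaultdict(list)
    for events in cases.values():
        events.sort()  # chronological order within the case
        variant = " -> ".join(activity for _, activity in events)
        minutes = (events[-1][0] - events[0][0]).total_seconds() / 60
        durations[variant].append(minutes)
    return {v: (len(d), mean(d)) for v, d in durations.items()}

for variant, (n, avg) in variant_durations(LOG).items():
    print(f"{variant}: {n} case(s), mean {avg:.0f} min")
```

With only one case per variant the means are just the two case durations (60 and 110 minutes); the comparison becomes meaningful once many cases share each variant.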
Let’s add another attribute to the event data set: “returned”, a Boolean value. We can aggregate this value at the variant level and compare the two variants to see whether more items are returned when there is no Quality Check step. This is a very simplified example, but the same approach applies to many complex scenarios in which the sequence of activities affects critical metrics.
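A minimal sketch of that comparison follows. The case-level “returned” flags are invented purely for illustration (the sample table has no such column), and `return_rate_by_variant` is a hypothetical helper.

```python
from collections import defaultdict

NO_QC = "Submit Order -> Process Payment -> Ship Order"
WITH_QC = "Submit Order -> Process Payment -> Quality Check -> Ship Order"

# Hypothetical case-level data: (variant, returned). Values are invented.
CASES = [
    (NO_QC, True), (NO_QC, False), (NO_QC, True),
    (WITH_QC, False), (WITH_QC, False), (WITH_QC, True), (WITH_QC, False),
]

def return_rate_by_variant(cases):
    """Share of returned cases for each variant."""
    tallies = defaultdict(lambda: [0, 0])  # variant -> [returned, total]
    for variant, returned in cases:
        tallies[variant][0] += int(returned)
        tallies[variant][1] += 1
    return {v: returned / total for v, (returned, total) in tallies.items()}

for variant, rate in return_rate_by_variant(CASES).items():
    print(f"{variant}: {rate:.0%} returned")
```

On real data, a gap like this would be a starting point for investigation rather than proof that the Quality Check step causes fewer returns.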
Identifying Parallel Activities
Imagine a customer service process where different teams handle billing and technical issues simultaneously. Without process mining, these activities might appear sequential, because each variant in the event log records them in one particular order. However, a process mining algorithm can:
- Aggregate multiple variants into a model.
- Identify patterns where billing and technical activities overlap.
- Discover the parallel activities in the process map.
This insight helps optimize resource allocation.
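One classic way to detect such parallelism is the footprint used by the alpha miner: if activity a is directly followed by b in some traces and b by a in others, the two are treated as parallel. The sketch below computes these relations for hypothetical customer service traces. Like the alpha miner itself, this simple rule cannot tell true parallelism apart from a short loop (A -> B -> A), so treat it as an illustration rather than a complete algorithm.

```python
from itertools import permutations

def footprint(traces):
    """Classify ordered activity pairs with alpha-miner style footprint relations."""
    follows, activities = set(), set()
    for trace in traces:
        activities.update(trace)
        follows.update(zip(trace, trace[1:]))
    relations = {}
    for a, b in permutations(sorted(activities), 2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and ba:
            relations[(a, b)] = "parallel"        # observed in both orders
        elif ab:
            relations[(a, b)] = "causal"          # only a -> b observed
        elif ba:
            relations[(a, b)] = "reverse causal"  # only b -> a observed
        else:
            relations[(a, b)] = "unrelated"       # never directly adjacent
    return relations

# Hypothetical traces: billing and technical work recorded in either order.
TRACES = [
    ["Open Ticket", "Billing Fix", "Technical Fix", "Close Ticket"],
    ["Open Ticket", "Technical Fix", "Billing Fix", "Close Ticket"],
]
relations = footprint(TRACES)
print(relations[("Billing Fix", "Technical Fix")])  # parallel
print(relations[("Open Ticket", "Billing Fix")])    # causal
```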
Where to Start With Process Mining?
There are open-source libraries and event logs available online to practice process discovery and conformance checking. Below is a list of a few algorithms to start with, along with their ideal use cases.
- Directly-follows graph (DFG): Direct activity sequences.
- Alpha miner: Simple processes.
- Heuristic miner: Moderately complex and noisy processes.
- Inductive miner: Complex logs requiring a precise model.
- Fuzzy miner: Flexible processes requiring a very high-level overview.
- Clustering-based miner: Varied process logs.
- Declarative miner: Rule-based processes.
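To give a feel for how the heuristic miner copes with noise, the sketch below computes the dependency measure it is built on: for activities a and b, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts how often a is directly followed by b; for a self-loop the measure is |a>a| / (|a>a| + 1). A score close to 1 indicates a reliable dependency, while rare, contradictory observations score low. The traces are illustrative, and any threshold used to keep or drop edges is a tuning choice left to the user.

```python
from collections import Counter

def dependency_measures(traces):
    """Heuristics-miner style dependency score for every observed directly-follows pair."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    measures = {}
    for (a, b), ab in counts.items():
        if a == b:
            measures[(a, b)] = ab / (ab + 1)  # self-loop form of the measure
        else:
            ba = counts[(b, a)]  # Counter returns 0 for pairs never observed
            measures[(a, b)] = (ab - ba) / (ab + ba + 1)
    return measures

# Illustrative log: 40 cases follow A -> B -> C, one noisy case records A -> C -> B.
TRACES = [["A", "B", "C"]] * 40 + [["A", "C", "B"]]
m = dependency_measures(TRACES)
print(round(m[("B", "C")], 3))  # 0.929
```

Here B -> C scores about 0.93 while the noisy C -> B scores about -0.93, so a miner keeping only high-scoring edges would retain B -> C and discard the noise.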
Conclusion
When used properly and implemented correctly for applications at scale, process mining provides tools to analyze processes through visualizations. By discovering actual workflows and checking them for conformance, process owners can identify compliance issues. Compared to data mining, process mining focuses specifically on workflows. Still, it’s important to evaluate each use case carefully before choosing process mining for analytics, because many traditional use cases can be solved with existing data mining techniques.
Opinions expressed by DZone contributors are their own.