Mind the Long Tail: Understanding the Difficulty of Delay Detection in Business Processes

This is the supplementary GitHub repository of the paper "Mind the Long Tail: Understanding the Difficulty of Delay Detection in Business Processes", submitted to BPM 2026.

Supplementary Report

The supplementary report of the paper is available in this repository as Supplementary_Report.pdf.

Installation

Clone this GitHub repository to your local machine. To install and set up the required environment on a Linux system, run the following commands:

conda create -n delay python=3.11
conda activate delay
pip install -r imbalanced_regression.txt
conda clean --all

Running Experiments

To execute the pipeline for a dataset (e.g., BPIC20PTC) and an imbalanced regression technique (e.g., BMSE), run the following:

python main.py --dataset BPIC20PTC --IR BMSE

If no imbalanced regression technique is passed via --IR, the Vanilla model is trained.
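The fallback behaviour described above can be sketched with argparse. This is only an illustration, not the actual code in main.py: the flag names --dataset and --IR come from this README, while build_parser and resolve_technique are hypothetical helpers introduced for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the flag names are taken from the README.
    parser = argparse.ArgumentParser(description="Illustrative pipeline entry point")
    parser.add_argument("--dataset", required=True, help="Event log name, e.g. BPIC20PTC")
    parser.add_argument("--IR", default=None, help="Imbalanced regression technique, e.g. BMSE")
    return parser


def resolve_technique(ir):
    # When --IR is omitted, fall back to the Vanilla model.
    return "Vanilla" if ir is None else ir


args = build_parser().parse_args(["--dataset", "BPIC20PTC"])
print(resolve_technique(args.IR))  # Vanilla
```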

CSW (Cost Sensitive re-Weighting) and EAL (Error-Aware Loss) can be combined with Label Distribution Smoothing (LDS) and/or Feature Distribution Smoothing (FDS). Therefore, the pipeline includes running experiments with four different configurations (wos: without smoothing, LDS, FDS, LDS+FDS). For more information, please refer to Delving into Deep Imbalanced Regression and its corresponding GitHub repository.
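To give an intuition for LDS, the following minimal sketch bins the labels, smooths the label histogram with a Gaussian kernel, and uses the inverse of the smoothed density as per-sample weights. It is a simplified illustration of the idea from Delving into Deep Imbalanced Regression, not the code used in this repository. The function names, the bin width, and the kernel settings are assumptions, and labels are assumed to be non-negative (e.g., durations).

```python
import math
from collections import Counter


def gaussian_kernel(size=5, sigma=2.0):
    # Symmetric, normalized Gaussian window (size and sigma are illustrative).
    half = size // 2
    values = [math.exp(-(i ** 2) / (2 * sigma ** 2)) for i in range(-half, half + 1)]
    total = sum(values)
    return [v / total for v in values]


def lds_weights(labels, bin_width, kernel_size=5, sigma=2.0):
    # 1) Discretize the non-negative labels into equal-width bins.
    bins = [int(y // bin_width) for y in labels]
    n_bins = max(bins) + 1
    counts = Counter(bins)
    hist = [counts.get(b, 0) for b in range(n_bins)]

    # 2) Convolve the empirical histogram with the kernel.
    kernel = gaussian_kernel(kernel_size, sigma)
    half = kernel_size // 2
    smoothed = []
    for b in range(n_bins):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = b + j - half
            if 0 <= idx < n_bins:
                acc += w * hist[idx]
        smoothed.append(acc)

    # 3) Weight each sample by the inverse smoothed density of its bin,
    #    rescaled so that the weights average to 1.
    raw = [1.0 / smoothed[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Six short cases, two slightly longer ones, and one long-tail case.
weights = lds_weights([1, 1, 1, 1, 1, 1, 2, 2, 30], bin_width=1)
```

In this toy example the long-tail label (30) receives a larger weight than the frequent labels, which is the effect re-weighting aims for.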

Balanced MSE (BMSE) cannot be combined with LDS, but its authors suggest that FDS is complementary to their technique. Therefore, the pipeline includes experiments with two configurations (wos and FDS). For more information, please refer to Balanced MSE for Imbalanced Visual Regression and its corresponding GitHub repository.
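As a rough illustration of the technique, the sketch below implements the batch-based Monte Carlo (BMC) variant of Balanced MSE in plain Python: each prediction is scored against all targets in the batch, and the loss is a cross-entropy that rewards matching the prediction's own target. Which BMSE variant this repository uses is not stated here, so treat the choice of BMC, the function name, and the fixed noise_var value as assumptions.

```python
import math


def bmc_loss(preds, targets, noise_var):
    # Batch-based Monte Carlo form of Balanced MSE (illustrative, pure Python).
    n = len(preds)
    total = 0.0
    for i in range(n):
        # Logit j: negative scaled squared distance between prediction i and target j.
        logits = [-((preds[i] - targets[j]) ** 2) / (2 * noise_var) for j in range(n)]
        # Numerically stable log-sum-exp over the batch targets.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the "correct class" being the sample's own target.
        total += log_norm - logits[i]
    # Rescale by 2 * noise_var so the magnitude is comparable to a squared error.
    return (total / n) * (2 * noise_var)


good = bmc_loss([0.0, 100.0], [0.0, 100.0], noise_var=1.0)
bad = bmc_loss([100.0, 0.0], [0.0, 100.0], noise_var=1.0)
```

Here good is close to zero because every prediction is nearest to its own target, while bad is large because the predictions are swapped.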

The pipeline includes the same two configurations (wos and FDS) for Squared Error Relevance Area (SERA). For more information, please refer to Model Optimization in Imbalanced Regression and Imbalanced regression and extreme value prediction. The original implementation of SERA is provided in R (in this package); we reimplemented it in Python.
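For intuition, SERA integrates, over relevance thresholds t in [0, 1], the sum of squared errors of the cases whose relevance is at least t. The sketch below approximates that integral with the trapezoidal rule. It is not the implementation used in this repository: the function name, the number of steps, and the assumption that relevance scores in [0, 1] are already computed for each case are all illustrative.

```python
def sera(y_true, y_pred, relevance, steps=1000):
    # Evenly spaced relevance thresholds between 0 and 1.
    thresholds = [k / steps for k in range(steps + 1)]
    # SER_t: squared error summed over cases with relevance >= t.
    ser = [
        sum((p - y) ** 2 for y, p, r in zip(y_true, y_pred, relevance) if r >= t)
        for t in thresholds
    ]
    # Trapezoidal approximation of the area under the SER_t curve.
    return sum((ser[k] + ser[k + 1]) / 2 for k in range(steps)) / steps


y_true = [1.0, 2.0]
y_pred = [2.0, 2.0]  # only the first case is mispredicted
err_on_rare = sera(y_true, y_pred, relevance=[1.0, 0.1])
err_on_common = sera(y_true, y_pred, relevance=[0.1, 1.0])
```

The same error costs more when it falls on the highly relevant (rare) case, so err_on_rare is larger than err_on_common.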

To run SMOGN, switch to the SMOGN branch and run the following (adjust the argument values as needed):

python main.py --dataset BPIC20PTC --sampling SMOGN --smogn_rel_thres 0.8 --smogn_over_ratio 5.0 --smogn_under_ratio 0.3

To train the uncertainty-aware approach based on survival analysis, set the --IR argument to 'survival'. To train an uncertainty-aware model based on quantile regression instead, set --IR to 'quantile'.
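Quantile regression models are commonly trained with the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically. The sketch below shows that standard loss in plain Python. Whether the 'quantile' option in this repository uses exactly this formulation is an assumption, and pinball_loss is a name introduced for the example.

```python
def pinball_loss(y_true, y_pred, q):
    # q is the target quantile in (0, 1), e.g. 0.9 for the 90th percentile.
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff > 0) costs q * diff;
        # over-prediction (diff < 0) costs (1 - q) * |diff|.
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)


under = pinball_loss([10.0], [8.0], q=0.9)   # predicted too short a duration
over = pinball_loss([10.0], [12.0], q=0.9)   # predicted too long a duration
```

For a high quantile such as 0.9, underestimating a duration is penalized more heavily than overestimating it by the same amount.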

All event logs are collected here.
All configurations used for hyper-parameter optimization and training are collected here. Adjust cdg.data.path in the cfg file so that it points to the XES or CSV file.

Once the survival model is trained, run the second step, which performs uncertainty-aware classification and trains the point-estimate deterministic baseline, for a dataset (e.g., BPIC20PTC) as follows:

python delay_analysis.py --dataset BPIC20PTC
