Encoding For Predictive Process Monitoring (enc4ppm)
enc4ppm
is a Python package than provides common process mining encodings.
Installation
Using pip:
pip install enc4ppm
Example
The following example performs frequency encoding with latest payload for next activity prediction task:
import pandas as pd
from enc4ppm.frequency_encoder import FrequencyEncoder
from enc4ppm.constants import LabelingType
# Load log
log = pd.read_csv('bpic2012.csv')
# Create encoder
encoder = FrequencyEncoder(
labeling_type=LabelingType.NEXT_ACTIVITY,
include_latest_payload=True,
attributes=['AMOUNT_REQ'],
)
# Encode log
encoded_log = encoder.encode(log)
Features
- Frequency, simple-index and complex-index encodings
- Next activity, remaining time and outcome labelings
- Save encoder to disk for later use
- Freeze encoder on training set, then use it on unseen data (automatic handling of unknown values)
- Standardize numerical features
- Convert categorical features to one-hot encoding, or keep them as strings
- Add time features (time since case start and time since last event) to the encoding