ComplexIndexEncoder Module API Reference
ComplexIndexEncoder
Bases: BaseEncoder
Source code in src/enc4ppm/complex_index_encoder.py
__init__(include_timestamps=False, *, labeling_type=LabelingType.NEXT_ACTIVITY, attributes=[], categorical_encoding=CategoricalEncoding.STRING, numerical_scaling=NumericalScaling.NONE, prefix_length=None, prefix_strategy=PrefixStrategy.UP_TO_SPECIFIED, add_time_features=False, timestamp_format=None, case_id_key='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp', outcome_key='outcome')
Initialize the ComplexIndexEncoder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
include_timestamps | bool | Whether to add timestamp columns. | False |
labeling_type | LabelingType | Label type to apply to examples. | NEXT_ACTIVITY |
attributes | list[str] \| str | Which attributes to consider. Can be a list of attribute names or the string 'all' (all attributes found in the log will be encoded). | [] |
categorical_encoding | CategoricalEncoding | How to encode categorical features. They can either remain strings (CategoricalEncoding.STRING) or be converted to one-hot vectors split across multiple columns (CategoricalEncoding.ONE_HOT). | STRING |
numerical_scaling | NumericalScaling | How to scale numerical features. They can be standardized (NumericalScaling.STANDARDIZATION) or left as-is (NumericalScaling.NONE). | NONE |
prefix_length | int | Maximum prefix length to consider: longer prefixes are discarded, and shorter prefixes may be discarded depending on the prefix_strategy parameter. If not provided, defaults to the maximum prefix length found in the log. If provided, it must be a positive integer. | None |
prefix_strategy | PrefixStrategy | Whether to consider prefix lengths from 1 to prefix_length (PrefixStrategy.UP_TO_SPECIFIED) or only the specified prefix_length (PrefixStrategy.ONLY_SPECIFIED). | UP_TO_SPECIFIED |
add_time_features | bool | Whether to add time features (time since case start and time since last event) to the encoding. | False |
timestamp_format | str | Format of the timestamps in the log. If not provided, the format will be inferred from the data. | None |
case_id_key | str | Column name for case identifiers. | 'case:concept:name' |
activity_key | str | Column name for activity names. | 'concept:name' |
timestamp_key | str | Column name for timestamps. | 'time:timestamp' |
outcome_key | str | Column name for outcome prediction. | 'outcome' |
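The interaction between prefix_length and prefix_strategy can be illustrated with a minimal, standalone sketch. It does not use or reproduce the library's implementation: the function build_prefixes and its only_specified flag are hypothetical names introduced here, and the sketch only mirrors the behavior described in the table above (lengths 1 to prefix_length versus exactly prefix_length).

```python
def build_prefixes(trace, prefix_length, only_specified=False):
    """Return the prefixes of `trace` selected by the chosen strategy.

    Hypothetical helper for illustration only, not part of enc4ppm.
    """
    if prefix_length < 1:
        raise ValueError("prefix_length must be a positive integer")

    if only_specified:
        # Analogous to PrefixStrategy.ONLY_SPECIFIED: a single length.
        lengths = [prefix_length]
    else:
        # Analogous to PrefixStrategy.UP_TO_SPECIFIED: lengths 1..prefix_length.
        lengths = range(1, prefix_length + 1)

    # A trace shorter than a requested length cannot produce that prefix.
    return [trace[:n] for n in lengths if n <= len(trace)]


trace = ["register", "check", "approve", "pay"]
up_to = build_prefixes(trace, 3)                        # prefixes of length 1, 2, 3
only = build_prefixes(trace, 3, only_specified=True)    # only the length-3 prefix
```

In this sketch a case with fewer than prefix_length events yields no prefix at all under the "only specified" strategy, which is one way shorter prefixes can end up discarded.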
encode(df, *, freeze=False)
Encode the provided DataFrame with complex-index encoding and apply the specified labeling.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame to encode. | required |
freeze | bool | Freeze the encoder with the provided parameters. Usually set to True when encoding the training log and False otherwise. Required if you want to save the encoder to a file later. | False |
Returns:
Type | Description |
---|---|
DataFrame | The encoded DataFrame. |
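As a rough illustration of what a complex-index encoded row looks like, the sketch below flattens one prefix into a single record with one group of columns per event position. It is a conceptual example under stated assumptions, not the library's code: the function complex_index_row, the column names (event_1, org:resource_1, ...), the use of plain dictionaries for events, and padding with None are all hypothetical and may differ from what encode actually returns.

```python
def complex_index_row(prefix, prefix_length, attributes=(), pad=None):
    """Flatten a prefix (list of event dicts) into one row.

    Hypothetical helper for illustration only, not part of enc4ppm.
    """
    row = {}
    for i in range(1, prefix_length + 1):
        # Positions beyond the end of the prefix are padded.
        event = prefix[i - 1] if i <= len(prefix) else {}
        row[f"event_{i}"] = event.get("concept:name", pad)
        for attr in attributes:
            # Each attribute gets its own column per event position.
            row[f"{attr}_{i}"] = event.get(attr, pad)
    return row


prefix = [
    {"concept:name": "register", "org:resource": "alice"},
    {"concept:name": "check", "org:resource": "bob"},
]
row = complex_index_row(prefix, prefix_length=3, attributes=["org:resource"])
```

Every row has the same set of columns regardless of how long the prefix is, which is what lets prefixes of different lengths share one tabular DataFrame.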