Training¶
aiqc(actual, high_quantile_predicted, low_quantile_predicted)
¶
Average InterQuantile Coverage for Quantile Regression. Checks whether the interquantile coverage is close to the expected coverage on the test dataset.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
actual | ndarray | actual values | required |
high_quantile_predicted | ndarray | high quantile predicted values | required |
low_quantile_predicted | ndarray | low quantile predicted values | required |
Source code in inference_model/training/metrics.py
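The source listing was stripped in extraction, so here is a minimal sketch of what an interquantile-coverage metric typically computes: the fraction of actual values falling inside the predicted [low, high] interval. The exact implementation in metrics.py may differ.

```python
import numpy as np

def aiqc(actual, high_quantile_predicted, low_quantile_predicted):
    # Fraction of actual values that fall inside the predicted interval.
    inside = (actual >= low_quantile_predicted) & (actual <= high_quantile_predicted)
    return float(np.mean(inside))

# Example: 1 is inside [0, 2], 5 is outside [6, 7], 10 is inside [8, 12]
aiqc(np.array([1.0, 5.0, 10.0]), np.array([2.0, 7.0, 12.0]), np.array([0.0, 6.0, 8.0]))
```

For a well-calibrated 90% prediction interval, this value should be close to 0.9 on held-out data.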
nacil(actual, high_quantile_predicted, low_quantile_predicted)
¶
Normalized Average Confidence Interval Length for Quantile Regression. The value is normalized by the actual (expected) value.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
actual | ndarray | actual values | required |
high_quantile_predicted | ndarray | high quantile predicted values | required |
low_quantile_predicted | ndarray | low quantile predicted values | required |
Source code in inference_model/training/metrics.py
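Since the source listing is not shown, here is a minimal sketch under the assumption stated in the description: interval length divided by the actual value, averaged over samples.

```python
import numpy as np

def nacil(actual, high_quantile_predicted, low_quantile_predicted):
    # Mean interval width, each width normalized by its actual (expected) value.
    return float(np.mean((high_quantile_predicted - low_quantile_predicted) / actual))

# Example: widths 4 and 6 against actuals 10 and 20 -> mean(0.4, 0.3) = 0.35
nacil(np.array([10.0, 20.0]), np.array([12.0, 24.0]), np.array([8.0, 18.0]))
```

Lower is better: narrow intervals relative to the magnitude of the target. It is usually read together with aiqc, since intervals can be made arbitrarily narrow at the cost of coverage.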
ndcg(actual, predicted)
¶
Normalized Discounted Cumulative Gain
Parameters:

Name | Type | Description | Default |
---|---|---|---|
actual | ndarray | actual values | required |
predicted | ndarray | predicted values | required |
Source code in inference_model/training/metrics.py
rmse(actual, predicted)
¶
Root Mean Squared Error
Parameters:

Name | Type | Description | Default |
---|---|---|---|
actual | ndarray | actual values | required |
predicted | ndarray | predicted values | required |
Source code in inference_model/training/metrics.py
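The standard formulation, as a minimal sketch (the stripped source is presumably equivalent):

```python
import numpy as np

def rmse(actual, predicted):
    # Square root of the mean squared error between actual and predicted values.
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Errors are (0, 0, 2) -> sqrt(4/3)
rmse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))
```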
smape(actual, predicted)
¶
Symmetric Mean Absolute Percentage Error. Reference: https://vedexcel.com/how-to-calculate-smape-in-python/
Parameters:

Name | Type | Description | Default |
---|---|---|---|
actual | ndarray | actual values | required |
predicted | ndarray | predicted values | required |
Source code in inference_model/training/metrics.py
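A minimal sketch of the formula from the linked reference, where each absolute error is scaled by the mean magnitude of the actual/predicted pair:

```python
import numpy as np

def smape(actual, predicted):
    # Symmetric MAPE, in percent: |F - A| / ((|A| + |F|) / 2), averaged.
    return 100.0 * float(np.mean(
        np.abs(predicted - actual) / ((np.abs(actual) + np.abs(predicted)) / 2)
    ))

smape(np.array([100.0, 200.0]), np.array([110.0, 180.0]))
```

Unlike plain MAPE, the symmetric form is bounded (0 to 200%) and does not explode when actual values are near zero, which matters for intermittent-demand targets.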
LGBOptunaOptimizer
¶
Bases: BaseOptimizer
Source code in inference_model/training/optuna_optimizer.py
__init__(objective, n_class=None)
¶
Fallback/backup Optuna optimizer. Development is focused on Ray Tune; keeping this code as a backup.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
objective | str | objective of the model | required |
n_class | int | number of classes in the dataset | None |
Source code in inference_model/training/optuna_optimizer.py
optimize(dtrain, deval)
¶
Optimize LGBM model on provided datasets.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
dtrain | Dataset | training lgb dataset | required |
deval | Dataset | evaluation lgb dataset | required |
Source code in inference_model/training/optuna_optimizer.py
Trainer
¶
Bases: BaseTrainer
Source code in inference_model/training/trainer.py
__init__(cat_cols, target_col, id_cols, objective, optimizer=None, n_class=None, preprocessors=None)
¶
Object that governs training and parameter optimization of the LightGBM model.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
cat_cols | list | list of categorical feature column names | required |
target_col | str | column name that represents the target | required |
id_cols | list | identification column names | required |
objective | str | type of the task/objective | required |
optimizer | BaseOptimizer | parameter optimizer object | None |
n_class | int | number of classes in the dataset | None |
preprocessors | List[Union[Any, PreprocessData]] | ordered list of objects to preprocess the dataset before optimization and training | None |
Source code in inference_model/training/trainer.py
fit(df_train, df_valid, df_test, random_state=1)
¶
Train the model and optimize the parameters.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
df_train | DataFrame | training dataset | required |
df_valid | DataFrame | validation dataset | required |
df_test | DataFrame | testing dataset | required |
random_state | int | random seed | 1 |
Source code in inference_model/training/trainer.py
train(df_train, params=None, df_valid=None)
¶
Train the model with the parameters.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
df_train | DataFrame | training dataset | required |
params | dict | model parameters | None |
df_valid | DataFrame | optional validation dataset | None |
Source code in inference_model/training/trainer.py
flatten_dict(d, parent_key='', sep='_')
¶
Flatten a nested dictionary. Fastest approach according to https://www.freecodecamp.org/news/how-to-flatten-a-dictionary-in-python-in-4-different-ways/
Source code in inference_model/training/utils.py
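A sketch of the usual recursive approach, matching the signature above (nested keys are joined with sep):

```python
def flatten_dict(d, parent_key="", sep="_"):
    # Recursively flatten nested dicts, joining key paths with `sep`.
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

flatten_dict({"a": 1, "b": {"c": 2, "d": {"e": 3}}})
# → {"a": 1, "b_c": 2, "b_d_e": 3}
```

This is handy for logging nested hyperparameter dicts to trackers like MLflow, which expect flat key/value pairs.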
get_feature_importance(model)
¶
Extract model feature importances and return sorted dataframe.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
model | Booster | LightGBM model | required |

Returns:

Name | Type | Description |
---|---|---|
feature_imp | DataFrame | sorted dataframe with features and their importances |
Source code in inference_model/training/utils.py
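A minimal sketch built on the standard Booster accessors, feature_name() and feature_importance(); the column names here are illustrative, not necessarily those used in utils.py:

```python
import pandas as pd

def get_feature_importance(model):
    # Pair each feature name with its importance and sort descending.
    feature_imp = pd.DataFrame(
        {"feature": model.feature_name(), "importance": model.feature_importance()}
    ).sort_values("importance", ascending=False, ignore_index=True)
    return feature_imp
```

By default LightGBM reports split counts; pass importance_type="gain" to feature_importance() if total gain is the more meaningful measure for your model.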
get_or_create_experiment(experiment_name)
¶
Retrieve the ID of an existing MLflow experiment or create a new one if it doesn't exist.
This function checks if an experiment with the given name exists within MLflow. If it does, the function returns its ID. If not, it creates a new experiment with the provided name and returns its ID.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
experiment_name | str | Name of the MLflow experiment. | required |

Returns:

Type | Description |
---|---|
str | ID of the existing or newly created MLflow experiment. |
Source code in inference_model/training/utils.py
predict_cls_lgbm_from_raw(preds_raw, task)
¶
Helper function to convert raw margin predictions through a sigmoid to represent a probability.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
preds_raw | ndarray | raw margin predictions | required |
task | str | type of the task/objective | required |
Source code in inference_model/training/utils.py
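A sketch of the conversion, assuming the task strings "binary" and "multiclass" (the actual values accepted by utils.py are not shown here): a sigmoid plus 0.5 threshold for binary margins, an argmax over per-class margins otherwise.

```python
import numpy as np

def predict_cls_lgbm_from_raw(preds_raw, task):
    preds_raw = np.asarray(preds_raw)
    if task == "binary":
        # Sigmoid maps raw margins to probabilities; threshold at 0.5 for labels.
        proba = 1.0 / (1.0 + np.exp(-preds_raw))
        return (proba >= 0.5).astype(int)
    # Multiclass: raw margins arrive as (n_samples, n_class); take the argmax.
    return preds_raw.argmax(axis=1)

predict_cls_lgbm_from_raw(np.array([-2.0, 0.5]), "binary")
# → array([0, 1])
```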
predict_proba_lgbm_from_raw(preds_raw, task, binary2d=False)
¶
Apply a softmax to the array of arrays output by lightgbm.Booster.predict() on raw margins. This replaces predict_proba().

Parameters:

Name | Type | Description | Default |
---|---|---|---|
preds_raw | ndarray | 1D numpy array of arrays | required |
task | str | type of task/objective | required |
binary2d | bool | whether the output of binary classification should be a 1D or 2D vector | False |
Source code in inference_model/training/utils.py
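A sketch under the same assumed task strings as above ("binary"/"multiclass"): a sigmoid for binary margins, a numerically stabilized row-wise softmax for multiclass margins, with binary2d optionally expanding binary output to two columns.

```python
import numpy as np

def predict_proba_lgbm_from_raw(preds_raw, task, binary2d=False):
    preds_raw = np.asarray(preds_raw)
    if task == "binary":
        p = 1.0 / (1.0 + np.exp(-preds_raw))  # sigmoid of raw margin
        # Optionally return (n_samples, 2) columns [P(class 0), P(class 1)].
        return np.column_stack([1 - p, p]) if binary2d else p
    # Multiclass: row-wise softmax; subtract the row max for numerical stability.
    shifted = preds_raw - preds_raw.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)
```

Each multiclass row then sums to 1, matching what sklearn-style predict_proba() would return.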
to_lgbdataset(train, cat_cols, target_col, id_cols=[], valid=None)
¶
Transform a pandas dataframe into a LightGBM dataset, or datasets (train and eval).
Parameters:

Name | Type | Description | Default |
---|---|---|---|
train | DataFrame | training dataset | required |
cat_cols | list | list of categorical columns | required |
target_col | str | target column in the dataset | required |
id_cols | list | list of identifier columns | [] |
valid | DataFrame | validation dataset | None |
Source code in inference_model/training/utils.py