Training
metrics
smape
smape(actual, predicted)
Symmetric Mean Absolute Percentage Error (https://vedexcel.com/how-to-calculate-smape-in-python/).
Parameters:
- actual (np.ndarray) – actual values
- predicted (np.ndarray) – predicted values
Returns:
- smape (float) – symmetric mean absolute percentage error
Source code in churn_pred/training/metrics.py
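Following the formula from the linked reference, a minimal sketch of this metric (the repo's implementation in metrics.py may handle edge cases such as zero denominators differently):

```python
import numpy as np

def smape(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Symmetric mean absolute percentage error, expressed in percent (0-200)."""
    # denominator is the mean of the absolute actual and predicted values
    denom = (np.abs(actual) + np.abs(predicted)) / 2
    return float(np.mean(np.abs(predicted - actual) / denom) * 100)
```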
ndcg
ndcg(actual, predicted)
Normalized Discounted Cumulative Gain.
Parameters:
- actual (np.ndarray) – actual values
- predicted (np.ndarray) – predicted values
Returns:
- ndcg (float) – normalized discounted cumulative gain
Source code in churn_pred/training/metrics.py
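A minimal sketch of NDCG over two value arrays, assuming the actual values serve as relevance gains and items are ranked by the predicted scores (the repo's implementation may use a different gain or discount scheme):

```python
import numpy as np

def ndcg(actual: np.ndarray, predicted: np.ndarray) -> float:
    # rank items by predicted score, descending
    order = np.argsort(predicted)[::-1]
    gains = actual[order]
    # logarithmic position discounts: 1/log2(rank+1) for ranks starting at 1
    discounts = 1 / np.log2(np.arange(2, len(actual) + 2))
    dcg = np.sum(gains * discounts)
    # ideal DCG: gains sorted in the best possible order
    idcg = np.sum(np.sort(actual)[::-1] * discounts)
    return float(dcg / idcg)
```

A perfect ranking yields 1.0; any misordering lowers the score.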
rmse
rmse(actual, predicted)
Root Mean Squared Error.
Parameters:
- actual (np.ndarray) – actual values
- predicted (np.ndarray) – predicted values
Returns:
- rmse (float) – root mean square error
Source code in churn_pred/training/metrics.py
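The standard definition, as a one-line sketch:

```python
import numpy as np

def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Root mean squared error: sqrt of the mean squared residual."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```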
nacil
nacil(
    actual, high_quantile_predicted, low_quantile_predicted
)
Normalized Average Confidence Interval Length for quantile regression. The interval length is normalized by the actual (expected) value.
Parameters:
- actual (np.ndarray) – actual values
- high_quantile_predicted (np.ndarray) – high-quantile predicted values
- low_quantile_predicted (np.ndarray) – low-quantile predicted values
Returns:
- nacil (float) – normalized average confidence interval length
Source code in churn_pred/training/metrics.py
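Reading the docstring literally (interval length divided by the actual value, averaged over samples), a hypothetical sketch; the repo's version may guard against zero or negative actuals:

```python
import numpy as np

def nacil(actual: np.ndarray,
          high_quantile_predicted: np.ndarray,
          low_quantile_predicted: np.ndarray) -> float:
    # interval length per sample, normalized by the expected value
    lengths = (high_quantile_predicted - low_quantile_predicted) / actual
    return float(np.mean(lengths))
```

Smaller values indicate tighter (more informative) prediction intervals relative to the target's scale.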
aiqc
aiqc(actual, high_quantile, low_quantile)
Average InterQuantile Coverage for quantile regression. Checks whether the interquantile coverage is close to the nominal coverage on the test dataset.
Parameters:
- actual (np.ndarray) – actual values
- high_quantile (np.ndarray) – high-quantile predicted values
- low_quantile (np.ndarray) – low-quantile predicted values
Returns:
- aiqc (float) – average interquantile coverage
Source code in churn_pred/training/metrics.py
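Empirical coverage is typically the fraction of actuals falling inside the predicted interval; a hypothetical sketch under that assumption:

```python
import numpy as np

def aiqc(actual: np.ndarray,
         high_quantile: np.ndarray,
         low_quantile: np.ndarray) -> float:
    # fraction of actual values that land inside [low, high]
    covered = (actual >= low_quantile) & (actual <= high_quantile)
    return float(np.mean(covered))
```

For well-calibrated quantile models, this should be close to the nominal interquantile coverage (e.g. 0.8 for the 0.1–0.9 quantile pair).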
optuna_optimizer
LGBOptunaOptimizer
LGBOptunaOptimizer(objective, n_class=None)
Bases: BaseOptimizer
Fallback/backup Optuna optimizer. Development is focused on Ray Tune; this code is kept as a backup.
Parameters:
- objective (str) – objective of the model
- n_class (int) – number of classes in the dataset
Source code in churn_pred/training/optuna_optimizer.py
optimize
optimize(dtrain, deval)
Optimize the LGBM model on the provided datasets.
Parameters:
- dtrain (lgbDataset) – training lgb dataset
- deval (lgbDataset) – evaluation lgb dataset
Source code in churn_pred/training/optuna_optimizer.py
trainer
Trainer
Trainer(
    cat_cols,
    target_col,
    id_cols,
    objective,
    optimizer=None,
    n_class=None,
    preprocessors=None,
)
Bases: BaseTrainer
Object that governs training and parameter optimization of the LGBM model.
Parameters:
- cat_cols (list) – list of categorical feature column names
- target_col (str) – column name that represents the target
- id_cols (list) – identification column names
- objective (str) – type of the task/objective
- optimizer (BaseOptimizer) – parameter optimizer object
Source code in churn_pred/training/trainer.py
train
train(df_train, params=None, df_valid=None)
Train the model with the given parameters.
Parameters:
- df_train (pd.DataFrame) – training dataset
- params (dict) – model parameters
- df_valid (pd.DataFrame) – optional validation dataset
Returns:
- model (lgb.basic.Booster) – trained model
Source code in churn_pred/training/trainer.py
fit
fit(df_train, df_valid, df_test)
Train the model and optimize the parameters.
Parameters:
- df_train (pd.DataFrame) – training dataset
- df_valid (pd.DataFrame) – validation dataset
- df_test (pd.DataFrame) – testing dataset
Returns:
- model (lgb.basic.Booster) – trained model
Source code in churn_pred/training/trainer.py
utils
flatten_dict
flatten_dict(d, parent_key='', sep='_')
Flatten a nested dictionary; fastest approach according to https://www.freecodecamp.org/news/how-to-flatten-a-dictionary-in-python-in-4-different-ways/
Source code in churn_pred/training/utils.py
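The recursive approach from the linked article, matching this signature, looks like the following (a sketch; the repo's variant may differ in detail):

```python
def flatten_dict(d: dict, parent_key: str = '', sep: str = '_') -> dict:
    """Flatten nested dicts, joining keys at each level with `sep`."""
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            # recurse into nested dicts, carrying the accumulated key prefix
            items.extend(flatten_dict(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)
```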
get_or_create_experiment
get_or_create_experiment(experiment_name)
Retrieve the ID of an existing MLflow experiment or create a new one if it doesn't exist.
This function checks whether an experiment with the given name exists in MLflow. If it does, the function returns its ID; if not, it creates a new experiment with the provided name and returns its ID.
Parameters:
- experiment_name (str) – name of the MLflow experiment
Returns:
- str – ID of the existing or newly created MLflow experiment
Source code in churn_pred/training/utils.py
to_lgbdataset
to_lgbdataset(
    train, cat_cols, target_col, id_cols=[], valid=None
)
Transform a pandas dataframe into an lgbm dataset, or datasets (eval).
Parameters:
- train (pd.DataFrame) – training dataset
- cat_cols (list) – list of categorical columns
- target_col (str) – target column in the dataset
- id_cols (list) – list of identifier columns
- valid (pd.DataFrame) – validation dataset
Returns:
- lgb_train (lgbDataset) – lgbm training dataset
- lgb_valid (lgbDataset) – lgbm validation dataset
Source code in churn_pred/training/utils.py
predict_proba_lgbm_from_raw
predict_proba_lgbm_from_raw(
    preds_raw, task, binary2d=False
)
Apply softmax to the array of arrays output by lightgbm.predict(). This replaces predict_proba().
Parameters:
- preds_raw (ndarray) – 1D numpy array of arrays
- task (str) – type of task/objective
- binary2d (boolean) – whether the output of binary classification should be a 1D or 2D vector
Returns:
- predicted_probs (ndarray) – array with predicted probabilities
Source code in churn_pred/training/utils.py
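A sketch of the raw-margin-to-probability conversion this describes, assuming sigmoid for binary tasks and a row-wise softmax for multiclass (the exact branching in utils.py may differ):

```python
import numpy as np

def predict_proba_lgbm_from_raw(preds_raw: np.ndarray,
                                task: str,
                                binary2d: bool = False) -> np.ndarray:
    if task == 'binary':
        # sigmoid turns raw margins into P(class=1)
        p = 1 / (1 + np.exp(-preds_raw))
        # optionally expand to the 2-column [P(0), P(1)] shape of predict_proba()
        return np.column_stack([1 - p, p]) if binary2d else p
    # multiclass: numerically stable row-wise softmax
    z = preds_raw - preds_raw.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```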
predict_cls_lgbm_from_raw
predict_cls_lgbm_from_raw(preds_raw, task)
Helper function to convert raw margin predictions through a sigmoid into probabilities.
Parameters:
- preds_raw (ndarray) – predictions
- task (Literal['binary', 'multiclass']) – task type
Returns:
- y_true, preds – tuple containing labels and predictions for further evaluation
Source code in churn_pred/training/utils.py
get_feature_importance
get_feature_importance(model)
Extract model feature importances and return a sorted dataframe.
Parameters:
- model (lgb.basic.Booster) – LightGBM model
Returns:
- feature_imp (pd.DataFrame) – sorted dataframe with features and their importances
Source code in churn_pred/training/utils.py