Evaluator Module
fairlib.src.evaluators.__init__
- fairlib.src.evaluators.__init__.present_evaluation_scores(valid_preds, valid_labels, valid_private_labels, test_preds, test_labels, test_private_labels, epoch, epochs_since_improvement, model, epoch_valid_loss, is_best, prefix='checkpoint')
Conduct evaluation, present results, and save evaluation results to file.
- Parameters
valid_preds (np.array) – model predictions over the validation dataset.
valid_labels (np.array) – true labels over the validation dataset.
valid_private_labels (np.array) – protected labels over the validation dataset.
test_preds (np.array) – model predictions over the test dataset.
test_labels (np.array) – true labels over the test dataset.
test_private_labels (np.array) – protected labels over the test dataset.
epoch (float) – current epoch number of the model training.
epochs_since_improvement (int) – number of epochs since the best epoch was last updated.
model (torch.nn.Module) – the trained model.
epoch_valid_loss (float) – loss over the validation dataset.
is_best (bool) – indicator of whether the current epoch is the best.
prefix (str, optional) – prefix of checkpoint file names. Defaults to “checkpoint”.
fairlib.src.evaluators.evaluator
- fairlib.src.evaluators.evaluator.Aggregation_GAP(distinct_groups, all_scores, metric='TPR', group_agg_power=None, class_agg_power=2)
Aggregate fairness metrics at the group level and class level.
- Parameters
distinct_groups (list) – a list of distinct labels of protected groups.
all_scores (dict) – confusion-matrix-based scores for each protected group and for all instances.
metric (str, optional) – fairness metric. Defaults to “TPR”.
group_agg_power (int, optional) – generalized mean aggregation power at the group level. Use absolute value aggregation if None. Defaults to None.
class_agg_power (int, optional) – generalized mean aggregation power at the class level. Defaults to 2.
- Returns
aggregated fairness score.
- Return type
np.array
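The two-level aggregation described above can be illustrated with a minimal sketch. It is not fairlib's implementation. The function name `aggregate_gap_sketch`, the `"overall"` key, and the layout of `all_scores` are assumptions made for illustration, and summing the absolute gaps when `group_agg_power` is None is one plausible reading of "absolute value aggregation".

```python
import numpy as np

def aggregate_gap_sketch(distinct_groups, all_scores, metric="TPR",
                         group_agg_power=None, class_agg_power=2):
    """Hypothetical sketch: aggregate per-class gaps between groups and overall."""
    # Stack per-group scores into an array of shape (n_classes, n_groups).
    group_scores = np.stack(
        [np.asarray(all_scores[g][metric], dtype=float) for g in distinct_groups],
        axis=1,
    )
    # Assumed layout: overall scores are stored under the key "overall".
    overall = np.asarray(all_scores["overall"][metric], dtype=float).reshape(-1, 1)
    gaps = np.abs(group_scores - overall)

    if group_agg_power is None:
        # Assumption: absolute value aggregation means summing absolute gaps.
        per_class = gaps.sum(axis=1)
    else:
        # Generalized mean over groups, within each class.
        per_class = np.mean(gaps ** group_agg_power, axis=1) ** (1.0 / group_agg_power)

    # Generalized mean over classes (root mean square when the power is 2).
    return np.mean(per_class ** class_agg_power) ** (1.0 / class_agg_power)

example_scores = {
    "overall": {"TPR": np.array([0.80, 0.90])},
    0: {"TPR": np.array([0.90, 0.85])},
    1: {"TPR": np.array([0.70, 0.95])},
}
tpr_gap = aggregate_gap_sketch([0, 1], example_scores)
```

In this example the per-class absolute gaps sum to 0.2 and 0.1, and the class-level aggregation is their root mean square.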
- fairlib.src.evaluators.evaluator.Aggregation_Ratio(distinct_groups, all_scores, metric='TPR', group_agg_power=None, class_agg_power=2)
Aggregate fairness metric ratios at the group level and class level.
- Parameters
distinct_groups (list) – a list of distinct labels of protected groups.
all_scores (dict) – confusion-matrix-based scores for each protected group and for all instances.
metric (str, optional) – fairness metric. Defaults to “TPR”.
group_agg_power (int, optional) – generalized mean aggregation power at the group level. Use absolute value aggregation if None. Defaults to None.
class_agg_power (int, optional) – generalized mean aggregation power at the class level. Defaults to 2.
- Returns
aggregated fairness score.
- Return type
np.array
- fairlib.src.evaluators.evaluator.confusion_matrix_based_scores(cnf)
Calculate confusion matrix based scores.
Implementation from https://stackoverflow.com/a/43331484. See https://en.wikipedia.org/wiki/Confusion_matrix for the different scores.
- Parameters
cnf (np.array) – a confusion matrix.
- Returns
a set of metrics for each class, indexed by the metric name.
- Return type
dict
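As a rough illustration of how per-class scores can be derived from a confusion matrix, the sketch below follows the general approach of the linked Stack Overflow answer. The function name `confusion_scores_sketch` is hypothetical. The sketch assumes rows are true classes and columns are predicted classes, and that PPR means the share of all instances predicted as a class; the set of metrics returned by the library may differ.

```python
import numpy as np

def confusion_scores_sketch(cnf):
    """Hypothetical sketch: per-class rates from a square confusion matrix."""
    cnf = np.asarray(cnf, dtype=float)
    tp = np.diag(cnf)                  # correctly predicted, per class
    fp = cnf.sum(axis=0) - tp          # predicted as the class, but wrong
    fn = cnf.sum(axis=1) - tp          # belongs to the class, but missed
    tn = cnf.sum() - (tp + fp + fn)    # everything else

    # Note: a class with no instances yields a division by zero (NaN).
    return {
        "TPR": tp / (tp + fn),         # true positive rate (recall)
        "FPR": fp / (fp + tn),         # false positive rate
        "PPR": (tp + fp) / cnf.sum(),  # assumed: positive prediction rate
    }

scores = confusion_scores_sketch([[8, 2],
                                  [1, 9]])
```

Each value in the returned dict is an array with one entry per class, which matches the "indexed by the metric name" description above.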
- fairlib.src.evaluators.evaluator.gap_eval_scores(y_pred, y_true, protected_attribute, metrics=['TPR', 'FPR', 'PPR'], args=None)
Evaluate fairness by comparing confusion-matrix-based metrics across protected groups.
- Parameters
y_pred (np.array) – model predictions.
y_true (np.array) – target labels.
protected_attribute (np.array) – protected labels.
metrics (list, optional) – a list of metric names that will be considered for fairness evaluation. Defaults to [“TPR”, “FPR”, “PPR”].
- Returns
(fairness evaluation results, confusion matrices)
- Return type
tuple
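One building block of this kind of evaluation is a confusion matrix for each protected group alongside one for the full dataset. The sketch below shows that step only, using NumPy. The function name `groupwise_confusion_sketch` and the `"overall"` key are assumptions for illustration, and class labels are assumed to be integers starting at 0.

```python
import numpy as np

def groupwise_confusion_sketch(y_pred, y_true, protected_attribute):
    """Hypothetical sketch: confusion matrices overall and per protected group."""
    y_pred = np.asarray(y_pred, dtype=int)
    y_true = np.asarray(y_true, dtype=int)
    protected_attribute = np.asarray(protected_attribute)
    n_classes = int(max(y_true.max(), y_pred.max())) + 1

    def confusion(pred, true):
        # Rows are true classes, columns are predicted classes.
        cnf = np.zeros((n_classes, n_classes), dtype=int)
        np.add.at(cnf, (true, pred), 1)
        return cnf

    matrices = {"overall": confusion(y_pred, y_true)}
    for group in np.unique(protected_attribute):
        mask = protected_attribute == group
        matrices[int(group)] = confusion(y_pred[mask], y_true[mask])
    return matrices

matrices = groupwise_confusion_sketch(
    y_pred=[0, 1, 1, 1, 0, 0],
    y_true=[0, 0, 1, 1, 0, 1],
    protected_attribute=[0, 0, 0, 1, 1, 1],
)
```

Per-group rates such as TPR can then be computed from each matrix and compared against the overall rates.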
- fairlib.src.evaluators.evaluator.power_mean(series, p, axis=0)
Calculate the generalized mean of a given list.
- Parameters
series (list) – a list of numbers.
p (int) – power of the generalized mean aggregation.
axis (int, optional) – axis of the input along which to aggregate. Defaults to 0.
- Returns
aggregated scores.
- Return type
np.array
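For reference, the generalized (power) mean of values x_1, ..., x_n with power p is (mean(x_i ** p)) ** (1 / p). A minimal sketch, assuming a nonzero p and non-negative inputs, follows; the name `power_mean_sketch` is hypothetical and the library version may handle edge cases differently.

```python
import numpy as np

def power_mean_sketch(series, p, axis=0):
    """Hypothetical sketch: generalized mean, (mean(x ** p)) ** (1 / p)."""
    values = np.asarray(series, dtype=float)
    # p = 1 gives the arithmetic mean, p = 2 the root mean square.
    return np.mean(values ** p, axis=axis) ** (1.0 / p)

gaps = [0.1, 0.2, 0.4]
arithmetic = power_mean_sketch(gaps, p=1)
rms = power_mean_sketch(gaps, p=2)
```

Larger values of p weight the largest entries more heavily, so the root mean square here is greater than the arithmetic mean.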
fairlib.src.evaluators.utils
- fairlib.src.evaluators.utils.print_network(net, verbose=False)
Print the neural network architecture and the number of parameters.
- Parameters
net (torch.nn.Module) – the model object.
verbose (bool, optional) – whether to print the model architecture. Defaults to False.
- fairlib.src.evaluators.utils.save_checkpoint(epoch, epochs_since_improvement, model, loss, dev_evaluations, valid_confusion_matrices, test_confusion_matrices, test_evaluations, is_best, checkpoint_dir, prefix='checkpoint', dev_predictions=None, test_predictions=None)
Save checkpoints to a specified file.
- Parameters
epoch (float) – current epoch number of the model training.
epochs_since_improvement (int) – number of epochs since the best epoch was last updated.
model (torch.nn.Module) – the trained model.
loss (float) – training loss.
dev_evaluations (dict) – evaluation results over the development set.
valid_confusion_matrices (dict) – a dict of confusion matrices over the validation set.
test_confusion_matrices (dict) – a dict of confusion matrices over the test set.
test_evaluations (dict) – evaluation results over the test set.
is_best (bool) – indicator of whether the current epoch is the best.
checkpoint_dir (str) – path to the checkpoint directory.
prefix (str, optional) – the prefix of checkpoint file names. Defaults to “checkpoint”.
dev_predictions (_type_, optional) – save the model predictions over the development set if needed. Defaults to None.
test_predictions (_type_, optional) – save the model predictions over the test set if needed. Defaults to None.