Evaluator Module

fairlib.src.evaluators.__init__

fairlib.src.evaluators.__init__.present_evaluation_scores(valid_preds, valid_labels, valid_private_labels, test_preds, test_labels, test_private_labels, epoch, epochs_since_improvement, model, epoch_valid_loss, is_best, prefix='checkpoint')

Conduct evaluation, present results, and save evaluation results to file.

Parameters
  • valid_preds (np.array) – model predictions over the validation dataset.

  • valid_labels (np.array) – true labels over the validation dataset.

  • valid_private_labels (np.array) – protected labels over the validation dataset.

  • test_preds (np.array) – model predictions over the test dataset.

  • test_labels (np.array) – true labels over the test dataset.

  • test_private_labels (np.array) – protected labels over the test dataset.

  • epoch (float) – current epoch number of the model training.

  • epochs_since_improvement (int) – number of epochs since the best epoch was last updated.

  • model (torch.nn.Module) – the trained model.

  • epoch_valid_loss (float) – loss over the validation dataset.

  • is_best (bool) – indicator of whether the current epoch is the best.

  • prefix (str, optional) – prefix of checkpoint file names. Defaults to "checkpoint".

fairlib.src.evaluators.evaluator

fairlib.src.evaluators.evaluator.Aggregation_GAP(distinct_groups, all_scores, metric='TPR', group_agg_power=None, class_agg_power=2)

Aggregate fairness metrics at the group level and class level.

Parameters
  • distinct_groups (list) – a list of distinct labels of protected groups.

  • all_scores (dict) – confusion-matrix-based scores for each protected group and for all instances combined.

  • metric (str, optional) – fairness metric. Defaults to “TPR”.

  • group_agg_power (int, optional) – generalized mean aggregation power at the group level. Use absolute value aggregation if None. Defaults to None.

  • class_agg_power (int, optional) – generalized mean aggregation power at the class level. Defaults to 2.

Returns

aggregated fairness score.

Return type

np.array
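The two-level aggregation described above can be sketched as follows. This is a minimal illustration, not fairlib's implementation. It assumes that all_scores maps each group label to a dict of per-class score arrays, that the combined scores live under a hypothetical "overall" key, and that "absolute value aggregation" means summing absolute gaps across groups.

```python
import numpy as np

def generalized_mean(values, p, axis=0):
    # Generalized (power) mean: (mean(x ** p)) ** (1 / p), for p != 0.
    values = np.asarray(values, dtype=float)
    return np.mean(values ** p, axis=axis) ** (1.0 / p)

def aggregate_gap_sketch(distinct_groups, all_scores, metric="TPR",
                         group_agg_power=None, class_agg_power=2):
    # Stack per-group scores into an array of shape (n_classes, n_groups).
    per_group = np.stack(
        [np.asarray(all_scores[g][metric], dtype=float) for g in distinct_groups],
        axis=1,
    )
    # Assumption: scores over all instances are stored under the key "overall".
    overall = np.asarray(all_scores["overall"][metric], dtype=float).reshape(-1, 1)
    # Absolute gap between each group's score and the overall score.
    gaps = np.abs(per_group - overall)
    if group_agg_power is None:
        # Assumption: absolute-value aggregation sums the gaps over groups.
        per_class = gaps.sum(axis=1)
    else:
        per_class = generalized_mean(gaps, group_agg_power, axis=1)
    # Aggregate the per-class gaps (root mean square when the power is 2).
    return generalized_mean(per_class, class_agg_power, axis=0)

scores = {
    0: {"TPR": np.array([0.9, 0.6])},
    1: {"TPR": np.array([0.7, 0.8])},
    "overall": {"TPR": np.array([0.8, 0.7])},
}
gap = aggregate_gap_sketch([0, 1], scores)
```

A smaller value indicates that per-group scores sit closer to the overall score, so lower is fairer under this sketch.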

fairlib.src.evaluators.evaluator.Aggregation_Ratio(distinct_groups, all_scores, metric='TPR', group_agg_power=None, class_agg_power=2)

Aggregate fairness metric ratios at the group level and class level.

Parameters
  • distinct_groups (list) – a list of distinct labels of protected groups.

  • all_scores (dict) – confusion-matrix-based scores for each protected group and for all instances combined.

  • metric (str, optional) – fairness metric. Defaults to “TPR”.

  • group_agg_power (int, optional) – generalized mean aggregation power at the group level. Use absolute value aggregation if None. Defaults to None.

  • class_agg_power (int, optional) – generalized mean aggregation power at the class level. Defaults to 2.

Returns

aggregated fairness score.

Return type

np.array

fairlib.src.evaluators.evaluator.confusion_matrix_based_scores(cnf)

Calculate confusion matrix based scores.

Implementation from https://stackoverflow.com/a/43331484. See https://en.wikipedia.org/wiki/Confusion_matrix for the different scores.

Parameters

cnf (np.array) – a confusion matrix.

Returns

a set of metrics for each class, indexed by the metric name.

Return type

dict
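As an illustration of how per-class scores can be derived from a multi-class confusion matrix, here is a minimal sketch. It is not the library's code. It assumes rows index true labels and columns index predictions, and the set of returned metric names is chosen for the example only.

```python
import numpy as np

def confusion_scores_sketch(cnf):
    # Assumption: cnf[i, j] counts instances with true class i predicted as j.
    cnf = np.asarray(cnf, dtype=float)
    tp = np.diag(cnf)                 # correct predictions per class
    fp = cnf.sum(axis=0) - tp         # predicted as the class, but wrong
    fn = cnf.sum(axis=1) - tp         # belong to the class, but missed
    tn = cnf.sum() - (tp + fp + fn)   # everything else
    # Note: a class with no instances or predictions yields a 0/0 division (NaN).
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "PPV": tp / (tp + fp),
        "ACC": (tp + tn) / cnf.sum(),
    }

example_scores = confusion_scores_sketch([[5, 1], [2, 4]])
```

Each value in the returned dict is an array with one entry per class, which matches the "indexed by the metric name" description above.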

fairlib.src.evaluators.evaluator.gap_eval_scores(y_pred, y_true, protected_attribute, metrics=['TPR', 'FPR', 'PPR'], args=None)

Evaluate fairness of model predictions with respect to the protected attribute.

Parameters
  • y_pred (np.array) – model predictions.

  • y_true (np.array) – target labels.

  • protected_attribute (np.array) – protected labels.

  • metrics (list, optional) – a list of metric names that will be considered for fairness evaluation. Defaults to ["TPR", "FPR", "PPR"].

Returns

(fairness evaluation results, confusion matrices)

Return type

tuple
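One ingredient of a group-wise evaluation like this is a confusion matrix per protected group. The sketch below shows one way to build them with NumPy. It is an assumption-laden illustration, not fairlib's code: labels are assumed to be integer-encoded from 0, and the "overall" key and helper names are hypothetical.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes):
    # Rows index true labels, columns index predicted labels.
    cnf = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cnf, (y_true, y_pred), 1)
    return cnf

def per_group_confusion_matrices(y_pred, y_true, protected_attribute):
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    protected_attribute = np.asarray(protected_attribute)
    # Assumption: class labels are integers 0..n_classes-1.
    n_classes = int(max(y_true.max(), y_pred.max())) + 1
    matrices = {"overall": confusion_matrix_np(y_true, y_pred, n_classes)}
    for g in np.unique(protected_attribute):
        mask = protected_attribute == g
        matrices[int(g)] = confusion_matrix_np(y_true[mask], y_pred[mask], n_classes)
    return matrices

y_true = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0])
groups = np.array([0, 0, 0, 1, 1, 1])
mats = per_group_confusion_matrices(y_pred, y_true, groups)
```

The per-group matrices sum to the overall matrix, which is a useful sanity check before computing any gap metric.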

fairlib.src.evaluators.evaluator.power_mean(series, p, axis=0)

Calculate the generalized mean of a given list.

Parameters
  • series (list) – a list of numbers.

  • p (int) – power of the generalized mean aggregation.

  • axis (int, optional) – axis of the input along which to aggregate. Defaults to 0.

Returns

aggregated scores.

Return type

np.array
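For reference, the generalized (power) mean of n values is (1/n · Σ xᵢ^p)^(1/p). A minimal NumPy sketch is given below. It is not the library's implementation, and the p = 0 branch (geometric mean, the limiting case) is an addition for completeness that assumes strictly positive inputs.

```python
import numpy as np

def power_mean_sketch(series, p, axis=0):
    x = np.asarray(series, dtype=float)
    if p == 0:
        # Limiting case as p -> 0: geometric mean (assumes positive inputs).
        return np.exp(np.mean(np.log(x), axis=axis))
    return np.mean(x ** p, axis=axis) ** (1.0 / p)

values = [1.0, 2.0, 4.0]
arithmetic = power_mean_sketch(values, 1)   # p = 1: arithmetic mean
rms = power_mean_sketch(values, 2)          # p = 2: root mean square
geometric = power_mean_sketch(values, 0)    # p = 0: geometric mean
```

Larger values of p weight the largest entries more heavily, so a higher aggregation power moves the result toward the worst-case gap.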

fairlib.src.evaluators.utils

fairlib.src.evaluators.utils.print_network(net, verbose=False)

Print the neural network architecture and the number of parameters.

Parameters
  • net (torch.nn.Module) – the model object.

  • verbose (bool, optional) – whether to print the model architecture. Defaults to False.

fairlib.src.evaluators.utils.save_checkpoint(epoch, epochs_since_improvement, model, loss, dev_evaluations, valid_confusion_matrices, test_confusion_matrices, test_evaluations, is_best, checkpoint_dir, prefix='checkpoint', dev_predictions=None, test_predictions=None)

Save checkpoints to a specified file.

Parameters
  • epoch (float) – current epoch number of the model training.

  • epochs_since_improvement (int) – number of epochs since the best epoch was last updated.

  • model (torch.nn.Module) – the trained model.

  • loss (float) – training loss.

  • dev_evaluations (dict) – evaluation results over the development set.

  • valid_confusion_matrices (dict) – a dict of confusion matrices over the validation set.

  • test_confusion_matrices (dict) – a dict of confusion matrices over the test set.

  • test_evaluations (dict) – evaluation results over the test set.

  • is_best (bool) – indicator of whether the current epoch is the best.

  • checkpoint_dir (str) – path to the checkpoint directory.

  • prefix (str, optional) – the prefix of checkpoint file names. Defaults to "checkpoint".

  • dev_predictions (_type_, optional) – save the model predictions over the development set if needed. Defaults to None.

  • test_predictions (_type_, optional) – save the model predictions over the test set if needed. Defaults to None.