Evaluator Module

fairlib.src.evaluators.__init__

fairlib.src.evaluators.__init__.present_evaluation_scores(valid_preds, valid_labels, valid_private_labels, test_preds, test_labels, test_private_labels, epoch, epochs_since_improvement, model, epoch_valid_loss, is_best, prefix='checkpoint')

Conduct evaluation, present results, and save evaluation results to file.

Parameters

valid_preds (np.array) – model predictions over the validation dataset.
valid_labels (np.array) – true labels over the validation dataset.
valid_private_labels (np.array) – protected labels over the validation dataset.
test_preds (np.array) – model predictions over the test dataset.
test_labels (np.array) – true labels over the test dataset.
test_private_labels (np.array) – protected labels over the test dataset.
epoch (float) – current epoch of model training.
epochs_since_improvement (int) – number of epochs since the best epoch was last updated.
model (torch.nn.Module) – the trained model.
epoch_valid_loss (float) – loss over the validation dataset.
is_best (bool) – indicator of whether the current epoch is the best.
prefix (str, optional) – prefix of the checkpoint file names. Defaults to “checkpoint”.

fairlib.src.evaluators.evaluator

fairlib.src.evaluators.evaluator.Aggregation_GAP(distinct_groups, all_scores, metric='TPR', group_agg_power=None, class_agg_power=2)

Aggregate fairness metrics at the group level and class level.

Parameters

distinct_groups (list) – a list of distinct labels of protected groups.
all_scores (dict) – confusion-matrix-based scores for each protected group and for all instances overall.
metric (str, optional) – fairness metric. Defaults to “TPR”.
group_agg_power (int, optional) – generalized mean aggregation power at the group level. Use absolute value aggregation if None. Defaults to None.
class_agg_power (int, optional) – generalized mean aggregation power at the class level. Defaults to 2.

Returns

aggregated fairness score.

Return type

np.array
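
The two-level aggregation described above can be sketched as follows. This is a minimal illustration and not the library's implementation. It assumes that all_scores maps each group label (plus a hypothetical "overall" key) to a dict of per-class metric arrays, that the gap is the difference between a group's score and the overall score, and that "absolute value aggregation" means summing absolute gaps over groups. The name aggregation_gap_sketch is made up for this example.

```python
import numpy as np

def aggregation_gap_sketch(distinct_groups, all_scores, metric="TPR",
                           group_agg_power=None, class_agg_power=2):
    # Assumption: all_scores["overall"][metric] holds one score per class.
    overall = np.asarray(all_scores["overall"][metric], dtype=float)
    # Absolute gap of every group to the overall score: (n_groups, n_classes).
    abs_gaps = np.abs(np.stack([
        np.asarray(all_scores[g][metric], dtype=float) - overall
        for g in distinct_groups
    ]))
    if group_agg_power is None:
        # Assumption: absolute value aggregation = sum of absolute gaps.
        per_class = abs_gaps.sum(axis=0)
    else:
        # Generalized mean over groups.
        per_class = np.mean(abs_gaps ** group_agg_power, axis=0) ** (1.0 / group_agg_power)
    # Generalized mean over classes.
    return np.mean(per_class ** class_agg_power) ** (1.0 / class_agg_power)

# Toy scores for two protected groups and two classes.
scores = {
    "overall": {"TPR": [0.8, 0.6]},
    0: {"TPR": [0.9, 0.5]},
    1: {"TPR": [0.7, 0.7]},
}
print(aggregation_gap_sketch([0, 1], scores))
```

With class_agg_power=2, the class-level step is a root mean square, so classes with large gaps weigh more than they would under a plain average.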

fairlib.src.evaluators.evaluator.Aggregation_Ratio(distinct_groups, all_scores, metric='TPR', group_agg_power=None, class_agg_power=2)

Aggregate fairness metric ratios at the group level and class level.

Parameters

distinct_groups (list) – a list of distinct labels of protected groups.
all_scores (dict) – confusion-matrix-based scores for each protected group and for all instances overall.
metric (str, optional) – fairness metric. Defaults to “TPR”.
group_agg_power (int, optional) – generalized mean aggregation power at the group level. Use absolute value aggregation if None. Defaults to None.
class_agg_power (int, optional) – generalized mean aggregation power at the class level. Defaults to 2.

Returns

aggregated fairness score.

Return type

np.array

fairlib.src.evaluators.evaluator.confusion_matrix_based_scores(cnf)

Calculate confusion matrix based scores.

Implementation from https://stackoverflow.com/a/43331484. See https://en.wikipedia.org/wiki/Confusion_matrix for the different scores.

Parameters: cnf (np.array) – a confusion matrix.
Returns: a set of metrics for each class, indexed by the metric name.
Return type: dict
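
As a rough sketch of how per-class scores can be derived from a multi-class confusion matrix, the snippet below follows the usual one-vs-rest decomposition. It assumes rows are true labels and columns are predicted labels, and it reads PPR as the fraction of instances predicted as each class. Both are assumptions, and the function name and the selection of returned keys are illustrative only.

```python
import numpy as np

def confusion_scores_sketch(cnf):
    # Assumption: cnf[i, j] counts instances with true class i predicted as j.
    cnf = np.asarray(cnf, dtype=float)
    TP = np.diag(cnf)
    FP = cnf.sum(axis=0) - TP   # predicted as the class, but wrong
    FN = cnf.sum(axis=1) - TP   # belongs to the class, but missed
    TN = cnf.sum() - (TP + FP + FN)
    # Suppress warnings for classes with no instances; those entries become nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        return {
            "TPR": TP / (TP + FN),
            "TNR": TN / (TN + FP),
            "FPR": FP / (FP + TN),
            "FNR": FN / (FN + TP),
            "PPV": TP / (TP + FP),
            # Assumption: PPR = share of all instances predicted as the class.
            "PPR": (TP + FP) / cnf.sum(),
            "ACC": (TP + TN) / cnf.sum(),
        }

scores = confusion_scores_sketch([[5, 1], [2, 2]])
print(scores["TPR"], scores["FPR"])
```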

fairlib.src.evaluators.evaluator.gap_eval_scores(y_pred, y_true, protected_attribute, metrics=['TPR', 'FPR', 'PPR'], args=None)

Evaluate the fairness of model predictions with respect to the protected attribute.

Parameters

y_pred (np.array) – model predictions.
y_true (np.array) – target labels.
protected_attribute (np.array) – protected labels.
metrics (list, optional) – a list of metric names that will be considered for fairness evaluation. Defaults to [“TPR”, “FPR”, “PPR”].

Returns

(fairness evaluation results, confusion matrices)

Return type

tuple
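
To illustrate the idea of group-wise evaluation, the sketch below computes per-class TPR for the whole dataset and for each protected group separately. It is not the library's code. The helper names, the "overall" key, and the restriction to TPR are assumptions made for the example, and labels are assumed to be integer class indices starting at 0.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes):
    # Rows are true labels, columns are predicted labels.
    cnf = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cnf, (y_true, y_pred), 1)
    return cnf

def per_group_tpr_sketch(y_pred, y_true, protected_attribute):
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    protected_attribute = np.asarray(protected_attribute)
    n_classes = int(max(y_true.max(), y_pred.max())) + 1

    def tpr(mask):
        cnf = confusion_matrix_np(y_true[mask], y_pred[mask], n_classes)
        # A class that is absent from the subset yields nan.
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.diag(cnf) / cnf.sum(axis=1)

    # Hypothetical layout: one entry for all instances, one per group.
    scores = {"overall": tpr(np.ones(len(y_true), dtype=bool))}
    for g in np.unique(protected_attribute):
        scores[int(g)] = tpr(protected_attribute == g)
    return scores

y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(per_group_tpr_sketch(y_pred, y_true, groups))
```

In this toy data the overall TPR is the same for both classes, while the two groups differ per class. Group-wise gaps are meant to expose this kind of disparity.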

fairlib.src.evaluators.evaluator.power_mean(series, p, axis=0)

Calculate the generalized mean of a given list.

Parameters

series (list) – a list of numbers.
p (int) – power of the generalized mean aggregation.
axis (int, optional) – aggregation along which dim of the input. Defaults to 0.

Returns

aggregated scores.

Return type

np.array
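
For reference, the generalized (power) mean of values x_1, ..., x_n with power p is (mean(x_i ** p)) ** (1 / p). The sketch below implements only this textbook formula for non-zero finite p and non-negative inputs. How the library treats edge cases such as p = 0 or very large |p| is not covered here, and the name power_mean_sketch is made up for this example.

```python
import numpy as np

def power_mean_sketch(series, p, axis=0):
    # Textbook formula only: assumes p != 0 and non-negative values.
    values = np.asarray(series, dtype=float)
    return np.mean(values ** p, axis=axis) ** (1.0 / p)

# p = 1 is the arithmetic mean; p = 2 is the root mean square.
print(power_mean_sketch([1, 2, 3], p=1))
print(power_mean_sketch([3, 4], p=2))
# Aggregate a 2-D input along the first dimension.
print(power_mean_sketch([[1, 2], [3, 4]], p=1, axis=0))
```

Larger values of p push the result toward the maximum of the inputs, so a higher power puts more emphasis on the worst gap.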

fairlib.src.evaluators.utils

fairlib.src.evaluators.utils.print_network(net, verbose=False)

Print the neural network architecture and the number of parameters.

Parameters

net (torch.nn.Module) – the model object.
verbose (bool, optional) – whether to print the model architecture. Defaults to False.

fairlib.src.evaluators.utils.save_checkpoint(epoch, epochs_since_improvement, model, loss, dev_evaluations, valid_confusion_matrices, test_confusion_matrices, test_evaluations, is_best, checkpoint_dir, prefix='checkpoint', dev_predictions=None, test_predictions=None)

Save checkpoints to a specified file.

Parameters

epoch (float) – current epoch of model training.
epochs_since_improvement (int) – number of epochs since the best epoch was last updated.
model (torch.nn.Module) – the trained model.
loss (float) – training loss.
dev_evaluations (dict) – evaluation results over the development set.
valid_confusion_matrices (dict) – a dict of confusion matrices over the validation set.
test_confusion_matrices (dict) – a dict of confusion matrices over the test set.
test_evaluations (dict) – evaluation results over the test set.
is_best (bool) – indicator of whether the current epoch is the best.
checkpoint_dir (str) – path to the checkpoint directory.
prefix (str, optional) – the prefix of checkpoint file names. Defaults to “checkpoint”.
dev_predictions (_type_, optional) – save the model predictions over the development set if needed. Defaults to None.
test_predictions (_type_, optional) – save the model predictions over the test set if needed. Defaults to None.
