Analysis Module

fairlib.src.analysis.load_results

fairlib.src.analysis.load_results.model_selection_parallel(results_dir, project_dir, model_id, GAP_metric_name, Performance_metric_name, selection_criterion, checkpoint_dir='models', checkpoint_name='checkpoint_epoch', index_column_names=['adv_lambda', 'adv_num_subDiscriminator', 'adv_diverse_lambda'], n_jobs=20, save_path=None, return_all=False, keep_original_metrics=False)

Perform model selection over different runs with respect to different hyperparameters.

Parameters

results_dir (str) – directory of the saved experimental results.
project_dir (str) – experiment type identifier, e.g., final, hypertune, dev. Same as the value passed in the experiment arguments.
checkpoint_dir (str) – directory of the checkpoints, models by default.
checkpoint_name (str) – checkpoint file name prefix; checkpoints are named checkpoint_epoch{num_epoch}.ptr.gz.
model_id (str) – read all experiments whose names start with the same model_id, e.g., “Adv” when tuning hyperparameters for standard adversarial training.
GAP_metric_name (str) – fairness metric name in the log.
Performance_metric_name (str) – performance metric name in the log.
selection_criterion (str) – {GAP_metric_name | Performance_metric_name | “DTO”}
index_column_names (list) – tuned hyperparameters, [‘adv_lambda’, ‘adv_num_subDiscriminator’, ‘adv_diverse_lambda’] by default.
n_jobs (nonnegative int) – 0 for non-parallel, positive integer refers to the number of parallel processes

Returns

loaded results

Return type

pd.DataFrame
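
The selection step described above can be illustrated with a small, self-contained sketch. It does not call fairlib; it only shows the idea of keeping, for each hyperparameter setting, the epoch with the best value of the selection criterion. The helper select_best_epoch and the column names epoch and dev_DTO are hypothetical and chosen for illustration only.

```python
import pandas as pd


def select_best_epoch(df, index_columns, criterion_column, higher_is_better=False):
    """Keep, per hyperparameter setting, the row with the best criterion value.

    Hypothetical helper for illustration; not part of fairlib.
    """
    grouped = df.groupby(index_columns)[criterion_column]
    # DTO-like criteria are minimized; performance-like criteria are maximized.
    best_idx = grouped.idxmax() if higher_is_better else grouped.idxmin()
    return df.loc[best_idx].set_index(index_columns)


# Toy log: two values of adv_lambda, two epochs each (made-up numbers).
runs = pd.DataFrame(
    {
        "adv_lambda": [0.1, 0.1, 1.0, 1.0],
        "epoch": [0, 1, 0, 1],
        "dev_DTO": [0.40, 0.25, 0.30, 0.35],
    }
)
selected = select_best_epoch(runs, ["adv_lambda"], "dev_DTO")
```

Here selected has one row per adv_lambda value, indexed by the tuned hyperparameter, which mirrors the role of index_column_names.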

fairlib.src.analysis.tables_and_figures

fairlib.src.analysis.tables_and_figures.final_results_df(results_dict, model_order=None, Fairness_metric_name='fairness', Performance_metric_name='performance', pareto=True, pareto_selection='test', selection_criterion='DTO', return_dev=True, Fairness_threshold=0.0, Performance_threshold=0.0, return_conf=False, save_conf_dir=None, num_trail=None, additional_metrics=[])

Process the results into a single DataFrame for creating tables and plots.

Parameters

results_dict (dict) – retrieved results dictionary, typically the dict returned by the function retrive_results.
model_order (list, optional) – a list of models that will be considered in the final df. Defaults to None.
Fairness_metric_name (str, optional) – the metric name for fairness evaluation. Defaults to “fairness”.
Performance_metric_name (str, optional) – the metric name for performance evaluation. Defaults to “performance”.
pareto (bool, optional) – whether or not to return only the Pareto frontiers. Defaults to True.
pareto_selection (str, optional) – which split is used to select the frontiers. Defaults to “test”.
selection_criterion (str, optional) – model selection criterion, one of {performance, fairness, both (DTO)}. Defaults to “DTO”.
return_dev (bool, optional) – whether or not to return dev results in the df. Defaults to True.
Fairness_threshold (float, optional) – minimum fairness; rows below this threshold are filtered out. Defaults to 0.0.
Performance_threshold (float, optional) – minimum performance; rows below this threshold are filtered out. Defaults to 0.0.
return_conf (bool, optional) – if True, return the selected epoch and the corresponding YAML configuration files. Defaults to False.
save_conf_dir (str, optional) – directory in which to save the selected epoch and configuration files. Defaults to None.
num_trail (int, optional) – if not None, downsample the number of searches of each method to num_trail. Defaults to None.
additional_metrics (list, optional) – report additional evaluation metrics for the selected epoch. Defaults to [].

Returns

selected results of different models for report

Return type

pandas.DataFrame

fairlib.src.analysis.tables_and_figures.interactive_plot(plot_df, figsize=(12, 7), dpi=100, selection='DTO')

Create interactive plots for DTO and constrained selection.

Parameters

plot_df (pd.DataFrame) – a pd.DataFrame including numbers for each method.
figsize (tuple, optional) – figure size in tuple. Defaults to (12, 7).
dpi (int, optional) – figure resolution. Defaults to 100.
selection (str, optional) – constrained | DTO, indicating which model selection approach is used. Defaults to “DTO”.

fairlib.src.analysis.tables_and_figures.make_zoom_plot(plot_df, figure_name=None, xlim=None, ylim=None, figsize=(7.5, 6), dpi=150, zoom_xlim=None, zoom_ylim=None, zoomed_location=[1.05, 0.05, 0.37, 0.9])

Make tradeoff plots with zoomed-in area.

Parameters

plot_df (pd.DataFrame) – a pd.DataFrame including numbers for each method.
figure_name (str, optional) – save the plot with figure_name. Defaults to None.
xlim (tuple, optional) – x-axis limit. Defaults to None.
ylim (tuple, optional) – y-axis limit. Defaults to None.
figsize (tuple, optional) – figure size. Defaults to (7.5, 6).
dpi (int, optional) – figure resolution. Defaults to 150.
zoom_xlim (tuple, optional) – x-axis interval of the zoomed-in area. Defaults to None.
zoom_ylim (tuple, optional) – y-axis interval of the zoomed-in area. Defaults to None.
zoomed_location (list, optional) – location of the zoomed-in area, [x, y, length, height]. Defaults to [1.05, 0.05, 0.37, 0.9].
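
To make the zoomed_location layout concrete, here is a minimal Matplotlib sketch of a scatter plot with a zoomed-in inset. It is not fairlib's implementation: the function zoom_plot_sketch and the toy data are hypothetical, and using Axes.inset_axes with [x, y, length, height] given in axes-fraction coordinates is an assumption about how such a location list can be interpreted.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def zoom_plot_sketch(x, y, zoom_xlim, zoom_ylim,
                     zoomed_location=(1.05, 0.05, 0.37, 0.9)):
    """Scatter plot with an inset showing a zoomed-in region (illustrative only)."""
    fig, ax = plt.subplots(figsize=(7.5, 6), dpi=150)
    ax.scatter(x, y)

    # Bounds are [x0, y0, width, height] in axes-fraction coordinates;
    # x0 > 1 places the inset to the right of the main axes.
    inset = ax.inset_axes(list(zoomed_location))
    inset.scatter(x, y)
    inset.set_xlim(*zoom_xlim)
    inset.set_ylim(*zoom_ylim)

    # Draw a rectangle on the main axes marking the zoomed region.
    ax.indicate_inset_zoom(inset)
    return fig, ax, inset


# Made-up trade-off points for illustration.
fig, ax, inset = zoom_plot_sketch(
    x=[0.70, 0.82, 0.85, 0.88, 0.95],
    y=[0.90, 0.84, 0.83, 0.80, 0.60],
    zoom_xlim=(0.8, 0.9),
    zoom_ylim=(0.78, 0.86),
)
```

Calling fig.savefig(path, bbox_inches="tight") afterwards keeps an inset placed outside the main axes inside the saved image.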

fairlib.src.analysis.tables_and_figures.retrive_results(dataset, log_dir='results')

Retrieve the loaded results of a dataset from files.

Parameters

dataset (str) – dataset name, e.g. Moji, Bios_both, and Bios_gender
log_dir (str, optional) – directory containing the saved results. Defaults to “results”.

Returns

experimental result dataframes of different methods.

Return type

dict

fairlib.src.analysis.utils

fairlib.src.analysis.utils.DTO(fairness_metric, performacne_metric, utopia_fairness=None, utopia_performance=None)

Calculate the DTO for each candidate model.

Parameters

fairness_metric (List) – fairness evaluation results (1-GAP)
performacne_metric (List) – performance evaluation results
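
As a rough illustration of the quantity involved, the sketch below computes a distance to an optimum point as the Euclidean distance between each candidate's (fairness, performance) pair and a utopia point. It is not fairlib's code: the name dto_sketch is hypothetical, and defaulting the utopia point to (1, 1) is an assumption made here for illustration (the documented signature defaults both utopia arguments to None).

```python
import numpy as np


def dto_sketch(fairness, performance, utopia_fairness=1.0, utopia_performance=1.0):
    """Euclidean distance of each candidate to the utopia point (illustrative)."""
    points = np.column_stack([fairness, performance]).astype(float)
    utopia = np.array([utopia_fairness, utopia_performance], dtype=float)
    # One distance per candidate model; smaller is better.
    return np.linalg.norm(points - utopia, axis=1)


# Two made-up candidates: (fairness, performance) = (0.9, 0.7) and (0.6, 0.8).
scores = dto_sketch([0.9, 0.6], [0.7, 0.8])
best = int(np.argmin(scores))
```

In this toy example the first candidate is closer to (1, 1), so best is 0.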

fairlib.src.analysis.utils.auc_performance_fairness_tradeoff(pareto_df, random_performance=0, pareto_selection='test', fairness_metric_name='fairness', performance_metric_name='performance', interpolation='linear', performance_threshold=None, normalization=False)

Calculate the area under the performance–fairness trade-off curve.

Parameters

pareto_df (DataFrame) – a DataFrame of Pareto frontiers.
random_performance (float, optional) – the lowest performance, which corresponds to a fairness of 1. Defaults to 0.
pareto_selection (str, optional) – which split is used to select the frontiers. Defaults to “test”.
fairness_metric_name (str, optional) – the metric name for fairness evaluation. Defaults to “fairness”.
performance_metric_name (str, optional) – the metric name for performance evaluation. Defaults to “performance”.
interpolation (str, optional) – interpolation method for the threshold fairness. Defaults to “linear”.
performance_threshold (float, optional) – the performance threshold for the method. Defaults to None.
normalization (bool, optional) – whether to normalize the AUC score by the maximum AUC that can be achieved. Defaults to False.

Returns

(AUC score, AUC DataFrame)

Return type

tuple
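
The area computation can be sketched with the trapezoidal rule over the frontier points. This is an illustrative stand-in, not fairlib's implementation: the name tradeoff_auc_sketch is hypothetical, placing fairness on the x-axis is an assumption, and the thresholding, interpolation, and normalization options are left out.

```python
import numpy as np


def tradeoff_auc_sketch(fairness, performance):
    """Trapezoidal area under a performance-vs-fairness curve (illustrative)."""
    order = np.argsort(fairness)
    x = np.asarray(fairness, dtype=float)[order]
    y = np.asarray(performance, dtype=float)[order]
    # Sum of trapezoid areas between consecutive frontier points.
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))


# Made-up frontier; the point (1.0, 0.0) plays the role of a model with the
# lowest performance (0) and a fairness of 1.
area = tradeoff_auc_sketch([1.0, 0.5, 0.8], [0.0, 0.9, 0.6])
```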

fairlib.src.analysis.utils.get_dir(results_dir, project_dir, checkpoint_dir, checkpoint_name, model_id)

Retrieve logs for experiments.

Parameters

results_dir (str) – directory of the saved experimental results.
project_dir (str) – experiment type identifier, e.g., final, hypertune, dev. Same as the value passed in the experiment arguments.
checkpoint_dir (str) – directory of the checkpoints, models by default.
checkpoint_name (str) – checkpoint file name prefix; checkpoints are named checkpoint_epoch{num_epoch}.ptr.gz.
model_id (str) – read all experiments whose names start with the same model_id, e.g., “Adv” when tuning hyperparameters for standard adversarial training.

Returns

a list of dictionaries, where each dict contains the information for an experiment.

Return type

list

fairlib.src.analysis.utils.get_model_scores(exp, GAP_metric, Performance_metric, keep_original_metrics=False)

Given the log path for an experiment, read the log and return the dev and test performance, fairness, and DTO.

Parameters

exp (str) – get_dir output, including the options and the path to the checkpoints.
GAP_metric (str) – the target GAP metric name
Performance_metric (str) – the target performance metric name, e.g., F1, Acc.

Returns

a pandas df including dev and test scores for each epoch

Return type

pd.DataFrame

fairlib.src.analysis.utils.is_pareto_efficient(costs, return_mask=True)

Find the Pareto-efficient points.

If return_mask is True, this will be an (n_points, ) boolean array. Otherwise, it will be an (n_efficient_points, ) integer array of indices.

Parameters

costs (np.array) – An (n_points, n_costs) array
return_mask (bool, optional) – True to return a mask. Defaults to True.

Returns

A boolean mask or an array of indices of the Pareto-efficient points, depending on return_mask.

Return type

np.array
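
A simple way to picture the mask and index outputs is the quadratic-time sketch below. It is not fairlib's implementation: the name pareto_mask_sketch is hypothetical, and it assumes that lower cost is better in every column.

```python
import numpy as np


def pareto_mask_sketch(costs):
    """Boolean mask of non-dominated rows, assuming lower cost is better."""
    costs = np.asarray(costs, dtype=float)
    mask = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        # Row i is dominated if another row is no worse in every cost
        # and strictly better in at least one.
        dominated_by = np.all(costs <= c, axis=1) & np.any(costs < c, axis=1)
        mask[i] = not dominated_by.any()
    return mask


# Made-up costs: the point (3, 3) is dominated by (2, 2).
costs = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
mask = pareto_mask_sketch(costs)
indices = np.flatnonzero(mask)  # index form of the same result
```

The mask has one entry per input point, while the index array only lists the efficient points, which matches the two output shapes described above.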

fairlib.src.analysis.utils.l2norm(matrix_1, matrix_2)

Calculate the Euclidean distance.

Parameters

matrix_1 (n*d np array) – n is the number of instances, d is num of metric
matrix_2 (n*d np array) – same as matrix_1

Returns

the row-wise Euclidean distance

Return type

float
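
For reference, a row-wise Euclidean distance between two n*d arrays can be written in a few lines of NumPy. The name rowwise_l2_sketch is hypothetical, and the sketch only illustrates the computation rather than reproducing fairlib's code.

```python
import numpy as np


def rowwise_l2_sketch(matrix_1, matrix_2):
    """Euclidean distance between corresponding rows of two n*d arrays."""
    diff = np.asarray(matrix_1, dtype=float) - np.asarray(matrix_2, dtype=float)
    # Reduce over the metric dimension, leaving one distance per instance.
    return np.sqrt(np.sum(diff ** 2, axis=1))


a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[3.0, 4.0], [1.0, 2.0]])
distances = rowwise_l2_sketch(a, b)
```

The result contains one distance per row, so its length equals n.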

fairlib.src.analysis.utils.mkdir(path)

make a new directory

Parameters: path (str) – path to the directory

fairlib.src.analysis.utils.mkdirs(paths)

make a set of new directories

Parameters: paths (list) – a list of directories

fairlib.src.analysis.utils.power_mean(series, p, axis=0)

Calculate the generalized mean.

Parameters

series (np.array) – an array of numbers.
p (int) – power of the generalized mean.
axis (int, optional) – axis along which to aggregate. Defaults to 0.

Returns

generalized mean of the inputs

Return type

np.array
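
The generalized (power) mean with exponent p is the p-th root of the mean of the p-th powers. The sketch below illustrates that definition for non-zero, finite p; the name power_mean_sketch is hypothetical, and the limiting cases (p = 0 and infinite p) are deliberately not handled.

```python
import numpy as np


def power_mean_sketch(series, p, axis=0):
    """Generalized mean (mean(x**p))**(1/p) for non-zero, finite p."""
    values = np.asarray(series, dtype=float)
    return np.mean(values ** p, axis=axis) ** (1.0 / p)


values = np.array([1.0, 2.0, 4.0])
arithmetic = power_mean_sketch(values, 1)   # p = 1 gives the arithmetic mean
harmonic = power_mean_sketch(values, -1)    # p = -1 gives the harmonic mean
```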

fairlib.src.analysis.utils.retrive_all_exp_results(exp, GAP_metric_name, Performance_metric_name, index_column_names, keep_original_metrics=False)

Retrieve experimental results according to the input list of experiments.

Parameters

exp (list) – a list of experiment info.
GAP_metric_name (str) – gap metric name, such as TPR_GAP.
Performance_metric_name (str) – performance metric name, such as accuracy.
index_column_names (list) – a list of hyperparameters that will be used to differentiate runs within a single method.
keep_original_metrics (bool) – whether or not to keep all experimental results in the checkpoint. Defaults to False.

Returns

retrieved results.

Return type

dict

fairlib.src.analysis.utils.retrive_exp_results(exp, GAP_metric_name, Performance_metric_name, selection_criterion, index_column_names, keep_original_metrics=False)

Retrieve the experimental results of an epoch from the saved checkpoint.

Parameters

exp (dict) – information for a single experiment, e.g., one entry of the list returned by get_dir.
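GAP_metric_name (str) – gap metric name, such as TPR_GAP.
Performance_metric_name (str) – performance metric name, such as accuracy.
selection_criterion (str) – {GAP_metric_name | Performance_metric_name | “DTO”}
index_column_names (list) – a list of hyperparameters that will be used to differentiate runs within a single method.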
keep_original_metrics (bool, optional) – besides selected performance and fairness, show original metrics. Defaults to False.

Returns

retrieved results.

Return type

dict