Analysis Module
fairlib.src.analysis.load_results
- fairlib.src.analysis.load_results.model_selection_parallel(results_dir, project_dir, model_id, GAP_metric_name, Performance_metric_name, selection_criterion, checkpoint_dir='models', checkpoint_name='checkpoint_epoch', index_column_names=['adv_lambda', 'adv_num_subDiscriminator', 'adv_diverse_lambda'], n_jobs=20, save_path=None, return_all=False, keep_original_metrics=False)
Perform model selection across runs with different hyperparameters.
- Parameters
results_dir (str) – dir to the saved experimental results
project_dir (str) – experiment type identifier, e.g., final, hypertune, dev. Should match the project_dir argument used when the experiments were run.
checkpoint_dir (str) – dir to checkpoints, models by default.
checkpoint_name (str) – checkpoint file name prefix; checkpoint files are named checkpoint_epoch{num_epoch}.ptr.gz
model_id (str) – read all experiments whose names start with this model_id, e.g., “Adv” when tuning hyperparameters for standard adversarial training
GAP_metric_name (str) – fairness metric in the log
Performance_metric_name (str) – performance metric name in the log
selection_criterion (str) – {GAP_metric_name | Performance_metric_name | “DTO”}
index_column_names (list) – tuned hyperparameters, [‘adv_lambda’, ‘adv_num_subDiscriminator’, ‘adv_diverse_lambda’] by default.
n_jobs (nonnegative int) – 0 for non-parallel execution; a positive integer sets the number of parallel processes
- Returns
loaded results
- Return type
pd.DataFrame
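The selection step can be pictured as grouping per-epoch records by the tuned hyperparameters and keeping the best epoch in each group. The following is a minimal sketch under stated assumptions, not fairlib's implementation: select_best_epochs and the record layout are hypothetical, the metric names TPR_GAP and accuracy are only examples, metrics are assumed to lie in [0, 1], and DTO is assumed to be the Euclidean distance to the point (fairness = 1, performance = 1) with fairness = 1 - GAP.

```python
import math
from collections import defaultdict


def select_best_epochs(records, gap_name, perf_name, selection_criterion,
                       index_column_names):
    """Return the best record per hyperparameter setting (hypothetical helper)."""
    # Group per-epoch records by the values of the tuned hyperparameters.
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[name] for name in index_column_names)
        groups[key].append(rec)

    def cost(rec):
        # Every branch returns a value where lower is better.
        if selection_criterion == gap_name:
            return rec[gap_name]              # smaller gap is fairer
        if selection_criterion == perf_name:
            return -rec[perf_name]            # larger performance is better
        if selection_criterion == "DTO":
            # Assumed utopia point: gap 0 (fairness 1) and performance 1.
            return math.hypot(rec[gap_name], 1.0 - rec[perf_name])
        raise ValueError(f"unknown selection criterion: {selection_criterion}")

    return {key: min(runs, key=cost) for key, runs in groups.items()}


# Illustrative records only; these numbers are made up for the example.
records = [
    {"adv_lambda": 0.1, "epoch": 0, "TPR_GAP": 0.30, "accuracy": 0.80},
    {"adv_lambda": 0.1, "epoch": 1, "TPR_GAP": 0.10, "accuracy": 0.75},
    {"adv_lambda": 1.0, "epoch": 0, "TPR_GAP": 0.05, "accuracy": 0.50},
    {"adv_lambda": 1.0, "epoch": 1, "TPR_GAP": 0.20, "accuracy": 0.70},
]
best = select_best_epochs(records, "TPR_GAP", "accuracy", "DTO", ["adv_lambda"])
```

Passing the GAP metric name or the performance metric name as selection_criterion switches the sketch to minimizing the gap or maximizing performance within each group.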
fairlib.src.analysis.tables_and_figures
- fairlib.src.analysis.tables_and_figures.final_results_df(results_dict, model_order=None, Fairness_metric_name='fairness', Performance_metric_name='performance', pareto=True, pareto_selection='test', selection_criterion='DTO', return_dev=True, Fairness_threshold=0.0, Performance_threshold=0.0, return_conf=False, save_conf_dir=None, num_trail=None, additional_metrics=[])
Process the results into a single dataset for creating tables and plots.
- Parameters
results_dict (dict) – retrieved results dictionary, typically the dict returned by retrive_results
model_order (list, optional) – a list of models that will be considered in the final df. Defaults to None.
Fairness_metric_name (str, optional) – the metric name for fairness evaluation. Defaults to “fairness”.
Performance_metric_name (str, optional) – the metric name for performance evaluation. Defaults to “performance”.
pareto (bool, optional) – whether or not to return only the Pareto frontiers. Defaults to True.
pareto_selection (str, optional) – which split is used to select the frontiers. Defaults to “test”.
selection_criterion (str, optional) – model selection criterion, one of {performance, fairness, both (DTO)}. Defaults to “DTO”.
return_dev (bool, optional) – whether or not to return dev results in the df. Defaults to True.
Fairness_threshold (float, optional) – minimum fairness; rows below this threshold are filtered out. Defaults to 0.0.
Performance_threshold (float, optional) – minimum performance; rows below this threshold are filtered out. Defaults to 0.0.
return_conf (bool, optional) – return the selected epoch and the corresponding YAML configuration files if True. Defaults to False.
save_conf_dir (str, optional) – save the selected epoch and configuration files to this dir. Defaults to None.
num_trail (int, optional) – if not None, downsample the number of searches for each method to num_trail. Defaults to None.
additional_metrics (list, optional) – report additional evaluation metrics for the selected epoch. Defaults to [].
- Returns
selected results of different models for report
- Return type
pandas.DataFrame
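The effect of Fairness_threshold and Performance_threshold can be illustrated with a small filter. This is a sketch only: filter_by_thresholds is a hypothetical helper, rows are plain dicts rather than a DataFrame, and the inclusive comparison (>=) is an assumption about how the thresholds are applied.

```python
def filter_by_thresholds(rows, fairness_threshold=0.0, performance_threshold=0.0,
                         fairness_key="fairness", performance_key="performance"):
    """Keep rows that meet both minimum thresholds (hypothetical helper)."""
    return [
        row for row in rows
        if row[fairness_key] >= fairness_threshold      # assumed inclusive
        and row[performance_key] >= performance_threshold
    ]


# Illustrative rows only; these numbers are made up for the example.
rows = [
    {"model": "A", "fairness": 0.95, "performance": 0.60},
    {"model": "B", "fairness": 0.85, "performance": 0.75},
    {"model": "C", "fairness": 0.70, "performance": 0.80},
]
kept = filter_by_thresholds(rows, fairness_threshold=0.8, performance_threshold=0.7)
```

With the default thresholds of 0.0, every row with non-negative scores passes the filter.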
- fairlib.src.analysis.tables_and_figures.interactive_plot(plot_df, figsize=(12, 7), dpi=100, selection='DTO')
Create interactive plots for DTO and constrained selection.
- Parameters
plot_df (pd.DataFrame) – a pd.DataFrame including numbers for each method.
figsize (tuple, optional) – figure size in tuple. Defaults to (12, 7).
dpi (int, optional) – figure resolution. Defaults to 100.
selection (str, optional) – constrained | DTO, indicating which model selection approach is used. Defaults to “DTO”.
- fairlib.src.analysis.tables_and_figures.make_zoom_plot(plot_df, figure_name=None, xlim=None, ylim=None, figsize=(7.5, 6), dpi=150, zoom_xlim=None, zoom_ylim=None, zoomed_location=[1.05, 0.05, 0.37, 0.9])
Make tradeoff plots with zoomed-in area.
- Parameters
plot_df (pd.DataFrame) – a pd.DataFrame including numbers for each method.
figure_name (str, optional) – save the plot with figure_name. Defaults to None.
xlim (tuple, optional) – x-axis limit. Defaults to None.
ylim (tuple, optional) – y-axis limit. Defaults to None.
figsize (tuple, optional) – figure size. Defaults to (7.5, 6).
dpi (int, optional) – figure resolution. Defaults to 150.
zoom_xlim (tuple, optional) – x-axis interval of the zoomed-in area. Defaults to None.
zoom_ylim (tuple, optional) – y-axis interval of the zoomed-in area. Defaults to None.
zoomed_location (list, optional) – location of the zoomed-in area, [x, y, length, height]. Defaults to [1.05, 0.05, 0.37, 0.9].
- fairlib.src.analysis.tables_and_figures.retrive_results(dataset, log_dir='results')
Retrieve the loaded results of a dataset from files.
- Parameters
dataset (str) – dataset name, e.g. Moji, Bios_both, and Bios_gender
log_dir (str, optional) – directory from which the results are read. Defaults to “results”.
- Returns
experimental result dataframes of different methods.
- Return type
dict
fairlib.src.analysis.utils
- fairlib.src.analysis.utils.DTO(fairness_metric, performacne_metric, utopia_fairness=None, utopia_performance=None)
Calculate DTO for each candidate model.
- Parameters
fairness_metric (List) – fairness evaluation results (1-GAP)
performacne_metric (List) – performance evaluation results
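As an illustration of the idea, DTO can be read as the distance from each candidate's (fairness, performance) point to a utopia point. The sketch below is written under assumptions and is not the library's code: dto_sketch is a hypothetical name, the distance is plain Euclidean without any normalization, and falling back to the best observed value when a utopia coordinate is None is a guess at the intended default.

```python
import math


def dto_sketch(fairness_metric, performance_metric,
               utopia_fairness=None, utopia_performance=None):
    """Distance of each candidate to the utopia point (hypothetical helper)."""
    # Assumption: when no utopia value is given, use the best observed one.
    if utopia_fairness is None:
        utopia_fairness = max(fairness_metric)
    if utopia_performance is None:
        utopia_performance = max(performance_metric)
    return [
        math.hypot(utopia_fairness - f, utopia_performance - p)
        for f, p in zip(fairness_metric, performance_metric)
    ]


# Two candidates scored against the utopia point (1.0, 1.0).
scores = dto_sketch([1.0, 0.6], [0.7, 1.0], utopia_fairness=1.0, utopia_performance=1.0)
best_index = min(range(len(scores)), key=scores.__getitem__)
```

A smaller value means the candidate is closer to the utopia point, so the candidate with the minimum score is the one selected in this sketch.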
- fairlib.src.analysis.utils.auc_performance_fairness_tradeoff(pareto_df, random_performance=0, pareto_selection='test', fairness_metric_name='fairness', performance_metric_name='performance', interpolation='linear', performance_threshold=None, normalization=False)
calculate the area under the performance–fairness trade-off curve.
- Parameters
pareto_df (DataFrame) – a data frame of Pareto frontiers
random_performance (float, optional) – the lowest attainable performance, at which fairness is taken to be 1. Defaults to 0.
pareto_selection (str, optional) – which split is used to select the frontiers. Defaults to “test”.
fairness_metric_name (str, optional) – the metric name for fairness evaluation. Defaults to “fairness”.
performance_metric_name (str, optional) – the metric name for performance evaluation. Defaults to “performance”.
interpolation (str, optional) – interpolation method for the threshold fairness. Defaults to “linear”.
performance_threshold (float, optional) – the performance threshold for the method. Defaults to None.
normalization (bool, optional) – whether to normalize the AUC score by the maximum AUC that can be achieved. Defaults to False.
- Returns
(AUC score, AUC DataFrame)
- Return type
tuple
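One way to picture the area computation is the trapezoidal rule over frontier points. The sketch below makes several assumptions that the reference above does not confirm: tradeoff_auc_sketch is a hypothetical name, performance is placed on the x-axis and fairness on the y-axis, the curve is anchored at (random_performance, 1.0), and thresholding, interpolation, and normalization are left out.

```python
def tradeoff_auc_sketch(points, random_performance=0.0):
    """Trapezoidal area under (performance, fairness) points (hypothetical helper)."""
    # Anchor the curve at the assumed trivial model: lowest performance, fairness 1.
    curve = sorted({tuple(p) for p in points} | {(float(random_performance), 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        # Area of the trapezoid between two neighbouring frontier points.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Illustrative frontier only; these numbers are made up for the example.
auc = tradeoff_auc_sketch([(0.5, 0.9), (1.0, 0.5)])
```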
- fairlib.src.analysis.utils.get_dir(results_dir, project_dir, checkpoint_dir, checkpoint_name, model_id)
Retrieve logs for experiments.
- Parameters
results_dir (str) – dir to the saved experimental results
project_dir (str) – experiment type identifier, e.g., final, hypertune, dev. Should match the project_dir argument used when the experiments were run.
checkpoint_dir (str) – dir to checkpoints, models by default.
checkpoint_name (str) – checkpoint file name prefix; checkpoint files are named checkpoint_epoch{num_epoch}.ptr.gz
model_id (str) – read all experiments whose names start with this model_id, e.g., “Adv” when tuning hyperparameters for standard adversarial training
- Returns
a list of dictionaries, where each dict contains the information for an experiment.
- Return type
list
- fairlib.src.analysis.utils.get_model_scores(exp, GAP_metric, Performance_metric, keep_original_metrics=False)
Given the log path for an experiment, read the log and return the dev and test performance, fairness, and DTO.
- Parameters
exp (str) – get_dir output, including the options and path to checkpoints
GAP_metric (str) – the target GAP metric name
Performance_metric (str) – the target performance metric name, e.g., F1, Acc.
- Returns
a pandas df including dev and test scores for each epoch
- Return type
pd.DataFrame
- fairlib.src.analysis.utils.is_pareto_efficient(costs, return_mask=True)
Find the Pareto-efficient points.
If return_mask is True, the result is an (n_points, ) boolean array; otherwise it is an (n_efficient_points, ) integer array of indices.
- Parameters
costs (np.array) – An (n_points, n_costs) array
return_mask (bool, optional) – True to return a mask. Defaults to True.
- Returns
A boolean mask over all points, or an array of indices of the Pareto-efficient points, depending on return_mask.
- Return type
np.array
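The dominance check behind Pareto efficiency can be sketched in a few lines. This is a plain-Python illustration under assumptions, not the library's implementation: pareto_efficient_sketch is a hypothetical name, costs are assumed to be minimized, and the quadratic pairwise comparison favours clarity over speed.

```python
def pareto_efficient_sketch(costs, return_mask=True):
    """Mark points that no other point dominates (hypothetical helper)."""
    mask = []
    for i, ci in enumerate(costs):
        # A point is dominated if another point is no worse on every cost
        # and strictly better on at least one (lower cost assumed better).
        dominated = any(
            all(a <= b for a, b in zip(cj, ci))
            and any(a < b for a, b in zip(cj, ci))
            for j, cj in enumerate(costs)
            if j != i
        )
        mask.append(not dominated)
    if return_mask:
        return mask
    return [i for i, keep in enumerate(mask) if keep]


costs = [[1, 4], [2, 2], [3, 3], [4, 1]]
mask = pareto_efficient_sketch(costs)
indices = pareto_efficient_sketch(costs, return_mask=False)
```

In this example the point [3, 3] is dominated by [2, 2], so it is the only one excluded. To apply the same check to metrics where higher is better, such as fairness and performance, the values would first be negated or otherwise turned into costs.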
- fairlib.src.analysis.utils.l2norm(matrix_1, matrix_2)
calculate Euclidean distance
- Parameters
matrix_1 (n*d np array) – n is the number of instances, d is the number of metrics
matrix_2 (n*d np array) – same as matrix_1
- Returns
the row-wise Euclidean distance
- Return type
float
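For reference, a row-wise Euclidean distance between two equally shaped matrices can be sketched as follows. The name rowwise_l2_sketch is hypothetical and nested lists stand in for the n*d arrays; it illustrates the computation rather than reproducing the library function.

```python
import math


def rowwise_l2_sketch(matrix_1, matrix_2):
    """Euclidean distance between corresponding rows (hypothetical helper)."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(row_1, row_2)))
        for row_1, row_2 in zip(matrix_1, matrix_2)
    ]


# One distance per row: the first rows differ by (3, 4), the second rows match.
distances = rowwise_l2_sketch([[0, 0], [1, 1]], [[3, 4], [1, 1]])
```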
- fairlib.src.analysis.utils.mkdir(path)
make a new directory
- Parameters
path (str) – path to the directory
- fairlib.src.analysis.utils.mkdirs(paths)
make a set of new directories
- Parameters
paths (list) – a list of directories
- fairlib.src.analysis.utils.power_mean(series, p, axis=0)
calculate the generalized mean
- Parameters
series (np.array) – an array of numbers.
p (int) – power of the generalized mean.
axis (int, optional) – axis along which to aggregate. Defaults to 0.
- Returns
generalized mean of the inputs
- Return type
np.array
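The generalized (power) mean with exponent p is the p-th root of the mean of the p-th powers. The sketch below covers the one-dimensional case only and is not the library's code: power_mean_sketch is a hypothetical name, inputs are assumed to be positive, there is no axis argument, and p = 0 (the geometric-mean limit) is rejected rather than handled.

```python
def power_mean_sketch(values, p):
    """Generalized mean of a flat sequence of positive numbers (hypothetical helper)."""
    if p == 0:
        raise ValueError("p = 0 (geometric mean) is not handled in this sketch")
    mean_of_powers = sum(v ** p for v in values) / len(values)
    return mean_of_powers ** (1.0 / p)


arithmetic = power_mean_sketch([1, 2, 3, 4], 1)   # p = 1: arithmetic mean
quadratic = power_mean_sketch([3, 4], 2)          # p = 2: root mean square
harmonic = power_mean_sketch([1, 2], -1)          # p = -1: harmonic mean
```

Larger positive p pulls the result toward the maximum of the inputs, and larger negative p pulls it toward the minimum.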
- fairlib.src.analysis.utils.retrive_all_exp_results(exp, GAP_metric_name, Performance_metric_name, index_column_names, keep_original_metrics=False)
Retrieve experimental results according to the input list of experiments.
- Parameters
exp (list) – a list of experiment info.
GAP_metric_name (str) – gap metric name, such as TPR_GAP.
Performance_metric_name (str) – performance metric name, such as accuracy.
index_column_names (list) – a list of hyperparameters that will be used to differentiate runs within a single method.
keep_original_metrics (bool) – whether or not to keep all experimental results in the checkpoint. Defaults to False.
- Returns
retrieved results.
- Return type
dict
- fairlib.src.analysis.utils.retrive_exp_results(exp, GAP_metric_name, Performance_metric_name, selection_criterion, index_column_names, keep_original_metrics=False)
Retrieve the experimental results of an epoch from the saved checkpoint.
- Parameters
exp (dict) – information for a single experiment, such as one entry of the list returned by get_dir.
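GAP_metric_name (str) – gap metric name, such as TPR_GAP.
Performance_metric_name (str) – performance metric name, such as accuracy.
selection_criterion (str) – {GAP_metric_name | Performance_metric_name | “DTO”}
index_column_names (list) – a list of hyperparameters that will be used to differentiate runs within a single method.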
keep_original_metrics (bool, optional) – besides selected performance and fairness, show original metrics. Defaults to False.
- Returns
retrieved results.
- Return type
dict