DataLoader Module

fairlib.src.dataloaders.__init__

fairlib.src.dataloaders.__init__.get_dataloaders(args)

Initialize the torch dataloaders according to arguments.

Parameters: args (namespace) – arguments
Raises: NotImplementedError – if the corresponding components have not been implemented.
Returns: DataLoaders for the training, development, and test sets.
Return type: tuple

fairlib.src.dataloaders.encoder

class fairlib.src.dataloaders.encoder.text2id(args)

Maps natural language text to numeric identifiers.
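The documentation does not state which tokenizer or vocabulary text2id uses. As a rough illustration of the general idea only, the sketch below maps words to ids with a hypothetical whitespace vocabulary. SimpleText2Id, encode, and the <pad>/<unk> tokens are assumptions, not fairlib names, and the real class may rely on a pretrained subword tokenizer instead.

```python
from typing import Dict, List


class SimpleText2Id:
    """Hypothetical whitespace-vocabulary encoder (not fairlib's text2id)."""

    def __init__(self, corpus: List[str]):
        # Assumption: id 0 is padding and id 1 is the unknown-word token.
        self.pad_id, self.unk_id = 0, 1
        self.vocab: Dict[str, int] = {"<pad>": self.pad_id, "<unk>": self.unk_id}
        for sentence in corpus:
            for token in sentence.lower().split():
                # Assign the next free id to each token not seen before.
                self.vocab.setdefault(token, len(self.vocab))

    def encode(self, sentence: str, max_len: int) -> List[int]:
        """Map a sentence to exactly max_len ids (truncate, then right-pad)."""
        ids = [self.vocab.get(tok, self.unk_id) for tok in sentence.lower().split()]
        ids = ids[:max_len]
        return ids + [self.pad_id] * (max_len - len(ids))


encoder = SimpleText2Id(["The nurse helped", "the doctor helped"])
ids = encoder.encode("the surgeon helped", max_len=5)  # [2, 1, 4, 0, 0]
```

Out-of-vocabulary words ("surgeon" here) fall back to the unknown id, and every sequence comes out at a fixed length so that it can be batched.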

fairlib.src.dataloaders.loaders

class fairlib.src.dataloaders.loaders.BiosDataset(args, split)

class fairlib.src.dataloaders.loaders.DeepMojiDataset(args, split)

class fairlib.src.dataloaders.loaders.SampleDataset(args, split)

class fairlib.src.dataloaders.loaders.TestDataset(args, split)

class fairlib.src.dataloaders.loaders.ValenceDataset(args, split)

fairlib.src.dataloaders.utils

class fairlib.src.dataloaders.utils.BaseDataset(args, split)

fairlib.src.dataloaders.utils.full_label_data(df, tasks)

Filter the instances that have all required labels.

Parameters

df (pd.DataFrame) – a DataFrame containing data instances
tasks (list) – a list of names of target columns

Returns

A boolean array indicating whether each row has all of the required labels.

Return type

np.array
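A minimal sketch of the documented behaviour follows. It assumes that a missing label is stored as NaN/None in the DataFrame. The name full_label_data_sketch is hypothetical, and the sketch is not fairlib's actual code.

```python
import numpy as np
import pandas as pd


def full_label_data_sketch(df: pd.DataFrame, tasks: list) -> np.ndarray:
    """Hypothetical re-implementation: True where every column in tasks is present."""
    # Assumption: a missing label is represented as NaN/None.
    return df[tasks].notna().all(axis=1).to_numpy()


df = pd.DataFrame({
    "text": ["a", "b", "c"],
    "y": [1, 0, None],
    "gender": [0, None, 1],
})
mask = full_label_data_sketch(df, ["y", "gender"])  # [ True False False]
filtered = df[mask]  # keeps only the first row
```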

fairlib.src.dataloaders.generalized_BT

fairlib.src.dataloaders.generalized_BT.generalized_sampling(default_distribution_dict, N=None, joint_dist=None, g_dist=None, y_dist=None, g_cond_y_dist=None, y_cond_g_dist=None)

Resample the data according to the specified distribution information and return the selected indices.

Parameters

default_distribution_dict (dict) – a dict of distribution information of the original dataset.
N (int, optional) – The total number of returned indices. Defaults to None.
joint_dist (np.ndarray, optional) – n_class * n_groups matrix, where each element is the joint probability of a (class, group) pair, i.e., its proportion of the data. Defaults to None.
g_dist (np.ndarray, optional) – n_groups array, indicating the probability of each group. Defaults to None.
y_dist (np.ndarray, optional) – n_class array, indicating the probability of each class. Defaults to None.
g_cond_y_dist (np.ndarray, optional) – n_class * n_groups matrix, where g_cond_y_dist[y_id,:] is the group distribution within class y_id. Defaults to None.
y_cond_g_dist (np.ndarray, optional) – n_class * n_groups matrix, where y_cond_g_dist[:,g_id] is the class distribution within group g_id. Defaults to None.

Returns

list of selected indices.

Return type

list
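The core step of resampling to a target joint distribution can be sketched as follows. The sketch assumes integer class and group ids, and it takes raw label arrays rather than default_distribution_dict, whose internal structure is not documented here. It draws round(joint_dist[y, g] * N) indices from each (class, group) cell. sample_to_joint is a hypothetical name, and this is one straightforward approach, not necessarily fairlib's.

```python
import numpy as np


def sample_to_joint(y_data, g_data, joint_dist, N, seed=0):
    """Hypothetical sketch: draw indices so (y, g) cell sizes follow joint_dist."""
    rng = np.random.default_rng(seed)
    y_data = np.asarray(y_data)
    g_data = np.asarray(g_data)
    joint_dist = np.asarray(joint_dist, dtype=float)
    n_class, n_groups = joint_dist.shape
    # Target number of instances for every (class, group) cell.
    counts = np.round(joint_dist * N).astype(int)
    selected = []
    for y in range(n_class):
        for g in range(n_groups):
            cell = np.flatnonzero((y_data == y) & (g_data == g))
            k = counts[y, g]
            if k == 0:
                continue
            if len(cell) == 0:
                raise ValueError(f"no instances for class {y}, group {g}")
            # Oversample with replacement only when the cell is too small.
            replace = bool(k > len(cell))
            selected.extend(rng.choice(cell, size=k, replace=replace).tolist())
    return selected


y = [0, 0, 0, 0, 1, 1, 1, 1]
g = [0, 0, 0, 1, 0, 1, 1, 1]
indices = sample_to_joint(y, g, np.full((2, 2), 0.25), N=8)  # 2 indices per cell
```

Because per-cell counts are rounded, the total can differ slightly from N. Sampling with replacement lets small cells reach their target size.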

fairlib.src.dataloaders.generalized_BT.get_data_distribution(y_data, g_data)

Given target label and protected labels, calculate empirical distributions.

The computed distributions are:

joint_dist: n_class * n_groups matrix, where each element is the joint probability of a (class, group) pair, i.e., its proportion of the data.
g_dist: n_groups array, indicating the probability of each group.
y_dist: n_class array, indicating the probability of each class.
g_cond_y_dist: n_class * n_groups matrix, where g_cond_y_dist[y_id,:] is the group distribution within class y_id.
y_cond_g_dist: n_class * n_groups matrix, where y_cond_g_dist[:,g_id] is the class distribution within group g_id.
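All five quantities describe one empirical distribution, so they are linked by standard probability identities. Write J for joint_dist, and read y_dist[y] as P(y), g_dist[g] as P(g), g_cond_y_dist[y, g] as P(g | y), and y_cond_g_dist[y, g] as P(y | g):

```latex
\begin{aligned}
P(y) &= \sum_{g} J_{y,g}, & P(g) &= \sum_{y} J_{y,g}, \\
P(g \mid y) &= \frac{J_{y,g}}{P(y)}, & P(y \mid g) &= \frac{J_{y,g}}{P(g)}, \\
J_{y,g} &= P(y)\,P(g \mid y) = P(g)\,P(y \mid g).
\end{aligned}
```

A class marginal together with the within-class group distributions is therefore enough to recover the joint matrix. The same holds for a group marginal together with the within-group class distributions.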

Parameters

y_data (np.ndarray) – target labels
g_data (np.ndarray) – protected labels

Returns

a dict of distribution info.

Return type

dict
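The sketch below computes the five empirical distributions with NumPy. It assumes labels are integer ids starting at 0, and every class and group id up to the maximum must occur, otherwise the conditional divisions hit zero. The dict keys mirror the parameter names of generalized_sampling but are an assumption, as is the function name get_data_distribution_sketch. The sketch is not the library's actual implementation.

```python
import numpy as np


def get_data_distribution_sketch(y_data, g_data):
    """Hypothetical re-implementation of the empirical distributions above."""
    y_data = np.asarray(y_data)
    g_data = np.asarray(g_data)
    # Assumption: labels are integer ids 0..n_class-1 and 0..n_groups-1.
    n_class = int(y_data.max()) + 1
    n_groups = int(g_data.max()) + 1
    counts = np.zeros((n_class, n_groups))
    np.add.at(counts, (y_data, g_data), 1)  # count instances per (y, g) cell
    joint_dist = counts / counts.sum()
    y_dist = joint_dist.sum(axis=1)  # marginal over groups
    g_dist = joint_dist.sum(axis=0)  # marginal over classes
    return {
        "joint_dist": joint_dist,
        "g_dist": g_dist,
        "y_dist": y_dist,
        # Row y holds P(g | y); column g holds P(y | g).
        "g_cond_y_dist": joint_dist / y_dist[:, None],
        "y_cond_g_dist": joint_dist / g_dist[None, :],
    }


dist = get_data_distribution_sketch([0, 0, 0, 1], [0, 0, 1, 1])
```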

fairlib.src.dataloaders.generalized_BT.manipulate_data_distribution(default_distribution_dict, N=None, GBTObj='original', alpha=1)

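Generalized BT: resample the data toward the target distribution selected by GBTObj, interpolated with the original distribution by alpha, and return the selected indices.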

Parameters

default_distribution_dict (dict) – a dict of distribution information of the original dataset.
N (int, optional) – The total number of returned indices. Defaults to None.
GBTObj (str, optional) – original | joint | g | y | g_cond_y | y_cond_g. Defaults to “original”.
alpha (int, optional) – interpolation between the original distribution and the target distribution. Defaults to 1.

Returns

list of selected indices.

Return type

list
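The exact target used for each GBTObj is not spelled out here. The sketch below shows one plausible reading under explicit assumptions. Each objective balances the named quantity toward a uniform distribution, the remaining factors keep their original values, and alpha linearly interpolates from the original (alpha = 0) to the balanced target (alpha = 1). interpolated_target_joint is a hypothetical name. Its output is a joint matrix that a cell-wise sampler could then turn into indices.

```python
import numpy as np


def interpolated_target_joint(joint_dist, GBTObj="original", alpha=1.0):
    """Hypothetical sketch of a generalized-BT target joint distribution.

    Assumptions (not confirmed by the docs): balanced means uniform,
    unconstrained factors keep their original values, and alpha mixes
    original (0) and target (1) linearly.
    """
    joint_dist = np.asarray(joint_dist, dtype=float)
    n_class, n_groups = joint_dist.shape
    y_dist = joint_dist.sum(axis=1)
    g_dist = joint_dist.sum(axis=0)

    def mix(original, target):
        return (1 - alpha) * original + alpha * target

    if GBTObj == "original":
        return joint_dist
    if GBTObj == "joint":
        return mix(joint_dist, np.full_like(joint_dist, 1.0 / joint_dist.size))
    if GBTObj == "y":
        # Balance P(y) while keeping each class's group mix P(g | y).
        g_cond_y = joint_dist / y_dist[:, None]
        return mix(y_dist, np.full(n_class, 1.0 / n_class))[:, None] * g_cond_y
    if GBTObj == "g":
        # Balance P(g) while keeping each group's class mix P(y | g).
        y_cond_g = joint_dist / g_dist[None, :]
        return y_cond_g * mix(g_dist, np.full(n_groups, 1.0 / n_groups))[None, :]
    if GBTObj == "g_cond_y":
        # Balance groups within each class while keeping P(y).
        return y_dist[:, None] * mix(joint_dist / y_dist[:, None], 1.0 / n_groups)
    if GBTObj == "y_cond_g":
        # Balance classes within each group while keeping P(g).
        return mix(joint_dist / g_dist[None, :], 1.0 / n_class) * g_dist[None, :]
    raise NotImplementedError(GBTObj)


joint = np.array([[0.5, 0.25], [0.0, 0.25]])
half_balanced = interpolated_target_joint(joint, "joint", alpha=0.5)
```

Each branch mixes two valid distributions, so every returned matrix still sums to 1.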
