DataLoader Module

fairlib.src.dataloaders.__init__

fairlib.src.dataloaders.__init__.get_dataloaders(args)

Initialize the torch dataloaders according to arguments.

Parameters

args (namespace) – arguments

Raises

NotImplementedError – if the corresponding components have not been implemented.

Returns

dataloaders for training set, development set, and test set.

Return type

tuple

fairlib.src.dataloaders.encoder

class fairlib.src.dataloaders.encoder.text2id(args)

Maps natural-language text to numeric identifiers.

fairlib.src.dataloaders.loaders

class fairlib.src.dataloaders.loaders.BiosDataset(args, split)
class fairlib.src.dataloaders.loaders.DeepMojiDataset(args, split)
class fairlib.src.dataloaders.loaders.SampleDataset(args, split)
class fairlib.src.dataloaders.loaders.TestDataset(args, split)
class fairlib.src.dataloaders.loaders.ValenceDataset(args, split)

fairlib.src.dataloaders.utils

class fairlib.src.dataloaders.utils.BaseDataset(args, split)

fairlib.src.dataloaders.utils.full_label_data(df, tasks)

Identify the instances that have all required labels.

Parameters
  • df (pd.DataFrame) – a DataFrame containing data instances

  • tasks (list) – a list of names of target columns

Returns

an array of boolean values indicating whether or not each row meets the requirement.

Return type

np.array
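
A minimal sketch of such a row mask, assuming that "has all required labels" means a non-null value in every column named in tasks. The helper name full_label_mask and the sample column names are hypothetical and are not part of fairlib.

```python
import numpy as np
import pandas as pd


def full_label_mask(df: pd.DataFrame, tasks: list) -> np.ndarray:
    """Return a boolean mask that is True for rows with no missing task label."""
    # Hypothetical stand-in for full_label_data; assumes missing labels are NaN/None.
    return df[tasks].notnull().all(axis=1).to_numpy()


# Illustrative data: the column names are made up for this example.
df = pd.DataFrame(
    {
        "text": ["a", "b", "c"],
        "sentiment": [1.0, np.nan, 0.0],
        "gender": [0.0, 1.0, np.nan],
    }
)
mask = full_label_mask(df, ["sentiment", "gender"])
complete = df[mask]  # rows where both labels are present
```

The mask can be used to index the DataFrame directly, as in the last line, so that only fully labelled instances are kept.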

fairlib.src.dataloaders.generalized_BT

fairlib.src.dataloaders.generalized_BT.generalized_sampling(default_distribution_dict, N=None, joint_dist=None, g_dist=None, y_dist=None, g_cond_y_dist=None, y_cond_g_dist=None)

Perform resampling according to the specified distribution information.

Parameters
  • default_distribution_dict (dict) – a dict of distribution information of the original dataset.

  • N (int, optional) – The total number of returned indices. Defaults to None.

  • joint_dist (np.ndarray, optional) – n_class * n_groups matrix, where each element refers to the joint probability, i.e., proportion size. Defaults to None.

  • g_dist (np.ndarray, optional) – n_groups array, indicating the prob of each group. Defaults to None.

  • y_dist (np.ndarray, optional) – n_class array, indicating the prob of each class. Defaults to None.

  • g_cond_y_dist (np.ndarray, optional) – n_class * n_groups matrix, g_cond_y_dist[y_id,:] refers to the group distribution within class y_id. Defaults to None.

  • y_cond_g_dist (np.ndarray, optional) – n_class * n_groups matrix, y_cond_g_dist[:,g_id] refers to the class distribution within group g_id. Defaults to None.

Returns

list of selected indices.

Return type

list
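
The idea of resampling toward a target joint distribution can be sketched as follows. This is an illustration under stated assumptions, not fairlib's implementation: the function name sample_by_joint, the yg_index mapping from each (class, group) cell to its instance indices, and the choice to oversample with replacement only when a cell is too small are all hypothetical. Every cell with a nonzero target share is assumed to contain at least one instance.

```python
import numpy as np


def sample_by_joint(yg_index, joint_dist, N, seed=0):
    """Draw about N indices so that cell (y, g) gets a joint_dist[y, g] share."""
    rng = np.random.default_rng(seed)
    selected = []
    n_class, n_groups = joint_dist.shape
    for y_id in range(n_class):
        for g_id in range(n_groups):
            pool = yg_index[(y_id, g_id)]
            n_cell = int(round(N * joint_dist[y_id, g_id]))
            if n_cell == 0:
                continue
            # Assumption: sample with replacement only if the cell is too small.
            replace = n_cell > len(pool)
            picked = rng.choice(pool, size=n_cell, replace=replace)
            selected.extend(picked.tolist())
    return selected


# Toy data: 8 instances with binary target (y) and protected (g) labels.
y = np.array([0, 0, 0, 0, 1, 1, 1, 0])
g = np.array([0, 0, 0, 1, 0, 1, 1, 0])
yg_index = {
    (yy, gg): np.flatnonzero((y == yy) & (g == gg))
    for yy in (0, 1)
    for gg in (0, 1)
}
uniform_joint = np.full((2, 2), 0.25)  # illustrative target: balanced cells
selected = sample_by_joint(yg_index, uniform_joint, N=8)
sel = np.array(selected)
```

Because each cell count is rounded independently, the number of returned indices can differ slightly from N for other targets.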

fairlib.src.dataloaders.generalized_BT.get_data_distribution(y_data, g_data)

Given target label and protected labels, calculate empirical distributions.

  • joint_dist: n_class * n_groups matrix, where each element refers to the joint probability, i.e., proportion size.

  • g_dist: n_groups array, indicating the prob of each group.

  • y_dist: n_class array, indicating the prob of each class.

  • g_cond_y_dist: n_class * n_groups matrix, g_cond_y_dist[y_id,:] refers to the group distribution within class y_id.

  • y_cond_g_dist: n_class * n_groups matrix, y_cond_g_dist[:,g_id] refers to the class distribution within group g_id.

Parameters
  • y_data (np.ndarray) – target labels

  • g_data (np.ndarray) – protected labels

Returns

a dict of distribution info.

Return type

dict
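
The five distributions can be computed from a count matrix, as in the sketch below. It is a hedged illustration rather than the library's code: the function name empirical_distributions is hypothetical, labels are assumed to be integer ids starting at 0, and every class and group is assumed to occur at least once so that no division by zero arises.

```python
import numpy as np


def empirical_distributions(y_data, g_data):
    """Compute joint, marginal, and conditional distributions from label arrays."""
    y_data = np.asarray(y_data)
    g_data = np.asarray(g_data)
    assert len(y_data) == len(g_data), "y_data and g_data must have equal length"

    # Assumption: labels are integer ids in 0..n_class-1 and 0..n_groups-1.
    n_class = int(y_data.max()) + 1
    n_groups = int(g_data.max()) + 1

    # Count how many instances fall into each (class, group) cell.
    counts = np.zeros((n_class, n_groups))
    np.add.at(counts, (y_data, g_data), 1)

    joint_dist = counts / counts.sum()
    y_dist = joint_dist.sum(axis=1)  # marginal over groups
    g_dist = joint_dist.sum(axis=0)  # marginal over classes
    return {
        "joint_dist": joint_dist,
        "g_dist": g_dist,
        "y_dist": y_dist,
        "g_cond_y_dist": joint_dist / y_dist[:, None],  # each row sums to 1
        "y_cond_g_dist": joint_dist / g_dist[None, :],  # each column sums to 1
    }


dist = empirical_distributions([0, 0, 1, 1, 1, 0], [0, 1, 0, 0, 1, 0])
```

In this toy example, the two classes are equally frequent while group 0 covers two thirds of the instances.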

fairlib.src.dataloaders.generalized_BT.manipulate_data_distribution(default_distribution_dict, N=None, GBTObj='original', alpha=1)

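Generalized BT: resample toward the target distribution selected by GBTObj, with alpha interpolating between the original and the target distribution.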

Parameters
  • default_distribution_dict (dict) – a dict of distribution information of the original dataset.

  • N (int, optional) – The total number of returned indices. Defaults to None.

  • GBTObj (str, optional) – one of original | joint | g | y | g_cond_y | y_cond_g. Defaults to “original”.

  • alpha (int, optional) – interpolation between the original distribution and the target distribution. Defaults to 1.

Returns

list of selected indices.

Return type

list
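
The role of alpha can be illustrated with a simple linear blend. The sketch assumes that alpha=0 keeps the original distribution and alpha=1 reaches the target, with alpha in [0, 1]; fairlib may define the interpolation differently. The function name interpolate_distribution and the uniform target are hypothetical choices made for this example.

```python
import numpy as np


def interpolate_distribution(original, target, alpha=1.0):
    """Blend two distributions; alpha=0 keeps the original, alpha=1 gives the target."""
    original = np.asarray(original, dtype=float)
    target = np.asarray(target, dtype=float)
    # Assumption: linear interpolation with alpha in [0, 1].
    mixed = (1.0 - alpha) * original + alpha * target
    return mixed / mixed.sum()  # renormalize to guard against rounding drift


original_joint = np.array([[0.4, 0.1], [0.2, 0.3]])
# Illustrative target: a uniform joint distribution over all cells.
uniform_joint = np.full_like(original_joint, 1.0 / original_joint.size)
halfway = interpolate_distribution(original_joint, uniform_joint, alpha=0.5)
```

The blended matrix could then serve as the joint distribution to resample toward.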