DataLoader Module
fairlib.src.dataloaders.__init__
- fairlib.src.dataloaders.__init__.get_dataloaders(args)
Initialize the torch dataloaders according to arguments.
- Parameters
args (namespace) – arguments
- Raises
NotImplementedError – if the corresponding components have not been implemented.
- Returns
dataloaders for training set, development set, and test set.
- Return type
tuple
fairlib.src.dataloaders.encoder
- class fairlib.src.dataloaders.encoder.text2id(args)
Maps natural language text to numeric identifiers.
fairlib.src.dataloaders.loaders
- class fairlib.src.dataloaders.loaders.BiosDataset(args, split)
- class fairlib.src.dataloaders.loaders.DeepMojiDataset(args, split)
- class fairlib.src.dataloaders.loaders.SampleDataset(args, split)
- class fairlib.src.dataloaders.loaders.TestDataset(args, split)
- class fairlib.src.dataloaders.loaders.ValenceDataset(args, split)
fairlib.src.dataloaders.utils
- class fairlib.src.dataloaders.utils.BaseDataset(args, split)
- fairlib.src.dataloaders.utils.full_label_data(df, tasks)
Filter the instances that have all required labels.
- Parameters
df (pd.DataFrame) – a DataFrame containing data instances
tasks (list) – a list of names of target columns
- Returns
an array of boolean values indicating whether or not each row meets the requirement.
- Return type
np.array
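The filtering described above can be sketched in pandas as follows. This is a minimal illustration, not fairlib's implementation: the helper name `full_label_mask`, the example column names, and the assumption that "has a label" means "is non-null" are all hypothetical.

```python
import numpy as np
import pandas as pd


def full_label_mask(df, tasks):
    """Hypothetical stand-in: True for rows where every column in `tasks` is non-null."""
    return df[tasks].notnull().all(axis=1).to_numpy()


# Toy data: the second row lacks a gender label, the third lacks a profession label.
df = pd.DataFrame({
    "text": ["a", "b", "c"],
    "profession": [0, 1, None],
    "gender": [1, None, 0],
})

mask = full_label_mask(df, ["profession", "gender"])
complete = df[mask]  # keeps only the fully labelled rows
```

The boolean array can be used directly to index the DataFrame, as in the last line.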
fairlib.src.dataloaders.generalized_BT
- fairlib.src.dataloaders.generalized_BT.generalized_sampling(default_distribution_dict, N=None, joint_dist=None, g_dist=None, y_dist=None, g_cond_y_dist=None, y_cond_g_dist=None)
Perform resampling according to the specified distribution information
- Parameters
default_distribution_dict (dict) – a dict of distribution information of the original dataset.
N (int, optional) – The total number of returned indices. Defaults to None.
joint_dist (np.ndarray, optional) – n_class * n_groups matrix, where each element refers to the joint probability, i.e., proportion size. Defaults to None.
g_dist (np.ndarray, optional) – n_groups array, indicating the prob of each group. Defaults to None.
y_dist (np.ndarray, optional) – n_class array, indicating the prob of each class. Defaults to None.
g_cond_y_dist (np.ndarray, optional) – n_class * n_groups matrix, g_cond_y_dist[y_id,:] refers to the group distribution within class y_id. Defaults to None.
y_cond_g_dist (np.ndarray, optional) – n_class * n_groups matrix, y_cond_g_dist[:,g_id] refers to the class distribution within group g_id. Defaults to None.
- Returns
list of selected indices.
- Return type
list
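To illustrate resampling toward a target joint distribution, here is a rough sketch. It is not fairlib's code: the function `sample_to_joint` is hypothetical, it takes raw label arrays rather than a distribution dict, and it assumes integer labels starting at 0. Cells with too few instances are sampled with replacement, which is one possible design choice among several.

```python
import numpy as np


def sample_to_joint(y_data, g_data, target_joint, N, seed=0):
    """Hypothetical helper: pick about N indices whose (class, group) counts follow target_joint."""
    rng = np.random.default_rng(seed)
    y_data = np.asarray(y_data)
    g_data = np.asarray(g_data)
    target_joint = np.asarray(target_joint, dtype=float)

    selected = []
    n_class, n_groups = target_joint.shape
    for y_id in range(n_class):
        for g_id in range(n_groups):
            # Indices of the instances that fall in this (class, group) cell.
            cell = np.flatnonzero((y_data == y_id) & (g_data == g_id))
            n_cell = int(round(N * target_joint[y_id, g_id]))
            if n_cell == 0:
                continue
            if len(cell) == 0:
                raise ValueError(f"no instances for class {y_id}, group {g_id}")
            # Oversample (with replacement) only when the cell is too small.
            replace = n_cell > len(cell)
            selected.extend(rng.choice(cell, size=n_cell, replace=replace).tolist())
    return selected


y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
g = np.array([0, 0, 1, 0, 1, 1, 1, 1])
uniform_joint = np.full((2, 2), 0.25)
indices = sample_to_joint(y, g, uniform_joint, N=8)
```

Because each cell size is rounded independently, the number of returned indices can differ slightly from N for other inputs.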
- fairlib.src.dataloaders.generalized_BT.get_data_distribution(y_data, g_data)
Given target label and protected labels, calculate empirical distributions.
The distributions are:
joint_dist – n_class * n_groups matrix, where each element refers to the joint probability, i.e., proportion size.
g_dist – n_groups array, indicating the prob of each group.
y_dist – n_class array, indicating the prob of each class.
g_cond_y_dist – n_class * n_groups matrix, g_cond_y_dist[y_id,:] refers to the group distribution within class y_id.
y_cond_g_dist – n_class * n_groups matrix, y_cond_g_dist[:,g_id] refers to the class distribution within group g_id.
- Parameters
y_data (np.ndarray) – target labels
g_data (np.ndarray) – protected labels
- Returns
a dict of distribution info.
- Return type
dict
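The distributions described above can be computed with NumPy along these lines. This is an illustrative re-derivation under stated assumptions, not fairlib's code: the function name `empirical_distributions` is hypothetical, labels are assumed to be integers starting at 0, and every class and group is assumed to occur at least once (otherwise the conditionals would divide by zero).

```python
import numpy as np


def empirical_distributions(y_data, g_data):
    """Hypothetical sketch: empirical joint, marginal, and conditional distributions."""
    y_data = np.asarray(y_data)
    g_data = np.asarray(g_data)
    n_class = int(y_data.max()) + 1
    n_groups = int(g_data.max()) + 1

    # Count the instances in each (class, group) cell.
    counts = np.zeros((n_class, n_groups))
    np.add.at(counts, (y_data, g_data), 1)

    joint_dist = counts / counts.sum()
    y_dist = joint_dist.sum(axis=1)  # marginal over groups
    g_dist = joint_dist.sum(axis=0)  # marginal over classes
    return {
        "joint_dist": joint_dist,
        "y_dist": y_dist,
        "g_dist": g_dist,
        "g_cond_y_dist": joint_dist / y_dist[:, None],  # each row sums to 1
        "y_cond_g_dist": joint_dist / g_dist[None, :],  # each column sums to 1
    }


y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
g = np.array([0, 0, 1, 0, 1, 1, 1, 1])
dist = empirical_distributions(y, g)
```

Here `dist["g_cond_y_dist"][0, :]` is the group distribution within class 0, matching the indexing convention in the description.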
- fairlib.src.dataloaders.generalized_BT.manipulate_data_distribution(default_distribution_dict, N=None, GBTObj='original', alpha=1)
Generalized balanced training (BT): resample the dataset toward the target distribution selected by GBTObj.
- Parameters
default_distribution_dict (dict) – a dict of distribution information of the original dataset.
N (int, optional) – The total number of returned indices. Defaults to None.
GBTObj (str, optional) – the distribution to balance, one of original | joint | g | y | g_cond_y | y_cond_g. Defaults to "original".
alpha (int, optional) – interpolation weight between the original distribution and the target distribution. Defaults to 1.
- Returns
list of selected indices.
- Return type
list
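One way to read the alpha parameter is as a linear mixing weight between two distributions. The sketch below makes that reading concrete for the joint distribution. It is an assumption, not a description of fairlib's internals: the function `interpolate_joint` is hypothetical, the interpolation is assumed to be linear, alpha = 1 is assumed to mean "fully the target", and the target is assumed to be a uniform joint distribution.

```python
import numpy as np


def interpolate_joint(original_joint, target_joint, alpha=1.0):
    """Hypothetical helper: linear mix of two joint distributions (assumed semantics)."""
    original_joint = np.asarray(original_joint, dtype=float)
    target_joint = np.asarray(target_joint, dtype=float)
    mixed = (1 - alpha) * original_joint + alpha * target_joint
    return mixed / mixed.sum()  # renormalise to guard against rounding drift


original = np.array([[0.25, 0.125], [0.125, 0.5]])
balanced = np.full_like(original, 1 / original.size)  # uniform over all cells

half = interpolate_joint(original, balanced, alpha=0.5)
```

Under these assumptions, alpha = 0 returns the original distribution unchanged and intermediate values move each cell proportionally toward the balanced target.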