yamle.data.datamodule module

class yamle.data.datamodule.BaseDataModule(validation_portion=0.1, test_portion=0.1, calibration_portion=0.0, seed=0, data_dir=None, train_splits=None, train_splits_proportions=None, train_size=None, train_transform=None, test_transform=None, test_augmentations=None, train_target_transform=None, test_target_transform=None, train_joint_transform=None, test_joint_transform=None, num_workers=0, batch_size=32, pin_memory=True)

Bases: ABC

General data module returning training, validation and test data loaders.

Parameters:
  • validation_portion (float) – Portion of the training data to use for validation.

  • test_portion (float) – Portion of the training data to use for testing if no test data is provided.

  • calibration_portion (float) – Portion of the training data to use for calibration.

  • seed (int) – Seed for the random number generator.

  • data_dir (str) – Path to the data directory.

  • train_splits (Optional[int]) – Number of splits to use for the training data.

  • train_splits_proportions (Optional[List[float]]) – Proportions of the training data to use for each split.

  • train_size (Optional[int]) – Size of the training data.

  • train_transform (Optional[List[str]]) – Transformations to apply to the training data. If a list is provided, the transformations are applied in the given order.

  • test_transform (Optional[List[str]]) – Transformations to apply to the test data. If a list is provided, the transformations are applied in the given order.

  • test_augmentations (Optional[List[str]]) – Augmentations to apply to the test data. If a list is provided, its order is preserved.

  • train_target_transform (Optional[List[str]]) – Transformations to apply to the training targets. If a list is provided, the transformations are applied in the given order.

  • test_target_transform (Optional[List[str]]) – Transformations to apply to the test targets. If a list is provided, the transformations are applied in the given order.

  • train_joint_transform (Optional[List[str]]) – Transformations to apply jointly to the training inputs and targets. If a list is provided, the transformations are applied in the given order.

  • test_joint_transform (Optional[List[str]]) – Transformations to apply jointly to the test inputs and targets. If a list is provided, the transformations are applied in the given order.

  • num_workers (Optional[int]) – Number of workers to use for the data loaders. Defaults to 0.

  • batch_size (int) – Batch size to use for the data loaders. Defaults to 32.

  • pin_memory (bool) – Whether to use pinned memory for the data loaders. Defaults to True.
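The separation between target transforms and joint transforms matters for dense tasks such as segmentation, where a random geometric change to the input must be mirrored exactly in the target. The sketch below is library-free and illustrative only: `horizontal_flip` and `joint_random_flip` are hypothetical names, not YAMLE functions, and it assumes a joint transform receives the input and the target together.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]  # stand-in for an image or a segmentation mask


def horizontal_flip(grid: Grid) -> Grid:
    """Mirror every row of a 2-D grid."""
    return [list(reversed(row)) for row in grid]


def joint_random_flip(
    image: Grid, mask: Grid, rng: random.Random, p: float = 0.5
) -> Tuple[Grid, Grid]:
    """Flip image and mask together with probability p.

    A single random draw decides for both, so the mask stays aligned with
    the image. Two independent draws could flip one but not the other.
    """
    if rng.random() < p:
        return horizontal_flip(image), horizontal_flip(mask)
    return image, mask


image = [[1, 2, 3], [4, 5, 6]]
mask = [[0, 0, 1], [0, 1, 1]]
flipped_image, flipped_mask = joint_random_flip(image, mask, random.Random(0), p=1.0)
print(flipped_image, flipped_mask)  # [[3, 2, 1], [6, 5, 4]] [[1, 0, 0], [1, 1, 0]]
```

A target-only transform could not express this coupling, because it never sees the random decision made for the input.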

data_shape = None
inputs_dim = None
inputs_dtype = torch.float32
outputs_dim = None
outputs_dtype = torch.float32
targets_dim = None
task = ''
ignore_indices = []

get_transform(name)

Returns the transformation with the given name.

Return type:

Callable

get_transform_composition(names, joint=False)

Returns the composition of the transformations with the given names.

Return type:

Compose
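The two methods above suggest a name-to-callable lookup followed by in-order composition. A minimal sketch, assuming a plain dictionary registry; `TRANSFORMS` and its entries are hypothetical, and plain functions stand in for torchvision transforms. Like `torchvision.transforms.Compose`, the composed callable applies its transformations sequentially, first to last, which is why the order of a configured list matters.

```python
from typing import Callable, Dict, List

Transform = Callable[[float], float]

# Hypothetical registry; each YAMLE data module defines its own names.
TRANSFORMS: Dict[str, Transform] = {
    "double": lambda x: 2 * x,
    "shift": lambda x: x + 1,
}


def get_transform(name: str) -> Transform:
    """Look up a single transformation by name."""
    try:
        return TRANSFORMS[name]
    except KeyError:
        raise ValueError(f"Unknown transformation: {name!r}") from None


def get_transform_composition(names: List[str]) -> Transform:
    """Chain the named transformations, applying them in list order."""
    transforms = [get_transform(name) for name in names]  # fail fast on bad names

    def composed(x: float) -> float:
        for transform in transforms:
            x = transform(x)
        return x

    return composed


print(get_transform_composition(["double", "shift"])(3))  # 7
print(get_transform_composition(["shift", "double"])(3))  # 8
```

Resolving every name when the composition is built, rather than on first use, surfaces a misspelled transformation at setup time.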

train_transform()

Returns the training data transformation.

Return type:

Optional[Compose]

validation_transform()

Returns the validation data transformation.

Return type:

Optional[Compose]

test_transform()

Returns the test data transformation.

Return type:

Optional[Compose]

test_augmentation(name)

Returns the augmentation with the given name.

Return type:

Callable

calibration_transform()

Returns the calibration data transformation.

Return type:

Optional[Compose]

train_target_transform()

Returns the training target transformation.

Return type:

Optional[Compose]

validation_target_transform()

Returns the validation target transformation.

Return type:

Optional[Compose]

test_target_transform()

Returns the test target transformation.

Return type:

Optional[Compose]

calibration_target_transform()

Returns the calibration target transformation.

Return type:

Optional[Compose]

train_joint_transform()

Returns the training joint transformation.

Return type:

Optional[Compose]

validation_joint_transform()

Returns the validation joint transformation.

Return type:

Optional[Compose]

test_joint_transform()

Returns the test joint transformation.

Return type:

Optional[Compose]

calibration_joint_transform()

Returns the calibration joint transformation.

Return type:

Optional[Compose]

train_dataset(split=None)

Returns the training dataset.

Return type:

Union[SurrogateDataset, Subset]
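When train_splits and train_splits_proportions are set, train_dataset(split) can return one part of the training data. How YAMLE partitions the data is not documented here; the sketch below assumes a seeded shuffle followed by consecutive slices, with the proportions summing to 1 and the last split absorbing rounding leftovers. `split_train_indices` is a hypothetical helper, not part of the module.

```python
import random
from typing import List


def split_train_indices(n: int, proportions: List[float], seed: int = 0) -> List[List[int]]:
    """Partition range(n) into disjoint index lists, one per proportion."""
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("proportions must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed -> same partition
    splits: List[List[int]] = []
    start = 0
    for i, p in enumerate(proportions):
        # Floor each share; the last split takes whatever is left over.
        end = n if i == len(proportions) - 1 else start + int(p * n)
        splits.append(indices[start:end])
        start = end
    return splits


parts = split_train_indices(10, [0.5, 0.3, 0.2], seed=0)
print([len(part) for part in parts])  # [5, 3, 2]
```

Because the splits are disjoint and deterministic for a given seed, each split index keeps referring to the same samples across runs, which is what a Subset-based train_dataset(split) would need.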

train_dataset_size(split=None)

Returns the size of the training dataset.

Return type:

int

train_dataloader(shuffle=True, split=None)

Returns the training data loader.

Return type:

DataLoader

train_number_of_batches(split=None)

Returns the number of batches in the training dataset.

Return type:

int
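If the loaders keep the last incomplete batch (the PyTorch DataLoader default, drop_last=False), the batch count is the dataset size divided by the batch size, rounded up. A sketch of that arithmetic; whether YAMLE's loaders set drop_last is an assumption this page does not settle, so both cases are shown, and `number_of_batches` is a hypothetical name.

```python
def number_of_batches(dataset_size: int, batch_size: int, drop_last: bool = False) -> int:
    """Batches per epoch for a loader with the given batch size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        return dataset_size // batch_size  # the incomplete final batch is discarded
    return -(-dataset_size // batch_size)  # ceiling division without floats


print(number_of_batches(1000, 32))  # 32
```

With the default batch_size=32, 1000 training samples give 32 batches, the last holding only 8 samples.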

validation_dataset()

Returns the validation dataset.

Return type:

Optional[SurrogateDataset]

validation_dataset_size()

Returns the size of the validation dataset.

Return type:

int

validation_dataloader(shuffle=False)

Returns the validation data loader.

Return type:

Optional[DataLoader]

validation_number_of_batches()

Returns the number of batches in the validation dataset.

Return type:

int

test_dataset()

Returns the test dataset.

Return type:

SurrogateDataset

test_dataset_size()

Returns the size of the test dataset.

Return type:

int

test_dataloader(shuffle=False)

Returns the test data loader.

Return type:

DataLoader

test_number_of_batches()

Returns the number of batches in the test dataset.

Return type:

int

calibration_dataset()

Returns the calibration dataset.

Return type:

Optional[SurrogateDataset]

calibration_dataset_size()

Returns the size of the calibration dataset.

Return type:

int

calibration_dataloader(shuffle=False)

Returns the calibration data loader.

Return type:

Optional[DataLoader]

calibration_number_of_batches()

Returns the number of batches in the calibration dataset.

Return type:

int

total_dataset_size()

Returns the size of the total dataset.

Return type:

int

sample_data(batch_size=1, dataset='train')

Samples random data from the training, validation or test dataset.

Returns the inputs, targets and indices of the sampled data.

Return type:

Tuple[Tensor, Tensor, Tensor]
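A library-free sketch of what sample_data describes, assuming sampling without replacement and a dataset that yields (input, target) pairs. The real method returns tensors; plain lists keep the example self-contained, and `sample_batch` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def sample_batch(
    dataset: Sequence[Tuple[X, Y]],
    batch_size: int = 1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[X], List[Y], List[int]]:
    """Draw batch_size distinct examples; return inputs, targets and their indices."""
    rng = rng if rng is not None else random.Random()
    indices = rng.sample(range(len(dataset)), batch_size)  # no repeated indices
    inputs = [dataset[i][0] for i in indices]
    targets = [dataset[i][1] for i in indices]
    return inputs, targets, indices


toy = [(x, x * x) for x in range(10)]
inputs, targets, indices = sample_batch(toy, batch_size=4, rng=random.Random(0))
```

Returning the indices alongside the data lets a caller trace each sample back to its position in the dataset, for example when plotting predictions.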

abstract prepare_data()

Downloads and prepares the data. The data is stored in self._train_dataset, self._validation_dataset, self._test_dataset and self._calibration_dataset.

Return type:

None

setup(*args, **kwargs)

Splits the data into training, validation, calibration and test sets.

The training and test sets must always be provided; the validation and calibration sets are optional. If the validation or calibration set is already provided by the data module, the corresponding portion is ignored. Otherwise, the validation and calibration sets are split off from the training set.

Return type:

None
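The splitting rule above can be sketched on index lists. This is an illustration under stated assumptions, not YAMLE's code: it assumes a seeded shuffle, floors each portion, and takes validation indices first and calibration indices next; `split_training_set` and the `*_provided` flags are hypothetical.

```python
import random
from typing import Dict, List


def split_training_set(
    n_train: int,
    validation_portion: float = 0.1,
    calibration_portion: float = 0.0,
    validation_provided: bool = False,
    calibration_provided: bool = False,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Carve validation and calibration indices out of the training indices."""
    if validation_portion + calibration_portion >= 1.0:
        raise ValueError("portions must leave some training data")
    indices = list(range(n_train))
    random.Random(seed).shuffle(indices)
    # A split that the data module already provides ignores its portion.
    n_val = 0 if validation_provided else int(validation_portion * n_train)
    n_cal = 0 if calibration_provided else int(calibration_portion * n_train)
    return {
        "validation": indices[:n_val],
        "calibration": indices[n_val:n_val + n_cal],
        "train": indices[n_val + n_cal:],
    }


sizes = {k: len(v) for k, v in split_training_set(1000, 0.1, 0.05).items()}
print(sizes)  # {'validation': 100, 'calibration': 50, 'train': 850}
```

Taking both held-out sets from one shuffled permutation keeps them disjoint from each other and from the remaining training data.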

plot(tester, save_path, specific_name='')

If self.can_be_plotted is True, this method plots the data and the model predictions.

Return type:

None

static add_specific_args(parent_parser)

Adds data-module-specific arguments to the general parser.

Return type:

ArgumentParser
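A common pattern for a static method of this kind is to build a child parser on top of the parent with argparse's `parents` mechanism. The sketch below follows that pattern under assumptions: the `--datamodule_*` flag names are hypothetical and do not claim to match YAMLE's actual arguments.

```python
import argparse


def add_specific_args(parent_parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Return a parser that extends parent_parser with data-module options."""
    # add_help=False: the parent already defines -h/--help, and argparse would
    # raise a conflict if the child added it a second time.
    parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
    # Hypothetical flags; the defaults mirror BaseDataModule's signature.
    parser.add_argument("--datamodule_batch_size", type=int, default=32)
    parser.add_argument("--datamodule_validation_portion", type=float, default=0.1)
    parser.add_argument("--datamodule_seed", type=int, default=0)
    return parser


parent = argparse.ArgumentParser()
parent.add_argument("--epochs", type=int, default=1)
parser = add_specific_args(parent)
args = parser.parse_args(["--epochs", "3", "--datamodule_batch_size", "64"])
```

The returned parser accepts both the general and the data-module arguments, so a single parse_args call configures everything.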

class yamle.data.datamodule.VisionDataModule(*args, **kwargs)

Bases: BaseDataModule

Data module for the vision datasets.

Parameters:
  • validation_portion (float) – Portion of the training data to use for validation.

  • seed (int) – Seed for the random number generator.

  • data_dir (str) – Path to the data directory.

  • train_transform (Callable) – Transformations to apply to the training data. Default: transforms.ToTensor(), transforms.Normalize(mean, std).

  • test_transform (Callable) – Transformations to apply to the test data. Default: transforms.ToTensor(), transforms.Normalize(mean, std).

mean: Tuple[float, ...] = None
std: Tuple[float, ...] = None
inputs_dtype = torch.float32

get_transform(name)

This is a helper function to get the transform by name.

Return type:

Callable
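The default vision transformations end with transforms.Normalize(mean, std), which standardizes each channel c as (x - mean[c]) / std[c] using the class attributes mean and std. A stdlib-only sketch of that arithmetic on a nested-list image indexed as [channel][row][column]; `normalize` here is an illustrative function, not torchvision's implementation.

```python
from typing import List, Sequence

Image = List[List[List[float]]]  # indexed as [channel][row][column]


def normalize(image: Image, mean: Sequence[float], std: Sequence[float]) -> Image:
    """Per-channel standardization: (pixel - mean[c]) / std[c]."""
    if not (len(image) == len(mean) == len(std)):
        raise ValueError("mean and std need one entry per channel")
    return [
        [[(pixel - m) / s for pixel in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


image = [[[0.5, 1.0]], [[0.0, 0.25]]]  # two channels, one row, two columns
print(normalize(image, mean=(0.5, 0.0), std=(0.5, 0.25)))  # [[[0.0, 1.0]], [[0.0, 1.0]]]
```

Since ToTensor runs first and scales 8-bit pixel values to [0, 1], mean and std are expressed on that [0, 1] scale.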

setup(augmentation=None, *args, **kwargs)

Splits the data into training, validation and test sets.

Additionally, applies the given augmentation to the test data by inserting it at the first position of the existing test transformation.

Parameters:

augmentation (str) – Name of the augmentation to apply to the test data.

Return type:

None
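Placing the augmentation first means it acts on the raw input before any conversion or normalization (for images, presumably before ToTensor and Normalize). A sketch of that insertion with plain callables; all names are hypothetical, and returning a new list instead of mutating the configured pipeline is a choice of this sketch, not something this page documents.

```python
from typing import Callable, List, Optional

Transform = Callable[[float], float]


def with_test_augmentation(
    test_transforms: List[Transform], augmentation: Optional[Transform]
) -> List[Transform]:
    """Return a new pipeline with the augmentation placed first."""
    if augmentation is None:
        return list(test_transforms)
    return [augmentation, *test_transforms]


def run(pipeline: List[Transform], x: float) -> float:
    for transform in pipeline:  # applied first to last
        x = transform(x)
    return x


def to_unit_range(x: float) -> float:
    return x / 255.0


def brighten(x: float) -> float:
    return min(x + 10.0, 255.0)  # hypothetical augmentation on raw pixel values


pipeline = with_test_augmentation([to_unit_range], brighten)
print(run(pipeline, 245.0))  # 1.0
```

Had brighten been appended instead, it would add 10 to a value already scaled to [0, 1], which is why the insertion position matters.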

plot(tester, save_path, specific_name='')

Plots random samples from the training, validation and test sets to check whether the model predicts them correctly.

Return type:

None

class yamle.data.datamodule.VisionClassificationDataModule(*args, **kwargs)

Bases: VisionDataModule

Data module for the vision classification datasets.

task = 'classification'
available_transforms: List[str]
available_test_augmentations: List[str]
test_augmentations: List[str]

class yamle.data.datamodule.VisionRegressionDataModule(*args, **kwargs)

Bases: VisionDataModule

Data module for the vision regression datasets.

task = 'regression'

plot(tester, save_path, specific_name='')

Plots random samples from the training, validation and test sets to check whether the model predicts them correctly.

Return type:

None

available_transforms: List[str]
available_test_augmentations: List[str]
test_augmentations: List[str]

class yamle.data.datamodule.RealWorldDataModule(*args, **kwargs)

Bases: BaseDataModule

Data module for real world datasets.

To test out-of-distribution robustness, the test dataset can be modified with tabular corruptions. The corruptions are applied to the test dataset only.
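This page does not list which tabular corruptions are available. As one hypothetical example, the sketch adds Gaussian noise to every feature, scaled by that feature's standard deviation and a severity factor; both the function name and the scaling rule are assumptions. Only the test split would be passed through it.

```python
import random
import statistics
from typing import List

Rows = List[List[float]]


def gaussian_noise_corruption(rows: Rows, severity: float, seed: int = 0) -> Rows:
    """Hypothetical corruption: add N(0, (severity * std_j)^2) noise to feature j."""
    if severity < 0:
        raise ValueError("severity must be non-negative")
    rng = random.Random(seed)
    # Per-feature population standard deviation, computed column by column.
    stds = [statistics.pstdev(column) for column in zip(*rows)]
    return [[x + rng.gauss(0.0, severity * s) for x, s in zip(row, stds)] for row in rows]


test_rows = [[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]]
corrupted = gaussian_noise_corruption(test_rows, severity=0.5, seed=0)
```

Scaling by each feature's spread keeps the corruption comparable across features with different units; a constant feature has zero spread and is left untouched.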

inputs_dtype = torch.float32
outputs_dtype = torch.int64
targets_dim = 1
available_test_augmentations: List[str]
test_augmentations: List[str]

get_transform(name)

This is a helper function to get the transform by name.

Return type:

Callable

prepare_data()

Prepares the data for training, validation, and testing.

Return type:

None

setup(augmentation=None)

Splits the data into training, validation and test sets.

Additionally, applies the given augmentation to the test data by inserting it at the first position of the existing test transformation.

Parameters:

augmentation (str) – Name of the augmentation to apply to the test data. Default: None.

Return type:

None

available_transforms: List[str]

class yamle.data.datamodule.RealWorldRegressionDataModule(*args, **kwargs)

Bases: RealWorldDataModule

Data module for real world regression datasets.

Parameters:

test_portion (float) – Portion of the training data to use for testing.

inputs_dtype = torch.float32
outputs_dtype = torch.float32
targets_dim = 1

prepare_data()

Prepares the data for training, validation, and testing.

Return type:

None

available_transforms: List[str]
available_test_augmentations: List[str]
test_augmentations: List[str]

class yamle.data.datamodule.RealWorldClassificationDataModule(*args, **kwargs)

Bases: RealWorldDataModule

Data module for real world classification datasets.

Parameters:

test_portion (float) – Portion of the training data to use for testing.

inputs_dtype = torch.float32
outputs_dtype = torch.int64
targets_dim = 1
available_transforms: List[str]
available_test_augmentations: List[str]
test_augmentations: List[str]