Data Factory
Convenience functions to create NG data configurations.
SupportedPatchFilterConfig = MaxPatchFilterConfig | MeanStdPatchFilterConfig | ShannonPatchFilterConfig
module-attribute
Configuration for filtering background patches during training.
create_ng_data_configuration(data_type, axes, patch_size, batch_size, augmentations=None, normalization=None, patch_filter_config=None, channels=None, in_memory=None, n_val_patches=8, num_workers=-1, train_dataloader_params=None, val_dataloader_params=None, pred_dataloader_params=None, seed=None)
Create a training NGDatasetConfig.
Note that num_workers is applied to all dataloaders unless explicitly overridden
in the respective dataloader parameters.
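The override rule in the note above can be sketched as follows. This is a minimal illustration, not the library's implementation: `resolve_dataloader_params` is a hypothetical helper, and it assumes that "explicitly overridden" means a `num_workers` key is present in the per-dataloader dictionary. How `-1` is resolved to an actual worker count is not modeled here.

```python
from __future__ import annotations

from typing import Any, Optional


def resolve_dataloader_params(
    num_workers: int, params: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Hypothetical helper: merge the global num_workers into one dataloader's params."""
    # Copy so that the caller's dictionary is left untouched.
    resolved = dict(params or {})
    # The global value is only a fallback; an explicit per-dataloader entry wins.
    resolved.setdefault("num_workers", num_workers)
    return resolved


# The training dataloader inherits the global value, validation overrides it.
train_params = resolve_dataloader_params(4, {"shuffle": True})
val_params = resolve_dataloader_params(4, {"num_workers": 0})
```

Under this assumption, `train_params` gains `num_workers=4` while `val_params` keeps its own `num_workers=0`.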
Parameters:
-
data_type((array, tiff, zarr, czi, custom), default:"array") –Type of the data.
-
axes(str) –Axes of the data.
-
patch_size(list of int) –Size of the patches along the spatial dimensions.
-
batch_size(int) –Batch size.
-
augmentations(list of transforms or None, default:None) –List of transforms to apply. If None, default augmentations are applied (flip in X and Y, rotations by 90 degrees in the XY plane).
-
normalization(dict, default:None) –Normalization configuration dictionary. If None, defaults to mean_std normalization with automatically computed statistics.
-
patch_filter_config(SupportedPatchFilterConfig | None, default:None) –Configuration for patch filtering. Patch filtering reduces the probability of background patches being selected during training. If None, no patch filter is applied.
-
channels(Sequence of int, default:None) –List of channels to use. If None, all channels are used.
-
in_memory(bool, default:None) –Whether to load all data into memory. This is only supported for 'array', 'tiff' and 'custom' data types. If None, defaults to True for 'array', 'tiff' and 'custom', and False for 'zarr' and 'czi' data types. Must be True for 'array'.
-
n_val_patches(int, default:8) –The number of patches to set aside for validation during training. This parameter is ignored if separate validation data is specified for training.
-
num_workers(int, default:-1) –Number of workers for data loading. Use -1 to choose automatically based on the number of available CPUs (calls get_default_num_workers).
-
train_dataloader_params(dict, default:None) –Parameters for the training dataloader (see PyTorch notes).
-
val_dataloader_params(dict, default:None) –Parameters for the validation dataloader (see PyTorch notes).
-
pred_dataloader_params(dict, default:None) –Parameters for the prediction dataloader (see PyTorch notes).
-
seed(int, default:None) –Random seed for reproducibility. If None, a seed is generated automatically.
Returns:
-
DataConfig–Next-Generation Data model with the specified parameters.
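The `in_memory` rules described in the parameter list can be summarized in a short sketch. `resolve_in_memory` and `IN_MEMORY_TYPES` are hypothetical names introduced for illustration, and raising `ValueError` on an unsupported combination is an assumption; the documentation above only states which combinations are supported.

```python
from typing import Optional

# Data types that the parameter description lists as loadable into memory.
IN_MEMORY_TYPES = {"array", "tiff", "custom"}


def resolve_in_memory(data_type: str, in_memory: Optional[bool] = None) -> bool:
    """Hypothetical helper mirroring the documented in_memory rules."""
    if in_memory is None:
        # Documented default: True for array/tiff/custom, False for zarr/czi.
        return data_type in IN_MEMORY_TYPES
    if in_memory and data_type not in IN_MEMORY_TYPES:
        # Assumption: an unsupported request is rejected with a ValueError.
        raise ValueError(f"in_memory=True is not supported for '{data_type}' data.")
    if not in_memory and data_type == "array":
        # Documented constraint: in_memory must be True for 'array'.
        raise ValueError("in_memory must be True for 'array' data.")
    return in_memory
```

For example, `resolve_in_memory("zarr")` returns `False`, while `resolve_in_memory("tiff", False)` keeps the explicit choice.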
list_spatial_augmentations(augmentations=None, seed=None)
List the augmentations to apply.
Parameters:
-
augmentations(list of transforms, default:None) –List of transforms to apply, either both or one of XYFlipConfig and XYRandomRotate90Config.
-
seed(int, default:None) –Random seed for reproducibility.
Returns:
-
list of transforms–List of transforms to apply.
Raises:
-
ValueError–If the transforms are not XYFlipConfig or XYRandomRotate90Config.
-
ValueError–If there are duplicate transforms.
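The two error conditions listed under Raises can be illustrated with a small sketch. The classes below are stand-ins that only borrow the names `XYFlipConfig` and `XYRandomRotate90Config`; they are not the library's configuration classes, and `check_spatial_augmentations` is a hypothetical helper. The default returned for `None` is assumed to match the default described for the `augmentations` parameter of `create_ng_data_configuration`, and seed handling is left out.

```python
from dataclasses import dataclass


@dataclass
class XYFlipConfig:
    """Stand-in for the flip configuration (not the library class)."""

    name: str = "XYFlip"


@dataclass
class XYRandomRotate90Config:
    """Stand-in for the 90-degree rotation configuration (not the library class)."""

    name: str = "XYRandomRotate90"


ALLOWED_AUGMENTATIONS = (XYFlipConfig, XYRandomRotate90Config)


def check_spatial_augmentations(augmentations=None):
    """Hypothetical helper: validate a list of spatial augmentations."""
    if augmentations is None:
        # Assumed default: flip in X and Y, plus rotations by 90 degrees in XY.
        return [XYFlipConfig(), XYRandomRotate90Config()]
    for aug in augmentations:
        if not isinstance(aug, ALLOWED_AUGMENTATIONS):
            raise ValueError(f"Unsupported transform: {type(aug).__name__}.")
    # Each transform type may appear at most once.
    types = [type(aug) for aug in augmentations]
    if len(set(types)) != len(types):
        raise ValueError("Duplicate transforms are not allowed.")
    return list(augmentations)
```

Passing `[XYFlipConfig(), XYFlipConfig()]` or a list containing any other object raises `ValueError` in this sketch, matching the two documented cases.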