Data Config

Data configuration.

`Float = Annotated[float, PlainSerializer(np_float_to_scientific_str, return_type=str)]` `module-attribute`

Annotated float type, used to serialize floats to strings.

`PatchFilterConfig = Union[MaxPatchFilterConfig, MeanStdPatchFilterConfig, ShannonPatchFilterConfig]` `module-attribute`

Patch filter type.

`PatchingConfig = Union[FixedRandomPatchingConfig, RandomPatchingConfig, StratifiedPatchingConfig, TiledPatchingConfig, WholePatchingConfig]` `module-attribute`

Patching strategy type.

`DataConfig`

Bases: BaseModel

Next-Generation Dataset configuration.

A DataConfig is used for both training and prediction; the patching strategy determines how the data is processed. Note that random is the only patching strategy compatible with training, while tiled and whole are only used for prediction.

All supported transforms are defined in the SupportedTransform enum.

`augmentations = Field(default=(XYFlipConfig(), XYRandomRotate90Config()), validate_default=True)` `class-attribute` `instance-attribute`

List of augmentations to apply to the data. Available transforms are defined in SupportedTransform.

`axes` `instance-attribute`

Axes of the data, as defined in SupportedAxes.

`batch_size = Field(default=1, ge=1, validate_default=True)` `class-attribute` `instance-attribute`

Batch size for training.

`channels = Field(default=None)` `class-attribute` `instance-attribute`

Channels to use from the data. If None, all channels are used. Note that it is applied to both inputs and targets.

`data_type` `instance-attribute`

Type of input data.

`in_memory = Field(default_factory=default_in_memory, validate_default=True)` `class-attribute` `instance-attribute`

Whether to load all data into memory. This is only supported for 'array', 'tiff' and 'custom' data types. Must be True for 'array'. If None, defaults to True for 'array', 'tiff' and 'custom', and False for 'zarr' and 'czi' data types.

`mask_filter = Field(default_factory=(lambda data: _create_mask_filter(data)))` `class-attribute` `instance-attribute`

Mask filter configuration to apply when using a mask during training. Coverage is automatically set to 1/(2**ndims) based on data dimensionality where ndims is determined from axes. Only available in training mode.
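The coverage rule above can be illustrated with a minimal sketch. The function name is hypothetical, and the way ndims is derived from the axes (3 when "Z" is present, 2 otherwise) is an assumption rather than the library's confirmed logic.

```python
def default_mask_coverage(axes: str) -> float:
    """Return 1 / 2**ndims for the given axes string (illustrative only)."""
    # Assumption: "Z" marks a third spatial dimension, otherwise data is 2D.
    ndims = 3 if "Z" in axes else 2
    return 1 / (2**ndims)


coverage_2d = default_mask_coverage("YX")    # 1 / 4
coverage_3d = default_mask_coverage("SZYX")  # 1 / 8
```

Under these assumptions, 2D data gets a coverage of 0.25 and 3D data a coverage of 0.125.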

`mode` `instance-attribute`

Dataset mode, either training, validating or predicting.

`n_val_patches = Field(default=8, ge=0, validate_default=True)` `class-attribute` `instance-attribute`

The number of patches to set aside for validation during training. This parameter will be ignored if separate validation data is specified for training.

`normalization = Field(...)` `class-attribute` `instance-attribute`

Normalization configuration to use.

`num_workers = Field(default_factory=get_default_num_workers, ge=0)` `class-attribute` `instance-attribute`

Default number of workers for all dataloaders that do not explicitly set num_workers. Automatically detected based on the current platform: 0 on Windows and macOS, min(cpu_count - 1, 4) on Linux.

`patch_filter = Field(default=None, discriminator='name')` `class-attribute` `instance-attribute`

Patch filter to apply when using random patching. Only available if mode is training.

`patching = Field(..., discriminator='name')` `class-attribute` `instance-attribute`

Patching strategy to use. Note that random is the only supported strategy for training, while tiled and whole are only used for prediction.

`pred_dataloader_params = Field(default={})` `class-attribute` `instance-attribute`

Dictionary of PyTorch prediction dataloader parameters.

`seed = Field(default_factory=generate_random_seed, gt=0)` `class-attribute` `instance-attribute`

Random seed for reproducibility. If not specified, a random seed is generated.

`train_dataloader_params = Field(default={'shuffle': True}, validate_default=True)` `class-attribute` `instance-attribute`

Dictionary of PyTorch training dataloader parameters. The dataloader parameters should include the shuffle key, which is set to True by default. We strongly recommend keeping it set to True to ensure the best training results.

`val_dataloader_params = Field(default={})` `class-attribute` `instance-attribute`

Dictionary of PyTorch validation dataloader parameters.

`__str__()`

Pretty string representing the configuration.

Returns:

str –

Pretty string.

`axes_valid(axes, info)` `classmethod`

Validate axes.

Axes must:

- be a combination of 'STCZYX'
- not contain duplicates
- contain at least 2 contiguous axes: X and Y
- contain at most 4 axes
- not contain both S and T axes

Parameters:

axes (str) –

Axes to validate.
info (ValidationInfo) –

Validation information.

Returns:

str –

Validated axes.

Raises:

ValueError –

If axes are not valid.
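As a rough illustration of the axes rules listed for this validator, the following standalone sketch applies them in order. The function name check_axes is hypothetical, and reading "contiguous" as X and Y being adjacent in the string is an assumption; the actual validator may differ in details and error messages.

```python
def check_axes(axes: str) -> str:
    """Return axes unchanged if they satisfy the documented rules, else raise."""
    if any(axis not in "STCZYX" for axis in axes):
        raise ValueError(f"Axes must be a combination of 'STCZYX', got {axes!r}.")
    if len(set(axes)) != len(axes):
        raise ValueError(f"Axes must not contain duplicates, got {axes!r}.")
    # Assumption: "contiguous" means X and Y are adjacent in the string.
    if "XY" not in axes and "YX" not in axes:
        raise ValueError(f"Axes must contain contiguous X and Y, got {axes!r}.")
    if len(axes) > 4:
        raise ValueError(f"Axes must contain at most 4 axes, got {axes!r}.")
    if "S" in axes and "T" in axes:
        raise ValueError(f"Axes must not contain both S and T, got {axes!r}.")
    return axes
```

With this sketch, "YX" and "SZYX" pass, while "STYX" (both S and T) and "YZX" (X and Y not adjacent) raise a ValueError.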

`batch_size_not_in_dataloader_params(dataloader_params)` `classmethod`

Validate that batch_size is not set in the dataloader parameters.

batch_size must be set through batch_size field, not through the dataloader parameters.

Parameters:

dataloader_params (dict of {str: Any}) –

The dataloader parameters.

Returns:

dict of {str: Any} –

The validated dataloader parameters.

Raises:

ValueError –

If batch_size is present in the dataloader parameters.
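The check described for this validator amounts to a key lookup. A minimal sketch, with a hypothetical function name and error message:

```python
from typing import Any, Dict


def reject_batch_size(dataloader_params: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if batch_size was passed through the dataloader parameters."""
    if "batch_size" in dataloader_params:
        raise ValueError(
            "Set the batch size through the batch_size field, "
            "not through the dataloader parameters."
        )
    return dataloader_params
```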

`convert_mode(new_mode, new_patch_size=None, overlap_size=None, new_batch_size=None, new_data_type=None, new_axes=None, new_channels=None, new_in_memory=None, new_dataloader_params=None)`

Convert a training dataset configuration to a different mode.

This method is intended to facilitate creating validation or prediction configurations from a training configuration.

To perform tile prediction when switching to predicting mode, please provide both new_patch_size and overlap_size. Switching mode to predicting without specifying new_patch_size and overlap_size will apply the default patching strategy, namely whole image strategy. new_patch_size and overlap_size are only used when switching to predicting.

new_channels=None will retain the same channels as in the current configuration. To select all channels, please specify all channels explicitly or pass new_channels='all'.

New dataloader parameters will be placed in the appropriate dataloader params field depending on the new mode.

To create a new training configuration, please use careamics.config.create_ng_data_configuration.

This method compares the new parameters with the current ones and raises errors if incompatible changes are requested, such as switching between 2D and 3D axes, or changing the number of channels. Incompatibility across parameters may be delegated to Pydantic validation.

Parameters:

new_mode (Literal['validating', 'predicting']) –

The new dataset mode, one of validating or predicting.
new_patch_size (Sequence of int, default: None ) –

New patch size. If None for predicting, uses default whole image strategy.
overlap_size (Sequence of int, default: None ) –

New overlap size. Necessary when switching to predicting with tiled patching.
new_batch_size (int, default: None ) –

New batch size.
new_data_type (Literal['array', 'tiff', 'zarr', 'czi', 'custom'], default: None ) –

New data type.
new_axes (str, default: None ) –

New axes.
new_channels (Sequence of int or "all", default: None ) –

New channels.
new_in_memory (bool, default: None ) –

New in_memory value.
new_dataloader_params (dict of {str: Any}, default: None ) –

New dataloader parameters. These will be placed in the appropriate dataloader params field depending on the new mode.

Returns:

DataConfig –

New DataConfig with the updated mode and parameters.

Raises:

ValueError –

If conversion to training mode is requested, or if incompatible changes are requested.
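The mode-dependent choices described for convert_mode can be summarized in a small sketch. Both helper names are hypothetical and this is not the library's implementation. In particular, the text does not say what happens when only one of new_patch_size and overlap_size is given; the sketch falls back to whole-image prediction in that case as an assumption.

```python
from typing import Optional, Sequence


def choose_prediction_patching(
    new_patch_size: Optional[Sequence[int]] = None,
    overlap_size: Optional[Sequence[int]] = None,
) -> str:
    """Pick the patching strategy name used when switching to predicting."""
    if new_patch_size is not None and overlap_size is not None:
        return "tiled"
    # Assumption: anything short of both values falls back to the default.
    return "whole"


def dataloader_field_for(new_mode: str) -> str:
    """Name the dataloader params field that receives new parameters."""
    fields = {
        "validating": "val_dataloader_params",
        "predicting": "pred_dataloader_params",
    }
    if new_mode not in fields:
        # Conversion to training mode is documented as an error.
        raise ValueError(f"Cannot convert to mode {new_mode!r}.")
    return fields[new_mode]
```

For example, passing a patch size of (64, 64) with an overlap of (16, 16) selects tiled prediction, while passing neither selects whole-image prediction.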

`is_3D()`

Check if the data is 3D based on the axes.

Data is considered 3D when "Z" is in the axes and the patching patch_size has 3 dimensions. For CZI data, it is considered 3D when "Z" or "T" is in the axes and the patching patch_size has 3 dimensions.

This method is used during Configuration validation to cross-check dimensions with the algorithm configuration.

Returns:

bool –

True if the data is 3D, False otherwise.
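The 3D rule for this method can be written as a standalone function. This is a sketch under one reading of the description: the function name and its flat arguments are hypothetical, whereas the real method reads these values from the configuration itself.

```python
from typing import Sequence


def is_3d(axes: str, patch_size: Sequence[int], data_type: str) -> bool:
    """Return True if axes and patch size together describe 3D data."""
    has_3d_patch = len(patch_size) == 3
    if data_type == "czi":
        # For CZI data, either Z or T may act as the third dimension.
        return ("Z" in axes or "T" in axes) and has_3d_patch
    return "Z" in axes and has_3d_patch
```

Under this reading, axes "TYX" with a three-element patch size count as 3D for 'czi' data but not for 'tiff' data.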

`propagate_seed_to_augmentations()`

Propagate the main seed to all augmentations that support seeds.

This ensures that all augmentations use the same seed for reproducibility, unless they already have a seed explicitly set.

Returns:

Self –

Data model with propagated seeds.
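The propagation rule (fill in the main seed only where no seed was set explicitly) might look like the sketch below. The Augmentation dataclass is a hypothetical stand-in for the augmentation configurations, and skipping objects without a seed attribute is an assumption about what "support seeds" means.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Augmentation:
    """Hypothetical stand-in for an augmentation configuration."""

    name: str
    seed: Optional[int] = None


def propagate_seed(main_seed: int, augmentations: List[object]) -> List[object]:
    """Copy main_seed to every augmentation that has an unset seed."""
    for aug in augmentations:
        # Objects without a seed attribute are left untouched.
        if hasattr(aug, "seed") and aug.seed is None:
            aug.seed = main_seed
    return augmentations


augs = propagate_seed(42, [Augmentation("flip"), Augmentation("rotate", seed=7)])
```

Here the first augmentation receives seed 42, while the second keeps its explicit seed of 7.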

`propagate_seed_to_patching()`

Propagate the main seed to the patching strategy if it supports seeds.

This ensures that the patching strategy uses the same seed for reproducibility, unless it already has a seed explicitly set.

Returns:

Self –

Data model with propagated seed.

`set_3D(axes, patch_size)`

Set 3D parameters.

Parameters:

axes (str) –

Axes.
patch_size (list of int) –

Patch size.

`set_default_max_patch_filter_coverage()`

Set default max patch filter coverage based on data dimensionality.

Returns:

Self –

Data model with default max patch filter coverage updated.

`set_default_pin_memory(dataloader_params)` `classmethod`

Set default pin_memory for dataloader parameters if not provided.

If 'pin_memory' is not set, it defaults to True if CUDA is available.

Parameters:

dataloader_params (dict of {str: Any}) –

The dataloader parameters.

Returns:

dict of {str: Any} –

The dataloader parameters with pin_memory default applied.
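A minimal sketch of this default, assuming the value is False when CUDA is not available. CUDA availability is passed in as a plain boolean here to keep the example free of a PyTorch dependency; the function name is hypothetical.

```python
from typing import Any, Dict


def with_default_pin_memory(
    dataloader_params: Dict[str, Any], cuda_available: bool
) -> Dict[str, Any]:
    """Return a copy of the parameters with pin_memory filled in if missing."""
    params = dict(dataloader_params)
    # An explicit user value always wins over the default.
    params.setdefault("pin_memory", cuda_available)
    return params
```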

`set_default_workers_in_dataloaders()`

Set num_workers and persistent_workers defaults in all dataloaders.

For each of train_dataloader_params, val_dataloader_params, and pred_dataloader_params: sets num_workers from the num_workers field if not already present, and sets persistent_workers=True when num_workers > 0 and not already specified.

Returns:

Self –

Validated data model with worker defaults applied to all dataloaders.
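The per-dataloader defaulting can be sketched for a single parameter dictionary. The function name is hypothetical, and the sketch assumes that persistent_workers depends on the effective num_workers of that dataloader (the user-supplied value if present, the field value otherwise).

```python
from typing import Any, Dict


def apply_worker_defaults(
    dataloader_params: Dict[str, Any], num_workers: int
) -> Dict[str, Any]:
    """Return a copy with num_workers and persistent_workers defaults applied."""
    params = dict(dataloader_params)
    params.setdefault("num_workers", num_workers)
    # Assumption: the effective per-dataloader value drives persistent_workers.
    if params["num_workers"] > 0:
        params.setdefault("persistent_workers", True)
    return params
```

In the validator described above, the same treatment would be applied to train_dataloader_params, val_dataloader_params and pred_dataloader_params in turn.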

`shuffle_train_dataloader(train_dataloader_params)` `classmethod`

Validate that "shuffle" is included in the training dataloader params.

A warning is issued if shuffle=False.

Parameters:

train_dataloader_params (dict of {str: Any}) –

The training dataloader parameters.

Returns:

dict of {str: Any} –

The validated training dataloader parameters.

Raises:

ValueError –

If "shuffle" is not included in the training dataloader params.
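The two outcomes of this validator (an error when the key is missing, a warning when it is False) can be sketched with the standard warnings module. The function name and messages are hypothetical.

```python
import warnings
from typing import Any, Dict


def check_shuffle(train_dataloader_params: Dict[str, Any]) -> Dict[str, Any]:
    """Require the shuffle key and warn when shuffling is disabled."""
    if "shuffle" not in train_dataloader_params:
        raise ValueError("Training dataloader parameters must include 'shuffle'.")
    if not train_dataloader_params["shuffle"]:
        warnings.warn(
            "shuffle=False may degrade training results.", stacklevel=2
        )
    return train_dataloader_params
```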

`validate_channels(channels, info)` `classmethod`

Validate channels.

Channels must be a sequence of non-negative integers without duplicates. If channels are not None, then C must be present in the axes.

Parameters:

channels (Sequence of int or None) –

Channels to validate.
info (ValidationInfo) –

Validation information.

Returns:

Sequence of int or None –

Validated channels.

Raises:

ValueError –

If channels are not valid.
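The channel rules stated for this validator can be sketched as follows. The function name is hypothetical and the axes are passed explicitly, whereas the real validator obtains them from the validation information.

```python
from typing import Optional, Sequence


def check_channels(
    channels: Optional[Sequence[int]], axes: str
) -> Optional[Sequence[int]]:
    """Validate a channel selection against the documented rules."""
    if channels is None:
        return None  # None means all channels are used
    if "C" not in axes:
        raise ValueError("Channels were given but 'C' is not in the axes.")
    if any(channel < 0 for channel in channels):
        raise ValueError("Channels must be non-negative integers.")
    if len(set(channels)) != len(channels):
        raise ValueError("Channels must not contain duplicates.")
    return channels
```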

`validate_dimensions()`

Validate 2D/3D dimensions between axes and patch size.

Returns:

Self –

Validated data model.

Raises:

ValueError –

If the patch size dimension is not compatible with the axes.

`validate_filters_against_mode(filter_obj, info)` `classmethod`

Validate that the filters are only used during training.

Parameters:

filter_obj (PatchFilterConfig | MaskFilterConfig | None) –

Filter to validate.
info (ValidationInfo) –

Validation information.

Returns:

PatchFilterConfig | MaskFilterConfig | None –

Validated filter.

Raises:

ValueError –

If a filter is used in a mode other than training.

`validate_in_memory_with_data_type(in_memory, info)` `classmethod`

Validate that in_memory is compatible with data_type.

in_memory can only be True for 'array', 'tiff' and 'custom' data types.

Parameters:

in_memory (bool) –

Whether to load data into memory.
info (Any) –

Additional information about the field being validated.

Returns:

bool –

Validated in_memory value.

Raises:

ValueError –

If in_memory is True for unsupported data types.
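A minimal sketch of this compatibility check, using the data type names listed above. The function name and the constant are hypothetical.

```python
# Data types documented as supporting in-memory loading.
IN_MEMORY_DATA_TYPES = {"array", "tiff", "custom"}


def check_in_memory(in_memory: bool, data_type: str) -> bool:
    """Raise if in-memory loading is requested for an unsupported data type."""
    if in_memory and data_type not in IN_MEMORY_DATA_TYPES:
        raise ValueError(
            f"in_memory=True is not supported for data type {data_type!r}."
        )
    return in_memory
```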

`validate_patching_strategy_against_mode(patching, info)` `classmethod`

Validate that the patching strategy is compatible with the dataset mode.

If mode is training, patching strategy must be random or stratified.
If mode is validating, patching must be fixed_random.
If mode is predicting, patching strategy must be tiled or whole.

Parameters:

patching (PatchingStrategies) –

Patching strategy to validate.
info (ValidationInfo) –

Validation information.

Returns:

PatchingStrategies –

Validated patching strategy.

Raises:

ValueError –

If the patching strategy is not compatible with the dataset mode.
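The mode-to-strategy rules of this validator fit in a lookup table. The sketch below uses the strategy and mode names given in the text; the function name is hypothetical, and identifying a strategy by a plain name string is a simplification.

```python
# Allowed patching strategy names per dataset mode, as documented.
ALLOWED_PATCHING = {
    "training": {"random", "stratified"},
    "validating": {"fixed_random"},
    "predicting": {"tiled", "whole"},
}


def check_patching(mode: str, patching_name: str) -> str:
    """Raise if the patching strategy is not allowed in the given mode."""
    if patching_name not in ALLOWED_PATCHING[mode]:
        raise ValueError(
            f"Patching strategy {patching_name!r} is not compatible "
            f"with mode {mode!r}."
        )
    return patching_name
```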

`warn_inconsistent_num_workers()`

Warn if num_workers conflicts with a per-dataloader value.

This validator runs before set_default_workers_in_dataloaders, so the dataloader dicts only contain user-supplied values at this point. The warning is only emitted when num_workers was explicitly set on the model.

Returns:

Self –

Unchanged data model.

`Mode`

Bases: StrEnum

Dataset mode.

`default_in_memory(validated_params)`

Default factory for the in_memory field.

Based on the value of data_type, set the default for in_memory to True if the data type is 'array', 'tiff', or 'custom', and to False otherwise ('zarr' or 'czi').

Parameters:

validated_params (dict of {str: Any}) –

Validated parameters.

Returns:

bool –

Default value for the in_memory field.

`get_default_num_workers()`

Return the default number of dataloader workers for the current platform.

Defaults by platform (benchmarked on BSD68, may need revisiting for larger datasets or more performant machines):

- pytest: 0 - avoids multiprocessing overhead in tests.
- Windows: 0 - multiprocessing with spawn is unreliable in dataloaders.
- macOS: 0 - spawn-based worker init causes ~1 min startup hang even for a small number of workers, the throughput gain does not justify the wait.
- Linux: min(cpu_count - 1, 4) - one core is left free to keep the UI responsive when training inside napari. Performance gains plateau around 4 workers, so we cap there to avoid wasting resources.

Returns:

int –

Default number of dataloader workers.
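The platform defaults for this function can be approximated with the standard library. This is a sketch, not the library's code: the function name, the optional arguments (added so the logic can be exercised on any machine), the pytest detection through sys.modules, and the treatment of platforms other than Windows and macOS like Linux are all assumptions.

```python
import os
import platform
import sys
from typing import Optional


def default_num_workers(
    system: Optional[str] = None,
    cpu_count: Optional[int] = None,
    in_pytest: Optional[bool] = None,
) -> int:
    """Return a default worker count following the documented platform rules."""
    if in_pytest is None:
        # Assumption: a loaded pytest module means we are running under pytest.
        in_pytest = "pytest" in sys.modules
    if in_pytest:
        return 0
    if system is None:
        system = platform.system()
    if system in ("Windows", "Darwin"):  # "Darwin" is macOS
        return 0
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Leave one core free and cap at 4 workers.
    return min(cpu_count - 1, 4)
```

On a 16-core Linux machine this yields 4 workers, and on a 2-core Linux machine it yields 1.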

`np_float_to_scientific_str(x)`

Return a string scientific representation of a float.

In particular, this method is used to serialize floats to strings, allowing numpy.float32 to be passed in the Pydantic model and written to a yaml file as str.

Parameters:

x (float) –

Input value.

Returns:

str –

Scientific string representation of the input value.
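The idea of serializing a float as a scientific-notation string can be shown with Python's built-in formatting. This is a standard-library stand-in with a hypothetical name; the number of digits and the exact format produced by np_float_to_scientific_str may differ.

```python
def to_scientific_str(x: float) -> str:
    """Format a float-like value in scientific notation (stdlib stand-in)."""
    # float() also accepts NumPy scalar types such as numpy.float32.
    return format(float(x), "e")


serialized = to_scientific_str(0.001)  # "1.000000e-03"
```

Because the result is a plain str, it can be written to a YAML file without any NumPy-specific representation.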
