Data

Data Pydantic configuration models.

DataConfig

Bases: BaseModel

Next-Generation Dataset configuration.

DataConfig is used for both training and prediction, with the patching strategy determining how the data is processed. Note that random is the only patching strategy compatible with training, while tiled and whole are only used for prediction.

All supported transforms are defined in the SupportedTransform enum.

augmentations = Field(default=(XYFlipConfig(), XYRandomRotate90Config()), validate_default=True) class-attribute instance-attribute

List of augmentations to apply to the data. Available transforms are defined in SupportedTransform.

axes instance-attribute

Axes of the data, as defined in SupportedAxes.

batch_size = Field(default=1, ge=1, validate_default=True) class-attribute instance-attribute

Batch size for training.

channels = Field(default=None) class-attribute instance-attribute

Channels to use from the data. If None, all channels are used. Note that it is applied to both inputs and targets.

data_type instance-attribute

Type of input data.

in_memory = Field(default_factory=default_in_memory, validate_default=True) class-attribute instance-attribute

Whether to load all data into memory. This is only supported for the 'array', 'tiff' and 'custom' data types, and must be True for 'array'. If None, defaults to True for 'array', 'tiff' and 'custom', and to False for 'zarr' and 'czi'.

mask_filter = Field(default_factory=(lambda data: _create_mask_filter(data))) class-attribute instance-attribute

Mask filter configuration to apply when using a mask during training. Coverage is automatically set to 1/(2**ndims) based on data dimensionality where ndims is determined from axes. Only available in training mode.
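As a minimal sketch of the coverage rule above, the default can be derived from the axes string. The helper name `default_mask_coverage` is hypothetical, and the sketch assumes that only Z, Y and X count as spatial axes.

```python
def default_mask_coverage(axes: str) -> float:
    """Return 1 / 2**ndims, where ndims counts the spatial axes in `axes`."""
    # Assumption: Z, Y and X are the spatial axes; S, T and C are not.
    ndims = sum(1 for axis in axes if axis in "ZYX")
    return 1 / (2**ndims)


print(default_mask_coverage("YX"))    # 0.25
print(default_mask_coverage("CZYX"))  # 0.125
```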

mode instance-attribute

Dataset mode, either training, validating or predicting.

n_val_patches = Field(default=8, ge=0, validate_default=True) class-attribute instance-attribute

The number of patches to set aside for validation during training. This parameter will be ignored if separate validation data is specified for training.

normalization = Field(...) class-attribute instance-attribute

Normalization configuration to use.

num_workers = Field(default_factory=get_default_num_workers, ge=0) class-attribute instance-attribute

Default number of workers for all dataloaders that do not explicitly set num_workers. Automatically detected based on the current platform: 0 on Windows and macOS, min(cpu_count - 1, 4) on Linux.
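The platform rule described above can be sketched as follows. This is an illustration only, not the body of get_default_num_workers: the function name `default_num_workers` and its parameters are hypothetical, any platform other than Windows and macOS is treated like Linux, and the clamp at 0 is an addition of the sketch.

```python
import os
import platform
from typing import Optional


def default_num_workers(
    system: Optional[str] = None, cpu_count: Optional[int] = None
) -> int:
    """Return 0 on Windows and macOS, min(cpu_count - 1, 4) otherwise."""
    # platform.system() reports "Darwin" on macOS.
    system = platform.system() if system is None else system
    if system in ("Windows", "Darwin"):
        return 0
    n_cpus = (os.cpu_count() or 1) if cpu_count is None else cpu_count
    # Clamp at 0 so a single-CPU machine does not yield a negative count
    # (this clamp is an assumption of the sketch, not documented behaviour).
    return max(0, min(n_cpus - 1, 4))


print(default_num_workers("Linux", 16))    # 4
print(default_num_workers("Linux", 2))     # 1
print(default_num_workers("Windows", 16))  # 0
```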

patch_filter = Field(default=None, discriminator='name') class-attribute instance-attribute

Patch filter to apply when using random patching. Only available if mode is training.

patching = Field(..., discriminator='name') class-attribute instance-attribute

Patching strategy to use. Note that random is the only supported strategy for training, while tiled and whole are only used for prediction.

pred_dataloader_params = Field(default={}) class-attribute instance-attribute

Dictionary of PyTorch prediction dataloader parameters.

seed = Field(default_factory=generate_random_seed, gt=0) class-attribute instance-attribute

Random seed for reproducibility. If not specified, a random seed is generated.

train_dataloader_params = Field(default={'shuffle': True}, validate_default=True) class-attribute instance-attribute

Dictionary of PyTorch training dataloader parameters. The dataloader parameters should include the shuffle key, which is set to True by default. We strongly recommend keeping it set to True to ensure the best training results.

val_dataloader_params = Field(default={}) class-attribute instance-attribute

Dictionary of PyTorch validation dataloader parameters.

__str__()

Pretty string representing the configuration.

Returns:

  • str

    Pretty string.

axes_valid(axes, info) classmethod

Validate axes.

Axes must:

  • be a combination of 'STCZYX'
  • not contain duplicates
  • contain at least 2 contiguous axes: X and Y
  • contain at most 4 axes
  • not contain both S and T axes
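The rules listed above can be sketched as a standalone check. The function `check_axes` is hypothetical and only mirrors the rules as written here; the actual validator may word its errors differently or apply further checks.

```python
def check_axes(axes: str) -> str:
    """Return `axes` unchanged if it satisfies the documented rules."""
    if not axes or any(axis not in "STCZYX" for axis in axes):
        raise ValueError(f"Axes must be a combination of 'STCZYX', got {axes!r}.")
    if len(set(axes)) != len(axes):
        raise ValueError(f"Axes must not contain duplicates, got {axes!r}.")
    # "Contiguous" is read here as X and Y being adjacent in the string.
    if "YX" not in axes and "XY" not in axes:
        raise ValueError(f"Axes must contain contiguous X and Y, got {axes!r}.")
    if len(axes) > 4:
        raise ValueError(f"Axes must contain at most 4 axes, got {axes!r}.")
    if "S" in axes and "T" in axes:
        raise ValueError(f"Axes must not contain both S and T, got {axes!r}.")
    return axes


print(check_axes("SCYX"))  # SCYX
```

With this sketch, `check_axes("YZX")` fails because X and Y are not adjacent, and `check_axes("STYX")` fails because S and T appear together.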

Parameters:

  • axes (str) –

    Axes to validate.

  • info (ValidationInfo) –

    Validation information.

Returns:

  • str

    Validated axes.

Raises:

batch_size_not_in_dataloader_params(dataloader_params) classmethod

Validate that batch_size is not set in the dataloader parameters.

batch_size must be set through batch_size field, not through the dataloader parameters.

Parameters:

  • dataloader_params (dict of {str: Any}) –

    The dataloader parameters.

Returns:

  • dict of {str: Any}

    The validated dataloader parameters.

Raises:

  • ValueError

    If batch_size is present in the dataloader parameters.

convert_mode(new_mode, new_patch_size=None, overlap_size=None, new_batch_size=None, new_data_type=None, new_axes=None, new_channels=None, new_in_memory=None, new_dataloader_params=None)

Convert a training dataset configuration to a different mode.

This method is intended to facilitate creating validation or prediction configurations from a training configuration.

To perform tile prediction when switching to predicting mode, please provide both new_patch_size and overlap_size. Switching mode to predicting without specifying new_patch_size and overlap_size will apply the default patching strategy, namely whole image strategy. new_patch_size and overlap_size are only used when switching to predicting.

new_channels=None will retain the same channels as in the current configuration. To select all channels, please specify all channels explicitly or pass new_channels='all'.

New dataloader parameters will be placed in the appropriate dataloader params field depending on the new mode.

To create a new training configuration, please use careamics.config.create_ng_data_configuration.

This method compares the new parameters with the current ones and raises errors if incompatible changes are requested, such as switching between 2D and 3D axes, or changing the number of channels. Incompatibility across parameters may be delegated to Pydantic validation.
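The choice of prediction patching described above can be sketched with plain dictionaries. The helper `prediction_patching` is hypothetical and does not call the library. Raising an error when a patch size is given without an overlap is an assumption of the sketch, based on the overlap being described as necessary for tiled patching.

```python
from typing import Any, Optional, Sequence


def prediction_patching(
    new_patch_size: Optional[Sequence[int]] = None,
    overlap_size: Optional[Sequence[int]] = None,
) -> dict[str, Any]:
    """Pick the patching settings used when switching to predicting mode."""
    if new_patch_size is None:
        # No patch size: fall back to the default whole-image strategy.
        return {"name": "whole"}
    if overlap_size is None:
        # Assumption: tiled prediction cannot be configured without overlaps.
        raise ValueError("overlap_size is required together with new_patch_size.")
    return {
        "name": "tiled",
        "patch_size": list(new_patch_size),
        "overlaps": list(overlap_size),
    }


print(prediction_patching())                      # {'name': 'whole'}
print(prediction_patching((64, 64), (16, 16)))
```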

Parameters:

  • new_mode (Literal['validating', 'predicting']) –

    The new dataset mode, one of validating or predicting.

  • new_patch_size (Sequence of int, default: None ) –

    New patch size. If None for predicting, uses default whole image strategy.

  • overlap_size (Sequence of int, default: None ) –

    New overlap size. Necessary when switching to predicting with tiled patching.

  • new_batch_size (int, default: None ) –

    New batch size.

  • new_data_type (Literal['array', 'tiff', 'zarr', 'czi', 'custom'], default: None ) –

    New data type.

  • new_axes (str, default: None ) –

    New axes.

  • new_channels (Sequence of int or "all", default: None ) –

    New channels.

  • new_in_memory (bool, default: None ) –

    New in_memory value.

  • new_dataloader_params (dict of {str: Any}, default: None ) –

    New dataloader parameters. These will be placed in the appropriate dataloader params field depending on the new mode.

Returns:

  • DataConfig

    New DataConfig with the updated mode and parameters.

Raises:

  • ValueError

    If conversion to training mode is requested, or if incompatible changes are requested.

is_3D()

Check if the data is 3D based on the axes.

The data is considered 3D if "Z" is in the axes and the patching patch_size has 3 dimensions. For CZI data, it is considered 3D if either "Z" or "T" is in the axes and the patching patch_size has 3 dimensions.

This method is used during Configuration validation to cross-check dimensions with the algorithm configuration.

Returns:

  • bool

    True if the data is 3D, False otherwise.

propagate_seed_to_augmentations()

Propagate the main seed to all augmentations that support seeds.

This ensures that all augmentations use the same seed for reproducibility, unless they already have a seed explicitly set.
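A minimal sketch of this propagation rule, using a stand-in dataclass rather than the real augmentation models. `AugmentationSketch` and `propagate_seed` are hypothetical names, and the sketch assumes every augmentation exposes an optional `seed` attribute.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AugmentationSketch:
    """Stand-in for an augmentation configuration with an optional seed."""

    name: str
    seed: Optional[int] = None


def propagate_seed(
    main_seed: int, augmentations: list[AugmentationSketch]
) -> list[AugmentationSketch]:
    """Give the main seed to augmentations without one; keep explicit seeds."""
    return [
        aug if aug.seed is not None else replace(aug, seed=main_seed)
        for aug in augmentations
    ]


augs = [AugmentationSketch("flip"), AugmentationSketch("rotate", seed=7)]
print([aug.seed for aug in propagate_seed(42, augs)])  # [42, 7]
```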

Returns:

  • Self

    Data model with propagated seeds.

propagate_seed_to_patching()

Propagate the main seed to the patching strategy if it supports seeds.

This ensures that the patching strategy uses the same seed for reproducibility, unless it already has a seed explicitly set.

Returns:

  • Self

    Data model with propagated seed.

set_3D(axes, patch_size)

Set 3D parameters.

Parameters:

  • axes (str) –

    Axes.

  • patch_size (list of int) –

    Patch size.

set_default_max_patch_filter_coverage()

Set default max patch filter coverage based on data dimensionality.

Returns:

  • Self

    Data model with default max patch filter coverage updated.

set_default_pin_memory(dataloader_params) classmethod

Set default pin_memory for dataloader parameters if not provided.

  • If 'pin_memory' is not set, it defaults to True if CUDA is available.

Parameters:

  • dataloader_params (dict of {str: Any}) –

    The dataloader parameters.

Returns:

  • dict of {str: Any}

    The dataloader parameters with pin_memory default applied.

set_default_workers_in_dataloaders()

Set num_workers and persistent_workers defaults in all dataloaders.

For each of train_dataloader_params, val_dataloader_params, and pred_dataloader_params: sets num_workers from the num_workers field if not already present, and sets persistent_workers=True when num_workers > 0 and not already specified.
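The defaulting rule can be sketched for a single dataloader dictionary. The helper `apply_worker_defaults` is hypothetical, and the sketch assumes that the persistent_workers default depends on the value of num_workers that ends up in the dictionary.

```python
from typing import Any


def apply_worker_defaults(params: dict[str, Any], num_workers: int) -> dict[str, Any]:
    """Fill in num_workers and persistent_workers without overriding user values."""
    updated = dict(params)  # leave the caller's dictionary untouched
    updated.setdefault("num_workers", num_workers)
    if updated["num_workers"] > 0:
        # Only set when the user did not specify persistent_workers.
        updated.setdefault("persistent_workers", True)
    return updated


print(apply_worker_defaults({}, 4))
# {'num_workers': 4, 'persistent_workers': True}
print(apply_worker_defaults({"num_workers": 0}, 4))
# {'num_workers': 0}
```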

Returns:

  • Self

    Validated data model with worker defaults applied to all dataloaders.

shuffle_train_dataloader(train_dataloader_params) classmethod

Validate that "shuffle" is included in the training dataloader params.

A warning is issued if shuffle=False.

Parameters:

  • train_dataloader_params (dict of {str: Any}) –

    The training dataloader parameters.

Returns:

  • dict of {str: Any}

    The validated training dataloader parameters.

Raises:

  • ValueError

    If "shuffle" is not included in the training dataloader params.

validate_channels(channels, info) classmethod

Validate channels.

Channels must be a sequence of non-negative integers without duplicates. If channels are not None, then C must be present in the axes.
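These constraints can be sketched as a standalone function. `check_channels` is a hypothetical name, and the sketch takes the axes as an argument instead of reading them from the validation information.

```python
from typing import Optional, Sequence


def check_channels(
    channels: Optional[Sequence[int]], axes: str
) -> Optional[Sequence[int]]:
    """Return `channels` unchanged if it satisfies the documented rules."""
    if channels is None:
        return None  # None means that all channels are used
    if "C" not in axes:
        raise ValueError("Channels were specified but 'C' is not in the axes.")
    if any(channel < 0 for channel in channels):
        raise ValueError("Channels must be non-negative integers.")
    if len(set(channels)) != len(channels):
        raise ValueError("Channels must not contain duplicates.")
    return channels


print(check_channels([0, 2], "CYX"))  # [0, 2]
print(check_channels(None, "YX"))     # None
```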

Parameters:

  • channels (Sequence of int or None) –

    Channels to validate.

  • info (ValidationInfo) –

    Validation information.

Returns:

  • Sequence of int or None

    Validated channels.

Raises:

validate_dimensions()

Validate 2D/3D dimensions between axes and patch size.

Returns:

  • Self

    Validated data model.

Raises:

  • ValueError

    If the patch size dimension is not compatible with the axes.

validate_filters_against_mode(filter_obj, info) classmethod

Validate that the filters are only used during training.

Parameters:

  • filter_obj (PatchFilterConfig | MaskFilterConfig | None) –

    Filter to validate.

  • info (ValidationInfo) –

    Validation information.

Returns:

Raises:

  • ValueError

    If a filter is used in a mode other than training.

validate_in_memory_with_data_type(in_memory, info) classmethod

Validate that in_memory is compatible with data_type.

in_memory can only be True for 'array', 'tiff' and 'custom' data types.

Parameters:

  • in_memory (bool) –

    Whether to load data into memory.

  • info (Any) –

    Additional information about the field being validated.

Returns:

  • bool

    Validated in_memory value.

Raises:

  • ValueError

    If in_memory is True for unsupported data types.

validate_patching_strategy_against_mode(patching, info) classmethod

Validate that the patching strategy is compatible with the dataset mode.

  • If mode is training, patching strategy must be random or stratified.
  • If mode is validating, patching must be fixed_random.
  • If mode is predicting, patching strategy must be tiled or whole.
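The mode-to-strategy rules above can be sketched as a lookup table. The names `ALLOWED_PATCHING` and `check_patching` are hypothetical, and the strategy names are taken from the list above.

```python
ALLOWED_PATCHING = {
    "training": {"random", "stratified"},
    "validating": {"fixed_random"},
    "predicting": {"tiled", "whole"},
}


def check_patching(mode: str, strategy_name: str) -> str:
    """Return the strategy name if it is allowed for the given dataset mode."""
    if mode not in ALLOWED_PATCHING:
        raise ValueError(f"Unknown mode {mode!r}.")
    allowed = ALLOWED_PATCHING[mode]
    if strategy_name not in allowed:
        raise ValueError(
            f"Patching strategy {strategy_name!r} is not compatible with mode "
            f"{mode!r}; expected one of {sorted(allowed)}."
        )
    return strategy_name


print(check_patching("predicting", "tiled"))  # tiled
```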

Parameters:

  • patching (PatchingStrategies) –

    Patching strategy to validate.

  • info (ValidationInfo) –

    Validation information.

Returns:

  • PatchingStrategies

    Validated patching strategy.

Raises:

  • ValueError

    If the patching strategy is not compatible with the dataset mode.

warn_inconsistent_num_workers()

Warn if num_workers conflicts with a per-dataloader value.

This validator runs before set_default_workers_in_dataloaders, so the dataloader dicts only contain user-supplied values at this point. Only fires when num_workers was explicitly set on the model.

Returns:

  • Self

    Unchanged data model.

MaskPatchFilterConfig

Bases: PatchFilterConfig

Pydantic model for the mask patch filter.

coverage = Field(0.25, ge=0.0, le=1.0) class-attribute instance-attribute

Minimum ratio of masked pixels required to keep a sampling region. The optimum value is 1/(2**ndims) where ndims is the number of spatial dimensions.

filtered_patch_prob = Field(default=0.1, ge=0.0, le=1.0) class-attribute instance-attribute

The probability that each patch classed as background will be selected each epoch during training.

name = 'mask' class-attribute instance-attribute

Name of the filter.

ref_channel = 0 class-attribute instance-attribute

The channel to use as reference for filtering.

MaxPatchFilterConfig

Bases: PatchFilterConfig

Pydantic model for the max patch filter.

coverage = Field(default=0.25, ge=0.0, le=1.0) class-attribute instance-attribute

Minimum ratio of masked pixels required to keep a sampling region. The optimum value is 1/(2**ndims) where ndims is the number of spatial dimensions.

filtered_patch_prob = Field(default=0.1, ge=0.0, le=1.0) class-attribute instance-attribute

The probability that each patch classed as background will be selected each epoch during training.

name = 'max' class-attribute instance-attribute

Name of the filter.

ref_channel = 0 class-attribute instance-attribute

The channel to use as reference for filtering.

threshold instance-attribute

Threshold for the minimum of the max-filtered patch.

MeanStdConfig

Bases: BaseModel

Mean and standard deviation normalization configuration.

Holds mean and standard deviation statistics for input and target, used to normalize data. Each statistic can be a single float (applied globally to all channels) or a list of floats (one per channel). If not provided, statistics can be computed automatically.
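As an illustration of how such statistics are typically applied, the sketch below normalizes each channel as (value - mean) / std, broadcasting a single float to all channels. The function names are hypothetical, and the library's own transform may differ, for example by adding a small epsilon to the standard deviation.

```python
from typing import Union

Stats = Union[float, list[float]]


def _per_channel(stat: Stats, n_channels: int) -> list[float]:
    """Expand a single float to one value per channel, or check the list length."""
    if isinstance(stat, (int, float)):
        return [float(stat)] * n_channels
    if len(stat) != n_channels:
        raise ValueError(f"Expected {n_channels} values, got {len(stat)}.")
    return [float(value) for value in stat]


def normalize_mean_std(
    image: list[list[float]], means: Stats, stds: Stats
) -> list[list[float]]:
    """Normalize an image given as a list of channels (each a flat list)."""
    channel_means = _per_channel(means, len(image))
    channel_stds = _per_channel(stds, len(image))
    return [
        [(value - mean) / std for value in channel]
        for channel, mean, std in zip(image, channel_means, channel_stds)
    ]


image = [[1.0, 3.0], [10.0, 30.0]]
print(normalize_mean_std(image, [2.0, 20.0], [1.0, 10.0]))
# [[-1.0, 1.0], [-1.0, 1.0]]
```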

Attributes:

  • name (Literal['mean_std']) –

    Identifier for the mean-std normalization scheme.

  • input_means (float | list[float] | None) –

    Means for input normalization. None for automatic computation.

  • input_stds (float | list[float] | None) –

    Standard deviations for input normalization. None for automatic computation.

  • target_means (float | list[float] | None) –

    Means for target normalization. None for automatic computation.

  • target_stds (float | list[float] | None) –

    Standard deviations for target normalization. None for automatic computation.

  • per_channel (bool) –

    When True (default), statistics are computed independently for each channel. When False, a single statistic is computed across all channels.

needs_computation()

Check if statistics need to be computed.

Returns:

  • bool

    True if input statistics are missing, False otherwise.

set_input_stats(means, stds)

Set input means and stds together to avoid validation errors.

Parameters:

  • means (list[float]) –

    Mean values per channel.

  • stds (list[float]) –

    Standard deviation values per channel.

set_target_stats(means, stds)

Set target means and stds together to avoid validation errors.

Parameters:

  • means (list[float]) –

    Mean values per channel.

  • stds (list[float]) –

    Standard deviation values per channel.

validate_global_stats_single_element(v, info) classmethod

Validate stats length against the per_channel parameter.

Parameters:

  • v (OptionalFloatStats) –

    Value to validate.

  • info (ValidationInfo) –

    Validated values.

Returns:

  • OptionalFloatStats

    Validated value.

validate_means_stds()

Validate that means and stds are provided in pairs or set to None.

Returns:

  • Self

    The validated model instance.

Raises:

  • ValueError

    If only one of means or stds is provided for input or target, or if paired lists have mismatched lengths.

validate_size(n_input_channels, n_output_channels)

Validate that statistics sizes match the number of channels.

Parameters:

  • n_input_channels (int) –

    The number of input channels to validate against.

  • n_output_channels (int) –

    The number of output channels to validate against.

Raises:

  • ValueError

    If any provided statistics list does not match the expected size.

MeanStdPatchFilterConfig

Bases: PatchFilterConfig

Pydantic model for the mean std patch filter.

filtered_patch_prob = Field(default=0.1, ge=0.0, le=1.0) class-attribute instance-attribute

The probability that each patch classed as background will be selected each epoch during training.

mean_threshold instance-attribute

Minimum mean intensity required to keep a patch.

name = 'mean_std' class-attribute instance-attribute

Name of the filter.

ref_channel = 0 class-attribute instance-attribute

The channel to use as reference for filtering.

std_threshold = None class-attribute instance-attribute

Minimum standard deviation required to keep a patch.

MinMaxConfig

Bases: BaseModel

Min-max normalization configuration.

Stores minimum and maximum statistics for scaling data into a desired range. Each statistic can be a single float (applied globally to all channels) or a list of floats (one per channel). If not provided, statistics can be computed automatically.
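A minimal sketch of min-max scaling for a single channel, assuming the target range is [0, 1]. The function name is hypothetical and the library's transform may use a different range or guard against a zero-width interval in another way.

```python
def normalize_min_max(
    values: list[float], minimum: float, maximum: float
) -> list[float]:
    """Scale values so that `minimum` maps to 0 and `maximum` maps to 1."""
    if maximum <= minimum:
        raise ValueError("maximum must be strictly greater than minimum.")
    return [(value - minimum) / (maximum - minimum) for value in values]


print(normalize_min_max([0.0, 5.0, 10.0], 0.0, 10.0))  # [0.0, 0.5, 1.0]
```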

Attributes:

  • name (Literal['min_max']) –

    Identifier for min-max normalization.

  • input_mins (float | list[float] | None) –

    Minimum values for input normalization. None for automatic computation.

  • input_maxes (float | list[float] | None) –

    Maximum values for input normalization. None for automatic computation.

  • target_mins (float | list[float] | None) –

    Minimum values for target normalization. None for automatic computation.

  • target_maxes (float | list[float] | None) –

    Maximum values for target normalization. None for automatic computation.

  • per_channel (bool) –

    When True (default), statistics are computed independently for each channel. When False, a single statistic is computed across all channels.

needs_computation()

Check if min/max values need to be computed.

Returns:

  • bool

    True if input statistics are missing, False otherwise.

set_input_range(mins, maxes)

Set input mins and maxes together to avoid validation errors.

Parameters:

  • mins (list[float]) –

    Minimum values per channel.

  • maxes (list[float]) –

    Maximum values per channel.

set_target_range(mins, maxes)

Set target mins and maxes together to avoid validation errors.

Parameters:

  • mins (list[float]) –

    Minimum values per channel.

  • maxes (list[float]) –

    Maximum values per channel.

validate_global_stats_single_element(v, info) classmethod

Validate stats length against the per_channel parameter.

Parameters:

  • v (OptionalFloatStats) –

    Value to validate.

  • info (ValidationInfo) –

    Validated values.

Returns:

  • OptionalFloatStats

    Validated value.

validate_mins_maxes()

Validate that mins and maxes are provided in pairs or both None.

Returns:

  • Self

    The validated model instance.

Raises:

  • ValueError

    If only one of mins or maxes is provided for input or target, or if paired lists have mismatched lengths.

validate_size(n_input_channels, n_output_channels)

Validate that statistics sizes match the number of channels.

Parameters:

  • n_input_channels (int) –

    The number of input channels to validate against.

  • n_output_channels (int) –

    The number of output channels to validate against.

Raises:

  • ValueError

    If any provided statistics list does not match the expected size.

NoNormConfig

Bases: BaseModel

No normalization configuration.

Indicates that no normalization should be applied.

Attributes:

  • name (Literal['none']) –

    Identifier for no normalization scheme.

needs_computation()

Check if statistics need to be computed.

Returns:

  • bool

    Always False, as no statistics are required.

validate_size(*args, **kwargs)

No validation needed.

Parameters:

  • *args (Any, default: () ) –

    Parameters will be ignored.

  • **kwargs (Any, default: {} ) –

    Parameters will be ignored.

QuantileConfig

Bases: BaseModel

Quantile normalization configuration.

Normalizes data using quantile-based range scaling. Quantile levels can be specified as a single value (applied to all channels) or a list (one per channel). If not provided, quantile values can be computed automatically.
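The sketch below illustrates quantile-based range scaling for a single channel: quantile levels are first turned into quantile values, which then play the role of the minimum and maximum. Both function names are hypothetical, the quantile uses linear interpolation between sorted samples, and whether the library clips values outside the range is not stated here.

```python
import math


def quantile(values: list[float], level: float) -> float:
    """Quantile of `values` at `level` in [0, 1], with linear interpolation."""
    ordered = sorted(values)
    position = level * (len(ordered) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def normalize_quantile(
    values: list[float], lower_level: float, upper_level: float
) -> list[float]:
    """Scale values so the lower quantile maps to 0 and the upper one to 1."""
    if not 0.0 <= lower_level < upper_level <= 1.0:
        raise ValueError("Quantile levels must satisfy 0 <= lower < upper <= 1.")
    low = quantile(values, lower_level)
    high = quantile(values, upper_level)
    if high == low:
        raise ValueError("Quantile values are equal; cannot scale.")
    return [(value - low) / (high - low) for value in values]


data = [float(i) for i in range(9)]  # 0.0, 1.0, ..., 8.0
print(quantile(data, 0.25), quantile(data, 0.75))  # 2.0 6.0
print(normalize_quantile(data, 0.25, 0.75)[4])     # 0.5
```

Values below the lower quantile or above the upper quantile fall outside [0, 1] in this sketch.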

Attributes:

  • name (Literal['quantile']) –

    Identifier for quantile normalization.

  • lower_quantiles (float | list[float]) –

    Lower quantile level(s). Values must be in [0, 1).

  • upper_quantiles (float | list[float]) –

    Upper quantile level(s). Values must be in (0, 1].

  • input_lower_quantile_values (float | list[float] | None) –

    Computed lower quantile values for input.

  • input_upper_quantile_values (float | list[float] | None) –

    Computed upper quantile values for input.

  • target_lower_quantile_values (float | list[float] | None) –

    Computed lower quantile values for target.

  • target_upper_quantile_values (float | list[float] | None) –

    Computed upper quantile values for target.

  • per_channel (bool) –

    When True (default), quantile values are computed independently for each channel. When False, a single quantile is computed across all channels.

needs_computation()

Check if quantile values need to be computed.

Returns:

  • bool

    True if quantile values need to be computed.

set_input_quantile_values(lower, upper)

Set input quantile values together to avoid validation errors.

Parameters:

  • lower (list[float]) –

    Lower quantile values per channel.

  • upper (list[float]) –

    Upper quantile values per channel.

set_target_quantile_values(lower, upper)

Set target quantile values together to avoid validation errors.

Parameters:

  • lower (list[float]) –

    Lower quantile values per channel.

  • upper (list[float]) –

    Upper quantile values per channel.

validate_global_stats_single_element(v, info) classmethod

Validate stats length against the per_channel parameter.

Parameters:

  • v (OptionalFloatStats) –

    Value to validate.

  • info (ValidationInfo) –

    Validated values.

Returns:

  • OptionalFloatStats

    Validated value.

validate_quantile_levels()

Validate quantile levels are in valid range and properly ordered.

Returns:

  • Self

    The validated model instance.

validate_quantile_values()

Validate that computed quantile value lists are provided in pairs.

Returns:

  • Self

    The validated model instance.

validate_size(n_input_channels, n_output_channels)

Validate that statistics sizes match the number of channels.

Parameters:

  • n_input_channels (int) –

    The number of input channels to validate against.

  • n_output_channels (int) –

    The number of output channels to validate against.

Raises:

  • ValueError

    If any provided statistics list does not match the expected size.

RandomPatchingConfig

Bases: _PatchedConfig

Random patching Pydantic model.

Attributes:

  • name (random) –

    The name of the patching strategy.

  • patch_size (sequence of int) –

    The size of the patch in each spatial dimension. Each patch size must be a power of 2 and larger than 8.

name = 'random' class-attribute instance-attribute

The name of the patching strategy.

patch_size = Field(..., min_length=2, max_length=3) class-attribute instance-attribute

The size of the patch in each spatial dimension. Must be square in YX.

seed = Field(default=None, gt=0) class-attribute instance-attribute

Random seed for patch sampling, set to None for random seeding.

ShannonPatchFilterConfig

Bases: PatchFilterConfig

Pydantic model for the Shannon entropy patch filter.

filtered_patch_prob = Field(default=0.1, ge=0.0, le=1.0) class-attribute instance-attribute

The probability that each patch classed as background will be selected each epoch during training.

name = 'shannon' class-attribute instance-attribute

Name of the filter.

ref_channel = 0 class-attribute instance-attribute

The channel to use as reference for filtering.

threshold instance-attribute

Minimum Shannon entropy required to keep a patch.

TiledPatchingConfig

Bases: _OverlappingPatchedConfig

Tiled patching Pydantic model.

Attributes:

  • name (tiled) –

    The name of the patching strategy.

  • patch_size (sequence of int) –

    The size of the patch in each spatial dimension. Each patch size must be a power of 2 and larger than 8.

  • overlaps (sequence of int) –

    The overlaps between patches in each spatial dimension. The overlaps must be smaller than the patch size in each spatial dimension, and the number of dimensions must be either 2 or 3.

name = 'tiled' class-attribute instance-attribute

The name of the patching strategy.

overlaps = Field(..., min_length=2, max_length=3) class-attribute instance-attribute

The overlaps between patches in each spatial dimension. The overlaps must be smaller than the patch size in each spatial dimension, and the number of dimensions must be either 2 or 3.

patch_size = Field(..., min_length=2, max_length=3) class-attribute instance-attribute

The size of the patch in each spatial dimension. Must be square in YX.

overlap_even(overlaps) classmethod

Validate overlaps.

Overlap must be even.

Parameters:

  • overlaps (Sequence of int) –

    Overlaps.

Returns:

  • Sequence of int

    Validated overlap.

overlap_smaller_than_patch_size(overlaps, values) classmethod

Validate overlap.

Overlaps must be smaller than the patch size in each spatial dimension.

Parameters:

  • overlaps (Sequence of int) –

    Overlap in each dimension.

  • values (ValidationInfo) –

    Dictionary of values.

Returns:

  • Sequence of int

    Validated overlap.
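The two overlap validators above (overlap_even and overlap_smaller_than_patch_size) can be sketched together as one standalone check. `check_overlaps` is a hypothetical name, and requiring overlaps and patch size to have the same length is an assumption of the sketch.

```python
from typing import Sequence


def check_overlaps(overlaps: Sequence[int], patch_size: Sequence[int]) -> Sequence[int]:
    """Return `overlaps` unchanged if they satisfy the documented rules."""
    if len(overlaps) not in (2, 3):
        raise ValueError("Overlaps must have 2 or 3 dimensions.")
    if len(overlaps) != len(patch_size):
        # Assumption: one overlap per patch dimension.
        raise ValueError("Overlaps and patch size must have the same length.")
    if any(overlap % 2 != 0 for overlap in overlaps):
        raise ValueError("Overlaps must be even.")
    if any(overlap >= size for overlap, size in zip(overlaps, patch_size)):
        raise ValueError("Overlaps must be smaller than the patch size.")
    return overlaps


print(check_overlaps((16, 16), (64, 64)))  # (16, 16)
```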

WholePatchingConfig

Bases: BaseModel

Whole image patching Pydantic model.

name = 'whole' class-attribute instance-attribute

The name of the patching strategy.