Microsplit Data Config
MicroSplit data configuration.
MicroSplitDataConfig
Bases: DataConfig
Dataset configuration for MicroSplit.
alpha_ranges = None
class-attribute
instance-attribute
Ranges used to sample channel mixing weights for synthetic training inputs.
If None, the MicroSplit dataset factory will use equal fixed weights for each
target channel.
augmentations = Field(default=(XYFlipConfig(), XYRandomRotate90Config()), validate_default=True)
class-attribute
instance-attribute
List of augmentations to apply to the data. Available transforms are defined in SupportedTransform.
axes
instance-attribute
Axes of the data, as defined in SupportedAxes.
batch_size = Field(default=1, ge=1, validate_default=True)
class-attribute
instance-attribute
Batch size for training.
channels = Field(default=None)
class-attribute
instance-attribute
Channels to use from the data. If None, all channels are used. Note that it is
applied to both inputs and targets.
data_type
instance-attribute
Type of input data.
in_memory = Field(default_factory=default_in_memory, validate_default=True)
class-attribute
instance-attribute
Whether to load all data into memory. This is only supported for 'array',
'tiff' and 'custom' data types, and must be True for 'array'. If None, defaults to
True for 'array', 'tiff' and 'custom', and False for 'zarr' and 'czi' data
types.
mask_filter = Field(default_factory=(lambda data: _create_mask_filter(data)))
class-attribute
instance-attribute
Mask filter configuration to apply when using a mask during training.
Coverage is automatically set to 1/(2**ndims) based on data dimensionality
where ndims is determined from axes. Only available in training mode.
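As a rough illustration of the default coverage rule, the sketch below computes 1/(2**ndims) from an axes string. It assumes that ndims counts only the spatial axes (Z, Y, X); the helper name default_mask_coverage is hypothetical and not part of the library.

```python
def default_mask_coverage(axes: str) -> float:
    """Return 1 / 2**ndims, where ndims is the number of spatial axes."""
    # Assumption of this sketch: only Z, Y and X count as spatial dimensions.
    ndims = sum(1 for axis in axes if axis in "ZYX")
    return 1 / (2 ** ndims)


coverage_2d = default_mask_coverage("YX")   # two spatial axes
coverage_3d = default_mask_coverage("ZYX")  # three spatial axes
```

Under this assumption, 2D data gets a coverage of one quarter and 3D data one eighth.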
mode
instance-attribute
Dataset mode, either training, validating or predicting.
multiscale_count = Field(default=1, ge=1)
class-attribute
instance-attribute
Number of lateral-context scales to construct for MicroSplit inputs.
n_val_patches = Field(default=8, ge=0, validate_default=True)
class-attribute
instance-attribute
The number of patches to set aside for validation during training. This parameter will be ignored if separate validation data is specified for training.
normalization = Field(...)
class-attribute
instance-attribute
Normalization configuration to use.
num_workers = Field(default_factory=get_default_num_workers, ge=0)
class-attribute
instance-attribute
Default number of workers for all dataloaders that do not explicitly set
num_workers. Automatically detected based on the current platform:
0 on Windows and macOS, min(cpu_count - 1, 4) on Linux.
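The platform rule above could be sketched as follows. This is not the library's get_default_num_workers; the function name, its parameters and the clamp to zero on single-core machines are assumptions made for illustration.

```python
import os
import platform
from typing import Optional


def sketch_default_num_workers(
    system: Optional[str] = None, cpu_count: Optional[int] = None
) -> int:
    """Mimic the documented rule: 0 on Windows/macOS, min(cpu_count - 1, 4) on Linux."""
    if system is None:
        system = platform.system()
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # platform.system() reports macOS as "Darwin".
    if system in ("Windows", "Darwin"):
        return 0
    # The max(0, ...) clamp is an addition of this sketch, so that a
    # single-core machine never yields a negative worker count.
    return max(0, min(cpu_count - 1, 4))
```

Passing system and cpu_count explicitly makes the rule easy to check without depending on the current machine.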
padding_mode = 'reflect'
class-attribute
instance-attribute
Padding mode used when lateral-context patches extend beyond image borders.
patch_filter = Field(default=None, discriminator='name')
class-attribute
instance-attribute
Patch filter to apply when using random patching. Only available if
mode is training.
patching = Field(..., discriminator='name')
class-attribute
instance-attribute
Patching strategy to use. Note that random is the only supported strategy for
training, while tiled and whole are only used for prediction.
pred_dataloader_params = Field(default={})
class-attribute
instance-attribute
Dictionary of PyTorch prediction dataloader parameters.
seed = Field(default_factory=generate_random_seed, gt=0)
class-attribute
instance-attribute
Random seed for reproducibility. If not specified, a random seed is generated.
train_dataloader_params = Field(default={'shuffle': True}, validate_default=True)
class-attribute
instance-attribute
Dictionary of PyTorch training dataloader parameters. The dataloader parameters
should include the shuffle key, which is set to True by default. We strongly
recommend keeping it as True to ensure the best training results.
uncorrelated_channel_prob = Field(default=0.0, ge=0.0, le=1.0)
class-attribute
instance-attribute
Probability of sampling uncorrelated channels for synthetic training inputs.
val_dataloader_params = Field(default={})
class-attribute
instance-attribute
Dictionary of PyTorch validation dataloader parameters.
__str__()
axes_valid(axes, info)
classmethod
Validate axes.
Axes must:
- be a combination of 'STCZYX'
- not contain duplicates
- contain at least 2 contiguous axes: X and Y
- contain at most 4 axes
- not contain both S and T axes
Parameters:
-
axes(str) –Axes to validate.
-
info(ValidationInfo) –Validation information.
Returns:
-
str–Validated axes.
Raises:
-
ValueError–If axes are not valid.
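A literal reading of the rules listed for axes_valid can be sketched as a standalone function. The name check_axes is hypothetical, the interpretation of "contiguous" as an adjacent YX or XY pair is an assumption, and the actual validator may differ in details.

```python
def check_axes(axes: str) -> str:
    """Validate an axes string against the rules listed in the documentation."""
    if any(axis not in "STCZYX" for axis in axes):
        raise ValueError(f"Axes must be a combination of 'STCZYX', got {axes!r}.")
    if len(set(axes)) != len(axes):
        raise ValueError(f"Axes must not contain duplicates, got {axes!r}.")
    # Assumption: "contiguous" means Y and X sit next to each other.
    if "YX" not in axes and "XY" not in axes:
        raise ValueError(f"Axes must contain contiguous X and Y, got {axes!r}.")
    if len(axes) > 4:
        raise ValueError(f"Axes must contain at most 4 axes, got {axes!r}.")
    if "S" in axes and "T" in axes:
        raise ValueError(f"Axes must not contain both S and T, got {axes!r}.")
    return axes
```

With this reading, "YX" and "SZYX" pass, while "YCX" fails because Y and X are separated by C.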
batch_size_not_in_dataloader_params(dataloader_params)
classmethod
Validate that batch_size is not set in the dataloader parameters.
batch_size must be set through batch_size field, not
through the dataloader parameters.
Parameters:
-
dataloader_params(dict of {str: Any}) –The dataloader parameters.
Returns:
-
dict of {str: Any}–The validated dataloader parameters.
Raises:
-
ValueError–If batch_size is present in the dataloader parameters.
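The check described for batch_size_not_in_dataloader_params amounts to rejecting a single key. A minimal standalone sketch, with the hypothetical name reject_batch_size, might look like this:

```python
from typing import Any


def reject_batch_size(dataloader_params: dict[str, Any]) -> dict[str, Any]:
    """Raise if batch_size is passed via the dataloader parameters."""
    if "batch_size" in dataloader_params:
        raise ValueError(
            "batch_size must be set through the batch_size field, "
            "not through the dataloader parameters."
        )
    return dataloader_params
```

Keeping batch size in one place avoids two competing values reaching the dataloader.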
convert_mode(new_mode, new_patch_size=None, overlap_size=None, new_batch_size=None, new_data_type=None, new_axes=None, new_channels=None, new_in_memory=None, new_dataloader_params=None)
Convert mode while preserving MicroSplit-specific fields.
Parameters:
-
new_mode(Literal['validating', 'predicting']) –The new dataset mode, one of validating or predicting.
-
new_patch_size(Sequence[int] or None, default:None) –New patch size. If None for predicting, uses whole image prediction.
-
overlap_size(Sequence[int] or None, default:None) –New overlap size. Required when switching to tiled prediction with new_patch_size.
-
new_batch_size(int or None, default:None) –New batch size. If None, keeps the current batch size.
-
new_data_type((array, tiff, zarr, czi, custom) or None, default:None) –New data type. If None, keeps the current data type.
-
new_axes(str or None, default:None) –New axes. If None, keeps the current axes.
-
new_channels((Sequence[int], all or None), default:None) –New channel selection. If None, keeps the current channel selection. If "all", selects all channels.
-
new_in_memory(bool or None, default:None) –New in-memory loading setting. If None, keeps the current setting.
-
new_dataloader_params(dict[str, Any] or None, default:None) –New dataloader parameters for the converted mode.
Returns:
-
MicroSplitDataConfig–Converted configuration with relevant MicroSplit-specific fields preserved.
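The interplay between new_patch_size and overlap_size when converting to predicting can be sketched as a small decision function. This only illustrates the documented behaviour; the function name and the keys of the returned dictionary are hypothetical and do not reflect how convert_mode builds its patching configuration.

```python
from collections.abc import Sequence
from typing import Any, Optional


def pick_prediction_patching(
    new_patch_size: Optional[Sequence[int]] = None,
    overlap_size: Optional[Sequence[int]] = None,
) -> dict[str, Any]:
    """Choose between whole-image and tiled prediction."""
    if new_patch_size is None:
        # No patch size given: predict on whole images.
        return {"name": "whole"}
    if overlap_size is None:
        raise ValueError("overlap_size is required for tiled prediction.")
    # Hypothetical keys, used here only to carry the values around.
    return {
        "name": "tiled",
        "patch_size": list(new_patch_size),
        "overlap_size": list(overlap_size),
    }
```

Omitting both arguments selects whole-image prediction, while a patch size without an overlap is rejected.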
is_3D()
Check if the data is 3D based on the axes.
Either "Z" is in the axes and patching patch_size has 3 dimensions, or for CZI
data, "Z" is in the axes or "T" is in the axes and patching patch_size has
3 dimensions.
This method is used during Configuration validation to cross-check dimensions with the algorithm configuration.
Returns:
-
bool–True if the data is 3D, False otherwise.
propagate_seed_to_augmentations()
Propagate the main seed to all augmentations that support seeds.
This ensures that all augmentations use the same seed for reproducibility, unless they already have a seed explicitly set.
Returns:
-
Self–Data model with propagated seeds.
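The propagation rule (fill in the main seed only where no seed was set explicitly) can be illustrated with plain dataclasses. AugmentationSketch and propagate_seed are hypothetical stand-ins, not the library's augmentation configurations, and the sketch assumes every augmentation exposes a seed attribute.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AugmentationSketch:
    """Stand-in for an augmentation configuration with an optional seed."""

    name: str
    seed: Optional[int] = None


def propagate_seed(
    main_seed: int, augmentations: list[AugmentationSketch]
) -> list[AugmentationSketch]:
    """Copy main_seed into every augmentation that has no seed of its own."""
    return [
        aug if aug.seed is not None else replace(aug, seed=main_seed)
        for aug in augmentations
    ]


augmentations = [AugmentationSketch("flip"), AugmentationSketch("rotate", seed=7)]
seeded = propagate_seed(42, augmentations)
```

In this example the flip augmentation inherits the main seed, while the rotate augmentation keeps its explicit seed.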
propagate_seed_to_patching()
Propagate the main seed to the patching strategy if it supports seeds.
This ensures that the patching strategy uses the same seed for reproducibility, unless it already has a seed explicitly set.
Returns:
-
Self–Data model with propagated seed.
raise_unsupported_features()
set_3D(axes, patch_size)
set_default_max_patch_filter_coverage()
Set default max patch filter coverage based on data dimensionality.
Returns:
-
Self–Data model with default max patch filter coverage updated.
set_default_pin_memory(dataloader_params)
classmethod
Set default pin_memory for dataloader parameters if not provided.
- If 'pin_memory' is not set, it defaults to True if CUDA is available.
Parameters:
-
dataloader_params(dict of {str: Any}) –The dataloader parameters.
Returns:
-
dict of {str: Any}–The dataloader parameters with pin_memory default applied.
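A minimal sketch of the pin_memory default is shown below. To stay independent of PyTorch, CUDA availability is passed in as a boolean; in practice it would typically come from torch.cuda.is_available(). The function name is hypothetical, and setting pin_memory to False when CUDA is unavailable is an assumption of this sketch.

```python
from typing import Any


def with_default_pin_memory(
    dataloader_params: dict[str, Any], cuda_available: bool
) -> dict[str, Any]:
    """Return a copy of the parameters with pin_memory filled in if missing."""
    params = dict(dataloader_params)
    # A user-supplied value always wins; setdefault only fills the gap.
    params.setdefault("pin_memory", cuda_available)
    return params
```

An explicit pin_memory value supplied by the user is left untouched.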
set_default_workers_in_dataloaders()
Set num_workers and persistent_workers defaults in all dataloaders.
For each of train_dataloader_params, val_dataloader_params, and
pred_dataloader_params: sets num_workers from the num_workers
field if not already present, and sets persistent_workers=True when
num_workers > 0 and not already specified.
Returns:
-
Self–Validated data model with worker defaults applied to all dataloaders.
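The worker defaults can be sketched on plain dictionaries. The helper name is hypothetical, and the sketch assumes that persistent_workers depends on the effective num_workers of each dataloader rather than on the model-level field.

```python
from typing import Any


def apply_worker_defaults(
    num_workers: int, *param_dicts: dict[str, Any]
) -> list[dict[str, Any]]:
    """Fill in num_workers and persistent_workers where they are missing."""
    updated = []
    for params in param_dicts:
        params = dict(params)  # do not mutate the caller's dictionaries
        params.setdefault("num_workers", num_workers)
        # Assumption: the per-dataloader value decides about persistence.
        if params["num_workers"] > 0:
            params.setdefault("persistent_workers", True)
        updated.append(params)
    return updated


train, val, pred = apply_worker_defaults(
    4,
    {"shuffle": True},
    {"num_workers": 0},
    {"num_workers": 2, "persistent_workers": False},
)
```

Here the training parameters receive both defaults, the validation parameters keep zero workers, and the prediction parameters keep their explicit persistent_workers setting.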
shuffle_train_dataloader(train_dataloader_params)
classmethod
Validate that "shuffle" is included in the training dataloader params.
A warning is issued if shuffle=False.
Parameters:
-
train_dataloader_params(dict of {str: Any}) –The training dataloader parameters.
Returns:
-
dict of {str: Any}–The validated training dataloader parameters.
Raises:
-
ValueError–If "shuffle" is not included in the training dataloader params.
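The two outcomes of shuffle_train_dataloader (an error when the key is missing, a warning when it is False) can be sketched as follows. The function name and messages are hypothetical.

```python
import warnings
from typing import Any


def check_shuffle(train_dataloader_params: dict[str, Any]) -> dict[str, Any]:
    """Require a shuffle key and warn when shuffling is disabled."""
    if "shuffle" not in train_dataloader_params:
        raise ValueError("The training dataloader parameters must include 'shuffle'.")
    if not train_dataloader_params["shuffle"]:
        warnings.warn(
            "shuffle=False in the training dataloader may degrade training.",
            stacklevel=2,
        )
    return train_dataloader_params
```

Disabling shuffling is therefore allowed but flagged, whereas omitting the key is treated as a configuration error.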
validate_channels(channels, info)
classmethod
Validate channels.
Channels must be a sequence of non-negative integers without duplicates. If
channels are not None, then C must be present in the axes.
Parameters:
-
channels(Sequence of int or None) –Channels to validate.
-
info(ValidationInfo) –Validation information.
Returns:
-
Sequence of int or None–Validated channels.
Raises:
-
ValueError–If channels are not valid.
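The channel rules can be sketched as a standalone check. The name check_channels is hypothetical, and the axes are passed directly instead of being read from the validation information.

```python
from collections.abc import Sequence
from typing import Optional


def check_channels(
    channels: Optional[Sequence[int]], axes: str
) -> Optional[Sequence[int]]:
    """Validate a channel selection against the documented rules."""
    if channels is None:
        return None  # None means all channels are used
    if "C" not in axes:
        raise ValueError("Channels were specified, but 'C' is not in the axes.")
    if any(not isinstance(c, int) or c < 0 for c in channels):
        raise ValueError("Channels must be non-negative integers.")
    if len(set(channels)) != len(channels):
        raise ValueError("Channels must not contain duplicates.")
    return channels
```

For example, channels [0, 2] are accepted with axes "CYX" but rejected with axes "YX".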
validate_dimensions()
Validate 2D/3D dimensions between axes and patch size.
Returns:
-
Self–Validated data model.
Raises:
-
ValueError–If the patch size dimension is not compatible with the axes.
validate_filters_against_mode(filter_obj, info)
classmethod
Validate that the filters are only used during training.
Parameters:
-
filter_obj(PatchFilterConfig | MaskFilterConfig | None) –Filter to validate.
-
info(ValidationInfo) –Validation information.
Returns:
-
PatchFilterConfig | MaskFilterConfig | None–Validated filter.
Raises:
-
ValueError–If a filter is used in a mode other than training.
validate_in_memory_with_data_type(in_memory, info)
classmethod
Validate that in_memory is compatible with data_type.
in_memory can only be True for 'array', 'tiff' and 'custom' data types.
Parameters:
-
in_memory(bool) –Whether to load data into memory.
-
info(Any) –Additional information about the field being validated.
Returns:
-
bool–Validated in_memory value.
Raises:
-
ValueError–If in_memory is True for unsupported data types.
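The compatibility rule between in_memory and data_type reduces to a membership test. The sketch below uses hypothetical names and takes data_type as a plain string.

```python
# Data types for which in-memory loading is documented as supported.
IN_MEMORY_DATA_TYPES = frozenset({"array", "tiff", "custom"})


def check_in_memory(in_memory: bool, data_type: str) -> bool:
    """Reject in_memory=True for data types that cannot be loaded in memory."""
    if in_memory and data_type not in IN_MEMORY_DATA_TYPES:
        raise ValueError(
            f"in_memory=True is not supported for data type {data_type!r}."
        )
    return in_memory
```

Setting in_memory to False is accepted by this sketch for every data type.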
validate_microsplit_params_against_mode()
validate_patching_strategy_against_mode(patching, info)
classmethod
Validate that the patching strategy is compatible with the dataset mode.
- If mode is training, patching strategy must be random or stratified.
- If mode is validating, patching must be fixed_random.
- If mode is predicting, patching strategy must be tiled or whole.
Parameters:
-
patching(PatchingStrategies) –Patching strategy to validate.
-
info(ValidationInfo) –Validation information.
Returns:
-
PatchingStrategies–Validated patching strategy.
Raises:
-
ValueError–If the patching strategy is not compatible with the dataset mode.
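The mode-to-strategy rules can be expressed as a lookup table. The sketch below works on strategy names as plain strings; the table and function names are hypothetical, whereas the real validator operates on patching configuration objects.

```python
# Allowed patching strategy names per dataset mode, as listed above.
ALLOWED_PATCHING = {
    "training": {"random", "stratified"},
    "validating": {"fixed_random"},
    "predicting": {"tiled", "whole"},
}


def check_patching(strategy_name: str, mode: str) -> str:
    """Raise if the strategy is not allowed for the given mode."""
    allowed = ALLOWED_PATCHING[mode]
    if strategy_name not in allowed:
        raise ValueError(
            f"Patching strategy {strategy_name!r} is not compatible with mode "
            f"{mode!r}; expected one of {sorted(allowed)}."
        )
    return strategy_name
```

A table like this keeps the rules in one place and makes the error message list the valid alternatives.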
warn_inconsistent_num_workers()
Warn if num_workers conflicts with a per-dataloader value.
This validator runs before set_default_workers_in_dataloaders, so
the dataloader dicts only contain user-supplied values at this point.
Only fires when num_workers was explicitly set on the model.
Returns:
-
Self–Unchanged data model.
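A sketch of the consistency check is given below. The function name, the explicitly_set flag and the keyword-based interface are assumptions chosen for illustration; the real validator inspects the model itself.

```python
import warnings
from typing import Any


def find_worker_conflicts(
    num_workers: int, explicitly_set: bool, **dataloaders: dict[str, Any]
) -> list[str]:
    """Warn about dataloaders whose num_workers differs from the model-level value."""
    conflicts: list[str] = []
    if not explicitly_set:
        # Only an explicitly chosen model-level value can be contradicted.
        return conflicts
    for name, params in dataloaders.items():
        if "num_workers" in params and params["num_workers"] != num_workers:
            warnings.warn(
                f"num_workers={num_workers} conflicts with "
                f"{name} num_workers={params['num_workers']}.",
                stacklevel=2,
            )
            conflicts.append(name)
    return conflicts
```

Because the check runs before defaults are filled in, only values supplied by the user can trigger the warning.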