
Factory


CAREamics dataset factory functions and utilities.

Loading = ReadFuncLoading | ImageStackLoading | None module-attribute

The type of loading used for custom data. ReadFuncLoading uses a simple function that loads full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats, enabling individual patches to be read from disk one at a time. If the data type is not custom, loading should be None.

ImageStackLoading dataclass

Loading specification for a custom image stack loader (chunked or memory-mapped).

image_stack_loader instance-attribute

A function that loads image data into a sequence of ImageStack objects.

image_stack_loader_kwargs = None class-attribute instance-attribute

Additional keyword arguments to pass to the image_stack_loader alongside the source of the image data.
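To make the idea of a chunked loader concrete, here is a minimal sketch of a custom loader that could be wrapped in ImageStackLoading. Everything in it is hypothetical: the `LazyStack` class, its `read_region` method, and the assumption that the loader is called with the data source, an axes string and the unpacked `image_stack_loader_kwargs`. The real ImageStack interface expected by CAREamics is not described on this page, so check the library source before relying on any of these names.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LazyStack:
    """Hypothetical image stack that reads regions on demand.

    `backing` is an in-memory 2D list standing in for a chunked on-disk array.
    """

    backing: list
    scale: float = 1.0
    reads: list = field(default_factory=list)

    def read_region(self, y: int, x: int, height: int, width: int) -> list:
        # Record the access: only the requested patch is touched.
        self.reads.append((y, x, height, width))
        return [[v * self.scale for v in row[x : x + width]]
                for row in self.backing[y : y + height]]


def my_stack_loader(source: Any, axes: str, scale: float = 1.0) -> list:
    """Hypothetical loader: one LazyStack per image in `source`.

    `scale` plays the role of an extra argument supplied through
    image_stack_loader_kwargs; it is applied lazily, at read time.
    """
    if axes != "YX":
        raise ValueError(f"this toy loader only supports 'YX', got {axes!r}")
    return [LazyStack(img, scale) for img in source]


# Assumed calling convention; the kwargs default to None, so fall back to {}.
loader_kwargs = {"scale": 2.0}
stacks = my_stack_loader([[[1, 2, 3], [4, 5, 6]]], "YX", **(loader_kwargs or {}))
patch = stacks[0].read_region(0, 1, 2, 2)
print(patch)  # [[4.0, 6.0], [10.0, 12.0]]
```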

IndependentTargets dataclass

Bases: Generic[T]

MicroSplit data with independent target structures.

The data for different target structures may have a different shape.

The input will be a synthetically generated superposition of the target structures.
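The synthetic superposition can be pictured with a short, stdlib-only sketch. It assumes that, because the structures are independent, a patch is cropped from each target image at its own random position and the crops are summed pixel by pixel to form the input. The function names and the plain-list image representation are illustrative only, not the CAREamics implementation.

```python
import random


def crop(img, y, x, size):
    """Return a size x size crop of a 2D list starting at (y, x)."""
    return [row[x : x + size] for row in img[y : y + size]]


def synthesize_independent(targets, size, rng):
    """Crop each target at an independent random location and sum the crops.

    The target images may have different shapes; each only needs to be at
    least `size` pixels along both axes. Returns (input_patch, target_patches).
    """
    patches = []
    for img in targets:
        h, w = len(img), len(img[0])
        y = rng.randrange(h - size + 1)  # independent position per structure
        x = rng.randrange(w - size + 1)
        patches.append(crop(img, y, x, size))
    # Input = pixel-wise superposition of the target patches.
    summed = [[sum(vals) for vals in zip(*rows)] for rows in zip(*patches)]
    return summed, patches


rng = random.Random(0)
nuclei = [[1] * 6 for _ in range(6)]     # 6x6 image of ones
membranes = [[2] * 5 for _ in range(8)]  # 8x5 image of twos
inp, tgts = synthesize_independent([nuclei, membranes], size=4, rng=rng)
print(inp[0])  # [3, 3, 3, 3]
```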

MultiChannelTarget dataclass

Bases: Generic[T]

MicroSplit data with target channels acquired together.

The input will be a synthetically generated superposition of the target channels.
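When the target channels are acquired together they are co-registered, so, as a sketch assuming a plain unweighted sum, the input for a patch is the pixel-wise sum of the channels at the same location. Nested lists stand in for arrays here.

```python
def superimpose_channels(channels):
    """Sum co-registered channels (equally shaped 2D lists) pixel by pixel."""
    shapes = {(len(c), len(c[0])) for c in channels}
    if len(shapes) != 1:
        raise ValueError(f"channels must share one shape, got {shapes}")
    return [[sum(px) for px in zip(*rows)] for rows in zip(*channels)]


ch0 = [[1, 2], [3, 4]]
ch1 = [[10, 20], [30, 40]]
print(superimpose_channels([ch0, ch1]))  # [[11, 22], [33, 44]]
```

Unlike the independent-targets case, no per-channel crop position is drawn: every channel contributes the pixel at the same coordinates.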

PairedInputTarget dataclass

Bases: Generic[T]

MicroSplit data with paired inputs and multi-channel targets.

PredData dataclass

Bases: Generic[T]

Data for prediction.

ReadFuncLoading dataclass

Loading specification using a custom read function.

extension_filter = '' class-attribute instance-attribute

A filter for finding source files using glob-style pattern matching. For example, to select files with the extension .npy one should use the filter "*.npy".
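Glob-style matching of this kind can be illustrated with Python's standard `fnmatch` module. This only illustrates the pattern syntax; whether CAREamics uses `fnmatch`, `Path.glob` or another mechanism internally is not stated here.

```python
from fnmatch import fnmatch

files = ["img_001.npy", "img_002.npy", "notes.txt", "img_003.tif"]
extension_filter = "*.npy"

# Keep only the file names that match the glob-style pattern.
selected = [name for name in files if fnmatch(name, extension_filter)]
print(selected)  # ['img_001.npy', 'img_002.npy']
```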

read_kwargs = None class-attribute instance-attribute

Additional keyword arguments to pass to the read_source_func alongside the file path to the image data.

read_source_func instance-attribute

A function for reading image data into numpy arrays.
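A read function receives the path of one file together with the entries of `read_kwargs`. The stdlib-only sketch below shows that calling pattern with a hypothetical CSV reader, assuming the path is passed positionally and the kwargs are unpacked after it. A real read function should return a numpy array (for example via `numpy.load` for `.npy` files); a nested list is used here only to keep the example dependency-free.

```python
import csv
import tempfile
from pathlib import Path


def read_csv_image(file_path, delimiter=","):
    """Hypothetical read function: load a 2D image stored as CSV text.

    Returns a list of lists of floats instead of a numpy array.
    """
    with open(file_path, newline="") as f:
        return [[float(v) for v in row] for row in csv.reader(f, delimiter=delimiter)]


read_kwargs = {"delimiter": ";"}  # plays the role of ReadFuncLoading.read_kwargs

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "image.csv"
    path.write_text("1;2;3\n4;5;6\n")
    # Assumed convention: path first, then the read kwargs (None -> {}).
    image = read_csv_image(path, **(read_kwargs or {}))

print(image)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```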

TrainValData dataclass

Bases: Generic[T]

Data for training with validation data provided.

TrainValSplitData dataclass

Bases: Generic[T]

Data for training with automatic validation splitting.

create_dataset(config, inputs, targets, loading=None, model_constraints=None)

Create a CAREamicsDataset.

Parameters:

  • config (DataConfig) –

    The data configuration (data type, axes, patching, etc.).

  • inputs (Any) –

    The input data sources (paths, arrays, or custom).

  • targets (Any) –

    The target data sources, or None.

  • loading (ReadFuncLoading or ImageStackLoading or None, default: None ) –

    Custom loading specification. Required when data_type is "custom": use ReadFuncLoading for a read function, or ImageStackLoading for a custom image stack loader. Otherwise None.

  • model_constraints (ModelConstraints, default: None ) –

    If provided, the dataset will validate that the input data shape is compatible with the model constraints. Only used for prediction datasets.
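The rule for the `loading` argument (required for the "custom" data type, otherwise None) can be expressed as a small validation helper. This is a strict sketch: the helper name, the string data types and the two placeholder dataclasses are hypothetical stand-ins for the CAREamics classes, which carry more fields, and whether CAREamics raises or silently ignores a non-None `loading` for other data types is not stated here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


@dataclass
class ReadFuncLoading:  # hypothetical stand-in with only the required field
    read_source_func: Callable[..., Any]


@dataclass
class ImageStackLoading:  # hypothetical stand-in with only the required field
    image_stack_loader: Callable[..., Any]


def check_loading(
    data_type: str,
    loading: Optional[Union[ReadFuncLoading, ImageStackLoading]],
) -> None:
    """Raise ValueError if `loading` does not match `data_type`."""
    if data_type == "custom":
        if not isinstance(loading, (ReadFuncLoading, ImageStackLoading)):
            raise ValueError(
                "data_type 'custom' requires ReadFuncLoading or ImageStackLoading"
            )
    elif loading is not None:
        raise ValueError(f"loading must be None for data_type {data_type!r}")


check_loading("tiff", None)                            # OK
check_loading("custom", ReadFuncLoading(lambda p: p))  # OK
```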

Returns:

create_microsplit_dataset(config, data, loading=None, model_constraints=None, rng=None)

Create a MicroSplit training or validation dataset.

The type of the `data` argument determines which MicroSplit training mode is used. There are three options:

- `MicroSplitMultiplexedTargetData`: When only multiplexed target channels are
available, the inputs can be synthesized by summing the target channels.
- `MicroSplitSeparateTargetData`: Multiplexed target channels are not available;
instead, each channel is acquired separately. This should only be used for
structures that are not spatially correlated.
- `MicroSplitPairedData`: When both the multiplexed target channels and the
real input are available.

Parameters:

  • config (MicroSplitDataConfig) –

    MicroSplit data configuration.

  • data (MicroSplitTrainingData) –

    Data sources used to construct MicroSplit training patches. Either MicroSplitMultiplexedTargetData, MicroSplitSeparateTargetData or MicroSplitPairedData.

  • loading (Loading, default: None ) –

    Loading specification for custom data.

  • model_constraints (ModelConstraints, default: None ) –

    Optional model constraints for dataset validation.

  • rng (Generator, default: None ) –

    Random number generator passed to stochastic MicroSplit constructors.

Returns:

create_microsplit_pred_dataset(config, input_data, loading=None, model_constraints=None)

create_microsplit_pred_dataset(config: MicroSplitDataConfig, input_data: Sequence[NDArray[Any]] | Sequence[Path], loading: ReadFuncLoading | None = None, model_constraints: ModelConstraints | None = None) -> CareamicsDataset[ImageStack]
create_microsplit_pred_dataset(config: MicroSplitDataConfig, input_data: Any, loading: ImageStackLoading, model_constraints: ModelConstraints | None = None) -> CareamicsDataset[ImageStack]

Create a MicroSplit prediction dataset.

Parameters:

  • config (MicroSplitDataConfig) –

    MicroSplit prediction data configuration.

  • input_data (Sequence[NDArray], Sequence[Path] or Any) –

    Prediction data sources. For default loading, this is a list of numpy arrays or a list of file paths. If using a custom image stack loader, the input can be any type that is supported by the loader.

  • loading (Loading, default: None ) –

    Loading specification. None or ReadFuncLoading is used for standard array and path inputs, while ImageStackLoading is used for custom input data.

  • model_constraints (ModelConstraints, default: None ) –

    Optional model constraints for dataset validation.

Returns:

create_pred_dataset(config, data, loading, model_constraints=None)

Create the dataset for prediction.

Parameters:

  • config (DataConfig) –

    Data configuration.

  • data (PredData) –

    Prediction data sources.

  • loading (ReadFuncLoading or ImageStackLoading or None) –

    Custom loading specification when using custom data type.

  • model_constraints (ModelConstraints, default: None ) –

    If provided, the dataset will validate that the prediction data shape is compatible with the model constraints.

Returns:

create_train_dataset(config, data, loading, model_constraints=None)

Create a dataset for training.

Parameters:

  • config (DataConfig) –

    Data configuration (must have mode='training').

  • data (TrainValData | TrainValSplitData) –

    Train and validation data sources (and optional targets/masks).

  • loading (ReadFuncLoading or ImageStackLoading or None) –

    Custom loading specification when using custom data type.

  • model_constraints (ModelConstraints, default: None ) –

    If provided, the dataset will validate that the input data shape is compatible with the model constraints.

Returns:

create_train_val_datasets(config, data, loading, model_constraints=None)

Create train and validation datasets when validation data is provided explicitly.

Parameters:

  • config (DataConfig) –

    Data configuration (must have mode='training').

  • data (TrainValData) –

    Train and validation data sources (and optional targets/masks).

  • loading (ReadFuncLoading or ImageStackLoading or None) –

    Custom loading specification when using custom data type.

  • model_constraints (ModelConstraints, default: None ) –

    If provided, the dataset will validate that the input data shape is compatible with the model constraints.

Returns:

  • tuple of (CareamicsDataset, CareamicsDataset)

    (train_dataset, val_dataset).
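The contents of ModelConstraints are not documented on this page, so the following is only one plausible example of what "compatible with the model constraints" could mean: a U-Net with `depth` downsampling steps needs every spatial dimension to be divisible by 2**depth. The function name and the divisibility rule are assumptions for illustration.

```python
def check_shape(shape, axes, depth):
    """Hypothetical constraint check for a U-Net with `depth` downsamplings.

    Every spatial axis (Z, Y, X) must be divisible by 2**depth; other axes
    such as S (samples) or C (channels) are ignored.
    """
    factor = 2 ** depth
    bad = {ax: n for ax, n in zip(axes, shape) if ax in "ZYX" and n % factor}
    if bad:
        raise ValueError(f"axes {bad} not divisible by {factor} (depth={depth})")


check_shape((4, 256, 256), "SYX", depth=2)  # OK: 256 % 4 == 0
```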

create_val_split_datasets(config, data, loading, rng, model_constraints=None)

Create train and validation datasets by splitting from training data.

Requires stratified patching in config.

Parameters:

  • config (DataConfig) –

    Data configuration (must have mode='training', patching.name='stratified').

  • data (TrainValSplitData) –

    Training data sources and number of validation patches.

  • loading (ReadFuncLoading or ImageStackLoading or None) –

    Custom loading specification when using custom data type.

  • rng (Generator) –

    Random number generator used to make the validation split reproducible.

  • model_constraints (ModelConstraints, default: None ) –

    If provided, the dataset will validate that the input data shape is compatible with the model constraints.

Returns:

  • tuple of (CareamicsDataset, CareamicsDataset)

    (train_dataset, val_dataset).
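As an illustration of a reproducible validation split, here is a sketch that draws a fixed number of patch indices for validation with a seeded generator and keeps the rest for training. It uses `random.Random` in place of the numpy `Generator` that the function takes and works on bare indices; how CAREamics combines this with stratified patching is not shown here.

```python
import random


def split_patch_indices(n_patches, n_val, rng):
    """Return (train_idx, val_idx) as disjoint, sorted index lists."""
    if not 0 < n_val < n_patches:
        raise ValueError("n_val must be between 1 and n_patches - 1")
    val = sorted(rng.sample(range(n_patches), n_val))
    val_set = set(val)
    train = [i for i in range(n_patches) if i not in val_set]
    return train, val


train_a, val_a = split_patch_indices(100, 10, random.Random(42))
train_b, val_b = split_patch_indices(100, 10, random.Random(42))
print(val_a == val_b)  # True: same seed, same split
```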

init_patch_extractor(patch_extractor, image_stack_loader, source, axes)

Build a patch extractor by loading image stacks from the source.

Parameters:

  • patch_extractor (type[PatchExtractor]) –

    The PatchExtractor class to instantiate (e.g. PatchExtractor).

  • image_stack_loader (ImageStackLoader) –

    Callable that takes (source, axes) and returns a list of image stacks.

  • source (Any) –

    Data source (paths, arrays, etc.) passed to the loader.

  • axes (str) –

    Axis order string passed to the loader.

Returns:

  • PatchExtractor[GenericImageStack]

    The constructed patch extractor instance.
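The two steps documented here (call the loader with `(source, axes)`, then pass the resulting stacks to the extractor class) can be sketched with toy classes. `ToyPatchExtractor`, `list_loader` and `init_patch_extractor_sketch` are hypothetical and only mimic the call pattern; the real PatchExtractor constructor may take additional arguments.

```python
from typing import Any, Callable, Sequence


class ToyPatchExtractor:
    """Hypothetical extractor that simply keeps the image stacks it is given."""

    def __init__(self, image_stacks: Sequence[Any]):
        self.image_stacks = list(image_stacks)


def list_loader(source: Sequence[Any], axes: str) -> list:
    """Hypothetical image stack loader: wrap each source item with its axes."""
    return [{"data": item, "axes": axes} for item in source]


def init_patch_extractor_sketch(
    patch_extractor: type, image_stack_loader: Callable, source: Any, axes: str
):
    # Step 1: turn the raw source into a list of image stacks.
    image_stacks = image_stack_loader(source, axes)
    # Step 2: instantiate the requested extractor class with those stacks.
    return patch_extractor(image_stacks)


extractor = init_patch_extractor_sketch(ToyPatchExtractor, list_loader, ["a", "b"], "YX")
print(len(extractor.image_stacks))  # 2
```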

select_image_stack_loader(data_type, in_memory, loading=None)

Select image stack loader function for the given data type and loading options.

Parameters:

  • data_type (SupportedData) –

    The type of data (array, tiff, zarr, czi, custom).

  • in_memory (bool) –

    Whether to load full data into memory (True) or use lazy loading.

  • loading (ReadFuncLoading or ImageStackLoading or None, default: None ) –

    Custom loading spec, required when data_type is custom.

Returns:

  • ImageStackLoader

    A callable that takes (source, axes) and returns a list of image stacks.
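The selection logic can be sketched as a dispatch that always yields a `(source, axes)` callable. This is a hedged illustration, not the library code: the stand-in dataclasses, the `builtin` registry that replaces the library's own loaders for non-custom types, and the tuple used in place of an in-memory image stack are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ReadFuncLoading:  # hypothetical stand-in
    read_source_func: Callable[..., Any]
    read_kwargs: Optional[dict] = None


@dataclass
class ImageStackLoading:  # hypothetical stand-in
    image_stack_loader: Callable[..., list]
    image_stack_loader_kwargs: Optional[dict] = None


def select_loader(data_type, in_memory, loading=None, builtin=None):
    """Return a callable (source, axes) -> list of image stacks."""
    if data_type != "custom":
        try:
            return (builtin or {})[(data_type, in_memory)]
        except KeyError:
            raise ValueError(
                f"no loader registered for {data_type!r}, in_memory={in_memory}"
            ) from None
    if isinstance(loading, ImageStackLoading):
        kwargs = loading.image_stack_loader_kwargs or {}
        return lambda source, axes: loading.image_stack_loader(source, axes, **kwargs)
    if isinstance(loading, ReadFuncLoading):
        kwargs = loading.read_kwargs or {}
        # Read every file fully; pairing the result with the axes stands in
        # for wrapping it in an in-memory image stack.
        return lambda source, axes: [
            (loading.read_source_func(path, **kwargs), axes) for path in source
        ]
    raise ValueError("data_type 'custom' requires ReadFuncLoading or ImageStackLoading")


loader = select_loader("custom", True, ReadFuncLoading(lambda p, scale=1: p * scale, {"scale": 3}))
print(loader([1, 2], "YX"))  # [(3, 'YX'), (6, 'YX')]
```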

select_patch_extractor_type(data_type, in_memory)

Select the appropriate PatchExtractor type based on data type and memory mode.

If in_memory is True, or data_type is ZARR or CZI, the standard PatchExtractor is selected; otherwise, the LimitFilesPatchExtractor is used.
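The selection rule reduces to a single condition. The sketch below returns class names as strings and uses lowercase strings for the data types; both are simplifications of the real classes and the SupportedData enum.

```python
def select_patch_extractor_name(data_type: str, in_memory: bool) -> str:
    """Return the extractor class name according to the rule described above."""
    if in_memory or data_type in ("zarr", "czi"):
        return "PatchExtractor"
    return "LimitFilesPatchExtractor"


print(select_patch_extractor_name("tiff", False))  # LimitFilesPatchExtractor
```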

Parameters:

  • data_type (SupportedData) –

    The type of data being handled.

  • in_memory (bool) –

    Indicates whether data is to be loaded into memory.

Returns: