Factory

CAREamics dataset factory functions and utilities.

`Loading = ReadFuncLoading | ImageStackLoading | None` `module-attribute`

The type of loading used for custom data. ReadFuncLoading uses a simple function that loads full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats, enabling single patches to be read from disk one at a time. If the data type is not custom, loading should be None.

`ImageStackLoading` `dataclass`

Loading specification for a custom image stack loader (chunked or memory-mapped).

`image_stack_loader` `instance-attribute`

A function that loads image data into a sequence of ImageStack objects.

`image_stack_loader_kwargs = None` `class-attribute` `instance-attribute`

Additional keyword arguments to pass to the image_stack_loader alongside the source of the image data.
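A minimal sketch of how the two fields might work together. It assumes, as the init_patch_extractor entry describes, that the loader is called with `(source, axes)`, and that `image_stack_loader_kwargs` are forwarded as extra keyword arguments. `ImageStackLoadingSketch`, `call_loader` and `toy_loader` are hypothetical stand-ins, not CAREamics classes or functions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class ImageStackLoadingSketch:
    """Hypothetical stand-in mirroring the fields of ImageStackLoading."""

    image_stack_loader: Callable[..., Sequence[Any]]
    image_stack_loader_kwargs: Optional[dict[str, Any]] = None


def call_loader(spec: ImageStackLoadingSketch, source: Any, axes: str) -> Sequence[Any]:
    """Call the loader with (source, axes) plus any extra kwargs (assumed calling convention)."""
    kwargs = spec.image_stack_loader_kwargs or {}
    return spec.image_stack_loader(source, axes, **kwargs)


def toy_loader(source, axes, prefix="stack"):
    """Toy loader: returns one placeholder 'stack' per source item, tagged with the axes."""
    return [f"{prefix}:{item}:{axes}" for item in source]


spec = ImageStackLoadingSketch(toy_loader, {"prefix": "chunk"})
stacks = call_loader(spec, ["a.zarr", "b.zarr"], "YX")
```

Leaving `image_stack_loader_kwargs` as None simply calls the loader with its own defaults.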

`IndependentTargets` `dataclass`

Bases: Generic[T]

MicroSplit data with independent target structures.

The data for different target structures may have a different shape.

The input will be a synthetically generated superposition of the target structures.
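The synthetic superposition can be illustrated with a short NumPy sketch: one patch is cropped from each target structure at an independent random position, and the input is their plain sum. The function name `synthesize_independent` is hypothetical, and a plain sum is an assumption; the actual MicroSplit pipeline may weight or normalize the channels.

```python
import numpy as np


def synthesize_independent(targets, patch_size, rng):
    """Crop one patch per structure at independent random positions and sum them.

    Returns (input_patch, target_patches), with target_patches of shape (C, *patch_size).
    """
    patches = []
    for image in targets:
        # Random top-left corner that keeps the patch inside this particular image.
        starts = [rng.integers(0, dim - p + 1) for dim, p in zip(image.shape, patch_size)]
        slices = tuple(slice(s, s + p) for s, p in zip(starts, patch_size))
        patches.append(image[slices])
    target_patches = np.stack(patches)        # (C, *patch_size)
    input_patch = target_patches.sum(axis=0)  # synthetic superposition
    return input_patch, target_patches


rng = np.random.default_rng(0)
nuclei = rng.random((128, 128))
actin = rng.random((96, 160))  # a different shape per structure is allowed
inp, tgt = synthesize_independent([nuclei, actin], (64, 64), rng)
```

Because each crop position is drawn independently, the structures end up at arbitrary relative positions. This is why this mode only suits structures that are not spatially correlated.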

`MultiChannelTarget` `dataclass`

Bases: Generic[T]

MicroSplit data with target channels acquired together.

The input will be a synthetically generated superposition of the target channels.
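For comparison, when the channels are acquired together, the same spatial window is cropped from every channel so they stay aligned, and the channels are summed. This is a hedged NumPy sketch with a hypothetical function name. It assumes a `(C, Y, X)` layout and a plain sum.

```python
import numpy as np


def synthesize_multichannel(image, patch_size, rng):
    """Crop the same window from every channel of a (C, Y, X) image and sum over channels."""
    _, height, width = image.shape
    py, px = patch_size
    y = rng.integers(0, height - py + 1)
    x = rng.integers(0, width - px + 1)
    target_patch = image[:, y:y + py, x:x + px]  # channels remain co-registered
    input_patch = target_patch.sum(axis=0)       # synthetic superposition
    return input_patch, target_patch


rng = np.random.default_rng(0)
image = rng.random((3, 128, 128))
inp, tgt = synthesize_multichannel(image, (64, 64), rng)
```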

`PairedInputTarget` `dataclass`

Bases: Generic[T]

MicroSplit data with paired inputs and multi-channel targets.

`PredData` `dataclass`

Bases: Generic[T]

Data for prediction.

`ReadFuncLoading` `dataclass`

Loading specification using a custom read function.

`extension_filter = ''` `class-attribute` `instance-attribute`

A filter for finding source files using glob-style pattern matching. For example, to select files with the extension .npy, use the filter "*.npy".
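The glob-style semantics of such a filter can be shown with the standard-library `fnmatch` module. This sketch illustrates only the pattern matching. How CAREamics applies `extension_filter` internally (for example via `Path.glob`) is not shown here and is an assumption.

```python
from fnmatch import fnmatch
from pathlib import Path

files = [Path("data/img_001.npy"), Path("data/img_002.npy"), Path("data/notes.txt")]
extension_filter = "*.npy"

# Keep only files whose name matches the glob-style pattern.
selected = [f for f in files if fnmatch(f.name, extension_filter)]
```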

`read_kwargs = None` `class-attribute` `instance-attribute`

Additional keyword arguments to pass to the read_source_func alongside the file path to the image data.

`read_source_func` `instance-attribute`

A function for reading image data into NumPy arrays.
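Below is a sketch of a read function suitable for `read_source_func`. It assumes the function is called as `read_source_func(path, **read_kwargs)`, following the read_kwargs description. If CAREamics passes additional positional arguments (such as axes), the signature would need to change. `read_npy` is a hypothetical name.

```python
import tempfile
from pathlib import Path

import numpy as np


def read_npy(file_path: Path, mmap_mode=None) -> np.ndarray:
    """Hypothetical read function: load one .npy file into a NumPy array."""
    return np.load(file_path, mmap_mode=mmap_mode)


read_kwargs = {"mmap_mode": None}  # forwarded alongside the file path (assumed)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "image.npy"
    np.save(path, np.arange(12, dtype=np.float32).reshape(3, 4))
    image = read_npy(path, **read_kwargs)  # fully loaded, so usable after cleanup
```

Paired with an extension filter of "*.npy", this would read every matching file fully into memory, which is the use case ReadFuncLoading targets.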

`TrainValData` `dataclass`

Bases: Generic[T]

Data for training with validation data provided.

`TrainValSplitData` `dataclass`

Bases: Generic[T]

Data for training with automatic validation splitting.

`create_dataset(config, inputs, targets, loading=None, model_constraints=None)`

Create a CareamicsDataset.

Parameters:

config (DataConfig) –

The data configuration (data type, axes, patching, etc.).
inputs (Any) –

The input data sources (paths, arrays, or custom).
targets (Any) –

The target data sources, or None.
loading (ReadFuncLoading or ImageStackLoading or None, default: None ) –

Custom loading specification. Required when data_type is "custom": use ReadFuncLoading for a read function, or ImageStackLoading for a custom image stack loader. Otherwise None.
model_constraints (ModelConstraints, default: None ) –

If provided, the data module will validate that the input data shape is compatible with the model constraints. Only used for prediction datasets.

Returns:

CareamicsDataset[ImageStack] –

The configured dataset instance.

`create_microsplit_dataset(config, data, loading=None, model_constraints=None, rng=None)`

Create a MicroSplit training or validation dataset.

The type of data determines which MicroSplit training mode is used. There are three options:

- `MicroSplitMultiplexedTargetData`: When only the multiplexed target channels are available, the inputs can be synthesized by summing the target channels.
- `MicroSplitSeparateTargetData`: Multiplexed target channels are not available; instead, each channel is acquired separately. This should only be used for structures that are not spatially correlated.
- `MicroSplitPairedData`: When both the multiplexed target channels and the real input are available.
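The type-based selection above can be sketched with `isinstance` dispatch. The three dataclasses are hypothetical stand-ins for the MicroSplit data containers named in the list, and the returned strings only describe how each mode obtains its input. They are not values returned by CAREamics.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class MultiplexedTargets:  # hypothetical stand-in for MicroSplitMultiplexedTargetData
    targets: Any


@dataclass
class SeparateTargets:  # hypothetical stand-in for MicroSplitSeparateTargetData
    targets: Any


@dataclass
class PairedData:  # hypothetical stand-in for MicroSplitPairedData
    inputs: Any
    targets: Any


def training_mode(data: Any) -> str:
    """Describe the MicroSplit training mode implied by the type of the data container."""
    if isinstance(data, MultiplexedTargets):
        return "synthesize input by summing co-registered target channels"
    if isinstance(data, SeparateTargets):
        return "synthesize input by summing independently sampled target patches"
    if isinstance(data, PairedData):
        return "use the real acquired input"
    raise TypeError(f"Unsupported MicroSplit data type: {type(data).__name__}")
```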

Parameters:

config (MicroSplitDataConfig) –

MicroSplit data configuration.
data (MicroSplitTrainingData) –

Data sources used to construct MicroSplit training patches. Either MicroSplitMultiplexedTargetData, MicroSplitSeparateTargetData or MicroSplitPairedData.
loading (Loading, default: None ) –

Loading specification for custom data.
model_constraints (ModelConstraints, default: None ) –

Optional model constraints for dataset validation.
rng (Generator, default: None ) –

Random number generator passed to stochastic MicroSplit constructors.

Returns:

CareamicsDataset –

The configured MicroSplit dataset.

`create_microsplit_pred_dataset(config, input_data, loading=None, model_constraints=None)`

create_microsplit_pred_dataset(config: MicroSplitDataConfig, input_data: Sequence[NDArray[Any]] | Sequence[Path], loading: ReadFuncLoading | None = None, model_constraints: ModelConstraints | None = None) -> CareamicsDataset[ImageStack]

create_microsplit_pred_dataset(config: MicroSplitDataConfig, input_data: Any, loading: ImageStackLoading, model_constraints: ModelConstraints | None = None) -> CareamicsDataset[ImageStack]

Create a MicroSplit prediction dataset.

Parameters:

config (MicroSplitDataConfig) –

MicroSplit prediction data configuration.
input_data (Sequence[NDArray], Sequence[Path] or Any) –

Prediction data sources. For default loading, this is a list of numpy arrays or a list of file paths. If using a custom image stack loader the input can be any type that is supported by the loader.
loading (Loading, default: None ) –

Loading specification. None or ReadFuncLoading is used for standard array and path inputs, while ImageStackLoading is used for custom input data.
model_constraints (ModelConstraints, default: None ) –

Optional model constraints for dataset validation.

Returns:

CareamicsDataset –

The configured MicroSplit prediction dataset.

`create_pred_dataset(config, data, loading, model_constraints=None)`

Create the dataset for prediction.

Parameters:

config (DataConfig) –

Data configuration.
data (PredData) –

Prediction data sources.
loading (ReadFuncLoading or ImageStackLoading or None) –

Custom loading specification when using the custom data type.
model_constraints (ModelConstraints, default: None ) –

If provided, the dataset will validate that the prediction data shape is compatible with the model constraints.

Returns:

CareamicsDataset[ImageStack] –

Dataset configured for prediction.

`create_train_dataset(config, data, loading, model_constraints=None)`

Create a dataset for training.

Parameters:

config (DataConfig) –

Data configuration (must have mode='training').
data (TrainValData | TrainValSplitData) –

Train and validation data sources (and optional targets/masks).
loading (ReadFuncLoading or ImageStackLoading or None) –

Custom loading specification when using the custom data type.
model_constraints (ModelConstraints, default: None ) –

If provided, the dataset will validate that the input data shape is compatible with the model constraints.

Returns:

CareamicsDataset –

The training dataset.

`create_train_val_datasets(config, data, loading, model_constraints=None)`

Create train and validation datasets when validation data is provided explicitly.

Parameters:

config (DataConfig) –

Data configuration (must have mode='training').
data (TrainValData) –

Train and validation data sources (and optional targets/masks).
loading (ReadFuncLoading or ImageStackLoading or None) –

Custom loading specification when using the custom data type.
model_constraints (ModelConstraints, default: None ) –

If provided, the dataset will validate that the input data shape is compatible with the model constraints.

Returns:

tuple of (CareamicsDataset, CareamicsDataset) –

(train_dataset, val_dataset).

`create_val_split_datasets(config, data, loading, rng, model_constraints=None)`

Create train and validation datasets by splitting from training data.

Requires stratified patching in config.

Parameters:

config (DataConfig) –

Data configuration (must have mode='training', patching.name='stratified').
data (TrainValSplitData) –

Training data sources and number of validation patches.
loading (ReadFuncLoading or ImageStackLoading or None) –

Custom loading specification when using the custom data type.
rng (Generator) –

Random generator for reproducible validation split.
model_constraints (ModelConstraints, default: None ) –

If provided, the dataset will validate that the input data shape is compatible with the model constraints.

Returns:

tuple of (CareamicsDataset, CareamicsDataset) –

(train_dataset, val_dataset).
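The `rng` argument makes the split reproducible, and a plain random hold-out of patch indices shows why. This sketch is a generic illustration only. It does not reproduce the stratified patching scheme that create_val_split_datasets requires, and `split_patch_indices` is a hypothetical name.

```python
import numpy as np


def split_patch_indices(n_patches: int, n_val: int, rng: np.random.Generator):
    """Randomly hold out n_val patch indices for validation; the rest are for training."""
    if not 0 < n_val < n_patches:
        raise ValueError("n_val must be between 1 and n_patches - 1")
    val_idx = np.sort(rng.choice(n_patches, size=n_val, replace=False))
    train_idx = np.setdiff1d(np.arange(n_patches), val_idx)
    return train_idx, val_idx


# Two generators with the same seed produce the same split.
train_a, val_a = split_patch_indices(100, 10, np.random.default_rng(42))
train_b, val_b = split_patch_indices(100, 10, np.random.default_rng(42))
```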

`init_patch_extractor(patch_extractor, image_stack_loader, source, axes)`

Build a patch extractor by loading image stacks from the source.

Parameters:

patch_extractor (type[PatchExtractor]) –

The PatchExtractor class to instantiate (e.g. PatchExtractor).
image_stack_loader (ImageStackLoader) –

Callable that takes (source, axes) and returns a list of image stacks.
source (Any) –

Data source (paths, arrays, etc.) passed to the loader.
axes (str) –

Axis order string passed to the loader.

Returns:

PatchExtractor[GenericImageStack] –

The constructed patch extractor instance.
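The two steps described above (load the stacks, then build the extractor) can be sketched as follows. It assumes that the extractor class accepts the list of image stacks as its single positional argument. `ToyPatchExtractor` and `init_patch_extractor_sketch` are hypothetical stand-ins, not the CAREamics implementation.

```python
from typing import Any, Callable, Sequence


class ToyPatchExtractor:
    """Hypothetical stand-in for PatchExtractor: stores the image stacks it receives."""

    def __init__(self, image_stacks: Sequence[Any]):
        self.image_stacks = list(image_stacks)


def init_patch_extractor_sketch(
    patch_extractor: type,
    image_stack_loader: Callable[[Any, str], Sequence[Any]],
    source: Any,
    axes: str,
):
    """Load image stacks from the source, then pass them to the extractor class."""
    image_stacks = image_stack_loader(source, axes)
    return patch_extractor(image_stacks)


extractor = init_patch_extractor_sketch(
    ToyPatchExtractor,
    lambda src, ax: [f"{s} ({ax})" for s in src],  # toy loader
    ["img0", "img1"],
    "YX",
)
```

Taking the class as a parameter lets the caller decide which extractor type to build, for example the type returned by select_patch_extractor_type.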

`select_image_stack_loader(data_type, in_memory, loading=None)`

Select image stack loader function for the given data type and loading options.

Parameters:

data_type (SupportedData) –

The type of data (array, tiff, zarr, czi, custom).
in_memory (bool) –

Whether to load full data into memory (True) or use lazy loading.
loading (ReadFuncLoading or ImageStackLoading or None, default: None ) –

Custom loading spec, required when data_type is custom.

Returns:

ImageStackLoader –

A callable that takes (source, axes) and returns a list of image stacks.
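The selection logic can be sketched as a small dispatch function. Custom data must bring its own loading specification, while the other data types map to built-in loaders chosen by `in_memory`. Everything here is a hypothetical stand-in. The enum mirrors the values listed for SupportedData, the built-in loaders are placeholders, and the sketch collapses ReadFuncLoading and ImageStackLoading into one spec that carries a loader.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class DataType(str, Enum):
    """Hypothetical stand-in for SupportedData."""

    ARRAY = "array"
    TIFF = "tiff"
    ZARR = "zarr"
    CZI = "czi"
    CUSTOM = "custom"


@dataclass
class CustomLoadingSketch:
    """Hypothetical custom loading spec that carries its own loader."""

    loader: Callable[[Any, str], list]


def _placeholder_loader(name: str) -> Callable[[Any, str], list]:
    """Placeholder for a built-in loader; tags each source item with its name and axes."""
    def loader(source: Any, axes: str) -> list:
        return [f"{name}:{item}:{axes}" for item in source]
    return loader


def select_loader_sketch(
    data_type: DataType, in_memory: bool, loading: Optional[CustomLoadingSketch] = None
) -> Callable[[Any, str], list]:
    if data_type is DataType.CUSTOM:
        if loading is None:
            raise ValueError("A loading specification is required for custom data.")
        return loading.loader
    mode = "in_memory" if in_memory else "lazy"
    return _placeholder_loader(f"{data_type.value}_{mode}")
```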

`select_patch_extractor_type(data_type, in_memory)`

Select the appropriate PatchExtractor type based on data type and memory mode.

If in_memory is True, or data_type is ZARR or CZI, the standard PatchExtractor is selected; otherwise, the LimitFilesPatchExtractor is used.

Parameters:

data_type (SupportedData) –

The type of data being handled.
in_memory (bool) –

Indicates whether data is to be loaded into memory.

Returns:

type[PatchExtractor] –

The selected PatchExtractor type.
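The selection rule stated above translates directly into a few lines. The enum and the two extractor classes are hypothetical stand-ins for SupportedData, PatchExtractor and LimitFilesPatchExtractor.

```python
from enum import Enum


class SupportedDataSketch(str, Enum):
    """Hypothetical stand-in for SupportedData."""

    ARRAY = "array"
    TIFF = "tiff"
    ZARR = "zarr"
    CZI = "czi"
    CUSTOM = "custom"


class PatchExtractorSketch:
    """Stand-in for the standard PatchExtractor."""


class LimitFilesPatchExtractorSketch:
    """Stand-in for LimitFilesPatchExtractor."""


def select_patch_extractor_type_sketch(data_type: SupportedDataSketch, in_memory: bool) -> type:
    # In-memory data, ZARR and CZI use the standard extractor.
    if in_memory or data_type in (SupportedDataSketch.ZARR, SupportedDataSketch.CZI):
        return PatchExtractorSketch
    # Everything else falls back to the LimitFilesPatchExtractor stand-in.
    return LimitFilesPatchExtractorSketch
```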
