
Factory


CAREamics dataset factory functions and utilities.

Loading = ReadFuncLoading | ImageStackLoading | None module-attribute

The type of loading used for custom data. ReadFuncLoading uses a simple function that loads full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats that allow individual patches to be read from disk one at a time. If the data type is not custom, loading should be None.
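The rule above (a loading spec for custom data, None otherwise) can be expressed as a small check. This is a minimal sketch: the stand-in dataclasses and the `check_loading` helper are hypothetical and only mirror the names documented on this page. Raising an error for a non-None spec on non-custom data is an assumption; the library may instead ignore it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


# Hypothetical stand-ins mirroring the documented names (not imported from CAREamics).
@dataclass
class ReadFuncLoading:
    read_source_func: Callable[..., Any]
    extension_filter: str = ""
    read_kwargs: Optional[dict] = None


@dataclass
class ImageStackLoading:
    image_stack_loader: Callable[..., Any]
    image_stack_loader_kwargs: Optional[dict] = None


Loading = Union[ReadFuncLoading, ImageStackLoading, None]


def check_loading(data_type: str, loading: Loading) -> None:
    """Hypothetical helper: require a loading spec for custom data, None otherwise."""
    if data_type == "custom":
        if not isinstance(loading, (ReadFuncLoading, ImageStackLoading)):
            raise ValueError(
                "data_type 'custom' requires ReadFuncLoading or ImageStackLoading."
            )
    elif loading is not None:
        # Assumption: treat a spec on non-custom data as a configuration error.
        raise ValueError(f"loading must be None for data_type {data_type!r}.")
```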

ImageStackLoading dataclass

Loading specification for a custom image stack loader (chunked / memory-mapped).

image_stack_loader instance-attribute

A function that loads image data into a sequence of ImageStack objects.

image_stack_loader_kwargs = None class-attribute instance-attribute

Additional keyword arguments to pass to the image_stack_loader alongside the source of the image data.
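To illustrate how image_stack_loader and image_stack_loader_kwargs fit together, here is a minimal sketch. It assumes the loader is called as `loader(source, axes, **kwargs)`, following the (source, axes) contract described for ImageStackLoader further down this page; `ToyImageStack`, `my_stack_loader`, and `call_loader` are hypothetical names, and the toy stack records metadata instead of opening a real chunked file.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


class ToyImageStack:
    """Hypothetical stand-in for an image stack; records metadata, reads no pixels."""

    def __init__(self, name: str, data_shape: Tuple[int, ...], dtype: str) -> None:
        self.name = name
        self.data_shape = data_shape
        self.dtype = dtype


def my_stack_loader(
    source: Sequence[str], axes: str, *, dtype: str = "float32"
) -> List[ToyImageStack]:
    """Hypothetical custom loader: one stack per source entry, opened lazily."""
    # A real loader would open each chunked or memory-mapped file here without
    # reading the pixels; this toy version uses a fixed size of 8 per axis.
    shape = tuple(8 for _ in axes)
    return [ToyImageStack(name, shape, dtype) for name in source]


def call_loader(
    loader: Callable[..., List[Any]],
    source: Any,
    axes: str,
    loader_kwargs: Optional[dict] = None,
) -> List[Any]:
    """Forward the optional kwargs alongside the source and axes (assumed pattern)."""
    return loader(source, axes, **(loader_kwargs or {}))


stacks = call_loader(my_stack_loader, ["a.zarr", "b.zarr"], "YX", {"dtype": "uint16"})
```

Leaving the kwargs as None falls back to the loader's own defaults, which is why the sketch substitutes an empty dict.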

PredData dataclass

Bases: Generic[T]

Data for prediction.

ReadFuncLoading dataclass

Loading specification using a custom read function.

extension_filter = '' class-attribute instance-attribute

A filter for finding source files using glob-style pattern matching. For example, to select files with the extension .npy, use the filter "*.npy".

read_kwargs = None class-attribute instance-attribute

Additional keyword arguments to pass to the read_source_func alongside the file path to the image data.

read_source_func instance-attribute

A function for reading image data into NumPy arrays.
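The three ReadFuncLoading fields can be combined roughly as follows: glob-filter the files, then call the read function on each path with the extra keyword arguments. This is a minimal sketch, not the CAREamics file discovery code; `read_npy` and `load_with_read_func` are hypothetical names, and treating an empty filter as "match everything" is an assumption.

```python
import glob
import os
import tempfile
from typing import Any, Callable, List, Optional

import numpy as np


def read_npy(path: str, **kwargs: Any) -> np.ndarray:
    """Example read_source_func: takes a file path and returns a NumPy array."""
    return np.load(path, **kwargs)


def load_with_read_func(
    directory: str,
    read_source_func: Callable[..., np.ndarray],
    extension_filter: str = "",
    read_kwargs: Optional[dict] = None,
) -> List[np.ndarray]:
    """Hypothetical helper: glob-filter a directory, then read each match."""
    # Assumption: an empty filter means "match every file" in this sketch.
    pattern = extension_filter or "*"
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    # read_kwargs are passed alongside each file path.
    return [read_source_func(p, **(read_kwargs or {})) for p in paths]


# Demo: three .npy images plus one unrelated file that the filter should skip.
demo_dir = tempfile.mkdtemp()
for i in range(3):
    np.save(os.path.join(demo_dir, f"image_{i}.npy"), np.full((4, 4), i, dtype=np.float32))
with open(os.path.join(demo_dir, "notes.txt"), "w") as f:
    f.write("not an image")

images = load_with_read_func(demo_dir, read_npy, "*.npy", {"allow_pickle": False})
```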

TrainValData dataclass

Bases: Generic[T]

Data for training with validation data provided.

TrainValSplitData dataclass

Bases: Generic[T]

Data for training with automatic validation splitting.
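The two training containers correspond to the two validation strategies described later on this page: explicit validation data (create_train_val_datasets) and a split drawn from the training data (create_val_split_datasets). The sketch below assumes that create_train_dataset routes on the container type; the field names (`train_data`, `val_data`, `n_val_patches`, ...) and `route_training_data` are hypothetical, since this page does not list the dataclass fields.

```python
from dataclasses import dataclass
from typing import Any, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class TrainValData(Generic[T]):
    """Hypothetical fields: explicit training and validation sources."""

    train_data: T
    val_data: T
    train_targets: Optional[T] = None
    val_targets: Optional[T] = None


@dataclass
class TrainValSplitData(Generic[T]):
    """Hypothetical fields: training sources plus a number of validation patches."""

    train_data: T
    n_val_patches: int
    train_targets: Optional[T] = None


def route_training_data(data: Any) -> str:
    """Return the factory a training container would be routed to (assumed logic)."""
    if isinstance(data, TrainValData):
        return "create_train_val_datasets"
    if isinstance(data, TrainValSplitData):
        return "create_val_split_datasets"
    raise TypeError(f"Unsupported training data container: {type(data).__name__}")
```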

create_dataset(config, inputs, targets, loading=None, model_constraints=None)

Create a CAREamicsDataset.

Parameters:

  • config (DataConfig) –

    The data configuration (data type, axes, patching, etc.).

  • inputs (Any) –

    The input data sources (paths, arrays, or custom).

  • targets (Any) –

    The target data sources, or None.

  • loading (ReadFuncLoading or ImageStackLoading or None, default: None ) –

    Custom loading specification. Required when data_type is "custom": use ReadFuncLoading for a read function, or ImageStackLoading for a custom image stack loader. Otherwise None.

  • model_constraints (ModelConstraints, default: None ) –

    If provided, the dataset will validate that the input data shape is compatible with the model constraints. Only used for prediction datasets.

Returns:

create_pred_dataset(config, data, loading, model_constraints=None)

Create the dataset for prediction.

Parameters:

  • config (DataConfig) –

    Data configuration.

  • data (PredData) –

    Prediction data sources.

  • loading (ReadFuncLoading or ImageStackLoading or None) –

    Custom loading specification when using custom data type.

  • model_constraints (ModelConstraints, default: None ) –

    If provided, the dataset will validate that the prediction data shape is compatible with the model constraints.

Returns:

create_train_dataset(config, data, loading, model_constraints=None)

Create a dataset for training.

Parameters:

  • config (DataConfig) –

    Data configuration (must have mode='training').

  • data (TrainValData | TrainValSplitData) –

    Train and validation data sources (and optional targets/masks).

  • loading (ReadFuncLoading or ImageStackLoading or None) –

    Custom loading specification when using custom data type.

  • model_constraints (ModelConstraints, default: None ) –

    If provided, the dataset will validate that the input data shape is compatible with the model constraints.

Returns:

create_train_val_datasets(config, data, loading, model_constraints=None)

Create train and validation datasets when validation data is provided explicitly.

Parameters:

  • config (DataConfig) –

    Data configuration (must have mode='training').

  • data (TrainValData) –

    Train and validation data sources (and optional targets/masks).

  • loading (ReadFuncLoading or ImageStackLoading or None) –

    Custom loading specification when using custom data type.

  • model_constraints (ModelConstraints, default: None ) –

    If provided, the dataset will validate that the input data shape is compatible with the model constraints.

Returns:

  • tuple of (CareamicsDataset, CareamicsDataset)

    (train_dataset, val_dataset).

create_val_split_datasets(config, data, loading, rng, model_constraints=None)

Create train and validation datasets by splitting from training data.

Requires stratified patching in config.

Parameters:

  • config (DataConfig) –

    Data configuration (must have mode='training', patching.name='stratified').

  • data (TrainValSplitData) –

    Training data sources and number of validation patches.

  • loading (ReadFuncLoading or ImageStackLoading or None) –

    Custom loading specification when using custom data type.

  • rng (Generator) –

    Random generator for reproducible validation split.

  • model_constraints (ModelConstraints, default: None ) –

    If provided, the dataset will validate that the input data shape is compatible with the model constraints.

Returns:

  • tuple of (CareamicsDataset, CareamicsDataset)

    (train_dataset, val_dataset).
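The rng parameter is what makes the validation split reproducible: the same seed yields the same train/validation assignment. The sketch below shows that property with a plain random permutation of patch indices. It is not the stratified scheme CAREamics requires here, and `split_patch_indices` is a hypothetical helper.

```python
from typing import Tuple

import numpy as np


def split_patch_indices(
    n_patches: int, n_val: int, rng: np.random.Generator
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: pick n_val validation patches, keep the rest for training."""
    if not 0 < n_val < n_patches:
        raise ValueError("n_val must be between 1 and n_patches - 1.")
    # A random ordering of all patch indices; seeding rng makes it repeatable.
    order = rng.permutation(n_patches)
    val_idx = np.sort(order[:n_val])
    train_idx = np.sort(order[n_val:])
    return train_idx, val_idx


train_idx, val_idx = split_patch_indices(20, 4, np.random.default_rng(42))
```

Passing a Generator instead of a bare seed lets the caller control and share one source of randomness across the whole pipeline.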

init_patch_extractor(patch_extractor, image_stack_loader, source, axes)

Build a patch extractor by loading image stacks from the source.

Parameters:

  • patch_extractor (type[PatchExtractor]) –

    The PatchExtractor class to instantiate (e.g. PatchExtractor).

  • image_stack_loader (ImageStackLoader) –

    Callable that takes (source, axes) and returns a list of image stacks.

  • source (Any) –

    Data source (paths, arrays, etc.) passed to the loader.

  • axes (str) –

    Axis order string passed to the loader.

Returns:

  • PatchExtractor[GenericImageStack]

    The constructed patch extractor instance.
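In outline, init_patch_extractor performs two steps: call the loader on (source, axes), then instantiate the given extractor class with the resulting stacks. The sketch below assumes the extractor constructor takes the list of stacks as its single argument. `ToyPatchExtractor` and `init_patch_extractor_sketch` are hypothetical placeholders, not the CAREamics classes.

```python
from typing import Any, Callable, Generic, List, Sequence, Type, TypeVar

S = TypeVar("S")


class ToyPatchExtractor(Generic[S]):
    """Hypothetical stand-in for PatchExtractor: it just holds the image stacks."""

    def __init__(self, image_stacks: Sequence[S]) -> None:
        self.image_stacks = list(image_stacks)


def init_patch_extractor_sketch(
    patch_extractor: Type[ToyPatchExtractor],
    image_stack_loader: Callable[[Any, str], List[Any]],
    source: Any,
    axes: str,
) -> ToyPatchExtractor:
    # Step 1: the loader turns the raw source into a list of image stacks.
    image_stacks = image_stack_loader(source, axes)
    # Step 2: the chosen extractor class is instantiated with those stacks.
    return patch_extractor(image_stacks)


def toy_loader(source: List[str], axes: str) -> List[str]:
    """Toy loader: tags each source entry with the axes string."""
    return [f"{s}:{axes}" for s in source]


extractor = init_patch_extractor_sketch(ToyPatchExtractor, toy_loader, ["a", "b"], "YX")
```

Passing the class rather than an instance lets the caller choose the extractor type (for example, via select_patch_extractor_type) independently of how the stacks are loaded.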

select_image_stack_loader(data_type, in_memory, loading=None)

Select image stack loader function for the given data type and loading options.

Parameters:

  • data_type (SupportedData) –

    The type of data (array, tiff, zarr, czi, custom).

  • in_memory (bool) –

    Whether to load full data into memory (True) or use lazy loading.

  • loading (ReadFuncLoading or ImageStackLoading or None, default: None ) –

    Custom loading spec, required when data_type is custom.

Returns:

  • ImageStackLoader

    A callable that takes (source, axes) and returns a list of image stacks.
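Every branch returns a callable with the same (source, axes) signature. For a custom ImageStackLoading spec, one way to achieve that is to bind image_stack_loader_kwargs with functools.partial. The sketch below shows that idea. It covers only an "array" branch and the custom ImageStackLoading branch and ignores in_memory; `load_arrays`, `BUILTIN_LOADERS`, and `select_loader_sketch` are hypothetical names.

```python
from dataclasses import dataclass
from functools import partial
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ImageStackLoading:
    """Hypothetical stand-in mirroring the documented fields."""

    image_stack_loader: Callable[..., List[Any]]
    image_stack_loader_kwargs: Optional[Dict[str, Any]] = None


def load_arrays(source: List[Any], axes: str) -> List[Any]:
    """Hypothetical built-in loader for in-memory arrays."""
    return [("array", s, axes) for s in source]


BUILTIN_LOADERS: Dict[str, Callable[[Any, str], List[Any]]] = {"array": load_arrays}


def select_loader_sketch(
    data_type: str, loading: Optional[ImageStackLoading] = None
) -> Callable[[Any, str], List[Any]]:
    if data_type == "custom":
        if loading is None:
            raise ValueError("data_type 'custom' requires a loading spec.")
        # Bind the extra kwargs so the result has the uniform (source, axes) signature.
        kwargs = loading.image_stack_loader_kwargs or {}
        return partial(loading.image_stack_loader, **kwargs)
    return BUILTIN_LOADERS[data_type]


def scaled_loader(source: List[int], axes: str, *, scale: int = 1) -> List[int]:
    """Toy custom loader used in the demo."""
    return [s * scale for s in source]


custom_loader = select_loader_sketch("custom", ImageStackLoading(scaled_loader, {"scale": 3}))
```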

select_patch_extractor_type(data_type, in_memory)

Select the appropriate PatchExtractor type based on data type and memory mode.

If in_memory is True, or data_type is ZARR or CZI, the standard PatchExtractor is selected; otherwise, LimitFilesPatchExtractor is used.
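The selection rule above is small enough to sketch directly. The SupportedData enum and the two extractor classes below are hypothetical placeholders (only the class names and enum values come from this page), so the sketch illustrates the decision logic rather than reproducing the CAREamics code.

```python
from enum import Enum


class SupportedData(str, Enum):
    """Hypothetical stand-in using the data types listed on this page."""

    ARRAY = "array"
    TIFF = "tiff"
    ZARR = "zarr"
    CZI = "czi"
    CUSTOM = "custom"


class PatchExtractor:
    """Placeholder for the standard extractor."""


class LimitFilesPatchExtractor:
    """Placeholder for the file-limiting extractor."""


def select_patch_extractor_type_sketch(data_type: SupportedData, in_memory: bool) -> type:
    # In-memory data, and ZARR or CZI data, use the standard extractor.
    if in_memory or data_type in (SupportedData.ZARR, SupportedData.CZI):
        return PatchExtractor
    # Every other lazily loaded data type uses LimitFilesPatchExtractor.
    return LimitFilesPatchExtractor
```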

Parameters:

  • data_type (SupportedData) –

    The type of data being handled.

  • in_memory (bool) –

    Indicates whether data is to be loaded into memory.

Returns: