Factory
CAREamics dataset factory functions and utilities.
Loading = ReadFuncLoading | ImageStackLoading | None (module attribute)
The type of loading used for custom data. ReadFuncLoading uses a simple function
that loads full images into memory.
ImageStackLoading is for custom chunked or memory-mapped next-generation
file formats, which allow individual patches to be read from disk one at a time.
If the data type is not custom, loading should be None.
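The rule above (a loading specification only for custom data, None otherwise) can be sketched as a small validation check. The two dataclasses are simplified stand-ins, not the CAREamics classes, `check_loading` is a hypothetical helper, and the data type is modeled as a plain string for brevity.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


@dataclass
class ReadFuncLoading:
    """Stand-in: a read function that loads a full image into memory."""
    read_source_func: Callable[..., Any]
    extension_filter: str = ""
    read_kwargs: Optional[dict] = None


@dataclass
class ImageStackLoading:
    """Stand-in: a loader returning image stacks that are read lazily."""
    image_stack_loader: Callable[..., Any]
    image_stack_loader_kwargs: Optional[dict] = None


Loading = Union[ReadFuncLoading, ImageStackLoading, None]


def check_loading(data_type: str, loading: Loading) -> None:
    """Hypothetical check: custom data needs a loading spec, other types must not have one."""
    if data_type == "custom":
        if not isinstance(loading, (ReadFuncLoading, ImageStackLoading)):
            raise ValueError("custom data requires ReadFuncLoading or ImageStackLoading")
    elif loading is not None:
        raise ValueError(f"loading must be None for data type {data_type!r}")
```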
ImageStackLoading (dataclass)
Loading specification for a custom image stack loader (chunked / memory-mapped).
image_stack_loader (instance attribute)
A function that loads image data into a sequence of ImageStack objects.
image_stack_loader_kwargs = None (class attribute, instance attribute)
Additional keyword arguments to pass to the image_stack_loader alongside the
source of the image data.
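A custom image stack loader only has to turn a data source into image stacks, with image_stack_loader_kwargs forwarded alongside the source. The sketch below assumes the loader is called as `loader(source, axes, **kwargs)`, consistent with the `(source, axes)` signature described for init_patch_extractor. `LazyStack`, `my_stack_loader`, `load_stacks` and `chunk_size` are hypothetical, and a real loader must return objects satisfying the CAREamics ImageStack interface, which is not reproduced here.

```python
from typing import Any, Callable, List, Optional, Sequence


class LazyStack:
    """Hypothetical ImageStack stand-in: data is only 'read' when a patch is requested."""

    def __init__(self, source: str, axes: str, chunk_size: int):
        self.source = source
        self.axes = axes
        self.chunk_size = chunk_size
        self.reads = 0  # counts simulated disk reads

    def extract_patch(self, y: int, x: int, size: int) -> List[List[int]]:
        # Simulate reading only the requested region from disk.
        self.reads += 1
        return [[(y + i) * 1000 + (x + j) for j in range(size)] for i in range(size)]


def my_stack_loader(source: Sequence[str], axes: str, chunk_size: int = 64) -> List[LazyStack]:
    """Hypothetical custom loader: one lazy stack per source entry."""
    return [LazyStack(s, axes, chunk_size) for s in source]


def load_stacks(
    loader: Callable[..., Sequence[Any]],
    source: Any,
    axes: str,
    loader_kwargs: Optional[dict] = None,
) -> Sequence[Any]:
    # The extra kwargs (image_stack_loader_kwargs) are forwarded alongside the source.
    return loader(source, axes, **(loader_kwargs or {}))
```

Nothing is read when the stacks are created; each call to `extract_patch` stands in for reading one patch from disk.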
IndependentTargets (dataclass)
Bases: Generic[T]
MicroSplit data with independent target structures.
The data for different target structures may have different shapes.
The input will be a synthetically generated superposition of the target structures.
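Because the target structures are acquired independently, their images need not share a shape; a synthetic input can still be formed by cropping a patch of the same size from each target and summing the patches pixel-wise. The sketch below illustrates only that idea with plain nested lists. `random_crop` and `superimpose_independent` are hypothetical, and the actual MicroSplit sampling (patch locations, weighting, normalization) may differ.

```python
import random
from typing import List, Tuple

Image = List[List[float]]


def random_crop(image: Image, size: int, rng: random.Random) -> Image:
    """Crop a size x size patch at a random position (image must be at least size x size)."""
    h, w = len(image), len(image[0])
    y = rng.randint(0, h - size)
    x = rng.randint(0, w - size)
    return [row[x : x + size] for row in image[y : y + size]]


def superimpose_independent(
    targets: List[Image], size: int, rng: random.Random
) -> Tuple[Image, List[Image]]:
    """Crop one patch per target structure and sum them to synthesize an input patch."""
    patches = [random_crop(t, size, rng) for t in targets]
    summed = [[sum(p[i][j] for p in patches) for j in range(size)] for i in range(size)]
    # The individual patches serve as the targets for the summed input.
    return summed, patches
```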
MultiChannelTarget (dataclass)
Bases: Generic[T]
Bases: Generic[T]
MicroSplit data with target channels acquired together.
The input will be a synthetically generated superposition of the target channels.
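When the target channels are acquired together, they form a single channel-stacked array, and a synthetic input can be obtained by summing over the channel axis at every pixel. A minimal sketch on a nested list in (C, Y, X) order; `sum_channels` is hypothetical, and any normalization or weighting MicroSplit may apply is not shown.

```python
from typing import List


def sum_channels(stack: List[List[List[float]]]) -> List[List[float]]:
    """Sum a (C, Y, X) nested list over C to obtain a (Y, X) superposition."""
    n_y, n_x = len(stack[0]), len(stack[0][0])
    return [
        [sum(channel[y][x] for channel in stack) for x in range(n_x)]
        for y in range(n_y)
    ]


# Two 2 x 2 channels; each output pixel is the sum across channels.
example = sum_channels([[[1, 2], [3, 4]], [[10, 20], [30, 40]]])
```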
PairedInputTarget (dataclass)
PredData (dataclass)
ReadFuncLoading (dataclass)
Loading specification using a custom read function.
extension_filter = '' (class attribute, instance attribute)
A glob-style pattern used to find source files. For example, to select files
with the extension .npy, use the filter "*.npy".
read_kwargs = None (class attribute, instance attribute)
Additional keyword arguments to pass to the read_source_func alongside the
file path to the image data.
read_source_func (instance attribute)
A function for reading image data into NumPy arrays.
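The three ReadFuncLoading fields work together: extension_filter selects files, and read_source_func is called on each selected path with read_kwargs. The sketch below assumes the filter is applied with `pathlib.Path.glob` inside a directory and that the read function is called as `read_source_func(path, **read_kwargs)`; treating an empty filter as "match everything" is also an assumption. `load_with_read_func` and `read_text_image` are hypothetical, and a real read function would return a NumPy array rather than a list.

```python
from pathlib import Path
from typing import Any, Callable, List, Optional


def read_text_image(path: Path, sep: str = ",") -> List[int]:
    """Hypothetical read function: parses a one-line text file of integers."""
    return [int(v) for v in path.read_text().strip().split(sep)]


def load_with_read_func(
    directory: Path,
    read_source_func: Callable[..., Any],
    extension_filter: str = "",
    read_kwargs: Optional[dict] = None,
) -> List[Any]:
    # An empty filter is treated here as "match every file" (an assumption).
    pattern = extension_filter or "*"
    paths = sorted(p for p in directory.glob(pattern) if p.is_file())
    # Each matching path is passed to the read function together with the extra kwargs.
    return [read_source_func(p, **(read_kwargs or {})) for p in paths]
```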
TrainValData (dataclass)
TrainValSplitData (dataclass)
create_dataset(config, inputs, targets, loading=None, model_constraints=None)
Create a CAREamicsDataset.
Parameters:
- config (DataConfig) – The data configuration (data type, axes, patching, etc.).
- inputs (Any) – The input data sources (paths, arrays, or custom).
- targets (Any) – The target data sources, or None.
- loading (ReadFuncLoading or ImageStackLoading or None, default: None) – Custom loading specification. Required when data_type is "custom": use ReadFuncLoading for a read function, or ImageStackLoading for a custom image stack loader. Otherwise None.
- model_constraints (ModelConstraints, default: None) – If provided, the input data shape is validated against the model constraints. Only used for prediction datasets.
Returns:
- CareamicsDataset[ImageStack] – The configured dataset instance.
create_microsplit_dataset(config, data, loading=None, model_constraints=None, rng=None)
Create a MicroSplit training or validation dataset.
The data type determines which MicroSplit training mode to use. There are three
options:
- `MicroSplitMultiplexedTargetData`: When only the multiplexed target channels are
available, the inputs can be synthesized by summing the target channels.
- `MicroSplitSeparateTargetData`: Multiplexed target channels are not available;
instead, each channel is acquired separately. This should only be used for
structures that are not spatially correlated.
- `MicroSplitPairedData`: When both the multiplexed target channels and the
real input are available.
Parameters:
- config (MicroSplitDataConfig) – MicroSplit data configuration.
- data (MicroSplitTrainingData) – Data sources used to construct MicroSplit training patches. Either MicroSplitMultiplexedTargetData, MicroSplitSeparateTargetData or MicroSplitPairedData.
- loading (Loading, default: None) – Loading specification for custom data.
- model_constraints (ModelConstraints, default: None) – Optional model constraints for dataset validation.
- rng (Generator, default: None) – Random number generator passed to stochastic MicroSplit constructors.
Returns:
- CareamicsDataset – The configured MicroSplit dataset.
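Since the type of `data` selects the training mode, the dispatch can be sketched with isinstance checks. The three dataclasses are stand-ins that only borrow the names from the list above (their fields are assumptions), and `describe_input` is hypothetical: it reports how the network input is obtained in each mode instead of building a dataset.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class MicroSplitMultiplexedTargetData:  # stand-in, fields assumed
    targets: Any


@dataclass
class MicroSplitSeparateTargetData:  # stand-in, fields assumed
    targets: Any


@dataclass
class MicroSplitPairedData:  # stand-in, fields assumed
    inputs: Any
    targets: Any


def describe_input(data: Any) -> str:
    """Hypothetical dispatch: say where the network input comes from in each mode."""
    if isinstance(data, MicroSplitMultiplexedTargetData):
        return "synthesized: sum of co-acquired target channels"
    if isinstance(data, MicroSplitSeparateTargetData):
        return "synthesized: sum of patches from separately acquired targets"
    if isinstance(data, MicroSplitPairedData):
        return "real acquired input"
    raise TypeError(f"unsupported MicroSplit data type: {type(data).__name__}")
```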
create_microsplit_pred_dataset(config, input_data, loading=None, model_constraints=None)
Create a MicroSplit prediction dataset.
Parameters:
- config (MicroSplitDataConfig) – MicroSplit prediction data configuration.
- input_data (Sequence[NDArray], Sequence[Path] or Any) – Prediction data sources. For default loading, this is a list of NumPy arrays or a list of file paths. If a custom image stack loader is used, the input can be any type supported by the loader.
- loading (Loading, default: None) – Loading specification. None or ReadFuncLoading is used for standard array and path inputs, while ImageStackLoading is used for custom input data.
- model_constraints (ModelConstraints, default: None) – Optional model constraints for dataset validation.
Returns:
- CareamicsDataset – The configured MicroSplit prediction dataset.
create_pred_dataset(config, data, loading, model_constraints=None)
Create the dataset for prediction.
Parameters:
- config (DataConfig) – Data configuration.
- data (PredData) – Prediction data sources.
- loading (ReadFuncLoading or ImageStackLoading or None) – Custom loading specification when using the custom data type.
- model_constraints (ModelConstraints, default: None) – If provided, the dataset will validate that the prediction data shape is compatible with the model constraints.
Returns:
- CareamicsDataset[ImageStack] – Dataset configured for prediction.
create_train_dataset(config, data, loading, model_constraints=None)
Create a dataset for training.
Parameters:
- config (DataConfig) – Data configuration (must have mode='training').
- data (TrainValData | TrainValSplitData) – Train and validation data sources (and optional targets/masks).
- loading (ReadFuncLoading or ImageStackLoading or None) – Custom loading specification when using the custom data type.
- model_constraints (ModelConstraints, default: None) – If provided, the dataset will validate that the input data shape is compatible with the model constraints.
Returns:
- CareamicsDataset – The training dataset.
create_train_val_datasets(config, data, loading, model_constraints=None)
Create train and validation datasets when validation data is provided explicitly.
Parameters:
- config (DataConfig) – Data configuration (must have mode='training').
- data (TrainValData) – Train and validation data sources (and optional targets/masks).
- loading (ReadFuncLoading or ImageStackLoading or None) – Custom loading specification when using the custom data type.
- model_constraints (ModelConstraints, default: None) – If provided, the dataset will validate that the input data shape is compatible with the model constraints.
Returns:
- tuple of (CareamicsDataset, CareamicsDataset) – (train_dataset, val_dataset).
create_val_split_datasets(config, data, loading, rng, model_constraints=None)
Create train and validation datasets by splitting from training data.
Requires stratified patching in the config.
Parameters:
- config (DataConfig) – Data configuration (must have mode='training' and patching.name='stratified').
- data (TrainValSplitData) – Training data sources and number of validation patches.
- loading (ReadFuncLoading or ImageStackLoading or None) – Custom loading specification when using the custom data type.
- rng (Generator) – Random generator for a reproducible validation split.
- model_constraints (ModelConstraints, default: None) – If provided, the dataset will validate that the input data shape is compatible with the model constraints.
Returns:
- tuple of (CareamicsDataset, CareamicsDataset) – (train_dataset, val_dataset).
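Splitting validation patches off the training data with a seeded generator makes the split reproducible. The idea can be sketched as stratified sampling: patch indices are grouped by the image they come from, shuffled within each image, and validation patches are taken round-robin across images so every image contributes. This is a hypothetical illustration (`split_patches` and `n_val` are invented, and `random.Random` stands in for `numpy.random.Generator`), not the CAREamics stratified patching algorithm.

```python
import random
from typing import Dict, List, Tuple

PatchId = Tuple[str, int]  # (image name, patch index within that image)


def split_patches(
    patches_per_image: Dict[str, int], n_val: int, rng: random.Random
) -> Tuple[List[PatchId], List[PatchId]]:
    """Split patches into train/val, taking val patches round-robin across images."""
    total = sum(patches_per_image.values())
    if not 0 <= n_val <= total:
        raise ValueError("n_val must be between 0 and the total number of patches")
    # Shuffle patch indices within each image (stratum) using the given generator.
    strata: Dict[str, List[int]] = {}
    for image in sorted(patches_per_image):
        idx = list(range(patches_per_image[image]))
        rng.shuffle(idx)
        strata[image] = idx
    val: List[PatchId] = []
    # Take one patch from each image in turn so validation covers every image.
    while len(val) < n_val:
        for image, idx in strata.items():
            if idx and len(val) < n_val:
                val.append((image, idx.pop()))
    val_set = set(val)
    train = [
        (image, i)
        for image in sorted(patches_per_image)
        for i in range(patches_per_image[image])
        if (image, i) not in val_set
    ]
    return train, val
```

Passing the same seed twice yields the same split, which is the property the rng parameter is meant to provide.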
init_patch_extractor(patch_extractor, image_stack_loader, source, axes)
Build a patch extractor by loading image stacks from the source.
Parameters:
- patch_extractor (type[PatchExtractor]) – The PatchExtractor class to instantiate (e.g. PatchExtractor).
- image_stack_loader (ImageStackLoader) – Callable that takes (source, axes) and returns a list of image stacks.
- source (Any) – Data source (paths, arrays, etc.) passed to the loader.
- axes (str) – Axis order string passed to the loader.
Returns:
- PatchExtractor[GenericImageStack] – The constructed patch extractor instance.
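Building the extractor composes two steps: call the loader on the source and axes, then pass the resulting stacks to the extractor class. The sketch uses hypothetical stand-ins (`ToyPatchExtractor`, `toy_loader`, `init_patch_extractor_sketch`) and assumes the extractor's constructor takes the list of image stacks as its only argument; the real PatchExtractor constructor may differ.

```python
from typing import Any, Callable, List, Sequence


class ToyPatchExtractor:
    """Stand-in extractor that simply keeps the image stacks it is given."""

    def __init__(self, image_stacks: Sequence[Any]):
        self.image_stacks = list(image_stacks)

    @property
    def n_data(self) -> int:
        return len(self.image_stacks)


def toy_loader(source: Sequence[str], axes: str) -> List[dict]:
    # Hypothetical loader: describes each source instead of reading real data.
    return [{"source": s, "axes": axes} for s in source]


def init_patch_extractor_sketch(
    patch_extractor: type,
    image_stack_loader: Callable[[Any, str], Sequence[Any]],
    source: Any,
    axes: str,
) -> Any:
    """Load image stacks from the source, then wrap them in the extractor class."""
    image_stacks = image_stack_loader(source, axes)
    return patch_extractor(image_stacks)
```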
select_image_stack_loader(data_type, in_memory, loading=None)
Select image stack loader function for the given data type and loading options.
Parameters:
- data_type (SupportedData) – The type of data (array, tiff, zarr, czi, custom).
- in_memory (bool) – Whether to load the full data into memory (True) or use lazy loading.
- loading (ReadFuncLoading or ImageStackLoading or None, default: None) – Custom loading specification, required when data_type is custom.
Returns:
- ImageStackLoader – A callable that takes (source, axes) and returns a list of image stacks.
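The contract above, a callable with a plain `(source, axes)` signature, can be sketched for the custom case by binding extra kwargs with `functools.partial` or wrapping a read function in a closure. Everything below is hypothetical: the dataclasses are stand-ins, `select_loader_sketch` is invented, the built-in branch returns a placeholder that only labels the strategy, and a real implementation would wrap each array read by ReadFuncLoading in an image stack rather than return raw results.

```python
from dataclasses import dataclass
from functools import partial
from typing import Any, Callable, Optional, Sequence, Union

ImageStackLoader = Callable[[Any, str], Sequence[Any]]
BUILTIN_TYPES = {"array", "tiff", "zarr", "czi"}


@dataclass
class ReadFuncLoading:  # stand-in
    read_source_func: Callable[..., Any]
    read_kwargs: Optional[dict] = None


@dataclass
class ImageStackLoading:  # stand-in
    image_stack_loader: Callable[..., Sequence[Any]]
    image_stack_loader_kwargs: Optional[dict] = None


def select_loader_sketch(
    data_type: str,
    in_memory: bool,
    loading: Union[ReadFuncLoading, ImageStackLoading, None] = None,
) -> ImageStackLoader:
    if data_type == "custom":
        if isinstance(loading, ImageStackLoading):
            # Bind the extra kwargs so the result has the plain (source, axes) signature.
            kwargs = loading.image_stack_loader_kwargs or {}
            return partial(loading.image_stack_loader, **kwargs)
        if isinstance(loading, ReadFuncLoading):
            read_kwargs = loading.read_kwargs or {}
            # Read each source fully with the user's function (real code would wrap results).
            return lambda source, axes: [loading.read_source_func(s, **read_kwargs) for s in source]
        raise ValueError("custom data requires a loading specification")
    if data_type not in BUILTIN_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    mode = "in-memory" if in_memory else "lazy"
    # Placeholder for a built-in loader: tags each source with the chosen strategy.
    return lambda source, axes: [f"{data_type}/{mode}:{s}" for s in source]
```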
select_patch_extractor_type(data_type, in_memory)
Select the appropriate PatchExtractor type based on data type and memory mode.
If in_memory is True, or data_type is ZARR or CZI, the standard
PatchExtractor is selected, otherwise the LimitFilesPatchExtractor will be used.
Parameters:
- data_type (SupportedData) – The type of data being handled.
- in_memory (bool) – Indicates whether data is to be loaded into memory.
Returns:
- type[PatchExtractor] – The selected PatchExtractor type.
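The selection rule stated above is small enough to write out directly. The enum and both classes below are stand-ins rather than imports from CAREamics, and the enum values are assumed from the data types listed for select_image_stack_loader.

```python
from enum import Enum


class SupportedData(str, Enum):  # stand-in for the CAREamics enum
    ARRAY = "array"
    TIFF = "tiff"
    ZARR = "zarr"
    CZI = "czi"
    CUSTOM = "custom"


class PatchExtractor:  # stand-in
    pass


class LimitFilesPatchExtractor:  # stand-in
    pass


def select_patch_extractor_type_sketch(data_type: SupportedData, in_memory: bool) -> type:
    # Standard extractor when data is in memory, or for ZARR and CZI in any mode.
    if in_memory or data_type in (SupportedData.ZARR, SupportedData.CZI):
        return PatchExtractor
    # Every other lazily loaded data type gets the file-limiting extractor.
    return LimitFilesPatchExtractor
```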