Factory
Factory functions and data types for building CAREamics datasets.
Loading = ReadFuncLoading | ImageStackLoading | None
module-attribute
The type of loading used for custom data. ReadFuncLoading uses
a simple function that loads full images into memory.
ImageStackLoading is for custom chunked or memory-mapped next-generation
file formats, where single patches can be read from disk one at a time.
If the data type is not custom, loading should be None.
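The three cases of the Loading alias can be told apart with plain isinstance checks. The sketch below uses hypothetical stand-in dataclasses that copy only the attribute names documented on this page; they are not the real CAREamics classes, and describe_loading is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


# Hypothetical stand-ins mirroring only the attributes documented on this page.
@dataclass
class ReadFuncLoading:
    read_source_func: Callable[..., Any]
    read_kwargs: Optional[dict] = None
    extension_filter: str = ""


@dataclass
class ImageStackLoading:
    image_stack_loader: Callable[..., Any]
    image_stack_loader_kwargs: Optional[dict] = None


Loading = Union[ReadFuncLoading, ImageStackLoading, None]


def describe_loading(loading: Loading) -> str:
    """Return a short label for each documented loading case (illustrative)."""
    if loading is None:
        return "no custom loading"
    if isinstance(loading, ReadFuncLoading):
        return "read function, full images in memory"
    if isinstance(loading, ImageStackLoading):
        return "image stack loader, patches read from disk"
    raise TypeError(f"Unsupported loading spec: {type(loading).__name__}")
```

Keeping None as an explicit member of the union makes the non-custom case visible at every call site instead of relying on a missing argument.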
ImageStackLoading
dataclass
Loading specification for a custom image stack loader (chunked or memory-mapped).
image_stack_loader
instance-attribute
A function that loads image data into a sequence of ImageStack objects.
image_stack_loader_kwargs = None
class-attribute
instance-attribute
Additional keyword arguments to pass to the image_stack_loader alongside the
source of the image data.
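As a rough illustration of how the extra keyword arguments could travel with the loader, the sketch below forwards them next to the source. The stand-in dataclass, the chunked_loader function, its chunk_size parameter, and the exact call order (source, axes, then keyword arguments) are assumptions made for this example; only the attribute names come from this page.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


# Hypothetical stand-in mirroring the two documented attributes.
@dataclass
class ImageStackLoading:
    image_stack_loader: Callable[..., Sequence[Any]]
    image_stack_loader_kwargs: Optional[dict] = None


def chunked_loader(source, axes, chunk_size=64):
    """Hypothetical loader: returns one placeholder record per source item."""
    return [
        {"source": item, "axes": axes, "chunk_size": chunk_size}
        for item in source
    ]


def load_image_stacks(loading, source, axes):
    """Call the loader with the source and any extra keyword arguments."""
    # None is treated as "no extra arguments" (assumption).
    kwargs = loading.image_stack_loader_kwargs or {}
    return loading.image_stack_loader(source, axes, **kwargs)


spec = ImageStackLoading(chunked_loader, {"chunk_size": 128})
stacks = load_image_stacks(spec, ["a.zarr"], "YX")
```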
PredData
dataclass
ReadFuncLoading
dataclass
Loading specification using a custom read function.
extension_filter = ''
class-attribute
instance-attribute
A filter for finding source files using glob-style pattern matching. For example,
to select files with the extension .npy one should use the filter "*.npy".
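Glob-style matching of this kind can be tried out with the standard library. The sketch below uses fnmatch, which may differ from the matching CAREamics performs internally; filter_sources is a hypothetical helper, and treating the empty default filter as "match everything" is an assumption.

```python
from fnmatch import fnmatchcase


def filter_sources(file_names, extension_filter=""):
    """Keep the file names that match a glob-style filter (illustrative)."""
    # Assumption: an empty filter places no restriction on the file names.
    pattern = extension_filter or "*"
    return [name for name in file_names if fnmatchcase(name, pattern)]


names = ["a.npy", "b.tiff", "c.npy"]
selected = filter_sources(names, "*.npy")
```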
read_kwargs = None
class-attribute
instance-attribute
Additional keyword arguments to pass to the read_source_func alongside the
file path to the image data.
read_source_func
instance-attribute
A function for reading image data into NumPy arrays.
TrainValData
dataclass
TrainValSplitData
dataclass
create_dataset(config, inputs, targets, loading=None, model_constraints=None)
Create a CAREamicsDataset.
Parameters:
-
config(DataConfig) –The data configuration (data type, axes, patching, etc.).
-
inputs(Any) –The input data sources (paths, arrays, or custom).
-
targets(Any) –The target data sources, or None.
-
loading(ReadFuncLoading or ImageStackLoading or None, default:None) –Custom loading specification. Required when data_type is "custom": use ReadFuncLoading for a read function, or ImageStackLoading for a custom image stack loader. Otherwise None.
-
model_constraints(ModelConstraints, default:None) –If provided, the data module will validate that the input data shape is compatible with the model constraints. Only used for prediction datasets.
Returns:
-
CareamicsDataset[ImageStack]–The configured dataset instance.
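The rule for the loading parameter (required for the "custom" data type, None otherwise) can be written as a small pre-check. This is a sketch under stated assumptions: check_loading is a hypothetical helper, and whether the library raises an error, and which one, in these situations is not stated on this page.

```python
from typing import Any, Optional


def check_loading(data_type: str, loading: Optional[Any]) -> None:
    """Illustrative consistency check between data_type and loading."""
    if data_type == "custom" and loading is None:
        # Assumption: a missing spec for custom data is treated as an error.
        raise ValueError("data_type 'custom' requires a loading specification")
    if data_type != "custom" and loading is not None:
        # Assumption: a spec for a non-custom data type is treated as an error.
        raise ValueError("loading must be None unless data_type is 'custom'")
```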
create_pred_dataset(config, data, loading, model_constraints=None)
Create the dataset for prediction.
Parameters:
-
config(DataConfig) –Data configuration.
-
data(PredData) –Prediction data sources.
-
loading(ReadFuncLoading or ImageStackLoading or None) –Custom loading specification when using custom data type.
-
model_constraints(ModelConstraints, default:None) –If provided, the dataset will validate that the prediction data shape is compatible with the model constraints.
Returns:
-
CareamicsDataset[ImageStack]–Dataset configured for prediction.
create_train_dataset(config, data, loading, model_constraints=None)
Create a dataset for training.
Parameters:
-
config(DataConfig) –Data configuration (must have mode='training').
-
data(TrainValData | TrainValSplitData) –Train and validation data sources (and optional targets/masks).
-
loading(ReadFuncLoading or ImageStackLoading or None) –Custom loading specification when using custom data type.
-
model_constraints(ModelConstraints, default:None) –If provided, the dataset will validate that the input data shape is compatible with the model constraints.
Returns:
-
CareamicsDataset–The training dataset.
create_train_val_datasets(config, data, loading, model_constraints=None)
Create train and validation datasets when validation data is provided explicitly.
Parameters:
-
config(DataConfig) –Data configuration (must have mode='training').
-
data(TrainValData) –Train and validation data sources (and optional targets/masks).
-
loading(ReadFuncLoading or ImageStackLoading or None) –Custom loading specification when using custom data type.
-
model_constraints(ModelConstraints, default:None) –If provided, the dataset will validate that the input data shape is compatible with the model constraints.
Returns:
-
tuple of (CareamicsDataset, CareamicsDataset)–(train_dataset, val_dataset).
create_val_split_datasets(config, data, loading, rng, model_constraints=None)
Create train and validation datasets by splitting from training data.
Requires stratified patching in config.
Parameters:
-
config(DataConfig) –Data configuration (must have mode='training', patching.name='stratified').
-
data(TrainValSplitData) –Training data sources and number of validation patches.
-
loading(ReadFuncLoading or ImageStackLoading or None) –Custom loading specification when using custom data type.
-
rng(Generator) –Random generator for reproducible validation split.
-
model_constraints(ModelConstraints, default:None) –If provided, the dataset will validate that the input data shape is compatible with the model constraints.
Returns:
-
tuple of (CareamicsDataset, CareamicsDataset)–(train_dataset, val_dataset).
init_patch_extractor(patch_extractor, image_stack_loader, source, axes)
Build a patch extractor by loading image stacks from the source.
Parameters:
-
patch_extractor(type[PatchExtractor]) –The PatchExtractor class to instantiate (e.g. PatchExtractor).
-
image_stack_loader(ImageStackLoader) –Callable that takes (source, axes) and returns a list of image stacks.
-
source(Any) –Data source (paths, arrays, etc.) passed to the loader.
-
axes(str) –Axis order string passed to the loader.
Returns:
-
PatchExtractor[GenericImageStack]–The constructed patch extractor instance.
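Going by the parameter descriptions above, the function amounts to calling the loader with (source, axes) and handing the result to the extractor class. The sketch below shows that composition with hypothetical stand-ins: SketchPatchExtractor and list_loader are not CAREamics objects, and the assumption that the extractor is constructed from the list of image stacks alone is made for this example.

```python
from typing import Any, Callable, Sequence


class SketchPatchExtractor:
    """Hypothetical stand-in for PatchExtractor: it only stores the stacks."""

    def __init__(self, image_stacks: Sequence[Any]) -> None:
        self.image_stacks = list(image_stacks)


def init_patch_extractor_sketch(
    patch_extractor: type,
    image_stack_loader: Callable[[Any, str], Sequence[Any]],
    source: Any,
    axes: str,
):
    """Load image stacks from the source, then build the extractor from them."""
    image_stacks = image_stack_loader(source, axes)
    return patch_extractor(image_stacks)


def list_loader(source, axes):
    """Hypothetical loader: pairs each source item with the axes string."""
    return [(item, axes) for item in source]


extractor = init_patch_extractor_sketch(
    SketchPatchExtractor, list_loader, ["a", "b"], "YX"
)
```

Passing the extractor class rather than an instance lets the same function build either extractor type once the loader has been chosen.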
select_image_stack_loader(data_type, in_memory, loading=None)
Select image stack loader function for the given data type and loading options.
Parameters:
-
data_type(SupportedData) –The type of data (array, tiff, zarr, czi, custom).
-
in_memory(bool) –Whether to load the full data into memory (True) or use lazy loading (False).
-
loading(ReadFuncLoading or ImageStackLoading or None, default:None) –Custom loading spec, required when data_type is custom.
Returns:
-
ImageStackLoader–A callable that takes (source, axes) and returns a list of image stacks.
select_patch_extractor_type(data_type, in_memory)
Select the appropriate PatchExtractor type based on data type and memory mode.
If in_memory is True, or data_type is ZARR or CZI, the standard
PatchExtractor is selected, otherwise the LimitFilesPatchExtractor will be used.
Parameters:
-
data_type(SupportedData) –The type of data being handled.
-
in_memory(bool) –Indicates whether data is to be loaded into memory.
Returns:
-
type[PatchExtractor]–The selected PatchExtractor type.
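The selection rule stated above fits in a single condition. The sketch below restates it with hypothetical stand-ins: the enum lists the data types named on this page, the two classes are empty placeholders, and the subclass relationship between them is an assumption suggested by the documented return type.

```python
from enum import Enum


class SupportedData(str, Enum):
    """Stand-in enum listing the data types named on this page."""

    ARRAY = "array"
    TIFF = "tiff"
    ZARR = "zarr"
    CZI = "czi"
    CUSTOM = "custom"


class PatchExtractor:
    """Placeholder for the standard extractor."""


class LimitFilesPatchExtractor(PatchExtractor):
    """Placeholder; subclassing PatchExtractor is an assumption."""


def select_patch_extractor_type_sketch(data_type, in_memory):
    """Apply the documented rule: in-memory, ZARR, or CZI data use PatchExtractor."""
    if in_memory or data_type in (SupportedData.ZARR, SupportedData.CZI):
        return PatchExtractor
    return LimitFilesPatchExtractor
```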