Data
CAREamics Lightning Data Modules.
InputVar = TypeVar('InputVar', NDArray[Any], Path, str, Sequence[NDArray[Any]], Sequence[Path | str])
module-attribute
Data source types: NumPy arrays, paths, or sequences of either (paths can be str or pathlib.Path).
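The helper below is a rough illustration of which values the InputVar alternatives admit, checked at runtime. It is hypothetical (is_input_var is not a CAREamics function). Following the TypeVar, it assumes a sequence must contain only arrays or only paths, not a mix of the two.

```python
from collections.abc import Sequence
from pathlib import Path
from typing import Any

import numpy as np


def is_input_var(value: Any) -> bool:
    """Return True if value matches one of the InputVar alternatives.

    Hypothetical helper for illustration; not part of CAREamics.
    """
    # A single array, a single Path, or a single path string.
    if isinstance(value, (np.ndarray, Path, str)):
        return True
    # A sequence of arrays, or a sequence of paths (str or Path, mixed freely).
    if isinstance(value, Sequence):
        if all(isinstance(v, np.ndarray) for v in value):
            return True
        if all(isinstance(v, (Path, str)) for v in value):
            return True
    return False
```

For example, is_input_var([Path("a.tif"), "b.tif"]) is True, while a list mixing an array with a path string is rejected.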
CareamicsDataModule
Bases: LightningDataModule
Data module for CAREamics datasets.
Parameters:
- data_config (DataConfig | dict[str, Any]) – Pydantic model for CAREamics data configuration.
- train_data (Any, default: None) – Training data. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- train_data_target (Any, default: None) – Training data target. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- train_data_mask (Any, default: None) – Training data mask: an optional binary mask used to filter regions of the data during training, such as large areas of background. A value of 1 indicates that a pixel should be included in the training data. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- val_data (Any, default: None) – Validation data. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- val_data_target (Any, default: None) – Validation data target. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- pred_data (Any, default: None) – Prediction data. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- pred_data_target (Any, default: None) – Prediction data target, which may be used to calculate metrics. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- model_constraints (ModelConstraints | None, default: None) – If provided, the data module will validate that the prediction data shape is compatible with the model constraints.
- loading (ReadFuncLoading | ImageStackLoading | None, default: None) – The type of loading used for custom data. ReadFuncLoading uses a simple function that loads full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats, enabling single patches to be read from disk one at a time. If the data type is not custom, loading should be None.
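The train_data_mask description only states that a 1 marks a pixel to include. One plausible way such a mask could drive patch selection is sketched below. The helper name masked_patch_origins, the non-overlapping 2D patch grid, and the min_coverage threshold are all assumptions for illustration, not CAREamics behavior.

```python
import numpy as np


def masked_patch_origins(
    mask: np.ndarray, patch_size: int, min_coverage: float = 0.5
) -> list[tuple[int, int]]:
    """Return top-left corners of non-overlapping 2D patches whose fraction
    of mask pixels equal to 1 is at least min_coverage.

    Hypothetical helper; the grid layout and threshold are assumptions.
    """
    if not np.isin(mask, (0, 1)).all():
        raise ValueError("mask must be binary (0 or 1)")
    origins = []
    height, width = mask.shape
    # Walk a regular grid of non-overlapping patches.
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            patch = mask[y : y + patch_size, x : x + patch_size]
            # The mean of a 0/1 patch is the fraction of included pixels.
            if patch.mean() >= min_coverage:
                origins.append((y, x))
    return origins
```

With an 8×8 mask whose top-left 4×4 quadrant is 1 and patch_size=4, only the patch at (0, 0) would be kept; background-only patches are skipped.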
Attributes:
- config (DataConfig) – Pydantic model for CAREamics data configuration.
- data_type (str) – Type of data, one of SupportedData.
- batch_size (int) – Batch size for the dataloaders.
Raises:
- ValueError – If none of train_data, val_data or pred_data is provided.
- ValueError – If input and target data types are not consistent.
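The two ValueError conditions can be sketched as a standalone check. The helper validate_inputs is hypothetical, and treating "consistent" as "the same Python type" is an assumption; the real check may be looser (for example, accepting a str input with a pathlib.Path target).

```python
from typing import Any


def validate_inputs(
    train_data: Any = None,
    train_data_target: Any = None,
    val_data: Any = None,
    val_data_target: Any = None,
    pred_data: Any = None,
) -> None:
    """Hypothetical sketch of the documented ValueError conditions."""
    # At least one data source must be given.
    if train_data is None and val_data is None and pred_data is None:
        raise ValueError(
            "At least one of train_data, val_data or pred_data must be provided."
        )
    # Assumption: a target, when given, must have the same type as its input.
    for name, data, target in (
        ("train", train_data, train_data_target),
        ("val", val_data, val_data_target),
    ):
        if data is not None and target is not None and type(data) is not type(target):
            raise ValueError(
                f"{name} data and target types differ: "
                f"{type(data).__name__} vs {type(target).__name__}"
            )
```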
__init__(data_config, *, train_data=None, train_data_target=None, train_data_mask=None, val_data=None, val_data_target=None, pred_data=None, pred_data_target=None, model_constraints=None, loading=None)
__init__(data_config: DataConfig | dict[str, Any], *, train_data: InputVar | None = None, train_data_target: InputVar | None = None, train_data_mask: InputVar | None = None, val_data: InputVar | None = None, val_data_target: InputVar | None = None, pred_data: InputVar | None = None, pred_data_target: InputVar | None = None, model_constraints: ModelConstraints | None = None, loading: ReadFuncLoading | None = None) -> None
__init__(data_config: DataConfig | dict[str, Any], *, train_data: Any | None = None, train_data_target: Any | None = None, train_data_mask: Any | None = None, val_data: Any | None = None, val_data_target: Any | None = None, pred_data: Any | None = None, pred_data_target: Any | None = None, model_constraints: ModelConstraints | None = None, loading: ImageStackLoading = ...) -> None
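The two overloads above tie the permitted data types to the loading argument: with ReadFuncLoading or no loading, the data must be an InputVar; with ImageStackLoading (which then has no default and must be passed), any type is accepted. The sketch below reproduces that pattern with typing.overload on a hypothetical make_data_module function. The loading classes are stand-ins, InputLike is a simplified InputVar without sequences, and the returned strings and runtime TypeError are illustrative only.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Union, overload

import numpy as np

# Simplified stand-in for InputVar (single items only, no sequences).
InputLike = Union[np.ndarray, Path, str]


class ReadFuncLoading:
    """Hypothetical stand-in for the CAREamics ReadFuncLoading type."""


class ImageStackLoading:
    """Hypothetical stand-in for the CAREamics ImageStackLoading type."""


@overload
def make_data_module(
    data_config: dict[str, Any],
    *,
    train_data: InputLike | None = None,
    loading: ReadFuncLoading | None = None,
) -> str: ...


@overload
def make_data_module(
    data_config: dict[str, Any],
    *,
    train_data: Any | None = None,
    loading: ImageStackLoading,
) -> str: ...


def make_data_module(data_config, *, train_data=None, loading=None):
    # Only this implementation runs; the overloads let a static type checker
    # reject arbitrary train_data unless loading is an ImageStackLoading.
    if isinstance(loading, ImageStackLoading):
        return "image_stack"
    if isinstance(loading, ReadFuncLoading):
        return "read_func"
    if train_data is not None and not isinstance(train_data, (np.ndarray, Path, str)):
        raise TypeError("without custom loading, train_data must be an array or a path")
    return "builtin"
```

Because overloads are purely static, a checker such as mypy flags make_data_module({}, train_data=object()) before it runs; the runtime branch is only a fallback.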
Initialize the CAREamics data module.
Create a Lightning data module that handles creating datasets for training, validation, and prediction.
Parameters:
- data_config (DataConfig | dict[str, Any]) – Pydantic model for CAREamics data configuration.
- train_data (Any, default: None) – Training data. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- train_data_target (Any, default: None) – Training data target. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- train_data_mask (Any, default: None) – Training data mask: an optional binary mask used to filter regions of the data during training, such as large areas of background. A value of 1 indicates that a pixel should be included in the training data. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- val_data (Any, default: None) – Validation data. If not provided, data_config.n_val_patches patches will be selected from the training data for validation. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- val_data_target (Any, default: None) – Validation data target. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- pred_data (Any, default: None) – Prediction data. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- pred_data_target (Any, default: None) – Prediction data target, which may be used to calculate metrics. If custom loading is provided, it can be any type; otherwise it must be a pathlib.Path, str, numpy.ndarray, a sequence of these, or None.
- model_constraints (ModelConstraints | None, default: None) – If provided, the data module will validate input and target channels and spatial shapes against the model constraints.
- loading (ReadFuncLoading | ImageStackLoading | None, default: None) – The type of loading used for custom data. ReadFuncLoading uses a simple function that loads full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats, enabling single patches to be read from disk one at a time. If the data type is not custom, loading should be None.
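When val_data is omitted, data_config.n_val_patches patches are taken from the training data. A minimal sketch of such a split over patch indices follows; the helper split_val_patches is hypothetical, and uniform random selection is an assumption, since the text does not say how CAREamics picks the patches.

```python
from __future__ import annotations

import numpy as np


def split_val_patches(
    n_patches: int, n_val_patches: int, rng: np.random.Generator | None = None
) -> tuple[np.ndarray, np.ndarray]:
    """Split patch indices into (train, val) with n_val_patches in val.

    Hypothetical helper; CAREamics's own selection strategy may differ.
    """
    if not 0 < n_val_patches < n_patches:
        raise ValueError("n_val_patches must be between 1 and n_patches - 1")
    rng = np.random.default_rng() if rng is None else rng
    # Randomly permute all indices, then take the first n_val_patches for val.
    perm = rng.permutation(n_patches)
    val_idx = np.sort(perm[:n_val_patches])
    train_idx = np.sort(perm[n_val_patches:])
    return train_idx, val_idx
```

The two index sets are disjoint and together cover every patch, so no training patch leaks into validation.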
predict_dataloader()
Create a dataloader for prediction.
Returns:
- DataLoader – Prediction dataloader.
setup(stage)
Set up datasets.
Lightning hook that is called at the beginning of fit (train + validate), validate, test, or predict. Creates the datasets for a given stage.
Parameters:
- stage (str) – The stage to set up datasets for: one of 'fit', 'validate', 'test', or 'predict'. Only 'fit', 'validate' and 'predict' are supported.
Raises:
- NotImplementedError – If stage is not one of "fit", "validate" or "predict" (for example, "test").
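The stage handling described above can be sketched without Lightning. The class StageDispatchSketch, its dataset attribute names, and the build factory are hypothetical; the sketch only mirrors the documented rule that "fit" prepares both training and validation data, and that unsupported stages raise NotImplementedError.

```python
from typing import Any, Callable, Optional


class StageDispatchSketch:
    """Minimal stand-in for a data module's setup(stage) hook.

    Hypothetical; attribute names are assumptions, not CAREamics internals.
    """

    def __init__(self, build: Callable[[str], Any]) -> None:
        self._build = build  # factory: split name -> dataset
        self.train_dataset: Optional[Any] = None
        self.val_dataset: Optional[Any] = None
        self.predict_dataset: Optional[Any] = None

    def setup(self, stage: str) -> None:
        if stage == "fit":
            # "fit" covers both training and validation.
            self.train_dataset = self._build("train")
            self.val_dataset = self._build("val")
        elif stage == "validate":
            self.val_dataset = self._build("val")
        elif stage == "predict":
            self.predict_dataset = self._build("predict")
        else:
            # "test" (or any other value) is not supported.
            raise NotImplementedError(f"Stage {stage!r} not implemented.")
```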
train_dataloader()
Create a dataloader for training.
Returns:
- DataLoader – Training dataloader.
val_dataloader()
Create a dataloader for validation.
Returns:
- DataLoader – Validation dataloader.
GroupedIndexSampler
Bases: Sampler
A PyTorch Sampler that iterates through groups of indices.
Both the order of the groups and the order of indices within each group are shuffled.
This sampler is useful for iterative file loading: only one file should be loaded at a time, so indices belonging to the same file are kept together, while the order of the files and the order of the indices within each file are shuffled.
Parameters:
- grouped_indices (Sequence of (Sequence of int)) – The indices to iterate through, grouped (e.g. by file).
- rng (Generator or None) – Random number generator for shuffling. If None, a default generator is used.
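The iteration order described above can be sketched in plain Python with NumPy. GroupedIndexSamplerSketch is a hypothetical stand-in: it does not subclass torch.utils.data.Sampler and is not the CAREamics implementation, but it yields indices in the same pattern, with groups in random order and each group's indices shuffled and emitted contiguously.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence

import numpy as np


class GroupedIndexSamplerSketch:
    """Yield indices group by group, shuffling group order and in-group order.

    Hypothetical sketch; not the CAREamics class.
    """

    def __init__(
        self,
        grouped_indices: Sequence[Sequence[int]],
        rng: np.random.Generator | None = None,
    ) -> None:
        self.grouped_indices = [list(g) for g in grouped_indices]
        self.rng = np.random.default_rng() if rng is None else rng

    def __iter__(self) -> Iterator[int]:
        # Shuffle which group (e.g. which file) is visited first.
        for g in self.rng.permutation(len(self.grouped_indices)):
            group = self.grouped_indices[g]
            # Shuffle within the group, then yield its indices back to back.
            for i in self.rng.permutation(len(group)):
                yield group[i]

    def __len__(self) -> int:
        return sum(len(g) for g in self.grouped_indices)
```

Keeping each group contiguous means a loader only needs the current file in memory, at the cost of less thorough mixing than a fully global shuffle.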
__init__(grouped_indices, rng)
Initialize the sampler from grouped index sequences.
Parameters:
- grouped_indices (Sequence of (Sequence of int)) – The indices to iterate through, grouped (e.g. by file).
- rng (Generator or None) – Random number generator for shuffling. If None, a default generator is used.
__iter__()
from_dataset(dataset, rng=None)
classmethod
Create the sampler from a CareamicsDataset.
The grouped indices will be retrieved from the dataset's patching strategy.
Parameters:
- dataset (CareamicsDataset) – An instance of the CareamicsDataset to create the sampler for.
- rng (Generator, default: None) – Random number generator used to seed the sampler. If None, a default generator is used.
Returns:
- GroupedIndexSampler – A sampler yielding indices grouped by the dataset's patching strategy.
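from_dataset obtains its groups from the dataset's patching strategy. As an illustration of what such groups look like, the hypothetical helper below derives them from a per-patch list of file ids. The input format (patch_file_ids) is an assumption made for the sketch; CAREamics reads this information from its patching strategy instead.

```python
from collections import defaultdict
from collections.abc import Sequence


def group_indices_by_file(patch_file_ids: Sequence[int]) -> list[list[int]]:
    """Group patch indices by the file they come from.

    patch_file_ids[i] is the (hypothetical) id of the file that patch i is
    read from. Returns one list of patch indices per file.
    """
    groups: dict[int, list[int]] = defaultdict(list)
    for patch_idx, file_id in enumerate(patch_file_ids):
        groups[file_id].append(patch_idx)
    # Order groups by file id so the output is deterministic.
    return [groups[f] for f in sorted(groups)]
```

The resulting list of lists has the "Sequence of (Sequence of int)" shape that the grouped_indices parameter expects.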