Data

CAREamics Lightning Data Modules.

InputVar = TypeVar('InputVar', NDArray[Any], Path, str, Sequence[NDArray[Any]], Sequence[Path | str]) module-attribute

Data source types: numpy arrays, paths, or sequences of either.

(Paths can be str or pathlib.Path).

CareamicsDataModule

Bases: LightningDataModule

Data module for Careamics dataset.

Parameters:

  • data_config (DataConfig | dict[str, Any]) –

    Pydantic model for CAREamics data configuration.

  • train_data (Any, default: None ) –

    Training data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • train_data_target (Any, default: None ) –

    Training data target. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • train_data_mask (Any, default: None ) –

    Training data mask, an optional mask that can be provided to filter regions of the data during training, such as large areas of background. The mask should be a binary image where a 1 indicates a pixel should be included in the training data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • val_data (Any, default: None ) –

    Validation data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • val_data_target (Any, default: None ) –

    Validation data target. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • pred_data (Any, default: None ) –

    Prediction data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • pred_data_target (Any, default: None ) –

    Prediction data target, which may be used for calculating metrics. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • model_constraints (ModelConstraints | None, default: None ) –

    If provided, the data module will validate that the prediction data shape is compatible with the model constraints.

  • loading (ReadFuncLoading | ImageStackLoading | None, default: None ) –

    The type of loading used for custom data. ReadFuncLoading uses a simple function that loads full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats, enabling single patches to be read from disk at a time. If the data type is not custom, loading should be None.

Attributes:

  • config (DataConfig) –

    Pydantic model for CAREamics data configuration.

  • data_type (str) –

    Type of data, one of SupportedData.

  • batch_size (int) –

    Batch size for the dataloaders.

Raises:

  • ValueError

    If none of train_data, val_data or pred_data is provided.

  • ValueError

    If input and target data types are not consistent.
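The two error conditions above can be illustrated with a minimal, dependency-free sketch. It is not the CAREamics implementation: `check_inputs` and `_kind` are hypothetical names, the first condition is read as "none of the three data sources was provided", and "consistent types" is assumed to mean that input and target are both paths, both sequences, or both objects of the same Python type.

```python
from collections.abc import Sequence
from pathlib import Path
from typing import Any, Optional


def _kind(data: Any) -> str:
    """Return a coarse label for a data source (hypothetical helper)."""
    # str is itself a Sequence, so paths must be tested first.
    if isinstance(data, (str, Path)):
        return "path"
    if isinstance(data, Sequence):
        return "sequence"
    return type(data).__name__


def check_inputs(
    train_data: Optional[Any] = None,
    train_data_target: Optional[Any] = None,
    val_data: Optional[Any] = None,
    pred_data: Optional[Any] = None,
) -> None:
    """Hypothetical pre-flight checks mirroring the two documented ValueErrors."""
    # Assumed reading: at least one data source is required.
    if train_data is None and val_data is None and pred_data is None:
        raise ValueError(
            "At least one of train_data, val_data or pred_data must be provided."
        )
    # Assumed reading: input and target must be the same kind of source.
    if train_data is not None and train_data_target is not None:
        if _kind(train_data) != _kind(train_data_target):
            raise ValueError(
                f"Input is a {_kind(train_data)} but target is a "
                f"{_kind(train_data_target)}."
            )
```

A `str` input paired with a `pathlib.Path` target passes, because both count as paths in this sketch, while a path input paired with a list target raises `ValueError`.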

__init__(data_config, *, train_data=None, train_data_target=None, train_data_mask=None, val_data=None, val_data_target=None, pred_data=None, pred_data_target=None, model_constraints=None, loading=None)

__init__(data_config: DataConfig | dict[str, Any], *, train_data: InputVar | None = None, train_data_target: InputVar | None = None, train_data_mask: InputVar | None = None, val_data: InputVar | None = None, val_data_target: InputVar | None = None, pred_data: InputVar | None = None, pred_data_target: InputVar | None = None, model_constraints: ModelConstraints | None = None, loading: ReadFuncLoading | None = None) -> None
__init__(data_config: DataConfig | dict[str, Any], *, train_data: Any | None = None, train_data_target: Any | None = None, train_data_mask: Any | None = None, val_data: Any | None = None, val_data_target: Any | None = None, pred_data: Any | None = None, pred_data_target: Any | None = None, model_constraints: ModelConstraints | None = None, loading: ImageStackLoading = ...) -> None

Data module for Careamics dataset initialization.

Create a lightning datamodule that handles creating datasets for training, validation, and prediction.

Parameters:

  • data_config (DataConfig | dict[str, Any]) –

    Pydantic model for CAREamics data configuration.

  • train_data (Any, default: None ) –

    Training data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • train_data_target (Any, default: None ) –

    Training data target. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • train_data_mask (Any, default: None ) –

    Training data mask, an optional mask that can be provided to filter regions of the data during training, such as large areas of background. The mask should be a binary image where a 1 indicates a pixel should be included in the training data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • val_data (Any, default: None ) –

    Validation data. If not provided, data_config.n_val_patches patches will be selected from the training data for validation. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • val_data_target (Any, default: None ) –

    Validation data target. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • pred_data (Any, default: None ) –

    Prediction data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • pred_data_target (Any, default: None ) –

    Prediction data target, which may be used for calculating metrics. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • model_constraints (ModelConstraints | None, default: None ) –

    If provided, the data module will validate input and target channels and spatial shapes against the model constraints.

  • loading (ReadFuncLoading | ImageStackLoading | None, default: None ) –

    The type of loading used for custom data. ReadFuncLoading uses a simple function that loads full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats, enabling single patches to be read from disk at a time. If the data type is not custom, loading should be None.
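The val_data fallback, where validation patches are taken from the training data, can be pictured with the sketch below. How CAREamics actually chooses the patches is not stated here, so random selection without replacement is an assumption, and `split_validation`, `n_patches` and `seed` are hypothetical names used only for illustration.

```python
import random
from typing import List, Tuple


def split_validation(
    n_patches: int, n_val_patches: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Hold out n_val_patches patch indices for validation (hypothetical)."""
    if not 0 <= n_val_patches <= n_patches:
        raise ValueError("n_val_patches must be between 0 and n_patches.")
    rng = random.Random(seed)
    # Assumption: validation patches are drawn at random, without replacement.
    val_indices = sorted(rng.sample(range(n_patches), n_val_patches))
    held_out = set(val_indices)
    # Every remaining patch index stays in the training set.
    train_indices = [i for i in range(n_patches) if i not in held_out]
    return train_indices, val_indices


train_indices, val_indices = split_validation(n_patches=100, n_val_patches=8)
```

The two index lists are disjoint and together cover every patch, so no patch is used for both training and validation.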

predict_dataloader()

Create a dataloader for prediction.

Returns:

  • DataLoader

    Prediction dataloader.

setup(stage)

Set up datasets.

Lightning hook that is called at the beginning of fit (train + validate), validate, test, or predict. Creates the datasets for a given stage.

Parameters:

  • stage (str) –

    The stage to set up datasets for. Is either 'fit', 'validate', 'test', or 'predict'.
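As a rough illustration of a stage-driven setup hook, the stand-in class below creates only the datasets a given stage needs. It is a sketch rather than the CAREamics class: `StageSetupSketch` is a hypothetical name, the mapping from stage to dataset follows the usual Lightning convention and is assumed here, the 'test' stage is left as a no-op because no test data is described, and a plain list stands in for a real dataset object.

```python
from typing import Any, Dict, Optional, Sequence


class StageSetupSketch:
    """Stand-in for a data module with a stage-driven setup hook (hypothetical)."""

    def __init__(
        self,
        train_data: Optional[Sequence[Any]] = None,
        val_data: Optional[Sequence[Any]] = None,
        pred_data: Optional[Sequence[Any]] = None,
    ) -> None:
        self.sources = {"train": train_data, "val": val_data, "predict": pred_data}
        self.datasets: Dict[str, Any] = {}

    def _build(self, name: str) -> None:
        source = self.sources[name]
        if source is None:
            raise ValueError(f"No {name} data was provided.")
        # Placeholder: a real module would construct a dataset object here.
        self.datasets[name] = list(source)

    def setup(self, stage: str) -> None:
        if stage == "fit":
            self._build("train")
            # Validation data is optional in this sketch.
            if self.sources["val"] is not None:
                self._build("val")
        elif stage == "validate":
            self._build("val")
        elif stage == "predict":
            self._build("predict")
        elif stage == "test":
            pass  # Assumption: nothing to create for the test stage.
        else:
            raise ValueError(f"Unknown stage: {stage!r}")


module = StageSetupSketch(train_data=[1, 2], pred_data=[3])
module.setup("fit")      # creates the training dataset only
module.setup("predict")  # adds the prediction dataset
```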

train_dataloader()

Create a dataloader for training.

Returns:

  • DataLoader

    Training dataloader.

val_dataloader()

Create a dataloader for validation.

Returns:

  • DataLoader

    Validation dataloader.

GroupedIndexSampler

Bases: Sampler

A PyTorch Sampler that iterates through groups of indices.

The order of the groups and the order of indices within each group are shuffled.

This sampler is useful for iterative file loading: only one file should be loaded at a time, so indices belonging to the same file are kept together, while the order of the files and the order of the indices within each file are both shuffled.

Parameters:

  • grouped_indices (Sequence of (Sequence of int)) –

    The indices to iterate through, grouped (e.g. by file).

  • rng (Generator or None) –

    Random number generator for shuffling. If None, a default generator is used.

__init__(grouped_indices, rng)

Initialize the sampler from grouped index sequences.

Parameters:

  • grouped_indices (Sequence of (Sequence of int)) –

    The indices to iterate through, grouped (e.g. by file).

  • rng (Generator or None) –

    Random number generator for shuffling. If None, a default generator is used.

__iter__()

Iterate over indices with groups and within-group order shuffled.

Returns:

  • Iterator[int]

    Indices from all groups in shuffled group order and shuffled order within each group.
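The shuffling scheme described above can be sketched without any dependencies. This is not the library's class: `GroupedIndexSamplerSketch` is a hypothetical name, it does not subclass the PyTorch `Sampler`, and it swaps in `random.Random` from the standard library in place of the documented Generator argument.

```python
import random
from typing import Iterator, List, Optional, Sequence


class GroupedIndexSamplerSketch:
    """Yield indices group by group, shuffling groups and their contents."""

    def __init__(
        self,
        grouped_indices: Sequence[Sequence[int]],
        rng: Optional[random.Random] = None,
    ) -> None:
        self.grouped_indices: List[List[int]] = [list(g) for g in grouped_indices]
        # Fall back to a default generator when none is given.
        self.rng = rng if rng is not None else random.Random()

    def __len__(self) -> int:
        return sum(len(group) for group in self.grouped_indices)

    def __iter__(self) -> Iterator[int]:
        # Shuffle the order in which groups (e.g. files) are visited.
        group_order = list(range(len(self.grouped_indices)))
        self.rng.shuffle(group_order)
        for group_id in group_order:
            # Shuffle a copy so the stored groups are left untouched.
            indices = list(self.grouped_indices[group_id])
            self.rng.shuffle(indices)
            yield from indices


sampler = GroupedIndexSamplerSketch(
    [[0, 1, 2], [3, 4], [5, 6, 7, 8]], rng=random.Random(42)
)
order = list(sampler)
```

Every index appears exactly once in `order`, and indices from the same group are always adjacent, so a loader reading one file per group never has to revisit a file within an epoch.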

from_dataset(dataset, rng=None) classmethod

Create the sampler from a CareamicsDataset.

The grouped indices will be retrieved from the dataset's patching strategy.

Parameters:

  • dataset (CareamicsDataset) –

    An instance of the CareamicsDataset to create the sampler for.

  • rng (Generator, default: None ) –

    Random number generator used to seed the sampler. If None, a default generator is used.

Returns:

  • GroupedIndexSampler

    A sampler yielding indices grouped by the dataset's patching strategy.
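One way to picture the grouped indices that a patching strategy might supply is shown below. How the strategy is actually queried is not described here, so this sketch assumes only that each file contributes a known number of consecutive patch indices; `group_indices_by_file` and `patches_per_file` are hypothetical names.

```python
from typing import List, Sequence


def group_indices_by_file(patches_per_file: Sequence[int]) -> List[List[int]]:
    """Assign consecutive patch indices to each file (hypothetical helper)."""
    groups: List[List[int]] = []
    start = 0
    for count in patches_per_file:
        # Each file owns the next `count` indices of the flat index space.
        groups.append(list(range(start, start + count)))
        start += count
    return groups


groups = group_indices_by_file([3, 2, 4])
# groups == [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
```

Groups of this form match the documented `grouped_indices` argument, a sequence of sequences of int.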