Data Module

Next-Generation CAREamics DataModule.

InputVar = TypeVar('InputVar', NDArray[Any], Path, str, Sequence[NDArray[Any]], Sequence[Path | str]) module-attribute

Data source types: NumPy arrays, paths, or sequences of either.

(Paths can be str or pathlib.Path).

CareamicsDataModule

Bases: LightningDataModule

Data module for CAREamics datasets.

Parameters:

  • data_config (DataConfig) –

    Pydantic model for CAREamics data configuration.

  • train_data (Any, default: None ) –

    Training data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • train_data_target (Any, default: None ) –

    Training data target. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • train_data_mask (Any, default: None ) –

    Training data mask, an optional mask that can be provided to filter regions of the data during training, such as large areas of background. The mask should be a binary image where a 1 indicates a pixel should be included in the training data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • val_data (Any, default: None ) –

    Validation data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • val_data_target (Any, default: None ) –

    Validation data target. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • pred_data (Any, default: None ) –

    Prediction data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • pred_data_target (Any, default: None ) –

    Prediction data target, this may be used for calculating metrics. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • model_constraints (ModelConstraints | None, default: None ) –

    If provided, the data module will validate that the prediction data shape is compatible with the model constraints.

  • loading (ReadFuncLoading | ImageStackLoading | None, default: None ) –

    The type of loading used for custom data. ReadFuncLoading uses a simple function that loads full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats, enabling single patches to be read from disk at a time. If the data type is not custom, loading should be None.
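
The train_data_mask parameter above expects a binary image in which 1 marks pixels to include. One way such a mask could be produced is by thresholding the image so that dim background is excluded. This is a minimal sketch, assuming a simple intensity threshold; `threshold_mask` is a hypothetical helper, and plain nested lists stand in for what would normally be a numpy.ndarray.

```python
def threshold_mask(image: list[list[float]], threshold: float) -> list[list[int]]:
    """Return a binary mask: 1 where a pixel exceeds `threshold`, else 0.

    Hypothetical helper for illustration only, not a CAREamics function.
    """
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Toy 3x4 image: dim background (values near 0) around a bright object.
image = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.9, 0.1],
]
mask = threshold_mask(image, threshold=0.5)
```

The resulting mask has the same shape as the image, with 1 only on the bright object in the lower two rows.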

Attributes:

  • config (DataConfig) –

    Pydantic model for CAREamics data configuration.

  • data_type (str) –

    Type of data, one of SupportedData.

  • batch_size (int) –

    Batch size for the dataloaders.

Raises:

  • ValueError

    If none of train_data, val_data or pred_data is provided (at least one is required).

  • ValueError

    If input and target data types are not consistent.
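
The two checks listed above can be pictured with a small stand-alone function. This is only a sketch of the documented behaviour, not the library's code: `check_inputs` is a hypothetical name, and the consistency rule is assumed here to be a plain comparison of Python types, which may differ from what CAREamics actually does.

```python
from typing import Any


def check_inputs(
    train_data: Any = None,
    val_data: Any = None,
    pred_data: Any = None,
    train_data_target: Any = None,
) -> None:
    """Hypothetical re-creation of the two documented ValueError checks."""
    # Check 1: at least one data source is required.
    if train_data is None and val_data is None and pred_data is None:
        raise ValueError(
            "At least one of train_data, val_data or pred_data must be provided."
        )
    # Check 2 (assumed rule): input and target must share the same Python type.
    if train_data_target is not None and (
        type(train_data) is not type(train_data_target)
    ):
        raise ValueError(
            f"Inconsistent types: input is {type(train_data).__name__}, "
            f"target is {type(train_data_target).__name__}."
        )


# Passes silently: both input and target are path strings.
check_inputs(train_data="noisy.tif", train_data_target="clean.tif")
```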

__init__(data_config, *, train_data=None, train_data_target=None, train_data_mask=None, val_data=None, val_data_target=None, pred_data=None, pred_data_target=None, model_constraints=None, loading=None)

__init__(data_config: DataConfig | dict[str, Any], *, train_data: InputVar | None = None, train_data_target: InputVar | None = None, train_data_mask: InputVar | None = None, val_data: InputVar | None = None, val_data_target: InputVar | None = None, pred_data: InputVar | None = None, pred_data_target: InputVar | None = None, model_constraints: ModelConstraints | None = None, loading: ReadFuncLoading | None = None) -> None
__init__(data_config: DataConfig | dict[str, Any], *, train_data: Any | None = None, train_data_target: Any | None = None, train_data_mask: Any | None = None, val_data: Any | None = None, val_data_target: Any | None = None, pred_data: Any | None = None, pred_data_target: Any | None = None, model_constraints: ModelConstraints | None = None, loading: ImageStackLoading = ...) -> None

Data module for Careamics dataset initialization.

Create a lightning datamodule that handles creating datasets for training, validation, and prediction.

Parameters:

  • data_config (DataConfig) –

    Pydantic model for CAREamics data configuration.

  • train_data (Any, default: None ) –

    Training data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • train_data_target (Any, default: None ) –

    Training data target. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • train_data_mask (Any, default: None ) –

    Training data mask, an optional mask that can be provided to filter regions of the data during training, such as large areas of background. The mask should be a binary image where a 1 indicates a pixel should be included in the training data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • val_data (Any, default: None ) –

    Validation data. If not provided, data_config.n_val_patches patches will be selected from the training data for validation. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • val_data_target (Any, default: None ) –

    Validation data target. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • pred_data (Any, default: None ) –

    Prediction data. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • pred_data_target (Any, default: None ) –

    Prediction data target, this may be used for calculating metrics. If custom loading is provided it can be any type, otherwise it must be a pathlib.Path, str, numpy.ndarray or a sequence of these, or None.

  • model_constraints (ModelConstraints | None, default: None ) –

    If provided, the data module will validate input and target channels and spatial shapes against the model constraints.

  • loading (ReadFuncLoading | ImageStackLoading | None, default: None ) –

    The type of loading used for custom data. ReadFuncLoading uses a simple function that loads full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats, enabling single patches to be read from disk at a time. If the data type is not custom, loading should be None.
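
When val_data is omitted, the val_data description above says that data_config.n_val_patches patches are taken from the training data. The sketch below shows one way such a hold-out could work. It assumes a seeded random selection over patch indices; the actual selection strategy is not stated here, and `hold_out_patches` is a hypothetical helper.

```python
import random


def hold_out_patches(
    patch_ids: list[int], n_val_patches: int, seed: int = 0
) -> tuple[list[int], list[int]]:
    """Split patch indices into (training, validation) index lists.

    Hypothetical helper; random selection is an assumption of this sketch.
    """
    if n_val_patches > len(patch_ids):
        raise ValueError("n_val_patches exceeds the number of available patches.")
    rng = random.Random(seed)  # seeded so the split is reproducible
    val_ids = sorted(rng.sample(patch_ids, n_val_patches))
    val_set = set(val_ids)
    # Everything not held out remains available for training.
    train_ids = [i for i in patch_ids if i not in val_set]
    return train_ids, val_ids


train_ids, val_ids = hold_out_patches(list(range(10)), n_val_patches=3)
```

The two returned lists are disjoint and together cover every patch index, so no patch is used for both training and validation.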

predict_dataloader()

Create a dataloader for prediction.

Returns:

  • DataLoader

    Prediction dataloader.

setup(stage)

Set up datasets.

Lightning hook that is called at the beginning of fit (train + validate), validate, test, or predict. Creates the datasets for a given stage.

Parameters:

  • stage (str) –

    The stage to set up datasets for. One of 'fit', 'validate', 'test', or 'predict'.
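
The stage-based behaviour of setup can be sketched with a toy class that records which datasets it would create. This is not the CAREamics implementation: `ToyDataModule` is hypothetical, and the mapping from stage to datasets is an assumption based on the description above ('fit' covers train and validate). No test data argument is documented, so the sketch creates nothing for 'test'.

```python
VALID_STAGES = ("fit", "validate", "test", "predict")


class ToyDataModule:
    """Toy stand-in that only records which datasets a stage would create."""

    def __init__(self) -> None:
        self.created: list[str] = []

    def setup(self, stage: str) -> None:
        if stage not in VALID_STAGES:
            raise ValueError(f"Unknown stage: {stage!r}")
        if stage == "fit":
            # Fitting needs both training and validation datasets.
            self.created += ["train", "val"]
        elif stage == "validate":
            self.created.append("val")
        elif stage == "predict":
            self.created.append("predict")
        # "test": nothing is created in this sketch (assumption).


module = ToyDataModule()
module.setup("fit")
```

Creating datasets per stage means that a prediction-only run does not have to build training datasets.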

train_dataloader()

Create a dataloader for training.

Returns:

  • DataLoader

    Training dataloader.

val_dataloader()

Create a dataloader for validation.

Returns:

  • DataLoader

    Validation dataloader.