Careamist

CAREamist

Main interface for training and predicting with CAREamics.

Attributes:

  • workdir (Path) –

    Working directory in which to save training outputs.

  • config (Configuration[AlgorithmConfig]) –

    CAREamics configuration.

  • model (CAREamicsModule) –

    The PyTorch Lightning module to be trained and used for prediction.

  • checkpoint_path (Path | None) –

    Path to a checkpoint file from which model and configuration may be loaded.

  • trainer (Trainer) –

    The PyTorch Lightning Trainer used for training and prediction.

  • callbacks (list[Callback]) –

    List of callbacks used during training.

  • prediction_writer (PredictionWriterCallback) –

    Callback used to write predictions to disk during prediction.

  • train_datamodule (CareamicsDataModule | None) –

    The datamodule used for training, set after calling train().

Parameters:

  • config (Configuration | Path | str, default: None ) –

    CAREamics configuration, or a path to a configuration file. See careamics.config.ng_factories for functions that build configurations.

  • checkpoint_path (Path | str, default: None ) –

    Path to a checkpoint file from which to load the model and configuration.

  • bmz_path (Path | str, default: None ) –

    Path to a BioImage Model Zoo archive from which to load the model and configuration.

  • work_dir (Path | str, default: None ) –

    Working directory in which to save training outputs. If None, the current working directory will be used.

  • callbacks (list of PyTorch Lightning Callbacks, default: None ) –

    List of callbacks to use during training. If None, no additional callbacks will be used. Note that ModelCheckpoint and EarlyStopping callbacks are already defined in CAREamics and should only be modified through the training configuration (see Configuration and TrainingConfig).

  • enable_progress_bar (bool, default: True ) –

    Whether to show the progress bar during training.

__init__(config=None, *, checkpoint_path=None, bmz_path=None, work_dir=None, callbacks=None, enable_progress_bar=True)

Constructor.

Exactly one of config, checkpoint_path, or bmz_path must be provided.

Parameters:

  • config (Configuration | Path | str, default: None ) –

    CAREamics configuration, or a path to a configuration file. See careamics.config.ng_factories for functions that build configurations. config is mutually exclusive with checkpoint_path and bmz_path.

  • checkpoint_path (Path | str, default: None ) –

    Path to a checkpoint file from which to load the model and configuration. checkpoint_path is mutually exclusive with config and bmz_path.

  • bmz_path (Path | str, default: None ) –

    Path to a BioImage Model Zoo archive from which to load the model and configuration. bmz_path is mutually exclusive with config and checkpoint_path.

  • work_dir (Path | str, default: None ) –

    Working directory in which to save training outputs. If None, the current working directory will be used.

  • callbacks (list of PyTorch Lightning Callbacks, default: None ) –

    List of callbacks to use during training. If None, no additional callbacks will be used. Note that ModelCheckpoint and EarlyStopping callbacks are already defined in CAREamics and should only be modified through the training configuration (see Configuration and TrainingConfig).

  • enable_progress_bar (bool, default: True ) –

    Whether to show the progress bar during training.
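
The "exactly one of" rule can be sketched as a small check. This is a minimal illustration, not the CAREamics implementation: pick_model_source is a hypothetical helper, and the exception type and message are assumptions, since the documentation above does not state them.

```python
from pathlib import Path


def pick_model_source(config=None, checkpoint_path=None, bmz_path=None):
    """Return (name, value) for the single source that was provided.

    Hypothetical helper: it only mirrors the documented rule that exactly
    one of the three arguments must be given.
    """
    sources = {
        "config": config,
        "checkpoint_path": checkpoint_path,
        "bmz_path": bmz_path,
    }
    provided = [(name, value) for name, value in sources.items() if value is not None]
    if len(provided) != 1:
        names = [name for name, _ in provided] or "none"
        # ValueError is an assumption; the source does not name the exception.
        raise ValueError(
            f"Exactly one of config, checkpoint_path or bmz_path is required, got: {names}"
        )
    return provided[0]


# Only checkpoint_path is given, so it is selected as the source.
name, value = pick_model_source(checkpoint_path=Path("model.ckpt"))
```

Passing no source, or more than one, raises the error in this sketch.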

export_to_bmz(path_to_archive, friendly_model_name, input_array, authors, general_description, data_description, covers=None, channel_names=None, model_version='0.2.0')

Export the model to the BioImage Model Zoo format.

This method packages the current weights into a zip file that can be uploaded to the BioImage Model Zoo. The archive consists of the model weights, the model specifications and various files (inputs, outputs, README, env.yaml etc.).

path_to_archive should point to a file with a ".zip" extension.

friendly_model_name is the name used for the model in the BMZ specs and on the website. It should consist of letters, numbers, dashes, underscores and parentheses only.

input_array must have the same dimensions as the axes recorded in the configuration of the CAREamist.

Parameters:

  • path_to_archive (Path or str) –

    Path in which to save the model, including file name, which should end with ".zip".

  • friendly_model_name (str) –

    Name of the model as used in the BMZ specs. It should consist of letters, numbers, dashes, underscores and parentheses only.

  • input_array (NDArray) –

    Input array used to validate the model and as an example input.

  • authors (list of dict) –

    List of authors of the model.

  • general_description (str) –

    General description of the model used in the BMZ metadata.

  • data_description (str) –

    Description of the data the model was trained on.

  • covers (list of pathlib.Path or str, default: None ) –

    Paths to the cover images.

  • channel_names (list of str, default: None ) –

    Channel names.

  • model_version (str, default: "0.2.0" ) –

    Version of the model.
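
The constraints on path_to_archive and friendly_model_name can be checked before calling export_to_bmz. The sketch below is an assumption-laden illustration: check_bmz_export_args is a hypothetical helper that is not part of CAREamics, and "letters" is interpreted as ASCII letters.

```python
import re
from pathlib import Path

# Letters, digits, dashes, underscores and parentheses, as listed above.
# Restricting "letters" to ASCII is an assumption of this sketch.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_()\-]+")


def check_bmz_export_args(path_to_archive, friendly_model_name):
    """Hypothetical pre-flight check for the two documented constraints."""
    path = Path(path_to_archive)
    if path.suffix != ".zip":
        raise ValueError(f"path_to_archive must end with '.zip', got '{path.name}'")
    if not _NAME_PATTERN.fullmatch(friendly_model_name):
        raise ValueError(f"Invalid friendly_model_name: '{friendly_model_name}'")
    return path


# A name using only the allowed characters passes the check.
archive = check_bmz_export_args("models/n2v_sem.zip", "N2V_SEM-(v1)")
```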

get_checkpoints()

Return the filenames of available checkpoints.

Scans the checkpoint directory and returns checkpoint filenames sorted by epoch number.

Returns:

  • list of Path

    Checkpoint paths sorted by epoch number. The last checkpoint (if present) is appended at the end.
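
The ordering described above (numeric by epoch, with the last checkpoint at the end) can be illustrated as follows. This is a sketch only: the file name pattern ("epoch=<number>" and a stem ending in "last") is an assumption, and sort_checkpoints is a hypothetical function, not the code used by get_checkpoints.

```python
import re
from pathlib import Path

# Assumed naming scheme: "<name>_epoch=<number>.ckpt" and "<name>_last.ckpt".
_EPOCH = re.compile(r"epoch[=_-]?(\d+)")


def sort_checkpoints(paths):
    """Sort checkpoint paths by epoch number, with the "last" checkpoint at the end."""
    numbered, last = [], []
    for path in map(Path, paths):
        match = _EPOCH.search(path.stem)
        if match:
            numbered.append((int(match.group(1)), path))
        elif path.stem.endswith("last"):
            last.append(path)
        # Files matching neither pattern are ignored in this sketch.
    # Sort on the integer epoch so that epoch 10 comes after epoch 2.
    numbered.sort(key=lambda item: item[0])
    return [path for _, path in numbered] + last


ordered = sort_checkpoints(["exp_epoch=10.ckpt", "exp_last.ckpt", "exp_epoch=2.ckpt"])
```

Sorting on the parsed integer rather than the file name avoids the lexical ordering in which "epoch=10" would precede "epoch=2".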

get_losses()

Return data that can be used to plot train and validation loss curves.

Returns:

  • dict of str: list

    Dictionary containing losses for each epoch.
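
A dictionary of per-epoch lists can be inspected without plotting, for example to find the epoch with the lowest validation loss. The key names used below ("val_loss", "val_epoch", and so on) are assumptions made for illustration; check the keys of the dictionary actually returned by get_losses.

```python
def best_epoch(losses, loss_key="val_loss", epoch_key="val_epoch"):
    """Return (epoch, loss) for the smallest value stored under loss_key.

    The default key names are assumptions, not documented behaviour.
    """
    values = losses[loss_key]
    # Fall back to 0, 1, 2, ... if no epoch list is stored under epoch_key.
    epochs = losses.get(epoch_key, list(range(len(values))))
    index = min(range(len(values)), key=values.__getitem__)
    return epochs[index], values[index]


# Hypothetical losses, shaped like the "dict of str: list" described above.
example_losses = {
    "train_epoch": [0, 1, 2],
    "train_loss": [0.90, 0.50, 0.40],
    "val_epoch": [0, 1, 2],
    "val_loss": [0.80, 0.45, 0.50],
}
epoch, loss = best_epoch(example_losses)
```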

predict(pred_data, *, batch_size=None, tile_size=None, tile_overlap=(48, 48), axes=None, data_type=None, num_workers=None, channels=None, in_memory=None, loading=None, checkpoint=None)

predict(pred_data: InputVar, *, batch_size: int | None = None, tile_size: tuple[int, ...] | None = None, tile_overlap: tuple[int, ...] | None = (48, 48), axes: str | None = None, data_type: Literal['array', 'tiff', 'zarr', 'czi', 'custom'] | None = None, num_workers: int | None = None, channels: Sequence[int] | Literal['all'] | None = None, in_memory: bool | None = None, loading: ReadFuncLoading | None = None, checkpoint: str | Path | None = None) -> tuple[list[NDArray], list[str]]
predict(pred_data: Any, *, batch_size: int | None = None, tile_size: tuple[int, ...] | None = None, tile_overlap: tuple[int, ...] | None = (48, 48), axes: str | None = None, data_type: Literal['array', 'tiff', 'zarr', 'czi', 'custom'] | None = None, num_workers: int | None = None, channels: Sequence[int] | Literal['all'] | None = None, in_memory: bool | None = None, loading: ImageStackLoading = ..., checkpoint: str | Path | None = None) -> tuple[list[NDArray], list[str]]

Predict on data and return the predictions.

Input can be a path to a data file, a list of paths, a numpy array, or a list of numpy arrays.

If data_type and axes are not provided, the training configuration parameters will be used. If tile_size is not provided, prediction will be performed on the whole image.

Note that if you are using a UNet model and tiling, the tile size must be divisible in every dimension by 2**d, where d is the depth of the model. This avoids artefacts arising from the broken shift invariance induced by the pooling layers of the UNet. Images smaller than the tile size in any spatial dimension will be automatically zero-padded.
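
The divisibility rule can be checked, or a tile size rounded up to the nearest valid value, with a few lines of arithmetic. The helpers below are hypothetical and not part of CAREamics; depth stands for the UNet depth d mentioned above.

```python
def is_valid_tile_size(tile_size, depth):
    """Return True if every tile dimension is divisible by 2**depth."""
    factor = 2 ** depth
    return all(size % factor == 0 for size in tile_size)


def round_up_tile_size(tile_size, depth):
    """Round each tile dimension up to the next multiple of 2**depth."""
    factor = 2 ** depth
    # -(-a // b) is ceiling division for positive integers.
    return tuple(-(-size // factor) * factor for size in tile_size)


# With depth 2 the factor is 4: 64 is valid, 50 is rounded up to 52.
valid = is_valid_tile_size((64, 64), depth=2)
adjusted = round_up_tile_size((50, 64), depth=2)
```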

Parameters:

  • pred_data (pathlib.Path, str, numpy.ndarray, or sequence of these) –

    Data to predict on. Can be a single item or a sequence of paths/arrays.

  • batch_size (int, default: None ) –

    Batch size for prediction. If not provided, uses the training configuration batch size.

  • tile_size (tuple of int, default: None ) –

    Size of the tiles to use for prediction. If not provided, prediction will be performed on the whole image.

  • tile_overlap (tuple of int, default: (48, 48) ) –

    Overlap between tiles. Can be None only when tile_size is not provided.

  • axes (str, default: None ) –

    Axes of the input data. If None, the axes from the training configuration are used.

  • data_type ((array, tiff, czi, zarr, custom), default: None ) –

    Type of the input data. If None, the data type from the training configuration is used.

  • num_workers (int, default: None ) –

    Number of workers for the dataloader, by default None.

  • channels (sequence of int or "all", default: None ) –

    Channels to use from the data. If None, uses the training configuration channels.

  • in_memory (bool, default: None ) –

    Whether to load all data into memory. If None, uses the training configuration setting.

  • loading (Loading, default: None ) –

    Loading strategy to use for the prediction data. May be a ReadFuncLoading or ImageStackLoading. If None, uses the loading strategy from the training configuration.

  • checkpoint (str or Path, default: None ) –

    Checkpoint to load before making predictions. Can be "best", "last", or a path to a specific checkpoint. If None, the last checkpoint is used for Noise2Void and Noise2Noise models, and the best checkpoint for all other models. Call CAREamist.get_checkpoints for a list of available checkpoints.

Returns:

  • tuple of (list of NDArray, list of str)

    Predictions made by the model and their source identifiers.

Raises:

  • ValueError

    If tile_overlap is None when tile_size is provided.
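
Two of the rules above, the tile overlap requirement and the default checkpoint choice, can be sketched as plain functions. Both helpers are hypothetical, and the algorithm identifiers "n2v", "n2n" and "care" are assumptions made for the example rather than values taken from the CAREamics configuration.

```python
def check_tiling(tile_size, tile_overlap):
    """Raise if tiling is requested without an overlap, as listed under Raises."""
    if tile_size is not None and tile_overlap is None:
        raise ValueError("tile_overlap must be specified when tile_size is provided.")


def default_checkpoint(algorithm, checkpoint=None):
    """Pick the checkpoint to load when none is given explicitly.

    The algorithm names are assumed identifiers for Noise2Void ("n2v")
    and Noise2Noise ("n2n").
    """
    if checkpoint is not None:
        return checkpoint
    return "last" if algorithm in {"n2v", "n2n"} else "best"


# Whole-image prediction needs no overlap, so this call does not raise.
check_tiling(tile_size=None, tile_overlap=None)
chosen = default_checkpoint("n2v")
```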

predict_to_disk(pred_data, *, pred_data_target=None, prediction_dir='predictions', batch_size=None, tile_size=None, tile_overlap=(48, 48), axes=None, data_type=None, num_workers=None, channels=None, in_memory=None, loading=None, checkpoint=None, write_type='tiff', write_extension=None, write_func=None, write_func_kwargs=None)

predict_to_disk(pred_data: InputVar, *, pred_data_target: InputVar | None = None, prediction_dir: Path | str = 'predictions', batch_size: int | None = None, tile_size: tuple[int, ...] | None = None, tile_overlap: tuple[int, ...] | None = (48, 48), axes: str | None = None, data_type: Literal['array', 'tiff', 'zarr', 'czi', 'custom'] | None = None, num_workers: int | None = None, channels: Sequence[int] | Literal['all'] | None = None, in_memory: bool | None = None, loading: ReadFuncLoading | None = None, checkpoint: str | Path | None = None, write_type: Literal['tiff', 'zarr', 'custom'] = 'tiff', write_extension: str | None = None, write_func: WriteFunc | None = None, write_func_kwargs: dict[str, Any] | None = None) -> None
predict_to_disk(pred_data: Any, *, pred_data_target: Any | None = None, prediction_dir: Path | str = 'predictions', batch_size: int | None = None, tile_size: tuple[int, ...] | None = None, tile_overlap: tuple[int, ...] | None = (48, 48), axes: str | None = None, data_type: Literal['array', 'tiff', 'zarr', 'czi', 'custom'] | None = None, num_workers: int | None = None, channels: Sequence[int] | Literal['all'] | None = None, in_memory: bool | None = None, loading: ImageStackLoading = ..., checkpoint: str | Path | None = None, write_type: Literal['tiff', 'zarr', 'custom'] = 'tiff', write_extension: str | None = None, write_func: WriteFunc | None = None, write_func_kwargs: dict[str, Any] | None = None) -> None

Make predictions on the provided data and save outputs to files.

Predictions are saved to prediction_dir (absolute paths are used as-is, relative paths are relative to work_dir). The directory structure matches the source directory.

The file names of the predictions will match those of the source. If there is more than one sample within a file, the samples will be stacked along the sample dimension in the output file.
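
The path rules above can be illustrated with pathlib. This is a sketch under assumptions: both helpers are hypothetical, and source_root (the directory against which the source structure is mirrored) is a name introduced for the example, since the documentation does not say how that root is determined. PurePosixPath is used so that the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def resolve_prediction_dir(work_dir, prediction_dir="predictions"):
    """Absolute paths are used as-is; relative paths are placed under work_dir."""
    prediction_dir = PurePosixPath(prediction_dir)
    if prediction_dir.is_absolute():
        return prediction_dir
    return PurePosixPath(work_dir) / prediction_dir


def output_path(prediction_dir, source_root, source_file):
    """Mirror the source file's location below source_root inside prediction_dir."""
    relative = PurePosixPath(source_file).relative_to(source_root)
    return PurePosixPath(prediction_dir) / relative


out_dir = resolve_prediction_dir("/runs/exp1")
out_file = output_path(out_dir, "/data/raw", "/data/raw/day1/img.tiff")
```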

If data_type and axes are not provided, the training configuration parameters will be used. If tile_size is not provided, prediction will be performed on whole images rather than in a tiled manner.

Note that if you are using a UNet model and tiling, the tile size must be divisible in every dimension by 2**d, where d is the depth of the model. This avoids artefacts arising from the broken shift invariance induced by the pooling layers of the UNet. Images smaller than the tile size in any spatial dimension will be automatically zero-padded.

Parameters:

  • pred_data (pathlib.Path, str, numpy.ndarray, or sequence of these) –

    Data to predict on. Can be a single item or a sequence of paths/arrays.

  • pred_data_target (pathlib.Path, str, numpy.ndarray, or sequence of these, default: None ) –

    Prediction data target, by default None.

  • prediction_dir (Path | str, default: "predictions" ) –

    The path to save the prediction results to. If prediction_dir is an absolute path, it will be used as-is. If it is a relative path, it will be relative to the pre-set work_dir. If the directory does not exist it will be created.

  • batch_size (int, default: None ) –

    Batch size for prediction. If not provided, uses the training configuration batch size.

  • tile_size (tuple of int, default: None ) –

    Size of the tiles to use for prediction. If not provided, prediction is performed on whole images.

  • tile_overlap (tuple of int, default: (48, 48) ) –

    Overlap between tiles.

  • axes (str, default: None ) –

    Axes of the input data. If None, the axes from the training configuration are used.

  • data_type ((array, tiff, czi, zarr, custom), default: None ) –

    Type of the input data. If None, the data type from the training configuration is used.

  • num_workers (int, default: None ) –

    Number of workers for the dataloader, by default None.

  • channels (sequence of int or "all", default: None ) –

    Channels to use from the data. If None, uses the training configuration channels.

  • in_memory (bool, default: None ) –

    Whether to load all data into memory. If None, uses the training configuration setting.

  • loading (Loading, default: None ) –

    Loading strategy to use for the prediction data. May be a ReadFuncLoading or ImageStackLoading. If None, uses the loading strategy from the training configuration.

  • checkpoint (str or Path, default: None ) –

    Checkpoint to load before making predictions. Can be "best", "last", or a path to a specific checkpoint. If None, the last checkpoint is used for Noise2Void and Noise2Noise models, and the best checkpoint for all other models. Call CAREamist.get_checkpoints for a list of available checkpoints.

  • write_type ((tiff, zarr, custom), default: "tiff" ) –

    Format in which to save the predictions. Use "custom" to supply your own write function.

  • write_extension (str, default: None ) –

    If a known write_type is selected this argument is ignored. For a custom write_type an extension to save the data with must be passed.

  • write_func (WriteFunc, default: None ) –

    If a known write_type is selected this argument is ignored. For a custom write_type a function to save the data must be passed. See notes below.

  • write_func_kwargs (dict of {str: any}, default: None ) –

    Additional keyword arguments to be passed to the save function.

Raises:

  • ValueError

    If write_type is custom and write_extension is None.

  • ValueError

    If write_type is custom and write_func is None.
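
The two error conditions can be summarised as a small validation step. check_write_args is a hypothetical helper written for illustration, not the CAREamics code, and the error messages are assumptions.

```python
def check_write_args(write_type="tiff", write_extension=None, write_func=None):
    """Validate the write arguments following the rules listed above."""
    if write_type != "custom":
        # For known write types both extra arguments are ignored.
        return
    if write_extension is None:
        raise ValueError("A custom write_type requires write_extension.")
    if write_func is None:
        raise ValueError("A custom write_type requires write_func.")


def placeholder_writer(*args, **kwargs):
    """Stand-in for a real write function; its signature is not specified here."""


# Known write type: nothing else is needed.
check_write_args("tiff")
# Custom write type: both an extension and a function must be supplied.
check_write_args("custom", write_extension=".npy", write_func=placeholder_writer)
```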

stop_training()

Stop the training loop.

train(*, train_data=None, train_data_target=None, val_data=None, val_data_target=None, filtering_mask=None, loading=None)

train(*, train_data: InputVar | None = None, train_data_target: InputVar | None = None, val_data: InputVar | None = None, val_data_target: InputVar | None = None, filtering_mask: InputVar | None = None, loading: ReadFuncLoading | None = None) -> None
train(*, train_data: Any | None = None, train_data_target: Any | None = None, val_data: Any | None = None, val_data_target: Any | None = None, filtering_mask: Any | None = None, loading: ImageStackLoading = ...) -> None

Train the model on the provided data.

The training data can be provided as arrays or paths.

Parameters:

  • train_data (pathlib.Path, str, numpy.ndarray, or sequence of these, default: None ) –

    Training data, by default None.

  • train_data_target (pathlib.Path, str, numpy.ndarray, or sequence of these, default: None ) –

    Training target data, by default None.

  • val_data (pathlib.Path, str, numpy.ndarray, or sequence of these, default: None ) –

    Validation data. If not provided, data_config.n_val_patches patches will be selected from the training data for validation.

  • val_data_target (pathlib.Path, str, numpy.ndarray, or sequence of these, default: None ) –

    Validation target data, by default None.

  • filtering_mask (pathlib.Path, str, numpy.ndarray, or sequence of these, default: None ) –

    Filtering mask for coordinate-based patch filtering, by default None.

  • loading (Loading, default: None ) –

    Loading strategy to use for the training data. May be a ReadFuncLoading or ImageStackLoading. If None, uses the loading strategy from the training configuration.

Raises: