Predicting with CAREamics
Returning predictions
The simplest form of prediction is to call predict, which returns the predictions.
Checkpoint
By default, CAREamics uses a checkpoint callback that saves multiple checkpoints during training. Depending on the algorithm, the prediction method will either use the best checkpoint (the one with the lowest validation loss) or the last checkpoint (the one from the last epoch).
Noise2Void and Noise2Noise will use the last checkpoint, while the other algorithms will use the best checkpoint.
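The default rule above can be summarised in a few lines of Python. This is only a sketch of the rule as stated on this page, not CAREamics code: `default_checkpoint` is a hypothetical helper, and the algorithm identifiers `"n2v"`, `"n2n"` and `"care"` are assumed names used for illustration.

```python
# Algorithms that, per the rule above, predict from the last checkpoint.
# The identifiers are assumptions made for this illustration.
LAST_CHECKPOINT_ALGORITHMS = {"n2v", "n2n"}


def default_checkpoint(algorithm):
    """Return the checkpoint keyword used by default for a given algorithm."""
    if algorithm.lower() in LAST_CHECKPOINT_ALGORITHMS:
        return "last"
    return "best"


print(default_checkpoint("n2v"))   # last
print(default_checkpoint("care"))  # best
```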
Choosing the checkpoint
Checkpoints can be specified by passing a checkpoint argument to the predict method.
It can be either a path to a checkpoint file or one of the keywords specified by
PyTorch Lightning (typically "best" or "last", see documentation).
predictions = careamist.predict(
    pred_data=pred_data,
    checkpoint="last",  # or "best"
)
Noise2Void and Noise2Noise
Noise2Void and Noise2Noise models do not have a well-defined "best" checkpoint based on validation loss. Specify an explicit path if you want to use a specific checkpoint.
Tiling
For odd-sized or large images, tiling should be used. Tiling is enabled by passing
a tile_size to the prediction method.
predictions = careamist.predict(
    pred_data=pred_data,
    tile_size=[128, 128],  # (1)!
    tile_overlap=[48, 48],  # (2)!
)
- The tile_size need not be equal to the training patch size.
- Overlaps are optional, and default to (48, 48).
Why use tiling?
If the images have dimensions that are not compatible with the network architecture
(shape % 2**model_depth != 0), then tiling is required to ensure that the network
can process the images.
If the images are large, then they might simply not fit in memory and need to be processed as tiles.
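The divisibility condition above can be checked before calling predict. The following is a minimal sketch: `needs_tiling` is a hypothetical helper, not part of CAREamics, and `model_depth` is assumed to be the depth of the trained network.

```python
def needs_tiling(spatial_shape, model_depth):
    """Return True if any spatial dimension is not a multiple of 2**model_depth."""
    factor = 2 ** model_depth
    return any(size % factor != 0 for size in spatial_shape)


# 512 is a multiple of 2**2 = 4, so the image can be processed whole.
print(needs_tiling((512, 512), model_depth=2))  # False
# Neither 510 nor 301 is a multiple of 4, so tiling is required.
print(needs_tiling((510, 301), model_depth=2))  # True
```

This check only covers shape compatibility; a compatible image may still be too large to fit in memory.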
What tiling size to choose?
The tile size should be chosen based on the size of the images and the available
memory. A good starting point is to use a tile size that is a multiple of
2**model_depth. You can then adjust batch_size to find the optimal memory
usage.
Note that the overlaps must be larger than the receptive field of the network.
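One way to arrive at a starting tile size is to round a preferred size down to the nearest multiple of 2**model_depth. The sketch below assumes that approach; `suggest_tile_size` is a hypothetical helper, not a CAREamics function, and it does not account for memory limits or the receptive field.

```python
def suggest_tile_size(preferred, model_depth):
    """Round each preferred tile dimension down to a multiple of 2**model_depth.

    The result is never smaller than 2**model_depth itself.
    """
    factor = 2 ** model_depth
    return [max(factor, (size // factor) * factor) for size in preferred]


# With a depth-3 network the factor is 8.
print(suggest_tile_size([100, 130], model_depth=3))  # [96, 128]
```

The returned list could then be passed as tile_size, with batch_size tuned separately.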
Dimensions
The tile size and overlaps must match the dimensionality of the images: two values for 2D data, three for 3D data.
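A quick consistency check can catch a mismatch before prediction starts. This is a sketch under assumptions: `check_tiling_dims` is a hypothetical helper, and it assumes that the spatial axes are labelled Z, Y and X in the axes string (as in YX or SYX used elsewhere on this page).

```python
def check_tiling_dims(axes, tile_size, tile_overlap):
    """Check that tile_size and tile_overlap have one entry per spatial axis.

    Returns the number of spatial axes, or raises ValueError on a mismatch.
    """
    # Assumption: only Z, Y and X denote spatial axes.
    n_spatial = sum(1 for axis in axes if axis in "ZYX")
    if len(tile_size) != n_spatial or len(tile_overlap) != n_spatial:
        raise ValueError(
            f"Expected {n_spatial} values for axes {axes!r}, got "
            f"tile_size={list(tile_size)} and tile_overlap={list(tile_overlap)}."
        )
    return n_spatial


# 2D data with a sample axis: two spatial dimensions.
print(check_tiling_dims("SYX", [128, 128], [48, 48]))  # 2
# 3D data: three spatial dimensions (values are illustrative).
print(check_tiling_dims("ZYX", [32, 128, 128], [8, 48, 48]))  # 3
```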
Changing dataloading parameters
During prediction, you can change the dataloading parameters by passing them to the
predict method. The parameters batch_size and num_workers can be set through the predict function arguments, while any other parameters have to be changed through the configuration.
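Following the pattern of the other examples on this page, setting these two parameters might look like the sketch below. The wrapper function `predict_with_loader_settings` is hypothetical and only there to keep the example self-contained; `careamist` is assumed to be a trained CAREamist and the default values are illustrative.

```python
def predict_with_loader_settings(careamist, pred_data, batch_size=2, num_workers=4):
    """Call predict with explicit batch size and number of dataloader workers.

    `careamist` is assumed to expose the predict method described on this page.
    """
    return careamist.predict(
        pred_data=pred_data,
        batch_size=batch_size,    # number of patches or images per batch
        num_workers=num_workers,  # number of dataloader worker processes
    )
```

Any other dataloading parameter would still need to be changed through the configuration, as described above.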
Data parameters
The data we want to predict on might be different from the training data, in terms of
axes or format. The predict method allows changing these parameters (axes,
data_type, channels, in_memory).
Here, let's say we trained from arrays of axes YX, and now want to predict with the
trained model on a TIFF file (we have a path pred_data_path) that has multiple
time-points. We need to set axes to SYX to describe the new axes order, and
data_type to tiff to match the new format. Finally, we do not want to load the data in memory.
predictions = careamist.predict(
    pred_data=pred_data_path,  # (1)!
    axes="SYX",
    data_type="tiff",
    in_memory=False,
)
- Now, pred_data is a path.
Coherence of the data parameters
Prediction data must have the same type of content as the training data (the data must be "in distribution"). That means that it cannot have different spatial axes or suddenly have more channels.
New data has channels
If the new data has channels, but the model was not trained on multiple channels,
then the channels parameter can be used to specify which channels to use for
prediction. For example, if the new data has 3 channels, but the model was
trained on single-channel data, then channels=[1] can be used to specify that
only the second channel should be used for prediction.
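The indices are 0-based, which is why channels=[1] refers to the second channel. The sketch below illustrates that indexing on a small channel-first image stored as nested lists. It is only a conceptual illustration, not how CAREamics selects channels internally, and `select_channels` is a hypothetical helper.

```python
def select_channels(image, channels):
    """Keep only the requested channels (0-based) of a channel-first image."""
    return [image[c] for c in channels]


# A toy image with 3 channels of 2x2 pixels; each channel is filled with its index.
three_channel_image = [
    [[0, 0], [0, 0]],
    [[1, 1], [1, 1]],
    [[2, 2], [2, 2]],
]

selected = select_channels(three_channel_image, channels=[1])
print(len(selected))  # 1
print(selected[0])    # [[1, 1], [1, 1]]
```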
Predicting to disk
(soon)
CZI format
Prediction directly to disk is not available for CZI data.
Zarr format
Prediction directly to disk with Zarr requires tiling to be enabled.