Training#
You can provide data in various way to train your model: as a numpy
array, using a path to a folder or files, or by using CAREamics data module class for more control (advanced).
The details of how CAREamics deals with the loading and patching is detailed in the dataset section.
Data type
The data type of the source and targets must be the same as the one specified in the configuration. That is to say array
in the case of np.ndarray
, and tiff
in the case of paths.
Training by passing an array#
CAREamics can be trained by simply passing numpy arrays.
import numpy as np
train_array = np.random.rand(256, 256)
val_array = np.random.rand(256, 256)
careamist.train(
train_source=train_array, # (1)!
val_source=val_array, # (2)!
)
- All parameters to the
train
method must be specified by keyword. - If you don't provide a validation source, CAREamics will use a fraction of the training data to validate the model.
Supervised training
If you are training a supervised model, you must provide the target data as well.
Training by passing a path#
The same thing can be done by passing a path to a folder or files.
careamist.train(
train_source=path_to_train_data, # (1)!
val_source=path_to_val_data,
)
- The path can point to a single file, or contain multiple files.
Training from path
To train from a path, the data type must be set to tiff
or custom
in the configuration.
Splitting validation from training data#
If you only provide training data, CAREamics will extract the validation data directly from the training set. There are two parameters controlling that behaviour: val_percentage
and val_minimum_split
.
val_percentage
is the fraction of the training data that will be used for validation, and val_minimum_split
is the minimum number of images used. If the percentage leads to a number of patches smaller than val_minimum_split
, CAREamics will use val_minimum_split
.
careamist.train(
train_source=train_array,
val_percentage=0.1, # (1)!
val_minimum_split=5, # (2)!
)
- 10% of the training data will be used for validation.
- If the number of images is less than 5, CAREamics will use 5 images for validation.
Arrays vs files
The behaviour of val_percentage
and val_minimum_split
is based different depending on whether the source data is an array or a path. If the source is an array, the split is done on the patches (N
patches are used for validation). If the source is a path, the split is done on the files (N
files are used for validation).
Training by passing a TrainDataModule
object#
CAREamics provides a class to handle the data loading of custom data type. We will dive in more details in the next section into what this class can be used for. Here is a brief overview of how it is passed to the train
method.
from careamics.lightning import TrainDataModule
data_module = TrainDataModule( # (1)!
data_config=config.data_config, train_data=train_array
)
careamist.train(datamodule=data_module)
- Here this does the same thing as passing the
train_source
directly into thetrain
method. In the next section, we will see a more useful example.
Logging the training#
By default, CAREamics simply log the training progress in the console. However, it is possible to use either WandB or TensorBoard.
To decide on the logger, check out the Configuration section.
Loggers installation
Using WandB or TensorBoard require the installation of extra
dependencies. Check out the installation section to know more about it.