Dataset
Dataset module.
InMemoryDataset
Bases: Dataset
Dataset storing data in memory and allowing generating patches from it.
Parameters:
-
data_config(CAREamics DataConfig) –Data configuration (see careamics.config.data_model.DataConfig).
-
inputs(ndarray or list[Path]) –Input data.
-
input_target(ndarray or list[Path], default:None) –Target data, by default None.
-
read_source_func(Callable, default:read_tiff) –Read source function for custom types, by default read_tiff.
-
**kwargs(Any, default:{}) –Additional keyword arguments, unused.
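The read_source_func parameter lets the dataset load formats other than TIFF. Below is a minimal sketch of such a reader for .npy files. It assumes the callable receives a file path and returns a NumPy array, as the default read_tiff suggests. The name read_npy and the tolerated extra arguments are illustrative assumptions, not part of the documented API.

```python
from pathlib import Path
from typing import Any

import numpy as np


def read_npy(file_path: Path, *args: Any, **kwargs: Any) -> np.ndarray:
    """Hypothetical reader: load a single .npy file as an array.

    Extra positional and keyword arguments are accepted and ignored, so the
    function tolerates whatever the caller passes along (an assumption).
    """
    file_path = Path(file_path)
    if file_path.suffix != ".npy":
        raise ValueError(f"Expected a .npy file, got {file_path.name}")
    return np.load(file_path)
```

A function like this would be passed as read_source_func=read_npy when constructing the dataset from a list of paths.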
__getitem__(index)
Return the patch corresponding to the provided index.
Parameters:
-
index(int) –Index of the patch to return.
Returns:
-
tuple of numpy.ndarray–Patch.
Raises:
-
ValueError–If dataset mean and std are not set.
__init__(data_config, inputs, input_target=None, read_source_func=read_tiff, **kwargs)
Constructor.
Parameters:
-
data_config(GeneralDataConfig) –Data configuration.
-
inputs(ndarray or list[Path]) –Input data.
-
input_target(ndarray or list[Path], default:None) –Target data, by default None.
-
read_source_func(Callable, default:read_tiff) –Read source function for custom types, by default read_tiff.
-
**kwargs(Any, default:{}) –Additional keyword arguments, unused.
__len__()
get_data_statistics()
Return training data statistics.
This does not return the target data statistics, only those of the input.
Returns:
-
tuple of list of floats–Means and standard deviations across channels of the training data.
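To illustrate what "means and standard deviations across channels" can look like, here is a sketch that computes one mean and one standard deviation per channel with NumPy. It assumes the patches are stacked in an array of shape (N, C, Y, X) or (N, C, Z, Y, X). The helper name channel_statistics is hypothetical and this is not the library's implementation.

```python
import numpy as np


def channel_statistics(patches: np.ndarray) -> tuple[list[float], list[float]]:
    """Return per-channel means and stds for an array shaped (N, C, ...)."""
    if patches.ndim < 3:
        raise ValueError("Expected sample and channel axes before spatial axes.")
    # Reduce over every axis except the channel axis (axis 1).
    reduce_axes = (0, *range(2, patches.ndim))
    means = patches.mean(axis=reduce_axes)
    stds = patches.std(axis=reduce_axes)
    return means.tolist(), stds.tolist()


rng = np.random.default_rng(0)
patches = rng.normal(size=(8, 2, 16, 16))  # 8 patches, 2 channels
means, stds = channel_statistics(patches)  # two lists of length 2
```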
split_dataset(percentage=0.1, minimum_patches=1)
Split a new dataset away from the current one.
This method is used to extract random validation patches from the dataset.
Parameters:
-
percentage(float, default:0.1) –Percentage of patches to extract, by default 0.1.
-
minimum_patches(int, default:1) –Minimum number of patches to extract, by default 1.
Returns:
-
CAREamics InMemoryDataset–New dataset with the extracted patches.
Raises:
-
ValueError–If percentage is not between 0 and 1.
-
ValueError–If minimum_patches is not between 1 and the number of patches.
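The interplay between percentage and minimum_patches can be sketched as follows. This is an illustration under assumptions, not the library's code: it assumes the patches sit in a NumPy array with the patch index on the first axis, and that the number of extracted patches is the rounded share of the total, raised to the minimum when needed. The helper name split_patches and the seed argument are hypothetical.

```python
import numpy as np


def split_patches(patches, percentage=0.1, minimum_patches=1, seed=None):
    """Hypothetical helper: move a random subset of patches to a new array."""
    n_total = len(patches)
    if not 0 <= percentage <= 1:
        raise ValueError(f"percentage must be between 0 and 1, got {percentage}.")
    if not 1 <= minimum_patches <= n_total:
        raise ValueError(
            f"minimum_patches must be between 1 and {n_total}, "
            f"got {minimum_patches}."
        )
    # Take the requested share, but never fewer than the minimum.
    n_split = max(round(n_total * percentage), minimum_patches)
    rng = np.random.default_rng(seed)
    indices = rng.choice(n_total, size=n_split, replace=False)
    extracted = patches[indices]
    remaining = np.delete(patches, indices, axis=0)
    return remaining, extracted


patches = np.arange(20 * 4 * 4).reshape(20, 4, 4)
train, val = split_patches(patches, percentage=0.1, minimum_patches=1, seed=0)
# train keeps 18 patches and val receives the 2 extracted ones.
```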
InMemoryPredDataset
Bases: Dataset
Simple prediction dataset returning images along the sample axis.
Parameters:
-
prediction_config(InferenceConfig) –Prediction configuration.
-
inputs(NDArray) –Input data.
__getitem__(index)
__init__(prediction_config, inputs)
Constructor.
Parameters:
-
prediction_config(InferenceConfig) –Prediction configuration.
-
inputs(NDArray) –Input data.
Raises:
-
ValueError–If data_path is not a directory.
__len__()
InMemoryTiledPredDataset
Bases: Dataset
Prediction dataset storing data in memory and returning tiles of each image.
Parameters:
-
prediction_config(InferenceConfig) –Prediction configuration.
-
inputs(NDArray) –Input data.
__getitem__(index)
Return the patch corresponding to the provided index.
Parameters:
-
index(int) –Index of the patch to return.
Returns:
-
tuple of NDArray and TileInformation–Transformed patch.
__init__(prediction_config, inputs)
Constructor.
Parameters:
-
prediction_config(InferenceConfig) –Prediction configuration.
-
inputs(NDArray) –Input data.
Raises:
-
ValueError–If data_path is not a directory.
__len__()
IterablePredDataset
Bases: IterableDataset
Simple iterable prediction dataset.
Parameters:
-
prediction_config(InferenceConfig) –Inference configuration.
-
src_files(List[Path]) –List of data files.
-
read_source_func(Callable, default:read_tiff) –Read source function for custom types, by default read_tiff.
-
**kwargs(Any, default:{}) –Additional keyword arguments, unused.
Attributes:
-
data_path(Union[str, Path]) –Path to the data, must be a directory.
-
axes(str) –Description of axes in format STCZYX.
-
mean((Optional[float], optional)) –Expected mean of the dataset, by default None.
-
std((Optional[float], optional)) –Expected standard deviation of the dataset, by default None.
-
patch_transform((Optional[Callable], optional)) –Patch transform callable, by default None.
__init__(prediction_config, src_files, read_source_func=read_tiff, **kwargs)
Constructor.
Parameters:
-
prediction_config(InferenceConfig) –Inference configuration.
-
src_files(list of pathlib.Path) –List of data files.
-
read_source_func(Callable, default:read_tiff) –Read source function for custom types, by default read_tiff.
-
**kwargs(Any, default:{}) –Additional keyword arguments, unused.
Raises:
-
ValueError–If mean and std are not provided in the inference configuration.
__iter__()
Iterate over data source and yield single patch.
Yields:
-
(ndarray, ndarray or None)–Single patch.
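The lazy reading pattern behind an iterable prediction dataset can be sketched with a plain generator. The sketch assumes each file holds an array whose first axis indexes samples and that normalization uses a single mean and standard deviation. The names iterate_samples and fake_reader are hypothetical, and the real class may differ in what it yields.

```python
from pathlib import Path
from typing import Callable, Iterator

import numpy as np


def iterate_samples(
    src_files: list[Path],
    read_source_func: Callable[[Path], np.ndarray],
    mean: float,
    std: float,
) -> Iterator[np.ndarray]:
    """Hypothetical generator: read files lazily and yield normalized samples."""
    for path in src_files:
        # Only one file is held in memory at a time.
        image = read_source_func(path)
        for sample in image:  # assumes the first axis indexes samples
            yield (sample - mean) / std


def fake_reader(path: Path) -> np.ndarray:
    # Stand-in for a real reader: two 4x4 samples filled with the value 3.
    return np.full((2, 4, 4), 3.0)


files = [Path("a.tiff"), Path("b.tiff")]
samples = list(iterate_samples(files, fake_reader, mean=1.0, std=2.0))
```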
IterableTiledPredDataset
Bases: IterableDataset
Tiled prediction dataset.
Parameters:
-
prediction_config(InferenceConfig) –Inference configuration.
-
src_files(list of pathlib.Path) –List of data files.
-
read_source_func(Callable, default:read_tiff) –Read source function for custom types, by default read_tiff.
-
**kwargs(Any, default:{}) –Additional keyword arguments, unused.
Attributes:
-
data_path(str or Path) –Path to the data, must be a directory.
-
axes(str) –Description of axes in format STCZYX.
-
mean((float, optional)) –Expected mean of the dataset, by default None.
-
std((float, optional)) –Expected standard deviation of the dataset, by default None.
-
patch_transform((Callable, optional)) –Patch transform callable, by default None.
__init__(prediction_config, src_files, read_source_func=read_tiff, **kwargs)
Constructor.
Parameters:
-
prediction_config(InferenceConfig) –Inference configuration.
-
src_files(List[Path]) –List of data files.
-
read_source_func(Callable, default:read_tiff) –Read source function for custom types, by default read_tiff.
-
**kwargs(Any, default:{}) –Additional keyword arguments, unused.
Raises:
-
ValueError–If mean and std are not provided in the inference configuration.
__iter__()
Iterate over data source and yield single patch.
Yields:
-
Generator of (np.ndarray, np.ndarray or None) and TileInformation tuple–Generator of single tiles.
PathIterableDataset
Bases: IterableDataset
Dataset allowing patches to be extracted without loading the whole data into memory.
Parameters:
-
data_config(DataConfig) –Data configuration.
-
src_files(list of pathlib.Path) –List of data files.
-
target_files(list of pathlib.Path, default:None) –Optional list of target files, by default None.
-
read_source_func(Callable, default:read_tiff) –Read source function for custom types, by default read_tiff.
Attributes:
-
data_path(list of pathlib.Path) –Path to the data, must be a directory.
-
axes(str) –Description of axes in format STCZYX.
__init__(data_config, src_files, target_files=None, read_source_func=read_tiff)
Constructor.
Parameters:
-
data_config(GeneralDataConfig) –Data configuration.
-
src_files(list[Path]) –List of data files.
-
target_files(list[Path] or None, default:None) –Optional list of target files, by default None.
-
read_source_func(Callable, default:read_tiff) –Read source function for custom types, by default read_tiff.
__iter__()
Iterate over data source and yield single patch.
Yields:
-
ndarray–Single patch.
get_data_statistics()
Return training data statistics.
Returns:
-
tuple of list of floats–Means and standard deviations across channels of the training data.
get_number_of_files()
split_dataset(percentage=0.1, minimum_number=5)
Split up dataset in two.
Splits the dataset using a percentage of the data (files) to extract, or the minimum number if the percentage yields fewer files than the minimum number.
Parameters:
-
percentage(float, default:0.1) –Percentage of files to split up, by default 0.1.
-
minimum_number(int, default:5) –Minimum number of files to split up, by default 5.
Returns:
-
IterableDataset–Dataset containing the split data.
Raises:
-
ValueError–If the percentage is smaller than 0 or larger than 1.
-
ValueError–If the minimum number is smaller than 1 or larger than the number of files.
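A file-level split that keeps source and target files paired might look like the sketch below. It is an illustration only: the helper name split_file_lists, the seed argument, and the use of rounding to turn the percentage into a file count are assumptions rather than the library's implementation.

```python
import random
from pathlib import Path
from typing import Optional


def split_file_lists(
    src_files: list[Path],
    target_files: Optional[list[Path]] = None,
    percentage: float = 0.1,
    minimum_number: int = 5,
    seed: Optional[int] = None,
):
    """Hypothetical helper: split file lists while keeping pairs aligned."""
    n_files = len(src_files)
    if percentage < 0 or percentage > 1:
        raise ValueError(f"percentage must be in [0, 1], got {percentage}.")
    if minimum_number < 1 or minimum_number > n_files:
        raise ValueError(
            f"minimum_number must be in [1, {n_files}], got {minimum_number}."
        )
    if target_files is not None and len(target_files) != n_files:
        raise ValueError("src_files and target_files must have the same length.")

    # Use the percentage unless it yields fewer files than the minimum.
    n_split = max(round(percentage * n_files), minimum_number)
    picked = set(random.Random(seed).sample(range(n_files), n_split))

    def partition(files):
        kept = [f for i, f in enumerate(files) if i not in picked]
        split = [f for i, f in enumerate(files) if i in picked]
        return kept, split

    kept_src, split_src = partition(src_files)
    if target_files is None:
        return kept_src, split_src, None, None
    kept_tgt, split_tgt = partition(target_files)
    return kept_src, split_src, kept_tgt, split_tgt


src = [Path(f"input_{i}.tiff") for i in range(20)]
tgt = [Path(f"target_{i}.tiff") for i in range(20)]
# 10% of 20 files is 2, below the minimum of 5, so 5 files are split off.
train_src, val_src, train_tgt, val_tgt = split_file_lists(src, tgt, seed=0)
```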