Dataset

Dataset module.

InMemoryDataset

Bases: Dataset

Dataset storing data in memory and allowing generating patches from it.

Parameters:

  • data_config (CAREamics DataConfig) –

    Data configuration (see careamics.config.data_model.DataConfig).

  • inputs (ndarray or list[Path]) –

    Input data.

  • input_target (ndarray or list[Path], default: None ) –

    Target data, by default None.

  • read_source_func (Callable, default: read_tiff ) –

    Read source function for custom types, by default read_tiff.

  • **kwargs (Any, default: {} ) –

    Additional keyword arguments, unused.

__getitem__(index)

Return the patch corresponding to the provided index.

Parameters:

  • index (int) –

    Index of the patch to return.

Returns:

  • tuple of numpy.ndarray

    Patch.

Raises:

  • ValueError

    If dataset mean and std are not set.
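
The indexing contract described above (an integer index in, a tuple containing the patch out, and a ValueError when statistics are missing) can be sketched with a small stand-in class. `TinyPatchDataset` is hypothetical and is not the CAREamics implementation; it uses plain lists instead of arrays and assumes scalar mean and std.

```python
class TinyPatchDataset:
    """Hypothetical map-style dataset; a stand-in, not the CAREamics class."""

    def __init__(self, patches, mean=None, std=None):
        self.patches = patches  # list of patches, each a flat list of floats
        self.mean = mean
        self.std = std

    def __len__(self):
        # Number of patches available for indexing.
        return len(self.patches)

    def __getitem__(self, index):
        # Refuse to return data that cannot be normalized.
        if self.mean is None or self.std is None:
            raise ValueError("Dataset mean and std must be set before indexing.")
        patch = self.patches[index]
        # Return a tuple so that a target could be added as a second element.
        return ([(value - self.mean) / self.std for value in patch],)


dataset = TinyPatchDataset([[0.0, 2.0], [4.0, 6.0]], mean=3.0, std=1.0)
first_patch = dataset[0]
```

Returning a tuple even for a single array keeps the return type the same whether or not a target is present.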

__init__(data_config, inputs, input_target=None, read_source_func=read_tiff, **kwargs)

Constructor.

Parameters:

  • data_config (GeneralDataConfig) –

    Data configuration.

  • inputs (ndarray or list[Path]) –

    Input data.

  • input_target (ndarray or list[Path], default: None ) –

    Target data, by default None.

  • read_source_func (Callable, default: read_tiff ) –

    Read source function for custom types, by default read_tiff.

  • **kwargs (Any, default: {} ) –

    Additional keyword arguments, unused.

__len__()

Return the length of the dataset.

Returns:

  • int

    Length of the dataset.

get_data_statistics()

Return training data statistics.

This does not return the target data statistics, only those of the input.

Returns:

  • tuple of list of floats

    Means and standard deviations across channels of the training data.
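
To illustrate what "means and standard deviations across channels" could look like, here is a minimal sketch that returns one mean and one standard deviation per channel. The function name `channel_statistics`, the nested-list data layout, and the use of the population standard deviation are all assumptions made for this example; the library may compute its statistics differently.

```python
from statistics import fmean, pstdev


def channel_statistics(images):
    """Return (means, stds), one value per channel.

    `images` is a list of images; each image is a list of channels and
    each channel is a flat list of floats (hypothetical layout).
    """
    n_channels = len(images[0])
    means, stds = [], []
    for channel in range(n_channels):
        # Pool the values of this channel across all images.
        values = [v for image in images for v in image[channel]]
        means.append(fmean(values))
        stds.append(pstdev(values))  # population standard deviation
    return means, stds


images = [
    [[0.0, 2.0], [10.0, 10.0]],  # image 0: channel 0, channel 1
    [[4.0, 6.0], [10.0, 10.0]],  # image 1: channel 0, channel 1
]
means, stds = channel_statistics(images)
```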

split_dataset(percentage=0.1, minimum_patches=1)

Split a new dataset away from the current one.

This method is used to extract random validation patches from the dataset.

Parameters:

  • percentage (float, default: 0.1 ) –

    Percentage of patches to extract, by default 0.1.

  • minimum_patches (int, default: 1 ) –

    Minimum number of patches to extract, by default 1.

Returns:

  • CAREamics InMemoryDataset

    New dataset with the extracted patches.

Raises:

  • ValueError

    If percentage is not between 0 and 1.

  • ValueError

    If minimum_patches is not between 1 and the number of patches.
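
The following sketch shows one way to split random validation patches away from a training set while enforcing the two checks listed under Raises. The function `split_patches`, its `seed` argument, and the rule of taking the larger of the rounded percentage and the minimum are assumptions for illustration, not the library's confirmed behavior.

```python
import random


def split_patches(patches, percentage=0.1, minimum_patches=1, seed=None):
    """Return (remaining, extracted) lists; hypothetical helper."""
    if not 0 <= percentage <= 1:
        raise ValueError(f"Percentage must be between 0 and 1, got {percentage}.")
    if not 1 <= minimum_patches <= len(patches):
        raise ValueError(
            f"Minimum number of patches must be between 1 and {len(patches)}, "
            f"got {minimum_patches}."
        )
    # Assumed rule: never extract fewer patches than the requested minimum.
    n_extract = max(round(len(patches) * percentage), minimum_patches)
    # Draw distinct indices at random, then partition the patches.
    chosen = set(random.Random(seed).sample(range(len(patches)), n_extract))
    extracted = [p for i, p in enumerate(patches) if i in chosen]
    remaining = [p for i, p in enumerate(patches) if i not in chosen]
    return remaining, extracted


patches = list(range(20))
train, val = split_patches(patches, percentage=0.25, seed=0)
```

Every patch ends up in exactly one of the two lists, so the validation patches never overlap with the remaining training patches.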

InMemoryPredDataset

Bases: Dataset

Simple prediction dataset returning images along the sample axis.

Parameters:

  • prediction_config (InferenceConfig) –

    Prediction configuration.

  • inputs (NDArray) –

    Input data.

__getitem__(index)

Return the patch corresponding to the provided index.

Parameters:

  • index (int) –

    Index of the patch to return.

Returns:

  • tuple(ndarray, ...)

    Transformed patch.

__init__(prediction_config, inputs)

Constructor.

Parameters:

  • prediction_config (InferenceConfig) –

    Prediction configuration.

  • inputs (NDArray) –

    Input data.

__len__()

Return the length of the dataset.

Returns:

  • int

    Length of the dataset.

InMemoryTiledPredDataset

Bases: Dataset

Prediction dataset storing data in memory and returning tiles of each image.

Parameters:

  • prediction_config (InferenceConfig) –

    Prediction configuration.

  • inputs (NDArray) –

    Input data.

__getitem__(index)

Return the patch corresponding to the provided index.

Parameters:

  • index (int) –

    Index of the patch to return.

Returns:

  • tuple of NDArray and TileInformation

    Transformed patch.

__init__(prediction_config, inputs)

Constructor.

Parameters:

  • prediction_config (InferenceConfig) –

    Prediction configuration.

  • inputs (NDArray) –

    Input data.

__len__()

Return the length of the dataset.

Returns:

  • int

    Length of the dataset.

IterablePredDataset

Bases: IterableDataset

Simple iterable prediction dataset.

Parameters:

  • prediction_config (InferenceConfig) –

    Inference configuration.

  • src_files (List[Path]) –

    List of data files.

  • read_source_func (Callable, default: read_tiff ) –

    Read source function for custom types, by default read_tiff.

  • **kwargs (Any, default: {} ) –

    Additional keyword arguments, unused.

Attributes:

  • data_path (Union[str, Path]) –

    Path to the data, must be a directory.

  • axes (str) –

    Description of axes in format STCZYX.

  • mean (Optional[float], optional) –

    Expected mean of the dataset, by default None.

  • std (Optional[float], optional) –

    Expected standard deviation of the dataset, by default None.

  • patch_transform (Optional[Callable], optional) –

    Patch transform callable, by default None.

__init__(prediction_config, src_files, read_source_func=read_tiff, **kwargs)

Constructor.

Parameters:

  • prediction_config (InferenceConfig) –

    Inference configuration.

  • src_files (list of pathlib.Path) –

    List of data files.

  • read_source_func (Callable, default: read_tiff ) –

    Read source function for custom types, by default read_tiff.

  • **kwargs (Any, default: {} ) –

    Additional keyword arguments, unused.

Raises:

  • ValueError

    If mean and std are not provided in the inference configuration.

__iter__()

Iterate over data source and yield single patch.

Yields:

  • (ndarray, ndarray or None)

    Single patch.
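
As a rough illustration of the iterable pattern, the sketch below reads one file per iteration through a pluggable read function and normalizes each sample before yielding it. `TinyIterablePredDataset` and `read_numbers` are hypothetical stand-ins: the real class takes an InferenceConfig and yields arrays, whereas this example takes `mean` and `std` directly and works on text files holding whitespace-separated numbers.

```python
import tempfile
from pathlib import Path


def read_numbers(path):
    """Hypothetical reader standing in for read_source_func."""
    return [float(token) for token in Path(path).read_text().split()]


class TinyIterablePredDataset:
    """Hypothetical iterable dataset; not the CAREamics class."""

    def __init__(self, src_files, mean, std, read_source_func=read_numbers):
        if mean is None or std is None:
            raise ValueError("Mean and std must be provided for prediction.")
        self.src_files = src_files
        self.mean = mean
        self.std = std
        self.read_source_func = read_source_func

    def __iter__(self):
        # Files are read lazily, one per iteration step.
        for path in self.src_files:
            sample = self.read_source_func(path)
            yield [(value - self.mean) / self.std for value in sample]


with tempfile.TemporaryDirectory() as tmp:
    files = []
    for i, content in enumerate(["1 3", "5 7"]):
        path = Path(tmp) / f"sample_{i}.txt"
        path.write_text(content)
        files.append(path)
    dataset = TinyIterablePredDataset(files, mean=1.0, std=2.0)
    samples = list(dataset)
```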

IterableTiledPredDataset

Bases: IterableDataset

Tiled prediction dataset.

Parameters:

  • prediction_config (InferenceConfig) –

    Inference configuration.

  • src_files (list of pathlib.Path) –

    List of data files.

  • read_source_func (Callable, default: read_tiff ) –

    Read source function for custom types, by default read_tiff.

  • **kwargs (Any, default: {} ) –

    Additional keyword arguments, unused.

Attributes:

  • data_path (str or Path) –

    Path to the data, must be a directory.

  • axes (str) –

    Description of axes in format STCZYX.

  • mean (float, optional) –

    Expected mean of the dataset, by default None.

  • std (float, optional) –

    Expected standard deviation of the dataset, by default None.

  • patch_transform (Callable, optional) –

    Patch transform callable, by default None.

__init__(prediction_config, src_files, read_source_func=read_tiff, **kwargs)

Constructor.

Parameters:

  • prediction_config (InferenceConfig) –

    Inference configuration.

  • src_files (List[Path]) –

    List of data files.

  • read_source_func (Callable, default: read_tiff ) –

    Read source function for custom types, by default read_tiff.

  • **kwargs (Any, default: {} ) –

    Additional keyword arguments, unused.

Raises:

  • ValueError

    If mean and std are not provided in the inference configuration.

__iter__()

Iterate over data source and yield single tiles.

Yields:

  • Generator of (np.ndarray, np.ndarray or None) and TileInformation tuple

    Generator of single tiles.
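
The idea of yielding each tile together with information about where it came from can be sketched in one dimension. `TileInfo` below is a hypothetical dataclass invented for this example and does not mirror the fields of the library's TileInformation; `iter_tiles`, `tile_size`, and `overlap` are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class TileInfo:
    """Hypothetical tile metadata; not the CAREamics TileInformation."""

    start: int
    stop: int
    last_tile: bool


def iter_tiles(signal, tile_size, overlap):
    """Yield (tile, TileInfo) pairs that cover a 1D sequence with overlap."""
    if not 0 <= overlap < tile_size:
        raise ValueError("Overlap must be non-negative and smaller than the tile size.")
    if tile_size > len(signal):
        raise ValueError("Tile size must not exceed the signal length.")
    step = tile_size - overlap
    starts = list(range(0, len(signal) - tile_size + 1, step))
    # Add a final tile flush with the end if the regular grid falls short.
    if starts[-1] + tile_size < len(signal):
        starts.append(len(signal) - tile_size)
    for i, start in enumerate(starts):
        stop = start + tile_size
        yield signal[start:stop], TileInfo(start, stop, i == len(starts) - 1)


signal = list(range(10))
tiles = list(iter_tiles(signal, tile_size=4, overlap=2))
```

Carrying the coordinates alongside each tile is what lets a later step stitch predictions back into a full-size output.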

PathIterableDataset

Bases: IterableDataset

Dataset allowing the extraction of patches without loading the whole data into memory.

Parameters:

  • data_config (DataConfig) –

    Data configuration.

  • src_files (list of pathlib.Path) –

    List of data files.

  • target_files (list of pathlib.Path, default: None ) –

    Optional list of target files, by default None.

  • read_source_func (Callable, default: read_tiff ) –

    Read source function for custom types, by default read_tiff.

Attributes:

  • data_path (list of pathlib.Path) –

    Path to the data, must be a directory.

  • axes (str) –

    Description of axes in format STCZYX.

__init__(data_config, src_files, target_files=None, read_source_func=read_tiff)

Constructor.

Parameters:

  • data_config (GeneralDataConfig) –

    Data configuration.

  • src_files (list[Path]) –

    List of data files.

  • target_files (list[Path] or None, default: None ) –

    Optional list of target files, by default None.

  • read_source_func (Callable, default: read_tiff ) –

    Read source function for custom types, by default read_tiff.

__iter__()

Iterate over data source and yield single patch.

Yields:

  • ndarray

    Single patch.
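
The benefit of iterating over files rather than preloading them can be sketched with a generator that reads a file only when its patches are requested. Everything here is hypothetical: `iter_patches`, the dictionary-backed `fake_read` function, and the sequential non-overlapping patch layout are assumptions for illustration, and the real dataset may choose patches differently.

```python
def iter_patches(src_files, read_source_func, patch_size):
    """Yield fixed-size patches file by file; hypothetical helper."""
    for path in src_files:
        # Only the current file is held in memory at this point.
        data = read_source_func(path)
        for start in range(0, len(data) - patch_size + 1, patch_size):
            yield data[start:start + patch_size]


# In-memory stand-in for files on disk, with a log of read calls.
fake_storage = {"a.tif": [1, 2, 3, 4], "b.tif": [5, 6, 7, 8]}
read_log = []


def fake_read(path):
    read_log.append(path)
    return fake_storage[path]


patches = iter_patches(list(fake_storage), fake_read, patch_size=2)
first = next(patches)
reads_after_first = list(read_log)  # the second file has not been read yet
rest = list(patches)
```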

get_data_statistics()

Return training data statistics.

Returns:

  • tuple of list of floats

    Means and standard deviations across channels of the training data.

get_number_of_files()

Return the number of files in the dataset.

Returns:

  • int

    Number of files in the dataset.

split_dataset(percentage=0.1, minimum_number=5)

Split up dataset in two.

Splits the dataset using a percentage of the data (files) to extract, or the minimum number if the percentage yields fewer files than the minimum number.

Parameters:

  • percentage (float, default: 0.1 ) –

    Percentage of files to split up, by default 0.1.

  • minimum_number (int, default: 5 ) –

    Minimum number of files to split up, by default 5.

Returns:

  • IterableDataset

    Dataset containing the split data.

Raises:

  • ValueError

    If the percentage is smaller than 0 or larger than 1.

  • ValueError

    If the minimum number is smaller than 1 or larger than the number of files.
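
A worked example makes the interplay between `percentage` and `minimum_number` concrete. The helper `number_of_files_to_split` is hypothetical, and the use of `round` is an assumption; only the "percentage, or the minimum if larger" rule and the two error conditions come from the description above.

```python
def number_of_files_to_split(n_files, percentage=0.1, minimum_number=5):
    """Return how many files would be split away; hypothetical helper."""
    if percentage < 0 or percentage > 1:
        raise ValueError(f"Percentage must be between 0 and 1, got {percentage}.")
    if minimum_number < 1 or minimum_number > n_files:
        raise ValueError(
            f"Minimum number must be between 1 and {n_files}, got {minimum_number}."
        )
    # The minimum takes over whenever the percentage yields fewer files.
    return max(round(n_files * percentage), minimum_number)


small = number_of_files_to_split(20)   # 10% of 20 is 2, so the minimum of 5 applies
large = number_of_files_to_split(200)  # 10% of 200 is 20, above the minimum
```

With the default arguments, a dataset of fewer than 5 files cannot be split at all under this sketch, because the minimum would exceed the number of files.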