
Dataset


Dataset module.

InMemoryDataset

Bases: Dataset

Dataset storing data in memory and allowing generating patches from it.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data_config | CAREamics DataConfig | Data configuration (see careamics.config.data_model.DataConfig). | required |
| inputs | ndarray or list[Path] | Input data. | required |
| input_target | ndarray or list[Path] | Target data, by default None. | None |
| read_source_func | Callable | Read source function for custom types, by default read_tiff. | read_tiff |
| **kwargs | Any | Additional keyword arguments, unused. | {} |
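
As a rough illustration of generating patches from data held in memory, the sketch below cuts a 2D image into non-overlapping patches. It is not the CAREamics implementation: `extract_patches` is a hypothetical helper, images are plain nested lists rather than NumPy arrays, and border pixels that do not fill a whole patch are simply dropped.

```python
from typing import Iterator, List, Sequence

# Hypothetical representation: a 2D image as a list of rows of floats.
Image2D = List[List[float]]


def extract_patches(image: Image2D, patch_size: Sequence[int]) -> Iterator[Image2D]:
    """Yield non-overlapping (ph, pw) patches in row-major order.

    Border pixels that do not fill a whole patch are dropped.
    """
    ph, pw = patch_size
    height, width = len(image), len(image[0])
    for y in range(0, height - ph + 1, ph):
        for x in range(0, width - pw + 1, pw):
            # Slice the rows first, then the columns within each row.
            yield [row[x : x + pw] for row in image[y : y + ph]]


image = [[float(r * 6 + c) for c in range(6)] for r in range(4)]
patches = list(extract_patches(image, (2, 3)))
print(len(patches))  # 4
```

Because the whole image is already in memory, any patching strategy (regular grid, random crops) only needs index arithmetic on the stored array.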

get_data_statistics()

Return training data statistics.

This does not return the target data statistics, only those of the input.

Returns:

| Type | Description |
|---|---|
| tuple of list of floats | Means and standard deviations across channels of the training data. |
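
A minimal sketch of what "means and standard deviations across channels" amounts to. It assumes the data is indexed as [sample][channel][pixel] and that the population standard deviation (ddof=0, NumPy's default) is used; both are assumptions, and `channel_statistics` is a hypothetical name.

```python
import statistics
from typing import List, Sequence, Tuple


def channel_statistics(
    samples: Sequence[Sequence[Sequence[float]]],
) -> Tuple[List[float], List[float]]:
    """Per-channel means and population standard deviations.

    `samples` is indexed as [sample][channel][pixel] (assumed layout); each
    channel pools its pixels across all samples.
    """
    n_channels = len(samples[0])
    means: List[float] = []
    stds: List[float] = []
    for c in range(n_channels):
        # Gather every pixel of channel c from every sample.
        values = [v for sample in samples for v in sample[c]]
        means.append(statistics.fmean(values))
        stds.append(statistics.pstdev(values))
    return means, stds


samples = [[[1.0, 3.0], [10.0, 10.0]], [[5.0, 7.0], [20.0, 20.0]]]
means, stds = channel_statistics(samples)
print(means)  # [4.0, 15.0]
```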

split_dataset(percentage=0.1, minimum_patches=1)

Split a new dataset away from the current one.

This method is used to extract random validation patches from the dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| percentage | float | Fraction of patches to extract (between 0 and 1), by default 0.1. | 0.1 |
| minimum_patches | int | Minimum number of patches to extract, by default 1. | 1 |

Returns:

| Type | Description |
|---|---|
| CAREamics InMemoryDataset | New dataset with the extracted patches. |

Raises:

| Type | Description |
|---|---|
| ValueError | If percentage is not between 0 and 1. |
| ValueError | If minimum_patches is not between 1 and the number of patches. |
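
The validation rules above and the interplay between percentage and minimum_patches can be sketched as follows. This assumes the number of extracted patches is the larger of round(percentage * total) and minimum_patches, that both bounds are inclusive, and that patches are drawn uniformly at random; `split_patches` and its `seed` argument are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_patches(
    patches: Sequence[T],
    percentage: float = 0.1,
    minimum_patches: int = 1,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Return (remaining, extracted) after moving a random subset of patches."""
    total = len(patches)
    if not 0 <= percentage <= 1:
        raise ValueError(f"percentage must be between 0 and 1, got {percentage}.")
    if not 1 <= minimum_patches <= total:
        raise ValueError(
            f"minimum_patches must be between 1 and {total}, got {minimum_patches}."
        )
    # Take the percentage-based count, but never fewer than minimum_patches.
    n_extract = max(round(total * percentage), minimum_patches)
    chosen = set(random.Random(seed).sample(range(total), n_extract))
    extracted = [p for i, p in enumerate(patches) if i in chosen]
    remaining = [p for i, p in enumerate(patches) if i not in chosen]
    return remaining, extracted


remaining, extracted = split_patches(list(range(50)), percentage=0.1)
print(len(remaining), len(extracted))  # 45 5
```

With 5 patches and percentage=0.1, round(0.5) is 0, so minimum_patches decides how many patches move to the validation set.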

InMemoryPredDataset

Bases: Dataset

Simple prediction dataset returning images along the sample axis.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| prediction_config | InferenceConfig | Prediction configuration. | required |
| inputs | NDArray | Input data. | required |
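
Returning images along the sample axis means each dataset index is one whole image from the S axis, with no patching or tiling. The class below is a hypothetical stand-in illustrating the map-style protocol (`__len__` and `__getitem__`) that PyTorch's `DataLoader` relies on; it assumes the input is already arranged sample-first and uses nested lists instead of arrays.

```python
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class SampleAxisDataset(Generic[T]):
    """Map-style dataset returning one image per index of the sample axis."""

    def __init__(self, images: Sequence[T]) -> None:
        # Assumed layout: images[s] is the full image for sample s.
        self.images: List[T] = list(images)

    def __len__(self) -> int:
        # One item per sample.
        return len(self.images)

    def __getitem__(self, index: int) -> T:
        # The whole image at this sample index, unchanged.
        return self.images[index]


stack = [[[0.0, 1.0]], [[2.0, 3.0]], [[4.0, 5.0]]]  # S=3, Y=1, X=2
ds = SampleAxisDataset(stack)
print(len(ds), ds[2])  # 3 [[4.0, 5.0]]
```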

InMemoryTiledPredDataset

Bases: Dataset

Prediction dataset storing data in memory and returning tiles of each image.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| prediction_config | InferenceConfig | Prediction configuration. | required |
| inputs | NDArray | Input data. | required |
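
Returning tiles of each image requires the start coordinates of overlapping tiles that cover the image. The sketch below computes them axis by axis. The stepping rule (advance by tile_size - overlap, then shift the last tile back so it ends on the border) is an assumption for illustration, and `tile_starts` and `tile_grid` are hypothetical helpers, not CAREamics functions.

```python
from typing import List, Tuple


def tile_starts(length: int, tile_size: int, overlap: int) -> List[int]:
    """Start indices of overlapping tiles that cover [0, length) along one axis."""
    if not 0 < tile_size <= length:
        raise ValueError("tile_size must be positive and no larger than the axis.")
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be in [0, tile_size).")
    step = tile_size - overlap
    starts = list(range(0, length - tile_size + 1, step))
    if starts[-1] + tile_size < length:
        # The regular grid stops short of the border: add a flush final tile.
        starts.append(length - tile_size)
    return starts


def tile_grid(
    shape: Tuple[int, int], tile: Tuple[int, int], overlap: Tuple[int, int]
) -> List[Tuple[int, int]]:
    """(y, x) start coordinates of every 2D tile, in row-major order."""
    ys = tile_starts(shape[0], tile[0], overlap[0])
    xs = tile_starts(shape[1], tile[1], overlap[1])
    return [(y, x) for y in ys for x in xs]


print(tile_starts(11, 4, 1))  # [0, 3, 6, 7]
```

Shifting the last tile back avoids padding but gives it a larger overlap with its neighbour, which has to be accounted for when the predicted tiles are stitched together.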

IterablePredDataset

Bases: IterableDataset

Simple iterable prediction dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| prediction_config | InferenceConfig | Inference configuration. | required |
| src_files | List[Path] | List of data files. | required |
| read_source_func | Callable | Read source function for custom types, by default read_tiff. | read_tiff |
| **kwargs | Any | Additional keyword arguments, unused. | {} |

Attributes:

| Name | Type | Description |
|---|---|---|
| data_path | Union[str, Path] | Path to the data, must be a directory. |
| axes | str | Description of axes in format STCZYX. |
| mean | Optional[float], optional | Expected mean of the dataset, by default None. |
| std | Optional[float], optional | Expected standard deviation of the dataset, by default None. |
| patch_transform | Optional[Callable], optional | Patch transform callable, by default None. |
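
Because this dataset is iterable, files can be read lazily, one at a time. A minimal generator sketch follows. It assumes `read_source_func` returns a flat list of pixel values and that normalization with the mean and std attributes takes the form (x - mean) / std; `iterate_predictions` is hypothetical and the normalization formula is an assumption.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Optional


def iterate_predictions(
    src_files: Iterable[Path],
    read_source_func: Callable[[Path], List[float]],
    mean: Optional[float] = None,
    std: Optional[float] = None,
) -> Iterator[List[float]]:
    """Lazily read one file at a time and yield its (optionally normalized) data."""
    for path in src_files:
        data = read_source_func(path)  # only this file is held in memory
        if mean is not None and std is not None:
            data = [(v - mean) / std for v in data]
        yield data


store = {"a.tif": [2.0, 4.0], "b.tif": [6.0]}  # in-memory stand-in for files
files = [Path("a.tif"), Path("b.tif")]
print(list(iterate_predictions(files, lambda p: store[p.name], mean=4.0, std=2.0)))
# [[-1.0, 0.0], [1.0]]
```

Nothing is read until the consumer asks for the next item, which is what lets an iterable dataset handle collections larger than memory.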

IterableTiledPredDataset

Bases: IterableDataset

Tiled prediction dataset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| prediction_config | InferenceConfig | Inference configuration. | required |
| src_files | list of pathlib.Path | List of data files. | required |
| read_source_func | Callable | Read source function for custom types, by default read_tiff. | read_tiff |
| **kwargs | Any | Additional keyword arguments, unused. | {} |

Attributes:

| Name | Type | Description |
|---|---|---|
| data_path | str or Path | Path to the data, must be a directory. |
| axes | str | Description of axes in format STCZYX. |
| mean | float, optional | Expected mean of the dataset, by default None. |
| std | float, optional | Expected standard deviation of the dataset, by default None. |
| patch_transform | Callable, optional | Patch transform callable, by default None. |

PathIterableDataset

Bases: IterableDataset

Dataset allowing patches to be extracted without loading all of the data into memory.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data_config | DataConfig | Data configuration. | required |
| src_files | list of pathlib.Path | List of data files. | required |
| target_files | list of pathlib.Path | Optional list of target files, by default None. | None |
| read_source_func | Callable | Read source function for custom types, by default read_tiff. | read_tiff |

Attributes:

| Name | Type | Description |
|---|---|---|
| data_path | list of pathlib.Path | Path to the data, must be a directory. |
| axes | str | Description of axes in format STCZYX. |
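
Extracting patches without loading the whole dataset can be sketched as a generator that reads a single file (and, if given, its matching target file) at a time and crops random patches from both at identical coordinates. Uniform random sampling, the fixed `patches_per_file` count, and the helper names `iterate_patches` and `_crop` are assumptions made for illustration, not CAREamics behaviour.

```python
import random
from pathlib import Path
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

# Hypothetical representation: a 2D image as a list of rows of floats.
Image2D = List[List[float]]


def _crop(image: Image2D, y: int, x: int, ph: int, pw: int) -> Image2D:
    # Rows y..y+ph-1, columns x..x+pw-1.
    return [row[x : x + pw] for row in image[y : y + ph]]


def iterate_patches(
    src_files: Sequence[Path],
    read_source_func: Callable[[Path], Image2D],
    patch_size: Tuple[int, int],
    patches_per_file: int,
    target_files: Optional[Sequence[Path]] = None,
    seed: int = 0,
) -> Iterator[Tuple[Image2D, Optional[Image2D]]]:
    """Yield (input_patch, target_patch) pairs, holding one file pair in memory."""
    rng = random.Random(seed)
    ph, pw = patch_size
    for i, path in enumerate(src_files):
        image = read_source_func(path)
        target = read_source_func(target_files[i]) if target_files is not None else None
        for _ in range(patches_per_file):
            # randint is inclusive, so the patch always fits inside the image.
            y = rng.randint(0, len(image) - ph)
            x = rng.randint(0, len(image[0]) - pw)
            target_patch = _crop(target, y, x, ph, pw) if target is not None else None
            yield _crop(image, y, x, ph, pw), target_patch
```

Pairing the input and target crops at the same coordinates is what keeps supervised training pairs aligned when the files are streamed.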

get_data_statistics()

Return training data statistics.

Returns:

| Type | Description |
|---|---|
| tuple of list of floats | Means and standard deviations across channels of the training data. |
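
For a dataset that never holds all files at once, channel statistics have to be accumulated file by file. One way to do this, shown here as a hypothetical `streaming_statistics` helper, is Welford's online algorithm applied per channel. The [channel][pixel] layout returned by the reader and the use of the population standard deviation are assumptions, not documented CAREamics behaviour.

```python
import math
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# Hypothetical per-file layout: [channel][pixel] values.
ChannelData = List[List[float]]


def streaming_statistics(
    src_files: Iterable[Path],
    read_source_func: Callable[[Path], ChannelData],
) -> Tuple[List[float], List[float]]:
    """Per-channel mean and population std, reading one file at a time."""
    counts: List[int] = []
    means: List[float] = []
    m2s: List[float] = []  # running sum of squared deviations per channel
    for path in src_files:
        data = read_source_func(path)
        if not counts:  # initialise accumulators from the first file
            counts = [0] * len(data)
            means = [0.0] * len(data)
            m2s = [0.0] * len(data)
        for c, values in enumerate(data):
            for v in values:
                # Welford update: numerically stable running mean and M2.
                counts[c] += 1
                delta = v - means[c]
                means[c] += delta / counts[c]
                m2s[c] += delta * (v - means[c])
    stds = [math.sqrt(m2 / n) for m2, n in zip(m2s, counts)]
    return means, stds


store = {"a.tif": [[1.0, 3.0], [10.0, 10.0]], "b.tif": [[5.0, 7.0], [20.0, 20.0]]}
means, stds = streaming_statistics([Path("a.tif"), Path("b.tif")], lambda p: store[p.name])
print(means)  # approximately [4.0, 15.0]
```

Welford's update avoids the catastrophic cancellation that the naive sum-of-squares formula suffers from on large, high-intensity microscopy data.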

get_number_of_files()

Return the number of files in the dataset.

Returns:

| Type | Description |
|---|---|
| int | Number of files in the dataset. |

split_dataset(percentage=0.1, minimum_number=5)

Split the dataset in two.

Splits off a fraction of the data (files) given by percentage, or minimum_number files if that fraction amounts to fewer files than minimum_number.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| percentage | float | Fraction of files to split off (between 0 and 1), by default 0.1. | 0.1 |
| minimum_number | int | Minimum number of files to split off, by default 5. | 5 |

Returns:

| Type | Description |
|---|---|
| IterableDataset | Dataset containing the split data. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the percentage is smaller than 0 or larger than 1. |
| ValueError | If the minimum number is smaller than 1 or larger than the number of files. |