
Dataset Utils


File and array utilities used in the datasets.

WelfordStatistics

Compute Welford statistics iteratively.

The Welford algorithm is used to compute the mean and variance of an array iteratively. Based on the implementation from: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm

finalize()

Finalize the Welford statistics.

Returns:

Type Description
tuple of numpy arrays

Final mean and standard deviation.

update(array, sample_idx)

Update the Welford statistics.

Parameters:

Name Type Description Default
array NDArray

Input array.

required
sample_idx int

Current sample number.

required
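
The update/finalize pattern described for WelfordStatistics can be illustrated with a minimal sketch. This is not the library's implementation: the class name `RunningStats` is hypothetical, it tracks a single scalar mean and variance rather than per-channel values, it counts elements internally instead of taking a `sample_idx`, and it uses the batched (parallel) form of the update described on the same Wikipedia page. The population standard deviation (`ddof=0`) is an assumption.

```python
import numpy as np


class RunningStats:
    """Hypothetical stand-in for illustration, not WelfordStatistics itself."""

    def __init__(self):
        self.count = 0   # number of elements seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, array):
        """Merge one array (one sample) into the running statistics."""
        values = np.asarray(array, dtype=np.float64).ravel()
        n_b = values.size
        if n_b == 0:
            return
        mean_b = values.mean()
        m2_b = ((values - mean_b) ** 2).sum()

        n_a = self.count
        n = n_a + n_b
        delta = mean_b - self.mean
        # Batched Welford update: combine the statistics of two groups.
        self.mean += delta * n_b / n
        self.m2 += m2_b + delta**2 * n_a * n_b / n
        self.count = n

    def finalize(self):
        """Return (mean, std); population std is assumed here."""
        if self.count == 0:
            raise ValueError("No data was provided.")
        return self.mean, np.sqrt(self.m2 / self.count)
```

Because only `count`, `mean` and `m2` are stored, samples can be processed one at a time without holding the whole dataset in memory.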

compute_normalization_stats(image)

Compute mean and standard deviation of an array.

Expected input shape is (S, C, (Z), Y, X). The mean and standard deviation are computed per channel.

Parameters:

Name Type Description Default
image NDArray

Input array.

required

Returns:

Type Description
tuple of (list of floats, list of floats)

Lists of mean and standard deviation values per channel.
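
As a rough illustration of per-channel statistics on an (S, C, (Z), Y, X) array, the sketch below reduces over every axis except the channel axis. The function name `per_channel_stats` is hypothetical, and the use of the population standard deviation (NumPy's default `ddof=0`) is an assumption rather than something this page states.

```python
import numpy as np


def per_channel_stats(image):
    """Hypothetical sketch: mean and std per channel for (S, C, (Z), Y, X)."""
    image = np.asarray(image)
    # Reduce over all axes except axis 1, which holds the channels.
    axes = tuple(ax for ax in range(image.ndim) if ax != 1)
    means = image.mean(axis=axes)
    stds = image.std(axis=axes)  # assumption: population std (ddof=0)
    return means.tolist(), stds.tolist()
```

The same code handles both 2D (S, C, Y, X) and 3D (S, C, Z, Y, X) inputs, since the reduction axes are derived from `image.ndim`.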

get_files_size(files)

Get the total size of the files in MB.

Parameters:

Name Type Description Default
files list of pathlib.Path

List of files.

required

Returns:

Type Description
float

Total size of the files in MB.
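
A minimal sketch of summing file sizes with `pathlib` follows. The helper name `total_size_mb` is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption; the library may use a different convention.

```python
import tempfile
from pathlib import Path


def total_size_mb(files):
    """Hypothetical sketch: total size of the given files in MB."""
    # Assumption: 1 MB = 1024 * 1024 bytes.
    return sum(Path(f).stat().st_size for f in files) / 1024**2


# Usage: create two temporary files of known size and measure them.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, n_bytes in [("a.bin", 1024**2), ("b.bin", 512 * 1024)]:
        path = Path(tmp) / name
        path.write_bytes(b"\0" * n_bytes)
        paths.append(path)
    size_mb = total_size_mb(paths)
```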

list_files(data_path, data_type, extension_filter='')

Recursively list the files in data_path and return them as a sorted list.

If data_path is a file, its name is validated against the data_type using fnmatch, and the method returns data_path itself.

By default, if data_type is equal to custom, all files will be listed. To further filter the files, use extension_filter.

extension_filter must be a pattern compatible with fnmatch and Path.rglob, e.g. "*.npy" or "*.czi".

Parameters:

Name Type Description Default
data_path Union[str, Path]

Path to the folder containing the data.

required
data_type Union[str, SupportedData]

One of the supported data types (e.g. tif, custom).

required
extension_filter str

Extension filter, by default "".

''

Returns:

Type Description
list[Path]

Sorted list of pathlib.Path objects.

Raises:

Type Description
FileNotFoundError

If the data path does not exist.

ValueError

If the data path is empty or no files with the extension were found.

ValueError

If the file does not match the requested extension.
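
The behavior described above (recursive listing, a single-file check with fnmatch, and the three error cases) can be sketched as follows. This is an illustration under assumptions, not the library's code: `list_matching_files` and its `pattern` argument are hypothetical, and the mapping from data_type to a file pattern is left out. Returning a single file wrapped in a list is also an assumption, chosen to match the documented list[Path] return type.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path


def list_matching_files(data_path, pattern="*"):
    """Hypothetical sketch: recursively list files matching a glob pattern."""
    data_path = Path(data_path)
    if not data_path.exists():
        raise FileNotFoundError(f"Data path {data_path} does not exist.")

    if data_path.is_file():
        # A single file is validated against the pattern by name.
        if not fnmatch(data_path.name, pattern):
            raise ValueError(f"File {data_path} does not match {pattern!r}.")
        return [data_path]

    # A directory is searched recursively and the result is sorted.
    files = sorted(p for p in data_path.rglob(pattern) if p.is_file())
    if not files:
        raise ValueError(f"No files matching {pattern!r} in {data_path}.")
    return files


# Usage: only the .npy files are returned, including the nested one.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for rel in ["b.npy", "sub/a.npy", "notes.txt"]:
        (root / rel).write_bytes(b"")
    names = sorted(p.name for p in list_matching_files(root, "*.npy"))
```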

validate_source_target_files(src_files, tar_files)

Validate source and target path lists.

The two lists should have the same number of files, and the filenames should match.

Parameters:

Name Type Description Default
src_files list of pathlib.Path

List of source files.

required
tar_files list of pathlib.Path

List of target files.

required

Raises:

Type Description
ValueError

If the number of files in source and target folders is not the same.

ValueError

If some filenames in the source and target folders are not the same.
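
The two checks can be sketched as below. The function name `check_paired_files` is hypothetical, and comparing the sets of file names (ignoring the parent folders) is an assumption about what "the filenames should match" means.

```python
from pathlib import Path


def check_paired_files(src_files, tar_files):
    """Hypothetical sketch: validate that source and target files pair up."""
    if len(src_files) != len(tar_files):
        raise ValueError(
            f"Found {len(src_files)} source files but "
            f"{len(tar_files)} target files."
        )
    # Assumption: files are paired by name, regardless of their folder.
    src_names = {Path(f).name for f in src_files}
    tar_names = {Path(f).name for f in tar_files}
    if src_names != tar_names:
        unmatched = sorted(src_names ^ tar_names)
        raise ValueError(f"File names without a counterpart: {unmatched}")
```

Calling `check_paired_files([Path("src/a.tif")], [Path("tar/a.tif")])` returns without error, while a target list containing `tar/b.tif` instead raises a ValueError.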