Dataset Utils
File and array utilities used in the datasets.
WelfordStatistics
Compute Welford statistics iteratively.
The Welford algorithm is used to compute the mean and variance of an array iteratively. Based on the implementation from: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_online_algorithm
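
The update rule can be illustrated with a minimal sketch, written independently of the library. The class name `RunningStats` and its attributes are hypothetical, and the sketch tracks a single scalar mean and population standard deviation over all elements, which may differ from what WelfordStatistics computes internally.

```python
import numpy as np


class RunningStats:
    """Hypothetical illustration of Welford's online algorithm (not the library class)."""

    def __init__(self) -> None:
        self.count = 0   # number of values seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, values: np.ndarray) -> None:
        """Fold every element of `values` into the running statistics."""
        for x in np.asarray(values, dtype=float).ravel():
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            # Uses the old and the new mean, which keeps the update stable.
            self.m2 += delta * (x - self.mean)

    def finalize(self) -> tuple[float, float]:
        """Return the mean and the population standard deviation."""
        if self.count == 0:
            raise ValueError("No data seen.")
        return float(self.mean), float(np.sqrt(self.m2 / self.count))


stats = RunningStats()
for batch in (np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])):
    stats.update(batch)
mean, std = stats.finalize()
```

The data never has to be held in memory at once: each batch is folded in and discarded, which is the reason to prefer this approach for large datasets.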
finalize()
Finalize the Welford statistics.
Returns:
| Type | Description |
|---|---|
| tuple of numpy arrays | Final mean and standard deviation. |
update(array, sample_idx)
Update the Welford statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| array | NDArray | Input array. | required |
| sample_idx | int | Current sample number. | required |
compute_normalization_stats(image)
Compute mean and standard deviation of an array.
Expected input shape is (S, C, (Z), Y, X). The mean and standard deviation are computed per channel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image | NDArray | Input array. | required |
Returns:
| Type | Description |
|---|---|
| tuple of (list of floats, list of floats) | Lists of mean and standard deviation values per channel. |
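
A per-channel reduction over an (S, C, (Z), Y, X) array can be sketched as follows. The helper name `per_channel_stats` is hypothetical, and the sketch only assumes that the channel axis is axis 1, as stated above; it is not the library's implementation.

```python
import numpy as np


def per_channel_stats(image: np.ndarray) -> tuple[list[float], list[float]]:
    """Hypothetical helper: mean and std per channel for (S, C, (Z), Y, X) input."""
    # Reduce over every axis except the channel axis (axis 1).
    axes = tuple(i for i in range(image.ndim) if i != 1)
    means = image.mean(axis=axes)
    stds = image.std(axis=axes)
    return means.tolist(), stds.tolist()


# Two samples, two channels, 4x4 pixels: channel 0 is all zeros, channel 1 all ones.
image = np.zeros((2, 2, 4, 4), dtype=np.float32)
image[:, 1] = 1.0
means, stds = per_channel_stats(image)
```

Because the reduction axes are derived from `image.ndim`, the same code handles both 2D (S, C, Y, X) and 3D (S, C, Z, Y, X) inputs.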
get_files_size(files)
Get the total size of the files in MB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| files | list of pathlib.Path | List of files. | required |
Returns:
| Type | Description |
|---|---|
| float | Total size of the files in MB. |
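
One way to compute such a total is sketched below. The helper name `total_size_mb` is hypothetical, and taking 1 MB as 1024**2 bytes is an assumption; the library may use a different convention.

```python
import tempfile
from pathlib import Path


def total_size_mb(files: list[Path]) -> float:
    """Hypothetical helper: total size of `files`, with 1 MB taken as 1024**2 bytes."""
    return sum(f.stat().st_size for f in files) / 1024**2


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("a.bin", "b.bin"):
        path = Path(tmp) / name
        path.write_bytes(b"\0" * 1024**2)  # exactly 1024**2 bytes each
        paths.append(path)
    size = total_size_mb(paths)
```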
list_files(data_path, data_type, extension_filter='')
Recursively list the files in data_path and return them as a sorted list.
If data_path is a file, its name is validated against the data_type using
fnmatch, and the method returns data_path itself.
By default, if data_type is equal to custom, all files will be listed. To
further filter the files, use extension_filter.
extension_filter must be a pattern compatible with fnmatch and Path.rglob, e.g. "*.npy"
or "*.czi".
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_path | Union[str, Path] | Path to the folder containing the data. | required |
| data_type | Union[str, SupportedData] | One of the supported data types (e.g. tif, custom). | required |
| extension_filter | str | Extension filter, by default "". | '' |
Returns:
| Type | Description |
|---|---|
| list[Path] | Sorted list of pathlib.Path objects. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the data path does not exist. |
| ValueError | If the data path is empty or no files with the extension were found. |
| ValueError | If the file does not match the requested extension. |
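
The behaviour described above (recursive listing, single-file validation, and the three error cases) can be sketched with the standard library alone. The helper name `find_files` is hypothetical, it takes a glob-style pattern directly (here `*.npy`, an assumption) instead of a data type, and it is not the library's implementation.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path


def find_files(data_path: Path, pattern: str) -> list[Path]:
    """Hypothetical helper mirroring the behaviour described above."""
    if not data_path.exists():
        raise FileNotFoundError(f"{data_path} does not exist.")
    if data_path.is_file():
        # A single file is returned once its name matches the pattern.
        if not fnmatch(data_path.name, pattern):
            raise ValueError(f"{data_path.name} does not match {pattern}.")
        return [data_path]
    # Path.rglob walks sub-folders; sorting makes the order reproducible.
    files = sorted(p for p in data_path.rglob(pattern) if p.is_file())
    if not files:
        raise ValueError(f"No files matching {pattern} in {data_path}.")
    return files


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for rel in ("b.npy", "sub/a.npy", "notes.txt"):
        (root / rel).touch()
    found = [p.relative_to(root).as_posix() for p in find_files(root, "*.npy")]
```

Note that the sort is on the full path, not on the file name, so `b.npy` at the top level comes before `sub/a.npy`.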
validate_source_target_files(src_files, tar_files)
Validate source and target path lists.
The two lists should have the same number of files, and the filenames should match.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src_files | list of pathlib.Path | List of source files. | required |
| tar_files | list of pathlib.Path | List of target files. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If the number of files in source and target folders is not the same. |
| ValueError | If some filenames in source and target folders are not the same. |
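
The two checks can be sketched as below. The helper name `check_pairs` is hypothetical, and comparing the names position by position is an assumption; the library may compare the two sets of names instead.

```python
from pathlib import Path


def check_pairs(src_files: list[Path], tar_files: list[Path]) -> None:
    """Hypothetical helper: raise ValueError if the two lists do not pair up."""
    if len(src_files) != len(tar_files):
        raise ValueError(
            f"Got {len(src_files)} source files but {len(tar_files)} target files."
        )
    # Compare file names only, so the files may live in different folders.
    mismatched = [
        (s.name, t.name) for s, t in zip(src_files, tar_files) if s.name != t.name
    ]
    if mismatched:
        raise ValueError(f"Filenames do not match: {mismatched}")


src = [Path("train/img_0.tif"), Path("train/img_1.tif")]
tar = [Path("target/img_0.tif"), Path("target/img_2.tif")]
try:
    check_pairs(src, tar)
    error = None
except ValueError as exc:
    error = str(exc)
```

Here `error` holds the mismatch message, because the second pair has different file names.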