Data Module Utils
Utility functions for resolving files and paths.
InputType = ArrayInput | PathInput
module-attribute
Type of input data passed to the dataset.
convert_paths_to_pathlib(input_data, target_data=None)
Create a list of file paths from the input and target data.
Parameters:
-
input_data(Sequence[str | Path]) –Input data, can be a path to a folder, or a list of paths.
-
target_data(Sequence[str | Path] | None, default:None) –Target data, can be None, a path to a folder, or a list of paths.
Returns:
initialize_data_pair(data_type, input_data, target_data=None, loading=None)
Initialize a pair of input and target data.
Parameters:
-
data_type(Literal['array', 'tiff', 'zarr', 'czi', 'custom']) –The type of data to initialize.
-
input_data(InputType) –Input data, can be a path to a folder, a list of paths, or a numpy array.
-
target_data(InputType | None, default:None) –Target data, can be None, a path to a folder, a list of paths, or a numpy array.
-
loading(ReadFuncLoading | ImageStackLoading | None, default:None) –The type of loading used for custom data.
ReadFuncLoading is the use of a simple function that will load full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats enabling single patches to be read from disk at a time. If the data type is not custom, loading should be None.
Returns:
-
list[ndarray] | list[Path]–Initialized input data. For file paths, returns a list of Path objects. For numpy arrays, returns the arrays directly.
-
list[ndarray] | list[Path] | None–Initialized target data. For file paths, returns a list of Path objects. For numpy arrays, returns the arrays directly. Returns None if target_data is None.
list_files(data_path, data_type, extension_filter='')
Recursively list files in data_path and return a sorted list.
If data_path is a file, its name is validated against the data_type using
fnmatch, and the method returns data_path itself.
By default, if data_type is equal to custom, all files will be listed. To
further filter the files, use extension_filter.
extension_filter must be compatible with fnmatch and Path.rglob, e.g. "*.npy"
or "*.czi".
Parameters:
-
data_path(Union[str, Path]) –Path to the folder containing the data.
-
data_type(Union[str, SupportedData]) –One of the supported data types (e.g. tiff, custom).
-
extension_filter(str, default:'') –Extension filter, by default "".
Returns:
Raises:
-
FileNotFoundError–If the data path does not exist.
-
ValueError–If the data path is empty or no files with the extension were found.
-
ValueError–If the file does not match the requested extension.
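The listing behaviour described for list_files can be approximated with the standard library alone. The sketch below is not the library's implementation: list_files_sketch and its pattern argument are hypothetical names, and it assumes the filter is a glob pattern such as "*.npy" that both fnmatch and Path.rglob accept.

```python
from fnmatch import fnmatch
from pathlib import Path
import tempfile


def list_files_sketch(data_path, pattern="*"):
    """Hypothetical stand-in: recursively list files matching a glob pattern."""
    data_path = Path(data_path)
    if not data_path.exists():
        raise FileNotFoundError(f"Data path {data_path} does not exist.")

    if data_path.is_file():
        # A single file is checked against the pattern and returned as-is.
        if not fnmatch(data_path.name, pattern):
            raise ValueError(f"File {data_path} does not match {pattern!r}.")
        return [data_path]

    # Path.rglob descends into subfolders; sorting makes the order deterministic.
    files = sorted(p for p in data_path.rglob(pattern) if p.is_file())
    if not files:
        raise ValueError(f"No files matching {pattern!r} found in {data_path}.")
    return files


# Build a small throwaway tree to exercise the sketch.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "b.npy").touch()
    (root / "sub" / "a.npy").touch()
    (root / "notes.txt").touch()
    found_names = [p.name for p in list_files_sketch(root, "*.npy")]
```

Here found_names holds only the two .npy files, including the one in the subfolder. The text file is filtered out by the pattern.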
list_files_in_directory(data_type, input_data, target_data=None, extension_filter='')
List files from input and target directories.
Parameters:
-
data_type(Literal['tiff', 'zarr', 'czi', 'custom']) –The type of data to validate.
-
input_data(str | Path) –Input data, a path to a folder.
-
target_data(str | Path | None, default:None) –Target data, can be None or a path to a folder.
-
extension_filter(str, default:"") –File extension filter to apply when listing files.
Returns:
validate_array_input(input_data, target_data)
Validate if the input data is a numpy array.
Parameters:
-
input_data(ArrayInput) –Input data, can be a single numpy array or a list of numpy arrays.
-
target_data(ArrayInput | None) –Target data, can be None, a single numpy array, or a list of numpy arrays.
Returns:
-
list[ndarray]–Validated input data.
-
list[ndarray] | None–Validated target data, None if the target data is None.
Raises:
-
ValueError–If the input data is not a numpy array or a list of numpy arrays.
validate_input_target_type_consistency(input_data, target_data)
Validate if the input and target data types are consistent.
Parameters:
-
input_data(InputType) –Input data, can be a path to a folder, a list of paths, or a numpy array.
-
target_data(InputType | None) –Target data, can be None, a path to a folder, a list of paths, or a numpy array.
Raises:
-
ValueError–If the input and target data types are not consistent.
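One plausible reading of "consistent" is that input and target must be the same kind of object: both paths, both lists of the same kind of element, or both array-like. The sketch below illustrates that reading only. kind_of and check_type_consistency are hypothetical helpers, treating str and Path as the same kind is an assumption, and a bytes object stands in for an array so that the example needs no third-party imports.

```python
from pathlib import Path


def kind_of(data):
    """Hypothetical classifier: 'path', 'list[...]', or the bare type name."""
    if isinstance(data, (str, Path)):
        return "path"
    if isinstance(data, (list, tuple)):
        # Describe a list by the kinds of its elements.
        inner = sorted({kind_of(item) for item in data})
        return "list[" + "|".join(inner) + "]"
    return type(data).__name__


def check_type_consistency(input_data, target_data=None):
    """Raise ValueError if input and target are not the same kind of data."""
    if target_data is None:
        return  # Nothing to compare against.
    input_kind, target_kind = kind_of(input_data), kind_of(target_data)
    if input_kind != target_kind:
        raise ValueError(
            f"Inconsistent types: input is {input_kind}, target is {target_kind}."
        )


# Consistent pairs pass silently.
check_type_consistency("data/train", Path("data/target"))
check_type_consistency(["a.tiff"], [Path("b.tiff")])
check_type_consistency("data/train", None)
```

A path paired with a list, or a path paired with array-like data, would raise ValueError under this reading.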
validate_path_input(data_type, input_data, target_data, extension_filter='')
Validate if the input data is a path or a list of paths.
Parameters:
-
data_type(Literal['tiff', 'zarr', 'czi', 'custom']) –The type of data to validate.
-
input_data(PathInput) –Input data, can be a path to a folder or a list of paths.
-
target_data(PathInput | None) –Target data, can be None, a path to a folder, or a list of paths.
-
extension_filter(str, default:"") –File extension filter to apply when listing files.
Returns:
-
list[Path]–A list of file paths for input data.
-
list[Path] | None–A list of file paths for target data, or None if target_data is None.
Raises:
-
ValueError–If the input data is not a path or a list of paths.
validate_source_target_files(src_files, tar_files)
Validate source and target path lists.
The two lists should have the same number of files, and the filenames should match.
Parameters:
-
src_files(list of pathlib.Path) –List of source files.
-
tar_files(list of pathlib.Path) –List of target files.
Raises:
-
ValueError–If the number of files in source and target folders is not the same.
-
ValueError–If some filenames in source and target folders are not the same.
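The two checks described above (equal file counts, matching filenames) can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: check_matching_files is a hypothetical name, and comparing only the final path component (Path.name) is an assumption about what "filenames should match" means.

```python
from pathlib import Path


def check_matching_files(src_files, tar_files):
    """Hypothetical check: same number of files and same set of filenames."""
    if len(src_files) != len(tar_files):
        raise ValueError(
            f"Found {len(src_files)} source files but {len(tar_files)} target files."
        )

    # Compare by filename only, ignoring the parent folders.
    src_names = {f.name for f in src_files}
    tar_names = {f.name for f in tar_files}
    if src_names != tar_names:
        mismatched = sorted(src_names ^ tar_names)
        raise ValueError(f"Filenames differ between source and target: {mismatched}")


sources = [Path("train/img_0.tiff"), Path("train/img_1.tiff")]
targets = [Path("target/img_0.tiff"), Path("target/img_1.tiff")]
check_matching_files(sources, targets)  # passes: same count, same names
```

Using sets makes the comparison independent of list order. A stricter variant could compare the sorted lists element by element instead.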
validate_zarr_input(input_data, target_data)
Validate if the input data corresponds to a zarr input.
Parameters:
-
input_data(PathInput) –Input data, can be a path to a folder or to a zarr file, a URI pointing to a zarr dataset, or a list of these.
-
target_data(PathInput | None) –Target data, can be None.
Returns:
-
list[str] or list[Path]–A list of zarr URIs or paths for input data.
-
list[str] or list[Path] | None–A list of zarr URIs or paths for target data, or None if target_data is None.
Raises:
-
ValueError–If the input and target data types are not consistent.
-
ValueError–If the input data is not a zarr URI or path, or a list of zarr URIs or paths.