
Data Module Utils


Utility functions for resolving file and path inputs.

InputType = ArrayInput | PathInput (module attribute)

Type of input data passed to the dataset.

convert_paths_to_pathlib(input_data, target_data=None)

Create a list of file paths from the input and target data.

Parameters:

  • input_data (Sequence[str | Path]) –

    Input data, can be a path to a folder, or a list of paths.

  • target_data (Sequence[str | Path] | None, default: None ) –

    Target data, can be None, a path to a folder, or a list of paths.

Returns:

  • list[Path]

    A list of file paths for input data.

  • list[Path] | None

    A list of file paths for target data, or None if target_data is None.
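As a minimal sketch of the conversion, the function below assumes that every entry of input_data and target_data is already an individual file path, so it does not expand folders. The name convert_paths_to_pathlib_sketch is hypothetical, and this is not the library's implementation.

```python
from __future__ import annotations

from collections.abc import Sequence
from pathlib import Path


def convert_paths_to_pathlib_sketch(
    input_data: Sequence[str | Path],
    target_data: Sequence[str | Path] | None = None,
) -> tuple[list[Path], list[Path] | None]:
    """Hypothetical sketch: wrap each str/Path entry in a pathlib.Path."""
    input_files = [Path(p) for p in input_data]
    # Propagate None when no target data is given.
    target_files = None if target_data is None else [Path(p) for p in target_data]
    return input_files, target_files


# Usage: mixed str and Path entries become a uniform list of Path objects.
inputs, targets = convert_paths_to_pathlib_sketch(["a.tif", Path("b.tif")])
```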

initialize_data_pair(data_type, input_data, target_data=None, loading=None)

Initialize a pair of input and target data.

Parameters:

  • data_type (Literal['array', 'tiff', 'zarr', 'czi', 'custom']) –

    The type of data to initialize.

  • input_data (InputType) –

    Input data, can be None, a path to a folder, a list of paths, or a numpy array.

  • target_data (InputType | None, default: None ) –

    Target data, can be None, a path to a folder, a list of paths, or a numpy array.

  • loading (ReadFuncLoading | ImageStackLoading | None, default: None ) –

    The type of loading used for custom data. ReadFuncLoading uses a simple function that loads full images into memory. ImageStackLoading is for custom chunked or memory-mapped next-generation file formats, enabling single patches to be read from disk at a time. If the data type is not custom, loading should be None.

Returns:

  • list[ndarray] | list[Path]

    Initialized input data. For file paths, returns a list of Path objects. For numpy arrays, returns the arrays directly.

  • list[ndarray] | list[Path] | None

    Initialized target data. For file paths, returns a list of Path objects. For numpy arrays, returns the arrays directly. Returns None if target_data is None.
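The sketch below shows one plausible way to dispatch on data_type: arrays are returned as lists, and file-based types are normalized to pathlib.Path objects. It is a hypothetical illustration, not the library's code. The function name is invented, raising ValueError when loading is set for non-custom data is an assumption, and the expansion of folder inputs into file lists is omitted.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

import numpy as np


def initialize_data_pair_sketch(
    data_type: str,
    input_data: Any,
    target_data: Any = None,
    loading: Any = None,
) -> tuple[list, list | None]:
    """Hypothetical dispatch on data_type (not the library's implementation)."""
    # Assumption: passing a loading strategy for non-custom data is an error.
    if data_type != "custom" and loading is not None:
        raise ValueError("loading should be None unless data_type is 'custom'.")

    def as_list(data: Any) -> list:
        # Wrap a single array or path so the result is always a list.
        return list(data) if isinstance(data, (list, tuple)) else [data]

    if data_type == "array":
        inputs = as_list(input_data)
        targets = None if target_data is None else as_list(target_data)
        # Every element, input or target, must be a numpy array.
        for arr in inputs + (targets or []):
            if not isinstance(arr, np.ndarray):
                raise ValueError("Array data must be numpy arrays.")
        return inputs, targets

    # File-based types: normalize each entry to a pathlib.Path.
    # Folder inputs would also need to be expanded into file lists (omitted).
    inputs = [Path(p) for p in as_list(input_data)]
    targets = None if target_data is None else [Path(p) for p in as_list(target_data)]
    return inputs, targets
```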

list_files(data_path, data_type, extension_filter='')

Recursively list the files in data_path and return them as a sorted list.

If data_path is a file, its name is validated against data_type using fnmatch, and a list containing only data_path is returned.

If data_type is custom, all files are listed by default. Use extension_filter to narrow the selection.

extension_filter must be compatible with fnmatch and Path.rglob, e.g. ".npy" or ".czi".

Parameters:

  • data_path (Union[str, Path]) –

    Path to the folder containing the data, or to a single data file.

  • data_type (Union[str, SupportedData]) –

    One of the supported data types (e.g. tif, custom).

  • extension_filter (str, default: '' ) –

    Extension filter, by default "".

Returns:

  • list[Path]

    Sorted list of pathlib.Path objects.

Raises:

  • FileNotFoundError

    If the data path does not exist.

  • ValueError

    If the data path is empty or no files with the extension were found.

  • ValueError

    If the file does not match the requested extension.
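The listing logic described above can be sketched with Path.rglob and fnmatch. This is a sketch under stated assumptions, not the library's implementation. The PATTERNS mapping and the function name are hypothetical. Here, extension_filter is treated as a complete glob pattern (for example "*.npy") that overrides the default pattern for the data type.

```python
from __future__ import annotations

import fnmatch
from pathlib import Path

# Hypothetical mapping from data type to default glob pattern; the
# library's actual SupportedData patterns may differ.
PATTERNS = {"tiff": "*.tif*", "czi": "*.czi", "custom": "*"}


def list_files_sketch(
    data_path: str | Path, data_type: str, extension_filter: str = ""
) -> list[Path]:
    data_path = Path(data_path)
    if not data_path.exists():
        raise FileNotFoundError(f"Data path {data_path} does not exist.")

    # Assumption: a non-empty extension_filter replaces the default pattern.
    pattern = extension_filter or PATTERNS[data_type]

    # A single file is validated by name and returned on its own.
    if data_path.is_file():
        if not fnmatch.fnmatch(data_path.name, pattern):
            raise ValueError(f"File {data_path} does not match {pattern!r}.")
        return [data_path]

    # A folder is searched recursively; directories themselves are skipped.
    files = sorted(p for p in data_path.rglob(pattern) if p.is_file())
    if not files:
        raise ValueError(f"No files matching {pattern!r} found in {data_path}.")
    return files
```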

list_files_in_directory(data_type, input_data, target_data=None, extension_filter='')

List files from input and target directories.

Parameters:

  • data_type (Literal['tiff', 'zarr', 'czi', 'custom']) –

    The type of data to validate.

  • input_data (str | Path) –

    Input data, a path to the folder containing the input files.

  • target_data (str | Path | None, default: None ) –

    Target data, either None or a path to the folder containing the target files.

  • extension_filter (str, default: "" ) –

    File extension filter to apply when listing files.

Returns:

  • list[Path]

    A list of file paths for input data.

  • list[Path] | None

    A list of file paths for target data, or None if target_data is None.

validate_array_input(input_data, target_data)

Validate that the input data is a numpy array or a list of numpy arrays.

Parameters:

  • input_data (ArrayInput) –

    Input data, either a single numpy array or a list of numpy arrays.

  • target_data (ArrayInput | None) –

    Target data, either a single numpy array, a list of numpy arrays, or None.

Returns:

  • list[ndarray]

    Validated input data.

  • list[ndarray] | None

    Validated target data, None if the target data is None.

Raises:

  • ValueError

    If the input data is not a numpy array or a list of numpy arrays.
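A minimal sketch of this validation follows. It wraps a single array in a list and rejects anything that is not a numpy array or a list of numpy arrays. The function name is hypothetical, and the sketch may differ from the library's checks in edge cases such as empty lists.

```python
from __future__ import annotations

from typing import Any

import numpy as np


def validate_array_input_sketch(
    input_data: Any, target_data: Any
) -> tuple[list[np.ndarray], list[np.ndarray] | None]:
    """Hypothetical sketch: normalize array inputs to lists of arrays."""

    def to_list(data: Any, name: str) -> list[np.ndarray]:
        # A single array becomes a one-element list.
        if isinstance(data, np.ndarray):
            return [data]
        # A list is accepted only if every element is an array.
        if isinstance(data, list) and all(isinstance(a, np.ndarray) for a in data):
            return list(data)
        raise ValueError(f"{name} must be a numpy array or a list of numpy arrays.")

    inputs = to_list(input_data, "input_data")
    targets = None if target_data is None else to_list(target_data, "target_data")
    return inputs, targets
```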

validate_input_target_type_consistency(input_data, target_data)

Validate that the input and target data types are consistent.

Parameters:

  • input_data (InputType) –

    Input data, can be a path to a folder, a list of paths, or a numpy array.

  • target_data (InputType | None) –

    Target data, can be None, a path to a folder, a list of paths, or a numpy array.

Raises:

  • ValueError

    If the input and target data types are not consistent.
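One way to read "consistent" is that the input and target are both arrays or both paths. The sketch below implements that interpretation as an assumption, not the library's exact rule. The helper _data_kind and the function name are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

import numpy as np


def _data_kind(data: Any) -> str:
    """Hypothetical helper: classify data as 'array', 'path', or 'invalid'."""
    items = list(data) if isinstance(data, (list, tuple)) else [data]
    kinds = set()
    for item in items:
        if isinstance(item, np.ndarray):
            kinds.add("array")
        elif isinstance(item, (str, Path)):
            kinds.add("path")
        else:
            kinds.add("invalid")
    # Empty or mixed collections are treated as invalid.
    return kinds.pop() if len(kinds) == 1 else "invalid"


def validate_input_target_type_consistency_sketch(input_data: Any, target_data: Any) -> None:
    if target_data is None:
        return  # Nothing to compare against.
    input_kind, target_kind = _data_kind(input_data), _data_kind(target_data)
    if input_kind == "invalid" or input_kind != target_kind:
        raise ValueError(
            f"Inconsistent input ({input_kind}) and target ({target_kind}) data types."
        )
```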

validate_path_input(data_type, input_data, target_data, extension_filter='')

Validate that the input data is a path or a list of paths.

Parameters:

  • data_type (Literal['tiff', 'zarr', 'czi', 'custom']) –

    The type of data to validate.

  • input_data (PathInput) –

    Input data, can be a path to a folder or a list of paths.

  • target_data (PathInput | None) –

    Target data, can be None, a path to a folder, or a list of paths.

  • extension_filter (str, default: "" ) –

    File extension filter to apply when listing files.

Returns:

  • list[Path]

    A list of file paths for input data.

  • list[Path] | None

    A list of file paths for target data, or None if target_data is None.

Raises:

  • ValueError

    If the input data is not a path or a list of paths.
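The sketch below shows one plausible way to resolve path inputs. A single folder is expanded recursively, a single file path or a list of paths is converted to pathlib.Path objects, and anything else raises ValueError. The names _resolve_paths and validate_path_input_sketch are hypothetical, and the pattern parameter is a simplified stand-in for the data_type and extension_filter handling.

```python
from __future__ import annotations

from collections.abc import Sequence
from pathlib import Path
from typing import Any


def _resolve_paths(data: Any, name: str, pattern: str) -> list[Path]:
    # A single str/Path: expand folders recursively, keep files as-is.
    if isinstance(data, (str, Path)):
        path = Path(data)
        if path.is_dir():
            return sorted(p for p in path.rglob(pattern) if p.is_file())
        return [path]
    # A sequence whose elements are all str/Path.
    if isinstance(data, Sequence) and all(isinstance(p, (str, Path)) for p in data):
        return [Path(p) for p in data]
    raise ValueError(f"{name} must be a path or a list of paths.")


def validate_path_input_sketch(
    input_data: Any, target_data: Any, pattern: str = "*"
) -> tuple[list[Path], list[Path] | None]:
    inputs = _resolve_paths(input_data, "input_data", pattern)
    targets = (
        None if target_data is None else _resolve_paths(target_data, "target_data", pattern)
    )
    return inputs, targets
```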

validate_source_target_files(src_files, tar_files)

Validate source and target path lists.

The two lists should have the same number of files, and the filenames should match.

Parameters:

  • src_files (list of pathlib.Path) –

    List of source files.

  • tar_files (list of pathlib.Path) –

    List of target files.

Raises:

  • ValueError

    If the number of files in source and target folders is not the same.

  • ValueError

    If some filenames in the source and target folders do not match.
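The two checks described above, equal file counts and matching filenames, can be sketched as follows. The sketch assumes filenames are compared by their bare name (Path.name), ignoring parent folders and ordering. The function name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def validate_source_target_files_sketch(src_files: list[Path], tar_files: list[Path]) -> None:
    # Check 1: both lists must contain the same number of files.
    if len(src_files) != len(tar_files):
        raise ValueError(
            f"Source has {len(src_files)} files but target has {len(tar_files)}."
        )
    # Check 2: compare bare filenames, ignoring parent folders and ordering.
    src_names = {p.name for p in src_files}
    tar_names = {p.name for p in tar_files}
    if src_names != tar_names:
        mismatched = sorted(src_names ^ tar_names)
        raise ValueError(f"Filenames differ between source and target: {mismatched}")
```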

validate_zarr_input(input_data, target_data)

Validate that the input data corresponds to a zarr input.

Parameters:

  • input_data (PathInput) –

    Input data, can be a path to a folder, a path to a zarr file, a URI pointing to a zarr dataset, or a list of these.

  • target_data (PathInput | None) –

    Target data, can be None.

Returns:

  • list[str] or list[Path]

    A list of zarr URIs or paths for input data.

  • list[str] or list[Path] | None

    A list of zarr URIs or paths for target data, or None if target_data is None.

Raises:

  • ValueError

    If the input and target data types are not consistent.

  • ValueError

    If the input data is not a zarr URI or path, or a list of zarr URIs or paths.
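The sketch below illustrates the kind of normalization this function performs. It relies on two assumptions that may not match the library's rules. First, any string containing "://" (for example "s3://..." or "file://...") is treated as a URI and kept as a str, while other entries become pathlib.Path objects. Second, "consistent" means the inputs and targets are both URIs or both paths. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def _normalize_zarr_entry(entry: Any) -> str | Path:
    # Hypothetical heuristic: strings with a scheme are URIs, kept as str.
    if isinstance(entry, str) and "://" in entry:
        return entry
    if isinstance(entry, (str, Path)):
        return Path(entry)
    raise ValueError(f"Not a zarr URI or path: {entry!r}")


def validate_zarr_input_sketch(
    input_data: Any, target_data: Any
) -> tuple[list[str | Path], list[str | Path] | None]:
    def as_list(data: Any) -> list:
        return list(data) if isinstance(data, (list, tuple)) else [data]

    inputs = [_normalize_zarr_entry(e) for e in as_list(input_data)]
    if target_data is None:
        return inputs, None
    targets = [_normalize_zarr_entry(e) for e in as_list(target_data)]
    # Assumption: inputs and targets must both be URIs or both be paths.
    if {isinstance(e, str) for e in inputs} != {isinstance(e, str) for e in targets}:
        raise ValueError("Input and target must both be URIs or both be paths.")
    return inputs, targets
```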