
Microsplit Data Module


MicroSplit data module for training and validation.

MicroSplitDataModule

Bases: LightningDataModule

Lightning DataModule for MicroSplit-style datasets.

Matches the interface of TrainDataModule but internally uses the original MicroSplit dataset logic.

Parameters:

  • data_config (MicroSplitDataConfig) –

    Configuration for the MicroSplit dataset.

  • train_data (str) –

    Path to training data directory.

  • val_data (str, default: None ) –

    Path to validation data directory.

  • train_data_target (str, default: None ) –

    Path to training target data.

  • val_data_target (str, default: None ) –

    Path to validation target data.

  • read_source_func (Callable, default: None ) –

    Function to read source data.

  • extension_filter (str, default: '' ) –

    File extension filter.

  • val_percentage (float, default: 0.1 ) –

    Fraction of the training data to use for validation (0.1 means 10%), by default 0.1.

  • val_minimum_split (int, default: 5 ) –

    Minimum number of samples for validation split, by default 5.

  • use_in_memory (bool, default: True ) –

    Whether to use in-memory dataset, by default True.
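The interplay between val_percentage and val_minimum_split can be illustrated with a small helper. This is a minimal sketch, assuming (as the parameter descriptions suggest) that val_percentage is a fraction of the training samples and that val_minimum_split acts as a lower bound on the validation size. The helper name n_validation_samples and the "keep at least one training sample" check are hypothetical and not part of the library.

```python
def n_validation_samples(
    n_samples: int, val_percentage: float = 0.1, val_minimum_split: int = 5
) -> int:
    """Hypothetical helper: number of samples held out for validation.

    Assumes val_percentage is a fraction in [0, 1] and that
    val_minimum_split is a floor on the validation size.
    """
    if not 0.0 <= val_percentage <= 1.0:
        raise ValueError("val_percentage must be a fraction between 0 and 1")
    # Take the larger of the fractional split and the minimum split.
    n_val = max(int(n_samples * val_percentage), val_minimum_split)
    # Sketch choice: refuse splits that would leave nothing to train on.
    if n_val >= n_samples:
        raise ValueError(
            f"Cannot hold out {n_val} of {n_samples} samples for validation"
        )
    return n_val


print(n_validation_samples(200))  # 20: 10% of 200 exceeds the minimum of 5
print(n_validation_samples(30))   # 5: 10% of 30 is 3, so the minimum applies
```

With the defaults, small datasets are governed by val_minimum_split and large ones by val_percentage.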

__init__(data_config, train_data, val_data=None, train_data_target=None, val_data_target=None, read_source_func=None, extension_filter='', val_percentage=0.1, val_minimum_split=5, use_in_memory=True)

Initialize MicroSplitDataModule.

get_data_stats()

Get data statistics.

Returns:

  • tuple[dict, dict]

    A tuple containing two dictionaries:

      - data_mean: mean values for input and target.
      - data_std: standard deviation values for input and target.
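To illustrate the structure of the returned pair, the sketch below computes per-channel statistics for an input array and a target array and packs them into two dictionaries. The key names "input" and "target", the channel-last layout and the helper name compute_data_stats are assumptions for illustration. The library may key or shape these values differently.

```python
import numpy as np


def compute_data_stats(
    inputs: np.ndarray, targets: np.ndarray
) -> tuple[dict, dict]:
    """Hypothetical sketch: per-channel mean/std for inputs and targets.

    Both arrays are assumed to be channel-last, e.g. (N, H, W, C).
    """
    # Reduce over every axis except the last (channel) axis.
    in_axes = tuple(range(inputs.ndim - 1))
    tg_axes = tuple(range(targets.ndim - 1))
    data_mean = {
        "input": inputs.mean(axis=in_axes),
        "target": targets.mean(axis=tg_axes),
    }
    data_std = {
        "input": inputs.std(axis=in_axes),
        "target": targets.std(axis=tg_axes),
    }
    return data_mean, data_std


rng = np.random.default_rng(0)
targets = rng.normal(size=(8, 32, 32, 2))        # two target channels
inputs = targets.sum(axis=-1, keepdims=True)     # e.g. a superimposed input
mean, std = compute_data_stats(inputs, targets)
print(mean["input"].shape, mean["target"].shape)  # (1,) (2,)
```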

train_dataloader()

Create a dataloader for training.

Returns:

  • DataLoader

    Training dataloader.

val_dataloader()

Create a dataloader for validation.

Returns:

  • DataLoader

    Validation dataloader.

MicroSplitPredictDataModule

Bases: LightningDataModule

Lightning DataModule for MicroSplit-style prediction datasets.

Matches the interface of PredictDataModule, but internally uses MicroSplit dataset logic for prediction.

Parameters:

  • pred_config (MicroSplitDataConfig) –

    Configuration for MicroSplit prediction.

  • pred_data (str or Path or ndarray) –

    Prediction data, can be a path to a folder, a file or a numpy array.

  • read_source_func (Callable, default: None ) –

    Function to read custom types.

  • extension_filter (str, default: '' ) –

    Filter applied to file extensions when reading custom types.

  • dataloader_params (dict, default: None ) –

    Dataloader parameters.

__init__(pred_config, pred_data, read_source_func=None, extension_filter='', dataloader_params=None)

Constructor for the MicroSplit prediction data module.

Parameters:

  • pred_config (MicroSplitDataConfig) –

    Configuration for MicroSplit prediction.

  • pred_data (str or Path or ndarray) –

    Prediction data, can be a path to a folder, a file or a numpy array.

  • read_source_func (Callable, default: None ) –

    Function to read custom types, by default None.

  • extension_filter (str, default: '' ) –

    Filter applied to file extensions when reading custom types, by default "".

  • dataloader_params (dict, default: None ) –

    Dataloader parameters, by default {}.
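The signature default of None for dataloader_params, documented as "by default {}", points to the common pattern of replacing None with an empty dict before building the DataLoader. That pattern also avoids sharing a single mutable dict between calls. The sketch below illustrates it. The helper name resolve_dataloader_params and the default values batch_size=1 and shuffle=False are assumptions, not the library's actual defaults.

```python
from typing import Any, Optional


def resolve_dataloader_params(
    dataloader_params: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: merge user dataloader params over defaults."""
    # Assumed defaults; no shuffling so predictions stay in tile order.
    defaults: dict[str, Any] = {"batch_size": 1, "shuffle": False}
    # Copy the user's dict so the caller's object is never mutated.
    user_params = {} if dataloader_params is None else dict(dataloader_params)
    # User-supplied keys override the defaults.
    return {**defaults, **user_params}


print(resolve_dataloader_params())
# {'batch_size': 1, 'shuffle': False}
print(resolve_dataloader_params({"num_workers": 4}))
# {'batch_size': 1, 'shuffle': False, 'num_workers': 4}
```

The resulting dict would typically be unpacked into a torch.utils.data.DataLoader call together with the dataset.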

predict_dataloader()

Create a dataloader for prediction.

Returns:

  • DataLoader

    Prediction dataloader.

create_microsplit_predict_datamodule(pred_data, tile_size, batch_size=1, num_channels=2, depth3D=1, grid_size=None, multiscale_count=None, data_stats=None, tiling_mode=TilingMode.ShiftBoundary, read_source_func=None, extension_filter='', dataloader_params=None, **dataset_kwargs)

Create a MicroSplitPredictDataModule for MicroSplit-style prediction datasets.

Parameters:

  • pred_data (str or Path or ndarray) –

    Prediction data, can be a path to a folder, a file or a numpy array.

  • tile_size (tuple) –

    Size of one tile of data.

  • batch_size (int, default: 1 ) –

    Batch size for prediction dataloader.

  • num_channels (int, default: 2 ) –

    Number of channels in the input.

  • depth3D (int, default: 1 ) –

    Number of slices in 3D.

  • grid_size (tuple, default: None ) –

    Grid size for patch extraction.

  • multiscale_count (int, default: None ) –

    Number of lateral contextualization (LC) scales.

  • data_stats (tuple, default: None ) –

    Data statistics (means and standard deviations), by default None.

  • tiling_mode (TilingMode, default: ShiftBoundary ) –

    Tiling mode for patch extraction.

  • read_source_func (Callable, default: None ) –

    Function to read the source data.

  • extension_filter (str, default: '' ) –

    File extension filter.

  • dataloader_params (dict, default: None ) –

    Parameters for prediction dataloader.

  • **dataset_kwargs

    Additional arguments passed to MicroSplitDataConfig.
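The sketch below makes tile_size, grid_size and tiling_mode more concrete by computing tile positions along a single axis. It assumes that each tile is centred on a grid cell of size grid_size (the region whose prediction is kept when stitching), with the remaining pixels serving as context. It also assumes that under ShiftBoundary a tile that would cross the image edge is shifted back inside instead of being padded. This is a simplified one-dimensional illustration under those assumptions, not the library's tiling code, and tile_starts_1d is a hypothetical name.

```python
def tile_starts_1d(
    length: int, tile_size: int, grid_size: int
) -> list[tuple[int, int]]:
    """Hypothetical sketch: (grid_start, tile_start) pairs along one axis.

    Each tile is centred on a grid cell and then shifted, if needed, so
    that it lies entirely within [0, length).
    """
    if not 0 < grid_size <= tile_size <= length:
        raise ValueError("expected 0 < grid_size <= tile_size <= length")
    margin = (tile_size - grid_size) // 2  # context on each side of the cell
    positions = []
    for grid_start in range(0, length, grid_size):
        tile_start = grid_start - margin
        # Shift instead of pad: clamp the tile to the image boundaries.
        tile_start = min(max(tile_start, 0), length - tile_size)
        positions.append((grid_start, tile_start))
    return positions


print(tile_starts_1d(length=10, tile_size=6, grid_size=4))
# [(0, 0), (4, 3), (8, 4)]
```

The first and last tiles are shifted inward, so every tile contains only real image pixels while still covering its grid cell.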

Returns:

  • MicroSplitPredictDataModule

    Configured MicroSplit prediction data module.

create_microsplit_train_datamodule(train_data, patch_size, batch_size, val_data=None, num_channels=2, depth3D=1, grid_size=None, multiscale_count=None, tiling_mode=TilingMode.ShiftBoundary, extension_filter='', val_percentage=0.1, val_minimum_split=5, use_in_memory=True, transforms=None, train_dataloader_params=None, val_dataloader_params=None, **dataset_kwargs)

Create a MicroSplitDataModule for MicroSplit-style datasets.

Parameters:

  • train_data (str) –

    Path to training data.

  • patch_size (tuple) –

    Size of one patch of data.

  • batch_size (int) –

    Batch size for dataloaders.

  • val_data (str, default: None ) –

    Path to validation data.

  • num_channels (int, default: 2 ) –

    Number of channels in the input.

  • depth3D (int, default: 1 ) –

    Number of slices in 3D.

  • grid_size (tuple, default: None ) –

    Grid size for patch extraction.

  • multiscale_count (int, default: None ) –

    Number of lateral contextualization (LC) scales.

  • tiling_mode (TilingMode, default: ShiftBoundary ) –

    Tiling mode for patch extraction.

  • extension_filter (str, default: '' ) –

    File extension filter.

  • val_percentage (float, default: 0.1 ) –

    Fraction of the training data to use for validation (0.1 means 10%).

  • val_minimum_split (int, default: 5 ) –

    Minimum number of patches/files for validation split.

  • use_in_memory (bool, default: True ) –

    Use in-memory dataset if possible.

  • transforms (list, default: None ) –

    List of transforms to apply.

  • train_dataloader_params (dict, default: None ) –

    Parameters for training dataloader.

  • val_dataloader_params (dict, default: None ) –

    Parameters for validation dataloader.

  • **dataset_kwargs

    Additional arguments passed to DatasetConfig.
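multiscale_count is understood here as the number of lateral contextualization (LC) inputs used by MicroSplit-style models: each patch is accompanied by progressively lower-resolution views of a wider neighbourhood around the same centre. The sketch below builds such a stack with NumPy under simplifying assumptions: 2x average pooling per scale, crops centred on the same point and zero padding at the borders. It illustrates the idea and is not the library's implementation. avg_pool2 and lc_stack are hypothetical names.

```python
import numpy as np


def avg_pool2(img: np.ndarray) -> np.ndarray:
    """2x2 average pooling of a 2D image (odd trailing rows/cols dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def lc_stack(
    img: np.ndarray, center: tuple[int, int], patch_size: int, multiscale_count: int
) -> np.ndarray:
    """Hypothetical sketch: stack of multiscale_count crops around center.

    Scale k crops patch_size x patch_size pixels from the image downsampled
    by 2**k, so each level covers a 2**k times wider field of view.
    """
    crops = []
    level = img.astype(float)
    cy, cx = center
    half = patch_size // 2
    for k in range(multiscale_count):
        if k > 0:
            level = avg_pool2(level)
        # Centre coordinates in the downsampled image.
        ly, lx = cy // 2**k, cx // 2**k
        # Zero-pad so crops near the border keep the full patch size;
        # padded index ly corresponds to original index ly - half.
        padded = np.pad(level, half)
        crops.append(padded[ly:ly + patch_size, lx:lx + patch_size])
    return np.stack(crops)


img = np.arange(64 * 64).reshape(64, 64)
stack = lc_stack(img, center=(32, 32), patch_size=16, multiscale_count=3)
print(stack.shape)  # (3, 16, 16)
```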

Returns:

  • MicroSplitDataModule

    Configured MicroSplit training data module.

get_datasplit_tuples(val_fraction, test_fraction, data_length)

Get train/val/test indices for data splitting.

Parameters:

  • val_fraction (float or None) –

    Fraction of data to use for validation.

  • test_fraction (float or None) –

    Fraction of data to use for testing.

  • data_length (int) –

    Total length of the dataset.

Returns:

  • tuple[ndarray, ndarray, ndarray]

    Training, validation, and test indices.
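A minimal sketch of such a split follows. It assumes a deterministic, contiguous split (training indices first, then validation, then test) and treats a fraction of None as "no samples". The actual ordering and rounding used by get_datasplit_tuples may differ, and datasplit_indices is a hypothetical name.

```python
from typing import Optional

import numpy as np


def datasplit_indices(
    val_fraction: Optional[float], test_fraction: Optional[float], data_length: int
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Hypothetical sketch of a contiguous train/val/test index split."""
    val_fraction = val_fraction or 0.0    # None -> no validation samples
    test_fraction = test_fraction or 0.0  # None -> no test samples
    n_val = int(data_length * val_fraction)
    n_test = int(data_length * test_fraction)
    n_train = data_length - n_val - n_test
    if n_train <= 0:
        raise ValueError("val_fraction + test_fraction leave no training data")
    indices = np.arange(data_length)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = datasplit_indices(0.2, 0.1, 10)
print(train_idx, val_idx, test_idx)  # [0 1 2 3 4 5 6] [7 8] [9]
```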

get_train_val_data(data_config, datadir, datasplit_type, val_fraction=None, test_fraction=None, allow_generation=None, **kwargs)

Load and split data according to configuration.

Parameters:

  • data_config (MicroSplitDataConfig) –

    Data configuration object.

  • datadir (str or Path) –

    Path to the data directory.

  • datasplit_type (DataSplitType) –

    Type of data split to return.

  • val_fraction (float, default: None ) –

    Fraction of data to use for validation.

  • test_fraction (float, default: None ) –

    Fraction of data to use for testing.

  • allow_generation (bool, default: None ) –

    Whether to allow data generation.

  • **kwargs

    Additional keyword arguments.

Returns:

  • ndarray

    Split data array.
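Returning one split from the loaded data amounts to indexing along the sample axis with the indices of the requested split. The sketch below assumes a sample-first layout and a DataSplitType with All, Train, Val and Test members. The enum defined here is a hypothetical stand-in for the library's own, and select_split is a hypothetical name.

```python
from enum import Enum

import numpy as np


class DataSplitType(Enum):
    """Hypothetical stand-in for the library's DataSplitType enum."""

    All = 0
    Train = 1
    Val = 2
    Test = 3


def select_split(
    data: np.ndarray,
    datasplit_type: DataSplitType,
    train_idx: np.ndarray,
    val_idx: np.ndarray,
    test_idx: np.ndarray,
) -> np.ndarray:
    """Return the samples (axis 0) belonging to the requested split."""
    if datasplit_type is DataSplitType.All:
        return data
    indices = {
        DataSplitType.Train: train_idx,
        DataSplitType.Val: val_idx,
        DataSplitType.Test: test_idx,
    }[datasplit_type]
    return data[indices]


data = np.arange(10 * 2 * 2).reshape(10, 2, 2)  # (N, H, W) toy stack
idx = np.arange(10)
val_data = select_split(data, DataSplitType.Val, idx[:7], idx[7:9], idx[9:])
print(val_data.shape)  # (2, 2, 2)
```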

load_data(datadir)

Load data from a directory containing channel subdirectories with image files.

Parameters:

  • datadir (str or Path) –

    Path to the data directory containing channel subdirectories.

Returns:

  • ndarray

    Stacked array of all channels' data.
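The directory layout that load_data expects can be sketched as follows. The sketch assumes one subdirectory per channel, the same number of equally shaped 2D files in each, and files matched across channels by sorted file name, stacked into an (N, H, W, C) array. To stay dependency-free it reads .npy files with np.load through a pluggable read_fn. The real function reads image files, and load_channel_dirs, read_fn and pattern are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Union

import numpy as np


def load_channel_dirs(
    datadir: Union[str, Path],
    read_fn: Callable[[Path], np.ndarray] = np.load,
    pattern: str = "*.npy",
) -> np.ndarray:
    """Hypothetical sketch: stack per-channel subdirectories into one array.

    Returns an array of shape (N, H, W, C), with channels ordered by the
    sorted subdirectory names.
    """
    channel_dirs = sorted(p for p in Path(datadir).iterdir() if p.is_dir())
    if not channel_dirs:
        raise FileNotFoundError(f"No channel subdirectories in {datadir}")
    channels = []
    for cdir in channel_dirs:
        files = sorted(cdir.glob(pattern))
        # (N, H, W) stack of all images for this channel.
        channels.append(np.stack([read_fn(f) for f in files]))
    # Put channels last: (N, H, W, C).
    return np.stack(channels, axis=-1)
```

For example, a datadir with subdirectories ch0/ and ch1/, each holding three 4x5 images, yields an array of shape (3, 4, 5, 2).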

load_one_file(fpath)

Load a single 2D image file.

Parameters:

  • fpath (str or Path) –

    Path to the image file.

Returns:

  • ndarray

    Reshaped image data.
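The target shape of the "reshaped" data is not stated. A common convention is to give every image a leading sample axis so that single images and stacks can be concatenated uniformly. The sketch below shows that convention, assuming a 2D (H, W) image becomes (1, H, W). It is not confirmed to be what load_one_file returns, and reshape_to_stack is a hypothetical name.

```python
import numpy as np


def reshape_to_stack(image: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: give any image a leading sample axis.

    A 2D (H, W) image becomes (1, H, W); an (S, H, W) stack is unchanged,
    and extra leading axes are flattened into the sample axis.
    """
    if image.ndim < 2:
        raise ValueError(f"Expected at least 2 dimensions, got shape {image.shape}")
    return image.reshape(-1, *image.shape[-2:])


print(reshape_to_stack(np.zeros((4, 5))).shape)     # (1, 4, 5)
print(reshape_to_stack(np.zeros((3, 4, 5))).shape)  # (3, 4, 5)
```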