MicroSplit Data Module
MicroSplit data module for training and validation.
MicroSplitDataModule
Bases: LightningDataModule
Lightning DataModule for MicroSplit-style datasets.
Matches the interface of TrainDataModule, but internally uses the original MicroSplit dataset logic.
Parameters:
- data_config (MicroSplitDataConfig) – Configuration for the MicroSplit dataset.
- train_data (str) – Path to the training data directory.
- val_data (str, default: None) – Path to the validation data directory.
- train_data_target (str, default: None) – Path to the training target data.
- val_data_target (str, default: None) – Path to the validation target data.
- read_source_func (Callable, default: None) – Function to read source data.
- extension_filter (str, default: '') – File extension filter.
- val_percentage (float, default: 0.1) – Fraction of the data to use for validation (0.1 means 10%).
- val_minimum_split (int, default: 5) – Minimum number of samples in the validation split.
- use_in_memory (bool, default: True) – Whether to use an in-memory dataset.
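The interaction between val_percentage and val_minimum_split can be pictured with a small sketch. This is an illustration only, not the library's implementation: the helper name validation_split_size, the rounding rule, and the cap at the dataset size are all assumptions.

```python
def validation_split_size(
    n_samples: int,
    val_percentage: float = 0.1,
    val_minimum_split: int = 5,
) -> int:
    """Hypothetical helper: how many samples to hold out for validation."""
    if not 0.0 <= val_percentage <= 1.0:
        raise ValueError("val_percentage must lie in [0, 1]")
    # Take the requested fraction of the data (rounded down; an assumption).
    n_val = int(n_samples * val_percentage)
    # Never go below the requested minimum split size.
    n_val = max(n_val, val_minimum_split)
    # Never hold out more samples than exist (also an assumption).
    return min(n_val, n_samples)


large = validation_split_size(200)  # the fraction dominates
small = validation_split_size(30)   # the minimum dominates
tiny = validation_split_size(3)     # capped at the dataset size
```

With the defaults, a dataset of 200 samples would hold out 20, while a dataset of 30 samples would hold out 5 because the minimum takes over.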
__init__(data_config, train_data, val_data=None, train_data_target=None, val_data_target=None, read_source_func=None, extension_filter='', val_percentage=0.1, val_minimum_split=5, use_in_memory=True)
Initialize MicroSplitDataModule.
Parameters:
- data_config (MicroSplitDataConfig) – Configuration for the MicroSplit dataset.
- train_data (str) – Path to the training data directory.
- val_data (str, default: None) – Path to the validation data directory.
- train_data_target (str, default: None) – Path to the training target data.
- val_data_target (str, default: None) – Path to the validation target data.
- read_source_func (Callable, default: None) – Function to read source data.
- extension_filter (str, default: '') – File extension filter.
- val_percentage (float, default: 0.1) – Fraction of the data to use for validation (0.1 means 10%).
- val_minimum_split (int, default: 5) – Minimum number of samples in the validation split.
- use_in_memory (bool, default: True) – Whether to use an in-memory dataset.
get_data_stats()
train_dataloader()
Create a dataloader for training.
Returns:
- DataLoader – Training dataloader.
val_dataloader()
Create a dataloader for validation.
Returns:
- DataLoader – Validation dataloader.
MicroSplitPredictDataModule
Bases: LightningDataModule
Lightning DataModule for MicroSplit-style prediction datasets.
Matches the interface of PredictDataModule, but internally uses MicroSplit dataset logic for prediction.
Parameters:
- pred_config (MicroSplitDataConfig) – Configuration for MicroSplit prediction.
- pred_data (str or Path or ndarray) – Prediction data; can be a path to a folder, a path to a file, or a numpy array.
- read_source_func (Callable, default: None) – Function to read custom data types.
- extension_filter (str, default: '') – File extension filter for custom data types.
- dataloader_params (dict, default: None) – Dataloader parameters.
__init__(pred_config, pred_data, read_source_func=None, extension_filter='', dataloader_params=None)
Constructor for MicroSplit prediction data module.
Parameters:
- pred_config (MicroSplitDataConfig) – Configuration for MicroSplit prediction.
- pred_data (str or Path or ndarray) – Prediction data; can be a path to a folder, a path to a file, or a numpy array.
- read_source_func (Callable, default: None) – Function to read custom data types.
- extension_filter (str, default: '') – File extension filter for custom data types.
- dataloader_params (dict, default: None) – Dataloader parameters.
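To make the roles of extension_filter and read_source_func more concrete, here is a minimal sketch of filtering a folder by extension and passing each match to a custom reader. The helper list_source_files and the toy reader read_file_size are hypothetical names. The signature the library expects for read_source_func is not stated here, so the one-argument reader below is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def list_source_files(
    data_path: Union[str, Path], extension_filter: str = ""
) -> List[Path]:
    """Hypothetical helper: files under data_path matching the extension."""
    path = Path(data_path)
    if path.is_file():
        return [path]
    # An empty filter yields the pattern "*", which matches every file.
    return sorted(p for p in path.glob(f"*{extension_filter}") if p.is_file())


def read_file_size(file_path: Path) -> int:
    """Toy stand-in for a custom reader: returns the file size in bytes."""
    return len(file_path.read_bytes())


# Demonstration on a temporary folder with two matching files and one other.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.tiff", "b.tiff", "notes.txt"):
        (Path(tmp) / name).write_text(name)
    files = list_source_files(tmp, ".tiff")
    names = [f.name for f in files]
    sizes = [read_file_size(f) for f in files]
```

A real reader would typically return an image array rather than a byte count; the byte count only keeps the sketch free of third-party dependencies.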
predict_dataloader()
Create a dataloader for prediction.
Returns:
- DataLoader – Prediction dataloader.
create_microsplit_predict_datamodule(pred_data, tile_size, batch_size=1, num_channels=2, depth3D=1, grid_size=None, multiscale_count=None, data_stats=None, tiling_mode=TilingMode.ShiftBoundary, read_source_func=None, extension_filter='', dataloader_params=None, **dataset_kwargs)
Create a MicroSplitPredictDataModule for MicroSplit-style prediction datasets.
Parameters:
- pred_data (str or Path or ndarray) – Prediction data; can be a path to a folder, a path to a file, or a numpy array.
- tile_size (tuple) – Size of one tile of data.
- batch_size (int, default: 1) – Batch size for the prediction dataloader.
- num_channels (int, default: 2) – Number of channels in the input.
- depth3D (int, default: 1) – Number of slices in 3D.
- grid_size (tuple, default: None) – Grid size for patch extraction.
- multiscale_count (int, default: None) – Number of lateral contextualization (LC) scales.
- data_stats (tuple, default: None) – Data statistics.
- tiling_mode (TilingMode, default: ShiftBoundary) – Tiling mode for patch extraction.
- read_source_func (Callable, default: None) – Function to read the source data.
- extension_filter (str, default: '') – File extension filter.
- dataloader_params (dict, default: None) – Parameters for the prediction dataloader.
- **dataset_kwargs – Additional arguments passed to MicroSplitDataConfig.
Returns:
- MicroSplitPredictDataModule – Configured MicroSplitPredictDataModule instance.
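The relationship between tile_size and grid_size is easier to see along a single axis. The sketch below assumes that tile_size is the extent of each tile and grid_size is the step between consecutive tiles, and it shifts the final tile inward so that it stays inside the image. It is a generic overlapping-tiling illustration with a hypothetical helper name, tile_starts, and is not claimed to reproduce what TilingMode.ShiftBoundary does internally.

```python
from typing import List


def tile_starts(length: int, tile_size: int, grid_size: int) -> List[int]:
    """Hypothetical helper: start offsets of tiles along one axis."""
    if grid_size <= 0:
        raise ValueError("grid_size must be positive")
    if tile_size > length:
        raise ValueError("tile_size must not exceed the axis length")
    # Regular grid: step by grid_size while a full tile still fits.
    starts = list(range(0, length - tile_size + 1, grid_size))
    # If the regular grid misses the end of the axis, add one extra tile
    # shifted inward so that it ends exactly at the boundary.
    last_start = length - tile_size
    if starts[-1] != last_start:
        starts.append(last_start)
    return starts


uneven = tile_starts(100, 64, 32)  # grid does not reach the boundary
even = tile_starts(128, 64, 32)    # grid lands exactly on the boundary
```

For a multi-dimensional tile_size tuple, the same computation would be applied to each axis independently.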
create_microsplit_train_datamodule(train_data, patch_size, batch_size, val_data=None, num_channels=2, depth3D=1, grid_size=None, multiscale_count=None, tiling_mode=TilingMode.ShiftBoundary, extension_filter='', val_percentage=0.1, val_minimum_split=5, use_in_memory=True, transforms=None, train_dataloader_params=None, val_dataloader_params=None, **dataset_kwargs)
Create a MicroSplitDataModule for MicroSplit-style datasets.
Parameters:
- train_data (str) – Path to the training data.
- patch_size (tuple) – Size of one patch of data.
- batch_size (int) – Batch size for the dataloaders.
- val_data (str, default: None) – Path to the validation data.
- num_channels (int, default: 2) – Number of channels in the input.
- depth3D (int, default: 1) – Number of slices in 3D.
- grid_size (tuple, default: None) – Grid size for patch extraction.
- multiscale_count (int, default: None) – Number of lateral contextualization (LC) scales.
- tiling_mode (TilingMode, default: ShiftBoundary) – Tiling mode for patch extraction.
- extension_filter (str, default: '') – File extension filter.
- val_percentage (float, default: 0.1) – Fraction of the training data to use for validation (0.1 means 10%).
- val_minimum_split (int, default: 5) – Minimum number of patches/files in the validation split.
- use_in_memory (bool, default: True) – Use an in-memory dataset if possible.
- transforms (list, default: None) – List of transforms to apply.
- train_dataloader_params (dict, default: None) – Parameters for the training dataloader.
- val_dataloader_params (dict, default: None) – Parameters for the validation dataloader.
- **dataset_kwargs – Additional arguments passed to DatasetConfig.
Returns:
- MicroSplitDataModule – Configured MicroSplitDataModule instance.
get_datasplit_tuples(val_fraction, test_fraction, data_length)
Get train/val/test indices for data splitting.
Parameters:
- val_fraction (float or None) – Fraction of data to use for validation.
- test_fraction (float or None) – Fraction of data to use for testing.
- data_length (int) – Total length of the dataset.
Returns:
- tuple[ndarray, ndarray, ndarray] – Training, validation, and test indices.
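As a rough illustration of index-based splitting, the sketch below partitions range(data_length) into three disjoint index lists. It is not the library's implementation: the function name split_indices, the seeded shuffle, the treatment of None as 0, and the use of plain lists instead of ndarrays are all assumptions made to keep the example dependency-free.

```python
import random
from typing import List, Optional, Tuple


def split_indices(
    val_fraction: Optional[float],
    test_fraction: Optional[float],
    data_length: int,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: disjoint train/val/test index lists."""
    # Treat a missing fraction as "no samples for this split" (assumption).
    val_fraction = val_fraction or 0.0
    test_fraction = test_fraction or 0.0
    if val_fraction + test_fraction > 1.0:
        raise ValueError("val_fraction + test_fraction must not exceed 1")

    # Shuffle reproducibly so the split does not depend on file order.
    indices = list(range(data_length))
    random.Random(seed).shuffle(indices)

    n_val = int(data_length * val_fraction)
    n_test = int(data_length * test_fraction)
    n_train = data_length - n_val - n_test

    train = sorted(indices[:n_train])
    val = sorted(indices[n_train:n_train + n_val])
    test = sorted(indices[n_train + n_val:])
    return train, val, test


train_idx, val_idx, test_idx = split_indices(0.1, 0.1, 100)
```

Every index lands in exactly one of the three lists, so the splits never share samples.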
get_train_val_data(data_config, datadir, datasplit_type, val_fraction=None, test_fraction=None, allow_generation=None, **kwargs)
Load and split data according to configuration.
Parameters:
- data_config (MicroSplitDataConfig) – Data configuration object.
- datadir (str or Path) – Path to the data directory.
- datasplit_type (DataSplitType) – Type of data split to return.
- val_fraction (float, default: None) – Fraction of data to use for validation.
- test_fraction (float, default: None) – Fraction of data to use for testing.
- allow_generation (bool, default: None) – Whether to allow data generation.
- **kwargs – Additional keyword arguments.
Returns:
- ndarray – Split data array.