# MicroSplit Data Module

MicroSplit data module for training and validation.

## MicroSplitDataModule

Bases: `LightningDataModule`

Lightning DataModule for MicroSplit-style datasets.

Matches the interface of `TrainDataModule`, but internally uses the original MicroSplit dataset logic.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `data_config` | `MicroSplitDataConfig` | Configuration for the MicroSplit dataset. | required |
| `train_data` | `str` | Path to the training data directory. | required |
| `val_data` | `str` | Path to the validation data directory. | `None` |
| `train_data_target` | `str` | Path to the training target data. | `None` |
| `val_data_target` | `str` | Path to the validation target data. | `None` |
| `read_source_func` | `Callable` | Function used to read the source data. | `None` |
| `extension_filter` | `str` | File extension filter. | `''` |
| `val_percentage` | `float` | Percentage of the training data to use for validation. | `0.1` |
| `val_minimum_split` | `int` | Minimum number of samples in the validation split. | `5` |
| `use_in_memory` | `bool` | Whether to use an in-memory dataset. | `True` |
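How `val_percentage` and `val_minimum_split` might interact is sketched below; the exact rule is an assumption for illustration, not the library's actual code:

```python
def n_validation_samples(n_total: int, val_percentage: float = 0.1,
                         val_minimum_split: int = 5) -> int:
    """Illustrative split rule (assumed, not the library's implementation):
    reserve a fraction of the data for validation, but never fewer than
    ``val_minimum_split`` samples, and never more than the dataset holds."""
    n_val = int(n_total * val_percentage)
    return min(max(n_val, val_minimum_split), n_total)
```

With 100 samples the default 10% yields 10 validation samples; with 20 samples the raw 10% (2 samples) is bumped up to the minimum of 5.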

### get_data_stats()

Get data statistics.

Returns:

| Type | Description |
|------|-------------|
| `tuple[dict, dict]` | A tuple of two dictionaries: `data_mean` (mean values for input and target) and `data_std` (standard deviation values for input and target). |
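As an illustration of the statistics described above, a minimal sketch with made-up array shapes (not the library's implementation):

```python
import numpy as np

def compute_data_stats(inputs: np.ndarray, targets: np.ndarray):
    """Compute mean/std dictionaries mirroring the (data_mean, data_std)
    tuple described above. Sketch only: keys and per-array reduction are
    assumptions for illustration."""
    data_mean = {"input": inputs.mean(), "target": targets.mean()}
    data_std = {"input": inputs.std(), "target": targets.std()}
    return data_mean, data_std

rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 2, 64, 64))   # e.g. (S, C, H, W)
targets = rng.normal(size=(4, 2, 64, 64))
mean, std = compute_data_stats(inputs, targets)
```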

### train_dataloader()

Create a dataloader for training.

Returns:

| Type | Description |
|------|-------------|
| `DataLoader` | Training dataloader. |

### val_dataloader()

Create a dataloader for validation.

Returns:

| Type | Description |
|------|-------------|
| `DataLoader` | Validation dataloader. |

## MicroSplitPredictDataModule

Bases: `LightningDataModule`

Lightning DataModule for MicroSplit-style prediction datasets.

Matches the interface of `PredictDataModule`, but internally uses MicroSplit dataset logic for prediction.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `pred_config` | `MicroSplitDataConfig` | Configuration for MicroSplit prediction. | required |
| `pred_data` | `str`, `Path` or `ndarray` | Prediction data: a path to a folder, a path to a file, or a NumPy array. | required |
| `read_source_func` | `Callable` | Function used to read custom data types. | `None` |
| `extension_filter` | `str` | File extension filter for custom types. | `''` |
| `dataloader_params` | `dict` | Dataloader parameters. | `None` |

### predict_dataloader()

Create a dataloader for prediction.

Returns:

| Type | Description |
|------|-------------|
| `DataLoader` | Prediction dataloader. |

## create_microsplit_predict_datamodule

`create_microsplit_predict_datamodule(pred_data, tile_size, batch_size=1, num_channels=2, depth3D=1, grid_size=None, multiscale_count=None, data_stats=None, tiling_mode=TilingMode.ShiftBoundary, read_source_func=None, extension_filter='', dataloader_params=None, **dataset_kwargs)`

Create a `MicroSplitPredictDataModule` for MicroSplit-style prediction datasets.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `pred_data` | `str`, `Path` or `ndarray` | Prediction data: a path to a folder, a path to a file, or a NumPy array. | required |
| `tile_size` | `tuple` | Size of one tile of data. | required |
| `batch_size` | `int` | Batch size for the prediction dataloader. | `1` |
| `num_channels` | `int` | Number of channels in the input. | `2` |
| `depth3D` | `int` | Number of slices in 3D. | `1` |
| `grid_size` | `tuple` | Grid size for patch extraction. | `None` |
| `multiscale_count` | `int` | Number of lateral context (LC) scales. | `None` |
| `data_stats` | `tuple` | Data statistics. | `None` |
| `tiling_mode` | `TilingMode` | Tiling mode for patch extraction. | `TilingMode.ShiftBoundary` |
| `read_source_func` | `Callable` | Function used to read the source data. | `None` |
| `extension_filter` | `str` | File extension filter. | `''` |
| `dataloader_params` | `dict` | Parameters for the prediction dataloader. | `None` |
| `**dataset_kwargs` | | Additional arguments passed to `MicroSplitDataConfig`. | `{}` |

Returns:

| Type | Description |
|------|-------------|
| `MicroSplitPredictDataModule` | Configured `MicroSplitPredictDataModule` instance. |

## create_microsplit_train_datamodule

`create_microsplit_train_datamodule(train_data, patch_size, batch_size, val_data=None, num_channels=2, depth3D=1, grid_size=None, multiscale_count=None, tiling_mode=TilingMode.ShiftBoundary, extension_filter='', val_percentage=0.1, val_minimum_split=5, use_in_memory=True, transforms=None, train_dataloader_params=None, val_dataloader_params=None, **dataset_kwargs)`

Create a `MicroSplitDataModule` for MicroSplit-style datasets.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `train_data` | `str` | Path to the training data. | required |
| `patch_size` | `tuple` | Size of one patch of data. | required |
| `batch_size` | `int` | Batch size for the dataloaders. | required |
| `val_data` | `str` | Path to the validation data. | `None` |
| `num_channels` | `int` | Number of channels in the input. | `2` |
| `depth3D` | `int` | Number of slices in 3D. | `1` |
| `grid_size` | `tuple` | Grid size for patch extraction. | `None` |
| `multiscale_count` | `int` | Number of lateral context (LC) scales. | `None` |
| `tiling_mode` | `TilingMode` | Tiling mode for patch extraction. | `TilingMode.ShiftBoundary` |
| `extension_filter` | `str` | File extension filter. | `''` |
| `val_percentage` | `float` | Percentage of the training data to use for validation. | `0.1` |
| `val_minimum_split` | `int` | Minimum number of patches/files in the validation split. | `5` |
| `use_in_memory` | `bool` | Use an in-memory dataset if possible. | `True` |
| `transforms` | `list` | List of transforms to apply. | `None` |
| `train_dataloader_params` | `dict` | Parameters for the training dataloader. | `None` |
| `val_dataloader_params` | `dict` | Parameters for the validation dataloader. | `None` |
| `**dataset_kwargs` | | Additional arguments passed to `DatasetConfig`. | `{}` |

Returns:

| Type | Description |
|------|-------------|
| `MicroSplitDataModule` | Configured `MicroSplitDataModule` instance. |

## get_datasplit_tuples

`get_datasplit_tuples(val_fraction, test_fraction, data_length)`

Get train/val/test indices for data splitting.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `val_fraction` | `float` or `None` | Fraction of data to use for validation. | required |
| `test_fraction` | `float` or `None` | Fraction of data to use for testing. | required |
| `data_length` | `int` | Total length of the dataset. | required |

Returns:

| Type | Description |
|------|-------------|
| `tuple[ndarray, ndarray, ndarray]` | Training, validation, and test indices. |
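A minimal sketch of such a three-way index split. It assumes contiguous blocks for simplicity; the library's actual strategy may shuffle or interleave indices:

```python
import numpy as np

def datasplit_tuples_sketch(val_fraction, test_fraction, data_length):
    """Partition range(data_length) into contiguous val/test/train index
    arrays according to the given fractions (illustrative sketch only)."""
    indices = np.arange(data_length)
    n_val = int(data_length * (val_fraction or 0.0))
    n_test = int(data_length * (test_fraction or 0.0))
    val_idx = indices[:n_val]
    test_idx = indices[n_val:n_val + n_test]
    train_idx = indices[n_val + n_test:]
    return train_idx, val_idx, test_idx
```

For example, `datasplit_tuples_sketch(0.1, 0.1, 100)` yields 80 training, 10 validation, and 10 test indices that together cover the whole dataset.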

## get_train_val_data

`get_train_val_data(data_config, datadir, datasplit_type, val_fraction=None, test_fraction=None, allow_generation=None, **kwargs)`

Load and split data according to configuration.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `data_config` | `MicroSplitDataConfig` | Data configuration object. | required |
| `datadir` | `str` or `Path` | Path to the data directory. | required |
| `datasplit_type` | `DataSplitType` | Type of data split to return. | required |
| `val_fraction` | `float` | Fraction of data to use for validation. | `None` |
| `test_fraction` | `float` | Fraction of data to use for testing. | `None` |
| `allow_generation` | `bool` | Whether to allow data generation. | `None` |
| `**kwargs` | | Additional keyword arguments. | `{}` |

Returns:

| Type | Description |
|------|-------------|
| `ndarray` | Split data array. |

## load_data

`load_data(datadir)`

Load data from a directory containing channel subdirectories with image files.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `datadir` | `str` or `Path` | Path to the data directory containing channel subdirectories. | required |

Returns:

| Type | Description |
|------|-------------|
| `ndarray` | Stacked array of all channels' data. |
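The directory layout implied above (one subdirectory per channel, each holding matching image files) can be loaded along these lines. This sketch assumes `.npy` files read via `numpy.load` and an output shape of `(n_files, n_channels, H, W)`, both of which are illustrative assumptions rather than the library's behavior:

```python
from pathlib import Path
import numpy as np

def load_data_sketch(datadir):
    """Load every file in each channel subdirectory and stack the channels
    into one array of shape (n_files, n_channels, H, W). Sketch only:
    assumes .npy files and identically ordered files per channel."""
    channel_dirs = sorted(p for p in Path(datadir).iterdir() if p.is_dir())
    channels = []
    for ch_dir in channel_dirs:
        files = sorted(ch_dir.glob("*.npy"))
        channels.append(np.stack([np.load(f) for f in files]))
    # Stack the per-channel arrays along a new channel axis.
    return np.stack(channels, axis=1)
```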

## load_one_file

`load_one_file(fpath)`

Load a single 2D image file.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `fpath` | `str` or `Path` | Path to the image file. | required |

Returns:

| Type | Description |
|------|-------------|
| `ndarray` | Reshaped image data. |
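A sketch of loading and reshaping a single 2D image. The leading sample axis is one common convention; both it and the `.npy` format are assumptions here, not taken from the library:

```python
import numpy as np

def load_one_file_sketch(fpath):
    """Load a single 2D image and add a leading sample axis,
    e.g. (H, W) -> (1, H, W). Sketch only."""
    img = np.load(fpath)
    if img.ndim != 2:
        raise ValueError(f"expected a 2D image, got shape {img.shape}")
    return img[np.newaxis, ...]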