MicroSplit Data Module
MicroSplit data module for training and validation.
MicroSplitDataModule
Bases: LightningDataModule
Lightning DataModule for MicroSplit-style datasets.
Matches the interface of TrainDataModule, but internally uses the original MicroSplit dataset logic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_config | MicroSplitDataConfig | Configuration for the MicroSplit dataset. | required |
| train_data | str | Path to the training data directory. | required |
| val_data | str | Path to the validation data directory. | None |
| train_data_target | str | Path to the training target data. | None |
| val_data_target | str | Path to the validation target data. | None |
| read_source_func | Callable | Function to read the source data. | None |
| extension_filter | str | File extension filter. | '' |
| val_percentage | float | Fraction of the data to use for validation (0.1 = 10%). | 0.1 |
| val_minimum_split | int | Minimum number of samples in the validation split. | 5 |
| use_in_memory | bool | Whether to use an in-memory dataset. | True |
get_data_stats()
train_dataloader()
Create a dataloader for training.
Returns:
| Type | Description |
|---|---|
| DataLoader | Training dataloader. |
val_dataloader()
Create a dataloader for validation.
Returns:
| Type | Description |
|---|---|
| DataLoader | Validation dataloader. |
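The parameter table does not say how val_percentage and val_minimum_split interact. A minimal sketch, assuming the validation split takes the larger of the two (a fraction of the samples, but never fewer than val_minimum_split) and that at least one sample must remain for training. The helper `validation_split_size` is hypothetical and not part of this module.

```python
def validation_split_size(
    n_samples: int, val_percentage: float = 0.1, val_minimum_split: int = 5
) -> int:
    """Hypothetical helper: number of samples reserved for validation.

    Assumption: the split is the larger of ``val_percentage * n_samples`` and
    ``val_minimum_split``, capped so at least one sample is left for training.
    """
    if n_samples < 2:
        raise ValueError("Need at least two samples to create a train/val split.")
    if not 0.0 <= val_percentage <= 1.0:
        raise ValueError("val_percentage must be a fraction in [0, 1].")
    n_val = max(int(n_samples * val_percentage), val_minimum_split)
    # Keep at least one sample for training (assumption).
    return min(n_val, n_samples - 1)


print(validation_split_size(200))  # 20: 10% of 200
print(validation_split_size(30))   # 5: 10% of 30 is 3, raised to the minimum
```

Under this assumption, the minimum only matters for small datasets, where a pure percentage would leave too few validation samples to be informative.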
MicroSplitPredictDataModule
Bases: LightningDataModule
Lightning DataModule for MicroSplit-style prediction datasets.
Matches the interface of PredictDataModule, but internally uses MicroSplit dataset logic for prediction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pred_config | MicroSplitDataConfig | Configuration for MicroSplit prediction. | required |
| pred_data | str or Path or ndarray | Prediction data: a path to a folder, a path to a file, or a NumPy array. | required |
| read_source_func | Callable | Function to read custom data types. | None |
| extension_filter | str | File extension used to filter files when reading custom types. | '' |
| dataloader_params | dict | Dataloader parameters. | None |
predict_dataloader()
Create a dataloader for prediction.
Returns:
| Type | Description |
|---|---|
| DataLoader | Prediction dataloader. |
create_microsplit_predict_datamodule(pred_data, tile_size, batch_size=1, num_channels=2, depth3D=1, grid_size=None, multiscale_count=None, data_stats=None, tiling_mode=TilingMode.ShiftBoundary, read_source_func=None, extension_filter='', dataloader_params=None, **dataset_kwargs)
Create a MicroSplitPredictDataModule for MicroSplit-style prediction datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pred_data | str or Path or ndarray | Prediction data: a path to a folder, a path to a file, or a NumPy array. | required |
| tile_size | tuple | Size of one tile of data. | required |
| batch_size | int | Batch size for the prediction dataloader. | 1 |
| num_channels | int | Number of channels in the input. | 2 |
| depth3D | int | Number of slices in 3D. | 1 |
| grid_size | tuple | Grid size for patch extraction. | None |
| multiscale_count | int | Number of lateral contextualization (LC) scales. | None |
| data_stats | tuple | Data statistics. | None |
| tiling_mode | TilingMode | Tiling mode for patch extraction. | TilingMode.ShiftBoundary |
| read_source_func | Callable | Function to read the source data. | None |
| extension_filter | str | File extension filter. | '' |
| dataloader_params | dict | Parameters for the prediction dataloader. | None |
| **dataset_kwargs | | Additional arguments passed to MicroSplitDataConfig. | {} |
Returns:
| Type | Description |
|---|---|
| MicroSplitPredictDataModule | Configured MicroSplitPredictDataModule instance. |
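The parameters tile_size, grid_size and tiling_mode are listed without describing how they interact. The sketch below shows one plausible reading along a single axis, assuming tiles of tile_size pixels are placed every grid_size pixels and that TilingMode.ShiftBoundary means the last tile is shifted inward so it ends exactly at the image border (rather than being padded or trimmed). The function `tile_starts_shift_boundary` is hypothetical; it is not the library's tiling code.

```python
def tile_starts_shift_boundary(length: int, tile_size: int, grid_size: int) -> list[int]:
    """Hypothetical 1D tiling: start index of each tile along one axis.

    Assumption: tiles of ``tile_size`` pixels are placed every ``grid_size``
    pixels, and the last tile is shifted back so it ends at the image boundary.
    """
    if tile_size > length:
        raise ValueError("tile_size must not exceed the image length.")
    if grid_size <= 0 or grid_size > tile_size:
        raise ValueError("grid_size must be in (0, tile_size].")
    starts = list(range(0, length - tile_size + 1, grid_size))
    last_start = length - tile_size
    # If the regular grid stops short of the border, add a tile shifted to end there.
    if starts[-1] != last_start:
        starts.append(last_start)
    return starts


print(tile_starts_shift_boundary(length=100, tile_size=64, grid_size=32))
# [0, 32, 36]
```

Shifting the last tile keeps every tile fully inside the image, so no padded pixels are fed to the network, at the cost of a larger overlap between the final two tiles.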
create_microsplit_train_datamodule(train_data, patch_size, batch_size, val_data=None, num_channels=2, depth3D=1, grid_size=None, multiscale_count=None, tiling_mode=TilingMode.ShiftBoundary, extension_filter='', val_percentage=0.1, val_minimum_split=5, use_in_memory=True, transforms=None, train_dataloader_params=None, val_dataloader_params=None, **dataset_kwargs)
Create a MicroSplitDataModule for MicroSplit-style datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_data | str | Path to the training data. | required |
| patch_size | tuple | Size of one patch of data. | required |
| batch_size | int | Batch size for the dataloaders. | required |
| val_data | str | Path to the validation data. | None |
| num_channels | int | Number of channels in the input. | 2 |
| depth3D | int | Number of slices in 3D. | 1 |
| grid_size | tuple | Grid size for patch extraction. | None |
| multiscale_count | int | Number of lateral contextualization (LC) scales. | None |
| tiling_mode | TilingMode | Tiling mode for patch extraction. | TilingMode.ShiftBoundary |
| extension_filter | str | File extension filter. | '' |
| val_percentage | float | Fraction of the training data to use for validation (0.1 = 10%). | 0.1 |
| val_minimum_split | int | Minimum number of patches or files in the validation split. | 5 |
| use_in_memory | bool | Whether to use an in-memory dataset, if possible. | True |
| transforms | list | List of transforms to apply. | None |
| train_dataloader_params | dict | Parameters for the training dataloader. | None |
| val_dataloader_params | dict | Parameters for the validation dataloader. | None |
| **dataset_kwargs | | Additional arguments passed to DatasetConfig. | {} |
Returns:
| Type | Description |
|---|---|
| MicroSplitDataModule | Configured MicroSplitDataModule instance. |
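The multiscale_count parameter sets the number of lateral contextualization (LC) scales. In μSplit, the method MicroSplit builds on, LC gives the network extra inputs that cover progressively larger fields of view around the same patch, each downsampled to the patch size. A minimal NumPy sketch for one 2D patch, assuming the field of view doubles at each scale, borders use reflect padding, and downsampling is average pooling. The function `lateral_context_stack` is hypothetical and only illustrates the idea.

```python
import numpy as np


def lateral_context_stack(
    image: np.ndarray, top: int, left: int, patch_size: int, multiscale_count: int
) -> np.ndarray:
    """Hypothetical LC input for one 2D patch.

    Scale k crops a window of ``patch_size * 2**k`` pixels centred on the patch
    and average-pools it by ``2**k``, so every scale has shape
    (patch_size, patch_size). Borders use reflect padding (an assumption).
    """
    center_y = top + patch_size // 2
    center_x = left + patch_size // 2
    max_half = patch_size * 2 ** (multiscale_count - 1) // 2
    padded = np.pad(image, max_half, mode="reflect")
    scales = []
    for k in range(multiscale_count):
        factor = 2**k
        size = patch_size * factor
        # Window start in padded coordinates, centred on the original patch centre.
        y0 = center_y + max_half - size // 2
        x0 = center_x + max_half - size // 2
        window = padded[y0 : y0 + size, x0 : x0 + size]
        # Average-pool by `factor` to bring the window back to patch_size.
        pooled = window.reshape(patch_size, factor, patch_size, factor).mean(axis=(1, 3))
        scales.append(pooled)
    return np.stack(scales)  # shape: (multiscale_count, patch_size, patch_size)


image = np.arange(128 * 128, dtype=float).reshape(128, 128)
lc = lateral_context_stack(image, top=40, left=40, patch_size=32, multiscale_count=3)
print(lc.shape)  # (3, 32, 32)
```

Scale 0 is the patch itself; because every scale is pooled back to the same size, the memory cost grows linearly with multiscale_count while the field of view grows exponentially.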
get_datasplit_tuples(val_fraction, test_fraction, data_length)
Get train/val/test indices for data splitting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| val_fraction | float or None | Fraction of data to use for validation. | required |
| test_fraction | float or None | Fraction of data to use for testing. | required |
| data_length | int | Total length of the dataset. | required |
Returns:
| Type | Description |
|---|---|
| tuple[ndarray, ndarray, ndarray] | Training, validation, and test indices. |
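A minimal sketch of how train/val/test indices could be derived from the two fractions. It assumes (none of this is confirmed by the reference above) that a None fraction is treated as 0, that indices are shuffled with a fixed seed, and that training receives the remainder. The function `split_indices` is hypothetical, not the module's get_datasplit_tuples.

```python
import numpy as np


def split_indices(val_fraction, test_fraction, data_length, seed=0):
    """Hypothetical train/val/test index split.

    Assumptions: ``None`` fractions count as 0, indices are shuffled with a
    fixed seed, and the training split receives all remaining indices.
    """
    val_fraction = val_fraction or 0.0
    test_fraction = test_fraction or 0.0
    if val_fraction + test_fraction > 1.0:
        raise ValueError("val_fraction + test_fraction must not exceed 1.")
    rng = np.random.default_rng(seed)
    indices = rng.permutation(data_length)
    n_val = int(round(val_fraction * data_length))
    n_test = int(round(test_fraction * data_length))
    # Sort each split so downstream code sees indices in ascending order.
    val_idx = np.sort(indices[:n_val])
    test_idx = np.sort(indices[n_val : n_val + n_test])
    train_idx = np.sort(indices[n_val + n_test :])
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(0.1, 0.2, data_length=50)
print(len(train_idx), len(val_idx), len(test_idx))  # 35 5 10
```

Returning index arrays rather than sliced data keeps the split cheap and lets the same indices be applied to both inputs and targets.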
get_train_val_data(data_config, datadir, datasplit_type, val_fraction=None, test_fraction=None, allow_generation=None, **kwargs)
Load and split data according to configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_config | MicroSplitDataConfig | Data configuration object. | required |
| datadir | str or Path | Path to the data directory. | required |
| datasplit_type | DataSplitType | Type of data split to return. | required |
| val_fraction | float | Fraction of data to use for validation. | None |
| test_fraction | float | Fraction of data to use for testing. | None |
| allow_generation | bool | Whether to allow data generation. | None |
| **kwargs | | Additional keyword arguments. | {} |
|
Returns:
| Type | Description |
|---|---|
| ndarray | Split data array. |
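A sketch of how a function like get_train_val_data might select one split of the loaded data according to datasplit_type. It assumes the data is stacked as frames along the first axis with channels last, and that the split is deterministic and contiguous (train frames first, then validation, then test). The enum `SplitType` is a hypothetical stand-in for DataSplitType; its member names and values are assumptions, as is the function `select_split`.

```python
from enum import Enum

import numpy as np


class SplitType(Enum):
    """Hypothetical stand-in for DataSplitType; member names are assumptions."""

    All = 0
    Train = 1
    Val = 2
    Test = 3


def select_split(data, split_type, val_fraction=None, test_fraction=None):
    """Return the frames of ``data`` (axis 0) belonging to ``split_type``.

    Assumption: contiguous split, train frames first, then validation, then
    test. ``None`` fractions are treated as 0.
    """
    n = data.shape[0]
    n_val = int(round((val_fraction or 0.0) * n))
    n_test = int(round((test_fraction or 0.0) * n))
    n_train = n - n_val - n_test
    if n_train < 0:
        raise ValueError("val_fraction + test_fraction must not exceed 1.")
    if split_type is SplitType.All:
        return data
    if split_type is SplitType.Train:
        return data[:n_train]
    if split_type is SplitType.Val:
        return data[n_train : n_train + n_val]
    if split_type is SplitType.Test:
        return data[n_train + n_val :]
    raise ValueError(f"Unknown split type: {split_type}")


frames = np.zeros((20, 64, 64, 2))  # (frames, H, W, channels), assumed layout
print(select_split(frames, SplitType.Val, val_fraction=0.1, test_fraction=0.1).shape)
# (2, 64, 64, 2)
```

A contiguous split keeps neighbouring frames (which are often highly correlated in microscopy time series) out of different splits, which a random shuffle would not guarantee.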