Layers
Module containing the common building blocks (nn.Module subclasses) reused by the LadderVAE.
BottomUpDeterministicResBlock
Bases: ResBlockWithResampling
Resnet block for bottom-up deterministic layers.
forward(x)
Forward pass.
Parameters:
- x (Tensor) – Input tensor.
Returns:
- Tensor – Output tensor.
BottomUpLayer
Bases: Module
Bottom-up deterministic layer.
It consists of one or a stack of BottomUpDeterministicResBlock modules.
The outputs are the so-called bu_values, which are later used in the Decoder to update the
generative distributions.
NOTE: When Lateral Contextualization is enabled (i.e., enable_multiscale=True),
the low-res lateral input is first fed through a BottomUpDeterministicResBlock (BUDB)
without downsampling, and then merged, through the MergeLowRes layer, with the latent tensor
produced by the primary flow of the BottomUpLayer. Note that
the BUDB that encodes the low-res input can either be shared with the
primary flow (in which case it is the "same_size" BUDB, or stack of BUDBs; see self.net),
or be a deep copy of the primary flow's BUDB.
This behaviour is controlled by the lowres_separate_branch parameter.
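The difference between a shared and a separate low-res branch can be sketched without PyTorch. The `Block` class and the `build_lowres_branch` helper below are hypothetical stand-ins, not part of the library; they only illustrate that a shared branch reuses the same parameters, whereas a deep copy starts from identical values and then evolves independently.

```python
import copy


class Block:
    """Hypothetical stand-in for a stack of residual blocks holding weights."""

    def __init__(self, weights):
        self.weights = list(weights)


def build_lowres_branch(primary_block, lowres_separate_branch):
    """Return the block that encodes the low-res input.

    Assumption: sharing means reusing the very same object, while a separate
    branch is an independent deep copy of the primary block.
    """
    if lowres_separate_branch:
        return copy.deepcopy(primary_block)
    return primary_block


primary = Block([0.1, 0.2])
shared = build_lowres_branch(primary, lowres_separate_branch=False)
separate = build_lowres_branch(primary, lowres_separate_branch=True)

# Updating the primary weights is visible through the shared branch only.
primary.weights[0] = 9.9
```

With a shared branch, gradients from both the primary flow and the low-res input would update the same weights; with a separate branch, each copy is trained on its own.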
__init__(n_res_blocks, n_filters, conv_strides=(2, 2), downsampling_steps=0, nonlin=None, batchnorm=True, dropout=None, res_block_type=None, res_block_kernel=None, gated=None, enable_multiscale=False, multiscale_lowres_size_factor=None, lowres_separate_branch=False, multiscale_retain_spatial_dims=False, decoder_retain_spatial_dims=False, output_expected_shape=None)
Constructor.
Parameters:
- n_res_blocks (int) – Number of BottomUpDeterministicResBlock modules stacked in this layer.
- n_filters (int) – Number of channels used throughout the layers of this block.
- downsampling_steps (int, default: 0) – Number of downsampling steps performed in this layer (typically 1). Default is 0.
- nonlin (Optional[Callable], default: None) – The non-linearity function used in the block. Default is None.
- batchnorm (bool, default: True) – Whether to use batchnorm layers. Default is True.
- dropout (Optional[float], default: None) – The dropout probability in dropout layers. If None, dropout is not used. Default is None.
- res_block_type (Optional[str], default: None) – A string specifying the structure of the residual block. Check the ResidualBlock docstring for more information. Default is None.
- res_block_kernel (Optional[int], default: None) – The kernel size used in the convolutions of the residual block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.
- gated (Optional[bool], default: None) – Whether to use a gated layer. Default is None.
- enable_multiscale (bool, default: False) – Whether to enable multiscale (Lateral Contextualization) or not. Default is False.
- multiscale_lowres_size_factor (Optional[int], default: None) – A factor that expresses the relative size of the primary flow tensor with respect to the lower-resolution lateral input tensor. Default is None.
- lowres_separate_branch (bool, default: False) – Whether the residual block(s) encoding the low-res input should be shared (False) or not (True) with the primary flow "same-size" residual block(s). Default is False.
- multiscale_retain_spatial_dims (bool, default: False) – Whether to pad the latent tensor resulting from the bottom-up layer's primary flow to match the size of the low-res input. Default is False.
- decoder_retain_spatial_dims (bool, default: False) – Whether, in the corresponding top-down layer, the shape of the tensor is retained between input and output. Default is False.
- output_expected_shape (Optional[Iterable[int]], default: None) – The expected shape of the layer output (only used if enable_multiscale == True). Default is None.
forward(x, lowres_x=None)
Forward pass.
Parameters:
- x (Tensor) – The input of the BottomUpLayer, i.e., the input image or the output of the previous layer.
- lowres_x (Union[Tensor, None], default: None) – The low-res input used for Lateral Contextualization (LC). Default is None.
GateLayer
Bases: Module
Layer class that implements a gating mechanism.
Double the number of channels through a convolutional layer, then use half the channels as gate for the other half.
forward(x)
Forward pass.
Parameters:
- x (Tensor) – Input tensor.
Returns:
- Tensor – Output tensor.
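The gating mechanism described above can be sketched in PyTorch as follows. `GateSketch` is a hypothetical illustration, not the library's implementation: the 3x3 kernel, the LeakyReLU on the value half, and the sigmoid on the gate half are all assumptions.

```python
import torch
from torch import nn


class GateSketch(nn.Module):
    """Illustrative gate: a conv doubles the channels, one half gates the other."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # An odd kernel with symmetric padding keeps the spatial size unchanged.
        self.conv = nn.Conv2d(
            channels, 2 * channels, kernel_size, padding=kernel_size // 2
        )
        self.nonlin = nn.LeakyReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)  # (B, 2C, Y, X)
        values, gate = torch.chunk(x, 2, dim=1)  # two (B, C, Y, X) halves
        # Assumption: a sigmoid squashes the gate half into (0, 1).
        return self.nonlin(values) * torch.sigmoid(gate)


out = GateSketch(channels=8)(torch.randn(2, 8, 16, 16))
```

The output has the same shape as the input, since the doubled channels are split back into two halves of the original width.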
MergeLayer
Bases: Module
Layer class that merges two or more input tensors.
Merges two or more (B, C, [Z], Y, X) input tensors by concatenating
them along dim=1 and passes the result through:
a) a convolutional 1x1 layer (merge_type == "linear"), or
b) a convolutional 1x1 layer and then a gated residual block (merge_type == "residual"), or
c) a convolutional 1x1 layer and then an ungated residual block (merge_type == "residual_ungated").
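As a minimal sketch of the "linear" variant only, assuming 2D inputs and two tensors with the same number of channels, concatenation along dim=1 followed by a 1x1 convolution looks like this. The variable names are illustrative and the residual variants are not shown.

```python
import torch
from torch import nn

channels = 16
# 1x1 convolution mapping the concatenated 2*C channels back to C.
merge_conv = nn.Conv2d(2 * channels, channels, kernel_size=1)

a = torch.randn(4, channels, 32, 32)  # (B, C, Y, X)
b = torch.randn(4, channels, 32, 32)  # (B, C, Y, X)

# Concatenate along the channel axis, then project: (B, 2C, Y, X) -> (B, C, Y, X).
merged = merge_conv(torch.cat([a, b], dim=1))
```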
__init__(merge_type, channels, conv_strides=(2, 2), nonlin=nn.LeakyReLU(), batchnorm=True, dropout=None, res_block_type=None, res_block_kernel=None, conv2d_bias=True)
Constructor.
Parameters:
- merge_type (Literal['linear', 'residual', 'residual_ungated']) – The type of merge done in the layer. It can be chosen between "linear", "residual", and "residual_ungated". Check the class docstring for more information about the behaviour of different merge modalities.
- channels (Union[int, Iterable[int]]) – The number of channels used in the convolutional blocks of this layer.
  If it is an int:
  - 1st 1x1 Conv2d: in_channels=2*channels, out_channels=channels
  - (Optional) ResBlock: in_channels=channels, out_channels=channels
  If it is an Iterable (must have len(channels)==3):
  - 1st 1x1 Conv2d: in_channels=sum(channels[:-1]), out_channels=channels[-1]
  - (Optional) ResBlock: in_channels=channels[-1], out_channels=channels[-1]
- conv_strides (tuple[int], default: (2, 2)) – The strides used in the convolutions. Default is (2, 2).
- nonlin (Callable, default: LeakyReLU()) – The non-linearity function used in the block. Default is nn.LeakyReLU.
- batchnorm (bool, default: True) – Whether to use batchnorm layers. Default is True.
- dropout (Optional[float], default: None) – The dropout probability in dropout layers. If None, dropout is not used. Default is None.
- res_block_type (Optional[str], default: None) – A string specifying the structure of the residual block. Check the ResidualBlock docstring for more information. Default is None.
- res_block_kernel (Optional[int], default: None) – The kernel size used in the convolutions of the residual block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.
- conv2d_bias (Optional[bool], default: True) – Whether to use a bias term in convolutions. Default is True.
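The channel rules for the first 1x1 convolution can be checked with a few lines of plain Python. `merge_conv_channels` is a hypothetical helper written for this illustration; it mirrors the two cases documented for `channels` and is not a library function.

```python
from typing import Iterable, Tuple, Union


def merge_conv_channels(channels: Union[int, Iterable[int]]) -> Tuple[int, int]:
    """Return (in_channels, out_channels) of the first 1x1 convolution."""
    if isinstance(channels, int):
        # Two inputs with the same number of channels are concatenated.
        return 2 * channels, channels
    channels = list(channels)
    if len(channels) != 3:
        raise ValueError("an iterable `channels` must have exactly 3 entries")
    # The first two entries are the input widths, the last is the output width.
    return sum(channels[:-1]), channels[-1]


print(merge_conv_channels(64))            # (128, 64)
print(merge_conv_channels([32, 64, 48]))  # (96, 48)
```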
MergeLowRes
Bases: MergeLayer
Child class of MergeLayer.
Specifically designed to merge the low-resolution patches that are used in the Lateral Contextualization approach.
forward(latent, lowres)
Forward pass.
Parameters:
- latent (Tensor) – The output latent tensor from the previous layer in the LVAE hierarchy.
- lowres (Tensor) – The low-res patch image to be merged to increase the context.
ResBlockWithResampling
Bases: Module
Residual block with resampling.
Residual block that takes care of resampling (i.e., downsampling or upsampling) steps by a factor of 2.
It is structured as follows:
1. pre_conv: a downsampling or upsampling strided convolutional layer in case of resampling, or
a 1x1 convolutional layer that maps the number of channels of the input to inner_channels.
2. ResidualBlock
3. post_conv: a 1x1 convolutional layer that maps the number of channels to c_out.
Some implementation notes:
- Resampling is performed through a strided convolution layer at the beginning of the block.
- The strided convolution block has fixed kernel size of 3x3 and 1 layer of padding with zeros.
- The number of channels is adjusted at the beginning and end of the block through 1x1 convolutional layers.
- The number of internal channels is by default the same as the number of output channels, but min_inner_channels can override the behaviour.
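For the downsampling case, the effect of a 3x3 kernel with one pixel of zero padding and stride 2 on the spatial size follows from standard convolution arithmetic. The helper below is a hypothetical illustration along a single axis; it assumes stride 2 and does not cover the upsampling (top-down) case.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Standard convolution arithmetic: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


for n in (64, 65, 128):
    print(n, "->", conv_output_size(n))
# 64 -> 32
# 65 -> 33
# 128 -> 64
```

Even input sizes are halved exactly, while odd sizes are rounded up.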
__init__(mode, c_in, c_out, conv_strides, min_inner_channels=None, nonlin=nn.LeakyReLU(), resample=False, res_block_kernel=None, groups=1, batchnorm=True, res_block_type=None, dropout=None, gated=None, conv2d_bias=True)
Constructor.
Parameters:
- mode (Literal['top-down', 'bottom-up']) – The type of resampling performed in the initial strided convolution of the block. If "bottom-up", downsampling by a factor of 2 is performed. If "top-down", upsampling by a factor of 2 is performed.
- c_in (int) – The number of input channels.
- c_out (int) – The number of output channels.
- min_inner_channels (Union[int, None], default: None) – The number of channels used in the inner layer of this module. Default is None, meaning that the number of inner channels is set to c_out.
- nonlin (Callable, default: LeakyReLU()) – The non-linearity function used in the block. Default is nn.LeakyReLU.
- resample (bool, default: False) – Whether to perform resampling in the first convolutional layer. If False, the first convolutional layer just maps the input to a tensor with inner_channels channels through a 1x1 convolution. Default is False.
- res_block_kernel (Optional[Union[int, Iterable[int]]], default: None) – The kernel size used in the convolutions of the residual block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.
- groups (int, default: 1) – The number of groups to consider in the convolutions. Default is 1.
- batchnorm (bool, default: True) – Whether to use batchnorm layers. Default is True.
- res_block_type (Union[str, None], default: None) – A string specifying the structure of the residual block. Check the ResidualBlock docstring for more information. Default is None.
- dropout (Union[float, None], default: None) – The dropout probability in dropout layers. If None, dropout is not used. Default is None.
- gated (Union[bool, None], default: None) – Whether to use a gated layer. Default is None.
- conv2d_bias (bool, default: True) – Whether to use a bias term in convolutions. Default is True.
forward(x)
Forward pass.
Parameters:
- x (Tensor) – Input tensor.
Returns:
- Tensor – Output tensor.
ResidualBlock
Bases: Module
Residual block with 2 convolutional layers.
Some architectural notes:
- The number of input, intermediate, and output channels is the same.
- Padding is always 'same'.
- The 2 convolutional layers have the same groups.
- No stride allowed.
- Kernel sizes must be odd.
The output is given by: out = gate(f(x)) + x.
The presence of the gating mechanism is optional, and f(x) has different
structures depending on the block_type argument.
Specifically, block_type is a string specifying the block's structure, with:
a = activation
b = batch norm
c = conv layer
d = dropout.
For example, "bacdbacd" defines a block with 2x[batchnorm, activation, conv, dropout].
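The letter code can be expanded mechanically. `parse_block_type` below is a hypothetical helper, not part of the library, which may accept only a fixed set of strings; it maps each character to the operation named in the legend above.

```python
from typing import List

# Legend taken from the class description.
OPS = {"a": "activation", "b": "batchnorm", "c": "conv", "d": "dropout"}


def parse_block_type(block_type: str) -> List[str]:
    """Expand a block_type string into the ordered list of operations."""
    unknown = set(block_type) - set(OPS)
    if unknown:
        raise ValueError(f"unknown block_type characters: {sorted(unknown)}")
    return [OPS[ch] for ch in block_type]


print(parse_block_type("bacdbacd"))
# ['batchnorm', 'activation', 'conv', 'dropout',
#  'batchnorm', 'activation', 'conv', 'dropout']
```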
__init__(channels, nonlin, conv_strides=(2, 2), kernel=None, groups=1, batchnorm=True, block_type=None, dropout=None, gated=None, conv2d_bias=True)
Constructor.
Parameters:
- channels (int) – The number of input and output channels (they are the same).
- nonlin (Callable) – The non-linearity function used in the block (e.g., nn.ReLU).
- kernel (Union[int, Iterable[int], None], default: None) – The kernel size used in the convolutions of the block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.
- groups (int, default: 1) – The number of groups to consider in the convolutions. Default is 1.
- batchnorm (bool, default: True) – Whether to use batchnorm layers. Default is True.
- block_type (str, default: None) – A string specifying the block structure; check the class docstring for more information. Default is None.
- dropout (float, default: None) – The dropout probability in dropout layers. If None, dropout is not used. Default is None.
- gated (bool, default: None) – Whether to use a gated layer. Default is None.
- conv2d_bias (bool, default: True) – Whether to use a bias term in convolutions. Default is True.
forward(x)
Forward pass.
Parameters:
- x (Tensor) – Input tensor.
Returns:
- Tensor – Output tensor.
ResidualGatedBlock
Bases: ResidualBlock
Layer class that implements a residual block with a gating mechanism.
forward(x)
Forward pass.
Parameters:
- x (Tensor) – Input tensor.
Returns:
- Tensor – Output tensor.
SkipConnectionMerger
Bases: MergeLayer
Specialized MergeLayer module, handles skip connections in the model.
__init__(nonlin, channels, batchnorm, dropout, res_block_type, conv_strides=(2, 2), merge_type='residual', conv2d_bias=True, res_block_kernel=None)
Constructor.
nonlin: Callable, optional
The non-linearity function used in the block. Default is nn.LeakyReLU.
channels: Union[int, Iterable[int]]
The number of channels used in the convolutional blocks of this layer.
If it is an int:
- 1st 1x1 Conv2d: in_channels=2*channels, out_channels=channels
- (Optional) ResBlock: in_channels=channels, out_channels=channels
If it is an Iterable (must have len(channels)==3):
- 1st 1x1 Conv2d: in_channels=sum(channels[:-1]), out_channels=channels[-1]
- (Optional) ResBlock: in_channels=channels[-1], out_channels=channels[-1]
batchnorm: bool
Whether to use batchnorm layers.
dropout: float
The dropout probability in dropout layers. If None, dropout is not used.
res_block_type: str
A string specifying the structure of the residual block.
Check the ResidualBlock docstring for more information.
conv_strides: tuple, optional
The strides used in the convolutions. Default is (2, 2).
merge_type: Literal["linear", "residual", "residual_ungated"]
The type of merge done in the layer. It can be chosen between "linear", "residual", and "residual_ungated".
Check the class docstring for more information about the behaviour of different merge modalities.
conv2d_bias: bool, optional
Whether to use bias term in convolutions. Default is True.
res_block_kernel: Union[int, Iterable[int]], optional
The kernel size used in the convolutions of the residual block.
It can be either a single integer or a pair of integers defining the squared kernel.
Default is None.
TopDownDeterministicResBlock
Bases: ResBlockWithResampling
Resnet block for top-down deterministic layers.
forward(x)
Forward pass.
Parameters:
- x (Tensor) – Input tensor.
Returns:
- Tensor – Output tensor.
TopDownLayer
Bases: Module
Top-down inference layer.
It includes:
- Stochastic sampling,
- Computation of KL divergence,
- A small deterministic ResNet that performs upsampling.
NOTE 1: The algorithm for generative inference approximately works as follows:
- p_params = output of top-down layer above
- bu = inferred bottom-up value at this layer
- q_params = merge(bu, p_params)
- z = stochastic_layer(q_params)
- (optional) get and merge skip connection from prev top-down layer
- top-down deterministic ResNet
NOTE 2: The Top-Down layer can work in two modes: inference and prediction/generative. Depending on the mode, it behaves differently:
- In inference mode, the parameters of q(z_i|z_{i+1}) are obtained from the inference path, by merging the outcomes of the bottom-up and top-down passes. The exception is the top layer, in which the parameters of q(z_L|x) are set as the output of the topmost bottom-up layer.
- In prediction/generative mode, by contrast, the parameters of q(z_i|z_{i+1}) can again be obtained by merging bottom-up and top-down outputs (CONDITIONAL GENERATION), or it is possible to sample directly from the prior p(z_i|z_{i+1}) (UNCONDITIONAL GENERATION).
NOTE 3: When doing unconditional generation, bu_value is not available. Hence the merge layer is not used, and z is sampled directly from p_params.
NOTE 4: If this is the top layer, at inference time, the uppermost bottom-up value is used directly as q_params, and p_params are defined in this layer (while they are usually taken from the previous layer), and can be learned.
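The control flow described in NOTES 1, 3, and 4 can be outlined in plain Python. Everything below is a hypothetical sketch: `top_down_step` and its callable arguments (`merge`, `sample`, `resnet`, `skip_merge`) are stand-ins introduced for illustration, and the toy scalar functions replace real tensor operations.

```python
def top_down_step(p_params, bu_value, merge, sample, resnet,
                  is_top_layer=False, skip=None, skip_merge=None):
    """Outline of one top-down step; not the library's implementation."""
    if bu_value is None:
        # NOTE 3: unconditional generation, sample directly from the prior.
        q_params = p_params
    elif is_top_layer:
        # NOTE 4: the uppermost bottom-up value is used directly as q_params.
        q_params = bu_value
    else:
        # NOTE 1: merge the bottom-up value with the prior parameters.
        q_params = merge(bu_value, p_params)
    z = sample(q_params)
    if skip is not None and skip_merge is not None:
        # Optional skip connection from the previous top-down layer.
        z = skip_merge(z, skip)
    return resnet(z)


# Toy scalar stand-ins so the flow can be traced by hand.
def merge(bu, p):
    return bu + p


def sample(params):
    return params  # deterministic placeholder for the stochastic layer


def resnet(z):
    return 2 * z


conditional = top_down_step(1.0, 2.0, merge, sample, resnet)
unconditional = top_down_step(1.0, None, merge, sample, resnet)
top_layer = top_down_step(1.0, 2.0, merge, sample, resnet, is_top_layer=True)
print(conditional, unconditional, top_layer)  # 6.0 2.0 4.0
```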
__init__(z_dim, n_res_blocks, n_filters, conv_strides, is_top_layer=False, upsampling_steps=None, nonlin=None, merge_type=None, batchnorm=True, dropout=None, stochastic_skip=False, res_block_type=None, res_block_kernel=None, groups=1, gated=None, learn_top_prior=False, top_prior_param_shape=None, analytical_kl=False, retain_spatial_dims=False, vanilla_latent_hw=None, input_image_shape=None, normalize_latent_factor=1.0, conv2d_bias=True, stochastic_use_naive_exponential=False)
Constructor.
Parameters:
- z_dim (int) – The size of the latent space.
- n_res_blocks (int) – The number of TopDownDeterministicResBlock blocks.
- n_filters (int) – The number of channels used throughout the layers of this block.
- conv_strides (tuple[int]) – The strides used in the convolutions. Default is (2, 2).
- is_top_layer (bool, default: False) – Whether the current layer is at the top of the Decoder hierarchy. Default is False.
- upsampling_steps (Union[int, None], default: None) – The number of upsampling steps performed in this layer (typically 1). Default is None.
- nonlin (Union[Callable, None], default: None) – The non-linearity function used in the block (e.g., nn.ReLU). Default is None.
- merge_type (Union[Literal['linear', 'residual', 'residual_ungated'], None], default: None) – The type of merge done in the layer. It can be chosen between "linear", "residual", and "residual_ungated". Check the MergeLayer class docstring for more information about the behaviour of different merging modalities. Default is None.
- batchnorm (bool, default: True) – Whether to use batchnorm layers. Default is True.
- dropout (Union[float, None], default: None) – The dropout probability in dropout layers. If None, dropout is not used. Default is None.
- stochastic_skip (bool, default: False) – Whether to use skip connections between the previous top-down layer's output and this layer's stochastic output. A stochastic skip connection gives the previous layer's output a direct path to this hierarchical level, hence facilitating the gradient flow during backpropagation. Default is False.
- res_block_type (Union[str, None], default: None) – A string specifying the structure of the residual block. Check the ResidualBlock documentation for more information. Default is None.
- res_block_kernel (Union[int, None], default: None) – The kernel size used in the convolutions of the residual block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.
- groups (int, default: 1) – The number of groups to consider in the convolutions. Default is 1.
- gated (Union[bool, None], default: None) – Whether to use a gated layer in ResidualBlock. Default is None.
- learn_top_prior (bool, default: False) – Whether to set the top prior as learnable. If this is set to False, in the top-most layer the prior will be N(0,1). Otherwise, the prior is still a normal distribution, but its parameters are learnt. Default is False.
- top_prior_param_shape (Union[Iterable[int], None], default: None) – The size of the tensor which expresses the mean and the variance of the prior for the top-most layer. Default is None.
- analytical_kl (bool, default: False) – If True, the KL divergence is calculated according to the analytical formula. Otherwise, an MC approximation using sampled latents is calculated. Default is False.
- retain_spatial_dims (bool, default: False) – If True, the size of the Encoder's latent space is kept to input_image_shape within the top-down layer. This implies that the output spatial size equals the input spatial size. To achieve this, the intermediate representation is center-cropped. Default is False.
- vanilla_latent_hw (Union[Iterable[int], None], default: None) – The shape of the latent tensor used for prediction (i.e., it influences the computation of restricted KL). Default is None.
- input_image_shape (Union[tuple[int, int], None], default: None) – The shape of the input image tensor. When retain_spatial_dims is set to True, this is used to ensure that this layer's output has the same shape as the input. Default is None.
- normalize_latent_factor (float, default: 1.0) – A factor used to normalize the latent tensors q_params. Specifically, normalization is done by dividing the latent tensor by this factor. Default is 1.0.
- conv2d_bias (bool, default: True) – Whether to use a bias term in the convolutional blocks of this layer. Default is True.
- stochastic_use_naive_exponential (bool, default: False) – If False, in the NormalStochasticBlock2d exponentials are computed according to the alternative definition provided by the StableExponential class. This should improve numerical stability in the training process. Default is False.
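To illustrate the two options behind `analytical_kl`, the sketch below compares the closed-form KL divergence between two univariate Gaussians with a Monte Carlo estimate. It is a scalar, framework-free illustration under stated assumptions: the function names are hypothetical, and averaging over many samples is an assumption made here so that the two values can be compared.

```python
import math
import random


def kl_analytical(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) between two univariate Gaussians."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2 * sigma_p**2)
        - 0.5
    )


def log_normal_pdf(z, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at z."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2


def kl_monte_carlo(mu_q, sigma_q, mu_p, sigma_p, n_samples=100_000, seed=0):
    """Estimate KL(q || p) as the mean of log q(z) - log p(z) with z ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(mu_q, sigma_q)
        total += log_normal_pdf(z, mu_q, sigma_q) - log_normal_pdf(z, mu_p, sigma_p)
    return total / n_samples


exact = kl_analytical(0.5, 0.8, 0.0, 1.0)
approx = kl_monte_carlo(0.5, 0.8, 0.0, 1.0)
```

The Monte Carlo estimate is noisy but needs only samples and log-densities, whereas the analytical form requires both distributions to admit a closed-form expression.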
forward(input_=None, skip_connection_input=None, inference_mode=False, bu_value=None, n_img_prior=None, forced_latent=None, force_constant_output=False, mode_pred=False, use_uncond_mode=False, var_clip_max=None)
Forward pass.
Parameters:
- input_ (Union[Tensor, None], default: None) – The input tensor to the layer, which is the output of the top-down layer above. Default is None.
- skip_connection_input (Union[Tensor, None], default: None) – The tensor brought by the skip connection between the current and the previous top-down layer. Default is None.
- inference_mode (bool, default: False) – Whether the layer is in inference mode. See NOTE 2 in the class description for more information. Default is False.
- bu_value (Union[Tensor, None], default: None) – The tensor defining the parameters \mu_q and \sigma_q computed during the bottom-up deterministic pass at the corresponding hierarchical layer. Default is None.
- n_img_prior (Union[int, None], default: None) – The number of images to be generated from the unconditional prior distribution p(z_L). Default is None.
- forced_latent (Union[Tensor, None], default: None) – A pre-defined latent tensor. If it is not None, then it is used as the actual latent tensor and, hence, sampling does not happen. Default is None.
- force_constant_output (bool, default: False) – Whether to copy the first sample (and the related distribution parameters) over the whole batch. This is used when running experiments that sample from the prior, in which q is not used. Default is False.
- mode_pred (bool, default: False) – Whether the model is in prediction mode. Default is False.
- use_uncond_mode (bool, default: False) – Whether to use the unconditional distribution p(z) to sample latents in prediction mode.
- var_clip_max (Union[float, None], default: None) – The maximum value reachable by the log-variance of the latent distribution. Values exceeding this threshold are clipped.
get_p_params(input_, n_img_prior)
Return the parameters of the prior distribution p(z_i|z_{i+1}).
The parameters depend on the hierarchical level of the layer:
- If it is the topmost level, the parameters are those of the prior.
- Otherwise, the input from the layer above provides the parameters.
Parameters:
- input_ (Tensor) – The input tensor to the layer, which is the output of the top-down layer above.
- n_img_prior (int) – The number of images to be generated from the unconditional prior distribution p(z_L).
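A framework-free sketch of this selection logic is shown below. `get_p_params_sketch`, its extra arguments, and the list-based layout of the prior parameters are hypothetical; in particular, replicating the prior `n_img_prior` times along the batch axis is an assumption about how the requested number of images is produced.

```python
def get_p_params_sketch(input_, n_img_prior, is_top_layer, top_prior_params):
    """Pick the prior parameters for the current hierarchical level."""
    if not is_top_layer:
        # Below the top, the output of the layer above is the prior itself.
        return input_
    if n_img_prior is None:
        return top_prior_params
    # Assumption: one copy of the top prior per image to be generated.
    return [list(p) for p in top_prior_params for _ in range(n_img_prior)]


# Hypothetical layout: a batch of one (mean, log-variance) entry.
prior = [[0.0, 1.0]]
print(get_p_params_sketch("output_of_layer_above", None, False, prior))
print(len(get_p_params_sketch(None, 3, True, prior)))  # 3
```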
sample_from_q(input_, bu_value, var_clip_max=None, mask=None)
Compute the latent inference distribution q(z_i|z_{i+1}),
which is then used to sample a latent tensor.
Parameters:
- input_ (Tensor) – The input tensor to the layer, which is the output of the top-down layer above.
- bu_value (Tensor) – The tensor defining the parameters \mu_q and \sigma_q computed during the bottom-up deterministic pass at the corresponding hierarchical layer.
- var_clip_max (Optional[float], default: None) – The maximum value reachable by the log-variance of the latent distribution. Values exceeding this threshold are clipped. Default is None.
- mask (Tensor, default: None) – A tensor that is used to mask the sampled latent tensor. Default is None.
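The effect of `var_clip_max` can be pictured with a small, framework-free sketch. `clip_log_var` is a hypothetical helper operating on a plain list of log-variances; it assumes that only an upper bound is applied, and that `None` disables clipping.

```python
def clip_log_var(log_var, var_clip_max=None):
    """Cap each log-variance at var_clip_max; None leaves values untouched."""
    if var_clip_max is None:
        return list(log_var)
    return [min(value, var_clip_max) for value in log_var]


print(clip_log_var([-3.0, 0.5, 25.0], var_clip_max=20.0))  # [-3.0, 0.5, 20.0]
print(clip_log_var([-3.0, 0.5, 25.0]))                     # [-3.0, 0.5, 25.0]
```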