Layers
Script containing the common basic blocks (nn.Module) reused by the LadderVAE.
BottomUpDeterministicResBlock
Bases: ResBlockWithResampling
Resnet block for bottom-up deterministic layers.
forward(x)
Forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | input # TODO add shape | required |
Returns:
| Type | Description |
|---|---|
| Tensor | output # TODO add shape |
BottomUpLayer
Bases: Module
Bottom-up deterministic layer.
It consists of one BottomUpDeterministicResBlock or a stack of them.
The outputs are the so-called bu_values that are later used in the Decoder to update the
generative distributions.
NOTE: When Lateral Contextualization is enabled (i.e., enable_multiscale=True),
the low-res lateral input is first fed through a BottomUpDeterministicBlock (BUDB)
(without downsampling), and then merged with the latent tensor produced by the primary flow
of the BottomUpLayer through the MergeLowRes layer. The BUDB that encodes the low-res input
can either be shared with the primary flow (in which case it is the "same_size" BUDB,
or stack of BUDBs; see self.net), or be a deep copy of the primary flow's BUDB.
This behaviour is controlled by the lowres_separate_branch parameter.
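The difference between sharing a block and deep-copying it can be illustrated with a small sketch. The module below is a hypothetical stand-in for the "same_size" stack, not the library's actual block, and the mapping of each option to a specific value of lowres_separate_branch is not shown because the text above does not state it.

```python
import copy

import torch
from torch import nn

# Hypothetical stand-in for the "same_size" stack used by the primary flow.
primary_net = nn.Sequential(nn.Conv2d(8, 8, kernel_size=3, padding=1), nn.ELU())

# Shared branch: the low-res input goes through the very same module,
# so both flows read and update the same weights.
shared_branch = primary_net

# Separate branch: a deep copy starts from identical values but owns its
# parameters, so it can diverge from the primary flow during training.
separate_branch = copy.deepcopy(primary_net)

primary_w = primary_net[0].weight
shares_storage_shared = shared_branch[0].weight.data_ptr() == primary_w.data_ptr()
shares_storage_separate = separate_branch[0].weight.data_ptr() == primary_w.data_ptr()
same_initial_values = torch.equal(separate_branch[0].weight, primary_w)
```

With a shared branch, gradients from the low-res path also update the primary flow; with a deep copy, the two paths only share their initial values.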
forward(x, lowres_x=None)
Forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | The input of the layer. | required |
| lowres_x | Union[Tensor, None] | The low-res input used for Lateral Contextualization (LC). | None |
GateLayer
Bases: Module
Layer class that implements a gating mechanism.
Double the number of channels through a convolutional layer, then use half the channels as gate for the other half.
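A minimal sketch of this gating idea, under stated assumptions: the class name GateSketch, the 2D convolution, the default kernel size, and the sigmoid applied to the gate are all illustrative choices, not taken from the library.

```python
import torch
from torch import nn


class GateSketch(nn.Module):
    """Illustrative gate: a conv doubles the channels, one half gates the other."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Map C channels to 2*C channels; padding keeps the spatial size for odd kernels.
        self.conv = nn.Conv2d(
            channels, 2 * channels, kernel_size, padding=kernel_size // 2
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the doubled channels into a value half and a gate half.
        values, gate = torch.chunk(self.conv(x), 2, dim=1)
        # Assumption: the gate is squashed to (0, 1) with a sigmoid.
        return values * torch.sigmoid(gate)


x = torch.randn(2, 16, 32, 32)
out = GateSketch(16)(x)
```

Because the gate halves the doubled channels again, the output has the same shape as the input.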
forward(x)
Forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | input # TODO add shape | required |
Returns:
| Type | Description |
|---|---|
| Tensor | output # TODO add shape |
MergeLayer
Bases: Module
Layer class that merges two or more input tensors.
Merges two or more (B, C, [Z], Y, X) input tensors by concatenating
them along dim=1 and passes the result through:
a) a convolutional 1x1 layer (merge_type == "linear"), or
b) a convolutional 1x1 layer and then a gated residual block (merge_type == "residual"), or
c) a convolutional 1x1 layer and then an ungated residual block (merge_type == "residual_ungated").
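As a rough illustration of case a), the sketch below concatenates 2D inputs along dim=1 and applies a 1x1 convolution. The class name LinearMergeSketch and its constructor arguments are hypothetical; the real MergeLayer signature may differ.

```python
from typing import Sequence

import torch
from torch import nn


class LinearMergeSketch(nn.Module):
    """Illustrative "linear" merge: concatenate on channels, then a 1x1 conv."""

    def __init__(self, in_channels: Sequence[int], out_channels: int):
        super().__init__()
        # The 1x1 conv sees the sum of all input channels after concatenation.
        self.conv = nn.Conv2d(sum(in_channels), out_channels, kernel_size=1)

    def forward(self, *tensors: torch.Tensor) -> torch.Tensor:
        # All inputs must share batch and spatial sizes; only channels may differ.
        return self.conv(torch.cat(tensors, dim=1))


a = torch.randn(2, 16, 32, 32)
b = torch.randn(2, 8, 32, 32)
merged = LinearMergeSketch([16, 8], out_channels=32)(a, b)
```

The "residual" and "residual_ungated" variants would append a residual block after this convolution.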
MergeLowRes
Bases: MergeLayer
Child class of MergeLayer.
Specifically designed to merge the low-resolution patches that are used in the Lateral Contextualization approach.
forward(latent, lowres)
Forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| latent | Tensor | The output latent tensor from the previous layer in the LVAE hierarchy. | required |
| lowres | Tensor | The low-res patch image to be merged to increase the context. | required |
ResBlockWithResampling
Bases: Module
Residual block with resampling.
Residual block that takes care of resampling steps (i.e., downsampling or upsampling by a factor of 2).
It is structured as follows:
1. pre_conv: a downsampling or upsampling strided convolutional layer in case of resampling, or
a 1x1 convolutional layer that maps the number of channels of the input to inner_channels.
2. ResidualBlock
3. post_conv: a 1x1 convolutional layer that maps the number of channels to c_out.
Some implementation notes:

- Resampling is performed through a strided convolution layer at the beginning of the block.
- The strided convolution block has a fixed kernel size of 3x3 and 1 layer of zero padding.
- The number of channels is adjusted at the beginning and end of the block through 1x1 convolutional layers.
- The number of internal channels is by default the same as the number of output channels, but min_inner_channels can override this behaviour.
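The factor-2 resampling follows from the standard output-size formulas for strided convolutions with kernel 3, stride 2, and padding 1. The helper names below are hypothetical, and the upsampling case assumes a transposed convolution with output_padding=1, which the notes above do not state.

```python
def conv_out_size(n: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after a strided convolution (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose_out_size(
    n: int, kernel: int = 3, stride: int = 2, padding: int = 1, output_padding: int = 1
) -> int:
    """Spatial size after a transposed convolution (dilation 1).

    Assumption: output_padding=1 is what makes the result exactly 2 * n.
    """
    return (n - 1) * stride - 2 * padding + kernel + output_padding


down = conv_out_size(64)          # downsampling a 64-pixel axis
up = conv_transpose_out_size(64)  # upsampling a 64-pixel axis
```

For an even input size n, the first formula gives n / 2 and the second gives 2 * n.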
forward(x)
Forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | input # TODO add shape | required |
Returns:
| Type | Description |
|---|---|
| Tensor | output # TODO add shape |
ResidualBlock
Bases: Module
Residual block with 2 convolutional layers.
Some architectural notes:

- The number of input, intermediate, and output channels is the same.
- Padding is always 'same'.
- The 2 convolutional layers have the same groups.
- No stride is allowed.
- Kernel sizes must be odd.

The output is given by: out = gate(f(x)) + x.
The presence of the gating mechanism is optional, and f(x) has different
structures depending on the block_type argument.
Specifically, block_type is a string specifying the block's structure, with:
a = activation
b = batch norm
c = conv layer
d = dropout.
For example, "bacdbacd" defines a block with 2x[batchnorm, activation, conv, dropout].
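The letter code can be read as a sequence of layer kinds. The helper below is a hypothetical illustration of that reading, not the library's parser; the real implementation may accept only a fixed set of block_type strings.

```python
# Mapping taken from the legend above.
LAYER_NAMES = {
    "a": "activation",
    "b": "batch norm",
    "c": "conv layer",
    "d": "dropout",
}


def parse_block_type(block_type: str) -> list:
    """Translate a block_type string into an ordered list of layer kinds."""
    unknown = sorted(set(block_type) - set(LAYER_NAMES))
    if unknown:
        raise ValueError(
            f"Unknown layer code(s) {unknown} in block_type {block_type!r}"
        )
    return [LAYER_NAMES[code] for code in block_type]


layers = parse_block_type("bacdbacd")
```

For "bacdbacd" this yields the sequence batch norm, activation, conv layer, dropout, repeated twice.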
forward(x)
Forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | input tensor # TODO add shape | required |
Returns:
| Type | Description |
|---|---|
| Tensor | output tensor # TODO add shape |
ResidualGatedBlock
Bases: ResidualBlock
Layer class that implements a residual block with a gating mechanism.
forward(x)
Forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | input tensor # TODO add shape | required |
Returns:
| Type | Description |
|---|---|
| Tensor | output tensor # TODO add shape |
SkipConnectionMerger
TopDownDeterministicResBlock
Bases: ResBlockWithResampling
Resnet block for top-down deterministic layers.
forward(x)
Forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | input # TODO add shape | required |
Returns:
| Type | Description |
|---|---|
| Tensor | output # TODO add shape |
TopDownLayer
Bases: Module
Top-down inference layer.
It includes:

- Stochastic sampling,
- Computation of KL divergence,
- A small deterministic ResNet that performs upsampling.

NOTE 1: The algorithm for generative inference approximately works as follows:

- p_params = output of top-down layer above
- bu = inferred bottom-up value at this layer
- q_params = merge(bu, p_params)
- z = stochastic_layer(q_params)
- (optional) get and merge skip connection from prev top-down layer
- top-down deterministic ResNet

NOTE 2: The Top-Down layer can work in two modes: inference and prediction/generative. Its behaviour depends on the mode:

- In inference mode, the parameters of q(z_i|z_{i+1}) are obtained from the inference path, by merging the outcomes of the bottom-up and top-down passes. The exception is the top layer, in which the parameters of q(z_L|x) are set to the output of the topmost bottom-up layer.
- In prediction/generative mode, the parameters of q(z_i|z_{i+1}) can again be obtained by merging bottom-up and top-down outputs (CONDITIONAL GENERATION), or latents can be sampled directly from the prior p(z_i|z_{i+1}) (UNCONDITIONAL GENERATION).

NOTE 3: When doing unconditional generation, bu_value is not available. Hence the merge layer is not used, and z is sampled directly from p_params.
NOTE 4: If this is the top layer, at inference time, the uppermost bottom-up value is used directly as q_params, and p_params are defined in this layer (while they are usually taken from the previous layer), and can be learned.
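The choice of which parameters feed the stochastic layer, as described in NOTES 2 to 4, can be summarised in a short sketch. The function select_sampling_params, its is_top_layer flag, and the merge callable are hypothetical names used only for illustration; the actual forward method handles more cases (forced latents, constant output, skip connections).

```python
def select_sampling_params(
    p_params, bu_value, is_top_layer, merge, use_uncond_mode=False
):
    """Return (params, source) used to sample z in a top-down layer.

    `merge` is any callable combining the bottom-up value with p_params.
    """
    # Unconditional generation: no bottom-up value, sample from the prior.
    if use_uncond_mode or bu_value is None:
        return p_params, "prior"
    # Top layer: the uppermost bottom-up value is used directly as q_params.
    if is_top_layer:
        return bu_value, "posterior"
    # Any other layer: merge bottom-up and top-down information.
    return merge(bu_value, p_params), "posterior"


def toy_merge(bu, p):
    # Placeholder merge that only records its inputs.
    return ("merged", bu, p)


middle = select_sampling_params("p", "bu", False, toy_merge)
top = select_sampling_params("p", "bu", True, toy_merge)
uncond = select_sampling_params("p", None, False, toy_merge)
```

Strings stand in for tensors here so that the control flow can be followed without any model code.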
forward(input_=None, skip_connection_input=None, inference_mode=False, bu_value=None, n_img_prior=None, forced_latent=None, force_constant_output=False, mode_pred=False, use_uncond_mode=False, var_clip_max=None)
Forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_ | Union[Tensor, None] | The input tensor to the layer, which is the output of the top-down layer above. | None |
| skip_connection_input | Union[Tensor, None] | The tensor brought by the skip connection between the current and the previous top-down layer. | None |
| inference_mode | bool | Whether the layer is in inference mode. See NOTE 2 in the class description for more info. | False |
| bu_value | Union[Tensor, None] | The tensor defining the parameters \mu_q and \sigma_q computed during the bottom-up deterministic pass at the corresponding hierarchical layer. | None |
| n_img_prior | Union[int, None] | The number of images to be generated from the unconditional prior distribution p(z_L). | None |
| forced_latent | Union[Tensor, None] | A pre-defined latent tensor. If it is not None, … | None |
| force_constant_output | bool | Whether to copy the first sample (and the related distribution parameters) over the whole batch. This is used when running experiments from the prior, where q is not used. | False |
| mode_pred | bool | Whether the model is in prediction mode. | False |
| use_uncond_mode | bool | Whether to use the unconditional distribution p(z) to sample latents in prediction mode. | False |
| var_clip_max | Union[float, None] | The maximum value reachable by the log-variance of the latent distribution. Values exceeding this threshold are clipped. | None |
get_p_params(input_, n_img_prior)
Return the parameters of the prior distribution p(z_i|z_{i+1}).
The parameters depend on the hierarchical level of the layer:

- if it is the topmost level, the parameters are those of the prior;
- otherwise, the input from the layer above provides the parameters.

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_ | Tensor | The input tensor to the layer, which is the output of the top-down layer above. | required |
| n_img_prior | int | The number of images to be generated from the unconditional prior distribution p(z_L). | required |
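A rough sketch of this two-case logic follows. It assumes the top-level prior is stored as a single tensor of shape (1, 2*C, H, W) that is broadcast over the requested number of images; the function name, the is_top_layer flag, and the top_prior_params argument are hypothetical and may not match the library's attributes.

```python
import torch


def get_p_params_sketch(input_, top_prior_params, is_top_layer, n_img_prior=None):
    """Pick the prior parameters for a top-down layer (illustrative only)."""
    if not is_top_layer:
        # Below the top, the output of the layer above already holds p_params.
        return input_
    # At the top, broadcast the layer's own prior over the batch dimension
    # without copying memory.
    batch = n_img_prior if n_img_prior is not None else 1
    return top_prior_params.expand(batch, -1, -1, -1)


top_prior = torch.zeros(1, 2 * 32, 4, 4)  # assumed (1, 2*C, H, W) layout
p_top = get_p_params_sketch(None, top_prior, is_top_layer=True, n_img_prior=4)

from_above = torch.randn(4, 2 * 32, 8, 8)
p_mid = get_p_params_sketch(from_above, top_prior, is_top_layer=False)
```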
sample_from_q(input_, bu_value, var_clip_max=None, mask=None)
Compute the latent inference distribution q(z_i|z_{i+1}), from which a latent tensor is then sampled.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_ | Tensor | The input tensor to the layer, which is the output of the top-down layer above. | required |
| bu_value | Tensor | The tensor defining the parameters \mu_q and \sigma_q computed during the bottom-up deterministic pass at the corresponding hierarchical layer. | required |
| var_clip_max | Optional[float] | The maximum value reachable by the log-variance of the latent distribution. Values exceeding this threshold are clipped. | None |
| mask | Tensor | A tensor that is used to mask the sampled latent tensor. | None |
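The effect of var_clip_max on sampling can be sketched as follows. This is a simplified illustration, assuming the distribution parameters are packed as mean and log-variance halves along the channel axis and that sampling uses the reparameterization trick; the function name is hypothetical and masking is omitted.

```python
import torch


def sample_with_clipped_logvar(q_params, var_clip_max=None):
    """Sample z from a diagonal Gaussian whose log-variance may be clipped."""
    # Assumption: first half of the channels is the mean, second half the log-variance.
    mu, logvar = torch.chunk(q_params, 2, dim=1)
    if var_clip_max is not None:
        # Values above the threshold are clipped; smaller values are untouched.
        logvar = torch.clamp(logvar, max=var_clip_max)
    std = torch.exp(0.5 * logvar)
    # Reparameterization: z = mu + std * eps, with eps ~ N(0, I).
    z = mu + std * torch.randn_like(std)
    return z, mu, logvar


q_params = torch.randn(2, 2 * 8, 4, 4) * 10
z, mu, logvar = sample_with_clipped_logvar(q_params, var_clip_max=5.0)
```

Clipping the log-variance bounds the standard deviation at exp(var_clip_max / 2), which keeps sampled latents from exploding when the network predicts very large variances.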