Layers

Script containing the common basic blocks (nn.Module) reused by the LadderVAE.

BottomUpDeterministicResBlock

Bases: ResBlockWithResampling

Resnet block for bottom-up deterministic layers.

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input. # TODO add shape

Returns:

- Tensor: output. # TODO add shape

BottomUpLayer

Bases: Module

Bottom-up deterministic layer.

It consists of one or a stack of BottomUpDeterministicResBlocks. The outputs are the so-called bu_values, which are later used in the Decoder to update the generative distributions.

NOTE: When Lateral Contextualization is enabled (i.e., enable_multiscale=True), the low-res lateral input is first fed through a BottomUpDeterministicResBlock (BUDB) without downsampling, and then merged with the latent tensor produced by the primary flow of the BottomUpLayer through the MergeLowRes layer. Note that the BUDB that encodes the low-res input can either be shared with the primary flow (in which case it is the "same_size" BUDB, or stack of BUDBs; see self.net) or be a deep copy of the primary flow's BUDB. This behaviour is controlled by the lowres_separate_branch parameter.
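The branch-sharing logic above can be sketched in plain Python; all names here are illustrative stand-ins, not the actual implementation:

```python
import copy

class BottomUpLayerSketch:
    """Hedged sketch of the Lateral Contextualization wiring described above."""

    def __init__(self, net, merge, enable_multiscale=True, lowres_separate_branch=False):
        self.net = net                # primary flow: BUDB (or stack of BUDBs)
        self.merge = merge            # stand-in for the MergeLowRes layer
        self.enable_multiscale = enable_multiscale
        # Low-res branch: either shares weights with the primary flow or is an
        # independent deep copy, depending on lowres_separate_branch.
        self.lowres_net = copy.deepcopy(net) if lowres_separate_branch else net

    def forward(self, x, lowres_x=None):
        bu_value = self.net(x)
        if self.enable_multiscale and lowres_x is not None:
            bu_value = self.merge(bu_value, self.lowres_net(lowres_x))
        return bu_value
```

With lowres_separate_branch=False the low-res input reuses exactly the same weights as the primary flow; with True the two branches start identical but train independently.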

forward(x, lowres_x=None)

Forward pass.

Parameters:

- x (Tensor, required): the input of the BottomUpLayer, i.e., the input image or the output of the previous layer.
- lowres_x (Union[Tensor, None]): the low-res input used for Lateral Contextualization (LC). Default is None.

GateLayer

Bases: Module

Layer class that implements a gating mechanism.

Double the number of channels through a convolutional layer, then use half the channels as gate for the other half.
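The split-and-gate step can be sketched in plain Python for a single spatial location (the channel-doubling convolution is omitted; this is an illustrative sketch, not the actual implementation):

```python
import math

def gate(x):
    """Sketch of the gating step: the first half of the (doubled) channels
    passes through, modulated by the sigmoid of the second half.
    Here x is a flat list of channel values."""
    half = len(x) // 2
    values, gates = x[:half], x[half:]
    sigmoid = lambda g: 1.0 / (1.0 + math.exp(-g))
    return [v * sigmoid(g) for v, g in zip(values, gates)]
```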

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input. # TODO add shape

Returns:

- Tensor: output. # TODO add shape

MergeLayer

Bases: Module

Layer class that merges two or more input tensors.

Merges two or more (B, C, [Z], Y, X) input tensors by concatenating them along dim=1 and passes the result through one of:
a) a 1x1 convolutional layer (merge_type == "linear"),
b) a 1x1 convolutional layer followed by a gated residual block (merge_type == "residual"), or
c) a 1x1 convolutional layer followed by an ungated residual block (merge_type == "residual_ungated").
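The "linear" variant can be sketched for a single spatial location, since a 1x1 convolution acts there as a plain linear map over channels (an illustrative sketch with the bias omitted, not the actual implementation):

```python
def merge_linear(tensors, weights):
    """Sketch of merge_type == "linear" at one spatial location:
    concatenate the channel vectors, then apply the 1x1 convolution as a
    linear map. weights has shape (out_channels, total_in_channels)."""
    concatenated = [c for t in tensors for c in t]   # concat along dim=1
    return [sum(w * c for w, c in zip(row, concatenated)) for row in weights]
```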

MergeLowRes

Bases: MergeLayer

Child class of MergeLayer.

Specifically designed to merge the low-resolution patches that are used in Lateral Contextualization approach.

forward(latent, lowres)

Forward pass.

Parameters:

- latent (Tensor, required): the output latent tensor from the previous layer in the LVAE hierarchy.
- lowres (Tensor, required): the low-res patch image merged in to increase the context.

ResBlockWithResampling

Bases: Module

Residual block with resampling.

Residual block that takes care of resampling (i.e., downsampling or upsampling) by a factor of 2. It is structured as follows:
1. pre_conv: a strided downsampling or upsampling convolutional layer in case of resampling, or a 1x1 convolutional layer that maps the number of input channels to inner_channels.
2. ResidualBlock.
3. post_conv: a 1x1 convolutional layer that maps the number of channels to c_out.

Some implementation notes:
- Resampling is performed through a strided convolution layer at the beginning of the block.
- The strided convolution block has a fixed 3x3 kernel size and 1 layer of zero padding.
- The number of channels is adjusted at the beginning and end of the block through 1x1 convolutional layers.
- The number of internal channels is by default the same as the number of output channels, but min_inner_channels can override this behaviour.
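The three-stage structure above can be sketched as a 2D PyTorch module; argument names and the inner residual stand-in are illustrative assumptions, not the actual implementation:

```python
import torch
import torch.nn as nn

class ResBlockWithResamplingSketch(nn.Module):
    """Hedged sketch of pre_conv -> ResidualBlock -> post_conv."""

    def __init__(self, c_in, c_out, resample=None, min_inner_channels=0):
        super().__init__()
        inner = max(c_out, min_inner_channels)
        if resample == "down":
            # Strided 3x3 convolution, zero padding of 1: halves Y and X.
            self.pre_conv = nn.Conv2d(c_in, inner, kernel_size=3, stride=2, padding=1)
        elif resample == "up":
            # Strided transposed convolution: doubles Y and X.
            self.pre_conv = nn.ConvTranspose2d(
                c_in, inner, kernel_size=3, stride=2, padding=1, output_padding=1
            )
        else:
            # No resampling: a 1x1 convolution only adjusts the channel count.
            self.pre_conv = nn.Conv2d(c_in, inner, kernel_size=1)
        # Simple stand-in for the ResidualBlock.
        self.res_block = nn.Sequential(
            nn.Conv2d(inner, inner, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(inner, inner, kernel_size=3, padding=1),
        )
        self.post_conv = nn.Conv2d(inner, c_out, kernel_size=1)

    def forward(self, x):
        x = self.pre_conv(x)
        x = x + self.res_block(x)   # residual connection
        return self.post_conv(x)
```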

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input. # TODO add shape

Returns:

- Tensor: output. # TODO add shape

ResidualBlock

Bases: Module

Residual block with 2 convolutional layers.

Some architectural notes:
- The number of input, intermediate, and output channels is the same.
- Padding is always 'same'.
- The 2 convolutional layers share the same groups.
- No stride is allowed.
- Kernel sizes must be odd.

The output is given by: out = gate(f(x)) + x. The gating mechanism is optional, and f(x) has different structures depending on the block_type argument. Specifically, block_type is a string specifying the block's structure, where:
a = activation
b = batch norm
c = conv layer
d = dropout
For example, "bacdbacd" defines a block with 2x[batchnorm, activation, conv, dropout].
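The block_type encoding above can be sketched as a small parser (an illustrative sketch of the string convention, not the actual layer-construction code):

```python
# Character-to-layer mapping from the block_type convention above.
LAYER_KINDS = {"a": "activation", "b": "batchnorm", "c": "conv", "d": "dropout"}

def parse_block_type(block_type):
    """Expand a block_type string into the ordered list of layers it encodes.
    Raises KeyError on an unknown character."""
    return [LAYER_KINDS[char] for char in block_type]
```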

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input tensor. # TODO add shape

Returns:

- Tensor: output tensor. # TODO add shape

ResidualGatedBlock

Bases: ResidualBlock

Layer class that implements a residual block with a gating mechanism.

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input tensor. # TODO add shape

Returns:

- Tensor: output tensor. # TODO add shape

SkipConnectionMerger

Bases: MergeLayer

Specialized MergeLayer module that handles skip connections in the model.

TopDownDeterministicResBlock

Bases: ResBlockWithResampling

Resnet block for top-down deterministic layers.

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input. # TODO add shape

Returns:

- Tensor: output. # TODO add shape

TopDownLayer

Bases: Module

Top-down inference layer.

It includes:
- stochastic sampling,
- computation of the KL divergence,
- a small deterministic ResNet that performs upsampling.

NOTE 1: The algorithm for generative inference approximately works as follows:
- p_params = output of the top-down layer above
- bu = inferred bottom-up value at this layer
- q_params = merge(bu, p_params)
- z = stochastic_layer(q_params)
- (optional) get and merge the skip connection from the previous top-down layer
- top-down deterministic ResNet

NOTE 2: The top-down layer can work in two modes: inference and prediction/generative. Depending on the mode, it behaves differently:
- In inference mode, the parameters of q(z_i|z_{i+1}) are obtained from the inference path by merging the outcomes of the bottom-up and top-down passes. The exception is the top layer, where the parameters of q(z_L|x) are set to the output of the topmost bottom-up layer.
- In prediction/generative mode, the parameters of q(z_i|z_{i+1}) can again be obtained by merging bottom-up and top-down outputs (conditional generation), or it is possible to sample directly from the prior p(z_i|z_{i+1}) (unconditional generation).

NOTE 3: When doing unconditional generation, bu_value is not available. Hence the merge layer is not used, and z is sampled directly from p_params.

NOTE 4: If this is the top layer, at inference time the uppermost bottom-up value is used directly as q_params, and p_params are defined in (and can be learned by) this layer rather than taken from the layer above.
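The mode logic in the notes above can be sketched as follows; skip connections and the top-layer special case are omitted, and all sub-module names are illustrative stand-ins for the layer's actual components:

```python
def top_down_step(input_, bu_value, merge, stochastic, resnet, inference_mode=True):
    """Hedged sketch of NOTEs 1-3: conditional vs. unconditional sampling."""
    p_params = input_                      # output of the top-down layer above
    if inference_mode and bu_value is not None:
        # Conditional: merge bottom-up and top-down information into q_params.
        q_params = merge(bu_value, p_params)
    else:
        # Unconditional generation: no bu_value, sample directly from p_params.
        q_params = p_params
    z = stochastic(q_params)               # sample the latent tensor
    return resnet(z)                       # top-down deterministic ResNet
```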

forward(input_=None, skip_connection_input=None, inference_mode=False, bu_value=None, n_img_prior=None, forced_latent=None, force_constant_output=False, mode_pred=False, use_uncond_mode=False, var_clip_max=None)

Forward pass.

Parameters:

- input_ (Union[Tensor, None]): the input tensor to the layer, i.e., the output of the top-down layer above. Default is None.
- skip_connection_input (Union[Tensor, None]): the tensor brought by the skip connection between the current and the previous top-down layer. Default is None.
- inference_mode (bool): whether the layer is in inference mode. See NOTE 2 in the class description for more info. Default is False.
- bu_value (Union[Tensor, None]): the tensor defining the parameters mu_q and sigma_q computed during the bottom-up deterministic pass at the corresponding hierarchical layer. Default is None.
- n_img_prior (Union[int, None]): the number of images to be generated from the unconditional prior distribution p(z_L). Default is None.
- forced_latent (Union[Tensor, None]): a pre-defined latent tensor. If it is not None, it is used as the actual latent tensor and, hence, sampling does not happen. Default is None.
- force_constant_output (bool): whether to copy the first sample (and related distribution parameters) over the whole batch. This is used when sampling from the prior, in which case q is not used. Default is False.
- mode_pred (bool): whether the model is in prediction mode. Default is False.
- use_uncond_mode (bool): whether to use the unconditional distribution p(z) to sample latents in prediction mode. Default is False.
- var_clip_max (Union[float, None]): the maximum value reachable by the log-variance of the latent distribution. Values exceeding this threshold are clipped. Default is None.

get_p_params(input_, n_img_prior)

Return the parameters of the prior distribution p(z_i|z_{i+1}).

The parameters depend on the hierarchical level of the layer:
- if it is the topmost level, the parameters are those of the prior;
- otherwise, the input from the layer above is the parameters itself.

Parameters:

- input_ (Tensor, required): the input tensor to the layer, i.e., the output of the top-down layer above.
- n_img_prior (int, required): the number of images to be generated from the unconditional prior distribution p(z_L).

sample_from_q(input_, bu_value, var_clip_max=None, mask=None)

Compute the latent inference distribution q(z_i|z_{i+1}) and sample a latent tensor from it.

Parameters:

- input_ (Tensor, required): the input tensor to the layer, i.e., the output of the top-down layer above.
- bu_value (Tensor, required): the tensor defining the parameters mu_q and sigma_q computed during the bottom-up deterministic pass at the corresponding hierarchical layer.
- var_clip_max (Optional[float]): the maximum value reachable by the log-variance of the latent distribution. Values exceeding this threshold are clipped. Default is None.
- mask (Tensor): a tensor used to mask the sampled latent tensor. Default is None.