Layers

Script containing the common basic blocks (nn.Module) reused by the LadderVAE.

BottomUpDeterministicResBlock

Bases: ResBlockWithResampling

Resnet block for bottom-up deterministic layers.

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input. # TODO add shape

Returns:

- Tensor: output. # TODO add shape

BottomUpLayer

Bases: Module

Bottom-up deterministic layer.

It consists of one or a stack of BottomUpDeterministicResBlocks. The outputs are the so-called bu_values, which are later used in the Decoder to update the generative distributions.

NOTE: When Lateral Contextualization is enabled (i.e., enable_multiscale=True), the low-res lateral input is first fed through a BottomUpDeterministicResBlock (BUDB) without downsampling, and then merged with the latent tensor produced by the primary flow of the BottomUpLayer through the MergeLowRes layer. Note that the BUDB that encodes the low-res input can either be shared with the primary flow (in which case it is the "same_size" BUDB, or stack of BUDBs; see self.net) or be a deep copy of the primary flow's BUDB. This behaviour is controlled by the lowres_separate_branch parameter.
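The branch-sharing logic above can be sketched in plain Python; all names here are illustrative stand-ins, not the actual implementation:

```python
import copy

class BottomUpLayerSketch:
    """Hedged sketch of the Lateral Contextualization wiring described above."""

    def __init__(self, net, merge, enable_multiscale=True, lowres_separate_branch=False):
        self.net = net                # primary flow: BUDB (or stack of BUDBs)
        self.merge = merge            # stand-in for the MergeLowRes layer
        self.enable_multiscale = enable_multiscale
        # Low-res branch: either shares weights with the primary flow or is an
        # independent deep copy, depending on lowres_separate_branch.
        self.lowres_net = copy.deepcopy(net) if lowres_separate_branch else net

    def forward(self, x, lowres_x=None):
        bu_value = self.net(x)
        if self.enable_multiscale and lowres_x is not None:
            bu_value = self.merge(bu_value, self.lowres_net(lowres_x))
        return bu_value
```

With lowres_separate_branch=False the low-res input reuses exactly the same weights as the primary flow; with True the two branches start identical but train independently.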

forward(x, lowres_x=None)

Forward pass.

Parameters:

- x (Tensor, required): the input of the BottomUpLayer, i.e., the input image or the output of the previous layer.
- lowres_x (Union[Tensor, None]): the low-res input used for Lateral Contextualization (LC). Default is None.

GateLayer

Bases: Module

Layer class that implements a gating mechanism.

Double the number of channels through a convolutional layer, then use half the channels as gate for the other half.
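The split-and-gate step can be sketched in plain Python for a single spatial location (the channel-doubling convolution is omitted; this is an illustrative sketch, not the actual implementation):

```python
import math

def gate(x):
    """Sketch of the gating step: the first half of the (doubled) channels
    passes through, modulated by the sigmoid of the second half.
    Here x is a flat list of channel values."""
    half = len(x) // 2
    values, gates = x[:half], x[half:]
    sigmoid = lambda g: 1.0 / (1.0 + math.exp(-g))
    return [v * sigmoid(g) for v, g in zip(values, gates)]
```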

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input. # TODO add shape

Returns:

- Tensor: output. # TODO add shape

MergeLayer

Bases: Module

Layer class that merges two or more input tensors.

Merges two or more (B, C, [Z], Y, X) input tensors by concatenating them along dim=1 and passes the result through one of:
a) a 1x1 convolutional layer (merge_type == "linear"),
b) a 1x1 convolutional layer followed by a gated residual block (merge_type == "residual"), or
c) a 1x1 convolutional layer followed by an ungated residual block (merge_type == "residual_ungated").
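The "linear" variant can be sketched for a single spatial location, since a 1x1 convolution acts there as a plain linear map over channels (an illustrative sketch with the bias omitted, not the actual implementation):

```python
def merge_linear(tensors, weights):
    """Sketch of merge_type == "linear" at one spatial location:
    concatenate the channel vectors, then apply the 1x1 convolution as a
    linear map. weights has shape (out_channels, total_in_channels)."""
    concatenated = [c for t in tensors for c in t]   # concat along dim=1
    return [sum(w * c for w, c in zip(row, concatenated)) for row in weights]
```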

MergeLowRes

Bases: MergeLayer

Child class of MergeLayer.

Specifically designed to merge the low-resolution patches that are used in Lateral Contextualization approach.

forward(latent, lowres)

Forward pass.

Parameters:

- latent (Tensor, required): the output latent tensor from the previous layer in the LVAE hierarchy.
- lowres (Tensor, required): the low-res patch image merged in to increase the context.

ResBlockWithResampling

Bases: Module

Residual block with resampling.

Residual block that takes care of resampling (i.e., downsampling or upsampling) by a factor of 2. It is structured as follows:
1. pre_conv: a strided downsampling or upsampling convolutional layer in case of resampling, or a 1x1 convolutional layer that maps the number of input channels to inner_channels.
2. ResidualBlock.
3. post_conv: a 1x1 convolutional layer that maps the number of channels to c_out.

Some implementation notes:
- Resampling is performed through a strided convolution layer at the beginning of the block.
- The strided convolution block has a fixed 3x3 kernel size and 1 layer of zero padding.
- The number of channels is adjusted at the beginning and end of the block through 1x1 convolutional layers.
- The number of internal channels is by default the same as the number of output channels, but min_inner_channels can override this behaviour.
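The three-stage structure above can be sketched as a 2D PyTorch module; argument names and the inner residual stand-in are illustrative assumptions, not the actual implementation:

```python
import torch
import torch.nn as nn

class ResBlockWithResamplingSketch(nn.Module):
    """Hedged sketch of pre_conv -> ResidualBlock -> post_conv."""

    def __init__(self, c_in, c_out, resample=None, min_inner_channels=0):
        super().__init__()
        inner = max(c_out, min_inner_channels)
        if resample == "down":
            # Strided 3x3 convolution, zero padding of 1: halves Y and X.
            self.pre_conv = nn.Conv2d(c_in, inner, kernel_size=3, stride=2, padding=1)
        elif resample == "up":
            # Strided transposed convolution: doubles Y and X.
            self.pre_conv = nn.ConvTranspose2d(
                c_in, inner, kernel_size=3, stride=2, padding=1, output_padding=1
            )
        else:
            # No resampling: a 1x1 convolution only adjusts the channel count.
            self.pre_conv = nn.Conv2d(c_in, inner, kernel_size=1)
        # Simple stand-in for the ResidualBlock.
        self.res_block = nn.Sequential(
            nn.Conv2d(inner, inner, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(inner, inner, kernel_size=3, padding=1),
        )
        self.post_conv = nn.Conv2d(inner, c_out, kernel_size=1)

    def forward(self, x):
        x = self.pre_conv(x)
        x = x + self.res_block(x)   # residual connection
        return self.post_conv(x)
```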

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input. # TODO add shape

Returns:

- Tensor: output. # TODO add shape

ResidualBlock

Bases: Module

Residual block with 2 convolutional layers.

Some architectural notes:
- The number of input, intermediate, and output channels is the same.
- Padding is always 'same'.
- The 2 convolutional layers share the same groups.
- No stride is allowed.
- Kernel sizes must be odd.

The output is given by: out = gate(f(x)) + x. The gating mechanism is optional, and f(x) has different structures depending on the block_type argument. Specifically, block_type is a string specifying the block's structure, where:
a = activation
b = batch norm
c = conv layer
d = dropout
For example, "bacdbacd" defines a block with 2x[batchnorm, activation, conv, dropout].
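The block_type encoding above can be sketched as a small parser (an illustrative sketch of the string convention, not the actual layer-construction code):

```python
# Character-to-layer mapping from the block_type convention above.
LAYER_KINDS = {"a": "activation", "b": "batchnorm", "c": "conv", "d": "dropout"}

def parse_block_type(block_type):
    """Expand a block_type string into the ordered list of layers it encodes.
    Raises KeyError on an unknown character."""
    return [LAYER_KINDS[char] for char in block_type]
```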

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input tensor. # TODO add shape

Returns:

- Tensor: output tensor. # TODO add shape

ResidualGatedBlock

Bases: ResidualBlock

Layer class that implements a residual block with a gating mechanism.

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input tensor. # TODO add shape

Returns:

- Tensor: output tensor. # TODO add shape

SkipConnectionMerger

Bases: MergeLayer

Specialized MergeLayer module that handles skip connections in the model.

TopDownDeterministicResBlock

Bases: ResBlockWithResampling

Resnet block for top-down deterministic layers.

forward(x)

Forward pass.

Parameters:

- x (Tensor, required): input. # TODO add shape

Returns:

- Tensor: output. # TODO add shape

TopDownLayer

Bases: Module

Top-down inference layer.

It includes:
- stochastic sampling,
- computation of the KL divergence,
- a small deterministic ResNet that performs upsampling.

NOTE 1: The algorithm for generative inference approximately works as follows:
- p_params = output of the top-down layer above
- bu = inferred bottom-up value at this layer
- q_params = merge(bu, p_params)
- z = stochastic_layer(q_params)
- (optional) get and merge the skip connection from the previous top-down layer
- top-down deterministic ResNet

NOTE 2: The top-down layer can work in two modes: inference and prediction/generative. Depending on the mode, it behaves differently:
- In inference mode, the parameters of q(z_i|z_{i+1}) are obtained from the inference path by merging the outcomes of the bottom-up and top-down passes. The exception is the top layer, where the parameters of q(z_L|x) are set to the output of the topmost bottom-up layer.
- In prediction/generative mode, the parameters of q(z_i|z_{i+1}) can again be obtained by merging bottom-up and top-down outputs (conditional generation), or it is possible to sample directly from the prior p(z_i|z_{i+1}) (unconditional generation).

NOTE 3: When doing unconditional generation, bu_value is not available. Hence the merge layer is not used, and z is sampled directly from p_params.

NOTE 4: If this is the top layer, at inference time the uppermost bottom-up value is used directly as q_params, and p_params are defined in (and can be learned by) this layer rather than taken from the layer above.
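The mode logic in the notes above can be sketched as follows; skip connections and the top-layer special case are omitted, and all sub-module names are illustrative stand-ins for the layer's actual components:

```python
def top_down_step(input_, bu_value, merge, stochastic, resnet, inference_mode=True):
    """Hedged sketch of NOTEs 1-3: conditional vs. unconditional sampling."""
    p_params = input_                      # output of the top-down layer above
    if inference_mode and bu_value is not None:
        # Conditional: merge bottom-up and top-down information into q_params.
        q_params = merge(bu_value, p_params)
    else:
        # Unconditional generation: no bu_value, sample directly from p_params.
        q_params = p_params
    z = stochastic(q_params)               # sample the latent tensor
    return resnet(z)                       # top-down deterministic ResNet
```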

forward(input_=None, skip_connection_input=None, inference_mode=False, bu_value=None, n_img_prior=None, forced_latent=None, force_constant_output=False, mode_pred=False, use_uncond_mode=False, var_clip_max=None)

Forward pass.

Parameters:

- input_ (Union[Tensor, None]): the input tensor to the layer, i.e., the output of the top-down layer above. Default is None.
- skip_connection_input (Union[Tensor, None]): the tensor brought by the skip connection between the current and the previous top-down layer. Default is None.
- inference_mode (bool): whether the layer is in inference mode. See NOTE 2 in the class description for more info. Default is False.
- bu_value (Union[Tensor, None]): the tensor defining the parameters mu_q and sigma_q computed during the bottom-up deterministic pass at the corresponding hierarchical layer. Default is None.
- n_img_prior (Union[int, None]): the number of images to be generated from the unconditional prior distribution p(z_L). Default is None.
- forced_latent (Union[Tensor, None]): a pre-defined latent tensor. If it is not None, it is used as the actual latent tensor and, hence, sampling does not happen. Default is None.
- force_constant_output (bool): whether to copy the first sample (and related distribution parameters) over the whole batch. This is used when sampling from the prior, in which case q is not used. Default is False.
- mode_pred (bool): whether the model is in prediction mode. Default is False.
- use_uncond_mode (bool): whether to use the unconditional distribution p(z) to sample latents in prediction mode. Default is False.
- var_clip_max (Union[float, None]): the maximum value reachable by the log-variance of the latent distribution. Values exceeding this threshold are clipped. Default is None.

get_p_params(input_, n_img_prior)

Return the parameters of the prior distribution p(z_i|z_{i+1}).

The parameters depend on the hierarchical level of the layer:
- if it is the topmost level, the parameters are those of the prior;
- otherwise, the input from the layer above is the parameters itself.

Parameters:

- input_ (Tensor, required): the input tensor to the layer, i.e., the output of the top-down layer above.
- n_img_prior (int, required): the number of images to be generated from the unconditional prior distribution p(z_L).

sample_from_q(input_, bu_value, var_clip_max=None, mask=None)

Compute the latent inference distribution q(z_i|z_{i+1}) and sample a latent tensor from it.

Parameters:

- input_ (Tensor, required): the input tensor to the layer, i.e., the output of the top-down layer above.
- bu_value (Tensor, required): the tensor defining the parameters mu_q and sigma_q computed during the bottom-up deterministic pass at the corresponding hierarchical layer.
- var_clip_max (Optional[float]): the maximum value reachable by the log-variance of the latent distribution. Values exceeding this threshold are clipped. Default is None.
- mask (Tensor): a tensor used to mask the sampled latent tensor. Default is None.