
Layers


Script containing the common basic blocks (nn.Module) reused by the LadderVAE.

BottomUpDeterministicResBlock

Bases: ResBlockWithResampling

Resnet block for bottom-up deterministic layers.

forward(x)

Forward pass.

Parameters:

  • x (Tensor) –

    Input tensor (TODO: add shape).

Returns:

  • Tensor

    Output tensor (TODO: add shape).

BottomUpLayer

Bases: Module

Bottom-up deterministic layer.

It consists of one BottomUpDeterministicResBlock or a stack of them. Its outputs are the so-called bu_values, which are later used in the Decoder to update the generative distributions.

NOTE: When Lateral Contextualization is enabled (i.e., enable_multiscale=True), the low-res lateral input is first fed through a BottomUpDeterministicResBlock (BUDB) without downsampling, and then merged, through the MergeLowRes layer, with the latent tensor produced by the primary flow of the BottomUpLayer. The BUDB that encodes the low-res input can either be shared with the primary flow (in which case it is the "same_size" BUDB, or stack of BUDBs; see self.net) or be a deep copy of the primary flow's BUDB. This behaviour is controlled by the lowres_separate_branch parameter.
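The difference between a shared and a separate low-res branch can be illustrated with a framework-free sketch. The `Branch` class and `make_lowres_branch` helper below are hypothetical stand-ins, not part of the library; they only show what "shared" versus "deep copy" means for the parameters.

```python
import copy


class Branch:
    """Hypothetical stand-in for a stack of residual blocks; holds one weight."""

    def __init__(self, weight: float) -> None:
        self.weight = weight


def make_lowres_branch(primary: Branch, lowres_separate_branch: bool) -> Branch:
    """Return the branch that would encode the low-res input (illustrative only)."""
    # Separate branch: an independent deep copy with its own parameters.
    # Shared branch: the very same object used by the primary flow.
    return copy.deepcopy(primary) if lowres_separate_branch else primary


primary = Branch(weight=1.0)
shared = make_lowres_branch(primary, lowres_separate_branch=False)
separate = make_lowres_branch(primary, lowres_separate_branch=True)

primary.weight = 2.0  # simulate a parameter update on the primary flow
print(shared.weight, separate.weight)  # the shared branch follows, the copy does not
```

With a shared branch, every update to the primary flow is also seen by the low-res encoder; with a separate branch, the two sets of parameters evolve independently after the copy.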

__init__(n_res_blocks, n_filters, conv_strides=(2, 2), downsampling_steps=0, nonlin=None, batchnorm=True, dropout=None, res_block_type=None, res_block_kernel=None, gated=None, enable_multiscale=False, multiscale_lowres_size_factor=None, lowres_separate_branch=False, multiscale_retain_spatial_dims=False, decoder_retain_spatial_dims=False, output_expected_shape=None)

Constructor.

Parameters:

  • n_res_blocks (int) –

    Number of BottomUpDeterministicResBlock modules stacked in this layer.

  • n_filters (int) –

    Number of channels present throughout the layers of this block.

  • downsampling_steps (int, default: 0 ) –

    Number of downsampling steps performed in this layer (typically 1). Default is 0.

  • nonlin (Optional[Callable], default: None ) –

    The non-linearity function used in the block. Default is None.

  • batchnorm (bool, default: True ) –

    Whether to use batchnorm layers. Default is True.

  • dropout (Optional[float], default: None ) –

    The dropout probability in dropout layers. If None dropout is not used. Default is None.

  • res_block_type (Optional[str], default: None ) –

    A string specifying the structure of the residual block. Check the ResidualBlock docstring for more information. Default is None.

  • res_block_kernel (Optional[int], default: None ) –

    The kernel size used in the convolutions of the residual block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.

  • gated (Optional[bool], default: None ) –

    Whether to use gated layer. Default is None.

  • enable_multiscale (bool, default: False ) –

    Whether to enable multiscale (Lateral Contextualization) or not. Default is False.

  • multiscale_lowres_size_factor (Optional[int], default: None ) –

    A factor that expresses the relative size of the primary flow tensor with respect to the lower-resolution lateral input tensor. Default is None.

  • lowres_separate_branch (bool, default: False ) –

    Whether the residual block(s) encoding the low-res input should be shared (False) or not (True) with the primary flow "same-size" residual block(s). Default is False.

  • multiscale_retain_spatial_dims (bool, default: False ) –

    Whether to pad the latent tensor resulting from the bottom-up layer's primary flow to match the size of the low-res input. Default is False.

  • decoder_retain_spatial_dims (bool, default: False ) –

    Whether in the corresponding top-down layer the shape of tensor is retained between input and output. Default is False.

  • output_expected_shape (Optional[Iterable[int]], default: None ) –

    The expected shape of the layer output (only used if enable_multiscale == True). Default is None.

forward(x, lowres_x=None)

Forward pass.

Parameters:

  • x (Tensor) –

    The input of the BottomUpLayer, i.e., the input image or the output of the previous layer.

  • lowres_x (Union[Tensor, None], default: None ) –

    The low-res input used for Lateral Contextualization (LC). Default is None.


GateLayer

Bases: Module

Layer class that implements a gating mechanism.

Double the number of channels through a convolutional layer, then use half the channels as gate for the other half.
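A minimal, framework-free sketch of the gating idea follows. It operates on a flat list of per-channel values that is assumed to be the output of the channel-doubling convolution. The use of a sigmoid to squash the gate is an assumption made for illustration; the text above does not state which function the layer applies.

```python
import math


def gate_channels(doubled: list[float]) -> list[float]:
    """Split 2*C channel values in half and gate the first half with the second."""
    if len(doubled) % 2 != 0:
        raise ValueError("expected an even number of channels")
    half = len(doubled) // 2
    values, gates = doubled[:half], doubled[half:]
    # Assumption: the gate is mapped to (0, 1) with a sigmoid,
    # i.e. output = value * sigmoid(gate) = value / (1 + exp(-gate)).
    return [v / (1.0 + math.exp(-g)) for v, g in zip(values, gates)]


out = gate_channels([1.0, -2.0, 0.0, 0.0])
print(out)  # a gate of 0.0 passes half of each value
```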

forward(x)

Forward pass.

Parameters:

  • x (Tensor) –

    Input tensor (TODO: add shape).

Returns:

  • Tensor

    Output tensor (TODO: add shape).

MergeLayer

Bases: Module

Layer class that merges two or more input tensors.

Merges two or more (B, C, [Z], Y, X) input tensors by concatenating them along dim=1 and passes the result through:

a) a convolutional 1x1 layer (merge_type == "linear"), or
b) a convolutional 1x1 layer and then a gated residual block (merge_type == "residual"), or
c) a convolutional 1x1 layer and then an ungated residual block (merge_type == "residual_ungated").

__init__(merge_type, channels, conv_strides=(2, 2), nonlin=nn.LeakyReLU(), batchnorm=True, dropout=None, res_block_type=None, res_block_kernel=None, conv2d_bias=True)

Constructor.

Parameters:

  • merge_type (Literal['linear', 'residual', 'residual_ungated']) –

    The type of merge done in the layer. It can be chosen between "linear", "residual", and "residual_ungated". Check the class docstring for more information about the behaviour of different merge modalities.

  • channels (Union[int, Iterable[int]]) –

    The number of channels used in the convolutional blocks of this layer.

    If it is an int:
    - 1st 1x1 Conv2d: in_channels=2*channels, out_channels=channels
    - (Optional) ResBlock: in_channels=channels, out_channels=channels

    If it is an Iterable (must have len(channels)==3):
    - 1st 1x1 Conv2d: in_channels=sum(channels[:-1]), out_channels=channels[-1]
    - (Optional) ResBlock: in_channels=channels[-1], out_channels=channels[-1]

  • conv_strides (tuple[int], default: (2, 2) ) –

    The strides used in the convolutions. Default is (2, 2).

  • nonlin (Callable, default: LeakyReLU() ) –

    The non-linearity function used in the block. Default is nn.LeakyReLU.

  • batchnorm (bool, default: True ) –

    Whether to use batchnorm layers. Default is True.

  • dropout (Optional[float], default: None ) –

    The dropout probability in dropout layers. If None dropout is not used. Default is None.

  • res_block_type (Optional[str], default: None ) –

    A string specifying the structure of the residual block. Check the ResidualBlock docstring for more information. Default is None.

  • res_block_kernel (Optional[int], default: None ) –

    The kernel size used in the convolutions of the residual block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.

  • conv2d_bias (Optional[bool], default: True ) –

    Whether to use bias term in convolutions. Default is True.
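The rule documented for the `channels` argument of MergeLayer can be written as a small helper. This is a sketch of the arithmetic only; `merge_conv_channels` is a hypothetical name and not a function of the library.

```python
from typing import Iterable, Union


def merge_conv_channels(channels: Union[int, Iterable[int]]) -> tuple[int, int]:
    """Return (in_channels, out_channels) of the first 1x1 convolution."""
    if isinstance(channels, int):
        # Two inputs with `channels` channels each are concatenated along dim=1.
        return 2 * channels, channels
    channels = list(channels)
    if len(channels) != 3:
        raise ValueError("an iterable `channels` must have exactly 3 entries")
    # The first two entries are the input widths, the last one is the output width.
    return sum(channels[:-1]), channels[-1]


print(merge_conv_channels(64))            # (128, 64)
print(merge_conv_channels((64, 32, 48)))  # (96, 48)
```

In both cases the optional residual block that follows keeps the output width unchanged.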

MergeLowRes

Bases: MergeLayer

Child class of MergeLayer.

Specifically designed to merge the low-resolution patches that are used in the Lateral Contextualization approach.

forward(latent, lowres)

Forward pass.

Parameters:

  • latent (Tensor) –

    The output latent tensor from previous layer in the LVAE hierarchy.

  • lowres (Tensor) –

    The low-res patch image to be merged to increase the context.

ResBlockWithResampling

Bases: Module

Residual block with resampling.

Residual block that takes care of resampling (i.e. downsampling or upsampling) steps (by a factor 2). It is structured as follows:
1. pre_conv: a downsampling or upsampling strided convolutional layer in case of resampling, or a 1x1 convolutional layer that maps the number of channels of the input to inner_channels.
2. ResidualBlock
3. post_conv: a 1x1 convolutional layer that maps the number of channels to c_out.

Some implementation notes:
- Resampling is performed through a strided convolution layer at the beginning of the block.
- The strided convolution block has fixed kernel size of 3x3 and 1 layer of padding with zeros.
- The number of channels is adjusted at the beginning and end of the block through 1x1 convolutional layers.
- The number of internal channels is by default the same as the number of output channels, but min_inner_channels can override the behaviour.
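For the downsampling ("bottom-up") case, the effect of a 3x3 strided convolution with one pixel of zero padding on the spatial size can be checked with the standard convolution output-size formula. The helper below is a sketch under that assumption (dilation 1); `strided_conv_out_size` is a hypothetical name.

```python
def strided_conv_out_size(
    size: int, stride: int = 2, kernel: int = 3, padding: int = 1
) -> int:
    """Output length along one axis of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


print(strided_conv_out_size(64))  # 32
print(strided_conv_out_size(65))  # 33
```

With these settings an even input size is exactly halved, which matches the "factor 2" resampling described above; odd sizes are rounded up.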

__init__(mode, c_in, c_out, conv_strides, min_inner_channels=None, nonlin=nn.LeakyReLU(), resample=False, res_block_kernel=None, groups=1, batchnorm=True, res_block_type=None, dropout=None, gated=None, conv2d_bias=True)

Constructor.

Parameters:

  • mode (Literal['top-down', 'bottom-up']) –

    The type of resampling performed in the initial strided convolution of the block. If "bottom-up", downsampling by a factor of 2 is done. If "top-down", upsampling by a factor of 2 is done.

  • c_in (int) –

    The number of input channels.

  • c_out (int) –

    The number of output channels.

  • min_inner_channels (Union[int, None], default: None ) –

    The number of channels used in the inner layer of this module. Default is None, meaning that the number of inner channels is set to c_out.

  • nonlin (Callable, default: LeakyReLU() ) –

    The non-linearity function used in the block. Default is nn.LeakyReLU.

  • resample (bool, default: False ) –

    Whether to perform resampling in the first convolutional layer. If False, the first convolutional layer just maps the input to a tensor with inner_channels channels through 1x1 convolution. Default is False.

  • res_block_kernel (Optional[Union[int, Iterable[int]]], default: None ) –

    The kernel size used in the convolutions of the residual block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.

  • groups (int, default: 1 ) –

    The number of groups to consider in the convolutions. Default is 1.

  • batchnorm (bool, default: True ) –

    Whether to use batchnorm layers. Default is True.

  • res_block_type (Union[str, None], default: None ) –

    A string specifying the structure of the residual block. Check the ResidualBlock docstring for more information. Default is None.

  • dropout (Union[float, None], default: None ) –

    The dropout probability in dropout layers. If None dropout is not used. Default is None.

  • gated (Union[bool, None], default: None ) –

    Whether to use gated layer. Default is None.

  • conv2d_bias (bool, default: True ) –

    Whether to use bias term in convolutions. Default is True.

forward(x)

Forward pass.

Parameters:

  • x (Tensor) –

    Input tensor (TODO: add shape).

Returns:

  • Tensor

    Output tensor (TODO: add shape).

ResidualBlock

Bases: Module

Residual block with 2 convolutional layers.

Some architectural notes:
- The number of input, intermediate, and output channels is the same.
- Padding is always 'same'.
- The 2 convolutional layers have the same groups.
- No stride allowed.
- Kernel sizes must be odd.

The output is given by: out = gate(f(x)) + x. The gating mechanism is optional, and f(x) has a different structure depending on the block_type argument. Specifically, block_type is a string specifying the block's structure, with:
- a = activation
- b = batch norm
- c = conv layer
- d = dropout

For example, "bacdbacd" defines a block with 2x[batchnorm, activation, conv, dropout].
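The block_type encoding can be made concrete with a short parser that expands the string into an ordered list of layer kinds. This is only an illustration of the letter codes listed above; `parse_block_type` and `LAYER_CODES` are hypothetical names, and the library may validate or build the block differently.

```python
LAYER_CODES = {"a": "activation", "b": "batchnorm", "c": "conv", "d": "dropout"}


def parse_block_type(block_type: str) -> list[str]:
    """Expand a block_type string into the ordered list of layer kinds."""
    unknown = sorted(set(block_type) - set(LAYER_CODES))
    if unknown:
        raise ValueError(f"unknown layer codes: {unknown}")
    return [LAYER_CODES[code] for code in block_type]


layers = parse_block_type("bacdbacd")
print(layers)
# ['batchnorm', 'activation', 'conv', 'dropout',
#  'batchnorm', 'activation', 'conv', 'dropout']
```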

__init__(channels, nonlin, conv_strides=(2, 2), kernel=None, groups=1, batchnorm=True, block_type=None, dropout=None, gated=None, conv2d_bias=True)

Constructor.

Parameters:

  • channels (int) –

    The number of input and output channels (they are the same).

  • nonlin (Callable) –

    The non-linearity function used in the block (e.g., nn.ReLU).

  • kernel (Union[int, Iterable[int], None], default: None ) –

    The kernel size used in the convolutions of the block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.

  • groups (int, default: 1 ) –

    The number of groups to consider in the convolutions. Default is 1.

  • batchnorm (bool, default: True ) –

    Whether to use batchnorm layers. Default is True.

  • block_type (str, default: None ) –

    A string specifying the block structure, check class docstring for more info. Default is None.

  • dropout (float, default: None ) –

    The dropout probability in dropout layers. If None dropout is not used. Default is None.

  • gated (bool, default: None ) –

    Whether to use gated layer. Default is None.

  • conv2d_bias (bool, default: True ) –

    Whether to use bias term in convolutions. Default is True.

forward(x)

Forward pass.

Parameters:

  • x (Tensor) –

    Input tensor (TODO: add shape).

Returns:

  • Tensor

    Output tensor (TODO: add shape).

ResidualGatedBlock

Bases: ResidualBlock

Layer class that implements a residual block with a gating mechanism.

forward(x)

Forward pass.

Parameters:

  • x (Tensor) –

    Input tensor (TODO: add shape).

Returns:

  • Tensor

    Output tensor (TODO: add shape).

SkipConnectionMerger

Bases: MergeLayer

Specialized MergeLayer module, handles skip connections in the model.

__init__(nonlin, channels, batchnorm, dropout, res_block_type, conv_strides=(2, 2), merge_type='residual', conv2d_bias=True, res_block_kernel=None)

Constructor.

Parameters:

  • nonlin (Callable) –

    The non-linearity function used in the block.

  • channels (Union[int, Iterable[int]]) –

    The number of channels used in the convolutional blocks of this layer.

    If it is an int:
    - 1st 1x1 Conv2d: in_channels=2*channels, out_channels=channels
    - (Optional) ResBlock: in_channels=channels, out_channels=channels

    If it is an Iterable (must have len(channels)==3):
    - 1st 1x1 Conv2d: in_channels=sum(channels[:-1]), out_channels=channels[-1]
    - (Optional) ResBlock: in_channels=channels[-1], out_channels=channels[-1]

  • batchnorm (bool) –

    Whether to use batchnorm layers.

  • dropout (float) –

    The dropout probability in dropout layers. If None dropout is not used.

  • res_block_type (str) –

    A string specifying the structure of the residual block. Check the ResidualBlock docstring for more information.

  • conv_strides (tuple, default: (2, 2) ) –

    The strides used in the convolutions. Default is (2, 2).

  • merge_type (Literal['linear', 'residual', 'residual_ungated'], default: 'residual' ) –

    The type of merge done in the layer. It can be chosen between "linear", "residual", and "residual_ungated". Check the MergeLayer class docstring for more information about the behaviour of different merge modalities.

  • conv2d_bias (bool, default: True ) –

    Whether to use bias term in convolutions. Default is True.

  • res_block_kernel (Union[int, Iterable[int]], default: None ) –

    The kernel size used in the convolutions of the residual block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.

TopDownDeterministicResBlock

Bases: ResBlockWithResampling

Resnet block for top-down deterministic layers.

forward(x)

Forward pass.

Parameters:

  • x (Tensor) –

    Input tensor (TODO: add shape).

Returns:

  • Tensor

    Output tensor (TODO: add shape).

TopDownLayer

Bases: Module

Top-down inference layer.

It includes:
- Stochastic sampling,
- Computation of KL divergence,
- A small deterministic ResNet that performs upsampling.

NOTE 1: The algorithm for generative inference approximately works as follows:
- p_params = output of top-down layer above
- bu = inferred bottom-up value at this layer
- q_params = merge(bu, p_params)
- z = stochastic_layer(q_params)
- (optional) get and merge skip connection from prev top-down layer
- top-down deterministic ResNet

NOTE 2: The top-down layer can work in two modes: inference and prediction/generative. Its behaviour depends on the mode:
- In inference mode, the parameters of q(z_i|z_{i+1}) are obtained from the inference path by merging the outcomes of the bottom-up and top-down passes. The exception is the top layer, in which the parameters of q(z_L|x) are set to the output of the topmost bottom-up layer.
- In prediction/generative mode, the parameters of q(z_i|z_{i+1}) can again be obtained by merging bottom-up and top-down outputs (CONDITIONAL GENERATION), or it is possible to sample directly from the prior p(z_i|z_{i+1}) (UNCONDITIONAL GENERATION).

NOTE 3: When doing unconditional generation, bu_value is not available. Hence the merge layer is not used, and z is sampled directly from p_params.

NOTE 4: If this is the top layer, at inference time, the uppermost bottom-up value is used directly as q_params, and p_params are defined in this layer (while they are usually taken from the previous layer), and can be learned.
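The four notes can be condensed into a small decision function that reports where the parameters used for sampling z come from. It is a simplified reading of the notes, not the layer's actual control flow; `latent_params_source` is a hypothetical name and it returns descriptive strings rather than tensors.

```python
from typing import Optional


def latent_params_source(is_top_layer: bool, bu_value: Optional[object]) -> str:
    """Describe which parameters z is sampled from, following NOTES 1-4."""
    if bu_value is None:
        # NOTE 3: unconditional generation, no bottom-up value, no merge.
        return "p_params"
    if is_top_layer:
        # NOTE 4: the uppermost bottom-up value is used directly as q_params.
        return "bu_value"
    # NOTE 1: the general case merges bottom-up and top-down information.
    return "merge(bu_value, p_params)"


print(latent_params_source(is_top_layer=False, bu_value="bu"))
print(latent_params_source(is_top_layer=True, bu_value="bu"))
print(latent_params_source(is_top_layer=False, bu_value=None))
```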

__init__(z_dim, n_res_blocks, n_filters, conv_strides, is_top_layer=False, upsampling_steps=None, nonlin=None, merge_type=None, batchnorm=True, dropout=None, stochastic_skip=False, res_block_type=None, res_block_kernel=None, groups=1, gated=None, learn_top_prior=False, top_prior_param_shape=None, analytical_kl=False, retain_spatial_dims=False, vanilla_latent_hw=None, input_image_shape=None, normalize_latent_factor=1.0, conv2d_bias=True, stochastic_use_naive_exponential=False)

Constructor.

Parameters:

  • z_dim (int) –

    The size of the latent space.

  • n_res_blocks (int) –

    The number of TopDownDeterministicResBlock blocks.

  • n_filters (int) –

    The number of channels present throughout the layers of this block.

  • conv_strides (tuple[int]) –

    The strides used in the convolutions.

  • is_top_layer (bool, default: False ) –

    Whether the current layer is at the top of the Decoder hierarchy. Default is False.

  • upsampling_steps (Union[int, None], default: None ) –

    The number of upsampling steps performed in this layer (typically 1). Default is None.

  • nonlin (Union[Callable, None], default: None ) –

    The non-linearity function used in the block (e.g., nn.ReLU). Default is None.

  • merge_type (Union[Literal['linear', 'residual', 'residual_ungated'], None], default: None ) –

    The type of merge done in the layer. It can be chosen between "linear", "residual", and "residual_ungated". Check the MergeLayer class docstring for more information about the behaviour of different merging modalities. Default is None.

  • batchnorm (bool, default: True ) –

    Whether to use batchnorm layers. Default is True.

  • dropout (Union[float, None], default: None ) –

    The dropout probability in dropout layers. If None dropout is not used. Default is None.

  • stochastic_skip (bool, default: False ) –

    Whether to use skip connections between the previous top-down layer's output and this layer's stochastic output. A stochastic skip connection gives the previous layer's output a direct path to this hierarchical level, which facilitates gradient flow during backpropagation. Default is False.

  • res_block_type (Union[str, None], default: None ) –

    A string specifying the structure of residual block. Check ResidualBlock documentation for more information. Default is None.

  • res_block_kernel (Union[int, None], default: None ) –

    The kernel size used in the convolutions of the residual block. It can be either a single integer or a pair of integers defining the squared kernel. Default is None.

  • groups (int, default: 1 ) –

    The number of groups to consider in the convolutions. Default is 1.

  • gated (Union[bool, None], default: None ) –

    Whether to use gated layer in ResidualBlock. Default is None.

  • learn_top_prior (bool, default: False ) –

    Whether to set the top prior as learnable. If this is set to False, in the top-most layer the prior will be N(0,1). Otherwise, we will still have a normal distribution whose parameters will be learnt. Default is False.

  • top_prior_param_shape (Union[Iterable[int], None], default: None ) –

    The size of the tensor that expresses the mean and the variance of the prior for the topmost layer. Default is None.

  • analytical_kl (bool, default: False ) –

    If True, KL divergence is calculated according to the analytical formula. Otherwise, an MC approximation using sampled latents is calculated. Default is False.

  • retain_spatial_dims (bool, default: False ) –

    If True, the size of the Encoder's latent space is kept equal to input_image_shape within the top-down layer. This implies that the output spatial size equals the input spatial size. To achieve this, the intermediate representation is center-cropped. Default is False.

  • vanilla_latent_hw (Union[Iterable[int], None], default: None ) –

    The shape of the latent tensor used for prediction (i.e., it influences the computation of restricted KL). Default is None.

  • input_image_shape (Union[tuple[int, int], None], default: None ) –

    The shape of the input image tensor. When retain_spatial_dims is set to True, this is used to ensure that the shape of this layer output has the same shape as the input. Default is None.

  • normalize_latent_factor (float, default: 1.0 ) –

    A factor used to normalize the latent tensors q_params. Specifically, normalization is done by dividing the latent tensor by this factor. Default is 1.0.

  • conv2d_bias (bool, default: True ) –

    Whether to use a bias term in the convolutional blocks of this layer. Default is True.

  • stochastic_use_naive_exponential (bool, default: False ) –

    If False, in the NormalStochasticBlock2d exponentials are computed according to the alternative definition provided by StableExponential class. This should improve numerical stability in the training process. Default is False.
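To illustrate the two options behind analytical_kl, the sketch below compares the closed-form KL divergence between two univariate Gaussians with a Monte Carlo estimate built from samples of q. It uses only the standard library and scalar distributions; the function names are hypothetical, and the layer itself works on tensors and may use far fewer samples per estimate.

```python
import math
import random


def analytical_kl(mu_q: float, sigma_q: float, mu_p: float, sigma_p: float) -> float:
    """Closed-form KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2))."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2 * sigma_p**2)
        - 0.5
    )


def log_normal_pdf(z: float, mu: float, sigma: float) -> float:
    """Log-density of N(mu, sigma^2) evaluated at z."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2


def monte_carlo_kl(
    mu_q: float, sigma_q: float, mu_p: float, sigma_p: float,
    n_samples: int = 50_000, seed: int = 0,
) -> float:
    """Estimate the KL as the average of log q(z) - log p(z) over z ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(mu_q, sigma_q)
        total += log_normal_pdf(z, mu_q, sigma_q) - log_normal_pdf(z, mu_p, sigma_p)
    return total / n_samples


exact = analytical_kl(0.5, 0.8, 0.0, 1.0)
approx = monte_carlo_kl(0.5, 0.8, 0.0, 1.0)
print(f"analytical: {exact:.4f}, Monte Carlo: {approx:.4f}")
```

The analytical form is exact and noise-free for Gaussians, while the sampled estimate is unbiased but noisy.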

forward(input_=None, skip_connection_input=None, inference_mode=False, bu_value=None, n_img_prior=None, forced_latent=None, force_constant_output=False, mode_pred=False, use_uncond_mode=False, var_clip_max=None)

Forward pass.

Parameters:

  • input_ (Union[Tensor, None], default: None ) –

    The input tensor to the layer, which is the output of the top-down layer above. Default is None.

  • skip_connection_input (Union[Tensor, None], default: None ) –

    The tensor brought by the skip connection between the current and the previous top-down layer. Default is None.

  • inference_mode (bool, default: False ) –

    Whether the layer is in inference mode. See NOTE 2 in class description for more info. Default is False.

  • bu_value (Union[Tensor, None], default: None ) –

    The tensor defining the parameters \mu_q and \sigma_q computed during the bottom-up deterministic pass at the corresponding hierarchical layer. Default is None.

  • n_img_prior (Union[int, None], default: None ) –

    The number of images to be generated from the unconditional prior distribution p(z_L). Default is None.

  • forced_latent (Union[Tensor, None], default: None ) –

    A pre-defined latent tensor. If it is not None, then it is used as the actual latent tensor and, hence, sampling does not happen. Default is None.

  • force_constant_output (bool, default: False ) –

    Whether to copy the first sample (and the related distribution parameters) over the whole batch. This is used when running experiments that sample from the prior, in which case q is not used. Default is False.

  • mode_pred (bool, default: False ) –

    Whether the model is in prediction mode. Default is False.

  • use_uncond_mode (bool, default: False ) –

    Whether to use the unconditional distribution p(z) to sample latents in prediction mode. Default is False.

  • var_clip_max (Union[float, None], default: None ) –

    The maximum value reachable by the log-variance of the latent distribution. Values exceeding this threshold are clipped.
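The effect of var_clip_max can be sketched as an upper clamp on the log-variance values. The helper below works on a plain list and is only illustrative; `clip_log_variance` is a hypothetical name, and the layer applies the equivalent operation to tensors.

```python
from typing import Optional


def clip_log_variance(logvar: list[float], var_clip_max: Optional[float]) -> list[float]:
    """Clamp log-variance values from above; None disables clipping."""
    if var_clip_max is None:
        return list(logvar)
    return [min(value, var_clip_max) for value in logvar]


print(clip_log_variance([-3.0, 0.5, 25.0], var_clip_max=20.0))  # [-3.0, 0.5, 20.0]
print(clip_log_variance([-3.0, 0.5, 25.0], var_clip_max=None))  # [-3.0, 0.5, 25.0]
```

Bounding the log-variance keeps the variance obtained by exponentiation from overflowing.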

get_p_params(input_, n_img_prior)

Return the parameters of the prior distribution p(z_i|z_{i+1}).

The parameters depend on the hierarchical level of the layer:
- if it is the topmost level, the parameters are those of the prior;
- otherwise, the input from the layer above provides the parameters.

Parameters:

  • input_ (Tensor) –

    The input tensor to the layer, which is the output of the top-down layer above.

  • n_img_prior (int) –

    The number of images to be generated from the unconditional prior distribution p(z_L).

sample_from_q(input_, bu_value, var_clip_max=None, mask=None)

Method computes the latent inference distribution q(z_i|z_{i+1}).

Used for sampling a latent tensor from it.

Parameters:

  • input_ (Tensor) –

    The input tensor to the layer, which is the output of the top-down layer above.

  • bu_value (Tensor) –

    The tensor defining the parameters \mu_q and \sigma_q computed during the bottom-up deterministic pass at the corresponding hierarchical layer.

  • var_clip_max (Optional[float], default: None ) –

    The maximum value reachable by the log-variance of the latent distribution. Values exceeding this threshold are clipped. Default is None.

  • mask (Tensor, default: None ) –

    A tensor that is used to mask the sampled latent tensor. Default is None.