LVAE
LadderVAE
Bases: Module
Constructor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_shape` | `int` | The size of the input image. | required |
| `output_channels` | `int` | The number of output channels. | required |
| `multiscale_count` | `int` | The number of scales for multiscale processing. | required |
| `z_dims` | `list[int]` | The dimensions of the latent space for each layer. | required |
| `encoder_n_filters` | `int` | The number of filters in the encoder. | required |
| `decoder_n_filters` | `int` | The number of filters in the decoder. | required |
| `encoder_conv_strides` | `list[int]` | The strides for the convolutional layers of the encoder. | required |
| `decoder_conv_strides` | `list[int]` | The strides for the convolutional layers of the decoder. | required |
| `encoder_dropout` | `float` | The dropout rate for the encoder. | required |
| `decoder_dropout` | `float` | The dropout rate for the decoder. | required |
| `nonlinearity` | `str` | The nonlinearity function to use. | required |
| `predict_logvar` | `bool` | Whether to predict the log variance. | required |
| `analytical_kl` | `bool` | Whether to use the analytical KL divergence. | required |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If a convolution dimensionality other than 2D is requested; only 2D convolutions are supported. |
image_size = input_shape
instance-attribute
Input image size. (Z, Y, X) or (Y, X) if the data is 2D.
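To show how these arguments fit together, here is a hypothetical set of keyword arguments for a two-layer 2D model. All values are illustrative only; note that the table above lists `input_shape` as `int`, while the `image_size` attribute describes a `(Y, X)` tuple, so check the actual signature before use.

```python
# Illustrative constructor arguments for a two-layer 2D LadderVAE.
# Values are examples only, not recommended defaults.
lvae_kwargs = {
    "input_shape": (64, 64),        # (Y, X) for 2D data; may be an int in some versions
    "output_channels": 2,
    "multiscale_count": 1,          # no lateral-context inputs beyond the first layer
    "z_dims": [128, 128],           # one latent dimensionality per ladder layer
    "encoder_n_filters": 64,
    "decoder_n_filters": 64,
    "encoder_conv_strides": [2, 2],
    "decoder_conv_strides": [2, 2],
    "encoder_dropout": 0.1,
    "decoder_dropout": 0.1,
    "nonlinearity": "ELU",
    "predict_logvar": False,
    "analytical_kl": False,
}
```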
bottomup_pass(inp)
Wrapper around `_bottomup_pass()`.
create_bottom_up_layers(lowres_separate_branch)
Create the stack of bottom-up layers of the Encoder that are used to generate the so-called `bu_values`.
NOTE:
If `self._multiscale_count < self.n_layers`, then LC (lateral context) is done only in the first
`self._multiscale_count` bottom-up layers (starting from the bottom).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `lowres_separate_branch` | `bool` | Whether the residual block(s) used for encoding the low-res input are shared (…). | required |
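The NOTE above can be restated as a tiny sketch; `layers_with_lateral_context` is an illustrative helper, not part of the class:

```python
def layers_with_lateral_context(multiscale_count, n_layers):
    """Indices of bottom-up layers that receive lateral context (LC):
    only the first `multiscale_count` layers, counting from the bottom,
    when multiscale_count < n_layers. Illustrative sketch only."""
    return list(range(min(multiscale_count, n_layers)))
```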
create_final_topdown_layer(upsample)
Create the final top-down layer of the Decoder.
NOTE: In this layer, (optional) upsampling is performed by bilinear interpolation instead of transposed convolution (like in other TD layers).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `upsample` | `bool` | Whether to upsample the input of the final top-down layer by bilinear interpolation with … | required |
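To illustrate how this differs from a transposed convolution: bilinear interpolation has no learned weights, so new samples are plain interpolations of their neighbours. A toy 1-D sketch, not the actual implementation:

```python
def linear_upsample_1d(x, factor=2):
    """Toy 1-D analogue of bilinear upsampling by a factor of 2:
    keep each sample and insert the midpoint between neighbours
    (the last sample is repeated). No parameters are learned,
    unlike a transposed convolution."""
    assert factor == 2, "sketch only handles factor 2"
    out = []
    for i, v in enumerate(x):
        out.append(v)
        nxt = x[i + 1] if i + 1 < len(x) else v
        out.append((v + nxt) / 2)
    return out
```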
create_first_bottom_up(init_stride, num_res_blocks=1)
Create the first bottom-up block of the Encoder.
Its role is to perform an initial image compression step. It is composed of a sequence of `nn.Conv2d` + non-linearity + `BottomUpDeterministicResBlock` (1 or more, default is 1).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `init_stride` | `int` | The stride used by the initial `Conv2d` block. | required |
| `num_res_blocks` | `int` | The number of `BottomUpDeterministicResBlock`s; default is 1. | 1 |
create_top_down_layers()
Create the stack of top-down layers of the Decoder.
In these layers the `bu_values` from the Encoder are merged with the `p_params` from the previous layer
of the Decoder to get `q_params`. Then, a stochastic layer generates a sample from the latent distribution
with parameters `q_params`. Finally, this sample is fed through a `TopDownDeterministicResBlock` to
compute the `p_params` for the layer below.
NOTE 1: The algorithm for generative inference approximately works as follows:
- `p_params` = output of the top-down layer above
- `bu` = inferred bottom-up value at this layer
- `q_params` = merge(`bu`, `p_params`)
- `z` = stochastic_layer(`q_params`)
- (optional) get and merge skip connection from the previous top-down layer
- top-down deterministic ResNet
NOTE 2: When doing unconditional generation, `bu_value` is not available. Hence the merge layer is not used, and `z` is sampled directly from `p_params`.
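The per-layer logic in NOTE 1 and NOTE 2 can be sketched as follows; all names here are illustrative placeholders, not the actual LVAE API:

```python
def top_down_step(p_params, bu_value, merge, sample, td_block):
    """One schematic layer of the top-down pass. With a bottom-up value
    (conditional inference), merge it with the prior parameters to get
    the posterior parameters; without one (unconditional generation),
    sample directly from the prior parameters."""
    if bu_value is not None:
        q_params = merge(bu_value, p_params)  # inference: posterior params
    else:
        q_params = p_params                   # generation: use the prior
    z = sample(q_params)                      # stochastic layer
    return td_block(z)                        # p_params for the layer below
```

With toy stand-ins for the merge, sampling, and deterministic blocks, the same function covers both the conditional and the unconditional branch.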
forward(x)
Forward pass through the LVAE model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | The input tensor of shape (B, C, H, W). | required |
get_latent_spatial_size(level_idx)
Get the spatial size of the latent space at the given level. `level_idx` 0 is the bottommost layer, the highest-resolution one.
get_padded_size(size)
Return the smallest size (H, W), with H and W powers of 2, that is at least as large as the actual input size.

Parameters: `size` — input size, a tuple, either (N, C, H, W) or (H, W).

Returns: a 2-tuple (H, W).
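A pure-Python sketch of this rounding rule (illustrative, not the actual implementation):

```python
def padded_size(size):
    """Round the trailing (H, W) of `size` up to the next powers of 2.
    `size` is either (N, C, H, W) or (H, W). Illustrative sketch only."""
    def next_pow2(n):
        # Smallest power of 2 >= n (n >= 1).
        return 1 << max(n - 1, 0).bit_length()
    h, w = size[-2:]
    return (next_pow2(h), next_pow2(w))
```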
reset_for_inference(tile_size=None)
Should be called if we want to predict for a different input/output size.
topdown_pass(bu_values=None, n_img_prior=None, constant_layers=None, forced_latent=None, top_down_layers=None, final_top_down_layer=None)
Define the forward pass through the LVAE Decoder, the so-called Top-Down pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bu_values` | `Union[Tensor, None]` | Output of the bottom-up pass. It will have values from multiple layers of the ladder. | None |
| `n_img_prior` | `Union[Tensor, None]` | When … | None |
| `constant_layers` | `Union[Iterable[int], None]` | A sequence of indexes associated to the layers in which a single instance's z is copied over the entire batch (the bottom-up path is not used, so only the prior is used here). Set to … | None |
| `forced_latent` | `Union[list[Tensor], None]` | A list of tensors that are used as fixed latent variables (hence, sampling doesn't take place in this case). | None |
| `top_down_layers` | `Union[ModuleList, None]` | A list of top-down layers to use in the top-down pass. If … | None |
| `final_top_down_layer` | `Union[Sequential, None]` | The last top-down layer of the top-down pass. If … | None |