https://arxiv.org/pdf/2007.03898.pdf
1. Introduction
In this paper, we aim to make VAEs great again by architecture design. We propose Nouveau VAE (NVAE), a deep hierarchical VAE with a carefully designed network architecture that produces high-quality images. NVAE obtains state-of-the-art results among non-autoregressive likelihood-based generative models, reducing the gap with autoregressive models. The main building block of our network is depthwise convolutions, which rapidly increase the receptive field of the network without dramatically increasing the number of parameters.
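The parameter saving behind that last claim is easy to quantify. A minimal sketch (the channel and kernel sizes are illustrative, not taken from the paper): a k×k depthwise convolution costs one kernel per channel, and pairing it with a 1×1 pointwise convolution (the depthwise separable form) restores channel mixing cheaply.

```python
def conv_params(c_in, c_out, k):
    # standard convolution: one k x k kernel per (input, output) channel pair
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # depthwise: one k x k kernel per input channel
    # pointwise: a 1 x 1 convolution that mixes channels
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 256, 256, 5
regular = conv_params(c_in, c_out, k)                    # 1,638,400 params
separable = depthwise_separable_params(c_in, c_out, k)   # 71,936 params
print(regular, separable, round(regular / separable, 1))  # ~22.8x fewer
```

So the receptive field of a large kernel comes at roughly the cost of the 1×1 channel mixing alone, which is why the expanded-width cells described later stay affordable.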
In summary, we make the following contributions:
i) We propose a novel deep hierarchical VAE, called NVAE, with depthwise convolutions in its generative model.
ii) We propose a new residual parameterization of the approximate posteriors.
iii) We stabilize training deep VAEs with spectral regularization.
iv) We provide practical solutions to reduce the memory burden of VAEs.
v) We show that deep hierarchical VAEs can obtain state-of-the-art results on several image datasets, and can produce high-quality samples even when trained with the original VAE objective. To the best of our knowledge, NVAE is the first successful application of VAEs to images as large as 256×256 pixels.
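Contribution ii) has a compact form worth writing out: the encoder predicts a correction (Δμ, Δσ) relative to the prior N(μ_p, σ_p), i.e. q := N(μ_p + Δμ, σ_p · Δσ), so the per-dimension KL depends only on the relative parameters. The sketch below follows from the standard Gaussian KL formula; treat it as my reading of the paper's parameterization.

```python
import math

def kl_residual_normal(delta_mu, delta_sigma, sigma_p):
    """KL( N(mu_p + delta_mu, sigma_p * delta_sigma) || N(mu_p, sigma_p) )
    for a scalar Gaussian. Note mu_p drops out entirely: only the
    residual correction predicted by the encoder enters the KL."""
    return 0.5 * ((delta_mu / sigma_p) ** 2 + delta_sigma ** 2
                  - math.log(delta_sigma ** 2) - 1.0)

# when the encoder predicts no correction, the KL vanishes exactly
print(kl_residual_normal(0.0, 1.0, sigma_p=2.0))  # → 0.0
```

The appeal is stability: as the prior moves during training, the posterior tracks it by default, and the KL penalizes only the deviation the encoder actually asks for.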
2. Method
In this paper, we propose a deep hierarchical VAE called NVAE that generates large high-quality images. NVAE’s design focuses on tackling two main challenges: (i) designing expressive neural networks specifically for VAEs, and (ii) scaling up the training to a large number of hierarchical groups and image sizes while maintaining training stability.
Training objective: the variational lower bound
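For an L-group hierarchy with latents z = {z_1, …, z_L} and approximate posterior q(z|x) = ∏_l q(z_l | x, z_{<l}), the bound being maximized takes the usual hierarchical form (my transcription of the paper's Eq. 1; check against the PDF):

```latex
\mathcal{L}_{\mathrm{VAE}}(x) :=
  \mathbb{E}_{q(z|x)}\!\left[\log p(x|z)\right]
  - \mathrm{KL}\!\left(q(z_1|x)\,\|\,p(z_1)\right)
  - \sum_{l=2}^{L} \mathbb{E}_{q(z_{<l}|x)}\!\left[
      \mathrm{KL}\!\left(q(z_l|x, z_{<l})\,\|\,p(z_l|z_{<l})\right)\right]
```

This is the original VAE objective, not an adversarial or perceptual variant; the point of the paper is that architecture alone closes much of the quality gap.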
Residual Cells for Encoder and Decoder:
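As I read Figure 3 of the paper, the decoder cell expands channels with a 1×1 convolution, applies a 5×5 depthwise convolution at the expanded width, then reduces back with another 1×1 convolution, interleaved with batch norm, Swish, and a squeeze-and-excitation block; the skip connection requires the output shape to match the input. A shape-level sketch (helper names are mine, the exact op ordering is approximate):

```python
E = 6  # channel expansion factor inside the cell (value assumed)

def conv1x1(shape, c_out):
    c, h, w = shape
    return (c_out, h, w)          # 1x1 conv changes channels only

def depthwise5x5(shape):
    c, h, w = shape               # 'same' padding: one 5x5 filter per
    return (c, h, w)              # channel, shape fully preserved

def decoder_cell_shape(shape):
    c, h, w = shape
    s = conv1x1(shape, E * c)     # expand channels before the big kernel
    s = depthwise5x5(s)           # large receptive field, few parameters
    s = conv1x1(s, c)             # reduce back so the skip add is valid
    return s                      # BN / Swish / SE leave shapes unchanged

x = (128, 32, 32)
assert decoder_cell_shape(x) == x   # residual add is well-defined
```

Doing the expensive spatial mixing depthwise at E·C channels is what makes the expand-then-reduce design cheap; with a regular 5×5 convolution at the expanded width the cell's parameter count would blow up by orders of magnitude.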
Taming the Unbounded KL Term:
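The stabilizer here is spectral regularization: a penalty λ Σ_i s_max^(i) on the largest singular value of each convolutional layer, with s_max estimated by power iteration (in training, a single iteration per step with a persisted u vector is the usual trick). A stdlib-only sketch of that estimator for a plain matrix; `n_iter` is cranked up so the toy example converges:

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def spectral_norm_estimate(W, u, n_iter=1):
    """Power-iteration estimate of the largest singular value of W.
    The regularizer would then be lam * sum of these over all layers."""
    Wt = [list(col) for col in zip(*W)]   # transpose
    for _ in range(n_iter):
        v = matvec(Wt, u)
        nv = math.sqrt(sum(x * x for x in v))
        v = [x / nv for x in v]
        u = matvec(W, v)
        nu = math.sqrt(sum(x * x for x in u))
        u = [x / nu for x in u]
    # u = Wv / ||Wv||, so <u, Wv> = ||Wv|| -> s_max at convergence
    return sum(ui * wi for ui, wi in zip(u, matvec(W, v)))

W = [[3.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 0.5]]
s_max = spectral_norm_estimate(W, u=[1.0, 1.0, 1.0], n_iter=30)
print(s_max)  # → 3.0 up to numerical precision
```

Bounding each layer's Lipschitz constant keeps the encoder's predicted posterior parameters from swinging wildly between updates, which is exactly what lets the otherwise unbounded KL term stay controlled in deep hierarchies.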
3. Experiments
Hyper-parameters:
Performance:
Comparison:
More samples:
4. Conclusion
Engineering is magic... :)