
5 Papers Related to Generative Models

CreateAMind
Published 2018-07-24

1 Bayesian GAN

Bayesian GAN
Yunus Saatchi, Andrew Gordon Wilson

Uber AI Labs

Cornell University

Abstract

Generative adversarial networks (GANs) can implicitly learn rich distributions over images, audio, and data which are hard to model with an explicit likelihood. We present a practical Bayesian formulation for unsupervised and semi-supervised learning with GANs. Within this framework, we use stochastic gradient Hamiltonian Monte Carlo to marginalize the weights of the generator and discriminator networks. The resulting approach is straightforward and obtains good performance without any standard interventions such as feature matching or mini-batch discrimination. By exploring an expressive posterior over the parameters of the generator, the Bayesian GAN avoids mode-collapse, produces interpretable and diverse candidate samples, and provides state-of-the-art quantitative results for semi-supervised learning on benchmarks including SVHN, CelebA, and CIFAR-10, outperforming DCGAN, Wasserstein GANs, and DCGAN ensembles.
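As a hedged illustration (not the authors' released code), the sketch below shows the kind of stochastic gradient Hamiltonian Monte Carlo (SGHMC) update that can be used to draw posterior samples of generator or discriminator weights instead of computing a single point estimate; the step size and friction values are arbitrary assumptions.

```python
import torch

def sghmc_step(params, grads, momenta, lr=1e-4, friction=0.1):
    """One stochastic-gradient HMC update over a list of weight tensors.

    The injected Gaussian noise with variance 2 * friction * lr is what turns
    the noisy-gradient dynamics into (approximate) posterior sampling rather
    than plain SGD with momentum.
    """
    new_params, new_momenta = [], []
    for p, g, v in zip(params, grads, momenta):
        noise = torch.randn_like(p) * (2.0 * friction * lr) ** 0.5
        v = (1.0 - friction) * v - lr * g + noise
        new_params.append(p + v)
        new_momenta.append(v)
    return new_params, new_momenta
```

In a Bayesian GAN training loop this step would replace the usual optimizer step for both networks, and several weight samples retained along the trajectory would be averaged over (marginalized) when generating or classifying.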

2

Variational Inference of Disentangled Latent Concepts from Unlabeled Observations

Abhishek Kumar*, Prasanna Sattigeri*, Avinash Balakrishnan*

Abstract

Disentangled representations, where the higher level data generative factors are reflected in disjoint latent dimensions, offer several benefits such as ease of deriving invariant representations, transferability to other tasks, interpretability, etc. We consider the problem of unsupervised learning of disentangled representations from a large pool of unlabeled observations, and propose a variational inference based approach to infer disentangled latent factors. We introduce a regularizer on the expectation of the approximate posterior over observed data that encourages disentanglement. We evaluate the proposed approach using several quantitative metrics and empirically observe significant gains over existing methods in terms of both disentanglement and data likelihood (reconstruction quality).
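One way such a regularizer on the aggregate approximate posterior can be realized, sketched here as an assumption-laden illustration rather than the paper's exact objective, is to push the covariance of the encoder means over a batch toward the identity matrix (decorrelated, unit-scale latents); the weights `lambda_offdiag` and `lambda_diag` are placeholders.

```python
import torch

def disentanglement_penalty(mu, lambda_offdiag=10.0, lambda_diag=5.0):
    """Penalize deviation of Cov[mu] from the identity over a batch.

    mu: (batch_size, latent_dim) tensor of approximate-posterior means.
    Off-diagonal covariance entries are driven toward 0 and diagonal
    entries toward 1; the result is added to the usual VAE loss.
    """
    mu_centered = mu - mu.mean(dim=0, keepdim=True)
    cov = mu_centered.t() @ mu_centered / (mu.size(0) - 1)
    diag = torch.diagonal(cov)
    offdiag = cov - torch.diag(diag)
    return (lambda_offdiag * (offdiag ** 2).sum()
            + lambda_diag * ((diag - 1.0) ** 2).sum())
```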


3

Information Dropout: Learning Optimal Representations Through Noisy Computation
Alessandro Achille and Stefano Soatto

Department of Computer Science University of California, Los Angeles 405 Hilgard Ave, Los Angeles, 90095, CA, USA Email: {achille,soatto}@cs.ucla.edu


Abstract—The cross-entropy loss commonly used in deep learning is closely related to the defining properties of optimal representations, but does not enforce some of the key properties. We show that this can be solved by adding a regularization term, which is in turn related to injecting multiplicative noise in the activations of a Deep Neural Network, a special case of which is the common practice of dropout. We show that our regularized loss function can be efficiently minimized using Information Dropout, a generalization of dropout rooted in information theoretic principles that automatically adapts to the data and can better exploit architectures of limited capacity. When the task is the reconstruction of the input, we show that our loss function yields a Variational Autoencoder as a special case, thus providing a link between representation learning, information theory and variational inference. Finally, we prove that we can promote the creation of disentangled representations simply by enforcing a factorized prior, a fact that has been observed empirically in recent work. Our experiments validate the theoretical intuitions behind our method, and we find that information dropout achieves a comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network, as well as to the test sample.
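A rough sketch in the spirit of the multiplicative-noise idea described above (an illustration, not the authors' implementation): a layer predicts a per-unit noise scale from its input, multiplies its activations by log-normal noise with that scale during training, and exposes a penalty that discourages low-noise (high-information) units. The cap `max_alpha` and the use of a fully connected layer are assumptions.

```python
import torch
import torch.nn as nn

class InformationDropout(nn.Module):
    """Multiplicative log-normal noise whose scale adapts to the input."""

    def __init__(self, features, max_alpha=0.7):
        super().__init__()
        self.alpha_layer = nn.Linear(features, features)  # predicts noise scale
        self.max_alpha = max_alpha

    def forward(self, x):
        # Per-sample, per-unit noise scale in (0, max_alpha].
        alpha = self.max_alpha * torch.sigmoid(self.alpha_layer(x)) + 1e-3
        if self.training:
            eps = torch.randn_like(x)
            x = x * torch.exp(alpha * eps)  # log-normal multiplicative noise
        # Information penalty: for a log-uniform prior on the noise this
        # reduces (up to constants) to -log(alpha); smaller alpha means more
        # information passed through, hence a larger penalty.
        penalty = -torch.log(alpha).sum(dim=1).mean()
        return x, penalty
```

The returned penalty would be added to the cross-entropy loss with a weight, playing the role of the regularization term described above.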


4

Opening the black box of Deep Neural Networks via Information

Ravid Schwartz-Ziv

Edmond and Lilly Safra Center for Brain Sciences The Hebrew University of Jerusalem Jerusalem, 91904, Israel

Naftali Tishby∗ School of Engineering and Computer Science and Edmond and Lilly Safra Center for Brain Sciences The Hebrew University of Jerusalem Jerusalem, 91904, Israel

Editor: ICRI-CI

Abstract

Despite their great success, there is still no comprehensive theoretical understanding of learning with Deep Neural Networks (DNNs) or their inner organization. Previous work [Tishby and Zaslavsky (2015)] proposed to analyze DNNs in the Information Plane; i.e., the plane of the Mutual Information values that each layer preserves on the input and output variables. They suggested that the goal of the network is to optimize the Information Bottleneck (IB) tradeoff between compression and prediction, successively, for each layer.

In this work we follow up on this idea and demonstrate the effectiveness of the Information-Plane visualization of DNNs. Our main results are: (i) most of the training epochs in standard DL are spent on compression of the input to an efficient representation and not on fitting the training labels. (ii) The representation compression phase begins when the training errors become small and the Stochastic Gradient Descent (SGD) epochs change from a fast drift to smaller training error into a stochastic relaxation, or random diffusion, constrained by the training error value. (iii) The converged layers lie on or very close to the Information Bottleneck (IB) theoretical bound, and the maps from the input to any hidden layer and from this hidden layer to the output satisfy the IB self-consistent equations. This generalization through noise mechanism is unique to Deep Neural Networks and absent in one-layer networks. (iv) The training time is dramatically reduced when adding more hidden layers. Thus the main advantage of the hidden layers is computational. This can be explained by the reduced relaxation time, as it scales super-linearly (exponentially for simple diffusion) with the information compression from the previous layer. (v) As we expect critical slowing down of the stochastic relaxation near phase transitions on the IB curve, we expect the hidden layers to converge to such critical points.

Keywords: Deep Neural Networks, Deep Learning, Information Bottleneck, Representation Learning
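For readers who want to reproduce this kind of Information-Plane plot, a minimal sketch of a binning-based estimator is given below; discretizing layer activations and counting joint occurrences is one common way to estimate I(X;T) and I(T;Y), and the bin count here is an arbitrary assumption.

```python
import numpy as np

def discrete_mutual_information(a_ids, b_ids):
    """I(A;B) in bits for two already-discretized variables (integer ids)."""
    joint = np.zeros((a_ids.max() + 1, b_ids.max() + 1))
    for a, b in zip(a_ids, b_ids):
        joint[a, b] += 1
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

def bin_layer(activations, n_bins=30):
    """Discretize a (n_samples, n_units) activation matrix into state ids."""
    lo, hi = activations.min(), activations.max()
    bins = np.floor((activations - lo) / (hi - lo + 1e-12) * n_bins).astype(int)
    # Every distinct binned activation pattern becomes one state of T.
    _, ids = np.unique(bins, axis=0, return_inverse=True)
    return ids
```

Computing I(X;T) and I(T;Y) for every layer after every epoch and plotting the pairs traces the compression/fitting trajectory the abstract describes.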

5

SEMANTIC INTERPOLATION IN IMPLICIT MODELS

Yannic Kilcher, Aurélien Lucchi, Thomas Hofmann

Department of Computer Science, ETH Zurich
{yannic.kilcher,aurelien.lucchi,thomas.hofmann}@inf.ethz.ch

ABSTRACT

In implicit models, one often interpolates between sampled points in latent space. As we show in this paper, care needs to be taken to match up the distributional assumptions on code vectors with the geometry of the interpolating paths. Otherwise, typical assumptions about the quality and semantics of in-between points may not be justified. Based on our analysis we propose to modify the prior code distribution to put significantly more probability mass closer to the origin. As a result, linear interpolation paths are not only shortest paths, but they are also guaranteed to pass through high-density regions, irrespective of the dimensionality of the latent space. Experiments on standard benchmark image datasets demonstrate clear visual improvements in the quality of the generated samples and exhibit more meaningful interpolation paths.
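A tiny numerical sketch of the geometric mismatch the abstract refers to (dimensionality and seed are arbitrary): under a standard Gaussian prior in high dimensions, samples concentrate near a shell of radius about sqrt(d), so the midpoint of a straight line between two samples has a noticeably smaller norm and lies in a region the prior rarely visits.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # latent dimensionality (illustrative)

z0, z1 = rng.standard_normal(d), rng.standard_normal(d)
mid = 0.5 * (z0 + z1)  # midpoint of the linear interpolation path

print("||z0||  =", round(np.linalg.norm(z0), 1))   # roughly sqrt(512) ~ 22.6
print("||mid|| =", round(np.linalg.norm(mid), 1))  # roughly sqrt(512 / 2) ~ 16
```

With a prior that places more mass near the origin, as proposed in the paper, such midpoints stay in high-density regions and linear interpolation paths remain meaningful.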

Shared from the CreateAMind WeChat official account. Originally published 2017-11-14.
