InfoGAN and an Analysis of Sources of GAN Training Instability

InfoGAN: using the variational bound on mutual information (twice)

Many people have recommended the InfoGAN paper to me, but I hadn't taken the time to read it until recently. It is actually quite cool:

Summary of this note

  • I show how the original GAN algorithm can be derived using exactly the same variational lower-bound that the authors use in this paper (see also this blog post by Yingzhen)
  • However, GANs use the bound in the wrong direction and end up minimising a lower bound, which is not a good thing to do
  • InfoGANs can be expressed purely in terms of mutual information, applying the variational bound twice: once in the correct direction, once in the wrong direction
  • I believe that the unstable behaviour of GANs is partially explained by using the bound in the incorrect way

Mini-review

The InfoGAN idea is pretty simple. The paper presents an extension to the GAN objective. A new term encourages high mutual information between generated samples and a small subset of the latent variables, c. The hope is that by forcing high information content, we cram the most interesting aspects of the representation into c.

If we are successful, c ends up representing the most salient and most meaningful sources of variation in the data, while the rest of the noise variables z account for additional, meaningless sources of variation and can essentially be dismissed as incompressible noise.

In order to maximise the mutual information, the authors make use of a variational lower bound. This, conveniently, results in a recognition model, similar to the one we see in variational autoencoders. The recognition model infers the latent representation c from data.

The paper is pretty cool and the results are convincing. I found the notation and derivation a bit confusing, so here is my mini-review:

  • In the introduction, I don't think it's fair to say "To the best of our knowledge, the only other unsupervised method that learns disentangled representations is hossRBM". There are loads of other methods that attempt this.
  • I believe Lemma 5.1 is basically a trivial application of the law of total expectation, and I really don't see the need to provide a proof for it (maybe reviewers asked for one).

For reference, Eq. (5) of the paper states the variational lower bound that the training objective uses in place of the mutual information.
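Written out, my transcription of the paper's Eqs. (5)-(6), with $Q$ the recognition model, $V(D, G)$ the standard GAN value function, and $\lambda$ the weighting on the new term:

$$I\big(c;\, G(z, c)\big) \;\ge\; \mathbb{E}_{c \sim P(c),\; x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c) \;=\; L_I(G, Q),$$

$$\min_{G,\, Q}\; \max_{D}\;\; V(D, G) \;-\; \lambda\, L_I(G, Q).$$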

My view on InfoGANs

I think there is an interesting connection that the authors did not mention (frankly, it probably would have overcomplicated the presentation). The connection is that the original GAN objective itself can be derived from mutual information, and in fact the discriminator D can be thought of as a variational auxiliary variable, playing exactly the same role as the recognition model q(c|x) in the InfoGAN paper.

The connection relies on the interpretation of the Jensen-Shannon divergence as a mutual information (see e.g. Yingzhen's blog post "GANs, mutual information, and possibly algorithm selection?"). Here is my graphical model view on InfoGANs, which may put things in a slightly different light:

Let's consider the joint distribution of a bunch of variables:
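Concretely, here is a minimal sketch of the construction I have in mind ($y$ is a fair coin deciding whether the sample $x$ is real or synthetic; the variable name is my choice):

$$y \sim \mathrm{Bernoulli}\!\left(\tfrac{1}{2}\right), \qquad x \mid y = 1 \sim P_{\mathrm{data}}, \qquad x \mid y = 0 \sim P_G.$$

Under this joint distribution the Jensen-Shannon divergence between data and model is exactly a mutual information,

$$\mathrm{JSD}\!\left(P_{\mathrm{data}} \,\|\, P_G\right) = I(y;\, x),$$

so one can take $\ell_{\mathrm{GAN}}(G) = I(y;\, x)$ as the idealised generator loss. Applying the same variational (Barber-Agakov) lower bound as before, with the discriminator playing the role of the auxiliary distribution, $q(y = 1 \mid x) = D(x)$, gives

$$I(y;\, x) \;\ge\; \log 2 + \tfrac{1}{2}\,\mathbb{E}_{x \sim P_{\mathrm{data}}}\!\left[\log D(x)\right] + \tfrac{1}{2}\,\mathbb{E}_{x \sim P_G}\!\left[\log\!\left(1 - D(x)\right)\right],$$

which is, up to constants and scaling, the standard GAN value function $V(D, G)$. Maximising over $D$ tightens the bound, exactly as maximising over $Q$ tightens $L_I$ in the InfoGAN paper.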

Now, the main problem with this derivation is that we were supposed to minimise $\ell_{\mathrm{GAN}}$, so we really would like an upper bound instead of a lower bound. But the variational method only provides a lower bound. Therefore,

GANs minimise a lower bound, which I believe accounts for some of their unstable behaviour
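To state the issue abstractly (a generic observation, not specific to GANs): knowing only that the loss lies above the bound, pushing the bound down tells you nothing about the loss itself,

$$\ell(G) \;\ge\; b(G, D): \qquad b(G_{t+1}, D) < b(G_t, D) \;\;\not\Longrightarrow\;\; \ell(G_{t+1}) < \ell(G_t).$$

The generator update can simply widen the gap $\ell - b$, for instance by moving to samples on which the current discriminator happens to be a poor classifier, without reducing the divergence at all.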

InfoGANs use the bound twice

Recall that the idealised InfoGAN objective is the weighted difference of two mutual information terms.
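In the notation above, this is (my paraphrase of the idealised objective, before any bounds are applied; $\lambda$ is the weighting from the paper):

$$\ell_{\mathrm{InfoGAN}}(G) \;=\; I(y;\, x) \;-\; \lambda\, I(c;\, x),$$

to be minimised over $G$: make real and synthetic samples indistinguishable, while making the code $c$ highly informative about the sample.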

To arrive at the algorithm the authors actually use, one applies the variational bound to both mutual information terms:

  • When you apply the bound to the first term, you get a lower bound, and you introduce an auxiliary distribution that ends up being called the discriminator. This application of the bound is wrong, because it bounds the loss function from the wrong side.
  • When you apply the bound to the second term, you end up upper bounding the loss function because of the negative sign. This is a good thing. The combination of a lower bound and an upper bound means that you no longer even know from which direction you're bounding or approximating the loss function; the result is neither an upper nor a lower bound, as the sketch after this list makes explicit.
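Putting the two bounds side by side (a sketch in the notation above, with $D$ in the role of $q(y \mid x)$ and $Q$ in the role of $q(c \mid x)$):

$$I(y;\, x) \;\ge\; \log 2 + \tfrac{1}{2}\, V(D, G) \qquad \text{(a lower bound on a term we minimise: the wrong side),}$$

$$-\lambda\, I(c;\, x) \;\le\; -\lambda\, L_I(G, Q) \qquad \text{(an upper bound on a term we minimise: the right side),}$$

so the quantity the algorithm actually minimises over $G$, essentially $V(D, G) - \lambda\, L_I(G, Q)$ up to constants and scaling, is neither an upper nor a lower bound on $\ell_{\mathrm{InfoGAN}}(G)$.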


Originally published on the WeChat official account CreateAMind (createamind), 2016-11-12.
