Using InfoGAN to Learn Interpretable Latent Features, with a Code Example (the code differs from the official implementation)

The second part of this post is the code. Note that the loss differs from the OpenAI implementation.

The semantic features learned in the original paper include the following (shown in the figures at the start of this article):

- stroke thickness and tilt
- face orientation, width (wide vs. narrow), and lighting intensity
- rotation and width
- brightness, and so on
- pose angle, glasses, hairstyle, and emotion

Learning Interpretable Latent Representations with InfoGAN

A tutorial on implementing InfoGAN in Tensorflow

In this week’s post I want to explore a simple addition to Generative Adversarial Networks which makes them more useful both for researchers interested in their potential as an unsupervised learning tool, and for the enthusiast or practitioner who wants more control over the kinds of data they can generate. If you are new to GANs, check out this earlier tutorial I wrote a couple weeks ago introducing them. The addition I want to go over in this post is called InfoGAN, and it was introduced in this paper published by OpenAI earlier this year. It allows GANs to learn disentangled latent representations, which can then be exploited in a number of useful ways. For those interested in the mathematics behind the technique, I highly recommend reading the paper, as it is a theoretically interesting approach. In this post though, I would like to provide a more intuitive explanation of what InfoGANs do, and how they can be easily implemented in current GANs.

Why InfoGAN?

The structure of a GAN is as follows: a generator (G) and a discriminator (D) are updated in competing fashion in order to produce realistic data samples from latent variables. The discriminator is optimized to differentiate between the generator’s created samples and the true data samples, while the generator is optimized to produce samples most likely to trick the discriminator. In order to generate a sample, the generator is fed a random noise vector z. This z is the set of latent variables used by the generator to produce samples. These latent variables can be thought of as the seeds which contain all the information needed to grow our data sample. Once the GAN has been trained to convergence, each variable in z should hypothetically correspond to some aspect of the generated sample. Ideally each of these variables would not only correspond to the data, but do so in a semantically meaningful way. In image data for example, we would expect some variables to adjust lighting, others to adjust object position, and others to adjust colors. In reality however, once a GAN is trained the z variables often fail to correspond to any semantically decipherable aspects of an image.
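As a concrete reference point, here is a minimal sketch of the two competing objectives in TF 1.x-style code. The placeholders d_real and d_fake are assumptions standing in for the discriminator's outputs on real data and on G(z); in a full model they would come from the discriminator network itself.

```python
import tensorflow as tf

# Stand-ins (assumed names) for D's probability outputs on real and generated data.
d_real = tf.placeholder(tf.float32, [None, 1])  # D(x) for true samples
d_fake = tf.placeholder(tf.float32, [None, 1])  # D(G(z)) for generated samples

# D is trained to score real samples high and generated samples low...
d_loss = -tf.reduce_mean(tf.log(d_real + 1e-8) + tf.log(1.0 - d_fake + 1e-8))
# ...while G is trained to make D score its samples high.
g_loss = -tf.reduce_mean(tf.log(d_fake + 1e-8))
```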

The semantic features learned in the paper are shown in the images at the beginning of this article.

This is where the InfoGAN comes in. By introducing a mutual information maximization objective into the GAN, we can have the network learn to disentangle the representations of the latent variables in addition to producing visually convincing samples. Using this technique we can have the GAN learn meaningful latent variables that can theoretically correspond to aspects of an image such as the presence or absence of certain objects, or positional features of a scene. One of the more impressive aspects of InfoGAN is that, unlike Conditional GAN models, which rely on supervised learning to introduce meaningful latent codes, InfoGAN is entirely unsupervised, and the model learns to generate semantically meaningful latent variables automatically. All we have to provide it with is a template for the kinds of variables we want it to learn.

Implementing InfoGAN

With a few additions, we can easily implement InfoGAN on top of the DCGAN model I discussed here. In that example we built a GAN which could generate MNIST digits. With InfoGAN we can have it learn a representation in which each of the ten digits corresponds to a different value of a single categorical latent variable.

Little adjustment needs to be made to the generator network in order for it to be used in an InfoGAN. In addition to the original z vector, we simply add a set of c vectors which correspond to the latent representations we want our model to learn. For categorical representations such as type of object, we will use one-hot vectors, and for continuous representations such as object rotation, we will use float variables between -1 and 1, drawn from a random uniform distribution. These c variables are concatenated with the z vector and all are fed into the generator together.
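A minimal sketch of how these generator inputs might be sampled, using plain NumPy; the dimensions and function name here are assumptions for illustration, not taken from the Gist.

```python
import numpy as np

def sample_latents(batch_size, z_dim=64, n_categories=10, n_continuous=2):
    # Unstructured noise z, drawn from a uniform distribution.
    z = np.random.uniform(-1.0, 1.0, size=[batch_size, z_dim]).astype(np.float32)
    # Categorical code: one-hot vectors over n_categories classes.
    cat_ids = np.random.randint(0, n_categories, size=batch_size)
    c_cat = np.eye(n_categories, dtype=np.float32)[cat_ids]
    # Continuous codes, uniform in [-1, 1].
    c_cont = np.random.uniform(-1.0, 1.0,
                               size=[batch_size, n_continuous]).astype(np.float32)
    # The generator receives the concatenation [z, c_cat, c_cont].
    return np.concatenate([z, c_cat, c_cont], axis=1), c_cat, c_cont
```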

The discriminator network is where the bulk of the change happens. It is here that the constraints between the latent variables and the properties of the generated image are defined and enforced. In order to impose these constraints, we add a second top layer to the discriminator. We refer to this as a Q-network. Not to be confused with the Q-networks from reinforcement learning, the name comes from the theoretical connection with variational inference techniques, which use a q-distribution to approximate the true data p-distribution. This Q-network shares all lower layers with the discriminator; only the top layers differ. After a fully connected layer, the Q-network contains softmax layers for categorical variables and a tanh layer for continuous variables. We then compute the loss for categorical outputs with the cross-entropy -Σ c·log(ĉ), and for continuous outputs with the squared error |c - ĉ|². In both cases c refers to the latent value assigned to the generator, and ĉ refers to the value estimated by the Q-network. By minimizing the distance between c and ĉ, we ensure that the generator learns to produce samples which utilize the latent information in a semantically meaningful way.
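Here is one way the Q-network head and its losses could look, again in TF 1.x-style code. The placeholder standing in for the shared discriminator features, the hidden layer width, and the scope name are all assumptions; remember that this simple loss differs from the OpenAI implementation.

```python
import tensorflow as tf

n_categories, n_continuous = 10, 2

# Stand-in (assumed) for the top shared hidden layer of the discriminator body.
shared_features = tf.placeholder(tf.float32, [None, 1024])
c_cat = tf.placeholder(tf.float32, [None, n_categories])   # one-hot codes given to G
c_cont = tf.placeholder(tf.float32, [None, n_continuous])  # continuous codes given to G

with tf.variable_scope('q_net'):
    # One fully connected layer on top of the shared body...
    q_hidden = tf.layers.dense(shared_features, 128, activation=tf.nn.relu)
    # ...then a softmax head producing ĉ for the categorical code
    q_cat = tf.nn.softmax(tf.layers.dense(q_hidden, n_categories))
    # ...and a tanh head keeping the continuous estimates ĉ in [-1, 1].
    q_cont = tf.nn.tanh(tf.layers.dense(q_hidden, n_continuous))

# Categorical loss: -Σ c·log(ĉ); continuous loss: |c - ĉ|².
q_cat_loss = -tf.reduce_mean(tf.reduce_sum(c_cat * tf.log(q_cat + 1e-8), axis=1))
q_cont_loss = tf.reduce_mean(tf.reduce_sum(tf.square(c_cont - q_cont), axis=1))
q_loss = q_cat_loss + q_cont_loss
```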

We now just need to ensure that we generate appropriate latent c variables to attach to the z vector when training the GAN. During optimization, in addition to updating the discriminator network using the discriminator loss, and updating the generator using the generator loss, we also update all variables using the Q-network losses.
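Continuing the sketch, the three updates might be wired up as follows. This assumes the generator and discriminator have been built under variable scopes named 'generator' and 'discriminator', and that g_loss and d_loss are defined over those networks as above; all of these names and hyperparameters are assumptions, not the Gist's.

```python
# Partition the trainable variables by (assumed) scope name.
g_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='generator')
d_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='discriminator')
q_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='q_net')

d_update = tf.train.AdamOptimizer(2e-4, beta1=0.5).minimize(d_loss, var_list=d_vars)
g_update = tf.train.AdamOptimizer(2e-4, beta1=0.5).minimize(g_loss, var_list=g_vars)
# The Q loss is minimized over all variables: the generator (so its samples
# encode the codes), the shared discriminator body, and the Q head itself.
q_update = tf.train.AdamOptimizer(2e-4, beta1=0.5).minimize(
    q_loss, var_list=g_vars + d_vars + q_vars)
```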

Returning to the MNIST dataset, we can define a new categorical latent variable with 10 possible values, corresponding to the 10 digits. We can also define two continuous latent variables, with values ranging from -1 to 1. Once the training process has converged, the network learns to produce different images depending on the values of those latent variables.
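A common way to inspect what each code has learned after training is to hold z and the categorical code fixed while sweeping a single continuous code across its range. A NumPy sketch, with names and dimensions assumed for illustration:

```python
import numpy as np

def code_sweep(z_dim=64, n_categories=10, digit=3, steps=8):
    # Same noise vector for every sample in the batch.
    z = np.random.uniform(-1.0, 1.0, size=[1, z_dim]).astype(np.float32)
    z = np.repeat(z, steps, axis=0)
    # Fix the categorical code to one digit.
    c_cat = np.zeros([steps, n_categories], np.float32)
    c_cat[:, digit] = 1.0
    # Sweep the first continuous code from -1 to 1; hold the second at 0.
    c_cont = np.zeros([steps, 2], np.float32)
    c_cont[:, 0] = np.linspace(-1.0, 1.0, steps)
    # Feed this batch to the generator to visualize what the code controls.
    return np.concatenate([z, c_cat, c_cont], axis=1)
```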

This technique can be applied to any dataset in which the underlying data distribution has properties that can be meaningfully broken up categorically or continuously. Although MNIST is relatively simple, many datasets have multiple obvious categorical distinctions which an InfoGAN could be trained to learn. In addition to more purposeful data generation, InfoGAN can also be used as a first step in other supervised learning problems. For example, the Q-network portion of a GAN trained in the way described above could serve as a classifier for new real-world data.
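For instance, continuing the Q-network sketch above, classifying a real image would reduce to running it through the shared discriminator body and taking the argmax of the softmax head. This is a hypothetical reuse, not something shown in the Gist:

```python
# Hypothetical reuse of the trained Q head as a digit classifier:
# feed real images through the shared layers, then take the most likely category.
predicted_digit = tf.argmax(q_cat, axis=1)
```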

For an implementation of the entire thing in Tensorflow, see the Gist below:

See the link below.

When I first learned about InfoGAN, it was clearly a promising idea, but one that took me a while to wrap my head around. I hope this tutorial has made understanding it a little clearer for those new to the concept. If you’ve read this and still want to know more, I recommend the original OpenAI paper, as well as their implementation, both of which are excellent.

If you’d like to follow my work on Deep Learning, AI, and Cognitive Science, follow me on Medium @Arthur Juliani, or on twitter @awjliani.

code: https://gist.github.com/awjuliani/c9ecd8b37d33d6855cd4ed9aa16ce89f#file-infogan-tutorial-ipynb

Originally published on the WeChat public account CreateAMind (createamind)

Original publication date: 2016-11-07
