Noise: a trick for stabilising the training of generative models

Instance Noise: A trick for stabilising GAN training

with Casper Kaae Sønderby

Generative Adversarial Networks (GANs) are notoriously hard to train. In a recent paper, we presented an idea that might help remedy this.

Background and motivation.

Our intern Casper spent the summer working with GANs, resulting in a paper which appeared on arXiv this week. One particular technique did us great service: instance noise. It's not the main focus of Casper's paper, so the details have been relegated to an appendix. We thought it would be a good idea to summarise it here and give a few more details. Naturally, I think the full paper is also worth a read; there are a few more interesting things in there.

Instance noise

Summary
  • we think a major reason for GANs' instability may be that the generative distributions are weird and degenerate, and their supports don't generally overlap with that of the true data distribution
  • this makes the nice theory break down and may lead to unstable behaviour
  • we suggest that adding noise to both real and synthetic data during training might help overcome these problems (a code sketch follows this list)
  • in this note we motivate this technique and illustrate in a few figures how it helps training
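To make the noise suggestion concrete, here is a minimal sketch of a discriminator loss with instance noise (PyTorch-style; `D`, `real_batch`, `fake_batch`, `sigma` and the function name are my own placeholders, not code from the paper):

```python
import torch
import torch.nn.functional as F

def d_loss_with_instance_noise(D, real_batch, fake_batch, sigma):
    """Discriminator loss with instance noise: the same level of Gaussian
    noise is added to both the real and the generated samples before they
    are shown to the discriminator."""
    noisy_real = real_batch + sigma * torch.randn_like(real_batch)
    noisy_fake = fake_batch + sigma * torch.randn_like(fake_batch)

    real_logits = D(noisy_real)
    fake_logits = D(noisy_fake)

    # Real samples are labelled 1, generated samples 0 (standard logistic loss).
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
```

The noise level sigma can be annealed towards zero as training progresses, so that the ordinary, noiseless objective is recovered in the limit.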

GANs should work.

There are different ways to think about GANs: you can approach them from a game-theoretic view of seeking a Nash equilibrium (Salimans et al, 2016), or you can treat them as an EM-like iterative algorithm where the discriminator's job is likelihood-ratio estimation (Mohamed et al, 2016; Uehara et al, 2016; Nowozin et al, 2016). If you've read my earlier posts, it should come as no surprise that I subscribe to the latter view.

Consider the following idealised GAN algorithm, each iteration consisting of the following steps:

1. the discriminator is trained to optimality, via logistic regression, to tell real data apart from samples drawn from the current generative distribution q_θ, keeping θ fixed
2. the Bayes-optimal discriminator yields a (tight) variational estimate of the Jensen-Shannon divergence between the data distribution and q_θ
3. θ is updated by a gradient step so as to decrease this estimated divergence
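In PyTorch-like code, one such idealised iteration might look like the following sketch (the function, the optimiser arguments and `inner_steps` are my own placeholders; the inner loop stands in for 'train the discriminator to convergence'):

```python
import torch
import torch.nn.functional as F

def idealised_gan_step(D, G, d_opt, g_opt, real_batches, z_dim, inner_steps=500):
    """One idealised iteration: fit D (almost) to convergence by logistic
    regression with G fixed, then take a single gradient step on G against
    the resulting estimate of the divergence."""
    # Step 1: logistic regression on real vs. generated samples, G held fixed.
    for _ in range(inner_steps):
        real = next(real_batches)
        with torch.no_grad():
            fake = G(torch.randn(real.size(0), z_dim))
        r_logits, f_logits = D(real), D(fake)
        d_loss = (F.binary_cross_entropy_with_logits(r_logits, torch.ones_like(r_logits))
                  + F.binary_cross_entropy_with_logits(f_logits, torch.zeros_like(f_logits)))
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()

    # Step 2: the (near-)optimal discriminator provides a variational estimate
    # of the JS divergence; move theta (the parameters of G) to decrease it.
    real = next(real_batches)
    fake = G(torch.randn(real.size(0), z_dim))
    r_logits, f_logits = D(real), D(fake)
    js_estimate = -(F.binary_cross_entropy_with_logits(r_logits, torch.ones_like(r_logits))
                    + F.binary_cross_entropy_with_logits(f_logits, torch.zeros_like(f_logits)))
    g_opt.zero_grad()
    js_estimate.backward()
    g_opt.step()
```

The quantity minimised in the second step is, up to constants, the variational estimate of the JS divergence that the optimal discriminator provides.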

So why don't they?

Crucially, the convergence of this algorithm relies on a few assumptions that are never really made explicit and that don't always hold:

1. that the log-likelihood ratio between the data distribution and q_θ, and therefore the KL divergence, is finite and well defined
2. that the JS divergence is not saturated at its maximum value, so that it actually varies as a function of θ
3. that the Bayes-optimal solution to the logistic regression problem is unique: there is a single optimal discriminator that does a much better job than any other classifier.

When the generative distribution is degenerate and its support doesn't overlap with that of the data, none of these assumptions hold.

How to fix this?

The main ways to avoid these pathologies involve making the discriminator's job harder. Why? The JS divergence is locally constant in θ, but that doesn't mean the variational lower bound on it also has to be constant. Indeed, if you cripple the discriminator so that the lower bound is not tight, you may end up with a non-constant function of θ that still roughly guides you in the right direction.
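For reference, the bound in question is the standard GAN identity (written here in my own notation): for any discriminator D,

$$
2\,\mathrm{JS}(p, q_\theta) - 2\log 2 \;\geq\; \mathbb{E}_{x\sim p}\!\left[\log D(x)\right] + \mathbb{E}_{x\sim q_\theta}\!\left[\log\left(1 - D(x)\right)\right],
$$

with equality exactly when D is the Bayes-optimal classifier $D^*(x) = p(x)/(p(x) + q_\theta(x))$. A crippled discriminator keeps the right-hand side strictly below the saturated, locally constant left-hand side, and it is this slack that restores a usable gradient signal in θ.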

An example of this crippling is that in most GAN implementations the discriminator is only partially updated in each iteration, rather than trained until convergence. This extreme form of early stopping acts as regularisation, preventing the discriminator from overfitting.
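In terms of the sketch above, partial updates simply mean running a small number of inner discriminator steps (often just one) instead of fitting it to convergence, e.g.:

```python
# One discriminator gradient step per generator step, rather than training D
# to convergence -- the common practice in GAN implementations.
idealised_gan_step(D, G, d_opt, g_opt, real_batches, z_dim, inner_steps=1)
```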

Another way to cripple the discriminator is to add label noise, or equivalently, one-sided label smoothing as introduced by Salimans et al (2016). In this technique the labels in the discriminator's training data are randomly flipped. Let's illustrate this technique in two figures.
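Before the figures, a code-level aside: in the same sketchy notation as above (not the implementation from either paper), label noise amounts to flipping each target independently with some small probability:

```python
import torch
import torch.nn.functional as F

def d_loss_with_label_noise(D, real_batch, fake_batch, flip_prob=0.1):
    """Discriminator loss where each label is flipped with probability
    flip_prob. One-sided label smoothing (Salimans et al, 2016) is the
    closely related trick of replacing the 'real' target 1.0 with e.g. 0.9."""
    real_logits = D(real_batch)
    fake_logits = D(fake_batch)

    # Targets: 1 for real, 0 for fake, each flipped with probability flip_prob.
    flip_real = (torch.rand_like(real_logits) < flip_prob).float()
    flip_fake = (torch.rand_like(fake_logits) < flip_prob).float()
    real_targets = 1.0 - flip_real   # mostly ones, occasionally zero
    fake_targets = flip_fake         # mostly zeros, occasionally one

    return (F.binary_cross_entropy_with_logits(real_logits, real_targets)
            + F.binary_cross_entropy_with_logits(fake_logits, fake_targets))
```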

The classification view:

With label noise, all discriminators have a harder job, but they are punished evenly: there is no way for the discriminator to be smart about handling label noise. Adding label noise doesn't change the structure of the logistic regression loss landscape dramatically; it mainly just pushes everything up. Hence, there are still a large number of near-optimal discriminators, and adding label noise still does not allow us to pinpoint a single unique Bayes-optimal classifier. The JS divergence is no longer saturated at its maximum level, but it is still locally constant in θ.

The graphical model view:

An alternative way to think about instance noise vs. label noise is via graphical models. The following three graphical models define joint distributions, parametrised by θ. The GAN algorithm tries to adjust θ so as to minimise the mutual information between the highlighted nodes in these graphical models:

Here's what the variables are:
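In my own notation (the original figures and exact variable names aren't reproduced here): y is a binary label indicating whether a sample was drawn from the data distribution p or from the generator q_θ, x is the sample itself, and in the instance-noise model the discriminator only sees a noised copy of x. The link between the GAN objective and mutual information is the standard identity

$$
\mathrm{JS}(p, q_\theta) = I(x; y), \qquad y \sim \mathrm{Bernoulli}(1/2),\quad x \mid y=1 \sim p,\quad x \mid y=0 \sim q_\theta,
$$

so minimising the mutual information between the highlighted nodes is the same as minimising the corresponding JS divergence. In the instance-noise model the same identity applies to the noise-convolved distributions instead, and those distributions do have overlapping support.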

Recommended by zdx3578.

Originally published on the WeChat public account CreateAMind (createamind), 2016-11-09.
