
Noise: a trick for stable generative model training


Instance Noise: A trick for stabilising GAN training

with Casper Kaae Sønderby

Generative Adversarial Networks (GANs) are notoriously hard to train. In a recent paper, we presented an idea that might help remedy this.

Some background on how this came about.

Our intern Casper spent the summer working with GANs, resulting in a paper which appeared on arXiv this week. One particular technique did us great service: instance noise. It's not the main focus of Casper's paper, so the details have been relegated to an Appendix. We thought it would be a good idea to summarise it here and give a few more details. Naturally, I think the full paper is also worth a read; there are a few more interesting things in there.

Instance noise

Summary
  • we think a major reason for GANs' instability may be that the generative distributions are weird and degenerate, and their support doesn't generally overlap with the true data distribution
  • this makes the nice theory break down and may lead to unstable behaviour
  • we suggest that adding noise to both real and synthetic data during training might help overcome these problems (a sketch follows after this list)
  • in this note we motivate this technique and illustrate in a few figures how it helps training
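
To make the third point concrete, here is a minimal sketch of instance noise, assuming PyTorch; the function names and the annealing schedule below are illustrative choices rather than anything prescribed by the paper. The same Gaussian corruption is applied to real and generated batches before the discriminator sees them, so the two noisy distributions overlap:

```python
import torch

def add_instance_noise(x, sigma):
    # Corrupt a batch with isotropic Gaussian noise. Applying the same
    # corruption to real and synthetic batches broadens both distributions
    # so that their supports overlap.
    return x + sigma * torch.randn_like(x)

def noise_sigma(step, sigma_start=0.5, anneal_steps=100_000):
    # Illustrative linear annealing: start with wide noise and decay it
    # towards zero as the generator improves.
    return sigma_start * max(0.0, 1.0 - step / anneal_steps)

# Inside the discriminator update (D, G, z and step are assumed to exist):
#   logits_real = D(add_instance_noise(x_real, noise_sigma(step)))
#   logits_fake = D(add_instance_noise(G(z), noise_sigma(step)))
```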

GANs should work.

There are different ways to think about GANs: you can approach them from a game-theoretic view of seeking a Nash equilibrium (Salimans et al., 2016), or you can treat them as an EM-like iterative algorithm in which the discriminator's job is likelihood-ratio estimation (Mohamed et al., 2016; Uehara et al., 2016; Nowozin et al.). If you've read my earlier posts, it should come as no surprise that I subscribe to the latter view.

Consider the following idealised GAN algorithm, each iteration consisting of the following steps:

1. sample a minibatch of real data and a minibatch of synthetic data from the current generator
2. train the discriminator on these samples via logistic regression until convergence, so that it approximates the Bayes-optimal classifier between the two distributions
3. treat the trained discriminator's output as an estimate of the likelihood ratio between data and model, and take a gradient step on the generator's parameters θ to decrease the resulting estimate of the JS divergence
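
As an illustration, here is a minimal PyTorch sketch of one such idealised iteration. G, D, the two optimisers and the latent sampler are assumed to be defined elsewhere, and inner_steps stands in for "train until convergence"; this is a sketch of the idea, not the paper's code.

```python
import torch
import torch.nn.functional as F

def idealised_gan_iteration(G, D, x_real, sample_z, d_opt, g_opt, inner_steps=500):
    # 1. Sample synthetic data from the current generator q_theta.
    x_fake = G(sample_z()).detach()

    # 2. Fit the discriminator (a logistic regression between real and
    #    synthetic samples) until it is close to the Bayes-optimal classifier.
    for _ in range(inner_steps):
        d_opt.zero_grad()
        logits_real, logits_fake = D(x_real), D(x_fake)
        d_loss = (
            F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
            + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake))
        )
        d_loss.backward()
        d_opt.step()

    # 3. The near-optimal discriminator's logit estimates the log-likelihood
    #    ratio log p(x)/q_theta(x); take one gradient step on theta to decrease
    #    the resulting JS divergence estimate, i.e. minimise E[log(1 - D(G(z)))].
    g_opt.zero_grad()
    logits_fake = D(G(sample_z()))
    g_loss = -F.softplus(logits_fake).mean()  # equals E[log(1 - D(G(z)))]
    g_loss.backward()
    g_opt.step()
```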

So why don't they?

Crucially, the convergence of this algorithm relies on a few assumptions, never really made explicit, that don't always hold. One of them is that the Bayes-optimal solution to the logistic regression problem is unique: there is a single optimal discriminator that does a much better job than any other classifier.

When the generative distribution is degenerate and its support doesn't overlap with the true data distribution, these assumptions break down: the log-likelihood-ratio, and therefore the KL divergence, is infinite and not well defined; the JS divergence saturates at its maximum value and becomes locally constant in θ; and instead of a single clearly best discriminator there is a large family of near-optimal ones.

How to fix this?

The main ways to avoid these pathologies involve making the discriminator's job harder. Why? The JS divergence is locally constant in θ, but that doesn't mean the variational lower bound also has to be constant. Indeed, if you cripple the discriminator so that the lower bound is not tight, you may end up with a non-constant function of θ that still roughly guides you in the right direction.

An example of this crippling is that in most GAN implementations the discriminator is only partially updated in each iteration, rather than trained until convergence. This extreme form of early stopping acts as a regulariser that prevents the discriminator from overfitting.
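
To make the contrast with the idealised algorithm explicit, here is a sketch of that partial-update schedule, assuming the same hypothetical G, D, optimisers and latent sampler as above; d_steps, discriminator_loss and generator_loss are illustrative names, not the paper's code.

```python
d_steps = 1  # illustrative: often just one partial discriminator update per iteration

for x_real in data_loader:                            # yields real minibatches
    # Crippled discriminator: a handful of gradient steps instead of the
    # inner "until convergence" loop of the idealised algorithm.
    for _ in range(d_steps):
        x_fake = G(sample_z()).detach()
        d_loss = discriminator_loss(D, x_real, x_fake)  # hypothetical helper
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update against the (now imperfect) discriminator.
    x_fake = G(sample_z())
    g_loss = generator_loss(D, x_fake)                  # hypothetical helper
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```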

Another way to cripple the discriminator is to add label noise, or equivalently, one-sided label smoothing as introduced by Salimans et al. (2016). In this technique the labels in the discriminator's training data are randomly flipped. Let's illustrate this technique in two figures.
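
A minimal sketch of these two forms of label noise, assuming PyTorch; the flip probability and the 0.9 smoothing target are illustrative values rather than ones prescribed by Salimans et al.:

```python
import torch

def flip_labels(targets, flip_prob=0.1):
    # Randomly flip a fraction of the discriminator's 0/1 targets.
    flip = torch.rand_like(targets) < flip_prob
    return torch.where(flip, 1.0 - targets, targets)

def one_sided_smoothing(real_targets, smooth=0.9):
    # One-sided label smoothing: targets for real data become 0.9,
    # targets for synthetic data are left at 0.
    return torch.full_like(real_targets, smooth)
```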

The classification view:

Although label noise makes the discriminators' job harder, they are all punished evenly: there is no way for a discriminator to be smart about handling label noise. Adding label noise doesn't change the structure of the logistic regression loss landscape dramatically; it mainly just pushes everything up. Hence, there are still a large number of near-optimal discriminators, and adding label noise still does not allow us to pinpoint a single unique Bayes-optimal classifier. The JS divergence is no longer saturated at its maximum level, but it is still locally constant in θ.

The graphical model view:

An alternative way to think about instance noise vs. label noise is via graphical models. The following three graphical models define joint distributions, parametrised by θ. The GAN algorithm tries to adjust θ so as to minimise the mutual information between the highlighted nodes in these graphical models:

Here's what the variables are:

This article was recommended by zdx3578.
