How to Train a GAN? Tips and tricks to make GANs work
While research in Generative Adversarial Networks (GANs) continues to improve the fundamental stability of these models, we use a bunch of tricks to train them and make them stable day to day.
If you find a trick that is particularly useful in practice, please open a Pull Request to add it to the document. If we find it to be reasonable and verified, we will merge it in.
1: Normalize the inputs
normalize the images between -1 and 1
Tanh as the last layer of the generator output
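As a minimal sketch (assuming images are stored as `uint8` in `[0, 255]`):

```python
import numpy as np

def normalize_images(images):
    """Scale uint8 images from [0, 255] to [-1, 1], matching a tanh generator output."""
    return images.astype(np.float32) / 127.5 - 1.0

def denormalize_images(images):
    """Map generator outputs in [-1, 1] back to [0, 255] for viewing."""
    return ((images + 1.0) * 127.5).clip(0, 255).astype(np.uint8)
```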
2: A modified loss function
In GAN papers, the loss function to optimize G is min log(1 - D), but in practice people use max log D
because the first formulation has vanishing gradients early on
Goodfellow et al. (2014)
In practice, works well:
Flip labels when training generator: real = fake, fake = real
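A quick numeric sketch of why the non-saturating loss helps (the `d_fake` value is an arbitrary illustration):

```python
import numpy as np

def g_loss_saturating(d_fake):
    # Original minimax objective for G: minimize log(1 - D(G(z)))
    return np.log(1.0 - d_fake)

def g_loss_nonsaturating(d_fake):
    # Practical objective: maximize log D(G(z)), i.e. minimize -log D(G(z))
    return -np.log(d_fake)

# Early in training D rejects fakes easily, so D(G(z)) is close to 0.
d_fake = 1e-3
grad_saturating = abs(-1.0 / (1.0 - d_fake))   # |d/dd log(1 - d)|: magnitude ~1
grad_nonsaturating = abs(-1.0 / d_fake)        # |d/dd -log d|: magnitude ~1000
```

When D confidently rejects fakes, the saturating loss gives G almost no gradient, while the non-saturating version gives a large one.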
3: Use a spherical Z
Don't sample from a uniform distribution
Sample from a Gaussian distribution
When doing interpolations, do the interpolation via a great circle, rather than a straight line from point A to point B
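One common way to implement great-circle (spherical) interpolation, as a sketch:

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between latent vectors a and b, with t in [0, 1]."""
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1.0 - t) * a + t * b  # fall back to linear for near-parallel vectors
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
```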
6: Use Soft and Noisy Labels
Label smoothing, i.e. if you have two target labels, Real=1 and Fake=0, then for each incoming sample: if it is real, replace the label with a random number between 0.7 and 1.2; if it is fake, replace it with a random number between 0.0 and 0.3 (for example).
Salimans et al. 2016
make the labels noisy for the discriminator: occasionally flip the labels when training the discriminator
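Both tricks can be sketched like this (the ranges come from the example above; `flip_prob` is an arbitrary choice):

```python
import random

def smooth_real_label():
    # Soft label for real samples: random value in [0.7, 1.2] instead of hard 1.0
    return random.uniform(0.7, 1.2)

def smooth_fake_label():
    # Soft label for fake samples: random value in [0.0, 0.3] instead of hard 0.0
    return random.uniform(0.0, 0.3)

def noisy_real_label(flip_prob=0.05):
    # Occasionally hand D a "fake" label for a real sample (flip_prob is arbitrary)
    return smooth_fake_label() if random.random() < flip_prob else smooth_real_label()
```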
7: DCGAN / Hybrid Models
Use DCGAN when you can. It works!
if you can't use DCGANs and no model is stable, use a hybrid model: KL + GAN or VAE + GAN
8: Use stability tricks from RL
Experience Replay
Keep a replay buffer of past generations and occasionally show them
Keep checkpoints from the past of G and D and occasionally swap them out for a few iterations
All stability tricks that work for deep deterministic policy gradients
See Pfau & Vinyals (2016)
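The replay-buffer idea can be sketched as a pool of past fakes that occasionally gets shown to D instead of the freshest sample (the capacity and the 0.5 probability are arbitrary choices):

```python
import random

class ReplayBuffer:
    """Pool of past generated samples for the discriminator."""

    def __init__(self, capacity=50):
        self.capacity = capacity
        self.pool = []

    def push_and_sample(self, sample):
        # Fill the pool first; once full, with prob 0.5 return an old sample
        # (replacing it with the new one), otherwise return the new sample.
        if len(self.pool) < self.capacity:
            self.pool.append(sample)
            return sample
        if random.random() < 0.5:
            i = random.randrange(self.capacity)
            old = self.pool[i]
            self.pool[i] = sample
            return old
        return sample
```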
9: Use the ADAM Optimizer
optim.Adam rules!
See Radford et al. 2015
Use SGD for discriminator and ADAM for generator
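For reference, the DCGAN paper uses Adam with lr=2e-4 and beta1=0.5; in PyTorch that is `optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))`, with `optim.SGD` for D. A minimal numpy sketch of one Adam update with those hyperparameters:

```python
import numpy as np

def adam_step(param, grad, state, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update; lr=2e-4 and beta1=0.5 follow the DCGAN settings."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad        # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2   # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])              # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
new_param = adam_step(np.zeros(2), np.ones(2), state)
```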
10: Track failures early
D loss goes to 0: failure mode
check norms of gradients: if they are over 100 things are screwing up
when things are working, D loss has low variance and goes down over time vs having huge variance and spiking
if loss of generator steadily decreases, then it's fooling D with garbage (says martin)
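Checking gradient norms can be as simple as an L2 norm over all parameter gradients (a sketch; the >100 threshold is the rule of thumb from above):

```python
import numpy as np

def global_grad_norm(grads):
    """Global L2 norm over a list of gradient arrays; values above ~100 suggest trouble."""
    return np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
```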
11: Don't balance loss via statistics (unless you have a good reason to)
Don't try to find a (number of G / number of D) schedule to uncollapse training
It's hard and we've all tried it.
If you do try it, have a principled approach to it, rather than intuition
For example
```
while lossD > A:
  train D
while lossG > B:
  train G
```
12: If you have labels, use them
if you have labels available, train the discriminator to also classify the samples: auxiliary classifier GANs
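The auxiliary-classifier idea (AC-GAN style) adds a classification head to D and sums the two losses; a sketch for a single sample, assuming D outputs a real/fake probability plus class logits:

```python
import numpy as np

def auxiliary_d_loss(real_score, class_logits, true_class, is_real):
    """Discriminator loss with an auxiliary classification head:
    adversarial log-loss plus cross-entropy over the class labels."""
    adv = -np.log(real_score) if is_real else -np.log(1.0 - real_score)
    log_probs = class_logits - np.log(np.sum(np.exp(class_logits)))  # log-softmax
    return adv - log_probs[true_class]
```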
13: Add noise to inputs, decay over time
Add some artificial noise to inputs to D (Arjovsky et al., Huszár, 2016)
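A minimal sketch of instance noise with a linear decay (the `sigma0` and `decay_steps` values are arbitrary choices):

```python
import numpy as np

def add_instance_noise(x, step, sigma0=0.1, decay_steps=10000):
    """Add Gaussian noise to D's inputs, annealed linearly to zero over training."""
    sigma = sigma0 * max(0.0, 1.0 - step / decay_steps)
    return x + sigma * np.random.randn(*x.shape)
```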