If you ask deep learning people "what is the best image generation model right now", many of them would probably say "generative adversarial networks" (GANs). The original paper describes an adversarial process in which the generator and the discriminator fight against each other, and many people, including myself, like this intuition. What lessons can we learn from GANs for better approximate inference (which is my thesis topic)? To answer that I need to rewrite the framework in a language I'm comfortable with, so I decided to put it in this blog post as a research note.
Suppose there's a machine that can show you images. The machine flips a coin to decide what to show. For heads, it shows you a gorgeous painting by a human artist. For tails, it shows you a forgery of a famous painting. Your job is to figure out whether the shown image is "real" or not.
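To make the thought experiment concrete, here is a minimal sketch of the coin-flipping machine. It is only an illustration: the two 1-D Gaussians standing in for "real paintings" and "forgeries", and the names `machine`, `prob_real` and `guess_accuracy`, are my own assumptions, not something from the GAN paper. The best possible guesser applies Bayes' rule with the fair-coin prior.

```python
import math
import random

def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical stand-ins: "real paintings" ~ N(1, 1), "forgeries" ~ N(-1, 1).
REAL = (1.0, 1.0)
FAKE = (-1.0, 1.0)

def machine(rng):
    """Flip a fair coin, then show a sample from the chosen source."""
    s = 1 if rng.random() < 0.5 else 0  # 1 = heads (real), 0 = tails (forged)
    mean, std = REAL if s == 1 else FAKE
    return rng.gauss(mean, std), s

def prob_real(x):
    """Posterior p(s = 1 | x) under the fair coin, by Bayes' rule."""
    p_real = normal_pdf(x, *REAL)
    p_fake = normal_pdf(x, *FAKE)
    return p_real / (p_real + p_fake)

def guess_accuracy(n, seed=0):
    """Fraction of correct guesses when we answer "real" iff p(s = 1 | x) > 0.5."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        x, s = machine(rng)
        guess = 1 if prob_real(x) > 0.5 else 0
        correct += int(guess == s)
    return correct / n

if __name__ == "__main__":
    print(guess_accuracy(20000))
```

The closer the forger's distribution gets to the real one, the closer `prob_real` gets to 0.5 everywhere, and the closer the guesser's accuracy gets to chance.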
To summarize, a GAN can be viewed as an augmented generative model that is trained by minimizing the mutual information between the image and the label s. This augmentation is smart in the sense that it uses label-like information s that can be obtained for free, which introduces a supervision signal to help unsupervised learning.
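As a small sanity check of the mutual information view, the sketch below uses toy discrete distributions. It assumes the label s is a fair coin, with the image drawn from the data distribution when s = 1 and from the generator when s = 0. Under that assumption the mutual information between image and label equals the Jensen-Shannon divergence between the two distributions, so it vanishes exactly when the generator matches the data. The function names and the example probabilities are made up for illustration.

```python
import math

def kl(p, q):
    """KL(p || q) in nats for discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between p and q."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mutual_information(p_data, p_gen):
    """I(x; s) for s ~ Bernoulli(1/2), x | s=1 ~ p_data, x | s=0 ~ p_gen."""
    joint = {1: [0.5 * p for p in p_data], 0: [0.5 * p for p in p_gen]}
    marg_x = [joint[1][i] + joint[0][i] for i in range(len(p_data))]
    mi = 0.0
    for s in (0, 1):
        for i, pxs in enumerate(joint[s]):
            if pxs > 0:
                # p(x, s) * log[ p(x, s) / (p(x) p(s)) ], with p(s) = 1/2
                mi += pxs * math.log(pxs / (marg_x[i] * 0.5))
    return mi

if __name__ == "__main__":
    p_data = [0.7, 0.2, 0.1]  # toy "real image" distribution (assumed)
    p_gen = [0.1, 0.3, 0.6]   # toy generator distribution (assumed)
    print(mutual_information(p_data, p_gen), js_divergence(p_data, p_gen))
```

With a biased coin the same quantity becomes a weighted variant of the Jensen-Shannon divergence, so the fair coin is a convenient special case rather than a requirement of the argument.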
Recently I started to think about automatic algorithm selection, probably because I'm tired of reviewers complaining about my alpha-divergence papers: "I don't know how to choose alpha, and you should give us some guidance". Tom Minka gives empirical guidance in his divergence measure tech report, and so do we in recent papers. I know this is an important but very difficult research topic, but at least I'm not the only one who has thought about it. For example, in this paper the authors connected beta-divergences to Tweedie distributions and performed approximate maximum likelihood to select beta. Another interesting paper in this line is the "variational tempering" paper, which also models the annealing temperature within a probabilistic model. I like these papers because the core idea is very simple: we should also use probabilistic modeling for algorithmic parameters. Perhaps this also connects to Bayesian optimization, but I'm going to stop here as the note is already a bit too long.
Originally published on the WeChat official account CreateAMind (createamind)