https://github.com/Evolving-AI-Lab/ppgn
I. Official introduction
This repository contains source code necessary to reproduce some of the main results in the paper:
Nguyen A, Yosinski J, Bengio Y, Dosovitskiy A, Clune J. (2016). "Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space". arXiv:1612.00005v1.
If you use this software in an academic article, please consider citing:
@article{nguyen2016ppgn,
title={Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space},
author={Nguyen, Anh and Yosinski, Jason and Bengio, Yoshua and Dosovitskiy, Alexey and Clune, Jeff},
journal={arXiv preprint 1612.00005},
year={2016}
}
For more information regarding the paper, please visit www.evolvingai.org/ppgn
This code is built on top of Caffe. You'll need to install Caffe with its Python bindings and make sure the path to the caffe/python folder in settings.py is correct.
You will need to download a few models to run the examples below. There are download.sh scripts provided for your convenience.
cd nets/generator/noiseless && ./download.sh
cd nets/caffenet && ./download.sh
cd nets/placesCNN && ./download.sh
cd nets/lrcn && ./download.sh
Settings: the model paths in settings.py should work as they are if the download.sh scripts run correctly.
The main sampling algorithm is in sampler.py. We provide two Python scripts, one for sampling conditioned on classes and one for sampling conditioned on captions, to which you can pass various command-line arguments to run different experiments. The basic idea is to sample from the joint model p(x,y), which decomposes into a prior model p(x) (given by G and E) and a condition model p(y|x). Here, we provide the pre-trained networks for the Noiseless Joint PPGN-h model (Sec 3.5 in the paper). We show examples conditioning on classes, hidden neurons, and captions by using different condition networks.
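The decomposition above suggests a simple iterative sampler: at each step, move the current sample along the gradient of log p(x), along the gradient of log p(y|x), and add a little noise. The following is a minimal NumPy sketch of that idea on a 2-D toy problem, not the repository's sampler.py. The Gaussian prior, the logistic condition model, and all names (grad_log_prior, grad_log_condition, sample_chain) are assumptions made for illustration.

```python
import numpy as np

def grad_log_prior(x):
    # Toy prior: standard Gaussian, so d/dx log p(x) = -x.
    return -x

def grad_log_condition(x, w):
    # Toy condition model: p(y=1 | x) = sigmoid(w . x).
    # d/dx log sigmoid(w . x) = (1 - sigmoid(w . x)) * w
    s = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (1.0 - s) * w

def sample_chain(x0, w, eps1, eps2, eps3, n_steps, seed=0):
    # Each step combines a prior step, a condition step, and Gaussian noise.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    chain = [x.copy()]
    for _ in range(n_steps):
        x = (x
             + eps1 * grad_log_prior(x)
             + eps2 * grad_log_condition(x, w)
             + eps3 * rng.standard_normal(x.shape))
        chain.append(x.copy())
    return np.stack(chain)

w = np.array([2.0, 0.0])  # hypothetical condition direction
chain = sample_chain([0.0, 0.0], w, eps1=0.1, eps2=1.0, eps3=0.01, n_steps=200)
```

In this toy setting the chain drifts along w until the pull of the prior balances the pull of the condition, which is the same trade-off the epsilon multipliers control in the real sampler.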
We provide here 5 different examples as a starting point. Feel free to fork away to produce even cooler results!
1_class_conditional_sampling.sh: Sampling conditioning on the class "junco" (output unit #13 of the CaffeNet DNN trained on the ImageNet dataset). This script produces a sampling chain for a single given class.
./1_class_conditional_sampling.sh 13
produces this result:
Figure: A sampling chain conditioning on class "junco" starting from a random code (top left).
2_class_conditional_sampling_many.sh: We can also run a long sampling chain between different classes.
./2_class_conditional_sampling_many.sh <epsilon1>
with different epsilon1 (multiplier for the image prior component) produces a chain with different styles of samples:
epsilon1 | 1e-5 | 1e-3 | 1e-1 |
Style | Default | More abstract style | Ignoring class gradient |
3_hidden_conditional_sampling.sh: Instead of conditioning on a class, it is possible to condition on a hidden neuron, i.e. to perform Multifaceted Feature Visualization, synthesizing a set of inputs that highly activate a given neuron to understand what features it has learned to detect.
./3_hidden_conditional_sampling.sh 196
produces a set of images for conv5 neuron #196, previously identified as a "face detector" in the DeepVis toolbox:
Figure: 30 samples generated by conditioning on a "face detector" conv5 neuron. It is interesting that the face detector neuron even fires for things that do not look like a face at all (e.g. the yellow house in the center).
Running the above for longer can produce many other types of faces.
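Conditioning on a hidden neuron amounts to following the gradient of that neuron's activation with respect to the input. Below is a minimal NumPy sketch on a single ReLU layer. The weights, the unit index, and the function names are hypothetical, and the repository itself works with Caffe networks rather than this toy layer.

```python
import numpy as np

def unit_activation(x, W, b, unit):
    # Activation of one ReLU unit in a single hidden layer.
    return max(0.0, float(W[unit] @ x + b[unit]))

def unit_gradient(x, W, b, unit):
    # Gradient of that activation w.r.t. the input x:
    # W[unit] where the unit is active, zero otherwise.
    pre = W[unit] @ x + b[unit]
    return W[unit].copy() if pre > 0 else np.zeros_like(x)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # toy layer: 3 inputs, 4 hidden units
b = np.full(4, 0.1)
unit = 2                          # the hidden unit we condition on

x = np.zeros(3)
before = unit_activation(x, W, b, unit)
for _ in range(10):
    # Gradient ascent on the chosen unit's activation.
    x = x + 0.1 * unit_gradient(x, W, b, unit)
after = unit_activation(x, W, b, unit)
```

The input moves in the direction that most increases the chosen unit, so its activation grows from step to step.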
4_hidden_conditional_sampling_placesCNN.sh: One can repeat the example above but with an arbitrary neuron in a different condition network. Here, we visualize the conv5 neuron #182 in the AlexNet DNN trained on the MIT Places205 dataset. This neuron has been previously identified as a "food detector" in Zhou et al. [2].
./4_hidden_conditional_sampling_placesCNN.sh 182
produces this result:
Figure: 30 random samples that highly activate a "food detector" conv5 neuron.
5_caption_conditional_sampling.sh: We can also replace the image classifier network in previous examples with a pre-trained image captioning network to form a text-to-image model without even re-training anything. The image captioning model in this example is the LRCN model in Donahue et al (2015) [1].
Note: this example requires the recurrent branch of the Caffe version linked in the original README; update the path to Caffe accordingly in settings.py.
./5_caption_conditional_sampling.sh a_church_steeple_that_has_a_clock_on_it
produces a sample image for this caption.
Note that we often obtain mixed results with this particular text-to-image model. For some words, it works pretty well, but for others it struggles to produce reasonable images. While the language space in this model still needs further exploration, as a starting point, here are some sentences that produce reasonable images.
Here are a few (crazy?) ideas that one could play with using PPGNs:
Generate objects in a specific region of the image, similar to Learning What and Where to Draw, Reed et al. (2016), by conditioning on a region of the last heatmap of a fully convolutional classification network or a semantic segmentation network.
Note that the code in this repository is licensed under the MIT License, but the pre-trained condition models used by the code have their own licenses. Please carefully check them before use.
If you have questions/suggestions, please feel free to email, tweet to @anh_ng8 or create github issues.
[1] Donahue et al. "Long-term Recurrent Convolutional Networks for Visual Recognition and Description". CVPR 2015.
[2] Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. "Object detectors emerge in deep scene cnns". ICLR 2015.
II. Code:
The code is clearly commented throughout and easy to follow when read alongside the paper.
class Sampler(object): the core class
def get_code(self, encoder, path, layer):
'''
Push the given image through an encoder (here, AlexNet) to get a code, i.e. the network encodes the image into a latent feature vector.
'''
# set up the inputs for the net: read in the image and preprocess it
# initialize the encoder
# extract the features: run a forward pass and read out the computed features
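As a rough illustration of the first step only (setting up the inputs), here is a minimal NumPy sketch of the usual Caffe input convention: BGR channel order, per-channel mean subtraction, and a channels-first layout. The function name preprocess and the mean values are illustrative assumptions, not taken from the repository.

```python
import numpy as np

def preprocess(image_rgb, mean_bgr):
    # image_rgb: H x W x 3 array in RGB order with values in [0, 255].
    img = image_rgb.astype(np.float32)
    img = img[:, :, ::-1]            # RGB -> BGR
    img = img - mean_bgr             # subtract the per-channel mean
    img = img.transpose(2, 0, 1)     # H x W x C -> C x H x W
    return img[np.newaxis]           # add a batch axis: 1 x C x H x W

image = np.zeros((227, 227, 3), dtype=np.uint8)
image[..., 0] = 255                  # a pure red test image
mean_bgr = np.array([104.0, 117.0, 123.0], dtype=np.float32)  # illustrative values
blob = preprocess(image, mean_bgr)
```

The resulting array has the shape a Caffe data blob expects; the forward pass through the encoder would then read the code from the requested layer.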
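def backward_from_x_to_h(self, generator, diff, start, end):
'''
Backpropagate the gradient from the image (start) back to the latent space (end) of the generator network. The generator maps a latent code to an image; this function backpropagates from the image to the latent code.
'''
This is followed by the image-size matching computation.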
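Both ideas can be sketched with a toy linear generator: a gradient computed on a cropped region is first placed back into an array of the generator's full output size, then pushed through the generator by the chain rule. This is a sketch under stated assumptions, not the Caffe-based implementation; the linear generator A and the helper names pad_gradient and toy_backward_from_x_to_h are hypothetical.

```python
import numpy as np

def pad_gradient(grad_crop, full_shape, topleft):
    # Place the gradient of the cropped region back into a zero array
    # that has the generator's full output shape.
    full = np.zeros(full_shape, dtype=grad_crop.dtype)
    r, c = topleft
    h, w = grad_crop.shape
    full[r:r + h, c:c + w] = grad_crop
    return full

def toy_backward_from_x_to_h(A, grad_x):
    # Toy linear generator x = A @ h (x flattened), so dL/dh = A^T dL/dx.
    return A.T @ grad_x.ravel()

rng = np.random.default_rng(0)
A = rng.standard_normal((16, 5))     # maps a 5-d code to a 4x4 "image"
grad_crop = np.ones((2, 2))          # gradient on the central 2x2 crop
grad_x = pad_gradient(grad_crop, (4, 4), topleft=(1, 1))
grad_h = toy_backward_from_x_to_h(A, grad_x)
```

Pixels outside the crop receive zero gradient, so only the region seen by the condition network influences the latent code.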
def h_autoencoder_grad(self, h, encoder, decoder, gen_out_layer, topleft):
'''
Compute the gradient of the energy of P(input) wrt input, which is given by decode(encode(input))-input {see Alain & Bengio, 2014}. In other words, this computes the autoencoder's reconstruction error on the latent code.
Specifically, we compute E(G(h)) - h.
Note: this is an "upside down" auto-encoder for h that goes h -> x -> h with G modeling h -> x and E modeling x -> h.
'''
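The quantity E(G(h)) - h from the docstring can be illustrated with a toy "upside down" autoencoder. In this minimal NumPy sketch the linear generator, the contractive encoder, and the name toy_h_autoencoder_grad are assumptions chosen so that the result is easy to check by hand.

```python
import numpy as np

def toy_h_autoencoder_grad(h, G, E):
    # Reconstruction of h through h -> x -> h, minus h itself.
    return E(G(h)) - h

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
A_pinv = np.linalg.pinv(A)

def G(h):
    # Toy generator: h -> x.
    return A @ h

def E(x):
    # Toy contractive encoder: x -> h, shrinking the code by half.
    return 0.5 * (A_pinv @ x)

h = np.array([1.0, -2.0, 0.5])
g = toy_h_autoencoder_grad(h, G, E)
```

Because this toy encoder shrinks every code toward the origin, the resulting vector is -0.5 * h: it points from h toward the region the autoencoder reconstructs well, which is the role the prior term plays during sampling.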
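The sampling routine then prepares the image dimensions and image variables and keeps track of the best sample.
epsilon1, epsilon2, and epsilon3 in the code are the three epsilon coefficients in the paper's sampling update rule.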
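For reference, the sampling update has the following general form, with epsilon1 scaling the image prior term (as noted for example 2 above), epsilon2 scaling the condition term, and epsilon3 setting the noise level. The second line substitutes E(G(h)) - h for the prior gradient, as in the h_autoencoder_grad docstring, and applies the chain rule through G. Treat this as a restatement to be checked against the paper rather than a quotation.

```latex
% General form of the sampling update
x_{t+1} = x_t + \epsilon_1 \frac{\partial \log p(x_t)}{\partial x_t}
              + \epsilon_2 \frac{\partial \log p(y = y_c \mid x_t)}{\partial x_t}
              + N(0, \epsilon_3^2)

% The same update in the latent space h
h_{t+1} = h_t + \epsilon_1 \bigl( E(G(h_t)) - h_t \bigr)
              + \epsilon_2 \frac{\partial \log p(y = y_c \mid G(h_t))}{\partial G(h_t)}
                \frac{\partial G(h_t)}{\partial h_t}
              + N(0, \epsilon_3^2)
```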
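The rest is loop control and other supporting code.
The key to PPGN is generating images according to a condition network. The code that handles the condition network is in sampling_class.py: it runs the forward pass and the backpropagation of the image through the network.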
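The forward and backward passes through a class-conditional network can be sketched with a toy linear softmax classifier. This is a minimal NumPy sketch, assuming a single linear layer in place of the real condition network; the function name class_log_prob_and_grad and all shapes are hypothetical.

```python
import numpy as np

def class_log_prob_and_grad(x, W, b, c):
    # Forward pass: linear layer followed by a numerically stable log-softmax.
    logits = W @ x + b
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    # Backward pass: d log p(y=c|x) / d logits = onehot(c) - softmax(logits),
    # then the chain rule through the linear layer.
    one_hot = np.zeros_like(logits)
    one_hot[c] = 1.0
    grad_x = W.T @ (one_hot - np.exp(log_probs))
    return log_probs[c], grad_x

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 4))   # 5 classes, 4 input features
b = np.zeros(5)
x = rng.standard_normal(4)

lp, g = class_log_prob_and_grad(x, W, b, c=3)
# A small step along the gradient should raise the log-probability of class 3.
lp_after, _ = class_log_prob_and_grad(x + 0.01 * g, W, b, c=3)
```

In the full sampler this input gradient would be passed on to the generator's backward step, so that the latent code moves toward images the condition network assigns to the target class.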
The example scripts simply run the code above with different configurations.
This article was recommended by zdx3578.