This post introduces a classic improvement on the VAE: DFC-VAE, where DFC stands for "deep feature consistent". In the authors' words: "Instead of using pixel-by-pixel loss, we enforce deep feature consistency between the input and the output of a VAE, which ensures the VAE's output to preserve the spatial correlation characteristics of the input, thus leading the output to have a more natural visual appearance and better perceptual quality."
1. Shortcomings of the original VAE
The VAE loss should be familiar from the previous posts in this series.
Reconstruction quality is governed mainly by the first term, the reconstruction loss. In a standard VAE this is the pixel-wise MSE between the input and the output image, but MSE cannot capture the spatial correlation or perceptual information in an image. As the paper puts it: "Such measurements are easily implemented and efficient for deep neural network training. However, the generated images tend to be very blurry when compared to natural images. This is because the pixel-by-pixel loss does not capture the perceptual difference and spatial correlation between the two images. For example, the same image offset by a few pixels will have little visual perceptual difference for humans, but it could have a very high pixel-by-pixel loss. This is a well-known problem in the image quality measurement community."
This is why VAE reconstructions tend to be blurry, and it is the primary shortcoming of the MSE loss. It is only one of the reasons, though; see the earlier post on WAE for more.
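For reference, the standard VAE objective discussed above (pixel-wise MSE reconstruction plus the closed-form KL term) can be sketched in PyTorch as follows; tensor shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    """Standard VAE loss: pixel-wise MSE reconstruction + KL divergence."""
    # Pixel-by-pixel reconstruction term: this is the part DFC-VAE replaces.
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I),
    # in closed form: -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Note that when the reconstruction is pixel-identical to the input and the posterior matches the prior, both terms vanish.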
2. The improvement
So how do we solve this problem? Since MSE is the culprit, replace it. MSE's problem, in a phrase, is that it "cannot see the forest for the trees": it fails to capture global structure. What can capture global features? A CNN. Use a CNN pretrained on ImageNet (e.g. VGG) to extract features from both the original image and the reconstruction, then compute the loss on those features. That is the essence of the paper.
The architecture diagram is shown below. The key point is easy to spot: a pretrained network is used to rework the MSE loss.
3. More details
That is the basic idea. If you look at the equations, there are really only two of them, so the theory is not hard to grasp. Some technical details do deserve attention, though, such as the relative weighting of the loss terms and the network architecture: the model has quite a few hyperparameters.
4. Experimental results