This post introduces a classic improvement on the VAE: DFC-VAE, where DFC stands for "deep feature consistent". In the authors' words: "Instead of using pixel-by-pixel loss, we enforce deep feature consistency between the input and the output of a VAE, which ensures the VAE's output to preserve the spatial correlation characteristics of the input, thus leading the output to have a more natural visual appearance and better perceptual quality."
1. Shortcomings of the original VAE
The VAE loss should be familiar from the previous posts in this series.
Reconstruction quality is governed mainly by the first term, the reconstruction loss. In a standard VAE this is the pixel-wise MSE between the input and the output image, but MSE cannot capture the spatial correlation or perceptual information in an image. As the paper puts it: "Such measurements are easily implemented and efficient for deep neural network training. However, the generated images tend to be very blurry when compared to natural images. This is because the pixel-by-pixel loss does not capture the perceptual difference and spatial correlation between the two images. For example, the same image offset by a few pixels will have little visual perceptual difference for humans, but it could have a very high pixel-by-pixel loss. This is a well-known problem in the image quality measurement community."
This is why VAE reconstructions tend to be blurry, and it is the primary shortcoming of the MSE loss. It is only one of the reasons, though; see the earlier post on WAE for more.
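For reference, the standard VAE objective discussed above (pixel-wise MSE reconstruction plus the closed-form KL term) can be sketched in PyTorch as follows; tensor shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    """Standard VAE loss: pixel-wise MSE reconstruction + KL divergence."""
    # Pixel-by-pixel reconstruction term: this is the part DFC-VAE replaces.
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I),
    # in closed form: -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Note that when the reconstruction is pixel-identical to the input and the posterior matches the prior, both terms vanish.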
2. The improvement
So how do we solve this problem? Since MSE is the culprit, replace it. MSE's problem, in a phrase, is that it "cannot see the forest for the trees": it fails to capture global structure. What can capture global features? A CNN. Use a CNN pretrained on ImageNet (e.g. VGG) to extract features from both the original image and the reconstruction, then compute the loss on those features. That is the essence of the paper.
The architecture diagram is shown below. The key point is easy to spot: a pretrained network is used to rework the MSE loss.
3. More details
That is the basic idea. If you look at the equations, there are really only two of them, so the theory is not hard to grasp. Some technical details do deserve attention, though, such as the relative weighting of the loss terms and the network architecture: the model has quite a few hyperparameters.
4. Experimental results