
A generative model that generates image structure and style separately: paper and code

Generative Image Modeling using Style and Structure Adversarial Networks

Xiaolong Wang, Abhinav Gupta

Robotics Institute, Carnegie Mellon University

Current generative frameworks use end-to-end learning and generate images by sampling from a uniform noise distribution. However, these approaches ignore the most basic principle of image formation: an image is a product of (a) Structure: the underlying 3D model; and (b) Style: the texture mapped onto the structure. In this paper, we factorize the image generation process and propose the Style and Structure Generative Adversarial Network (S2-GAN). Our S2-GAN has two components: the Structure-GAN generates a surface normal map, and the Style-GAN takes the normal map as input and generates the 2D image. Besides the loss function separating real from generated images, we use an additional loss on the surface normals computed from the generated images. The two GANs are first trained independently and then merged via joint learning. We show that our S2-GAN model is interpretable, generates more realistic images, and can be used to learn unsupervised RGBD representations.

Our Style-GAN can also serve as a rendering engine, generating varied images.
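To make the factored pipeline concrete, here is a minimal numerical sketch. The toy linear "generators" stand in for the paper's DCGAN-style convolutional networks; all shapes and names here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two generators (hypothetical 8x8 resolution;
# the real networks are conv nets producing much larger maps).
W_struct = rng.normal(size=(100, 3 * 8 * 8)) * 0.01            # Structure-GAN weights
W_style = rng.normal(size=(3 * 8 * 8 + 100, 3 * 8 * 8)) * 0.01  # Style-GAN weights

def structure_gan(z_hat):
    """Map structure noise z_hat to a surface-normal map (unit vector per pixel)."""
    out = np.tanh(z_hat @ W_struct).reshape(8, 8, 3)
    return out / (np.linalg.norm(out, axis=-1, keepdims=True) + 1e-8)

def style_gan(normals, z_tilde):
    """Condition on the normal map plus style noise z_tilde to produce an RGB image."""
    inp = np.concatenate([normals.ravel(), z_tilde])
    return np.tanh(inp @ W_style).reshape(8, 8, 3)  # pixel values in (-1, 1)

z_hat = rng.uniform(-1, 1, size=100)    # structure latent, sampled uniformly
z_tilde = rng.uniform(-1, 1, size=100)  # style latent
normals = structure_gan(z_hat)          # stage 1: geometry
image = style_gan(normals, z_tilde)     # stage 2: texture on top of geometry
```

The point of the factoring is visible in the call structure: the style stage never sees `z_hat` directly, only the normal map it produced, so structure and texture can be inspected (and trained) separately.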

代码 https://github.com/xiaolonw/ss-gan

The learned representations also transfer well to object classification and object detection tasks.

Figures first, then a walkthrough of selected parts of the paper.

Paper walkthrough:

Introduction

Unsupervised learning of visual representations is one of the most fundamental problems in computer vision. There are two common approaches for unsupervised learning: (a) using a discriminative framework with auxiliary tasks where supervision comes for free, such as context prediction [1,2] or temporal embedding [3,4,5,6,7,8]; (b) using a generative framework where the underlying model is compositional and attempts to generate realistic images [9,10,11,12]. The underlying hypothesis of the generative framework is that if the model is good enough to generate novel and realistic images, it should be a good representation for

vision tasks as well. Most of these generative frameworks use end-to-end learning to generate RGB images from control parameters (z also called noise since it is sampled from a uniform distribution). Recently, some impressive results [13] have been shown on restrictive domains such as faces and bedrooms.

However, these approaches ignore one of the most basic underlying principles of image formation. Images are a product of two separate phenomena. Structure: this encodes the underlying geometry of the scene; it refers to the underlying mesh, voxel representation etc. Style: this encodes the texture on the objects and the illumination. In this paper, we build upon this IM101 principle of image formation and factor the generative adversarial network (GAN) into two generative processes as shown in Fig. 1. The first, a structure generative model (namely Structure-GAN), takes ẑ and generates the underlying 3D structure (y3D) for the scene.


The second, a conditional generative network (namely Style-GAN), takes y3D as input along with noise z̃ and generates the image yI. We call this factored generative network the Style and Structure Generative Adversarial Network (S2-GAN).

Why S2-GAN? We believe there are fourfold advantages of factoring the style and structure in the image generation process. Firstly, factoring style and structure simplifies the overall generative process and leads to more realistic high-resolution images. It also leads to a highly stable and robust learning procedure. Secondly, due to the factoring process, S2-GAN is more interpretable as compared to its counterparts. One can even factor the errors and understand where the surface normal generation failed as compared to texture generation. Thirdly, as our results indicate, S2-GAN allows us to learn RGBD representation in an unsupervised manner. This can be crucial for many robotics and graphics applications. Finally, our Style-GAN can also be thought of as a learned rendering engine which, given any 3D input, allows us to render a corresponding image. It also allows us to build applications where one can modify the underlying 3D structure of an input image and render a completely new image.


However, learning S2-GAN is still not an easy task. To tackle this challenge, we first learn the Style-GAN and Structure-GAN in an independent manner. We use the NYUv2 RGBD dataset [14] with more than 200K frames for learning the initial networks. We train a Structure-GAN using the ground truth surface normals from Kinect.

Because the perspective distortion of texture is more directly related to normals than to depth, we use surface normal to represent image structure in this paper.

We learn in parallel our Style-GAN which is conditional on the ground truth surface normals. While training the Style-GAN, we have two loss functions:

the first loss function takes in an image and the surface normals and tries to predict if they correspond to a real scene or not.

However, this loss function alone does not enforce explicit pixel based constraints for aligning generated images with input surface normals.

To enforce the pixel-wise constraints, we make the following assumption: if the generated image is realistic enough, we should be able to reconstruct or predict the 3D structure based on it.

We achieve this by adding another discriminator network. More specifically, the generated image is not only forwarded to the discriminator network in the GAN but also used as input to the trained surface-normal predictor network.
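A minimal sketch of this dual objective for the Style-GAN generator, assuming a per-pixel cosine consistency loss and toy stand-ins for both the discriminator score and the pretrained normal predictor (the paper's predictor is a trained fully convolutional network; everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def normal_predictor(image):
    """Stand-in for the pretrained surface-normal predictor, kept frozen
    while the Style-GAN trains. Here: a fixed per-pixel linear map."""
    W = np.eye(3)  # hypothetical; the real predictor is a trained conv net
    out = image @ W
    return out / (np.linalg.norm(out, axis=-1, keepdims=True) + 1e-8)

def style_gan_generator_loss(fake_image, input_normals, d_score_fake):
    """Adversarial term + pixel-wise normal-consistency term.
    A cosine loss is assumed here for illustration."""
    adv = -np.log(d_score_fake + 1e-8)           # fool D(image, normals)
    pred = normal_predictor(fake_image)          # reconstruct 3D structure
    cos = np.sum(pred * input_normals, axis=-1)  # per-pixel cosine similarity
    pixel = 1.0 - cos.mean()                     # 0 when perfectly aligned
    return adv + pixel

# If the generated image encodes exactly the input normals, the pixel
# term vanishes and only the adversarial term remains.
normals = rng.normal(size=(8, 8, 3))
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
loss = style_gan_generator_loss(normals.copy(), normals, d_score_fake=0.5)
```

The frozen predictor acts as the "second discriminator" described above: it cannot be fooled by texture alone, so the generator is pushed to keep the generated image geometrically consistent with the input normal map.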

Once we have trained an initial Style-GAN and Structure-GAN, we combine them and perform joint end-to-end learning, where images are generated from ẑ and z̃ and fed to the discriminators for the real/fake task.

2 Related Work

3 Background: GAN

4 Style and Structure GAN

GAN and DCGAN approaches directly generate images from the sampled z.

Instead, we use the fact that image generation has two components: (a) generating the underlying structure based on the objects in the scene; (b) generating the texture/style on top of this 3D structure. We use this simple observation to decompose the generative process into two procedures: (i) Structure-GAN - this process generates surface normals from sampled ẑ and (ii) Style-GAN - this model generates the images taking as input the surface normals and another latent variable z̃ sampled from the uniform distribution. We train both models with RGBD data, and the ground truth surface normals are obtained from the depth.
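Obtaining surface normals from a depth map can be sketched with simple finite differences. The paper derives normals from Kinect depth (typically via a local least-squares fit); this gradient-based version is a simplified stand-in:

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel surface normals from a depth map by crossing
    the local x/y tangent vectors. A simple finite-difference scheme;
    real RGBD pipelines usually fit a plane over a neighborhood."""
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    # Tangents (1, 0, dz/dx) and (0, 1, dz/dy); their cross product is
    # (-dz/dx, -dz/dy, 1), which we normalize to unit length.
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# A planar depth ramp should give the same tilted normal at every pixel.
depth = np.tile(np.arange(8, dtype=float), (8, 1))  # depth grows along x
normals = normals_from_depth(depth)
```

This is the sense in which the Structure-GAN's training targets come "for free" from RGBD data: no human annotation is needed to produce the normal maps.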


Shared from the WeChat public account CreateAMind (createamind); author: zdx3578.


Originally published: 2016-09-29
