
[Computer Vision Paper Digest] ECCV 2018, Part 8

Author: Amusi
Last modified: 2019-12-17
Published in the column: CVer

Preface

Amusi syncs the papers he collects each day to the daily-paper-computer-vision repository (apologies if the section title above sounds a bit suggestive). If you like it, you are welcome to star, fork, and open pull requests.

Click "Read the full article" to go directly to daily-paper-computer-vision.

link: https://github.com/amusi/daily-paper-computer-vision

ECCV 2018 is a top-tier conference in computer vision, and part of the accepted papers have already been released. Seven previous ECCV 2018 paper-digest posts have been published:

[Computer Vision Paper Digest] ECCV 2018, Part 1

[Computer Vision Paper Digest] ECCV 2018, Part 2

[Computer Vision Paper Digest] ECCV 2018, Part 3

[Computer Vision Paper Digest] ECCV 2018, Part 4

[Computer Vision Paper Digest] ECCV 2018, Part 5

[Computer Vision Paper Digest] ECCV 2018, Part 6

[Computer Vision Paper Digest] ECCV 2018, Part 7

VAE

《MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics》

ECCV 2018

Abstract:Long-term human motion can be represented as a series of motion modes---motion sequences that capture short-term temporal dynamics---with transitions between them. We leverage this structure and present a novel Motion Transformation Variational Auto-Encoders (MT-VAE) for learning motion sequence generation. Our model jointly learns a feature embedding for motion modes (that the motion sequence can be reconstructed from) and a feature transformation that represents the transition of one motion mode to the next motion mode. Our model is able to generate multiple diverse and plausible motion sequences in the future from the same input. We apply our approach to both facial and full body motion, and demonstrate applications like analogy-based motion transfer and video synthesis.



arXiv:https://arxiv.org/abs/1808.04545
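The core idea in the abstract above — sample a latent code for the current motion mode, then apply a learned transformation to reach the next mode's code — can be illustrated with a toy sketch. This is not the paper's architecture: the linear transformation `T`, the latent dimension, and the Gaussian encoder parameters below are all hypothetical stand-ins for the learned networks in MT-VAE.

```python
import numpy as np

rng = np.random.default_rng(1)

def reparameterize(mu, logvar):
    # Standard VAE sampling trick: z = mu + sigma * eps, eps ~ N(0, I).
    # Drawing different eps yields different plausible futures from the
    # same input, which is how diversity arises in such models.
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

d = 6  # hypothetical latent dimension

# Hypothetical "motion transformation": a linear map near the identity,
# standing in for the learned transition between motion-mode embeddings.
T = np.eye(d) + 0.1 * rng.normal(size=(d, d))

# Hypothetical encoder output for the current motion mode.
mu, logvar = rng.normal(size=d), np.full(d, -2.0)

z_current = reparameterize(mu, logvar)  # embedding of the current mode
z_next = T @ z_current                  # transformed embedding: next mode
```

In the actual model, a decoder would then reconstruct a motion sequence from `z_next`; here the sketch stops at the latent transition, which is the part the abstract emphasizes.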

Visual Reasoning

《Visual Reasoning with Multi-hop Feature Modulation》

ECCV 2018

Figure: the Multi-hop FiLM architecture (evaluated on the GuessWhat?! visual dialogue task)

Abstract:Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to generate the parameters of FiLM layers going up the hierarchy of a convolutional network in a multi-hop fashion rather than all at once, as in prior work. By alternating between attending to the language input and generating FiLM layer parameters, this approach is better able to scale to settings with longer input sequences such as dialogue. We demonstrate that multi-hop FiLM generation achieves state-of-the-art for the short input sequence task ReferIt --- on-par with single-hop FiLM generation --- while also significantly outperforming prior state-of-the-art and single-hop FiLM generation on the GuessWhat?! visual dialogue task.


arXiv:https://arxiv.org/abs/1808.04446
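The FiLM operation the abstract builds on is simple: condition a convolutional feature map on language by scaling and shifting each channel with parameters predicted from the language embedding. A minimal sketch, assuming a single feature map and a hypothetical linear conditioning network (the real model predicts these parameters with a recurrent, multi-hop generator):

```python
import numpy as np

def film(features, gamma, beta):
    # Feature-wise Linear Modulation: per-channel scale and shift.
    # features: (C, H, W) feature map; gamma, beta: (C,) parameters.
    return gamma[:, None, None] * features + beta[:, None, None]

rng = np.random.default_rng(0)
C, H, W = 4, 3, 3

# Hypothetical language embedding and linear heads that map it to
# the per-channel FiLM parameters (stand-ins for the learned networks).
lang_emb = rng.normal(size=8)
W_gamma = rng.normal(size=(C, 8))
W_beta = rng.normal(size=(C, 8))
gamma, beta = W_gamma @ lang_emb, W_beta @ lang_emb

x = rng.normal(size=(C, H, W))  # toy convolutional feature map
y = film(x, gamma, beta)        # language-conditioned features
```

The multi-hop variant proposed in the paper differs from this sketch in where `gamma` and `beta` come from: instead of predicting the parameters for all layers at once, it alternates between attending to the language input and emitting the FiLM parameters for the next layer up the hierarchy.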

Note: Amusi believes that combining CV and NLP holds great research significance and promise.

Originally published on 2018-08-16 via the CVer WeChat official account, as part of the Tencent Cloud self-media sharing program.
