[Paper Recommendations] Six Recent Video Classification Papers: Teacher-Student Networks, Appearance-and-Relation, Charades-Ego, Visual Data Synthesis, Graph Distillation, Fine-Grained Video Classification

[Introduction] The Zhuanzhi content team has compiled six recent papers on video classification for your reading.

1. I Have Seen Enough: A Teacher Student Network for Video Classification Using Fewer Frames



Authors: Shweta Bhardwaj, Mitesh M. Khapra

CVPR Workshop on Brave New Ideas for Video Understanding (BIVU)

Abstract: Over the past few years, various tasks involving videos, such as classification, description, summarization and question answering, have received a lot of attention. Current models for these tasks compute an encoding of the video by treating it as a sequence of images and going over every image in the sequence. However, for longer videos this is very time-consuming. In this paper, we focus on the task of video classification and aim to reduce the computational time by using the idea of distillation. Specifically, we first train a teacher network which looks at all the frames in a video and computes a representation for the video. We then train a student network whose objective is to process only a small fraction of the frames in the video and still produce a representation which is very close to the representation computed by the teacher network. This smaller student network, involving fewer computations, can then be employed at inference time for video classification. We experiment with the YouTube-8M dataset and show that the proposed student network can reduce the inference time by up to 30% with a very small drop in performance.
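To make the distillation objective concrete, below is a minimal PyTorch sketch, assuming a hypothetical FrameEncoder that mean-pools projected frame features; the paper's actual teacher and student architectures are not reproduced here.

```python
# A minimal sketch of frame-level distillation, assuming a hypothetical
# FrameEncoder; this is not the authors' actual architecture.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Projects per-frame features and mean-pools them into a video vector."""
    def __init__(self, feat_dim=1024, hidden_dim=512):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)

    def forward(self, frames):                 # frames: (batch, n_frames, feat_dim)
        return self.proj(frames).mean(dim=1)   # (batch, hidden_dim)

teacher, student = FrameEncoder(), FrameEncoder()
frames = torch.randn(8, 300, 1024)             # all 300 frames per video
subset = frames[:, ::10, :]                    # student sees every 10th frame

with torch.no_grad():
    target = teacher(frames)                   # full-video representation

# The student is trained to match the teacher from far fewer frames,
# so only the cheap student is needed at inference time.
loss = nn.functional.mse_loss(student(subset), target)
loss.backward()
```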

Source: arXiv, May 12, 2018

Link

http://www.zhuanzhi.ai/document/2e238a793c8db4a8aa0e57020adc17e6

2. Appearance-and-Relation Networks for Video Classification



Authors: Limin Wang, Wei Li, Wen Li, Luc Van Gool

CVPR18 camera-ready version. Code & models available at https://github.com/wanglimin/ARTNet

Affiliation: Nanjing University

Abstract: Spatiotemporal feature learning in videos is a fundamental problem in computer vision. This paper presents a new architecture, termed the Appearance-and-Relation Network (ARTNet), to learn video representations in an end-to-end manner. ARTNets are constructed by stacking multiple generic building blocks, called SMART, whose goal is to simultaneously model appearance and relation from RGB input in a separate and explicit manner. Specifically, SMART blocks decouple the spatiotemporal learning module into an appearance branch for spatial modeling and a relation branch for temporal modeling. The appearance branch is implemented based on the linear combination of pixels or filter responses in each frame, while the relation branch is designed based on the multiplicative interactions between pixels or filter responses across multiple frames. We perform experiments on three action recognition benchmarks: Kinetics, UCF101, and HMDB51, demonstrating that SMART blocks obtain an evident improvement over 3D convolutions for spatiotemporal feature learning. Under the same training setting, ARTNets achieve performance superior to existing state-of-the-art methods on these three datasets.
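For intuition, here is a rough sketch of what a SMART-style block could look like, assuming a spatial-only convolution for the appearance branch and a squared spatiotemporal convolution to approximate the relation branch's multiplicative interactions; SmartBlockSketch and all layer sizes are illustrative, not the paper's exact design.

```python
# A rough sketch of a SMART-style block; the squaring non-linearity follows
# the energy-model motivation for multiplicative interactions but is a
# simplification of the paper's actual design.
import torch
import torch.nn as nn

class SmartBlockSketch(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # Appearance: spatial-only convolution (kernel size 1 in time).
        self.appearance = nn.Conv3d(c_in, c_out, kernel_size=(1, 3, 3),
                                    padding=(0, 1, 1))
        # Relation: spatiotemporal convolution across neighboring frames.
        self.relation = nn.Conv3d(c_in, c_out, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm3d(2 * c_out)
        self.reduce = nn.Conv3d(2 * c_out, c_out, kernel_size=1)

    def forward(self, x):                 # x: (batch, c_in, T, H, W)
        app = self.appearance(x)
        rel = self.relation(x) ** 2       # squared responses approximate
                                          # cross-frame multiplicative terms
        out = torch.cat([app, rel], dim=1)
        return torch.relu(self.reduce(self.bn(out)))

block = SmartBlockSketch(3, 64)
print(block(torch.randn(2, 3, 16, 112, 112)).shape)  # (2, 64, 16, 112, 112)
```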

Source: arXiv, May 6, 2018

Link

http://www.zhuanzhi.ai/document/c945693806904a2b6c3c3bcf021acca8

3. Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos



Authors: Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

Abstract: In Actor and Observer we introduced a dataset linking the first- and third-person video understanding domains, the Charades-Ego Dataset. In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first- and third-person video, making it one of the largest and most diverse egocentric datasets available. Charades-Ego furthermore shares activity classes, scripts, and methodology with the Charades dataset, which consists of an additional 82.3 hours of third-person video with 66,500 activity instances. Charades-Ego has temporal annotations and textual descriptions, making it suitable for egocentric video classification, localization, captioning, and new tasks utilizing the cross-modal nature of the data.
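As a purely hypothetical illustration of how the paired first/third-person structure might be consumed, the sketch below groups activity instances by video id and viewpoint; the file layout and column names (video_id, viewpoint) are invented for this example and do not reflect the dataset's real annotation schema.

```python
# Hypothetical pairing of first- and third-person clips by video id;
# the CSV schema assumed here is illustrative, not the dataset's format.
import csv
from collections import defaultdict

def pair_views(annotation_csv):
    """Map each scripted video id to its first- and third-person instances."""
    pairs = defaultdict(lambda: {"ego": [], "third": []})
    with open(annotation_csv) as f:
        for row in csv.DictReader(f):
            view = "ego" if row["viewpoint"] == "first_person" else "third"
            pairs[row["video_id"]][view].append(row)
    return pairs

# Example: iterate over paired clips for a cross-modal task.
# for vid, views in pair_views("charades_ego_annotations.csv").items():
#     train_on_pair(views["ego"], views["third"])
```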

Source: arXiv, May 1, 2018

Link

http://www.zhuanzhi.ai/document/b3d218000dc923a6a5900bacdafd5772

4. Visual Data Synthesis via GAN for Zero-Shot Video Classification



Authors: Chenrui Zhang, Yuxin Peng

Accepted by the International Joint Conference on Artificial Intelligence (IJCAI) 2018

Affiliation: Peking University

Abstract: Zero-Shot Learning (ZSL) in video classification is a promising research direction, which aims to tackle the challenge posed by the explosive growth of video categories. Most existing methods exploit seen-to-unseen correlation via learning a projection between visual and semantic spaces. However, such projection-based paradigms cannot fully utilize the discriminative information implied in the data distribution, and commonly suffer from the information degradation issue caused by the "heterogeneity gap". In this paper, we propose a visual data synthesis framework via GAN to address these problems. Specifically, both semantic knowledge and visual distribution are leveraged to synthesize video features of unseen categories, and ZSL can be turned into a typical supervised problem with the synthetic features. First, we propose multi-level semantic inference to boost video feature synthesis, which captures the discriminative information implied in the joint visual-semantic distribution via feature-level and label-level semantic inference. Second, we propose Matching-aware Mutual Information Correlation to overcome the information degradation issue, which captures seen-to-unseen correlation in matched and mismatched visual-semantic pairs by mutual information, providing the zero-shot synthesis procedure with robust guidance signals. Experimental results on four video datasets demonstrate that our approach can improve zero-shot video classification performance significantly.
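A minimal sketch of the core synthesis step follows, assuming a hypothetical FeatureGenerator conditioned on class embeddings; the paper's multi-level semantic inference, the discriminator, and the matching-aware mutual-information term are all omitted here.

```python
# A minimal sketch of GAN-style feature synthesis for zero-shot
# classification; FeatureGenerator and its sizes are assumptions.
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Maps (noise, class embedding) to a synthetic visual feature."""
    def __init__(self, noise_dim=128, sem_dim=300, feat_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + sem_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, feat_dim),
        )

    def forward(self, z, sem):
        return self.net(torch.cat([z, sem], dim=1))

gen = FeatureGenerator()
unseen_emb = torch.randn(5, 300)      # semantic vectors of 5 unseen classes
z = torch.randn(5, 128)
fake_feats = gen(z, unseen_emb)
# A standard supervised classifier can now be trained on fake_feats,
# turning the zero-shot problem into a conventional one.
```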

Source: arXiv, April 26, 2018

Link

http://www.zhuanzhi.ai/document/4ea8bdde291c58c698466e51a8d24c11

5. Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification



Authors: Chenrui Zhang, Yuxin Peng

Accepted by the International Joint Conference on Artificial Intelligence (IJCAI) 2018

Affiliation: Peking University

Abstract: Video representation learning is a vital problem for the classification task. Recently, a promising unsupervised paradigm termed self-supervised learning has emerged, which explores inherent supervisory signals implied in massive data for feature learning via solving auxiliary tasks. However, existing methods in this regard suffer from two limitations when extended to video classification. First, they focus only on a single task, ignoring the complementarity among different task-specific features and thus resulting in suboptimal video representations. Second, high computational and memory cost hinders their application in real-world scenarios. In this paper, we propose a graph-based distillation framework to address these problems: (1) We propose a logits graph and a representation graph to transfer knowledge from multiple self-supervised tasks, where the former distills classifier-level knowledge by solving a multi-distribution joint matching problem, and the latter distills internal feature knowledge from pairwise ensembled representations while tackling the challenge of heterogeneity among different features; (2) The proposed teacher-student framework can dramatically reduce the redundancy of knowledge learnt from the teachers, leading to a lighter student model that solves the classification task more efficiently. Experimental results on three video datasets validate that our proposal not only helps learn better video representations but also compresses the model for faster inference.
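As a hedged sketch of the logits-level idea, the snippet below distills a weighted mixture of several teachers' output distributions into one student; the learnable edge weights merely stand in for the paper's multi-distribution joint matching, and multi_teacher_kd_loss is an invented helper.

```python
# A hedged sketch of multi-teacher logits distillation; the "graph" here is
# reduced to learnable per-teacher edge weights, a simplification of the
# paper's joint matching formulation.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, edge_weights, T=4.0):
    """KL divergence between the student and a weighted mixture of teacher
    distributions (weights normalized over teachers, temperature T)."""
    w = F.softmax(edge_weights, dim=0)                 # one weight per teacher
    mix = sum(wi * F.softmax(t / T, dim=1)
              for wi, t in zip(w, teacher_logits_list))
    return F.kl_div(F.log_softmax(student_logits / T, dim=1), mix,
                    reduction="batchmean") * T * T

student = torch.randn(8, 101, requires_grad=True)      # student class logits
teachers = [torch.randn(8, 101) for _ in range(3)]     # frozen teacher logits
weights = torch.zeros(3, requires_grad=True)           # learnable graph edges
loss = multi_teacher_kd_loss(student, teachers, weights)
loss.backward()
```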

Source: arXiv, April 26, 2018

Link

http://www.zhuanzhi.ai/document/ac2c40c7a76dc3f1495f0981c6f8affd

6. Fine-grained Video Classification and Captioning



Authors: Farzaneh Mahdisoltani, Guillaume Berger, Waseem Gharbieh, David Fleet, Roland Memisevic

Affiliation: University of Toronto

Abstract: We describe a DNN for fine-grained action classification and video captioning. It gives state-of-the-art performance on the challenging Something-Something dataset, with over 220,000 videos and 174 fine-grained actions. Classification and captioning on this dataset are challenging because of the subtle differences between actions, the use of thousands of different objects, and the diversity of captions penned by crowd actors. The model architecture shares features for classification and captioning, and is trained end-to-end. It performs much better than the existing classification benchmark for Something-Something, with impressive fine-grained results, and it yields a strong baseline on the new Something-Something captioning task. Our results reveal that there is a strong correlation between the degree of detail in the task and the ability of the learned features to transfer to other tasks.
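Below is a minimal sketch of one encoder feeding both a classification head and a captioning decoder, mirroring the shared-feature design the abstract describes; ClassifyAndCaption and its layer sizes are assumptions, not the paper's configuration.

```python
# A minimal sketch of shared features for classification and captioning;
# the architecture and sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class ClassifyAndCaption(nn.Module):
    def __init__(self, feat_dim=512, n_classes=174, vocab=10000):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.cls_head = nn.Linear(feat_dim, n_classes)
        self.embed = nn.Embedding(vocab, feat_dim)
        self.decoder = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.word_head = nn.Linear(feat_dim, vocab)

    def forward(self, frames, caption_in):
        _, h = self.encoder(frames)            # h: (1, batch, feat_dim)
        logits_cls = self.cls_head(h[-1])      # shared feature -> classes
        dec_out, _ = self.decoder(self.embed(caption_in), h)
        logits_cap = self.word_head(dec_out)   # shared feature -> words
        return logits_cls, logits_cap

model = ClassifyAndCaption()
frames = torch.randn(4, 32, 512)               # pre-extracted frame features
caption = torch.randint(0, 10000, (4, 12))     # teacher-forced caption tokens
cls_logits, cap_logits = model(frames, caption)
# Training would sum a cross-entropy loss over each head.
```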

Source: arXiv, April 25, 2018

Link

http://www.zhuanzhi.ai/document/3bcf0f7c4788e02c8c9fd5459803465b

-END-

Originally published on the WeChat public account Zhuanzhi (Quan_Zhuanzhi)

Original publication date: 2018-06-06
