[Zhuanzhi Collection 24] Video Captioning: A Complete Collection of Knowledge Resources (Introduction / Advanced / Papers / Surveys / Code / Experts)

Video Captioning: Zhuanzhi Collection

  • Video Captioning: Zhuanzhi Collection
    • Introductory Learning
    • Advanced Papers
      • 2015
      • 2016
      • 2017
    • Tutorial
    • Code
    • Domain Experts
    • Datasets

Introductory Learning
  1. An Introduction to Video Analysis Subfields: Video Captioning (video-to-text description)
    • [https://zhuanlan.zhihu.com/p/26730181]
  2. Teaching Machines to Understand Video
    • [http://gitbook.cn/books/59192e91ceea8e6fe4504c74/index.html]
  3. Tao Mei: "Look and Tell" - Humans Step Aside, AI Takes Over
  4. Deep 3D Residual Neural Networks: A New Breakthrough in Video Understanding
    • [http://www.msra.cn/zh-cn/news/features/pseudo-3d-residual-networks-20171027]
  5. Word2VisualVec for Video-To-Text Matching and Ranking
    • [http://www-nlpir.nist.gov/projects/tvpubs/tv16.slides/tv16.vtt.mediamill.slides.pdf]

Advanced Papers

2015
  1. Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015. - [http://arxiv.org/pdf/1411.4389.pdf]
  2. Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729.
    • UT / UML / Berkeley [http://arxiv.org/pdf/1412.4729]
  3. Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui, Joint Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861.
    • Microsoft [http://arxiv.org/pdf/1505.01861]
  4. Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, Sequence to Sequence -- Video to Text, arXiv:1505.00487.
    • UT / Berkeley / UML [http://arxiv.org/pdf/1505.00487]
  5. Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville, Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029
    • Univ. Montreal / Univ. Sherbrooke [http://arxiv.org/pdf/1502.08029.pdf]
  6. Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, The Long-Short Story of Movie Description, arXiv:1506.01698
    • MPI / Berkeley [http://arxiv.org/pdf/1506.01698.pdf]
  7. Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724
    • Univ. Toronto / MIT [http://arxiv.org/pdf/1506.06724.pdf]
  8. Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053
    • Univ. Montreal [http://arxiv.org/pdf/1507.01053.pdf]
2016
  1. Multimodal Video Description
    • [https://dl.acm.org/citation.cfm?id=2984066]
  2. Describing Videos using Multi-modal Fusion
    • [https://dl.acm.org/citation.cfm?id=2984065]
  3. Andrew Shin, Katsunori Ohnishi, Tatsuya Harada, Beyond Caption to Narrative: Video Captioning with Multiple Sentences
    • [http://ieeexplore.ieee.org/abstract/document/7532983/]
  4. Jianfeng Dong, Xirong Li, Cees G. M. Snoek, Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction
    • [https://pdfs.semanticscholar.org/de22/8875bc33e9db85123469ef80fc0071a92386.pdf]
2017
  1. Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf, Temporal Tessellation for Video Annotation and Summarization, arXiv:1612.06950.
    • TAU / USC [https://arxiv.org/pdf/1612.06950.pdf]
  2. Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks, Attention-Based Multimodal Fusion for Video Description
    • [https://arxiv.org/abs/1701.03126]
  3. Weakly Supervised Dense Video Captioning (CVPR 2017)
  4. Multi-Task Video Captioning with Video and Entailment Generation (ACL 2017)
  5. Multimodal Memory Modelling for Video Captioning, Junbo Wang, Wei Wang, Yan Huang, Liang Wang, Tieniu Tan
    • [https://arxiv.org/abs/1611.05592]
  6. Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing, Recurrent Topic-Transition GAN for Visual Paragraph Generation
    • [https://arxiv.org/abs/1703.07022]
  7. Xuelong Li, Bin Zhao, Xiaoqiang Lu, MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning
    • [https://www.ijcai.org/proceedings/2017/0307.pdf]

Tutorial
  1. “Bridging Video and Language with Deep Learning,” Invited tutorial at ECCV-ACM Multimedia, Amsterdam, The Netherlands, Oct. 2016.
    • [https://www.microsoft.com/en-us/research/publication/tutorial-bridging-video-language-deep-learning/]
  2. ICIP-2017-Tutorial-Video-and-Language-Pub
    • [https://www.microsoft.com/en-us/research/wp-content/uploads/2017/09/ICIP-2017-Tutorial-Video-and-Language-Pub.pdf]

Code
  1. neuralvideo
    • [https://github.com/olivernina/neuralvideo]
  2. Translating Videos to Natural Language Using Deep Recurrent Neural Networks
    • [https://www.cs.utexas.edu/~vsub/naacl15_project.html#code]
  3. Describing Videos by Exploiting Temporal Structure
    • [https://github.com/yaoli/arctic-capgen-vid]
  4. SA-tensorflow: Soft attention mechanism for video caption generation
    • [https://github.com/tsenghungchen/SA-tensorflow]
  5. Sequence to Sequence -- Video to Text (a minimal sketch of this encoder-decoder idea appears right after this list)
    • [https://github.com/jazzsaxmafia/video_to_sequence]
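For orientation, below is a minimal, hypothetical sketch of the S2VT-style encoder-decoder idea behind several of the repositories above: one LSTM first reads a sequence of precomputed per-frame CNN features, then keeps unrolling while emitting caption tokens. The dimensions, names, and the single-LSTM simplification (the original S2VT stacks two LSTMs) are illustrative assumptions, not the linked reference implementations.

```python
# Minimal S2VT-style video captioning sketch (PyTorch, illustrative only).
# Assumptions: per-frame CNN features are precomputed (e.g. 2048-d);
# hidden size and vocabulary size are placeholders; teacher forcing only.
import torch
import torch.nn as nn

class S2VTSketch(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.feat_proj = nn.Linear(feat_dim, hidden)  # project frame features
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (B, T_video, feat_dim); captions: (B, T_text) token ids
        enc_in = self.feat_proj(frame_feats)
        _, state = self.lstm(enc_in)           # "encoding" pass over the clip
        dec_in = self.embed(captions[:, :-1])  # shifted caption, teacher forcing
        dec_out, _ = self.lstm(dec_in, state)  # "decoding" pass
        return self.out(dec_out)               # (B, T_text - 1, vocab_size)

model = S2VTSketch()
feats = torch.randn(2, 40, 2048)               # 2 clips x 40 frames
caps = torch.randint(0, 10000, (2, 12))        # 2 padded captions
logits = model(feats, caps)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 10000), caps[:, 1:].reshape(-1))
```

The SA-tensorflow repository above adds soft attention over the per-frame features at each decoding step instead of relying only on the final encoder state.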

Domain Experts
  1. Tao Mei: Dr. Tao Mei is a Senior Researcher at Microsoft Research Asia, an IAPR Fellow, an ACM Distinguished Scientist, and an adjunct professor and doctoral advisor at the University of Science and Technology of China and Sun Yat-sen University. His main research interests are multimedia analysis, computer vision, and machine learning. - [https://www.microsoft.com/en-us/research/people/tmei/]
  2. Xirong Li: Associate Professor and doctoral advisor at the MOE Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China.
    • [http://lixirong.net/]
  3. Jiebo Luo: Professor at the University of Rochester, IEEE/SPIE Fellow, and Changjiang Chair Professor.
    • [http://www.cs.rochester.edu/u/jluo/]
  4. Subhashini Venugopalan
    • [https://www.cs.utexas.edu/~vsub/]

Datasets
  1. MSR-VTT dataset: the dataset of the Microsoft Research - Video to Text (MSR-VTT) Challenge at ACM Multimedia 2016 (see the Microsoft Multimedia Challenge link below). It contains 10,000 video clips, split into training, validation, and test sets, and each clip is annotated with roughly 20 English sentences. MSR-VTT also provides a category label for each video (20 categories in total); this label is treated as a prior and is known on the test set as well. All videos come with audio. The challenge scores captions with four machine-translation evaluation metrics: METEOR, BLEU@1-4, ROUGE-L, and CIDEr (a BLEU sketch follows this list).
    • [https://www.microsoft.com/en-us/research/publication/msr-vtt-large-video-description-dataset-bridging-video-language-supplementary-material/]
    • [http://ms-multimedia-challenge.com/]
  2. YouTube2Text dataset (also called the MSVD dataset): also released by Microsoft Research, as the Microsoft Research Video Description Corpus (link below). It contains 1,970 YouTube clips, each 10-25 s long and annotated with roughly 40 English sentences.
    • [http://www.cs.utexas.edu/users/ml/clamp/videoDescription/]
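Official MSR-VTT numbers come from the COCO caption evaluation toolkit (pycocoevalcap), which implements all four metrics. As a dependency-light illustration, here is a minimal corpus-level BLEU@1-4 computation with NLTK; the two toy captions are made-up placeholders, not dataset samples.

```python
# BLEU@1-4 sketch for captioning output (illustrative only; METEOR,
# ROUGE-L and CIDEr need the COCO caption evaluation toolkit).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Toy data: MSR-VTT has ~20 reference sentences per clip; here, two.
references = [  # one list of tokenized reference captions per hypothesis
    [["a", "man", "is", "playing", "a", "guitar"],
     ["someone", "plays", "guitar", "on", "stage"]],
]
hypotheses = [["a", "man", "plays", "the", "guitar"]]

smooth = SmoothingFunction().method1  # avoids zero scores on tiny toy data
for n in range(1, 5):                 # BLEU@1 .. BLEU@4
    weights = tuple(1.0 / n for _ in range(n))
    score = corpus_bleu(references, hypotheses,
                        weights=weights, smoothing_function=smooth)
    print(f"BLEU@{n}: {score:.3f}")
```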
