The World's Most Comprehensive Computer Vision Resources (Part 5: Image and Video Captioning)

Author: 朱晓霞
Published: 2018-07-20 16:49:30

Source: the WeChat public account 目标检测和深度学习 (Object Detection and Deep Learning)

Image Captioning
  1. m-RNN model: Explain Images with Multimodal Recurrent Neural Networks, 2014 [https://arxiv.org/pdf/1410.1090.pdf]
  2. NIC model: Show and Tell: A Neural Image Caption Generator, 2014
  3. MS Captivator: From Captions to Visual Concepts and Back, 2014
  4. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention 2015
  5. What Value Do Explicit High Level Concepts Have in Vision to Language Problems? 2016 [https://arxiv.org/pdf/1506.01144.pdf]
  6. Guiding Long-Short Term Memory for Image Caption Generation, 2015 [https://arxiv.org/pdf/1509.04942.pdf]
  7. Watch What You Just Said: Image Captioning with Text-Conditional Attention 2016 [https://arxiv.org/pdf/1606.04621.pdf] [https://github.com/LuoweiZhou/e2e-gLSTM-sc]
  8. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge, AAAI 2013 [https://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6454/7204]
  9. Self-critical Sequence Training for Image Captioning, CVPR 2017 [https://arxiv.org/pdf/1612.00563.pdf]
  10. Deep Reinforcement Learning-based Image Captioning with Embedding Reward, CVPR 2017 [https://arxiv.org/abs/1704.03899]
  11. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning, CVPR 2017 [https://arxiv.org/pdf/1612.01887.pdf] [https://github.com/jiasenlu/AdaptiveAttention]
  12. Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel, Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, arXiv:1411.2539. [https://arxiv.org/abs/1411.2539]
  13. Berkeley Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description [https://arxiv.org/abs/1411.4389]
  14. UML / UT Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, NAACL-HLT, 2015. [https://arxiv.org/abs/1412.4729]
  15. CMU / Microsoft Xinlei Chen, C. Lawrence Zitnick, Learning a Recurrent Visual Representation for Image Caption Generation [https://arxiv.org/abs/1411.5654]
  16. Xinlei Chen, C. Lawrence Zitnick, Mind's Eye: A Recurrent Visual Representation for Image Caption Generation, CVPR 2015 [https://www.cs.cmu.edu/~xinleic/papers/cvpr15_rnn.pdf]
  17. Facebook Remi Lebret, Pedro O. Pinheiro, Ronan Collobert, Phrase-based Image Captioning, arXiv:1502.03671 / ICML 2015 [https://arxiv.org/abs/1502.03671]
  18. UCLA / Baidu Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille, Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images [https://arxiv.org/abs/1504.06692]
  19. MS + Berkeley Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick, Exploring Nearest Neighbor Approaches for Image Captioning [https://arxiv.org/abs/1505.04467]
  20. Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, Margaret Mitchell, Language Models for Image Captioning: The Quirks and What Works [https://arxiv.org/abs/1505.01809]
  21. Adelaide Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, Anthony Dick, Image Captioning with an Intermediate Attributes Layer [https://arxiv.org/abs/1506.01144v1]
  22. Tilburg Grzegorz Chrupala, Akos Kadar, Afra Alishahi, Learning language through pictures [https://arxiv.org/abs/1506.03694]
  23. Univ. Montreal Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks [https://arxiv.org/abs/1507.01053]
  24. Cornell Jack Hessel, Nicolas Savva, Michael J. Wilber, Image Representations and New Domains in Neural Image Captioning [https://arxiv.org/abs/1508.02091]
  25. MS + City Univ. of HongKong Ting Yao, Tao Mei, and Chong-Wah Ngo, "Learning Query and Image Similarities with Ranking Canonical Correlation Analysis", ICCV, 2015 [https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Yao_Learning_Query_and_ICCV_2015_paper.pdf]
  26. Mao J, Xu W, Yang Y, et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN), 2015. [https://arxiv.org/abs/1412.6632]
  27. Pan Y, Yao T, Li H, et al. Video Captioning with Transferred Semantic Attributes 2016. [https://arxiv.org/abs/1611.07675]
  28. Johnson J, Karpathy A, Li F F. DenseCap: Fully Convolutional Localization Networks for Dense Captioning [https://arxiv.org/abs/1511.07571]
  29. I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. NIPS 2014. [https://arxiv.org/abs/1409.3215]
  30. Karpathy A, Li F F. Deep Visual-Semantic Alignments for Generating Image Descriptions TPAMI 2015 [https://arxiv.org/abs/1412.2306]
  31. A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. CVPR, 2014. [http://cs.stanford.edu/people/karpathy/deepvideo/]
  32. Yao T, Pan Y, Li Y, et al. Boosting Image Captioning with Attributes 2016. [https://arxiv.org/abs/1611.01646]
  33. Venugopalan S, Rohrbach M, Donahue J, et al. Sequence to Sequence -- Video to Text. 2015. [https://arxiv.org/abs/1505.00487]
  34. Captioning Images with Diverse Objects, CVPR 2017
  35. Dense Captioning with Joint Inference and Visual Context, CVPR 2017
  36. Incorporating Copying Mechanism in Image Captioning, CVPR 2017
  37. Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition, CVPR 2017
  38. An Empirical Study of Language CNN for Image Captioning, ICCV 2017
  39. Areas of Attention for Image Captioning, ICCV 2017
  40. Improved Image Captioning via Policy Gradient optimization of SPIDEr, ICCV 2017
  41. Paying Attention to Descriptions Generated by Image Captioning Models, ICCV 2017
  42. Scene Graph Generation from Objects, Phrases and Region Captions, ICCV 2017
  43. Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner, ICCV 2017
  44. Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training, ICCV 2017
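Nearly all of the papers above build on the encoder-decoder recipe popularized by NIC (item 2): a CNN summarizes the image into a feature vector, and an RNN language model generates the caption conditioned on that summary, with attention and reinforcement learning added on top in the later entries. Below is a minimal PyTorch sketch of that basic recipe; the class name, vocabulary size, and dimensions are illustrative assumptions, not any paper's reference code.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    """NIC-style captioner: frozen CNN encoder + LSTM decoder (a sketch)."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # In practice you would load pretrained weights
        # (weights=models.ResNet50_Weights.DEFAULT); weights=None keeps
        # this sketch runnable offline.
        cnn = models.resnet50(weights=None)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop fc head
        for p in self.encoder.parameters():
            p.requires_grad = False                 # freeze the visual encoder
        self.img_proj = nn.Linear(2048, embed_dim)  # image feature -> word space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, 224, 224); captions: (B, T) token ids
        feats = self.encoder(images).flatten(1)      # (B, 2048) pooled features
        img_tok = self.img_proj(feats).unsqueeze(1)  # image acts as "token 0"
        words = self.embed(captions[:, :-1])         # teacher forcing: shift right
        seq = torch.cat([img_tok, words], dim=1)     # (B, T, embed_dim)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                      # (B, T, vocab) logits

model = CaptionModel(vocab_size=10000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 10000])
```

Training minimizes cross-entropy against the ground-truth caption; the policy-gradient papers above (items 9, 10, 40) instead optimize sequence-level metrics such as CIDEr or SPIDEr directly.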
Video Captioning
  1. Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015. [http://jeffdonahue.com/lrcn/] [http://arxiv.org/pdf/1411.4389.pdf]
  2. Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729. UT / UML / Berkeley [http://arxiv.org/pdf/1412.4729]
  3. Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui, Joint Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861. Microsoft [http://arxiv.org/pdf/1505.01861]
  4. Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, Sequence to Sequence--Video to Text, arXiv:1505.00487. UT / Berkeley / UML [http://arxiv.org/pdf/1505.00487]
  5. Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville, Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029 Univ. Montreal / Univ. Sherbrooke [http://arxiv.org/pdf/1502.08029.pdf]
  6. Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, The Long-Short Story of Movie Description, arXiv:1506.01698 MPI / Berkeley [http://arxiv.org/pdf/1506.01698.pdf]
  7. Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724 Univ. Toronto / MIT [http://arxiv.org/pdf/1506.06724.pdf]
  8. Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053 Univ. Montreal [http://arxiv.org/pdf/1507.01053.pdf]
  9. Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf, Temporal Tessellation for Video Annotation and Summarization, arXiv:1612.06950. TAU / USC [https://arxiv.org/pdf/1612.06950.pdf]
  10. Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks, Attention-Based Multimodal Fusion for Video Description [https://arxiv.org/abs/1701.03126]
  11. Describing Videos using Multi-modal Fusion [https://dl.acm.org/citation.cfm?id=2984065]
  12. Andrew Shin, Katsunori Ohnishi, Tatsuya Harada, Beyond Caption to Narrative: Video Captioning with Multiple Sentences [http://ieeexplore.ieee.org/abstract/document/7532983/]
  13. Jianfeng Dong, Xirong Li, Cees G. M. Snoek, Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction [https://pdfs.semanticscholar.org/de22/8875bc33e9db85123469ef80fc0071a92386.pdf]
  14. Multimodal Video Description [https://dl.acm.org/citation.cfm?id=2984066]
  15. Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing, Recurrent Topic-Transition GAN for Visual Paragraph Generation [https://arxiv.org/abs/1703.07022]
  16. Weakly Supervised Dense Video Captioning, CVPR 2017
  17. Multi-Task Video Captioning with Video and Entailment Generation, ACL 2017
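The video entries extend the same recipe across time: per-frame CNN features are aggregated by a recurrent encoder, and a decoder LSTM emits the sentence, as in the sequence-to-sequence video-to-text line of work (items 2 and 4 above). The sketch below is a simplified encoder-decoder variant of that idea, with hypothetical feature and vocabulary sizes; real systems add temporal attention, padding masks, and beam-search decoding.

```python
import torch
import torch.nn as nn

class VideoCaptioner(nn.Module):
    """Simplified S2VT-flavoured model: LSTM over frames, LSTM over words."""
    def __init__(self, feat_dim=2048, vocab_size=10000,
                 embed_dim=256, hidden_dim=512):
        super().__init__()
        self.enc = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.dec = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, captions):
        # frame_feats: (B, F, feat_dim) per-frame CNN features
        # captions:    (B, T) token ids, teacher-forced during training
        _, state = self.enc(frame_feats)    # keep only the final (h, c)
        words = self.embed(captions[:, :-1])
        hidden, _ = self.dec(words, state)  # decode from the video state
        return self.out(hidden)             # (B, T-1, vocab) logits

model = VideoCaptioner()
logits = model(torch.randn(2, 16, 2048), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 11, 10000])
```

The original S2VT stacks a single LSTM that first reads the frame sequence and then emits words; splitting encoder and decoder, as here, is an equivalent-in-spirit simplification that keeps the sketch short.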
