[Zhuanzhi Collection 06] A Comprehensive Collection of Computer Vision (CV) Resources (Getting Started / Advanced Papers / Courses / Conferences / Experts, etc.) (PDF download included)

[Introduction] Curated topic collections are one of Zhuanzhi's core features, providing users with systematic knowledge-learning services in AI. Each collection gathers the best (awesome) resources on a topic from across the web, so that AI practitioners can learn conveniently and solve problems at work. Built on Zhuanzhi's AI topic knowledge tree, the collections are produced by professional human editors assisted by algorithmic tools, and are kept up to date. Readers interested in creating such collections are welcome to join the Zhuanzhi AI Creators Program. Today Zhuanzhi presents the computer vision collection (getting started / advanced papers / courses / conferences / experts, etc.). Visit www.zhuanzhi.ai, or follow the WeChat account and reply "专知" to enter Zhuanzhi, then search for the topic "计算机视觉" to view it. A PDF download link is also provided at the end of the article. This is an initial version; corrections and additions are welcome in the comments, and please feel free to share and repost!

Get to know Zhuanzhi: a new way of cognition!

Computer Vision Collection

  • Getting Started
  • Advanced Papers
    • Image Classification
    • Object Detection
    • Video Classification
    • Object Tracking
    • Segmentation
    • Object Recognition
    • Image Captioning
    • Video Captioning
    • Visual Question Answering
    • Edge Detection
    • Human Pose Estimation
    • Image Generation
  • Courses
  • Surveys
  • Tutorials
  • Books
  • Journals and Conferences
    • International Conferences
    • Journals
  • Domain Experts
    • Chinese Institutions and Researchers
    • North America
    • Europe
    • Australia
    • Asia and Middle East
  • Software
  • Datasets
  • Challenge
  • Startups
  • WeChat Accounts

Getting Started

  1. Computer Vision: Letting Cold Machines See Our Colorful World, by 孙剑
    • [http://www.msra.cn/zh-cn/news/features/computer-vision-20150210]
  2. UCLA's 朱松纯: Back to the Source: A First Look at the Three Origins of Computer Vision, with Remarks on Artificial Intelligence
  3. Deep Learning and Visual Computing, by 王亮, Institute of Automation, Chinese Academy of Sciences
    • [http://www.caai.cn/index.php?s=/Home/Article/qikandetail/year/2017/month/04.html]
  4. How to Do Good Computer Vision Research, by Dr. 华刚, Microsoft
    • [http://www.msra.cn/zh-cn/news/features/do-research-in-computer-vision-20161205]
  5. Computer Vision: an article series from Microsoft Research Asia
    • An accessible introduction to everyday applications of computer vision.
    • [http://www.msra.cn/zh-cn/research/computer-vision]
  6. Informal Notes on Computer Vision
    • [http://blog.csdn.net/zouxy09/article/details/38639349]
  7. Computer Vision: All Around You and Me, Microsoft
  8. What Is Computer Vision? What Is Machine Vision?
  9. How Convolutional Neural Networks Recognize Images
    • [http://www.infoq.com/cn/articles/convolutional-neural-networks-image-recognition]
  10. How Similar-Image Search Works, by 阮一峰
    • [http://www.ruanyifeng.com/blog/2011/07/principle_of_similar_image_search.html]
  11. How to Detect Image Edges, by 阮一峰
    • [http://www.ruanyifeng.com/blog/2016/07/edge-recognition.html]
  12. Object Detection: Principles and Implementation (parts 1-6)
    • [http://www.voidcn.com/article/p-xnjyqlkj-ua.html]
  13. Moving-Object Tracking series (parts 1-17)
    • [http://blog.csdn.net/App_12062011/article/category/6269524/1]
  14. The AI Kid Who Talks About Pictures: Fun with Image Captioning (parts 1 and 2)
    • [https://zhuanlan.zhihu.com/p/22408033]
    • [https://zhuanlan.zhihu.com/p/22520434]
  15. An Introduction to Video Analysis: Video Captioning (video-to-text description)
    • [https://zhuanlan.zhihu.com/p/26730181]
  16. From Tesla to Computer Vision: Semantic Image Segmentation
    • [https://zhuanlan.zhihu.com/p/21824299]
  17. A Brief History of Visual Recognition: from AlexNet and ResNet to Mask R-CNN
  18. Frontier Advances of Deep Learning in Computer Vision
    • [https://zhuanlan.zhihu.com/p/24699780]
  19. Computer Vision in the Deep Learning Era [https://mp.weixin.qq.com/s/gExfzCxjHrSb7afn33f-lA]
  20. Article series from the 视觉求索 WeChat account
  21. Article series from the 深度学习大讲堂 WeChat account
    • Applications of deep learning in object tracking [https://zhuanlan.zhihu.com/p/22334661]
    • Progress and trends of deep learning in image forensics [https://zhuanlan.zhihu.com/p/23341157]
    • Annual progress report on pedestrian detection, tracking, and retrieval [https://zhuanlan.zhihu.com/p/26807041]
    • Research progress in deep-learning-based object detection [https://zhuanlan.zhihu.com/p/21412911]
    • Research progress in deep-learning-based visual instance search [https://zhuanlan.zhihu.com/p/22265265]
    • Deep-learning-based VQA (visual question answering) techniques [https://zhuanlan.zhihu.com/p/22530291]
    • A brief history of face recognition and recent progress [https://zhuanlan.zhihu.com/p/21465605]
    • Annual progress report on edge detection [https://zhuanlan.zhihu.com/p/26848831]
    • Progress report on object tracking [https://zhuanlan.zhihu.com/p/27293523]

Advanced Papers

Image Classification

  1. Microsoft Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition [http://arxiv.org/pdf/1512.03385v1.pdf] [http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf]
  2. Microsoft Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, [http://arxiv.org/pdf/1502.01852]
  3. Batch Normalization Sergey Ioffe, Christian Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift[http://arxiv.org/pdf/1502.03167]
  4. GoogLeNet Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, CVPR, 2015. [http://arxiv.org/pdf/1409.4842]
  5. VGG-Net Karen Simonyan and Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Visual Recognition, ICLR, 2015. [http://www.robots.ox.ac.uk/~vgg/research/very_deep/] [http://arxiv.org/pdf/1409.1556]
  6. AlexNet Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012. [http://papers.nips.cc/book/advances-in-neural-information-processing-systems-25-2012]
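The core idea in the ResNet paper above is the residual (identity shortcut) connection: each block learns a correction F(x) on top of an identity path, y = F(x) + x. A minimal numpy sketch of one block (the function and weight names here are illustrative, not from any released code):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """A plain two-layer residual block: y = relu(F(x) + x).

    F(x) = w2 @ relu(w1 @ x); the identity shortcut lets the block
    default to (near) identity when the weights are small, which is
    part of what makes very deep stacks trainable.
    """
    fx = w2 @ relu(w1 @ x)
    return relu(fx + x)

# With zero weights the block reduces to the identity (inputs positive).
x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
print(residual_block(x, w_zero, w_zero))  # -> [1. 2. 3.]
```

The real networks use convolutions, batch normalization, and projection shortcuts when dimensions change; the skip-addition shown here is the part that carries the idea.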

Object Detection

  1. Deep Neural Networks for Object Detection, NIPS 2013:
    • [https://cis.temple.edu/~yuhong/teach/2014_spring/papers/NIPS2013_DNN_OD.pdf]
  2. R-CNN Rich feature hierarchies for accurate object detection and semantic segmentation:
    • [https://arxiv.org/abs/1311.2524]
  3. Fast R-CNN :
    • [http://arxiv.org/abs/1504.08083]
  4. Faster R-CNN Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks:
    • [http://arxiv.org/abs/1506.01497]
  5. Scalable Object Detection using Deep Neural Networks
    • [http://arxiv.org/abs/1312.2249]
  6. Scalable, High-Quality Object Detection
    • [http://arxiv.org/abs/1412.1441]
  7. SPP-Net Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
    • [http://arxiv.org/abs/1406.4729]
  8. DeepID-Net DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
    • [http://www.ee.cuhk.edu.hk/%CB%9Cwlouyang/projects/imagenetDeepId/index.html]
  9. Object Detectors Emerge in Deep Scene CNNs
    • [http://arxiv.org/abs/1412.6856]
  10. segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection
    • [https://arxiv.org/abs/1502.04275]
  11. Object Detection Networks on Convolutional Feature Maps
    • [http://arxiv.org/abs/1504.06066]
  12. Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction
    • [http://arxiv.org/abs/1504.03293]
  13. DeepBox: Learning Objectness with Convolutional Networks
    • [https://arxiv.org/abs/1505.02146]
  14. Object detection via a multi-region & semantic segmentation-aware CNN model
    • [http://arxiv.org/abs/1505.01749]
  15. You Only Look Once: Unified, Real-Time Object Detection
    • [http://arxiv.org/abs/1506.02640]
  16. YOLOv2 YOLO9000: Better, Faster, Stronger
    • [https://arxiv.org/abs/1612.08242]
  17. AttentionNet: Aggregating Weak Directions for Accurate Object Detection
    • [http://arxiv.org/abs/1506.07704]
  18. DenseBox: Unifying Landmark Localization with End to End Object Detection
    • [http://arxiv.org/abs/1509.04874]
  19. SSD: Single Shot MultiBox Detector
    • [http://arxiv.org/abs/1512.02325]
  20. DSSD : Deconvolutional Single Shot Detector
    • [https://arxiv.org/abs/1701.06659]
  21. G-CNN: an Iterative Grid Based Object Detector
    • [http://arxiv.org/abs/1512.07729]
  22. HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection
    • [http://arxiv.org/abs/1604.00600]
  23. A MultiPath Network for Object Detection
    • [http://arxiv.org/abs/1604.02135]
  24. R-FCN: Object Detection via Region-based Fully Convolutional Networks
    • [http://arxiv.org/abs/1605.06409]
  25. A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection
    • [http://arxiv.org/abs/1607.07155]
  26. PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection
    • [http://arxiv.org/abs/1608.08021]
  27. Feature Pyramid Networks for Object Detection
    • [https://arxiv.org/abs/1612.03144]
  28. Learning Chained Deep Features and Classifiers for Cascade in Object Detection
    • [https://arxiv.org/abs/1702.07054]
  29. DSOD: Learning Deeply Supervised Object Detectors from Scratch
    • [https://arxiv.org/abs/1708.01241]
  30. Focal Loss for Dense Object Detection ICCV 2017 Best student paper award. Facebook AI Research
    • [https://arxiv.org/abs/1708.02002]
  31. Mask R-CNN, ICCV 2017 Best Paper Award. Facebook AI Research [https://arxiv.org/pdf/1703.06870.pdf]
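Nearly all of the detectors listed above score many candidate boxes and then prune overlapping ones with non-maximum suppression (NMS), using intersection-over-union (IoU) as the overlap measure. A minimal numpy sketch (illustrative, not taken from any of the papers' code):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop any remaining
    box that overlaps it by more than `thresh`, repeat."""
    order = list(np.argsort(scores)[::-1])  # indices, best score first
    keep = []
    while order:
        i = order.pop(0)
        keep.append(int(i))
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]: the near-duplicate second box is suppressed
```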

Video Classification

  1. Nicolas Ballas, Li Yao, Pal Chris, Aaron Courville, "Delving Deeper into Convolutional Networks for Learning Video Representations", ICLR 2016. [http://arxiv.org/pdf/1511.06432v4.pdf]
  2. Michael Mathieu, Camille Couprie, Yann LeCun, "Deep Multi-Scale Video Prediction Beyond Mean Square Error", ICLR 2016. [http://arxiv.org/pdf/1511.05440v6.pdf]
  3. Donahue, Jeffrey, et al. Long-term recurrent convolutional networks for visual recognition and description CVPR 2015 [https://arxiv.org/abs/1411.4389]
  4. Karpathy, Andrej, et al. Large-scale Video Classification with Convolutional Neural Networks. CVPR 2014 [http://cs.stanford.edu/people/karpathy/deepvideo/]
  5. Yue-Hei Ng, Joe, et al. Beyond short snippets: Deep networks for video classification. CVPR 2015 [https://arxiv.org/abs/1503.08909]
  6. Tran, Du, et al. Learning Spatiotemporal Features with 3D Convolutional Networks. ICCV 2015 [https://arxiv.org/abs/1412.0767]

Object Tracking

NIPS2013

  • DLT: Naiyan Wang and Dit-Yan Yeung. "Learning A Deep Compact Image Representation for Visual Tracking." NIPS (2013).
    • paper [http://winsty.net/papers/dlt.pdf]
    • project [http://winsty.net/dlt.html]
    • code [http://winsty.net/dlt/DLTcode.zip]

CVPR2014

  • CN: Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg and Joost van de Weijer. "Adaptive Color Attributes for Real-Time Visual Tracking." CVPR (2014).
    • paper [http://www.cvl.isy.liu.se/research/objrec/visualtracking/colvistrack/CN_Tracking_CVPR14.pdf]
    • project [http://www.cvl.isy.liu.se/research/objrec/visualtracking/colvistrack/index.html]

ECCV2014

  • MEEM: Jianming Zhang, Shugao Ma, and Stan Sclaroff. "MEEM: Robust Tracking via Multiple Experts using Entropy Minimization." ECCV (2014).
    • paper [http://cs-people.bu.edu/jmzhang/MEEM/MEEM-eccv-preprint.pdf]
    • project [http://cs-people.bu.edu/jmzhang/MEEM/MEEM.html]
  • TGPR: Jin Gao, Haibin Ling, Weiming Hu, Junliang Xing. "Transfer Learning Based Visual Tracking with Gaussian Process Regression." ECCV (2014).
    • paper [http://www.dabi.temple.edu/~hbling/publication/tgpr-eccv14.pdf]
    • project [http://www.dabi.temple.edu/~hbling/code/TGPR.htm]
  • STC: Kaihua Zhang, Lei Zhang, Ming-Hsuan Yang, David Zhang. "Fast Tracking via Spatio-Temporal Context Learning." ECCV (2014).
    • paper [http://arxiv.org/pdf/1311.1939v1.pdf]
    • project [http://www4.comp.polyu.edu.hk/~cslzhang/STC/STC.htm]
  • SAMF: Yang Li, Jianke Zhu. "A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration." ECCV workshop (2014).
    • paper [http://link.springer.com/content/pdf/10.1007%2F978-3-319-16181-5_18.pdf]
    • github [https://github.com/ihpdep/samf]

BMVC2014

  • DSST: Martin Danelljan, Gustav Häger, Fahad Shahbaz Khan and Michael Felsberg. "Accurate Scale Estimation for Robust Visual Tracking." BMVC (2014).
    • paper [http://www.cvl.isy.liu.se/research/objrec/visualtracking/scalvistrack/ScaleTracking_BMVC14.pdf]
    • PAMI [http://www.cvl.isy.liu.se/en/research/objrec/visualtracking/scalvistrack/DSST_TPAMI.pdf]
    • project [http://www.cvl.isy.liu.se/en/research/objrec/visualtracking/scalvistrack/index.html]

ICML2015

  • CNN-SVM: Seunghoon Hong, Tackgeun You, Suha Kwak and Bohyung Han. "Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network." ICML (2015)
    • paper [http://120.52.73.80/arxiv.org/pdf/1502.06796.pdf]
    • project [http://cvlab.postech.ac.kr/research/CNN_SVM/]

CVPR2015

  • MUSTer: Zhibin Hong, Zhe Chen, Chaohui Wang, Xue Mei, Danil Prokhorov, Dacheng Tao. "MUlti-Store Tracker (MUSTer): A Cognitive Psychology Inspired Approach to Object Tracking." CVPR (2015).
    • paper [http://openaccess.thecvf.com/content_cvpr_2015/papers/Hong_MUlti-Store_Tracker_MUSTer_2015_CVPR_paper.pdf]
    • project [https://sites.google.com/site/multistoretrackermuster/]
  • LCT: Chao Ma, Xiaokang Yang, Chongyang Zhang, Ming-Hsuan Yang. "Long-term Correlation Tracking." CVPR (2015).
    • paper [http://openaccess.thecvf.com/content_cvpr_2015/papers/Ma_Long-Term_Correlation_Tracking_2015_CVPR_paper.pdf]
    • project [https://sites.google.com/site/chaoma99/cvpr15_tracking]
    • github [https://github.com/chaoma99/lct-tracker]
  • DAT: Horst Possegger, Thomas Mauthner, and Horst Bischof. "In Defense of Color-based Model-free Tracking." CVPR (2015).
    • paper [https://lrs.icg.tugraz.at/pubs/possegger_cvpr15.pdf]
    • project [https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/dat]
    • code [https://lrs.icg.tugraz.at/downloads/dat-v1.0.zip]
  • RPT: Yang Li, Jianke Zhu and Steven C.H. Hoi. "Reliable Patch Trackers: Robust Visual Tracking by Exploiting Reliable Patches." CVPR (2015).
    • paper [https://github.com/ihpdep/ihpdep.github.io/raw/master/papers/cvpr15_rpt.pdf]
    • github [https://github.com/ihpdep/rpt]

ICCV2015

  • FCNT: Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." ICCV (2015).
    • paper [http://202.118.75.4/lu/Paper/ICCV2015/iccv15_lijun.pdf]
    • project [http://scott89.github.io/FCNT/]
    • github [https://github.com/scott89/FCNT]
  • SRDCF: Martin Danelljan, Gustav Häger, Fahad Khan, Michael Felsberg. "Learning Spatially Regularized Correlation Filters for Visual Tracking." ICCV (2015).
    • paper [https://www.cvl.isy.liu.se/research/objrec/visualtracking/regvistrack/SRDCF_ICCV15.pdf]
    • project [https://www.cvl.isy.liu.se/research/objrec/visualtracking/regvistrack/]
  • CF2: Chao Ma, Jia-Bin Huang, Xiaokang Yang and Ming-Hsuan Yang. "Hierarchical Convolutional Features for Visual Tracking." ICCV (2015)
    • paper [http://faculty.ucmerced.edu/mhyang/papers/iccv15_tracking.pdf]
    • project [https://sites.google.com/site/jbhuang0604/publications/cf2]
    • github [https://github.com/jbhuang0604/CF2]
  • Naiyan Wang, Jianping Shi, Dit-Yan Yeung and Jiaya Jia. "Understanding and Diagnosing Visual Tracking Systems." ICCV (2015).
    • paper [http://winsty.net/papers/diagnose.pdf]
    • project [http://winsty.net/tracker_diagnose.html]
    • code [http://winsty.net/diagnose/diagnose_code.zip]
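Several of the trackers above (CN, DSST, SAMF, SRDCF) belong to the correlation-filter family: a target template is correlated with each new frame in the Fourier domain, and the peak of the response map gives the new target location. A toy single-channel sketch of that correlation step (illustrative only; real trackers learn a regularized filter rather than using the raw template):

```python
import numpy as np

def correlation_response(template, frame):
    """Cross-correlate a template with a frame patch in the Fourier
    domain, as correlation-filter trackers (MOSSE/KCF family) do:
    response = IFFT(FFT(frame) * conj(FFT(template))).
    Returns the (row, col) of the response peak."""
    F = np.fft.fft2(frame)
    T = np.fft.fft2(template, s=frame.shape)  # zero-pad to frame size
    resp = np.real(np.fft.ifft2(F * np.conj(T)))
    return tuple(int(v) for v in np.unravel_index(np.argmax(resp), resp.shape))

# A bright 2x2 blob at (5, 7); correlating with the blob template
# peaks at exactly that offset.
frame = np.zeros((16, 16))
frame[5:7, 7:9] = 1.0
template = np.ones((2, 2))
print(correlation_response(template, frame))  # -> (5, 7)
```

Working in the Fourier domain makes the dense sliding-window correlation cheap, which is why these trackers run in real time.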

Segmentation

  1. Alexander Kolesnikov, Christoph Lampert, Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation, ECCV, 2016. [http://pub.ist.ac.at/~akolesnikov/files/ECCV2016/main.pdf] [https://github.com/kolesman/SEC]
  2. Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Efficient piecewise training of deep structured models for semantic segmentation, arXiv:1504.01013. [http://arxiv.org/pdf/1504.01013]
  3. Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Deeply Learning the Messages in Message Passing Inference, arXiv:1506.02108. [http://arxiv.org/pdf/1506.02108]

Object Recognition

  1. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell [http://arxiv.org/abs/1310.1531]
  2. CNN Features off-the-shelf: an Astounding Baseline for Recognition CVPR 2014 [http://arxiv.org/abs/1403.6382]
  3. HD-CNN: Hierarchical Deep Convolutional Neural Network for Image Classification intro: ICCV 2015 [https://arxiv.org/abs/1410.0736]

Image Captioning

  1. The m-RNN model: "Explain Images with Multimodal Recurrent Neural Networks", 2014 [https://arxiv.org/pdf/1410.1090.pdf]
  2. The NIC model: "Show and Tell: A Neural Image Caption Generator", 2014
  3. MS Captivator: "From Captions to Visual Concepts and Back", 2014
  4. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015
  5. What Value Do Explicit High Level Concepts Have in Vision to Language Problems?, 2016 [https://arxiv.org/pdf/1506.01144.pdf]
  6. Guiding Long-Short Term Memory for Image Caption Generation, 2015 [https://arxiv.org/pdf/1509.04942.pdf]
  7. Watch What You Just Said: Image Captioning with Text-Conditional Attention, 2016 [https://arxiv.org/pdf/1606.04621.pdf] [https://github.com/LuoweiZhou/e2e-gLSTM-sc]
  8. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge, 2014 [https://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6454/7204]

Video Captioning

  1. Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015. [http://jeffdonahue.com/lrcn/] [http://arxiv.org/pdf/1411.4389.pdf]
  2. Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729. UT / UML / Berkeley [http://arxiv.org/pdf/1412.4729]
  3. Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui, Joint Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861. Microsoft [http://arxiv.org/pdf/1505.01861]
  4. Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, Sequence to Sequence--Video to Text, arXiv:1505.00487. UT / Berkeley / UML [http://arxiv.org/pdf/1505.00487]
  5. Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville, Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029 Univ. Montreal / Univ. Sherbrooke [http://arxiv.org/pdf/1502.08029.pdf]
  6. Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, The Long-Short Story of Movie Description, arXiv:1506.01698 MPI / Berkeley [http://arxiv.org/pdf/1506.01698.pdf]
  7. Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724 Univ. Toronto / MIT [http://arxiv.org/pdf/1506.06724.pdf]
  8. Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053 Univ. Montreal [http://arxiv.org/pdf/1507.01053.pdf]
  9. Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf, Temporal Tessellation for Video Annotation and Summarization, arXiv:1612.06950. TAU / USC [https://arxiv.org/pdf/1612.06950.pdf]
  10. Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. Hershey, Tim K. Marks Attention-Based Multimodal Fusion for Video Description [https://arxiv.org/abs/1701.03126]
  11. Describing Videos using Multi-modal Fusion [https://dl.acm.org/citation.cfm?id=2984065]
  12. Andrew Shin, Katsunori Ohnishi, Tatsuya Harada, Beyond caption to narrative: Video captioning with multiple sentences [http://ieeexplore.ieee.org/abstract/document/7532983/]
  13. Jianfeng Dong, Xirong Li, Cees G. M. Snoek Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction [https://pdfs.semanticscholar.org/de22/8875bc33e9db85123469ef80fc0071a92386.pdf]
  14. Multimodal Video Description [https://dl.acm.org/citation.cfm?id=2984066]
  15. Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing Recurrent Topic-Transition GAN for Visual Paragraph Generation [https://arxiv.org/abs/1703.07022]
  16. Weakly Supervised Dense Video Captioning(CVPR2017)
  17. Multi-Task Video Captioning with Video and Entailment Generation(ACL2017)

Visual Question Answering

  1. Kushal Kafle, and Christopher Kanan. An Analysis of Visual Question Answering Algorithms. arXiv:1703.09684, 2017. [https://arxiv.org/abs/1703.09684]
  2. Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim, Dual Attention Networks for Multimodal Reasoning and Matching, arXiv:1611.00471, 2016. [https://arxiv.org/abs/1611.00471]
  3. Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang, Hadamard Product for Low-rank Bilinear Pooling, arXiv:1610.04325, 2016. [https://arxiv.org/abs/1610.04325]
  4. Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach, Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, arXiv:1606.01847, 2016. [https://arxiv.org/abs/1606.01847] [code: https://github.com/akirafukui/vqa-mcb]
  5. Kuniaki Saito, Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada, DualNet: Domain-Invariant Network for Visual Question Answering. arXiv:1606.06108v1, 2016. [https://arxiv.org/pdf/1606.06108.pdf]
  6. Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh, Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions, arXiv:1606.06622, 2016. [https://arxiv.org/pdf/1606.06622v1.pdf]
  7. Hyeonwoo Noh, Bohyung Han, Training Recurrent Answering Units with Joint Loss Minimization for VQA, arXiv:1606.03647, 2016. [http://arxiv.org/abs/1606.03647v1]
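Papers 3 and 4 above fuse the image and question features with (compact or low-rank) bilinear pooling; the low-rank variant amounts to projecting both modalities into a common space and combining them with an element-wise (Hadamard) product. A toy numpy sketch, with illustrative dimensions and weight names:

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard_fusion(img_feat, q_feat, W_img, W_q):
    """Low-rank bilinear fusion: project both modalities and combine
    with an element-wise (Hadamard) product. Each output coordinate
    is a rank-constrained bilinear form of the two inputs."""
    return np.tanh(W_img @ img_feat) * np.tanh(W_q @ q_feat)

img_feat = rng.standard_normal(2048)        # e.g. a CNN image embedding
q_feat = rng.standard_normal(300)           # e.g. a question embedding
W_img = rng.standard_normal((512, 2048)) * 0.01
W_q = rng.standard_normal((512, 300)) * 0.01
fused = hadamard_fusion(img_feat, q_feat, W_img, W_q)
print(fused.shape)  # -> (512,): a joint vector fed to the answer classifier
```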

Edge Detection

  1. Saining Xie, Zhuowen Tu, Holistically-Nested Edge Detection [http://arxiv.org/pdf/1504.06375] [https://github.com/s9xie/hed]
  2. Gedas Bertasius, Jianbo Shi, Lorenzo Torresani, DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR, 2015. [http://arxiv.org/pdf/1412.1123]
  3. Wei Shen, Xinggang Wang, Yan Wang, Xiang Bai, Zhijiang Zhang, DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection, CVPR, 2015. [http://mc.eistar.net/UpLoadFiles/Papers/DeepContour_cvpr15.pdf]
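The deep detectors above replaced classical gradient-based edge detection, which is still a useful baseline and the starting point most introductions use. A minimal Sobel sketch in numpy (a naive loop implementation for clarity, not an efficient one):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
SOBEL_Y = SOBEL_X.T

def conv2_valid(img, k):
    """Plain 'valid' 2-D convolution (kernel flipped, no padding)."""
    kh, kw = k.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k[::-1, ::-1])
    return out

def sobel_edges(img):
    """Gradient magnitude sqrt(Gx^2 + Gy^2); large values mark edges."""
    gx = conv2_valid(img, SOBEL_X)
    gy = conv2_valid(img, SOBEL_Y)
    return np.hypot(gx, gy)

# A vertical step edge: every row responds most strongly at the step.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
print(edges.argmax(axis=1))  # -> [2 2 2 2 2 2]
```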

Human Pose Estimation

  1. Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh, Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, CVPR, 2017.
  2. Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, and Bernt Schiele, Deepcut: Joint subset partition and labeling for multi person pose estimation, CVPR, 2016.
  3. Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh, Convolutional pose machines, CVPR, 2016.
  4. Alejandro Newell, Kaiyu Yang, and Jia Deng, Stacked hourglass networks for human pose estimation, ECCV, 2016.
  5. Tomas Pfister, James Charles, and Andrew Zisserman, Flowing convnets for human pose estimation in videos, ICCV, 2015.
  6. Jonathan J. Tompson, Arjun Jain, Yann LeCun, Christoph Bregler, Joint training of a convolutional network and a graphical model for human pose estimation, NIPS, 2014.
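Most of the methods above (e.g. convolutional pose machines and stacked hourglass networks) predict one heatmap per joint and decode each keypoint as the heatmap's argmax. A minimal numpy sketch of that decoding step (shapes are illustrative):

```python
import numpy as np

def decode_keypoints(heatmaps):
    """Given per-joint heatmaps of shape (J, H, W), return the
    (row, col) argmax of each as the predicted joint location,
    plus the peak value as a confidence score."""
    J, H, W = heatmaps.shape
    flat = heatmaps.reshape(J, -1)
    idx = flat.argmax(axis=1)
    conf = flat.max(axis=1)
    coords = np.stack([idx // W, idx % W], axis=1)
    return coords, conf

# Two toy 'joints' with peaks at (3, 4) and (10, 2).
hm = np.zeros((2, 16, 16))
hm[0, 3, 4] = 1.0
hm[1, 10, 2] = 0.8
coords, conf = decode_keypoints(hm)
print(coords.tolist())  # -> [[3, 4], [10, 2]]
```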

Image Generation

  1. Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu. "Conditional Image Generation with PixelCNN Decoders" [https://arxiv.org/pdf/1606.05328v2.pdf] [https://github.com/kundan2510/pixelCNN]
  2. Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox, "Learning to Generate Chairs with Convolutional Neural Networks", CVPR, 2015. [http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf]
  3. Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra, "DRAW: A Recurrent Neural Network For Image Generation", ICML, 2015. [https://arxiv.org/pdf/1502.04623v2.pdf]
  4. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, Generative Adversarial Networks, NIPS, 2014. [http://arxiv.org/abs/1406.2661]
  5. Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus, Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, NIPS, 2015. [http://arxiv.org/abs/1506.05751]
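The GAN paper listed above (Goodfellow et al., NIPS 2014) trains a generator and a discriminator in a minimax game. A numpy sketch of the two losses, using the non-saturating generator loss from that paper (the inputs here are illustrative discriminator outputs, not a trained model):

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Losses from the original GAN minimax game:
    the discriminator maximizes log D(x) + log(1 - D(G(z)));
    the generator (non-saturating form) maximizes log D(G(z)).
    Both are returned as losses to minimize."""
    d_loss = -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))
    g_loss = -np.mean(np.log(d_fake))
    return d_loss, g_loss

# A perfectly confused discriminator outputs 0.5 everywhere,
# giving the equilibrium discriminator loss of log 4.
d_loss, g_loss = gan_losses(np.full(8, 0.5), np.full(8, 0.5))
print(round(float(d_loss), 4))  # -> 1.3863 (= log 4)
```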

Courses

  1. Stanford Vision Lab homepage: http://vision.stanford.edu/ 李飞飞's group offers three courses, CS131, CS231A, and CS231n, arguably the best computer vision courses available.
  2. CS 131 Computer Vision: Foundations and Applications: fundamentals, covering classical edge detection, feature-point description, camera calibration, panorama stitching, and more [http://vision.stanford.edu/teaching/cs131_fall1415/schedule.html]
  3. CS231A Computer Vision: from 3D reconstruction to recognition: [http://cvgl.stanford.edu/teaching/cs231a_winter1415/schedule.html]
  4. CS231n 2017: Convolutional Neural Networks for Visual Recognition. Covers the structure of convolutional neural networks, the principles and optimization of each component, and a range of applications. [http://vision.stanford.edu/teaching/cs231n/] China mirror: [http://www.bilibili.com/video/av13260183/]
  5. Stanford CS231n 2016 : Convolutional Neural Networks for Visual Recognition
    • homepage: [http://cs231n.stanford.edu/]
    • homepage: [http://vision.stanford.edu/teaching/cs231n/index.html]
    • syllabus: [http://vision.stanford.edu/teaching/cs231n/syllabus.html]
    • course notes: [http://cs231n.github.io/]
    • youtube: [https://www.youtube.com/watch?v=NfnWJUyUJYU&feature=youtu.be]
    • mirror: [http://pan.baidu.com/s/1pKsTivp]
    • mirror: [http://pan.baidu.com/s/1c2wR8dy]
    • NetEase version with Chinese subtitles: [http://study.163.com/course/introduction/1003223001.htm]
    • assignment 1: [http://cs231n.github.io/assignments2016/assignment1/]
    • assignment 2: [http://cs231n.github.io/assignments2016/assignment2/]
    • assignment 3: [http://cs231n.github.io/assignments2016/assignment3/]
  6. 1st Summer School on Deep Learning for Computer Vision, Barcelona (July 4-8, 2016)
    • youtube: [https://www.youtube.com/user/imatgeupc/videos?shelf_id=0&sort=dd&view=0]
    • A summer school on deep learning for computer vision, covering the fundamentals and many applications such as classification, detection, and captioning
    • homepage (slides+videos): [http://imatge-upc.github.io/telecombcn-2016-dlcv/]
    • homepage: [https://imatge.upc.edu/web/teaching/deep-learning-computer-vision]
  7. 2nd Summer School on Deep Learning for Computer Vision, Barcelona (June 21-27, 2017) [https://telecombcn-dl.github.io/2017-dlcv/]

Surveys

  1. Annotated Computer Vision Bibliography: Table of Contents. Keith Price has maintained this index since 1994; it covers every topic and subtopic in computer vision, listing papers, textbooks, and keywords for each theme. The site is updated frequently (most recently August 28, 2017), collects the important journal papers, conference papers, and books for each area, and keeps all links live.
  2. What Sparked Video Research in 1877? The Overlooked Role of the Siemens Artificial Eye by Mark Schubin 2017 [http://ieeexplore.ieee.org/document/7857854/]
  3. Giving machines humanlike eyes. by Posch, C., Benosman, R., Etienne-Cummings, R. 2015 [http://ieeexplore.ieee.org/document/7335800/]
  4. Seeing is not enough by Tom Geller, Oberlin, OH [https://dl.acm.org/citation.cfm?id=2001276]
  5. Visual Tracking: An Experimental Survey [https://dl.acm.org/citation.cfm?id=2693387]
  6. A survey on object recognition and segmentation techniques [http://ieeexplore.ieee.org/document/7724975/]
  7. A Review of Image Recognition with Deep Convolutional Neural Network [https://link.springer.com/chapter/10.1007/978-3-319-63309-1_7]
  8. Recent Advance in Content-based Image Retrieval: A Literature Survey. Wengang Zhou, Houqiang Li, and Qi Tian 2017 [https://arxiv.org/pdf/1706.06064.pdf]
  9. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures 2016 [https://www.jair.org/media/4900/live-4900-9139-jair.pdf]

Tutorials

  1. Intro to Deep Learning for Computer Vision 2016 [http://chaosmail.github.io/deeplearning/2016/10/22/intro-to-deep-learning-for-computer-vision/]
  2. CVPR 2014 Tutorial on Deep Learning in Computer Vision [https://sites.google.com/site/deeplearningcvpr2014/]
  3. CVPR 2015 Applied Deep Learning for Computer Vision with Torch [https://github.com/soumith/cvpr2015]
  4. Deep Learning for Computer Vision – Introduction to Convolution Neural Networks [http://www.analyticsvidhya.com/blog/2016/04/deep-learning-computer-vision-introduction-convolution-neural-networks/]
  5. A Beginner's Guide To Understanding Convolutional Neural Networks [https://adeshpande3.github.io/adeshpande3.github.io/A-Beginners-Guide-To-Understanding-Convolutional-Neural-Networks/]
  6. CVPR'17 Tutorial Deep Learning for Objects and Scenes by Kaiming He Ross Girshick [http://deeplearning.csail.mit.edu/]
  7. CVPR tutorial : Large-Scale Visual Recognition [http://www.europe.naverlabs.com/Research/Computer-Vision/Highlights/CVPR-tutorial-Large-Scale-Visual-Recognition]
  8. CVPR’16 Tutorial on Image Tag Assignment, Refinement and Retrieval [http://www.lambertoballan.net/2016/06/cvpr16-tutorial-image-tag-assignment-refinement-and-retrieval/]
  9. Tutorial on Answering Questions about Images with Deep Learning The tutorial was presented at '2nd Summer School on Integrating Vision and Language: Deep Learning' in Malta, 2016 [https://arxiv.org/abs/1610.01076]
  10. "Semantic Segmentation for Scene Understanding: Algorithms and Implementations" tutorial [https://www.youtube.com/watch?v=pQ318oCGJGY]
  11. A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach [http://minds.jacobs-university.de/sites/default/files/uploads/papers/ESNTutorialRev.pdf] [http://deeplearning.cs.cmu.edu/notes/shaoweiwang.pdf]
  12. Towards Good Practices for Recognition & Detection by Hikvision Research Institute. Supervised Data Augmentation (SDA) [http://image-net.org/challenges/talks/2016/Hikvision_at_ImageNet_2016.pdf]
  13. Generative Adversarial Networks by Ian Goodfellow, NIPS 2016 tutorial [https://arxiv.org/abs/1701.00160] [http://www.iangoodfellow.com/slides/2016-12-04-NIPS.pdf]

Books

  1. Two classic textbooks: "Computer Vision: A Modern Approach" and "Computer Vision: Algorithms and Applications"; consider reading the first before the second.
  2. Computer Vision: A Modern Approach by David A. Forsyth, Jean Ponce. English: [http://cmuems.com/excap/readings/forsyth-ponce-computer-vision-a-modern-approach.pdf] Chinese: [https://pan.baidu.com/s/1min99eK]
  3. Computer Vision: Algorithms and Applications by Richard Szeliski. English: [http://szeliski.org/Book/drafts/SzeliskiBook_20100903_draft.pdf] Chinese: [https://pan.baidu.com/s/1mhYGtio]
  4. Computer Vision: Models, Learning, and Inference by Simon J.D. Prince. The book's homepage also offers slides, code, tutorials, and demos. [http://www.computervisionmodels.com/]

Journals and Conferences

International Conferences

  1. CVPR, Computer Vision and Pattern Recognition CVPR 2017:[http://cvpr2017.thecvf.com/]
  2. ICCV, International Conference on Computer Vision ICCV2017:[http://iccv2017.thecvf.com/]
  3. ECCV, European Conference on Computer Vision
  4. SIGGRAPH, Special Interest Group on Computer Graphics and Interactive techniques SIGGRAPH2017 [http://s2017.siggraph.org/]
  5. ACM International Conference on Multimedia ACMMM2017: [http://www.acmmm.org/2017/]
  6. ICIP, International Conference on Image Processing [http://2017.ieeeicip.org/]

Journals

  1. ACM Transactions on Graphics (TOG)
  2. International Journal of Computer Vision (IJCV)
  3. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  4. IEEE Transactions on Image Processing (TIP), Tier 2
  5. IEEE Transactions on Visualization and Computer Graphics (TVCG)
  6. IEEE Communications Surveys and Tutorials
  7. IEEE Signal Processing Magazine
  8. IEEE Transactions on Evolutionary Computation
  9. IEEE Transactions on Geoscience and Remote Sensing, Tier 2
  10. Neurocomputing, Tier 2
  11. Pattern Recognition Letters, Tier 2
  12. Proceedings of the IEEE
  13. Signal, Image and Video Processing, Tier 4
  14. IEEE Journal on Selected Areas in Communications, Tier 2
  15. Journal of Visual Communication and Image Representation, Tier 3
  16. Machine Vision and Applications, Tier 3
  17. Pattern Recognition, Tier 2
  18. Signal Processing: Image Communication, Tier 3
  19. Computer Vision and Image Understanding, Tier 3
  20. IET Image Processing, Tier 4
  21. Artificial Intelligence, Tier 2
  22. Machine Learning, Tier 3
  23. Medical Image Analysis, Tier 2

Domain Experts

(This list is far from complete and omits many leading researchers; suggestions and additions are welcome, and the list will be kept up to date.)

Chinese Institutions and Researchers

  • Jian Sun (孙剑), Chief Scientist at Megvii and former Principal Researcher at MSRA [http://www.jiansun.org/]
  • Harry Shum (沈向阳), Executive Vice President, Microsoft [https://news.microsoft.com/exec/harry-shum/]
  • Gang Hua (华刚), Microsoft Research Asia [https://www.microsoft.com/en-us/research/people/ganghua/]
  • Jingyi Yu (虞晶怡), ShanghaiTech University [http://www.yu-jingyi.com/]
  • Tao Mei (梅涛), Microsoft Research Asia [https://www.microsoft.com/en-us/research/people/tmei/]
  • Zhengyou Zhang (张正友), Microsoft Research Asia [https://www.microsoft.com/en-us/research/people/zhang/]
  • Zicheng Liu (刘自成), Microsoft Research [http://people.ucas.ac.cn/~xlchen]
  • Jingdong Wang (王井东), Microsoft Research Asia [https://www.microsoft.com/en-us/research/people/jingdw/]
  • Yuanqing Lin (林元庆), former head of Baidu Research [https://www.linkedin.com/in/yuanqing-lin-8666789/]
  • Hongdong Li (李宏东), Australian National University [http://users.cecs.anu.edu.au/~hongdong/]
  • Yi Ma (马毅), UC Berkeley [http://yima.csl.illinois.edu/]
  • Zhaozheng Yin (尹朝征), Missouri University of Science and Technology [http://web.mst.edu/~yinz/]
  • Ying Wu (吴郢), Northwestern University [http://www.mccormick.northwestern.edu/research-faculty/directory/profiles/wu-ying.html]
  • Shuicheng Yan (颜水成) and his team, National University of Singapore / Qihoo 360 [https://www.ece.nus.edu.sg/stfpage/eleyans/]
  • Jiashi Feng (冯佳时), National University of Singapore [https://sites.google.com/site/jshfeng/home]
  • Jiaya Jia (贾佳亚), Professor, Chinese University of Hong Kong: http://www.cse.cuhk.edu.hk/~leojia/index.html
  • CUHK Multimedia Laboratory & SenseTime (Xiaoou Tang's (汤晓鸥) team): http://mmlab.ie.cuhk.edu.hk/, [https://www.ie.cuhk.edu.hk/people/xotang.shtml]
  • Xiaogang Wang (王晓刚), Professor, Chinese University of Hong Kong: http://www.ee.cuhk.edu.hk/~xgwang/
  • Naiyan Wang (王乃岩) and his team; Chief Scientist at TuSimple, PhD from HKUST: http://www.winsty.net/
  • Thomas S. Huang (黄煦涛), University of Illinois [https://ece.illinois.edu/directory/profile/t-huang1]
  • Mei Chen (陈梅), University at Albany [http://www.albany.edu/meichen/]
  • Yanxi Liu (刘燕西), Pennsylvania State University [http://www.cse.psu.edu/~yul11/]
  • Haibin Ling (凌海滨) and his team; co-founder and Chief Scientist of HiScene [http://www.dabi.temple.edu/~hbling/]
  • Song-Chun Zhu (朱松纯), Professor, UCLA: http://www.stat.ucla.edu/~sczhu/
  • Ruigang Yang (杨睿刚), University of Kentucky [http://www.vis.uky.edu/~ryang/]
  • Junsong Yuan (袁浚菘), Nanyang Technological University
  • Institute of Automation, Chinese Academy of Sciences (CASIA): http://www.ia.cas.cn/
    • Visual Information Processing Group
    • Biometrics and Security Research Group (Center for Biometrics and Security Research): http://www.cbsr.ia.ac.cn
    • Pattern Analysis and Learning Group: http://www.nlpr.ia.ac.cn/pal/
    • Computational Medicine Group (Brainnetome Center): http://www.brainnetome.org
    • Aerospace Information Research Center
    • Multimedia Computing Group: http://nlpr-web.ia.ac.cn/mmc/index.html
    • Machine Vision Group: http://vision.ia.ac.cn
    • Image and Video Analysis Group: http://www.nlpr.ia.ac.cn/iva
    • Center for Research on Intelligent Perception and Computing (CRIPAC): http://www.cripac.ia.ac.cn
    • Stan Z. Li's (李子青) group: [http://www.cbsr.ia.ac.cn/Li%20Group/index%20CH.asp]; Authenmetric (中科奥森): [http://www.authenmetric.com]
    • Weiming Hu's (胡卫明) group
    • National Laboratory of Pattern Recognition (NLPR), CASIA: http://www.nlpr.ia.ac.cn/CN/model/index.shtml
  • Institute of Computing Technology, Chinese Academy of Sciences (ICT): http://www.ict.ac.cn/
    • Cross-Media Computing Group: http://mcg.ict.ac.cn
    • Visual Information Processing and Learning Group (VIPL): http://vipl.ict.ac.cn, with subgroups on faces, sign language, video, visual modeling, affective computing, visual scene understanding, multimodal biometrics, and multimedia computing and multimodal intelligence; spin-off Seetatech (中科视拓): http://seetatech.com
    • Key Laboratory of Intelligent Information Processing, ICT: http://iip.ict.ac.cn/
    • Frontier Research Laboratory
  • Institute of Information Engineering, Chinese Academy of Sciences (http://www.cskaoyan.com/thread-205594-1-1.html)
    • Si Liu's (刘偲) group: http://liusi-group.com
    • Multimedia Security and Intelligent Analysis Group
  • Jiebo Luo (罗杰波), Professor, University of Rochester: http://www.cs.rochester.edu/u/jluo/
  • Wen Gao (高文) and his team, Peking University: http://www.jdl.ac.cn/htm-gaowen/
  • Yu-Jin Zhang (章毓晋) and his team, Tsinghua University: http://www.tsinghua.edu.cn/publish/ee/4157/2010/20101217173552339241557/20101217173552339241557.html
  • Tsinghua University: Jun Zhu (朱军), Haizhou Ai (艾海舟), Wenwu Zhu (朱文武), Jiwen Lu (鲁继文), and others
    • [http://ml.cs.tsinghua.edu.cn/~jun/index.shtml]
    • [http://media.cs.tsinghua.edu.cn/~ahz/]
    • [https://baike.baidu.com/item/%E6%9C%B1%E6%96%87%E6%AD%A6/10181070?fr=aladdin]
    • [http://www.au.tsinghua.edu.cn/publish/au/1714/2016/20160229104943061296929/20160229104943061296929_.html]
  • Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University (Nanning Zheng (郑南宁), Yihong Gong (龚怡宏)): http://www.aiar.xjtu.edu.cn/, [http://gr.xjtu.edu.cn/web/ygong/home]
  • Computer Graphics, Image and Visual Computing Lab, Tianjin University
  • Computer Vision Lab, Shanghai Jiao Tong University (Yuncai Liu (刘允才)): http://www.visionlab.sjtu.edu.cn/, https://cvsjtu.wordpress.com/
  • Zhejiang University: the teams of Xiaofei He (何晓飞), Deng Cai (蔡登), Mingli Song (宋明黎), Xi Li (李玺), Jianke Zhu (朱建科), Gang Pan (潘纲), and others
    • Image Technology Research and Application (ITRA) team, Zhejiang University: http://www.dvzju.com/
  • Zhengjun Zha (查正军), University of Science and Technology of China [http://auto.ustc.edu.cn/teacher_details.php?i=362]
  • Jianxin Wu (吴建鑫), Nanjing University [https://cs.nju.edu.cn/wujx/]
  • Sun Yat-sen University: the teams of Wei-Shi Zheng (郑伟诗) and Liang Lin (林倞)
  • Nankai University: Ming-Ming Cheng's (程明明) team
  • Nanjing Audit University: Yi Wu (吴毅) (tracking)
  • Dalian University of Technology: Huchuan Lu (卢湖川) (tracking)
  • Xiamen University: Rongrong Ji (纪荣嵘), Hanzi Wang (王菡子), and others
  • Huazhong University of Science and Technology: Xiang Bai's (白翔) team (text detection)
  • Beijing University of Posts and Telecommunications: Jun Guo's (郭军) group
  • Harbin Institute of Technology: Wangmeng Zuo's (左旺孟) team

Software

  1. Caffe[http://caffe.berkeleyvision.org/]
  2. PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration[https://github.com/pytorch/pytorch]
  3. CNTK - Microsoft Cognitive Toolkit[https://github.com/Microsoft/CNTK]
  4. Theano[http://deeplearning.net/software/theano/]
  5. cuda-convnet[https://code.google.com/p/cuda-convnet2/]
  6. DeepLearnToolbox[https://github.com/rasmusbergpalm/DeepLearnToolbox]
  7. Deepnet[https://github.com/nitishsrivastava/deepnet]
  8. Deeppy[https://github.com/andersbll/deeppy]
  9. JavaNN[https://github.com/ivan-vasilev/neuralnetworks]
  10. hebel[https://github.com/hannes-brt/hebel]
  11. Mocha.jl[https://github.com/pluskid/Mocha.jl]
  12. OpenDL[https://github.com/guoding83128/OpenDL]
  13. cuDNN[https://developer.nvidia.com/cuDNN]
  14. MGL[http://melisgl.github.io/mgl-pax-world/mgl-manual.html]
  15. Knet.jl[https://github.com/denizyuret/Knet.jl]
  16. Nvidia DIGITS - a web app based on Caffe[https://github.com/NVIDIA/DIGITS]
  17. Neon - Python based Deep Learning Framework[https://github.com/NervanaSystems/neon]
  18. Keras - Theano based Deep Learning Library[http://keras.io]
  19. Chainer - A flexible framework of neural networks for deep learning[http://chainer.org/]
  20. RNNLIB - A recurrent neural network library[http://sourceforge.net/p/rnnl/wiki/Home/]
  21. Brainstorm - Fast, flexible and fun neural networks.[https://github.com/IDSIA/brainstorm]
  22. Tensorflow - Open source software library for numerical computation using data flow graphs[https://github.com/tensorflow/tensorflow]
  23. DMTK - Microsoft Distributed Machine Learning Tookit[https://github.com/Microsoft/DMTK]
  24. Scikit Flow - Simplified interface for TensorFlow [mimicking Scikit Learn][https://github.com/google/skflow]
  25. MXnet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning framework[https://github.com/dmlc/mxnet/]
  26. Apache SINGA - A General Distributed Deep Learning Platform[http://singa.incubator.apache.org/]
  27. DSSTNE - Amazon's library for building Deep Learning models[https://github.com/amznlabs/amazon-dsstne]
  28. SyntaxNet - Google's syntactic parser - A TensorFlow dependency library[https://github.com/tensorflow/models/tree/master/syntaxnet]
  29. mlpack - A scalable Machine Learning library[http://mlpack.org/]
  30. Paddle - PArallel Distributed Deep LEarning by Baidu[https://github.com/baidu/paddle]
  31. NeuPy - Theano based Python library for ANN and Deep Learning[http://neupy.com]
  32. Sonnet - a library for constructing neural networks by Google's DeepMind[https://github.com/deepmind/sonnet]
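Despite their differences, the frameworks above share a common workflow: define a layered model, push a batch of image tensors through it, and read out per-class scores. As a minimal illustrative sketch (not tied to any single entry above, and with arbitrary layer sizes chosen purely for the example), here is a tiny convolutional classifier in PyTorch, one of the listed frameworks:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """A deliberately small CNN: two conv blocks, then a linear classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # RGB in, 16 maps out
            nn.ReLU(),
            nn.MaxPool2d(2),                             # halve spatial size
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # global average pool
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)  # (N, 32, 1, 1) -> (N, 32)
        return self.classifier(h)

model = TinyCNN()
images = torch.randn(4, 3, 32, 32)  # a fake batch of four 32x32 RGB images
logits = model(images)
print(logits.shape)  # torch.Size([4, 10]): one score per class per image
```

The same define-model / forward-pass pattern carries over to Keras, Chainer, MXNet, and the other frameworks listed, differing mainly in layer-declaration syntax and execution model (static vs. dynamic graphs).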

Datasets

Detection

  1. PASCAL VOC 2009 dataset Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets
  2. LabelMe dataset LabelMe is a web-based image annotation tool that allows researchers to label images and share the annotations with the rest of the community. If you use the database, we only ask that you contribute to it, from time to time, by using the labeling tool.
  3. BioID Face Detection Database: 1,521 images with human faces, recorded under natural conditions, i.e. varying illumination and complex backgrounds. The eye positions have been set manually.
  4. CMU/VASC & PIE Face dataset
  5. Yale Face dataset
  6. Caltech Cars, Motorcycles, Airplanes, Faces, Leaves, Backgrounds
  7. Caltech 101: pictures of objects belonging to 101 categories
  8. Caltech 256: pictures of objects belonging to 256 categories
  9. Daimler Pedestrian Detection Benchmark: 15,560 pedestrian and non-pedestrian samples (image cut-outs) and 6,744 additional full images not containing pedestrians for bootstrapping. The test set contains more than 21,790 images with 56,492 pedestrian labels (fully visible or partially occluded), captured from a vehicle in urban traffic.
  10. MIT Pedestrian dataset
  11. CVC Pedestrian Datasets
  12. MIT CBCL Pedestrian Database
  13. MIT CBCL Face Database
  14. MIT CBCL Car Database
  15. MIT CBCL Street Database
  16. INRIA Person Data Set A large set of marked up images of standing or walking people
  17. INRIA car dataset A set of car and non-car images taken in a parking lot near INRIA
  18. INRIA horse dataset A set of horse and non-horse images
  19. H3D Dataset 3D skeletons and segmented regions for 1000 people in images
  20. HRI RoadTraffic dataset A large-scale vehicle detection dataset
  21. BelgaLogos 10000 images of natural scenes, with 37 different logos, and 2695 logos instances, annotated with a bounding box.
  22. FlickrBelgaLogos 10000 images of natural scenes grabbed on Flickr, with 2695 logos instances cut and pasted from the BelgaLogos dataset.
  23. FlickrLogos-32 The dataset FlickrLogos-32 contains photos depicting logos and is meant for the evaluation of multi-class logo detection/recognition as well as logo retrieval methods on real-world images. It consists of 8240 images downloaded from Flickr.
  24. TME Motorway Dataset 30,000+ frames with vehicle rear annotation and classification (cars and trucks) on motorway/highway sequences. Annotations were semi-automatically generated using laser-scanner data. Distance estimates and consistent target IDs over time are available.
  25. PHOS (Color Image Database for illumination-invariant feature selection) Phos is a color image database of 15 scenes captured under different illumination conditions. In particular, every scene contains 15 different images: 9 captured under various strengths of uniform illumination and 6 under different degrees of non-uniform illumination. The images contain objects of different shape, color and texture and can be used for illumination-invariant feature detection and selection.
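Several of the detection datasets above, starting with PASCAL VOC, distribute their ground truth as one XML annotation file per image, listing object class names and bounding boxes. A minimal sketch of reading one such annotation with the Python standard library (the inline XML here is a hypothetical stub written in VOC's format, not a real annotation file):

```python
import xml.etree.ElementTree as ET

# A minimal VOC-style annotation; real datasets ship one such file per image.
VOC_XML = """
<annotation>
  <filename>2009_000001.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>34</xmin><ymin>11</ymin><xmax>448</xmax><ymax>293</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return (filename, [(class_name, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):          # one <object> per annotated box
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        box = tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, box))
    return filename, boxes

fname, objects = parse_voc(VOC_XML)
print(fname, objects)  # 2009_000001.jpg [('person', (34, 11, 448, 293))]
```

Datasets that use other formats (plain-text box lists, MATLAB files, JSON) differ only in the parsing step; the resulting (image, class, box) records are what detection training and evaluation code consumes.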

Classification

  1. PASCAL VOC 2009 dataset Classification/Detection Competitions, Segmentation Competition, Person Layout Taster Competition datasets
  2. Caltech Cars, Motorcycles, Airplanes, Faces, Leaves, Backgrounds
  3. Caltech 101 Pictures of objects belonging to 101 categories
  4. Caltech 256 Pictures of objects belonging to 256 categories
  5. ETHZ Shape Classes A dataset for testing object class detection algorithms. It contains 255 test images and features five diverse shape-based classes (apple logos, bottles, giraffes, mugs, and swans).
  6. Flower classification data sets 17 Flower Category Dataset
  7. Animals with attributes A dataset for Attribute Based Classification. It consists of 30475 images of 50 animals classes with six pre-extracted feature representations for each image.
  8. Stanford Dogs Dataset Dataset of 20,580 images of 120 dog breeds with bounding-box annotation, for fine-grained image categorization.
  9. Video classification USAA dataset The USAA dataset includes 8 different semantic classes of videos, which are home videos of social occasions featuring the activities of groups of people. It contains around 100 videos each for training and testing. Each video is labeled by 69 attributes, which can be broken down into five broad classes: actions, objects, scenes, sounds, and camera movement.
  10. McGill Real-World Face Video Database This database contains 18,000 video frames of 640x480 resolution from 60 video sequences, each recorded from a different subject (31 female and 29 male).
  11. e-Lab Video Data Set Video data sets to train machines to recognise objects in our environment. e-VDS35 has 35 classes and a total of 2050 videos of roughly 10 seconds each.

Tracking

  1. Dataset-AMP: Luka Čehovin Zajc; Alan Lukežič; Aleš Leonardis; Matej Kristan. "Beyond Standard Benchmarks: Parameterizing Performance Evaluation in Visual Object Tracking." ICCV (2017). [paper]
  2. Dataset-Nfs: Hamed Kiani Galoogahi, Ashton Fagg, Chen Huang, Deva Ramanan and Simon Lucey. "Need for Speed: A Benchmark for Higher Frame Rate Object Tracking." ICCV (2017) [paper] [supp] [project]
  3. Dataset-DTB70: Siyi Li, Dit-Yan Yeung. "Visual Object Tracking for Unmanned Aerial Vehicles: A Benchmark and New Motion Models." AAAI (2017) [paper] [project] [dataset]
  4. Dataset-UAV123: Matthias Mueller, Neil Smith and Bernard Ghanem. "A Benchmark and Simulator for UAV Tracking." ECCV (2016) [paper] [project] [dataset]
  5. Dataset-TColor-128: Pengpeng Liang, Erik Blasch, Haibin Ling. "Encoding color information for visual tracking: Algorithms and benchmark." TIP (2015) [paper] [project] [dataset]
  6. Dataset-NUS-PRO: Annan Li, Min Lin, Yi Wu, Ming-Hsuan Yang, and Shuicheng Yan. "NUS-PRO: A New Visual Tracking Challenge." PAMI (2015) [paper] [project] [Data_360 (code: bf28)] [Data_baidu] [View_360 (code: 515a)] [View_baidu]
  7. Dataset-PTB: Shuran Song and Jianxiong Xiao. "Tracking Revisited using RGBD Camera: Unified Benchmark and Baselines." ICCV (2013) [paper] [project] [5 validation] [95 evaluation]
  8. Dataset-ALOV300+: Arnold W. M. Smeulders, Dung M. Chu, Rita Cucchiara, Simone Calderara, Afshin Dehghan, Mubarak Shah. "Visual Tracking: An Experimental Survey." PAMI (2014) [paper] [project] Mirror Link:ALOV300++ Dataset Mirror Link:ALOV300++ Groundtruth
  9. OTB2013: Wu, Yi, Jongwoo Lim, and Ming-Hsuan Yang. "Online Object Tracking: A Benchmark." CVPR (2013). [paper]
  10. OTB2015: Wu, Yi, Jongwoo Lim, and Ming-Hsuan Yang. "Object Tracking Benchmark." TPAMI (2015). [paper] [project]
  11. Dataset-VOT: [project]
  12. [VOT13_paper_ICCV: http://www.votchallenge.net/vot2013/Download/vot_2013_paper.pdf] The Visual Object Tracking VOT2013 challenge results
  13. [VOT14_paper_ECCV] The Visual Object Tracking VOT2014 challenge results
  14. [VOT15_paper_ICCV] The Visual Object Tracking VOT2015 challenge results
  15. [VOT16_paper_ECCV] The Visual Object Tracking VOT2016 challenge results
  16. [VOT17_paper_ECCV] The Visual Object Tracking VOT2017 challenge results
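The benchmarks above (OTB, VOT, and most of the tracking datasets listed) score trackers largely by how well predicted boxes overlap ground-truth boxes, measured as intersection-over-union (IoU). A minimal sketch of that core metric:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the overlap rectangle (zero if the boxes are disjoint).
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping in a 5x5 corner: IoU = 25 / 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 0.14285714285714285
```

OTB's success plots sweep an IoU threshold over [0, 1] and report the fraction of frames whose per-frame IoU exceeds it; VOT combines a related overlap measure with tracker-failure counts.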

Challenge

  1. Microsoft COCO Image Captioning Challenge [https://competitions.codalab.org/competitions/3221]
  2. ImageNet Large Scale Visual Recognition Challenge [http://www.image-net.org/]
  3. COCO 2017 Detection Challenge [http://cocodataset.org/#detections-challenge2017]
  4. Visual Domain Adaptation (VisDA2017) Segmentation Challenge [https://competitions.codalab.org/competitions/17054]
  5. The PASCAL Visual Object Classes Homepage [http://host.robots.ox.ac.uk/pascal/VOC/]
  6. YouTube-8M Large-Scale Video Understanding [https://research.google.com/youtube8m/workshop.html]
  7. joint COCO and Places Challenge [https://places-coco2017.github.io/]
  8. Places Challenge 2017: Deep Scene Understanding is held jointly with COCO Challenge at ICCV'17 [http://placeschallenge.csail.mit.edu/]
  9. COCO Challenges. [http://cocodataset.org/#home]
  10. VQA Challenge 2017 [http://visualqa.org/]
  11. The Joint Video and Language Understanding Workshop: MovieQA and The Large Scale Movie Description Challenge (LSMDC), at ICCV 2017 [https://sites.google.com/site/describingmovies/challenge]
  12. Microsoft Multimedia Challenge (2017) [http://ms-multimedia-challenge.com/2017/challenge]
  13. MOTChallenge: The Multiple Object Tracking Benchmark [https://motchallenge.net/]
  14. Visual Domain Adaptation Challenge [http://ai.bu.edu/visda-2017/]
  15. MegaFace and MF2: Million-Scale Face Recognition [http://megaface.cs.washington.edu/]
  16. Facial Keypoints Detection [https://www.kaggle.com/c/facial-keypoints-detection]
  17. The VOT challenges Visual Object Tracking [http://www.votchallenge.net/]
  18. Large-scale Scene Understanding Challenge. SCENE CLASSIFICATION, SEGMENTATION, SALIENCY PREDICTION [http://lsun.cs.princeton.edu/2017/]
  19. AI Challenger (全球AI挑战赛): Chinese image captioning, human skeletal keypoints, scene classification [https://challenger.ai/]
  20. 2016 Shanghai BOT Big Data Application Contest [http://www.zhishu51.com/Activity/bot]
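Classification challenges in the ILSVRC/ImageNet family report top-k error: a prediction counts as correct if the ground-truth label appears among the model's k highest-scoring classes. A minimal sketch of the metric (the toy scores and labels below are made-up examples):

```python
def topk_accuracy(scores, labels, k=5):
    """Fraction of examples whose true label is in the top-k scored classes.

    scores: list of per-class score lists, one list per example.
    labels: true class index for each example.
    """
    correct = 0
    for row, label in zip(scores, labels):
        # Indices of the k highest scores for this example.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)

scores = [[0.1, 0.6, 0.3], [0.8, 0.1, 0.1]]  # two examples, three classes
labels = [1, 2]
print(topk_accuracy(scores, labels, k=1))  # 0.5  (only the first is right)
print(topk_accuracy(scores, labels, k=3))  # 1.0  (top-3 always includes it)
```

Top-5 error (1 minus top-5 accuracy) is the headline ILSVRC classification number; detection and segmentation challenges such as COCO instead use IoU-based average precision.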

Startups

  1. Megvii (旷视科技): letting machines understand the world [https://megvii.com/]
  2. CloudWalk (云从科技): face recognition technology tracing back to the father of computer vision [http://www.cloudwalk.cn/]
  3. DeepGlint (格灵深瞳): letting computers understand the world [http://www.deepglint.com/]
  4. Dress+ (北京陌上花科技): an AI computer-vision engine [http://www.dressplus.cn/]
  5. YITU Technology (依图科技): building the future of computer vision with you [http://www.yitutech.com/]
  6. Malong Technologies (码隆科技): the most fashionable AI [https://www.malong.com/]
  7. LinkFace (脸云科技): world-leading face recognition services [https://www.linkface.cn/]
  8. Qfeeltech (速感科技): letting robots perceive the world, and changing the world with robots [http://www.qfeeltech.com/]
  9. TuSimple (图森): leading the commercialization of autonomous driving in China [http://www.tusimple.com/]
  10. SenseTime (商汤科技): teaching computers to understand the world [https://www.sensetime.com/]
  11. Tuputech (图普科技): focused on image recognition [https://us.tuputech.com/?from=gz]
  12. HiScene (亮风台): focused on augmented reality, leading human-computer interaction [https://www.hiscene.com/]
  13. Seetatech (中科视拓): recognizing people, faces, and all things, with open source empowering shared development [http://www.seetatech.com/]
  14. Authenmetric (中科奥森): binocular deep learning, making life and society safer [http://www.authenmetric.com/]
  15. Watrix (银河水滴): world-leading gait recognition technology [http://www.watrix.cc/y1.html]

WeChat Public Accounts

  1. 视觉求索 (thevisionseeker)
  2. 深度学习大讲堂 (deeplearningclass)
  3. VALSE (valse_wechat)

Originally published on the WeChat public account 专知 (Quan_Zhuanzhi).

Original publication date: 2017-11-06
