[CVPR 2022 Open-Source Paper Directory]
- Backbone
- CLIP
- GAN
- NAS
- NeRF
- Visual Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Object Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Few-Shot Segmentation
- Video Understanding
- Image Editing
- Low-level Vision
- Super-Resolution
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Human Pose Estimation
- 3D Semantic Scene Completion
- 3D Reconstruction
- Camouflaged Object Detection
- Depth Estimation
- Stereo Matching
- Lane Detection
- Image Inpainting
- Crowd Counting
- Medical Image
- Scene Graph Generation
- Style Transfer
- Weakly Supervised Object Localization
- Hyperspectral Image Reconstruction
- Watermarking
- Datasets
- New Tasks
- Others
Backbone
A ConvNet for the 2020s
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
MPViT: Multi-Path Vision Transformer for Dense Prediction
CLIP
HairCLIP: Design Your Hair by Text and Reference Image
- Paper: https://arxiv.org/abs/2112.05142
- Code: https://github.com/wty-ustc/HairCLIP
PointCLIP: Point Cloud Understanding by CLIP
- Paper: https://arxiv.org/abs/2112.02413
- Code: https://github.com/ZrrSkywalker/PointCLIP
Blended Diffusion for Text-driven Editing of Natural Images
- Paper: https://arxiv.org/abs/2111.14818
- Code: https://github.com/omriav/blended-diffusion
GAN
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
- Homepage: https://semanticstylegan.github.io/
- Paper: https://arxiv.org/abs/2112.02236
- Demo: https://semanticstylegan.github.io/videos/demo.mp4
Style Transformer for Image Inversion and Editing
- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer
NAS
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
- Paper: https://arxiv.org/abs/2203.01665
- Code: https://github.com/Sunshine-Ye/Beta-DARTS
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
- Paper: https://arxiv.org/abs/2111.15362
- Code: None
NeRF
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
- Homepage: https://jonbarron.info/mipnerf360/
- Paper: https://arxiv.org/abs/2111.12077
- Demo: https://youtu.be/YStDS2-Ln1s
Point-NeRF: Point-based Neural Radiance Fields
- Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
- Paper: https://arxiv.org/abs/2201.08845
- Code: https://github.com/Xharlie/point-nerf
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
- Paper: https://arxiv.org/abs/2111.13679
- Homepage: https://bmild.github.io/rawnerf/
- Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc
Urban Radiance Fields
- Homepage: https://urban-radiance-fields.github.io/
- Paper: https://arxiv.org/abs/2111.14643
- Demo: https://youtu.be/qGlq5DZT6uc
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
- Paper: https://arxiv.org/abs/2202.13162
- Code: https://github.com/HexagonPrime/Pix2NeRF
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
- Homepage: https://grail.cs.washington.edu/projects/humannerf/
- Paper: https://arxiv.org/abs/2201.04127
- Demo: https://youtu.be/GM-RoZEymmw
Visual Transformer
Backbone
MPViT: Multi-Path Vision Transformer for Dense Prediction
- Paper: https://arxiv.org/abs/2112.11010
- Code: https://github.com/youngwanLEE/MPViT
Application
Language-based Video Editing via Multi-Modal Multi-Level Transformer
- Paper: https://arxiv.org/abs/2104.01122
- Code: None
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- Paper: https://arxiv.org/abs/2203.00859
- Code: None
Embracing Single Stride 3D Object Detector with Sparse Transformer
- Paper: https://arxiv.org/abs/2112.06375
- Code: https://github.com/TuSimple/SST
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2203.02891
- Code: https://github.com/xulianuwa/MCTformer
Spatio-temporal Relation Modeling for Few-shot Action Recognition
- Paper: https://arxiv.org/abs/2112.05132
- Code: https://github.com/Anirudh257/strm
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
- Paper: https://arxiv.org/abs/2111.07910
- Code: https://github.com/caiyuanhao1998/MST
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
- Homepage: https://point-bert.ivg-research.xyz/
- Paper: https://arxiv.org/abs/2111.14819
- Code: https://github.com/lulutang0608/Point-BERT
GroupViT: Semantic Segmentation Emerges from Text Supervision
- Homepage: https://jerryxu.net/GroupViT/
- Paper: https://arxiv.org/abs/2202.11094
- Demo: https://youtu.be/DtJsWIUTW-Y
Restormer: Efficient Transformer for High-Resolution Image Restoration
- Paper: https://arxiv.org/abs/2111.09881
- Code: https://github.com/swz30/Restormer
Splicing ViT Features for Semantic Appearance Transfer
- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice
Self-supervised Video Transformer
- Homepage: https://kahnchana.github.io/svt/
- Paper: https://arxiv.org/abs/2112.01514
- Code: https://github.com/kahnchana/svt
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
- Paper: https://arxiv.org/abs/2203.02664
- Code: https://github.com/rulixiang/afa
Accelerating DETR Convergence via Semantic-Aligned Matching
- Paper: https://arxiv.org/abs/2203.06883
- Code: https://github.com/ZhangGongjie/SAM-DETR
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Style Transformer for Image Inversion and Editing
- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
- Paper: https://arxiv.org/abs/2203.10981
- Code: https://github.com/kuanchihhuang/MonoDTR
Mask Transfiner for High-Quality Instance Segmentation
- Paper: https://arxiv.org/abs/2111.13673
- Code: https://github.com/SysCV/transfiner
Vision-Language
Conditional Prompt Learning for Vision-Language Models
- Paper: https://arxiv.org/abs/2203.05557
- Code: https://github.com/KaiyangZhou/CoOp
Self-supervised Learning
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
- Paper: https://arxiv.org/abs/2203.06965
- Code: None
Crafting Better Contrastive Views for Siamese Representation Learning
HCSC: Hierarchical Contrastive Selective Coding
Data Augmentation
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
- Paper: https://arxiv.org/abs/2202.12513
- Code: https://github.com/DensoITLab/TeachAugment
AlignMix: Improving representation by interpolating aligned features
- Paper: https://arxiv.org/abs/2103.15375
- Code: None
Object Detection
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Accelerating DETR Convergence via Semantic-Aligned Matching
- Paper: https://arxiv.org/abs/2203.06883
- Code: https://github.com/ZhangGongjie/SAM-DETR
Localization Distillation for Dense Object Detection
Focal and Global Knowledge Distillation for Detectors
A Dual Weighting Label Assignment Scheme for Object Detection
- Paper: https://arxiv.org/abs/2203.09730
- Code: https://github.com/strongwolf/DW
Visual Tracking
Correlation-Aware Deep Tracking
- Paper: https://arxiv.org/abs/2203.01666
- Code: None
TCTrack: Temporal Contexts for Aerial Tracking
- Paper: https://arxiv.org/abs/2203.01885
- Code: https://github.com/vision4robotics/TCTrack
Semantic Segmentation
Weakly Supervised Semantic Segmentation
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2203.00962
- Code: https://github.com/zhaozhengChen/ReCAM
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2203.02891
- Code: https://github.com/xulianuwa/MCTformer
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
- Paper: https://arxiv.org/abs/2203.02664
- Code: https://github.com/rulixiang/afa
Semi-Supervised Semantic Segmentation
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
Unsupervised Semantic Segmentation
GroupViT: Semantic Segmentation Emerges from Text Supervision
- Homepage: https://jerryxu.net/GroupViT/
- Paper: https://arxiv.org/abs/2202.11094
- Demo: https://youtu.be/DtJsWIUTW-Y
Instance Segmentation
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
- Paper: https://arxiv.org/abs/2203.04074
- Code: https://github.com/zhang-tao-whu/e2ec
Mask Transfiner for High-Quality Instance Segmentation
- Paper: https://arxiv.org/abs/2111.13673
- Code: https://github.com/SysCV/transfiner
Self-Supervised Instance Segmentation
FreeSOLO: Learning to Segment Objects without Annotations
- Paper: https://arxiv.org/abs/2202.12181
- Code: None
Video Instance Segmentation
Efficient Video Instance Segmentation via Tracklet Query and Proposal
- Homepage: https://jialianwu.com/projects/EfficientVIS.html
- Paper: https://arxiv.org/abs/2203.01853
- Demo: https://youtu.be/sSPMzgtMKCE
Few-Shot Segmentation
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
- Paper: https://arxiv.org/abs/2203.07615
- Code: https://github.com/chunbolang/BAM
Video Understanding
Self-supervised Video Transformer
- Homepage: https://kahnchana.github.io/svt/
- Paper: https://arxiv.org/abs/2112.01514
- Code: https://github.com/kahnchana/svt
Action Recognition
Spatio-temporal Relation Modeling for Few-shot Action Recognition
- Paper: https://arxiv.org/abs/2112.05132
- Code: https://github.com/Anirudh257/strm
Action Detection
End-to-End Semi-Supervised Learning for Video Action Detection
- Paper: https://arxiv.org/abs/2203.04251
- Code: None
Image Editing
Style Transformer for Image Inversion and Editing
- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer
Blended Diffusion for Text-driven Editing of Natural Images
- Paper: https://arxiv.org/abs/2111.14818
- Code: https://github.com/omriav/blended-diffusion
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
- Homepage: https://semanticstylegan.github.io/
- Paper: https://arxiv.org/abs/2112.02236
- Demo: https://semanticstylegan.github.io/videos/demo.mp4
Low-level Vision
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
- Paper: https://arxiv.org/abs/2111.15362
- Code: None
Restormer: Efficient Transformer for High-Resolution Image Restoration
- Paper: https://arxiv.org/abs/2111.09881
- Code: https://github.com/swz30/Restormer
Super-Resolution
Image Super-Resolution
Learning the Degradation Distribution for Blind Image Super-Resolution
- Paper: https://arxiv.org/abs/2203.04962
- Code: https://github.com/greatlog/UnpairedSR
Video Super-Resolution
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
3D Point Cloud
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
- Homepage: https://point-bert.ivg-research.xyz/
- Paper: https://arxiv.org/abs/2111.14819
- Code: https://github.com/lulutang0608/Point-BERT
A Unified Query-based Paradigm for Point Cloud Understanding
- Paper: https://arxiv.org/abs/2203.01252
- Code: None
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
- Paper: https://arxiv.org/abs/2203.00680
- Code: https://github.com/MohamedAfham/CrossPoint
PointCLIP: Point Cloud Understanding by CLIP
- Paper: https://arxiv.org/abs/2112.02413
- Code: https://github.com/ZrrSkywalker/PointCLIP
3D Object Detection
Embracing Single Stride 3D Object Detector with Sparse Transformer
- Paper: https://arxiv.org/abs/2112.06375
- Code: https://github.com/TuSimple/SST
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
- Paper: https://arxiv.org/abs/2011.12001
- Code: https://github.com/qq456cvb/CanonicalVoting
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
- Paper: https://arxiv.org/abs/2203.10981
- Code: https://github.com/kuanchihhuang/MonoDTR
3D Semantic Segmentation
Scribble-Supervised LiDAR Semantic Segmentation
- Paper: https://arxiv.org/abs/2203.08537
- Dataset: https://github.com/ouenal/scribblekitti
3D Object Tracking
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
- Paper: https://arxiv.org/abs/2203.01730
- Code: https://github.com/Ghostish/Open3DSOT
3D Human Pose Estimation
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
- Paper: https://arxiv.org/abs/2111.12707
- Code: https://github.com/Vegetebird/MHFormer
- Chinese-language explainer: https://zhuanlan.zhihu.com/p/439459426
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- Paper: https://arxiv.org/abs/2203.00859
- Code: None
3D Semantic Scene Completion
MonoScene: Monocular 3D Semantic Scene Completion
- Paper: https://arxiv.org/abs/2112.00726
- Code: https://github.com/cv-rits/MonoScene
3D Reconstruction
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
Camouflaged Object Detection
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
- Paper: https://arxiv.org/abs/2203.02688
- Code: https://github.com/lartpang/ZoomNet
Depth Estimation
Monocular Depth Estimation
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
- Paper: https://arxiv.org/abs/2203.01502
- Code: None
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
- Paper: https://arxiv.org/abs/2203.00838
- Code: None
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- Paper: https://arxiv.org/abs/2112.02306
- Code: None
Stereo Matching
ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching
- Paper: https://arxiv.org/abs/2203.02146
- Code: https://github.com/gangweiX/ACVNet
Lane Detection
Rethinking Efficient Lane Detection via Curve Modeling
- Paper: https://arxiv.org/abs/2203.02431
- Code: https://github.com/voldemortX/pytorch-auto-drive
- Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4
Image Inpainting
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
- Paper: https://arxiv.org/abs/2203.00867
- Code: https://github.com/DQiaole/ZITS_inpainting
Crowd Counting
Leveraging Self-Supervision for Cross-Domain Crowd Counting
- Paper: https://arxiv.org/abs/2103.16291
- Code: None
Medical Image
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
- Paper: https://arxiv.org/abs/2203.02533
- Code: None
Scene Graph Generation
SGTR: End-to-end Scene Graph Generation with Transformer
- Paper: https://arxiv.org/abs/2112.12970
- Code: None
Style Transfer
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
- Homepage: https://lukashoel.github.io/stylemesh/
- Paper: https://arxiv.org/abs/2112.01530
- Code: https://github.com/lukasHoel/stylemesh
- Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks
Weakly Supervised Object Localization
Weakly Supervised Object Localization as Domain Adaption
- Paper: https://arxiv.org/abs/2203.01714
- Code: https://github.com/zh460045050/DA-WSOL_CVPR2022
Hyperspectral Image Reconstruction
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
- Paper: https://arxiv.org/abs/2111.07910
- Code: https://github.com/caiyuanhao1998/MST
Watermarking
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
- Paper: https://arxiv.org/abs/2104.13450
- Code: None
Datasets
It's About Time: Analog Clock Reading in the Wild
- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- Paper: https://arxiv.org/abs/2112.02306
- Code: None
Kubric: A scalable dataset generator
- Paper: https://arxiv.org/abs/2203.03570
- Code: https://github.com/google-research/kubric
Scribble-Supervised LiDAR Semantic Segmentation
- Paper: https://arxiv.org/abs/2203.08537
- Dataset: https://github.com/ouenal/scribblekitti
New Tasks
Language-based Video Editing via Multi-Modal Multi-Level Transformer
- Paper: https://arxiv.org/abs/2104.01122
- Code: None
It's About Time: Analog Clock Reading in the Wild
- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc
Splicing ViT Features for Semantic Appearance Transfer
- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice
Others
Kubric: A scalable dataset generator
- Paper: https://arxiv.org/abs/2203.03570
- Code: https://github.com/google-research/kubric