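This post introduces BEVFormer, a framework that learns a unified BEV (bird's-eye-view) representation with spatiotemporal Transformers to support multiple autonomous-driving perception tasks. BEVFormer exploits both spatial and temporal information by letting predefined grid-shaped BEV queries interact with the spatial and temporal domains. To aggregate spatial information, the authors design a spatial cross-attention in which each BEV query extracts spatial features from regions of interest across camera views. For temporal information, they propose a temporal self-attention that recurrently fuses historical BEV information. On the nuScenes test set, BEVFormer reaches 56.9% NDS, a new state of the art at the time of publication, 9.0 points above the previous best and on par with LiDAR-based baselines. BEVFormer also markedly improves the accuracy of object velocity estimation and the recall of objects under low-visibility conditions.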
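The paper proposes BEVFormer, a BEV feature generation framework that uses attention to aggregate spatial features from multi-view cameras and temporal features from historical BEV features. The BEVFormer encoder consists of six layers built around three custom designs: BEV queries, spatial cross-attention, and temporal self-attention. The BEV queries are a grid-shaped set of learnable parameters used to query features in BEV space from the multi-camera views. Driven by these queries, spatial cross-attention looks up and aggregates spatial features from the multi-camera images, and temporal self-attention does the same for temporal features from the historical BEV. On top of the BEV features, BEVFormer attaches an end-to-end 3D detection head based on deformable attention and a map segmentation head based on the 2D segmentation method Panoptic SegFormer.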
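The way a grid-shaped BEV query relates to camera views can be illustrated with a small sketch: a grid cell is mapped to a ground position, lifted to a pillar of 3D points, and each point is projected into every camera to see which views it lands in. Everything numeric here is an assumption made for illustration (a 200x200 grid with 0.512 m cells, the pillar heights, the toy pinhole cameras defined only by a yaw angle, and all function names). The real model uses calibrated camera matrices and deformable attention sampling around the projected points.

```python
import math

def bev_cell_to_world(row, col, grid_h=200, grid_w=200, cell_size=0.512):
    """Map a BEV grid index to an ego-centred (x, y) position in metres (assumed grid)."""
    x = (col - grid_w / 2) * cell_size
    y = (row - grid_h / 2) * cell_size
    return x, y

def pillar_points(x, y, heights=(-1.0, 0.0, 1.0, 2.0)):
    """Lift a ground position to a pillar of 3D reference points (hypothetical heights)."""
    return [(x, y, z) for z in heights]

def ego_to_camera(point, yaw_deg, cam_height=1.5):
    """Toy extrinsics: ego frame (x forward, y left, z up) to camera frame
    (x right, y down, z forward) for a camera rotated by yaw_deg about the z axis."""
    x, y, z = point
    yaw = math.radians(yaw_deg)
    forward = math.cos(yaw) * x + math.sin(yaw) * y
    left = -math.sin(yaw) * x + math.cos(yaw) * y
    return (-left, cam_height - z, forward)

def project_pinhole(point, focal=1000.0, cx=800.0, cy=450.0, width=1600, height=900):
    """Project a camera-frame point to pixels; return None if it is not visible."""
    x, y, z = point
    if z <= 0.1:  # behind the camera or too close to the image plane
        return None
    u = focal * x / z + cx
    v = focal * y / z + cy
    if 0 <= u < width and 0 <= v < height:
        return (u, v)
    return None

def hit_views(row, col, cameras):
    """Return, per camera name, the pixel positions where the cell's pillar is visible."""
    x, y = bev_cell_to_world(row, col)
    hits = {}
    for name, yaw_deg in cameras.items():
        pixels = [project_pinhole(ego_to_camera(p, yaw_deg)) for p in pillar_points(x, y)]
        pixels = [p for p in pixels if p is not None]
        if pixels:
            hits[name] = pixels
    return hits

# Usage: a cell about 10 m ahead of the ego vehicle, checked against two toy cameras.
views = hit_views(100, 120, {"front": 0.0, "back": 180.0})
```

Only the cameras that appear in the returned dictionary would contribute features to that query, which is why each query attends to a few views rather than to all images.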
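BEVFormer relies on two attention mechanisms, spatial cross-attention (built on deformable attention) and temporal self-attention, which let it aggregate spatiotemporal features from multi-view cameras and historical BEV features while keeping computational cost under control. With its 3D detection head, based on the Deformable DETR detector, and its map segmentation head, based on Panoptic SegFormer, BEVFormer can serve multiple autonomous-driving perception tasks, including 3D object detection and map segmentation.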
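The recurrent use of historical BEV features can be sketched with a toy model. The code below is not deformable attention and not the authors' implementation: it is a simplified stand-in in which each cell attends over two candidates at the same location (its current feature and the previous BEV feature) using plain dot-product softmax weights. The function names, the per-frame input features, and the omission of ego-motion alignment of the previous BEV are all assumptions of this sketch.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse_cell(query, history):
    """Attend from one BEV cell over {current feature, history feature}."""
    candidates = [query, history]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, c) / scale for c in candidates])
    return [sum(w * c[i] for w, c in zip(weights, candidates))
            for i in range(len(query))]

def temporal_step(queries, prev_bev=None):
    """One time step. Without history (first frame) the queries attend to themselves."""
    if prev_bev is None:
        prev_bev = queries
    return [fuse_cell(q, h) for q, h in zip(queries, prev_bev)]

def run_sequence(frames):
    """Carry the fused BEV from frame to frame, so older frames persist implicitly."""
    bev = None
    for queries in frames:
        bev = temporal_step(queries, bev)
    return bev

# Usage: one BEV cell with 2-dimensional features over two frames.
final_bev = run_sequence([[[1.0, 0.0]], [[0.0, 1.0]]])
```

Because each step consumes only the previous BEV, the cost per frame stays constant no matter how long the history is, which is the appeal of a recurrent design over stacking many past frames.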
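BEVFormer targets multi-camera 3D perception: how to aggregate spatiotemporal features from multi-camera views and historical BEV features for accurate 3D object detection and map segmentation. Earlier methods typically handle 3D object detection or map segmentation independently, whereas BEVFormer uses attention to aggregate these features into a single BEV representation, which improves the accuracy of multi-camera 3D perception.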
The paper reports experimental results for BEVFormer on two public autonomous-driving datasets, nuScenes and Waymo.
Installation follows the mmdetection3d getting-started guide (https://mmdetection3d.readthedocs.io/en/latest/getting_started.html#installation).
a. Create a conda virtual environment and activate it.
conda create -n open-mmlab python=3.8 -y
conda activate open-mmlab
b. Install PyTorch and torchvision following the official instructions.
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
# Recommended torch>=1.9
c. Install gcc>=5 in conda env (optional).
conda install -c omgarcia gcc-6 # gcc-6.2
c. Install mmcv-full.
pip install mmcv-full==1.4.0
# pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
d. Install mmdet and mmseg.
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
e. Install mmdet3d from source code.
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install
f. Install Detectron2 and Timm.
pip install einops fvcore seaborn iopath==0.1.9 timm==0.6.13 typing-extensions==4.5.0 pylint ipython==8.12 numpy==1.19.5 matplotlib==3.5.2 numba==0.48.0 pandas==1.4.4 scikit-image==0.19.3 setuptools==59.5.0
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
g. Clone BEVFormer.
git clone https://github.com/fundamentalvision/BEVFormer.git
h. Prepare pretrained models.
cd BEVFormer
mkdir ckpts
cd ckpts && wget https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pret
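Because several of the packages above are pinned to exact versions, it can help to confirm what actually got installed before moving on. The following is a minimal sketch using the standard-library `importlib.metadata`; the `PINNED` dictionary and the function name `check_versions` are illustrative choices, and the list covers only a few of the pins from the steps above.

```python
from importlib import metadata

# Versions copied from the installation steps; extend as needed.
PINNED = {
    "mmcv-full": "1.4.0",
    "mmdet": "2.14.0",
    "mmsegmentation": "0.14.1",
    "timm": "0.6.13",
}

def check_versions(pinned=PINNED):
    """Compare installed package versions with the pinned ones."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else f"mismatch (found {found})"
    return report

if __name__ == "__main__":
    for name, status in check_versions().items():
        print(f"{name}: {status}")
```

Run it inside the activated conda environment; any `missing` or `mismatch` entry points at the step to repeat.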
Download the nuScenes V1.0 full dataset and the CAN bus expansion data from the official nuScenes download page, then prepare the data as follows.
Download CAN bus expansion
# download 'can_bus.zip'
unzip can_bus.zip
# move can_bus to data dir
Prepare nuScenes data
We generate custom annotation files, which differ from mmdet3d's.
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --version v1.0-mini --canbus ./data
Using the above code will generate nuscenes_infos_temporal_{train,val}.pkl.
Folder structure
bevformer
├── projects/
├── tools/
├── configs/
├── ckpts/
│ ├── r101_dcn_fcos3d_pretrain.pth
├── data/
│ ├── can_bus/
│ ├── nuscenes/
│ │ ├── maps/
│ │ ├── samples/
│ │ ├── sweeps/
│ │ ├── v1.0-test/
│ │ ├── v1.0-trainval/
│ │ ├── nuscenes_infos_temporal_train.pkl
│ │ ├── nuscenes_infos_temporal_val.pkl
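Before launching training, the layout above can be checked programmatically. This is a small sketch, assuming the script is pointed at the repository root; the `EXPECTED` list is taken from the tree above (version folders are left out because they depend on which nuScenes split was downloaded), and `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path

# Relative paths taken from the folder structure above.
EXPECTED = [
    "ckpts/r101_dcn_fcos3d_pretrain.pth",
    "data/can_bus",
    "data/nuscenes/maps",
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes/nuscenes_infos_temporal_train.pkl",
    "data/nuscenes/nuscenes_infos_temporal_val.pkl",
]

def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected files and folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]

if __name__ == "__main__":
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```

An empty result means the checkpoint, the CAN bus data, and the generated annotation files are all where the configs expect them.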
Train BEVFormer with 8 GPUs
./tools/dist_train.sh ./projects/configs/bevformer/bevformer_base.py 8
Eval BEVFormer with 8 GPUs
./tools/dist_test.sh ./projects/configs/bevformer/bevformer_base.py ./path/to/ckpts.pth 8
Note: evaluating with 1 GPU can yield slightly higher performance, because continuous video may be truncated when it is split across multiple GPUs. By default we report the score evaluated with 8 GPUs.
The training script above does not support FP16 training; a separate script is provided to train BEVFormer with FP16.
./tools/fp16/dist_train.sh ./projects/configs/bevformer_fp16/bevformer_tiny_fp16.py 8
For visualization, see visual.py.
Backbone | Method | Lr Schd | NDS | mAP | Memory | Config | Download |
---|---|---|---|---|---|---|---|
R50 | BEVFormer-tiny_fp16 | 24ep | 35.9 | 25.7 | - | config | |
R50 | BEVFormer-tiny | 24ep | 35.4 | 25.2 | 6500M | config | model/log |
R101-DCN | BEVFormer-small | 24ep | 47.9 | 37.0 | 10500M | config | model/log |
R101-DCN | BEVFormer-base | 24ep | 51.7 | 41.6 | 28500M | config | model/log |
R50 | BEVformerV2-t1-base | 24ep | 42.6 | 35.1 | 23952M | config | model/log |
R50 | BEVformerV2-t1-base | 48ep | 43.9 | 35.9 | 23952M | config | model/log |
R50 | BEVformerV2-t1 | 24ep | 45.3 | 38.1 | 37579M | config | model/log |
R50 | BEVformerV2-t1 | 48ep | 46.5 | 39.5 | 37579M | config | model/log |
R50 | BEVformerV2-t2 | 24ep | 51.8 | 42.0 | 38954M | config | model/log |
R50 | BEVformerV2-t2 | 48ep | 52.6 | 43.1 | 38954M | config | model/log |
R50 | BEVformerV2-t8 | 24ep | 55.3 | 46.0 | 40392M | config | model/log |
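The NDS column is the nuScenes Detection Score, which the nuScenes detection benchmark defines as a weighted combination of mAP and five mean true-positive error metrics (translation, scale, orientation, velocity, and attribute errors). The sketch below implements that definition on fractional values; the function name and the example inputs are illustrative and are not taken from the table.

```python
def nds(mean_ap, tp_errors):
    """nuScenes Detection Score from mAP and the five mean TP errors.

    mean_ap   -- mAP as a fraction in [0, 1]
    tp_errors -- [mATE, mASE, mAOE, mAVE, mAAE]; each is clipped at 1
    """
    if len(tp_errors) != 5:
        raise ValueError("expected five true-positive error metrics")
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0

# Usage with made-up values: mAP of 0.5 and all five errors at 0.5.
example_score = nds(0.5, [0.5, 0.5, 0.5, 0.5, 0.5])
```

Since mAP carries half of the weight, two models with the same mAP can still differ in NDS through their velocity or orientation errors, which is where temporal information tends to matter.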
# Error 1
...
from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception
# Fix: downgrade numpy
pip install numpy==1.23.4
# Error 2
ModuleNotFoundError: No module named 'spconv'
# Fix: install the spconv build that matches your CUDA version; the author uses CUDA 11.3, so:
pip install spconv-cu113
# Error 3
ModuleNotFoundError: No module named 'IPython'
# Fix
pip install IPython
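# Error 4
# Case 1: No module named 'projects.mmdet3d_plugin'
# Case 2: ModuleNotFoundError: No module named 'tools'
# Case 3: ModuleNotFoundError: No module named 'tools.data_converter'
# tools and projects.mmdet3d_plugin are both imported as local modules, so a failed
# import means either that PYTHONPATH has not taken effect or that the module path is wrong.
# Fix: update PYTHONPATH by running the following from the repository root, in the terminal of the active Python virtual environment
export PYTHONPATH=$PYTHONPATH:"./"
# If the error persists, check that the path in this command is correct; an absolute path can be used instead.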
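The same effect can be obtained from inside Python, which avoids depending on the shell's working directory. This is a minimal sketch: `add_repo_to_path` is a hypothetical helper, and the path passed to it is whatever absolute location the repository was cloned to. Unlike the `export` line, it puts the repository at the front of the search path.

```python
import sys
from pathlib import Path

def add_repo_to_path(repo_root):
    """Put the repository root on sys.path so `tools` and `projects` can be imported."""
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root

# Usage: call once at the top of a script, before importing from `projects` or `tools`.
resolved_root = add_repo_to_path(".")
```

Resolving to an absolute path means the import keeps working even if the script later changes directory.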
# Error 5
TypeError: FormatCode() got an unexpected keyword argument 'verify'
# Fix: downgrade yapf
pip install yapf==0.40.1
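# Error 6
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
# Cause: the installed mmcv build does not match the CUDA version; downloading the matching wheel from the official wheel index and installing it offline is recommended
# Fix: see the mmcv-full installation step above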
# Error 7
# AttributeError: module 'distutils' has no attribute 'version'
# Fix: pin the setuptools version
pip install setuptools==58.4.0
# Error 8
# Inside Docker, libGL.so.1 is reported as missing
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
# Fix: install ffmpeg
apt-get install ffmpeg -y
# Error 9: pip fails while installing mmcv-full
subprocess.CalledProcessError: Command '['which', 'g++']' returned non-zero exit status 1.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for mmcv-full
# Fix: g++ and gcc are not installed; install build-essential
sudo apt-get install build-essential
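# Error 10: out of GPU memory during training: RuntimeError: CUDA out of memory
# Fix: first set samples_per_gpu to 1 and workers_per_gpu to 0 in the config file to test the environment,
# then increase both values gradually for real training until GPU memory is fully used.
# If memory is still insufficient with 1 and 0, a GPU with more memory is needed.
samples_per_gpu=1, workers_per_gpu=0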
https://zhuanlan.zhihu.com/p/658855697
@article{li2022bevformer,
title={BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers},
author={Li, Zhiqi and Wang, Wenhai and Li, Hongyang and Xie, Enze and Sima, Chonghao and Lu, Tong and Qiao, Yu and Dai, Jifeng},
journal={arXiv preprint arXiv:2203.17270},
year={2022}
}
@article{Yang2022BEVFormerVA,
title={BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision},
author={Chenyu Yang and Yuntao Chen and Haofei Tian and Chenxin Tao and Xizhou Zhu and Zhaoxiang Zhang and Gao Huang and Hongyang Li and Y. Qiao and Lewei Lu and Jie Zhou and Jifeng Dai},
journal={ArXiv},
year={2022},
}
[1] Li Z, Wang W, Li H, et al. BEVFormer: Learning bird's-eye-view representation from multi-camera images via spatiotemporal transformers[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 1-18.