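💡💡💡Exclusive improvements in this article:
1) Advantages of DCNv4: (1) it removes the softmax normalization from spatial aggregation, which enhances its dynamic property and expressive power; (2) it optimizes memory access to minimize redundant operations and speed up computation. Together these changes markedly accelerate convergence and raise processing speed: DCNv4's forward pass is more than three times as fast as DCNv3's.
2) The Dual Attention Block (DAB) connects two modules in series: channel-spatial attention and parallel attention. Its authors propose a new parallel attention architecture that connects three different attention mechanisms in parallel (global channel attention, local channel attention, and spatial attention).
💡💡💡How they are combined with YOLO11: 1) DCNv4 integrated into the YOLO11 Detect head; 2) DCNv4 combined with SPPF; 3) C3k2 combined with the Dual Attention Block (DAB).
💡💡💡Accuracy gain: GC10-DET defect detection, DCNv4 integrated into the Detect head: mAP50 rises from 0.633 (baseline) to 0.646
💡💡💡Accuracy gain: GC10-DET defect detection, DCNv4 combined with SPPF: mAP50 rises from 0.633 to 0.647
💡💡💡Accuracy gain: DCNv4_SPPF + DCNv4 in the Detect head: mAP50 rises from 0.633 to 0.651
💡💡💡Accuracy gain: DCNv4_SPPF + DCNv4 in the Detect head + C3k2 with DAB: mAP50 rises from 0.633 to 0.660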
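The following short script expresses each reported mAP50 as an absolute and a relative gain over the 0.633 baseline. It only does arithmetic on the values quoted above. No run-to-run variance is reported, so treat differences of one or two thousandths (0.646 vs 0.647) with caution.

```python
# mAP50 values as reported in this article (GC10-DET validation set).
BASELINE = 0.633
VARIANTS = {
    "Detect_DCNv4": 0.646,
    "DCNv4_SPPF": 0.647,
    "DCNv4_SPPF + Detect_DCNv4": 0.651,
    "DCNv4_SPPF + Detect_DCNv4 + C3k2_DAB": 0.660,
}


def gains(baseline, value):
    """Return (absolute gain in mAP50, relative gain in percent)."""
    absolute = value - baseline
    return absolute, absolute / baseline * 100.0


if __name__ == "__main__":
    for name, value in VARIANTS.items():
        absolute, relative = gains(BASELINE, value)
        print(f"{name:38s} {value:.3f}  +{absolute:.3f}  ({relative:+.1f}%)")
```

The full combination adds 0.027 mAP50, which is roughly a 4.3% relative improvement.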
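The improved architecture diagram is shown below:
Ultralytics YOLO11 is a cutting-edge, state-of-the-art model that builds on the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLO11 is designed to be fast, accurate, and easy to use, which makes it an excellent choice for a wide range of object detection and tracking, instance segmentation, image classification, and pose estimation tasks.
Architecture diagram:
C3k2 architecture diagram:
C3k2 inherits from C2f; its c3k argument (True or False) determines whether the inner blocks are C3k or Bottleneck.
Implementation: ultralytics/nn/modules/block.py
Borrowing the PSA structure from YOLOv10, both C2PSA and C2fPSA were implemented, and the C2-based C2PSA was finally chosen (possibly because it gave a larger accuracy gain?).
Implementation: ultralytics/nn/modules/block.py
The classification branch of the detection head now uses DWConv (lighter weight, and a natural entry point for further modifications). Architecture diagram (differences from YOLOv8):
Implementation: ultralytics/nn/modules/head.py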
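A rough weight count illustrates why DWConv makes the classification branch lighter. The sketch assumes the branch layouts used by Ultralytics: YOLOv8 uses two standard 3x3 Conv layers, and YOLO11 uses two pairs of DWConv 3x3 followed by Conv 1x1. It ignores BatchNorm and biases, and it omits the final 1x1 class-prediction conv because that layer is identical in both. The 64-channel width is an example value chosen for illustration.

```python
def conv_weights(c_in, c_out, k, groups=1):
    """Weights of a k x k Conv2d; bias and BatchNorm parameters are ignored."""
    return (c_in // groups) * c_out * k * k


def v8_cls_branch(c_in, c_mid):
    """YOLOv8-style branch: Conv(c_in, c_mid, 3) -> Conv(c_mid, c_mid, 3)."""
    return conv_weights(c_in, c_mid, 3) + conv_weights(c_mid, c_mid, 3)


def yolo11_cls_branch(c_in, c_mid):
    """YOLO11-style branch: [DWConv 3x3 -> Conv 1x1] twice; depthwise uses groups = channels."""
    return (
        conv_weights(c_in, c_in, 3, groups=c_in)
        + conv_weights(c_in, c_mid, 1)
        + conv_weights(c_mid, c_mid, 3, groups=c_mid)
        + conv_weights(c_mid, c_mid, 1)
    )


if __name__ == "__main__":
    c = 64  # example channel width (assumption for illustration)
    v8, v11 = v8_cls_branch(c, c), yolo11_cls_branch(c, c)
    print(f"YOLOv8 branch: {v8} weights, YOLO11 branch: {v11} weights, ratio {v8 / v11:.1f}x")
```

At this width the depthwise-separable version needs about one eighth of the convolution weights.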
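Dataset size: 1833 training images and 459 validation images.
Label visualization: the defects vary widely in size, and the defect classes are imbalanced.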
# Ultralytics YOLO 🚀, AGPL-3.0 license
# GC10-DET steel surface defect dataset (10 defect classes)
# Example usage: yolo train data=data/GC10.yaml
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: D:/YOLOv11/data/GC10-DET # dataset root dir
train: images/train # train images (relative to 'path'), 1833 images
val: images/val # val images (relative to 'path'), 459 images
#test: # optional test images (not used here)
# Classes
names:
  0: chongkong # punching hole
  1: hanfeng # weld line
  2: yueyawan # crescent gap
  3: shuiban # water spot
  4: youban # oil spot
  5: siban # silk spot
  6: yiwu # inclusion
  7: yahen # rolled pit
  8: zhehen # crease
  9: yaozhe # waist folding
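The dataset YAML declares 10 classes, so every label file must use class ids 0-9. Below is a hypothetical pre-training check. The helper names are illustrative and are not part of Ultralytics. The check assumes the standard YOLO txt label format: one object per line as `class x_center y_center width height`, with coordinates normalized to [0, 1]. It also assumes the usual layout, in which labels live in a `labels/` folder parallel to `images/`.

```python
from collections import Counter
from pathlib import Path

NUM_CLASSES = 10  # number of entries under `names` in the dataset YAML


def check_label_lines(lines, num_classes=NUM_CLASSES):
    """Check YOLO-format label lines ("cls xc yc w h", coordinates in [0, 1]).

    Returns a Counter of class ids and a list of human-readable problems.
    """
    counts, problems = Counter(), []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            problems.append(f"line {lineno}: expected 5 fields, got {len(fields)}")
            continue
        try:
            cls = int(fields[0])
            coords = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append(f"line {lineno}: non-numeric field")
            continue
        if not 0 <= cls < num_classes:
            problems.append(f"line {lineno}: class id {cls} outside 0..{num_classes - 1}")
        if any(not 0.0 <= v <= 1.0 for v in coords):
            problems.append(f"line {lineno}: coordinates not normalized to [0, 1]")
        counts[cls] += 1
    return counts, problems


def scan_label_dir(label_dir):
    """Aggregate class counts and problems over every *.txt file in `label_dir`."""
    total, report = Counter(), {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        counts, problems = check_label_lines(path.read_text().splitlines())
        total.update(counts)
        if problems:
            report[path.name] = problems
    return total, report


# Example (the labels path is an assumption derived from `path` and `train` above):
# counts, report = scan_label_dir("D:/YOLOv11/data/GC10-DET/labels/train")
```

The per-class counter also shows the training-set class distribution. This offers a quick way to confirm the imbalance seen in the label visualization.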
import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('ultralytics/cfg/models/11/yolo11.yaml')
    # model.load('yolov8n.pt')  # loading pretrain weights
    model.train(data='data/GC10.yaml',
                cache=False,
                imgsz=640,
                epochs=200,
                batch=8,
                close_mosaic=10,
                device='0',
                optimizer='SGD',  # using SGD
                project='runs/train',
                name='exp',
                )
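The arguments above determine the training schedule. A back-of-the-envelope check, using 1833 training images and `batch=8`, gives about 230 iterations per epoch. The sketch rests on three assumptions that reflect our reading of Ultralytics' defaults, not anything stated in this article:

- The last partial batch is kept.
- `close_mosaic=10` turns off mosaic augmentation for the last 10 epochs.
- Gradients are accumulated up to a nominal batch size `nbs=64`.

```python
import math


def training_schedule(num_images, batch, epochs, close_mosaic, nbs=64):
    """Rough schedule estimate for an Ultralytics-style training run.

    Assumptions: the last partial batch is kept, mosaic is switched off for the
    final `close_mosaic` epochs, and gradients are accumulated until the nominal
    batch size `nbs` is reached (accumulate = max(round(nbs / batch), 1)).
    """
    iters_per_epoch = math.ceil(num_images / batch)
    total_iters = iters_per_epoch * epochs
    first_epoch_without_mosaic = epochs - close_mosaic + 1  # 1-based epoch number
    accumulate = max(round(nbs / batch), 1)
    approx_optimizer_steps = total_iters // accumulate
    return {
        "iters_per_epoch": iters_per_epoch,
        "total_iters": total_iters,
        "first_epoch_without_mosaic": first_epoch_without_mosaic,
        "accumulate": accumulate,
        "approx_optimizer_steps": approx_optimizer_steps,
    }


if __name__ == "__main__":
    # Values from the training script above and the 1833-image training split.
    for key, value in training_schedule(1833, batch=8, epochs=200, close_mosaic=10).items():
        print(f"{key}: {value}")
```

Under these assumptions, a small batch of 8 still yields an effective batch of about 64 per optimizer step.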
Baseline mAP50: 0.633
YOLO11 summary (fused): 281 layers, 2,827,814 parameters, 0 gradients, 6.3 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 15/15 [00:21<00:00, 1.44s/it]
all 459 747 0.67 0.615 0.633 0.327
chongkong 53 54 0.78 0.944 0.934 0.639
hanfeng 7 10 0.395 0.8 0.48 0.178
yueyawan 138 171 0.656 0.524 0.556 0.231
shuiban 71 91 0.799 0.648 0.71 0.385
youban 105 106 0.826 0.764 0.817 0.449
siban 45 92 0.522 0.154 0.26 0.0919
yiwu 43 104 0.678 0.471 0.553 0.258
yahen 10 21 0.176 0.0476 0.119 0.0251
zhehen 71 71 0.941 0.905 0.948 0.545
yaozhe 27 27 0.926 0.889 0.949 0.467
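The validation table is plain text, so it is easy to post-process. The sketch below parses the baseline rows, taking the column order from the header line above. For each class it prints the share of the 747 validation instances and adds F1 = 2PR/(P+R) as a single precision/recall summary. The output makes the class imbalance noted earlier concrete: yueyawan has 171 instances, while hanfeng has only 10.

```python
BASELINE_LOG = """\
all 459 747 0.67 0.615 0.633 0.327
chongkong 53 54 0.78 0.944 0.934 0.639
hanfeng 7 10 0.395 0.8 0.48 0.178
yueyawan 138 171 0.656 0.524 0.556 0.231
shuiban 71 91 0.799 0.648 0.71 0.385
youban 105 106 0.826 0.764 0.817 0.449
siban 45 92 0.522 0.154 0.26 0.0919
yiwu 43 104 0.678 0.471 0.553 0.258
yahen 10 21 0.176 0.0476 0.119 0.0251
zhehen 71 71 0.941 0.905 0.948 0.545
yaozhe 27 27 0.926 0.889 0.949 0.467
"""

FIELDS = ("images", "instances", "P", "R", "mAP50", "mAP50-95")


def parse_rows(text):
    """Parse rows of 'Class Images Instances P R mAP50 mAP50-95' into a dict."""
    rows = {}
    for line in text.strip().splitlines():
        name, *values = line.split()
        rows[name] = {
            field: (int(v) if field in ("images", "instances") else float(v))
            for field, v in zip(FIELDS, values)
        }
    return rows


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


if __name__ == "__main__":
    rows = parse_rows(BASELINE_LOG)
    total = rows["all"]["instances"]
    for name, row in sorted(rows.items(), key=lambda kv: -kv[1]["instances"]):
        if name == "all":
            continue
        share = 100.0 * row["instances"] / total
        print(f"{name:10s} {row['instances']:4d} ({share:4.1f}%)  "
              f"F1={f1(row['P'], row['R']):.3f}  mAP50={row['mAP50']:.3f}")
```

The same parser works unchanged on the logs of the improved models below.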
Prediction results:
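Paper: https://arxiv.org/pdf/2401.06197.pdf
Abstract: We introduce Deformable Convolution v4 (DCNv4), an efficient operator designed for a broad range of vision applications. DCNv4 addresses the limitations of its predecessor, DCNv3, through two key enhancements. First, it removes softmax normalization from spatial aggregation to enhance its dynamic property and expressive power. Second, it optimizes memory access to minimize redundant operations and improve speed. Compared with DCNv3, these improvements significantly accelerate convergence and substantially raise processing speed, with DCNv4 running the forward pass more than three times as fast as DCNv3. DCNv4 performs strongly across a variety of tasks, including image classification, instance and semantic segmentation, and especially image generation. When integrated into generative models such as the U-Net in latent diffusion models, DCNv4 outperforms its baseline, underscoring its potential to enhance generative models. In practice, replacing DCNv3 with DCNv4 in the InternImage model to create FlashInternImage yields an 80% speedup and further performance gains without any other modifications. The advances of DCNv4 in speed and efficiency, together with its strong performance across diverse vision tasks, show its potential as a foundational building block for future vision models.
Figure 1. (a) Relative runtime, with DCNv3 as the baseline. DCNv4 is markedly faster than DCNv3 and also outperforms other common vision operators. (b) Under the same network architecture, DCNv4 converges faster than other vision operators, whereas DCNv3 lags behind them in the early stage of training.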
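The softmax change is easiest to see on a toy example. The plain-Python sketch below uses a single channel, a single group and a 3x3 grid. It is not the DCNv4 CUDA kernel. It computes the deformable aggregation y(p0) = Σ_k m_k · x(p0 + p_k + Δp_k) with bilinear sampling.

- **DCNv3-style softmax:** the modulation weights m_k form a convex combination. On a constant input, with all samples inside the map, the output therefore always equals that constant.
- **DCNv4-style unnormalized weights:** the response is unconstrained. This is the extra expressiveness the paper attributes to removing softmax.

```python
import math

# Regular 3x3 sampling grid p_k around the output location p0.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(feat, y, x):
    """Bilinearly sample a 2D list `feat` at fractional (y, x); zero outside the map."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                value += (1.0 - abs(y - yy)) * (1.0 - abs(x - xx)) * feat[yy][xx]
    return value


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]


def deformable_aggregate(feat, p0, offsets, modulation, use_softmax):
    """y(p0) = sum_k m_k * x(p0 + p_k + dp_k).

    use_softmax=True  -> DCNv3-style: m_k = softmax(modulation), weights sum to 1
    use_softmax=False -> DCNv4-style: m_k = modulation as-is, unbounded
    """
    weights = softmax(modulation) if use_softmax else list(modulation)
    y0, x0 = p0
    return sum(
        m * bilinear(feat, y0 + dy + oy, x0 + dx + ox)
        for (dy, dx), (oy, ox), m in zip(GRID, offsets, weights)
    )


if __name__ == "__main__":
    ones = [[1.0] * 5 for _ in range(5)]  # constant input feature map
    offsets = [(0.0, 0.0)] * 9            # no deformation
    mod = [0.5] * 9                       # raw modulation scalars
    print("DCNv3-style:", deformable_aggregate(ones, (2, 2), offsets, mod, True))
    print("DCNv4-style:", deformable_aggregate(ones, (2, 2), offsets, mod, False))
```

The memory-access optimizations, the paper's second change, concern the GPU kernel. A sketch at this level cannot show them.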
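💡💡💡How it is combined with YOLO11: 1) DCNv4 integrated into the YOLO11 Detect head
The improved architecture diagram is shown below:
DAB consists of two modules connected in series: a channel-spatial attention module (CSAM) and a parallel attention module (PAM), as shown in Figure 2. In principle, this structural design reduces model complexity. Existing deep-learning-based models usually need a large number of parameters to capture haze components accurately, which increases model complexity. In contrast, CSAM and PAM are built on attention mechanisms and can detect haze components efficiently with relatively few parameters. Moreover, connecting CSAM and PAM in series lets the two modules complement each other, which makes DAB more effective. By chaining CSAM and PAM in this way, DAB detects haze components more precisely while keeping model complexity low.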
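This is a parameter-free toy in plain Python that operates on a C x H x W nested list. Its purpose is to make the series and parallel wiring concrete. It is not the DAB paper's implementation, and it is not the author's C3k2_DAB code. The toy makes the following substitutions and assumptions:

- Learned layers are replaced by fixed operations: sigmoid of pooled statistics, and a fixed 3-tap kernel across channels.
- The internal order of CSAM (channel attention, then spatial attention) is an assumption.
- The averaging used to fuse the three PAM branches is an assumption.

Only the overall structure follows the description above: CSAM, followed by a PAM whose global-channel, local-channel and spatial branches run in parallel.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def shape(x):
    return len(x), len(x[0]), len(x[0][0])


def channel_descriptors(x):
    """Global average pooling: one scalar per channel of x[C][H][W]."""
    _, h, w = shape(x)
    return [sum(sum(row) for row in ch) / (h * w) for ch in x]


def scale_channels(x, weights):
    return [[[v * weights[c] for v in row] for row in ch] for c, ch in enumerate(x)]


def global_channel_att(x):
    """Each channel reweighted by the sigmoid of its own global average."""
    return scale_channels(x, [sigmoid(d) for d in channel_descriptors(x)])


def local_channel_att(x, kernel=(0.25, 0.5, 0.25)):
    """ECA-like: fixed 1D conv over neighbouring channel descriptors (zero padding)."""
    d = channel_descriptors(x)
    c_total = len(d)
    mixed = [
        sum(kernel[k] * d[c + k - 1] for k in range(3) if 0 <= c + k - 1 < c_total)
        for c in range(c_total)
    ]
    return scale_channels(x, [sigmoid(m) for m in mixed])


def spatial_att(x):
    """Each pixel reweighted by the sigmoid of its mean over channels."""
    c, h, w = shape(x)
    s = [[sigmoid(sum(x[k][i][j] for k in range(c)) / c) for j in range(w)] for i in range(h)]
    return [[[v * s[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)] for ch in x]


def csam(x):
    """Channel-spatial attention: channel attention, then spatial attention (assumed order)."""
    return spatial_att(global_channel_att(x))


def pam(x):
    """Parallel attention: three branches on the same input, fused by averaging (assumed fusion)."""
    branches = (global_channel_att(x), local_channel_att(x), spatial_att(x))
    c, h, w = shape(x)
    return [[[sum(b[k][i][j] for b in branches) / 3.0 for j in range(w)] for i in range(h)] for k in range(c)]


def dab(x):
    """Dual attention block sketch: CSAM and PAM connected in series."""
    return pam(csam(x))


if __name__ == "__main__":
    x = [[[float(k + i + j) for j in range(4)] for i in range(3)] for k in range(2)]
    print(shape(dab(x)))
```

Every stage only rescales its input, so the block preserves the C x H x W shape. This is what allows such a block to be dropped into C3k2 without changing the surrounding layers.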
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
# YOLO11n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 2, C3k2, [256, False, 0.25]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 2, C3k2, [512, False, 0.25]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 2, C3k2, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 2, C3k2, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 9
- [-1, 2, C2PSA, [1024]] # 10
# YOLO11n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 2, C3k2, [512, False]] # 13
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 13], 1, Concat, [1]] # cat head P4
- [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 10], 1, Concat, [1]] # cat head P5
- [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
- [[16, 19, 22], 1, Detect_DCNv4, [nc, 128, 1]] # Detect(P3, P4, P5)
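The rows in this YAML list unscaled repeats and channel counts, and the `scales` entry selected at build time shrinks them. The sketch below mirrors, as we understand Ultralytics' `parse_model`, the two rules that matter here:

- Repeats greater than 1 become `max(round(n * depth), 1)`.
- Output channels become `make_divisible(min(c2, max_channels) * width, 8)`.

For the n model this gives 64, 128 and 256 channels at P3, P4 and P5 (layers 16, 19 and 22). The sketch covers only the backbone modules listed. It is an approximation, not a replacement for the library code.

```python
import math

# [depth, width, max_channels] copied from the `scales` block above.
SCALES = {
    "n": (0.50, 0.25, 1024),
    "s": (0.50, 0.50, 1024),
    "m": (0.50, 1.00, 512),
    "l": (1.00, 1.00, 512),
    "x": (1.00, 1.50, 512),
}

# (module, repeats, unscaled output channels) for the backbone rows above.
BACKBONE = [
    ("Conv", 1, 64), ("Conv", 1, 128), ("C3k2", 2, 256), ("Conv", 1, 256),
    ("C3k2", 2, 512), ("Conv", 1, 512), ("C3k2", 2, 512), ("Conv", 1, 1024),
    ("C3k2", 2, 1024), ("SPPF", 1, 1024), ("C2PSA", 2, 1024),
]


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of `divisor`."""
    return math.ceil(x / divisor) * divisor


def scale_row(repeats, c2, scale):
    """Apply the depth / width / max_channels rules to one YAML row."""
    depth, width, max_channels = SCALES[scale]
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    return n, make_divisible(min(c2, max_channels) * width)


if __name__ == "__main__":
    for scale in ("n", "x"):
        print(scale, [(name, *scale_row(r, c, scale)) for name, r, c in BACKBONE])
```

The `max_channels` cap explains why the m, l and x models stop at 512 (or 768 for x) channels even where the YAML says 1024.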
Experimental results (Detect_DCNv4):
mAP50 rises from 0.633 (baseline) to 0.646
YOLO11-Detect_DCNv4 summary (fused): 277 layers, 2,684,199 parameters, 0 gradients, 7.8 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 29/29 [00:20<00:00, 1.44it/s]
all 459 747 0.674 0.619 0.646 0.313
chongkong 53 54 0.773 0.963 0.911 0.612
hanfeng 7 10 0.401 0.6 0.539 0.178
yueyawan 138 171 0.681 0.525 0.587 0.237
shuiban 71 91 0.853 0.681 0.727 0.396
youban 105 106 0.673 0.906 0.832 0.31
siban 45 92 0.533 0.239 0.305 0.0942
yiwu 43 104 0.7 0.452 0.568 0.264
yahen 10 21 0.317 0.0952 0.145 0.0379
zhehen 71 71 0.981 0.915 0.951 0.545
yaozhe 27 27 0.831 0.815 0.892 0.459
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
# YOLO11n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 2, C3k2, [256, False, 0.25]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 2, C3k2, [512, False, 0.25]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 2, C3k2, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 2, C3k2, [1024, True]]
- [-1, 1, DCNv4_SPPF, [1024, 5]] # 9
- [-1, 2, C2PSA, [1024]] # 10
# YOLO11n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 2, C3k2, [512, False]] # 13
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 13], 1, Concat, [1]] # cat head P4
- [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 10], 1, Concat, [1]] # cat head P5
- [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
- [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
Experimental results (DCNv4_SPPF):
mAP50 rises from 0.633 (baseline) to 0.647
YOLO11-DCNv4_SPPF summary (fused): 292 layers, 4,682,054 parameters, 0 gradients, 7.8 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 15/15 [00:21<00:00, 1.42s/it]
all 459 747 0.685 0.615 0.647 0.338
chongkong 53 54 0.796 0.963 0.918 0.637
hanfeng 7 10 0.371 0.7 0.538 0.233
yueyawan 138 171 0.642 0.538 0.602 0.259
shuiban 71 91 0.825 0.67 0.723 0.407
youban 105 106 0.824 0.764 0.829 0.453
siban 45 92 0.477 0.207 0.269 0.0995
yiwu 43 104 0.616 0.423 0.51 0.251
yahen 10 21 0.542 0.115 0.23 0.0718
zhehen 71 71 0.902 0.915 0.948 0.528
yaozhe 27 27 0.852 0.852 0.903 0.437
Experimental results (Detect_DCNv4 + DCNv4_SPPF):
mAP50 rises from 0.633 (baseline) to 0.651
YOLO11-Detect_DCNv4-DCNv4_SPPF summary (fused): 288 layers, 4,538,439 parameters, 0 gradients, 9.3 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 15/15 [00:30<00:00, 2.04s/it]
all 459 747 0.725 0.618 0.651 0.322
chongkong 53 54 0.753 0.963 0.911 0.629
hanfeng 7 10 0.479 0.644 0.537 0.184
yueyawan 138 171 0.629 0.491 0.538 0.232
shuiban 71 91 0.83 0.644 0.722 0.376
youban 105 106 0.68 0.906 0.855 0.346
siban 45 92 0.565 0.207 0.309 0.112
yiwu 43 104 0.702 0.5 0.576 0.269
yahen 10 21 0.689 0.0952 0.172 0.0598
zhehen 71 71 0.97 0.924 0.959 0.549
yaozhe 27 27 0.956 0.809 0.926 0.463
The improved architecture diagram is shown below:
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
# YOLO11n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 2, C3k2_DAB, [256, False, 0.25]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 2, C3k2_DAB, [512, False, 0.25]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 2, C3k2_DAB, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 2, C3k2_DAB, [1024, True]]
- [-1, 1, DCNv4_SPPF, [1024, 5]] # 9
- [-1, 2, C2PSA, [1024]] # 10
# YOLO11n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 2, C3k2_DAB, [512, False]] # 13
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 2, C3k2_DAB, [256, False]] # 16 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 13], 1, Concat, [1]] # cat head P4
- [-1, 2, C3k2_DAB, [512, False]] # 19 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 10], 1, Concat, [1]] # cat head P5
- [-1, 2, C3k2_DAB, [1024, True]] # 22 (P5/32-large)
- [[16, 19, 22], 1, Detect_DCNv4, [nc, 128, 1]] # Detect(P3, P4, P5)
Experimental results (Detect_DCNv4 + DCNv4_SPPF + C3k2_DAB):
mAP50 rises from 0.633 (baseline) to 0.660
YOLO11-Detect_DCNv4-DCNv4_SPPF-C3k2_DAB summary: 706 layers, 4,976,274 parameters, 0 gradients, 10.8 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 15/15 [00:24<00:00, 1.62s/it]
all 459 747 0.706 0.623 0.66 0.323
chongkong 53 54 0.776 0.963 0.928 0.633
hanfeng 7 10 0.483 0.748 0.636 0.206
yueyawan 138 171 0.734 0.474 0.586 0.252
shuiban 71 91 0.773 0.692 0.728 0.389
youban 105 106 0.746 0.934 0.893 0.331
siban 45 92 0.628 0.202 0.301 0.116
yiwu 43 104 0.802 0.452 0.57 0.27
yahen 10 21 0.271 0.0952 0.119 0.0475
zhehen 71 71 0.846 0.915 0.949 0.529
yaozhe 27 27 1 0.755 0.886 0.454
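A per-class breakdown of the final result shows where the +0.027 mAP50 comes from. The script below uses only numbers from the baseline and final logs. No repeated runs are reported, so read the following observations cautiously:

- The overall mAP50 matches the unweighted mean of the ten per-class values. As a result, hanfeng alone (10 validation instances, +0.156) contributes about 0.016 of the overall gain, while yaozhe drops by 0.063.
- The gain costs roughly 76% more parameters and 71% more GFLOPs.
- The baseline summary was printed for the fused model and the final one was not, so the parameter comparison is approximate.

```python
# Per-class mAP50 copied from the baseline and the final
# (Detect_DCNv4 + DCNv4_SPPF + C3k2_DAB) validation logs.
BASELINE = {
    "chongkong": 0.934, "hanfeng": 0.48, "yueyawan": 0.556, "shuiban": 0.71, "youban": 0.817,
    "siban": 0.26, "yiwu": 0.553, "yahen": 0.119, "zhehen": 0.948, "yaozhe": 0.949,
}
FINAL = {
    "chongkong": 0.928, "hanfeng": 0.636, "yueyawan": 0.586, "shuiban": 0.728, "youban": 0.893,
    "siban": 0.301, "yiwu": 0.57, "yahen": 0.119, "zhehen": 0.949, "yaozhe": 0.886,
}
# (parameters, GFLOPs) from the model summaries; baseline is fused, final is not.
COST = {"baseline": (2_827_814, 6.3), "final": (4_976_274, 10.8)}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def per_class_delta(before, after):
    """mAP50 change per class, rounded to the log's 3-decimal precision."""
    return {name: round(after[name] - before[name], 3) for name in before}


if __name__ == "__main__":
    deltas = per_class_delta(BASELINE, FINAL)
    for name, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:10s} {BASELINE[name]:.3f} -> {FINAL[name]:.3f} ({d:+.3f})")
    # The 'all' row in each log matches the unweighted mean over the 10 classes.
    print(f"mean mAP50: {mean(BASELINE.values()):.3f} -> {mean(FINAL.values()):.3f}")
    (p0, g0), (p1, g1) = COST["baseline"], COST["final"]
    print(f"params +{(p1 / p0 - 1) * 100:.0f}%, GFLOPs +{(g1 / g0 - 1) * 100:.0f}%")
```

hanfeng and yahen have only 10 and 21 validation instances. Their per-class AP can therefore swing noticeably between runs, which is another reason to repeat the experiments before drawing firm conclusions.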
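Likes and comments are welcome; send a private message to get the source code!
Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization; reproduction without permission is prohibited.
In case of infringement, please contact cloudcommunity@tencent.com for removal.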