
YOLO11 with Multiple Combined Improvements: GC10-DET Defect Detection | DCNv4 with SPPF + DCNv4 with the YOLO11 Detect Head (11Detect) + Dual Attention Block (DAB)

AI小怪兽
Last modified: 2025-01-16 09:02:40
Published in the column: 毕业设计YOLO大作战

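💡💡💡Improvements introduced in this article:

1) Advantages of DCNv4: (1) it removes the softmax normalization in spatial aggregation to enhance its dynamic property and expressive power; (2) it optimizes memory access to minimize redundant operations and speed up computation. These changes significantly accelerate convergence and substantially increase processing speed, with DCNv4 reaching more than three times the forward speed of DCNv3.

2) The dual attention block (DAB) connects two modules in series: channel-spatial attention and parallel attention. It introduces a new parallel attention architecture that connects three different attention mechanisms in parallel (global channel attention, local channel attention, and spatial attention).

💡💡💡How these are combined with YOLO11: 1) DCNv4 combined with 11Detect; 2) DCNv4 combined with SPPF; 3) C3k2 combined with the dual attention block (DAB).

💡💡💡Result: on GC10-DET defect detection, combining DCNv4 with 11Detect raises mAP50 from the baseline 0.633 to 0.646.

💡💡💡Result: on GC10-DET defect detection, combining DCNv4 with SPPF raises mAP50 from the baseline 0.633 to 0.647.

💡💡💡Result: DCNv4 with SPPF plus DCNv4 with 11Detect raises mAP50 from the baseline 0.633 to 0.651.

💡💡💡Result: DCNv4 with SPPF plus DCNv4 with 11Detect plus the dual attention block (DAB) raises mAP50 from the baseline 0.633 to 0.660.

[Figure: architecture of the improved network]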

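1. Introduction to YOLO11

Ultralytics YOLO11 is a state-of-the-art model that builds on the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLO11 is designed to be fast, accurate, and easy to use, which makes it a strong choice for object detection and tracking, instance segmentation, image classification, and pose estimation.

[Figure: YOLO11 network structure]

1.1 C3k2

[Figure: C3k2 structure]

C3k2 inherits from the class C2f. Its c3k argument, set to False or True, decides whether each inner block is a Bottleneck (False) or a C3k (True).

Implementation: ultralytics/nn/modules/block.py

1.2 Introduction to C2PSA

Borrowing the PSA structure from YOLOv10, both C2PSA and C2fPSA were implemented; the C2-based C2PSA was the one finally adopted (possibly because it gives better accuracy).

Implementation: ultralytics/nn/modules/block.py

1.3 Introduction to 11 Detect

The classification branch of the detection head introduces DWConv, which makes the head more lightweight and leaves room for further modification.

[Figure: YOLO11 Detect head structure compared with YOLOv8]

Implementation: ultralytics/nn/modules/head.py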

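2. Training on the GC10-DET dataset

2.1 Dataset overview

Dataset size: 1833 training images and 459 validation images.

Label visualization: defects vary widely in size, and the defect classes are imbalanced.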
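
The imbalance can be put into numbers with a few lines of Python. The sketch below uses the per-class instance counts of the 459-image validation split, copied from the baseline validation log in section 2.4. It assumes, without having checked, that the training split follows a similar distribution.

```python
# Per-class instance counts in the validation split (459 images),
# copied from the baseline validation log in section 2.4.
val_instances = {
    "chongkong": 54, "hanfeng": 10, "yueyawan": 171, "shuiban": 91,
    "youban": 106, "siban": 92, "yiwu": 104, "yahen": 21,
    "zhehen": 71, "yaozhe": 27,
}

total = sum(val_instances.values())
rarest = min(val_instances, key=val_instances.get)
most_common = max(val_instances, key=val_instances.get)
imbalance_ratio = val_instances[most_common] / val_instances[rarest]

# Share of images assigned to the training split (1833 train, 459 val).
train_share = 1833 / (1833 + 459)

for name, count in sorted(val_instances.items(), key=lambda kv: kv[1]):
    print(f"{name:>10}: {count:4d} ({count / total:.1%})")
print(f"most common / rarest = {most_common} / {rarest} = {imbalance_ratio:.1f}x")
print(f"train share = {train_share:.1%}")
```

The rarest classes have only 10 to 21 validation instances, so their per-class metrics can swing noticeably between runs.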

2.2 GC10.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# GC10-DET defect detection dataset config (adapted from the Ultralytics coco.yaml template)
# Example usage: yolo train data=GC10.yaml

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: D:/YOLOv11/data/GC10-DET # dataset root dir
train: images/train # train images (relative to 'path') 1833 images
val: images/val # val images (relative to 'path') 459 images

# Classes
names:
  0: chongkong
  1: hanfeng
  2: yueyawan
  3: shuiban
  4: youban
  5: siban
  6: yiwu
  7: yahen
  8: zhehen
  9: yaozhe
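
Ultralytics detection datasets pair each image with a text label file in which every line reads `class x_center y_center width height`, with the four box values normalized to [0, 1]. Before training, it can help to check the labels against the ten class ids declared above. The following is a minimal sketch; `check_label_line` is a hypothetical helper written for this article, not part of Ultralytics.

```python
NUM_CLASSES = 10  # ids 0..9, matching the `names` mapping above


def check_label_line(line: str, num_classes: int = NUM_CLASSES) -> list[str]:
    """Return a list of problems found in one YOLO label line (empty if valid)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        coords = [float(v) for v in parts[1:]]
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    if any(not 0.0 <= v <= 1.0 for v in coords):
        problems.append("coordinates must be normalized to [0, 1]")
    if coords[2] <= 0 or coords[3] <= 0:
        problems.append("width and height must be positive")
    return problems


print(check_label_line("7 0.512 0.430 0.120 0.085"))  # a well-formed line
print(check_label_line("10 0.5 0.5 0.2 0.2"))          # class id out of range
print(check_label_line("3 0.5 1.4 0.2 0.2"))           # y_center not normalized
```

Running the same check over every file under the dataset's label directory would catch class ids left over from a different label map.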

2.3 How to train

import warnings
warnings.filterwarnings('ignore')  # silence library warnings during training
from ultralytics import YOLO

if __name__ == '__main__':
    # Build the model from its YAML definition (weights are randomly initialized)
    model = YOLO('ultralytics/cfg/models/11/yolo11.yaml')
    # model.load('yolo11n.pt')  # optionally load pretrained weights
    model.train(data='data/GC10.yaml',
                cache=False,
                imgsz=640,
                epochs=200,
                batch=8,
                close_mosaic=10,  # disable mosaic augmentation for the last 10 epochs
                device='0',
                optimizer='SGD',  # using SGD
                project='runs/train',
                name='exp',
                )

2.4 Baseline training results

The baseline mAP50 is 0.633.

YOLO11 summary (fused): 281 layers, 2,827,814 parameters, 0 gradients, 6.3 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 15/15 [00:21<00:00,  1.44s/it]
                   all        459        747       0.67      0.615      0.633      0.327
             chongkong         53         54       0.78      0.944      0.934      0.639
               hanfeng          7         10      0.395        0.8       0.48      0.178
              yueyawan        138        171      0.656      0.524      0.556      0.231
               shuiban         71         91      0.799      0.648       0.71      0.385
                youban        105        106      0.826      0.764      0.817      0.449
                 siban         45         92      0.522      0.154       0.26     0.0919
                  yiwu         43        104      0.678      0.471      0.553      0.258
                 yahen         10         21      0.176     0.0476      0.119     0.0251
                zhehen         71         71      0.941      0.905      0.948      0.545
                yaozhe         27         27      0.926      0.889      0.949      0.467
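
The mAP50 in the `all` row is the mean of the per-class values, so a few weak classes pull the headline number down. As a quick check on the log above, the sketch below recomputes that mean and lists the three weakest classes; the numbers are copied from the log and nothing else is assumed.

```python
# Per-class mAP50 from the baseline validation log above.
baseline_map50 = {
    "chongkong": 0.934, "hanfeng": 0.48, "yueyawan": 0.556, "shuiban": 0.71,
    "youban": 0.817, "siban": 0.26, "yiwu": 0.553, "yahen": 0.119,
    "zhehen": 0.948, "yaozhe": 0.949,
}

# Unweighted mean over the ten classes.
macro_map50 = sum(baseline_map50.values()) / len(baseline_map50)

# Class names ordered from lowest to highest mAP50, keeping the first three.
weakest = sorted(baseline_map50, key=baseline_map50.get)[:3]

print(f"mean of per-class mAP50 = {macro_map50:.3f}")
print("three weakest classes:", weakest)
```

The recomputed mean rounds to the 0.633 reported in the `all` row, and the three weakest classes (yahen, siban, hanfeng) are the ones where any improvement has the most room to show.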

[Figure: baseline prediction results]

3. Principles of the methods used

3.1 Introduction to DCNv4

Paper: https://arxiv.org/pdf/2401.06197.pdf

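Abstract: We introduce Deformable Convolution v4 (DCNv4), an efficient operator designed for a broad range of vision applications. DCNv4 addresses the limitations of its predecessor, DCNv3, with two key enhancements: removing the softmax normalization in spatial aggregation to enhance its dynamic property and expressive power, and optimizing memory access to minimize redundant operations for speedup. Compared with DCNv3, these improvements give significantly faster convergence and a substantial increase in processing speed, with DCNv4 achieving more than three times the forward speed of DCNv3. DCNv4 shows strong performance across various tasks, including image classification, instance and semantic segmentation, and notably image generation. When integrated into generative models such as U-Net in a latent diffusion model, DCNv4 outperforms its baseline, underscoring its potential to enhance generative models. In practical applications, replacing DCNv3 with DCNv4 in the InternImage model to create FlashInternImage yields up to an 80% speed increase and a further performance improvement without any other modification. The gains in speed and efficiency, together with robust performance across diverse vision tasks, show the potential of DCNv4 as a foundational building block for future vision models.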
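
The first enhancement can be illustrated with a toy aggregation step. With softmax normalization, the weights applied to the sampled values lie in (0, 1) and sum to 1; without it, they are unbounded and signed, as in an ordinary convolution. The sketch below is a pure-Python scalar toy with made-up numbers and hypothetical function names. It is not the DCNv4 kernel, which also involves learned offsets, bilinear sampling, and channel groups.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(samples, weights):
    """Weighted sum of the K sampled feature values."""
    return sum(w * s for w, s in zip(weights, samples))


# Toy inputs: K = 4 feature values sampled at the deformed locations, and the
# raw modulation scalars predicted for them (both made up for illustration).
samples = [1.0, -2.0, 0.5, 3.0]
raw_modulation = [2.0, -1.0, 0.0, -3.0]

dcnv3_like = aggregate(samples, softmax(raw_modulation))  # weights in (0, 1), sum to 1
dcnv4_like = aggregate(samples, raw_modulation)           # unbounded, signed weights

print(f"softmax-normalized: {dcnv3_like:.4f}")
print(f"unnormalized:       {dcnv4_like:.4f}")
```

A softmax output is a convex combination, so the first result can never leave the range of the sampled values, while the second can. That wider output range is one way to read the "expressive power" mentioned in the abstract.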

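Figure 1. (a) Relative runtime with DCNv3 as the baseline. DCNv4 shows a clear speedup over DCNv3 and surpasses other common vision operators. (b) With the same network architecture, DCNv4 converges faster than other vision operators, while DCNv3 falls behind them in the initial training phase.

💡💡💡How it is combined with YOLO11: 1) combined with 11Detect

[Figure: structure of the improved detection head]

3.2 How the dual attention block (DAB) works

DAB consists of two modules connected in series: the channel-spatial attention module (CSAM) and the parallel attention module (PAM), as shown in Figure 2. In theory, this structural design reduces model complexity. Existing deep-learning models usually need a large number of parameters to capture haze components accurately, which increases their complexity. By contrast, CSAM and PAM are attention-based and can detect haze components efficiently with fewer parameters. Connecting CSAM and PAM in series also lets the two modules complement each other, which makes DAB more effective. As a result, DAB can detect haze components more precisely while proportionally reducing model complexity.

4. Bringing DCNv4 into YOLO11

4.1 Combining 11Detect with DCNv4 in YOLO11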

4.1.1 yolo11-Detect_DCNv4.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect_DCNv4, [nc, 128, 1]] # Detect(P3, P4, P5)
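
The `scales` entry controls how the nominal repeat counts and channel widths in the layer list are shrunk for each model size. The sketch below is a simplified reading of the scaling rule in the Ultralytics model parser: repeat counts above 1 are multiplied by the depth factor and rounded, and output channels are capped at `max_channels`, multiplied by the width factor, and rounded up to a multiple of 8. The helper names are hypothetical, and module-specific exceptions in the real parser are not covered.

```python
import math


def scaled_repeats(n: int, depth: float) -> int:
    """Scale the nominal repeat count; layers with n == 1 are left alone."""
    return max(round(n * depth), 1) if n > 1 else n


def scaled_channels(c2: int, width: float, max_channels: int, divisor: int = 8) -> int:
    """Cap, scale, then round the output channels up to a multiple of `divisor`."""
    return math.ceil(min(c2, max_channels) * width / divisor) * divisor


depth, width, max_channels = 0.50, 0.25, 1024  # the 'n' scale from the YAML above

# (module, nominal repeats, nominal output channels) for a few backbone rows
rows = [("Conv", 1, 64), ("C3k2", 2, 256), ("C3k2", 2, 1024), ("SPPF", 1, 1024)]
for module, n, c2 in rows:
    print(module, scaled_repeats(n, depth), scaled_channels(c2, width, max_channels))
```

Under the `n` scale, a row written as `[-1, 2, C3k2, [1024, True]]` therefore ends up as a single C3k2 with 256 output channels, which is why the parameter counts in the logs are far smaller than the nominal widths suggest.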

4.1.2 Results after the improvement

Experimental results:

mAP50 improves from the baseline 0.633 to 0.646.

YOLO11-Detect_DCNv4 summary (fused): 277 layers, 2,684,199 parameters, 0 gradients, 7.8 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 29/29 [00:20<00:00,  1.44it/s]
                   all        459        747      0.674      0.619      0.646      0.313
             chongkong         53         54      0.773      0.963      0.911      0.612
               hanfeng          7         10      0.401        0.6      0.539      0.178
              yueyawan        138        171      0.681      0.525      0.587      0.237
               shuiban         71         91      0.853      0.681      0.727      0.396
                youban        105        106      0.673      0.906      0.832       0.31
                 siban         45         92      0.533      0.239      0.305     0.0942
                  yiwu         43        104        0.7      0.452      0.568      0.264
                 yahen         10         21      0.317     0.0952      0.145     0.0379
                zhehen         71         71      0.981      0.915      0.951      0.545
                yaozhe         27         27      0.831      0.815      0.892      0.459

4.2 DCNv4 combined with SPPF

4.2.1 yolo11-DCNv4_SPPF.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, DCNv4_SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

4.2.2 Results after the improvement

Experimental results:

mAP50 improves from the baseline 0.633 to 0.647.

YOLO11-DCNv4_SPPF summary (fused): 292 layers, 4,682,054 parameters, 0 gradients, 7.8 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 15/15 [00:21<00:00,  1.42s/it]
                   all        459        747      0.685      0.615      0.647      0.338
             chongkong         53         54      0.796      0.963      0.918      0.637
               hanfeng          7         10      0.371        0.7      0.538      0.233
              yueyawan        138        171      0.642      0.538      0.602      0.259
               shuiban         71         91      0.825       0.67      0.723      0.407
                youban        105        106      0.824      0.764      0.829      0.453
                 siban         45         92      0.477      0.207      0.269     0.0995
                  yiwu         43        104      0.616      0.423       0.51      0.251
                 yahen         10         21      0.542      0.115       0.23     0.0718
                zhehen         71         71      0.902      0.915      0.948      0.528
                yaozhe         27         27      0.852      0.852      0.903      0.437

4.3 DCNv4 combined with SPPF + Detect_DCNv4

Experimental results:

mAP50 improves from the baseline 0.633 to 0.651.

YOLO11-Detect_DCNv4-DCNv4_SPPF summary (fused): 288 layers, 4,538,439 parameters, 0 gradients, 9.3 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 15/15 [00:30<00:00,  2.04s/it]
                   all        459        747      0.725      0.618      0.651      0.322
             chongkong         53         54      0.753      0.963      0.911      0.629
               hanfeng          7         10      0.479      0.644      0.537      0.184
              yueyawan        138        171      0.629      0.491      0.538      0.232
               shuiban         71         91       0.83      0.644      0.722      0.376
                youban        105        106       0.68      0.906      0.855      0.346
                 siban         45         92      0.565      0.207      0.309      0.112
                  yiwu         43        104      0.702        0.5      0.576      0.269
                 yahen         10         21      0.689     0.0952      0.172     0.0598
                zhehen         71         71       0.97      0.924      0.959      0.549
                yaozhe         27         27      0.956      0.809      0.926      0.463

5. A YOLO11-based GC10-DET defect detection algorithm

[Figure: architecture of the improved network]

5.1 yolo11-Detect_DCNv4-DCNv4_SPPF-C3k2_DAB.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_DAB, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_DAB, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_DAB, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_DAB, [1024, True]]
  - [-1, 1, DCNv4_SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_DAB, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_DAB, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_DAB, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2_DAB, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect_DCNv4, [nc, 128, 1]] # Detect(P3, P4, P5)

5.2 Results after the improvement

Experimental results:

mAP50 improves from the baseline 0.633 to 0.660.

YOLO11-Detect_DCNv4-DCNv4_SPPF-C3k2_DAB summary: 706 layers, 4,976,274 parameters, 0 gradients, 10.8 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 15/15 [00:24<00:00,  1.62s/it]
                   all        459        747      0.706      0.623       0.66      0.323
             chongkong         53         54      0.776      0.963      0.928      0.633
               hanfeng          7         10      0.483      0.748      0.636      0.206
              yueyawan        138        171      0.734      0.474      0.586      0.252
               shuiban         71         91      0.773      0.692      0.728      0.389
                youban        105        106      0.746      0.934      0.893      0.331
                 siban         45         92      0.628      0.202      0.301      0.116
                  yiwu         43        104      0.802      0.452       0.57       0.27
                 yahen         10         21      0.271     0.0952      0.119     0.0475
                zhehen         71         71      0.846      0.915      0.949      0.529
                yaozhe         27         27          1      0.755      0.886      0.454
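
To compare the five runs side by side, the `all` rows and summary lines of the validation logs in this article can be collected into one table. The sketch below copies those numbers verbatim and computes the change against the baseline; the variant labels are shorthand chosen here.

```python
# (variant, mAP50, mAP50-95, parameters, GFLOPs), copied from the validation logs in this article.
runs = [
    ("YOLO11 baseline",                      0.633, 0.327, 2_827_814, 6.3),
    ("Detect_DCNv4",                         0.646, 0.313, 2_684_199, 7.8),
    ("DCNv4_SPPF",                           0.647, 0.338, 4_682_054, 7.8),
    ("Detect_DCNv4 + DCNv4_SPPF",            0.651, 0.322, 4_538_439, 9.3),
    ("Detect_DCNv4 + DCNv4_SPPF + C3k2_DAB", 0.660, 0.323, 4_976_274, 10.8),
]

_, base_map50, base_map5095, _, _ = runs[0]

# Change in (mAP50, mAP50-95) relative to the baseline, per variant.
deltas = {
    name: (round(m50 - base_map50, 3), round(m5095 - base_map5095, 3))
    for name, m50, m5095, _, _ in runs[1:]
}

print(f"{'variant':<40}{'mAP50':>7}{'d50':>8}{'d50-95':>8}{'params(M)':>11}{'GFLOPs':>8}")
for name, m50, m5095, params, gflops in runs:
    d50, d5095 = m50 - base_map50, m5095 - base_map5095
    print(f"{name:<40}{m50:>7.3f}{d50:>+8.3f}{d5095:>+8.3f}{params / 1e6:>11.2f}{gflops:>8.1f}")
```

In these logs mAP50 rises with each added module, while mAP50-95 exceeds the baseline only for the DCNv4_SPPF variant, and GFLOPs grow from 6.3 to 10.8. The last summary line is not marked `(fused)`, so its layer and parameter counts may not be directly comparable with the others.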


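Original-content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com for removal.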
