I am working on stereo 3D object detection with YOLOStereo3D (stereo cameras only, no Velodyne LiDAR) on the KITTI dataset, using EdgeNeXt as the backbone instead of ResNet.
Everything worked before I switched the backbone from ResNet to EdgeNeXt on the same KITTI dataset, but after the change I started getting the following error:
RuntimeError: Given groups=1, weight of size [8, 1024, 1, 1], expected input[8, 304, 9, 40] to have 1024 channels, but got 304 channels instead
Here is how I changed the backbone:
class YoloStereo3DCore(nn.Module):
    """
    Inference Structure of YoloStereo3D
    Similar to YoloMono3D,
    Left and Right image are fed into the backbone in batch. So they will affect each other with BatchNorm2d.
    """
    def __init__(self, backbone_arguments):
        f = open("/home/zakaseb/Thesis/YoloStereo3D/Stereo3D/Sequence.txt", "a")
        f.write("yolosterero3dCore_init \n")
        f.close()
        super(YoloStereo3DCore, self).__init__()
        self.backbone = edgenext_small(**backbone_arguments)  # Resnet, change backbone from here
        base_features = 256  # if backbone_arguments['depth'] > 34 else 64  # meaning which depth of resnet
        self.neck = StereoMerging(base_features)  # StereoMerging outputs features and depth output.
Here is edgenext_small():
@BACKBONE_DICT.register_module
def edgenext_small(pretrained=False, **kwargs):
    # FPS @ BS=1: 93.84 & @ BS=256: 1785.92 for MobileViT_S
    model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[48, 96, 160, 304], expan_ratio=4,
                     global_block=[0, 1, 1, 1],
                     global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                     use_pos_embd_xca=[False, True, False, False],
                     kernel_sizes=[3, 5, 7, 9],
                     d2_scales=[2, 2, 3, 4],
                     classifier_dropout=0.0)
    return model
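
The shape mismatch in the error can be reproduced in isolation; the snippet below is a standalone sketch (not the actual StereoMerging layer) that only recreates the shapes reported in the error message:

import torch
import torch.nn as nn

# A 1x1 convolution built for 1024 input channels, i.e. weight shape [8, 1024, 1, 1]
conv = nn.Conv2d(in_channels=1024, out_channels=8, kernel_size=1)

# A feature map shaped like the EdgeNeXt output in the error: [8, 304, 9, 40]
feat = torch.randn(8, 304, 9, 40)

conv(feat)  # raises the same RuntimeError: expected 1024 channels, but got 304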
Posted on 2022-09-03 23:57:58
As you can see, your backbone returns feature maps with 304 channels, but the next layer expects 1024 channels. There are two possible solutions:
Either change the dims argument of EdgeNeXt so that the last entry is 1024. Typically:

model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[48, 96, 160, 1024], expan_ratio=4,
                 global_block=[0, 1, 1, 1],
                 global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                 use_pos_embd_xca=[False, True, False, False],
                 kernel_sizes=[3, 5, 7, 9],
                 d2_scales=[2, 2, 3, 4],
                 classifier_dropout=0.0)
Or add a torch.nn.Conv2d(304, 1024, kernel_size) layer after the backbone to learn the 1024 features from the 304 channels. Otherwise, you can change the architecture of the next layer (the neck).
https://stackoverflow.com/questions/73593551
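
A minimal sketch of the second option, assuming the neck should keep receiving 1024-channel features as it did with the ResNet backbone; the adapter module and where it is applied in YoloStereo3DCore are illustrative, not taken from the original code:

import torch.nn as nn

class EdgeNeXtAdapter(nn.Module):
    """Hypothetical 1x1 projection from EdgeNeXt's 304 channels to the 1024
    channels the existing neck was built for."""
    def __init__(self, in_channels=304, out_channels=1024):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.project(x)

# In YoloStereo3DCore.__init__ (sketch):
#     self.adapter = EdgeNeXtAdapter(304, 1024)
# and in the forward pass, apply it to the backbone output before the neck:
#     features = self.adapter(self.backbone(images))

A bare nn.Conv2d with kernel_size=1 would also work; the BatchNorm and ReLU are just a common choice when bridging a new backbone to an existing neck.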