# [SOT] A Walkthrough of the SiameseFC Paper and Code

1. Introduction

In addition to my deep-learning [object detection] column [1], I have opened a deep-learning [object tracking] column [2] to record notes on tracking algorithms, covering both single-object tracking (SOT) and multi-object tracking (MOT) papers and code. While reading surveys of the tracking literature, I kept running into applications of Siamese networks to tracking. Here we take a classic SOT method, Fully-Convolutional Siamese Networks (SiameseFC [3]), and walk through it using both the paper and the code.

The paper summarizes its contribution as follows:

> The key contribution of this paper is to demonstrate that this approach achieves very competitive performance in modern tracking benchmarks at speeds that far exceed the frame-rate requirement.

The overall pipeline is:

• Inputs: a template image z (size 127×127×3) and a search image x (size 255×255×3).

• A shared feature-extraction network (a CNN); the paper uses a fairly simple AlexNet. Its outputs are the template features, of size 6×6×128, and the search features, of size 22×22×128.

• A cross-correlation operation (the paper's "cross-correlation"), which in effect slides the template features over the search features as a convolution kernel.

• The result is a score map of size 17×17×1. Here 17 = (22 − 6)/1 + 1, consistent with the output-size formula for a valid cross-correlation.
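The shape arithmetic of this last step can be checked directly in PyTorch, treating the template features as a convolution kernel over the search features (random tensors with the sizes quoted above, batch size 1):

```python
import torch
import torch.nn.functional as F

# Hypothetical feature maps with the sizes quoted above
phi_z = torch.randn(1, 128, 6, 6)     # template features
phi_x = torch.randn(1, 128, 22, 22)   # search features

# Cross-correlation: slide phi_z over phi_x as a convolution kernel
score = F.conv2d(phi_x, phi_z)
print(score.shape)  # torch.Size([1, 1, 17, 17])
```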

The code analyzed below is from https://github.com/mozhuangb/SiameseFC-pytorch.

2. SiameseFC network definition

The network consists of two parts:

• the feature-extraction network, AlexNet
• the cross-correlation network

The AlexNet code is defined as follows:

```
import torch.nn as nn


class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 96, 11, stride=2),     # conv1: 11x11, stride 2 (paper Table 1)
            nn.BatchNorm2d(96),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2))          # pool1: 3x3, stride 2

        self.conv2 = nn.Sequential(
            nn.Conv2d(96, 256, 5, stride=1, padding=0, groups=2),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2))          # pool2: 3x3, stride 2

        self.conv3 = nn.Sequential(
            nn.Conv2d(256, 384, 3, stride=1, padding=0),
            nn.BatchNorm2d(384),
            nn.ReLU(inplace=True))

        self.conv4 = nn.Sequential(
            nn.Conv2d(384, 384, 3, stride=1, padding=0, groups=2),
            nn.BatchNorm2d(384),
            nn.ReLU(inplace=True))

        self.conv5 = nn.Sequential(
            nn.Conv2d(384, 256, 3, stride=1, padding=0, groups=2))

    def forward(self, x):
        conv1 = self.conv1(x)
        conv2 = self.conv2(conv1)
        conv3 = self.conv3(conv2)
        conv4 = self.conv4(conv3)
        conv5 = self.conv5(conv4)
        return conv5
```
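The spatial sizes quoted earlier (6×6 from a 127×127 template, 22×22 from a 255×255 search image) follow from the kernel sizes and strides alone; a quick sketch of the arithmetic:

```python
def out_sz(sz, k, s=1):
    # output size of a valid (no padding) conv/pool: (sz - k) // s + 1
    return (sz - k) // s + 1

def embed_sz(sz):
    sz = out_sz(sz, 11, 2)  # conv1, 11x11, stride 2
    sz = out_sz(sz, 3, 2)   # pool1, 3x3, stride 2
    sz = out_sz(sz, 5)      # conv2, 5x5
    sz = out_sz(sz, 3, 2)   # pool2, 3x3, stride 2
    sz = out_sz(sz, 3)      # conv3, 3x3
    sz = out_sz(sz, 3)      # conv4, 3x3
    sz = out_sz(sz, 3)      # conv5, 3x3
    return sz

print(embed_sz(127), embed_sz(255))  # 6 22
```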

The Siamfc module combines the shared branch with the cross-correlation:

```
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class Siamfc(nn.Module):
    def __init__(self, branch):
        super(Siamfc, self).__init__()
        self.branch = branch            # shared embedding network, e.g. the AlexNet above
        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def Xcorr(self, x, z):              # x denotes search, z denotes template
        # cross-correlate each (search, template) pair in the batch
        out = []
        for i in range(x.size(0)):
            out.append(F.conv2d(x[i, :, :, :].unsqueeze(0), z[i, :, :, :].unsqueeze(0)))
        return torch.cat(out, dim=0)

    def forward(self, x, z):            # x denotes search, z denotes template
        x = self.branch(x)
        z = self.branch(z)
        xcorr_out = self.Xcorr(x, z)
        return xcorr_out
```
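The per-sample loop in Xcorr can also be written as one batched convolution via the groups argument of F.conv2d; a sketch with hypothetical shapes (batch 8, 128 channels) showing the two forms agree:

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 128, 22, 22)   # search features
z = torch.randn(8, 128, 6, 6)     # template kernels, one per batch element

# Per-sample loop, as in Xcorr above
loop_out = torch.cat(
    [F.conv2d(x[i:i + 1], z[i:i + 1]) for i in range(x.size(0))], dim=0)

# Batched equivalent: fold the batch into channels, one group per sample
batched = F.conv2d(x.reshape(1, -1, 22, 22), z, groups=x.size(0))
batched = batched.reshape(8, 1, 17, 17)

print(torch.allclose(loop_out, batched, atol=1e-3))  # True
```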


3. SiameseFC inputs, labels, and loss function

3.1 The inputs of the SiameseFC network

```
crop_z = self.crop(
    img_z, bndbox_z, self.exemplarSize
)  # crop template patch from img_z, then resize to [127, 127]
crop_x = self.crop(
    img_x, bndbox_x, self.instanceSize
)  # crop search patch from img_x, then resize to [255, 255]
```

```
# crop an image patch of the specified size - template(127) or search(255)
def crop(self, image, bndbox, out_size):
    center = bndbox[:2] + bndbox[2:] / 2
    size = bndbox[2:]

    context = self.context * size.sum()    # (w + h) / 2 when self.context = 0.5
    patch_sz = out_size / self.exemplarSize * \
        np.sqrt((size + context).prod())

    return crop_pil(image, center, patch_sz, out_size=out_size)
```

Here context is the width of the added border, i.e. the 2p context margin from the paper. The call then drops into crop_pil, defined as follows:
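For reference, the paper chooses the crop so that the scaled, context-padded template has a fixed area A² = 127²:

```
s(w + 2p) \times s(h + 2p) = A^2, \qquad p = \frac{w + h}{4}
```

With self.context = 0.5, context = (w + h)/2 = 2p, and patch_sz computes sqrt((w + 2p)(h + 2p)) scaled by out_size / exemplarSize.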

```
def crop_pil(image, center, size, padding='avg', out_size=None):
    # convert bndbox to corners (top-left and bottom-right)
    size = np.array(size)
    corners = np.concatenate((center - size / 2, center + size / 2))
    corners = np.round(corners).astype(int)

    # pad the original image so the later crop cannot go out of bounds
    pads = np.concatenate((-corners[:2], corners[2:] - image.size))
    npad = max(0, int(pads.max()))
    if npad > 0:
        if padding == 'avg':
            # fill missing portions with the mean RGB value
            avg_chan = tuple(int(round(c)) for c in ImageStat.Stat(image).mean)
            image = ImageOps.expand(image, border=npad, fill=avg_chan)
        else:
            image = ImageOps.expand(image, border=npad, fill=padding)

    # crop in the padded image's coordinates
    patch = image.crop(tuple((corners + npad).astype(int)))

    if out_size is not None:
        if isinstance(out_size, numbers.Number):
            out_size = (out_size, out_size)
        if not out_size == patch.size:
            patch = patch.resize(out_size, Image.BILINEAR)

    return patch
```

> When a sub-window extends beyond the extent of the image, the missing portions are filled with the mean RGB value.

So crop_pil performs two steps: 1. cropping, with mean-value padding when the window exceeds the image; 2. scaling, done here with resize.
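The mean-fill padding and out-of-bounds crop can be sketched with NumPy alone (hypothetical 100×100 image whose crop window extends 20 px past the top-left corner):

```python
import numpy as np

img = np.random.randint(0, 255, (100, 100, 3)).astype(np.uint8)
x1, y1, x2, y2 = -20, -20, 60, 60    # crop corners, partly outside the image

npad = 20                             # enough padding to cover the overshoot
fill = img.mean(axis=(0, 1)).round().astype(np.uint8)     # mean RGB value
padded = np.full((100 + 2 * npad, 100 + 2 * npad, 3), fill, dtype=np.uint8)
padded[npad:npad + 100, npad:npad + 100] = img

patch = padded[y1 + npad:y2 + npad, x1 + npad:x2 + npad]  # crop in padded coords
print(patch.shape)  # (80, 80, 3)
```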

3.2 The ground-truth labels of SiameseFC

`labels, weights = self.create_labels()  # create corresponding labels and weights`
where `self.create_labels()` is defined as follows:
```
# create labels and weights. This section is similar to the Matlab version of SiamFC
def create_labels(self):
    labels = self.create_logisticloss_labels()
    weights = np.zeros_like(labels)

    pos_num = np.sum(labels == 1)
    neg_num = np.sum(labels == 0)
    weights[labels == 1] = 0.5 / pos_num
    weights[labels == 0] = 0.5 / neg_num
    # weights *= pos_num + neg_num

    labels = labels[np.newaxis, :]
    weights = weights[np.newaxis, :]

    return labels, weights

def create_logisticloss_labels(self):
    label_sz = self.scoreSize              # 17
    r_pos = self.rPos / self.totalStride   # 16 / 8 = 2
    r_neg = self.rNeg / self.totalStride   # 0
    labels = np.zeros((label_sz, label_sz))

    for r in range(label_sz):
        for c in range(label_sz):
            dist = np.sqrt((r - label_sz // 2)**2 + (c - label_sz // 2)**2)
            if dist <= r_pos:
                labels[r, c] = 1
            elif dist <= r_neg:
                labels[r, c] = self.ignoreLabel
            else:
                labels[r, c] = 0

    return labels
```

The resulting 17×17 label map is:

```
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
```
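The same map can be reproduced standalone (parameters label_sz = 17 and r_pos = 2, as in the comments above), confirming it has 13 positive cells and that the weights give positives and negatives equal total weight:

```python
import numpy as np

label_sz, r_pos = 17, 2   # scoreSize and rPos / totalStride from above
rr, cc = np.meshgrid(np.arange(label_sz), np.arange(label_sz), indexing='ij')
dist = np.sqrt((rr - label_sz // 2) ** 2 + (cc - label_sz // 2) ** 2)
labels = (dist <= r_pos).astype(float)          # 1 inside radius r_pos, else 0

weights = np.zeros_like(labels)
weights[labels == 1] = 0.5 / (labels == 1).sum()
weights[labels == 0] = 0.5 / (labels == 0).sum()

print(int(labels.sum()))   # 13 positive cells, as in the map above
print(weights.sum())       # ~1.0: positives and negatives each contribute 0.5
```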

3.3 The loss function

```
criterion = BCEWeightLoss()                 # define criterion
output = model(search, template)
loss = criterion(output, labels, weights) / template.size(0)
```
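BCEWeightLoss itself is not shown in the post; a plausible minimal version (an assumption, not necessarily the repo's exact class) is a weighted binary cross-entropy over the score map:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed implementation: per-pixel weighted BCE on raw logits, summed
class BCEWeightLoss(nn.Module):
    def forward(self, input, target, weight):
        return F.binary_cross_entropy_with_logits(
            input, target, weight, reduction='sum')

criterion = BCEWeightLoss()
output = torch.randn(8, 1, 17, 17)                 # hypothetical score maps
labels = torch.zeros(8, 1, 17, 17)                 # hypothetical labels
weights = torch.full((8, 1, 17, 17), 1.0 / (17 * 17))
loss = criterion(output, labels, weights) / output.size(0)
print(float(loss))
```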

4. The tracking part of SiameseFC

The raw score maps are first upsampled with bicubic interpolation:

`scores_up = cv2.resize(scores, (config.final_sz, config.final_sz), interpolation=cv2.INTER_CUBIC)   # [257,257,3]`

To find a candidate region at the right scale, the code then runs through the following steps; the inline comments cover the details:
```scores_ = np.squeeze(scores_up)
# penalize change of scale
scores_[0, :, :] = config.scalePenalty * scores_[0, :, :]
scores_[2, :, :] = config.scalePenalty * scores_[2, :, :]
# find scale with highest peak (after penalty)
new_scale_id = np.argmax(np.amax(scores_, axis=(1, 2)))
# update scaled sizes
x_sz = (1 - config.scaleLR) * x_sz + config.scaleLR * scaled_search_area[new_scale_id]
target_w = (1 - config.scaleLR) * target_w + config.scaleLR * scaled_target_w[new_scale_id]
target_h = (1 - config.scaleLR) * target_h + config.scaleLR * scaled_target_h[new_scale_id]

# select response with new_scale_id
score_ = scores_[new_scale_id, :, :]
score_ = score_ - np.min(score_)
score_ = score_ / np.sum(score_)
# apply displacement penalty
score_ = (1 - config.wInfluence) * score_ + config.wInfluence * penalty
p = np.asarray(np.unravel_index(np.argmax(score_), np.shape(score_)))                   # position of max response in score_
center = float(config.final_sz - 1) / 2                                                 # center of score_
disp_in_area = p - center
disp_in_xcrop = disp_in_area * float(config.totalStride) / config.responseUp
disp_in_frame = disp_in_xcrop * x_sz / config.instanceSize
pos_y, pos_x = pos_y + disp_in_frame[0], pos_x + disp_in_frame[1]
bboxes[f, :] = pos_x - target_w / 2, pos_y - target_h / 2, target_w, target_h```
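The last few lines map the peak of score_ back into frame coordinates. A numeric sketch with assumed config values (final_sz = 257, totalStride = 8, responseUp = 16, instanceSize = 255) and a hypothetical crop size x_sz = 300:

```python
import numpy as np

final_sz, total_stride, response_up = 257, 8, 16
instance_size, x_sz = 255, 300.0

p = np.array([140, 120])                  # hypothetical peak position in score_
center = (final_sz - 1) / 2.0             # 128.0, center of the upsampled map
disp_in_area = p - center                 # displacement in upsampled-map pixels
disp_in_xcrop = disp_in_area * total_stride / response_up  # undo the upsampling
disp_in_frame = disp_in_xcrop * x_sz / instance_size       # crop -> frame scale
print(disp_in_frame)
```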

5. Summary

[1] https://zhuanlan.zhihu.com/c_1166445784311758848

[2] https://zhuanlan.zhihu.com/c_1177216807848529920

[3] http://xxx.itp.ac.cn/abs/1606.09549

[4] Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. PAMI 37(3) (2015) 583-596

[5] https://github.com/rafellerc/Pytorch-SiamFC
