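Set momentum on all relevant BN layers so that their running statistics stop updating. PyTorch computes running = (1 - momentum) * running + momentum * batch, so the value that freezes the statistics is momentum=0.0; momentum=1.0 freezes them only under the opposite (Keras-style) convention.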
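A minimal sketch of that step, assuming PyTorch's convention in which a momentum of 0.0 leaves the running statistics untouched. The helper name `set_bn_momentum` and the toy model are hypothetical and only serve to illustrate the idea.

```python
import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def set_bn_momentum(model: nn.Module, momentum: float = 0.0) -> int:
    """Set `momentum` on every BN layer in `model`; return how many were changed."""
    count = 0
    for module in model.modules():
        if isinstance(module, BN_TYPES):
            module.momentum = momentum
            count += 1
    return count

# Hypothetical toy model, used only to check the effect.
toy = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.BatchNorm2d(4))
num_changed = set_bn_momentum(toy, momentum=0.0)

bn = toy[1]
mean_before = bn.running_mean.clone()
var_before = bn.running_var.clone()

toy.train()
toy(torch.randn(8, 3, 16, 16))  # forward pass in training mode

# With momentum=0.0 the running statistics keep their previous values.
stats_unchanged = (torch.allclose(bn.running_mean, mean_before)
                   and torch.allclose(bn.running_var, var_before))
```

Note that this only fixes the running statistics. In training mode the layer still normalizes each batch with that batch's own mean and variance.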
First compare the two state_dicts and freeze the parameters in their intersection:
def freeze_model(model, defined_dict, keep_step=None):
    # Freeze every parameter whose name also appears in defined_dict.
    # keep_step is accepted but not used in this function.
    for name, param in model.named_parameters():
        if name in defined_dict:
            param.requires_grad = False
    # Count the frozen and the still-trainable parameters.
    freezed_num, pass_num = 0, 0
    for name, param in model.named_parameters():
        if not param.requires_grad:
            freezed_num += 1
        else:
            pass_num += 1
    return model, freezed_num, pass_num
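One way the intersection itself might be built is sketched below. The helper `shared_keys`, the toy `backbone` and `full_model`, and the extra shape check are all assumptions added for illustration; the original notes only say that the two state_dicts are compared.

```python
import torch
import torch.nn as nn

def shared_keys(pretrained_dict, model_dict):
    """Return the entries of `pretrained_dict` whose name and shape also appear in `model_dict`."""
    return {
        name: tensor
        for name, tensor in pretrained_dict.items()
        if name in model_dict and tensor.shape == model_dict[name].shape
    }

# Hypothetical example: a pretrained backbone and a larger model that reuses it.
backbone = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
full_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

defined_dict = shared_keys(backbone.state_dict(), full_model.state_dict())
full_model.load_state_dict(defined_dict, strict=False)  # copy the shared weights

# Freeze exactly the parameters whose names are in the intersection.
for name, param in full_model.named_parameters():
    if name in defined_dict:
        param.requires_grad = False

frozen = sorted(n for n, p in full_model.named_parameters() if not p.requires_grad)
```

Here only the first linear layer is shared, so it is the only part that ends up frozen; the new head stays trainable.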
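Afterwards, when constructing the optimizer, pass it only the parameters that still require gradients, so that the frozen parameters are kept out of the optimizer and cannot be updated again: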
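import torch.optim as optim

# Note: in Python 3, filter returns an iterator that the optimizer accepts directly, so there is no need to wrap it in list().
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=0.001,
                       betas=(0.9, 0.999), eps=1e-08, weight_decay=1e-5)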
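As a quick sanity check, the optimizer's `param_groups` can be inspected to confirm that no frozen parameter slipped in. This is a sketch on a hypothetical two-layer toy model, not the model from these notes.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))  # hypothetical toy model
for param in model[0].parameters():
    param.requires_grad = False  # freeze the first layer

optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()),
                       lr=0.001, weight_decay=1e-5)

# Identities of every tensor the optimizer manages.
managed = {id(p) for group in optimizer.param_groups for p in group["params"]}

# Names of frozen parameters that the optimizer would still touch (should be empty).
frozen_in_optimizer = [name for name, p in model.named_parameters()
                       if not p.requires_grad and id(p) in managed]
```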
import torch

def check_state_dict_same(pre_dict, cur_dict):
    # Return the keys present in both dicts whose tensors differ.
    diff_lst = list()
    for key in cur_dict.keys():
        if key in pre_dict:
            if not torch.equal(cur_dict[key], pre_dict[key]):
                diff_lst.append(key)
    return diff_lst
# pre_state_dict is the state_dict saved before training.
diff_lst = check_state_dict_same(pre_dict=pre_state_dict, cur_dict=model.state_dict())
if diff_lst:
    print('\n\n Changed parameters: \n')
    print(diff_lst)
    print('\n\n')
    raise SystemExit(1)  # stop: some frozen weights were modified
else:
    print('\n\n Model is successfully frozen. \n')
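The check above only works if the "before" snapshot is a real copy. The sketch below assumes the snapshot is taken with `copy.deepcopy`, because `state_dict()` returns references to the live tensors; the toy model and the single training step are hypothetical.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))  # hypothetical toy model
for param in model[0].parameters():
    param.requires_grad = False  # freeze the first layer

# Deep-copy the snapshot; a plain state_dict() would change together with the model.
pre_state_dict = copy.deepcopy(model.state_dict())

optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=0.001)
loss = model(torch.randn(16, 4)).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()  # one training step

cur_state_dict = model.state_dict()
changed = [key for key in cur_state_dict
           if not torch.equal(cur_state_dict[key], pre_state_dict[key])]
```

After the step, `changed` should list only entries of the trainable second layer. When the comparison is restricted to the frozen keys, an empty result means the freeze held.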
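In model.train() mode, BN still updates its running statistics (running_mean and running_var) automatically. Running the forward pass inside with torch.no_grad(): or setting p.requires_grad = False on every BN parameter does not stop this, because these statistics are buffers updated during the forward pass, not parameters updated by gradients.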
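BN modules are often written in a reusable form. So when you freeze BN, remember to write a separate set of BN modules for the branches that should not be frozen.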
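A common alternative, not described in the notes above, is to put the frozen BN layers into eval mode while the rest of the model keeps training. The sketch below assumes a hypothetical two-branch model in which each branch owns its own BN instance; `freeze_bn`, `TwoBranch`, and the branch names are made up for illustration.

```python
import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def freeze_bn(module: nn.Module) -> None:
    """Put every BN layer under `module` in eval mode and stop gradients for its affine parameters."""
    for m in module.modules():
        if isinstance(m, BN_TYPES):
            m.eval()
            for p in m.parameters():
                p.requires_grad = False

class TwoBranch(nn.Module):
    """Hypothetical model: `frozen_branch` is fixed, `new_branch` is trained."""

    def __init__(self):
        super().__init__()
        self.frozen_branch = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
        self.new_branch = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))

    def forward(self, x):
        return self.frozen_branch(x) + self.new_branch(x)

    def train(self, mode: bool = True):
        # model.train() would switch every BN back to training mode,
        # so the freeze is re-applied each time the mode changes.
        super().train(mode)
        freeze_bn(self.frozen_branch)
        return self

net = TwoBranch()
net.train()
frozen_mean = net.frozen_branch[1].running_mean.clone()
new_mean = net.new_branch[1].running_mean.clone()
net(torch.randn(32, 4))  # forward pass while the model is in training mode
```

Because the two branches do not share a BN instance, the frozen branch keeps its statistics while the new branch continues to update its own.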