RuntimeError: Trying to backward through the graph a second time...

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

torch.autograd.backward

torch.autograd.backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None)
  • retain_graph (bool, optional) – If False, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of create_graph.
  • create_graph (bool, optional) – If True, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to False.
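
The toy example below (not from the original post; the tensor names are made up for illustration) shows what the two flags do: retain_graph=True lets you backward through the same graph a second time, and create_graph=True lets you differentiate the gradient itself.

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First backward pass: keep the graph so it can be traversed again.
y.backward(retain_graph=True)
print(x.grad)           # tensor(12.) = 3 * x**2

# Second backward pass through the same graph; without retain_graph=True
# above, this call would raise the error quoted at the top of this post.
x.grad = None           # clear the accumulated gradient first
y.backward()
print(x.grad)           # tensor(12.) again

# create_graph=True builds a graph of the derivative itself,
# which allows higher-order derivatives to be computed.
z = x ** 3
(g,) = torch.autograd.grad(z, x, create_graph=True)   # dz/dx = 3 * x**2 = 12
(g2,) = torch.autograd.grad(g, x)                      # d2z/dx2 = 6 * x = 12
print(g, g2)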

retain_graph = True (when to use it?)

In ordinary training we never need the retain_graph argument, but it becomes necessary in a few special situations:

  1. A single network has two outputs and each output calls backward() separately: output1.backward(), output2.backward().
  2. A single network has two losses and each loss calls backward() separately: loss1.backward(), loss2.backward().

Taking case 2 as an example: if the code is written like this, the error quoted at the top of this post appears:

loss1.backward()  # frees the intermediate buffers of the shared graph after this call
loss2.backward()  # RuntimeError: Trying to backward through the graph a second time...

Correct code:

loss1.backward(retain_graph=True)  # keep the intermediate buffers needed for another backward pass
loss2.backward()                   # all intermediate buffers are freed now, ready for the next iteration
optimizer.step()                   # update the parameters

Passing retain_graph=True keeps the intermediate buffers alive, so the two backward() calls do not interfere with each other.
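
A self-contained sketch of case 2 (the toy network, loss functions and tensor shapes below are hypothetical, not from the original post):

import torch
import torch.nn as nn

net = nn.Linear(10, 2)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

inputs = torch.randn(8, 10)
target1 = torch.randn(8, 2)
target2 = torch.randn(8, 2)

output = net(inputs)                              # one forward pass -> one shared graph
loss1 = nn.functional.mse_loss(output, target1)
loss2 = nn.functional.l1_loss(output, target2)

optimizer.zero_grad()
loss1.backward(retain_graph=True)                 # keep the shared graph alive for the second backward
loss2.backward()                                  # gradients from both losses accumulate in .grad
optimizer.step()                                  # one update with the summed gradients

When the two losses only need to be summed like this, calling (loss1 + loss2).backward() once is the more efficient alternative that the documentation hints at when it says retain_graph "can be worked around in a much more efficient way".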

Supplement: when two networks each have a loss that must call backward() separately: loss1.backward(), loss2.backward().

# With two networks, define a separate optimizer for each network
optimizer1 = torch.optim.SGD(net1.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
.....
# backward handling of the losses in the train loop
loss1 = loss()
loss2 = loss()

optimizer1.zero_grad()             # set the gradients of net1 to zero
loss1.backward(retain_graph=True)  # keep the intermediate buffers needed for the second backward
optimizer1.step()

optimizer2.zero_grad()             # set the gradients of net2 to zero
loss2.backward()
optimizer2.step()
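
A runnable version of this pattern, under the assumption (mine, not the original post's) that net2 consumes net1's output, so the two losses share part of the graph, which is exactly when retain_graph=True is needed:

import torch
import torch.nn as nn

net1 = nn.Linear(10, 5)
net2 = nn.Linear(5, 2)
optimizer1 = torch.optim.SGD(net1.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

inputs = torch.randn(8, 10)
target1 = torch.randn(8, 5)
target2 = torch.randn(8, 2)

hidden = net1(inputs)                             # shared part of the graph
out = net2(hidden)
loss1 = nn.functional.mse_loss(hidden, target1)
loss2 = nn.functional.mse_loss(out, target2)

optimizer1.zero_grad()
optimizer2.zero_grad()
loss1.backward(retain_graph=True)                 # keep the shared graph for the second backward
loss2.backward()                                  # buffers are freed after this call
optimizer1.step()
optimizer2.step()

In this sketch both backward passes run before either optimizer step, so the retained graph still matches the parameters it was built from. Note that loss2 also contributes gradients to net1 through the shared hidden tensor; detach hidden before feeding it to net2 if loss2 should only train net2.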

Appendix: step-by-step explanation of one training iteration

  • optimizer.zero_grad()

Initialize the gradients to zero (because the derivative of a batch's loss with respect to the weights is the accumulated sum of each sample's derivative); corresponds to d_weights = [0] * n.

  • output = net(inputs)

Forward pass: compute the predicted values.

  • loss = Loss(outputs, labels)

Compute the loss.

  • loss.backward()

Backward pass: compute the gradients; corresponds to d_weights = [d_weights[j] + (label[k] - output) * input[k][j] for j in range(n)].

  • optimizer.step()

Update all parameters; corresponds to weights = [weights[k] + alpha * d_weights[k] for k in range(n)].
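
Putting the five steps together, a minimal training-loop sketch (the model, data and hyperparameters are placeholders of my own, not from the original post):

import torch
import torch.nn as nn

net = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

inputs = torch.randn(32, 10)
labels = torch.randn(32, 1)

for epoch in range(5):
    optimizer.zero_grad()              # 1. zero the accumulated gradients
    outputs = net(inputs)              # 2. forward pass: compute predictions
    loss = criterion(outputs, labels)  # 3. compute the loss
    loss.backward()                    # 4. backward pass: compute gradients
    optimizer.step()                   # 5. update the parameters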
