在Pytorch中,如何将L1正则化器添加到激活中?

内容来源于 Stack Overflow,并遵循CC BY-SA 3.0许可协议进行翻译与使用

  • 回答 (2)
  • 关注 (0)
  • 查看 (2628)

我想将L1正则化器添加到ReLU的激活输出中。或者说,如何仅将规则化器添加到网络中的特定层?

crossentropy + lambda1*L1(layer1) + lambda2*L1(layer2) + ...

我相信提供给torch.optim.Adagrad的参数仅适用于交叉熵损失。或者它可能适用于整个网络的所有参数(权重)。但无论如何,似乎不允许将单一的正则化应用于单层激活,并且不会提供L1损失。

提问于
用户回答回答于

以下是如何做到的方法:

  • 在模块的前向返回中,最终输出和要应用L1正则化的层的输出
  • loss变量为交叉熵、输出损失、目标损失和L1惩罚之和。

下面是一个示例代码

import torch
from torch.autograd import Variable
from torch.nn import functional as F


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)

    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out, layer1_out, layer2_out


def l1_penalty(var):
    return torch.abs(var).sum()


def l2_penalty(var):
    return torch.sqrt(torch.pow(var, 2).sum())


batchsize = 4
lambda1, lambda2 = 0.5, 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# usually following code is looped over all batches 
# but let's just do a dummy batch for brevity

inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())

optimizer.zero_grad()
outputs, layer1_out, layer2_out = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)
l1_regularization = lambda1 * l1_penalty(layer1_out)
l2_regularization = lambda2 * l2_penalty(layer2_out)

loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()
用户回答回答于

正则化应该是模型每一层的加权参数,而不是每一层的输出。

import torch
from torch.autograd import Variable
from torch.nn import functional as F


class MLP(torch.nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(128, 32)
        self.linear2 = torch.nn.Linear(32, 16)
        self.linear3 = torch.nn.Linear(16, 2)
    def forward(self, x):
        layer1_out = F.relu(self.linear1(x))
        layer2_out = F.relu(self.linear2(layer1_out))
        out = self.linear3(layer2_out)
        return out

batchsize = 4
lambda1, lambda2 = 0.5, 0.01

model = MLP()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

inputs = Variable(torch.rand(batchsize, 128))
targets = Variable(torch.ones(batchsize).long())
l1_regularization, l2_regularization = torch.tensor(0), torch.tensor(0)

optimizer.zero_grad()
outputs = model(inputs)
cross_entropy_loss = F.cross_entropy(outputs, targets)
for param in model.parameters():
    l1_regularization += torch.norm(param, 1)
    l2_regularization += torch.norm(param, 2)

loss = cross_entropy_loss + l1_regularization + l2_regularization
loss.backward()
optimizer.step()

扫码关注云+社区

领取腾讯云代金券