Q: How to calculate unbalanced weights for BCEWithLogitsLoss in PyTorch

Stack Overflow user

Asked 2019-07-14 01:58:40

4 answers, 10.5K views, 0 followers, 9 votes

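I am trying to solve a multilabel problem with 270 labels, and I have converted the target labels into one-hot encoded form. I am using BCEWithLogitsLoss(). Since the training data is unbalanced, I am using the pos_weight argument, but I am a bit confused.

pos_weight (Tensor, optional): a weight of positive examples. Must be a vector with length equal to the number of classes.

Do I need to pass the total count of positive values for each label as a tensor, or does "weight" mean something else here?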

pytorch

multilabel-classification

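4 Answers

Stack Overflow user

Answered 2020-04-01 23:01:48

The PyTorch documentation for BCEWithLogitsLoss recommends that pos_weight be the ratio between the negative and positive counts for each class.

So, if len(dataset) is 1000 and element 0 of your multi-hot encoding has 100 positive counts, then element 0 of pos_weights_vector should be 900/100 = 9. That means the binary cross-entropy loss will behave as if the dataset contained 900 positive examples instead of 100.

Here is my implementation: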

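import numpy as np
import torch

def calculate_pos_weights(class_counts):
    # NOTE: `data` is the dataset and must already be defined in the enclosing scope
    # Float dtype so the ratios are not truncated when class_counts holds integers
    pos_weights = np.ones_like(class_counts, dtype=np.float64)
    neg_counts = [len(data) - pos_count for pos_count in class_counts]
    for cdx, (pos_count, neg_count) in enumerate(zip(class_counts, neg_counts)):
        # Small epsilon avoids division by zero for classes with no positives
        pos_weights[cdx] = neg_count / (pos_count + 1e-5)

    return torch.as_tensor(pos_weights, dtype=torch.float)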

Here class_counts is just the column-wise sum of the positive samples. I posted it on the PyTorch forum, and one of the PyTorch developers endorsed it.
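As a dependency-free illustration of the column-wise sum and the negative-to-positive ratio, the sketch below works through a tiny made-up multi-hot label matrix. The `labels` data, the helper name `pos_weight_ratios`, and the fallback of 1.0 for a class with no positives are all assumptions for this example; on real data the same sums would normally be computed with NumPy or torch, and the result converted to a tensor before being passed as pos_weight.

```python
# Toy multi-hot labels: 4 samples, 3 classes (made-up data for illustration)
labels = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
]


def pos_weight_ratios(labels):
    """Return (class_counts, ratios) where ratios[c] = negatives / positives."""
    num_samples = len(labels)
    # Column-wise sum: number of positive samples per class
    class_counts = [sum(column) for column in zip(*labels)]
    ratios = []
    for pos_count in class_counts:
        neg_count = num_samples - pos_count
        # Hypothetical guard: fall back to 1.0 for a class with no positives
        ratios.append(neg_count / pos_count if pos_count else 1.0)
    return class_counts, ratios


class_counts, ratios = pos_weight_ratios(labels)
print(class_counts)  # [3, 2, 1]
print(ratios)
```

The rarest class (one positive out of four samples) ends up with the largest ratio, which is the behaviour the answer describes.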

8 votes

Stack Overflow user

Answered 2019-07-15 08:47:57

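PyTorch solution

Well, I have actually gone through the docs, and you can simply use pos_weight.

This argument gives a weight to the positive samples of each class, so if you have 270 classes you should pass a torch.Tensor of shape (270,) that defines the weight for each class.

Here is a slightly modified snippet taken from the documentation: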

import torch

# 270 classes, batch size = 64
target = torch.ones([64, 270], dtype=torch.float32)
# Logits outputted from your network, no activation
output = torch.full([64, 270], 0.9)
# Weights, each being equal to one. You can input your own here.
pos_weight = torch.ones([270])
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
criterion(output, target)  # -log(sigmoid(0.9))
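To make the effect of pos_weight concrete, here is a plain-Python sketch of the per-element loss given in the BCEWithLogitsLoss documentation, -[p * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))], where p is the pos_weight of the class. The function name `bce_with_logits` is made up for this illustration; it handles a single scalar and is not numerically hardened the way the library version is.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(x, y, pos_weight=1.0):
    """Per-element loss for a logit x and a target y (0 or 1)."""
    return -(pos_weight * y * math.log(sigmoid(x))
             + (1 - y) * math.log(1 - sigmoid(x)))


plain = bce_with_logits(0.9, 1)
weighted = bce_with_logits(0.9, 1, pos_weight=9.0)
negative = bce_with_logits(0.9, 0, pos_weight=9.0)

print(weighted / plain)  # positive targets are scaled by pos_weight
print(negative == bce_with_logits(0.9, 0))  # negative targets are unaffected
```

Only the positive term is multiplied, so a class with pos_weight 9 contributes as though each of its positive samples appeared nine times.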

Self-made solution

When it comes to weighting, there is no built-in solution, but you can easily write one yourself:

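import torch

class WeightedMultilabel(torch.nn.Module):
    def __init__(self, weights: torch.Tensor):
        super().__init__()
        # Keep per-element losses so each class can be scaled separately
        self.loss = torch.nn.BCEWithLogitsLoss(reduction="none")
        # Shape (1, num_classes) so the weights broadcast over the batch
        self.weights = weights.unsqueeze(0)

    def forward(self, outputs, targets):
        # Scale each class column, then reduce to a scalar loss
        return (self.loss(outputs, targets) * self.weights).mean()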

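The tensor must have the same length as the number of classes in your multilabel classification (270), with each entry giving the weight for the corresponding class.

Calculating weights

You simply add up the labels of every sample in your dataset, divide by the minimum value, and take the reciprocal at the end.

A sketch of a snippet: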

weights = torch.zeros_like(dataset[0])
for element in dataset:
    weights += element

weights = 1 / (weights / torch.min(weights))

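With this approach, the least frequent class yields the normal loss, while the other classes get weights smaller than 1.

It might cause some instability during training, though, so you may want to experiment with these values (perhaps a log transform instead of a linear one?).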
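One possible reading of the log-transform idea, sketched in plain Python: compress the count ratio with a logarithm before inverting it, so that very frequent classes are down-weighted less aggressively. The formula 1 / (1 + log(count / min_count)) and the `counts` values are assumptions chosen for illustration; the answer does not specify a particular transform.

```python
import math

# Made-up per-class positive counts
counts = [10, 100, 1000]
min_count = min(counts)

# Linear scheme from the snippet above: 1 / (count / min_count)
linear = [1 / (c / min_count) for c in counts]

# Log-damped variant (assumption): the rarest class still gets weight 1
log_damped = [1 / (1 + math.log(c / min_count)) for c in counts]

print(linear)
print(log_damped)
```

Both schemes leave the rarest class at weight 1, but the log-damped weights shrink far more slowly as the class count grows.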

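Other approaches

You might consider upsampling/downsampling (although this is complicated, because you would also add or remove samples of other classes, so I think advanced heuristics would be needed).

2 votes

Stack Overflow user

Answered 2020-11-25 17:33:59

Just a quick revision of @crypdick's answer; this implementation of the function worked for me: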

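import numpy as np
import torch

def calculate_pos_weights(class_counts, data):
    # Float dtype so the ratios are not truncated when class_counts holds integers
    pos_weights = np.ones_like(class_counts, dtype=np.float64)
    neg_counts = [len(data) - pos_count for pos_count in class_counts]
    for cdx, (pos_count, neg_count) in enumerate(zip(class_counts, neg_counts)):
        # Small epsilon avoids division by zero for classes with no positives
        pos_weights[cdx] = neg_count / (pos_count + 1e-5)

    return torch.as_tensor(pos_weights, dtype=torch.float)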

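Here data is the dataset you are trying to apply the weights to.

0 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: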

https://stackoverflow.com/questions/57021620
