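I am working on a classification task. I took the last 4 hidden layers from BERT and concatenated them:
out = model(...)
out = torch.cat([out['hidden_states'][-i] for i in range(1, 5)], dim=-1)
The shape is now (12, 200, 768*4), i.e. (batch, max_length, concatenated hidden size), but a fully connected layer needs a two-dimensional input. One option is to average over the sequence dimension, as in torch.mean(out, dim=1), which gives an output of shape (12, 768*4). But I can't figure out what the original BERT approach is.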
Posted on 2022-04-02 00:26:44
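There is no "original" BERT approach for classification with concatenated hidden layers. You have several options to choose from; I will only comment on your approach and suggest an alternative below.
Preliminaries: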
import torch
import torch.nn as nn
from transformers import BertTokenizerFast, BertModel

# Tokenizer, pretrained encoder, and an untrained 5-class linear head
t = BertTokenizerFast.from_pretrained("bert-base-cased")
m = BertModel.from_pretrained("bert-base-cased")
fc = nn.Linear(768, 5)

s = ["This is a random sentence", "This is another random sentence with more words"]
i = t(s, padding=True, return_tensors="pt")
# Request the hidden states of every layer, not only the last one
with torch.no_grad():
    o = m(**i, output_hidden_states=True)
print(i)
First, you should look at your input:
# print(i)
{'input_ids':
tensor([[ 101, 1188, 1110, 170, 7091, 5650, 102, 0, 0, 0],
[ 101, 1188, 1110, 1330, 7091, 5650, 1114, 1167, 1734, 102]]),
'token_type_ids':
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
'attention_mask':
tensor([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
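}
What you should notice here is that the shorter sentence is padded. This matters because simply pooling with torch.mean produces different sentence embeddings for the same sentence depending on the number of padding tokens. Of course, after sufficient training the model will learn to handle this to some extent, but you should use a more sophisticated mean function that excludes the padding tokens right away: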
def mean_pooling(model_output, attention_mask):
    # Broadcast the mask to (batch, seq_len, hidden) so padded positions are zeroed out
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(model_output.size()).float()
    # Sum over real tokens only and divide by the number of real tokens
    return torch.sum(model_output * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

o_mean = [mean_pooling(o.hidden_states[-x], i.attention_mask) for x in range(1, 5)]
# We want a tensor, not a list
o_mean = torch.stack(o_mean, dim=1)
# We want only one tensor per sequence
o_mean = torch.mean(o_mean, dim=1)
print(o_mean.shape)
with torch.no_grad():
    print(fc(o_mean))
Output:
torch.Size([2, 768])
tensor([[ 0.0677, -0.0261, -0.3602, 0.4221, 0.2251],
[-0.0328, -0.0161, -0.5209, 0.5825, 0.2405]])
These operations are quite expensive, and people often use an approach called CLS pooling as a cheaper alternative with comparable performance:
# We only use the [CLS] token (i.e. the first token of each sequence, id 101)
o_cls = [o.hidden_states[-x][:, 0] for x in range(1, 5)]
# We want a tensor, not a list
o_cls = torch.stack(o_cls, dim=1)
# We want only one tensor per sequence
o_cls = torch.mean(o_cls, dim=1)
print(o_cls.shape)
with torch.no_grad():
    print(fc(o_cls))
Output:
torch.Size([2, 768])
tensor([[-0.3731, 0.0473, -0.4472, 0.3804, 0.4057],
[-0.3468, 0.0685, -0.5885, 0.4994, 0.4182]])
https://stackoverflow.com/questions/71434804
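The question concatenates the four layers rather than averaging them, and the same two pooling strategies can be applied to the concatenated tensor as well. The following is only a minimal sketch under stated assumptions: random tensors stand in for o.hidden_states[-4:] so that it runs without downloading a model, the attention mask is copied from the input printed above, and fc_concat is a hypothetical classification head whose input width is 4 * 768.

```python
import torch
import torch.nn as nn


def mean_pooling(model_output, attention_mask):
    # Masked mean over the sequence dimension: padded positions contribute nothing.
    mask = attention_mask.unsqueeze(-1).expand(model_output.size()).float()
    return torch.sum(model_output * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)


torch.manual_seed(0)
batch, seq_len, hidden = 2, 10, 768

# Assumption: random stand-ins for the last four BERT hidden states.
hidden_states = [torch.randn(batch, seq_len, hidden) for _ in range(4)]
# Attention mask from the tokenizer output shown earlier (first sentence is padded).
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
                               [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

# Concatenate along the feature dimension: (batch, seq_len, 4 * hidden).
concat = torch.cat(hidden_states, dim=-1)

# Option 1: masked mean pooling over tokens, giving (batch, 4 * hidden).
concat_mean = mean_pooling(concat, attention_mask)

# Option 2: CLS pooling, keeping only the first token, also (batch, 4 * hidden).
concat_cls = concat[:, 0]

# Hypothetical head: its input size must match the concatenated width.
fc_concat = nn.Linear(4 * hidden, 5)
with torch.no_grad():
    logits = fc_concat(concat_mean)

print(concat.shape, concat_mean.shape, concat_cls.shape, logits.shape)

# Padding check on the first sentence: dropping its 3 padded positions leaves
# the masked mean unchanged, whereas a plain mean over all 10 positions differs.
trimmed = mean_pooling(concat[:1, :7], attention_mask[:1, :7])
naive = concat[:1].mean(dim=1)
print(torch.allclose(trimmed, concat_mean[:1], atol=1e-5))
print(torch.allclose(naive, concat_mean[:1], atol=1e-5))
```

Compared with averaging the four layers, concatenation keeps each layer's features separate at the cost of a four times wider linear layer; which of the two works better for a given task is something to verify empirically.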