PyTorch - Applying attention efficiently

Stack Overflow user
Asked 2018-12-10 21:14:28
Answers 1 · Views 1.6K · Followers 0 · Votes 7

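I have built an RNN language model with attention. For every element of the input, I create a context vector by attending to all previous hidden states (in one direction only).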

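In my opinion, the most straightforward solution is to use a for-loop over the RNN output, so that the context vectors are computed one by one.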

import torch
import torch.nn as nn
import torch.nn.functional as F

class RNN_LM(nn.Module):
    def __init__(self, hidden_size, vocab_size, embedding_dim=None, droprate=0.5):
        super().__init__()
        if not embedding_dim:
            embedding_dim = hidden_size
        self.embedding_matrix = nn.Embedding(vocab_size, embedding_dim)

        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size, batch_first=False)
        self.attn = nn.Linear(hidden_size, hidden_size)
        self.vocab_dist = nn.Linear(hidden_size, vocab_size)
        self.dropout = nn.Dropout(droprate)

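    def forward(self, x):
        # x holds the token ids of one sequence; view(-1, 1) gives (seq_len, batch=1)
        x = self.dropout(self.embedding_matrix(x.view(-1, 1)))
        # LSTM output: (seq_len, 1, hidden_size)
        x, states = self.lstm(x)
        # drop the batch dimension: (seq_len, hidden_size)
        x = x.squeeze()
        # the first position has no previous states, so its own hidden state is used
        content_vectors = [x[0].view(1, -1)]
        # for-loop over hidden states and attention
        for i in range(1, x.size(0)):
            prev_states = x[:i]                  # (i, hidden_size)
            current_state = x[i].view(1, -1)     # (1, hidden_size)

            attn_prod = torch.mm(self.attn(current_state), prev_states.t())  # (1, i)
            attn_weights = F.softmax(attn_prod, dim=1)
            context = torch.mm(attn_weights, prev_states)  # (1, hidden_size)
            content_vectors.append(context)

        # (seq_len, hidden_size) -> logits of shape (seq_len, vocab_size)
        return self.vocab_dist(self.dropout(torch.cat(content_vectors)))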

Note: the forward method here is only used for training.
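For reference, this is what the loop computes for one sequence, writing h_j for the squeezed LSTM outputs and W, b for the weight and bias of self.attn (this notation is introduced here for illustration; it is not in the question). H stacks the h_j as rows:

```latex
s_{ij} = (W h_i + b)^\top h_j, \qquad 0 \le j < i
\alpha_{ij} = \frac{\exp(s_{ij})}{\sum_{k<i} \exp(s_{ik})}
c_i = \sum_{j<i} \alpha_{ij} h_j \quad (i \ge 1), \qquad c_0 = h_0
S = (H W^\top + \mathbf{1} b^\top) H^\top, \qquad S_{ij} = s_{ij}
```

The last line shows that all scores come from a single matrix product; the loop only differs in restricting row i to the columns j < i before the softmax.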

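However, this solution is not very efficient: the context vectors are computed sequentially, so the code does not parallelize well. Since the context vectors do not depend on each other, though, I wonder whether there is a non-sequential way to compute them.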

So, is there a way to compute the context vectors without a loop, so that more of the computation can run in parallel?


1 Answer

Stack Overflow user

Accepted answer

Answered 2018-12-11 06:27:32

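OK, just to be clear: I assume we only care about vectorizing the for-loop. What is the shape of x? Assuming x is 2-dimensional, I have the following code, where v1 runs your loop and v2 is a vectorized version: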

import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(3, 6)

def v1():
    for i in range(1, x.size(0)):
        prev = x[:i]
        curr = x[i].view(1, -1)

        prod = torch.mm(curr, prev.t())
        attn = prod # same shape
        context = torch.mm(attn, prev)
        print(context)

def v2():
    # we're going to unroll the loop by vectorizing over the new,
    # 0-th dimension of `x`. We repeat it as many times as there
    # are iterations in the for loop
    repeated = x.unsqueeze(0).repeat(x.size(0), 1, 1)

    # we're looking to build a `prevs` tensor such that
    # prevs[i, x, y] == prev[x, y] at i-th iteration of the loop in v1,
    # up to 0-padding necessary to make them all the same size.
    # We need to build a higher-dimensional equivalent of torch.triu
    xs = torch.arange(x.size(0)).reshape(1, -1, 1)
    zs = torch.arange(x.size(0)).reshape(-1, 1, 1)
    prevs = torch.where(zs < xs, torch.tensor(0.), repeated)

    # this is an equivalent of the above iteration starting at 1
    prevs = prevs[:-1]
    currs = x[1:]

    # a batched matrix multiplication
    prod = torch.matmul(currs, prevs.transpose(1, 2))
    attn = prod # same shape
    context = torch.matmul(attn, prevs)
    # equivalent of a higher dimensional torch.diagonal
    contexts = torch.einsum('iij->ij', (context))
    print(contexts)

print(x)

print('\n------ v1 -------\n')
v1()
print('\n------ v2 -------\n')
v2()
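v2 computes a full stack of padded matrix products and then keeps only their diagonal. For this softmax-free case, a possibly simpler sketch is to compute all pairwise dot products once and zero out every entry with j >= i using a strictly lower-triangular mask (torch.tril with diagonal=-1). Like v2, it assumes a 2-D x; v1_contexts is v1 rewritten to return its results instead of printing them, and v3_contexts is a hypothetical name.

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, 6)


def v1_contexts(x):
    # Reference: the loop from v1, collecting the context vectors instead of printing them
    out = []
    for i in range(1, x.size(0)):
        prev = x[:i]
        curr = x[i].view(1, -1)
        attn = torch.mm(curr, prev.t())   # (1, i), no softmax
        out.append(torch.mm(attn, prev))  # (1, d)
    return torch.cat(out)


def v3_contexts(x):
    # All pairwise scores at once: scores[i, j] = x[i] . x[j]
    scores = x @ x.t()
    # Keep only j < i (strictly lower triangle); every other entry becomes 0
    scores = torch.tril(scores, diagonal=-1)
    # Row i is sum_{j<i} scores[i, j] * x[j]; row 0 is all zeros, so drop it to match v1
    return (scores @ x)[1:]


print(torch.allclose(v1_contexts(x), v3_contexts(x), atol=1e-5))
```

Because the zeroed scores simply contribute nothing to the sum, no padding or diagonal extraction is needed here. This shortcut only holds without the softmax.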

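This vectorizes your loop, with some caveats. First, I assume x is 2-dimensional. Second, I skipped the softmax, on the grounds that it does not change the size of its input and therefore does not affect the vectorization. That is true, but unfortunately the softmax of a zero-padded vector v is not the same as the zero-padded softmax of the unpadded v, because every padded zero still adds exp(0) = 1 to the softmax denominator. This can be fixed by renormalizing, though. Let me know whether my assumptions are correct and whether this is a good enough starting point for your work.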
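One way to handle the softmax caveat, swapped in here in place of explicit renormalization, is to set the scores of disallowed positions to -inf before the softmax: exp(-inf) = 0, so those positions get zero weight and each row is normalized over the allowed positions only. The sketch below applies this to the question's attention (attn stands for self.attn, an nn.Linear), assuming hidden states of shape (..., seq_len, hidden_size); the function names loop_contexts and masked_contexts are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def loop_contexts(h, attn):
    # Reference: the question's loop for one sequence h of shape (seq_len, hidden)
    out = [h[0].view(1, -1)]
    for i in range(1, h.size(0)):
        prev = h[:i]
        curr = h[i].view(1, -1)
        weights = F.softmax(torch.mm(attn(curr), prev.t()), dim=1)
        out.append(torch.mm(weights, prev))
    return torch.cat(out)


def masked_contexts(h, attn):
    # h: (..., seq_len, hidden); a leading batch dimension is optional
    seq_len = h.size(-2)
    # scores[..., i, j] = attn(h_i) . h_j for every pair (i, j) at once
    scores = attn(h) @ h.transpose(-2, -1)
    # future[i, j] is True where j >= i, i.e. states that position i may not attend to
    idx = torch.arange(seq_len, device=h.device)
    future = idx.view(1, -1) >= idx.view(-1, 1)
    # Position 0 has no earlier states; letting it attend only to itself
    # gives weight 1 on h_0, which reproduces the loop's first context vector
    future[0, 0] = False
    scores = scores.masked_fill(future, float('-inf'))
    # exp(-inf) = 0, so each row is normalized over the allowed positions only
    weights = F.softmax(scores, dim=-1)
    return weights @ h


torch.manual_seed(0)
attn = nn.Linear(8, 8)
h = torch.randn(5, 8)  # stands in for the squeezed LSTM output of one sequence
print(torch.allclose(loop_contexts(h, attn), masked_contexts(h, attn), atol=1e-5))
```

Compared with v2, this builds one seq_len × seq_len score matrix instead of seq_len padded copies of x, so the extra memory grows with seq_len² rather than seq_len² × hidden_size. Without the special case for position 0, its row would be entirely -inf and the softmax would return NaN there.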

Votes 3
Original page content provided by Stack Overflow; translation support by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/53706462
