Question: PyTorch - WSD using LSTM

Stack Overflow user

Asked 2018-04-07 12:58:54

Answers: 1, Views: 461, Followers: 0, Votes: 1

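I'm trying to replicate Google's research paper on WSD with neural models using PyTorch.

I'm having some issues trying to overfit the model before training it on large datasets.

Using this training set:

The film was also intended to be the first in a trilogy.

This model definition: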

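import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch import autograd

class WordGuesser(nn.Module):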
    def __init__(self, hidden_dim, context_dim, embedding_dim, vocabulary_dim, batch_dim, window_dim):
        super(WordGuesser, self).__init__()
        self.hidden_dim = hidden_dim
        self.batch_dim = batch_dim
        self.window_dim = window_dim
        self.word_embeddings = nn.Embedding(vocabulary_dim, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        #self.extract_context = nn.Linear((2 * window_dim + 1) * hidden_dim, context_dim)
        self.extract_context = nn.Linear(hidden_dim, context_dim)
        self.predict = nn.Linear(context_dim, vocabulary_dim)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        return (autograd.Variable(torch.zeros(1, self.batch_dim, self.hidden_dim).cuda()),
                autograd.Variable(torch.zeros(1, self.batch_dim, self.hidden_dim).cuda()))

    def forward(self, sentence, hidden):
        embeddings = self.word_embeddings(sentence)
        out, self.hidden = self.lstm(embeddings.permute(1, 0, 2), hidden)
        lstm_out = out[-1]
        context = self.extract_context(lstm_out)
        prediction = self.predict(context)
        return prediction, context

And this training routine:

num_epoch = 100
hidden_units = 512
embedding_dim = 256
context_dim = 256

def mytrain():
    lines = open('training/overfit.txt').readlines()
    sentences = data.split_to_sentences(lines) #uses spaCy to detect sentences from each line
    word2idx=dict() #dictionary is built from the training set
    idx2word =dict()
    i = 0
    for s in sentences:
        for t in s.split(' '):
            if t in word2idx:
                continue
            word2idx[t] = i
            idx2word[i] = t
            i += 1
    word2idx['$'] = i #the token to guess the missing word in a sentence
    idx2word[i] = '$'
    X = list()
    Y = list()
    for sentence in sentences:
        sentence = sentence.split(' ')
        for i in range(len(sentence)):
            newsentence = list(sentence)
            newsentence[i] = '$'
            if not sentence[i] in word2idx:
                continue
            indices = [word2idx[w] for w in newsentence]
            label = word2idx[sentence[i]]
            X.append(indices)
            Y.append(label)
    model = WordGuesser(hidden_units, context_dim, embedding_dim, len(word2idx), len(X), len(X[0]))
    model.train()
    model.cuda()
    input = torch.LongTensor(X).cuda()
    output = torch.LongTensor(Y).cuda()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    model.hidden = model.init_hidden()
    for epoch in range(num_epoch):
        model.hidden = model.init_hidden()
        model.zero_grad()
        input_tensor = autograd.Variable(input)
        target_tensor = autograd.Variable(output)

        predictions, context = model(input_tensor, model.hidden)
        for i, prediction in enumerate(predictions):
            sorted_val = sorted(enumerate(np.array(prediction.data)), key=lambda x : x[1], reverse=True)
            print([(idx2word[x[0]], x[1]) for x in sorted_val[:5]], idx2word[Y[i]])
        loss = criterion(predictions, target_tensor)
        loss.backward()
        optimizer.step()

        print(epoch, loss.data[0])

    torch.save(model, "train2.pt")
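The data-preparation part of the routine above can be isolated as a small sketch. The helper name `build_masked_examples` is hypothetical, and the example assumes the sentence has already been tokenized with spaces (the final period as its own token), which is what the printed scores suggest but the question does not state. Each token position yields one training pair: the sentence with that position replaced by `$`, and the index of the hidden word as the label.

```python
def build_masked_examples(sentences, mask="$"):
    """Return (word2idx, X, Y): one masked copy of each sentence per token position."""
    word2idx = {}
    for sentence in sentences:
        for token in sentence.split(" "):
            # Assign the next free index the first time a token is seen.
            word2idx.setdefault(token, len(word2idx))
    word2idx[mask] = len(word2idx)  # the mask token gets the last index

    X, Y = [], []
    for sentence in sentences:
        tokens = sentence.split(" ")
        for i, target in enumerate(tokens):
            masked = tokens[:i] + [mask] + tokens[i + 1:]
            X.append([word2idx[t] for t in masked])
            Y.append(word2idx[target])
    return word2idx, X, Y


# Assumption: pre-tokenized input, with the period separated by a space.
sentence = "The film was also intended to be the first in a trilogy ."
word2idx, X, Y = build_masked_examples([sentence])
print(len(X), len(word2idx))
```

With a single sentence, the batch size passed to the model equals the number of tokens, since every position is masked exactly once.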

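During training, the model seems able to overfit after the 21st epoch, as you can see from the following scores (the top 5 words from each prediction; the last word on each line is the label for that sentence):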

[('The', 11.362326), ('film', 11.356865), ('also', 7.5573149), ('to', 5.3518314), ('intended', 4.3520432)] The
[('film', 11.073805), ('The', 10.451499), ('was', 7.5498624), ('was', 4.9684553), ('be', 4.0730805)] film
[('was', 11.232123), ('also', 9.9741745), ('the', 6.0156212), ('be', 4.9949703), ('The', 4.5516477)] was
[('also', 9.6998224), ('was', 9.6202812), ('The', 6.345758), ('film', 4.9122157), ('be', 2.6727715)] also
[('intended', 18.344809), ('to', 16.410078), ('film', 10.147289), ('The', 9.8423424), ('$', 9.6181822)] intended
[('to', 12.442947), ('intended', 10.900065), ('film', 8.2598763), ('The', 8.0493736), ('$', 4.4901967)] to
[('be', 12.189278), ('was', 7.7172523), ('was', 7.5415096), ('the', 5.2521734), ('The', 4.1723843)] be
[('the', 15.59604), ('be', 9.3750105), ('first', 8.9820032), ('was', 8.6859236), ('also', 5.0665498)] the
[('first', 10.191225), ('the', 5.1829329), ('in', 3.6020348), ('be', 3.4108081), ('a', 1.5569853)] first
[('in', 14.731103), ('first', 9.3131113), ('a', 5.982264), ('trilogy', 4.2928643), ('be', 0.49548936)] in
[('a', 14.357709), ('in', 8.3088198), ('trilogy', 6.3918238), ('first', 6.2178354), ('intended', 0.95656234)] a
[('trilogy', 14.351434), ('a', 4.5073452), ('in', 4.2348137), ('$', 3.7552347), ('intended', 3.5101018)] trilogy
[('.', 18.152126), ('$', 12.028764), ('to', 9.6003456), ('intended', 8.1202478), ('The', 4.9225812)] .

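When running another Python script that loads the model and queries the following sentences (using the same code that prints the scores during training; the word after each sentence is the expected label):

The film was also intended to $ the first in a trilogy. be
The film $ also intended to be the first in a trilogy. was
$ film was also intended to be the first in a trilogy. The

I get these scores: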

[('film', 24.066889), ('$', 20.107487), ('was', 16.855488), ('a', 12.969441), ('in', 8.1248817)] be
[('film', 24.089062), ('$', 20.116539), ('in', 16.891994), ('a', 12.982826), ('in', 8.1167336)] was
[('film', 23.993624), ('$', 20.108011), ('in', 16.891005), ('a', 12.960193), ('in', 8.1577587)] The

I also tried setting the model.train() mode to False, using model.eval(), and calling topk on the LSTM scores, but the results were not satisfactory.
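For reference, a minimal sketch of what calling topk on one row of scores could look like, as an alternative to sorting the whole row. The vocabulary and score values here are made up for illustration; in the question they would come from `idx2word` and one row of `predictions`.

```python
import torch

# Hypothetical 6-word vocabulary and scores, for illustration only.
idx2word = {0: "The", 1: "film", 2: "was", 3: "also", 4: "be", 5: "$"}
scores = torch.tensor([1.5, 4.0, 0.5, 3.0, 2.0, -1.0])

# torch.topk returns the k largest values and their indices, highest first.
values, indices = torch.topk(scores, k=3)
top_words = [(idx2word[i], v) for i, v in zip(indices.tolist(), values.tolist())]
print(top_words)
```

This only changes how the ranking is extracted; it does not alter the scores the model produces.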

lstm

pytorch

word-sense-disambiguation

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-04-08 15:24:05

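Solved by saving only the model's state_dict() via torch.save(), and then loading it back in the evaluation phase using model.load_state_dict().

Furthermore, I wrapped the sentence-querying loop in another loop that acts as a warm-up (got it from here). On the last iteration of that loop, I set model.eval() and printed the scores, which turned out to be correct.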
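A minimal sketch of that save and load pattern, under stated assumptions: `TinyGuesser` is a hypothetical CPU-sized stand-in for the question's `WordGuesser`, and an in-memory buffer takes the place of a file such as "train2.pt". It is not the poster's actual script.

```python
import io

import torch
import torch.nn as nn


class TinyGuesser(nn.Module):
    """Hypothetical stand-in for WordGuesser, small enough to run on CPU."""

    def __init__(self, vocab_size=14, embedding_dim=8, hidden_dim=16):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.predict = nn.Linear(hidden_dim, vocab_size)

    def forward(self, sentence):
        # (batch, seq_len) -> (seq_len, batch, embedding_dim), as nn.LSTM expects
        embeddings = self.word_embeddings(sentence).permute(1, 0, 2)
        out, _ = self.lstm(embeddings)  # initial hidden state defaults to zeros
        return self.predict(out[-1])


torch.manual_seed(0)
trained = TinyGuesser()  # pretend this model has just been trained

# Save only the parameters, not the pickled module object.
buffer = io.BytesIO()  # stands in for a path on disk
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# In the evaluation script: rebuild the architecture, then load the weights.
restored = TinyGuesser()
restored.load_state_dict(torch.load(buffer))
restored.eval()
trained.eval()

query = torch.randint(0, 14, (3, 13))  # 3 sentences of 13 token indices
with torch.no_grad():
    same = torch.equal(trained(query), restored(query))
print(same)
```

Because only tensors are stored, the evaluation script has to construct the model itself, so any per-run state such as a stored hidden tuple or a training-time batch size is not carried over from the training script.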

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/49707613
