Q: How do I add missing word vectors to the pre-trained GoogleNews-vectors-negative300.bin model?

Stack Overflow user

Asked 2015-11-29 06:05:33

Answers: 1 · Views: 3.8K · Followers: 0 · Votes: 2

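I am using the gensim word2vec library in Python with the pre-trained GoogleNews-vectors-negative300.bin model. However, my corpus contains words for which the model has no word vectors, so I get a KeyError. How can I solve this?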

Here is what I have tried so far.

1: Load the trained GoogleNews-vectors-negative300.bin model:

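from gensim.models import Word2Vec
# Newer gensim releases expose this loader as gensim.models.KeyedVectors.load_word2vec_format
model = Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
print("model loaded...")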

2: Build a vector for each training tweet by averaging the vectors of all its words, then scale:

import numpy as np

def buildWordVector(text, size):
    # Average the vectors of all in-vocabulary words in `text`.
    vec = np.zeros(size).reshape((1, size))
    count = 0.
    for word in text:
        try:
            vec += model[word].reshape((1, size))
            count += 1.
        except KeyError:
            print("not found!", word)  # word missing from the model
            continue
    if count != 0:
        vec /= count
    return vec

trained_vecs = np.concatenate([buildWordVector(z, n_dim) for z in x_train])

Please tell me how to add new words to the pre-trained Word2vec model.

python

nlp

gensim

word2vec

word-embedding

1 Answer

Stack Overflow user

Posted 2015-12-08 23:55:47

Edit 2019/06/07

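As pointed out by @Oleg Melnikov and https://rare-technologies.com/word2vec-tutorial/#online_training__resuming, it is not possible to resume training without the vocabulary tree (which is not saved once training with the C code has finished).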

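Note that it is not possible to resume training with models that were generated by the C tool and loaded with load_word2vec_format(). You can still use them for querying/similarity, but information vital for training (the vocabulary tree) is missing there.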
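Since vectors loaded this way remain usable for lookups, one way to avoid the KeyError from the question is to test membership before indexing and to report the words that were skipped. The following is a minimal sketch, not the answer author's method. It assumes only that the vector store supports `word in vectors` and `vectors[word]`; a plain dict of lists stands in for the loaded gensim model so that the example runs without the GoogleNews file, and `average_vector` is a hypothetical helper name.

```python
def average_vector(tokens, vectors, size):
    """Average the vectors of the tokens found in `vectors`; skip the rest.

    `vectors` is anything supporting `word in vectors` and `vectors[word]`
    (assumption: a plain dict stands in for the loaded model here).
    Returns (mean_vector, missing_tokens).
    """
    total = [0.0] * size
    found = 0
    missing = []
    for word in tokens:
        if word in vectors:
            # Element-wise sum of the running total and this word's vector.
            total = [t + float(v) for t, v in zip(total, vectors[word])]
            found += 1
        else:
            missing.append(word)
    if found:
        total = [t / found for t in total]
    return total, missing


# Toy 3-dimensional stand-in for the 300-dimensional GoogleNews vectors.
toy_vectors = {"good": [1.0, 0.0, 2.0], "movie": [3.0, 2.0, 0.0]}
mean, missing = average_vector(["good", "movie", "zzzunknown"], toy_vectors, 3)
print(mean)     # [2.0, 1.0, 1.0]
print(missing)  # ['zzzunknown']
```

If every token is missing, the function returns an all-zero vector, which matches the behaviour of the averaging code in the question.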

1: Get pre-trained vectors
2: Load the model in gensim
3: Continue training the model in gensim

These commands may come in handy:

# Loading pre-trained vectors
model = Word2Vec.load_word2vec_format('/tmp/vectors.bin', binary=True)

# Training the model with list of sentences (with 4 CPU cores)
model.train(sentences, workers=4)
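Given the note in this answer that training may not be resumable on vectors loaded from the C format, a workaround that is sometimes used instead is to keep a side table of vectors for the missing words. The sketch below is an illustration under stated assumptions rather than part of the original answer: `FallbackVectors` is a hypothetical helper (not a gensim class), a plain dict stands in for the loaded model, and the `scale` of 0.25 is an arbitrary choice. Randomly initialised vectors carry no semantic information; they only give each unknown word a stable placeholder.

```python
import random


class FallbackVectors:
    """Look up a word in pre-trained vectors, else in a side table of
    randomly initialised vectors (hypothetical helper, not a gensim class)."""

    def __init__(self, pretrained, size, scale=0.25, seed=0):
        self.pretrained = pretrained  # dict-like: word -> vector
        self.size = size              # dimensionality of the vectors
        self.scale = scale            # half-width of the uniform range (assumed value)
        self.extra = {}               # vectors created for missing words
        self._rng = random.Random(seed)

    def __getitem__(self, word):
        if word in self.pretrained:
            return self.pretrained[word]
        if word not in self.extra:
            # Create the placeholder once so the same word always maps
            # to the same vector.
            self.extra[word] = [self._rng.uniform(-self.scale, self.scale)
                                for _ in range(self.size)]
        return self.extra[word]


# Toy 3-dimensional stand-in for the pre-trained model.
toy = {"good": [1.0, 0.0, 2.0]}
lookup = FallbackVectors(toy, size=3)
print(lookup["good"])                 # [1.0, 0.0, 2.0]
first = lookup["zzzunknown"]
print(first == lookup["zzzunknown"])  # True
print(sorted(lookup.extra))           # ['zzzunknown']
```

The pre-trained table is never modified, so the placeholders can be inspected or discarded separately through `lookup.extra`.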

Votes: 1

Original content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/33976953
