Question: Understanding `tf.nn.nce_loss()` in TensorFlow

Stack Overflow user

Asked 2017-01-05 07:54:20

2 answers, 15K views, 0 followers, 24 votes

I am trying to understand the NCE loss function in TensorFlow. NCE loss is used for a word2vec task, for example:

import math
import tensorflow as tf  # TensorFlow 1.x API

# Look up embeddings for inputs.
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# Construct the variables for the NCE loss
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

# Compute the average NCE loss for the batch.
# tf.nn.nce_loss automatically draws a new sample of the negative labels each
# time we evaluate the loss.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=train_labels,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

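For more details, see TensorFlow's word2vec_basic.py.

What are the input and output matrices in the NCE function?

In a word2vec model, we are interested in building representations for words. During training, given a sliding window, every word has two embeddings: 1) when the word is the centre word; 2) when the word is a context word. These two embeddings are called the input vector and the output vector, respectively. (more explanations of input and output matrices)

It seems to me that the input matrix is embeddings and the output matrix is nce_weights. Is that right?

What is the final embedding?

According to a post by s0urcer, which is also about NCE, the final embedding matrix is just the input matrix. Meanwhile, some others say that final_embedding = input_matrix + output_matrix. Which is right, or more common?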

python

tensorflow

2 Answers

Stack Overflow user

Accepted answer

Posted 2017-05-22 01:19:10

Let's look at the relevant code in the word2vec example (examples/tutorials/word2vec).

embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

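These two lines create the embedding representation. embeddings is a matrix in which each row is a word vector. embedding_lookup is a fast way to fetch the vectors corresponding to train_inputs. In the word2vec example, train_inputs consists of int32 numbers that are the ids of the target words. More generally, embed can be replaced by any hidden-layer feature.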
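The lookup described above amounts to selecting rows of the embedding matrix by word id. Below is a minimal plain-Python sketch of that idea (no TensorFlow). The sizes, the ids in train_inputs, and the helper name embedding_lookup are illustrative assumptions, not values from the tutorial.

```python
import random

vocabulary_size = 5   # illustrative sizes, not taken from the tutorial
embedding_size = 3

random.seed(0)
# One row per word id, initialised uniformly between -1 and 1.
embeddings = [[random.uniform(-1.0, 1.0) for _ in range(embedding_size)]
              for _ in range(vocabulary_size)]

def embedding_lookup(table, ids):
    """Return the rows of `table` selected by the integer ids."""
    return [table[i] for i in ids]

train_inputs = [3, 0, 3]                  # a batch of word ids
embed = embedding_lookup(embeddings, train_inputs)

print(len(embed), len(embed[0]))          # 3 3  (batch_size, embedding_size)
```

A repeated id simply returns the same row twice, which is why the result has shape [batch_size, embedding_size] regardless of vocabulary size.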

# Construct the variables for the NCE loss
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

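These two lines create the parameters. They are updated by the optimizer during training. We can use tf.matmul(embed, tf.transpose(nce_weights)) + nce_biases to get the final output scores. In other words, the last inner-product layer of a classifier can be replaced by it.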
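To make the shapes in that expression concrete, here is a small plain-Python sketch of the same computation: one score per (example, class) pair, obtained as a dot product with the class's weight row plus its bias. The function name output_scores and the toy numbers are assumptions made for illustration.

```python
def output_scores(embed, nce_weights, nce_biases):
    """scores[b][c] = dot(embed[b], nce_weights[c]) + nce_biases[c]."""
    return [[sum(x * w for x, w in zip(vec, row)) + bias
             for row, bias in zip(nce_weights, nce_biases)]
            for vec in embed]

embed = [[1.0, 2.0], [0.0, -1.0]]                    # [batch_size=2, embedding_size=2]
nce_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # [num_classes=3, embedding_size=2]
nce_biases = [0.0, 0.5, -1.0]                        # [num_classes=3]

scores = output_scores(embed, nce_weights, nce_biases)
# Index of the highest-scoring class for each example (an argmax per row).
predicted = [max(range(len(row)), key=row.__getitem__) for row in scores]

print(scores)     # [[1.0, 2.5, 2.0], [0.0, -0.5, -2.0]]
print(predicted)  # [1, 0]
```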

loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,       # [vocab_size, embed_size]
                   biases=nce_biases,         # [vocab_size]
                   labels=train_labels,       # [bs, 1]
                   inputs=embed,              # [bs, embed_size]
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

These lines create the NCE loss; @garej has given a very good explanation. In the NCE algorithm, num_sampled is the number of negative samples.
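The role of num_sampled can be illustrated with a simplified, plain-Python sketch of an NCE-style loss for a single example: the true class is treated as a positive and num_sampled randomly drawn classes as negatives, each scored with a logistic loss. This is not TensorFlow's implementation. The function names are hypothetical, negatives are drawn uniformly, a drawn negative that happens to equal the true label is kept, and no sampling-probability correction is applied to the scores; all of these are assumptions of the sketch.

```python
import math
import random

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def nce_loss_single(vec, label, weights, biases, num_sampled, rng):
    """Simplified NCE-style loss for one example (see assumptions above)."""
    def score(c):
        return sum(x * w for x, w in zip(vec, weights[c])) + biases[c]

    # Draw negative classes uniformly at random (assumption of this sketch).
    negatives = [rng.randrange(len(weights)) for _ in range(num_sampled)]
    loss = -log_sigmoid(score(label))                         # true class: score pushed up
    loss += sum(-log_sigmoid(-score(c)) for c in negatives)   # negatives: scores pushed down
    return loss

rng = random.Random(0)
weights = [[0.0, 0.0] for _ in range(4)]   # 4 classes, embedding_size 2
biases = [0.0] * 4
loss = nce_loss_single([1.0, -1.0], label=2, weights=weights, biases=biases,
                       num_sampled=3, rng=rng)
print(loss)  # equals 4 * log(2) here, because every score is zero
```

Only 1 + num_sampled classes are scored per example, instead of all num_classes as in a full softmax, which is where the speed-up comes from.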

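To illustrate the use of NCE, we can apply it to the mnist example (examples/tutorials/mnist/mnist_deep.py) in the following two steps:

1. Replace embed with the hidden-layer output. The hidden layer has dimension 1024 and num_output is 10. The minimum value of num_sampled is 1. Remember to remove the last inner-product layer in deepnn().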

y_conv, keep_prob = deepnn(x)

num_sampled = 1
vocabulary_size = 10
embedding_size = 1024
with tf.device('/cpu:0'):
  embed = y_conv
  # Construct the variables for the NCE loss
  nce_weights = tf.Variable(
      tf.truncated_normal([vocabulary_size, embedding_size],
                          stddev=1.0 / math.sqrt(embedding_size)))
  nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

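2. Create the loss and compute the output. Once the output is computed, we can use it to calculate accuracy. Note that the labels here are not the one-hot vectors used with softmax; they are the original integer labels of the training samples.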

# y_idx holds the integer class labels (not one-hot), shape [batch_size, 1].
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=y_idx,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

# Scores over all 10 classes, used only for evaluation.
output = tf.matmul(y_conv, tf.transpose(nce_weights)) + nce_biases
correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_, 1))
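The answer does not show how y_idx is built. As a hedged illustration of the label format described in step 2, the plain-Python sketch below converts one-hot rows into a [batch_size, 1] list of integer class ids. The helper name one_hot_to_index_labels and the toy batch are assumptions.

```python
def one_hot_to_index_labels(one_hot_batch):
    """Convert one-hot rows to a [batch_size, 1] list of integer class ids."""
    return [[row.index(max(row))] for row in one_hot_batch]

y_one_hot = [
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
y_idx = one_hot_to_index_labels(y_one_hot)
print(y_idx)  # [[2], [0], [3]]
```

In TensorFlow the same conversion could presumably be done with tf.argmax followed by a reshape to [batch_size, 1]; that is a guess at how the full code does it, not something stated in the answer.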

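With num_sampled=1, the validation accuracy ends up around 98.8%. With num_sampled=9, we get almost the same validation accuracy as when training with softmax. Note, however, that NCE is different from softmax.

The full code for training MNIST with NCE can be found here. Hope it helps.

18 votes

Stack Overflow user

Posted 2017-01-31 20:19:05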

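The embeddings tensor is your final output matrix. It maps words to vectors. Use it in your word-prediction graph.

The input matrix is a batch of centre-word : context-word pairs (train_input and train_label respectively) generated from the training text.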
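As a rough illustration of how such centre-word : context-word pairs can be produced, the sketch below enumerates every pair within a fixed window over a sequence of word ids. The function name skipgram_pairs and the toy ids are assumptions, and the tutorial's own batch generator may sample context words rather than enumerate all of them.

```python
def skipgram_pairs(word_ids, window=1):
    """Return (centre, context) id pairs for every word within `window` of the centre."""
    pairs = []
    for i, centre in enumerate(word_ids):
        lo = max(0, i - window)
        hi = min(len(word_ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((centre, word_ids[j]))
    return pairs

word_ids = [10, 11, 12, 13]                  # a tiny "text" as word ids
pairs = skipgram_pairs(word_ids, window=1)
train_inputs = [c for c, _ in pairs]         # centre words, shape [batch_size]
train_labels = [[ctx] for _, ctx in pairs]   # context words, shape [batch_size, 1]

print(pairs)
# [(10, 11), (11, 10), (11, 12), (12, 11), (12, 13), (13, 12)]
```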

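While I don't yet know the exact workings of the nce_loss op, the basic idea is that it uses a single-layer network (parameters nce_weights and nce_biases) to map an input vector (selected from embeddings by the embed op) to an output word. It then compares the output with the training label (an adjacent word in the training text) and with a random sub-sample (num_sampled) of all the other words in the vocabulary, and modifies the input vector (stored in embeddings) and the network parameters to minimise the error.

3 votes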

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/41475180
