I am trying to understand the NCE loss function in TensorFlow. NCE loss is used for word2vec tasks, for example:
# Look up embeddings for inputs.
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# Construct the variables for the NCE loss.
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

# Compute the average NCE loss for the batch.
# tf.nn.nce_loss automatically draws a new sample of the negative labels
# each time we evaluate the loss.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=train_labels,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))
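To make the roles of the two matrices concrete, here is a minimal NumPy sketch of what `tf.nn.nce_loss` computes for a single (center word, context word) pair. All names and sizes are toy assumptions; the sketch uses plain sigmoid cross-entropy on one true pair and `num_sampled` noise pairs, and omits the log-uniform sampler's probability correction that the real implementation applies to the logits:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration.
vocabulary_size, embedding_size = 10, 4
num_sampled = 3

# Parameters mirroring the graph above.
embeddings = rng.uniform(-1.0, 1.0, (vocabulary_size, embedding_size))  # input matrix
nce_weights = rng.normal(0.0, 1.0 / np.sqrt(embedding_size),
                         (vocabulary_size, embedding_size))             # output matrix
nce_biases = np.zeros(vocabulary_size)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss_one_example(center_id, true_context_id):
    """Binary logistic loss on the true pair plus num_sampled noise pairs."""
    embed = embeddings[center_id]  # one row of the input matrix

    # Draw noise (negative) labels; tf.nn.nce_loss resamples these each step.
    noise_ids = rng.choice(vocabulary_size, size=num_sampled, replace=False)

    # Each logit is a dot product of an input row with an output row, plus a bias.
    true_logit = embed @ nce_weights[true_context_id] + nce_biases[true_context_id]
    noise_logits = nce_weights[noise_ids] @ embed + nce_biases[noise_ids]

    # The true pair should be classified as 1, the noise pairs as 0.
    return (-np.log(sigmoid(true_logit))
            - np.sum(np.log(1.0 - sigmoid(noise_logits))))

loss = nce_loss_one_example(center_id=2, true_context_id=5)
print(loss)
```

Note how `embeddings` is only ever indexed by the center word and `nce_weights` only by the context/noise words, which is what makes them the input and output matrices respectively.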
For more details, please refer to TensorFlow's word2vec_basic.py.
In the word2vec model, we are interested in building representations for words. During training, given a sliding window, every word has two embeddings: 1) when the word is the center word; 2) when the word is a context word. These two embeddings are called the input vector and the output vector, respectively. (more explanations of input and output matrices)
In my opinion, the input matrix is `embeddings` and the output matrix is `nce_weights`. Is that correct?
According to s0urcer's post, which is also about `nce_loss`, the final embedding matrix is just the input matrix. However, some others say `final_embedding = input_matrix + output_matrix`. Which one is correct, or more common?
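The two conventions in question can be written out in a few lines. This is only a sketch of the two options, not a claim about which is better; the matrix names are the toy ones from above. Keeping only the input matrix is what word2vec_basic.py does (it normalizes and returns `embeddings`); summing or averaging the two matrices is the convention popularized by GloVe:

```python
import numpy as np

rng = np.random.default_rng(0)
vocabulary_size, embedding_size = 10, 4

input_matrix = rng.uniform(-1.0, 1.0, (vocabulary_size, embedding_size))   # embeddings
output_matrix = rng.normal(size=(vocabulary_size, embedding_size))         # nce_weights

# Option 1 (word2vec_basic.py, s0urcer's answer): discard the output matrix.
final_embeddings_a = input_matrix

# Option 2 (GloVe-style): combine the two matrices, e.g. by summing.
final_embeddings_b = input_matrix + output_matrix

# Either way the result has one row per vocabulary word.
print(final_embeddings_a.shape, final_embeddings_b.shape)
```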
https://stackoverflow.com/questions/41475180