# "Everything you know about word2vec is wrong": when paper and code are worlds apart, is that now the norm?

##### 栗子 from 凹非寺 | Produced by 量子位 (QbitAI), WeChat official account QbitAI

word2vec is a language tool that Google open-sourced in 2013.

"Everything you know about word2vec is wrong."

## A different sky

word2vec has a classic explanation (for Skip-Gram with negative sampling), the one given in the paper and in countless blog posts:
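For reference (this formula is paraphrased from the Mikolov et al. negative-sampling paper, not taken from this article), the objective usually presented for a center word $w$ with true context $c$ and $k$ negative samples is:

```latex
\log \sigma\!\left({v'_{c}}^{\top} v_{w}\right)
  \;+\; \sum_{i=1}^{k} \mathbb{E}_{n_i \sim P_n(w)}
    \left[ \log \sigma\!\left(-{v'_{n_i}}^{\top} v_{w}\right) \right]
```

Here $v_w$ is the center-word (input) vector, $v'$ the context (output) vectors, and $P_n(w)$ the noise distribution. The initialization and update details in the C code below do not appear anywhere in this formulation.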

(Most people who use word2vec for word embeddings either call the C implementation directly or use the gensim implementation, which was ported from the C code, keeping even the variable names unchanged.)

### What the C implementation looks like

The syn0 array holds a word's vector when that word acts as the center word. It is randomly initialized.

https://github.com/tmikolov/word2vec/blob/20c129af10659f7c50e86e3be406df663beff438/word2vec.c#L369

```c
for (a = 0; a < vocab_size; a++) for (b = 0; b < layer1_size; b++) {
  next_random = next_random * (unsigned long long)25214903917 + 11;
  syn0[a * layer1_size + b] =
     (((next_random & 0xFFFF) / (real)65536) - 0.5) / layer1_size;
}
```
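The initialization above can be replayed in a few lines of Python (a sketch, not part of the original code; it reuses the same linear congruential generator constants as the C source):

```python
# Reproduce word2vec's syn0 random init: a 64-bit LCG drives
# uniform values in [-0.5/layer1_size, 0.5/layer1_size).
vocab_size, layer1_size = 3, 4
next_random = 1  # the C code seeds this per thread; 1 is an arbitrary choice here
syn0 = []
for a in range(vocab_size):
    for b in range(layer1_size):
        # emulate unsigned long long overflow with a 64-bit mask
        next_random = (next_random * 25214903917 + 11) & 0xFFFFFFFFFFFFFFFF
        syn0.append(((next_random & 0xFFFF) / 65536 - 0.5) / layer1_size)
```

Note the deliberately small scale: every entry is bounded by 0.5 / layer1_size, so the initial vectors are close to zero.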

The syn1neg array holds the word's vector when it acts as context. It is zero-initialized.

https://github.com/tmikolov/word2vec/blob/20c129af10659f7c50e86e3be406df663beff438/word2vec.c#L365

```c
for (a = 0; a < vocab_size; a++) for (b = 0; b < layer1_size; b++)
  syn1neg[a * layer1_size + b] = 0;
```

And here is the negative-sampling training loop:

```c
if (negative > 0) for (d = 0; d < negative + 1; d++) {
  // if we are performing negative sampling, in the 1st iteration,
  // pick a word from the context and set the dot product target to 1
  if (d == 0) {
    target = word;
    label = 1;
  } else {
    // for all other iterations, pick a word randomly and set the dot
    // product target to 0
    next_random = next_random * (unsigned long long)25214903917 + 11;
    target = table[(next_random >> 16) % table_size];
    if (target == 0) target = next_random % (vocab_size - 1) + 1;
    if (target == word) continue;
    label = 0;
  }
  l2 = target * layer1_size;
  f = 0;

  // find dot product of original vector with negative sample vector
  // store in f
  for (c = 0; c < layer1_size; c++) f += syn0[c + l1] * syn1neg[c + l2];

  // set g = sigmoid(f) (roughly, the actual formula is slightly more complex)
  if (f > MAX_EXP) g = (label - 1) * alpha;
  else if (f < -MAX_EXP) g = (label - 0) * alpha;
  else g = (label - expTable[(int)((f + MAX_EXP) * (EXP_TABLE_SIZE / MAX_EXP / 2))]) * alpha;

  // 1. update the vector syn1neg,
  // 2. DO NOT UPDATE syn0
  // 3. STORE THE syn0 gradient in a temporary buffer neu1e
  for (c = 0; c < layer1_size; c++) neu1e[c] += g * syn1neg[c + l2];
  for (c = 0; c < layer1_size; c++) syn1neg[c + l2] += g * syn0[c + l1];
}
```

Finally, after all samples, syn0 is updated from neu1e:

https://github.com/tmikolov/word2vec/blob/20c129af10659f7c50e86e3be406df663beff438/word2vec.c#L541

```c
// Learn weights input -> hidden
for (c = 0; c < layer1_size; c++) syn0[c + l1] += neu1e[c];
```
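The control flow of that C loop can be condensed into a Python sketch (the function name `sgns_step` and its signature are mine, not from the original code; the update order mirrors the C source):

```python
import math

def sgns_step(syn0, syn1neg, word, context, neg_samples, alpha):
    """One SGNS update: `word` is the input word id, `context` the
    positive target id (label 1), `neg_samples` ids with label 0."""
    dim = len(syn0[word])
    neu1e = [0.0] * dim  # temporary buffer for the syn0 gradient
    for target, label in [(context, 1)] + [(t, 0) for t in neg_samples]:
        f = sum(syn0[word][c] * syn1neg[target][c] for c in range(dim))
        g = (label - 1.0 / (1.0 + math.exp(-f))) * alpha
        for c in range(dim):
            neu1e[c] += g * syn1neg[target][c]       # accumulate, don't apply yet
            syn1neg[target][c] += g * syn0[word][c]  # context vector updated now
    for c in range(dim):  # only after all samples does syn0 change
        syn0[word][c] += neu1e[c]

# tiny demo: 2-dim vectors, one positive and one negative sample
syn0 = [[0.1, -0.2], [0.05, 0.3], [0.0, 0.1]]
syn1neg = [[0.0, 0.0] for _ in range(3)]
sgns_step(syn0, syn1neg, word=0, context=1, neg_samples=[2], alpha=0.025)
```

The asymmetry is the whole point: syn1neg vectors are nudged immediately inside the loop, while the syn0 gradient is buffered in neu1e and applied once at the end.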

## Links

https://github.com/bollu/bollu.github.io

https://news.ycombinator.com/item?id=20089515
