diff --git a/chapter_natural-language-processing/word2vec.md b/chapter_natural-language-processing/word2vec.md
index f201b6e26ffd1d537bd51aebe4f87d21f98fc461..675836646cedf347a535556a94f30b94985acfa5
--- a/chapter_natural-language-processing/word2vec.md
+++ b/chapter_natural-language-processing/word2vec.md
@@ -130,15 +130,7 @@ $$ - \text{log} \mathbb{P} (w_o \mid w_c) = -\text{log} \frac{1}{1+\text{exp}(-\
 When $K$ takes a small value, the cost of computing the gradient in each step of stochastic gradient descent drops from $\mathcal{O}(|\mathcal{V}|)$ to $\mathcal{O}(K)$.
 
-We can also apply negative sampling to the continuous bag-of-words model. The loss function for the context words $w^{(t-m)}, \ldots, w^{(t-1)}, w^{(t+1)}, \ldots, w^{(t+m)}$ generating the center word $w^{(t)}$,
-
-$$-\text{log} \mathbb{P}(w^{(t)} \mid w^{(t-m)}, \ldots, w^{(t-1)}, w^{(t+1)}, \ldots, w^{(t+m)}),$$
-
-can be approximated under negative sampling as
-
-$$-\text{log} \frac{1}{1+\text{exp}[-\mathbf{u}_c^\top (\mathbf{v}_{o_1} + \ldots + \mathbf{v}_{o_{2m}}) /(2m)]} - \sum_{k=1, w_k \sim \mathbb{P}(w)}^K \text{log} \frac{1}{1+\text{exp}[\mathbf{u}_{i_k}^\top (\mathbf{v}_{o_1} + \ldots + \mathbf{v}_{o_{2m}}) /(2m)]}.$$
-
-Likewise, when $K$ takes a small value, the cost of computing the gradient in each step of stochastic gradient descent drops from $\mathcal{O}(|\mathcal{V}|)$ to $\mathcal{O}(K)$.
 
 ## Hierarchical softmax
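
To make the $\mathcal{O}(K)$ claim in the retained context concrete, here is a minimal NumPy sketch (not part of the patch; all function and variable names are illustrative) of the skip-gram negative-sampling loss that this hunk leaves in place: for one center/context pair it touches only the positive word plus the $K$ sampled noise words, so each gradient step costs $\mathcal{O}(K)$ rather than $\mathcal{O}(|\mathcal{V}|)$.

```python
import numpy as np

def log_sigmoid(x):
    # log(1 / (1 + exp(-x))), computed stably via logaddexp.
    return -np.logaddexp(0.0, -x)

def skip_gram_neg_loss(v_c, u_o, u_neg):
    """Negative-sampling loss for one (center, context) pair.

    v_c   : (d,)   center-word vector  v_c
    u_o   : (d,)   context-word vector u_o (the positive example)
    u_neg : (K, d) vectors u_{i_k} of the K sampled noise words

    Computes -log sigma(u_o^T v_c) - sum_k log sigma(-u_{i_k}^T v_c),
    i.e. the loss in the hunk header: K + 1 dot products per step
    instead of one per word in the vocabulary V.
    """
    positive = log_sigmoid(u_o @ v_c)              # true context word
    negatives = log_sigmoid(-(u_neg @ v_c)).sum()  # K noise words
    return -(positive + negatives)

# Toy usage: embedding size d = 4, K = 5 noise words.
rng = np.random.default_rng(0)
d, K = 4, 5
print(skip_gram_neg_loss(rng.normal(size=d),
                         rng.normal(size=d),
                         rng.normal(size=(K, d))))
```

The CBOW variant deleted by this patch would differ only in replacing `v_c` with the average of the $2m$ context-word vectors, $(\mathbf{v}_{o_1} + \ldots + \mathbf{v}_{o_{2m}})/(2m)$.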