restore 0.005

e0fe8912 · Aston Zhang · 303e427d · e0fe8912

Showing 1 changed file with 7 additions and 7 deletions

chapter_natural-language-processing/word2vec-gluon.md +7 -7

--- a/chapter_natural-language-processing/word2vec-gluon.md
+++ b/chapter_natural-language-processing/word2vec-gluon.md
@@ -251,7 +251,7 @@ loss = gloss.SigmoidBinaryCrossEntropyLoss()

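 It is worth noting that a mask variable lets us specify which predictions and labels in a minibatch take part in the loss computation: where the mask is 1, the prediction and label at that position are included; where the mask is 0, they are excluded. As mentioned earlier, the mask variable can be used to keep padding from affecting the loss computation.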

-```{.python .input}
+```{.python .input  n=20}
 pred = nd.array([[1.5, 0.3, -1, 2], [1.1, -0.6, 2.2, 0.4]])
 # In the label variable, 1 and 0 denote context words and noise words, respectively.
 label = nd.array([[1, 0, 0, 0], [1, 1, 0, 0]])
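
Note on the hunk above: the masked loss it describes can be sketched in plain Python, without MXNet. This is a minimal illustration, not the chapter's implementation. The `pred` and `label` values are copied from the diff; the `mask` values are an assumption (its definition falls outside the visible hunks), and the helper names `sigmoid_bce` and `masked_mean_loss` are hypothetical.

```python
import math

def sigmoid_bce(x, y):
    # Binary cross-entropy on a raw score x with label y in {0, 1},
    # written in the numerically stable form
    # max(x, 0) - x * y + log(1 + exp(-|x|)).
    return max(x, 0) - x * y + math.log1p(math.exp(-abs(x)))

def masked_mean_loss(pred, label, mask):
    # For each row, sum the loss only where mask == 1,
    # then divide by the number of unmasked positions.
    losses = []
    for p_row, l_row, m_row in zip(pred, label, mask):
        total = sum(sigmoid_bce(p, l) * m
                    for p, l, m in zip(p_row, l_row, m_row))
        losses.append(total / sum(m_row))
    return losses

pred = [[1.5, 0.3, -1, 2], [1.1, -0.6, 2.2, 0.4]]
label = [[1, 0, 0, 0], [1, 1, 0, 0]]
# Assumed mask: the last entry of the second row is treated as padding.
mask = [[1, 1, 1, 1], [1, 1, 1, 0]]

losses = masked_mean_loss(pred, label, mask)
print(['%.7f' % v for v in losses])
```

The sketch divides by the per-row count of unmasked entries directly. The diff instead rescales the Gluon loss with `mask.shape[1] / mask.sum(axis=1)`, which suggests that loss averages over every position in a row, padding included.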
@@ -261,7 +261,7 @@ loss(pred, label, mask) * mask.shape[1] / mask.sum(axis=1)

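 For comparison, we now implement the binary cross-entropy loss from scratch and, using the mask variable `mask`, compute the loss over only those predictions and labels whose mask is 1.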

-```{.python .input}
+```{.python .input  n=21}
 def sigmd(x):
    return -math.log(1 / (1 + math.exp(-x)))

@@ -273,7 +273,7 @@ print('%.7f' % ((sigmd(1.1) + sigmd(-0.6) + sigmd(-2.2)) / 3))

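 We construct separate embedding layers for the center words and the context words, and set the hyperparameter `embed_size`, the word vector dimension, to 100.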

-```{.python .input  n=20}
+```{.python .input  n=22}
 embed_size = 100
 net = nn.Sequential()
 net.add(nn.Embedding(input_dim=len(idx_to_token), output_dim=embed_size),
@@ -284,7 +284,7 @@ net.add(nn.Embedding(input_dim=len(idx_to_token), output_dim=embed_size),

 Next we define the training function. Because of padding, the loss computation differs slightly from that in the previous training function.

-```{.python .input  n=21}
+```{.python .input  n=23}
 def train(net, lr, num_epochs):
    ctx = gb.try_gpu()
    net.initialize(ctx=ctx, force_reinit=True)
@@ -310,15 +310,15 @@ def train(net, lr, num_epochs):

 Now we can train the skip-gram model with negative sampling.

-```{.python .input  n=22}
-train(net, 0.00005, 8)
+```{.python .input  n=24}
+train(net, 0.005, 8)
 ```

 ## Applying the Word Embedding Model

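 After training the word embedding model, we can use the cosine similarity of two word vectors to measure how semantically similar the two words are. With the trained word embedding model, the words semantically closest to “chip” are mostly related to semiconductor chips.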

-```{.python .input  n=23}
+```{.python .input  n=25}
 def get_similar_tokens(query_token, k, embed):
    W = embed.weight.data()
    x = W[token_to_idx[query_token]]