# Step 5. Do the backward pass and update the gradient
loss.backward()
optimizer.step()

# Get the Python number from a 1-element Tensor by calling tensor.item()
total_loss += loss.item()
losses.append(total_loss)
print(losses)  # The loss decreased every iteration over the training data!
```
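A minimal sketch of the `tensor.item()` call used above: it extracts a plain Python number from a one-element tensor, which is why it is used to accumulate the loss. The tensor value here is a placeholder.

```python
import torch

# .item() converts a 1-element Tensor into a plain Python number,
# detached from the computation graph.
t = torch.tensor([3.5])
x = t.item()
print(type(x), x)  # <class 'float'> 3.5
```

Calling `.item()` on a tensor with more than one element raises an error; use `.tolist()` in that case.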
## Exercise: Computing Word Embeddings: Continuous Bag-of-Words
The Continuous Bag-of-Words model (CBOW) is frequently used in NLP deep learning. It is a model that tries to predict a word given the context of a few words before and a few words after it. This is distinct from language modeling, since CBOW is not sequential and does not have to be probabilistic. Typically, CBOW is used to quickly train word embeddings, and these embeddings are used to initialize the embeddings of some more complicated model. Usually, this is referred to as _pretraining embeddings_. It almost always improves performance by a couple of percent.
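As a concrete illustration of the (context, target) pairs CBOW trains on, here is a short sketch that slides a window of N = 2 words on each side over a sentence; the example sentence and variable names are placeholders of my choosing:

```python
# Build (context, target) training pairs for CBOW with N words
# of context on each side of the target (here N = 2).
N = 2
raw_text = "we are about to study the idea of a computational process".split()

data = []
for i in range(N, len(raw_text) - N):
    # N words before the target plus N words after it
    context = raw_text[i - N:i] + raw_text[i + 1:i + N + 1]
    target = raw_text[i]
    data.append((context, target))

print(data[0])  # (['we', 'are', 'to', 'study'], 'about')
```

Each pair asks the model to recover the target word from the surrounding context words.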
The CBOW model is as follows. Given a target word `\(w_i\)` and an `\(N\)` context window on each side, `\(w_{i-1}, \dots, w_{i-N}\)` and `\(w_{i+1}, \dots, w_{i+N}\)`, referring to all context words collectively as `\(C\)`, CBOW tries to minimize