Unverified commit 8dda1694, authored by: J jzhang533, committed by: GitHub

seq2seq with attention added (#882)

* seq2seq with attention added

* image_search use latest API

* update seq2seq with attention
Parent 04af4d42
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 使用注意力机制的LSTM的机器翻译\n",
"\n",
"本示例教程介绍如何使用飞桨完成一个机器翻译任务。我们将会使用飞桨提供的LSTM的API,组建一个`sequence to sequence with attention`的机器翻译的模型,并在示例的数据集上完成从英文翻译成中文的机器翻译。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 环境设置\n",
"\n",
"本示例教程基于飞桨2.0版本。"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.0.0\n",
"89af2088b6e74bdfeef2d4d78e08461ed2aafee5\n"
]
}
],
"source": [
"import paddle\n",
"import paddle.nn.functional as F\n",
"import re\n",
"import numpy as np\n",
"\n",
"paddle.disable_static()\n",
"print(paddle.__version__)\n",
"print(paddle.__git_commit__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 下载数据集\n",
"\n",
"我们将使用 [http://www.manythings.org/anki/](http://www.manythings.org/anki/) 提供的中英文的英汉句对作为数据集,来完成本任务。该数据集含有23610个中英文双语的句对。"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2020-09-04 14:06:10-- https://www.manythings.org/anki/cmn-eng.zip\n",
"Resolving www.manythings.org (www.manythings.org)... 104.24.108.196, 104.24.109.196, 172.67.173.198, ...\n",
"Connecting to www.manythings.org (www.manythings.org)|104.24.108.196|:443... connected.\n",
"HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable\n",
"\n",
" The file is already fully retrieved; nothing to do.\n",
"\n",
"Archive: cmn-eng.zip\n"
]
}
],
"source": [
"!wget -c https://www.manythings.org/anki/cmn-eng.zip && unzip -f cmn-eng.zip"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 23610 cmn.txt\r\n"
]
}
],
"source": [
"!wc -l cmn.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 构建双语句对的数据结构\n",
"\n",
"接下来我们通过处理下载下来的双语句对的文本文件,将双语句对读入到python的数据结构中。这里做了如下的处理。\n",
"\n",
"- 对于英文,首先会把全部英文都变成小写,并只保留英文的单词。\n",
"- 对于中文,为了简便起见,未做分词,按照字做了切分。\n",
"- 为了后续的程序运行的更快,我们通过限制句子长度,和只保留部分英文单词开头的句子的方式,得到了一个较小的数据集。这样得到了一个有5508个句对的数据集。"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"MAX_LEN = 10"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5508\n",
"(['i', 'won'], ['我', '赢', '了', '。'])\n",
"(['he', 'ran'], ['他', '跑', '了', '。'])\n",
"(['i', 'quit'], ['我', '退', '出', '。'])\n",
"(['i', 'm', 'ok'], ['我', '沒', '事', '。'])\n",
"(['i', 'm', 'up'], ['我', '已', '经', '起', '来', '了', '。'])\n",
"(['we', 'try'], ['我', '们', '来', '试', '试', '。'])\n",
"(['he', 'came'], ['他', '来', '了', '。'])\n",
"(['he', 'runs'], ['他', '跑', '。'])\n",
"(['i', 'agree'], ['我', '同', '意', '。'])\n",
"(['i', 'm', 'ill'], ['我', '生', '病', '了', '。'])\n"
]
}
],
"source": [
"\n",
"lines = open('cmn.txt', encoding='utf-8').read().strip().split('\\n')\n",
"words_re = re.compile(r'\\w+')\n",
"\n",
"pairs = []\n",
"for l in lines:\n",
" en_sent, cn_sent, _ = l.split('\\t')\n",
" pairs.append((words_re.findall(en_sent.lower()), list(cn_sent)))\n",
"\n",
"# create a smaller dataset to make the demo process faster\n",
"filtered_pairs = []\n",
"\n",
"for x in pairs:\n",
" if len(x[0]) < MAX_LEN and len(x[1]) < MAX_LEN and \\\n",
" x[0][0] in ('i', 'you', 'he', 'she', 'we', 'they'):\n",
" filtered_pairs.append(x)\n",
"\n",
" \n",
"print(len(filtered_pairs))\n",
"for x in filtered_pairs[:10]: print(x) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 创建词表\n",
"\n",
"接下来我们分别创建中英文的词表,这两份词表会用来将中文的句子转换为词的ID构成的序列。词表中还加入了如下三个特殊的词:\n",
"- `<pad>`: 用来对较短的句子进行填充。\n",
"- `<bos>`: \"begin of sentence\", 表示句子的开始的特殊词。\n",
"- `<eos>`: \"end of sentence\", 表示句子的结束的特殊词。\n",
"\n",
"Note: 在实际的任务中,可能还需要通过`<unk>`(或者`<oov>`)特殊词来表示未在词表中出现的词。"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2539\n",
"2039\n"
]
}
],
"source": [
"en_vocab = {}\n",
"cn_vocab = {}\n",
"\n",
"# create special token for unkown, begin of sentence, end of sentence\n",
"en_vocab['<pad>'], en_vocab['<bos>'], en_vocab['<eos>'] = 0, 1, 2\n",
"cn_vocab['<pad>'], cn_vocab['<bos>'], cn_vocab['<eos>'] = 0, 1, 2\n",
"\n",
"#print(en_vocab, cn_vocab)\n",
"\n",
"en_idx, cn_idx = 3, 3\n",
"\n",
"for en, cn in filtered_pairs:\n",
" for w in en: \n",
" if w not in en_vocab: \n",
" en_vocab[w] = en_idx\n",
" en_idx += 1\n",
" for w in cn: \n",
" if w not in cn_vocab: \n",
" cn_vocab[w] = cn_idx\n",
" cn_idx += 1\n",
"\n",
"print(len(list(en_vocab)))\n",
"print(len(list(cn_vocab)))"
]
},
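{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted above, a real task usually also needs an `<unk>` token. The cell below is a minimal sketch, not used in the rest of this tutorial, of how out-of-vocabulary words could be mapped to a hypothetical `UNK_ID`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a minimal sketch of <unk> handling (hypothetical; not used later in this tutorial)\n",
"# UNK_ID is an extra id reserved here for any word that is not in en_vocab\n",
"UNK_ID = len(en_vocab)\n",
"\n",
"def encode_en(words):\n",
"    # words already in en_vocab keep their ids; unseen words map to UNK_ID\n",
"    return [en_vocab.get(w, UNK_ID) for w in words]\n",
"\n",
"print(encode_en(['i', 'won']))\n",
"print(encode_en(['i', 'bought', 'a', 'spaceship']))"
]
},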
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 创建padding过的数据集\n",
"\n",
"接下来根据词表,我们将会创建一份实际的用于训练的用numpy array组织起来的数据集。\n",
"- 所有的句子都通过`<pad>`补充成为了长度相同的句子。\n",
"- 对于英文句子(源语言),我们将其反转了过来,这会带来更好的翻译的效果。\n",
"- 所创建的`padded_cn_label_sents`是训练过程中的预测的目标,即,每个中文的当前词去预测下一个词是什么词。\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(5508, 11)\n",
"(5508, 12)\n",
"(5508, 12)\n"
]
}
],
"source": [
"# create padded datasets\n",
"padded_en_sents = []\n",
"padded_cn_sents = []\n",
"padded_cn_label_sents = []\n",
"for en, cn in filtered_pairs:\n",
" # reverse source sentence\n",
" padded_en_sent = en + ['<eos>'] + ['<pad>'] * (MAX_LEN - len(en))\n",
" padded_en_sent.reverse()\n",
" padded_cn_sent = ['<bos>'] + cn + ['<eos>'] + ['<pad>'] * (MAX_LEN - len(cn))\n",
" padded_cn_label_sent = cn + ['<eos>'] + ['<pad>'] * (MAX_LEN - len(cn) + 1) \n",
"\n",
" padded_en_sents.append([en_vocab[w] for w in padded_en_sent])\n",
" padded_cn_sents.append([cn_vocab[w] for w in padded_cn_sent])\n",
" padded_cn_label_sents.append([cn_vocab[w] for w in padded_cn_label_sent])\n",
"\n",
"train_en_sents = np.array(padded_en_sents)\n",
"train_cn_sents = np.array(padded_cn_sents)\n",
"train_cn_label_sents = np.array(padded_cn_label_sents)\n",
"\n",
"\n",
"print(train_en_sents.shape)\n",
"print(train_cn_sents.shape)\n",
"print(train_cn_label_sents.shape)"
]
},
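{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of the arrays above, the sketch below prints the first example: the reversed, padded source ids, the decoder input ids, and the label ids, which are the same target sequence shifted left by one step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sanity-check sketch: inspect the first padded example\n",
"print(filtered_pairs[0])\n",
"print(train_en_sents[0])        # reversed source ids, padded to MAX_LEN + 1\n",
"print(train_cn_sents[0])        # <bos> + target ids + <eos> + padding (length MAX_LEN + 2)\n",
"print(train_cn_label_sents[0])  # the same target shifted left by one step"
]
},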
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 创建网络\n",
"\n",
"我们将会创建一个Encoder-AttentionDecoder架构的模型结构用来完成机器翻译任务。\n",
"首先我们将设置一些必要的网络结构中用到的参数。"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"embedding_size = 128\n",
"hidden_size = 256\n",
"num_encoder_lstm_layers = 1\n",
"en_vocab_size = len(list(en_vocab))\n",
"cn_vocab_size = len(list(cn_vocab))\n",
"epochs = 30\n",
"batch_size = 16"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Encoder部分\n",
"\n",
"在编码器的部分,我们通过查找完Embedding之后接一个LSTM的方式构建一个对源语言编码的网络。飞桨的RNN系列的API,除了LSTM之外,还提供了SimleRNN, GRU供使用,同时,还可以使用反向RNN,双向RNN,多层RNN等形式。也可以通过`dropout`参数设置是否对多层RNN的中间层进行`dropout`处理,来防止过拟合。\n",
"\n",
"除了使用序列到序列的RNN操作之外,也可以通过SimpleRNN, GRUCell, LSTMCell等API更灵活的创建单步的RNN计算,甚至通过集成RNNCellBase来实现自己的RNN计算单元。"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# encoder: simply learn representation of source sentence\n",
"class Encoder(paddle.nn.Layer):\n",
" def __init__(self):\n",
" super(Encoder, self).__init__()\n",
" self.emb = paddle.nn.Embedding(en_vocab_size, embedding_size,)\n",
" self.lstm = paddle.nn.LSTM(input_size=embedding_size, \n",
" hidden_size=hidden_size, \n",
" num_layers=num_encoder_lstm_layers)\n",
"\n",
" def forward(self, x):\n",
" x = self.emb(x)\n",
" x, (_, _) = self.lstm(x)\n",
" return x"
]
},
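{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a hedged variant sketch, for illustration only, that relies on the `num_layers` and `dropout` parameters mentioned above: a 2-layer LSTM encoder with dropout applied between the layers. The rest of the tutorial keeps using the single-layer `Encoder` defined above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# variant sketch (illustration only; the single-layer Encoder above is what we train)\n",
"# two stacked LSTM layers with dropout between them to reduce overfitting\n",
"class DeepEncoder(paddle.nn.Layer):\n",
"    def __init__(self):\n",
"        super(DeepEncoder, self).__init__()\n",
"        self.emb = paddle.nn.Embedding(en_vocab_size, embedding_size)\n",
"        self.lstm = paddle.nn.LSTM(input_size=embedding_size,\n",
"                                   hidden_size=hidden_size,\n",
"                                   num_layers=2,\n",
"                                   dropout=0.2)\n",
"\n",
"    def forward(self, x):\n",
"        x = self.emb(x)\n",
"        x, (_, _) = self.lstm(x)\n",
"        return x"
]
},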
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AttentionDecoder部分\n",
"\n",
"在解码器部分,我们通过一个带有注意力机制的LSTM来完成解码。\n",
"\n",
"- 单步的LSTM:在解码器的实现的部分,我们同样使用LSTM,与Encoder部分不同的是,下面的代码,每次只让LSTM往前计算一次。整体的recurrent部分,是在训练循环内完成的。\n",
"- 注意力机制:这里使用了一个由两个Linear组成的网络来完成注意力机制的计算,它用来计算出目标语言在每次翻译一个词的时候,需要对源语言当中的每个词需要赋予多少的权重。\n",
"- 对于第一次接触这样的网络结构来说,下面的代码在理解起来可能稍微有些复杂,你可以通过插入打印每个tensor在不同步骤时的形状的方式来更好的理解。"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# only move one step of LSTM, \n",
"# the recurrent loop is implemented inside training loop\n",
"class AttentionDecoder(paddle.nn.Layer):\n",
" def __init__(self):\n",
" super(AttentionDecoder, self).__init__()\n",
" self.emb = paddle.nn.Embedding(cn_vocab_size, embedding_size)\n",
" \n",
" # the lstm layer for to generate target sentence representation\n",
" self.lstm = paddle.nn.LSTM(input_size=embedding_size + hidden_size, \n",
" hidden_size=hidden_size)\n",
" \n",
" # for computing attention weights\n",
" self.attention_linear1 = paddle.nn.Linear(hidden_size * 2, hidden_size)\n",
" self.attention_linear2 = paddle.nn.Linear(hidden_size, 1)\n",
" \n",
" # for computing output logits\n",
" self.outlinear =paddle.nn.Linear(hidden_size, cn_vocab_size)\n",
"\n",
"\n",
" def forward(self, x, previous_hidden, previous_cell, encoder_outputs):\n",
" x = self.emb(x)\n",
" \n",
" attention_inputs = paddle.concat((encoder_outputs, \n",
" paddle.tile(previous_hidden, repeat_times=[1, MAX_LEN+1, 1])),\n",
" axis=-1\n",
" )\n",
"\n",
" attention_hidden = self.attention_linear1(attention_inputs)\n",
" attention_hidden = F.tanh(attention_hidden)\n",
" attention_logits = self.attention_linear2(attention_hidden)\n",
" attention_logits = paddle.squeeze(attention_logits)\n",
"\n",
" \n",
" attention_weights = F.softmax(attention_logits) \n",
" attention_weights = paddle.expand_as(paddle.unsqueeze(attention_weights, -1), \n",
" encoder_outputs)\n",
"\n",
" context_vector = paddle.multiply(encoder_outputs, attention_weights) \n",
" context_vector = paddle.reduce_sum(context_vector, 1)\n",
" context_vector = paddle.unsqueeze(context_vector, 1)\n",
" \n",
" lstm_input = paddle.concat((x, context_vector), axis=-1)\n",
"\n",
" # LSTM requirement to previous hidden/state: \n",
" # (number_of_layers * direction, batch, hidden)\n",
" previous_hidden = paddle.transpose(previous_hidden, [1, 0, 2])\n",
" previous_cell = paddle.transpose(previous_cell, [1, 0, 2])\n",
" \n",
" x, (hidden, cell) = self.lstm(lstm_input, (previous_hidden, previous_cell))\n",
" \n",
" # change the return to (batch, number_of_layers * direction, hidden)\n",
" hidden = paddle.transpose(hidden, [1, 0, 2])\n",
" cell = paddle.transpose(cell, [1, 0, 2])\n",
"\n",
" output = self.outlinear(hidden)\n",
" output = paddle.squeeze(output)\n",
" return output, (hidden, cell)"
]
},
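{
"cell_type": "markdown",
"metadata": {},
"source": [
"As suggested above, printing tensor shapes is a good way to understand the decoder. The cell below is a throwaway sketch: it instantiates fresh, untrained layers, runs a single decoder step on a small dummy batch, and prints the resulting shapes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# throwaway shape-check sketch (fresh, untrained layers; not part of training)\n",
"_enc = Encoder()\n",
"_dec = AttentionDecoder()\n",
"_bs = 4\n",
"_src = paddle.to_tensor(train_en_sents[:_bs])        # (4, MAX_LEN + 1)\n",
"_enc_out = _enc(_src)                                # (4, MAX_LEN + 1, hidden_size)\n",
"_word = paddle.to_tensor(train_cn_sents[:_bs, 0:1])  # (4, 1), the <bos> ids\n",
"_hidden = paddle.zeros([_bs, 1, hidden_size])\n",
"_cell = paddle.zeros([_bs, 1, hidden_size])\n",
"_logits, (_hidden, _cell) = _dec(_word, _hidden, _cell, _enc_out)\n",
"print(_enc_out.shape, _logits.shape, _hidden.shape, _cell.shape)"
]
},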
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 训练模型\n",
"\n",
"接下来我们开始训练模型。\n",
"\n",
"- 在每个epoch开始之前,我们对训练数据进行了随机打乱。\n",
"- 我们通过多次调用`atten_decoder`,在这里实现了解码时的recurrent循环。\n",
"- `teacher forcing`策略: 在每次解码下一个词时,我们给定了训练数据当中的真实词作为了预测下一个词时的输入。相应的,你也可以尝试用模型预测的结果作为下一个词的输入。(或者混合使用)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch:0\n",
"iter 0, loss:[7.618719]\n",
"iter 200, loss:[2.9712436]\n",
"epoch:1\n",
"iter 0, loss:[2.926154]\n",
"iter 200, loss:[2.8847036]\n",
"epoch:2\n",
"iter 0, loss:[2.9981458]\n",
"iter 200, loss:[3.099761]\n",
"epoch:3\n",
"iter 0, loss:[2.6152773]\n",
"iter 200, loss:[2.5736806]\n",
"epoch:4\n",
"iter 0, loss:[2.418916]\n",
"iter 200, loss:[2.0204105]\n",
"epoch:5\n",
"iter 0, loss:[2.0660372]\n",
"iter 200, loss:[1.997014]\n",
"epoch:6\n",
"iter 0, loss:[1.7394348]\n",
"iter 200, loss:[1.9713217]\n",
"epoch:7\n",
"iter 0, loss:[2.2450879]\n",
"iter 200, loss:[1.8005365]\n",
"epoch:8\n",
"iter 0, loss:[1.7562586]\n",
"iter 200, loss:[1.8237668]\n",
"epoch:9\n",
"iter 0, loss:[1.3632518]\n",
"iter 200, loss:[1.6413273]\n",
"epoch:10\n",
"iter 0, loss:[1.0960134]\n",
"iter 200, loss:[1.4547268]\n",
"epoch:11\n",
"iter 0, loss:[1.4081496]\n",
"iter 200, loss:[1.4078153]\n",
"epoch:12\n",
"iter 0, loss:[1.1659987]\n",
"iter 200, loss:[1.1858114]\n",
"epoch:13\n",
"iter 0, loss:[1.3759178]\n",
"iter 200, loss:[1.2046292]\n",
"epoch:14\n",
"iter 0, loss:[0.8987882]\n",
"iter 200, loss:[1.1897587]\n",
"epoch:15\n",
"iter 0, loss:[0.83738756]\n",
"iter 200, loss:[0.78109366]\n",
"epoch:16\n",
"iter 0, loss:[0.84268856]\n",
"iter 200, loss:[0.9557387]\n",
"epoch:17\n",
"iter 0, loss:[0.643647]\n",
"iter 200, loss:[0.9286504]\n",
"epoch:18\n",
"iter 0, loss:[0.5729206]\n",
"iter 200, loss:[0.6324647]\n",
"epoch:19\n",
"iter 0, loss:[0.6614718]\n",
"iter 200, loss:[0.5292754]\n",
"epoch:20\n",
"iter 0, loss:[0.45713213]\n",
"iter 200, loss:[0.6192503]\n",
"epoch:21\n",
"iter 0, loss:[0.36670336]\n",
"iter 200, loss:[0.41927388]\n",
"epoch:22\n",
"iter 0, loss:[0.3294798]\n",
"iter 200, loss:[0.4599006]\n",
"epoch:23\n",
"iter 0, loss:[0.29158494]\n",
"iter 200, loss:[0.27783182]\n",
"epoch:24\n",
"iter 0, loss:[0.24686475]\n",
"iter 200, loss:[0.34916434]\n",
"epoch:25\n",
"iter 0, loss:[0.26881775]\n",
"iter 200, loss:[0.2400788]\n",
"epoch:26\n",
"iter 0, loss:[0.20649]\n",
"iter 200, loss:[0.212987]\n",
"epoch:27\n",
"iter 0, loss:[0.12560298]\n",
"iter 200, loss:[0.17958683]\n",
"epoch:28\n",
"iter 0, loss:[0.13129365]\n",
"iter 200, loss:[0.14788578]\n",
"epoch:29\n",
"iter 0, loss:[0.07885154]\n",
"iter 200, loss:[0.14729765]\n"
]
}
],
"source": [
"encoder = Encoder()\n",
"atten_decoder = AttentionDecoder()\n",
"\n",
"opt = paddle.optimizer.Adam(learning_rate=0.001, \n",
" parameters=encoder.parameters()+atten_decoder.parameters())\n",
"\n",
"for epoch in range(epochs):\n",
" print(\"epoch:{}\".format(epoch))\n",
"\n",
" # shuffle training data\n",
" perm = np.random.permutation(len(train_en_sents))\n",
" train_en_sents_shuffled = train_en_sents[perm]\n",
" train_cn_sents_shuffled = train_cn_sents[perm]\n",
" train_cn_label_sents_shuffled = train_cn_label_sents[perm]\n",
"\n",
" for iteration in range(train_en_sents_shuffled.shape[0] // batch_size):\n",
" x_data = train_en_sents_shuffled[(batch_size*iteration):(batch_size*(iteration+1))]\n",
" sent = paddle.to_tensor(x_data)\n",
" en_repr = encoder(sent)\n",
"\n",
" x_cn_data = train_cn_sents_shuffled[(batch_size*iteration):(batch_size*(iteration+1))]\n",
" x_cn_label_data = train_cn_label_sents_shuffled[(batch_size*iteration):(batch_size*(iteration+1))]\n",
"\n",
" # shape: (batch, num_layer(=1 here) * num_of_direction(=1 here) * hidden_size)\n",
" hidden = paddle.zeros([batch_size, 1, hidden_size])\n",
" cell = paddle.zeros([batch_size, 1, hidden_size])\n",
"\n",
" loss = paddle.zeros([1])\n",
" # the decoder recurrent loop mentioned above\n",
" for i in range(MAX_LEN + 2):\n",
" cn_word = paddle.to_tensor(x_cn_data[:,i:i+1])\n",
" cn_word_label = paddle.to_tensor(x_cn_label_data[:,i:i+1])\n",
"\n",
" logits, (hidden, cell) = atten_decoder(cn_word, hidden, cell, en_repr)\n",
" step_loss = F.softmax_with_cross_entropy(logits, cn_word_label)\n",
" avg_step_loss = paddle.mean(step_loss)\n",
" loss += avg_step_loss\n",
"\n",
" loss = loss / (MAX_LEN + 2)\n",
" if(iteration % 200 == 0):\n",
" print(\"iter {}, loss:{}\".format(iteration, loss.numpy()))\n",
"\n",
" loss.backward()\n",
" opt.minimize(loss)\n",
" encoder.clear_gradients()\n",
" atten_decoder.clear_gradients()"
]
},
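{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned above, instead of always feeding the ground-truth word you can feed the model's own prediction, or mix the two. The cell below is a hedged sketch of the mixed strategy; `teacher_forcing_ratio` is a hypothetical hyperparameter, and the cell only demonstrates the input-selection logic on a single batch without computing a loss or updating any parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: mixing teacher forcing with the model's own predictions (no loss/update here)\n",
"teacher_forcing_ratio = 0.5  # hypothetical hyperparameter\n",
"\n",
"bs = 8\n",
"x_en = paddle.to_tensor(train_en_sents[:bs])\n",
"x_cn = train_cn_sents[:bs]\n",
"enc_out = encoder(x_en)\n",
"hidden = paddle.zeros([bs, 1, hidden_size])\n",
"cell = paddle.zeros([bs, 1, hidden_size])\n",
"word = paddle.to_tensor(x_cn[:, 0:1])  # start from <bos>\n",
"\n",
"for i in range(MAX_LEN + 2):\n",
"    logits, (hidden, cell) = atten_decoder(word, hidden, cell, enc_out)\n",
"    if np.random.rand() < teacher_forcing_ratio and i + 1 < x_cn.shape[1]:\n",
"        # teacher forcing: feed the ground-truth next word\n",
"        word = paddle.to_tensor(x_cn[:, i+1:i+2])\n",
"    else:\n",
"        # free running: feed the model's own prediction\n",
"        word = paddle.unsqueeze(paddle.argmax(logits, axis=1), axis=-1)"
]
},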
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 使用模型进行机器翻译\n",
"\n",
"完成上面的模型训练之后,我们可以得到一个能够从英文翻译成中文的机器翻译模型。接下来我们通过一个greedy search来实现使用该模型完成实际的机器翻译。(实际的任务中,你可能需要用beam search算法来提升效果)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"he is poor\n",
"true: 他很穷。\n",
"pred: 他很穷。\n",
"i lent him a cd\n",
"true: 我借给他一盘CD。\n",
"pred: 我借给他一盘CD。\n",
"i m not so brave\n",
"true: 我没那么勇敢。\n",
"pred: 我没那么勇敢。\n",
"he goes to bed at eight o clock\n",
"true: 他八點上床睡覺。\n",
"pred: 他八點鐘也會遲到。\n",
"i know how old you are\n",
"true: 我知道你多大了。\n",
"pred: 我知道你多大了。\n",
"i m a detective\n",
"true: 我是个侦探。\n",
"pred: 我是个侦探。\n",
"i am the fastest runner\n",
"true: 我是跑得最快的人。\n",
"pred: 我是最快的跑者。\n",
"he got down the book from the shelf\n",
"true: 他從架上拿下書。\n",
"pred: 他從架上拿下書。\n",
"he arrived at the station at seven\n",
"true: 他7点到了火车站。\n",
"pred: 他7点到了火车站。\n",
"he fell down on the floor\n",
"true: 他摔倒在地。\n",
"pred: 他摔倒在地。\n"
]
}
],
"source": [
"encoder.eval()\n",
"atten_decoder.eval()\n",
"\n",
"num_of_exampels_to_evaluate = 10\n",
"\n",
"indices = np.random.choice(len(train_en_sents), num_of_exampels_to_evaluate, replace=False)\n",
"x_data = train_en_sents[indices]\n",
"sent = paddle.to_tensor(x_data)\n",
"en_repr = encoder(sent)\n",
"\n",
"word = np.array(\n",
" [[cn_vocab['<bos>']]] * num_of_exampels_to_evaluate\n",
")\n",
"word = paddle.to_tensor(word)\n",
"\n",
"hidden = paddle.zeros([num_of_exampels_to_evaluate, 1, hidden_size])\n",
"cell = paddle.zeros([num_of_exampels_to_evaluate, 1, hidden_size])\n",
"\n",
"decoded_sent = []\n",
"for i in range(MAX_LEN + 2):\n",
" logits, (hidden, cell) = atten_decoder(word, hidden, cell, en_repr)\n",
"\n",
" word = paddle.argmax(logits, axis=1)\n",
" decoded_sent.append(word.numpy())\n",
" word = paddle.unsqueeze(word, axis=-1)\n",
" \n",
"results = np.stack(decoded_sent, axis=1)\n",
"for i in range(num_of_exampels_to_evaluate):\n",
" en_input = \" \".join(filtered_pairs[indices[i]][0])\n",
" ground_truth_translate = \"\".join(filtered_pairs[indices[i]][1])\n",
" model_translate = \"\"\n",
" for k in results[i]:\n",
" w = list(cn_vocab)[k]\n",
" if w != '<pad>' and w != '<eos>':\n",
" model_translate += w\n",
" print(en_input)\n",
" print(\"true: {}\".format(ground_truth_translate))\n",
" print(\"pred: {}\".format(model_translate))"
]
},
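{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned above, beam search usually gives better translations than greedy search. The cell below is a hedged, simplified sketch of beam search for a single sentence: `beam_size` and the helper `beam_search_translate` are illustrative only, hypotheses are scored by the sum of word log-probabilities without length normalization, and the ids are turned back into characters the same way as in the greedy-search cell above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# simplified beam-search sketch (illustrative only; no length normalization)\n",
"def beam_search_translate(idx, beam_size=3):\n",
"    src = paddle.to_tensor(train_en_sents[idx:idx+1])\n",
"    enc_out = encoder(src)\n",
"    hidden = paddle.zeros([1, 1, hidden_size])\n",
"    cell = paddle.zeros([1, 1, hidden_size])\n",
"    # each hypothesis: (token ids, log-prob score, hidden, cell, finished flag)\n",
"    beams = [([cn_vocab['<bos>']], 0.0, hidden, cell, False)]\n",
"    for _ in range(MAX_LEN + 2):\n",
"        candidates = []\n",
"        for tokens, score, h, c, finished in beams:\n",
"            if finished:\n",
"                candidates.append((tokens, score, h, c, True))\n",
"                continue\n",
"            w = paddle.to_tensor(np.array([[tokens[-1]]], dtype='int64'))\n",
"            logits, (h, c) = atten_decoder(w, h, c, enc_out)\n",
"            log_probs = np.log(F.softmax(logits).numpy() + 1e-9)\n",
"            for k in np.argsort(-log_probs)[:beam_size]:\n",
"                done = int(k) == cn_vocab['<eos>']\n",
"                candidates.append((tokens + [int(k)], score + float(log_probs[k]), h, c, done))\n",
"        beams = sorted(candidates, key=lambda cand: cand[1], reverse=True)[:beam_size]\n",
"    id2word = list(cn_vocab)\n",
"    specials = ('<bos>', '<eos>', '<pad>')\n",
"    return ''.join(id2word[t] for t in beams[0][0] if id2word[t] not in specials)\n",
"\n",
"print(' '.join(filtered_pairs[0][0]))\n",
"print(beam_search_translate(0))"
]
},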
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The End\n",
"\n",
"你还可以通过变换网络结构,调整数据集,尝试不同的参数的方式来进一步提升本示例当中的机器翻译的效果。同时,也可以尝试在其他的类似的任务中用飞桨来完成实际的实践。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}