resolve ipynb conflicts

5f5910a9 · gongweibao · 8bdc3588 · 5f5910a9 · 5f5910a9 · 5f5910a9
4 changed files
--- a/fit_a_line/README.en.ipynb
+++ b/fit_a_line/README.en.ipynb
@@ -7,7 +7,7 @@
        "# Linear Regression\n",
        "Let us begin the tutorial with a classical problem called Linear Regression \\[[1](#References)\\]. In this chapter, we will train a model from a realistic dataset to predict home prices. Some important concepts in Machine Learning will be covered through this example.\n",
        "\n",
-        "The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).\n",
+        "The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see the [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).\n",
        "\n",
        "## Problem Setup\n",
        "Suppose we have a dataset of $n$ real estate properties. These real estate properties will be referred to as *homes* in this chapter for clarity.\n",
@@ -384,7 +384,7 @@
        "4. Bishop C M. Pattern recognition[J]. Machine Learning, 2006, 128.\n",
        "\n",
        "\u003cbr/\u003e\n",
-        "\u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\"\u003e\u003cimg alt=\"Common Creative License\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png\" /\u003e\u003c/a\u003e This tutorial was created and published with [Creative Common License 4.0](http://creativecommons.org/licenses/by-nc-sa/4.0/).\n"
+        "This tutorial is contributed by \u003ca xmlns:cc=\"http://creativecommons.org/ns#\" href=\"http://book.paddlepaddle.org\" property=\"cc:attributionName\" rel=\"cc:attributionURL\"\u003ePaddlePaddle\u003c/a\u003e, and licensed under a \u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\"\u003eCreative Commons Attribution-NonCommercial-ShareAlike 4.0 International License\u003c/a\u003e.\n"
      ]
    }
  ],

--- a/fit_a_line/README.ipynb
+++ b/fit_a_line/README.ipynb
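@@ -7,7 +7,7 @@
        "# Linear Regression\n",
        "Let us begin this tutorial with the classic Linear Regression \\[[1](#参考文献)\\] model. In this chapter, you will build a home price prediction model from a real dataset and learn several important machine learning concepts along the way.\n",
        "\n",
-        "The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). If you are new to PaddlePaddle, see the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).\n",
+        "The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). If you are new to PaddlePaddle, see the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).\n",
        "\n",
        "## Background\n",
        "Given a dataset of size $n$, ${\\{y_{i}, x_{i1}, ..., x_{id}\\}}_{i=1}^{n}$, where $x_{i1}, \\ldots, x_{id}$ are the values of the $d$ attributes of the $i$-th sample and $y_i$ is the target to be predicted for that sample, the linear regression model assumes that the target $y_i$ can be described by a linear combination of the attributes, that is,\n",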
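@@ -21,8 +21,8 @@
        "## Results\n",
        "We train the model and make predictions on the Boston housing dataset obtained from the [UCI Housing Data Set](https://archive.ics.uci.edu/ml/datasets/Housing). The scatter plot below shows the model's predictions for a subset of homes. For each point, the x-coordinate is the median of the actual prices of homes of the same type, and the y-coordinate is the price that the linear regression model predicts from the features. A point falls on the dashed line when the two values are exactly equal, so the more accurate the predictions are, the closer the points lie to the dashed line.\n",
        "\u003cp align=\"center\"\u003e\n",
-        "\t\u003cimg src = \"image/predictions.png\" width=400\u003e\u003cbr/\u003e\n",
+        "    \u003cimg src = \"image/predictions.png\" width=400\u003e\u003cbr/\u003e\n",
-        "\tFigure 1. Predicted values vs. actual values\n",
+        "    Figure 1. Predicted values vs. actual values\n",
        "\u003c/p\u003e\n",
        "\n",
        "## Model Overview\n",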
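@@ -124,8 +124,8 @@
        "- Many machine learning techniques and models (for example, L1 and L2 regularization and the Vector Space Model) rest on the assumption that all attribute values are roughly zero-mean and span similar ranges.\n",
        "\n",
        "\u003cp align=\"center\"\u003e\n",
-        "\t\u003cimg src = \"image/ranges.png\" width=550\u003e\u003cbr/\u003e\n",
+        "    \u003cimg src = \"image/ranges.png\" width=550\u003e\u003cbr/\u003e\n",
-        "\tFigure 2. Value range of each attribute dimension\n",
+        "    Figure 2. Value range of each attribute dimension\n",
        "\u003c/p\u003e\n",
        "\n",
        "#### Preparing the Training and Test Sets\n",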

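Review note on the `@@ -124,8 +124,8 @@` hunk above: the bullet about attributes being roughly zero-mean with similar ranges can be illustrated with a small feature-scaling sketch. This is a minimal example in plain Python, assuming mean-centering followed by division by the column range as the scaling rule. The function name `normalize_columns` and the toy data are hypothetical and are not taken from the tutorial.

```python
def normalize_columns(rows):
    """Centre each column on its mean and divide by its range (max - min).

    Assumption: this mean/range rule is one common choice of scaling,
    picked here for illustration only.
    """
    n_cols = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_cols)]
    means = [sum(col) / len(col) for col in columns]
    # Guard against constant columns, whose range would be zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        [(row[j] - means[j]) / spans[j] for j in range(n_cols)]
        for row in rows
    ]


# Hypothetical toy data: two attributes on very different scales.
homes = [[2.0, 100.0], [4.0, 300.0], [6.0, 200.0]]
scaled = normalize_columns(homes)
print(scaled)
```

After scaling, both columns are centred on zero and span a range of exactly 1, regardless of their original units.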
--- a/recommender_system/README.en.ipynb
+++ b/recommender_system/README.en.ipynb
@@ -8,6 +8,9 @@
        "\n",
        "The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system).\n",
        "\n",
+        "For instructions on getting started with PaddlePaddle, see the [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).\n",
+        "\n",
+        "\n",
        "## Background\n",
        "\n",
        "With the fast growth of e-commerce, online videos, and online reading business, users have to rely on recommender systems to avoid manually browsing tremendous volume of choices.  Recommender systems understand users' interest by mining user behavior and other properties of users and products.\n",
@@ -82,22 +85,617 @@
        "\n",
        "## Dataset\n",
        "\n",
-        "We use the [MovieLens ml-1m](http://files.grouplens.org/datasets/movielens/ml-1m.zip) to train our model.  This dataset includes 10,000 ratings of 4,000 movies from 6,000 users to 4,000 movies.  Each rate is in the range of 1~5.  Thanks to GroupLens Research for collecting, processing and publishing the dataset.  \n",
+        "We use the [MovieLens ml-1m](http://files.grouplens.org/datasets/movielens/ml-1m.zip) dataset to train our model.  This dataset includes about 1,000,000 ratings of roughly 4,000 movies from roughly 6,000 users.  Each rating is in the range of 1 to 5.  Thanks to GroupLens Research for collecting, processing and publishing the dataset.\n",
+        "\n",
+        "The `paddle.v2.dataset` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens` and `wmt14`. There is no need to download and preprocess the `MovieLens` dataset manually.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
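+        "# Run this block to show the dataset's documentation\n",
+        "import paddle.v2 as paddle  # imported here because this cell runs before the initialization cell\n",
+        "help(paddle.dataset.movielens)\n"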
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "The raw `MovieLens` data contains movie ratings together with features of both the movies and the users.\n",
+        "For instance, the features of one movie could be:\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "movie_info = paddle.dataset.movielens.movie_info()\n",
+        "print movie_info.values()[0]\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "```text\n",
+        "\u003cMovieInfo id(1), title(Toy Story), categories(['Animation', \"Children's\", 'Comedy'])\u003e\n",
+        "```\n",
+        "\n",
+        "The features of one user could be:\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "user_info = paddle.dataset.movielens.user_info()\n",
+        "print user_info.values()[0]\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "```text\n",
+        "\u003cUserInfo id(1), gender(F), age(1), job(10)\u003e\n",
+        "```\n",
+        "\n",
+        "In this dataset, age is encoded as one of the following group codes:\n",
+        "\n",
+        "```text\n",
+        "1: \"Under 18\"\n",
+        "18: \"18-24\"\n",
+        "25: \"25-34\"\n",
+        "35: \"35-44\"\n",
+        "45: \"45-49\"\n",
+        "50: \"50-55\"\n",
+        "56: \"56+\"\n",
+        "```\n",
+        "\n",
+        "A user's occupation is one of the following options:\n",
+        "\n",
+        "```text\n",
+        "0: \"other\" or not specified\n",
+        "1: \"academic/educator\"\n",
+        "2: \"artist\"\n",
+        "3: \"clerical/admin\"\n",
+        "4: \"college/grad student\"\n",
+        "5: \"customer service\"\n",
+        "6: \"doctor/health care\"\n",
+        "7: \"executive/managerial\"\n",
+        "8: \"farmer\"\n",
+        "9: \"homemaker\"\n",
+        "10: \"K-12 student\"\n",
+        "11: \"lawyer\"\n",
+        "12: \"programmer\"\n",
+        "13: \"retired\"\n",
+        "14: \"sales/marketing\"\n",
+        "15: \"scientist\"\n",
+        "16: \"self-employed\"\n",
+        "17: \"technician/engineer\"\n",
+        "18: \"tradesman/craftsman\"\n",
+        "19: \"unemployed\"\n",
+        "20: \"writer\"\n",
+        "```\n",
+        "\n",
+        "Each record consists of three main components: user features, movie features and the movie rating.\n",
+        "As a simple example, consider the first training record:\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "train_set_creator = paddle.dataset.movielens.train()\n",
+        "train_sample = next(train_set_creator())\n",
+        "uid = train_sample[0]\n",
+        "mov_id = train_sample[len(user_info[uid].value())]\n",
+        "print \"User %s rates Movie %s with Score %s\"%(user_info[uid], movie_info[mov_id], train_sample[-1])\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "```text\n",
+        "User \u003cUserInfo id(1), gender(F), age(1), job(10)\u003e rates Movie \u003cMovieInfo id(1193), title(One Flew Over the Cuckoo's Nest), categories(['Drama'])\u003e with Score [5.0]\n",
+        "```\n",
+        "\n",
+        "The output shows that user 1 gave movie `1193` a rating of 5.\n",
+        "\n",
+        "Running the command `python train.py` starts training immediately. The following sections walk through the details to show how it works.\n",
+        "\n",
+        "## Model Architecture\n",
+        "\n",
+        "### Initialize PaddlePaddle\n",
+        "\n",
+        "First, we must import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc).\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "%matplotlib inline\n",
+        "\n",
+        "import matplotlib.pyplot as plt\n",
+        "from IPython import display\n",
+        "import cPickle\n",
+        "\n",
+        "import paddle.v2 as paddle\n",
+        "\n",
+        "paddle.init(use_gpu=False)\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "### Model Configuration\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "uid = paddle.layer.data(\n",
+        "        name='user_id',\n",
+        "        type=paddle.data_type.integer_value(\n",
+        "            paddle.dataset.movielens.max_user_id() + 1))\n",
+        "usr_emb = paddle.layer.embedding(input=uid, size=32)\n",
+        "\n",
+        "usr_gender_id = paddle.layer.data(\n",
+        "        name='gender_id', type=paddle.data_type.integer_value(2))\n",
+        "usr_gender_emb = paddle.layer.embedding(input=usr_gender_id, size=16)\n",
+        "\n",
+        "usr_age_id = paddle.layer.data(\n",
+        "        name='age_id',\n",
+        "        type=paddle.data_type.integer_value(\n",
+        "            len(paddle.dataset.movielens.age_table)))\n",
+        "usr_age_emb = paddle.layer.embedding(input=usr_age_id, size=16)\n",
+        "\n",
+        "usr_job_id = paddle.layer.data(\n",
+        "        name='job_id',\n",
+        "        type=paddle.data_type.integer_value(paddle.dataset.movielens.max_job_id(\n",
+        "        ) + 1))\n",
+        "usr_job_emb = paddle.layer.embedding(input=usr_job_id, size=16)\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "As shown in the code above, each user is described by four integer features: `user_id`, `gender_id`, `age_id` and `job_id`. To handle these discrete features conveniently, we borrow the embedding technique used by language models in NLP and map each value to an embedding vector: `usr_emb`, `usr_gender_emb`, `usr_age_emb` and `usr_job_emb`.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "usr_combined_features = paddle.layer.fc(\n",
+        "        input=[usr_emb, usr_gender_emb, usr_age_emb, usr_job_emb],\n",
+        "        size=200,\n",
+        "        act=paddle.activation.Tanh())\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "Then, the user embeddings are fed directly into a fully-connected layer, which combines them into a 200-dimensional user feature vector.\n",
        "\n",
-        "We don't have to download and preprocess the data.  Instead, we can use PaddlePaddle's dataset module `paddle.v2.dataset.movielens`.\n",
+        "Furthermore, we do a similar transformation for each movie feature. The model configuration is:\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "mov_id = paddle.layer.data(\n",
+        "    name='movie_id',\n",
+        "    type=paddle.data_type.integer_value(\n",
+        "        paddle.dataset.movielens.max_movie_id() + 1))\n",
+        "mov_emb = paddle.layer.embedding(input=mov_id, size=32)\n",
+        "\n",
+        "mov_categories = paddle.layer.data(\n",
+        "    name='category_id',\n",
+        "    type=paddle.data_type.sparse_binary_vector(\n",
+        "        len(paddle.dataset.movielens.movie_categories())))\n",
+        "\n",
+        "mov_categories_hidden = paddle.layer.fc(input=mov_categories, size=32)\n",
+        "\n",
+        "\n",
+        "movie_title_dict = paddle.dataset.movielens.get_movie_title_dict()\n",
+        "mov_title_id = paddle.layer.data(\n",
+        "    name='movie_title',\n",
+        "    type=paddle.data_type.integer_value_sequence(len(movie_title_dict)))\n",
+        "mov_title_emb = paddle.layer.embedding(input=mov_title_id, size=32)\n",
+        "mov_title_conv = paddle.networks.sequence_conv_pool(\n",
+        "    input=mov_title_emb, hidden_size=32, context_len=3)\n",
+        "\n",
+        "mov_combined_features = paddle.layer.fc(\n",
+        "    input=[mov_emb, mov_categories_hidden, mov_title_conv],\n",
+        "    size=200,\n",
+        "    act=paddle.activation.Tanh())\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
        "\n",
+        "The movie title, a sequence of words represented by a sequence of integer word indices, is fed into a `sequence_conv_pool` layer, which applies convolution and pooling along the time dimension. Because pooling is done along the time dimension, the output is a fixed-length vector regardless of the length of the input sequence.\n",
        "\n",
-        "## Model Specification\n",
+        "Finally, we use cosine similarity to measure how closely the user features match the movie features.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "inference = paddle.layer.cos_sim(a=usr_combined_features, b=mov_combined_features, size=1, scale=5)\n",
+        "cost = paddle.layer.regression_cost(\n",
+        "        input=inference,\n",
+        "        label=paddle.layer.data(\n",
+        "        name='score', type=paddle.data_type.dense_vector(1)))\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
        "\n",
+        "## Model Training\n",
        "\n",
+        "### Define Parameters\n",
        "\n",
-        "## Training\n",
+        "First, we define the model parameters according to the previous model configuration `cost`.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "# Create parameters\n",
+        "parameters = paddle.parameters.create(cost)\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
        "\n",
+        "### Create Trainer\n",
        "\n",
+        "Before creating the trainer, we also need to choose an optimization algorithm. Here we specify the Adam optimizer via `paddle.optimizer`.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "trainer = paddle.trainer.SGD(cost=cost, parameters=parameters,\n",
+        "                             update_equation=paddle.optimizer.Adam(learning_rate=1e-4))\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
        "\n",
-        "## Inference\n",
+        "```text\n",
+        "[INFO 2017-03-06 17:12:13,378 networks.py:1472] The input order is [user_id, gender_id, age_id, job_id, movie_id, category_id, movie_title, score]\n",
+        "[INFO 2017-03-06 17:12:13,379 networks.py:1478] The output order is [__regression_cost_0__]\n",
+        "```\n",
        "\n",
+        "### Training\n",
        "\n",
+        "`paddle.dataset.movielens.train` yields records during each pass. After shuffling, the records are grouped into batches that are used as training input.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "reader = paddle.batch(\n",
+        "    paddle.reader.shuffle(\n",
+        "        paddle.dataset.movielens.train(), buf_size=8192),\n",
+        "    batch_size=256)\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "`feeding` specifies the correspondence between the fields of each yielded record and the `paddle.layer.data` layers. For instance, the first column of the data generated by `movielens.train` corresponds to the `user_id` feature.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "feeding = {\n",
+        "    'user_id': 0,\n",
+        "    'gender_id': 1,\n",
+        "    'age_id': 2,\n",
+        "    'job_id': 3,\n",
+        "    'movie_id': 4,\n",
+        "    'category_id': 5,\n",
+        "    'movie_title': 6,\n",
+        "    'score': 7\n",
+        "}\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "The callback function `event_handler` is called during training whenever a pre-defined event happens.\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "step=0\n",
+        "\n",
+        "train_costs=[],[]\n",
+        "test_costs=[],[]\n",
+        "\n",
+        "def event_handler(event):\n",
+        "    global step\n",
+        "    global train_costs\n",
+        "    global test_costs\n",
+        "    if isinstance(event, paddle.event.EndIteration):\n",
+        "        if step % 10 == 0:  # every 10 batches, record a train cost\n",
+        "            train_costs[0].append(step)\n",
+        "            train_costs[1].append(event.cost)\n",
+        "\n",
+        "        if step % 1000 == 0: # every 1000 batches, record a test cost\n",
+        "            result = trainer.test(reader=paddle.batch(\n",
+        "                  paddle.dataset.movielens.test(), batch_size=256))\n",
+        "            test_costs[0].append(step)\n",
+        "            test_costs[1].append(result.cost)\n",
+        "\n",
+        "        if step % 100 == 0: # every 100 batches, update cost plot\n",
+        "            plt.plot(*train_costs)\n",
+        "            plt.plot(*test_costs)\n",
+        "            plt.legend(['Train Cost', 'Test Cost'], loc='upper left')\n",
+        "            display.clear_output(wait=True)\n",
+        "            display.display(plt.gcf())\n",
+        "            plt.gcf().clear()\n",
+        "        step += 1\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n",
+        "Finally, we can invoke `trainer.train` to start training:\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "editable": true
+      },
+      "source": [
+        "trainer.train(\n",
+        "    reader=reader,\n",
+        "    event_handler=event_handler,\n",
+        "    feeding=feeding,\n",
+        "    num_passes=200)\n"
+      ],
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "\n"
+          ]
+        }
+      ],
+      "execution_count": 1
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
        "\n",
        "## Conclusion\n",
        "\n",
@@ -105,16 +703,16 @@
        "\n",
        "## Reference\n",
        "\n",
-        "1. [Peter Brusilovsky](https://en.wikipedia.org/wiki/Peter_Brusilovsky) (2007). *The Adaptive Web*. p. 325.\n",
+        "1. [Peter Brusilovsky](https://en.wikipedia.org/wiki/Peter_Brusilovsky) (2007). *The Adaptive Web*. p. 325.\n",
-        "2. Robin Burke , [Hybrid Web Recommender Systems](http://www.dcs.warwick.ac.uk/~acristea/courses/CS411/2010/Book%20-%20The%20Adaptive%20Web/HybridWebRecommenderSystems.pdf), pp. 377-408, The Adaptive Web, Peter Brusilovsky, Alfred Kobsa, Wolfgang Nejdl (Ed.), Lecture Notes in Computer Science, Springer-Verlag, Berlin, Germany, Lecture Notes in Computer Science, Vol. 4321, May 2007, 978-3-540-72078-2.\n",
+        "2. Robin Burke , [Hybrid Web Recommender Systems](http://www.dcs.warwick.ac.uk/~acristea/courses/CS411/2010/Book%20-%20The%20Adaptive%20Web/HybridWebRecommenderSystems.pdf), pp. 377-408, The Adaptive Web, Peter Brusilovsky, Alfred Kobsa, Wolfgang Nejdl (Ed.), Lecture Notes in Computer Science, Springer-Verlag, Berlin, Germany, Lecture Notes in Computer Science, Vol. 4321, May 2007, 978-3-540-72078-2.\n",
        "3. P. Resnick, N. Iacovou, etc. “[GroupLens: An Open Architecture for Collaborative Filtering of Netnews](http://ccs.mit.edu/papers/CCSWP165.html)”, Proceedings of ACM Conference on Computer Supported Cooperative Work, CSCW 1994. pp.175-186.\n",
-        "4. Sarwar, Badrul, et al. \"[Item-based collaborative filtering recommendation algorithms.](http://files.grouplens.org/papers/www10_sarwar.pdf)\" *Proceedings of the 10th International Conference on World Wide Web*. ACM, 2001.\n",
+        "4. Sarwar, Badrul, et al. \"[Item-based collaborative filtering recommendation algorithms.](http://files.grouplens.org/papers/www10_sarwar.pdf)\" *Proceedings of the 10th International Conference on World Wide Web*. ACM, 2001.\n",
        "5. Kautz, Henry, Bart Selman, and Mehul Shah. \"[Referral Web: Combining Social networks and collaborative filtering.](http://www.cs.cornell.edu/selman/papers/pdf/97.cacm.refweb.pdf)\" Communications of the ACM 40.3 (1997): 63-65. APA\n",
-        "6. Yuan, Jianbo, et al. [\"Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach.\"](https://arxiv.org/pdf/1611.05480v1.pdf) *arXiv preprint arXiv:1611.05480* (2016).\n",
+        "6. Yuan, Jianbo, et al. [\"Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach.\"](https://arxiv.org/pdf/1611.05480v1.pdf) *arXiv preprint arXiv:1611.05480* (2016).\n",
        "7. Covington P, Adams J, Sargin E. [Deep neural networks for youtube recommendations](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf)[C]//Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016: 191-198.\n",
        "\n",
        "\u003cbr/\u003e\n",
-        "\u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\"\u003e\u003cimg alt=\"Creative Commons\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png\" /\u003e\u003c/a\u003e\u003cbr /\u003e\u003cspan xmlns:dct=\"http://purl.org/dc/terms/\" href=\"http://purl.org/dc/dcmitype/Text\" property=\"dct:title\" rel=\"dct:type\"\u003eThis tutorial\u003c/span\u003e was created by \u003ca xmlns:cc=\"http://creativecommons.org/ns#\" href=\"http://book.paddlepaddle.org\" property=\"cc:attributionName\" rel=\"cc:attributionURL\"\u003ethe PaddlePaddle community\u003c/a\u003e and published under \u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\"\u003eCommon Creative 4.0 License\u003c/a\u003e。\n"
+        "This tutorial is contributed by \u003ca xmlns:cc=\"http://creativecommons.org/ns#\" href=\"http://book.paddlepaddle.org\" property=\"cc:attributionName\" rel=\"cc:attributionURL\"\u003ePaddlePaddle\u003c/a\u003e, and licensed under a \u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\"\u003eCreative Commons Attribution-NonCommercial-ShareAlike 4.0 International License\u003c/a\u003e.\n"
      ]
    }
  ],
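Review note on the `paddle.layer.cos_sim(..., scale=5)` cell in `recommender_system/README.en.ipynb` above: as I read the call, the layer outputs the cosine of the angle between the user and movie feature vectors multiplied by `scale`, so predictions fall in [-5, 5] while the ratings lie between 1 and 5. The sketch below is plain Python and is not PaddlePaddle's implementation; `scaled_cos_sim` is a hypothetical helper name.

```python
import math


def scaled_cos_sim(a, b, scale=5.0):
    """Return scale * cos(angle between a and b) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return scale * dot / (norm_a * norm_b)


# Hypothetical 2-dimensional stand-ins for the 200-dimensional features.
same_direction = scaled_cos_sim([3.0, 4.0], [3.0, 4.0])
orthogonal = scaled_cos_sim([1.0, 0.0], [0.0, 1.0])
opposite = scaled_cos_sim([1.0, 2.0], [-1.0, -2.0])
print(same_direction, orthogonal, opposite)
```

Vectors pointing the same way score near the top rating, orthogonal vectors score near zero, and opposite vectors score near the negative extreme, which the regression cost then pulls toward the observed rating.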

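Review note on the reader and `feeding` cells in `recommender_system/README.en.ipynb` above: the shuffle-then-batch pipeline and the role of the `feeding` dictionary can be sketched in plain Python. This assumes the reader convention visible in the notebook, where `paddle.dataset.movielens.train()` returns a function that is called to obtain an iterator (as in `next(train_set_creator())`). The names `shuffle_reader`, `batch_reader` and `toy_reader` are hypothetical, and emitting the trailing partial batch is an assumption rather than documented library behaviour.

```python
import random


def shuffle_reader(reader, buf_size, seed=None):
    """Wrap a reader so records are shuffled within buffers of buf_size."""
    rng = random.Random(seed)

    def shuffled():
        buf = []
        for record in reader():
            buf.append(record)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        # Flush whatever is left in the last, possibly smaller, buffer.
        rng.shuffle(buf)
        for item in buf:
            yield item

    return shuffled


def batch_reader(reader, batch_size):
    """Wrap a reader so it yields lists of up to batch_size records."""

    def batched():
        current = []
        for record in reader():
            current.append(record)
            if len(current) == batch_size:
                yield current
                current = []
        if current:  # assumption: keep the trailing partial batch
            yield current

    return batched


def toy_reader():
    """Hypothetical stand-in for a dataset reader: (user_id, score) records."""
    for user_id in range(10):
        yield (user_id, float(user_id % 5 + 1))


# A feeding-style mapping from layer name to position in each record.
feeding = {'user_id': 0, 'score': 1}

batches = list(batch_reader(shuffle_reader(toy_reader, buf_size=4, seed=0),
                            batch_size=3)())
first_batch_user_ids = [record[feeding['user_id']] for record in batches[0]]
print([len(b) for b in batches], first_batch_user_ids)
```

Because shuffling happens inside a bounded buffer, memory use stays constant at the cost of only local randomization. A larger `buf_size` gives a more thorough shuffle.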
--- a/recommender_system/README.ipynb
+++ b/recommender_system/README.ipynb
@@ -6,7 +6,7 @@
      "source": [
        "# Personalized Recommendation\n",
        "\n",
-        "The source code for this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system). If you are new to PaddlePaddle, see the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).\n",
+        "The source code for this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/recommender_system). If you are new to PaddlePaddle, see the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).\n",
        "\n",
        "## Background\n",
        "\n",