Commit 9486db87, authored by qiaolongfei

add draw line in 01.fit-a-line

Parent 3e263f9f
@@ -7,7 +7,7 @@
 "# Linear Regression\n",
 "Let us begin the tutorial with a classical problem called Linear Regression \\[[1](#References)\\]. In this chapter, we will train a model from a realistic dataset to predict home prices. Some important concepts in Machine Learning will be covered through this example.\n",
 "\n",
-"The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).\n",
+"The source code for this tutorial lives on [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.en.md).\n",
 "\n",
 "## Problem Setup\n",
 "Suppose we have a dataset of $n$ real estate properties. These real estate properties will be referred to as *homes* in this chapter for clarity.\n",
@@ -308,19 +308,41 @@
 "editable": true
 },
 "source": [
-"# event_handler to print training and testing info\n",
+"import matplotlib.pyplot as plt\n",
+"from IPython import display\n",
+"import cPickle\n",
+"\n",
+"step=0\n",
+"\n",
+"train_costs=[],[]\n",
+"test_costs=[],[]\n",
+"\n",
 "def event_handler(event):\n",
+"    global step\n",
+"    global train_costs\n",
+"    global test_costs\n",
 "    if isinstance(event, paddle.event.EndIteration):\n",
-"        if event.batch_id % 100 == 0:\n",
-"            print \"Pass %d, Batch %d, Cost %f\" % (\n",
-"                event.pass_id, event.batch_id, event.cost)\n",
-"\n",
-"    if isinstance(event, paddle.event.EndPass):\n",
-"        result = trainer.test(\n",
-"            reader=paddle.batch(\n",
-"                uci_housing.test(), batch_size=2),\n",
-"            feeding=feeding)\n",
-"        print \"Test %d, Cost %f\" % (event.pass_id, result.cost)\n"
+"        need_plot = False\n",
+"        if step % 10 == 0: # every 10 batches, record a train cost\n",
+"            train_costs[0].append(step)\n",
+"            train_costs[1].append(event.cost)\n",
+"\n",
+"        if step % 1000 == 0: # every 1000 batches, record a test cost\n",
+"            result = trainer.test(\n",
+"                reader=paddle.batch(\n",
+"                    uci_housing.test(), batch_size=2),\n",
+"                feeding=feeding)\n",
+"            test_costs[0].append(step)\n",
+"            test_costs[1].append(result.cost)\n",
+"\n",
+"        if step % 100 == 0: # every 100 batches, update cost plot\n",
+"            plt.plot(*train_costs)\n",
+"            plt.plot(*test_costs)\n",
+"            plt.legend(['Train Cost', 'Test Cost'], loc='upper left')\n",
+"            display.clear_output(wait=True)\n",
+"            display.display(plt.gcf())\n",
+"            plt.gcf().clear()\n",
+"        step += 1\n"
 ],
 "outputs": [
 {
@@ -372,6 +394,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"\n",
+"![png](./image/train-and-test.png)\n",
 "\n",
 "## Summary\n",
 "This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.\n",
......
@@ -163,19 +163,41 @@ feeding={'x': 0, 'y': 1}
 
 Moreover, an event handler is provided to print the training progress:
 ```python
-# event_handler to print training and testing info
+import matplotlib.pyplot as plt
+from IPython import display
+import cPickle
+
+step=0
+
+train_costs=[],[]
+test_costs=[],[]
+
 def event_handler(event):
+    global step
+    global train_costs
+    global test_costs
     if isinstance(event, paddle.event.EndIteration):
-        if event.batch_id % 100 == 0:
-            print "Pass %d, Batch %d, Cost %f" % (
-                event.pass_id, event.batch_id, event.cost)
-
-    if isinstance(event, paddle.event.EndPass):
-        result = trainer.test(
-            reader=paddle.batch(
-                uci_housing.test(), batch_size=2),
-            feeding=feeding)
-        print "Test %d, Cost %f" % (event.pass_id, result.cost)
+        need_plot = False
+        if step % 10 == 0: # every 10 batches, record a train cost
+            train_costs[0].append(step)
+            train_costs[1].append(event.cost)
+
+        if step % 1000 == 0: # every 1000 batches, record a test cost
+            result = trainer.test(
+                reader=paddle.batch(
+                    uci_housing.test(), batch_size=2),
+                feeding=feeding)
+            test_costs[0].append(step)
+            test_costs[1].append(result.cost)
+
+        if step % 100 == 0: # every 100 batches, update cost plot
+            plt.plot(*train_costs)
+            plt.plot(*test_costs)
+            plt.legend(['Train Cost', 'Test Cost'], loc='upper left')
+            display.clear_output(wait=True)
+            display.display(plt.gcf())
+            plt.gcf().clear()
+        step += 1
 ```
 
 ### Start Training
@@ -191,6 +213,8 @@ trainer.train(
     num_passes=30)
 ```
 
+![png](./image/train-and-test.png)
+
 ## Summary
 This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.
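
In the new version of the handler, a global `step` counter drives three schedules on every `EndIteration` event: record the training cost every 10 steps, run a test pass every 1000 steps, and redraw the cost plot every 100 steps. The sketch below reproduces only the recording part, in plain Python 3, so it runs outside Jupyter and without PaddlePaddle. It is an illustration under stated assumptions, not the tutorial's code: `EndIteration` is a hypothetical stand-in for `paddle.event.EndIteration`, `run_test` is a hypothetical callback standing in for `trainer.test(...).cost`, and the plotting calls are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EndIteration:
    """Hypothetical stand-in for paddle.event.EndIteration (only .cost is used)."""
    cost: float


class CostRecorder:
    """Records (step, cost) pairs on the same schedule as the tutorial's handler."""

    def __init__(self, run_test: Callable[[], float],
                 train_every: int = 10, test_every: int = 1000):
        self.run_test = run_test  # hypothetical stand-in for trainer.test(...).cost
        self.train_every = train_every
        self.test_every = test_every
        self.step = 0
        # Same layout as train_costs=[],[]: one list of steps, one list of costs.
        self.train_costs: Tuple[List[int], List[float]] = ([], [])
        self.test_costs: Tuple[List[int], List[float]] = ([], [])

    def __call__(self, event) -> None:
        # Only end-of-batch events advance the counter; other events are ignored.
        if isinstance(event, EndIteration):
            if self.step % self.train_every == 0:
                self.train_costs[0].append(self.step)
                self.train_costs[1].append(event.cost)
            if self.step % self.test_every == 0:
                self.test_costs[0].append(self.step)
                self.test_costs[1].append(self.run_test())
            self.step += 1


if __name__ == "__main__":
    recorder = CostRecorder(run_test=lambda: 0.5)
    for i in range(2500):
        recorder(EndIteration(cost=1.0 / (i + 1)))
    print(len(recorder.train_costs[0]))  # 250
    print(recorder.test_costs[0])        # [0, 1000, 2000]
```

Keeping the counters on an object rather than in module-level `global` variables is a choice made for this sketch; it keeps state out of the global namespace and lets two recorders coexist.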
......
@@ -7,7 +7,7 @@
 "# Linear Regression\n",
 "Let us begin this tutorial with the classic Linear Regression \\[[1](#参考文献)\\] model. In this chapter, you will build a house price prediction model from a real dataset and learn several important concepts in machine learning.\n",
 "\n",
-"The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). If you are new to PaddlePaddle, please refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_cn.rst).\n",
+"The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/fit_a_line). If you are new to PaddlePaddle, please refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.md).\n",
 "\n",
 "## Background\n",
 "Given a dataset of size $n$, ${\\{y_{i}, x_{i1}, ..., x_{id}\\}}_{i=1}^{n}$, where $x_{i1}, \\ldots, x_{id}$ are the values of the $d$ attributes of the $i$-th sample and $y_i$ is the target to be predicted for that sample, the linear regression model assumes that the target $y_i$ can be described by a linear combination of the attributes, i.e.\n",
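@@ -35,7 +35,7 @@
 "\n",
 "$\\hat{Y}$ denotes the model's prediction, to distinguish it from the true value $Y$. The parameters the model needs to learn are $\\omega_1, \\ldots, \\omega_{13}, b$.\n",
 "\n",
-"After building the model, we need to give it an optimization objective, so that the learned parameters make the prediction $\\hat{Y}$ as close as possible to the true value $Y$. Here we introduce the concept of a loss function ([Loss Function](https://en.wikipedia.org/wiki/Loss_function), also called a Cost Function). Given the target value $y_{i}$ of any data sample and the model's prediction $\\hat{y_{i}}$, the loss function outputs a non-negative real value. This value usually reflects the magnitude of the model's error.\n",
+"After building the model, we need to give it an optimization objective, so that the learned parameters make the prediction $\\hat{Y}$ as close as possible to the true value $Y$. Here we introduce the concept of a loss function ([Loss Function](https://en.wikipedia.org/wiki/Loss_function), also called a Cost Function). Given the target value $y_{i}$ of any data sample and the model's prediction $\\hat{y_{i}}$, the loss function outputs a non-negative real value. This value usually reflects the magnitude of the model's error.\n",
 "\n",
 "For linear regression, the most common loss function is the Mean Squared Error ([MSE](https://en.wikipedia.org/wiki/Mean_squared_error)), which has the form:\n",
 "\n",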
@@ -304,18 +304,41 @@
 },
 "source": [
 "# event_handler to print training and testing info\n",
+"import matplotlib.pyplot as plt\n",
+"from IPython import display\n",
+"import cPickle\n",
+"\n",
+"step=0\n",
+"\n",
+"train_costs=[],[]\n",
+"test_costs=[],[]\n",
+"\n",
 "def event_handler(event):\n",
+"    global step\n",
+"    global train_costs\n",
+"    global test_costs\n",
 "    if isinstance(event, paddle.event.EndIteration):\n",
-"        if event.batch_id % 100 == 0:\n",
-"            print \"Pass %d, Batch %d, Cost %f\" % (\n",
-"                event.pass_id, event.batch_id, event.cost)\n",
-"\n",
-"    if isinstance(event, paddle.event.EndPass):\n",
-"        result = trainer.test(\n",
-"            reader=paddle.batch(\n",
-"                uci_housing.test(), batch_size=2),\n",
-"            feeding=feeding)\n",
-"        print \"Test %d, Cost %f\" % (event.pass_id, result.cost)\n"
+"        need_plot = False\n",
+"        if step % 10 == 0: # every 10 batches, record a train cost\n",
+"            train_costs[0].append(step)\n",
+"            train_costs[1].append(event.cost)\n",
+"\n",
+"        if step % 1000 == 0: # every 1000 batches, record a test cost\n",
+"            result = trainer.test(\n",
+"                reader=paddle.batch(\n",
+"                    uci_housing.test(), batch_size=2),\n",
+"                feeding=feeding)\n",
+"            test_costs[0].append(step)\n",
+"            test_costs[1].append(result.cost)\n",
+"\n",
+"        if step % 100 == 0: # every 100 batches, update cost plot\n",
+"            plt.plot(*train_costs)\n",
+"            plt.plot(*test_costs)\n",
+"            plt.legend(['Train Cost', 'Test Cost'], loc='upper left')\n",
+"            display.clear_output(wait=True)\n",
+"            display.display(plt.gcf())\n",
+"            plt.gcf().clear()\n",
+"        step += 1\n"
 ],
 "outputs": [
 {
@@ -367,6 +390,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
+"\n",
+"![png](./image/train-and-test.png)\n",
 "\n",
 "## Summary\n",
 "In this chapter, using the Boston housing dataset, we introduced the basic concepts of the linear regression model and how to train and test it with PaddlePaddle. Many models and techniques evolved from the simple linear regression model, so it is important to understand its principles and limitations.\n",
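
The notebook text in this diff defines the prediction $\hat{y}_i$ as a linear combination of a sample's attributes plus a bias $b$, and picks the mean squared error as the loss. The sketch below computes both in plain Python, assuming the standard definition $MSE=\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)^2$ (the formula itself falls outside this diff). The weights and samples are invented numbers for illustration, and only two attributes are used instead of the 13 in the housing data.

```python
from typing import Sequence


def predict(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Linear model: y_hat = w_1*x_1 + ... + w_d*x_d + b."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def mse(y_hat: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error: (1/n) * sum((y_hat_i - y_i)^2)."""
    if len(y_hat) != len(y) or not y:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)


if __name__ == "__main__":
    # Illustrative numbers only (d = 2 attributes instead of 13).
    w, b = [2.0, -1.0], 0.5
    xs = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
    ys = [2.0, 4.0, -2.0]
    preds = [predict(x, w, b) for x in xs]
    print(preds)           # [1.5, 4.5, -2.5]
    print(mse(preds, ys))  # 0.25
```

Each prediction here is off by exactly 0.5, so every squared error is 0.25 and so is their mean.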
......
@@ -159,18 +159,41 @@ feeding={'x': 0, 'y': 1}
 
 ```python
 # event_handler to print training and testing info
+import matplotlib.pyplot as plt
+from IPython import display
+import cPickle
+
+step=0
+
+train_costs=[],[]
+test_costs=[],[]
+
 def event_handler(event):
+    global step
+    global train_costs
+    global test_costs
     if isinstance(event, paddle.event.EndIteration):
-        if event.batch_id % 100 == 0:
-            print "Pass %d, Batch %d, Cost %f" % (
-                event.pass_id, event.batch_id, event.cost)
-
-    if isinstance(event, paddle.event.EndPass):
-        result = trainer.test(
-            reader=paddle.batch(
-                uci_housing.test(), batch_size=2),
-            feeding=feeding)
-        print "Test %d, Cost %f" % (event.pass_id, result.cost)
+        need_plot = False
+        if step % 10 == 0: # every 10 batches, record a train cost
+            train_costs[0].append(step)
+            train_costs[1].append(event.cost)
+
+        if step % 1000 == 0: # every 1000 batches, record a test cost
+            result = trainer.test(
+                reader=paddle.batch(
+                    uci_housing.test(), batch_size=2),
+                feeding=feeding)
+            test_costs[0].append(step)
+            test_costs[1].append(result.cost)
+
+        if step % 100 == 0: # every 100 batches, update cost plot
+            plt.plot(*train_costs)
+            plt.plot(*test_costs)
+            plt.legend(['Train Cost', 'Test Cost'], loc='upper left')
+            display.clear_output(wait=True)
+            display.display(plt.gcf())
+            plt.gcf().clear()
+        step += 1
 ```
 
 ### Start Training
@@ -186,6 +209,8 @@ trainer.train(
     num_passes=30)
 ```
 
+![png](./image/train-and-test.png)
+
 ## Summary
 In this chapter, using the Boston housing dataset, we introduced the basic concepts of the linear regression model and how to train and test it with PaddlePaddle. Many models and techniques evolved from the simple linear regression model, so it is important to understand its principles and limitations.
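
Both the old and the new handlers evaluate with `paddle.batch(uci_housing.test(), batch_size=2)`, which wraps a sample reader so that it yields mini-batches. The sketch below shows that idea in plain Python; it is a hypothetical re-implementation, not Paddle's code. It assumes a reader is a zero-argument callable returning an iterator of `(x, y)` tuples, uses a hypothetical `toy_reader` in place of `uci_housing.test()`, and keeps the final partial batch, a detail this diff does not settle for Paddle itself. With tuples of this shape, `feeding={'x': 0, 'y': 1}` in the hunk context appears to map tuple position 0 to the `x` data layer and position 1 to `y`.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def batch(reader: Callable[[], Iterator[T]],
          batch_size: int) -> Callable[[], Iterator[List[T]]]:
    """Hypothetical batching decorator: groups a reader's samples into lists."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    def batched_reader() -> Iterator[List[T]]:
        buf: List[T] = []
        for sample in reader():
            buf.append(sample)
            if len(buf) == batch_size:
                yield buf
                buf = []
        if buf:  # this sketch keeps the last, partial batch
            yield buf

    return batched_reader


def toy_reader() -> Iterator[Tuple[List[float], float]]:
    """Hypothetical stand-in for uci_housing.test(): yields (x, y) samples."""
    for i in range(5):
        yield ([float(i)], float(2 * i))


if __name__ == "__main__":
    # Mirrors paddle.batch(uci_housing.test(), batch_size=2) in shape only.
    for mini_batch in batch(toy_reader, batch_size=2)():
        print(mini_batch)
```

Returning a new zero-argument callable, rather than a one-shot generator, lets the same batched reader be iterated again on every test pass.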
......
@@ -205,19 +205,41 @@ feeding={'x': 0, 'y': 1}
 
 Moreover, an event handler is provided to print the training progress:
 ```python
-# event_handler to print training and testing info
+import matplotlib.pyplot as plt
+from IPython import display
+import cPickle
+
+step=0
+
+train_costs=[],[]
+test_costs=[],[]
+
 def event_handler(event):
+    global step
+    global train_costs
+    global test_costs
     if isinstance(event, paddle.event.EndIteration):
-        if event.batch_id % 100 == 0:
-            print "Pass %d, Batch %d, Cost %f" % (
-                event.pass_id, event.batch_id, event.cost)
-
-    if isinstance(event, paddle.event.EndPass):
-        result = trainer.test(
-            reader=paddle.batch(
-                uci_housing.test(), batch_size=2),
-            feeding=feeding)
-        print "Test %d, Cost %f" % (event.pass_id, result.cost)
+        need_plot = False
+        if step % 10 == 0: # every 10 batches, record a train cost
+            train_costs[0].append(step)
+            train_costs[1].append(event.cost)
+
+        if step % 1000 == 0: # every 1000 batches, record a test cost
+            result = trainer.test(
+                reader=paddle.batch(
+                    uci_housing.test(), batch_size=2),
+                feeding=feeding)
+            test_costs[0].append(step)
+            test_costs[1].append(result.cost)
+
+        if step % 100 == 0: # every 100 batches, update cost plot
+            plt.plot(*train_costs)
+            plt.plot(*test_costs)
+            plt.legend(['Train Cost', 'Test Cost'], loc='upper left')
+            display.clear_output(wait=True)
+            display.display(plt.gcf())
+            plt.gcf().clear()
+        step += 1
 ```
 
 ### Start Training
@@ -233,6 +255,8 @@ trainer.train(
     num_passes=30)
 ```
 
+![png](./image/train-and-test.png)
+
 ## Summary
 This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.
......
@@ -201,18 +201,41 @@ feeding={'x': 0, 'y': 1}
 
 ```python
 # event_handler to print training and testing info
+import matplotlib.pyplot as plt
+from IPython import display
+import cPickle
+
+step=0
+
+train_costs=[],[]
+test_costs=[],[]
+
 def event_handler(event):
+    global step
+    global train_costs
+    global test_costs
     if isinstance(event, paddle.event.EndIteration):
-        if event.batch_id % 100 == 0:
-            print "Pass %d, Batch %d, Cost %f" % (
-                event.pass_id, event.batch_id, event.cost)
-
-    if isinstance(event, paddle.event.EndPass):
-        result = trainer.test(
-            reader=paddle.batch(
-                uci_housing.test(), batch_size=2),
-            feeding=feeding)
-        print "Test %d, Cost %f" % (event.pass_id, result.cost)
+        need_plot = False
+        if step % 10 == 0: # every 10 batches, record a train cost
+            train_costs[0].append(step)
+            train_costs[1].append(event.cost)
+
+        if step % 1000 == 0: # every 1000 batches, record a test cost
+            result = trainer.test(
+                reader=paddle.batch(
+                    uci_housing.test(), batch_size=2),
+                feeding=feeding)
+            test_costs[0].append(step)
+            test_costs[1].append(result.cost)
+
+        if step % 100 == 0: # every 100 batches, update cost plot
+            plt.plot(*train_costs)
+            plt.plot(*test_costs)
+            plt.legend(['Train Cost', 'Test Cost'], loc='upper left')
+            display.clear_output(wait=True)
+            display.display(plt.gcf())
+            plt.gcf().clear()
+        step += 1
 ```
 
 ### Start Training
@@ -228,6 +251,8 @@ trainer.train(
     num_passes=30)
 ```
 
+![png](./image/train-and-test.png)
+
 ## Summary
 In this chapter, using the Boston housing dataset, we introduced the basic concepts of the linear regression model and how to train and test it with PaddlePaddle. Many models and techniques evolved from the simple linear regression model, so it is important to understand its principles and limitations.
......