<html>
<head>
  <script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    extensions: ["tex2jax.js", "TeX/AMSsymbols.js", "TeX/AMSmath.js"],
    jax: ["input/TeX", "output/HTML-CSS"],
    tex2jax: {
      inlineMath: [ ['$','$'] ],
      displayMath: [ ['$$','$$'] ],
      processEscapes: true
    },
    "HTML-CSS": { availableFonts: ["TeX"] }
  });
  </script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js" async></script>
  <script type="text/javascript" src="../.tools/theme/marked.js">
  </script>
  <link href="http://cdn.bootcss.com/highlight.js/9.9.0/styles/darcula.min.css" rel="stylesheet">
  <script src="http://cdn.bootcss.com/highlight.js/9.9.0/highlight.min.js"></script>
  <link href="http://cdn.bootcss.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet">
  <link href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" rel="stylesheet">
  <link href="../.tools/theme/github-markdown.css" rel='stylesheet'>
</head>
<style type="text/css" >
.markdown-body {
    box-sizing: border-box;
    min-width: 200px;
    max-width: 980px;
    margin: 0 auto;
    padding: 45px;
}
</style>


<body>

<div id="context" class="container-fluid markdown-body">
</div>

<!-- This block will be replaced by each markdown file content. Please do not change lines below.-->
<div id="markdown" style='display:none'>
# Linear Regression

Let's start this tutorial with the classic Linear Regression ([[1](#References)]) model.

In this chapter, you will build a model that predicts house prices from a real dataset and learn several important machine learning concepts along the way.

The source code of this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line). New users should refer to [Running This Book](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).


## Background
Suppose we are given a dataset of $n$ samples ${\{y_{i}, x_{i1}, \ldots, x_{id}\}}_{i=1}^{n}$, where $x_{i1}, \ldots, x_{id}$ are the values of the $d$ attributes of the $i$-th sample and $y_i$ is the target to be predicted for that sample.

The linear regression model assumes that the target $y_i$ can be described by a linear combination of the attributes, i.e.

$$y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b, \quad i=1,\ldots,n$$

For example, in the house price prediction problem we are going to explore, $x_{ij}$ describes the $j$-th attribute of house $i$ (such as the number of rooms, the number of nearby schools and hospitals, or traffic conditions), and $y_i$ is the price of the house.

At first glance, this assumption seems too simple, since the true relationship among the variables is unlikely to be linear. However, because the linear regression model is simple in form and easy to build and analyze, it has been widely applied to practical problems. Many classic statistical learning and machine learning books \[[2,3,4](#References)\] devote a chapter to linear models.

## Result Demo
We used the Boston house price dataset, obtained from the [UCI Housing dataset](http://paddlemodels.bj.bcebos.com/uci_housing/housing.data), to train the model and make predictions. The scatter plot below shows the model's price predictions for some of the houses. The x-axis value of each point is the median real price of houses of the same type, and the y-axis value is the price predicted by the linear regression model from the features. When the two values are exactly equal, the point falls on the dotted line, so the more accurate the model's predictions are, the closer the points lie to the dotted line.
<p align="center">
    <img src = "https://github.com/PaddlePaddle/book/blob/develop/01.fit_a_line/image/predictions.png?raw=true" width=400><br/>
    Figure 1. Predicted value vs. ground-truth value
</p>

## Model Overview

### Model Definition

In the Boston house price dataset, each home has 14 associated values: the first 13 describe various attributes of the house and correspond to $x_i$ in the model; the last value is the median price of the house, which we want to predict and which corresponds to $y_i$ in the model.

Therefore, our model can be expressed as:

$$\hat{Y} = \omega_1X_{1} + \omega_2X_{2} + \ldots + \omega_{13}X_{13} + b$$

$\hat{Y}$ denotes the model's prediction, to distinguish it from the real value $Y$. The parameters to be learned by the model are $\omega_1, \ldots, \omega_{13}, b$.
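
As a minimal sketch of the formula above, the prediction for a batch of houses can be computed as a matrix-vector product plus a bias. This is plain NumPy rather than the PaddlePaddle code used later in this chapter, and the function name `predict` and all numbers are made up for illustration.

```python
import numpy as np

def predict(features, weights, bias):
    """Return y_hat = features @ weights + bias for a batch of samples."""
    features = np.asarray(features, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    return features @ weights + bias

# Two made-up houses with 13 attributes each (illustrative values only).
features = np.ones((2, 13))
features[1] *= 2.0
weights = np.full(13, 0.5)  # stands in for omega_1 ... omega_13
bias = 1.0                  # stands in for b

y_hat = predict(features, weights, bias)
print(y_hat)  # one predicted value per house
```

Each row of `features` plays the role of $X_1, \ldots, X_{13}$ for one house, so the result has one entry per house.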

After building the model, we need to give it an optimization goal so that the learned parameters make the predicted value $\hat{Y}$ as close as possible to the true value $Y$. Here we introduce the concept of a [loss function](https://en.wikipedia.org/wiki/Loss_function), also called a cost function. Given the target value $y_{i}$ of any data sample and the predicted value $\hat{y_{i}}$ produced by a model, the loss function outputs a non-negative real number that is usually used to represent the model error.

For linear regression models, the most common loss function is the Mean Squared Error ([MSE](https://en.wikipedia.org/wiki/Mean_squared_error)), which is:

$$MSE=\frac{1}{n}\sum_{i=1}^{n}{(\hat{Y_i}-Y_i)}^2$$

That is, for a dataset of size $n$, $MSE$ is the mean of the squared errors of the $n$ predictions.
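
The definition translates almost directly into code. The following is a small NumPy sketch; the helper name `mean_squared_error` and the sample prices are assumptions for illustration and are not taken from the dataset.

```python
import numpy as np

def mean_squared_error(y_pred, y_true):
    """Mean of the squared differences between predictions and targets."""
    y_pred = np.asarray(y_pred, dtype=np.float64)
    y_true = np.asarray(y_true, dtype=np.float64)
    return float(np.mean((y_pred - y_true) ** 2))

y_true = [24.0, 22.0, 35.0]  # made-up target prices
y_pred = [25.0, 21.0, 33.0]  # made-up model predictions

# The errors are 1, -1 and -2, so the squared errors are 1, 1 and 4.
mse = mean_squared_error(y_pred, y_true)
print(mse)  # 2.0
```

Because the errors are squared, the largest error (here the third sample) dominates the result.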

The loss function is generally optimized with gradient descent, a first-order optimization algorithm. If $f(x)$ is defined and differentiable at point $x_n$, then $f(x)$ decreases fastest at $x_n$ in the direction of the negative gradient $-\nabla f(x_n)$. We adjust $x$ repeatedly so that $f(x)$ approaches a local or global minimum. The adjustment is as follows:

$$x_{n+1}=x_n-\lambda\nabla f(x_n), \quad n \geq 0$$

where $\lambda$ is the learning rate. This method of adjustment is called gradient descent.
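
To make the update rule concrete, here is a short sketch that applies it to a one-dimensional function. The function $f(x) = (x-3)^2$, the starting point, and the helper name `gradient_descent` are illustrative assumptions chosen so that the minimum is known in advance.

```python
def gradient_descent(grad, x0, learning_rate=0.1, num_steps=100):
    """Repeatedly apply x_{n+1} = x_n - learning_rate * grad(x_n)."""
    x = x0
    for _ in range(num_steps):
        x = x - learning_rate * grad(x)
    return x

# f(x) = (x - 3)^2 has the gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # very close to 3.0
```

With this learning rate, every step shrinks the distance to the minimum by a constant factor of 0.8. A learning rate that is too large would make the iterates overshoot and diverge.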

### Training Process

After defining the model structure, we train the model through the following steps.

  1. Initialize the parameters, including the weights $\omega_i$ and the bias $b$ (e.g. from a distribution with mean 0 and variance 1).
  2. Run forward propagation to compute the network output and the loss function.
  3. Run [backpropagation](https://en.wikipedia.org/wiki/Backpropagation) according to the loss function, passing the error backward from the output layer through the network and updating the parameters.
  4. Repeat steps 2~3 until the training error reaches the specified level or the number of training rounds reaches the set value.
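
The four steps above can be sketched for linear regression in plain NumPy. This is an illustration under stated assumptions and not the PaddlePaddle implementation used later: the data is synthetic and noise-free, the gradients are derived by hand from the MSE loss, and names such as `train_linear_regression`, `true_w`, and `true_b` are hypothetical.

```python
import numpy as np

def train_linear_regression(x, y, learning_rate=0.5, max_epochs=5000,
                            tolerance=1e-12, seed=0):
    rng = np.random.default_rng(seed)
    num_samples, num_features = x.shape
    # Step 1: initialize the weights (mean 0, variance 1) and the bias.
    w = rng.normal(0.0, 1.0, size=num_features)
    b = 0.0
    for _ in range(max_epochs):
        # Step 2: forward pass computes the output and the MSE loss.
        error = x @ w + b - y
        loss = float(np.mean(error ** 2))
        # Step 4: stop once the training error is small enough.
        if loss < tolerance:
            break
        # Step 3: gradients of the MSE loss, then the parameter update.
        grad_w = 2.0 * (x.T @ error) / num_samples
        grad_b = 2.0 * float(np.mean(error))
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    final_loss = float(np.mean((x @ w + b - y) ** 2))
    return w, b, final_loss

# Synthetic, noise-free data generated from known parameters.
rng = np.random.default_rng(42)
x = rng.uniform(-0.5, 0.5, size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 4.0
y = x @ true_w + true_b

w, b, final_loss = train_linear_regression(x, y)
print(w, b, final_loss)  # w and b end up close to true_w and true_b
```

For a single linear layer, backpropagation reduces to the two gradient expressions in step 3. Frameworks such as PaddlePaddle derive these automatically for deeper networks.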

## Dataset

### Dataset Introduction
The dataset consists of 506 rows, each containing information about a type of house in a suburb of Boston and the median price of that type of house. The meaning of each attribute is as follows:

| Property Name | Explanation | Type |
| ------| ------ | ------ |
| CRIM | Per capita crime rate in the town | Continuous value |
| ZN | Proportion of residential land zoned for lots over 25,000 square feet | Continuous value |
| INDUS | Proportion of non-retail business land | Continuous value |
| CHAS | Whether it is adjacent to the Charles River | Discrete value, 1=adjacent; 0=not adjacent |
| NOX | Nitric oxide concentration | Continuous value |
| RM | Average number of rooms per house | Continuous value |
| AGE | Proportion of owner-occupied units built before 1940 | Continuous value |
| DIS | Weighted distance to 5 job centers in Boston | Continuous value |
| RAD | Index of accessibility to radial highways | Continuous value |
| TAX | Full-value property tax rate | Continuous value |
| PTRATIO | Pupil-teacher ratio | Continuous value |
| B | 1000(BK - 0.63)^2, where BK is the proportion of Black residents | Continuous value |
| LSTAT | Low-income population ratio | Continuous value |
| MEDV | Median price of homes of the same type | Continuous value |

### Data Pre-processing

#### Continuous value and discrete value
Analyzing the data, we first find that the 13 attributes consist of 12 continuous attributes and 1 discrete attribute (CHAS). Discrete values are often represented by numbers such as 0, 1, and 2, but they differ in meaning from continuous values because the difference between two discrete values is meaningless. For example, if we use 0, 1, and 2 to represent red, green, and blue, we cannot infer that the distance between blue and red is larger than that between green and red. So for a discrete attribute with $d$ possible values, we usually convert it into $d$ binary attributes with a value of 0 or 1, or map each possible value to a multidimensional vector. CHAS does not have this problem, however, since it is already a binary attribute.
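
The conversion of a discrete attribute into $d$ binary attributes is commonly called one-hot encoding. The sketch below shows it for the color example; the helper name `one_hot` and the color codes are illustrative assumptions, and nothing like this is needed for CHAS.

```python
import numpy as np

def one_hot(values, num_classes):
    """Convert integer class ids into rows with a single 1 per row."""
    values = np.asarray(values, dtype=np.int64)
    encoded = np.zeros((values.shape[0], num_classes), dtype=np.float32)
    encoded[np.arange(values.shape[0]), values] = 1.0
    return encoded

colors = [0, 2, 1, 0]  # 0 = red, 1 = green, 2 = blue
encoded = one_hot(colors, num_classes=3)
print(encoded)
```

After encoding, any two different colors are the same distance apart, so the model can no longer read a false ordering into the numeric codes.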

#### Normalization of attributes
Another easily observed fact is that the value ranges of the attributes differ greatly (as shown in Figure 2). For example, the value range of attribute B is [0.32, 396.90], while that of attribute NOX is [0.3850, 0.8170]. This calls for a common operation: normalization. The goal of normalization is to scale the values of each attribute to a similar range, such as [-0.5, 0.5]. Here we use a very common method: subtract the mean and divide by the range of values.
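
The following sketch applies "subtract the mean and divide by the range" column by column. The helper name `normalize` and the three sample rows are assumptions for illustration; only the extreme values of B and NOX are taken from the text above.

```python
import numpy as np

def normalize(data):
    """Scale every column to zero mean and a value range of width 1."""
    data = np.asarray(data, dtype=np.float64)
    mean = data.mean(axis=0)
    value_range = data.max(axis=0) - data.min(axis=0)
    value_range[value_range == 0] = 1.0  # guard against constant columns
    return (data - mean) / value_range

# Columns: B and NOX; the middle row is made up.
raw = np.array([[0.32, 0.385],
                [396.90, 0.817],
                [200.0, 0.5]])
scaled = normalize(raw)
print(scaled)
```

After scaling, both columns have a mean of zero and span a range of width 1, even though the raw values differ by orders of magnitude.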

There are at least three reasons for normalization (or [feature scaling](https://en.wikipedia.org/wiki/Feature_scaling)):

- Value ranges that are too large or too small can cause floating-point overflow or underflow during calculation.

- Different value ranges give different attributes different importance to the model (at least in the initial period of training), and this implicit assumption is often unreasonable. This can make optimization difficult and greatly lengthen the training time.

- Many machine learning techniques/models (such as L1 and L2 regularization and the Vector Space Model) are based on the assumption that all attributes have a mean close to zero and similar value ranges.

<p align="center">
    <img src = "https://github.com/PaddlePaddle/book/blob/develop/01.fit_a_line/image/ranges.png?raw=true" width=550><br/>
    Figure 2. Value range of attributes for all dimensions
</p>

#### Organizing training set and testing set

We split the dataset into two parts. One is used to adjust the parameters of the model, that is, to train it; the error of the model on this part is called the **training error**. The other is used for testing; the error of the model on this part is called the **test error**. The goal of training is to predict unknown new data by finding regularities in the training data, so the test error is a better indicator of model performance. When choosing the split ratio, we should weigh two factors: more training data reduces the variance of the estimated parameters, resulting in a more reliable model, while more test data reduces the variance of the test error, resulting in a more credible test error. The split ratio in our example is $8:2$.
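
As a small sketch of the $8:2$ split, the rows can simply be cut at 80% of the dataset. The helper name `split_dataset` is hypothetical and the array below is a placeholder with the same shape as the housing data (506 rows, 14 columns), not the real values.

```python
import numpy as np

def split_dataset(data, train_ratio=0.8):
    """Return the first train_ratio share of the rows and the remaining rows."""
    data = np.asarray(data)
    offset = int(data.shape[0] * train_ratio)
    return data[:offset], data[offset:]

# Placeholder array shaped like the housing data.
data = np.arange(506 * 14, dtype=np.float32).reshape(506, 14)
train_data, test_data = split_dataset(data)
print(train_data.shape, test_data.shape)  # (404, 14) (102, 14)
```

This cuts the rows in their stored order. If the rows were sorted in some meaningful way, shuffling them before the split would usually be a safer choice.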

In more complex model training, we often need one more dataset: the validation set. Because complex models often have [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_optimization) that need to be tuned, we try several hyperparameter combinations, train a model for each, compare their performance on the validation set to select the best combination, and finally use the model trained with it to evaluate the test error on the test set. Since the model trained in this chapter is relatively simple, we will not cover this process here.
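
Although this chapter does not use a validation set, the selection procedure can be sketched briefly. In the example below the only hyperparameter is the learning rate; the synthetic data, the candidate values, and the helper names `fit` and `mse` are all assumptions made for illustration.

```python
import numpy as np

def fit(x, y, learning_rate, num_epochs=200):
    """Train a linear model with gradient descent on the MSE loss."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(num_epochs):
        error = x @ w + b - y
        w -= learning_rate * 2.0 * (x.T @ error) / len(y)
        b -= learning_rate * 2.0 * float(error.mean())
    return w, b

def mse(x, y, w, b):
    return float(np.mean((x @ w + b - y) ** 2))

# Synthetic data with a little noise, split into train/validation/test parts.
rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, size=(300, 3))
y = x @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(0.0, 0.1, size=300)
x_train, x_valid, x_test = x[:200], x[200:250], x[250:]
y_train, y_valid, y_test = y[:200], y[200:250], y[250:]

# Train one model per candidate and compare them on the validation set.
candidates = [0.001, 0.01, 0.1, 0.5]
valid_errors = {lr: mse(x_valid, y_valid, *fit(x_train, y_train, lr))
                for lr in candidates}
best_lr = min(valid_errors, key=valid_errors.get)

# Only the selected model is evaluated on the test set.
test_error = mse(x_test, y_test, *fit(x_train, y_train, best_lr))
print(best_lr, test_error)
```

The test set is touched only once, after the choice has been made, so the reported test error is not biased by the selection itself.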

## Training

`fit_a_line/train.py` demonstrates the overall training process.

### Configuring the Data Feeder

First we import the libraries:

```python
from __future__ import print_function
import paddle
import paddle.fluid as fluid
import numpy
import math
import sys
```

We load the [UCI Housing dataset](http://paddlemodels.bj.bcebos.com/uci_housing/housing.data) through the uci_housing module, which encapsulates:

1. Downloading the data. The downloaded data is saved in ~/.cache/paddle/dataset/uci_housing/housing.data.
2. [Data pre-processing](#data-pre-processing).

Next we define the data feeder for training. The data feeder reads one batch of `BATCH_SIZE` samples each time. If you want the data to be shuffled, you can set a batch size and a buffer size; the data feeder then draws each batch at random from the buffer.

```python
BATCH_SIZE = 20

train_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.uci_housing.train(), buf_size=500),
        batch_size=BATCH_SIZE)

test_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.uci_housing.test(), buf_size=500),
        batch_size=BATCH_SIZE)
```
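
To illustrate the idea of shuffling through a buffer and then batching, here is a plain-Python sketch. It is not PaddlePaddle's actual implementation; `shuffle_reader`, `batch_reader`, and `range_reader` are hypothetical names, and a reader is assumed to be a function that returns an iterator over samples.

```python
import random

def shuffle_reader(reader, buf_size, seed=None):
    """Wrap a reader so that samples are shuffled within a buffer of buf_size."""
    rng = random.Random(seed)
    def shuffled():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        rng.shuffle(buf)  # flush the partially filled buffer at the end
        for item in buf:
            yield item
    return shuffled

def batch_reader(reader, batch_size):
    """Wrap a reader so that it yields lists of batch_size samples."""
    def batched():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # the last batch may be smaller
            yield batch
    return batched

def range_reader():
    for i in range(10):
        yield i

reader = batch_reader(shuffle_reader(range_reader, buf_size=4, seed=0), batch_size=3)
batches = list(reader())
print(batches)
```

In this sketch, a larger buffer gives a more thorough shuffle at the cost of more memory. When the buffer is at least as large as the dataset, the whole dataset is shuffled at once.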

If you want to read data directly from a \*.txt file, you can refer to the following method.

```python
import six  # six.moves.range works on both Python 2 and Python 3

filename = 'housing.data'  # assumed local path to the data file; adjust as needed

feature_names = [
    'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
    'PTRATIO', 'B', 'LSTAT', 'MEDV'
]
feature_num = len(feature_names)

data = numpy.fromfile(filename, sep=' ')  # read the raw data from the file
data = data.reshape(data.shape[0] // feature_num, feature_num)

# normalize every feature column; the last column is the label and is left unchanged
maximums, minimums, avgs = data.max(axis=0), data.min(axis=0), data.sum(axis=0) / data.shape[0]
for i in six.moves.range(feature_num - 1):
    data[:, i] = (data[:, i] - avgs[i]) / (maximums[i] - minimums[i])

ratio = 0.8  # share of the data used for training; the rest is used for testing
offset = int(data.shape[0] * ratio)
train_data = data[:offset]
test_data = data[offset:]

def reader_creator(samples):
    # paddle.reader.shuffle expects a reader (a callable that yields samples),
    # so wrap the array and yield (features, label) pairs
    def reader():
        for sample in samples:
            yield sample[:-1], sample[-1:]
    return reader

train_reader = paddle.batch(
    paddle.reader.shuffle(
        reader_creator(train_data), buf_size=500),
        batch_size=BATCH_SIZE)

test_reader = paddle.batch(
    paddle.reader.shuffle(
        reader_creator(test_data), buf_size=500),
        batch_size=BATCH_SIZE)
```

### Configure Program for Training
The training program defines the network structure of the model to be trained. For linear regression, it is a simple fully connected layer from input to output. More complex structures, such as Convolutional Neural Networks and Recurrent Neural Networks, will be introduced in later chapters. The training program must return the mean error as its first return value, because the mean error is used for backpropagation.

```python
x = fluid.layers.data(name='x', shape=[13], dtype='float32') # define the shape and data type of the input
y = fluid.layers.data(name='y', shape=[1], dtype='float32') # define the shape and data type of the label
y_predict = fluid.layers.fc(input=x, size=1, act=None) # fully connected layer connecting input and output

main_program = fluid.default_main_program() # get the default/global main program
startup_program = fluid.default_startup_program() # get the default/global startup program

cost = fluid.layers.square_error_cost(input=y_predict, label=y) # squared error between the prediction and the label
avg_loss = fluid.layers.mean(cost) # average the squared error to get the mean loss
```
For details, please refer to:
[fluid.default_main_program](http://www.paddlepaddle.org/documentation/docs/zh/develop/api_cn/fluid_cn.html#default-main-program)
[fluid.default_startup_program](http://www.paddlepaddle.org/documentation/docs/zh/develop/api_cn/fluid_cn.html#default-startup-program)

### Optimizer Function Configuration

In the `SGD optimizer` below, `learning_rate` is the learning rate, which affects how fast the network training converges.

```python
# Clone main_program to get test_program.
# Some operators behave differently in training and testing. For example, batch_norm uses the for_test parameter to determine whether the program is for training or for testing.
# This API does not delete any operator, so apply it before backward and optimization.
test_program = main_program.clone(for_test=True)

sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_loss)
```

### Define Training Place

We can define whether the operations run on the CPU or on the GPU.

```python
use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace() # device on which the executor runs

# An executor accepts a program and adds data input operators and result fetch operators to it based on the feed map and fetch list. Call run(...) to run the program and close() to close the executor.
exe = fluid.Executor(place)
```
For details, please refer to:
[fluid.executor](http://www.paddlepaddle.org/documentation/docs/zh/develop/api_cn/fluid_cn.html#permalink-15-executor)

### Create Training Process
Training needs a training program and some parameters. We also create a function that computes the test error during training. Its parameters are executor, program, reader, feeder, and fetch_list. executor is the executor created earlier. program is the program run by the executor; if it is not specified, default_main_program is used. reader is the reader that provides the data. feeder feeds the data to the input variables of the network, and fetch_list lists the variables (or their names) that the user wants to fetch.

```python
num_epochs = 100

def train_test(executor, program, reader, feeder, fetch_list):
    accumulated = 1 * [0]
    count = 0
    for data_test in reader():
        outs = executor.run(program=program,
                            feed=feeder.feed(data_test),
                            fetch_list=fetch_list)
        accumulated = [x_c[0] + x_c[1][0] for x_c in zip(accumulated, outs)] # accumulate the loss of each test batch
        count += 1 # count the test batches
    return [x_d / count for x_d in accumulated] # compute the mean loss over the batches
```

### Train Main Loop

Specify the directory in which the model will be saved and initialize the executor.

```python
# Jupyter notebook magic; remove this line when running as a plain Python script
%matplotlib inline
params_dirname = "fit_a_line.inference.model"
feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
exe.run(startup_program)
train_prompt = "train cost"
test_prompt = "test cost"
from paddle.utils.plot import Ploter
plot_prompt = Ploter(train_prompt, test_prompt)
step = 0

exe_test = fluid.Executor(place)
```
PaddlePaddle provides a reader mechanism for reading training data. A reader can provide multiple columns of data at a time, so we need a Python list to specify the reading order. We create a loop that trains until the result is good enough or the number of iterations is reached.
If the current iteration is one at which parameters should be saved, the trained parameters are saved into `params_dirname`.
Set up the main loop for training.
```python
for pass_id in range(num_epochs):
    for data_train in train_reader():
        avg_loss_value, = exe.run(main_program,
                                  feed=feeder.feed(data_train),
                                  fetch_list=[avg_loss])
        if step % 10 == 0: # record and print the training loss every 10 batches
            plot_prompt.append(train_prompt, step, avg_loss_value[0])
            plot_prompt.plot()
            print("%s, Step %d, Cost %f" %
                      (train_prompt, step, avg_loss_value[0]))
        if step % 100 == 0:  # record and print the test loss every 100 batches
            test_metrics = train_test(executor=exe_test,
                                      program=test_program,
                                      reader=test_reader,
                                      fetch_list=[avg_loss.name],
                                      feeder=feeder)
            plot_prompt.append(test_prompt, step, test_metrics[0])
            plot_prompt.plot()
            print("%s, Step %d, Cost %f" %
                      (test_prompt, step, test_metrics[0]))
            if test_metrics[0] < 10.0: # the test loss is low enough, so leave the inner loop
                break

        step += 1

        if math.isnan(float(avg_loss_value[0])):
            sys.exit("got NaN loss, training failed.")

        # save the trained parameters into the directory given before
        if params_dirname is not None:
            fluid.io.save_inference_model(params_dirname, ['x'], [y_predict], exe)
```

## Predict
Running the prediction program requires the trained parameters, which are stored in `params_dirname`.

### Prepare Environment for Prediction
Similar to training, the predictor needs a program for prediction. We can slightly modify our training program to include the prediction value.

```python
infer_exe = fluid.Executor(place)
inference_scope = fluid.core.Scope()
```

### Predict

Define a helper function that saves the result as a picture:
```python
def save_result(points1, points2):
    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt
    x1 = [idx for idx in range(len(points1))]
    y1 = points1
    y2 = points2
    l1 = plt.plot(x1, y1, 'r--', label='predictions')
    l2 = plt.plot(x1, y2, 'g--', label='GT')
    plt.plot(x1, y1, 'ro-', x1, y2, 'g+-')
    plt.title('predictions VS GT')
    plt.legend()
    plt.savefig('./image/prediction_gt.png')
```

Through fluid.io.load_inference_model, the predictor reads the trained model from `params_dirname` and uses it to predict unseen data.

```python
with fluid.scope_guard(inference_scope):
    [inference_program, feed_target_names,
     fetch_targets] = fluid.io.load_inference_model(params_dirname, infer_exe) # load the trained inference model
    batch_size = 10

    infer_reader = paddle.batch(
        paddle.dataset.uci_housing.test(), batch_size=batch_size) # prepare the test dataset

    infer_data = next(infer_reader())
    infer_feat = numpy.array(
        [data[0] for data in infer_data]).astype("float32") # extract the features from the test data
    infer_label = numpy.array(
        [data[1] for data in infer_data]).astype("float32") # extract the labels from the test data

    assert feed_target_names[0] == 'x'
    results = infer_exe.run(inference_program,
                            feed={feed_target_names[0]: numpy.array(infer_feat)},
                            fetch_list=fetch_targets) # run the prediction
    # print the predictions and the labels, then visualize the result
    print("infer results: (House Price)")
    for idx, val in enumerate(results[0]):
        print("%d: %.2f" % (idx, val)) # print the prediction

    print("\nground truth:")
    for idx, val in enumerate(infer_label):
        print("%d: %.2f" % (idx, val)) # print the label

    save_result(results[0], infer_label) # save the picture
```


## Summary
In this chapter, we analyzed the Boston house price dataset to introduce the basic concepts of the linear regression model and how to use PaddlePaddle to train and test it. Many models and theories are derived from the linear regression model, so it is well worth understanding its principles and limitations.

<a name="References"></a>
## References
1. https://en.wikipedia.org/wiki/Linear_regression
2. Friedman J, Hastie T, Tibshirani R. The elements of statistical learning[M]. Springer, Berlin: Springer series in statistics, 2001.
3. Murphy K P. Machine learning: a probabilistic perspective[M]. MIT press, 2012.
4. Bishop C M. Pattern recognition[J]. Machine Learning, 2006, 128.

<br/>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://paddlepaddleimage.cdn.bcebos.com/bookimage/camo.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

</div>
<!-- You can change the lines below now. -->

<script type="text/javascript">
marked.setOptions({
  renderer: new marked.Renderer(),
  gfm: true,
  breaks: false,
  smartypants: true,
  highlight: function(code, lang) {
    code = code.replace(/&amp;/g, "&")
    code = code.replace(/&gt;/g, ">")
    code = code.replace(/&lt;/g, "<")
    code = code.replace(/&nbsp;/g, " ")
    return hljs.highlightAuto(code, [lang]).value;
  }
});
document.getElementById("context").innerHTML = marked(
        document.getElementById("markdown").innerHTML)
</script>
</body>