# Linear Regression

Let's start this tutorial with the classic Linear Regression model \[[1](#References)\].

In this chapter, you will build a model that predicts house prices from a real dataset and learn several important machine learning concepts.

The source code for this tutorial is in [book/fit_a_line](https://github.com/PaddlePaddle/book/tree/develop/01.fit_a_line). New users should first read [Running This Book](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).

## Background
Given a dataset of $n$ samples ${\{y_{i}, x_{i1}, \ldots, x_{id}\}}_{i=1}^{n}$, where $x_{i1}, \ldots, x_{id}$ are the values of the $d$ attributes of the $i$-th sample and $y_i$ is the target to be predicted for that sample.

The linear regression model assumes that the target $y_i$ can be described by a linear combination of the attributes, i.e.

$$y_i = \omega_1x_{i1} + \omega_2x_{i2} + \ldots + \omega_dx_{id} + b,  i=1,\ldots,n$$

For example, in the house price prediction problem explored in this chapter, $x_{ij}$ describes the $j$-th attribute of house $i$ (such as the number of rooms, the number of nearby schools and hospitals, traffic conditions, etc.), and $y_i$ is the price of the house.



At first glance, this assumption seems too simple, since the true relationship among the variables is unlikely to be linear. However, because the linear regression model has a simple form and is easy to model and analyze, it is widely applied to practical problems. Many classic statistical learning and machine learning books \[[2,3,4](#references)\] devote a chapter to linear models.

## Result Demo
We train and evaluate the model on the Boston house price dataset, obtained from the [UCI Housing dataset](http://paddlemodels.bj.bcebos.com/uci_housing/housing.data). The scatter plot below shows the model's price predictions for some of the houses. Each point's x-coordinate is the true median price of one type of house, and its y-coordinate is the price that the linear regression model predicts from that house's features. When the two values are equal, the point falls on the dotted line, so the more accurate the prediction, the closer the point lies to the dotted line.
<p align="center">
    <img src = "https://github.com/PaddlePaddle/book/blob/develop/01.fit_a_line/image/predictions.png?raw=true" width=400><br/>
    Figure 1. Predicted value vs. ground-truth value
</p>

## Model Overview

### Model Definition

The Boston house price dataset has 14 values for each house: the first 13 describe various attributes of the house and correspond to $x_i$ in the model; the last is the median price of the house, which we want to predict and which corresponds to $y_i$ in the model.

Therefore, our model can be expressed as:

$$\hat{Y} = \omega_1X_{1} + \omega_2X_{2} + \ldots + \omega_{13}X_{13} + b$$

$\hat{Y}$ represents the predicted result of the model and is used to distinguish it from the real value $Y$. The parameters to be learned by the model are: $\omega_1, \ldots, \omega_{13}, b$.

After building the model, we need to give it an optimization goal so that the learned parameters bring the predicted value $\hat{Y}$ as close as possible to the true value $Y$. Here we introduce the concept of a [loss function](https://en.wikipedia.org/wiki/Loss_function) (also called a cost function). Given the target value $y_{i}$ of any data sample and the value $\hat{y_{i}}$ predicted by the model, the loss function outputs a non-negative real number that is usually used to represent the model error.

For linear regression models, the most common loss function is the Mean Squared Error ([MSE](https://en.wikipedia.org/wiki/Mean_squared_error)), which is:

$$MSE=\frac{1}{n}\sum_{i=1}^{n}{(\hat{Y_i}-Y_i)}^2$$

That is, for a test set of size $n$, $MSE$ is the mean of the squared errors of the $n$ predictions.

The loss function is generally optimized with gradient descent, a first-order optimization algorithm. If $f(x)$ is defined and differentiable at a point $x_n$, then $f(x)$ decreases fastest from $x_n$ in the direction of the negative gradient $-\nabla f(x_n)$. We therefore adjust $x$ repeatedly to bring $f(x)$ close to a local or global minimum. The adjustment is as follows:

$$x_{n+1} = x_n - \lambda \nabla f(x_n), \quad n \geq 0$$

where $\lambda$ is the learning rate. This method of adjustment is called gradient descent.
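
To make the update rule concrete, here is a minimal sketch in plain Python that applies it to the toy function $f(x) = (x - 3)^2$. The function, the starting point, the learning rate of 0.1, and the helper name `gradient_descent` are illustrative assumptions and are not part of this chapter's code.

```python
def gradient_descent(grad, x0, learning_rate=0.1, num_steps=100):
    """Apply x_{n+1} = x_n - learning_rate * grad(x_n) for num_steps iterations."""
    x = x0
    for _ in range(num_steps):
        x = x - learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # very close to 3
```

With this learning rate, each step shrinks the distance to the minimum by a constant factor. A learning rate that is too large would make the iterates overshoot and diverge instead.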

### Training Process

After defining the model structure, we train the model through the following steps.

  1. Initialize the parameters, including the weights $\omega_i$ and the bias $b$ (e.g., randomly with mean 0 and variance 1).
  2. Run forward propagation to compute the network output and the loss function.
  3. Propagate the error backward from the output layer according to the loss function ([backpropagation](https://en.wikipedia.org/wiki/Backpropagation)) and update the parameters in the network.
  4. Repeat steps 2~3 until the training error reaches the specified level or the number of training rounds reaches the set value.
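
The four steps above can be sketched in plain Python for a tiny synthetic problem. This is an illustration only, not the PaddlePaddle implementation used later in this chapter: the helper `train_linear_regression`, the toy data, the learning rate, and the fixed number of epochs are all assumptions. For the MSE loss, the gradient with respect to $\omega_j$ is $\frac{2}{n}\sum_i(\hat{y}_i - y_i)x_{ij}$ and the gradient with respect to $b$ is $\frac{2}{n}\sum_i(\hat{y}_i - y_i)$, so the sketch applies these formulas directly.

```python
import random


def train_linear_regression(samples, targets, learning_rate=0.1,
                            num_epochs=1000, seed=0):
    """Fit y = w . x + b by full-batch gradient descent on the MSE loss."""
    rng = random.Random(seed)
    n = len(samples)
    num_features = len(samples[0])

    # Step 1: initialize the weights (mean 0, variance 1) and the bias.
    weights = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
    bias = 0.0
    loss = None

    # Step 4: repeat steps 2 and 3 for a fixed number of epochs.
    for _ in range(num_epochs):
        # Step 2: forward pass, i.e. the predictions and the MSE loss.
        preds = [sum(w * x for w, x in zip(weights, row)) + bias
                 for row in samples]
        errors = [p - t for p, t in zip(preds, targets)]
        loss = sum(e * e for e in errors) / n

        # Step 3: gradients of the MSE loss, then the parameter update.
        grad_w = [2.0 / n * sum(e * row[j] for e, row in zip(errors, samples))
                  for j in range(num_features)]
        grad_b = 2.0 / n * sum(errors)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b

    # `loss` is the value measured at the start of the final epoch.
    return weights, bias, loss


# Toy data generated from y = 2 * x1 - 3 * x2 + 1 (no noise).
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
targets = [2.0 * x1 - 3.0 * x2 + 1.0 for x1, x2 in samples]
weights, bias, loss = train_linear_regression(samples, targets)
print(weights, bias, loss)
```

Because the toy data is exactly linear, the learned weights and bias approach the values used to generate it, and the loss approaches zero.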


## Dataset

### Dataset Introduction
The dataset consists of 506 rows, each describing one type of house in the suburbs of Boston together with the median price of that type of house. The attributes have the following meanings:

| Property Name | Explanation | Type |
| ------| ------ | ------ |
| CRIM | Per capita crime rate in the town | Continuous value |
| ZN | Proportion of residential land zoned for lots over 25,000 square feet | Continuous value |
| INDUS | Proportion of non-retail business land | Continuous value |
| CHAS | Whether it is adjacent to the Charles River | Discrete value, 1=adjacent; 0=not adjacent |
| NOX | Nitric oxides concentration | Continuous value |
| RM | Average number of rooms per house | Continuous value |
| AGE | Proportion of owner-occupied units built before 1940 | Continuous value |
| DIS | Weighted distance to 5 employment centers in Boston | Continuous value |
| RAD | Index of accessibility to radial highways | Continuous value |
| TAX | Full-value property tax rate | Continuous value |
| PTRATIO | Pupil-teacher ratio | Continuous value |
| B | 1000(BK - 0.63)^2, where BK is the proportion of Black residents | Continuous value |
| LSTAT | Proportion of lower-income population | Continuous value |
| MEDV | Median price of homes of the same type | Continuous value |

### Data Pre-processing

#### Continuous value and discrete value
Analyzing the data, we first find that the 13 attributes consist of 12 continuous values and 1 discrete value (CHAS). A discrete value is often represented by numbers such as 0, 1, and 2, but it differs from a continuous value in that the difference between two discrete values carries no meaning. For example, if we use 0, 1, and 2 to represent red, green, and blue, we cannot infer that blue is farther from red than green is. So for a discrete attribute with $d$ possible values, we usually convert it into $d$ binary attributes with a value of 0 or 1, or map each possible value to a multidimensional vector. This problem does not arise for CHAS, however, because CHAS is already a binary attribute.
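
As a small illustration of converting a discrete attribute with $d$ possible values into $d$ binary attributes (often called one-hot encoding), here is a sketch in plain Python that uses the color example above. The helper name `one_hot` is hypothetical, and this chapter's dataset does not need the step because CHAS is already binary.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` in `categories`, 0 elsewhere."""
    if value not in categories:
        raise ValueError("unknown category: %r" % (value,))
    return [1 if value == category else 0 for category in categories]


colors = ["red", "green", "blue"]
encoded = [one_hot(color, colors) for color in ["blue", "red", "green"]]
print(encoded)
```

In this encoding every pair of distinct colors differs in exactly two positions, so no color is artificially "closer" to another.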

#### Normalization of attributes
Another easily observed fact is that the value ranges of the attributes differ greatly (as shown in Figure 2). For example, attribute B ranges over [0.32, 396.90], whereas attribute NOX ranges over [0.3850, 0.8170]. This calls for a common operation: normalization. The goal of normalization is to scale the values of each attribute to a similar range, such as [-0.5, 0.5]. Here we use a very common method: subtract the mean and divide by the range of values.

There are at least three reasons for implementing normalization (or [Feature scaling](https://en.wikipedia.org/wiki/Feature_scaling)):

- Value ranges that are too large or too small can cause floating-point overflow or underflow during calculation.

- Different value ranges give the attributes different importance to the model (at least in the early stage of training), and this implicit assumption is often unreasonable. It can make optimization difficult and training much slower.

- Many machine learning techniques and models (such as L1 and L2 regularization terms and the Vector Space Model) assume that all attribute values are roughly centered at zero and have similar ranges.


<p align="center">
    <img src = "https://github.com/PaddlePaddle/book/blob/develop/01.fit_a_line/image/ranges.png?raw=true" width=550><br/>
    Figure 2. Value range of attributes for all dimensions
</p>
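
The normalization described in this section (subtract the mean, then divide by the range) can be sketched in plain Python as follows. The helper name `normalize_columns` and the three sample rows are assumptions made for illustration: the first two rows use the range endpoints of B and NOX quoted above, and the third row is made up.

```python
def normalize_columns(rows):
    """Scale each column as (value - column mean) / (column max - column min)."""
    num_rows = len(rows)
    columns = list(zip(*rows))
    means = [sum(column) / num_rows for column in columns]
    ranges = [max(column) - min(column) for column in columns]
    return [
        # A constant column has range 0; map it to 0.0 instead of dividing by zero.
        [(value - mean) / rng if rng != 0 else 0.0
         for value, mean, rng in zip(row, means, ranges)]
        for row in rows
    ]


rows = [
    [0.32, 0.385],    # minimum of B, minimum of NOX
    [396.90, 0.817],  # maximum of B, maximum of NOX
    [200.00, 0.538],  # made-up intermediate values
]
normalized = normalize_columns(rows)
for row in normalized:
    print(row)
```

After this transformation every column has mean 0 and spans an interval of width 1, so the two attributes end up on comparable scales even though their original ranges differ by roughly three orders of magnitude.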

#### Organizing training set and testing set

We split the dataset into two parts. One part is used to adjust the parameters of the model, that is, to train it; the error of the model on this part is called the **training error**. The other part is used for testing; the error of the model on this part is called the **test error**. The goal of training is to predict unknown new data by finding patterns in the training data, so the test error is a better indicator of model performance. When choosing the split ratio, we must weigh two factors: more training data reduces the variance of the estimated parameters, giving a more reliable model, while more test data reduces the variance of the test error, giving a more credible test error. The split ratio in our example is $8:2$.

In more complex model training, we often need one more dataset: the validation set. Complex models usually have [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_optimization) that need to be tuned, so we train several models with different hyperparameter combinations, compare their performance on the validation set to select the best combination, and finally use the model trained with that combination to measure the test error on the test set. Because the model trained in this chapter is relatively simple, we skip this process for now.

## Training

`fit_a_line/trainer.py` demonstrates the overall process of training.

### Configuring the Data feeder

First we import the libraries:

```python
from __future__ import print_function  # __future__ imports must come first

import math
import sys

import numpy
import paddle
import paddle.fluid as fluid
```

We load the [UCI Housing dataset](http://paddlemodels.bj.bcebos.com/uci_housing/housing.data) through the `uci_housing` module, which encapsulates:

1. Downloading the data. The downloaded file is saved to `~/.cache/paddle/dataset/uci_housing/housing.data`.
2. [Data pre-processing](#data-pre-processing).

Next we define the data feeder for training. The data feeder reads `BATCH_SIZE` samples at a time. If you want the data to be read in random order, you can specify a buffer size in addition to the batch size. In that case, the data feeder fills a buffer of that size, shuffles it, and draws each batch of `BATCH_SIZE` samples from the shuffled buffer.

```python
BATCH_SIZE = 20

# Shuffle the training samples within a 500-sample buffer, then group them into batches.
train_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.uci_housing.train(), buf_size=500),
    batch_size=BATCH_SIZE)

# The test reader is built the same way from the test split.
test_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.uci_housing.test(), buf_size=500),
    batch_size=BATCH_SIZE)
```
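
To show how a buffered shuffle and batching can be composed, here is a plain-Python sketch. It is not PaddlePaddle's code: `shuffle_sketch`, `batch_sketch`, `toy_reader`, and the `seed` argument are hypothetical names introduced for illustration. The sketch assumes the same convention as the code above, namely that a reader is a function which returns an iterator over samples when called.

```python
import random


def shuffle_sketch(reader, buf_size, seed=None):
    """Wrap `reader` so that samples are shuffled within buffers of `buf_size`."""
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        # Emit whatever is left in the last, partially filled buffer.
        rng.shuffle(buf)
        for item in buf:
            yield item

    return shuffled_reader


def batch_sketch(reader, batch_size):
    """Wrap `reader` so that it yields lists of up to `batch_size` samples."""
    def batch_reader():
        current = []
        for sample in reader():
            current.append(sample)
            if len(current) == batch_size:
                yield current
                current = []
        if current:  # keep the final, smaller batch
            yield current

    return batch_reader


def toy_reader():
    for i in range(10):
        yield i


batches = list(batch_sketch(shuffle_sketch(toy_reader, buf_size=4, seed=0),
                            batch_size=3)())
print(batches)
```

Note that a buffer smaller than the dataset only shuffles locally: in this sketch the first four samples emitted are always some permutation of the first four samples read. A buffer at least as large as the dataset shuffles all of it.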

If you want to read the data directly from a \*.txt file instead, you can use the following approach.

```python
import six  # six.moves.range works on both Python 2 and Python 3

feature_names = [
    'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
    'PTRATIO', 'B', 'LSTAT', 'convert'
]  # the last column holds the target price

# Placeholder: set `filename` to the path of your local data file.
filename = 'housing.data'

feature_num = len(feature_names)
data = numpy.fromfile(filename, sep=' ')  # read the raw data from the file
data = data.reshape(data.shape[0] // feature_num, feature_num)
maximums, minimums, avgs = data.max(axis=0), data.min(axis=0), data.sum(axis=0) / data.shape[0]
# Normalize every feature column (but not the target column).
for i in six.moves.range(feature_num - 1):
    data[:, i] = (data[:, i] - avgs[i]) / (maximums[i] - minimums[i])

ratio = 0.8  # fraction of the data used for training; the rest is used for testing
offset = int(data.shape[0] * ratio)
train_data = data[:offset]
test_data = data[offset:]

def reader_creator(rows):
    # paddle.reader.shuffle expects a reader, i.e. a callable that yields samples.
    def reader():
        for row in rows:
            yield row[:-1], row[-1:]  # (features, label)
    return reader

train_reader = paddle.batch(
    paddle.reader.shuffle(
        reader_creator(train_data), buf_size=500),
    batch_size=BATCH_SIZE)
test_reader = paddle.batch(
    paddle.reader.shuffle(
        reader_creator(test_data), buf_size=500),
    batch_size=BATCH_SIZE)
```

### Configure Program for Training
The training program defines the network structure of the model to be trained. For linear regression, this is a single fully connected layer from input to output. More complex structures, such as Convolutional Neural Networks and Recurrent Neural Networks, are introduced in later chapters. The training program must produce the mean loss (`avg_loss` below), because this is the value used for backpropagation.

```python
x = fluid.layers.data(name='x', shape=[13], dtype='float32')  # input: shape and data type
y = fluid.layers.data(name='y', shape=[1], dtype='float32')  # label: shape and data type
y_predict = fluid.layers.fc(input=x, size=1, act=None)  # fully connected layer from input to output

main_program = fluid.default_main_program()  # get the default (global) main program
startup_program = fluid.default_startup_program()  # get the default (global) startup program

cost = fluid.layers.square_error_cost(input=y_predict, label=y)  # squared error between prediction and label
avg_loss = fluid.layers.mean(cost)  # mean of the squared errors, i.e. the mean loss
```

For details, please refer to:

- [fluid.default_main_program](http://www.paddlepaddle.org/documentation/docs/zh/develop/api_cn/fluid_cn.html#default-main-program)
- [fluid.default_startup_program](http://www.paddlepaddle.org/documentation/docs/zh/develop/api_cn/fluid_cn.html#default-startup-program)
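
To connect the three layer calls above to the formulas in the Model Overview, here is a plain-Python sketch of the arithmetic they stand for: a linear map, an element-wise squared error, and a mean. It is an illustration under the assumption that the layers compute these standard formulas. The helper names, the two-feature inputs, and the weight values are made up and are not PaddlePaddle APIs.

```python
def fc_forward(inputs, weights, bias):
    """One-output linear layer: returns w . x + b for each input row."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in inputs]


def squared_errors(predictions, labels):
    """Element-wise squared difference between predictions and labels."""
    return [(p - y) ** 2 for p, y in zip(predictions, labels)]


def mean_value(values):
    return sum(values) / len(values)


inputs = [[1.0, 2.0], [3.0, 4.0]]  # two samples with two features each
weights = [0.5, -1.0]              # made-up parameters
bias = 2.0
labels = [1.0, 0.0]

predictions = fc_forward(inputs, weights, bias)     # [0.5, -0.5]
cost_values = squared_errors(predictions, labels)   # [0.25, 0.25]
mean_loss = mean_value(cost_values)                 # 0.25
print(predictions, cost_values, mean_loss)
```

In the chapter's model the input has 13 features instead of 2, and the weights and bias are learned by the optimizer configured in the next section.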

### Optimizer Function Configuration

We use the `SGD` optimizer below. Its `learning_rate` parameter is the learning rate, which affects how quickly training converges.

```python
# Clone main_program to get test_program.
# Some operators behave differently in training and testing. For example,
# batch_norm uses the for_test parameter to decide which mode it is in.
# The API does not delete any operators, so apply it before the backward
# and optimization operators are added by minimize().
test_program = main_program.clone(for_test=True)

sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_loss)
```

### Define Training Place

We can define whether an operation runs on the CPU or on the GPU.

```python
use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()  # device on which the executor runs

# An executor accepts a program and, based on the feed map and fetch list,
# adds the data input and result fetch operators to it. Call run(...) to run
# the program and close() to close the executor.
exe = fluid.Executor(place)
```

For details, please refer to [fluid.executor](http://www.paddlepaddle.org/documentation/docs/zh/develop/api_cn/fluid_cn.html#permalink-15-executor).

### Create Training Process
Training requires a training program and some parameters. We also create a function that computes the test error during training. Its parameters are `executor`, `program`, `reader`, `feeder`, and `fetch_list`:

- `executor` is the executor created earlier.
- `program` is the program run by the executor. If it is not specified, `default_main_program` is used.
- `reader` is the reader that supplies the data.
- `feeder` maps the data to the input variables of the forward pass.
- `fetch_list` lists the variables (or their names) that the user wants to fetch.

```python
num_epochs = 100

def train_test(executor, program, reader, feeder, fetch_list):
    accumulated = 1 * [0]  # one running sum per fetched variable
    count = 0
    for data_test in reader():
        outs = executor.run(program=program,
                            feed=feeder.feed(data_test),
                            fetch_list=fetch_list)
        accumulated = [x_c[0] + x_c[1][0] for x_c in zip(accumulated, outs)]  # add this batch's loss
        count += 1  # count the test batches
    return [x_d / count for x_d in accumulated]  # mean loss over the test batches
```

### Train Main Loop

Specify the directory in which the model will be saved and initialize the executor:

```python
%matplotlib inline
params_dirname = "fit_a_line.inference.model"  # directory for the saved model
feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
exe.run(startup_program)  # initialize the parameters
train_prompt = "train cost"
test_prompt = "test cost"
from paddle.utils.plot import Ploter
plot_prompt = Ploter(train_prompt, test_prompt)
step = 0

exe_test = fluid.Executor(place)  # separate executor for evaluating on the test set
```
PaddlePaddle provides a reader mechanism for loading training data. A reader yields multiple columns of data at a time, so we use a Python list to specify the order in which the columns are read. We then loop over the training data until the result is good enough or the maximum number of passes is reached.
At the chosen save interval (every step in the code below), the trained model is saved to `params_dirname`.
Set up the main training loop:
```python
for pass_id in range(num_epochs):
    for data_train in train_reader():
        avg_loss_value, = exe.run(main_program,
                                  feed=feeder.feed(data_train),
                                  fetch_list=[avg_loss])
        if step % 10 == 0:  # record and print the training loss every 10 batches
            plot_prompt.append(train_prompt, step, avg_loss_value[0])
            plot_prompt.plot()
            print("%s, Step %d, Cost %f" %
                  (train_prompt, step, avg_loss_value[0]))
        if step % 100 == 0:  # record and print the test loss every 100 batches
            test_metrics = train_test(executor=exe_test,
                                      program=test_program,
                                      reader=test_reader,
                                      fetch_list=[avg_loss.name],
                                      feeder=feeder)
            plot_prompt.append(test_prompt, step, test_metrics[0])
            plot_prompt.plot()
            print("%s, Step %d, Cost %f" %
                  (test_prompt, step, test_metrics[0]))
            if test_metrics[0] < 10.0:  # the test loss is low enough; leave the current pass early
                break

        step += 1

        if math.isnan(float(avg_loss_value[0])):
            sys.exit("got NaN loss, training failed.")

        # save the trained model to the directory given above
        if params_dirname is not None:
            fluid.io.save_inference_model(params_dirname, ['x'], [y_predict], exe)
```

## Predict
Prediction requires the trained parameters, which are stored in `params_dirname`.

### Prepare Environment for Prediction
As in training, the predictor needs a program for prediction. We can slightly modify our training program so that it outputs the predicted value.

```python
infer_exe = fluid.Executor(place)  # executor used for inference
inference_scope = fluid.core.Scope()  # separate scope that holds the inference variables
```

### Predict

Define a helper that plots the predictions against the ground truth (GT) and saves the figure:

```python
def save_result(points1, points2):
    import os
    import matplotlib
    matplotlib.use('Agg')  # non-interactive backend, so no display is needed
    import matplotlib.pyplot as plt
    x1 = list(range(len(points1)))
    y1 = points1
    y2 = points2
    plt.plot(x1, y1, 'r--', label='predictions')
    plt.plot(x1, y2, 'g--', label='GT')
    plt.plot(x1, y1, 'ro-', x1, y2, 'g+-')
    plt.title('predictions VS GT')
    plt.legend()
    if not os.path.exists('./image'):
        os.makedirs('./image')  # make sure the output directory exists
    plt.savefig('./image/prediction_gt.png')
```

`fluid.io.load_inference_model` loads the trained model from `params_dirname`, and the predictor then uses it to predict on unseen data.

```python
with fluid.scope_guard(inference_scope):
    [inference_program, feed_target_names,
     fetch_targets] = fluid.io.load_inference_model(params_dirname, infer_exe)  # load the saved inference model
    batch_size = 10

    infer_reader = paddle.batch(
        paddle.dataset.uci_housing.test(), batch_size=batch_size)  # prepare the test dataset

    infer_data = next(infer_reader())
    infer_feat = numpy.array(
        [data[0] for data in infer_data]).astype("float32")  # features of the test samples
    infer_label = numpy.array(
        [data[1] for data in infer_data]).astype("float32")  # labels of the test samples

    assert feed_target_names[0] == 'x'
    results = infer_exe.run(inference_program,
                            feed={feed_target_names[0]: infer_feat},
                            fetch_list=fetch_targets)  # run the prediction

    # print the predictions and the labels, then visualize them
    print("infer results: (House Price)")
    for idx, val in enumerate(results[0]):
        print("%d: %.2f" % (idx, val[0]))  # predicted price

    print("\nground truth:")
    for idx, val in enumerate(infer_label):
        print("%d: %.2f" % (idx, val[0]))  # true price

    save_result(results[0], infer_label)  # save the figure
```



## Summary
In this chapter, we used the Boston house price dataset to introduce the basic concepts of the linear regression model and to show how to train and test it with PaddlePaddle. Many models and techniques are derived from linear regression, so it is well worth understanding its principles and limitations.

<a name="References"></a>
## References
1. https://en.wikipedia.org/wiki/Linear_regression
2. Friedman J, Hastie T, Tibshirani R. The elements of statistical learning[M]. Springer, Berlin: Springer series in statistics, 2001.
3. Murphy K P. Machine learning: a probabilistic perspective[M]. MIT press, 2012.
4. Bishop C M. Pattern recognition and machine learning[M]. Springer, 2006.

<br/>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.