<html>
<head>
  <script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    extensions: ["tex2jax.js", "TeX/AMSsymbols.js", "TeX/AMSmath.js"],
    jax: ["input/TeX", "output/HTML-CSS"],
    tex2jax: {
      inlineMath: [ ['$','$'] ],
      displayMath: [ ['$$','$$'] ],
      processEscapes: true
    },
    "HTML-CSS": { availableFonts: ["TeX"] }
  });
  </script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js" async></script>
  <script type="text/javascript" src="../.tools/theme/marked.js">
  </script>
  <link href="http://cdn.bootcss.com/highlight.js/9.9.0/styles/darcula.min.css" rel="stylesheet">
  <script src="http://cdn.bootcss.com/highlight.js/9.9.0/highlight.min.js"></script>
  <link href="http://cdn.bootcss.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet">
  <link href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" rel="stylesheet">
  <link href="../.tools/theme/github-markdown.css" rel='stylesheet'>
</head>
<style type="text/css" >
.markdown-body {
    box-sizing: border-box;
    min-width: 200px;
    max-width: 980px;
    margin: 0 auto;
    padding: 45px;
}
</style>


<body>

<div id="context" class="container-fluid markdown-body">
</div>

<!-- This block will be replaced by each markdown file content. Please do not change lines below.-->
<div id="markdown" style='display:none'>
# Recognize Digits

The source code for this tutorial is here: [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits).
For instructions on getting started with Paddle, please refer to [installation instructions](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).

## Introduction
When one learns to program, the first task is usually to write a program that prints "Hello World!".
In Machine Learning or Deep Learning, an equivalent task is to train a model to recognize hand-written digits using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset.
Handwriting recognition is a classic image classification problem. The problem is relatively easy and MNIST is a complete dataset.
As a simple Computer Vision dataset, MNIST contains images of handwritten digits and their corresponding labels (Fig. 1).
The input image is a $28\times28$ matrix, and the label is one of the digits from $0$ to $9$. All images are size-normalized and centered.

<p align="center">
<img src="image/mnist_example_image.png" width="400"><br/>
Fig. 1. Examples of MNIST images
</p>

The MNIST dataset is derived from the [NIST](https://www.nist.gov/srd/nist-special-database-19) Special Database 3 (SD-3) and Special Database 1 (SD-1).
SD-3 was collected from U.S. Census Bureau employees, while SD-1 was collected from high school students. Therefore SD-3 is cleaner and easier to recognize than SD-1.
Yann LeCun et al. took samples from SD-1 and SD-3 in equal proportion to create the MNIST training set of 60,000 samples and test set of 10,000 samples.
The training set contains samples from about 250 writers, and the writers of the training set and the test set are disjoint.

The MNIST dataset has been used for evaluating many image recognition algorithms such as a single layer linear classifier,
Multilayer Perceptron (MLP) and Multilayer CNN LeNet\[[1](#references)\], K-Nearest Neighbors (k-NN) \[[2](#references)\], Support Vector Machine (SVM) \[[3](#references)\],
Neural Networks \[[4-7](#references)\], Boosting \[[8](#references)\] and preprocessing methods like distortion removal, noise removal, and blurring.
Among these algorithms, the *Convolutional Neural Network* (CNN) has achieved a series of impressive results in Image Classification tasks, including VGGNet, GoogLeNet,
and ResNet (See [Image Classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification) tutorial).

In this tutorial, we start with a simple **softmax** regression model and go on with MLP and CNN. Readers will see how these methods improve the recognition accuracy step-by-step.


## Model Overview

Before introducing classification algorithms and training procedure, we define the following symbols:
- $X$ is the input: Input is a $28\times 28$ MNIST image. It is flattened to a $784$ dimensional vector. $X=\left (x_0, x_1, \dots, x_{783} \right )$.
- $Y$ is the output: Output of the classifier is 1 of the 10 classes (digits from 0 to 9). $Y=\left (y_0, y_1, \dots, y_9 \right )$. Each dimension $y_i$ represents the probability that the input image belongs to class $i$.
- $L$ is the ground truth label: $L=\left ( l_0, l_1, \dots, l_9 \right )$. It is also 10 dimensional, but only one entry is $1$ and all others are $0$s.

### Softmax Regression

In a simple softmax regression model, the input is first fed to a fully connected layer. Then, a softmax function is applied to output the probabilities of the output classes\[[9](#references)\].

The input $X$ is multiplied by the weights $W$ and then added to the bias $b$ to generate activations.

$$ y_i = \text{softmax}(\sum_j W_{i,j}x_j + b_i) $$

where $ \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $

For an $N$-class classification problem with $N$ output nodes, softmax normalizes the resulting $N$-dimensional vector so that each of its entries falls in the range $[0,1]$ and all entries sum to $1$. Each entry represents the probability that the sample belongs to a certain class. Here $y_i$ denotes the predicted probability that an image is of digit $i$.

In such a classification problem, we usually use the cross entropy loss function:

$$ \text{cross-entropy}(label, y) = -\sum_i label_i \log(y_i) $$

Fig. 2 illustrates a softmax regression network, with the weights in blue, and the bias in red. `+1` indicates that the bias is $1$.

<p align="center">
<img src="image/softmax_regression_en.png" width=400><br/>
Fig. 2. Softmax regression network architecture<br/>
</p>
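
The softmax and cross entropy formulas of this section can be checked numerically. The following is a minimal NumPy sketch, not PaddlePaddle code: the helper names `softmax` and `cross_entropy`, the random weights, and the random stand-in image are all assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    # Subtracting the maximum leaves the result unchanged
    # but avoids overflow in the exponential.
    e = np.exp(z - np.max(z))
    return e / e.sum()


def cross_entropy(label, y):
    # label is a one-hot vector, y is the predicted distribution.
    return -np.sum(label * np.log(y))


rng = np.random.RandomState(0)
x = rng.rand(784)                 # random stand-in for a flattened 28x28 image
W = rng.randn(10, 784) * 0.01     # hypothetical weights, one row per class
b = np.zeros(10)                  # hypothetical bias

y = softmax(W.dot(x) + b)         # y_i = softmax(sum_j W_ij x_j + b_i)
label = np.zeros(10)
label[3] = 1.0                    # pretend the true digit is 3

loss = cross_entropy(label, y)
print(y.sum(), loss)
```

Because the label is one-hot, the loss reduces to the negative logarithm of the probability assigned to the true class.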

### Multilayer Perceptron

The softmax regression model described above uses the simplest two-layer neural network. That is, it contains only an input layer and an output layer, so its fitting capability is limited. To achieve better recognition results, consider adding several hidden layers\[[10](#references)\] between the input layer and the output layer.

1.  After the first hidden layer, we get $ H_1 = \phi(W_1X + b_1) $, where $\phi$ denotes the activation function. Some [common ones](#list-of-common-activation-functions) are sigmoid, tanh and ReLU.
2.  After the second hidden layer, we get $ H_2 = \phi(W_2H_1 + b_2) $.
3.  Finally, the output layer outputs $Y=\text{softmax}(W_3H_2 + b_3)$, the vector denoting our classification result.

Fig. 3 shows a Multilayer Perceptron network, with the weights in blue, and the bias in red. `+1` indicates that the bias is $1$.

<p align="center">
<img src="image/mlp_en.png" width=500><br/>
Fig. 3. Multilayer Perceptron network architecture<br/>
</p>
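
The three steps of the Multilayer Perceptron can be sketched as a plain NumPy forward pass. This is only an illustration under assumptions: `mlp_forward`, the layer widths, and the random parameters are hypothetical, and ReLU is picked as the activation function.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


def mlp_forward(x, params):
    # params is a list of (W, b) pairs: two hidden layers and one output layer.
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(W1.dot(x) + b1)        # H_1 = phi(W_1 X + b_1)
    h2 = relu(W2.dot(h1) + b2)       # H_2 = phi(W_2 H_1 + b_2)
    return softmax(W3.dot(h2) + b3)  # Y = softmax(W_3 H_2 + b_3)


rng = np.random.RandomState(0)
sizes = [784, 200, 200, 10]          # layer widths, chosen for illustration
params = [(rng.randn(n_out, n_in) * 0.01, np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

y = mlp_forward(rng.rand(784), params)
print(y.shape, y.sum())
```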

### Convolutional Neural Network

#### Convolutional Layer

<p align="center">
<img src="image/conv_layer.png" width='750'><br/>
Fig. 4. Convolutional layer<br/>
</p>

The **convolutional layer** is the core of a Convolutional Neural Network. The parameters in this layer are composed of a set of filters, also called kernels. We can visualize the convolution step in the following fashion: each kernel slides horizontally and vertically until it covers the whole image. At every window, we compute the dot product of the kernel and the input. Then, we add the bias and apply an activation function. The result is a two-dimensional activation map. Different kernels respond strongly to different features; for example, one kernel may recognize corners while another recognizes circles.

Fig. 4 illustrates the computation of a convolutional layer, where depths are flattened for simplicity. The input is $W_1=5$, $H_1=5$, $D_1=3$. In fact, this is a common representation for colored images. $W_1$ and $H_1$ correspond to the width and height of a colored image. $D_1$ corresponds to the three color channels for RGB. The parameters of the convolutional layer are $K=2$, $F=3$, $S=2$, $P=1$. $K$ denotes the number of kernels; specifically, $Filter$ $W_0$ and $Filter$ $W_1$ are the kernels. $F$ is the kernel size; $W_0$ and $W_1$ are both $F\times F = 3\times3$ matrices at every depth. $S$ is the stride, which is the distance the sliding window moves at each step; here, kernels move rightwards or downwards by two units each time. $P$ is the width of the padding, which denotes an extension of the input; here, the gray area shows zero padding with size 1.
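
With these parameters the output width is $(W_1 - F + 2P)/S + 1 = (5 - 3 + 2)/2 + 1 = 3$, and likewise for the height. The naive NumPy sketch below reproduces that setting. It is an illustration only: `conv_output_size` and `conv2d` are hypothetical helpers, the input and kernel values are random rather than those of Fig. 4, and the activation function is left out.

```python
import numpy as np


def conv_output_size(w, f, s, p):
    # (W - F + 2P) / S + 1, using integer division
    return (w - f + 2 * p) // s + 1


def conv2d(x, kernels, biases, stride, pad):
    # x: (H, W, D) input; kernels: (K, F, F, D); biases: (K,)
    k, f = kernels.shape[0], kernels.shape[1]
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode='constant')
    out_h = conv_output_size(x.shape[0], f, stride, pad)
    out_w = conv_output_size(x.shape[1], f, stride, pad)
    out = np.zeros((out_h, out_w, k))
    for n in range(k):
        for i in range(out_h):
            for j in range(out_w):
                r, c = i * stride, j * stride
                window = xp[r:r + f, c:c + f, :]
                # dot product of the kernel and the window, plus the bias
                out[i, j, n] = np.sum(window * kernels[n]) + biases[n]
    return out


rng = np.random.RandomState(0)
x = rng.randint(0, 3, size=(5, 5, 3)).astype(np.float64)
kernels = rng.randint(-1, 2, size=(2, 3, 3, 3)).astype(np.float64)
biases = np.array([1.0, 0.0])

out = conv2d(x, kernels, biases, stride=2, pad=1)
print(out.shape)  # (3, 3, 2)
```

Each of the two kernels produces one $3\times3$ activation map, so the output depth equals $K$.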

#### Pooling Layer

<p align="center">
<img src="image/max_pooling_en.png" width="400px"><br/>
Fig. 5. Pooling layer using max-pooling<br/>
</p>

A **pooling layer** performs downsampling. The main functionality of this layer is to reduce computation by shrinking the feature maps, which also reduces the number of parameters in the following layers. It also prevents over-fitting to some extent. Usually, a pooling layer is added after a convolutional layer. A pooling layer can use various techniques, such as max pooling and average pooling. As shown in Fig. 5, max pooling uses rectangles to segment the input layer into several parts and computes the maximum value in each part as the output.
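
Max pooling is easy to reproduce by hand. The sketch below applies a $2\times2$ window with stride 2 to a small feature map. `max_pool2d` is a hypothetical helper and the values are made up for the example, they are not taken from Fig. 5.

```python
import numpy as np


def max_pool2d(x, size, stride):
    # x: (H, W) feature map; returns the maximum of each window
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out


feature_map = np.array([[1, 1, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])
pooled = max_pool2d(feature_map, size=2, stride=2)
print(pooled)
# [[6 8]
#  [3 4]]
```

The $4\times4$ map shrinks to $2\times2$, so any layer that follows processes a quarter of the values.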

#### LeNet-5 Network

<p align="center">
<img src="image/cnn_en.png"><br/>
Fig. 6. LeNet-5 Convolutional Neural Network architecture<br/>
</p>

[**LeNet-5**](http://yann.lecun.com/exdb/lenet/) is one of the simplest Convolutional Neural Networks. Fig. 6 shows its architecture: a 2-dimensional input image is fed into two sets of convolutional layers and pooling layers. The output is then fed to a fully connected layer and a softmax classifier. Compared to multilayer, fully connected perceptrons, LeNet-5 can recognize images better. This is due to the following three properties of convolution:

- The 3D nature of the neurons: a convolutional layer is organized by width, height, and depth. Neurons in each layer are connected to only a small region in the previous layer. This region is called the receptive field.
- Local connectivity: A CNN utilizes the local space correlation by connecting local neurons. This design guarantees that the learned filter has a strong response to local input features. Stacking many such layers generates a non-linear filter that is more global. This enables the network to first obtain good representation for small parts of input and then combine them to represent a larger region.
- Weight sharing: In a CNN, computation is iterated on shared parameters (weights and bias) to form a feature map. This means that all the neurons in the same depth of the output respond to the same feature. This allows the network to detect a feature regardless of its position in the input.

For more details on Convolutional Neural Networks, please refer to the tutorial on [Image Classification](https://github.com/PaddlePaddle/book/blob/develop/image_classification/README.md) and the [relevant lecture](http://cs231n.github.io/convolutional-networks/) from a Stanford course.

### List of Common Activation Functions
- Sigmoid activation function: $ f(x) = sigmoid(x) = \frac{1}{1+e^{-x}} $

- Tanh activation function: $ f(x) = tanh(x) = \frac{e^x-e^{-x}}{e^x+e^{-x}} $

  In fact, the tanh function is just a rescaled version of the sigmoid function. It is obtained by doubling the value of the sigmoid function evaluated at $2x$ and moving it downwards by 1: $ tanh(x) = 2 \cdot sigmoid(2x) - 1 $.

- ReLU activation function: $ f(x) = max(0, x) $

For more information, please refer to [Activation functions on Wikipedia](https://en.wikipedia.org/wiki/Activation_function).
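
The three activation functions, and the relation between tanh and sigmoid, can be verified with a few lines of NumPy. This is a small sketch for illustration; the helper names `sigmoid` and `relu` are our own.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    return np.maximum(0.0, x)


x = np.linspace(-3.0, 3.0, 7)

# tanh is a rescaled and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
rescaled = 2.0 * sigmoid(2.0 * x) - 1.0
print(np.allclose(np.tanh(x), rescaled))  # True
print(relu(x))
```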

## Data Preparation

PaddlePaddle provides a Python module, `paddle.dataset.mnist`, which downloads and caches the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). The cache is under `/home/username/.cache/paddle/dataset/mnist`:


| File name               | Description       | Number of samples |
|-------------------------|-------------------|-------------------|
| train-images-idx3-ubyte | Training images   | 60,000            |
| train-labels-idx1-ubyte | Training labels   | 60,000            |
| t10k-images-idx3-ubyte  | Evaluation images | 10,000            |
| t10k-labels-idx1-ubyte  | Evaluation labels | 10,000            |
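
These files use the IDX binary format described on the MNIST page: a big-endian header holding a magic number and the size of each dimension, followed by the raw unsigned bytes. The sketch below parses such a header from an in-memory buffer. `parse_idx` is a hypothetical helper, not part of `paddle.dataset.mnist`, and the byte string is synthetic data built for the example (Python 3).

```python
import struct

import numpy as np


def parse_idx(buf):
    # Header: two zero bytes, a type code (0x08 = unsigned byte),
    # and the number of dimensions, followed by one 32-bit
    # big-endian size per dimension.
    zeros, dtype_code, ndim = struct.unpack('>HBB', buf[:4])
    if zeros != 0 or dtype_code != 0x08:
        raise ValueError('not an unsigned-byte IDX buffer')
    dims = struct.unpack('>' + 'I' * ndim, buf[4:4 + 4 * ndim])
    data = np.frombuffer(buf, dtype=np.uint8, offset=4 + 4 * ndim)
    return data.reshape(dims)


# Synthetic stand-in for an images file: 2 images of 2x2 pixels.
fake_images = struct.pack('>HBBIII', 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
images = parse_idx(fake_images)
print(images.shape)  # (2, 2, 2)
```

The `idx3` and `idx1` parts of the file names refer to the number of dimensions: three for the image files (sample, row, column) and one for the label files.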


## Fluid API Overview

This demo uses the Fluid API, the latest PaddlePaddle API. It simplifies model configuration without sacrificing performance.
We recommend using the Fluid API as it is much easier to pick up.
Here is a quick overview of the major Fluid API components.

1. `inference_program`: A function that specifies how to get the prediction from the data input.
This is where you specify the network flow.
1. `train_program`: A function that specifies how to get `avg_cost` from `inference_program` and labels.
This is where you specify the loss calculations.
1. `optimizer_func`: A function that specifies the configuration of the optimizer. The optimizer is responsible for minimizing the loss and driving the training. Paddle supports many different optimizers.
1. `Trainer`: The Fluid trainer manages the training process specified by the `train_program` and `optimizer_func`. Users can monitor the training
progress through the `event_handler` callback function.
1. `Inferencer`: The Fluid inferencer loads the `inference_program` and the parameters trained by the `Trainer`.
It can then run inference on the data and return the prediction.

We will go through all of them and dig deeper into the configurations in this demo.

## Model Configuration

A PaddlePaddle program starts by importing the API package:

```python
# __future__ imports must come before any other statement in the module
from __future__ import print_function

import paddle
import paddle.fluid as fluid
```

### Program Functions Configuration

First, we need to set up the `inference_program` function. We want to use this program to demonstrate three different classifiers, each defined as a Python function.
We need to feed image data to the classifier. PaddlePaddle provides a special layer, `fluid.layers.data`, for reading data.
Let us create a data layer for reading images and connect it to the classification network.

- Softmax regression: the network has a fully-connected layer with softmax activation:

```python
def softmax_regression():
    # data layer for one-channel 28x28 images
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    # a single fully-connected layer with 10 outputs, one per digit
    predict = fluid.layers.fc(
        input=img, size=10, act='softmax')
    return predict
```

- Multi-Layer Perceptron: this network has two hidden fully-connected layers, both using ReLU as the activation function. The output layer uses softmax activation:

```python
def multilayer_perceptron():
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    # first fully-connected layer, using ReLU as its activation function
    hidden = fluid.layers.fc(input=img, size=200, act='relu')
    # second fully-connected layer, using ReLU as its activation function
    hidden = fluid.layers.fc(input=hidden, size=200, act='relu')
    # output layer with softmax activation, one output per digit
    prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
    return prediction
```

- Convolution network LeNet-5: the input image is fed through two convolution-pooling layers, a fully-connected layer, and the softmax output layer:

```python
def convolutional_neural_network():
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    # first conv pool
    conv_pool_1 = fluid.nets.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        pool_size=2,
        pool_stride=2,
        act="relu")
    # batch normalization on the output of the first conv pool
    conv_pool_1 = fluid.layers.batch_norm(conv_pool_1)
    # second conv pool
    conv_pool_2 = fluid.nets.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        pool_size=2,
        pool_stride=2,
        act="relu")
    # output layer with softmax activation function. size = 10 since there are only 10 possible digits.
    prediction = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')
    return prediction
```

#### Train Program Configuration
Then we need to set up the `train_program`. It first takes the prediction from the classifier.
During training, it calculates the `avg_cost` from the prediction.

**NOTE:** A train program should return an array, and the first returned item has to be `avg_cost`.
The trainer always implicitly uses it to calculate the gradient.

Please feel free to modify the code to compare the results of the `softmax regression`, `mlp`, and `convolutional neural network` classifiers.

```python
def train_program():
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    # predict = softmax_regression() # uncomment for Softmax
    # predict = multilayer_perceptron() # uncomment for MLP
    predict = convolutional_neural_network() # LeNet5, used by default

    # Calculate the cost from the prediction and label.
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    avg_cost = fluid.layers.mean(cost)
    acc = fluid.layers.accuracy(input=predict, label=label)

    # The first item needs to be avg_cost.
    return [avg_cost, acc]
```

#### Optimizer Function Configuration

In the following `Adam` optimizer, `learning_rate` specifies the learning rate in the optimization procedure.

```python
def optimizer_program():
    return fluid.optimizer.Adam(learning_rate=0.001)
```
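
To give an idea of what the learning rate controls, here is a sketch of a single Adam update written in NumPy. It follows the update rule from the Adam paper and is not PaddlePaddle's implementation. The function `adam_step`, the values of `beta1`, `beta2`, and `epsilon` (the defaults suggested in the paper), and the toy quadratic objective are assumptions made for this example.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta, m, v


# Toy problem: minimize f(theta) = sum(theta ** 2), whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)

print(theta)
```

In the very first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `learning_rate` regardless of the gradient's magnitude.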

### Data Feeders Configuration

Then we specify the training data `paddle.dataset.mnist.train()` and testing data `paddle.dataset.mnist.test()`. These two methods are *reader creators*. Once called, a reader creator returns a *reader*. A reader is a Python function which, once called, returns a Python generator that yields instances of data.

`shuffle` is a reader decorator. It takes a reader A as input and returns a new reader B. Under the hood, B calls A to read data in the following fashion: it copies `buf_size` instances at a time into a buffer, shuffles the data, and yields the shuffled instances one at a time. A larger buffer size yields more thoroughly shuffled data.

`batch` is a special decorator, which takes a reader and outputs a *batch reader*, which doesn't yield an instance, but a minibatch at a time.

```python
train_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.mnist.train(), buf_size=500),
    batch_size=64)

test_reader = paddle.batch(
    paddle.dataset.mnist.test(), batch_size=64)
```
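
The reader creator, reader decorator, and batch reader concepts can be mimicked in plain Python. The sketch below is not the PaddlePaddle implementation. `toy_reader_creator`, `toy_shuffle`, and `toy_batch` are hypothetical stand-ins written only to show how the pieces compose.

```python
import random


def toy_reader_creator(n):
    # Reader creator: returns a reader, i.e. a function that,
    # once called, returns a generator over data instances.
    def reader():
        for i in range(n):
            yield (i, i % 10)  # (stand-in "image", stand-in "label")
    return reader


def toy_shuffle(reader, buf_size):
    # Reader decorator: fill a buffer of up to buf_size instances,
    # shuffle it, and yield its content one instance at a time.
    def shuffled_reader():
        buf = []
        for instance in reader():
            buf.append(instance)
            if len(buf) == buf_size:
                random.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        random.shuffle(buf)
        for item in buf:
            yield item
    return shuffled_reader


def toy_batch(reader, batch_size):
    # Decorator that turns a reader into a batch reader yielding lists.
    def batch_reader():
        minibatch = []
        for instance in reader():
            minibatch.append(instance)
            if len(minibatch) == batch_size:
                yield minibatch
                minibatch = []
        if minibatch:
            yield minibatch
    return batch_reader


toy_train_reader = toy_batch(
    toy_shuffle(toy_reader_creator(10), buf_size=4), batch_size=3)
batches = list(toy_train_reader())
print([len(b) for b in batches])  # [3, 3, 3, 1]
```

Since shuffling only happens inside one buffer at a time, an instance can never move further than the buffer it was read into, which is why a larger buffer mixes the data more thoroughly.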

### Trainer Configuration

Now, we need to set up the trainer. The trainer takes the `train_program`, the `place`, and the optimizer function.

```python
use_cuda = False # set to True if training with GPU
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

trainer = fluid.contrib.trainer.Trainer(
    train_func=train_program, place=place, optimizer_func=optimizer_program)
```

#### Event Handler

The Fluid API provides a hook for a callback function during training. Users can monitor the training progress through this mechanism.
We will demonstrate two event handlers here. Please feel free to modify them in the Jupyter notebook to see the differences.

`event_handler` is used to print text output during training.

```python
# Save the parameters into a directory. The Inferencer can load the parameters from it to do inference.
params_dirname = "recognize_digits_network.inference.model"
lists = []
def event_handler(event):
    if isinstance(event, fluid.contrib.trainer.EndStepEvent):
        if event.step % 100 == 0:
            # event.metrics maps to the values returned by the train program:
            # event.metrics[0] yields avg_cost and event.metrics[1] yields acc in this example.
            # Note: the first number printed is event.step and the second is event.epoch.
            print("Pass %d, Batch %d, Cost %f" % (
                event.step, event.epoch, event.metrics[0]))

    if isinstance(event, fluid.contrib.trainer.EndEpochEvent):
        avg_cost, acc = trainer.test(
            reader=test_reader, feed_order=['img', 'label'])

        print("Test with Epoch %d, avg_cost: %s, acc: %s" % (event.epoch, avg_cost, acc))

        # save parameters
        trainer.save_params(params_dirname)
        lists.append((event.epoch, avg_cost, acc))
```

`event_handler_plot` is used to plot a figure like below:

![png](./image/train_and_test.png)

```python
from paddle.utils import Ploter

train_title = "Train cost"
test_title = "Test cost"
cost_ploter = Ploter(train_title, test_title)
step = 0
lists = []

# event_handler to plot a figure
def event_handler_plot(event):
    global step
    if isinstance(event, fluid.contrib.trainer.EndStepEvent):
        if step % 100 == 0:
            # event.metrics maps to the values returned by the train program:
            # event.metrics[0] yields avg_cost and event.metrics[1] yields acc in this example.
            cost_ploter.append(train_title, step, event.metrics[0])
            cost_ploter.plot()
        step += 1
    if isinstance(event, fluid.contrib.trainer.EndEpochEvent):
        # save parameters
        trainer.save_params(params_dirname)

        avg_cost, acc = trainer.test(
            reader=test_reader, feed_order=['img', 'label'])
        cost_ploter.append(test_title, step, avg_cost)
        lists.append((event.epoch, avg_cost, acc))
```

#### Start training

Now that we have set up the event handler and the reader, we can start training the model. `feed_order` is used to map the data dict to the `train_program`.

```python
# Train the model now
trainer.train(
    num_epochs=5,
    event_handler=event_handler_plot,
    reader=train_reader,
    feed_order=['img', 'label'])
```

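During training, `trainer.train` invokes the event handler for certain events. This gives us a chance to print the training progress. With the text-based `event_handler`, the output looks like the following, where the first number on each line is the step index and the second is the epoch index: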

```
Pass 0, Batch 0, Cost 0.125650
Pass 100, Batch 0, Cost 0.161387
Pass 200, Batch 0, Cost 0.040036
Pass 300, Batch 0, Cost 0.023391
Pass 400, Batch 0, Cost 0.005856
Pass 500, Batch 0, Cost 0.003315
Pass 600, Batch 0, Cost 0.009977
Pass 700, Batch 0, Cost 0.020959
Pass 800, Batch 0, Cost 0.105560
Pass 900, Batch 0, Cost 0.239809
Test with Epoch 0, avg_cost: 0.053097883707459624, acc: 0.9822850318471338
```

After the training, we can check the model's prediction accuracy.

```python
# find the best pass: the one with the lowest test cost
best = sorted(lists, key=lambda item: float(item[1]))[0]
print('Best pass is %s, testing Avgcost is %s' % (best[0], best[1]))
print('The classification accuracy is %.2f%%' % (float(best[2]) * 100))
```

Usually, with MNIST data, the softmax regression model achieves an accuracy of around 92.34%, the MLP around 97.66%, and the convolutional network around 99.20%. Convolutional layers are widely considered a great invention for image processing.

## Application

After training, users can use the trained model to classify images. The following code shows how to run inference on MNIST images through `fluid.contrib.inferencer.Inferencer`.

### Create Inferencer

The `Inferencer` takes an `infer_func` and a `param_path` to set up the network and the trained parameters.
We can simply plug in the classifier defined earlier here.

```python
inferencer = fluid.contrib.inferencer.Inferencer(
    # infer_func=softmax_regression, # uncomment for softmax regression
    # infer_func=multilayer_perceptron, # uncomment for MLP
    infer_func=convolutional_neural_network,  # LeNet5, used by default
    param_path=params_dirname,
    place=place)
```

#### Generate input data for inferring

`infer_3.png` is an example image of the digit `3`. Turn it into a NumPy array to match the data feeder format.

```python
# Prepare the test image
import os
import numpy as np
from PIL import Image
def load_image(file):
    # convert to grayscale and resize to the 28x28 input size
    im = Image.open(file).convert('L')
    im = im.resize((28, 28), Image.ANTIALIAS)
    # add batch and channel dimensions: (1, 1, 28, 28)
    im = np.array(im).reshape(1, 1, 28, 28).astype(np.float32)
    # rescale pixel values from [0, 255] to [-1, 1]
    im = im / 255.0 * 2.0 - 1.0
    return im

cur_dir = os.getcwd()
img = load_image(cur_dir + '/image/infer_3.png')
```
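
The arithmetic in `load_image` maps 8-bit pixel values from the range 0 to 255 onto the range -1 to 1, and the reshape adds a batch and a channel dimension. A tiny NumPy check of both steps, using made-up pixel values and an all-zero stand-in image:

```python
import numpy as np

pixels = np.array([0, 127.5, 255], dtype=np.float32)
scaled = pixels / 255.0 * 2.0 - 1.0
print(scaled)  # [-1.  0.  1.]

blank = np.zeros((28, 28), dtype=np.float32)   # stand-in for a grayscale image
batch_of_one = blank.reshape(1, 1, 28, 28)     # (batch, channel, height, width)
print(batch_of_one.shape)  # (1, 1, 28, 28)
```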

### Inference

Now we are ready to do inference.

```python
results = inferencer.infer({'img': img})
lab = np.argsort(results)  # lab holds the class indices of one batch, sorted by ascending probability
print("Inference result of image/infer_3.png is: %d" % lab[0][0][-1])
```
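
`np.argsort` sorts in ascending order, so the last index is the most probable class. The sketch below shows this on a hypothetical probability vector. The values are invented for illustration and are not the output of the trained model.

```python
import numpy as np

# Hypothetical softmax output for one image: shape (batch=1, classes=10).
probs = np.array([[0.01, 0.02, 0.03, 0.80, 0.02, 0.04, 0.02, 0.03, 0.02, 0.01]])

order = np.argsort(probs)           # class indices sorted by ascending probability
print(order[0][-1])                 # 3, the most probable class
print(np.argmax(probs, axis=1)[0])  # 3, the same answer more directly
```

The tutorial code uses one more leading index, presumably because the inferencer returns a list with one array per output.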


## Conclusion

This tutorial describes a few common deep learning models: **Softmax regression**, **Multilayer Perceptron Network**, and **Convolutional Neural Network**. Understanding these models is crucial for future learning; the subsequent tutorials derive more sophisticated networks by building on top of them.

When our model evolves from a simple softmax regression to a slightly more complex Convolutional Neural Network, the recognition accuracy on the MNIST dataset improves considerably. This is due to the convolutional layers' local connections and parameter sharing. While learning new models in the future, we encourage readers to understand the key ideas that lead a new model to improve on the results of an old one.

Moreover, this tutorial introduces the basic flow of PaddlePaddle model design, which starts with a *data provider*, continues with model layer construction, and ends with training and prediction. Motivated readers can leverage the flow used in this MNIST handwritten digit classification example and experiment with different data and network architectures to train models for classification tasks of their choice.


## References

1. LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. ["Gradient-based learning applied to document recognition."](http://ieeexplore.ieee.org/abstract/document/726791/) Proceedings of the IEEE 86, no. 11 (1998): 2278-2324.
2. Wejéus, Samuel. ["A Neural Network Approach to Arbitrary Symbol Recognition on Modern Smartphones."](http://www.diva-portal.org/smash/record.jsf?pid=diva2:753279&dswid=-434) (2014).
3. Decoste, Dennis, and Bernhard Schölkopf. ["Training invariant support vector machines."](http://link.springer.com/article/10.1023/A:1012454411458) Machine learning 46, no. 1-3 (2002): 161-190.
4. Simard, Patrice Y., David Steinkraus, and John C. Platt. ["Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.160.8494&rep=rep1&type=pdf) In ICDAR, vol. 3, pp. 958-962. 2003.
5. Salakhutdinov, Ruslan, and Geoffrey E. Hinton. ["Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure."](http://www.jmlr.org/proceedings/papers/v2/salakhutdinov07a/salakhutdinov07a.pdf) In AISTATS, vol. 11. 2007.
6. Cireşan, Dan Claudiu, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber. ["Deep, big, simple neural nets for handwritten digit recognition."](http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00052) Neural computation 22, no. 12 (2010): 3207-3220.
7. Deng, Li, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, and Geoffrey E. Hinton. ["Binary coding of speech spectrograms using a deep auto-encoder."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.1908&rep=rep1&type=pdf) In Interspeech, pp. 1692-1695. 2010.
8. Kégl, Balázs, and Róbert Busa-Fekete. ["Boosting products of base classifiers."](http://dl.acm.org/citation.cfm?id=1553439) In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 497-504. ACM, 2009.
9. Rosenblatt, Frank. ["The perceptron: A probabilistic model for information storage and organization in the brain."](http://psycnet.apa.org/journals/rev/65/6/386/) Psychological review 65, no. 6 (1958): 386.
10. Bishop, Christopher M. ["Pattern recognition."](http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf) Machine Learning 128 (2006): 1-58.

<br/>
This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

</div>
<!-- You can change the lines below now. -->

<script type="text/javascript">
marked.setOptions({
  renderer: new marked.Renderer(),
  gfm: true,
  breaks: false,
  smartypants: true,
  highlight: function(code, lang) {
    code = code.replace(/&amp;/g, "&")
    code = code.replace(/&gt;/g, ">")
    code = code.replace(/&lt;/g, "<")
    code = code.replace(/&nbsp;/g, " ")
    return hljs.highlightAuto(code, [lang]).value;
  }
});
document.getElementById("context").innerHTML = marked(
        document.getElementById("markdown").innerHTML)
</script>
</body>