README.en.md 25.0 KB
Newer Older
Y
Yuanpeng 已提交
1 2
# Recognize Digits

3
Source code of this chapter is under [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits), For the first-time use, please refer to PaddlePaddle [installation instructions](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html)
Y
Yuanpeng 已提交
4

5 6 7
## Background
When we study programming, the first program is usually printing “Hello World.” In Machine Learning, or Deep Learning, this is hand-written digit recognition with [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. Handwriting recognition is a typical image classification problem. The problem is relatively easy, and MNIST is a complete dataset. As a simple Computer Vision dataset, MNIST contains hand-written digits and corresponding labels (Fig. 1). An image is a 28x28 matrix, and a label corresponds to one of the 10 digits from 0 to 9. Each image is normalized in size and centered.

8 9
<p align="center">
<img src="image/mnist_example_image.png" width="400"><br/>
Y
Yuanpeng 已提交
10
Fig. 1. MNIST image examples
11 12
</p>

Y
Yuanpeng 已提交
13 14
MNIST dataset is made from [NIST](https://www.nist.gov/srd/nist-special-database-19) Special Database 3 (SD-3) and Special Database 1 (SD-1). Since SD-3 is labeled by staffs in U.S. Census Bureau, while SD-1 is labeled by high school students in U.S., SD-3 is cleaner and easier to recognize than SD-1 is. Yann LeCun et al. extracted half of samples from each of SD-1 and SD-3 for MNIST training set (60,000 samples) and test set (10,000 samples), where training set was labeled by 250 different annotators, and it was guaranteed that annotators of training set and test set are not completely overlapped.

Y
Yuanpeng 已提交
15
Yann LeCun, one of the founders of Deep Learning, had large contribution on hand-written character recognition in early dates, and proposed CNN (Convolutional Neural Network), which drastically improved recognition capability for hand-written characters. CNN is now a critical key for Deep Learning. From Yann LeCun’s first proposal of LeNet, to those winning models in ImageNet, such as VGGNet, GoogLeNet, ResNet, etc. (Please refer to [Image Classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification) chapter) CNN achieved a series of astonishing results in Image Classification.
Y
Yuanpeng 已提交
16 17 18 19

Many algorithms are tested on MNIST. In 1998, LeCun experimented single layer linear classifier, MLP (Multilayer Perceptron) and Multilayer CNN LeNet, which continuously reduced test error from 12% to 0.7% \[[1](#References)\]. Since then, researchers worked on many algorithms such as k-NN (K-Nearest Neighbors) \[[2](#References)\], Support Vector Machine (SVM) \[[3](#References)\], Neural Networks \[[4-7](#References)\] and Boosting \[[8](#References)\], and applied various preprocessing methods, such as distortion removal, noise removal and blurring, to increase recognition accuracy.

In this chapter, we start from simple Softmax regression model, and guide readers to introduction of hand-written character recognition, and gradual improvement of models.
20

Y
Yuanpeng 已提交
21 22 23 24 25 26 27
## Model Overview

Before introducing the classification algorithms and training procedure, we provide some definitions:
- $X$ is input:MNIST image is $28\times28$ two dimensional matrix. It is reshaped to $784$ dimensional vector. $X=\left ( x_0, x_1, \dots, x_{783} \right )$。
- $Y$ is output:Output of classifier is 10 class digits from 0 to 9. $Y=\left ( y_0, y_1, \dots, y_9 \right )$,Each dimension $y_i$ represents a probability that the image belongs to $i$.
- $L$ is a image's ground truth label:$L=\left ( l_0, l_1, \dots, l_9 \right )$ It is also 10 dimensional, but only one dimension is 1 and others are all 0.

Y
Yuanpeng 已提交
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
### Softmax Regression

The simplest softmax regression model is to feed input to fully connected layers, and directly use softmax for multi-class classification \[[9](#References)\].

Input $X$ is multiplied with weights $W$, added by bias $b$, and activated.

$$ y_i = softmax(\sum_j W_{i,j}x_j + b_i) $$

where $ softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} $

For a $N$ class classification problem with $N$ output nodes, a $N$ dimensional input features is normalized to $N$ real values in [0, 1], each representing the probability of the sample to belong to the class. Here $y_i$ is the prediction probability that an image is digit $i$.

In classification problem, we usually use cross entropy loss function:

$$  crossentropy(label, y) = -\sum_i label_ilog(y_i) $$

Fig. 2 is softmax regression network, with weights in black, and bias in red. +1 indicates bias is 1.

<p align="center">
<img src="image/softmax_regression.png" width=400><br/>
Fig. 2. Softmax regression network architecture<br/>
</p>

51
### Multilayer Perceptron
Y
Yuanpeng 已提交
52 53 54 55 56 57 58 59 60 61 62 63 64 65

Softmax regression model uses the simplest two layer neural network, i.e. it only contains input layer and output layer, so that it's regression ability is limited. To achieve better recognition effect, we consider adding several hidden layers \[[10](#References)\] between the input layer and the output layer.

1.  After the first hidden layer, we get $ H_1 = \phi(W_1X + b_1) $, where $\phi$ represents activation function. Some common ones are sigmoid, tanh and ReLU.
2.  After the second hidden layer, we get $ H_2 = \phi(W_2H_1 + b_2) $.
3.  Finally, after output layer, we get $Y=softmax(W_3H_2 + b_3)$, the last classification result vector.

Fig. 3. is multi-layer perceptron network, with weights in black, and bias in red. +1 indicates bias is 1.

<p align="center">
<img src="image/mlp.png" width=500><br/>
Fig. 3. Multi-layer perceptron network architecture<br/>
</p>

66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
### Convolutional Neural Network

#### Convolutional layer

<p align="center">
<img src="image/conv_layer.png" width=500><br/>
Fig. 4. Convolutional layer<br/>
</p>

Convolutional layer is the core of Convolutional Neural Networks. The parameters in this layer are composed of a set of filters, or kernels. In forward step, each kernel moves horizontally and vertically, and compute dot product of the kernel and input on corresponding positions, then add bias and apply activation function. The result is two dimensional activation map. For example, some kernel may recognize corners, and some may recognize circles. These convolution kernels may respond strongly to the corresponding features.

Fig. 4 is a dynamic graph for a convolutional layer, where depths are not shown for simplicity. Input is $W_1=5,H_1=5,D_1=3$. In fact, this is a common representation for colored images. The width and height of a colored image corresponds to $W_1$ and $H_1$, and the 3 color channels for RGB corresponds to $D_1$. The parameters of convolutional layers are $K=2,F=3,S=2,P=1$. $K$ is the number of kernels. Here, $Filter W_0$ and $Filter   W_1$ are two convolution kernels. $F$ is kernel size. $W0$ and $W1$ are both $3\times3$ matrix in all depths. $S$ is stride. Kernels moves leftwards or downwards by 2 units each time. $P$ is padding, which is the extension for the input.

#### Pooling layer

<p align="center">
<img src="image/max_pooling.png" width="400px"><br/>
Fig. 5 Pooling layer<br/>
</p>

Pooling layer is a sampling method. The main functionality is to reduce computation by reducing network parameters. It also prevents over-fitting to some extent. Usually, a pooling layer is added after a convolutional layer. Pooling layer includes max pooling, average pooling, etc. Max pooling uses rectangles to divide input layer into several parts, and compute maximum value in each part as output (Fig. 5.)

Y
Yuanpeng 已提交
88 89 90 91 92 93 94 95 96 97 98 99 100 101 102
#### LeNet-5 Network 

<p align="center">
<img src="image/cnn.png"><br/>
Fig. 6. LeNet-5 Convolutional Neural Network architecture<br/>
</p>

[LeNet-5](http://yann.lecun.com/exdb/lenet/) is one of the simplest Convolutional Neural Networks. Fig. 6. shows its architecture: 2 dimensional image input is fed into two sets of convolutional layer and pooling layer, then it is fed into fully connected layer and softmax classifier. The following three properties of convolution enable LeNet-5 to better recognize images than Multilayer fully-connected perceptrons:

- 3D properties of neurons: a convolutional layer is organized by width, height and depth. Neurons in each layer are connected to only a small region in previous layer. This region is called receptive field.
- Local connection: CNN utilizes local space correlation by connecting local neurons. This design guarantees learned filter has strong response to local input features. Stacking many such layers leads non-linear filter becomes more and more global. This allows the network to first obtain good representation for a small parts of input, then combine them to represent larger region.
- Sharing weights: In CNN, computation is iterated with shared parameters (weights and bias) to form afeature map. This means all neurons in the same depth of output respond to the same feature. This allows detecting a feature regardless of its position in the input, and enables a property of translation equivariance.

For more details of Convolutional Neural Network , please refer to [Stanford open course]( http://cs231n.github.io/convolutional-networks/ ) and [Image Classification](https://github.com/PaddlePaddle/book/blob/develop/image_classification/README.md) chapter。

Y
Yuanpeng 已提交
103 104 105 106 107 108 109 110 111 112 113
### List of common activation functions  
- Sigmoid activation function: $ f(x) = sigmoid(x) = \frac{1}{1+e^{-x}} $

- Tanh activation function: $ f(x) = tanh(x) = \frac{e^x-e^{-x}}{e^x+e^{-x}} $

  In fact, tang function is just a rescaled version of sigmoid function. It is obtained by magnifying the value of sigmoid function and moving it downwards by 1.

- ReLU activation function: $ f(x) = max(0, x) $

For more information, please refer to [Activation functions in Wikipedia](https://en.wikipedia.org/wiki/Activation_function)

Y
Yuanpeng 已提交
114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130
## Data Preparation

### Data and Download

Execute the following command to download [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and unzip, then put paths of training set and test set to train.list and test.list respectively for PaddlePaddle to read.

```bash
./data/get_mnist_data.sh
```

`gzip` downloaded data. The following files can be found in `data/raw_data`:

|    File name          |       Description              |
|----------------------|-------------------------|
|train-images-idx3-ubyte|  Training images, 60,000 |
|train-labels-idx1-ubyte|  Training labels, 60,000 |
|t10k-images-idx3-ubyte |  Evaluation images, 10,000 |
Y
Yuanpeng 已提交
131
|t10k-labels-idx1-ubyte |  Evaluation labels, 10,000 |
Y
Yuanpeng 已提交
132 133 134 135 136 137 138

Users can randomly generate 10 images with the following script (Refer to Fig. 1.)

```bash
./load_data.py
```

Y
Yuanpeng 已提交
139 140 141 142 143 144 145 146 147 148 149 150
### Provide data for PaddlePaddle

We use python interface to convey data to system. `mnist_provider.py` shows a complete example for MNIST data.

```python
# Define a py data provider
@provider(
    input_types={'pixel': dense_vector(28 * 28),
                 'label': integer_value(10)})
def process(settings, filename):  # settings is not used currently.
		# Open image file
    with open( filename + "-images-idx3-ubyte", "rb") as f:             
Y
Yuanpeng 已提交
151
		# Read first 4 parameters. magic is data format. n is number of data. rows and cols are number of rows and columns, respectively
Y
Yuanpeng 已提交
152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171
        magic, n, rows, cols = struct.upack(">IIII", f.read(16))        
		# With empty string as a unit, read data one by one
        images = np.fromfile(                                           
            f, 'ubyte',
            count=n * rows * cols).reshape(n, rows, cols).astype('float32')
		# Normalize data of [0, 255] to [-1,1]
        images = images / 255.0 * 2.0 - 1.0                             


		# Open label file
    with open( filename + "-labels-idx1-ubyte", "rb") as l:             
		# Read first two parameters
        magic, n = struct.upack(">II", l.read(8))                       
		# With empty string as a unit, read data one by one
        labels = np.fromfile(l, 'ubyte', count=n).astype("int")         

    for i in xrange(n):
        yield {"pixel": images[i, :], 'label': labels[i]}
```

172

Y
Yuanpeng 已提交
173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190
## Model Configurations

### Data Definition

在模型配置中,定义通过 `define_py_data_sources2` 函数从 `dataprovider` 中读入数据。如果该配置用于预测,则不需要数据定义部分。

In model configuration, define data reading from `dataprovider` by `define_py_data_sources2`. If this configuration is used for prediction, data definition is not necessary.

```python
 if not is_predict:
     data_dir = './data/'
     define_py_data_sources2(
         train_list=data_dir + 'train.list',
         test_list=data_dir + 'test.list',
         module='mnist_provider',
         obj='process')
```

Y
Yuanpeng 已提交
191 192 193 194 195 196 197 198 199
### Algorithm Configuration

Set training related parameters.

- batch_size: use 128 samples in each training step.
- learning_rate: rating of iteration, related to the rate of convergence.
- learning_method: Optimizer `MomentumOptimizer` for training. The parameter 0.9 indicates momentum keeps 0.9 of previous speed.
- regularization: A method to prevent overfitting. Here L2 regularization is used.

200 201 202 203 204 205 206 207
```python
settings(
    batch_size=128,
    learning_rate=0.1 / 128.0,
    learning_method=MomentumOptimizer(0.9),
    regularization=L2Regularization(0.0005 * 128))
```

208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230
### Model Architecture

#### Overview

First get data by `data_layer`, and get classification result by classifier. Here we provided three different classifiers. In training, we compute loss function, which is usually cross entropy for classification problem. In prediction, we can directly output results.

``` python
data_size = 1 * 28 * 28
label_size = 10
img = data_layer(name='pixel', size=data_size)

predict = softmax_regression(img) # Softmax Regression
#predict = multilayer_perceptron(img) # Multilayer Perceptron
#predict = convolutional_neural_network(img) #LeNet5 Convolutional Neural Network
 
if not is_predict:
    lbl = data_layer(name="label", size=label_size)
    inputs(img, lbl)
    outputs(classification_cost(input=predict, label=lbl))
else:
    outputs(predict)
```

Y
Yuanpeng 已提交
231 232 233 234 235 236 237 238 239
#### Softmax Regression

One simple fully connected layer with softmax activation function outputs classification result.

```python
def softmax_regression(img):
    predict = fc_layer(input=img, size=10, act=SoftmaxActivation())
    return predict
```
240

Y
Yuanpeng 已提交
241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286
#### MultiLayer Perceptron

One simple fully connected layer with softmax activation function outputs classification result.

```python
def multilayer_perceptron(img):
    # First fully connected layer with ReLU
    hidden1 = fc_layer(input=img, size=128, act=ReluActivation())
    # Second fully connected layer with ReLU
    hidden2 = fc_layer(input=hidden1, size=64, act=ReluActivation())
    # Output layer as fully connected layer and softmax activation. The size must be 10.
    predict = fc_layer(input=hidden2, size=10, act=SoftmaxActivation())
    return predict
```

#### Convolutional Neural Network LeNet-5

The following is the LeNet-5 network architecture. 2D input image is first fed into two sets of convolutional layer and pooling layer, and it is fed to fully connected layer, and another fully connected layer with softmax activation.

```python
def convolutional_neural_network(img):
    # First convolutional layer - pooling layer
    conv_pool_1 = simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=1,
        pool_size=2,
        pool_stride=2,
        act=TanhActivation())
    # Second convolutional layer - pooling layer
    conv_pool_2 = simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        num_channel=20,
        pool_size=2,
        pool_stride=2,
        act=TanhActivation())
    # Fully connected layer
    fc1 = fc_layer(input=conv_pool_2, size=128, act=TanhActivation())
    # Output layer as fully connected layer and softmax activation. The size must be 10.
    predict = fc_layer(input=fc1, size=10, act=SoftmaxActivation())
    return predict
```

Y
Yuanpeng 已提交
287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337
## Training Model

### Training Commands and Logs

1.By configuring `train.sh` to execute training:

```bash
config=mnist_model.py                   # Select network in mnist_model.py
output=./softmax_mnist_model            
log=softmax_train.log                   

paddle train \
--config=$config \                      # Scripts for network configuration.
--dot_period=10 \                       # After `dot_period` steps, print one `.`
--log_period=100 \						# Print a log every batchs
--test_all_data_in_one_period=1 \		# Whether to use all data in every test
--use_gpu=0 \							# Whether to use GPU
--trainer_count=1 \						# Number of CPU or GPU
--num_passes=100 \						# Passes for training (One pass uses all data.)
--save_dir=$output \					# Path to saved model
2>&1 | tee $log

python -m paddle.utils.plotcurve -i $log > plot.png
```

After configuring parameters, execute `./train.sh`. Training log is as follows.

```
I0117 12:52:29.628617  4538 TrainerInternal.cpp:165]  Batch=100 samples=12800 AvgCost=2.63996 CurrentCost=2.63996 Eval: classification_error_evaluator=0.241172  CurrentEval: classification_error_evaluator=0.241172 
.........
I0117 12:52:29.768741  4538 TrainerInternal.cpp:165]  Batch=200 samples=25600 AvgCost=1.74027 CurrentCost=0.840582 Eval: classification_error_evaluator=0.185234  CurrentEval: classification_error_evaluator=0.129297 
.........
I0117 12:52:29.916970  4538 TrainerInternal.cpp:165]  Batch=300 samples=38400 AvgCost=1.42119 CurrentCost=0.783026 Eval: classification_error_evaluator=0.167786  CurrentEval: classification_error_evaluator=0.132891 
.........
I0117 12:52:30.061213  4538 TrainerInternal.cpp:165]  Batch=400 samples=51200 AvgCost=1.23965 CurrentCost=0.695054 Eval: classification_error_evaluator=0.160039  CurrentEval: classification_error_evaluator=0.136797 
......I0117 12:52:30.223270  4538 TrainerInternal.cpp:181]  Pass=0 Batch=469 samples=60000 AvgCost=1.1628 Eval: classification_error_evaluator=0.156233 
I0117 12:52:30.366894  4538 Tester.cpp:109]  Test samples=10000 cost=0.50777 Eval: classification_error_evaluator=0.0978 
```

2.Use `plot_cost.py` to plot error curve during training.

```bash
python plot_cost.py softmax_train.log            
```

3.Use `evaluate.py ` to select the best trained model.

```bash
python evaluate.py softmax_train.log
```

Y
Yuanpeng 已提交
338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412
### Training Results for Softmax Regression

<p align="center">
<img src="image/softmax_train_log.png" width="400px"><br/>
Fig. 7 Softmax regression error curve<br/>
</p>

Evaluation results of the models:

```text
Best pass is 00013, testing Avgcost is 0.484447
The classification accuracy is 90.01%
```

From the evaluation results, the best step for softmax regression model is pass-00013, where classification accuracy is 90.01%, and the last pass-00099 has accuracy of 89.3%. From Fig. 7, we also see that the best accuracy may not appear in the last pass. A explanation is that during training, the model may already arrive at local optimum, and it just swings around nearby in the following passes, or it gets lower local optimum.

### Results of Multilayer Perceptron

<p align="center">
<img src="image/mlp_train_log.png" width="400px"><br/>
Fig. 8. Multilayer perceptron error curve
</p>

Evaluation results of the models:

```text
Best pass is 00085, testing Avgcost is 0.164746
The classification accuracy is 94.95%
```

From the evaluation results, the final training accuracy is 94.95%. It has significant improvement comparing with softmax regression model. The reason is that softmax regression is simple, and it cannot fit complex data, but Multi-layer perceptron with hidden layers has stronger fitting capacity.

### Training results for Convolutional Neural Network

<p align="center">
<img src="image/cnn_train_log.png" width="400px"><br/>
图9. Convolutional Neural Network error curve
</p>

Results of model evaluation:

```text
Best pass is 00076, testing Avgcost is 0.0244684
The classification accuracy is 99.20%
```

From the evaluation result, the best accuracy of Convolutional Neural Network is 99.20%. This means, for image problem, Convolutional Neural Network has better recognition effect than fully connected network. This should be related to the local connection and parameter sharing of convolutional layers. Also, in Fig. 9, Convolutional Neural Network achieves good effect in early steps, which indicates that it is fast to converge.

## Application Model

### Prediction Commands and Results
Script `predict.py` can make prediction for trained models. For example, in softmax regression:

```bash
python predict.py -c mnist_model.py -d data/raw_data/ -m softmax_mnist_model/pass-00047
```

- -c sets model architecture
- -d sets data for prediction
- -m sets model parameters, here the best trained model is used for prediction

Follow to instruction to input image ID for prediction. The classifier can output probabilities for each digit, predicted results with the highest probability, and ground truth label.

```
Input image_id [0~9999]: 3
Predicted probability of each digit:
[[  1.00000000e+00   1.60381094e-28   1.60381094e-28   1.60381094e-28
    1.60381094e-28   1.60381094e-28   1.60381094e-28   1.60381094e-28
    1.60381094e-28   1.60381094e-28]]
Predict Number: 0 
Actual Number: 0
```

From the result, this classifier recognizes the digit on the third image as digit 0 with near to 100% probability, and the ground truth is actually consistent.

Y
Yuanpeng 已提交
413 414 415
## Conclusion
Softmax regression, Multi-layer perceptron and Convolutional Neural Network in this chapter are the most basic Deep Learning models. More sophisticated models in the following chapters are derived from them. Therefore, these models are very helpful for the future learning. At the same time, we observed that when evolving from the simplest softmax regression to slightly complex Convolutional Neural Network, recognition accuracy on MNIST data set has large improvement, due to Convolutional layers' local connections and parameter sharing. When learning new models in the future, we hope readers to understand the key ideas for a new model to improve over an old one. Moreover, this chapter introduced basic flow of PaddlePaddle model design, starting from dataprovider, model layer construction, to final training and prediction. By becoming familiar with this flow, readers can use specific data and define specific network models, and complete training and prediction for their tasks.

Y
Yuanpeng 已提交
416
## References
417 418 419 420 421 422 423 424 425 426 427 428

1. LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. ["Gradient-based learning applied to document recognition."](http://ieeexplore.ieee.org/abstract/document/726791/) Proceedings of the IEEE 86, no. 11 (1998): 2278-2324.
2. Wejéus, Samuel. ["A Neural Network Approach to Arbitrary SymbolRecognition on Modern Smartphones."](http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A753279&dswid=-434) (2014).
3. Decoste, Dennis, and Bernhard Schölkopf. ["Training invariant support vector machines."](http://link.springer.com/article/10.1023/A:1012454411458) Machine learning 46, no. 1-3 (2002): 161-190.
4. Simard, Patrice Y., David Steinkraus, and John C. Platt. ["Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.160.8494&rep=rep1&type=pdf) In ICDAR, vol. 3, pp. 958-962. 2003.
5. Salakhutdinov, Ruslan, and Geoffrey E. Hinton. ["Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure."](http://www.jmlr.org/proceedings/papers/v2/salakhutdinov07a/salakhutdinov07a.pdf) In AISTATS, vol. 11. 2007.
6. Cireşan, Dan Claudiu, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber. ["Deep, big, simple neural nets for handwritten digit recognition."](http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00052) Neural computation 22, no. 12 (2010): 3207-3220.
7. Deng, Li, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, and Geoffrey E. Hinton. ["Binary coding of speech spectrograms using a deep auto-encoder."](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.1908&rep=rep1&type=pdf) In Interspeech, pp. 1692-1695. 2010.
8. Kégl, Balázs, and Róbert Busa-Fekete. ["Boosting products of base classifiers."](http://dl.acm.org/citation.cfm?id=1553439) In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 497-504. ACM, 2009.
9. Rosenblatt, Frank. ["The perceptron: A probabilistic model for information storage and organization in the brain."](http://psycnet.apa.org/journals/rev/65/6/386/) Psychological review 65, no. 6 (1958): 386.
10. Bishop, Christopher M. ["Pattern recognition."](http://s3.amazonaws.com/academia.edu.documents/30428242/bg0137.pdf?AWSAccessKeyId=AKIAJ56TQJRTWSMTNPEA&Expires=1484816640&Signature=85Ad6%2Fca8T82pmHzxaSXermovIA%3D&response-content-disposition=inline%3B%20filename%3DPattern_recognition_and_machine_learning.pdf) Machine Learning 128 (2006): 1-58.

Y
Yuanpeng 已提交
429 430
<br/>
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="知识共享许可协议" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This book</span> is created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and uses <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Shared knowledge signature - non commercial use-Sharing 4.0 International Licensing Protocal</a>.