The source code for this chapter is in [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/recognize_digits). First-time users should refer to the PaddlePaddle [installation instructions](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).
When we learn to program, the first program is usually one that prints "Hello World". In machine learning and deep learning, the equivalent is handwritten digit recognition on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. Handwritten digit recognition is a typical image classification problem: the task is relatively simple, and MNIST is a complete, ready-to-use dataset. As a simple computer vision dataset, MNIST contains images of handwritten digits and their corresponding labels (Fig. 1). Each image is a 28x28 matrix, and each label is one of the 10 digits from 0 to 9. Every image is size-normalized and centered.
The MNIST dataset is built from [NIST](https://www.nist.gov/srd/nist-special-database-19) Special Database 3 (SD-3) and Special Database 1 (SD-1). SD-3 was written by U.S. Census Bureau staff, while SD-1 was written by U.S. high school students, so SD-3 is cleaner and easier to recognize than SD-1. Yann LeCun et al. drew half of the samples from SD-1 and half from SD-3 for both the MNIST training set (60,000 samples) and the test set (10,000 samples). The training set contains digits from about 250 different writers, and the writers of the training set and the test set were kept disjoint.
Yann LeCun, one of the founders of deep learning, made major early contributions to handwritten character recognition and proposed the CNN (Convolutional Neural Network), which drastically improved recognition accuracy for handwritten characters. CNNs are now a key component of deep learning. From LeCun's original LeNet to the winning ImageNet models such as VGGNet, GoogLeNet and ResNet (see the [Image Classification](https://github.com/PaddlePaddle/book/tree/develop/image_classification) chapter), CNNs have achieved a series of impressive results in image classification.
Many algorithms have been tested on MNIST. In 1998, LeCun experimented with a single-layer linear classifier, an MLP (Multilayer Perceptron) and the multilayer CNN LeNet, which successively reduced the test error from 12% to 0.7% \[[1](#References)\]. Since then, researchers have tried many other algorithms, such as k-NN (K-Nearest Neighbors) \[[2](#References)\], Support Vector Machines (SVM) \[[3](#References)\], neural networks \[[4-7](#References)\] and boosting \[[8](#References)\], and have applied various preprocessing methods, such as distortion removal, noise removal and blurring, to increase recognition accuracy.
In this chapter, we start with a simple softmax regression model to introduce handwritten character recognition, and then improve the model step by step.
Before introducing the classification algorithms and training procedure, we provide some definitions:
...
...
@@ -49,29 +25,6 @@ Before introducing the classification algorithms and training procedure, we prov
- $Y$ is the output: the classifier's output covers the 10 digit classes from 0 to 9. $Y=\left ( y_0, y_1, \dots, y_9 \right )$, where each dimension $y_i$ is the probability that the image belongs to class $i$.
- $L$ is the ground-truth label of an image: $L=\left ( l_0, l_1, \dots, l_9 \right )$. It is also 10-dimensional, with exactly one dimension equal to 1 and all others equal to 0.
The simplest model, softmax regression, feeds the input through a fully connected layer and applies softmax directly for multi-class classification \[[9](#References)\].
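As a rough illustration of the idea, the following pure-Python sketch computes a softmax regression forward pass. It is not the chapter's PaddlePaddle code: the function names, the toy sizes (4 inputs and 3 classes instead of 784 pixels and 10 digits) and the weight values are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_regression(x, weights, biases):
    """Compute y_i = softmax(sum_j w_ij * x_j + b_i) for every class i."""
    scores = [
        sum(w * x_j for w, x_j in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Toy input and parameters (hypothetical values, for illustration only).
x = [0.0, 0.5, 1.0, 0.25]
weights = [
    [0.1, 0.2, 0.3, 0.4],
    [0.0, -0.1, 0.5, 0.2],
    [0.3, 0.3, -0.2, 0.1],
]
biases = [0.0, 0.1, -0.1]

y = softmax_regression(x, weights, biases)
print(y)  # one probability per class
```

Each entry of `y` plays the role of one $y_i$: a value between 0 and 1, with all entries summing to 1.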
...
...
@@ -95,22 +48,6 @@ Fig. 2 is softmax regression network, with weights in black, and bias in red. +1
The softmax regression model uses the simplest two-layer neural network, i.e. it contains only an input layer and an output layer, so its fitting capacity is limited. To achieve better recognition accuracy, we consider adding several hidden layers \[[10](#References)\] between the input layer and the output layer.
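To make the effect of hidden layers concrete, here is a minimal pure-Python sketch of a forward pass with hidden layers placed before the softmax output layer. The helper names, the choice of ReLU as the hidden activation, and the toy weights are assumptions for illustration; they are not taken from the chapter's configuration.

```python
import math


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x_j for w, x_j in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Element-wise ReLU activation (an assumed choice for this sketch)."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, hidden_layers, output_layer):
    """Apply each hidden layer with ReLU, then a softmax output layer."""
    h = x
    for weights, biases in hidden_layers:
        h = relu(dense(h, weights, biases))
    out_weights, out_biases = output_layer
    return softmax(dense(h, out_weights, out_biases))


# Toy network: 3 inputs -> 2 hidden units -> 2 classes (hypothetical values).
x = [1.0, -2.0, 0.5]
hidden_layers = [([[0.5, 0.1, -0.3], [-0.2, 0.4, 0.6]], [0.0, 0.1])]
output_layer = ([[1.0, -1.0], [-0.5, 0.5]], [0.0, 0.0])

y = mlp_forward(x, hidden_layers, output_layer)
print(y)
```

With an empty `hidden_layers` list, the function reduces to plain softmax regression, which shows that the hidden layers are the only structural difference between the two models.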
...
...
@@ -126,21 +63,8 @@ Fig. 3. is multi-layer perceptron network, with weights in black, and bias in re
@@ -152,15 +76,6 @@ Convolutional layer is the core of Convolutional Neural Networks. The parameters
Fig. 4 is an animation of a convolutional layer, where depth is not shown for simplicity. The input is $W_1=5,H_1=5,D_1=3$, which is a common representation for color images: the width and height of the image correspond to $W_1$ and $H_1$, and the 3 RGB color channels correspond to $D_1$. The parameters of the convolutional layer are $K=2,F=3,S=2,P=1$. $K$ is the number of kernels; here, Filter $W_0$ and Filter $W_1$ are the two convolution kernels. $F$ is the kernel size; $W_0$ and $W_1$ each hold a $3\times3$ matrix at every depth. $S$ is the stride; the kernels move to the right or downwards by 2 units each time. $P$ is the padding, i.e. the extension added around the border of the input.
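The parameters above determine the output size through the standard formula $(W_1 - F + 2P)/S + 1$, which gives $(5 - 3 + 2)/2 + 1 = 3$ here. The sketch below checks this with a naive single-channel, single-kernel convolution in pure Python. The function names and the all-ones input and kernel are assumptions chosen so the result is easy to verify by hand; real layers sum over all $D_1$ channels and repeat the process for each of the $K$ kernels.

```python
def conv_output_size(input_size, kernel_size, stride, padding):
    """Output width or height: (W - F + 2P) / S + 1."""
    span = input_size - kernel_size + 2 * padding
    if span % stride != 0:
        raise ValueError("stride does not fit the padded input evenly")
    return span // stride + 1


def conv2d_single_channel(image, kernel, stride, padding):
    """Naive zero-padded 2D convolution for one channel and one kernel."""
    height, width = len(image), len(image[0])
    f = len(kernel)

    # Surround the image with `padding` rows and columns of zeros.
    padded = [[0.0] * (width + 2 * padding) for _ in range(height + 2 * padding)]
    for r in range(height):
        for c in range(width):
            padded[r + padding][c + padding] = image[r][c]

    out_h = conv_output_size(height, f, stride, padding)
    out_w = conv_output_size(width, f, stride, padding)
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride  # window position in the padded input
            row.append(sum(
                padded[top + a][left + b] * kernel[a][b]
                for a in range(f) for b in range(f)
            ))
        output.append(row)
    return output


# 5x5 input, 3x3 kernel, stride 2, padding 1, as in the text.
image = [[1.0] * 5 for _ in range(5)]
kernel = [[1.0] * 3 for _ in range(3)]
feature_map = conv2d_single_channel(image, kernel, stride=2, padding=1)
print(len(feature_map), len(feature_map[0]))  # 3 3
```

Because both the input and the kernel are all ones, each output value simply counts how many real (non-padding) pixels fall under the kernel at that position.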
A pooling layer performs downsampling. Its main purpose is to reduce computation by shrinking the feature maps, and with them the number of parameters in later layers; it also helps to prevent over-fitting to some extent. A pooling layer is usually added after a convolutional layer. Common variants include max pooling and average pooling. Max pooling divides its input into rectangular regions and outputs the maximum value of each region (Fig. 5).
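A minimal pure-Python sketch of max pooling on a single feature map follows. The function name, the 2x2 window with stride 2, and the input values are assumptions for illustration only.

```python
def max_pool2d(feature_map, pool_size, stride):
    """Slide a pool_size x pool_size window and keep the maximum of each window."""
    height, width = len(feature_map), len(feature_map[0])
    output = []
    for top in range(0, height - pool_size + 1, stride):
        row = []
        for left in range(0, width - pool_size + 1, stride):
            window = [
                feature_map[top + a][left + b]
                for a in range(pool_size) for b in range(pool_size)
            ]
            row.append(max(window))
        output.append(row)
    return output


# A 4x4 feature map pooled with 2x2 windows and stride 2 becomes 2x2.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(feature_map, pool_size=2, stride=2)
print(pooled)  # [[6, 8], [3, 4]]
```

The 16 input values shrink to 4, which is the reduction in later computation that the paragraph above refers to.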
For more details on Convolutional Neural Networks, please refer to the [Stanford open course](http://cs231n.github.io/convolutional-networks/) and the [Image Classification](https://github.com/PaddlePaddle/book/blob/develop/image_classification/README.md) chapter.
Execute the following command to download and unzip the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset, then write the paths of the training set and test set to train.list and test.list respectively so that PaddlePaddle can read them.
...
...
@@ -272,39 +136,6 @@ Users can randomly generate 10 images with the following script (Refer to Fig. 1
First, read the data with `data_layer`, then obtain the classification result from a classifier. We provide three different classifiers here. During training, we compute a loss function, which for classification problems is usually cross entropy. During prediction, we output the results directly.
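For reference, the cross entropy between a one-hot label $L$ and a predicted distribution $Y$ is $-\sum_i l_i \log y_i$. The pure-Python sketch below evaluates it on two made-up predictions; the helper names, the `eps` guard against `log(0)`, and the probability values are assumptions for illustration and are not part of the PaddlePaddle configuration.

```python
import math


def one_hot(digit, num_classes=10):
    """Ground-truth label with a 1 at the true digit and 0 elsewhere."""
    return [1.0 if i == digit else 0.0 for i in range(num_classes)]


def cross_entropy(label, y, eps=1e-12):
    """-sum_i l_i * log(y_i); eps avoids taking log(0)."""
    return -sum(l * math.log(max(p, eps)) for l, p in zip(label, y))


label = one_hot(3)

# A confident, correct prediction (hypothetical values).
confident = [0.01] * 10
confident[3] = 0.91

# A prediction that gives every digit the same probability.
uniform = [0.1] * 10

loss_confident = cross_entropy(label, confident)
loss_uniform = cross_entropy(label, uniform)
print(loss_confident < loss_uniform)  # True
```

Only the probability assigned to the true class contributes to the loss, so the confident correct prediction is penalized far less than the uniform one.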
The following is the LeNet-5 network architecture. The 2D input image first passes through two groups of convolutional and pooling layers, then through a fully connected layer, and finally through another fully connected layer with softmax activation.
@@ -698,22 +351,6 @@ The classification accuracy is 90.01%
From the evaluation results, the best pass for the softmax regression model is pass-00013, with a classification accuracy of 90.01%, while the last pass, pass-00099, has an accuracy of 89.3%. Fig. 7 also shows that the best accuracy does not necessarily appear in the last pass. One explanation is that the model may already have reached a local optimum during training and then only oscillates around it in later passes, or ends up in a worse local optimum.
@@ -730,22 +367,6 @@ The classification accuracy is 94.95%
From the evaluation results, the final classification accuracy is 94.95%, a significant improvement over the softmax regression model. The reason is that softmax regression is too simple to fit complex data, whereas a multilayer perceptron with hidden layers has stronger fitting capacity.
### Training results for Convolutional Neural Network
<p align="center">
...
...
@@ -762,35 +383,8 @@ The classification accuracy is 99.20%
From the evaluation results, the best accuracy of the Convolutional Neural Network is 99.20%. This means that for image problems, a Convolutional Neural Network recognizes better than a fully connected network, which is likely related to the local connectivity and parameter sharing of convolutional layers. Fig. 9 also shows that the Convolutional Neural Network already performs well in the early passes, which indicates that it converges quickly.
The script `predict.py` makes predictions with a trained model. For example, for softmax regression:
...
...
@@ -816,11 +410,6 @@ Actual Number: 0
From the result, the classifier recognizes the digit in the third image as 0 with a probability close to 100%, which matches the ground truth.
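Turning the classifier's probability vector into a predicted digit amounts to taking the index of the largest entry. The sketch below shows this step in pure Python; the function name and the probability values are made up for illustration and are not the output of `predict.py`.

```python
def predict_digit(probabilities):
    """Return the most likely digit and the probability assigned to it."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best, probabilities[best]


# Hypothetical softmax output for one image (not from an actual run).
probabilities = [0.97, 0.0, 0.01, 0.0, 0.0, 0.0, 0.01, 0.0, 0.01, 0.0]
digit, confidence = predict_digit(probabilities)
print(digit, confidence)  # 0 0.97
```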
The softmax regression, multilayer perceptron and Convolutional Neural Network models in this chapter are the most basic deep learning models. The more sophisticated models in the following chapters are derived from them, so these models are very helpful for further learning. We also observed that, when moving from the simplest softmax regression to the slightly more complex Convolutional Neural Network, recognition accuracy on the MNIST dataset improves considerably, thanks to the local connectivity and parameter sharing of convolutional layers. When learning new models in the future, we hope readers will look for the key ideas that allow a new model to improve on an old one. Moreover, this chapter introduced the basic workflow of PaddlePaddle model design, from the data provider and the construction of model layers to final training and prediction. Once familiar with this workflow, readers can bring their own data, define their own network models, and complete training and prediction for their own tasks.
...
...
@@ -837,8 +426,5 @@ Softmax regression, Multi-layer perceptron and Convolutional Neural Network in t
9. Rosenblatt, Frank. ["The perceptron: A probabilistic model for information storage and organization in the brain."](http://psycnet.apa.org/journals/rev/65/6/386/) Psychological review 65, no. 6 (1958): 386.
10. Bishop, Christopher M. ["Pattern recognition."](http://s3.amazonaws.com/academia.edu.documents/30428242/bg0137.pdf?AWSAccessKeyId=AKIAJ56TQJRTWSMTNPEA&Expires=1484816640&Signature=85Ad6%2Fca8T82pmHzxaSXermovIA%3D&response-content-disposition=inline%3B%20filename%3DPattern_recognition_and_machine_learning.pdf) Machine Learning 128 (2006): 1-58.
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This book</span> is created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.