The source code for this tutorial is here: [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits). For instructions on getting started with Paddle, please refer to [installation instructions](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).
The source code for this tutorial is here: [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits).
For instructions on getting started with Paddle, please refer to [installation instructions](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).
## Introduction
## Introduction
When one learns to program, the first task is usually to write a program that prints "Hello World!". In Machine Learning or Deep Learning, an equivalent task is to train a model to recognize hand-written digits using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. Handwriting recognition is a classic image classification problem. The problem is relatively easy and MNIST is a complete dataset. As a simple Computer Vision dataset, MNIST contains images of handwritten digits and their corresponding labels (Fig. 1). The input image is a $28\times28$ matrix, and the label is one of the digits from $0$ to $9$. All images are normalized, meaning that they are both rescaled and centered.
When one learns to program, the first task is usually to write a program that prints "Hello World!".
In Machine Learning or Deep Learning, an equivalent task is to train a model to recognize hand-written digits using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset.
Handwriting recognition is a classic image classification problem. The problem is relatively easy and MNIST is a complete dataset.
As a simple Computer Vision dataset, MNIST contains images of handwritten digits and their corresponding labels (Fig. 1).
The input image is a $28\times28$ matrix, and the label is one of the digits from $0$ to $9$. All images are normalized, meaning that they are both rescaled and centered.
The MNIST dataset is from the [NIST](https://www.nist.gov/srd/nist-special-database-19) Special Database 3 (SD-3) and the Special Database 1 (SD-1). The SD-3 is labeled by the staff of the U.S. Census Bureau, while SD-1 is labeled by high school students. Therefore the SD-3 is cleaner and easier to recognize than the SD-1 dataset. Yann LeCun et al. used half of the samples from each of SD-1 and SD-3 to create the MNIST training set of 60,000 samples and test set of 10,000 samples. 250 annotators labeled the training set, thus guaranteed that there wasn't a complete overlap of annotators of training set and test set.
The MNIST dataset is from the [NIST](https://www.nist.gov/srd/nist-special-database-19) Special Database 3 (SD-3) and the Special Database 1 (SD-1).
The SD-3 is labeled by the staff of the U.S. Census Bureau, while SD-1 is labeled by high school students. Therefore the SD-3 is cleaner and easier to recognize than the SD-1 dataset.
Yann LeCun et al. used half of the samples from each of SD-1 and SD-3 to create the MNIST training set of 60,000 samples and test set of 10,000 samples.
250 annotators labeled the training set, thus guaranteed that there wasn't a complete overlap of annotators of training set and test set.
The MNIST dataset has been used for evaluating many image recognition algorithms such as a single layer linear classifier, Multilayer Perceptron (MLP) and Multilayer CNN LeNet\[[1](#references)\], K-Nearest Neighbors (k-NN) \[[2](#references)\], Support Vector Machine (SVM) \[[3](#references)\], Neural Networks \[[4-7](#references)\], Boosting \[[8](#references)\] and preprocessing methods like distortion removal, noise removal, and blurring. Among these algorithms, the *Convolutional Neural Network* (CNN) has achieved a series of impressive results in Image Classification tasks, including VGGNet, GoogLeNet, and ResNet (See [Image Classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification) tutorial).
The MNIST dataset has been used for evaluating many image recognition algorithms such as a single layer linear classifier,
Multilayer Perceptron (MLP) and Multilayer CNN LeNet\[[1](#references)\], K-Nearest Neighbors (k-NN) \[[2](#references)\], Support Vector Machine (SVM) \[[3](#references)\],
Neural Networks \[[4-7](#references)\], Boosting \[[8](#references)\] and preprocessing methods like distortion removal, noise removal, and blurring.
Among these algorithms, the *Convolutional Neural Network* (CNN) has achieved a series of impressive results in Image Classification tasks, including VGGNet, GoogLeNet,
and ResNet (See [Image Classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification) tutorial).
In this tutorial, we start with a simple **softmax** regression model and go on with MLP and CNN. Readers will see how these methods improve the recognition accuracy step-by-step.
In this tutorial, we start with a simple **softmax** regression model and go on with MLP and CNN. Readers will see how these methods improve the recognition accuracy step-by-step.
...
@@ -124,183 +136,177 @@ PaddlePaddle provides a Python module, `paddle.dataset.mnist`, which downloads a
...
@@ -124,183 +136,177 @@ PaddlePaddle provides a Python module, `paddle.dataset.mnist`, which downloads a
The demo will be using the latest paddle fluid API. Fluid API is the latest Paddle API. It simplifies the model configurations without sacrifice the performance.
We recommend using Fluid API as it is much easier to pick up.
Here are the quick overview on the major fluid API complements.
1.`inference_program`: A function that specify how to get the prediction from the data input.
This is where you specify the network flow.
1.`train_program`: A function that specify how to get avg_cost from `inference_program` and labels.
This is where you specify the loss calculations.
1.`optimizer`: Configure how to minimize the loss. Paddle supports most major optimization methods.
1.`Trainer`: Fluid trainer manages the training process specified by the `train_program` and `optimizer`. Users can monitor the training
progress through the `event_handler` callback function.
1.`Inferencer`: Fluid inferencer loads the `inference_program` and the parameters trained by the Trainer.
It then can infer the data and return prediction
We will go though all of them and dig more on the configurations in this demo.
## Model Configuration
## Model Configuration
A PaddlePaddle program starts from importing the API package:
A PaddlePaddle program starts from importing the API package:
```python
```python
importpaddle.v2aspaddle
importpaddle
importpaddle.fluidasfluid
```
```
We want to use this program to demonstrate three different classifiers, each defined as a Python function:
### Program Functions Configuration
First, We need to setup the `inference_program` function. We want to use this program to demonstrate three different classifiers, each defined as a Python function.
We need to feed image data to the classifier. PaddlePaddle provides a special layer `layer.data` for reading data.
Let us create a data layer for reading images and connect it to the classification network.
- Softmax regression: the network has a fully-connection layer with softmax activation:
- Softmax regression: the network has a fully-connection layer with softmax activation:
- Multi-Layer Perceptron: this network has two hidden fully-connected layers, one with ReLU and the other with softmax activation:
- Multi-Layer Perceptron: this network has two hidden fully-connected layers, both are using ReLU as activation function. The output layer is using softmax activation:
PaddlePaddle provides a special layer `layer.data` for reading data. Let us create a data layer for reading images and connect it to a classification network created using one of above three functions. We also need a cost layer for training the model.
#### Train Program Configuration
Then we need to setup the the `train_program`. It takes the prediction from the classifier first. During the training, it will calculate the `avg_loss` from the prediction.
Please feel free to modify the code to test different results between `softmax regression`, `mlp`, and `convolutional neural network` classifier.
Then we specify the training data `paddle.dataset.mnist.train()` and testing data `paddle.dataset.mnist.test()`. These two methods are *reader creators*. Once called, a reader creator returns a *reader*. A reader is a Python method, which, once called, returns a Python generator, which yields instances of data.
# predict = multilayer_perceptron(images) # uncomment for MLP
predict=convolutional_neural_network(images)# uncomment for LeNet5
`shuffle` is a reader decorator. It takes a reader A as input and returns a new reader B. Under the hood, B calls A to read data in the following fashion: it copies in `buffer_size` instances at a time into a buffer, shuffles the data, and yields the shuffled instances one at a time. A large buffer size would yield very shuffled data.
```
Now, it is time to specify training parameters. In the following `Momentum` optimizer, `momentum=0.9` means that 90% of the current momentum comes from that of the previous iteration. The learning rate relates to the speed at which the network training converges. Regularization is meant to prevent over-fitting; here we use the L2 regularization.
`batch` is a special decorator, which takes a reader and outputs a *batch reader*, which doesn't yield an instance, but a minibatch at a time.
Then we specify the training data `paddle.dataset.mnist.train()` and testing data `paddle.dataset.mnist.test()`. These two methods are *reader creators*. Once called, a reader creator returns a *reader*. A reader is a Python method, which, once called, returns a Python generator, which yields instances of data.
### Trainer Configuration
`shuffle` is a reader decorator. It takes a reader A as input and returns a new reader B. Under the hood, B calls A to read data in the following fashion: it copies in `buffer_size` instances at a time into a buffer, shuffles the data, and yields the shuffled instances one at a time. A large buffer size would yield very shuffled data.
Now, we need to setup the trainer. The trainer need to take in `train_program`, `place`, and `optimizer`.
In the following `Adam` optimizer, `learning_rate` means the speed at which the network training converges.
`batch` is a special decorator, which takes a reader and outputs a *batch reader*, which doesn't yield an instance, but a minibatch at a time.
# Test with Pass 0, Cost 0.326659, {'classification_error_evaluator': 0.09470000118017197}
# Test with Pass 0, Cost 0.326659, {'classification_error_evaluator': 0.09470000118017197}
```
```
After the training, we can check the model's prediction accuracy.
After the training, we can check the model's prediction accuracy.
...
@@ -315,27 +321,25 @@ Usually, with MNIST data, the softmax regression model achieves an accuracy arou
...
@@ -315,27 +321,25 @@ Usually, with MNIST data, the softmax regression model achieves an accuracy arou
## Application
## Application
After training, users can use the trained model to classify images. The following code shows how to inference MNIST images through `paddle.infer` interface.
After training, users can use the trained model to classify images. The following code shows how to inference MNIST images through `fluid.Inferencer`.
```python
```python
fromPILimportImage
inferencer = fluid.Inferencer(
importnumpyasnp
# infer_func=softmax_regression, # uncomment for softmax regression
importos
# infer_func=multilayer_perceptron, # uncomment for MLP
defload_image(file):
infer_func=convolutional_neural_network, # uncomment for LeNet5
The source code for this tutorial is here: [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits). For instructions on getting started with Paddle, please refer to [installation instructions](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).
The source code for this tutorial is here: [book/recognize_digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits).
For instructions on getting started with Paddle, please refer to [installation instructions](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).
## Introduction
## Introduction
When one learns to program, the first task is usually to write a program that prints "Hello World!". In Machine Learning or Deep Learning, an equivalent task is to train a model to recognize hand-written digits using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. Handwriting recognition is a classic image classification problem. The problem is relatively easy and MNIST is a complete dataset. As a simple Computer Vision dataset, MNIST contains images of handwritten digits and their corresponding labels (Fig. 1). The input image is a $28\times28$ matrix, and the label is one of the digits from $0$ to $9$. All images are normalized, meaning that they are both rescaled and centered.
When one learns to program, the first task is usually to write a program that prints "Hello World!".
In Machine Learning or Deep Learning, an equivalent task is to train a model to recognize hand-written digits using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset.
Handwriting recognition is a classic image classification problem. The problem is relatively easy and MNIST is a complete dataset.
As a simple Computer Vision dataset, MNIST contains images of handwritten digits and their corresponding labels (Fig. 1).
The input image is a $28\times28$ matrix, and the label is one of the digits from $0$ to $9$. All images are normalized, meaning that they are both rescaled and centered.
The MNIST dataset is from the [NIST](https://www.nist.gov/srd/nist-special-database-19) Special Database 3 (SD-3) and the Special Database 1 (SD-1). The SD-3 is labeled by the staff of the U.S. Census Bureau, while SD-1 is labeled by high school students. Therefore the SD-3 is cleaner and easier to recognize than the SD-1 dataset. Yann LeCun et al. used half of the samples from each of SD-1 and SD-3 to create the MNIST training set of 60,000 samples and test set of 10,000 samples. 250 annotators labeled the training set, thus guaranteed that there wasn't a complete overlap of annotators of training set and test set.
The MNIST dataset is from the [NIST](https://www.nist.gov/srd/nist-special-database-19) Special Database 3 (SD-3) and the Special Database 1 (SD-1).
The SD-3 is labeled by the staff of the U.S. Census Bureau, while SD-1 is labeled by high school students. Therefore the SD-3 is cleaner and easier to recognize than the SD-1 dataset.
Yann LeCun et al. used half of the samples from each of SD-1 and SD-3 to create the MNIST training set of 60,000 samples and test set of 10,000 samples.
250 annotators labeled the training set, thus guaranteed that there wasn't a complete overlap of annotators of training set and test set.
The MNIST dataset has been used for evaluating many image recognition algorithms such as a single layer linear classifier, Multilayer Perceptron (MLP) and Multilayer CNN LeNet\[[1](#references)\], K-Nearest Neighbors (k-NN) \[[2](#references)\], Support Vector Machine (SVM) \[[3](#references)\], Neural Networks \[[4-7](#references)\], Boosting \[[8](#references)\] and preprocessing methods like distortion removal, noise removal, and blurring. Among these algorithms, the *Convolutional Neural Network* (CNN) has achieved a series of impressive results in Image Classification tasks, including VGGNet, GoogLeNet, and ResNet (See [Image Classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification) tutorial).
The MNIST dataset has been used for evaluating many image recognition algorithms such as a single layer linear classifier,
Multilayer Perceptron (MLP) and Multilayer CNN LeNet\[[1](#references)\], K-Nearest Neighbors (k-NN) \[[2](#references)\], Support Vector Machine (SVM) \[[3](#references)\],
Neural Networks \[[4-7](#references)\], Boosting \[[8](#references)\] and preprocessing methods like distortion removal, noise removal, and blurring.
Among these algorithms, the *Convolutional Neural Network* (CNN) has achieved a series of impressive results in Image Classification tasks, including VGGNet, GoogLeNet,
and ResNet (See [Image Classification](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification) tutorial).
In this tutorial, we start with a simple **softmax** regression model and go on with MLP and CNN. Readers will see how these methods improve the recognition accuracy step-by-step.
In this tutorial, we start with a simple **softmax** regression model and go on with MLP and CNN. Readers will see how these methods improve the recognition accuracy step-by-step.
...
@@ -166,183 +178,177 @@ PaddlePaddle provides a Python module, `paddle.dataset.mnist`, which downloads a
...
@@ -166,183 +178,177 @@ PaddlePaddle provides a Python module, `paddle.dataset.mnist`, which downloads a
The demo will be using the latest paddle fluid API. Fluid API is the latest Paddle API. It simplifies the model configurations without sacrifice the performance.
We recommend using Fluid API as it is much easier to pick up.
Here are the quick overview on the major fluid API complements.
1. `inference_program`: A function that specify how to get the prediction from the data input.
This is where you specify the network flow.
1. `train_program`: A function that specify how to get avg_cost from `inference_program` and labels.
This is where you specify the loss calculations.
1. `optimizer`: Configure how to minimize the loss. Paddle supports most major optimization methods.
1. `Trainer`: Fluid trainer manages the training process specified by the `train_program` and `optimizer`. Users can monitor the training
progress through the `event_handler` callback function.
1. `Inferencer`: Fluid inferencer loads the `inference_program` and the parameters trained by the Trainer.
It then can infer the data and return prediction
We will go though all of them and dig more on the configurations in this demo.
## Model Configuration
## Model Configuration
A PaddlePaddle program starts from importing the API package:
A PaddlePaddle program starts from importing the API package:
```python
```python
import paddle.v2 as paddle
import paddle
import paddle.fluid as fluid
```
```
We want to use this program to demonstrate three different classifiers, each defined as a Python function:
### Program Functions Configuration
First, We need to setup the `inference_program` function. We want to use this program to demonstrate three different classifiers, each defined as a Python function.
We need to feed image data to the classifier. PaddlePaddle provides a special layer `layer.data` for reading data.
Let us create a data layer for reading images and connect it to the classification network.
- Softmax regression: the network has a fully-connection layer with softmax activation:
- Softmax regression: the network has a fully-connection layer with softmax activation:
- Multi-Layer Perceptron: this network has two hidden fully-connected layers, one with ReLU and the other with softmax activation:
- Multi-Layer Perceptron: this network has two hidden fully-connected layers, both are using ReLU as activation function. The output layer is using softmax activation:
PaddlePaddle provides a special layer `layer.data` for reading data. Let us create a data layer for reading images and connect it to a classification network created using one of above three functions. We also need a cost layer for training the model.
#### Train Program Configuration
Then we need to setup the the `train_program`. It takes the prediction from the classifier first. During the training, it will calculate the `avg_loss` from the prediction.
Please feel free to modify the code to test different results between `softmax regression`, `mlp`, and `convolutional neural network` classifier.
Then we specify the training data `paddle.dataset.mnist.train()` and testing data `paddle.dataset.mnist.test()`. These two methods are *reader creators*. Once called, a reader creator returns a *reader*. A reader is a Python method, which, once called, returns a Python generator, which yields instances of data.
# predict = multilayer_perceptron(images) # uncomment for MLP
predict = convolutional_neural_network(images) # uncomment for LeNet5
`shuffle` is a reader decorator. It takes a reader A as input and returns a new reader B. Under the hood, B calls A to read data in the following fashion: it copies in `buffer_size` instances at a time into a buffer, shuffles the data, and yields the shuffled instances one at a time. A large buffer size would yield very shuffled data.
```
Now, it is time to specify training parameters. In the following `Momentum` optimizer, `momentum=0.9` means that 90% of the current momentum comes from that of the previous iteration. The learning rate relates to the speed at which the network training converges. Regularization is meant to prevent over-fitting; here we use the L2 regularization.
`batch` is a special decorator, which takes a reader and outputs a *batch reader*, which doesn't yield an instance, but a minibatch at a time.
Then we specify the training data `paddle.dataset.mnist.train()` and testing data `paddle.dataset.mnist.test()`. These two methods are *reader creators*. Once called, a reader creator returns a *reader*. A reader is a Python method, which, once called, returns a Python generator, which yields instances of data.
### Trainer Configuration
`shuffle` is a reader decorator. It takes a reader A as input and returns a new reader B. Under the hood, B calls A to read data in the following fashion: it copies in `buffer_size` instances at a time into a buffer, shuffles the data, and yields the shuffled instances one at a time. A large buffer size would yield very shuffled data.
Now, we need to setup the trainer. The trainer need to take in `train_program`, `place`, and `optimizer`.
In the following `Adam` optimizer, `learning_rate` means the speed at which the network training converges.
`batch` is a special decorator, which takes a reader and outputs a *batch reader*, which doesn't yield an instance, but a minibatch at a time.
```python
use_cuda = False # set to True if training with GPU
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
# Test with Pass 0, Cost 0.326659, {'classification_error_evaluator': 0.09470000118017197}
# Test with Pass 0, Cost 0.326659, {'classification_error_evaluator': 0.09470000118017197}
```
```
After the training, we can check the model's prediction accuracy.
After the training, we can check the model's prediction accuracy.
...
@@ -357,27 +363,25 @@ Usually, with MNIST data, the softmax regression model achieves an accuracy arou
...
@@ -357,27 +363,25 @@ Usually, with MNIST data, the softmax regression model achieves an accuracy arou
## Application
## Application
After training, users can use the trained model to classify images. The following code shows how to inference MNIST images through `paddle.infer` interface.
After training, users can use the trained model to classify images. The following code shows how to inference MNIST images through `fluid.Inferencer`.
```python
```python
from PIL import Image
inferencer = fluid.Inferencer(
import numpy as np
# infer_func=softmax_regression, # uncomment for softmax regression
import os
# infer_func=multilayer_perceptron, # uncomment for MLP
def load_image(file):
infer_func=convolutional_neural_network, # uncomment for LeNet5