**This document is no longer maintained.**

## A Complete Tutorial

This [tutorial](https://github.com/PaddlePaddle/book/blob/develop/mnist-client/README.md) is a step-by-step guide to creating an online handwriting recognition application using a pre-trained MNIST model.

Read on for all of the pre-trained models. We hope they help you create innovative applications.

- [1. word2vec](#1-word2vec)
- [2. Image Classification](#2-image-classification)
- [3. Sentiment Classification](#3-sentiment-classification)
- [4. Machine Translation](#4-machine-translation)
- [5. Recognize Digits](#5-recognize-digits)
- [6. Object Detection](#6-object-detection)

## 1. word2vec

The PaddlePaddle program that trains the model is from [Chapter 4](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec) of the [PaddlePaddle book](https://github.com/PaddlePaddle/book).

For a pre-trained model, please download the following files:

- [topology](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/04.word2vec/inference_topology.pkl)
- [parameter](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/04.word2vec/param.tar)
- [dictionary](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/04.word2vec/word_dict)
- [embedding table](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/04.word2vec/embedding_table)

We use the Penn Treebank (PTB) [dataset](https://github.com/PaddlePaddle/book/blob/develop/04.word2vec/README.md#dataset) in Tomas Mikolov's pre-processed version. There are 2073 words in the dictionary.

The learned embedding table is part of the parameters and represents each of the 2073 words by a vector of 32 float values.

Given the learned embedding table, we can compare two words by the cosine distance between their vectors:

```python
from scipy import spatial
import numpy

def load_dict_and_embedding():
    # Each line of word_dict holds a word and its integer index, separated by a space
    word_dict = dict()
    with open("word_dict", "r") as f:
        for line in f:
            key, value = line.strip().split(" ")
            word_dict[key] = int(value)

    # Row i of the embedding table is the vector of the word with index i
    embeddings = numpy.loadtxt("embedding_table", delimiter=",")
    return word_dict, embeddings

# load word dict and embedding table
word_dict, embedding_table = load_dict_and_embedding()

print(spatial.distance.cosine(embedding_table[word_dict['car']], embedding_table[word_dict['world']]))
print(spatial.distance.cosine(embedding_table[word_dict['say']], embedding_table[word_dict['talking']]))
```

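It would print

```text
0.0698071067872
0.583393289346
```

`spatial.distance.cosine` returns the cosine distance, that is, one minus the cosine similarity, so a smaller value means the two vectors point in more similar directions. The first number is the distance between "car" and "world", and the second is the distance between "say" and "talking". By this measure, "car" and "world" lie closer together in the embedding space than "say" and "talking".
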
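The relation between cosine similarity and cosine distance can be checked on small vectors. The following is a minimal sketch using only NumPy; the helper names `cosine_similarity` and `cosine_distance` and the toy vectors are illustrative and are not part of the pre-trained model files.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 for the same direction, 0 for orthogonal."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def cosine_distance(u, v):
    """One minus the cosine similarity, the quantity scipy's cosine() reports."""
    return 1.0 - cosine_similarity(u, v)


# Toy vectors, not taken from the embedding table
a = [1.0, 0.0]
b = [1.0, 1.0]
c = [0.0, 2.0]

print(cosine_similarity(a, b))  # vectors 45 degrees apart
print(cosine_distance(a, c))    # orthogonal vectors
```

To rank words by similarity rather than distance, sort by `cosine_similarity` in descending order, or equivalently by the distance in ascending order.
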
## 2. Image Classification

The PaddlePaddle program that trains the model is from [Chapter 3](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification) of the [PaddlePaddle book](https://github.com/PaddlePaddle/book). Please be aware that this program requires a CUDA GPU.

For a pre-trained model, please download the following files:

- [topology](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/03.image_classification/inference_topology.pkl)
- [parameter](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/03.image_classification/param.tar)

The training data is the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html), which includes 10 classes of 32x32 color images:

| Class | Label |
| :------:| :------:|
| airplane | 0 |
| automobile | 1 |
| bird | 2 |
| cat | 3 |
| deer | 4 |
| dog | 5 |
| frog | 6 |
| horse | 7 |
| ship | 8 |
| truck | 9 |

This model takes a 32x32x3-dimensional vector as input, which should be a flattened image: the first 1024 (32x32) values in this vector are the red channel, followed by the green channel and then the blue channel. Within each channel, values are in row-major order, so the first 32 values in each channel come from the first row. Values need to be normalized into the range [0.0, 1.0].

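The input layout described above can be sketched with NumPy alone. This example assumes the image is already available as a height x width x 3 array in RGB order with 8-bit values; the function name `flatten_image` and the synthetic image are illustrative only.

```python
import numpy as np


def flatten_image(rgb):
    """Flatten an (H, W, 3) uint8 RGB array into a channel-major vector in [0.0, 1.0]."""
    rgb = np.asarray(rgb)
    chw = rgb.transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)
    return chw.reshape(-1).astype("float32") / 255.0


# Synthetic 32x32 image: pure red in the top-left pixel, black elsewhere
image = np.zeros((32, 32, 3), dtype=np.uint8)
image[0, 0] = (255, 0, 0)

vector = flatten_image(image)
print(vector.shape)   # (3072,)
print(vector[0])      # red value of the top-left pixel
print(vector[1024])   # green value of the top-left pixel
```
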
The model's output is a 10-vector of class probabilities.

We can run a PaddlePaddle server in a Docker container to serve the model.

1. Download the topology and parameter files to the current directory.

2. Run the server container:
   ```bash
   nvidia-docker run --name=my_svr -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu
   ```

3. Check that the server is up and running:
   ```bash
   docker logs my_svr
   ```
   This should print something like
   ```
   I0915 19:56:44.282585 66 Util.cpp:166] commandline: --use_gpu=True
   ```

4. Call the server. The following Python script processes an image and sends it to the server:
   ```python
   import cv2
   import numpy as np
   import json
   import requests

   img_file = "./img.png"
   BACKEND_URL = "http://localhost:8000"

   # cv2.imread returns an (H, W, C) array with channels in BGR order
   img = cv2.imread(img_file)
   # Reorder channels to RGB so that the red channel comes first
   img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
   # (H, W, C) -> (H, C, W) -> (C, H, W)
   img = np.swapaxes(img, 1, 2)
   img = np.swapaxes(img, 1, 0)
   arr = img.flatten()
   # Normalize pixel values into [0.0, 1.0]
   arr = arr / 255.0
   req = {"image": arr.tolist()}
   resp = requests.request("POST", url=BACKEND_URL, json=req)
   print(json.dumps(resp.json()))
   ```
   It prints the predicted label probabilities, like
   ```
   {
     "code": 0,
     "data": [
       [
         4.751561937155202e-05,
         3.0828364288026933e-06,
         1.4101417946221773e-05,
         0.9994580149650574,
         0.0001739991275826469,
         0.00024292869784403592,
         3.745338835869916e-05,
         1.409481956216041e-05,
         2.895288162108045e-06,
         5.973368843115168e-06
       ]
     ],
     "message": "success"
   }
   ```

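To turn the probability vector into a class name, pick the index with the highest probability and look it up in the label table. The following is a minimal sketch that works on the sample response shown above; the helper name `top_class` is illustrative, and treating a non-zero `code` as a failure is an assumption, since only the success case is documented here.

```python
import json

# CIFAR-10 class names, indexed by label as in the table above
CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]


def top_class(response_text):
    """Return (class name, probability) for the first image in a server response."""
    body = json.loads(response_text)
    # Assumption: any non-zero code means the request failed
    if body["code"] != 0:
        raise RuntimeError("server returned an error: %s" % body["message"])
    probs = body["data"][0]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# The sample response shown above
sample = """
{"code": 0,
 "data": [[4.751561937155202e-05, 3.0828364288026933e-06,
           1.4101417946221773e-05, 0.9994580149650574,
           0.0001739991275826469, 0.00024292869784403592,
           3.745338835869916e-05, 1.409481956216041e-05,
           2.895288162108045e-06, 5.973368843115168e-06]],
 "message": "success"}
"""

print(top_class(sample))  # ('cat', 0.9994580149650574)
```
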
## 3. Sentiment Classification

The PaddlePaddle program that trains this model comes from [Chapter 6](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment) of the [PaddlePaddle book](https://github.com/PaddlePaddle/book).

We can download the following files to get a pre-trained model:

- [topology](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/06.understand_sentiment/inference_topology.pkl)
- [parameter](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/06.understand_sentiment/param.tar)
- [dictionary](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/06.understand_sentiment/word_dict.tar)

The training data is the [IMDB dataset](http://ai.stanford.edu/~amaas/data/sentiment/). This model takes a sequence of word indices as input and outputs the probabilities of the sentiment being positive or negative.

We can run a PaddlePaddle server instance in a Docker container to serve this model:

1. Download the topology and parameter files through the links above, and copy them to the current directory.

2. Run the server:
   ```bash
   docker run --name my_svr -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=0 paddlepaddle/book:serve
   ```

3. Check that the server is up and running:
   ```bash
   docker logs my_svr
   ```
   This should print something like
   ```
   I0915 19:56:44.282585 66 Util.cpp:166] commandline: --use_gpu=False
   ```

4. Call the server. According to the dictionary above, the words in the sentence "I like it" correspond to indices 8, 37, and 7. We can send this sentence to the server by running:
   ```bash
   curl -v -H "Content-Type: application/json" -X POST -d '{"word":[8,37,7]}' http://localhost:8000/
   ```
   The response should look like:
   ```
   {
     "code": 0,
     "data": [
       [
         0.9999890327453613,
         1.0963042768707965e-05
       ]
     ],
     "message": "success"
   }
   ```

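The request body used in the curl command above can also be built from a sentence in Python. The following is a minimal sketch: the three-entry `word_dict` is a hypothetical excerpt that only reproduces the indices quoted above, and lowercasing the input and splitting on whitespace are assumptions about how the real dictionary was built.

```python
import json


def encode_sentence(sentence, word_dict):
    """Map a whitespace-separated sentence to a list of word indices."""
    # Assumption: dictionary keys are lowercase; unknown words raise KeyError
    return [word_dict[word] for word in sentence.lower().split()]


# Hypothetical excerpt; the real mapping comes from the downloaded dictionary file
word_dict = {"i": 8, "like": 37, "it": 7}

payload = json.dumps({"word": encode_sentence("I like it", word_dict)})
print(payload)  # {"word": [8, 37, 7]}
```

The resulting string can be passed to `curl -d` or sent with any HTTP client as the JSON body of the POST request.
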
## 4. Machine Translation

The PaddlePaddle program that trains the model comes from [Chapter 8](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation) of the [PaddlePaddle book](https://github.com/PaddlePaddle/book).

We can download the following files to get a pre-trained model:

- [topology](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/08.machine_translation/inference_topology.pkl)
- [parameter](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/08.machine_translation/param.tar)
- [source dictionary](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/08.machine_translation/src_dict.txt)
- [target dictionary](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/08.machine_translation/trg_dict.txt)

The model was trained on the WMT-14 dataset (French to English) and achieved a BLEU score of 26.92. Both the source and target dictionaries have 30000 words.

We can run the PaddlePaddle server in a Docker container:

1. Download the model files to the current directory.
2. Run the server:
   ```bash
   docker run --name my_svr -v $(pwd):/data -d -p 8000:80 -e WITH_GPU=0 -e OUTPUT_FIELD=prob,id paddlepaddle/book:serve
   ```

3. Call the server. For example, words in the French sentence "le temps est très bon" correspond to word IDs 0, 12, 169, 22, 631, and 1, according to the source dictionary. This [Python script](https://gist.github.com/Superjom/04cf4be0178cee3672ee45b02b6c6437) sends the sentence to the server and prints the translation result:
   ```
   <s> le temps est bon <e>
   0th 0.058041 Time is good . <e>
   1th 0.051606 time is good . <e>
   2th 0.015649 the time is good . <e>
   ```

## 5. Recognize Digits

The PaddlePaddle program that trains this model comes from [Chapter 2](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits) of the [PaddlePaddle book](https://github.com/PaddlePaddle/book).

We can download the following files to get a pre-trained model:

- [topology](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/inference_topology.pkl)
- [parameter](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/param.tar)

The training dataset is the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset, where each image is a handwritten grayscale digit, cropped to 28x28 pixels.

We can run a PaddlePaddle server instance in a Docker container to serve this model.

1. Download the model files to the current directory.
1. Run the server:
   ```bash
   nvidia-docker run --name my_svr -d -v $PWD:/data -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu
   ```
1. Use this [Gist](https://gist.github.com/dzhwinter/c0e1edfdfac329006f0ef1bb5cd7873d) to flatten and print an image. You can copy and paste the result into the following curl command to send it to the server:
   ```bash
   curl -v -H "Content-Type: application/json" -X POST -d '{"img":[/*your image goes here*/]}' http://localhost:8000/
   ```
   The final output should look like the following:
   ```
   {
     "code": 0,
     "data": [
       [
         0.03862910717725754,
         0.5247572064399719,
         0.04542972892522812,
         0.02226484753191471,
         0.09190762042999268,
         0.01627335511147976,
         0.04605291783809662,
         0.13657279312610626,
         0.04367228224873543,
         0.03444007411599159
       ]
     ],
     "message": "success"
   }
   ```

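Instead of pasting values into the curl command by hand, the request body can be assembled in Python. The following is a minimal sketch under stated assumptions: the function name `build_digit_request` is illustrative, the image is flattened row by row, and the pixel values are expected to be scaled already in whatever way the Gist above scales them, since that scaling is not described here.

```python
import json

import numpy as np


def build_digit_request(pixels):
    """Flatten a 28x28 array row by row and wrap it in the JSON body sent by curl."""
    pixels = np.asarray(pixels, dtype="float32")
    if pixels.shape != (28, 28):
        raise ValueError("expected a 28x28 image, got shape %s" % (pixels.shape,))
    # Assumption: values are already scaled the way the model expects
    return json.dumps({"img": pixels.reshape(-1).tolist()})


# Synthetic all-zero image, used only to show the shape of the request
body = build_digit_request(np.zeros((28, 28)))
print(len(json.loads(body)["img"]))  # 784
```
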
## 6. Object Detection

The PaddlePaddle program that trains this model is from the [PaddlePaddle model bank](https://github.com/PaddlePaddle/models/tree/develop/ssd).

![alt](https://github.com/reyoung/MITHackthonDetectionDemo/raw/master/result.jpg)

Feel free to check out this [demo](https://github.com/reyoung/MITHackthonDetectionDemo).

We can download the following files to get a pre-trained model:

- [parameters](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/SSD/param.tar)
- [topology](https://s3.us-east-2.amazonaws.com/models.paddlepaddle/SSD/inference_topology.pkl)

The training dataset is the PASCAL VOC dataset:

- http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html
- http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html

The labels include:

| class | label |
| --- | --- |
| background | 0 |
| aeroplane | 1 |
| bicycle | 2 |
| bird | 3 |
| boat | 4 |
| bottle | 5 |
| bus | 6 |
| car | 7 |
| cat | 8 |
| chair | 9 |
| cow | 10 |
| diningtable | 11 |
| dog | 12 |
| horse | 13 |
| motorbike | 14 |
| person | 15 |
| pottedplant | 16 |
| sheep | 17 |
| sofa | 18 |
| train | 19 |
| tvmonitor | 20 |

We can run a PaddlePaddle server in a Docker container to serve this model. This model uses some layers that don't have CPU-only implementations, so we have to run the server on a computer with a CUDA GPU:

```bash
wget https://s3.us-east-2.amazonaws.com/models.paddlepaddle/SSD/param.tar
wget https://s3.us-east-2.amazonaws.com/models.paddlepaddle/SSD/inference_topology.pkl
nvidia-docker run --name my_svr -d -v $PWD:/data -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu
```

We can send an image to this server to get its predicted labels. Because images in the training dataset have mean pixel values of 104, 117, and 124 for the R, G, and B channels respectively, we need to shift the RGB values of the input image by these means and normalize them into the range [0.0, 1.0]. Also, the input image should be resized or cropped to `300 x 300` pixels.

The server accepts a JSON object containing the processed input image:
```json
{
  "image": [ -104, 108, 112, ...]
}
```

The `image` field should contain 3x300x300 values.

The following sample program loads an image, crops it, shifts it by the channel means, flattens it, and sends the result to the server for recognition:

```python
import numpy as np
import requests
from PIL import Image
import json

# Change to your backend URL
BACKEND_URL = "http://127.0.0.1:8000"

img = Image.open("test.jpg")
# Resize or crop to 300 x 300
img = img.crop((0, 0, 300, 300))
# Per-channel means, shaped [3, 1, 1] so they broadcast over height and width
mean = np.array([104, 117, 124], dtype='float32')[:, np.newaxis, np.newaxis]

# The image shape should be [channel, height, width], i.e., [3, 300, 300]
img = np.array(img)
img = np.swapaxes(img, 1, 2)
img = np.swapaxes(img, 1, 0)

img = (img - mean).flatten()
req = {"image": img.tolist()}

resp = requests.request("POST", url=BACKEND_URL, json=req)
print(json.dumps(resp.json()))
```

It should print something like the following:

```json
{
  "message": "success",
  "code": 0,
  "data": [
    [
      0,
      3,
      0.013827803544700146,
      0.914117693901062,
      0.6044294238090515,
      1,
      0.7246007323265076
    ],
    [ ... ],
    ...
  ]
}
```

The response includes the status fields `message` and `code`. The `data` field is an array of 7-vectors, where each vector corresponds to a detected object and includes the following seven elements:

1. Always zero.
1. The detected label. Please be aware that 0 means background.
1. The confidence score. The higher the score, the more confident the detection.
1. The x-coordinate of the upper-left corner of the detected object.
1. The y-coordinate of the upper-left corner of the detected object.
1. The x-coordinate of the bottom-right corner of the detected object.
1. The y-coordinate of the bottom-right corner of the detected object.
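
The seven-element layout listed above can be unpacked into named fields. The following is a minimal sketch: the names `Detection` and `parse_detections` and the `min_score` threshold are illustrative and not part of the server's API, and the coordinates are passed through unchanged because their unit is not stated here.

```python
from collections import namedtuple

# PASCAL VOC class names, indexed by label as in the table above
VOC_CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
               "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
               "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
               "train", "tvmonitor"]

Detection = namedtuple("Detection", "label score xmin ymin xmax ymax")


def parse_detections(data, min_score=0.5):
    """Turn the 7-vectors in `data` into Detection tuples, dropping weak ones."""
    detections = []
    for _, label, score, xmin, ymin, xmax, ymax in data:
        # min_score is a hypothetical cut-off chosen for this sketch
        if score >= min_score:
            detections.append(
                Detection(VOC_CLASSES[int(label)], score, xmin, ymin, xmax, ymax))
    return detections


# The one complete vector from the sample response above
sample = [[0, 3, 0.013827803544700146, 0.914117693901062,
           0.6044294238090515, 1, 0.7246007323265076]]

print(parse_detections(sample))                 # score is below 0.5, so nothing is kept
print(parse_detections(sample, min_score=0.0))  # keeps the single "bird" detection
```
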