This document is no longer maintained.
A Complete Tutorial
This tutorial is a step-by-step guide on how to create an online handwriting recognition application using a pre-trained MNIST model.
The sections below describe each of the following pre-trained models. We hope they help you create innovative applications.
- 1. word2vec
- 2. Image Classification
- 3. Sentiment Classification
- 4. Machine Translation
- 5. Recognize Digits
- 6. Object Detection
1. word2vec
The PaddlePaddle program that trains the model is from Chapter 4 of the PaddlePaddle book.
For a pre-trained model, please download the following files:
We use the Penn Treebank (PTB) dataset (Tomas Mikolov’s pre-processed version). There are 2073 words in the dictionary.
The learned embedding table is part of the model parameters; it represents each of the 2073 words as a vector of 32 float values.
Given the learned embedding table, we can compute the cosine distance (one minus the cosine similarity) between two words:
from scipy import spatial
import numpy
def load_dict_and_embedding():
    word_dict = dict()
    with open("word_dict", "r") as f:
        for line in f:
            key, value = line.strip().split(" ")
            word_dict[key] = int(value)
    embeddings = numpy.loadtxt("embedding_table", delimiter=",")
    return word_dict, embeddings
# load word dict and embedding table
word_dict, embedding_table = load_dict_and_embedding()
print(spatial.distance.cosine(embedding_table[word_dict['car']], embedding_table[word_dict['world']]))
print(spatial.distance.cosine(embedding_table[word_dict['say']], embedding_table[word_dict['talking']]))
It would print
0.0698071067872
0.583393289346
Note that `spatial.distance.cosine` returns the cosine distance, so a smaller value means two words are closer in the embedding space. Read this way, the numbers above place "car" and "world" closer together than "say" and "talking".
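Distance and similarity are easy to mix up. The following minimal NumPy-only sketch makes their relationship explicit. The `nearest_words` helper is hypothetical and is not part of the downloaded files: it ranks every word in the dictionary against a query word by cosine similarity. The toy two-dimensional vectors stand in for the real 2073 x 32 table that `load_dict_and_embedding` returns.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_distance(u, v):
    """1 - cosine similarity: the quantity scipy.spatial.distance.cosine returns."""
    return 1.0 - cosine_similarity(u, v)

def nearest_words(word, word_dict, embeddings, k=3):
    """Hypothetical helper: the k words most similar to `word`, best first."""
    index_to_word = {idx: w for w, idx in word_dict.items()}
    query = embeddings[word_dict[word]]
    # One matrix-vector product gives the dot product with every row;
    # dividing by the norms turns those into cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
    sims = embeddings.dot(query) / norms
    order = np.argsort(-sims)  # highest similarity first
    ranked = [index_to_word[int(i)] for i in order if index_to_word[int(i)] != word]
    return ranked[:k]

# Toy data (hypothetical): "auto" points almost the same way as "car".
toy_dict = {"car": 0, "auto": 1, "world": 2}
toy_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(nearest_words("car", toy_dict, toy_emb, k=2))
```

With the real data, `nearest_words('car', word_dict, embedding_table)` would list the three words closest to "car" under this model. What it returns depends entirely on the trained table.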
2. Image Classification
The PaddlePaddle program that trains the model is from Chapter 3 of the PaddlePaddle book. Please be aware that this program requires a CUDA GPU.
For a pre-trained model, please download the following files:
The training data is the CIFAR-10 dataset, which includes 10 classes of 32x32 color images:
Class | Label |
---|---|
airplane | 0 |
automobile | 1 |
bird | 2 |
cat | 3 |
deer | 4 |
dog | 5 |
frog | 6 |
horse | 7 |
ship | 8 |
truck | 9 |
This model takes a 32x32x3-dimensional vector as input, which should be a flattened image. The first 1024 (32*32) values in this vector are the red channel, followed by the green channel and then the blue channel. Within each channel, values are in row-major order, so the first 32 values of each channel come from the first row. The values must be normalized into the range [0.0, 1.0].
The model's output is a 10-vector of class probabilities.
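The input layout described above can be written as a short NumPy function. This sketch assumes you already have the image as a 32x32x3 array in RGB order, and `flatten_cifar_image` is a hypothetical name. If you load images with OpenCV's `cv2.imread`, it returns channels in BGR order, so convert the image to RGB first.

```python
import numpy as np

def flatten_cifar_image(rgb):
    """Hypothetical helper: flatten a 32x32x3 RGB array (height, width, channel)
    into 3072 values laid out as all red, then all green, then all blue,
    each channel in row-major order, scaled to [0.0, 1.0]."""
    rgb = np.asarray(rgb)
    if rgb.shape != (32, 32, 3):
        raise ValueError("expected a 32x32x3 RGB array, got %s" % (rgb.shape,))
    # (H, W, C) -> (C, H, W) so that flatten() emits one whole channel at a time
    chw = rgb.transpose(2, 0, 1)
    return (chw.astype(np.float32) / 255.0).flatten()

# Example: a pure red image with one blue pixel at row 0, column 1
img = np.zeros((32, 32, 3), dtype=np.uint8)
img[:, :, 0] = 255
img[0, 1, 2] = 51
vec = flatten_cifar_image(img)
print(vec.shape)
```

Calling `flatten_cifar_image(img).tolist()` yields a list suitable for the "image" field of the JSON request sent to the server.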
We can run a PaddlePaddle server in a Docker container to serve the model.
- Download the topology and parameter files to the current directory.
- Run the server container:
nvidia-docker run --name=my_svr -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu
- Check that the server is up and running:
docker logs my_svr
should print something like
I0915 19:56:44.282585 66 Util.cpp:166] commandline: --use_gpu=True
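- Call the server. The following Python script processes an image and sends it to the server:
import json
import cv2
import numpy as np
import requests
img_file = "./img.png"
BACKEND_URL = "http://localhost:8000"
img = cv2.imread(img_file)
# OpenCV loads images in BGR order; the model expects red, green, blue
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# The model expects a 32x32 image
img = cv2.resize(img, (32, 32))
# (height, width, channel) -> (channel, height, width)
img = np.swapaxes(img, 1, 2)
img = np.swapaxes(img, 1, 0)
arr = img.flatten()
# Normalize pixel values into [0.0, 1.0]
arr = arr / 255.0
req = {"image": arr.tolist()}
resp = requests.request("POST", url=BACKEND_URL, json=req)
print(json.dumps(resp.json()))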
It prints the class probabilities, for example:
{ "code": 0, "data": [ [ 4.751561937155202e-05, 3.0828364288026933e-06, 1.4101417946221773e-05, 0.9994580149650574, 0.0001739991275826469, 0.00024292869784403592, 3.745338835869916e-05, 1.409481956216041e-05, 2.895288162108045e-06, 5.973368843115168e-06 ] ], "message": "success" }
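To turn such a reply into a label, take the index of the largest probability and look it up in the class table above. The following is a minimal sketch, and `top_class` is a hypothetical helper. It treats a non-zero "code" as an error, which is an assumption based only on the successful reply shown.

```python
import json

# Class names in label order, from the CIFAR-10 table above
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def top_class(response_text):
    """Hypothetical helper: (class_name, probability) of the most likely class."""
    reply = json.loads(response_text)
    # Assumption: code 0 means success, anything else is a failure
    if reply.get("code") != 0:
        raise RuntimeError("server error: %s" % reply.get("message"))
    probs = reply["data"][0]  # one probability vector per input image
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CIFAR10_CLASSES[best], probs[best]
```

For the reply above, index 3 has probability of about 0.9995, so `top_class` returns "cat".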
3. Sentiment Classification
The PaddlePaddle program that trains this model comes from Chapter 6 of the PaddlePaddle book.
We can download the following files to get a pre-trained model:
The training data is the IMDB dataset. This model takes a sequence of word indexes as input and outputs the probabilities of the sentiment being positive or negative.
We can run a PaddlePaddle server instance in a Docker container to serve this model:
- Download the topology and parameter files through the links above, and copy them to the current directory.
- Run the server:
docker run --name my_svr -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=0 paddlepaddle/book:serve
- Check that the server is up and running:
docker logs my_svr
should print something like
I0915 19:56:44.282585 66 Util.cpp:166] commandline: --use_gpu=False
- Call the server. According to the above dictionary, the words in the sentence "I like it" correspond to indices 8, 37, and 7. We can send this sentence to the server by running:
curl -v -H "Content-Type: application/json" -X POST -d '{"word":[8,37,7]}' http://localhost:8000/
The response should look like:
{ "code": 0, "data": [ [ 0.9999890327453613, 1.0963042768707965e-05 ] ], "message": "success" }
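The same request can also be built from Python using only the standard library. In the following sketch, the helpers `sentence_to_ids`, `build_payload`, and `classify` are hypothetical. Lower-casing the sentence is an assumption about how the dictionary was built, and a word missing from the dictionary raises `KeyError` rather than being handled.

```python
import json
import urllib.request

def sentence_to_ids(sentence, word_dict):
    """Map each whitespace-separated word to its dictionary index.
    Lower-casing is an assumption about the dictionary's format."""
    return [word_dict[w] for w in sentence.lower().split()]

def build_payload(ids):
    """JSON body in the same shape as the curl example: {"word": [...]}."""
    return json.dumps({"word": ids})

def classify(ids, url="http://localhost:8000/"):
    """POST the word indices to the serving container and return the parsed reply."""
    req = urllib.request.Request(
        url,
        data=build_payload(ids).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Toy dictionary (hypothetical) holding only the three indices quoted above
toy_dict = {"i": 8, "like": 37, "it": 7}
print(build_payload(sentence_to_ids("I like it", toy_dict)))
```

With the container running, `classify(sentence_to_ids("I like it", word_dict))` sends the same body as the curl command above. Here `word_dict` is loaded from the downloaded dictionary.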
4. Machine Translation
The PaddlePaddle program that trains the model comes from Chapter 8 of the PaddlePaddle book.
We can download the following files to get a pre-trained model:
The model is trained on the WMT-14 dataset (French to English) and achieves a BLEU score of 26.92. Both the source and target dictionaries have 30000 words.
We can run the PaddlePaddle server in a Docker container:
- Download the model files to the current directory.
- Run the server:
docker run --name my_svr -v $(pwd):/data -d -p 8000:80 -e WITH_GPU=0 -e OUTPUT_FIELD=prob,id paddlepaddle/book:serve
- Call the server. For example, according to the source dictionary, the French sentence "le temps est bon", wrapped in the start and end markers `<s>` and `<e>`, corresponds to word IDs 0, 12, 169, 22, 631, and 1. A Python script that sends this sentence to the server prints translation results like the following:
<s> le temps est bon <e>
0th 0.058041 Time is good . <e>
1th 0.051606 time is good . <e>
2th 0.015649 the time is good . <e>
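Two steps around such a script can be sketched independently. The first wraps the source IDs in the `<s>`/`<e>` markers (IDs 0 and 1 in the example above). The second splits the printed n-best lines into rank, score, and text. Both helpers are hypothetical, the toy dictionary holds only the IDs quoted above, and the parser assumes the exact line format shown.

```python
import re

START_ID, END_ID = 0, 1  # IDs of <s> and <e>, as in the example above

def encode_source(sentence, src_dict):
    """Hypothetical helper: French word IDs wrapped in the start/end markers."""
    return [START_ID] + [src_dict[w] for w in sentence.split()] + [END_ID]

# Matches lines such as "0th 0.058041 Time is good . <e>"
_NBEST_LINE = re.compile(r"^(\d+)th\s+(\S+)\s+(.*?)\s*<e>\s*$")

def parse_nbest(lines):
    """Turn printed n-best lines into (rank, score, translation) tuples.
    Lines that do not match, such as the echoed source, are skipped."""
    results = []
    for line in lines:
        m = _NBEST_LINE.match(line)
        if m:
            results.append((int(m.group(1)), float(m.group(2)), m.group(3)))
    return results

toy_src_dict = {"le": 12, "temps": 169, "est": 22, "bon": 631}
print(encode_source("le temps est bon", toy_src_dict))
```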
5. Recognize Digits
The PaddlePaddle program that trains this model comes from Chapter 2 of the PaddlePaddle book.
We can download the following files to get a pre-trained model:
The training dataset is the MNIST dataset, where each image is a handwritten gray-scale digit, cropped into 28x28 pixels.
We can run a PaddlePaddle server instance in a Docker container to serve this model.
- Download model files to the current directory.
- Run the server
nvidia-docker run --name my_svr -d -v $PWD:/data -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu
- Use this Gist to flatten and print an image. You can copy and paste the result into the following curl command to send it to the server:
curl -v -H "Content-Type: application/json" -X POST -d '{"img":[/*your image goes here*/]}' http://localhost:8000/
The final output should look like the following:
{ "code": 0, "data": [ [ 0.03862910717725754, 0.5247572064399719, 0.04542972892522812, 0.02226484753191471, 0.09190762042999268, 0.01627335511147976, 0.04605291783809662, 0.13657279312610626, 0.04367228224873543, 0.03444007411599159 ] ], "message": "success" }
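Flattening an image for this request can be sketched as follows. The sketch assumes a 28x28 grayscale image as a NumPy-compatible array. Scaling pixels from [0, 255] to [-1.0, 1.0] is also an assumption and must match how your copy of the training program preprocessed MNIST. `mnist_payload` and `predicted_digit` are hypothetical helpers.

```python
import json
import numpy as np

def mnist_payload(gray):
    """Hypothetical helper: JSON body {"img": [...]} for a 28x28 grayscale image.
    The [-1.0, 1.0] scaling below is an assumption; match your training setup."""
    gray = np.asarray(gray, dtype=np.float32)
    if gray.shape != (28, 28):
        raise ValueError("expected a 28x28 image, got %s" % (gray.shape,))
    scaled = gray / 255.0 * 2.0 - 1.0
    # Row-major flatten: 784 values, first row first
    return json.dumps({"img": scaled.flatten().tolist()})

def predicted_digit(response_text):
    """The digit whose probability is highest in the server's reply."""
    probs = json.loads(response_text)["data"][0]
    return int(np.argmax(probs))
```

For the reply above, the largest probability (about 0.52) is at index 1, so `predicted_digit` returns 1.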
6. Object Detection
The PaddlePaddle program that trains this model comes from the PaddlePaddle model bank.
Feel free to check out this demo.
We can download the following files to get a pre-trained model:
The training dataset is the PASCAL VOC dataset:
- http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html
- http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
The labels include:
Class | Label |
---|---|
background | 0 |
aeroplane | 1 |
bicycle | 2 |
bird | 3 |
boat | 4 |
bottle | 5 |
bus | 6 |
car | 7 |
cat | 8 |
chair | 9 |
cow | 10 |
diningtable | 11 |
dog | 12 |
horse | 13 |
motorbike | 14 |
person | 15 |
pottedplant | 16 |
sheep | 17 |
sofa | 18 |
train | 19 |
tvmonitor | 20 |
We can run a PaddlePaddle server in a Docker container to serve this model. This model uses some layers that don't have CPU-only implementations, so we have to run the server on a computer with a CUDA GPU:
wget https://s3.us-east-2.amazonaws.com/models.paddlepaddle/SSD/param.tar
wget https://s3.us-east-2.amazonaws.com/models.paddlepaddle/SSD/inference_topology.pkl
nvidia-docker run --name my_svr -d -v $PWD:/data -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu
We can send an image to this server to get its predicted labels. Because images in the training dataset have mean pixel values of 104, 117, and 124 for the RGB channels respectively, we need to subtract these means from the corresponding channels of the input image. Also, the input image should be resized or cropped to 300 x 300 pixels.
The server accepts a JSON object containing the processed input image:
{
"image": [ -104, 108, 112, ...]
}
The "image" field should contain 3x300x300 values.
The following sample program loads, resizes, normalizes, and flattens an image and sends the result to the server for recognition:
import json
import numpy as np
import requests
from PIL import Image
# Change to your backend URL
BACKEND_URL = "http://127.0.0.1:8000"
# convert("RGB") drops any alpha channel so the image has exactly 3 channels
img = Image.open("test.jpg").convert("RGB")
# Resize or crop to 300 x 300
img = img.crop((0, 0, 300, 300))
# Per-channel means, shaped [3, 1, 1] so they broadcast over [3, 300, 300]
mean = np.array([104, 117, 124], dtype='float32')[:, np.newaxis, np.newaxis]
# The image shape should be [channel, height, width], i.e., [3, 300, 300]
img = np.asarray(img, dtype='float32').transpose(2, 0, 1)
img = (img - mean).flatten()
req = {"image": img.tolist()}
resp = requests.request("POST", url=BACKEND_URL, json=req)
print(json.dumps(resp.json()))
It should print something like the following:
{
"message": "success",
"code": 0,
"data": [
[
0,
3,
0.013827803544700146,
0.914117693901062,
0.6044294238090515,
1,
0.7246007323265076
],
[ ... ],
...
]
}
The response includes the status fields "message" and "code". The "data" field is an array of 7-vectors, where each vector corresponds to a detected object and includes the following seven elements:
- Always zero.
- The detected label. Note that 0 means background.
- The confidence score; the higher the score, the more confident the detection.
- The x-coordinate of the upper-left corner of the detected object.
- The y-coordinate of the upper-left corner of the detected object.
- The x-coordinate of the bottom-right corner of the detected object.
- The y-coordinate of the bottom-right corner of the detected object.
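The following sketch post-processes the "data" array as described. It drops background detections, applies a confidence threshold, and names each label using the class table above. The helper name `parse_detections` and the default threshold of 0.5 are arbitrary choices, not part of the server.

```python
import json

# PASCAL VOC class names in label order, from the table above (0 = background)
VOC_CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
               "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
               "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
               "train", "tvmonitor"]

def parse_detections(response_text, min_score=0.5):
    """Hypothetical helper: keep non-background detections scoring >= min_score."""
    reply = json.loads(response_text)
    detections = []
    for _, label, score, xmin, ymin, xmax, ymax in reply["data"]:
        label = int(label)
        if label == 0 or score < min_score:
            continue
        detections.append({
            "label": VOC_CLASSES[label],
            "score": score,
            "box": (xmin, ymin, xmax, ymax),  # upper-left and bottom-right corners
        })
    # Most confident detections first
    detections.sort(key=lambda d: d["score"], reverse=True)
    return detections
```

In the sample reply above, the only fully shown detection has a confidence of about 0.0138. It is therefore dropped at the default threshold, and it is reported as "bird" only if you lower `min_score`.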