This document reports the prediction-time benchmark of the PaddleOCR Ultra Lightweight Chinese Model (8.6M) on each tested platform.
## TEST DATA
* 500 images were randomly sampled from the Chinese public data set [ICDAR2017-RCTW](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_ch/datasets.md#ICDAR2017-RCTW-17).
Most of the pictures in the set were collected in the wild through mobile phone cameras.
Some are screenshots.
These pictures show various scenes, including street scenes, posters, menus, indoor scenes and screenshots of mobile applications.
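
The sampling step described above can be sketched as follows. This is a minimal illustration, not the script used for the benchmark: the function name `sample_images`, the suffix filter and the fixed seed are all assumptions made here so that the sample is reproducible.

```python
import random
from pathlib import Path

# Assumed set of image file extensions; adjust to the actual dataset layout.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_images(image_dir, k=500, seed=0):
    """Return k image paths drawn uniformly at random, without replacement."""
    # Sort first so that the result depends only on the seed, not on OS listing order.
    paths = sorted(
        p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    if len(paths) < k:
        raise ValueError(f"requested {k} images but only {len(paths)} are available")
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility only
    return rng.sample(paths, k)
```

Calling `sample_images("path/to/rctw_images")` (a placeholder path) with the default `k=500` would return a test set of the size described above.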
## MEASUREMENT
The prediction times measured on the four platforms are as follows:
This section collects commonly used handwritten OCR datasets and is updated continuously. Contributions of additional datasets are welcome.
- [Institute of automation, Chinese Academy of Sciences - handwritten Chinese dataset](#Institute of automation, Chinese Academy of Sciences - handwritten Chinese dataset)
- [NIST handwritten single character dataset - English](#NIST handwritten single character dataset - English)
<a name="Institute of automation, Chinese Academy of Sciences - handwritten Chinese dataset"></a>
## Institute of automation, Chinese Academy of Sciences - handwritten Chinese dataset
* It includes online and offline handwritten data. `HWDB1.0~1.2` contains 3895135 handwritten single-character samples in 7356 categories (7185 Chinese characters and 171 English letters, digits and symbols). `HWDB2.0~2.2` contains 5091 page images, which are divided into 52230 text lines and 1349414 words. All single-character and text-line samples are stored as grayscale images. Some sample words are shown below.
- **Suggestions for use**: The data consists of single characters on a white background, so a large number of text lines can be composed from them for training. The white background can be made transparent, which makes it easy to add various backgrounds. Where semantics matter, it is suggested to choose the single characters according to a real corpus when forming text lines.
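
The suggestion above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a PaddleOCR utility: images are represented as 2D lists of grayscale values (0 = ink, 255 = white background), and the helper names `compose_line` and `white_to_transparent` are hypothetical. A real pipeline would more likely use an image library.

```python
def compose_line(chars, gap=2, background=255):
    """Concatenate single-character images horizontally into one text-line image.

    Each character is a 2D list of grayscale values. Shorter characters are
    centered vertically and padded with the background value.
    """
    height = max(len(c) for c in chars)
    rows = [[] for _ in range(height)]
    for i, c in enumerate(chars):
        width = len(c[0])
        pad_top = (height - len(c)) // 2
        for y in range(height):
            src = y - pad_top
            if 0 <= src < len(c):
                rows[y].extend(c[src])
            else:
                rows[y].extend([background] * width)
            # Leave a small gap between characters, but not after the last one.
            if i < len(chars) - 1:
                rows[y].extend([background] * gap)
    return rows


def white_to_transparent(gray):
    """Turn a grayscale image into RGBA pixels: black ink whose opacity grows
    with darkness, so pure white (255) becomes fully transparent."""
    return [[(0, 0, 0, 255 - v) for v in row] for row in gray]
```

The RGBA result can then be alpha-blended over any background image, and the order of characters passed to `compose_line` can follow sentences taken from a real corpus when semantics matter.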
<a name="NIST handwritten single character dataset - English"></a>
## NIST handwritten single character dataset - English (NIST Handprinted Forms and Characters Database)
- **Data introduction**: The NIST19 dataset is suitable for training handwritten document and character recognition models. It is extracted from the handwriting sample forms of 3600 authors and contains 810000 character images in total. Nine of them are shown below.
For other software version requirements, please refer to the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).
This section collects commonly used vertical multi-language OCR datasets and is updated continuously. Contributions of additional datasets are welcome.
- **Data introduction**: The CCPD dataset contains more than 250000 vehicle license plate images, annotated for license plate detection and recognition. It covers license plate images from the following scenes.
* CCPD-Base: General license plate picture
    * CCPD-DB: The license plate area is bright, dark or unevenly lit
    * CCPD-FN: The license plate is farther from or closer to the camera
    * CCPD-Blur: The license plate is blurred by camera shake
    * CCPD-Weather: The license plate is photographed on rainy, snowy or foggy days
    * CCPD-Challenge: Some of the most challenging images so far in license plate detection and recognition tasks
* CCPD-NP: Pictures of new cars without license plates.
![](../datasets/ccpd_demo.png)
- **Download address**
    * Baidu cloud download address (extraction code is hm0U): [https://pan.baidu.com/s/1i5AOjAbtkwb17Zy-NQGqkw](https://pan.baidu.com/s/1i5AOjAbtkwb17Zy-NQGqkw)
    * Google drive download address: [https://drive.google.com/file/d/1rdEsCUcIUaYOVRkx5IMTRNA7PcGMmSgc/view](https://drive.google.com/file/d/1rdEsCUcIUaYOVRkx5IMTRNA7PcGMmSgc/view)
- **Data introduction**: This is a toolkit for data synthesis that outputs captcha images for the input text. Several demo images generated with the toolkit are shown below.
![](../datasets/captcha_demo.png)
- **Download address**: The dataset is generated, so there is no download address.
<a name="multi-language dataset"></a>
## multi-language dataset (Multi-lingual scene text detection and recognition)
- **Data introduction**: The multi-language detection dataset MLT covers both language recognition and detection tasks.
* In the detection task, the training set contains 10000 images in 10 languages, and each language contains 1000 training images. The test set contains 10000 images.
* In the recognition task, the training set contains 111998 samples.
- **Download address**: The training set is large and is downloaded in two parts. It can only be downloaded after registering on the website: