# Paddleocr Package

## 1 Get started quickly
### 1.1 Install package
Install from PyPI:
```bash
pip install "paddleocr>=2.0.1" # version 2.0.1+ is recommended
```

Build your own whl package and install it:
```bash
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
```
## 2 Use
### 2.1 Use by code
The paddleocr whl package automatically downloads the PP-OCR lightweight model and uses it as the default model. To replace it with your own model, see section 3 **Use custom model**.

* detection, angle classification and recognition
```python
from paddleocr import PaddleOCR,draw_ocr
# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the `lang` parameter to `ch`, `en`, `french`, `german`, `korean` or `japan`
# to select the corresponding language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)


# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the recognized text and the recognition confidence:
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
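
Because each item has the shape `[box, [text, score]]`, the result can be unpacked with plain Python. The sketch below is only an illustration: it uses the sample output above as literal data instead of calling the model, and `filter_by_score` and `min_score` are hypothetical names, not part of paddleocr.

```python
# Sample items copied from the output above: [box, [text, score]]
result = [
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]],
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]],
]


def filter_by_score(items, min_score):
    """Keep only items whose recognition confidence is at least min_score (hypothetical helper)."""
    return [item for item in items if item[1][1] >= min_score]


# Keep the confident lines and pull out just their text.
confident = filter_by_score(result, min_score=0.95)
texts = [text for _box, (text, _score) in confident]
print(texts)
```

With a threshold of 0.95 the second sample line is dropped, since its confidence is 0.9357758.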

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>

* detection and recognition
```python
from paddleocr import PaddleOCR,draw_ocr
ocr = PaddleOCR(lang='en') # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=False)
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the recognized text and the recognition confidence:
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>

* classification and recognition
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path, det=False, cls=True)
for line in result:
    print(line)
```

The output is a list; each item contains the recognized text and its confidence:
```bash
['PAIN', 0.990372]
```

* only detection
```python
from paddleocr import PaddleOCR,draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path,rec=False)
for line in result:
    print(line)

# draw result
from PIL import Image

image = Image.open(img_path).convert('RGB')
im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains only a bounding box:
```bash
[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
......
```
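
Each detection item is a list of four `[x, y]` corner points, which need not form an axis-aligned rectangle. As a minimal sketch, the sample boxes above can be converted to enclosing rectangles and ordered from left to right. `to_rect` is a hypothetical helper, not a paddleocr function.

```python
# Sample boxes copied from the detection output above; each box is four [x, y] corner points.
boxes = [
    [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]],
    [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]],
    [[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]],
]


def to_rect(box):
    """Return the axis-aligned rectangle (x_min, y_min, x_max, y_max) enclosing a 4-point box."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


# Sort left to right by the left edge of each enclosing rectangle.
rects = sorted((to_rect(box) for box in boxes), key=lambda rect: rect[0])
for rect in rects:
    print(rect)
```

Rectangles in this form are convenient for cropping with PIL or for simple layout checks, at the cost of losing any slight rotation of the original quadrilateral.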

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det.jpg" width="800">
</div>

* only recognition
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(lang='en') # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path, det=False, cls=False)
for line in result:
    print(line)
```

The output is a list; each item contains the recognized text and its confidence:
```bash
['PAIN', 0.990372]
```

* only classification
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path, det=False, rec=False, cls=True)
for line in result:
    print(line)
```

The output is a list; each item contains the classification result and its confidence:
```bash
['0', 0.99999964]
```
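
The classification result is a `[label, confidence]` pair, where the label is one of the entries of `label_list` (`'0'` or `'180'` by default, see section 5). A caller could turn it into a rotation decision roughly as follows. This is a sketch under assumptions: `rotation_needed` is a hypothetical helper and `min_confidence=0.9` is an illustrative cut-off, not a documented default.

```python
def rotation_needed(cls_result, min_confidence=0.9):
    """Return the rotation in degrees suggested by a [label, confidence] pair.

    min_confidence is an assumed cut-off for this sketch, not a documented default.
    """
    label, confidence = cls_result
    if label == '180' and confidence >= min_confidence:
        return 180
    return 0


print(rotation_needed(['0', 0.99999964]))  # the sample output above: already upright
print(rotation_needed(['180', 0.97]))      # made-up pair: confidently upside down
```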

### 2.2 Use by command line

Show help information:
```bash
paddleocr -h
```

* detection, angle classification and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true --lang en
```

The output is a list; each item contains a bounding box, the recognized text and the recognition confidence:
```bash
[[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]], ('ACKNOWLEDGEMENTS', 0.9971134662628174)]
[[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]], ('We would like to thank all the designers and', 0.9761400818824768)]
[[[403.0, 396.0], [1204.0, 398.0], [1204.0, 434.0], [402.0, 433.0]], ('contributors who have been involved in the', 0.9791957139968872)]
......
```
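
Each printed line above is a valid Python literal, so captured command-line output can be turned back into data with `ast.literal_eval`. The sketch below assumes the lines were captured exactly as shown, with no log prefix in front of them. `parse_cli_output` is a hypothetical helper.

```python
import ast

# Two lines copied from the command-line output above.
captured = """\
[[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]], ('ACKNOWLEDGEMENTS', 0.9971134662628174)]
[[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]], ('We would like to thank all the designers and', 0.9761400818824768)]
"""


def parse_cli_output(text):
    """Parse each non-empty line into (box, text, score). Assumes lines are bare Python literals."""
    parsed = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        box, (recognized, score) = ast.literal_eval(line)
        parsed.append((box, recognized, score))
    return parsed


for box, recognized, score in parse_cli_output(captured):
    print(f"{score:.3f}  {recognized}")
```

`ast.literal_eval` only accepts literals, so it is a safer choice than `eval` for text that comes from another process.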

* detection and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --lang en
```

The output is a list; each item contains a bounding box, the recognized text and the recognition confidence:
```bash
[[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]], ('ACKNOWLEDGEMENTS', 0.9971134662628174)]
[[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]], ('We would like to thank all the designers and', 0.9761400818824768)]
[[[403.0, 396.0], [1204.0, 398.0], [1204.0, 434.0], [402.0, 433.0]], ('contributors who have been involved in the', 0.9791957139968872)]
......
```

* classification and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --lang en
```

The output is a list; each item contains the recognized text and the recognition confidence:
```bash
['PAIN', 0.9934559464454651]
```

* only detection
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --rec false
```

The output is a list; each item contains only a bounding box:
```bash
[[397.0, 802.0], [1092.0, 802.0], [1092.0, 841.0], [397.0, 841.0]]
[[397.0, 750.0], [1211.0, 750.0], [1211.0, 789.0], [397.0, 789.0]]
[[397.0, 702.0], [1209.0, 698.0], [1209.0, 734.0], [397.0, 738.0]]
......
```

* only recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false --lang en
```

The output is a list; each item contains the recognized text and the recognition confidence:
```bash
['PAIN', 0.9934559464454651]
```

* only classification
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --rec false
```

The output is a list; each item contains the classification result and its confidence:
```bash
['0', 0.99999964]
```

## 3 Use custom model
When the built-in models do not meet your needs, you can use a model you trained yourself.
First, follow the first section of [inference_en.md](./inference_en.md) to convert your det and rec models to inference models, then use them as follows.

### 3.1 Use by code

```python
from paddleocr import PaddleOCR,draw_ocr
# The detection and recognition model directories must contain the model and params files
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}', rec_char_dict_path='{your_rec_char_dict_path}', cls_model_dir='{your_cls_model_dir}', use_angle_cls=True)
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

### 3.2 Use by command line

```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
```
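
When the command is assembled from a script, building it as an argument list avoids quoting problems with paths. The following is a minimal sketch: `build_paddleocr_command` is a hypothetical helper, the model directories are placeholders, and only flags that appear in this document are used.

```python
import shlex


def build_paddleocr_command(image_dir, **options):
    """Build an argument list for the paddleocr CLI (hypothetical helper).

    Each option is emitted as --name value; booleans become 'true' or 'false'.
    """
    command = ['paddleocr', '--image_dir', image_dir]
    for name, value in options.items():
        if isinstance(value, bool):
            value = 'true' if value else 'false'
        command += [f'--{name}', str(value)]
    return command


command = build_paddleocr_command(
    'PaddleOCR/doc/imgs/11.jpg',
    det_model_dir='./my_det_model',  # placeholder path
    rec_model_dir='./my_rec_model',  # placeholder path
    use_angle_cls=True,
)
print(shlex.join(command))
```

Once paddleocr is installed, the list can be handed to `subprocess.run(command)` as is.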

## 4 Use web images or numpy array as input

### 4.1 Web image

- Use by code
```python
from paddleocr import PaddleOCR, draw_ocr, download_with_progressbar
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# show result
from PIL import Image

# PIL cannot open a URL directly, so download the image to a local file first
download_with_progressbar(img_path, 'tmp.jpg')
image = Image.open('tmp.jpg').convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
- Use by command line
```bash
paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
```
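
A script that accepts both local files and web images may want to know which kind of input it received, for example to decide whether the image must be downloaded before it is opened for visualization. One way to check, using only the standard library, is sketched below. `is_web_image` is a hypothetical helper and does not describe how paddleocr itself makes this decision.

```python
from urllib.parse import urlparse


def is_web_image(img_path):
    """Return True if img_path looks like an http(s) URL rather than a local file path."""
    return urlparse(img_path).scheme in ('http', 'https')


print(is_web_image('http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg'))
print(is_web_image('PaddleOCR/doc/imgs/11.jpg'))
```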

### 4.2 Numpy array
A numpy array is supported as input only when paddleocr is used from code.

```python
import cv2
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
img = cv2.imread(img_path)
# img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # uncomment if your own trained model supports grayscale images
result = ocr.ocr(img, cls=True)  # pass the numpy array, not the file path
for line in result:
    print(line)

# show result
from PIL import Image

image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```


## 5 Parameter Description

| Parameter | Description | Default value |
|-------------------------|----------------------------------------------------------------------|-------------------------|
| use_gpu | Whether to use the GPU | TRUE |
| gpu_mem | GPU memory size used for initialization | 8000M |
| image_dir | Path of the image or image folder to predict on when used from the command line | |
| det_algorithm | Type of detection algorithm to use | DB |
| det_model_dir | Folder of the text detection inference model. There are two ways to pass this parameter: 1. None: the built-in model is downloaded automatically to `~/.paddleocr/det`; 2. the path of an inference model you converted yourself; the model and params files must be in that path | None |
| det_max_side_len | Maximum size of the long side of the image. When the long side exceeds this value, it is resized to this value and the short side is scaled proportionally | 960 |
| det_db_thresh | Binarization threshold of the DB output map | 0.3 |
| det_db_box_thresh | Threshold for DB output boxes. Boxes with a score lower than this value are discarded | 0.5 |
| det_db_unclip_ratio | Expansion ratio of DB output boxes | 2 |
| det_db_score_mode | Controls how the score of a detection box is calculated. The options are 'fast' and 'slow'. If the text to be detected is curved, 'slow' is recommended | 'fast' |
| det_east_score_thresh | Binarization threshold of the EAST output map | 0.8 |
| det_east_cover_thresh | Threshold for EAST output boxes. Boxes with a score lower than this value are discarded | 0.1 |
| det_east_nms_thresh | NMS threshold for EAST output boxes | 0.2 |
| rec_algorithm | Type of recognition algorithm to use | CRNN |
| rec_model_dir | Folder of the text recognition inference model. There are two ways to pass this parameter: 1. None: the built-in model is downloaded automatically to `~/.paddleocr/rec`; 2. the path of an inference model you converted yourself; the model and params files must be in that path | None |
| rec_image_shape | Input image shape of the recognition algorithm | "3,32,320" |
| rec_batch_num | Batch size of images forwarded during recognition | 30 |
| max_text_length | Maximum text length that the recognition algorithm can recognize | 25 |
| rec_char_dict_path | Path of the character dictionary; it must be changed to your own path when `rec_model_dir` is passed in way 2 | ./ppocr/utils/ppocr_keys_v1.txt |
| use_space_char | Whether to recognize spaces | TRUE |
| drop_score | Filters the output by the recognition score; results with a score below this value are not returned | 0.5 |
| use_angle_cls | Whether to load the classification model | FALSE |
| cls_model_dir | Folder of the classification inference model. There are two ways to pass this parameter: 1. None: the built-in model is downloaded automatically to `~/.paddleocr/cls`; 2. the path of an inference model you converted yourself; the model and params files must be in that path | None |
| cls_image_shape | Input image shape of the classification algorithm | "3,48,192" |
| label_list | Label list of the classification algorithm | ['0','180'] |
| cls_batch_num | Batch size of images forwarded during classification | 30 |
| enable_mkldnn | Whether to enable mkldnn | FALSE |
| use_zero_copy_run | Whether to forward by zero_copy_run | FALSE |
| lang | Language to use. Currently Chinese (ch), English (en), French (french), German (german), Korean (korean) and Japanese (japan) are supported | ch |
| det | Whether to run detection when `ppocr.ocr` is called | TRUE |
| rec | Whether to run recognition when `ppocr.ocr` is called | TRUE |
| cls | Whether to run classification when `ppocr.ocr` is called (in command line mode, use `use_angle_cls` to control whether classification runs during the forward pass) | FALSE |
| show_log | Whether to print the log | FALSE |
| type | Whether to perform OCR or table structuring; the value is one of ['ocr','structure'] | ocr |
| ocr_version | OCR model version. The currently supported models are: PP-OCRv3, with Chinese and English detection and recognition, multilingual recognition and direction classifier models; PP-OCRv2, with Chinese detection and recognition models; PP-OCR, with Chinese detection, recognition and direction classifier models and multilingual recognition models | PP-OCRv3 |
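
To make the `det_max_side_len` rule concrete, the proportional scaling described in the table can be written out as below. This is only an illustration of the rule as stated: `limit_long_side` is a hypothetical helper, and the library's actual preprocessing may apply further adjustments to the final size.

```python
def limit_long_side(width, height, max_side_len=960):
    """Return (new_width, new_height) after capping the long side at max_side_len."""
    long_side = max(width, height)
    if long_side <= max_side_len:
        # Small enough already: leave the size unchanged.
        return width, height
    scale = max_side_len / long_side
    return round(width * scale), round(height * scale)


print(limit_long_side(1920, 1080))  # long side 1920 exceeds 960, so both sides are halved
print(limit_long_side(640, 480))    # long side 640 is within the limit
```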