# PaddleOCR Package

## 1 Get started quickly
### 1.1 Install package

Install from PyPI:
```bash
pip install "paddleocr>=2.0.1" # Recommend to use version 2.0.1+
```

Or build the whl package yourself and install it:
```bash
python3 setup.py bdist_wheel
pip3 install dist/paddleocr-x.x.x-py3-none-any.whl # x.x.x is the version of paddleocr
```
## 2 Use
### 2.1 Use by code
The paddleocr whl package automatically downloads the PP-OCR lightweight model as its default model. To replace it with your own model, see section 3, **Use custom model**.

* detection, angle classification and recognition
```python
from paddleocr import PaddleOCR,draw_ocr
# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the parameter `lang` to `ch`, `en`, `french`, `german`, `korean` or `japan`
# to select the corresponding language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)


# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the text and the recognition confidence:
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```
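
Because each item is a plain nested list, it can be unpacked without any PaddleOCR-specific helper. The sketch below assumes the flat `[box, [text, score]]` layout shown above. `OcrLine`, `parse_line`, `confident_text` and `min_score` are hypothetical names introduced here for illustration; they are not part of paddleocr.

```python
from typing import List, NamedTuple, Tuple


class OcrLine(NamedTuple):
    box: List[Tuple[float, float]]  # four corner points of the text region
    text: str
    score: float


def parse_line(line) -> OcrLine:
    """Unpack one `[box, [text, score]]` item into named fields."""
    box, (text, score) = line
    return OcrLine([tuple(p) for p in box], text, float(score))


def confident_text(result, min_score: float = 0.95) -> List[str]:
    """Return the text of every line whose score is at least `min_score`."""
    return [l.text for l in map(parse_line, result) if l.score >= min_score]


# Two items copied from the sample output above
sample = [
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]],
]
print(confident_text(sample))  # ['ACKNOWLEDGEMENTS']
```

Lowering `min_score` to `0.9` would keep both sample lines.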

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>

* detection and recognition
```python
from paddleocr import PaddleOCR,draw_ocr
ocr = PaddleOCR(lang='en') # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=False)
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the text and the recognition confidence:
```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>

* classification and recognition
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path, det=False, cls=True)
for line in result:
    print(line)
```

The output is a list; each item contains the recognized text and its confidence:
```bash
['PAIN', 0.990372]
```

* only detection
```python
from paddleocr import PaddleOCR,draw_ocr
ocr = PaddleOCR() # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, rec=False)
for line in result:
    print(line)

# draw result
from PIL import Image

image = Image.open(img_path).convert('RGB')
im_show = draw_ocr(image, result, txts=None, scores=None, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains only a bounding box:
```bash
[[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
[[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
[[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
......
```
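
Each detection box is a quadrilateral given as four `[x, y]` corner points, so it may be slightly rotated. When an axis-aligned rectangle is needed, for example to crop a region or to order boxes in reading order, it can be derived from the corner coordinates. This is a minimal sketch; `to_rect` and `sort_top_to_bottom` are hypothetical helpers, not part of paddleocr.

```python
def to_rect(box):
    """Return (left, top, right, bottom) enclosing a four-point box."""
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    return min(xs), min(ys), max(xs), max(ys)


def sort_top_to_bottom(boxes):
    """Order boxes by their top edge, then by their left edge."""
    return sorted(boxes, key=lambda b: (to_rect(b)[1], to_rect(b)[0]))


# The three boxes from the sample output above
boxes = [
    [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]],
    [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]],
    [[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]],
]
print(to_rect(boxes[1]))  # (820.0, 801.0, 1085.0, 838.0)
```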

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det.jpg" width="800">
</div>

* only recognition
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(lang='en') # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path, det=False, cls=False)
for line in result:
    print(line)
```

The output is a list; each item contains the recognized text and its confidence:
```bash
['PAIN', 0.990372]
```

* only classification
```python
from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True) # need to run only once to load model into memory
img_path = 'PaddleOCR/doc/imgs_words_en/word_10.png'
result = ocr.ocr(img_path, det=False, rec=False, cls=True)
for line in result:
    print(line)
```

The output is a list; each item contains the classification result and its confidence:
```bash
['0', 0.99999964]
```
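
One way to act on a classification result is to turn it into a rotation angle. The sketch below assumes the labels are the strings `'0'` and `'180'` (the `label_list` default in section 5) and that they denote the text orientation in degrees. `degrees_to_rotate` and its `min_score` threshold are hypothetical, introduced only for illustration.

```python
def degrees_to_rotate(cls_result, min_score=0.9):
    """Return 180 if the crop is confidently classified as upside down, else 0."""
    label, score = cls_result
    if label == "180" and score >= min_score:
        return 180
    return 0


print(degrees_to_rotate(['0', 0.99999964]))  # 0
print(degrees_to_rotate(['180', 0.97]))      # 180
print(degrees_to_rotate(['180', 0.55]))      # 0, not confident enough
```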

### 2.2 Use by command line

Show help information:
```bash
paddleocr -h
```

**Note**: The whl package uses the `PP-OCRv3` model by default, and the input shape used by its recognition model is `3,48,320`. If you use the recognition function, add the parameter `--rec_image_shape 3,48,320`. If you do not use the default `PP-OCRv3` model, you do not need to set this parameter.
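
The rule in the note can be captured in a small helper that assembles the command line. This is only a sketch: `build_args` and its parameters are hypothetical names, and it uses only flags that appear in this document.

```python
import shlex


def build_args(image, lang="en", rec=True, default_model=True):
    """Assemble a paddleocr command as an argument list.

    `--rec_image_shape 3,48,320` is appended only when recognition is
    enabled and the default PP-OCRv3 model is in use, as the note says.
    """
    args = ["paddleocr", "--image_dir", image, "--lang", lang]
    if not rec:
        args += ["--rec", "false"]
    elif default_model:
        args += ["--rec_image_shape", "3,48,320"]
    return args


cmd = build_args("PaddleOCR/doc/imgs_en/img_12.jpg")
print(shlex.join(cmd))
# paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --lang en --rec_image_shape 3,48,320
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to run the command.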

* detection, classification and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --use_angle_cls true --lang en --rec_image_shape 3,48,320
```

The output is a list; each item contains a bounding box, the text and the recognition confidence:
```bash
[[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]], ('ACKNOWLEDGEMENTS', 0.9971134662628174)]
[[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]], ('We would like to thank all the designers and', 0.9761400818824768)]
[[[403.0, 396.0], [1204.0, 398.0], [1204.0, 434.0], [402.0, 433.0]], ('contributors who have been involved in the', 0.9791957139968872)]
......
```

* detection and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --lang en --rec_image_shape 3,48,320
```

The output is a list; each item contains a bounding box, the text and the recognition confidence:
```bash
[[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]], ('ACKNOWLEDGEMENTS', 0.9971134662628174)]
[[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]], ('We would like to thank all the designers and', 0.9761400818824768)]
[[[403.0, 396.0], [1204.0, 398.0], [1204.0, 434.0], [402.0, 433.0]], ('contributors who have been involved in the', 0.9791957139968872)]
......
```

* classification and recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --lang en --rec_image_shape 3,48,320
```

The output is a list; each item contains the text and the recognition confidence:
```bash
['PAIN', 0.9934559464454651]
```

* only detection
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_en/img_12.jpg --rec false
```

The output is a list; each item contains only a bounding box:
```bash
[[397.0, 802.0], [1092.0, 802.0], [1092.0, 841.0], [397.0, 841.0]]
[[397.0, 750.0], [1211.0, 750.0], [1211.0, 789.0], [397.0, 789.0]]
[[397.0, 702.0], [1209.0, 698.0], [1209.0, 734.0], [397.0, 738.0]]
......
```

* only recognition
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --det false --lang en --rec_image_shape 3,48,320
```

The output is a list; each item contains the text and the recognition confidence:
```bash
['PAIN', 0.9934559464454651]
```

* only classification
```bash
paddleocr --image_dir PaddleOCR/doc/imgs_words_en/word_10.png --use_angle_cls true --det false --rec false
```

The output is a list; each item contains the classification result and its confidence:
```bash
['0', 0.99999964]
```

## 3 Use custom model
When the built-in models do not meet your needs, you can use your own trained models.
First, refer to the first section of [inference_en.md](./inference_en.md) to convert your detection and recognition models to inference models, then use them as follows.

### 3.1 Use by code

```python
from paddleocr import PaddleOCR,draw_ocr
# Each detection and recognition model directory must contain the model and params files
ocr = PaddleOCR(det_model_dir='{your_det_model_dir}', rec_model_dir='{your_rec_model_dir}', rec_char_dict_path='{your_rec_char_dict_path}', cls_model_dir='{your_cls_model_dir}', use_angle_cls=True)
img_path = 'PaddleOCR/doc/imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
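
Since every model directory must contain both a model file and a params file, a quick check before constructing `PaddleOCR` can give a clearer error than a failed load. This sketch assumes the files are named `inference.pdmodel` and `inference.pdiparams`, the names Paddle 2.x export tools usually produce; this document does not state the names, so adjust `REQUIRED_FILES` if your export differs. `missing_model_files` is a hypothetical helper, not part of paddleocr.

```python
import tempfile
from pathlib import Path

# Assumed file names; change them to match your exported model
REQUIRED_FILES = ("inference.pdmodel", "inference.pdiparams")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# Demonstration on a throwaway directory that holds only the model file
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "inference.pdmodel").touch()
print(missing_model_files(demo_dir))  # ['inference.pdiparams']
```

An empty list means the directory passes the check and can be given to `det_model_dir`, `rec_model_dir` or `cls_model_dir`.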

### 3.2 Use by command line

```bash
paddleocr --image_dir PaddleOCR/doc/imgs/11.jpg --det_model_dir {your_det_model_dir} --rec_model_dir {your_rec_model_dir} --rec_char_dict_path {your_rec_char_dict_path} --cls_model_dir {your_cls_model_dir} --use_angle_cls true
```

## 4 Use web images or numpy array as input

### 4.1 Web image

- Use by code
```python
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

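# show result
from PIL import Image
from paddleocr import download_with_progressbar
# Image.open cannot read a URL, so download the image to a local file first
download_with_progressbar(img_path, 'tmp.jpg')
image = Image.open('tmp.jpg').convert('RGB')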
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
- Use by command line
```bash
paddleocr --image_dir http://n.sinaimg.cn/ent/transform/w630h933/20171222/o111-fypvuqf1838418.jpg --use_angle_cls=true
```

### 4.2 Numpy array
A numpy array is supported as input only when used by code.

```python
import cv2
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(use_angle_cls=True, lang="ch") # need to run only once to download and load model into memory
img_path = 'PaddleOCR/doc/imgs/11.jpg'
img = cv2.imread(img_path)
# img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # uncomment this line if your own trained model supports grayscale images
result = ocr.ocr(img, cls=True)  # pass the numpy array, not the path
for line in result:
    print(line)

# show result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='/path/to/PaddleOCR/doc/fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```


## 5 Parameter Description

| Parameter | Description | Default value |
|-------------------------|--------------------------------------------------|-------------------------|
| use_gpu | Whether to use the GPU | TRUE |
| gpu_mem | GPU memory size used for initialization | 8000M |
| image_dir | Path of the image or image folder to predict when used from the command line | |
| det_algorithm | Type of detection algorithm selected | DB |
| det_model_dir | The text detection inference model folder. It can be set in two ways: 1. None: automatically download the built-in model to `~/.paddleocr/det`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None |
| det_max_side_len | The maximum size of the long side of the image. When the long side exceeds this value, it is resized to this size and the short side is scaled proportionally | 960 |
| det_db_thresh | Binarization threshold of the DB output map | 0.3 |
| det_db_box_thresh | Threshold of the DB output box. Boxes with a score lower than this value are discarded | 0.5 |
| det_db_unclip_ratio | Expansion ratio of the DB output box | 2 |
| det_east_score_thresh | Binarization threshold of the EAST output map | 0.8 |
| det_east_cover_thresh | Threshold of the EAST output box. Boxes with a score lower than this value are discarded | 0.1 |
| det_east_nms_thresh | NMS threshold of the EAST model output box | 0.2 |
| rec_algorithm | Type of recognition algorithm selected | CRNN |
| rec_model_dir | The text recognition inference model folder. It can be set in two ways: 1. None: automatically download the built-in model to `~/.paddleocr/rec`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None |
| rec_image_shape | Image shape of the recognition algorithm | "3,32,320" |
| rec_batch_num | Batch size of images forwarded during recognition | 30 |
| max_text_length | The maximum text length that the recognition algorithm can recognize | 25 |
| rec_char_dict_path | The alphabet path, which must be changed to your own path when `rec_model_dir` uses mode 2 | ./ppocr/utils/ppocr_keys_v1.txt |
| use_space_char | Whether to recognize spaces | TRUE |
| drop_score | Filter the output by score (from the recognition model); results below this score are not returned | 0.5 |
| use_angle_cls | Whether to load the classification model | FALSE |
| cls_model_dir | The classification inference model folder. It can be set in two ways: 1. None: automatically download the built-in model to `~/.paddleocr/cls`; 2. the path of an inference model you converted yourself, which must contain the model and params files | None |
| cls_image_shape | Image shape of the classification algorithm | "3,48,192" |
| label_list | Label list of the classification algorithm | ['0','180'] |
| cls_batch_num | Batch size of images forwarded during classification | 30 |
| enable_mkldnn | Whether to enable mkldnn | FALSE |
| use_zero_copy_run | Whether to forward by zero_copy_run | FALSE |
| lang | The supported languages; currently only Chinese (ch), English (en), French (french), German (german), Korean (korean) and Japanese (japan) are supported | ch |
| det | Enable detection when the `ppocr.ocr` function executes | TRUE |
| rec | Enable recognition when the `ppocr.ocr` function executes | TRUE |
| cls | Enable classification when the `ppocr.ocr` function executes (in command line mode, use `use_angle_cls` to control whether classification is run in the forward pass) | FALSE |
| show_log | Whether to print logs | FALSE |
| type | Perform OCR or table structuring; the value is one of ['ocr','structure'] | ocr |
| ocr_version | OCR model version. The currently supported versions are: PP-OCRv3 supports the Chinese and English detection and recognition models and the direction classifier model; PP-OCRv2 supports the Chinese detection and recognition models; PP-OCR supports the Chinese detection, recognition and direction classifier models and the multilingual recognition models | PP-OCRv3 |
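
Two of the parameters above are easiest to understand by example. The sketch below reproduces only what the table states: `det_max_side_len` caps the long side and scales the short side proportionally, and `drop_score` discards low-scoring recognition results. The function names are hypothetical, and the library's actual preprocessing may apply further adjustments (such as rounding sizes) that are not modeled here.

```python
def limit_long_side(width, height, det_max_side_len=960):
    """Scale (width, height) so the long side does not exceed the limit."""
    long_side = max(width, height)
    if long_side <= det_max_side_len:
        return width, height  # already small enough, keep as is
    ratio = det_max_side_len / long_side
    return round(width * ratio), round(height * ratio)


def apply_drop_score(rec_results, drop_score=0.5):
    """Keep only (text, score) pairs whose score is at least `drop_score`."""
    return [(text, score) for text, score in rec_results if score >= drop_score]


print(limit_long_side(1920, 1080))  # (960, 540)
print(limit_long_side(640, 480))    # (640, 480)
print(apply_drop_score([("PAIN", 0.99), ("l1", 0.31)]))  # [('PAIN', 0.99)]
```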