# PaddleOCR Quick Start

**Note:** This tutorial mainly covers the PP-OCR series models. For document analysis features, please refer to [PP-Structure Quick Start](../../ppstructure/docs/quickstart_en.md).

- [1. Installation](#1-installation)
  - [1.1 Install PaddlePaddle](#11-install-paddlepaddle)
  - [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package)
- [2. Easy-to-Use](#2-easy-to-use)
  - [2.1 Use by Command Line](#21-use-by-command-line)
    - [2.1.1 Chinese and English Model](#211-chinese-and-english-model)
    - [2.1.2 Multi-language Model](#212-multi-language-model)
  - [2.2 Use by Code](#22-use-by-code)
    - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese--english-model-and-multilingual-model)
- [3. Summary](#3-summary)



<a name="1-installation"></a>

## 1. Installation

<a name="11-install-paddlepaddle"></a>

### 1.1 Install PaddlePaddle

> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md).

- If you have CUDA 9 or CUDA 10 installed on your machine, run the following command to install the GPU version:

  ```bash
  python -m pip install paddlepaddle-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple
  ```

- If your machine has no available GPU, run the following command to install the CPU version:

  ```bash
  python -m pip install paddlepaddle -i https://pypi.tuna.tsinghua.edu.cn/simple
  ```
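
After installing either package, it can help to confirm that Python can see it before moving on. The following is a minimal sketch that uses only the standard library; the helper name `is_installed` is an illustrative assumption and is not part of PaddlePaddle.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# `paddle` is the import name of both the paddlepaddle and paddlepaddle-gpu packages.
status = "found" if is_installed("paddle") else "not found"
print(f"paddle: {status}")
```

If the module is found, PaddlePaddle's own `paddle.utils.run_check()` can then be used for a fuller check of the installation.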

For other software version requirements, please follow the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).

<a name="12-install-paddleocr-whl-package"></a>

### 1.2 Install PaddleOCR Whl Package

```bash
pip install "paddleocr>=2.0.1" # Version 2.0.1 or later is recommended
```

- **For Windows users:** If you get the error `OSError: [WinError 126] The specified module could not be found` while installing Shapely on Windows, try downloading the Shapely whl file from [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).

  Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)

<a name="2-easy-to-use"></a>

## 2. Easy-to-Use

<a name="21-use-by-command-line"></a>

### 2.1 Use by Command Line

PaddleOCR provides a set of test images. Click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download them, then switch to the corresponding directory in the terminal:

```bash
cd /path/to/ppocr_img
```

If you do not use the provided test images, replace the `--image_dir` parameter in the commands below with the path to your own image.

<a name="211-chinese-and-english-model"></a>

#### 2.1.1 Chinese and English Model

* Detection, direction classification and recognition: set `--use_gpu false` to disable the GPU device

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en --use_gpu false
  ```

  The output is a list; each item contains a bounding box, the text, and the recognition confidence:

  ```bash
  [[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]], ('ACKNOWLEDGEMENTS', 0.9971134662628174)]
  [[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]], ('We would like to thank all the designers and', 0.9761400818824768)]
  [[[403.0, 396.0], [1204.0, 398.0], [1204.0, 434.0], [402.0, 433.0]], ('contributors who have been involved in the', 0.9791957139968872)]
  ......
  ```

  PDF files are also supported. Use the `page_num` parameter to run inference on only the first few pages; the default is 0, which means all pages are processed:

  ```bash
  paddleocr --image_dir ./xxx.pdf --use_angle_cls true --use_gpu false --page_num 2
  ```

* Only detection: set `--rec` to `false`

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --rec false
  ```

  The output is a list; each item contains only a bounding box:

  ```bash
  [[397.0, 802.0], [1092.0, 802.0], [1092.0, 841.0], [397.0, 841.0]]
  [[397.0, 750.0], [1211.0, 750.0], [1211.0, 789.0], [397.0, 789.0]]
  [[397.0, 702.0], [1209.0, 698.0], [1209.0, 734.0], [397.0, 738.0]]
  ......
  ```

* Only recognition: set `--det` to `false`

  ```bash
  paddleocr --image_dir ./imgs_words_en/word_10.png --det false --lang en
  ```

  The output is a list; each item contains the text and the recognition confidence:

  ```bash
  ['PAIN', 0.9934559464454651]
  ```
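
The three output shapes above can be told apart programmatically. The sketch below parses a printed line with `ast.literal_eval` and normalizes it into a common dictionary. It assumes each result is printed on its own line exactly as shown above (real terminal logs may carry extra prefixes), and the function name `parse_result_line` is hypothetical.

```python
import ast


def parse_result_line(line: str) -> dict:
    """Parse one printed result line into {'box', 'text', 'score'}."""
    item = ast.literal_eval(line.strip())
    # Recognition only: ['PAIN', 0.99]
    if isinstance(item[0], str):
        return {"box": None, "text": item[0], "score": item[1]}
    # Detection and recognition: [[four [x, y] points], ('text', score)]
    if len(item) == 2 and isinstance(item[1], tuple):
        text, score = item[1]
        return {"box": item[0], "text": text, "score": score}
    # Detection only: [[x, y], [x, y], [x, y], [x, y]]
    return {"box": item, "text": None, "score": None}


print(parse_result_line("['PAIN', 0.9934559464454651]"))
```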

**Version**

paddleocr uses the PP-OCRv3 model by default (`--ocr_version PP-OCRv3`). To use another version, set the `--ocr_version` parameter. The available versions are:

| Version name | Description |
| --- | --- |
| PP-OCRv3 | Supports Chinese and English detection and recognition, the direction classifier, and multilingual recognition |
| PP-OCRv2 | Supports only Chinese and English detection and recognition and the direction classifier; the multilingual model is not updated |
| PP-OCR | Supports Chinese and English detection and recognition, the direction classifier, and multilingual recognition |

If you want to add your own trained model, you can add model links and keys in [paddleocr](../../paddleocr.py) and recompile.

More whl package usage can be found in the [whl package](./whl_en.md) documentation.
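
When the command line is driven from a script, the version names from the table above can be validated before the command runs. This is a minimal sketch: `build_command` and `SUPPORTED_VERSIONS` are hypothetical names introduced here, and only flags that appear in this document are used.

```python
import shlex

# Version names copied from the table above.
SUPPORTED_VERSIONS = ("PP-OCRv3", "PP-OCRv2", "PP-OCR")


def build_command(image_path: str, lang: str,
                  ocr_version: str = "PP-OCRv3", use_gpu: bool = False) -> list:
    """Build the argument list for a `paddleocr` command-line call."""
    if ocr_version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unknown ocr_version: {ocr_version!r}")
    return [
        "paddleocr",
        "--image_dir", image_path,
        "--lang", lang,
        "--ocr_version", ocr_version,
        "--use_gpu", "true" if use_gpu else "false",
    ]


cmd = build_command("./imgs_en/img_12.jpg", lang="ch", ocr_version="PP-OCRv2")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the whl package is installed.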

<a name="212-multi-language-model"></a>

#### 2.1.2 Multi-language Model

PaddleOCR currently supports 80 languages, which can be switched by modifying the `--lang` parameter.

```bash
paddleocr --image_dir ./doc/imgs_en/254.jpg --lang=en
```

<div align="center">
    <img src="../imgs_en/254.jpg" width="300" height="600">
    <img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
</div>

The result is a list; each item contains a bounding box, the text, and the recognition confidence:

```text
[[[67.0, 51.0], [327.0, 46.0], [327.0, 74.0], [68.0, 80.0]], ('PHOCAPITAL', 0.9944712519645691)]
[[[72.0, 92.0], [453.0, 84.0], [454.0, 114.0], [73.0, 122.0]], ('107 State Street', 0.9744491577148438)]
[[[69.0, 135.0], [501.0, 125.0], [501.0, 156.0], [70.0, 165.0]], ('Montpelier Vermont', 0.9357033967971802)]
......
```

Commonly used language abbreviations include:

| Language            | Abbreviation |      | Language | Abbreviation |      | Language | Abbreviation |
| ------------------- | ------------ | ---- | -------- | ------------ | ---- | -------- | ------------ |
| Chinese & English   | ch           |      | French   | fr           |      | Japanese | japan        |
| English             | en           |      | German   | german       |      | Korean   | korean       |
| Chinese Traditional | chinese_cht  |      | Italian  | it           |      | Russian  | ru           |

A list of all languages and their corresponding abbreviations can be found in the [Multi-Language Model Tutorial](./multi_languages_en.md).
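
Because several abbreviations are not standard language codes (for example `german` and `japan`), a small lookup can prevent typos when the `--lang` value is chosen in a script. This sketch covers only the nine entries in the table above; `LANG_ABBREVIATIONS` and `lang_code` are hypothetical names.

```python
# Abbreviations copied from the table above.
LANG_ABBREVIATIONS = {
    "Chinese & English": "ch",
    "English": "en",
    "Chinese Traditional": "chinese_cht",
    "French": "fr",
    "German": "german",
    "Italian": "it",
    "Japanese": "japan",
    "Korean": "korean",
    "Russian": "ru",
}


def lang_code(language: str) -> str:
    """Look up the `--lang` value for a language name, ignoring case."""
    lookup = {name.lower(): code for name, code in LANG_ABBREVIATIONS.items()}
    try:
        return lookup[language.strip().lower()]
    except KeyError:
        raise ValueError(f"no abbreviation listed for {language!r}") from None


print(lang_code("German"))
```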


<a name="22-use-by-code"></a>

### 2.2 Use by Code
<a name="221-chinese--english-model-and-multilingual-model"></a>

#### 2.2.1 Chinese & English Model and Multilingual Model

* Detection, angle classification and recognition:

```python
from paddleocr import PaddleOCR,draw_ocr
# Paddleocr supports Chinese, English, French, German, Korean and Japanese.
# You can set the parameter `lang` as `ch`, `en`, `fr`, `german`, `korean`, `japan`
# to switch the language model in order.
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
img_path = './imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        print(line)


# draw result
from PIL import Image
result = result[0]
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the text, and the recognition confidence:

```bash
[[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]], ('ACKNOWLEDGEMENTS', 0.9971134662628174)]
[[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]], ('We would like to thank all the designers and', 0.9761400818824768)]
[[[403.0, 396.0], [1204.0, 398.0], [1204.0, 434.0], [402.0, 433.0]], ('contributors who have been involved in the', 0.9791957139968872)]
......
```
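
Downstream code often needs axis-aligned rectangles and a confidence cutoff rather than raw four-point boxes. The following sketch works on the item structure shown above; the helper names `quad_to_rect` and `confident_lines` and the 0.98 threshold are illustrative assumptions, not part of PaddleOCR.

```python
def quad_to_rect(box):
    """Convert four [x, y] corner points to (x_min, y_min, x_max, y_max)."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


def confident_lines(page_result, min_score=0.98):
    """Keep (rect, text, score) for items whose confidence is at least min_score."""
    kept = []
    for box, (text, score) in page_result:
        if score >= min_score:
            kept.append((quad_to_rect(box), text, score))
    return kept


# Sample items copied from the output above.
sample = [
    [[[441.0, 174.0], [1166.0, 176.0], [1165.0, 222.0], [441.0, 221.0]],
     ('ACKNOWLEDGEMENTS', 0.9971134662628174)],
    [[[403.0, 346.0], [1204.0, 348.0], [1204.0, 384.0], [402.0, 383.0]],
     ('We would like to thank all the designers and', 0.9761400818824768)],
]
for rect, text, score in confident_lines(sample):
    print(rect, text, round(score, 3))
```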

Visualization of results:

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>

If the input is a PDF file, you can use the following code for visualization:

```python
from paddleocr import PaddleOCR, draw_ocr

# Paddleocr supports Chinese, English, French, German, Korean and Japanese.
# You can set the parameter `lang` as `ch`, `en`, `fr`, `german`, `korean`, `japan`
# to switch the language model in order.
ocr = PaddleOCR(use_angle_cls=True, lang="ch", page_num=2)  # need to run only once to download and load model into memory
img_path = './xxx.pdf'
result = ocr.ocr(img_path, cls=True)
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        print(line)

# draw result
import fitz
from PIL import Image
import cv2
import numpy as np
imgs = []
with fitz.open(img_path) as pdf:
    # `page_count` and `get_pixmap` are the current PyMuPDF names for
    # the deprecated `pageCount` and `getPixmap`
    for pg in range(0, pdf.page_count):
        page = pdf[pg]
        mat = fitz.Matrix(2, 2)
        pm = page.get_pixmap(matrix=mat, alpha=False)
        # if width or height > 2000 pixels, don't enlarge the image
        if pm.width > 2000 or pm.height > 2000:
            pm = page.get_pixmap(matrix=fitz.Matrix(1, 1), alpha=False)

        img = Image.frombytes("RGB", [pm.width, pm.height], pm.samples)
        img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
        imgs.append(img)
for idx in range(len(result)):
    res = result[idx]
    image = imgs[idx]
    boxes = [line[0] for line in res]
    txts = [line[1][0] for line in res]
    scores = [line[1][1] for line in res]
    im_show = draw_ocr(image, boxes, txts, scores, font_path='doc/fonts/simfang.ttf')
    im_show = Image.fromarray(im_show)
    im_show.save('result_page_{}.jpg'.format(idx))
```
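
The scaling rule in the rendering loop above (enlarge each page 2x unless that would exceed 2000 pixels in either dimension) can be isolated into a small pure function. This is a sketch only: it assumes the rendered size is simply the page size multiplied by the zoom factor, which ignores rounding, and the name `choose_zoom` is hypothetical.

```python
def choose_zoom(page_width, page_height, zoom=2, limit=2000):
    """Return `zoom` if the enlarged page stays within `limit` pixels, else 1."""
    if page_width * zoom > limit or page_height * zoom > limit:
        return 1
    return zoom


# An A4 page is about 595 x 842 points, so doubling it stays under the limit.
print(choose_zoom(595, 842))
# A 1200 x 800 page would exceed 2000 pixels in width when doubled.
print(choose_zoom(1200, 800))
```

Deciding the zoom factor up front would avoid rendering a large page twice, at the cost of relying on the size assumption stated above.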

<a name="3-summary"></a>

## 3. Summary

In this section, you have learned how to use the PaddleOCR whl package.

PaddleOCR is a rich and practical OCR tool library that covers the whole process of data production, model training, compression, inference and deployment. Please refer to the [tutorials](../../README.md#tutorials) to start your PaddleOCR journey.