# PaddleOCR Quick Start

* [1. Installation](#1installation)
  + [1.1 Install PaddlePaddle](#11-install-paddlepaddle)
  + [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package)
* [2. Easy-to-Use](#2-easy-to-use)
  + [2.1 Use by Command Line](#21-use-by-command-line)
    - [2.1.1 Chinese and English Model](#211-english-and-chinese-model)
    - [2.1.2 Multi-language Model](#212-multi-language-model)
    - [2.1.3 Layout Analysis](#213-layoutAnalysis)
  + [2.2 Use by Code](#22-use-by-code)
    - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese---english-model-and-multilingual-model)
    - [2.2.2 Layout Analysis](#222-layoutAnalysis)
* [3. Summary](#3)

<a name="1installation"></a>

## 1. Installation

<a name="11-install-paddlepaddle"></a>

### 1.1 Install PaddlePaddle

> If you do not have a Python environment, please refer to [Environment Preparation](./environment_en.md).

- If you have CUDA 9 or CUDA 10 installed on your machine, run the following command to install the GPU version

  ```bash
  python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
  ```

- If you have no available GPU on your machine, please run the following command to install the CPU version

  ```bash
  python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
  ```

For other software version requirements, follow the instructions in the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).

<a name="12-install-paddleocr-whl-package"></a>

### 1.2 Install PaddleOCR Whl Package

```bash
pip install "paddleocr>=2.0.1" # Version 2.0.1 or later is recommended
```

- **For Windows users:** If you get the error `OSError: [WinError 126] The specified module could not be found` when installing Shapely on Windows, try downloading the Shapely whl file [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).

  Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)

- **For layout analysis users**, run the following command to install **Layout-Parser**

  ```bash
  pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
  ```
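After installation, it can help to confirm which packages are importable in the current Python environment. The sketch below uses only the standard library. The helper name `check_installed` is hypothetical, and the module names (`paddle`, `paddleocr`, `shapely`, `layoutparser`) are assumed to be the import names of the packages installed above; verify them against your installed versions.

```python
import importlib.util


def check_installed(modules=("paddle", "paddleocr", "shapely", "layoutparser")):
    """Return {module_name: bool} telling whether each top-level module can be found.

    The default module names are assumptions about the import names of the
    packages installed in this section.
    """
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_installed().items():
        print(f"{name:13s} {'found' if found else 'MISSING'}")
```

A `MISSING` entry only means the module could not be located; it does not check versions or GPU support.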

<a name="2-easy-to-use"></a>

## 2. Easy-to-Use

<a name="21-use-by-command-line"></a>

### 2.1 Use by Command Line

PaddleOCR provides a set of test images. Download them [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip), extract the archive, and switch to the extracted directory in the terminal

```bash
cd /path/to/ppocr_img
```

If you do not use the provided test images, replace the `--image_dir` argument in the following commands with the path to your own image

<a name="211-english-and-chinese-model"></a>

#### 2.1.1 Chinese and English Model

* Detection, direction classification and recognition: set the parameter `--use_gpu false` to disable the GPU device

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en --use_gpu false
  ```

  The output is a list; each item contains a bounding box, the text and the recognition confidence

  ```bash
  [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
  [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
  [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
  ......
  ```

* Only detection: set `--rec` to `false`

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --rec false
  ```

  The output is a list; each item contains only a bounding box

  ```bash
  [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
  [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
  [[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
  ......
  ```

* Only recognition: set `--det` to `false`

  ```bash
  paddleocr --image_dir ./imgs_words_en/word_10.png --det false --lang en
  ```

  The output is a list; each item contains the text and the recognition confidence

  ```bash
  ['PAIN', 0.990372]
  ```
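The result lines printed above have the form of Python literals, so captured output can be turned back into data for filtering. The following is a minimal sketch, assuming the output was saved to a string with one result per line and no log prefix; the function name `parse_ocr_lines` and the `min_score` threshold are illustrative, not part of PaddleOCR.

```python
import ast


def parse_ocr_lines(text, min_score=0.0):
    """Parse printed detection+recognition results into a list of dicts.

    Assumes each result line looks like: [box, [text, score]].
    Lines that do not match (logs, blanks, "......") are skipped.
    """
    items = []
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw.startswith("["):
            continue
        try:
            box, (txt, score) = ast.literal_eval(raw)
        except (ValueError, SyntaxError, TypeError):
            continue  # not a full detection+recognition line
        if score >= min_score:
            items.append({"box": box, "text": txt, "score": score})
    return items


sample = """
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
......
"""

# Keep only lines recognized with a confidence of at least 0.95
for item in parse_ocr_lines(sample, min_score=0.95):
    print(item["text"], item["score"])
```

With the sample above, only the `ACKNOWLEDGEMENTS` line passes the 0.95 threshold. Detection-only and recognition-only outputs have a different shape and are skipped by this sketch.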

If you need to use the 2.0 model, specify the parameter `--version PP-OCR`; paddleocr uses the 2.1 model by default (`--version PP-OCRv2`). More whl package usage can be found in [whl package](./whl_en.md).

<a name="212-multi-language-model"></a>

#### 2.1.2 Multi-language Model

PaddleOCR currently supports 80 languages, which can be switched by modifying the `--lang` parameter.

```bash
paddleocr --image_dir ./imgs_en/254.jpg --lang=en
```

<div align="center">
    <img src="../imgs_en/254.jpg" width="300" height="600">
    <img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
</div>

The result is a list; each item contains the text, the recognition confidence and the text box

```text
[('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]]
[('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]]
[('Montpelier Vermont', 0.97389287), [[69.0, 132.0], [501.0, 126.0], [501.0, 158.0], [70.0, 164.0]]]
[('8022256183', 0.99810505), [[71.0, 175.0], [363.0, 170.0], [364.0, 202.0], [72.0, 207.0]]]
[('REG 07-24-201706:59 PM', 0.93537045), [[73.0, 299.0], [653.0, 281.0], [654.0, 318.0], [74.0, 336.0]]]
[('045555', 0.99346405), [[509.0, 331.0], [651.0, 325.0], [652.0, 356.0], [511.0, 362.0]]]
[('CT1', 0.9988654), [[535.0, 367.0], [654.0, 367.0], [654.0, 406.0], [535.0, 406.0]]]
......
```

Commonly used language abbreviations include

| Language            | Abbreviation |      | Language | Abbreviation |      | Language | Abbreviation |
| ------------------- | ------------ | ---- | -------- | ------------ | ---- | -------- | ------------ |
| Chinese & English   | ch           |      | French   | fr           |      | Japanese | japan        |
| English             | en           |      | German   | german       |      | Korean   | korean       |
| Chinese Traditional | chinese_cht  |      | Italian  | it           |      | Russian  | ru           |

A list of all languages and their corresponding abbreviations can be found in [Multi-Language Model Tutorial](./multi_languages_en.md)
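When the language is chosen programmatically, the abbreviations from the table above can be kept in a small lookup. This is a sketch only: `LANG_ABBR` and `build_command` are hypothetical names, and the function merely assembles the command line shown in this section without running it.

```python
import shlex

# Abbreviations copied from the table of commonly used languages.
LANG_ABBR = {
    "Chinese & English": "ch",
    "English": "en",
    "Chinese Traditional": "chinese_cht",
    "French": "fr",
    "German": "german",
    "Italian": "it",
    "Japanese": "japan",
    "Korean": "korean",
    "Russian": "ru",
}


def build_command(image_path, language):
    """Return the paddleocr command (as an argument list) for a language name."""
    try:
        abbr = LANG_ABBR[language]
    except KeyError:
        raise ValueError(f"no abbreviation known for {language!r}") from None
    return ["paddleocr", "--image_dir", image_path, "--lang", abbr]


cmd = build_command("./imgs_en/img_12.jpg", "English")
print(shlex.join(cmd))  # shell-ready string; pass `cmd` to subprocess.run to execute
```

Returning a list instead of a string avoids quoting problems if the image path contains spaces.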
<a name="213-layoutAnalysis"></a>

#### 2.1.3 Layout Analysis

Layout analysis divides a document into five types of regions: text, title, list, picture and table. For the first three types, the OCR model is applied directly to detect and recognize the text in the region, and the results are saved as txt. A table region goes through table structure recognition and is converted into an Excel file with the same table style. A picture region is cropped out and saved as a separate image.

To use the layout analysis function of PaddleOCR, you need to specify `--type=structure`

```bash
paddleocr --image_dir=./table/1.png --type=structure
```

- **Results Format**

  The result returned by PP-Structure is a list of dicts, for example

  ```shell
  [
    {   'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                  [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
    }
  ]
  ```

  The fields of each dict are described below

  | Parameter | Description                                                  |
  | --------- | ------------------------------------------------------------ |
  | type      | Type of image area                                           |
  | bbox      | The coordinates of the image area in the original image, in the order [top-left x, top-left y, bottom-right x, bottom-right y] |
  | res       | OCR or table recognition result of the image area. <br> Table: HTML string of the table; <br> OCR: A tuple containing the detection coordinates and recognition results of each single line of text |

- **Parameter Description:**

  | Parameter       | Description                                                  | Default value                                |
  | --------------- | ------------------------------------------------------------ | -------------------------------------------- |
  | output          | The path where Excel files and recognition results are saved | ./output/table                               |
  | table_max_len   | Length to which the long side of the image is resized for the table structure model | 488                                          |
  | table_model_dir | Inference model path of the table structure model            | None                                         |
  | table_char_type | Dict path of the table structure model                       | ../ppocr/utils/dict/table_structure_dict.txt |
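To make the result format concrete, the sketch below walks one region of the example result: it derives the region size from `bbox` and pairs each flattened 8-value polygon in `res` with its recognized text. The helper names `polygon_to_points` and `summarize_region` are hypothetical, and the handling of table regions assumes `res` is an HTML string as described in the field table.

```python
def polygon_to_points(flat):
    """Turn [x1, y1, x2, y2, x3, y3, x4, y4] into [(x1, y1), ..., (x4, y4)]."""
    return list(zip(flat[0::2], flat[1::2]))


def summarize_region(region):
    """Summarize one region dict from a PP-Structure style result."""
    left, top, right, bottom = region["bbox"]
    summary = {
        "type": region["type"],
        "width": right - left,
        "height": bottom - top,
        "lines": [],
    }
    res = region["res"]
    # OCR regions: `res` is a (boxes, texts) tuple. Table regions: an HTML string.
    if isinstance(res, tuple):
        boxes, texts = res
        for flat, (text, score) in zip(boxes, texts):
            summary["lines"].append(
                {"points": polygon_to_points(flat), "text": text, "score": score}
            )
    return summary


# Example result copied from the "Results Format" block
sample = [
    {
        "type": "Text",
        "bbox": [34, 432, 345, 462],
        "res": (
            [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
             [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [("Tigure-6. The performance of CNN and IPT models using difforen", 0.90060663),
             ("Tent  ", 0.465441)],
        ),
    }
]

for region in sample:
    info = summarize_region(region)
    print(info["type"], info["width"], info["height"], len(info["lines"]))
```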

<a name="22-use-by-code"></a>

### 2.2 Use by Code

<a name="221-chinese---english-model-and-multilingual-model"></a>

#### 2.2.1 Chinese & English Model and Multilingual Model

* Detection, angle classification and recognition:

```python
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the parameter `lang` to `ch`, `en`, `fr`, `german`, `korean` or `japan`
# to select the corresponding language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en')  # only needs to run once to download the model and load it into memory
img_path = './imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# Draw the result: each line is [box, (text, score)]
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the text and the recognition confidence

```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>
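Besides drawing, the result list is often reduced to plain text. The sketch below assumes `result` has the shape printed in this section (a flat list of `[box, [text, score]]` items); other package versions may nest the list differently. `quad_to_rect` and `extract_text` are hypothetical helpers that convert each quadrilateral to an axis-aligned rectangle, drop low-confidence lines, and join the text from top to bottom.

```python
def quad_to_rect(quad):
    """Convert four [x, y] corner points to (left, top, right, bottom)."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)


def extract_text(result, min_score=0.5):
    """Join recognized text in reading order (top to bottom, then left to right)."""
    kept = [
        (quad_to_rect(box), text)
        for box, (text, score) in result
        if score >= min_score
    ]
    kept.sort(key=lambda item: (item[0][1], item[0][0]))  # sort by top, then left
    return "\n".join(text for _, text in kept)


# Items copied from the example output
result = [
    [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]],
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]],
]

print(extract_text(result))
```

Sorting by the top edge is a simplification that suits single-column pages; multi-column layouts need a more careful ordering.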
<a name="222-layoutAnalysis"></a>

#### 2.2.2 Layout Analysis

```python
import os
import cv2
from paddleocr import PPStructure,draw_structure_result,save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = './table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
save_structure_res(result, save_folder,os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')
    print(line)

from PIL import Image

font_path = './fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result,font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
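The example above derives the output sub-folder name with `os.path.basename(img_path).split('.')[0]`, which keeps only the part before the first dot. The sketch below compares that with `pathlib.Path.stem` for a file name containing an extra dot; the file name `report.v2.png` and both helper names are made up for illustration.

```python
import os
from pathlib import Path


def name_by_split(img_path):
    """Name used in the example above: everything before the first dot."""
    return os.path.basename(img_path).split('.')[0]


def name_by_stem(img_path):
    """Alternative: strip only the final extension."""
    return Path(img_path).stem


for path in ('./table/1.png', './table/report.v2.png'):
    print(path, name_by_split(path), name_by_stem(path))
```

Both approaches agree for simple names such as `1.png`; they differ only when the file name itself contains dots.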

<a name="3"></a>

## 3. Summary

In this section, you have learned how to use the PaddleOCR whl package and obtained recognition results.

PaddleOCR is a rich and practical OCR tool library that covers the whole process of data, model training, compression and inference deployment. The [next section](./paddleOCR_overview_en.md) first gives an overview of PaddleOCR and then walks through cloning the PaddleOCR project to start using it.