# PaddleOCR Quick Start

[PaddleOCR Quick Start](#paddleocr-quick-start)

* [1. Light Installation](#1-light-installation)
  + [1.1 Install PaddlePaddle2.0](#11-install-paddlepaddle20)
  + [1.2 Install PaddleOCR Whl Package](#12-install-paddleocr-whl-package)
* [2. Easy-to-Use](#2-easy-to-use)
  + [2.1 Use by command line](#21-use-by-command-line)
    - [2.1.1 Chinese and English Model](#211-english-and-chinese-model)
    - [2.1.2 Multi-language Model](#212-multi-language-model)
    - [2.1.3 LayoutParser](#213-layoutparser)
  + [2.2 Use by Code](#22-use-by-code)
    - [2.2.1 Chinese & English Model and Multilingual Model](#221-chinese---english-model-and-multilingual-model)
    - [2.2.2 LayoutParser](#222-layoutparser)

<a name="1-light-installation"></a>

## 1. Light Installation

<a name="11-install-paddlepaddle20"></a>

### 1.1 Install PaddlePaddle2.0

```bash
# If CUDA 9 or CUDA 10 is installed on your machine, install the GPU build
python3 -m pip install paddlepaddle-gpu==2.0.0 -i https://mirror.baidu.com/pypi/simple

# If your machine has only a CPU, install the CPU build
python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple
```

For other software version requirements, see the [Installation Document](https://www.paddlepaddle.org.cn/install/quick).

<a name="12-install-paddleocr-whl-package"></a>

### 1.2 Install PaddleOCR Whl Package

```bash
pip install "paddleocr>=2.0.1" # Version 2.0.1 or later is recommended
```

- **For Windows users:** If you get the error `OSError: [WinError 126] The specified module could not be found` when installing Shapely on Windows, try downloading the Shapely whl file [here](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely).

  Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)

- **For layout analysis users:** run the following command to install **Layout-Parser**:

  ```bash
  pip3 install -U https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
  ```
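Before moving on, it can help to confirm that the packages are importable. The following is a minimal sketch using only the Python standard library; the import names in `PACKAGES` are assumptions about how each package is exposed, and `find_missing` is a hypothetical helper, not part of PaddleOCR.

```python
import importlib.util

# Display name -> import name. The import names are assumptions;
# adjust them if your environment exposes the packages differently.
PACKAGES = {
    "paddlepaddle": "paddle",
    "paddleocr": "paddleocr",
    "Shapely": "shapely",
    "Layout-Parser": "layoutparser",
}


def find_missing(packages):
    """Return the display names whose import module cannot be located."""
    return [
        name
        for name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


missing = find_missing(PACKAGES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```

Layout-Parser is needed only for layout analysis, so a report that it is missing can be ignored otherwise.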

<a name="2-easy-to-use"></a>

## 2. Easy-to-Use

<a name="21-use-by-command-line"></a>

### 2.1 Use by command line

PaddleOCR provides a set of test images. Click [here](https://paddleocr.bj.bcebos.com/dygraph_v2.1/ppocr_img.zip) to download them, then switch to the corresponding directory in the terminal:

```bash
cd /path/to/ppocr_img
```

If you do not use the provided test images, replace the `--image_dir` parameter in the commands below with the path to your own image.

<a name="211-english-and-chinese-model"></a>

#### 2.1.1 Chinese and English Model

* Detection, direction classification and recognition: set the direction classifier parameter `--use_angle_cls true` to recognize vertical text.

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en
  ```

  The output is a list; each item contains a bounding box, the text, and the recognition confidence:

  ```bash
  [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
  [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
  [[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
  ......
  ```

* Only detection: set `--rec` to `false`

  ```bash
  paddleocr --image_dir ./imgs_en/img_12.jpg --rec false
  ```

  The output is a list; each item contains only a bounding box:

  ```bash
  [[756.0, 812.0], [805.0, 812.0], [805.0, 830.0], [756.0, 830.0]]
  [[820.0, 803.0], [1085.0, 801.0], [1085.0, 836.0], [820.0, 838.0]]
  [[393.0, 801.0], [715.0, 805.0], [715.0, 839.0], [393.0, 836.0]]
  ......
  ```

* Only recognition: set `--det` to `false`

  ```bash
  paddleocr --image_dir ./imgs_words_en/word_10.png --det false --lang en
  ```

  The output is a list; each item contains the text and the recognition confidence:

  ```bash
  ['PAIN', 0.990372]
  ```
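The outputs above are ordinary nested lists, so they can be post-processed without any PaddleOCR calls. The following is a minimal sketch that uses two items copied from the sample output of the full pipeline; `axis_aligned_rect` and `confident_texts` are hypothetical helper names, and the 0.95 threshold is an arbitrary choice for illustration.

```python
# Two items copied from the sample output of detection + recognition.
items = [
    [[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]],
     ['ACKNOWLEDGEMENTS', 0.99283075]],
    [[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]],
     ['We would like to thank all the designers and', 0.9357758]],
]


def axis_aligned_rect(box):
    """Collapse a four-point box into (x_min, y_min, x_max, y_max)."""
    xs = [point[0] for point in box]
    ys = [point[1] for point in box]
    return min(xs), min(ys), max(xs), max(ys)


def confident_texts(ocr_items, threshold):
    """Return the text of every item whose confidence is at least `threshold`."""
    return [text for _box, (text, score) in ocr_items if score >= threshold]


for box, (text, score) in items:
    print(axis_aligned_rect(box), text, score)

print(confident_texts(items, 0.95))
```

`axis_aligned_rect` also works on the detection-only output, where each item is just the four-point box.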

For more whl package usage, see [whl package](./whl_en.md).

<a name="212-multi-language-model"></a>

#### 2.1.2 Multi-language Model

PaddleOCR currently supports 80 languages; switch between them by modifying the `--lang` parameter.

``` bash
paddleocr --image_dir ./doc/imgs_en/254.jpg --lang=en
```

<div align="center">
    <img src="../imgs_en/254.jpg" width="300" height="600">
    <img src="../imgs_results/multi_lang/img_02.jpg" width="600" height="600">
</div>

The result is a list; each item contains the text, the recognition confidence, and a text box:

```text
[('PHO CAPITAL', 0.95723116), [[66.0, 50.0], [327.0, 44.0], [327.0, 76.0], [67.0, 82.0]]]
[('107 State Street', 0.96311164), [[72.0, 90.0], [451.0, 84.0], [452.0, 116.0], [73.0, 121.0]]]
[('Montpelier Vermont', 0.97389287), [[69.0, 132.0], [501.0, 126.0], [501.0, 158.0], [70.0, 164.0]]]
[('8022256183', 0.99810505), [[71.0, 175.0], [363.0, 170.0], [364.0, 202.0], [72.0, 207.0]]]
[('REG 07-24-201706:59 PM', 0.93537045), [[73.0, 299.0], [653.0, 281.0], [654.0, 318.0], [74.0, 336.0]]]
[('045555', 0.99346405), [[509.0, 331.0], [651.0, 325.0], [652.0, 356.0], [511.0, 362.0]]]
[('CT1', 0.9988654), [[535.0, 367.0], [654.0, 367.0], [654.0, 406.0], [535.0, 406.0]]]
......
```
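Switching languages from a script amounts to changing a single argument. The sketch below builds the command line used in this section for a given language. `LANG_ABBREVIATIONS` copies a few entries from this section's table of commonly used abbreviations, and `build_command` is a hypothetical helper, not part of PaddleOCR.

```python
import shlex

# A few abbreviations copied from the table of commonly used languages.
LANG_ABBREVIATIONS = {
    "Chinese & English": "ch",
    "English": "en",
    "French": "fr",
    "German": "german",
    "Japanese": "japan",
    "Korean": "korean",
}


def build_command(image_path, language):
    """Return the paddleocr command-line arguments for one image and language."""
    lang = LANG_ABBREVIATIONS[language]  # raises KeyError for unlisted languages
    return ["paddleocr", "--image_dir", image_path, f"--lang={lang}"]


# Print the command as a shell-safe string; pass the list itself
# to subprocess.run() to actually execute it.
print(shlex.join(build_command("./doc/imgs_en/254.jpg", "English")))
```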

Commonly used multilingual abbreviations include:

| Language            | Abbreviation |      | Language | Abbreviation |      | Language | Abbreviation |
| ------------------- | ------------ | ---- | -------- | ------------ | ---- | -------- | ------------ |
| Chinese & English   | ch           |      | French   | fr           |      | Japanese | japan        |
| English             | en           |      | German   | german       |      | Korean   | korean       |
| Chinese Traditional | chinese_cht  |      | Italian  | it           |      | Russian  | ru           |

A list of all languages and their corresponding abbreviations can be found in the [Multi-Language Model Tutorial](./multi_languages_en.md).

<a name="213-layoutparser"></a>

#### 2.1.3 LayoutParser

To use the layout analysis function of PaddleOCR, specify `--type=structure`:

```bash
paddleocr --image_dir=../doc/table/1.png --type=structure
```

- **Results Format**

  PP-Structure returns a list of dicts; an example is as follows:

  ```shell
  [
    {   'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': ([[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0], [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
                  [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663), ('Tent  ', 0.465441)])
    }
  ]
  ```

  The fields of each dict are described below:

  | Parameter | Description                                                  |
  | --------- | ------------------------------------------------------------ |
  | type      | Type of image area                                           |
  | bbox      | Coordinates of the image area in the original image: [top-left x, top-left y, bottom-right x, bottom-right y] |
  | res       | OCR or table recognition result of the image area. <br> Table: HTML string of the table; <br> OCR: a tuple containing the detection coordinates and recognition results of each single line of text |

- **Parameter Description:**

  | Parameter       | Description                                                  | Default value                                |
  | --------------- | ------------------------------------------------------------ | -------------------------------------------- |
  | output          | Path where Excel files and recognition results are saved     | ./output/table                               |
  | table_max_len   | Length to which the long side of the image is resized for the table structure model | 488                                          |
  | table_model_dir | Inference model path of the table structure model            | None                                         |
  | table_char_type | Dictionary path of the table structure model                 | ../ppocr/utils/dict/table_structure_dict.txt |

<a name="22-use-by-code"></a>

### 2.2 Use by Code

<a name="221-chinese---english-model-and-multilingual-model"></a>

#### 2.2.1 Chinese & English Model and Multilingual Model

* Detection, angle classification and recognition:

```python
from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the `lang` parameter to `ch`, `en`, `fr`, `german`, `korean` or `japan`
# respectively to switch the language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en')  # needs to run only once to download and load the model into memory
img_path = './imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)

# Draw the result
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```

The output is a list; each item contains a bounding box, the text, and the recognition confidence:

```bash
[[[442.0, 173.0], [1169.0, 173.0], [1169.0, 225.0], [442.0, 225.0]], ['ACKNOWLEDGEMENTS', 0.99283075]]
[[[393.0, 340.0], [1207.0, 342.0], [1207.0, 389.0], [393.0, 387.0]], ['We would like to thank all the designers and', 0.9357758]]
[[[399.0, 398.0], [1204.0, 398.0], [1204.0, 433.0], [399.0, 433.0]], ['contributors whohave been involved in the', 0.9592447]]
......
```

Visualization of results

<div align="center">
    <img src="../imgs_results/whl/12_det_rec.jpg" width="800">
</div>
<a name="222-layoutparser"></a>

#### 2.2.2 LayoutParser

```python
import os

import cv2
from PIL import Image
from paddleocr import PPStructure, draw_structure_result, save_structure_res

table_engine = PPStructure(show_log=True)

save_folder = './output/table'
img_path = './table/1.png'
img = cv2.imread(img_path)
result = table_engine(img)
# Save the results, named after the image file without its extension
save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

for line in result:
    line.pop('img')  # drop the 'img' entry before printing
    print(line)

# Draw the layout analysis result
font_path = './fonts/simfang.ttf'
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
```
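Each dict in `result` follows the format described in section 2.1.3, so a `Text` region can be unpacked with plain Python. The following is a minimal sketch that works on the sample dict from that section rather than on a live `PPStructure` call; `to_points`, `text_lines` and `region_size` are hypothetical helper names.

```python
# Sample region copied from the results format in section 2.1.3.
sample_result = [
    {
        'type': 'Text',
        'bbox': [34, 432, 345, 462],
        'res': (
            [[36.0, 437.0, 341.0, 437.0, 341.0, 446.0, 36.0, 447.0],
             [41.0, 454.0, 125.0, 453.0, 125.0, 459.0, 41.0, 460.0]],
            [('Tigure-6. The performance of CNN and IPT models using difforen', 0.90060663),
             ('Tent  ', 0.465441)],
        ),
    }
]


def to_points(flat):
    """Turn [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


def text_lines(region):
    """Yield (points, text, score) for each text line of an OCR region."""
    polygons, recognitions = region['res']
    for flat, (text, score) in zip(polygons, recognitions):
        yield to_points(flat), text, score


def region_size(region):
    """Return (width, height) of the region from its bbox."""
    left, top, right, bottom = region['bbox']
    return right - left, bottom - top


for region in sample_result:
    if region['type'] == 'Text':
        print(region_size(region))
        for points, text, score in text_lines(region):
            print(points, repr(text), score)
```

For table regions `res` is an HTML string rather than a tuple, so `text_lines` should only be applied after checking `type`, as the loop does.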