# Getting Started

[1. Install whl package](#Install whl package)

[2. Quick Start](#Quick Start)

[3. PostProcess](#PostProcess)

[4. Results](#Results)

[5. Training](#Training)

<a name="Install whl package"></a>

## 1. Install whl package
```bash
wget https://paddleocr.bj.bcebos.com/whl/layoutparser-0.0.0-py3-none-any.whl
pip install -U layoutparser-0.0.0-py3-none-any.whl
```

<a name="Quick Start"></a>

## 2. Quick Start

Use LayoutParser to identify the layout of a given document:

```python
import cv2
import layoutparser as lp
image = cv2.imread("doc/table/layout.jpg")
# OpenCV loads images as BGR; reverse the channel axis to get RGB
image = image[..., ::-1]

# load model
model = lp.PaddleDetectionLayoutModel(
    config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
    threshold=0.5,
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    enforce_cpu=False,
    enable_mkldnn=True,
)
# detect
layout = model.detect(image)

# show result
show_img = lp.draw_box(image, layout, box_width=3, show_element_type=True)
show_img.show()
```

The following figure shows the result. Detection boxes are colored by category, and with `show_element_type=True` the category name is drawn in the upper-left corner of each box:

<div align="center">
<img src="../../doc/table/result_all.jpg"  width = "600" />
</div>

The `PaddleDetectionLayoutModel` parameters are described as follows:

|   parameter    |                       description                        |   default   |                            remark                            |
| :------------: | :------------------------------------------------------: | :---------: | :----------------------------------------------------------: |
|  config_path   |                    model config path                     |    None     | Specifying `config_path` downloads the model automatically (first use only; afterwards the already-downloaded model is reused) |
|   model_path   |                     local model path                     |    None     | At least one of `config_path` and `model_path` must be set; they cannot both be None |
|   threshold    |              threshold of prediction score               |     0.5     |                              \                               |
|  input_shape   |             input image shape after resizing             | [3,640,640] |                              \                               |
|   batch_size   |                   inference batch size                   |      1      |                              \                               |
|   label_map    |                  category mapping table                  |    None     | May be None when `config_path` is set; the label map is then obtained automatically from the dataset name |
|  enforce_cpu   |               whether to force CPU inference             |    False    |         False uses the GPU; True forces the use of the CPU         |
| enable_mkldnn  | whether MKL-DNN acceleration is enabled for CPU inference |    True     |                              \                               |
|   thread_num   |                the number of CPU threads                 |     10      |                              \                               |
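
As an illustration of the table, the sketch below loads a model from a local directory and forces CPU inference. It uses only parameter names listed above and assumes they behave as described; `load_cpu_model` and the directory passed to it are hypothetical, and `label_map` is given explicitly because `config_path` is not set.

```python
def load_cpu_model(model_dir):
    """Load a layout model from a local directory and run it on the CPU.

    `model_dir` is a hypothetical path to a model that is already on disk.
    """
    # Imported lazily so this helper can be defined without the package installed.
    import layoutparser as lp

    return lp.PaddleDetectionLayoutModel(
        model_path=model_dir,   # local model, so nothing is downloaded
        threshold=0.5,          # keep detections scoring at least 0.5
        label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
        enforce_cpu=True,       # force CPU even if a GPU is available
        thread_num=4,           # assumption: fewer threads than the default of 10
    )
```

Calling `load_cpu_model("path/to/model_dir")` should then return a model whose `detect` method is used exactly as in the Quick Start code.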

The following model configurations and label maps are currently supported. Change the `config_path` and `label_map` arguments to detect different types of content:

| dataset                                                      | config_path                                                  | label_map                                                 |
| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------------------------------------------- |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) word | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config | {0:"Table"}                                               |
| TableBank latex                                              | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_latex/config | {0:"Table"}                                               |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet)        | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config      | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"} |

* TableBank word and TableBank latex are trained on datasets of Word documents and LaTeX documents respectively;
* The downloadable TableBank dataset contains both the Word and LaTeX subsets.
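
One way to keep these combinations in a single place is a small lookup table. The sketch below is only an illustration: `SUPPORTED_MODELS` and `model_kwargs` are hypothetical names, not part of layoutparser, and the values are copied from the table above.

```python
# Hypothetical registry mirroring the supported-model table above.
SUPPORTED_MODELS = {
    "TableBank word": {
        "config_path": "lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config",
        "label_map": {0: "Table"},
    },
    "TableBank latex": {
        "config_path": "lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_latex/config",
        "label_map": {0: "Table"},
    },
    "PubLayNet": {
        "config_path": "lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",
        "label_map": {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    },
}


def model_kwargs(dataset, **overrides):
    """Return keyword arguments for the model constructor for one dataset."""
    if dataset not in SUPPORTED_MODELS:
        known = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(f"unknown dataset {dataset!r}; expected one of: {known}")
    entry = SUPPORTED_MODELS[dataset]
    # Copy the label map so callers cannot mutate the registry by accident.
    kwargs = {"config_path": entry["config_path"], "label_map": dict(entry["label_map"])}
    kwargs.update(overrides)
    return kwargs


kwargs = model_kwargs("TableBank latex", threshold=0.6)
print(kwargs["config_path"])
```

With layoutparser installed, the result can be passed on as `lp.PaddleDetectionLayoutModel(**kwargs)`.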

<a name="PostProcess"></a>

## 3. PostProcess

The layout model detects multiple categories. If you only want the detection boxes for a specific category (such as "Text"), you can use the following code:

```python
# continues from the Quick Start code above
# keep only the blocks of a given type
text_blocks = lp.Layout([b for b in layout if b.type == 'Text'])
figure_blocks = lp.Layout([b for b in layout if b.type == 'Figure'])

# text may be detected inside a figure; drop those text blocks
text_blocks = lp.Layout([b for b in text_blocks
                         if not any(b.is_in(b_fig) for b_fig in figure_blocks)])

# sort the text blocks into reading order and assign IDs
h, w = image.shape[:2]

# the left column spans slightly more than half of the page width
left_interval = lp.Interval(0, w / 2 * 1.05, axis='x').put_on_canvas(image)

# blocks whose center lies in the left interval, sorted top to bottom
left_blocks = text_blocks.filter_by(left_interval, center=True)
left_blocks.sort(key=lambda b: b.coordinates[1])

# the remaining blocks form the right column, sorted top to bottom
right_blocks = [b for b in text_blocks if b not in left_blocks]
right_blocks.sort(key=lambda b: b.coordinates[1])

# merge the two lists and number the blocks in order
text_blocks = lp.Layout([b.set(id=idx) for idx, b in enumerate(left_blocks + right_blocks)])

# display result
show_img = lp.draw_box(image, text_blocks,
            box_width=3, 
            show_element_id=True)
show_img.show()
```
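
The filtering and two-column ordering in the code above can also be illustrated without layoutparser. The sketch below uses plain tuples and made-up boxes; `Box`, `is_inside`, and `reading_order` are hypothetical names, and `is_inside` assumes strict containment, which may differ from how `is_in` treats borderline cases.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float
    type: str


def is_inside(inner: Box, outer: Box) -> bool:
    """True if `inner` lies completely within `outer` (strict containment)."""
    return (outer.x1 <= inner.x1 and outer.y1 <= inner.y1
            and inner.x2 <= outer.x2 and inner.y2 <= outer.y2)


def reading_order(boxes: List[Box], page_width: float) -> List[Box]:
    """Keep text boxes outside figures; order left column first, then right."""
    figures = [b for b in boxes if b.type == "Figure"]
    texts = [b for b in boxes
             if b.type == "Text" and not any(is_inside(b, f) for f in figures)]
    # Same split as above: the left column is slightly wider than half the page.
    split_x = page_width / 2 * 1.05
    left = [b for b in texts if (b.x1 + b.x2) / 2 <= split_x]
    right = [b for b in texts if (b.x1 + b.x2) / 2 > split_x]
    # Within each column, sort top to bottom by the upper edge.
    return sorted(left, key=lambda b: b.y1) + sorted(right, key=lambda b: b.y1)


page = [
    Box(520, 40, 960, 300, "Text"),      # right column, top
    Box(40, 400, 480, 700, "Text"),      # left column, bottom
    Box(40, 40, 480, 360, "Text"),       # left column, top
    Box(520, 340, 960, 700, "Figure"),   # figure in the right column
    Box(560, 380, 900, 420, "Text"),     # text detected inside the figure
]
ordered = reading_order(page, page_width=1000)
for idx, box in enumerate(ordered):
    print(idx, box)
```

For this made-up page, the text inside the figure is dropped and the three remaining boxes are numbered left column top to bottom, then right column.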

Result of the layoutparser code, showing only the "Text" blocks with their IDs:

<div align="center">
<img src="../../doc/table/result_text.jpg"  width = "600" />
</div>

<a name="Results"></a>

## 4. Results

| Dataset   | mAP  | CPU time cost | GPU time cost |
| --------- | ---- | ------------- | ------------- |
| PubLayNet | 93.6 | 1713.7ms      | 66.6ms        |
| TableBank | 96.2 | 1968.4ms      | 65.1ms        |

**Environment:**

**CPU:**  Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 24 cores

**GPU:**  a single NVIDIA Tesla P40
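
To put the two timing columns in relation, the short sketch below computes the GPU speedup over the CPU from the values in the table. It assumes both columns measure the same workload on the hardware listed above; the variable names are illustrative.

```python
# Timings in milliseconds, copied from the results table.
timings_ms = {
    "PubLayNet": {"cpu": 1713.7, "gpu": 66.6},
    "TableBank": {"cpu": 1968.4, "gpu": 65.1},
}

# Speedup = CPU time divided by GPU time.
speedup = {name: t["cpu"] / t["gpu"] for name, t in timings_ms.items()}

for name, ratio in speedup.items():
    print(f"{name}: GPU is about {ratio:.1f}x faster than CPU")
```

On these numbers the GPU is roughly 26x faster for PubLayNet and roughly 30x faster for TableBank.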

<a name="Training"></a>

## 5. Training

The models above are based on [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). If you want to train your own layout parser model, please refer to [train_layoutparser_model](train_layoutparser_model_en.md).