@@ -8,7 +8,10 @@ PPOCRLabel is a semi-automatic graphic annotation tool suitable for OCR field, w
### Recent Update
- 2021.11.17:
- Support installing and starting PPOCRLabel through the whl package (by [d2623587501](https://github.com/d2623587501))
- Dataset segmentation: Divide the annotation file into training, validation and testing parts (refer to section 3.5 below, by [MrCuiHao](https://github.com/MrCuiHao))
- 2021.8.11:
- New functions: Open the dataset folder, image rotation (Note: Please delete the label box before rotating the image) (by [Wei-JL](https://github.com/Wei-JL))
- Added shortcut key description (Help-Shortcut Key), repaired the direction shortcut key movement function under batch processing (by [d2623587501](https://github.com/d2623587501))
- 2021.2.5: New batch processing and undo functions (by [Evezerest](https://github.com/Evezerest)):
...
@@ -21,11 +24,11 @@ PPOCRLabel is a semi-automatic graphic annotation tool suitable for OCR field, w
- Click to modify the recognition result. (If you can't change the result, please switch to the system default input method, or switch back to the original input method.)
- 2020.12.18: Support re-recognition of a single label box (by [ninetailskim](https://github.com/ninetailskim)), improved shortcut keys.
For more software version requirements, please refer to the instructions in [Installation Document](https://www.paddlepaddle.org.cn/install/quick) for operation.
# Note: The cloud-hosting code may not be able to synchronize the update with this GitHub project in real time. There might be a delay of 3-5 days. Please give priority to the recommended method.
PPOCRLabel can be started in two ways: from the whl package or from the Python script. The whl package is more convenient to launch, while the Python script is better suited to secondary development.
```
#### **Install Third-party Libraries**
#### Windows
```bash
cd PaddleOCR
pip install PPOCRLabel # install
pip3 install -r requirements.txt
PPOCRLabel # run
```
> If you get the error `OSError: [WinError 126] The specified module could not be found` when installing Shapely on Windows, try downloading the Shapely whl file from http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely.
>
> Reference: [Solve shapely installation on windows](https://stackoverflow.com/questions/44398265/install-shapely-oserror-winerror-126-the-specified-module-could-not-be-found)
### 1.2 Install PPOCRLabel
#### Ubuntu Linux
#### Windows
```bash
pip3 install PPOCRLabel
pip3 install trash-cli
PPOCRLabel
```
#### MacOS
```bash
pip install pyqt5
pip3 install PPOCRLabel
pip3 uninstall opencv-python # Uninstall opencv manually as it conflicts with pyqt
pip3 install opencv-contrib-python-headless==4.2.0.32 # Install the headless version of opencv
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
python PPOCRLabel.py --lang ch
python3 PPOCRLabel.py
```
## 2. Usage
### 2.1 Steps
...
@@ -119,7 +118,7 @@ python3 PPOCRLabel.py
10. Labeling result: the user can export the label results manually through the menu "File - Export Label"; the program also exports them automatically if "File - Auto export Label Mode" is selected. Manually checked labels are stored in *Label.txt* under the opened picture folder. Clicking "File" - "Export Recognition Results" in the menu bar saves the recognition training data for these pictures in the *crop_img* folder and the recognition labels in *rec_gt.txt*<sup>[4]</sup>.
### 2.2 Note
[1] PPOCRLabel uses the opened folder as the project. After opening the image folder, the picture will not be displayed in the dialog. Instead, the pictures under the folder will be directly imported into the program after clicking "Open Dir".
...
@@ -137,6 +136,8 @@ python3 PPOCRLabel.py
| rec_gt.txt | The recognition label file, which can be used directly for PPOCR recognition model training. It is generated after the user clicks "File" - "Export recognition result" in the menu bar. |
| crop_img | The recognition data, generated at the same time as *rec_gt.txt* |
## 3. Explanation
### 3.1 Shortcut keys
...
@@ -189,7 +190,26 @@ For some data that are difficult to recognize, the recognition results will not
> *Note: The status of the checkboxes in the recognition results still needs to be saved manually by clicking Save Button.*
### 3.5 Dataset division
- Enter the following command in the terminal to execute the dataset division script:
```
cd ./PPOCRLabel # Change the directory to the PPOCRLabel folder
```

- `trainValTestRatio` is the ratio of image counts among the training, validation, and test sets. Set it according to your needs; the default is `6:2:2`.
- `labelRootPath` is the storage path of the dataset labeled by PPOCRLabel. The default is `../train_data/label`.
- `detRootPath` is the path where the text detection dataset, divided from the dataset labeled by PPOCRLabel, is stored. The default is `../train_data/det`.
- `recRootPath` is the path where the character recognition dataset, divided from the dataset labeled by PPOCRLabel, is stored. The default is `../train_data/rec`.
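
As a rough illustration of what a `6:2:2` division means, the sketch below splits a list of image names by such a ratio. It is not the division script shipped with PPOCRLabel: the function name `split_by_ratio`, the fixed shuffling seed, and the rule that rounding remainders go to the test set are all assumptions made for this example.

```python
import random


def split_by_ratio(items, ratio="6:2:2", seed=0):
    """Shuffle items and split them into (train, val, test) by an 'a:b:c' ratio.

    Hypothetical helper for illustration only; not part of PPOCRLabel.
    """
    parts = [int(p) for p in ratio.split(":")]
    if len(parts) != 3 or min(parts) < 0 or sum(parts) == 0:
        raise ValueError("ratio must look like '6:2:2'")
    total = sum(parts)

    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = n * parts[0] // total
    n_val = n * parts[1] // total

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    images = [f"img_{i}.jpg" for i in range(10)]
    train, val, test = split_by_ratio(images, "6:2:2")
    print(len(train), len(val), len(test))
```

With ten images and the default ratio, this yields six training, two validation, and two test images.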
### 3.6 Error message
- If paddleocr is installed from the whl package, it takes priority over calling the PaddleOCR class in paddleocr.py, which may cause an exception if the whl package is out of date.
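
To see which copy of `paddleocr` Python would actually import, a small check like the one below may help. It is only a diagnostic sketch using the standard library's `importlib.util.find_spec`; the helper name `locate_module` is an assumption for this example and is not part of PPOCRLabel or PaddleOCR.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # A path inside site-packages suggests the installed whl package is being used;
    # a path inside the repository suggests the local paddleocr.py is being used.
    print(locate_module("paddleocr"))
```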
...
@@ -207,24 +227,8 @@ For some data that are difficult to recognize, the recognition results will not