# Contents

- [WarpCTC Description](#warpctc-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Dataset Preparation](#dataset-preparation)
- [Training Process](#training-process)
    - [Training](#training)
    - [Distributed Training](#distributed-training)
- [Evaluation Process](#evaluation-process)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [WarpCTC Description](#contents)
This is an example of training WarpCTC with a self-generated captcha image dataset in MindSpore.
# [Model Architecture](#contents)
WarpCTC consists of a two-layer stacked LSTM followed by a one-layer fully-connected (FC) network. See `src/warpctc.py` for details.
# [Dataset](#contents)
The dataset is self-generated using a third-party library called [captcha](https://github.com/lepture/captcha), which can randomly generate images of the digits 0 to 9. In this network, the length of the digit sequence varies from 1 to 4.
# [Environment Requirements](#contents)
- Hardware (Ascend/GPU)
    - Prepare the hardware environment with an Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you will get access to the related resources.
# [Quick Start](#contents)

Run the script `scripts/run_process_data.sh` to generate the dataset. By default, the shell script generates 10000 test images and 50000 train images.
```
$ cd scripts
$ sh run_process_data.sh
# after execution, you will find the dataset organized as follows:
.
└─warpctc
  └─data
    ├─ train  # train dataset
    └─ test   # evaluate dataset
```
- After the dataset is prepared, you may start running the training or the evaluation scripts described below.
Parameters for both training and evaluation can be set in `config.py`.

```
...
"save_checkpoint_path": "./checkpoint",  # path to save checkpoint
```
## [Dataset Preparation](#contents)
- You may refer to "Generate dataset" in [Quick Start](#quick-start) to automatically generate a dataset, or you may choose to generate a captcha dataset by yourself.
- Set options in `config.py`, including the learning rate and other network hyperparameters. See the [MindSpore dataset preparation tutorial](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/loading_the_datasets.html#mindspore) for more information about loading datasets; a loading sketch follows this list.
### [Training](#contents)

- Run `run_standalone_train.sh` for non-distributed training of the WarpCTC model, either on Ascend or on GPU.

    ```
    # standalone training on Ascend
    bash run_standalone_train.sh ../data/train Ascend
    ```
- Run `run_distribute_train.sh` for distributed training of the WarpCTC model on Ascend.

> For details about `rank_table.json`, refer to the [distributed training tutorial](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training.html).
Training results are stored in a folder under `scripts` whose name begins with "train" or "train_parallel". There you can find checkpoint files together with results like the following in the log.