Commit 4490258e authored by Xinghai Sun

Update README for DS2 cloud training.

Parent 55e0a29c
# DeepSpeech2 on PaddlePaddle
## Installation
......
On the client console, press and hold the space key to start talking, and release it when you finish speaking. The decoding result (the inferred transcription) will then be displayed.
The server and the client can also run on two separate machines. For example, `demo_client.py` is usually started on a machine with a microphone, while `demo_server.py` is usually started on a remote server with powerful GPUs. First make sure the two machines can reach each other over the network, then use `--host_ip` and `--host_port` in both `demo_server.py` and `demo_client.py` to specify the server machine's actual IP address (instead of the default `localhost`) and TCP port.
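
For example, assuming a server reachable at the placeholder address `192.168.1.10` on the placeholder port `8086`, and leaving all other arguments at their defaults, the two processes might be started as follows. The address and port are illustrative only; substitute your own.

```shell
# On the remote GPU server (placeholder IP and port):
python demo_server.py --host_ip 192.168.1.10 --host_port 8086

# On the machine with the microphone, pointing at the same server:
python demo_client.py --host_ip 192.168.1.10 --host_port 8086
```

Both commands must use the same IP address and port, since the client connects to the address the server listens on.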
## PaddleCloud Training
If you wish to train DeepSpeech2 on PaddleCloud, please refer to
[Train DeepSpeech2 on PaddleCloud](https://github.com/PaddlePaddle/models/tree/develop/deep_speech_2/cloud).
# Train DeepSpeech2 on PaddleCloud
>Note:
>Please make sure the [PaddleCloud Client](https://github.com/PaddlePaddle/cloud/blob/develop/doc/usage_cn.md#%E4%B8%8B%E8%BD%BD%E5%B9%B6%E9%85%8D%E7%BD%AEpaddlecloud) has been installed and the current directory is `deep_speech_2/cloud/`.
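
A minimal preflight check for both conditions in the note might look like the sketch below. `check_cloud_env` is a hypothetical helper, not part of the repository; it only tests that a `paddlecloud` command can be found and that the current directory path ends in `deep_speech_2/cloud`.

```shell
# Hypothetical helper (not part of the repository).
# Returns 0 if the PaddleCloud client is found and the current directory
# ends with deep_speech_2/cloud; otherwise prints a message and returns 1.
check_cloud_env() {
    if ! command -v paddlecloud >/dev/null 2>&1; then
        echo "paddlecloud client not found on PATH" >&2
        return 1
    fi
    case "$PWD" in
        */deep_speech_2/cloud) ;;
        *)
            echo "run this from deep_speech_2/cloud/ (current: $PWD)" >&2
            return 1
            ;;
    esac
    return 0
}

# Usage:
# check_cloud_env && sh pcloud_upload_data.sh
```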
## Step 1: Upload Data
Given several input manifests, `pcloud_upload_data.sh` packs and uploads all the audio files they contain to the PaddleCloud filesystem, and also generates corresponding manifest files with updated cloud paths.
Please modify the following arguments in `pcloud_upload_data.sh`:
- `IN_MANIFESTS`: Paths (in the local filesystem) of the manifest files listing the audio files to be uploaded. Multiple paths can be concatenated with whitespace as the delimiter. Each line of a manifest file has the following format:
```
{"audio_filepath": "/home/disk1/LibriSpeech/dev-clean/1272/128104/1272-128104-0000.flac", "duration": 5.855, "text
......
{"audio_filepath": "/home/disk1/LibriSpeech/dev-clean/1272/128104/1272-128104-0001.flac", "duration": 4.815, "text": "nor is mister ..."}
```
- `OUT_MANIFESTS`: Paths (in the local filesystem) to write the updated output manifest files to. Multiple paths can be concatenated with whitespace as the delimiter. The `audio_filepath` values in the output manifests are the new paths in the PaddleCloud filesystem.
- `CLOUD_DATA_DIR`: Directory (in PaddleCloud filesystem) to upload the data to.
- `NUM_SHARDS`: Number of data shards (tar files) to generate when packing and uploading data. A smaller `NUM_SHARDS` requires more temporary local disk space for packing.
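
Before running the upload, a quick local check can catch malformed input. The sketch below is a hypothetical helper (`check_manifests` is not part of the repository). It assumes the two variables hold whitespace-separated paths paired by position, and it only tests that every line of each input manifest mentions the `audio_filepath`, `duration` and `text` keys, without fully parsing the JSON.

```shell
# Hypothetical pre-upload check (not part of the repository).
# Assumes IN_MANIFESTS and OUT_MANIFESTS are whitespace-separated path lists
# paired by position.
check_manifests() {
    set -- $IN_MANIFESTS
    in_count=$#
    set -- $OUT_MANIFESTS
    out_count=$#
    if [ "$in_count" -ne "$out_count" ]; then
        echo "IN_MANIFESTS has $in_count paths, OUT_MANIFESTS has $out_count" >&2
        return 1
    fi
    for manifest in $IN_MANIFESTS; do
        if [ ! -f "$manifest" ]; then
            echo "missing manifest: $manifest" >&2
            return 1
        fi
        # grep -v -q succeeds if at least one line does NOT contain the key.
        for key in audio_filepath duration text; do
            if grep -v -q "\"$key\"" "$manifest"; then
                echo "$manifest: some line lacks \"$key\"" >&2
                return 1
            fi
        done
    done
    return 0
}
```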
By running:
```
sh pcloud_upload_data.sh
```
all the audio files will be uploaded to the PaddleCloud filesystem, and you will get the modified manifest files in `OUT_MANIFESTS`.
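
To confirm that the upload rewrote every entry, you could compare the line counts of each input/output manifest pair. This is a sketch under the assumption that the i-th path in `OUT_MANIFESTS` was generated from the i-th path in `IN_MANIFESTS` and keeps exactly one line per audio file; `compare_manifest_counts` is a hypothetical name, not part of the repository.

```shell
# Hypothetical post-upload check (not part of the repository).
# Assumes OUT_MANIFESTS[i] was generated from IN_MANIFESTS[i], one line per file.
compare_manifest_counts() {
    set -- $OUT_MANIFESTS
    status=0
    for in_manifest in $IN_MANIFESTS; do
        if [ $# -eq 0 ]; then
            echo "no output manifest paired with $in_manifest" >&2
            return 1
        fi
        out_manifest=$1
        shift
        if [ ! -f "$out_manifest" ]; then
            echo "missing output manifest: $out_manifest" >&2
            status=1
            continue
        fi
        # Unquoted below so any padding printed by wc is dropped.
        in_lines=$(wc -l < "$in_manifest")
        out_lines=$(wc -l < "$out_manifest")
        if [ $in_lines -ne $out_lines ]; then
            echo "$out_manifest: $out_lines lines, expected $in_lines" >&2
            status=1
        fi
    done
    return $status
}
```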
You only need to do this step once, the first time you run cloud training. Afterwards, the data persists on the cloud filesystem and can be reused by multiple jobs.
## Step 2: Configure Training
Configure the cloud training arguments in `pcloud_submit.sh`:
- `TRAIN_MANIFEST`: Manifest filepath (in the local filesystem) for training. Note that its `audio_filepath` values should be paths in the cloud filesystem, such as those generated by `pcloud_upload_data.sh`.
- `DEV_MANIFEST`: Manifest filepath (in local filesystem) for validation.
- `CLOUD_MODEL_DIR`: Directory (in PaddleCloud filesystem) to save the model parameters (checkpoints).
- `BATCH_SIZE`: Training batch size for a single node.
- `NUM_GPU`: Number of GPUs allocated for a single node.
- `NUM_NODE`: Number of nodes (machines) allocated for this job.
- `IS_LOCAL`: Set to `False` to enable the parameter server when using multiple nodes.
Configure the other training hyper-parameters in `pcloud_train.sh` as you wish, just as you would for local training.
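
The relations between these arguments can be checked before submission. The sketch below is hypothetical (`check_submit_config` is not in the repository) and rests on assumptions not stated above: `IS_LOCAL` is written literally as `False` or `True`, each node's `BATCH_SIZE` is split evenly across its `NUM_GPU` GPUs, and each node processes its own `BATCH_SIZE` samples per step. Rejecting a `BATCH_SIZE` that does not divide evenly by `NUM_GPU` is a conservative choice of this sketch, not a documented requirement.

```shell
# Hypothetical sanity check for pcloud_submit.sh settings (not in the repository).
# Assumptions: IS_LOCAL is the literal string True/False; BATCH_SIZE is per node
# and split evenly across NUM_GPU GPUs; every node handles BATCH_SIZE per step.
check_submit_config() {
    # Multiple nodes require the parameter server, i.e. IS_LOCAL=False.
    if [ "$NUM_NODE" -gt 1 ] && [ "$IS_LOCAL" != "False" ]; then
        echo "NUM_NODE=$NUM_NODE needs IS_LOCAL=False (parameter server)" >&2
        return 1
    fi
    # Conservative choice of this sketch: require an even per-GPU split.
    if [ $((BATCH_SIZE % NUM_GPU)) -ne 0 ]; then
        echo "BATCH_SIZE=$BATCH_SIZE is not divisible by NUM_GPU=$NUM_GPU" >&2
        return 1
    fi
    echo "per-GPU batch: $((BATCH_SIZE / NUM_GPU)), samples per step across nodes: $((BATCH_SIZE * NUM_NODE))"
}
```

For example, with `BATCH_SIZE=64`, `NUM_GPU=4`, `NUM_NODE=2` and `IS_LOCAL=False`, it prints `per-GPU batch: 16, samples per step across nodes: 128`.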
By running:
```
sh pcloud_submit.sh
```
you submit a training job to PaddleCloud. The job name is displayed when the submission is done.
## Step 3: Get Job Logs
Run this to list all the jobs you have submitted, as well as their running status:
```
paddlecloud get jobs
```
Run this to print the logs of the corresponding job:
```
paddlecloud logs -n 10000 $REPLACED_WITH_YOUR_ACTUAL_JOB_NAME
```
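
If you fetch logs often, the commands in this step can be wrapped in a small shell function. `show_job_logs` is a hypothetical helper, not part of the repository; it uses only the `paddlecloud get jobs` and `paddlecloud logs -n` invocations shown here, and passes its optional second argument to `-n` (defaulting to the `10000` used above).

```shell
# Hypothetical helper (not part of the repository).
# Usage: show_job_logs JOB_NAME [N]   (N is passed to -n; default 10000)
show_job_logs() {
    if [ -z "$1" ]; then
        echo "usage: show_job_logs JOB_NAME [N]" >&2
        echo "job names are listed by: paddlecloud get jobs" >&2
        return 1
    fi
    paddlecloud logs -n "${2:-10000}" "$1"
}
```

For example, `show_job_logs $REPLACED_WITH_YOUR_ACTUAL_JOB_NAME` runs the same command as above.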
## More Help
For more information about the usage of PaddleCloud, please refer to [PaddleCloud Usage](https://github.com/PaddlePaddle/cloud/blob/develop/doc/usage_cn.md#提交任务).