- Specify the loading operation when model file is missing. Now support fail/rand/zere three operations.
- Specify the loading operation to use when the model file is missing. Three operations are supported: fail, rand, and zero.
-`fail`: program will exit.
-`rand`: parameters are drawn from a uniform or normal distribution, according to **initial\_strategy** in the network config. The uniform range is **[mean - std, mean + std]**, where mean and std are configured in the trainer config.
-`zero`: all parameters are zero.
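The three operations can be sketched in plain Python as follows. This is only an illustration of the behaviour described above, not the trainer's actual code: the function name `init_missing_parameter` and its arguments are hypothetical, and the normal branch assumes `mean` and `std` are used as the distribution's mean and standard deviation.

```python
import random


def init_missing_parameter(operation, size, mean=0.0, std=1.0,
                           strategy="uniform", seed=None):
    """Hypothetical helper: build a parameter vector when the model file is missing."""
    if operation == "fail":
        # fail: the program exits.
        raise SystemExit("model file is missing")
    if operation == "zero":
        # zero: all parameters are zero.
        return [0.0] * size
    if operation == "rand":
        rng = random.Random(seed)
        if strategy == "uniform":
            # Uniform range is [mean - std, mean + std].
            return [rng.uniform(mean - std, mean + std) for _ in range(size)]
        if strategy == "normal":
            # Assumption: mean and std parameterize the normal distribution.
            return [rng.gauss(mean, std) for _ in range(size)]
        raise ValueError("unknown strategy: %r" % strategy)
    raise ValueError("unknown operation: %r" % operation)


if __name__ == "__main__":
    print(init_missing_parameter("zero", 3))
    print(init_missing_parameter("rand", 3, mean=0.0, std=0.1, seed=42))
```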
...
...
@@ -118,11 +118,11 @@
- type: int32 (default: 0).
*`--test_wait`
- Whether to wait for the parameters of each pass if they do not exist. If test_data_path is set in the cluster submitting environment, one process is launched to perform testing, so test_wait=1 must be set. Note that in the cluster submitting environment, this argument is set to True by default.
- Whether to wait for the parameters of each pass if they do not exist. It can be used when the user launches another process to perform testing during the training process.
- type: bool (default: 0).
*`--model_list`
- File that saves the model list when testing. It was set automatically when using cluster submitting environment after setting model_path.
- File that saves the model list when testing.
- type: string (default: "", null).
*`--predict_output_dir`
...
...
@@ -212,7 +212,7 @@
- type: bool (default: 0).
*`--pservers`
- Comma separated IP addresses of pservers. It is set automatically in cluster submitting environment.
- default_device(0): set default device ID to 0. This means that except the layers with device=-1, all layers will use a GPU, and the specific GPU used for each layer depends on trainer\_count and gpu\_id (0 by default). Here, layer l1 and l2 are computed on the GPU.
- default_device(0): set default device ID to 0. This means that except the layers with device=-1, all layers will use a GPU, and the specific GPU used for each layer depends on trainer\_count and gpu\_id (0 by default). Here, layer fc1 and fc2 are computed on the GPU.
- device=-1: use the CPU for layer l3.
- device=-1: use the CPU for layer fc3.
- trainer_count:
- trainer_count=1: if gpu\_id is not set, then use the first GPU to compute layers l1 and l2. Otherwise use the GPU with gpu\_id.
- trainer_count=1: if gpu\_id is not set, then use the first GPU to compute layers fc1 and fc2. Otherwise use the GPU with gpu\_id.
- trainer_count>1: use trainer\_count GPUs to compute one layer using data parallelism. For example, trainer\_count=2 means that GPUs 0 and 1 will use data parallelism to compute layer l1 and l2.
- trainer_count>1: use trainer\_count GPUs to compute one layer using data parallelism. For example, trainer\_count=2 means that GPUs 0 and 1 will use data parallelism to compute layers fc1 and fc2.
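The placement rules above can be summarized with a small Python sketch. The function `resolve_devices` is hypothetical and only restates the rules in this list; in particular, numbering the GPUs from 0 when trainer\_count>1 is an assumption generalized from the trainer\_count=2 example.

```python
def resolve_devices(layer_device, trainer_count=1, gpu_id=0):
    """Hypothetical helper: list the devices that compute one layer.

    layer_device: -1 for a layer pinned to the CPU, otherwise the default device ID.
    """
    if layer_device == -1:
        # device=-1: the layer runs on the CPU.
        return ["cpu"]
    if trainer_count == 1:
        # A single trainer uses the GPU given by gpu_id (0, the first GPU, by default).
        return ["gpu:%d" % gpu_id]
    # Assumption: with data parallelism, GPUs 0 .. trainer_count-1 share the layer.
    return ["gpu:%d" % i for i in range(trainer_count)]


if __name__ == "__main__":
    # fc1 and fc2 use the default device; fc3 is pinned to the CPU.
    for name, device in [("fc1", 0), ("fc2", 0), ("fc3", -1)]:
        print(name, resolve_devices(device, trainer_count=2))
```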