# Use Case

## Local Training

These command line arguments are commonly used in local training experiments, such as image classification and natural language processing.

```
paddle train \
  --use_gpu=1/0 \                        #1:GPU,0:CPU(default:true)
  --config=network_config \
  --save_dir=output \
  --trainer_count=COUNT \                #(default:1)
  --test_period=M \                      #(default:0) 
  --num_passes=N \                       #(default:100)
  --log_period=K \                       #(default:100)
  --dot_period=1000 \                    #(default:1)
  #[--show_parameter_stats_period=100] \ #(default:0)
  #[--saving_period_by_batches=200] \    #(default:0)
```
`show_parameter_stats_period` and `saving_period_by_batches` are optional; set them only if your task needs them.

### 1) Pass Command Line Arguments to the Network Config

`config_args` passes arguments from the command line to the network config.

```
--config_args=generating=1,beam_size=5,layer_num=10 \
```
`get_config_arg` can then be used in the network config to parse these arguments:

```
generating = get_config_arg('generating', bool, False)
beam_size = get_config_arg('beam_size', int, 3)
layer_num = get_config_arg('layer_num', int, 8)
```

The signature of `get_config_arg` is:

```
get_config_arg(name, type, default_value)
```
- name: the name specified in `--config_args`.
- type: the value type, such as bool, int, str, or float.
- default_value: the value used if the argument is not set.
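
To make the flow concrete, here is a minimal stand-alone sketch of how a `key=value,...` string like the one passed to `--config_args` could be turned into typed values. It is not Paddle's implementation: `parse_config_args` and `make_get_config_arg` are hypothetical helpers written for illustration, and the treatment of boolean values (`1`, `true`, `yes` count as true) is an assumption.

```python
# Illustrative stand-in only; Paddle provides its own get_config_arg.

def parse_config_args(arg_string):
    """Split 'k1=v1,k2=v2' into a dict of raw strings (well-formed input assumed)."""
    pairs = (item.split("=", 1) for item in arg_string.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}


def make_get_config_arg(raw_args):
    """Return a get_config_arg-like function bound to the parsed arguments."""
    def get_config_arg(name, value_type, default_value):
        if name not in raw_args:
            return default_value
        value = raw_args[name]
        if value_type is bool:
            # bool("0") is True in Python, so booleans need explicit handling.
            return value.lower() in ("1", "true", "yes")
        return value_type(value)
    return get_config_arg


get_config_arg = make_get_config_arg(
    parse_config_args("generating=1,beam_size=5,layer_num=10"))

generating = get_config_arg("generating", bool, False)  # passed on the command line
beam_size = get_config_arg("beam_size", int, 3)         # passed on the command line
dropout = get_config_arg("dropout", float, 0.5)         # not passed, default is used
```

The `dropout` name is made up for the example; it only shows that an argument missing from `--config_args` falls back to `default_value`.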

### 2) Use a Model to Initialize the Network

Add the following arguments:

```
--init_model_path=model_path
--load_missing_parameter_strategy=rand
```

## Local Testing

Method 1:

```
paddle train --job=test \
             --use_gpu=1/0 \
             --config=network_config \
             --trainer_count=COUNT \
             --init_model_path=model_path
```
- Use init\_model\_path to specify the model to test.
- Only one model can be tested.

Method 2:

```
paddle train --job=test \
             --use_gpu=1/0 \
             --config=network_config \
             --trainer_count=COUNT \
             --model_list=model.list
```
- Use model_list to specify the models to test.
- Several models can be tested, where model.list looks like:

```
./alexnet_pass1
./alexnet_pass2
```

Method 3:

```
paddle train --job=test \
             --use_gpu=1/0 \
             --config=network_config \
             --trainer_count=COUNT \
             --save_dir=model \
             --test_pass=M \
             --num_passes=N
```
This method requires models saved by Paddle under paths of the form `model/pass-%05d`. The models from the M-th pass to the (N-1)-th pass are tested. For example, M=12 and N=14 tests `model/pass-00012` and `model/pass-00013`.

## Sparse Training

Sparse training is usually used to accelerate computation when the input is sparse, high-dimensional data. For example, the dictionary dimension of the input data may be 1 million while a single sample contains only a few words. In Paddle, sparse matrix multiplication is used in forward propagation, and sparse updating is performed on the weights after backward propagation.
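
The idea can be illustrated with a toy sketch in plain Python. This is not Paddle code: the layer width `HIDDEN`, the lazily created weight rows, and the plain gradient-descent step are all assumptions chosen to keep the example small. It only shows that both the forward pass and the update touch just the rows of the words present in the sample.

```python
# Toy illustration of sparse forward and sparse update; not Paddle code.
DICT_SIZE = 1000000   # dictionary dimension from the example above
HIDDEN = 4            # hypothetical, tiny layer width for readability

# Rows of the weight matrix are created lazily so the sketch stays small.
weights = {}


def row(word_id):
    return weights.setdefault(word_id, [0.01] * HIDDEN)


def forward(word_ids):
    """Sparse matrix multiplication: sum only the rows of the active words."""
    out = [0.0] * HIDDEN
    for word_id in word_ids:
        for j, w in enumerate(row(word_id)):
            out[j] += w
    return out


def sparse_update(word_ids, grad_out, lr=0.1):
    """Update only the rows that took part in the forward pass."""
    for word_id in set(word_ids):
        r = row(word_id)
        for j in range(HIDDEN):
            r[j] -= lr * grad_out[j]


sample = [17, 4242, 999999]          # a sample with just three words
assert all(0 <= w < DICT_SIZE for w in sample)

hidden = forward(sample)
sparse_update(sample, grad_out=[1.0] * HIDDEN)
touched_rows = len(weights)          # only 3 of the 1,000,000 rows were used
```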

### 1) Local training

Set **sparse\_update=True** in the network config. See the network config documentation for more details.

### 2) Cluster training

Add the following argument for cluster training of a sparse model. You also need to set **sparse\_remote\_update=True** in the network config. See the network config documentation for more details.

```
--ports_num_for_sparse=1    #(default: 0)
```

## parallel_nn
`parallel_nn` enables mixed use of GPUs and CPUs to compute layers: you can configure the network so that a GPU computes some layers and the CPU computes others. Alternatively, you can split layers across different GPUs, which can **reduce GPU memory** usage or **use parallel computation to accelerate some layers**.

To use these features, specify the device ID in the network config (denoted as deviceId) and add the command line argument:

```
--parallel_nn=true
```
### Case 1: Mixed Use of GPU and CPU
Consider the following example:

```
#command line:
paddle train --use_gpu=true --parallel_nn=true --trainer_count=COUNT

#network:
default_device(0)

fc1=fc_layer(...)
fc2=fc_layer(...)
fc3=fc_layer(...,layer_attr=ExtraAttr(device=-1))

```
- default_device(0): sets the default device ID to 0. This means that all layers except those with device=-1 use a GPU; the specific GPU used for each layer depends on trainer\_count and gpu\_id (0 by default). Here, layers fc1 and fc2 are computed on the GPU.

- device=-1: use the CPU for layer fc3.

- trainer_count:
  - trainer_count=1: if gpu\_id is not set, the first GPU computes layers fc1 and fc2. Otherwise, the GPU with ID gpu\_id is used.

  - trainer_count>1: trainer\_count GPUs compute each layer using data parallelism. For example, trainer\_count=2 means that GPUs 0 and 1 use data parallelism to compute layers fc1 and fc2.

### Case 2: Specify Layers in Different Devices

```
#command line:
paddle train --use_gpu=true --parallel_nn=true --trainer_count=COUNT

#network:
fc2=fc_layer(input=l1, layer_attr=ExtraAttr(device=0), ...)
fc3=fc_layer(input=l1, layer_attr=ExtraAttr(device=1), ...)
fc4=fc_layer(input=fc2, layer_attr=ExtraAttr(device=-1), ...)
```
In this case, we assume that there are 4 GPUs in one machine.

- trainer_count=1:
  - Use GPU 0 to compute layer fc2.
  - Use GPU 1 to compute layer fc3.
  - Use the CPU to compute layer fc4.

- trainer_count=2:
  - Use GPU 0 and 1 to compute layer fc2.
  - Use GPU 2 and 3 to compute layer fc3.
  - Use the CPU to compute layer fc4 in two threads.

- trainer_count=4:
  - It will fail (recall that we assumed 4 GPUs in the machine), because the argument `allow_only_one_model_on_one_gpu` is true by default.

**Allocation of device ID when `device!=-1`**:

```
(deviceId + gpu_id + threadId * numLogicalDevices_) % numDevices_

deviceId:             specified in layer.
gpu_id:               0 by default.
threadId:             thread ID, range: 0,1,..., trainer_count-1
numDevices_:          device (GPU) count in machine.
numLogicalDevices_:   min(max(deviceId + 1), numDevices_)
```