Commit 78bdd324, author: Liu Yiqun

Merge branch 'develop' into warpctc

......@@ -50,7 +50,7 @@ before_install:
fi
- if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo paddle/scripts/travis/before_install.linux.sh; fi
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then paddle/scripts/travis/before_install.osx.sh; fi
- pip install wheel protobuf sphinx breathe recommonmark virtualenv numpy
- pip install wheel protobuf sphinx breathe recommonmark virtualenv numpy sphinx_rtd_theme
script:
- paddle/scripts/travis/main.sh
notifications:
......
paddle/image/logs
paddle/image/*.pyc
paddle/image/train.list
paddle/rnn/logs
paddle/rnn/*.pyc
paddle/rnn/imdb.pkl
caffe/image/logs
tensorflow/image/logs
tensorflow/rnn/logs
# Benchmark
Machine:
- CPU: 12-core Intel(R) Xeon(R) CPU E5-2620 v2 @2.10GHz
- GPU: Tesla K40m
- cuDNN: v5.1
- System: Docker 1.12.1. All platforms are tested in a Docker environment.
Platforms:
- PaddlePaddle: paddledev/paddle:gpu-devel-v0.9.0a0
- Tensorflow: gcr.io/tensorflow/tensorflow:0.11.0rc0-gpu
- Caffe: kaixhin/cuda-caffe
Several convolutional and recurrent neural networks are used for testing.
## Image
### Benchmark Model
We benchmark AlexNet, GoogleNet, and a small network used in Caffe.
- [AlexNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet): with the group size set to one.
- [GoogleNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet): with loss1 and loss2 removed for benchmarking.
- [SmallNet](https://github.com/BVLC/caffe/blob/master/examples/cifar10/cifar10\_quick\_train\_test.prototxt)
### Single-GPU
- AlexNet: input - 3 * 227 * 227, Time: ms/batch
| BatchSize | 64 | 128 | 256 | 512 |
|--------------|-----| -----| ------| -----|
| PaddlePaddle | 195 | 334 | 602 | 1629 |
| TensorFlow | 223 | 364 | 645 | 1235 |
| Caffe | 324 | 627 | 1232 | 2513 |
**Note**
All platforms use cuDNN v5.1. Caffe is slower in this experiment because its workspace size limit for the cuDNN convolution interface is 8 * 1024 * 1024 bytes, which is smaller than the limit in PaddlePaddle and TensorFlow. Caffe would be faster with a larger workspace size limit.
- GoogleNet: input - 3 * 224 * 224, Time: ms/batch
| BatchSize | 64 | 128 | 256 |
|--------------|-------| -------| --------|
| PaddlePaddle | 613 | 1149 | 2348 |
| TensorFlow | 644 | 1176 | 2219 |
| Caffe | 694 | 1364 | out of memory |
- SmallNet: input - 3 * 32 * 32, Time: ms/batch
| BatchSize | 64 | 128 | 256 | 512 |
|--------------|--------| -------- | --------|---------|
| PaddlePaddle | 10.463 | 18.184 | 33.113 | 63.039 |
| TensorFlow | 9 | 15 | 28 | 59 |
| Caffe | 9.373 | 16.6606 | 31.4797 | 59.719 |
**Note**
All the single-GPU experiments in Caffe use `caffe time` to measure elapsed time, which does not include the parameter update time, whereas the PaddlePaddle and TensorFlow measurements do include it. On a single machine this part is small relative to the total time, so it can be ignored.
TensorFlow implements its own algorithm search method instead of using the algorithm search interface in cuDNN.
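The single-GPU tables above report time per batch, which can be harder to compare across batch sizes than throughput. As a rough sketch, the snippet below converts the AlexNet timings into images per second. The helper name `images_per_second` is an illustrative assumption and is not part of the benchmark scripts; the numbers are copied from the AlexNet table.

```python
# Single-GPU AlexNet timings in ms/batch, copied from the table above.
ALEXNET_MS_PER_BATCH = {
    "PaddlePaddle": {64: 195, 128: 334, 256: 602, 512: 1629},
    "TensorFlow": {64: 223, 128: 364, 256: 645, 512: 1235},
    "Caffe": {64: 324, 128: 627, 256: 1232, 512: 2513},
}


def images_per_second(batch_size, ms_per_batch):
    """Convert a ms/batch timing into a throughput in images per second."""
    return batch_size * 1000.0 / ms_per_batch


if __name__ == "__main__":
    for platform, timings in ALEXNET_MS_PER_BATCH.items():
        rates = ", ".join(
            "bs=%d: %.1f img/s" % (bs, images_per_second(bs, ms))
            for bs, ms in sorted(timings.items())
        )
        print("%-12s %s" % (platform, rates))
```

Keep in mind that the Caffe timings exclude the parameter update, so its throughput figures are slightly optimistic relative to the other two platforms.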
### Multi-GPU: 4 GPUs
- AlexNet, ms / batch
| total-BatchSize | 128 * 4 | 256 * 4 |
|------------------|----------| -----------|
| PaddlePaddle | 347 | 622 |
| TensorFlow | 377 | 675 |
| Caffe | 1229 | 2435 |
For example, if `total-BatchSize = 128 * 4`, the speedup ratio is calculated by
```
time_at_1gpu_batch_128 * 4 / time_at_4gpu_total_batch_512
= (334 * 4)/347
= 3.85
```
<img src="figs/alexnet-4gpu.png" width="420">
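The speedup calculation above can be repeated for every platform and batch size. The following is a minimal sketch that applies the same formula to the AlexNet numbers from the single-GPU and 4-GPU tables; the function name `speedup` and the dictionary layout are assumptions made for illustration.

```python
# AlexNet timings in ms/batch, copied from the tables above.
# Keys are per-GPU batch sizes; the 4-GPU runs use total batch = 4 * key.
SINGLE_GPU_MS = {
    "PaddlePaddle": {128: 334, 256: 602},
    "TensorFlow": {128: 364, 256: 645},
    "Caffe": {128: 627, 256: 1232},
}
FOUR_GPU_MS = {
    "PaddlePaddle": {128: 347, 256: 622},
    "TensorFlow": {128: 377, 256: 675},
    "Caffe": {128: 1229, 256: 2435},
}


def speedup(single_gpu_ms, multi_gpu_ms, num_gpus=4):
    """Speedup ratio: time for num_gpus batches on one GPU over the multi-GPU time."""
    return single_gpu_ms * num_gpus / multi_gpu_ms


if __name__ == "__main__":
    for platform in SINGLE_GPU_MS:
        for batch in sorted(SINGLE_GPU_MS[platform]):
            ratio = speedup(SINGLE_GPU_MS[platform][batch], FOUR_GPU_MS[platform][batch])
            print("%-12s batch %d * 4: speedup %.2f" % (platform, batch, ratio))
```

A ratio close to 4 indicates near-linear scaling across the four GPUs.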
- GoogleNet, ms / batch
| total-BatchSize | 128 * 4 | 256 * 4 |
|-------------------|--------------| ----------- |
| PaddlePaddle | 1178 | 2367 |
| TensorFlow | 1210 | 2292 |
| Caffe | 2007 | out of memory |
<img src="figs/googlenet-4gpu.png" width="420">
## RNN
We use an LSTM network for text classification as the benchmark.
### Dataset
- [IMDB](http://www.iro.umontreal.ca/~lisa/deep/data/imdb.pkl)
- Sequence length is 100. PaddlePaddle supports training with variable-length sequences, but TensorFlow needs padding. For a fair comparison, we therefore also pad sequences to length 100 in PaddlePaddle.
- Dictionary size=30000
- Peephole connection is used in `lstmemory` by default in PaddlePaddle. It is also configured in TensorFlow.
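To make the fixed-length setup concrete, here is a minimal sketch of bringing a list of token ids to length 100. It is not taken from the benchmark code: the function name `pad_or_truncate`, the padding id `0`, and the choice to truncate longer sequences are all assumptions for illustration.

```python
def pad_or_truncate(token_ids, max_len=100, pad_id=0):
    """Return a list of exactly max_len ids.

    Longer inputs are cut at max_len (assumed behaviour); shorter inputs are
    right-padded with pad_id (assumed to be 0 here).
    """
    clipped = list(token_ids[:max_len])
    return clipped + [pad_id] * (max_len - len(clipped))


if __name__ == "__main__":
    short_review = [12, 7, 431]
    padded = pad_or_truncate(short_review)
    print(len(padded), padded[:5])
```

With every sequence at the same length, a batch can be stored as a dense `batch_size x 100` array, which is the layout the padded comparison relies on.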
### Single-GPU
#### LSTM in Text Classification
We test a `2 lstm layer + fc` network with different hidden sizes and batch sizes.
- Batch size = 64, ms / batch
| hidden_size | 256 | 512 | 1280 |
|--------------|-------| -------| --------|
| PaddlePaddle | 83 | 184 | 641 |
| TensorFlow | 175 | 280 | 818 |
- Batch size = 128, ms / batch
| hidden_size | 256 | 512 | 1280 |
|--------------|------- | -------| --------|
| PaddlePaddle | 110 | 261 | 1007 |
| TensorFlow | 181 | 361 | 1237 |
- Batch size = 256, ms / batch
| hidden_size | 256 | 512 | 1280 |
|--------------|-------| -------| --------|
| PaddlePaddle | 170 | 414 | 1655 |
| TensorFlow | 238 | 536 | 1905 |
<img src="figs/rnn_lstm_cls.png" width="600">
#### Seq2Seq
The benchmark of sequence-to-sequence network will be added later.
### Multi-GPU: 4 GPUs
#### LSTM in Text Classification
- hidden_size = 256, ms / batch
| batch_size | 256 | 512 |
|--------------| -------| --------|
| PaddlePaddle | 90 | 118 |
| TensorFlow | 226 | 118 |
- hidden_size = 512, ms / batch
| batch_size | 256 | 512 |
|--------------| -------| --------|
| PaddlePaddle | 189 | 268 |
| TensorFlow | 297 | 383 |
<img src="figs/rnn_lstm_4gpus.png" width="420">
#### Seq2Seq
The benchmark of sequence-to-sequence network will be added later.
name: "alexnet"
input: "data"
input_dim: 64
input_dim: 3
input_dim: 227
input_dim: 227
input: "label"
input_dim: 64
input_dim: 1
input_dim: 1
input_dim: 1
force_backward: true
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc8"
type: "InnerProduct"
bottom: "fc7"
top: "fc8"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1000
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc8"
bottom: "label"
top: "loss"
}
name: "googlenet"
input: "data"
input_dim: 128
input_dim: 3
input_dim: 224
input_dim: 224
input: "label"
input_dim: 128
input_dim: 1
input_dim: 1
input_dim: 1
layer {
name: "conv1/7x7_s2"
type: "Convolution"
bottom: "data"
top: "conv1/7x7_s2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 3
kernel_size: 7
stride: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "conv1/relu_7x7"
type: "ReLU"
bottom: "conv1/7x7_s2"
top: "conv1/7x7_s2"
}
layer {
name: "pool1/3x3_s2"
type: "Pooling"
bottom: "conv1/7x7_s2"
top: "pool1/3x3_s2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
#layer {
# name: "pool1/norm1"
# type: "LRN"
# bottom: "pool1/3x3_s2"
# top: "pool1/norm1"
# lrn_param {
# local_size: 5
# alpha: 0.0001
# beta: 0.75
# }
#}
layer {
name: "conv2/3x3_reduce"
type: "Convolution"
# bottom: "pool1/norm1"
bottom: "pool1/3x3_s2"
top: "conv2/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "conv2/relu_3x3_reduce"
type: "ReLU"
bottom: "conv2/3x3_reduce"
top: "conv2/3x3_reduce"
}
layer {
name: "conv2/3x3"
type: "Convolution"
bottom: "conv2/3x3_reduce"
top: "conv2/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "conv2/relu_3x3"
type: "ReLU"
bottom: "conv2/3x3"
top: "conv2/3x3"
}
#layer {
# name: "conv2/norm2"
# type: "LRN"
# bottom: "conv2/3x3"
# top: "conv2/norm2"
# lrn_param {
# local_size: 5
# alpha: 0.0001
# beta: 0.75
# }
#}
layer {
name: "pool2/3x3_s2"
type: "Pooling"
# bottom: "conv2/norm2"
bottom: "conv2/3x3"
top: "pool2/3x3_s2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "inception_3a/1x1"
type: "Convolution"
bottom: "pool2/3x3_s2"
top: "inception_3a/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_1x1"
type: "ReLU"
bottom: "inception_3a/1x1"
top: "inception_3a/1x1"
}
layer {
name: "inception_3a/3x3_reduce"
type: "Convolution"
bottom: "pool2/3x3_s2"
top: "inception_3a/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_3a/3x3_reduce"
top: "inception_3a/3x3_reduce"
}
layer {
name: "inception_3a/3x3"
type: "Convolution"
bottom: "inception_3a/3x3_reduce"
top: "inception_3a/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_3x3"
type: "ReLU"
bottom: "inception_3a/3x3"
top: "inception_3a/3x3"
}
layer {
name: "inception_3a/5x5_reduce"
type: "Convolution"
bottom: "pool2/3x3_s2"
top: "inception_3a/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 16
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_3a/5x5_reduce"
top: "inception_3a/5x5_reduce"
}
layer {
name: "inception_3a/5x5"
type: "Convolution"
bottom: "inception_3a/5x5_reduce"
top: "inception_3a/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_5x5"
type: "ReLU"
bottom: "inception_3a/5x5"
top: "inception_3a/5x5"
}
layer {
name: "inception_3a/pool"
type: "Pooling"
bottom: "pool2/3x3_s2"
top: "inception_3a/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_3a/pool_proj"
type: "Convolution"
bottom: "inception_3a/pool"
top: "inception_3a/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3a/relu_pool_proj"
type: "ReLU"
bottom: "inception_3a/pool_proj"
top: "inception_3a/pool_proj"
}
layer {
name: "inception_3a/output"
type: "Concat"
bottom: "inception_3a/1x1"
bottom: "inception_3a/3x3"
bottom: "inception_3a/5x5"
bottom: "inception_3a/pool_proj"
top: "inception_3a/output"
}
layer {
name: "inception_3b/1x1"
type: "Convolution"
bottom: "inception_3a/output"
top: "inception_3b/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3b/relu_1x1"
type: "ReLU"
bottom: "inception_3b/1x1"
top: "inception_3b/1x1"
}
layer {
name: "inception_3b/3x3_reduce"
type: "Convolution"
bottom: "inception_3a/output"
top: "inception_3b/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3b/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_3b/3x3_reduce"
top: "inception_3b/3x3_reduce"
}
layer {
name: "inception_3b/3x3"
type: "Convolution"
bottom: "inception_3b/3x3_reduce"
top: "inception_3b/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3b/relu_3x3"
type: "ReLU"
bottom: "inception_3b/3x3"
top: "inception_3b/3x3"
}
layer {
name: "inception_3b/5x5_reduce"
type: "Convolution"
bottom: "inception_3a/output"
top: "inception_3b/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3b/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_3b/5x5_reduce"
top: "inception_3b/5x5_reduce"
}
layer {
name: "inception_3b/5x5"
type: "Convolution"
bottom: "inception_3b/5x5_reduce"
top: "inception_3b/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3b/relu_5x5"
type: "ReLU"
bottom: "inception_3b/5x5"
top: "inception_3b/5x5"
}
layer {
name: "inception_3b/pool"
type: "Pooling"
bottom: "inception_3a/output"
top: "inception_3b/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_3b/pool_proj"
type: "Convolution"
bottom: "inception_3b/pool"
top: "inception_3b/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_3b/relu_pool_proj"
type: "ReLU"
bottom: "inception_3b/pool_proj"
top: "inception_3b/pool_proj"
}
layer {
name: "inception_3b/output"
type: "Concat"
bottom: "inception_3b/1x1"
bottom: "inception_3b/3x3"
bottom: "inception_3b/5x5"
bottom: "inception_3b/pool_proj"
top: "inception_3b/output"
}
layer {
name: "pool3/3x3_s2"
type: "Pooling"
bottom: "inception_3b/output"
top: "pool3/3x3_s2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "inception_4a/1x1"
type: "Convolution"
bottom: "pool3/3x3_s2"
top: "inception_4a/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4a/relu_1x1"
type: "ReLU"
bottom: "inception_4a/1x1"
top: "inception_4a/1x1"
}
layer {
name: "inception_4a/3x3_reduce"
type: "Convolution"
bottom: "pool3/3x3_s2"
top: "inception_4a/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4a/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_4a/3x3_reduce"
top: "inception_4a/3x3_reduce"
}
layer {
name: "inception_4a/3x3"
type: "Convolution"
bottom: "inception_4a/3x3_reduce"
top: "inception_4a/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 208
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4a/relu_3x3"
type: "ReLU"
bottom: "inception_4a/3x3"
top: "inception_4a/3x3"
}
layer {
name: "inception_4a/5x5_reduce"
type: "Convolution"
bottom: "pool3/3x3_s2"
top: "inception_4a/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 16
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4a/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_4a/5x5_reduce"
top: "inception_4a/5x5_reduce"
}
layer {
name: "inception_4a/5x5"
type: "Convolution"
bottom: "inception_4a/5x5_reduce"
top: "inception_4a/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 48
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4a/relu_5x5"
type: "ReLU"
bottom: "inception_4a/5x5"
top: "inception_4a/5x5"
}
layer {
name: "inception_4a/pool"
type: "Pooling"
bottom: "pool3/3x3_s2"
top: "inception_4a/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_4a/pool_proj"
type: "Convolution"
bottom: "inception_4a/pool"
top: "inception_4a/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4a/relu_pool_proj"
type: "ReLU"
bottom: "inception_4a/pool_proj"
top: "inception_4a/pool_proj"
}
layer {
name: "inception_4a/output"
type: "Concat"
bottom: "inception_4a/1x1"
bottom: "inception_4a/3x3"
bottom: "inception_4a/5x5"
bottom: "inception_4a/pool_proj"
top: "inception_4a/output"
}
#layer {
# name: "loss1/ave_pool"
# type: "Pooling"
# bottom: "inception_4a/output"
# top: "loss1/ave_pool"
# pooling_param {
# pool: AVE
# kernel_size: 5
# stride: 3
# }
#}
#layer {
# name: "loss1/conv"
# type: "Convolution"
# bottom: "loss1/ave_pool"
# top: "loss1/conv"
# param {
# lr_mult: 1
# decay_mult: 1
# }
# param {
# lr_mult: 2
# decay_mult: 0
# }
# convolution_param {
# num_output: 128
# kernel_size: 1
# weight_filler {
# type: "xavier"
# }
# bias_filler {
# type: "constant"
# value: 0.2
# }
# }
#}
#layer {
# name: "loss1/relu_conv"
# type: "ReLU"
# bottom: "loss1/conv"
# top: "loss1/conv"
#}
#layer {
# name: "loss1/fc"
# type: "InnerProduct"
# bottom: "loss1/conv"
# top: "loss1/fc"
# param {
# lr_mult: 1
# decay_mult: 1
# }
# param {
# lr_mult: 2
# decay_mult: 0
# }
# inner_product_param {
# num_output: 1024
# weight_filler {
# type: "xavier"
# }
# bias_filler {
# type: "constant"
# value: 0.2
# }
# }
#}
#layer {
# name: "loss1/relu_fc"
# type: "ReLU"
# bottom: "loss1/fc"
# top: "loss1/fc"
#}
#layer {
# name: "loss1/drop_fc"
# type: "Dropout"
# bottom: "loss1/fc"
# top: "loss1/fc"
# dropout_param {
# dropout_ratio: 0.7
# }
#}
#layer {
# name: "loss1/classifier"
# type: "InnerProduct"
# bottom: "loss1/fc"
# top: "loss1/classifier"
# param {
# lr_mult: 1
# decay_mult: 1
# }
# param {
# lr_mult: 2
# decay_mult: 0
# }
# inner_product_param {
# num_output: 1000
# weight_filler {
# type: "xavier"
# }
# bias_filler {
# type: "constant"
# value: 0
# }
# }
#}
#layer {
# name: "loss1/loss"
# type: "SoftmaxWithLoss"
# bottom: "loss1/classifier"
# bottom: "label"
# top: "loss1/loss1"
# loss_weight: 0.3
#}
layer {
name: "inception_4b/1x1"
type: "Convolution"
bottom: "inception_4a/output"
top: "inception_4b/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 160
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4b/relu_1x1"
type: "ReLU"
bottom: "inception_4b/1x1"
top: "inception_4b/1x1"
}
layer {
name: "inception_4b/3x3_reduce"
type: "Convolution"
bottom: "inception_4a/output"
top: "inception_4b/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 112
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4b/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_4b/3x3_reduce"
top: "inception_4b/3x3_reduce"
}
layer {
name: "inception_4b/3x3"
type: "Convolution"
bottom: "inception_4b/3x3_reduce"
top: "inception_4b/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 224
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4b/relu_3x3"
type: "ReLU"
bottom: "inception_4b/3x3"
top: "inception_4b/3x3"
}
layer {
name: "inception_4b/5x5_reduce"
type: "Convolution"
bottom: "inception_4a/output"
top: "inception_4b/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 24
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4b/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_4b/5x5_reduce"
top: "inception_4b/5x5_reduce"
}
layer {
name: "inception_4b/5x5"
type: "Convolution"
bottom: "inception_4b/5x5_reduce"
top: "inception_4b/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4b/relu_5x5"
type: "ReLU"
bottom: "inception_4b/5x5"
top: "inception_4b/5x5"
}
layer {
name: "inception_4b/pool"
type: "Pooling"
bottom: "inception_4a/output"
top: "inception_4b/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_4b/pool_proj"
type: "Convolution"
bottom: "inception_4b/pool"
top: "inception_4b/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4b/relu_pool_proj"
type: "ReLU"
bottom: "inception_4b/pool_proj"
top: "inception_4b/pool_proj"
}
layer {
name: "inception_4b/output"
type: "Concat"
bottom: "inception_4b/1x1"
bottom: "inception_4b/3x3"
bottom: "inception_4b/5x5"
bottom: "inception_4b/pool_proj"
top: "inception_4b/output"
}
layer {
name: "inception_4c/1x1"
type: "Convolution"
bottom: "inception_4b/output"
top: "inception_4c/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4c/relu_1x1"
type: "ReLU"
bottom: "inception_4c/1x1"
top: "inception_4c/1x1"
}
layer {
name: "inception_4c/3x3_reduce"
type: "Convolution"
bottom: "inception_4b/output"
top: "inception_4c/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4c/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_4c/3x3_reduce"
top: "inception_4c/3x3_reduce"
}
layer {
name: "inception_4c/3x3"
type: "Convolution"
bottom: "inception_4c/3x3_reduce"
top: "inception_4c/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4c/relu_3x3"
type: "ReLU"
bottom: "inception_4c/3x3"
top: "inception_4c/3x3"
}
layer {
name: "inception_4c/5x5_reduce"
type: "Convolution"
bottom: "inception_4b/output"
top: "inception_4c/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 24
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4c/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_4c/5x5_reduce"
top: "inception_4c/5x5_reduce"
}
layer {
name: "inception_4c/5x5"
type: "Convolution"
bottom: "inception_4c/5x5_reduce"
top: "inception_4c/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4c/relu_5x5"
type: "ReLU"
bottom: "inception_4c/5x5"
top: "inception_4c/5x5"
}
layer {
name: "inception_4c/pool"
type: "Pooling"
bottom: "inception_4b/output"
top: "inception_4c/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_4c/pool_proj"
type: "Convolution"
bottom: "inception_4c/pool"
top: "inception_4c/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4c/relu_pool_proj"
type: "ReLU"
bottom: "inception_4c/pool_proj"
top: "inception_4c/pool_proj"
}
layer {
name: "inception_4c/output"
type: "Concat"
bottom: "inception_4c/1x1"
bottom: "inception_4c/3x3"
bottom: "inception_4c/5x5"
bottom: "inception_4c/pool_proj"
top: "inception_4c/output"
}
layer {
name: "inception_4d/1x1"
type: "Convolution"
bottom: "inception_4c/output"
top: "inception_4d/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 112
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4d/relu_1x1"
type: "ReLU"
bottom: "inception_4d/1x1"
top: "inception_4d/1x1"
}
layer {
name: "inception_4d/3x3_reduce"
type: "Convolution"
bottom: "inception_4c/output"
top: "inception_4d/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 144
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4d/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_4d/3x3_reduce"
top: "inception_4d/3x3_reduce"
}
layer {
name: "inception_4d/3x3"
type: "Convolution"
bottom: "inception_4d/3x3_reduce"
top: "inception_4d/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 288
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4d/relu_3x3"
type: "ReLU"
bottom: "inception_4d/3x3"
top: "inception_4d/3x3"
}
layer {
name: "inception_4d/5x5_reduce"
type: "Convolution"
bottom: "inception_4c/output"
top: "inception_4d/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4d/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_4d/5x5_reduce"
top: "inception_4d/5x5_reduce"
}
layer {
name: "inception_4d/5x5"
type: "Convolution"
bottom: "inception_4d/5x5_reduce"
top: "inception_4d/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4d/relu_5x5"
type: "ReLU"
bottom: "inception_4d/5x5"
top: "inception_4d/5x5"
}
layer {
name: "inception_4d/pool"
type: "Pooling"
bottom: "inception_4c/output"
top: "inception_4d/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_4d/pool_proj"
type: "Convolution"
bottom: "inception_4d/pool"
top: "inception_4d/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4d/relu_pool_proj"
type: "ReLU"
bottom: "inception_4d/pool_proj"
top: "inception_4d/pool_proj"
}
layer {
name: "inception_4d/output"
type: "Concat"
bottom: "inception_4d/1x1"
bottom: "inception_4d/3x3"
bottom: "inception_4d/5x5"
bottom: "inception_4d/pool_proj"
top: "inception_4d/output"
}
#layer {
# name: "loss2/ave_pool"
# type: "Pooling"
# bottom: "inception_4d/output"
# top: "loss2/ave_pool"
# pooling_param {
# pool: AVE
# kernel_size: 5
# stride: 3
# }
#}
#layer {
# name: "loss2/conv"
# type: "Convolution"
# bottom: "loss2/ave_pool"
# top: "loss2/conv"
# param {
# lr_mult: 1
# decay_mult: 1
# }
# param {
# lr_mult: 2
# decay_mult: 0
# }
# convolution_param {
# num_output: 128
# kernel_size: 1
# weight_filler {
# type: "xavier"
# }
# bias_filler {
# type: "constant"
# value: 0.2
# }
# }
#}
#layer {
# name: "loss2/relu_conv"
# type: "ReLU"
# bottom: "loss2/conv"
# top: "loss2/conv"
#}
#layer {
# name: "loss2/fc"
# type: "InnerProduct"
# bottom: "loss2/conv"
# top: "loss2/fc"
# param {
# lr_mult: 1
# decay_mult: 1
# }
# param {
# lr_mult: 2
# decay_mult: 0
# }
# inner_product_param {
# num_output: 1024
# weight_filler {
# type: "xavier"
# }
# bias_filler {
# type: "constant"
# value: 0.2
# }
# }
#}
#layer {
# name: "loss2/relu_fc"
# type: "ReLU"
# bottom: "loss2/fc"
# top: "loss2/fc"
#}
#layer {
# name: "loss2/drop_fc"
# type: "Dropout"
# bottom: "loss2/fc"
# top: "loss2/fc"
# dropout_param {
# dropout_ratio: 0.7
# }
#}
#layer {
# name: "loss2/classifier"
# type: "InnerProduct"
# bottom: "loss2/fc"
# top: "loss2/classifier"
# param {
# lr_mult: 1
# decay_mult: 1
# }
# param {
# lr_mult: 2
# decay_mult: 0
# }
# inner_product_param {
# num_output: 1000
# weight_filler {
# type: "xavier"
# }
# bias_filler {
# type: "constant"
# value: 0
# }
# }
#}
#layer {
# name: "loss2/loss"
# type: "SoftmaxWithLoss"
# bottom: "loss2/classifier"
# bottom: "label"
# top: "loss2/loss1"
# loss_weight: 0.3
#}
layer {
name: "inception_4e/1x1"
type: "Convolution"
bottom: "inception_4d/output"
top: "inception_4e/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4e/relu_1x1"
type: "ReLU"
bottom: "inception_4e/1x1"
top: "inception_4e/1x1"
}
layer {
name: "inception_4e/3x3_reduce"
type: "Convolution"
bottom: "inception_4d/output"
top: "inception_4e/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 160
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4e/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_4e/3x3_reduce"
top: "inception_4e/3x3_reduce"
}
layer {
name: "inception_4e/3x3"
type: "Convolution"
bottom: "inception_4e/3x3_reduce"
top: "inception_4e/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 320
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4e/relu_3x3"
type: "ReLU"
bottom: "inception_4e/3x3"
top: "inception_4e/3x3"
}
layer {
name: "inception_4e/5x5_reduce"
type: "Convolution"
bottom: "inception_4d/output"
top: "inception_4e/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4e/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_4e/5x5_reduce"
top: "inception_4e/5x5_reduce"
}
layer {
name: "inception_4e/5x5"
type: "Convolution"
bottom: "inception_4e/5x5_reduce"
top: "inception_4e/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4e/relu_5x5"
type: "ReLU"
bottom: "inception_4e/5x5"
top: "inception_4e/5x5"
}
layer {
name: "inception_4e/pool"
type: "Pooling"
bottom: "inception_4d/output"
top: "inception_4e/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_4e/pool_proj"
type: "Convolution"
bottom: "inception_4e/pool"
top: "inception_4e/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_4e/relu_pool_proj"
type: "ReLU"
bottom: "inception_4e/pool_proj"
top: "inception_4e/pool_proj"
}
layer {
name: "inception_4e/output"
type: "Concat"
bottom: "inception_4e/1x1"
bottom: "inception_4e/3x3"
bottom: "inception_4e/5x5"
bottom: "inception_4e/pool_proj"
top: "inception_4e/output"
}
layer {
name: "pool4/3x3_s2"
type: "Pooling"
bottom: "inception_4e/output"
top: "pool4/3x3_s2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "inception_5a/1x1"
type: "Convolution"
bottom: "pool4/3x3_s2"
top: "inception_5a/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5a/relu_1x1"
type: "ReLU"
bottom: "inception_5a/1x1"
top: "inception_5a/1x1"
}
layer {
name: "inception_5a/3x3_reduce"
type: "Convolution"
bottom: "pool4/3x3_s2"
top: "inception_5a/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 160
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5a/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_5a/3x3_reduce"
top: "inception_5a/3x3_reduce"
}
layer {
name: "inception_5a/3x3"
type: "Convolution"
bottom: "inception_5a/3x3_reduce"
top: "inception_5a/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 320
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5a/relu_3x3"
type: "ReLU"
bottom: "inception_5a/3x3"
top: "inception_5a/3x3"
}
layer {
name: "inception_5a/5x5_reduce"
type: "Convolution"
bottom: "pool4/3x3_s2"
top: "inception_5a/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 32
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5a/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_5a/5x5_reduce"
top: "inception_5a/5x5_reduce"
}
layer {
name: "inception_5a/5x5"
type: "Convolution"
bottom: "inception_5a/5x5_reduce"
top: "inception_5a/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5a/relu_5x5"
type: "ReLU"
bottom: "inception_5a/5x5"
top: "inception_5a/5x5"
}
layer {
name: "inception_5a/pool"
type: "Pooling"
bottom: "pool4/3x3_s2"
top: "inception_5a/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_5a/pool_proj"
type: "Convolution"
bottom: "inception_5a/pool"
top: "inception_5a/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5a/relu_pool_proj"
type: "ReLU"
bottom: "inception_5a/pool_proj"
top: "inception_5a/pool_proj"
}
layer {
name: "inception_5a/output"
type: "Concat"
bottom: "inception_5a/1x1"
bottom: "inception_5a/3x3"
bottom: "inception_5a/5x5"
bottom: "inception_5a/pool_proj"
top: "inception_5a/output"
}
layer {
name: "inception_5b/1x1"
type: "Convolution"
bottom: "inception_5a/output"
top: "inception_5b/1x1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5b/relu_1x1"
type: "ReLU"
bottom: "inception_5b/1x1"
top: "inception_5b/1x1"
}
layer {
name: "inception_5b/3x3_reduce"
type: "Convolution"
bottom: "inception_5a/output"
top: "inception_5b/3x3_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5b/relu_3x3_reduce"
type: "ReLU"
bottom: "inception_5b/3x3_reduce"
top: "inception_5b/3x3_reduce"
}
layer {
name: "inception_5b/3x3"
type: "Convolution"
bottom: "inception_5b/3x3_reduce"
top: "inception_5b/3x3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5b/relu_3x3"
type: "ReLU"
bottom: "inception_5b/3x3"
top: "inception_5b/3x3"
}
layer {
name: "inception_5b/5x5_reduce"
type: "Convolution"
bottom: "inception_5a/output"
top: "inception_5b/5x5_reduce"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 48
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5b/relu_5x5_reduce"
type: "ReLU"
bottom: "inception_5b/5x5_reduce"
top: "inception_5b/5x5_reduce"
}
layer {
name: "inception_5b/5x5"
type: "Convolution"
bottom: "inception_5b/5x5_reduce"
top: "inception_5b/5x5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 2
kernel_size: 5
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5b/relu_5x5"
type: "ReLU"
bottom: "inception_5b/5x5"
top: "inception_5b/5x5"
}
layer {
name: "inception_5b/pool"
type: "Pooling"
bottom: "inception_5a/output"
top: "inception_5b/pool"
pooling_param {
pool: MAX
kernel_size: 3
stride: 1
pad: 1
}
}
layer {
name: "inception_5b/pool_proj"
type: "Convolution"
bottom: "inception_5b/pool"
top: "inception_5b/pool_proj"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
kernel_size: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
layer {
name: "inception_5b/relu_pool_proj"
type: "ReLU"
bottom: "inception_5b/pool_proj"
top: "inception_5b/pool_proj"
}
layer {
name: "inception_5b/output"
type: "Concat"
bottom: "inception_5b/1x1"
bottom: "inception_5b/3x3"
bottom: "inception_5b/5x5"
bottom: "inception_5b/pool_proj"
top: "inception_5b/output"
}
layer {
name: "pool5/7x7_s1"
type: "Pooling"
bottom: "inception_5b/output"
top: "pool5/7x7_s1"
pooling_param {
pool: AVE
kernel_size: 7
stride: 1
}
}
layer {
name: "pool5/drop_7x7_s1"
type: "Dropout"
bottom: "pool5/7x7_s1"
top: "pool5/7x7_s1"
dropout_param {
dropout_ratio: 0.4
}
}
layer {
name: "loss3/classifier"
type: "InnerProduct"
bottom: "pool5/7x7_s1"
top: "loss3/classifier"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1000
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "loss3/loss3"
type: "SoftmaxWithLoss"
bottom: "loss3/classifier"
bottom: "label"
top: "loss3/loss3"
loss_weight: 1
}
set -e
function test() {
cfg=$1
batch=$2
prefix=$3
sed -i "/input: \"data\"/{n;s/^input_dim.*/input_dim: $batch/g}" $cfg
sed -i "/input: \"label\"/{n;s/^input_dim.*/input_dim: $batch/g}" $cfg
caffe time --model=$cfg --iterations=50 --gpu 0 > logs/$prefix-1gpu-batch${batch}.log 2>&1
}
if [ ! -d "logs" ]; then
mkdir logs
fi
# alexnet
test alexnet.prototxt 64 alexnet
test alexnet.prototxt 128 alexnet
test alexnet.prototxt 256 alexnet
test alexnet.prototxt 512 alexnet
# googlenet
test googlenet.prototxt 64 googlenet
test googlenet.prototxt 128 googlenet
# small net
test smallnet_mnist_cifar.prototxt 64 smallnet
test smallnet_mnist_cifar.prototxt 128 smallnet
test smallnet_mnist_cifar.prototxt 256 smallnet
test smallnet_mnist_cifar.prototxt 512 smallnet
#!/bin/bash
set -e
function test() {
cfg=$1
batch=$2
prefix=$3
batch_per_gpu=`expr ${batch} / 4`
sed -i "/input: \"data\"/{n;s/^input_dim.*/input_dim: ${batch_per_gpu}/g}" $cfg
sed -i "/input: \"label\"/{n;s/^input_dim.*/input_dim: ${batch_per_gpu}/g}" $cfg
sed -i "1c\net : \"${cfg}\"" solver.prototxt
caffe train --solver=solver.prototxt -gpu 0,1,2,3 > logs/${prefix}-4gpu-batch${batch}.log 2>&1
}
if [ ! -d "logs" ]; then
mkdir logs
fi
# alexnet
test alexnet.prototxt 512 alexnet
test alexnet.prototxt 1024 alexnet
# googlenet
test googlenet.prototxt 512 googlenet
name: "mnist/cifar"
input: "data"
input_dim: 128
input_dim: 3
input_dim: 32
input_dim: 32
input: "label"
input_dim: 128
input_dim: 1
input_dim: 1
input_dim: 1
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 32
pad: 2
kernel_size: 5
stride: 1
weight_filler {
type: "gaussian"
std: 0.0001
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "pool1"
top: "pool1"
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 32
pad: 2
kernel_size: 5
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: AVE
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 64
pad: 2
kernel_size: 5
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv3"
top: "pool3"
pooling_param {
pool: AVE
kernel_size: 3
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool3"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 64
weight_filler {
type: "gaussian"
std: 0.1
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "gaussian"
std: 0.1
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "ip2"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "ip2"
bottom: "label"
top: "loss"
}
net: "alexnet.prototxt"
base_lr: 0.01
lr_policy: "fixed"
display: 20
max_iter: 200
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/caffe_alexnet_train"
solver_mode: GPU
#!/usr/bin/env python
from paddle.trainer_config_helpers import *
height = 227
width = 227
num_class = 1000
batch_size = get_config_arg('batch_size', int, 128)
args = {'height': height, 'width': width, 'color': True, 'num_class': num_class}
define_py_data_sources2(
"train.list", None, module="provider", obj="process", args=args)
settings(
batch_size=batch_size,
learning_rate=0.01 / batch_size,
learning_method=MomentumOptimizer(0.9),
regularization=L2Regularization(0.0005 * batch_size))
# conv1
net = data_layer('data', size=height * width * 3)
net = img_conv_layer(
input=net,
filter_size=11,
num_channels=3,
num_filters=96,
stride=4,
padding=1)
net = img_cmrnorm_layer(input=net, size=5, scale=0.0001, power=0.75)
net = img_pool_layer(input=net, pool_size=3, stride=2)
# conv2
net = img_conv_layer(
input=net, filter_size=5, num_filters=256, stride=1, padding=2, groups=1)
net = img_cmrnorm_layer(input=net, size=5, scale=0.0001, power=0.75)
net = img_pool_layer(input=net, pool_size=3, stride=2)
# conv3
net = img_conv_layer(
input=net, filter_size=3, num_filters=384, stride=1, padding=1)
# conv4
net = img_conv_layer(
input=net, filter_size=3, num_filters=384, stride=1, padding=1, groups=1)
# conv5
net = img_conv_layer(
input=net, filter_size=3, num_filters=256, stride=1, padding=1, groups=1)
net = img_pool_layer(input=net, pool_size=3, stride=2)
net = fc_layer(
input=net,
size=4096,
act=ReluActivation(),
layer_attr=ExtraAttr(drop_rate=0.5))
net = fc_layer(
input=net,
size=4096,
act=ReluActivation(),
layer_attr=ExtraAttr(drop_rate=0.5))
net = fc_layer(input=net, size=1000, act=SoftmaxActivation())
lab = data_layer('label', num_class)
loss = cross_entropy(input=net, label=lab)
outputs(loss)
#!/usr/bin/env python
from paddle.trainer_config_helpers import *
height = 224
width = 224
num_class = 1000
batch_size = get_config_arg('batch_size', int, 128)
args = {'height': height, 'width': width, 'color': True, 'num_class': num_class}
define_py_data_sources2(
"train.list", None, module="provider", obj="process", args=args)
settings(
batch_size=batch_size,
learning_rate=0.01 / batch_size,
learning_method=MomentumOptimizer(0.9),
regularization=L2Regularization(0.0005 * batch_size))
def inception2(name, input, channels,
               filter1,
               filter3R, filter3,
               filter5R, filter5,
               proj):
    conv1 = name + '_1'
    conv3r = name + '_3r'
    conv3 = name + '_3'
    conv5r = name + '_5r'
    conv5 = name + '_5'
    maxpool = name + '_max'
    convproj = name + '_proj'
    cov1 = img_conv_layer(
        name=conv1,
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter1,
        stride=1,
        padding=0)
    cov3r = img_conv_layer(
        name=conv3r,
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter3R,
        stride=1,
        padding=0)
    cov3 = img_conv_layer(
        name=conv3,
        input=cov3r,
        filter_size=3,
        num_filters=filter3,
        stride=1,
        padding=1)
    cov5r = img_conv_layer(
        name=conv5r,
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter5R,
        stride=1,
        padding=0)
    cov5 = img_conv_layer(
        name=conv5,
        input=cov5r,
        filter_size=5,
        num_filters=filter5,
        stride=1,
        padding=2)
    pool1 = img_pool_layer(
        name=maxpool,
        input=input,
        pool_size=3,
        num_channels=channels,
        stride=1,
        padding=1)
    covprj = img_conv_layer(
        name=convproj,
        input=pool1,
        filter_size=1,
        num_filters=proj,
        stride=1,
        padding=0)
    cat = concat_layer(name=name, input=[cov1, cov3, cov5, covprj])
    return cat
def inception(name, input, channels,
              filter1,
              filter3R, filter3,
              filter5R, filter5,
              proj):
    cov1 = conv_projection(
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter1,
        stride=1,
        padding=0)
    cov3r = img_conv_layer(
        name=name + '_3r',
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter3R,
        stride=1,
        padding=0)
    cov3 = conv_projection(
        input=cov3r, filter_size=3, num_filters=filter3, stride=1, padding=1)
    cov5r = img_conv_layer(
        name=name + '_5r',
        input=input,
        filter_size=1,
        num_channels=channels,
        num_filters=filter5R,
        stride=1,
        padding=0)
    cov5 = conv_projection(
        input=cov5r, filter_size=5, num_filters=filter5, stride=1, padding=2)
    pool1 = img_pool_layer(
        name=name + '_max',
        input=input,
        pool_size=3,
        num_channels=channels,
        stride=1,
        padding=1)
    covprj = conv_projection(
        input=pool1, filter_size=1, num_filters=proj, stride=1, padding=0)
    cat = concat_layer(
        name=name,
        input=[cov1, cov3, cov5, covprj],
        bias_attr=True,
        act=ReluActivation())
    return cat
lab = data_layer(name="label", size=1000)
data = data_layer(name="input", size=3 * height * width)
# stage 1
conv1 = img_conv_layer(
name="conv1",
input=data,
filter_size=7,
num_channels=3,
num_filters=64,
stride=2,
padding=3)
pool1 = img_pool_layer(
name="pool1", input=conv1, pool_size=3, num_channels=64, stride=2)
# stage 2
conv2_1 = img_conv_layer(
name="conv2_1",
input=pool1,
filter_size=1,
num_filters=64,
stride=1,
padding=0)
conv2_2 = img_conv_layer(
name="conv2_2",
input=conv2_1,
filter_size=3,
num_filters=192,
stride=1,
padding=1)
pool2 = img_pool_layer(
name="pool2", input=conv2_2, pool_size=3, num_channels=192, stride=2)
# stage 3
ince3a = inception("ince3a", pool2, 192, 64, 96, 128, 16, 32, 32)
ince3b = inception("ince3b", ince3a, 256, 128, 128, 192, 32, 96, 64)
pool3 = img_pool_layer(
name="pool3", input=ince3b, num_channels=480, pool_size=3, stride=2)
# stage 4
ince4a = inception("ince4a", pool3, 480, 192, 96, 208, 16, 48, 64)
ince4b = inception("ince4b", ince4a, 512, 160, 112, 224, 24, 64, 64)
ince4c = inception("ince4c", ince4b, 512, 128, 128, 256, 24, 64, 64)
ince4d = inception("ince4d", ince4c, 512, 112, 144, 288, 32, 64, 64)
ince4e = inception("ince4e", ince4d, 528, 256, 160, 320, 32, 128, 128)
pool4 = img_pool_layer(
name="pool4", input=ince4e, num_channels=832, pool_size=3, stride=2)
# stage 5
ince5a = inception("ince5a", pool4, 832, 256, 160, 320, 32, 128, 128)
ince5b = inception("ince5b", ince5a, 832, 384, 192, 384, 48, 128, 128)
pool5 = img_pool_layer(
name="pool5",
input=ince5b,
num_channels=1024,
pool_size=7,
stride=7,
pool_type=AvgPooling())
# loss1 and loss2 are removed on all platforms when running the benchmark
# output 1
# pool_o1 = img_pool_layer(name="pool_o1", input=ince4a, num_channels=512, pool_size=5, stride=3, pool_type=AvgPooling())
# conv_o1 = img_conv_layer(name="conv_o1", input=pool_o1, filter_size=1, num_filters=128, stride=1, padding=0)
# fc_o1 = fc_layer(name="fc_o1", input=conv_o1, size=1024, layer_attr=ExtraAttr(drop_rate=0.7), act=ReluActivation())
# out1 = fc_layer(name="output1", input=fc_o1, size=1000, act=SoftmaxActivation())
# loss1 = cross_entropy(name='loss1', input=out1, label=lab, coeff=0.3)
# output 2
#pool_o2 = img_pool_layer(name="pool_o2", input=ince4d, num_channels=528, pool_size=5, stride=3, pool_type=AvgPooling())
#conv_o2 = img_conv_layer(name="conv_o2", input=pool_o2, filter_size=1, num_filters=128, stride=1, padding=0)
#fc_o2 = fc_layer(name="fc_o2", input=conv_o2, size=1024, layer_attr=ExtraAttr(drop_rate=0.7), act=ReluActivation())
#out2 = fc_layer(name="output2", input=fc_o2, size=1000, act=SoftmaxActivation())
#loss2 = cross_entropy(name='loss2', input=out2, label=lab, coeff=0.3)
# output 3
dropout = dropout_layer(name="dropout", input=pool5, dropout_rate=0.4)
out3 = fc_layer(
name="output3", input=dropout, size=1000, act=SoftmaxActivation())
loss3 = cross_entropy(name='loss3', input=out3, label=lab)
outputs(loss3)
import io, os
import random
import numpy as np
from paddle.trainer.PyDataProvider2 import *
def initHook(settings, height, width, color, num_class, **kwargs):
    settings.height = height
    settings.width = width
    settings.color = color
    settings.num_class = num_class
    if settings.color:
        settings.data_size = settings.height * settings.width * 3
    else:
        settings.data_size = settings.height * settings.width
    settings.slots = [dense_vector(settings.data_size), integer_value(1)]


@provider(
    init_hook=initHook, min_pool_size=-1, cache=CacheType.CACHE_PASS_IN_MEM)
def process(settings, file_list):
    # Yield 1024 random samples; no image files are read.
    for i in xrange(1024):
        img = np.random.rand(1, settings.data_size).reshape(-1, 1).flatten()
        # random.randint includes both endpoints, so the upper bound is
        # num_class - 1 to keep labels in [0, num_class).
        lab = random.randint(0, settings.num_class - 1)
        yield img.astype('float32'), int(lab)
set -e
function train() {
cfg=$1
thread=$2
bz=$3
args="batch_size=$3"
prefix=$4
paddle train --job=time \
--config=$cfg \
--use_gpu=True \
--trainer_count=$thread \
--log_period=10 \
--test_period=100 \
--config_args=$args \
> logs/$prefix-${thread}gpu-$bz.log 2>&1
}
if [ ! -f "train.list" ]; then
echo " " > train.list
fi
if [ ! -d "logs" ]; then
mkdir logs
fi
#========single-gpu=========#
# alexnet
train alexnet.py 1 64 alexnet
train alexnet.py 1 128 alexnet
train alexnet.py 1 256 alexnet
train alexnet.py 1 512 alexnet
# googlenet
train googlenet.py 1 64 googlenet
train googlenet.py 1 128 googlenet
train googlenet.py 1 256 googlenet
# smallnet
train smallnet_mnist_cifar.py 1 64 smallnet
train smallnet_mnist_cifar.py 1 128 smallnet
train smallnet_mnist_cifar.py 1 256 smallnet
train smallnet_mnist_cifar.py 1 512 smallnet
############################
#========multi-gpus=========#
train alexnet.py 4 512 alexnet
train alexnet.py 4 1024 alexnet
train googlenet.py 4 512 googlenet
train googlenet.py 4 1024 googlenet
#!/usr/bin/env python
from paddle.trainer_config_helpers import *
height = 32
width = 32
num_class = 10
batch_size = get_config_arg('batch_size', int, 128)
args = {'height': height, 'width': width, 'color': True, 'num_class': num_class}
define_py_data_sources2(
"train.list", None, module="provider", obj="process", args=args)
settings(
batch_size=batch_size,
learning_rate=0.01 / batch_size,
learning_method=MomentumOptimizer(0.9),
regularization=L2Regularization(0.0005 * batch_size))
# conv1
net = data_layer('data', size=height * width * 3)
net = img_conv_layer(
input=net,
filter_size=5,
num_channels=3,
num_filters=32,
stride=1,
padding=2)
net = img_pool_layer(input=net, pool_size=3, stride=2, padding=1)
# conv2
net = img_conv_layer(
input=net, filter_size=5, num_filters=32, stride=1, padding=2)
net = img_pool_layer(
input=net, pool_size=3, stride=2, padding=1, pool_type=AvgPooling())
# conv3
net = img_conv_layer(
input=net, filter_size=3, num_filters=64, stride=1, padding=1)
net = img_pool_layer(
input=net, pool_size=3, stride=2, padding=1, pool_type=AvgPooling())
net = fc_layer(input=net, size=64, act=ReluActivation())
net = fc_layer(input=net, size=10, act=SoftmaxActivation())
lab = data_layer('label', num_class)
loss = classification_cost(input=net, label=lab)
outputs(loss)
from __future__ import print_function
import six.moves.cPickle as pickle
import gzip
import os
import numpy
def get_dataset_file(dataset, default_dataset, origin):
    data_dir, data_file = os.path.split(dataset)
    if (not os.path.isfile(dataset)) and data_file == default_dataset:
        from six.moves import urllib
        print('Downloading data from %s' % origin)
        urllib.request.urlretrieve(origin, dataset)
    return dataset
def create_data(path="imdb.pkl"):
    if not os.path.isfile('imdb.train.pkl'):
        path = get_dataset_file(
            path, "imdb.pkl",
            "http://www.iro.umontreal.ca/~lisa/deep/data/imdb.pkl")
        if path.endswith(".gz"):
            f = gzip.open(path, 'rb')
        else:
            f = open(path, 'rb')
        train_set = pickle.load(f)
        test_set = pickle.load(f)
        f.close()
        # Split the downloaded pickle into separate train and test files.
        with open('imdb.train.pkl', 'wb') as out:
            pickle.dump(train_set, out)
        with open('imdb.test.pkl', 'wb') as out:
            pickle.dump(test_set, out)
    if not os.path.isfile('train.list'):
        # train.list names the data file handed to the provider.
        with open('train.list', 'w') as out:
            out.write('imdb.train.pkl\n')
def main():
    create_data('imdb.pkl')


if __name__ == "__main__":
    main()
import io, os
import random
import numpy as np
import six.moves.cPickle as pickle
from paddle.trainer.PyDataProvider2 import *
def remove_unk(x, n_words):
    return [[1 if w >= n_words else w for w in sen] for sen in x]
# ==============================================================
# TensorFlow uses fixed-length sequences, whereas PaddlePaddle can
# process variable-length ones. The benchmark pads sequences so
# that the results are comparable across platforms.
# ==============================================================
def pad_sequences(sequences,
                  maxlen=None,
                  dtype='int32',
                  padding='post',
                  truncating='post',
                  value=0.):
    lengths = [len(s) for s in sequences]
    nb_samples = len(sequences)
    if maxlen is None:
        maxlen = np.max(lengths)
    x = (np.ones((nb_samples, maxlen)) * value).astype(dtype)
    for idx, s in enumerate(sequences):
        if len(s) == 0:
            continue  # empty list was found
        if truncating == 'pre':
            trunc = s[-maxlen:]
        elif truncating == 'post':
            trunc = s[:maxlen]
        else:
            raise ValueError("Truncating type '%s' not understood" % truncating)
        if padding == 'post':
            x[idx, :len(trunc)] = trunc
        elif padding == 'pre':
            x[idx, -len(trunc):] = trunc
        else:
            raise ValueError("Padding type '%s' not understood" % padding)
    return x
def initHook(settings, vocab_size, pad_seq, maxlen, **kwargs):
    settings.vocab_size = vocab_size
    settings.pad_seq = pad_seq
    settings.maxlen = maxlen
    settings.input_types = [
        integer_value_sequence(vocab_size), integer_value(2)
    ]
@provider(
    init_hook=initHook, min_pool_size=-1, cache=CacheType.CACHE_PASS_IN_MEM)
def process(settings, file):
    with open(file, 'rb') as f:
        train_set = pickle.load(f)
    x, y = train_set
    # Map word ids outside the dictionary (>= vocab_size) to 1.
    x = remove_unk(x, settings.vocab_size)
    if settings.pad_seq:
        x = pad_sequences(x, maxlen=settings.maxlen, value=0.)
    for i in range(len(y)):
        # Build a real list so the result is the same on Python 2 and 3.
        yield [int(w) for w in x[i]], int(y[i])
#!/usr/bin/env python
from paddle.trainer_config_helpers import *
import imdb
num_class = 2
vocab_size = 30000
fixedlen = 100
batch_size = get_config_arg('batch_size', int, 128)
lstm_num = get_config_arg('lstm_num', int, 1)
hidden_size = get_config_arg('hidden_size', int, 128)
# whether to pad sequence into fixed length
pad_seq = get_config_arg('pad_seq', bool, True)
imdb.create_data('imdb.pkl')
args = {'vocab_size': vocab_size, 'pad_seq': pad_seq, 'maxlen': fixedlen}
define_py_data_sources2(
"train.list", None, module="provider", obj="process", args=args)
settings(
batch_size=batch_size,
learning_rate=2e-3,
learning_method=AdamOptimizer(),
regularization=L2Regularization(8e-4),
gradient_clipping_threshold=25)
net = data_layer('data', size=vocab_size)
net = embedding_layer(input=net, size=128)
for i in xrange(lstm_num):
    net = simple_lstm(input=net, size=hidden_size)
net = last_seq(input=net)
net = fc_layer(input=net, size=2, act=SoftmaxActivation())
lab = data_layer('label', num_class)
loss = classification_cost(input=net, label=lab)
outputs(loss)
set -e
function train() {
cfg=$1
thread=$2
args="lstm_num=${3},pad_seq=${4},hidden_size=${5},batch_size=${6}"
paddle train --job=time \
--config=$cfg \
--use_gpu=1 \
--trainer_count=$thread \
--log_period=10 \
--test_period=100 \
--num_passes=1 \
--feed_data=1 \
--config_args=$args \
>logs/rnn-pad${4}-${thread}gpu-lstm${3}-batch${6}-hid${5}.log 2>&1
}
if [ ! -d "logs" ]; then
mkdir logs
fi
## padding, single gpu
#-----config--gpu--lstm_num--padding--hidden_size--batch_size
## lstm_num=2, batch_size=64
train rnn.py 1 2 1 256 64
train rnn.py 1 2 1 512 64
train rnn.py 1 2 1 1280 64
## lstm_num=2, batch_size=128
train rnn.py 1 2 1 256 128
train rnn.py 1 2 1 512 128
train rnn.py 1 2 1 1280 128
## lstm_num=2, batch_size=256
train rnn.py 1 2 1 256 256
train rnn.py 1 2 1 512 256
train rnn.py 1 2 1 1280 256
#==================multi gpus=====================#
# hidden_size=256, lstm_num=2, different batch size
train rnn.py 4 2 1 256 128
train rnn.py 4 2 1 256 256
train rnn.py 4 2 1 256 512
# hidden_size=512, lstm_num=2, different batch size
train rnn.py 4 2 1 512 128
train rnn.py 4 2 1 512 256
train rnn.py 4 2 1 512 512
from six.moves import xrange # pylint: disable=redefined-builtin
from datetime import datetime
import math
import time
import numpy as np
import tensorflow.python.platform
import tensorflow as tf
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('batch_size', 128, """Batch size.""")
tf.app.flags.DEFINE_integer('num_batches', 100, """Number of batches to run.""")
tf.app.flags.DEFINE_boolean('forward_only', False,
"""Only run the forward pass.""")
tf.app.flags.DEFINE_boolean('forward_backward_only', False,
"""Only run the forward-backward pass.""")
tf.app.flags.DEFINE_string('data_format', 'NCHW',
"""The data format for Convnet operations.
Can be either NHWC or NCHW.
""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
"""Whether to log device placement.""")
def _conv(name, inpOp, nIn, nOut, kH, kW, dH, dW, padType, wd=0.0005):
    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(
            name + '_w', [kH, kW, nIn, nOut],
            initializer=tf.truncated_normal_initializer(
                stddev=0.01, dtype=tf.float32),
            dtype=tf.float32)
        if wd is not None and wd > 0:
            weight_decay = tf.mul(tf.nn.l2_loss(kernel), wd, name='weight_loss')
            tf.add_to_collection('losses', weight_decay)
        if FLAGS.data_format == 'NCHW':
            strides = [1, 1, dH, dW]
        else:
            strides = [1, dH, dW, 1]
        conv = tf.nn.conv2d(
            inpOp,
            kernel,
            strides,
            padding=padType,
            data_format=FLAGS.data_format)
        biases = tf.get_variable(
            name=name + '_b',
            shape=[nOut],
            initializer=tf.constant_initializer(
                value=0.0, dtype=tf.float32),
            dtype=tf.float32)
        bias = tf.reshape(
            tf.nn.bias_add(
                conv, biases, data_format=FLAGS.data_format),
            conv.get_shape())
        conv1 = tf.nn.relu(bias, name=scope)
        return conv1
def _affine(name, inpOp, nIn, nOut, wd=0.0005, act=True, drop=None):
    with tf.name_scope(name) as scope:
        kernel = tf.get_variable(
            name + '_w', [nIn, nOut],
            initializer=tf.truncated_normal_initializer(
                stddev=0.01, dtype=tf.float32),
            dtype=tf.float32)
        if wd is not None and wd > 0:
            weight_decay = tf.mul(tf.nn.l2_loss(kernel), wd, name='weight_loss')
            tf.add_to_collection('losses', weight_decay)
        biases = tf.get_variable(
            name + '_b', [nOut],
            initializer=tf.constant_initializer(
                value=0.0, dtype=tf.float32),
            dtype=tf.float32,
            trainable=True)
        affine1 = tf.nn.relu_layer(inpOp, kernel, biases, name=name) if act else \
            tf.matmul(inpOp, kernel) + biases
        output = tf.nn.dropout(affine1, drop) if drop else affine1
        return output
def _mpool(name, inpOp, kH, kW, dH, dW):
    if FLAGS.data_format == 'NCHW':
        ksize = [1, 1, kH, kW]
        strides = [1, 1, dH, dW]
    else:
        ksize = [1, kH, kW, 1]
        strides = [1, dH, dW, 1]
    return tf.nn.max_pool(
        inpOp,
        ksize=ksize,
        strides=strides,
        padding='VALID',
        data_format=FLAGS.data_format,
        name=name)
def _norm(name, l_input, lsize=4):
    return tf.nn.lrn(l_input,
                     lsize,
                     bias=1.0,
                     alpha=0.001 / 9.0,
                     beta=0.75,
                     name=name)
def loss(logits, labels):
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits, labels, name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    tf.add_to_collection('losses', cross_entropy_mean)
    # The total loss is defined as the cross entropy loss plus all of the weight
    # decay terms (L2 loss).
    return tf.add_n(tf.get_collection('losses'), name='total_loss')
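def get_incoming_shape(incoming):
    """ Returns the incoming data shape """
    # Local import so the function works even if numpy is not imported at module level.
    import numpy as np
    if isinstance(incoming, tf.Tensor):
        return incoming.get_shape().as_list()
    # np.array is a factory function, not a type; the array type is np.ndarray.
    elif isinstance(incoming, (np.ndarray, list, tuple)):
        return np.shape(incoming)
    else:
        raise Exception("Invalid incoming layer.")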
def inference(images):
conv1 = _conv('conv1', images, 3, 96, 11, 11, 4, 4, 'VALID')
pool1 = _mpool('pool1', conv1, 3, 3, 2, 2)
norm1 = _norm('norm1', pool1, lsize=5)
conv2 = _conv('conv2', norm1, 96, 256, 5, 5, 1, 1, 'SAME')
pool2 = _mpool('pool2', conv2, 3, 3, 2, 2)
norm2 = _norm('norm2', pool2, lsize=5)
conv3 = _conv('conv3', norm2, 256, 384, 3, 3, 1, 1, 'SAME')
conv4 = _conv('conv4', conv3, 384, 384, 3, 3, 1, 1, 'SAME')
conv5 = _conv('conv5', conv4, 384, 256, 3, 3, 1, 1, 'SAME')
pool5 = _mpool('pool5', conv5, 3, 3, 2, 2)
resh1 = tf.reshape(pool5, [-1, 256 * 6 * 6])
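# Pass 0.5 by keyword as the dropout keep probability; positionally it would be read as wd.
affn1 = _affine('fc6', resh1, 256 * 6 * 6, 4096, drop=0.5)
affn2 = _affine('fc7', affn1, 4096, 4096, drop=0.5)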
affn3 = _affine('fc8', affn2, 4096, 1000, wd=None, act=False) # last fc
return affn3
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    if not isinstance(target, list):
        target = [target]
    target_op = tf.group(*target)
    for i in xrange(FLAGS.num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target_op)
        duration = time.time() - start_time
        # Skip the burn-in steps so that exactly FLAGS.num_batches steps are timed.
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / FLAGS.num_batches
    vr = total_duration_squared / FLAGS.num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, FLAGS.num_batches, mn, sd))
def _add_loss_summaries(total_loss):
"""
Generates moving average for all losses and associated summaries for
visualizing the performance of the network.
Args:
total_loss: Total loss from loss().
Returns:
loss_averages_op: op for generating moving averages of losses.
"""
# Compute the moving average of all individual losses and the total loss.
loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
losses = tf.get_collection('losses')
loss_averages_op = loss_averages.apply(losses + [total_loss])
# Attach a scalar summary to all individual losses and the total loss; do the
# same for the averaged version of the losses.
for l in losses + [total_loss]:
# Name each loss as '(raw)' and name the moving average version of the loss
# as the original loss name.
tf.scalar_summary(l.op.name + ' (raw)', l)
tf.scalar_summary(l.op.name, loss_averages.average(l))
return loss_averages_op
def run_benchmark():
with tf.Graph().as_default():
with tf.device('/gpu:0'):
# Generate some dummy images.
image_size = 224
# Note that our padding definition is slightly different from cuda-convnet's.
# In order to force the model to start with the same activations sizes,
# we add 3 to the image_size and employ VALID padding above.
if FLAGS.data_format == 'NCHW':
image_shape = [
FLAGS.batch_size, 3, image_size + 3, image_size + 3
]
else:
image_shape = [
FLAGS.batch_size, image_size + 3, image_size + 3, 3
]
images = tf.get_variable(
'image',
image_shape,
initializer=tf.truncated_normal_initializer(
stddev=0.1, dtype=tf.float32),
dtype=tf.float32,
trainable=False)
labels = tf.get_variable(
'label', [FLAGS.batch_size],
initializer=tf.constant_initializer(1),
dtype=tf.int32,
trainable=False)
# Build a Graph that computes the logits predictions from the
# inference model.
last_layer = inference(images)
objective = loss(last_layer, labels)
# Compute the gradients with respect to all the parameters.
# opt = tf.train.GradientDescentOptimizer(0.001)
opt = tf.train.MomentumOptimizer(0.001, 0.9)
grads = opt.compute_gradients(objective)
global_step = tf.get_variable(
'global_step', [],
initializer=tf.constant_initializer(
0.0, dtype=tf.float32),
trainable=False,
dtype=tf.float32)
apply_gradient_op = opt.apply_gradients(
grads, global_step=global_step)
# Track the moving averages of all trainable variables.
variable_averages = tf.train.ExponentialMovingAverage(0.9,
global_step)
variables_averages_op = variable_averages.apply(
tf.trainable_variables())
# Build an initialization operation.
init = tf.initialize_all_variables()
# Start running operations on the Graph.
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True,
log_device_placement=FLAGS.log_device_placement))
sess.run(init)
run_forward = True
run_forward_backward = True
if FLAGS.forward_only and FLAGS.forward_backward_only:
raise ValueError("Cannot specify --forward_only and "
"--forward_backward_only at the same time.")
if FLAGS.forward_only:
run_forward_backward = False
elif FLAGS.forward_backward_only:
run_forward = False
if run_forward:
time_tensorflow_run(sess, last_layer, "Forward")
if run_forward_backward:
with tf.control_dependencies(
[apply_gradient_op, variables_averages_op]):
train_op = tf.no_op(name='train')
time_tensorflow_run(sess, [train_op, objective],
"Forward-backward")
def main(_):
run_benchmark()
if __name__ == '__main__':
tf.app.run()
from six.moves import xrange # pylint: disable=redefined-builtin
from datetime import datetime
import math
import re
import time
import tensorflow.python.platform
import tensorflow as tf
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('batch_size', 64, """Batch size.""")
tf.app.flags.DEFINE_integer('num_batches', 100, """Number of batches to run.""")
tf.app.flags.DEFINE_string('data_format', 'NCHW',
"""The data format for Convnet operations.
Can be either NHWC or NCHW.
""")
tf.app.flags.DEFINE_string('train_dir', '/train_model',
"""Directory where to write event logs """
"""and checkpoint.""")
tf.app.flags.DEFINE_integer('num_gpus', 4, """How many GPUs to use.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
"""Whether to log device placement.""")
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EPOCHS_PER_DECAY = 50
INITIAL_LEARNING_RATE = 0.1
LEARNING_RATE_DECAY_FACTOR = 0.1
TOWER_NAME = 'tower'
def _conv(name, inpOp, nIn, nOut, kH, kW, dH, dW, padType, wd=0.005):
with tf.name_scope(name) as scope:
kernel = tf.get_variable(
name + '_w', [kH, kW, nIn, nOut],
initializer=tf.truncated_normal_initializer(
stddev=0.01, dtype=tf.float32),
dtype=tf.float32)
if wd is not None:
weight_decay = tf.mul(tf.nn.l2_loss(kernel), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
if FLAGS.data_format == 'NCHW':
strides = [1, 1, dH, dW]
else:
strides = [1, dH, dW, 1]
conv = tf.nn.conv2d(
inpOp,
kernel,
strides,
padding=padType,
data_format=FLAGS.data_format)
biases = tf.get_variable(
name=name + '_b',
shape=[nOut],
initializer=tf.constant_initializer(
value=0.0, dtype=tf.float32),
dtype=tf.float32)
bias = tf.reshape(
tf.nn.bias_add(
conv, biases, data_format=FLAGS.data_format),
conv.get_shape())
conv1 = tf.nn.relu(bias, name=scope)
return conv1
def _affine(name, inpOp, nIn, nOut, wd=0.005, act=True):
with tf.name_scope(name) as scope:
kernel = tf.get_variable(
name + '_w', [nIn, nOut],
initializer=tf.truncated_normal_initializer(
stddev=0.01, dtype=tf.float32),
dtype=tf.float32)
if wd is not None:
weight_decay = tf.mul(tf.nn.l2_loss(kernel), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
biases = tf.get_variable(
name + '_b', [nOut],
initializer=tf.constant_initializer(
value=0.0, dtype=tf.float32),
dtype=tf.float32,
trainable=True)
affine1 = tf.nn.relu_layer(inpOp, kernel, biases, name=name) if act else \
tf.matmul(inpOp, kernel) + biases
return affine1
def _mpool(name, inpOp, kH, kW, dH, dW):
if FLAGS.data_format == 'NCHW':
ksize = [1, 1, kH, kW]
strides = [1, 1, dH, dW]
else:
ksize = [1, kH, kW, 1]
strides = [1, dH, dW, 1]
return tf.nn.max_pool(
inpOp,
ksize=ksize,
strides=strides,
padding='VALID',
data_format=FLAGS.data_format,
name=name)
def _norm(name, l_input, lsize=4):
return tf.nn.lrn(l_input,
lsize,
bias=1.0,
alpha=0.001 / 9.0,
beta=0.75,
name=name)
def loss(logits, labels):
labels = tf.cast(labels, tf.int64)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
logits, labels, name='cross_entropy_per_example')
cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
tf.add_to_collection('losses', cross_entropy_mean)
# The total loss is defined as the cross entropy loss plus all of the weight
# decay terms (L2 loss).
return tf.add_n(tf.get_collection('losses'), name='total_loss')
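def get_incoming_shape(incoming):
    """ Returns the incoming data shape """
    # Local import: this file does not import numpy at module level.
    import numpy as np
    if isinstance(incoming, tf.Tensor):
        return incoming.get_shape().as_list()
    # np.array is a factory function, not a type; the array type is np.ndarray.
    elif isinstance(incoming, (np.ndarray, list, tuple)):
        return np.shape(incoming)
    else:
        raise Exception("Invalid incoming layer.")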
def inference(images):
conv1 = _conv('conv1', images, 3, 96, 11, 11, 4, 4, 'VALID')
pool1 = _mpool('pool1', conv1, 3, 3, 2, 2)
norm1 = _norm('norm1', pool1, lsize=5)
conv2 = _conv('conv2', norm1, 96, 256, 5, 5, 1, 1, 'SAME')
pool2 = _mpool('pool2', conv2, 3, 3, 2, 2)
norm2 = _norm('norm2', pool2, lsize=5)
conv3 = _conv('conv3', norm2, 256, 384, 3, 3, 1, 1, 'SAME')
conv4 = _conv('conv4', conv3, 384, 384, 3, 3, 1, 1, 'SAME')
conv5 = _conv('conv5', conv4, 384, 256, 3, 3, 1, 1, 'SAME')
pool5 = _mpool('pool5', conv5, 3, 3, 2, 2)
resh1 = tf.reshape(pool5, [-1, 256 * 6 * 6])
affn1 = _affine('fc6', resh1, 256 * 6 * 6, 4096)
affn2 = _affine('fc7', affn1, 4096, 4096)
affn3 = _affine('fc8', affn2, 4096, 1000, wd=None, act=False) # last fc
return affn3
def tower_loss(scope):
"""Calculate the total loss on a single tower running the model.
Args:
scope: unique prefix string identifying the tower, e.g. 'tower_0'
Returns:
Tensor of shape [] containing the total loss for a batch of data
"""
image_size = 224
if FLAGS.data_format == 'NCHW':
image_shape = [FLAGS.batch_size, 3, image_size + 3, image_size + 3]
else:
image_shape = [FLAGS.batch_size, image_size + 3, image_size + 3, 3]
images = tf.get_variable(
'image',
image_shape,
initializer=tf.truncated_normal_initializer(
stddev=0.1, dtype=tf.float32),
dtype=tf.float32,
trainable=False)
labels = tf.get_variable(
'label', [FLAGS.batch_size],
initializer=tf.constant_initializer(1),
dtype=tf.int32,
trainable=False)
# Build a Graph that computes the logits predictions from the
# inference model.
last_layer = inference(images)
# Build the portion of the Graph calculating the losses. Note that we will
# assemble the total_loss using a custom function below.
_ = loss(last_layer, labels)
# Assemble all of the losses for the current tower only.
losses = tf.get_collection('losses', scope)
# Calculate the total loss for the current tower.
total_loss = tf.add_n(losses, name='total_loss')
# Compute the moving average of all individual losses and the total loss.
loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
loss_averages_op = loss_averages.apply(losses + [total_loss])
# Attach a scalar summary to all individual losses and the total loss; do the
# same for the averaged version of the losses.
for l in losses + [total_loss]:
# Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
# session. This helps the clarity of presentation on tensorboard.
loss_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', l.op.name)
# Name each loss as '(raw)' and name the moving average version of the loss
# as the original loss name.
tf.scalar_summary(loss_name + ' (raw)', l)
tf.scalar_summary(loss_name, loss_averages.average(l))
with tf.control_dependencies([loss_averages_op]):
total_loss = tf.identity(total_loss)
return total_loss
def average_gradients(tower_grads):
"""Calculate the average gradient for each shared variable across all towers.
Note that this function provides a synchronization point across all towers.
Args:
tower_grads: List of lists of (gradient, variable) tuples. The outer list
is over towers. The inner list is over the (gradient, variable) pairs
computed on that tower.
Returns:
List of pairs of (gradient, variable) where the gradient has been averaged
across all towers.
"""
average_grads = []
for grad_and_vars in zip(*tower_grads):
# Note that each grad_and_vars looks like the following:
# ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
grads = []
for g, _ in grad_and_vars:
# Add 0 dimension to the gradients to represent the tower.
expanded_g = tf.expand_dims(g, 0)
# Append on a 'tower' dimension which we will average over below.
grads.append(expanded_g)
# Average over the 'tower' dimension.
grad = tf.concat(0, grads)
grad = tf.reduce_mean(grad, 0)
# Keep in mind that the Variables are redundant because they are shared
# across towers. So .. we will just return the first tower's pointer to
# the Variable.
v = grad_and_vars[0][1]
grad_and_var = (grad, v)
average_grads.append(grad_and_var)
return average_grads
def time_tensorflow_run(session, target):
    num_steps_burn_in = 50
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in xrange(FLAGS.num_batches + num_steps_burn_in):
        start_time = time.time()
        _, loss_value = session.run(target)
        duration = time.time() - start_time
        # Skip the burn-in steps so that exactly FLAGS.num_batches steps are timed.
        if i >= num_steps_burn_in:
            if not i % 10:
                num_examples_per_step = FLAGS.batch_size * FLAGS.num_gpus
                examples_per_sec = num_examples_per_step / duration
                sec_per_batch = duration
                format_str = (
                    '%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                    'sec/batch batch_size = %d)')
                # The fourth field is throughput (examples/sec), not the raw duration.
                print(format_str %
                      (datetime.now(), i - num_steps_burn_in, loss_value,
                       examples_per_sec, sec_per_batch, num_examples_per_step))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / FLAGS.num_batches
    vr = total_duration_squared / FLAGS.num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: FwdBwd across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), FLAGS.num_batches, mn, sd))
def run_benchmark():
with tf.Graph().as_default(), tf.device('/cpu:0'):
# Create a variable to count the number of train() calls. This equals the
# number of batches processed * FLAGS.num_gpus.
global_step = tf.get_variable(
'global_step', [],
initializer=tf.constant_initializer(0),
trainable=False)
# Calculate the learning rate schedule.
num_batches_per_epoch = (NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN /
FLAGS.batch_size)
decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)
# Decay the learning rate exponentially based on the number of steps.
lr = tf.train.exponential_decay(
INITIAL_LEARNING_RATE,
global_step,
decay_steps,
LEARNING_RATE_DECAY_FACTOR,
staircase=True)
# Create an optimizer that performs gradient descent.
opt = tf.train.MomentumOptimizer(lr, 0.9)
# Calculate the gradients for each model tower.
tower_grads = []
for i in xrange(FLAGS.num_gpus):
with tf.device('/gpu:%d' % i):
with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
# Calculate the loss for one tower of the model. This function
# constructs the entire model but shares the variables across
# all towers.
loss = tower_loss(scope)
# Reuse variables for the next tower.
tf.get_variable_scope().reuse_variables()
# Retain the summaries from the final tower.
summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
# Calculate the gradients for the batch of data on this tower.
grads = opt.compute_gradients(loss)
# Keep track of the gradients across all towers.
tower_grads.append(grads)
# We must calculate the mean of each gradient. Note that this is the
# synchronization point across all towers.
grads = average_gradients(tower_grads)
# Apply the gradients to adjust the shared variables.
apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
# Group all updates into a single train op.
train_op = tf.group(apply_gradient_op)
# Build an initialization operation.
init = tf.initialize_all_variables()
# Start running operations on the Graph. allow_soft_placement must be set to
# True to build towers on GPU, as some of the ops do not have GPU
# implementations.
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True,
log_device_placement=FLAGS.log_device_placement))
sess.run(init)
time_tensorflow_run(sess, [train_op, loss])
def main(_):
run_benchmark()
if __name__ == '__main__':
tf.app.run()
from six.moves import xrange
from datetime import datetime
import math
import time
import tensorflow.python.platform
import tensorflow as tf
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('batch_size', 128, """Batch size.""")
tf.app.flags.DEFINE_integer('num_batches', 100, """Number of batches to run.""")
tf.app.flags.DEFINE_boolean('forward_only', False,
"""Only run the forward pass.""")
tf.app.flags.DEFINE_boolean('forward_backward_only', False,
"""Only run the forward-backward pass.""")
tf.app.flags.DEFINE_string('data_format', 'NCHW',
"""The data format for Convnet operations.
Can be either NHWC or NCHW.
""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
"""Whether to log device placement.""")
parameters = []
conv_counter = 1
pool_counter = 1
affine_counter = 1
def _conv(inpOp, nIn, nOut, kH, kW, dH, dW, padType, wd=0.0005):
global conv_counter
global parameters
name = 'conv' + str(conv_counter)
conv_counter += 1
with tf.name_scope(name) as scope:
kernel = tf.Variable(
tf.truncated_normal(
[kH, kW, nIn, nOut], dtype=tf.float32, stddev=1e-1),
name='weights')
if wd is not None and wd > 0:
weight_decay = tf.mul(tf.nn.l2_loss(kernel), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
if FLAGS.data_format == 'NCHW':
strides = [1, 1, dH, dW]
else:
strides = [1, dH, dW, 1]
conv = tf.nn.conv2d(
inpOp,
kernel,
strides,
padding=padType,
data_format=FLAGS.data_format)
biases = tf.Variable(
tf.constant(
0.0, shape=[nOut], dtype=tf.float32),
trainable=True,
name='biases')
bias = tf.reshape(
tf.nn.bias_add(
conv, biases, data_format=FLAGS.data_format),
conv.get_shape())
conv1 = tf.nn.relu(bias, name=scope)
parameters += [kernel, biases]
return conv1
def _affine(inpOp, nIn, nOut, act=True, wd=0.0005):
global affine_counter
global parameters
name = 'affine' + str(affine_counter)
affine_counter += 1
with tf.name_scope(name) as scope:
kernel = tf.Variable(
tf.truncated_normal(
[nIn, nOut], dtype=tf.float32, stddev=1e-1),
name='weights')
if wd is not None and wd > 0:
weight_decay = tf.mul(tf.nn.l2_loss(kernel), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
biases = tf.Variable(
tf.constant(
0.0, shape=[nOut], dtype=tf.float32),
trainable=True,
name='biases')
affine1 = tf.nn.relu_layer(
inpOp, kernel, biases,
name=name) if act else tf.matmul(inpOp, kernel) + biases
parameters += [kernel, biases]
return affine1
def _mpool(inpOp, kH, kW, dH, dW, padding):
global pool_counter
global parameters
name = 'pool' + str(pool_counter)
pool_counter += 1
if FLAGS.data_format == 'NCHW':
ksize = [1, 1, kH, kW]
strides = [1, 1, dH, dW]
else:
ksize = [1, kH, kW, 1]
strides = [1, dH, dW, 1]
return tf.nn.max_pool(
inpOp,
ksize=ksize,
strides=strides,
padding=padding,
data_format=FLAGS.data_format,
name=name)
def _apool(inpOp, kH, kW, dH, dW, padding):
global pool_counter
global parameters
name = 'pool' + str(pool_counter)
pool_counter += 1
if FLAGS.data_format == 'NCHW':
ksize = [1, 1, kH, kW]
strides = [1, 1, dH, dW]
else:
ksize = [1, kH, kW, 1]
strides = [1, dH, dW, 1]
return tf.nn.avg_pool(
inpOp,
ksize=ksize,
strides=strides,
padding=padding,
data_format=FLAGS.data_format,
name=name)
def _inception(inp, inSize, o1s, o2s1, o2s2, o3s1, o3s2, o4s1, o4s2):
conv1 = _conv(inp, inSize, o1s, 1, 1, 1, 1, 'VALID')
conv3_ = _conv(inp, inSize, o2s1, 1, 1, 1, 1, 'VALID')
conv3 = _conv(conv3_, o2s1, o2s2, 3, 3, 1, 1, 'SAME')
conv5_ = _conv(inp, inSize, o3s1, 1, 1, 1, 1, 'VALID')
conv5 = _conv(conv5_, o3s1, o3s2, 5, 5, 1, 1, 'SAME')
pool_ = _mpool(inp, o4s1, o4s1, 1, 1, 'SAME')
pool = _conv(pool_, inSize, o4s2, 1, 1, 1, 1, 'VALID')
if FLAGS.data_format == 'NCHW':
channel_dim = 1
else:
channel_dim = 3
incept = tf.concat(channel_dim, [conv1, conv3, conv5, pool])
return incept
def loss(logits, labels):
batch_size = tf.size(labels)
labels = tf.expand_dims(labels, 1)
indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
concated = tf.concat(1, [indices, labels])
onehot_labels = tf.sparse_to_dense(concated,
tf.pack([batch_size, 1000]), 1.0, 0.0)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
logits, onehot_labels, name='xentropy')
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
return loss
def inference(images):
# stage 1
conv1 = _conv(images, 3, 64, 7, 7, 2, 2, 'SAME')
pool1 = _mpool(conv1, 3, 3, 2, 2, 'SAME')
# stage 2
conv2 = _conv(pool1, 64, 64, 1, 1, 1, 1, 'VALID')
conv3 = _conv(conv2, 64, 192, 3, 3, 1, 1, 'SAME')
pool3 = _mpool(conv3, 3, 3, 2, 2, 'SAME')
# stage 3
incept3a = _inception(pool3, 192, 64, 96, 128, 16, 32, 3, 32)
incept3b = _inception(incept3a, 256, 128, 128, 192, 32, 96, 3, 64)
pool4 = _mpool(incept3b, 3, 3, 2, 2, 'SAME')
# stage 4
incept4a = _inception(pool4, 480, 192, 96, 208, 16, 48, 3, 64)
incept4b = _inception(incept4a, 512, 160, 112, 224, 24, 64, 3, 64)
incept4c = _inception(incept4b, 512, 128, 128, 256, 24, 64, 3, 64)
incept4d = _inception(incept4c, 512, 112, 144, 288, 32, 64, 3, 64)
incept4e = _inception(incept4d, 528, 256, 160, 320, 32, 128, 3, 128)
pool5 = _mpool(incept4e, 3, 3, 2, 2, 'SAME')
# stage 5
incept5a = _inception(pool5, 832, 256, 160, 320, 32, 128, 3, 128)
incept5b = _inception(incept5a, 832, 384, 192, 384, 48, 128, 3, 128)
pool6 = _apool(incept5b, 7, 7, 1, 1, 'VALID')
# output 1
resh1 = tf.reshape(pool6, [-1, 1024])
drop = tf.nn.dropout(resh1, 0.4)
affn1 = _affine(drop, 1024, 1000, act=False)  # feed the dropout output, not resh1
return affn1
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    if not isinstance(target, list):
        target = [target]
    target_op = tf.group(*target)
    for i in range(FLAGS.num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target_op)
        duration = time.time() - start_time
        # Skip the burn-in steps so that exactly FLAGS.num_batches steps are timed.
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / FLAGS.num_batches
    vr = total_duration_squared / FLAGS.num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, FLAGS.num_batches, mn, sd))
def run_benchmark():
global parameters
with tf.Graph().as_default():
# Generate some dummy images.
image_size = 224
if FLAGS.data_format == 'NCHW':
image_shape = [FLAGS.batch_size, 3, image_size, image_size]
else:
image_shape = [FLAGS.batch_size, image_size, image_size, 3]
images = tf.get_variable(
'image',
image_shape,
initializer=tf.truncated_normal_initializer(
stddev=0.1, dtype=tf.float32),
dtype=tf.float32,
trainable=False)
labels = tf.get_variable(
'label', [FLAGS.batch_size],
initializer=tf.constant_initializer(1),
dtype=tf.int32,
trainable=False)
# Build a Graph that computes the logits predictions from the
# inference model.
last_layer = inference(images)
objective = loss(last_layer, labels)
# Compute gradients.
# opt = tf.train.GradientDescentOptimizer(0.001)
opt = tf.train.MomentumOptimizer(0.001, 0.9)
grads = opt.compute_gradients(objective)
global_step = tf.get_variable(
'global_step', [],
initializer=tf.constant_initializer(
0.0, dtype=tf.float32),
trainable=False,
dtype=tf.float32)
apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
# Track the moving averages of all trainable variables.
variable_averages = tf.train.ExponentialMovingAverage(0.9, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables(
))
# Build an initialization operation.
init = tf.initialize_all_variables()
# Start running operations on the Graph.
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True,
log_device_placement=FLAGS.log_device_placement))
sess.run(init)
run_forward = True
run_forward_backward = True
if FLAGS.forward_only and FLAGS.forward_backward_only:
raise ValueError("Cannot specify --forward_only and "
"--forward_backward_only at the same time.")
if FLAGS.forward_only:
run_forward_backward = False
elif FLAGS.forward_backward_only:
run_forward = False
if run_forward:
# Run the forward benchmark.
time_tensorflow_run(sess, last_layer, "Forward")
if run_forward_backward:
with tf.control_dependencies(
[apply_gradient_op, variables_averages_op]):
train_op = tf.no_op(name='train')
time_tensorflow_run(sess, [train_op, objective], "Forward-backward")
def main(_):
run_benchmark()
if __name__ == '__main__':
tf.app.run()
from six.moves import xrange # pylint: disable=redefined-builtin
from datetime import datetime
import math
import re
import time
import tensorflow.python.platform
import tensorflow as tf
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('batch_size', 64, """Batch size.""")
tf.app.flags.DEFINE_integer('num_batches', 100, """Number of batches to run.""")
tf.app.flags.DEFINE_string('data_format', 'NCHW',
"""The data format for Convnet operations.
Can be either NHWC or NCHW.
""")
tf.app.flags.DEFINE_string('train_dir', '/train_model',
"""Directory where to write event logs """
"""and checkpoint.""")
tf.app.flags.DEFINE_integer('num_gpus', 4, """How many GPUs to use.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
"""Whether to log device placement.""")
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EPOCHS_PER_DECAY = 50
INITIAL_LEARNING_RATE = 0.1
LEARNING_RATE_DECAY_FACTOR = 0.1
TOWER_NAME = 'tower'
def _conv(name, inpOp, nIn, nOut, kH, kW, dH, dW, padType, wd=0.005):
with tf.name_scope(name) as scope:
kernel = tf.get_variable(
name + '_w', [kH, kW, nIn, nOut],
initializer=tf.truncated_normal_initializer(
stddev=0.01, dtype=tf.float32),
dtype=tf.float32)
if wd is not None:
weight_decay = tf.mul(tf.nn.l2_loss(kernel), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
if FLAGS.data_format == 'NCHW':
strides = [1, 1, dH, dW]
else:
strides = [1, dH, dW, 1]
conv = tf.nn.conv2d(
inpOp,
kernel,
strides,
padding=padType,
data_format=FLAGS.data_format)
biases = tf.get_variable(
name=name + '_b',
shape=[nOut],
initializer=tf.constant_initializer(
value=0.0, dtype=tf.float32),
dtype=tf.float32)
bias = tf.reshape(
tf.nn.bias_add(
conv, biases, data_format=FLAGS.data_format),
conv.get_shape())
conv1 = tf.nn.relu(bias, name=scope)
return conv1
def _affine(name, inpOp, nIn, nOut, wd=0.005, act=True):
with tf.name_scope(name) as scope:
kernel = tf.get_variable(
name + '_w', [nIn, nOut],
initializer=tf.truncated_normal_initializer(
stddev=0.01, dtype=tf.float32),
dtype=tf.float32)
if wd is not None:
weight_decay = tf.mul(tf.nn.l2_loss(kernel), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
biases = tf.get_variable(
name + '_b', [nOut],
initializer=tf.constant_initializer(
value=0.0, dtype=tf.float32),
dtype=tf.float32,
trainable=True)
affine1 = tf.nn.relu_layer(inpOp, kernel, biases, name=name) if act else \
tf.matmul(inpOp, kernel) + biases
return affine1
def _mpool(name, inpOp, kH, kW, dH, dW, padding):
if FLAGS.data_format == 'NCHW':
ksize = [1, 1, kH, kW]
strides = [1, 1, dH, dW]
else:
ksize = [1, kH, kW, 1]
strides = [1, dH, dW, 1]
return tf.nn.max_pool(
inpOp,
ksize=ksize,
strides=strides,
padding=padding,
data_format=FLAGS.data_format,
name=name)
def _apool(name, inpOp, kH, kW, dH, dW, padding):
if FLAGS.data_format == 'NCHW':
ksize = [1, 1, kH, kW]
strides = [1, 1, dH, dW]
else:
ksize = [1, kH, kW, 1]
strides = [1, dH, dW, 1]
return tf.nn.avg_pool(
inpOp,
ksize=ksize,
strides=strides,
padding=padding,
data_format=FLAGS.data_format,
name=name)
def loss(logits, labels):
labels = tf.cast(labels, tf.int64)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
logits, labels, name='cross_entropy_per_example')
cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
tf.add_to_collection('losses', cross_entropy_mean)
# The total loss is defined as the cross entropy loss plus all of the weight
# decay terms (L2 loss).
return tf.add_n(tf.get_collection('losses'), name='total_loss')
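def get_incoming_shape(incoming):
    """ Returns the incoming data shape """
    # Local import: this file does not import numpy at module level.
    import numpy as np
    if isinstance(incoming, tf.Tensor):
        return incoming.get_shape().as_list()
    # np.array is a factory function, not a type; the array type is np.ndarray.
    elif isinstance(incoming, (np.ndarray, list, tuple)):
        return np.shape(incoming)
    else:
        raise Exception("Invalid incoming layer.")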
def _inception(name, inp, inSize, o1s, o2s1, o2s2, o3s1, o3s2, o4s1, o4s2):
conv1 = _conv(name + '_1', inp, inSize, o1s, 1, 1, 1, 1, 'VALID')
conv3_ = _conv(name + '_3r', inp, inSize, o2s1, 1, 1, 1, 1, 'VALID')
conv3 = _conv(name + '_3', conv3_, o2s1, o2s2, 3, 3, 1, 1, 'SAME')
conv5_ = _conv(name + '_5r', inp, inSize, o3s1, 1, 1, 1, 1, 'VALID')
conv5 = _conv(name + '_5', conv5_, o3s1, o3s2, 5, 5, 1, 1, 'SAME')
pool_ = _mpool(name + '_pool', inp, o4s1, o4s1, 1, 1, 'SAME')
pool = _conv(name + '_proj', pool_, inSize, o4s2, 1, 1, 1, 1, 'VALID')
if FLAGS.data_format == 'NCHW':
channel_dim = 1
else:
channel_dim = 3
incept = tf.concat(channel_dim, [conv1, conv3, conv5, pool])
return incept
def inference(images):
# stage 1
conv1 = _conv('conv1', images, 3, 64, 7, 7, 2, 2, 'SAME')
pool1 = _mpool('pool1', conv1, 3, 3, 2, 2, 'SAME')
# stage 2
conv2 = _conv('conv2', pool1, 64, 64, 1, 1, 1, 1, 'VALID')
conv3 = _conv('conv3', conv2, 64, 192, 3, 3, 1, 1, 'SAME')
pool3 = _mpool('pool3', conv3, 3, 3, 2, 2, 'SAME')
# stage 3
incept3a = _inception('ince3a', pool3, 192, 64, 96, 128, 16, 32, 3, 32)
incept3b = _inception('ince3b', incept3a, 256, 128, 128, 192, 32, 96, 3, 64)
pool4 = _mpool('pool4', incept3b, 3, 3, 2, 2, 'SAME')
# stage 4
incept4a = _inception('ince4a', pool4, 480, 192, 96, 208, 16, 48, 3, 64)
incept4b = _inception('ince4b', incept4a, 512, 160, 112, 224, 24, 64, 3, 64)
incept4c = _inception('ince4c', incept4b, 512, 128, 128, 256, 24, 64, 3, 64)
incept4d = _inception('ince4d', incept4c, 512, 112, 144, 288, 32, 64, 3, 64)
incept4e = _inception('ince4e', incept4d, 528, 256, 160, 320, 32, 128, 3,
128)
pool5 = _mpool('pool5', incept4e, 3, 3, 2, 2, 'SAME')
# stage 5
incept5a = _inception('ince5a', pool5, 832, 256, 160, 320, 32, 128, 3, 128)
incept5b = _inception('ince5b', incept5a, 832, 384, 192, 384, 48, 128, 3,
128)
pool6 = _apool('pool6', incept5b, 7, 7, 1, 1, 'VALID')
# output 1
resh1 = tf.reshape(pool6, [-1, 1024])
drop = tf.nn.dropout(resh1, 0.4)
affn1 = _affine('fc_out', drop, 1024, 1000, act=False)  # feed the dropout output, not resh1
return affn1
def tower_loss(scope):
"""Calculate the total loss on a single tower running the model.
Args:
scope: unique prefix string identifying the tower, e.g. 'tower_0'
Returns:
Tensor of shape [] containing the total loss for a batch of data
"""
image_size = 224
if FLAGS.data_format == 'NCHW':
image_shape = [FLAGS.batch_size, 3, image_size, image_size]
else:
image_shape = [FLAGS.batch_size, image_size, image_size, 3]
images = tf.get_variable(
'image',
image_shape,
initializer=tf.truncated_normal_initializer(
stddev=0.1, dtype=tf.float32),
dtype=tf.float32,
trainable=False)
labels = tf.get_variable(
'label', [FLAGS.batch_size],
initializer=tf.constant_initializer(1),
dtype=tf.int32,
trainable=False)
# Build a Graph that computes the logits predictions from the
# inference model.
last_layer = inference(images)
# Build the portion of the Graph calculating the losses. Note that we will
# assemble the total_loss using a custom function below.
_ = loss(last_layer, labels)
# Assemble all of the losses for the current tower only.
losses = tf.get_collection('losses', scope)
# Calculate the total loss for the current tower.
total_loss = tf.add_n(losses, name='total_loss')
# Compute the moving average of all individual losses and the total loss.
loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
loss_averages_op = loss_averages.apply(losses + [total_loss])
# Attach a scalar summary to all individual losses and the total loss; do the
# same for the averaged version of the losses.
for l in losses + [total_loss]:
# Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
# session. This helps the clarity of presentation on tensorboard.
loss_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', l.op.name)
# Name each loss as '(raw)' and name the moving average version of the loss
# as the original loss name.
tf.scalar_summary(loss_name + ' (raw)', l)
tf.scalar_summary(loss_name, loss_averages.average(l))
with tf.control_dependencies([loss_averages_op]):
total_loss = tf.identity(total_loss)
return total_loss
def average_gradients(tower_grads):
"""Calculate the average gradient for each shared variable across all towers.
Note that this function provides a synchronization point across all towers.
Args:
tower_grads: List of lists of (gradient, variable) tuples. The outer list
is over towers. The inner list is over the (gradient, variable) pairs
computed on that tower.
Returns:
List of pairs of (gradient, variable) where the gradient has been averaged
across all towers.
"""
average_grads = []
for grad_and_vars in zip(*tower_grads):
# Note that each grad_and_vars looks like the following:
# ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
grads = []
for g, _ in grad_and_vars:
# Add 0 dimension to the gradients to represent the tower.
expanded_g = tf.expand_dims(g, 0)
# Append on a 'tower' dimension which we will average over below.
grads.append(expanded_g)
# Average over the 'tower' dimension.
grad = tf.concat(0, grads)
grad = tf.reduce_mean(grad, 0)
# Keep in mind that the Variables are redundant because they are shared
# across towers. So .. we will just return the first tower's pointer to
# the Variable.
v = grad_and_vars[0][1]
grad_and_var = (grad, v)
average_grads.append(grad_and_var)
return average_grads
def time_tensorflow_run(session, target):
    num_steps_burn_in = 50
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in xrange(FLAGS.num_batches + num_steps_burn_in):
        start_time = time.time()
        _, loss_value = session.run(target)
        duration = time.time() - start_time
        # Steps burn_in .. burn_in + num_batches - 1 are measured, so exactly
        # FLAGS.num_batches durations feed the mean computed below.
        if i >= num_steps_burn_in:
            if not i % 10:
                num_examples_per_step = FLAGS.batch_size * FLAGS.num_gpus
                examples_per_sec = num_examples_per_step / duration
                sec_per_batch = duration
                format_str = (
                    '%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                    'sec/batch batch_size = %d)')
                print(format_str %
                      (datetime.now(), i - num_steps_burn_in, loss_value,
                       examples_per_sec, sec_per_batch, num_examples_per_step))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / FLAGS.num_batches
    vr = total_duration_squared / FLAGS.num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: FwdBwd across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), FLAGS.num_batches, mn, sd))
def run_benchmark():
with tf.Graph().as_default(), tf.device('/cpu:0'):
# Create a variable to count the number of train() calls. This equals the
# number of batches processed * FLAGS.num_gpus.
global_step = tf.get_variable(
'global_step', [],
initializer=tf.constant_initializer(0),
trainable=False)
# Calculate the learning rate schedule.
num_batches_per_epoch = (NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN /
FLAGS.batch_size)
decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)
# Decay the learning rate exponentially based on the number of steps.
lr = tf.train.exponential_decay(
INITIAL_LEARNING_RATE,
global_step,
decay_steps,
LEARNING_RATE_DECAY_FACTOR,
staircase=True)
# Create an optimizer that performs gradient descent.
opt = tf.train.MomentumOptimizer(lr, 0.9)
# Calculate the gradients for each model tower.
tower_grads = []
for i in xrange(FLAGS.num_gpus):
with tf.device('/gpu:%d' % i):
with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
# Calculate the loss for one tower of the model. This function
# constructs the entire model but shares the variables across
# all towers.
loss = tower_loss(scope)
# Reuse variables for the next tower.
tf.get_variable_scope().reuse_variables()
# Retain the summaries from the final tower.
summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
# Calculate the gradients for the batch of data on this tower.
grads = opt.compute_gradients(loss)
# Keep track of the gradients across all towers.
tower_grads.append(grads)
# We must calculate the mean of each gradient. Note that this is the
# synchronization point across all towers.
grads = average_gradients(tower_grads)
# Apply the gradients to adjust the shared variables.
apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
# Group all updates into a single train op.
train_op = tf.group(apply_gradient_op)
# Build an initialization operation.
init = tf.initialize_all_variables()
# Start running operations on the Graph. allow_soft_placement must be set to
# True to build towers on GPU, as some of the ops do not have GPU
# implementations.
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True,
log_device_placement=FLAGS.log_device_placement))
sess.run(init)
time_tensorflow_run(sess, [train_op, loss])
def main(_):
run_benchmark()
if __name__ == '__main__':
tf.app.run()
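The `average_gradients` step in the script above regroups the per-tower gradient lists by variable, stacks each variable's gradients along a new leading axis, and takes the mean over that axis. The following is a minimal NumPy sketch of that arithmetic only. It assumes plain arrays stand in for TensorFlow tensors and strings stand in for variables; the helper name `average_gradients_np` and the toy values are illustrative and not part of the benchmark.

```python
import numpy as np


def average_gradients_np(tower_grads):
    """Average per-tower gradients; a NumPy stand-in for the TF version.

    tower_grads: list over towers, each a list of (gradient, variable)
    pairs given in the same variable order.
    """
    averaged = []
    # zip(*...) regroups the data so that each item holds one variable's
    # (gradient, variable) pair from every tower.
    for grad_and_vars in zip(*tower_grads):
        grads = np.stack([g for g, _ in grad_and_vars], axis=0)
        # Variables are shared, so the first tower's handle is enough.
        averaged.append((grads.mean(axis=0), grad_and_vars[0][1]))
    return averaged


# Two hypothetical towers sharing the variables 'w' and 'b'.
tower0 = [(np.array([1.0, 2.0]), 'w'), (np.array([0.5]), 'b')]
tower1 = [(np.array([3.0, 4.0]), 'w'), (np.array([1.5]), 'b')]
result = average_gradients_np([tower0, tower1])
print(result)
```

Because the mean is taken over the tower axis, every tower must finish its backward pass before the update can be applied, which is why the script calls this the synchronization point.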
set -e
function test() {
cfg=$1
batch_size=$2
prefix=$3
python $cfg --batch_size=$batch_size > logs/${prefix}-1gpu-${batch_size}.log 2>&1
}
if [ ! -d "logs" ]; then
mkdir logs
fi
# alexnet
test alexnet.py 64 alexnet
test alexnet.py 128 alexnet
test alexnet.py 256 alexnet
test alexnet.py 512 alexnet
# googlenet
test googlenet.py 64 googlenet
test googlenet.py 128 googlenet
# smallnet
test smallnet_mnist_cifar.py 64 smallnet
test smallnet_mnist_cifar.py 128 smallnet
test smallnet_mnist_cifar.py 256 smallnet
test smallnet_mnist_cifar.py 512 smallnet
set -e
function test() {
  cfg=$1
  num_gpu=$2
  batch_size=$3
  batch_per_gpu=`expr ${batch_size} / ${num_gpu}`
  prefix=$4
  # The log name records the GPU count that was actually used.
  python $cfg --num_gpus=$num_gpu --batch_size=${batch_per_gpu} > logs/${prefix}-${num_gpu}gpu-${batch_size}.log 2>&1
}
if [ ! -d "logs" ]; then
mkdir logs
fi
# alexnet
test alexnet_multi_gpu.py 4 512 alexnet
test alexnet_multi_gpu.py 4 1024 alexnet
# googlenet
test googlenet_multi_gpu.py 4 512 googlenet
test googlenet_multi_gpu.py 4 1024 googlenet
from six.moves import xrange # pylint: disable=redefined-builtin
from datetime import datetime
import math
import time
import tensorflow.python.platform
import numpy as np
import tensorflow as tf
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('batch_size', 128, """Batch size.""")
tf.app.flags.DEFINE_integer('num_batches', 100, """Number of batches to run.""")
tf.app.flags.DEFINE_boolean('forward_only', False,
"""Only run the forward pass.""")
tf.app.flags.DEFINE_boolean('forward_backward_only', False,
"""Only run the forward-backward pass.""")
tf.app.flags.DEFINE_string('data_format', 'NCHW',
"""The data format for Convnet operations.
Can be either NHWC or NCHW.
""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
"""Whether to log device placement.""")
parameters = []
conv_counter = 1
pool_counter = 1
affine_counter = 1
def _conv(inpOp, nIn, nOut, kH, kW, dH, dW, padType, wd=0.005, act=True):
global conv_counter
global parameters
name = 'conv' + str(conv_counter)
conv_counter += 1
with tf.name_scope(name) as scope:
kernel = tf.Variable(
tf.truncated_normal(
[kH, kW, nIn, nOut], dtype=tf.float32, stddev=1e-1),
name='weights')
if wd is not None:
weight_decay = tf.mul(tf.nn.l2_loss(kernel), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
if FLAGS.data_format == 'NCHW':
strides = [1, 1, dH, dW]
else:
strides = [1, dH, dW, 1]
conv = tf.nn.conv2d(
inpOp,
kernel,
strides,
padding=padType,
data_format=FLAGS.data_format)
biases = tf.Variable(
tf.constant(
0.0, shape=[nOut], dtype=tf.float32),
trainable=True,
name='biases')
bias = tf.reshape(
tf.nn.bias_add(
conv, biases, data_format=FLAGS.data_format),
conv.get_shape())
conv1 = tf.nn.relu(bias, name=scope) if act else bias
parameters += [kernel, biases]
return conv1
def _affine(inpOp, nIn, nOut, wd=None, act=True):
global affine_counter
global parameters
name = 'affine' + str(affine_counter)
affine_counter += 1
with tf.name_scope(name) as scope:
kernel = tf.Variable(
tf.truncated_normal(
[nIn, nOut], dtype=tf.float32, stddev=1e-1),
name='weights')
if wd is not None:
weight_decay = tf.mul(tf.nn.l2_loss(kernel), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
biases = tf.Variable(
tf.constant(
0.0, shape=[nOut], dtype=tf.float32),
trainable=True,
name='biases')
affine1 = tf.nn.relu_layer(
inpOp, kernel, biases,
name=name) if act else tf.matmul(inpOp, kernel) + biases
parameters += [kernel, biases]
return affine1
def _mpool(inpOp, kH, kW, dH, dW, padding):
global pool_counter
global parameters
name = 'pool' + str(pool_counter)
pool_counter += 1
if FLAGS.data_format == 'NCHW':
ksize = [1, 1, kH, kW]
strides = [1, 1, dH, dW]
else:
ksize = [1, kH, kW, 1]
strides = [1, dH, dW, 1]
return tf.nn.max_pool(
inpOp,
ksize=ksize,
strides=strides,
padding=padding,
data_format=FLAGS.data_format,
name=name)
def _apool(inpOp, kH, kW, dH, dW, padding):
global pool_counter
global parameters
name = 'pool' + str(pool_counter)
pool_counter += 1
if FLAGS.data_format == 'NCHW':
ksize = [1, 1, kH, kW]
strides = [1, 1, dH, dW]
else:
ksize = [1, kH, kW, 1]
strides = [1, dH, dW, 1]
return tf.nn.avg_pool(
inpOp,
ksize=ksize,
strides=strides,
padding=padding,
data_format=FLAGS.data_format,
name=name)
def _norm(name, l_input, lsize=4):
return tf.nn.lrn(l_input,
lsize,
bias=1.0,
alpha=0.001 / 9.0,
beta=0.75,
name=name)
def loss(logits, labels):
batch_size = tf.size(labels)
labels = tf.expand_dims(labels, 1)
indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
concated = tf.concat(1, [indices, labels])
onehot_labels = tf.sparse_to_dense(concated,
tf.pack([batch_size, 10]), 1.0, 0.0)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
logits, onehot_labels, name='xentropy')
loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
return loss
def get_incoming_shape(incoming):
    """ Returns the incoming data shape """
    if isinstance(incoming, tf.Tensor):
        return incoming.get_shape().as_list()
    elif isinstance(incoming, (np.ndarray, list, tuple)):
        # np.ndarray is the array class; np.array is only a factory function.
        return np.shape(incoming)
    else:
        raise Exception("Invalid incoming layer.")
def inference(images):
conv1 = _conv(images, 3, 32, 5, 5, 1, 1, 'SAME')
pool1 = _mpool(conv1, 3, 3, 2, 2, 'SAME')
conv2 = _conv(pool1, 32, 32, 5, 5, 1, 1, 'SAME')
pool2 = _apool(conv2, 3, 3, 2, 2, 'SAME')
conv3 = _conv(pool2, 32, 64, 5, 5, 1, 1, 'SAME')
pool3 = _apool(conv3, 3, 3, 2, 2, 'SAME')
resh1 = tf.reshape(pool3, [-1, 64 * 4 * 4])
affn1 = _affine(resh1, 64 * 4 * 4, 64)
affn2 = _affine(affn1, 64, 10, act=False)
print('conv1:', get_incoming_shape(conv1))
print('pool1:', get_incoming_shape(pool1))
print('conv2:', get_incoming_shape(conv2))
print('pool2:', get_incoming_shape(pool2))
print('conv3:', get_incoming_shape(conv3))
print('pool3:', get_incoming_shape(pool3))
return affn2
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    if not isinstance(target, list):
        target = [target]
    target_op = tf.group(*target)
    for i in xrange(FLAGS.num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target_op)
        duration = time.time() - start_time
        # Steps burn_in .. burn_in + num_batches - 1 are measured, so exactly
        # FLAGS.num_batches durations feed the mean computed below.
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / FLAGS.num_batches
    vr = total_duration_squared / FLAGS.num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, FLAGS.num_batches, mn, sd))
def run_benchmark():
global parameters
with tf.Graph().as_default():
# Generate some dummy images.
image_size = 32
# Inputs are CIFAR-10 sized (32 x 32). Every convolution and pooling layer in
# inference() uses SAME padding, so no extra border is added to image_size.
if FLAGS.data_format == 'NCHW':
image_shape = [FLAGS.batch_size, 3, image_size, image_size]
else:
image_shape = [FLAGS.batch_size, image_size, image_size, 3]
images = tf.get_variable(
'image',
image_shape,
initializer=tf.truncated_normal_initializer(
stddev=0.1, dtype=tf.float32),
dtype=tf.float32,
trainable=False)
labels = tf.get_variable(
'label', [FLAGS.batch_size],
initializer=tf.constant_initializer(1),
dtype=tf.int32,
trainable=False)
# Build a Graph that computes the logits predictions from the
# inference model.
last_layer = inference(images)
objective = loss(last_layer, labels)
# Compute gradients.
opt = tf.train.MomentumOptimizer(0.001, 0.9)
grads = opt.compute_gradients(objective)
global_step = tf.get_variable(
'global_step', [],
initializer=tf.constant_initializer(
0.0, dtype=tf.float32),
trainable=False,
dtype=tf.float32)
apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
# Track the moving averages of all trainable variables.
variable_averages = tf.train.ExponentialMovingAverage(0.9, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables(
))
# Build an initialization operation.
init = tf.initialize_all_variables()
# Start running operations on the Graph.
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True,
log_device_placement=FLAGS.log_device_placement))
sess.run(init)
run_forward = True
run_forward_backward = True
if FLAGS.forward_only and FLAGS.forward_backward_only:
raise ValueError("Cannot specify --forward_only and "
"--forward_backward_only at the same time.")
if FLAGS.forward_only:
run_forward_backward = False
elif FLAGS.forward_backward_only:
run_forward = False
if run_forward:
# Run the forward benchmark.
time_tensorflow_run(sess, last_layer, "Forward")
if run_forward_backward:
with tf.control_dependencies(
[apply_gradient_op, variables_averages_op]):
train_op = tf.no_op(name='train')
time_tensorflow_run(sess, [train_op, objective], "Forward-backward")
def main(_):
run_benchmark()
if __name__ == '__main__':
tf.app.run()
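The timing loop above discards a number of warm-up steps and then reports the mean and standard deviation of the remaining step times, using the identity Var[x] = E[x^2] - E[x]^2. A small pure-Python sketch of that bookkeeping follows. The function name `summarize_durations` and the sample durations are made up for illustration, and the clamp to zero before the square root is an added safeguard against round-off that the script itself does not have.

```python
import math


def summarize_durations(durations, num_steps_burn_in):
    """Mean and standard deviation of step times after the warm-up steps."""
    measured = durations[num_steps_burn_in:]
    n = len(measured)
    total = sum(measured)
    total_squared = sum(d * d for d in measured)
    mean = total / n
    # Population variance via E[x^2] - E[x]^2, as in the benchmark scripts.
    variance = total_squared / n - mean * mean
    # Clamp tiny negative values caused by floating-point round-off.
    return mean, math.sqrt(max(variance, 0.0))


# Hypothetical step times in seconds: two slow warm-up steps, then four
# steady-state steps.
durations = [0.9, 0.7, 0.2, 0.4, 0.2, 0.4]
mean, std = summarize_durations(durations, num_steps_burn_in=2)
print('%.3f +/- %.3f sec / batch' % (mean, std))
```

The number of measured steps and the divisor must agree; if one fewer step is accumulated than the divisor assumes, both the mean and the variance are biased low.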
You should also install tflearn:
```bash
pip install -r requirements.txt
```
import os.path
import io
import numpy as np
import tensorflow as tf
# tflearn
import tflearn
from tflearn.data_utils import to_categorical, pad_sequences
from tflearn.datasets import imdb
FLAGS = tf.app.flags.FLAGS
class DataSet(object):
def __init__(self, data, labels):
assert data.shape[0] == labels.shape[0], (
'data.shape: %s labels.shape: %s' % (data.shape, labels.shape))
self._num_examples = data.shape[0]
self._data = data
self._labels = labels
self._epochs_completed = 0
self._index_in_epoch = 0
@property
def data(self):
return self._data
@property
def labels(self):
return self._labels
@property
def num_examples(self):
return self._num_examples
@property
def epochs_completed(self):
return self._epochs_completed
def next_batch(self, batch_size):
assert batch_size <= self._num_examples
start = self._index_in_epoch
self._index_in_epoch += batch_size
if self._index_in_epoch > self._num_examples:
# Finished epoch
self._epochs_completed += 1
# Shuffle the data
perm = np.arange(self._num_examples)
np.random.shuffle(perm)
self._data = self._data[perm]
self._labels = self._labels[perm]
# Start next epoch
start = 0
self._index_in_epoch = batch_size
end = self._index_in_epoch
return self._data[start:end], self._labels[start:end]
def create_datasets(file_path, vocab_size=30000, val_fraction=0.0):
# IMDB Dataset loading
train, test, _ = imdb.load_data(
path=file_path,
n_words=vocab_size,
valid_portion=val_fraction,
sort_by_len=False)
trainX, trainY = train
testX, testY = test
# Data preprocessing
# Sequence padding
trainX = pad_sequences(trainX, maxlen=FLAGS.max_len, value=0.)
testX = pad_sequences(testX, maxlen=FLAGS.max_len, value=0.)
# Converting labels to binary vectors
trainY = to_categorical(trainY, nb_classes=2)
testY = to_categorical(testY, nb_classes=2)
train_dataset = DataSet(trainX, trainY)
return train_dataset
def main():
create_datasets('imdb.pkl')
if __name__ == "__main__":
main()
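The `next_batch` method above walks through the data in fixed-size slices and, once a slice would run past the end, counts an epoch, reshuffles, and restarts from index 0. The sketch below isolates that behaviour with a cut-down class. `TinyDataSet` and the toy arrays are illustrative assumptions, not part of the reader module.

```python
import numpy as np


class TinyDataSet(object):
    """Cut-down copy of the DataSet batching logic, for illustration only."""

    def __init__(self, data, labels):
        self._data = data
        self._labels = labels
        self._num_examples = data.shape[0]
        self._index_in_epoch = 0
        self.epochs_completed = 0

    def next_batch(self, batch_size):
        start = self._index_in_epoch
        self._index_in_epoch += batch_size
        if self._index_in_epoch > self._num_examples:
            # The incomplete tail is skipped; reshuffle and start a new epoch.
            self.epochs_completed += 1
            perm = np.random.permutation(self._num_examples)
            self._data = self._data[perm]
            self._labels = self._labels[perm]
            start = 0
            self._index_in_epoch = batch_size
        end = self._index_in_epoch
        return self._data[start:end], self._labels[start:end]


# Five toy examples with batches of two: the third call wraps into a new epoch.
ds = TinyDataSet(np.arange(10).reshape(5, 2), np.arange(5))
shapes = [ds.next_batch(2)[0].shape for _ in range(3)]
print(shapes, ds.epochs_completed)
```

Every batch therefore has exactly `batch_size` rows, which matters for the RNN scripts because their placeholders and initial LSTM state are built with a fixed batch dimension.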
#!/usr/bin/env python
from six.moves import xrange # pylint: disable=redefined-builtin
import math
import time
import numpy as np
from datetime import datetime
import reader
import tensorflow as tf
from tensorflow.python.ops import rnn
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('batch_size', 128, """Batch size.""")
tf.app.flags.DEFINE_integer('num_batches', 100, """Number of batches to run.""")
tf.app.flags.DEFINE_integer('num_layers', 1, """Number of stacked LSTM layers.""")
tf.app.flags.DEFINE_integer('max_len', 100, """Sequence length after padding.""")
tf.app.flags.DEFINE_boolean('forward_only', False,
"""Only run the forward pass.""")
tf.app.flags.DEFINE_boolean('forward_backward_only', False,
"""Only run the forward-backward pass.""")
tf.app.flags.DEFINE_integer('hidden_size', 128, """Number of units in each LSTM layer.""")
tf.app.flags.DEFINE_integer('emb_size', 128, """Dimension of the word embedding.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
"""Whether to log device placement.""")
VOCAB_SIZE = 30000
NUM_CLASS = 2
def get_feed_dict(x_data, y_data=None):
feed_dict = {}
if y_data is not None:
feed_dict[y_input] = y_data
for i in xrange(x_data.shape[0]):
feed_dict[x_input[i]] = x_data[i, :, :]
return feed_dict
def get_incoming_shape(incoming):
    """ Returns the incoming data shape """
    if isinstance(incoming, tf.Tensor):
        return incoming.get_shape().as_list()
    elif isinstance(incoming, (np.ndarray, list, tuple)):
        # np.ndarray is the array class; np.array is only a factory function.
        return np.shape(incoming)
    else:
        raise Exception("Invalid incoming layer.")
# Note input * W is done in LSTMCell,
# which is different from PaddlePaddle
def single_lstm(name,
incoming,
n_units,
use_peepholes=True,
return_seq=False,
return_state=False):
with tf.name_scope(name) as scope:
cell = tf.nn.rnn_cell.LSTMCell(n_units, use_peepholes=use_peepholes)
output, _cell_state = rnn.rnn(cell, incoming, dtype=tf.float32)
out = output if return_seq else output[-1]
return (out, _cell_state) if return_state else out
def lstm(name,
         incoming,
         n_units,
         use_peepholes=True,
         return_seq=False,
         return_state=False,
         num_layers=1):
    with tf.name_scope(name) as scope:
        lstm_cell = tf.nn.rnn_cell.LSTMCell(
            n_units, use_peepholes=use_peepholes)
        cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * num_layers)
        initial_state = cell.zero_state(FLAGS.batch_size, dtype=tf.float32)
        if not isinstance(incoming, list):
            # If the input is an embedding, the Tensor shape is
            # [None, time_step, emb_size]; split it into a list of
            # per-time-step tensors of shape [None, emb_size].
            incoming = [
                tf.squeeze(input_, [1])
                for input_ in tf.split(1, FLAGS.max_len, incoming)
            ]
        outputs, state = tf.nn.rnn(cell,
                                   incoming,
                                   initial_state=initial_state,
                                   dtype=tf.float32)
        out = outputs if return_seq else outputs[-1]
        # `state` is the final state returned by tf.nn.rnn.
        return (out, state) if return_state else out
def embedding(name, incoming, vocab_size, emb_size):
with tf.name_scope(name) as scope:
#with tf.device("/cpu:0"):
embedding = tf.get_variable(
name + '_emb', [vocab_size, emb_size], dtype=tf.float32)
out = tf.nn.embedding_lookup(embedding, incoming)
return out
def fc(name, inpOp, nIn, nOut, act=True):
with tf.name_scope(name) as scope:
kernel = tf.get_variable(
name + '_w', [nIn, nOut],
initializer=tf.truncated_normal_initializer(
stddev=0.01, dtype=tf.float32),
dtype=tf.float32)
biases = tf.get_variable(
name + '_b', [nOut],
initializer=tf.constant_initializer(
value=0.0, dtype=tf.float32),
dtype=tf.float32,
trainable=True)
net = tf.nn.relu_layer(inpOp, kernel, biases, name=name) if act else \
tf.matmul(inpOp, kernel) + biases
return net
def inference(seq):
    net = embedding('emb', seq, VOCAB_SIZE, FLAGS.emb_size)
    # Formatting the message first keeps the output identical on Python 2 and 3.
    print('emb: %s' % (get_incoming_shape(net), ))
    net = lstm('lstm', net, FLAGS.hidden_size, num_layers=FLAGS.num_layers)
    print('lstm: %s' % (get_incoming_shape(net), ))
    net = fc('fc1', net, FLAGS.hidden_size, 2)
    return net
def loss(logits, labels):
# one label index for one sample
labels = tf.cast(labels, tf.float32)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
logits, labels, name='cross_entropy_per_example')
cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
tf.add_to_collection('losses', cross_entropy_mean)
return tf.add_n(tf.get_collection('losses'), name='total_loss')
def time_tensorflow_run(session, target, x_input, y_input, info_string):
    num_steps_burn_in = 50
    total_duration = 0.0
    total_duration_squared = 0.0
    if not isinstance(target, list):
        target = [target]
    target_op = tf.group(*target)
    train_dataset = reader.create_datasets("imdb.pkl", VOCAB_SIZE)
    for i in xrange(FLAGS.num_batches + num_steps_burn_in):
        start_time = time.time()
        data, label = train_dataset.next_batch(FLAGS.batch_size)
        _ = session.run(target_op, feed_dict={x_input: data, y_input: label})
        duration = time.time() - start_time
        # Steps burn_in .. burn_in + num_batches - 1 are measured, so exactly
        # FLAGS.num_batches durations feed the mean computed below.
        if i >= num_steps_burn_in:
            if not i % 10:
                print('%s: step %d, duration = %.3f' %
                      (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / FLAGS.num_batches
    vr = total_duration_squared / FLAGS.num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), info_string, FLAGS.num_batches, mn, sd))
def run_benchmark():
with tf.Graph().as_default():
global_step = 0
with tf.device('/cpu:0'):
global_step = tf.Variable(0, trainable=False)
with tf.device('/gpu:0'):
#x_input = tf.placeholder(tf.int32, [None, FLAGS.max_len], name="x_input")
#y_input = tf.placeholder(tf.int32, [None, NUM_CLASS], name="y_input")
x_input = tf.placeholder(
tf.int32, [FLAGS.batch_size, FLAGS.max_len], name="x_input")
y_input = tf.placeholder(
tf.int32, [FLAGS.batch_size, NUM_CLASS], name="y_input")
# Build the model on the placeholder inputs.
last_layer = inference(x_input)
objective = loss(last_layer, y_input)
opt = tf.train.AdamOptimizer(0.001)
grads = opt.compute_gradients(objective)
apply_gradient_op = opt.apply_gradients(
grads, global_step=global_step)
init = tf.initialize_all_variables()
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True,
log_device_placement=FLAGS.log_device_placement))
sess.run(init)
run_forward = True
run_forward_backward = True
if FLAGS.forward_only and FLAGS.forward_backward_only:
raise ValueError("Cannot specify --forward_only and "
"--forward_backward_only at the same time.")
if FLAGS.forward_only:
run_forward_backward = False
elif FLAGS.forward_backward_only:
run_forward = False
if run_forward:
time_tensorflow_run(sess, last_layer, x_input, y_input,
"Forward")
if run_forward_backward:
with tf.control_dependencies([apply_gradient_op]):
train_op = tf.no_op(name='train')
time_tensorflow_run(sess, [train_op, objective], x_input,
y_input, "Forward-backward")
def main(_):
run_benchmark()
if __name__ == '__main__':
tf.app.run()
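The `lstm` helper above turns the embedded input of shape `[batch, time_step, emb_size]` into a Python list with one `[batch, emb_size]` tensor per time step, because `tf.nn.rnn` unrolls over a list of inputs. The NumPy sketch below shows the same split-then-squeeze reshaping. The toy sizes are arbitrary, and `np.split`/`np.squeeze` are used here as stand-ins for `tf.split`/`tf.squeeze`.

```python
import numpy as np

batch_size, max_len, emb_size = 4, 3, 2  # toy sizes, chosen for illustration
embedded = np.arange(batch_size * max_len * emb_size, dtype=np.float32)
embedded = embedded.reshape(batch_size, max_len, emb_size)

# Split along the time axis into max_len pieces of shape [batch, 1, emb],
# then drop the singleton axis so that each step is [batch, emb].
steps = [
    np.squeeze(piece, axis=1)
    for piece in np.split(embedded, max_len, axis=1)
]

print(len(steps), steps[0].shape)
```

Each list element is the slice `embedded[:, t, :]` for one time step `t`, so the list length equals `max_len`.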
#!/usr/bin/env python
from six.moves import xrange # pylint: disable=redefined-builtin
import re
import math
import time
import numpy as np
from datetime import datetime
import reader
import tensorflow as tf
from tensorflow.python.ops import rnn
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('batch_size', 64, """Batch size.""")
tf.app.flags.DEFINE_integer('num_batches', 100, """Number of batches to run.""")
tf.app.flags.DEFINE_integer('num_layers', 1, """Number of stacked LSTM layers.""")
tf.app.flags.DEFINE_integer('max_len', 100, """Sequence length after padding.""")
tf.app.flags.DEFINE_integer('hidden_size', 128, """Number of units in each LSTM layer.""")
tf.app.flags.DEFINE_integer('emb_size', 64, """Dimension of the word embedding.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
"""Whether to log device placement.""")
tf.app.flags.DEFINE_integer('num_gpus', 4, """How many GPUs to use.""")
VOCAB_SIZE = 30000
NUM_CLASS = 2
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EPOCHS_PER_DECAY = 50
INITIAL_LEARNING_RATE = 0.1
LEARNING_RATE_DECAY_FACTOR = 0.1
TOWER_NAME = 'tower'
train_dataset = reader.create_datasets("imdb.pkl", VOCAB_SIZE)
def get_incoming_shape(incoming):
    """ Returns the incoming data shape """
    if isinstance(incoming, tf.Tensor):
        return incoming.get_shape().as_list()
    elif isinstance(incoming, (np.ndarray, list, tuple)):
        # np.ndarray is the array class; np.array is only a factory function.
        return np.shape(incoming)
    else:
        raise Exception("Invalid incoming layer.")
# Note input * W is done in LSTMCell,
# which is different from PaddlePaddle
def single_lstm(name,
incoming,
n_units,
use_peepholes=True,
return_seq=False,
return_state=False):
with tf.name_scope(name) as scope:
cell = tf.nn.rnn_cell.LSTMCell(n_units, use_peepholes=use_peepholes)
output, _cell_state = rnn.rnn(cell, incoming, dtype=tf.float32)
out = output if return_seq else output[-1]
return (out, _cell_state) if return_state else out
def lstm(name,
         incoming,
         n_units,
         use_peepholes=True,
         return_seq=False,
         return_state=False,
         num_layers=1):
    with tf.name_scope(name) as scope:
        lstm_cell = tf.nn.rnn_cell.LSTMCell(
            n_units, use_peepholes=use_peepholes)
        cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * num_layers)
        initial_state = cell.zero_state(FLAGS.batch_size, dtype=tf.float32)
        if not isinstance(incoming, list):
            # If the input is an embedding, the Tensor shape is
            # [None, time_step, emb_size]; split it into a list of
            # per-time-step tensors of shape [None, emb_size].
            incoming = [
                tf.squeeze(input_, [1])
                for input_ in tf.split(1, FLAGS.max_len, incoming)
            ]
        outputs, state = tf.nn.rnn(cell,
                                   incoming,
                                   initial_state=initial_state,
                                   dtype=tf.float32)
        out = outputs if return_seq else outputs[-1]
        # `state` is the final state returned by tf.nn.rnn.
        return (out, state) if return_state else out
def embedding(name, incoming, vocab_size, emb_size):
with tf.name_scope(name) as scope:
#with tf.device("/cpu:0"):
embedding = tf.get_variable(
name + '_emb', [vocab_size, emb_size], dtype=tf.float32)
out = tf.nn.embedding_lookup(embedding, incoming)
return out
def fc(name, inpOp, nIn, nOut, act=True):
with tf.name_scope(name) as scope:
kernel = tf.get_variable(
name + '_w', [nIn, nOut],
initializer=tf.truncated_normal_initializer(
stddev=0.01, dtype=tf.float32),
dtype=tf.float32)
biases = tf.get_variable(
name + '_b', [nOut],
initializer=tf.constant_initializer(
value=0.0, dtype=tf.float32),
dtype=tf.float32,
trainable=True)
net = tf.nn.relu_layer(inpOp, kernel, biases, name=name) if act else \
tf.matmul(inpOp, kernel) + biases
return net
def inference(seq):
    net = embedding('emb', seq, VOCAB_SIZE, FLAGS.emb_size)
    # Formatting the message first keeps the output identical on Python 2 and 3.
    print('emb: %s' % (get_incoming_shape(net), ))
    net = lstm('lstm', net, FLAGS.hidden_size, num_layers=FLAGS.num_layers)
    print('lstm: %s' % (get_incoming_shape(net), ))
    net = fc('fc1', net, FLAGS.hidden_size, 2)
    return net
def loss(logits, labels):
# one label index for one sample
#labels = tf.cast(labels, tf.int64)
# cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
# logits, labels, name='cross_entropy_per_example')
labels = tf.cast(labels, tf.float32)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
logits, labels, name='cross_entropy_per_example')
cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
tf.add_to_collection('losses', cross_entropy_mean)
return tf.add_n(tf.get_collection('losses'), name='total_loss')
def tower_loss(scope):
"""Calculate the total loss on a single tower running the model.
Args:
scope: unique prefix string identifying the tower, e.g. 'tower_0'
Returns:
Tensor of shape [] containing the total loss for a batch of data
"""
data, label = train_dataset.next_batch(FLAGS.batch_size)
# Build a Graph that computes the logits predictions from the
# inference model.
last_layer = inference(data)
# Build the portion of the Graph calculating the losses. Note that we will
# assemble the total_loss using a custom function below.
#_ = loss(last_layer, label)
_ = loss(last_layer, label)
# Assemble all of the losses for the current tower only.
losses = tf.get_collection('losses', scope)
# Calculate the total loss for the current tower.
total_loss = tf.add_n(losses, name='total_loss')
# Compute the moving average of all individual losses and the total loss.
loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
loss_averages_op = loss_averages.apply(losses + [total_loss])
# Attach a scalar summary to all individual losses and the total loss; do the
# same for the averaged version of the losses.
for l in losses + [total_loss]:
# Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
# session. This helps the clarity of presentation on tensorboard.
loss_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', l.op.name)
# Name each loss as '(raw)' and name the moving average version of the loss
# as the original loss name.
tf.scalar_summary(loss_name + ' (raw)', l)
#tf.scalar_summary(loss_name, loss_averages.average(l))
with tf.control_dependencies([loss_averages_op]):
total_loss = tf.identity(total_loss)
return total_loss
def average_gradients(tower_grads):
"""Calculate the average gradient for each shared variable across all towers.
Note that this function provides a synchronization point across all towers.
Args:
tower_grads: List of lists of (gradient, variable) tuples. The outer list
is over towers. The inner list is over the (gradient, variable) pairs
computed on that tower.
Returns:
List of pairs of (gradient, variable) where the gradient has been averaged
across all towers.
"""
average_grads = []
for grad_and_vars in zip(*tower_grads):
# Note that each grad_and_vars looks like the following:
# ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
grads = []
for g, _ in grad_and_vars:
# Add 0 dimension to the gradients to represent the tower.
expanded_g = tf.expand_dims(g, 0)
# Append on a 'tower' dimension which we will average over below.
grads.append(expanded_g)
# Average over the 'tower' dimension.
grad = tf.concat(0, grads)
grad = tf.reduce_mean(grad, 0)
# Keep in mind that the Variables are redundant because they are shared
# across towers. So .. we will just return the first tower's pointer to
# the Variable.
v = grad_and_vars[0][1]
grad_and_var = (grad, v)
average_grads.append(grad_and_var)
return average_grads
def time_tensorflow_run(session, target):
    num_steps_burn_in = 80
    total_duration = 0.0
    total_duration_squared = 0.0
    for i in xrange(FLAGS.num_batches + num_steps_burn_in):
        start_time = time.time()
        # Each tower reads its batch while the graph is built in tower_loss(),
        # so no feed_dict is needed here.
        _, loss_value = session.run(target)
        duration = time.time() - start_time
        # Steps burn_in .. burn_in + num_batches - 1 are measured, so exactly
        # FLAGS.num_batches durations feed the mean computed below.
        if i >= num_steps_burn_in:
            if not i % 10:
                num_examples_per_step = FLAGS.batch_size * FLAGS.num_gpus
                examples_per_sec = num_examples_per_step / duration
                # sec_per_batch = duration / FLAGS.num_gpus
                sec_per_batch = duration
                format_str = (
                    '%s: step %d, loss= %.2f (%.1f examples/sec; %.3f '
                    'sec/batch batch_size= %d)')
                print(format_str %
                      (datetime.now(), i - num_steps_burn_in, loss_value,
                       examples_per_sec, sec_per_batch, num_examples_per_step))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / FLAGS.num_batches
    vr = total_duration_squared / FLAGS.num_batches - mn * mn
    sd = math.sqrt(vr)
    print('%s: FwdBwd across %d steps, %.3f +/- %.3f sec / batch' %
          (datetime.now(), FLAGS.num_batches, mn, sd))
def run_benchmark():
with tf.Graph().as_default(), tf.device('/cpu:0'):
# Create a variable to count the number of train() calls. This equals the
# number of batches processed * FLAGS.num_gpus.
global_step = tf.get_variable(
'global_step', [],
initializer=tf.constant_initializer(0),
trainable=False)
# Calculate the learning rate schedule.
num_batches_per_epoch = (NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN /
FLAGS.batch_size)
decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)
# Create an optimizer that performs gradient descent.
opt = tf.train.AdamOptimizer(0.001)
#train_dataset = reader.create_datasets("imdb.pkl", VOCAB_SIZE)
# Calculate the gradients for each model tower.
tower_grads = []
for i in xrange(FLAGS.num_gpus):
with tf.device('/gpu:%d' % i):
with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
# Calculate the loss for one tower of the model. This function
# constructs the entire model but shares the variables across
# all towers.
loss = tower_loss(scope)
# Reuse variables for the next tower.
tf.get_variable_scope().reuse_variables()
# Retain the summaries from the final tower.
# summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
# Calculate the gradients for the batch of data on this tower.
grads = opt.compute_gradients(loss)
# Keep track of the gradients across all towers.
tower_grads.append(grads)
# We must calculate the mean of each gradient. Note that this is the
# synchronization point across all towers.
grads = average_gradients(tower_grads)
# Apply the gradients to adjust the shared variables.
apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
# Group all updates to into a single train op.
train_op = tf.group(apply_gradient_op)
# Build an initialization operation.
init = tf.initialize_all_variables()
# Start running operations on the Graph. allow_soft_placement must be set to
# True to build towers on GPU, as some of the ops do not have GPU
# implementations.
sess = tf.Session(config=tf.ConfigProto(
allow_soft_placement=True,
log_device_placement=FLAGS.log_device_placement))
sess.run(init)
time_tensorflow_run(sess, [train_op, loss])
def main(_):
run_benchmark()
if __name__ == '__main__':
tf.app.run()
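The summary line printed at the end of `time_tensorflow_run` derives the standard deviation from running sums, using the identity Var(x) = E[x^2] - (E[x])^2. A minimal standalone sketch of that arithmetic, assuming a plain list of per-batch durations (the name `summarize_durations` is illustrative and not part of the benchmark):

```python
import math


def summarize_durations(durations):
    """Return (mean, standard deviation) of per-batch durations in seconds."""
    n = len(durations)
    total = sum(durations)
    total_squared = sum(d * d for d in durations)
    mean = total / n
    # Population variance via E[x^2] - (E[x])^2; clamp tiny negative
    # values caused by floating-point rounding before taking the root.
    variance = max(total_squared / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


mean, sd = summarize_durations([0.5, 0.5, 1.0, 1.0])
print('%.3f +/- %.3f sec / batch' % (mean, sd))
```

The clamp to zero is an addition of this sketch; the benchmark code passes the raw difference straight to `math.sqrt`.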
set -e
function test() {
lstm_num=$1
batch_size=$2
hid_size=$3
prefix=$4
python rnn.py --num_layers=${lstm_num} --batch_size=$batch_size \
--hidden_size=${hid_size} \
--forward_backward_only=1 \
> logs/1gpu-${lstm_num}lstm-batch${batch_size}-hid${hid_size}.log 2>&1
}
if [ ! -d "logs" ]; then
mkdir logs
fi
#--lstm_num--batch_size--hidden_size--#
test 2 64 256
test 2 64 512
test 2 64 1280
test 2 128 256
test 2 128 512
test 2 128 1280
test 2 256 256
test 2 256 512
test 2 256 1280
set -e
function test() {
num_gpu=$1
lstm_num=$2
hid_size=$3
batch_size=$4
batch_per_gpu=`expr ${batch_size} / ${num_gpu}`
python rnn_multi_gpu.py --num_layers=${lstm_num} --batch_size=$batch_per_gpu \
--num_gpus=${num_gpu} \
--hidden_size=${hid_size} \
--forward_backward_only=1 \
> logs/${num_gpu}gpu-${lstm_num}lstm-hid${hid_size}-batch${batch_size}.log 2>&1
}
if [ ! -d "logs" ]; then
mkdir logs
fi
#--num_gpus--lstm_num--hidden_size--batch_size--#
test 4 2 256 128
test 4 2 256 256
test 4 2 256 512
test 4 2 512 128
test 4 2 512 256
test 4 2 512 512
File mode changed from 100644 to 100755
File mode changed from 100644 to 100755
File mode changed from 100644 to 100755
File mode changed from 100644 to 100755
......@@ -19,27 +19,44 @@ START = "<s>"
END = "<e>"
def hook(settings, src_dict, trg_dict, file_list, **kwargs):
def hook(settings, src_dict_path, trg_dict_path, is_generating, file_list,
**kwargs):
# job_mode = 1: training mode
# job_mode = 0: generating mode
settings.job_mode = trg_dict is not None
settings.src_dict = src_dict
settings.job_mode = not is_generating
settings.src_dict = dict()
with open(src_dict_path, "r") as fin:
settings.src_dict = {
line.strip(): line_count
for line_count, line in enumerate(fin)
}
settings.trg_dict = dict()
with open(trg_dict_path, "r") as fin:
settings.trg_dict = {
line.strip(): line_count
for line_count, line in enumerate(fin)
}
settings.logger.info("src dict len : %d" % (len(settings.src_dict)))
settings.sample_count = 0
if settings.job_mode:
settings.trg_dict = trg_dict
settings.slots = [
settings.slots = {
'source_language_word':
integer_value_sequence(len(settings.src_dict)),
'target_language_word':
integer_value_sequence(len(settings.trg_dict)),
'target_language_next_word':
integer_value_sequence(len(settings.trg_dict))
]
}
settings.logger.info("trg dict len : %d" % (len(settings.trg_dict)))
else:
settings.slots = [
settings.slots = {
'source_language_word':
integer_value_sequence(len(settings.src_dict)),
'sent_id':
integer_value_sequence(len(open(file_list[0], "r").readlines()))
]
}
def _get_ids(s, dictionary):
......@@ -69,6 +86,10 @@ def process(settings, file_name):
continue
trg_ids_next = trg_ids + [settings.trg_dict[END]]
trg_ids = [settings.trg_dict[START]] + trg_ids
yield src_ids, trg_ids, trg_ids_next
yield {
'source_language_word': src_ids,
'target_language_word': trg_ids,
'target_language_next_word': trg_ids_next
}
else:
yield src_ids, [line_count]
yield {'source_language_word': src_ids, 'sent_id': [line_count]}
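In training mode, `process` yields the target sequence twice: once prefixed with the start marker and once suffixed with the end marker, so the second is the first shifted by one position. A minimal sketch of that shift, assuming a toy dictionary and a hypothetical `make_target_pair` helper (neither appears in the demo):

```python
START = "<s>"
END = "<e>"


def make_target_pair(trg_words, trg_dict):
    """Build the start-prefixed and end-suffixed id sequences for one sentence."""
    trg_ids = [trg_dict[w] for w in trg_words]
    # Next-word sequence: the sentence followed by the end marker.
    trg_ids_next = trg_ids + [trg_dict[END]]
    # Current-word sequence: the start marker followed by the sentence.
    trg_ids = [trg_dict[START]] + trg_ids
    return trg_ids, trg_ids_next


toy_dict = {START: 0, END: 1, "hello": 2, "world": 3}
current_words, next_words = make_target_pair(["hello", "world"], toy_dict)
print(current_words)  # [0, 2, 3]
print(next_words)     # [2, 3, 1]
```

Both sequences have the same length, so position `k` of the first lines up with position `k` of the second.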
......@@ -37,17 +37,10 @@ def seq_to_seq_data(data_dir,
"""
src_lang_dict = os.path.join(data_dir, 'src.dict')
trg_lang_dict = os.path.join(data_dir, 'trg.dict')
src_dict = dict()
for line_count, line in enumerate(open(src_lang_dict, "r")):
src_dict[line.strip()] = line_count
trg_dict = dict()
for line_count, line in enumerate(open(trg_lang_dict, "r")):
trg_dict[line.strip()] = line_count
if is_generating:
train_list = None
test_list = os.path.join(data_dir, gen_list)
trg_dict = None
else:
train_list = os.path.join(data_dir, train_list)
test_list = os.path.join(data_dir, test_list)
......@@ -57,8 +50,11 @@ def seq_to_seq_data(data_dir,
test_list,
module="dataprovider",
obj="process",
args={"src_dict": src_dict,
"trg_dict": trg_dict})
args={
"src_dict_path": src_lang_dict,
"trg_dict_path": trg_lang_dict,
"is_generating": is_generating
})
return {
"src_dict_path": src_lang_dict,
......
......@@ -23,7 +23,7 @@ AutoStructify = transform.AutoStructify
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, '@PROJ_ROOT@/python')
templates_path = ["@PROJ_ROOT@/doc/templates"]
templates_path = ["@PROJ_ROOT@/doc_theme/templates"]
# -- General configuration ------------------------------------------------
......@@ -113,13 +113,12 @@ todo_include_todos = False
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#html_theme = 'sphinx_rtd_theme'
html_theme = 'classic'
html_theme = 'sphinx_rtd_theme'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_static_path = ['@PROJ_ROOT@/doc_theme/static']
# Output file base name for HTML help builder.
htmlhelp_basename = project + 'doc'
......
Using and Building Docker Images
================================
PaddlePaddle in Docker Containers
=================================
We release PaddlePaddle in the form of `Docker <https://www.docker.com/>`_ images on `dockerhub.com <https://hub.docker.com/r/paddledev/paddle/>`_. Running as Docker containers is currently the only officially-supported way to running PaddlePaddle.
Running in a Docker container is currently the only officially
supported way to run PaddlePaddle. This is reasonable because Docker
now runs on all major operating systems, including Linux, Mac OS X, and Windows.
Please be aware that you will need to change `Docker's settings
<https://github.com/PaddlePaddle/Paddle/issues/627>`_ to make full use
of your hardware resources on Mac OS X and Windows.
Run Docker images
-----------------
For each version of PaddlePaddle, we release 4 variants of Docker images:
CPU-only and GPU Images
-----------------------
+-----------------+-------------+-------+
| | CPU AVX | GPU |
+=================+=============+=======+
| cpu | yes | no |
+-----------------+-------------+-------+
| cpu-noavx | no | no |
+-----------------+-------------+-------+
| gpu | yes | yes |
+-----------------+-------------+-------+
| gpu-noavx | no | yes |
+-----------------+-------------+-------+
For each version of PaddlePaddle, we release 2 Docker images, a
CPU-only one and a CUDA GPU one. We do so by configuring
`dockerhub.com <https://hub.docker.com/r/paddledev/paddle/>`_
to automatically run the following commands:
We run the following command on Linux to check if the CPU supports :code:`AVX`.
.. code-block:: base
.. code-block:: bash
docker build -t paddle:cpu -f paddle/scripts/docker/Dockerfile .
docker build -t paddle:gpu -f paddle/scripts/docker/Dockerfile.gpu .
if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi
On Mac OS X, we need to run
To run the CPU-only image as an interactive container:
.. code-block:: bash
sysctl -a | grep machdep.cpu.leaf7_features
docker run -it --rm paddledev/paddle:cpu-latest /bin/bash
or, we can run it as a daemon container
.. code-block:: bash
docker run -d -p 2202:22 paddledev/paddle:cpu-latest
Once we determine the proper variant, we can cope with the Docker image tag name by appending the version number. For example, the following command runs the AVX-enabled image of the most recent version:
and SSH to this container using password :code:`root`:
.. code-block:: bash
docker run -it --rm paddledev/paddle:cpu-latest /bin/bash
ssh -p 2202 root@localhost
An advantage of using SSH is that we can connect to PaddlePaddle from
more than one terminal; for example, one terminal running vi and
another running the Python interpreter. Another advantage is that we
can run the PaddlePaddle container on a remote server and SSH to it
from a laptop.
To run a GPU-enabled image, you need to install CUDA and let Docker knows about it:
The above methods work with the GPU image too; just remember
to install the CUDA driver and let Docker know about it:
.. code-block:: bash
......@@ -47,35 +57,49 @@ To run a GPU-enabled image, you need to install CUDA and let Docker knows about
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:gpu-latest
The default entry point of all our Docker images starts the OpenSSH server. To run PaddlePaddle and to expose OpenSSH port to 2202 on the host computer:
Non-AVX Images
--------------
Please be aware that the CPU-only and the GPU images both use the AVX
instruction set, but older CPUs (AVX first shipped in processors
released in 2011) do not support it. The following command checks if
your Linux computer supports AVX:
.. code-block:: bash
docker run -d -p 2202:22 paddledev/paddle:cpu-latest
if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi
Then we can login to the container using username :code:`root` and password :code:`root`:
If it doesn't, we will need to build non-AVX images manually from
source code:
.. code-block:: bash
ssh -p 2202 root@localhost
cd ~
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
docker build --build-arg WITH_AVX=OFF -t paddle:cpu-noavx -f paddle/scripts/docker/Dockerfile .
docker build --build-arg WITH_AVX=OFF -t paddle:gpu-noavx -f paddle/scripts/docker/Dockerfile.gpu .
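The AVX check above can also be scripted. The following is a minimal sketch, assuming x86 Linux where `/proc/cpuinfo` lists CPU features on `flags` lines; the function name `cpu_supports_avx` is illustrative. Unlike the `grep -i avx` one-liner, it matches the exact `avx` token rather than any substring.

```python
def cpu_supports_avx(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx" in value.split():
            return True
    return False


# Sample text standing in for the contents of /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2\n"
print("Yes" if cpu_supports_avx(sample) else "No")
```

On a real machine, the text would come from `open("/proc/cpuinfo").read()`.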
Build Docker images
-------------------
Documentation
-------------
Developers might want to build Docker images from their local commit or from a tagged version. Suppose that your local repo is at :code:`~/work/Paddle`, the following steps builds a cpu variant from your current work:
Paddle Docker images include an HTML version of C++ source code
generated using `woboq code browser
<https://github.com/woboq/woboq_codebrowser>`_. This makes it easy
for users to browse and understand the C++ source code.
.. code-block:: bash
As long as we give the Paddle Docker container a name, we can run an
additional nginx Docker container to serve the volume from the Paddle
container:
cd ~/Paddle
./paddle/scripts/docker/generates.sh # Use m4 to generate Dockerfiles for each variant.
docker build -t paddle:latest -f ./paddle/scripts/docker/Dockerfile.cpu
.. code-block:: bash
As a release engineer, you might want to build Docker images for a certain version and publish them to dockerhub.com. You can do this by switching to the right Git tag, or create a new tag, before running `docker build`. For example, the following commands build Docker images for v0.9.0:
docker run -d --name paddle-cpu-doc paddle:cpu
docker run -d --volumes-from paddle-cpu-doc -p 8088:80 nginx
.. code-block:: bash
cd ~/Paddle
git checkout tags/v0.9.0
./paddle/scripts/docker/generates.sh # Use m4 to generate Dockerfiles for each variant.
docker build -t paddle:cpu-v0.9.0 -f ./paddle/scripts/docker/Dockerfile.cpu
Then we can direct our Web browser to the HTML version of source code
at http://localhost:8088/paddle/
......@@ -143,7 +143,7 @@ It looks like there are a lot of arguments. However, most of them are for develo
</tr>
<tr>
<td class="left" rowspan = "2">testing during training</td><td class="left">test_all_data_in_one_period</td>
<td class="left" rowspan = "2">testing during training</td><td class="left">test_period</td>
<td class="left"></td><td class="left"></td><td class="left"></td><td class="left"></td>
</tr>
......
......@@ -31,7 +31,7 @@
- type: string (default: null).
* `--version`
- Whether to print version infomatrion.
- Whether to print version information.
- type: bool (default: 0).
* `--show_layer_stat`
......@@ -110,8 +110,8 @@
- type: int32 (default: -1).
* `--test_period`
- Run testing every test_period train batches. If not set, run testing each pass.
- type: int32 (default: 1000).
- If 0, test on all test data at the end of each pass. If non-zero, test on all test data every test_period batches.
- type: int32 (default: 0).
* `--test_wait`
- Whether to wait for the parameters of each pass if they do not exist yet. If test_data_path is set in the cluster submitting environment, one process is launched to perform testing, so test_wait=1 must be set. Note that in the cluster submitting environment, this argument is set to true by default.
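The two `--test_period` modes described above can be illustrated with a short sketch. It assumes batches are counted globally across passes, which the text does not state, and the helper name `batches_followed_by_test` is hypothetical:

```python
def batches_followed_by_test(batches_per_pass, num_passes, test_period):
    """List the 1-based global batch numbers after which a full test runs."""
    total = batches_per_pass * num_passes
    if test_period == 0:
        # Mode 1: test once at the end of every pass.
        return list(range(batches_per_pass, total + 1, batches_per_pass))
    # Mode 2: test every test_period batches (global counting is an
    # assumption of this sketch).
    return list(range(test_period, total + 1, test_period))


print(batches_followed_by_test(100, 2, 0))   # [100, 200]
print(batches_followed_by_test(100, 2, 80))  # [80, 160]
```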
......@@ -121,10 +121,6 @@
- File that saves the model list when testing. It was set automatically when using cluster submitting environment after setting model_path.
- type: string (default: "", null).
* `--test_all_data_in_one_period`
- This argument is usually used in testing period during traning. If true, all data will be tested in one test period. Otherwise (batch_size * log_peroid) data will be tested.
- type: bool (default: 0).
* `--predict_output_dir`
- Directory that saves the layer output. It is configured in Outputs() in network config. By default, this argument is null, meaning nothing is saved. Specify this directory if you want to save the feature maps of some layers in testing mode. Note that layer outputs are the values after the activation function.
- type: string (default: "", null).
......
......@@ -10,9 +10,8 @@ paddle train \
--config=network_config \
--save_dir=output \
--trainer_count=COUNT \ #(default:1)
--test_period=M \ #(default:1000)
--test_all_data_in_one_period=true \ #(default:false)
--num_passes=N \ #(defalut:100)
--test_period=M \ #(default:0)
--num_passes=N \ #(default:100)
--log_period=K \ #(default:100)
--dot_period=1000 \ #(default:1)
#[--show_parameter_stats_period=100] \ #(default:0)
......
......@@ -22,7 +22,7 @@ AutoStructify = transform.AutoStructify
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, '@PROJ_ROOT@/python')
templates_path = ["@PROJ_ROOT@/doc/templates"]
templates_path = ["@PROJ_ROOT@/doc_theme/templates"]
# -- General configuration ------------------------------------------------
......@@ -112,12 +112,12 @@ todo_include_todos = False
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#html_theme = 'sphinx_rtd_theme' # sphinx_rtd_theme will cause table bad style
html_theme = 'classic'
html_theme = 'sphinx_rtd_theme'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_static_path = ['@PROJ_ROOT@/doc_theme/static']
# Output file base name for HTML help builder.
htmlhelp_basename = project + 'doc'
......
......@@ -202,3 +202,53 @@ PaddlePaddle parameters use the name :code:`name` as the parameter ID; the same name
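The solution is:
* Uninstall the PaddlePaddle package with :code:`pip uninstall paddle` to clean out the stale installed PaddlePaddle package, so that the unit tests have a clean environment. If the PaddlePaddle package is already in Python's site-packages, the unit tests import the Python package from site-packages rather than the one under the :code:`/python` directory of the source tree. Even setting :code:`PYTHONPATH` to :code:`/python` does not help, because Python's search path gives priority to installed packages.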
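9. CMake source build finds mismatched PythonLibs and PythonInterp versions
--------------------------------------------------------------------------------
This is caused by a flaw in how CMake currently looks for Python. If several Python versions are installed on the system, the Python library and the Python interpreter that CMake finds may have different versions, which makes the PaddlePaddle build fail. The correct fix is
for the user to force a specific Python version, as follows: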
.. code-block:: bash
cmake .. -DPYTHON_EXECUTABLE=<exc_path> -DPYTHON_LIBRARY=<lib_path> -DPYTHON_INCLUDE_DIR=<inc_path>
The user needs to specify the Python paths on the local machine: ``<exc_path>``, ``<lib_path>``, ``<inc_path>``
10. A protocol message was rejected because it was too big
----------------------------------------------------------
If the following error occurs when training an NLP-related model:
.. code-block:: bash
[libprotobuf ERROR google/protobuf/io/coded_stream.cc:171] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
F1205 14:59:50.295174 14703 TrainerConfigHelper.cpp:59] Check failed: m->conf.ParseFromString(configProtoStr)
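A likely cause is that one of the args passed to the dataprovider is too large, usually because a large dictionary is passed directly. A faulty define_py_data_sources2 call looks like this: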
.. code-block:: python
src_dict = dict()
for line_count, line in enumerate(open(src_dict_path, "r")):
src_dict[line.strip()] = line_count
define_py_data_sources2(
train_list,
test_list,
module="dataprovider",
obj="process",
args={"src_dict": src_dict})
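The solution is to pass the path of the dictionary to the dataprovider as args, and then load the dictionary inside the dataprovider from that path. That is, define_py_data_sources2 should be changed to: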
.. code-block:: python
define_py_data_sources2(
train_list,
test_list,
module="dataprovider",
obj="process",
args={"src_dict_path": src_dict_path})
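On the dataprovider side, the dictionary is then rebuilt from the path it receives. A minimal sketch of that loading step, assuming one token per line; the helper name `load_dict` and the throwaway file are illustrative only:

```python
import os
import tempfile


def load_dict(dict_path):
    """Map each line of the dictionary file to its zero-based line number."""
    with open(dict_path, "r") as fin:
        return {line.strip(): line_count for line_count, line in enumerate(fin)}


# Tiny demonstration with a throwaway dictionary file.
with tempfile.NamedTemporaryFile("w", suffix=".dict", delete=False) as tmp:
    tmp.write("<s>\n<e>\nhello\n")
src_dict = load_dict(tmp.name)
os.remove(tmp.name)
print(src_dict["hello"])  # 2
```

Only the short path string travels through the serialized config this way, so the size of the dictionary no longer matters for the protobuf limit.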
For the complete source, see the `seqToseq <https://github.com/PaddlePaddle/Paddle/tree/develop/demo/seqToseq>`_ example.
\ No newline at end of file
......@@ -18,13 +18,13 @@ The source for building the PaddlePaddle Docker image is located in ``${源码根目录}/paddle/scr
.. code-block:: bash
cd ${源码根目录}/paddle/scripts/docker/
docker build --build-arg LOWEST_DL_SPEED=50K\
docker build --build-arg LOWEST_DL_SPEED=50K \
--build-arg WITH_GPU=ON \
--tag paddle_gpu:latest .
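The configuration parameters passed via ``--build-arg`` include:
- LOWEST\_DL\_SPEED\: sets the minimum speed of an offline thread during multi-threaded download.
- LOWEST\_DL\_SPEED\: sets the minimum speed of a download thread during multi-threaded download.
- The default unit is bytes, but units such as 10K, 10M, or 10G may also be passed.
- If a thread is slower than this speed, it is closed. When all threads have been closed, the download process restarts.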
......
body {
padding-top: 80px;
background-image: none !important;
font-family: Roboto;
}
a, a:focus, a:hover, a:visited {
color: #597cf1;
}
.site-header {
position: fixed;
top: 0;
width: 100%;
left: 0;
z-index: 99;
background: #333;
height: 80px;
display: -webkit-flex;
display: -ms-flexbox;
display: flex;
flex-flow: row nowrap;
justify-content: space-between;
box-shadow: #ccc 0 3px 3px;
}
.site-header > div {
height: 80px;
display: inline-block;
background-color: #2f323a;
padding: 0 30px;
}
.site-header .site-logo {
line-height: 80px;
width: 290px;
flex: 0 1 290px;
}
.site-header .site-logo > a {
display: inline-block;
width: 230px;
}
.site-header .site-nav-links {
flex: 0 1 100%;
}
.site-header .site-nav-links .site-menu {
height: 30px;
line-height: 30px;
font-size: 12px;
background: -webkit-linear-gradient(#282b33, #2f323a);
background: -o-linear-gradient(#282b33, #2f323a);
background: -moz-linear-gradient(#282b33, #2f323a);
background: linear-gradient(to left, #282b33, #2f323a);
margin-right: -30px;
padding-right: 30px;
}
.site-header .site-nav-links .site-menu .site-page-links {
display: inline-block;
float: right;
margin-right: 20px;
}
.site-header .site-nav-links .site-menu .site-page-links> li {
display: inline-block;
float: left;
}
.site-header .site-nav-links .site-menu .site-page-links > li > a {
color: #a7adbd;
display: inline-block;
height: 30px;
padding: 0 20px;
font-size: 12px;
}
.site-header .site-nav-links .site-menu .site-page-links > li:hover > a,
.site-header .site-nav-links .site-menu .site-page-links > li.active > a {
background-color: #2f323a;
color: #bcc1d0;
}
.site-header .site-nav-links .site-menu .site-page-links > li.active > a {
font-weight: bold;
}
.site-header .site-nav-links .site-menu .fork-on-github {
color: #597cf1;
line-height: 30px;
display: inline-block;
padding: 0 0 0 20px;
float: right;
position: relative;
}
.site-header .site-nav-links .site-menu .fork-on-github .fa {
margin-right: 5px;
font-size: 16px;
vertical-align: middle;
}
.site-header .site-nav-links .site-menu .language-switcher {
height: 30px;
display: inline-block;
float: right;
line-height: 30px;
padding: 0 20px;
position: relative;
}
.site-header .site-nav-links .site-menu .language-switcher > a {
color: #a7adbd;
}
.site-header .site-nav-links .site-menu .language-switcher.open > a {
background-color: #24272f;
color: #bcc1d0;
}
.site-header .site-nav-links .site-menu .language-switcher .fa {
margin-left: 5px;
}
.site-header .site-nav-links .site-menu .language-switcher .fa-angle-down {
display: inline;
}
.site-header .site-nav-links .site-menu .language-switcher.open .fa-angle-down {
display: none;
}
.site-header .site-nav-links .site-menu .language-switcher .fa-angle-up {
display: none;
}
.site-header .site-nav-links .site-menu .language-switcher.open .fa-angle-up {
display: inline;
}
.site-header .site-nav-links .site-menu .fork-on-github:before,
.site-header .site-nav-links .site-menu .language-switcher:before {
width: 1px;
height: 12px;
top: 9px;
background-color: #3a3d47;
left: 0;
display: inline-block;
position: absolute;
content: "";
}
.site-header .site-nav-links .site-menu .language-switcher .dropdown-menu {
display: none;
position: absolute;
box-shadow: #ccc 0 0 5px;
background-color: #fff;
width: 100%;
left: 0;
top: 30px;
}
.site-header .site-nav-links .site-menu .language-switcher .dropdown-menu > li {
line-height: 30px;
padding: 0 20px;
}
.site-header .site-nav-links .site-menu .language-switcher .dropdown-menu > li:hover {
background-color: #f7f8fe;
}
.site-header .site-nav-links .site-menu .language-switcher .dropdown-menu > li + li {
border-top: 1px solid #dedfe5;
}
.site-header .site-nav-links .site-menu .language-switcher .dropdown-menu > li > a {
color: #2f323a;
}
.site-header .site-nav-links .site-menu .language-switcher.open .dropdown-menu {
display: inline-block;
}
.site-header .site-nav-links .doc-module {
display: block;
height: 50px;
line-height: 50px;
}
.site-header .site-nav-links .doc-module > ul > li {
display: inline-block;
float: left;
}
.site-header .site-nav-links .doc-module > ul > li > a {
color: #c9cbd0;
font-size: 14px;
display: inline-block;
height: 50px;
line-height: 50px;
border-bottom: 2px solid transparent;
padding: 0 20px;
}
.site-header .site-nav-links .doc-module > ul > li:hover > a {
color: #fff;
}
.site-header .site-nav-links .doc-module > ul > li.current > a {
border-bottom-color: #fff;
color: #fff;
}
.site-header .site-nav-links .doc-module [role="search"]{
float: right;
}
.site-header .site-nav-links .doc-module [role="search"] input {
background-color: #3a3d47;
border-radius: 15px;
color: #a7adbd;
border: 1px solid transparent;
padding: 6px 15px;
width: 180px;
box-shadow: none;
transition: all .2s;
-webkit-transition: all .2s;
-moz-transition: all .2s;
-o-transition: all .2s;
background-repeat: no-repeat;
background-position: 145px center;
background-image: url("");
}
.site-header .site-nav-links .doc-module [role="search"] input:focus {
width: 300px;
}
.site-header .site-nav-links .doc-module [role="search"] input:focus {
background-position: 265px center;
}
.site-header .site-nav-links .doc-module [role="search"] input:hover,
.site-header .site-nav-links .doc-module [role="search"] input:focus {
color: #fff;
border-color: #597cf1;
background-image: url("");
}
.doc-menu-vertical {
display: inline-block;
float: left;
width: 240px;
height: 100%;
background-color: #ecedee;
position: absolute;
left: 0;
top: 0;
overflow: hidden;
padding: 0;
border-right: 1px solid #dddfe3;
}
.doc-menu-vertical > ul {
display: none;
}
.doc-menu-vertical > ul.current{
display: block;
}
.doc-menu-vertical > ul.current > li.toctree-l1 {
display: none;
}
.doc-menu-vertical > ul.current > li.toctree-l1.current {
display: block;
}
.doc-menu-vertical > ul.current > li.toctree-l1.current > a {
display: none;
}
.doc-menu-vertical .toctree-l2 a {
width: 100%;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
padding-right: 30px;
}
.doc-menu-vertical .toctree-l2 > a {
font-size: 14px;
color: #2f323a;
padding-left: 30px;
line-height: 50px;
display: block;
font-weight: bold;
border-bottom: 1px solid #dddfe3;
}
.doc-menu-vertical .toctree-l2.has-child > a:after {
font-family: "FontAwesome";
display: inline-block;
font-style: normal;
font-weight: normal;
text-decoration: inherit;
content: "";
float: right;
line-height: 50px;
color: #a7adbd;
position: absolute;
right: 15px;
}
.doc-menu-vertical .toctree-l2.has-child.current > a:after {
content: "";
}
.doc-menu-vertical .toctree-l2 > a + ul{
background-color: #e4e6e9;
height: 0;
overflow: hidden;
}
.doc-menu-vertical .toctree-l2.current > a + ul {
border-bottom: 1px solid #dddfe3;
height: auto;
}
.doc-menu-vertical .toctree-l2 li.active > a {
background-color: #597cf1;
color: #fff;
}
.doc-menu-vertical .toctree-l3 > a {
font-size: 12px;
color: #2f323a;
padding-left: 30px;
line-height: 40px;
display: block;
}
.doc-menu-vertical .toctree-l4 > a {
font-size: 12px;
color: #64697b;
padding-left: 50px;
line-height: 30px;
display: block;
}
.doc-menu-vertical .toctree-l5 > a {
font-size: 14px;
color: #ccc;
padding-left: 40px;
display: block;
}
.local-toc {
position: absolute;
height: 100%;
background-color: #f6f7f8;
top: 0;
left: 240px;
padding: 0;
z-index: 9;
}
.local-toc:after {
content: "";
position: absolute;
height: 100%;
width: 1px;
display: inline-block;
right: 0;
background-color: #dddfe3;
top: 0;
z-index: -1;
}
.local-toc:hover a {
width: auto;
}
.local-toc > ul > li a {
position: relative;
font-size: 12px;
overflow: hidden;
display: none;
}
.local-toc > ul > li > ul > li a {
display: block;
border-top: 1px solid transparent;
border-bottom: 1px solid transparent;
padding-right: 20px;
width: 50px;
}
.local-toc > ul > li > ul > li > ul > li > ul a {
display: none;
}
.local-toc > ul > li > ul li > a:after {
content: "";
display: inline-block;
width: 1px;
height: 100%;
background-color: transparent;
position: absolute;
right: 0;
top: 0;
}
.local-toc > ul > li > ul li a:hover{
background-color: #e6eaf7 !important;
}
.local-toc > ul > li > ul li a:hover:after {
background-color: #e6eaf7 !important;
}
.local-toc > ul > li > ul li.active > a {
color: #ff9711;
background-color: #fff;
border-top: 1px solid #dddfe3;
border-bottom: 1px solid #dddfe3;
}
.local-toc > ul > li > ul li.active > a:before {
background-color: #ff9711;
width: 10px;
height: 10px;
margin: 15px 20px;
border-radius: 5px;
}
.local-toc > ul > li > ul li.active > a:after {
background-color: #fff;
}
.local-toc > ul > li > ul > li {
position: relative;
line-height: 40px;
white-space: nowrap;
}
.local-toc > ul > li > ul > li > a {
color: #64697b;
}
.local-toc > ul > li > ul > li > a + ul {
display: none;
}
.local-toc > ul > li > ul > li > a:before {
display: inline-block;
content: "";
width: 6px;
height: 6px;
background-color: #ccc;
border-radius: 3px;
margin: 17px 22px;
float: left;
}
.local-toc > ul > li > ul > li > ul > li > a {
color: #a7adbd;
}
.local-toc > ul > li > ul > li > ul > li > a:before {
display: inline-block;
content: "";
width: 6px;
height: 6px;
background-color: #ccc;
border-radius: 3px;
margin: 17px 22px;
float: left;
}
.main-content-wrap {
position: absolute;
width: 100%;
top: 80px;
bottom: 0;
overflow: auto;
background-color: #f6f7f8;
}
.doc-content-wrap {
margin-left: 290px;
height: 100%;
position: relative;
padding-top: 60px;
background-color: #fff;
}
.doc-content-wrap > div[role='navigation'] {
position: absolute;
top: 0;
width: 100%;
left: 0;
padding: 0 30px;
height: 60px;
}
.wy-breadcrumbs {
line-height: 50px;
height: 60px;
background-image: url("");
background-repeat: repeat no-repeat;
background-position: center 50px;
}
.wy-breadcrumbs > li {
color: #ccc;
}
.wy-breadcrumbs > li a {
color: #ff9711;
padding: 0;
}
.wy-breadcrumbs > li:first-child a {
color: #597cf1;
}
.wy-nav-content{
max-width: none;
overflow: auto;
position: relative;
padding: 30px;
background-color: #fff;
}
.wy-nav-content h1 {
font-size: 24px;
color: #2f323a;
margin-bottom: 30px;
}
.wy-nav-content h2 {
font-size: 20px;
color: #2f323a;
margin-bottom: 30px;
}
.wy-nav-content h3 {
font-size: 18px;
color: #2f323a;
margin-bottom: 30px;
}
.wy-nav-content h4 {
font-size: 16px;
color: #2f323a;
margin-bottom: 30px;
}
.wy-nav-content p + h1,
.wy-nav-content p + h2,
.wy-nav-content p + h3,
.wy-nav-content p + h4 {
margin-top: 20px;
}
.wy-nav-content p{
color: #2f323a;
margin-bottom: 20px;
font-size: 14px;
}
#search-results h2 {
font-size: 24px;
margin: 20px 0 10px 0;
}
#search-results p {
color: #a7adbd;
}
#search-results ul.search > li {
border-bottom: none;
}
#search-results ul.search > li > a {
color: #597cf1;
}
.rst-content .highlighted{
background-color: transparent;
color: #ff9711;
padding: 0;
}
\ No newline at end of file
$(document).ready(function(){
$('.local-toc').on('click' ,'a.reference.internal', function (){
$('.local-toc li.active').removeClass('active');
$(this).parent('li').addClass('active');
});
if ($('.local-toc a:visible').length) {
$('.local-toc > ul').addClass('nav nav-stacked');
$('#doc-content').scrollspy({
target: '.local-toc'
});
$('.local-toc').perfectScrollbar();
} else {
$('.doc-content-wrap').css('margin-left', '-=50px');
$('.local-toc').remove();
}
if (!$('.doc-menu-vertical > ul > li.current > ul').length) {
$('.doc-content-wrap').css('margin-left', '-=240px');
$('.doc-menu-vertical').remove();
$('.local-toc').css('left', '0');
}
$('.doc-menu-vertical .toctree-l2').each(function (i, e){
$(e).toggleClass('has-child', !!$(e).find('ul').length);
});
$('.doc-menu-vertical').find('li.current').last().addClass('active');
$('.doc-menu-vertical').perfectScrollbar();
});
\ No newline at end of file
{# Support for Sphinx 1.3+ page_source_suffix, but don't break old builds. #}
{% if page_source_suffix %}
{% set suffix = page_source_suffix %}
{% else %}
{% set suffix = source_suffix %}
{% endif %}
{% if meta is defined and 'github_url' in meta %}
{% set display_github = True %}
{% endif %}
{% if meta is defined and 'bitbucket_url' in meta %}
{% set display_bitbucket = True %}
{% endif %}
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
{% for doc in parents %}
<li><a href="{{ doc.link|e }}">{{ doc.title }}</a> &gt; </li>
{% endfor %}
<li>{{ title }}</li>
</ul>
</div>
{# TEMPLATE VAR SETTINGS #}
{%- set url_root = pathto('', 1) %}
{%- if url_root == '#' %}{% set url_root = '' %}{% endif %}
{%- if not embedded and docstitle %}
{%- set titlesuffix = " &mdash; "|safe + docstitle|e %}
{%- else %}
{%- set titlesuffix = "" %}
{%- endif %}
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
{{ metatags }}
<meta name="viewport" content="width=device-width, initial-scale=1.0">
{% block htmltitle %}
<title>{{ title|striptags|e }}{{ titlesuffix }}</title>
{% endblock %}
{# FAVICON #}
{% if favicon %}
<link rel="shortcut icon" href="{{ pathto('_static/' + favicon, 1) }}"/>
{% endif %}
{# CSS #}
{# OPENSEARCH #}
{% if not embedded %}
{% if use_opensearch %}
<link rel="search" type="application/opensearchdescription+xml" title="{% trans docstitle=docstitle|e %}Search within {{ docstitle }}{% endtrans %}" href="{{ pathto('_static/opensearch.xml', 1) }}"/>
{% endif %}
{% endif %}
{# RTD hosts this file, so just load on non RTD builds #}
{% if not READTHEDOCS %}
<link rel="stylesheet" href="{{ pathto('_static/' + style, 1) }}" type="text/css" />
{% endif %}
{% for cssfile in css_files %}
<link rel="stylesheet" href="{{ pathto(cssfile, 1) }}" type="text/css" />
{% endfor %}
{% for cssfile in extra_css_files %}
<link rel="stylesheet" href="{{ pathto(cssfile, 1) }}" type="text/css" />
{% endfor %}
{%- block linktags %}
{%- if hasdoc('about') %}
<link rel="author" title="{{ _('About these documents') }}"
href="{{ pathto('about') }}"/>
{%- endif %}
{%- if hasdoc('genindex') %}
<link rel="index" title="{{ _('Index') }}"
href="{{ pathto('genindex') }}"/>
{%- endif %}
{%- if hasdoc('search') %}
<link rel="search" title="{{ _('Search') }}" href="{{ pathto('search') }}"/>
{%- endif %}
{%- if hasdoc('copyright') %}
<link rel="copyright" title="{{ _('Copyright') }}" href="{{ pathto('copyright') }}"/>
{%- endif %}
<link rel="top" title="{{ docstitle|e }}" href="{{ pathto('index') }}"/>
{%- if parents %}
<link rel="up" title="{{ parents[-1].title|striptags|e }}" href="{{ parents[-1].link|e }}"/>
{%- endif %}
{%- if next %}
<link rel="next" title="{{ next.title|striptags|e }}" href="{{ next.link|e }}"/>
{%- endif %}
{%- if prev %}
<link rel="prev" title="{{ prev.title|striptags|e }}" href="{{ prev.link|e }}"/>
{%- endif %}
{%- endblock %}
{%- block extrahead %}
<link rel="stylesheet" href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" type="text/css" />
<link rel="stylesheet" href="{{pathto('_static/css/override.css', 1)}}" type="text/css" />
<script>
var _hmt = _hmt || [];
(function() {
var hm = document.createElement("script");
hm.src = "//hm.baidu.com/hm.js?b9a314ab40d04d805655aab1deee08ba";
var s = document.getElementsByTagName("script")[0];
s.parentNode.insertBefore(hm, s);
})();
</script>
{% endblock %}
{# Keep modernizr in head - http://modernizr.com/docs/#installing #}
<script src="{{ pathto('_static/js/modernizr.min.js', 1) }}"></script>
</head>
<body class="wy-body-for-nav" role="document">
{% block extrabody %}
<header class="site-header">
<div class="site-logo">
<a href="/"><img src="{{pathto('_static/images/PP_w.png', 1)}}"></a>
</div>
<div class="site-nav-links">
<div class="site-menu">
<a class="fork-on-github" href="https://github.com/PaddlePaddle/Paddle" target="_blank"><i class="fa fa-github"></i>Fork me on GitHub</a>
<div class="language-switcher dropdown">
<a type="button" data-toggle="dropdown">
<span>English</span>
<i class="fa fa-angle-up"></i>
<i class="fa fa-angle-down"></i>
</a>
<ul class="dropdown-menu">
<li><a href="/doc_cn">中文</a></li>
<li><a href="/doc">English</a></li>
</ul>
</div>
<ul class="site-page-links">
<li><a>Home</a></li>
<li><a>Get Started</a></li>
<li class="active"><a>Documentation</a></li>
<li><a>About Us</a></li>
</ul>
</div>
<div class="doc-module">
{%set modules = toctree(maxdepth=0, collapse=False, titles_only=True)%}
{{modules}}
{% include "searchbox.html" %}
</div>
</div>
</header>
{% endblock %}
<div class="main-content-wrap">
{# SIDE NAV, TOGGLES ON MOBILE #}
<nav class="doc-menu-vertical" role="navigation">
{% block menu %}
{% set toctree = toctree(maxdepth=-1, collapse=False,titles_only=True, includehidden=True) %}
{{ toctree }}
{% endblock %}
</nav>
{% if toc %}
<nav class="local-toc">{{ toc }}</nav>
{% endif %}
<section class="doc-content-wrap">
{% include "breadcrumbs.html" %}
{# PAGE CONTENT #}
<div class="wy-nav-content" id="doc-content">
<div class="rst-content">
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
{% block body %}{% endblock %}
</div>
</div>
{% include "footer.html" %}
</div>
</div>
</section>
</div>
{% include "versions.html" %}
{% if not embedded %}
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT:'{{ url_root }}',
VERSION:'{{ release|e }}',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'{{ '' if no_search_suffix else file_suffix }}',
HAS_SOURCE: {{ has_source|lower }}
};
</script>
{%- for scriptfile in script_files %}
<script type="text/javascript" src="{{ pathto(scriptfile, 1) }}"></script>
{%- endfor %}
{% endif %}
{# RTD hosts this file, so just load on non RTD builds #}
{% if not READTHEDOCS %}
<script type="text/javascript" src="{{ pathto('_static/js/theme.js', 1) }}"></script>
{% endif %}
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/js/perfect-scrollbar.jquery.min.js"></script>
<script src="{{ pathto('_static/js/paddle_doc_init.js', 1) }}"></script>
{%- block footer %} {% endblock %}
</body>
</html>
{#
basic/search.html
~~~~~~~~~~~~~~~~~
Template for the search page.
:copyright: Copyright 2007-2013 by the Sphinx team, see AUTHORS.
:license: BSD, see LICENSE for details.
#}
{%- extends "layout.html" %}
{% set title = _('Search') %}
{% set script_files = script_files + ['_static/searchtools.js'] %}
{% block footer %}
<script type="text/javascript">
jQuery(function() { Search.loadIndex("{{ pathto('searchindex.js', 1) }}"); });
jQuery('.doc-content-wrap > div[role="navigation"]').remove();
jQuery('.doc-content-wrap').css('padding-top', 0);
</script>
{# this is used when loading the search index using $.ajax fails,
such as on Chrome for documents on localhost #}
<script type="text/javascript" id="searchindexloader"></script>
{{ super() }}
{% endblock %}
{% block body %}
<noscript>
<div id="fallback" class="admonition warning">
<p class="last">
{% trans %}Please activate JavaScript to enable the search
functionality.{% endtrans %}
</p>
</div>
</noscript>
{% if search_performed %}
<h2>{{ _('Search Results') }}</h2>
{% if not search_results %}
<p>{{ _('Your search did not match any documents. Please make sure that all words are spelled correctly and that you\'ve selected enough categories.') }}</p>
{% endif %}
{% endif %}
<div id="search-results">
{% if search_results %}
<ul>
{% for href, caption, context in search_results %}
<li>
<a href="{{ pathto(href) }}">{{ caption }}</a>
<p class="context">{{ context|e }}</p>
</li>
{% endfor %}
</ul>
{% endif %}
</div>
{% endblock %}
......@@ -60,14 +60,12 @@ bool BatchNormBaseLayer::init(const LayerMap& layerMap,
void BatchNormBaseLayer::calFeatureMapSize() {
const ImageConfig& conf = config_.inputs(0).image_conf();
if (inputLayers_[0]->getOutput().getFrameHeight() == 0 &&
inputLayers_[0]->getOutput().getFrameWidth() == 0) {
imgSize_ = conf.img_size();
imageH_ = imgSize_;
imageW_ = imgSize_;
} else {
imageH_ = inputLayers_[0]->getOutput().getFrameHeight();
imageW_ = inputLayers_[0]->getOutput().getFrameWidth();
if (imageH_ == 0 && imageW_ == 0) {
imageH_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size();
imageW_ = conf.img_size();
} else {
getOutput().setFrameHeight(imageH_);
getOutput().setFrameWidth(imageW_);
}
......
......@@ -77,9 +77,8 @@ protected:
MatrixPtr savedMean_;
MatrixPtr savedInvVar_;
/// Height or width of input image feature, now height is equal to width.
/// imgSize is 1 if the input is fully-connected layer.
int imgSize_;
/// Height and width of the input image feature.
/// Both are 1 if the input is a fully-connected layer.
int imageH_;
int imageW_;
/// Height * Width.
......
......@@ -26,15 +26,15 @@ size_t BilinearInterpLayer::getSize() {
const BilinearInterpConfig& conf = config_.inputs(0).bilinear_interp_conf();
if (inImgH_ == 0) {
inImgH_ = conf.img_size_y();
inImgH_ = conf.image_conf().img_size_y();
}
if (inImgW_ == 0) {
inImgW_ = conf.img_size_x();
inImgW_ = conf.image_conf().img_size();
}
outImgH_ = conf.out_size_y();
outImgW_ = conf.out_size_x();
numChannels_ = conf.num_channels();
numChannels_ = conf.image_conf().channels();
CHECK(outImgH_ > 0 && outImgW_ > 0);
CHECK(inImgH_ > 0 && inImgW_ > 0);
......
......@@ -38,11 +38,12 @@ bool ConvBaseLayer::init(const LayerMap& layerMap,
filterSizeY_.push_back(conf.filter_size_y());
filterPixels_.push_back(filterSize_.back() * filterSizeY_.back());
channels_.push_back(conf.channels());
imgSizeH_.push_back(conf.img_size());
imgSizeH_.push_back(conf.has_img_size_y() ? conf.img_size_y()
: conf.img_size());
imgSizeW_.push_back(conf.img_size());
groups_.push_back(conf.groups());
filterChannels_.push_back(conf.filter_channels());
outputH_.push_back(conf.output_x());
outputH_.push_back(conf.has_output_y() ? conf.output_y() : conf.output_x());
outputW_.push_back(conf.output_x());
}
......@@ -91,16 +92,19 @@ size_t ConvBaseLayer::calOutputSize() {
for (size_t i = 0; i < inputLayers_.size(); i++) {
inH.push_back(inputLayers_[i]->getOutput().getFrameHeight());
inW.push_back(inputLayers_[i]->getOutput().getFrameWidth());
const ConvConfig& conf = config_.inputs(i).conv_conf();
if (isDeconv_) {
if (inH[i] == 0) inH[i] = config_.inputs(i).conv_conf().output_x();
if (inW[i] == 0) inW[i] = config_.inputs(i).conv_conf().output_x();
if (inH[i] == 0)
inH[i] = conf.has_output_y() ? conf.output_y() : conf.output_x();
if (inW[i] == 0) inW[i] = conf.output_x();
outH.push_back(imageSize(
inH[i], filterSizeY_[i], paddingY_[i], strideY_[i], caffeMode_));
outW.push_back(imageSize(
inW[i], filterSize_[i], padding_[i], stride_[i], caffeMode_));
} else {
if (inH[i] == 0) inH[i] = config_.inputs(i).conv_conf().img_size();
if (inW[i] == 0) inW[i] = config_.inputs(i).conv_conf().img_size();
if (inH[i] == 0)
inH[i] = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size();
if (inW[i] == 0) inW[i] = conf.img_size();
outH.push_back(outputSize(
inH[i], filterSizeY_[i], paddingY_[i], strideY_[i], caffeMode_));
outW.push_back(outputSize(
......
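The hunks above drop the square-image assumption: the height is read from the optional y field and falls back to the x field only when y is unset. A minimal sketch of that pattern follows. `ConvDims`, `imageHeight` and `outputSizeCaffe` are hypothetical names used for illustration, not part of the Paddle sources, and the output-size formula is the usual caffe-mode (floor) convention, assumed here rather than taken from the real `outputSize` helper.

```cpp
#include <cassert>

// Hypothetical stand-in for the generated protobuf ConvConfig.
// Only the fields needed for the fallback are modelled.
struct ConvDims {
  int img_size;         // image width (x)
  bool has_img_size_y;  // mirrors conf.has_img_size_y()
  int img_size_y;       // image height (y), meaningful only if the flag is set
};

// Height falls back to the width when img_size_y is unset,
// which keeps old square-image configs working.
inline int imageHeight(const ConvDims& c) {
  return c.has_img_size_y ? c.img_size_y : c.img_size;
}

// Assumed caffe-mode output size: floor((in - filter + 2 * pad) / stride) + 1.
inline int outputSizeCaffe(int imageSize,
                           int filterSize,
                           int padding,
                           int stride) {
  return (imageSize - filterSize + 2 * padding) / stride + 1;
}
```

With a 16 x 8 input, as in the updated `testConvLayer`, the width and height are computed independently, so the layer size becomes `output_x * output_y * num_filters` rather than `output_x * output_x * num_filters`.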
......@@ -93,9 +93,9 @@ private:
bool caffeMode_;
int inputOffset_, outputOffset_, weightOffset_;
int numFilters_;
int padding_, stride_, filterSize_, channels_, imgSize_;
int padding_, stride_, filterSize_, channels_, imgSize_, imgSizeY_;
int paddingY_, strideY_, filterSizeY_;
int imgPixels_, filterPixels_, filterChannels_, outputX_, outputs_;
int imgPixels_, filterPixels_, filterChannels_, outputX_, outputY_, outputs_;
/// The following member variables are the same as in CudnnConvLayer,
/// so they are not explained here.
......@@ -144,7 +144,7 @@ void ConvOperator::allocConvWorkSpace(size_t maxWorkSpace) {
void ConvOperator::reshape(int batchSize) {
imageH_ = ins_[0]->getFrameHeight();
imageW_ = ins_[0]->getFrameWidth();
if (imageH_ == 0) imageH_ = imgSize_;
if (imageH_ == 0) imageH_ = imgSizeY_;
if (imageW_ == 0) imageW_ = imgSize_;
outputH_ = outputSize(imageH_, filterSizeY_, paddingY_, strideY_, caffeMode_);
outputW_ = outputSize(imageW_, filterSize_, padding_, stride_, caffeMode_);
......@@ -182,7 +182,10 @@ void ConvOperator::computeConvSizes() {
hl_create_tensor_descriptor(&inputDesc_);
int outputX =
outputSize(imgSize_, filterSize_, padding_, stride_, caffeMode_);
int outputY =
outputSize(imgSizeY_, filterSizeY_, paddingY_, strideY_, caffeMode_);
CHECK_EQ(outputX, outputX_);
CHECK_EQ(outputY, outputY_);
hl_create_tensor_descriptor(&outputDesc_);
hl_create_convolution_descriptor(&convDesc_,
inputDesc_,
......@@ -236,10 +239,12 @@ void ConvOperator::getConvParams() {
filterPixels_ = filterSize_ * filterSizeY_;
channels_ = conf.channels();
imgSize_ = conf.img_size();
imgPixels_ = imgSize_ * imgSize_;
imgSizeY_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size();
imgPixels_ = imgSize_ * imgSizeY_;
CHECK_EQ(conf.groups(), 1U);
filterChannels_ = conf.filter_channels();
outputX_ = conf.output_x();
outputY_ = conf.has_output_y() ? conf.output_y() : conf.output_x();
outputs_ = outputX_ * outputX_;
}
......
......@@ -46,7 +46,7 @@ void ConvProjection::getConvParams() {
filterH_ = conf.filter_size_y();
filterW_ = conf.filter_size();
configImgH_ = conf.img_size();
configImgH_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size();
configImgW_ = conf.img_size();
channels_ = conf.channels();
......@@ -58,9 +58,11 @@ void ConvProjection::getConvParams() {
}
void ConvProjection::initCudnn() {
hl_create_filter_descriptor(
&filterDesc_, channels_ / groups_, numFilters_ / groups_,
filterH_, filterW_);
hl_create_filter_descriptor(&filterDesc_,
channels_ / groups_,
numFilters_ / groups_,
filterH_,
filterW_);
hl_create_tensor_descriptor(&inputDesc_);
hl_create_tensor_descriptor(&outputDesc_);
hl_create_convolution_descriptor(&convDesc_,
......
......@@ -49,8 +49,13 @@ void DataLayer::copyDataToOutput(Argument& output) {
output.ids->copyFrom(*data_.ids);
}
}
if (config_.height() && config_.width()) {
output.setFrameHeight(config_.height());
output.setFrameWidth(config_.width());
} else {
output.setFrameHeight(data_.getFrameHeight());
output.setFrameHeight(data_.getFrameHeight());
output.setFrameWidth(data_.getFrameWidth());
}
output.cpuSequenceDims = data_.cpuSequenceDims;
output.sequenceStartPositions = data_.sequenceStartPositions;
output.subSequenceStartPositions = data_.subSequenceStartPositions;
......
......@@ -29,16 +29,18 @@ bool ExpandConvBaseLayer::init(const LayerMap &layerMap,
* meaning as in conv, we need to swap channels_ and numFilters here for
* convTrans, and in other functions too.
* */
int channel;
int numFilters;
/* Initialize the projection */
for (auto &inputConfig : config_.inputs()) {
const ConvConfig &conf = inputConfig.conv_conf();
numFilters = isDeconv_ ? conf.channels() : numFilters_;
int numFilters = isDeconv_ ? conf.channels() : numFilters_;
subM_.push_back(numFilters / conf.groups());
subN_.push_back(conf.output_x() * conf.output_x());
channel = isDeconv_ ? numFilters_ : conf.channels();
subK_.push_back(channel * conf.filter_size() * conf.filter_size() /
subN_.push_back(conf.output_x() *
(conf.has_output_y() ? conf.output_y() : conf.output_x()));
int channel = isDeconv_ ? numFilters_ : conf.channels();
subK_.push_back(
channel * conf.filter_size() *
(conf.has_filter_size_y() ? conf.filter_size_y() : conf.filter_size()) /
conf.groups());
/* Consistent caffe mode for multiple input */
caffeMode_ = conf.caffe_mode();
......@@ -116,11 +118,11 @@ void ExpandConvBaseLayer::expandOneFrame(MatrixPtr image,
imgSizeH_[inIdx],
imgSizeW_[inIdx],
channel,
filterSizeY_[inIdx],
filterSize_[inIdx],
filterSize_[inIdx],
strideY_[inIdx],
stride_[inIdx],
stride_[inIdx],
padding_[inIdx],
paddingY_[inIdx],
padding_[inIdx],
outputH_[inIdx],
outputW_[inIdx]);
......@@ -208,11 +210,11 @@ void ExpandConvBaseLayer::bpropActs(MatrixPtr out,
imgSizeH_[inpIdx],
imgSizeW_[inpIdx],
channel,
filterSizeY_[inpIdx],
filterSize_[inpIdx],
filterSize_[inpIdx],
stride_[inpIdx],
strideY_[inpIdx],
stride_[inpIdx],
padding_[inpIdx],
paddingY_[inpIdx],
padding_[inpIdx],
outputH_[inpIdx],
outputW_[inpIdx],
......
......@@ -25,10 +25,10 @@ size_t MaxOutLayer::getSize() {
imgSizeH_ = inputLayers_[0]->getOutput().getFrameHeight();
imgSizeW_ = inputLayers_[0]->getOutput().getFrameWidth();
if (imgSizeH_ == 0) {
imgSizeH_ = maxoutConf.img_size_y();
imgSizeH_ = maxoutConf.image_conf().img_size_y();
}
if (imgSizeW_ == 0) {
imgSizeW_ = maxoutConf.img_size_x();
imgSizeW_ = maxoutConf.image_conf().img_size();
}
featLen_ = imgSizeH_ * imgSizeW_;
......@@ -50,7 +50,7 @@ bool MaxOutLayer::init(const LayerMap& layerMap,
const MaxOutConfig& conf = config_.inputs(0).maxout_conf();
groups_ = conf.groups();
channels_ = conf.channels();
channels_ = conf.image_conf().channels();
CHECK_EQ(channels_ % groups_, 0UL);
outputChannels_ = channels_ / groups_;
......
......@@ -48,6 +48,9 @@ bool ResponseNormLayer::init(const LayerMap& layerMap,
outputX_ = conf.output_x();
imgSize_ = conf.img_size();
denoms_ = NULL;
outputY_ = conf.has_output_y() ? conf.output_y() : conf.output_x();
imgSizeY_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size();
return true;
}
......
......@@ -49,7 +49,7 @@ public:
*/
class ResponseNormLayer : public NormLayer {
protected:
size_t channels_, size_, outputX_, imgSize_;
size_t channels_, size_, outputX_, imgSize_, outputY_, imgSizeY_;
float scale_, pow_;
MatrixPtr denoms_;
......
......@@ -23,7 +23,7 @@ size_t CMRProjectionNormLayer::getSize() {
imgSizeH_ = inputLayers_[0]->getOutput().getFrameHeight();
imgSizeW_ = inputLayers_[0]->getOutput().getFrameWidth();
if (imgSizeH_ == 0) {
imgSizeH_ = imgSize_;
imgSizeH_ = imgSizeY_;
}
if (imgSizeW_ == 0) {
imgSizeW_ = imgSize_;
......
......@@ -56,14 +56,14 @@ ProjectionConfig SpatialPyramidPoolLayer::getConfig(size_t imgSizeW,
size_t SpatialPyramidPoolLayer::getSize() {
CHECK_EQ(inputLayers_.size(), 1UL);
size_t layerSize = 0;
const SppConfig& sppConf = config_.inputs(0).spp_conf();
const ImageConfig& conf = config_.inputs(0).spp_conf().image_conf();
imgSizeH_ = inputLayers_[0]->getOutput().getFrameHeight();
imgSizeW_ = inputLayers_[0]->getOutput().getFrameWidth();
if (imgSizeH_ == 0) {
imgSizeH_ = sppConf.has_img_size_y() ? sppConf.img_size_y() : imgSizeW_;
imgSizeH_ = conf.has_img_size_y() ? conf.img_size_y() : conf.img_size();
}
if (imgSizeW_ == 0) {
imgSizeW_ = sppConf.img_size();
imgSizeW_ = conf.img_size();
}
size_t outputH = 1;
......@@ -82,9 +82,10 @@ bool SpatialPyramidPoolLayer::init(const LayerMap& layerMap,
pyramidHeight_ = sppConf.pyramid_height();
poolType_ = sppConf.pool_type();
channels_ = sppConf.channels();
imgSizeW_ = sppConf.img_size();
imgSizeH_ = sppConf.has_img_size_y() ? sppConf.img_size_y() : imgSizeW_;
const ImageConfig& imageConf = sppConf.image_conf();
channels_ = imageConf.channels();
imgSizeW_ = imageConf.img_size();
imgSizeH_ = imageConf.has_img_size_y() ? imageConf.img_size_y() : imgSizeW_;
poolProjections_.reserve(pyramidHeight_);
projCol_.reserve(pyramidHeight_);
projOutput_.resize(pyramidHeight_);
......
......@@ -28,7 +28,6 @@ maxpool = img_pool_layer(input=conv,
stride_y=2,
padding=1,
padding_y=2,
img_width=16,
pool_type=MaxPooling(),
)
avgpool = img_pool_layer(input=conv,
......@@ -39,7 +38,6 @@ avgpool = img_pool_layer(input=conv,
stride_y=2,
padding=1,
padding_y=2,
img_width=16,
pool_type=AvgPooling(),
)
......
......@@ -202,11 +202,10 @@ void testProjectionConv(size_t groups) {
conf.set_input_size(IMAGE_SIZE * IMAGE_SIZE * CHANNELS);
conf.set_output_size(output_x * output_y * NUM_FILTERS);
testProjectionGrad(
conf,
testProjectionGrad(conf,
INPUT_DATA,
/* parameterSize */ NUM_FILTERS * CHANNELS * FILTER_SIZE * FILTER_SIZE_Y
/ groups,
/* parameterSize */ NUM_FILTERS * CHANNELS * FILTER_SIZE *
FILTER_SIZE_Y / groups,
/* batchSize */ 100,
true,
false,
......@@ -229,9 +228,10 @@ TEST(Layer, BilinearInterpLayer) {
LayerInputConfig* input = config.layerConfig.add_inputs();
BilinearInterpConfig* bilinear = input->mutable_bilinear_interp_conf();
bilinear->set_img_size_x(32);
bilinear->set_img_size_y(32);
bilinear->set_num_channels(4);
ImageConfig* image = bilinear->mutable_image_conf();
image->set_img_size(32);
image->set_img_size_y(32);
image->set_channels(4);
for (auto useGpu : {false, true}) {
for (auto outSize : {32, 64}) {
......@@ -354,7 +354,7 @@ void testConvLayer(const string& type, bool trans, bool useGpu) {
config.layerConfig.set_partial_sum(1);
config.layerConfig.set_shared_biases(true);
config.inputDefs.push_back({INPUT_DATA, "layer_0", 768, 288});
config.inputDefs.push_back({INPUT_DATA, "layer_0", 384, 288});
LayerInputConfig* input = config.layerConfig.add_inputs();
ConvConfig* conv = input->mutable_conv_conf();
conv->set_filter_size(2);
......@@ -367,12 +367,18 @@ void testConvLayer(const string& type, bool trans, bool useGpu) {
conv->set_groups(1);
conv->set_filter_channels(conv->channels() / conv->groups());
conv->set_img_size(16);
conv->set_img_size_y(8);
conv->set_output_x(outputSize(conv->img_size(),
conv->filter_size(),
conv->padding(),
conv->stride(),
/* caffeMode */ true));
config.layerConfig.set_size(conv->output_x() * conv->output_x() *
conv->set_output_y(outputSize(conv->img_size_y(),
conv->filter_size_y(),
conv->padding_y(),
conv->stride_y(),
/* caffeMode */ true));
config.layerConfig.set_size(conv->output_x() * conv->output_y() *
config.layerConfig.num_filters());
testLayerGrad(config, "conv", 100, trans, useGpu);
......@@ -472,10 +478,11 @@ TEST(Layer, maxoutLayer) {
config.inputDefs.push_back({INPUT_DATA, "layer_0", 4096, 0});
LayerInputConfig* input = config.layerConfig.add_inputs();
MaxOutConfig* maxout = input->mutable_maxout_conf();
ImageConfig* image = maxout->mutable_image_conf();
maxout->set_img_size_x(32);
maxout->set_img_size_y(32);
maxout->set_channels(4);
image->set_img_size(32);
image->set_img_size_y(32);
image->set_channels(4);
maxout->set_groups(2);
for (auto useGpu : {false, true}) {
......@@ -987,7 +994,7 @@ void testNormLayer(const string& normType, bool trans, bool useGpu) {
config.layerConfig.set_type("norm");
config.layerConfig.set_active_type("relu");
config.inputDefs.push_back({INPUT_DATA, "layer_0", 3136, 0});
config.inputDefs.push_back({INPUT_DATA, "layer_0", 1568, 0});
LayerInputConfig* input = config.layerConfig.add_inputs();
NormConfig* norm = input->mutable_norm_conf();
norm->set_norm_type(normType);
......@@ -997,7 +1004,9 @@ void testNormLayer(const string& normType, bool trans, bool useGpu) {
norm->set_pow(0.75);
norm->set_blocked(0);
norm->set_img_size(14);
norm->set_img_size_y(7);
norm->set_output_x(norm->img_size());
norm->set_output_y(norm->img_size_y());
if (norm->norm_type() == "cmrnorm" ||
norm->norm_type() == "cmrnorm-projection") {
norm->set_scale(norm->scale() / norm->size());
......@@ -1005,7 +1014,7 @@ void testNormLayer(const string& normType, bool trans, bool useGpu) {
norm->set_scale(norm->scale() / (norm->size() * norm->size()));
}
config.layerConfig.set_size(norm->output_x() * norm->output_x() *
config.layerConfig.set_size(norm->output_x() * norm->output_y() *
norm->channels());
config.biasSize = 0;
......@@ -1106,11 +1115,12 @@ void testSppLayer(const string& poolType,
SppConfig* sppConfig = input->mutable_spp_conf();
sppConfig->set_pool_type(poolType);
sppConfig->set_pyramid_height(pyramidHeight);
sppConfig->set_channels(16);
sppConfig->set_img_size(10);
sppConfig->set_img_size_y(20);
ImageConfig* imageConfig = sppConfig->mutable_image_conf();
imageConfig->set_channels(16);
imageConfig->set_img_size(10);
imageConfig->set_img_size_y(20);
int outputSize = (std::pow(4, sppConfig->pyramid_height()) - 1) / (4 - 1);
config.layerConfig.set_size(outputSize * sppConfig->channels());
config.layerConfig.set_size(outputSize * imageConfig->channels());
testLayerGrad(config, "spp", 100, trans, useGpu);
}
......@@ -1420,13 +1430,15 @@ void testBatchNormLayer(const string& type, bool trans, bool useGpu) {
TestConfig config;
const int CHANNELS = 10;
const int IMG_SIZE = 16;
const int IMG_SIZE_Y = 8;
size_t size = CHANNELS * IMG_SIZE * IMG_SIZE_Y;
config.layerConfig.set_type(type);
config.layerConfig.set_size(CHANNELS * IMG_SIZE * IMG_SIZE);
config.layerConfig.set_size(size);
config.layerConfig.set_active_type("sigmoid");
config.biasSize = CHANNELS;
config.inputDefs.push_back({INPUT_DATA,
"layer_0",
/* dim= */ IMG_SIZE * IMG_SIZE * CHANNELS,
/* dim= */ size,
/* paraSize= */ CHANNELS});
config.inputDefs.push_back({INPUT_DATA, "layer_1_running_mean", 1, CHANNELS});
......@@ -1441,6 +1453,7 @@ void testBatchNormLayer(const string& type, bool trans, bool useGpu) {
ImageConfig* img_conf = input->mutable_image_conf();
img_conf->set_channels(CHANNELS);
img_conf->set_img_size(IMG_SIZE);
img_conf->set_img_size_y(IMG_SIZE_Y);
testLayerGrad(config,
"batch_norm",
......@@ -1467,6 +1480,7 @@ TEST(Operator, conv) {
const int FILTER_SIZE_Y = 3;
const int CHANNELS = 3;
const int IMAGE_SIZE = 16;
const int IMAGE_SIZE_Y = 8;
OperatorConfig& operatorConf = *config.layerConfig.add_operator_confs();
operatorConf.set_type("conv");
ConvConfig* conv = operatorConf.mutable_conv_conf();
......@@ -1481,19 +1495,22 @@ TEST(Operator, conv) {
conv->set_groups(1);
conv->set_filter_channels(conv->channels() / conv->groups());
conv->set_img_size(IMAGE_SIZE);
int output_x = outputSize(conv->img_size(),
conv->set_img_size_y(IMAGE_SIZE_Y);
conv->set_output_x(outputSize(conv->img_size(),
conv->filter_size(),
conv->padding(),
conv->stride(),
/* caffeMode */ true);
conv->set_output_x(output_x);
config.layerConfig.set_size(output_x * output_x *
config.layerConfig.num_filters());
config.layerConfig.set_size(conv->output_x() * conv->output_x() *
/* caffeMode */ true));
conv->set_output_y(outputSize(conv->img_size_y(),
conv->filter_size_y(),
conv->padding_y(),
conv->stride_y(),
/* caffeMode */ true));
config.layerConfig.set_size(conv->output_x() * conv->output_y() *
NUM_FILTERS);
config.inputDefs.push_back(
{INPUT_DATA, "layer_0", IMAGE_SIZE * IMAGE_SIZE * CHANNELS, 0});
{INPUT_DATA, "layer_0", IMAGE_SIZE * IMAGE_SIZE_Y * CHANNELS, 0});
config.inputDefs.push_back(
{INPUT_DATA,
"layer_1",
......
......@@ -1584,11 +1584,6 @@ void BaseMatrixT<real>::minRows(BaseMatrixT& b) {
applyRow(aggregate::min(), b);
}
template<>
void BaseMatrixT<real>::sumCols(BaseMatrixT& b) {
applyCol(aggregate::sum(), b);
}
template<>
void BaseMatrixT<real>::maxCols(BaseMatrixT& b) {
applyCol(aggregate::max(), b);
......
......@@ -1018,8 +1018,6 @@ public:
/// calculate the minimum value of each row of the matrix b.
void minRows(BaseMatrixT& b);
/// calculate the sum of each column of the matrix b.
void sumCols(BaseMatrixT& b);
/// calculate the maximum value of each column of the matrix b.
void maxCols(BaseMatrixT& b);
/// calculate the minimum value of each column of the matrix b.
......
......@@ -2,7 +2,7 @@
add_simple_unittest(test_ExecViaCpu)
add_simple_unittest(test_SIMDFunctions)
add_simple_unittest(test_matrix)
add_simple_unittest(test_SparseMatrix)
# TODO(yuyang18): Refactor TestUtil.cpp. Remove this cross module reference.
add_unittest(test_matrixCompare
......@@ -15,3 +15,5 @@ add_simple_unittest(test_CpuGpuVector)
add_simple_unittest(test_Allocator)
add_simple_unittest(test_FPException)
add_simple_unittest(test_GpuProfiler)
add_simple_unittest(test_BaseMatrix)
add_simple_unittest(test_Matrix)
/* Copyright (c) 2016 Baidu, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
/**
* This file provides a TensorCheck template function, which can be used to
* compare CpuMatrix and GpuMatrix, CpuVector and GpuVector, and so on.
*/
#include <cmath>
#include "paddle/math/Matrix.h"
namespace autotest {
using paddle::Matrix;
using paddle::CpuMatrix;
using paddle::GpuMatrix;
using paddle::VectorT;
using paddle::CpuVectorT;
using paddle::GpuVectorT;
class AssertEqual {
public:
AssertEqual(real err = 0) : err_(err) {}
inline bool operator()(real a, real b) {
if (err_ == 0) {
if (a != b) {
return false;
}
} else {
if (std::fabs(a - b) > err_) {
if ((std::fabs(a - b) / std::fabs(a)) > (err_ / 10.0f)) {
return false;
}
}
}
return true;
}
private:
real err_;
};
template <typename Tensor>
class CopyToCpu;
template <>
class CopyToCpu<CpuMatrix> {
public:
explicit CopyToCpu(const CpuMatrix& arg) : arg_(arg) {}
const CpuMatrix& copiedArg() const { return arg_; }
private:
const CpuMatrix& arg_;
};
template <>
class CopyToCpu<GpuMatrix> {
public:
explicit CopyToCpu(const GpuMatrix& arg)
: arg_(arg.getHeight(), arg.getWidth()) {
arg_.copyFrom(arg);
}
CpuMatrix& copiedArg() { return arg_; }
private:
CpuMatrix arg_;
};
template <>
class CopyToCpu<Matrix> {
public:
explicit CopyToCpu(const Matrix& arg)
: arg_(arg.getHeight(), arg.getWidth()) {
arg_.copyFrom(arg);
}
CpuMatrix& copiedArg() { return arg_; }
private:
CpuMatrix arg_;
};
template <typename T>
class CopyToCpu<CpuVectorT<T>> {
public:
explicit CopyToCpu(const CpuVectorT<T>& arg) : arg_(arg) {}
const CpuVectorT<T>& copiedArg() const { return arg_; }
private:
const CpuVectorT<T>& arg_;
};
template <typename T>
class CopyToCpu<GpuVectorT<T>> {
public:
explicit CopyToCpu(const GpuVectorT<T>& arg) : arg_(arg.getSize()) {
arg_.copyFrom(arg);
}
CpuVectorT<T>& copiedArg() { return arg_; }
private:
CpuVectorT<T> arg_;
};
template <typename T>
class CopyToCpu<VectorT<T>> {
public:
explicit CopyToCpu(const VectorT<T>& arg) : arg_(arg.getSize()) {
arg_.copyFrom(arg);
}
CpuVectorT<T>& copiedArg() { return arg_; }
private:
CpuVectorT<T> arg_;
};
template <typename AssertEq>
void TensorCheck(AssertEq compare,
const CpuMatrix& matrix1,
const CpuMatrix& matrix2) {
CHECK(matrix1.getHeight() == matrix2.getHeight());
CHECK(matrix1.getWidth() == matrix2.getWidth());
int height = matrix1.getHeight();
int width = matrix1.getWidth();
const real* data1 = matrix1.getData();
const real* data2 = matrix2.getData();
int count = 0;
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
real a = data1[i * width + j];
real b = data2[i * width + j];
if (!compare(a, b)) {
count++;
}
}
}
EXPECT_EQ(count, 0) << "There are " << count << " different elements.";
}
template <typename AssertEq, class T>
void TensorCheck(AssertEq compare,
const CpuVectorT<T>& vector1,
const CpuVectorT<T>& vector2) {
CHECK(vector1.getSize() == vector2.getSize());
const T* data1 = vector1.getData();
const T* data2 = vector2.getData();
size_t size = vector1.getSize();
int count = 0;
for (size_t i = 0; i < size; i++) {
real a = data1[i];
real b = data2[i];
if (!compare(a, b)) {
count++;
}
}
EXPECT_EQ(count, 0) << "There are " << count << " different elements.";
}
template <typename AssertEq, typename Tensor1, typename Tensor2>
void TensorCheck(AssertEq compare,
const Tensor1& tensor1,
const Tensor2& tensor2) {
TensorCheck(compare,
CopyToCpu<Tensor1>(tensor1).copiedArg(),
CopyToCpu<Tensor2>(tensor2).copiedArg());
}
template <typename AssertEq>
void TensorCheck(AssertEq compare, real args1, real args2) {
EXPECT_EQ(compare(args1, args2), true) << "[Test error] args1 = " << args1
<< ", args2 = " << args2;
}
template <typename AssertEq>
void TensorCheck(AssertEq compare, size_t args1, size_t args2) {
EXPECT_EQ(args1, args2) << "[Test error] args1 = " << args1
<< ", args2 = " << args2;
}
template <typename Tensor1, typename Tensor2>
void TensorCheckEqual(const Tensor1& tensor1, const Tensor2& tensor2) {
AssertEqual compare(0);
TensorCheck(compare,
CopyToCpu<Tensor1>(tensor1).copiedArg(),
CopyToCpu<Tensor2>(tensor2).copiedArg());
}
template <typename Tensor1, typename Tensor2>
void TensorCheckErr(const Tensor1& tensor1, const Tensor2& tensor2) {
#ifndef PADDLE_TYPE_DOUBLE
AssertEqual compare(1e-3);
#else
AssertEqual compare(1e-10);
#endif
TensorCheck(compare,
CopyToCpu<Tensor1>(tensor1).copiedArg(),
CopyToCpu<Tensor2>(tensor2).copiedArg());
}
} // namespace autotest
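The tolerance rule in `AssertEqual` is easy to misread: a pair of values is rejected only when it fails both the absolute test (`|a - b| > err`) and the relative test (`|a - b| / |a| > err / 10`). The sketch below restates that rule as a free function so it can be tried in isolation. `nearlyEqual` is a hypothetical name, and `real` is assumed to be `float` (the non-`PADDLE_TYPE_DOUBLE` build).

```cpp
#include <cassert>
#include <cmath>

typedef float real;  // assumption: single-precision build

// Hypothetical free-function restatement of AssertEqual::operator().
// err == 0 demands exact equality; otherwise the pair is rejected only
// when both the absolute and the relative difference exceed their bounds.
inline bool nearlyEqual(real a, real b, real err) {
  if (err == 0) {
    return a == b;
  }
  real diff = std::fabs(a - b);
  if (diff > err && diff / std::fabs(a) > err / 10.0f) {
    return false;
  }
  return true;
}
```

One consequence is that large values can differ by more than `err` in absolute terms and still pass, as long as the relative difference stays within `err / 10`.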
/* Copyright (c) 2016 Baidu, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
/**
* This file provides an AutoCompare class to simplify comparing the CPU and
* GPU versions of member functions.
*
* Usage takes two steps:
* 1. Construct an AutoCompare object.
* When constructing an AutoCompare object, you can set the err argument
* to specify the maximum allowed error between the CPU and GPU results.
*
* 2. Call the template function cmpWithArg or cmpWithoutArg.
* A. [cmpWithArg] requires the caller to construct the CPU arguments.
*
* AutoCompare test;
* Init Argument arg1,arg2...
* test.cmpWithArg(function, arg1, arg2....)
*
* B. [cmpWithoutArg] does not require the caller to construct arguments.
* Use it when all matrix arguments of the function have the same size,
* as with the element-wise and aggregate functions
* defined in BaseMatrix.cpp.
*
* AutoCompare test;
* test.cmpWithoutArg<I...>(function, height, width)
*/
#include <gtest/gtest.h>
#include "paddle/math/Matrix.h"
#include "paddle/math/SparseMatrix.h"
#include "TensorCheck.h"
namespace autotest {
using paddle::BaseMatrix;
using paddle::CpuMatrix;
using paddle::GpuMatrix;
using paddle::CpuIVector;
using paddle::GpuIVector;
using paddle::CpuSparseMatrix;
using paddle::GpuSparseMatrix;
template <typename T1, typename T2>
class ReplaceType {
public:
typedef T1 type;
};
template <>
class ReplaceType<BaseMatrix, CpuMatrix> {
public:
typedef CpuMatrix type;
};
template <>
class ReplaceType<BaseMatrix, GpuMatrix> {
public:
typedef GpuMatrix type;
};
template <>
class ReplaceType<Matrix, CpuMatrix> {
public:
typedef CpuMatrix type;
};
template <>
class ReplaceType<Matrix, GpuMatrix> {
public:
typedef GpuMatrix type;
};
// construct an argument
template <typename T>
T construct(int height, int width);
template <>
float construct(int height, int width) {
return 0.5;
}
template <>
double construct(int height, int width) {
return 0.5;
}
template <>
size_t construct(int height, int width) {
size_t offset = std::rand() % (height < width ? height : width);
return offset;
}
template <>
CpuMatrix construct(int height, int width) {
CpuMatrix a(height, width);
return a;
}
template <>
GpuMatrix construct(int height, int width) {
GpuMatrix a(height, width);
return a;
}
// initialize an argument
template <typename T>
void init(T& v) {
return;
}
template <>
void init(CpuMatrix& v) {
v.randomizeUniform();
}
template <>
void init(GpuMatrix& v) {
v.randomizeUniform();
}
// init a tuple which contains a set of arguments.
template <std::size_t I = 0, typename... Args>
inline typename std::enable_if<I == sizeof...(Args), void>::type initTuple(
std::tuple<Args...>& t) {}
template <std::size_t I = 0, typename... Args>
inline typename std::enable_if<(I < sizeof...(Args)), void>::type initTuple(
std::tuple<Args...>& t) {
init(std::get<I>(t));
initTuple<I + 1>(t);
}
// copy an argument: copy src to dest
template <typename T1, typename T2>
void copy(T1& dest, T2& src) {
dest = src;
}
template <>
void copy(GpuMatrix& dest, CpuMatrix& src) {
dest.copyFrom(src);
}
// copy a tuple, copy src to dest
template <std::size_t I = 0, typename... Args1, typename... Args2>
inline typename std::enable_if<I == sizeof...(Args1), void>::type copyTuple(
std::tuple<Args1...>& dest, std::tuple<Args2...>& src) {}
template <std::size_t I = 0, typename... Args1, typename... Args2>
inline typename std::enable_if<(I < sizeof...(Args1)), void>::type copyTuple(
std::tuple<Args1...>& dest, std::tuple<Args2...>& src) {
copy(std::get<I>(dest), std::get<I>(src));
copyTuple<I + 1>(dest, src);
}
// call member function
template <typename C,
typename FC,
typename R,
typename... FArgs,
typename... Args>
R call(C& obj, R (FC::*f)(FArgs...), Args&&... args) {
return (obj.*f)(args...);
}
template <typename T>
class ReturnType {
public:
typedef T type;
};
template <>
class ReturnType<CpuMatrix> {
public:
typedef GpuMatrix type;
};
template <>
class ReturnType<CpuIVector> {
public:
typedef GpuIVector type;
};
template <>
class ReturnType<CpuSparseMatrix> {
public:
typedef GpuSparseMatrix type;
};
template <typename T>
typename ReturnType<T>::type autoArgs(T& v) {
return v;
}
template <>
GpuMatrix autoArgs(CpuMatrix& v) {
GpuMatrix a(v.getHeight(), v.getWidth());
a.copyFrom(v);
return a;
}
template <>
GpuIVector autoArgs(CpuIVector& v) {
GpuIVector a(v.getSize());
a.copyFrom(v);
return a;
}
template <>
GpuSparseMatrix autoArgs(CpuSparseMatrix& v) {
GpuSparseMatrix a(v.getHeight(),
v.getWidth(),
v.getElementCnt(),
v.getValueType(),
v.getFormat());
a.copyFrom(v, HPPL_STREAM_DEFAULT);
hl_stream_synchronize(HPPL_STREAM_DEFAULT);
return a;
}
class AutoCompare {
public:
/**
* err is the allowed calculation error.
* The smaller the value of err,
* the stricter the comparison is between CPU and GPU calculations.
*/
AutoCompare(size_t height, size_t width, real err = 1e-3)
: cpu(height, width), gpu(height, width), compare(err) {
init(cpu);
copy(gpu, cpu);
}
template <typename C, typename R, typename... FArgs, typename... Args>
void cmpWithArg(R (C::*f)(FArgs...), Args&&... args) {
static_assert(sizeof...(FArgs) == sizeof...(Args),
"sizes of the parameter packs are not equal");
call(cpu, f, args...);
call(gpu, f, autoArgs(args)...);
TensorCheck(compare, cpu, gpu);
}
template <std::size_t... I, typename C, typename R, typename... Args>
void cmpWithoutArg(R (C::*f)(Args...), size_t height, size_t width) {
static_assert(sizeof...(I) == sizeof...(Args),
"sizes of the parameter packs are not equal");
(void)height;
(void)width;
auto tuple1 = std::make_tuple(
construct<typename ReplaceType<
typename std::decay<
typename std::tuple_element<I,
std::tuple<Args...>>::type>::type,
CpuMatrix>::type>(height, width)...);
auto tuple2 = std::make_tuple(
construct<typename ReplaceType<
typename std::decay<
typename std::tuple_element<I,
std::tuple<Args...>>::type>::type,
GpuMatrix>::type>(height, width)...);
initTuple(tuple1);
copyTuple(tuple2, tuple1);
call(cpu, f, std::get<I>(tuple1)...);
call(gpu, f, std::get<I>(tuple2)...);
TensorCheck(compare, cpu, gpu);
}
protected:
CpuMatrix cpu;
GpuMatrix gpu;
AssertEqual compare;
};
} // namespace autotest
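The pattern behind `AutoCompare::cmpWithArg` can be sketched without Paddle: invoke the same member-function pointer on two implementations that start from identical data, then compare the results element by element. The sketch below is a minimal, self-contained illustration under stated assumptions. `Buffer`, `RefBuffer`, `FastBuffer`, and `compareBoth` are hypothetical names that do not exist in Paddle, and virtual dispatch stands in for the CPU/GPU split.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for BaseMatrix: a flat buffer plus one operation.
class Buffer {
public:
  explicit Buffer(std::size_t n) : data(n, 0.0f) {}
  virtual ~Buffer() = default;
  virtual void addScalar(float s) = 0;
  std::vector<float> data;
};

// Reference implementation (plays the role of CpuMatrix).
class RefBuffer : public Buffer {
public:
  using Buffer::Buffer;
  void addScalar(float s) override {
    for (float& x : data) x += s;
  }
};

// Second implementation (plays the role of GpuMatrix in a real test).
class FastBuffer : public Buffer {
public:
  using Buffer::Buffer;
  void addScalar(float s) override {
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = s + data[i];
  }
};

// Invoke a member function through a pointer, like `call` in TestUtils.h.
template <typename C,
          typename FC,
          typename R,
          typename... FArgs,
          typename... Args>
R call(C& obj, R (FC::*f)(FArgs...), Args&&... args) {
  return (obj.*f)(args...);
}

// Run f on both objects, then report whether every element agrees within err.
template <typename R, typename... FArgs, typename... Args>
bool compareBoth(RefBuffer& ref,
                 FastBuffer& fast,
                 R (Buffer::*f)(FArgs...),
                 float err,
                 Args&&... args) {
  static_assert(sizeof...(FArgs) == sizeof...(Args),
                "one argument per parameter is required");
  call(ref, f, args...);
  call(fast, f, args...);
  for (std::size_t i = 0; i < ref.data.size(); ++i) {
    if (std::fabs(ref.data[i] - fast.data[i]) > err) return false;
  }
  return true;
}

}  // namespace sketch
```

A caller would construct both buffers with the same size and write `sketch::compareBoth(ref, fast, &sketch::Buffer::addScalar, 1e-5f, 2.5f)`. Unlike this sketch, the real `AutoCompare` also converts each CPU argument to its GPU counterpart through `autoArgs` before the second call.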
/* Copyright (c) 2016 Baidu, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef PADDLE_ONLY_CPU
/**
* This test file uses autotest::AutoCompare and cmpWithoutArg to compare the
* CPU and GPU implementations of the member functions in
* BaseMatrix.cpp and Matrix.cpp.
*/
#include <gtest/gtest.h>
#include "paddle/math/BaseMatrix.h"
#include "TestUtils.h"
using paddle::BaseMatrix;
using paddle::Matrix;
using autotest::AutoCompare;
// Test all void (BaseMatrix::*)() functions
TEST(BaseMatrix, void) {
for (auto height : {1, 3, 11, 73, 128, 200, 330}) {
for (auto width : {1, 3, 32, 100, 512, 1000, 3210}) {
auto compare = [height, width](void (BaseMatrix::*f)()) {
AutoCompare test(height, width, 1e-5);
test.cmpWithoutArg(f, height, width);
};
compare(&BaseMatrix::neg);
compare(&BaseMatrix::exp);
compare(&BaseMatrix::log);
compare(&BaseMatrix::sqrt);
compare(&BaseMatrix::square);
compare(&BaseMatrix::reciprocal);
compare(&BaseMatrix::abs);
compare(&BaseMatrix::sign);
compare(&BaseMatrix::zero);
compare(&BaseMatrix::one);
}
}
}
// Test all void (BaseMatrix::*)(real) functions
TEST(BaseMatrix, real) {
for (auto height : {1, 3, 11, 73, 128, 200, 330}) {
for (auto width : {1, 3, 32, 100, 512, 1000, 3210}) {
auto compare = [height, width](void (BaseMatrix::*f)(real)) {
AutoCompare test(height, width, 1e-5);
test.cmpWithoutArg<0>(f, height, width);
};
compare(&BaseMatrix::pow);
compare(&BaseMatrix::subScalar);
compare(&BaseMatrix::mulScalar);
compare(&BaseMatrix::divScalar);
compare(&BaseMatrix::assign);
compare(&BaseMatrix::add);
compare(&BaseMatrix::biggerThanScalar);
compare(&BaseMatrix::downClip);
}
}
}
// Test all void (BaseMatrix::*)(BaseMatrix&) functions
TEST(BaseMatrix, BaseMatrix) {
for (auto height : {1, 3, 11, 73, 128, 200, 330}) {
for (auto width : {1, 3, 32, 100, 512, 1000, 3210}) {
auto compare = [height, width](void (BaseMatrix::*f)(BaseMatrix&)) {
AutoCompare test(height, width, 1e-5);
test.cmpWithoutArg<0>(f, height, width);
};
compare(&BaseMatrix::assign);
compare(&BaseMatrix::add);
compare(&BaseMatrix::relu);
compare(&BaseMatrix::reluDerivative);
compare(&BaseMatrix::softrelu);
compare(&BaseMatrix::softreluDerivative);
compare(&BaseMatrix::brelu);
compare(&BaseMatrix::breluDerivative);
compare(&BaseMatrix::square);
compare(&BaseMatrix::squareDerivative);
compare(&BaseMatrix::tanh);
compare(&BaseMatrix::tanhDerivative);
compare(&BaseMatrix::reciprocal);
compare(&BaseMatrix::reciprocalDerivative);
compare(&BaseMatrix::abs);
compare(&BaseMatrix::absDerivative);
compare(&BaseMatrix::sigmoid);
compare(&BaseMatrix::sigmoidDerivative);
compare(&BaseMatrix::expDerivative);
compare(&BaseMatrix::sign);
compare(&BaseMatrix::exp);
compare(&BaseMatrix::log);
compare(&BaseMatrix::sqrt);
compare(&BaseMatrix::dotMul);
compare(&BaseMatrix::dotMulSquare);
compare(&BaseMatrix::dotSquareMul);
compare(&BaseMatrix::addColVector);
compare(&BaseMatrix::addRowVector);
compare(&BaseMatrix::mulRowVector);
compare(&BaseMatrix::divRowVector);
compare(&BaseMatrix::addP2P);
compare(&BaseMatrix::invSqrt);
}
}
}
// Test all void (BaseMatrix::*)(real, real) functions
TEST(BaseMatrix, real_real) {
for (auto height : {1, 3, 11, 73, 128, 200, 330}) {
for (auto width : {1, 3, 32, 100, 512, 1000, 3210}) {
auto compare = [height, width](void (BaseMatrix::*f)(real, real)) {
AutoCompare test(height, width, 1e-5);
test.cmpWithoutArg<0, 1>(f, height, width);
};
compare(&BaseMatrix::add);
compare(&BaseMatrix::clip);
}
}
}
// Test all void (BaseMatrix::*)(BaseMatrix&, real) functions
TEST(BaseMatrix, BaseMatrix_real) {
for (auto height : {1, 3, 11, 73, 128, 200, 330}) {
for (auto width : {1, 3, 32, 100, 512, 1000, 3210}) {
auto compare = [height, width](void (BaseMatrix::*f)(BaseMatrix&, real)) {
AutoCompare test(height, width, 1e-5);
test.cmpWithoutArg<0, 1>(f, height, width);
};
compare(&BaseMatrix::addBias);
compare(&BaseMatrix::add);
compare(&BaseMatrix::sub);
compare(&BaseMatrix::pow);
compare(&BaseMatrix::addScalar);
compare(&BaseMatrix::subScalar);
compare(&BaseMatrix::mulScalar);
compare(&BaseMatrix::divScalar);
compare(&BaseMatrix::scalarDiv);
compare(&BaseMatrix::addSquare);
compare(&BaseMatrix::isEqualTo);
}
}
}
// Test all void (BaseMatrix::*)(BaseMatrix&, BaseMatrix&) functions
TEST(BaseMatrix, BaseMatrix_BaseMatrix) {
for (auto height : {1, 3, 11, 73, 128, 200, 330}) {
for (auto width : {1, 3, 32, 100, 512, 1000, 3210}) {
auto compare = [height,
width](void (BaseMatrix::*f)(BaseMatrix&, BaseMatrix&)) {
AutoCompare test(height, width, 1e-5);
test.cmpWithoutArg<0, 1>(f, height, width);
};
compare(&BaseMatrix::softCrossEntropy);
compare(&BaseMatrix::softCrossEntropyBp);
compare(&BaseMatrix::binaryLabelCrossEntropy);
compare(&BaseMatrix::binaryLabelCrossEntropyBp);
compare(&BaseMatrix::sub);
compare(&BaseMatrix::add2);
compare(&BaseMatrix::dotMul);
compare(&BaseMatrix::dotDiv);
compare(&BaseMatrix::logisticRegressionLoss);
compare(&BaseMatrix::logisticRegressionLossBp);
compare(&BaseMatrix::biggerThan);
compare(&BaseMatrix::max);
compare(&BaseMatrix::dotMulSquare);
compare(&BaseMatrix::dotSquareSquare);
}
}
}
void TestEelementWise(size_t height, size_t width) {
AutoCompare rowScale(height, width);
rowScale.cmpWithoutArg<0, 1, 2>(&BaseMatrix::rowScale, height, width);
AutoCompare rowDotMul(height, width);
rowDotMul.cmpWithoutArg<0, 1, 2>(&BaseMatrix::rowDotMul, height, width);
AutoCompare binaryClassificationError(height, width);
binaryClassificationError.cmpWithoutArg<0, 1, 2, 3>(
&BaseMatrix::binaryClassificationError, height, width);
AutoCompare sumOfSquaresBp(height, width);
sumOfSquaresBp.cmpWithoutArg<0, 1>(&Matrix::sumOfSquaresBp, height, width);
}
void TestAggregateToRow(size_t height, size_t width) {
AutoCompare maxCols(1, width);
maxCols.cmpWithoutArg<0>(&BaseMatrix::maxCols, height, width);
AutoCompare minCols(1, width);
minCols.cmpWithoutArg<0>(&BaseMatrix::minCols, height, width);
AutoCompare addDotMulVMM(1, width);
addDotMulVMM.cmpWithoutArg<0, 1>(&BaseMatrix::addDotMulVMM, height, width);
AutoCompare sumCols(1, width);
sumCols.cmpWithoutArg<0, 1, 2>(&BaseMatrix::sumCols, height, width);
AutoCompare collectBias(1, width);
collectBias.cmpWithoutArg<0, 1>(
static_cast<void (Matrix::*)(Matrix&, real)>(&Matrix::collectBias),
height,
width);
}
void TestAggregateToCol(size_t height, size_t width) {
AutoCompare maxRows(height, 1);
maxRows.cmpWithoutArg<0>(&BaseMatrix::maxRows, height, width);
AutoCompare minRows(height, 1);
minRows.cmpWithoutArg<0>(&BaseMatrix::minRows, height, width);
AutoCompare sumRows(height, 1);
sumRows.cmpWithoutArg<0, 1, 2>(&BaseMatrix::sumRows, height, width);
AutoCompare sumOfSquares(height, 1);
sumOfSquares.cmpWithoutArg<0, 1>(&Matrix::sumOfSquares, height, width);
}
TEST(BaseMatrix, Other) {
for (auto height : {1, 3, 11, 73, 128, 200, 330}) {
for (auto width : {1, 3, 32, 100, 512, 1000, 3210}) {
TestEelementWise(height, width);
TestAggregateToRow(height, width);
TestAggregateToCol(height, width);
}
}
}
int main(int argc, char** argv) {
testing::InitGoogleTest(&argc, argv);
paddle::initMain(argc, argv);
return RUN_ALL_TESTS();
}
#endif
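The `cmpWithoutArg<I...>` calls in the tests above rely on the index pack `I...` to pick each parameter type out of the member function's signature and build a matching argument. The following self-contained sketch shows that mechanism in isolation. `Target`, `addScaled`, and `callWithGeneratedArgs` are hypothetical names chosen for illustration, and the `construct` specializations here are simplified stand-ins for the ones in TestUtils.h.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <vector>

namespace sketch {

// Per-type factories, mirroring construct<T>(height, width) in TestUtils.h.
template <typename T>
T construct(int height, int width);

template <>
float construct<float>(int, int) {
  return 0.5f;
}

template <>
std::vector<float> construct<std::vector<float>>(int height, int width) {
  return std::vector<float>(static_cast<std::size_t>(height) * width, 1.0f);
}

struct Target {
  float total = 0.0f;
  // The member function whose arguments are generated automatically.
  void addScaled(std::vector<float>& v, float scale) {
    for (float x : v) total += x * scale;
  }
};

// For each index in I, look up the I-th parameter type of f, strip
// references and cv-qualifiers, construct a value of that type, and call f.
template <std::size_t... I, typename C, typename R, typename... Args>
R callWithGeneratedArgs(C& obj, R (C::*f)(Args...), int height, int width) {
  static_assert(sizeof...(I) == sizeof...(Args),
                "one index per parameter is required");
  (void)height;  // unused when the function takes no parameters
  (void)width;
  auto args = std::make_tuple(
      construct<typename std::decay<
          typename std::tuple_element<I, std::tuple<Args...>>::type>::type>(
          height, width)...);
  return (obj.*f)(std::get<I>(args)...);
}

}  // namespace sketch
```

With a `sketch::Target t`, the call `sketch::callWithGeneratedArgs<0, 1>(t, &sketch::Target::addScaled, 2, 3)` builds a six-element vector and a scale of 0.5 from the signature alone. The explicit `<0, 1>` is needed because the index pack cannot be deduced from the function arguments, which is why the tests spell out `<0>`, `<0, 1>`, and so on.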
/* Copyright (c) 2016 Baidu, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#ifndef PADDLE_ONLY_CPU
/**
* This test file uses autotest::AutoCompare and cmpWithArg to compare the
* CPU and GPU implementations of the member functions in Matrix.cpp.
*/
#include <gtest/gtest.h>
#include "TestUtils.h"
using paddle::BaseMatrix;
using paddle::Matrix;
using paddle::CpuMatrix;
using paddle::CpuIVector;
using paddle::CpuSparseMatrix;
using autotest::AutoCompare;
void testBilinearFwdBwd(int numSamples,
int imgSizeH,
int imgSizeW,
int channels) {
int inWidth = imgSizeH * imgSizeW * channels;
int outWidth = 2 * imgSizeH * 2 * imgSizeW * channels;
real ratioH = 0.5;
real ratioW = 0.5;
AutoCompare forward(numSamples, outWidth);
CpuMatrix arg1(numSamples, inWidth);
arg1.randomizeUniform();
forward.cmpWithArg(&Matrix::bilinearForward,
arg1,
imgSizeH,
imgSizeW,
2 * imgSizeH,
2 * imgSizeW,
channels,
ratioH,
ratioW);
AutoCompare backward(numSamples, inWidth);
CpuMatrix arg2(numSamples, outWidth);
arg2.randomizeUniform();
backward.cmpWithArg(&Matrix::bilinearBackward,
arg2,
2 * imgSizeH,
2 * imgSizeW,
imgSizeH,
imgSizeW,
channels,
ratioH,
ratioW);
}
TEST(Matrix, BilinearFwdBwd) {
for (auto numSamples : {5, 10}) {
for (auto channels : {8, 16}) {
for (auto imgSizeH : {14, 28}) {
for (auto imgSizeW : {16, 30}) {
VLOG(3) << " numSamples=" << numSamples << " channels=" << channels
<< " imgSizeH=" << imgSizeH << " imgSizeW=" << imgSizeW;
testBilinearFwdBwd(numSamples, imgSizeH, imgSizeW, channels);
}
}
}
}
}
void testMatrixAddBias(int height, int width, real scale) {
AutoCompare test(height, width);
CpuMatrix arg1(1, width);
arg1.randomizeUniform();
test.cmpWithArg(
static_cast<void (Matrix::*)(Matrix&, real)>(&Matrix::addBias),
arg1,
scale);
}
void testMatrixAddDotMulMMV(int height, int width) {
AutoCompare test(height, width);
CpuMatrix arg1(height, width);
CpuMatrix arg2(1, width);
arg1.randomizeUniform();
arg2.randomizeUniform();
test.cmpWithArg(&BaseMatrix::addDotMulMMV, arg1, arg2);
}
TEST(Matrix, unary) {
for (auto height : {1, 3, 11, 73, 128, 200, 330}) {
for (auto width : {1, 3, 32, 100, 512, 1000, 3210}) {
VLOG(3) << " height=" << height << " width=" << width;
testMatrixAddBias(height, width, 1.0);
testMatrixAddBias(height, width, 3.5);
testMatrixAddDotMulMMV(height, width);
}
}
}
void testMatrixAddAtOffset(int height, int width1, int width2, int offset) {
AutoCompare test(height, width2);
CpuMatrix arg1(height, width1);
arg1.randomizeUniform();
test.cmpWithArg(&Matrix::addAtOffset, arg1, offset);
}
void testMatrixAssignAtOffset(int height, int width1, int width2, int offset) {
AutoCompare test(height, width2);
CpuMatrix arg1(height, width1);
arg1.randomizeUniform();
test.cmpWithArg(&Matrix::assignAtOffset, arg1, offset);
}
TEST(Matrix, AtOffset) {
for (auto height : {1, 11, 73, 128, 200}) {
for (auto width1 : {1, 32, 100, 512, 1000}) {
for (auto width2 : {1, 32, 100, 512, 1000}) {
int columnOffset = 0;
int offset = std::abs(width1 - width2);
if (offset) {
columnOffset = std::rand() % offset;
}
VLOG(3) << " height=" << height << " width1=" << width1
<< " width2=" << width2 << " columnOffset = " << columnOffset;
testMatrixAddAtOffset(height, width1, width2, columnOffset);
testMatrixAssignAtOffset(height, width1, width2, columnOffset);
}
}
}
}
void testMatrixSelectRows(int numSamples, int tableSize, int inputDim) {
AutoCompare test(numSamples, inputDim);
CpuMatrix arg1(tableSize, inputDim);
CpuIVector arg2(numSamples);
arg1.randomizeUniform();
arg2.rand(tableSize);
test.cmpWithArg(&Matrix::selectRows, arg1, arg2);
}
TEST(Matrix, tableProjection) {
for (auto numSamples : {10, 100, 1000, 10000, 80000}) {
for (auto tableSize : {10, 100}) {
for (auto inputDim : {20, 50}) {
VLOG(3) << " numSamples=" << numSamples << " tableSize=" << tableSize
<< " inputDim=" << inputDim;
testMatrixSelectRows(numSamples, tableSize, inputDim);
}
}
}
}
void testMatrixCopyByRowIndex(int outHeight, int inHeight, int width) {
AutoCompare test(outHeight, width);
CpuMatrix arg1(inHeight, width);
CpuIVector arg2(outHeight);
arg1.randomizeUniform();
arg2.rand(inHeight);
test.cmpWithArg(&Matrix::copyByRowIndex, arg1, arg2);
}
TEST(Matrix, copyByRowIndex) {
for (auto outHeight : {31, 500, 1000}) {
for (auto inHeight : {17, 257, 500, 1200}) {
for (auto width : {512, 1024}) {
VLOG(3) << outHeight << " " << inHeight << " " << width;
testMatrixCopyByRowIndex(outHeight, inHeight, width);
}
}
}
}
void testCosSim(int heightX, int heightY, int width, real scale) {
AutoCompare test(heightX, 1);
CpuMatrix arg1(heightX, width);
CpuMatrix arg2(heightY, width);
arg1.randomizeUniform();
arg2.randomizeUniform();
arg2.add(-0.5);
test.cmpWithArg(&Matrix::cosSim, arg1, arg2, scale);
}
TEST(Matrix, cosSim) {
for (auto heightX : {10, 100, 1000}) {
for (auto heightY : {1, heightX}) {
for (auto width : {10, 100, 1000}) {
for (auto scale : {1.0, 2.0}) {
testCosSim(heightX, heightY, width, scale);
}
}
}
}
}
void testParamReluForward(int height, int width, int w_height, int w_width) {
AutoCompare test(height, width);
CpuMatrix arg1(height, width);
CpuMatrix arg2(w_height, w_width);
arg1.randomizeUniform();
arg2.randomizeUniform();
arg1.add(-0.5);
test.cmpWithArg(&Matrix::paramReluForward, arg1, arg2);
}
void testParamReluBackwardW(int height, int width, int w_height, int w_width) {
AutoCompare test(w_height, w_width);
CpuMatrix arg1(height, width);
CpuMatrix arg2(height, width);
arg1.randomizeUniform();
arg2.randomizeUniform();
arg2.add(-0.5);
test.cmpWithArg(&Matrix::paramReluBackwardW, arg1, arg2);
}
TEST(Matrix, paramRelu) {
for (auto height : {10, 100}) {
for (auto width : {10, 100}) {
for (auto w_height : {1, 2}) {
for (auto w_width : {1, 2}) {
testParamReluForward(height, width, w_height, w_width);
testParamReluBackwardW(height, width, w_height, w_width);
}
}
}
}
}
void testAddSharedBias(int numSamples, int dim, int channel) {
AutoCompare test(numSamples, dim);
CpuMatrix arg1(1, channel);
arg1.randomizeUniform();
test.cmpWithArg(&Matrix::addSharedBias, arg1, 1.0);
}
void testCollectSharedBias(int numSamples, int dim, int channel) {
AutoCompare test(1, channel);
CpuMatrix arg1(numSamples, dim);
arg1.randomizeUniform();
test.cmpWithArg(&Matrix::collectSharedBias, arg1, 1.0);
}
TEST(Matrix, sharedBias) {
for (auto numSamples : {1, 100, 520}) {
for (auto dim : {100 * 16, 100 * 32}) {
for (auto channel : {8, 16}) {
VLOG(3) << " numSamples=" << numSamples << " dim=" << dim
<< " channel=" << channel;
testAddSharedBias(numSamples, dim, channel);
testCollectSharedBias(numSamples, dim, channel);
}
}
}
}
void testMultiBinaryLabelCrossEntropy(int numSamples, int dim) {
AutoCompare forward(numSamples, 1);
CpuMatrix arg1(numSamples, dim);
CpuSparseMatrix arg2(
numSamples, dim, numSamples, paddle::NO_VALUE, paddle::SPARSE_CSR);
CpuMatrix output1(numSamples, dim);
output1.randomizeUniform();
output1.softmax(arg1);
for (int i = 0; i < numSamples; i++) {
const unsigned int id = std::rand() % dim;
arg2.setRow(i, 1, &id, nullptr);
}
forward.cmpWithArg(&Matrix::multiBinaryLabelCrossEntropy, arg1, arg2);
AutoCompare backward(numSamples, dim);
backward.cmpWithArg(&Matrix::multiBinaryLabelCrossEntropyBp, arg1, arg2);
}
TEST(Matrix, multiBinaryCrossEntropy) {
for (auto numSamples : {100, 1000, 10000}) {
for (auto dim : {100, 1000, 10000}) {
VLOG(3) << " numSamples=" << numSamples << " dim=" << dim;
testMultiBinaryLabelCrossEntropy(numSamples, dim);
}
}
}
int main(int argc, char** argv) {
testing::InitGoogleTest(&argc, argv);
paddle::initMain(argc, argv);
return RUN_ALL_TESTS();
}
#endif
......@@ -22,163 +22,12 @@ limitations under the License. */
#include <gtest/gtest.h>
#include "paddle/gserver/tests/TestUtil.h"
#include "paddle/utils/Stat.h"
#include "TensorCheck.h"
using namespace paddle; // NOLINT
using namespace std; // NOLINT
template <class T>
void VectorCheckEqual(const VectorT<T>& vector1, const VectorT<T>& vector2) {
CHECK(vector1.getSize() == vector2.getSize());
const T* data1 = vector1.getData();
const T* data2 = vector2.getData();
size_t size = vector1.getSize();
int count = 0;
for (size_t i = 0; i < size; i++) {
if (data1[i] != data2[i]) {
count++;
}
}
EXPECT_EQ(count, 0) << "There are " << count << " different element.";
}
void MatrixCheckEqual(const Matrix& matrix1, const Matrix& matrix2) {
CHECK(matrix1.getHeight() == matrix2.getHeight());
CHECK(matrix1.getWidth() == matrix2.getWidth());
int height = matrix1.getHeight();
int width = matrix1.getWidth();
const real* data1 = matrix1.getData();
const real* data2 = matrix2.getData();
int count = 0;
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
if (data1[i * width + j] != data2[i * width + j]) {
count++;
}
}
}
EXPECT_EQ(count, 0) << "There are " << count << " different element.";
}
void MatrixCheckErr(const Matrix& matrix1, const Matrix& matrix2) {
CHECK(matrix1.getHeight() == matrix2.getHeight());
CHECK(matrix1.getWidth() == matrix2.getWidth());
#ifndef PADDLE_TYPE_DOUBLE
real err = 1e-3;
#else
real err = 1e-10;
#endif
int height = matrix1.getHeight();
int width = matrix1.getWidth();
const real* data1 = matrix1.getData();
const real* data2 = matrix2.getData();
int count = 0;
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
real a = data1[i * width + j];
real b = data2[i * width + j];
if (fabs(a - b) > err) {
if ((fabsf(a - b) / fabsf(a)) > (err / 10.0f)) {
count++;
}
}
}
}
EXPECT_EQ(count, 0) << "There are " << count << " different element.";
}
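The check in `MatrixCheckErr` above counts an element as different only when it fails both an absolute and a relative tolerance, so large values with a small relative error still pass. A minimal standalone sketch of that rule follows. `countMismatches` and its parameter names are hypothetical, and it operates on `std::vector<float>` rather than on Paddle matrices.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

namespace sketch {

// Count the elements of a and b whose difference exceeds absErr in absolute
// terms AND exceeds relErr relative to |a[i]|. Assumes a.size() == b.size().
inline std::size_t countMismatches(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   float absErr,
                                   float relErr) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float diff = std::fabs(a[i] - b[i]);
    if (diff > absErr && diff / std::fabs(a[i]) > relErr) {
      ++count;
    }
  }
  return count;
}

}  // namespace sketch
```

For example, with `absErr = 1e-3f` and `relErr = 1e-4f`, the pair (1000000, 1000001) passes because its relative error is about 1e-6, while the pair (1.0, 1.5) is counted as a mismatch.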
void testBilinearFwdBwd(int numSamples,
int imgSizeH,
int imgSizeW,
int channels) {
int inWidth = imgSizeH * imgSizeW * channels;
int outWidth = 2 * imgSizeH * 2 * imgSizeW * channels;
real ratioH = 0.5;
real ratioW = 0.5;
// forward
MatrixPtr input = CpuMatrix::create(numSamples, inWidth, false, false);
MatrixPtr inputGpu = GpuMatrix::create(numSamples, inWidth, false, true);
MatrixPtr target = CpuMatrix::create(numSamples, outWidth, false, false);
MatrixPtr targetGpu = GpuMatrix::create(numSamples, outWidth, false, true);
MatrixPtr targetCheck = CpuMatrix::create(numSamples, outWidth, false, false);
input->randomizeUniform();
inputGpu->copyFrom(*input);
target->bilinearForward(*input,
imgSizeH,
imgSizeW,
2 * imgSizeH,
2 * imgSizeW,
channels,
ratioH,
ratioW);
targetGpu->bilinearForward(*inputGpu,
imgSizeH,
imgSizeW,
2 * imgSizeH,
2 * imgSizeW,
channels,
ratioH,
ratioW);
// check
targetCheck->copyFrom(*targetGpu);
MatrixCheckErr(*target, *targetCheck);
// backward
MatrixPtr inputGrad = CpuMatrix::create(numSamples, inWidth, false, false);
MatrixPtr inputGpuGrad = GpuMatrix::create(numSamples, inWidth, false, true);
MatrixPtr targetGrad = CpuMatrix::create(numSamples, outWidth, false, false);
MatrixPtr targetGpuGrad =
GpuMatrix::create(numSamples, outWidth, false, true);
MatrixPtr targetCheckGrad =
CpuMatrix::create(numSamples, inWidth, false, false);
inputGrad->randomizeUniform();
targetGrad->randomizeUniform();
inputGpuGrad->copyFrom(*inputGrad);
targetGpuGrad->copyFrom(*targetGrad);
inputGrad->bilinearBackward(*targetGrad,
2 * imgSizeH,
2 * imgSizeW,
imgSizeH,
imgSizeW,
channels,
ratioH,
ratioW);
inputGpuGrad->bilinearBackward(*targetGpuGrad,
2 * imgSizeH,
2 * imgSizeW,
imgSizeH,
imgSizeW,
channels,
ratioH,
ratioW);
// check
targetCheckGrad->copyFrom(*inputGpuGrad);
MatrixCheckErr(*inputGrad, *targetCheckGrad);
}
TEST(Matrix, BilinearFwdBwd) {
for (auto numSamples : {5, 10}) {
for (auto channels : {8, 16}) {
for (auto imgSizeH : {14, 28}) {
for (auto imgSizeW : {16, 30}) {
VLOG(3) << " numSamples=" << numSamples << " channels=" << channels
<< " imgSizeH=" << imgSizeH << " imgSizeW=" << imgSizeW;
testBilinearFwdBwd(numSamples, imgSizeH, imgSizeW, channels);
}
}
}
}
}
using autotest::TensorCheckEqual;
using autotest::TensorCheckErr;
void testMatrixProjectionForward(int contextStart,
int contextLength,
......@@ -232,12 +81,7 @@ void testMatrixProjectionForward(int contextStart,
beginPad,
padding);
// check
MatrixPtr outputCheck =
std::make_shared<CpuMatrix>(batchSize, inputDim * contextLength);
outputCheck->copyFrom(*gpuOutput);
MatrixCheckEqual(*cpuOutput, *outputCheck);
TensorCheckEqual(*cpuOutput, *gpuOutput);
}
void testMatrixProjectionBackward(int contextStart,
......@@ -294,15 +138,9 @@ void testMatrixProjectionBackward(int contextStart,
beginPad);
}
// check
MatrixPtr inputGradCheck = std::make_shared<CpuMatrix>(batchSize, inputDim);
inputGradCheck->copyFrom(*gpuInputGrad);
MatrixCheckErr(*cpuInputGrad, *inputGradCheck);
TensorCheckErr(*cpuInputGrad, *gpuInputGrad);
if (padding) {
MatrixPtr weightGradChcek = std::make_shared<CpuMatrix>(pad, inputDim);
weightGradChcek->copyFrom(*gpuWeightGrad);
MatrixCheckErr(*cpuWeightGrad, *weightGradChcek);
TensorCheckErr(*cpuWeightGrad, *gpuWeightGrad);
}
}
......@@ -361,15 +199,8 @@ void testMatrixMaxSequence(int batchSize, int inputDim) {
cpuOutput->maxSequenceForward(*cpuInput, *cpuSequence, *cpuIndex);
gpuOutput->maxSequenceForward(*gpuInput, *gpuSequence, *gpuIndex);
// check
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(newBatchSize, inputDim);
outputCheck->copyFrom(*gpuOutput);
MatrixCheckEqual(*cpuOutput, *outputCheck);
IVectorPtr indexCheck = nullptr;
IVector::resizeOrCreate(indexCheck, newBatchSize * inputDim, false);
indexCheck->copyFrom(*gpuIndex);
VectorCheckEqual(*cpuIndex, *indexCheck);
TensorCheckEqual(*cpuOutput, *gpuOutput);
TensorCheckEqual(*cpuIndex, *gpuIndex);
// backward
MatrixPtr cpuOutputGrad = std::make_shared<CpuMatrix>(newBatchSize, inputDim);
......@@ -385,10 +216,7 @@ void testMatrixMaxSequence(int batchSize, int inputDim) {
cpuInputGrad->maxSequenceBackward(*cpuOutputGrad, *cpuSequence, *cpuIndex);
gpuInputGrad->maxSequenceBackward(*gpuOutputGrad, *gpuSequence, *gpuIndex);
// check
MatrixPtr inputGradCheck = std::make_shared<CpuMatrix>(batchSize, inputDim);
inputGradCheck->copyFrom(*gpuInputGrad);
MatrixCheckEqual(*cpuInputGrad, *inputGradCheck);
TensorCheckEqual(*cpuInputGrad, *gpuInputGrad);
}
TEST(Matrix, maxSequence) {
......@@ -431,6 +259,8 @@ void testMatrixZeroAtOffset(int height, int width) {
int columnOffset = rand() % width; // NOLINT we just use rand() for test.
int numColumns = rand() % (width - columnOffset); // NOLINT
if (numColumns == 0) return;
cpuA->zeroAtOffset(columnOffset, numColumns);
gpuA->zeroAtOffset(columnOffset, numColumns);
......@@ -442,10 +272,8 @@ void testMatrixZeroAtOffset(int height, int width) {
}
}
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckEqual(*cpuA, *outputCheck);
MatrixCheckEqual(*cpuA, *cpuTest);
TensorCheckEqual(*cpuA, *gpuA);
TensorCheckEqual(*cpuA, *cpuTest);
}
void testMatrixDeepSwap(int height, int width) {
......@@ -462,303 +290,8 @@ void testMatrixDeepSwap(int height, int width) {
// swap matrix cpuA and cpuB
cpuA->deepSwap(*cpuB);
MatrixCheckEqual(*cpuA, *cpuCopyB);
MatrixCheckEqual(*cpuB, *cpuCopyA);
}
void testMatrixBinaryAdd(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
cpuA->add(*cpuB);
gpuA->add(*gpuB);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckEqual(*cpuA, *outputCheck);
}
void testMatrixAssign(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
gpuA->copyFrom(*cpuA);
cpuA->assign(2.5);
gpuA->assign(2.5);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckEqual(*cpuA, *outputCheck);
}
void testMatrixAdd(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
gpuA->copyFrom(*cpuA);
cpuA->add(2.5);
gpuA->add(2.5);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckEqual(*cpuA, *outputCheck);
}
void testMatrixSqrt(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
gpuA->copyFrom(*cpuA);
cpuA->sqrt();
gpuA->sqrt();
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixTanhDerivative(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
cpuA->tanhDerivative(*cpuB);
gpuA->tanhDerivative(*gpuB);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixTanh(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
cpuA->tanh(*cpuB);
gpuA->tanh(*gpuB);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixTernarySub(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuC = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
cpuC->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
gpuC->copyFrom(*cpuC);
cpuA->sub(*cpuB, *cpuC);
gpuA->sub(*gpuB, *gpuC);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckEqual(*cpuA, *outputCheck);
}
void testMatrixSumOfSquaresBp(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuC = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
cpuC->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
gpuC->copyFrom(*cpuC);
cpuA->sumOfSquaresBp(*cpuB, *cpuC);
gpuA->sumOfSquaresBp(*gpuB, *gpuC);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixBinaryRowScale(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, 1);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, 1);
MatrixPtr cpuA1 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB1 = std::make_shared<CpuMatrix>(height, 1);
MatrixPtr gpuA1 = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB1 = std::make_shared<GpuMatrix>(height, 1);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
cpuA1->copyFrom(*cpuA);
cpuB1->copyFrom(*cpuB);
gpuA1->copyFrom(*cpuA);
gpuB1->copyFrom(*cpuB);
cpuA->addColVector(*cpuB);
gpuA->addColVector(*gpuB);
cpuA1->addColumnVector(*cpuB1);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckEqual(*cpuA, *outputCheck);
MatrixCheckEqual(*cpuA, *cpuA1);
}
void testMatrixAddBias(int height, int width, real scale) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(1, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(1, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
cpuA->addBias(*cpuB, scale);
gpuA->addBias(*gpuB, scale);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixTernaryRowScale(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuC = std::make_shared<GpuMatrix>(height, width);
MatrixPtr cpuA1 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB1 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC1 = std::make_shared<CpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
cpuC->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
gpuC->copyFrom(*cpuC);
cpuA1->copyFrom(*cpuA);
cpuB1->copyFrom(*cpuB);
cpuC1->copyFrom(*cpuC);
int columnOffset = rand() % width; // NOLINT
cpuA->rowScale(columnOffset, *cpuB, *cpuC);
gpuA->rowScale(columnOffset, *gpuB, *gpuC);
cpuA1->rowScale2(columnOffset, *cpuB1, *cpuC1);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckEqual(*cpuA, *outputCheck);
MatrixCheckEqual(*cpuA, *cpuA1);
}
void testMatrixTernaryRowDotMul(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuA1 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB1 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC1 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuC = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
cpuC->randomizeUniform();
cpuA1->copyFrom(*cpuA);
cpuB1->copyFrom(*cpuB);
cpuC1->copyFrom(*cpuC);
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
gpuC->copyFrom(*cpuC);
int columnOffset = rand() % width; // NOLINT
cpuA->rowDotMul(columnOffset, *cpuB, *cpuC);
gpuA->rowDotMul(columnOffset, *gpuB, *gpuC);
cpuA1->rowDotMul2(columnOffset, *cpuB1, *cpuC1);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *cpuA1);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixAddDotMulMMV(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC = std::make_shared<CpuMatrix>(1, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuC = std::make_shared<GpuMatrix>(1, width);
MatrixPtr cpuA1 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB1 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC1 = std::make_shared<CpuMatrix>(1, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
cpuC->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
gpuC->copyFrom(*cpuC);
cpuA1->copyFrom(*cpuA);
cpuB1->copyFrom(*cpuB);
cpuC1->copyFrom(*cpuC);
cpuA->addDotMulMMV(*cpuB, *cpuC);
gpuA->addDotMulMMV(*gpuB, *gpuC);
cpuA1->addDotMulMMV2(*cpuB1, *cpuC1);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
MatrixCheckEqual(*cpuA, *cpuA1);
TensorCheckEqual(*cpuA, *cpuCopyB);
TensorCheckEqual(*cpuB, *cpuCopyA);
}
void testMatrixTranspose(int height, int width) {
......@@ -772,9 +305,7 @@ void testMatrixTranspose(int height, int width) {
cpu->transpose(cpuT, false);
gpu->transpose(gpuT, false);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(width, height);
outputCheck->copyFrom(*gpuT);
MatrixCheckEqual(*cpuT, *outputCheck);
TensorCheckEqual(*cpuT, *gpuT);
}
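Note on the hunk above: the three-step pattern (allocate a CPU matrix, copy the GPU result into it, compare) gives way to a single `TensorCheckEqual(*cpuT, *gpuT)` call. The diff does not show how `TensorCheckEqual` is implemented, so the following is only a minimal sketch of one way a comparison helper could hide the copy-back step. `HostBuffer`, `DeviceBuffer`, `copyToHost` and `checkEqual` are hypothetical names invented for illustration, not Paddle APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a matrix whose data lives in host memory.
struct HostBuffer {
  std::vector<float> data;
};

// Hypothetical stand-in for a matrix whose data lives on a device. In this
// sketch the "device" storage is just another vector, so copyToHost() is a
// plain copy; a real GPU type would do a device-to-host transfer here.
struct DeviceBuffer {
  std::vector<float> data;
  HostBuffer copyToHost() const { return HostBuffer{data}; }
};

// Element-wise exact comparison of two host buffers.
inline bool checkEqual(const HostBuffer& a, const HostBuffer& b) {
  if (a.data.size() != b.data.size()) return false;
  for (std::size_t i = 0; i < a.data.size(); ++i) {
    if (a.data[i] != b.data[i]) return false;
  }
  return true;
}

// Overload that hides the copy-back step, so a test can compare a host
// result with a device result in a single call.
inline bool checkEqual(const HostBuffer& host, const DeviceBuffer& device) {
  return checkEqual(host, device.copyToHost());
}
```

With such an overload, each test body drops the temporary `outputCheck` matrix, which is the effect visible throughout this diff.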
void testMatrixInverse(int height) {
......@@ -795,530 +326,127 @@ void testMatrixInverse(int height) {
cpu->inverse(cpuI, false);
gpu->inverse(gpuI, false);
outputCheck->copyFrom(*gpuI);
MatrixCheckErr(*cpuI, *outputCheck);
TensorCheckErr(*cpuI, *gpuI);
outputCheck->mul(cpu, cpuI);
cpu->setDiag(1.0);
MatrixCheckErr(*cpu, *outputCheck);
}
TEST(Matrix, unary) {
for (auto height : {1, 3, 11, 73, 128, 200, 330}) {
for (auto width : {1, 3, 32, 100, 512, 1000, 3210}) {
VLOG(3) << " height=" << height << " width=" << width;
// applyUnary
testMatrixAssign(height, width);
testMatrixAdd(height, width);
testMatrixSqrt(height, width);
// applyBinary
testMatrixBinaryAdd(height, width);
testMatrixTanh(height, width);
testMatrixTanhDerivative(height, width);
testMatrixDeepSwap(height, width);
// applyTernary
testMatrixTernarySub(height, width);
testMatrixSumOfSquaresBp(height, width);
// asRowVector
testMatrixAddBias(height, width, 1.0);
testMatrixAddBias(height, width, 3.5);
testMatrixAddDotMulMMV(height, width);
// asColVector
testMatrixTernaryRowScale(height, width);
testMatrixBinaryRowScale(height, width);
// sum
testMatrixGetSum(height, width);
// transpose
testMatrixTranspose(height, width);
}
// inverse
testMatrixInverse(height);
}
}
void testMatrixSoftmax(int height, int width) {
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(height, width);
cpuInput->randomizeUniform();
gpuInput->copyFrom(*cpuInput);
cpuOutput->zero();
gpuOutput->zero();
cpuInput->softmax(*cpuOutput);
gpuInput->softmax(*gpuOutput);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuOutput);
MatrixCheckErr(*cpuOutput, *outputCheck);
}
void testSequenceSoftmax(int batchSize) {
// forward
int inputDim = 1;
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(batchSize, inputDim);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(batchSize, inputDim);
cpuInput->randomizeUniform();
gpuInput->copyFrom(*cpuInput);
IVectorPtr cpuSequence;
generateSequenceStartPositions(batchSize, cpuSequence);
IVectorPtr gpuSequence = IVector::create(cpuSequence->getSize(), true);
gpuSequence->copyFrom(*cpuSequence);
cpuInput->sequenceSoftmax(*cpuInput, *cpuSequence);
gpuInput->sequenceSoftmax(*gpuInput, *gpuSequence);
// check
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(batchSize, inputDim);
outputCheck->copyFrom(*gpuInput);
MatrixCheckErr(*cpuInput, *outputCheck);
}
void testMatrixSoftmaxThreshold(int height, int width) {
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(height, width);
cpuInput->randomizeUniform();
cpuInput->getData()[0] = 100.0;
gpuInput->copyFrom(*cpuInput);
cpuOutput->zero();
gpuOutput->zero();
cpuInput->softmax(*cpuOutput);
gpuInput->softmax(*gpuOutput);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuOutput);
// check output zero
int cpuCount = 0;
int gpuCount = 0;
auto zeroNum = [](MatrixPtr out, int& count) {
for (size_t i = 0; i < out->getHeight(); i++) {
for (size_t j = 0; j < out->getWidth(); j++) {
if (out->getElement(i, j) == 0) count++;
}
}
};
zeroNum(cpuOutput, cpuCount);
zeroNum(outputCheck, gpuCount);
EXPECT_EQ(cpuCount, 0) << "Cpu softmax output value 0";
EXPECT_EQ(gpuCount, 0) << "Gpu softmax output value 0";
}
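Note on `testMatrixSoftmaxThreshold` above: the test plants one very large input (100.0) and then asserts that no output element is exactly zero. A softmax that only subtracts the row maximum can lose small outputs to underflow in single precision, for instance on hardware that flushes denormals to zero. One common remedy is to clamp the shifted value from below before exponentiating. The sketch below assumes a clamp of 64; that constant and the names `kExpThreshold`, `softmaxRow` and `countZeros` are assumptions made for illustration and are not taken from this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed clamp value for this sketch; not taken from the diff.
constexpr float kExpThreshold = 64.0f;

// Softmax over one row: subtract the row maximum, clamp the shifted value
// from below, exponentiate, then normalize by the sum.
inline std::vector<float> softmaxRow(const std::vector<float>& in) {
  std::vector<float> out(in.size());
  if (in.empty()) return out;
  const float maxVal = *std::max_element(in.begin(), in.end());
  float sum = 0.0f;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const float shifted = std::max(in[i] - maxVal, -kExpThreshold);
    out[i] = std::exp(shifted);
    sum += out[i];
  }
  for (float& v : out) v /= sum;
  return out;
}

// Counts elements that are exactly zero, mirroring the zeroNum lambda.
inline int countZeros(const std::vector<float>& v) {
  return static_cast<int>(std::count(v.begin(), v.end(), 0.0f));
}
```

The clamp trades a small loss of accuracy on extremely unlikely classes for outputs that stay strictly positive, which matters when a later step takes their logarithm.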
void testMatrixSoftmaxBp(int height, int width) {
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(height, width);
cpuInput->randomizeUniform();
gpuInput->copyFrom(*cpuInput);
cpuOutput->randomizeUniform();
gpuOutput->copyFrom(*cpuOutput);
gpuOutput->softmaxBackward(*gpuInput);
MatrixPtr sftMaxSum = std::make_shared<CpuMatrix>(height, 1);
MatrixPtr sftMaxDot = std::make_shared<CpuMatrix>(height, width);
sftMaxDot->dotMul(*cpuOutput, *cpuInput);
sftMaxSum->colMerge(*sftMaxDot);
cpuOutput->softmaxDerivative(*cpuInput, *sftMaxSum);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuOutput);
MatrixCheckErr(*cpuOutput, *outputCheck);
}
TEST(Matrix, softmax) {
for (auto height : {1, 11, 73, 128, 200}) {
for (auto width : {1, 32, 100, 512, 1000}) {
VLOG(3) << " height=" << height << " width=" << width;
testMatrixSoftmax(height, width);
testMatrixSoftmaxBp(height, width);
testMatrixSoftmaxThreshold(height, width);
}
testSequenceSoftmax(height);
}
}
void testMatrixAddDotMulVMM(int height, int width, int endCol = 0) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(1, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(1, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuC = std::make_shared<GpuMatrix>(height, width);
MatrixPtr cpuA1 = std::make_shared<CpuMatrix>(1, width);
MatrixPtr cpuB1 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC1 = std::make_shared<CpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
cpuC->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
gpuC->copyFrom(*cpuC);
cpuA1->copyFrom(*cpuA);
cpuB1->copyFrom(*cpuB);
cpuC1->copyFrom(*cpuC);
if (!endCol) {
cpuA->addDotMulVMM(*cpuB, *cpuC);
gpuA->addDotMulVMM(*gpuB, *gpuC);
cpuA1->addDotMulVMM2(*cpuB1, *cpuC1);
MatrixCheckErr(*cpuA, *cpuA1);
} else {
MatrixPtr subCpuA = cpuA->subColMatrix(0, endCol);
MatrixPtr subCpuB = cpuB->subColMatrix(0, endCol);
MatrixPtr subCpuC = cpuC->subColMatrix(0, endCol);
MatrixPtr subGpuA = gpuA->subColMatrix(0, endCol);
MatrixPtr subGpuB = gpuB->subColMatrix(0, endCol);
MatrixPtr subGpuC = gpuC->subColMatrix(0, endCol);
subCpuA->addDotMulVMM(*subCpuB, *subCpuC);
subGpuA->addDotMulVMM(*subGpuB, *subGpuC);
}
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(1, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixRowSum(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, 1);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, 1);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
MatrixPtr cpuA1 = std::make_shared<CpuMatrix>(height, 1);
MatrixPtr cpuB1 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA1 = std::make_shared<GpuMatrix>(height, 1);
MatrixPtr gpuB1 = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
cpuA1->copyFrom(*cpuA);
cpuB1->copyFrom(*cpuB);
gpuA1->copyFrom(*cpuA);
gpuB1->copyFrom(*cpuB);
cpuA->colMerge(*cpuB);
gpuA->colMerge(*gpuB);
cpuB1->rowSum(*cpuA1);
gpuB1->rowSum(*gpuA1);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, 1);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
outputCheck->copyFrom(*gpuA1);
MatrixCheckErr(*cpuA1, *outputCheck);
}
void testMatrixRowMax(int height, int width, int endCol = 0) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, 1);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, 1);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
if (!endCol) {
cpuB->rowMax(*cpuA);
gpuB->rowMax(*gpuA);
} else {
MatrixPtr subCpuB = cpuB->subColMatrix(0, endCol);
MatrixPtr subGpuB = gpuB->subColMatrix(0, endCol);
subCpuB->rowMax(*cpuA);
subGpuB->rowMax(*gpuA);
}
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, 1);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixColSum(int height, int width, int endCol = 0) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(1, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(1, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
if (!endCol) {
cpuA->accumulateColSum(*cpuB);
gpuA->accumulateColSum(*gpuB);
} else {
MatrixPtr subCpuA = cpuA->subColMatrix(0, endCol);
MatrixPtr subGpuA = gpuA->subColMatrix(0, endCol);
MatrixPtr subCpuB = cpuB->subColMatrix(0, endCol);
MatrixPtr subGpuB = gpuB->subColMatrix(0, endCol);
subCpuA->accumulateColSum(*subCpuB);
subGpuA->accumulateColSum(*subGpuB);
}
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(1, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixColMax(int height, int width, int endCol = 0) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(1, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(1, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
if (!endCol) {
cpuB->colMax(*cpuA);
gpuB->colMax(*gpuA);
} else {
MatrixPtr subCpuA = cpuA->subColMatrix(0, endCol);
MatrixPtr subGpuA = gpuA->subColMatrix(0, endCol);
MatrixPtr subCpuB = cpuB->subColMatrix(0, endCol);
MatrixPtr subGpuB = gpuB->subColMatrix(0, endCol);
subCpuB->colMax(*subCpuA);
subGpuB->colMax(*subGpuA);
}
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(1, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixCollectBias(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(1, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(1, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
real scale = 1.0f / (rand() % 10); // NOLINT
cpuA->collectBias(*cpuB, scale);
gpuA->collectBias(*gpuB, scale);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(1, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixSumOfSquares(int height, int width, int endCol = 0) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, 1);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, 1);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuC = std::make_shared<GpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
cpuC->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
gpuC->copyFrom(*cpuC);
if (!endCol) {
cpuA->sumOfSquares(*cpuB, *cpuC);
gpuA->sumOfSquares(*gpuB, *gpuC);
} else {
MatrixPtr subCpuB = cpuB->subColMatrix(0, endCol);
MatrixPtr subCpuC = cpuC->subColMatrix(0, endCol);
MatrixPtr subGpuB = gpuB->subColMatrix(0, endCol);
MatrixPtr subGpuC = gpuC->subColMatrix(0, endCol);
cpuA->sumOfSquares(*subCpuB, *subCpuC);
gpuA->sumOfSquares(*subGpuB, *subGpuC);
}
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, 1);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
}
void testMatrixBinaryClassificationError(int height, int width) {
MatrixPtr cpuA = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuA = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuB = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuC = std::make_shared<GpuMatrix>(height, width);
MatrixPtr cpuA2 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuB2 = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuC2 = std::make_shared<CpuMatrix>(height, width);
cpuA->randomizeUniform();
cpuB->randomizeUniform();
cpuC->randomizeUniform();
gpuA->copyFrom(*cpuA);
gpuB->copyFrom(*cpuB);
gpuC->copyFrom(*cpuC);
cpuA2->copyFrom(*cpuA);
cpuB2->copyFrom(*cpuB);
cpuC2->copyFrom(*cpuC);
real scale = 0.5;
int columnOffset = rand() % width; // NOLINT
cpuA->binaryClassificationError(columnOffset, *cpuB, *cpuC, scale);
gpuA->binaryClassificationError(columnOffset, *gpuB, *gpuC, scale);
cpuA2->binaryClassificationError2(columnOffset, *cpuB2, *cpuC2, scale);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuA);
MatrixCheckErr(*cpuA, *outputCheck);
MatrixCheckErr(*cpuA, *cpuA2);
}
TEST(Matrix, aggregate) {
for (auto height : {1, 11, 16, 32, 64, 73, 128, 200, 1024, 2345}) {
for (auto width : {1, 9, 16, 32, 64, 100, 512, 1000, 1024, 2453}) {
VLOG(3) << " height=" << height << " width=" << width;
testMatrixRowSum(height, width);
testMatrixRowMax(height, width);
testMatrixColSum(height, width);
testMatrixColMax(height, width);
testMatrixCollectBias(height, width);
testMatrixTernaryRowDotMul(height, width);
testMatrixAddDotMulVMM(height, width);
testMatrixSumOfSquares(height, width);
testMatrixBinaryClassificationError(height, width);
}
}
cpu->setDiag(1.0);
TensorCheckErr(*cpu, *outputCheck);
}
TEST(Matrix, aggregate2) {
for (auto height : {16, 32, 128, 512, 1024}) {
for (auto width :
{16, 32, 64, 128, 256, 512, 768, 1024, 2048, 3072, 4096}) {
TEST(Matrix, unary) {
for (auto height : {1, 3, 11, 73, 128, 200, 330}) {
for (auto width : {1, 3, 32, 100, 512, 1000, 3210}) {
VLOG(3) << " height=" << height << " width=" << width;
int endCol = rand() % width; // NOLINT
testMatrixRowMax(height, width, endCol);
testMatrixSumOfSquares(height, width, endCol);
testMatrixColSum(height, width, endCol);
testMatrixColMax(height, width, endCol);
testMatrixAddDotMulVMM(height, width, endCol);
testMatrixDeepSwap(height, width);
testMatrixZeroAtOffset(height, width);
testMatrixGetSum(height, width);
testMatrixTranspose(height, width);
}
// inverse
testMatrixInverse(height);
}
}
void testMatrixAddAtOffset(int height, int width1, int width2) {
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(height, width1);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(height, width2);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(height, width1);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(height, width2);
void testMatrixSoftmax(int height, int width) {
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(height, width);
cpuInput->randomizeUniform();
gpuInput->copyFrom(*cpuInput);
cpuOutput->randomizeUniform();
gpuOutput->copyFrom(*cpuOutput);
int columnOffset = 0;
int offset = std::abs(width1 - width2);
if (offset) {
columnOffset = rand() % offset; // NOLINT
}
cpuOutput->addAtOffset(*cpuInput, columnOffset);
gpuOutput->addAtOffset(*gpuInput, columnOffset);
cpuOutput->zero();
gpuOutput->zero();
cpuInput->softmax(*cpuOutput);
gpuInput->softmax(*gpuOutput);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width2);
outputCheck->copyFrom(*gpuOutput);
MatrixCheckEqual(*cpuOutput, *outputCheck);
TensorCheckErr(*cpuOutput, *gpuOutput);
}
void testMatrixAssignAtOffset(int height, int width1, int width2) {
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(height, width1);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(height, width2);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(height, width1);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(height, width2);
void testSequenceSoftmax(int batchSize) {
// forward
int inputDim = 1;
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(batchSize, inputDim);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(batchSize, inputDim);
cpuInput->randomizeUniform();
gpuInput->copyFrom(*cpuInput);
cpuOutput->randomizeUniform();
gpuOutput->copyFrom(*cpuOutput);
int columnOffset = 0;
int offset = std::abs(width1 - width2);
if (offset) {
columnOffset = rand() % offset; // NOLINT
}
cpuOutput->assignAtOffset(*cpuInput, columnOffset);
gpuOutput->assignAtOffset(*gpuInput, columnOffset);
IVectorPtr cpuSequence;
generateSequenceStartPositions(batchSize, cpuSequence);
IVectorPtr gpuSequence = IVector::create(cpuSequence->getSize(), true);
gpuSequence->copyFrom(*cpuSequence);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width2);
outputCheck->copyFrom(*gpuOutput);
MatrixCheckEqual(*cpuOutput, *outputCheck);
cpuInput->sequenceSoftmax(*cpuInput, *cpuSequence);
gpuInput->sequenceSoftmax(*gpuInput, *gpuSequence);
TensorCheckErr(*cpuInput, *gpuInput);
}
TEST(Matrix, AtOffset) {
for (auto height : {1, 11, 73, 128, 200}) {
for (auto width1 : {1, 32, 100, 512, 1000}) {
for (auto width2 : {1, 32, 100, 512, 1000}) {
VLOG(3) << " height=" << height << " width1=" << width1
<< " width2=" << width2;
void testMatrixSoftmaxThreshold(int height, int width) {
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(height, width);
testMatrixAddAtOffset(height, width1, width2);
testMatrixAssignAtOffset(height, width1, width2);
}
cpuInput->randomizeUniform();
cpuInput->getData()[0] = 100.0;
gpuInput->copyFrom(*cpuInput);
cpuOutput->zero();
gpuOutput->zero();
cpuInput->softmax(*cpuOutput);
gpuInput->softmax(*gpuOutput);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(height, width);
outputCheck->copyFrom(*gpuOutput);
// check output zero
int cpuCount = 0;
int gpuCount = 0;
auto zeroNum = [](MatrixPtr out, int& count) {
for (size_t i = 0; i < out->getHeight(); i++) {
for (size_t j = 0; j < out->getWidth(); j++) {
if (out->getElement(i, j) == 0) count++;
}
}
};
zeroNum(cpuOutput, cpuCount);
zeroNum(outputCheck, gpuCount);
EXPECT_EQ(cpuCount, 0) << "Cpu softmax output value 0";
EXPECT_EQ(gpuCount, 0) << "Gpu softmax output value 0";
}
void testMatrixSelectRows(int numSamples, int tableSize, int inputDim) {
MatrixPtr cpuTable = std::make_shared<CpuMatrix>(tableSize, inputDim);
MatrixPtr gpuTable = std::make_shared<GpuMatrix>(tableSize, inputDim);
cpuTable->randomizeUniform();
gpuTable->copyFrom(*cpuTable);
IVectorPtr cpuIds;
IVectorPtr gpuIds;
cpuIds = VectorT<int>::create(numSamples, false);
gpuIds = VectorT<int>::create(numSamples, true);
cpuIds->rand(tableSize);
gpuIds->copyFrom(*cpuIds);
void testMatrixSoftmaxBp(int height, int width) {
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(height, width);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(height, width);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(height, width);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(numSamples, inputDim);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(numSamples, inputDim);
cpuInput->randomizeUniform();
gpuInput->copyFrom(*cpuInput);
cpuOutput->randomizeUniform();
gpuOutput->copyFrom(*cpuOutput);
gpuOutput->softmaxBackward(*gpuInput);
cpuOutput->selectRows(*cpuTable, *cpuIds);
gpuOutput->selectRows(*gpuTable, *gpuIds);
MatrixPtr sftMaxSum = std::make_shared<CpuMatrix>(height, 1);
MatrixPtr sftMaxDot = std::make_shared<CpuMatrix>(height, width);
sftMaxDot->dotMul(*cpuOutput, *cpuInput);
sftMaxSum->colMerge(*sftMaxDot);
cpuOutput->softmaxDerivative(*cpuInput, *sftMaxSum);
// check
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(numSamples, inputDim);
outputCheck->copyFrom(*gpuOutput);
MatrixCheckEqual(*cpuOutput, *outputCheck);
TensorCheckErr(*cpuOutput, *gpuOutput);
}
TEST(Matrix, softmax) {
for (auto height : {1, 11, 73, 128, 200}) {
for (auto width : {1, 32, 100, 512, 1000}) {
VLOG(3) << " height=" << height << " width=" << width;
testMatrixSoftmax(height, width);
testMatrixSoftmaxBp(height, width);
testMatrixSoftmaxThreshold(height, width);
}
testSequenceSoftmax(height);
}
}
void testMatrixAddToRows(int numSamples, int tableSize, int inputDim) {
......@@ -1342,10 +470,7 @@ void testMatrixAddToRows(int numSamples, int tableSize, int inputDim) {
cpuOutput->addToRows(*cpuTable, *cpuIds);
gpuOutput->addToRows(*gpuTable, *gpuIds);
// check
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(tableSize, inputDim);
outputCheck->copyFrom(*gpuTable);
MatrixCheckErr(*cpuTable, *outputCheck);
TensorCheckErr(*cpuTable, *gpuTable);
}
TEST(Matrix, tableProjection) {
......@@ -1354,7 +479,6 @@ TEST(Matrix, tableProjection) {
for (auto inputDim : {20, 50}) {
VLOG(3) << " numSamples=" << numSamples << " tableSize=" << tableSize
<< " inputDim=" << inputDim;
testMatrixSelectRows(numSamples, tableSize, inputDim);
testMatrixAddToRows(numSamples, tableSize, inputDim);
}
}
......@@ -1388,9 +512,7 @@ void testMatrixMul(bool transa, bool transb, int dimM, int dimN, int dimK) {
cpuC->mul(cpuA, cpuB, alpha, beta);
gpuC->mul(gpuA, gpuB, alpha, beta);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(heightC, widthC);
outputCheck->copyFrom(*gpuC);
MatrixCheckErr(*cpuC, *outputCheck);
TensorCheckErr(*cpuC, *gpuC);
}
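Note on the hunk above: results of floating-point kernels such as `mul` are compared with an `Err` check (`MatrixCheckErr` before, `TensorCheckErr` after) rather than an `Equal` check, presumably because CPU and GPU arithmetic may round differently. The tolerance actually used is not visible in this diff. The sketch below shows one plausible form of such a check, with a mixed absolute and relative bound; `countMismatches` and the default `tol` of `1e-4` are hypothetical choices for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Counts elements whose difference exceeds the tolerance. For values with
// magnitude below 1 the bound is absolute (tol); for larger values it
// scales with the larger magnitude of the pair (tol * magnitude).
inline int countMismatches(const std::vector<float>& a,
                           const std::vector<float>& b,
                           float tol = 1e-4f) {
  assert(a.size() == b.size());
  int bad = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float diff = std::fabs(a[i] - b[i]);
    const float magnitude = std::max(std::fabs(a[i]), std::fabs(b[i]));
    if (diff > tol * std::max(magnitude, 1.0f)) ++bad;
  }
  return bad;
}
```

A test would then assert that the count is zero, and report the count on failure so that a single outlier can be told apart from a wholesale mismatch.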
void testSubMatrixMul(bool transa, bool transb, int dimM, int dimN, int dimK) {
......@@ -1462,9 +584,7 @@ void testSubMatrixMul(bool transa, bool transb, int dimM, int dimN, int dimK) {
subCpuC->mul(subCpuA, subCpuB, alpha, beta);
subGpuC->mul(subGpuA, subGpuB, alpha, beta);
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(heightC, widthC);
outputCheck->copyFrom(*gpuC);
MatrixCheckErr(*cpuC, *outputCheck);
TensorCheckErr(*cpuC, *gpuC);
}
TEST(Matrix, mul) {
......@@ -1518,9 +638,7 @@ void testVectorReset(int size) {
cpu->reset(value);
gpu->reset(value);
std::shared_ptr<CpuVectorT<T>> out = std::make_shared<CpuVectorT<T>>(size);
out->copyFrom(*gpu);
VectorCheckEqual(*cpu, *out);
TensorCheckEqual(*cpu, *gpu);
}
template <class T>
......@@ -1546,9 +664,7 @@ void testVecortSelectFrom(int size) {
cpuDst->selectFrom(*cpuSrc, *cpuIds);
gpuDst->selectFrom(*gpuSrc, *gpuIds);
std::shared_ptr<CpuVectorT<T>> out = std::make_shared<CpuVectorT<T>>(size);
out->copyFrom(*gpuDst);
VectorCheckEqual(*cpuDst, *out);
TensorCheckEqual(*cpuDst, *gpuDst);
}
template <class T>
......@@ -1559,9 +675,7 @@ void testVecotrZeroMem(int size) {
cpu->zeroMem();
gpu->zeroMem();
std::shared_ptr<CpuVectorT<T>> out = std::make_shared<CpuVectorT<T>>(size);
out->copyFrom(*gpu);
VectorCheckEqual(*cpu, *out);
TensorCheckEqual(*cpu, *gpu);
}
template <class T>
......@@ -1582,9 +696,7 @@ void testVectorIsEqual(int size) {
cpuA->isEqualTo(*cpuB, value);
gpuA->isEqualTo(*gpuB, value);
std::shared_ptr<CpuVectorT<T>> out = std::make_shared<CpuVectorT<T>>(size);
out->copyFrom(*gpuA);
VectorCheckEqual(*cpuA, *out);
TensorCheckEqual(*cpuA, *gpuA);
}
TEST(Vector, Equal) {
......@@ -1615,9 +727,7 @@ void testMatrixTopK(int samples, int dim, int beamSize) {
cpuSrc->rowMax(*cpuIds, *cpuVal);
gpuSrc->rowMax(*gpuIds, *gpuVal);
MatrixPtr outVal = std::make_shared<CpuMatrix>(samples, beamSize);
outVal->copyFrom(*gpuVal);
MatrixCheckEqual(*cpuVal, *outVal);
TensorCheckEqual(*cpuVal, *gpuVal);
}
TEST(Matrix, topK) {
......@@ -1653,9 +763,7 @@ void testSMatrixTopK(int samples, int dim, int beamSize, real ratio) {
cpuSrc->rowMax(*cpuIds, *cpuVal);
gpuSrc->rowMax(*gpuIds, *gpuVal);
MatrixPtr outCheckMaxVal = std::make_shared<CpuMatrix>(samples, beamSize);
outCheckMaxVal->copyFrom(*gpuVal);
MatrixCheckEqual(*cpuVal, *outCheckMaxVal);
TensorCheckEqual(*cpuVal, *gpuVal);
IVectorPtr outCheckIds = std::make_shared<CpuIVector>(samples * beamSize);
outCheckIds->copyFrom(*gpuIds);
......@@ -1685,42 +793,6 @@ TEST(SMatrix, topK) {
}
}
void testMatrixCopyByRowIndex(int outHeight, int inHeight, int width) {
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(inHeight, width);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(inHeight, width);
cpuInput->randomizeUniform();
gpuInput->copyFrom(*cpuInput);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(outHeight, width);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(outHeight, width);
cpuOutput->zero();
gpuOutput->zero();
IVectorPtr cpuRowIndex = IVector::create(outHeight, false);
IVectorPtr gpuRowIndex = IVector::create(outHeight, true);
cpuRowIndex->rand(inHeight);
gpuRowIndex->copyFrom(*cpuRowIndex);
cpuOutput->copyByRowIndex(*cpuInput, *cpuRowIndex);
gpuOutput->copyByRowIndex(*gpuInput, *gpuRowIndex);
// check
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(outHeight, width);
outputCheck->copyFrom(*gpuOutput);
MatrixCheckEqual(*cpuOutput, *outputCheck);
}
TEST(Matrix, copyByRowIndex) {
for (auto outHeight : {31, 500, 1000}) {
for (auto inHeight : {17, 257, 500, 1200}) {
for (auto width : {512, 1024}) {
VLOG(3) << outHeight << " " << inHeight << " " << width;
testMatrixCopyByRowIndex(outHeight, inHeight, width);
}
}
}
}
void testMatrixSequenceAvgForward(int batchSize, int inputDim, int mode) {
MatrixPtr cpuInput = std::make_shared<CpuMatrix>(batchSize, inputDim);
MatrixPtr gpuInput = std::make_shared<GpuMatrix>(batchSize, inputDim);
......@@ -1741,10 +813,7 @@ void testMatrixSequenceAvgForward(int batchSize, int inputDim, int mode) {
cpuOutput->sequenceAvgForward(*cpuInput, *cpuSequence, mode);
gpuOutput->sequenceAvgForward(*gpuInput, *gpuSequence, mode);
// check
MatrixPtr outputCheck = std::make_shared<CpuMatrix>(newBatchSize, inputDim);
outputCheck->copyFrom(*gpuOutput);
MatrixCheckErr(*cpuOutput, *outputCheck);
TensorCheckErr(*cpuOutput, *gpuOutput);
}
TEST(Matrix, sequenceAvgForward) {
......@@ -1759,45 +828,6 @@ TEST(Matrix, sequenceAvgForward) {
}
}
void testCosSim(int heightX, int heightY, int width, real scale) {
MatrixPtr prevOutX = CpuMatrix::create(heightX, width, false, false);
MatrixPtr prevOutY = CpuMatrix::create(heightY, width, false, false);
MatrixPtr output = CpuMatrix::create(heightX, 1, false, false);
prevOutX->randomizeUniform();
prevOutY->randomizeUniform();
prevOutX->add(-0.5);
prevOutY->add(-0.5);
output->randomizeUniform();
MatrixPtr prevOutXGpu = GpuMatrix::create(heightX, width, false, true);
MatrixPtr prevOutYGpu = GpuMatrix::create(heightY, width, false, true);
MatrixPtr outputGpu = GpuMatrix::create(heightX, 1, false, true);
prevOutXGpu->copyFrom(*prevOutX);
prevOutYGpu->copyFrom(*prevOutY);
outputGpu->copyFrom(*output);
output->cosSim(*prevOutX, *prevOutY, scale);
outputGpu->cosSim(*prevOutXGpu, *prevOutYGpu, scale);
MatrixPtr outputCheck = CpuMatrix::create(heightX, 1, false, false);
outputCheck->copyFrom(*outputGpu);
MatrixCheckErr(*output, *outputCheck);
}
TEST(Matrix, cosSim) {
for (auto heightX : {10, 100, 1000}) {
for (auto heightY : {1, heightX}) {
for (auto width : {10, 100, 1000}) {
for (auto scale : {1.0, 2.0}) {
testCosSim(heightX, heightY, width, scale);
}
}
}
}
}
void testCosSimDerivate(int heightX, int heightY, int width, real scale) {
MatrixPtr prevOutX = CpuMatrix::create(heightX, width, false, false);
MatrixPtr prevOutY = CpuMatrix::create(heightY, width, false, false);
......@@ -1837,12 +867,8 @@ void testCosSimDerivate(int heightX, int heightY, int width, real scale) {
*prevGradYGpu,
scale);
MatrixPtr prevGradXCheck = CpuMatrix::create(heightX, width, false, false);
MatrixPtr prevGradYCheck = CpuMatrix::create(heightY, width, false, false);
prevGradXCheck->copyFrom(*prevGradXGpu);
prevGradYCheck->copyFrom(*prevGradYGpu);
MatrixCheckErr(*prevGradX, *prevGradXCheck);
MatrixCheckErr(*prevGradY, *prevGradYCheck);
TensorCheckErr(*prevGradX, *prevGradXGpu);
TensorCheckErr(*prevGradY, *prevGradYGpu);
}
TEST(Matrix, cosSimDerivate) {
......@@ -1857,80 +883,6 @@ TEST(Matrix, cosSimDerivate) {
}
}
void testParamReluForward(int height, int width, int w_height, int w_width) {
MatrixPtr output = CpuMatrix::create(height, width, false, false);
MatrixPtr input = CpuMatrix::create(height, width, false, false);
MatrixPtr w = CpuMatrix::create(w_height, w_width, false, false);
output->randomizeUniform();
input->randomizeUniform();
w->randomizeUniform();
input->add(-0.5);
MatrixPtr outputGpu = GpuMatrix::create(height, width, false, true);
MatrixPtr inputGpu = GpuMatrix::create(height, width, false, true);
MatrixPtr wGpu = GpuMatrix::create(w_height, w_width, false, true);
inputGpu->copyFrom(*input);
wGpu->copyFrom(*w);
output->paramReluForward(*input, *w);
outputGpu->paramReluForward(*inputGpu, *wGpu);
MatrixPtr outputCheck = CpuMatrix::create(height, width, false, false);
outputCheck->copyFrom(*outputGpu);
MatrixCheckEqual(*output, *outputCheck);
}
TEST(Matrix, paramReluForward) {
for (auto height : {10, 100}) {
for (auto width : {10, 100}) {
for (auto w_height : {1, 2}) {
for (auto w_width : {1, 2}) {
testParamReluForward(height, width, w_height, w_width);
}
}
}
}
}
void testParamReluBackwardW(int height, int width, int w_height, int w_width) {
MatrixPtr oGrad = CpuMatrix::create(height, width, false, false);
MatrixPtr input = CpuMatrix::create(height, width, false, false);
MatrixPtr w = CpuMatrix::create(w_height, w_width, false, false);
oGrad->randomizeUniform();
input->randomizeUniform();
w->randomizeUniform();
input->add(-0.5);
MatrixPtr oGradGpu = GpuMatrix::create(height, width, false, true);
MatrixPtr inputGpu = GpuMatrix::create(height, width, false, true);
MatrixPtr wGpu = GpuMatrix::create(w_height, w_width, false, true);
oGradGpu->copyFrom(*oGrad);
inputGpu->copyFrom(*input);
wGpu->copyFrom(*w);
w->paramReluBackwardW(*oGrad, *input);
wGpu->paramReluBackwardW(*oGradGpu, *inputGpu);
MatrixPtr wCheck = CpuMatrix::create(w_height, w_width, false, false);
wCheck->copyFrom(*wGpu);
MatrixCheckErr(*w, *wCheck);
}
TEST(Matrix, paramReluBackwardW) {
for (auto height : {10, 100}) {
for (auto width : {10, 100}) {
for (auto w_height : {1, 2}) {
for (auto w_width : {1, 2}) {
testParamReluBackwardW(height, width, w_height, w_width);
}
}
}
}
}
void testParamReluBackwardDiff(int height,
int width,
int w_height,
......@@ -1959,9 +911,7 @@ void testParamReluBackwardDiff(int height,
diff->paramReluBackwardDiff(*oGrad, *input, *w);
diffGpu->paramReluBackwardDiff(*oGradGpu, *inputGpu, *wGpu);
MatrixPtr diffCheck = CpuMatrix::create(height, width, false, false);
diffCheck->copyFrom(*diffGpu);
MatrixCheckErr(*diff, *diffCheck);
TensorCheckErr(*diff, *diffGpu);
}
TEST(Matrix, paramReluBackwardDiff) {
......@@ -1992,9 +942,7 @@ void testClassificationError(int numSamples, int dim) {
cpuError->classificationError(cpuOutput, cpuLabel);
gpuError->classificationError(gpuOutput, gpuLabel);
MatrixPtr check = std::make_shared<CpuMatrix>(numSamples, 1);
check->copyFrom(*gpuError);
MatrixCheckEqual(*cpuError, *check);
TensorCheckEqual(*cpuError, *gpuError);
}
TEST(Matrix, classificationError) {
......@@ -2159,9 +1107,8 @@ void testAvgPoolFwdBwd(int numSamples,
outW,
padH,
padW);
MatrixPtr targetCheck = CpuMatrix::create(numSamples, outWidth, false, false);
targetCheck->copyFrom(*targetGpu);
MatrixCheckErr(*target, *targetCheck);
TensorCheckErr(*target, *targetGpu);
MatrixPtr inputGrad = CpuMatrix::create(numSamples, inWidth, false, false);
MatrixPtr inputGpuGrad = GpuMatrix::create(numSamples, inWidth, false, true);
......@@ -2200,10 +1147,8 @@ void testAvgPoolFwdBwd(int numSamples,
1.0,
padH,
padW);
MatrixPtr targetBwdCheck =
CpuMatrix::create(numSamples, inWidth, false, false);
targetBwdCheck->copyFrom(*inputGpuGrad);
MatrixCheckErr(*inputGrad, *targetBwdCheck);
TensorCheckErr(*inputGrad, *inputGpuGrad);
}
TEST(Matrix, PoolFwdBwd) {
......@@ -2268,11 +1213,9 @@ void testMaxOutFwdBwd(
MatrixPtr target = CpuMatrix::create(numSamples, outWidth, false, false);
MatrixPtr targetGpu = GpuMatrix::create(numSamples, outWidth, false, true);
MatrixPtr targetCheck = CpuMatrix::create(numSamples, outWidth, false, false);
IVectorPtr id = CpuIVector::create(numSamples * outWidth, false);
IVectorPtr idGpu = GpuIVector::create(numSamples * outWidth, true);
IVectorPtr idCheck = CpuIVector::create(numSamples * outWidth, false);
input->randomizeUniform();
inputGpu->copyFrom(*input);
......@@ -2280,11 +1223,8 @@ void testMaxOutFwdBwd(
target->maxoutForward(*input, *id, outChannels, groups);
targetGpu->maxoutForward(*inputGpu, *idGpu, outChannels, groups);
// check
targetCheck->copyFrom(*targetGpu);
MatrixCheckErr(*target, *targetCheck);
idCheck->copyFrom(*idGpu);
VectorCheckEqual(*id, *idCheck);
TensorCheckErr(*target, *targetGpu);
TensorCheckEqual(*id, *idGpu);
// backward
MatrixPtr inputGrad = CpuMatrix::create(numSamples, inWidth, false, false);
......@@ -2293,8 +1233,6 @@ void testMaxOutFwdBwd(
MatrixPtr targetGrad = CpuMatrix::create(numSamples, outWidth, false, false);
MatrixPtr targetGpuGrad =
GpuMatrix::create(numSamples, outWidth, false, true);
MatrixPtr targetCheckGrad =
CpuMatrix::create(numSamples, inWidth, false, false);
inputGrad->randomizeUniform();
targetGrad->randomizeUniform();
......@@ -2304,9 +1242,7 @@ void testMaxOutFwdBwd(
inputGrad->maxoutBackward(*targetGrad, *id, outChannels, groups);
inputGpuGrad->maxoutBackward(*targetGpuGrad, *idGpu, outChannels, groups);
// check
targetCheckGrad->copyFrom(*inputGpuGrad);
MatrixCheckErr(*inputGrad, *targetCheckGrad);
TensorCheckErr(*inputGrad, *inputGpuGrad);
}
TEST(Matrix, MaxOutFwdBwd) {
......@@ -2326,113 +1262,6 @@ TEST(Matrix, MaxOutFwdBwd) {
}
}
void testAddSharedBias(int numSamples, int dim, int channel) {
MatrixPtr cpuData = std::make_shared<CpuMatrix>(numSamples, dim);
MatrixPtr gpuData = std::make_shared<GpuMatrix>(numSamples, dim);
MatrixPtr cpuBias = std::make_shared<CpuMatrix>(1, channel);
MatrixPtr gpuBias = std::make_shared<GpuMatrix>(1, channel);
cpuData->randomizeUniform();
gpuData->copyFrom(*cpuData);
cpuBias->randomizeUniform();
gpuBias->copyFrom(*cpuBias);
cpuData->addSharedBias(*cpuBias, 1.0);
gpuData->addSharedBias(*gpuBias, 1.0);
MatrixPtr check = std::make_shared<CpuMatrix>(numSamples, dim);
check->copyFrom(*gpuData);
MatrixCheckErr(*cpuData, *check);
}
void testCollectSharedBias(int numSamples, int dim, int channel) {
MatrixPtr cpuData = std::make_shared<CpuMatrix>(numSamples, dim);
MatrixPtr gpuData = std::make_shared<GpuMatrix>(numSamples, dim);
MatrixPtr cpuBias = std::make_shared<CpuMatrix>(1, channel);
MatrixPtr gpuBias = std::make_shared<GpuMatrix>(1, channel);
cpuData->randomizeUniform();
gpuData->copyFrom(*cpuData);
cpuBias->randomizeUniform();
gpuBias->copyFrom(*cpuBias);
cpuBias->collectSharedBias(*cpuData, 1.0);
gpuBias->collectSharedBias(*gpuData, 1.0);
MatrixPtr check = std::make_shared<CpuMatrix>(1, channel);
check->copyFrom(*gpuBias);
MatrixCheckErr(*cpuBias, *check);
}
TEST(Matrix, sharedBias) {
for (auto numSamples : {1, 100, 520}) {
for (auto dim : {100 * 16, 100 * 32}) {
for (auto channel : {8, 16}) {
VLOG(3) << " numSamples=" << numSamples << " dim=" << dim
<< " channel=" << channel;
testAddSharedBias(numSamples, dim, channel);
testCollectSharedBias(numSamples, dim, channel);
}
}
}
}
void testMultiBinaryLabelCrossEntropy(int numSamples, int dim) {
MatrixPtr output = std::make_shared<CpuMatrix>(numSamples, dim);
MatrixPtr cpuOutput = std::make_shared<CpuMatrix>(numSamples, dim);
MatrixPtr gpuOutput = std::make_shared<GpuMatrix>(numSamples, dim);
MatrixPtr cpuEntropy = std::make_shared<CpuMatrix>(numSamples, 1);
MatrixPtr gpuEntropy = std::make_shared<GpuMatrix>(numSamples, 1);
MatrixPtr cpuGrad = std::make_shared<CpuMatrix>(numSamples, dim);
MatrixPtr gpuGrad = std::make_shared<GpuMatrix>(numSamples, dim);
MatrixPtr cpuLabel = std::make_shared<CpuSparseMatrix>(
numSamples, dim, numSamples, NO_VALUE, SPARSE_CSR, false);
MatrixPtr gpuLabel = std::make_shared<GpuSparseMatrix>(
numSamples, dim, numSamples, NO_VALUE, SPARSE_CSR, false);
for (int i = 0; i < numSamples; i++) {
const unsigned int id = rand() % dim; // NOLINT
cpuLabel->setRow(i, 1, &id, nullptr);
gpuLabel->setRow(i, 1, &id, nullptr);
}
output->randomizeUniform();
cpuOutput->zeroMem();
output->softmax(*cpuOutput);
gpuOutput->copyFrom(*cpuOutput);
cpuEntropy->zeroMem();
gpuEntropy->zeroMem();
cpuEntropy->multiBinaryLabelCrossEntropy(*cpuOutput, *cpuLabel);
gpuEntropy->multiBinaryLabelCrossEntropy(*gpuOutput, *gpuLabel);
MatrixPtr check1 = std::make_shared<CpuMatrix>(numSamples, 1);
check1->copyFrom(*gpuEntropy);
MatrixCheckErr(*cpuEntropy, *check1);
cpuGrad->zeroMem();
gpuGrad->zeroMem();
cpuGrad->multiBinaryLabelCrossEntropyBp(*cpuOutput, *cpuLabel);
gpuGrad->multiBinaryLabelCrossEntropyBp(*gpuOutput, *gpuLabel);
MatrixPtr check2 = std::make_shared<CpuMatrix>(numSamples, dim);
check2->copyFrom(*gpuGrad);
MatrixCheckErr(*cpuGrad, *check2);
}
TEST(Matrix, multiBinaryCrossEntropy) {
for (auto numSamples : {100, 1000, 10000}) {
for (auto dim : {100, 1000, 10000}) {
VLOG(3) << " numSamples=" << numSamples << " dim=" << dim;
testMultiBinaryLabelCrossEntropy(numSamples, dim);
}
}
}
int main(int argc, char** argv) {
testing::InitGoogleTest(&argc, argv);
initMain(argc, argv);
......
......@@ -225,6 +225,8 @@ void Argument::resizeAndCopyFrom(const Argument& src,
}
resizeAndCopy(udp, src.udp, useGpu, stream);
resizeAndCopy(strs, src.strs, useGpu, stream);
frameWidth = src.frameWidth;
frameHeight = src.frameHeight;
}
int32_t Argument::resizeAndCopyFrom(const Argument& src,
......
FROM PADDLE_BASE_IMAGE
MAINTAINER PaddlePaddle Dev Team <paddle-dev@baidu.com>
FROM ubuntu:14.04
MAINTAINER PaddlePaddle Authors <paddle-dev@baidu.com>
# It is good to run apt-get install with Dockerfile RUN directive,
# because if the following invocation to /root/build.sh fails, `docker
# build` wouldn't have to re-install packages after we fix
# /root/build.sh. For more about Docker build cache, please refer to
# https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#/build-cache.
RUN apt-get update && \
apt-get install -y cmake libprotobuf-dev protobuf-compiler git \
RUN apt-get update \
&& apt-get install -y cmake libprotobuf-dev protobuf-compiler git \
libgoogle-glog-dev libgflags-dev libatlas-dev libatlas3-base g++ m4 python-pip \
python-protobuf python-numpy python-dev swig openssh-server \
wget unzip python-matplotlib tar xz-utils bzip2 gzip coreutils \
sed grep graphviz libjpeg-dev zlib1g-dev doxygen && \
apt-get clean -y
RUN pip install BeautifulSoup docopt PyYAML pillow \
'sphinx>=1.4.0' sphinx_rtd_theme breathe recommonmark
sed grep graphviz libjpeg-dev zlib1g-dev doxygen \
clang-3.8 llvm-3.8 libclang-3.8-dev \
&& apt-get clean -y
RUN pip install -U BeautifulSoup docopt PyYAML pillow \
sphinx sphinx_rtd_theme breathe recommonmark
ENV WITH_GPU=PADDLE_WITH_GPU
ENV WITH_AVX=PADDLE_WITH_AVX
ARG WITH_AVX
ENV WITH_AVX=${WITH_AVX:-ON}
ENV WITH_GPU=OFF
RUN mkdir /paddle
COPY . /paddle/
COPY paddle/scripts/docker/build.sh /root/
RUN /root/build.sh
RUN /paddle/paddle/scripts/docker/build.sh
VOLUME ["/usr/share/nginx/html/data", "/usr/share/nginx/html/paddle"]
RUN echo 'export LD_LIBRARY_PATH=/usr/lib64:${LD_LIBRARY_PATH}' >> /etc/profile
RUN pip install /usr/local/opt/paddle/share/wheels/*.whl
......
FROM nvidia/cuda:7.5-cudnn5-devel-ubuntu14.04
MAINTAINER PaddlePaddle Authors <paddle-dev@baidu.com>
RUN apt-get update \
&& apt-get install -y cmake libprotobuf-dev protobuf-compiler git \
libgoogle-glog-dev libgflags-dev libatlas-dev libatlas3-base g++ m4 python-pip \
python-protobuf python-numpy python-dev swig openssh-server \
wget unzip python-matplotlib tar xz-utils bzip2 gzip coreutils \
sed grep graphviz libjpeg-dev zlib1g-dev doxygen \
clang-3.8 llvm-3.8 libclang-3.8-dev \
&& apt-get clean -y
RUN pip install -U BeautifulSoup docopt PyYAML pillow \
sphinx sphinx_rtd_theme breathe recommonmark
ARG WITH_AVX
ENV WITH_AVX=${WITH_AVX:-ON}
ENV WITH_GPU=ON
RUN mkdir /paddle
COPY . /paddle/
RUN /paddle/paddle/scripts/docker/build.sh
VOLUME ["/usr/share/nginx/html/data", "/usr/share/nginx/html/paddle"]
RUN echo 'export LD_LIBRARY_PATH=/usr/lib64:${LD_LIBRARY_PATH}' >> /etc/profile
RUN pip install /usr/local/opt/paddle/share/wheels/*.whl
RUN paddle version # print version after build
# Configure OpenSSH server. cf. https://docs.docker.com/engine/examples/running_ssh_service
RUN mkdir /var/run/sshd
RUN echo 'root:root' | chpasswd
RUN sed -ri 's/^PermitRootLogin\s+.*/PermitRootLogin yes/' /etc/ssh/sshd_config
RUN sed -ri 's/UsePAM yes/#UsePAM yes/g' /etc/ssh/sshd_config
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
......@@ -20,8 +20,28 @@ cmake .. \
-DWITH_AVX=${WITH_AVX} \
-DWITH_SWIG_PY=ON \
-DCUDNN_ROOT=/usr/ \
-DWITH_STYLE_CHECK=OFF
-DWITH_STYLE_CHECK=OFF \
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON
make -j `nproc`
make install
# Install woboq_codebrowser.
git clone https://github.com/woboq/woboq_codebrowser /woboq
cd /woboq
cmake -DLLVM_CONFIG_EXECUTABLE=/usr/bin/llvm-config-3.8 \
-DCMAKE_BUILD_TYPE=Release \
.
make
export WOBOQ_OUT=/usr/share/nginx/html/paddle
export BUILD_DIR=/paddle/build
mkdir -p $WOBOQ_OUT
cp -rv /woboq/data $WOBOQ_OUT/../data
/woboq/generator/codebrowser_generator \
-b /paddle/build \
-a \
-o $WOBOQ_OUT \
-p paddle:/paddle
/woboq/indexgenerator/codebrowser_indexgenerator $WOBOQ_OUT
trap : 0
#!/bin/bash
set -e
cd `dirname $0`
m4 -DPADDLE_WITH_GPU=OFF \
-DPADDLE_WITH_AVX=ON \
-DPADDLE_BASE_IMAGE=ubuntu:14.04 \
Dockerfile.m4 > Dockerfile.cpu
m4 -DPADDLE_WITH_GPU=OFF \
-DPADDLE_WITH_AVX=OFF \
-DPADDLE_BASE_IMAGE=ubuntu:14.04 \
Dockerfile.m4 > Dockerfile.cpu-noavx
m4 -DPADDLE_WITH_GPU=ON \
-DPADDLE_WITH_AVX=ON \
-DPADDLE_BASE_IMAGE=nvidia/cuda:7.5-cudnn5-devel-ubuntu14.04 \
Dockerfile.m4 > Dockerfile.gpu
m4 -DPADDLE_WITH_GPU=ON \
-DPADDLE_WITH_AVX=OFF \
-DPADDLE_BASE_IMAGE=nvidia/cuda:7.5-cudnn5-devel-ubuntu14.04 \
Dockerfile.m4 > Dockerfile.gpu-noavx
FROM paddledev/paddle:cpu-devel-latest
COPY build.sh /
RUN pip install sphinx &&\
pip install sphinx_rtd_theme &&\
apt install -y doxygen graphviz &&\
pip install breathe recommonmark numpy protobuf==2.6.1
CMD /build.sh
......@@ -87,10 +87,8 @@ void Tester::testOneDataBatch(const DataBatch& dataBatch,
void Tester::testOnePeriod() {
DataBatch dataBatch;
int64_t batchSize = config_->getOptConfig().batch_size();
bool testAllData =
intconfig_->testPeriod == 0 || intconfig_->testAllDataInOnePeriod;
int batches =
testAllData ? std::numeric_limits<int>::max() : intconfig_->testPeriod;
int batches = std::numeric_limits<int>::max();
std::vector<Argument> outArgs;
......@@ -102,11 +100,7 @@ void Tester::testOnePeriod() {
if (intconfig_->prevBatchState) {
gradientMachine_->resetState();
}
if (testAllData) {
break;
} else {
num = testDataProvider_->getNextBatch(batchSize, &dataBatch);
}
}
testOneDataBatch(dataBatch, &outArgs);
}
......
......@@ -39,11 +39,6 @@ struct TesterConfig {
*/
int testPeriod;
/**
* indicate whether testing data in one period
*/
bool testAllDataInOnePeriod;
/**
* indicate whether to save previous batch state
*/
......
......@@ -39,20 +39,16 @@ limitations under the License. */
#include "TrainerConfigHelper.h"
P_DEFINE_string(config, "", "Trainer config file");
P_DEFINE_int32(test_period,
0,
"Run test every so many train batches."
" 0 for testing after each pass."
" If not 0, test log_period batches."
" If 0, test on all test data");
P_DEFINE_bool(local, true, "Train in local mode or not");
P_DEFINE_int32(test_period, 0,
"If 0, test on all test data at the end of "
"each pass. If non-zero, test on all test "
"data every test_period batches");
P_DEFINE_bool(test_all_data_in_one_period, false,
"This option is deprecated, since all test data "
"is now always tested");
P_DEFINE_bool(
test_all_data_in_one_period,
false,
"true will test all data in one test period. "
"Otherwise test (batch_size * log_period) data in one test period.");
P_DEFINE_bool(local, true, "Train in local mode or not");
P_DEFINE_int32(average_test_period,
0,
......@@ -633,8 +629,19 @@ void Trainer::test() { tester_->test(); }
std::unique_ptr<TesterConfig> Trainer::createTesterConfig() {
TesterConfig* conf = new TesterConfig;
if (FLAGS_test_period) {
LOG(WARNING)
<< "The meaning of --test_period has changed: "
<< "if 0, test on all test data at the end of "
<< "each pass; if non-zero, test on all test "
<< "data every test_period batches.";
}
if (FLAGS_test_all_data_in_one_period) {
LOG(WARNING)
<< "--test_all_data_in_one_period is deprecated, since "
<< "all test data is now always tested.";
}
conf->testPeriod = FLAGS_test_period;
conf->testAllDataInOnePeriod = FLAGS_test_all_data_in_one_period;
conf->prevBatchState = FLAGS_prev_batch_state;
conf->logPeriod = FLAGS_log_period;
conf->loadsaveParametersInPserver = FLAGS_loadsave_parameters_in_pserver;
......
......@@ -59,7 +59,6 @@ pool = img_pool_layer(input=fc2,
padding_y=2,
stride=2,
stride_y=3,
img_width=3,
pool_type=CudnnAvgPooling())
concat = concat_layer(input=[fc3, fc4])
......
......@@ -77,6 +77,12 @@ message ConvConfig {
required uint32 filter_size_y = 10;
required uint32 padding_y = 11;
required uint32 stride_y = 12;
// if not set, use output_x
optional uint32 output_y = 13;
// if not set, use img_size
optional uint32 img_size_y = 14;
}
message PoolConfig {
......@@ -122,11 +128,9 @@ message PoolConfig {
}
message SppConfig {
required string pool_type = 1;
required uint32 pyramid_height = 2;
required uint32 channels = 3;
required uint32 img_size = 4;
optional uint32 img_size_y = 5;
required ImageConfig image_conf = 1;
required string pool_type = 2;
required uint32 pyramid_height = 3;
}
message NormConfig {
......@@ -156,6 +160,12 @@ message NormConfig {
// fixed window: shared a fixed window for each value
// sliding window: have a different window for each value
optional bool blocked = 8;
// if not set, use output_x
optional uint32 output_y = 9;
// if not set, use img_size
optional uint32 img_size_y = 10;
}
message BlockExpandConfig {
......@@ -180,12 +190,8 @@ message BlockExpandConfig {
}
message MaxOutConfig {
required uint32 channels = 1;
required ImageConfig image_conf = 1;
required uint32 groups = 2;
// The size of input feature map.
required uint32 img_size_x = 3;
required uint32 img_size_y = 4;
}
message ProjectionConfig {
......@@ -226,12 +232,10 @@ message OperatorConfig {
message BilinearInterpConfig {
// The size of input feature map.
optional uint32 img_size_x = 1;
optional uint32 img_size_y = 2;
required ImageConfig image_conf = 1;
// The size of output feature map.
required uint32 out_size_x = 3;
required uint32 out_size_y = 4;
required uint32 num_channels = 5;
required uint32 out_size_x = 2;
required uint32 out_size_y = 3;
}
message ImageConfig {
......@@ -241,6 +245,7 @@ message ImageConfig {
// The size of input feature map.
required uint32 img_size = 8;
required uint32 img_size_y = 9;
}
message LayerInputConfig {
......@@ -414,8 +419,12 @@ sinclude(`ModelConfigLayer.proto.m4')
// to string and reinterpreted in the user's own layer implementation.
optional string user_arg = 49;
// For WarpCTCLayer
optional uint32 blank = 50 [default = 0];
// to indicate rectangle image data
optional uint64 height = 50;
optional uint64 width = 51;
// blank label used in ctc loss
optional uint32 blank = 52 [default = 0];
}
message EvaluatorConfig {
......
......@@ -138,7 +138,14 @@ def init_config_environment(
g_root_submodel=None,
g_submodel_map={},
g_submodel_stack=[],
g_add_submodel_suffix=False, ):
g_add_submodel_suffix=False,
# Whether the current layer needs to pass on the image height and width.
# The default is True, but it becomes False once a recurrent_layer_group
# is encountered. There the image is converted to a sequence: the image
# height becomes the sequence length, and the image width becomes the
# feature length of each timestep.
g_pass_height_width=True, ):
for k, v in locals().iteritems():
globals()[k] = copy.deepcopy(v)
......@@ -686,9 +693,9 @@ class ConvProjection(Projection):
parse_conv(conv_conf, input_layer_name, self.proj_conf.conv_conf,
num_filters)
# TODO: support rectangle input
self.proj_conf.output_size = (self.proj_conf.conv_conf.output_x
**2) * num_filters
self.proj_conf.output_size = self.proj_conf.conv_conf.output_x * \
self.proj_conf.conv_conf.output_y * \
num_filters
def calc_output_size(self, input_layer_config):
return self.proj_conf.output_size
......@@ -764,8 +771,9 @@ class ConvOperator(Operator):
parse_conv(conv_conf,
MakeLayerNameInSubmodel(input_layer_names[0]),
self.operator_conf.conv_conf, num_filters)
self.operator_conf.output_size = (self.operator_conf.conv_conf.output_x
**2) * num_filters
self.operator_conf.output_size = self.operator_conf.conv_conf.output_x * \
self.operator_conf.conv_conf.output_y * \
num_filters
config_assert(len(input_layer_names) == 2, "Conv is binary operator")
......@@ -800,14 +808,12 @@ class Conv(Cfg):
config_assert(output_x <= 0)
# please refer to the comments in proto/ModelConfig.proto
@config_class
class BilinearInterp(Cfg):
def __init__(self, out_size_x=None, out_size_y=None, num_channels=None):
def __init__(self, out_size_x=None, out_size_y=None, channels=None):
self.add_keys(locals())
# please refer to the comments in proto/ModelConfig.proto
@config_class
class Pool(Cfg):
def __init__(
......@@ -825,14 +831,12 @@ class Pool(Cfg):
self.add_keys(locals())
# please refer to the comments in proto/ModelConfig.proto
@config_class
class SpatialPyramidPool(Cfg):
def __init__(self, pool_type, pyramid_height, channels, img_width=None):
def __init__(self, pool_type, pyramid_height, channels):
self.add_keys(locals())
# please refer to the comments in proto/ModelConfig.proto
@config_class
class Norm(Cfg):
def __init__(self,
......@@ -847,7 +851,6 @@ class Norm(Cfg):
self.add_keys(locals())
# please refer to the comments in proto/ModelConfig.proto
@config_class
class Image(Cfg):
def __init__(self, channels, img_size=None):
......@@ -1054,18 +1057,8 @@ def TestData(data_config, async_load_data=None):
g_config.test_data_config.async_load_data = async_load_data
def parse_bilinear(bilinear, input_layer_name, bilinear_conf):
bilinear_conf.out_size_x = bilinear.out_size_x
bilinear_conf.out_size_y = bilinear.out_size_y
bilinear_conf.num_channels = bilinear.num_channels
'''
caffe_mode: compute the output size using floor instead of ceil,
which is consistent with the convention of Caffe and cuDNN.
'''
# caffe_mode: compute the output size using floor instead of ceil,
# which is consistent with the convention of Caffe and cuDNN.
def cnn_output_size(img_size, filter_size, padding, stride, caffe_mode):
output = (2 * padding + img_size - filter_size) / float(stride)
if caffe_mode:
......@@ -1074,20 +1067,34 @@ def cnn_output_size(img_size, filter_size, padding, stride, caffe_mode):
return 1 + int(math.ceil(output))
'''
calculate image_size based on output_size for convolution.
It is the reverse function of cnn_output_size
'''
# Calculate image_size based on output_size for de-convolution (ConvTransLayer).
# It is the reverse function of cnn_output_size.
def cnn_image_size(output_size, filter_size, padding, stride, caffe_mode):
if caffe_mode:
img_size = (output_size - 1) * stride + filter_size - 2 * padding
else:
img_size = (output_size - 2) * stride + filter_size - 2 * padding + 1
if not caffe_mode:
img_size = img_size + 1
return img_size
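A minimal sketch of the size arithmetic above, written as standalone Python 3 rather than as part of the config parser. The names `conv_output_size` and `conv_image_size_caffe` are hypothetical, and the inverse is shown only for `caffe_mode=True`, where the old and new versions of `cnn_image_size` in this diff agree.

```python
import math


def conv_output_size(img_size, filter_size, padding, stride, caffe_mode=True):
    """Output extent along one axis, following cnn_output_size in the diff."""
    output = (2 * padding + img_size - filter_size) / float(stride)
    if caffe_mode:
        # Caffe / cuDNN convention: round down.
        return 1 + int(math.floor(output))
    # Otherwise: round up.
    return 1 + int(math.ceil(output))


def conv_image_size_caffe(output_size, filter_size, padding, stride):
    """Input extent along one axis for caffe_mode=True (illustrative only)."""
    return (output_size - 1) * stride + filter_size - 2 * padding


if __name__ == "__main__":
    # An AlexNet-style first layer: 227 input, 11x11 filter, stride 4.
    print(conv_output_size(227, 11, 0, 4))       # 55
    print(conv_image_size_caffe(55, 11, 0, 4))   # 227
    # floor vs. ceil differ when the stride does not divide evenly.
    print(conv_output_size(32, 3, 0, 2, True))   # 15
    print(conv_output_size(32, 3, 0, 2, False))  # 16
```

Because each axis is computed independently, a rectangular input only requires calling the function twice, once with the x parameters and once with the y parameters, which is what the updated `parse_conv` does for `output_x` and `output_y`.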
def get_img_size(input_layer_name, channels):
input = g_layer_map[input_layer_name]
img_pixels = input.size / channels
img_size = input.width if input.width > 0 else int(img_pixels**0.5)
img_size_y = input.height if input.height > 0 else int(img_pixels /
img_size)
config_assert(
img_size * img_size_y == img_pixels,
"Input layer %s: Incorrect input image size %d * %d for input image pixels %d"
% (input_layer_name, img_size, img_size_y, img_pixels))
return img_size, img_size_y
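The fallback logic of `get_img_size` can be sketched in isolation. This is an illustrative Python 3 version under stated assumptions: the hypothetical `infer_img_size` takes the layer size, width, and height as plain arguments instead of reading them from `g_layer_map`, and it raises `ValueError` where the parser uses `config_assert`.

```python
def infer_img_size(layer_size, channels, width=0, height=0):
    """Return (img_size, img_size_y) for a layer of layer_size values.

    Explicit width/height win; otherwise a square image is assumed.
    """
    img_pixels = layer_size // channels
    img_size = width if width > 0 else int(img_pixels ** 0.5)
    img_size_y = height if height > 0 else img_pixels // img_size
    if img_size * img_size_y != img_pixels:
        raise ValueError(
            "Incorrect input image size %d * %d for input image pixels %d"
            % (img_size, img_size_y, img_pixels))
    return img_size, img_size_y


if __name__ == "__main__":
    # Square image, no explicit dimensions needed.
    print(infer_img_size(3 * 32 * 32, 3))                       # (32, 32)
    # Rectangular image: width and height must be supplied.
    print(infer_img_size(3 * 32 * 48, 3, width=48, height=32))  # (48, 32)
    try:
        infer_img_size(3 * 32 * 48, 3)
    except ValueError as err:
        print("rejected:", err)
```

The last call shows why the data layer gains `height` and `width` arguments in this change: without them a 48 x 32 input cannot be recovered from the pixel count alone, so the square-root guess fails the consistency check.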
def parse_bilinear(bilinear, input_layer_name, bilinear_conf):
parse_image(bilinear, input_layer_name, bilinear_conf.image_conf)
bilinear_conf.out_size_x = bilinear.out_size_x
bilinear_conf.out_size_y = bilinear.out_size_y
def parse_pool(pool, input_layer_name, pool_conf):
pool_conf.pool_type = pool.pool_type
config_assert(pool.pool_type in [
......@@ -1103,14 +1110,8 @@ def parse_pool(pool, input_layer_name, pool_conf):
pool_conf.size_y = default(pool.size_y, pool_conf.size_x)
pool_conf.stride_y = default(pool.stride_y, pool_conf.stride)
img_pixels = g_layer_map[input_layer_name].size / pool.channels
# the img_width may be removed,
# and it can be calculated automatically later.
pool_conf.img_size = default(pool.img_width, int(img_pixels**0.5))
pool_conf.img_size_y = img_pixels / pool_conf.img_size
config_assert(pool_conf.img_size * pool_conf.img_size_y == img_pixels,
"Incorrect input image size %d for input image pixels %d" %
(pool_conf.img_size, img_pixels))
pool_conf.img_size, pool_conf.img_size_y = \
get_img_size(input_layer_name, pool.channels)
config_assert(not pool.start, "start is deprecated in pooling.")
......@@ -1126,29 +1127,18 @@ def parse_pool(pool, input_layer_name, pool_conf):
def parse_spp(spp, input_layer_name, spp_conf):
parse_image(spp, input_layer_name, spp_conf.image_conf)
spp_conf.pool_type = spp.pool_type
config_assert(spp.pool_type in ['max-projection', 'avg-projection'],
"pool-type %s is not in "
"['max-projection', 'avg-projection']" % spp.pool_type)
spp_conf.pyramid_height = spp.pyramid_height
spp_conf.channels = spp.channels
img_pixels = g_layer_map[input_layer_name].size / spp_conf.channels
spp_conf.img_size = default(spp.img_width, int(img_pixels**0.5))
spp_conf.img_size_y = img_pixels / spp_conf.img_size
config_assert(spp_conf.img_size * spp_conf.img_size_y == img_pixels,
"Incorrect input image size %d for input image pixels %d" %
(spp_conf.img_size, img_pixels))
def parse_image(image, input_layer_name, image_conf):
image_conf.channels = image.channels
image_pixels = g_layer_map[input_layer_name].size / image_conf.channels
image_conf.img_size = int(image_pixels**0.5)
config_assert((image_conf.img_size**2) == image_pixels,
"Incorrect input image size %d for input image pixels %d" %
(image_conf.img_size, image_pixels))
image_conf.img_size, image_conf.img_size_y = \
get_img_size(input_layer_name, image_conf.channels)
def parse_norm(norm, input_layer_name, norm_conf):
......@@ -1162,24 +1152,18 @@ def parse_norm(norm, input_layer_name, norm_conf):
norm_conf.pow = norm.pow
norm_conf.blocked = norm.blocked
img_pixels = g_layer_map[input_layer_name].size / norm.channels
norm_conf.img_size = int(img_pixels**0.5)
config_assert((norm_conf.img_size**2) == img_pixels,
"Incorrect input image size %d for input image pixels %d" %
(norm_conf.img_size, img_pixels))
norm_conf.img_size, norm_conf.img_size_y = \
get_img_size(input_layer_name, norm.channels)
norm_conf.output_x = norm_conf.img_size
norm_conf.output_y = norm_conf.img_size_y
if norm.norm_type in ['cmrnorm-projection']:
norm_conf.scale /= norm.size
else:
norm_conf.scale /= norm.size**2
'''
caffe_mode: compute the output size using floor instead of ceil,
which is consistent with the convention of Caffe and cuDNN.
'''
# caffe_mode: compute the output size using floor instead of ceil,
# which is consistent with the convention of Caffe and cuDNN.
def parse_conv(conv, input_layer_name, conv_conf, num_filters, trans=False):
conv_conf.filter_size = conv.filter_size
conv_conf.filter_size_y = conv.filter_size_y
......@@ -1193,33 +1177,24 @@ def parse_conv(conv, input_layer_name, conv_conf, num_filters, trans=False):
if not trans:
conv_conf.filter_channels = conv.channels / conv.groups
img_pixels = g_layer_map[input_layer_name].size / conv.channels
print('channels=%d size=%d' % (conv.channels,
g_layer_map[input_layer_name].size))
conv_conf.img_size = int(img_pixels**0.5)
config_assert((conv_conf.img_size**2) == img_pixels, (
"Input layer %s: Incorrect input image size %d for input " +
"image pixels %d") %
(input_layer_name, conv_conf.img_size, img_pixels))
conv_conf.img_size, conv_conf.img_size_y = \
get_img_size(input_layer_name, conv.channels)
conv_conf.output_x = cnn_output_size(
conv_conf.img_size, conv_conf.filter_size, conv_conf.padding,
conv_conf.stride, conv_conf.caffe_mode)
conv_conf.output_y = cnn_output_size(
conv_conf.img_size_y, conv_conf.filter_size_y, conv_conf.padding_y,
conv_conf.stride_y, conv_conf.caffe_mode)
else:
conv_conf.filter_channels = num_filters / conv.groups
outputSize = g_layer_map[input_layer_name].size / conv.channels
print('channels=%d size=%d' % (conv.channels,
g_layer_map[input_layer_name].size))
conv_conf.output_x = int(outputSize**0.5)
config_assert((conv_conf.output_x**2) == outputSize, (
"Input layer %s: Incorrect input image size %d for input " +
"image pixels %d") %
(input_layer_name, conv_conf.output_x, outputSize))
conv_conf.output_x, conv_conf.output_y = \
get_img_size(input_layer_name, conv.channels)
conv_conf.img_size = cnn_image_size(
conv_conf.output_x, conv_conf.filter_size, conv_conf.padding,
conv_conf.stride, conv_conf.caffe_mode)
conv_conf.img_size_y = cnn_image_size(
conv_conf.output_y, conv_conf.filter_size_y, conv_conf.padding_y,
conv_conf.stride_y, conv_conf.caffe_mode)
def parse_block_expand(block_expand, input_layer_name, block_expand_conf):
......@@ -1248,10 +1223,8 @@ def parse_block_expand(block_expand, input_layer_name, block_expand_conf):
def parse_maxout(maxout, input_layer_name, maxout_conf):
maxout_conf.channels = maxout.channels
parse_image(maxout, input_layer_name, maxout_conf.image_conf)
maxout_conf.groups = maxout.groups
maxout_conf.img_size_x = maxout.img_size_x
maxout_conf.img_size_y = maxout.img_size_y
# Define an evaluator
......@@ -1378,6 +1351,12 @@ class LayerBase(object):
g_current_submodel.layer_names.append(self.config.name)
if self.config.type != 'data' and g_pass_height_width:
height = self.get_input_layer(0).height
width = self.get_input_layer(0).width
if height and width:
self.set_layer_height_width(height, width)
def get_input_layer(self, input_index):
return g_layer_map[self.config.inputs[input_index].input_layer_name]
......@@ -1495,6 +1474,23 @@ class LayerBase(object):
'Different inputs result in ' +
'different layer size at layer %s' % self.config.name)
def set_layer_height_width(self, height, width):
self.config.height = height
self.config.width = width
def set_cnn_layer(self,
input_layer_name,
height,
width,
channels,
is_print=True):
size = height * width * channels
self.set_layer_size(size)
self.set_layer_height_width(height, width)
if is_print:
print("output for %s: c = %d, h = %d, w = %d, size = %d" %
(input_layer_name, channels, height, width, size))
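To illustrate why `set_cnn_layer` computes the size from both dimensions, the following Python 3 sketch compares it with the square-only formula `(output_x**2) * num_filters` that this diff replaces. Both function names are hypothetical and exist only for the comparison.

```python
def cnn_layer_size(height, width, channels):
    """Layer size as computed in set_cnn_layer: height * width * channels."""
    return height * width * channels


def square_layer_size(output_x, channels):
    """The replaced formula, which assumes output_y == output_x."""
    return (output_x ** 2) * channels


if __name__ == "__main__":
    # For a square feature map the two formulas agree.
    print(cnn_layer_size(32, 32, 16), square_layer_size(32, 16))  # 16384 16384
    # For a 32 x 48 feature map the square formula overestimates the size.
    print(cnn_layer_size(32, 48, 16), square_layer_size(48, 16))  # 24576 36864
```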
@config_layer('multi_class_cross_entropy_with_selfnorm')
class MultiClassCrossEntropySelfNormCostLayer(LayerBase):
......@@ -1584,9 +1580,11 @@ class PrintLayer(LayerBase):
@config_layer('data')
class DataLayer(LayerBase):
def __init__(self, name, size, device=None):
def __init__(self, name, size, height=None, width=None, device=None):
super(DataLayer, self).__init__(
name, 'data', size, inputs=[], device=device)
if height and width:
self.set_layer_height_width(height, width)
'''
......@@ -1685,14 +1683,13 @@ class ConvLayerBase(LayerBase):
for input_index in xrange(len(self.inputs)):
input_layer = self.get_input_layer(input_index)
parse_conv(self.inputs[input_index].conv, input_layer.name,
self.config.inputs[input_index].conv_conf, num_filters)
conv_conf = self.config.inputs[input_index].conv_conf
parse_conv(self.inputs[input_index].conv, input_layer.name,
conv_conf, num_filters)
psize = self.calc_parameter_size(conv_conf)
print("output size for %s is %d " % (name, conv_conf.output_x))
self.create_input_parameter(input_index, psize)
self.set_layer_size(
(conv_conf.output_x**2) * self.config.num_filters)
self.set_cnn_layer(name, conv_conf.output_y, conv_conf.output_x,
self.config.num_filters)
psize = self.config.size
if shared_biases:
......@@ -1779,10 +1776,11 @@ class NormLayer(LayerBase):
name, 'norm', 0, inputs=inputs, device=device)
for input_index in xrange(len(self.inputs)):
input_layer = self.get_input_layer(input_index)
parse_norm(self.inputs[input_index].norm, input_layer.name,
self.config.inputs[input_index].norm_conf)
norm_conf = self.config.inputs[input_index].norm_conf
self.set_layer_size((norm_conf.output_x**2) * norm_conf.channels)
parse_norm(self.inputs[input_index].norm, input_layer.name,
norm_conf)
self.set_cnn_layer(name, norm_conf.output_y, norm_conf.output_x,
norm_conf.channels, False)
@config_layer('pool')
......@@ -1792,13 +1790,11 @@ class PoolLayer(LayerBase):
name, 'pool', 0, inputs=inputs, device=device)
for input_index in xrange(len(self.inputs)):
input_layer = self.get_input_layer(input_index)
parse_pool(self.inputs[input_index].pool, input_layer.name,
self.config.inputs[input_index].pool_conf)
pool_conf = self.config.inputs[input_index].pool_conf
print("output size for %s is %d*%d " % (name, pool_conf.output_y,
pool_conf.output_x))
self.set_layer_size(
(pool_conf.output_x * pool_conf.output_y) * pool_conf.channels)
parse_pool(self.inputs[input_index].pool, input_layer.name,
pool_conf)
self.set_cnn_layer(name, pool_conf.output_y, pool_conf.output_x,
pool_conf.channels)
@config_layer('spp')
......@@ -1808,12 +1804,10 @@ class SpatialPyramidPoolLayer(LayerBase):
name, 'spp', 0, inputs=inputs, device=device)
for input_index in xrange(len(self.inputs)):
input_layer = self.get_input_layer(input_index)
parse_spp(self.inputs[input_index].spp, input_layer.name,
self.config.inputs[input_index].spp_conf)
spp_conf = self.config.inputs[input_index].spp_conf
output_size = (pow(4, spp_conf.pyramid_height) - 1) / (4 - 1)
print("output size for %s is %d " % (name, output_size))
self.set_layer_size(output_size * spp_conf.channels)
parse_spp(self.inputs[input_index].spp, input_layer.name, spp_conf)
output_x = (pow(4, spp_conf.pyramid_height) - 1) / (4 - 1)
self.set_cnn_layer(name, 1, output_x, spp_conf.image_conf.channels)
@config_layer('batch_norm')
......@@ -1875,10 +1869,10 @@ class BatchNormLayer(LayerBase):
self.config.moving_average_fraction = moving_average_fraction
input_layer = self.get_input_layer(0)
parse_image(self.inputs[0].image, input_layer.name,
self.config.inputs[0].image_conf)
image_conf = self.config.inputs[0].image_conf
self.set_layer_size((image_conf.img_size**2) * image_conf.channels)
parse_image(self.inputs[0].image, input_layer.name, image_conf)
self.set_cnn_layer(name, image_conf.img_size_y, image_conf.img_size,
image_conf.channels)
psize = self.calc_parameter_size(image_conf)
dims = [1, psize]
......@@ -1936,11 +1930,11 @@ class MaxOutLayer(LayerBase):
super(MaxOutLayer, self).__init__(
name, 'maxout', 0, inputs=inputs, **xargs)
input_layer = self.get_input_layer(0)
parse_maxout(self.inputs[0].maxout, input_layer.name,
self.config.inputs[0].maxout_conf)
maxout_conf = self.config.inputs[0].maxout_conf
self.set_layer_size(g_layer_map[input_layer.name].size /
maxout_conf.groups)
parse_maxout(self.inputs[0].maxout, input_layer.name, maxout_conf)
out_channels = maxout_conf.image_conf.channels / maxout_conf.groups
self.set_cnn_layer(name, g_layer_map[input_layer.name].height,
g_layer_map[input_layer.name].width, out_channels)
# key: cost type
......@@ -2520,11 +2514,10 @@ class BilinearInterpLayer(LayerBase):
super(BilinearInterpLayer, self).__init__(
name, 'bilinear_interp', 0, inputs=inputs, **xargs)
input_layer = self.get_input_layer(0)
parse_bilinear(self.inputs[0].bilinear_interp, input_layer.name,
self.config.inputs[0].bilinear_interp_conf)
conf = self.inputs[0].bilinear_interp
self.set_layer_size(conf.out_size_x * conf.out_size_y *
conf.num_channels)
conf = self.config.inputs[0].bilinear_interp_conf
parse_bilinear(self.inputs[0].bilinear_interp, input_layer.name, conf)
self.set_cnn_layer(name, conf.out_size_y, conf.out_size_x,
conf.image_conf.channels)
@config_layer('sum_to_one_norm')
......@@ -3018,6 +3011,8 @@ class WarpCTCLayer(LayerBase):
@config_layer('recurrent_layer_group')
class RecurrentLayerGroup(LayerBase):
def __init__(self, name, device=None):
global g_pass_height_width
g_pass_height_width = False
super(RecurrentLayerGroup, self).__init__(
name, 'recurrent_layer_group', 0, inputs=[], device=device)
......@@ -3403,6 +3398,20 @@ def parse_config(config_file, config_arg_str):
g_root_submodel.is_recurrent_layer_group = False
g_current_submodel = g_root_submodel
# To support PaddlePaddle on Spark, the config may also be passed as a
# callable instead of a file. For example:
#
# from paddle.trainer.config_parser import parse_config
# def configs():
# # your paddle config code, the same as in a config file.
#
# config = parse_config(configs, "is_predict=1")
# # config is now the parsed config proto object.
if hasattr(config_file, '__call__'):
config_file.func_globals.update(
make_config_environment("", config_args))
config_file()
else:
execfile(config_file, make_config_environment(config_file, config_args))
for k, v in settings.iteritems():
if v is None:
......
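The callable-or-file dispatch added to `parse_config` above can be sketched in isolation. This is a minimal illustration in Python 3, not the code from the commit: `run_config`, `env`, and the toy `data_layer` helper are hypothetical names, `__globals__` stands in for the Python 2 `func_globals`, and `exec(compile(...))` stands in for `execfile`.

```python
import os
import tempfile


def run_config(config, env):
    """Run a config given either as a callable or as a path to a Python file.

    Hypothetical helper that mirrors the branch on hasattr(config_file, '__call__').
    """
    if callable(config):
        # Inject the helper names into the globals the function was defined
        # in, then call it (Python 3 spelling of func_globals.update).
        config.__globals__.update(env)
        config()
    else:
        # Python 3 replacement for execfile(path, env).
        with open(config) as f:
            exec(compile(f.read(), config, "exec"), dict(env))


# Hypothetical environment: one helper that records what the config declares.
seen = []
env = {"data_layer": lambda name, size: seen.append((name, size))}


def configs():
    # data_layer is not defined here; it is resolved via the injected globals.
    data_layer("data", 2304)


run_config(configs, env)

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("data_layer('data', 3200)\n")
run_config(f.name, env)
os.remove(f.name)

print(seen)  # [('data', 2304), ('data', 3200)]
```

One consequence of this design is that updating a function's globals mutates the namespace of the module that defined it, so the injected names stay visible there after the call.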
......@@ -768,7 +768,7 @@ def mixed_layer(size=0,
@layer_support()
def data_layer(name, size, layer_attr=None):
def data_layer(name, size, height=None, width=None, layer_attr=None):
"""
Define DataLayer For NeuralNetwork.
......@@ -783,6 +783,10 @@ def data_layer(name, size, layer_attr=None):
:type name: basestring
:param size: Size of this data layer.
:type size: int
:param height: Height of this data layer, used for image data.
:type height: int|None
:param width: Width of this data layer, used for image data.
:type width: int|None
:param layer_attr: Extra Layer Attribute.
:type layer_attr: ExtraLayerAttribute.
:return: LayerOutput object.
......@@ -792,6 +796,8 @@ def data_layer(name, size, layer_attr=None):
type=LayerType.DATA,
name=name,
size=size,
height=height,
width=width,
**ExtraLayerAttribute.to_kwargs(layer_attr))
return LayerOutput(name, LayerType.DATA, size=size)
......@@ -1485,7 +1491,7 @@ def bilinear_interp_layer(input,
bilinear_interp=BilinearInterp(
out_size_x=out_size_x,
out_size_y=out_size_y,
num_channels=num_channels)),
channels=num_channels)),
type=LayerType.BILINEAR_INTERP_LAYER,
**ExtraLayerAttribute.to_kwargs(layer_attr))
return LayerOutput(
......@@ -1925,8 +1931,7 @@ def img_pool_layer(input,
layer_attr=None,
pool_size_y=None,
stride_y=None,
padding_y=None,
img_width=None):
padding_y=None):
"""
Image pooling Layer.
......@@ -1957,9 +1962,6 @@ def img_pool_layer(input,
:type stride_y: int|None
:param layer_attr: Extra Layer attribute.
:type layer_attr: ExtraLayerAttribute
:param img_width: the width of input feature map. If it is None, the input feature
map should be square.
:type img_width: int|None
:return: LayerOutput object.
:rtype: LayerOutput
"""
......@@ -1995,8 +1997,7 @@ def img_pool_layer(input,
padding=padding,
size_y=pool_size_y,
stride_y=stride_y,
padding_y=padding_y,
img_width=img_width))
padding_y=padding_y))
],
**ExtraLayerAttribute.to_kwargs(layer_attr))
return LayerOutput(
......@@ -2014,7 +2015,6 @@ def spp_layer(input,
num_channels=None,
pool_type=None,
pyramid_height=None,
img_width=None,
layer_attr=None):
"""
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.
......@@ -2031,9 +2031,6 @@ def spp_layer(input,
:type scale: BasePoolingType
:param pyramid_height: pyramid height.
:type pyramid_height: int
:param img_width: the width of input feature map. If it is None, the input feature
map should be square.
:type img_width: int|None
:param layer_attr: Extra Layer Attribute.
:type layer_attr: ExtraLayerAttribute
:return: LayerOutput object.
......@@ -2060,8 +2057,7 @@ def spp_layer(input,
spp=SpatialPyramidPool(
pool_type=type_name,
channels=num_channels,
pyramid_height=pyramid_height,
img_width=img_width)),
pyramid_height=pyramid_height)),
**ExtraLayerAttribute.to_kwargs(layer_attr))
return LayerOutput(
name,
......
......@@ -26,11 +26,15 @@ layers {
filter_size_y: 32
padding_y: 1
stride_y: 1
output_y: 227
img_size_y: 256
}
}
bias_parameter_name: "___conv_0__.wbias"
num_filters: 64
shared_biases: true
height: 227
width: 227
}
layers {
name: "__batch_norm_0__"
......@@ -43,6 +47,7 @@ layers {
image_conf {
channels: 64
img_size: 227
img_size_y: 227
}
}
inputs {
......@@ -55,6 +60,8 @@ layers {
}
bias_parameter_name: "___batch_norm_0__.wbias"
moving_average_fraction: 0.9
height: 227
width: 227
}
layers {
name: "__crmnorm_0__"
......@@ -72,8 +79,12 @@ layers {
output_x: 227
img_size: 227
blocked: false
output_y: 227
img_size_y: 227
}
}
height: 227
width: 227
}
layers {
name: "__pool_0__"
......@@ -97,6 +108,8 @@ layers {
padding_y: 0
}
}
height: 196
width: 196
}
parameters {
name: "___conv_0__.w0"
......
......@@ -26,6 +26,8 @@ layers {
filter_size_y: 32
padding_y: 1
stride_y: 1
output_y: 227
img_size_y: 256
}
}
bias_parameter_name: "___conv_0__.wbias"
......@@ -43,6 +45,7 @@ layers {
image_conf {
channels: 64
img_size: 256
img_size_y: 256
}
}
inputs {
......@@ -55,6 +58,8 @@ layers {
}
bias_parameter_name: "___batch_norm_0__.wbias"
moving_average_fraction: 0.9
height: 256
width: 256
}
layers {
name: "__crmnorm_0__"
......@@ -72,8 +77,12 @@ layers {
output_x: 256
img_size: 256
blocked: false
output_y: 256
img_size_y: 256
}
}
height: 256
width: 256
}
layers {
name: "__pool_0__"
......@@ -97,6 +106,8 @@ layers {
padding_y: 0
}
}
height: 225
width: 225
}
parameters {
name: "___conv_0__.w0"
......
......@@ -177,6 +177,8 @@ layers {
filter_size_y: 3
padding_y: 0
stride_y: 1
output_y: 30
img_size_y: 32
}
num_filters: 64
}
......
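The new `output_y` / `img_size_y` pairs in the protostr hunks above can be checked by hand. The sketch below assumes the usual convolution output arithmetic (floor in Caffe mode, ceil otherwise). `conv_output_size` is a hypothetical helper written for this check, not a function taken from the commit.

```python
import math


def conv_output_size(img_size, filter_size, padding, stride, caffe_mode=True):
    """Output extent along one axis, assuming standard conv arithmetic."""
    span = (2 * padding + img_size - filter_size) / float(stride)
    rounded = math.floor(span) if caffe_mode else math.ceil(span)
    return 1 + int(rounded)


# img_size_y: 256, filter_size_y: 32, padding_y: 1, stride_y: 1
out_a = conv_output_size(256, 32, 1, 1)
# img_size_y: 32, filter_size_y: 3, padding_y: 0, stride_y: 1
out_b = conv_output_size(32, 3, 0, 1)

print(out_a)  # 227
print(out_b)  # 30
```

Both values agree with the `output_y` fields in the hunks. With a stride of 1 the floor and ceil variants give the same result, so the rounding mode does not matter for these two cases.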
......@@ -26,11 +26,15 @@ layers {
filter_size_y: 3
padding_y: 1
stride_y: 1
output_y: 48
img_size_y: 48
}
}
bias_parameter_name: "___conv_0__.wbias"
num_filters: 16
shared_biases: true
height: 48
width: 48
}
layers {
name: "__bilinear_interp_layer_0__"
......@@ -40,11 +44,17 @@ layers {
inputs {
input_layer_name: "__conv_0__"
bilinear_interp_conf {
image_conf {
channels: 16
img_size: 48
img_size_y: 48
}
out_size_x: 64
out_size_y: 64
num_channels: 16
}
}
height: 64
width: 64
}
layers {
name: "__pool_0__"
......@@ -55,19 +65,21 @@ layers {
input_layer_name: "__bilinear_interp_layer_0__"
pool_conf {
pool_type: "max-projection"
channels: 4
channels: 16
size_x: 2
stride: 2
output_x: 64
img_size: 128
output_x: 32
img_size: 64
padding: 0
size_y: 2
stride_y: 2
output_y: 64
img_size_y: 128
output_y: 32
img_size_y: 64
padding_y: 0
}
}
height: 32
width: 32
}
layers {
name: "__fc_layer_0__"
......@@ -78,6 +90,8 @@ layers {
input_layer_name: "__pool_0__"
input_parameter_name: "___fc_layer_0__.w0"
}
height: 32
width: 32
}
parameters {
name: "___conv_0__.w0"
......
......@@ -4,6 +4,8 @@ layers {
type: "data"
size: 2304
active_type: ""
height: 48
width: 48
}
layers {
name: "__conv_0__"
......@@ -26,11 +28,15 @@ layers {
filter_size_y: 3
padding_y: 1
stride_y: 1
output_y: 48
img_size_y: 48
}
}
bias_parameter_name: "___conv_0__.wbias"
num_filters: 16
shared_biases: true
height: 48
width: 48
}
layers {
name: "__maxout_layer_0__"
......@@ -40,12 +46,16 @@ layers {
inputs {
input_layer_name: "__conv_0__"
maxout_conf {
image_conf {
channels: 16
img_size: 48
img_size_y: 48
}
groups: 2
img_size_x: 0
img_size_y: 0
}
}
height: 48
width: 48
}
layers {
name: "__pool_0__"
......@@ -69,48 +79,58 @@ layers {
padding_y: 0
}
}
height: 24
width: 24
}
layers {
name: "__conv_1__"
type: "exconv"
size: 18432
size: 73728
active_type: ""
inputs {
input_layer_name: "__pool_0__"
input_parameter_name: "___conv_1__.w0"
conv_conf {
filter_size: 3
channels: 32
channels: 8
stride: 1
padding: 1
groups: 1
filter_channels: 32
output_x: 12
img_size: 12
filter_channels: 8
output_x: 24
img_size: 24
caffe_mode: true
filter_size_y: 3
padding_y: 1
stride_y: 1
output_y: 24
img_size_y: 24
}
}
bias_parameter_name: "___conv_1__.wbias"
num_filters: 128
shared_biases: true
height: 24
width: 24
}
layers {
name: "__maxout_layer_1__"
type: "maxout"
size: 9216
size: 18432
active_type: ""
inputs {
input_layer_name: "__conv_0__"
input_layer_name: "__conv_1__"
maxout_conf {
image_conf {
channels: 128
img_size: 24
img_size_y: 24
}
groups: 4
img_size_x: 0
img_size_y: 0
}
}
height: 24
width: 24
}
layers {
name: "__block_expand_layer_0__"
......@@ -118,7 +138,7 @@ layers {
size: 192
active_type: ""
inputs {
input_layer_name: "__maxout_layer_0__"
input_layer_name: "__maxout_layer_1__"
block_expand_conf {
channels: 32
stride_x: 1
......@@ -133,6 +153,8 @@ layers {
img_size_y: 0
}
}
height: 24
width: 24
}
layers {
name: "__fc_layer_0__"
......@@ -143,6 +165,8 @@ layers {
input_layer_name: "__block_expand_layer_0__"
input_parameter_name: "___fc_layer_0__.w0"
}
height: 24
width: 24
}
parameters {
name: "___conv_0__.w0"
......@@ -164,9 +188,9 @@ parameters {
}
parameters {
name: "___conv_1__.w0"
size: 36864
size: 9216
initial_mean: 0.0
initial_std: 0.0833333333333
initial_std: 0.166666666667
initial_strategy: 0
initial_smart: false
}
......
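The size changes in the maxout protostr above follow from two rules visible in the parser diff: a CNN layer's size is `height * width * channels`, and a maxout layer divides its channel count by `groups`. The sketch below replays that arithmetic. The helper names are hypothetical, and the weight-count formula (`filter_size**2 * input_channels * num_filters` for `groups: 1`) is the standard one, assumed here rather than read from the commit.

```python
def cnn_layer_size(height, width, channels):
    """Flattened layer size, as in set_cnn_layer: h * w * c."""
    return height * width * channels


def maxout_channels(channels, groups):
    """Channels left after maxout takes the max over each group."""
    assert channels % groups == 0, "channels must be divisible by groups"
    return channels // groups


# __maxout_layer_0__: 16 input channels, groups 2
pool_channels = maxout_channels(16, 2)

# __conv_1__: 24x24 output, 128 filters, 3x3 kernel over pool_channels inputs
conv1_size = cnn_layer_size(24, 24, 128)
conv1_weights = 3 * 3 * pool_channels * 128

# __maxout_layer_1__: 128 input channels, groups 4, 24x24
maxout1_size = cnn_layer_size(24, 24, maxout_channels(128, 4))

print(pool_channels)  # 8
print(conv1_size)     # 73728
print(conv1_weights)  # 9216
print(maxout1_size)   # 18432
```

These match the updated `channels: 8`, `size: 73728`, `size: 9216`, and `size: 18432` entries, which is why fixing the test config from `num_channels=32` to `num_channels=8` changes so many lines of the expected output.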
......@@ -4,6 +4,8 @@ layers {
type: "data"
size: 3200
active_type: ""
height: 20
width: 10
}
layers {
name: "__spp_0__"
......@@ -13,13 +15,17 @@ layers {
inputs {
input_layer_name: "data"
spp_conf {
pool_type: "max-projection"
pyramid_height: 2
image_conf {
channels: 16
img_size: 10
img_size_y: 20
}
pool_type: "max-projection"
pyramid_height: 2
}
}
height: 1
width: 5
}
input_layer_names: "data"
output_layer_names: "__spp_0__"
......
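The `height: 1` and `width: 5` values in the SPP protostr above come from the `output_x` expression in `SpatialPyramidPoolLayer`. A small sketch of that arithmetic, written for Python 3 (so `//` replaces the Python 2 integer `/`); `spp_output_bins` is a hypothetical name, and the per-level bin count of `4**level` is the assumption behind the closed form.

```python
def spp_output_bins(pyramid_height):
    """Total bins per channel: (4**h - 1) / (4 - 1), a geometric series."""
    return (4 ** pyramid_height - 1) // (4 - 1)


pyramid_height = 2
channels = 16

bins = spp_output_bins(pyramid_height)
# Same total, summed level by level: level l contributes 4**l bins.
bins_by_level = sum(4 ** level for level in range(pyramid_height))
layer_size = 1 * bins * channels  # height * width * channels

print(bins)           # 5
print(bins_by_level)  # 5
print(layer_size)     # 80
```

With `pyramid_height: 2` the layer is laid out as a 1 x 5 map per channel, which is what the expected output records as `height: 1` and `width: 5`.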
......@@ -5,7 +5,7 @@ set -e
protostr=`dirname $0`/protostr
files=`ls $protostr | grep -v "unitest"`
files=`ls $protostr | grep -v "unittest"`
./generate_protostr.sh
......
......@@ -17,7 +17,7 @@ bilinear = bilinear_interp_layer(input=conv, out_size_x=64, out_size_y=64)
pool = img_pool_layer(
input=bilinear,
num_channels=4,
num_channels=16,
pool_size=2,
stride=2,
pool_type=MaxPooling())
......
......@@ -2,7 +2,7 @@ from paddle.trainer_config_helpers import *
settings(batch_size=1000, learning_rate=1e-5)
data = data_layer(name='data', size=2304)
data = data_layer(name='data', size=2304, height=48, width=48)
conv = img_conv_layer(
input=data,
......@@ -21,16 +21,21 @@ pool = img_pool_layer(
conv2 = img_conv_layer(
input=pool,
filter_size=3,
num_channels=32,
num_channels=8,
num_filters=128,
padding=1,
act=LinearActivation(),
bias_attr=True)
maxout2 = maxout_layer(input=conv, num_channels=128, groups=4)
maxout2 = maxout_layer(input=conv2, num_channels=128, groups=4)
block = block_expand_layer(
input=maxout, num_channels=32, stride_x=1, stride_y=1, block_x=1, block_y=6)
input=maxout2,
num_channels=32,
stride_x=1,
stride_y=1,
block_x=1,
block_y=6)
fc = fc_layer(input=block, size=384, bias_attr=False)
......
......@@ -2,13 +2,9 @@ from paddle.trainer_config_helpers import *
settings(batch_size=100, learning_rate=1e-5)
data = data_layer(name='data', size=3200)
data = data_layer(name='data', size=3200, height=20, width=10)
spp = spp_layer(
input=data,
pyramid_height=2,
num_channels=16,
pool_type=MaxPooling(),
img_width=10)
input=data, pyramid_height=2, num_channels=16, pool_type=MaxPooling())
outputs(spp)