Several convolutional neural networks and recurrent neural networks are used for testing.
## Image
### Benchmark Model
AlexNet, GoogLeNet, and a small network based on the CIFAR-10 configuration in Caffe are used.
- [AlexNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet), but with a group size of one.
...
@@ -38,9 +38,9 @@ AlexNet, GooleNet and a small network which refer the config of cifar10 in Caffe
| TensorFlow | 223 | 364 | 645 | 1235 |
| Caffe | 324 | 627 | 1232 | 2513 |
**Notation**
All platforms use cuDNN v5.1. Caffe is slower in this experiment because the workspace limit in its cuDNN convolution interface is 8 * 1024 * 1024 bytes, which is smaller than the limit used in PaddlePaddle and TensorFlow. Caffe would run faster with a larger workspace limit.
@@ -59,9 +59,9 @@ All platforms use cuDnn-v5.1. You might see that caffe is slower, because the wo
...
| TensorFlow | 9 | 15 | 28 | 59 |
| Caffe | 9.373 | 16.6606 | 31.4797 | 59.719 |
**Notation**
All the Caffe experiments are run with `caffe time`, which does not include the time spent on parameter updates, whereas the PaddlePaddle and TensorFlow timings do include it. Compared with the total time, however, the time spent on parameter updates is relatively small.
TensorFlow implements its own algorithm search method instead of using the algorithm search interface in cuDNN.
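What a timed iteration covers affects the ms / batch figures. As a minimal sketch (the helper `time_per_batch_ms` and the two stand-in step functions are hypothetical and do not belong to any of the benchmarked frameworks), the following Python averages wall-clock time per batch after a few warm-up iterations, so that a forward/backward-only step can be compared with a step that also updates parameters.

```python
import time


def time_per_batch_ms(step, num_batches=20, warmup=5):
    """Return the average wall-clock time of step() in milliseconds."""
    # Warm-up iterations are run first and excluded from the measurement.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(num_batches):
        step()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / num_batches


# Hypothetical stand-ins for a real training iteration.
def forward_backward():
    sum(i * i for i in range(10000))


def forward_backward_update():
    forward_backward()
    sum(i for i in range(1000))  # stands in for the parameter update


if __name__ == "__main__":
    print("forward/backward only: %.3f ms / batch"
          % time_per_batch_ms(forward_backward))
    print("with parameter update: %.3f ms / batch"
          % time_per_batch_ms(forward_backward_update))
```

On a GPU, the device would also need to be synchronized before each clock reading; that step is framework-specific and is left out of this sketch.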
...
@@ -69,13 +69,13 @@ In Tensorflow, they implement algorithm searching method instead of using the al
- AlexNet, ms / batch
| total-BatchSize | 128 * 4 | 256 * 4 |
|------------------|----------| -----------|
| PaddlePaddle | 347 | 622 |
| TensorFlow | 377 | 675 |
| Caffe | 1229 | 2435 |
For example, if `total-BatchSize = 128 * 4`, the speedup ratio is calculated by
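One way to compute such a ratio, stated here as an assumption rather than as the exact formula used for these tables, is to multiply the single-GPU time for the per-GPU batch by the number of GPUs and divide by the multi-GPU time for the total batch. In the sketch below, `speedup_ratio` is a hypothetical helper, the 4-GPU time of 347 ms is the PaddlePaddle AlexNet value from the table above, and the single-GPU time is a placeholder, not a measured result.

```python
def speedup_ratio(single_gpu_ms, multi_gpu_ms, num_gpus):
    """Speedup of num_gpus GPUs over one GPU for the same total work.

    single_gpu_ms: time for one batch of the per-GPU size on a single GPU.
    multi_gpu_ms:  time for one batch of the total size on num_gpus GPUs.
    """
    if multi_gpu_ms <= 0:
        raise ValueError("multi_gpu_ms must be positive")
    return single_gpu_ms * num_gpus / multi_gpu_ms


if __name__ == "__main__":
    single_gpu_ms = 300.0  # placeholder value, NOT taken from the benchmark
    multi_gpu_ms = 347.0   # PaddlePaddle, AlexNet, total batch 128 * 4
    ratio = speedup_ratio(single_gpu_ms, multi_gpu_ms, num_gpus=4)
    print("speedup ratio: %.2f" % ratio)
```

Under this definition a ratio equal to the number of GPUs corresponds to perfect linear scaling.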
- Sequence length is 100. PaddlePaddle supports training with variable-length sequences, but TensorFlow requires padding, so for a fair comparison we also pad sequences to length 100 in PaddlePaddle.
- Dictionary size=30000
- Peephole connection is used in `lstmemory` by default in PaddlePaddle. It is also configured in TensorFlow.
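
Padding to a fixed length, as described in the list above, can be sketched in plain Python. The helper `pad_sequences` and the padding id `0` are assumptions made for illustration; the padding utilities of PaddlePaddle and TensorFlow themselves are not shown here.

```python
def pad_sequences(sequences, max_len=100, pad_id=0):
    """Pad or truncate each sequence of word ids to exactly max_len items."""
    padded = []
    for seq in sequences:
        clipped = list(seq[:max_len])  # truncate sequences that are too long
        padded.append(clipped + [pad_id] * (max_len - len(clipped)))
    return padded


if __name__ == "__main__":
    # Two variable-length sequences of word ids (made-up values).
    batch = [[5, 8, 13], [21, 34, 55, 89, 144]]
    padded = pad_sequences(batch, max_len=100)
    print([len(row) for row in padded])
```

With padding, every sequence in a batch costs the same amount of computation regardless of its original length, which is why both frameworks are run on padded input for the comparison.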
...
@@ -110,7 +110,7 @@ We use lstm network for text classfication to test benchmark.
#### LSTM in Text Classification
Testing a `2 lstm layer + fc` network with different hidden sizes and batch sizes.
- Batch size = 64, ms / batch
...
@@ -138,7 +138,7 @@ Testing network for different hidden size, batch size with `2 lstm layer + fc` n
#### Seq2Seq
The benchmark of the sequence-to-sequence network will be added later.
### Multi GPU: 4 GPUs
...
@@ -165,4 +165,4 @@ The benchmark of sequence-to-sequence network will be add later.
#### Seq2Seq
The benchmark of the sequence-to-sequence network will be added later.