Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into doc

d7d3b411 · Kavya Srinet · c806eeff · 5dbd5370 · d7d3b411

Showing 78 additions and 0 deletions

benchmark/cluster/README.md +78 -0

--- a/benchmark/cluster/README.md
+++ b/benchmark/cluster/README.md
+# Cluster Training Benchmark
+## Setup
+- Platform
+  - Kubernetes: v1.6.2
+  - Linux Kernel: v3.10.0
+- Resources
+  - CPU: 10 Cores per Pod
+  - Memory: 5GB per Pod
+- Docker Images
+  We use a different base Docker image for each framework to run the benchmark on Kubernetes:
+  - PaddlePaddle v2: paddlepaddle/paddle:0.11.0
+  - PaddlePaddle Fluid: paddlepaddle/paddle:[commit-id]
+  - TensorFlow: tensorflow/tensorflow:1.5.0-rc0
+- Model
+  VGG16 is used in this benchmark.
+## Cases
+- Variables
+  - The batch size of the training data.
+  - The number of parameter servers (PServers) in the training job.
+  - The number of trainers.
+- Invariants
+  - The resources allocated to each trainer and PServer Pod.
+### Measure the Performance for Different Batch Sizes
+- PServer Count: 40
+- Trainer Count: 100
+- Metrics: mini-batch / sec
+
+| Batch Size | 32 | 64 | 128 | 256 |
+| -- | -- | -- | -- | -- |
+| PaddlePaddle Fluid | - | - | - | - |
+| PaddlePaddle v2 | - | - | - | - |
+| TensorFlow | - | - | - | - |
+### Measure the Performance for Different PServer Counts
+- Trainer Count: 100
+- Batch Size: 64
+- Metrics: mini-batch / sec
+
+| PServer Count | 10 | 20 | 40 | 60 |
+| -- | -- | -- | -- | -- |
+| PaddlePaddle Fluid | - | - | - | - |
+| PaddlePaddle v2 | - | - | - | - |
+| TensorFlow | - | - | - | - |
+### Measure Parallel Efficiency by Increasing Trainer Count
+- PServer Count: 20
+- Batch Size: 64
+- Metrics:
+
+  $S = \frac{T_1}{T_N}$
+
+  where $S$ is the speedup: the ratio of $T_1$ to $T_N$, the training times with 1 and $N$ trainers, respectively.
+  The parallel efficiency is:
+
+  $E = \frac{S}{N}$
+
+| Trainer Count | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
+| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
+| PaddlePaddle Fluid | - | - | - | - | - | - | - | - | - | - | - |
+| PaddlePaddle v2 | - | - | - | - | - | - | - | - | - | - | - |
+| TensorFlow | - | - | - | - | - | - | - | - | - | - | - |
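+
+Substituting the speedup into the efficiency formula expresses $E$ directly in terms of the two measured training times. Under ideal linear scaling, $N$ trainers finish in $T_1 / N$, so the speedup equals $N$ and the efficiency equals 1; values of $E$ below 1 indicate sub-linear scaling:
+
+$$E = \frac{S}{N} = \frac{T_1}{N \cdot T_N}, \qquad T_N = \frac{T_1}{N} \implies S = N,\; E = 1$$
+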
+## Reproduce the benchmark
+TODO