# Cluster Training Benchmark

## Setup

- Platform
  - Kubernetes: v1.6.2
  - Linux Kernel: v3.10.0

- Resource
  - CPU: 10 cores per Pod
  - Memory: 5GB per Pod

- Docker Image

  We use different base Docker images to run the benchmark on Kubernetes:

  - PaddlePaddle v2: paddlepaddle/paddle:0.11.0
  - PaddlePaddle Fluid: paddlepaddle/paddle:[commit-id]
  - TensorFlow: tensorflow/tensorflow:1.5.0-rc0

- Model

  vgg16 is used in this benchmark.

## Cases

- Variable
  - Batch size of the training data.
  - PServer count of the training job.
  - The number of trainers.

- Invariant
  - The resources of the trainer/pserver Pods.

### Measure the Performance for Different Batch Sizes

- PServer Count: 40
- Trainer Count: 100
- Metrics: mini-batch / sec (see the measurement sketch below)
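Throughput for all cases is reported as mini-batches per second. A minimal sketch of how this could be measured, where `train_one_batch` is a hypothetical callable that runs a single vgg16 training step in whichever framework is being benchmarked:

```python
import time

def minibatches_per_sec(train_one_batch, num_batches=100):
    """Time num_batches consecutive steps and return mini-batch / sec.

    train_one_batch is a hypothetical placeholder for one training
    step; it is not part of any framework's public API.
    """
    start = time.time()
    for _ in range(num_batches):
        train_one_batch()
    return num_batches / (time.time() - start)
```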
| Batch Size         | 32  | 64  | 128 | 256 |
| ------------------ | --- | --- | --- | --- |
| PaddlePaddle Fluid | -   | -   | -   | -   |
| PaddlePaddle v2    | -   | -   | -   | -   |
| TensorFlow         | -   | -   | -   | -   |
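Every case holds the Pod resources from Setup fixed (10 CPU cores, 5GB memory). A minimal sketch, assuming the official `kubernetes` Python client, of pinning these limits on a trainer Pod; the Pod/container names and the `5Gi` quantity are illustrative:

```python
from kubernetes import client

# Resource invariant from Setup: 10 CPU cores, 5GB memory per Pod.
# "5Gi" is our rendering of the 5GB figure.
resources = client.V1ResourceRequirements(
    requests={"cpu": "10", "memory": "5Gi"},
    limits={"cpu": "10", "memory": "5Gi"},
)

# Trainer Pod skeleton; names are hypothetical.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="vgg16-trainer-0"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="paddlepaddle/paddle:0.11.0",
                resources=resources,
            )
        ],
    ),
)
```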
### Measure the Performance for Different PServer Counts

- Trainer Count: 100
- Batch Size: 64
- Metrics: mini-batch / sec
| PServer Count      | 10  | 20  | 40  | 60  |
| ------------------ | --- | --- | --- | --- |
| PaddlePaddle Fluid | -   | -   | -   | -   |
| PaddlePaddle v2    | -   | -   | -   | -   |
| TensorFlow         | -   | -   | -   | -   |
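The PServer counts in the table could be swept with a small driver script. `launch_job` and `measure_throughput` below are hypothetical placeholders for the actual cluster launcher, not real PaddlePaddle or TensorFlow APIs:

```python
# Sweep the pserver counts from the table above while holding the
# trainer count and batch size fixed, as this case requires.
TRAINERS = 100
BATCH_SIZE = 64

def sweep_pservers(launch_job, measure_throughput):
    results = {}
    for pservers in (10, 20, 40, 60):
        job = launch_job(pservers=pservers, trainers=TRAINERS,
                         batch_size=BATCH_SIZE)
        results[pservers] = measure_throughput(job)  # mini-batch / sec
    return results
```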
### Measure Parallel Efficiency By Increasing Trainer Count

- PServer Count: 20
- Batch Size: 64
- Metrics: speedup $S = \frac{T_1}{T_N}$, where $T_1$ and $T_N$ are the training times with 1 and $N$ trainers respectively. The parallel efficiency is then $E = \frac{S}{N}$.
| Trainer Count      | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
| ------------------ | - | -- | -- | -- | -- | -- | -- | -- | -- | -- | --- |
| PaddlePaddle Fluid | - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -   |
| PaddlePaddle v2    | - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -   |
| TensorFlow         | - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -   |
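For reference, a small helper that evaluates the two formulas above from measured training times; the numbers in the comment are illustrative, not benchmark results:

```python
def speedup(t1, tn):
    """S = T1 / TN, where T1 and TN are the training times
    with 1 and N trainers."""
    return t1 / tn

def efficiency(t1, tn, n):
    """E = S / N; 1.0 means perfectly linear scaling."""
    return speedup(t1, tn) / n

# Example: if 1 trainer takes 1000s and 10 trainers take 120s,
# then S = 8.33 and E = 0.83.
```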
## Reproduce the Benchmark

TODO