# Cluster Training Benchmark
## Setup
- Platform
  - Kubernetes: v1.6.2
  - Linux Kernel: v3.10.0
- Resource
  - CPU: 10 Cores per Pod
  - Memory: 5GB per Pod
- Docker Image
  We use a different base Docker image for each framework when running the benchmark on Kubernetes:
  - PaddlePaddle v2: paddlepaddle/paddle:0.11.0
  - PaddlePaddle Fluid: paddlepaddle/paddle:[commit-id]
  - TensorFlow: tensorflow/tensorflow:1.5.0-rc0
- Model
  VGG-16 is used in this benchmark.
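The per-Pod resource limits above could be expressed in a Pod spec roughly as follows. This is only a sketch: the Pod/container names are illustrative, and a real training job would also need command, volume, and networking configuration.

```yaml
# Sketch of a trainer Pod with the benchmark's resource settings
# (metadata names are hypothetical; the image is from the Setup section).
apiVersion: v1
kind: Pod
metadata:
  name: vgg16-trainer
spec:
  containers:
  - name: trainer
    image: paddlepaddle/paddle:0.11.0
    resources:
      requests:
        cpu: "10"       # 10 cores per Pod
        memory: 5Gi     # 5GB per Pod
      limits:
        cpu: "10"
        memory: 5Gi
```

Setting `requests` equal to `limits` gives each Pod a guaranteed, fixed resource allocation, which keeps the "Invariant" side of the benchmark stable.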
## Cases
- Variables
  - Batch size of the training data.
  - PServer count of the training job.
  - The number of trainers.
- Invariants
  - The resources of each trainer/pserver Pod.
### Measure the Performance for Different Batch Size
- PServer Count: 40
- Trainer Count: 100
- Metrics: mini-batches / sec
| Batch Size         | 32 | 64 | 128 | 256 |
| ------------------ | -- | -- | --- | --- |
| PaddlePaddle Fluid | -  | -  | -   | -   |
| PaddlePaddle v2    | -  | -  | -   | -   |
| TensorFlow         | -  | -  | -   | -   |
### Measure the Performance for Different PServer Count
- Trainer Count: 100
- Batch Size: 64
- Metrics: mini-batches / sec
| PServer Count      | 10 | 20 | 40 | 60 |
| ------------------ | -- | -- | -- | -- |
| PaddlePaddle Fluid | -  | -  | -  | -  |
| PaddlePaddle v2    | -  | -  | -  | -  |
| TensorFlow         | -  | -  | -  | -  |
### Measure Parallel Efficiency By Increasing Trainer Count
- PServer Count: 20
- Batch Size: 64
- Metrics: speedup and parallel efficiency.
The speedup is
$S = \frac{T_1}{T_N}$
where $T_1$ and $T_N$ are the training times with 1 and $N$ trainers respectively.
The parallel efficiency is
$E = \frac{S}{N}$
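The speedup and efficiency metrics above can be computed directly from measured training times. A minimal sketch, with made-up example timings (not benchmark results):

```python
def speedup(t1, tn):
    """Speedup S = T1 / TN: training time with 1 trainer over time with N trainers."""
    return t1 / tn

def efficiency(t1, tn, n):
    """Parallel efficiency E = S / N: speedup normalized by trainer count."""
    return speedup(t1, tn) / n

# Hypothetical timings, seconds per pass (illustrative only):
t1 = 1000.0    # 1 trainer
t10 = 120.0    # 10 trainers
print(speedup(t1, t10))         # about 8.33x
print(efficiency(t1, t10, 10))  # about 0.83 (83% efficiency)
```

An efficiency near 1.0 means near-linear scaling; values well below 1.0 indicate communication or PServer overhead growing with trainer count.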
| Trainer Count      | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
| ------------------ | - | -- | -- | -- | -- | -- | -- | -- | -- | -- | --- |
| PaddlePaddle Fluid | - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -   |
| PaddlePaddle v2    | - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -   |
| TensorFlow         | - | -  | -  | -  | -  | -  | -  | -  | -  | -  | -   |
## Reproduce the Benchmark
TODO