# Cluster Training Benchmark

## Setup

- Platform
  - Kubernetes: v1.6.2
  - Linux Kernel: v3.10.0

- Resource
  - CPU: 10 Cores per Pod
  - Memory: 5GB per Pod

- Docker Image

  We use different base Docker images to run the benchmark on Kubernetes:
  - PaddlePaddle v2: paddlepaddle/paddle:0.11.0
  - PaddlePaddle Fluid: paddlepaddle/paddle:[commit-id]
  - TensorFlow: tensorflow/tensorflow:1.5.0-rc0

- Model

  vgg16 is used in this benchmark.

## Cases

- Variable
  - Batch size of the training data.
  - PServer count of the training job.
  - Trainer count of the training job.
- Invariant
  - The resources of the trainer/pserver Pods.

### Measure the Performance for Different Batch Sizes

- PServer Count: 40
- Trainer Count: 100
- Metrics: mini-batch / sec

| Batch Size | 32 | 64 | 128 | 256 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - |
| TensorFlow | - | - | - | - |

### Measure the Performance for Different PServer Counts

- Trainer Count: 100
- Batch Size: 64
- Metrics: mini-batch / sec

| PServer Count | 10 | 20 | 40 | 60 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - |
| TensorFlow | - | - | - | - |

### Measure Parallel Efficiency by Increasing the Trainer Count

- PServer Count: 20
- Batch Size: 64
- Metrics: speedup $S = \frac{T_1}{T_N}$, where $T_1$ and $T_N$ are the training times with 1 and $N$ trainers respectively. The parallel efficiency is $E = \frac{S}{N}$.

| Trainer Count | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | - | - | - | - | - | - | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - | - | - | - | - | - | - | - |
| TensorFlow | - | - | - | - | - | - | - | - | - | - | - |

## Reproduce the benchmark

TODO
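Once the training times are measured, the speedup and parallel-efficiency metrics above can be computed with a short script. This is a minimal sketch; the timing values in the example are hypothetical placeholders, not benchmark results.

```python
def speedup(t1, tn):
    """Speedup S = T1 / TN, where T1 and TN are the training
    times with 1 and N trainers respectively."""
    return t1 / tn

def parallel_efficiency(t1, tn, n):
    """Parallel efficiency E = S / N for N trainers."""
    return speedup(t1, tn) / n

# Hypothetical timings (seconds per epoch), for illustration only.
t1 = 1000.0   # training time with 1 trainer
t10 = 120.0   # training time with 10 trainers

s = speedup(t1, t10)                    # S for N = 10
e = parallel_efficiency(t1, t10, 10)    # E = S / 10
print(f"S = {s:.3f}, E = {e:.3f}")
```

An efficiency of 1.0 would mean perfect linear scaling; values below 1.0 reflect communication and synchronization overhead as trainers are added.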