# Cluster Training Benchmark

## Setup

- Platform
  - Kubernetes: v1.6.2
  - Linux Kernel: v3.10.0

- Resource
  - CPU: 10 Cores per Pod
  - Memory: 5GB per Pod

- Docker Image

  We use different base Docker images to run the benchmark on Kubernetes:
  - PaddlePaddle v2: paddlepaddle/paddle:0.11.0
  - PaddlePaddle Fluid: paddlepaddle/paddle:[commit-id]
  - TensorFlow: tensorflow/tensorflow:1.5.0-rc0

- Model

  vgg16 is used in this benchmark.

## Cases

- Variable
  - Batch size of the training data.
  - PServer count of the training job.
  - Trainer count of the training job.
- Invariant
  - The resources of the trainer/pserver Pods.

### Measure the Performance for Different Batch Sizes

- PServer Count: 40
- Trainer Count: 100
- Metrics: mini-batch / sec

| Batch Size | 32 | 64 | 128 | 256 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - |
| TensorFlow | - | - | - | - |

### Measure the Performance for Different PServer Counts

- Trainer Count: 100
- Batch Size: 64
- Metrics: mini-batch / sec

| PServer Count | 10 | 20 | 40 | 60 |
| -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - |
| TensorFlow | - | - | - | - |

### Measure Parallel Efficiency by Increasing the Trainer Count

- PServer Count: 20
- Batch Size: 64
- Metrics: speedup $S = \frac{T_1}{T_N}$, where $T_1$ and $T_N$ are the training times with 1 and $N$ trainers respectively. The parallel efficiency is $E = \frac{S}{N}$.

| Trainer Count | 1 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| PaddlePaddle Fluid | - | - | - | - | - | - | - | - | - | - | - |
| PaddlePaddle v2 | - | - | - | - | - | - | - | - | - | - | - |
| TensorFlow | - | - | - | - | - | - | - | - | - | - | - |

## Reproduce the benchmark

TODO
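Once the training times are measured, the speedup and parallel-efficiency metrics above can be computed with a short script. This is a minimal sketch; the timing values in the example are hypothetical placeholders, not benchmark results.

```python
def speedup(t1, tn):
    """Speedup S = T1 / TN, where T1 and TN are the training
    times with 1 and N trainers respectively."""
    return t1 / tn

def parallel_efficiency(t1, tn, n):
    """Parallel efficiency E = S / N for N trainers."""
    return speedup(t1, tn) / n

# Hypothetical timings (seconds per epoch), for illustration only.
t1 = 1000.0   # training time with 1 trainer
t10 = 120.0   # training time with 10 trainers

s = speedup(t1, t10)                    # S for N = 10
e = parallel_efficiency(t1, t10, 10)    # E = S / 10
print(f"S = {s:.3f}, E = {e:.3f}")
```

An efficiency of 1.0 would mean perfect linear scaling; values below 1.0 reflect communication and synchronization overhead as trainers are added.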