Benchmark of Fluid CPU multi-thread (#11620) · Issue · PaddlePaddle / Paddle

Benchmark of Fluid CPU multi-thread

Created by: luotao1

Environment

Machine: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, 2 sockets, 40 cores in total
Model: ResNet50
Input: 3 * 224 * 224
Docker images:
- paddlepaddle/paddle:latest (for MKLML)
- paddlepaddle/paddle:latest-openblas (for OpenBLAS)
V2 results: https://github.com/PaddlePaddle/Paddle/blob/develop/benchmark/IntelOptimizedPaddle.md
Scripts: https://github.com/chengduoZH/benchmark/tree/add_resnet_50_v2/fluid/ResNet_50

Training

Ensure that CPU_NUM * batch_size_per_trainer = batch_size. For example, for batch_size=64 with 8 threads, set CPU_NUM=8 and --batch_size_per_trainer=8:

export CPU_NUM=8
python train_resnet.py --display_step=1 --warmup=0 --use_gpu=false --number_iteration=20 --skip_first_steps=5 --batch_size_per_trainer=8
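
The constraint above can be checked before launching a run. The following is a minimal sketch; the helper name `per_trainer_batch_size` is hypothetical and is not part of the benchmark scripts.

```python
def per_trainer_batch_size(batch_size: int, cpu_num: int) -> int:
    """Return the --batch_size_per_trainer value for a given total batch size.

    Enforces CPU_NUM * batch_size_per_trainer == batch_size.
    (Hypothetical helper, not part of train_resnet.py.)
    """
    if cpu_num <= 0:
        raise ValueError("cpu_num must be positive")
    if batch_size % cpu_num != 0:
        raise ValueError(
            f"batch_size={batch_size} is not divisible by CPU_NUM={cpu_num}"
        )
    return batch_size // cpu_num


# The example from the text: batch_size=64 on 8 threads.
print(per_trainer_batch_size(64, 8))
```

Raising an error on a non-divisible batch size, rather than rounding, keeps the effective batch size equal to the one being reported.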

The results (images/second) are as follows:

Cores	1	8	16	32	40
OpenBLAS (Fluid)	1.6342	10.7300	18.9921	30.3432	---
MKLML (Fluid)	2.6408	15.874	28.2036	33.9912	---
OpenBLAS (V2-0.11.0)	---	---	---	---	25.22
MKLML (V2-0.11.0)	---	---	---	---	32.52
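
To read the scaling behaviour off the Fluid rows, the speedup over the 1-core run and the per-core efficiency can be computed as below. This is a rough sketch: it assumes the 1-core column is a fair baseline, and the function names are illustrative only.

```python
# Training throughput (images/second) copied from the Fluid rows of the table.
# The 40-core column is omitted because Fluid has no measurement there.
TRAIN_THROUGHPUT = {
    "OpenBLAS (Fluid)": {1: 1.6342, 8: 10.7300, 16: 18.9921, 32: 30.3432},
    "MKLML (Fluid)": {1: 2.6408, 8: 15.874, 16: 28.2036, 32: 33.9912},
}


def speedup(throughput_by_cores, cores):
    """Throughput at `cores` relative to the 1-core throughput."""
    return throughput_by_cores[cores] / throughput_by_cores[1]


def efficiency(throughput_by_cores, cores):
    """Speedup divided by core count (1.0 would be perfect linear scaling)."""
    return speedup(throughput_by_cores, cores) / cores


for name, row in TRAIN_THROUGHPUT.items():
    for cores in sorted(row):
        print(
            f"{name}: {cores:2d} cores, "
            f"speedup {speedup(row, cores):.2f}x, "
            f"efficiency {efficiency(row, cores):.0%}"
        )
```

On these numbers, MKLML gains noticeably less than OpenBLAS when going from 16 to 32 cores.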

Inference

Keep --batch_size_per_trainer=1 and change only CPU_NUM, so that the batch size equals the number of threads. For example, for batch_size=8 with 8 threads:

export CPU_NUM=8
python train_resnet.py --display_step=1 --warmup=0 --use_gpu=false --number_iteration=20 --skip_first_steps=5 --batch_size_per_trainer=1 --with_test=True
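
The full inference sweep repeats this command with a different CPU_NUM each time. A minimal sketch that only prints the command lines (it does not run them) is shown below. The flags are the ones quoted above. The helper name and the inline `CPU_NUM=... python ...` form, used here in place of a separate `export`, are choices made for this illustration.

```python
# Batch sizes swept in the inference benchmark. With --batch_size_per_trainer=1,
# the batch size equals CPU_NUM.
BATCH_SIZES = [1, 2, 4, 8, 16]


def inference_command(cpu_num: int) -> str:
    """Build the shell command line for one inference run (illustrative helper)."""
    args = [
        "python",
        "train_resnet.py",
        "--display_step=1",
        "--warmup=0",
        "--use_gpu=false",
        "--number_iteration=20",
        "--skip_first_steps=5",
        "--batch_size_per_trainer=1",
        "--with_test=True",
    ]
    # Set CPU_NUM for this single command instead of exporting it.
    return f"CPU_NUM={cpu_num} " + " ".join(args)


for batch_size in BATCH_SIZES:
    print(inference_command(batch_size))
```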

The results (images/second) are as follows:

BatchSize	1	2	4	8	16
OpenBLAS (Fluid)	4.7254	8.6016	16.7441	33.9707	59.2398
MKLML (Fluid)	8.4839	13.60	24.8657	48.3597	74.5651
OpenBLAS (V2-0.11.0)	3.31	6.72	11.59	13.17	9.27
MKLML (V2-0.11.0)	6.33	12.02	22.88	40.53	63.09
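
The Fluid and V2 rows can be compared directly by taking the ratio of their throughputs at each batch size. The sketch below uses only the numbers from the table; the dictionary layout and function name are illustrative.

```python
BATCH_SIZES = [1, 2, 4, 8, 16]

# Inference throughput (images/second) copied from the table above.
INFER_THROUGHPUT = {
    "OpenBLAS": {
        "Fluid": [4.7254, 8.6016, 16.7441, 33.9707, 59.2398],
        "V2-0.11.0": [3.31, 6.72, 11.59, 13.17, 9.27],
    },
    "MKLML": {
        "Fluid": [8.4839, 13.60, 24.8657, 48.3597, 74.5651],
        "V2-0.11.0": [6.33, 12.02, 22.88, 40.53, 63.09],
    },
}


def fluid_over_v2(library: str) -> list:
    """Ratio of Fluid to V2 throughput at each batch size for one BLAS library."""
    rows = INFER_THROUGHPUT[library]
    return [fluid / v2 for fluid, v2 in zip(rows["Fluid"], rows["V2-0.11.0"])]


for library in INFER_THROUGHPUT:
    ratios = fluid_over_v2(library)
    for batch_size, ratio in zip(BATCH_SIZES, ratios):
        print(f"{library}, batch size {batch_size:2d}: Fluid is {ratio:.2f}x V2")
```

With OpenBLAS, the V2 throughput drops between batch sizes 8 and 16 while the Fluid throughput keeps rising, so the ratio is largest at batch size 16.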
