Add NHWC datalayout support on conv2d without data transformation. (!20597) · Merge request · PaddlePaddle / Paddle

Add NHWC datalayout support on conv2d without data transformation. !20597

Created by: gongweibao

PR Description:

Previously, conv2d only supported NCHW computation, even when the data format was NHWC: the data was transformed from NHWC to NCHW internally. On Tensor Cores, however, conv2d is faster in NHWC format when the data is fp16, so computing directly in NHWC improves performance when AMP (automatic mixed precision) is used.
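To make the two layouts concrete, here is a minimal pure-Python sketch of how the same element is addressed in a flat NCHW buffer versus a flat NHWC buffer, and of the NHWC to NCHW reordering that was previously applied internally. The function names (`nchw_offset`, `nhwc_offset`, `nhwc_to_nchw`) are hypothetical and chosen only for illustration; they are not taken from the Paddle code base.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in an NCHW buffer (W varies fastest)."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in an NHWC buffer (C varies fastest)."""
    return ((n * H + h) * W + w) * C + c


def nhwc_to_nchw(buf, N, C, H, W):
    """Copy a flat NHWC buffer into a new flat NCHW buffer.

    This is the kind of extra pass over the data that computing
    directly in NHWC avoids.
    """
    out = [None] * len(buf)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = nhwc_offset(n, c, h, w, C, H, W)
                    dst = nchw_offset(n, c, h, w, C, H, W)
                    out[dst] = buf[src]
    return out


# One image, two channels, 2x2 pixels, stored as NHWC.
nhwc = list(range(8))
nchw = nhwc_to_nchw(nhwc, N=1, C=2, H=2, W=2)
print(nchw)  # [0, 2, 4, 6, 1, 3, 5, 7]
```

In NHWC the channel values of one pixel sit next to each other in memory, whereas in NCHW each channel is a separate contiguous plane.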

This PR implements NHWC computation when the data is fp16 and the device has Tensor Cores. The performance results are listed below:

Model	FP32	FP16 NCHW	FP16 NHWC
ResNet50	0.40	0.195(2x)	0.16(2.2x)
VGG16	0.77	0.55(1.4x)	0.32(2.4x)
Inception V4	0.55	0.30(1.83x)	0.27(2x)
GoogleNet	0.12	0.1(1.2x)	0.075(1.6x)
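The condition under which NHWC computation is used (fp16 data on a device with Tensor Cores) could be sketched roughly as follows. This is an illustrative assumption, not the PR's actual code: `choose_compute_layout` is a hypothetical helper, and the Tensor Core check assumes a CUDA compute capability of 7.0 (Volta) or higher.

```python
def choose_compute_layout(data_format, dtype, compute_capability):
    """Pick the layout conv2d computes in (hypothetical helper).

    data_format: layout of the input tensor, "NCHW" or "NHWC".
    dtype: element type as a string, e.g. "float16" or "float32".
    compute_capability: (major, minor) tuple of the CUDA device.
    """
    # Assumption: Tensor Cores are available from compute capability 7.0 on.
    has_tensor_cores = compute_capability >= (7, 0)

    # Compute directly in NHWC only when the input is already NHWC,
    # the data is fp16, and Tensor Cores are present.
    if data_format == "NHWC" and dtype == "float16" and has_tensor_cores:
        return "NHWC"

    # Otherwise fall back to NCHW computation, as before.
    return "NCHW"


print(choose_compute_layout("NHWC", "float16", (7, 0)))  # NHWC
print(choose_compute_layout("NHWC", "float32", (7, 0)))  # NCHW
```

Keeping the fallback path in NCHW means inputs that do not meet all three conditions behave as they did before the change.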

PaddlePaddle / Paddle: synced successfully about 1 year ago
