Fix DataTransFunc (!10906) · Merge request · PaddlePaddle / Paddle

Fix DataTransFunc !10906

Created by: chengduoZH

TransDataDevice is used to transform data from GPU to CPU, and the enforced checks have already been done in GetDeviceContext, so dev_ctx->Wait() is necessary. However, dev_ctx->Wait() slows the program down, especially when the number of elements is small; for example, the learning rate has a single element and resides on the CPU side. One solution is to use a CUDA kernel to perform the copy when the transfer is from CPU to GPU and the number of elements is small. Unfortunately, this solution makes training slower.
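The size-based dispatch described above can be sketched roughly as follows. This is a host-only illustration, not Paddle's implementation: `FakeDeviceContext`, `CopyPath`, `ChooseCopyPath`, `TransferToDevice`, and the threshold `kSmallCopyThreshold` are all hypothetical names and values introduced here, and plain loops and `memcpy` stand in for the CUDA kernel and the asynchronous device copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a device context; NOT Paddle's DeviceContext.
// It only counts how often the caller had to block on Wait().
struct FakeDeviceContext {
  int wait_calls = 0;
  void Wait() { ++wait_calls; }  // a real context would block until the stream is idle
};

enum class CopyPath { kAsyncCopyThenWait, kSmallKernelCopy };

// Assumed cutoff, chosen only for illustration.
constexpr std::size_t kSmallCopyThreshold = 16;

// Pick the copy strategy from the element count alone.
CopyPath ChooseCopyPath(std::size_t numel) {
  return numel <= kSmallCopyThreshold ? CopyPath::kSmallKernelCopy
                                      : CopyPath::kAsyncCopyThenWait;
}

// Copy src into dst and report which path was taken.
CopyPath TransferToDevice(const std::vector<float>& src,
                          std::vector<float>* dst,
                          FakeDeviceContext* ctx) {
  assert(dst != nullptr && ctx != nullptr);
  dst->resize(src.size());
  const CopyPath path = ChooseCopyPath(src.size());
  if (path == CopyPath::kSmallKernelCopy) {
    // Stand-in for a small CUDA kernel that writes the values itself,
    // so the host never calls Wait() on this path.
    for (std::size_t i = 0; i < src.size(); ++i) {
      (*dst)[i] = src[i];
    }
  } else {
    // Stand-in for an asynchronous device copy followed by a blocking wait.
    std::memcpy(dst->data(), src.data(), src.size() * sizeof(float));
    ctx->Wait();
  }
  return path;
}
```

Under these assumptions, a one-element learning rate takes the kernel path and leaves `wait_calls` at zero, while a 1024-element tensor takes the copy-then-wait path. As the description notes, the real kernel-based variant made training slower, so the sketch only shows the shape of the idea and is not a recommended optimization.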
