use prefetch to load next mem into cache (!21206) · Merge request · PaddlePaddle / Paddle

Created by: LeoZhao-Intel

The memcpy usage in hash_embedding_ff of Pyramid DNN is a typical random memory access case and causes frequent cache misses. Before each memcpy, a hash function calculates the position of the next memory read. These positions are not contiguous, so almost every copy has to bring a new cache line into L1 before it can read the next memory block. That costs time and makes memcpy performance poor.

The good thing is that the hashed position is predictable. We can call XXH32() to compute the next position before the current memcpy runs, then call prefetch() so the system preloads that memory into cache during the current memcpy. This saves time and reduces the cache miss rate.
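
To illustrate the idea, here is a minimal sketch, not the code from this merge request. It assumes a GCC/Clang compiler (for `__builtin_prefetch`), and `toy_hash` and `gather_rows_with_prefetch` are hypothetical names; `toy_hash` is only a stand-in for XXH32() so the example has no external dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for XXH32(): any deterministic hash works here.
inline std::uint32_t toy_hash(std::uint32_t key, std::uint32_t seed) {
  std::uint32_t h = key * 2654435761u + seed;
  h ^= h >> 15;
  h *= 2246822519u;
  h ^= h >> 13;
  return h;
}

// Copies one row of `row_len` floats per key from `table` into `out`.
// The row index comes from the hash, so reads land at scattered addresses.
inline void gather_rows_with_prefetch(const std::vector<float>& table,
                                      std::size_t row_len,
                                      const std::vector<std::uint32_t>& keys,
                                      std::uint32_t seed,
                                      std::vector<float>* out) {
  assert(row_len > 0 && table.size() >= row_len);
  const std::size_t num_rows = table.size() / row_len;
  out->resize(keys.size() * row_len);
  if (keys.empty()) return;

  // Position of the first row; later positions are computed one step ahead.
  std::size_t pos = toy_hash(keys[0], seed) % num_rows;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    std::size_t next_pos = pos;
    if (i + 1 < keys.size()) {
      // Hash the next key now, before the current copy starts.
      next_pos = toy_hash(keys[i + 1], seed) % num_rows;
#if defined(__GNUC__) || defined(__clang__)
      // Hint: read access (0), keep in all cache levels (3).
      __builtin_prefetch(table.data() + next_pos * row_len, 0, 3);
#endif
    }
    // The current copy overlaps with the prefetch issued above.
    std::memcpy(out->data() + i * row_len, table.data() + pos * row_len,
                row_len * sizeof(float));
    pos = next_pos;
  }
}
```

The prefetch is only a hint, so the copied data is the same with or without it; any benefit depends on the row size, the table size, and the hardware, and should be measured rather than assumed.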

test=develop
