hugetlbfs: parallelize hugetlbfs_fallocate with ktask
hulk inclusion
category: feature
bugzilla: 13228
CVE: NA
---------------------------
hugetlbfs_fallocate preallocates huge pages to back a file in a
hugetlbfs filesystem. The time to call this function grows linearly
with size.
ktask performs well with its default thread count of 4; higher thread
counts are given for context only.
Machine: Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz, 288 CPUs, 1T memory
Test: fallocate(1) a file on a hugetlbfs filesystem
nthread speedup size (GiB) min time (s) stdev
1 200 127.53 2.19
2 3.09x 200 41.30 2.11
4 5.72x 200 22.29 0.51
8 9.45x 200 13.50 2.58
16 9.74x 200 13.09 1.64
1 400 193.09 2.47
2 2.14x 400 90.31 3.39
4 3.84x 400 50.32 0.44
8 5.11x 400 37.75 1.23
16 6.12x 400 31.54 3.13
The primary bottleneck for better scaling at higher thread counts is
hugetlb_fault_mutex_table[hash]. perf showed L1-dcache-loads increase
with 8 threads and again sharply with 16 threads, and a CPU counter
profile showed that 31% of the L1d misses were on
hugetlb_fault_mutex_table[hash] in the 16-thread case.
Signed-off-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: NHongbo Yao <yaohongbo@huawei.com>
Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
Tested-by: NHongbo Yao <yaohongbo@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Showing
想要评论请 注册 或 登录