• D
    hugetlbfs: parallelize hugetlbfs_fallocate with ktask · 4733c59f
    Daniel Jordan 提交于
    hulk inclusion
    category: feature
    bugzilla: 13228
    CVE: NA
    ---------------------------
    
    hugetlbfs_fallocate preallocates huge pages to back a file in a
    hugetlbfs filesystem.  The time to call this function grows linearly
    with size.
    
    ktask performs well with its default thread count of 4; higher thread
    counts are given for context only.
    
    Machine: Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz, 288 CPUs, 1T memory
    Test:    fallocate(1) a file on a hugetlbfs filesystem
    
    nthread   speedup   size (GiB)   min time (s)   stdev
          1                    200         127.53    2.19
          2     3.09x          200          41.30    2.11
          4     5.72x          200          22.29    0.51
          8     9.45x          200          13.50    2.58
         16     9.74x          200          13.09    1.64
    
          1                    400         193.09    2.47
          2     2.14x          400          90.31    3.39
          4     3.84x          400          50.32    0.44
          8     5.11x          400          37.75    1.23
         16     6.12x          400          31.54    3.13
    
    The primary bottleneck for better scaling at higher thread counts is
    hugetlb_fault_mutex_table[hash].  perf showed L1-dcache-loads increase
    with 8 threads and again sharply with 16 threads, and a CPU counter
    profile showed that 31% of the L1d misses were on
    hugetlb_fault_mutex_table[hash] in the 16-thread case.
    Signed-off-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: NHongbo Yao <yaohongbo@huawei.com>
    Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
    Tested-by: NHongbo Yao <yaohongbo@huawei.com>
    Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
    4733c59f
inode.c 39.0 KB