    mm: parallelize deferred struct page initialization within each node · eb761d65
    Committed by Daniel Jordan
    hulk inclusion
    category: feature
    bugzilla: 13228
    CVE: NA
    ---------------------------
    
    Deferred struct page initialization currently runs one thread per node,
    but this is a bottleneck during boot on big machines, so use ktask
    within each pgdatinit thread to parallelize the struct page
    initialization, allowing the system to take better advantage of its
    memory bandwidth.
    
    Because the system is not fully up yet and most CPUs are idle, use more
    than the default maximum number of ktask threads.  The kernel doesn't
    know the memory bandwidth of a given system to get the most efficient
    number of threads, so there's some guesswork involved.  In testing, a
    reasonable value turned out to be about a quarter of the CPUs on the
    node.
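
    The quarter-of-the-CPUs heuristic above can be sketched as follows.
    This is only an illustration: the helper name deferred_init_threads()
    is hypothetical and not taken from the patch, and the floor of one
    thread is an assumption made so that small nodes still get a worker.

```c
/*
 * Hypothetical helper (not from the patch): choose how many threads to
 * use for deferred struct page init on one node, following the rule of
 * thumb of about a quarter of the node's CPUs.
 */
static unsigned int deferred_init_threads(unsigned int node_cpus)
{
	unsigned int nr_threads = node_cpus / 4;

	/* Assumption: always run at least one thread per node. */
	return nr_threads > 0 ? nr_threads : 1;
}
```

    Assuming the 88 CPUs of the test machine are split evenly across its
    two sockets, deferred_init_threads(44) yields 11 threads per node.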
    
    __free_pages_core used to increase the zone's managed page count by the
    number of pages being freed.  To accommodate multiple threads, however,
    account the number of freed pages with an atomic shared across the ktask
    threads and bump the managed page count with it after ktask is finished.
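
    The accounting scheme described above can be illustrated in userspace
    with POSIX threads and C11 atomics standing in for ktask and the
    kernel's atomic types.  This is a minimal sketch under those
    assumptions, not the patch's code: struct chunk, free_chunk() and
    init_zone_parallel() are hypothetical names, and the worker only
    counts pages rather than initializing them.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NR_WORKERS 4

/* Work handed to one worker: a page count and the shared counter. */
struct chunk {
	unsigned long nr_pages;   /* pages this worker "frees" */
	atomic_ulong *nr_freed;   /* counter shared by all workers */
};

static void *free_chunk(void *arg)
{
	struct chunk *c = arg;

	/* Real code would initialize and free the pages here. */
	atomic_fetch_add(c->nr_freed, c->nr_pages);
	return NULL;
}

/*
 * Split total_pages across NR_WORKERS threads, then add the atomically
 * accumulated total to the managed page count once, after all workers
 * have finished.  Returns the updated managed page count.
 */
static unsigned long init_zone_parallel(unsigned long managed_pages,
					unsigned long total_pages)
{
	pthread_t tids[NR_WORKERS];
	int started[NR_WORKERS];
	struct chunk chunks[NR_WORKERS];
	atomic_ulong nr_freed;
	unsigned long base = total_pages / NR_WORKERS;
	unsigned long rem = total_pages % NR_WORKERS;
	int i;

	atomic_init(&nr_freed, 0);

	for (i = 0; i < NR_WORKERS; i++) {
		/* The last worker also takes the remainder. */
		chunks[i].nr_pages = base + (i == NR_WORKERS - 1 ? rem : 0);
		chunks[i].nr_freed = &nr_freed;
		started[i] = pthread_create(&tids[i], NULL, free_chunk,
					    &chunks[i]) == 0;
		if (!started[i])
			free_chunk(&chunks[i]); /* fall back to inline work */
	}

	for (i = 0; i < NR_WORKERS; i++)
		if (started[i])
			pthread_join(tids[i], NULL);

	/* Single update of the managed count after the parallel phase. */
	return managed_pages + atomic_load(&nr_freed);
}
```

    The point of the design is that workers never touch the managed page
    count directly; they only add to the shared atomic, and the count is
    bumped once after the parallel phase ends.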
    
    Test:    Boot the machine with deferred struct page init three times
    
    Machine: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz, 88 CPUs, 503G memory,
             2 sockets
    
    kernel                   speedup   max time per   stdev
                                       node (ms)
    
    baseline (4.15-rc2)                        5860     8.6
    ktask                      9.56x            613    12.4

    Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: Hongbo Yao <yaohongbo@huawei.com>
    Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
    Tested-by: Hongbo Yao <yaohongbo@huawei.com>
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>