mm: parallelize deferred struct page initialization within each node
hulk inclusion
category: feature
bugzilla: 13228
CVE: NA
---------------------------
Deferred struct page initialization currently runs one thread per node,
but this is a bottleneck during boot on big machines, so use ktask
within each pgdatinit thread to parallelize the struct page
initialization, allowing the system to take better advantage of its
memory bandwidth.

Because the system is not fully up yet and most CPUs are idle, use more
than the default maximum number of ktask threads. The kernel does not
know a given system's memory bandwidth, so it cannot compute the most
efficient number of threads and some guesswork is involved. In testing,
a reasonable value turned out to be about a quarter of the CPUs on the
node.
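
The quarter-of-the-node heuristic can be sketched as follows. This is a
minimal illustration, not the code in this patch: the helper name
deferred_init_max_threads() and the floor of one thread are assumptions
made for the example.

```c
/*
 * Hypothetical helper (not part of the patch): pick the thread cap for
 * one node's deferred struct page initialization.
 */
static unsigned int deferred_init_max_threads(unsigned int node_cpus)
{
	/* Heuristic from testing: about a quarter of the node's CPUs. */
	unsigned int nr_threads = node_cpus / 4;

	/* Assumed floor: always allow at least one thread. */
	return nr_threads > 0 ? nr_threads : 1;
}
```

Assuming the 88 CPUs of the test machine below are split evenly across
its 2 sockets, each node has 44 CPUs and this sketch would allow 11
threads per node.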

__free_pages_core used to increase the zone's managed page count by the
number of pages being freed. To accommodate multiple threads, however,
account the number of freed pages with an atomic shared across the ktask
threads and bump the managed page count with it after ktask is finished.
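
The accounting pattern described above can be sketched in userspace with
C11 atomics and pthreads standing in for the kernel's atomics and ktask.
This is an illustration under those assumptions only: struct chunk,
free_chunk(), init_node_pages() and MAX_WORKERS are hypothetical names,
and no real pages are touched.

```c
#include <pthread.h>
#include <stdatomic.h>

#define MAX_WORKERS 16	/* assumed cap for this sketch */

struct chunk {
	unsigned long nr_pages;		/* pages this worker "frees" */
	atomic_ulong *nr_freed;		/* counter shared by all workers */
};

/* Worker: account freed pages in the shared atomic, not the zone count. */
static void *free_chunk(void *arg)
{
	struct chunk *c = arg;

	atomic_fetch_add(c->nr_freed, c->nr_pages);
	return NULL;
}

/* Returns the managed page count after every worker has finished. */
static unsigned long init_node_pages(unsigned long managed_pages,
				     unsigned long pages_per_worker,
				     int nr_workers)
{
	pthread_t threads[MAX_WORKERS];
	struct chunk chunks[MAX_WORKERS];
	atomic_ulong nr_freed;
	int i, started = 0;

	atomic_init(&nr_freed, 0);
	if (nr_workers > MAX_WORKERS)
		nr_workers = MAX_WORKERS;

	for (i = 0; i < nr_workers; i++) {
		chunks[i].nr_pages = pages_per_worker;
		chunks[i].nr_freed = &nr_freed;
		if (pthread_create(&threads[started], NULL, free_chunk,
				   &chunks[i]) != 0)
			break;
		started++;
	}

	/* Wait for all workers before reading the shared counter. */
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	/* Bump the managed count once, from a single thread. */
	return managed_pages + atomic_load(&nr_freed);
}
```

The managed count is written once, after the join, so the workers never
contend on it; only the shared atomic is updated concurrently.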

Test:    Boot the machine with deferred struct page init three times

Machine: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz, 88 CPUs, 503G memory,
         2 sockets

kernel                 speedup   max time per   stdev
                                 node (ms)

baseline (4.15-rc2)                      5860     8.6
ktask                    9.56x            613    12.4

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Hongbo Yao <yaohongbo@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Tested-by: Hongbo Yao <yaohongbo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>