Committed by Wang ShaoBo
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I47W8L
CVE: NA

---------------------------

We detected a performance regression with UnixBench and used bisection
to locate commit 7e66740a ("MPAM / ACPI: Refactoring MPAM init process
and set MPAM ACPI as entrance"). Comparing commits df5defd9 ("KVM: X86:
MMU: Use the correct inherited permissions to get shadow page") and
ac4dbb75 ("ACPI 6.x: Add definitions for MPAM table"), we got the
following test results:

CMD: ./Run -c xx context1
RESULT:
+-------------UnixBench context1-----------+
+---------+--------------+-----------------+
+         +   ac4dbb75   +    df5defd9     +
+---------+--------------+-----------------+
+  Cores  +    Score     +      Score      +
+---------+--------------+-----------------+
+    1    +      522.8   +       535.7     +
+---------+--------------+-----------------+
+   24    +    11231.5   +     12111.2     +
+---------+--------------+-----------------+
+   48    +     8535.1   +      8745.1     +
+---------+--------------+-----------------+
+   72    +    10821.9   +     10343.8     +
+---------+--------------+-----------------+
+   96    +    15238.5   +     42947.8     +
+---------+--------------+-----------------+

Sampling with the perf tool showed a clear difference between the two
versions:

HEAD:ac4dbb75                                     HEAD:df5defd9
45.18% [kernel] [k] ktime_get_coarse_real_ts64 -> 1.78% [kernel] [k] ktime_get_coarse_real_ts64
...
65.87 │ dmb ishld    //smp_rmb()

Through ftrace we obtained the call trace and counted the calls to
ktime_get_coarse_real_ts64, which frequently accesses tk_core->seq and
tk_core->timekeeper->tkr_mono:

- 48.86% [kernel] [k] ktime_get_coarse_real_ts64
   - 5.76% ktime_get_coarse_real_ts64    #about 111437657 times per 10 seconds
   - 14.70% __audit_syscall_entry
        syscall_trace_enter
        el0_svc_common
        el0_svc_handler
      + el0_svc
   - 2.85% current_time

So the degradation may be caused by interference between accesses to
different fields. We compared the .bss and .data sections of the two
versions:

HEAD:ac4dbb75
`-> ffff00000962e680 l     O .bss   0000000000000110 tk_core
    ffff000009355680 l     O .data  0000000000000078 tk_fast_mono
    ffff0000093557a0 l     O .data  0000000000000090 dummy_clock
    ffff000009355700 l     O .data  0000000000000078 tk_fast_raw
    ffff000009355778 l     O .data  0000000000000028 timekeeping_syscore_ops
    ffff00000962e640 l     O .bss   0000000000000008 cycles_at_suspend

HEAD:df5defd9
`-> ffff00000957dbc0 l     O .bss   0000000000000110 tk_core
    ffff0000092b4e80 l     O .data  0000000000000078 tk_fast_mono
    ffff0000092b4fa0 l     O .data  0000000000000090 dummy_clock
    ffff0000092b4f00 l     O .data  0000000000000078 tk_fast_raw
    ffff0000092b4f78 l     O .data  0000000000000028 timekeeping_syscore_ops
    ffff00000957db80 l     O .bss   0000000000000008 cycles_at_suspend

Comparing tk_core's address in the two versions: in ac4dbb75 it is
ffff00000962e680, which is 128-byte aligned, whereas in df5defd9 it is
ffff00000957dbc0, which is only 64-byte aligned. As a result, the memory
layout of tk_core has changed subtly:

HEAD:ac4dbb75
`->                 |<---------former 64Bytes---------->|<------------latter 64Bytes------------>|
0xffff00000957dbc0_>|<-seq 8Bytes->|<-tkr_mono 56Bytes->|<-tkr_raw 56Bytes->|<-xtime_sec 8Bytes->|
0xffff00000957dc00_>...

HEAD:df5defd9
`->                 |<------former 64Bytes----->|<------------latter 64Bytes------->|
0xffff00000962e680_>|<-Other variables 64Bytes->|<-seq 8Bytes->|<-tkr_mono 56Bytes->|
0xffff00000962e6c0_>..
We confirmed that the tkr_raw and xtime_sec fields interfere strongly
with the seq and tkr_mono fields because of frequent load/store
operations; this is the well-known false-sharing problem.

We add a 64-byte padding field at the start of tk_core, reserved for
possible future use, and keep tk_core 128-byte aligned. This prevents
tk_core's memory layout from changing between builds. With this
solution, tk_core is always laid out as follows:

crash> struct -o tk_core_t
struct tk_core_t {
   [0] u64 padding[8];
  [64] seqcount_t seq;
  [72] struct timekeeper timekeeper;
}
SIZE: 336

crash> struct -o timekeeper
struct timekeeper {
    [0] struct tk_read_base tkr_mono;
   [56] struct tk_read_base tkr_raw;
  [112] u64 xtime_sec;
  [120] unsigned long ktime_sec;
  ...
}
SIZE: 264

After applying our solution:

+---------+--------------+
+         + Our solution +
+---------+--------------+
+  Cores  +    Score     +
+---------+--------------+
+    1    +      548.9   +
+---------+--------------+
+   24    +    11018.3   +
+---------+--------------+
+   48    +     8938.2   +
+---------+--------------+
+   72    +    14610.7   +
+---------+--------------+
+   96    +    40811.7   +
+---------+--------------+

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
ae79e85a