rcu: Reject memory-order-induced stall-warning false positives
If a system is idle from an RCU perspective for longer than specified by CONFIG_RCU_CPU_STALL_TIMEOUT, and if one CPU starts a grace period just as a second checks for CPU stalls, and if this second CPU happens to see the old value of rsp->jiffies_stall, it will incorrectly report a CPU stall. This is quite rare, but apparently occurs deterministically on systems with about 6TB of memory. This commit therefore orders accesses to the data used to determine whether or not a CPU stall is in progress. Grace-period initialization and cleanup first increments rsp->completed to mark the end of the previous grace period, then records the current jiffies in rsp->gp_start, then records the jiffies at which a stall can be expected to occur in rsp->jiffies_stall, and finally increments rsp->gpnum to mark the start of the new grace period. Now, this ordering by itself does not prevent false positives. For example, if grace-period initialization was delayed between recording rsp->gp_start and rsp->jiffies_stall, the CPU stall warning code might still see an old value of rsp->jiffies_stall. Therefore, this commit also orders the CPU stall warning accesses as well, loading rsp->gpnum and jiffies, then rsp->jiffies_stall, then rsp->gp_start, and finally rsp->completed. This ordering means that the false-positive scenario in the previous paragraph would result in rsp->completed being greater than or equal to rsp->gpnum, which is never valid for a CPU stall, allowing the false positive to be rejected. Furthermore, any fetch that gets an old value of rsp->jiffies_stall must also get an old value of rsp->gpnum, which will again be rejected by the comparison of rsp->gpnum and rsp->completed. Situations where rsp->gp_start is later than rsp->jiffies_stall are also rejected, as are situations where jiffies is less than rsp->jiffies_stall. Although use of unsynchronized accesses means that there are likely still some false-positive scenarios (synchronization has proven to be a very bad idea on large systems), this should get rid of a large class of these scenarios. Reported-by: NFabian Herschel <fabian.herschel@suse.com> Reported-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Tested-by: NJochen Striepe <jochen@tolot.escape.de>
Showing
想要评论请 注册 或 登录