    pcpcntrs: fix dying cpu summation race · b020fa54
    Committed by Dave Chinner
    mainline inclusion
    from mainline-v6.3-rc4
    commit 8b57b11c
    category: bugfix
    bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35
    
    Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8b57b11cca88f397035a95b9e12b03511847b0e8
    
    --------------------------------
    
    In commit f689054a ("percpu_counter: add percpu_counter_sum_all
    interface") a race condition between a cpu dying and
    percpu_counter_sum() iterating online CPUs was identified. The
    solution was to iterate all possible CPUs for summation via
    percpu_counter_sum_all().
    
    We recently had a percpu_counter_sum() call in XFS trip over this
    same race condition and it fired a debug assert because the
    filesystem was unmounting and the counter *should* be zero just
    before we destroy it. That was reported here:
    
    https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/
    
    likely as a result of running generic/648 which exercises
    filesystems in the presence of CPU online/offline events.
    
    The solution to use percpu_counter_sum_all() is an awful one. We
    use percpu counters and percpu_counter_sum() for accurate and
    reliable threshold detection for space management, so a summation
    race condition during these operations can result in overcommit of
    available space and that may result in filesystem shutdowns.
    
    As percpu_counter_sum_all() iterates all possible CPUs rather than
    just those online or even those present, the mask can include CPUs
    that aren't even installed in the machine or, in the case of
    machines that can hot-plug CPU-capable nodes, CPUs whose physical
    sockets aren't even present in the machine.
    
    Fundamentally, this race condition is caused by the CPU being
    offlined being removed from the cpu_online_mask before the notifier
    that cleans up per-cpu state is run. Hence percpu_counter_sum() will
    not sum the count for a cpu currently being taken offline,
    regardless of whether the notifier has run or not. This is
    the root cause of the bug.
    
    The percpu counter notifier iterates all the registered counters,
    locks the counter and moves the percpu count to the global sum.
    This is serialised against other operations that move the percpu
    counter to the global sum as well as percpu_counter_sum() operations
    that sum the percpu counts while holding the counter lock.
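
    The fold step described above can be modelled in userspace. This is
    only a sketch, not the kernel code: struct model_counter, the
    model_* helpers and MODEL_NR_CPUS are hypothetical names, and a C11
    atomic_flag spinlock stands in for the counter lock. It shows that
    folding one CPU's delta into the global sum under the lock leaves
    the exact total unchanged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Hypothetical userspace stand-in for a percpu counter. */
struct model_counter {
	atomic_flag lock;		/* stands in for the counter lock */
	int64_t count;			/* global sum */
	int32_t pcpu[MODEL_NR_CPUS];	/* per-CPU deltas */
};

static void model_lock(struct model_counter *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the flag was clear */
}

static void model_unlock(struct model_counter *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/*
 * Model of the notifier's work for one counter: move the dead CPU's
 * delta into the global sum and clear its slot, all under the lock.
 */
static void model_fold_dead_cpu(struct model_counter *c, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	model_lock(c);
	c->count += c->pcpu[cpu];
	c->pcpu[cpu] = 0;
	model_unlock(c);
}

/* Exact total: global sum plus every slot, taken under the same lock. */
static int64_t model_total(struct model_counter *c)
{
	int64_t ret;

	model_lock(c);
	ret = c->count;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		ret += c->pcpu[cpu];
	model_unlock(c);
	return ret;
}
```

    Because the fold and the summation take the same lock, a summation
    in this model sees the delta either still in the per-CPU slot or
    already in the global sum, never in both and never in neither.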
    
    Hence the notifier is safe to run concurrently with sum operations,
    and the only thing we actually need to care about is that
    percpu_counter_sum() iterates dying CPUs. That's trivial to do,
    and when there are no CPUs dying, it has no additional overhead except
    for a cpumask_or() operation.
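
    A minimal userspace model of the idea, assuming plain bitmasks in
    place of the kernel's cpumasks (bit n = CPU n). The model_* names
    and MODEL_NR_CPUS are hypothetical and locking is omitted for
    brevity; this is an illustration of the approach, not the patch
    itself. Summing over "online OR dying" picks up the delta of a CPU
    that has left the online mask but has not yet been folded by the
    notifier, while an online-only sum misses it.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Hypothetical userspace stand-in for a percpu counter (no locking). */
struct model_counter {
	int64_t count;			/* global sum */
	int32_t pcpu[MODEL_NR_CPUS];	/* per-CPU deltas */
};

/* Global sum plus the slot of every CPU whose bit is set in mask. */
static int64_t model_sum_mask(const struct model_counter *c, unsigned int mask)
{
	int64_t ret = c->count;

	assert((mask >> MODEL_NR_CPUS) == 0);	/* no bits beyond the model's CPUs */

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			ret += c->pcpu[cpu];
	return ret;
}

/* Racy behaviour in this model: only online CPUs are summed. */
static int64_t model_sum_online(const struct model_counter *c,
				unsigned int online)
{
	return model_sum_mask(c, online);
}

/* Behaviour described above: sum online CPUs and dying CPUs. */
static int64_t model_sum_fixed(const struct model_counter *c,
			       unsigned int online, unsigned int dying)
{
	return model_sum_mask(c, online | dying);
}
```

    For example, with a global sum of 10 and per-CPU deltas
    { 1, -3, 5, -13 } the true total is 0. If CPU 2 is dying
    (online = 0xb, dying = 0x4) and has not been folded yet,
    model_sum_online() returns -5 while model_sum_fixed() returns 0.
    With an empty dying mask the two functions agree.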
    
    This change makes percpu_counter_sum() always do the right thing in
    the presence of CPU hot unplug events and makes
    percpu_counter_sum_all() unnecessary. This, in turn, means that
    filesystems like XFS, ext4, and btrfs don't have to work out when
    they should use percpu_counter_sum() vs percpu_counter_sum_all() in
    their space accounting algorithms.
    
    conflicts:
    lib/percpu_counter.c
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Zeng Heng <zengheng4@huawei.com>