    commit c7323a5a
    mm/slub: restrict sysfs validation to debug caches and make it safe
    Committed by Vlastimil Babka

    Rongwei Wang reports [1] that cache validation triggered by writing to
    /sys/kernel/slab/<cache>/validate is racy against normal cache
    operations (e.g. freeing) in a way that can cause false positive
    inconsistency reports for caches with debugging enabled. The problem is
    that the debugging actions that mark an object free or active and the
    actual freelist operations are not atomic, so the validation can
    observe an inconsistent state.
    
    Additional races involving the n->nr_slabs counter are possible for
    caches both with and without debugging enabled, and these result in
    false reports of wrong slab counts.
    
    This patch attempts to solve these issues while not adding overhead to
    normal (especially fastpath) operations for caches that do not have
    debugging enabled. Such overhead would not be justified to make possible
    userspace-triggered validation safe. Instead, disable the validation for
    caches that don't have debugging enabled and make their sysfs validate
    handler return -EINVAL.
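
    The handler change itself is small. A condensed sketch of the
    resulting validate_store() in mm/slub.c (kmem_cache_debug() and
    validate_slab_cache() are existing slub.c helpers; details may differ
    slightly from the actual patch):

        static ssize_t validate_store(struct kmem_cache *s,
                        const char *buf, size_t length)
        {
            int ret = -EINVAL;

            /* Validation is only supported, and safe, for debug caches. */
            if (buf[0] == '1' && kmem_cache_debug(s)) {
                ret = validate_slab_cache(s);
                if (ret >= 0)
                    ret = length;
            }
            return ret;
        }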
    
    For caches that do have debugging enabled, we can instead extend the
    existing approach of not using percpu freelists to force all alloc/free
    operations to the slow paths, where the debugging flags are checked
    and acted upon. There we can adjust the debug-specific paths to
    increase n->list_lock coverage against concurrent validation as
    necessary.
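
    For context, the validation walk itself already iterates a node's
    slab lists under n->list_lock, roughly as in this condensed sketch of
    the long-standing validate_slab_node() (validate_slab() does the
    per-slab checks; error reporting and slab counting are omitted):

        static int validate_slab_node(struct kmem_cache *s,
                struct kmem_cache_node *n, unsigned long *obj_map)
        {
            struct slab *slab;
            unsigned long flags;

            spin_lock_irqsave(&n->list_lock, flags);

            list_for_each_entry(slab, &n->partial, slab_list)
                validate_slab(s, slab, obj_map);

            /* Full slabs are only tracked with SLAB_STORE_USER. */
            if (s->flags & SLAB_STORE_USER)
                list_for_each_entry(slab, &n->full, slab_list)
                    validate_slab(s, slab, obj_map);

            spin_unlock_irqrestore(&n->list_lock, flags);
            return 0;
        }

    Any alloc/free path that takes n->list_lock around its debug
    processing is therefore serialized against this walk.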
    
    The processing on free in free_debug_processing() already happens under
    n->list_lock so we can extend it to actually do the freeing as well and
    thus make it atomic against concurrent validation. As observed by
    Hyeonggon Yoo, we no longer need to take slab_lock() here, because
    all paths we could race with are now protected by n->list_lock under
    the new scheme, so drop its usage.
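
    A condensed sketch of the extended free_debug_processing() (the
    bulk-free checking loop, slab discard handling and counters are
    omitted; free_consistency_checks(), set_freepointer(), remove_full()
    and add_partial() are existing slub.c helpers):

        static noinline void free_debug_processing(struct kmem_cache *s,
                struct slab *slab, void *head, void *tail,
                int bulk_cnt, unsigned long addr)
        {
            struct kmem_cache_node *n = get_node(s, slab_nid(slab));
            unsigned long flags;
            void *prior;

            spin_lock_irqsave(&n->list_lock, flags);

            /*
             * The consistency checks used to be done under slab_lock();
             * now every path we can race with holds n->list_lock instead.
             */
            if (!free_consistency_checks(s, slab, head, addr))
                goto out;

            /*
             * Do the actual freeing under the same lock, so a concurrent
             * validation walk can never see a half-updated slab.
             */
            prior = slab->freelist;
            set_freepointer(s, tail, prior);
            slab->freelist = head;
            slab->inuse -= bulk_cnt;

            if (!prior) {
                /* The slab was full; move it to the partial list. */
                remove_full(s, n, slab);
                add_partial(n, slab, DEACTIVATE_TO_TAIL);
            }
        out:
            spin_unlock_irqrestore(&n->list_lock, flags);
        }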
    
    The processing on alloc in alloc_debug_processing() currently doesn't
    take any locks, but we have to first allocate the object from a slab on
    the partial list (as debugging caches have no percpu slabs) and thus
    take the n->list_lock anyway. Add a function alloc_single_from_partial()
    that grabs just the allocated object instead of the whole freelist, and
    does the debug processing. The n->list_lock coverage again makes it
    atomic against validation, and it is also ultimately more efficient
    than the current approach of grabbing the whole freelist and then
    immediately deactivating the slab.
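
    A sketch of alloc_single_from_partial() along these lines (condensed;
    get_freepointer(), remove_partial() and add_full() are existing
    slub.c helpers, and the alloc_debug_processing() signature is
    simplified):

        static void *alloc_single_from_partial(struct kmem_cache *s,
                struct kmem_cache_node *n, struct slab *slab)
        {
            void *object;

            lockdep_assert_held(&n->list_lock);

            /* Take a single object off the slab's freelist... */
            object = slab->freelist;
            slab->freelist = get_freepointer(s, object);
            slab->inuse++;

            /* ...and do the debug checks still under n->list_lock. */
            if (!alloc_debug_processing(s, slab, object)) {
                /* Checks failed: isolate the corrupted slab. */
                remove_partial(n, slab);
                return NULL;
            }

            if (slab->inuse == slab->objects) {
                remove_partial(n, slab);
                add_full(s, n, slab);
            }

            return object;
        }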
    
    To prevent races on n->nr_slabs updates, make sure that for caches with
    debugging enabled, inc_slabs_node() or dec_slabs_node() is called under
    n->list_lock. When allocating a new slab for a debug cache, handle the
    allocation by a new function alloc_single_from_new_slab() instead of the
    current forced deactivation path.
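
    A sketch of alloc_single_from_new_slab() under the same assumptions:

        static void *alloc_single_from_new_slab(struct kmem_cache *s,
                struct slab *slab)
        {
            int nid = slab_nid(slab);
            struct kmem_cache_node *n = get_node(s, nid);
            unsigned long flags;
            void *object;

            object = slab->freelist;
            slab->freelist = get_freepointer(s, object);
            slab->inuse = 1;

            /*
             * Failure is not expected on a freshly allocated slab, but a
             * concurrent memory corruption could in theory cause it; in
             * that case the slab is simply not used.
             */
            if (!alloc_debug_processing(s, slab, object))
                return NULL;

            spin_lock_irqsave(&n->list_lock, flags);

            if (slab->inuse == slab->objects)
                add_full(s, n, slab);
            else
                add_partial(n, slab, DEACTIVATE_TO_HEAD);

            /*
             * The nr_slabs update now happens under n->list_lock,
             * closing the slab count races for debug caches.
             */
            inc_slabs_node(s, nid, slab->objects);
            spin_unlock_irqrestore(&n->list_lock, flags);

            return object;
        }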
    
    Neither of these changes affects the fast paths at all, and the
    changes in the slow paths are negligible for non-debug caches.
    
    [1] https://lore.kernel.org/all/20220529081535.69275-1-rongwei.wang@linux.alibaba.com/

    Reported-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>