    mm: slab: fix kmem_cache_create failed when sysfs node not destroyed · d0ffe36f
    Committed by Nanyong Sun
    hulk inclusion
    category: bugfix
    bugzilla: 174641
    CVE: NA
    
    ------------------------------------
    
    Commit d38a2b7a ("mm: memcg/slab: fix memory leak at non-root
    kmem_cache destroy") introduced a problem: if one thread destroys a
    kmem_cache A while another thread concurrently creates a kmem_cache B
    that is mergeable with A and has the same size as A, the creation of B
    may fail because of a duplicate sysfs node.
    The scenario in detail:
    1) Thread 1 calls kmem_cache_destroy() to destroy kmem_cache A, which is
    mergeable. It decrements A's refcount and, if the refcount reaches 0,
    calls memcg_set_kmem_cache_dying(), which sets
    A->memcg_params.dying = true. It then unlocks slab_mutex and calls
    flush_memcg_workqueue(), which may take a while.
    Note: at this point the sysfs node of A (like '/kernel/slab/:0000248')
    is still present. It is deleted in shutdown_cache(), which is called
    only after flush_memcg_workqueue() has finished and slab_mutex has been
    locked again.
    2) Thread 2 now calls kmem_cache_create() to create B, which is
    mergeable with A (their sizes are the same). It takes slab_mutex and
    calls __kmem_cache_alias() to look for a mergeable cache. Because of the
    code below, added by commit d38a2b7a ("mm: memcg/slab: fix memory leak
    at non-root kmem_cache destroy"), A is rejected as a merge candidate
    since its memcg_params.dying is true.
    
    int slab_unmergeable(struct kmem_cache *s)
    {
    	...
    	if (s->refcount < 0)
    		return 1;

    	/*
    	 * Skip the dying kmem_cache.
    	 */
    	if (s->memcg_params.dying)
    		return 1;

    	return 0;
    }
    
    So B has to create its own sysfs node by calling:
     create_cache->
    	__kmem_cache_create->
    		sysfs_slab_add->
    			kobject_init_and_add
    Because B is itself mergeable, the filename of its sysfs node is derived
    from its size, like '/kernel/slab/:0000248', which duplicates A's. The
    sysfs node of A is still present at this point, so
    kobject_init_and_add() fails and kmem_cache_create() fails as a result.
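
    The failure path described above can be replayed in user space. The
    following is only a toy sketch, not kernel code: every toy_* name is
    hypothetical, the id format is simplified to ':' plus the zero-padded
    size (any flag letters are left out), and the two threads are replayed
    sequentially in the order given in steps 1) and 2) rather than run
    concurrently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8
#define ID_LEN 16

/* Toy stand-in for struct kmem_cache; only the fields the scenario needs. */
struct toy_cache {
	unsigned int size;
	int refcount;
	bool dying;              /* models memcg_params.dying */
	char sysfs_name[ID_LEN]; /* e.g. ":0000248" */
};

/* Toy stand-in for the slab sysfs directory. */
static char toy_sysfs[MAX_NODES][ID_LEN];
static int toy_sysfs_count;

/* Simplified id (assumption): ':' plus the zero-padded size. */
static void toy_unique_id(const struct toy_cache *s, char *buf, size_t len)
{
	snprintf(buf, len, ":%07u", s->size);
}

/* Registers the node; fails with -EEXIST if the name is already present. */
static int toy_sysfs_add(struct toy_cache *s)
{
	toy_unique_id(s, s->sysfs_name, sizeof(s->sysfs_name));
	for (int i = 0; i < toy_sysfs_count; i++)
		if (strcmp(toy_sysfs[i], s->sysfs_name) == 0)
			return -EEXIST;
	assert(toy_sysfs_count < MAX_NODES);
	strcpy(toy_sysfs[toy_sysfs_count++], s->sysfs_name);
	return 0;
}

/* Reduced to the two checks quoted in this message. */
static int toy_unmergeable(const struct toy_cache *s)
{
	return s->refcount < 0 || s->dying;
}

/* Create b: alias a if it is mergeable, otherwise register a new node. */
static int toy_create(struct toy_cache *b, struct toy_cache *a)
{
	if (a && !toy_unmergeable(a) && a->size == b->size) {
		a->refcount++;
		return 0;
	}
	b->refcount = 1;
	return toy_sysfs_add(b);
}

/* Replays steps 1) and 2) in order; returns the result of creating B. */
static int toy_replay_race(void)
{
	struct toy_cache a = { .size = 248 }, b = { .size = 248 };

	toy_sysfs_count = 0;
	if (toy_create(&a, NULL))  /* A exists and owns ":0000248" */
		return 1;
	a.refcount--;              /* thread 1 starts destroying A ... */
	a.dying = true;            /* ... marks it dying, drops the lock */
	return toy_create(&b, &a); /* thread 2 runs before A's node is removed */
}
```

    In this sketch toy_replay_race() returns -EEXIST, the same error code
    reported for se_sess_cache in the call trace at the end.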
    
    Concurrently loading (modprobe) and unloading (rmmod) the modules that
    create the two caches below reproduces the issue quickly:
    nf_conntrack_expect, se_sess_cache. See the call trace at the end.
    
    The LTS versions v4.19.y and v5.4.y have this problem, whereas Linux
    versions after v5.9 do not, because the patchset "The new cgroup slab
    memory controller" refactored most of the memcg slab code.
    
    A potential solution (the one this patch takes): let the dying
    kmem_cache remain mergeable. The slab_mutex lock prevents the race
    between the thread creating an alias kmem_cache and the thread
    destroying the root kmem_cache. In the destroying thread, after
    flush_memcg_workqueue() is done, check the refcount again: if someone
    took a reference while the lock was dropped, the kmem_cache does not
    need to be destroyed completely and can be reused.
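
    The recheck described above can be sketched the same way. This is a
    minimal user-space model under stated assumptions, not the patch itself:
    all toy_* names are hypothetical, locking is implied by the sequential
    call order, and clearing the dying flag when the cache is reused is an
    assumption of this sketch that the message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct kmem_cache. */
struct toy_cache {
	int refcount;
	bool dying;     /* models memcg_params.dying */
	bool destroyed; /* set once the toy shutdown step has run */
};

/* With the fix, a dying cache is still a merge candidate. */
static bool toy_mergeable(const struct toy_cache *s)
{
	return s->refcount >= 0;
}

/* Creator side: take a reference on the existing cache (under the lock). */
static bool toy_alias(struct toy_cache *s)
{
	if (!toy_mergeable(s))
		return false;
	s->refcount++;
	return true;
}

/* Destroyer, part 1: drop the reference; mark the cache dying if it hit 0. */
static bool toy_destroy_begin(struct toy_cache *s)
{
	if (--s->refcount != 0)
		return false;
	s->dying = true;
	return true; /* the caller would now unlock and flush */
}

/* Destroyer, part 2: after the flush, lock held again, recheck refcount. */
static void toy_destroy_finish(struct toy_cache *s)
{
	assert(s->dying);
	if (s->refcount > 0) {
		s->dying = false; /* re-referenced meanwhile: keep and reuse */
		return;
	}
	s->destroyed = true;  /* nobody came: shut the cache down */
}

/* Replays one destroy, optionally with a creator arriving mid-flush. */
static bool toy_replay_fix(bool creator_arrives)
{
	struct toy_cache a = { .refcount = 1 };

	if (toy_destroy_begin(&a)) {
		if (creator_arrives)
			toy_alias(&a); /* runs while the lock is dropped */
		toy_destroy_finish(&a);
	}
	return a.destroyed;
}
```

    In this model the cache survives when a creator arrives during the
    flush and is shut down otherwise, so no second sysfs node with the same
    name is ever requested.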
    
    Another potential solution: revert commit d38a2b7a ("mm: memcg/slab:
    fix memory leak at non-root kmem_cache destroy"). Compared with a
    kmem_cache_create() failure, the memory leak in that special scenario
    seems less harmful.
    
    Call trace:
     sysfs: cannot create duplicate filename '/kernel/slab/:0000248'
     Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
     Call trace:
      dump_backtrace+0x0/0x198
      show_stack+0x24/0x30
      dump_stack+0xb0/0x100
      sysfs_warn_dup+0x6c/0x88
      sysfs_create_dir_ns+0x104/0x120
      kobject_add_internal+0xd0/0x378
      kobject_init_and_add+0x90/0xd8
      sysfs_slab_add+0x16c/0x2d0
      __kmem_cache_create+0x16c/0x1d8
      create_cache+0xbc/0x1f8
      kmem_cache_create_usercopy+0x1a0/0x230
      kmem_cache_create+0x50/0x68
      init_se_kmem_caches+0x38/0x258 [target_core_mod]
      target_core_init_configfs+0x8c/0x390 [target_core_mod]
      do_one_initcall+0x54/0x230
      do_init_module+0x64/0x1ec
      load_module+0x150c/0x16f0
      __se_sys_finit_module+0xf0/0x108
      __arm64_sys_finit_module+0x24/0x30
      el0_svc_common+0x80/0x1c0
      el0_svc_handler+0x78/0xe0
      el0_svc+0x10/0x260
     kobject_add_internal failed for :0000248 with -EEXIST, don't try to register things with the same name in the same directory.
     kmem_cache_create(se_sess_cache) failed with error -17
     Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
     Call trace:
      dump_backtrace+0x0/0x198
      show_stack+0x24/0x30
      dump_stack+0xb0/0x100
      kmem_cache_create_usercopy+0xa8/0x230
      kmem_cache_create+0x50/0x68
      init_se_kmem_caches+0x38/0x258 [target_core_mod]
      target_core_init_configfs+0x8c/0x390 [target_core_mod]
      do_one_initcall+0x54/0x230
      do_init_module+0x64/0x1ec
      load_module+0x150c/0x16f0
      __se_sys_finit_module+0xf0/0x108
      __arm64_sys_finit_module+0x24/0x30
      el0_svc_common+0x80/0x1c0
      el0_svc_handler+0x78/0xe0
      el0_svc+0x10/0x260
    
    Fixes: d38a2b7a ("mm: memcg/slab: fix memory leak at non-root kmem_cache destroy")
    Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
    Cc: stable@vger.kernel.org
    Reviewed-by: tong tiangen <tongtiangen@huawei.com>
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>