    writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs · 79c2ed51
    Committed by Baokun Li
    mainline inclusion
    from mainline-v6.3-rc8
    commit 1ba1199e
    category: bugfix
    bugzilla: 188601, https://gitee.com/openeuler/kernel/issues/I6TNTC
    CVE: NA
    
    Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1ba1199ec5747f475538c0d25a32804e5ba1dfde
    
    --------------------------------
    
    KASAN reports a null-ptr-deref:
    ==================================================================
    BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
    Write of size 8 at addr 0000000000000000 by task sync/943
    CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
    Call Trace:
     <TASK>
     dump_stack_lvl+0x7f/0xc0
     print_report+0x2ba/0x340
     kasan_report+0xc4/0x120
     kasan_check_range+0x1b7/0x2e0
     __kasan_check_write+0x24/0x40
     bdi_split_work_to_wbs+0x5c5/0x7b0
     sync_inodes_sb+0x195/0x630
     sync_inodes_one_sb+0x3a/0x50
     iterate_supers+0x106/0x1b0
     ksys_sync+0x98/0x160
    [...]
    ==================================================================
    
    The race that causes the above issue is as follows:
    
               cpu1                     cpu2
    -------------------------|-------------------------
    inode_switch_wbs
     INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
     queue_rcu_work(isw_wq, &isw->work)
     // queue_work async
      inode_switch_wbs_work_fn
       wb_put_many(old_wb, nr_switched)
        percpu_ref_put_many
         ref->data->release(ref)
         cgwb_release
          queue_work(cgwb_release_wq, &wb->release_work)
          // queue_work async
           &wb->release_work
           cgwb_release_workfn
                                ksys_sync
                                 iterate_supers
                                  sync_inodes_one_sb
                                   sync_inodes_sb
                                    bdi_split_work_to_wbs
                                     kmalloc(sizeof(*work), GFP_ATOMIC)
                                     // alloc memory failed
            percpu_ref_exit
             ref->data = NULL
             kfree(data)
                                     wb_get(wb)
                                      percpu_ref_get(&wb->refcnt)
                                       percpu_ref_get_many(ref, 1)
                                        atomic_long_add(nr, &ref->data->count)
                                         atomic64_add(i, v)
                                         // trigger null-ptr-deref
    
    bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
    wbs.  If the allocation of new work fails, the on-stack fallback will be
    used and the reference count of the current wb is increased afterwards.
    If a cgroup writeback membership switch occurs before the reference is
    taken and the current wb is released as old_wb, then calling wb_get() or
    wb_put() will trigger the null pointer dereference above.
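
The ordering in the diagram can be replayed deterministically in userspace.
The following is only a minimal model, assuming a heap-allocated counter in
place of the kernel's percpu_ref; every demo_* name is hypothetical and none
of this is the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: the counter lives behind a pointer, like ref->data. */
struct demo_data {
	long count;
};

struct demo_pref {
	struct demo_data *data;
};

/* Allocate the counter with one initial reference; returns 0 on success. */
static int demo_pref_init(struct demo_pref *ref)
{
	ref->data = malloc(sizeof(*ref->data));
	if (!ref->data)
		return -1;
	ref->data->count = 1;
	return 0;
}

/* Models the release path on cpu1: free the counter and clear the pointer. */
static void demo_pref_exit(struct demo_pref *ref)
{
	free(ref->data);
	ref->data = NULL;
}

/* True while an unconditional get could still touch the counter. */
static bool demo_get_is_safe(const struct demo_pref *ref)
{
	return ref->data != NULL;
}

/* Models the unconditional get on cpu2. */
static void demo_pref_get(struct demo_pref *ref)
{
	/* The assert marks the spot where the real code writes through NULL. */
	assert(demo_get_is_safe(ref));
	ref->data->count += 1;
}
```

Calling demo_pref_exit() and then demo_pref_get() on the same object trips
the assert, which corresponds to the write at address 0 in the KASAN report.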
    
    This issue was introduced in v4.3-rc7 (see fix tag1).  Calls to
    bdi_split_work_to_wbs() from both sync_inodes_sb() and
    __writeback_inodes_sb_nr() can trigger it.  For the sync_inodes_sb()
    path, commit 7fc5854f ("writeback: synchronize sync(2) against cgroup
    writeback membership switches") originally reduced the likelihood of the
    issue by adding wb_switch_rwsem.  However, a change in v5.14-rc1 (see
    fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" call from
    inode_switch_wbs_work_fn(), so wb->state still contains WB_has_dirty_io.
    As a result old_wb is no longer skipped when traversing wbs in
    bdi_split_work_to_wbs(), and the issue is easily reproducible again.
    
    To solve this problem, call percpu_ref_exit() under RCU protection to
    avoid the race between cgwb_release_workfn() and bdi_split_work_to_wbs().
    Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
    and skip the current wb if wb_tryget() fails because the wb has already
    been shut down.
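
The "try to get, skip on failure" pattern can be sketched in userspace C.
This is a rough illustration only, assuming a plain C11 atomic counter
instead of the kernel's percpu_ref and ignoring the RCU part of the fix; the
demo_* names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a wb's reference count. */
struct demo_ref {
	atomic_long count;
};

static void demo_ref_init(struct demo_ref *ref)
{
	atomic_init(&ref->count, 1);
}

/* Take a reference only if the count has not already dropped to zero. */
static bool demo_ref_tryget(struct demo_ref *ref)
{
	long old = atomic_load(&ref->count);

	while (old > 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool demo_ref_put(struct demo_ref *ref)
{
	long old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);
	return old == 1;
}

/* Walk all refs and pin the live ones, skipping any already released. */
static int demo_pin_live(struct demo_ref *refs, int n)
{
	int pinned = 0;

	for (int i = 0; i < n; i++) {
		if (!demo_ref_tryget(&refs[i]))
			continue;	/* already released: skip it */
		pinned++;
	}
	return pinned;
}
```

In this model a ref whose last reference was dropped is never revived: the
walker simply moves on to the next entry, which mirrors how the loop in
bdi_split_work_to_wbs() is described as skipping a wb when wb_tryget() fails.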
    
    Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
    Fixes: b817525a ("writeback: bdi_writeback iteration must not skip dying ones")
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Dennis Zhou <dennis@kernel.org>
    Cc: Hou Tao <houtao1@huawei.com>
    Cc: yangerkun <yangerkun@huawei.com>
    Cc: Zhang Yi <yi.zhang@huawei.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    
    Conflicts:
    	mm/backing-dev.c
    Signed-off-by: Baokun Li <libaokun1@huawei.com>
    Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
    Reviewed-by: Yang Erkun <yangerkun@huawei.com>
    Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>