1. 14 11月, 2019 1 次提交
    • G
      bcache: fix a lost wake-up problem caused by mca_cannibalize_lock · 34cf78bf
      Guoju Fang 提交于
      This patch fix a lost wake-up problem caused by the race between
      mca_cannibalize_lock and bch_cannibalize_unlock.
      
      Consider two processes, A and B. Process A is executing
      mca_cannibalize_lock, while process B takes c->btree_cache_alloc_lock
      and is executing bch_cannibalize_unlock. The problem happens that after
      process A executes cmpxchg and will execute prepare_to_wait. In this
      timeslice process B executes wake_up, but after that process A executes
      prepare_to_wait and set the state to TASK_INTERRUPTIBLE. Then process A
      goes to sleep but no one will wake up it. This problem may cause bcache
      device to dead.
      Signed-off-by: NGuoju Fang <fangguoju@gmail.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      34cf78bf
  2. 22 7月, 2019 1 次提交
  3. 28 6月, 2019 16 次提交
    • C
      bcache: shrink btree node cache after bch_btree_check() · 1df3877f
      Coly Li 提交于
      When cache set starts, bch_btree_check() will check all bkeys on cache
      device by calculating the checksum. This operation will consume a huge
      number of system memory if there are a lot of data cached. Since bcache
      uses its own mca cache to maintain all its read-in btree nodes, and only
      releases the cache space when system memory manage code starts to shrink
      caches. Then before memory manager code to call the mca cache shrinker
      callback, bcache mca cache will compete memory resource with user space
      application, which may have nagive effect to performance of user space
      workloads (e.g. data base, or I/O service of distributed storage node).
      
      This patch tries to call bcache mca shrinker routine to proactively
      release mca cache memory, to decrease the memory pressure of system and
      avoid negative effort of the overall system I/O performance.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      1df3877f
    • C
      bcache: fix potential deadlock in cached_def_free() · 7e865eba
      Coly Li 提交于
      When enable lockdep and reboot system with a writeback mode bcache
      device, the following potential deadlock warning is reported by lockdep
      engine.
      
      [  101.536569][  T401] kworker/2:2/401 is trying to acquire lock:
      [  101.538575][  T401] 00000000bbf6e6c7 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0
      [  101.542054][  T401]
      [  101.542054][  T401] but task is already holding lock:
      [  101.544587][  T401] 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
      [  101.548386][  T401]
      [  101.548386][  T401] which lock already depends on the new lock.
      [  101.548386][  T401]
      [  101.551874][  T401]
      [  101.551874][  T401] the existing dependency chain (in reverse order) is:
      [  101.555000][  T401]
      [  101.555000][  T401] -> #1 ((work_completion)(&cl->work)#2){+.+.}:
      [  101.557860][  T401]        process_one_work+0x277/0x640
      [  101.559661][  T401]        worker_thread+0x39/0x3f0
      [  101.561340][  T401]        kthread+0x125/0x140
      [  101.562963][  T401]        ret_from_fork+0x3a/0x50
      [  101.564718][  T401]
      [  101.564718][  T401] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}:
      [  101.567701][  T401]        lock_acquire+0xb4/0x1c0
      [  101.569651][  T401]        flush_workqueue+0xae/0x4c0
      [  101.571494][  T401]        drain_workqueue+0xa9/0x180
      [  101.573234][  T401]        destroy_workqueue+0x17/0x250
      [  101.575109][  T401]        cached_dev_free+0x44/0x120 [bcache]
      [  101.577304][  T401]        process_one_work+0x2a4/0x640
      [  101.579357][  T401]        worker_thread+0x39/0x3f0
      [  101.581055][  T401]        kthread+0x125/0x140
      [  101.582709][  T401]        ret_from_fork+0x3a/0x50
      [  101.584592][  T401]
      [  101.584592][  T401] other info that might help us debug this:
      [  101.584592][  T401]
      [  101.588355][  T401]  Possible unsafe locking scenario:
      [  101.588355][  T401]
      [  101.590974][  T401]        CPU0                    CPU1
      [  101.592889][  T401]        ----                    ----
      [  101.594743][  T401]   lock((work_completion)(&cl->work)#2);
      [  101.596785][  T401]                                lock((wq_completion)bcache_writeback_wq);
      [  101.600072][  T401]                                lock((work_completion)(&cl->work)#2);
      [  101.602971][  T401]   lock((wq_completion)bcache_writeback_wq);
      [  101.605255][  T401]
      [  101.605255][  T401]  *** DEADLOCK ***
      [  101.605255][  T401]
      [  101.608310][  T401] 2 locks held by kworker/2:2/401:
      [  101.610208][  T401]  #0: 00000000cf2c7d17 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640
      [  101.613709][  T401]  #1: 00000000f5f305b3 ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
      [  101.617480][  T401]
      [  101.617480][  T401] stack backtrace:
      [  101.619539][  T401] CPU: 2 PID: 401 Comm: kworker/2:2 Tainted: G        W         5.2.0-rc4-lp151.20-default+ #1
      [  101.623225][  T401] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
      [  101.627210][  T401] Workqueue: events cached_dev_free [bcache]
      [  101.629239][  T401] Call Trace:
      [  101.630360][  T401]  dump_stack+0x85/0xcb
      [  101.631777][  T401]  print_circular_bug+0x19a/0x1f0
      [  101.633485][  T401]  __lock_acquire+0x16cd/0x1850
      [  101.635184][  T401]  ? __lock_acquire+0x6a8/0x1850
      [  101.636863][  T401]  ? lock_acquire+0xb4/0x1c0
      [  101.638421][  T401]  ? find_held_lock+0x34/0xa0
      [  101.640015][  T401]  lock_acquire+0xb4/0x1c0
      [  101.641513][  T401]  ? flush_workqueue+0x87/0x4c0
      [  101.643248][  T401]  flush_workqueue+0xae/0x4c0
      [  101.644832][  T401]  ? flush_workqueue+0x87/0x4c0
      [  101.646476][  T401]  ? drain_workqueue+0xa9/0x180
      [  101.648303][  T401]  drain_workqueue+0xa9/0x180
      [  101.649867][  T401]  destroy_workqueue+0x17/0x250
      [  101.651503][  T401]  cached_dev_free+0x44/0x120 [bcache]
      [  101.653328][  T401]  process_one_work+0x2a4/0x640
      [  101.655029][  T401]  worker_thread+0x39/0x3f0
      [  101.656693][  T401]  ? process_one_work+0x640/0x640
      [  101.658501][  T401]  kthread+0x125/0x140
      [  101.660012][  T401]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  101.661985][  T401]  ret_from_fork+0x3a/0x50
      [  101.691318][  T401] bcache: bcache_device_free() bcache0 stopped
      
      Here is how the above potential deadlock may happen in reboot/shutdown
      code path,
      1) bcache_reboot() is called firstly in the reboot/shutdown code path,
         then in bcache_reboot(), bcache_device_stop() is called.
      2) bcache_device_stop() sets BCACHE_DEV_CLOSING on d->falgs, then call
         closure_queue(&d->cl) to invoke cached_dev_flush(). And in turn
         cached_dev_flush() calls cached_dev_free() via closure_at()
      3) In cached_dev_free(), after stopped writebach kthread
         dc->writeback_thread, the kwork dc->writeback_write_wq is stopping by
         destroy_workqueue().
      4) Inside destroy_workqueue(), drain_workqueue() is called. Inside
         drain_workqueue(), flush_workqueue() is called. Then wq->lockdep_map
         is acquired by lock_map_acquire() in flush_workqueue(). After the
         lock acquired the rest part of flush_workqueue() just wait for the
         workqueue to complete.
      5) Now we look back at writeback thread routine bch_writeback_thread(),
         in the main while-loop, write_dirty() is called via continue_at() in
         read_dirty_submit(), which is called via continue_at() in while-loop
         level called function read_dirty(). Inside write_dirty() it may be
         re-called on workqueeu dc->writeback_write_wq via continue_at().
         It means when the writeback kthread is stopped in cached_dev_free()
         there might be still one kworker queued on dc->writeback_write_wq
         to execute write_dirty() again.
      6) Now this kworker is scheduled on dc->writeback_write_wq to run by
         process_one_work() (which is called by worker_thread()). Before
         calling the kwork routine, wq->lockdep_map is acquired.
      7) But wq->lockdep_map is acquired already in step 4), so a A-A lock
         (lockdep terminology) scenario happens.
      
      Indeed on multiple cores syatem, the above deadlock is very rare to
      happen, just as the code comments in process_one_work() says,
      2263     * AFAICT there is no possible deadlock scenario between the
      2264     * flush_work() and complete() primitives (except for
      	   single-threaded
      2265     * workqueues), so hiding them isn't a problem.
      
      But it is still good to fix such lockdep warning, even no one running
      bcache on single core system.
      
      The fix is simple. This patch solves the above potential deadlock by,
      - Do not destroy workqueue dc->writeback_write_wq in cached_dev_free().
      - Flush and destroy dc->writeback_write_wq in writebach kthread routine
        bch_writeback_thread(), where after quit the thread main while-loop
        and before cached_dev_put() is called.
      
      By this fix, dc->writeback_write_wq will be stopped and destroy before
      the writeback kthread stopped, so the chance for a A-A locking on
      wq->lockdep_map is disappeared, such A-A deadlock won't happen
      any more.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7e865eba
    • C
      bcache: acquire bch_register_lock later in cached_dev_free() · 80265d8d
      Coly Li 提交于
      When enable lockdep engine, a lockdep warning can be observed when
      reboot or shutdown system,
      
      [ 3142.764557][    T1] bcache: bcache_reboot() Stopping all devices:
      [ 3142.776265][ T2649]
      [ 3142.777159][ T2649] ======================================================
      [ 3142.780039][ T2649] WARNING: possible circular locking dependency detected
      [ 3142.782869][ T2649] 5.2.0-rc4-lp151.20-default+ #1 Tainted: G        W
      [ 3142.785684][ T2649] ------------------------------------------------------
      [ 3142.788479][ T2649] kworker/3:67/2649 is trying to acquire lock:
      [ 3142.790738][ T2649] 00000000aaf02291 ((wq_completion)bcache_writeback_wq){+.+.}, at: flush_workqueue+0x87/0x4c0
      [ 3142.794678][ T2649]
      [ 3142.794678][ T2649] but task is already holding lock:
      [ 3142.797402][ T2649] 000000004fcf89c5 (&bch_register_lock){+.+.}, at: cached_dev_free+0x17/0x120 [bcache]
      [ 3142.801462][ T2649]
      [ 3142.801462][ T2649] which lock already depends on the new lock.
      [ 3142.801462][ T2649]
      [ 3142.805277][ T2649]
      [ 3142.805277][ T2649] the existing dependency chain (in reverse order) is:
      [ 3142.808902][ T2649]
      [ 3142.808902][ T2649] -> #2 (&bch_register_lock){+.+.}:
      [ 3142.812396][ T2649]        __mutex_lock+0x7a/0x9d0
      [ 3142.814184][ T2649]        cached_dev_free+0x17/0x120 [bcache]
      [ 3142.816415][ T2649]        process_one_work+0x2a4/0x640
      [ 3142.818413][ T2649]        worker_thread+0x39/0x3f0
      [ 3142.820276][ T2649]        kthread+0x125/0x140
      [ 3142.822061][ T2649]        ret_from_fork+0x3a/0x50
      [ 3142.823965][ T2649]
      [ 3142.823965][ T2649] -> #1 ((work_completion)(&cl->work)#2){+.+.}:
      [ 3142.827244][ T2649]        process_one_work+0x277/0x640
      [ 3142.829160][ T2649]        worker_thread+0x39/0x3f0
      [ 3142.830958][ T2649]        kthread+0x125/0x140
      [ 3142.832674][ T2649]        ret_from_fork+0x3a/0x50
      [ 3142.834915][ T2649]
      [ 3142.834915][ T2649] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}:
      [ 3142.838121][ T2649]        lock_acquire+0xb4/0x1c0
      [ 3142.840025][ T2649]        flush_workqueue+0xae/0x4c0
      [ 3142.842035][ T2649]        drain_workqueue+0xa9/0x180
      [ 3142.844042][ T2649]        destroy_workqueue+0x17/0x250
      [ 3142.846142][ T2649]        cached_dev_free+0x52/0x120 [bcache]
      [ 3142.848530][ T2649]        process_one_work+0x2a4/0x640
      [ 3142.850663][ T2649]        worker_thread+0x39/0x3f0
      [ 3142.852464][ T2649]        kthread+0x125/0x140
      [ 3142.854106][ T2649]        ret_from_fork+0x3a/0x50
      [ 3142.855880][ T2649]
      [ 3142.855880][ T2649] other info that might help us debug this:
      [ 3142.855880][ T2649]
      [ 3142.859663][ T2649] Chain exists of:
      [ 3142.859663][ T2649]   (wq_completion)bcache_writeback_wq --> (work_completion)(&cl->work)#2 --> &bch_register_lock
      [ 3142.859663][ T2649]
      [ 3142.865424][ T2649]  Possible unsafe locking scenario:
      [ 3142.865424][ T2649]
      [ 3142.868022][ T2649]        CPU0                    CPU1
      [ 3142.869885][ T2649]        ----                    ----
      [ 3142.871751][ T2649]   lock(&bch_register_lock);
      [ 3142.873379][ T2649]                                lock((work_completion)(&cl->work)#2);
      [ 3142.876399][ T2649]                                lock(&bch_register_lock);
      [ 3142.879727][ T2649]   lock((wq_completion)bcache_writeback_wq);
      [ 3142.882064][ T2649]
      [ 3142.882064][ T2649]  *** DEADLOCK ***
      [ 3142.882064][ T2649]
      [ 3142.885060][ T2649] 3 locks held by kworker/3:67/2649:
      [ 3142.887245][ T2649]  #0: 00000000e774cdd0 ((wq_completion)events){+.+.}, at: process_one_work+0x21e/0x640
      [ 3142.890815][ T2649]  #1: 00000000f7df89da ((work_completion)(&cl->work)#2){+.+.}, at: process_one_work+0x21e/0x640
      [ 3142.894884][ T2649]  #2: 000000004fcf89c5 (&bch_register_lock){+.+.}, at: cached_dev_free+0x17/0x120 [bcache]
      [ 3142.898797][ T2649]
      [ 3142.898797][ T2649] stack backtrace:
      [ 3142.900961][ T2649] CPU: 3 PID: 2649 Comm: kworker/3:67 Tainted: G        W         5.2.0-rc4-lp151.20-default+ #1
      [ 3142.904789][ T2649] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
      [ 3142.909168][ T2649] Workqueue: events cached_dev_free [bcache]
      [ 3142.911422][ T2649] Call Trace:
      [ 3142.912656][ T2649]  dump_stack+0x85/0xcb
      [ 3142.914181][ T2649]  print_circular_bug+0x19a/0x1f0
      [ 3142.916193][ T2649]  __lock_acquire+0x16cd/0x1850
      [ 3142.917936][ T2649]  ? __lock_acquire+0x6a8/0x1850
      [ 3142.919704][ T2649]  ? lock_acquire+0xb4/0x1c0
      [ 3142.921335][ T2649]  ? find_held_lock+0x34/0xa0
      [ 3142.923052][ T2649]  lock_acquire+0xb4/0x1c0
      [ 3142.924635][ T2649]  ? flush_workqueue+0x87/0x4c0
      [ 3142.926375][ T2649]  flush_workqueue+0xae/0x4c0
      [ 3142.928047][ T2649]  ? flush_workqueue+0x87/0x4c0
      [ 3142.929824][ T2649]  ? drain_workqueue+0xa9/0x180
      [ 3142.931686][ T2649]  drain_workqueue+0xa9/0x180
      [ 3142.933534][ T2649]  destroy_workqueue+0x17/0x250
      [ 3142.935787][ T2649]  cached_dev_free+0x52/0x120 [bcache]
      [ 3142.937795][ T2649]  process_one_work+0x2a4/0x640
      [ 3142.939803][ T2649]  worker_thread+0x39/0x3f0
      [ 3142.941487][ T2649]  ? process_one_work+0x640/0x640
      [ 3142.943389][ T2649]  kthread+0x125/0x140
      [ 3142.944894][ T2649]  ? kthread_create_worker_on_cpu+0x70/0x70
      [ 3142.947744][ T2649]  ret_from_fork+0x3a/0x50
      [ 3142.970358][ T2649] bcache: bcache_device_free() bcache0 stopped
      
      Here is how the deadlock happens.
      1) bcache_reboot() calls bcache_device_stop(), then inside
         bcache_device_stop() BCACHE_DEV_CLOSING bit is set on d->flags.
         Then closure_queue(&d->cl) is called to invoke cached_dev_flush().
      2) In cached_dev_flush(), cached_dev_free() is called by continu_at().
      3) In cached_dev_free(), when stopping the writeback kthread of the
         cached device by kthread_stop(), dc->writeback_thread will be waken
         up to quite the kthread while-loop, then cached_dev_put() is called
         in bch_writeback_thread().
      4) Calling cached_dev_put() in writeback kthread may drop dc->count to
         0, then dc->detach kworker is scheduled, which is initialized as
         cached_dev_detach_finish().
      5) Inside cached_dev_detach_finish(), the last line of code is to call
         closure_put(&dc->disk.cl), which drops the last reference counter of
         closrure dc->disk.cl, then the callback cached_dev_flush() gets
         called.
      Now cached_dev_flush() is called for second time in the code path, the
      first time is in step 2). And again bch_register_lock will be acquired
      again, and a A-A lock (lockdep terminology) is happening.
      
      The root cause of the above A-A lock is in cached_dev_free(), mutex
      bch_register_lock is held before stopping writeback kthread and other
      kworkers. Fortunately now we have variable 'bcache_is_reboot', which may
      prevent device registration or unregistration during reboot/shutdown
      time, so it is unncessary to hold bch_register_lock such early now.
      
      This is how this patch fixes the reboot/shutdown time A-A lock issue:
      After moving mutex_lock(&bch_register_lock) to a later location where
      before atomic_read(&dc->running) in cached_dev_free(), such A-A lock
      problem can be solved without any reboot time registration race.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      80265d8d
    • C
      bcache: acquire bch_register_lock later in cached_dev_detach_finish() · 97ba3b81
      Coly Li 提交于
      Now there is variable bcache_is_reboot to prevent device register or
      unregister during reboot, it is unncessary to still hold mutex lock
      bch_register_lock before stopping writeback_rate_update kworker and
      writeback kthread. And if the stopping kworker or kthread holding
      bch_register_lock inside their routine (we used to have such problem
      in writeback thread, thanks to Junhui Wang fixed it), it is very easy
      to introduce deadlock during reboot/shutdown procedure.
      
      Therefore in this patch, the location to acquire bch_register_lock is
      moved to the location before calling calc_cached_dev_sectors(). Which
      is later then original location in cached_dev_detach_finish().
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      97ba3b81
    • C
      bcache: avoid a deadlock in bcache_reboot() · a59ff6cc
      Coly Li 提交于
      It is quite frequently to observe deadlock in bcache_reboot() happens
      and hang the system reboot process. The reason is, in bcache_reboot()
      when calling bch_cache_set_stop() and bcache_device_stop() the mutex
      bch_register_lock is held. But in the process to stop cache set and
      bcache device, bch_register_lock will be acquired again. If this mutex
      is held here, deadlock will happen inside the stopping process. The
      aftermath of the deadlock is, whole system reboot gets hung.
      
      The fix is to avoid holding bch_register_lock for the following loops
      in bcache_reboot(),
             list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
                      bch_cache_set_stop(c);
      
              list_for_each_entry_safe(dc, tdc, &uncached_devices, list)
                      bcache_device_stop(&dc->disk);
      
      A module range variable 'bcache_is_reboot' is added, it sets to true
      in bcache_reboot(). In register_bcache(), if bcache_is_reboot is checked
      to be true, reject the registration by returning -EBUSY immediately.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a59ff6cc
    • C
      bcache: stop writeback kthread and kworker when bch_cached_dev_run() failed · 5c2a634c
      Coly Li 提交于
      In bch_cached_dev_attach() after bch_cached_dev_writeback_start()
      called, the wrireback kthread and writeback rate update kworker of the
      cached device are created, if the following bch_cached_dev_run()
      failed, bch_cached_dev_attach() will return with -ENOMEM without
      stopping the writeback related kthread and kworker.
      
      This patch stops writeback kthread and writeback rate update kworker
      before returning -ENOMEM if bch_cached_dev_run() returns error.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5c2a634c
    • C
      bcache: add pendings_cleanup to stop pending bcache device · 0c277e21
      Coly Li 提交于
      If a bcache device is in dirty state and its cache set is not
      registered, this bcache device will not appear in /dev/bcache<N>,
      and there is no way to stop it or remove the bcache kernel module.
      
      This is an as-designed behavior, but sometimes people has to reboot
      whole system to release or stop the pending backing device.
      
      This sysfs interface may remove such pending bcache devices when
      write anything into the sysfs file manually.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0c277e21
    • C
      bcache: remove "XXX:" comment line from run_cache_set() · 68a53c95
      Coly Li 提交于
      In previous bcache patches for Linux v5.2, the failure code path of
      run_cache_set() is tested and fixed. So now the following comment
      line can be removed from run_cache_set(),
      	/* XXX: test this, it's broken */
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      68a53c95
    • C
      bcache: improve error message in bch_cached_dev_run() · e0faa3d7
      Coly Li 提交于
      This patch adds more error message in bch_cached_dev_run() to indicate
      the exact reason why an error value is returned. Please notice when
      printing out the "is running already" message, pr_info() is used here,
      because in this case also -EBUSY is returned, the bcache device can
      continue to attach to the cache devince and run, so it won't be an
      error level message in kernel message.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e0faa3d7
    • C
      bcache: add more error message in bch_cached_dev_attach() · 633bb2ce
      Coly Li 提交于
      This patch adds more error message for attaching cached device, this is
      helpful to debug code failure during bache device start up.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      633bb2ce
    • C
      bcache: more detailed error message to bcache_device_link() · 4b6efb4b
      Coly Li 提交于
      This patch adds more accurate error message for specific
      ssyfs_create_link() call, to help debugging failure during
      bcache device start tup.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4b6efb4b
    • C
      bcache: add return value check to bch_cached_dev_run() · 0b13efec
      Coly Li 提交于
      This patch adds return value check to bch_cached_dev_run(), now if there
      is error happens inside bch_cached_dev_run(), it can be catched.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0b13efec
    • C
      bcache: add io error counting in write_bdev_super_endio() · 08ec1e62
      Coly Li 提交于
      When backing device super block is written by bch_write_bdev_super(),
      the bio complete callback write_bdev_super_endio() simply ignores I/O
      status. Indeed such write request also contribute to backing device
      health status if the request failed.
      
      This patch checkes bio->bi_status in write_bdev_super_endio(), if there
      is error, bch_count_backing_io_errors() will be called to count an I/O
      error to dc->io_errors.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      08ec1e62
    • C
      bcache: avoid flushing btree node in cache_set_flush() if io disabled · e6dcbd3e
      Coly Li 提交于
      When cache_set_flush() is called for too many I/O errors detected on
      cache device and the cache set is retiring, inside the function it
      doesn't make sense to flushing cached btree nodes from c->btree_cache
      because CACHE_SET_IO_DISABLE is set on c->flags already and all I/Os
      onto cache device will be rejected.
      
      This patch checks in cache_set_flush() that whether CACHE_SET_IO_DISABLE
      is set. If yes, then avoids to flush the cached btree nodes to reduce
      more time and make cache set retiring more faster.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e6dcbd3e
    • C
      Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()" · 695277f1
      Coly Li 提交于
      This reverts commit 6147305c.
      
      Although this patch helps the failed bcache device to stop faster when
      too many I/O errors detected on corresponding cached device, setting
      CACHE_SET_IO_DISABLE bit to cache set c->flags was not a good idea. This
      operation will disable all I/Os on cache set, which means other attached
      bcache devices won't work neither.
      
      Without this patch, the failed bcache device can also be stopped
      eventually if internal I/O accomplished (e.g. writeback). Therefore here
      I revert it.
      
      Fixes: 6147305c ("bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()")
      Reported-by: NYong Li <mr.liyong@qq.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      695277f1
    • C
      bcache: check c->gc_thread by IS_ERR_OR_NULL in cache_set_flush() · b387e9b5
      Coly Li 提交于
      When system memory is in heavy pressure, bch_gc_thread_start() from
      run_cache_set() may fail due to out of memory. In such condition,
      c->gc_thread is assigned to -ENOMEM, not NULL pointer. Then in following
      failure code path bch_cache_set_error(), when cache_set_flush() gets
      called, the code piece to stop c->gc_thread is broken,
               if (!IS_ERR_OR_NULL(c->gc_thread))
                       kthread_stop(c->gc_thread);
      
      And KASAN catches such NULL pointer deference problem, with the warning
      information:
      
      [  561.207881] ==================================================================
      [  561.207900] BUG: KASAN: null-ptr-deref in kthread_stop+0x3b/0x440
      [  561.207904] Write of size 4 at addr 000000000000001c by task kworker/15:1/313
      
      [  561.207913] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G        W         5.0.0-vanilla+ #3
      [  561.207916] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019
      [  561.207935] Workqueue: events cache_set_flush [bcache]
      [  561.207940] Call Trace:
      [  561.207948]  dump_stack+0x9a/0xeb
      [  561.207955]  ? kthread_stop+0x3b/0x440
      [  561.207960]  ? kthread_stop+0x3b/0x440
      [  561.207965]  kasan_report+0x176/0x192
      [  561.207973]  ? kthread_stop+0x3b/0x440
      [  561.207981]  kthread_stop+0x3b/0x440
      [  561.207995]  cache_set_flush+0xd4/0x6d0 [bcache]
      [  561.208008]  process_one_work+0x856/0x1620
      [  561.208015]  ? find_held_lock+0x39/0x1d0
      [  561.208028]  ? drain_workqueue+0x380/0x380
      [  561.208048]  worker_thread+0x87/0xb80
      [  561.208058]  ? __kthread_parkme+0xb6/0x180
      [  561.208067]  ? process_one_work+0x1620/0x1620
      [  561.208072]  kthread+0x326/0x3e0
      [  561.208079]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [  561.208090]  ret_from_fork+0x3a/0x50
      [  561.208110] ==================================================================
      [  561.208113] Disabling lock debugging due to kernel taint
      [  561.208115] irq event stamp: 11800231
      [  561.208126] hardirqs last  enabled at (11800231): [<ffffffff83008538>] do_syscall_64+0x18/0x410
      [  561.208127] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
      [  561.208129] #PF error: [WRITE]
      [  561.312253] hardirqs last disabled at (11800230): [<ffffffff830052ff>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  561.312259] softirqs last  enabled at (11799832): [<ffffffff850005c7>] __do_softirq+0x5c7/0x8c3
      [  561.405975] PGD 0 P4D 0
      [  561.442494] softirqs last disabled at (11799821): [<ffffffff831add2c>] irq_exit+0x1ac/0x1e0
      [  561.791359] Oops: 0002 [#1] SMP KASAN NOPTI
      [  561.791362] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G    B   W         5.0.0-vanilla+ #3
      [  561.791363] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019
      [  561.791371] Workqueue: events cache_set_flush [bcache]
      [  561.791374] RIP: 0010:kthread_stop+0x3b/0x440
      [  561.791376] Code: 00 00 65 8b 05 26 d5 e0 7c 89 c0 48 0f a3 05 ec aa df 02 0f 82 dc 02 00 00 4c 8d 63 20 be 04 00 00 00 4c 89 e7 e8 65 c5 53 00 <f0> ff 43 20 48 8d 7b 24 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
      [  561.791377] RSP: 0018:ffff88872fc8fd10 EFLAGS: 00010286
      [  561.838895] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838916] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838934] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838948] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838966] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838979] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  561.838996] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  563.067028] RAX: 0000000000000000 RBX: fffffffffffffffc RCX: ffffffff832dd314
      [  563.067030] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000297
      [  563.067032] RBP: ffff88872fc8fe88 R08: fffffbfff0b8213d R09: fffffbfff0b8213d
      [  563.067034] R10: 0000000000000001 R11: fffffbfff0b8213c R12: 000000000000001c
      [  563.408618] R13: ffff88dc61cc0f68 R14: ffff888102b94900 R15: ffff88dc61cc0f68
      [  563.408620] FS:  0000000000000000(0000) GS:ffff888f7dc00000(0000) knlGS:0000000000000000
      [  563.408622] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  563.408623] CR2: 000000000000001c CR3: 0000000f48a1a004 CR4: 00000000007606e0
      [  563.408625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  563.408627] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  563.904795] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  563.915796] PKRU: 55555554
      [  563.915797] Call Trace:
      [  563.915807]  cache_set_flush+0xd4/0x6d0 [bcache]
      [  563.915812]  process_one_work+0x856/0x1620
      [  564.001226] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.033563]  ? find_held_lock+0x39/0x1d0
      [  564.033567]  ? drain_workqueue+0x380/0x380
      [  564.033574]  worker_thread+0x87/0xb80
      [  564.062823] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.118042]  ? __kthread_parkme+0xb6/0x180
      [  564.118046]  ? process_one_work+0x1620/0x1620
      [  564.118048]  kthread+0x326/0x3e0
      [  564.118050]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [  564.167066] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.252441]  ret_from_fork+0x3a/0x50
      [  564.252447] Modules linked in: msr rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib i40iw configfs iw_cm ib_cm libiscsi scsi_transport_iscsi mlx4_ib ib_uverbs mlx4_en ib_core nls_iso8859_1 nls_cp437 vfat fat intel_rapl skx_edac x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ses raid0 aesni_intel cdc_ether enclosure usbnet ipmi_ssif joydev aes_x86_64 i40e scsi_transport_sas mii bcache md_mod crypto_simd mei_me ioatdma crc64 ptp cryptd pcspkr i2c_i801 mlx4_core glue_helper pps_core mei lpc_ich dca wmi ipmi_si ipmi_devintf nd_pmem dax_pmem nd_btt ipmi_msghandler device_dax pcc_cpufreq button hid_generic usbhid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt fb_sys_fops xhci_hcd ttm megaraid_sas drm usbcore nfit libnvdimm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
      [  564.299390] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
      [  564.348360] CR2: 000000000000001c
      [  564.348362] ---[ end trace b7f0e5cc7b2103b0 ]---
      
      Therefore, it is not enough to only check whether c->gc_thread is NULL,
      we should use IS_ERR_OR_NULL() to check both NULL pointer and error
      value.
      
      This patch changes the above buggy code piece in this way,
               if (!IS_ERR_OR_NULL(c->gc_thread))
                       kthread_stop(c->gc_thread);
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b387e9b5
  4. 30 4月, 2019 1 次提交
  5. 25 4月, 2019 9 次提交
    • S
      bcache: avoid potential memleak of list of journal_replay(s) in the CACHE_SYNC... · 95f18c9d
      Shenghui Wang 提交于
      bcache: avoid potential memleak of list of journal_replay(s) in the CACHE_SYNC branch of run_cache_set
      
      In the CACHE_SYNC branch of run_cache_set(), LIST_HEAD(journal) is used
      to collect journal_replay(s) and filled by bch_journal_read().
      
      If all goes well, bch_journal_replay() will release the list of
      jounal_replay(s) at the end of the branch.
      
      If something goes wrong, code flow will jump to the label "err:" and leave
      the list unreleased.
      
      This patch will release the list of journal_replay(s) in the case of
      error detected.
      
      v1 -> v2:
      * Move the release code to the location after label 'err:' to
        simply the change.
      Signed-off-by: NShenghui Wang <shhuiw@foxmail.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      95f18c9d
    • C
      bcache: improve bcache_reboot() · eb8cbb6d
      Coly Li 提交于
      This patch tries to release mutex bch_register_lock early, to give
      chance to stop cache set and bcache device early.
      
      This patch also expends time out of stopping all bcache device from
      2 seconds to 10 seconds, because stopping writeback rate update worker
      may delay for 5 seconds, 2 seconds is not enough.
      
      After this patch applied, stopping bcache devices during system reboot
      or shutdown is very hard to be observed any more.
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      eb8cbb6d
    • C
      bcache: add comments for closure_fn to be called in closure_queue() · 63d63b51
      Coly Li 提交于
      Add code comments to explain which call back function might be called
      for the closure_queue(). This is an effort to make code to be more
      understandable for readers.
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      63d63b51
    • C
      bcache: Add comments for blkdev_put() in registration code path · bb6d355c
      Coly Li 提交于
      Add comments to explain why in register_bcache() blkdev_put() won't
      be called in two location. Add comments to explain why blkdev_put()
      must be called in register_cache() when cache_alloc() failed.
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      bb6d355c
    • C
      bcache: add error check for calling register_bdev() · 88c12d42
      Coly Li 提交于
      This patch adds return value to register_bdev(). Then if failure happens
      inside register_bdev(), its caller register_bcache() may detect and
      handle the failure more properly.
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      88c12d42
    • C
      bcache: add comments for kobj release callback routine · 2d17456e
      Coly Li 提交于
      Bcache has several routines to release resources in implicit way, they
      are called when the associated kobj released. This patch adds code
      comments to notice when and which release callback will be called,
      - When dc->disk.kobj released:
        void bch_cached_dev_release(struct kobject *kobj)
      - When d->kobj released:
        void bch_flash_dev_release(struct kobject *kobj)
      - When c->kobj released:
        void bch_cache_set_release(struct kobject *kobj)
      - When ca->kobj released
        void bch_cache_release(struct kobject *kobj)
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2d17456e
    • C
      bcache: add failure check to run_cache_set() for journal replay · ce3e4cfb
      Coly Li 提交于
      Currently run_cache_set() has no return value, if there is failure in
      bch_journal_replay(), the caller of run_cache_set() has no idea about
      such failure and just continue to execute following code after
      run_cache_set().  The internal failure is triggered inside
      bch_journal_replay() and being handled in async way. This behavior is
      inefficient, while failure handling inside bch_journal_replay(), cache
      register code is still running to start the cache set. Registering and
      unregistering code running as same time may introduce some rare race
      condition, and make the code to be more hard to be understood.
      
      This patch adds return value to run_cache_set(), and returns -EIO if
      bch_journal_rreplay() fails. Then caller of run_cache_set() may detect
      such failure and stop registering code flow immedidately inside
      register_cache_set().
      
      If journal replay fails, run_cache_set() can report error immediately
      to register_cache_set(). This patch makes the failure handling for
      bch_journal_replay() be in synchronized way, easier to understand and
      debug, and avoid poetential race condition for register-and-unregister
      in same time.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ce3e4cfb
    • L
      bcache: fix a race between cache register and cacheset unregister · a4b732a2
      Liang Chen 提交于
      There is a race between cache device register and cache set unregister.
      For an already registered cache device, register_bcache will call
      bch_is_open to iterate through all cachesets and check every cache
      there. The race occurs if cache_set_free executes at the same time and
      clears the caches right before ca is dereferenced in bch_is_open_cache.
      To close the race, let's make sure the clean up work is protected by
      the bch_register_lock as well.
      
      This issue can be reproduced as follows,
      while true; do echo /dev/XXX> /sys/fs/bcache/register ; done&
      while true; do echo 1> /sys/block/XXX/bcache/set/unregister ; done &
      
      and results in the following oops,
      
      [  +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998
      [  +0.000457] #PF error: [normal kernel read fault]
      [  +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0
      [  +0.000388] Oops: 0000 [#1] SMP PTI
      [  +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6
      [  +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
      [  +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache]
      [  +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f <49> 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d
      [  +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202
      [  +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000
      [  +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001
      [  +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a
      [  +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0
      [  +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620
      [  +0.000384] FS:  00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000
      [  +0.000420] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  +0.000801] CR2: 0000000000000998 CR3: 000000003cad6000 CR4: 00000000001406e0
      [  +0.000837] Call Trace:
      [  +0.000682]  ? _cond_resched+0x10/0x20
      [  +0.000691]  ? __kmalloc+0x131/0x1b0
      [  +0.000710]  kernfs_fop_write+0xfa/0x170
      [  +0.000733]  __vfs_write+0x2e/0x190
      [  +0.000688]  ? inode_security+0x10/0x30
      [  +0.000698]  ? selinux_file_permission+0xd2/0x120
      [  +0.000752]  ? security_file_permission+0x2b/0x100
      [  +0.000753]  vfs_write+0xa8/0x1a0
      [  +0.000676]  ksys_write+0x4d/0xb0
      [  +0.000699]  do_syscall_64+0x3a/0xf0
      [  +0.000692]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: NLiang Chen <liangchen.linux@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a4b732a2
    • G
      bcache: use kmemdup_nul for CACHED_LABEL buffer · 792732d9
      Geliang Tang 提交于
      This patch uses kmemdup_nul to create a NUL-terminated string from
      dc->sb.label. This is better than open coding it.
      
      With this, we can move env[2] initialization into env[] array to make
      code more elegant.
      Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      792732d9
  6. 09 2月, 2019 1 次提交
  7. 13 12月, 2018 3 次提交
    • C
      bcache: make cutoff_writeback and cutoff_writeback_sync tunable · 9aaf5165
      Coly Li 提交于
      Currently the cutoff writeback and cutoff writeback sync thresholds are
      defined by CUTOFF_WRITEBACK (40) and CUTOFF_WRITEBACK_SYNC (70) as
      static values. Most of time these they work fine, but when people want
      to do research on bcache writeback mode performance tuning, there is no
      chance to modify the soft and hard cutoff writeback values.
      
      This patch introduces two module parameters bch_cutoff_writeback_sync
      and bch_cutoff_writeback which permit people to tune the values when
      loading bcache.ko. If they are not specified by module loading, current
      values CUTOFF_WRITEBACK_SYNC and CUTOFF_WRITEBACK will be used as
      default and nothing changes.
      
      When people want to tune this two values,
      - cutoff_writeback can be set in range [1, 70]
      - cutoff_writeback_sync can be set in range [1, 90]
      - cutoff_writeback always <= cutoff_writeback_sync
      
      The default values are strongly recommended to most of users for most of
      workloads. Anyway, if people wants to take their own risk to do research
      on new writeback cutoff tuning for their own workload, now they can make
      it.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9aaf5165
    • C
      bcache: add MODULE_DESCRIPTION information · 009673d0
      Coly Li 提交于
      This patch moves MODULE_AUTHOR and MODULE_LICENSE to end of super.c, and
      add MODULE_DESCRIPTION("Bcache: a Linux block layer cache").
      
      This is preparation for adding module parameters.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      009673d0
    • S
      bcache: do not check if debug dentry is ERR or NULL explicitly on remove · ae171023
      Shenghui Wang 提交于
      debugfs_remove and debugfs_remove_recursive will check if the dentry
      pointer is NULL or ERR, and will do nothing in that case.
      
      Remove the check in cache_set_free and bch_debug_init.
      Signed-off-by: NShenghui Wang <shhuiw@foxmail.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ae171023
  8. 08 10月, 2018 7 次提交
  9. 27 9月, 2018 1 次提交
    • G
      bcache: add separate workqueue for journal_write to avoid deadlock · 0f843e65
      Guoju Fang 提交于
      After write SSD completed, bcache schedules journal_write work to
      system_wq, which is a public workqueue in system, without WQ_MEM_RECLAIM
      flag. system_wq is also a bound wq, and there may be no idle kworker on
      current processor. Creating a new kworker may unfortunately need to
      reclaim memory first, by shrinking cache and slab used by vfs, which
      depends on bcache device. That's a deadlock.
      
      This patch create a new workqueue for journal_write with WQ_MEM_RECLAIM
      flag. It's rescuer thread will work to avoid the deadlock.
      Signed-off-by: NGuoju Fang <fangguoju@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0f843e65