    bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() · 42f21b46
    Committed by Coly Li
    mainline inclusion
    from v5.19-rc1
    commit a1a2d8f0
    category: bugfix
    bugzilla: https://gitee.com/openeuler/kernel/issues/I59A5L?from=project-issue
    CVE: N/A
    
    -----------------------------------------
    
    The kworker routine update_writeback_rate() is scheduled to update the
    writeback rate every 5 seconds by default. Before calling
    __update_writeback_rate() to do the real work, the kworker routine
    must hold the semaphore dc->writeback_lock.
    
    At the same time, the bcache writeback thread routine
    bch_writeback_thread() also needs to hold dc->writeback_lock before
    flushing dirty data back into the backing device. If the dirty data set
    is large, it may take bch_writeback_thread() a very long time to scan
    all dirty buckets and release dc->writeback_lock. In such a case
    update_writeback_rate() can be starved long enough for the kernel to
    report a soft lockup warning that starts like:
      watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713]
    
    Such a soft lockup report is unnecessary, because once the writeback
    thread finishes its job and releases dc->writeback_lock, the kworker
    update_writeback_rate() can continue its work and everything is fine.
    
    This patch avoids the unnecessary soft lockup with the following method:
    - Add a new member to struct cached_dev
      - dc->rate_update_retry (0 by default)
    - In update_writeback_rate(), call down_read_trylock(&dc->writeback_lock)
      first; if it fails, the lock is contended.
    - If dc->rate_update_retry <= BCH_WBRATE_UPDATE_MAX_SKIPS (15), don't
      acquire the lock and reschedule the kworker for the next try.
    - If dc->rate_update_retry > BCH_WBRATE_UPDATE_MAX_SKIPS, stop retrying
      and call down_read(&dc->writeback_lock) to wait for the lock.
    
    With the above method, in the worst case update_writeback_rate() may
    retry for 1+ minutes (15 skipped rounds at the default 5-second
    interval, roughly 75 seconds) before blocking on dc->writeback_lock by
    calling down_read(). For a 4TB cache device with 1TB dirty data, 90%+ of
    the unnecessary soft lockup warning messages can be avoided.
    
    While update_writeback_rate() is retrying to acquire
    dc->writeback_lock, the writeback rate of course cannot be updated. This
    is fair, because when the kworker is blocked by contention on
    dc->writeback_lock, the writeback rate cannot be updated either.
    
    This change follows Jens Axboe's suggestion for a clearer and simpler
    version.
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20220528124550.32834-2-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Reviewed-by: Jason Yan <yanaijie@huawei.com>
    Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>