  1. 26 Jan, 2021 1 commit
    • scsi: do quiesce for enclosure driver · bfbbc08d
      Yufen Yu authored
      hulk inclusion
      category: bugfix
      bugzilla: 46860
      
      --------------------------------
      
      Some drivers (such as the SCSI enclosure driver) do not call
      blk_register_queue() to initialize the request_queue, so we rely on
      the driver itself to handle the race when the queue is cleaned up.
      
      However, some self-developed drivers cannot handle this race. To avoid
      a NULL pointer dereference like the following, do the quiesce in the kernel.
      
      [67760.308034] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      ...
      [67760.308069] pc : blk_mq_do_dispatch_sched+0x94/0x130
      [67760.308072] lr : blk_mq_sched_dispatch_requests+0x128/0x1f0
      [67760.308072] sp : ffff0000b2bb3ca0
      [67760.308073] x29: ffff0000b2bb3ca0 x28: 0000000000000000
      [67760.308075] x27: 0000000000000000 x26: ffff00008128a000
      [67760.308076] x25: ffff8042b13e9700 x24: 0000000000000000
      [67760.308077] x23: ffff0000b2bb3cf8 x22: ffff8042b2976808
      [67760.308078] x21: ffff0000b2bb3d58 x20: ffff00008128a000
      [67760.308079] x19: ffff8042b2976800 x18: 0000000000000000
      [67760.308080] x17: 0000000000000000 x16: 0000000000000000
      [67760.308081] x15: 0000000000000000 x14: 0000000000000000
      [67760.308082] x13: 0000000000000000 x12: 0000000000000000
      [67760.308084] x11: ffff800040801004 x10: ffff80004080100c
      [67760.308085] x9 : 0000000000000060 x8 : ffff809e58f7e500
      [67760.308087] x7 : 0000000000000000 x6 : 00000000ffffffff
      [67760.308088] x5 : ffff000080ab7550 x4 : ffff80418e84ec80
      [67760.308089] x3 : ffff0000815d7f10 x2 : 94e133dc71839c00
      [67760.308090] x1 : 0000000000000000 x0 : ffff00008128a748
      [67760.308091] Call trace:
      [67760.308093]  blk_mq_do_dispatch_sched+0x94/0x130
      [67760.308095]  blk_mq_sched_dispatch_requests+0x128/0x1f0
      [67760.308096]  __blk_mq_run_hw_queue+0x98/0x138
      [67760.308097]  blk_mq_run_work_fn+0x28/0x38
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  2. 22 Sep, 2020 2 commits
    • block: split .sysfs_lock into two locks · 6c58d2be
      Ming Lei authored
      mainline inclusion
      from mainline-5.4-rc1
      commit cecf5d87
      category: bugfix
      bugzilla: 21614
      CVE: NA
      
      ---------------------------
      
      The kernfs built-in lock of 'kn->count' is held in sysfs .show/.store
      path. Meantime, inside block's .show/.store callback, q->sysfs_lock is
      required.
      
      However, when mq & iosched kobjects are removed via
      blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
      too. This causes an AB-BA deadlock because the kernfs built-in lock of
      'kn->count' is required inside kobject_del() too, see the lockdep warning[1].
      
      On the other hand, it isn't necessary to acquire q->sysfs_lock for
      both blk_mq_unregister_dev() & elv_unregister_queue() because
      clearing the REGISTERED flag prevents stores to 'queue/scheduler'
      from happening. Also, sysfs write (store) is exclusive, so it is not
      necessary to hold the lock for elv_unregister_queue() when it is
      called in the elevator switching path.
      
      So split .sysfs_lock into two: one is still named as .sysfs_lock for
      covering sync .store, the other one is named as .sysfs_dir_lock
      for covering kobjects and related status change.
      
      sysfs itself can handle the race between add/remove kobjects and
      showing/storing attributes under kobjects. For switching scheduler
      via storing to 'queue/scheduler', we use the queue flag
      QUEUE_FLAG_REGISTERED with .sysfs_lock to avoid the race, so
      we can avoid holding .sysfs_lock while removing/adding kobjects.
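      
      The following is not part of the commit. It is a minimal userspace
      sketch of the locking scheme described above, assuming pthread mutexes
      as stand-ins for the kernel mutexes. The struct toy_queue type and all
      toy_* functions are hypothetical names used only for illustration.

      ```c
      #include <errno.h>
      #include <pthread.h>
      #include <stdbool.h>

      /* Toy stand-in for struct request_queue; every name here is hypothetical. */
      struct toy_queue {
          pthread_mutex_t sysfs_lock;     /* serializes .store and the flag */
          pthread_mutex_t sysfs_dir_lock; /* covers kobject add/remove */
          bool registered;                /* models QUEUE_FLAG_REGISTERED */
          int nr_kobjects;                /* models the mq/iosched kobjects */
          int scheduler;                  /* models the active elevator */
      };

      void toy_queue_register(struct toy_queue *q)
      {
          pthread_mutex_init(&q->sysfs_lock, NULL);
          pthread_mutex_init(&q->sysfs_dir_lock, NULL);
          q->scheduler = 0;

          /* Add kobjects under the directory lock only. */
          pthread_mutex_lock(&q->sysfs_dir_lock);
          q->nr_kobjects = 2;
          pthread_mutex_unlock(&q->sysfs_dir_lock);

          /* Publish the queue as registered under sysfs_lock. */
          pthread_mutex_lock(&q->sysfs_lock);
          q->registered = true;
          pthread_mutex_unlock(&q->sysfs_lock);
      }

      /* Models a write to 'queue/scheduler'. */
      int toy_store_scheduler(struct toy_queue *q, int id)
      {
          int ret = -ENOENT;

          pthread_mutex_lock(&q->sysfs_lock);
          if (q->registered) {
              q->scheduler = id;
              ret = 0;
          }
          pthread_mutex_unlock(&q->sysfs_lock);
          return ret;
      }

      void toy_queue_unregister(struct toy_queue *q)
      {
          /* Step 1: clear the flag under sysfs_lock so later stores bail out. */
          pthread_mutex_lock(&q->sysfs_lock);
          q->registered = false;
          pthread_mutex_unlock(&q->sysfs_lock);

          /* Step 2: remove kobjects without holding sysfs_lock. */
          pthread_mutex_lock(&q->sysfs_dir_lock);
          q->nr_kobjects = 0;
          pthread_mutex_unlock(&q->sysfs_dir_lock);
      }
      ```

      In this model sysfs_lock is never held while kobjects are removed, so
      the removal path cannot form the AB-BA ordering from the lockdep report.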
      
      [1]  lockdep warning
          ======================================================
          WARNING: possible circular locking dependency detected
          5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
          ------------------------------------------------------
          rmmod/777 is trying to acquire lock:
          00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72
      
          but task is already holding lock:
          00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&q->sysfs_lock){+.+.}:
                 __lock_acquire+0x95f/0xa2f
                 lock_acquire+0x1b4/0x1e8
                 __mutex_lock+0x14a/0xa9b
                 blk_mq_hw_sysfs_show+0x63/0xb6
                 sysfs_kf_seq_show+0x11f/0x196
                 seq_read+0x2cd/0x5f2
                 vfs_read+0xc7/0x18c
                 ksys_read+0xc4/0x13e
                 do_syscall_64+0xa7/0x295
                 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
          -> #0 (kn->count#202){++++}:
                 check_prev_add+0x5d2/0xc45
                 validate_chain+0xed3/0xf94
                 __lock_acquire+0x95f/0xa2f
                 lock_acquire+0x1b4/0x1e8
                 __kernfs_remove+0x237/0x40b
                 kernfs_remove_by_name_ns+0x59/0x72
                 remove_files+0x61/0x96
                 sysfs_remove_group+0x81/0xa4
                 sysfs_remove_groups+0x3b/0x44
                 kobject_del+0x44/0x94
                 blk_mq_unregister_dev+0x83/0xdd
                 blk_unregister_queue+0xa0/0x10b
                 del_gendisk+0x259/0x3fa
                 null_del_dev+0x8b/0x1c3 [null_blk]
                 null_exit+0x5c/0x95 [null_blk]
                 __se_sys_delete_module+0x204/0x337
                 do_syscall_64+0xa7/0x295
                 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
          other info that might help us debug this:
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(&q->sysfs_lock);
                                         lock(kn->count#202);
                                         lock(&q->sysfs_lock);
            lock(kn->count#202);
      
           *** DEADLOCK ***
      
          2 locks held by rmmod/777:
           #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
           #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
      
          stack backtrace:
          CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
          Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
          Call Trace:
           dump_stack+0x9a/0xe6
           check_noncircular+0x207/0x251
           ? print_circular_bug+0x32a/0x32a
           ? find_usage_backwards+0x84/0xb0
           check_prev_add+0x5d2/0xc45
           validate_chain+0xed3/0xf94
           ? check_prev_add+0xc45/0xc45
           ? mark_lock+0x11b/0x804
           ? check_usage_forwards+0x1ca/0x1ca
           __lock_acquire+0x95f/0xa2f
           lock_acquire+0x1b4/0x1e8
           ? kernfs_remove_by_name_ns+0x59/0x72
           __kernfs_remove+0x237/0x40b
           ? kernfs_remove_by_name_ns+0x59/0x72
           ? kernfs_next_descendant_post+0x7d/0x7d
           ? strlen+0x10/0x23
           ? strcmp+0x22/0x44
           kernfs_remove_by_name_ns+0x59/0x72
           remove_files+0x61/0x96
           sysfs_remove_group+0x81/0xa4
           sysfs_remove_groups+0x3b/0x44
           kobject_del+0x44/0x94
           blk_mq_unregister_dev+0x83/0xdd
           blk_unregister_queue+0xa0/0x10b
           del_gendisk+0x259/0x3fa
           ? disk_events_poll_msecs_store+0x12b/0x12b
           ? check_flags+0x1ea/0x204
           ? mark_held_locks+0x1f/0x7a
           null_del_dev+0x8b/0x1c3 [null_blk]
           null_exit+0x5c/0x95 [null_blk]
           __se_sys_delete_module+0x204/0x337
           ? free_module+0x39f/0x39f
           ? blkcg_maybe_throttle_current+0x8a/0x718
           ? rwlock_bug+0x62/0x62
           ? __blkcg_punt_bio_submit+0xd0/0xd0
           ? trace_hardirqs_on_thunk+0x1a/0x20
           ? mark_held_locks+0x1f/0x7a
           ? do_syscall_64+0x4c/0x295
           do_syscall_64+0xa7/0x295
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
          RIP: 0033:0x7fb696cdbe6b
          Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
          RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
          RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
          RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
          RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
          R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
          R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      
      Conflicts:
        block/blk.h
        block/elevator.c
        block/blk-sysfs.c
        block/blk-core.c
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • block: add helper for checking if queue is registered · 2b920b89
      Ming Lei authored
      mainline inclusion
      from mainline-5.4-rc1
      commit 58c898ba
      category: bugfix
      bugzilla: 21614
      CVE: NA
      
      ---------------------------
      
      There are 4 users which check if queue is registered, so add one helper
      to check it.
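      
      The following is not part of the commit. It is a rough userspace sketch
      of what such a helper might look like: a single predicate that tests
      one bit in a flags word. struct flag_queue, the bit position and the
      flag_queue_* names are assumptions made for this illustration.

      ```c
      #include <stdbool.h>

      /* Hypothetical bit position; the real value depends on the kernel version. */
      #define FLAG_QUEUE_REGISTERED_BIT 5

      /* Toy stand-in for struct request_queue, reduced to its flags word. */
      struct flag_queue {
          unsigned long queue_flags;
      };

      /* The helper: one place that answers "is this queue registered?". */
      bool flag_queue_registered(const struct flag_queue *q)
      {
          return ((q->queue_flags >> FLAG_QUEUE_REGISTERED_BIT) & 1UL) != 0;
      }

      /* Set the registered bit, leaving all other flags untouched. */
      void flag_queue_set_registered(struct flag_queue *q)
      {
          q->queue_flags |= 1UL << FLAG_QUEUE_REGISTERED_BIT;
      }

      /* Clear the registered bit, leaving all other flags untouched. */
      void flag_queue_clear_registered(struct flag_queue *q)
      {
          q->queue_flags &= ~(1UL << FLAG_QUEUE_REGISTERED_BIT);
      }
      ```

      Callers then use flag_queue_registered(q) instead of repeating the bit
      test at each of the call sites.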
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: Yufen Yu <yuyufen@huawei.com>
      
      Conflicts:
        block/blk-sysfs.c
        block/blk-wbt.c
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  3. 25 Aug, 2020 1 commit
    • blk-mq: add optional request->alloc_time_ns · 449de6b8
      Tejun Heo authored
      mainline inclusion
      from mainline-5.4-rc1
      commit 6f816b4b746c2241540e537682d30d8e9997d674
      category: feature
      bugzilla: 38688
      CVE: NA
      
      ---------------------------
      
      There are currently two start time timestamps - start_time_ns and
      io_start_time_ns.  The former marks the request allocation and the
      latter the issue-to-device time.  The planned io.weight controller needs
      to measure the total time bios take to execute after they leave rq_qos,
      including the time spent waiting for a request to become available,
      which can easily dominate on saturated devices.
      
      This patch adds request->alloc_time_ns which records when the request
      allocation attempt started.  As it isn't used for the usual stats,
      make it optional behind CONFIG_BLK_RQ_ALLOC_TIME and
      QUEUE_FLAG_RQ_ALLOC_TIME so that it can be compiled out when there are
      no users and it's active only on queues which need it even when
      compiled in.
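      
      The following is not part of the commit. It is a minimal userspace
      sketch of the two-level gating described above, assuming a compile-time
      macro in place of CONFIG_BLK_RQ_ALLOC_TIME and a per-queue boolean in
      place of QUEUE_FLAG_RQ_ALLOC_TIME. All ts_* names are hypothetical, and
      timestamps are passed in by the caller instead of being read from a clock.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Stand-in for CONFIG_BLK_RQ_ALLOC_TIME; remove to compile the field out. */
      #define TS_RQ_ALLOC_TIME 1

      /* Toy queue: the flag models QUEUE_FLAG_RQ_ALLOC_TIME. */
      struct ts_queue {
          bool rq_alloc_time;
      };

      /* Toy request: alloc_time_ns only exists when the feature is compiled in. */
      struct ts_request {
          uint64_t start_time_ns;
      #ifdef TS_RQ_ALLOC_TIME
          uint64_t alloc_time_ns;
      #endif
      };

      /*
       * attempt_ns: when the allocation attempt started.
       * allocated_ns: when the request was actually obtained.
       */
      void ts_request_init(const struct ts_queue *q, struct ts_request *rq,
                           uint64_t attempt_ns, uint64_t allocated_ns)
      {
          rq->start_time_ns = allocated_ns;
      #ifdef TS_RQ_ALLOC_TIME
          /* Runtime gate: only queues that asked for it pay for the stamp. */
          rq->alloc_time_ns = q->rq_alloc_time ? attempt_ns : 0;
      #else
          (void)q;
          (void)attempt_ns;
      #endif
      }

      /* Accessor that reads as 0 when the field is compiled out or unused. */
      uint64_t ts_request_alloc_time(const struct ts_request *rq)
      {
      #ifdef TS_RQ_ALLOC_TIME
          return rq->alloc_time_ns;
      #else
          (void)rq;
          return 0;
      #endif
      }
      ```

      The difference between start_time_ns and the value returned by
      ts_request_alloc_time() is the time spent waiting for a request, which
      is the quantity the message says can dominate on saturated devices.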
      
      v2: s/pre_start_time/alloc_time/ and add CONFIG_BLK_RQ_ALLOC_TIME
          gating as suggested by Jens.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      
      Conflict:
        include/linux/blkdev.h
        block/Kconfig
        block/blk-mq.c
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  4. 17 Apr, 2020 1 commit
  5. 16 Apr, 2020 1 commit
  6. 27 Dec, 2019 4 commits
  7. 12 Sep, 2018 1 commit
  8. 09 Aug, 2018 1 commit
    • block: Remove two superfluous #include directives · b1f4267c
      Bart Van Assche authored
      Commit 12f5b931 ("blk-mq: Remove generation seqeunce") removed the
      only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but
      did not remove the corresponding #include directives. Since these
      include directives are no longer needed, remove them.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. 27 Jul, 2018 1 commit
  10. 18 Jul, 2018 1 commit
    • block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb
      Tejun Heo authored
      c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for
      read/write") replaced @OP with boolean @is_write, which limited the
      amount of information going into ->rw_page() and more importantly
      page_endio(), which removed the need to expose block internals to mm.
      
      Unfortunately, we want to track discards separately and @is_write
      isn't enough information.  This patch updates bdev_ops->rw_page() to
      take REQ_OP instead but leaves page_endio() to take bool @is_write.
      This allows the block part of operations to have enough information
      while not leaking it to mm.
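      
      The following is not part of the commit. It is a small userspace sketch
      of the split described above: the block-side hook sees the full
      operation, while the mm-side completion only sees a bool. The toy_*
      names and the statistics struct are hypothetical; the three operation
      values mirror the kernel's convention that write-like operations have
      bit 0 set. The discard case is included only to show what a bool loses.

      ```c
      #include <stdbool.h>

      /* Subset of request operations; write-like operations have bit 0 set. */
      enum toy_req_op {
          TOY_REQ_OP_READ = 0,
          TOY_REQ_OP_WRITE = 1,
          TOY_REQ_OP_DISCARD = 3
      };

      /* Hypothetical counters: block-side per operation, mm-side per direction. */
      struct toy_stats {
          unsigned long reads, writes, discards;
          unsigned long mm_reads, mm_writes;
      };

      /* Collapse an operation into the direction bit. */
      bool toy_op_is_write(enum toy_req_op op)
      {
          return (op & 1) != 0;
      }

      /* mm-side completion: only learns whether the I/O was a write. */
      void toy_page_endio(struct toy_stats *s, bool is_write)
      {
          if (is_write)
              s->mm_writes++;
          else
              s->mm_reads++;
      }

      /* Block-side hook: has the full operation, so discards stay distinct. */
      void toy_rw_page(struct toy_stats *s, enum toy_req_op op)
      {
          switch (op) {
          case TOY_REQ_OP_READ:
              s->reads++;
              break;
          case TOY_REQ_OP_WRITE:
              s->writes++;
              break;
          case TOY_REQ_OP_DISCARD:
              s->discards++;
              break;
          }
          toy_page_endio(s, toy_op_is_write(op));
      }
      ```

      With a plain bool, the write and discard cases in toy_rw_page() would
      be indistinguishable; passing the operation keeps them apart without
      changing what toy_page_endio() receives.
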
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 13 Jul, 2018 1 commit
  12. 09 Jul, 2018 5 commits
  13. 27 Jun, 2018 1 commit
  14. 15 Jun, 2018 1 commit
  15. 14 Jun, 2018 1 commit
  16. 31 May, 2018 1 commit
  17. 29 May, 2018 5 commits
  18. 14 May, 2018 1 commit
  19. 09 May, 2018 3 commits
  20. 19 Apr, 2018 1 commit
    • scsi: sd_zbc: Avoid that resetting a zone fails sporadically · ccce20fc
      Bart Van Assche authored
      Since SCSI scanning occurs asynchronously, since sd_revalidate_disk() is
      called from sd_probe_async() and since sd_revalidate_disk() calls
      sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called
      concurrently with blkdev_report_zones() and/or blkdev_reset_zones().  That can
      cause these functions to fail with -EIO because sd_zbc_read_zones() e.g. sets
      q->nr_zones to zero before restoring it to the actual value, even if no drive
      characteristics have changed.  Avoid that this can happen by making the
      following changes:
      
      - Protect the code that updates zone information with blk_queue_enter()
        and blk_queue_exit().
      - Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that
        these functions do not modify struct scsi_disk before all zone
        information has been obtained.
      
      Note: since commit 055f6e18 ("block: Make q_usage_counter also track
      legacy requests"; kernel v4.15) the request queue freezing mechanism also
      affects legacy request queues.
      
      Fixes: 89d94756 ("sd: Implement support for ZBC devices")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Damien Le Moal <damien.lemoal@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: stable@vger.kernel.org # v4.16
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  21. 18 Apr, 2018 1 commit
  22. 18 Mar, 2018 1 commit
    • block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> · 233bde21
      Bart Van Assche authored
      It happens often while I'm preparing a patch for a block driver that
      I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
      available for this driver? Do I have to introduce definitions of these
      constants before I can use these constants? To avoid this confusion,
      move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
      <linux/blkdev.h> header file such that these become available for all
      block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
      header file conditional to avoid that including that header file after
      <linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE
      redefinition.
      
      Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
      not been removed from uapi header files nor from NAND drivers in
      which these constants are used for another purpose than converting
      block layer offsets and sizes into a number of sectors.
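      
      The following is not part of the commit. It is a short userspace sketch
      of how the two constants are typically used to convert between bytes
      and 512-byte sectors, with the definitions guarded by #ifndef in the
      same spirit as the conditional definition mentioned above. The
      conversion helper names are hypothetical.

      ```c
      #include <stdint.h>

      /* Guarded so that an earlier definition from another header wins. */
      #ifndef SECTOR_SHIFT
      #define SECTOR_SHIFT 9
      #endif
      #ifndef SECTOR_SIZE
      #define SECTOR_SIZE (1 << SECTOR_SHIFT)
      #endif

      /* Whole sectors contained in a byte count (rounds down). */
      uint64_t bytes_to_sectors(uint64_t bytes)
      {
          return bytes >> SECTOR_SHIFT;
      }

      /* Sectors needed to hold a byte count (rounds up). */
      uint64_t bytes_to_sectors_round_up(uint64_t bytes)
      {
          return (bytes + SECTOR_SIZE - 1) >> SECTOR_SHIFT;
      }

      /* Byte offset or size corresponding to a sector count. */
      uint64_t sectors_to_bytes(uint64_t sectors)
      {
          return sectors << SECTOR_SHIFT;
      }
      ```

      For instance, a 4096-byte page spans bytes_to_sectors(4096) sectors, and
      a 513-byte buffer needs bytes_to_sectors_round_up(513) sectors.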
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  23. 09 Mar, 2018 4 commits