1. 25 Feb, 2020 1 commit
  2. 16 Jan, 2020 1 commit
  3. 03 Jan, 2020 1 commit
  4. 03 Dec, 2019 3 commits
  5. 13 Nov, 2019 1 commit
  6. 07 Nov, 2019 2 commits
  7. 23 Oct, 2019 1 commit
    • A
      compat_ioctl: reimplement SG_IO handling · 98aaaec4
      Arnd Bergmann committed
      There are two code locations that implement the SG_IO ioctl: the old
      sg.c driver, and the generic scsi_ioctl helper that is in turn used by
      multiple drivers.
      
      To eradicate the old compat_ioctl conversion handler for the SG_IO
      command, I implement a readable pair of put_sg_io_hdr() / get_sg_io_hdr()
      helper functions that can be used for both compat and native mode,
      and then I call this from both drivers.
      
      For the iovec handling, there is already a compat_import_iovec() function
      that can simply be called in place of import_iovec().
      
      To avoid having to pass the compat/native state through multiple
      indirections, I mark the SG_IO command itself as compatible in
      fs/compat_ioctl.c and use in_compat_syscall() to figure out where
      we are called from.
      
      As a side-effect of this, the sg.c driver now also accepts the 32-bit
      sg_io_hdr format in compat mode using the read/write interface, not
      just ioctl. This should improve compatibility with old 32-bit binaries,
      but it would break if any application intentionally passes the 64-bit
      data structure in compat mode here.
      
      Steffen Maier helped debug an issue in an earlier version of this patch.
      
      Cc: Steffen Maier <maier@linux.ibm.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: Doug Gilbert <dgilbert@interlog.com>
      Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      98aaaec4
  8. 07 Oct, 2019 1 commit
    • B
      block: Remove request_queue.nr_queues · 95662565
      Bart Van Assche committed
      Commit 897bb0c7 ("blk-mq: Use proper cpumask iterator"; v4.6)
      removed the last use of request_queue.nr_queues from outside
      blk_mq_init_allocate_queue(). Remove this member variable to make
      struct request_queue smaller. This patch does not change any
      functionality.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      95662565
  9. 18 Sep, 2019 1 commit
  10. 16 Sep, 2019 1 commit
    • H
      block: make rq sector size accessible for block stats · 3d244306
      Hou Tao committed
      Currently rq->data_len will be decreased by partial completion or
      zeroed by completion, so when blk_stat_add() is invoked, data_len
      will be zero and there will never be samples in poll_cb because
      blk_mq_poll_stats_bkt() will return -1 if data_len is zero.
      
      We could move blk_stat_add() back to __blk_mq_complete_request(),
      but that would make the effort of trying to call ktime_get_ns()
      once in vain. Instead we can reuse throtl_size field, and use
      it for both block stats and block throttle, and adjust the
      logic in blk_mq_poll_stats_bkt() accordingly.
      
      Fixes: 4bc6339a ("block: move blk_stat_add() to __blk_mq_end_request()")
      Tested-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3d244306
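The zero-size problem described in this commit message can be illustrated with a small user-space model of the bucket selection. This is only a sketch under stated assumptions: the function names, the bucket count of 16, and the exact formula (data direction plus twice the log2 of the size) are illustrative stand-ins, not code taken from the patch. The only behaviour the commit message itself confirms is that a zero size yields -1, so no sample is ever recorded.

```c
/* Hypothetical bucket count; the real constant may differ. */
#define POLL_STATS_BKTS_SKETCH 16

/* floor(log2(v)) for v > 0; returns -1 for v == 0. */
static int ilog2_sketch(unsigned int v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/*
 * Illustrative bucket selection: ddir is 0 for reads and 1 for writes,
 * size is the request size in whatever unit the caller uses.
 * A size of zero produces a negative bucket, reported as -1.
 */
static int poll_stats_bkt_sketch(int ddir, unsigned int size)
{
	int bucket = ddir + 2 * ilog2_sketch(size);

	if (bucket < 0)
		return -1;
	if (bucket >= POLL_STATS_BKTS_SKETCH)
		return ddir + POLL_STATS_BKTS_SKETCH - 2;
	return bucket;
}
```

In this model, any size read after completion has zeroed it maps to -1, which is why the commit keeps a separate copy of the size that completion does not modify.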
  11. 06 Sep, 2019 1 commit
    • D
      block: Introduce elevator features · 68c43f13
      Damien Le Moal committed
      Introduce the definition of elevator features through the
      elevator_features flags in the elevator_type structure. Each flag can
      represent a feature supported by an elevator. The first feature defined
      by this patch is support for zoned block device sequential write
      constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented
      by the mq-deadline elevator using zone write locking.
      
      Other possible features are IO priorities, write hints, latency targets
      or single-LUN dual-actuator disks (for which the elevator could maintain
      one LBA ordered list per actuator).
      
      The required_elevator_features field is also added to the request_queue
      structure to allow a device driver to specify elevator feature flags
      that an elevator must support for the correct operation of the device
      (e.g. device drivers for zoned block devices can have the
      ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature).
      The helper function blk_queue_required_elevator_features() is
      defined for setting this new field.
      
      With these two new fields in place, the elevator functions
      elevator_match() and elevator_find() are modified to allow a user to set
      only an elevator with a set of features that satisfies the device
      required features. Elevators not matching the device requirements are
      not shown in the device sysfs queue/scheduler file to prevent their use.
      
      The "none" elevator can always be selected as before.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      68c43f13
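The feature matching described in this commit message amounts to a bitmask subset test: an elevator is acceptable only if every feature bit the device requires is also set in the elevator's feature flags. The following is a minimal sketch, assuming a simplified structure and a hypothetical helper name (elevator_satisfies_sketch); the flag value chosen for ELEVATOR_F_ZBD_SEQ_WRITE is also an assumption made for illustration.

```c
#include <stdbool.h>

/* Illustrative flag value; the bit position is an assumption. */
#define ELEVATOR_F_ZBD_SEQ_WRITE (1U << 0)

/* Simplified stand-in for the elevator_type structure. */
struct elevator_type_sketch {
	const char *name;
	unsigned int elevator_features;
};

/*
 * Return true when the elevator provides every feature the device
 * requires, i.e. the required bits are a subset of the supported bits.
 */
static bool elevator_satisfies_sketch(const struct elevator_type_sketch *e,
				      unsigned int required_features)
{
	return (e->elevator_features & required_features) == required_features;
}
```

A device with no required features accepts any elevator under this test. The "none" case mentioned in the commit message is handled outside such a check, since it does not correspond to an elevator_type at all.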
  12. 03 Sep, 2019 1 commit
  13. 29 Aug, 2019 1 commit
    • T
      blk-mq: add optional request->alloc_time_ns · 6f816b4b
      Tejun Heo committed
      There are currently two start time timestamps - start_time_ns and
      io_start_time_ns.  The former marks the request allocation and the
      second issue-to-device time.  The planned io.weight controller needs
      to measure the total time bios take to execute after it leaves rq_qos
      including the time spent waiting for request to become available,
      which can easily dominate on saturated devices.
      
      This patch adds request->alloc_time_ns which records when the request
      allocation attempt started.  As it isn't used for the usual stats,
      make it optional behind CONFIG_BLK_RQ_ALLOC_TIME and
      QUEUE_FLAG_RQ_ALLOC_TIME so that it can be compiled out when there are
      no users and it's active only on queues which need it even when
      compiled in.
      
      v2: s/pre_start_time/alloc_time/ and add CONFIG_BLK_RQ_ALLOC_TIME
          gating as suggested by Jens.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6f816b4b
  14. 28 Aug, 2019 2 commits
    • M
      block: split .sysfs_lock into two locks · cecf5d87
      Ming Lei committed
      The kernfs built-in lock of 'kn->count' is held in the sysfs .show/.store
      path. Meanwhile, q->sysfs_lock is required inside block's .show/.store
      callbacks.
      
      However, when the mq & iosched kobjects are removed via
      blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
      too. This causes an AB-BA deadlock because the kernfs built-in lock of
      'kn->count' is required inside kobject_del() too; see the lockdep warning [1].
      
      On the other hand, it isn't necessary to acquire q->sysfs_lock for
      either blk_mq_unregister_dev() or elv_unregister_queue(), because
      clearing the REGISTERED flag prevents stores to 'queue/scheduler'
      from happening. Also, sysfs write (store) is exclusive, so it is not
      necessary to hold the lock for elv_unregister_queue() when it is
      called in the elevator switching path.
      
      So split .sysfs_lock into two: one is still named .sysfs_lock and is
      used to synchronize .store, the other is named .sysfs_dir_lock
      and covers kobjects and related status changes.
      
      sysfs itself can handle the race between adding/removing kobjects and
      showing/storing attributes under kobjects. For switching the scheduler
      via a store to 'queue/scheduler', we use the queue flag
      QUEUE_FLAG_REGISTERED with .sysfs_lock to avoid the race, so
      we can avoid holding .sysfs_lock while removing/adding kobjects.
      
      [1]  lockdep warning
          ======================================================
          WARNING: possible circular locking dependency detected
          5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
          ------------------------------------------------------
          rmmod/777 is trying to acquire lock:
          00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72
      
          but task is already holding lock:
          00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&q->sysfs_lock){+.+.}:
                 __lock_acquire+0x95f/0xa2f
                 lock_acquire+0x1b4/0x1e8
                 __mutex_lock+0x14a/0xa9b
                 blk_mq_hw_sysfs_show+0x63/0xb6
                 sysfs_kf_seq_show+0x11f/0x196
                 seq_read+0x2cd/0x5f2
                 vfs_read+0xc7/0x18c
                 ksys_read+0xc4/0x13e
                 do_syscall_64+0xa7/0x295
                 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
          -> #0 (kn->count#202){++++}:
                 check_prev_add+0x5d2/0xc45
                 validate_chain+0xed3/0xf94
                 __lock_acquire+0x95f/0xa2f
                 lock_acquire+0x1b4/0x1e8
                 __kernfs_remove+0x237/0x40b
                 kernfs_remove_by_name_ns+0x59/0x72
                 remove_files+0x61/0x96
                 sysfs_remove_group+0x81/0xa4
                 sysfs_remove_groups+0x3b/0x44
                 kobject_del+0x44/0x94
                 blk_mq_unregister_dev+0x83/0xdd
                 blk_unregister_queue+0xa0/0x10b
                 del_gendisk+0x259/0x3fa
                 null_del_dev+0x8b/0x1c3 [null_blk]
                 null_exit+0x5c/0x95 [null_blk]
                 __se_sys_delete_module+0x204/0x337
                 do_syscall_64+0xa7/0x295
                 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
          other info that might help us debug this:
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(&q->sysfs_lock);
                                         lock(kn->count#202);
                                         lock(&q->sysfs_lock);
            lock(kn->count#202);
      
           *** DEADLOCK ***
      
          2 locks held by rmmod/777:
           #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
           #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
      
          stack backtrace:
          CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
          Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
          Call Trace:
           dump_stack+0x9a/0xe6
           check_noncircular+0x207/0x251
           ? print_circular_bug+0x32a/0x32a
           ? find_usage_backwards+0x84/0xb0
           check_prev_add+0x5d2/0xc45
           validate_chain+0xed3/0xf94
           ? check_prev_add+0xc45/0xc45
           ? mark_lock+0x11b/0x804
           ? check_usage_forwards+0x1ca/0x1ca
           __lock_acquire+0x95f/0xa2f
           lock_acquire+0x1b4/0x1e8
           ? kernfs_remove_by_name_ns+0x59/0x72
           __kernfs_remove+0x237/0x40b
           ? kernfs_remove_by_name_ns+0x59/0x72
           ? kernfs_next_descendant_post+0x7d/0x7d
           ? strlen+0x10/0x23
           ? strcmp+0x22/0x44
           kernfs_remove_by_name_ns+0x59/0x72
           remove_files+0x61/0x96
           sysfs_remove_group+0x81/0xa4
           sysfs_remove_groups+0x3b/0x44
           kobject_del+0x44/0x94
           blk_mq_unregister_dev+0x83/0xdd
           blk_unregister_queue+0xa0/0x10b
           del_gendisk+0x259/0x3fa
           ? disk_events_poll_msecs_store+0x12b/0x12b
           ? check_flags+0x1ea/0x204
           ? mark_held_locks+0x1f/0x7a
           null_del_dev+0x8b/0x1c3 [null_blk]
           null_exit+0x5c/0x95 [null_blk]
           __se_sys_delete_module+0x204/0x337
           ? free_module+0x39f/0x39f
           ? blkcg_maybe_throttle_current+0x8a/0x718
           ? rwlock_bug+0x62/0x62
           ? __blkcg_punt_bio_submit+0xd0/0xd0
           ? trace_hardirqs_on_thunk+0x1a/0x20
           ? mark_held_locks+0x1f/0x7a
           ? do_syscall_64+0x4c/0x295
           do_syscall_64+0xa7/0x295
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
          RIP: 0033:0x7fb696cdbe6b
          Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
          RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
          RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
          RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
          RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
          R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
          R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cecf5d87
    • M
      block: add helper for checking if queue is registered · 58c898ba
      Ming Lei committed
      There are 4 users which check if queue is registered, so add one helper
      to check it.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      58c898ba
  15. 19 Aug, 2019 1 commit
  16. 05 Aug, 2019 2 commits
  17. 12 Jul, 2019 2 commits
  18. 10 Jul, 2019 1 commit
    • D
      block: Fix potential overflow in blk_report_zones() · 113ab72e
      Damien Le Moal committed
      For large values of the number of zones reported and/or large zone
      sizes, the sector increment calculated with
      
      blk_queue_zone_sectors(q) * n
      
      in blk_report_zones() loop can overflow the unsigned int type used for
      the calculation as both "n" and blk_queue_zone_sectors() value are
      unsigned int. E.g. for a device with 256 MB zones (524288 sectors),
      overflow happens with 8192 or more zones reported.
      
      Changing the return type of blk_queue_zone_sectors() to sector_t fixes
      this problem and avoids overflow for all other callers of this
      helper too. The same change is also applied to the bdev_zone_sectors()
      helper.
      
      Fixes: e76239a3 ("block: add a report_zones method")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      113ab72e
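The arithmetic in this commit message can be checked with a small user-space sketch. It assumes a 32-bit unsigned int and a 64-bit sector_t; the function names are hypothetical stand-ins for the old and new forms of the multiplication, not the kernel helpers themselves. With 524288 sectors per zone (2^19) and 8192 zones (2^13), the product is 2^32, which no longer fits in 32 bits.

```c
#include <stdint.h>

/* Assumes a 64-bit sector_t, as on configurations with large block devices. */
typedef uint64_t sector_t;

/*
 * Old form: both operands are unsigned int, so the multiplication is done
 * in unsigned int and wraps modulo 2^32 before being widened on return.
 */
static sector_t zone_span_old_sketch(unsigned int zone_sectors, unsigned int n)
{
	return zone_sectors * n;
}

/*
 * New form: the zone size is already a sector_t, so the multiplication
 * is carried out in 64 bits and the full result is preserved.
 */
static sector_t zone_span_new_sketch(sector_t zone_sectors, unsigned int n)
{
	return zone_sectors * n;
}
```

With these definitions, 8191 zones of 524288 sectors still fit in the old form, while 8192 zones wrap around to zero; the widened form returns 4294967296 for the same input.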
  19. 21 Jun, 2019 3 commits
  20. 20 Jun, 2019 2 commits
  21. 24 May, 2019 1 commit
    • B
      blk-mq: fix hang caused by freeze/unfreeze sequence · 7996a8b5
      Bob Liu committed
      The following is a description of a hang in blk_mq_freeze_queue_wait().
      The hang happens on attempt to freeze a queue while another task does
      queue unfreeze.
      
      The root cause is an incorrect sequence of percpu_ref_resurrect() and
      percpu_ref_kill() and as a result those two can be swapped:
      
       CPU#0                         CPU#1
       ----------------              -----------------
       q1 = blk_mq_init_queue(shared_tags)
      
                                      q2 = blk_mq_init_queue(shared_tags):
                                        blk_mq_add_queue_tag_set(shared_tags):
                                          blk_mq_update_tag_set_depth(shared_tags):
                                           list_for_each_entry()
                                            blk_mq_freeze_queue(q1)
                                             > percpu_ref_kill()
                                             > blk_mq_freeze_queue_wait()
      
       blk_cleanup_queue(q1)
        blk_mq_freeze_queue(q1)
         > percpu_ref_kill()
                       ^^^^^^ freeze_depth can't guarantee the order
      
                                            blk_mq_unfreeze_queue()
                                              > percpu_ref_resurrect()
      
         > blk_mq_freeze_queue_wait()
                       ^^^^^^ Hang here!!!!
      
      This wrong sequence raises a kernel warning:
      percpu_ref_kill_and_confirm called more than once on blk_queue_usage_counter_release!
      WARNING: CPU: 0 PID: 11854 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x99/0xb0
      
      But the most unpleasant effect is a hang in blk_mq_freeze_queue_wait(),
      which waits for q_usage_counter to reach zero; that never happens
      because the percpu-ref was reinitialized (instead of being killed) and
      stays in PERCPU state forever.
      
      How to reproduce:
       - "insmod null_blk.ko shared_tags=1 nr_devices=0 queue_mode=2"
       - cpu0: python Script.py 0; taskset the corresponding process running on cpu0
       - cpu1: python Script.py 1; taskset the corresponding process running on cpu1
      
       Script.py:
       ------
       #!/usr/bin/python3
      
      import os
      import sys
      
      while True:
          on = "echo 1 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
          off = "echo 0 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
          os.system(on)
          os.system(off)
      ------
      
      This bug was first reported and fixed by Roman; previous discussion:
      [1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@gmail.com
      [2] Message id: 1443563240-29306-6-git-send-email-tj@kernel.org
      [3] https://patchwork.kernel.org/patch/9268199/
      
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7996a8b5
  22. 04 May, 2019 1 commit
    • M
      blk-mq: always free hctx after request queue is freed · 2f8f1336
      Ming Lei committed
      In normal queue cleanup path, hctx is released after request queue
      is freed, see blk_mq_release().
      
      However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because
      of hw queues shrinking. This can easily cause a use-after-free,
      because one implicit rule is that it is safe to call almost all block
      layer APIs if the request queue is alive: one hctx may be retrieved
      by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues(),
      and finally a use-after-free is triggered.
      
      Fix this issue by always freeing hctx after releasing the request queue.
      If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce
      a per-queue list to hold them, then try to reuse these hctxs if the
      NUMA node matches.
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: linux-scsi@vger.kernel.org,
      Cc: Martin K . Petersen <martin.petersen@oracle.com>,
      Cc: Christoph Hellwig <hch@lst.de>,
      Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Tested-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2f8f1336
  23. 01 May, 2019 1 commit
  24. 20 Apr, 2019 1 commit
  25. 05 Apr, 2019 4 commits
  26. 21 Mar, 2019 1 commit
  27. 15 Feb, 2019 2 commits