1. 13 11月, 2019 11 次提交
  2. 08 11月, 2019 9 次提交
    • M
      block: split bio if the only bvec's length is > SZ_4K · 6952a7f8
      Ming Lei 提交于
      64K PAGE_SIZE is popular on ARM64 or other ARCHs, and 64K has been big
      enough to break some devices probably, so change the logic to split bio
      if the only bvec's length is > SZ_4K instead of PAGE_SIZE.
      
      Fixes: fa532287 (block: avoid blk_bio_segment_split for small I/O operations)
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6952a7f8
    • M
      block: still try to split bio if the bvec crosses pages · 59db8ba2
      Ming Lei 提交于
      Some device may set segment boundary as PAGE_SIZE - 1. If the bvec
      crosses pages, and meantime its length is <= PAGE_SIZE, we still need
      to split the bvec into 2 segments.
      
      Fixes this issue by still splitting bio if the single bvec crosses
      pages.
      Reported-by: Nkernel test robot <lkp@intel.com>
      Fixes: fa532287 (block: avoid blk_bio_segment_split for small I/O operations)
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      59db8ba2
    • T
      blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT · 1d156646
      Tejun Heo 提交于
      blkg_rwstat is now only used by bfq-iosched and blk-throtl when on
      cgroup1.  Let's move it into its own files and gate it behind a config
      option.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      1d156646
    • T
      blk-cgroup: reimplement basic IO stats using cgroup rstat · f7331648
      Tejun Heo 提交于
      blk-cgroup has been using blkg_rwstat to track basic IO stats.
      Unfortunately, reading recursive stats scales badly as itinvolves
      walking all descendants.  On systems with a huge number of cgroups
      (dead or alive), this can lead to substantial CPU cost when reading IO
      stats.
      
      This patch reimplements basic IO stats using cgroup rstat which uses
      more memory but makes recursive stat reading O(# descendants which
      have been active since last reading) instead of O(# descendants).
      
      * blk-cgroup core no longer uses sync/async stats.  Introduce new stat
        enums - BLKG_IOSTAT_{READ|WRITE|DISCARD}.
      
      * Add blkg_iostat[_set] which encapsulates byte and io stats, last
        values for propagation delta calculation and u64_stats_sync for
        correctness on 32bit archs.
      
      * Update the new percpu stat counters directly and implement
        blkcg_rstat_flush() to implement propagation.
      
      * blkg_print_stat() can now bring the stats up to date by calling
        cgroup_rstat_flush() and print them instead of directly summing up
        all descendants.
      
      * It now allocates 96 bytes per cpu.  It used to be 40 bytes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Dan Schatzberg <dschatzberg@fb.com>
      Cc: Daniel Xu <dlxu@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f7331648
    • T
      blk-cgroup: remove now unused blkg_print_stat_{bytes|ios}_recursive() · 8a80d5d6
      Tejun Heo 提交于
      These don't have users anymore.  Remove them.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8a80d5d6
    • T
      blk-throtl: stop using blkg->stat_bytes and ->stat_ios · 7ca46438
      Tejun Heo 提交于
      When used on cgroup1, blk-throtl uses the blkg->stat_bytes and
      ->stat_ios from blk-cgroup core to populate four stat knobs.
      blk-cgroup core is moving away from blkg_rwstat to improve scalability
      and won't be able to support this usage.
      
      It isn't like the sharing gains all that much.  Let's break them out
      to dedicated rwstat counters which are updated when on cgroup1.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7ca46438
    • T
      bfq-iosched: stop using blkg->stat_bytes and ->stat_ios · fd41e603
      Tejun Heo 提交于
      When used on cgroup1, bfq uses the blkg->stat_bytes and ->stat_ios
      from blk-cgroup core to populate six stat knobs.  blk-cgroup core is
      moving away from blkg_rwstat to improve scalability and won't be able
      to support this usage.
      
      It isn't like the sharing gains all that much.  Let's break it out to
      dedicated rwstat counters which are updated when on cgroup1.  This
      makes use of bfqg_*rwstat*() helpers outside of
      CONFIG_BFQ_CGROUP_DEBUG.  Move them out.
      
      v2: Compile fix when !CONFIG_BFQ_CGROUP_DEBUG.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      fd41e603
    • T
      bfq-iosched: relocate bfqg_*rwstat*() helpers · a557f1c7
      Tejun Heo 提交于
      Collect them right under #ifdef CONFIG_BFQ_CGROUP_DEBUG.  The next
      patch will use them from !DEBUG path and this makes it easy to move
      them out of the ifdef block.
      
      This is pure code reorganization.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a557f1c7
    • J
      Merge branch 'for-linus' into for-5.5/block · 912c0a85
      Jens Axboe 提交于
      Pull on for-linus to resolve what otherwise would have been a conflict
      with the cgroups rstat patchset from Tejun.
      
      * for-linus: (942 commits)
        blkcg: make blkcg_print_stat() print stats only for online blkgs
        nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
        nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
        nvme-rdma: fix a segmentation fault during module unload
        iocost: don't nest spin_lock_irq in ioc_weight_write()
        io_uring: ensure we clear io_kiocb->result before each issue
        um-ubd: Entrust re-queue to the upper layers
        nvme-multipath: remove unused groups_only mode in ana log
        nvme-multipath: fix possible io hang after ctrl reconnect
        io_uring: don't touch ctx in setup after ring fd install
        io_uring: Fix leaked shadow_req
        Linux 5.4-rc5
        riscv: cleanup do_trap_break
        nbd: verify socket is supported during setup
        ata: libahci_platform: Fix regulator_get_optional() misuse
        nbd: handle racing with error'ed out commands
        nbd: protect cmd->status with cmd->lock
        io_uring: fix bad inflight accounting for SETUP_IOPOLL|SETUP_SQTHREAD
        io_uring: used cached copies of sq->dropped and cq->overflow
        ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157
        ...
      912c0a85
  3. 07 11月, 2019 11 次提交
  4. 06 11月, 2019 6 次提交
  5. 05 11月, 2019 3 次提交
    • A
      nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths · 763303a8
      Anton Eidelman 提交于
      nvme_mpath_clear_ctrl_paths() iterates through
      the ctrl->namespaces list while holding ctrl->scan_lock.
      This does not seem to be the correct way of protecting
      from concurrent list modification.
      
      Specifically, nvme_scan_work() sorts ctrl->namespaces
      AFTER unlocking scan_lock.
      
      This may result in the following (rare) crash in ctrl disconnect
      during scan_work:
      
          BUG: kernel NULL pointer dereference, address: 0000000000000050
          Oops: 0000 [#1] SMP PTI
          CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic
          RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core]
          ...
          Call Trace:
           nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core]
           nvme_remove_namespaces+0x35/0xe0 [nvme_core]
           nvme_do_delete_ctrl+0x47/0x90 [nvme_core]
           nvme_sysfs_delete+0x49/0x60 [nvme_core]
           dev_attr_store+0x17/0x30
           sysfs_kf_write+0x3e/0x50
           kernfs_fop_write+0x11e/0x1a0
           __vfs_write+0x1b/0x40
           vfs_write+0xb9/0x1a0
           ksys_write+0x67/0xe0
           __x64_sys_write+0x1a/0x20
           do_syscall_64+0x5a/0x130
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
          RIP: 0033:0x7f8d02bfb154
      
      Fix:
      After taking scan_lock in nvme_mpath_clear_ctrl_paths()
      down_read(&ctrl->namespaces_rwsem) as well to make list traversal safe.
      This will not cause deadlocks because taking scan_lock never happens
      while holding the namespaces_rwsem.
      Moreover, scan work downs namespaces_rwsem in the same order.
      
      Alternative: sort ctrl->namespaces in nvme_scan_work()
      while still holding the scan_lock.
      This would leave nvme_mpath_clear_ctrl_paths() without correct protection
      against ctrl->namespaces modification by anyone other than scan_work.
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAnton Eidelman <anton@lightbitslabs.com>
      Signed-off-by: NKeith Busch <kbusch@kernel.org>
      763303a8
    • M
      nvme-rdma: fix a segmentation fault during module unload · 9ad9e8d6
      Max Gurtovoy 提交于
      In case there are controllers that are not associated with any RDMA
      device (e.g. during unsuccessful reconnection) and the user will unload
      the module, these controllers will not be freed and will access already
      freed memory. The same logic appears in other fabric drivers as well.
      
      Fixes: 87fd1253 ("nvme-rdma: remove redundant reference between ib_device and tagset")
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
      Signed-off-by: NKeith Busch <kbusch@kernel.org>
      9ad9e8d6
    • C
      block: avoid blk_bio_segment_split for small I/O operations · fa532287
      Christoph Hellwig 提交于
      __blk_queue_split() adds significant overhead for small I/O operations.
      Add a shortcut to avoid it for cases where we know we never need to
      split.
      
      Based on a patch from Ming Lei.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      fa532287