1. 17 Oct, 2019 2 commits
  2. 16 Oct, 2019 3 commits
    • libata/ahci: Fix PCS quirk application · 09d6ac8d
      Committed by Dan Williams
      Commit c312ef17 "libata/ahci: Drop PCS quirk for Denverton and
      beyond" got the polarity wrong on the check for which board-ids should
      have the quirk applied. The board type board_ahci_pcs7 is defined at the
      end of the list such that "pcs7" boards can be special cased in the
      future if they need the quirk. All prior Intel board ids "<
      board_ahci_pcs7" should proceed with applying the quirk.
      Reported-by: Andreas Friedrich <afrie@gmx.net>
      Reported-by: Stephen Douthit <stephend@silicom-usa.com>
      Fixes: c312ef17 ("libata/ahci: Drop PCS quirk for Denverton and beyond")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
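The corrected polarity can be sketched as follows. This is a minimal illustration, not the driver code: only `board_ahci_pcs7` comes from the commit message, while the other enumerators and the function name `example_needs_pcs_quirk` are hypothetical placeholders.

```c
#include <stdbool.h>

/* Illustrative subset of board ids. Only board_ahci_pcs7 is named in the
 * commit message; the other enumerators are placeholders. */
enum example_board_id {
	example_board_old_a,
	example_board_old_b,
	board_ahci_pcs7,	/* defined last so newer boards can be special-cased */
};

/* Returns true when the PCS quirk should be applied: every board id that
 * sorts before board_ahci_pcs7 keeps the quirk. */
bool example_needs_pcs_quirk(enum example_board_id id)
{
	return id < board_ahci_pcs7;
}
```

Keeping `board_ahci_pcs7` at the end of the enum is what makes a single `<` comparison sufficient.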
    • blk-rq-qos: fix first node deletion of rq_qos_del() · 307f4065
      Committed by Tejun Heo
      rq_qos_del() incorrectly assigns the node being deleted to the head if
      it was the first on the list in the !prev path.  Fix it by iterating
      with ** instead.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Fixes: a7905043 ("blk-rq-qos: refactor out common elements of blk-wbt")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
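Iterating "with **" means walking a pointer to the link itself rather than a pointer to the node. A minimal user-space sketch of that technique follows; `struct example_rq_qos`, `struct example_queue`, and `example_rq_qos_del()` are hypothetical stand-ins, not the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's rq_qos list; only the singly
 * linked "next" chain matters for this illustration. */
struct example_rq_qos {
	int id;
	struct example_rq_qos *next;
};

struct example_queue {
	struct example_rq_qos *rq_qos;	/* list head */
};

/* Unlink @rqos from @q. @cur points at the link being examined (either the
 * head pointer or some node's ->next), so deleting the first node needs no
 * separate "no previous node" branch. */
void example_rq_qos_del(struct example_queue *q, struct example_rq_qos *rqos)
{
	struct example_rq_qos **cur;

	for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
		if (*cur == rqos) {
			*cur = rqos->next;
			break;
		}
	}
}
```

Because the head pointer and every `->next` field are updated through the same `*cur` assignment, the head cannot be left pointing at the deleted node.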
    • blkcg: Fix multiple bugs in blkcg_activate_policy() · 9d179b86
      Committed by Tejun Heo
      blkcg_activate_policy() has the following bugs.
      
      * cf09a8ee ("blkcg: pass @q and @blkcg into
        blkcg_pol_alloc_pd_fn()") added @blkcg to ->pd_alloc_fn(); however,
        blkcg_activate_policy() ends up using pd's allocated for the root
        blkcg for all preallocations, so ->pd_init_fn() for non-root blkcgs
        can be passed in pd's which are allocated for the root blkcg.
      
        For blk-iocost, this means that ->pd_init_fn() can write beyond the
        end of the allocated object as it determines the length of the flex
        array at the end based on the blkcg's nesting level.
      
      * Each pd is initialized as it gets allocated.  If alloc fails, the
        policy will get freed with pd's initialized on it.
      
      * After the above partial failure, the partial pds are not freed.
      
      This patch fixes all the above issues by
      
      * Restructuring blkcg_activate_policy() so that alloc and init passes
        are separate.  Init takes place only after all allocs succeeded and
        on failure all allocated pds are freed.
      
      * Unifying and fixing the cleanup of the remaining pd_prealloc.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: cf09a8ee ("blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
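The separation of alloc and init passes described above can be illustrated with a small user-space sketch. Everything here is hypothetical (`struct example_pd`, `example_activate_policy()`, and the `fail_at` parameter used to simulate an allocation failure); it shows only the control flow, not the kernel implementation.

```c
#include <stdlib.h>

/* Hypothetical per-group policy data; not the kernel's blkg_policy_data. */
struct example_pd {
	int initialized;
};

/* Pass 1 allocates every pd; pass 2 initializes them only after all
 * allocations have succeeded. On failure, everything allocated so far is
 * freed, so no initialized pd is left behind.
 * @fail_at simulates an allocation failure at that index (-1: never).
 * Returns 0 on success, -1 on failure. */
int example_activate_policy(struct example_pd **pds, int n, int fail_at)
{
	int i;

	for (i = 0; i < n; i++) {
		pds[i] = (i == fail_at) ? NULL : calloc(1, sizeof(*pds[i]));
		if (!pds[i])
			goto free_all;
	}

	for (i = 0; i < n; i++)
		pds[i]->initialized = 1;
	return 0;

free_all:
	/* i is the index that failed; free the entries before it. */
	while (--i >= 0) {
		free(pds[i]);
		pds[i] = NULL;
	}
	return -1;
}
```

With a single cleanup label, the failure path does not need to distinguish between initialized and uninitialized objects, because initialization never starts until allocation is complete.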
  3. 15 Oct, 2019 2 commits
    • io_uring: consider the overflow of sequence for timeout req · 5da0fb1a
      Committed by yangerkun
      The timeout sequence is currently recalculated as 'req->sequence =
      ctx->cached_sq_head + count - 1', and the insertion point in
      timeout_list is chosen by comparing the number of requests each timeout
      still expects to complete. This does not account for overflow:
      
      1. ctx->cached_sq_head + count - 1 may overflow, so a new timeout req
      with a bigger count can end up with a smaller req->sequence.
      
      2. The current cached_sq_head may have overflowed relative to the value
      seen by an earlier req, which also gives the new timeout req a smaller
      req->sequence.
      
      Either overflow misorders timeout_list, which can in turn complete the
      timeouts in the wrong order. Fix it by reusing req->submit.sequence to
      store the count, and by changing the insertion-sort logic in io_timeout.
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
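The first overflow case can be reproduced in isolation. The sketch below is an illustration of the arithmetic only, assuming the sequence is a 32-bit unsigned value; the function names are hypothetical, and widening to 64 bits is used here just to show the intended ordering, not as a description of what the patch does.

```c
#include <stdint.h>

/* 32-bit target sequence: wraps around silently when the sum exceeds
 * UINT32_MAX. */
uint32_t example_seq32(uint32_t cached_sq_head, uint32_t count)
{
	return cached_sq_head + count - 1;
}

/* Same computation widened to 64 bits, which preserves the ordering of
 * two targets computed from the same head. */
uint64_t example_seq64(uint32_t cached_sq_head, uint32_t count)
{
	return (uint64_t)cached_sq_head + count - 1;
}
```

With a head of `0xFFFFFFF0`, a count of `0x20` wraps to a 32-bit sequence of `0xF`, which sorts before the sequence for a count of `4`, even though it should complete later.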
    • block: Fix elv_support_iosched() · 7a7c5e71
      Committed by Damien Le Moal
      A BIO based request queue does not have a tag_set, which prevents testing
      for the flag BLK_MQ_F_NO_SCHED indicating that the queue does not
      require an elevator. This leads to an incorrect initialization of a
      default elevator in some cases, such as BIO based null_blk
      (queue_mode == BIO) with zoned mode enabled, where the default elevator
      becomes mq-deadline instead of "none".
      
      Fix this by testing for a NULL queue mq_ops field which indicates that
      the queue is BIO based and should not have an elevator.
      Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
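The check described above might look roughly like the following user-space sketch. The structures are reduced, hypothetical stand-ins for the kernel's request queue and tag set, and `EXAMPLE_F_NO_SCHED` stands in for `BLK_MQ_F_NO_SCHED`; the actual bit value is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_F_NO_SCHED (1u << 0)	/* stand-in for BLK_MQ_F_NO_SCHED */

struct example_mq_ops { int unused; };
struct example_tag_set { unsigned int flags; };

struct example_queue {
	const struct example_mq_ops *mq_ops;	/* NULL for a BIO based queue */
	struct example_tag_set *tag_set;	/* NULL for a BIO based queue */
};

bool example_elv_support_iosched(const struct example_queue *q)
{
	/* BIO based queue: no elevator, and no tag_set to inspect. */
	if (!q->mq_ops)
		return false;
	/* Request based queue that explicitly opted out of scheduling. */
	if (q->tag_set && (q->tag_set->flags & EXAMPLE_F_NO_SCHED))
		return false;
	return true;
}
```

Testing `mq_ops` first means the flag check is reached only for queues that can actually carry a tag set.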
  4. 11 Oct, 2019 1 commit
  5. 10 Oct, 2019 3 commits
  6. 08 Oct, 2019 1 commit
  7. 06 Oct, 2019 3 commits
  8. 04 Oct, 2019 3 commits
  9. 03 Oct, 2019 1 commit
  10. 01 Oct, 2019 4 commits
    • Revert "s390/dasd: Add discard support for ESE volumes" · 964ce509
      Committed by Stefan Haberland
      This reverts commit 7e64db15.
      
      The thin provisioning feature introduces an IOCTL and the discard support
      to allow userspace tools and filesystems to release unused and previously
      allocated space respectively.
      
      During some internal performance improvements and further tests, the
      release of allocated space revealed some issues that may lead to data
      corruption in some configurations when filesystems are mounted with
      discard support enabled.
      
      While we're working on a fix and trying to clarify the situation,
      this commit reverts the discard support for ESE volumes to prevent
      potential data corruption.
      
      Cc: <stable@vger.kernel.org> # 5.3
      Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • s390/dasd: Fix error handling during online processing · dd454839
      Committed by Jan Höppner
      It is possible that the CCW commands for reading volume and extent pool
      information are not supported, either by the storage server (for
      dedicated DASDs) or by z/VM (for virtual devices, such as MDISKs).
      
      As a command reject will occur in such a case, the current error
      handling leads to a failing online processing and thus the DASD can't be
      used at all.
      
      Since the data being read is not essential for a fully operational
      DASD, the error handling can be removed. Information about the failing
      command is sent to the s390dbf debug feature.
      
      Fixes: c729696b ("s390/dasd: Recognise data for ESE volumes")
      Cc: <stable@vger.kernel.org> # 5.3
      Reported-by: Frank Heimes <frank.heimes@canonical.com>
      Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
      Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: use __kernel_timespec in timeout ABI · bdf20073
      Committed by Arnd Bergmann
      All system calls use struct __kernel_timespec instead of the old struct
      timespec, but this one was just added with the old-style ABI. Change it
      now to enforce the use of __kernel_timespec, avoiding ABI confusion and
      the need for compat handlers on 32-bit architectures.
      
      Any user space caller will have to use __kernel_timespec now, but this
      is unambiguous and works for any C library regardless of the time_t
      definition. A nicer way to specify the timeout would have been a less
      ambiguous 64-bit nanosecond value, but I suppose it's too late now to
      change that as this would impact both 32-bit and 64-bit users.
      
      Fixes: 5262f567 ("io_uring: IORING_OP_TIMEOUT support")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
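To show why a fixed two-field layout removes the dependency on the C library's `time_t`, here is a small sketch. `struct example_kernel_timespec` is a hypothetical mirror that assumes the kernel type has a 64-bit seconds field followed by a `long long` nanoseconds field; `example_timeout_from_ms()` is an illustrative helper, not part of any io_uring API.

```c
#include <stdint.h>

/* Hypothetical mirror of the assumed layout: both fields are 64 bits wide
 * regardless of how the C library defines time_t. */
struct example_kernel_timespec {
	int64_t tv_sec;		/* seconds */
	long long tv_nsec;	/* nanoseconds */
};

/* Convert a millisecond timeout into the two-field representation. */
struct example_kernel_timespec example_timeout_from_ms(uint64_t ms)
{
	struct example_kernel_timespec ts;

	ts.tv_sec = (int64_t)(ms / 1000);
	ts.tv_nsec = (long long)(ms % 1000) * 1000000LL;
	return ts;
}
```

Since the structure has the same size on 32-bit and 64-bit builds, the kernel side would not need a separate compat path to interpret it.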
    • loop: change queue block size to match when using DIO · 85560117
      Committed by Martijn Coenen
      The loop driver assumes that if the passed in fd is opened with
      O_DIRECT, the caller wants to use direct I/O on the loop device.
      However, if the underlying block device has a different block size than
      the loop block queue, direct I/O can't be enabled. Instead of requiring
      userspace to manually change the blocksize and re-enable direct I/O,
      just change the queue block sizes to match, as well as the io_min size.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Martijn Coenen <maco@android.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 28 Sep, 2019 4 commits
    • Merge branch 'nvme-5.4' of git://git.infradead.org/nvme into for-linus · 2d5ba0c7
      Committed by Jens Axboe
      Pull NVMe changes from Sagi:
      
      "This set consists of various fixes and cleanups:
       - controller removal race fix from Balbir
       - quirk additions from Gabriel and Jian-Hong
       - nvme-pci power state save fix from Mario
       - Add 64bit user commands (for 64bit registers) from Marta
       - nvme-rdma/nvme-tcp fixes from Max, Mark and Me
       - Minor cleanups and nits from James, Dan and John"
      
      * 'nvme-5.4' of git://git.infradead.org/nvme:
        nvme-rdma: fix possible use-after-free in connect timeout
        nvme: Move ctrl sqsize to generic space
        nvme: Add ctrl attributes for queue_count and sqsize
        nvme: allow 64-bit results in passthru commands
        nvme: Add quirk for Kingston NVME SSD running FW E8FK11.T
        nvmet-tcp: remove superflous check on request sgl
        Added QUIRKs for ADATA XPG SX8200 Pro 512GB
        nvme-rdma: Fix max_hw_sectors calculation
        nvme: fix an error code in nvme_init_subsystem()
        nvme-pci: Save PCI state before putting drive into deepest state
        nvme-tcp: fix wrong stop condition in io_work
        nvme-pci: Fix a race in controller removal
        nvmet: change ppl to lpp
    • blk-mq: apply normal plugging for HDD · 3154df26
      Committed by Ming Lei
      Some HDDs may expose multiple hardware queues, such as those behind MegaRaid.
      Let's apply the normal plugging for such devices because sequential IO
      may benefit a lot from plug merging.
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: honor IO scheduler for multiqueue devices · a12de1d4
      Committed by Ming Lei
      If a device is using multiple queues, the IO scheduler may be bypassed.
      This may hurt performance for some slow MQ devices, and it also breaks
      zoned devices which depend on mq-deadline for respecting the write order
      in one zone.
      
      Don't bypass the IO scheduler if one is set up.
      
      This patch can roughly double sequential write performance on MQ
      scsi_debug when mq-deadline is applied.
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Javier González <javier@javigon.com>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-rdma: fix possible use-after-free in connect timeout · 67b483dd
      Committed by Sagi Grimberg
      If the connect times out, we may have already destroyed the
      queue in the timeout handler, so test if the queue is still
      allocated in the connect error handler.
      Reported-by: Yi Zhang <yi.zhang@redhat.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  12. 27 Sep, 2019 3 commits
    • block: fix null pointer dereference in blk_mq_rq_timed_out() · 8d699663
      Committed by Yufen Yu
      We hit a NULL pointer dereference BUG in blk_mq_rq_timed_out(),
      as follows:
      
      [  108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
      [  108.827059] PGD 0 P4D 0
      [  108.827313] Oops: 0000 [#1] SMP PTI
      [  108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
      [  108.829503] Workqueue: kblockd blk_mq_timeout_work
      [  108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
      [  108.838191] Call Trace:
      [  108.838406]  bt_iter+0x74/0x80
      [  108.838665]  blk_mq_queue_tag_busy_iter+0x204/0x450
      [  108.839074]  ? __switch_to_asm+0x34/0x70
      [  108.839405]  ? blk_mq_stop_hw_queue+0x40/0x40
      [  108.839823]  ? blk_mq_stop_hw_queue+0x40/0x40
      [  108.840273]  ? syscall_return_via_sysret+0xf/0x7f
      [  108.840732]  blk_mq_timeout_work+0x74/0x200
      [  108.841151]  process_one_work+0x297/0x680
      [  108.841550]  worker_thread+0x29c/0x6f0
      [  108.841926]  ? rescuer_thread+0x580/0x580
      [  108.842344]  kthread+0x16a/0x1a0
      [  108.842666]  ? kthread_flush_work+0x170/0x170
      [  108.843100]  ret_from_fork+0x35/0x40
      
      The bug is caused by a race between the timeout handler and the
      completion of a flush request.
      
      When the timeout handler blk_mq_rq_timed_out() tries to read
      'req->q->mq_ops', 'req' may already have completed and been
      reinitialized by the next flush request, which calls blk_rq_init() to
      zero 'req'.
      
      After commit 12f5b931 ("blk-mq: Remove generation seqeunce"),
      the lifetime of normal requests is protected by a refcount. A request
      can only be freed once 'rq->ref' drops to zero, so these requests
      cannot be reused before the timeout handler finishes.
      
      However, a flush request defines .end_io, and rq->end_io() is still
      called even if 'rq->ref' has not dropped to zero. After that, 'flush_rq'
      can be reused by the next flush request, resulting in the NULL
      pointer dereference.
      
      We fix this problem by covering the flush request with 'rq->ref'.
      If the refcount is not zero, flush_end_io() returns and waits for the
      last holder to call it again. To record the request status, we add a new
      entry 'rq_status', which is used in flush_end_io().
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: stable@vger.kernel.org # v4.18+
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      
      -------
      v2:
       - move rq_status from struct request to struct blk_flush_queue
      v3:
       - remove unnecessary '{}' pair.
      v4:
       - let spinlock to protect 'fq->rq_status'
      v5:
       - move rq_status after flush_running_idx member of struct blk_flush_queue
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
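The "last holder finishes the request" idea can be sketched in single-threaded user-space C. This is only an illustration under stated assumptions: the structures and `example_flush_end_io()` are hypothetical, a plain integer replaces the kernel's refcount type, and the spinlock mentioned in the v4 changelog is omitted entirely.

```c
#include <stdbool.h>

struct example_flush_queue {
	int rq_status;	/* status saved for the last reference holder */
};

struct example_flush_rq {
	int ref;		/* plain counter standing in for a refcount */
	bool completed;
	int final_status;
	struct example_flush_queue *fq;
};

/* Completion handler: only the caller that drops the last reference
 * finishes the request; earlier callers just record the status. */
void example_flush_end_io(struct example_flush_rq *rq, int error)
{
	if (--rq->ref > 0) {
		rq->fq->rq_status = error;
		return;
	}
	/* Last holder: prefer a status recorded by an earlier caller. */
	if (rq->fq->rq_status != 0)
		error = rq->fq->rq_status;
	rq->completed = true;
	rq->final_status = error;
}
```

In this sketch, a request held by both the completion path and the timeout path is not marked complete, and so cannot be reused, until the second caller drops its reference.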
    • rq-qos: get rid of redundant wbt_update_limits() · 2af2783f
      Committed by Yufen Yu
      We have updated limits after calling wbt_set_min_lat(). No need to
      update again.
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: Move ctrl sqsize to generic space · f968688f
      Committed by Keith Busch
      This isn't specific to fabrics.
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  13. 26 Sep, 2019 10 commits