1. 12 12月, 2018 1 次提交
  2. 11 12月, 2018 1 次提交
  3. 07 12月, 2018 5 次提交
    • J
      Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-linus · 8b878ee2
      Jens Axboe 提交于
      Pull NVMe fixes from Christoph.
      
      * 'nvme-4.20' of git://git.infradead.org/nvme:
        nvmet-rdma: fix response use after free
        nvme: validate controller state before rescheduling keep alive
      8b878ee2
    • J
      blk-mq: punt failed direct issue to dispatch list · c616cbee
      Jens Axboe 提交于
      After the direct dispatch corruption fix, we permanently disallow direct
      dispatch of non read/write requests. This works fine off the normal IO
      path, as they will be retried like any other failed direct dispatch
      request. But for the blk_insert_cloned_request() that only DM uses to
      bypass the bottom level scheduler, we always first attempt direct
      dispatch. For some types of requests, that's now a permanent failure,
      and no amount of retrying will make that succeed. This results in a
      livelock.
      
      Instead of making special cases for what we can direct issue, and now
      having to deal with DM solving the livelock while still retaining a BUSY
      condition feedback loop, always just add a request that has been through
      ->queue_rq() to the hardware queue dispatch list. These are safe to use
      as no merging can take place there. Additionally, if requests do have
      prepped data from drivers, we aren't dependent on them not sharing space
      in the request structure to safely add them to the IO scheduler lists.
      
      This basically reverts ffe81d45 and is based on a patch from Ming,
      but with the list insert case covered as well.
      
      Fixes: ffe81d45 ("blk-mq: fix corruption with direct issue")
      Cc: stable@vger.kernel.org
      Suggested-by: NMing Lei <ming.lei@redhat.com>
      Reported-by: NBart Van Assche <bvanassche@acm.org>
      Tested-by: NMing Lei <ming.lei@redhat.com>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c616cbee
    • I
      nvmet-rdma: fix response use after free · d7dcdf9d
      Israel Rukshin 提交于
      nvmet_rdma_release_rsp() may free the response before using it at error
      flow.
      
      Fixes: 8407879c ("nvmet-rdma: fix possible bogus dereference under heavy load")
      Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      d7dcdf9d
    • J
      nvme: validate controller state before rescheduling keep alive · 86880d64
      James Smart 提交于
      Delete operations are seeing NULL pointer references in call_timer_fn.
      Tracking these back, the timer appears to be the keep alive timer.
      
      nvme_keep_alive_work() which is tied to the timer that is cancelled
      by nvme_stop_keep_alive(), simply starts the keep alive io but doesn't
      wait for it's completion. So nvme_stop_keep_alive() only stops a timer
      when it's pending. When a keep alive is in flight, there is no timer
      running and the nvme_stop_keep_alive() will have no affect on the keep
      alive io. Thus, if the io completes successfully, the keep alive timer
      will be rescheduled.   In the failure case, delete is called, the
      controller state is changed, the nvme_stop_keep_alive() is called while
      the io is outstanding, and the delete path continues on. The keep
      alive happens to successfully complete before the delete paths mark it
      as aborted as part of the queue termination, so the timer is restarted.
      The delete paths then tear down the controller, and later on the timer
      code fires and the timer entry is now corrupt.
      
      Fix by validating the controller state before rescheduling the keep
      alive. Testing with the fix has confirmed the condition above was hit.
      Signed-off-by: NJames Smart <jsmart2021@gmail.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      86880d64
    • P
      block, bfq: fix decrement of num_active_groups · ba7aeae5
      Paolo Valente 提交于
      Since commit '2d29c9f8 ("block, bfq: improve asymmetric scenarios
      detection")', if there are process groups with I/O requests waiting for
      completion, then BFQ tags the scenario as 'asymmetric'. This detection
      is needed for preserving service guarantees (for details, see comments
      on the computation * of the variable asymmetric_scenario in the
      function bfq_better_to_idle).
      
      Unfortunately, commit '2d29c9f8 ("block, bfq: improve asymmetric
      scenarios detection")' contains an error exactly in the updating of
      the number of groups with I/O requests waiting for completion: if a
      group has more than one descendant process, then the above number of
      groups, which is renamed from num_active_groups to a more appropriate
      num_groups_with_pending_reqs by this commit, may happen to be wrongly
      decremented multiple times, namely every time one of the descendant
      processes gets all its pending I/O requests completed.
      
      A correct, complete solution should work as follows. Consider a group
      that is inactive, i.e., that has no descendant process with pending
      I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs
      is still accounting for this group, because the group still has some
      descendant process with some I/O request still in
      flight. num_groups_with_pending_reqs should be decremented when the
      in-flight request of the last descendant process is finally completed
      (assuming that nothing else has changed for the group in the meantime,
      in terms of composition of the group and active/inactive state of
      child groups and processes). To accomplish this, an additional
      pending-request counter must be added to entities, and must be
      updated correctly.
      
      To avoid this additional field and operations, this commit resorts to
      the following tradeoff between simplicity and accuracy: for an
      inactive group that is still counted in num_groups_with_pending_reqs,
      this commit decrements num_groups_with_pending_reqs when the first
      descendant process of the group remains with no request waiting for
      completion.
      
      This simplified scheme provides a fix to the unbalanced decrements
      introduced by 2d29c9f8. Since this error was also caused by lack
      of comments on this non-trivial issue, this commit also adds related
      comments.
      
      Fixes: 2d29c9f8 ("block, bfq: improve asymmetric scenarios detection")
      Reported-by: NSteven Barrett <steven@liquorix.net>
      Tested-by: NSteven Barrett <steven@liquorix.net>
      Tested-by: NLucjan Lucjanov <lucjan.lucjanov@gmail.com>
      Reviewed-by: NFederico Motta <federico@willer.it>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ba7aeae5
  4. 05 12月, 2018 1 次提交
    • J
      blk-mq: fix corruption with direct issue · ffe81d45
      Jens Axboe 提交于
      If we attempt a direct issue to a SCSI device, and it returns BUSY, then
      we queue the request up normally. However, the SCSI layer may have
      already setup SG tables etc for this particular command. If we later
      merge with this request, then the old tables are no longer valid. Once
      we issue the IO, we only read/write the original part of the request,
      not the new state of it.
      
      This causes data corruption, and is most often noticed with the file
      system complaining about the just read data being invalid:
      
      [  235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)
      
      because most of it is garbage...
      
      This doesn't happen from the normal issue path, as we will simply defer
      the request to the hardware queue dispatch list if we fail. Once it's on
      the dispatch list, we never merge with it.
      
      Fix this from the direct issue path by flagging the request as
      REQ_NOMERGE so we don't change the size of it before issue.
      
      See also:
        https://bugzilla.kernel.org/show_bug.cgi?id=201685Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Fixes: 6ce3dd6e ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
      Cc: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ffe81d45
  5. 04 12月, 2018 1 次提交
  6. 01 12月, 2018 5 次提交
    • J
      Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-linus · 1c9b357c
      Jens Axboe 提交于
      Pull NVMe fixes from Christoph:
      
      "Various fixlets all over."
      
      * 'nvme-4.20' of git://git.infradead.org/nvme:
        nvme-rdma: fix double freeing of async event data
        nvme: flush namespace scanning work just before removing namespaces
        nvme: warn when finding multi-port subsystems without multipathing enabled
        nvme-pci: fix surprise removal
        nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request()
        nvme: Free ctrl device name on init failure
      1c9b357c
    • M
      block: fix single range discard merge · 2a5cf35c
      Ming Lei 提交于
      There are actually two kinds of discard merge:
      
      - one is the normal discard merge, just like normal read/write request,
      and call it single-range discard
      
      - another is the multi-range discard, queue_max_discard_segments(rq->q) > 1
      
      For the former case, queue_max_discard_segments(rq->q) is 1, and we
      should handle this kind of discard merge like the normal read/write
      request.
      
      This patch fixes the following kernel panic issue[1], which is caused by
      not removing the single-range discard request from elevator queue.
      
      Guangwu has one raid discard test case, in which this issue is a bit
      easier to trigger, and I verified that this patch can fix the kernel
      panic issue in Guangwu's test case.
      
      [1] kernel panic log from Jens's report
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
       PGD 0 P4D 0.
       Oops: 0000 [#1] SMP PTI
       CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted \
      4.20.0-rc3-00649-ge64d9a554a91-dirty #14  Hardware name: Wiwynn \
      Leopard-Orv2/Leopard-DDR BW, BIOS LBM08   03/03/2017       Workqueue: kblockd \
      blk_mq_run_work_fn                                            RIP: \
      0010:blk_mq_get_driver_tag+0x81/0x120                                       Code: 24 \
      10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 \
      0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 \
      f6 87 b0 00 00 00 02  RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246                     \
        RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8
       RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000
       RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000
       R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300
       R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000
       FS:  0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        blk_mq_dispatch_rq_list+0xec/0x480
        ? elv_rb_del+0x11/0x30
        blk_mq_do_dispatch_sched+0x6e/0xf0
        blk_mq_sched_dispatch_requests+0xfa/0x170
        __blk_mq_run_hw_queue+0x5f/0xe0
        process_one_work+0x154/0x350
        worker_thread+0x46/0x3c0
        kthread+0xf5/0x130
        ? process_one_work+0x350/0x350
        ? kthread_destroy_worker+0x50/0x50
        ret_from_fork+0x1f/0x30
       Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel \
      kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii \
      cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq \
      button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme \
      nvme_core fuse sg loop efivarfs autofs4  CR2: 0000000000000148                        \
      
       ---[ end trace 340a1fb996df1b9b ]---
       RIP: 0010:blk_mq_get_driver_tag+0x81/0x120
       Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 \
      00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 \
      20 72 37 f6 87 b0 00 00 00 02
      
      Fixes: 445251d0 ("blk-mq: fix discard merge with scheduler attached")
      Reported-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guangwu Zhang <guazhang@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2a5cf35c
    • P
      nvme-rdma: fix double freeing of async event data · 6344d02d
      Prabhath Sajeepa 提交于
      Some error paths in configuration of admin queue free data buffer
      associated with async request SQE without resetting the data buffer
      pointer to NULL, This buffer is also freed up again if the controller
      is shutdown or reset.
      Signed-off-by: NPrabhath Sajeepa <psajeepa@purestorage.com>
      Reviewed-by: NRoland Dreier <roland@purestorage.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      6344d02d
    • S
      nvme: flush namespace scanning work just before removing namespaces · f6c8e432
      Sagi Grimberg 提交于
      nvme_stop_ctrl can be called also for reset flow and there is no need to
      flush the scan_work as namespaces are not being removed. This can cause
      deadlock in rdma, fc and loop drivers since nvme_stop_ctrl barriers
      before controller teardown (and specifically I/O cancellation of the
      scan_work itself) takes place, but the scan_work will be blocked anyways
      so there is no need to flush it.
      
      Instead, move scan_work flush to nvme_remove_namespaces() where it really
      needs to flush.
      Reported-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: NKeith Busch <keith.busch@intel.com>
      Reviewed by: James Smart <jsmart2021@gmail.com>
      Tested-by: NEwan D. Milne <emilne@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      f6c8e432
    • C
      nvme: warn when finding multi-port subsystems without multipathing enabled · 14a1336e
      Christoph Hellwig 提交于
      Without CONFIG_NVME_MULTIPATH enabled a multi-port subsystem might
      show up as invididual devices and cause problems, warn about it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      14a1336e
  7. 30 11月, 2018 1 次提交
    • M
      fs: fix lost error code in dio_complete · 41e817bc
      Maximilian Heyne 提交于
      commit e2592217 ("fs: simplify the
      generic_write_sync prototype") reworked callers of generic_write_sync(),
      and ended up dropping the error return for the directio path. Prior to
      that commit, in dio_complete(), an error would be bubbled up the stack,
      but after that commit, errors passed on to dio_complete were eaten up.
      
      This was reported on the list earlier, and a fix was proposed in
      https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but
      never followed up with.  We recently hit this bug in our testing where
      fencing io errors, which were previously erroring out with EIO, were
      being returned as success operations after this commit.
      
      The fix proposed on the list earlier was a little short -- it would have
      still called generic_write_sync() in case `ret` already contained an
      error. This fix ensures generic_write_sync() is only called when there's
      no pending error in the write. Additionally, transferred is replaced
      with ret to bring this code in line with other callers.
      
      Fixes: e2592217 ("fs: simplify the generic_write_sync prototype")
      Reported-by: NRavi Nankani <rnankani@amazon.com>
      Signed-off-by: NMaximilian Heyne <mheyne@amazon.de>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      CC: Torsten Mehlan <tomeh@amazon.de>
      CC: Uwe Dannowski <uwed@amazon.de>
      CC: Amit Shah <aams@amazon.de>
      CC: David Woodhouse <dwmw@amazon.co.uk>
      CC: stable@vger.kernel.org
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      41e817bc
  8. 28 11月, 2018 2 次提交
    • I
      nvme-pci: fix surprise removal · 751a0cc0
      Igor Konopko 提交于
      When a PCIe NVMe device is not present, nvme_dev_remove_admin() calls
      blk_cleanup_queue() on the admin queue, which frees the hctx for that
      queue.  Moments later, on the same path nvme_kill_queues() calls
      blk_mq_unquiesce_queue() on admin queue and tries to access hctx of it,
      which leads to following OOPS:
      
      Oops: 0000 [#1] SMP PTI
      RIP: 0010:sbitmap_any_bit_set+0xb/0x40
      Call Trace:
       blk_mq_run_hw_queue+0xd5/0x150
       blk_mq_run_hw_queues+0x3a/0x50
       nvme_kill_queues+0x26/0x50
       nvme_remove_namespaces+0xb2/0xc0
       nvme_remove+0x60/0x140
       pci_device_remove+0x3b/0xb0
      
      Fixes: cb4bfda6 ("nvme-pci: fix hot removal during error handling")
      Signed-off-by: NIgor Konopko <igor.j.konopko@intel.com>
      Reviewed-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      751a0cc0
    • E
      nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request() · dfa74422
      Ewan D. Milne 提交于
      __nvme_fc_init_request() invokes memset() on the nvme_fcp_op_w_sgl structure, which
      NULLed-out the nvme_req(req)->ctrl field previously set by nvme_fc_init_request().
      This apparently was not referenced until commit faf4a44fff ("nvme: support traffic
      based keep-alive") which now results in a crash in nvme_complete_rq():
      
      [ 8386.897130] RIP: 0010:panic+0x220/0x26c
      [ 8386.901406] Code: 83 3d 6f ee 72 01 00 74 05 e8 e8 54 02 00 48 c7 c6 40 fd 5b b4 48 c7 c7 d8 8d c6 b3 31e
      [ 8386.922359] RSP: 0018:ffff99650019fc40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      [ 8386.930804] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
      [ 8386.938764] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8e325f8168b0
      [ 8386.946725] RBP: ffff99650019fcb0 R08: 0000000000000000 R09: 00000000000004f8
      [ 8386.954687] R10: 0000000000000000 R11: ffff99650019f9b8 R12: ffffffffb3c55f3c
      [ 8386.962648] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
      [ 8386.970613]  oops_end+0xd1/0xe0
      [ 8386.974116]  no_context+0x1b2/0x3c0
      [ 8386.978006]  do_page_fault+0x32/0x140
      [ 8386.982090]  page_fault+0x1e/0x30
      [ 8386.985786] RIP: 0010:nvme_complete_rq+0x65/0x1d0 [nvme_core]
      [ 8386.992195] Code: 41 bc 03 00 00 00 74 16 0f 86 c3 00 00 00 66 3d 83 00 41 bc 06 00 00 00 0f 85 e7 00 000
      [ 8387.013147] RSP: 0018:ffff99650019fe18 EFLAGS: 00010246
      [ 8387.018973] RAX: 0000000000000000 RBX: ffff8e322ae51280 RCX: 0000000000000001
      [ 8387.026935] RDX: 0000000000000400 RSI: 0000000000000001 RDI: ffff8e322ae51280
      [ 8387.034897] RBP: ffff8e322ae51280 R08: 0000000000000000 R09: ffffffffb2f0b890
      [ 8387.042859] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [ 8387.050821] R13: 0000000000000100 R14: 0000000000000004 R15: ffff8e2b0446d990
      [ 8387.058782]  ? swiotlb_unmap_page+0x40/0x40
      [ 8387.063448]  nvme_fc_complete_rq+0x2d/0x70 [nvme_fc]
      [ 8387.068986]  blk_done_softirq+0xa1/0xd0
      [ 8387.073264]  __do_softirq+0xd6/0x2a9
      [ 8387.077251]  run_ksoftirqd+0x26/0x40
      [ 8387.081238]  smpboot_thread_fn+0x10e/0x160
      [ 8387.085807]  kthread+0xf8/0x130
      [ 8387.089309]  ? sort_range+0x20/0x20
      [ 8387.093198]  ? kthread_stop+0x110/0x110
      [ 8387.097475]  ret_from_fork+0x35/0x40
      [ 8387.101462] ---[ end trace 7106b0adf5e422f8 ]---
      
      Fixes: faf4a44fff ("nvme: support traffic based keep-alive")
      Signed-off-by: NEwan D. Milne <emilne@redhat.com>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      dfa74422
  9. 27 11月, 2018 1 次提交
  10. 21 11月, 2018 1 次提交
  11. 15 11月, 2018 1 次提交
    • J
      nvme-fc: resolve io failures during connect · 4cff280a
      James Smart 提交于
      If an io error occurs on an io issued while connecting, recovery
      of the io falls flat as the state checking ends up nooping the error
      handler.
      
      Create an err_work work item that is scheduled upon an io error while
      connecting. The work thread terminates all io on all queues and marks
      the queues as not connected.  The termination of the io will return
      back to the callee, which will then back out of the connection attempt
      and will reschedule, if possible, the connection attempt.
      
      The changes:
      - in case there are several commands hitting the error handler, a
        state flag is kept so that the error work is only scheduled once,
        on the first error. The subsequent errors can be ignored.
      - The calling sequence to stop keep alive and terminate the queues
        and their io is lifted from the reset routine. Made a small
        service routine used by both reset and err_work.
      - During debugging, found that the teardown path can reference
        an uninitialized pointer, resulting in a NULL pointer oops.
        The aen_ops weren't initialized yet. Add validation on their
        initialization before calling the teardown routine.
      Signed-off-by: NJames Smart <jsmart2021@gmail.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      4cff280a
  12. 14 11月, 2018 2 次提交
    • M
      SCSI: fix queue cleanup race before queue initialization is done · 8dc765d4
      Ming Lei 提交于
      c2856ae2 ("blk-mq: quiesce queue before freeing queue") has
      already fixed this race, however the implied synchronize_rcu()
      in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused
      performance regression.
      
      Then 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
      tried to quiesce queue for avoiding unnecessary synchronize_rcu()
      only when queue initialization is done, because it is usual to see
      lots of inexistent LUNs which need to be probed.
      
      However, turns out it isn't safe to quiesce queue only when queue
      initialization is done. Because when one SCSI command is completed,
      the user of sending command can be waken up immediately, then the
      scsi device may be removed, meantime the run queue in scsi_end_request()
      is still in-progress, so kernel panic can be caused.
      
      In Red Hat QE lab, there are several reports about this kind of kernel
      panic triggered during kernel booting.
      
      This patch tries to address the issue by grabing one queue usage
      counter during freeing one request and the following run queue.
      
      Fixes: 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
      Cc: Andrew Jones <drjones@redhat.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
      Cc: stable <stable@vger.kernel.org>
      Cc: jianchao.wang <jianchao.w.wang@oracle.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8dc765d4
    • D
      block: fix 32 bit overflow in __blkdev_issue_discard() · 4800bf7b
      Dave Chinner 提交于
      A discard cleanup merged into 4.20-rc2 causes fstests xfs/259 to
      fall into an endless loop in the discard code. The test is creating
      a device that is exactly 2^32 sectors in size to test mkfs boundary
      conditions around the 32 bit sector overflow region.
      
      mkfs issues a discard for the entire device size by default, and
      hence this throws a sector count of 2^32 into
      blkdev_issue_discard(). It takes the number of sectors to discard as
      a sector_t - a 64 bit value.
      
      The commit ba5d7385 ("block: cleanup __blkdev_issue_discard")
      takes this sector count and casts it to a 32 bit value before
      comapring it against the maximum allowed discard size the device
      has. This truncates away the upper 32 bits, and so if the lower 32
      bits of the sector count is zero, it starts issuing discards of
      length 0. This causes the code to fall into an endless loop, issuing
      a zero length discards over and over again on the same sector.
      
      Fixes: ba5d7385 ("block: cleanup __blkdev_issue_discard")
      Tested-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      
      Killed pointless WARN_ON().
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4800bf7b
  13. 13 11月, 2018 2 次提交
  14. 12 11月, 2018 1 次提交
  15. 10 11月, 2018 4 次提交
    • J
      floppy: fix race condition in __floppy_read_block_0() · de7b75d8
      Jens Axboe 提交于
      LKP recently reported a hang at bootup in the floppy code:
      
      [  245.678853] INFO: task mount:580 blocked for more than 120 seconds.
      [  245.679906]       Tainted: G                T 4.19.0-rc6-00172-ga9f38e1d #1
      [  245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  245.682181] mount           D 6372   580      1 0x00000004
      [  245.683023] Call Trace:
      [  245.683425]  __schedule+0x2df/0x570
      [  245.683975]  schedule+0x2d/0x80
      [  245.684476]  schedule_timeout+0x19d/0x330
      [  245.685090]  ? wait_for_common+0xa5/0x170
      [  245.685735]  wait_for_common+0xac/0x170
      [  245.686339]  ? do_sched_yield+0x90/0x90
      [  245.686935]  wait_for_completion+0x12/0x20
      [  245.687571]  __floppy_read_block_0+0xfb/0x150
      [  245.688244]  ? floppy_resume+0x40/0x40
      [  245.688844]  floppy_revalidate+0x20f/0x240
      [  245.689486]  check_disk_change+0x43/0x60
      [  245.690087]  floppy_open+0x1ea/0x360
      [  245.690653]  __blkdev_get+0xb4/0x4d0
      [  245.691212]  ? blkdev_get+0x1db/0x370
      [  245.691777]  blkdev_get+0x1f3/0x370
      [  245.692351]  ? path_put+0x15/0x20
      [  245.692871]  ? lookup_bdev+0x4b/0x90
      [  245.693539]  blkdev_get_by_path+0x3d/0x80
      [  245.694165]  mount_bdev+0x2a/0x190
      [  245.694695]  squashfs_mount+0x10/0x20
      [  245.695271]  ? squashfs_alloc_inode+0x30/0x30
      [  245.695960]  mount_fs+0xf/0x90
      [  245.696451]  vfs_kern_mount+0x43/0x130
      [  245.697036]  do_mount+0x187/0xc40
      [  245.697563]  ? memdup_user+0x28/0x50
      [  245.698124]  ksys_mount+0x60/0xc0
      [  245.698639]  sys_mount+0x19/0x20
      [  245.699167]  do_int80_syscall_32+0x61/0x130
      [  245.699813]  entry_INT80_32+0xc7/0xc7
      
      showing that we never complete that read request. The reason is that
      the completion setup is racy - it initializes the completion event
      AFTER submitting the IO, which means that the IO could complete
      before/during the init. If it does, we are passing garbage to
      complete() and we may sleep forever waiting for the event to
      occur.
      
      Fixes: 7b7b68bb ("floppy: bail out in open() if drive is not responding to block0 read")
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      de7b75d8
    • L
      Merge tag 'for-linus-20181109' of git://git.kernel.dk/linux-block · dc5db218
      Linus Torvalds 提交于
      Pull block layer fixes from Jens Axboe:
      
       - Two fixes for an ubd regression, one for missing locking, and one for
         a missing initialization of a field. The latter was an old latent
         bug, but it's now visible and triggers (Me, Anton Ivanov)
      
       - Set of NVMe fixes via Christoph, but applied manually due to a git
         tree mixup (Christoph, Sagi)
      
       - Fix for a discard split regression, in three patches (Ming)
      
       - Update libata git trees (Geert)
      
       - SPDX identifier for sata_rcar (Kuninori Morimoto)
      
       - Virtual boundary merge fix (Johannes)
      
       - Preemptively clear memory we are going to pass to userspace, in case
         the driver does a short read (Keith)
      
      * tag 'for-linus-20181109' of git://git.kernel.dk/linux-block:
        block: make sure writesame bio is aligned with logical block size
        block: cleanup __blkdev_issue_discard()
        block: make sure discard bio is aligned with logical block size
        Revert "nvmet-rdma: use a private workqueue for delete"
        nvme: make sure ns head inherits underlying device limits
        nvmet: don't try to add ns to p2p map unless it actually uses it
        sata_rcar: convert to SPDX identifiers
        ubd: fix missing initialization of io_req
        block: Clear kernel memory before copying to user
        MAINTAINERS: Fix remaining pointers to obsolete libata.git
        ubd: fix missing lock around request issue
        block: respect virtual boundary mask in bvecs
      dc5db218
    • L
      Merge tag 'ceph-for-4.20-rc2' of https://github.com/ceph/ceph-client · d757a3b0
      Linus Torvalds 提交于
      Pull Ceph fixes from Ilya Dryomov:
       "Two CephFS fixes (copy_file_range and quota) and a small feature bit
        cleanup"
      
      * tag 'ceph-for-4.20-rc2' of https://github.com/ceph/ceph-client:
        libceph: assume argonaut on the server side
        ceph: quota: fix null pointer dereference in quota check
        ceph: add destination file data sync before doing any remote copy
      d757a3b0
    • L
      Merge tag 'mips_fixes_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 26eaed46
      Linus Torvalds 提交于
      Pull MIPS fixes from Paul Burton:
       "A couple of small MIPS fixes for 4.20:
      
         - Extend an array to avoid overruns on some Octeon hardware, fixing a
           bug introduced in 4.3.
      
         - Fix a coherent DMA regression for systems without cache-coherent
           DMA introduced in the 4.20 merge window"
      
      * tag 'mips_fixes_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
        MIPS: Fix `dma_alloc_coherent' returning a non-coherent allocation
        MIPS: OCTEON: fix out of bounds array access on CN68XX
      26eaed46
  16. 09 11月, 2018 11 次提交