1. 15 Sep, 2020 2 commits
    • D
      nvme-pci: disable the write zeros command for Intel 600P/P3100 · ce4cc313
      David Milburn authored
      The write zeros command does not work with 4k range.
      
      bash-4.4# ./blkdiscard /dev/nvme0n1p2
      bash-4.4# strace -efallocate xfs_io -c "fzero 536895488 2048" /dev/nvme0n1p2
      fallocate(3, FALLOC_FL_ZERO_RANGE, 536895488, 2048) = 0
      +++ exited with 0 +++
      bash-4.4# dd bs=1 if=/dev/nvme0n1p2 skip=536895488 count=512 | hexdump -C
      00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
      *
      00000200
      
      bash-4.4# ./blkdiscard /dev/nvme0n1p2
      bash-4.4# strace -efallocate xfs_io -c "fzero 536895488 4096" /dev/nvme0n1p2
      fallocate(3, FALLOC_FL_ZERO_RANGE, 536895488, 4096) = 0
      +++ exited with 0 +++
      bash-4.4# dd bs=1 if=/dev/nvme0n1p2 skip=536895488 count=512 | hexdump -C
      00000000  5c 61 5c b0 96 21 1b 5e  85 0c 07 32 9c 8c eb 3c  |\a\..!.^...2...<|
      00000010  4a a2 06 ca 67 15 2d 8e  29 8d a8 a0 7e 46 8c 62  |J...g.-.)...~F.b|
      00000020  bb 4c 6c c1 6b f5 ae a5  e4 a9 bc 93 4f 60 ff 7a  |.Ll.k.......O`.z|
      Reported-by: Eric Sandeen <esandeen@redhat.com>
      Signed-off-by: David Milburn <dmilburn@redhat.com>
      Tested-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      ce4cc313
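The read-back check done above with dd and hexdump can also be scripted. The following is a minimal sketch, not part of the commit: the helper name `range_is_zeroed` is hypothetical, and it only verifies that a byte range reads back as zeros after an fzero.

```python
def range_is_zeroed(path, offset, length):
    """Return True if `length` bytes starting at `offset` all read back as zero."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    # A short read means the range was not fully readable, so report failure.
    return len(data) == length and not any(data)
```

On an affected device, one would expect a call such as `range_is_zeroed("/dev/nvme0n1p2", 536895488, 512)` to fail after the 4096-byte fzero shown in the transcript.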
    • J
      s390/dasd: Fix zero write for FBA devices · 709192d5
      Jan Höppner authored
      A discard request that writes zeros using the global kernel internal
      ZERO_PAGE will fail for machines with more than 2GB of memory due to the
      location of the ZERO_PAGE.
      
      Fix this by using a driver owned global zero page allocated with GFP_DMA
      flag set.
      
      Fixes: 28b841b3 ("s390/dasd: Add discard support for FBA devices")
      Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
      Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
      Cc: <stable@vger.kernel.org> # 4.14+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      709192d5
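The address constraint behind this fix can be illustrated with a small model. This is a sketch under an assumption that is not stated in the commit message: that the buffer must lie entirely below the 2 GiB line, which is the range GFP_DMA allocations come from on s390. The names `DMA_LIMIT` and `usable_for_dma` are hypothetical.

```python
DMA_LIMIT = 1 << 31  # 2 GiB; assumed upper bound for the driver's zero page


def usable_for_dma(phys_addr, length):
    """Return True if the buffer [phys_addr, phys_addr + length) lies below 2 GiB."""
    return phys_addr >= 0 and length > 0 and phys_addr + length <= DMA_LIMIT
```

A global zero page placed above 2 GiB on a large-memory machine fails this check, while a page allocated from the DMA zone passes it.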
  2. 10 Sep, 2020 1 commit
    • J
      Merge tag 'nvme-5.9-2020-09-10' of git://git.infradead.org/nvme into block-5.9 · fd04358e
      Jens Axboe authored
      Pull NVMe fixes from Christoph.
      
      "nvme fixes for 5.9
      
       - cancel async events before freeing them (David Milburn)
       - revert a broken race fix (James Smart)
       - fix command processing during resets (Sagi Grimberg)"
      
      * tag 'nvme-5.9-2020-09-10' of git://git.infradead.org/nvme:
        nvme-fabrics: allow to queue requests for live queues
        nvme-tcp: cancel async events before freeing event struct
        nvme-rdma: cancel async events before freeing event struct
        nvme-fc: cancel async events before freeing event struct
        nvme: Revert: Fix controller creation races with teardown flow
      fd04358e
  3. 09 Sep, 2020 7 commits
    • R
      block: Set same_page to false in __bio_try_merge_page if ret is false · 2cd896a5
      Ritesh Harjani authored
      If we hit the UINT_MAX limit of bio->bi_iter.bi_size, and so are not
      merging this page into this bio anyway, then it makes sense to also set
      same_page to false before returning.
      
      Without this patch, we hit the WARNING below in iomap.
      This mostly happens on systems with very large memory and/or after tweaking
      the vm dirty threshold params to delay writeback of dirty data.
      
      WARNING: CPU: 18 PID: 5130 at fs/iomap/buffered-io.c:74 iomap_page_release+0x120/0x150
       CPU: 18 PID: 5130 Comm: fio Kdump: loaded Tainted: G        W         5.8.0-rc3 #6
       Call Trace:
        __remove_mapping+0x154/0x320 (unreliable)
        iomap_releasepage+0x80/0x180
        try_to_release_page+0x94/0xe0
        invalidate_inode_page+0xc8/0x110
        invalidate_mapping_pages+0x1dc/0x540
        generic_fadvise+0x3c8/0x450
        xfs_file_fadvise+0x2c/0xe0 [xfs]
        vfs_fadvise+0x3c/0x60
        ksys_fadvise64_64+0x68/0xe0
        sys_fadvise64+0x28/0x40
        system_call_exception+0xf8/0x1c0
        system_call_common+0xf0/0x278
      
      Fixes: cc90bc68 ("block: fix "check bi_size overflow before merge"")
      Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2cd896a5
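The merge decision described in this commit can be modelled in a few lines. This is a simplified sketch, not the kernel function: `try_merge_page` and its parameters are hypothetical, and the `mergeable` and `same_page` arguments stand in for whatever the real mergeability check computes.

```python
UINT_MAX = 0xFFFFFFFF


def try_merge_page(bi_size, length, mergeable, same_page):
    """Model the merge decision; return (merged, same_page, new_bi_size)."""
    if not mergeable:
        return False, False, bi_size
    if bi_size > UINT_MAX - length:
        # The fix: the page is not merged, so it must not be reported as
        # living in the same page as the last segment either.
        return False, False, bi_size
    return True, same_page, bi_size + length
```

The point of the model is the second branch: once the size limit rejects the merge, the caller should not see a stale `same_page == True`.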
    • S
      nvme-fabrics: allow to queue requests for live queues · 73a53799
      Sagi Grimberg authored
      Right now we are failing requests based on the controller state (which
      is checked inline in nvmf_check_ready) however we should definitely
      accept requests if the queue is live.
      
      When entering controller reset, we transition the controller into
      NVME_CTRL_RESETTING, and then return BLK_STS_RESOURCE for non-mpath
      requests (have blk_noretry_request set).
      
      This is also the case for NVME_REQ_USER for the wrong reason. There
      shouldn't be any reason for us to reject this I/O in a controller reset.
      We do want to prevent passthru commands on the admin queue because we
      need the controller to fully initialize first before we allow user
      passthru admin commands to be issued.
      
      In a non-mpath setup, this means that the requests will simply be
      requeued over and over forever not allowing the q_usage_counter to drop
      its final reference, causing controller reset to hang if running
      concurrently with heavy I/O.
      
      Fixes: 35897b92 ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      73a53799
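The acceptance rule described above can be sketched as a small decision function. This is a deliberately simplified model based only on the commit message, not on the driver source; `CtrlState`, `check_ready` and the boolean parameters are hypothetical names.

```python
from enum import Enum, auto


class CtrlState(Enum):
    LIVE = auto()
    RESETTING = auto()
    CONNECTING = auto()


def check_ready(state, queue_live, admin_queue=False, user_cmd=False):
    """Return True if a request may be queued in this simplified model."""
    if state is CtrlState.LIVE:
        return True
    # User passthru commands on the admin queue must wait until the
    # controller has fully initialized.
    if admin_queue and user_cmd:
        return False
    # Otherwise the queue state decides, not the controller state.
    return queue_live
```

With this rule, an I/O request issued during a reset is accepted as long as its queue is live, rather than being requeued indefinitely.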
    • O
      block: only call sched requeue_request() for scheduled requests · e8a8a185
      Omar Sandoval authored
      Yang Yang reported the following crash caused by requeueing a flush
      request in Kyber:
      
        [    2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
        ...
        [    2.517468] pc : clear_bit+0x18/0x2c
        [    2.517502] lr : sbitmap_queue_clear+0x40/0x228
        [    2.517503] sp : ffffff800832bc60 pstate : 00c00145
        ...
        [    2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
        [    2.517602] Call trace:
        [    2.517606]  clear_bit+0x18/0x2c
        [    2.517619]  kyber_finish_request+0x74/0x80
        [    2.517627]  blk_mq_requeue_request+0x3c/0xc0
        [    2.517637]  __scsi_queue_insert+0x11c/0x148
        [    2.517640]  scsi_softirq_done+0x114/0x130
        [    2.517643]  blk_done_softirq+0x7c/0xb0
        [    2.517651]  __do_softirq+0x208/0x3bc
        [    2.517657]  run_ksoftirqd+0x34/0x60
        [    2.517663]  smpboot_thread_fn+0x1c4/0x2c0
        [    2.517667]  kthread+0x110/0x120
        [    2.517669]  ret_from_fork+0x10/0x18
      
      This happens because Kyber doesn't track flush requests, so
      kyber_finish_request() reads a garbage domain token. Only call the
      scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do for
      the finish_request() hook in blk_mq_free_request()). Now that we're
      handling it in blk-mq, also remove the check from BFQ.
      Reported-by: Yang Yang <yang.yang@vivo.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e8a8a185
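The guard this commit adds can be pictured as follows. This is a minimal sketch, not the blk-mq code: the `Scheduler` class, the `sched_requeue_request` helper and the bit value chosen for `RQF_ELVPRIV` are all hypothetical.

```python
RQF_ELVPRIV = 1 << 0  # hypothetical bit value, for illustration only


class Scheduler:
    """Stand-in for an I/O scheduler that tracks requeued requests."""

    def __init__(self):
        self.requeued = []

    def requeue_request(self, rq):
        self.requeued.append(rq)


def sched_requeue_request(scheduler, rq_flags, rq):
    """Call the scheduler's requeue hook only for requests it is tracking."""
    if scheduler is not None and rq_flags & RQF_ELVPRIV:
        scheduler.requeue_request(rq)
        return True
    return False
```

A flush request, which the scheduler never saw, carries no private scheduler data, so the hook is skipped instead of reading garbage.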
    • D
      nvme-tcp: cancel async events before freeing event struct · ceb1e087
      David Milburn authored
      Cancel async event work in case async event has been queued up, and
      nvme_tcp_submit_async_event() runs after event has been freed.
      Signed-off-by: David Milburn <dmilburn@redhat.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      ceb1e087
    • D
      nvme-rdma: cancel async events before freeing event struct · 925dd04c
      David Milburn authored
      Cancel async event work in case async event has been queued up, and
      nvme_rdma_submit_async_event() runs after event has been freed.
      Signed-off-by: David Milburn <dmilburn@redhat.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      925dd04c
    • D
      nvme-fc: cancel async events before freeing event struct · e126e821
      David Milburn authored
      Cancel async event work in case async event has been queued up, and
      nvme_fc_submit_async_event() runs after event has been freed.
      Signed-off-by: David Milburn <dmilburn@redhat.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      e126e821
    • J
      nvme: Revert: Fix controller creation races with teardown flow · b63de840
      James Smart authored
      The indicated patch introduced a barrier in the sysfs_delete attribute
      for the controller that rejects the request if the controller isn't
      created. "Created" is defined as at least 1 call to nvme_start_ctrl().
      
      This is problematic in error-injection testing.  If an error occurs on
      the initial attempt to create an association and the controller enters
      reconnect(s) attempts, the admin cannot delete the controller until
      either there is a successful association created or ctrl_loss_tmo
      times out.
      
      Where this issue is particularly hurtful is when the "admin" is the
      nvme-cli, it is performing a connection to a discovery controller, and
      it is initiated via auto-connect scripts.  With the FC transport, if the
      first connection attempt fails, the controller enters a normal reconnect
      state but returns control to the cli thread that created the controller.
      In this scenario, the cli attempts to read the discovery log via ioctl,
      which fails, causing the cli to see it as an empty log and then proceeds
      to delete the discovery controller. The delete is rejected and the
      controller is left live. If the discovery controller reconnect then
      succeeds, there is no action to delete it, and it sits live doing nothing.
      
      Cc: <stable@vger.kernel.org> # v5.7+
      Fixes: ce151813 ("nvme: Fix controller creation races with teardown flow")
      Signed-off-by: James Smart <james.smart@broadcom.com>
      CC: Israel Rukshin <israelr@mellanox.com>
      CC: Max Gurtovoy <maxg@mellanox.com>
      CC: Christoph Hellwig <hch@lst.de>
      CC: Keith Busch <kbusch@kernel.org>
      CC: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      b63de840
  4. 08 Sep, 2020 1 commit
  5. 03 Sep, 2020 1 commit
  6. 02 Sep, 2020 2 commits
  7. 01 Sep, 2020 3 commits
  8. 30 Aug, 2020 1 commit
    • J
      Merge branch 'nvme-5.9-rc' of git://git.infradead.org/nvme into block-5.9 · 5d220bcd
      Jens Axboe authored
      Pull NVMe fixes from Sagi:
      
      "- instance leak and io boundary fixes from Keith
       - fc locking fix from Christophe
       - various tcp/rdma reset during traffic fixes from Me
       - pci use-after-free fix from Tong
       - tcp target null deref fix from Ziye"
      
      * 'nvme-5.9-rc' of git://git.infradead.org/nvme:
        nvme-pci: cancel nvme device request before disabling
        nvme: only use power of two io boundaries
        nvme: fix controller instance leak
        nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
        nvme: Fix NULL dereference for pci nvme controllers
        nvme-rdma: fix reset hang if controller died in the middle of a reset
        nvme-rdma: fix timeout handler
        nvme-rdma: serialize controller teardown sequences
        nvme-tcp: fix reset hang if controller died in the middle of a reset
        nvme-tcp: fix timeout handler
        nvme-tcp: serialize controller teardown sequences
        nvme: have nvme_wait_freeze_timeout return if it timed out
        nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance
        nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu
      5d220bcd
  9. 29 Aug, 2020 14 commits
    • T
      nvme-pci: cancel nvme device request before disabling · 7ad92f65
      Tong Zhang authored
      This patch addresses an IRQ free warning and a NULL pointer dereference
      that occur when an nvme device hits a timeout error during initialization.
      The problem happens when nvme_timeout() is called while
      nvme_reset_work() is still executing. The patch fixes it by setting the
      NVME_REQ_CANCELLED flag on the problematic request before calling
      nvme_dev_disable(), so that __nvme_submit_sync_cmd() returns an error
      code and nvme_submit_sync_cmd() fails gracefully.
      The following is the console output.
      
      [   62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
      [   62.488796] nvme nvme0: could not set timestamp (881)
      [   62.494888] ------------[ cut here ]------------
      [   62.495142] Trying to free already-free IRQ 11
      [   62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
      [   62.495742] Modules linked in:
      [   62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
      [   62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
      [   62.496772] Workqueue: nvme-reset-wq nvme_reset_work
      [   62.497019] RIP: 0010:free_irq+0x1f7/0x370
      [   62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
      [   62.498133] RSP: 0000:ffffa96800043d40 EFLAGS: 00010086
      [   62.498391] RAX: 0000000000000000 RBX: ffff9b87fc458400 RCX: 0000000000000000
      [   62.498741] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff9693d72c
      [   62.499091] RBP: ffff9b87fd4c8f60 R08: ffffa96800043bfd R09: 0000000000000163
      [   62.499440] R10: ffffa96800043bf8 R11: ffffa96800043bfd R12: ffff9b87fd4c8e00
      [   62.499790] R13: ffff9b87fd4c8ea4 R14: 000000000000000b R15: ffff9b87fd76b000
      [   62.500140] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
      [   62.500534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   62.500816] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
      [   62.501165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   62.501515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   62.501864] Call Trace:
      [   62.501993]  pci_free_irq+0x13/0x20
      [   62.502167]  nvme_reset_work+0x5d0/0x12a0
      [   62.502369]  ? update_load_avg+0x59/0x580
      [   62.502569]  ? ttwu_queue_wakelist+0xa8/0xc0
      [   62.502780]  ? try_to_wake_up+0x1a2/0x450
      [   62.502979]  process_one_work+0x1d2/0x390
      [   62.503179]  worker_thread+0x45/0x3b0
      [   62.503361]  ? process_one_work+0x390/0x390
      [   62.503568]  kthread+0xf9/0x130
      [   62.503726]  ? kthread_park+0x80/0x80
      [   62.503911]  ret_from_fork+0x22/0x30
      [   62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
      [  123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
      [  123.914670] nvme nvme0: 1/0/0 default/read/poll queues
      [  123.916310] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [  123.917469] #PF: supervisor write access in kernel mode
      [  123.917725] #PF: error_code(0x0002) - not-present page
      [  123.917976] PGD 0 P4D 0
      [  123.918109] Oops: 0002 [#1] SMP PTI
      [  123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G        W         5.8.0+ #8
      [  123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
      [  123.919219] Workqueue: nvme-reset-wq nvme_reset_work
      [  123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
      [  123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
      [  123.920657] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
      [  123.920912] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
      [  123.921258] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
      [  123.921602] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
      [  123.921949] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
      [  123.922295] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
      [  123.922641] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
      [  123.923032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  123.923312] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
      [  123.923660] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  123.924007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  123.924353] Call Trace:
      [  123.924479]  blk_mq_alloc_tag_set+0x137/0x2a0
      [  123.924694]  nvme_reset_work+0xed6/0x12a0
      [  123.924898]  process_one_work+0x1d2/0x390
      [  123.925099]  worker_thread+0x45/0x3b0
      [  123.925280]  ? process_one_work+0x390/0x390
      [  123.925486]  kthread+0xf9/0x130
      [  123.925642]  ? kthread_park+0x80/0x80
      [  123.925825]  ret_from_fork+0x22/0x30
      [  123.926004] Modules linked in:
      [  123.926158] CR2: 0000000000000000
      [  123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
      [  123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
      [  123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
      [  123.927734] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
      [  123.927989] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
      [  123.928336] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
      [  123.928679] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
      [  123.929025] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
      [  123.929370] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
      [  123.929715] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
      [  123.930106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  123.930384] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
      [  123.930731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  123.931077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Co-developed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Tong Zhang <ztong0001@gmail.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      7ad92f65
    • K
      nvme: only use power of two io boundaries · e83d776f
      Keith Busch authored
      The kernel requires a power of two for boundaries because that's the
      only way it can efficiently split commands that cross them. A
      controller, however, may report a non-power of two boundary.
      
      The driver had been rounding the controller's value to one the kernel
      can use, but splitting on the wrong boundary provides no benefit on the
      device side, and incurs additional submission overhead from non-optimal
      splits.
      
      Don't provide any boundary hint if the controller's value can't be used
      and log a warning when first scanning a disk's unreported IO boundary.
      Since the chunk sector logic has grown, move it to a separate function.
      
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      e83d776f
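The reason a power-of-two boundary splits cheaply, and the policy this commit adopts, can be sketched as below. This is an illustrative model, not the driver function: `chunk_sectors`, `sectors_to_boundary` and the use of 0 to mean "no hint" are assumptions made for the example.

```python
import logging


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def chunk_sectors(boundary_sectors):
    """Return the boundary to use as a split hint, or 0 for no hint."""
    if boundary_sectors == 0:
        return 0  # the controller reports no boundary
    if not is_power_of_two(boundary_sectors):
        logging.warning("ignoring unaligned IO boundary: %d", boundary_sectors)
        return 0  # do not round; a wrong boundary only adds overhead
    return boundary_sectors


def sectors_to_boundary(sector, chunk):
    """Sectors left before the next boundary; the mask needs a power-of-two chunk."""
    return chunk - (sector & (chunk - 1))
```

For a 256-sector boundary, a request starting at sector 300 has `sectors_to_boundary(300, 256) == 212` sectors before it must be split, computed with a single mask instead of a division.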
    • K
      nvme: fix controller instance leak · 192f6c29
      Keith Busch authored
      If the driver has to unbind from the controller for an early failure
      before the subsystem has been set up, there won't be a subsystem holding
      the controller's instance, so the controller needs to free its own
      instance in this case.
      
      Fixes: 733e4b69 ("nvme: Assign subsys instance from first ctrl")
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      192f6c29
    • C
      nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()' · 70e37988
      Christophe JAILLET authored
      The way 'spin_lock()' and 'spin_lock_irqsave()' are used is not consistent
      in this function.
      
      Use 'spin_lock_irqsave()' here as well, as there is no guarantee that
      interrupts are disabled at that point, according to the surrounding code.
      
      Fixes: a97ec51b ("nvmet_fc: Rework target side abort handling")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      70e37988
    • S
      nvme: Fix NULL dereference for pci nvme controllers · 7cd49f75
      Sagi Grimberg authored
      PCIe controllers do not have fabric opts, verify they exist before
      showing ctrl_loss_tmo or reconnect_delay attributes.
      
      Fixes: 764075fd ("nvme: expose reconnect_delay and ctrl_loss_tmo via sysfs")
      Reported-by: Tobias Markus <tobias@markus-regensburg.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      7cd49f75
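The check this commit describes amounts to hiding fabrics-only attributes when no fabric options exist. The following sketch models that idea only; `FabricOpts`, `Ctrl`, `FABRICS_ONLY` and `attr_is_visible` are hypothetical names, not the driver's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FabricOpts:
    ctrl_loss_tmo: int
    reconnect_delay: int


@dataclass
class Ctrl:
    opts: Optional[FabricOpts] = None  # None models a PCIe controller


FABRICS_ONLY = {"ctrl_loss_tmo", "reconnect_delay"}


def attr_is_visible(ctrl, name):
    """Hide fabrics-only attributes when the controller has no fabric opts."""
    if name in FABRICS_ONLY and ctrl.opts is None:
        return False
    return True
```

Because the attribute is never shown for a PCIe controller, nothing later tries to read `ctrl.opts` when it is absent.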
    • S
      nvme-rdma: fix reset hang if controller died in the middle of a reset · 2362acb6
      Sagi Grimberg authored
      If the controller becomes unresponsive in the middle of a reset, we
      will hang because we are waiting for the freeze to complete, but that
      cannot happen since we have commands that are inflight holding the
      q_usage_counter, and we can't blindly fail requests that time out.
      
      So give a timeout and if we cannot wait for queue freeze before
      unfreezing, fail and let the error handling decide how to
      proceed (either schedule a reconnect or remove the controller).
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      2362acb6
    • S
      nvme-rdma: fix timeout handler · 0475a8dc
      Sagi Grimberg authored
      When a request times out in a LIVE state, we simply trigger error
      recovery and let the error recovery handle the request cancellation,
      however when a request times out in a non LIVE state, we make sure to
      complete it immediately as it might block controller setup or teardown
      and prevent forward progress.
      
      However, tearing down the entire set of I/O and admin queues causes a
      freeze/unfreeze imbalance (q->mq_freeze_depth) and is really
      overkill for what we actually need, which is to just fence controller
      teardown that may be running, stop the queue, and cancel the request if
      it is not already completed.
      
      Now that we have the controller teardown_lock, we can safely serialize
      request cancellation. This addresses a hang caused by calling extra
      queue freeze on controller namespaces, causing unfreeze to not complete
      correctly.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      0475a8dc
    • S
      nvme-rdma: serialize controller teardown sequences · 5110f402
      Sagi Grimberg authored
      In the timeout handler we may need to complete a request because the
      request that timed out may be an I/O that is a part of a serial sequence
      of controller teardown or initialization. In order to complete the
      request, we need to fence any other context that may compete with us
      and complete the request that is timing out.
      
      In this case, we could have a potential double completion in case
      a hard-irq or a different competing context triggered error recovery
      and is running inflight request cancellation concurrently with the
      timeout handler.
      
      Protect using a ctrl teardown_lock to serialize contexts that may
      complete a cancelled request due to error recovery or a reset.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      5110f402
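The double-completion hazard and the lock that prevents it can be modelled with two competing threads. This is a toy sketch, not the transport code: the `Ctrl` class, its `inflight` flag and `run_race` are hypothetical, and only the idea of a teardown lock comes from the commit message.

```python
import threading


class Ctrl:
    def __init__(self):
        self.teardown_lock = threading.Lock()
        self.inflight = True   # one request is still outstanding
        self.completions = 0   # how many times it has been completed

    def cancel_inflight(self):
        # Both the timeout handler and error recovery take this path; the
        # lock makes the check-and-complete step atomic.
        with self.teardown_lock:
            if self.inflight:
                self.inflight = False
                self.completions += 1


def run_race():
    ctrl = Ctrl()
    threads = [threading.Thread(target=ctrl.cancel_inflight) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ctrl.completions
```

Whichever context wins the lock completes the request; the other one finds it already completed and does nothing.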
    • S
      nvme-tcp: fix reset hang if controller died in the middle of a reset · e5c01f4f
      Sagi Grimberg authored
      If the controller becomes unresponsive in the middle of a reset, we will
      hang because we are waiting for the freeze to complete, but that cannot
      happen since we have commands that are inflight holding the
      q_usage_counter, and we can't blindly fail requests that time out.
      
      So give a timeout and if we cannot wait for queue freeze before
      unfreezing, fail and let the error handling decide how to proceed
      (either schedule a reconnect or remove the controller).
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      e5c01f4f
    • S
      nvme-tcp: fix timeout handler · 236187c4
      Sagi Grimberg authored
      When a request times out in a LIVE state, we simply trigger error
      recovery and let the error recovery handle the request cancellation,
      however when a request times out in a non LIVE state, we make sure to
      complete it immediately as it might block controller setup or teardown
      and prevent forward progress.
      
      However, tearing down the entire set of I/O and admin queues causes a
      freeze/unfreeze imbalance (q->mq_freeze_depth) and is really
      overkill for what we actually need, which is to just fence controller
      teardown that may be running, stop the queue, and cancel the request if
      it is not already completed.
      
      Now that we have the controller teardown_lock, we can safely serialize
      request cancellation. This addresses a hang caused by calling extra
      queue freeze on controller namespaces, causing unfreeze to not complete
      correctly.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      236187c4
    • S
      nvme-tcp: serialize controller teardown sequences · d4d61470
      Sagi Grimberg authored
      In the timeout handler we may need to complete a request because the
      request that timed out may be an I/O that is a part of a serial sequence
      of controller teardown or initialization. In order to complete the
      request, we need to fence any other context that may compete with us
      and complete the request that is timing out.
      
      In this case, we could have a potential double completion in case
      a hard-irq or a different competing context triggered error recovery
      and is running inflight request cancellation concurrently with the
      timeout handler.
      
      Protect using a ctrl teardown_lock to serialize contexts that may
      complete a cancelled request due to error recovery or a reset.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      d4d61470
    • S
      nvme: have nvme_wait_freeze_timeout return if it timed out · 7cf0d7c0
      Sagi Grimberg authored
      Users can detect whether the wait has completed and take appropriate
      action based on this information (e.g. whether to continue
      initialization or rather fail and schedule another initialization
      attempt).
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      7cf0d7c0
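How a caller might use the new return value can be sketched with a timed wait. This is a userspace analogy, not the kernel API: `wait_freeze_timeout`, `setup_io_queues` and the string results are hypothetical, and a `threading.Event` stands in for the queue freeze completing.

```python
import threading


def wait_freeze_timeout(frozen, timeout_s):
    """Return True if the freeze completed within timeout_s seconds."""
    return frozen.wait(timeout_s)


def setup_io_queues(frozen, timeout_s):
    """Continue initialization only if the freeze completed in time."""
    if not wait_freeze_timeout(frozen, timeout_s):
        # Timed out: fail this attempt so another one can be scheduled.
        return "retry"
    return "continue"
```

The caller no longer has to guess: a timeout is reported explicitly and can be turned into a reconnect or a controller removal.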
    • S
      nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance · d7144f5c
      Sagi Grimberg authored
      NVME_CTRL_NEW should never see any I/O, because in order to start
      initialization it has to transition to NVME_CTRL_CONNECTING and from
      there it will never return to this state.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      d7144f5c
    • Z
      nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu · a6ce7d7b
      Ziye Yang authored
      When handling commands without in-capsule data, we assign the ttag
      assuming we already have the queue commands array allocated (based
      on the queue size information in the connect data payload). However,
      if the connect itself did not send the connect data in-capsule, we
      have yet to allocate the queue commands, and we will assign a bogus
      ttag and suffer a NULL dereference when we receive the corresponding
      h2cdata pdu.
      
      Fix this by checking whether we already allocated the commands before
      dereferencing them when handling h2cdata; if we didn't, it is for sure a
      connect and we should use the preallocated connect command.
      Signed-off-by: Ziye Yang <ziye.yang@intel.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      a6ce7d7b
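The lookup described in the fix can be sketched as a guarded index. This is a minimal model, not the target code: `get_cmd`, `cmds` and `connect_cmd` are hypothetical names for the per-queue command array and the preallocated connect command.

```python
def get_cmd(cmds, connect_cmd, ttag):
    """Pick the command an h2cdata PDU refers to in this simplified model."""
    if cmds is None:
        # The commands array is not allocated yet, so this can only be
        # data for the connect command.
        return connect_cmd
    return cmds[ttag]
```

Before the array exists, any ttag is ignored and the connect command is used, which avoids indexing into unallocated memory.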
  10. 28 8月, 2020 2 次提交
  11. 26 8月, 2020 2 次提交
  12. 22 8月, 2020 4 次提交