1. 03 Oct, 2017 6 commits
  2. 01 Oct, 2017 1 commit
  3. 30 Sep, 2017 2 commits
  4. 27 Sep, 2017 1 commit
  5. 26 Sep, 2017 20 commits
  6. 25 Sep, 2017 10 commits
    • block: fix a crash caused by wrong API · f5c156c4
      Shaohua Li committed
      part_stat_show takes a part device not a disk, so we should use
      part_to_disk.
      
      Fixes: d62e26b3 ("block: pass in queue to inflight accounting")
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • fs: Fix page cache inconsistency when mixing buffered and AIO DIO · 332391a9
      Lukas Czerner committed
      Currently when mixing buffered reads and asynchronous direct writes it
      is possible to end up with the situation where we have stale data in the
      page cache while the new data is already written to disk. This is
      permanent until the affected pages are flushed away. Despite the fact
      that mixing buffered and direct IO is ill-advised, it does pose a threat
      to data integrity, is unexpected and should be fixed.
      
      Fix this by deferring completion of asynchronous direct writes to a
      process context in the case that there are mapped pages to be found in
      the inode. Later, before the completion in dio_complete(), invalidate
      the pages in question. This ensures that after the completion the pages
      in the written area are either unmapped, or populated with up-to-date
      data. Also do the same for the iomap case which uses
      iomap_dio_complete() instead.
      
      This has a side effect of deferring the completion to a process context
      for every AIO DIO that happens on an inode that has pages mapped. However,
      since the consensus is that this is an ill-advised practice, the
      performance implication should not be a problem.
      
      This was based on a proposal from Jeff Moyer, thanks!
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvmet: implement valid sqhd values in completions · bb1cc747
      James Smart committed
      To support sqhd, for initiators that are following the spec and
      paying attention to sqhd vs their sqtail values:
      
      - add sqhd to struct nvmet_sq
      - initialize sqhd to 0 in nvmet_sq_setup
      - rather than propagate the 0's-based qsize value from the connect message,
        which requires a +1 in every sqhd update, and as nothing else references
        it, convert to a 1's-based value in the nvmet_sq/cq_setup() calls.
      - validate that the connect message sqsize is non-zero, per spec.
      - update and assign sqhd for every completion that goes back.
      
      Also remove handling the NULL sq case in __nvmet_req_complete, as it can't
      happen with the current code.
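
The sqhd bookkeeping listed above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the kernel code: `struct sq_sketch`, `sq_sketch_setup()` and `sq_sketch_complete()` are hypothetical names, and only the arithmetic (store a 1's-based size once, reject a zero sqsize, advance sqhd modulo the size on each completion) is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sqhd-related fields of struct nvmet_sq. */
struct sq_sketch {
    uint32_t size;  /* 1's-based entry count (wide enough for 65535 + 1) */
    uint16_t sqhd;  /* submission queue head reported in completions */
};

/*
 * Takes the 0's-based sqsize from a connect message.
 * Returns 0 on success, -1 if sqsize is zero (rejected per the commit).
 */
static int sq_sketch_setup(struct sq_sketch *sq, uint16_t sqsize_zero_based)
{
    assert(sq != NULL);
    if (sqsize_zero_based == 0)
        return -1;
    sq->size = (uint32_t)sqsize_zero_based + 1;  /* convert once, here */
    sq->sqhd = 0;
    return 0;
}

/* Advances sqhd for one completion and returns the value to report. */
static uint16_t sq_sketch_complete(struct sq_sketch *sq)
{
    assert(sq != NULL && sq->size != 0);
    sq->sqhd = (uint16_t)((sq->sqhd + 1u) % sq->size);
    return sq->sqhd;
}
```

Converting the size once at setup keeps the per-completion update to a single increment and modulo, which is the motivation the commit message gives for dropping the 0's-based value.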
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-fabrics: Allow 0 as KATO value · 8edd11c9
      Guilherme G. Piccoli committed
      Currently, the driver code allows the user to set 0 as the KATO
      (Keep Alive TimeOut) value, but this is not being respected.
      This patch enforces the expected behavior.
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: allow timed-out ios to retry · 0951338d
      James Smart committed
      Currently nvme_req_needs_retry() applies several checks to see if
      a retry is allowed. One of those is whether the current time has exceeded
      the start time of the io plus the timeout length. This check means that,
      if an io times out, a retry is never allowed for it, so applications
      see the io failure.

      Remove this check so that the io is allowed to time out, as it does on
      other protocols, and retries can be made.
      
      On the FC transport, a frame can be lost for an individual io, and there
      may be no other errors that escalate for the connection/association.
      The io will time out, which causes the transport to escalate into creating
      a new association, but the io that timed out, due to this retry logic, has
      already failed back to the application and things are hosed.
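
The effect of removing the time check can be sketched as a before/after pair. This is a simplified userspace model, not the kernel function: `struct req_sketch` and both helpers are hypothetical, and the checks other than the timeout comparison (a no-retry flag, a do-not-retry status, a retry limit) are assumptions standing in for the "several checks" the message mentions without listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a request; not the kernel's struct request. */
struct req_sketch {
    bool no_retry;        /* assumed: caller asked for no retries */
    bool do_not_retry;    /* assumed: device flagged the status as final */
    uint64_t start_time;  /* tick at which the io was started */
    uint64_t timeout;     /* timeout length in ticks */
    unsigned int retries; /* retries already made */
};

/* Before: an io that has outlived its timeout can never be retried. */
static bool needs_retry_before(const struct req_sketch *r, uint64_t now,
                               unsigned int max_retries)
{
    assert(r != NULL);
    if (r->no_retry || r->do_not_retry)
        return false;
    if (now - r->start_time >= r->timeout)  /* the check the commit removes */
        return false;
    return r->retries < max_retries;
}

/* After: elapsed time no longer matters; only the other checks apply. */
static bool needs_retry_after(const struct req_sketch *r, uint64_t now,
                              unsigned int max_retries)
{
    assert(r != NULL);
    (void)now;  /* kept only so both variants share a signature */
    if (r->no_retry || r->do_not_retry)
        return false;
    return r->retries < max_retries;
}
```

With a request started at tick 100 and a timeout of 30 ticks, the first variant refuses a retry at tick 130 while the second still allows one until the retry limit is reached.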
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: stop aer posting if controller state not live · cd48282c
      James Smart committed
      If an nvme async_event command completes, in most cases, a new
      async event is posted. However, if the controller enters a
      resetting or reconnecting state, there is nothing to block the
      scheduled work element from posting the async event again. Nor are
      there calls from the transport to stop async events when an
      association dies.
      
      In the case of FC, where the association is torn down, the aer must
      be aborted on the FC link and completes through the normal job
      completion path. Thus the terminated async event ends up being
      rescheduled even though the controller isn't in a valid state for
      the aer, and the reposting gets the transport into a partially torn
      down data structure.
      
      It's possible to hit the scenario on rdma, although it is much less
      likely: an aer would have to complete right as the association is
      terminated, and the association teardown reclaims the blk requests via
      nvme_cancel_request(), so it's immediate, not a link-related action
      like on FC.
      
      Fix by putting controller state checks in both the async event
      completion routine, where it schedules the async event, and in the
      async event work routine, before it calls into the transport. It's
      effectively a "stop_async_events()" behavior. The transport, when
      it creates a new association with the subsystem, will transition
      the state back to live and is already restarting the async event
      posting.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      [hch: remove taking a lock over reading the controller state]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-pci: Print invalid SGL only once · d0877473
      Keith Busch committed
      The WARN_ONCE macro returns true if the condition is true, not if the
      warn was raised, so we're printing the scatter list every time it's
      invalid. This is excessive and makes debugging harder, so this patch
      prints it just once.
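
The pitfall described above can be reproduced outside the kernel. In this sketch `warn_once_sketch()` is a hypothetical userspace imitation of the WARN_ONCE return convention (print at most once, always return the condition), and the two `check_sgl_*` helpers with their counters are illustrative only. The "after" variant uses a separate once-flag as one possible way to get the intended behavior; it is not claimed to be the form the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace imitation of the WARN_ONCE return convention: the message is
 * printed at most once per flag, but the return value is always the condition.
 */
static bool warn_once_sketch(bool cond, bool *warned, const char *msg)
{
    assert(warned != NULL && msg != NULL);
    if (cond && !*warned) {
        *warned = true;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return cond;
}

static int dumps_before;  /* how often the "before" variant dumped the list */
static int dumps_after;   /* how often the "after" variant dumped the list */

/* Before: the dump is gated on the return value, so it runs on every invalid SGL. */
static void check_sgl_before(bool invalid)
{
    static bool warned;
    if (warn_once_sketch(invalid, &warned, "Invalid SGL"))
        dumps_before++;  /* stands in for printing the scatter list */
}

/* After: a separate once-flag gates the dump itself. */
static void check_sgl_after(bool invalid)
{
    static bool warned, dumped;
    if (warn_once_sketch(invalid, &warned, "Invalid SGL") && !dumped) {
        dumped = true;
        dumps_after++;  /* stands in for printing the scatter list */
    }
}
```

Calling each helper several times with an invalid SGL leaves `dumps_before` equal to the number of calls, while `dumps_after` stays at one.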
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-pci: initialize queue memory before interrupts · 161b8be2
      Keith Busch committed
      A spurious interrupt before the nvme driver has initialized the completion
      queue may inadvertently cause the driver to believe it has a completion
      to process. This may result in a NULL dereference since the nvmeq's tags
      are not set at this point.
      
      The patch initializes the host's CQ memory so that a spurious interrupt
      isn't mistaken for a real completion.
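
Why zeroed memory helps can be sketched with the NVMe phase tag: the host treats a completion queue entry as new only when the entry's phase bit matches the phase it currently expects. The model below is a simplified userspace illustration; `struct cqe_sketch`, `struct cq_sketch` and both helpers are hypothetical, the entry layout is reduced to two fields, and an expected phase starting at 1 is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reduced completion queue entry: bit 0 of status is the phase tag. */
struct cqe_sketch {
    uint16_t command_id;
    uint16_t status;
};

/* Hypothetical host-side view of a completion queue. */
struct cq_sketch {
    struct cqe_sketch *cqes;
    uint16_t depth;
    uint16_t head;
    uint8_t phase;  /* phase the host expects for a new entry */
};

/* Clears the queue memory before the queue is used, then sets the initial state. */
static void cq_sketch_init(struct cq_sketch *cq, struct cqe_sketch *mem,
                           uint16_t depth)
{
    assert(cq != NULL && mem != NULL && depth > 0);
    memset(mem, 0, (size_t)depth * sizeof(*mem));
    cq->cqes = mem;
    cq->depth = depth;
    cq->head = 0;
    cq->phase = 1;  /* assumption: zeroed entries (phase 0) never match */
}

/* True if the entry at head looks like a completion the host has not seen yet. */
static bool cq_sketch_pending(const struct cq_sketch *cq)
{
    assert(cq != NULL && cq->cqes != NULL);
    return (cq->cqes[cq->head].status & 1) == cq->phase;
}
```

With stale, non-zero memory a spurious interrupt could find a matching phase bit at the head; after `cq_sketch_init()` nothing is reported as pending until a device write sets the bit.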
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvmet-fc: fix failing max io queue connections · deb61742
      James Smart committed
      The fc transport treats NVMET_NR_QUEUES as the maximum queue count, i.e.
      the admin queue plus NVMET_NR_QUEUES-1 io queues. But NVMET_NR_QUEUES is
      the number of io queues, so the maximum queue count is really
      NVMET_NR_QUEUES+1.
      
      Fix the handling in the target fc transport
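
The off-by-one can be made concrete with a pair of bound checks. This is a sketch only: the helper names are hypothetical, the io queue count is passed in as a parameter instead of using the real constant, and it assumes queue ID 0 is the admin queue with io queues numbered from 1.

```c
#include <assert.h>
#include <stdbool.h>

/* qid 0 is the admin queue; io queues use qid 1..nr_io_queues. */

/* Before: nr_io_queues is treated as the total count, so the last io queue is refused. */
static bool qid_ok_before(unsigned int qid, unsigned int nr_io_queues)
{
    assert(nr_io_queues > 0);
    return qid < nr_io_queues;
}

/* After: the total count is nr_io_queues + 1, so qid == nr_io_queues is accepted. */
static bool qid_ok_after(unsigned int qid, unsigned int nr_io_queues)
{
    assert(nr_io_queues > 0);
    return qid <= nr_io_queues;
}
```

With four io queues, a connect for queue ID 4 fails the first check and passes the second, which matches the "failing max io queue connections" symptom in the subject line.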
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-fc: use transport-specific sgl format · d9d34c0b
      James Smart committed
      Sync with NVM Express spec change and FC-NVME 1.18.
      
      FC transport sets SGL type to Transport SGL Data Block Descriptor and
      subtype to transport-specific value 0x0A.
      
      Removed the warn-on's on the PRP fields. They are unneeded. They were
      there to check for values from the upper layer that weren't set right,
      and for the most part were fine. But with async events, which reuse the
      same structure, the SGL overlay had already converted the fields to the
      Transport SGL values by the second time the command was issued, so the
      warn-on's were firing erroneously.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>