1. 11 Nov, 2017 11 commits
  2. 03 Nov, 2017 1 commit
  3. 01 Nov, 2017 4 commits
  4. 27 Oct, 2017 5 commits
  5. 20 Oct, 2017 1 commit
  6. 19 Oct, 2017 1 commit
  7. 16 Oct, 2017 1 commit
  8. 04 Oct, 2017 1 commit
  9. 26 Sep, 2017 1 commit
  10. 25 Sep, 2017 2 commits
    • nvme: allow timed-out ios to retry · 0951338d
      James Smart committed
      Currently nvme_req_needs_retry() applies several checks to see if
      a retry is allowed. One of those is whether the current time has exceeded
      the start time of the io plus the timeout length. If an io times out,
      this check means a retry is never allowed for that io, so
      applications see the io failure.
      
      Remove this check so that the io can time out, as it does on other
      protocols, and still be retried.
      
      On the FC transport, a frame can be lost for an individual io, and there
      may be no other errors that escalate for the connection/association.
      The io will time out, which causes the transport to escalate into creating
      a new association, but because of this retry logic the io that timed out
      has already been failed back to the application and things are hosed.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
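
The effect of dropping the elapsed-time check can be illustrated with a small userspace model. This is only a sketch, not the kernel code: `struct model_req`, `MODEL_SC_DNR`, `MODEL_MAX_RETRIES`, and both function names are hypothetical stand-ins chosen for illustration, and the exact set of other checks is assumed from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from the kernel source. */
#define MODEL_SC_DNR 0x4000u          /* model's "do not retry" status bit */
enum { MODEL_MAX_RETRIES = 5 };       /* model's retry budget */

struct model_req {
	bool     noretry;     /* request was flagged as not retryable */
	uint16_t status;      /* completion status, may carry the DNR bit */
	unsigned retries;     /* retries already performed */
	uint64_t start_time;  /* tick at which the io was started */
	uint64_t timeout;     /* timeout length in ticks */
};

/* Behavior before the change: an io that has timed out is never retried. */
bool needs_retry_old(const struct model_req *req, uint64_t now)
{
	assert(req);
	if (req->noretry)
		return false;
	if (req->status & MODEL_SC_DNR)
		return false;
	if (now - req->start_time >= req->timeout)
		return false;   /* the check this commit removes */
	if (req->retries >= MODEL_MAX_RETRIES)
		return false;
	return true;
}

/* Behavior after the change: elapsed time no longer blocks a retry. */
bool needs_retry_new(const struct model_req *req, uint64_t now)
{
	assert(req);
	(void)now;              /* kept only so both variants share a signature */
	if (req->noretry)
		return false;
	if (req->status & MODEL_SC_DNR)
		return false;
	if (req->retries >= MODEL_MAX_RETRIES)
		return false;
	return true;
}
```

In this model, a request whose timeout has already elapsed is rejected by `needs_retry_old()` but accepted by `needs_retry_new()`, while the do-not-retry bit and the retry budget still stop retries in both variants.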
    • nvme: stop aer posting if controller state not live · cd48282c
      James Smart committed
      If an nvme async_event command completes, in most cases, a new
      async event is posted. However, if the controller enters a
      resetting or reconnecting state, there is nothing to block the
      scheduled work element from posting the async event again. Nor are
      there calls from the transport to stop async events when an
      association dies.
      
      In the case of FC, where the association is torn down, the aer must
      be aborted on the FC link and completes through the normal job
      completion path. Thus the terminated async event ends up being
      rescheduled even though the controller isn't in a valid state for
      the aer, and the reposting gets the transport into a partially torn
      down data structure.
      
      It's possible to hit the scenario on rdma, although it is much less
      likely: the aer would have to complete right as the association is
      terminated, and the association teardown reclaims the blk requests via
      nvme_cancel_request(), so it's immediate, not a link-related action
      like on FC.
      
      Fix by putting controller state checks in both the async event
      completion routine where it schedules the async event and in the
      async event work routine before it calls into the transport. It's
      effectively a "stop_async_events()" behavior. The transport, when
      it creates a new association with the subsystem, will transition
      the state back to live and is already restarting the async event
      posting.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      [hch: remove taking a lock over reading the controller state]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
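
The two state checks described above can be sketched as a small userspace model. This is an illustration under stated assumptions, not the kernel implementation: `struct model_ctrl`, the `MODEL_CTRL_*` states, and both function names are hypothetical, and the work queue is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical controller states for this model only. */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_RECONNECTING
};

struct model_ctrl {
	enum model_ctrl_state state;
	bool     work_scheduled;  /* stands in for a queued work element */
	unsigned posted;          /* async events handed to the transport */
};

/* Completion routine: reschedule the async event only while live. */
void model_aer_complete(struct model_ctrl *ctrl)
{
	assert(ctrl);
	if (ctrl->state == MODEL_CTRL_LIVE)
		ctrl->work_scheduled = true;
}

/* Work routine: re-check the state before calling into the transport. */
void model_aer_work(struct model_ctrl *ctrl)
{
	assert(ctrl);
	if (!ctrl->work_scheduled)
		return;
	ctrl->work_scheduled = false;
	if (ctrl->state != MODEL_CTRL_LIVE)
		return;           /* state changed after scheduling: do not post */
	ctrl->posted++;           /* stands in for the transport's submit call */
}
```

The second check matters because the state can change between the completion that schedules the work and the work routine actually running; in the model, a controller that leaves the live state in that window posts nothing, and posting resumes only once the state is set back to live.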
  11. 12 Sep, 2017 2 commits
  12. 30 Aug, 2017 3 commits
  13. 29 Aug, 2017 7 commits