1. 29 Aug 2020, 12 commits
    • nvme: fix controller instance leak · 192f6c29
      Committed by Keith Busch
      If the driver has to unbind from the controller because of an early
      failure, before the subsystem has been set up, there is no subsystem
      holding the controller's instance, so the controller must free its own
      instance in this case.
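
      A minimal sketch of the resulting cleanup, assuming the nvme core's
      ida-based instance allocation; the helper and condition shown are
      illustrative, not a verbatim copy of the patch:

      static void nvme_free_ctrl(struct device *dev)
      {
              struct nvme_ctrl *ctrl =
                      container_of(dev, struct nvme_ctrl, ctrl_device);
              struct nvme_subsystem *subsys = ctrl->subsys;

              /*
               * On an early failure there is no subsystem to inherit (and
               * later release) the instance, so drop it here ourselves.
               */
              if (!subsys || ctrl->instance != subsys->instance)
                      ida_simple_remove(&nvme_instance_ida, ctrl->instance);

              /* ... rest of the controller teardown ... */
      }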
      
      Fixes: 733e4b69 ("nvme: Assign subsys instance from first ctrl")
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()' · 70e37988
      Committed by Christophe JAILLET
      'spin_lock()' and 'spin_lock_irqsave()' are used inconsistently in this
      function.

      Use 'spin_lock_irqsave()' here as well: judging by the surrounding
      code, there is no guarantee that interrupts are disabled at this point.
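
      A sketch of the pattern, assuming 'fod->flock' and the flag it protects
      follow the surrounding nvmet-fc code (illustrative, not the exact hunk):

      unsigned long flags;

      /* was: spin_lock(&fod->flock) - unsafe if interrupts are enabled */
      spin_lock_irqsave(&fod->flock, flags);
      fod->writedataactive = false;
      spin_unlock_irqrestore(&fod->flock, flags);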
      
      Fixes: a97ec51b ("nvmet_fc: Rework target side abort handling")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme: Fix NULL dereference for pci nvme controllers · 7cd49f75
      Committed by Sagi Grimberg
      PCIe controllers do not have fabrics opts; verify that they exist
      before showing the ctrl_loss_tmo or reconnect_delay attributes.
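
      A minimal sketch of the guard, assuming the attributes are filtered in
      the sysfs is_visible callback (names modelled on the nvme core; treat
      as illustrative):

      static umode_t nvme_dev_attrs_are_visible(struct kobject *kobj,
                      struct attribute *a, int n)
      {
              struct device *dev = container_of(kobj, struct device, kobj);
              struct nvme_ctrl *ctrl = dev_get_drvdata(dev);

              /* fabrics-only attributes: hide them when there are no opts */
              if (a == &dev_attr_ctrl_loss_tmo.attr && !ctrl->opts)
                      return 0;
              if (a == &dev_attr_reconnect_delay.attr && !ctrl->opts)
                      return 0;

              return a->mode;
      }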
      
      Fixes: 764075fd ("nvme: expose reconnect_delay and ctrl_loss_tmo via sysfs")
      Reported-by: Tobias Markus <tobias@markus-regensburg.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme-rdma: fix reset hang if controller died in the middle of a reset · 2362acb6
      Committed by Sagi Grimberg
      If the controller becomes unresponsive in the middle of a reset, we
      will hang waiting for the freeze to complete, but it never can: there
      are inflight commands holding the q_usage_counter, and we cannot
      blindly fail requests that time out.

      So wait for the queue freeze with a timeout, and if it does not
      complete before we unfreeze, fail and let the error handling decide how
      to proceed (either schedule a reconnect or remove the controller).
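
      A sketch of the shape of the fix in the io-queue (re)configuration
      path; function and label names follow the nvme-rdma driver of that era
      and are illustrative:

      if (!new) {
              nvme_start_queues(&ctrl->ctrl);
              if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
                      /*
                       * If we timed out waiting for freeze we are likely
                       * stuck: fail the reset and let the error handling
                       * decide (schedule a reconnect or remove the ctrl).
                       */
                      ret = -ENODEV;
                      goto out_wait_freeze_timed_out;
              }
              blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
                              ctrl->ctrl.queue_count - 1);
              nvme_unfreeze(&ctrl->ctrl);
      }
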
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme-rdma: fix timeout handler · 0475a8dc
      Committed by Sagi Grimberg
      When a request times out in the LIVE state, we simply trigger error
      recovery and let it handle the request cancellation. However, when a
      request times out in a non-LIVE state, we make sure to complete it
      immediately, as it might otherwise block controller setup or teardown
      and prevent forward progress.

      But tearing down the entire set of I/O and admin queues causes a
      freeze/unfreeze imbalance (q->mq_freeze_depth) and is really overkill
      for what we actually need, which is just to fence a controller teardown
      that may be running, stop the queue, and cancel the request if it is
      not already completed.

      Now that we have the controller teardown_lock, we can safely serialize
      request cancellation. This addresses a hang caused by taking an extra
      queue freeze on controller namespaces, which prevented unfreeze from
      completing correctly.
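
      A hedged sketch of the per-request cancellation this describes,
      modelled on the nvme-rdma timeout path (names may differ slightly from
      the actual patch):

      static void nvme_rdma_complete_timed_out(struct request *rq)
      {
              struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
              struct nvme_rdma_queue *queue = req->queue;
              struct nvme_rdma_ctrl *ctrl = queue->ctrl;

              /* fence a controller teardown that may be running */
              mutex_lock(&ctrl->teardown_lock);
              nvme_rdma_stop_queue(queue);
              if (!blk_mq_request_completed(rq)) {
                      nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;
                      blk_mq_complete_request(rq);
              }
              mutex_unlock(&ctrl->teardown_lock);
      }
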
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme-rdma: serialize controller teardown sequences · 5110f402
      Committed by Sagi Grimberg
      In the timeout handler we may need to complete a request, because the
      request that timed out may be an I/O that is part of a serial sequence
      of controller teardown or initialization. In order to complete the
      request, we need to fence any other context that may compete with us
      and complete the request that is timing out.

      Otherwise we could see a double completion if a hard-irq or a different
      competing context triggered error recovery and is running inflight
      request cancellation concurrently with the timeout handler.

      Protect against this with a per-ctrl teardown_lock that serializes the
      contexts that may complete a cancelled request due to error recovery or
      a reset.
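
      A minimal sketch of the serialization, assuming a mutex in the ctrl
      taken around the teardown sequences (the actual patch may scope it
      differently):

      /* in struct nvme_rdma_ctrl: */
      struct mutex teardown_lock; /* serializes teardown vs. timeout handler */

      static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
                      bool remove)
      {
              mutex_lock(&ctrl->teardown_lock);
              if (ctrl->ctrl.queue_count > 1) {
                      nvme_stop_queues(&ctrl->ctrl);
                      nvme_rdma_stop_io_queues(ctrl);
                      /* ... cancel inflight requests, free the queues ... */
              }
              mutex_unlock(&ctrl->teardown_lock);
      }
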
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme-tcp: fix reset hang if controller died in the middle of a reset · e5c01f4f
      Committed by Sagi Grimberg
      If the controller becomes unresponsive in the middle of a reset, we
      will hang waiting for the freeze to complete, but it never can: there
      are inflight commands holding the q_usage_counter, and we cannot
      blindly fail requests that time out.

      So wait for the queue freeze with a timeout, and if it does not
      complete before we unfreeze, fail and let the error handling decide how
      to proceed (either schedule a reconnect or remove the controller).
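
      The fix mirrors the nvme-rdma change above; a sketch for the TCP
      io-queue configuration path (illustrative names):

      if (!new) {
              nvme_start_queues(ctrl);
              if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
                      /* likely stuck: fail and let error handling decide */
                      ret = -ENODEV;
                      goto out_wait_freeze_timed_out;
              }
              blk_mq_update_nr_hw_queues(ctrl->tagset,
                              ctrl->queue_count - 1);
              nvme_unfreeze(ctrl);
      }
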
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme-tcp: fix timeout handler · 236187c4
      Committed by Sagi Grimberg
      When a request times out in the LIVE state, we simply trigger error
      recovery and let it handle the request cancellation. However, when a
      request times out in a non-LIVE state, we make sure to complete it
      immediately, as it might otherwise block controller setup or teardown
      and prevent forward progress.

      But tearing down the entire set of I/O and admin queues causes a
      freeze/unfreeze imbalance (q->mq_freeze_depth) and is really overkill
      for what we actually need, which is just to fence a controller teardown
      that may be running, stop the queue, and cancel the request if it is
      not already completed.

      Now that we have the controller teardown_lock, we can safely serialize
      request cancellation. This addresses a hang caused by taking an extra
      queue freeze on controller namespaces, which prevented unfreeze from
      completing correctly.
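
      A sketch of the resulting timeout-handler dispatch, mirroring the RDMA
      helper shown earlier (illustrative names):

      static enum blk_eh_timer_return
      nvme_tcp_timeout(struct request *rq, bool reserved)
      {
              struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
              struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;

              if (ctrl->state != NVME_CTRL_LIVE) {
                      /*
                       * Setup/teardown may be blocked on this request:
                       * fence the teardown and complete it directly.
                       */
                      nvme_tcp_complete_timed_out(rq);
                      return BLK_EH_DONE;
              }

              /* LIVE: let error recovery handle the cancellation */
              nvme_tcp_error_recovery(ctrl);
              return BLK_EH_RESET_TIMER;
      }
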
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme-tcp: serialize controller teardown sequences · d4d61470
      Committed by Sagi Grimberg
      In the timeout handler we may need to complete a request, because the
      request that timed out may be an I/O that is part of a serial sequence
      of controller teardown or initialization. In order to complete the
      request, we need to fence any other context that may compete with us
      and complete the request that is timing out.

      Otherwise we could see a double completion if a hard-irq or a different
      competing context triggered error recovery and is running inflight
      request cancellation concurrently with the timeout handler.

      Protect against this with a per-ctrl teardown_lock that serializes the
      contexts that may complete a cancelled request due to error recovery or
      a reset.
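
      As in the RDMA driver, a sketch of one of the serialized teardown
      sequences (illustrative):

      static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
                      bool remove)
      {
              mutex_lock(&to_tcp_ctrl(ctrl)->teardown_lock);
              blk_mq_quiesce_queue(ctrl->admin_q);
              /* ... stop the queue, cancel inflight admin requests ... */
              mutex_unlock(&to_tcp_ctrl(ctrl)->teardown_lock);
      }
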
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme: have nvme_wait_freeze_timeout return if it timed out · 7cf0d7c0
      Committed by Sagi Grimberg
      Callers can then detect whether the wait completed and take the
      appropriate action based on that information (e.g. whether to continue
      initialization or rather fail and schedule another initialization
      attempt).
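
      A sketch of the changed helper, assuming it simply propagates the
      remaining time from blk_mq_freeze_queue_wait_timeout() (illustrative):

      int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
      {
              struct nvme_ns *ns;

              down_read(&ctrl->namespaces_rwsem);
              list_for_each_entry(ns, &ctrl->namespaces, list) {
                      timeout = blk_mq_freeze_queue_wait_timeout(ns->queue,
                                      timeout);
                      if (timeout <= 0)
                              break;
              }
              up_read(&ctrl->namespaces_rwsem);
              return timeout; /* > 0: all queues froze; <= 0: timed out */
      }
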
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance · d7144f5c
      Committed by Sagi Grimberg
      NVME_CTRL_NEW should never see any I/O: in order to start
      initialization the controller has to transition to
      NVME_CTRL_CONNECTING, and it never returns to the NEW state after that.
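
      A sketch of the request-acceptance dispatch this simplifies, modelled
      on __nvmf_check_ready(); the point is that no NVME_CTRL_NEW case is
      needed (illustrative, details of the CONNECTING case may differ):

      bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
                      bool queue_live)
      {
              struct nvme_request *req = nvme_req(rq);

              /*
               * No NVME_CTRL_NEW case: a controller must move to
               * NVME_CTRL_CONNECTING to start initialization and never
               * returns to NEW, so NEW can never see a request here.
               */
              switch (ctrl->state) {
              case NVME_CTRL_CONNECTING:
                      /* only let the connect command through */
                      if (blk_rq_is_passthrough(rq) &&
                          nvme_is_fabrics(req->cmd) &&
                          req->cmd->fabrics.fctype == nvme_fabrics_type_connect)
                              return true;
                      break;
              default:
                      break;
              }

              return queue_live;
      }
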
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu · a6ce7d7b
      Committed by Ziye Yang
      When handling commands without in-capsule data, we assign the ttag
      assuming the queue's commands array has already been allocated (based
      on the queue size information in the connect data payload). However, if
      the connect itself did not send its data in-capsule, we have not yet
      allocated the queue commands, so we would assign a bogus ttag and
      suffer a NULL dereference when the corresponding h2cdata pdu arrives.

      Fix this by checking whether the commands array has been allocated
      before dereferencing it when handling h2cdata; if it has not, the pdu
      must belong to a connect command, so use the preallocated connect
      command.
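
      A sketch of the check in the h2cdata handler (modelled on nvmet-tcp;
      illustrative):

      static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
      {
              struct nvme_tcp_data_pdu *data = &queue->pdu.data;
              struct nvmet_tcp_cmd *cmd;

              if (likely(queue->nr_cmds))
                      cmd = &queue->cmds[data->ttag];
              else
                      cmd = &queue->connect; /* cmds[] not allocated yet */

              /* ... validate ttag/offset and set up the data transfer ... */
              return 0;
      }
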
      Signed-off-by: Ziye Yang <ziye.yang@intel.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  2. 28 Aug 2020, 2 commits
  3. 26 Aug 2020, 2 commits
  4. 22 Aug 2020, 23 commits
  5. 18 Aug 2020, 1 commit
    • bfq: fix blkio cgroup leakage v4 · 2de791ab
      Committed by Dmitry Monakhov
      Changes from v1:
          - update commit description with proper ref-accounting justification
      
      commit db37a34c ("block, bfq: get a ref to a group when adding it to a
      service tree") introduces a leak of bfq_group and blkcg_gq objects
      because of a get/put imbalance.
      In fact, the whole idea of the original commit is wrong, because a
      bfq_group entity cannot disappear under us: it is referenced by its
      child bfq_queue entities from here:
       -> bfq_init_entity()
          ->bfqg_and_blkg_get(bfqg);
          ->entity->parent = bfqg->my_entity
      
       -> bfq_put_queue(bfqq)
          FINAL_PUT
          ->bfqg_and_blkg_put(bfqq_group(bfqq))
          ->kmem_cache_free(bfq_pool, bfqq);
      
      So a parent entity cannot disappear while a child entity is in the
      tree, and child entities already have proper protection.
      This patch reverts commit db37a34c ("block, bfq: get a ref to a group
      when adding it to a service tree").
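
      For orientation, a hedged sketch of what the revert removes: the extra
      group reference taken on activation (modelled on bfq-iosched.c; not a
      verbatim diff):

      static void bfq_get_entity(struct bfq_entity *entity)
      {
              struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);

              if (bfqq) {
                      bfqq->ref++;
                      return;
              }
              /*
               * Removed by the revert: a bfqg_and_blkg_get() on the group
               * entity here. The child bfq_queue already pins its parent
               * group via bfq_init_entity(), so the extra ref only created
               * the get/put imbalance described above.
               */
      }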
      
      bfq_group leak trace caused by bad commit:
      -> blkg_alloc
         -> bfq_pq_alloc
           -> bfqg_get (+1)
      ->bfq_activate_bfqq
        ->bfq_activate_requeue_entity
          -> __bfq_activate_entity
             ->bfq_get_entity
               ->bfqg_and_blkg_get (+1)  <==== : Note1
      ->bfq_del_bfqq_busy
        ->bfq_deactivate_entity+0x53/0xc0 [bfq]
          ->__bfq_deactivate_entity+0x1b8/0x210 [bfq]
            -> bfq_forget_entity(is_in_service = true)
      	 entity->on_st_or_in_serv = false   <=== :Note2
      	 if (is_in_service)
      	     return;  ==> do not touch reference
      -> blkcg_css_offline
       -> blkcg_destroy_blkgs
        -> blkg_destroy
         -> bfq_pd_offline
          -> __bfq_deactivate_entity
             if (!entity->on_st_or_in_serv) /* true, because of (Note2) */
                 return false;
 -> bfq_pd_free
    -> bfqg_put() (-1, but bfqg->ref == 2) because of (Note2)
So bfq_group and blkcg_gq will leak forever; see the test case below.
      
      ##TESTCASE_BEGIN:
      #!/bin/bash
      
      max_iters=${1:-100}
      #prep cgroup mounts
      mount -t tmpfs cgroup_root /sys/fs/cgroup
      mkdir /sys/fs/cgroup/blkio
      mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
      
      # Prepare blkdev
      grep blkio /proc/cgroups
      truncate -s 1M img
      losetup /dev/loop0 img
      echo bfq > /sys/block/loop0/queue/scheduler
      
      grep blkio /proc/cgroups
      for ((i=0;i<max_iters;i++))
      do
          mkdir -p /sys/fs/cgroup/blkio/a
          echo 0 > /sys/fs/cgroup/blkio/a/cgroup.procs
          dd if=/dev/loop0 bs=4k count=1 of=/dev/null iflag=direct 2> /dev/null
          echo 0 > /sys/fs/cgroup/blkio/cgroup.procs
          rmdir /sys/fs/cgroup/blkio/a
          grep blkio /proc/cgroups
      done
      ##TESTCASE_END:
      
      Fixes: db37a34c ("block, bfq: get a ref to a group when adding it to a service tree")
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>