1. 31 3月, 2017 2 次提交
    • M
      target: Fix ALUA transition state race between multiple initiators · d19c4643
      Mike Christie 提交于
      Multiple threads could be writing to alua_access_state at
      the same time, or there could be multiple STPGs in flight
      (different initiators sending them or one initiator sending
      them to different ports), or a combo of both and the
      core_alua_do_transition_tg_pt calls will race with each other.
      
      Because from the last patches we no longer delay running
      core_alua_do_transition_tg_pt_work, there does not seem to be
      any point in running that in a workqueue. And, we always
      wait for it to complete one way or another, so we can sleep
      in this code path. So, this patch made over target-pending just adds a
      mutex and does the work core_alua_do_transition_tg_pt_work was doing in
      core_alua_do_transition_tg_pt.
      
      There is also no need to use an atomic for the
      tg_pt_gp_alua_access_state. In core_alua_do_transition_tg_pt we will
      test and set it under the transition mutex. And, it is a int/32 bits
      so in the other places where it is read, we will never see it partially
      updated.
      Signed-off-by: NMike Christie <mchristi@redhat.com>
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      d19c4643
    • N
      target: Fix unknown fabric callback queue-full errors · fa7e25cf
      Nicholas Bellinger 提交于
      This patch fixes a set of queue-full response handling
      bugs, where outgoing responses are leaked when a fabric
      driver is propagating non -EAGAIN or -ENOMEM errors
      to target-core.
      
      It introduces TRANSPORT_COMPLETE_QF_ERR state used to
      signal when CHECK_CONDITION status should be generated,
      when fabric driver ->write_pending(), ->queue_data_in(),
      or ->queue_status() callbacks fail with non -EAGAIN or
      -ENOMEM errors, and data-transfer should not be retried.
      
      Note all fabric driver -EAGAIN and -ENOMEM errors are
      still retried indefinately with associated data-transfer
      callbacks, following existing queue-full logic.
      
      Also fix two missing ->queue_status() queue-full cases
      related to CMD_T_ABORTED w/ TAS status handling.
      Reported-by: NPotnuri Bharat Teja <bharat@chelsio.com>
      Reviewed-by: NPotnuri Bharat Teja <bharat@chelsio.com>
      Tested-by: NPotnuri Bharat Teja <bharat@chelsio.com>
      Cc: Potnuri Bharat Teja <bharat@chelsio.com>
      Reported-by: NSteve Wise <swise@opengridcomputing.com>
      Cc: Steve Wise <swise@opengridcomputing.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      fa7e25cf
  2. 30 3月, 2017 1 次提交
    • N
      target: Avoid mappedlun symlink creation during lun shutdown · 49cb77e2
      Nicholas Bellinger 提交于
      This patch closes a race between se_lun deletion during configfs
      unlink in target_fabric_port_unlink() -> core_dev_del_lun()
      -> core_tpg_remove_lun(), when transport_clear_lun_ref() blocks
      waiting for percpu_ref RCU grace period to finish, but a new
      NodeACL mappedlun is added before the RCU grace period has
      completed.
      
      This can happen in target_fabric_mappedlun_link() because it
      only checks for se_lun->lun_se_dev, which is not cleared until
      after transport_clear_lun_ref() percpu_ref RCU grace period
      finishes.
      
      This bug originally manifested as NULL pointer dereference
      OOPsen in target_stat_scsi_att_intr_port_show_attr_dev() on
      v4.1.y code, because it dereferences lun->lun_se_dev without
      a explicit NULL pointer check.
      
      In post v4.1 code with target-core RCU conversion, the code
      in target_stat_scsi_att_intr_port_show_attr_dev() no longer
      uses se_lun->lun_se_dev, but the same race still exists.
      
      To address the bug, go ahead and set se_lun>lun_shutdown as
      early as possible in core_tpg_remove_lun(), and ensure new
      NodeACL mappedlun creation in target_fabric_mappedlun_link()
      fails during se_lun shutdown.
      Reported-by: NJames Shen <jcs@datera.io>
      Cc: James Shen <jcs@datera.io>
      Tested-by: NJames Shen <jcs@datera.io>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      49cb77e2
  3. 19 3月, 2017 1 次提交
  4. 03 3月, 2017 1 次提交
  5. 02 3月, 2017 1 次提交
  6. 27 2月, 2017 2 次提交
    • N
      target: Add counters for ABORT_TASK success + failure · c87ba9c4
      Nicholas Bellinger 提交于
      This patch introduces two counters for ABORT_TASK success +
      failure under:
      
         /sys/kernel/config/target/core/$HBA/$DEV/statistics/scsi_tgt_dev/
      
      that are useful for diagnosing various backend device latency
      and front fabric issues.
      
      Normally when folks see alot of aborts_complete happening,
      it means the backend device I/O completion latency is high,
      and not returning completions fast enough before host side
      timeouts trigger.
      
      And normally when folks see alot of aborts_no_task, it means
      completions are being posted by target-core into fabric driver
      code, but the responses aren't making it back to the host.
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      c87ba9c4
    • N
      target: Fix NULL dereference during LUN lookup + active I/O shutdown · bd4e2d29
      Nicholas Bellinger 提交于
      When transport_clear_lun_ref() is shutting down a se_lun via
      configfs with new I/O in-flight, it's possible to trigger a
      NULL pointer dereference in transport_lookup_cmd_lun() due
      to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD
      checking before incrementing lun->lun_ref.count after
      lun->lun_ref has switched to atomic_t mode.
      
      This results in a NULL pointer dereference as LUN shutdown
      code in core_tpg_remove_lun() continues running after the
      existing ->release() -> core_tpg_lun_ref_release() callback
      completes, and clears the RCU protected se_lun->lun_se_dev
      pointer.
      
      During the OOPs, the state of lun->lun_ref in the process
      which triggered the NULL pointer dereference looks like
      the following on v4.1.y stable code:
      
      struct se_lun {
        lun_link_magic = 4294932337,
        lun_status = TRANSPORT_LUN_STATUS_FREE,
      
        .....
      
        lun_se_dev = 0x0,
        lun_sep = 0x0,
      
        .....
      
        lun_ref = {
          count = {
            counter = 1
          },
          percpu_count_ptr = 3,
          release = 0xffffffffa02fa1e0 <core_tpg_lun_ref_release>,
          confirm_switch = 0x0,
          force_atomic = false,
          rcu = {
            next = 0xffff88154fa1a5d0,
            func = 0xffffffff8137c4c0 <percpu_ref_switch_to_atomic_rcu>
          }
        }
      }
      
      To address this bug, use percpu_ref_tryget_live() to ensure
      once __PERCPU_REF_DEAD is visable on all CPUs and ->lun_ref
      has switched to atomic_t, all new I/Os will fail to obtain
      a new lun->lun_ref reference.
      
      Also use an explicit percpu_ref_kill_and_confirm() callback
      to block on ->lun_ref_comp to allow the first stage and
      associated RCU grace period to complete, and then block on
      ->lun_ref_shutdown waiting for the final percpu_ref_put()
      to drop the last reference via transport_lun_remove_cmd()
      before continuing with core_tpg_remove_lun() shutdown.
      Reported-by: NRob Millner <rlm@daterainc.com>
      Tested-by: NRob Millner <rlm@daterainc.com>
      Cc: Rob Millner <rlm@daterainc.com>
      Tested-by: NVaibhav Tandon <vst@datera.io>
      Cc: Vaibhav Tandon <vst@datera.io>
      Tested-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org> # v3.14+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      bd4e2d29
  7. 09 2月, 2017 5 次提交
  8. 28 1月, 2017 1 次提交
  9. 11 1月, 2017 1 次提交
  10. 16 12月, 2016 1 次提交
  11. 10 12月, 2016 1 次提交
    • B
      target: Minimize #include directives · 8dcf07be
      Bart Van Assche 提交于
      Remove superfluous #include directives from the include/target/*.h
      files. Add missing #include directives to other *.h and *.c files.
      Use forward declarations for structures where possible. This
      change reduces the build time for make M=drivers/target on my
      laptop from 27.1s to 18.7s or by about 30%.
      Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
      8dcf07be
  12. 20 10月, 2016 1 次提交
    • N
      target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE · 449a1378
      Nicholas Bellinger 提交于
      This patch addresses a bug where EXTENDED_COPY across multiple LUNs
      results in a CHECK_CONDITION when the source + destination are not
      located on the same physical node.
      
      ESX Host environments expect sense COPY_ABORTED w/ COPY TARGET DEVICE
      NOT REACHABLE to be returned when this occurs, in order to signal
      fallback to local copy method.
      
      As described in section 6.3.3 of spc4r22:
      
        "If it is not possible to complete processing of a segment because the
         copy manager is unable to establish communications with a copy target
         device, because the copy target device does not respond to INQUIRY,
         or because the data returned in response to INQUIRY indicates
         an unsupported logical unit, then the EXTENDED COPY command shall be
         terminated with CHECK CONDITION status, with the sense key set to
         COPY ABORTED, and the additional sense code set to COPY TARGET DEVICE
         NOT REACHABLE."
      
      Tested on v4.1.y with ESX v5.5u2+ with BlockCopy across multiple nodes.
      Reported-by: NNixon Vincent <nixon.vincent@calsoftinc.com>
      Tested-by: NNixon Vincent <nixon.vincent@calsoftinc.com>
      Cc: Nixon Vincent <nixon.vincent@calsoftinc.com>
      Tested-by: NDinesh Israni <ddi@datera.io>
      Signed-off-by: NDinesh Israni <ddi@datera.io>
      Cc: Dinesh Israni <ddi@datera.io>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      449a1378
  13. 20 7月, 2016 1 次提交
  14. 10 5月, 2016 2 次提交
  15. 11 3月, 2016 1 次提交
  16. 06 3月, 2016 1 次提交
  17. 11 2月, 2016 1 次提交
  18. 07 2月, 2016 1 次提交
  19. 06 2月, 2016 1 次提交
    • N
      target: Fix remote-port TMR ABORT + se_cmd fabric stop · 0f4a9431
      Nicholas Bellinger 提交于
      To address the bug where fabric driver level shutdown
      of se_cmd occurs at the same time when TMR CMD_T_ABORTED
      is happening resulting in a -1 ->cmd_kref, this patch
      adds a CMD_T_FABRIC_STOP bit that is used to determine
      when TMR + driver I_T nexus shutdown is happening
      concurrently.
      
      It changes target_sess_cmd_list_set_waiting() to obtain
      se_cmd->cmd_kref + set CMD_T_FABRIC_STOP, and drop local
      reference in target_wait_for_sess_cmds() and invoke extra
      target_put_sess_cmd() during Task Aborted Status (TAS)
      when necessary.
      
      Also, it adds a new target_wait_free_cmd() wrapper around
      transport_wait_for_tasks() for the special case within
      transport_generic_free_cmd() to set CMD_T_FABRIC_STOP,
      and is now aware of CMD_T_ABORTED + CMD_T_TAS status
      bits to know when an extra transport_put_cmd() during
      TAS is required.
      
      Note transport_generic_free_cmd() is expected to block on
      cmd->cmd_wait_comp in order to follow what iscsi-target
      expects during iscsi_conn context se_cmd shutdown.
      
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NNicholas Bellinger <nab@daterainc.com>
      0f4a9431
  20. 04 2月, 2016 1 次提交
    • N
      target: Fix LUN_RESET active I/O handling for ACK_KREF · febe562c
      Nicholas Bellinger 提交于
      This patch fixes a NULL pointer se_cmd->cmd_kref < 0
      refcount bug during TMR LUN_RESET with active se_cmd
      I/O, that can be triggered during se_cmd descriptor
      shutdown + release via core_tmr_drain_state_list() code.
      
      To address this bug, add common __target_check_io_state()
      helper for ABORT_TASK + LUN_RESET w/ CMD_T_COMPLETE
      checking, and set CMD_T_ABORTED + obtain ->cmd_kref for
      both cases ahead of last target_put_sess_cmd() after
      TFO->aborted_task() -> transport_cmd_finish_abort()
      callback has completed.
      
      It also introduces SCF_ACK_KREF to determine when
      transport_cmd_finish_abort() needs to drop the second
      extra reference, ahead of calling target_put_sess_cmd()
      for the final kref_put(&se_cmd->cmd_kref).
      
      It also updates transport_cmd_check_stop() to avoid
      holding se_cmd->t_state_lock while dropping se_cmd
      device state via target_remove_from_state_list(), now
      that core_tmr_drain_state_list() is holding the
      se_device lock while checking se_cmd state from
      within TMR logic.
      
      Finally, move transport_put_cmd() release of SGL +
      TMR + extended CDB memory into target_free_cmd_mem()
      in order to avoid potential resource leaks in TMR
      ABORT_TASK + LUN_RESET code-paths.  Also update
      target_release_cmd_kref() accordingly.
      Reviewed-by: NQuinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      febe562c
  21. 08 1月, 2016 2 次提交
  22. 07 1月, 2016 1 次提交
  23. 29 11月, 2015 1 次提交
    • N
      target: Fix race for SCF_COMPARE_AND_WRITE_POST checking · 057085e5
      Nicholas Bellinger 提交于
      This patch addresses a race + use after free where the first
      stage of COMPARE_AND_WRITE in compare_and_write_callback()
      is rescheduled after the backend sends the secondary WRITE,
      resulting in second stage compare_and_write_post() callback
      completing in target_complete_ok_work() before the first
      can return.
      
      Because current code depends on checking se_cmd->se_cmd_flags
      after return from se_cmd->transport_complete_callback(),
      this results in first stage having SCF_COMPARE_AND_WRITE_POST
      set, which incorrectly falls through into second stage CAW
      processing code, eventually triggering a NULL pointer
      dereference due to use after free.
      
      To address this bug, pass in a new *post_ret parameter into
      se_cmd->transport_complete_callback(), and depend upon this
      value instead of ->se_cmd_flags to determine when to return
      or fall through into ->queue_status() code for CAW.
      
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: <stable@vger.kernel.org> # v3.12+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      057085e5
  24. 14 10月, 2015 1 次提交
  25. 25 9月, 2015 1 次提交
  26. 31 7月, 2015 4 次提交
  27. 07 7月, 2015 1 次提交
  28. 23 6月, 2015 2 次提交