1. 13 7月, 2017 1 次提交
    • M
      mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Michal Hocko 提交于
      __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
      the page allocator.  This has been true but only for allocations
      requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has been always
      ignored for smaller sizes.  This is a bit unfortunate because there is
      no way to express the same semantic for those requests and they are
      considered too important to fail so they might end up looping in the
      page allocator for ever, similarly to GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misled
      usage of __GFP_REPEAT flag has been removed for !costly requests we can
      give the original flag a better name and more importantly a more useful
      semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
      that the allocator would try really hard but there is no promise of a
      success.  This will work independent of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee vs.  cost options (take GFP_KERNEL as an example)
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most light weight mode which even
         doesn't kick the background reclaim. Should be used carefully because
         it might deplete the memory and the next user might hit the more
         aggressive reclaim
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
         allocation without any attempt to free memory from the current
         context but can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from either atomic contexts or when
         the request is a performance optimization and there is another
         fallback for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail but there is no guarantee of
         that behavior so failures have to be checked properly by callers
         (e.g. OOM killer victim is allowed to fail currently).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already had their semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer) and a user defined fallback
      behavior is more sensible than keep retrying in the page allocator.
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dcda9b04
  2. 09 6月, 2017 1 次提交
    • N
      target: Fix kref->refcount underflow in transport_cmd_finish_abort · 73d4e580
      Nicholas Bellinger 提交于
      This patch fixes a se_cmd->cmd_kref underflow during CMD_T_ABORTED
      when a fabric driver drops it's second reference from below the
      target_core_tmr.c based callers of transport_cmd_finish_abort().
      
      Recently with the conversion of kref to refcount_t, this bug was
      manifesting itself as:
      
      [705519.601034] refcount_t: underflow; use-after-free.
      [705519.604034] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 20116.512 msecs
      [705539.719111] ------------[ cut here ]------------
      [705539.719117] WARNING: CPU: 3 PID: 26510 at lib/refcount.c:184 refcount_sub_and_test+0x33/0x51
      
      Since the original kref atomic_t based kref_put() didn't check for
      underflow and only invoked the final callback when zero was reached,
      this bug did not manifest in practice since all se_cmd memory is
      using preallocated tags.
      
      To address this, go ahead and propigate the existing return from
      transport_put_cmd() up via transport_cmd_finish_abort(), and
      change transport_cmd_finish_abort() + core_tmr_handle_tas_abort()
      callers to only do their local target_put_sess_cmd() if necessary.
      Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Tested-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: stable@vger.kernel.org # 3.14+
      Tested-by: NGary Guo <ghg@datera.io>
      Tested-by: NChu Yuan Lin <cyl@datera.io>
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      73d4e580
  3. 16 5月, 2017 1 次提交
    • N
      target: Re-add check to reject control WRITEs with overflow data · 4ff83daa
      Nicholas Bellinger 提交于
      During v4.3 when the overflow/underflow check was relaxed by
      commit c72c5250:
      
        commit c72c5250
        Author: Roland Dreier <roland@purestorage.com>
        Date:   Wed Jul 22 15:08:18 2015 -0700
      
             target: allow underflow/overflow for PR OUT etc. commands
      
      to allow underflow/overflow for Windows compliance + FCP, a
      consequence was to allow control CDBs to process overflow
      data for iscsi-target with immediate data as well.
      
      As per Roland's original change, continue to allow underflow
      cases for control CDBs to make Windows compliance + FCP happy,
      but until overflow for control CDBs is supported tree-wide,
      explicitly reject all control WRITEs with overflow following
      pre v4.3.y logic.
      Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Roland Dreier <roland@purestorage.com>
      Cc: <stable@vger.kernel.org> # v4.3+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      4ff83daa
  4. 02 5月, 2017 1 次提交
  5. 31 3月, 2017 1 次提交
  6. 19 3月, 2017 1 次提交
  7. 27 2月, 2017 1 次提交
    • N
      target: Fix NULL dereference during LUN lookup + active I/O shutdown · bd4e2d29
      Nicholas Bellinger 提交于
      When transport_clear_lun_ref() is shutting down a se_lun via
      configfs with new I/O in-flight, it's possible to trigger a
      NULL pointer dereference in transport_lookup_cmd_lun() due
      to the fact percpu_ref_get() doesn't do any __PERCPU_REF_DEAD
      checking before incrementing lun->lun_ref.count after
      lun->lun_ref has switched to atomic_t mode.
      
      This results in a NULL pointer dereference as LUN shutdown
      code in core_tpg_remove_lun() continues running after the
      existing ->release() -> core_tpg_lun_ref_release() callback
      completes, and clears the RCU protected se_lun->lun_se_dev
      pointer.
      
      During the OOPs, the state of lun->lun_ref in the process
      which triggered the NULL pointer dereference looks like
      the following on v4.1.y stable code:
      
      struct se_lun {
        lun_link_magic = 4294932337,
        lun_status = TRANSPORT_LUN_STATUS_FREE,
      
        .....
      
        lun_se_dev = 0x0,
        lun_sep = 0x0,
      
        .....
      
        lun_ref = {
          count = {
            counter = 1
          },
          percpu_count_ptr = 3,
          release = 0xffffffffa02fa1e0 <core_tpg_lun_ref_release>,
          confirm_switch = 0x0,
          force_atomic = false,
          rcu = {
            next = 0xffff88154fa1a5d0,
            func = 0xffffffff8137c4c0 <percpu_ref_switch_to_atomic_rcu>
          }
        }
      }
      
      To address this bug, use percpu_ref_tryget_live() to ensure
      once __PERCPU_REF_DEAD is visable on all CPUs and ->lun_ref
      has switched to atomic_t, all new I/Os will fail to obtain
      a new lun->lun_ref reference.
      
      Also use an explicit percpu_ref_kill_and_confirm() callback
      to block on ->lun_ref_comp to allow the first stage and
      associated RCU grace period to complete, and then block on
      ->lun_ref_shutdown waiting for the final percpu_ref_put()
      to drop the last reference via transport_lun_remove_cmd()
      before continuing with core_tpg_remove_lun() shutdown.
      Reported-by: NRob Millner <rlm@daterainc.com>
      Tested-by: NRob Millner <rlm@daterainc.com>
      Cc: Rob Millner <rlm@daterainc.com>
      Tested-by: NVaibhav Tandon <vst@datera.io>
      Cc: Vaibhav Tandon <vst@datera.io>
      Tested-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org> # v3.14+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      bd4e2d29
  8. 21 2月, 2017 1 次提交
  9. 09 2月, 2017 8 次提交
  10. 08 2月, 2017 1 次提交
    • N
      target: Fix early transport_generic_handle_tmr abort scenario · c54eeffb
      Nicholas Bellinger 提交于
      This patch fixes a bug where incoming task management requests
      can be explicitly aborted during an active LUN_RESET, but who's
      struct work_struct are canceled in-flight before execution.
      
      This occurs when core_tmr_drain_tmr_list() invokes cancel_work_sync()
      for the incoming se_tmr_req->task_cmd->work, resulting in cmd->work
      for target_tmr_work() never getting invoked and the aborted TMR
      waiting indefinately within transport_wait_for_tasks().
      
      To address this case, perform a CMD_T_ABORTED check early in
      transport_generic_handle_tmr(), and invoke the normal path via
      transport_cmd_check_stop_to_fabric() to complete any TMR kthreads
      blocked waiting for CMD_T_STOP in transport_wait_for_tasks().
      
      Also, move the TRANSPORT_ISTATE_PROCESSING assignment earlier
      into transport_generic_handle_tmr() so the existing check in
      core_tmr_drain_tmr_list() avoids attempting abort the incoming
      se_tmr_req->task_cmd->work if it has already been queued into
      se_device->tmr_wq.
      Reported-by: NRob Millner <rlm@daterainc.com>
      Tested-by: NRob Millner <rlm@daterainc.com>
      Cc: Rob Millner <rlm@daterainc.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      c54eeffb
  11. 11 1月, 2017 1 次提交
  12. 21 10月, 2016 1 次提交
  13. 20 10月, 2016 3 次提交
    • N
      Revert "target: Fix residual overflow handling in target_complete_cmd_with_length" · 61f36166
      Nicholas Bellinger 提交于
      This reverts commit c1ccbfe0.
      
      Reverting this patch, as it incorrectly assumes the additional length
      for INQUIRY in target_complete_cmd_with_length() is SCSI allocation
      length, which breaks existing user-space code when SCSI allocation
      length is smaller than additional length.
      
        root@scsi-mq:~# sg_inq --len=4 -vvvv /dev/sdb
        found bsg_major=253
        open /dev/sdb with flags=0x800
            inquiry cdb: 12 00 00 00 04 00
              duration=0 ms
            inquiry: pass-through requested 4 bytes (data-in) but got -28 bytes
            inquiry: pass-through can't get negative bytes, say it got none
            inquiry: got too few bytes (0)
        INQUIRY resid (32) should never exceed requested len=4
            inquiry: failed requesting 4 byte response: Malformed response to
                     SCSI command [resid=32]
      
      AFAICT the original change was not to address a specific host issue,
      so go ahead and revert to original logic for now.
      
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Sumit Rai <sumitrai96@gmail.com>
      Cc: stable@vger.kernel.org # 4.8+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      61f36166
    • N
      target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE · 449a1378
      Nicholas Bellinger 提交于
      This patch addresses a bug where EXTENDED_COPY across multiple LUNs
      results in a CHECK_CONDITION when the source + destination are not
      located on the same physical node.
      
      ESX Host environments expect sense COPY_ABORTED w/ COPY TARGET DEVICE
      NOT REACHABLE to be returned when this occurs, in order to signal
      fallback to local copy method.
      
      As described in section 6.3.3 of spc4r22:
      
        "If it is not possible to complete processing of a segment because the
         copy manager is unable to establish communications with a copy target
         device, because the copy target device does not respond to INQUIRY,
         or because the data returned in response to INQUIRY indicates
         an unsupported logical unit, then the EXTENDED COPY command shall be
         terminated with CHECK CONDITION status, with the sense key set to
         COPY ABORTED, and the additional sense code set to COPY TARGET DEVICE
         NOT REACHABLE."
      
      Tested on v4.1.y with ESX v5.5u2+ with BlockCopy across multiple nodes.
      Reported-by: NNixon Vincent <nixon.vincent@calsoftinc.com>
      Tested-by: NNixon Vincent <nixon.vincent@calsoftinc.com>
      Cc: Nixon Vincent <nixon.vincent@calsoftinc.com>
      Tested-by: NDinesh Israni <ddi@datera.io>
      Signed-off-by: NDinesh Israni <ddi@datera.io>
      Cc: Dinesh Israni <ddi@datera.io>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      449a1378
    • N
      target: Re-add missing SCF_ACK_KREF assignment in v4.1.y · 527268df
      Nicholas Bellinger 提交于
      This patch fixes a regression in >= v4.1.y code where the original
      SCF_ACK_KREF assignment in target_get_sess_cmd() was dropped upstream
      in commit 054922bb, but the series for addressing TMR ABORT_TASK +
      LUN_RESET with fabric session reinstatement in commit febe562c still
      depends on this code in transport_cmd_finish_abort().
      
      The regression manifests itself as a se_cmd->cmd_kref +1 leak, where
      ABORT_TASK + LUN_RESET can hang indefinately for a specific I_T session
      for drivers using SCF_ACK_KREF, resulting in hung kthreads.
      
      This patch has been verified with v4.1.y code.
      Reported-by: NVaibhav Tandon <vst@datera.io>
      Tested-by: NVaibhav Tandon <vst@datera.io>
      Cc: Vaibhav Tandon <vst@datera.io>
      Cc: stable@vger.kernel.org # 4.1+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      527268df
  14. 24 7月, 2016 1 次提交
  15. 20 7月, 2016 4 次提交
    • N
      target: Fix ordered task CHECK_CONDITION early exception handling · 410c29df
      Nicholas Bellinger 提交于
      If a Simple command is sent with a failure, target_setup_cmd_from_cdb
      returns with TCM_UNSUPPORTED_SCSI_OPCODE or TCM_INVALID_CDB_FIELD.
      
      So in the cases where target_setup_cmd_from_cdb returns an error, we
      never get far enough to call target_execute_cmd to increment simple_cmds.
      Since simple_cmds isn't incremented, the result of the failure from
      target_setup_cmd_from_cdb causes transport_generic_request_failure to
      decrement simple_cmds, due to call to transport_complete_task_attr.
      
      With this dev->simple_cmds or dev->dev_ordered_sync is now -1, not 0.
      So when a subsequent command with an Ordered Task is sent, it causes
      a hang, since dev->simple_cmds is at -1.
      Tested-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Signed-off-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Tested-by: NMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Signed-off-by: NMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # 4.4+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      410c29df
    • N
      target: Fix ordered task target_setup_cmd_from_cdb exception hang · dff0ca9e
      Nicholas Bellinger 提交于
      If a command with a Simple task attribute is failed due to a Unit
      Attention, then a subsequent command with an Ordered task attribute
      will hang forever.  The reason for this is that the Unit Attention
      status is checked for in target_setup_cmd_from_cdb, before the call
      to target_execute_cmd, which calls target_handle_task_attr, which
      in turn increments dev->simple_cmds.
      
      However, transport_generic_request_failure still calls
      transport_complete_task_attr, which will decrement dev->simple_cmds.
      In this case, simple_cmds is now -1.  So when a command with the
      Ordered task attribute is sent, target_handle_task_attr sees that
      dev->simple_cmds is not 0, so it decides it can't execute the
      command until all the (nonexistent) Simple commands have completed.
      Reported-by: NMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Tested-by: NMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Reported-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Tested-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # 4.4+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      dff0ca9e
    • N
      target: Fix race between iscsi-target connection shutdown + ABORT_TASK · 064cdd2d
      Nicholas Bellinger 提交于
      This patch fixes a race in iscsit_release_commands_from_conn() ->
      iscsit_free_cmd() -> transport_generic_free_cmd() + wait_for_tasks=1,
      where CMD_T_FABRIC_STOP could end up being set after the final
      kref_put() is called from core_tmr_abort_task() context.
      
      This results in transport_generic_free_cmd() blocking indefinately
      on se_cmd->cmd_wait_comp, because the target_release_cmd_kref()
      check for CMD_T_FABRIC_STOP returns false.
      
      To address this bug, make iscsit_release_commands_from_conn()
      do list_splice and set CMD_T_FABRIC_STOP early while holding
      iscsi_conn->cmd_lock.  Also make iscsit_aborted_task() only
      remove iscsi_cmd_t if CMD_T_FABRIC_STOP has not already been
      set.
      
      Finally in target_release_cmd_kref(), only honor fabric_stop
      if CMD_T_ABORTED has been set.
      
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org # 3.14+
      Tested-by: NNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      064cdd2d
    • N
      target: Fix missing complete during ABORT_TASK + CMD_T_FABRIC_STOP · 5e2c956b
      Nicholas Bellinger 提交于
      During transport_generic_free_cmd() with a concurrent TMR
      ABORT_TASK and shutdown CMD_T_FABRIC_STOP bit set, the
      caller will be blocked on se_cmd->cmd_wait_stop completion
      until the final kref_put() -> target_release_cmd_kref()
      has been invoked to call complete().
      
      However, when ABORT_TASK is completed with FUNCTION_COMPLETE
      in core_tmr_abort_task(), the aborted se_cmd will have already
      been removed from se_sess->sess_cmd_list via list_del_init().
      
      This results in target_release_cmd_kref() hitting the
      legacy list_empty() == true check, invoking ->release_cmd()
      but skipping complete() to wakeup se_cmd->cmd_wait_stop
      blocked earlier in transport_generic_free_cmd() code.
      
      To address this bug, it's safe to go ahead and drop the
      original list_empty() check so that fabric_stop invokes
      the complete() as expected, since list_del_init() can
      safely be used on a empty list.
      
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org # 3.14+
      Tested-by: NNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      5e2c956b
  16. 14 5月, 2016 1 次提交
  17. 10 5月, 2016 3 次提交
  18. 22 3月, 2016 1 次提交
  19. 11 3月, 2016 1 次提交
    • N
      target: Avoid DataIN transfers for non-GOOD SAM status · 4347ab5a
      Nicholas Bellinger 提交于
      This patch modifies existing transport_complete_*() code
      to avoid invoking target_core_fabric_ops->queue_data_in()
      driver callbacks for I/O READs with non-GOOD SAM status.
      
      Some initiators expect GOOD status when a DATA-IN payload
      transfer is involved, so to be safe go ahead and always
      invoke target_core_fabric_ops->queue_status() to generate
      fabric responses instead.
      
      Note this is a prerequisite for IBLOCK supporting retriable
      status, so SAM_STAT_BUSY + SAM_STAT_TASK_SET_FULL always
      generates fabric driver responses instead of initiating
      DataIN payload transfer when non-GOOD status is present
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      4347ab5a
  20. 28 2月, 2016 1 次提交
  21. 11 2月, 2016 1 次提交
  22. 07 2月, 2016 2 次提交
    • N
      target: Drop legacy se_cmd->task_stop_comp + REQUEST_STOP usage · 57dae190
      Nicholas Bellinger 提交于
      With CMD_T_FABRIC_STOP + se_cmd->cmd_wait_set usage in place,
      go ahead and drop left-over CMD_T_REQUEST_STOP checks in
      target_complete_cmd() and unused target_stop_cmd().
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      57dae190
    • N
      target: Fix race with SCF_SEND_DELAYED_TAS handling · 310d3d31
      Nicholas Bellinger 提交于
      This patch fixes a race between setting of SCF_SEND_DELAYED_TAS
      in transport_send_task_abort(), and check of the same bit in
      transport_check_aborted_status().
      
      It adds a __transport_check_aborted_status() version that is
      used by target_execute_cmd() when se_cmd->t_state_lock is
      held, and a transport_check_aborted_status() wrapper for
      all other existing callers.
      
      Also, it handles the case where the check happens before
      transport_send_task_abort() gets called.  For this, go
      ahead and set SCF_SEND_DELAYED_TAS early when necessary,
      and have transport_send_task_abort() send the abort.
      
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      310d3d31
  23. 06 2月, 2016 1 次提交
    • N
      target: Fix remote-port TMR ABORT + se_cmd fabric stop · 0f4a9431
      Nicholas Bellinger 提交于
      To address the bug where fabric driver level shutdown
      of se_cmd occurs at the same time when TMR CMD_T_ABORTED
      is happening resulting in a -1 ->cmd_kref, this patch
      adds a CMD_T_FABRIC_STOP bit that is used to determine
      when TMR + driver I_T nexus shutdown is happening
      concurrently.
      
      It changes target_sess_cmd_list_set_waiting() to obtain
      se_cmd->cmd_kref + set CMD_T_FABRIC_STOP, and drop local
      reference in target_wait_for_sess_cmds() and invoke extra
      target_put_sess_cmd() during Task Aborted Status (TAS)
      when necessary.
      
      Also, it adds a new target_wait_free_cmd() wrapper around
      transport_wait_for_tasks() for the special case within
      transport_generic_free_cmd() to set CMD_T_FABRIC_STOP,
      and is now aware of CMD_T_ABORTED + CMD_T_TAS status
      bits to know when an extra transport_put_cmd() during
      TAS is required.
      
      Note transport_generic_free_cmd() is expected to block on
      cmd->cmd_wait_comp in order to follow what iscsi-target
      expects during iscsi_conn context se_cmd shutdown.
      
      Cc: Quinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NNicholas Bellinger <nab@daterainc.com>
      0f4a9431
  24. 04 2月, 2016 2 次提交
    • N
      target: Fix LUN_RESET active TMR descriptor handling · a6d9bb1c
      Nicholas Bellinger 提交于
      This patch fixes a NULL pointer se_cmd->cmd_kref < 0
      refcount bug during TMR LUN_RESET with active TMRs,
      triggered during se_cmd + se_tmr_req descriptor
      shutdown + release via core_tmr_drain_tmr_list().
      
      To address this bug, go ahead and obtain a local
      kref_get_unless_zero(&se_cmd->cmd_kref) for active I/O
      to set CMD_T_ABORTED, and transport_wait_for_tasks()
      followed by the final target_put_sess_cmd() to drop
      the local ->cmd_kref.
      
      Also add two new checks within target_tmr_work() to
      avoid CMD_T_ABORTED -> TFO->queue_tm_rsp() callbacks
      ahead of invoking the backend -> fabric put in
      transport_cmd_check_stop_to_fabric().
      
      For good measure, also change core_tmr_release_req()
      to use list_del_init() ahead of se_tmr_req memory
      free.
      Reviewed-by: NQuinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      a6d9bb1c
    • N
      target: Fix LUN_RESET active I/O handling for ACK_KREF · febe562c
      Nicholas Bellinger 提交于
      This patch fixes a NULL pointer se_cmd->cmd_kref < 0
      refcount bug during TMR LUN_RESET with active se_cmd
      I/O, that can be triggered during se_cmd descriptor
      shutdown + release via core_tmr_drain_state_list() code.
      
      To address this bug, add common __target_check_io_state()
      helper for ABORT_TASK + LUN_RESET w/ CMD_T_COMPLETE
      checking, and set CMD_T_ABORTED + obtain ->cmd_kref for
      both cases ahead of last target_put_sess_cmd() after
      TFO->aborted_task() -> transport_cmd_finish_abort()
      callback has completed.
      
      It also introduces SCF_ACK_KREF to determine when
      transport_cmd_finish_abort() needs to drop the second
      extra reference, ahead of calling target_put_sess_cmd()
      for the final kref_put(&se_cmd->cmd_kref).
      
      It also updates transport_cmd_check_stop() to avoid
      holding se_cmd->t_state_lock while dropping se_cmd
      device state via target_remove_from_state_list(), now
      that core_tmr_drain_state_list() is holding the
      se_device lock while checking se_cmd state from
      within TMR logic.
      
      Finally, move transport_put_cmd() release of SGL +
      TMR + extended CDB memory into target_free_cmd_mem()
      in order to avoid potential resource leaks in TMR
      ABORT_TASK + LUN_RESET code-paths.  Also update
      target_release_cmd_kref() accordingly.
      Reviewed-by: NQuinn Tran <quinn.tran@qlogic.com>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Sagi Grimberg <sagig@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Andy Grover <agrover@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      febe562c