1. 18 Feb, 2010 1 commit
    • [SCSI] libiscsi: reset cmd timer if cmds are making progress · 92ed4d69
      Committed by Mike Christie
      This patch resets the cmd timer if cmds started before
      the timed-out command are making progress. The idea is
      that the cmd probably timed out because we are trying
      to execute too many commands. If it turns out that the
      device the IO timed out on was bad, or the cmd just got
      screwed up while other IO/devs were ok, then we will
      figure this out when the cmds ahead of the timed-out
      one complete ok.
      
      This also fixes a bug where we were sort of detecting
      this by setting the last_timeout and last_xfer to the
      same value when the task was allocated. That caught
      the case where we never got to send any IO for it. However,
      if the problem had started right before we started the
      new task, then we were forced to wait an extra cmd
      timeout seconds to start the scsi eh.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
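The decision described in the commit message above can be sketched in userspace C. This is a minimal model under stated assumptions, not the libiscsi implementation: `struct sketch_task`, its fields, and `sketch_timed_out()` are hypothetical names, times are plain tick counters, and counter wraparound (which the kernel handles with `time_after()`) is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an iSCSI task; times are ticks. */
struct sketch_task {
    bool running;               /* command is currently in flight */
    unsigned long started_at;   /* when the command was allocated */
    unsigned long last_xfer;    /* last time data moved for this task */
    unsigned long last_timeout; /* last time this task's timer fired */
};

enum sketch_verdict { SKETCH_RESET_TIMER, SKETCH_START_EH };

/*
 * Called when the timer of `timed_out` fires at time `now`.
 * If any command started at or before it has transferred data since the
 * last time this command timed out, assume the queue is merely busy and
 * ask for the timer to be reset. Otherwise hand over to error handling.
 */
static enum sketch_verdict
sketch_timed_out(struct sketch_task *tasks, size_t n,
                 struct sketch_task *timed_out, unsigned long now)
{
    enum sketch_verdict verdict = SKETCH_START_EH;

    assert(timed_out != NULL);

    for (size_t i = 0; i < n; i++) {
        struct sketch_task *t = &tasks[i];

        if (t == timed_out || !t->running)
            continue;
        /* Only older commands count; newer IO must not mask a stuck cmd. */
        if (t->started_at > timed_out->started_at)
            continue;
        if (t->last_xfer > timed_out->last_timeout) {
            verdict = SKETCH_RESET_TIMER;
            break;
        }
    }

    /* Remember this timeout so the next check measures fresh progress. */
    timed_out->last_timeout = now;
    return verdict;
}
```

Restricting the scan to commands started no later than the timed-out one is what keeps the check bounded: once the older commands finish or stall, a genuinely stuck command can no longer have its timer reset.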
  2. 23 Dec, 2009 4 commits
  3. 05 Dec, 2009 5 commits
  4. 03 Oct, 2009 1 commit
  5. 12 Sep, 2009 2 commits
  6. 05 Sep, 2009 3 commits
  7. 30 Jul, 2009 1 commit
  8. 21 Jun, 2009 2 commits
  9. 24 May, 2009 11 commits
  10. 27 Apr, 2009 1 commit
  11. 03 Apr, 2009 1 commit
    • [SCSI] libiscsi: fix iscsi pool error path · fd6e1c14
      Committed by Jean Delvare
      On Monday 30 March 2009, Chris Wright wrote:
      > q->queue could be ERR_PTR(-ENOMEM) which will break unwinding
      > on error.  Make iscsi_pool_free more defensive.
      >
      
      Making the freeing of q->queue dependent on q->pool being set looks
      really weird (although it is correct at the moment). But this seems
      to be fixable in a much simpler way.
      
      With the benefit that only the error case is slowed down. In both
      cases we have a problem if q->queue contains an error value but it's
      not -ENOMEM. Apparently this can't happen today, but it doesn't feel
      right to assume this will always be true. Maybe it's the right time
      to fix this as well.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
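One way to read the "much simpler" approach discussed above is to normalise an error-encoded queue pointer at the point of failure, so that the free routine only ever sees NULL or a valid pointer. The following is a userspace sketch under that assumption and is not the patch itself: the `sketch_*` helpers imitate the kernel's `ERR_PTR()`/`IS_ERR()` convention, and `sketch_queue_alloc()` is a hypothetical allocator standing in for the real queue setup.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace imitations of the kernel's ERR_PTR()/IS_ERR() convention. */
#define SKETCH_MAX_ERRNO 4095

static void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_pool {
    void *queue; /* invariant after init: NULL or a valid allocation */
    void *pool;
};

/* Hypothetical allocator that reports failure as an encoded pointer. */
static void *sketch_queue_alloc(int fail)
{
    if (fail)
        return sketch_err_ptr(-ENOMEM);
    return malloc(64);
}

static void sketch_pool_free(struct sketch_pool *q)
{
    /* The free path must never be handed an encoded error value. */
    assert(!sketch_is_err(q->queue));
    free(q->queue); /* free(NULL) is a no-op */
    free(q->pool);
    q->queue = NULL;
    q->pool = NULL;
}

static int sketch_pool_init(struct sketch_pool *q, int fail_queue)
{
    q->queue = NULL;
    q->pool = malloc(64);
    if (!q->pool)
        return -ENOMEM;

    q->queue = sketch_queue_alloc(fail_queue);
    if (sketch_is_err(q->queue)) {
        /* Normalise any error value, not only -ENOMEM, before unwinding. */
        q->queue = NULL;
        sketch_pool_free(q);
        return -ENOMEM;
    }
    return 0;
}
```

Testing with an `IS_ERR`-style range check instead of comparing against one specific error code addresses the concern raised in the message about error values other than -ENOMEM, and the extra work happens only on the failure path.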
  12. 14 Mar, 2009 7 commits
  13. 11 Feb, 2009 1 commit
    • [SCSI] libiscsi: Fix scsi command timeout oops in iscsi_eh_timed_out · 308cec14
      Committed by Mike Christie
      Yanling Qi from LSI found the root cause of the panic. Below is his
      analysis:

      Problem description: the open-iscsi driver installs its eh_timed_out
      handler into the blank_transport_template of the SCSI middle level,
      which causes a panic when a command on another host times out.

      Here are the details.

      iSCSI session creation

      At iSCSI session creation time, iscsi_tcp_session_create() in
      iscsi_tcp.c creates a SCSI host for the session. See the statement
      marked with the label A. Statement B replaces the shost->transportt
      pointer with the driver's own transport template.
      
      static struct iscsi_cls_session *
      iscsi_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
                               uint16_t qdepth, uint32_t initial_cmdsn,
                               uint32_t *hostno)
      {
              struct iscsi_cls_session *cls_session;
              struct iscsi_session *session;
              struct Scsi_Host *shost;
              int cmd_i;

              if (ep) {
                      printk(KERN_ERR "iscsi_tcp: invalid ep %p.\n", ep);
                      return NULL;
              }

      A       shost = iscsi_host_alloc(&iscsi_sht, 0, qdepth);
              if (!shost)
                      return NULL;
      B       shost->transportt = iscsi_tcp_scsi_transport;
              shost->max_lun = iscsi_max_lun;
      
      Please note that the SCSI host is allocated by invoking
      iscsi_host_alloc() in libiscsi.c.

      Polluting the middle level blank_transport_template in
      iscsi_host_alloc() of libiscsi.c

      iscsi_host_alloc() invokes the middle level function
      scsi_host_alloc() in hosts.c to allocate a scsi_host. Then the
      statement marked with C assigns the iscsi_eh_cmd_timed_out handler
      to the eh_timed_out callback.
      
      struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
                                         int dd_data_size, uint16_t qdepth)
      {
              struct Scsi_Host *shost;
              struct iscsi_host *ihost;

              shost = scsi_host_alloc(sht, sizeof(struct iscsi_host) + dd_data_size);
              if (!shost)
                      return NULL;
      C       shost->transportt->eh_timed_out = iscsi_eh_cmd_timed_out;
      
      Please note that shost->transportt is the middle level
      blank_transport_template, as shown in the code segment below. We see
      two problems here:

      1. iscsi_eh_cmd_timed_out is installed into the
         blank_transport_template, which causes problems for every other
         host that shares it.
      2. iscsi_eh_cmd_timed_out is never invoked when an iSCSI command
         times out, because statement B resets the pointer.

      Middle level blank_transport_template

      In scsi_host_alloc() in hosts.c, the middle level assigns the
      blank_transport_template to hosts that do not implement their own
      transport layer. All HBAs that do not supply a specific
      scsi_transport share the middle level blank_transport_template.
      Please see statement D.
      
      struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
      {
              struct Scsi_Host *shost;
              gfp_t gfp_mask = GFP_KERNEL;
              int rval;

              if (sht->unchecked_isa_dma && privsize)
                      gfp_mask |= __GFP_DMA;

              shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask);
              if (!shost)
                      return NULL;

              shost->host_lock = &shost->default_lock;
              spin_lock_init(shost->host_lock);
              shost->shost_state = SHOST_CREATED;
              INIT_LIST_HEAD(&shost->__devices);
              INIT_LIST_HEAD(&shost->__targets);
              INIT_LIST_HEAD(&shost->eh_cmd_q);
              INIT_LIST_HEAD(&shost->starved_list);
              init_waitqueue_head(&shost->host_wait);
              mutex_init(&shost->scan_mutex);
              shost->host_no = scsi_host_next_hn++; /* XXX(hch): still racy */
              shost->dma_channel = 0xff;

              /* These three are default values which can be overridden */
              shost->max_channel = 0;
              shost->max_id = 8;
              shost->max_lun = 8;

              /* Give each shost a default transportt */
      D       shost->transportt = &blank_transport_template;
      
      Why we see a panic in iscsi_eh_cmd_timed_out()

      The mpp virtual HBA doesn't have a specific scsi_transport. Therefore,
      the SCSI middle level assigns the blank_transport_template to the
      virtual host of the MPP virtual HBA. Note that statement C has
      assigned the iSCSI transport's eh_timed_out handler to the
      blank_transport_template. When an mpp virtual command times out,
      scsi_times_out() in scsi_error.c (statement E) therefore invokes
      iscsi_eh_cmd_timed_out() to handle it.
      
      enum blk_eh_timer_return scsi_times_out(struct request *req)
      {
              struct scsi_cmnd *scmd = req->special;
              enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
              enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;

              scsi_log_completion(scmd, TIMEOUT_ERROR);

              if (scmd->device->host->transportt->eh_timed_out)
      E               eh_timed_out = scmd->device->host->transportt->eh_timed_out;
              else if (scmd->device->host->hostt->eh_timed_out)
                      eh_timed_out = scmd->device->host->hostt->eh_timed_out;
              else
                      eh_timed_out = NULL;

              if (eh_timed_out) {
                      rtn = eh_timed_out(scmd);
      
      It is easy to see why we panic in iscsi_eh_cmd_timed_out(): a
      scsi_cmnd from a non-iSCSI device cannot be resolved to a valid
      session and session->lock, so the panic can happen anywhere during
      the dereferencing.
      
      static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *scmd)
      {
              struct iscsi_cls_session *cls_session;
              struct iscsi_session *session;
              struct iscsi_conn *conn;
              enum blk_eh_timer_return rc = BLK_EH_NOT_HANDLED;

              cls_session = starget_to_session(scsi_target(scmd->device));
              session = cls_session->dd_data;

              debug_scsi("scsi cmd %p timedout\n", scmd);

              spin_lock(&session->lock);
      
      This patch fixes the problem by moving the setting of the
      iscsi_eh_cmd_timed_out to iscsi_add_host, which is after the LLDs
      have set their transport template to shost->transportt.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
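The ordering problem analysed above, and the effect of moving the handler assignment, can be reproduced in a small userspace model. This is an illustrative sketch only: every `sketch_*` name is hypothetical, the structs keep just the one field that matters, and the letters in the comments refer to the statements labelled in the commit message.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_timeout_fn)(void);

struct sketch_transport { sketch_timeout_fn eh_timed_out; };
struct sketch_host { struct sketch_transport *transportt; };

/* One default template shared by every host without its own transport. */
static struct sketch_transport sketch_blank_template;
/* The template owned by the (modelled) iSCSI driver. */
static struct sketch_transport sketch_iscsi_template;

static int sketch_iscsi_timed_out(void)
{
    return 1;
}

/* Models the mid-layer allocation: every host starts on the shared template. */
static void sketch_host_alloc(struct sketch_host *h)
{
    h->transportt = &sketch_blank_template; /* D */
}

/* Buggy ordering: the handler is installed before the template is swapped. */
static void sketch_iscsi_setup_buggy(struct sketch_host *h)
{
    sketch_host_alloc(h);
    h->transportt->eh_timed_out = sketch_iscsi_timed_out; /* C: hits shared template */
    h->transportt = &sketch_iscsi_template;               /* B: handler left behind */
}

/* Fixed ordering: swap the template first, then install the handler. */
static void sketch_iscsi_setup_fixed(struct sketch_host *h)
{
    sketch_host_alloc(h);
    h->transportt = &sketch_iscsi_template;               /* B */
    assert(h->transportt != &sketch_blank_template);
    h->transportt->eh_timed_out = sketch_iscsi_timed_out; /* C, moved after B */
}
```

With the buggy ordering, any unrelated host allocated through `sketch_host_alloc()` inherits the iSCSI handler while the iSCSI host itself ends up with none; with the fixed ordering the shared template stays untouched.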