1. 14 3月, 2009 6 次提交
  2. 11 2月, 2009 1 次提交
    • M
      [SCSI] libiscsi: Fix scsi command timeout oops in iscsi_eh_timed_out · 308cec14
      Mike Christie 提交于
      Yanling Qi from LSI found the root cause of the panic, below is his
      analysis:
      
      Problem description: the open iscsi driver installs eh_timed_out handler
      to the
      blank_transport_template of the scsi middle level that causes panic of
      timed
      out command of other host
      
      Here are the details
      
      Iscsi Session creation
      
      During iscsi session creation time, the iscsi_tcp_session_create() of
      iscsi_tpc.c will create a scsi-host for the session. See the statement
      marked
      with the label A. The statement B replaces the shost->transportt point
      with a
      local struct variable.
      
      static struct iscsi_cls_session *
      iscsi_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
                               uint16_t qdepth, uint32_t initial_cmdsn,
                               uint32_t *hostno)
      {
              struct iscsi_cls_session *cls_session;
              struct iscsi_session *session;
              struct Scsi_Host *shost;
              int cmd_i;
              if (ep) {
                      printk(KERN_ERR "iscsi_tcp: invalid ep %p.\n", ep);
                      return NULL;
              }
      
      A        shost = iscsi_host_alloc(&iscsi_sht, 0, qdepth);
      
              if (!shost)
      
                      return NULL;
      
      B         shost->transportt = iscsi_tcp_scsi_transport;
      
              shost->max_lun = iscsi_max_lun;
      
      Please note the scsi host is allocated by invoking isccsi_host_alloc()
      in
      libiscsi.c
      
      Polluting the middle level blank_transport_template in
      iscsi_host_alloc() of
      libiscsi.c
      
      The iscsi_host_alloc() invokes the middle level function
      scsi_host_alloc() in
      hosts.c for allocating a scsi_host. Then the statement marked with C
      assigns
      the iscsi_eh_cmd_timed_out handler to the eh_timed_out callback
      function.
      
      struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
      
                                         int dd_data_size, uint16_t qdepth)
      
      {
              struct Scsi_Host *shost;
              struct iscsi_host *ihost;
              shost = scsi_host_alloc(sht, sizeof(struct iscsi_host) +
      dd_data_size);
              if (!shost)
                      return NULL;
      
       C      shost->transportt->eh_timed_out = iscsi_eh_cmd_timed_out;
      
      Please note the shost->transport is the middle level
      blank_transport_template
      as shown in the code segment below. We see two problems here. 1.
      iscsi_eh_cmd_timed_out is installed to the blank_transport_template that
      will
      cause some body else problem. 2. iscsi_eh_cmd_timed_out will never be
      invoked
      when iscsi command gets timeout because the statement B resets the
      pointer.
      
      Middle level blank_transport_template
      
      In the middle level function scsi_host_alloc() of hosts.c, the middle
      level
      assigns a blank_transport_template for those hosts not implementing its
      transport layer. All HBAs without supporting a specific scsi_transport
      will
      share the middle level blank_transport_template. Please see the
      statement D
      
      struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int
      privsize)
      
      {
              struct Scsi_Host *shost;
              gfp_t gfp_mask = GFP_KERNEL;
              int rval;
              if (sht->unchecked_isa_dma && privsize)
                      gfp_mask |= __GFP_DMA;
      
               shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask);
              if (!shost)
                      return NULL;
      
              shost->host_lock = &shost->default_lock;
      
              spin_lock_init(shost->host_lock);
      
              shost->shost_state = SHOST_CREATED;
      
              INIT_LIST_HEAD(&shost->__devices);
      
              INIT_LIST_HEAD(&shost->__targets);
      
              INIT_LIST_HEAD(&shost->eh_cmd_q);
      
              INIT_LIST_HEAD(&shost->starved_list);
      
              init_waitqueue_head(&shost->host_wait);
      
              mutex_init(&shost->scan_mutex);
      
              shost->host_no = scsi_host_next_hn++; /* XXX(hch): still racy */
      
              shost->dma_channel = 0xff;
      
              /* These three are default values which can be overridden */
      
              shost->max_channel = 0;
      
              shost->max_id = 8;
      
              shost->max_lun = 8;
      
              /* Give each shost a default transportt */
      
       D       shost->transportt = &blank_transport_template;
      
      Why we see panic at iscsi_eh_cmd_timed_out()
      
      The mpp virtual HBA doesn’t have a specific scsi_transport. Therefore,
      the
      blank_transport_template will be assigned to the virtual host of the MPP
      virtual HBA by SCSI middle level. Please note that the statement C has
      assigned
      iscsi-transport eh_timedout handler to the blank_transport_template.
      When a mpp
      virtual command gets timedout, the iscsi_eh_cmd_timed_out() will be
      invoked to
      handle mpp virtual command timeout from the middle level
      scsi_times_out()
      function of the scsi_error.c.
      
      enum blk_eh_timer_return scsi_times_out(struct request *req)
      
      {
      
              struct scsi_cmnd *scmd = req->special;
      
              enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
      
              enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
      
              scsi_log_completion(scmd, TIMEOUT_ERROR);
      
              if (scmd->device->host->transportt->eh_timed_out)
      
       E               eh_timed_out =
      scmd->device->host->transportt->eh_timed_out;
      
              else if (scmd->device->host->hostt->eh_timed_out)
      
                      eh_timed_out = scmd->device->host->hostt->eh_timed_out;
      
              else
      
                      eh_timed_out = NULL;
      
              if (eh_timed_out) {
      
                      rtn = eh_timed_out(scmd);
      
      It is very easy to understand why we get panic in the
      iscsi_eh_cmd_timed_out().
      A scsi_cmnd from a no-iscsi device definitely can not resolve out a
      session and
      session->lock. The panic can be happed anywhere during the differencing.
      
      static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd
      *scmd)
      
      {
      
              struct iscsi_cls_session *cls_session;
      
              struct iscsi_session *session;
      
              struct iscsi_conn *conn;
      
              enum blk_eh_timer_return rc = BLK_EH_NOT_HANDLED;
      
              cls_session = starget_to_session(scsi_target(scmd->device));
      
              session = cls_session->dd_data;
      
              debug_scsi("scsi cmd %p timedout\n", scmd);
      
              spin_lock(&session->lock);
      
      This patch fixes the problem by moving the setting of the
      iscsi_eh_cmd_timed_out to iscsi_add_host, which is after the LLDs
      have set their transport template to shost->transportt.
      Signed-off-by: NMike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
      308cec14
  3. 25 1月, 2009 1 次提交
  4. 30 12月, 2008 5 次提交
  5. 04 12月, 2008 1 次提交
  6. 13 10月, 2008 7 次提交
  7. 09 10月, 2008 1 次提交
  8. 04 10月, 2008 1 次提交
  9. 12 7月, 2008 15 次提交
  10. 09 5月, 2008 1 次提交
  11. 08 5月, 2008 1 次提交
    • M
      [SCSI] libiscsi regression in 2.6.25: fix nop timer handling · 4cf10435
      Mike Christie 提交于
      The following patch fixes a bug in the iscsi nop processing.
      The target sends iscsi nops to ping the initiator and the
      initiator has to send nops to reply and can send nops to
      ping the target.
      
      In 2.6.25 we moved the nop processing to the kernel to handle
      problems when the userspace daemon is not up, but the target
      is pinging us, and to handle when scsi commands timeout, but
      the transport may be the cause (we can send a nop to check
      the transport). When we added this code we added a bug where
      if the transport timer wakes at the exact same time we are supposed to check
      for a nop timeout we drop the session instead of checking the transport.
      
      This patch checks if a iscsi ping is outstanding and if the ping has
      timed out, to determine if we need to signal a connection problem.
      Signed-off-by: NMike Christie <michaelc@cs.wisc.edu>
      Cc: Stable Tree <stable@kernel.org>
      Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
      4cf10435