1. 11 Feb, 2009 10 commits
    • A
      [SCSI] qla2xxx: Remove interrupt request bit check in the response processing path in multiq mode. · 618a7523
      Anirban Chakraborty committed
      Correct response-queue-0 processing by instructing the firmware
      to run with interrupt-handshaking disabled, similarly to what is
      now done for all non-0 response queues.  Since all
      response-queues now run in the same mode, the driver no longer
      needs the hot-path 'is-disabled-HCCR' test.
      Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
      Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • J
      [SCSI] lpfc: introduce missing kfree · e916141c
      Julia Lawall committed
      Error handling code following a kmalloc should free the allocated data.
      
      The semantic match that finds the problem is as follows:
      (http://www.emn.fr/x-info/coccinelle/)
      
      // <smpl>
      @r exists@
      local idexpression x;
      statement S;
      expression E;
      identifier f,l;
      position p1,p2;
      expression *ptr != NULL;
      @@
      
      (
      if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
      |
      x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
      ...
      if (x == NULL) S
      )
      <... when != x
           when != if (...) { <+...x...+> }
      x->f = E
      ...>
      (
       return \(0\|<+...x...+>\|ptr\);
      |
       return@p2 ...;
      )
      
      @script:python@
      p1 << r.p1;
      p2 << r.p2;
      @@
      
      print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Acked-by: James Smart <james.smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
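The rule this semantic match enforces (an error return after a successful allocation must release that allocation first) can be illustrated in userspace C. This is a minimal sketch and not the lpfc change itself: `struct report`, `report_create` and `report_destroy` are hypothetical names, and `malloc`/`free` stand in for `kmalloc`/`kfree`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object, used only to illustrate the pattern. */
struct report {
    char *name;
    size_t len;
};

/* Returns a new report, or NULL on failure. No failure path leaks memory. */
struct report *report_create(const char *name)
{
    struct report *r = malloc(sizeof(*r));
    if (r == NULL)
        return NULL;            /* nothing allocated yet, nothing to free */

    if (name == NULL || name[0] == '\0') {
        free(r);                /* the free that is easy to forget */
        return NULL;
    }

    r->len = strlen(name);
    r->name = malloc(r->len + 1);
    if (r->name == NULL) {
        free(r);                /* release the first allocation as well */
        return NULL;
    }
    memcpy(r->name, name, r->len + 1);
    return r;
}

void report_destroy(struct report *r)
{
    if (r != NULL) {
        free(r->name);
        free(r);
    }
}
```

Every `return NULL` after the first successful `malloc` is preceded by a `free`, which is the property the semantic match checks for.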
    • M
      [SCSI] libiscsi: Fix scsi command timeout oops in iscsi_eh_timed_out · 308cec14
      Mike Christie committed
      Yanling Qi from LSI found the root cause of the panic. Below is his
      analysis:

      Problem description: the open-iscsi driver installs its eh_timed_out
      handler into the blank_transport_template of the SCSI middle level, which
      causes a panic when a command from another host times out.
      
      Here are the details
      
      iSCSI session creation

      During iSCSI session creation, iscsi_tcp_session_create() in iscsi_tcp.c
      creates a scsi-host for the session. See the statement marked with the
      label A. Statement B replaces the shost->transportt pointer with a
      local struct variable.
      
      static struct iscsi_cls_session *
      iscsi_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
                               uint16_t qdepth, uint32_t initial_cmdsn,
                               uint32_t *hostno)
      {
              struct iscsi_cls_session *cls_session;
              struct iscsi_session *session;
              struct Scsi_Host *shost;
              int cmd_i;
              if (ep) {
                      printk(KERN_ERR "iscsi_tcp: invalid ep %p.\n", ep);
                      return NULL;
              }
      
      A       shost = iscsi_host_alloc(&iscsi_sht, 0, qdepth);
              if (!shost)
                      return NULL;

      B       shost->transportt = iscsi_tcp_scsi_transport;
              shost->max_lun = iscsi_max_lun;
      
      Please note that the scsi host is allocated by invoking
      iscsi_host_alloc() in libiscsi.c.
      
      Polluting the middle level blank_transport_template in
      iscsi_host_alloc() of libiscsi.c
      
      iscsi_host_alloc() invokes the middle level function scsi_host_alloc()
      in hosts.c to allocate a scsi_host. Then the statement marked with C
      assigns the iscsi_eh_cmd_timed_out handler to the eh_timed_out callback
      function.
      
      struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
                                         int dd_data_size, uint16_t qdepth)
      {
              struct Scsi_Host *shost;
              struct iscsi_host *ihost;

              shost = scsi_host_alloc(sht, sizeof(struct iscsi_host) + dd_data_size);
              if (!shost)
                      return NULL;

      C       shost->transportt->eh_timed_out = iscsi_eh_cmd_timed_out;
      
      Please note that shost->transportt is the middle level
      blank_transport_template, as shown in the code segment below. We see two
      problems here:
      1. iscsi_eh_cmd_timed_out is installed into the blank_transport_template,
         which causes problems for every other host that shares it.
      2. iscsi_eh_cmd_timed_out will never be invoked when an iSCSI command
         times out, because statement B resets the pointer.
      
      Middle level blank_transport_template
      
      In the middle level function scsi_host_alloc() of hosts.c, the middle
      level assigns blank_transport_template to hosts that do not implement
      their own transport layer. All HBAs that do not support a specific
      scsi_transport share the middle level blank_transport_template. Please
      see statement D.
      
      struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
      {
              struct Scsi_Host *shost;
              gfp_t gfp_mask = GFP_KERNEL;
              int rval;

              if (sht->unchecked_isa_dma && privsize)
                      gfp_mask |= __GFP_DMA;

              shost = kzalloc(sizeof(struct Scsi_Host) + privsize, gfp_mask);
              if (!shost)
                      return NULL;

              shost->host_lock = &shost->default_lock;
              spin_lock_init(shost->host_lock);
              shost->shost_state = SHOST_CREATED;
              INIT_LIST_HEAD(&shost->__devices);
              INIT_LIST_HEAD(&shost->__targets);
              INIT_LIST_HEAD(&shost->eh_cmd_q);
              INIT_LIST_HEAD(&shost->starved_list);
              init_waitqueue_head(&shost->host_wait);
              mutex_init(&shost->scan_mutex);
              shost->host_no = scsi_host_next_hn++; /* XXX(hch): still racy */
              shost->dma_channel = 0xff;

              /* These three are default values which can be overridden */
              shost->max_channel = 0;
              shost->max_id = 8;
              shost->max_lun = 8;

              /* Give each shost a default transportt */
      D       shost->transportt = &blank_transport_template;
      
      Why we see panic at iscsi_eh_cmd_timed_out()
      
      The mpp virtual HBA doesn’t have a specific scsi_transport. Therefore,
      the SCSI middle level assigns blank_transport_template to the virtual
      host of the MPP virtual HBA. Please note that statement C has assigned
      the iscsi-transport eh_timed_out handler to blank_transport_template.
      When an mpp virtual command times out, iscsi_eh_cmd_timed_out() is
      invoked to handle the timeout from the middle level scsi_times_out()
      function in scsi_error.c.
      
      enum blk_eh_timer_return scsi_times_out(struct request *req)
      {
              struct scsi_cmnd *scmd = req->special;
              enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
              enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;

              scsi_log_completion(scmd, TIMEOUT_ERROR);

              if (scmd->device->host->transportt->eh_timed_out)
      E               eh_timed_out = scmd->device->host->transportt->eh_timed_out;
              else if (scmd->device->host->hostt->eh_timed_out)
                      eh_timed_out = scmd->device->host->hostt->eh_timed_out;
              else
                      eh_timed_out = NULL;

              if (eh_timed_out) {
                      rtn = eh_timed_out(scmd);
      
      It is very easy to understand why we get a panic in
      iscsi_eh_cmd_timed_out(). A scsi_cmnd from a non-iSCSI device definitely
      cannot be resolved to a session and session->lock. The panic can happen
      anywhere during the dereferencing.
      
      static enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *scmd)
      {
              struct iscsi_cls_session *cls_session;
              struct iscsi_session *session;
              struct iscsi_conn *conn;
              enum blk_eh_timer_return rc = BLK_EH_NOT_HANDLED;

              cls_session = starget_to_session(scsi_target(scmd->device));
              session = cls_session->dd_data;

              debug_scsi("scsi cmd %p timedout\n", scmd);

              spin_lock(&session->lock);
      
      This patch fixes the problem by moving the setting of the
      iscsi_eh_cmd_timed_out to iscsi_add_host, which is after the LLDs
      have set their transport template to shost->transportt.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
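The ordering problem in this analysis can be reproduced in a few lines of userspace C. This is a simplified model rather than kernel code: the structs, `host_alloc`, `iscsi_setup_buggy` and `iscsi_setup_fixed` are hypothetical stand-ins for `scsi_host_alloc()`, the pre-patch statement order (C before B), and the order the patch establishes (handler installed only after the driver's own template is in place).

```c
#include <stddef.h>

typedef int (*timeout_fn)(int cmd);

struct transport_template { timeout_fn eh_timed_out; };
struct host { struct transport_template *transportt; };

/* Shared default, like blank_transport_template (statement D). */
struct transport_template blank_template;
/* The driver's own template, like iscsi_tcp_scsi_transport. */
struct transport_template iscsi_template;

int iscsi_timed_out(int cmd) { return cmd + 1000; }

/* Every new host starts out pointing at the shared default. */
void host_alloc(struct host *h) { h->transportt = &blank_template; }

/* Buggy order: write through the shared pointer (C), then replace it (B). */
void iscsi_setup_buggy(struct host *h)
{
    host_alloc(h);
    h->transportt->eh_timed_out = iscsi_timed_out; /* pollutes blank_template */
    h->transportt = &iscsi_template;               /* handler is left behind */
}

/* Fixed order: switch to the driver's template first, then install. */
void iscsi_setup_fixed(struct host *h)
{
    host_alloc(h);
    h->transportt = &iscsi_template;
    h->transportt->eh_timed_out = iscsi_timed_out;
}
```

With the buggy order, a host that never touched iSCSI inherits `iscsi_timed_out` through the shared template, while the iSCSI host itself ends up with no handler. With the fixed order, only the iSCSI host's template carries the handler.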
    • S
      [SCSI] qla2xxx: fix Kernel Panic with Qlogic 2472 Card. · 7f977ddd
      Shyam_Iyer@Dell.com committed
      A kernel panic is observed when a Qlogic 2472 card is plugged into the
      system and the qla2xxx driver is loaded:
      
      QLogic Fibre Channel HBA Driver: 8.02.01.02.11.0-k9
      vendor=8086 device=3410
      qla2xxx 0000:05:00.0: PCI INT A -> GSI 40 (level, low) -> IRQ 40
      qla2xxx 0000:05:00.0: Found an ISP2432, irq 40, iobase 0xffffc2001091c000
      qla2xxx 0000:05:00.0: Configuring PCI space...
      qla2xxx 0000:05:00.0: setting latency timer to 64
      qla2xxx 0000:05:00.0: Configure NVRAM parameters...
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      IP: [<ffffffff8036319a>] strncpy+0x5/0x1e
      PGD 7c564067 PUD 78d8c067 PMD 0
      Oops: 0000 [1] SMP
      last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-2/6-2:1.1/input/input4/event4/dev
      CPU 1
      Modules linked in: qla2xxx(+) squashfs usb_storage scsi_transport_fc
      scsi_tgt parport_pc parport arc4 ecb crypto_blkcipher acpi_cpufreq fan
      loop nfs nfs_acl lockd sunrpc nls_iso8859_1 nls_cp437 ipv6 af_packet st
      sr_mod ide_disk ide_cd_mod ide_core cdrom usbhid hid ff_memless sg
      sd_mod crc_t10dif uhci_hcd mptsas mptscsih ehci_hcd mptbase
      scsi_transport_sas rtc_cmos rtc_core rtc_lib usbcore scsi_mod thermal
      bnx2 button processor thermal_sys hwmon edd
      Supported: Yes
      Pid: 4415, comm: insmod Not tainted 2.6.27.13-1-default #1
      RIP: 0010:[<ffffffff8036319a>] [<ffffffff8036319a>] strncpy+0x5/0x1e
      RSP: 0018:ffff88007b04fbc0 EFLAGS: 00010202
      RAX: 00000000000000b7 RBX: ffff88007b9641e0 RCX: ffff88007c1b2ad7
      RDX: 000000000000004f RSI: 0000000000000000 RDI: ffff88007c1b2ad7
      RBP: ffff88007c1b0620 R08: 0000000000000010 R09: 0000000100000000
      R10: 0000000000000046 R11: ffffffff803651c6 R12: ffff88007b074000
      R13: ffff88007b964000 R14: ffff88007c1b2ac6 R15: 0000000000000000
      FS: 00007f91a6c366f0(0000) GS:ffff88007dbeee40(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000000 CR3: 000000007bd7c000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process insmod (pid: 4415, threadinfo ffff88007b04e000, task ffff880078586180)
      Stack: ffffffffa02d82c4 0000000000002432 ffff88007d385000 ffff88007c1b0620
      ffff88007c1b0620 ffff88007c1b0000 ffff88007d385000 0000000000002432
      ffffffffa02dcb1e 0000000000002432 ffffc2001091c000 ffff88007c1b0620
      Call Trace:
      [<ffffffffa02d82c4>] qla24xx_nvram_config+0x385/0x6c2 [qla2xxx]
      [<ffffffffa02dcb1e>] qla2x00_initialize_adapter+0x169/0x383 [qla2xxx]
      [<ffffffffa02f2040>] qla2x00_probe_one+0x6bc/0x9c6 [qla2xxx]
      [<ffffffff8037346f>] pci_device_probe+0xb8/0x105
      [<ffffffff803e5a27>] really_probe+0xdd/0x1e5
      [<ffffffff803e5c14>] __driver_attach+0x46/0x6d
      [<ffffffff803e51e1>] bus_for_each_dev+0x44/0x78
      [<ffffffff803e4ac7>] bus_add_driver+0xef/0x235
      [<ffffffff803e5dd8>] driver_register+0xa2/0x11f
      [<ffffffff803736fd>] __pci_register_driver+0x5d/0x90
      [<ffffffffa0308126>] qla2x00_module_init+0x126/0x159 [qla2xxx]
      [<ffffffff80209041>] _stext+0x41/0x110
      [<ffffffff80260abd>] sys_init_module+0xa0/0x1ba
      [<ffffffff8020bfbb>] system_call_fastpath+0x16/0x1b
      [<00007f91a679b76a>] 0x7f91a679b76a
      Code: ff c1 41 39 c0 75 05 45 85 c0 75 bf 41 29 c0 44 89 c0 c3 31 d2 8a
      04 16 88 04 17 48 ff c2 84 c0 75 f3 48 89 f8 c3 48 89 f9 eb 10 <8a> 06
      3c 01 88 01 48 83 de ff 48 ff c1 48 ff ca 48 85 d2 75 eb
      RIP [<ffffffff8036319a>] strncpy+0x5/0x1e
      RSP <ffff88007b04fbc0>
      CR2: 0000000000000000
      ---[ end trace 829d7d78dfafb785 ]---
      
      The attached patch fixes the issue.
      Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
      Acked-by: Seokmann Ju <Seokmann.ju@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • B
      [SCSI] ibmvfc: Increase cancel timeout · 14ae6fac
      Brian King committed
      During cancel testing it has been shown that 15 seconds is not
      nearly long enough for the VIOS to respond to a cancel under
      loaded situations. Increasing this timeout to 60 seconds allows
      time for the VIOS to cancel the outstanding commands and prevents
      us from escalating to a full host reset, which can take much longer.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • B
      [SCSI] ibmvfc: Fix rport relogin · 0883e3b3
      Brian King committed
      The ibmvfc driver has a bug in its SCN handling. If it receives
      an ELS event such as an N-Port SCN event or an unsolicited PLOGI,
      or any other SCN event which causes ibmvfc_reinit_host to be called,
      it is possible that we will call fc_remote_port_add for a target
      that already has an rport added, which can result in duplicate
      rports getting created for the same targets. Fix this by calling
      fc_remote_port_rolechg in this scenario instead to report any possible
      role change that may have occurred.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
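The decision described in this message (register a remote port only once, and report a role change on later events) can be sketched as a small userspace model. `struct target_sim` and `report_target` are hypothetical. The counters only record which branch ran, standing in for the calls to `fc_remote_port_add` and `fc_remote_port_rolechg` that the driver would make.

```c
/* Hypothetical model of one target as seen by the driver. */
struct target_sim {
    int has_rport;      /* non-zero once a remote port has been registered */
    int roles;          /* last roles value reported */
    int add_calls;      /* times the "add" path was taken */
    int rolechg_calls;  /* times the "role change" path was taken */
};

/* Called whenever a reinit rediscovers the target. */
void report_target(struct target_sim *t, int roles)
{
    if (t->has_rport) {
        /* Already registered: only report a possible role change. */
        t->roles = roles;
        t->rolechg_calls++;
        return;
    }
    /* First sighting: register the remote port exactly once. */
    t->has_rport = 1;
    t->roles = roles;
    t->add_calls++;
}
```

Calling `report_target` repeatedly for the same target takes the add path once and the role-change path on each later call, so no duplicate port is created.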
    • B
      [SCSI] ibmvfc: Fix command timeout errors · d4b17a20
      Brian King committed
      Currently the ibmvfc driver sets the IBMVFC_CLASS_3_ERR flag
      in the VFC Frame if both the adapter and the device claim support
      for Class 3. However, this bit actually refers to Class 3 Error
      Recovery, which is currently not supported by the VIOS. Setting this
      bit can cause lots of command timeout responses from the VIOS resulting
      in general instability. Fix this by never setting this bit.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • M
      [SCSI] sg: fix device number in blktrace data · 76e3a19d
      Martin Peschke committed
      Hi,
      
      we have run into an issue with blktrace being started for sg devices.
      Please apply.
      
      Thanks,
      Martin
      
      From: Martin Peschke <mpeschke@linux.vnet.ibm.com>
      
      The device number denoting a generic SCSI device (sg) in a blktrace
      trace is broken; major and minor are always 0. It looks like
      sdp->device->sdev_gendev.devt is not initialized properly.
      The fix below uses other data to make up a valid device number,
      similar to the way an sg device number is generated for sysfs output.
      Reported-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
      Acked-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
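The message does not say which "other data" the fix uses. One plausible reading, shown here purely as an assumption, is that the device number is composed from the sg character-device major (21) and the sg device's index. The `SKETCH_*` macros below are userspace stand-ins that mirror the kernel's 20-bit minor encoding, and `sg_trace_devt` is a hypothetical helper.

```c
#include <stdint.h>

/* Userspace stand-ins mirroring the kernel's dev_t layout (20 minor bits). */
#define SKETCH_MINORBITS 20
#define SKETCH_MINORMASK ((1U << SKETCH_MINORBITS) - 1)
#define SKETCH_MKDEV(ma, mi) \
    (((uint32_t)(ma) << SKETCH_MINORBITS) | (uint32_t)(mi))
#define SKETCH_MAJOR(dev) ((uint32_t)(dev) >> SKETCH_MINORBITS)
#define SKETCH_MINOR(dev) ((uint32_t)(dev) & SKETCH_MINORMASK)

/* Major number of the sg character devices. */
#define SKETCH_SG_MAJOR 21

/* Hypothetical helper: build a non-zero device number for sg device N. */
uint32_t sg_trace_devt(unsigned int sg_index)
{
    return SKETCH_MKDEV(SKETCH_SG_MAJOR, sg_index);
}
```

Because the major is never zero, the resulting device number is non-zero even for the first sg device, unlike the uninitialized `devt` the message describes.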
    • J
      [SCSI] scsi_scan: add missing interim SDEV_DEL state if slave_alloc fails · c2f9e49f
      James Smart committed
      We were running i/o and performing a bunch of hba resets in a loop.
      This forces a lot of target removes and then rescans. Since the
      resets are occurring during scan it's causing the scan i/o to timeout,
      invoking error recovery, etc.  We end up getting some nasty crashing
      in scsi_scan.c due to references to old sdevs that are failing
      but had some lingering references that kept them around.
      
      Fix by setting device state to SDEV_DEL if the LLD's slave_alloc
      fails.
      Signed-off-by: James Smart <james.smart@emulex.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • R
      [SCSI] ibmvscsi: Correct DMA mapping leak · e637d553
      Robert Jennings committed
      The ibmvscsi client driver is not unmapping the SCSI command after
      encountering a DMA mapping error while trying to map an indirect
      scatter-gather list for the event pool.  This leads to a leak of DMA
      entitlement that could result in the device failing future DMA operations
      in a CMO environment.
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
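The leak described here is a two-step mapping where a failure in the second step must undo the first. A minimal userspace model of that unwind follows. `struct dma_pool_sim`, `sim_map`, `sim_unmap` and `map_command` are hypothetical, and the `budget` field loosely models a limited DMA entitlement.

```c
#include <stdbool.h>

/* Hypothetical pool: 'budget' caps how many mappings may exist at once. */
struct dma_pool_sim {
    int mapped;
    int budget;
};

bool sim_map(struct dma_pool_sim *p)
{
    if (p->mapped >= p->budget)
        return false;           /* mapping error */
    p->mapped++;
    return true;
}

void sim_unmap(struct dma_pool_sim *p)
{
    p->mapped--;
}

/* Returns 0 on success, -1 on failure. On failure nothing stays mapped. */
int map_command(struct dma_pool_sim *p, bool needs_indirect)
{
    if (!sim_map(p))            /* step 1: map the command's data */
        return -1;

    if (needs_indirect && !sim_map(p)) {   /* step 2: map the indirect list */
        sim_unmap(p);           /* unwind step 1 so no entitlement leaks */
        return -1;
    }
    return 0;
}
```

Without the `sim_unmap` call in the failure branch, each failed command would permanently consume one unit of the budget, which corresponds to the entitlement leak the message describes.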
  2. 10 Feb, 2009 30 commits