1. 28 July, 2010 14 commits
    • [SCSI] mptfusion: Extra debug prints added relavent to Device missing delay error handling · 213aaca3
      Kashyap, Desai committed
      Add the function name to the original debug prints and add a few more
      debug prints.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Cc: Stable Tree <stable@kernel.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] mptfusion: Block Error handling for deleting devices or Device in DMD · c9de7dc4
      Kashyap, Desai committed
      Issue description:
      In a multipath topology, while a device deletion is in a transient state,
      the multipath driver can call blk_flush_queue() as part of path failure.
      Before the device is deleted from the OS, it may go OFFLINE as part of the
      error handling triggered by the multipath driver. This condition is hit
      more frequently if the device missing delay timer (an LSI-specific firmware
      parameter) is non-zero.

      The root cause is that the error handling thread is kicked off for a
      device that is not really present (it is in the transient state of being
      deleted).

      This patch fixes the issue. The driver now uses the eh_timed_out
      callback:

      mptsas_transport_template->eh_timed_out = mptsas_eh_timed_out

      Using the mptsas_eh_timed_out function, the driver can decide whether the
      vdevice is under device missing delay or in the deleting state.

      In either case, BLK_EH_RESET_TIMER is returned to the SCSI midlayer, and
      the error handling thread is not kicked off for that particular SCSI
      command.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Cc: Stable Tree <stable@kernel.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
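The timeout decision described in this commit can be modelled in user space. The following is a minimal sketch, not the driver's code: the enum values, the struct and its field names, and the function name are all hypothetical stand-ins for the kernel's types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the verdicts a timeout callback can return. */
enum eh_verdict {
    EH_NOT_HANDLED,   /* let normal error handling proceed */
    EH_RESET_TIMER    /* restart the command timer, skip error handling */
};

/* Hypothetical per-device state; field names are illustrative only. */
struct vdevice_state {
    bool in_missing_delay;  /* device missing delay timer is running */
    bool deleting;          /* device is being removed from the OS */
};

/* Decide what to do when a command on this device times out. */
enum eh_verdict sketch_eh_timed_out(const struct vdevice_state *vdev)
{
    if (vdev == NULL)
        return EH_NOT_HANDLED;
    if (vdev->in_missing_delay || vdev->deleting)
        return EH_RESET_TIMER;
    return EH_NOT_HANDLED;
}
```

In this model a device in either transient state gets its timer restarted, so the error handling thread is never started for it.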
    • [SCSI] mptfusion: print Doorbell register in a case of hard reset and timeout · 97009a29
      Kei Tokunaga committed
      Printing the Doorbell register on hard reset and on timeout
      should be useful for figuring out the state of the system.
      Signed-off-by: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
      Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] mptsas: fixed hot-removal processing · 3e84beba
      Kei Tokunaga committed
      This patch fixes mptsas disk hot-removal processing.  The
      hot-removal processing doesn't complete because of this condition.
      
        drivers/message/fusion/mptsas.c:
        mptsas_taskmgmt_complete()
      
        if ((mptsas_find_vtarget(ioc, channel, id)) && !ioc->fw_events_off)
          mptsas_queue_device_delete(...);
      
      mptsas_queue_device_delete(), which must be called for
      hot-removal, never gets called because mptsas_find_vtarget()
      always returns 0 here.  At that time, the vtarget has already
      been freed in mptsas_target_destroy(), and also the scsi_device
      has been marked as SDEV_DEL.
      
      As a result of the issue, port deletion functions won't get
      called and the device ends up being in an incomplete state.
      (Some data structures and sysfs entries, which should be
      removed in hot-removal, remain.)  One side effect of this is
      that a hot-addition of the device (bringing the device back
      on) fails.
      
      This patch just removes mptsas_find_vtarget() from the if-statement
      condition.
      Signed-off-by: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
      Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
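The effect of dropping mptsas_find_vtarget() from the condition can be illustrated with a small model. This is only a sketch under stated assumptions: the struct and both function names are hypothetical, and the vtarget lookup is reduced to a boolean parameter.

```c
#include <stdbool.h>

/* Hypothetical model of the adapter state relevant to the condition. */
struct ioc_model {
    bool fw_events_off;  /* true while firmware event handling is disabled */
};

/* Condition before the patch: also requires the vtarget lookup to succeed. */
bool queue_delete_before(bool vtarget_found, const struct ioc_model *ioc)
{
    return vtarget_found && !ioc->fw_events_off;
}

/* Condition after the patch: depends only on event handling being enabled. */
bool queue_delete_after(const struct ioc_model *ioc)
{
    return !ioc->fw_events_off;
}
```

With the vtarget already freed (lookup fails) and events enabled, the old condition is false and the delete is never queued, while the new condition is true.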
    • [SCSI] mpt fusion: Cleanup some duplicate calls in mptbase.c · 15f7fc06
      Bandan Das committed
      In mpt_detach, the call to pci_set_drvdata is redundant because it
      has already been made in mpt_adapter_disable. In mpt_attach,
      ioc->pcidev is set to pdev twice.
      Signed-off-by: Bandan Das <bandan.das@stratus.com>
      Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] mptfusion: Bump version 03.04.16 · c817ce84
      Kashyap, Desai committed
      Upgrade driver version to 3.4.16
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] mptfusion: Added missing reset for ioc_reset_in_progress in SoftReset · b9a0f872
      Kashyap, Desai committed
      Added the missing code that resets ioc_reset_in_progress before returning from SoftResetHandler.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] mptfusion: Added code for occationally SATA hotplug failure. · cc7e9f5f
      Kashyap, Desai committed
      Issue: SATA hotplug sometimes does not work.
      At the time of ADD device/ADD phys disk, the driver may fail to add a SATA
      device because the firmware generates a temporary SAS address for it. The
      final SAS address for the SATA drive is generated only after disk spin-up
      is done. This may take some time for slow-spinning SATA drives.

      At phy link up, the driver gets the attached device's SAS address and
      stores it in phyinfo. At the time of the ADD event, the driver reads SAS
      device page0 using the channel and FW ID provided in the ADD Device event.
      For SATA drives, the driver then sees a mismatch between
      phyinfo->sas_address and the latest SAS address read from SAS DEVICE PAGE0,
      and the device is eventually not added to the OS.

      Fix:
      When the driver reads SAS DEVICE PAGE0, it can identify the device type by
      looking at device_info. If the device is a SATA drive and a SAS address
      mismatch occurs, the driver repeats what it did at LINK UP to get the
      correct information from the pages (find the parent device and refresh the
      parent device's phys, via either an HBA refresh or an expander refresh).
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] mptfusion: schedule_target_reset from all Reset context · b68bf096
      Kashyap, Desai committed
      Issue:
      A target reset is queued to the driver's internal queue to be scheduled
      later. When the driver adds a target to the internal target_reset queue, it
      blocks IOs on that target using the SCSI midlayer API. For some reason the
      driver was not processing the target_reset list, so the target always
      stayed in the blocked state.

      Changes:
      The target_reset queue is now cleared from all other callback contexts
      instead of only the DeviceReset context. Wherever the driver clears the
      taskmgmt_in_progress flag, it now also cleans up the target_reset queue.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] mptfusion: Added sanity to check B_T mapping for device before adding to OS · 51106ab5
      Kashyap, Desai committed
      Added a sanity check before treating any device as a valid device.
      The firmware can have a device page0 in its table even though that
      device is not available in the topology. A device is available in the
      topology only if Bus Target mapping is done in firmware. The driver now
      always checks the firmware's B_T mapping before reporting a device to the
      upper layer.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] mptfusion: Corrected declaration of device_missing_delay · aca794dd
      Kashyap, Desai committed
      The device missing delay is an 8-bit value in IO unit page 1. This patch
      corrects the variable declaration for device_missing_delay.

      The driver was storing the calculated device missing delay in the IOC
      structure as a u8 instead of a u16. It needs to be a u16 if the delay is
      > 255.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
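Why a calculated delay above 255 needs a 16-bit field can be shown with plain integer truncation. This is a minimal sketch: the helper names and the idea of scaling a raw unit count by a multiplier are assumptions made for illustration, not the driver's actual calculation.

```c
#include <stdint.h>

/* Hypothetical helper: scale a raw delay count by a unit multiplier. */
unsigned int calc_missing_delay(unsigned int raw_units, unsigned int multiplier)
{
    return raw_units * multiplier;
}

/* Storing into an 8-bit field keeps only the low 8 bits of the value. */
uint8_t store_as_u8(unsigned int delay)
{
    return (uint8_t)delay;
}

/* A 16-bit field holds any value up to 65535 unchanged. */
uint16_t store_as_u16(unsigned int delay)
{
    return (uint16_t)delay;
}
```

For example, a calculated delay of 19 * 16 = 304 comes back as 48 from the 8-bit field and as 304 from the 16-bit one.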
    • [SCSI] mptfusion: Use DID_TRANSPORT_DISRUPTED instead of DID_BUS_BUSY · 4d069566
      Kashyap, Desai committed
      Changed the return value for Nexus Loss IOs to DID_TRANSPORT_DISRUPTED.
      This allows the multipath driver to delay the failover process: the path
      should stay up as long as the nexus loss Loginfo is returned from
      firmware. With DID_BUS_BUSY the path fails over immediately.
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
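A host status code of this kind is carried in the host byte of a command's result word. The sketch below models that packing in user space. The SK_ constants are local stand-ins whose numeric values are assumed to match the kernel's DID_BUS_BUSY and DID_TRANSPORT_DISRUPTED, and the function names are hypothetical.

```c
#include <stdint.h>

/* Local stand-ins; values assumed to match the kernel's host byte codes. */
enum {
    SK_DID_BUS_BUSY = 0x02,
    SK_DID_TRANSPORT_DISRUPTED = 0x0e
};

/* Pack a host status code into bits 16..23 of a 32-bit result word. */
uint32_t make_result(uint8_t host_code)
{
    return (uint32_t)host_code << 16;
}

/* Extract the host status code back out of a result word. */
uint8_t host_byte_of(uint32_t result)
{
    return (uint8_t)((result >> 16) & 0xff);
}

/* Result reported for a nexus loss IO after this change (model only). */
uint32_t nexus_loss_result(void)
{
    return make_result(SK_DID_TRANSPORT_DISRUPTED);
}
```

A consumer such as a multipath layer would inspect the host byte to choose between failing over at once and waiting for the transport to recover.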
    • [SCSI] mptfusion: Set fw_events_off to 1 at driver load time. · 8ce13de2
      Kashyap, Desai committed
      fw_events_off is the flag the driver checks to decide whether to do event
      handling. Normally event handling should be off at initialization time and
      enabled only when the device's interrupts are enabled for the first time,
      which always happens after resource allocation.

      ioc->fw_events_off = 1 is set in mpt_attach().
      Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] mptsas: fix hangs caused by ATA pass-through · 2a1b7e57
      Ryan Kuester committed
      I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
      pass-through commands, in particular by smartctl.
      
      First, my version of the symptoms.  On an LSI SAS1068E B3 HBA running
      01.29.00.00 firmware, with SATA disks, and with smartd running, I'm seeing
      occasional task, bus, and host resets, some of which lead to hard faults of
      the HBA requiring a reboot.  Abusively looping the smartctl command,
      
          # while true; do smartctl -a /dev/sdb > /dev/null; done
      
      dramatically increases the frequency of these failures to nearly one per
      minute.  A high IO load through the HBA while looping smartctl seems to
      increase the chance of a full scsi host reset or a non-recoverable hang.
      
      I reduced what smartctl was doing down to a simple test case which
      causes the hang with a single IO when pointed at the sd interface.  See
      the code at the bottom of this e-mail.  It uses an SG_IO ioctl to issue
      a single pass-through ATA identify device command.  If the buffer
      userspace gives for the read data has certain alignments, the task is
      issued to the HBA but the HBA fails to respond.  If run against the sg
      interface, neither the test code nor smartctl causes a hang.
      
      sd and sg handle the SG_IO ioctl slightly differently.  Unless you
      specifically set a flag to do direct IO, sg passes a buffer of its own,
      which is page-aligned, to the block layer and later copies the result
      into the userspace buffer regardless of its alignment.  sd, on the other
      hand, always does direct IO unless the userspace buffer fails an
      alignment test at block/blk-map.c line 57, in which case a page-aligned
      buffer is created and used for the transfer.
      
      The alignment test currently checks for word-alignment, the default
      setup by scsi_lib.c; therefore, userspace buffers of almost any
      alignment are given directly to the HBA as DMA targets.  The LSI 1068
      hardware doesn't seem to like at least a couple of the alignments which
      cross a page boundary (see the test code below).  Curiously, many
      page-boundary-crossing alignments do work just fine.
      
      So, either the hardware has a bug handling certain alignments or the
      hardware has a stricter alignment requirement than the driver is
      advertising.  If stricter alignment is required, then in no case should
      misaligned buffers from userspace be allowed through without being
      bounced or at least causing an error to be returned.
      
      It seems the mptsas driver could use blk_queue_dma_alignment() to advertise
      a stricter alignment requirement.  If it does, sd does the right thing and
      bounces misaligned buffers (see block/blk-map.c line 57).  The following
      patch to 2.6.34-rc5 makes my symptoms go away.  I'm sure this is the wrong
      place for this code, but it gets my idea across.
      Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
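The alignment test that decides between direct IO and a bounce buffer can be sketched as a mask check on the user buffer's address and length. This is a user-space model, not the block layer's code: the function name is hypothetical, and the mask values used in the example (3 for word alignment, 511 for a stricter requirement) are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if a buffer must be bounced through an aligned copy.
 * align_mask is (alignment - 1), e.g. 3 for 4-byte alignment.
 * Either a misaligned start address or a misaligned length fails.
 */
bool needs_bounce(uintptr_t addr, size_t len, unsigned int align_mask)
{
    return ((addr | len) & align_mask) != 0;
}
```

A 512-byte buffer at address 0x1004 passes the word-alignment mask and would be handed to the hardware directly, but fails the stricter mask and would be bounced.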
  2. 28 May, 2010 1 commit
  3. 24 May, 2010 1 commit
  4. 26 April, 2010 1 commit
  5. 11 April, 2010 13 commits
  6. 30 March, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  7. 26 February, 2010 2 commits
  8. 18 February, 2010 1 commit
    • [SCSI] fusion: hold off error recovery while alternate ioc is initializing · 03cb3829
      Michael Reed committed
      After discussing this patch with LSI, I am resubmitting with a recommended
      40 second wait for the alternate ioc's initialization to complete.
      --
      Fusion FC chips are dual-function with some shared resources.  During
      initialization of one function its driver inhibits the ability of the
      other function's driver to allocate message frames by clearing its
      "active" flag.  Should mid-layer error recovery be initiated for a
      scsi command during this initialization (which can take up to 40 seconds)
      error recovery will escalate to the level of host reset.  This host
      reset might fail (as the other function is resetting) resulting in
      all connected targets being taken offline.
      
      This patch holds off mid-layer error recovery for up to 40 seconds
      to permit initialization of the other function to complete.
      Signed-off-by: Michael Reed <mdr@sgi.com>
      Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  9. 09 February, 2010 1 commit
  10. 05 February, 2010 1 commit
  11. 19 January, 2010 4 commits