1. 20 Jul 2012: 1 commit
  2. 01 Mar 2012: 1 commit
    • [SCSI] libsas: fix sas_find_local_phy(), take phy references · f41a0c44
      Committed by Dan Williams
      In the direct-attached case this routine returns the phy on which this
      device was first discovered.  This is broken if we want to support
      wide targets, because this phy reference can become stale even though
      the port is still active.
      
      In the expander-attached case this routine tries to look up the phy by
      scanning the attached sas addresses of the parent expander, and BUG_ONs
      if it can't find it.  However, since eh and the libsas workqueue run
      independently, we can still be attempting device recovery via eh after
      libsas has recorded the device as detached.  This is even easier to hit
      now that eh is blocked while device domain rediscovery takes place, and
      now that libata is fed more timed-out commands, which increases the
      chances that it will try to recover the ata device.
      
      Arrange for dev->phy to always point to a last-known-good phy.  It may
      be stale after the port is torn down, but it will catch up with wide-port
      reconfigurations, and it will never be NULL.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  3. 20 Feb 2012: 2 commits
    • [SCSI] libsas: remove ata_port.lock management duties from lldds · 312d3e56
      Committed by Dan Williams
      Each libsas driver (mvsas, pm8001, and isci) has invented a different
      method for managing the ap->lock.  The lock is held by the ata
      ->queuecommand() path.  mvsas drops it before acquiring any internal
      locks, which allows it to hold its internal lock across calls to
      task->task_done().  This capability is important because it is the only
      way the driver can flush task->task_done() instances to guarantee that
      it no longer has any in-flight references to a domain_device at
      ->lldd_dev_gone() time.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] libsas: introduce sas_drain_work() · b1124cd3
      Committed by Dan Williams
      When an lldd invokes ->notify_port_event() it can trigger a chain of libsas
      events to:
      
        1/ form the port and find the direct attached device
      
        2/ if the attached device is an expander perform domain discovery
      
      A call to flush_workqueue() will only flush the initial port formation work.
      Currently libsas users need to call scsi_flush_work() up to the maximum depth
      of the chain (which will grow from 2 to 3 when ata discovery is moved to its
      own discovery event).  Instead of open-coding multiple calls, switch to
      drain_workqueue() to flush sas work.
      
      drain_workqueue() does not handle new work submitted during the drain, so
      libsas needs a bit of infrastructure to hold off unchained work submissions
      while a drain is in flight.  An lldd ->notify() event is considered
      'unchained', while a sas_discover_event() is 'chained'.  As Tejun notes:
      
        "For now, I think it would be best to add private wrapper in libsas to
         support deferring unchained work items while draining."
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  4. 19 Feb 2012: 3 commits
    • [SCSI] pm8001: deficient responses to IO_XFER_ERROR_BREAK and IO_XFER_OPEN_RETRY_TIMEOUT · 5954d738
      Committed by Mark Salyzyn
      The handling of IO_XFER_ERROR_BREAK and IO_XFER_OPEN_RETRY_TIMEOUT lacks
      the required actions outlined in the pm8001 programming manual.  Because
      the code requirements of these recovery responses overlap, we found it
      necessary to bundle them together into one patch.
      
      When a break is received during the command phase (ssp_completion), this is a
      result of a timeout or interruption on the bus. Logic suggests that we should
      retry the command.
      
      When a break is received during the data phase (ssp_event), the task must
      be aborted on the target.  Otherwise the target retains a data-phase lock
      that makes it unresponsive to all future media commands, even though it
      still responds successfully to TUR, INQUIRY and ABORT.  This eventually
      leads to target failure through several abort-cycle loops.
      
      The open retry interval is exceedingly short, resulting in occasional
      target drop-off during expander resets or when targets push back during
      bad-block remapping.  This patch increases the effective timeout for each
      try from 130ms to 1.5 seconds, so that it triggers after the
      administrative inquiry/tur timeout in the scsi subsystem and keeps
      error-recovery harmonics to a minimum.
      
      When an open retry timeout event is received, the action required by the
      target is to issue an abort for the outstanding command.  Logic then
      suggests that we retry the command, as this state usually indicates a
      credit block or busy condition on the target.
      
      We hijacked the pm8001_handle_event work queue handler so that its
      workers take a task, instead of a device, as their argument, in support
      of the deferred handling outlined above.
      
      Moderate-to-heavy bad-path testing on a 2.6.32-vintage kernel,
      compile-testing on the scsi-misc-2.6 kernel ...
      Signed-off-by: Mark Salyzyn <mark_salyzyn@xyratex.com>
      Acked-by: Jack Wang <jack_wang@usish.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] pm8001: Add FUNC_GET_EVENTS · d95d0001
      Committed by Mark Salyzyn
      Jack noticed that I dropped a patch fragment associated with a flags
      automatic variable in mpi_set_phys_g3_with_ssc (ooops).  He also noticed
      that the pre-emptive locking that piggybacked on this patch was not in
      fact necessary, because of the underlying atomic accesses to the
      hardware.  Here is the updated patch fixing these two issues.
      
      The pm8001 driver is missing the FUNC_GET_EVENTS handler in the phy control
      function.  Since the pm8001_bar4_shift function was not designed to be
      called at runtime, this patch adds locking around the adjustment for all
      accesses.
      Signed-off-by: Mark Salyzyn <mark_salyzyn@xyratex.com>
      Acked-by: Jack Wang <jack_wang@usish.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] pm8001: fix lockup on phy_control hard reset. · 5c4fb76a
      Committed by Mark Salyzyn
      pm8001_phy_control PHY_FUNC_HARD_RESET locks up on the second try via
      smp_phy_control, because the HW_EVENT_PHY_START_STATUS response fails to
      complete the previous command.  All readers treat the PM8001F_RUN_TIME
      flag as a state rather than a bit.  Yet once we are operational, or in
      the run-time state, the flags are updated with a bit-set operation.
      Signed-off-by: Mark Salyzyn <mark_salyzyn@xyratex.com>
      Acked-by: Jack Wang <jack_wang@usish.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  5. 03 Oct 2011: 4 commits
    • [SCSI] isci: export phy events via ->lldd_control_phy() · ac013ed1
      Committed by Dan Williams
      Allow the sas-transport-class to update events for local phys via a new
      PHY_FUNC_GET_EVENTS command to ->lldd_control_phy().  Fix up drivers that
      are not prepared for new enum phy_func values, and unify the
      ->lldd_control_phy() error codes.
      
      These are the SAS-defined phy events that are reported in an
      smp-report-phy-error-log command:
       * /sys/class/sas_phy/<phyX>/invalid_dword_count
       * /sys/class/sas_phy/<phyX>/running_disparity_error_count
       * /sys/class/sas_phy/<phyX>/loss_of_dword_sync_count
       * /sys/class/sas_phy/<phyX>/phy_reset_problem_count
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] pm8001: missing break statements · 6fbc7692
      Committed by Mark Salyzyn
      Code inspection found two missing break statements.  The first results in
      not retrying a task that reports IO_OPEN_CNX_ERROR_HW_RESOURCE_BUSY.  The
      second results in a cosmetic stutter of conflicting debug printk
      statements.  Because checkpatch.pl warned about an unnecessary space
      before a newline in one of the fragments in the diff context, I took the
      liberty of fixing all instances of this issue in the pair of files
      touched by this defect.  These cosmetic changes hide the break changes :-(
      
      To help focus the review: the first break change is in the pm8001_hwi.c
      fragment at line 1649, for the IO_OPEN_CNX_ERROR_HW_RESOURCE_BUSY case
      statement.  The second is in pm8001_sas.c at line 1000, which deals with
      the conflicting debug print stutter.
      Signed-off-by: Mark Salyzyn <mark_salyzyn@us.xyratex.com>
      Acked-by: Jack Wang <jack_wang@usish.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] pm8001: fix DEV_IS_GONE infinite retry · b90b378a
      Committed by Mark Salyzyn
      On the pm8001, when a device is in the process of going away (device
      power off or hot plug), depending on the timing, the driver would return
      SAS_PHY_DOWN as the return value to the queuecommand DEV_IS_GONE logic.
      The net result is a near-infinite retry, and (especially if SAS debugging
      is enabled) the logs fill with:
      
      kernel: mpi_ssp_completion 2119:e21:SSP IO status 0x13 tag 0xcc1c0000 dlen=90 param=0xe
      kernel: wwn=5000c50034069e86  cdb=12 00 00 00 5a 00 00 00 00 00 00 00 00 00 00 00
      kernel: sas: lldd_execute_task returned: 138
      kernel: sas: lldd_execute_task returned: 138
      kernel: sas: lldd_execute_task returned: 138
      kernel: sas: lldd_execute_task returned: 138
      kernel: sas: lldd_execute_task returned: 138
      kernel: sas: lldd_execute_task returned: 138
      kernel: sas: lldd_execute_task returned: 138
      . . .
      
      This patch instead leverages the port_attached logic to complete the
      command with a status of PHY_DOWN, so that its disposition can be handled
      immediately and correctly.
      Signed-off-by: Mark Salyzyn <mark_salyzyn@us.xyratex.com>
      Acked-by: Jack Wang <jack_wang@usish.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] pm8001: remove pm8001_slave_{alloc|configure} · 11e16364
      Committed by Dan Williams
      libsas handles:
      1/ limiting ata scanning to lun0
      2/ changes to /sys/block/<sdX>/device/queue_depth for ata devices
      
      libata handles turning off ncq, either globally via the kernel command
      line (libata.force=noncq) or per device via sysfs (echo 1 >
      /sys/block/<sdX>/device/queue_depth).  An lldd-specific compile option is
      not necessary.
      
      Cc: Jack Wang <jack_wang@usish.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  6. 27 Aug 2011: 1 commit
  7. 28 Jul 2010: 1 commit
  8. 02 May 2010: 1 commit
  9. 30 Mar 2010: 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered
        (alphabetical, Christmas tree, rev-Xmas-tree), or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some files didn't need the
         inclusion and some needed manual addition, while for others adding
         it to the implementation .h or the embedding .c file was more
         appropriate.  This step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them, as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored, as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles), and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64, which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers, which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  10. 09 Feb 2010: 1 commit
  11. 11 Dec 2009: 5 commits
  12. 05 Dec 2009: 3 commits