1. 13 Oct 2007, 7 commits
  2. 20 Jul 2007, 7 commits
    • libata: implement EH fast drain · 5ddf24c5
      Committed by Tejun Heo
      In most cases, when EH is scheduled, all in-flight commands are
      aborted, causing EH to kick in immediately.  However, in some cases
      (especially with PMP), it's unclear which commands are affected by the
      error condition, and although aborting all in-flight commands works, it
      isn't optimal and may cause unnecessary disruption.  On the other
      hand, waiting for in-flight commands to drain themselves can take up
      to 30 seconds.
      
      This patch implements EH fast drain to handle such situations.  It
      gives in-flight commands some time to finish up but doesn't wait for
      too long.  After EH is scheduled, the fast drain timer is started; if
      no further completion occurs within ATA_EH_FASTDRAIN_INTERVAL, all
      in-flight commands are aborted.  If any completion occurred in the
      interval, the port is given another interval to finish up on its own.
      
      Currently ATA_EH_FASTDRAIN_INTERVAL is 3 seconds, which should be
      enough for finishing up most commands.
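A minimal userspace sketch of the fast-drain decision made each time the interval expires. The names (`struct fastdrain_state`, `fastdrain_expire()`, the `FD_*` verdicts) are hypothetical, and progress is detected by comparing the in-flight count with the count recorded when the timer was last armed, which assumes no new commands are issued once EH has been scheduled.

```c
/* Hypothetical outcome of one fast-drain timer expiry. */
enum fastdrain_verdict {
	FD_DONE,   /* nothing left in flight, EH can run now      */
	FD_REARM,  /* something completed, allow another interval */
	FD_ABORT,  /* no progress in the interval, abort the rest */
};

/* Hypothetical per-port state: in-flight count when the timer was armed. */
struct fastdrain_state {
	unsigned int last_cnt;
};

/* Called when EH is scheduled: remember how many commands are in flight. */
void fastdrain_start(struct fastdrain_state *st, unsigned int in_flight)
{
	st->last_cnt = in_flight;
}

/* Called when ATA_EH_FASTDRAIN_INTERVAL expires. */
enum fastdrain_verdict fastdrain_expire(struct fastdrain_state *st,
					unsigned int in_flight)
{
	if (in_flight == 0)
		return FD_DONE;
	if (in_flight >= st->last_cnt)
		return FD_ABORT;        /* caller aborts all in-flight commands */
	st->last_cnt = in_flight;       /* progress: re-arm with the new count */
	return FD_REARM;
}
```

Each expiry either re-arms with the lower count or gives up, so a port that stops completing commands is aborted at the end of the first interval in which nothing finishes.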
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: schedule probing after SError access failure during autopsy · 4e57c517
      Committed by Tejun Heo
      If SError isn't accessible, EH can't tell whether hotplug has happened
      or not.  Report SError read failure with AC_ERR_OTHER and schedule
      probing with hardreset.  This will be mainly useful for PMPs.
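Sketched in isolation, the autopsy step might look like this. The struct, the flag values and `autopsy_serror()` are illustrative stand-ins rather than libata's own definitions; `read_rc` models the return code of the SError read.

```c
#include <stdint.h>

/* Illustrative bit values only; libata.h defines the real ones. */
#define SK_AC_ERR_OTHER  (1u << 8)
#define SK_EH_HARDRESET  (1u << 1)

/* Hypothetical, trimmed-down EH info. */
struct sk_ehi {
	uint32_t serror;          /* accumulated SError bits         */
	unsigned int err_mask;    /* AC_ERR_* style error mask       */
	unsigned int action;      /* requested EH actions            */
	int probe;                /* nonzero: re-probe the device(s) */
};

/*
 * read_rc/val stand in for the result of reading the SError SCR.
 * If the read fails, hotplug can't be ruled out, so report the
 * failure and ask for a hardreset plus probing.
 */
void autopsy_serror(struct sk_ehi *ehi, int read_rc, uint32_t val)
{
	if (read_rc == 0) {
		ehi->serror |= val;     /* normal path: analyzed afterwards */
		return;
	}
	ehi->err_mask |= SK_AC_ERR_OTHER;
	ehi->action |= SK_EH_HARDRESET;
	ehi->probe = 1;
}
```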
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: clear HOTPLUG flag after a reset · fccb6ea5
      Committed by Tejun Heo
      ATA_EHI_HOTPLUGGED is a hint for reset functions indicating that the
      port might have gone through hotplug/unplug just before entering EH.
      Reset functions modify their behaviors a bit to handle the situation
      better - e.g. using longer debouncing delay.
      
      Currently, once HOTPLUG is set, it isn't cleared till the end of EH.
      This is unnecessary and makes EH take longer.  Clear the HOTPLUGGED
      flag after a reset try (successful or not).
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: quickly trigger SATA SPD down after debouncing failed · f1545154
      Committed by Tejun Heo
      Debouncing failure is a good indicator of basic link problems.  Use
      -EPIPE to indicate debouncing failure and make ata_eh_reset() invoke
      sata_down_spd_limit() if the error occurs during reset.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: improve SATA PHY speed down logic · 008a7896
      Committed by Tejun Heo
      sata_down_spd_limit() first reads the current SPD from SStatus and
      limits the speed to the lower of one step below the current limit and
      one step below the current SPD in SStatus.  SPD may not be accessible
      or valid when SPD down is requested, making sata_down_spd_limit() fail
      when it's most needed.
      
      This patch caches the current SPD after each successful reset and
      forces GEN I speed (1.5Gbps) if neither SStatus nor the cached value
      is valid, so sata_down_spd_limit() is now guaranteed to lower the
      speed limit if a lower speed is available.
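The new rule can be written as a small pure function. This is a sketch under stated assumptions: the limit is a bitmask where bit n allows SATA generation n+1 (bit 0 is 1.5Gbps), an SPD value of 0 means "no valid speed" as in the SStatus SPD field, and `down_spd_limit()` is a hypothetical name, not the kernel function.

```c
#include <errno.h>

/* 1-based index of the highest set bit, like the kernel's fls(); 0 for 0. */
static int fls_u32(unsigned int x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/*
 * sstatus_spd: SPD read from SStatus (0 if unreadable or invalid).
 * cached_spd:  SPD cached after the last successful reset (0 if none).
 */
int down_spd_limit(unsigned int *limit, unsigned int sstatus_spd,
		   unsigned int cached_spd)
{
	unsigned int spd = sstatus_spd ? sstatus_spd : cached_spd;
	unsigned int mask = *limit;

	if (mask <= 1)
		return -EINVAL;                 /* already at the lowest speed */

	mask &= ~(1u << (fls_u32(mask) - 1));   /* one below the current limit */
	if (spd > 1)
		mask &= (1u << (spd - 1)) - 1;  /* and below the current SPD */
	else
		mask &= 1u;                     /* no valid SPD: force Gen1 */

	if (!mask)
		return -EINVAL;
	*limit = mask;
	return 0;
}
```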
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: implement AC_ERR_NCQ · 5335b729
      Committed by Tejun Heo
      When an NCQ command fails, all commands in flight are aborted and the
      offending one is reported using log page 10h.  Depending on controller
      characteristics and LLD implementation, all commands may appear to
      have a device error due to the shared TF status, making it hard to
      determine what's actually going on.
      
      This patch adds AC_ERR_NCQ, marks the command reported by log page 10h
      with it and prints an extra "<F>" after the error report for that
      command to help distinguish the offending command.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: improve EH report formatting · b64bbc39
      Committed by Tejun Heo
      Requiring LLDs to format multiple error description messages properly
      doesn't work too well.  Help LLDs a bit by making ata_ehi_push_desc()
      insert ", " on each invocation.  __ata_ehi_push_desc() is the raw
      version without the automatic separator.
      
      While at it, turn the ehi_desc interface into proper functions
      instead of macros.
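A userspace approximation of the two entry points, with `vsnprintf()` standing in for the kernel's formatting helper. `struct ehi_desc`, `EH_DESC_LEN`, `push_desc()` and `raw_push_desc()` are hypothetical names modelled on ata_ehi_push_desc() and __ata_ehi_push_desc(); output that does not fit is silently truncated.

```c
#include <stdarg.h>
#include <stdio.h>

#define EH_DESC_LEN 80   /* illustrative buffer size */

struct ehi_desc {
	char desc[EH_DESC_LEN];
	int desc_len;
};

/* Append formatted text without a separator, clamping on truncation. */
static void push_vdesc(struct ehi_desc *ehi, const char *fmt, va_list args)
{
	int room = EH_DESC_LEN - ehi->desc_len;
	int n;

	if (room <= 1)
		return;                 /* only the terminating NUL fits */
	n = vsnprintf(ehi->desc + ehi->desc_len, room, fmt, args);
	if (n < 0)
		return;
	ehi->desc_len += (n < room) ? n : room - 1;
}

/* Raw version, like __ata_ehi_push_desc(): no automatic separator. */
void raw_push_desc(struct ehi_desc *ehi, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	push_vdesc(ehi, fmt, args);
	va_end(args);
}

/* Like ata_ehi_push_desc(): inserts ", " before every entry but the first. */
void push_desc(struct ehi_desc *ehi, const char *fmt, ...)
{
	va_list args;

	if (ehi->desc_len)
		raw_push_desc(ehi, ", ");
	va_start(args, fmt);
	push_vdesc(ehi, fmt, args);
	va_end(args);
}
```

Two calls such as `push_desc(&e, "SError 0x%x", 0x400)` and `push_desc(&e, "irq %d", 5)` leave `"SError 0x400, irq 5"` in the buffer without the LLD having to manage separators.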
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  3. 11 Jul 2007, 1 commit
  4. 10 Jul 2007, 2 commits
    • libata-acpi: implement _GTM/_STM support · 64578a3d
      Committed by Tejun Heo
      Implement _GTM/_STM support.  acpi_gtm is added to ata_port and
      stores _GTM parameters across the suspend/resume cycle.  A new hook,
      ata_acpi_on_suspend(), is responsible for storing _GTM parameters
      during suspend.  _STM is executed in ata_acpi_on_resume().  With this
      change, invoking _GTF is safe on the IDE hierarchy and the acpi_sata
      check before _GTF is removed.
      
      The ata_acpi_gtm() and ata_acpi_stm() implementations are taken from
      Alan Cox's pata_acpi implementation.  ata_acpi_gtm() is fixed such that
      the result parameter is not shifted by sizeof(union acpi_object).
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: reimplement ACPI invocation · 6746544c
      Committed by Tejun Heo
      This patch reimplements ACPI invocation such that, instead of
      exporting ACPI details to the rest of libata, ACPI event handlers
      (ata_acpi_on_resume() and ata_acpi_on_devcfg()) are used.  These two
      functions are responsible for determining whether a specific ACPI
      method is used and when.
      
      On resume, _GTF is scheduled by setting the ATA_DFLAG_ACPI_PENDING
      device flag.  This is done to avoid performing the action on the wrong
      device (device swapping while suspended).
      
      On every ata_dev_configure(), ata_acpi_on_devcfg() is called, which
      performs _SDD and _GTF.  _GTF is performed only after resume and, for
      SATA, only after a hardreset, as the ACPI spec specifies.  As _GTF may
      contain arbitrary commands, the IDENTIFY page is re-read after the _GTF
      taskfiles are executed.
      
      If one of the ACPI methods fails, ata_acpi_on_devcfg() retries after
      the first failure.  If it fails again on the second try, ACPI is
      disabled on the device.  Note that a successful configuration clears
      the ACPI failed status.
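The retry policy amounts to a two-state marker per device. This sketch uses hypothetical flag names and a hypothetical `acpi_devcfg_result()` helper; it also assumes that configuration carries on without ACPI once ACPI has been disabled, which the text above does not spell out.

```c
#include <errno.h>

/* Hypothetical per-device ACPI state bits (not the real ATA_DFLAG_* values). */
#define SK_ACPI_FAILED    (1u << 0)   /* one ACPI failure already seen */
#define SK_ACPI_DISABLED  (1u << 1)   /* stop running ACPI methods     */

/*
 * rc is the result of running _SDD/_GTF during this configuration attempt.
 * Returns 0 to carry on, -EAGAIN to ask EH for a retry.
 */
int acpi_devcfg_result(unsigned int *flags, int rc)
{
	if (rc == 0) {
		*flags &= ~SK_ACPI_FAILED;   /* success clears the failed status */
		return 0;
	}
	if (!(*flags & SK_ACPI_FAILED)) {
		*flags |= SK_ACPI_FAILED;    /* first failure: let EH retry */
		return -EAGAIN;
	}
	*flags |= SK_ACPI_DISABLED;          /* second failure: give up on ACPI */
	return 0;                            /* assumed: continue without ACPI */
}
```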
      
      With all feature checks moved to the above two functions,
      do_drive_set_taskfiles() is trivial and thus collapsed into
      ata_acpi_exec_tfs(), which is now static and converted to return the
      number of executed taskfiles to be used by ata_acpi_on_resume().  As
      failures are handled properly, ata_acpi_push_id() now returns -errno
      on errors instead of unconditional zero.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  5. 27 Jun 2007, 3 commits
  6. 22 May 2007, 1 commit
  7. 12 May 2007, 3 commits
    • libata: give devices one last chance even if recovery failed with -EINVAL · 8575b814
      Committed by Tejun Heo
      After certain errors, some devices report complete garbage on
      IDENTIFY.  This can cause ata_dev_read_id() to fail with -EINVAL
      resulting in immediate disabling of the device.  Give the device one
      last chance after -EINVAL to allow recovery from such situations.  As
      -EINVAL is triggered very rarely, this shouldn't have any noticeable
      effect on more common error paths.
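One way to express "one last chance" is to cap the device's remaining retry count at one instead of zeroing it. The counter and `limit_tries_after_error()` are hypothetical, and modelling immediate disabling as a count of zero is an assumption of this sketch; other error codes are deliberately left untouched.

```c
#include <errno.h>

/* tries: hypothetical count of recovery attempts left for the device. */
void limit_tries_after_error(int *tries, int err)
{
	if (err == -EINVAL && *tries > 1)
		*tries = 1;     /* one last try instead of disabling at once */
	/* other errors are handled elsewhere and not modelled here */
}
```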
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Cc: Harald Dunkel <harald.dunkel@t-online.de>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: ignore EH scheduling during initialization · f4d6d004
      Committed by Tejun Heo
      libata enables the SCSI host during ATA host activation, which happens
      after the IRQ handler is registered and IRQ is enabled.  All ATA ports
      are in frozen state when IRQ is enabled, but frozen ports may raise a
      limited number of IRQs after being frozen; IOW, ->freeze() is not
      responsible for clearing pending IRQs.  During normal operation, the
      IRQ handler is responsible for clearing spurious IRQs on frozen ports
      and it usually doesn't require any extra code.
      
      Unfortunately, during host initialization, the IRQ handler can end up
      scheduling EH for a port whose SCSI host isn't initialized yet.  This
      results in an OOPS in the SCSI midlayer.  This is a relatively short
      window and scheduling EH for probing is the first thing libata does
      after initialization, so ignoring EH scheduling until initialization
      is complete solves the problem nicely.
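The fix reduces to a guard in the EH scheduling path. The flag names, `struct sk_port`, `schedule_eh()` and `finish_init()` are hypothetical; the sketch only shows that requests arriving while the port is still initializing are dropped, and that probing is scheduled once initialization completes.

```c
/* Hypothetical port flags; the real code keeps its own bits in ap->pflags. */
#define SK_PFLAG_INITIALIZING  (1u << 0)
#define SK_PFLAG_EH_PENDING    (1u << 1)

struct sk_port {
	unsigned int pflags;
};

/* Returns 1 if EH was scheduled, 0 if the request was ignored. */
int schedule_eh(struct sk_port *ap)
{
	/* SCSI host may not be set up yet: drop the request instead of oopsing. */
	if (ap->pflags & SK_PFLAG_INITIALIZING)
		return 0;
	ap->pflags |= SK_PFLAG_EH_PENDING;
	return 1;
}

/* End of host activation: accept EH again, then schedule it for probing. */
void finish_init(struct sk_port *ap)
{
	ap->pflags &= ~SK_PFLAG_INITIALIZING;
	schedule_eh(ap);
}
```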
      
      This problem was spotted by Berck E. Nash in the following thread.
      
        http://thread.gmane.org/gmane.linux.kernel/519412
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Cc: Berck E. Nash <flyboy@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: reimplement suspend/resume support using sdev->manage_start_stop · 9666f400
      Committed by Tejun Heo
      Reimplement suspend/resume support using sdev->manage_start_stop.
      
      * Device suspend/resume is now SCSI layer's responsibility and the
        code is simplified a lot.
      
      * DPM is dropped.  This also simplifies code a lot.  Suspend/resume
        status is port-wide now.
      
      * ata_scsi_device_suspend/resume() and ata_dev_ready() removed.
      
      * Resume now has to wait for the disk to spin up before proceeding.  I
        couldn't find an easy way out, as libata is in EH waiting for the
        disk to be ready and sd is waiting for EH to complete to issue
        START_STOP.
      
      * sdev->manage_start_stop is set to 1 in ata_scsi_slave_config().
        This fixes spindown on shutdown and suspend-to-disk.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  8. 01 May 2007, 2 commits
    • libata: reimplement reset sequencing · 31daabda
      Committed by Tejun Heo
      libata previously depended upon waits in prereset to get resets after
      hotplug right for both spin up and device ready wait.  This was
      necessary both for reliability and speed, as reset was likely to fail
      if initiated too early and each try usually took more than 30 secs to
      fail.  Previous patches fixed the reliability part by fixing status
      and SCR handling in resets.  This patch remedies the speed part by
      improving reset sequencing.
      
      Prereset waiting timeout is adjusted to 10s because spinup wait is
      replaced by reset sequencing and !BSY wait is not as important as
      before.  During boot or module loading where the drive is already
      fully spun up, !BSY wait succeeds immediately, so 10s should be enough
      in most cases.  It matters after hotplugging or other error
      conditions, but in those cases, !BSY wait in prereset simply can't be
      relied upon due to the varied and weird behaviors ATA controllers and
      devices show.
      
      Reset is now driven by the ata_eh_reset_timeouts[] table, which
      contains the timeout for each reset try.  The first reset can be a
      softreset but the following ones are always hardresets if available.
      Each timeout defines the deadline for its reset try.  If a reset try
      fails, reset is retried with the next timeout until the end of the
      timeout table is reached.  If a reset try fails with an error before
      its timeout, libata waits until the deadline of the failed try before
      retrying.
      
      IOW, the timeout table defines a timetable of reset tries such that
      the n'th try always begins at least after the sum of all previous
      timeouts has passed.  The current timetable defines 4 tries and takes
      around 1 minute.
      
      @0	: First try.  This should succeed most of the time during boot.
      @10	: 10s is enough to spin up most consumer harddrives.  Give it
      	  another shot.
      @20	: 20s should spin up > 99% of working drives.  This try has
      	  a 35s timeout so that retarded devices needing long idleness
      	  post reset get more than 30s.
      @55	: Final try with 5s timeout just in case.
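The timetable follows directly from the per-try timeouts. Working back from the start times @0, @10, @20 and @55 plus the final 5s try gives timeouts of 10, 10, 35 and 5 seconds; the table below is that inference, and the helper names are hypothetical.

```c
#include <stddef.h>

/* Per-try timeouts in seconds, inferred from the @0/@10/@20/@55 timetable. */
static const unsigned int reset_timeouts[] = { 10, 10, 35, 5 };
#define NR_RESET_TRIES (sizeof(reset_timeouts) / sizeof(reset_timeouts[0]))

/* Earliest start of try n: the sum of all previous timeouts. */
unsigned int reset_try_start(size_t n)
{
	unsigned int t = 0;

	for (size_t i = 0; i < n && i < NR_RESET_TRIES; i++)
		t += reset_timeouts[i];
	return t;
}

/* Deadline of try n; a try that fails early still waits until this point. */
unsigned int reset_try_deadline(size_t n)
{
	if (n >= NR_RESET_TRIES)
		return reset_try_start(NR_RESET_TRIES);   /* table exhausted */
	return reset_try_start(n) + reset_timeouts[n];
}
```

The last deadline lands at 60 seconds, matching the "around 1 minute" total mentioned above.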
      
      The above timetable is a trade-off between not annoying the device too
      much with frequent resets and taking a reasonable amount of time in
      most cases.  Some controllers may do better with shorter timeouts
      while others may fare better with longer ones, but we just can't rely
      upon LLD writers to test each controller with a wide variety of devices
      using various scenarios.  We need a default behavior which reasonably
      fits most cases.
      
      I've tested the above timetable on a dozen SATA controllers and a few
      PATA controllers with about a dozen different drives from all major
      vendors and 4 different ODDs from three different vendors for both
      boot and hotplug (if available) cases.
      
      Boot probing is not affected unless the device is broken, in which
      case the new code gives up on the port after a minute rather than five
      or nine minutes.  When hotplugging, most devices get detected on the
      first or second try.  Multi-platter drives with long spin up times,
      which sometimes took > 40 secs with the original code, now usually
      come up during the second try and at the latest right after the third
      try @20.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: add deadline support to prereset and reset methods · d4b2bab4
      Committed by Tejun Heo
      Add @deadline to prereset and reset methods and make them honor it.
      ata_wait_ready() which directly takes @deadline is implemented to be
      used as the wait function.  This patch is in preparation for EH timing
      improvements.
      
      * ata_wait_ready() never does busy sleep.  It's only used from EH and
        no wait in EH is that urgent.  This function also prints a 'be
        patient' message automatically after 5 secs of waiting if more than
        3 secs remain until the deadline.
      
      * ata_bus_post_reset() now fails with error code if any of its wait
        fails.  This is important because earlier reset tries will have
        shorter timeout than the spec requires.  If a device fails to
        respond before the short timeout, reset should be retried with
        longer timeout rather than silently ignoring the device.
      
        There are three behavior differences.
      
        1. Timeout is applied to both devices at once, not separately.  This
           is more consistent with what the spec says.
      
        2. When a device passes devchk but fails to become ready before
           the deadline.  Previously, post_reset would just succeed and let
           device classification remove the device.  The new code fails the
           reset, thus causing a reset retry.  After a few times, EH will
           give up and disable the port.
      
        3. When the slave device passes devchk but fails to become
           accessible (TF-wise) after reset.  The original code disables
           dev1 after a 30s timeout and continues as if the device doesn't
           exist, while the patched code fails the reset on the whole port
           rather than proceeding with only the primary device.
      
        If the failing device is suffering transient problems, the new code
        retries reset, which is a better behavior.  If the failing device is
        actually broken, the net effect is identical for it, but not for the
        other device sharing the channel.  In the previous code, reset would
        have succeeded after 30s, thus detecting the working one.  In the new
        code, reset fails and the whole port gets disabled.  IMO, it's a
        pathological case anyway (broken device sharing a bus with a working
        one) and doesn't really matter.
      
      * ata_bus_softreset() is changed to return error code from
        ata_bus_post_reset().  It used to return 0 unconditionally.
      
      * Spin up waiting is to be removed and not converted to honor
        deadline.
      
      * To be on the safe side, deadline is set to 40s for the time being.
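The wait behaviour described in the first bullet can be modelled as below. This is a userspace sketch, not ata_wait_ready() itself: time is a fake millisecond counter, the device is a hypothetical `struct sim_dev` that stays BSY until `ready_at_ms`, and `wait_ready()` only records whether the 'be patient' message would have been printed.

```c
#include <errno.h>

#define SK_ATA_BUSY  0x80   /* BSY bit of the ATA Status register */
#define SK_ATA_READY 0x50   /* DRDY | DSC, a typical idle status  */

/* Hypothetical simulated device: BSY until ready_at_ms. */
struct sim_dev {
	unsigned long ready_at_ms;
};

static unsigned char sim_status(const struct sim_dev *d, unsigned long now)
{
	return now >= d->ready_at_ms ? SK_ATA_READY : SK_ATA_BUSY;
}

/*
 * Deadline-driven ready wait on a fake millisecond clock.  Polls every 50ms
 * (a sleep in real code, never a busy loop), flags the 'be patient' message
 * once after 5s if more than 3s remain, and gives up past the deadline.
 */
int wait_ready(const struct sim_dev *d, unsigned long deadline_ms, int *warned)
{
	unsigned long now = 0;  /* the wait starts at t = 0 */

	*warned = 0;
	for (;;) {
		if (!(sim_status(d, now) & SK_ATA_BUSY))
			return 0;
		if (now > deadline_ms)
			return -EBUSY;
		if (!*warned && now > 5000 && deadline_ms - now > 3000)
			*warned = 1;    /* real code would print the message here */
		now += 50;              /* stands in for msleep(50) */
	}
}
```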
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  9. 29 Apr 2007, 4 commits
    • libata: separate ATA_EHI_DID_RESET into DID_SOFTRESET and DID_HARDRESET · 0d64a233
      Committed by Tejun Heo
      Separate ATA_EHI_DID_RESET into ATA_EHI_DID_SOFTRESET and
      ATA_EHI_DID_HARDRESET.  ATA_EHI_DID_RESET is redefined as OR of the
      two flags.  This patch doesn't introduce any behavior change.  This
      will be used later to determine whether _SDD is necessary or not.
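Redefining the old flag as the OR of the two new ones keeps every existing `flags & DID_RESET` test working. The bit positions and the `did_hardreset()` helper below are illustrative, not libata.h's values.

```c
/* Illustrative bit positions, not the real libata.h values. */
#define SK_EHI_DID_SOFTRESET  (1u << 0)
#define SK_EHI_DID_HARDRESET  (1u << 1)
/* Old name kept as the OR, so "did any reset happen?" checks are unchanged. */
#define SK_EHI_DID_RESET      (SK_EHI_DID_SOFTRESET | SK_EHI_DID_HARDRESET)

/* New users can now tell the two kinds of reset apart. */
int did_hardreset(unsigned int flags)
{
	return (flags & SK_EHI_DID_HARDRESET) != 0;
}
```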
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: add missing call to ->cable_detect() in new EH path · c1c4e8d5
      Committed by Tejun Heo
      ->cable_detect() used to be called only by the old ata_bus_probe()
      path.  Add an invocation to ata_eh_revalidate_and_attach() right after
      the IDENTIFYs are done.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: improve AC_ERR_DEV handling for ->post_internal_cmd · a51d644a
      Committed by Tejun Heo
      ->post_internal_cmd is simplified EH for internal commands.  Its
      primary mission is to stop the controller such that no rogue memory
      access or other activities occur after the internal command is
      released.  It may provide error diagnostics by setting qc->err_mask
      but this hasn't been a requirement.
      
      To ignore SETXFER failure for CFA devices, libata needs to know
      whether a command was failed by the device or for any other reason,
      i.e., internal commands need to get AC_ERR_DEV right.
      
      This patch makes the following changes to AC_ERR_DEV handling and
      ->post_internal_cmd semantics to accommodate this need and simplify
      callback implementation.
      
      1. As long as the correct bits in the result TF registers are set,
         there is no need to set AC_ERR_DEV explicitly.  libata EH core
         takes care of that for both normal and internal commands.
      
      2. The only requirement for ->post_internal_cmd() is to put the
         controller into a quiescent state.  It need not set any err_mask.
      
      3. ata_exec_internal_sg() performs minimal error analysis such that
         AC_ERR_DEV is automatically set as long as result_tf is filled
         correctly.
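Rule 3 can be sketched as follows. The ERR (0x01) and DF (0x20) bits are the standard ATA Status register bits; the err_mask bit values, `analyze_internal_failure()` and the fallback to an AC_ERR_OTHER-like bit when nothing else was recorded are assumptions of this sketch.

```c
/* ATA Status register bits (from the ATA spec). */
#define SK_ATA_ERR  0x01
#define SK_ATA_DF   0x20

/* Illustrative err_mask bits; libata.h defines the real AC_ERR_* values. */
#define SK_AC_ERR_DEV    (1u << 0)
#define SK_AC_ERR_OTHER  (1u << 8)

/*
 * Minimal analysis after a failed internal command: derive the device-error
 * bit from the result TF status so ->post_internal_cmd() only has to
 * quiesce the controller.
 */
unsigned int analyze_internal_failure(unsigned int err_mask,
				      unsigned char result_status)
{
	if (result_status & (SK_ATA_ERR | SK_ATA_DF))
		err_mask |= SK_AC_ERR_DEV;
	if (!err_mask)
		err_mask |= SK_AC_ERR_OTHER;   /* assumed fallback: failed, reason unknown */
	return err_mask;
}
```

With this in place, a caller such as the CFA SETXFER path could check whether `err_mask` is exactly the device-error bit and ignore that case.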
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: hardreset on SERR_INTERNAL · 771b8dad
      Committed by Tejun Heo
      There was a rare report where SB600 reported SERR_INTERNAL and SRST
      couldn't get it out of the failure mode.  Hardreset on SERR_INTERNAL.
      As the problem is intermittent, whether this fixes the problem or not
      hasn't been verified yet, but hardresetting the channel on internal
      error is a good idea anyway.
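In an SError analysis step, the change is one more bit test. SERR_INTERNAL corresponds to bit 11 (the "E" internal error bit) of the SATA SError register; the action bit values and `serror_to_action()` are illustrative.

```c
#include <stdint.h>

/* SError ERR field bit 11 ("E", internal error) per the SATA spec. */
#define SK_SERR_INTERNAL  (1u << 11)

/* Illustrative action bits. */
#define SK_EH_SOFTRESET   (1u << 0)
#define SK_EH_HARDRESET   (1u << 1)

/* Escalate to hardreset on internal errors; SRST may not clear that state. */
unsigned int serror_to_action(uint32_t serror, unsigned int action)
{
	if (serror & SK_SERR_INTERNAL)
		action |= SK_EH_HARDRESET;
	return action;
}
```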
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  10. 04 Apr 2007, 1 commit
  11. 28 Mar 2007, 1 commit
    • libata: IDENTIFY backwards for drive side cable detection · 8c3c52a8
      Committed by Tejun Heo
      For drive side cable detection to work correctly, drives need to be
      identified backwards such that the slave device releases PDIAG- before
      the master drive tries to detect the cable type.  ata_bus_probe() was
      fixed by commit f31f0cc2 but the new EH path wasn't.  This patch makes
      the new EH path do IDENTIFY backwards.
      
      ata_dev_configure() for new devices is still performed master first.
      This is to keep the detection messages in forward order.
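The ordering can be illustrated with two loops that record which device is touched when. The trace struct and `revalidate_and_attach()` are hypothetical; the comments name the libata calls each step stands for.

```c
#define SK_MAX_DEVS 2   /* device 0 = master, device 1 = slave */

/* Hypothetical trace of the order in which devices are handled. */
struct probe_trace {
	int identify_order[SK_MAX_DEVS];
	int configure_order[SK_MAX_DEVS];
};

void revalidate_and_attach(struct probe_trace *t)
{
	int n = 0;

	/* IDENTIFY slave first so it releases PDIAG- before the master looks. */
	for (int devno = SK_MAX_DEVS - 1; devno >= 0; devno--)
		t->identify_order[n++] = devno;      /* ata_dev_read_id() */

	/* Configure master first to keep the messages in forward order. */
	n = 0;
	for (int devno = 0; devno < SK_MAX_DEVS; devno++)
		t->configure_order[n++] = devno;     /* ata_dev_configure() */
}
```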
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  12. 19 Mar 2007, 1 commit
  13. 03 Mar 2007, 1 commit
  14. 21 Feb 2007, 4 commits
    • libata: s/ap->id/ap->print_id/g · 44877b4e
      Committed by Tejun Heo
      ata_port has two different id fields - id and port_no.  id is
      system-wide 1-based unique id for the port while port_no is 0-based
      host-wide port number.  The former is primarily used to identify the
      ATA port to the user in printk messages while the latter is used in
      various places in libata core and LLDs to index the port inside the
      host.
      
      The two fields feel quite similar and sometimes ap->id is used in
      place of ap->port_no, which is very difficult to spot.  This patch
      renames ap->id to ap->print_id to reduce the possibility of such bugs.
      
      Some printk messages are adjusted such that id string (ata%u[.%u])
      isn't printed twice and/or to use ata_*_printk() instead of hardcoded
      id format.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: put some intelligence into EH speed down sequence · 7d47e8d4
      Committed by Tejun Heo
      The current EH speed down code is more of a proof that the EH
      framework is capable of adjusting transfer speed in response to error.
      This patch puts some intelligence into the EH speed down sequence.
      The rules are:
      
      * If there have been more than three timeouts, HSM violations or
        unclassified DEV errors for known supported commands during the last
        10 mins, NCQ is turned off.
      
      * If there have been more than three timeouts or HSM violations for
        known supported commands, transfer mode is slowed down.  If DMA is
        active, it is first slowed down by one grade (e.g. UDMA133->100).
        If that doesn't help, it's slowed down to the 40c limit (UDMA33).
        If PIO is active, it's slowed down by one grade first.  If that
        doesn't help, PIO0 is forced.  Note that this rule does not change
        transfer mode: DMA is never degraded into PIO by this rule.
      
      * If there have been more than ten ATA bus, timeout, HSM violation or
        unclassified device errors for known supported commands && speeding
        down DMA mode didn't help, the device is forced into PIO mode.  Note
        that this rule is considered only for PATA devices and is pretty
        difficult to trigger.
      
      One error can only trigger one rule at a time.  After a rule is
      triggered, error history is cleared such that the next speed down
      happens only after some number of errors are accumulated.  This makes
      sense because now speed down is done in bigger stride.
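The three rules can be sketched as a pure function over per-category error counts. Everything here is hypothetical: the struct, the flag names and the `ncq_on`, `is_pata` and `dma_spdn_failed` inputs. The 10-minute window is assumed to apply to all three rules, and when several rules hold, the sketch picks the first in the order listed above, which is an assumption about how "one rule at a time" is resolved.

```c
/* Error counts over the (assumed 10-minute) history window, by category. */
struct err_counts {
	unsigned int tout_hsm;   /* timeouts and HSM violations */
	unsigned int unk_dev;    /* unclassified device errors  */
	unsigned int ata_bus;    /* ATA bus errors              */
};

enum {
	SPDN_NCQ_OFF    = 1 << 0,
	SPDN_SPEED_DOWN = 1 << 1,
	SPDN_FORCE_PIO  = 1 << 2,
};

/* Which of the three rules currently hold. */
unsigned int speed_down_verdict(const struct err_counts *c, int ncq_on,
				int is_pata, int dma_spdn_failed)
{
	unsigned int v = 0;

	if (ncq_on && c->tout_hsm + c->unk_dev > 3)
		v |= SPDN_NCQ_OFF;
	if (c->tout_hsm > 3)
		v |= SPDN_SPEED_DOWN;
	if (is_pata && dma_spdn_failed &&
	    c->ata_bus + c->tout_hsm + c->unk_dev > 10)
		v |= SPDN_FORCE_PIO;
	return v;
}

/* Apply at most one rule, then clear the history so the next step needs new errors. */
unsigned int speed_down_apply(struct err_counts *c, unsigned int verdict)
{
	unsigned int rule = verdict & -verdict;   /* lowest set bit = first listed */

	if (rule)
		c->tout_hsm = c->unk_dev = c->ata_bus = 0;
	return rule;
}
```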
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: improve probe failure handling · 4ae72a1e
      Committed by Tejun Heo
      * Move forcing device to PIO0 on device disable into
        ata_dev_disable().  This makes both old and new EHs act the same
        way.
      
      * Speed down only PIO mode on probe failure.  All commands used during
        probing are PIO commands.  There's no point in speeding down DMA.
      
      * Retry at least once after -ENODEV.  Some devices report garbled
        IDENTIFY data after certain events.  This shouldn't cause device
        detach and re-attach.
      
      * Rearrange EH failure path for simplicity.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
    • libata: improve ata_down_xfermask_limit() · 458337db
      Committed by Tejun Heo
      Make ata_down_xfermask_limit() accept @sel instead of @force_pio0.
      @sel selects how the xfermask limit will be adjusted.  The following
      selectors are defined.
      
      * ATA_DNXFER_PIO	: only speed down PIO
      * ATA_DNXFER_DMA	: only speed down DMA, don't cause transfer mode change
      * ATA_DNXFER_40C	: apply 40c cable limit
      * ATA_DNXFER_FORCE_PIO	: force PIO
      * ATA_DNXFER_FORCE_PIO0	: force PIO0 (same as original with @force_pio0 == 1)
      * ATA_DNXFER_ANY	: same as original with @force_pio0 == 0
      
      Currently, only ANY and FORCE_PIO0 are used to maintain the original
      behavior.  Other selectors will be used later to improve EH speed down
      sequence.
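A simplified sketch of the selector semantics, using only PIO and UDMA masks (MWDMA and the ANY selector are left out). The names, the mask layout and the -ENOENT convention for "nothing changed" are assumptions of the sketch, not the real ata_down_xfermask_limit(); UDMA33 corresponds to UDMA0..2, hence the 0x07 mask.

```c
#include <errno.h>

/* Simplified masks: bit n of pio = PIOn, bit n of udma = UDMAn (no MWDMA). */
struct xfer_limit {
	unsigned int pio;
	unsigned int udma;
};

enum dnxfer_sel {
	DNXFER_PIO,        /* only speed down PIO                        */
	DNXFER_DMA,        /* only speed down DMA, never drop to PIO     */
	DNXFER_40C,        /* apply the 40c cable limit (UDMA33)         */
	DNXFER_FORCE_PIO,  /* drop all DMA modes                         */
	DNXFER_FORCE_PIO0, /* drop all DMA modes and allow only PIO0     */
};

/* Clear the highest set bit of mask. */
static unsigned int drop_highest(unsigned int mask)
{
	unsigned int hb = mask;

	while (hb & (hb - 1))
		hb &= hb - 1;   /* clear lowest set bits until one remains */
	return mask & ~hb;
}

int down_xfermask_limit(struct xfer_limit *lim, enum dnxfer_sel sel)
{
	struct xfer_limit n = *lim;

	switch (sel) {
	case DNXFER_PIO:
		n.pio = drop_highest(n.pio);
		break;
	case DNXFER_DMA:
		n.udma = drop_highest(n.udma);
		if (!n.udma)
			return -ENOENT;   /* would mean falling back to PIO */
		break;
	case DNXFER_40C:
		n.udma &= 0x07;           /* UDMA0..UDMA2 */
		break;
	case DNXFER_FORCE_PIO0:
		n.pio &= 1;
		/* fall through */
	case DNXFER_FORCE_PIO:
		n.udma = 0;
		break;
	}

	/* Refuse results that leave no PIO mode or change nothing. */
	if (!n.pio || (n.pio == lim->pio && n.udma == lim->udma))
		return -ENOENT;
	*lim = n;
	return 0;
}
```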
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  15. 10 Feb 2007, 1 commit
    • libata: kill qc->nsect and cursect · 726f0785
      Committed by Tejun Heo
      libata used two separate sets of variables to record request size and
      current offset for ATA and ATAPI.  This is confusing and fragile.
      This patch replaces qc->nsect/cursect with qc->nbytes/curbytes and
      kills them.  Also, ata_pio_sector() is updated to use bytes for
      qc->cursg_ofs instead of sectors.  The field used to be in bytes for
      ATAPI and in sectors for ATA.
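With everything in bytes, advancing one PIO sector is plain offset arithmetic. The struct, `pio_sector_advance()` and the assumption that every scatterlist entry is a whole number of 512-byte sectors are simplifications for illustration; the data copy itself is omitted.

```c
#include <stddef.h>

#define SK_SECT_SIZE 512   /* ATA sector size in bytes */

/* Hypothetical, simplified command state: everything is tracked in bytes. */
struct sk_qc {
	const size_t *sg_len;    /* length of each scatterlist entry */
	size_t n_sg;
	size_t cursg;            /* current entry                    */
	size_t cursg_ofs;        /* byte offset inside that entry    */
	size_t nbytes;           /* total request size               */
	size_t curbytes;         /* bytes transferred so far         */
};

/* Account for one PIO sector; returns 0, or -1 when the request is done. */
int pio_sector_advance(struct sk_qc *qc)
{
	if (qc->curbytes >= qc->nbytes || qc->cursg >= qc->n_sg)
		return -1;
	/* (the sector would be copied at sg entry cursg, offset cursg_ofs) */
	qc->curbytes += SK_SECT_SIZE;
	qc->cursg_ofs += SK_SECT_SIZE;
	if (qc->cursg_ofs >= qc->sg_len[qc->cursg]) {
		qc->cursg++;             /* move on to the next entry */
		qc->cursg_ofs = 0;
	}
	return 0;
}
```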
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  16. 27 Jan 2007, 1 commit