1. 23 Apr, 2010 2 commits
  2. 21 Jan, 2010 1 commit
  3. 03 Dec, 2009 1 commit
    • libata: retry failed FLUSH if device didn't fail it · 6013efd8
      Tejun Heo committed
      If ATA device failed FLUSH, it means that the device failed to write
      out some amount of data and the error needs to be reported to upper
      layers. As retries can't recover the lost data, FLUSH failures need to
      be reported immediately in general.
      
      However, if FLUSH fails due to transmission errors, the FLUSH needs to
      be retried; otherwise, filesystems may switch to RO mode and/or raid
      array may drop a drive for a random transmission glitch.
      
      This condition can be reproduced fairly easily with ext4 on certain
      ahci controllers which go through a PHY event after a powersave mode
      switch.  The mode switch is often closely followed by a flush from
      the filesystem, and the FLUSH fails with an ATA bus error, which
      makes the filesystem code believe that data was lost and drop to RO
      mode.  This was reported in the following bugzilla bug.
      
        http://bugzilla.kernel.org/show_bug.cgi?id=14543
      
      This patch makes libata EH retry FLUSH if it wasn't failed by the
      device.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
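The retry rule described in this commit can be sketched as a small decision function. This is only an illustration under stated assumptions: the error-mask names and values, the function name, and the tries budget are hypothetical and are not taken from the libata source.

```c
#include <stdbool.h>

/* Hypothetical error-mask bits; names and values are illustrative only. */
enum {
    ERR_DEV     = 1 << 0,  /* the device itself reported the failure */
    ERR_ATA_BUS = 1 << 1,  /* transmission (bus) error */
    ERR_TIMEOUT = 1 << 2   /* command timed out */
};

/*
 * Decide whether a failed FLUSH should be retried.
 * A failure reported by the device means data was lost, so it is never
 * retried.  Any other failure is treated as a transmission problem and is
 * retried while the (assumed) tries budget lasts.
 */
static bool flush_should_retry(unsigned int err_mask, int tries_left)
{
    if (err_mask == 0)          /* command succeeded, nothing to retry */
        return false;
    if (err_mask & ERR_DEV)     /* device failed it: report immediately */
        return false;
    return tries_left > 0;      /* glitch on the wire: retry if allowed */
}
```

With this shape, a bus error with tries remaining yields a retry, while any mask that includes the device-error bit is reported to upper layers at once.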
  4. 16 Oct, 2009 1 commit
    • libata: fix PMP initialization · 4f7c2874
      Tejun Heo committed
      Commit 842faa6c fixed error handling
      during attach by not committing detected device class to dev->class
      while attaching a new device.  However, this change missed the PMP
      class check in the configuration loop causing a new PMP device to go
      through ata_dev_configure() as if it were an ATA or ATAPI device.
      
      As a PMP device doesn't have regular IDENTIFY data, this makes
      ata_dev_configure() try to configure a PMP device using invalid
      data.  For the most part, it wasn't too harmful and went unnoticed but
      this ends up clearing dev->flags which may have ATA_DFLAG_AN set by
      sata_pmp_attach().  This means that SATA_PMP_FEAT_NOTIFY ends up being
      disabled on PMPs, which breaks hotplug support on PMPs that honor the
      flag.
      
      This problem was discovered and reported by Ethan Hsiao.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Ethan Hsiao <ethanhsiao@jmicron.com>
      Cc: stable@kernel.org
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  5. 07 Oct, 2009 1 commit
    • libata: fix incorrect link online check during probe · 3b761d3d
      Tejun Heo committed
      While trying to work around spurious detection retries for
      non-existent devices on slave links, commit
      816ab897 incorrectly added link
      offline check logic before ata_eh_thaw() was called.  This means that
      if an occupied link goes down briefly at the time that offline check
      was performed, device class will be cleared to ATA_DEV_NONE and libata
      wouldn't retry thus failing detection of the device.
      
      The offline check should be done after the port is thawed together
      with online check so that such link glitches can be detected by the
      interrupt handler and handled properly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Tim Blechmann <tim@klingt.org>
      Cc: stable@kernel.org
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  6. 02 Sep, 2009 3 commits
    • libata: add command name parsing for error output · 6521148c
      Robert Hancock committed
      This patch improves libata's output for error/notification messages
      to allow easier comprehension and debugging:
      
      When ATAPI commands issued through the SCSI layer fail, use SCSI
      functions to print the CDB in human-readable form instead of just
      dumping out the CDB in hex.
      
      Print out the name of the failed command (as defined by the ATA
      specification) in error handling output along with the raw register
      contents.
      
      When reporting status of ACPI taskfile commands executed on resume,
      also output the names of the commands being executed (or not) in
      readable form.
      
      Since the extra data for printing command names increases kernel
      size slightly, a config option has been added to allow disabling
      command name output (as well as some of the error register parsing)
      for those highly sensitive to kernel text size.
      Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • libata: clear eh_info on reset completion · 1e641060
      Tejun Heo committed
      Resets are done with port frozen but some controllers still issue
      interrupts during reset and they may end up recording error conditions
      in ehi leading to unnecessary EH retrials.
      
      This patch makes ata_eh_reset() clear ehi on reset completion.  As
      reset is the most severe recovery action, there's nothing to lose by
      clearing ehi on its completion.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • [libata] EH: freeze port before aborting commands · 54c38444
      Jeff Garzik committed
      Call the ->freeze() hook before aborting qc's, because some hardware
      requires special handling prior to accessing the taskfile registers
      (for diagnosis/analysis/reset).  Most notably, hardware may wish to
      disable the DMA engine or interrupts in the ->freeze() hook.
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  7. 29 Jul, 2009 1 commit
  8. 15 Jul, 2009 1 commit
  9. 13 Jun, 2009 1 commit
  10. 12 May, 2009 2 commits
    • libata: clear ering on resume · 6f9c1ea2
      Tejun Heo committed
      Error timestamps are in jiffies, which doesn't run while suspended, and
      PHY events during resume aren't too uncommon.  When the two are
      combined, it can lead to unnecessary speed downs if the machine is
      suspended and resumed repeatedly.  Clear error history on resume.
      
      This was reported and verified in bnc#486803 by Vladimir Botka.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Vladimir Botka <vbotka@novell.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • libata: fix attach error handling · 842faa6c
      Tejun Heo committed
      New device attach path in ata_eh_revalidate_and_attach() is divided
      into two separate loops because ATA requires IDENTIFY to be issued to
      slave first while the user expects to see device probe messages from
      the master device.  new_mask is used to track which devices are the
      new ones between the first loop and the second.
      
      This usually works well but if an error occurs during configuration
      stage, ata_eh_revalidate_and_attach() returns with an error code and
      forgets new_mask.  On the retry run, dev->class is set and new_mask
      for the device is clear, so the device just gets revalidated and thus
      ends up skipping post-configuration procedure including scheduling of
      SCSI_HOTPLUG for the device.  When this occurs, ATA part of probing
      works fine but SCSI probing usually doesn't happen and makes the
      device unreachable.
      
      The behavior has been around for a very long time but it has been
      uncovered with the recent addition of 1_5_GBPS horkage which uses
      -EAGAIN return value from ata_dev_configure() to restart the probing
      sequence after forcing cable speed.
      
      This can be fixed by making sure dev->class is permanently set only
      after all configurations are successfully complete.  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Tim Connors <tconnors+linuxkml@astro.swin.edu.au>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  11. 25 Mar, 2009 1 commit
    • [libata] Improve timeout handling · c96f1732
      Alan Cox committed
      On a timeout, call a device-specific handler early in the recovery so that
      we can complete and process successful commands which timed out due to IRQ
      loss or the like rather more elegantly.
      
      [Revised to exclude the timeout handling on a few devices that inherit from
       SFF but are not SFF enough to use the default timeout handler]
      Signed-off-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  12. 05 Mar, 2009 2 commits
    • libata: make sure port is thawed when skipping resets · d6515e6f
      Tejun Heo committed
      When SCR access is available and the link is offline, softreset is
      skipped as it only wastes time and some controllers don't respond very
      well.  However, the skip path forgot to thaw the port, which not only
      blocks further event notification from the port but also causes
      repeated EH invocations on the same event on drivers which rely on
      ->thaw() to clear events if the IRQ is shared with another device or
      port.
      
      This problem has always been there but is uncovered by recent sata_nv
      nf2/3 change which dropped hardreset support while maintaining SCR
      access.  nf2/3 doesn't clear hotplug event mask from the interrupt
      handler but relies on ->thaw() to clear them.  When the hardreset was
      there, the reset action was never skipped and the port was always
      thawed but, with the hardreset gone, ->prereset() determines that
      there's no need for softreset and both ->softreset() and ->thaw() are
      skipped.  This leads to a stuck hotplug event in the IRQ status register
      triggering a hotplug event whenever an interrupt is delivered on the same
      IRQ.  As the controller shares the same IRQ for both ports, this happens
      on every IO if one port is occupied and the other isn't.
      
      This patch fixes the problem by making sure that the port is thawed on
      reset-skip path.
      
      bko#11615 reports this problem.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Robert Hancock <hancockrwd@gmail.com>
      Reported-by: Dan Andresan <danyer@gmail.com>
      Reported-by: Arne Woerner <arne_woerner@yahoo.com>
      Reported-by: Stefan Lippers-Hollmann <s.L-H@gmx.de>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • libata: don't use on-stack sense buffer · b5357081
      Tejun Heo committed
      sense_buffer is used as a DMA target and shouldn't be allocated on
      the stack.  Use ap->sector_buf instead.  This problem was spotted by
      Chuck Ebbert.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Chuck Ebbert <cebbert@redhat.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  13. 03 Feb, 2009 6 commits
    • libata: add no penalty retry request for EH device handling routines · cf9a590a
      Tejun Heo committed
      Let -EAGAIN from EH device handling routines trigger EH retry without
      consuming its tries count.  This will be used to implement link SPD
      horkage which requires hardreset to adjust SPD without affecting other
      EH decisions.  As it bypasses the forward progress guarantee provided
      by the tries count, the requester is responsible for ensuring forward
      progress.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
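The "no penalty" retry described above can be sketched as a loop in which -EAGAIN restarts the step without touching the tries counter. The function names, the callback type, and the demo step are hypothetical and only model the idea; they are not the libata EH code.

```c
#include <errno.h>

/*
 * Hypothetical recovery step: returns 0 on success, -EAGAIN to request a
 * free retry, or another negative errno on a real failure.
 */
typedef int (*recover_fn)(void *ctx);

/*
 * Run the step until it succeeds or the tries budget is used up.
 * -EAGAIN restarts the step without consuming a try, so the step itself
 * must guarantee that it eventually stops returning -EAGAIN.
 */
static int run_with_tries(recover_fn step, void *ctx, int max_tries)
{
    int tries = max_tries;
    int rc = -EIO;

    while (tries > 0) {
        rc = step(ctx);
        if (rc == 0)
            return 0;
        if (rc == -EAGAIN)
            continue;           /* no penalty: tries is left untouched */
        tries--;                /* real failure: consume one try */
    }
    return rc;
}

/* Demo step: asks for `again` free retries, then fails `fail` times. */
struct demo_step_state {
    int again;
    int fail;
    int calls;
};

static int demo_step(void *ctx)
{
    struct demo_step_state *s = ctx;

    s->calls++;
    if (s->again > 0) {
        s->again--;
        return -EAGAIN;
    }
    if (s->fail > 0) {
        s->fail--;
        return -EIO;
    }
    return 0;
}
```

The demo step shows why the commit message puts the burden of forward progress on the requester: a step that returned -EAGAIN forever would never leave this loop.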
    • libata: improve probe failure handling · c2c7a89c
      Tejun Heo committed
      When link is flaky at high speed, it isn't uncommon for a device to
      repeatedly fail probing sequence early after successfully negotiating
      high link speed.  This often leads to consecutive hotplug events
      without successful probing.
      
      This patch improves libata EH such that it remembers probing trials
      and if there have been more than two unsuccessful trials in the past
      60 seconds, slows down link speed to 1.5Gbps.
      
      As link speed negotiation is the duty of the PHY layer proper, the
      goal of this fallback mechanism is to provide a last resort when
      everything else fails, which unfortunately happens not too
      infrequently, so there is no fancy 6->3->1.5 speeding down or
      highest-successful-transmission-speed-seen kind of logic (yet).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
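The trial-counting rule above (more than two unsuccessful probes within 60 seconds forces 1.5Gbps) might look roughly like the following. The structure, the ring size, and the use of plain seconds instead of jiffies are assumptions made to keep the sketch self-contained.

```c
#include <stdbool.h>

#define TRIAL_WINDOW_SECS 60   /* window taken from the commit message */
#define TRIAL_LIMIT       2    /* "more than two" triggers the fallback */
#define RING_SIZE         8    /* hypothetical history depth */

/* Hypothetical record of failed probe attempts. */
struct probe_history {
    long stamps[RING_SIZE];    /* time of each failed trial, in seconds */
    int  count;                /* total trials recorded so far */
};

/* Remember one failed probing trial, overwriting the oldest entry. */
static void record_failed_probe(struct probe_history *h, long now)
{
    h->stamps[h->count % RING_SIZE] = now;
    h->count++;
}

/* True if more than TRIAL_LIMIT trials failed within the window. */
static bool should_limit_to_1_5gbps(const struct probe_history *h, long now)
{
    int n = h->count < RING_SIZE ? h->count : RING_SIZE;
    int recent = 0;

    for (int i = 0; i < n; i++)
        if (now - h->stamps[i] <= TRIAL_WINDOW_SECS)
            recent++;
    return recent > TRIAL_LIMIT;
}
```

Old failures age out of the window on their own, so an occasional failed probe never triggers the slowdown by itself.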
    • libata: add @spd_limit to sata_down_spd_limit() · a07d499b
      Tejun Heo committed
      Add @spd_limit to sata_down_spd_limit() so that the caller can specify
      the SPD limit it wants.  This parameter doesn't get in the way even
      when it's too low.  The closest possible limit is applied.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
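One way to picture "the closest possible limit is applied" is with a bitmask of allowed link speeds. The helper below is a hypothetical model, not sata_down_spd_limit() itself; the bit layout (bit n-1 stands for speed generation n) is an assumption of this sketch.

```c
/*
 * mask:      bit n-1 set means speed generation n is still allowed
 *            (1 = 1.5 Gbps, 2 = 3 Gbps, ...).
 * spd_limit: highest generation the caller wants, or 0 for "no request".
 *
 * Returns the new mask.  If the requested limit is lower than every
 * allowed speed, the lowest allowed speed is kept instead of failing.
 */
static unsigned int apply_spd_limit(unsigned int mask, unsigned int spd_limit)
{
    unsigned int capped;

    if (mask == 0 || spd_limit == 0 || spd_limit >= 32)
        return mask;

    capped = mask & ((1u << spd_limit) - 1);   /* drop faster speeds */
    if (capped)
        return capped;

    return mask & (~mask + 1u);                /* keep lowest set bit */
}
```

For example, with generations 2 and 3 allowed (mask 0x6) and a requested limit of 1, the result is 0x2: the request cannot be met exactly, so the slowest available speed is chosen.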
    • libata: clear dev->ering in smarter way · 99cf610a
      Tejun Heo committed
      dev->ering used to be cleared together with the rest of ata_device in
      ata_dev_init() which is called whenever a probing event occurs.
      dev->ering is about to be used to track probing failures so it needs
      to remain persistent over multiple probing events.  This patch
      achieves this by doing the following.
      
      * Instead of CLEAR_OFFSET, define CLEAR_BEGIN and CLEAR_END and only
        clear between BEGIN and END.  ering is moved after END.  The split
        of persistent area is to allow hotter items to remain at the head.
      
      * ering is explicitly cleared on ata_dev_disable() and when device
        attach succeeds.  So, ering is persistent through a device's life
        time (unless explicitly cleared of course) and also through periods
        in between disablement of an attached device and successful detection
        of the next one.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
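The BEGIN/END clearing scheme can be illustrated with offsetof() and memset(). The structure below is a heavily simplified, hypothetical stand-in for ata_device; the field names and macro names are assumptions of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified device structure. */
struct demo_dev {
    int  link_id;       /* persistent and hot: stays at the head */
    /* ---- cleared region begins here ---- */
    int  devclass;
    long n_sectors;
    /* ---- cleared region ends here ---- */
    int  ering[4];      /* persistent error history, placed after END */
};

#define DEMO_CLEAR_BEGIN offsetof(struct demo_dev, devclass)
#define DEMO_CLEAR_END   offsetof(struct demo_dev, ering)

/* Reset only the fields between BEGIN and END; the rest survives. */
static void demo_dev_init(struct demo_dev *dev)
{
    memset((char *)dev + DEMO_CLEAR_BEGIN, 0,
           DEMO_CLEAR_END - DEMO_CLEAR_BEGIN);
}
```

Because the boundaries are expressed as field offsets, moving a field across the BEGIN or END marker is enough to change whether it persists across probing events.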
    • libata: move ata_dev_disable() to libata-eh.c · 678afac6
      Tejun Heo committed
      ata_dev_disable() is about to be more tightly integrated into EH
      logic.  Move it to libata-eh.c.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • libata: fix EH device failure handling · d89293ab
      Tejun Heo committed
      The dev->pio_mode > XFER_PIO_0 test is there to avoid unnecessary
      speed down warning messages but it accidentally disabled SATA link spd
      down during configuration phase after reset where PIO mode is always
      zero.
      
      This patch fixes the problem by moving the test where it belongs.
      This makes libata probing sequence behave better when the connection
      is flaky at higher link speeds which isn't too uncommon for eSATA
      devices.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  14. 29 Dec, 2008 2 commits
    • libata: perform port detach in EH · ece180d1
      Tejun Heo committed
      ata_port_detach() first made sure EH saw ATA_PFLAG_UNLOADING and then
      assumed EH context belongs to it and performed detach operation
      itself.  However, UNLOADING doesn't disable all of EH and this could
      lead to problems including triggering WARN_ON()'s in EH path.
      
      This patch makes port detach behave more like other EH actions such
      that ata_port_detach() requests EH to detach and waits for completion.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • libata: beef up iterators · 1eca4365
      Tejun Heo committed
      There currently are the following looping constructs.
      
      * __ata_port_for_each_link() for all available links
      * ata_port_for_each_link() for edge links
      * ata_link_for_each_dev() for all devices
      * ata_link_for_each_dev_reverse() for all devices in reverse order
      
      Now there's a need for a looping construct which is similar to
      __ata_port_for_each_link() but iterates over PMP links before the host
      link.  Instead of adding another one with a long name, do the following
      cleanup.
      
      * Implement and export ata_link_next() and ata_dev_next() which take a
        @mode parameter and can be used to build custom loops.
      * Implement ata_for_each_link() and ata_for_each_dev() which take
        looping mode explicitly.
      
      The following iteration modes are implemented.
      
      * ATA_LITER_EDGE		: loop over edge links
      * ATA_LITER_HOST_FIRST		: loop over all links, host link first
      * ATA_LITER_PMP_FIRST		: loop over all links, PMP links first
      
      * ATA_DITER_ENABLED		: loop over enabled devices
      * ATA_DITER_ENABLED_REVERSE	: loop over enabled devices in reverse order
      * ATA_DITER_ALL			: loop over all devices
      * ATA_DITER_ALL_REVERSE		: loop over all devices in reverse order
      
      This change removes explicit device enabledness checks from many loops
      and makes it clear which ones are iterated over in which direction.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
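A mode-driven iterator in the spirit of ata_dev_next() and ata_for_each_dev() could be built as follows. The types, the demo_ prefixed names, and the enum values are hypothetical; only the four device iteration modes come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* The four device iteration modes listed in the commit message. */
enum demo_diter {
    DEMO_DITER_ENABLED,
    DEMO_DITER_ENABLED_REVERSE,
    DEMO_DITER_ALL,
    DEMO_DITER_ALL_REVERSE
};

/* Hypothetical, simplified device and link types. */
struct demo_device { bool enabled; int devno; };
struct demo_link   { struct demo_device device[2]; int max_devices; };

/*
 * Return the device following @dev in the order selected by @mode, or the
 * first one when @dev is NULL.  Returns NULL when the walk is finished.
 */
static struct demo_device *demo_dev_next(struct demo_device *dev,
                                         struct demo_link *link,
                                         enum demo_diter mode)
{
    bool reverse = mode == DEMO_DITER_ENABLED_REVERSE ||
                   mode == DEMO_DITER_ALL_REVERSE;
    bool enabled_only = mode == DEMO_DITER_ENABLED ||
                        mode == DEMO_DITER_ENABLED_REVERSE;
    struct demo_device *first, *last;

    if (link->max_devices <= 0)
        return NULL;
    first = link->device;
    last = link->device + link->max_devices - 1;

    for (;;) {
        if (!dev)
            dev = reverse ? last : first;       /* start of the walk */
        else if (dev == (reverse ? first : last))
            return NULL;                        /* walked off the end */
        else
            dev += reverse ? -1 : 1;
        if (!enabled_only || dev->enabled)      /* skip disabled ones */
            return dev;
    }
}

/* Loop macro built on top of the next() helper. */
#define demo_for_each_dev(dev, link, mode)                      \
    for ((dev) = demo_dev_next(NULL, (link), (mode)); (dev);    \
         (dev) = demo_dev_next((dev), (link), (mode)))
```

Putting the enabledness filter inside the next() helper is what lets callers drop their own checks: the loop body only ever sees devices that match the chosen mode.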
  15. 11 Nov, 2008 1 commit
    • libata: fix last_reset timestamp handling · 19b72321
      Tejun Heo committed
      ehc->last_reset is used to ensure that resets are not issued too
      close to each other.  It's initialized to jiffies minus one minute
      on EH entry.  However, when new links are initialized after PMP is
      probed, new links have zero for this timestamp resulting in long wait
      depending on the current jiffies.
      
      This patch makes last_reset considered iff ATA_EHI_DID_RESET is set, in
      which case last_reset is always initialized.  As an added precaution,
      WARN_ON() is added so that a warning is printed if last_reset is
      in the future.
      
      This problem was spotted and debugged by Shane Huang.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Shane Huang <Shane.Huang@amd.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
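The timestamp rule can be sketched as a helper that computes how long to wait before the next reset. The flag value, the function name, and the tick units are hypothetical; the sketch only shows why an uninitialized (zero) last_reset must be ignored unless a reset was actually done.

```c
#define DEMO_EHI_DID_RESET (1u << 0)   /* hypothetical flag value */

/*
 * Return the number of ticks to wait before another reset may be issued.
 * last_reset is trusted only when the DID_RESET flag says it was set;
 * otherwise a fresh link with last_reset == 0 would cause a bogus wait.
 */
static unsigned long demo_reset_wait(unsigned int flags,
                                     unsigned long last_reset,
                                     unsigned long now,
                                     unsigned long min_gap)
{
    unsigned long elapsed;

    if (!(flags & DEMO_EHI_DID_RESET))
        return 0;                   /* timestamp not meaningful yet */

    elapsed = now - last_reset;     /* unsigned math survives wraparound */
    return elapsed >= min_gap ? 0 : min_gap - elapsed;
}
```

In this model a last_reset that lies in the future shows up as a huge elapsed value and yields no wait, which is the case the commit additionally flags with WARN_ON().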
  16. 28 Oct, 2008 2 commits
  17. 23 Oct, 2008 3 commits
    • libata: set device class to NONE if phys_offline · 816ab897
      Tejun Heo committed
      Reset methods don't have access to phys link status for slave links
      and may incorrectly indicate device presence causing unnecessary probe
      failures for unoccupied links.  This patch clears device class to NONE
      during post-reset processing if phys link is offline.
      
      As on/offlineness semantics is strictly defined and used in multiple
      places by the core layer, this won't change behavior for drivers which
      don't use slave links.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • libata-eh: fix slave link EH action mask handling · a568d1d2
      Tejun Heo committed
      Slave link action mask is transferred to master link and all the EH
      actions are taken by the master link.  ata_eh_about_to_do() and
      ata_eh_done() are called with ATA_EH_ALL_ACTIONS to clear the slave
      link actions during transfer.  This always sets ATA_PFLAG_RECOVERED
      flag causing spurious "EH complete" messages.
      
      Don't set ATA_PFLAG_RECOVERED for slave link actions.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • libata: transfer EHI control flags to slave ehc.i · 848e4c68
      Tejun Heo committed
      ATA_EHI_NO_AUTOPSY and ATA_EHI_QUIET are used to control the behavior
      of EH.  As only the master link is visible outside EH, these flags are
      set only for the master link although they should also apply to the
      slave link, which causes spurious EH messages during probe and
      suspend/resume.
      
      This patch transfers those two flags to slave ehc.i before performing
      slave autopsy and reporting.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  18. 09 Oct, 2008 1 commit
  19. 29 Sep, 2008 3 commits
    • libata-eh: clear UNIT ATTENTION after reset · 11fc33da
      Tejun Heo committed
      Resets make ATAPI devices raise UNIT ATTENTION which fails the next
      command.  As resets can happen asynchronously for unrelated reasons,
      this sometimes disrupts innocent users.  For example, reading DVD
      fails after the system wakes up from suspend or the other device
      sharing the channel went through bus error.
      
      Clearing UA has some problems as it might clear UA which the userland
      needs to know about.  However, UA after resets can only be about the
      reset itself and the benefits of clearing it outweigh the cons.  Missing
      UA can only delay failure to one of the following commands anyway.  For
      example, timeout while burning is in progress will trigger reset and
      reset the device state and probably corrupt the burning run.  Although
      the userland application won't get the UA, its pending writes will
      fail.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • libata: Implement disk shock protection support · 45fabbb7
      Elias Oltmanns committed
      On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD
      FEATURE as specified in ATA-7 is issued to the device and processing of
      the request queue is stopped thereafter until the specified timeout
      expires or user space asks to resume normal operation. This is supposed
      to prevent the heads of a hard drive from accidentally crashing onto the
      platter when a heavy shock is anticipated (like a falling laptop
      expected to hit the floor). In fact, the whole port stops processing
      commands until the timeout has expired in order to avoid any resets due
      to failed commands on another device.
      Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • libata: implement slave_link · b1c72916
      Tejun Heo committed
      Explanation taken from the comment of ata_slave_link_init().
      
       In libata, a port contains links and a link contains devices.  There
       is single host link but if a PMP is attached to it, there can be
       multiple fan-out links.  On SATA, there's usually a single device
       connected to a link but PATA and SATA controllers emulating TF based
       interface can have two - master and slave.
      
       However, there are a few controllers which don't fit into this
       abstraction too well - SATA controllers which emulate TF interface
       with both master and slave devices but also have separate SCR
       register sets for each device.  These controllers need separate links
       for physical link handling (e.g. onlineness, link speed) but should
       be treated like a traditional M/S controller for everything else
       (e.g. command issue, softreset).
      
       slave_link is libata's way of handling this class of controllers
       without impacting core layer too much.  For anything other than
       physical link handling, the default host link is used for both master
       and slave.  For physical link handling, separate @ap->slave_link is
       used.  All dirty details are implemented inside libata core layer.
       From LLD's POV, the only difference is that prereset, hardreset and
       postreset are called once more for the slave link, so the reset
       sequence looks like the following.
      
       prereset(M) -> prereset(S) -> hardreset(M) -> hardreset(S) ->
       softreset(M) -> postreset(M) -> postreset(S)
      
       Note that softreset is called only for the master.  Softreset resets
       both M/S by definition, so SRST on master should handle both (the
       standard method will work just fine).
      
      As slave_link excludes PMP support and only code paths which deal with
      the attributes of physical link are affected, all the changes are
      localized to libata.h, libata-core.c and libata-eh.c.
      
       * ata_is_host_link() updated so that slave_link is considered as host
         link too.
      
       * iterator extended to iterate over the slave_link when using the
         underbarred version.
      
       * force param handling updated such that devno 16 is mapped to the
         slave link/device.
      
       * ata_link_on/offline() updated to return the combined result from
         master and slave link.  ata_phys_link_on/offline() are the direct
         versions.
      
       * EH autopsy and report are performed separately for master and slave
         links.  Reset is updated to implement the above described reset
         sequence.
      
      Except for reset update, most changes are minor, many of them just
      modifying dev->link to ata_dev_phys_link(dev) or using phys online
      test instead.
      
      After this update, LLDs can take full advantage of per-dev SCR
      registers by simply turning on slave link.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
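The reset call order quoted in this commit message can be reproduced with a small helper that lists the steps for a port with or without a slave link. The helper names and the string output are purely illustrative assumptions; the sketch mirrors the listed sequence and does not model when EH actually chooses hardreset or softreset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append one step name to buf, separating steps with " -> ". */
static void demo_add_step(char *buf, size_t len, const char *step)
{
    if (buf[0] != '\0')
        strncat(buf, " -> ", len - strlen(buf) - 1);
    strncat(buf, step, len - strlen(buf) - 1);
}

/*
 * Write the reset call order into buf (len must be at least 1).
 * M is the master (host) link and S the slave link.  softreset is issued
 * for the master only, since SRST resets both devices.
 */
static void demo_reset_sequence(bool has_slave, char *buf, size_t len)
{
    buf[0] = '\0';
    demo_add_step(buf, len, "prereset(M)");
    if (has_slave)
        demo_add_step(buf, len, "prereset(S)");
    demo_add_step(buf, len, "hardreset(M)");
    if (has_slave)
        demo_add_step(buf, len, "hardreset(S)");
    demo_add_step(buf, len, "softreset(M)");
    demo_add_step(buf, len, "postreset(M)");
    if (has_slave)
        demo_add_step(buf, len, "postreset(S)");
}
```

Without a slave link the helper produces the ordinary four-step order, which matches the statement that drivers not using slave links see no change.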
  20. 22 Aug, 2008 4 commits
  21. 15 Jul, 2008 1 commit