1. 11 9月, 2014 1 次提交
    • B
      PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device · b440bde7
      Bjorn Helgaas 提交于
      Powering off a hot-pluggable device, e.g., with pci_set_power_state(D3cold),
      normally generates a hot-remove event that unbinds the driver.
      
      Some drivers expect to remain bound to a device even while they power it
      off and back on again.  This can be dangerous, because if the device is
      removed or replaced while it is powered off, the driver doesn't know that
      anything changed.  But some drivers accept that risk.
      
      Add pci_ignore_hotplug() for use by drivers that know their device cannot
      be removed.  Using pci_ignore_hotplug() tells the PCI core that hot-plug
      events for the device should be ignored.
      
      The radeon and nouveau drivers use this to switch between a low-power,
      integrated GPU and a higher-power, higher-performance discrete GPU.  They
      power off the unused GPU, but they want to remain bound to it.
      
      This is a reimplementation of f244d8b6 ("ACPIPHP / radeon / nouveau:
      Fix VGA switcheroo problem related to hotplug") but extends it to work with
      both acpiphp and pciehp.
      
      This fixes a problem where systems with dual GPUs using the radeon drivers
      become unusable, freezing every few seconds (see bugzillas below).  The
      resume of the radeon device may also fail, e.g.,
      
      This fixes problems on dual GPU systems where the radeon driver becomes
      unusable because of problems while suspending the device, as in bug 79701:
      
          [drm] radeon: finishing device.
          radeon 0000:01:00.0: Userspace still has active objects !
          radeon 0000:01:00.0: ffff8800cb4ec288 ffff8800cb4ec000 16384 4294967297 force free
          ...
          WARNING: CPU: 0 PID: 67 at /home/apw/COD/linux/drivers/gpu/drm/radeon/radeon_gart.c:234 radeon_gart_unbind+0xd2/0xe0 [radeon]()
          trying to unbind memory from uninitialized GART !
      
      or while resuming it, as in bug 77261:
      
          radeon 0000:01:00.0: ring 0 stalled for more than 10158msec
          radeon 0000:01:00.0: GPU lockup ...
          radeon 0000:01:00.0: GPU pci config reset
          pciehp 0000:00:01.0:pcie04: Card not present on Slot(1-1)
          radeon 0000:01:00.0: GPU reset succeeded, trying to resume
          *ERROR* radeon: dpm resume failed
          radeon 0000:01:00.0: Wait for MC idle timedout !
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=77261
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=79701Reported-by: NShawn Starr <shawn.starr@rogers.com>
      Reported-by: NJose P. <lbdkmjdf@sharklasers.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NAlex Deucher <alexander.deucher@amd.com>
      Acked-by: NRajat Jain <rajatxjain@gmail.com>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NDave Airlie <airlied@redhat.com>
      CC: stable@vger.kernel.org	# v3.15+
      b440bde7
  2. 08 7月, 2014 1 次提交
    • M
      PCI: pciehp: Clear Data Link Layer State Changed during init · 0d25d35c
      Myron Stowe 提交于
      During PCIe hot-plug initialization - pciehp_probe() - data structures
      related to slot capabilities are set up.  As part of this set up, ISRs are
      put in place to handle slot events and all event bits are cleared out.
      
      This patch adds the Data Link Layer State Changed (PCI_EXP_SLTSTA_DLLSC)
      Slot Status bit to the event bits that are cleared out during
      initialization.
      
      If the BIOS doesn't clear DLLSC before handoff to the OS, pciehp notices
      that it's set and interprets it as a new Link Up event, which results in
      spurious messages:
      
        pciehp 0000:82:04.0:pcie24: slot(4): Link Up event
        pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at 0000:83:00, cannot hot-add
        pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00
      
      Prior to e48f1b67 ("PCI: pciehp: Use link change notifications for
      hot-plug and removal"), pciehp ignored DLLSC.
      
      Reference:
        PCI-SIG.  PCI Express Base Specification Revision 4.0 Version 0.3
        (PCI-SIG, 2014): 7.8.11. Slot Status Register (Offset 1Ah).
      
      [bhelgaas: add e48f1b67 ref and stable tag]
      Fixes: e48f1b67 ("PCI: pciehp: Use link change notifications for hot-plug and removal")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=79611Signed-off-by: NMyron Stowe <myron.stowe@redhat.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org	# v3.15+
      0d25d35c
  3. 06 7月, 2014 1 次提交
  4. 18 6月, 2014 3 次提交
    • B
      PCI: pciehp: Remove assumptions about which commands cause completion events · 2cc56f30
      Bjorn Helgaas 提交于
      We use incorrect logic to decide whether a PCIe hotplug controller
      generates command completion events.
      
      5808639b ("pciehp: fix slow probing") assumed that the Slot Status
      "Command Completed" bit was set only for commands affecting slot power,
      indicators, or electromechanical interlock.  That assumption is false: per
      sec. 6.7.3.2 of PCIe spec r3.0, a write targeting any portion of the Slot
      Control register is a command, and (if command completed events are
      supported) software must wait for a command to complete before issuing the
      next command.
      
      5808639b was to fix boot-time timeouts (see bugzilla below) on a Lenovo
      Thinkpad R61 with an Intel hotplug controller.  The controller probably has
      the Intel CF118 erratum, which means it doesn't report Command Completed
      unless the Slot Control power, indicator, or interlock bits are changed.
      This causes a timeout because pciehp always waits for Command Complete (if
      supported), regardless of which bits are changed.
      
      Remove the incorrect logic because the timeouts have been addressed
      differently by these changes:
      
        PCI: pciehp: Wait for hotplug command completion lazily
        PCI: pciehp: Compute timeout from hotplug command start time
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=10751
      Tested-by: Rajat Jain <rajatxjain@gmail.com>	(IDT 807a controller)
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      2cc56f30
    • B
      PCI: pciehp: Compute timeout from hotplug command start time · 40b96083
      Bjorn Helgaas 提交于
      If we issue a hotplug command, go do something else, then come back and
      wait for the command to complete, we don't have to wait the whole timeout
      period, because some of it elapsed while we were doing something else.
      
      Keep track of the time we issued the command, and wait only until the
      timeout period from that point has elapsed.
      
      For controllers with errata like Intel CF118, we previously timed out
      before issuing the second hotplug command:
      
        At time T1 (during boot):
          - Write DLLSCE, ABPE, PDCE, etc. to Slot Control
        At time T2 (hotplug event):
          - Wait for command completion (CC) in Slot Status
          - Timeout at T2 + 1 second because CC is never set in Slot Status
          - Write PCC, PIC, etc. to Slot Control
      
      With this change, we wait until T1 + 1 second instead of T2 + 1 second.
      If the hotplug event is more than 1 second after the boot-time
      initialization, we won't wait for the timeout at all.
      
      We still emit a "Timeout on hotplug command" message if it timed out; we
      should see this on the first hotplug event on every controller with this
      erratum, as well as on real errors on controllers without the erratum.
      
      Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
      Tested-by: Rajat Jain <rajatxjain@gmail.com>	(IDT 807a controller)
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      40b96083
    • B
      PCI: pciehp: Wait for hotplug command completion lazily · 3461a068
      Bjorn Helgaas 提交于
      Previously we issued a hotplug command and waited for it to complete.  But
      there's no need to wait until we're ready to issue the *next* command.  The
      next command will probably be much later, so the first one may have already
      completed and we may not have to actually wait at all.
      
      Because of hardware errata, some controllers generate command completion
      events for some commands but not others.  In the case of Intel CF118 (see
      spec update reference), the controller indicates command completion only
      for Slot Control writes that change the value of the following bits:
      
        Power Controller Control
        Power Indicator Control
        Attention Indicator Control
        Electromechanical Interlock Control
      
      Changes to other bits, e.g., the interrupt enable bits, do not cause the
      Command Completed bit to be set.  Controllers from AMD and Nvidia are
      reported to have similar errata.
      
      These errata cause timeouts when pcie_enable_notification() enables
      interrupts.  Previously that timeout occurred at boot-time.  With this
      change, the timeout occurs later, when we change the state of the slot
      power, indicators, or interlock.  This speeds up boot but causes a timeout
      at the first hotplug event on the slot.  Subsequent events don't timeout
      because only the first (boot-time) hotplug command updates Slot Control
      without touching the power/indicator/interlock controls.
      
      Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
      Tested-by: Rajat Jain <rajatxjain@gmail.com>	(IDT 807a controller)
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      3461a068
  5. 17 6月, 2014 1 次提交
  6. 11 6月, 2014 2 次提交
  7. 25 4月, 2014 1 次提交
  8. 20 2月, 2014 1 次提交
  9. 12 2月, 2014 4 次提交
  10. 11 2月, 2014 2 次提交
  11. 16 12月, 2013 7 次提交
  12. 15 12月, 2013 1 次提交
  13. 15 11月, 2013 1 次提交
  14. 15 8月, 2013 1 次提交
  15. 04 7月, 2013 1 次提交
  16. 13 1月, 2013 1 次提交
    • Y
      PCI: pciehp: Use per-slot workqueues to avoid deadlock · c2be6f93
      Yijing Wang 提交于
      When we have a hotplug-capable PCIe port with a second hotplug-capable
      PCIe port below it, removing the device below the upstream port causes
      a deadlock.
      
      The deadlock happens because we use the pciehp_wq workqueue to run
      pciehp_power_thread(), which uses pciehp_disable_slot() to remove devices
      below the upstream port.  When we remove the downstream PCIe port, we call
      pciehp_remove(), the pciehp driver's .remove() method.  That calls
      flush_workqueue(pciehp_wq), which deadlocks because the
      pciehp_power_thread() work item is still running.
      
      This patch avoids the deadlock by creating a workqueue for every PCIe port
      and removing the single shared workqueue.
      
      Here's the call path that leads to the deadlock:
      
        pciehp_queue_pushbutton_work
          queue_work(pciehp_wq)                   # queue pciehp_power_thread
          ...
      
        pciehp_power_thread
          pciehp_disable_slot
            remove_board
      	pciehp_unconfigure_device
      	  pci_stop_and_remove_bus_device
      	    ...
      	      pciehp_remove                 # pciehp driver .remove method
      		pciehp_release_ctrl
      		  pcie_cleanup_slot
      		    flush_workqueue(pciehp_wq)
      
      This is fairly urgent because it can be caused by simply unplugging a
      Thunderbolt adapter, as reported by Daniel below.
      
      [bhelgaas: changelog]
      Reference: http://lkml.kernel.org/r/CAMVG2ssiRgcTD1bej2tkUUfsWmpL5eNtPcNif9va2-Gzb2u8nQ@mail.gmail.comReported-and-tested-by: NDaniel J Blueman <daniel@quora.org>
      Reviewed-by: NKenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      Signed-off-by: NYijing Wang <wangyijing@huawei.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      CC: stable@vger.kernel.org
      c2be6f93
  17. 24 8月, 2012 1 次提交
  18. 13 7月, 2012 1 次提交
  19. 15 2月, 2012 5 次提交
  20. 07 1月, 2012 1 次提交
  21. 06 12月, 2011 1 次提交
  22. 12 11月, 2011 2 次提交