1. 02 Sep, 2020 3 commits
    • A
      PCI: pciehp: Disable in-band presence detect when possible · 7144257e
      Alexandru Gagniuc committed
      task #29600094
      
      commit 202853595e53f981c86656c49fc1cc1e3620f558 upstream.
      Backport summary: for 4.19 kernel ICX PCIe Gen4 support.
      
      The presence detect state (PDS) is normally a logical OR of in-band and
      out-of-band (OOB) presence detect.  As of PCIe 4.0, there is the option to
      disable in-band presence so that the PDS bit always reflects the state of
      the out-of-band presence.
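
The relationship described above can be sketched as a small userspace C model. This is only an illustration: `struct slot_model` and `presence_detect_state()` are hypothetical names and are not taken from the pciehp driver.

```c
#include <stdbool.h>

/* Hypothetical model of one hotplug slot; not kernel code. */
struct slot_model {
	bool in_band_present;     /* Physical-Layer-based presence detect */
	bool oob_present;         /* out-of-band (sideband) presence detect */
	bool in_band_pd_disable;  /* In-Band PD Disable bit (PCIe 4.0 and later) */
};

/* Value the Presence Detect State (PDS) bit would report in this model. */
static bool presence_detect_state(const struct slot_model *s)
{
	if (s->in_band_pd_disable)
		return s->oob_present;                    /* OOB only */
	return s->in_band_present || s->oob_present;  /* logical OR */
}
```

With in-band presence disabled, a slot whose link is still up but whose card has been pulled (OOB deasserted) reports PDS = 0 in this model, which is the behavior the spec note recommends relying on.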
      
      The recommendation of the PCIe spec is to disable in-band presence whenever
      supported (PCIe r5.0, appendix I implementation note):
      
        Due to architectural issues, the in-band (Physical-Layer-based) portion
        of the PD mechanism is deprecated for use with async hot-plug. One issue
        is that in-band PD as architected does not detect adapter removal during
        certain LTSSM states, notably the L1 and Disabled States.  Another issue
        is that when both in-band and OOB PD are being used together, the
        Presence Detect State bit and its associated interrupt mechanism always
        reflect the logical OR of the inband and OOB PD states, and with some
        hot-plug hardware configurations, it is important for software to detect
        and respond to in-band and OOB PD events independently.  If OOB PD is
        being used and the associated DSP supports In-Band PD Disable, it is
        recommended that the In-Band PD Disable bit be Set, and the Presence
        Detect State bit and its associated interrupt mechanism be used
        exclusively for OOB PD.  As a substitute for in-band PD with async
        hot-plug, the reference model uses either the DPC or the DLL Link Active
        mechanism.
      
      Link: https://lore.kernel.org/r/20191025190047.38130-2-stuart.w.hayes@gmail.com
      [bhelgaas: move PCI_EXP_SLTCAP2 read earlier & print PCI_EXP_SLTCAP2_IBPD
      value (suggested by Lukas)]
      Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Reviewed-by: Lukas Wunner <lukas@wunner.de>
      
      (cherry picked from commit 202853595e53f981c86656c49fc1cc1e3620f558)
      Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
      
      Conflicts:
      	drivers/pci/hotplug/pciehp.h
      	drivers/pci/hotplug/pciehp_hpc.c
      Signed-off-by: Artie Ding <artie.ding@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
    • K
      PCI: Make link active reporting detection generic · 02160c1f
      Keith Busch committed
      task #29600094
      
      commit f0157160b359b1d263ee9d4e0a435a7ad85bbcea upstream.
      Backport summary: for 4.19 kernel ICX PCIe Gen4 support.
      
      The spec has timing requirements when waiting for a link to become active
      after a conventional reset.  Implement those hard delays when waiting for
      an active link so pciehp and dpc drivers don't need to duplicate this.
      
      For devices that don't support data link layer active reporting, wait the
      fixed time recommended by the PCIe spec.
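
A minimal userspace sketch of the two waiting strategies follows. Everything here is an assumption made for illustration: the struct, the function name, and the millisecond constants are placeholders, not values quoted from the patch or the PCIe spec, and time is simulated rather than slept.

```c
#include <stdbool.h>

/* Placeholder timings for illustration only. */
#define FIXED_WAIT_MS   1100
#define POLL_STEP_MS    10
#define POLL_TIMEOUT_MS 1000

/* Hypothetical model of a downstream port. */
struct port_model {
	bool link_active_reporting;  /* can the port report Data Link Layer Link Active? */
	int  link_up_after_ms;       /* simulated time at which the link comes up, < 0 = never */
};

/* Returns the simulated milliseconds waited, or -1 on timeout. */
static int wait_for_link_active(const struct port_model *p)
{
	int waited;

	/* No reporting capability: nothing to poll, so wait the fixed time. */
	if (!p->link_active_reporting)
		return FIXED_WAIT_MS;

	/* Otherwise poll the (simulated) link state until it is active. */
	for (waited = 0; waited <= POLL_TIMEOUT_MS; waited += POLL_STEP_MS) {
		if (p->link_up_after_ms >= 0 && waited >= p->link_up_after_ms)
			return waited;
	}
	return -1;
}
```

Keeping this logic in one generic helper is the point of the commit: callers such as pciehp and dpc only ask "is the link up yet?" and do not repeat the timing rules.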
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      [bhelgaas: changelog]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Sinan Kaya <okaya@kernel.org>
      (cherry picked from commit f0157160b359b1d263ee9d4e0a435a7ad85bbcea)
      Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
      Signed-off-by: Artie Ding <artie.ding@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
    • L
      PCI: Simplify disconnected marking · 9e884897
      Lukas Wunner committed
      task #29600094
      
      commit a50ac6bfd6042b16e0de4ac3264c407e678c9b10 upstream.
      Backport summary: for 4.19 kernel ICX PCIe Gen4 support.
      
      Commit 89ee9f76 ("PCI: Add device disconnected state") iterates over
      the devices on a parent bus, marks each as disconnected, then marks
      each device's children as disconnected using pci_walk_bus().
      
      The same can be achieved more succinctly by calling pci_walk_bus() on
      the parent bus.  Moreover, this does not need to wait until acquiring
      pci_lock_rescan_remove(), so move it out of that critical section.
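
To illustrate why a single walk from the parent bus is enough, here is a simplified userspace model of a recursive bus walk. The types and the `walk_bus()` helper are hypothetical stand-ins; the real pci_walk_bus() operates on kernel structures and takes the appropriate locks.

```c
#include <stdbool.h>
#include <stddef.h>

struct bus_model;

/* Hypothetical device node: a bridge has a subordinate bus, a leaf does not. */
struct dev_model {
	bool disconnected;
	struct bus_model *subordinate;  /* child bus behind a bridge, or NULL */
	struct dev_model *next;         /* next device on the same bus */
};

struct bus_model {
	struct dev_model *devices;      /* first device on this bus */
};

typedef void (*walk_cb)(struct dev_model *dev, void *data);

/* Visit every device on the bus and, recursively, on all buses below it. */
static void walk_bus(struct bus_model *bus, walk_cb cb, void *data)
{
	for (struct dev_model *d = bus->devices; d; d = d->next) {
		cb(d, data);
		if (d->subordinate)
			walk_bus(d->subordinate, cb, data);
	}
}

/* Callback that marks one device as disconnected. */
static void mark_disconnected(struct dev_model *dev, void *data)
{
	(void)data;
	dev->disconnected = true;
}
```

One call of `walk_bus(&parent, mark_disconnected, NULL)` reaches both the devices directly on the parent bus and all their descendants, so no separate loop over the first level is needed.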
      
      The critical section in err.c contains a pci_dev_get() / pci_dev_put()
      pair which was apparently copy-pasted from pciehp_pci.c.  In the latter
      it serves the purpose of holding the struct pci_dev in place until the
      Command register is updated.  err.c doesn't do anything like that, hence
      the pair is unnecessary.  Remove it.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Oza Pawandeep <poza@codeaurora.org>
      Cc: Sinan Kaya <okaya@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      (cherry picked from commit a50ac6bfd6042b16e0de4ac3264c407e678c9b10)
      Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
      Signed-off-by: Artie Ding <artie.ding@linux.alibaba.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
  2. 21 Dec, 2019 1 commit
  3. 18 Dec, 2019 1 commit
    • M
      ACPI / hotplug / PCI: Allocate resources directly under the non-hotplug bridge · 9f5ee706
      Mika Westerberg committed
      commit 77adf9355304f8dcf09054280af5e23fc451ab3d upstream.
      
      Valerio and others reported that commit 84c8b58e ("ACPI / hotplug /
      PCI: Don't scan bridges managed by native hotplug") prevents some recent
      LG and HP laptops from booting with an endless loop of:
      
        ACPI Error: No handler or method for GPE 08, disabling event (20190215/evgpe-835)
        ACPI Error: No handler or method for GPE 09, disabling event (20190215/evgpe-835)
        ACPI Error: No handler or method for GPE 0A, disabling event (20190215/evgpe-835)
        ...
      
      What seems to happen is that during boot, after the initial PCI enumeration,
      when the EC is enabled, the platform triggers an ACPI Notify() to one of the
      root ports. The root port itself looks like this:
      
        pci 0000:00:1b.0: PCI bridge to [bus 02-3a]
        pci 0000:00:1b.0:   bridge window [mem 0xc4000000-0xda0fffff]
        pci 0000:00:1b.0:   bridge window [mem 0x80000000-0xa1ffffff 64bit pref]
      
      The BIOS has configured the root port so that it does not have an I/O
      bridge window.
      
      Now when the ACPI Notify() is triggered, the ACPI hotplug handler calls
      acpiphp_native_scan_bridge() for each non-hotplug bridge (as this system is
      using native PCIe hotplug) and pci_assign_unassigned_bridge_resources() to
      allocate resources.
      
      The device connected to the root port is a PCIe switch (Thunderbolt
      controller) with two hotplug downstream ports. Because of the hotplug ports
      __pci_bus_size_bridges() tries to add "additional I/O" of 256 bytes to each
      (DEFAULT_HOTPLUG_IO_SIZE). This gets further aligned to 4k as that's the
      minimum I/O window size so each hotplug port gets 4k I/O window and the
      same happens for the root port (which is also a hotplug port). This means
      3 * 4k = 12k I/O window.
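
The arithmetic above can be checked with a short C sketch. The 256-byte and 4k figures come from the text; the macro and function names are made up for this example and are not the kernel's.

```c
#include <stdint.h>

#define HOTPLUG_IO_EXTRA 256u     /* additional I/O requested per hotplug port */
#define IO_WINDOW_ALIGN  0x1000u  /* 4k minimum I/O bridge window size */

/* Round size up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t size, uint32_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Total I/O window needed when every hotplug port gets its own aligned window. */
static uint32_t total_io_window(unsigned int hotplug_ports)
{
	return hotplug_ports * align_up(HOTPLUG_IO_EXTRA, IO_WINDOW_ALIGN);
}
```

For the three hotplug ports in this report (two downstream ports plus the root port), `total_io_window(3)` yields 0x3000 bytes, i.e. the 12k window mentioned above.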
      
      Because of this, pci_assign_unassigned_bridge_resources() ends up opening an
      I/O bridge window for the root port at the first available I/O address, which
      seems to be in range 0x1000 - 0x3fff. Normally this range is used for ACPI
      stuff such as GPE bits (below is part of /proc/ioports):
      
          1800-1803 : ACPI PM1a_EVT_BLK
          1804-1805 : ACPI PM1a_CNT_BLK
          1808-180b : ACPI PM_TMR
          1810-1815 : ACPI CPU throttle
          1850-1850 : ACPI PM2_CNT_BLK
          1854-1857 : pnp 00:05
          1860-187f : ACPI GPE0_BLK
      
      However, when the ACPI Notify() happened this range was not yet reserved
      for ACPI/PNP (that happens later) so PCI gets it. It then starts writing to
      this range and accidentally stomps over GPE bits among other things causing
      the endless stream of messages about missing GPE handler.
      
      This problem does not happen if "pci=hpiosize=0" is passed in the kernel
      command line. The reason is that then the kernel does not try to allocate
      the additional 256 bytes for each hotplug port.
      
      Fix this by allocating resources directly below the non-hotplug bridges
      where a new device may appear as a result of ACPI Notify(). This avoids the
      hotplug bridges and prevents opening the additional I/O window.
      
      Fixes: 84c8b58e ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=203617
      Link: https://lore.kernel.org/r/20191030150545.19885-1-mika.westerberg@linux.intel.com
      Reported-by: Valerio Passini <passini.valerio@gmail.com>
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 21 Nov, 2019 1 commit
    • K
      PCI: portdrv: Initialize service drivers directly · 1eeee2fd
      Keith Busch committed
      [ Upstream commit c29de84149aba5f74e87b6491c13ac7203c12f55 ]
      
      The PCI port driver saves the PCI state after initializing the device with
      the applicable service devices.  This was, however, before the service
      drivers were even registered because PCI probe happens before the
      device_initcall initialized those service drivers.  The config space state
      that the services set up was not being saved.  The end result would cause
      PCI devices to not react to events that the drivers think they did if the
      PCI state ever needed to be restored.
      
      Fix this by changing the service drivers from using the init calls to
      having the portdrv driver calling the services directly.  This will get the
      state saved as desired, while making the relationship between the port
      driver and the services under it more explicit in the code.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Sinan Kaya <okaya@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  5. 08 Oct, 2019 1 commit
  6. 15 Jun, 2019 1 commit
  7. 17 Apr, 2019 1 commit
  8. 27 Sep, 2018 1 commit
    • M
      ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge · f188b99f
      Mika Westerberg committed
      HP 6730b laptop has an ethernet NIC connected to one of the PCIe root
      ports.  The root ports themselves are native PCIe hotplug capable.  Now,
      during boot after PCI devices are scanned the BIOS triggers ACPI bus check
      directly to the NIC:
      
        ACPI: \_SB_.PCI0.RP06.NIC_: Bus check in hotplug_event()
      
      It is not clear why it is sending bus check but regardless the ACPI hotplug
      notify handler calls enable_slot() directly (instead of going through
      acpiphp_check_bridge() as there is no bridge), which ends up handling the
      special case for non-hotplug bridges with native PCIe hotplug.  This
      results in a crash of some kind, but the reporter only sees a black screen,
      so it is hard to figure out the exact spot and what actually happens.  Based on
      a few fix proposals it was tracked to crash somewhere inside
      pci_assign_unassigned_bridge_resources().
      
      In any case we should not really be in that special branch at all because
      the ACPI notify happened to a slot that is not a PCI bridge (it is just a
      regular PCI device).
      
      Fix this so that we only go to that special branch if we are calling
      enable_slot() for a bridge (e.g., the ACPI notification was for the
      bridge).
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201127
      Fixes: 84c8b58e ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug")
      Reported-by: Peter Anemone <peter.anemone@gmail.com>
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      CC: stable@vger.kernel.org	# v4.18+
  9. 11 Sep, 2018 1 commit
    • K
      PCI: pciehp: Fix hot-add vs powerfault detection order · 34fb6bf9
      Keith Busch committed
      If both hot-add and power fault were observed in a single interrupt, we
      handled the hot-add first, then the power fault, in this path:
      
        pciehp_ist
          if (events & (PDC | DLLSC))
            pciehp_handle_presence_or_link_change
              case OFF_STATE:
                pciehp_enable_slot
                  __pciehp_enable_slot
                    board_added
                      pciehp_power_on_slot
                        ctrl->power_fault_detected = 0
                        pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC)
                      pciehp_green_led_on(p_slot)             # power LED on
                      pciehp_set_attention_status(p_slot, 0)  # attention LED off
          if ((events & PFD) && !ctrl->power_fault_detected)
            ctrl->power_fault_detected = 1
            pciehp_set_attention_status(1)                    # attention LED on
            pciehp_green_led_off(slot)                        # power LED off
      
      This left the attention indicator on (even though the hot-add succeeded)
      and the power indicator off (even though the slot power was on).
      
      Fix this by checking for power faults before checking for new devices.
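
The effect of the ordering can be reproduced with a small userspace model. This is a sketch under assumptions: the event bit values, `struct ctrl_model`, and the handler names are hypothetical simplifications of the call chain shown above, not the driver's code.

```c
#include <stdbool.h>

#define EV_PDC 0x1u  /* presence detect changed (hypothetical bit value) */
#define EV_PFD 0x2u  /* power fault detected (hypothetical bit value) */

struct ctrl_model {
	bool power_fault_detected;
	bool attention_led_on;
	bool power_led_on;
	bool slot_enabled;
};

/* Power fault handling: attention LED on, power LED off. */
static void handle_power_fault(struct ctrl_model *c)
{
	c->power_fault_detected = true;
	c->attention_led_on = true;
	c->power_led_on = false;
}

/* Successful hot-add: clear the fault flag, power LED on, attention LED off. */
static void handle_hot_add(struct ctrl_model *c)
{
	c->power_fault_detected = false;
	c->slot_enabled = true;
	c->power_led_on = true;
	c->attention_led_on = false;
}

/* fault_first = false models the old order, true models the fixed order. */
static void handle_events(struct ctrl_model *c, unsigned int events, bool fault_first)
{
	if (fault_first && (events & EV_PFD) && !c->power_fault_detected)
		handle_power_fault(c);
	if (events & EV_PDC)
		handle_hot_add(c);
	if (!fault_first && (events & EV_PFD) && !c->power_fault_detected)
		handle_power_fault(c);
}
```

With both events pending, the old order ends with the attention LED on and the power LED off even though the slot is enabled; handling the fault first leaves the indicators matching the slot state.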
      
      Prior to 0e94916e, this was successful because everything was chained
      through work queues and the order was:
      
        INT_PRESENCE_ON -> INT_POWER_FAULT -> ENABLE_REQ
      
      The ENABLE_REQ cleared the power fault at the end, but now everything is
      handled inline with the interrupt thread, such that the work ENABLE_REQ was
      doing happens before power fault handling now.
      
      Fixes: 0e94916e ("PCI: pciehp: Handle events synchronously")
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      [bhelgaas: changelog]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Lukas Wunner <lukas@wunner.de>
  10. 01 Aug, 2018 8 commits
    • L
      PCI: pciehp: Deduplicate presence check on probe & resume · 4e6a1335
      Lukas Wunner committed
      On driver probe and on resume from system sleep, pciehp checks the
      Presence Detect State bit in the Slot Status register to bring up an
      occupied slot or bring down an unoccupied slot.  Both code paths are
      identical, so deduplicate them per Mika's request.
      
      On probe, an additional check is performed to disable power of an
      unoccupied slot.  This can e.g. happen if power was enabled by BIOS.
      It cannot happen once pciehp has taken control, hence is not necessary
      on resume:  The Slot Control register is set to the same value that it
      had on suspend by pci_restore_state(), so if the slot was occupied,
      power is enabled and if it wasn't, power is disabled.  Should occupancy
      have changed during the system sleep transition, power is adjusted by
      bringing up or down the slot per the paragraph above.
      
      To allow for deduplication of the presence check, move the power check
      to pcie_init().  This seems safer anyway, because right now it is
      performed while interrupts are already enabled, and although I can't
      think of a scenario where pciehp_power_off_slot() and the IRQ thread
      collide, it does feel brittle.
      
      However this means that pcie_init() may now write to the Slot Control
      register before the IRQ is requested.  If both the CCIE and HPIE bits
      happen to be set, pcie_wait_cmd() will wait for an interrupt (instead
      of polling the Command Completed bit) and eventually emit a timeout
      message.  Additionally, if a level-triggered INTx interrupt is used,
      the user may see a spurious interrupt splat.  Avoid by disabling
      interrupts before disabling power.  (Normally the HPIE and CCIE bits
      should be clear on probe, but conceivably they may already have been
      set e.g. by BIOS.)
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    • L
      PCI: pciehp: Avoid implicit fallthroughs in switch statements · 8bb46b07
      Lukas Wunner committed
      Per Mika's request, add an explicit break to the last case of switch
      statements everywhere in pciehp to be more defensive towards future
      amendments.
      
      Per Gustavo's request, mark all non-empty implicit fallthroughs with a
      comment to silence warnings triggered by -Wimplicit-fallthrough=2.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
    • H
      PCI: Fix is_added/is_busmaster race condition · 44bda4b7
      Hari Vyas committed
      When a PCI device is detected, pdev->is_added is set to 1 and proc and
      sysfs entries are created.
      
      When the device is removed, pdev->is_added is checked to be 1, the
      device is detached, its proc and sysfs entries are cleared, and finally
      pdev->is_added is set to 0.
      
      is_added and is_busmaster are bit fields in the pci_dev structure that
      share the same memory location.
      
      A strange issue was observed during repeated removal and rescan of a PCIe
      NVMe device via sysfs commands: on removal, the is_added flag was zero
      instead of one, so the proc and sysfs entries were not cleared.  This
      caused a problem when the device was added again later, with the warning
      message "proc_dir_entry" already registered.
      
      Debugging revealed a race condition between the PCI core setting the
      is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
      setting the is_busmaster bit in pci_set_master().  As these fields are not
      handled atomically, that clears the is_added bit.
      
      Move the is_added bit to a separate private flag variable and use atomic
      functions to set and retrieve the device addition state.  This avoids the
      race because is_added no longer shares a memory location with is_busmaster.
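
The idea of the fix can be sketched in portable C11 using `<stdatomic.h>`. This is a userspace approximation: `struct pdev_model`, `DEV_FLAG_ADDED`, and the two helpers are hypothetical names, and the kernel uses its own atomic bit operations rather than the C11 ones shown here.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEV_FLAG_ADDED 0x1ul  /* hypothetical flag bit in priv_flags */

struct pdev_model {
	unsigned int is_busmaster:1;  /* still an ordinary bit field */
	atomic_ulong priv_flags;      /* separate word, only touched atomically */
};

/* Atomically set or clear the "added" state. */
static void dev_assign_added(struct pdev_model *dev, bool added)
{
	if (added)
		atomic_fetch_or(&dev->priv_flags, DEV_FLAG_ADDED);
	else
		atomic_fetch_and(&dev->priv_flags, ~DEV_FLAG_ADDED);
}

/* Atomically read the "added" state. */
static bool dev_is_added(struct pdev_model *dev)
{
	return (atomic_load(&dev->priv_flags) & DEV_FLAG_ADDED) != 0;
}
```

Because a non-atomic bit-field store is a read-modify-write of the whole containing word, two writers to neighboring bit fields can overwrite each other's update. Moving the flag into its own atomically updated word removes that overlap.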
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
      Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Lukas Wunner <lukas@wunner.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
    • L
      PCI: pciehp: Resume parent to D0 on config space access · 4417aa45
      Lukas Wunner committed
      Ensure accessibility of a hotplug port's config space when accessed via
      sysfs by resuming its parent to D0.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
    • L
      PCI: pciehp: Resume to D0 on enable/disable · 83503074
      Lukas Wunner committed
      pciehp's IRQ thread ensures accessibility of the port by runtime resuming
      its parent to D0.  However when the slot is enabled/disabled, the port
      itself needs to be in D0 because its secondary bus is accessed in:
      
          pciehp_check_link_status(),
          pciehp_configure_device() (both called from board_added())
      and
          pciehp_unconfigure_device() (called from remove_board()).
      
      Thus, acquire a runtime PM ref on enable/disablement of the slot.
      
      Yinghai Lu additionally discovered that some SkyLake servers feature a
      Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8)
      which requires the port to be in D0 when invoking
      
          pciehp_power_on_slot() (likewise called from board_added()).
      
      If slot power is turned on while in D3hot, link training later fails:
      https://lkml.kernel.org/r/20170205073454.GA253@wunner.de
      
      The spec is silent about such a requirement, but it seems prudent to
      assume that any hotplug port with a Power Controller may need this.
      
      The present commit holds a runtime PM ref whenever slot power is turned
      on and off, but it doesn't keep the port in D0 as long as slot power is
      on.  If vendors determine that's necessary, they need to amend pciehp to
      acquire a runtime PM ref in pciehp_power_on_slot() and release one in
      pciehp_power_off_slot().
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
    • L
      PCI: pciehp: Support interrupts sent from D3hot · 6b08c385
      Lukas Wunner committed
      If a hotplug port is able to send an interrupt, one would naively assume
      that it is accessible at that moment.  After all, if it wouldn't be
      accessible, i.e. if its parent is in D3hot and the link to the hotplug
      port is thus down, how should an interrupt come through?
      
      It turns out that assumption is wrong at least for Thunderbolt:  Even
      though its parents are in D3hot, a Thunderbolt hotplug port is able to
      signal interrupts.  Because the port's config space is inaccessible and
      resuming the parents may sleep, the hard IRQ handler has to defer
      runtime resuming the parents and reading the Slot Status register to the
      IRQ thread.
      
      If the hotplug port uses a level-triggered INTx interrupt, it needs to
      be masked until the IRQ thread has cleared the signaled events.  For
      simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
      Note that if the interrupt is shared (which can only happen for INTx),
      other devices are starved from receiving interrupts until the IRQ thread
      is scheduled, has runtime resumed the hotplug port's parents and has
      read and cleared the Slot Status register.
      
      That delay is dominated by the 10 ms D3hot->D0 transition time of each
      parent port.  The worst case is a Thunderbolt downstream port at the
      end of a daisy chain:  There may be up to six Thunderbolt controllers
      in-between it and the root port, each comprising an upstream and
      downstream port, plus its own upstream port.  That's 13 x 10 = 130 ms.
      Possible mitigations are polling the interrupt while it's disabled or
      reducing the d3_delay of Thunderbolt ports if possible.
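
The worst-case figure above can be reproduced with a few lines of C. The 10 ms transition time and the port counting come from the paragraph; the macro and function names are invented for this sketch.

```c
#define D3HOT_TO_D0_MS 10u  /* per-port D3hot->D0 transition time, per the text */

/*
 * Ports that must be resumed before the hotplug port is reachable:
 * each controller in between contributes an upstream and a downstream
 * port, plus the upstream port of the controller that contains the
 * hotplug downstream port itself.
 */
static unsigned int ports_to_resume(unsigned int controllers_in_between)
{
	return 2u * controllers_in_between + 1u;
}

/* Worst-case delay if the ports are resumed one after another. */
static unsigned int resume_delay_ms(unsigned int controllers_in_between)
{
	return ports_to_resume(controllers_in_between) * D3HOT_TO_D0_MS;
}
```

For six controllers in between, `ports_to_resume(6)` is 13 and `resume_delay_ms(6)` is 130, matching the 13 x 10 = 130 ms stated above.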
      
      Open code masking of the interrupt instead of requesting it with the
      IRQF_ONESHOT flag to minimize the period during which it is masked.
      (IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
      
      PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
      associated form factor specification, a hotplug capable Downstream Port
      must support generation of a wakeup event (using the PME mechanism) on
      hotplug events that occur when the system is in a sleep state or the
      Port is in device state D1, D2, or D3Hot."
      
      This would seem to imply that PME needs to be enabled on the hotplug
      port when it is runtime suspended.  pci_enable_wake() currently doesn't
      enable PME on bridges, it may be necessary to add an exemption for
      hotplug bridges there.  On "Light Ridge" Thunderbolt controllers, the
      PME_Status bit is not set when an interrupt occurs while the hotplug
      port is in D3hot, even if PME is enabled.  (I've tested this on a Mac
      and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
      negotiate_os_control(), modifying it to 1 didn't change the behavior.)
      
      (Side note:  Section 6.7.3.4 also states that "PME and Hot-Plug Event
      interrupts (when both are implemented) always share the same MSI or
      MSI-X vector".  That would only seem to apply to Root Ports, however
      the section never mentions Root Ports, only Downstream Ports.  This is
      explained in the definition of "Downstream Port" in the "Terms and
      Acronyms" section of the PCIe Base Spec:  "The Ports on a Switch that
      are not the Upstream Port are Downstream Ports.  All Ports on a Root
      Complex are Downstream Ports.")
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
    • L
      PCI: pciehp: Obey compulsory command delay after resume · 469e764c
      Lukas Wunner committed
      Upon resume from system sleep, the Slot Control register is written via:
      
        pci_pm_resume_noirq()
          pci_pm_default_resume_early()
            pci_restore_state()
              pci_restore_pcie_state()
      
      PCIe r4.0, sec 6.7.3.2 says that after "issuing a write transaction that
      targets any portion of the Port's Slot Control register, [...] software
      must wait for [the] command to complete before issuing the next command".
      
      pciehp currently fails to enforce that rule after the above-mentioned
      write.  Fix it.
      
      (Moving restoration of the Slot Control register to pciehp doesn't seem
      to make sense because the other PCIe hotplug drivers may need it as
      well.)
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • L
      PCI: pciehp: Clear spurious events earlier on resume · 79037824
      Lukas Wunner committed
      Thunderbolt hotplug ports that were occupied before system sleep resume
      with their downstream link in "off" state.  Only after the Thunderbolt
      controller has reestablished the PCIe tunnels does the link go up.
      As a result, a spurious Presence Detect Changed and/or Data Link Layer
      State Changed event occurs.
      
      The events are not immediately acted upon because tunnel reestablishment
      happens in the ->resume_noirq phase, when interrupts are still disabled.
      Also, notification of events may initially be disabled in the Slot
      Control register when coming out of system sleep and is reenabled in the
      ->resume_noirq phase through:
      
        pci_pm_resume_noirq()
          pci_pm_default_resume_early()
            pci_restore_state()
              pci_restore_pcie_state()
      
      It is not guaranteed that the events are acted upon at all:  PCIe r4.0,
      sec 6.7.3.4 says that "a port may optionally send an MSI when there are
      hot-plug events that occur while interrupt generation is disabled, and
      interrupt generation is subsequently enabled."  Note the "optionally".
      
      If an MSI is sent, pciehp will gratuitously turn the slot off and back
      on once the ->resume_early phase has commenced.
      
      If an MSI is not sent, the extant, unacknowledged events in the Slot
      Status register will prevent future notification of presence or link
      changes.
      
      Commit 13c65840 ("PCI: pciehp: Clear Presence Detect and Data Link
      Layer Status Changed on resume") fixed the latter by clearing the events
      in the ->resume phase.  Move this to the ->resume_noirq phase to also
      fix the gratuitous disable/enablement of the slot.
      
      The commit further restored the Slot Control register in the ->resume
      phase, but that's dispensable because as shown above it's already been
      done in the ->resume_noirq phase.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
  11. 31 Jul, 2018 1 commit
    • L
      PCI: pciehp: Avoid slot access during reset · 5b3f7b7d
      Lukas Wunner committed
      The ->reset_slot callback introduced by commits:
      
        2e35afae ("PCI: pciehp: Add reset_slot() method") and
        06a8d89a ("PCI: pciehp: Disable link notification across slot reset")
      
      disables notification of Presence Detect Changed and Data Link Layer
      State Changed events for the duration of a secondary bus reset.
      
      However a bus reset not only triggers these events, but may also clear
      the Presence Detect State bit in the Slot Status register and the Data
      Link Layer Link Active bit in the Link Status register momentarily.
      According to Sinan Kaya:
      
       "I know for a fact that bus reset clears the Data Link Layer Active bit
        as soon as link goes down.  It gets set again following link up.
        Presence detect depends on the HW implementation.  QDT root ports
        don't change presence detect for instance since nobody actually
        removed the card.  If an implementation supports in-band presence
        detect, the answer is yes.  As soon as the link goes down, presence
        detect bit will get cleared until recovery."
        https://lkml.kernel.org/r/42e72f83-3b24-f7ef-e5bc-290fae99259a@codeaurora.org
      
        In-band presence detect is also covered in Table 4-15 in PCIe r4.0,
        sec 4.2.6.
      
      pciehp should therefore ensure that any parts of the driver that access
      those bits do not run concurrently to a bus reset.  The only precaution
      the commits took to that effect was to halt interrupt polling.  They
      made no effort to drain the slot workqueue, cancel an outstanding
      Attention Button work, or block slot enable/disable requests via sysfs
      and in the ->probe hook.
      
      Now that pciehp is converted to enable/disable the slot exclusively from
      the IRQ thread, the only places accessing the two above-mentioned bits
      are the IRQ thread and the ->probe hook.  Add locking to serialize them
      with a bus reset.  This obviates the need to halt interrupt polling.
      Do not add locking to the ->get_adapter_status sysfs callback to afford
      users unfettered access to that bit.  Use an rw_semaphore in lieu of a
      regular mutex to allow parallel execution of the non-reset code paths
      accessing the critical bits, i.e. the IRQ thread and the ->probe hook.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Rajat Jain <rajatja@google.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Sinan Kaya <okaya@kernel.org>
  12. 24 Jul, 2018 20 commits
    • L
      PCI: pciehp: Always enable occupied slot on probe · cdf6b736
      Lukas Wunner committed
      Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when
      there are hot-plug events that occur while interrupt generation is
      disabled, and interrupt generation is subsequently enabled."
      
      On probe, we currently clear all event bits in the Slot Status register
      with the notable exception of the Presence Detect Changed bit.  Thereby
      we seek to receive an interrupt for an already occupied slot once event
      notification is enabled.
      
      But because the interrupt is optional, users may have to specify the
      pciehp_force parameter on the command line, which is inconvenient.
      
      Moreover, now that pciehp's event handling has become resilient to
      missed events, a Presence Detect Changed interrupt for a slot which is
      powered on is interpreted as removal of the card.  If the slot has
      already been brought up by the BIOS, receiving such an interrupt on
      probe causes the slot to be powered off and immediately back on, which
      is likewise undesirable.
      
      Avoid both issues by making the behavior of pciehp_force the default and
      clearing the Presence Detect Changed bit on probe.
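
A rough user-space sketch of the probe behavior described above, under stated assumptions: the register is simulated as a plain integer, `port_sketch` and `probe_sketch()` are hypothetical names, and only two of the event bits are modeled. The bit values are meant to mirror the Slot Status layout (Attention Button Pressed 0x0001, Presence Detect Changed 0x0008, Presence Detect State 0x0040).

```c
#include <stdbool.h>

#define SLTSTA_ABP 0x0001u	/* Attention Button Pressed (event bit) */
#define SLTSTA_PDC 0x0008u	/* Presence Detect Changed (event bit) */
#define SLTSTA_PDS 0x0040u	/* Presence Detect State (status bit) */

struct port_sketch {
	unsigned int slot_status;	/* simulated Slot Status register */
};

/*
 * Returns true if slot enablement should be requested right away.
 * Event bits are cleared on probe, Presence Detect Changed included,
 * so no stale interrupt is delivered later.  An occupied slot is
 * enabled based on its current state instead of an optional interrupt.
 */
bool probe_sketch(struct port_sketch *port)
{
	port->slot_status &= ~(SLTSTA_ABP | SLTSTA_PDC);

	return (port->slot_status & SLTSTA_PDS) != 0;
}
```

With this shape, whether the hardware chooses to send the optional interrupt no longer influences the outcome of probe.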
      
      Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC
      ("Force pciehp, even if OSHP is missing") seems nonsensical because the
      OSHP control method is only relevant for SHPC slots according to the
      PCI Firmware specification r3.0, sec 4.8.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      cdf6b736
    • L
      PCI: pciehp: Become resilient to missed events · d331710e
      Lukas Wunner authored
      A hotplug port's Slot Status register does not count how often each type
      of event occurred, it only records the fact *that* an event has occurred.
      
      Previously pciehp queued a work item for each event.  But if it missed
      an event, e.g. removal of a card in-between two back-to-back insertions,
      it queued up the wrong work item or no work item at all.  Commit
      fad214b0 ("PCI: pciehp: Process all hotplug events before looking
      for new ones") sought to improve the situation by shrinking the window
      during which events may be missed.
      
      But Stefan Roese reports unbalanced Card present and Link Up events,
      suggesting that we're still missing events if they occur very rapidly.
      Bjorn Helgaas responds that he considers pciehp's event handling
      "baroque" and calls for its simplification and rationalization:
      https://lkml.kernel.org/r/20180202192045.GA53759@bhelgaas-glaptop.roam.corp.google.com
      
      It gets worse once a hotplug port is runtime suspended:  The port can
      signal an interrupt while it and its parents are in D3hot, i.e. while
      it is inaccessible.  By the time we've runtime resumed all parents to D0
      and read the port's Slot Status register, we may have missed an arbitrary
      number of events.  Event handling therefore needs to be reworked to
      become resilient to missed events.
      
      Assume that a Presence Detect Changed event has occurred.
      Consider the following truth table:
      - Slot is in OFF_STATE and is currently empty.    => Do nothing.
        (The event is trailing a Link Down or we've
        missed an insertion and subsequent removal.)
      - Slot is in OFF_STATE and is currently occupied. => Turn the slot on.
      - Slot is in ON_STATE  and is currently empty.    => Turn the slot off.
      - Slot is in ON_STATE  and is currently occupied. => Turn the slot off,
        (Be cautious and assume the card in                then back on.
        the slot isn't the same as before.)
      
      This leads to the following simple algorithm:
      1. If the slot is in ON_STATE, turn it off unconditionally.
      2. If the slot is currently occupied, turn it on.
      
      Because those actions are now carried out synchronously, rather than by
      scheduled work items, pciehp reacts to the *current* situation and
      missed events no longer matter.
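
The two-step algorithm above can be sketched in a few lines of user-space C. This is a minimal illustration, not the driver's code: `slot_sketch` and `handle_presence_changed()` are hypothetical names, and powering the slot on or off is reduced to a state change plus a counter.

```c
#include <stdbool.h>

enum slot_state { OFF_STATE, ON_STATE };

struct slot_sketch {
	enum slot_state state;
	int turn_offs;	/* how often the slot was turned off */
	int turn_ons;	/* how often the slot was turned on */
};

void handle_presence_changed(struct slot_sketch *slot, bool occupied)
{
	/* Step 1: a slot that is on is turned off unconditionally. */
	if (slot->state == ON_STATE) {
		slot->state = OFF_STATE;
		slot->turn_offs++;
	}

	/* Step 2: bring the slot up if a card is present right now. */
	if (occupied) {
		slot->state = ON_STATE;
		slot->turn_ons++;
	}
}
```

Walking the four rows of the truth table through this function gives: no action, one turn-on, one turn-off, and a turn-off followed by a turn-on.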
      
      Data Link Layer State Changed events can be handled identically to
      Presence Detect Changed events.  Note that in the above truth table,
      a Link Up trailing a Card present event didn't have to be accounted for:
      It is filtered out by pciehp_check_link_status().
      
      As for Attention Button Pressed events, PCIe r4.0, sec 6.7.1.5 says:
      "Once the Power Indicator begins blinking, a 5-second abort interval
      exists during which a second depression of the Attention Button cancels
      the operation."  In other words, the user can only expect the system to
      react to a button press after it starts blinking.  Missed button presses
      that occur in-between are irrelevant.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Stefan Roese <sr@denx.de>
      Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      d331710e
    • L
      PCI: pciehp: Tolerate initially unstable link · 6c35a1ac
      Lukas Wunner authored
      When a device is hotplugged, Presence Detect and Link Up events often do
      not occur simultaneously, but with a lag of a few milliseconds.  Only
      the first event received is relevant, the other one can be disregarded.
      
      Moreover, Stefan Roese reports that on certain platforms, Link State and
      Presence Detect may flap for up to 100 ms before stabilizing, suggesting
      that such events should be disregarded for at least this long:
      https://lkml.kernel.org/r/20180130084121.18653-1-sr@denx.de
      
      On slot enablement, pciehp_check_link_status() waits for 100 ms per
      PCIe r4.0, sec 6.7.3.3, then probes the hotplugged device's vendor
      register for up to 1 second.
      
      If this succeeds, the link is definitely up, so ignore any Presence
      Detect or Link State events that occurred up to this point.
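
Discarding the events that piled up while the link was settling might look roughly like the following. This is a sketch under assumptions: `ctrl_sketch`, `pending_events` and `discard_flap_events()` are illustrative names, and C11 atomics stand in for the kernel's atomic helpers. The two bit values are meant to mirror Presence Detect Changed and Data Link Layer State Changed in the Slot Status register.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SLTSTA_PDC   0x0008u	/* Presence Detect Changed */
#define SLTSTA_DLLSC 0x0100u	/* Data Link Layer State Changed */

struct ctrl_sketch {
	atomic_uint pending_events;	/* events recorded but not yet handled */
};

/*
 * Call after probing the hotplugged device's vendor register.
 * If the device responded, the link is up, so presence and link
 * events recorded so far are flaps and can be dropped.  Other
 * pending events are left untouched.
 */
void discard_flap_events(struct ctrl_sketch *ctrl, bool device_responded)
{
	if (device_responded)
		atomic_fetch_and(&ctrl->pending_events,
				 ~(SLTSTA_PDC | SLTSTA_DLLSC));
}
```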
      
      pciehp_check_link_status() then checks the Link Training bit in the
      Link Status register.  This is the final opportunity to detect
      inaccessibility of the device and abort slot enablement.  Any link
      or presence change that occurs afterwards will cause the slot to be
      disabled again immediately after attempting to enable it.
      
      The astute reviewer may appreciate that achieving this behavior would be
      more complicated had pciehp not just been converted to enable/disable
      the slot exclusively from the IRQ thread:  When the slot is enabled via
      sysfs, each link or presence flap would otherwise cause the IRQ thread
      to run and it would have to sense that those events belong to a
      concurrent slot enablement operation and disregard them.  It would be
      much more difficult than this mere 3 line change.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Stefan Roese <sr@denx.de>
      6c35a1ac
    • L
      PCI: pciehp: Declare pciehp_enable/disable_slot() static · 25c83b84
      Lukas Wunner authored
      No callers of pciehp_enable/disable_slot() outside of pciehp_ctrl.c
      remain, so declare the functions static.  For now this requires forward
      declarations.  Those can be eliminated by reshuffling functions once the
      ongoing effort to refactor the driver has settled.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      25c83b84
    • L
      PCI: pciehp: Drop enable/disable lock · 1656716d
      Lukas Wunner authored
      Previously slot enablement and disablement could happen concurrently.
      But now it's under the exclusive control of the IRQ thread, rendering
      the locking obsolete.  Drop it.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      1656716d
    • L
      PCI: pciehp: Enable/disable exclusively from IRQ thread · 32a8cef2
      Lukas Wunner authored
      Besides the IRQ thread, there are several other places in the driver
      which enable or disable the slot:
      
      - pciehp_probe() enables the slot if it's occupied and the pciehp_force
        module parameter is used.
      
      - pciehp_resume() enables or disables the slot after system sleep.
      
      - pciehp_queue_pushbutton_work() enables or disables the slot after the
        5 second delay following an Attention Button press.
      
      - pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
        disable the slot on sysfs write.
      
      This requires locking and complicates pciehp's state machine.
      
      A simplification can be achieved by enabling and disabling the slot
      exclusively from the IRQ thread.
      
      Amend the functions listed above to request slot enable/disablement from
      the IRQ thread by either synthesizing a Presence Detect Changed event or,
      in the case of a disable user request (via sysfs or an Attention Button
      press), submitting a newly introduced force disable request.  The latter
      is needed because the slot shall be forced off despite being occupied.
      For this force disable request, avoid colliding with Slot Status register
      bits by using a bit number outside the register's 16-bit range.
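
How requests from other contexts could be funneled to the IRQ thread is sketched below. All names (`ctrl_sketch`, `request_event()`, `DISABLE_SLOT_REQ`, the two sysfs helpers) are hypothetical, waking the thread is reduced to a counter, and bit 16 is picked on the assumption that the Slot Status register only occupies bits 0 to 15.

```c
#include <stdatomic.h>

#define SLTSTA_PDC       0x0008u	/* Presence Detect Changed */
#define DISABLE_SLOT_REQ (1u << 16)	/* software-only request, above the register bits */

struct ctrl_sketch {
	atomic_uint pending_events;	/* consumed by the IRQ thread */
	int wakeups;			/* stands in for waking the IRQ thread */
};

/* Record a request and poke the IRQ thread to act on it. */
void request_event(struct ctrl_sketch *ctrl, unsigned int action)
{
	atomic_fetch_or(&ctrl->pending_events, action);
	ctrl->wakeups++;
}

/* Enable: synthesize a Presence Detect Changed event. */
void sysfs_enable_sketch(struct ctrl_sketch *ctrl)
{
	request_event(ctrl, SLTSTA_PDC);
}

/* Disable: the slot must go off even though it is occupied. */
void sysfs_disable_sketch(struct ctrl_sketch *ctrl)
{
	request_event(ctrl, DISABLE_SLOT_REQ);
}
```

Because the disable request lives above the low 16 bits, masking `pending_events` with 0xffff still yields only genuine Slot Status events.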
      
      For synchronous execution of requests (on sysfs write), wait for the
      request to finish and retrieve the result.  There can only ever be one
      sysfs write in flight due to the locking in kernfs_fop_write(), hence
      there is no risk of returning the result of a different sysfs request to
      user space.
      
      POWERON_STATE and POWEROFF_STATE are now no longer entered by the
      above-listed functions, but solely by the IRQ thread when it begins a
      power transition.  Afterwards, it moves to STATIC_STATE.  The same
      applies to canceling the Attention Button work:  It likewise becomes an
      IRQ thread only operation.
      
      An immediate consequence is that POWERON_STATE and POWEROFF_STATE are
      never observed by the IRQ thread itself, only by functions called in a
      different context, such as pciehp_sysfs_enable_slot().  So remove
      handling of these states from pciehp_handle_button_press() and
      pciehp_handle_link_change() which are exclusively called from the IRQ
      thread.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      32a8cef2
    • L
      PCI: pciehp: Track enable/disable status · 9590192f
      Lukas Wunner authored
      handle_button_press_event() currently determines whether the slot has
      been turned on or off by looking at the Power Controller Control bit in
      the Slot Control register.  This assumes that an attention button
      implies presence of a power controller even though that's not mandated
      by the spec.  Moreover the Power Controller Control bit is unreliable
      when a power fault occurs (PCIe r4.0, sec 6.7.1.8).  This issue has
      existed since the driver was introduced in 2004.
      
      Fix by replacing STATIC_STATE with ON_STATE and OFF_STATE and tracking
      whether the slot has been turned on or off.  This is also a required
      ingredient to make pciehp resilient to missed events, which is the
      object of an upcoming commit.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      9590192f
    • L
      PCI: pciehp: Publish to user space last on probe · 774d446b
      Lukas Wunner authored
      The PCI hotplug core has just been refactored to separate slot
      initialization for in-kernel use from publication to user space.
      
      Take advantage of it in pciehp by publishing to user space last on
      probe.  This will allow enable/disablement of the slot exclusively from
      the IRQ thread because the IRQ is requested after initialization for
      in-kernel use (thereby getting its unique name needed by the IRQ thread)
      but before user space is able to submit enable/disable requests.
      
      On teardown, the order is the same in reverse:  The user space interface
      is removed prior to freeing the IRQ and destroying the slot.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      774d446b
    • L
      PCI: hotplug: Demidlayer registration with the core · 51bbf9be
      Lukas Wunner authored
      When a hotplug driver calls pci_hp_register(), all steps necessary for
      registration are carried out in one go, including creation of a kobject
      and addition to sysfs.  That's a problem for pciehp once it's converted
      to enable/disable the slot exclusively from the IRQ thread:  The thread
      needs to be spawned after creation of the kobject (because it uses the
      kobject's name), but before addition to sysfs (because it will handle
      enable/disable requests submitted via sysfs).
      
      pci_hp_deregister() does offer a ->release callback that's invoked
      after deletion from sysfs and before destruction of the kobject.  But
      because pci_hp_register() doesn't offer a counterpart, hotplug drivers'
      ->probe and ->remove code becomes asymmetric, which is error prone
      as recently discovered use-after-free bugs in pciehp's ->remove hook
      have shown.
      
      In a sense, this appears to be a case of the midlayer antipattern:
      
         "The core thesis of the "midlayer mistake" is that midlayers are
          bad and should not exist.  That common functionality which it is
          so tempting to put in a midlayer should instead be provided as
          library routines which can [be] used, augmented, or ignored by
          each bottom level driver independently.  Thus every subsystem
          that supports multiple implementations (or drivers) should
          provide a very thin top layer which calls directly into the
          bottom layer drivers, and a rich library of support code that
          eases the implementation of those drivers.  This library is
          available to, but not forced upon, those drivers."
              --  Neil Brown (2009), https://lwn.net/Articles/336262/
      
      The presence of midlayer traits in the PCI hotplug core might be ascribed
      to its age:  When it was introduced in February 2002, the blessings of a
      library approach might not have been well known:
      https://git.kernel.org/tglx/history/c/a8a2069f432c
      
      For comparison, the driver core does offer split functions for creating
      a kobject (device_initialize()) and addition to sysfs (device_add()) as
      an alternative to carrying out everything at once (device_register()).
      This was introduced in October 2002:
      https://git.kernel.org/tglx/history/c/8b290eb19962
      
      The odd ->release callback in the PCI hotplug core was added in 2003:
      https://git.kernel.org/tglx/history/c/69f8d663b595
      
      Clearly, a library approach would not force every hotplug driver to
      implement a ->release callback, but rather allow the driver to remove
      the sysfs files, release its data structures and finally destroy the
      kobject.  Alternatively, a driver may choose to remove everything with
      pci_hp_deregister(), then release its data structures.
      
      To this end, offer drivers pci_hp_initialize() and pci_hp_add() as a
      split-up version of pci_hp_register().  Likewise, offer pci_hp_del()
      and pci_hp_destroy() as a split-up version of pci_hp_deregister().
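
The ordering that the split-up functions make possible can be sketched with a user-space mock. Everything here is illustrative: the `toy_*` functions are hypothetical stand-ins that merely log their name, not the real pci_hp_*() implementations, and the IRQ helpers are placeholders for requesting and freeing the interrupt.

```c
#include <string.h>

struct toy_slot {
	char log[128];	/* records the order in which steps ran */
};

static void note(struct toy_slot *s, const char *step)
{
	strcat(s->log, step);
	strcat(s->log, " ");
}

/* Split-up registration: in-kernel setup first, user space last. */
void toy_hp_initialize(struct toy_slot *s) { note(s, "initialize"); }
void toy_hp_add(struct toy_slot *s)        { note(s, "add"); }

/* Split-up deregistration: user space first, destruction last. */
void toy_hp_del(struct toy_slot *s)        { note(s, "del"); }
void toy_hp_destroy(struct toy_slot *s)    { note(s, "destroy"); }

void toy_request_irq(struct toy_slot *s)   { note(s, "request_irq"); }
void toy_free_irq(struct toy_slot *s)      { note(s, "free_irq"); }

void toy_probe(struct toy_slot *s)
{
	s->log[0] = '\0';
	toy_hp_initialize(s);	/* slot name exists from here on */
	toy_request_irq(s);	/* IRQ thread can use that name */
	toy_hp_add(s);		/* only now can user space submit requests */
}

void toy_remove(struct toy_slot *s)
{
	toy_hp_del(s);		/* no new requests from user space */
	toy_free_irq(s);	/* IRQ thread is gone */
	toy_hp_destroy(s);	/* safe to tear down the slot */
}
```

Probe and remove are mirror images of each other here, which is the symmetry a single-step register call with a ->release callback could not provide.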
      
      Eliminate the ->release callback and move its code into each driver's
      teardown routine.
      
      Declare pci_hp_deregister() void, in keeping with the usual kernel
      pattern that enablement can fail, but disablement cannot.  It only
      returned an error if the caller passed in a NULL pointer or a slot which
      has never or is no longer registered or is sharing its name with another
      slot.  Those would be bugs, so WARN about them.  Few hotplug drivers
      actually checked the return value and those that did only printed a
      useless error message to dmesg.  Remove that.
      
      For most drivers the conversion was straightforward since it doesn't
      matter whether the code in the ->release callback is executed before or
      after destruction of the kobject.  But in the case of ibmphp, it was
      unclear to me whether setting slot_cur->ctrl and slot_cur->bus_on to
      NULL needs to happen before the kobject is destroyed, so I erred on
      the side of caution and ensured that the order stays the same.  Another
      nontrivial case is pnv_php:  I've found the list and kref logic difficult
      to understand, however my impression was that it is safe to defer
      deleting the list element and dropping the references until after the
      kobject is destroyed.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>  # drivers/platform/x86
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Scott Murray <scott@spiteful.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Corentin Chary <corentin.chary@gmail.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Andy Shevchenko <andy@infradead.org>
      51bbf9be
    • L
      PCI: pciehp: Drop slot workqueue · 55a6b7a6
      Lukas Wunner authored
      Previously the slot workqueue was used to handle events and enable or
      disable the slot.  That's no longer the case as those tasks are done
      synchronously in the IRQ thread.  The slot workqueue is thus merely used
      to handle a button press after the 5 second delay and only one such work
      item may be in flight at any given time.  A separate workqueue isn't
      necessary for this simple task, so use the system workqueue instead.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      55a6b7a6
    • L
      PCI: pciehp: Handle events synchronously · 0e94916e
      Lukas Wunner authored
      Up until now, pciehp's IRQ handler schedules a work item for each event,
      which in turn schedules a work item to enable or disable the slot.  This
      double indirection was necessary because sleeping wasn't allowed in the
      IRQ handler.
      
      However, sleeping is allowed now that pciehp has been converted to
      threaded IRQ handling and polling, so handle events synchronously in
      pciehp_ist() and remove the work item infrastructure (with the exception
      of work items to handle a button press after the 5 second delay).
      
      For link or presence change events, move the register read to determine
      the current link or presence state behind acquisition of the slot lock
      to prevent it from becoming stale while the lock is contended.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      0e94916e
    • L
      PCI: pciehp: Stop blinking on slot enable failure · b0ccd9dd
      Lukas Wunner authored
      If the attention button is pressed to power on the slot AND the user
      powers on the slot via sysfs before 5 seconds have elapsed AND powering
      on the slot fails because either the slot is unoccupied OR the latch is
      open, we neglect turning off the green LED so it keeps on blinking.
      
      That's because the error path of pciehp_sysfs_enable_slot() doesn't call
      pciehp_green_led_off(), unlike pciehp_power_thread() which does.
      The bug has been present since 2004 when the driver was introduced.
      
      Fix by deduplicating common code in pciehp_sysfs_enable_slot() and
      pciehp_power_thread() into a wrapper function pciehp_enable_slot() and
      renaming the existing function to __pciehp_enable_slot().  Same for
      pciehp_disable_slot().  This will also simplify the upcoming rework of
      pciehp's event handling.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      b0ccd9dd
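
The wrapper pattern used by the fix above can be sketched as follows. This is a simplified user-space model with hypothetical names (`slot_sketch`, `enable_slot_core()`, `enable_slot()`); the assumption that a successful power-on also ends the blinking is ours and is not stated in the commit message.

```c
#include <stdbool.h>

struct slot_sketch {
	bool occupied;
	bool latch_closed;
	bool green_led_blinking;	/* set by an earlier button press */
	bool powered;
};

/* Core operation: 0 on success, -1 if the slot cannot be powered on. */
int enable_slot_core(struct slot_sketch *slot)
{
	if (!slot->occupied || !slot->latch_closed)
		return -1;

	slot->powered = true;
	slot->green_led_blinking = false;	/* assumed: LED goes steady on success */
	return 0;
}

/*
 * Wrapper shared by all callers: the error handling lives in one
 * place, so no caller can forget to stop the blinking LED.
 */
int enable_slot(struct slot_sketch *slot)
{
	int ret = enable_slot_core(slot);

	if (ret)
		slot->green_led_blinking = false;	/* may still be blinking */
	return ret;
}
```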
    • L
      PCI: pciehp: Convert to threaded polling · ec07a447
      Lukas Wunner authored
      We've just converted pciehp to threaded IRQ handling, but still cannot
      sleep in pciehp_ist() because the function is also called in poll mode,
      which runs in softirq context (from a timer).
      
      Convert poll mode to a kthread so that pciehp_ist() always runs in task
      context.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      ec07a447
    • L
      PCI: pciehp: Convert to threaded IRQ · 7b4ce26b
      Lukas Wunner authored
      pciehp's IRQ handler queues up a work item for each event signaled by
      the hardware.  A more modern alternative is to let a long running
      kthread service the events.  The IRQ handler's sole job is then to check
      whether the IRQ originated from the device in question, acknowledge its
      receipt to the hardware to quiesce the interrupt and wake up the kthread.
      
      One benefit is reduced latency to handle the IRQ, which is a necessity
      for realtime environments.  Another benefit is that we can make pciehp
      simpler and more robust by handling events synchronously in process
      context, rather than asynchronously by queueing up work items.  pciehp's
      usage of work items is a historic artifact, it predates the introduction
      of threaded IRQ handlers by two years.  (The former was introduced in
      2007 with commit 5d386e1a ("pciehp: Event handling rework"), the
      latter in 2009 with commit 3aa551c9 ("genirq: add threaded interrupt
      handler support").)
      
      Convert pciehp to threaded IRQ handling by retrieving the pending events
      in pciehp_isr(), saving them for later consumption by the thread handler
      pciehp_ist() and clearing them in the Slot Status register.
      
      By clearing the Slot Status (and thereby acknowledging the events) in
      pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
      would have the unpleasant side effect of starving devices sharing the
      IRQ until pciehp_ist() has finished.
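
The division of labor between the hard IRQ handler and the thread handler could be modeled in user space roughly as below. This is a sketch, not the driver: the Slot Status register is a plain integer, clearing it by AND-ing simulates a write-1-to-clear acknowledgement, only two event bits are modeled, and all `*_sketch` names are hypothetical.

```c
#include <stdatomic.h>

#define SLTSTA_PDC    0x0008u	/* Presence Detect Changed */
#define SLTSTA_DLLSC  0x0100u	/* Data Link Layer State Changed */
#define SLTSTA_EVENTS (SLTSTA_PDC | SLTSTA_DLLSC)

enum irq_result { IRQ_NONE_SKETCH, IRQ_WAKE_THREAD_SKETCH };

struct ctrl_sketch {
	unsigned int slot_status;	/* simulated Slot Status register */
	atomic_uint pending_events;	/* saved for the thread handler */
};

/* Hard IRQ part: retrieve, acknowledge and save events, nothing more. */
enum irq_result isr_sketch(struct ctrl_sketch *ctrl)
{
	unsigned int events = ctrl->slot_status & SLTSTA_EVENTS;

	if (!events)
		return IRQ_NONE_SKETCH;	/* IRQ came from another device */

	ctrl->slot_status &= ~events;	/* acknowledge to quiesce the IRQ */
	atomic_fetch_or(&ctrl->pending_events, events);
	return IRQ_WAKE_THREAD_SKETCH;
}

/* Thread part: atomically take everything that has accumulated. */
unsigned int ist_sketch(struct ctrl_sketch *ctrl)
{
	return atomic_exchange(&ctrl->pending_events, 0);
}
```

Since events are OR-ed into a bit mask, two occurrences of the same event before the thread runs collapse into a single set bit; the mask records that an event happened, not how often.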
      
      pciehp_isr() does not count how many times each event occurred, but
      merely records the fact *that* an event occurred.  If the same event
      occurs a second time before pciehp_ist() is woken, that second event
      will not be recorded separately, which is problematic according to
      commit fad214b0 ("PCI: pciehp: Process all hotplug events before
      looking for new ones") because we may miss removal of a card in-between
      two back-to-back insertions.  We're about to make pciehp_ist() resilient
      to missed events.  The present commit regresses the driver's behavior
      temporarily in order to separate the changes into reviewable chunks.
      This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
      operations that happen in a timespan shorter than wakeup of the IRQ
      thread.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
      Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      7b4ce26b
    • L
      PCI: pciehp: Document struct slot and struct controller · 4aed1cd6
      Lukas Wunner authored
      Document the driver's data structures to lower the barrier to entry for
      contributors.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      4aed1cd6
    • L
      PCI: pciehp: Declare pciehp_unconfigure_device() void · 1d2e2673
      Lukas Wunner authored
      Since commit 0f4bd801 ("PCI: hotplug: Drop checking of
      PCI_BRIDGE_CONTROL in *_unconfigure_device()"),
      pciehp_unconfigure_device() can no longer fail, so declare it and its
      sole caller remove_board() void, in keeping with the usual kernel
      pattern that enablement can fail, but disablement cannot.  No
      functional change intended.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      1d2e2673
    • L
      PCI: pciehp: Drop unnecessary NULL pointer check · 6641311d
      Lukas Wunner authored
      pciehp_disable_slot() checks if the ctrl attribute of the slot is NULL
      and bails out if so.  However the function is not called prior to the
      attribute being set in pcie_init_slot(), and pcie_init_slot() is not
      called if ctrl is NULL.  So the check is unnecessary.  Drop it.
      
      It has been present ever since the driver was introduced in 2004, but it
      was already unnecessary back then:
      https://git.kernel.org/tglx/history/c/c16b4b14d980
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      6641311d
    • L
      PCI: pciehp: Fix unprotected list iteration in IRQ handler · 1204e35b
      Lukas Wunner authored
      Commit b440bde7 ("PCI: Add pci_ignore_hotplug() to ignore hotplug
      events for a device") iterates over the devices on a hotplug port's
      subordinate bus in pciehp's IRQ handler without acquiring pci_bus_sem.
      It is thus possible for a user to cause a crash by concurrently
      manipulating the device list, e.g. by disabling slot power via sysfs
      on a different CPU or by initiating a remove/rescan via sysfs.
      
      This can't be fixed by acquiring pci_bus_sem because it may sleep.
      The simplest fix is to avoid the list iteration altogether and just
      check the ignore_hotplug flag on the port itself.  This works because
      pci_ignore_hotplug() sets the flag both on the device as well as on its
      parent bridge.
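
Why checking the port alone suffices can be shown with a small model. The struct and function names below are hypothetical; the sketch only assumes what the paragraph above states, namely that the flag is set on the device and on its parent bridge.

```c
#include <stdbool.h>
#include <stddef.h>

struct dev_sketch {
	struct dev_sketch *bridge;	/* parent bridge (hotplug port), or NULL */
	bool ignore_hotplug;
};

/* Mark a device and propagate the flag to its parent bridge. */
void ignore_hotplug_sketch(struct dev_sketch *dev)
{
	dev->ignore_hotplug = true;
	if (dev->bridge)
		dev->bridge->ignore_hotplug = true;
}

/*
 * What the IRQ handler needs to ask: a single flag read on the port,
 * with no walk over the subordinate bus's device list and hence no
 * lock that might sleep.
 */
bool port_ignores_hotplug(const struct dev_sketch *port)
{
	return port->ignore_hotplug;
}
```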
      
      We do lose the ability to print the name of the device blocking hotplug
      in the debug message, but that's probably bearable.
      
      Fixes: b440bde7 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device")
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      1204e35b
    • L
      PCI: pciehp: Fix use-after-free on unplug · 281e878e
      Lukas Wunner authored
      When pciehp is unbound (e.g. on unplug of a Thunderbolt device), the
      hotplug_slot struct is deregistered and thus freed before freeing the
      IRQ.  The IRQ handler and the work items it schedules print the slot
      name referenced from the freed structure in various informational and
      debug log messages, each time resulting in a quadruple dereference of
      freed pointers (hotplug_slot -> pci_slot -> kobject -> name).
      
      At best the slot name is logged as "(null)", at worst kernel memory is
      exposed in logs or the driver crashes:
      
        pciehp 0000:10:00.0:pcie204: Slot((null)): Card not present
      
      An attacker may provoke the bug by unplugging multiple devices on a
      Thunderbolt daisy chain at once.  Unplugging can also be simulated by
      powering down slots via sysfs.  The bug is particularly easy to trigger
      in poll mode.
      
      It has been present since the driver's introduction in 2004:
      https://git.kernel.org/tglx/history/c/c16b4b14d980
      
      Fix by rearranging teardown such that the IRQ is freed first.  Run the
      work items queued by the IRQ handler to completion before freeing the
      hotplug_slot struct by draining the work queue from the ->release_slot
      callback which is invoked by pci_hp_deregister().
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org # v2.6.4
      281e878e
    • L
      PCI: hotplug: Don't leak pci_slot on registration failure · 4ce64358
      Lukas Wunner authored
      If addition of sysfs files fails on registration of a hotplug slot, the
      struct pci_slot as well as the entry in the slot_list is leaked.  The
      issue has been present since the hotplug core was introduced in 2002:
      https://git.kernel.org/tglx/history/c/a8a2069f432c
      
      Perhaps the idea was that even though sysfs addition fails, the slot
      should still be usable.  But that's not how drivers use the interface,
      they abort probe if a non-zero value is returned.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org # v2.4.15+
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      4ce64358