1. 16 8月, 2018 1 次提交
  2. 15 8月, 2018 3 次提交
    • J
      PCI: Limit config space size for Netronome NFP5000 · 2538fb89
      Jakub Kicinski 提交于
      Like the NFP4000 and NFP6000, the NFP5000 as an erratum where reading/
      writing to PCI config space addresses above 0x600 can cause the NFP to
      generate PCIe completion timeouts.
      
      Limit the NFP5000's PF's config space size to 0x600 bytes as is already
      done for the NFP4000 and NFP6000.
      
      The NFP5000's VF is 0x6003 (PCI_DEVICE_ID_NETRONOME_NFP6000_VF), the same
      device ID as the NFP6000's VF.  Thus, its config space is already limited
      by the existing use of quirk_nfp6000().
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NTony Egan <tony.egan@netronome.com>
      2538fb89
    • H
      PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips · 923aa4c3
      Heiner Kallweit 提交于
      If flag IRQCHIP_ONESHOT_SAFE isn't set for an irqchip and we have a
      threaded interrupt with no primary handler, flag IRQF_ONESHOT needs to be
      set for the interrupt, causing some overhead in the threaded interrupt
      handler.  For more detailed explanation also check following comment in
      __setup_irq():
      
        The interrupt was requested with handler = NULL, so we use the default
        primary handler for it. But it does not have the oneshot flag set.  In
        combination with level interrupts this is deadly, because the default
        primary handler just wakes the thread, then the irq lines is reenabled,
        but the device still has the level irq asserted.  Rinse and repeat....
      
        While this works for edge type interrupts, we play it safe and reject
        unconditionally because we can't say for sure which type this interrupt
        really has.  The type flags are unreliable as the underlying chip
        implementation can override them.
      
      Another comment in __setup_irq() gives a hint already that this
      overhead can be avoided for PCI-MSI:
      
        Some irq chips like MSI based interrupts are per se one shot safe.  Check
        the chip flags, so we can avoid the unmask dance at the end of the
        threaded handler for those.
      
      Following this let's mark all PCI-MSI irqchips as oneshot-safe.
      
      See also discussion here:
      https://lkml.kernel.org/r/alpine.DEB.2.21.1808032136490.1658@nanos.tec.linutronix.deSigned-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      923aa4c3
    • B
      PCI/VPD: Check for VPD access completion before checking for timeout · 6eaf2781
      Bert Kenward 提交于
      Previously we checked the timeout before checking the VPD access completion
      bit.  On a very heavily loaded system this can cause VPD access to timeout.
      Check the completion bit before checking the timeout.
      Signed-off-by: NBert Kenward <bkenward@solarflare.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      6eaf2781
  3. 14 8月, 2018 3 次提交
    • M
      PCI: Match Root Port's MPS to endpoint's MPSS as necessary · 9f0e8935
      Myron Stowe 提交于
      In commit 27d868b5 ("PCI: Set MPS to match upstream bridge"), we made
      sure every device's MPS setting matches its upstream bridge, making it more
      likely that a hot-added device will work in a system with an optimized MPS
      configuration.
      
      Recently I've started encountering systems where the endpoint device's MPSS
      capability is less than its Root Port's current MPS value, thus the
      endpoint is not capable of matching its upstream bridge's MPS setting (see:
      bugzilla via "Link:" below).  This leaves the system vulnerable - the
      upstream Root Port could respond with larger TLPs than the device can
      handle, and the device will consider them to be 'Malformed'.
      
      One could use the "pci=pcie_bus_safe" kernel parameter to work around the
      issue, but that forces a user to supply a kernel parameter to get the
      system to function reliably and may end up limiting MPS settings of other
      unrelated, sub-topologies which could benefit from maintaining their larger
      values.
      
      Augment Keith's approach to include tuning down a Root Port's MPS setting
      when its hot-added endpoint device is not capable of matching it.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527Signed-off-by: NMyron Stowe <myron.stowe@redhat.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NJon Mason <jdmason@kudzu.us>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sinan Kaya <okaya@kernel.org>
      Cc: Dongdong Liu <liudongdong3@huawei.com>
      9f0e8935
    • M
      PCI: Skip MPS logic for Virtual Functions (VFs) · 3dbe97ef
      Myron Stowe 提交于
      PCIe r4.0, sec 9.3.5.4, "Device Control Register", shows both
      Max_Payload_Size (MPS) and Max_Read_request_Size (MRRS) to be 'RsvdP' for
      VFs.  Just prior to the table it states:
      
        "PF and VF functionality is defined in Section 7.5.3.4 except where
         noted in Table 9-16.  For VF fields marked 'RsvdP', the PF setting
         applies to the VF."
      
      All of which implies that with respect to Max_Payload_Size Supported
      (MPSS), MPS, and MRRS values, we should not be paying any attention to the
      VF's fields, but rather only to the PF's.  Only looking at the PF's fields
      also logically makes sense as it's the sole physical interface to the PCIe
      bus.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
      Fixes: 27d868b5 ("PCI: Set MPS to match upstream bridge")
      Signed-off-by: NMyron Stowe <myron.stowe@redhat.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org # 4.3+
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Sinan Kaya <okaya@kernel.org>
      Cc: Dongdong Liu <liudongdong3@huawei.com>
      Cc: Jon Mason <jdmason@kudzu.us>
      3dbe97ef
    • B
      PCI: Add function 1 DMA alias quirk for Marvell 88SS9183 · 7695e73f
      Bjorn Helgaas 提交于
      Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=42679#c134Reported-and-tested-by: NFelix Blüthner <f.bluethner@mailbox.org>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      7695e73f
  4. 11 8月, 2018 1 次提交
    • A
      PCI: Check for PCIe Link downtraining · 2d1ce5ec
      Alexandru Gagniuc 提交于
      When both ends of a PCIe Link are capable of a higher bandwidth than is
      currently in use, the Link is said to be "downtrained".  A downtrained Link
      may indicate hardware or configuration problems in the system, but it's
      hard to identify such Links from userspace.
      
      Refactor pcie_print_link_status() so it continues to always print PCIe
      bandwidth information, as several NIC drivers desire.
      
      Add a new internal __pcie_print_link_status() to emit a message only when a
      device's bandwidth is constrained by the fabric and call it from the PCI
      core for all devices, which identifies all downtrained Links.  It also
      emits messages for a few cases that are technically not downtrained, such
      as a x4 device in an open-ended x1 slot.
      Signed-off-by: NAlexandru Gagniuc <mr.nuke.me@gmail.com>
      [bhelgaas: changelog, move __pcie_print_link_status() declaration to
      drivers/pci/, rename pcie_check_upstream_link() to
      pcie_report_downtraining()]
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      2d1ce5ec
  5. 10 8月, 2018 10 次提交
  6. 07 8月, 2018 2 次提交
  7. 06 8月, 2018 1 次提交
  8. 01 8月, 2018 13 次提交
    • B
      PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition · 944d5859
      Bjorn Helgaas 提交于
      PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once
      under #ifdef CONFIG_ACPI_APEI, and again at the top level.  This looks like
      my merge error from these commits:
      
        fd3362cb ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
        41cbc9eb ("PCI/AER: Squash ecrc.c into aerdrv.c")
      
      Remove the duplicate PCI_EXP_AER_FLAGS definition.
      
      Fixes: 41cbc9eb ("PCI/AER: Squash ecrc.c into aerdrv.c")
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NOza Pawandeep <poza@codeaurora.org>
      944d5859
    • L
      PCI: pciehp: Deduplicate presence check on probe & resume · 4e6a1335
      Lukas Wunner 提交于
      On driver probe and on resume from system sleep, pciehp checks the
      Presence Detect State bit in the Slot Status register to bring up an
      occupied slot or bring down an unoccupied slot.  Both code paths are
      identical, so deduplicate them per Mika's request.
      
      On probe, an additional check is performed to disable power of an
      unoccupied slot.  This can e.g. happen if power was enabled by BIOS.
      It cannot happen once pciehp has taken control, hence is not necessary
      on resume:  The Slot Control register is set to the same value that it
      had on suspend by pci_restore_state(), so if the slot was occupied,
      power is enabled and if it wasn't, power is disabled.  Should occupancy
      have changed during the system sleep transition, power is adjusted by
      bringing up or down the slot per the paragraph above.
      
      To allow for deduplication of the presence check, move the power check
      to pcie_init().  This seems safer anyway, because right now it is
      performed while interrupts are already enabled, and although I can't
      think of a scenario where pciehp_power_off_slot() and the IRQ thread
      collide, it does feel brittle.
      
      However this means that pcie_init() may now write to the Slot Control
      register before the IRQ is requested.  If both the CCIE and HPIE bits
      happen to be set, pcie_wait_cmd() will wait for an interrupt (instead
      of polling the Command Completed bit) and eventually emit a timeout
      message.  Additionally, if a level-triggered INTx interrupt is used,
      the user may see a spurious interrupt splat.  Avoid by disabling
      interrupts before disabling power.  (Normally the HPIE and CCIE bits
      should be clear on probe, but conceivably they may already have been
      set e.g. by BIOS.)
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      4e6a1335
    • L
      PCI: pciehp: Avoid implicit fallthroughs in switch statements · 8bb46b07
      Lukas Wunner 提交于
      Per Mika's request, add an explicit break to the last case of switch
      statements everywhere in pciehp to be more defensive towards future
      amendments.
      
      Per Gustavo's request, mark all non-empty implicit fallthroughs with a
      comment to silence warnings triggered by -Wimplicit-fallthrough=2.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      8bb46b07
    • H
      PCI: Fix is_added/is_busmaster race condition · 44bda4b7
      Hari Vyas 提交于
      When a PCI device is detected, pdev->is_added is set to 1 and proc and
      sysfs entries are created.
      
      When the device is removed, pdev->is_added is checked for one and then
      device is detached with clearing of proc and sys entries and at end,
      pdev->is_added is set to 0.
      
      is_added and is_busmaster are bit fields in pci_dev structure sharing same
      memory location.
      
      A strange issue was observed with multiple removal and rescan of a PCIe
      NVMe device using sysfs commands where is_added flag was observed as zero
      instead of one while removing device and proc,sys entries are not cleared.
      This causes issue in later device addition with warning message
      "proc_dir_entry" already registered.
      
      Debugging revealed a race condition between the PCI core setting the
      is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
      setting the is_busmaster bit in pci_set_master().  As these fields are not
      handled atomically, that clears the is_added bit.
      
      Move the is_added bit to a separate private flag variable and use atomic
      functions to set and retrieve the device addition state.  This avoids the
      race because is_added no longer shares a memory location with is_busmaster.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283Signed-off-by: NHari Vyas <hari.vyas@broadcom.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NLukas Wunner <lukas@wunner.de>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      44bda4b7
    • L
      PCI: Whitelist Thunderbolt ports for runtime D3 · 47a8e237
      Lukas Wunner 提交于
      Thunderbolt controllers can be runtime suspended to D3cold to save ~1.5W.
      This requires that runtime D3 is allowed on its PCIe ports, so whitelist
      them.
      
      The 2015 BIOS cutoff that we've instituted for runtime D3 on PCIe ports
      is unnecessary on Thunderbolt because we know that even the oldest
      controller, Light Ridge (2010), is able to suspend its ports to D3 just
      fine -- specifically including its hotplug ports.  And the power saving
      should be afforded to machines even if their BIOS predates 2015.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Andreas Noever <andreas.noever@gmail.com>
      47a8e237
    • L
      PCI: Whitelist native hotplug ports for runtime D3 · eb3b5bf1
      Lukas Wunner 提交于
      Previously we blacklisted PCIe hotplug ports for runtime D3 because:
      
      (a) Ports handled by the firmware must not be transitioned to D3 by the
          OS behind the firmware's back:
          https://bugzilla.kernel.org/show_bug.cgi?id=53811
      
      (b) Ports handled natively by the OS lacked runtime D3 support in the
          pciehp driver.
      
      We've just rectified the latter, so allow users to manually enable and
      test it by passing pcie_port_pm=force on the command line.  Vendors are
      thus put in a position to validate hotplug ports for runtime D3 and
      perhaps we can someday enable it by default, but with a BIOS cutoff date.
      
      Ashok Raj tested runtime D3 on hotplug ports of a SkyLake Xeon-SP in
      2017 and encountered Hardware Error NMIs, so this feature clearly cannot
      be enabled for everyone yet:
      https://lkml.kernel.org/r/20170503180426.GA4058@otc-nc-03
      
      While at it, remove an erroneous code comment I added with 97a90aee
      ("PCI: Consolidate conditions to allow runtime PM on PCIe ports") which
      claims that parents of a hotplug port must stay awake lest interrupts
      cannot be delivered.  That has turned out to be wrong at least for
      Thunderbolt hotplug ports.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      eb3b5bf1
    • L
      PCI: sysfs: Resume to D0 on function reset · 82c3fbff
      Lukas Wunner 提交于
      When performing a function reset via sysfs, the device's config space is
      accessed in places such as pcie_flr() and its MMIO space is accessed e.g.
      in reset_ivb_igd(), so ensure accessibility by resuming the device to D0.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      82c3fbff
    • L
      PCI: pciehp: Resume parent to D0 on config space access · 4417aa45
      Lukas Wunner 提交于
      Ensure accessibility of a hotplug port's config space when accessed via
      sysfs by resuming its parent to D0.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      4417aa45
    • L
      PCI: pciehp: Resume to D0 on enable/disable · 83503074
      Lukas Wunner 提交于
      pciehp's IRQ thread ensures accessibility of the port by runtime resuming
      its parent to D0.  However when the slot is enabled/disabled, the port
      itself needs to be in D0 because its secondary bus is accessed in:
      
          pciehp_check_link_status(),
          pciehp_configure_device() (both called from board_added())
      and
          pciehp_unconfigure_device() (called from remove_board()).
      
      Thus, acquire a runtime PM ref on enable/disablement of the slot.
      
      Yinghai Lu additionally discovered that some SkyLake servers feature a
      Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8)
      which requires the port to be in D0 when invoking
      
          pciehp_power_on_slot() (likewise called from board_added()).
      
      If slot power is turned on while in D3hot, link training later fails:
      https://lkml.kernel.org/r/20170205073454.GA253@wunner.de
      
      The spec is silent about such a requirement, but it seems prudent to
      assume that any hotplug port with a Power Controller may need this.
      
      The present commit holds a runtime PM ref whenever slot power is turned
      on and off, but it doesn't keep the port in D0 as long as slot power is
      on.  If vendors determine that's necessary, they need to amend pciehp to
      acquire a runtime PM ref in pciehp_power_on_slot() and release one in
      pciehp_power_off_slot().
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      83503074
    • L
      PCI: pciehp: Support interrupts sent from D3hot · 6b08c385
      Lukas Wunner 提交于
      If a hotplug port is able to send an interrupt, one would naively assume
      that it is accessible at that moment.  After all, if it wouldn't be
      accessible, i.e. if its parent is in D3hot and the link to the hotplug
      port is thus down, how should an interrupt come through?
      
      It turns out that assumption is wrong at least for Thunderbolt:  Even
      though its parents are in D3hot, a Thunderbolt hotplug port is able to
      signal interrupts.  Because the port's config space is inaccessible and
      resuming the parents may sleep, the hard IRQ handler has to defer
      runtime resuming the parents and reading the Slot Status register to the
      IRQ thread.
      
      If the hotplug port uses a level-triggered INTx interrupt, it needs to
      be masked until the IRQ thread has cleared the signaled events.  For
      simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
      Note that if the interrupt is shared (which can only happen for INTx),
      other devices are starved from receiving interrupts until the IRQ thread
      is scheduled, has runtime resumed the hotplug port's parents and has
      read and cleared the Slot Status register.
      
      That delay is dominated by the 10 ms D3hot->D0 transition time of each
      parent port.  The worst case is a Thunderbolt downstream port at the
      end of a daisy chain:  There may be up to six Thunderbolt controllers
      in-between it and the root port, each comprising an upstream and
      downstream port, plus its own upstream port.  That's 13 x 10 = 130 ms.
      Possible mitigations are polling the interrupt while it's disabled or
      reducing the d3_delay of Thunderbolt ports if possible.
      
      Open code masking of the interrupt instead of requesting it with the
      IRQF_ONESHOT flag to minimize the period during which it is masked.
      (IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
      
      PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
      associated form factor specification, a hotplug capable Downstream Port
      must support generation of a wakeup event (using the PME mechanism) on
      hotplug events that occur when the system is in a sleep state or the
      Port is in device state D1, D2, or D3Hot."
      
      This would seem to imply that PME needs to be enabled on the hotplug
      port when it is runtime suspended.  pci_enable_wake() currently doesn't
      enable PME on bridges, it may be necessary to add an exemption for
      hotplug bridges there.  On "Light Ridge" Thunderbolt controllers, the
      PME_Status bit is not set when an interrupt occurs while the hotplug
      port is in D3hot, even if PME is enabled.  (I've tested this on a Mac
      and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
      negotiate_os_control(), modifying it to 1 didn't change the behavior.)
      
      (Side note:  Section 6.7.3.4 also states that "PME and Hot-Plug Event
      interrupts (when both are implemented) always share the same MSI or
      MSI-X vector".  That would only seem to apply to Root Ports, however
      the section never mentions Root Ports, only Downstream Ports.  This is
      explained in the definition of "Downstream Port" in the "Terms and
      Acronyms" section of the PCIe Base Spec:  "The Ports on a Switch that
      are not the Upstream Port are Downstream Ports.  All Ports on a Root
      Complex are Downstream Ports.")
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      6b08c385
    • L
      PCI: pciehp: Obey compulsory command delay after resume · 469e764c
      Lukas Wunner 提交于
      Upon resume from system sleep, the Slot Control register is written via:
      
        pci_pm_resume_noirq()
          pci_pm_default_resume_early()
            pci_restore_state()
              pci_restore_pcie_state()
      
      PCIe r4.0, sec 6.7.3.2 says that after "issuing a write transaction that
      targets any portion of the Port's Slot Control register, [...] software
      must wait for [the] command to complete before issuing the next command".
      
      pciehp currently fails to enforce that rule after the above-mentioned
      write.  Fix it.
      
      (Moving restoration of the Slot Control register to pciehp doesn't seem
      to make sense because the other PCIe hotplug drivers may need it as
      well.)
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      469e764c
    • L
      PCI: pciehp: Clear spurious events earlier on resume · 79037824
      Lukas Wunner 提交于
      Thunderbolt hotplug ports that were occupied before system sleep resume
      with their downstream link in "off" state.  Only after the Thunderbolt
      controller has reestablished the PCIe tunnels does the link go up.
      As a result, a spurious Presence Detect Changed and/or Data Link Layer
      State Changed event occurs.
      
      The events are not immediately acted upon because tunnel reestablishment
      happens in the ->resume_noirq phase, when interrupts are still disabled.
      Also, notification of events may initially be disabled in the Slot
      Control register when coming out of system sleep and is reenabled in the
      ->resume_noirq phase through:
      
        pci_pm_resume_noirq()
          pci_pm_default_resume_early()
            pci_restore_state()
              pci_restore_pcie_state()
      
      It is not guaranteed that the events are acted upon at all:  PCIe r4.0,
      sec 6.7.3.4 says that "a port may optionally send an MSI when there are
      hot-plug events that occur while interrupt generation is disabled, and
      interrupt generation is subsequently enabled."  Note the "optionally".
      
      If an MSI is sent, pciehp will gratuitously turn the slot off and back
      on once the ->resume_early phase has commenced.
      
      If an MSI is not sent, the extant, unacknowledged events in the Slot
      Status register will prevent future notification of presence or link
      changes.
      
      Commit 13c65840 ("PCI: pciehp: Clear Presence Detect and Data Link
      Layer Status Changed on resume") fixed the latter by clearing the events
      in the ->resume phase.  Move this to the ->resume_noirq phase to also
      fix the gratuitous disable/enablement of the slot.
      
      The commit further restored the Slot Control register in the ->resume
      phase, but that's dispensable because as shown above it's already been
      done in the ->resume_noirq phase.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      79037824
    • L
      PCI: portdrv: Deduplicate PM callback iterator · 6ccb127b
      Lukas Wunner 提交于
      Replace suspend_iter() and resume_iter() with a single function pm_iter()
      to allow addition of port service callbacks for further power management
      phases without having to add another iterator each time.
      
      No functional change intended.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      6ccb127b
  9. 31 7月, 2018 3 次提交
    • L
      PCI: pciehp: Avoid slot access during reset · 5b3f7b7d
      Lukas Wunner 提交于
      The ->reset_slot callback introduced by commits:
      
        2e35afae ("PCI: pciehp: Add reset_slot() method") and
        06a8d89a ("PCI: pciehp: Disable link notification across slot reset")
      
      disables notification of Presence Detect Changed and Data Link Layer
      State Changed events for the duration of a secondary bus reset.
      
      However a bus reset not only triggers these events, but may also clear
      the Presence Detect State bit in the Slot Status register and the Data
      Link Layer Link Active bit in the Link Status register momentarily.
      According to Sinan Kaya:
      
       "I know for a fact that bus reset clears the Data Link Layer Active bit
        as soon as link goes down.  It gets set again following link up.
        Presence detect depends on the HW implementation.  QDT root ports
        don't change presence detect for instance since nobody actually
        removed the card.  If an implementation supports in-band presence
        detect, the answer is yes.  As soon as the link goes down, presence
        detect bit will get cleared until recovery."
        https://lkml.kernel.org/r/42e72f83-3b24-f7ef-e5bc-290fae99259a@codeaurora.org
      
        In-band presence detect is also covered in Table 4-15 in PCIe r4.0,
        sec 4.2.6.
      
      pciehp should therefore ensure that any parts of the driver that access
      those bits do not run concurrently to a bus reset.  The only precaution
      the commits took to that effect was to halt interrupt polling.  They
      made no effort to drain the slot workqueue, cancel an outstanding
      Attention Button work, or block slot enable/disable requests via sysfs
      and in the ->probe hook.
      
      Now that pciehp is converted to enable/disable the slot exclusively from
      the IRQ thread, the only places accessing the two above-mentioned bits
      are the IRQ thread and the ->probe hook.  Add locking to serialize them
      with a bus reset.  This obviates the need to halt interrupt polling.
      Do not add locking to the ->get_adapter_status sysfs callback to afford
      users unfettered access to that bit.  Use an rw_semaphore in lieu of a
      regular mutex to allow parallel execution of the non-reset code paths
      accessing the critical bits, i.e. the IRQ thread and the ->probe hook.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Rajat Jain <rajatja@google.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Sinan Kaya <okaya@kernel.org>
      5b3f7b7d
    • H
      PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler · f7368a55
      Heiner Kallweit 提交于
      If we have a threaded interrupt with the handler being NULL, then
      request_threaded_irq() -> __setup_irq() will complain and bail out if the
      IRQF_ONESHOT flag isn't set.  Therefore check for the handler being NULL
      and set IRQF_ONESHOT in this case.
      
      This change is needed to migrate the mei_me driver to
      pci_alloc_irq_vectors() and pci_request_irq().
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      f7368a55
    • C
      PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core · a8651194
      Christoph Hellwig 提交于
      There is nothing arch-specific about PCI or dma-debug, so call
      dma_debug_add_bus() from the PCI core just after registering the bus type.
      
      Most of dma-debug is already generic; this just adds reporting of pending
      dma-allocations on driver unload for arches other than powerpc, sh, and
      x86.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      a8651194
  10. 28 7月, 2018 1 次提交
  11. 27 7月, 2018 1 次提交
    • T
      PCI/AER: Work around use-after-free in pcie_do_fatal_recovery() · bd91b56c
      Thomas Tai 提交于
      When an fatal error is received by a non-bridge device, the device is
      removed, and pci_stop_and_remove_bus_device() deallocates the device
      structure.  The freed device structure is used by subsequent code to send
      uevents and print messages.
      
      Hold a reference on the device until we're finished using it.  This is not
      an ideal fix because pcie_do_fatal_recovery() should not use the device at
      all after removing it, but that's too big a project for right now.
      
      Fixes: 7e9084b3 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices")
      Signed-off-by: NThomas Tai <thomas.tai@oracle.com>
      [bhelgaas: changelog, reduce get/put coverage]
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      bd91b56c
  12. 24 7月, 2018 1 次提交
    • L
      PCI: pciehp: Always enable occupied slot on probe · cdf6b736
      Lukas Wunner 提交于
      Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when
      there are hot-plug events that occur while interrupt generation is
      disabled, and interrupt generation is subsequently enabled."
      
      On probe, we currently clear all event bits in the Slot Status register
      with the notable exception of the Presence Detect Changed bit.  Thereby
      we seek to receive an interrupt for an already occupied slot once event
      notification is enabled.
      
      But because the interrupt is optional, users may have to specify the
      pciehp_force parameter on the command line, which is inconvenient.
      
      Moreover, now that pciehp's event handling has become resilient to
      missed events, a Presence Detect Changed interrupt for a slot which is
      powered on is interpreted as removal of the card.  If the slot has
      already been brought up by the BIOS, receiving such an interrupt on
      probe causes the slot to be powered off and immediately back on, which
      is likewise undesirable.
      
      Avoid both issues by making the behavior of pciehp_force the default and
      clearing the Presence Detect Changed bit on probe.
      
      Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC
      ("Force pciehp, even if OSHP is missing") seems nonsensical because the
      OSHP control method is only relevant for SHCP slots according to the
      PCI Firmware specification r3.0, sec 4.8.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      cdf6b736