- 16 8月, 2018 1 次提交
-
-
由 Alexandru Gagniuc 提交于
If the platform requests Firmware-First error handling, firmware is responsible for reading and clearing AER status bits. If OSPM also clears them, we may miss errors. See ACPI v6.2, sec 18.3.2.5 and 18.4. This race is mostly of theoretical significance, as it is not easy to reasonably demonstrate it in testing. Signed-off-by: NAlexandru Gagniuc <mr.nuke.me@gmail.com> [bhelgaas: add similar guards to pci_cleanup_aer_uncorrect_error_status() and pci_aer_clear_fatal_status()] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 15 8月, 2018 3 次提交
-
-
由 Jakub Kicinski 提交于
Like the NFP4000 and NFP6000, the NFP5000 as an erratum where reading/ writing to PCI config space addresses above 0x600 can cause the NFP to generate PCIe completion timeouts. Limit the NFP5000's PF's config space size to 0x600 bytes as is already done for the NFP4000 and NFP6000. The NFP5000's VF is 0x6003 (PCI_DEVICE_ID_NETRONOME_NFP6000_VF), the same device ID as the NFP6000's VF. Thus, its config space is already limited by the existing use of quirk_nfp6000(). Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NTony Egan <tony.egan@netronome.com>
-
由 Heiner Kallweit 提交于
If flag IRQCHIP_ONESHOT_SAFE isn't set for an irqchip and we have a threaded interrupt with no primary handler, flag IRQF_ONESHOT needs to be set for the interrupt, causing some overhead in the threaded interrupt handler. For more detailed explanation also check following comment in __setup_irq(): The interrupt was requested with handler = NULL, so we use the default primary handler for it. But it does not have the oneshot flag set. In combination with level interrupts this is deadly, because the default primary handler just wakes the thread, then the irq lines is reenabled, but the device still has the level irq asserted. Rinse and repeat.... While this works for edge type interrupts, we play it safe and reject unconditionally because we can't say for sure which type this interrupt really has. The type flags are unreliable as the underlying chip implementation can override them. Another comment in __setup_irq() gives a hint already that this overhead can be avoided for PCI-MSI: Some irq chips like MSI based interrupts are per se one shot safe. Check the chip flags, so we can avoid the unmask dance at the end of the threaded handler for those. Following this let's mark all PCI-MSI irqchips as oneshot-safe. See also discussion here: https://lkml.kernel.org/r/alpine.DEB.2.21.1808032136490.1658@nanos.tec.linutronix.deSigned-off-by: NHeiner Kallweit <hkallweit1@gmail.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
由 Bert Kenward 提交于
Previously we checked the timeout before checking the VPD access completion bit. On a very heavily loaded system this can cause VPD access to timeout. Check the completion bit before checking the timeout. Signed-off-by: NBert Kenward <bkenward@solarflare.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 14 8月, 2018 3 次提交
-
-
由 Myron Stowe 提交于
In commit 27d868b5 ("PCI: Set MPS to match upstream bridge"), we made sure every device's MPS setting matches its upstream bridge, making it more likely that a hot-added device will work in a system with an optimized MPS configuration. Recently I've started encountering systems where the endpoint device's MPSS capability is less than its Root Port's current MPS value, thus the endpoint is not capable of matching its upstream bridge's MPS setting (see: bugzilla via "Link:" below). This leaves the system vulnerable - the upstream Root Port could respond with larger TLPs than the device can handle, and the device will consider them to be 'Malformed'. One could use the "pci=pcie_bus_safe" kernel parameter to work around the issue, but that forces a user to supply a kernel parameter to get the system to function reliably and may end up limiting MPS settings of other unrelated, sub-topologies which could benefit from maintaining their larger values. Augment Keith's approach to include tuning down a Root Port's MPS setting when its hot-added endpoint device is not capable of matching it. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527Signed-off-by: NMyron Stowe <myron.stowe@redhat.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NJon Mason <jdmason@kudzu.us> Cc: Keith Busch <keith.busch@intel.com> Cc: Sinan Kaya <okaya@kernel.org> Cc: Dongdong Liu <liudongdong3@huawei.com>
-
由 Myron Stowe 提交于
PCIe r4.0, sec 9.3.5.4, "Device Control Register", shows both Max_Payload_Size (MPS) and Max_Read_request_Size (MRRS) to be 'RsvdP' for VFs. Just prior to the table it states: "PF and VF functionality is defined in Section 7.5.3.4 except where noted in Table 9-16. For VF fields marked 'RsvdP', the PF setting applies to the VF." All of which implies that with respect to Max_Payload_Size Supported (MPSS), MPS, and MRRS values, we should not be paying any attention to the VF's fields, but rather only to the PF's. Only looking at the PF's fields also logically makes sense as it's the sole physical interface to the PCIe bus. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527 Fixes: 27d868b5 ("PCI: Set MPS to match upstream bridge") Signed-off-by: NMyron Stowe <myron.stowe@redhat.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # 4.3+ Cc: Keith Busch <keith.busch@intel.com> Cc: Sinan Kaya <okaya@kernel.org> Cc: Dongdong Liu <liudongdong3@huawei.com> Cc: Jon Mason <jdmason@kudzu.us>
-
由 Bjorn Helgaas 提交于
Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller. Link: https://bugzilla.kernel.org/show_bug.cgi?id=42679#c134Reported-and-tested-by: NFelix Blüthner <f.bluethner@mailbox.org> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 11 8月, 2018 1 次提交
-
-
由 Alexandru Gagniuc 提交于
When both ends of a PCIe Link are capable of a higher bandwidth than is currently in use, the Link is said to be "downtrained". A downtrained Link may indicate hardware or configuration problems in the system, but it's hard to identify such Links from userspace. Refactor pcie_print_link_status() so it continues to always print PCIe bandwidth information, as several NIC drivers desire. Add a new internal __pcie_print_link_status() to emit a message only when a device's bandwidth is constrained by the fabric and call it from the PCI core for all devices, which identifies all downtrained Links. It also emits messages for a few cases that are technically not downtrained, such as a x4 device in an open-ended x1 slot. Signed-off-by: NAlexandru Gagniuc <mr.nuke.me@gmail.com> [bhelgaas: changelog, move __pcie_print_link_status() declaration to drivers/pci/, rename pcie_check_upstream_link() to pcie_report_downtraining()] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 10 8月, 2018 10 次提交
-
-
由 Logan Gunthorpe 提交于
Intel Sunrise Point PCH hardware has an implementation of the ACS bits that does not comply with the PCIe standard. Add a device-specific quirk, pci_quirk_disable_intel_spt_pch_acs_redir() to disable ACS Redirection on this system. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> [bhelgaas: changelog, split to separate patch] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Logan Gunthorpe 提交于
Intel Sunrise Point (SPT) PCH hardware has an implementation of the ACS bits that does not comply with the PCIe standard. To deal with this we need device-specific quirks to disable ACS redirection. Add a new pci_dev_specific_disable_acs_redir() quirk and a new .disable_acs_redir() function pointer for use by non-compliant devices. No functional change intended. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> [bhelgaas: split to separate patch, move pci_dev_specific_disable_acs_redir() declarations to drivers/pci/pci.h] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Logan Gunthorpe 提交于
Convert the search for device-specific ACS enable quirks from searching a NULL-terminated array to iterating through the array, which is always fixed-size anyway. No functional change intended. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> [bhelgaas: changelog, split to separate patch for reviewability] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Logan Gunthorpe 提交于
To support peer-to-peer traffic on a segment of the PCI hierarchy, we must disable the ACS redirect bits for select PCI bridges. The bridges must be selected before the devices are discovered by the kernel and the IOMMU groups created. Therefore, add a kernel command line parameter to specify devices which must have their ACS bits disabled. The new parameter takes a list of devices separated by a semicolon. Each device specified will have its ACS redirect bits disabled. This is similar to the existing 'resource_alignment' parameter. The ACS Request P2P Request Redirect, P2P Completion Redirect and P2P Egress Control bits are disabled, which is sufficient to always allow passing P2P traffic uninterrupted. The bits are set after the kernel (optionally) enables the ACS bits itself. It is also done regardless of whether the kernel or platform firmware sets the bits. If the user tries to disable the ACS redirect for a device without the ACS capability, print a warning to dmesg. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> [bhelgaas: reorder to add the generic code first and move the device-specific quirk to subsequent patches] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NStephen Bates <sbates@raithlin.com> Reviewed-by: NAlex Williamson <alex.williamson@redhat.com> Acked-by: NChristian König <christian.koenig@amd.com>
-
由 Logan Gunthorpe 提交于
When specifying PCI devices on the kernel command line using a bus/device/function address, bus numbers can change when adding or replacing a device, changing motherboard firmware, or applying kernel parameters like "pci=assign-buses". When bus numbers change, it's likely the command line tweak will be applied to the wrong device. Therefore, it is useful to be able to specify devices with a base bus number and the path of devfns needed to get to it, similar to the "device scope" structure in the Intel VT-d spec, Section 8.3.1. Thus, we add an option to specify devices in the following format: [<domain>:]<bus>:<device>.<func>[/<device>.<func>]* The path can be any segment within the PCI hierarchy of any length and determined through the use of 'lspci -t'. When specified this way, it is less likely that a renumbered bus will result in a valid device specification and the tweak won't be applied to the wrong device. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> [bhelgaas: use "device" instead of "slot" in documentation since that's the usual language in the PCI specs] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NStephen Bates <sbates@raithlin.com> Reviewed-by: NAlex Williamson <alex.williamson@redhat.com> Acked-by: NChristian König <christian.koenig@amd.com>
-
由 Logan Gunthorpe 提交于
Separate out the code to match a PCI device with a string (typically originating from a kernel parameter) from the pci_specified_resource_alignment() function into its own helper function. While we are at it, this change fixes the kernel style of the function (fixing a number of long lines and extra parentheses). Additionally, make the analogous change to the kernel parameter documentation: Separate the description of how to specify a PCI device into its own section at the head of the "pci=" parameter. This patch should have no functional alterations. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> [bhelgaas: use "device" instead of "slot" in documentation since that's the usual language in the PCI specs] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NStephen Bates <sbates@raithlin.com> Reviewed-by: NAlex Williamson <alex.williamson@redhat.com> Acked-by: NChristian König <christian.koenig@amd.com>
-
由 Bjorn Helgaas 提交于
Move declarations for these functions: pci_dev_specific_acs_enabled() pci_dev_specific_enable_acs() from include/linux/pci.h to drivers/pci/pci.h because nothing outside the PCI core needs to use them. Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> -
由 Alex Williamson 提交于
Add a device-specific reset for Intel DC P3700 NVMe device which exhibits a timeout failure in drivers waiting for the ready status to update after NVMe enable if the driver interacts with the device too soon after FLR. As this has been observed in device assignment scenarios, resolve this with a device-specific reset quirk to add an additional, heuristically determined, delay after the FLR completes. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1592654Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
由 Alex Williamson 提交于
The Samsung SM961/PM961 (960 EVO) sometimes fails to return from FLR with the PCI config space reading back as -1. A reproducible instance of this behavior is resolved by clearing the enable bit in the NVMe configuration register and waiting for the ready status to clear (disabling the NVMe controller) prior to FLR. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1542494Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
由 Alex Williamson 提交于
pcie_flr() suggests pcie_has_flr() to ensure that PCIe FLR support is present prior to calling. pcie_flr() is exported while pcie_has_flr() is not. Resolve this. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 07 8月, 2018 2 次提交
-
-
由 Bjorn Helgaas 提交于
Several PCI core files include pci-aspm.h even though they don't need anything provided by that file. Remove the unnecessary includes of it. Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NSinan Kaya <okaya@kernel.org>
-
由 Andy Shevchenko 提交于
The sysfs_match_string() helper returns index of the matching string in an array. Use it in pcie_aspm_set_policy() to simplify the code. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> [bhelgaas: squash sysfs_match_string() fix into original patch for issue Reported-by: Heiner Kallweit <hkallweit1@gmail.com>] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 06 8月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
There isn't a hard dependency of the Xilinx AXI-PCIe host bridge on any architecture. For example: at SiFive we map RISC-V cores to Xilinx FPGAs and connect the Xilinx IP via a TileLink adapter, so the RISC-V Linux port will need to be able to enable PCIE_XILINX in order to have PCIe support. This patch decouples the PCIE_XILINX support from ARCH. Instead it just depends on OF, which is the only true dependency. Signed-off-by: NPalmer Dabbelt <palmer@dabbelt.com> [hch: switch to OF instead of OF_PCI now that the latter is gone] Signed-off-by: NChristoph Hellwig <hch@lst.de> [lorenzo.pieralisi@arm.com: trimmed the commit log] Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
-
- 01 8月, 2018 13 次提交
-
-
由 Bjorn Helgaas 提交于
PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once under #ifdef CONFIG_ACPI_APEI, and again at the top level. This looks like my merge error from these commits: fd3362cb ("PCI/AER: Squash aerdrv_core.c into aerdrv.c") 41cbc9eb ("PCI/AER: Squash ecrc.c into aerdrv.c") Remove the duplicate PCI_EXP_AER_FLAGS definition. Fixes: 41cbc9eb ("PCI/AER: Squash ecrc.c into aerdrv.c") Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NOza Pawandeep <poza@codeaurora.org>
-
由 Lukas Wunner 提交于
On driver probe and on resume from system sleep, pciehp checks the Presence Detect State bit in the Slot Status register to bring up an occupied slot or bring down an unoccupied slot. Both code paths are identical, so deduplicate them per Mika's request. On probe, an additional check is performed to disable power of an unoccupied slot. This can e.g. happen if power was enabled by BIOS. It cannot happen once pciehp has taken control, hence is not necessary on resume: The Slot Control register is set to the same value that it had on suspend by pci_restore_state(), so if the slot was occupied, power is enabled and if it wasn't, power is disabled. Should occupancy have changed during the system sleep transition, power is adjusted by bringing up or down the slot per the paragraph above. To allow for deduplication of the presence check, move the power check to pcie_init(). This seems safer anyway, because right now it is performed while interrupts are already enabled, and although I can't think of a scenario where pciehp_power_off_slot() and the IRQ thread collide, it does feel brittle. However this means that pcie_init() may now write to the Slot Control register before the IRQ is requested. If both the CCIE and HPIE bits happen to be set, pcie_wait_cmd() will wait for an interrupt (instead of polling the Command Completed bit) and eventually emit a timeout message. Additionally, if a level-triggered INTx interrupt is used, the user may see a spurious interrupt splat. Avoid by disabling interrupts before disabling power. (Normally the HPIE and CCIE bits should be clear on probe, but conceivably they may already have been set e.g. by BIOS.) Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
-
由 Lukas Wunner 提交于
Per Mika's request, add an explicit break to the last case of switch statements everywhere in pciehp to be more defensive towards future amendments. Per Gustavo's request, mark all non-empty implicit fallthroughs with a comment to silence warnings triggered by -Wimplicit-fallthrough=2. Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Acked-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
-
由 Hari Vyas 提交于
When a PCI device is detected, pdev->is_added is set to 1 and proc and sysfs entries are created. When the device is removed, pdev->is_added is checked for one and then device is detached with clearing of proc and sys entries and at end, pdev->is_added is set to 0. is_added and is_busmaster are bit fields in pci_dev structure sharing same memory location. A strange issue was observed with multiple removal and rescan of a PCIe NVMe device using sysfs commands where is_added flag was observed as zero instead of one while removing device and proc,sys entries are not cleared. This causes issue in later device addition with warning message "proc_dir_entry" already registered. Debugging revealed a race condition between the PCI core setting the is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue setting the is_busmaster bit in pci_set_master(). As these fields are not handled atomically, that clears the is_added bit. Move the is_added bit to a separate private flag variable and use atomic functions to set and retrieve the device addition state. This avoids the race because is_added no longer shares a memory location with is_busmaster. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283Signed-off-by: NHari Vyas <hari.vyas@broadcom.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NLukas Wunner <lukas@wunner.de> Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Lukas Wunner 提交于
Thunderbolt controllers can be runtime suspended to D3cold to save ~1.5W. This requires that runtime D3 is allowed on its PCIe ports, so whitelist them. The 2015 BIOS cutoff that we've instituted for runtime D3 on PCIe ports is unnecessary on Thunderbolt because we know that even the oldest controller, Light Ridge (2010), is able to suspend its ports to D3 just fine -- specifically including its hotplug ports. And the power saving should be afforded to machines even if their BIOS predates 2015. Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Andreas Noever <andreas.noever@gmail.com>
-
由 Lukas Wunner 提交于
Previously we blacklisted PCIe hotplug ports for runtime D3 because: (a) Ports handled by the firmware must not be transitioned to D3 by the OS behind the firmware's back: https://bugzilla.kernel.org/show_bug.cgi?id=53811 (b) Ports handled natively by the OS lacked runtime D3 support in the pciehp driver. We've just rectified the latter, so allow users to manually enable and test it by passing pcie_port_pm=force on the command line. Vendors are thus put in a position to validate hotplug ports for runtime D3 and perhaps we can someday enable it by default, but with a BIOS cutoff date. Ashok Raj tested runtime D3 on hotplug ports of a SkyLake Xeon-SP in 2017 and encountered Hardware Error NMIs, so this feature clearly cannot be enabled for everyone yet: https://lkml.kernel.org/r/20170503180426.GA4058@otc-nc-03 While at it, remove an erroneous code comment I added with 97a90aee ("PCI: Consolidate conditions to allow runtime PM on PCIe ports") which claims that parents of a hotplug port must stay awake lest interrupts cannot be delivered. That has turned out to be wrong at least for Thunderbolt hotplug ports. Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> -
由 Lukas Wunner 提交于
When performing a function reset via sysfs, the device's config space is accessed in places such as pcie_flr() and its MMIO space is accessed e.g. in reset_ivb_igd(), so ensure accessibility by resuming the device to D0. Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
-
由 Lukas Wunner 提交于
Ensure accessibility of a hotplug port's config space when accessed via sysfs by resuming its parent to D0. Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
-
由 Lukas Wunner 提交于
pciehp's IRQ thread ensures accessibility of the port by runtime resuming its parent to D0. However when the slot is enabled/disabled, the port itself needs to be in D0 because its secondary bus is accessed in: pciehp_check_link_status(), pciehp_configure_device() (both called from board_added()) and pciehp_unconfigure_device() (called from remove_board()). Thus, acquire a runtime PM ref on enable/disablement of the slot. Yinghai Lu additionally discovered that some SkyLake servers feature a Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8) which requires the port to be in D0 when invoking pciehp_power_on_slot() (likewise called from board_added()). If slot power is turned on while in D3hot, link training later fails: https://lkml.kernel.org/r/20170205073454.GA253@wunner.de The spec is silent about such a requirement, but it seems prudent to assume that any hotplug port with a Power Controller may need this. The present commit holds a runtime PM ref whenever slot power is turned on and off, but it doesn't keep the port in D0 as long as slot power is on. If vendors determine that's necessary, they need to amend pciehp to acquire a runtime PM ref in pciehp_power_on_slot() and release one in pciehp_power_off_slot(). Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> -
由 Lukas Wunner 提交于
If a hotplug port is able to send an interrupt, one would naively assume that it is accessible at that moment. After all, if it wouldn't be accessible, i.e. if its parent is in D3hot and the link to the hotplug port is thus down, how should an interrupt come through? It turns out that assumption is wrong at least for Thunderbolt: Even though its parents are in D3hot, a Thunderbolt hotplug port is able to signal interrupts. Because the port's config space is inaccessible and resuming the parents may sleep, the hard IRQ handler has to defer runtime resuming the parents and reading the Slot Status register to the IRQ thread. If the hotplug port uses a level-triggered INTx interrupt, it needs to be masked until the IRQ thread has cleared the signaled events. For simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts. Note that if the interrupt is shared (which can only happen for INTx), other devices are starved from receiving interrupts until the IRQ thread is scheduled, has runtime resumed the hotplug port's parents and has read and cleared the Slot Status register. That delay is dominated by the 10 ms D3hot->D0 transition time of each parent port. The worst case is a Thunderbolt downstream port at the end of a daisy chain: There may be up to six Thunderbolt controllers in-between it and the root port, each comprising an upstream and downstream port, plus its own upstream port. That's 13 x 10 = 130 ms. Possible mitigations are polling the interrupt while it's disabled or reducing the d3_delay of Thunderbolt ports if possible. Open code masking of the interrupt instead of requesting it with the IRQF_ONESHOT flag to minimize the period during which it is masked. (IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.) PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the associated form factor specification, a hotplug capable Downstream Port must support generation of a wakeup event (using the PME mechanism) on hotplug events that occur when the system is in a sleep state or the Port is in device state D1, D2, or D3Hot." This would seem to imply that PME needs to be enabled on the hotplug port when it is runtime suspended. pci_enable_wake() currently doesn't enable PME on bridges, it may be necessary to add an exemption for hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the PME_Status bit is not set when an interrupt occurs while the hotplug port is in D3hot, even if PME is enabled. (I've tested this on a Mac and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in negotiate_os_control(), modifying it to 1 didn't change the behavior.) (Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event interrupts (when both are implemented) always share the same MSI or MSI-X vector". That would only seem to apply to Root Ports, however the section never mentions Root Ports, only Downstream Ports. This is explained in the definition of "Downstream Port" in the "Terms and Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that are not the Upstream Port are Downstream Ports. All Ports on a Root Complex are Downstream Ports.") Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
-
由 Lukas Wunner 提交于
Upon resume from system sleep, the Slot Control register is written via: pci_pm_resume_noirq() pci_pm_default_resume_early() pci_restore_state() pci_restore_pcie_state() PCIe r4.0, sec 6.7.3.2 says that after "issuing a write transaction that targets any portion of the Port's Slot Control register, [...] software must wait for [the] command to complete before issuing the next command". pciehp currently fails to enforce that rule after the above-mentioned write. Fix it. (Moving restoration of the Slot Control register to pciehp doesn't seem to make sense because the other PCIe hotplug drivers may need it as well.) Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> -
由 Lukas Wunner 提交于
Thunderbolt hotplug ports that were occupied before system sleep resume with their downstream link in "off" state. Only after the Thunderbolt controller has reestablished the PCIe tunnels does the link go up. As a result, a spurious Presence Detect Changed and/or Data Link Layer State Changed event occurs. The events are not immediately acted upon because tunnel reestablishment happens in the ->resume_noirq phase, when interrupts are still disabled. Also, notification of events may initially be disabled in the Slot Control register when coming out of system sleep and is reenabled in the ->resume_noirq phase through: pci_pm_resume_noirq() pci_pm_default_resume_early() pci_restore_state() pci_restore_pcie_state() It is not guaranteed that the events are acted upon at all: PCIe r4.0, sec 6.7.3.4 says that "a port may optionally send an MSI when there are hot-plug events that occur while interrupt generation is disabled, and interrupt generation is subsequently enabled." Note the "optionally". If an MSI is sent, pciehp will gratuitously turn the slot off and back on once the ->resume_early phase has commenced. If an MSI is not sent, the extant, unacknowledged events in the Slot Status register will prevent future notification of presence or link changes. Commit 13c65840 ("PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume") fixed the latter by clearing the events in the ->resume phase. Move this to the ->resume_noirq phase to also fix the gratuitous disable/enablement of the slot. The commit further restored the Slot Control register in the ->resume phase, but that's dispensable because as shown above it's already been done in the ->resume_noirq phase. Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> -
由 Lukas Wunner 提交于
Replace suspend_iter() and resume_iter() with a single function pm_iter() to allow addition of port service callbacks for further power management phases without having to add another iterator each time. No functional change intended. Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 31 7月, 2018 3 次提交
-
-
由 Lukas Wunner 提交于
The ->reset_slot callback introduced by commits: 2e35afae ("PCI: pciehp: Add reset_slot() method") and 06a8d89a ("PCI: pciehp: Disable link notification across slot reset") disables notification of Presence Detect Changed and Data Link Layer State Changed events for the duration of a secondary bus reset. However a bus reset not only triggers these events, but may also clear the Presence Detect State bit in the Slot Status register and the Data Link Layer Link Active bit in the Link Status register momentarily. According to Sinan Kaya: "I know for a fact that bus reset clears the Data Link Layer Active bit as soon as link goes down. It gets set again following link up. Presence detect depends on the HW implementation. QDT root ports don't change presence detect for instance since nobody actually removed the card. If an implementation supports in-band presence detect, the answer is yes. As soon as the link goes down, presence detect bit will get cleared until recovery." https://lkml.kernel.org/r/42e72f83-3b24-f7ef-e5bc-290fae99259a@codeaurora.org In-band presence detect is also covered in Table 4-15 in PCIe r4.0, sec 4.2.6. pciehp should therefore ensure that any parts of the driver that access those bits do not run concurrently to a bus reset. The only precaution the commits took to that effect was to halt interrupt polling. They made no effort to drain the slot workqueue, cancel an outstanding Attention Button work, or block slot enable/disable requests via sysfs and in the ->probe hook. Now that pciehp is converted to enable/disable the slot exclusively from the IRQ thread, the only places accessing the two above-mentioned bits are the IRQ thread and the ->probe hook. Add locking to serialize them with a bus reset. This obviates the need to halt interrupt polling. Do not add locking to the ->get_adapter_status sysfs callback to afford users unfettered access to that bit. Use an rw_semaphore in lieu of a regular mutex to allow parallel execution of the non-reset code paths accessing the critical bits, i.e. the IRQ thread and the ->probe hook. Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Rajat Jain <rajatja@google.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Sinan Kaya <okaya@kernel.org>
-
由 Heiner Kallweit 提交于
If we have a threaded interrupt with the handler being NULL, then request_threaded_irq() -> __setup_irq() will complain and bail out if the IRQF_ONESHOT flag isn't set. Therefore check for the handler being NULL and set IRQF_ONESHOT in this case. This change is needed to migrate the mei_me driver to pci_alloc_irq_vectors() and pci_request_irq(). Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Christoph Hellwig 提交于
There is nothing arch-specific about PCI or dma-debug, so call dma_debug_add_bus() from the PCI core just after registering the bus type. Most of dma-debug is already generic; this just adds reporting of pending dma-allocations on driver unload for arches other than powerpc, sh, and x86. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
-
- 28 7月, 2018 1 次提交
-
-
由 Dan Carpenter 提交于
IB_WIN_SIZE is larger than INT_MAX so we need to cast it to u64. Fixes: 9af6bcb1 ("PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 27 7月, 2018 1 次提交
-
-
由 Thomas Tai 提交于
When an fatal error is received by a non-bridge device, the device is removed, and pci_stop_and_remove_bus_device() deallocates the device structure. The freed device structure is used by subsequent code to send uevents and print messages. Hold a reference on the device until we're finished using it. This is not an ideal fix because pcie_do_fatal_recovery() should not use the device at all after removing it, but that's too big a project for right now. Fixes: 7e9084b3 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices") Signed-off-by: NThomas Tai <thomas.tai@oracle.com> [bhelgaas: changelog, reduce get/put coverage] Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 24 7月, 2018 1 次提交
-
-
由 Lukas Wunner 提交于
Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when there are hot-plug events that occur while interrupt generation is disabled, and interrupt generation is subsequently enabled." On probe, we currently clear all event bits in the Slot Status register with the notable exception of the Presence Detect Changed bit. Thereby we seek to receive an interrupt for an already occupied slot once event notification is enabled. But because the interrupt is optional, users may have to specify the pciehp_force parameter on the command line, which is inconvenient. Moreover, now that pciehp's event handling has become resilient to missed events, a Presence Detect Changed interrupt for a slot which is powered on is interpreted as removal of the card. If the slot has already been brought up by the BIOS, receiving such an interrupt on probe causes the slot to be powered off and immediately back on, which is likewise undesirable. Avoid both issues by making the behavior of pciehp_force the default and clearing the Presence Detect Changed bit on probe. Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC ("Force pciehp, even if OSHP is missing") seems nonsensical because the OSHP control method is only relevant for SHCP slots according to the PCI Firmware specification r3.0, sec 4.8. Signed-off-by: NLukas Wunner <lukas@wunner.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
-