1. 19 8月, 2017 1 次提交
    • S
      PCI: Avoid race while enabling upstream bridges · 40f11adc
      Srinath Mannam 提交于
      When we enable a device, we first enable any upstream bridges.  If a bridge
      has multiple downstream devices and we enable them simultaneously, the race
      to enable the upstream bridge may cause problems.  Consider this hierarchy:
      
        bridge A --+-- device B
                   +-- device C
      
      If drivers for B and C call pci_enable_device() simultaneously, both will
      attempt to enable A, which involves setting PCI_COMMAND_MASTER via
      pci_set_master() and PCI_COMMAND_MEMORY via pci_enable_resources().
      
      In the following sequence, B's update to set A's PCI_COMMAND_MEMORY is
      lost, and neither B nor C will work correctly:
      
            B                                C
        pci_set_master(A)
          cmd = read(A, PCI_COMMAND)
          cmd |= PCI_COMMAND_MASTER
                                         pci_set_master(A)
                                           cmd = read(A, PCI_COMMAND)
                                           cmd |= PCI_COMMAND_MASTER
          write(A, PCI_COMMAND, cmd)
        pci_enable_device(A)
          pci_enable_resources(A)
            cmd = read(A, PCI_COMMAND)
            cmd |= PCI_COMMAND_MEMORY
            write(A, PCI_COMMAND, cmd)
                                           write(A, PCI_COMMAND, cmd)
      
      Avoid this race by holding a new pci_bridge_mutex while enabling a bridge.
      This ensures that both PCI_COMMAND_MASTER and PCI_COMMAND_MEMORY will be
      updated before another thread can start enabling the bridge.
      
      Note that although pci_enable_bridge() is recursive, it enables any
      upstream bridges *before* acquiring the mutex.  When it acquires the mutex
      and calls pci_set_master() and pci_enable_device(), any upstream bridges
      have already been enabled so pci_enable_device() will not deadlock by
      calling pci_enable_bridge() again.
      Signed-off-by: NSrinath Mannam <srinath.mannam@broadcom.com>
      [bhelgaas: changelog, comment]
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      40f11adc
  2. 13 7月, 2017 1 次提交
    • R
      PCI / PM: Restore PME Enable after config space restoration · 0ce3fcaf
      Rafael J. Wysocki 提交于
      Commit dc15e71e (PCI / PM: Restore PME Enable if skipping wakeup
      setup) introduced a mechanism by which the PME Enable bit can be
      restored by pci_enable_wake() if dev->wakeup_prepared is set in
      case it has been overwritten by PCI config space restoration.
      
      However, that commit overlooked the fact that on some systems (Dell
      XPS13 9360 in particular) the AML handling wakeup events checks PME
      Status and PME Enable and it won't trigger a Notify() for devices
      where those bits are not set while it is running.
      
      That happens during resume from suspend-to-idle when pci_restore_state()
      invoked by pci_pm_default_resume_early() clears PME Enable before the
      wakeup events are processed by AML, effectively causing those wakeup
      events to be ignored.
      
      Fix this issue by restoring the PME Enable configuration right after
      pci_restore_state() has been called instead of doing that in
      pci_enable_wake().
      
      Fixes: dc15e71e (PCI / PM: Restore PME Enable if skipping wakeup setup)
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      0ce3fcaf
  3. 03 7月, 2017 2 次提交
  4. 01 7月, 2017 1 次提交
  5. 28 6月, 2017 2 次提交
  6. 17 6月, 2017 1 次提交
  7. 15 6月, 2017 2 次提交
  8. 31 5月, 2017 1 次提交
  9. 24 5月, 2017 1 次提交
  10. 18 5月, 2017 1 次提交
    • A
      PCI: Do not disregard parent resources starting at 0x0 · 31342330
      Ard Biesheuvel 提交于
      Commit f44116ae ("PCI: Remove pci_find_parent_resource() use for
      allocation") updated the logic that iterates over all bus resources and
      compares them to a given resource, in order to decide whether one is the
      parent of the latter.
      
      This change inadvertently causes pci_find_parent_resource() to disregard
      resources starting at address 0x0, resulting in an error such as the one
      below on ARM systems whose I/O window starts at 0x0.
      
        pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff window]
        pci_bus 0000:00: root bus resource [io  0x0000-0xffff window]
        pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff window]
        pci_bus 0000:00: root bus resource [bus 00-0f]
        pci 0000:00:01.0: PCI bridge to [bus 01]
        pci 0000:00:02.0: PCI bridge to [bus 02]
        pci 0000:00:03.0: PCI bridge to [bus 03]
        pci 0000:00:03.0: can't claim BAR 13 [io  0x0000-0x0fff]: no compatible bridge window
        pci 0000:03:01.0: can't claim BAR 0 [io  0x0000-0x001f]: no compatible bridge window
      
      While this never happens on x86, it is perfectly legal in general for a PCI
      MMIO or IO window to start at address 0x0, and it was supported in the code
      before commit f44116ae.
      
      Drop the test for res->start != 0; resource_contains() already checks
      whether [start, end) completely covers the resource, and so it should be
      redundant.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      31342330
  11. 25 4月, 2017 1 次提交
    • L
      PCI: Implement devm_pci_remap_cfgspace() · 490cb6dd
      Lorenzo Pieralisi 提交于
      The introduction of the pci_remap_cfgspace() interface allows PCI host
      controller drivers to map PCI config space through a dedicated kernel
      interface. Current PCI host controller drivers use the devm_ioremap_*()
      devres interfaces to map PCI configuration space regions so in order to
      update them to the new pci_remap_cfgspace() mapping interface a new set of
      devres interfaces should be implemented so that PCI host controller drivers
      can make use of them.
      
      Introduce two new functions in the PCI kernel layer and Devres
      documentation:
      
      - devm_pci_remap_cfgspace()
      - devm_pci_remap_cfg_resource()
      
      so that PCI host controller drivers can make use of them to map PCI
      configuration space regions.
      Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      490cb6dd
  12. 21 4月, 2017 1 次提交
  13. 20 4月, 2017 6 次提交
    • C
      PCI: Export pcie_flr() · a60a2b73
      Christoph Hellwig 提交于
      Currently we opencode the FLR sequence in lots of place; export a core
      helper instead.  We split out the probing for FLR support as all the
      non-core callers already know their hardware.
      
      Note that in the new pci_has_flr() function the quirk check has been moved
      before the capability check as there is no point in reading the capability
      in this case.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      a60a2b73
    • L
      PCI: Remove __weak tag from pci_remap_iospace() · 7b309aef
      Lorenzo Pieralisi 提交于
      pci_remap_iospace() is marked as a weak symbol even though no architecture
      is currently overriding it; given that its implementation internals have
      already code paths that are arch specific (ie PCI_IOBASE and
      ioremap_page_range() attributes) there is no need to leave the weak symbol
      in the kernel since the same functionality can be achieved by customizing
      per-arch the corresponding functionality.
      
      Remove the __weak symbol from pci_remap_iospace().
      Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      7b309aef
    • Y
      PCI: Don't resize resources when realigning all devices in system · e3adec72
      Yongji Xie 提交于
      The "pci=resource_alignment" argument aligns BARs of designated devices by
      artificially increasing their size.  Increasing the size increases the
      alignment and prevents other resources from being assigned in the same
      alignment region, e.g., in the same page, but it can break drivers that use
      the BAR size to locate things, e.g., ilo_map_device() does this:
      
        off = pci_resource_len(pdev, bar) - 0x2000;
      
      The new pcibios_default_alignment() interface allows an arch to request
      that *all* BARs in the system be aligned to a larger size.  In this case,
      we don't need to artificially increase the resource size because we know
      every BAR of every device will be realigned, so nothing will share the same
      alignment region.
      
      Use IORESOURCE_STARTALIGN to request realignment of PCI BARs when we know
      we're realigning all BARs in the system.
      
      [bhelgaas: comment, changelog]
      Signed-off-by: NYongji Xie <elohimes@gmail.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      e3adec72
    • B
      PCI: Don't reassign resources that are already aligned · 0dde1c08
      Bjorn Helgaas 提交于
      The "pci=resource_alignment=" kernel argument designates devices for which
      we want alignment greater than is required by the PCI specs.  Previously we
      set IORESOURCE_UNSET for every MEM resource of those devices, even if the
      resource was *already* sufficiently aligned.
      
      If a resource is already sufficiently aligned, leave it alone and don't try
      to reassign it.
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      0dde1c08
    • B
      PCI: Factor pci_reassigndev_resource_alignment() · 81a5e70e
      Bjorn Helgaas 提交于
      Pull the BAR size adjustment out into a new function,
      pci_request_resource_alignment(), and add a comment about how and why we
      increase the resource size and alignment.
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      81a5e70e
    • Y
      PCI: Add pcibios_default_alignment() for arch-specific alignment control · 0a701aa6
      Yongji Xie 提交于
      When VFIO passes through a PCI device to a guest, it does not allow the
      guest to mmap BARs that are smaller than PAGE_SIZE unless it can reserve
      the rest of the page (see vfio_pci_probe_mmaps()). This is because a page
      might contain several small BARs for unrelated devices and a guest should
      not be able to access all of them.
      
      VFIO emulates guest accesses to non-mappable BARs, which is functional but
      slow. On systems with large page sizes, e.g., PowerNV with 64K pages, BARs
      are more likely to share a page and performance is more likely to be a
      problem.
      
      Add a weak function to set default alignment for all PCI devices.  An arch
      can override it to force the PCI core to place memory BARs on their own
      pages.
      Signed-off-by: NYongji Xie <elohimes@gmail.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      0a701aa6
  14. 19 4月, 2017 2 次提交
    • L
      PCI: Freeze PME scan before suspending devices · ea00353f
      Lukas Wunner 提交于
      Laurent Pinchart reported that the Renesas R-Car H2 Lager board (r8a7790)
      crashes during suspend tests.  Geert Uytterhoeven managed to reproduce the
      issue on an M2-W Koelsch board (r8a7791):
      
        It occurs when the PME scan runs, once per second.  During PME scan, the
        PCI host bridge (rcar-pci) registers are accessed while its module clock
        has already been disabled, leading to the crash.
      
      One reproducer is to configure s2ram to use "s2idle" instead of "deep"
      suspend:
      
        # echo 0 > /sys/module/printk/parameters/console_suspend
        # echo s2idle > /sys/power/mem_sleep
        # echo mem > /sys/power/state
      
      Another reproducer is to write either "platform" or "processors" to
      /sys/power/pm_test.  It does not (or is less likely) to happen during full
      system suspend ("core" or "none") because system suspend also disables
      timers, and thus the workqueue handling PME scans no longer runs.  Geert
      believes the issue may still happen in the small window between disabling
      module clocks and disabling timers:
      
        # echo 0 > /sys/module/printk/parameters/console_suspend
        # echo platform > /sys/power/pm_test    # Or "processors"
        # echo mem > /sys/power/state
      
      (Make sure CONFIG_PCI_RCAR_GEN2 and CONFIG_USB_OHCI_HCD_PCI are enabled.)
      
      Rafael Wysocki agrees that PME scans should be suspended before the host
      bridge registers become inaccessible.  To that end, queue the task on a
      workqueue that gets frozen before devices suspend.
      
      Rafael notes however that as a result, some wakeup events may be missed if
      they are delivered via PME from a device without working IRQ (which hence
      must be polled) and occur after the workqueue has been frozen.  If that
      turns out to be an issue in practice, it may be possible to solve it by
      calling pci_pme_list_scan() once directly from one of the host bridge's
      pm_ops callbacks.
      
      Stacktrace for posterity:
      
        PM: Syncing filesystems ... [   38.566237] done.
        PM: Preparing system for sleep (mem)
        Freezing user space processes ... [   38.579813] (elapsed 0.001 seconds) done.
        Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
        PM: Suspending system (mem)
        PM: suspend of devices complete after 152.456 msecs
        PM: late suspend of devices complete after 2.809 msecs
        PM: noirq suspend of devices complete after 29.863 msecs
        suspend debug: Waiting for 5 second(s).
        Unhandled fault: asynchronous external abort (0x1211) at 0x00000000
        pgd = c0003000
        [00000000] *pgd=80000040004003, *pmd=00000000
        Internal error: : 1211 [#1] SMP ARM
        Modules linked in:
        CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted
        4.9.0-rc1-koelsch-00011-g68db9bc8 #3383
        Hardware name: Generic R8A7791 (Flattened Device Tree)
        Workqueue: events pci_pme_list_scan
        task: eb56e140 task.stack: eb58e000
        PC is at pci_generic_config_read+0x64/0x6c
        LR is at rcar_pci_cfg_base+0x64/0x84
        pc : [<c041d7b4>]    lr : [<c04309a0>]    psr: 600d0093
        sp : eb58fe98  ip : c041d750  fp : 00000008
        r10: c0e2283c  r9 : 00000000  r8 : 600d0013
        r7 : 00000008  r6 : eb58fed6  r5 : 00000002  r4 : eb58feb4
        r3 : 00000000  r2 : 00000044  r1 : 00000008  r0 : 00000000
        Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
        Control: 30c5387d  Table: 6a9f6c80  DAC: 55555555
        Process kworker/1:1 (pid: 20, stack limit = 0xeb58e210)
        Stack: (0xeb58fe98 to 0xeb590000)
        fe80:                                                       00000002 00000044
        fea0: eb6f5800 c041d9b0 eb58feb4 00000008 00000044 00000000 eb78a000 eb78a000
        fec0: 00000044 00000000 eb9aff00 c0424bf0 eb78a000 00000000 eb78a000 c0e22830
        fee0: ea8a6fc0 c0424c5c eaae79c0 c0424ce0 eb55f380 c0e22838 eb9a9800 c0235fbc
        ff00: eb55f380 c0e22838 eb55f380 eb9a9800 eb9a9800 eb58e000 eb9a9824 c0e02100
        ff20: eb55f398 c02366c4 eb56e140 eb5631c0 00000000 eb55f380 c023641c 00000000
        ff40: 00000000 00000000 00000000 c023a928 cd105598 00000000 40506a34 eb55f380
        ff60: 00000000 00000000 dead4ead ffffffff ffffffff eb58ff74 eb58ff74 00000000
        ff80: 00000000 dead4ead ffffffff ffffffff eb58ff90 eb58ff90 eb58ffac eb5631c0
        ffa0: c023a844 00000000 00000000 c0206d68 00000000 00000000 00000000 00000000
        ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 3a81336c 10ccd1dd
        [<c041d7b4>] (pci_generic_config_read) from [<c041d9b0>]
        (pci_bus_read_config_word+0x58/0x80)
        [<c041d9b0>] (pci_bus_read_config_word) from [<c0424bf0>]
        (pci_check_pme_status+0x34/0x78)
        [<c0424bf0>] (pci_check_pme_status) from [<c0424c5c>] (pci_pme_wakeup+0x28/0x54)
        [<c0424c5c>] (pci_pme_wakeup) from [<c0424ce0>] (pci_pme_list_scan+0x58/0xb4)
        [<c0424ce0>] (pci_pme_list_scan) from [<c0235fbc>]
        (process_one_work+0x1bc/0x308)
        [<c0235fbc>] (process_one_work) from [<c02366c4>] (worker_thread+0x2a8/0x3e0)
        [<c02366c4>] (worker_thread) from [<c023a928>] (kthread+0xe4/0xfc)
        [<c023a928>] (kthread) from [<c0206d68>] (ret_from_fork+0x14/0x2c)
        Code: ea000000 e5903000 f57ff04f e3a00000 (e5843000)
        ---[ end trace 667d43ba3aa9e589 ]---
      
      Fixes: df17e62e ("PCI: Add support for polling PME state on suspended legacy PCI devices")
      Reported-and-tested-by: NLaurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
      Reported-and-tested-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable@vger.kernel.org	# 2.6.37+
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Cc: Simon Horman <horms+renesas@verge.net.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      ea00353f
    • Y
      PCI: Ignore requested alignment for IOV BARs · ea629d87
      Yongji Xie 提交于
      We would call pci_reassigndev_resource_alignment() before
      pci_init_capabilities().  So the requested alignment would never work for
      IOV BARs.
      
      Furthermore, it's meaningless to request additional alignment for IOV BARs,
      the IOV BAR alignment is only determined by the VF BAR size.
      Signed-off-by: NYongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      ea629d87
  15. 04 4月, 2017 1 次提交
  16. 30 3月, 2017 1 次提交
  17. 15 3月, 2017 1 次提交
  18. 03 2月, 2017 1 次提交
  19. 30 11月, 2016 1 次提交
  20. 29 11月, 2016 1 次提交
  21. 18 11月, 2016 7 次提交
    • L
      PCI: pciehp: Add runtime PM support for PCIe hotplug ports · 68db9bc8
      Lukas Wunner 提交于
      Linux 4.8 added support for runtime suspending PCIe ports to D3hot with
      commit 006d44e4 ("PCI: Add runtime PM support for PCIe ports"), but
      excluded hotplug ports.  Those are now afforded runtime PM by the present
      commit.
      
      Hotplug ports require a few extra considerations:
      
      - The configuration space of the port remains accessible in D3hot, so all
        the functions to read or modify the Slot Status and Slot Control
        registers need not be modified.  Even turning on slot power doesn't seem
        to require the port to be in D0, at least the PCIe spec doesn't say so
        and I confirmed that by testing with a Thunderbolt controller.
      
      - However D0 is required to access devices on the secondary bus.  This
        happens in pciehp_check_link_status() and pciehp_configure_device() (both
        called from board_added()) and in pciehp_unconfigure_device() (called
        from remove_board()), so acquire a runtime PM ref for their invocation.
      
      - The hotplug port stays active as long as it has active children.  If all
        hotplugged devices below the port runtime suspend, the port is allowed to
        runtime suspend as well.  Plug and unplug detection continues to work in
        D3hot.
      
      - Hotplug interrupts are delivered in-band, so while the hotplug port
        itself is allowed to go to D3hot, its parent ports must stay in D0 for
        interrupts to come through.  Add a corresponding restriction to
        pci_dev_check_d3cold().
      
      - Runtime PM may only be allowed if the hotplug port is handled natively by
        the OS.  On ACPI systems, the port may alternatively be handled by the
        firmware and things break if the OS puts the port into D3 behind the
        firmware's back:  E.g. Thunderbolt hotplug ports on non-Macs are handled
        by Intel's firmware in System Management Mode and the firmware is known
        to access devices on the port's secondary bus without checking first if
        the port is in D0: https://bugzilla.kernel.org/show_bug.cgi?id=53811Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      CC: Mika Westerberg <mika.westerberg@linux.intel.com>
      68db9bc8
    • L
      PCI: Unfold conditions to block runtime PM on PCIe ports · 718a0609
      Lukas Wunner 提交于
      The conditions to block D3 on parent ports are currently condensed into a
      single expression in pci_dev_check_d3cold().  Upcoming commits will add
      further conditions for hotplug ports, making this expression fairly large
      and impenetrable.  Unfold the conditions to maintain readability when they
      are amended.
      
      No functional change intended.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      CC: Mika Westerberg <mika.westerberg@linux.intel.com>
      CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      718a0609
    • L
      PCI: Consolidate conditions to allow runtime PM on PCIe ports · 97a90aee
      Lukas Wunner 提交于
      The conditions to allow runtime PM on PCIe ports are currently spread
      across two different files:  The condition relating to hotplug ports is
      located in portdrv_pci.c whereas all other conditions are located in pci.c.
      
      Consolidate all conditions in a single place in pci.c, thus making it
      easier to follow the logic and amend conditions down the road.
      
      Note that the condition relating to hotplug ports is inserted *before* the
      condition relating to the "pcie_port_pm=force" command line option, so
      runtime PM is not afforded to hotplug ports even if this option is given.
      That's exactly how the code behaved up until now.  If this is not desired,
      the ordering of the conditions can simply be reversed.
      
      No functional change intended.
      Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      97a90aee
    • L
      PCI: Activate runtime PM on a PCIe port only if it can suspend · c6a63307
      Lukas Wunner 提交于
      Currently pcie_portdrv_probe() activates runtime PM on a PCIe port even
      if it will never actually suspend because the BIOS is too old or the
      "pcie_port_pm=off" option was specified on the kernel command line.
      
      A few CPU cycles can be saved by not activating runtime PM at all in these
      cases, because rpm_idle() and rpm_suspend() will bail out right at the
      beginning when calling rpm_check_suspend_allowed(), instead of carrying out
      various locking and assignments, invoking rpm_callback(), getting back
      -EBUSY and rolling everything back.
      
      The conditions checked in pci_bridge_d3_possible() are all static, they
      never change during uptime of the system, hence it's safe to call this to
      determine if runtime PM should be activated.
      
      No functional change intended.
      Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c6a63307
    • L
      PCI: Speed up algorithm in pci_bridge_d3_update() · e8559b71
      Lukas Wunner 提交于
      After a device has been added, removed or had its D3cold attributes
      changed, we recheck whether its parent bridge may runtime suspend to D3hot
      with pci_bridge_d3_update().
      
      The most naive algorithm would be to iterate over the bridge's children and
      check if any of them are blocking D3.
      
      The function already tries to be a bit smarter than that by first checking
      the device that was changed.  If this device already blocks D3 on the
      bridge, then walking over all the other children can be skipped.  A
      drawback of this approach is that if the device is *not* blocking D3, it
      will be checked a second time by pci_walk_bus().  But that's cheap and is
      outweighed by the performance gain of potentially skipping pci_walk_bus()
      altogether.
      
      The algorithm can be optimized further by taking into account if D3 is
      currently allowed for the bridge, as shown in the following truth table:
      
      (a)  remove &&  bridge_d3:  D3 is currently allowed for the bridge and
                                  removing one of its children won't change
                                  that.  No action necessary.
      (b)  remove && !bridge_d3:  D3 may now be allowed for the bridge if the
                                  removed child was the only one blocking it.
                                  Check all its siblings to verify that.
      (c) !remove &&  bridge_d3:  D3 may now be disallowed but this can only
                                  be caused by the added/changed child, not
                                  any of its siblings.  Check only that single
                                  device.
      (d) !remove && !bridge_d3:  D3 may now be allowed for the bridge if the
                                  changed child was the only one blocking it.
                                  Check all its siblings to verify that.
                                  By checking beforehand if the changed child
                                  is blocking D3, we may be able to skip
                                  checking its siblings.
      
      Currently we do not special-case option (a) and in case of option (c) we
      gratuitously call pci_walk_bus().  Speed up the algorithm by adding these
      optimizations.  Reword the comments a bit in an attempt to improve clarity.
      
      No functional change intended.
      Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      e8559b71
    • L
      PCI: Autosense device removal in pci_bridge_d3_update() · 1ed276a7
      Lukas Wunner 提交于
      The algorithm to update the flag indicating whether a bridge may go to D3
      makes a few optimizations based on whether the update was caused by the
      removal of a device on the one hand, versus the addition of a device or the
      change of its D3cold flags on the other hand.
      
      The information whether the update pertains to a removal is currently
      passed in by the caller, but the function may as well determine that itself
      by examining the device in question, thereby allowing for a considerable
      simplification and reduction of the code.
      
      Out of several options to determine removal, I've chosen the function
      device_is_registered() because it's cheap:  It merely returns the
      dev->kobj.state_in_sysfs flag.  That flag is set through device_add() when
      the root bus is scanned and cleared through device_remove().  The call to
      pci_bridge_d3_update() happens after each of these calls, respectively, so
      the ordering is correct.
      
      No functional change intended.
      Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      1ed276a7
    • L
      PCI: Don't acquire ref on parent in pci_bridge_d3_update() · 738a7edb
      Lukas Wunner 提交于
      This function is always called with an existing pci_dev struct, which
      holds a reference on the pci_bus struct it resides on, which in turn
      holds a reference on pci_bus->bridge, which is the pci_dev's parent.
      
      Hence there's no need to acquire an additional ref on the parent.
      
      More specifically, the pci_dev exists until pci_destroy_dev() drops the
      final reference on it, so all calls to pci_bridge_d3_update() must be
      finished before that.  It is arguably the caller's responsibility to ensure
      that it doesn't call pci_bridge_d3_update() with a pci_dev that might
      suddenly disappear, but in any case the existing callers are all safe:
      
      - The call in pci_destroy_dev() happens before the call to put_device().
      - The call in pci_bus_add_device() is synchronized with pci_destroy_dev()
        using pci_lock_rescan_remove().
      - The calls to pci_d3cold_disable() from the xhci and nouveau drivers
        are safe because a ref on the pci_dev is held as long as it's bound to
        a driver.
      - The calls to pci_d3cold_enable() / pci_d3cold_disable() when modifying
        the sysfs "d3cold_allowed" entry are also safe because kernfs_drain()
        waits for existing sysfs users to finish before removing the entry,
        and pci_destroy_dev() is called way after that.
      
      No functional change intended.
      Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      738a7edb
  22. 12 11月, 2016 1 次提交
    • A
      PCI: Check for PME in targeted sleep state · 6496ebd7
      Alan Stern 提交于
      One some systems, the firmware does not allow certain PCI devices to be put
      in deep D-states.  This can cause problems for wakeup signalling, if the
      device does not support PME# in the deepest allowed suspend state.  For
      example, Pierre reports that on his system, ACPI does not permit his xHCI
      host controller to go into D3 during runtime suspend -- but D3 is the only
      state in which the controller can generate PME# signals.  As a result, the
      controller goes into runtime suspend but never wakes up, so it doesn't work
      properly.  USB devices plugged into the controller are never detected.
      
      If the device relies on PME# for wakeup signals but is not capable of
      generating PME# in the target state, the PCI core should accurately report
      that it cannot do wakeup from runtime suspend.  This patch modifies the
      pci_dev_run_wake() routine to add this check.
      Reported-by: NPierre de Villemereuil <flyos@mailoo.org>
      Tested-by: NPierre de Villemereuil <flyos@mailoo.org>
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      CC: stable@vger.kernel.org
      CC: Lukas Wunner <lukas@wunner.de>
      6496ebd7
  23. 29 9月, 2016 3 次提交
    • Y
      PCI: Ignore requested alignment for VF BARs · 62d9a78f
      Yongji Xie 提交于
      Resource allocation for VFs is done via the VF BARx registers in the PF's
      SR-IOV Capability, and the BARs in the VFs themselves are read-only zeros
      (see SR-IOV spec r1.1, secs 3.3.14 and 3.4.1.11).
      
      Even though the actual VF BARs are read-only zeros, the VF dev->resource[]
      structs describe the space allocated for the VF (this is a piece of the
      space described by the VF BARx register in the PF's SR-IOV capability).
      
      It's meaningless to request additional alignment for a VF: the VF BAR
      alignment is completely determined by the alignment of the VF BARx in the
      PF and the size of the VF BAR.
      
      Ignore the user's alignment requests for VF devices.
      Signed-off-by: NYongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      62d9a78f
    • Y
      PCI: Ignore requested alignment for PROBE_ONLY and fixed resources · f0b99f70
      Yongji Xie 提交于
      Users may request additional alignment of PCI resources, e.g., to align
      BARs on page boundaries so they can be shared with guests via VFIO.  This
      of course may require reallocation if firmware has already assigned the
      BARs with smaller alignments.
      
      If the platform has requested PCI_PROBE_ONLY, we should never change any
      PCI BARs, so we can't provide any additional alignment.  Also, if a BAR is
      marked as IORESOURCE_PCI_FIXED, e.g., for PCI Enhanced Allocation or if the
      firmware depends on the current BAR value, we can't change the alignment.
      
      In these cases, log a message and ignore the user's alignment requests.
      
      [bhelgaas: changelog, use goto to simplify PCI_PROBE_ONLY check]
      Signed-off-by: NYongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      f0b99f70
    • L
      PCI: Recognize D3cold in pci_update_current_state() · a6a64026
      Lukas Wunner 提交于
      Whenever a device is resumed or its power state is changed using the
      platform, its new power state is read from the PM Control & Status Register
      and cached in pci_dev->current_state by calling pci_update_current_state().
      
      If the device is in D3cold, reading from config space typically results in
      a fabricated "all ones" response.  But if it's in D3hot, the two bits
      representing the power state in the PMCSR are *also* set to 1.  Thus D3hot
      and D3cold are not discernible by just reading the PMCSR.
      
      To account for this, pci_update_current_state() uses two workarounds:
      
      - When transitioning to D3cold using pci_platform_power_transition(), the
        new power state is set blindly by pci_update_current_state(), i.e.
        without verifying that the device actually *is* in D3cold.  This is
        achieved by setting the "state" argument to PCI_D3cold.  The "state"
        argument was originally intended to convey the new state in case the
        device doesn't have the PM capability.  It is *also* used to convey the
        device state if the PM capability is present and the new state is D3cold,
        but this was never explained in the kerneldoc.
      
      - Once the current_state is set to D3cold, further invocations of
        pci_update_current_state() will blindly assume that the device is still
        in D3cold and leave the current_state unmodified.  To get out of this
        impasse, the current_state has to be set directly, typically by calling
        pci_raw_set_power_state() or pci_enable_device().
      
      It would be desirable if pci_update_current_state() could reliably detect
      D3cold by itself.  That would allow us to do away with these workarounds,
      and it would allow for a smarter, more energy conserving runtime resume
      strategy after system sleep:  Currently devices which utilize
      direct_complete are mandatorily runtime resumed in their ->complete stage.
      This can be avoided if their power state after system sleep is the same as
      before, but it requires a mechanism to detect the power state reliably.
      
      We've just gained the ability to query the platform firmware for its
      opinion on the device's power state.  On platforms conforming to ACPI 4.0
      or newer, this allows recognition of D3cold.  Pre-4.0 platforms lack _PR3
      and therefore the deepest power state that will ever be reported is D3hot,
      even though the device may actually be in D3cold.  To detect D3cold in
      those cases, accessibility of the vendor ID in config space is probed using
      pci_device_is_present().  This also works for devices which are not
      platform-power-manageable at all, but can be suspended to D3cold using a
      nonstandard mechanism (e.g. some hybrid graphics laptops or Thunderbolt on
      the Mac).
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a6a64026