1. 05 Oct, 2016 5 commits
  2. 30 Sep, 2016 1 commit
  3. 29 Sep, 2016 6 commits
    • Y
      PCI: Ignore requested alignment for VF BARs · 62d9a78f
      Yongji Xie committed
      Resource allocation for VFs is done via the VF BARx registers in the PF's
      SR-IOV Capability, and the BARs in the VFs themselves are read-only zeros
      (see SR-IOV spec r1.1, secs 3.3.14 and 3.4.1.11).
      
      Even though the actual VF BARs are read-only zeros, the VF dev->resource[]
      structs describe the space allocated for the VF (this is a piece of the
      space described by the VF BARx register in the PF's SR-IOV capability).
      
      It's meaningless to request additional alignment for a VF: the VF BAR
      alignment is completely determined by the alignment of the VF BARx in the
      PF and the size of the VF BAR.
      
      Ignore the user's alignment requests for VF devices.
      Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      62d9a78f
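The reasoning in this commit message can be illustrated with a small sketch. It is not the kernel's code: `struct vf_bar_aperture`, `vf_bar_start()`, `natural_alignment()`, and `effective_alignment()` are hypothetical names, and the model assumes each VF's BAR is simply the n-th equally sized slice of the space described by the VF BARx register in the PF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the space behind one VF BARx in the PF. */
struct vf_bar_aperture {
    uint64_t base;     /* start of the space described by VF BARx in the PF */
    uint64_t vf_size;  /* size of a single VF BAR (a power of two) */
};

/* VF number 'vf' gets the vf-th slice of the aperture. */
static uint64_t vf_bar_start(const struct vf_bar_aperture *ap, unsigned int vf)
{
    return ap->base + (uint64_t)vf * ap->vf_size;
}

/* Largest power of two that divides addr (0 when addr is 0). */
static uint64_t natural_alignment(uint64_t addr)
{
    return addr & (~addr + 1);
}

/*
 * The policy described above: a user-requested alignment is ignored for
 * VFs, because the slice address is already fixed by base and vf_size.
 * A return value of 0 means "no additional alignment".
 */
static uint64_t effective_alignment(bool is_virtfn, uint64_t requested)
{
    return is_virtfn ? 0 : requested;
}
```

In this model a VF BAR's address follows entirely from `base` and `vf_size`, so there is nothing left for an extra alignment request to influence.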
    • Y
      PCI: Ignore requested alignment for PROBE_ONLY and fixed resources · f0b99f70
      Yongji Xie committed
      Users may request additional alignment of PCI resources, e.g., to align
      BARs on page boundaries so they can be shared with guests via VFIO.  This
      of course may require reallocation if firmware has already assigned the
      BARs with smaller alignments.
      
      If the platform has requested PCI_PROBE_ONLY, we should never change any
      PCI BARs, so we can't provide any additional alignment.  Also, if a BAR is
      marked as IORESOURCE_PCI_FIXED, e.g., for PCI Enhanced Allocation or if the
      firmware depends on the current BAR value, we can't change the alignment.
      
      In these cases, log a message and ignore the user's alignment requests.
      
      [bhelgaas: changelog, use goto to simplify PCI_PROBE_ONLY check]
      Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      f0b99f70
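The two exclusions described in this commit message can be sketched as a single predicate. This is a standalone illustration, not the kernel implementation: the flag macros, `struct sketch_bar`, and `honor_alignment_request()` are hypothetical stand-ins for the PCI_PROBE_ONLY platform flag and the IORESOURCE_PCI_FIXED resource flag, and the logging step is only noted in a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PCI_PROBE_ONLY and IORESOURCE_PCI_FIXED. */
#define SKETCH_PLATFORM_PROBE_ONLY 0x1u
#define SKETCH_RESOURCE_FIXED      0x1u

struct sketch_bar {
    uint64_t start;
    uint64_t end;
    uint32_t flags;
};

/*
 * Return true if a user-requested alignment may be applied to this BAR.
 * In both rejected cases the commit message says a message is logged
 * and the request is ignored.
 */
static bool honor_alignment_request(uint32_t platform_flags,
                                    const struct sketch_bar *bar)
{
    if (platform_flags & SKETCH_PLATFORM_PROBE_ONLY)
        return false;   /* platform forbids changing any BAR */
    if (bar->flags & SKETCH_RESOURCE_FIXED)
        return false;   /* this BAR must keep its current value */
    return true;
}
```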
    • L
      PCI: Avoid unnecessary resume after direct-complete · a0d2a959
      Lukas Wunner committed
      Commit 58a1fbbb ("PM / PCI / ACPI: Kick devices that might have been
      reset by firmware") added a runtime resume for devices that were runtime
      suspended when the system entered sleep.
      
      The motivation was that devices might be in a reset-power-on state after
      waking from system sleep, so their power state as perceived by Linux
      (stored in pci_dev->current_state) would no longer reflect reality.  By
      resuming such devices, we allow them to return to a low-power state via
      autosuspend and also bring their current_state in sync with reality.
      
      However, for devices that are *not* in a reset-power-on state, doing an
      unconditional resume wastes energy.  A more refined approach is called for
      which issues a runtime resume only if the power state after direct-complete
      is shallower than it was before. To achieve this, update the device's
      current_state and compare it to its pre-sleep value.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      a0d2a959
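The comparison described in this commit message can be sketched as follows. The enum and `needs_runtime_resume()` are hypothetical names for illustration. The sketch assumes power states are ordered so that a larger value means a deeper state, which makes "shallower after sleep than before" a plain less-than test.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the power state type: larger means deeper. */
enum sketch_power_state {
    SKETCH_D0,
    SKETCH_D1,
    SKETCH_D2,
    SKETCH_D3HOT,
    SKETCH_D3COLD
};

/*
 * Resume only if the device came back from system sleep in a shallower
 * state than the one recorded before sleep; otherwise leave it alone.
 */
static bool needs_runtime_resume(enum sketch_power_state pre_sleep,
                                 enum sketch_power_state post_sleep)
{
    return post_sleep < pre_sleep;
}
```

A device that was in D3cold before sleep and is found in D0 afterwards gets resumed, while one that is still in the same state is left untouched.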
    • L
      PCI: Recognize D3cold in pci_update_current_state() · a6a64026
      Lukas Wunner committed
      Whenever a device is resumed or its power state is changed using the
      platform, its new power state is read from the PM Control & Status Register
      and cached in pci_dev->current_state by calling pci_update_current_state().
      
      If the device is in D3cold, reading from config space typically results in
      a fabricated "all ones" response.  But if it's in D3hot, the two bits
      representing the power state in the PMCSR are *also* set to 1.  Thus D3hot
      and D3cold are not discernible by just reading the PMCSR.
      
      To account for this, pci_update_current_state() uses two workarounds:
      
      - When transitioning to D3cold using pci_platform_power_transition(), the
        new power state is set blindly by pci_update_current_state(), i.e.
        without verifying that the device actually *is* in D3cold.  This is
        achieved by setting the "state" argument to PCI_D3cold.  The "state"
        argument was originally intended to convey the new state in case the
        device doesn't have the PM capability.  It is *also* used to convey the
        device state if the PM capability is present and the new state is D3cold,
        but this was never explained in the kerneldoc.
      
      - Once the current_state is set to D3cold, further invocations of
        pci_update_current_state() will blindly assume that the device is still
        in D3cold and leave the current_state unmodified.  To get out of this
        impasse, the current_state has to be set directly, typically by calling
        pci_raw_set_power_state() or pci_enable_device().
      
      It would be desirable if pci_update_current_state() could reliably detect
      D3cold by itself.  That would allow us to do away with these workarounds,
      and it would allow for a smarter, more energy conserving runtime resume
      strategy after system sleep:  Currently devices which utilize
      direct_complete are mandatorily runtime resumed in their ->complete stage.
      This can be avoided if their power state after system sleep is the same as
      before, but it requires a mechanism to detect the power state reliably.
      
      We've just gained the ability to query the platform firmware for its
      opinion on the device's power state.  On platforms conforming to ACPI 4.0
      or newer, this allows recognition of D3cold.  Pre-4.0 platforms lack _PR3
      and therefore the deepest power state that will ever be reported is D3hot,
      even though the device may actually be in D3cold.  To detect D3cold in
      those cases, accessibility of the vendor ID in config space is probed using
      pci_device_is_present().  This also works for devices which are not
      platform-power-manageable at all, but can be suspended to D3cold using a
      nonstandard mechanism (e.g. some hybrid graphics laptops or Thunderbolt on
      the Mac).
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      a6a64026
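The ambiguity between D3hot and D3cold, and the detection order this commit message describes, can be sketched in standalone C. This is a simplified model rather than the kernel function: all names are hypothetical, register reads are passed in as plain values, and the case of a device without the PM capability is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the power state type. */
enum sketch_power_state {
    SKETCH_D0,
    SKETCH_D1,
    SKETCH_D2,
    SKETCH_D3HOT,
    SKETCH_D3COLD,
    SKETCH_UNKNOWN
};

/* The power state lives in the two lowest bits of the PMCSR. */
#define SKETCH_PMCSR_STATE_MASK 0x0003u

static enum sketch_power_state decode_pmcsr(uint16_t pmcsr)
{
    return (enum sketch_power_state)(pmcsr & SKETCH_PMCSR_STATE_MASK);
}

/* An "all ones" vendor ID means config space did not respond. */
static bool device_is_present(uint16_t vendor_id)
{
    return vendor_id != 0xffff;
}

/*
 * Detection order from the commit message: trust a D3cold report from
 * the platform, then probe the vendor ID, and only then read the PMCSR.
 */
static enum sketch_power_state
update_current_state(enum sketch_power_state platform_state,
                     uint16_t vendor_id, uint16_t pmcsr)
{
    if (platform_state == SKETCH_D3COLD || !device_is_present(vendor_id))
        return SKETCH_D3COLD;
    return decode_pmcsr(pmcsr);
}
```

Note that `decode_pmcsr(0xffff)` yields D3hot, which is exactly why the PMCSR alone cannot tell the two states apart.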
    • L
      PCI: Query platform firmware for device power state · cc7cc02b
      Lukas Wunner committed
      Usually the most accurate way to determine a PCI device's power state is to
      read its PM Control & Status Register.  There are two cases however when
      this is not an option:  If the device doesn't have the PM capability at
      all, or if it is in D3cold (in which case its config space is
      inaccessible).
      
      In both cases, we can alternatively query the platform firmware for its
      opinion on the device's power state.  To facilitate this, augment struct
      pci_platform_pm_ops with a ->get_power callback and implement it for
      acpi_pci_platform_pm (the only pci_platform_pm_ops existing so far).
      
      It is used by a forthcoming commit to let pci_update_current_state()
      recognize D3cold.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      cc7cc02b
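The callback pattern described in this commit message can be sketched as an operations table with an optional function pointer. Everything here is a hypothetical illustration: the struct layouts are invented, `fake_fw_get_power()` stands in for a real firmware query, and only the callback name `get_power` is taken from the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the power state type. */
enum sketch_power_state {
    SKETCH_D0,
    SKETCH_D1,
    SKETCH_D2,
    SKETCH_D3HOT,
    SKETCH_D3COLD,
    SKETCH_UNKNOWN
};

struct sketch_dev {
    const char *name;
    enum sketch_power_state fw_state;  /* what the fake firmware reports */
};

/* Operations table with the new callback. */
struct sketch_platform_pm_ops {
    enum sketch_power_state (*get_power)(const struct sketch_dev *dev);
};

/* Fake firmware backend used only for this illustration. */
static enum sketch_power_state fake_fw_get_power(const struct sketch_dev *dev)
{
    return dev->fw_state;
}

static const struct sketch_platform_pm_ops fake_fw_pm_ops = {
    .get_power = fake_fw_get_power,
};

/* Ask the platform for its opinion; report "unknown" if there is none. */
static enum sketch_power_state
platform_get_power_state(const struct sketch_platform_pm_ops *ops,
                         const struct sketch_dev *dev)
{
    if (ops == NULL || ops->get_power == NULL)
        return SKETCH_UNKNOWN;
    return ops->get_power(dev);
}
```

Returning an explicit "unknown" value when no backend is registered lets callers fall back to other detection methods.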
    • L
      PCI: Afford direct-complete to devices with non-standard PM · 4132a577
      Lukas Wunner committed
      There are devices not power-manageable by the platform, but still able to
      runtime suspend to D3cold with a non-standard mechanism.  One example is
      laptop hybrid graphics where the discrete GPU and its built-in HDA
      controller are power-managed either with a _DSM (AMD PowerXpress, Nvidia
      Optimus) or a separate gmux controller (MacBook Pro).  Another example is
      Thunderbolt on Macs which is power-managed with custom ACPI methods.
      
      When putting the system to sleep, we currently handle such devices
      improperly by transitioning them from D3cold to D3hot (the default power
      state defined at the top of pci_target_state()).  This wastes energy and
      prolongs the suspend sequence (powering up the Thunderbolt controller takes
      2 seconds).
      
      Avoid that by assuming that a non-standard PM mechanism is at work if the
      device is not platform-power-manageable but currently in D3cold.
      
      If the device is wakeup enabled, we might still have to wake it up from
      D3cold if PME cannot be signaled from that power state.
      
      The check for devices without PM capability comes before the check for
      D3cold since such devices could in theory also be powered down by
      non-standard means and should then be afforded direct-complete as well.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      4132a577
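The order of checks described in this commit message can be sketched for a device that is not platform-power-manageable. This is a simplified model with hypothetical names. It assumes that `pme_support` is a bitmask in which bit n is set when PME can be signaled from state n, and that devices without the PM capability default to D0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical power state numbering: larger means deeper. */
enum { SKETCH_D0, SKETCH_D1, SKETCH_D2, SKETCH_D3HOT, SKETCH_D3COLD };

struct sketch_dev {
    bool has_pm_cap;       /* device has the PM capability */
    bool may_wakeup;       /* device is wakeup enabled */
    int current_state;     /* cached power state */
    uint8_t pme_support;   /* bit n set: PME can be signaled from state n */
};

/* Target state for system sleep, for a non-platform-manageable device. */
static int sketch_target_state(const struct sketch_dev *dev)
{
    int target = SKETCH_D3HOT;           /* default */

    if (!dev->has_pm_cap)
        target = SKETCH_D0;              /* assumption: no PM cap means D0 */

    /* Already in D3cold: assume non-standard PM and let it stay there. */
    if (dev->current_state == SKETCH_D3COLD)
        target = SKETCH_D3COLD;

    /* A wakeup-enabled device needs a state PME can be signaled from. */
    if (dev->may_wakeup && dev->pme_support) {
        while (target > SKETCH_D0 &&
               !(dev->pme_support & (1u << target)))
            target--;
    }
    return target;
}
```

Because the D3cold check comes after the PM capability check, a device without the capability that is nevertheless in D3cold still ends up with D3cold as its target.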
  4. 28 Sep, 2016 2 commits
  5. 23 Sep, 2016 1 commit
    • K
      PCI: pciehp: Allow exclusive userspace control of indicators · 576243b3
      Keith Busch committed
      PCIe hotplug supports optional Attention and Power Indicators, which are
      used internally by pciehp.  Users can't control the Power Indicator, but
      they can control the Attention Indicator by writing to a sysfs "attention"
      file.
      
      The Slot Control register has two bits for each indicator, and the PCIe
      spec defines the encodings for each as (Reserved/On/Blinking/Off).  For
      sysfs "attention" writes, pciehp_set_attention_status() maps into these
      encodings, so the only useful write values are 0 (Off), 1 (On), and 2
      (Blinking).
      
      However, some platforms use all four bits for platform-specific indicators,
      and they need to allow direct user control of them while preventing pciehp
      from using them at all.
      
      Add a "hotplug_user_indicators" flag to the pci_dev structure.  When set,
      pciehp does not use either the Attention Indicator or the Power Indicator,
      and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
      are written directly to the Attention Indicator Control and Power Indicator
      Control fields.
      
      [bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      576243b3
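The two write paths described in this commit message can be sketched as bit manipulation on a Slot Control value. The macro and function names are hypothetical. The sketch assumes the Attention Indicator Control field occupies bits 7:6 and the Power Indicator Control field bits 9:8 of Slot Control, with the encodings 00b Reserved, 01b On, 10b Blinking, 11b Off.

```c
#include <stdint.h>

/* Assumed field positions within the Slot Control register. */
#define SKETCH_SLTCTL_AIC 0x00c0u   /* Attention Indicator Control, bits 7:6 */
#define SKETCH_SLTCTL_PIC 0x0300u   /* Power Indicator Control, bits 9:8 */

/*
 * Normal mode: map a sysfs "attention" value (0 = Off, 1 = On,
 * 2 = Blinking) to the Attention Indicator field. Returns 0 for any
 * other value, meaning "no valid encoding".
 */
static uint16_t attention_to_field(unsigned int value)
{
    switch (value) {
    case 0: return 0x00c0;   /* 11b: Off */
    case 1: return 0x0040;   /* 01b: On */
    case 2: return 0x0080;   /* 10b: Blinking */
    default: return 0;
    }
}

/*
 * User-indicator mode: the low four bits of the written value replace
 * both indicator fields at once; all other bits are preserved.
 */
static uint16_t raw_indicator_update(uint16_t sltctl, unsigned int value)
{
    uint16_t mask = SKETCH_SLTCTL_AIC | SKETCH_SLTCTL_PIC;

    return (uint16_t)((sltctl & ~mask) | ((value & 0xfu) << 6));
}
```

In the raw path the two low bits of the written value land in the Attention Indicator field and the next two bits in the Power Indicator field.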
  6. 21 Sep, 2016 1 commit
  7. 15 Sep, 2016 12 commits
  8. 13 Sep, 2016 12 commits