1. 24 May 2022 (2 commits)
  2. 19 May 2022 (3 commits)
    • vfio/pci: Move the unused device into low power state with runtime PM · 7ab5e10e
      Authored by Abhishek Sahu
      Currently, there is very limited power management support
      available in the upstream vfio_pci_core based drivers. If there
      are no users of the device, the PCI device is moved into the
      D3hot state by writing directly into the PCI PM registers. D3hot
      saves some power, but near-zero power consumption is only
      achievable in the D3cold state, which cannot be reached through
      native PCI PM alone: it requires interaction with platform
      firmware, which is system-specific. To enter low power states
      (including D3cold), the runtime PM framework can be used; it
      internally interacts with the PCI core and platform firmware and
      puts the device into the lowest possible D-state.
      
      This patch registers vfio_pci_core based drivers with the
      runtime PM framework.
      
      1. The PCI core framework takes care of most runtime PM related
         work. To enable runtime PM, the PCI driver only needs to
         decrement the usage count and provide a 'struct dev_pm_ops'.
         The runtime suspend/resume callbacks are optional and are
         needed only for extra handling. Since there are now multiple
         vfio_pci_core based drivers, instead of assigning the
         'struct dev_pm_ops' in each individual parent driver, the
         vfio_pci_core itself assigns it. Other drivers also assign the
         'struct dev_pm_ops' inside a core layer (for example,
         wlcore_probe() and some sound drivers).
      
      2. This patch provides a stub implementation of 'struct
         dev_pm_ops'. A subsequent patch will provide the runtime
         suspend/resume callbacks. All config state saving and PCI
         power management is done by the PCI core framework itself
         inside its runtime suspend/resume callbacks
         (pci_pm_runtime_suspend() and pci_pm_runtime_resume()).
      
      3. Inside pci_reset_bus(), all the devices in the dev_set need to
         be runtime resumed. vfio_pci_dev_set_pm_runtime_get() takes
         care of the runtime resume and its error handling.
      
      4. Inside vfio_pci_core_disable(), the device usage count must
         always be decremented to balance the increment done in
         vfio_pci_core_enable().
      
      5. Since the runtime PM framework provides the same
         functionality, the direct writes to the PCI PM config register
         can be replaced with runtime PM routines. Runtime PM also
         enables additional power savings.
      
         On systems that do not support D3cold, with the existing
         implementation:
      
         // PCI device
         # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
         D3hot
         // upstream bridge
         # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
         D0
      
         With runtime PM:
      
         // PCI device
         # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
         D3hot
         // upstream bridge
         # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
         D3hot
      
         So, with runtime PM, the upstream bridge or root port also
         goes into a lower power state, which is not possible with the
         existing implementation.
      
         On systems that support D3cold, with the existing
         implementation:
      
         // PCI device
         # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
         D3hot
         // upstream bridge
         # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
         D0
      
         With runtime PM:
      
         // PCI device
         # cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
         D3cold
         // upstream bridge
         # cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
         D3cold
      
         So, with runtime PM, both the PCI device and upstream bridge will
         go into D3cold state.
      
      6. If the 'disable_idle_d3' module parameter is set, runtime PM
         is still enabled, but in this case the usage count is not
         decremented.
      
      7. The return value of vfio_pci_dev_set_try_reset() is now
         unused, so its return type can be changed to void.
      
      8. Use the runtime PM APIs in vfio_pci_core_sriov_configure().
         The device can be in a low power state either through runtime
         power management (when there is no user) or through a
         PCI_PM_CTRL register write by the user. In both cases, the PF
         should be moved to the D0 state. To prevent any runtime usage
         count mismatch, pci_num_vf() is called explicitly during
         disable.
      Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
      Link: https://lore.kernel.org/r/20220518111612.16985-5-abhsahu@nvidia.com
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Virtualize PME related registers bits and initialize to zero · 54918c28
      Authored by Abhishek Sahu
      If any PME event is generated by the PCI device, it will mostly
      be handled in the host by the root port PME code. In the PCIe
      case, for example, the PME event is sent to the root port, which
      then generates a PME interrupt; this is handled on the host side
      in drivers/pci/pcie/pme.c, where pci_check_pme_status() is called
      and the PME_Status and PME_En bits are cleared. The guest OS
      using the vfio-pci device therefore never learns about the PME
      event.
      
      To handle these PME events inside guests, we would need some
      framework to forward them to the virtual machine monitor. Until
      then, we can virtualize the PME related register bits and
      initialize them to zero, so the vfio-pci device user assumes the
      device is not capable of asserting the PME# signal from any power
      state.
      Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
      Link: https://lore.kernel.org/r/20220518111612.16985-4-abhsahu@nvidia.com
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Change the PF power state to D0 before enabling VFs · f4162eb1
      Authored by Abhishek Sahu
      According to [PCIe v5 9.6.2] for PF Device Power Management States
      
       "The PF's power management state (D-state) has global impact on its
        associated VFs. If a VF does not implement the Power Management
        Capability, then it behaves as if it is in an equivalent
        power state of its associated PF.
      
        If a VF implements the Power Management Capability, the Device behavior
        is undefined if the PF is placed in a lower power state than the VF.
        Software should avoid this situation by placing all VFs in lower power
        state before lowering their associated PF's power state."
      
      From the vfio driver side, the user can enable SR-IOV while the
      PF is in the D3hot state. If the VF does not implement the Power
      Management Capability, the VF will actually be in the D3hot state
      as well, and VF BAR access will fail. If the VF does implement
      the Power Management Capability, the VF will assume its current
      power state is D0 while the PF is in D3hot, and in that case the
      behavior is undefined.
      
      To support PF power management, we need to create a power
      management dependency between the PF and its VFs. Runtime power
      management may eventually help here, since it supports such
      dependencies through device links. But until that support is in
      place, we can disallow the PF from entering a low power state
      while it has VFs enabled. There is a case where the user first
      enables and then disables the VFs; if the PF then has no user, it
      could be put back into the D3hot state. With this patch, however,
      the PF will still be in the D0 state after disabling the VFs,
      since detecting this case inside vfio_pci_core_sriov_configure()
      requires access to struct vfio_device::open_count along with its
      locks. The subsequent runtime PM patches handle this case, since
      runtime PM maintains its own usage count.
      
      Also, vfio_pci_core_sriov_configure() can be called at any time
      (with or without a vfio-pci device user), so the power state
      change and SR-IOV enablement need to be protected with the
      required locks.
      Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
      Link: https://lore.kernel.org/r/20220518111612.16985-3-abhsahu@nvidia.com
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  3. 18 May 2022 (8 commits)
  4. 17 May 2022 (1 commit)
  5. 14 May 2022 (10 commits)
  6. 13 May 2022 (1 commit)
  7. 12 May 2022 (12 commits)
  8. 11 May 2022 (3 commits)