1. 25 May, 2022 1 commit
  2. 15 Nov, 2021 1 commit
  3. 15 Oct, 2021 1 commit
  4. 13 Oct, 2021 1 commit
  5. 02 Oct, 2020 1 commit
    • D
      PCI: hv: Fix hibernation in case interrupts are not re-created · 915cff7f
      Dexuan Cui authored
      pci_restore_msi_state() directly writes the MSI/MSI-X related registers
      via MMIO. On a physical machine, this works perfectly; for a Linux VM
      running on a hypervisor, which typically enables IOMMU interrupt remapping,
      the hypervisor usually should trap and emulate the MMIO accesses in order
      to re-create the necessary interrupt remapping table entries in the IOMMU,
      otherwise the interrupts cannot work in the VM after hibernation.
      
      Hyper-V is different from other hypervisors in that it does not trap and
      emulate the MMIO accesses, and instead it uses a para-virtualized method,
      which requires the VM to call hv_compose_msi_msg() to notify the hypervisor
      of the info that would be passed to the hypervisor in the case of the
      trap-and-emulate method. This is not an issue for the many PCI device
      drivers that destroy and re-create the interrupts across hibernation, since
      hv_compose_msi_msg() is then called automatically. However, some PCI device
      drivers (e.g. the in-tree GPU driver nouveau and the out-of-tree Nvidia
      proprietary GPU driver) do not destroy and re-create MSI/MSI-X interrupts
      across hibernation, so hv_pci_resume() has to call hv_compose_msi_msg(),
      otherwise the PCI device drivers can no longer receive interrupts after
      the VM resumes from hibernation.
      
      Hyper-V is also different in that chip->irq_unmask() may fail in a
      Linux VM running on Hyper-V (on a physical machine, chip->irq_unmask()
      cannot fail because unmasking an MSI/MSI-X register just means an MMIO
      write): during hibernation, when a CPU is offlined, the kernel tries
      to move the interrupt to the remaining CPUs that haven't been offlined
      yet. In this case, hv_irq_unmask() -> hv_do_hypercall() always fails
      because the vmbus channel has been closed. The early "return" in
      hv_irq_unmask() means that pci_msi_unmask_irq() is not called, i.e.
      desc->masked remains "true", so after hibernation the MSI interrupt
      stays masked, which is incorrect. Refer to cpu_disable_common()
      -> fixup_irqs() -> irq_migrate_all_off_this_cpu() -> migrate_one_irq():
      
      static bool migrate_one_irq(struct irq_desc *desc)
      {
      ...
              if (maskchip && chip->irq_mask)
                      chip->irq_mask(d);
      ...
              err = irq_do_set_affinity(d, affinity, false);
      ...
              if (maskchip && chip->irq_unmask)
                      chip->irq_unmask(d);
      
      Fix the issue by calling pci_msi_unmask_irq() unconditionally in
      hv_irq_unmask(). Also suppress the error message for hibernation because
      the hypercall failure during hibernation does not matter (at this time
      all the devices have been frozen). Note: the correct affinity info is
      still updated into the irqdata data structure in migrate_one_irq() ->
      irq_do_set_affinity() -> hv_set_affinity(), so later when the VM
      resumes, hv_pci_restore_msi_state() is able to correctly restore
      the interrupt with the correct affinity.
      
      Link: https://lore.kernel.org/r/20201002085158.9168-1-decui@microsoft.com
      Fixes: ac82fc83 ("PCI: hv: Add hibernation support")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Jake Oshins <jakeo@microsoft.com>
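The masking problem fixed by commit 915cff7f can be modelled in user space. This is a minimal sketch, not the driver code: `struct fake_irq`, `fake_hypercall()`, `unmask_old()` and `unmask_fixed()` are hypothetical stand-ins for the MSI descriptor state, hv_do_hypercall(), and hv_irq_unmask() before and after the fix.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the software mask state of an MSI descriptor. */
struct fake_irq {
    bool masked;
};

/* Hypothetical stand-in for the hypercall: it fails once the channel is closed. */
static int fake_hypercall(bool channel_open)
{
    return channel_open ? 0 : -1;
}

/* Model of the old behaviour: an early return on failure skips the unmask. */
static void unmask_old(struct fake_irq *irq, bool channel_open)
{
    if (fake_hypercall(channel_open) != 0)
        return;                 /* irq->masked stays true */
    irq->masked = false;
}

/* Model of the fixed behaviour: the unmask happens even if the hypercall fails. */
static void unmask_fixed(struct fake_irq *irq, bool channel_open)
{
    (void)fake_hypercall(channel_open);  /* a failure here is tolerated */
    irq->masked = false;
}
```

With the channel closed, `unmask_old()` leaves the interrupt masked while `unmask_fixed()` clears the mask, which is the difference the commit message describes.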
  6. 28 Sep, 2020 1 commit
  7. 16 Sep, 2020 2 commits
  8. 28 Jul, 2020 1 commit
  9. 27 Jul, 2020 1 commit
  10. 28 May, 2020 1 commit
    • G
      PCI: hv: Use struct_size() helper · d0684fd0
      Gustavo A. R. Silva authored
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct hv_dr_state {
      	...
              struct hv_pcidev_description func[];
      };
      
      struct pci_bus_relations {
      	...
              struct pci_function_description func[];
      } __packed;
      
      Make use of the struct_size() helper instead of an open-coded version
      in order to avoid any potential type mistakes.
      
      So, replace the following forms:
      
      offsetof(struct hv_dr_state, func) +
      	(sizeof(struct hv_pcidev_description) *
      	(relations->device_count))
      
      offsetof(struct pci_bus_relations, func) +
      	(sizeof(struct pci_function_description) *
      	(bus_rel->device_count))
      
      with:
      
      struct_size(dr, func, relations->device_count)
      
      and
      
      struct_size(bus_rel, func, bus_rel->device_count)
      
      respectively.
      
      Link: https://lore.kernel.org/r/20200525164319.GA13596@embeddedor
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Wei Liu <wei.liu@kernel.org>
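The size calculation discussed in commit d0684fd0 can be tried outside the kernel. The sketch below is a simplified user-space analogue, not the kernel's struct_size() implementation: `struct item`, `struct item_list`, `flex_size()` and `SKETCH_STRUCT_SIZE()` are hypothetical names, and the saturate-on-overflow behaviour is an assumption modelled on the intent of the kernel helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element and container types, used only for illustration. */
struct item {
    uint32_t id;
    uint32_t flags;
};

struct item_list {
    uint32_t count;
    struct item entries[];      /* flexible array member */
};

/* Open-coded size calculation, in the style the commit replaces. */
static size_t size_open_coded(size_t n)
{
    return offsetof(struct item_list, entries) + sizeof(struct item) * n;
}

/* base + elem * n, saturating to SIZE_MAX instead of wrapping around. */
static size_t flex_size(size_t base, size_t elem, size_t n)
{
    if (elem != 0 && n > (SIZE_MAX - base) / elem)
        return SIZE_MAX;
    return base + elem * n;
}

/* The element size is derived from the member itself, so it cannot be mistyped. */
#define SKETCH_STRUCT_SIZE(p, member, n) \
    flex_size(sizeof(*(p)), sizeof((p)->member[0]), (n))
```

Because the macro takes the element type from the pointer it is given, the caller cannot pair the structure with the wrong element type, which is the class of mistake the commit aims to rule out.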
  11. 11 May, 2020 2 commits
  12. 23 Apr, 2020 1 commit
    • A
      PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality · 240ad77c
      Andrea Parri (Microsoft) authored
      The current implementation of hv_compose_msi_msg() is incompatible with
      the new functionality that allows changing the vCPU a VMBus channel will
      interrupt: if this function always calls hv_pci_onchannelcallback() in
      the polling loop, the interrupt going to a different CPU could cause
      hv_pci_onchannelcallback() to be running simultaneously in a tasklet,
      which will break.  The current code also has a problem in that it is not
      synchronized with vmbus_reset_channel_cb(): hv_compose_msi_msg() could
      be accessing the ring buffer via the call of hv_pci_onchannelcallback()
      well after the time that vmbus_reset_channel_cb() has finished.
      
      Fix these issues as follows.  Disable the channel tasklet before
      entering the polling loop in hv_compose_msi_msg() and re-enable it when
      done.  This will prevent hv_pci_onchannelcallback() from running in a
      tasklet on a different CPU.  Moreover, poll by always calling
      hv_pci_onchannelcallback(), but check the channel callback function for
      NULL and invoke the callback within a sched_lock critical section.  This
      will prevent hv_compose_msi_msg() from accessing the ring buffer after
      vmbus_reset_channel_cb() has acquired the sched_lock spinlock.
      Suggested-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Andrew Murray <amurray@thegoodpenguin.co.uk>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: <linux-pci@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200406001514.19876-8-parri.andrea@gmail.com
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
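The "check the callback for NULL inside a sched_lock critical section" idea from commit 240ad77c can be sketched in user space with a mutex. This is an illustrative model under stated assumptions: `struct fake_channel`, `fake_reset_channel_cb()`, `fake_poll_once()` and `count_calls()` are hypothetical, and a pthread mutex stands in for the kernel spinlock.

```c
#include <pthread.h>
#include <stddef.h>

typedef void (*chan_cb)(void *ctx);

/* Hypothetical channel model; sched_lock mirrors the lock named in the commit. */
struct fake_channel {
    pthread_mutex_t sched_lock;
    chan_cb callback;           /* NULL once the channel has been reset */
    void *ctx;
};

/* Model of the reset path: clear the callback while holding the lock. */
static void fake_reset_channel_cb(struct fake_channel *ch)
{
    pthread_mutex_lock(&ch->sched_lock);
    ch->callback = NULL;
    pthread_mutex_unlock(&ch->sched_lock);
}

/* One polling step: run the callback only if it is still installed. Returns 1 if it ran. */
static int fake_poll_once(struct fake_channel *ch)
{
    int ran = 0;

    pthread_mutex_lock(&ch->sched_lock);
    if (ch->callback != NULL) {
        ch->callback(ch->ctx);
        ran = 1;
    }
    pthread_mutex_unlock(&ch->sched_lock);
    return ran;
}

/* Example callback that counts how often it was invoked. */
static void count_calls(void *ctx)
{
    ++*(int *)ctx;
}
```

Since both paths take the same lock, a polling step either runs entirely before the reset or sees a NULL callback and does nothing, so the poller cannot touch the ring buffer after the reset has completed.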
  13. 09 Mar, 2020 3 commits
  14. 06 Mar, 2020 3 commits
  15. 24 Feb, 2020 2 commits
  16. 26 Nov, 2019 4 commits
    • D
      PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer · 877b911a
      Dexuan Cui authored
      With the recent 59bb4798 ("mm, sl[aou]b: guarantee natural
      alignment for kmalloc(power-of-two)"), kzalloc() is able to allocate
      a 4KB buffer that is guaranteed to be 4KB-aligned. Here the size and
      alignment of hbus is important because hbus's field
      retarget_msi_interrupt_params must not cross a 4KB page boundary.
      
      Here we prefer kzalloc to get_zeroed_page(), because a buffer
      allocated by the latter is not tracked and scanned by kmemleak, and
      hence kmemleak reports the pointer contained in the hbus buffer
      (i.e. the hpdev struct, which is created in new_pcichild_device() and
      is tracked by hbus->children) as memory leak (false positive).
      
      If the kernel doesn't have 59bb4798, get_zeroed_page() *must* be
      used to allocate the hbus buffer and we can avoid the kmemleak false
      positive by using kmemleak_alloc() and kmemleak_free() to ask
      kmemleak to track and scan the hbus buffer.
      Reported-by: Lili Deng <v-lide@microsoft.com>
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
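The alignment requirement in commit 877b911a (a field that must not cross a 4KB page boundary) comes down to simple address arithmetic. The helper below is a hypothetical user-space sketch; `SKETCH_PAGE_SIZE` and `crosses_page()` are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* True if the byte range [addr, addr + len) touches more than one 4KB page. */
static bool crosses_page(uintptr_t addr, size_t len)
{
    assert(len > 0);
    return (addr / SKETCH_PAGE_SIZE) != ((addr + len - 1) / SKETCH_PAGE_SIZE);
}
```

Any field that lies inside a 4KB buffer starting on a 4KB boundary stays within one page, which is why a naturally aligned power-of-two allocation is sufficient here. The same field inside an unaligned buffer may straddle two pages.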
    • D
      PCI: hv: Change pci_protocol_version to per-hbus · 14ef39fd
      Dexuan Cui authored
      A VM can have multiple Hyper-V hbus instances. It's incorrect to set the global
      variable 'pci_protocol_version' when *every* hbus is initialized in
      hv_pci_protocol_negotiation(). This is not an issue in practice since
      every hbus should have the same value of hbus->protocol_version, but
      we should make the variable per-hbus, so in case we have busses
      with different protocol versions, the driver can still work correctly.
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
    • D
      PCI: hv: Add hibernation support · ac82fc83
      Dexuan Cui authored
      Add suspend() and resume() functions so that Hyper-V virtual PCI devices
      are handled properly when the VM hibernates and resumes from
      hibernation.
      
      Note that the suspend() function must make sure there are no pending
      work items before calling vmbus_close(), since it runs in a process
      context as a callback in dpm_suspend(). When it starts to run, the
      channel callback hv_pci_onchannelcallback(), which runs in a tasklet
      context, can still be running concurrently and scheduling new work items
      onto hbus->wq in hv_pci_devices_present() and hv_pci_eject_device(), and
      the work item handlers can access the vmbus channel while it is being
      closed by hv_pci_suspend(), e.g. the work item handler
      pci_devices_present_work() -> new_pcichild_device() writes to the vmbus
      channel.
      
      To eliminate the race, hv_pci_suspend() disables the channel callback
      tasklet, sets hbus->state to hv_pcibus_removing, and re-enables the
      tasklet.  This way, when hv_pci_suspend() proceeds, it knows that no new
      work item can be scheduled, and then it flushes hbus->wq and safely
      closes the vmbus channel.
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
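The suspend ordering described in commit ac82fc83 can be walked through with a small single-threaded model. Everything here is hypothetical (`struct fake_bus`, `fake_channel_callback()`, `fake_suspend()`), and the model assumes that the channel callback consults the bus state before queuing work. It only illustrates the order of the steps, not real concurrency.

```c
#include <stdbool.h>

enum fake_bus_state { FAKE_BUS_RUNNING, FAKE_BUS_REMOVING };

/* Hypothetical bus model. */
struct fake_bus {
    enum fake_bus_state state;
    int tasklet_disabled;       /* > 0 means the callback cannot run */
    int queued_work;            /* work items waiting on the workqueue */
};

/* Model of the channel callback: returns true if it queued a work item. */
static bool fake_channel_callback(struct fake_bus *bus)
{
    if (bus->tasklet_disabled > 0)
        return false;           /* tasklet is disabled, nothing runs */
    if (bus->state == FAKE_BUS_REMOVING)
        return false;           /* no new work once suspend has begun */
    bus->queued_work++;
    return true;
}

/* Model of the suspend path, following the order given in the commit message. */
static void fake_suspend(struct fake_bus *bus)
{
    bus->tasklet_disabled++;            /* 1. stop the callback */
    bus->state = FAKE_BUS_REMOVING;     /* 2. forbid new work items */
    bus->tasklet_disabled--;            /* 3. re-enable the callback */
    bus->queued_work = 0;               /* 4. flush work queued earlier */
    /* 5. at this point the channel could be closed safely */
}
```

The state change happens while the callback is guaranteed not to be running, so once the callback is re-enabled it can no longer add work behind the flush.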
    • D
      PCI: hv: Reorganize the code in preparation of hibernation · a8e37506
      Dexuan Cui authored
      There is no functional change. This is just preparatory for a later
      patch which adds the hibernation support for the pci-hyperv driver.
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
  17. 14 Oct, 2019 1 commit
  18. 10 Sep, 2019 1 commit
  19. 22 Aug, 2019 2 commits
  20. 21 Aug, 2019 1 commit
  21. 12 Aug, 2019 1 commit
  22. 07 Aug, 2019 1 commit
  23. 05 Jul, 2019 1 commit
  24. 27 Mar, 2019 3 commits
    • D
      PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary · 340d4556
      Dexuan Cui authored
      When we hot-remove a device, usually the host sends us a PCI_EJECT message,
      and a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
      
      When we execute the quick hot-add/hot-remove test, the host may not send
      us the PCI_EJECT message if the guest has not fully finished the
      initialization by sending the PCI_RESOURCES_ASSIGNED* message to the
      host, so it's potentially unsafe to only depend on the
      pci_destroy_slot() in hv_eject_device_work() because the code path
      
      create_root_hv_pci_bus()
       -> hv_pci_assign_slots()
      
      is not called in this case. Note: in this case, the host still sends the
      guest a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
      
      In the quick hot-add/hot-remove test, the following race is possible: before
      the code path
      
      pci_devices_present_work()
       -> new_pcichild_device()
      
      adds the new device into the hbus->children list, we may have already
      received the PCI_EJECT message, and since the tasklet handler
      
      hv_pci_onchannelcallback()
      
      may fail to find the "hpdev" by calling
      
      get_pcichild_wslot(hbus, dev_message->wslot.slot)
      
      hv_pci_eject_device() is not called. Later, continuing execution,
      
      create_root_hv_pci_bus()
       -> hv_pci_assign_slots()
      
      creates the slot and the PCI_BUS_RELATIONS message with
      bus_rel->device_count == 0 removes the device from hbus->children, and
      we end up being unable to remove the slot in
      
      hv_pci_remove()
       -> hv_pci_remove_slots()
      
      Remove the slot in pci_devices_present_work() when the device
      is removed to address this race.
      
      pci_devices_present_work() and hv_eject_device_work() run in the
      single-threaded hbus->wq, so there is not a double-remove issue for the
      slot.
      
      We cannot offload hv_pci_eject_device() from hv_pci_onchannelcallback()
      to the workqueue, because we need hv_pci_onchannelcallback() to
      synchronously call hv_pci_eject_device() to poll the channel
      ring buffer to work around the "hangs in hv_compose_msi_msg()" issue
      fixed in commit de0aa7b2 ("PCI: hv: Fix 2 hang issues in
      hv_compose_msi_msg()")
      
      Fixes: a15f2c08 ("PCI: hv: support reporting serial number as slot information")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      [lorenzo.pieralisi@arm.com: rewritten commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
    • D
      PCI: hv: Add hv_pci_remove_slots() when we unload the driver · 15becc2b
      Dexuan Cui authored
      When we unload the pci-hyperv host controller driver, the host does not
      send us a PCI_EJECT message.
      
      In this case we also need to make sure the sysfs PCI slot directory is
      removed, otherwise a command on a slot file, e.g.:
      
      "cat /sys/bus/pci/slots/2/address"
      
      will trigger a
      
      "BUG: unable to handle kernel paging request"
      
      and, if we unload/reload the driver several times we would end up with
      stale slot entries in PCI slot directories in /sys/bus/pci/slots/
      
      root@localhost:~# ls -rtl  /sys/bus/pci/slots/
      total 0
      drwxr-xr-x 2 root root 0 Feb  7 10:49 2
      drwxr-xr-x 2 root root 0 Feb  7 10:49 2-1
      drwxr-xr-x 2 root root 0 Feb  7 10:51 2-2
      
      Add the missing code to remove the PCI slot and fix the current
      behaviour.
      
      Fixes: a15f2c08 ("PCI: hv: support reporting serial number as slot information")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      [lorenzo.pieralisi@arm.com: reformatted the log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
    • D
      PCI: hv: Fix a memory leak in hv_eject_device_work() · 05f151a7
      Dexuan Cui authored
      When a device is created in new_pcichild_device(), hpdev->refs is set
      to 2 (i.e. the initial value of 1 plus the get_pcichild()).
      
      When we hot remove the device from the host, in a Linux VM we first call
      hv_pci_eject_device(), which increases hpdev->refs by get_pcichild() and
      then schedules the hv_eject_device_work() work item, so hpdev->refs becomes
      3 (let's ignore the paired get/put_pcichild() in other places). But in
      hv_eject_device_work(), currently we only call put_pcichild() twice,
      meaning the 'hpdev' struct can't be freed in put_pcichild().
      
      Add one put_pcichild() to fix the memory leak.
      
      The device can also be removed when we run "rmmod pci-hyperv". On this
      path (hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_devices_present()),
      hpdev->refs is 2, and we do correctly call put_pcichild() twice in
      pci_devices_present_work().
      
      Fixes: 4daace0d ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      [lorenzo.pieralisi@arm.com: commit log rework]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
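The reference counting described in commit 05f151a7 is easy to trace with a toy model. The names below (`struct fake_dev`, `fake_get()`, `fake_put()`, `fake_create()`, `fake_eject()`) are hypothetical stand-ins for hpdev and get_pcichild()/put_pcichild(). The sketch only reproduces the counts given in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device with a plain reference counter. */
struct fake_dev {
    int refs;
    bool freed;
};

static void fake_get(struct fake_dev *d)
{
    d->refs++;
}

static void fake_put(struct fake_dev *d)
{
    assert(d->refs > 0);
    if (--d->refs == 0)
        d->freed = true;        /* stands in for freeing the structure */
}

/* Creation: initial reference plus one get, so refs == 2. */
static void fake_create(struct fake_dev *d)
{
    d->refs = 1;
    d->freed = false;
    fake_get(d);
}

/* Eject path: one extra get when the work is scheduled, then `puts` puts. */
static void fake_eject(struct fake_dev *d, int puts)
{
    fake_get(d);                /* refs becomes 3 */
    for (int i = 0; i < puts; i++)
        fake_put(d);
}
```

Calling `fake_eject()` with two puts leaves one reference behind and the object is never freed. With three puts the count reaches zero, which matches the extra put_pcichild() the commit adds.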
  25. 01 Mar, 2019 3 commits