1. 17 May, 2019 2 commits
    • PCI: hv: Add hv_pci_remove_slots() when we unload the driver · 76888d13
      Dexuan Cui committed
      commit 15becc2b56c6eda3d9bf5ae993bafd5661c1fad1 upstream.
      
      When we unload the pci-hyperv host controller driver, the host does not
      send us a PCI_EJECT message.
      
      In this case we also need to make sure the sysfs PCI slot directory is
      removed, otherwise a command on a slot file, e.g.:
      
      "cat /sys/bus/pci/slots/2/address"
      
      will trigger a
      
      "BUG: unable to handle kernel paging request"
      
      and, if we unload/reload the driver several times we would end up with
      stale slot entries in PCI slot directories in /sys/bus/pci/slots/
      
      root@localhost:~# ls -rtl  /sys/bus/pci/slots/
      total 0
      drwxr-xr-x 2 root root 0 Feb  7 10:49 2
      drwxr-xr-x 2 root root 0 Feb  7 10:49 2-1
      drwxr-xr-x 2 root root 0 Feb  7 10:51 2-2
      
      Add the missing code to remove the PCI slot and fix the current
      behaviour.
      
      Fixes: a15f2c08 ("PCI: hv: support reporting serial number as slot information")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      [lorenzo.pieralisi@arm.com: reformatted the log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: hv: Fix a memory leak in hv_eject_device_work() · a47e0054
      Dexuan Cui committed
      commit 05f151a73ec2b23ffbff706e5203e729a995cdc2 upstream.
      
      When a device is created in new_pcichild_device(), hpdev->refs is set
      to 2 (i.e. the initial value of 1 plus the get_pcichild()).
      
      When we hot remove the device from the host, in a Linux VM we first call
      hv_pci_eject_device(), which increases hpdev->refs by get_pcichild() and
      then schedules a work of hv_eject_device_work(), so hpdev->refs becomes
      3 (let's ignore the paired get/put_pcichild() in other places). But in
      hv_eject_device_work(), currently we only call put_pcichild() twice,
      meaning the 'hpdev' struct can't be freed in put_pcichild().
      
      Add one put_pcichild() to fix the memory leak.
      
      The device can also be removed when we run "rmmod pci-hyperv". On this
      path (hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_devices_present()),
      hpdev->refs is 2, and we do correctly call put_pcichild() twice in
      pci_devices_present_work().
      
      Fixes: 4daace0d ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      [lorenzo.pieralisi@arm.com: commit log rework]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 22 Sep, 2018 1 commit
  3. 17 Sep, 2018 1 commit
    • PCI: hv: support reporting serial number as slot information · a15f2c08
      Stephen Hemminger committed
      The Hyper-V host API for PCI provides a unique "serial number" which
      can be used as basis for sysfs PCI slot table. This can be useful
      for cases where userspace wants to find the PCI device based on
      serial number.
      
      When an SR-IOV NIC is added, the host sends an attach message
      with serial number. The kernel doesn't use the serial number, but
      it is useful when doing the same thing in a userspace driver such
      as the DPDK. By having /sys/bus/pci/slots/N it provides a direct
      way to find the matching PCI device.
      
      There may be some cases where the serial number is not unique, such
      as when using GPUs. But the PCI slot infrastructure will handle
      that.
      
      This has a side effect which may also be useful. The common udev
      network device naming policy uses the slot information (rather
      than PCI address).
      
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 05 Aug, 2018 1 commit
    • x86: Don't include linux/irq.h from asm/hardirq.h · 447ae316
      Nicolai Stange committed
      The next patch in this series will have to make the definition of
      irq_cpustat_t available to entering_irq().
      
      Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
      dependencies like
      
        asm/smp.h
          asm/apic.h
            asm/hardirq.h
              linux/irq.h
                linux/topology.h
                  linux/smp.h
                    asm/smp.h
      
      or
      
        linux/gfp.h
          linux/mmzone.h
            asm/mmzone.h
              asm/mmzone_64.h
                asm/smp.h
                  asm/apic.h
                    asm/hardirq.h
                      linux/irq.h
                        linux/irqdesc.h
                          linux/kobject.h
                            linux/sysfs.h
                              linux/kernfs.h
                                linux/idr.h
                                  linux/gfp.h
      
      and others.
      
      This causes compilation errors because of the header guards becoming
      effective in the second inclusion: symbols/macros that had been defined
      before wouldn't be available to intermediate headers in the #include chain
      anymore.
      
      A possible workaround would be to move the definition of irq_cpustat_t
      into its own header and include that from both, asm/hardirq.h and
      asm/apic.h.
      
      However, this wouldn't solve the real problem, namely asm/hardirq.h
      unnecessarily pulling in all the linux/irq.h cruft: nothing in
      asm/hardirq.h itself requires it. Also, note that there are some other
      archs, like e.g. arm64, which don't have that #include in their
      asm/hardirq.h.
      
      Remove the linux/irq.h #include from x86's asm/hardirq.h.
      
      Fix resulting compilation errors by adding appropriate #includes to *.c
      files as needed.
      
      Note that some of these *.c files could be cleaned up a bit wrt. their
      set of #includes, but that should better be done from separate patches, if
      at all.
      
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  5. 10 Jul, 2018 1 commit
    • PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg() · 35a88a18
      Dexuan Cui committed
      Commit de0aa7b2 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
      uses local_bh_disable()/enable(), because hv_pci_onchannelcallback() can
      also run in tasklet context as the channel event callback, so bottom halves
      should be disabled to prevent a race condition.
      
      With CONFIG_PROVE_LOCKING=y in the recent mainline, or old kernels that
      don't have commit f71b74bc ("irq/softirqs: Use lockdep to assert IRQs
      are disabled/enabled"), when the upper layer IRQ code calls
      hv_compose_msi_msg() with local IRQs disabled, we'll see a warning at the
      beginning of __local_bh_enable_ip():
      
        IRQs not enabled as expected
          WARNING: CPU: 0 PID: 408 at kernel/softirq.c:162 __local_bh_enable_ip
      
      The warning exposes an issue in de0aa7b2: local_bh_enable() can
      potentially call do_softirq(), which is not supposed to run when local IRQs
      are disabled. Let's fix this by using local_irq_save()/restore() instead.
      
      Note: hv_pci_onchannelcallback() is not a hot path because it's only called
      when the PCI device is hot added and removed, which is infrequent.
      
      Fixes: de0aa7b2 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: stable@vger.kernel.org
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
  6. 29 Jun, 2018 1 commit
  7. 08 Jun, 2018 1 commit
  8. 25 May, 2018 1 commit
  9. 24 May, 2018 3 commits
  10. 02 May, 2018 1 commit
    • PCI: hv: Make sure the bus domain is really unique · 29927dfb
      Sridhar Pitchai committed
      When Linux runs as a guest VM in Hyper-V and Hyper-V adds the virtual PCI
      bus to the guest, Hyper-V always provides a unique PCI domain.
      
      commit 4a9b0933 ("PCI: hv: Use device serial number as PCI domain")
      overrode the unique domain with the serial number of the first device added
      to the virtual PCI bus.
      
      The reason for that patch was to have a consistent and short name for the
      device, but Hyper-V doesn't provide unique serial numbers. Using non-unique
      serial numbers as domain IDs leads to duplicate device addresses, which
      causes PCI bus registration to fail.
      
      commit 0c195567 ("netvsc: transparent VF management") avoids the need
      for commit 4a9b0933 ("PCI: hv: Use device serial number as PCI
      domain").  When scripts were used to configure VF devices, the name of
      the VF needed to be consistent and short, but with commit 0c195567
      ("netvsc: transparent VF management") all the setup is done in the kernel,
      and we do not need to maintain a consistent name.
      
      Revert commit 4a9b0933 ("PCI: hv: Use device serial number as PCI
      domain") so we can reliably support multiple devices being assigned to
      a guest.
      
      Tag the patch for stable kernels containing commit 0c195567
      ("netvsc: transparent VF management").
      
      Fixes: 4a9b0933 ("PCI: hv: Use device serial number as PCI domain")
      Signed-off-by: Sridhar Pitchai <sridhar.pitchai@microsoft.com>
      [lorenzo.pieralisi@arm.com: trimmed commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: stable@vger.kernel.org # v4.14+
      Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
  11. 17 Mar, 2018 5 commits
  12. 29 Jan, 2018 1 commit
  13. 29 Dec, 2017 1 commit
    • x86/apic: Switch all APICs to Fixed delivery mode · a31e58e1
      Thomas Gleixner committed
      Some of the APIC incarnations are operating in lowest priority delivery
      mode. This worked as long as the vector management code allocated the same
      vector on all possible CPUs for each interrupt.
      
      Lowest priority delivery mode does not necessarily respect the affinity
      setting and may redirect to some other online CPU. This was documented
      somewhere in the old code, but the conversion to single target delivery
      failed to update the delivery mode of the affected APIC drivers, which
      results in spurious interrupts on some of the affected CPU/Chipset
      combinations.
      
      Switch the APIC drivers over to Fixed delivery mode and remove all
      leftovers of lowest priority delivery mode.
      
      Switching to Fixed delivery mode is not a problem on these CPUs because the
      kernel already uses Fixed delivery mode for IPIs. The reason for this is
      that the SDM explicitly forbids lowest prio mode for IPIs. The reason is
      obvious: If the irq routing does not honor destination targets in lowest
      prio mode then an IPI targeted at CPU1 might end up on CPU0, which would be
      a fatal problem in many cases.
      
      As a consequence of this change, the apic::irq_delivery_mode field is now
      pointless, but this needs to be cleaned up in a separate patch.
      
      Fixes: fdba46ff ("x86/apic: Get rid of multi CPU affinity")
      Reported-by: vcaputo@pengaru.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: vcaputo@pengaru.com
      Cc: Pavel Machek <pavel@ucw.cz>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712281140440.1688@nanos
  14. 08 Nov, 2017 1 commit
    • PCI: hv: Use effective affinity mask · 79aa801e
      Dexuan Cui committed
      The effective_affinity_mask is always set when an interrupt is assigned in
      __assign_irq_vector() -> apic->cpu_mask_to_apicid(), e.g. for struct apic
      apic_physflat: -> default_cpu_mask_to_apicid() ->
      irq_data_update_effective_affinity(), but it looks like d->common->affinity
      remains all-1's until user space or the kernel changes it later.
      
      In the early allocation/initialization phase of an IRQ, we should use the
      effective_affinity_mask, otherwise Hyper-V may not deliver the interrupt to
      the expected CPU.  Without the patch, if we assign 7 Mellanox ConnectX-3
      VFs to a 32-vCPU VM, one of the VFs may fail to receive interrupts.
      
      Tested-by: Adrian Suhov <v-adsuho@microsoft.com>
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Jake Oshins <jakeo@microsoft.com>
      Cc: stable@vger.kernel.org
      Cc: Jork Loeser <jloeser@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
  15. 10 Aug, 2017 1 commit
  16. 04 Aug, 2017 1 commit
    • PCI: hv: Do not sleep in compose_msi_msg() · 80bfeeb9
      Stephen Hemminger committed
      The setup of MSI with Hyper-V host was sleeping with locks held.  This
      error is reported when doing SR-IOV hotplug with kernel built with lockdep:
      
          BUG: sleeping function called from invalid context at kernel/sched/completion.c:93
          in_atomic(): 1, irqs_disabled(): 1, pid: 1405, name: ip
          3 locks held by ip/1405:
         #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff976b10bb>] rtnetlink_rcv+0x1b/0x40
         #1:  (&desc->request_mutex){+.+...}, at: [<ffffffff970ddd33>] __setup_irq+0xb3/0x720
         #2:  (&irq_desc_lock_class){-.-...}, at: [<ffffffff970ddd65>] __setup_irq+0xe5/0x720
         irq event stamp: 3476
         hardirqs last  enabled at (3475): [<ffffffff971b3005>] get_page_from_freelist+0x225/0xc90
         hardirqs last disabled at (3476): [<ffffffff978024e7>] _raw_spin_lock_irqsave+0x27/0x90
         softirqs last  enabled at (2446): [<ffffffffc05ef0b0>] ixgbevf_configure+0x380/0x7c0 [ixgbevf]
         softirqs last disabled at (2444): [<ffffffffc05ef08d>] ixgbevf_configure+0x35d/0x7c0 [ixgbevf]
      
      The workaround is to poll for host response instead of blocking on
      completion.
      
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
  17. 03 Jul, 2017 5 commits
  18. 18 Apr, 2017 1 commit
  19. 05 Apr, 2017 2 commits
  20. 24 Mar, 2017 2 commits
  21. 18 Feb, 2017 1 commit
  22. 11 Feb, 2017 1 commit
  23. 30 Nov, 2016 1 commit
  24. 17 Nov, 2016 3 commits
  25. 01 Nov, 2016 1 commit