Commits · 1b50dbd593914298a2e90cecfd847f4674abf5ac · openeuler / Kernel

25 May, 2022: 1 commit

PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology · 1b50dbd5

Committed by Long Li on May 23, 2022

stable inclusion
from stable-v5.10.102
commit ade1077c7fc054d1207ed6fbf3787f921af95814
bugzilla: https://gitee.com/openeuler/kernel/issues/I567K6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ade1077c7fc054d1207ed6fbf3787f921af95814

--------------------------------

commit 3149efcd upstream.

When the kernel boots with a NUMA topology in which some NUMA nodes are
offline, the PCI driver should only assign an online NUMA node to the
device. This can happen during kdump, where the kdump kernel does not
bring some NUMA nodes online.

This patch also fixes the case where the kernel boots with "numa=off".
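The rule above (only hand the device a NUMA node that the running kernel has actually brought online) can be sketched as a small user-space model. Everything in it is hypothetical: `struct node_mask`, `map_to_online_node()` and `MODEL_MAX_NODES` are illustrative names, not kernel symbols, and falling back to the lowest-numbered online node is a simplification; the real driver may pick a different online node (for example the nearest one by NUMA distance).

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NODES 8
#define MODEL_NO_NODE   (-1)   /* stands in for NUMA_NO_NODE */

/* Hypothetical model of the set of NUMA nodes the running kernel has onlined. */
struct node_mask {
	bool online[MODEL_MAX_NODES];
};

/*
 * Map the node reported by the host to a node that is actually online.
 * Out-of-range input yields MODEL_NO_NODE; an offline node falls back to
 * the lowest-numbered online node (a simplification, see the lead-in).
 */
static int map_to_online_node(const struct node_mask *m, int host_node)
{
	if (host_node < 0 || host_node >= MODEL_MAX_NODES)
		return MODEL_NO_NODE;
	if (m->online[host_node])
		return host_node;
	for (int n = 0; n < MODEL_MAX_NODES; n++)
		if (m->online[n])
			return n;
	return MODEL_NO_NODE;
}
```

With "numa=off" only node 0 is online, so a host-reported node 1 maps to node 0.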

Fixes: 999dd956 ("PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2")
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@microsoft.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/1643247814-15184-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

15 Nov, 2021: 1 commit

PCI: hv: Fix sleep while in non-sleep context when removing child devices from the bus · 8afb4df9

Committed by Long Li on Nov 15, 2021

stable inclusion
from stable-5.10.73
commit 8aef3824e9469445e748d00b89a9f18bb77cab03
bugzilla: 182983 https://gitee.com/openeuler/kernel/issues/I4I3M0

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8aef3824e9469445e748d00b89a9f18bb77cab03

--------------------------------

[ Upstream commit 41608b64 ]

In hv_pci_bus_exit(), the code holds a spinlock while calling
pci_destroy_slot(), which takes a mutex.

Taking a mutex, which may sleep, while holding a spinlock is not safe.
Fix this by moving the children to be deleted to a list on the stack,
and removing them after the spinlock is released.
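A minimal, single-threaded sketch of the pattern described above: unlink the children onto a local list while the lock is held, then do the sleeping teardown after unlocking. The lock is modelled by a `lock_held` flag and `destroy_child()` stands in for pci_destroy_slot(); all names are hypothetical, and the assertion encodes the rule that the sleeping call must never run under the spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the child device and the bus. */
struct child {
	int slot;
	struct child *next;
};

struct bus {
	bool lock_held;          /* models the bus's device-list spinlock being held */
	struct child *children;  /* list protected by that "spinlock" */
};

static int destroyed_count;

/* Models pci_destroy_slot(): it takes a mutex and may sleep,
 * so it must never run while the spinlock is held. */
static void destroy_child(const struct bus *b, struct child *c)
{
	assert(!b->lock_held);
	free(c);
	destroyed_count++;
}

static void bus_exit_model(struct bus *b)
{
	struct child *removed = NULL;   /* temporary list on the stack */

	b->lock_held = true;            /* spin_lock_irqsave() */
	while (b->children) {           /* unlink every child onto the local list */
		struct child *c = b->children;
		b->children = c->next;
		c->next = removed;
		removed = c;
	}
	b->lock_held = false;           /* spin_unlock_irqrestore() */

	while (removed) {               /* sleeping work is safe now */
		struct child *c = removed;
		removed = c->next;
		destroy_child(b, c);
	}
}
```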

Fixes: 94d22763 ("PCI: hv: Fix a race condition when removing the device")

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: "Krzysztof Wilczyński" <kw@linux.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/linux-hyperv/20210823152130.GA21501@kili/
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/1630365207-20616-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

15 Oct, 2021: 1 commit

PCI: hv: Fix a race condition when removing the device · cc9800a9

Committed by Long Li on Oct 14, 2021

stable inclusion
from stable-5.10.52
commit 7667cdc4b7e866aee35591407a54b45944637ffe
bugzilla: 175542 https://gitee.com/openeuler/kernel/issues/I4DTKU

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7667cdc4b7e866aee35591407a54b45944637ffe

--------------------------------

[ Upstream commit 94d22763 ]

When the device is removed, any work item scheduled on the workqueue
hbus->wq (by hv_pci_devices_present() or hv_pci_eject_device()) may still
be running and race with hv_pci_remove().

This can happen because the host may send PCI_EJECT or PCI_BUS_RELATIONS(2)
and decide to rescind the channel immediately after that.

Fix this by flushing and destroying hbus->wq before removing hbus.
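The ordering the fix relies on (drain queued work, then tear down the bus) can be shown with a tiny single-threaded model. The workqueue, `drain_workqueue_model()` and `remove_model()` are hypothetical stand-ins for hbus->wq, flush_workqueue()/destroy_workqueue() and hv_pci_remove(); a real workqueue runs items concurrently, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK 4

struct hbus_model;
typedef void (*work_fn)(struct hbus_model *);

/* Hypothetical, single-threaded model of hbus and its workqueue. */
struct hbus_model {
	work_fn pending[MAX_WORK];
	size_t npending;
	bool freed;
	int work_done;
};

static void queue_work_model(struct hbus_model *h, work_fn fn)
{
	assert(h->npending < MAX_WORK);
	h->pending[h->npending++] = fn;
}

/* Models flush_workqueue()/destroy_workqueue(): run everything still queued. */
static void drain_workqueue_model(struct hbus_model *h)
{
	for (size_t i = 0; i < h->npending; i++)
		h->pending[i](h);
	h->npending = 0;
}

/* A work item that dereferences hbus, like the eject/relations handlers. */
static void eject_work(struct hbus_model *h)
{
	assert(!h->freed);  /* would be a use-after-free in the real driver */
	h->work_done++;
}

/* The fixed ordering: drain the workqueue first, then tear down hbus. */
static void remove_model(struct hbus_model *h)
{
	drain_workqueue_model(h);
	h->freed = true;
}
```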

Link: https://lore.kernel.org/r/1620806800-30983-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

13 Oct, 2021: 1 commit

PCI: hv: Add check for hyperv_initialized in init_hv_pci_drv() · a02ada86

Committed by Haiyang Zhang on Oct 13, 2021

stable inclusion
from stable-5.10.50
commit 998d9fefdd47ad7160b027017445684507236b9f
bugzilla: 174522 https://gitee.com/openeuler/kernel/issues/I4DNFY

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=998d9fefdd47ad7160b027017445684507236b9f

--------------------------------

[ Upstream commit 7d815f4a ]

Add a check for hv_is_hyperv_initialized() at the top of
init_hv_pci_drv(), so that if the pci-hyperv driver is force-loaded on
non-Hyper-V platforms, init_hv_pci_drv() exits immediately without any
side effects, such as assignments to hvpci_block_ops.
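
A sketch of the early-exit check, assuming a boolean stand-in for hv_is_hyperv_initialized() and a single global pointer in place of the hvpci_block_ops assignments. The names and the `-ENODEV` return value are assumptions for illustration; the point is only that the check runs before any global state is touched.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real pci-hyperv symbols. */
static bool hyperv_initialized;           /* models hv_is_hyperv_initialized() */
static const void *block_ops_hook;        /* models the hvpci_block_ops assignment */
static const int block_ops_impl = 1;      /* something for the hook to point at */

static int init_driver_model(void)
{
	/* Bail out before touching any global state on non-Hyper-V platforms. */
	if (!hyperv_initialized)
		return -ENODEV;

	block_ops_hook = &block_ops_impl;
	return 0;
}
```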

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reported-and-tested-by: Mohammad Alqayeem <mohammad.alqyeem@nutanix.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/1621984653-1210-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

02 Oct, 2020: 1 commit

PCI: hv: Fix hibernation in case interrupts are not re-created · 915cff7f

Committed by Dexuan Cui on Oct 02, 2020

pci_restore_msi_state() directly writes the MSI/MSI-X related registers
via MMIO. On a physical machine, this works perfectly; for a Linux VM
running on a hypervisor, which typically enables IOMMU interrupt remapping,
the hypervisor usually should trap and emulate the MMIO accesses in order
to re-create the necessary interrupt remapping table entries in the IOMMU,
otherwise the interrupts cannot work in the VM after hibernation.

Hyper-V is different from other hypervisors in that it does not trap and
emulate the MMIO accesses, and instead it uses a para-virtualized method,
which requires the VM to call hv_compose_msi_msg() to notify the hypervisor
of the info that would be passed to the hypervisor in the case of the
trap-and-emulate method. This is not an issue for many PCI device
drivers, which destroy and re-create the interrupts across hibernation, so
hv_compose_msi_msg() is called automatically. However, some PCI device
drivers (e.g. the in-tree GPU driver nouveau and the out-of-tree Nvidia
proprietary GPU driver) do not destroy and re-create MSI/MSI-X interrupts
across hibernation, so hv_pci_resume() has to call hv_compose_msi_msg(),
otherwise the PCI device drivers can no longer receive interrupts after
the VM resumes from hibernation.

Hyper-V is also different in that chip->irq_unmask() may fail in a
Linux VM running on Hyper-V (on a physical machine, chip->irq_unmask()
cannot fail because unmasking an MSI/MSI-X register just means an MMIO
write): during hibernation, when a CPU is offlined, the kernel tries
to move the interrupt to the remaining CPUs that haven't been offlined
yet. In this case, hv_irq_unmask() -> hv_do_hypercall() always fails
because the vmbus channel has been closed: here the early "return" in
hv_irq_unmask() means the pci_msi_unmask_irq() is not called, i.e. the
desc->masked remains "true", so later after hibernation, the MSI interrupt
always remains masked, which is incorrect. Refer to cpu_disable_common()
-> fixup_irqs() -> irq_migrate_all_off_this_cpu() -> migrate_one_irq():

static bool migrate_one_irq(struct irq_desc *desc)
{
...
        if (maskchip && chip->irq_mask)
                chip->irq_mask(d);
...
        err = irq_do_set_affinity(d, affinity, false);
...
        if (maskchip && chip->irq_unmask)
                chip->irq_unmask(d);

Fix the issue by calling pci_msi_unmask_irq() unconditionally in
hv_irq_unmask(). Also suppress the error message for hibernation because
the hypercall failure during hibernation does not matter (at this time
all the devices have been frozen). Note: the correct affinity info is
still updated into the irqdata data structure in migrate_one_irq() ->
irq_do_set_affinity() -> hv_set_affinity(), so later when the VM
resumes, hv_pci_restore_msi_state() is able to correctly restore
the interrupt with the correct affinity.
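The shape of the fixed hv_irq_unmask() can be sketched as follows, under the assumption that the hypercall result, the hibernation state and desc->masked are plain variables. All names are hypothetical; the sketch only shows the two behaviours described above: the failure is reported only outside hibernation, and the MSI unmask always runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by hv_irq_unmask(). */
struct irq_model {
	bool masked;                  /* models desc->masked */
};

static int  retarget_status;      /* result of the (modelled) retarget hypercall */
static bool hibernating;          /* models "system is entering hibernation" */
static int  errors_logged;

static void msi_unmask_model(struct irq_model *d)
{
	d->masked = false;            /* models pci_msi_unmask_irq() */
}

static void irq_unmask_model(struct irq_model *d)
{
	if (retarget_status != 0 && !hibernating)
		errors_logged++;          /* report the failure only outside hibernation */

	/* The fix: unmask unconditionally instead of returning early on error. */
	msi_unmask_model(d);
}
```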

Link: https://lore.kernel.org/r/20201002085158.9168-1-decui@microsoft.com
Fixes: ac82fc83 ("PCI: hv: Add hibernation support")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Jake Oshins <jakeo@microsoft.com>

28 Sep, 2020: 1 commit

PCI: hv: Document missing hv_pci_protocol_negotiation() parameter · 6d2730cb

Committed by Krzysztof Wilczyński on Sep 25, 2020

Add missing documentation for the parameters "version" and "num_version"
of the hv_pci_protocol_negotiation() function and resolve the following
build-time kernel-doc warnings:

drivers/pci/controller/pci-hyperv.c:2535: warning: Function parameter
or member 'version' not described in 'hv_pci_protocol_negotiation'

drivers/pci/controller/pci-hyperv.c:2535: warning: Function parameter
or member 'num_version' not described in 'hv_pci_protocol_negotiation'

No change to functionality intended.
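
The kernel-doc format the warnings ask for documents each parameter with an `@name:` line in the `/** ... */` comment above the function. The sketch below shows that format on a simplified model of version negotiation that tries the offered versions newest first and keeps the first one the host accepts. The function name, the `host_accepts_fn` callback, the example hosts and the version values are hypothetical; the real hv_pci_protocol_negotiation() exchanges VMBus messages with the host.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Versions encoded as (major << 16) | minor; this encoding is an assumption. */
typedef bool (*host_accepts_fn)(unsigned int version);

/**
 * negotiate_version_model() - Pick the newest protocol version the host accepts
 * @version: Array of supported protocol versions, ordered newest first
 * @num_version: Number of elements in the @version array
 * @accepts: Hypothetical callback standing in for the host's reply
 *
 * Return: the accepted version, or -EPROTO if the host accepts none.
 */
static int negotiate_version_model(const unsigned int version[], int num_version,
				   host_accepts_fn accepts)
{
	for (int i = 0; i < num_version; i++)
		if (accepts(version[i]))
			return (int)version[i];
	return -EPROTO;
}

/* Example hosts (hypothetical): one speaks up to 1.2, one speaks nothing. */
static bool host_up_to_1_2(unsigned int v) { return v <= 0x10002u; }
static bool host_none(unsigned int v)      { (void)v; return false; }
```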

Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Link: https://lore.kernel.org/r/20200925234753.1767227-1-kw@linux.com
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

16 Sep, 2020: 2 commits

x86/msi: Use generic MSI domain ops · 9006c133

Committed by Thomas Gleixner on Aug 26, 2020

pci_msi_get_hwirq() and pci_msi_set_desc() are no longer special. Enable the
generic MSI domain ops in the core and PCI MSI code unconditionally and get
rid of the x86-specific implementations in the x86 MSI code and in the
Hyper-V PCI driver.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200826112332.564274859@linutronix.de

x86/msi: Consolidate MSI allocation · 3b9c1d37

Committed by Thomas Gleixner on Aug 26, 2020

Convert the interrupt remapping drivers to retrieve the PCI device from the
MSI descriptor and use info::hwirq.

This is the first step to prepare x86 for using the generic MSI domain ops.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20200826112332.466405395@linutronix.de

28 Jul, 2020: 1 commit

PCI: hv: Make some functions static · a459d9e1

Committed by Wei Yongjun on Jul 06, 2020

sparse reports the following build warnings:

drivers/pci/controller/pci-hyperv.c:941:5: warning:
symbol 'hv_read_config_block' was not declared. Should it be static?
drivers/pci/controller/pci-hyperv.c:1021:5: warning:
symbol 'hv_write_config_block' was not declared. Should it be static?
drivers/pci/controller/pci-hyperv.c:1090:5: warning:
symbol 'hv_register_block_invalidate' was not declared. Should it be static?

Those functions are not used outside of this file, so mark them static.

Link: https://lore.kernel.org/r/20200706135234.80758-1-weiyongjun1@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

27 Jul, 2020: 1 commit

PCI: hv: Fix a timing issue which causes kdump to fail occasionally · d6af2ed2

Committed by Wei Hu on Jul 27, 2020

Kdump could sometimes fail on a Hyper-V guest because the retry in
hv_pci_enter_d0() releases child device structures in hv_pci_bus_exit().

Although the host sends a second, asynchronous device relations message,
if this message arrives at the guest after hv_send_resource_allocated()
is called, the retry would fail.

Fix the problem by moving the retry to hv_pci_probe() and starting the
retry from the hv_pci_query_relations() call. This causes a device
relations message to arrive at the guest synchronously; the guest is then
able to rebuild the child device structures before calling
hv_send_resource_allocated().
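The control flow after the fix (restart from the relations query, retry at most once) can be sketched with counters in place of the real VMBus operations. The helper names and the use of `-EPROTO` to signal "invalid device state" are assumptions for illustration, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the real probe steps. */
static int query_relations_calls;
static int bus_exit_calls;
static int d0_failures_left;   /* how many times entering D0 will fail */

static int query_relations_model(void) { query_relations_calls++; return 0; }
static void bus_exit_model(void)       { bus_exit_calls++; }

static int enter_d0_model(void)
{
	if (d0_failures_left > 0) {
		d0_failures_left--;
		return -EPROTO;   /* models "invalid device state" from the host */
	}
	return 0;
}

static int probe_model(void)
{
	bool retry_allowed = true;
	int ret;

retry:
	/* Restart from the relations query so the child list is rebuilt
	 * synchronously before resources are reported to the host. */
	ret = query_relations_model();
	if (ret)
		return ret;

	ret = enter_d0_model();
	if (ret == -EPROTO && retry_allowed) {
		retry_allowed = false;   /* retry at most once */
		bus_exit_model();        /* tear down, then start over */
		goto retry;
	}
	return ret;
}
```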

Link: https://lore.kernel.org/r/20200727071731.18516-1-weh@microsoft.com
Fixes: c81992e7 ("PCI: hv: Retry PCI bus D0 entry on invalid device state")
Signed-off-by: Wei Hu <weh@microsoft.com>
[lorenzo.pieralisi@arm.com: fixed a comment and commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

28 May, 2020: 1 commit

PCI: hv: Use struct_size() helper · d0684fd0

Committed by Gustavo A. R. Silva on May 25, 2020

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct hv_dr_state {
	...
        struct hv_pcidev_description func[];
};

struct pci_bus_relations {
	...
        struct pci_function_description func[];
} __packed;

Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.

So, replace the following forms:

offsetof(struct hv_dr_state, func) +
	(sizeof(struct hv_pcidev_description) *
	(relations->device_count))

offsetof(struct pci_bus_relations, func) +
	(sizeof(struct pci_function_description) *
	(bus_rel->device_count))

with:

struct_size(dr, func, relations->device_count)

and

struct_size(bus_rel, func, bus_rel->device_count)

respectively.
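To see what struct_size() buys over the open-coded form, here is a user-space sketch of the same calculation. `flex_size()`, `STRUCT_SIZE()` and the two structures are hypothetical simplifications of the kernel helper and the structures quoted above. Like the kernel helper (as far as I know), the sketch saturates to `SIZE_MAX` on overflow, so an allocation of that size fails instead of silently being too small. It uses `sizeof(*p)` rather than `offsetof()`, so in general it can exceed the open-coded value by any trailing padding; for these structures the two agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating "header + elem * count": returns SIZE_MAX on overflow. */
static size_t flex_size(size_t header, size_t elem, size_t count)
{
	if (count != 0 && elem > (SIZE_MAX - header) / count)
		return SIZE_MAX;
	return header + elem * count;
}

/* Hypothetical user-space analogue of the kernel's struct_size().
 * sizeof does not evaluate its operand, so p may be NULL here. */
#define STRUCT_SIZE(p, member, n) \
	flex_size(sizeof(*(p)), sizeof(*(p)->member), (n))

/* Simplified stand-ins for the structures quoted above. */
struct func_desc {
	unsigned short v_id;
	unsigned short d_id;
	unsigned int ser;
};

struct relations {
	unsigned int device_count;
	struct func_desc func[];   /* flexible array member */
};
```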

Link: https://lore.kernel.org/r/20200525164319.GA13596@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>

11 May, 2020: 2 commits

PCI: hv: Retry PCI bus D0 entry on invalid device state · c81992e7

Committed by Wei Hu on May 07, 2020

When kdump is triggered, some PCI devices may not have been shut down
cleanly before the kdump kernel starts.

This causes the initial attempt to enter the D0 state in the kdump kernel
to fail with an invalid device state error returned by the Hyper-V host.

When this happens, explicitly call hv_pci_bus_exit() and retry entering
the D0 state.

Link: https://lore.kernel.org/r/20200507050300.10974-1-weh@microsoft.com
Signed-off-by: Wei Hu <weh@microsoft.com>
[lorenzo.pieralisi@arm.com: commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

PCI: hv: Fix the PCI HyperV probe failure path to release resource properly · 83cc3508

Committed by Wei Hu on May 07, 2020

In some error cases in hv_pci_probe(), allocated resources are not freed.

Fix this by adding a field to keep track of the high water mark for slots
that have resources allocated to them. In case of an error, this high
water mark is used to know which slots have resources that must be released.
Since slots are numbered starting with zero, a value of -1 indicates no
slots have been allocated resources. There may be unused slots in the range
between slot 0 and the high water mark slot, but these slots are already
ignored by the existing code in the allocate and release loops with the call
to get_pcichild_wslot().
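A sketch of the high-water-mark scheme described above. `struct bus_slots`, `high_water` and the two functions are hypothetical names; the `present[]` flags stand in for the get_pcichild_wslot() lookups that skip unused slots, and `fail_at` injects an allocation failure so the cleanup path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_SLOTS 8

/* Hypothetical model of the per-bus slot state. */
struct bus_slots {
	bool present[NUM_SLOTS];    /* models get_pcichild_wslot() != NULL */
	bool allocated[NUM_SLOTS];
	int  high_water;            /* highest slot with resources, -1 = none */
};

/* Allocate per-slot resources; fail on slot 'fail_at' to exercise cleanup. */
static int allocate_slots(struct bus_slots *b, int fail_at)
{
	for (int s = 0; s < NUM_SLOTS; s++) {
		if (!b->present[s])
			continue;           /* unused slot: skipped */
		if (s == fail_at)
			return -ENOMEM;
		b->allocated[s] = true;
		b->high_water = s;
	}
	return 0;
}

/* Release only slots 0..high_water, skipping unused ones. */
static void release_slots(struct bus_slots *b)
{
	for (int s = b->high_water; s >= 0; s--)
		if (b->present[s])
			b->allocated[s] = false;
	b->high_water = -1;
}
```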

Link: https://lore.kernel.org/r/20200507050211.10923-1-weh@microsoft.com
Signed-off-by: Wei Hu <weh@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

23 Apr, 2020: 1 commit

PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU... · 240ad77c

Committed by Andrea Parri (Microsoft) on Apr 06, 2020

PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality

The current implementation of hv_compose_msi_msg() is incompatible with
the new functionality that allows changing the vCPU a VMBus channel will
interrupt: if this function always calls hv_pci_onchannelcallback() in
the polling loop, the interrupt going to a different CPU could cause
hv_pci_onchannelcallback() to be running simultaneously in a tasklet,
which will break. The current code also has a problem in that it is not
synchronized with vmbus_reset_channel_cb(): hv_compose_msi_msg() could
be accessing the ring buffer via the call of hv_pci_onchannelcallback()
well after the time that vmbus_reset_channel_cb() has finished.

Fix these issues as follows. Disable the channel tasklet before
entering the polling loop in hv_compose_msi_msg() and re-enable it when
done. This will prevent hv_pci_onchannelcallback() from running in a
tasklet on a different CPU. Moreover, poll by always calling
hv_pci_onchannelcallback(), but check the channel callback function for
NULL and invoke the callback within a sched_lock critical section. This
will prevent hv_compose_msi_msg() from accessing the ring buffer after
vmbus_reset_channel_cb() has acquired the sched_lock spinlock.
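
A single-threaded sketch of the two safeguards described above: the callback tasklet is disabled around the polling loop, and each poll checks the callback pointer for NULL inside the lock that the reset path also takes. `struct channel_model` and its helpers are hypothetical; the real code uses the channel's tasklet and sched_lock, and the concurrency itself is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a VMBus channel; not the real struct vmbus_channel. */
struct channel_model {
	bool sched_lock_held;           /* models channel->sched_lock */
	bool tasklet_enabled;           /* models the channel callback tasklet */
	void (*callback)(void *ctx);    /* NULL once the callback has been reset */
	void *ctx;
};

static void chan_lock(struct channel_model *c)
{
	assert(!c->sched_lock_held);
	c->sched_lock_held = true;
}

static void chan_unlock(struct channel_model *c)
{
	c->sched_lock_held = false;
}

/* Models vmbus_reset_channel_cb(): clears the callback under the lock. */
static void reset_channel_cb_model(struct channel_model *c)
{
	chan_lock(c);
	c->callback = NULL;
	chan_unlock(c);
}

/* One polling iteration: only call the callback if it is still set. */
static void poll_once_model(struct channel_model *c)
{
	chan_lock(c);
	if (c->callback)
		c->callback(c->ctx);
	chan_unlock(c);
}

static void compose_msi_poll_model(struct channel_model *c, int iterations)
{
	c->tasklet_enabled = false;     /* keep the tasklet off other CPUs */
	for (int i = 0; i < iterations; i++)
		poll_once_model(c);
	c->tasklet_enabled = true;
}

static int callback_runs;
static void count_callback(void *ctx) { (void)ctx; callback_runs++; }
```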

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Andrew Murray <amurray@thegoodpenguin.co.uk>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: <linux-pci@vger.kernel.org>
Link: https://lore.kernel.org/r/20200406001514.19876-8-parri.andrea@gmail.com
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>

09 Mar, 2020: 3 commits

PCI: hv: Introduce hv_msi_entry · 1cf106d9

Committed by Boqun Feng on Feb 10, 2020

Add a new structure (hv_msi_entry), which is also defined in the TLFS,
to describe the MSI entry for HVCALL_RETARGET_INTERRUPT. The structure
is needed because its layout may differ from architecture to
architecture.

Also add a new generic interface, hv_set_msi_entry_from_desc(), to allow
each architecture to set the MSI entry from an msi_desc.

No functional change, only preparation for the future support of virtual
PCI on non-x86 architectures.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>

PCI: hv: Move retarget related structures into tlfs header · 61bfd920

Committed by Boqun Feng on Feb 10, 2020

Currently, retarget_msi_interrupt and other structures it relies on are
defined in pci-hyperv.c. However, those structures are actually defined
in Hypervisor Top-Level Functional Specification [1] and may be
different in sizes of fields or layout from architecture to
architecture. Let's move those definitions into x86's tlfs header file
to support virtual PCI on non-x86 architectures in the future. Note that
"__packed" attribute is added to these structures during the movement
for the same reason as we use the attribute for other TLFS structures in
the header file: make sure the structures meet the specification and
avoid anything unexpected from the compilers.

Additionally, rename struct retarget_msi_interrupt to
hv_retarget_msi_interrupt for the consistent naming convention, also
mirroring the name in TLFS.

[1]: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>

PCI: hv: Move hypercall related definitions into tlfs header · b00f80fc

Committed by Boqun Feng on Feb 10, 2020

Currently HVCALL_RETARGET_INTERRUPT and HV_PARTITION_ID_SELF are defined
in pci-hyperv.c. However, similar to other hypercall related
definitions, it makes more sense to put them in the tlfs header file.

Moreover, these definitions are arch-dependent, so, to support virtual
PCI on non-x86 architectures in the future, move them into the
arch-specific tlfs header file.

Signed-off-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
Reviewed-by: Dexuan Cui <decui@microsoft.com>

06 Mar, 2020: 3 commits

PCI: hv: Replace zero-length array with flexible-array member · 067fb6c9

Committed by Gustavo A. R. Silva on Feb 12, 2020

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>

PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2 · 999dd956

Committed by Long Li on Feb 25, 2020

Starting with Hyper-V PCI protocol version 1.3, the host VSP can send
PCI_BUS_RELATIONS2 and pass the vNUMA node information for devices on the
bus. The vNUMA node tells which guest NUMA node this device is on based
on guest VM configuration topology and physical device information.

Add code to negotiate v1.3 and process PCI_BUS_RELATIONS2.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

PCI: hv: Decouple the func definition in hv_dr_state from VSP message · f9ad0f36

Committed by Long Li on Feb 25, 2020

hv_dr_state is used to find present PCI devices on the bus. The structure
reuses struct pci_function_description from VSP message to describe a
device.

To prepare for supporting pci_function_description v2, decouple this
dependency in hv_dr_state so that it can work with both v1 and v2 VSP messages.

There is no functionality change.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

24 Feb, 2020: 2 commits

PCI: hv: Add missing kfree(hbus) in hv_pci_probe()'s error handling path · 42c3d418

Committed by Dexuan Cui on Feb 21, 2020

Now that we use kzalloc() to allocate the hbus buffer, we must call
kfree() in the error path as well to prevent memory leakage.
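The usual kernel idiom for this is a `goto` label that frees what was allocated before the failing step. The sketch below uses calloc()/free() in place of kzalloc()/kfree() and a counter to make leaks visible; `probe_with_cleanup()`, the counting helpers and the failure injection are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the probe steps; not the real driver code. */
struct hbus_model { int id; };

static int live_allocations;   /* crude leak counter for the example */
static int fail_step;          /* which later step should fail (0 = none) */

static void *zalloc_counted(size_t n)
{
	void *p = calloc(1, n);     /* kzalloc() analogue */
	if (p)
		live_allocations++;
	return p;
}

static void free_counted(void *p)
{
	if (p)
		live_allocations--;
	free(p);
}

static int later_step(int step) { return step == fail_step ? -EIO : 0; }

static int probe_with_cleanup(struct hbus_model **out)
{
	struct hbus_model *hbus;
	int ret;

	hbus = zalloc_counted(sizeof(*hbus));
	if (!hbus)
		return -ENOMEM;

	ret = later_step(1);
	if (ret)
		goto free_bus;
	ret = later_step(2);
	if (ret)
		goto free_bus;

	*out = hbus;
	return 0;

free_bus:
	free_counted(hbus);   /* the kfree(hbus) the error path was missing */
	return ret;
}
```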

Fixes: 877b911a ("PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

PCI: hv: Remove unnecessary type casting from kzalloc · e658a4fe

Committed by Dexuan Cui on Feb 21, 2020

In C, there is no need to cast a void * to any other pointer type, so
remove an unnecessary cast.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

26 Nov, 2019: 4 commits

PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer · 877b911a

Committed by Dexuan Cui on Nov 24, 2019

With the recent commit 59bb4798 ("mm, sl[aou]b: guarantee natural
alignment for kmalloc(power-of-two)"), kzalloc() is able to allocate
a 4KB buffer that is guaranteed to be 4KB-aligned. Here the size and
alignment of hbus is important because hbus's field
retarget_msi_interrupt_params must not cross a 4KB page boundary.

Here we prefer kzalloc() to get_zeroed_page(), because a buffer
allocated by the latter is not tracked and scanned by kmemleak, and
hence kmemleak reports the pointer contained in the hbus buffer
(i.e. the hpdev struct, which is created in new_pcichild_device() and
is tracked by hbus->children) as a memory leak (false positive).

If the kernel doesn't have 59bb4798, get_zeroed_page() *must* be
used to allocate the hbus buffer and we can avoid the kmemleak false
positive by using kmemleak_alloc() and kmemleak_free() to ask
kmemleak to track and scan the hbus buffer.
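
The constraint that retarget_msi_interrupt_params must not cross a 4 KiB page boundary can be expressed as a simple check. `crosses_page()` is a hypothetical helper, not part of the driver; it illustrates why a 4 KiB buffer that is also 4 KiB-aligned satisfies the constraint for any field lying inside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u

/* True if [addr, addr + len) spans more than one 4 KiB page. Requires len > 0. */
static bool crosses_page(uintptr_t addr, size_t len)
{
	assert(len > 0);
	return (addr / MODEL_PAGE_SIZE) != ((addr + len - 1) / MODEL_PAGE_SIZE);
}
```

For example, a 32-byte field starting at 0x1ff0 spills into the next page, while the same field at 0x1fe0 does not.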

Reported-by: Lili Deng <v-lide@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

PCI: hv: Change pci_protocol_version to per-hbus · 14ef39fd

Committed by Dexuan Cui on Nov 24, 2019

A VM can have multiple Hyper-V hbuses. It's incorrect to set the global
variable 'pci_protocol_version' when *every* hbus is initialized in
hv_pci_protocol_negotiation(). This is not an issue in practice since
every hbus should have the same value of hbus->protocol_version, but
we should make the variable per-hbus, so that if we have buses with
different protocol versions, the driver still works correctly.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

PCI: hv: Add hibernation support · ac82fc83

Committed by Dexuan Cui on Nov 24, 2019

Add suspend() and resume() functions so that Hyper-V virtual PCI devices
are handled properly when the VM hibernates and resumes from
hibernation.

Note that the suspend() function must make sure there are no pending
work items before calling vmbus_close(), since it runs in a process
context as a callback in dpm_suspend(). When it starts to run, the
channel callback hv_pci_onchannelcallback(), which runs in a tasklet
context, can be still running concurrently and scheduling new work items
onto hbus->wq in hv_pci_devices_present() and hv_pci_eject_device(), and
the work item handlers can access the vmbus channel, which may be in the
process of being closed by hv_pci_suspend(), e.g. the work item handler
pci_devices_present_work() -> new_pcichild_device() writes to the vmbus
channel.

To eliminate the race, hv_pci_suspend() disables the channel callback
tasklet, sets hbus->state to hv_pcibus_removing, and re-enables the
tasklet.  This way, when hv_pci_suspend() proceeds, it knows that no new
work item can be scheduled, and then it flushes hbus->wq and safely
closes the vmbus channel.
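
The sequence above can be sketched as a single-threaded state machine: once the state is "removing", the channel-callback path refuses to queue new work, so a single flush is enough before the channel is closed. The enum values, `struct hbus_model` and all functions are hypothetical stand-ins for hbus->state, hbus->wq, the tasklet and vmbus_close(); real concurrency is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; values loosely mirror the states named in the text. */
enum bus_state { BUS_INSTALLED, BUS_REMOVING };

struct hbus_model {
	bool tasklet_enabled;
	enum bus_state state;
	int queued;        /* work items sitting on the workqueue */
	int executed;      /* work items that have run */
};

static bool channel_closed;

/* Channel-callback path: refuses to queue work once the bus is going away. */
static bool schedule_work_model(struct hbus_model *h)
{
	if (!h->tasklet_enabled || h->state == BUS_REMOVING)
		return false;
	h->queued++;
	return true;
}

static void flush_wq_model(struct hbus_model *h)
{
	h->executed += h->queued;
	h->queued = 0;
}

static void suspend_model(struct hbus_model *h)
{
	h->tasklet_enabled = false;   /* stop the channel callback */
	h->state = BUS_REMOVING;      /* no new work from now on */
	h->tasklet_enabled = true;
	flush_wq_model(h);            /* wait for work already queued */
	channel_closed = true;        /* models vmbus_close() */
}
```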

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

PCI: hv: Reorganize the code in preparation of hibernation · a8e37506

Committed by Dexuan Cui on Nov 24, 2019

There is no functional change. This is just preparation for a later
patch that adds hibernation support to the pci-hyperv driver.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>

14 Oct, 2019: 1 commit

PCI: Add PCI_STD_NUM_BARS for the number of standard BARs · c9c13ba4

Committed by Denis Efremov on Sep 28, 2019

Code that iterates over all standard PCI BARs typically uses
PCI_STD_RESOURCE_END. However, that requires the unusual test
"i <= PCI_STD_RESOURCE_END" rather than the more typical
"i < PCI_STD_NUM_BARS".

Add a definition for PCI_STD_NUM_BARS and change loops to use the more
idiomatic C style to help avoid fencepost errors.
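The two loop styles side by side. The macros are redefined locally so the example is self-contained; their values (six standard BARs, with PCI_STD_RESOURCE_END as the last index) follow the description above, but treat the exact definitions as an assumption rather than a copy of the kernel headers.

```c
#include <assert.h>

/* Local stand-ins for the kernel definitions (assumed values). */
#define PCI_STD_RESOURCES    0
#define PCI_STD_NUM_BARS     6
#define PCI_STD_RESOURCE_END (PCI_STD_RESOURCES + PCI_STD_NUM_BARS - 1)

/* Old style: inclusive upper bound, easy to get wrong by one. */
static int count_bars_old(void)
{
	int n = 0;
	for (int i = PCI_STD_RESOURCES; i <= PCI_STD_RESOURCE_END; i++)
		n++;
	return n;
}

/* New style: the usual half-open C loop. */
static int count_bars_new(void)
{
	int n = 0;
	for (int i = 0; i < PCI_STD_NUM_BARS; i++)
		n++;
	return n;
}
```

Both loops visit the same six BARs; the second form is simply the one C programmers read without a double take.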

Link: https://lore.kernel.org/r/20190927234026.23342-1-efremov@linux.com
Link: https://lore.kernel.org/r/20190927234308.23935-1-efremov@linux.com
Link: https://lore.kernel.org/r/20190916204158.6889-3-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Sebastian Ott <sebott@linux.ibm.com> # arch/s390/
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> # video/fbdev/
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> # pci/controller/dwc/
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> # scsi/pm8001/
Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi/pm8001/
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # memstick/

10 Sep, 2019: 1 commit

PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers · f73f8a50

Committed by Haiyang Zhang on Aug 15, 2019

As recommended by the Azure host team, use bytes 4 and 5 of the instance
ID as the PCI domain number, since they have more uniqueness (information
entropy) than bytes 8 and 9.

On older hosts, bytes 4, 5 can also be used -- no backward compatibility
issues are introduced and the chance of collision is greatly reduced.

In the rare cases of collision, the driver code detects and finds
another number that is not in use.
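
A sketch of the domain-number selection: build the requested domain from bytes 4 and 5 of the instance ID and, on collision, fall back to the lowest free number. The in-use table, the function names, the byte order (byte 5 as the high byte) and the reservation of domain 0 are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DOM_MAP_SIZE 0x10000u      /* 16-bit PCI domain space */
#define DOM_INVALID  0xFFFFFFFFu   /* hypothetical "no domain available" value */

static bool dom_in_use[DOM_MAP_SIZE];

/* Take the requested domain if free, otherwise the lowest free one. */
static uint32_t get_dom_num(uint16_t requested)
{
	dom_in_use[0] = true;              /* domain 0 reserved (assumption) */
	if (!dom_in_use[requested]) {
		dom_in_use[requested] = true;
		return requested;
	}
	for (uint32_t d = 0; d < DOM_MAP_SIZE; d++) {
		if (!dom_in_use[d]) {
			dom_in_use[d] = true;
			return d;
		}
	}
	return DOM_INVALID;
}

/* Bytes 4 and 5 of the 16-byte instance ID form the requested domain. */
static uint32_t domain_from_instance_id(const uint8_t id[16])
{
	uint16_t requested = (uint16_t)(id[5] << 8 | id[4]);
	return get_dom_num(requested);
}
```

A second device that presents the same bytes 4 and 5 gets a different, unused domain instead of failing to load.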

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Sasha Levin <sashal@kernel.org>

22 Aug, 2019: 2 commits

PCI: hv: Add a Hyper-V PCI interface driver for software backchannel interface · 348dd93e

Committed by Haiyang Zhang on Aug 22, 2019

This interface driver is a helper driver that allows other drivers to
have a common interface with the Hyper-V PCI frontend driver.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

PCI: hv: Add a paravirtual backchannel in software · e5d2f910

Committed by Dexuan Cui on Aug 22, 2019

Windows SR-IOV provides a backchannel mechanism in software for communication
between a VF driver and a PF driver. These "configuration blocks" are
similar in concept to PCI configuration space, but instead of doing reads and
writes in 32-bit chunks through a very slow path, packets of up to 128 bytes
can be sent or received asynchronously.

Nearly every SR-IOV device contains just such a communications channel in
hardware, so using this one in software is usually optional. Using the
software channel, however, allows driver implementers to leverage software
tools that fuzz the communications channel looking for vulnerabilities.

The usage model for these packets puts the responsibility for reading or
writing on the VF driver. The VF driver sends a read or a write packet,
indicating which "block" is being referred to by number.

If the PF driver wishes to initiate communication, it can "invalidate" one or
more of the first 64 blocks. This driver delivers the invalidation via a
callback supplied by the VF driver.

No protocol is implied, except that supplied by the PF and VF drivers.
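
A sketch of the two directions described above, under stated assumptions: VF requests are limited to 128 bytes, and PF-initiated invalidations of the first 64 blocks reach the VF's callback as a 64-bit mask with one bit per block. The mask representation, `struct backchannel_model` and all function names are hypothetical, not the driver's interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_MAX_BYTES      128   /* packets of up to 128 bytes (from the text) */
#define INVALIDATABLE_BLOCKS 64    /* only the first 64 blocks can be invalidated */

/* Hypothetical VF-side callback type: one bit per invalidated block. */
typedef void (*block_invalidate_fn)(void *context, uint64_t block_mask);

struct backchannel_model {
	block_invalidate_fn cb;
	void *context;
};

/* Validate the length of a VF read/write request before sending it. */
static int check_block_len(size_t len)
{
	if (len == 0 || len > BLOCK_MAX_BYTES)
		return -EINVAL;
	return 0;
}

static uint64_t block_bit(unsigned int block_id)
{
	assert(block_id < INVALIDATABLE_BLOCKS);
	return UINT64_C(1) << block_id;
}

/* PF-initiated invalidation, delivered through the VF's callback if set. */
static void deliver_invalidate(const struct backchannel_model *bc, uint64_t mask)
{
	if (bc->cb)
		bc->cb(bc->context, mask);
}

/* Example VF callback: remember which blocks must be re-read. */
static void vf_on_invalidate(void *context, uint64_t block_mask)
{
	*(uint64_t *)context |= block_mask;
}
```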

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Aug, 2019: 1 commit

PCI: hv: Detect and fix Hyper-V PCI domain number collision · be700103

Committed by Haiyang Zhang on Aug 15, 2019

Currently in Azure cloud, for passthrough devices, the host sets bytes
8 - 15 of the device instance ID to a value derived from the host HWID,
which is the same on all devices in a VM. As a result, bytes 8 and 9 of
the device instance ID provided by the host are no longer unique. This
affects all Azure hosts since July 2018, and can cause device passthrough
to VMs to fail, because bytes 8 and 9 are used as the PCI domain number:
a domain number collision causes the second device with the same domain
number to fail to load.

In case of a collision, detect it and find another domain number that is
not in use.
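
The collision handling can be sketched as a bitmap of in-use domain numbers: try the number derived from the instance ID first, and fall back to a free one. This is a userspace model; dom_in_use, test_and_set_dom() and get_dom_num() are hypothetical names, and the fallback policy (lowest free number) is an assumption, not necessarily what the driver picks.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI domain numbers are 16 bits wide, so track all 65536 of them. */
#define DOM_COUNT 65536u
#define BITS_PER_WORD 64u

/* Hypothetical bitmap of domain numbers already in use in this VM. */
static uint64_t dom_in_use[DOM_COUNT / BITS_PER_WORD];

/* Mark `dom` as used; report whether it was already used before. */
static bool test_and_set_dom(unsigned int dom)
{
	uint64_t bit = UINT64_C(1) << (dom % BITS_PER_WORD);
	uint64_t *word = &dom_in_use[dom / BITS_PER_WORD];
	bool was_set = (*word & bit) != 0;

	*word |= bit;
	return was_set;
}

/*
 * Try the domain number derived from the device instance ID first; on a
 * collision, claim the lowest free number instead. Returns -1 if every
 * number is taken.
 */
static int get_dom_num(unsigned int preferred)
{
	if (!test_and_set_dom(preferred % DOM_COUNT))
		return (int)(preferred % DOM_COUNT);

	for (unsigned int dom = 0; dom < DOM_COUNT; dom++)
		if (!test_and_set_dom(dom))
			return (int)dom;

	return -1;
}
```

Using test-and-set in both steps means that claiming a number and checking it are one operation, so the same number is never handed out twice.
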
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Sasha Levin <sashal@kernel.org>

12 Aug, 2019: 1 commit

PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing it · 533ca1fe

Committed by Dexuan Cui on Aug 02, 2019

The slot must be removed before the pci_dev is removed; otherwise a
use-after-free can cause a panic.

Fixes: 15becc2b ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: stable@vger.kernel.org

07 Aug, 2019: 1 commit

PCI: hv: Allocate a named fwnode instead of an address-based one · 467a3bb9

Committed by Marc Zyngier on Aug 06, 2019

To allocate its fwnode, which is then used to allocate an irqdomain,
the driver uses irq_domain_alloc_fwnode(), passing it a virtual address
(VA) as an identifier. This is a rather bad idea, as this address ends
up published in debugfs (and we want to move away from VAs there
anyway).

Instead, let's allocate a named fwnode by using the device GUID as
an identifier. It is allegedly unique, and can be traced back to
the original device.
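
A name derived from the GUID can be built by formatting the 16 instance-ID bytes as text. The userspace sketch below assumes the little-endian GUID layout that the kernel's printk emits for %pUL (first three fields byte-swapped, upper-case hex). guid_to_name() is a hypothetical helper, and whether the driver uses exactly this layout is an assumption. In the kernel, a string like this is the kind of argument irq_domain_alloc_named_fwnode() takes.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 16-byte GUID in the little-endian text layout: the first three
 * fields are stored little-endian (byte-swapped), the last two are printed
 * byte for byte. buf must hold 37 bytes (36 characters plus the NUL).
 */
static void guid_to_name(const uint8_t g[16], char buf[37])
{
	snprintf(buf, 37,
		 "%02X%02X%02X%02X-%02X%02X-%02X%02X-"
		 "%02X%02X-%02X%02X%02X%02X%02X%02X",
		 g[3], g[2], g[1], g[0], g[5], g[4], g[7], g[6],
		 g[8], g[9], g[10], g[11], g[12], g[13], g[14], g[15]);
}
```

For the bytes 0x00 through 0x0f in order, this produces 03020100-0504-0706-0809-0A0B0C0D0E0F. Unlike a kernel address, the string reveals nothing about memory layout and names the same device on every boot.
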
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>

05 Jul, 2019: 1 commit

PCI: hv: Fix a use-after-free bug in hv_eject_device_work() · 4df591b2

Committed by Dexuan Cui on Jun 21, 2019

Fix a use-after-free in hv_eject_device_work().

Fixes: 05f151a7 ("PCI: hv: Fix a memory leak in hv_eject_device_work()")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org

27 Mar, 2019: 3 commits

PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary · 340d4556

Committed by Dexuan Cui on Mar 04, 2019

When we hot-remove a device, usually the host sends us a PCI_EJECT message,
and a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.

When we run the quick hot-add/hot-remove test, the host may not send
us the PCI_EJECT message if the guest has not yet finished
initialization by sending the PCI_RESOURCES_ASSIGNED* message to the
host. It is therefore potentially unsafe to rely only on the
pci_destroy_slot() in hv_eject_device_work(), because the code path

create_root_hv_pci_bus()
 -> hv_pci_assign_slots()

is not called in this case. Note: in this case, the host still sends the
guest a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.

In the quick hot-add/hot-remove test, we can also hit the following
race. Before the code path

pci_devices_present_work()
 -> new_pcichild_device()

adds the new device to the hbus->children list, we may already have
received the PCI_EJECT message. In that case the tasklet handler

hv_pci_onchannelcallback()

may fail to find the "hpdev" when calling

get_pcichild_wslot(hbus, dev_message->wslot.slot)

so hv_pci_eject_device() is not called. Later, as execution continues,

create_root_hv_pci_bus()
 -> hv_pci_assign_slots()

creates the slot, and the PCI_BUS_RELATIONS message with
bus_rel->device_count == 0 removes the device from hbus->children. We
then end up unable to remove the slot in

hv_pci_remove()
 -> hv_pci_remove_slots()

Remove the slot in pci_devices_present_work() when the device
is removed to address this race.

pci_devices_present_work() and hv_eject_device_work() run in the
single-threaded hbus->wq, so the slot cannot be removed twice.
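
Why the single-threaded workqueue rules out a double remove can be modeled in userspace: each work item checks whether the slot still exists and clears the pointer after destroying it, and because the items run strictly one after another, that check-and-clear never interleaves. All names below are hypothetical stand-ins, and the pointer-clearing pattern is an assumption about how such a guard would look, not a copy of the driver.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct pci_slot and struct hv_pci_dev. */
struct slot_sketch { int id; };
struct child_sketch { struct slot_sketch *pci_slot; };

static int slots_destroyed; /* counts real destructions, for illustration */

/*
 * Destroy the child's slot at most once. The check-and-clear needs no
 * lock only because every caller runs on the same single-threaded queue.
 */
static void remove_child_slot(struct child_sketch *c)
{
	if (!c->pci_slot)
		return;
	free(c->pci_slot);
	c->pci_slot = NULL;
	slots_destroyed++;
}

/* Two work items that may both try to remove the same slot. */
static void eject_work(struct child_sketch *c) { remove_child_slot(c); }
static void devices_present_work(struct child_sketch *c) { remove_child_slot(c); }

/* A serialized "queue": work items run strictly one after another. */
typedef void (*work_fn)(struct child_sketch *);

static void run_queue(work_fn *items, int n, struct child_sketch *c)
{
	for (int i = 0; i < n; i++)
		items[i](c);
}
```

Whichever work item runs second finds the pointer already cleared and does nothing. On a multi-threaded queue the same check would need a lock.
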

We cannot offload hv_pci_eject_device() from hv_pci_onchannelcallback()
to the workqueue, because hv_pci_onchannelcallback() needs to call
hv_pci_eject_device() synchronously in order to poll the channel ring
buffer, which works around the "hangs in hv_compose_msi_msg()" issue
fixed in commit de0aa7b2 ("PCI: hv: Fix 2 hang issues in
hv_compose_msi_msg()").

Fixes: a15f2c08 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: rewritten commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org

PCI: hv: Add hv_pci_remove_slots() when we unload the driver · 15becc2b

Committed by Dexuan Cui on Mar 04, 2019

When we unload the pci-hyperv host controller driver, the host does not
send us a PCI_EJECT message.

In this case we also need to make sure the sysfs PCI slot directory is
removed; otherwise a command that reads a slot file, e.g.:

"cat /sys/bus/pci/slots/2/address"

will trigger a

"BUG: unable to handle kernel paging request"

and, if we unload and reload the driver several times, we end up with
stale slot entries in /sys/bus/pci/slots/:

root@localhost:~# ls -rtl  /sys/bus/pci/slots/
total 0
drwxr-xr-x 2 root root 0 Feb  7 10:49 2
drwxr-xr-x 2 root root 0 Feb  7 10:49 2-1
drwxr-xr-x 2 root root 0 Feb  7 10:51 2-2

Add the missing code to remove the PCI slot and fix the current
behaviour.
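
The "2-1" and "2-2" entries in the listing are consistent with the PCI core picking a suffixed name when the requested slot name is still taken by a stale entry (drivers/pci/slot.c appears to do this when building slot names). The userspace sketch below mirrors only that naming scheme, not the sysfs lookup; pick_slot_name() and name_taken() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Is `name` already present among the existing slot directory names? */
static bool name_taken(const char *const *existing, int n, const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(existing[i], name) == 0)
			return true;
	return false;
}

/*
 * Pick a slot directory name: use `base` if it is free, otherwise try
 * "base-1", "base-2", ... until an unused name is found.
 */
static void pick_slot_name(const char *const *existing, int n,
			   const char *base, char *out, size_t outlen)
{
	snprintf(out, outlen, "%s", base);
	for (int dup = 1; name_taken(existing, n, out); dup++)
		snprintf(out, outlen, "%s-%d", base, dup);
}
```

Each reload that leaves the old "2" behind therefore creates one more suffixed directory, which matches the growing list above.
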

Fixes: a15f2c08 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: reformatted the log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org

PCI: hv: Fix a memory leak in hv_eject_device_work() · 05f151a7

Committed by Dexuan Cui on Mar 04, 2019

When a device is created in new_pcichild_device(), hpdev->refs is set
to 2 (i.e. the initial value of 1 plus the get_pcichild()).

When we hot-remove the device from the host, in a Linux VM we first call
hv_pci_eject_device(), which increases hpdev->refs by get_pcichild() and
then schedules hv_eject_device_work() as a work item, so hpdev->refs
becomes 3 (ignoring the paired get/put_pcichild() calls elsewhere). But
hv_eject_device_work() currently calls put_pcichild() only twice,
meaning the 'hpdev' struct can't be freed in put_pcichild().

Add one put_pcichild() to fix the memory leak.

The device can also be removed when we run "rmmod pci-hyperv". On this
path (hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_devices_present()),
hpdev->refs is 2, and we do correctly call put_pcichild() twice in
pci_devices_present_work().
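
The reference counting described above can be replayed in a small userspace model. The struct and function names are hypothetical; the comments note which driver step each one stands for, following only what this commit message states.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct hv_pci_dev and its reference count. */
struct hpdev_sketch {
	int refs;
};

static int freed_count; /* how many objects were actually freed */

static void get_child(struct hpdev_sketch *d) { d->refs++; }

/* Drop one reference; free the object when the last one goes away. */
static void put_child(struct hpdev_sketch *d)
{
	if (--d->refs == 0) {
		free(d);
		freed_count++;
	}
}

/* Models new_pcichild_device(): initial ref of 1 plus one get => 2. */
static struct hpdev_sketch *new_child(void)
{
	struct hpdev_sketch *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refs = 1;
	get_child(d);
	return d;
}

/* Models hv_pci_eject_device(): one more ref before queuing the work. */
static void eject(struct hpdev_sketch *d) { get_child(d); }

/* Models hv_eject_device_work(): drops `puts` references. */
static void eject_work(struct hpdev_sketch *d, int puts)
{
	for (int i = 0; i < puts; i++)
		put_child(d);
}
```

With two puts the count stops at 1 and the object is never freed (the leak); with three it reaches 0 and is freed, which is what the added put_pcichild() achieves.
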

Fixes: 4daace0d ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
[lorenzo.pieralisi@arm.com: commit log rework]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org

01 Mar, 2019: 3 commits

PCI: hv: Refactor hv_irq_unmask() to use cpumask_to_vpset() · c8ccf759

Committed by Maya Nakamura on Mar 01, 2019

Remove the duplicate implementation of cpumask_to_vpset() and use the
shared implementation. Export hv_max_vp_index, which is required by
cpumask_to_vpset().
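
What a CPU-mask-to-VP-set conversion produces can be sketched in userspace: VP numbers are packed into 64-bit banks, and valid_bank_mask records which banks the hypervisor should read. This is a simplified model with hypothetical names. It ASSUMES Linux CPU numbers have already been translated to Hyper-V VP numbers, and it clears a fixed 64 banks; the real helper presumably uses hv_max_vp_index to bound that work, which is why the commit exports it.

```c
#include <stdint.h>

#define VPS_PER_BANK 64
#define MAX_BANKS 64 /* valid_bank_mask is 64 bits wide */

/* Sparse VP-set layout: one 64-bit bank per 64 virtual processors. */
struct vpset_sketch {
	uint64_t format;          /* sparse-bank format tag; unused here */
	uint64_t valid_bank_mask; /* bit N set => bank_contents[N] is valid */
	uint64_t bank_contents[MAX_BANKS];
};

/*
 * Convert a list of VP numbers into a sparse VP set. Returns the number
 * of banks used, or -1 if a VP number does not fit.
 * ASSUMPTION: the caller already mapped Linux CPUs to Hyper-V VP numbers.
 */
static int vps_to_vpset(struct vpset_sketch *set, const unsigned int *vps,
			int nr_vps)
{
	int nr_bank = 0;

	set->valid_bank_mask = 0;
	for (int b = 0; b < MAX_BANKS; b++)
		set->bank_contents[b] = 0;

	for (int i = 0; i < nr_vps; i++) {
		unsigned int bank = vps[i] / VPS_PER_BANK;
		unsigned int off = vps[i] % VPS_PER_BANK;

		if (bank >= MAX_BANKS)
			return -1;
		set->bank_contents[bank] |= UINT64_C(1) << off;
		if ((int)bank + 1 > nr_bank)
			nr_bank = (int)bank + 1;
	}

	/* Mark banks 0..nr_bank-1 valid, including any empty ones. */
	set->valid_bank_mask = nr_bank == MAX_BANKS ? ~UINT64_C(0)
				: (UINT64_C(1) << nr_bank) - 1;
	return nr_bank;
}
```

For VPs 0, 1 and 65 this yields two banks: bank 0 holds 0x3, bank 1 holds 0x2 (VP 65 is bit 1 of bank 1), and valid_bank_mask is 0x3.
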
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>

PCI: hv: Replace hv_vp_set with hv_vpset · 9bc11742

Committed by Maya Nakamura on Mar 01, 2019

Remove a duplicate definition of VP set (hv_vp_set) and use the common
definition (hv_vpset) that is used in other places.

Change the order of the members in struct hv_pcibus_device so that the
declaration of retarget_msi_interrupt_params is the last member. Struct
hv_vpset, which contains a flexible array, is nested two levels deep in
struct hv_pcibus_device via retarget_msi_interrupt_params. Because the
elements of a flexible array are stored past the end of the structure
that contains it, no other member may follow it.

Add a comment that retarget_msi_interrupt_params should be the last
member of struct hv_pcibus_device.
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>

PCI: hv: Add __aligned(8) to struct retarget_msi_interrupt · 6ae91579

Committed by Maya Nakamura on Mar 01, 2019

Because Hyper-V requires hypercall arguments to be aligned on an 8-byte
boundary, add __aligned(8) to struct retarget_msi_interrupt.
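
The effect of __aligned(8) can be checked in userspace. With GCC and Clang, the kernel's __aligned(8) expands to an aligned(8) attribute. The structs below are hypothetical stand-ins built only from 32-bit fields, so their natural alignment is 4 and the attribute visibly raises it to 8. They are not the real layout of struct retarget_msi_interrupt.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Natural alignment: only 32-bit fields, so typically 4-byte aligned. */
struct msi_args_natural {
	uint32_t a;
	uint32_t b;
	uint32_t c;
};

/* Same fields, forced onto an 8-byte boundary (and padded to 16 bytes). */
struct msi_args_aligned {
	uint32_t a;
	uint32_t b;
	uint32_t c;
} __attribute__((aligned(8)));

/* Embedding each one after a 4-byte member shows where it lands. */
struct holder_natural { uint32_t tag; struct msi_args_natural arg; };
struct holder_aligned { uint32_t tag; struct msi_args_aligned arg; };

/* True if `p` satisfies the 8-byte hypercall-argument requirement. */
static int is_hypercall_aligned(const void *p)
{
	return ((uintptr_t)p % 8) == 0;
}
```

Without the attribute, the embedded argument would start at offset 4 on common ABIs. With it, the compiler inserts padding so the argument always starts on an 8-byte boundary, wherever the struct is embedded.
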

Link: https://lore.kernel.org/lkml/87k1hlqlby.fsf@vitty.brq.redhat.com/
Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

openeuler / Kernel: synced successfully about 2 years ago
