提交 · fc675e355e705a046df7b635d3f3330c0ad94569 · openanolis / cloud-kernel

19 9月, 2014 10 次提交

arm/arm64: KVM: vgic: kill VGIC_MAX_CPUS · fc675e35

由 Marc Zyngier 提交于 7月 08, 2014

We now have the information about the number of CPU interfaces in
the distributor itself. Let's get rid of VGIC_MAX_CPUS, and just
rely on KVM_MAX_VCPUS where we don't have the choice. Yet.
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

fc675e35

arm/arm64: KVM: vgic: Parametrize VGIC_NR_SHARED_IRQS · fb65ab63

由 Marc Zyngier 提交于 7月 08, 2014

Having a dynamic number of supported interrupts means that we
cannot relly on VGIC_NR_SHARED_IRQS being fixed anymore.

Instead, make it take the distributor structure as a parameter,
so it can return the right value.
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

fb65ab63

arm/arm64: KVM: vgic: switch to dynamic allocation · c1bfb577

由 Marc Zyngier 提交于 7月 08, 2014

So far, all the VGIC data structures are statically defined by the
*maximum* number of vcpus and interrupts it supports. It means that
we always have to oversize it to cater for the worse case.

Start by changing the data structures to be dynamically sizeable,
and allocate them at runtime.

The sizes are still very static though.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

c1bfb577

KVM: ARM: vgic: plug irq injection race · 71afaba4

由 Marc Zyngier 提交于 7月 08, 2014

As it stands, nothing prevents userspace from injecting an interrupt
before the guest's GIC is actually initialized.

This goes unnoticed so far (as everything is pretty much statically
allocated), but ends up exploding in a spectacular way once we switch
to a more dynamic allocation (the GIC data structure isn't there yet).

The fix is to test for the "ready" flag in the VGIC distributor before
trying to inject the interrupt. Note that in order to avoid breaking
userspace, we have to ignore what is essentially an error.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>

71afaba4

arm/arm64: KVM: vgic: Clarify and correct vgic documentation · 7e362919

由 Christoffer Dall 提交于 6月 14, 2014

The VGIC virtual distributor implementation documentation was written a
very long time ago, before the true nature of the beast had been
partially absorbed into my bloodstream.  Clarify the docs.

Plus, it fixes an actual bug.  ICFRn, pfff.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

7e362919

arm/arm64: KVM: vgic: Fix SGI writes to GICD_I{CS}PENDR0 · 9da48b55

由 Christoffer Dall 提交于 6月 14, 2014

Writes to GICD_ISPENDR0 and GICD_ICPENDR0 ignore all settings of the
pending state for SGIs.  Make sure the implementation handles this
correctly.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

9da48b55

arm/arm64: KVM: vgic: Improve handling of GICD_I{CS}PENDRn · faa1b46c

由 Christoffer Dall 提交于 6月 14, 2014

Writes to GICD_ISPENDRn and GICD_ICPENDRn are currently not handled
correctly for level-triggered interrupts. The spec states that for
level-triggered interrupts, writes to the GICD_ISPENDRn activate the
output of a flip-flop which is in turn or'ed with the actual input
interrupt signal. Correspondingly, writes to GICD_ICPENDRn simply
deactivates the output of that flip-flop, but does not (of course) affect
the external input signal. Reads from GICC_IAR will also deactivate the
flip-flop output.

This requires us to track the state of the level-input separately from
the state in the flip-flop. We therefore introduce two new variables on
the distributor struct to track these two states. Astute readers may
notice that this is introducing more state than required (because an OR
of the two states gives you the pending state), but the remaining vgic
code uses the pending bitmap for optimized operations to figure out, at
the end of the day, if an interrupt is pending or not on the distributor
side. Refactoring the code to consider the two state variables all the
places where we currently access the precomputed pending value, did not
look pretty.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

faa1b46c

arm/arm64: KVM: vgic: Clear queued flags on unqueue · cced50c9

由 Christoffer Dall 提交于 6月 14, 2014

If we unqueue a level-triggered interrupt completely, and the LR does
not stick around in the active state (and will therefore no longer
generate a maintenance interrupt), then we should clear the queued flag
so that the vgic can actually queue this level-triggered interrupt at a
later time and deal with its pending state then.

Note: This should actually be properly fixed to handle the active state
on the distributor.
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

cced50c9

arm/arm64: KVM: Rename irq_active to irq_queued · dbf20f9d

由 Christoffer Dall 提交于 6月 09, 2014

We have a special bitmap on the distributor struct to keep track of when
level-triggered interrupts are queued on the list registers. This was
named irq_active, which is confusing, because the active state of an
interrupt as per the GIC spec is a different thing, not specifically
related to edge-triggered/level-triggered configurations but rather
indicates an interrupt which has been ack'ed but not yet eoi'ed.

Rename the bitmap and the corresponding accessor functions to irq_queued
to clarify what this is actually used for.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

dbf20f9d

arm/arm64: KVM: Rename irq_state to irq_pending · 227844f5

由 Christoffer Dall 提交于 6月 09, 2014

The irq_state field on the distributor struct is ambiguous in its
meaning; the comment says it's the level of the input put, but that
doesn't make much sense for edge-triggered interrupts.  The code
actually uses this state variable to check if the interrupt is in the
pending state on the distributor so clarify the comment and rename the
actual variable and accessor methods.
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

227844f5

17 9月, 2014 4 次提交

KVM: VFIO: register kvm_device_ops dynamically · 80ce1639

由 Will Deacon 提交于 9月 02, 2014

Now that we have a dynamic means to register kvm_device_ops, use that
for the VFIO kvm device, instead of relying on the static table.

This is achieved by a module_init call to register the ops with KVM.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: NAlex Williamson <Alex.Williamson@redhat.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

80ce1639

KVM: s390: register flic ops dynamically · 84877d93

由 Cornelia Huck 提交于 9月 02, 2014

Using the new kvm_register_device_ops() interface makes us get rid of
an #ifdef in common code.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

84877d93

KVM: ARM: vgic: register kvm_device_ops dynamically · c06a841b

由 Will Deacon 提交于 9月 02, 2014

Now that we have a dynamic means to register kvm_device_ops, use that
for the ARM VGIC, instead of relying on the static table.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c06a841b

KVM: device: add simple registration mechanism for kvm_device_ops · d60eacb0

由 Will Deacon 提交于 9月 02, 2014

kvm_ioctl_create_device currently has knowledge of all the device types
and their associated ops. This is fairly inflexible when adding support
for new in-kernel device emulations, so move what we currently have out
into a table, which can support dynamic registration of ops by new
drivers for virtual hardware.

Cc: Alex Williamson <Alex.Williamson@redhat.com>
Cc: Alex Graf <agraf@suse.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d60eacb0

16 9月, 2014 1 次提交

kvm: ioapic: conditionally delay irq delivery duringeoi broadcast · 184564ef

由 Zhang Haoyu 提交于 9月 11, 2014

Currently, we call ioapic_service() immediately when we find the irq is still
active during eoi broadcast. But for real hardware, there's some delay between
the EOI writing and irq delivery. If we do not emulate this behavior, and
re-inject the interrupt immediately after the guest sends an EOI and re-enables
interrupts, a guest might spend all its time in the ISR if it has a broken
handler for a level-triggered interrupt.

Such livelock actually happens with Windows guests when resuming from
hibernation.

As there's no way to recognize the broken handle from new raised ones, this patch
delays an interrupt if 10.000 consecutive EOIs found that the interrupt was
still high. The guest can then make a little forward progress, until a proper
IRQ handler is set or until some detection routine in the guest (such as
Linux's note_interrupt()) recognizes the situation.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NZhang Haoyu <zhanghy@sangfor.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

184564ef

11 9月, 2014 1 次提交

KVM: EVENTFD: remove inclusion of irq.h · 0ba09511

由 Eric Auger 提交于 9月 01, 2014

No more needed. irq.h would be void on ARM.
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NEric Auger <eric.auger@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

0ba09511

05 9月, 2014 3 次提交

KVM: remove redundant assignments in __kvm_set_memory_region · f2a25160

由 Christian Borntraeger 提交于 9月 04, 2014

__kvm_set_memory_region sets r to EINVAL very early.
Doing it again is not necessary. The same is true later on, where
r is assigned -ENOMEM twice.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f2a25160

KVM: remove redundant assigment of return value in kvm_dev_ioctl · a13f533b

由 Christian Borntraeger 提交于 9月 04, 2014

The first statement of kvm_dev_ioctl is
        long r = -EINVAL;

No need to reassign the same value.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a13f533b

KVM: remove redundant check of in_spin_loop · 34656113

由 Christian Borntraeger 提交于 9月 04, 2014

The expression `vcpu->spin_loop.in_spin_loop' is always true,
because it is evaluated only when the condition
`!vcpu->spin_loop.in_spin_loop' is false.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

34656113

03 9月, 2014 2 次提交

kvm: fix potentially corrupt mmio cache · ee3d1570

由 David Matlack 提交于 8月 18, 2014

vcpu exits and memslot mutations can run concurrently as long as the
vcpu does not aquire the slots mutex. Thus it is theoretically possible
for memslots to change underneath a vcpu that is handling an exit.

If we increment the memslot generation number again after
synchronize_srcu_expedited(), vcpus can safely cache memslot generation
without maintaining a single rcu_dereference through an entire vm exit.
And much of the x86/kvm code does not maintain a single rcu_dereference
of the current memslots during each exit.

We can prevent the following case:

   vcpu (CPU 0)                             | thread (CPU 1)
--------------------------------------------+--------------------------
1  vm exit                                  |
2  srcu_read_unlock(&kvm->srcu)             |
3  decide to cache something based on       |
     old memslots                           |
4                                           | change memslots
                                            | (increments generation)
5                                           | synchronize_srcu(&kvm->srcu);
6  retrieve generation # from new memslots  |
7  tag cache with new memslot generation    |
8  srcu_read_unlock(&kvm->srcu)             |
...                                         |
   <action based on cache occurs even       |
    though the caching decision was based   |
    on the old memslots>                    |
...                                         |
   <action *continues* to occur until next  |
    memslot generation change, which may    |
    be never>                               |
                                            |

By incrementing the generation after synchronizing with kvm->srcu readers,
we ensure that the generation retrieved in (6) will become invalid soon
after (8).

Keeping the existing increment is not strictly necessary, but we
do keep it and just move it for consistency from update_memslots to
install_new_memslots.  It invalidates old cached MMIOs immediately,
instead of having to wait for the end of synchronize_srcu_expedited,
which makes the code more clearly correct in case CPU 1 is preempted
right after synchronize_srcu() returns.

To avoid halving the generation space in SPTEs, always presume that the
low bit of the generation is zero when reconstructing a generation number
out of an SPTE.  This effectively disables MMIO caching in SPTEs during
the call to synchronize_srcu_expedited.  Using the low bit this way is
somewhat like a seqcount---where the protected thing is a cache, and
instead of retrying we can simply punt if we observe the low bit to be 1.

Cc: stable@vger.kernel.org
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ee3d1570

KVM: do not bias the generation number in kvm_current_mmio_generation · 00f034a1

由 Paolo Bonzini 提交于 8月 20, 2014

The next patch will give a meaning (a la seqcount) to the low bit of the
generation number.  Ensure that it matches between kvm->memslots->generation
and kvm_current_mmio_generation().

Cc: stable@vger.kernel.org
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

00f034a1

29 8月, 2014 2 次提交

KVM: remove garbage arg to *hardware_{en,dis}able · 13a34e06

由 Radim Krčmář 提交于 8月 28, 2014

In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.

Remove unnecessary arguments that stem from this.
Signed-off-by: NRadim KrÄmÃ¡Å™ <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

13a34e06

KVM: Unconditionally export KVM_CAP_READONLY_MEM · 0f8a4de3

由 Christoffer Dall 提交于 8月 26, 2014

The idea between capabilities and the KVM_CHECK_EXTENSION ioctl is that
userspace can, at run-time, determine if a feature is supported or not.
This allows KVM to being supporting a new feature with a new kernel
version without any need to update user space.  Unfortunately, since the
definition of KVM_CAP_READONLY_MEM was guarded by #ifdef
__KVM_HAVE_READONLY_MEM, such discovery still required a user space
update.

Therefore, unconditionally export KVM_CAP_READONLY_MEM and change the
in-kernel conditional to rely on __KVM_HAVE_READONLY_MEM.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0f8a4de3

28 8月, 2014 3 次提交

KVM: vgic: declare probe function pointer as const · de56fb19

由 Will Deacon 提交于 8月 26, 2014

We extract the vgic probe function from the of_device_id data pointer,
which is const. Kill the sparse warning by ensuring that the local
function pointer is also marked as const.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

de56fb19

KVM: vgic: return int instead of bool when checking I/O ranges · 1fa451bc

由 Will Deacon 提交于 8月 26, 2014

vgic_ioaddr_overlap claims to return a bool, but in reality it returns
an int. Shut sparse up by fixing the type signature.

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

1fa451bc

KVM: Introduce gfn_to_hva_memslot_prot · 64d83126

由 Christoffer Dall 提交于 8月 19, 2014

To support read-only memory regions on arm and arm64, we have a need to
resolve a gfn to an hva given a pointer to a memslot to avoid looping
through the memslots twice and to reuse the hva error checking of
gfn_to_hva_prot(), add a new gfn_to_hva_memslot_prot() function and
refactor gfn_to_hva_prot() to use this function.
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

64d83126

22 8月, 2014 1 次提交

KVM: add kvm_arch_sched_in · e790d9ef

由 Radim Krčmář 提交于 8月 21, 2014

Introduce preempt notifiers for architecture specific code.
Advantage over creating a new notifier in every arch is slightly simpler
code and guaranteed call order with respect to kvm_sched_in.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e790d9ef

21 8月, 2014 1 次提交

KVM: avoid unnecessary synchronize_rcu · 7103f60d

由 Christian Borntraeger 提交于 8月 19, 2014

We dont have to wait for a grace period if there is no oldpid that
we are going to free. putpid also checks for NULL, so this patch
only fences synchronize_rcu.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7103f60d

19 8月, 2014 2 次提交

virt/kvm/assigned-dev.c: Set 'dev->irq_source_id' to '-1' after free it · 30d1e0e8

由 Chen Gang 提交于 8月 08, 2014

As a generic function, deassign_guest_irq() assumes it can be called
even if assign_guest_irq() is not be called successfully (which can be
triggered by ioctl from user mode, indirectly).

So for assign_guest_irq() failure process, need set 'dev->irq_source_id'
to -1 after free 'dev->irq_source_id', or deassign_guest_irq() may free
it again.
Signed-off-by: NChen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

30d1e0e8

kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601) · 350b8bdd

由 Michael S. Tsirkin 提交于 8月 19, 2014

The third parameter of kvm_iommu_put_pages is wrong,
It should be 'gfn - slot->base_gfn'.

By making gfn very large, malicious guest or userspace can cause kvm to
go to this error path, and subsequently to pass a huge value as size.
Alternatively if gfn is small, then pages would be pinned but never
unpinned, causing host memory leak and local DOS.

Passing a reasonable but large value could be the most dangerous case,
because it would unpin a page that should have stayed pinned, and thus
allow the device to DMA into arbitrary memory.  However, this cannot
happen because of the condition that can trigger the error:

- out of memory (where you can't allocate even a single page)
  should not be possible for the attacker to trigger

- when exceeding the iommu's address space, guest pages after gfn
  will also exceed the iommu's address space, and inside
  kvm_iommu_put_pages() the iommu_iova_to_phys() will fail.  The
  page thus would not be unpinned at all.
Reported-by: NJack Morgenstein <jackm@mellanox.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

350b8bdd

06 8月, 2014 1 次提交

KVM: Move more code under CONFIG_HAVE_KVM_IRQFD · c77dcacb

由 Paolo Bonzini 提交于 8月 06, 2014

Commits e4d57e1e (KVM: Move irq notifier implementation into
eventfd.c, 2014-06-30) included the irq notifier code unconditionally
in eventfd.c, while it was under CONFIG_HAVE_KVM_IRQCHIP before.

Similarly, commit 297e2105 (KVM: Give IRQFD its own separate enabling
Kconfig option, 2014-06-30) moved code from CONFIG_HAVE_IRQ_ROUTING
to CONFIG_HAVE_KVM_IRQFD but forgot to move the pieces that used to be
under CONFIG_HAVE_KVM_IRQCHIP.

Together, this broke compilation without CONFIG_KVM_XICS. Fix by adding
or changing the #ifdefs so that they point at CONFIG_HAVE_KVM_IRQFD.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c77dcacb

05 8月, 2014 5 次提交

KVM: Give IRQFD its own separate enabling Kconfig option · 297e2105

由 Paul Mackerras 提交于 6月 30, 2014

Currently, the IRQFD code is conditional on CONFIG_HAVE_KVM_IRQ_ROUTING.
So that we can have the IRQFD code compiled in without having the
IRQ routing code, this creates a new CONFIG_HAVE_KVM_IRQFD, makes
the IRQFD code conditional on it instead of CONFIG_HAVE_KVM_IRQ_ROUTING,
and makes all the platforms that currently select HAVE_KVM_IRQ_ROUTING
also select HAVE_KVM_IRQFD.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Tested-by: NEric Auger <eric.auger@linaro.org>
Tested-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

297e2105

KVM: Move irq notifier implementation into eventfd.c · e4d57e1e

由 Paul Mackerras 提交于 6月 30, 2014

This moves the functions kvm_irq_has_notifier(), kvm_notify_acked_irq(),
kvm_register_irq_ack_notifier() and kvm_unregister_irq_ack_notifier()
from irqchip.c to eventfd.c.  The reason for doing this is that those
functions are used in connection with IRQFDs, which are implemented in
eventfd.c.  In future we will want to use IRQFDs on platforms that
don't implement the GSI routing implemented in irqchip.c, so we won't
be compiling in irqchip.c, but we still need the irq notifiers.  The
implementation is unchanged.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Tested-by: NEric Auger <eric.auger@linaro.org>
Tested-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e4d57e1e

KVM: Move all accesses to kvm::irq_routing into irqchip.c · 9957c86d

由 Paul Mackerras 提交于 6月 30, 2014

Now that struct _irqfd does not keep a reference to storage pointed
to by the irq_routing field of struct kvm, we can move the statement
that updates it out from under the irqfds.lock and put it in
kvm_set_irq_routing() instead.  That means we then have to take a
srcu_read_lock on kvm->irq_srcu around the irqfd_update call in
kvm_irqfd_assign(), since holding the kvm->irqfds.lock no longer
ensures that that the routing can't change.

Combined with changing kvm_irq_map_gsi() and kvm_irq_map_chip_pin()
to take a struct kvm * argument instead of the pointer to the routing
table, this allows us to to move all references to kvm->irq_routing
into irqchip.c.  That in turn allows us to move the definition of the
kvm_irq_routing_table struct into irqchip.c as well.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Tested-by: NEric Auger <eric.auger@linaro.org>
Tested-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9957c86d

KVM: irqchip: Provide and use accessors for irq routing table · 8ba918d4

由 Paul Mackerras 提交于 6月 30, 2014

This provides accessor functions for the KVM interrupt mappings, in
order to reduce the amount of code that accesses the fields of the
kvm_irq_routing_table struct, and restrict that code to one file,
virt/kvm/irqchip.c.  The new functions are kvm_irq_map_gsi(), which
maps from a global interrupt number to a set of IRQ routing entries,
and kvm_irq_map_chip_pin, which maps from IRQ chip and pin numbers to
a global interrupt number.

This also moves the update of kvm_irq_routing_table::chip[][]
into irqchip.c, out of the various kvm_set_routing_entry
implementations.  That means that none of the kvm_set_routing_entry
implementations need the kvm_irq_routing_table argument anymore,
so this removes it.

This does not change any locking or data lifetime rules.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Tested-by: NEric Auger <eric.auger@linaro.org>
Tested-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8ba918d4

KVM: Don't keep reference to irq routing table in irqfd struct · 56f89f36

由 Paul Mackerras 提交于 6月 30, 2014

This makes the irqfd code keep a copy of the irq routing table entry
for each irqfd, rather than a reference to the copy in the actual
irq routing table maintained in kvm/virt/irqchip.c.  This will enable
us to change the routing table structure in future, or even not have a
routing table at all on some platforms.

The synchronization that was previously achieved using srcu_dereference
on the read side is now achieved using a seqcount_t structure.  That
ensures that we don't get a halfway-updated copy of the structure if
we read it while another thread is updating it.

We still use srcu_read_lock/unlock around the read side so that when
changing the routing table we can be sure that after calling
synchronize_srcu, nothing will be using the old routing.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Tested-by: NEric Auger <eric.auger@linaro.org>
Tested-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

56f89f36

31 7月, 2014 2 次提交

KVM: arm64: GICv3: mandate page-aligned GICV region · fb3ec679

由 Marc Zyngier 提交于 7月 31, 2014

Just like GICv2 was fixed in 63afbe7a
(kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform),
mandate the GICV region to be both aligned on a page boundary and
its size to be a multiple of page size.

This prevents a guest from being able to poke at regions where we
have no idea what is sitting there.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

fb3ec679

KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table · 0f6c0a74

由 Paolo Bonzini 提交于 7月 30, 2014

Currently, the EOI exit bitmap (used for APICv) does not include
interrupts that are masked. However, this can cause a bug that manifests
as an interrupt storm inside the guest. Alex Williamson reported the
bug and is the one who really debugged this; I only wrote the patch. :)

The scenario involves a multi-function PCI device with OHCI and EHCI
USB functions and an audio function, all assigned to the guest, where
both USB functions use legacy INTx interrupts.

As soon as the guest boots, interrupts for these devices turn into an
interrupt storm in the guest; the host does not see the interrupt storm.
Basically the EOI path does not work, and the guest continues to see the
interrupt over and over, even after it attempts to mask it at the APIC.
The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
with not many changes in the area of APIC/IOAPIC handling).

Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
on in the eoi_exit_bitmap and TMR, and things then work. What happens
is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
It does not have set bit 59 because the RTE was masked, so the IOAPIC
never sees the EOI and the interrupt continues to fire in the guest.

My guess was that the guest is masking the interrupt in the redirection
table in the interrupt routine, i.e. while the interrupt is set in a
LAPIC's ISR, The simplest fix is to ignore the masking state, we would
rather have an unnecessary exit rather than a missed IRQ ACK and anyway
IOAPIC interrupts are not as performance-sensitive as for example MSIs.
Alex tested this patch and it fixed his bug.

[Thanks to Alex for his precise description of the problem
and initial debugging effort. A lot of the text above is
based on emails exchanged with him.]
Reported-by: NAlex Williamson <alex.williamson@redhat.com>
Tested-by: NAlex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0f6c0a74

30 7月, 2014 1 次提交

kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform · 63afbe7a

由 Will Deacon 提交于 7月 25, 2014

If the physical address of GICV isn't page-aligned, then we end up
creating a stage-2 mapping of the page containing it, which causes us to
map neighbouring memory locations directly into the guest.

As an example, consider a platform with GICV at physical 0x2c02f000
running a 64k-page host kernel. If qemu maps this into the guest at
0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
physical regions may cause UNPREDICTABLE behaviour, for example, on the
Juno platform this will cause an SError exception to EL3, which brings
down the entire physical CPU resulting in RCU stalls / HYP panics / host
crashing / wasted weeks of debugging.

SBSA recommends that systems alias the 4k GICV across the bounding 64k
region, in which case GICV physical could be described as 0x2c020000 in
the above scenario.

This patch fixes the problem by failing the vgic probe if the physical
base address or the size of GICV aren't page-aligned. Note that this
generated a warning in dmesg about freeing enabled IRQs, so I had to
move the IRQ enabling later in the probe.

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joel Schopp <joel.schopp@amd.com>
Cc: Don Dutile <ddutile@redhat.com>
Acked-by: NPeter Maydell <peter.maydell@linaro.org>
Acked-by: NJoel Schopp <joel.schopp@amd.com>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

63afbe7a

28 7月, 2014 1 次提交

KVM: Allow KVM_CHECK_EXTENSION on the vm fd · 92b591a4

由 Alexander Graf 提交于 7月 14, 2014

The KVM_CHECK_EXTENSION is only available on the kvm fd today. Unfortunately
on PPC some of the capabilities change depending on the way a VM was created.

So instead we need a way to expose capabilities as VM ioctl, so that we can
see which VM type we're using (HV or PR). To enable this, add the
KVM_CHECK_EXTENSION ioctl to our vm ioctl portfolio.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>

92b591a4

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功