提交 · efbeec7098eee2b3d2359d0cc24bbba0436e7f21 · openanolis / cloud-kernel

28 12月, 2014 1 次提交

kvm: fix sorting of memslots with base_gfn == 0 · efbeec70

由 Paolo Bonzini 提交于 12月 27, 2014

Before commit 0e60b079 (kvm: change memslot sorting rule from size
to GFN, 2014-12-01), the memslots' sorting key was npages, meaning
that a valid memslot couldn't have its sorting key equal to zero.
On the other hand, a valid memslot can have base_gfn == 0, and invalid
memslots are identified by base_gfn == npages == 0.

Because of this, commit 0e60b079 broke the invariant that invalid
memslots are at the end of the mslots array.  When a memslot with
base_gfn == 0 was created, any invalid memslot before it were left
in place.

This can be fixed by changing the insertion to use a ">=" comparison
instead of "<=", but some care is needed to avoid breaking the case
of deleting a memslot; see the comment in update_memslots.

Thanks to Tiejun Chen for posting an initial patch for this bug.
Reported-by: NJamie Heilman <jamie@audible.transient.net>
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Tested-by: NJamie Heilman <jamie@audible.transient.net>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

efbeec70

15 12月, 2014 2 次提交

arm/arm64: KVM: Require in-kernel vgic for the arch timers · 05971120

由 Christoffer Dall 提交于 12月 12, 2014

It is curently possible to run a VM with architected timers support
without creating an in-kernel VGIC, which will result in interrupts from
the virtual timer going nowhere.

To address this issue, move the architected timers initialization to the
time when we run a VCPU for the first time, and then only initialize
(and enable) the architected timers if we have a properly created and
initialized in-kernel VGIC.

When injecting interrupts from the virtual timer to the vgic, the
current setup should ensure that this never calls an on-demand init of
the VGIC, which is the only call path that could return an error from
kvm_vgic_inject_irq(), so capture the return value and raise a warning
if there's an error there.

We also change the kvm_timer_init() function from returning an int to be
a void function, since the function always succeeds.
Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

05971120

arm/arm64: KVM: Initialize the vgic on-demand when injecting IRQs · ca7d9c82

由 Christoffer Dall 提交于 12月 09, 2014

Userspace assumes that it can wire up IRQ injections after having
created all VCPUs and after having created the VGIC, but potentially
before starting the first VCPU.  This can currently lead to lost IRQs
because the state of that IRQ injection is not stored anywhere and we
don't return an error to userspace.

We haven't seen this problem manifest itself yet, presumably because
guests reset the devices on boot, but this could cause issues with
migration and other non-standard startup configurations.
Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

ca7d9c82

13 12月, 2014 3 次提交

arm/arm64: KVM: Add (new) vgic_initialized macro · 1f57be28

由 Christoffer Dall 提交于 12月 09, 2014

Some code paths will need to check to see if the internal state of the
vgic has been initialized (such as when creating new VCPUs), so
introduce such a macro that checks the nr_cpus field which is set when
the vgic has been initialized.

Also set nr_cpus = 0 in kvm_vgic_destroy, because the error path in
vgic_init() will call this function, and code should never errornously
assume the vgic to be properly initialized after an error.
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Reviewed-by: NEric Auger <eric.auger@linaro.org>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

1f57be28

arm/arm64: KVM: Rename vgic_initialized to vgic_ready · c52edf5f

由 Christoffer Dall 提交于 12月 09, 2014

The vgic_initialized() macro currently returns the state of the
vgic->ready flag, which indicates if the vgic is ready to be used when
running a VM, not specifically if its internal state has been
initialized.

Rename the macro accordingly in preparation for a more nuanced
initialization flow.
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Reviewed-by: NEric Auger <eric.auger@linaro.org>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

c52edf5f

arm/arm64: KVM: vgic: move reset initialization into vgic_init_maps() · 6d3cfbe2

由 Peter Maydell 提交于 12月 04, 2014

VGIC initialization currently happens in three phases:
 (1) kvm_vgic_create() (triggered by userspace GIC creation)
 (2) vgic_init_maps() (triggered by userspace GIC register read/write
     requests, or from kvm_vgic_init() if not already run)
 (3) kvm_vgic_init() (triggered by first VM run)

We were doing initialization of some state to correspond with the
state of a freshly-reset GIC in kvm_vgic_init(); this is too late,
since it will overwrite changes made by userspace using the
register access APIs before the VM is run. Move this initialization
earlier, into the vgic_init_maps() phase.

This fixes a bug where QEMU could successfully restore a saved
VM state snapshot into a VM that had already been run, but could
not restore it "from cold" using the -loadvm command line option
(the symptoms being that the restored VM would run but interrupts
were ignored).

Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to
kvm_vgic_map_resources.

  [ This patch is originally written by Peter Maydell, but I have
    modified it somewhat heavily, renaming various bits and moving code
    around.  If something is broken, I am to be blamed. - Christoffer ]
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Reviewed-by: NEric Auger <eric.auger@linaro.org>
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

6d3cfbe2

04 12月, 2014 6 次提交

KVM: track pid for VCPU only on KVM_RUN ioctl · 7a72f7a1

由 Christian Borntraeger 提交于 8月 05, 2014

We currently track the pid of the task that runs the VCPU in vcpu_load.
If a yield to that VCPU is triggered while the PID of the wrong thread
is active, the wrong thread might receive a yield, but this will most
likely not help the executing thread at all.  Instead, if we only track
the pid on the KVM_RUN ioctl, there are two possibilities:

1) the thread that did a non-KVM_RUN ioctl is holding a mutex that
the VCPU thread is waiting for.  In this case, the VCPU thread is not
runnable, but we also do not do a wrong yield.

2) the thread that did a non-KVM_RUN ioctl is sleeping, or doing
something that does not block the VCPU thread.  In this case, the
VCPU thread can receive the directed yield correctly.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
CC: Rik van Riel <riel@redhat.com>
CC: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
CC: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7a72f7a1

KVM: don't check for PF_VCPU when yielding · eed6e79d

由 David Hildenbrand 提交于 11月 25, 2014

kvm_enter_guest() has to be called with preemption disabled and will
set PF_VCPU.  Current code takes PF_VCPU as a hint that the VCPU thread
is running and therefore needs no yield.

However, the check on PF_VCPU is wrong on s390, where preemption has
to stay enabled in order to correctly process page faults.  Thus,
s390 reenables preemption and starts to execute the guest.  The thread
might be scheduled out between kvm_enter_guest() and kvm_exit_guest(),
resulting in PF_VCPU being set but not being run.  When this happens,
the opportunity for directed yield is missed.

However, this check is done already in kvm_vcpu_on_spin before calling
kvm_vcpu_yield_loop:

        if (!ACCESS_ONCE(vcpu->preempted))
                continue;

so the check on PF_VCPU is superfluous in general, and this patch
removes it.
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

eed6e79d

kvm: optimize GFN to memslot lookup with large slots amount · 9c1a5d38

由 Igor Mammedov 提交于 12月 01, 2014

Current linear search doesn't scale well when
large amount of memslots is used and looked up slot
is not in the beginning memslots array.
Taking in account that memslots don't overlap, it's
possible to switch sorting order of memslots array from
'npages' to 'base_gfn' and use binary search for
memslot lookup by GFN.

As result of switching to binary search lookup times
are reduced with large amount of memslots.

Following is a table of search_memslot() cycles
during WS2008R2 guest boot.

                         boot,          boot + ~10 min
                         mostly same    of using it,
                         slot lookup    randomized lookup
                max      average        average
                cycles   cycles         cycles

13 slots      : 1450       28           30

13 slots      : 1400       30           40
binary search

117 slots     : 13000      30           460

117 slots     : 2000       35           180
binary search
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9c1a5d38

kvm: change memslot sorting rule from size to GFN · 0e60b079

由 Igor Mammedov 提交于 12月 01, 2014

it will allow to use binary search for GFN -> memslot
lookups, reducing lookup cost with large slots amount.
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0e60b079

kvm: update_memslots: drop not needed check for the same slot · 7f379cff

由 Igor Mammedov 提交于 12月 01, 2014

UP/DOWN shift loops will shift array in needed
direction and stop at place where new slot should
be placed regardless of old slot size.
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7f379cff

kvm: update_memslots: drop not needed check for the same number of pages · 5a38b6e6

由 Igor Mammedov 提交于 12月 01, 2014

if number of pages haven't changed sorting algorithm
will do nothing, so there is no need to do extra check
to avoid entering sorting logic.
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5a38b6e6

26 11月, 2014 3 次提交

kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn() · d3fccc7e

由 Ard Biesheuvel 提交于 11月 10, 2014

This reverts commit 85c8555f ("KVM: check for !is_zero_pfn() in
kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.

The problem being addressed by the patch above was that some ARM code
based the memory mapping attributes of a pfn on the return value of
kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
be mapped as device memory.

However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
and the existing non-ARM users were already using it in a way which
suggests that its name should probably have been 'kvm_is_reserved_pfn'
from the beginning, e.g., whether or not to call get_page/put_page on
it etc. This means that returning false for the zero page is a mistake
and the patch above should be reverted.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d3fccc7e

arm/arm64: KVM: vgic: Fix error code in kvm_vgic_create() · 6b50f540

由 Christoffer Dall 提交于 11月 06, 2014

If we detect another vCPU is running we just exit and return 0 as if we
succesfully created the VGIC, but the VGIC wouldn't actual be created.

This shouldn't break in-kernel behavior because the kernel will not
observe the failed the attempt to create the VGIC, but userspace could
be rightfully confused.

Cc: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6b50f540

arm/arm64: KVM: vgic: kick the specific vcpu instead of iterating through all · 016ed39c

由 Shannon Zhao 提交于 11月 19, 2014

When call kvm_vgic_inject_irq to inject interrupt, we can known which
vcpu the interrupt for by the irq_num and the cpuid. So we should just
kick this vcpu to avoid iterating through all.
Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NShannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

016ed39c

25 11月, 2014 3 次提交

arm/arm64: vgic: Remove unreachable irq_clear_pending · b1e952b4

由 Christoffer Dall 提交于 11月 24, 2014

When 'injecting' an edge-triggered interrupt with a falling edge we
shouldn't clear the pending state on the distributor.  In fact, we
don't, because the check in vgic_validate_injection would prevent us
from ever reaching this bit of code.

Remove the unreachable snippet.
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

b1e952b4

kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn() · bf4bea8e

由 Ard Biesheuvel 提交于 11月 10, 2014

This reverts commit 85c8555f ("KVM: check for !is_zero_pfn() in
kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn.

The problem being addressed by the patch above was that some ARM code
based the memory mapping attributes of a pfn on the return value of
kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should
be mapped as device memory.

However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin,
and the existing non-ARM users were already using it in a way which
suggests that its name should probably have been 'kvm_is_reserved_pfn'
from the beginning, e.g., whether or not to call get_page/put_page on
it etc. This means that returning false for the zero page is a mistake
and the patch above should be reverted.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

bf4bea8e

KVM: ARM: VGIC: Optimize the vGIC vgic_update_irq_pending function. · 7d39f9e3

由 wanghaibin 提交于 11月 17, 2014

When vgic_update_irq_pending with level-sensitive false, it is need to
deactivates an interrupt, and, it can go to out directly.
Here return a false value, because it will be not need to kick.
Signed-off-by: Nwanghaibin <wanghaibin.wang@huawei.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>

7d39f9e3

24 11月, 2014 1 次提交

kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ · c274e03a

由 Radim Krčmář 提交于 11月 21, 2014

Now that ia64 is gone, we can hide deprecated device assignment in x86.

Notable changes:
 - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl()

The easy parts were removed from generic kvm code, remaining
 - kvm_iommu_(un)map_pages() would require new code to be moved
 - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c274e03a

22 11月, 2014 1 次提交

kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/ · 6ef768fa

由 Paolo Bonzini 提交于 11月 20, 2014

ia64 does not need them anymore.  Ack notifiers become x86-specific
too.
Suggested-by: NGleb Natapov <gleb@kernel.org>
Reviewed-by: NRadim Krcmar <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6ef768fa

20 11月, 2014 1 次提交

KVM: ia64: remove · 003f7de6

由 Paolo Bonzini 提交于 11月 19, 2014

KVM for ia64 has been marked as broken not just once, but twice even,
and the last patch from the maintainer is now roughly 5 years old.
Time for it to rest in peace.
Acked-by: NGleb Natapov <gleb@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

003f7de6

17 11月, 2014 2 次提交

kvm: simplify update_memslots invocation · 5cc15027

由 Paolo Bonzini 提交于 11月 14, 2014

The update_memslots invocation is only needed in one case.  Make
the code clearer by moving it to __kvm_set_memory_region, and
removing the wrapper around insert_memslot.
Reviewed-by: NIgor Mammedov <imammedo@redhat.com>
Reviewed-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5cc15027

kvm: commonize allocation of the new memory slots · f2a81036

由 Paolo Bonzini 提交于 11月 14, 2014

The two kmemdup invocations can be unified.  I find that the new
placement of the comment makes it easier to see what happens.
Reviewed-by: NIgor Mammedov <imammedo@redhat.com>
Reviewed-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f2a81036

14 11月, 2014 2 次提交

kvm: memslots: track id_to_index changes during the insertion sort · 8593176c

由 Paolo Bonzini 提交于 11月 14, 2014

This completes the optimization from the previous patch, by
removing the KVM_MEM_SLOTS_NUM-iteration loop from insert_memslot.
Reviewed-by: NIgor Mammedov <imammedo@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8593176c

kvm: memslots: replace heap sort with an insertion sort pass · 063584d4

由 Igor Mammedov 提交于 11月 13, 2014

memslots is a sorted array.  When a slot is changed, heapsort (lib/sort.c)
would take O(n log n) time to update it; an optimized insertion sort will
only cost O(n) on an array with just one item out of order.

Replace sort() with a custom sort that takes advantage of memslots usage
pattern and the known position of the changed slot.

performance change of 128 memslots insertions with gradually increasing
size (the worst case):

      heap sort   custom sort
max:  249747      2500 cycles

with custom sort alg taking ~98% less then original
update time.
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

063584d4

03 11月, 2014 2 次提交

KVM: trivial fix comment regarding __kvm_set_memory_region · 02d5d55b

由 Dominik Dingel 提交于 10月 27, 2014

commit 72dc67a6 ("KVM: remove the usage of the mmap_sem for the protection of the memory slots.")
changed the lock which will be taken. This should be reflected in the function
commentary.
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

02d5d55b

KVM: x86: some apic broadcast modes does not work · 394457a9

由 Nadav Amit 提交于 10月 03, 2014

KVM does not deliver x2APIC broadcast messages with physical mode.  Intel SDM
(10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of
FFFF_FFFFH is used for broadcast of interrupts in both logical destination and
physical destination modes."

In addition, the local-apic enables cluster mode broadcast. As Intel SDM
10.6.2.2 says: "Broadcast to all local APICs is achieved by setting all
destination bits to one." This patch enables cluster mode broadcast.

The fix tries to combine broadcast in different modes through a unified code.

One rare case occurs when the source of IPI has its APIC disabled.  In such
case, the source can still issue IPIs, but since the source is not obliged to
have the same LAPIC mode as the enabled ones, we cannot rely on it.
Since it is a rare case, it is unoptimized and done on the slow-path.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Reviewed-by: NWanpeng Li <wanpeng.li@linux.intel.com>
[As per Radim's review, use unsigned int for X2APIC_BROADCAST, return bool from
 kvm_apic_broadcast. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

394457a9

24 10月, 2014 2 次提交

kvm: vfio: fix unregister kvm_device_ops of vfio · 571ee1b6

由 Wanpeng Li 提交于 10月 09, 2014

After commit 80ce1639 (KVM: VFIO: register kvm_device_ops dynamically),
kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29fd
(kvm-vfio: do not use module_init) move the dynamic register invoked by
kvm_init in order to fix broke unloading of the kvm module. However,
kvm_device_ops of vfio is unregistered after rmmod kvm-intel module
which lead to device type collision detection warning after kvm-intel
module reinsmod.

    WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]()
    Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel]
    CPU: 1 PID: 10358 Comm: insmod Tainted: G        W  O   3.17.0-rc1 #2
    Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
     0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9
     0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48
     ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff
    Call Trace:
     [<ffffffff814a61d9>] dump_stack+0x49/0x60
     [<ffffffff810417b7>] warn_slowpath_common+0x7c/0x96
     [<ffffffffa045bcac>] ? kvm_init+0x234/0x282 [kvm]
     [<ffffffff810417e6>] warn_slowpath_null+0x15/0x17
     [<ffffffffa045bcac>] kvm_init+0x234/0x282 [kvm]
     [<ffffffffa016e995>] vmx_init+0x1bf/0x42a [kvm_intel]
     [<ffffffffa016e7d6>] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel]
     [<ffffffff810002ab>] do_one_initcall+0xe3/0x170
     [<ffffffff811168a9>] ? __vunmap+0xad/0xb8
     [<ffffffff8109c58f>] do_init_module+0x2b/0x174
     [<ffffffff8109d414>] load_module+0x43e/0x569
     [<ffffffff8109c6d8>] ? do_init_module+0x174/0x174
     [<ffffffff8109c75a>] ? copy_module_from_user+0x39/0x82
     [<ffffffff8109b7dd>] ? module_sect_show+0x20/0x20
     [<ffffffff8109d65f>] SyS_init_module+0x54/0x81
     [<ffffffff814a9a12>] system_call_fastpath+0x16/0x1b
    ---[ end trace 0626f4a3ddea56f3 ]---

The bug can be reproduced by:

    rmmod kvm_intel.ko
    insmod kvm_intel.ko

without rmmod/insmod kvm.ko
This patch fixes the bug by unregistering kvm_device_ops of vfio when the
kvm-intel module is removed.
Reported-by: NLiu Rongrong <rongrongx.liu@intel.com>
Fixes: 3c3c29fdSigned-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

571ee1b6

kvm: fix excessive pages un-pinning in kvm_iommu_map error path. · 3d32e4db

由 Quentin Casasnovas 提交于 10月 17, 2014

The third parameter of kvm_unpin_pages() when called from
kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin
and not the page size.

This error was facilitated with an inconsistent API: kvm_pin_pages() takes
a size, but kvn_unpin_pages() takes a number of pages, so fix the problem
by matching the two.

This was introduced by commit 350b8bdd ("kvm: iommu: fix the third parameter
of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of
un-pinning for pages intended to be un-pinned (i.e. memory leak) but
unfortunately potentially aggravated the number of pages we un-pin that
should have stayed pinned. As far as I understand though, the same
practical mitigations apply.

This issue was found during review of Red Hat 6.6 patches to prepare
Ksplice rebootless updates.

Thanks to Vegard for his time on a late Friday evening to help me in
understanding this code.

Fixes: 350b8bdd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)")
Cc: stable@vger.kernel.org
Signed-off-by: NQuentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NJamie Iles <jamie.iles@oracle.com>
Reviewed-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3d32e4db

16 10月, 2014 1 次提交

arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs · 2df36a5d

由 Christoffer Dall 提交于 9月 28, 2014

The EIRSR and ELRSR registers are 32-bit registers on GICv2, and we
store these as an array of two such registers on the vgic vcpu struct.
However, we access them as a single 64-bit value or as a bitmap pointer
in the generic vgic code, which breaks BE support.

Instead, store them as u64 values on the vgic structure and do the
word-swapping in the assembly code, which already handles the byte order
for BE systems.
Tested-by: NVictor Kamensky <victor.kamensky@linaro.org>
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

2df36a5d

10 10月, 2014 1 次提交

arm/arm64: KVM: add 'writable' parameter to kvm_phys_addr_ioremap · c40f2f8f

由 Ard Biesheuvel 提交于 9月 17, 2014

Add support for read-only MMIO passthrough mappings by adding a
'writable' parameter to kvm_phys_addr_ioremap. For the moment,
mappings will be read-write even if 'writable' is false, but once
the definition of PAGE_S2_DEVICE gets changed, those mappings will
be created read-only.
Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

c40f2f8f

26 9月, 2014 2 次提交

kvm: Fix kvm_get_page_retry_io __gup retval check · bb0ca6ac

由 Andres Lagar-Cavilla 提交于 9月 25, 2014

Confusion around -EBUSY and zero (inside a BUG_ON no less).
Reported-by: NAndrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NAndres Lagar-Cavilla <andreslc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bb0ca6ac

arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset · 0fea6d76

由 Christoffer Dall 提交于 9月 25, 2014

The sgi values calculated in read_set_clear_sgi_pend_reg() and
write_set_clear_sgi_pend_reg() were horribly incorrectly multiplied by 4
with catastrophic results in that subfunctions ended up overwriting
memory not allocated for the expected purpose.

This showed up as bugs in kfree() and the kernel complaining a lot of
you turn on memory debugging.

This addresses: http://marc.info/?l=kvm&m=141164910007868&w=2Reported-by: NShannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>

0fea6d76

25 9月, 2014 1 次提交

kvm: iommu: Convert to use new iommu_capable() API function · ee5ba30f

由 Joerg Roedel 提交于 9月 05, 2014

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

ee5ba30f

24 9月, 2014 6 次提交

kvm: Add arch specific mmu notifier for page invalidation · fe71557a

由 Tang Chen 提交于 9月 24, 2014

This will be used to let the guest run while the APIC access page is
not pinned.  Because subsequent patches will fill in the function
for x86, place the (still empty) x86 implementation in the x86.c file
instead of adding an inline function in kvm_host.h.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe71557a

kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static · 445b8236

由 Tang Chen 提交于 9月 24, 2014

Different architectures need different requests, and in fact we
will use this function in architecture-specific code later. This
will be outside kvm_main.c, so make it non-static and rename it to
kvm_make_all_cpus_request().
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

445b8236

kvm: Fix page ageing bugs · 57128468

由 Andres Lagar-Cavilla 提交于 9月 22, 2014

1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.

2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.

3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.
Signed-off-by: NAndres Lagar-Cavilla <andreslc@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

57128468

kvm: don't take vcpu mutex for obviously invalid vcpu ioctls · 2ea75be3

由 David Matlack 提交于 9月 19, 2014

vcpu ioctls can hang the calling thread if issued while a vcpu is running.
However, invalid ioctls can happen when userspace tries to probe the kind
of file descriptors (e.g. isatty() calls ioctl(TCGETS)); in that case,
we know the ioctl is going to be rejected as invalid anyway and we can
fail before trying to take the vcpu mutex.

This patch does not change functionality, it just makes invalid ioctls
fail faster.

Cc: stable@vger.kernel.org
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2ea75be3

kvm: Faults which trigger IO release the mmap_sem · 234b239b

由 Andres Lagar-Cavilla 提交于 9月 17, 2014

When KVM handles a tdp fault it uses FOLL_NOWAIT. If the guest memory
has been swapped out or is behind a filemap, this will trigger async
readahead and return immediately. The rationale is that KVM will kick
back the guest with an "async page fault" and allow for some other
guest process to take over.

If async PFs are enabled the fault is retried asap from an async
workqueue. If not, it's retried immediately in the same code path. In
either case the retry will not relinquish the mmap semaphore and will
block on the IO. This is a bad thing, as other mmap semaphore users
now stall as a function of swap or filemap latency.

This patch ensures both the regular and async PF path re-enter the
fault allowing for the mmap semaphore to be relinquished in the case
of IO wait.
Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NAndres Lagar-Cavilla <andreslc@google.com>
Acked-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

234b239b

kvm-vfio: do not use module_init · 3c3c29fd

由 Paolo Bonzini 提交于 9月 24, 2014

/me got confused between the kernel and QEMU.  In the kernel, you can
only have one module_init function, and it will prevent unloading the
module unless you also have the corresponding module_exit function.

So, commit 80ce1639 (KVM: VFIO: register kvm_device_ops dynamically,
2014-09-02) broke unloading of the kvm module, by adding a module_init
function and no module_exit.

Repair it by making kvm_vfio_ops_init weak, and checking it in
kvm_init.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Alex Williamson <Alex.Williamson@redhat.com>
Fixes: 80ce1639Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3c3c29fd

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功