提交 · c9a7953f09bbe2b66050ebf97e0532eaeefbc9f3 · openanolis / cloud-kernel

11 3月, 2014 2 次提交

KVM: x86: Remove return code from enable_irq/nmi_window · c9a7953f

由 Jan Kiszka 提交于 3月 07, 2014

It's no longer possible to enter enable_irq_window in guest mode when
L1 intercepts external interrupts and we are entering L2. This is now
caught in vcpu_enter_guest. So we can remove the check from the VMX
version of enable_irq_window, thus the need to return an error code from
both enable_irq_window and enable_nmi_window.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c9a7953f

KVM: nVMX: Rework interception of IRQs and NMIs · b6b8a145

由 Jan Kiszka 提交于 3月 07, 2014

Move the check for leaving L2 on pending and intercepted IRQs or NMIs
from the *_allowed handler into a dedicated callback. Invoke this
callback at the relevant points before KVM checks if IRQs/NMIs can be
injected. The callback has the task to switch from L2 to L1 if needed
and inject the proper vmexit events.

The rework fixes L2 wakeups from HLT and provides the foundation for
preemption timer emulation.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b6b8a145

04 3月, 2014 2 次提交

x86: kvm: introduce periodic global clock updates · 332967a3

由 Andrew Jones 提交于 2月 28, 2014

commit 0061d53d introduced a mechanism to execute a global clock
update for a vm. We can apply this periodically in order to propagate
host NTP corrections. Also, if all vcpus of a vm are pinned, then
without an additional trigger, no guest NTP corrections can propagate
either, as the current trigger is only vcpu cpu migration.
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

332967a3

x86: kvm: rate-limit global clock updates · 7e44e449

由 Andrew Jones 提交于 2月 28, 2014

When we update a vcpu's local clock it may pick up an NTP correction.
We can't wait an indeterminate amount of time for other vcpus to pick
up that correction, so commit 0061d53d introduced a global clock
update. However, we can't request a global clock update on every vcpu
load either (which is what happens if the tsc is marked as unstable).
The solution is to rate-limit the global clock updates. Marcelo
calculated that we should delay the global clock updates no more
than 0.1s as follows:

Assume an NTP correction c is applied to one vcpu, but not the other,
then in n seconds the delta of the vcpu system_timestamps will be
c * n. If we assume a correction of 500ppm (worst-case), then the two
vcpus will diverge 50us in 0.1s, which is a considerable amount.
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7e44e449

28 2月, 2014 2 次提交

kvm: x86: fix emulator buffer overflow (CVE-2014-0049) · a08d3b3b

由 Andrew Honig 提交于 2月 27, 2014

The problem occurs when the guest performs a pusha with the stack
address pointing to an mmio address (or an invalid guest physical
address) to start with, but then extending into an ordinary guest
physical address. When doing repeated emulated pushes
emulator_read_write sets mmio_needed to 1 on the first one. On a
later push when the stack points to regular memory,
mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0.

As a result, KVM exits to userspace, and then returns to
complete_emulated_mmio. In complete_emulated_mmio
vcpu->mmio_cur_fragment is incremented. The termination condition of
vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved.
The code bounces back and fourth to userspace incrementing
mmio_cur_fragment past it's buffer. If the guest does nothing else it
eventually leads to a a crash on a memcpy from invalid memory address.

However if a guest code can cause the vm to be destroyed in another
vcpu with excellent timing, then kvm_clear_async_pf_completion_queue
can be used by the guest to control the data that's pointed to by the
call to cancel_work_item, which can be used to gain execution.

Fixes: f78146b0Signed-off-by: NAndrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org (3.5+)
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a08d3b3b

KVM: x86: Break kvm_for_each_vcpu loop after finding the VP_INDEX · 684851a1

由 Takuya Yoshikawa 提交于 2月 27, 2014

No need to scan the entire VCPU array.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

684851a1

26 2月, 2014 3 次提交

KVM: x86: emulator_cmpxchg_emulated should mark_page_dirty · d3714010

由 Marcelo Tosatti 提交于 2月 25, 2014

emulator_cmpxchg_emulated writes to guest memory, therefore it should
update the dirty bitmap accordingly.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d3714010

KVM: x86: Enable Intel MPX for guest · 390bd528

由 Liu, Jinsong 提交于 2月 24, 2014

From 44c2abca2c2eadc6f2f752b66de4acc8131880c4 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Mon, 24 Feb 2014 18:12:31 +0800
Subject: [PATCH v5 3/3] KVM: x86: Enable Intel MPX for guest

This patch enable Intel MPX feature to guest.
Signed-off-by: NXudong Hao <xudong.hao@intel.com>
Signed-off-by: NLiu Jinsong <jinsong.liu@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

390bd528

KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save · 0dd376e7

由 Liu, Jinsong 提交于 2月 24, 2014

From 5d5a80cd172ea6fb51786369bcc23356b1e9e956 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Mon, 24 Feb 2014 18:11:55 +0800
Subject: [PATCH v5 2/3] KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save

Add MSR_IA32_BNDCFGS to msrs_to_save, and corresponding logic
to kvm_get/set_msr().
Signed-off-by: NLiu Jinsong <jinsong.liu@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0dd376e7

22 2月, 2014 1 次提交

KVM: x86: Fix xsave cpuid exposing bug · 56c103ec

由 Liu, Jinsong 提交于 2月 21, 2014

From 00c920c96127d20d4c3bb790082700ae375c39a0 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Fri, 21 Feb 2014 23:47:18 +0800
Subject: [PATCH] KVM: x86: Fix xsave cpuid exposing bug

EBX of cpuid(0xD, 0) is dynamic per XCR0 features enable/disable.
Bit 63 of XCR0 is reserved for future expansion.
Signed-off-by: NLiu Jinsong <jinsong.liu@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

56c103ec

04 2月, 2014 1 次提交

KVM: x86: remove unused last_kernel_ns variable · 4f34d683

由 Marcelo Tosatti 提交于 1月 29, 2014

Remove unused last_kernel_ns variable.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4f34d683

30 1月, 2014 1 次提交

kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef · 5f66b620

由 Paolo Bonzini 提交于 1月 29, 2014

Self explanatory.
Reported-by: NRadim Krcmar <rkrcmar@redhat.com>
Cc: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5f66b620

27 1月, 2014 1 次提交

KVM: x86: Validate guest writes to MSR_IA32_APICBASE · 58cb628d

由 Jan Kiszka 提交于 1月 24, 2014

Check for invalid state transitions on guest-initiated updates of
MSR_IA32_APICBASE. This address both enabling of the x2APIC when it is
not supported and all invalid transitions as described in SDM section
10.12.5. It also checks that no reserved bit is set in APICBASE by the
guest.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
[Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

58cb628d

24 1月, 2014 2 次提交

KVM: x86: mark hyper-v vapic assist page as dirty · b3af1e88

由 Vadim Rozenfeld 提交于 1月 23, 2014

Signed-off-by: NVadim Rozenfeld <vrozenfe@redhat.com>
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b3af1e88

KVM: x86: mark hyper-v hypercall page as dirty · b94b64c9

由 Vadim Rozenfeld 提交于 1月 23, 2014

Signed-off-by: NVadim Rozenfeld <vrozenfe@redhat.com>
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b94b64c9

17 1月, 2014 3 次提交

KVM: SVM: Fix reading of DR6 · 73aaf249

由 Jan Kiszka 提交于 1月 04, 2014

In contrast to VMX, SVM dose not automatically transfer DR6 into the
VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor
hook to obtain the current value. And as SVM now picks the DR6 state
from its VMCB, we also need a set callback in order to write updates of
DR6 back.

Fixes a regression of 020df079.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

73aaf249

KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS · 9926c9fd

由 Jan Kiszka 提交于 1月 04, 2014

Whenever we change arch.dr7, we also have to call kvm_update_dr7. In
case guest debugging is off, this will synchronize the new state into
hardware.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9926c9fd

add support for Hyper-V reference time counter · e984097b

由 Vadim Rozenfeld 提交于 1月 16, 2014

Signed-off: Peter Lieven <pl@kamp.de>
Signed-off: Gleb Natapov
Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com>

After some consideration I decided to submit only Hyper-V reference
counters support this time. I will submit iTSC support as a separate
patch as soon as it is ready.

v1 -> v2
1. mark TSC page dirty as suggested by
    Eric Northup <digitaleric@google.com> and Gleb
2. disable local irq when calling get_kernel_ns,
    as it was done by Peter Lieven <pl@amp.de>
3. move check for TSC page enable from second patch
    to this one.

v3 -> v4
    Get rid of ref counter offset.

v4 -> v5
    replace __copy_to_user with kvm_write_guest
    when updateing iTSC page.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e984097b

16 1月, 2014 1 次提交

KVM: remove useless write to vcpu->hv_clock.tsc_timestamp · aab6d7ce

由 Paolo Bonzini 提交于 1月 15, 2014

After the previous patch from Marcelo, the comment before this write
became obsolete.  In fact, the write is unnecessary.  The calls to
kvm_write_tsc ultimately result in a master clock update as soon as
all TSCs agree and the master clock is re-enabled.  This master
clock update will rewrite tsc_timestamp.

So, together with the comment, delete the dead write too.
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

aab6d7ce

15 1月, 2014 2 次提交

KVM: x86: fix tsc catchup issue with tsc scaling · f25e656d

由 Marcelo Tosatti 提交于 1月 06, 2014

To fix a problem related to different resolution of TSC and system clock,
the offset in TSC units is approximated by

delta = vcpu->hv_clock.tsc_timestamp 	- 	vcpu->last_guest_tsc

(Guest TSC value at 			(Guest TSC value at last VM-exit)
the last kvm_guest_time_update
call)

Delta is then later scaled using mult,shift pair found in hv_clock
structure (which is correct against tsc_timestamp in that
structure).

However, if a frequency change is performed between these two points,
this delta is measured using different TSC frequencies, but scaled using
mult,shift pair for one frequency only.

The end result is an incorrect delta.

The bug which this code works around is not the only cause for
clock backwards events. The global accumulator is still
necessary, so remove the max_kernel_ns fix and rely on the
global accumulator for no clock backwards events.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f25e656d

KVM: x86: limit PIT timer frequency · 9ed96e87

由 Marcelo Tosatti 提交于 1月 06, 2014

Limit PIT timer frequency similarly to the limit applied by
LAPIC timer.

Cc: stable@kernel.org
Reviewed-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9ed96e87

13 12月, 2013 3 次提交

KVM: x86: Add comment on vcpu_enter_guest()'s return value · 9357d939

由 Takuya Yoshikawa 提交于 12月 13, 2013

Giving proper names to the 0 and 1 was once suggested. But since 0 is
returned to the userspace, giving it another name can introduce extra
confusion. This patch just explains the meanings instead.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9357d939

KVM: Use cond_resched() directly and remove useless kvm_resched() · c08ac06a

由 Takuya Yoshikawa 提交于 12月 13, 2013

Since the commit 15ad7146 ("KVM: Use the scheduler preemption notifiers
to make kvm preemptible"), the remaining stuff in this function is a
simple cond_resched() call with an extra need_resched() check which was
there to avoid dropping VCPUs unnecessarily. Now it is meaningless.
Signed-off-by: NTakuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c08ac06a

KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368) · fda4e2e8

由 Andy Honig 提交于 11月 20, 2013

In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
potential to corrupt kernel memory if userspace provides an address that
is at the end of a page.  This patches concerts those functions to use
kvm_write_guest_cached and kvm_read_guest_cached.  It also checks the
vapic_address specified by userspace during ioctl processing and returns
an error to userspace if the address is not a valid GPA.

This is generally not guest triggerable, because the required write is
done by firmware that runs before the guest.  Also, it only affects AMD
processors and oldish Intel that do not have the FlexPriority feature
(unless you disable FlexPriority, of course; then newer processors are
also affected).

Fixes: b93463aa ('KVM: Accelerated apic support')
Reported-by: NAndrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: NAndrew Honig <ahonig@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fda4e2e8

06 11月, 2013 1 次提交

kvm: optimize out smp_mb after srcu_read_unlock · 01b71917

由 Michael S. Tsirkin 提交于 11月 04, 2013

I noticed that srcu_read_lock/unlock both have a memory barrier,
so just by moving srcu_read_unlock earlier we can get rid of
one call to smp_mb() using smp_mb__after_srcu_read_unlock instead.

Unsurprisingly, the gain is small but measureable using the unit test
microbenchmark:
before
        vmcall in the ballpark of 1410 cycles
after
        vmcall in the ballpark of 1360 cycles
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

01b71917

31 10月, 2013 7 次提交

KVM: x86: fix KVM_SET_XCRS loop · c67a04cb

由 Paolo Bonzini 提交于 10月 17, 2013

The loop was always using 0 as the index.  This means that
any rubbish after the first element of the array went undetected.
It seems reasonable to assume that no KVM userspace did that.
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c67a04cb

KVM: x86: fix KVM_SET_XCRS for CPUs that do not support XSAVE · 46c34cb0

由 Paolo Bonzini 提交于 10月 17, 2013

The KVM_SET_XCRS ioctl must accept anything that KVM_GET_XCRS
could return.  XCR0's bit 0 is always 1 in real processors with
XSAVE, and KVM_GET_XCRS will always leave bit 0 set even if the
emulated processor does not have XSAVE.  So, KVM_SET_XCRS must
ignore that bit when checking for attempts to enable unsupported
save states.
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

46c34cb0

kvm: Create non-coherent DMA registeration · e0f0bbc5

由 Alex Williamson 提交于 10月 30, 2013

We currently use some ad-hoc arch variables tied to legacy KVM device
assignment to manage emulation of instructions that depend on whether
non-coherent DMA is present. Create an interface for this, adapting
legacy KVM device assignment and adding VFIO via the KVM-VFIO device.
For now we assume that non-coherent DMA is possible any time we have a
VFIO group. Eventually an interface can be developed as part of the
VFIO external user interface to query the coherency of a group.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e0f0bbc5

kvm/x86: Convert iommu_flags to iommu_noncoherent · d96eb2c6

由 Alex Williamson 提交于 10月 30, 2013

Default to operating in coherent mode.  This simplifies the logic when
we switch to a model of registering and unregistering noncoherent I/O
with KVM.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d96eb2c6

kvm, emulator: Rename VendorSpecific flag · b51e974f

由 Borislav Petkov 提交于 9月 22, 2013

Call it EmulateOnUD which is exactly what we're trying to do with
vendor-specific instructions.

Rename ->only_vendor_specific_insn to something shorter, while at it.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b51e974f

kvm, emulator: Use opcode length · 1ce19dc1

由 Borislav Petkov 提交于 9月 22, 2013

Add a field to the current emulation context which contains the
instruction opcode length. This will streamline handling of opcodes of
different length.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1ce19dc1

kvm: Add KVM_GET_EMULATED_CPUID · 9c15bb1d

由 Borislav Petkov 提交于 9月 22, 2013

Add a kvm ioctl which states which system functionality kvm emulates.
The format used is that of CPUID and we return the corresponding CPUID
bits set for which we do emulate functionality.

Make sure ->padding is being passed on clean from userspace so that we
can use it for something in the future, after the ioctl gets cast in
stone.

s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
it.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9c15bb1d

17 10月, 2013 1 次提交

kvm: Add struct kvm arg to memslot APIs · 5587027c

由 Aneesh Kumar K.V 提交于 10月 07, 2013

We will use that in the later patch to find the kvm ops handler
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

5587027c

15 10月, 2013 1 次提交

KVM: Drop FOLL_GET in GUP when doing async page fault · f2e10669

由 chai wen 提交于 10月 14, 2013

Page pinning is not mandatory in kvm async page fault processing since
after async page fault event is delivered to a guest it accesses page once
again and does its own GUP.  Drop the FOLL_GET flag in GUP in async_pf
code, and do some simplifying in check/clear processing.
Suggested-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NGu zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Nchai wen <chaiw.fnst@cn.fujitsu.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

f2e10669

03 10月, 2013 4 次提交

KVM: mmu: change useless int return types to void · 8a3c1a33

由 Paolo Bonzini 提交于 10月 02, 2013

kvm_mmu initialization is mostly filling in function pointers, there is
no way for it to fail.  Clean up unused return values.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

8a3c1a33

KVM: mmu: remove uninteresting MMU "new_cr3" callbacks · d8d173da

由 Paolo Bonzini 提交于 10月 02, 2013

The new_cr3 MMU callback has been a wrapper for mmu_free_roots since commit
e676505a (KVM: MMU: Force cr3 reload with two dimensional paging on mov
cr3 emulation, 2012-07-08).

The commit message mentioned that "mmu_free_roots() is somewhat of an overkill,
but fixing that is more complicated and will be done after this minimal fix".
One year has passed, and no one really felt the need to do a different fix.
Wrap the call with a kvm_mmu_new_cr3 function for clarity, but remove the
callback.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

d8d173da

KVM: x86: only copy XSAVE state for the supported features · 4344ee98

由 Paolo Bonzini 提交于 10月 02, 2013

This makes the interface more deterministic for userspace, which can expect
(after configuring only the features it supports) to get exactly the same
state from the kernel, independent of the host CPU and kernel version.
Suggested-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

4344ee98

KVM: x86: prevent setting unsupported XSAVE states · d7876f1b

由 Paolo Bonzini 提交于 10月 02, 2013

A guest can still attempt to save and restore XSAVE states even if they
have been masked in CPUID leaf 0Dh.  This usually is not visible to
the guest, but is still wrong: "Any attempt to set a reserved bit (as
determined by the contents of EAX and EDX after executing CPUID with
EAX=0DH, ECX= 0H) in XCR0 for a given processor will result in a #GP
exception".

The patch also performs the same checks as __kvm_set_xcr in KVM_SET_XSAVE.
This catches migration from newer to older kernel/processor before the
guest starts running.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

d7876f1b

30 9月, 2013 1 次提交

KVM: Convert kvm_lock back to non-raw spinlock · 2f303b74

由 Paolo Bonzini 提交于 9月 25, 2013

In commit e935b837 ("KVM: Convert kvm_lock to raw_spinlock"),
the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
function tries to grab the (non-raw) mmu_lock within the scope of
the raw locked kvm_lock being held.  This leads to the following:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]

Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
Call Trace:
 [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
 [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
 [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
 [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
 [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
 [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
 [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
 [<ffffffff811185bf>] kswapd+0x18f/0x490
 [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
 [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
 [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
 [<ffffffff81060d2b>] kthread+0xdb/0xe0
 [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
 [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
 [<ffffffff81060c50>] ? __init_kthread_worker+0x

After the previous patch, kvm_lock need not be a raw spinlock anymore,
so change it back.
Reported-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Cc: kvm@vger.kernel.org
Cc: gleb@redhat.com
Cc: jan.kiszka@siemens.com
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2f303b74

28 8月, 2013 1 次提交

KVM: x86: update masterclock when kvmclock_offset is calculated (v2) · 2e762ff7

由 Marcelo Tosatti 提交于 8月 27, 2013

The offset to add to the hosts monotonic time, kvmclock_offset, is
calculated against the monotonic time at KVM_SET_CLOCK ioctl time.

Request a master clock update at this time, to reduce a potentially
unbounded difference between the values of the masterclock and
the clock value used to calculate kvmclock_offset.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

2e762ff7

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功