Commits · 42057e1825cc581d46a46495e9ddc3f243f31539 · openeuler / raspberrypi-kernel

29 Sep, 2017 · 1 commit

KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume · 305d0ab4

Committed by Wanpeng Li on Sep 28, 2017

------------[ cut here ]------------
 WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
 CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0+ #17
 RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
 Call Trace:
  ? emulator_read_emulated+0x15/0x20 [kvm]
  ? segmented_read+0xae/0xf0 [kvm]
  vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
  ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
  x86_emulate_instruction+0x733/0x810 [kvm]
  vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
  ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
  kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc2

A nested #PF is triggered while L0 is emulating an instruction for L2. However,
the emulation path does not take into account that it must not break L1's
VMLAUNCH/VMRESUME. This patch fixes it by queuing the #PF exception instead,
requesting an immediate VM exit from L2 and keeping the exception for L1
pending for a subsequent nested VM exit.

This should actually work all the time, making vmx_inject_page_fault_nested
totally unnecessary.  However, that's not working yet, so this patch can work
around the issue in the meantime.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

28 Sep, 2017 · 1 commit

KVM: VMX: use cmpxchg64 · c0a1666b

Committed by Paolo Bonzini on Sep 28, 2017

This fixes a compilation failure on 32-bit systems.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

27 Sep, 2017 · 3 commits

KVM: VMX: simplify and fix vmx_vcpu_pi_load · 31afb2ea

Committed by Paolo Bonzini on Jun 06, 2017

The simplify part: do not touch pi_desc.nv, we can set it when the
VCPU is first created.  Likewise, pi_desc.sn is only handled by
vmx_vcpu_pi_load, do not touch it in __pi_post_block.

The fix part: do not check kvm_arch_has_assigned_device, instead
check the SN bit to figure out whether vmx_vcpu_pi_put ran before.
This matches what the previous patch did in pi_post_block.

Cc: Huangweidong <weidong.huang@huawei.com>
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: wangxin <wangxinxin.wang@huawei.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: avoid double list add with VT-d posted interrupts · 8b306e2f

Committed by Paolo Bonzini on Jun 06, 2017

In some cases, for example involving hot-unplug of assigned
devices, pi_post_block can forget to remove the vCPU from the
blocked_vcpu_list.  When this happens, the next call to
pi_pre_block corrupts the list.

Fix this in two ways.  First, check vcpu->pre_pcpu in pi_pre_block
and WARN instead of adding the element twice in the list.  Second,
always do the list removal in pi_post_block if vcpu->pre_pcpu is
set (not -1).

The new code keeps interrupts disabled for the whole duration of
pi_pre_block/pi_post_block.  This is not strictly necessary, but
easier to follow.  For the same reason, PI.ON is checked only
after the cmpxchg, and to handle it we just call the post-block
code.  This removes duplication of the list removal code.
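
The two guards described above can be illustrated with a minimal user-space sketch. Everything here is a hypothetical stand-in rather than the kernel code: `struct node` replaces the kernel's `list_head`, `struct vcpu_sketch` keeps only `pre_pcpu` and the list link, and returning `false` stands in for the WARN.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's list_head and vCPU state. */
struct node { struct node *prev, *next; };

struct vcpu_sketch {
	struct node blocked_link;	/* link in the per-CPU blocked list */
	int pre_pcpu;			/* -1 while not on any blocked list */
};

static void list_init(struct node *head) { head->prev = head->next = head; }

static void list_add_tail_sketch(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_sketch(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/* Refuse (the kernel would WARN) instead of adding the vCPU twice. */
static bool pre_block_sketch(struct vcpu_sketch *v, struct node *head, int cpu)
{
	if (v->pre_pcpu != -1)
		return false;
	list_add_tail_sketch(&v->blocked_link, head);
	v->pre_pcpu = cpu;
	return true;
}

/* Always unlink when pre_pcpu is set, whatever else happened meanwhile. */
static void post_block_sketch(struct vcpu_sketch *v)
{
	if (v->pre_pcpu == -1)
		return;
	list_del_sketch(&v->blocked_link);
	v->pre_pcpu = -1;
}

static size_t list_len(const struct node *head)
{
	size_t n = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}
```

Keying both operations on `pre_pcpu` makes them idempotent: a second pre-block is rejected, and a second post-block is a no-op.
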

Cc: Huangweidong <weidong.huang@huawei.com>
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: wangxin <wangxinxin.wang@huawei.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: extract __pi_post_block · cd39e117

Committed by Paolo Bonzini on Jun 06, 2017

Simple code movement patch, preparing for the next one.

Cc: Huangweidong <weidong.huang@huawei.com>
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: wangxin <wangxinxin.wang@huawei.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

26 Sep, 2017 · 1 commit

x86/fpu: Rename fpu__activate_curr() to fpu__initialize() · 2ce03d85

Committed by Ingo Molnar on Sep 23, 2017

Rename this function to better express that it's all about
initializing the FPU state of a task which goes hand in hand
with the fpu::initialized field.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-33-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

23 Sep, 2017 · 1 commit

x86/asm: Fix inline asm call constraints for Clang · f5caf621

Committed by Josh Poimboeuf on Sep 20, 2017

For inline asm statements which have a CALL instruction, we list the
stack pointer as a constraint to convince GCC to ensure the frame
pointer is set up first:

  static inline void foo()
  {
	register void *__sp asm(_ASM_SP);
	asm("call bar" : "+r" (__sp));
  }

Unfortunately, that pattern causes Clang to corrupt the stack pointer.

The fix is easy: convert the stack pointer register variable to a global
variable.

It should be noted that the end result is different based on the GCC
version.  With GCC 6.4, this patch has exactly the same result as
before:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9820389		9491555		8816046		8516940
 after	9820389		9491555		8816046		8516940

With GCC 7.2, however, GCC's behavior has changed.  It now changes its
behavior based on the conversion of the register variable to a global.
That somehow convinces it to *always* set up the frame pointer before
inserting *any* inline asm.  (Therefore, listing the variable as an
output constraint is a no-op and is no longer necessary.)  It's a bit
overkill, but the performance impact should be negligible.  And in fact,
there's a nice improvement with frame pointers disabled:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9796316		9468236		9076191		8790305
 after	9796957		9464267		9076381		8785949

So in summary, while listing the stack pointer as an output constraint
is no longer necessary for newer versions of GCC, it's still needed for
older versions.
Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

22 Sep, 2017 · 1 commit

KVM: nVMX: fix HOST_CR3/HOST_CR4 cache · 44889942

Committed by Ladi Prosek on Sep 22, 2017

For nested virt we maintain multiple VMCS that can run on a vCPU. So it is
incorrect to keep vmcs_host_cr3 and vmcs_host_cr4, whose purpose is caching
the value of the rarely changing HOST_CR3 and HOST_CR4 VMCS fields, in
vCPU-wide data structures.

Hyper-V nested on KVM runs into this consistently for me with PCID enabled.
CR3 is updated with a new value, unlikely(cr3 != vmx->host_state.vmcs_host_cr3)
fires, and the currently loaded VMCS is updated. Then we switch from L2 to
L1 and the next exit reverts CR3 to its old value.
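
A minimal sketch of why the cache has to live next to each VMCS, using a hypothetical `struct loaded_vmcs_sketch` and a counter in place of real VMWRITEs (none of these names come from the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: each loaded VMCS carries its own cached HOST_CR3. */
struct loaded_vmcs_sketch {
	uint64_t host_cr3_field;	/* stands in for the HOST_CR3 VMCS field */
	uint64_t cached_host_cr3;	/* cache kept with the VMCS it describes */
	unsigned int writes;		/* counts simulated VMWRITEs */
};

/* Write HOST_CR3 only when the value cached for *this* VMCS is stale. */
static bool sync_host_cr3(struct loaded_vmcs_sketch *vmcs, uint64_t cr3)
{
	if (cr3 == vmcs->cached_host_cr3)
		return false;
	vmcs->host_cr3_field = cr3;
	vmcs->cached_host_cr3 = cr3;
	vmcs->writes++;
	return true;
}
```

With a single vCPU-wide cache, updating one VMCS would mark the value as current for all of them, so the other VMCS would keep its old HOST_CR3. With one cache per VMCS, the comparison fails again after the switch and the second VMCS is updated too.
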

Fixes: d6e41f11 ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

19 Sep, 2017 · 3 commits

KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt · 5753743f

Committed by Haozhong Zhang on Sep 18, 2017

WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc)) in kvm_vcpu_trigger_posted_interrupt()
is intended to detect violations of the invariant that VT-d PI notification
events are not suppressed while the vcpu is in guest mode. Because the
two checks for the target vcpu mode and the target suppress field
cannot be performed atomically, the target vcpu mode may change in
between. If that does happen, WARN_ON_ONCE() here may raise false
alarms.

As the previous patch fixed the real invariant breaker, remove this
WARN_ON_ONCE() to avoid false alarms, and document the allowed cases
instead.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 28b835d6 ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

KVM: VMX: do not change SN bit in vmx_update_pi_irte() · dc91f2eb

Committed by Haozhong Zhang on Sep 18, 2017

In kvm_vcpu_trigger_posted_interrupt() and pi_pre_block(), KVM
assumes that PI notification events should not be suppressed when the
target vCPU is not blocked.

vmx_update_pi_irte() sets the SN field before changing an interrupt
from posting to remapping, but it does not check the vCPU mode.
Therefore, the change of the SN field may break the above assumption.
Besides, I don't see any reason to suppress notification events here, so
remove the changes to the SN field to avoid the race condition.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 28b835d6 ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

KVM: x86: Fix the NULL pointer parameter in check_cr_write() · d6500149

Committed by Yu Zhang on Sep 18, 2017

Routine check_cr_write() will trigger emulator_get_cpuid()->
kvm_cpuid() to get maxphyaddr, and NULL is passed as values
for ebx/ecx/edx. This is problematic because kvm_cpuid() will
dereference these pointers.
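
The usual remedy for this kind of bug is to pass dummy locals for outputs the caller does not need. A minimal sketch, assuming a hypothetical `cpuid_sketch()` that writes all four outputs unconditionally (the 36-bit width is a made-up value, and neither function is the kernel's):

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a CPUID helper that dereferences all four
 * output pointers, as the commit message says kvm_cpuid() does.
 */
static void cpuid_sketch(uint32_t *eax, uint32_t *ebx,
			 uint32_t *ecx, uint32_t *edx)
{
	uint32_t leaf = *eax;

	/* Leaf 0x80000008 reports the physical address width in EAX[7:0]. */
	*eax = (leaf == 0x80000008u) ? 36u : 0u;	/* made-up width */
	*ebx = 0;
	*ecx = 0;
	*edx = 0;
}

static unsigned int maxphyaddr_sketch(void)
{
	uint32_t eax = 0x80000008u;
	uint32_t ebx = 0, ecx = 0, edx = 0;	/* dummies instead of NULL */

	cpuid_sketch(&eax, &ebx, &ecx, &edx);
	return eax & 0xffu;
}
```
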

Fixes: d1cd3ce9 ("KVM: MMU: check guest CR3 reserved bits based on its physical address width.")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

15 Sep, 2017 · 7 commits

kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly · 4f350c6d

Committed by Jim Mattson on Sep 14, 2017

When emulating a nested VM-entry from L1 to L2, several control field
validation checks are deferred to the hardware. Should one of these
validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When
this happens, the L2 guest state is not loaded (even in part), and
execution should continue in L1 with the next instruction after the
VMLAUNCH/VMRESUME.

The VMCS12 is not modified (except for the VM-instruction error
field), the VMCS12 MSR save/load lists are not processed, and the CPU
state is not loaded from the VMCS12 host area. Moreover, the vmcs02
exit reason is stale, so it should not be consulted for any reason.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly · b060ca3b

Committed by Jim Mattson on Sep 14, 2017

On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the
VM-instruction error field of the current VMCS), the launch state of
the current VMCS is not set to "launched," and the VM-exit information
fields of the current VMCS (including IDT-vectoring information and
exit reason) are stale.

On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit
of the exit reason field), the launch state of the current VMCS is not
set to "launched," and only two of the VM-exit information fields of
the current VMCS are modified (exit reason and exit
qualification). The remaining VM-exit information fields of the
current VMCS (including IDT-vectoring information, in particular) are
stale.
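
The early/late distinction above can be sketched as a small classifier. The `vm_fail` flag and the enum are hypothetical names for illustration; only the use of the exit reason's high bit (bit 31) to mark a late failure comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of the exit reason marks a late VM-entry failure. */
#define EXIT_REASON_FAILED_VMENTRY_BIT 0x80000000u

enum entry_result { ENTRY_OK, ENTRY_FAILED_EARLY, ENTRY_FAILED_LATE };

/*
 * `vm_fail` is a hypothetical flag meaning the instruction itself reported
 * failure (early case). The exit reason is stale then, so it is not read.
 */
static enum entry_result classify_entry(bool vm_fail, uint32_t exit_reason)
{
	if (vm_fail)
		return ENTRY_FAILED_EARLY;
	if (exit_reason & EXIT_REASON_FAILED_VMENTRY_BIT)
		return ENTRY_FAILED_LATE;
	return ENTRY_OK;
}

/* IDT-vectoring information is fresh only after a real VM exit. */
static bool idt_vectoring_info_valid(enum entry_result r)
{
	return r == ENTRY_OK;
}
```

Checking `vm_fail` first matters: in the early case the exit reason may still hold a value from a previous exit.
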
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry · 7881f96c

Committed by Jim Mattson on Sep 14, 2017

After a successful VM-entry, RFLAGS is cleared, with the exception of
bit 1, which is always set. This is handled by load_vmcs12_host_state.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm,lapic: Justify use of swait_active() · cc1b4680

Committed by Davidlohr Bueso on Sep 13, 2017

A comment might serve future readers.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: Do not BUG() on out-of-bounds guest IRQ · 3a8b0677

Committed by Jan H. Schönherr on Sep 07, 2017

The value of the guest_irq argument to vmx_update_pi_irte() is
ultimately coming from a KVM_IRQFD API call. Do not BUG() in
vmx_update_pi_irte() if the value is out-of-bounds. (Especially
since KVM as a whole seems to hang after that.)

Instead, print a message only once if we find that we don't have a
route for a certain IRQ (which can be out-of-bounds or within the
array).
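
A minimal sketch of the "check and warn once instead of BUG()" pattern, assuming a hypothetical boolean route map in place of the kernel's routing table; the message text and the `warned` flag are illustrative only:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool warned;	/* models "print a message only once" */

/*
 * Hypothetical lookup: `map[i]` says whether IRQ i has a route. The bounds
 * check runs first, so an out-of-range index never reaches the array.
 */
static bool irq_has_route(const bool *map, size_t nr_entries,
			  unsigned int guest_irq)
{
	if (guest_irq >= nr_entries || !map[guest_irq]) {
		if (!warned) {
			warned = true;
			fprintf(stderr, "no route for guest_irq %u/%zu\n",
				guest_irq, nr_entries);
		}
		return false;
	}
	return true;
}
```

Because the value ultimately comes from user space, a bad index is treated as an ordinary error the caller can back out of, not as a kernel invariant violation.
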

This fixes CVE-2017-1000252.

Fixes: efc64404 ("KVM: x86: Update IRTE for posted-interrupts")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

kvm: nVMX: Don't allow L2 to access the hardware CR8 · 51aa68e7

Committed by Jim Mattson on Sep 12, 2017

If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.
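
The rule can be sketched as a small fix-up on the primary processor-based VM-execution controls. The bit positions follow the Intel SDM; the function name and the way the controls are passed in are assumptions for illustration, not the patch itself.

```c
#include <stdint.h>

/* Primary processor-based VM-execution control bits (Intel SDM numbering). */
#define CPU_BASED_CR8_LOAD_EXITING	(1u << 19)
#define CPU_BASED_CR8_STORE_EXITING	(1u << 20)
#define CPU_BASED_TPR_SHADOW		(1u << 21)

/*
 * Hypothetical helper: given the controls about to be written to vmcs02 and
 * the controls L1 put in vmcs12, force CR8 exits unless L1 uses a TPR shadow.
 */
static uint32_t fixup_cr8_controls(uint32_t vmcs02_ctl, uint32_t vmcs12_ctl)
{
	if (!(vmcs12_ctl & CPU_BASED_TPR_SHADOW))
		vmcs02_ctl |= CPU_BASED_CR8_LOAD_EXITING |
			      CPU_BASED_CR8_STORE_EXITING;
	return vmcs02_ctl;
}
```
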

This fixes CVE-2017-12154.
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously · 9a6e7c39

Committed by Wanpeng Li on Sep 14, 2017

qemu-system-x86-8600 [004] d..1 7205.687530: kvm_entry: vcpu 2
qemu-system-x86-8600 [004] .... 7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e
qemu-system-x86-8600 [004] .... 7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0
qemu-system-x86-8600 [004] .... 7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
qemu-system-x86-8600 [004] .N.. 7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
kworker/4:2-7814 [004] .... 7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
qemu-system-x86-8600 [004] .... 7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
qemu-system-x86-8600 [004] d..1 7205.687711: kvm_entry: vcpu 2

While running a memory-intensive workload in the guest, I caught the kworker
completing the GUP so quickly that it queues a "Page Ready" #PF exception
after the "Page not Present" exception but before the next vmentry, as shown
in the trace above. This results in a #DF being injected into the guest.

This patch fixes it by clearing the queue for "Page not Present" if "Page Ready"
occurs before the next vmentry since the GUP has already got the required page
and shadow page table has already been fixed by "Page Ready" handler.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Fixes: 7c90705b ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.")
[Changed indentation and added clearing of injected. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

14 Sep, 2017 · 4 commits

KVM: X86: Don't block vCPU if there is pending exception · a5f01f8e

Committed by Wanpeng Li on Sep 13, 2017

Don't block the vCPU if there is a pending exception.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

KVM: SVM: Add irqchip_split() checks before enabling AVIC · 67034bb9

Committed by Suravee Suthikulpanit on Sep 12, 2017

SVM AVIC hardware accelerates guest writes to the APIC_EOI register
(for edge-triggered interrupts), which means they do not trap to KVM.

So, enable SVM AVIC only in split irqchip mode
(e.g. launching qemu w/ option '-machine kernel_irqchip=split').
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Fixes: 44a95dae ("KVM: x86: Detect and Initialize AVIC support")
[Removed pr_debug - Radim.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv() · b2a05fef

Committed by Suravee Suthikulpanit on Sep 12, 2017

Modify struct kvm_x86_ops.arch.apicv_active() to take struct kvm_vcpu
pointer as parameter in preparation to subsequent changes.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu() · dfa20099

Committed by Suravee Suthikulpanit on Sep 12, 2017

Preparing the base code for subsequent changes. This does not change
existing logic.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

13 Sep, 2017 · 5 commits

KVM: x86: fix clang build · 51537233

Committed by Radim Krčmář on Sep 13, 2017

Clang resolves __builtin_constant_p() to false even if the expression is
constant in the end.  The only purpose of that expression was to
differentiate a case where the following expression couldn't be checked
at compile-time, so we can just remove the check.

Clang handles the following two correctly.  Turn it into BUG_ON if there
are any more problems with this.

Fixes: d6321d49 ("KVM: x86: generalize guest_cpuid_has_ helpers")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

KVM: x86: Fix immediate_exit handling for uninitialized AP · 2f173d26

Committed by Jan H. Schönherr on Sep 06, 2017

When user space sets kvm_run->immediate_exit, KVM is supposed to
return quickly. However, when a vCPU is in KVM_MP_STATE_UNINITIALIZED,
the value is not considered and the vCPU blocks.

Fix that oversight.

Fixes: 460df4c1 ("KVM: race-free exit from KVM_RUN without POSIX signals")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

KVM: x86: Fix handling of pending signal on uninitialized AP · a0595000

Committed by Jan H. Schönherr on Sep 06, 2017

KVM API says that KVM_RUN will return with -EINTR when a signal is
pending. However, if a vCPU is in KVM_MP_STATE_UNINITIALIZED, then
the return value is unconditionally -EAGAIN.

Copy over some code from vcpu_run(), so that the case of a pending
signal results in the expected return value.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

KVM: SVM: Add a missing 'break' statement · 49a8afca

Committed by Jan H. Schönherr on Sep 05, 2017

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Fixes: f6511935 ("KVM: SVM: Add checks for IO instructions")
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

x86/paravirt: Remove no longer used paravirt functions · 87930019

Committed by Juergen Gross on Sep 04, 2017

With removal of lguest some of the paravirt functions are no longer
needed:

	->read_cr4()
	->store_idt()
	->set_pmd_at()
	->set_pud_at()
	->pte_update()

Remove them.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

01 Sep, 2017 · 1 commit

KVM: update to new mmu_notifier semantic v2 · fb1522e0

Committed by Jérôme Glisse on Aug 31, 2017

Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end().

Remove now useless invalidate_page callback.

Changed since v1 (Linus Torvalds)
    - remove now useless kvm_arch_mmu_notifier_invalidate_page()
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

29 Aug, 2017 · 1 commit

x86/idt: Unify gate_struct handling for 32/64-bit kernels · 64b163fa

Committed by Thomas Gleixner on Aug 28, 2017

The first 32 bits of gate struct are the same for 32 and 64 bit kernels.

The 32-bit version uses desc_struct and no designated data structure,
so we need different accessors for 32 and 64 bit kernels.

Aside from that, the macros which are necessary to build the 32-bit
gate descriptor are horrible to read.

Unify the gate structs and switch all code fiddling with it over.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.861974317@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

26 Aug, 2017 · 1 commit

kvm/x86: Avoid clearing the C-bit in rsvd_bits() · ea2800dd

Committed by Brijesh Singh on Aug 25, 2017

The following commit:

  d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")

uses __sme_clr() to remove the C-bit in rsvd_bits(). rsvd_bits() is
just a simple function to return some 1 bits. Applying a mask based
on properties of the host MMU is incorrect. Additionally, the masks
computed by __reset_rsvds_bits_mask also apply to guest page tables,
where the C bit is reserved since we don't emulate SME.

The fix is to clear the C-bit from rsvd_bits_mask array after it has been
populated from __reset_rsvds_bits_mask()
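
A minimal sketch of the split described above: `rsvd_bits()` stays a pure "bits s..e" helper, and the encryption bit is cleared from the mask array in a separate step. `clear_c_bit()` and the C-bit position used in the usage example (bit 47) are assumptions for illustration.

```c
#include <stdint.h>

/* Return a mask with bits s..e (inclusive) set; assumes 0 <= s <= e <= 62. */
static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* Hypothetical fix-up: drop the C-bit from masks that were already built. */
static void clear_c_bit(uint64_t *masks, int n, uint64_t c_bit_mask)
{
	for (int i = 0; i < n; i++)
		masks[i] &= ~c_bit_mask;
}
```

For example, after building `masks[0] = rsvd_bits(40, 51)`, a call to `clear_c_bit(masks, 1, 1ULL << 47)` removes only bit 47 from that entry, while `rsvd_bits()` itself keeps no knowledge of the host MMU.
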
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: paolo.bonzini@gmail.com
Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Link: http://lkml.kernel.org/r/20170825205540.123531-1-brijesh.singh@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

25 Aug, 2017 · 10 commits

kvm: nVMX: Validate the virtual-APIC address on nested VM-entry · 712b12d7

Committed by Jim Mattson on Aug 24, 2017

According to the SDM, if the "use TPR shadow" VM-execution control is
1, bits 11:0 of the virtual-APIC address must be 0 and the address
should not set any bits beyond the processor's physical-address width.
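
The check amounts to "page-aligned and within the physical-address width". A minimal sketch, with a hypothetical function name and the width passed in as a plain parameter:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical validator for the virtual-APIC address: bits 11:0 must be
 * zero and no bit at or above `maxphyaddr` may be set.
 */
static bool virtual_apic_addr_valid(uint64_t addr, unsigned int maxphyaddr)
{
	if (addr & 0xfffULL)
		return false;
	/* Guard the shift: shifting a 64-bit value by 64 is undefined in C. */
	if (maxphyaddr < 64 && (addr >> maxphyaddr) != 0)
		return false;
	return true;
}
```
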
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state · 38cfd5e3

Committed by Paolo Bonzini on Aug 23, 2017

The host pkru is restored right after vcpu exit (commit 1be0e61c), so
KVM_GET_XSAVE will return the host PKRU value instead.  Fix this by
using the guest PKRU explicitly in fill_xsave and load_xsave.  This
part is based on a patch by Junkang Fu.

The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state,
because it could have been changed by userspace since the last time
it was saved, so skip loading it in kvm_load_guest_fpu.
Reported-by: Junkang Fu <junkang.fjk@alibaba-inc.com>
Cc: Yang Zhang <zy107165@alibaba-inc.com>
Fixes: 1be0e61c
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: simplify handling of PKRU · b9dd21e1

Committed by Paolo Bonzini on Aug 23, 2017

Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a
simple comparison against the host value of the register.  The write of
PKRU in addition can be skipped if the guest has not enabled the feature.
Once we do this, we need not test OSPKE in the host anymore, because
guest_CR4.PKE=1 implies host_CR4.PKE=1.

The static PKU test is kept to elide the code on older CPUs.
Suggested-by: Yang Zhang <zy107165@alibaba-inc.com>
Fixes: 1be0e61c
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: block guest protection keys unless the host has them enabled · c469268c

Committed by Paolo Bonzini on Aug 24, 2017

If the host has protection keys disabled, we cannot read and write the
guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0.  Block
the PKU cpuid bit in that case.

This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1.

Fixes: 1be0e61c
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: Fix trying to cancel vmlauch/vmresume · bfcf83b1

Committed by Wanpeng Li on Aug 24, 2017

------------[ cut here ]------------
WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc4+ #11
RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
Call Trace:
 ? kvm_multiple_exception+0x149/0x170 [kvm]
 ? handle_emulation_failure+0x79/0x230 [kvm]
 ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel]
 ? check_chain_key+0x137/0x1e0
 ? reexecute_instruction.part.168+0x130/0x130 [kvm]
 nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
 ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
 vmx_queue_exception+0x197/0x300 [kvm_intel]
 kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm]
 ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm]
 ? preempt_count_sub+0x18/0xc0
 ? restart_apic_timer+0x17d/0x300 [kvm]
 ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm]
 ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm]
 kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
 ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
 ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm]

The flag "nested_run_pending" can override the decision of which should run
next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not L1. This
is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to
be run (and perhaps be injected with an event it specified, etc.). nested_run_pending
is especially intended to avoid switching to L1 at the injection decision point.

This can be handled just like the other cases in vmx_check_nested_events, instead of
having a special case in vmx_queue_exception.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: X86: Fix loss of exception which has not yet been injected · 664f8e26

Committed by Wanpeng Li on Aug 24, 2017

vmx_complete_interrupts() assumes that the exception is always injected,
so it can be dropped by kvm_clear_exception_queue().  However,
an exception cannot be injected immediately if it is: 1) originally
destined to a nested guest; 2) trapped to cause a vmexit; 3) happening
right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true.

This patch applies to exceptions the same algorithm that is used for
NMIs, replacing exception.reinject with "exception.injected" (equivalent
to nmi_injected).

exception.pending now represents an exception that is queued and whose
side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet.
If exception.pending is true, the exception might result in a nested
vmexit instead, too (in which case the side effects must not be applied).

exception.injected instead represents an exception that is going to be
injected into the guest at the next vmentry.
Reported-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: use kvm_event_needs_reinjection · 274bba52

Committed by Wanpeng Li on Aug 24, 2017

Use kvm_event_needs_reinjection() encapsulation.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: speedup update_permission_bitmask · 09f037aa

Committed by Paolo Bonzini on Aug 24, 2017

update_permission_bitmask currently does a 128-iteration loop to,
essentially, compute a constant array.  Computing the 8 bits in parallel
reduces it to 16 iterations, and is enough to speed it up substantially
because many boolean operations in the inner loop become constants or
simplify noticeably.

Because update_permission_bitmask is actually the top item in the profile
for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
clock cycles, or up to 30%:

                                         before     after
   cpuid                                 35173      25954
   vmcall                                35122      27079
   inl_from_pmtimer                      52635      42675
   inl_from_qemu                         53604      44599
   inl_from_kernel                       38498      30798
   outl_to_kernel                        34508      28816
   wr_tsc_adjust_msr                     34185      26818
   rd_tsc_adjust_msr                     37409      27049
   mmio-no-eventfd:pci-mem               50563      45276
   mmio-wildcard-eventfd:pci-mem         34495      30823
   mmio-datamatch-eventfd:pci-mem        35612      31071
   portio-no-eventfd:pci-io              44925      40661
   portio-wildcard-eventfd:pci-io        29708      27269
   portio-datamatch-eventfd:pci-io       31135      27164

(I wrote a small C program to compare the tables for all values of CR0.WP,
CR4.SMAP and CR4.SMEP, and they match).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: Expose the LA57 feature to VM. · fd8cb433

Committed by Yu Zhang on Aug 24, 2017

This patch exposes the 5-level page table feature to the VM.
At the same time, the canonical virtual address checking is
extended to support both 48-bit and 57-bit address widths.
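
A canonical-address check parameterised by the virtual address width can be sketched as follows. The function name is hypothetical; `vaddr_bits` is expected to be 48 or 57. An address is canonical when the top implemented bit and every bit above it are all equal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumes 1 <= vaddr_bits <= 64; in practice 48 (4-level) or 57 (LA57). */
static bool is_canonical(uint64_t la, unsigned int vaddr_bits)
{
	/* The top implemented bit together with every bit above it. */
	uint64_t upper = la >> (vaddr_bits - 1);
	uint64_t all_ones = UINT64_MAX >> (vaddr_bits - 1);

	return upper == 0 || upper == all_ones;
}
```

Under these definitions, `0x0000800000000000` is rejected for a 48-bit width but accepted for a 57-bit width, which is why the width has to become a parameter once LA57 is exposed.
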
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: MMU: Add 5 level EPT & Shadow page table support. · 855feb67

Committed by Yu Zhang on Aug 24, 2017

Extends the shadow paging code, so that 5 level shadow page
table can be constructed if VM is running in 5 level paging
mode.

Also extends the ept code, so that 5 level ept table can be
constructed if maxphysaddr of VM exceeds 48 bits. Unlike the
shadow logic, KVM should still use 4 level ept table for a VM
whose physical address width is less than 48 bits, even when
the VM is running in 5 level paging mode.
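
The EPT depth rule in the second paragraph can be sketched as a single decision. All names are hypothetical; the `cpu_has_ept_5levels` parameter is an added assumption that the host must also support 5-level EPT. The guest's own paging mode is passed in only to show that it does not affect the result.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: the EPT depth depends on the guest's physical
 * address width, not on whether the guest itself uses 5-level paging.
 */
static int ept_level_sketch(bool cpu_has_ept_5levels,
			    unsigned int guest_maxphyaddr,
			    bool guest_uses_la57)
{
	(void)guest_uses_la57;	/* deliberately ignored */

	if (cpu_has_ept_5levels && guest_maxphyaddr > 48)
		return 5;
	return 4;
}
```
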
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
[Unconditionally reset the MMU context in kvm_cpuid_update.
 Changing MAXPHYADDR invalidates the reserved bit bitmasks.
 - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
