Commits · 75933433d666c2ab13a7a93f4ec1e6f000a94ffc · openeuler / Kernel

11 Sep, 2015 · 1 commit

kexec: split kexec_load syscall from kexec core code · 2965faa5

Committed by Dave Young on Sep 09, 2015

There are two kexec load syscalls, kexec_load and kexec_file_load.
kexec_file_load has already been split out as kernel/kexec_file.c.  In this
patch I split the kexec_load syscall code out to kernel/kexec.c.

Also add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice versa.

The original requirement is from Ted Ts'o: he wants the kexec kernel
signature to be checked when CONFIG_KEXEC_VERIFY_SIG is enabled.  But
kexec-tools, by using the kexec_load syscall, can bypass the check.

Vivek Goyal proposed creating a common kconfig option so that users can
compile in only one syscall for loading a kexec kernel.  KEXEC/KEXEC_FILE
select KEXEC_CORE so that old config files still work.

Because there is general code that needs CONFIG_KEXEC_CORE, I updated all
the architecture Kconfig files with a new option KEXEC_CORE, and let KEXEC
select KEXEC_CORE in the arch Kconfig.  Also updated the general kernel code
for the split from the kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2965faa5

09 Sep, 2015 · 1 commit

mm: rename alloc_pages_exact_node() to __alloc_pages_node() · 96db800f

Committed by Vlastimil Babka on Sep 08, 2015

alloc_pages_exact_node() was introduced in commit 6484eb3e ("page
allocator: do not check NUMA node ID when the caller knows the node is
valid") as an optimized variant of alloc_pages_node() that doesn't
fall back to the current node for nid == NUMA_NO_NODE.  Unfortunately the
name of the function can easily suggest that the allocation is
restricted to the given node and fails otherwise.  In truth, the node is
only preferred, unless __GFP_THISNODE is passed among the gfp flags.

The misleading name has led to mistakes in the past, see for example
commits 5265047a ("mm, thp: really limit transparent hugepage
allocation to local node") and b360edb4 ("mm, mempolicy:
migrate_to_node should only migrate to node").

Another issue with the name is that there's a family of
alloc_pages_exact*() functions where 'exact' means exact size (instead
of page order), which leads to more confusion.

To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that
it's an optimized variant of alloc_pages_node() not intended for general
usage.  Both functions get described in comments.

Providing a real convenience function for allocations restricted to a
node has also been considered, but the majority opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't
duplicate the API needlessly.  The number of users would be small
anyway.

Existing callers of alloc_pages_exact_node() are simply converted to
call __alloc_pages_node(), with the exception of sba_alloc_coherent()
which open-codes the check for NUMA_NO_NODE, so it is converted to use
alloc_pages_node() instead.  This means it no longer performs some
VM_BUG_ON checks, and since the current check for nid in
alloc_pages_node() uses a 'nid < 0' comparison (which includes
NUMA_NO_NODE), it may hide wrong values which would be previously
exposed.

Both differences will be rectified by the next patch.

To sum up, this patch makes no functional changes, except temporarily
hiding potentially buggy callers.  Restricting the checks in
alloc_pages_node() is left for the next patch which can in turn expose
more existing buggy callers.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Robin Holt <robinmholt@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cliff Whickman <cpw@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

96db800f
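The node-selection difference described in this commit can be modeled in a few lines of user-space C. This is a sketch, not kernel code: model_raw_alloc_pages_node(), model_alloc_pages_node() and current_node are hypothetical stand-ins that allocate nothing and only return the preferred node; only NUMA_NO_NODE being -1 matches the kernel. It also shows why an 'nid < 0' check can hide bad values: any negative nid silently becomes the current node.

```c
/* The kernel defines NUMA_NO_NODE as -1. */
#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for numa_node_id(): the node of the running CPU. */
static int current_node = 0;

/*
 * Models __alloc_pages_node(): the caller guarantees nid is valid, so no
 * fallback happens. The node is only preferred unless __GFP_THISNODE is
 * also passed (not modeled here). Returns the preferred node.
 */
static int model_raw_alloc_pages_node(int nid)
{
	return nid;
}

/*
 * Models alloc_pages_node() as described above: any nid < 0, which
 * includes NUMA_NO_NODE, falls back to the current node.
 */
static int model_alloc_pages_node(int nid)
{
	if (nid < 0)
		nid = current_node;
	return model_raw_alloc_pages_node(nid);
}
```

With current_node set to 2, model_alloc_pages_node(NUMA_NO_NODE) and model_alloc_pages_node(-5) both yield node 2, which is exactly the "may hide wrong values" effect the commit mentions.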

06 Sep, 2015 · 3 commits

Silence compiler warning in arch/x86/kvm/emulate.c · e8dd2d2d

Committed by Valdis Kletnieks on Aug 29, 2015

Compiler warning:

 CC [M]  arch/x86/kvm/emulate.o
arch/x86/kvm/emulate.c: In function "__do_insn_fetch_bytes":
arch/x86/kvm/emulate.c:814:9: warning: "linear" may be used uninitialized in this function [-Wmaybe-uninitialized]

GCC is smart enough to realize that the inlined __linearize may return before
setting the value of linear, but not smart enough to realize the same
X86EMU_CONTINUE blocks actual use of the value.  However, the value of
'linear' can only be set to one value, so hoisting the one line of code
upwards makes GCC happy with the code.
Reported-by: Aruna Hewapathirane <aruna.hewapathirane@gmail.com>
Tested-by: Aruna Hewapathirane <aruna.hewapathirane@gmail.com>
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e8dd2d2d
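The pattern behind this fix can be sketched in plain C. Everything below (linearize(), fetch(), the EMU_* codes) is a hypothetical stand-in for the emulator, not the actual emulate.c source. The point is that the callee writes its output before any early return, so every path leaves the value initialized and the compiler has nothing to warn about.

```c
#include <stdint.h>

/* Hypothetical status codes, modeled loosely on X86EMU_CONTINUE. */
enum emu_status { EMU_CONTINUE = 0, EMU_PROPAGATE_FAULT = 1 };

/* Hypothetical linearization: fails if offset exceeds the limit. */
static enum emu_status linearize(uint64_t base, uint64_t offset,
                                 uint64_t limit, uint64_t *linear)
{
	/*
	 * The single possible value is assigned before the early return,
	 * mirroring the "hoist the one line upwards" fix.
	 */
	*linear = base + offset;
	if (offset > limit)
		return EMU_PROPAGATE_FAULT;
	return EMU_CONTINUE;
}

static enum emu_status fetch(uint64_t base, uint64_t offset, uint64_t limit,
                             uint64_t *out)
{
	uint64_t linear;
	enum emu_status rc = linearize(base, offset, limit, &linear);

	/* Only a successful linearization lets the value be used. */
	if (rc != EMU_CONTINUE)
		return rc;
	*out = linear;
	return EMU_CONTINUE;
}
```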

kvm: compile process_smi_save_seg_64() only for x86_64 · efbb288a

Committed by Alexander Kuleshov on Sep 06, 2015

The process_smi_save_seg_64() function is called only from
process_smi_save_state_64(), and only if CONFIG_X86_64 is set. This
patch adds #ifdef CONFIG_X86_64 around process_smi_save_seg_64()
to prevent the following warning message:

arch/x86/kvm/x86.c:5946:13: warning: ‘process_smi_save_seg_64’ defined but not used [-Wunused-function]
 static void process_smi_save_seg_64(struct kvm_vcpu *vcpu, char *buf, int n)
             ^
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

efbb288a

KVM: x86: avoid uninitialized variable warning · 29ecd660

Committed by Paolo Bonzini on Sep 06, 2015

This does not show up on all compiler versions, so it sneaked into the
first 4.3 pull request.  The fix is to mimic the logic of the "print
sptes" loop in the "fill array" loop.  Then leaf and root can both be
initialized unconditionally.

Note that "leaf" now points to the first unused element of the array,
not the last filled element.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

29ecd660

15 Aug, 2015 · 1 commit

x86/kvm: Rename VMX's segment access rights defines · 4d283ec9

Committed by Andy Lutomirski on Aug 13, 2015

VMX encodes access rights differently from LAR, and the latter is
most likely what x86 people think of when they think of "access
rights".

Rename them to avoid confusion.

Cc: kvm@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4d283ec9

11 Aug, 2015 · 2 commits

KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn · b6bb424b

Committed by Wei Huang on Aug 07, 2015

According to the AMD programmer's manual, AMD PERFCTRn is a 64-bit MSR
which, unlike Intel perf counters, doesn't require sign extension. This
patch removes the unnecessary conversion in the SVM vPMU code when
PERFCTRn is being updated.
Signed-off-by: Wei Huang <wei@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b6bb424b
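The difference between the two write semantics can be shown with plain integer conversions. This is a sketch, not the SVM vPMU code: the function names are hypothetical, and the Intel behavior assumed here is the SDM rule for writes to the legacy IA32_PMCx MSRs (bits 31:0 are written and sign-extended from bit 31).

```c
#include <stdint.h>

/*
 * Intel-style write to a legacy counter MSR (assumption, see above):
 * only bits 31:0 are taken and sign-extended from bit 31.
 */
static uint64_t write_sign_extended(uint64_t data)
{
	return (uint64_t)(int64_t)(int32_t)(uint32_t)data;
}

/*
 * AMD PERFCTRn-style write as described in the commit: the 64-bit value
 * is stored unchanged, with no sign extension.
 */
static uint64_t write_unextended(uint64_t data)
{
	return data;
}
```

For example, writing 0x80000000 yields 0xffffffff80000000 under the first rule but stays 0x80000000 under the second, which is why applying the Intel conversion to PERFCTRn was wrong.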

kvm: x86: Fix error handling in the function kvm_lapic_sync_from_vapic · 603242a8

Committed by Nicholas Krause on Aug 05, 2015

This fixes error handling in kvm_lapic_sync_from_vapic by checking
whether the call to kvm_read_guest_cached has returned an error code,
which signals that the read has failed. In that case we must return
immediately to the caller of kvm_lapic_sync_from_vapic, to avoid
incorrectly calling apic_set_tpr after an error has occurred.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

603242a8
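The early-return pattern can be sketched as follows. read_guest_u32(), set_tpr() and sync_from_vapic() are hypothetical stand-ins for kvm_read_guest_cached(), apic_set_tpr() and kvm_lapic_sync_from_vapic(); the `fail` flag only simulates a failed guest read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest read: 0 on success, negative error code on failure. */
static int read_guest_u32(const uint32_t *guest_mem, bool fail, uint32_t *out)
{
	if (fail)
		return -14; /* -EFAULT */
	*out = *guest_mem;
	return 0;
}

/* Hypothetical TPR register and setter. */
static uint32_t tpr;

static void set_tpr(uint32_t v)
{
	tpr = v & 0xff;
}

static void sync_from_vapic(const uint32_t *guest_mem, bool fail)
{
	uint32_t data;

	/* On a failed read, return before touching the TPR. */
	if (read_guest_u32(guest_mem, fail, &data))
		return;
	set_tpr(data);
}
```

Without the check, a failed read would feed an uninitialized `data` into set_tpr().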

07 Aug, 2015 · 2 commits

KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUST · d7add054

Committed by Haozhong Zhang on Aug 07, 2015

When kvm_set_msr_common() handles a guest's write to
MSR_IA32_TSC_ADJUST, it calculates an adjustment based on the data
written by the guest and then uses it to adjust the TSC offset by calling
the adjust_tsc_offset() callback. The third parameter of
adjust_tsc_offset() indicates whether the adjustment is in host TSC
cycles or in guest TSC cycles. If SVM TSC scaling is enabled,
adjust_tsc_offset() [i.e. svm_adjust_tsc_offset()] will first scale the
adjustment; otherwise, it will just use the unscaled one. As the MSR
write here comes from the guest, the adjustment is in guest TSC cycles.
However, the current kvm_set_msr_common() uses it as a value in host TSC
cycles (by passing true as the third parameter of adjust_tsc_offset()),
which can result in an incorrect adjustment of the TSC offset if SVM TSC
scaling is enabled. This patch fixes this problem.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: stable@vger.linux.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d7add054
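A simplified model of why the third parameter matters. It assumes, as the commit describes for SVM, that the TSC offset is kept in guest cycles and that only a host-cycle adjustment gets scaled. The struct tsc_ratio fraction and both function names are hypothetical; real hardware uses a fixed-point multiplier.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ratio guest_freq / host_freq, kept as a fraction. */
struct tsc_ratio {
	uint64_t num, den;
};

static int64_t scale_host_to_guest(int64_t host_cycles, struct tsc_ratio r)
{
	return host_cycles * (int64_t)r.num / (int64_t)r.den;
}

/*
 * Models an adjust_tsc_offset() callback: the offset is in guest cycles,
 * so only an adjustment expressed in host cycles needs scaling.
 */
static int64_t adjust_tsc_offset(int64_t offset, int64_t adjustment,
                                 bool host, struct tsc_ratio r)
{
	if (host)
		adjustment = scale_host_to_guest(adjustment, r);
	return offset + adjustment;
}
```

With a 2:1 ratio, a guest-written adjustment of 100 must move the offset by 100 (host = false); passing true, as the old code did, would wrongly move it by 200.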

KVM: x86: zero IDT limit on entry to SMM · 18c3626e

Committed by Paolo Bonzini on Aug 07, 2015

The recent BlackHat 2015 presentation "The Memory Sinkhole"
mentions that the IDT limit is zeroed on entry to SMM.

This is not documented, and must have changed some time after 2010
(see http://www.ssi.gouv.fr/uploads/IMG/pdf/IT_Defense_2010_final.pdf).
KVM was not doing it, but the fix is easy.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

18c3626e

05 Aug, 2015 · 11 commits

KVM: VMX: drop ept misconfig check · f735d4af

Committed by Xiao Guangrong on Aug 05, 2015

The logic used to check for EPT misconfiguration is completely contained
in the common reserved-bits check for sptes, so it can be removed.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f735d4af

KVM: MMU: fully check zero bits for sptes · 47ab8751

Committed by Xiao Guangrong on Aug 05, 2015

A #PF with PFEC.RSV = 1 is designed to speed up MMIO emulation; however,
it is possible that an RSV #PF is caused by a real bug, namely
misconfigured shadow page table entries.

This patch enables a full check of the zero bits in shadow page table
entries (which include not only bits reserved by the hardware, but also
bits that will never be set in the SPTE), and then dumps the shadow page
table hierarchy.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

47ab8751

KVM: MMU: introduce is_shadow_zero_bits_set() · d625b155

Committed by Xiao Guangrong on Aug 05, 2015

We use the same data struct to check reserved bits on guest page tables
and shadow page tables; split is_rsvd_bits_set() so that the logic can be
shared between these two paths.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d625b155

KVM: MMU: introduce the framework to check zero bits on sptes · c258b62b

Committed by Xiao Guangrong on Aug 05, 2015

We have abstracted the data struct and functions used to check reserved
bits on guest page tables; now we extend the logic to check zero bits on
shadow page tables.

The zero bits on sptes include not only bits reserved by the hardware but
also the bits that SPTEs will never use.  For example, shadow pages will
never use GB pages unless the guest uses them too.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c258b62b

KVM: MMU: split reset_rsvds_bits_mask_ept · 81b8eebb

Committed by Xiao Guangrong on Aug 05, 2015

Since shadow EPT page tables and Intel nested guest page tables have the
same format, split reset_rsvds_bits_mask_ept so that the logic can be
reused by later patches which check zero bits on sptes.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

81b8eebb

KVM: MMU: split reset_rsvds_bits_mask · 6dc98b86

Committed by Xiao Guangrong on Aug 05, 2015

Since softmmu and AMD nested shadow page tables and guest page tables
have the same format, split reset_rsvds_bits_mask so that the logic can
be reused by later patches which check zero bits on sptes.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6dc98b86

KVM: MMU: introduce rsvd_bits_validate · a0a64f50

Committed by Xiao Guangrong on Aug 05, 2015

The two fields rsvd_bits_mask and bad_mt_xwr in "struct kvm_mmu" are
used to check whether reserved bits are set in guest ptes; move them to a
data struct so that the approach can be applied to check host shadow page
table entries as well.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a0a64f50
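A minimal user-space sketch of the shared reserved-bit check, assuming a layout modeled on (not copied from) the kernel's struct rsvd_bits_validate: a mask table indexed by PTE bit 7 and by paging level, plus a bad_mt_xwr bitmap indexed by the low six bits (EPT memory type and XWR permissions). rsvd_bits() follows the kernel helper of the same name; the exact table dimensions are an assumption. Because the check only takes the struct, the same function can be run against guest ptes or sptes, which is the point of these patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set; valid for e - s < 63. */
static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

/* Assumed layout: [bit 7 of the pte][level - 1], 4 paging levels. */
struct rsvd_bits_validate {
	uint64_t rsvd_bits_mask[2][4];
	uint64_t bad_mt_xwr;
};

static bool is_rsvd_bits_set(const struct rsvd_bits_validate *c,
                             uint64_t pte, int level)
{
	int bit7 = (pte >> 7) & 1;
	int low6 = pte & 0x3f;

	/* Either a reserved bit is set, or the low 6 bits are a bad combo. */
	return (pte & c->rsvd_bits_mask[bit7][level - 1]) != 0 ||
	       ((c->bad_mt_xwr >> low6) & 1) != 0;
}
```

Usage: with rsvd_bits_mask[0][0] = rsvd_bits(52, 62), a level-1 entry with bit 55 set is flagged; setting bit 2 of bad_mt_xwr flags a write-only EPT entry (low bits 0x2), an EPT misconfiguration.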

KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c · d2b0f981

Committed by Xiao Guangrong on Aug 05, 2015

FNAME(is_rsvd_bits_set) does not depend on the guest MMU mode; move it
to mmu.c so that it is no longer compiled multiple times.
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d2b0f981

KVM: MMU: fix validation of mmio page fault · 6f691251

Committed by Xiao Guangrong on Aug 05, 2015

We hit a bug where QEMU complained with "KVM: unknown exit, hardware
reason 31" and KVM showed this info:
[84245.284948] EPT: Misconfiguration.
[84245.285056] EPT: GPA: 0xfeda848
[84245.285154] ept_misconfig_inspect_spte: spte 0x5eaef50107 level 4
[84245.285344] ept_misconfig_inspect_spte: spte 0x5f5fadc107 level 3
[84245.285532] ept_misconfig_inspect_spte: spte 0x5141d18107 level 2
[84245.285723] ept_misconfig_inspect_spte: spte 0x52e40dad77 level 1

This is because we got an MMIO #PF and the handler saw that the MMIO spte
had become normal (pointing to a RAM page).

However, this is valid after the introduction of fast MMIO spte
invalidation, which increases the generation number instead of zapping
MMIO sptes.  An example is as follows:
1. QEMU drops mmio region by adding a new memslot
2. invalidate all mmio sptes
3.

        VCPU 0                        VCPU 1
    access the invalid mmio spte
                            access the region originally was MMIO before
                            set the spte to the normal ram map

    mmio #PF
    check the spte and see it becomes normal ram mapping !!!

This patch fixes the bug simply by dropping the check in the MMIO
handler, which is good for backporting.  A full check will be introduced
in later patches.
Reported-by: Pavel Shirshov <ru.pchel@gmail.com>
Tested-by: Pavel Shirshov <ru.pchel@gmail.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6f691251
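The race above can be illustrated with a simplified, hypothetical spte encoding; the bit positions, MMIO_FLAG and handle_mmio_fault() are illustrative only and do not match KVM's real layout. The handler trusts an spte only if it is still an MMIO spte of the current generation. Anything else, including an spte that another vCPU has already turned into a normal RAM mapping, is treated as a reason to retry rather than as a bug. This simplifies KVM's real handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: bit 62 marks MMIO, bits 3..21 hold the generation. */
#define MMIO_FLAG  (1ULL << 62)
#define GEN_SHIFT  3
#define GEN_MASK   0x7ffffULL

enum fault_action { EMULATE_MMIO, RETRY };

static uint64_t make_mmio_spte(uint64_t gen)
{
	return MMIO_FLAG | ((gen & GEN_MASK) << GEN_SHIFT);
}

static bool is_mmio_spte(uint64_t spte)
{
	return (spte & MMIO_FLAG) != 0;
}

static uint64_t spte_gen(uint64_t spte)
{
	return (spte >> GEN_SHIFT) & GEN_MASK;
}

/*
 * Bumping the generation invalidates every MMIO spte at once. A stale or
 * already-replaced spte is not an error: let the vCPU fault again.
 */
static enum fault_action handle_mmio_fault(uint64_t spte, uint64_t cur_gen)
{
	if (is_mmio_spte(spte) && spte_gen(spte) == (cur_gen & GEN_MASK))
		return EMULATE_MMIO;
	return RETRY;
}
```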

KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON · 9c33ae0c

Committed by Alex Williamson on Aug 04, 2015

The patch was munged on commit to re-order these tests, resulting in
excessive warnings when trying to do device assignment.  Return to the
original ordering: https://lkml.org/lkml/2015/7/15/769

Fixes: 3e5d2fdc ("KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9c33ae0c

KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON · fc1a8126

Committed by Alex Williamson on Aug 04, 2015

The patch was munged on commit to re-order these tests, resulting in
excessive warnings when trying to do device assignment.  Return to the
original ordering: https://lkml.org/lkml/2015/7/15/769

fc1a8126

30 Jul, 2015 · 1 commit

KVM: x86: clean/fix memory barriers in irqchip_in_kernel · 71ba994c

Committed by Paolo Bonzini on Jul 29, 2015

The memory barriers are trying to protect against concurrent RCU-based
interrupt injection, but the IRQ routing table is not valid at the time
kvm->arch.vpic is written. Fix this by writing kvm->arch.vpic last.
kvm_destroy_pic then need not set kvm->arch.vpic to NULL; modify it
to take a struct kvm_pic* and reuse it if the IOAPIC creation fails.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

71ba994c
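The "write the pointer last" idea can be expressed in user-space C11 as a release store paired with an acquire load; the kernel uses its own barrier primitives, so this is an analogue, not the patch. struct vm, struct pic and the function names are hypothetical. A reader that observes the published pointer is guaranteed to also observe every setup write made before it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the VM and its PIC. */
struct pic {
	int ready;
};

struct vm {
	int routing_table_valid;
	_Atomic(struct pic *) vpic;
};

static void vm_init(struct vm *vm)
{
	vm->routing_table_valid = 0;
	atomic_init(&vm->vpic, NULL);
}

/* Writer: finish all setup, then publish the pointer last. */
static bool create_irqchip(struct vm *vm)
{
	struct pic *p = malloc(sizeof(*p));

	if (!p)
		return false;
	p->ready = 1;
	vm->routing_table_valid = 1;
	atomic_store_explicit(&vm->vpic, p, memory_order_release);
	return true;
}

/* Reader: the acquire load pairs with the release store above. */
static bool model_irqchip_in_kernel(struct vm *vm)
{
	return atomic_load_explicit(&vm->vpic, memory_order_acquire) != NULL;
}
```

The design choice mirrors the commit: since the pointer is written last, a failed setup never needs to reset it to NULL.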

29 Jul, 2015 · 2 commits

KVM: x86: remove unnecessary memory barriers for shared MSRs · c847fe88

Committed by Paolo Bonzini on Jul 29, 2015

There is no smp_rmb matching the smp_wmb.  shared_msr_update is called
from hardware_enable, which in turn is called via on_each_cpu, and
on_each_cpu must imply a read memory barrier (on x86 the rmb is achieved
simply through asm volatile in native_apic_mem_write).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c847fe88

KVM: move code related to KVM_SET_BOOT_CPU_ID to x86 · d71ba788

Committed by Paolo Bonzini on Jul 29, 2015

This is another remnant of ia64 support.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d71ba788

23 Jul, 2015 · 11 commits

KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask · 54928303

Committed by Paolo Bonzini on Jul 10, 2015

We can disable CD unconditionally when there is no assigned device.
KVM now forces guest PAT to all-writeback in that case, so it makes
sense to also force CR0.CD=0.

When there are assigned devices, emulate cache-disabled operation
through the page tables.  This behavior is consistent with VMX
microcode, where CD/NW are not touched by vmentry/vmexit.  However,
keep this dependent on the quirk because OVMF enables the caches
too late.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

54928303

kvm/x86: add support for MONITOR_TRAP_FLAG · 5f3d45e7

Committed by Mihai Donțu on Jul 05, 2015

Allow a nested hypervisor to single step its guests.
Signed-off-by: Mihai Donțu <mihai.dontu@gmail.com>
[Fix overlong line. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5f3d45e7

kvm/x86: add sending hyper-v crash notification to user space · 2ce79189

Committed by Andrey Smetanin on Jul 03, 2015

The notification is sent by exiting the vcpu to user space
if KVM_REQ_HV_CRASH is set for the vcpu.  On exit to user space
the kvm_run structure contains a system_event of type
KVM_SYSTEM_EVENT_CRASH to report that a guest crash occurred.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2ce79189

kvm/x86: added hyper-v crash msrs into kvm hyperv context · e7d9513b

Committed by Andrey Smetanin on Jul 03, 2015

Add hv crash variables to the KVM Hyper-V context as storage
for the Hyper-V crash MSRs.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e7d9513b

kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c file · e83d5887

Committed by Andrey Smetanin on Jul 03, 2015

This patch introduces a Hyper-V related source file, hyperv.c, and
per-VM and per-vcpu Hyper-V context structures.
All Hyper-V MSR and hypercall code is moved into hyperv.c.
All Hyper-V kvm/vcpu fields are moved into the appropriate Hyper-V
context structures.  Copyright and author information is copied from
x86.c to hyperv.c.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e83d5887

KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions · f9eb4af6

Committed by Eugene Korenevsky on Apr 17, 2015

According to the Intel SDM, several checks must be applied to memory
operands of VMX instructions.

Long mode: #GP(0) or #SS(0), depending on the segment, must be thrown
if the memory address is in a non-canonical form.

Protected mode, checks in chronological order:
- The segment type must be checked with the access type (read or write)
taken into account.
	For write access: #GP(0) must be generated if the destination operand
		is located in a read-only data segment or any code segment.
	For read access: #GP(0) must be generated if the source operand is
		located in an execute-only code segment.
- Usability of the segment must be checked.  #GP(0) or #SS(0), depending
	on the segment, must be thrown if the segment is unusable.
- Limit check.  #GP(0) or #SS(0), depending on the segment, must be
	thrown if the effective address of the memory operand is outside the
	segment limit.
Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f9eb4af6
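The checks listed above can be sketched in user-space C. Assumptions: 48-bit virtual addresses for the canonical check, and a hypothetical, simplified struct seg for protected mode (expand-up segments only; the limit is compared against the effective address alone, as the text describes). This is not the nested VMX code.

```c
#include <stdbool.h>
#include <stdint.h>

enum vmx_fault { NO_FAULT, FAULT_GP, FAULT_SS };

/* 48-bit canonical: bits 63:47 all equal bit 47. */
static bool is_canonical_48(uint64_t addr)
{
	return (uint64_t)((int64_t)(addr << 16) >> 16) == addr;
}

/* Stack-segment accesses raise #SS(0), all others #GP(0). */
static enum vmx_fault seg_fault(bool is_ss)
{
	return is_ss ? FAULT_SS : FAULT_GP;
}

static enum vmx_fault check_long_mode(uint64_t addr, bool is_ss)
{
	return is_canonical_48(addr) ? NO_FAULT : seg_fault(is_ss);
}

/* Hypothetical simplified descriptor. */
struct seg {
	bool usable;
	bool code;      /* code segment */
	bool writable;  /* data segments only */
	bool readable;  /* code segments only */
	uint32_t limit; /* highest valid offset */
};

static enum vmx_fault check_protected_mode(const struct seg *s, uint32_t off,
                                           bool write, bool is_ss)
{
	/* 1. Segment type versus access type. */
	if (write && (s->code || !s->writable))
		return FAULT_GP;
	if (!write && s->code && !s->readable)
		return FAULT_GP;
	/* 2. Usability. */
	if (!s->usable)
		return seg_fault(is_ss);
	/* 3. Limit. */
	if (off > s->limit)
		return seg_fault(is_ss);
	return NO_FAULT;
}
```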

KVM: x86: rename quirk constants to KVM_X86_QUIRK_* · 0da029ed

Committed by Paolo Bonzini on Jul 23, 2015

Make them clearly architecture-dependent; the capability is valid for
all architectures, but the argument is not.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0da029ed

KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED · fb279950

Committed by Xiao Guangrong on Jul 16, 2015

OVMF depends on WB to boot fast, because it only clears caches after
it has set up MTRRs, which is too late.

Let's do writeback if CR0.CD is set to make it happy, similar to what
SVM is already doing.
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fb279950

KVM: x86: introduce kvm_check_has_quirk · 41dbc6bc

Committed by Paolo Bonzini on Jul 23, 2015

The logic of the disabled_quirks field usually results in a double
negation.  Wrap it in a simple function that checks the bit and
negates it.

Based on a patch from Xiao Guangrong.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

41dbc6bc
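A sketch of the helper this commit describes. struct vm_arch and check_has_quirk() are hypothetical cut-down stand-ins, and the QUIRK_* values are local copies for illustration (the real constants live in the KVM UAPI headers). User space sets bits in disabled_quirks to turn quirks off, so the helper hides the double negation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local illustrative copies of the quirk bits. */
#define QUIRK_LINT0_REENABLED (1 << 0)
#define QUIRK_CD_NW_CLEARED   (1 << 1)

/* Hypothetical cut-down per-VM state. */
struct vm_arch {
	uint64_t disabled_quirks;
};

/* A quirk is in effect unless user space has disabled it. */
static bool check_has_quirk(const struct vm_arch *arch, uint64_t quirk)
{
	return !(arch->disabled_quirks & quirk);
}
```

Callers then read naturally, e.g. `if (check_has_quirk(arch, QUIRK_CD_NW_CLEARED))`, instead of `if (!(arch->disabled_quirks & QUIRK_CD_NW_CLEARED))`.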

KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type · 3e5d2fdc

Committed by Xiao Guangrong on Jul 16, 2015

kvm_mtrr_get_guest_memory_type never returns -1; this is already
implied by the current code, since if @type = -1 (meaning no MTRR
contains the range), iter.partial_map must be true.

Simplify the code to reflect this fact.
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3e5d2fdc

KVM: MTRR: fix memory type handling if MTRR is completely disabled · 10dc331f

Committed by Xiao Guangrong on Jul 16, 2015

Currently the code uses the default memory type if MTRR is fully
disabled; fix it by using UC instead.
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

10dc331f
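The rule this fix restores can be sketched from the architectural IA32_MTRR_DEF_TYPE layout (bits 7:0 default type, bit 11 = E, MTRRs enabled) and the SDM memory-type encodings. fallback_memory_type() is a hypothetical helper, not the KVM function: when E is clear all memory is UC regardless of any ranges, and only when E is set does uncovered memory get the default type from the MSR.

```c
#include <stdint.h>

/* Architectural memory-type encodings (Intel SDM). */
enum mtrr_type {
	MTRR_UC = 0, MTRR_WC = 1, MTRR_WT = 4, MTRR_WP = 5, MTRR_WB = 6
};

/* IA32_MTRR_DEF_TYPE fields. */
#define MTRR_DEF_TYPE_MASK 0xffULL
#define MTRR_DEF_ENABLE    (1ULL << 11)

/*
 * Type for memory not covered by any range: UC when MTRRs are disabled
 * (E clear), otherwise the default type stored in the MSR.
 */
static enum mtrr_type fallback_memory_type(uint64_t def_type_msr)
{
	if (!(def_type_msr & MTRR_DEF_ENABLE))
		return MTRR_UC;
	return (enum mtrr_type)(def_type_msr & MTRR_DEF_TYPE_MASK);
}
```

Before the fix, a guest with E clear but a WB default type (0x06) would have been treated as WB; with this rule it gets UC.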

10 Jul, 2015 · 5 commits

kvm: x86: fix load xsave feature warning · ee4100da

Committed by Wanpeng Li on Jul 09, 2015

[   68.196974] WARNING: CPU: 1 PID: 2140 at arch/x86/kvm/x86.c:3161 kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]()
[   68.196975] Modules linked in: snd_hda_codec_hdmi i915 rfcomm bnep bluetooth i2c_algo_bit rfkill nfsd drm_kms_helper nfs_acl nfs drm lockd grace sunrpc fscache snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_dummy snd_seq_oss x86_pkg_temp_thermal snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi kvm snd_seq ghash_clmulni_intel fuse snd_timer aesni_intel parport_pc ablk_helper snd_seq_device cryptd ppdev snd lp parport lrw dcdbas gf128mul i2c_core glue_helper lpc_ich video shpchp mfd_core soundcore serio_raw acpi_cpufreq ext4 mbcache jbd2 sd_mod crc32c_intel ahci libahci libata e1000e ptp pps_core
[   68.197005] CPU: 1 PID: 2140 Comm: qemu-system-x86 Not tainted 4.2.0-rc1+ #2
[   68.197006] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
[   68.197007]  ffffffffa03b0657 ffff8800d984bca8 ffffffff815915a2 0000000000000000
[   68.197009]  0000000000000000 ffff8800d984bce8 ffffffff81057c0a 00007ff6d0001000
[   68.197010]  0000000000000002 ffff880211c1a000 0000000000000004 ffff8800ce0288c0
[   68.197012] Call Trace:
[   68.197017]  [<ffffffff815915a2>] dump_stack+0x45/0x57
[   68.197020]  [<ffffffff81057c0a>] warn_slowpath_common+0x8a/0xc0
[   68.197022]  [<ffffffff81057cfa>] warn_slowpath_null+0x1a/0x20
[   68.197029]  [<ffffffffa037bed8>] kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]
[   68.197035]  [<ffffffffa037aede>] ? kvm_arch_vcpu_load+0x4e/0x1c0 [kvm]
[   68.197040]  [<ffffffffa03696a6>] kvm_vcpu_ioctl+0xc6/0x5c0 [kvm]
[   68.197043]  [<ffffffff811252d2>] ? perf_pmu_enable+0x22/0x30
[   68.197044]  [<ffffffff8112663e>] ? perf_event_context_sched_in+0x7e/0xb0
[   68.197048]  [<ffffffff811a6882>] do_vfs_ioctl+0x2c2/0x4a0
[   68.197050]  [<ffffffff8107bf33>] ? finish_task_switch+0x173/0x220
[   68.197053]  [<ffffffff8123307f>] ? selinux_file_ioctl+0x4f/0xd0
[   68.197055]  [<ffffffff8122cac3>] ? security_file_ioctl+0x43/0x60
[   68.197057]  [<ffffffff811a6ad9>] SyS_ioctl+0x79/0x90
[   68.197060]  [<ffffffff81597e57>] entry_SYSCALL_64_fastpath+0x12/0x6a
[   68.197061] ---[ end trace 558a5ebf9445fc80 ]---

After commit 0c4109be ('x86/fpu/xstate: Fix up bad get_xsave_addr()
assumptions'), there is no longer an assumption that an xsave bit present
in the hardware (pcntxt_mask) is always present in a given xsave buffer.
An enabled state can be present in 'pcntxt_mask' but *not* in 'xstate_bv'
when the last 'xsave' did not request that this feature be saved
(unlikely) or because the "init optimization" caused it not to be saved.
This patch kills that assumption.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ee4100da
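The fixed assumption can be sketched with a hypothetical, simplified XSAVE image: a header bitmap of which components were actually saved, plus fixed-size slots (real XSAVE areas have per-component offsets and sizes). Before copying a component, check its bit in xstate_bv; if it is absent, produce the component's init state instead. Using zeros as the init state is a simplification here, since not every component's init state is all zeros.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified XSAVE image. */
#define NR_COMPONENTS 4
#define SLOT_SIZE     16

struct xsave_image {
	uint64_t xstate_bv;
	uint8_t  slot[NR_COMPONENTS][SLOT_SIZE];
};

/*
 * A component enabled in hardware may still be missing from xstate_bv
 * (e.g. because of the init optimization), so never assume its slot holds
 * valid data.
 */
static void copy_component(const struct xsave_image *img, int i,
                           uint8_t out[SLOT_SIZE])
{
	if (img->xstate_bv & (1ULL << i))
		memcpy(out, img->slot[i], SLOT_SIZE);
	else
		memset(out, 0, SLOT_SIZE); /* simplified init state */
}
```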

KVM: x86: apply guest MTRR virtualization on host reserved pages · fd717f11

Committed by Paolo Bonzini on Jul 07, 2015

Currently guest MTRR is avoided if kvm_is_reserved_pfn returns true.
However, the guest could prefer a different page type than UC for
such pages.  A good example is that a passed-through VGA frame buffer is
not always UC as the host expects.

This patch enables full use of virtual guest MTRRs.
Suggested-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Tested-by: Joerg Roedel <jroedel@suse.de> (on AMD)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fd717f11

KVM: SVM: Sync g_pat with guest-written PAT value · e098223b

Committed by Jan Kiszka on Apr 20, 2015

When hardware supports the g_pat VMCB field, we can use it for emulating
the PAT configuration that the guest configures by writing to the
corresponding MSR.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e098223b

KVM: SVM: use NPT page attributes · 3c2e7f7d

Committed by Paolo Bonzini on Jul 07, 2015

Right now, NPT page attributes are not used, and the final page
attribute depends solely on gPAT (which however is not synced
correctly), the guest MTRRs and the guest page attributes.

However, we can do better by mimicking what is done for VMX.
In the absence of PCI passthrough, the guest PAT can be ignored
and the page attributes can be just WB.  If passthrough is being
used, instead, keep respecting the guest PAT, and emulate the guest
MTRRs through the PAT field of the nested page tables.

The only snag is that WP memory cannot be emulated correctly,
because Linux's default PAT setting only includes the other types.
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3c2e7f7d

KVM: count number of assigned devices · 5544eb9b

Committed by Paolo Bonzini on Jul 07, 2015

If there are no assigned devices, the guest PAT does not provide
any useful information and can be overridden to writeback; VMX
always does this because it has the "IPAT" bit in its extended
page table entries, but SVM does not have anything similar.
Hook into VFIO and legacy device assignment so that they
provide this information to KVM.
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5544eb9b
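A sketch of how a per-VM device count could drive the memory-type decision described in this commit. struct vm_devs, the helpers and the two-value enum are hypothetical; the real hooks live in VFIO and legacy device assignment. With no assigned device, the guest PAT carries no useful information and the mapping can simply be writeback.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-VM counter of assigned devices. */
struct vm_devs {
	atomic_int assigned_devices;
};

static void assign_device(struct vm_devs *vm)
{
	atomic_fetch_add(&vm->assigned_devices, 1);
}

static void unassign_device(struct vm_devs *vm)
{
	atomic_fetch_sub(&vm->assigned_devices, 1);
}

static bool has_assigned_devices(struct vm_devs *vm)
{
	return atomic_load(&vm->assigned_devices) > 0;
}

enum mem_type { TYPE_UC = 0, TYPE_WB = 6 };

/* Honor the guest's choice only when a device is actually assigned. */
static enum mem_type effective_type(struct vm_devs *vm,
                                    enum mem_type guest_type)
{
	return has_assigned_devices(vm) ? guest_type : TYPE_WB;
}
```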
