1. 11 Apr 2016 (2 commits)
    • kvm: x86: do not leak guest xcr0 into host interrupt handlers · fc5b7f3b
      By David Matlack
      An interrupt handler that uses the fpu can kill a KVM VM, if it runs
      under the following conditions:
       - the guest's xcr0 register is loaded on the cpu
       - the guest's fpu context is not loaded
       - the host is using eagerfpu
      
      Note that the guest's xcr0 register and fpu context are not loaded as
      part of the atomic world switch into "guest mode". They are loaded by
      KVM while the cpu is still in "host mode".
      
      Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
      interrupt handler will look something like this:
      
      if (irq_fpu_usable()) {
              kernel_fpu_begin();
      
              [... code that uses the fpu ...]
      
              kernel_fpu_end();
      }
      
      As long as the guest's fpu is not loaded and the host is using eager
      fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
      returns true). The interrupt handler proceeds to use the fpu with
      the guest's xcr0 live.
      
      kernel_fpu_begin() saves the current fpu context. If this uses
      XSAVE[OPT], it may leave the xsave area in an undesirable state.
      According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
      if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
      xcr0[i] == 0 following an XSAVE.
      
      kernel_fpu_end() restores the fpu context. Now if any bit i in
      XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
      fault is trapped and SIGSEGV is delivered to the current process.
      
      Only pre-4.2 kernels appear to be vulnerable to this sequence of
      events. Commit 653f52c3 ("kvm,x86: load guest FPU context more eagerly")
      from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
      
      This patch fixes the bug by keeping the host's xcr0 loaded outside
      of the interrupts-disabled region where KVM switches into guest mode.
      
      Cc: stable@vger.kernel.org
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: David Matlack <dmatlack@google.com>
      [Move load after goto cancel_injection. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMU: fix permission_fault() · 7a98205d
      By Xiao Guangrong
      kvm-unit-tests complained that the PFEC is not set properly, e.g.:
      test pte.rw pte.d pte.nx pde.p pde.rw pde.pse user fetch: FAIL: error code 15
      expected 5
      Dump mapping: address: 0x123400000000
      ------L4: 3e95007
      ------L3: 3e96007
      ------L2: 2000083
      
      The cause is that the PFEC returned to the guest is copied from the
      PFEC triggered by the shadow page table.
      
      This patch fixes it and cleans up the logic for updating the error code.
      Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      [Do not assume pfec.p=1. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 05 Apr 2016 (1 commit)
    • kvm: x86: make lapic hrtimer pinned · 61abdbe0
      By Luiz Capitulino
      When a vCPU runs on a nohz_full core, the hrtimer used by
      the lapic emulation code can be migrated to another core.
      When this happens, it's possible to observe millisecond
      latency when delivering timer IRQs to KVM guests.
      
      The huge latency is mainly due to the fact that
      apic_timer_fn() expects to run during a kvm exit. It
      sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
      entry. However, if the timer fires on a different core,
      we have to wait until the next kvm exit for the guest
      to see KVM_REQ_PENDING_TIMER set.
      
      This problem became visible after commit 9642d18e. This
      commit changed the timer migration code to always attempt
      to migrate timers away from nohz_full cores. While it's
      debatable whether this is correct/desirable (I don't think
      it is), it's clear that the lapic emulation code has
      a requirement on firing the hrtimer in the same core
      where it was started. This is achieved by making the
      hrtimer pinned.
      
      Lastly, note that KVM has code to migrate timers when a
      vCPU is scheduled to run on a different core. However, this
      forced migration may fail. When this happens, we can have
      the same problem. If we want 100% correctness, we'll have
      to modify apic_timer_fn() to cause a kvm exit when it runs
      on a different core than the vCPU. Not sure if this is
      possible.
      
      Here's a reproducer for the issue being fixed:
      
       1. Set all cores but core0 to be nohz_full cores
       2. Start a guest with a single vCPU
       3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs()
      
      You'll see that apic_timer_fn() will run in core0 while
      kvm_inject_apic_timer_irqs() runs in a different core. If
      you get both on core0, try running a program that takes 100%
      of the CPU and pin it to core0 to force the vCPU out.
      Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 01 Apr 2016 (3 commits)
  4. 23 Mar 2016 (1 commit)
    • KVM: page_track: fix access to NULL slot · a6adb106
      By Paolo Bonzini
      This happens when doing the reboot test from virt-tests:
      
      [  131.833653] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [  131.842461] IP: [<ffffffffa0950087>] kvm_page_track_is_active+0x17/0x60 [kvm]
      [  131.850500] PGD 0
      [  131.852763] Oops: 0000 [#1] SMP
      [  132.007188] task: ffff880075fbc500 ti: ffff880850a3c000 task.ti: ffff880850a3c000
      [  132.138891] Call Trace:
      [  132.141639]  [<ffffffffa092bd11>] page_fault_handle_page_track+0x31/0x40 [kvm]
      [  132.149732]  [<ffffffffa093380f>] paging64_page_fault+0xff/0x910 [kvm]
      [  132.172159]  [<ffffffffa092c734>] kvm_mmu_page_fault+0x64/0x110 [kvm]
      [  132.179372]  [<ffffffffa06743c2>] handle_exception+0x1b2/0x430 [kvm_intel]
      [  132.187072]  [<ffffffffa067a301>] vmx_handle_exit+0x1e1/0xc50 [kvm_intel]
      ...
      
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Fixes: 3d0c27ad
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 22 Mar 2016 (15 commits)
  6. 10 Mar 2016 (2 commits)
    • KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 · 5f0b8199
      By Paolo Bonzini
      KVM has special logic to handle pages with pte.u=1 and pte.w=0 when
      CR0.WP=1.  These pages' SPTEs flip continuously between two states:
      U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed)
      and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed).
      
      When SMEP is in effect, however, U=0 will enable kernel execution of
      this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
      with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1.
      When guest EFER has the NX bit cleared, the reserved bit check thinks
      that the latter state is invalid; teach it that the smep_andnot_wp case
      will also use the NX bit of SPTEs.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Fixes: c258b62b
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo · 844a5fe2
      By Paolo Bonzini
      Yes, all of these are needed. :) This is admittedly a bit odd, but
      kvm-unit-tests access.flat tests this if you run it with "-cpu host"
      and of course ept=0.
      
      KVM runs the guest with CR0.WP=1, so it must handle supervisor writes
      specially when pte.u=1/pte.w=0/CR0.WP=0.  Such writes cause a fault
      when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0.
      When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and
      restarts execution.  This will still cause a user write to fault, while
      supervisor writes will succeed.  User reads will fault spuriously now,
      and KVM will then flip U and W again in the SPTE (U=1, W=0).  User reads
      will be enabled and supervisor writes disabled, going back to the
      original situation where supervisor writes fault spuriously.
      
      When SMEP is in effect, however, U=0 will enable kernel execution of
      this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
      with U=0.  If the guest has not enabled NX, the result is a continuous
      stream of page faults due to the NX bit being reserved.
      
      The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
      switch.  (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry
      control, so they do not use user-return notifiers for EFER---if they did,
      EFER.NX would be forced to the same value as the host).
      
      There is another bug in the reserved bit check, which I've split to a
      separate patch for easier application to stable kernels.
      
      Cc: stable@vger.kernel.org
      Cc: Andy Lutomirski <luto@amacapital.net>
      Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Fixes: f6577a5f
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 09 Mar 2016 (2 commits)
  8. 08 Mar 2016 (10 commits)
  9. 05 Mar 2016 (1 commit)
  10. 04 Mar 2016 (3 commits)
    • KVM: MMU: check kvm_mmu_pages and mmu_page_path indices · e23d3fef
      By Xiao Guangrong
      Give a special invalid index to the root of the walk, so that we
      can check the consistency of kvm_mmu_pages and mmu_page_path.
      Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      [Extracted from a bigger patch proposed by Guangrong. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMU: Fix ubsan warnings · 0a47cd85
      By Paolo Bonzini
      kvm_mmu_pages_init is doing some really yucky stuff.  It is setting
      up a sentinel for mmu_page_clear_parents; however, because of a) the
      way levels are numbered starting from 1 and b) the way mmu_page_path
      sizes its arrays with PT64_ROOT_LEVEL-1 elements, the access can be
      out of bounds.  This is harmless because the code overwrites up to the
      first two elements of parents->idx and these are initialized, and
      because the sentinel is not needed in this case---mmu_page_clear_parents
      exits anyway when it gets to the end of the array.  However ubsan
      complains, and everyone else should too.
      
      This fix does three things.  First it makes the mmu_page_path arrays
      PT64_ROOT_LEVEL elements in size, so that we can write to them without
      checking the level in advance.  Second it disintegrates kvm_mmu_pages_init
      between mmu_unsync_walk (to reset the struct kvm_mmu_pages) and
      for_each_sp (to place the NULL sentinel at the end of the current path).
      This is okay because the mmu_page_path is only used in
      mmu_pages_clear_parents; mmu_pages_clear_parents itself is called within
      a for_each_sp iterator, and hence always after a call to mmu_pages_next.
      Third it changes mmu_pages_clear_parents to just use the sentinel to
      stop iteration, without checking the bounds on level.
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMU: cleanup handle_abnormal_pfn · 798e88b3
      By Paolo Bonzini
      The goto and the temporary variable are unnecessary; just use return
      statements.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>