Commits · 0bdea06892e33afddbdc5da6df305e9fe9c41365 · openeuler / Kernel

24 Jan, 2013 1 commit

KVM: x86 emulator: Convert SHLD, SHRD to fastop · 0bdea068

Authored by Avi Kivity on Jan 19, 2013

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0bdea068

22 Jan, 2013 3 commits

KVM: x86: improve reexecute_instruction · 93c05d3e

Authored by Xiao Guangrong on Jan 13, 2013

The current reexecute_instruction cannot reliably detect failed instruction
emulation. It lets the guest retry every instruction unless the access hits
an error pfn.

One example is nested write protection: the page we want to write is used
as a PDE but chains to itself. In this case, we should stop the emulation
and report the case to userspace.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

93c05d3e

KVM: x86: let reexecute_instruction work for tdp · 95b3cf69

Authored by Xiao Guangrong on Jan 13, 2013

Currently, reexecute_instruction refuses to retry any instruction if
tdp is enabled. If nested npt is used, the emulation may be caused by a
shadow page, which can be fixed by dropping the shadow page. The only
condition under which tdp cannot retry the instruction is an access fault
on an error pfn.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

95b3cf69

KVM: x86: clean up reexecute_instruction · 22368028

Authored by Xiao Guangrong on Jan 13, 2013

A little cleanup for reexecute_instruction; also use gpa_to_gfn in
retry_instruction.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

22368028

14 Jan, 2013 7 commits

KVM: MMU: Conditionally reschedule when kvm_mmu_slot_remove_write_access() takes a long time · 6b81b05e

Authored by Takuya Yoshikawa on Jan 08, 2013

If userspace starts dirty logging for a large slot, say 64GB of
memory, kvm_mmu_slot_remove_write_access() needs to hold mmu_lock for
a long time, such as tens of milliseconds. This patch controls the lock
hold time by asking the scheduler if we need to reschedule for others.

One penalty for this is that we need to flush TLBs before releasing
mmu_lock. But since holding mmu_lock for a long time affects not only
the guest (in other words, the vCPU threads) but also the host as a
whole, we should pay for that.

In practice, the cost will not be so high because we can protect a fair
amount of memory before being rescheduled: in my test environment,
cond_resched_lock() was called only once for protecting 12GB of memory
even without THP. We can also revisit Avi's "unlocked TLB flush" work
later for completely suppressing extra TLB flushes if needed.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

6b81b05e
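
The lock-breaking pattern described in 6b81b05e can be sketched in user space roughly as follows. This is only an illustration, not the kernel code: `protect_slot`, `resched_wanted`, the `quantum` parameter and the counters are hypothetical stand-ins for mmu_lock handling, the scheduler query and TLB flushing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping that stands in for mmu_lock and TLB state. */
struct protect_stats {
    size_t pages_protected;
    size_t tlb_flushes;
    size_t lock_breaks;
};

/* Hypothetical policy: pretend the scheduler wants the CPU every `quantum` pages. */
static bool resched_wanted(size_t pages_done, size_t quantum)
{
    return quantum != 0 && pages_done % quantum == 0;
}

/* Write-protect `npages` pages, dropping the lock whenever a reschedule is wanted. */
static struct protect_stats protect_slot(size_t npages, size_t quantum)
{
    struct protect_stats st = {0, 0, 0};
    bool flush_pending = false;

    /* the lock would be taken here */
    for (size_t i = 0; i < npages; i++) {
        st.pages_protected++;   /* stands in for clearing the writable bit */
        flush_pending = true;

        if (i + 1 < npages && resched_wanted(i + 1, quantum)) {
            st.tlb_flushes++;   /* flush before the lock is dropped */
            flush_pending = false;
            st.lock_breaks++;   /* unlock, yield, relock */
        }
    }
    if (flush_pending)
        st.tlb_flushes++;       /* final flush covers the remaining pages */
    /* the lock would be released here */
    return st;
}
```

The extra flush per lock break is the penalty the message mentions: pages already write-protected must not keep stale writable TLB entries once other threads can take the lock.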

KVM: Make kvm_mmu_slot_remove_write_access() take mmu_lock by itself · 9d1beefb

Authored by Takuya Yoshikawa on Jan 08, 2013

Better to place mmu_lock handling and TLB flushing code together since
this is a self-contained function.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

9d1beefb

KVM: Make kvm_mmu_change_mmu_pages() take mmu_lock by itself · b34cb590

Authored by Takuya Yoshikawa on Jan 08, 2013

No reason to make callers take mmu_lock since we do not need to protect
kvm_mmu_change_mmu_pages() and kvm_mmu_slot_remove_write_access()
together by mmu_lock in kvm_arch_commit_memory_region(): the former
calls kvm_mmu_commit_zap_page() and flushes TLBs by itself.

Note: we do not need to protect kvm->arch.n_requested_mmu_pages by
mmu_lock, as can be seen from the fact that it is read locklessly.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

b34cb590

KVM: Remove unused slot_bitmap from kvm_mmu_page · e12091ce

Authored by Takuya Yoshikawa on Jan 08, 2013

Not needed any more.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

e12091ce

KVM: MMU: Make kvm_mmu_slot_remove_write_access() rmap based · b99db1d3

Authored by Takuya Yoshikawa on Jan 08, 2013

This makes it possible to release mmu_lock and reschedule conditionally
in a later patch. Although this may increase the time needed to protect
the whole slot when we start dirty logging, the kernel should not allow
userspace to trigger something that will hold a spinlock for as long as
tens of milliseconds: actually there is no limit, since the hold time
is roughly proportional to the number of guest pages.

Another point to note is that this patch removes the only user of
slot_bitmap, which will cause some problems when we increase the number
of slots further.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

b99db1d3

KVM: MMU: Remove unused parameter level from __rmap_write_protect() · 245c3912

Authored by Takuya Yoshikawa on Jan 08, 2013

No longer need to care about the mapping level in this function.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

245c3912

KVM: Write protect the updated slot only when dirty logging is enabled · c972f3b1

Authored by Takuya Yoshikawa on Jan 08, 2013

Calling kvm_mmu_slot_remove_write_access() for a deleted slot does
nothing but search for non-existent mmu pages which have mappings to
that deleted memory; this is safe but a waste of time.

Since we want to make the function rmap based in a later patch, in a
manner which makes it unsafe to be called for a deleted slot, we make
the caller check that the slot is non-zero and being dirty logged.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c972f3b1

11 Jan, 2013 2 commits

KVM: MMU: fix infinite fault access retry · 7751babd

Authored by Xiao Guangrong on Jan 08, 2013

We have two issues in the current code:
- if the target gfn is used as its page table, the guest will refault and
  kvm will then use a small page size to map it. We need two #PFs to fix
  its shadow page table

- sometimes, say when an exception is triggered during a vm-exit caused by #PF
  (see handle_exception() in vmx.c), we remove all the shadow pages shadowed
  by the target gfn before going into the page fault path, which causes an
  infinite loop:
  delete shadow pages shadowed by the gfn -> try to use large page size to map
  the gfn -> retry the access ->...

To fix these, we can adjust the page size early if the target gfn is used as
a page table.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7751babd

KVM: MMU: fix Dirty bit missed if CR0.WP = 0 · c2288505

Authored by Xiao Guangrong on Jan 08, 2013

If the write-fault access is from supervisor mode and CR0.WP is not set on
the vcpu, kvm will fix it by adjusting pte access: it sets the W bit on the
pte and clears the U bit. This is the case in which kvm can change pte
access from read-only to writable.

Unfortunately, the pte access is also the access of the 'direct' shadow page
table, meaning direct sp.role.access = pte_access, so we create a writable
spte entry on a read-only shadow page table. As a result, the Dirty bit is
not tracked when two guest ptes point to the same large page. Note that it
has no impact other than on the Dirty bit, since cr0.wp is encoded into
sp.role.

It can be fixed by adjusting pte access before establishing the shadow page
table. After that, no mmu-specific code remains in the common function,
and two parameters can be dropped from set_spte.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c2288505
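
The access adjustment that c2288505 describes (set W, clear U for a supervisor write fault with CR0.WP clear) can be sketched as a small pure function. The bit values and the name `adjust_pte_access` are assumptions made for illustration; they are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical access bits; the values are illustrative only. */
#define ACC_WRITE 0x2u
#define ACC_USER  0x4u

/*
 * Return the pte access to use when building the shadow page table.
 * Only a supervisor write fault with CR0.WP clear on a read-only pte
 * is adjusted: the mapping becomes writable and supervisor-only.
 */
static unsigned adjust_pte_access(unsigned pte_access, bool write_fault,
                                  bool user_fault, bool cr0_wp)
{
    if (write_fault && !user_fault && !cr0_wp && !(pte_access & ACC_WRITE)) {
        pte_access |= ACC_WRITE;   /* set the W bit */
        pte_access &= ~ACC_USER;   /* clear the U bit */
    }
    return pte_access;
}
```

Doing this before the shadow page is looked up or created means the page's recorded access already matches the spte that will be installed, which is the ordering the message argues for.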

10 Jan, 2013 7 commits

KVM: x86 emulator: convert basic ALU ops to fastop · fb864fbc

Authored by Avi Kivity on Jan 04, 2013

Opcodes:
	TEST
	CMP
	ADD
	ADC
	SUB
	SBB
	XOR
	OR
	AND

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

fb864fbc

KVM: x86 emulator: add macros for defining 2-operand fastop emulation · f7857f35

Authored by Avi Kivity on Jan 04, 2013

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f7857f35

KVM: x86 emulator: convert NOT, NEG to fastop · 45a1467d

Authored by Avi Kivity on Jan 04, 2013

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

45a1467d

KVM: x86 emulator: mark CMP, CMPS, SCAS, TEST as NoWrite · 75f72845

Authored by Avi Kivity on Jan 04, 2013

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

75f72845

KVM: x86 emulator: introduce NoWrite flag · b6744dc3

Authored by Avi Kivity on Jan 04, 2013

Instead of disabling writeback via OP_NONE, just specify NoWrite.

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b6744dc3

KVM: x86 emulator: Support for declaring single operand fastops · b7d491e7

Authored by Avi Kivity on Jan 04, 2013

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b7d491e7

KVM: x86 emulator: framework for streamlining arithmetic opcodes · e28bbd44

Authored by Avi Kivity on Jan 04, 2013

We emulate arithmetic opcodes by executing a "similar" instruction (same
operation, different operands) on the cpu. This ensures accurate emulation,
esp. wrt. eflags. However, the prologue and epilogue around the opcode are
fairly long, consisting of a switch (for the operand size) and code to load
and save the operands. This is repeated for every opcode.

This patch introduces an alternative way to emulate arithmetic opcodes.
Instead of the above, we have four (three on i386) functions consisting
of just the opcode and a ret; one for each operand size.  For example:

   .align 8
   em_notb:
	not %al
	ret

   .align 8
   em_notw:
	not %ax
	ret

   .align 8
   em_notl:
	not %eax
	ret

   .align 8
   em_notq:
	not %rax
	ret

The prologue and epilogue are shared across all opcodes. Note that the
functions use a special calling convention; notably, eflags is an input/output
parameter and is not clobbered. Rather than dispatching the four functions
through a jump table, the functions are declared with a constant size (8) so
their addresses can be calculated.

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e28bbd44
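
The address calculation mentioned at the end of e28bbd44 can be illustrated with plain integer arithmetic. This is a sketch under assumptions: the names `STUB_SIZE`, `size_index` and `fastop_variant` are hypothetical, and the stubs are assumed to be laid out in b, w, l, q order as in the message's example.

```c
#include <stdint.h>

#define STUB_SIZE 8u  /* each stub is aligned and padded to 8 bytes, per the message */

/* log2 of the operand size in bytes: 1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3 */
static unsigned size_index(unsigned bytes)
{
    unsigned idx = 0;

    while (bytes > 1) {
        bytes >>= 1;
        idx++;
    }
    return idx;
}

/*
 * Given the address of the byte-sized stub (e.g. em_notb), compute the
 * address of the stub for the requested operand size without a jump table.
 */
static uintptr_t fastop_variant(uintptr_t base, unsigned bytes)
{
    return base + (uintptr_t)size_index(bytes) * STUB_SIZE;
}
```

For a base address of 0x1000, the 4-byte variant would sit at 0x1010, two stubs past the byte variant.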

09 Jan, 2013 2 commits

KVM: VMX: fix incorrect cached cpl value with real/v8086 modes · b09408d0

Authored by Marcelo Tosatti on Jan 07, 2013

CPL is always 0 in real mode, and always 3 in virtual 8086 mode.

Using values other than those can cause failures on operations that
check CPL.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b09408d0
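
The rule stated in b09408d0 can be written down as a small function. This is a simplified sketch, not the VMX code: the function name `guest_cpl` is hypothetical, and taking the protected-mode CPL from the low two bits of the CS selector is an assumption made to keep the example short.

```c
#include <stdint.h>

#define CR0_PE    (1u << 0)   /* protection enable */
#define EFLAGS_VM (1u << 17)  /* virtual 8086 mode */

/* Return the current privilege level for the given mode bits. */
static unsigned guest_cpl(uint32_t cr0, uint32_t eflags, uint16_t cs_selector)
{
    if (!(cr0 & CR0_PE))
        return 0;               /* real mode: always CPL 0 */
    if (eflags & EFLAGS_VM)
        return 3;               /* virtual 8086 mode: always CPL 3 */
    return cs_selector & 3u;    /* assumed: low bits of CS otherwise */
}
```

A cached value that ignores the first two branches could report, for example, CPL 3 in real mode when the selector happens to have its low bits set.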

KVM: x86: remove unused variable from walk_addr_generic() · b0cfeb5d

Authored by Gleb Natapov on Jan 08, 2013

Fix compilation warning.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b0cfeb5d

08 Jan, 2013 2 commits

KVM: MMU: simplify folding of dirty bit into accessed_dirty · 908e7d79

Authored by Gleb Natapov on Dec 27, 2012

MMU code tries to avoid if()s that HW is not able to predict reliably by
using bitwise operations to streamline code execution, but in the case of
dirty bit folding this gives us nothing, since write_fault is checked right
before the folding code. Let's just piggyback onto the if() to make the code
clearer.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

908e7d79
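
The folding that 908e7d79 talks about can be sketched as follows. The bit positions are the architectural x86 Accessed (bit 5) and Dirty (bit 6) flags; the function name and its shape are assumptions for illustration and are not copied from the page-table walker.

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_ACCESSED_SHIFT 5
#define PT_DIRTY_SHIFT    6
#define PT_ACCESSED_MASK  (1ull << PT_ACCESSED_SHIFT)

/*
 * Return PT_ACCESSED_MASK when the pte needs no A/D update, else 0.
 * For a write fault the Dirty bit is shifted down onto the Accessed bit
 * position and ANDed in, so one flag tracks both conditions.
 */
static uint64_t fold_accessed_dirty(uint64_t pte, bool write_fault)
{
    uint64_t accessed_dirty = pte & PT_ACCESSED_MASK;

    if (write_fault)   /* plain if(), as the message suggests */
        accessed_dirty &= pte >> (PT_DIRTY_SHIFT - PT_ACCESSED_SHIFT);
    return accessed_dirty;
}
```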

KVM: mmu: remove unused trace event · ee04e0ce

Authored by Gleb Natapov on Dec 25, 2012

trace_kvm_mmu_delay_free_pages() is no longer used.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ee04e0ce

03 Jan, 2013 7 commits

KVM: VMX: handle IO when emulation is due to #GP in real mode. · 0ca1b4f4

Authored by Gleb Natapov on Dec 20, 2012

With emulate_invalid_guest_state=0, if a vcpu is in real mode, VMX can
enter the vcpu with a smaller segment limit than the guest configured. If
the guest tries to access past this limit it will get a #GP, at which point
the instruction will be emulated with the correct segment limit applied. If
IO is detected during the emulation, it is not handled correctly. The vcpu
thread should exit to userspace to serve the IO, but it returns to the
guest instead. Since emulation is not completed until userspace completes
the IO, the faulting instruction is re-executed ad infinitum.

The patch fixes that by exiting to userspace if IO happens during
instruction emulation.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0ca1b4f4

KVM: VMX: Do not fix segment register during vcpu initialization. · d54d07b2

Authored by Gleb Natapov on Dec 20, 2012

Segment registers will be fixed according to the current emulation policy
when switching to real mode for the first time.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d54d07b2

KVM: VMX: fix emulation of invalid guest state. · d99e4152

Authored by Gleb Natapov on Dec 20, 2012

Currently, when emulation of invalid guest state is enabled
(emulate_invalid_guest_state=1), segment registers are still sometimes
fixed for entry to vm86 mode. Segment register fixing is avoided in
enter_rmode(), but vmx_set_segment() still does it unconditionally.
The patch fixes it.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d99e4152

KVM: VMX: make rmode_segment_valid() more strict. · 89efbed0

Authored by Gleb Natapov on Dec 20, 2012

Currently it allows entering vm86 mode if the segment limit is greater than
0xffff and the db bit is set. Both of those can cause incorrect execution of
instructions by the cpu, since in vm86 mode the limit will be set to 0xffff
and db will be forced to 0.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

89efbed0
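
A stricter check along the lines of 89efbed0 might look like the sketch below. The `struct segment` layout and the name `vm86_compatible` are hypothetical, and the base == selector << 4 test is an assumption based on the usual real-mode convention rather than something stated in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a guest segment register. */
struct segment {
    uint64_t base;
    uint32_t limit;
    uint16_t selector;
    bool db;
};

/*
 * A segment can be run as-is in vm86 mode only if it already looks the
 * way vm86 mode will force it to look.
 */
static bool vm86_compatible(const struct segment *s)
{
    if (s->base != ((uint64_t)s->selector << 4))
        return false;   /* assumed real-mode base convention */
    if (s->limit != 0xffff)
        return false;   /* vm86 mode sets the limit to 0xffff */
    if (s->db)
        return false;   /* vm86 mode forces db to 0 */
    return true;
}
```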

KVM: emulator: implement fninit, fnstsw, fnstcw · 045a282c

Authored by Gleb Natapov on Dec 20, 2012

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

045a282c

KVM: emulator: drop RPL check from linearize() function · 3a78a4f4

Authored by Gleb Natapov on Dec 20, 2012

According to Intel SDM Vol3 Section 5.5 "Privilege Levels" and 5.6
"Privilege Level Checking When Accessing Data Segments", RPL checking is
done during loading of a segment selector, not during data access. We
already do the checking during segment selector loading, so drop the check
during data access. Checking RPL during data access triggers a #GP if the
RPL bits in a segment selector are set after the transition from real mode
to protected mode.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3a78a4f4

x86: kvm_para: fix typo in hypercall comments · 11393a07

Authored by Jesse Larrew on Dec 10, 2012

Correct a typo in the comment explaining hypercalls.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

11393a07

23 Dec, 2012 7 commits

KVM: VMX: remove unneeded temporary variable from vmx_set_segment() · f924d66d

Authored by Gleb Natapov on Dec 12, 2012

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

f924d66d

KVM: VMX: clean-up vmx_set_segment() · 1ecd50a9