Commits · 5d55f299f97769130c6cc67896414c988db309ab · openeuler / Kernel

02 Aug, 2010 11 commits

KVM: x86 emulator: re-implementing 'mov AL,moffs' instruction decoding · 5d55f299

Committed by Wei Yongjun on Jul 07, 2010

This patch changes the decoding of 'mov AL, moffs' to use DstAcc
and introduces SrcAcc for decoding 'mov moffs, AL'.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

5d55f299

KVM: x86 emulator: fix cli/sti instruction emulation · 07cbc6c1

Committed by Wei Yongjun on Jul 06, 2010

If the IOPL check fails, the cli/sti emulation raises #GP, and we should
then skip writeback, since the default writeback operand type is OP_REG.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

07cbc6c1

KVM: x86 emulator: fix 'mov rm,sreg' instruction decoding · b16b2b7b

Committed by Wei Yongjun on Jul 06, 2010

The source operand of 'mov rm,sreg' is a segment register, not a
general-purpose register, so remove SrcReg from the decoding.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b16b2b7b

KVM: x86 emulator: fix 'and AL,imm8' instruction decoding · e97e883f

Committed by Wei Yongjun on Jul 06, 2010

'and AL,imm8' should be marked as ByteOp; otherwise the destination operand
length will not be correct and we may fill the full EAX on writeback.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e97e883f

KVM: x86 emulator: fix the comment of out instruction · ce7a0ad3

Committed by Wei Yongjun on Jul 06, 2010

Fix the comment for the out instruction, using the same style as the
other instructions.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

ce7a0ad3

KVM: x86 emulator: fix 'mov sreg,rm16' instruction decoding · a5046e6c

Committed by Wei Yongjun on Jul 06, 2010

Memory reads for 'mov sreg,rm16' should be 16 bits only.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a5046e6c

KVM: MMU: Don't drop accessed bit while updating an spte · b79b93f9

Committed by Avi Kivity on Jun 06, 2010

__set_spte() will happily replace an spte with the accessed bit set with
one that has the accessed bit clear.  Add a helper update_spte() which checks
for this condition and updates the page flag if needed.
Signed-off-by: Avi Kivity <avi@redhat.com>

b79b93f9

KVM: MMU: Atomically check for accessed bit when dropping an spte · a9221dd5

Committed by Avi Kivity on Jun 06, 2010

Currently, in the window between the check for the accessed bit and actually
dropping the spte, a vcpu can access the page through the spte and set the bit,
which will be ignored by the mmu.

Fix by using an exchange operation to atomically fetch the spte and drop it.
Signed-off-by: Avi Kivity <avi@redhat.com>

a9221dd5
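
To illustrate the race described in this commit message, here is a minimal user-space sketch using C11 atomics. It is not the kernel implementation: the names drop_spte_racy() and drop_spte_atomic(), the SPTE_ACCESSED bit position, and the all-zero nonpresent value are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define SPTE_ACCESSED   (UINT64_C(1) << 5)
#define SPTE_NONPRESENT UINT64_C(0)

/*
 * Racy variant: a concurrent writer can set SPTE_ACCESSED after the load
 * but before the store, and that update is then silently lost.
 */
bool drop_spte_racy(_Atomic uint64_t *sptep)
{
    uint64_t old = atomic_load(sptep);
    bool accessed = (old & SPTE_ACCESSED) != 0;

    /* <-- window: the accessed bit may be set here and never observed */
    atomic_store(sptep, SPTE_NONPRESENT);
    return accessed;
}

/*
 * Atomic variant: a single exchange both clears the entry and returns the
 * value that was actually replaced, so no update can slip in between.
 */
bool drop_spte_atomic(_Atomic uint64_t *sptep)
{
    uint64_t old = atomic_exchange(sptep, SPTE_NONPRESENT);

    return (old & SPTE_ACCESSED) != 0;
}
```

The caller would use the returned flag to propagate the accessed state to the backing page before the old entry is forgotten.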

KVM: MMU: Move accessed/dirty bit checks from rmap_remove() to drop_spte() · ce061867

Committed by Avi Kivity on Jun 06, 2010

Since we need to make the check atomic, move it to the place that will
set the new spte.
Signed-off-by: Avi Kivity <avi@redhat.com>

ce061867

KVM: MMU: Introduce drop_spte() · be38d276

Committed by Avi Kivity on Jun 06, 2010

When we call rmap_remove(), we (almost) always immediately follow it by
an __set_spte() to a nonpresent pte.  Since we need to perform the two
operations atomically, to avoid losing the dirty and accessed bits, introduce
a helper drop_spte() and convert all call sites.

The operation is still nonatomic at this point.
Signed-off-by: Avi Kivity <avi@redhat.com>

be38d276

KVM: VMX: fix tlb flush with invalid root · dd180b3e

Committed by Xiao Guangrong on Jul 03, 2010

Commit 341d9b535b6c simplified the reload logic on guest entry; it
avoids an unnecessary root sync if KVM_REQ_MMU_RELOAD and
KVM_REQ_MMU_SYNC are both set.

However, it causes an issue: the root may be invalid when we handle
KVM_REQ_TLB_FLUSH. This was triggered during my test:

Kernel BUG at ffffffffa00212b8 [verbose debug info unavailable]
......

Fix by returning directly if the root is not ready.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

dd180b3e

01 Aug, 2010 29 commits

KVM: Remove unnecessary divide operations · 82855413

Committed by Joerg Roedel on Jul 01, 2010

This patch converts unnecessary divide and modulo operations
in the KVM large page related code into logical operations.
This allows converting gfn_t to u64 without breaking 32-bit
builds.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

82855413
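
As a rough illustration of the conversion this commit describes: when the divisor is a power of two, a divide becomes a right shift and a modulo becomes a mask. On 32-bit builds a 64-bit divide by a non-constant typically turns into a call to a compiler helper routine, which is the likely reason the divides had to go before gfn_t could become a 64-bit type. The helper names below (hpage_index(), hpage_offset(), hpage_aligned()) and the shift parameter are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

typedef uint64_t gfn_t;

/*
 * 'shift' is log2 of the number of small pages per large page,
 * e.g. 9 for 512 small pages. It must be less than 64.
 */

/* Equivalent to gfn / (1 << shift), without a 64-bit divide. */
gfn_t hpage_index(gfn_t gfn, unsigned int shift)
{
    return gfn >> shift;
}

/* Equivalent to gfn % (1 << shift), without a 64-bit modulo. */
gfn_t hpage_offset(gfn_t gfn, unsigned int shift)
{
    return gfn & ((UINT64_C(1) << shift) - 1);
}

/* True when gfn is the first small page of a large page. */
int hpage_aligned(gfn_t gfn, unsigned int shift)
{
    return hpage_offset(gfn, shift) == 0;
}
```

The trick only works for power-of-two page counts, which holds for x86 large pages.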

KVM: MMU: cleanup FNAME(fetch)() functions · 84754cd8

Committed by Xiao Guangrong on Jun 30, 2010

Clean up this function, since we already have the direct sp's access.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

84754cd8

KVM: MMU: fix direct sp's access corrupted · 9e7b0e7f

Committed by Xiao Guangrong on Jun 30, 2010

If the mapping is writable but the dirty flag is not set, we will find
the read-only direct sp and set up the mapping. Then, if a write #PF
occurs, we will mark this mapping writable in the read-only direct sp,
and other truly read-only mappings will happily write to it without a #PF.

This may break the guest's COW.

Fix by re-installing the mapping when a write #PF occurs.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9e7b0e7f

KVM: MMU: fix conflict access permissions in direct sp · 5fd5387c

Committed by Xiao Guangrong on Jun 30, 2010

With non-direct mapping, we mark the sp as 'direct' when we map a
guest large page, but its access is encoded from the upper page
structures only and does not include the last mapping, which causes
access conflicts.

For example, consider this mapping:
        [W]
      / PDE1 -> |---|
  P[W]          |   | LPA
      \ PDE2 -> |---|
        [R]

P has two children, PDE1 and PDE2, and both map the same large
page (LPA). P's access is WR, PDE1's access is WR, and PDE2's
access is RO (considering only read-write permissions here).

When the guest accesses PDE1, we create a direct sp for LPA. The sp's
access comes from P and is W, so we mark the ptes in this sp as W.

Then the guest accesses PDE2. We find LPA's shadow page, which is the
same one as PDE1's, and mark the ptes as RO.

So if the guest accesses PDE1 again, an incorrect #PF occurs.

Fix by encoding the last mapping's access into the direct shadow page.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

5fd5387c

KVM: MMU: fix writable sync sp mapping · 36a2e677

Committed by Xiao Guangrong on Jun 30, 2010

While syncing many unsync sps at one time (in mmu_sync_children()),
we may map an spte writable. This is dangerous if the gfn mapped by
one unsync sp is another unsync page's gfn.

For example:

SP1.pte[0] = P
SP2.gfn's pfn = P
[SP1.pte[0] = SP2.gfn's pfn]

First, we write-protect SP1 and SP2, but SP1 and SP2 are still
unsync sps.

Then we sync SP1 first. It detects that SP1.pte[0].gfn has only one
unsync sp, namely SP2, so it maps it writable. But we plan to sync SP2 soon,
and at this point SP2->unsync is not reliable, since we later sync SP2 while
SP2->gfn is already writable.

So the final result is: SP2 is a sync page but SP2.gfn is writable.

This bug will corrupt the guest's page table. Fix by marking the mapping
read-only if the mapped gfn has shadow pages.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

36a2e677

KVM: VMX: Execute WBINVD to keep data consistency with assigned devices · f5f48ee1

Committed by Sheng Yang on Jun 30, 2010

Some guest device drivers may leverage "Non-Snoop" I/O and explicitly issue
WBINVD or CLFLUSH on a RAM range. Since migration may occur before WBINVD or
CLFLUSH, we need to maintain data consistency either by:
1: flushing the cache (wbinvd) when the guest is scheduled out if there is no
wbinvd exit, or
2: executing wbinvd on all dirty physical CPUs when the guest's wbinvd exits.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f5f48ee1

KVM: Simplify vcpu_enter_guest() mmu reload logic slightly · 3e007509

Committed by Avi Kivity on Jun 23, 2010

No need to reload the mmu in between two different vcpu->requests checks.

kvm_mmu_reload() may trigger KVM_REQ_TRIPLE_FAULT, but that will be caught
during atomic guest entry later.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3e007509

KVM: Search the LAPIC's for one that will accept a PIC interrupt · 529df65e

Committed by Chris Lalancette on Jun 21, 2010

Older versions of 32-bit Linux have a "Checking 'hlt' instruction"
test where they repeatedly call the 'hlt' instruction, and then
expect a timer interrupt to kick the CPU out of halt.  This happens
before any LAPIC or IOAPIC setup happens, which means that all of
the APICs are in virtual wire mode at this point.  Unfortunately,
the current implementation of virtual wire mode is hardcoded to
only kick the BSP, so if a crash+kexec occurs on a different
vcpu, it will never get kicked.

This patch makes pic_unlock() do the equivalent of
kvm_irq_delivery_to_apic() for the IOAPIC code.  That is, it runs
through all of the vcpus looking for one that is in virtual wire
mode.  In the normal case where LAPICs and IOAPICs are configured,
this won't be used at all.  In the bootstrap phase of a modern
OS, before the LAPICs and IOAPICs are configured, this will have
exactly the same behavior as today; VCPU0 is always looked at
first, so it will always get out of the loop after the first
iteration.  This will only go through the loop more than once
during a kexec/kdump, in which case it will only do it a few times
until the kexec'ed kernel programs the LAPIC and IOAPIC.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

529df65e

KVM: x86: Enable AVX for guest · 6c3f6041

Committed by Sheng Yang on Jun 22, 2010

Enable Intel(R) Advanced Vector Extensions (AVX) for guests.

Detection of the AVX feature includes testing the OSXSAVE bit. When the
OSXSAVE bit is not set, AVX instructions result in #UD even if AVX is
supported. So it is safe to expose the AVX bits to the guest directly.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

6c3f6041

KVM: Prevent internal slots from being COWed · 7ac77099

Committed by Avi Kivity on Jun 21, 2010

If a process with a memory slot is COWed, the page will change its address
(despite having an elevated reference count).  This breaks internal memory
slots which have their physical addresses loaded into vmcs registers (see
the APIC access memory slot).
Signed-off-by: Avi Kivity <avi@redhat.com>

7ac77099

KVM: Add mini-API for vcpu->requests · a8eeb04a

Committed by Avi Kivity on May 10, 2010

Makes it a little more readable and hackable.
Signed-off-by: Avi Kivity <avi@redhat.com>

a8eeb04a

KVM: i8259: simplify pic_irq_request() calling sequence · 36633f32

Committed by Avi Kivity on May 03, 2010

Signed-off-by: Avi Kivity <avi@redhat.com>

36633f32

KVM: i8259: reduce excessive abstraction for pic_irq_request() · 073d4613

Committed by Avi Kivity on May 03, 2010

Part of the i8259 code pretends it isn't part of kvm, but we know better.
Reduce excessive abstraction, eliminating callbacks and void pointers.
Signed-off-by: Avi Kivity <avi@redhat.com>

073d4613

KVM: Remove kernel-allocated memory regions · b74a07be

Committed by Avi Kivity on Jun 21, 2010

Equivalent (and better) functionality is provided by user-allocated memory
regions.
Signed-off-by: Avi Kivity <avi@redhat.com>

b74a07be

KVM: Remove memory alias support · a1f4d395

Committed by Avi Kivity on Jun 21, 2010

As advertised in feature-removal-schedule.txt.  Equivalent support is provided
by overlapping memory regions.
Signed-off-by: Avi Kivity <avi@redhat.com>

a1f4d395

KVM: Consolidate load/save temporary buffer allocation and freeing · d1ac91d8

Committed by Avi Kivity on Jun 20, 2010

Instead of three temporary variables and three free calls, have one temporary
variable (with four names) and one free call.
Signed-off-by: Avi Kivity <avi@redhat.com>

d1ac91d8
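
One plausible reading of "one temporary variable (with four names)" is a union of typed pointers plus a generic void pointer, so that every path allocates through its own typed member while a single free() at the end releases whichever one was used. The sketch below shows that pattern in user-space C. The struct names, the state_kind enum, and save_state() are hypothetical, and the code assumes that all object pointers share one representation, which holds on common platforms.

```c
#include <stdlib.h>

/* Hypothetical state blobs of different sizes. */
struct regs_state  { unsigned char bytes[16]; };
struct fpu_state   { unsigned char bytes[32]; };
struct xsave_state { unsigned char bytes[64]; };

enum state_kind { STATE_REGS, STATE_FPU, STATE_XSAVE };

/* Returns the size of the buffer that was handled, or -1 on failure. */
long save_state(enum state_kind kind)
{
    /* One temporary with four names: three typed views and a generic one. */
    union {
        struct regs_state *regs;
        struct fpu_state *fpu;
        struct xsave_state *xsave;
        void *buffer;
    } u;
    long r = -1;

    u.buffer = NULL;

    switch (kind) {
    case STATE_REGS:
        u.regs = calloc(1, sizeof(*u.regs));
        if (!u.regs)
            break;
        r = (long)sizeof(*u.regs);
        break;
    case STATE_FPU:
        u.fpu = calloc(1, sizeof(*u.fpu));
        if (!u.fpu)
            break;
        r = (long)sizeof(*u.fpu);
        break;
    case STATE_XSAVE:
        u.xsave = calloc(1, sizeof(*u.xsave));
        if (!u.xsave)
            break;
        r = (long)sizeof(*u.xsave);
        break;
    default:
        break;
    }

    /* Single cleanup point; free(NULL) is a no-op. */
    free(u.buffer);
    return r;
}
```

Because every exit path runs through the same free() call, a newly added case cannot forget its cleanup, which is the kind of leak the neighbouring xsave/xcr fix addresses.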

KVM: Fix xsave and xcr save/restore memory leak · a1a005f3

Committed by Avi Kivity on Jun 20, 2010

We allocate temporary kernel buffers for these structures, but never free them.
Signed-off-by: Avi Kivity <avi@redhat.com>

a1a005f3

KVM: x86 emulator: fix group3 instruction decoding · 7d5993d6

Committed by Wei Yongjun on Jun 17, 2010

A group 3 instruction with a ModRM reg field of 001 is
defined as a test instruction on the AMD architecture, and
emulate_grp3() is already able to emulate it, so fix the
decoding.

static inline int emulate_grp3(...)
{
	...
	switch (c->modrm_reg) {
	case 0 ... 1:   /* test */
		emulate_2op_SrcV("test", c->src, c->dst, ctxt->eflags);
	...
}
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

7d5993d6

KVM: x86: Allow any LAPIC to accept PIC interrupts · e7dca5c0

Committed by Chris Lalancette on Jun 16, 2010

If the guest wants to accept timer interrupts on a CPU other
than the BSP, we need to remove this gate.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e7dca5c0

KVM: x86: Introduce a workqueue to deliver PIT timer interrupts · 33572ac0

Committed by Chris Lalancette on Jun 16, 2010

We really want to call kvm_set_irq() during the hrtimer callback,
but that is risky because the callback runs in interrupt context.
Instead, offload the work to a workqueue, which is a bit safer
and should provide most of the same functionality.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

33572ac0

KVM: x86 emulator: fix pusha instruction emulation · c37eda13

Committed by Wei Yongjun on Jun 15, 2010

The pusha emulation only writes back the last register (EDI);
the other registers that need to be written back are ignored.
This patch fixes that.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c37eda13

KVM: x86: fix -DDEBUG oops · bd371396

Committed by Zachary Amsden on Jun 14, 2010

Fix a slight error with an assertion in the local APIC code.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

bd371396

KVM: MMU: don't walk every parent pages while mark unsync · 1047df1f

Committed by Xiao Guangrong on Jun 11, 2010

When we mark the parent's unsync_child_bitmap, if the parent is already
unsynced there is no need to walk its parents, which avoids some
unnecessary work.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1047df1f
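
The early exit described in this commit message can be sketched as follows. This is a simplified model, not the kernel code: each page has a single parent here (real shadow pages can have several), and struct shadow_page with its fields and mark_parents_unsync() are hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical shadow page with one parent. */
struct shadow_page {
    struct shadow_page *parent;       /* NULL for the root */
    unsigned int slot_in_parent;      /* index of this page in its parent */
    unsigned long unsync_child_bitmap;
    unsigned int unsync_children;
};

/*
 * Mark 'sp' as an unsync child in its ancestors.
 * Returns the number of ancestors visited, to make the saving visible.
 */
unsigned int mark_parents_unsync(struct shadow_page *sp)
{
    unsigned int visited = 0;

    while (sp->parent) {
        struct shadow_page *p = sp->parent;
        unsigned long bit = 1UL << sp->slot_in_parent;
        /* If p already tracked an unsync child, its ancestors are marked. */
        bool already_marked = p->unsync_children != 0;

        visited++;
        if (!(p->unsync_child_bitmap & bit)) {
            p->unsync_child_bitmap |= bit;
            p->unsync_children++;
        }
        if (already_marked)
            break;  /* no need to walk further up */
        sp = p;
    }
    return visited;
}
```

With two leaves under the same mid-level page, marking the first leaf walks all the way to the root, while marking the second stops at the mid-level page.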

KVM: MMU: clear unsync_child_bitmap completely · 7a8f1a74

Committed by Xiao Guangrong on Jun 11, 2010

In the current code, some pages' unsync_child_bitmap is not cleared completely
in mmu_sync_children(); for example, if two PDPEs share one PDT, one
PDPE's unsync_child_bitmap is not cleared.

Currently this does no harm beyond a little extra overhead, but it is
preparatory work for a later patch.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7a8f1a74

KVM: MMU: cleanup for __mmu_unsync_walk() · ebdea638

Committed by Xiao Guangrong on Jun 11, 2010

Decrease sp->unsync_children after clearing the unsync_child_bitmap bit.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ebdea638

KVM: MMU: don't mark pte notrap if it's just sync transient · be71e061

Committed by Xiao Guangrong on Jun 11, 2010

If the sync-sp is only synced transiently, don't mark its pte notrap.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

be71e061

KVM: MMU: avoid double write protected in sync page path · f918b443

Committed by Xiao Guangrong on Jun 11, 2010

The sync page is already write-protected in mmu_sync_children(); don't
write-protect it again.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f918b443

KVM: MMU: cleanup for dirty page judgment · cb83cad2

Committed by Xiao Guangrong on Jun 11, 2010

Use a wrapper function to clean up the page-dirty check.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

cb83cad2

KVM: MMU: rename 'page' and 'shadow_page' to 'sp' · ac3cd03c

Committed by Xiao Guangrong on Jun 11, 2010

Rename 'page' and 'shadow_page' to 'sp' to better fit the context.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ac3cd03c
