Commits · 1d5f066e0b63271b67eac6d3752f8aa96adcbddb · openanolis / cloud-kernel

24 Oct, 2010 7 commits

KVM: x86: Fix a possible backwards warp of kvmclock · 1d5f066e

Committed by Zachary Amsden on Aug 19, 2010

Kernel time, which advances in discrete steps, may progress much slower
than TSC.  As a result, when kvmclock is adjusted to a new base, the
apparent time to the guest, which runs at a much higher, nsec scaled
rate based on the current TSC, may have already been observed to have
a larger value (kernel_ns + scaled tsc) than the value to which we are
setting it (kernel_ns + 0).

We must instead compute the clock as potentially observed by the guest
for kernel_ns to make sure it does not go backwards.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
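The clamp described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the patch: `struct clock_state`, its fields, and both function names are hypothetical, and the TSC-to-nanosecond scaling is reduced to a plain integer multiplier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock state; names are illustrative, not from the patch. */
struct clock_state {
    uint64_t base_ns;     /* clock value at the last rebase */
    uint64_t base_tsc;    /* TSC value at the last rebase */
    uint64_t ns_per_tick; /* simplified integer TSC-to-ns scale (assumption) */
};

/* Time the guest could already have observed at TSC value `tsc`. */
static uint64_t guest_visible_ns(const struct clock_state *cs, uint64_t tsc)
{
    assert(tsc >= cs->base_tsc);
    return cs->base_ns + (tsc - cs->base_tsc) * cs->ns_per_tick;
}

/* Rebase the clock, but never below what the guest may have seen. */
static void rebase_clock(struct clock_state *cs, uint64_t kernel_ns,
                         uint64_t tsc)
{
    uint64_t seen = guest_visible_ns(cs, tsc);

    cs->base_ns = kernel_ns > seen ? kernel_ns : seen;
    cs->base_tsc = tsc;
}
```

If kernel time has advanced only slightly while the scaled TSC ran ahead, the new base is taken from the guest-visible value, so the guest never sees time go backwards.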

KVM: x86: Unify TSC logic · e48672fa

Committed by Zachary Amsden on Aug 19, 2010

Move the TSC control logic from the vendor backends into x86.c
by adding adjust_tsc_offset to x86 ops.  Now all TSC decisions
can be done in one place.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: TSC reset compensation · f38e098f

Committed by Zachary Amsden on Aug 19, 2010

Attempt to synchronize TSCs which are reset to the same value.  In the
case of a reliable hardware TSC, we can just re-use the same offset, but
on non-reliable hardware, we can get closer by adjusting the offset to
match the elapsed time.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Move TSC offset writes to common code · 99e3e30a

Committed by Zachary Amsden on Aug 19, 2010

Also, ensure that the storing of the offset and the reading of the TSC
are never preempted by taking a spinlock.  While the lock is overkill
now, it is useful later in this patch series.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Drop vm_init_tsc · ae38436b

Committed by Zachary Amsden on Aug 19, 2010

This is used only by the VMX code, and is not done properly;
if the TSC is indeed backwards, it is out of sync, and will
need proper handling in the logic at each and every CPU change.
For now, drop this test during init as misguided.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pages · 49d5ca26

Committed by Dave Hansen on Aug 19, 2010

Doing this makes the code much more readable.  That's
borne out by the fact that this patch removes code.  "used"
also happens to be the number that we need to return back to
the slab code when our shrinker gets called.  Keeping this
value as opposed to free makes the next patch simpler.

So, 'struct kvm' is kzalloc()'d.  'struct kvm_arch' is a
structure member (and not a pointer) of 'struct kvm'.  That
means they start out zeroed.  I _think_ they get initialized
properly by kvm_mmu_change_mmu_pages().  But, that only happens
via kvm ioctls.

Another benefit of storing 'used' instead of 'free' is
that the values are consistent from the moment the structure is
allocated: no negative "used" value.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
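The accounting argument above can be illustrated with a small sketch. The struct and function names below are hypothetical and are not taken from the patch; the point is only that a zeroed structure is already consistent when it tracks "used" pages, and that the free count can be derived on demand.

```c
#include <assert.h>

/* Hypothetical accounting state; field names are illustrative only. */
struct mmu_accounting {
    unsigned int max_pages;  /* high watermark: pages that may be allocated */
    unsigned int used_pages; /* pages currently allocated */
};

/* The free count is derived rather than stored. */
static unsigned int mmu_available_pages(const struct mmu_accounting *a)
{
    assert(a->used_pages <= a->max_pages);
    return a->max_pages - a->used_pages;
}

/* Returns 0 on success, -1 if the high watermark has been reached. */
static int mmu_account_alloc(struct mmu_accounting *a)
{
    if (mmu_available_pages(a) == 0)
        return -1;
    a->used_pages++;
    return 0;
}

static void mmu_account_free(struct mmu_accounting *a)
{
    assert(a->used_pages > 0);
    a->used_pages--;
}
```

With a zero-initialized structure, `used_pages` is 0 and no allocation is permitted until the limit is set, so there is no window in which the counters disagree.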

KVM: rename x86 kvm->arch.n_alloc_mmu_pages · 39de71ec

Committed by Dave Hansen on Aug 19, 2010

arch.n_alloc_mmu_pages is a poor choice of name. This value truly
means, "the number of pages which _may_ be allocated".  But,
reading the name, "n_alloc_mmu_pages" implies "the number of allocated
mmu pages", which is dead wrong.

It's really the high watermark, so let's give it a name to match:
nr_max_mmu_pages.  This change will make the next few patches
much more obvious and easy to read.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

20 Oct, 2010 1 commit

KVM: Fix fs/gs reload oops with invalid ldt · 9581d442

Committed by Avi Kivity on Oct 19, 2010

kvm reloads the host's fs and gs blindly, however the underlying segment
descriptors may be invalid due to the user modifying the ldt after loading
them.

Fix by using the safe accessors (loadsegment() and load_gs_index()) instead
of home grown unsafe versions.

This is CVE-2010-3698.

KVM-Stable-Tag.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

02 Aug, 2010 1 commit

KVM: VMX: fix tlb flush with invalid root · dd180b3e

Committed by Xiao Guangrong on Jul 03, 2010

Commit 341d9b535b6c simplified the reload logic on guest entry so that it
can avoid an unnecessary sync-root when KVM_REQ_MMU_RELOAD and
KVM_REQ_MMU_SYNC are both set.

However, it causes a problem: the root can be invalid when we handle
'KVM_REQ_TLB_FLUSH'. This was triggered during my test:

Kernel BUG at ffffffffa00212b8 [verbose debug info unavailable]
......

Fix by returning directly if the root is not ready.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

01 Aug, 2010 14 commits

KVM: Remove unnecessary divide operations · 82855413

Committed by Joerg Roedel on Jul 01, 2010

This patch converts unnecessary divide and modulo operations
in the KVM large page related code into logical operations.
This allows converting gfn_t to u64 without breaking 32-bit
builds.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
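The kind of conversion described here can be sketched as follows. The macro and helper names are hypothetical, and the shift of 9 (512 small pages per large page) is an assumption chosen for illustration. When the divisor is a power of two, a shift and a mask give the same results as division and modulo, and they avoid 64-bit division, which 32-bit kernel builds cannot perform with the plain `/` and `%` operators.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical constants: 512 small pages per large page (assumption). */
#define HPAGE_GFN_SHIFT 9
#define PAGES_PER_HPAGE ((gfn_t)1 << HPAGE_GFN_SHIFT)

/* Index of the large page containing gfn: replaces gfn / PAGES_PER_HPAGE. */
static gfn_t hpage_index(gfn_t gfn)
{
    return gfn >> HPAGE_GFN_SHIFT;
}

/* Offset of gfn within its large page: replaces gfn % PAGES_PER_HPAGE. */
static gfn_t hpage_offset(gfn_t gfn)
{
    return gfn & (PAGES_PER_HPAGE - 1);
}

/* Sanity check that the logical forms match the arithmetic ones. */
static void check_equivalence(gfn_t gfn)
{
    assert(hpage_index(gfn) == gfn / PAGES_PER_HPAGE);
    assert(hpage_offset(gfn) == gfn % PAGES_PER_HPAGE);
}
```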

KVM: VMX: Execute WBINVD to keep data consistency with assigned devices · f5f48ee1

Committed by Sheng Yang on Jun 30, 2010

Some guest device drivers may leverage "Non-Snoop" I/O and explicitly issue
WBINVD or CLFLUSH on a RAM region. Since migration may occur before the WBINVD
or CLFLUSH, we need to maintain data consistency either by:
1: flushing the cache (wbinvd) when the guest is scheduled out, if there is no
wbinvd exit, or
2: executing wbinvd on all dirty physical CPUs when the guest's wbinvd exits.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Remove memory alias support · a1f4d395

Committed by Avi Kivity on Jun 21, 2010

As advertised in feature-removal-schedule.txt.  Equivalent support is provided
by overlapping memory regions.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: don't mark pte notrap if it's just sync transient · be71e061

Committed by Xiao Guangrong on Jun 11, 2010

If the shadow page is only being synced transiently, don't mark its pte notrap.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Fix mov cr3 #GP at wrong instruction · 2390218b

Committed by Avi Kivity on Jun 10, 2010

On Intel, we call skip_emulated_instruction() even if we injected a #GP,
resulting in the #GP pointing at the wrong address.

Fix by injecting the exception and skipping the instruction at the same place,
so we can do just one or the other.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
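The "one or the other" rule in this fix can be sketched as a single helper that either injects the fault or advances the instruction pointer, never both. Everything below is a toy model: `struct toy_vcpu` and `finish_cr_write` are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GP_VECTOR 13 /* x86 general-protection fault vector */

/* Toy vCPU state; hypothetical, for illustration only. */
struct toy_vcpu {
    uint64_t rip;
    bool exception_pending;
    unsigned int exception_vector;
};

/*
 * Complete an emulated control-register write.
 * On error, queue #GP and leave rip pointing at the faulting instruction.
 * On success, skip past the instruction.
 */
static void finish_cr_write(struct toy_vcpu *v, int err, unsigned int insn_len)
{
    assert(insn_len > 0);
    if (err) {
        v->exception_pending = true;
        v->exception_vector = GP_VECTOR;
    } else {
        v->rip += insn_len;
    }
}
```

Because the fault path does not touch `rip`, the #GP is reported at the address of the instruction that caused it.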

KVM: Fix mov cr4 #GP at wrong instruction · a83b29c6

Committed by Avi Kivity on Jun 10, 2010

On Intel, we call skip_emulated_instruction() even if we injected a #GP,
resulting in the #GP pointing at the wrong address.

Fix by injecting the exception and skipping the instruction at the same place,
so we can do just one or the other.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Fix mov cr0 #GP at wrong instruction · 49a9b07e

Committed by Avi Kivity on Jun 10, 2010

On Intel, we call skip_emulated_instruction() even if we injected a #GP,
resulting in the #GP pointing at the wrong address.

Fix by injecting the exception and skipping the instruction at the same place,
so we can do just one or the other.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Enable XSAVE/XRSTOR for guest · 2acf923e

Committed by Dexuan Cui on Jun 10, 2010

This patch enables the guest to use the XSAVE/XRSTOR instructions.

We assume that host_xcr0 uses all the bits that the OS supports.

We load xcr0 the same way we handle the FPU: as late as we can.
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Propagate fpu_alloc errors · 10ab25cd

Committed by Jan Kiszka on May 25, 2010

Memory allocation may fail. Propagate such errors.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Use FPU API · 98918833

Committed by Sheng Yang on May 17, 2010

Convert KVM to use generic FPU API.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Use unlazy_fpu() for host FPU · 7cf30855

Committed by Sheng Yang on May 17, 2010

We can avoid an unnecessary FPU load when the userspace process
doesn't use the FPU frequently.

Derived from Avi's idea.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: inject #UD if instruction emulation fails and exit to userspace · 6d77dbfc

Committed by Gleb Natapov on May 10, 2010

Do not kill VM when instruction emulation fails. Inject #UD and report
failure to userspace instead. Userspace may choose to reenter guest if
vcpu is in userspace (cpl == 3) in which case guest OS will kill
offending process and continue running.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: remove export of emulator_write_emulated() · f181b96d

Committed by Gleb Natapov on Apr 28, 2010

It is not called directly outside of the file it's defined in anymore.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: add (set|get)_dr callbacks to x86_emulate_ops · 35aa5375

Committed by Gleb Natapov on Apr 28, 2010

Add (set|get)_dr callbacks to x86_emulate_ops instead of calling
them directly.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

19 May, 2010 1 commit

KVM: MMU: Segregate shadow pages with different cr0.wp · 3dbe1415

Committed by Avi Kivity on May 12, 2010

When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte
having u/s=0 and r/w=1.  This allows excessive access if the guest sets
cr0.wp=1 and accesses through this spte.

Fix by making cr0.wp part of the base role; we'll have different sptes for
the two cases and the problem disappears.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
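Why putting cr0.wp into the role separates the two cases can be shown with a reduced model. The union below is a hypothetical, heavily trimmed stand-in for a shadow page role (the field layout and all names are assumptions): shadow pages are looked up by comparing the whole role word, so any bit that is part of the role keeps pages built under different settings apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down shadow page role; not the kernel's layout. */
union page_role {
    uint32_t word;
    struct {
        uint32_t level   : 4;
        uint32_t cr4_pae : 1;
        uint32_t cr0_wp  : 1;
    };
};

static union page_role make_role(unsigned int level, bool pae, bool wp)
{
    union page_role r = { .word = 0 }; /* clear unused bits first */

    assert(level < 16);
    r.level = level;
    r.cr4_pae = pae;
    r.cr0_wp = wp;
    return r;
}

/* A cached shadow page is reused only if the entire role word matches. */
static bool role_matches(union page_role a, union page_role b)
{
    return a.word == b.word;
}
```

A page shadowed while cr0.wp=0 then fails the match once the guest sets cr0.wp=1, so the permissive spte is not reused.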

17 May, 2010 15 commits

KVM: x86: Allow marking an exception as reinjected · ce7ddec4

Committed by Joerg Roedel on Apr 22, 2010

This patch adds logic to kvm/x86 that allows marking an
injected exception as reinjected. This makes it possible to remove an
ugly hack from svm_complete_interrupts that prevented
exceptions from being reinjected at all in the nested case.
The hack was necessary because an exception reinjected into
the nested guest could cause a nested vmexit emulation, but
reinjected exceptions must not intercept. The downside of
the hack is that an exception that is injected could get
lost.
This patch fixes the problem and puts the code for it into
generic x86 files, because nested VMX will likely have the
same problem and could reuse the code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Add callback to let modules decide over some supported cpuid bits · d4330ef2

Committed by Joerg Roedel on Apr 22, 2010

This patch adds the get_supported_cpuid callback to
kvm_x86_ops. It will be used in do_cpuid_ent to delegate the
decision about some supported cpuid bits to the
architecture modules.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Drop cr4.pge from shadow page role · 87bc3bf9

Committed by Avi Kivity on Apr 19, 2010

Since commit bf47a760, we no longer handle ptes with the global bit
set specially, so there is no reason to distinguish between shadow pages
created with cr4.pge set and clear.

Such tracking is expensive when the guest toggles cr4.pge, so drop it.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: reduce 'struct kvm_mmu_page' size · 0571d366

Committed by Xiao Guangrong on Apr 16, 2010

Define 'multimapped' as 'bool'.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Replace role.glevels with role.cr4_pae · 5b7e0102

Committed by Avi Kivity on Apr 14, 2010

There is no real distinction between glevels=3 and glevels=4; both have
exactly the same format and are treated exactly the same way by the code.  Drop
role.glevels and replace it with role.cr4_pae (which is meaningful).  This
simplifies the code a bit.

As a side effect, it allows sharing shadow page tables between pae and
longmode guest page tables at the same guest page.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Push potential exception error code on task switches · e269fb21

Committed by Jan Kiszka on Apr 14, 2010

When a fault triggers a task switch, the error code, if existent, has to
be pushed on the new task's stack. Implement the missing bits.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: move DR register access handling into generic code · 020df079

Committed by Gleb Natapov on Apr 13, 2010

Currently both SVM and VMX have their own DR handling code. Move it to
x86.c.
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: remove unused field · f84cbb05

Committed by Xiao Guangrong on Apr 06, 2010

kvm_mmu_page.oos_link is not used, so remove it
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: Move string pio emulation into emulator.c · 7972995b

Committed by Gleb Natapov on Mar 18, 2010

Currently, emulation is done outside the emulator, so things like
ins/outs to/from mmio are broken. It also makes it hard (if not impossible)
to implement single stepping in the future. The implementation in this
patch is not efficient, since it exits to userspace for each IO, while the
previous implementation did 'ins' in batches. A later patch that
implements pio string read-ahead addresses this problem.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: fix in/out emulation. · cf8f70bf

Committed by Gleb Natapov on Mar 18, 2010

in/out emulation is currently broken. The breakage differs depending
on where the IO device resides. If it is in userspace, the emulator reports
an emulation failure, since it incorrectly interprets the kvm_emulate_pio()
return value. If the IO device is in the kernel, emulation of 'in' will do
nothing, since kvm_emulate_pio() stores the result directly into vcpu
registers, and the emulator then overwrites the result of the emulation when
it commits the shadowed registers.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: remove realmode_lmsw function. · 93a152be

Committed by Gleb Natapov on Mar 18, 2010

Use (get|set)_cr callback to emulate lmsw inside emulator.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Provide callback to get/set control registers in emulator ops. · 52a46617

Committed by Gleb Natapov on Mar 18, 2010

Use this callback instead of calling the kvm function directly. Also rename
realmode_(set|get)_cr to emulator_(set|get)_cr, since the function has nothing
to do with real mode.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Remove pointer to rflags from realmode_set_cr parameters. · 49c6799a

Committed by Gleb Natapov on Mar 15, 2010

Mov reg, cr instruction doesn't change flags in any meaningful way, so
no need to update rflags after instruction execution.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Reinstate pte prefetch on invlpg · 08e850c6

Committed by Avi Kivity on Mar 15, 2010

Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:

  "The processor may create entries in paging-structure caches for
   translations required for prefetches and for accesses that are a
   result of speculative execution that would never actually occur
   in the executed code path."

And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.

Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter.  If a race occurred, do not install the pte.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
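The counter-based race check can be sketched with a toy model. All names below (`struct toy_mmu`, `prefetch_begin`, `prefetch_install`) are hypothetical and the locking is only described in comments; the sketch shows the pattern of snapshotting a generation counter before the unlocked read and validating it before installing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy MMU state with a single shadow pte; hypothetical, for illustration. */
struct toy_mmu {
    unsigned long invlpg_counter; /* bumped on every guest invlpg */
    uint64_t spte;
};

/* Guest invlpg: drop the shadow pte and advance the counter. */
static void guest_invlpg(struct toy_mmu *m)
{
    m->invlpg_counter++;
    m->spte = 0;
}

/* Step 1 (lock not held): snapshot the counter before reading the gpte. */
static unsigned long prefetch_begin(const struct toy_mmu *m)
{
    return m->invlpg_counter;
}

/*
 * Step 2 (would run under the mmu lock): install the prefetched pte only
 * if no other invlpg ran since the snapshot was taken.
 */
static bool prefetch_install(struct toy_mmu *m, unsigned long snapshot,
                             uint64_t gpte)
{
    assert(snapshot <= m->invlpg_counter); /* counter only grows here */
    if (m->invlpg_counter != snapshot)
        return false; /* raced with a newer invlpg: discard stale pte */
    m->spte = gpte;
    return true;
}
```

A stale pte read before a newer invlpg is rejected because the counter no longer matches the snapshot.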

KVM: x86: Use native_store_idt() instead of kvm_get_idt() · ec68798c

Committed by Wei Yongjun on Mar 05, 2010

This patch uses the generic Linux function native_store_idt()
instead of kvm_get_idt(), and also removes the now-useless
function kvm_get_idt().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

25 Apr, 2010 1 commit

KVM: move segment_base() into vmx.c · 2d49ec72

Committed by Gleb Natapov on Feb 25, 2010

segment_base() is used only by vmx so move it there.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
