Commits · 257090f70233084488f7b3ebe99be8c159a23281 · openeuler / Kernel

11 Feb, 2013 2 commits

KVM: VMX: disable apicv by default · 257090f7

Committed by Yang Zhang on Feb 10, 2013

Without Posted Interrupt, the current code is broken. Just disable it by
default until Posted Interrupt is ready.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

257090f7

KVM: s390: Fix handling of iscs. · 79fd50c6

Committed by Cornelia Huck on Feb 07, 2013

There are two ways to express an interruption subclass:
- As a bitmask, as used in cr6.
- As a number, as used in the I/O interruption word.

Unfortunately, we have treated the I/O interruption word as if it
contained the bitmask as well, which went unnoticed so far as
- (not-yet-released) qemu made the same mistake, and
- Linux guest kernels don't check the isc value in the I/O interruption
  word for subchannel interrupts.

Make sure that we treat the I/O interruption word correctly.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

79fd50c6
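
The two encodings described in 79fd50c6 can be related as in the sketch below. It assumes the isc number sits in bits 0x38000000 of the I/O interruption word and that isc 0 corresponds to the leftmost bit of an 8-bit mask; the helper names are illustrative and not taken from the patch.

```c
#include <stdint.h>

/* Assumed position of the 3-bit isc number in the I/O interruption word. */
#define IO_INT_WORD_ISC_MASK  0x38000000u
#define IO_INT_WORD_ISC_SHIFT 27

/* Extract the isc as a number (0..7) from the I/O interruption word. */
static inline uint8_t io_int_word_to_isc(uint32_t int_word)
{
    return (uint8_t)((int_word & IO_INT_WORD_ISC_MASK) >> IO_INT_WORD_ISC_SHIFT);
}

/* Convert an isc number into a one-bit mask (isc 0 -> 0x80, isc 7 -> 0x01). */
static inline uint8_t isc_to_mask(uint8_t isc)
{
    return (uint8_t)(0x80u >> (isc & 7u));
}
```

Comparing a number against a bitmask only works after one side has been converted, for example `isc_to_mask(io_int_word_to_isc(word)) & enabled_mask`, where `enabled_mask` is a hypothetical copy of the relevant cr6 bits.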

07 Feb, 2013 5 commits

KVM: MMU: cleanup __direct_map · 24db2734

Committed by Xiao Guangrong on Feb 05, 2013

Use link_shadow_page to link the sp to the spte in __direct_map.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

24db2734

KVM: MMU: remove pt_access in mmu_set_spte · f7616203

Committed by Xiao Guangrong on Feb 05, 2013

It is only used in debug code, so drop it.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f7616203

KVM: MMU: cleanup mapping-level · 55dd98c3

Committed by Xiao Guangrong on Feb 05, 2013

Use min() to clean up mapping_level.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

55dd98c3

KVM: MMU: lazily drop large spte · caf6900f

Committed by Xiao Guangrong on Feb 05, 2013

Currently, KVM zaps a large spte when it needs to be write-protected, so a
later read faults on that spte. Instead, we can make the large spte read-only
rather than not present, which avoids the page fault caused by read access.

The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter.  This removes the need for the return value.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

caf6900f

KVM: VMX: cleanup vmx_set_cr0(). · 5037878e

Committed by Gleb Natapov on Feb 04, 2013

When calculating hw_cr0, the current code masks bits that should always be
on and re-adds them immediately after. Clean up the code by masking
only those bits that should be dropped from hw_cr0. This allows us to
get rid of some defines.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

5037878e

06 Feb, 2013 4 commits

KVM: VMX: add missing exit names to VMX_EXIT_REASONS array · b0da5bec

Committed by Gleb Natapov on Feb 03, 2013

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

b0da5bec

KVM: VMX: disable SMEP feature when guest is in non-paging mode · c08800a5

Committed by Dongxiao Xu on Feb 04, 2013

SMEP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMEP needs to be manually
disabled when guest switches to non-paging mode.

We hit an issue where an SMP Linux guest with a recent kernel (one with
SMEP support enabled, for example 3.5.3) would crash with a triple fault
when unrestricted_guest=0 was set. This is because KVM uses an identity-mapped
page table to emulate non-paging mode, and that page table is set with
the USER flag. If SMEP is still enabled in this case, the guest hits an
unhandleable page fault and then crashes.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

c08800a5
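
The rule described in c08800a5 can be sketched as follows. This is a simplified illustration, not the patch itself: the helper name is hypothetical, and only the SMEP bit is handled, whereas the real CR4 computation involves other bits as well. The bit positions used (CR0.PG is bit 31, CR4.SMEP is bit 20) are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PG   (1ull << 31) /* CR0.PG: paging enabled */
#define X86_CR4_SMEP (1ull << 20) /* CR4.SMEP: supervisor mode execution prevention */

/*
 * Hypothetical helper: derive the CR4 value loaded into hardware from the
 * guest's view of CR0/CR4. When the guest has paging disabled, the
 * hypervisor still runs it on a page table whose entries carry the USER
 * flag, so SMEP must be cleared to avoid spurious faults.
 */
static inline uint64_t hw_cr4_for_guest(uint64_t guest_cr0, uint64_t guest_cr4)
{
    uint64_t hw_cr4 = guest_cr4;
    bool guest_paging = (guest_cr0 & X86_CR0_PG) != 0;

    if (!guest_paging)
        hw_cr4 &= ~X86_CR4_SMEP;

    return hw_cr4;
}
```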

KVM: Remove duplicate text in api.txt · 4293b5e5

Committed by Geoff Levand on Jan 31, 2013

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

4293b5e5

Revert "KVM: MMU: split kvm_mmu_free_page" · 834be0d8

Committed by Gleb Natapov on Jan 30, 2013

This reverts commit bd4c86ea.

There is no user of kvm_mmu_isolate_page() any more.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

834be0d8

05 Feb, 2013 7 commits

KVM: MMU: drop superfluous is_present_gpte() check. · eb3fce87

Committed by Gleb Natapov on Jan 30, 2013

The guest page walker puts only present ptes into the ptes[] array. No need
to check again.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

eb3fce87

KVM: MMU: drop superfluous min() call. · 116eb3d3

Committed by Gleb Natapov on Jan 30, 2013

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

116eb3d3

KVM: MMU: set base_role.nxe during mmu initialization. · 2c9afa52

Committed by Gleb Natapov on Jan 30, 2013

Move base_role.nxe initialisation to where all other roles are initialized.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

2c9afa52

KVM: MMU: drop unneeded checks. · 9bb4f6b1

Committed by Gleb Natapov on Jan 30, 2013

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

9bb4f6b1

KVM: MMU: make spte_is_locklessly_modifiable() more clear · feb3eb70

Committed by Gleb Natapov on Jan 30, 2013

spte_is_locklessly_modifiable() checks that both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are present on the spte. Make it more explicit.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

feb3eb70
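
An explicit form of the check described in feb3eb70 might look like the sketch below. The flag names come from the commit message; the bit positions assigned to them here are placeholders for illustration and are an assumption, as is the exact shape of the function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, chosen only for this illustration. */
#define SPTE_HOST_WRITEABLE (1ull << 10)
#define SPTE_MMU_WRITEABLE  (1ull << 11)

/* True only when BOTH writeable flags are set on the spte. */
static inline bool spte_is_locklessly_modifiable(uint64_t spte)
{
    const uint64_t both = SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE;

    return (spte & both) == both;
}
```

Comparing the masked value against the full mask, rather than testing for non-zero, is what makes the "both bits" requirement visible at a glance.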

KVM: set_memory_region: Disallow changing read-only attribute later · 75d61fbc

Committed by Takuya Yoshikawa on Jan 30, 2013

As Xiao pointed out, there are a few problems with it:
 - kvm_arch_commit_memory_region() write protects the memory slot only
   for GET_DIRTY_LOG when modifying the flags.
 - FNAME(sync_page) uses the old spte value to set a new one without
   checking KVM_MEM_READONLY flag.

Since we flush all shadow pages when creating a new slot, the simplest
fix is to disallow such problematic flag changes: this is safe because
no one is doing such things.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

75d61fbc

KVM: set_memory_region: Identify the requested change explicitly · f64c0398

Committed by Takuya Yoshikawa on Jan 29, 2013

KVM_SET_USER_MEMORY_REGION forces __kvm_set_memory_region() to identify
what kind of change is being requested by checking the arguments. The
current code does this checking at various points in code and each
condition being used there is not easy to understand at first glance.

This patch consolidates these checks and introduces an enum to name the
possible changes to clean up the code.

Although this does not introduce any functional changes, there is one
change which optimizes the code a bit: if we have nothing to change, the
new code returns 0 immediately.

Note that the return value for this case cannot be changed since QEMU
relies on it: we noticed this when we changed it to -EINVAL and got a
section mismatch error at the final stage of live migration.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f64c0398
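
The consolidation described in f64c0398 could be sketched roughly as below. All names here (`mr_change`, `slot_desc`, `classify_change`) are hypothetical stand-ins, and the sketch omits the validation and error paths a real implementation needs (for instance, it does not reject size changes or deletion of a non-existent slot).

```c
#include <stdint.h>

/* Hypothetical names for the kinds of change a request can represent. */
enum mr_change {
    MR_NONE,       /* nothing to change: caller can return 0 immediately */
    MR_CREATE,
    MR_DELETE,
    MR_MOVE,
    MR_FLAGS_ONLY
};

/* Minimal stand-in for the fields that identify a memory slot. */
struct slot_desc {
    uint64_t base_gfn;
    uint64_t npages;   /* 0 means "no slot" */
    uint32_t flags;
};

/* Decide once, up front, what the request is asking for. */
static inline enum mr_change classify_change(const struct slot_desc *old_slot,
                                             const struct slot_desc *req)
{
    if (req->npages == 0)
        return old_slot->npages ? MR_DELETE : MR_NONE;
    if (old_slot->npages == 0)
        return MR_CREATE;
    if (req->base_gfn != old_slot->base_gfn)
        return MR_MOVE;
    if (req->flags != old_slot->flags)
        return MR_FLAGS_ONLY;
    return MR_NONE;
}
```

With the change named once, later code can switch on the enum instead of re-deriving the case from raw arguments at each point.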

30 Jan, 2013 3 commits

s390/kvm: Fix instruction decoding · 0c29b229

Committed by Christian Borntraeger on Jan 25, 2013

Instructions with long displacement have a signed displacement.
Currently the sign bit is interpreted as 2^20. Let's fix it by doing the
sign extension from 20 bits to 32 bits and then using it as a signed variable
in the addition (see kvm_s390_get_base_disp_rsy).

Furthermore, there are lots of "int" in that code. This is problematic,
because shifting on a signed integer is undefined/implementation defined
if the bit value happens to be negative.
Fortunately the promotion rules will make the right hand side unsigned
anyway, so there is no real problem right now.
Let's convert them anyway to unsigned where appropriate to avoid
problems if the code is changed or copy/pasted later on.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

0c29b229
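
The sign extension described in 0c29b229 can be illustrated with the sketch below. The helper names are hypothetical and the displacement is assumed to have already been assembled into the low 20 bits of a 32-bit value; how the bits are gathered from the instruction is not shown.

```c
#include <stdint.h>

/*
 * Interpret the low 20 bits of `disp` as a signed two's-complement value.
 * Bit 19 is the sign bit: when set, the value is negative.
 */
static inline int32_t sign_extend20(uint32_t disp)
{
    int32_t value = (int32_t)(disp & 0xfffffu);

    if (value & 0x80000)
        value -= 0x100000;

    return value;
}

/* Add the signed displacement to a base address (wraps modulo 2^64). */
static inline uint64_t effective_address(uint64_t base, uint32_t disp20)
{
    return base + (uint64_t)(int64_t)sign_extend20(disp20);
}
```

Without the extension, a displacement of 0xfffff would be added as a large positive offset instead of -1.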

s390/virtio-ccw: Fix setup_vq error handling. · c98d3683

Committed by Cornelia Huck on Jan 25, 2013

virtio_ccw_setup_vq() failed to unwind correctly on errors. In
particular, it failed to delete the virtqueue on errors, leading to
list corruption when virtio_ccw_del_vqs() iterated over a virtqueue
that had not been added to the vcdev's list.

Fix this by redoing the error unwinding in virtio_ccw_setup_vq(),
using a single path for all errors.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c98d3683

s390/kvm: Fix store status for ACRS/FPRS · 15bc8d84

Committed by Christian Borntraeger on Jan 25, 2013

On store status we need to copy the current state of registers
into a save area. Currently we might save stale versions:
The sie state descriptor doesn't have fields for guest ACRS/FPRS;
those registers are simply stored in the host registers. The host
program must copy these away if needed. We do that in vcpu_put/load.

If we now do a store status in KVM code between vcpu_put/load, the
saved values are not up-to-date. Let's collect the ACRS/FPRS before
saving them.

This also fixes some strange problems with hotplug and virtio-ccw,
since the low level machine check handler (on hotplug a machine check
will happen) will revalidate all registers with the content of the
save area.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
CC: stable@vger.kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>

15bc8d84

29 Jan, 2013 5 commits

kvm: Handle yield_to failure return code for potential undercommit case · c45c528e

Committed by Raghavendra K T on Jan 22, 2013

yield_to returns -ESRCH when the run queue length of both the source and
the target of yield_to is one. When we see three successive failures of
yield_to, we assume we are in a potential undercommit case and abort
from the PLE handler.
The assumption is backed by the low probability of a wrong decision
even in worst-case scenarios, such as an average runqueue length
between 1 and 2.

More detail on rationale behind using three tries:
if p is the probability of finding rq length one on a particular cpu,
and if we do n tries, then probability of exiting ple handler is:

 p^(n+1) [ because we would have come across one source with rq length
1 and n target cpu rqs  with length 1 ]

so
num tries:         probability of aborting ple handler (1.5x overcommit)
 1                 1/4
 2                 1/8
 3                 1/16

We can increase this probability with more tries, but the problem is
the overhead.
Also, if we have tried three times, that means we have iterated
over 3 good eligible vcpus along with many non-eligible candidates. In
the worst case, if we iterate over all the vcpus, we reduce 1x performance,
and overcommit performance takes a hit.

Note that we do not update the last boosted vcpu in failure cases.
Thanks to Avi for raising the question of aborting after the first failure from yield_to.
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c45c528e
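
The three-tries policy described in c45c528e can be modelled with the small sketch below. It is a stand-alone simulation, not kernel code: `results[i]` is a hypothetical stand-in for what a yield attempt to candidate `i` would return (positive on success, 0 when the candidate is skipped or the yield simply did not happen, `-ESRCH` for the undercommit hint), and the function name is made up for this example.

```c
#include <errno.h>
#include <stddef.h>

/* Number of -ESRCH results tolerated before assuming undercommit. */
#define MAX_YIELD_FAILURES 3

/*
 * Walk the candidates in order. Return the index of the first candidate
 * that was successfully yielded to, or -1 if the loop gave up (either
 * because MAX_YIELD_FAILURES undercommit hints were seen, or because
 * the candidates ran out).
 */
static inline int pick_yield_target(const int *results, size_t count)
{
    int failures_left = MAX_YIELD_FAILURES;

    for (size_t i = 0; i < count; i++) {
        if (results[i] > 0)
            return (int)i;
        if (results[i] == -ESRCH && --failures_left == 0)
            break;
    }
    return -1;
}
```

The table in the commit message is consistent with p = 1/2: n tries give an abort probability of (1/2)^(n+1), so three tries give 1/16.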

sched: Bail out of yield_to when source and target runqueue has one task · 7b270f60

Committed by Peter Zijlstra on Jan 22, 2013

In undercommitted scenarios, especially in large guests,
yield_to overhead is significantly high. When the run queue length of
both source and target is one, take the opportunity to bail out and return
-ESRCH. This return condition can be further exploited to quickly come
out of the PLE handler.

(History: Raghavendra initially worked on breaking out of the kvm ple handler
 upon seeing source runqueue length = 1, but it had to export rq length.
 Peter came up with the elegant idea of returning -ESRCH in the scheduler core.)
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Raghavendra: Checking the rq length of target vcpu condition added. (thanks Avi)
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: Andrew Jones <drjones@redhat.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

7b270f60

x86, apicv: add virtual interrupt delivery support · c7c9c56c

Committed by Yang Zhang on Jan 25, 2013

Virtual interrupt delivery spares KVM from injecting vAPIC interrupts
manually; injection is fully taken care of by the hardware. This needs
some special awareness in the existing interrupt injection path:

- For a pending interrupt, instead of direct injection, we may need to
  update architecture-specific indicators before resuming the guest.

- A pending interrupt that is masked by ISR should also be
  considered in the above update action, since the hardware will decide
  when to inject it at the right time. The current has_interrupt and
  get_interrupt only return a valid vector from the injection point of view.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

c7c9c56c

x86, apicv: add virtual x2apic support · 8d14695f

Committed by Yang Zhang on Jan 25, 2013

Basically, to benefit from apicv, we need to enable virtualized x2apic mode.
Currently, we only enable it when the guest is really using x2apic.

Also, clear the MSR bitmap for the corresponding x2apic MSRs when the guest has enabled x2apic:
0x800 - 0x8ff: no read intercept for apicv register virtualization,
               except APIC ID and TMCCT, which need software's assistance to
               get the right value.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

8d14695f
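
The read-intercept rule stated in 8d14695f can be written as a predicate, as in the sketch below. The MSR numbers for APIC ID (0x802) and TMCCT (0x839) follow from the x2APIC mapping of register offset to MSR index (0x800 plus the offset shifted right by 4); the function name is hypothetical, and treating MSRs outside the range as intercepted is an assumption made only to keep the example total.

```c
#include <stdbool.h>
#include <stdint.h>

#define X2APIC_MSR_FIRST 0x800u
#define X2APIC_MSR_LAST  0x8ffu
#define X2APIC_MSR_ID    0x802u /* APIC ID register */
#define X2APIC_MSR_TMCCT 0x839u /* timer current count register */

/*
 * Return true if a guest read of `msr` should still cause an intercept
 * once the guest has enabled x2apic and register virtualization is on.
 */
static inline bool x2apic_read_intercepted(uint32_t msr)
{
    /* Outside the x2apic range: not covered by this rule (assumed intercepted). */
    if (msr < X2APIC_MSR_FIRST || msr > X2APIC_MSR_LAST)
        return true;

    /* These two need software assistance to produce the right value. */
    return msr == X2APIC_MSR_ID || msr == X2APIC_MSR_TMCCT;
}
```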

x86, apicv: add APICv register virtualization support · 83d4c286

Committed by Yang Zhang on Jan 25, 2013

- APIC read doesn't cause VM-Exit
- APIC write becomes trap-like
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

83d4c286

27 Jan, 2013 3 commits

kvm: Obey read-only mappings in iommu · d47510e2

Committed by Alex Williamson on Jan 24, 2013

We've been ignoring read-only mappings and programming everything
into the iommu as read-write.  Fix this to only include the write
access flag when read-only is not set.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

d47510e2
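
The fix described in d47510e2 amounts to deriving the IOMMU protection flags from the slot flags, roughly as sketched below. The flag names mirror kernel identifiers, but the numeric values and the helper name are assumptions for this self-contained example.

```c
#include <stdint.h>

/* Assumed values for this sketch. */
#define IOMMU_READ       (1u << 0)
#define IOMMU_WRITE      (1u << 1)
#define KVM_MEM_READONLY (1u << 1)

/* Hypothetical helper: map memory slot flags to IOMMU protection flags. */
static inline unsigned int iommu_flags_for_slot(uint32_t slot_flags)
{
    unsigned int flags = IOMMU_READ;

    /* Grant write access only when the slot is not marked read-only. */
    if (!(slot_flags & KVM_MEM_READONLY))
        flags |= IOMMU_WRITE;

    return flags;
}
```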

kvm: Force IOMMU remapping on memory slot read-only flag changes · 261874b0

Committed by Alex Williamson on Jan 24, 2013

Memory slot flags can be altered without changing other parameters of
the slot.  The read-only attribute is the only one the IOMMU cares
about, so generate an un-map, re-map when this occurs.  This also
avoids unnecessarily re-mapping the slot when no IOMMU visible changes
are made.
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

261874b0

KVM: x86 emulator: fix test_cc() build failure on i386 · 3f0c3d0b

Committed by Avi Kivity on Jan 26, 2013

'pushq' doesn't exist on i386.  Replace with 'push', which should work
since the operand is a register.
Signed-off-by: Avi Kivity <avi.kivity@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>

3f0c3d0b

24 Jan, 2013 11 commits

KVM: VMX: set vmx->emulation_required only when needed. · 14168786