Commits · 29b7c71b5ecf2caaa4c2105ecc0094826db8a8a8 · openanolis / cloud-kernel

18 May, 2012 3 commits

KVM: s390: epoch difference and TOD programmable field · 29b7c71b

Committed by Carsten Otte on May 15, 2012

This patch makes vcpu epoch difference and the TOD programmable
field accessible from userspace. This is needed in order to
implement a couple of instructions that deal with the time of
day clock on s390, such as SET CLOCK and for migration.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: s390: KVM_GET/SET_ONEREG for s390 · 14eebd91

Committed by Carsten Otte on May 15, 2012

This patch enables KVM_CAP_ONE_REG for s390 and implements stubs
for KVM_GET/SET_ONE_REG. This is based on the ppc implementation.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: s390: add capability indicating COW support · 1526bf9c

Committed by Christian Borntraeger on May 15, 2012

Currently qemu/kvm on s390 uses a guest mapping that does not
allow the guest backing page table to be write-protected to
support older systems. On those older systems a host write
protection fault will be delivered to the guest.

Newer systems allow write-protecting the guest backing memory
and let the fault be delivered to the host, thus allowing COW.

Use a capability bit to tell qemu if that is possible.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

17 May, 2012 4 commits

KVM: Fix mmu_reload() clash with nested vmx event injection · d8368af8

Committed by Avi Kivity on May 14, 2012

Currently the inject_pending_event() call during guest entry happens after
kvm_mmu_reload().  This is for historical reasons - we used to
inject_pending_event() in atomic context, while kvm_mmu_reload() needs task
context.

A problem is that nested vmx can cause the mmu context to be reset, if event
injection is intercepted and causes a #VMEXIT instead (the #VMEXIT resets
CR0/CR3/CR4).  If this happens, we end up with invalid root_hpa, and since
kvm_mmu_reload() has already run, no one will fix it and we end up entering
the guest this way.

Fix by reordering event injection to be before kvm_mmu_reload().  Use
->cancel_injection() to undo if kvm_mmu_reload() fails.

https://bugzilla.kernel.org/show_bug.cgi?id=42980
Reported-by: Luke-Jr <luke-jr+linuxbugs@utopios.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: Don't use RCU for lockless shadow walking · c142786c

Committed by Avi Kivity on May 14, 2012

Using RCU for lockless shadow walking can increase the amount of memory
in use by the system, since RCU grace periods are unpredictable.  We also
have an unconditional write to a shared variable (reader_counter), which
isn't good for scaling.

Replace that with a scheme similar to x86's get_user_pages_fast(): disable
interrupts during lockless shadow walk to force the freer
(kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
processor with interrupts enabled.

We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
kvm_flush_remote_tlbs() from avoiding the IPI.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Optimize %ds, %es reload · b2da15ac

Committed by Avi Kivity on May 13, 2012

On x86_64, we can defer %ds and %es reload to the heavyweight context switch,
since nothing in the lightweight paths uses the host %ds or %es (they are
ignored by the processor). Furthermore we can avoid the load if the segments
are null, by letting the hardware load the null segments for us. This is the
expected case.

On i386, we could avoid the reload entirely, since the entry.S paths take care
of reload, except for the SYSEXIT path which leaves %ds and %es set to __USER_DS.
So we set them to the same values as well.

Saves about 70 cycles out of 1600 (around 4%; noisy measurements).
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Fix %ds/%es clobber · 512d5649

Committed by Avi Kivity on May 13, 2012

The vmx exit code unconditionally restores %ds and %es to __USER_DS. This
can override the user's values, since %ds and %es are not saved and restored
in x86_64 syscalls. In practice, this isn't dangerous since nobody uses
segment registers in long mode, least of all programs that use KVM.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

14 May, 2012 2 commits

KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte() · d54e4237

Committed by Joerg Roedel on May 07, 2012

The instruction emulation for bsrw is broken in KVM because
the code always uses bsr with 32 or 64 bit operand size for
emulation. Fix that by using emulate_2op_SrcV_nobyte() macro
to use guest operand size for emulation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
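
To see why the operand size matters for bsr, here is a minimal C sketch. It is not the emulator code and does not use emulate_2op_SrcV_nobyte(); the helper `bsr_sized` is a hypothetical name introduced only for illustration. It scans just the low 16, 32 or 64 bits of a register image, which is what a correctly sized bsr is expected to do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the kernel): index of the highest set bit
 * within the low `bits` bits of `src`. BSR leaves its result undefined for
 * a zero operand, so a non-zero masked value is required here. */
static int bsr_sized(uint64_t src, unsigned bits)
{
    uint64_t mask;
    uint64_t v;
    int idx = -1;

    assert(bits == 16 || bits == 32 || bits == 64);
    mask = (bits == 64) ? ~0ULL : ((1ULL << bits) - 1);
    v = src & mask;
    assert(v != 0);

    /* Shift right until nothing is left; the number of shifts minus one
     * is the index of the highest set bit. */
    while (v) {
        v >>= 1;
        idx++;
    }
    return idx;
}
```

With a register image of 0x12340080, a 16-bit scan finds bit 7 while a 32-bit scan finds bit 28, which is the kind of mismatch the commit message describes for bsrw.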


KVM: VMX: unlike vmcs on fail path · 5f3fbc34

Committed by Xiao Guangrong on May 14, 2012

fix:

[ 1529.577273] Call Trace:
[ 1529.577289]  [<ffffffffa060d58f>] kvm_arch_hardware_disable+0x13/0x30 [kvm]
[ 1529.577302]  [<ffffffffa05fa2d4>] hardware_disable_nolock+0x35/0x39 [kvm]
[ 1529.577311]  [<ffffffffa05fa29f>] ? cpumask_clear_cpu.constprop.31+0x13/0x13 [kvm]
[ 1529.577315]  [<ffffffff81096ba8>] on_each_cpu+0x44/0x84
[ 1529.577326]  [<ffffffffa05f98b5>] hardware_disable_all_nolock+0x34/0x36 [kvm]
[ 1529.577335]  [<ffffffffa05f98e2>] hardware_disable_all+0x2b/0x39 [kvm]
[ 1529.577349]  [<ffffffffa05fafe5>] kvm_put_kvm+0xed/0x10f [kvm]
[ 1529.577358]  [<ffffffffa05fb3d7>] kvm_vm_release+0x22/0x28 [kvm]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

08 May, 2012 1 commit

Merge branch 'for-upstream' of git://github.com/agraf/linux-2.6 into next · f2569053

Committed by Avi Kivity on May 08, 2012

PPC updates from Alex.

* 'for-upstream' of git://github.com/agraf/linux-2.6:
  KVM: PPC: Emulator: clean up SPR reads and writes
  KVM: PPC: Emulator: clean up instruction parsing
  kvm/powerpc: Add new ioctl to retreive server MMU infos
  kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
  KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
  KVM: PPC: Book3S: Enable IRQs during exit handling
  KVM: PPC: Fix PR KVM on POWER7 bare metal
  KVM: PPC: Fix stbux emulation
  KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
  KVM: PPC: Book3S: PR: No isync in slbie path
  KVM: PPC: Book3S: PR: Optimize entry path
  KVM: PPC: booke(hv): Fix save/restore of guest accessible SPRGs.
  KVM: PPC: Restrict PPC_[L|ST]D macro to asm code
  KVM: PPC: bookehv: Use a Macro for saving/restoring guest registers to/from their 64 bit copies.
  KVM: PPC: Use clockevent multiplier and shifter for decrementer
  KVM: Use minimum and maximum address mapped by TLB1
Signed-off-by: Avi Kivity <avi@redhat.com>

06 May, 2012 20 commits

KVM: PPC: Emulator: clean up SPR reads and writes · 54771e62

Committed by Alexander Graf on May 04, 2012

When reading and writing SPRs, every SPR emulation piece had to read
or write the respective GPR the value was read from or stored in itself.

This approach is pretty prone to failure. What if we accidentally
implement mfspr emulation where we just do "break" and nothing else?
Suddenly we would get a random value in the return register - which is
always a bad idea.

So let's consolidate the generic code paths and only give the core
specific SPR handling code readily made variables to read/write from/to.

Functionally, this patch doesn't change anything, but it increases the
readability of the code and makes it less prone to bugs.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Emulator: clean up instruction parsing · c46dc9a8

Committed by Alexander Graf on May 04, 2012

Instructions on PPC are pretty similarly encoded. So instead of
every instruction emulation code decoding the instruction fields
itself, we can move that code to more generic places and rely on
the compiler to optimize the unused bits away.

This has 2 advantages. It makes the code smaller and it makes the
code less error prone, as the instruction fields are always
available, so accidental misusage is reduced.

Functionally, this patch doesn't change anything.
Signed-off-by: Alexander Graf <agraf@suse.de>

kvm/powerpc: Add new ioctl to retreive server MMU infos · 5b74716e

Committed by Benjamin Herrenschmidt on April 26, 2012

This is necessary for qemu to be able to pass the right information
to the guest, such as the supported page sizes and corresponding
encodings in the SLB and hash table, which can vary depending
on the processor type, the type of KVM used (PR vs HV) and the
version of KVM.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix compilation on hv, adjust for newer ioctl numbers]
Signed-off-by: Alexander Graf <agraf@suse.de>

kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM · f31e65e1

Committed by Benjamin Herrenschmidt on March 15, 2012

There is nothing in the code for emulating TCE tables in the kernel
that prevents it from working on "PR" KVM... other than ifdef's and
location of the code.

This patch fixes that and moves the bulk of the code to a new file called
book3s_64_vio.c.

This speeds things up a bit on my G5.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix for hv kvm, 32bit, whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler · 4444aa5f

Committed by Mihai Caraman on April 16, 2012

Guest r8 register is held in the scratch register and stored correctly,
so remove the instruction that clobbers it. Guest r13 was missing from vcpu,
store it there.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: Enable IRQs during exit handling · 3b1d9d7d

Committed by Alexander Graf on April 30, 2012

While handling an exit, we should listen for interrupts and make sure to
receive them when they arrive, to keep our latencies low.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Fix PR KVM on POWER7 bare metal · 11f7d6c2

Committed by Alexander Graf on April 27, 2012

When running on a system that is HV capable, some interrupts use HSRR
SPRs instead of the normal SRR SPRs. These are also used in the Linux
handlers to jump back to code after an interrupt got processed.

Unfortunately, in our "jump back to the real host handler after we've
done the context switch" code, we were only setting the SRR SPRs,
causing Linux to jump back to some invalid IP after it has processed
the interrupt.

This fixes random crashes on p7 opal mode with PR KVM for me.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Fix stbux emulation · 978b4fae

Committed by Alexander Graf on April 27, 2012

Stbux writes the address it's operating on to the register specified in ra,
not into the data source register.
Signed-off-by: Alexander Graf <agraf@suse.de>
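
The update rule in this commit message can be sketched in a few lines of C. This is a toy model, not the KVM emulator: `struct toy_cpu`, its tiny flat memory and `toy_stbux` are hypothetical names made up for illustration. It shows the byte of rs being stored at the effective address and the address being written back to ra.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified machine state: 32 GPRs and a small flat memory. */
struct toy_cpu {
    uint64_t gpr[32];
    uint8_t mem[256];
};

/* Sketch of "stbux rs, ra, rb":
 *   EA = (ra) + (rb); store the low byte of (rs) at EA; then ra = EA.
 * The updated address goes to ra; the data source rs is left untouched. */
static void toy_stbux(struct toy_cpu *cpu, int rs, int ra, int rb)
{
    uint64_t ea;

    assert(rs >= 0 && rs < 32 && ra >= 0 && ra < 32 && rb >= 0 && rb < 32);
    ea = cpu->gpr[ra] + cpu->gpr[rb];

    /* The toy memory wraps around; a real emulator would translate EA. */
    cpu->mem[ea % sizeof(cpu->mem)] = (uint8_t)cpu->gpr[rs];
    cpu->gpr[ra] = ea;
}
```

Writing the address into rs instead of ra would corrupt the data register and leave the base register stale, which is the behaviour the fix addresses.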


KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields · 518f040c

Committed by Mihai Caraman on April 16, 2012

Interrupt code used PPC_LL/PPC_STL macros to load/store some of u32 fields
which led to memory overflow on 64-bit. Use lwz/stw instead.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: PR: No isync in slbie path · af415087

Committed by Alexander Graf on April 25, 2012

While messing around with the SLBs we're running in real mode. The
entry to guest space goes through rfid, which is context synchronizing,
so there's no need to manually synchronize anything through isync.

With this patch and a simple privileged SPR access loop guest, I get
a speed bump from 2035607 to 2181301 exits per second.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Book3S: PR: Optimize entry path · 8c2d0be7

Committed by Alexander Graf on April 25, 2012

By shuffling a few instructions around we can execute more memory
loads in parallel, giving us a small performance boost.

With this patch and a simple privileged SPR access loop guest, I get
a speed bump from 2013052 to 2035607 exits per second.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: booke(hv): Fix save/restore of guest accessible SPRGs. · 30124906

Committed by Varun Sethi on April 25, 2012

For Guest accessible SPRGs 4-7, save/restore must be handled differently for 64bit and
non-64 bit case. Use the PPC_STD/PPC_LD macros for saving/restoring to/from these registers.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Restrict PPC_[L|ST]D macro to asm code · 3d4c6826

Committed by Alexander Graf on April 25, 2012

We only want asm code macros to be accessible from asm code, so #ifdef it
depending on it.
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: bookehv: Use a Macro for saving/restoring guest registers to/from their 64 bit copies. · 185e4188

Committed by Varun Sethi on April 25, 2012

Introduced PPC_STD/PPC_LD macros for saving/restoring guest registers to/from their 64 bit copies.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Use clockevent multiplier and shifter for decrementer · 6e35994d

Committed by Bharat Bhushan on April 18, 2012

The time for which the hrtimer is started for decrementer emulation is
calculated using tb_ticks_per_usec, while the hrtimer uses the clockevent
for DEC reprogramming (if needed), which calculates timebase ticks using
the multiplier and shifter mechanism implemented within the clockevent layer.

It was observed that this conversion (timebase->time->timebase) is not
correct because the two mechanisms are not consistent.
In our setup it adds 2% jitter.

With this patch the clockevent multiplier and shifter mechanism is used when
starting the hrtimer for decrementer emulation. Now the jitter is < 0.5%.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
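
The inconsistency between the two conversions can be illustrated with a small C sketch. It is not the kernel code: the function names, the fixed shift of 32 and the 66,666,667 Hz timebase used in the note below are all assumptions chosen for illustration. One helper converts ticks to nanoseconds through an integer ticks-per-microsecond value, the other inverts a clockevent-style mult/shift pair.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL
#define TOY_SHIFT 32u /* assumed shift; a real clockevent picks its own */

/* Clockevent-style factor for ns -> ticks: ticks = (ns * mult) >> shift. */
static uint64_t toy_mult(uint64_t tb_hz)
{
    assert(tb_hz < (1ULL << 32)); /* keep the shift below from overflowing */
    return (tb_hz << TOY_SHIFT) / TOY_NSEC_PER_SEC;
}

/* ticks -> ns by inverting the same mult/shift pair. */
static uint64_t toy_ticks_to_ns_multshift(uint32_t ticks, uint64_t mult)
{
    assert(mult != 0);
    return ((uint64_t)ticks << TOY_SHIFT) / mult;
}

/* ticks -> ns through an integer ticks-per-microsecond value. */
static uint64_t toy_ticks_to_ns_per_usec(uint32_t ticks, uint64_t ticks_per_usec)
{
    assert(ticks_per_usec != 0);
    return (uint64_t)ticks * 1000 / ticks_per_usec;
}
```

For the assumed 66,666,667 Hz timebase, the integer ticks-per-microsecond value truncates to 66, so one second's worth of ticks converts to roughly 1.01 s, while the mult/shift inverse stays within a few nanoseconds of 1 s. These figures come from the made-up inputs and are not the 2% and 0.5% measured in the commit.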


KVM: Use minimum and maximum address mapped by TLB1 · cc902ad4

Committed by Bharat Bhushan on March 22, 2012

Keep track of the minimum and maximum address mapped by tlb1.
This helps TLBMISS handling in KVM to quickly check whether the address lies in the mapped range.
If the address does not lie in this range, there is no need to look at each entry of the tlb1 array.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
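
The range check described here can be sketched as follows. This is a simplified model, not the e500 TLB code: the `toy_tlb1` types and functions are hypothetical, entries are plain inclusive address ranges, and the window is only ever widened (the sketch does not shrink it when an entry is replaced).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_TLB1_ENTRIES 16

struct toy_tlb1_entry {
    uint64_t start; /* first mapped address */
    uint64_t end;   /* last mapped address, inclusive */
    bool valid;
};

struct toy_tlb1 {
    struct toy_tlb1_entry e[TOY_TLB1_ENTRIES];
    uint64_t min_addr; /* lowest address mapped by any entry */
    uint64_t max_addr; /* highest address mapped by any entry */
};

static void toy_tlb1_init(struct toy_tlb1 *t)
{
    for (int i = 0; i < TOY_TLB1_ENTRIES; i++)
        t->e[i].valid = false;
    /* Empty window: min > max, so every lookup is rejected early. */
    t->min_addr = UINT64_MAX;
    t->max_addr = 0;
}

static void toy_tlb1_map(struct toy_tlb1 *t, int idx, uint64_t start, uint64_t end)
{
    assert(idx >= 0 && idx < TOY_TLB1_ENTRIES && start <= end);
    t->e[idx] = (struct toy_tlb1_entry){ start, end, true };
    if (start < t->min_addr)
        t->min_addr = start;
    if (end > t->max_addr)
        t->max_addr = end;
}

/* Returns the index of the matching entry, or -1 if the address is unmapped. */
static int toy_tlb1_lookup(const struct toy_tlb1 *t, uint64_t addr)
{
    if (addr < t->min_addr || addr > t->max_addr)
        return -1; /* fast path: outside the tracked window, skip the scan */

    for (int i = 0; i < TOY_TLB1_ENTRIES; i++) {
        if (t->e[i].valid && addr >= t->e[i].start && addr <= t->e[i].end)
            return i;
    }
    return -1; /* inside the window but in a gap between entries */
}
```

An address inside the window can still miss if it falls in a gap between entries, so the full scan remains necessary there; the window only makes the common out-of-range miss cheap.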


KVM: x86 emulator: Avoid pushing back ModRM byte fetched for group decoding · 9f4260e7

Committed by Takuya Yoshikawa on April 30, 2012

Although ModRM byte is fetched for group decoding, it is soon pushed
back to make decode_modrm() fetch it later again.

Now that ModRM flag can be found in the top level opcode tables, fetch
ModRM byte before group decoding to make the code simpler.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: Move ModRM flags for groups to top level opcode tables · 1c2545be

Committed by Takuya Yoshikawa on April 30, 2012

Needed for the following patch which simplifies ModRM fetching code.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM guest: make kvm_para_available() check hypervisor bit reading cpuid leaf · 9b72d3b0

Committed by Gleb Natapov on April 30, 2012

This cpuid range does not exist on real HW and Intel spec says that
"Information returned for highest basic information leaf" will be
returned. Not very well defined.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: fix cpuid eax for KVM leaf · 57c22e5f

Committed by Michael S. Tsirkin on May 02, 2012

cpuid eax should return the max leaf so that
guests can find out the valid range.
This matches Xen et al.
Update documentation to match.

Tested with -cpu host.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

03 May, 2012 1 commit

KVM: s390: implement KVM_CAP_NR/MAX_VCPUS · e726b1bd

Committed by Christian Borntraeger on May 02, 2012

Let userspace know the number of max and supported cpus for kvm on s390.
Return KVM_MAX_VCPUS (currently 64) for both values.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

01 May, 2012 3 commits

KVM: s390: Handle sckpf instruction · 8c3f61e2

Committed by Cornelia Huck on April 24, 2012

Handle the mandatory intercept SET CLOCK PROGRAMMABLE FIELD
instruction.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: s390: use kvm_vcpu_on_spin for diag 0x44 · 8733ac36

Committed by Christian Borntraeger on April 25, 2012

Let's replace the old open coded version of diag 0x44 (which relied on
compat_sched_yield) with kvm_vcpu_on_spin.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: s390: Implement the directed yield (diag 9c) hypervisor call for KVM · 41628d33

Committed by Konstantin Weitz on April 25, 2012

This patch implements the directed yield hypercall found on other
System z hypervisors. It delegates execution time to the virtual cpu
specified in the instruction's parameter.

Useful to avoid long spinlock waits in the guest.

Christian Borntraeger: moved common code in virt/kvm/
Signed-off-by: Konstantin Weitz <WEITZKON@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

28 April, 2012 3 commits

KVM: x86: Run PIT work in own kthread · b6ddf05f

Committed by Jan Kiszka on April 24, 2012

We can't run PIT IRQ injection work in the interrupt context of the host
timer. This would allow the user to influence the handler complexity by
asking for a broadcast to a large number of VCPUs. Therefore, this work
was pushed into workqueue context in 9d244caf2e. However, this prevents
prioritizing the PIT injection over other tasks, as workqueues share
kernel threads.

This replaces the workqueue with a kthread worker and gives that thread
a name in the format "kvm-pit/<owner-process-pid>". That allows
identifying the kthread and adjusting its priority according to the VM
process parameters.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Document in-kernel PIT API · 0589ff6c

Committed by Jan Kiszka on April 24, 2012

Add descriptions for KVM_CREATE_PIT2 and KVM_GET/SET_PIT2.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Improve readability of KVM API doc · 414fa985

Committed by Jan Kiszka on April 24, 2012

This helps to identify sections and it also fixes the numbering from
4.54 to 4.61.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

24 April, 2012 3 commits

KVM: x86 emulator: fix asm constraint in flush_pending_x87_faults · 38e8a2dd

Committed by Avi Kivity on April 22, 2012

'bool' wants 8-bit registers.
Reported-by: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Introduce bitmask for apic attention reasons · 41383771

Committed by Gleb Natapov on April 19, 2012

The patch introduces a bitmap that will hold the reasons the apic should be
checked during vmexit. This is in preparation for the pv eoi patch
that will add one more check on vmexit. With the bitmap we can do
if(apic_attention) to check everything simultaneously, which will
add zero overhead on the fast path.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
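
The idea of folding several reasons into one test can be sketched in plain C. This is an illustration under assumptions, not the KVM code: `struct toy_vcpu`, the reason names and the helper functions are hypothetical, and only the field name `apic_attention` is taken from the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical reason numbers; each one is a bit position in the mask. */
enum toy_attn_reason {
    TOY_ATTN_CHECK_VAPIC = 0,
    TOY_ATTN_SECOND_REASON = 1
};

struct toy_vcpu {
    unsigned long apic_attention; /* one bit per pending reason */
};

static void toy_attn_set(struct toy_vcpu *v, unsigned reason)
{
    assert(reason < sizeof(v->apic_attention) * CHAR_BIT);
    v->apic_attention |= 1UL << reason;
}

static void toy_attn_clear(struct toy_vcpu *v, unsigned reason)
{
    assert(reason < sizeof(v->apic_attention) * CHAR_BIT);
    v->apic_attention &= ~(1UL << reason);
}

/* Fast-path test: a single comparison covers every reason at once. */
static bool toy_needs_apic_check(const struct toy_vcpu *v)
{
    return v->apic_attention != 0;
}
```

Adding another reason later only costs a new bit; the fast-path test stays a single comparison, and the per-reason work is done only after that test has passed.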


KVM: Introduce direct MSI message injection for in-kernel irqchips · 07975ad3

Committed by Jan Kiszka on March 29, 2012

Currently, MSI messages can only be injected to in-kernel irqchips by
defining a corresponding IRQ route for each message. This is not only
unwieldy if the MSI messages are generated "on the fly" by user space;
IRQ routes are also a limited resource that user space has to manage
carefully.

By providing a direct injection path, we can both avoid using up limited
resources and simplify the necessary steps for user land.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
