Commits · be6ba0f0962a39091c52eb9167ddea201fe80716 · openeuler / raspberrypi-kernel

27 Dec, 2011 30 commits

KVM: introduce kvm_for_each_memslot macro · be6ba0f0

Committed by Xiao Guangrong on Nov 24, 2011

Introduce kvm_for_each_memslot to walk all valid memslots.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

be6ba0f0
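
As a rough illustration of what such an iteration macro might look like, here is a minimal userspace sketch. The `struct memslot` and `struct memslots` types, the `nmemslots` field, and the `total_pages` helper are simplified stand-ins assumed for this example; they are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct memslot {
    unsigned long base_gfn;
    unsigned long npages;
};

struct memslots {
    int nmemslots;              /* number of slots treated as valid here */
    struct memslot memslots[8];
};

/* Walk the first nmemslots entries of the array. */
#define for_each_memslot(slot, slots)                        \
    for ((slot) = &(slots)->memslots[0];                     \
         (slot) < (slots)->memslots + (slots)->nmemslots;    \
         (slot)++)

/* Example user of the macro: sum the page counts of all valid slots. */
static unsigned long total_pages(struct memslots *slots)
{
    struct memslot *slot;
    unsigned long sum = 0;

    assert(slots != NULL && slots->nmemslots >= 0 && slots->nmemslots <= 8);
    for_each_memslot(slot, slots)
        sum += slot->npages;
    return sum;
}
```

Hiding the loop bounds behind a macro keeps callers independent of how valid slots are stored, which is presumably why the series introduces it before reworking the slot array.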

KVM: introduce update_memslots function · be593d62

Committed by Xiao Guangrong on Nov 24, 2011

Introduce update_memslots to update the slot that is about to be written
to kvm->memslots.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

be593d62

KVM: introduce KVM_MEM_SLOTS_NUM macro · 93a5cef0

Committed by Xiao Guangrong on Nov 24, 2011

Introduce the KVM_MEM_SLOTS_NUM macro to replace
KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

93a5cef0

KVM: x86 emulator: Use opcode::execute for BSF/BSR · ff227392

Committed by Takuya Yoshikawa on Nov 22, 2011

BSF: 0F BC
BSR: 0F BD
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ff227392

KVM: x86 emulator: Use opcode::execute for CMPXCHG · e940b5c2

Committed by Takuya Yoshikawa on Nov 22, 2011

CMPXCHG: 0F B0, 0F B1
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e940b5c2

KVM: x86 emulator: Use opcode::execute for WRMSR/RDMSR · e1e210b0

Committed by Takuya Yoshikawa on Nov 22, 2011

WRMSR: 0F 30
RDMSR: 0F 32
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e1e210b0

KVM: x86 emulator: Use opcode::execute for MOV to cr/dr · bc00f8d2

Committed by Takuya Yoshikawa on Nov 22, 2011

MOV: 0F 22 (move to control registers)
MOV: 0F 23 (move to debug registers)
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

bc00f8d2

KVM: x86 emulator: Use opcode::execute for CALL · d4ddafcd

Committed by Takuya Yoshikawa on Nov 22, 2011

CALL: E8
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d4ddafcd

KVM: x86 emulator: Use opcode::execute for BT family · ce7faab2

Committed by Takuya Yoshikawa on Nov 22, 2011

BT : 0F A3
BTS: 0F AB
BTR: 0F B3
BTC: 0F BB

Group 8: 0F BA
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

ce7faab2

KVM: x86 emulator: Use opcode::execute for IN/OUT · d7841a4b

Committed by Takuya Yoshikawa on Nov 22, 2011

IN : E4, E5, EC, ED
OUT: E6, E7, EE, EF
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d7841a4b

KVM: VMX: remove unneeded vmx_load_host_state() calls. · 46199f33

Committed by Gleb Natapov on Nov 17, 2011

vmx_load_host_state() does not handle msrs switching (except
MSR_KERNEL_GS_BASE) since commit 26bb0981. Remove the calls to it
where they no longer make sense.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

46199f33

KVM: Optimize dirty logging by rmap_write_protect() · 95d4c16c

Committed by Takuya Yoshikawa on Nov 14, 2011

Currently, write protecting a slot needs to walk all the shadow pages
and check the ones which have a pte mapping a page in it.

The walk is overly heavy when there are not many dirty pages in that slot,
and checking the shadow pages would result in unwanted cache pollution.

To mitigate this problem, we use rmap_write_protect() and check only
the sptes which can be reached from gfns marked in the dirty bitmap
when the number of dirty pages is less than that of shadow pages.

This criterion is reasonable in its meaning and worked well in our test:
write protection became several times faster than before when the ratio of
dirty pages was low and was not worse even when the ratio was near the
criterion.

Note that the locking for this write protection becomes fine grained.
The reason why this is safe is described in the comments.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

95d4c16c
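
The selection criterion described in this message can be sketched as a small decision function. Everything below is an illustrative assumption: the enum, the function names, and the bitmap-counting helper are invented for this example (the companion patch keeps a running dirty-page count rather than recounting the bitmap).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two write-protection strategies. */
enum wp_strategy {
    WP_WALK_SHADOW_PAGES,   /* walk every shadow page */
    WP_RMAP_DIRTY_GFNS      /* visit only sptes reachable from dirty gfns */
};

/* Count the set bits in a dirty bitmap (illustration only). */
static unsigned long count_dirty(const unsigned long *bitmap, size_t nlongs)
{
    unsigned long n = 0;

    assert(bitmap != NULL);
    for (size_t i = 0; i < nlongs; i++)
        for (unsigned long w = bitmap[i]; w; w &= w - 1)
            n++;            /* each iteration clears the lowest set bit */
    return n;
}

/* Use the rmap path only when dirty pages are fewer than shadow pages. */
static enum wp_strategy choose_write_protect(unsigned long nr_dirty_pages,
                                             unsigned long nr_shadow_pages)
{
    return nr_dirty_pages < nr_shadow_pages ? WP_RMAP_DIRTY_GFNS
                                            : WP_WALK_SHADOW_PAGES;
}
```

The comparison is strict, so a slot with as many dirty pages as shadow pages still takes the full walk, matching the "less than" wording in the message.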

KVM: Count the number of dirty pages for dirty logging · 7850ac54

Committed by Takuya Yoshikawa on Nov 14, 2011

Needed for the next patch which uses this number to decide how to write
protect a slot.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

7850ac54

KVM: MMU: Split gfn_to_rmap() into two functions · 9b9b1492

Committed by Takuya Yoshikawa on Nov 14, 2011

rmap_write_protect() calls gfn_to_rmap() for each level with gfn fixed.
This results in calling gfn_to_memslot() repeatedly with that gfn.

This patch introduces __gfn_to_rmap() which takes the slot as an
argument to avoid this.

This is also needed for the following dirty logging optimization.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

9b9b1492

KVM: MMU: Clean up BUG_ON() conditions in rmap_write_protect() · d6eebf8b

Committed by Takuya Yoshikawa on Nov 14, 2011

Remove redundant checks and use is_large_pte() macro.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

d6eebf8b

KVM: MMU: remove KVM host pv mmu support · fb920458

Committed by Chris Wright on Nov 01, 2011

The host side pv mmu support has been marked for feature removal in
January 2011.  It's not in use, is slower than shadow or hardware
assisted paging, and a maintenance burden.  It's November 2011, time to
remove it.
Signed-off-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fb920458

KVM: x86: Simplify kvm timer handler · 3f2e5260

Committed by Jan Kiszka on Sep 14, 2011

The vcpu reference of a kvm_timer can't become NULL while the timer is
valid, so drop this redundant test. This also makes it pointless to
carry a separate __kvm_timer_fn, fold it into kvm_timer_fn.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3f2e5260

KVM: MMU: improve write flooding detected · a30f47cb

Committed by Xiao Guangrong on Sep 22, 2011

Write-flooding detection does not work well. When handling a page write, if
the last speculative spte has not been accessed, we treat the page as
write-flooded. However, sptes can be created speculatively on many paths,
such as pte prefetch and page sync, so the last speculative spte may not
point to the written page, and the written page may be accessed via other
sptes. Depending on the Accessed bit of the last speculative spte is
therefore not enough.

Instead of detecting whether the page is accessed, we can detect whether the
spte is accessed after it is written. If the spte is written frequently but
not accessed, we treat the page as not being a page table, or as not having
been used for a long time.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a30f47cb
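
A minimal sketch of the "written often but never accessed" heuristic follows. The `struct shadow_page` type, the function names, and the threshold of 3 are assumptions made for illustration, not values taken from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WRITE_FLOOD_THRESHOLD 3     /* assumed value, for illustration only */

/* Hypothetical, reduced shadow-page descriptor. */
struct shadow_page {
    int write_flooding_count;
};

/* Record one emulated write; returns true once the page looks flooded. */
static bool note_write(struct shadow_page *sp)
{
    assert(sp != NULL);
    return ++sp->write_flooding_count >= WRITE_FLOOD_THRESHOLD;
}

/* The spte was accessed after being written: the page is still in use. */
static void note_access(struct shadow_page *sp)
{
    assert(sp != NULL);
    sp->write_flooding_count = 0;
}
```

Because any access resets the counter, only a run of consecutive writes with no intervening access marks the page as flooded.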

KVM: MMU: fix detecting misaligned accessed · 5d9ca30e

Committed by Xiao Guangrong on Sep 22, 2011

Sometimes we modify only the last byte of a pte to update a status bit.
For example, clear_bit is used to clear the r/w bit in the Linux kernel, and
it uses the 'andb' instruction. In this case, kvm_mmu_pte_write treats the
write as a misaligned access and zaps the shadow page table.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

5d9ca30e
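
One way such a check could tolerate single-byte status updates is sketched below. The function name, the 4 KiB page mask, and the choice to exempt a one-byte write at a pte-aligned offset are assumptions for this example; the actual fix may draw the line differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone check. pte_size is 4 or 8 bytes.
 * Returns true if a guest write of `bytes` bytes at `gpa` should be
 * treated as a misaligned page-table write.
 */
static bool write_is_misaligned(uint64_t gpa, unsigned bytes, unsigned pte_size)
{
    unsigned offset = (unsigned)(gpa & 0xfff);   /* offset within a 4 KiB page */
    bool crosses;

    assert(pte_size == 4 || pte_size == 8);
    assert(bytes >= 1);

    /* Assumed exemption: a one-byte write at the start of a pte (e.g. andb). */
    if ((offset & (pte_size - 1)) == 0 && bytes == 1)
        return false;

    /* Does the write straddle a pte boundary? */
    crosses = ((offset ^ (offset + bytes - 1)) & ~(pte_size - 1)) != 0;

    /* Other short writes are still treated as suspicious. */
    return crosses || bytes < 4;
}
```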

KVM: MMU: split kvm_mmu_pte_write function · 889e5cbc

Committed by Xiao Guangrong on Sep 22, 2011

kvm_mmu_pte_write is too long; split it for better readability.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

889e5cbc

KVM: MMU: remove unnecessary kvm_mmu_free_some_pages · f8734352

Committed by Xiao Guangrong on Sep 22, 2011

In kvm_mmu_pte_write, we do not need to allocate a shadow page, so calling
kvm_mmu_free_some_pages is unnecessary.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

f8734352

KVM: MMU: fast prefetch spte on invlpg path · f57f2ef5

Committed by Xiao Guangrong on Sep 22, 2011

Fast-prefetch the spte for an unsync shadow page on the invlpg path.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

f57f2ef5

KVM: MMU: cleanup FNAME(invlpg) · 505aef8f

Committed by Xiao Guangrong on Sep 22, 2011

Use mmu_page_zap_pte directly to zap the spte in FNAME(invlpg), and remove
the code duplicated between FNAME(invlpg) and FNAME(sync_page).
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

505aef8f

KVM: MMU: do not mark accessed bit on pte write path · d01f8d5e

Committed by Xiao Guangrong on Sep 22, 2011

In the current code, the accessed bit is always set when a page fault occurs,
so there is no need to set it on the pte write path.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d01f8d5e

KVM: x86: cleanup port-in/port-out emulated · 6f6fbe98

Committed by Xiao Guangrong on Sep 22, 2011

Remove the code duplicated between emulator_pio_in_emulated and
emulator_pio_out_emulated.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

6f6fbe98

KVM: x86: retry non-page-table writing instructions · 1cb3f3ae

Committed by Xiao Guangrong on Sep 22, 2011

If the emulation is caused by a #PF and the instruction is not a page-table
writing instruction, the VM exit was caused by a write-protected shadow page,
so we can zap the shadow page and retry the instruction directly.

The idea is from Avi.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

1cb3f3ae

KVM: x86: tag the instructions which are used to write page table · d5ae7ce8

Committed by Xiao Guangrong on Sep 22, 2011

The idea is from Avi:
| tag instructions that are typically used to modify the page tables, and
| drop shadow if any other instruction is used.
| The list would include, I'd guess, and, or, bts, btc, mov, xchg, cmpxchg,
| and cmpxchg8b.

This patch tags those instructions; in a later patch, the shadow page is
dropped if it is written by any other instruction.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d5ae7ce8
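
The tagging idea can be pictured as a per-instruction flag that a later check consults. The table below is a toy model: `FLAG_PAGE_TABLE`, `struct insn_desc`, and the mnemonic-keyed lookup are invented for this sketch, and the tagged set simply mirrors the guessed list quoted in the message, which may differ from what the patch finally tags.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define FLAG_PAGE_TABLE (1u << 0)   /* hypothetical flag bit */

/* Hypothetical descriptor: a mnemonic plus its flags. */
struct insn_desc {
    const char *mnemonic;
    unsigned flags;
};

static const struct insn_desc insn_table[] = {
    { "and",       FLAG_PAGE_TABLE },
    { "or",        FLAG_PAGE_TABLE },
    { "bts",       FLAG_PAGE_TABLE },
    { "btc",       FLAG_PAGE_TABLE },
    { "mov",       FLAG_PAGE_TABLE },
    { "xchg",      FLAG_PAGE_TABLE },
    { "cmpxchg",   FLAG_PAGE_TABLE },
    { "cmpxchg8b", FLAG_PAGE_TABLE },
    { "add",       0 },             /* untagged examples */
    { "movs",      0 },
};

/* True if the instruction is tagged as a typical page-table writer. */
static bool is_page_table_writer(const char *mnemonic)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof(insn_table) / sizeof(insn_table[0]); i++)
        if (strcmp(insn_table[i].mnemonic, mnemonic) == 0)
            return (insn_table[i].flags & FLAG_PAGE_TABLE) != 0;
    return false;                   /* unknown instructions are untagged */
}
```

A write to a shadowed page by an untagged instruction would then be a hint that the page is no longer used as a page table.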

KVM: MMU: avoid pte_list_desc running out in kvm_mmu_pte_write · f759e2b4

Committed by Xiao Guangrong on Sep 22, 2011

kvm_mmu_pte_write is unsafe because it needs to allocate a pte_list_desc
when an spte is prefetched. Unfortunately, we cannot know how many sptes
need to be prefetched on this path, so we can run out of free pte_list_desc
objects in the cache, which triggers a BUG_ON(). Also, some paths do not
fill the cache, such as emulation of the INS instruction, which does not
trigger a page fault.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

f759e2b4

KVM: nVMX: Fix warning-causing idt-vectoring-info behavior · 51cfe38e

Committed by Nadav Har'El on Sep 22, 2011

When L0 wishes to inject an interrupt while L2 is running, it emulates an exit
to L1 with EXIT_REASON_EXTERNAL_INTERRUPT. This was explained in the original
nVMX patch 23, titled "Correct handling of interrupt injection".

Unfortunately, it is possible (though rare) that at this point there is valid
idt_vectoring_info in vmcs02. For example, L1 injected some interrupt to L2,
and when L2 tried to run this interrupt's handler, it got a page fault - so
it returns the original interrupt vector in idt_vectoring_info. The problem
is that if this is the case, we cannot exit to L1 with EXTERNAL_INTERRUPT
like we wished to, because the VMX spec guarantees that idt_vectoring_info
and exit_reason_external_interrupt can never happen together. This is not
just specified in the spec - a KVM L1 actually prints a kernel warning
"unexpected, valid vectoring info" if we violate this guarantee, and some
users noticed these warnings in L1's logs.

In order to better emulate a processor, which would never return the external
interrupt and the idt-vectoring-info together, we need to separate the two
injection steps: First, complete L1's injection into L2 (i.e., enter L2,
injecting to it the idt-vectoring-info); Second, after entry into L2 succeeds
and it exits back to L0, exit to L1 with the EXIT_REASON_EXTERNAL_INTERRUPT.
Most of this is already in the code - the only change we need is to remain
in L2 (and not exit to L1) in this case.

Note that the previous patch ensures (by using KVM_REQ_IMMEDIATE_EXIT) that
although we do enter L2 first, it will exit immediately after processing its
injection, allowing us to promptly inject to L1.

Note how we test vmcs12->idt_vectoring_info_field; This isn't really the
vmcs12 value (we haven't exited to L1 yet, so vmcs12 hasn't been updated),
but rather the place we save, at the end of vmx_vcpu_run, the vmcs02 value
of this field. This was explained in patch 25 ("Correct handling of idt
vectoring info") of the original nVMX patch series.

Thanks to Dave Allan and to Federico Simoncelli for reporting this bug,
to Abel Gordon for helping me figure out the solution, and to Avi Kivity
for helping to improve it.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

51cfe38e
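
The rule this message arrives at (remain in L2 while saved idt-vectoring info is valid, otherwise exit to L1) can be modelled as a tiny decision function. The enum and function name are hypothetical, and the model ignores every other condition the real code checks; the only hardware fact relied on is that bit 31 of the IDT-vectoring information field is its valid bit.

```c
#include <stdint.h>

/* Bit 31 of the IDT-vectoring information field marks it as valid. */
#define VECTORING_INFO_VALID (1u << 31)

/* Hypothetical names for the two outcomes discussed in the message. */
enum l0_action {
    STAY_IN_L2,                     /* finish L1's injection into L2 first */
    EXIT_TO_L1_EXTERNAL_INTERRUPT   /* emulate the exit to L1 now */
};

/*
 * Decide what L0 does when it wants to inject an interrupt into L1
 * while L2 is running, given the idt-vectoring info saved after the
 * last run of L2.
 */
static enum l0_action on_interrupt_for_l1(uint32_t saved_idt_vectoring_info)
{
    if (saved_idt_vectoring_info & VECTORING_INFO_VALID)
        return STAY_IN_L2;
    return EXIT_TO_L1_EXTERNAL_INTERRUPT;
}
```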

KVM: nVMX: Add KVM_REQ_IMMEDIATE_EXIT · d6185f20

Committed by Nadav Har'El on Sep 22, 2011

This patch adds a new vcpu->requests bit, KVM_REQ_IMMEDIATE_EXIT.
This bit requests that when next entering the guest, we should run it only
for as little as possible, and exit again.

We use this new option in nested VMX: When L1 launches L2, but L0 wishes L1
to continue running so it can inject an event to it, we unfortunately cannot
just pretend to have run L2 for a little while - We must really launch L2,
otherwise certain one-off vmcs12 parameters (namely, L1 injection into L2)
will be lost. So the existing code runs L2 in this case.
But L2 could potentially run for a long time until it exits, and the
injection into L1 will be delayed. The new KVM_REQ_IMMEDIATE_EXIT allows us
to request that L2 will be entered, as necessary, but will exit as soon as
possible after entry.

Our implementation of this request uses smp_send_reschedule() to send a
self-IPI, with interrupts disabled. The interrupts remain disabled until the
guest is entered, and then, after the entry is complete (often including
processing an injection and jumping to the relevant handler), the physical
interrupt is noticed and causes an exit.

On recent Intel processors, we could have achieved the same goal by using
MTF instead of a self-IPI. Another technique worth considering in the future
is to use VM_EXIT_ACK_INTR_ON_EXIT and a highest-priority vector IPI - to
slightly improve performance by avoiding the useless interrupt handler
which ends up being called when smp_send_reschedule() is used.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d6185f20

26 Dec, 2011 1 commit

KVM: Don't automatically expose the TSC deadline timer in cpuid · 4d25a066

Committed by Jan Kiszka on Dec 21, 2011

Unlike all of the other cpuid bits, the TSC deadline timer bit is set
unconditionally, regardless of what userspace wants.

This is broken in several ways:
 - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
   deadline timer feature, a guest that uses the feature will break
 - live migration to older host kernels that don't support the TSC deadline
   timer will cause the feature to be pulled from under the guest's feet;
   breaking it
 - guests that are broken wrt the feature will fail.

Fix by not enabling the feature automatically; instead report it to userspace.
Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
KVM_GET_SUPPORTED_CPUID.

Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.

[avi: add the KVM_CAP + documentation]
Reported-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

4d25a066
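
From userspace's point of view, the new contract might look like the sketch below: query the capability (for instance with the `KVM_CHECK_EXTENSION` ioctl and `KVM_CAP_TSC_DEADLINE_TIMER`), then decide whether to set the feature bit in the CPUID data handed to the guest. The helper name and its boolean parameters are assumptions for illustration; only the bit position (CPUID leaf 1, ECX bit 24) is an architectural fact.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 24 advertises the TSC deadline timer. */
#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)

/*
 * Hypothetical helper: compute the ECX value for the guest's CPUID leaf 1.
 * host_cap      - result of querying the capability on this host
 * user_wants_it - userspace policy (e.g. in-kernel irqchip is in use)
 */
static uint32_t guest_cpuid1_ecx(uint32_t ecx, bool host_cap, bool user_wants_it)
{
    /* Never inherit the bit implicitly; set it only by explicit choice. */
    ecx &= ~CPUID_1_ECX_TSC_DEADLINE;
    if (host_cap && user_wants_it)
        ecx |= CPUID_1_ECX_TSC_DEADLINE;
    return ecx;
}
```

Keeping the decision in userspace is what lets a management stack hide the feature when it plans to migrate the guest to an older host.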

25 Dec, 2011 1 commit

KVM: x86: Prevent starting PIT timers in the absence of irqchip support · 0924ab2c

Committed by Jan Kiszka on Dec 14, 2011

User space may create the PIT and forget about setting up the irqchips.
In that case, firing PIT IRQs will crash the host:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
IP: [<ffffffffa10f6280>] kvm_set_irq+0x30/0x170 [kvm]
...
Call Trace:
 [<ffffffffa11228c1>] pit_do_work+0x51/0xd0 [kvm]
 [<ffffffff81071431>] process_one_work+0x111/0x4d0
 [<ffffffff81071bb2>] worker_thread+0x152/0x340
 [<ffffffff81075c8e>] kthread+0x7e/0x90
 [<ffffffff815a4474>] kernel_thread_helper+0x4/0x10

Prevent this by checking the irqchip mode before starting a timer. We
can't deny creating the PIT if the irqchips aren't set up yet as
current user land expects this order to work.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0924ab2c

17 Nov, 2011 3 commits

KVM: VMX: Check for automatic switch msr table overflow · e7fc6f93

Committed by Gleb Natapov on Oct 05, 2011

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e7fc6f93

KVM: VMX: Add support for guest/host-only profiling · d7cd9796

Committed by Gleb Natapov on Oct 05, 2011

Support guest/host-only profiling by switching perf msrs on
guest entry if needed.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d7cd9796

KVM: VMX: add support for switching of PERF_GLOBAL_CTRL · 8bf00a52

Committed by Gleb Natapov on Oct 05, 2011

Some cpus have special support for switching the PERF_GLOBAL_CTRL msr.
Add logic to detect if such support exists and works properly and extend
the msr switching code to use it if available. Also extend the number of
generic msr switching entries to 8.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

8bf00a52

30 Oct, 2011 1 commit

KVM: SVM: Keep intercepting task switching with NPT enabled · f1c1da2b

Committed by Jan Kiszka on Oct 18, 2011

AMD processors apparently have a bug in the hardware task switching
support when NPT is enabled. If the task switch triggers a NPF, we can
get wrong EXITINTINFO along with that fault. On resume, spurious
exceptions may then be injected into the guest.

We were able to reproduce this bug when our guest triggered #SS and the
handler was supposed to run over a separate task with not yet touched
stack pages.

Work around the issue by continuing to emulate task switches even in
NPT mode.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

f1c1da2b

21 Oct, 2011 1 commit

iommu/core: Convert iommu_found to iommu_present · a1b60c1c

Committed by Joerg Roedel on Sep 06, 2011

With per-bus iommu_ops the iommu_found function needs to
work on a bus_type too. This patch adds a bus_type parameter
to that function and converts all call-places.
The function is also renamed to iommu_present because the
function now checks if an iommu is present for a given bus
and does not check for a global iommu anymore.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

a1b60c1c

05 Oct, 2011 1 commit

KVM: emulate lapic tsc deadline timer for guest · a3e06bbe

Committed by Liu, Jinsong on Sep 22, 2011

This patch emulates the lapic tsc deadline timer for the guest:
Enumerate tsc deadline timer capability by CPUID;
Enable tsc deadline timer mode by lapic MMIO;
Start tsc deadline timer by WRMSR;

[jan: use do_div()]
[avi: fix for !irqchip_in_kernel()]
[marcelo: another fix for !irqchip_in_kernel()]
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a3e06bbe
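
Emulating a deadline timer implies converting "TSC ticks until the deadline" into a host timer interval. The sketch below shows one plausible form of that arithmetic; the function name and parameters are assumptions, and plain 64-bit division stands in for the kernel's `do_div()` helper mentioned in the message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: nanoseconds until `deadline`, given the guest's
 * current TSC value and the TSC frequency in kHz.
 * Note: the multiplication can overflow for very distant deadlines;
 * a real implementation would need to guard against that.
 */
static uint64_t tsc_deadline_to_ns(uint64_t guest_tsc, uint64_t deadline,
                                   uint32_t tsc_khz)
{
    assert(tsc_khz != 0);

    if (deadline <= guest_tsc)
        return 0;                   /* deadline already passed: fire now */

    /* ticks / (kHz * 1000) seconds == ticks * 1e6 / kHz nanoseconds */
    return (deadline - guest_tsc) * 1000000ULL / tsc_khz;
}
```

For example, at an assumed 2 GHz TSC (`tsc_khz = 2000000`), a deadline 2,000,000 ticks ahead corresponds to 1,000,000 ns, i.e. one millisecond.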

26 Sep, 2011 2 commits

KVM: Fix simultaneous NMIs · 7460fb4a

Committed by Avi Kivity on Sep 20, 2011

If simultaneous NMIs happen, we're supposed to queue the second
and next (collapsing them), but currently we sometimes collapse
the second into the first.

Fix by using a counter for pending NMIs instead of a bool; since
the counter limit depends on whether the processor is currently
in an NMI handler, which can only be checked in vcpu context
(via the NMI mask), we add a new KVM_REQ_NMI to request recalculation
of the counter.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

7460fb4a
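
The counter-with-limit scheme can be sketched as follows. The struct, field names, and the concrete limits (two pending NMIs when no handler is running, one otherwise) are assumptions chosen to match the description; the sketch is also single-threaded, whereas the real code has to queue NMIs from other contexts safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced per-vcpu NMI state. */
struct vcpu_nmi {
    unsigned queued;        /* NMIs raised since the last recalculation */
    unsigned pending;       /* NMIs waiting to be delivered */
    bool in_nmi_handler;    /* stand-in for checking the NMI mask */
};

/* Raise an NMI; may happen outside vcpu context in the real design. */
static void queue_nmi(struct vcpu_nmi *v)
{
    assert(v != NULL);
    v->queued++;
}

/* Recalculate the pending count in vcpu context. */
static void process_nmi(struct vcpu_nmi *v)
{
    unsigned limit;

    assert(v != NULL);
    /* One NMI may run and one may wait behind it. */
    limit = v->in_nmi_handler ? 1 : 2;

    v->pending += v->queued;
    v->queued = 0;
    if (v->pending > limit)
        v->pending = limit;     /* collapse any further NMIs */
}
```

With a plain bool, the second of two simultaneous NMIs could be lost; a saturating counter keeps it while still collapsing the third and later ones.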

KVM: x86 emulator: convert push %sreg/pop %sreg to direct decode · 1cd196ea

Committed by Avi Kivity on Sep 13, 2011

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

1cd196ea