Commits · 5217973ef899ffecdd77743aae3b633994a08cde · openeuler / raspberrypi-kernel

26 Sep, 2011 40 commits

KVM: x86 emulator: qualify OpReg inhibit_byte_regs hack · 5217973e

Authored by Avi Kivity on Sep 13, 2011

OpReg decoding has a hack that inhibits byte registers for movsx and movzx
instructions.  It should be replaced by something better, but meanwhile,
qualify that the hack is only active for the destination operand.

Note these instructions only use OpReg for the destination, but better to
be explicit about it.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: switch OpImmUByte decode to decode_imm() · 608aabe3

Authored by Avi Kivity on Sep 13, 2011

Similar to SrcImmUByte.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: free up some flag bits near src, dst · 20c29ff2

Authored by Avi Kivity on Sep 13, 2011

Op fields are going to grow by a bit, we need two free bits.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: switch src2 to generic decode_operand() · 4dd6a57d

Authored by Avi Kivity on Sep 13, 2011

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: expand decode flags to 64 bits · b1ea50b2

Authored by Avi Kivity on Sep 13, 2011

Unifying the operands means not taking advantage of the fact that some
operand types can only go into certain operands (for example, DI can only
be used by the destination), so we need more bits to hold the operand type.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
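
The bit pressure described in this commit can be illustrated with a small Python model. This is only a sketch: the field width, the shift positions, and the numeric operand values are assumptions chosen for illustration, not the emulator's actual layout. It shows that once every operand slot carries a full-width type field, the third field no longer fits below bit 32.

```python
# Hypothetical layout: each operand slot gets a 5-bit type field in one word.
OP_BITS = 5
OP_MASK = (1 << OP_BITS) - 1

# Illustrative operand types (names echo the emulator's; values are assumed).
OP_NONE, OP_REG, OP_MEM, OP_IMM, OP_DI = 0, 1, 2, 3, 4

# Assumed field positions; the src2 field starts at bit 31 and so crosses
# the 32-bit boundary.
DST_SHIFT = 1
SRC_SHIFT = 6
SRC2_SHIFT = 31

MASK64 = (1 << 64) - 1


def pack_flags(dst, src, src2):
    """Pack three operand types into a single 64-bit flags word."""
    word = (dst & OP_MASK) << DST_SHIFT
    word |= (src & OP_MASK) << SRC_SHIFT
    word |= (src2 & OP_MASK) << SRC2_SHIFT
    return word & MASK64


def operand_type(flags, shift):
    """Extract the operand type stored at the given field position."""
    return (flags >> shift) & OP_MASK
```

With this layout, any operand type can be placed in any slot, at the cost of a flags word wider than 32 bits.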

KVM: x86 emulator: split dst decode to a generic decode_operand() · a9945549

Authored by Avi Kivity on Sep 13, 2011

Instead of decoding each operand using its own code, use a generic
function.  Start with the destination operand.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: move memop, memopp into emulation context · f09ed83e

Authored by Avi Kivity on Sep 13, 2011

Simplifies further generalization of decode.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: convert group 3 instructions to direct decode · 3329ece1

Authored by Avi Kivity on Sep 13, 2011

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Split up MSI-X assigned device IRQ handler · cc079396

Authored by Jan Kiszka on Sep 12, 2011

The threaded IRQ handler for MSI-X has almost nothing in common with the
INTx/MSI handler. Move its code into a dedicated handler.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Add module parameter for lapic periodic timer limit · 9bc5791d

Authored by Jan Kiszka on Sep 12, 2011

Certain guests, specifically RTOSes, request faster periodic timers than
what we allow by default. Add a module parameter to adjust the limit for
non-standard setups. Also add a rate-limited warning in case the guest
requested more.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
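
The clamping behaviour described here can be sketched as follows. This is a Python model, not the kernel code: the function name, the parameter name `min_period_us`, and its default of 500 microseconds are assumptions made for the example.

```python
NSEC_PER_USEC = 1000


def clamp_timer_period(requested_ns, min_period_us=500):
    """Clamp a guest-requested periodic timer interval to a lower bound.

    Returns (period_ns, clamped). The 'clamped' flag marks the case where
    the hypervisor would emit its rate-limited warning.
    min_period_us stands in for the module parameter (assumed name/default).
    """
    min_ns = min_period_us * NSEC_PER_USEC
    if requested_ns < min_ns:
        return min_ns, True
    return requested_ns, False
```

Lowering the parameter lets a guest that needs a faster tick, such as an RTOS, get the period it asked for.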

KVM: Clean up and extend rate-limited output · bd80158a

Authored by Jan Kiszka on Sep 12, 2011

The use of printk_ratelimit is discouraged; replace it with
pr*_ratelimited or __ratelimit. While at it, convert remaining
guest-triggerable printks to rate-limited variants.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
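
For readers unfamiliar with interval/burst rate limiting, the idea can be modelled in a few lines of Python. This is a simplified sketch of the general technique, not a port of the kernel's implementation; the class and method names are invented for the example, and the caller supplies the current time so the behaviour is deterministic.

```python
class RateLimit:
    """Allow at most `burst` messages per `interval` (same unit as `now`)."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.begin = None   # start of the current window
        self.printed = 0    # messages let through in this window
        self.missed = 0     # messages suppressed so far

    def allow(self, now):
        # Open a new window on first use or once the old one has expired.
        if self.begin is None or now - self.begin > self.interval:
            self.begin = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False
```

A guest that triggers the same message in a tight loop therefore produces at most `burst` lines per window instead of flooding the host log.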

KVM: x86: Avoid guest-triggerable printks in APIC model · 7712de87

Authored by Jan Kiszka on Sep 12, 2011

Convert remaining printks that the guest can trigger to apic_printk.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: Move kvm_trace_exit into atomic vmexit section · 1e2b1dd7

Authored by Jan Kiszka on Sep 12, 2011

This prevents events that cause the vmexit from being recorded before
the actual exit reason.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: disable writeback for TEST · caa8a168

Authored by Avi Kivity on Sep 11, 2011

The TEST instruction doesn't write its destination operand, so the
emulator should not write it back either.  The writeback could cause
problems if an MMIO register was accessed using the TEST instruction.
Recently Windows XP was observed to use TEST against the APIC ICR;
this can cause spurious IPIs.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
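
Why a needless writeback matters for MMIO can be shown with a toy model. The sketch below is Python, not emulator code; `MmioRegister` and `emulate` are hypothetical names, and the write counter stands in for a side effect such as sending an IPI.

```python
class MmioRegister:
    """Register whose writes have side effects (counted here)."""

    def __init__(self, value=0):
        self.value = value
        self.write_count = 0

    def read(self):
        return self.value

    def write(self, value):
        self.value = value
        self.write_count += 1  # stand-in for e.g. triggering an IPI


def emulate(op, dst, src):
    """Emulate AND or TEST against dst; return the resulting zero flag."""
    result = dst.read() & src
    if op == "and":
        dst.write(result)      # AND stores its result
    elif op == "test":
        pass                   # TEST only sets flags: no writeback
    else:
        raise ValueError("unsupported op: " + op)
    return result == 0
```

In this model a TEST against the register leaves `write_count` untouched, whereas an emulator that wrote the operand back would have triggered the side effect.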

KVM: Avoid needless registrations of IRQ ack notifier for assigned devices · c61fa9d6

Authored by Jan Kiszka on Sep 11, 2011

We only perform work in kvm_assigned_dev_ack_irq if the guest IRQ is of
INTx type. This completely avoids the callback invocation in non-INTx
cases by registering the IRQ ack notifier only for INTx.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Clean up unneeded void pointer casts · 9f9f6b78

Authored by Jan Kiszka on Sep 11, 2011

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: simplify emulate_1op_rax_rdx() · e8f2b1d6

Authored by Avi Kivity on Sep 07, 2011

emulate_1op_rax_rdx() is always called with the same parameters.  Simplify
by passing just the emulation context.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: merge the two emulate_1op_rax_rdx implementations · 9fef72ce

Authored by Avi Kivity on Sep 07, 2011

We have two emulate-with-extended-accumulator implementations: one
which expects traps (_ex) and one which doesn't (plain).  Drop the
plain implementation and always use the one which expects traps;
it will simply return 0 in the _ex argument and we can happily ignore
it.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: simplify emulate_1op() · d1eef45d

Authored by Avi Kivity on Sep 07, 2011

emulate_1op() is always called with the same parameters.  Simplify
by passing just the emulation context.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: simplify emulate_2op_cl() · 29053a60

Authored by Avi Kivity on Sep 07, 2011

emulate_2op_cl() is always called with the same parameters.  Simplify
by passing just the emulation context.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: simplify emulate_2op_cl() · 761441b9

Authored by Avi Kivity on Sep 07, 2011

emulate_2op_cl() is always called with the same parameters.  Simplify
by passing just the emulation context.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: simplify emulate_2op_SrcV() · a31b9cea

Authored by Avi Kivity on Sep 07, 2011

emulate_2op_SrcV(), and its siblings, emulate_2op_SrcV_nobyte()
and emulate_2op_SrcB(), all use the same calling conventions
and all get passed exactly the same parameters.  Simplify them
by passing just the emulation context.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Update documentation to include detailed ENABLE_CAP description · 821246a5

Authored by Alexander Graf on Aug 31, 2011

We have an ioctl that enables capabilities individually, but no description
of what exactly happens when we enable a capability using this ioctl.

This patch adds documentation for capability enabling in a new section
of the API documentation.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code · 19ccb76a

Authored by Paul Mackerras on Jul 23, 2011

With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
core), whenever a CPU goes idle, we have to pull all the other
hardware threads in the core out of the guest, because the H_CEDE
hcall is handled in the kernel. This is inefficient.

This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
in real mode. When a guest vcpu does an H_CEDE hcall, we now only
exit to the kernel if all the other vcpus in the same core are also
idle. Otherwise we mark this vcpu as napping, save state that could
be lost in nap mode (mainly GPRs and FPRs), and execute the nap
instruction. When the thread wakes up, because of a decrementer or
external interrupt, we come back in at kvm_start_guest (from the
system reset interrupt vector), find the `napping' flag set in the
paca, and go to the resume path.

This has some other ramifications. First, when starting a core, we
now start all the threads, both those that are immediately runnable and
those that are idle. This is so that we don't have to pull all the
threads out of the guest when an idle thread gets a decrementer interrupt
and wants to start running. In fact the idle threads will all start
with the H_CEDE hcall returning; being idle they will just do another
H_CEDE immediately and go to nap mode.

This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
These functions have been restructured to make them simpler and clearer.
We introduce a level of indirection in the wait queue that gets woken
when external and decrementer interrupts get generated for a vcpu, so
that we can have the 4 vcpus in a vcore using the same wait queue.
We need this because the 4 vcpus are being handled by one thread.

Secondly, when we need to exit from the guest to the kernel, we now
have to generate an IPI for any napping threads, because an HDEC
interrupt doesn't wake up a napping thread.

Thirdly, we now need to be able to handle virtual external interrupts
and decrementer interrupts becoming pending while a thread is napping,
and deliver those interrupts to the guest when the thread wakes.
This is done in kvmppc_cede_reentry, just before fast_guest_return.

Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
from kvm_arch_vcpu_runnable.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
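
The core decision made on H_CEDE, as described in this commit message, can be summarized in a short Python model. It is a sketch only: the real logic lives in assembly in book3s_hv_rmhandlers.S, and the function name, return strings, and list-of-booleans representation are assumptions for illustration.

```python
def handle_cede(thread_idle, me):
    """Decide what a vcpu thread does when its guest issues H_CEDE.

    thread_idle: one boolean per hardware thread in the core
                 (True means that thread's vcpu is already idle/napping).
    me:          index of the thread executing H_CEDE.
    """
    others_idle = all(idle for i, idle in enumerate(thread_idle) if i != me)
    if others_idle:
        # Nothing left to run in this core: leave the guest.
        return "exit_to_kernel"
    # Other threads are still busy: mark ourselves napping and stay put.
    thread_idle[me] = True
    return "nap"
```

Only the last thread to go idle pays the cost of exiting to the kernel; the others nap without disturbing their siblings.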

KVM: PPC: book3s_pr: Simplify transitions between virtual and real mode · 02143947

Authored by Paul Mackerras on Jul 23, 2011

This simplifies the way that the book3s_pr makes the transition to
real mode when entering the guest.  We now call kvmppc_entry_trampoline
(renamed from kvmppc_rmcall) in the base kernel using a normal function
call instead of doing an indirect call through a pointer in the vcpu.
If kvm is a module, the module loader takes care of generating a
trampoline as it does for other calls to functions outside the module.

kvmppc_entry_trampoline then disables interrupts and jumps to
kvmppc_handler_trampoline_enter in real mode using an rfi[d].
That then uses the link register as the address to return to
(potentially in module space) when the guest exits.

This also simplifies the way that we call the Linux interrupt handler
when we exit the guest due to an external, decrementer or performance
monitor interrupt.  Instead of turning on the MMU, then deciding that
we need to call the Linux handler and turning the MMU back off again,
we now go straight to the handler at the point where we would turn the
MMU on.  The handler will then return to the virtual-mode code
(potentially in the module).

Along the way, this moves the setting and clearing of the HID5 DCBZ32
bit into real-mode interrupts-off code, and also makes sure that
we clear the MSR[RI] bit before loading values into SRR0/1.

The net result is that we no longer need any code addresses to be
stored in vcpu->arch.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Assemble book3s{,_hv}_rmhandlers.S separately · 177339d7

Authored by Paul Mackerras on Jul 23, 2011

This makes arch/powerpc/kvm/book3s_rmhandlers.S and
arch/powerpc/kvm/book3s_hv_rmhandlers.S be assembled as
separate compilation units rather than having them #included in
arch/powerpc/kernel/exceptions-64s.S.  We no longer have any
conditional branches between the exception prologs in
exceptions-64s.S and the KVM handlers, so there is no need to
keep their contents close together in the vmlinux image.

In their current location, they are using up part of the limited
space between the first-level interrupt handlers and the firmware
NMI data area at offset 0x7000, and with some kernel configurations
this area will overflow (e.g. allyesconfig), leading to an
"attempt to .org backwards" error when compiling exceptions-64s.S.

Moving them out requires that we add some #includes that the
book3s_{,hv_}rmhandlers.S code was previously getting implicitly
via exceptions-64s.S.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Add sanity checking to vcpu_run · af8f38b3

Authored by Alexander Graf on Aug 10, 2011

There are multiple features in PowerPC KVM that can now be enabled
depending on the user's wishes. Some of the combinations don't make
sense or don't work though.

So this patch adds a way to check if the executing environment would
actually be able to run the guest properly. It also adds sanity
checks if PVR is set (should always be true given the current code
flow), if PAPR is only used with book3s_64 where it works and that
HV KVM is only used in PAPR mode.

Signed-off-by: Alexander Graf <agraf@suse.de>
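
The three checks listed in this commit message can be written down compactly. The following Python sketch models them under stated assumptions: `VcpuConfig` and its field names are hypothetical, and the PVR value used in any call is a placeholder rather than a real processor version.

```python
from dataclasses import dataclass


@dataclass
class VcpuConfig:
    pvr: int             # processor version register; 0 means "not set"
    papr_enabled: bool   # guest runs in PAPR mode
    is_book3s_64: bool   # target is book3s_64
    is_hv: bool          # HV-style KVM (as opposed to PR)


def sanity_check(cfg):
    """Return True if this combination of features can run a guest."""
    if not cfg.pvr:
        return False     # PVR must be set
    if cfg.papr_enabled and not cfg.is_book3s_64:
        return False     # PAPR only works on book3s_64
    if cfg.is_hv and not cfg.papr_enabled:
        return False     # HV KVM is only usable in PAPR mode
    return True
```

Running such a check before entering the guest turns an unsupported configuration into an early, explicit failure.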

KVM: PPC: Enable the PAPR CAP for Book3S · 930b412a

Authored by Alexander Graf on Aug 08, 2011

Now that Book3S PV mode can also run PAPR guests, we can add a PAPR cap and
enable it for all Book3S targets. Enabling that CAP switches KVM into PAPR
mode.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Support SC1 hypercalls for PAPR in PR mode · a668f2bd

Authored by Alexander Graf on Aug 08, 2011

PAPR defines hypercalls as SC1 instructions. Using these, the guest modifies
page tables and does other privileged operations that it wouldn't be allowed
to do in supervisor mode.

This patch adds support for PR KVM to trap these instructions and route them
through the same PAPR hypercall interface that we already use for HV style
KVM.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Stub emulate CFAR and PURR SPRs · aacf9aa3

Authored by Alexander Graf on Aug 08, 2011

Recent Linux versions use the CFAR and PURR SPRs, but don't really care about
their contents (yet). So for now, we can simply return 0 when the guest wants
to read them.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Add PAPR hypercall code for PR mode · 0254f074

Authored by Alexander Graf on Aug 08, 2011

When running a PAPR guest, we need to handle a few hypercalls in kernel space,
most prominently the page table invalidation (to sync the shadows).

So this patch adds handling for a few PAPR hypercalls to PR mode KVM. I tried
to share the code with HV mode, but it ended up being a lot easier this way
around, as the two differ too much in those details.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v1 -> v2:

  - whitespace fix

KVM: PPC: Add support for explicit HIOR setting · a15bd354

Authored by Alexander Graf on Aug 08, 2011

Until now, we always set HIOR based on the PVR, but this is just wrong.
Instead, we should be setting HIOR explicitly, so user space can decide
what the initial HIOR value is - just like on real hardware.

We keep the old PVR based way around for backwards compatibility, but
once user space uses the SREGS based method, we drop the PVR logic.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Read out syscall instruction on trap · 77e675ad

Authored by Alexander Graf on Aug 08, 2011

We have a few traps where we cache the instruction that caused the trap
for analysis later on. Since we now need to be able to distinguish
between SC 0 and SC 1 system calls and the only way to find out which
is which is by looking at the instruction, we also read out the instruction
causing the system call.

Signed-off-by: Alexander Graf <agraf@suse.de>
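
Telling SC 0 from SC 1 comes down to reading one field of the instruction word. The Python sketch below illustrates this, relying on the PowerPC encodings `sc` = 0x44000002 and `sc 1` = 0x44000022; the helper name `sc_level` is made up for the example and is not a kernel function.

```python
SC_PRIMARY_OPCODE = 17   # top six bits of an sc instruction
LEV_SHIFT = 5            # position of the LEV field, counted from the LSB
LEV_MASK = 0x7F          # LEV is a 7-bit field


def sc_level(inst):
    """Return the LEV value of an sc instruction, or None if not an sc."""
    inst &= 0xFFFFFFFF
    if (inst >> 26) != SC_PRIMARY_OPCODE:
        return None
    return (inst >> LEV_SHIFT) & LEV_MASK
```

A handler can then treat level 0 as an ordinary system call and level 1 as a hypercall candidate.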

KVM: PPC: Interpret SDR1 as HVA in PAPR mode · 04fcc11b

Authored by Alexander Graf on Aug 08, 2011

When running a PAPR guest, the guest is not allowed to set SDR1 - instead
the HTAB information is held in internal hypervisor structures. But all of
our current code relies on SDR1 and walking the HTAB like on real hardware.

So in order to not be too intrusive, we simply set SDR1 to the HTAB we hold
in host memory. That way we can keep the HTAB in user space, but use it from
kernel space to map the guest.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: Check privilege level on SPRs · 317a8fa3

Authored by Alexander Graf on Aug 08, 2011

We have 3 privilege levels: problem state, supervisor state and hypervisor
state. Each of them can access different SPRs, so we need to check on every
SPR if it's accessible in the respective mode.

Signed-off-by: Alexander Graf <agraf@suse.de>
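
One way to express such a per-SPR privilege check is shown below as a Python model. The function name, the level constants, and the exact rules are assumptions for illustration: the sketch supposes that a PAPR guest runs at most at supervisor level and that a guest with MSR[PR] set is in problem state.

```python
# Privilege levels, ordered from least to most privileged.
PRIV_PROBLEM, PRIV_SUPER, PRIV_HYPER = 0, 1, 2


def spr_allowed(required_level, msr_pr, papr_enabled):
    """Return True if the guest may access an SPR needing required_level.

    msr_pr:       guest MSR[PR] bit (True means problem state).
    papr_enabled: guest runs in PAPR mode, i.e. without hypervisor rights.
    """
    if papr_enabled and required_level > PRIV_SUPER:
        return False     # PAPR guests never see hypervisor SPRs
    if msr_pr and required_level > PRIV_PROBLEM:
        return False     # problem state only gets problem-state SPRs
    return True
```

Each emulated SPR access would call the check with the level that SPR requires and refuse the access when it returns False.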

KVM: PPC: Add papr_enabled flag · 9432ba60

Authored by Alexander Graf on Aug 08, 2011

When running a PAPR guest, some things change. The privilege level drops
from hypervisor to supervisor, SDR1 gets treated differently and we interpret
hypercalls. For bisectability's sake, add the flag now, but only enable it when
all the support code is there.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: PPC: move compute_tlbie_rb to book3s common header · db507c30

Authored by Alexander Graf on Jul 08, 2011

We need the compute_tlbie_rb in _pr and _hv implementations for papr
soon, so let's move it over to a common header file that both
implementations can leverage.

Signed-off-by: Alexander Graf <agraf@suse.de>

KVM: Restore missing powerpc API docs · 36442687

Authored by Avi Kivity on Aug 29, 2011

Commit 371fefd6 lost a doc hunk somehow; restore it.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: APIC: avoid instruction emulation for EOI writes · 58fbbf26

Authored by Kevin Tian on Aug 30, 2011

Instruction emulation for EOI writes can be skipped, since a sane
guest simply uses MOV instead of string operations. This is a nice
improvement when the guest doesn't support x2apic or Hyper-V EOI
support.

A single VM showed a ~8% bandwidth improvement (7.4Gbps->8Gbps),
by saving ~5% of cycles from EOI emulation.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
<Based on earlier work from>:
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
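
The fast path can be pictured as a small dispatcher in front of the instruction emulator. The Python sketch below is a model rather than the kernel code: `handle_apic_access` and its callback parameters are hypothetical, while 0xB0 is the EOI register's offset within the APIC page.

```python
APIC_EOI = 0xB0   # offset of the EOI register in the APIC page


def handle_apic_access(offset, is_write, on_eoi, emulate):
    """Handle a trapped APIC access, short-circuiting EOI writes.

    on_eoi:  callback that performs the EOI directly.
    emulate: callback for the slow path (full instruction emulation).
    """
    if is_write and offset == APIC_EOI:
        on_eoi()                  # no need to decode the instruction
        return "fast"
    emulate(offset, is_write)     # everything else takes the slow path
    return "emulated"
```

The shortcut is safe only on the assumption stated in the commit message, namely that guests write EOI with a plain MOV.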

KVM: SVM: Fix TSC MSR read in nested SVM · 45133eca

Authored by Nadav Har'El on Aug 02, 2011

When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be
read without exit), we need to return L2's notion of the TSC, not L1's.

The current code incorrectly returned L1 TSC, because svm_get_msr() was also
used in x86.c where this was assumed, but now that these places call the new
svm_read_l1_tsc(), the MSR read can be fixed.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
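
The difference between L1's and L2's view of the TSC can be illustrated with offset arithmetic. This Python sketch assumes the usual model in which each level sees the level below it plus a signed offset; the function and parameter names are invented for the example and do not mirror the kernel's data structures.

```python
MASK64 = (1 << 64) - 1   # the TSC is a 64-bit counter and wraps


def l1_tsc(host_tsc, l1_offset):
    """TSC as seen by the L1 guest: host TSC plus L1's offset."""
    return (host_tsc + l1_offset) & MASK64


def l2_tsc(host_tsc, l1_offset, l2_offset):
    """TSC as seen by the L2 guest: L1's view plus the offset L1 set for L2."""
    return (l1_tsc(host_tsc, l1_offset) + l2_offset) & MASK64
```

An MSR read performed by L2 should yield the second value, while callers that need L1's time base should use the first.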