Commits · dee6bb70e4ac0588c98cc4e661664f0653117f89 · openeuler / raspberrypi-kernel

11 May, 2011 31 commits

KVM: SVM: Add intercept checks for descriptor table accesses · dee6bb70

Committed by Joerg Roedel on Apr 04, 2011

This patch adds intercept checks into the KVM instruction
emulator to check for the 8 instructions that access the
descriptor table addresses.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Add intercept check for accessing dr registers · 3b88e41a

Committed by Joerg Roedel on Apr 04, 2011

This patch adds the intercept checks for instructions
accessing the debug registers.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Add intercept check for emulated cr accesses · cfec82cb

Committed by Joerg Roedel on Apr 04, 2011

This patch adds all necessary intercept checks for
instructions that access the crX registers.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Add x86 callback for intercept check · 8a76d7f2

Committed by Joerg Roedel on Apr 04, 2011

This patch adds a callback into kvm_x86_ops so that svm and
vmx code can do intercept checks on emulated instructions.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: Add flag to check for protected mode instructions · 8ea7d6ae

Committed by Joerg Roedel on Apr 04, 2011

This patch adds a flag for opcodes to tag instructions
that are only recognized in protected mode. The necessary
check is added too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: Add check_perm callback · d09beabd

Committed by Joerg Roedel on Apr 04, 2011

This patch adds a check_perm callback for each opcode into
the instruction emulator. This will be used to do all
necessary permission checks on instructions before checking
whether they are intercepted or not.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: Don't write-back cpu-state on X86EMUL_INTERCEPTED · 775fde86

Committed by Joerg Roedel on Apr 04, 2011

This patch prevents the changed CPU state from being written back
when the emulator detects that the instruction was
intercepted by the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: add SVM intercepts · 3c6e276f

Committed by Avi Kivity on Apr 04, 2011

Add intercept codes for instructions defined by SVM as
interceptable.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: add framework for instruction intercepts · c4f035c6

Committed by Avi Kivity on Apr 04, 2011

When running in guest mode, certain instructions can be intercepted by
hardware.  This also holds for nested guests running on emulated
virtualization hardware, in particular instructions emulated by kvm
itself.

This patch adds a framework for intercepting instructions.  If an
instruction is marked for interception, and if we're running in guest
mode, a callback is called to check whether an intercept is needed or
not.  The callback is called at three points in time: immediately after
beginning execution, after checking privilege exceptions, and after
checking memory exceptions.  This suits the different interception points
defined for different instructions and for the various virtualization
instruction sets.

In addition, a new X86EMUL_INTERCEPTED return code is defined, which any
callback or memory access may return, allowing the more complicated
intercepts to be implemented in existing callbacks.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
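
The three check points described in this commit message can be pictured with a small user-space sketch. This is not the emulator code from the patch: every name below (`stage`, `emul_rc`, `insn`, `vcpu`, `check_intercept`, `emulate`, `intercept_after_privilege_check`) is hypothetical, and only the ordering of the checks and the idea of a dedicated "intercepted" return code are taken from the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three points listed in the commit message. */
enum stage { STAGE_PRE_EXCEPT, STAGE_POST_EXCEPT, STAGE_POST_MEMACCESS };

/* Hypothetical return codes; EMUL_INTERCEPTED stands in for the new code. */
enum emul_rc { EMUL_CONTINUE, EMUL_FAULT, EMUL_INTERCEPTED };

struct insn {
    bool marked_for_intercept;  /* instruction is tagged as interceptable */
    bool privileged;            /* requires CPL 0 in this model */
    bool touches_memory;        /* has a memory operand */
};

struct vcpu {
    bool guest_mode;            /* a nested guest is running */
    int  cpl;                   /* current privilege level */
    bool mem_ok;                /* memory operand is accessible */
    /* Backend callback; the signature is an assumption of this sketch. */
    enum emul_rc (*check_intercept)(const struct insn *insn, enum stage stage);
};

/* Example backend: intercept only after the privilege check has passed. */
enum emul_rc intercept_after_privilege_check(const struct insn *insn,
                                             enum stage stage)
{
    (void)insn;
    return stage == STAGE_POST_EXCEPT ? EMUL_INTERCEPTED : EMUL_CONTINUE;
}

/* Consult the callback only for tagged instructions while in guest mode. */
static enum emul_rc maybe_intercept(const struct vcpu *v,
                                    const struct insn *insn, enum stage stage)
{
    if (!insn->marked_for_intercept || !v->guest_mode ||
        v->check_intercept == NULL)
        return EMUL_CONTINUE;
    return v->check_intercept(insn, stage);
}

enum emul_rc emulate(const struct vcpu *v, const struct insn *insn)
{
    enum emul_rc rc;

    /* 1: immediately after beginning execution */
    rc = maybe_intercept(v, insn, STAGE_PRE_EXCEPT);
    if (rc != EMUL_CONTINUE)
        return rc;

    if (insn->privileged && v->cpl != 0)
        return EMUL_FAULT;              /* privilege exception */

    /* 2: after checking privilege exceptions */
    rc = maybe_intercept(v, insn, STAGE_POST_EXCEPT);
    if (rc != EMUL_CONTINUE)
        return rc;

    if (insn->touches_memory && !v->mem_ok)
        return EMUL_FAULT;              /* memory exception */

    /* 3: after checking memory exceptions */
    rc = maybe_intercept(v, insn, STAGE_POST_MEMACCESS);
    if (rc != EMUL_CONTINUE)
        return rc;

    return EMUL_CONTINUE;               /* the instruction would execute here */
}
```

In this model a privileged instruction issued at CPL 3 faults before the second check point is reached, which is why some intercepts have to be evaluated after the exception checks rather than before them.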

KVM: x86 emulator: implement movdqu instruction (f3 0f 6f, f3 0f 7f) · aa97bb48

Committed by Avi Kivity on Jan 20, 2010

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: SSE support · 1253791d

Committed by Avi Kivity on Mar 29, 2011

Add support for marking an instruction as SSE, switching registers used
to the SSE register file.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: Specialize decoding for insns with 66/f2/f3 prefixes · 0d7cdee8

Committed by Avi Kivity on Mar 29, 2011

Most SIMD instructions use the 66/f2/f3 prefixes to distinguish between
different variants of the same instruction.  Usually the encoding is quite
regular, but in some cases (including non-SIMD instructions) the prefixes
generate very different instructions.  Examples include XCHG/PAUSE,
MOVQ/MOVDQA/MOVDQU, and MOVBE/CRC32.

Allow the emulator to handle these special cases by splitting such opcodes
into groups, with different decode flags and execution functions for different
prefixes.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: define callbacks for using the guest fpu within the emulator · 5037f6f3

Committed by Avi Kivity on Mar 28, 2011

Needed for emulating fpu instructions.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: do not munge rep prefix · 1d6b114f

Committed by Avi Kivity on Jan 20, 2010

Currently we store a rep prefix as 1 or 2 depending on whether it is a REPE or
REPNE.  Since sse instructions depend on the prefix value, store it as the
original opcode to simplify things further on.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: 16-byte mmio support · cef4dea0

Committed by Avi Kivity on Jan 20, 2010

Since sse instructions can issue 16-byte mmios, we need to support them. We
can't increase the kvm_run mmio buffer size to 16 bytes without breaking
compatibility, so instead we break the large mmios into two smaller 8-byte
ones. Since the bus is 64-bit we aren't breaking any atomicity guarantees.
Signed-off-by: Avi Kivity <avi@redhat.com>
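
As a rough illustration of the splitting described in this commit message, the sketch below cuts one large access into pieces of at most 8 bytes. It is a stand-alone model, not the KVM code: `mmio_fragment`, `split_mmio` and `MMIO_CHUNK` are hypothetical names, and only the 8-byte limit comes from the message.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest access that fits the existing buffer, per the commit message. */
#define MMIO_CHUNK 8u

struct mmio_fragment {
    uint64_t gpa;  /* guest physical address of this fragment */
    size_t   len;  /* fragment length in bytes, at most MMIO_CHUNK */
};

/*
 * Split the access [gpa, gpa + len) into fragments of at most MMIO_CHUNK
 * bytes. Returns the number of fragments written to `out`, or 0 if `out`
 * cannot hold them all (a zero-length access also yields 0).
 */
size_t split_mmio(uint64_t gpa, size_t len,
                  struct mmio_fragment *out, size_t max_out)
{
    size_t n = 0;

    while (len > 0) {
        size_t chunk = len < MMIO_CHUNK ? len : MMIO_CHUNK;

        if (n == max_out)
            return 0;           /* caller's array is too small */
        out[n].gpa = gpa;
        out[n].len = chunk;
        n++;
        gpa += chunk;
        len -= chunk;
    }
    return n;
}
```

A 16-byte access at address 0x1000 comes out as two fragments, at 0x1000 and 0x1008, each 8 bytes long, which matches the "two smaller 8-byte ones" in the message.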

KVM: Split mmio completion into a function · 5287f194

Committed by Avi Kivity on Jan 19, 2010

Make room for sse mmio completions.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: extend in-kernel mmio to handle >8 byte transactions · 70252a10

Committed by Avi Kivity on Jan 19, 2010

Needed for coalesced mmio using sse.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: better fix for race between nmi injection and enabling nmi window · 1499e54a

Committed by Gleb Natapov on Apr 01, 2011

Fix race between nmi injection and enabling nmi window in a simpler way.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Revert "KVM: Fix race between nmi injection and enabling nmi window" · c761e586

Committed by Marcelo Tosatti on Apr 01, 2011

This reverts commit f8636849.

Simpler fix to follow.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: expose async pf through our standard mechanism · 32918924

Committed by Glauber Costa on Mar 23, 2011

As Avi recently mentioned, the new standard mechanism for exposing features
is KVM_GET_SUPPORTED_CPUID, not spamming CAPs. For some reason async pf
missed that.

So expose async_pf here.
Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Gleb Natapov <gleb@redhat.com>
CC: Avi Kivity <avi@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: simplify NMI mask management · 654f06fc

Committed by Avi Kivity on Mar 23, 2011

Use vmx_set_nmi_mask() instead of open-coding management of
the hardware bit and the software hint (nmi_known_unmasked).

There's a slight change of behaviour when running without
hardware virtual NMI support - we now clear the NMI mask if
NMI delivery faulted in that case as well.  This improves
emulation accuracy.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Remove unused svm_features · 89a9fb78

Committed by Jan Kiszka on Mar 24, 2011

We use boot_cpu_has now.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Use cached VM_EXIT_INTR_INFO in handle_exception · 88786475

Committed by Avi Kivity on Mar 07, 2011

vmx_complete_atomic_exit() cached it for us, so we can use it here.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Don't VMREAD VM_EXIT_INTR_INFO unconditionally · c5ca8e57

Committed by Avi Kivity on Mar 07, 2011

Only read it if we're going to use it later.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Refactor vmx_complete_atomic_exit() · 00eba012

Committed by Avi Kivity on Mar 07, 2011

Move the exit reason checks to the front of the function, for early
exit in the common case.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Qualify check for host NMI · f9902069

Committed by Avi Kivity on Mar 07, 2011

Check for the exit reason first; this allows us, later,
to avoid a VMREAD for VM_EXIT_INTR_INFO_FIELD.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Avoid vmx_recover_nmi_blocking() when unneeded · 9d58b931

Committed by Avi Kivity on Mar 07, 2011

When we haven't injected an interrupt, we don't need to recover
the nmi blocking state (since the guest can't set it by itself).
This allows us to avoid a VMREAD later on.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Cache cpl · 69c73028

Committed by Avi Kivity on Mar 07, 2011

We may read the cpl quite often in the same vmexit (instruction privilege
check, memory access checks for instruction and operands), so we gain
a bit if we cache the value.
Signed-off-by: Avi Kivity <avi@redhat.com>
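
The caching idea in this commit message can be modelled in a few lines. The sketch below is a user-space illustration only: `vcpu_model`, `expensive_read_cpl`, `get_cpl` and `invalidate_cpl` are hypothetical names, and where exactly the real code invalidates its cache is not stated in the message, so the invalidation hook here is an assumption.

```c
#include <stdbool.h>

struct vcpu_model {
    int  hw_cpl;     /* value the expensive hardware read would return */
    int  reads;      /* how many expensive reads were performed */
    bool cpl_valid;  /* true while cpl_cache holds a usable value */
    int  cpl_cache;  /* last value read */
};

/* Stand-in for the costly read; counts how often it is invoked. */
static int expensive_read_cpl(struct vcpu_model *v)
{
    v->reads++;
    return v->hw_cpl;
}

/* Return the privilege level, doing the expensive read at most once
 * until the cache is invalidated. */
int get_cpl(struct vcpu_model *v)
{
    if (!v->cpl_valid) {
        v->cpl_cache = expensive_read_cpl(v);
        v->cpl_valid = true;
    }
    return v->cpl_cache;
}

/* Assumed hook: called whenever guest state that determines the
 * privilege level may have changed, for example on the next exit. */
void invalidate_cpl(struct vcpu_model *v)
{
    v->cpl_valid = false;
}
```

Several privilege and memory-access checks within one exit then share a single expensive read, which is the gain the message refers to.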

KVM: VMX: Optimize vmx_get_cpl() · f4c63e5d

Committed by Avi Kivity on Mar 07, 2011

In long mode, vm86 mode is disallowed, so we need not check for
it.  Reading rflags.vm may require a VMREAD, so it is expensive.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Optimize vmx_get_rflags() · 6de12732

Committed by Avi Kivity on Mar 07, 2011

If called several times within the same exit, return cached results.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versions · f6e78475

Committed by Avi Kivity on Aug 02, 2010

Some rflags bits are owned by the host, not guest, so we need to use
kvm_get_rflags() to strip those bits away or kvm_set_rflags() to add them
back.
Signed-off-by: Avi Kivity <avi@redhat.com>

06 May, 2011 1 commit

perf events, x86: Fix Intel Nehalem and Westmere last level cache event definitions · 63b6a675

Committed by Peter Zijlstra on Apr 23, 2011

The Intel Nehalem offcore bits implemented in:

  e994d7d2: perf: Fix LLC-* events on Intel Nehalem/Westmere

... are wrong: they implemented _ACCESS as _HIT and counted OTHER_CORE_HIT* as
MISS even though it's clearly documented as an L3 hit ...

Fix them and the Westmere definitions as well.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1299119690-13991-3-git-send-email-ming.m.lin@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

03 May, 2011 3 commits

x86, reboot: Fix relocations in reboot_32.S · 7806a49a

Committed by H. Peter Anvin on May 02, 2011

The use of base for %ebx in this file is arbitrary, *except* that we
also use it to compute the real-mode segment.  Therefore, make it so
that r_base really is the true address to which %ebx points.

This resolves kernel bugzilla 33302.
Reported-and-tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org

xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top · b9269dc7

Committed by Stefano Stabellini on Apr 12, 2011

mask_rw_pte is currently checking if a pfn is a pagetable page if it
falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
because pgt_buf_end is a moving target: pgt_buf_top is the real
boundary.
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
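
The difference between the two boundaries named in this commit message can be shown with a pair of range checks. This is a minimal sketch, not the code in mask_rw_pte: the `pgt_buf` struct and both function names are hypothetical, and treating the upper bound as exclusive is an assumption.

```c
#include <stdbool.h>

/* Page frame numbers delimiting the early pagetable buffer. */
struct pgt_buf {
    unsigned long start;  /* first pfn of the buffer */
    unsigned long end;    /* first pfn not yet used as a pagetable; grows */
    unsigned long top;    /* first pfn past the whole buffer; fixed */
};

/* Check against the moving boundary: pages that will become pagetable
 * pages later are not recognized yet. */
bool in_pgt_range_upto_end(const struct pgt_buf *b, unsigned long pfn)
{
    return pfn >= b->start && pfn < b->end;
}

/* Check against the real boundary: every page of the buffer counts,
 * whether or not it is in use as a pagetable yet. */
bool in_pgt_range_upto_top(const struct pgt_buf *b, unsigned long pfn)
{
    return pfn >= b->start && pfn < b->top;
}
```

With a buffer of `start = 100`, `end = 104`, `top = 110`, pfn 105 is missed by the first check but caught by the second, which is the case the commit addresses.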

xen/mmu: Add workaround "x86-64, mm: Put early page table high" · a3864783

Committed by Konrad Rzeszutek Wilk on Apr 29, 2011

As a consequence of the commit:

commit 4b239f45
Author: Yinghai Lu <yinghai@kernel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

it causes the Linux kernel to crash under Xen:

mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
(XEN) mm.c:3027:d0 Error while pinning mfn b1d89
(XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...

The reason is that at some point init_memory_mapping is going to reach
the pagetable pages area and map those pages too (mapping them as normal
memory that falls in the range of addresses passed to init_memory_mapping
as argument). Some of those pages are already pagetable pages (they are
in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
mapped RO and everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).

For this reason, this function is introduced which is called _after_
the init_memory_mapping has completed (in a perfect world we would
call this function from init_memory_mapping, but let's ignore that).

Because we are called _after_ init_memory_mapping the pgt_buf_[start,
end,top] have all changed to new values (b/c another init_memory_mapping
is called). Hence, the first time we enter this function, we save
away the pgt_buf_start value and update the pgt_buf_[end,top].

When we detect that the "old" pgt_buf_start through pgt_buf_end
PFNs have been reserved (so memblock_x86_reserve_range has been called),
we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.

And then we update those "old" pgt_buf_[end|top] with the new ones
so that we can redo this on the next pagetable.
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Updated with Jeremy's comments]
[v2: Added the crash output]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

02 May, 2011 2 commits

x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() · 2be19102

Committed by Yinghai Lu on May 01, 2011

numa_cleanup_meminfo() trims each memblk between low (0) and
high (max_pfn) limits and discards empty ones.  However, the
emptiness detection incorrectly used an equality test.  If the
start of a memblk is higher than max_pfn, it is empty but fails
the equality test and doesn't get discarded.

The condition triggers when max_pfn is lower than start of a
NUMA node and results in memory misconfiguration - leading to
WARN_ON()s and other funnies.  The bug was discovered in devel
branch where 32bit too uses this code path for NUMA init.  If a
node is above the addressing limit, max_pfn ends up lower than
the node triggering this problem.

The failure hasn't been observed on x86-64 but is still possible
with broken hardware e820/NUMA info.  As the fix is very low
risk, it would be better to apply it even for 64bit.

Fix it by using >= instead of ==.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Extracted the actual fix from the original patch and rewrote patch description. ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
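
To see why the equality test misses the case described in this commit message, the following stand-alone sketch trims a block to the limits and then applies both tests. It models the logic only: the `memblk` struct and the function names are hypothetical, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

struct memblk {
    uint64_t start;
    uint64_t end;
};

/* Trim a block to the [low, high] limits, as the message describes. */
static struct memblk trim(struct memblk b, uint64_t low, uint64_t high)
{
    if (b.start < low)
        b.start = low;
    if (b.end > high)
        b.end = high;
    return b;
}

/* Old test: only catches blocks trimmed to exactly zero length. */
bool is_empty_equality(struct memblk b, uint64_t low, uint64_t high)
{
    b = trim(b, low, high);
    return b.start == b.end;
}

/* Fixed test: also catches blocks lying entirely above `high`, where
 * trimming leaves end below start. */
bool is_empty_fixed(struct memblk b, uint64_t low, uint64_t high)
{
    b = trim(b, low, high);
    return b.start >= b.end;
}
```

For a block spanning 200 to 300 with `high = 100`, trimming yields start 200 and end 100; the equality test reports it as non-empty, while the `>=` test discards it.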

x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors · e20a2d20

Committed by Boris Ostrovsky on Apr 29, 2011

Older AMD K8 processors (Revisions A-E) are affected by erratum
400 (APIC timer interrupts don't occur in C states greater than
C1). This, for example, means that X86_FEATURE_ARAT flag should
not be set for these parts.

This addresses a regression introduced by commit
b87cf80a ("x86, AMD: Set ARAT feature on AMD processors")
where the system may become unresponsive until an external
interrupt (such as keyboard input) occurs. This results, for
example, in time not being reported correctly, lack of progress
on the system and other lockups.
Reported-by: Joerg-Volker Peetz <jvpeetz@web.de>
Tested-by: Joerg-Volker Peetz <jvpeetz@web.de>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

28 Apr, 2011 2 commits

x86: ce4100: Configure IOAPIC pins for USB and SATA to level type · 1ff42c32

Committed by Sebastian Andrzej Siewior on Apr 27, 2011

The USB and SATA ioapic interrupt pins are configured as edge type,
but need to be level type interrupts to work correctly.

[ tglx: Split out from the combo patch ]

Cc: Torben Hohn <torbenh@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/%3C20110427143052.GA15211%40linutronix.de%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: devicetree: Configure IOAPIC pin only once · 20443598

Committed by Sebastian Andrzej Siewior on Apr 27, 2011

We use io_apic_setup_irq_pin() in order to configure the pin's interrupt
number, polarity and type. This is done on every irq_create_of_mapping()
call, which happens for instance during pci enable calls. Level typed
interrupts are masked by default, edge typed ones are unmasked.

On the first ->xlate() call the level interrupt is configured and
masked. The driver calls request_irq() and the line is unmasked. Let's
assume the interrupt line is shared with another device and we call
pci_enable_device() for this device. The ->xlate() configures the pin
again and it is masked. request_irq() does not unmask the line because
it _is_ already unmasked according to its internal state. So the
interrupt will never be unmasked again.

This patch is based on an earlier work by Torben Hohn and solves the
problem by configuring the pin only once. Since all devices must agree
on the same type and polarity there is no point in configuring the pin
more than once.

[ tglx: Split out the ce4100 part into a separate patch ]

Cc: Torben Hohn <torbenh@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/%3C20110427143052.GA15211%40linutronix.de%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
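
The failure sequence and the "configure once" fix described in this commit message can be replayed in a small model. Everything below is a sketch under assumptions: `ioapic_model`, `configure_level_pin`, `setup_pin_once`, `unmask_pin` and the pin count of 24 are hypothetical and are not the kernel's data structures or functions.

```c
#include <stdbool.h>

#define NPINS 24  /* assumed pin count, for illustration only */

struct ioapic_model {
    bool configured[NPINS];   /* pin has been set up already */
    bool masked[NPINS];       /* current mask state of the pin */
    int  setup_count[NPINS];  /* how often the pin was programmed */
};

static bool valid_pin(int pin)
{
    return pin >= 0 && pin < NPINS;
}

/* Programming a level-triggered pin leaves it masked, as in the message. */
static void configure_level_pin(struct ioapic_model *io, int pin)
{
    io->masked[pin] = true;
    io->setup_count[pin]++;
}

/* Configure the pin only on the first request. Returns true if the pin
 * was programmed by this call, false if it was left untouched. */
bool setup_pin_once(struct ioapic_model *io, int pin)
{
    if (!valid_pin(pin) || io->configured[pin])
        return false;
    configure_level_pin(io, pin);
    io->configured[pin] = true;
    return true;
}

/* Models the unmask done when the first driver requests the interrupt. */
void unmask_pin(struct ioapic_model *io, int pin)
{
    if (valid_pin(pin))
        io->masked[pin] = false;
}
```

When a second device sharing the line triggers another mapping, `setup_pin_once()` returns without reprogramming the pin, so the line stays unmasked for the first driver instead of being masked behind its back.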

27 Apr, 2011 1 commit

perf, x86, nmi: Move LVT un-masking into irq handlers · 2bce5dac

Committed by Don Zickus on Apr 27, 2011

It was noticed that P4 machines were generating double NMIs for
each perf event.  These extra NMIs lead to 'Dazed and confused'
messages on the screen.

I tracked this down to a P4 quirk that said the overflow bit had
to be cleared before re-enabling the apic LVT mask.  My first
attempt was to move the un-masking inside the perf nmi handler
from before the chipset NMI handler to after.

This broke Nehalem boxes that seem to like the unmasking before
the counters themselves are re-enabled.

In order to keep this change simple for 2.6.39, I decided to
just simply move the apic LVT un-masking to the beginning of all
the chipset NMI handlers, with the exception of Pentium4's to
fix the double NMI issue.

Later on we can move the un-masking to later in the handlers to
save a number of 'extra' NMIs on those particular chipsets.

I tested this change on a P4 machine, an AMD machine, a Nehalem
box, and a core2quad box.  'perf top' worked correctly along
with various other small 'perf record' runs.  Anything high
stress breaks all the machines but that is a different problem.

Thanks to various people for testing different versions of this
patch.
Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CC: Cyrill Gorcunov <gorcunov@gmail.com>