1. 06 Dec 2009, 1 commit
  2. 03 Dec 2009, 17 commits
    • KVM: VMX: Fix comparison of guest efer with stale host value · d5696725
      Committed by Avi Kivity
      update_transition_efer() masks out some efer bits when deciding whether
      to switch the msr during guest entry; for example, NX is emulated using the
      mmu so we don't need to disable it, and LMA/LME are handled by the hardware.
      
      However, with shared msrs, the comparison is made against a stale value;
      at the time of the guest switch we may be running with another guest's efer.
      
      Fix by deferring the mask/compare to the actual point of guest entry.
      
      Noted by Marcelo.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: limit instructions to 15 bytes · eb3c79e6
      Committed by Avi Kivity
      While we are never normally passed an instruction that exceeds 15 bytes,
      smp games can cause us to attempt to interpret one, which will cause
      large latencies in non-preempt hosts.
      
      Cc: stable@kernel.org
      Signed-off-by: Avi Kivity <avi@redhat.com>
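The guard described above can be modeled in plain C. This is an illustrative sketch only, with hypothetical names rather than the kernel's: the emulator's byte-fetch path refuses to consume more than the architectural 15-byte maximum instead of looping indefinitely on a stream that never decodes.

```c
#include <assert.h>
#include <stddef.h>

/* x86 instructions are architecturally at most 15 bytes long. */
#define X86_MAX_INSN_LEN 15

struct fetch_ctx {
    const unsigned char *src;   /* guest instruction bytes */
    size_t pos;                 /* bytes consumed so far */
};

/* Fetch one more instruction byte; fail once the architectural limit is
 * reached, so a maliciously patched byte stream cannot cause an unbounded
 * decode loop (and the resulting latency spike on non-preempt hosts). */
static int insn_fetch(struct fetch_ctx *c, unsigned char *out)
{
    if (c->pos >= X86_MAX_INSN_LEN)
        return -1;              /* decode error: instruction too long */
    *out = c->src[c->pos++];
    return 0;
}
```

The 16th fetch attempt fails regardless of the bytes supplied, bounding the work per emulated instruction.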
    • KVM: x86: Add KVM_GET/SET_VCPU_EVENTS · 3cfc3092
      Committed by Jan Kiszka
      This new IOCTL exports all as-yet user-invisible state related to
      exceptions, interrupts, and NMIs. Together with appropriate user space
      changes, this fixes sporadic problems of vmsave/restore, live migration
      and system reset.
      
      [avi: future-proof abi by adding a flags field]
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 shared msr infrastructure · 18863bdd
      Committed by Avi Kivity
      The various syscall-related MSRs are fairly expensive to switch.  Currently
      we switch them on every vcpu preemption, which is far too often:
      
      - if we're switching to a kernel thread (idle task, threaded interrupt,
        kernel-mode virtio server (vhost-net), for example) and back, then
        there's no need to switch those MSRs since kernel threads won't
        be exiting to userspace.
      
      - if we're switching to another guest running an identical OS, most likely
        those MSRs will have the same value, so there's little point in reloading
        them.
      
      - if we're running the same OS on the guest and host, the MSRs will have
        identical values and reloading is unnecessary.
      
      This patch uses the new user return notifiers to implement last-minute
      switching, and checks the msr values to avoid unnecessary reloading.
      Signed-off-by: Avi Kivity <avi@redhat.com>
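The last-minute switching described above can be sketched as a small userspace model, with made-up names standing in for the kernel's infrastructure: context switches only stage the desired MSR values cheaply, and the expensive write (a stand-in for wrmsr here) happens once, on return to userspace, and only for MSRs whose value actually changed.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SHARED_MSRS 3

static unsigned long long cpu_msr[NR_SHARED_MSRS];   /* what hardware holds */
static unsigned long long want_msr[NR_SHARED_MSRS];  /* what current context wants */
static bool dirty;
static int wrmsr_count;                              /* counts simulated wrmsr cost */

static void fake_wrmsr(int slot, unsigned long long val)
{
    cpu_msr[slot] = val;
    wrmsr_count++;
}

/* Called on every context switch: cheap, just records the desired value. */
static void set_shared_msr(int slot, unsigned long long val)
{
    want_msr[slot] = val;
    dirty = true;
}

/* User-return notifier: reconcile hardware with the desired values,
 * skipping MSRs whose value is already correct. */
static void on_user_return(void)
{
    if (!dirty)
        return;
    for (int i = 0; i < NR_SHARED_MSRS; i++)
        if (cpu_msr[i] != want_msr[i])
            fake_wrmsr(i, want_msr[i]);
    dirty = false;
}
```

Switching between two guests running identical OSes thus costs zero MSR writes, and switching to a kernel thread and back costs none either, since no user return happens in between.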
    • KVM: allow userspace to adjust kvmclock offset · afbcf7ab
      Committed by Glauber Costa
      When we migrate a kvm guest that uses pvclock between two hosts, we may
      suffer a large skew. This is because there can be significant differences
      between the monotonic clock of the hosts involved. When a new host with
      a much larger monotonic time starts running the guest, the view of time
      will be significantly impacted.
      
      Situation is much worse when we do the opposite, and migrate to a host with
      a smaller monotonic clock.
      
      This proposed ioctl allows userspace to inform us of the monotonic
      clock value on the source host, so we can keep the time skew small
      and, more importantly, ensure time never goes backwards. Userspace
      may also need to read back the current value, since from the first
      migration onwards it is no longer reflected by a simple call to
      clock_gettime().
      
      [marcelo: future-proof abi with a flags field]
      [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
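The offset bookkeeping behind this ioctl can be sketched as follows. This is a simplified model with hypothetical function names (the real interface is the KVM_GET_CLOCK/KVM_SET_CLOCK ioctl pair): the kernel stores the delta between the guest-visible clock and its own monotonic clock, so migrating means reading the clock on the source and setting it on the destination.

```c
#include <assert.h>

struct kvm_clock_state {
    long long kvmclock_offset;  /* guest clock minus host monotonic clock */
};

/* KVM_SET_CLOCK analogue: userspace supplies the guest-visible clock value;
 * we store the delta against our own monotonic clock. */
static void set_clock(struct kvm_clock_state *s,
                      long long guest_ns, long long host_mono_ns)
{
    s->kvmclock_offset = guest_ns - host_mono_ns;
}

/* KVM_GET_CLOCK analogue / guest read: apply the delta so the guest view
 * stays stable across hosts with different monotonic bases. */
static long long get_clock(const struct kvm_clock_state *s,
                           long long host_mono_ns)
{
    return host_mono_ns + s->kvmclock_offset;
}
```

Because the destination host initializes its offset from the source's guest-visible value, the guest clock continues from where it left off instead of jumping with the difference between the hosts' monotonic clocks.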
    • KVM: SVM: Cleanup NMI singlestep · 6be7d306
      Committed by Jan Kiszka
      Push the NMI-related singlestep variable into vcpu_svm. It's dealing
      with an AMD-specific deficit, nothing generic for x86.
      Acked-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      
       arch/x86/include/asm/kvm_host.h |    1 -
       arch/x86/kvm/svm.c              |   12 +++++++-----
       2 files changed, 7 insertions(+), 6 deletions(-)
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    • KVM: x86: Fix guest single-stepping while interruptible · 94fe45da
      Committed by Jan Kiszka
      Commit 705c5323 opened the doors of hell by unconditionally injecting
      single-step flags as long as guest_debug signaled this. This doesn't
      work when the guest branches into some interrupt or exception handler
      and triggers a vmexit with flag reloading.
      
      Fix it by saving cs:rip when user space requests single-stepping and
      restricting the trace flag injection to this guest code position.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
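The save-and-restrict logic can be modeled in a few lines. This is a hypothetical sketch, not the kernel's code: cs:rip is recorded when user space requests single-stepping, and the trace flag is injected only while the guest is still at that position, so a branch into an interrupt or exception handler no longer inherits a stray TF.

```c
#include <assert.h>

struct ss_state {
    unsigned long cs, rip;  /* position where single-step was requested */
    int pending;
};

/* User space enabled single-stepping while the guest sits at cs:rip. */
static void request_singlestep(struct ss_state *s,
                               unsigned long cs, unsigned long rip)
{
    s->cs = cs;
    s->rip = rip;
    s->pending = 1;
}

/* Should TF be injected for the next guest entry at cs:rip?  Only if the
 * guest is still exactly where the request was made. */
static int inject_tf(const struct ss_state *s,
                     unsigned long cs, unsigned long rip)
{
    return s->pending && s->cs == cs && s->rip == rip;
}
```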
    • KVM: Xen PV-on-HVM guest support · ffde22ac
      Committed by Ed Swierk
      Support for Xen PV-on-HVM guests can be implemented almost entirely in
      userspace, except for handling one annoying MSR that maps a Xen
      hypercall blob into guest address space.
      
      A generic mechanism to delegate MSR writes to userspace seems overkill
      and risks encouraging similar MSR abuse in the future.  Thus this patch
      adds special support for the Xen HVM MSR.
      
      I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
      KVM which MSR the guest will write to, as well as the starting address
      and size of the hypercall blobs (one each for 32-bit and 64-bit) that
      userspace has loaded from files.  When the guest writes to the MSR, KVM
      copies one page of the blob from userspace to the guest.
      
      I've tested this patch with a hacked-up version of Gerd's userspace
      code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
      FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
      
      [jan: fix i386 build warning]
      [avi: future proof abi with a flags field]
      Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: SVM: Support Pause Filter in AMD processors · 565d0998
      Committed by Mark Langsdorf
      New AMD processors (Family 0x10 models 8+) support the Pause
      Filter Feature.  This feature creates a new field in the VMCB
      called Pause Filter Count.  If Pause Filter Count is greater
      than 0 and intercepting PAUSEs is enabled, the processor will
      increment an internal counter when a PAUSE instruction occurs
      instead of intercepting.  When the internal counter reaches the
      Pause Filter Count value, a PAUSE intercept will occur.
      
      This feature can be used to detect contended spinlocks,
      especially when the lock holding VCPU is not scheduled.
      Rescheduling another VCPU prevents the VCPU seeking the
      lock from wasting its quantum by spinning idly.
      
      Experimental results show that most spinlocks are held
      for less than 1000 PAUSE cycles or more than a few
      thousand.  Default the Pause Filter Counter to 3000 to
      detect the contended spinlocks.
      
      Processor support for this feature is indicated by a CPUID
      bit.
      
      On a 24 core system running 4 guests each with 16 VCPUs,
      this patch improved overall performance of each guest's
      32 job kernbench by approximately 3-5% when combined
      with a scheduler algorithm that caused the VCPU to
      sleep for a brief period. Further performance improvement
      may be possible with a more sophisticated yield algorithm.
      Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
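The counter behaviour the commit describes can be modeled as a small state machine. This is an illustrative userspace model of the hardware semantics, not KVM code: with a non-zero Pause Filter Count, the processor absorbs PAUSEs and only raises an intercept when the internal counter reaches the count; with the count at zero, every PAUSE intercepts as before.

```c
#include <assert.h>

struct pause_filter {
    int count;      /* VMCB Pause Filter Count; 0 = filtering disabled */
    int internal;   /* processor-internal PAUSE counter */
};

/* Returns 1 when a PAUSE should cause a #VMEXIT, 0 when the hardware
 * swallows it without exiting. */
static int on_pause(struct pause_filter *pf)
{
    if (pf->count == 0)
        return 1;                 /* no filtering: always intercept */
    if (++pf->internal < pf->count)
        return 0;                 /* absorbed by the filter */
    pf->internal = 0;
    return 1;                     /* counter reached the limit: intercept */
}
```

A short spin (fewer PAUSEs than the count) never exits, while a long contended spin eventually does, which is exactly the signal used to reschedule another VCPU.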
    • KVM: VMX: Add support for Pause-Loop Exiting · 4b8d54f9
      Committed by Zhai, Edwin
      New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
      control fields:
      PLE_Gap    - upper bound on the amount of time between two successive
                   executions of PAUSE in a loop.
      PLE_Window - upper bound on the amount of time a guest is allowed to execute in
                   a PAUSE loop
      
      If the time between this execution of PAUSE and the previous one
      exceeds PLE_Gap, the processor considers this PAUSE to belong to a
      new loop. Otherwise, the processor determines the total execution
      time of this loop (since the first PAUSE in it) and triggers a VM
      exit if that total exceeds PLE_Window.
      * Refer to SDM volume 3b, sections 21.6.13 & 22.1.3.
      
      Pause-Loop Exiting can be used to detect Lock-Holder Preemption,
      where one VP is scheduled out while holding a spinlock and other VPs
      spinning on the same lock are scheduled in, wasting CPU time.
      
      Our tests indicate that most spinlocks are held for less than 212
      cycles. Performance tests show that with 2X LP over-commitment we
      can get a +2% perf improvement for kernel build (even more perf
      gain with more LPs).
      Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
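The PLE_Gap/PLE_Window decision above can be sketched as a tiny model. The constants and names here are illustrative only, not the real VM-execution control encoding: a PAUSE arriving more than PLE_Gap cycles after the previous one starts a new loop; otherwise the loop's accumulated time is checked against PLE_Window.

```c
#include <assert.h>

struct ple {
    unsigned long gap;         /* PLE_Gap: max spacing within one loop */
    unsigned long window;      /* PLE_Window: max total loop time */
    unsigned long last_pause;  /* timestamp of the previous PAUSE */
    unsigned long loop_start;  /* timestamp of the first PAUSE in this loop */
};

/* Returns 1 if a PAUSE executed at time `now` (in cycles) triggers a
 * VM exit under Pause-Loop Exiting. */
static int ple_on_pause(struct ple *p, unsigned long now)
{
    if (now - p->last_pause > p->gap)
        p->loop_start = now;              /* gap exceeded: new loop begins */
    p->last_pause = now;
    return now - p->loop_start > p->window;
}
```

Tightly spaced PAUSEs accumulate toward the window and eventually exit, while an isolated PAUSE (or one after a long gap) simply restarts the accounting.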
    • KVM: x86: Rework guest single-step flag injection and filtering · 91586a3b
      Committed by Jan Kiszka
      Push TF and RF injection and filtering on guest single-stepping into
      the vendor get/set_rflags callbacks. This makes the whole mechanism
      more robust wrt user space IOCTL order and instruction emulations.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: Refactor guest debug IOCTL handling · 355be0b9
      Committed by Jan Kiszka
      Much of the so-far vendor-specific code for setting up guest debug
      can actually be handled by the generic code. This also fixes a minor
      deficit in the SVM part wrt processing KVM_GUESTDBG_ENABLE.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Activate Virtualization On Demand · 10474ae8
      Committed by Alexander Graf
      X86 CPUs need to have some magic happening to enable the virtualization
      extensions on them. This magic can result in unpleasant results for
      users, like blocking other VMMs from working (vmx) or using invalid TLB
      entries (svm).
      
      Currently KVM activates virtualization when the respective kernel module
      is loaded. This blocks us from autoloading KVM modules without breaking
      other VMMs.
      
      To circumvent this problem at least a bit, this patch introduces
      on-demand activation of virtualization: the extensions are enabled
      on creation of the first virtual machine and disabled on destruction
      of the last one.
      
      So using this, KVM can be easily autoloaded, while keeping other
      hypervisors usable.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
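The first-VM/last-VM policy reduces to plain reference counting. This sketch uses made-up names (the real code also manages per-CPU state, CPU hotplug, and error paths): a usage counter turns the virtualization extensions on with the first VM and off with the last.

```c
#include <assert.h>

static int kvm_usage_count;     /* number of live VMs */
static int hardware_enabled;    /* are VMX/SVM extensions currently on? */

static int vm_created(void)
{
    if (kvm_usage_count++ == 0)
        hardware_enabled = 1;   /* stand-in for hardware_enable_all() */
    return 0;
}

static void vm_destroyed(void)
{
    if (--kvm_usage_count == 0)
        hardware_enabled = 0;   /* stand-in for hardware_disable_all() */
}
```

With this policy the kvm modules can stay loaded permanently; other VMMs only see the extensions held while at least one KVM guest actually exists.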
    • KVM: Move irq ack notifier list to arch independent code · 136bdfee
      Committed by Gleb Natapov
      The mask irq notifier list is already there.
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Maintain back mapping from irqchip/pin to gsi · 3e71f88b
      Committed by Gleb Natapov
      Maintain back mapping from irqchip/pin to gsi to speedup
      interrupt acknowledgment notifications.
      
      [avi: build fix on non-x86/ia64]
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Move irq sharing information to irqchip level · 1a6e4a8c
      Committed by Gleb Natapov
      This removes the assumption that the maximum number of GSIs is
      smaller than the number of pins. Sharing is tracked at the pin
      level, not the GSI level.
      
      [avi: no PIC on ia64]
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Don't pass kvm_run arguments · 851ba692
      Committed by Avi Kivity
      They're just copies of vcpu->run, which is readily accessible.
      Signed-off-by: Avi Kivity <avi@redhat.com>
  3. 02 Dec 2009, 3 commits
  4. 01 Dec 2009, 1 commit
    • x86, mm: Correct the implementation of is_untracked_pat_range() · ccef0864
      Committed by H. Peter Anvin
      The semantics the PAT code expect of is_untracked_pat_range() is "is
      this range completely contained inside the untracked region."  This
      means that checkin 8a271389 was
      technically wrong, because the implementation was needlessly confusing.
      
      The sane interface is for it to take a semiclosed range like just
      about everything else (as evidenced by the sheer number of "- 1"'s
      removed by that patch) so change the actual implementation to match.
      Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jack Steiner <steiner@sgi.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <20091119202341.GA4420@sgi.com>
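The "completely contained" semantics with a half-open range can be shown directly. This is a generic sketch with example bounds, not the actual untracked-region values: a range [start, end) counts as untracked only if it lies entirely inside the untracked region [r_start, r_end), which is what removes the sheer number of "- 1" adjustments that closed ranges require.

```c
#include <assert.h>

/* Is the half-open range [start, end) completely contained inside the
 * half-open region [r_start, r_end)? */
static int range_inside(unsigned long start, unsigned long end,
                        unsigned long r_start, unsigned long r_end)
{
    return start >= r_start && end <= r_end;
}
```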
  5. 27 Nov 2009, 12 commits
  6. 26 Nov 2009, 3 commits
    • x86: Clean up the loadsegment() macro · 64b028b2
      Committed by Ingo Molnar
      Make it readable in the source too, not just in the assembly output.
      No change in functionality.
      
      Cc: Brian Gerst <brgerst@gmail.com>
      LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Optimize loadsegment() · 79b0379c
      Committed by Brian Gerst
      Zero the input register in the exception handler instead of
      using an extra register to pass in a zero value.
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • block: add helpers to run flush_dcache_page() against a bio and a request's pages · 2d4dc890
      Committed by Ilya Loginov
      The mtdblock driver doesn't call flush_dcache_page for pages in a
      request. This causes problems on architectures where the icache
      doesn't fill from the dcache or with dcache aliases. The patch fixes
      this.
      
      The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
      pointless empty cache-thrashing loops on architectures for which
      flush_dcache_page() is a no-op. The new helpers flush the pages on
      architectures where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE equals 1 and
      do nothing otherwise.
      
      See "fix mtd_blkdevs problem with caches on some architectures" discussion
      on LKML for more information.
      Signed-off-by: Ilya Loginov <isloginov@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Peter Horton <phorton@bitbox.co.uk>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  7. 25 Nov 2009, 1 commit
    • x86: Rename global percpu symbol dr7 to cpu_dr7 · 28b4e0d8
      Committed by Tejun Heo
      Percpu symbols now occupy the same namespace as other global
      symbols and as such short global symbols without subsystem
      prefix tend to collide with local variables.  dr7 percpu
      variable used by x86 was hit by this. Rename it to cpu_dr7.
      
      The rename also makes it more consistent with its fellow
      cpu_debugreg percpu variable.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20091125115856.GA17856@elte.hu>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
  8. 24 Nov 2009, 2 commits