Commits · 2fe77b81c77eed92c4c0439f74c8148a295b4a86 · xiphi1978 / linux

11 Dec, 2009 4 commits

kgdb,x86: do not set kgdb_single_step on x86 · 8097551d

Committed by Jason Wessel on Dec 11, 2009

On an SMP system the kgdb_single_step flag can hang the system
indefinitely.  Consider the case where CPU 1 has the schedule lock
and CPU 0 is set to single step: there is no way for CPU 0 to run
another task.

The easy way to observe the problem is to make 2 cpus busy, and run
the kgdb test suite.  You will see that it hangs the system very
quickly.

while [ 1 ] ; do find /proc > /dev/null 2>&1 ; done &
while [ 1 ] ; do find /proc > /dev/null 2>&1 ; done &
echo V1 > /sys/module/kgdbts/parameters/kgdbts

The side effect of this patch is that there is the possibility
to miss a breakpoint in the case that a single step operation
was executed to step over a breakpoint in common code.

The trade off of the missed breakpoint is preferred to
hanging the kernel.  This can be fixed in the future by
using kprobes or another strategy to step over planted
breakpoints with out of line execution.

CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

8097551d

kgdb,i386: Fix corner case access to ss with NMI watch dog exception · cf6f196d

Committed by Jason Wessel on Dec 11, 2009

It is possible for the user_mode_vm(regs) check to return true on the
i386 arch for a non-master kgdb cpu or when the master kgdb cpu
handles the NMI watch dog exception.

The solution is simply to select the correct gdb_ss location
based on the check to user_mode_vm(regs).

CC: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

cf6f196d

kgdb,x86: remove redundant test · a5d09d68

Committed by Roel Kluin on Dec 11, 2009

The for loop starts with a breakno of 0 and ends when it reaches 4, so
this test is always true.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

a5d09d68

Unify sys_mmap* · f8b72560

Committed by Al Viro on Nov 30, 2009

New helper - sys_mmap_pgoff(); switch syscalls to using it.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f8b72560

10 Dec, 2009 1 commit

vfs: Implement proper O_SYNC semantics · 6b2f3d1f

Committed by Christoph Hellwig on Oct 27, 2009

While Linux has provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems.
Since that day we have had generic_osync_around with only minor changes and the
great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
patches, which are required before this patch, it's actually surprisingly
simple: we just need to figure out when to set the datasync flag to
vfs_fsync_range and when not.

This patch renames the existing O_SYNC flag to O_DSYNC while keeping its
numerical value to keep binary compatibility, and adds a new real O_SYNC
flag.  To guarantee backwards compatibility it is defined as expanding to
both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
sure we are backwards-compatible when compiled against the new headers.

This also means that all places that don't care about the differences can
just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actually care need to check __O_SYNC in addition.  Drivers and
network filesystems have been updated in a failsafe way to always do the
full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
lower layers are kept that way for now to stay failsafe.

We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
to make sure we always get these sane options.

Note that parisc really screwed up their headers as they already define an
O_DSYNC that has always been a no-op.  We try to repair it by using it for
the new O_DSYNC and redefining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>

6b2f3d1f
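
The flag layout described in the O_SYNC commit above (6b2f3d1f) can be sketched in a few lines of C. This is only an illustration: the EX_ names and helper functions are hypothetical, and the octal values are placeholders for this sketch, since the real values are defined per architecture. What it shows is the composition rule from the message: O_SYNC expands to both bits, code that does not care about the difference checks O_DSYNC only, and code that cares checks __O_SYNC in addition.

```c
#include <stdbool.h>

/* Placeholder values for illustration; real values are per-architecture. */
#define EX_O_DSYNC   00010000u  /* the bit that used to be called O_SYNC */
#define EX___O_SYNC  04000000u  /* the new, additional bit */
#define EX_O_SYNC    (EX___O_SYNC | EX_O_DSYNC)

/* Early in the open path: make sure __O_SYNC never appears without O_DSYNC. */
static unsigned int ex_normalize_open_flags(unsigned int flags)
{
    if (flags & EX___O_SYNC)
        flags |= EX_O_DSYNC;
    return flags;
}

/* Callers that do not care about the difference check O_DSYNC only. */
static bool ex_wants_sync(unsigned int flags)
{
    return (flags & EX_O_DSYNC) != 0;
}

/* Callers that care whether metadata must be synced too check __O_SYNC. */
static bool ex_wants_full_sync(unsigned int flags)
{
    return (flags & EX___O_SYNC) != 0;
}
```

With this layout a binary built against old headers keeps passing the old numeric value, which is now read as O_DSYNC, while a binary built against new headers that passes O_SYNC sets both bits.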

06 Dec, 2009 1 commit

x86: Convert BUG() to use unreachable() · a5fc5eba

Committed by David Daney on Dec 04, 2009

Use the new unreachable() macro instead of for(;;);.  When
allyesconfig is built with a GCC-4.5 snapshot on i686 the size of the
text segment is reduced by 3987 bytes (from 6827019 to 6823032).
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: x86@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a5fc5eba
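
As a rough user-space illustration of the BUG() change above (a5fc5eba), the sketch below contrasts the two ways of telling the compiler that control never continues past a trap. It is not the kernel's macro: EX_BUG_OLD, EX_BUG_NEW and ex_trap are hypothetical names, abort() stands in for the trap instruction, and __builtin_unreachable() is assumed to be available (GCC 4.5 or later, or Clang).

```c
#include <stdlib.h>

/* Hypothetical stand-in for the trap instruction; never returns. */
static void ex_trap(void)
{
    abort();
}

/* Old style: an endless loop after the trap, which costs code bytes. */
#define EX_BUG_OLD() do { ex_trap(); for (;;) ; } while (0)

/* New style: a compiler hint that emits no code at all. */
#define EX_BUG_NEW() do { ex_trap(); __builtin_unreachable(); } while (0)

/* Both switch cases return, so the code after the switch never runs. */
static int ex_parity_name(unsigned int x)
{
    switch (x & 1u) {
    case 0:
        return 'e';
    case 1:
        return 'o';
    }
    EX_BUG_NEW();
}
```

The size reduction quoted in the commit message would come from dropping the loop instruction at every such site; this sketch makes no claim about the exact savings.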

05 Dec, 2009 3 commits

PCI: add pci_request_acs · 5d990b62

Committed by Chris Wright on Dec 04, 2009

Commit ae21ee65 "PCI: acs p2p upsteram
forwarding enabling" doesn't actually enable ACS.

Add a function to pci core to allow an IOMMU to request that ACS
be enabled.  The existing mechanism of using iommu_found() in the pci
core to know when ACS should be enabled doesn't actually work due to
initialization order; the iommu has only been detected, not initialized.

Have Intel and AMD IOMMUs request ACS, and Xen does as well during early
init of dom0.

Cc: Allen Kay <allen.m.kay@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

5d990b62

x86/PCI: claim SR-IOV BARs in pcibios_allocate_resource · 575939cf

Committed by Yinghai Lu on Nov 24, 2009

This allows us to use the BIOS SR-IOV allocations rather than assigning
our own later on.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

575939cf

tree-wide: fix misspelling of "definition" in comments · 6070d81e

Committed by Adam Buchbinder on Dec 04, 2009

"Definition" is misspelled "defintion" in several comments; this
patch fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

6070d81e

04 Dec, 2009 9 commits

tree-wide: fix assorted typos all over the place · af901ca1

Committed by André Goddard Rosa on Nov 14, 2009

That is "success", "unknown", "through", "performance", "[re|un]mapping",
"access", "default", "reasonable", "[con]currently", "temperature",
"channel", "[un]used", "application", "example", "hierarchy", "therefore",
"[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

af901ca1

xen: call clock resume notifier on all CPUs · f6eafe36

Committed by Ian Campbell on Nov 25, 2009

tick_resume() is never called on secondary processors. Presumably this
is because they are offlined for suspend on native and so this is
normally taken care of in the CPU onlining path. Under Xen we keep all
CPUs online over a suspend.

This patch papers over the issue for me but I will investigate a more
generic, less hacky, way of doing the same.

tick_suspend is also only called on the boot CPU which I presume should
be fixed too.
Signed-off-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>

f6eafe36

xen: use iret for return from 64b kernel to 32b usermode · 6aaf5d63

Committed by Jeremy Fitzhardinge on Nov 25, 2009

If Xen wants to return to a 32b usermode with sysret it must use the
right form.  When using VGCF_in_syscall to trigger this, it looks at
the code segment and does a 32b sysret if it is FLAT_USER_CS32.
However, this is different from __USER32_CS, so it fails to return
properly if we use the normal Linux segment.

So avoid the whole mess by dropping VGCF_in_syscall and simply using
plain iret to return to usermode.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: Stable Kernel <stable@kernel.org>

6aaf5d63

xen: register runstate info for boot CPU early · 499d19b8

Committed by Jeremy Fitzhardinge on Nov 24, 2009

printk timestamping uses sched_clock, which in turn relies on runstate
info under Xen.  So make sure we set it up before any printks can
be called.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>

499d19b8

xen: register runstate on secondary CPUs · 02889672

Committed by Ian Campbell on Nov 24, 2009

The commit "xen: re-register runstate area earlier on resume" caused us
to never try to set up the runstate area for secondary CPUs. Ensure that
we do this...

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>

02889672

xen: register timer interrupt with IRQF_TIMER · f350c792

Committed by Ian Campbell on Nov 24, 2009

Otherwise the timer is disabled by dpm_suspend_noirq() which in turn prevents
correct operation of stop_machine on multi-processor systems and breaks
suspend.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>

f350c792

xen: correctly restore pfn_to_mfn_list_list after resume · fa24ba62

Committed by Ian Campbell on Nov 21, 2009

pvops kernels >= 2.6.30 can currently only be saved and restored once. The
second attempt to save results in:

ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
ERROR Internal error: Failed to map/save the p2m frame list

I finally narrowed it down to:

commit cdaead6b
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Fri Feb 27 15:34:59 2009 -0800

xen: split construction of p2m mfn tables from registration

Build the p2m_mfn_list_list early with the rest of the p2m table, but
register it later when the real shared_info structure is in place.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The unforeseen side-effect of this change was to cause the mfn list list to not
be rebuilt on resume. Prior to this change it would have been rebuilt via
xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list().

Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend().
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>

fa24ba62

xen: restore runstate_info even if !have_vcpu_info_placement · 3905bb2a

Committed by Jeremy Fitzhardinge on Nov 21, 2009

Even if have_vcpu_info_placement is not set, we still need to set up
the runstate area on each resumed vcpu.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>

3905bb2a

xen: re-register runstate area earlier on resume. · be012920

Committed by Ian Campbell on Nov 21, 2009

This is necessary to ensure the runstate area is available to
xen_sched_clock before any calls to printk which will require it in
order to provide a timestamp.

I chose to pull the xen_setup_runstate_info out of xen_time_init into
the caller in order to maintain parity with calling
xen_setup_runstate_info separately from calling xen_time_resume.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>

be012920

03 Dec, 2009 22 commits

x86, apic: Enable lapic nmi watchdog on AMD Family 11h · 7d1849af

Committed by Mikael Pettersson on Dec 03, 2009

The x86 lapic nmi watchdog does not recognize AMD Family 11h,
resulting in:

  NMI watchdog: CPU not supported

As far as I can see from available documentation (the BKDG),
family 11h looks identical to family 10h as far as the PMU
is concerned.

Extending the check to accept family 11h results in:

  Testing NMI watchdog ... OK.

I've been running with this change on a Turion X2 Ultra ZM-82
laptop for a couple of weeks now without problems.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: <stable@kernel.org>
LKML-Reference: <19223.53436.931768.278021@pilspetsen.it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

7d1849af

x86/reboot: Add pci_dev_put in reboot_fixup_32.c for consistency · 57fea8f7

Committed by Xiaotian Feng on Dec 03, 2009

pci_get_device will increase the ref count of the found device.
Although we're going to reset soon, we should use pci_dev_put
to decrease the ref count for consistency.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1259838400-23833-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

57fea8f7
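
The get/put pairing that the reboot fixup commit above (57fea8f7) restores can be modelled in plain C. This is a toy model rather than the PCI core API: ex_pci_dev, ex_get_device, ex_dev_put and ex_reboot_fixup are hypothetical names, and the reference count is a bare integer.

```c
#include <stddef.h>

/* Toy device with a bare reference count (hypothetical type). */
struct ex_pci_dev {
    int refcount;
};

/* Lookup helper: hands out the device with an extra reference held. */
static struct ex_pci_dev *ex_get_device(struct ex_pci_dev *found)
{
    if (found != NULL)
        found->refcount++;
    return found;
}

/* Drop a reference taken by ex_get_device(). */
static void ex_dev_put(struct ex_pci_dev *dev)
{
    if (dev != NULL)
        dev->refcount--;
}

/* Fixup path: every successful lookup is balanced by a put. */
static void ex_reboot_fixup(struct ex_pci_dev *candidate)
{
    struct ex_pci_dev *dev = ex_get_device(candidate);

    if (dev == NULL)
        return;
    /* ... a device-specific reset fixup would go here ... */
    ex_dev_put(dev);
}
```

After ex_reboot_fixup() returns, the count is back at its starting value, which is the consistency the commit message asks for even though the machine is about to reset.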

x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree · 4528752f

Committed by Darrick J. Wong on Dec 02, 2009

On a multi-node x3950M2 system, there's a slight oddity in the
PCI device tree for all secondary nodes:

 30:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
  \-33:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
     \-34:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

...as compared to the primary node:

 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
  \-01:00.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
 03:00.0 PCI bridge: IBM CalIOC2 PCI-E Root Port (rev 01)
  \-04:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)

In both nodes, the LSI RAID controller hangs off a CalIOC2
device, but on the secondary nodes, the BIOS hides the VGA
device and substitutes the device tree ending with the disk
controller.

It would seem that Calgary devices don't necessarily appear at
the top of the PCI tree, which means that the current code to
find the Calgary IOMMU that goes with a particular device is
buggy.

Rather than walk all the way to the top of the PCI
device tree and try to match bus number with Calgary descriptor,
the code needs to examine each parent of the particular device;
if it encounters a Calgary with a matching bus number, simply
use that.

Otherwise, we BUG() when the bus number of the Calgary doesn't
match the bus number of whatever's at the top of the device tree.

Extra note: This patch appears to work correctly for the x3950
that came before the x3950 M2.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jon D. Mason <jdmason@kudzu.us>
Cc: Corinna Schultz <coschult@us.ibm.com>
Cc: <stable@kernel.org>
LKML-Reference: <20091202230556.GG10295@tux1.beaverton.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

4528752f
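
The lookup strategy described in the Calgary commit above (4528752f) is to check each ancestor in turn and stop at the first one whose bus number matches a Calgary descriptor, instead of walking to the top of the tree first. A simplified sketch follows, under loud assumptions: ex_dev, ex_calgary and ex_find_calgary are hypothetical, each node carries only a bus number and a parent pointer, and the descriptor table is a flat array.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified device node. */
struct ex_dev {
    int bus;               /* bus number this node sits on */
    struct ex_dev *parent; /* NULL at the top of the tree */
};

/* Hypothetical Calgary descriptor: only the bus number it serves. */
struct ex_calgary {
    int bus;
};

/*
 * Walk from the device towards the root and return the first descriptor
 * whose bus number matches a node on the way, or NULL if none matches.
 */
static const struct ex_calgary *
ex_find_calgary(const struct ex_dev *dev,
                const struct ex_calgary *tbl, size_t n)
{
    for (; dev != NULL; dev = dev->parent) {
        for (size_t i = 0; i < n; i++) {
            if (tbl[i].bus == dev->bus)
                return &tbl[i];
        }
    }
    return NULL;
}
```

Applied to the secondary-node layout quoted in the message (0x30 above 0x33 above 0x34), a search starting at the RAID controller stops at the 0x33 node and never inspects the 0x30 bridge at the top.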

KVM: VMX: Fix comparison of guest efer with stale host value · d5696725

Committed by Avi Kivity on Dec 02, 2009

update_transition_efer() masks out some efer bits when deciding whether
to switch the msr during guest entry; for example, NX is emulated using the
mmu so we don't need to disable it, and LMA/LME are handled by the hardware.

However, with shared msrs, the comparison is made against a stale value;
at the time of the guest switch we may be running with another guest's efer.

Fix by deferring the mask/compare to the actual point of guest entry.

Noted by Marcelo.
Signed-off-by: Avi Kivity <avi@redhat.com>

d5696725

KVM: Drop user return notifier when disabling virtualization on a cpu · 3548bab5

Committed by Avi Kivity on Nov 28, 2009

This way, we don't leave a dangling notifier on cpu hotunplug or module
unload.  In particular, module unload leaves the notifier pointing into
freed memory.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3548bab5

KVM: VMX: Disable unrestricted guest when EPT disabled · 046d8710

Committed by Sheng Yang on Nov 27, 2009

Otherwise, using ept=0 on processors that support unrestricted guest
would cause a VMEntry failure.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

046d8710

KVM: x86 emulator: limit instructions to 15 bytes · eb3c79e6

Committed by Avi Kivity on Nov 24, 2009

While we are never normally passed an instruction that exceeds 15 bytes,
smp games can cause us to attempt to interpret one, which will cause
large latencies in non-preempt hosts.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>

eb3c79e6
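
A length cap of the kind described in the emulator commit above (eb3c79e6) can be sketched as a bounded byte fetcher. The names ex_fetch, ex_fetch_byte and ex_skip_prefixes are hypothetical, and the choice of a long run of 0x66 prefix bytes as the over-long input is an assumption made for the example, not something the commit message states.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_INSN_LEN 15 /* architectural limit on x86 instruction length */

/* Hypothetical fetch cursor over a buffer of instruction bytes. */
struct ex_fetch {
    const uint8_t *code;
    size_t avail; /* bytes available in code[] */
    size_t pos;   /* bytes consumed by the current instruction */
};

/* Fetch one byte; fail once 15 bytes were consumed or the buffer ends. */
static int ex_fetch_byte(struct ex_fetch *f, uint8_t *out)
{
    if (f->pos >= EX_MAX_INSN_LEN || f->pos >= f->avail)
        return -1;
    *out = f->code[f->pos++];
    return 0;
}

/*
 * Consume leading 0x66 prefix bytes plus the byte that ends the run.
 * Returns the number of prefixes, or -1 if the fetch limit was hit.
 */
static int ex_skip_prefixes(struct ex_fetch *f)
{
    uint8_t b;
    int n = 0;

    for (;;) {
        if (ex_fetch_byte(f, &b) < 0)
            return -1;
        if (b != 0x66)
            return n;
        n++;
    }
}
```

Because the bound lives in the fetch helper, the decode loop cannot spin over an arbitrarily long byte run, which is the latency concern the message raises for non-preempt hosts.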

KVM: x86: Add KVM_GET/SET_VCPU_EVENTS · 3cfc3092

Committed by Jan Kiszka on Nov 12, 2009

This new IOCTL exports all state related to exceptions, interrupts,
and NMIs that was so far invisible to user space. Together with
appropriate user space changes, this fixes sporadic problems of
vmsave/restore, live migration and system reset.

[avi: future-proof abi by adding a flags field]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

3cfc3092

KVM: VMX: Report unexpected simultaneous exceptions as internal errors · 65ac7264

Committed by Avi Kivity on Nov 04, 2009

These happen when we trap an exception when another exception is being
delivered; we only expect these with MCEs and page faults.  If something
unexpected happens, things probably went south and we're better off reporting
an internal error and freezing.
Signed-off-by: Avi Kivity <avi@redhat.com>

65ac7264

KVM: Allow internal errors reported to userspace to carry extra data · a9c7399d

Committed by Avi Kivity on Nov 04, 2009

Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available.  Add extra data to internal error
reporting so we can expose it to the debugger.  Extra data is specific
to the suberror.
Signed-off-by: Avi Kivity <avi@redhat.com>

a9c7399d

KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG · 4f926bf2

Committed by Jan Kiszka on Oct 30, 2009

Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from
KVM_GUESTDBG_ENABLE; they are actually orthogonal. While at it,
avoid triggering the WARN_ON in kvm_queue_exception if there is already
an exception pending and reject such invalid requests.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

4f926bf2

KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic · 2204ae3c

Committed by Marcelo Tosatti on Oct 29, 2009

Otherwise kvm might attempt to dereference a NULL pointer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

2204ae3c

KVM: x86: disallow multiple KVM_CREATE_IRQCHIP · 3ddea128

Committed by Marcelo Tosatti on Oct 29, 2009

Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP.
Also serialize multiple accesses with kvm->lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

3ddea128

KVM: VMX: Remove vmx->msr_offset_efer · 92c0d900

Committed by Avi Kivity on Oct 29, 2009

This variable is used to communicate between a caller and a callee; switch
to a function argument instead.

Signed-off-by: Avi Kivity <avi@redhat.com>

92c0d900

KVM: MMU: update invlpg handler comment · 5f5c35aa

Committed by Marcelo Tosatti on Oct 26, 2009

Large page translations are always synchronized (either in level 3
or level 2), so it's not necessary to properly deal with them
in the invlpg handler.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

5f5c35aa

KVM: VMX: move CR3/PDPTR update to vmx_set_cr3 · 7c93be44

Committed by Marcelo Tosatti on Oct 26, 2009

GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from
outside guest context. Similarly pdptrs are updated via load_pdptrs.

Let kvm_set_cr3 perform the update, removing it from the vcpu_run
fast path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

7c93be44

KVM: remove duplicated task_switch check · 1655e3a3

Committed by Gleb Natapov on Oct 25, 2009

Probably introduced by a bad merge.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

1655e3a3

KVM: VMX: Use shared msr infrastructure · 26bb0981

Committed by Avi Kivity on Sep 07, 2009

Instead of reloading syscall MSRs on every preemption, use the new shared
msr infrastructure to reload them at the last possible minute (just before
exit to userspace).

Improves vcpu/idle/vcpu switches by about 2000 cycles (when EFER needs to be
reloaded as well).

[jan: fix slot index missing indirection]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

26bb0981

KVM: x86 shared msr infrastructure · 18863bdd

Committed by Avi Kivity on Sep 07, 2009

The various syscall-related MSRs are fairly expensive to switch.  Currently
we switch them on every vcpu preemption, which is far too often:

- if we're switching to a kernel thread (idle task, threaded interrupt,
  kernel-mode virtio server (vhost-net), for example) and back, then
  there's no need to switch those MSRs since kernel threads won't
  be exiting to userspace.

- if we're switching to another guest running an identical OS, most likely
  those MSRs will have the same value, so there's little point in reloading
  them.

- if we're running the same OS on the guest and host, the MSRs will have
  identical values and reloading is unnecessary.

This patch uses the new user return notifiers to implement last-minute
switching, and checks the msr values to avoid unnecessary reloading.
Signed-off-by: Avi Kivity <avi@redhat.com>

18863bdd
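
The two ideas in the shared msr commit above (18863bdd), skipping writes when the value is already loaded and deferring the host restore until return to userspace, can be sketched as a small state machine. This is a user-space model only: ex_shared_msrs, ex_set_shared_msr and ex_on_user_return are hypothetical names, MSR writes are simulated by a counter, and the user return notifier is reduced to a plain function call.

```c
#include <stdint.h>

#define EX_NR_SHARED 4 /* hypothetical number of tracked MSR slots */

struct ex_shared_msrs {
    uint64_t host[EX_NR_SHARED]; /* values the host expects */
    uint64_t curr[EX_NR_SHARED]; /* values currently loaded */
    int armed;                   /* restore pending at user return */
    unsigned int writes;         /* simulated MSR writes performed */
};

/* Load a guest value, but only if it differs from what is loaded. */
static void ex_set_shared_msr(struct ex_shared_msrs *s,
                              unsigned int slot, uint64_t val)
{
    if (s->curr[slot] == val)
        return; /* same value, so no expensive write */
    s->curr[slot] = val;
    s->writes++;
    s->armed = 1; /* host values must be restored before userspace runs */
}

/* Called at the last minute, just before returning to userspace. */
static void ex_on_user_return(struct ex_shared_msrs *s)
{
    if (!s->armed)
        return;
    for (unsigned int i = 0; i < EX_NR_SHARED; i++) {
        if (s->curr[i] != s->host[i]) {
            s->curr[i] = s->host[i];
            s->writes++;
        }
    }
    s->armed = 0;
}
```

In this model a switch to a kernel thread and back never calls ex_on_user_return(), and a guest whose values equal the loaded ones never increments the write counter, matching the three cases listed in the message.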

KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area · 44ea2b17

Committed by Avi Kivity on Sep 06, 2009

Currently MSR_KERNEL_GS_BASE is saved and restored as part of the
guest/host msr reloading.  Since we wish to lazy-restore all the other
msrs, save and reload MSR_KERNEL_GS_BASE explicitly instead of using
the common code.

Signed-off-by: Avi Kivity <avi@redhat.com>

44ea2b17

KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization · 3ce672d4

Committed by Eduardo Habkost on Oct 24, 2009

The svm_set_cr0() call will initialize save->cr0 properly even when npt is
enabled, clearing the NW and CD bits as expected, so we don't need to
initialize it manually for npt_enabled anymore.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

3ce672d4

KVM: SVM: Reset cr0 properly on vcpu reset · 18fa000a

Committed by Eduardo Habkost on Oct 24, 2009

svm_vcpu_reset() was not properly resetting the contents of the guest-visible
cr0 register, causing the following issue:
https://bugzilla.redhat.com/show_bug.cgi?id=525699

Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine
with paging enabled, making the vcpu get a pagefault exception while trying to
run it.

Instead of setting vmcb->save.cr0 directly, the new code just resets
kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().

kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

18fa000a