- 09 7月, 2012 1 次提交
-
-
由 Avi Kivity 提交于
Currently the MMU's ->new_cr3() callback does nothing when guest paging is disabled or when two-dimentional paging (e.g. EPT on Intel) is active. This means that an emulated write to cr3 can be lost; kvm_set_cr3() will write vcpu-arch.cr3, but the GUEST_CR3 field in the VMCS will retain its old value and this is what the guest sees. This bug did not have any effect until now because: - with unrestricted guest, or with svm, we never emulate a mov cr3 instruction - without unrestricted guest, and with paging enabled, we also never emulate a mov cr3 instruction - without unrestricted guest, but with paging disabled, the guest's cr3 is ignored until the guest enables paging; at this point the value from arch.cr3 is loaded correctly my the mov cr0 instruction which turns on paging However, the patchset that enables big real mode causes us to emulate mov cr3 instructions in protected mode sometimes (when guest state is not virtualizable by vmx); this mov cr3 is effectively ignored and will crash the guest. The fix is to make nonpaging_new_cr3() call mmu_free_roots() to force a cr3 reload. This is awkward because now all the new_cr3 callbacks to the same thing, and because mmu_free_roots() is somewhat of an overkill; but fixing that is more complicated and will be done after this minimal fix. Observed in the Window XP 32-bit installer while bringing up secondary vcpus. Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 04 7月, 2012 2 次提交
-
-
由 Guo Chao 提交于
Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Michael S. Tsirkin 提交于
On UP i386, when APIC is disabled # CONFIG_X86_UP_APIC is not set # CONFIG_PCI_IOAPIC is not set code looking at apicdrivers never has any effect but it still gets compiled in. In particular, this causes build failures with kvm, but it generally bloats the kernel unnecessarily. Fix by defining both __apicdrivers and __apicdrivers_end to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified that as the result any loop scanning __apicdrivers gets optimized out by the compiler. Warning: a .config with apic disabled doesn't seem to boot for me (even without this patch). Still verifying why, meanwhile this patch is compile-tested only. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Reported-by: NRandy Dunlap <rdunlap@xenotime.net> Acked-by: NRandy Dunlap <rdunlap@xenotime.net> Acked-by: NH. Peter Anvin <hpa@linux.intel.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 25 6月, 2012 7 次提交
-
-
由 Michael S. Tsirkin 提交于
Implementation of PV EOI using shared memory. This reduces the number of exits an interrupt causes as much as by half. The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. We set it before injecting an interrupt and clear before injecting a nested one. Guest tests it using a test and clear operation - this is necessary so that host can detect interrupt nesting - and if set, it can skip the EOI MSR. There's a new MSR to set the address of said register in guest memory. Otherwise not much changed: - Guest EOI is not required - Register is tested & ISR is automatically cleared on exit For testing results see description of previous patch 'kvm_para: guest side for eoi avoidance'. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Michael S. Tsirkin 提交于
Each time we need to cancel injection we invoke same code (cancel_injection callback). Move it towards the end of function using the familiar goto on error pattern. Will make it easier to do more cleanups for PV EOI. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Michael S. Tsirkin 提交于
Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced apic attention bitmask but kvm still syncs lapic unconditionally. As that commit suggested and in anticipation of adding more attention bits, only sync lapic if(apic_attention). Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Michael S. Tsirkin 提交于
__test_and_clear_bit is actually atomic with respect to the local CPU. Add a note saying that KVM on x86 relies on this behaviour so people don't accidentaly break it. Also warn not to rely on this in portable code. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Michael S. Tsirkin 提交于
The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. Guest tests it using a single est and clear operation - this is necessary so that host can detect interrupt nesting - and if set, it can skip the EOI MSR. I run a simple microbenchmark to show exit reduction (note: for testing, need to apply follow-up patch 'kvm: host side for eoi optimization' + a qemu patch I posted separately, on host): Before: Performance counter stats for 'sleep 1s': 47,357 kvm:kvm_entry [99.98%] 0 kvm:kvm_hypercall [99.98%] 0 kvm:kvm_hv_hypercall [99.98%] 5,001 kvm:kvm_pio [99.98%] 0 kvm:kvm_cpuid [99.98%] 22,124 kvm:kvm_apic [99.98%] 49,849 kvm:kvm_exit [99.98%] 21,115 kvm:kvm_inj_virq [99.98%] 0 kvm:kvm_inj_exception [99.98%] 0 kvm:kvm_page_fault [99.98%] 22,937 kvm:kvm_msr [99.98%] 0 kvm:kvm_cr [99.98%] 0 kvm:kvm_pic_set_irq [99.98%] 0 kvm:kvm_apic_ipi [99.98%] 22,207 kvm:kvm_apic_accept_irq [99.98%] 22,421 kvm:kvm_eoi [99.98%] 0 kvm:kvm_pv_eoi [99.99%] 0 kvm:kvm_nested_vmrun [99.99%] 0 kvm:kvm_nested_intercepts [99.99%] 0 kvm:kvm_nested_vmexit [99.99%] 0 kvm:kvm_nested_vmexit_inject [99.99%] 0 kvm:kvm_nested_intr_vmexit [99.99%] 0 kvm:kvm_invlpga [99.99%] 0 kvm:kvm_skinit [99.99%] 57 kvm:kvm_emulate_insn [99.99%] 0 kvm:vcpu_match_mmio [99.99%] 0 kvm:kvm_userspace_exit [99.99%] 2 kvm:kvm_set_irq [99.99%] 2 kvm:kvm_ioapic_set_irq [99.99%] 23,609 kvm:kvm_msi_set_irq [99.99%] 1 kvm:kvm_ack_irq [99.99%] 131 kvm:kvm_mmio [99.99%] 226 kvm:kvm_fpu [100.00%] 0 kvm:kvm_age_page [100.00%] 0 kvm:kvm_try_async_get_page [100.00%] 0 kvm:kvm_async_pf_doublefault [100.00%] 0 kvm:kvm_async_pf_not_present [100.00%] 0 kvm:kvm_async_pf_ready [100.00%] 0 kvm:kvm_async_pf_completed 1.002100578 seconds time elapsed After: Performance counter stats for 'sleep 1s': 28,354 kvm:kvm_entry [99.98%] 0 kvm:kvm_hypercall [99.98%] 0 kvm:kvm_hv_hypercall [99.98%] 1,347 kvm:kvm_pio [99.98%] 0 kvm:kvm_cpuid [99.98%] 1,931 kvm:kvm_apic [99.98%] 29,595 kvm:kvm_exit [99.98%] 24,884 kvm:kvm_inj_virq [99.98%] 0 kvm:kvm_inj_exception [99.98%] 0 kvm:kvm_page_fault [99.98%] 1,986 kvm:kvm_msr [99.98%] 0 kvm:kvm_cr [99.98%] 0 kvm:kvm_pic_set_irq [99.98%] 0 kvm:kvm_apic_ipi [99.99%] 25,953 kvm:kvm_apic_accept_irq [99.99%] 26,132 kvm:kvm_eoi [99.99%] 26,593 kvm:kvm_pv_eoi [99.99%] 0 kvm:kvm_nested_vmrun [99.99%] 0 kvm:kvm_nested_intercepts [99.99%] 0 kvm:kvm_nested_vmexit [99.99%] 0 kvm:kvm_nested_vmexit_inject [99.99%] 0 kvm:kvm_nested_intr_vmexit [99.99%] 0 kvm:kvm_invlpga [99.99%] 0 kvm:kvm_skinit [99.99%] 284 kvm:kvm_emulate_insn [99.99%] 68 kvm:vcpu_match_mmio [99.99%] 68 kvm:kvm_userspace_exit [99.99%] 2 kvm:kvm_set_irq [99.99%] 2 kvm:kvm_ioapic_set_irq [99.99%] 28,288 kvm:kvm_msi_set_irq [99.99%] 1 kvm:kvm_ack_irq [99.99%] 131 kvm:kvm_mmio [100.00%] 588 kvm:kvm_fpu [100.00%] 0 kvm:kvm_age_page [100.00%] 0 kvm:kvm_try_async_get_page [100.00%] 0 kvm:kvm_async_pf_doublefault [100.00%] 0 kvm:kvm_async_pf_not_present [100.00%] 0 kvm:kvm_async_pf_ready [100.00%] 0 kvm:kvm_async_pf_completed 1.002039622 seconds time elapsed We see that # of exits is almost halved. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Michael S. Tsirkin 提交于
We perform ISR lookups twice: during interrupt injection and on EOI. Typical workloads only have a single bit set there. So we can avoid ISR scans by 1. counting bits as we set/clear them in ISR 2. on set, caching the injected vector number 3. on clear, invalidating the cache The real purpose of this is enabling PV EOI which needs to quickly validate the vector. But non PV guests also benefit: with this patch, and without interrupt nesting, apic_find_highest_isr will always return immediately without scanning ISR. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Michael S. Tsirkin 提交于
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 19 6月, 2012 1 次提交
-
-
由 Takuya Yoshikawa 提交于
The following commit did not care about the error handling path: commit c1a7b32a KVM: Avoid wasting pages for small lpage_info arrays If memory allocation fails, vfree() will be called with the address returned by kzalloc(). This patch fixes this issue. Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 18 6月, 2012 1 次提交
-
-
由 Christoffer Dall 提交于
This is a preparatory patch for the KVM/ARM implementation. KVM/ARM will use the KVM_IRQ_LINE ioctl, which is currently conditional on __KVM_HAVE_IOAPIC, but ARM obviously doesn't have any IOAPIC support and we need a separate define. Signed-off-by: NChristoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 14 6月, 2012 1 次提交
-
-
由 Xudong Hao 提交于
EPT Dirty bit use bit 9 as Intel SDM definition, to avoid conflict, change PT_FIRST_AVAIL_BITS_SHIFT to 10. Signed-off-by: NXudong Hao <xudong.hao@intel.com> Signed-off-by: NXiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 12 6月, 2012 1 次提交
-
-
由 Takuya Yoshikawa 提交于
Size is not needed to return one from pre-allocated objects. Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 06 6月, 2012 2 次提交
-
-
由 Michael S. Tsirkin 提交于
I see this in 3.5-rc1: arch/x86/kvm/mmu.c: In function ‘kvm_test_age_rmapp’: arch/x86/kvm/mmu.c:1271: warning: ‘iter.desc’ may be used uninitialized in this function The line in question was introduced by commit 1e3f42f0 static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp, unsigned long data) { - u64 *spte; + u64 *sptep; + struct rmap_iterator iter; <- line 1271 int young = 0; /* The reason I think is that the compiler assumes that the rmap value could be 0, so static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter) { if (!rmap) return NULL; if (!(rmap & 1)) { iter->desc = NULL; return (u64 *)rmap; } iter->desc = (struct pte_list_desc *)(rmap & ~1ul); iter->pos = 0; return iter->desc->sptes[iter->pos]; } will not initialize iter.desc, but the compiler isn't smart enough to see that for (sptep = rmap_get_first(*rmapp, &iter); sptep; sptep = rmap_get_next(&iter)) { will immediately exit in this case. I checked by adding if (!*rmapp) goto out; on top which is clearly equivalent but disables the warning. This patch uses uninitialized_var to disable the warning without increasing code size. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Christoffer Dall 提交于
Introduces a couple of print functions, which are essentially wrappers around standard printk functions, with a KVM: prefix. Functions introduced or modified are: - kvm_err(fmt, ...) - kvm_info(fmt, ...) - kvm_debug(fmt, ...) - kvm_pr_unimpl(fmt, ...) - pr_unimpl(vcpu, fmt, ...) -> vcpu_unimpl(vcpu, fmt, ...) Signed-off-by: NChristoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 05 6月, 2012 7 次提交
-
-
由 Orit Wasserman 提交于
For example migration between Westmere and Nehelem hosts, caught in big real mode. The code that fixes the segments for real mode guest was moved from enter_rmode to vmx_set_segments. enter_rmode calls vmx_set_segments for each segment. Signed-off-by: NOrit Wasserman <owasserm@rehdat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Gleb Natapov 提交于
mmu_shrink() needlessly iterates over all VMs even though it will not attempt to free mmu pages from more than one on them. Fix that and also check used mmu pages count outside of VM lock to skip inactive VMs faster. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xudong Hao 提交于
Signed-off-by: NHaitao Shan <haitao.shan@intel.com> Signed-off-by: NXudong Hao <xudong.hao@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xudong Hao 提交于
In EPT page structure entry, Enable EPT A/D bits if processor supported. Signed-off-by: NHaitao Shan <haitao.shan@intel.com> Signed-off-by: NXudong Hao <xudong.hao@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xudong Hao 提交于
Add kernel parameter to control A/D bits support, it's on by default. Signed-off-by: NHaitao Shan <haitao.shan@intel.com> Signed-off-by: NXudong Hao <xudong.hao@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Xudong Hao 提交于
Signed-off-by: NHaitao Shan <haitao.shan@intel.com> Signed-off-by: NXudong Hao <xudong.hao@intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Takuya Yoshikawa 提交于
lpage_info is created for each large level even when the memory slot is not for RAM. This means that when we add one slot for a PCI device, we end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc(). To make things worse, there is an increasing number of devices which would result in more pages being wasted this way. This patch mitigates this problem by using kvm_kvzalloc(). Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 02 6月, 2012 12 次提交
-
-
由 H.J. Lu 提交于
When I added x32 ptrace to 3.4 kernel, I also include PTRACE_ARCH_PRCTL support for x32 GDB For ARCH_GET_FS/GS, it takes a pointer to int64. But at user level, ARCH_GET_FS/GS takes a pointer to int32. So I have to add x32 ptrace to glibc to handle it with a temporary int64 passed to kernel and copy it back to GDB as int32. Roland suggested that PTRACE_ARCH_PRCTL is obsolete and x32 GDB should use fs_base and gs_base fields of user_regs_struct instead. Accordingly, remove PTRACE_ARCH_PRCTL completely from the x32 code to avoid possible memory overrun when pointer to int32 is passed to kernel. Link: http://lkml.kernel.org/r/CAMe9rOpDzHfS7NH7m1vmD9QRw8SSj4Sc%2BaNOgcWm_WJME2eRsQ@mail.gmail.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org> v3.4
-
由 Al Viro 提交于
If we end up calling do_notify_resume() with !user_mode(refs), it does nothing (do_signal() explicitly bails out and we can't get there with TIF_NOTIFY_RESUME in such situations). Then we jump to resume_userspace_sig, which rechecks the same thing and bails out to resume_kernel, thus breaking the loop. It's easier and cheaper to check *before* calling do_notify_resume() and bail out to resume_kernel immediately. And kill the check in do_signal()... Note that on amd64 we can't get there with !user_mode() at all - asm glue takes care of that. Acked-and-reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Does block_sigmask() + tracehook_signal_handler(); called when sigframe has been successfully built. All architectures converted to it; block_sigmask() itself is gone now (merged into this one). I'm still not too happy with the signature, but that's a separate story (IMO we need a structure that would contain signal number + siginfo + k_sigaction, so that get_signal_to_deliver() would fill one, signal_delivered(), handle_signal() and probably setup...frame() - take one). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Only 3 out of 63 do not. Renamed the current variant to __set_current_blocked(), added set_current_blocked() that will exclude unblockable signals, switched open-coded instances to it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
replace boilerplate "should we use ->saved_sigmask or ->blocked?" with calls of obvious inlined helper... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
first fruits of ..._restore_sigmask() helpers: now we can take boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK and restore the blocked mask from ->saved_mask" into a common helper. Open-coded instances switched... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
helpers parallel to set_restore_sigmask(), used in the next commits Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Matt Fleming 提交于
Since we can't expect every user to read the EFI boot stub code it seems prudent to have a couple of paragraphs explaining what it is and how it works. The "initrd=" option in particular is tricky because it only understands absolute EFI-style paths (backslashes as directory separators), and until now this hasn't been documented anywhere. This has tripped up a couple of users. Cc: Matthew Garrett <mjg@redhat.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: NMatt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1331907517-3985-4-git-send-email-matt@console-pimps.orgSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Matt Fleming 提交于
We need a way of printing useful messages to the user, for example when we fail to open an initrd file, instead of just hanging the machine without giving the user any indication of what went wrong. So sprinkle some error messages throughout the EFI boot stub code to make it easier for users to diagnose/report problems. Reported-by: NKeshav P R <the.ridikulus.rat@gmail.com> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: NMatt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1331907517-3985-3-git-send-email-matt@console-pimps.orgSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Matt Fleming 提交于
The loop at the 'close_handles' label in handle_ramdisks() should be using 'i', which represents the number of initrd files that were successfully opened, not 'nr_initrds' which is the number of initrd= arguments passed on the command line. Currently, if we execute the loop to close all file handles and we failed to open any initrds we'll try to call the close function on a garbage pointer, causing the machine to hang. Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: NMatt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1331907517-3985-2-git-send-email-matt@console-pimps.orgSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 01 6月, 2012 5 次提交
-
-
由 Steven Rostedt 提交于
When both DYNAMIC_FTRACE and LOCKDEP are set, the TRACE_IRQS_ON/OFF will call into the lockdep code. The lockdep code can call lots of functions that may be traced by ftrace. When ftrace is updating its code and hits a breakpoint, the breakpoint handler will call into lockdep. If lockdep happens to call a function that also has a breakpoint attached, it will jump back into the breakpoint handler resetting the stack to the debug stack and corrupt the contents currently on that stack. The 'do_sym' call that calls do_int3() is protected by modifying the IST table to point to a different location if another breakpoint is hit. But the TRACE_IRQS_OFF/ON are outside that protection, and if a breakpoint is hit from those, the stack will get corrupted, and the kernel will crash: [ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002 [ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff [ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0 [ 1013.298832] Oops: 0010 [#1] PREEMPT SMP [ 1013.310600] CPU 2 [ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 1013.401848] [ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30 [ 1013.437943] RIP: 8eb8:[<ffff88014630a000>] [<ffff88014630a000>] 0xffff880146309fff [ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408 EFLAGS: 00010046 [ 1013.477909] RAX: 0000000000000001 RBX: ffffffff81104020 RCX: 0000000000000000 [ 1013.499458] RDX: ffff880148008ea8 RSI: ffffffff8131ef40 RDI: ffffffff82203b20 [ 1013.521612] RBP: ffffffff81005751 R08: 0000000000000000 R09: 0000000000000000 [ 1013.543121] R10: ffffffff82cdc318 R11: 0000000000000000 R12: ffff880145cc0000 [ 1013.564614] R13: ffff880148008eb8 R14: 0000000000000002 R15: ffff88014780cb40 [ 1013.586108] FS: 0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000 [ 1013.609458] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1013.627420] CR2: 0000000000000002 CR3: 0000000141f10000 CR4: 00000000001407e0 [ 1013.649051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1013.670724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1013.692376] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0) [ 1013.717028] Stack: [ 1013.724131] ffff88014780f570 ffff880145cc0000 0000400000004000 0000000000000000 [ 1013.745918] cccccccccccccccc ffff88014780cca8 ffffffff811072bb ffffffff81651627 [ 1013.767870] ffffffff8118f8a7 ffffffff811072bb ffffffff81f2b6c5 ffffffff81f11bdb [ 1013.790021] Call Trace: [ 1013.800701] Code: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a <e7> d7 64 81 ff ff ff ff 01 00 00 00 00 00 00 00 65 d9 64 81 ff [ 1013.861443] RIP [<ffff88014630a000>] 0xffff880146309fff [ 1013.884466] RSP <ffff88014780f408> [ 1013.901507] CR2: 0000000000000002 The solution was to reuse the NMI functions that change the IDT table to make the debug stack keep its current stack (in kernel mode) when hitting a breakpoint: call debug_stack_set_zero TRACE_IRQS_ON call debug_stack_reset If the TRACE_IRQS_ON happens to hit a breakpoint then it will keep the current stack and not crash the box. Reported-by: NDave Jones <davej@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
When the NMI handler runs, it checks if it preempted a debug handler and if that handler is using the debug stack. If it is, it changes the IDT table not to update the stack, otherwise it will reset the debug stack and corrupt the debug handler it preempted. Now that ftrace uses breakpoints to change functions from nops to callers, many more places may hit a breakpoint. Unfortunately this includes some of the calls that lockdep performs. Which causes issues with the debug stack. It too needs to change the debug stack before tracing (if called from the debug handler). Allow the debug_stack_set_zero() and debug_stack_reset() to be nested so that the debug handlers can take advantage of them too. [ Used this_cpu_*() over __get_cpu_var() as suggested by H. Peter Anvin ] Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
When an NMI goes off and it sees that it preempted the debug stack, to keep the debug stack safe, it changes the IDT to point to one that does not modify the stack on breakpoint (to allow breakpoints in NMIs). But the variable that gets set to know to undo it on exit never gets cleared on exit. Thus every NMI will reset it on exit the first time it is done even if it does not need to be reset. [ Added H. Peter Anvin's suggestion to use this_cpu_read/write ] Cc: <stable@vger.kernel.org> # v3.3 Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
On boot up and module load, it is fine to modify the code directly, without the use of breakpoints. This is because boot up modification is done before SMP is initialized, thus the modification is serial, and module load is done before the module executes. But after that we must use a SMP safe method to modify running code. Otherwise, if we are running the function tracer and update its function (by starting off the stack tracer, or perf tracing) the change of the function called by the ftrace trampoline is done directly. If this is being executed on another CPU, that CPU may take a GPF and crash the kernel. The breakpoint method is used to change the nops at all the functions, but the change of the ftrace callback handler itself was still using a direct modification. If tracing was enabled and the function callback was changed then another CPU could fault if it was currently calling the original callback. This modification must use the breakpoint method too. Note, the direct method is still used for boot up and module load. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
When the function tracer starts modifying the code via breakpoints it sets a variable (modifying_ftrace_code) to inform the breakpoint handler to call the ftrace int3 code. But there's no synchronization between setting this code and the handler, thus it is possible for the handler to be called on another CPU before it sees the variable. This will cause a kernel crash as the int3 handler will not know what to do with it. I originally added smp_mb()'s to force the visibility of the variable but H. Peter Anvin suggested that I just make it atomic. [ Added comments as suggested by Peter Zijlstra ] Suggested-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-