- 01 11月, 2016 1 次提交
-
-
由 Andy Lutomirski 提交于
The kernel doesn't use clts() any more. Remove it and all of its paravirt infrastructure. A careful reader may notice that xen_clts() appears to have been buggy -- it didn't update xen_cr0_value. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm list <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/3d3c8ca62f17579b9849a013d71e59a4d5d1b079.1477951965.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 30 9月, 2016 2 次提交
-
-
由 Andy Lutomirski 提交于
We use __read_cr4() vs __read_cr4_safe() inconsistently. On CR4-less CPUs, all CR4 bits are effectively clear, so we can make the code simpler and more robust by making __read_cr4() always fix up faults on 32-bit kernels. This may fix some bugs on old 486-like CPUs, but I don't have any easy way to test that. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: david@saggiorato.net Link: http://lkml.kernel.org/r/ea647033d357d9ce2ad2bbde5a631045f5052fb6.1475178370.git.luto@kernel.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Peter Zijlstra 提交于
We've unconditionally used the queued spinlock for many releases now. Its time to remove the old ticket lock code. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Waiman.Long@hpe.com Cc: david.vrabel@citrix.com Cc: dhowells@redhat.com Cc: pbonzini@redhat.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20160518184302.GO3193@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 4月, 2016 2 次提交
-
-
由 Luis R. Rodriguez 提交于
Now that all previous paravirt_enabled() uses were replaced with proper x86 semantics by the previous patches we can remove the unused paravirt_enabled() mechanism. Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org> Acked-by: NJuergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrew.cooper3@citrix.com Cc: andriy.shevchenko@linux.intel.com Cc: bigeasy@linutronix.de Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: ffainelli@freebox.fr Cc: george.dunlap@citrix.com Cc: glin@suse.com Cc: jlee@suse.com Cc: josh@joshtriplett.org Cc: julien.grall@linaro.org Cc: konrad.wilk@oracle.com Cc: kozerkov@parallels.com Cc: lenb@kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-acpi@vger.kernel.org Cc: lv.zheng@intel.com Cc: matt@codeblueprint.co.uk Cc: mbizon@freebox.fr Cc: rjw@rjwysocki.net Cc: robert.moore@intel.com Cc: rusty@rustcorp.com.au Cc: tiwai@suse.de Cc: toshi.kani@hp.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1460592286-300-15-git-send-email-mcgrof@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Luis R. Rodriguez 提交于
We have 4 types of x86 platforms that disable RTC: * Intel MID * Lguest - uses paravirt * Xen dom-U - uses paravirt * x86 on legacy systems annotated with an ACPI legacy flag We can consolidate all of these into a platform specific legacy quirk set early in boot through i386_start_kernel() and through x86_64_start_reservations(). This deals with the RTC quirks which we can rely on through the hardware subarch, the ACPI check can be dealt with separately. For Xen things are bit more complex given that the @X86_SUBARCH_XEN x86_hardware_subarch is shared on for Xen which uses the PV path for both domU and dom0. Since the semantics for differentiating between the two are Xen specific we provide a platform helper to help override default legacy features -- x86_platform.set_legacy_features(). Use of this helper is highly discouraged, its only purpose should be to account for the lack of semantics available within your given x86_hardware_subarch. As per 0-day, this bumps the vmlinux size using i386-tinyconfig as follows: TOTAL TEXT init.text x86_early_init_platform_quirks() +70 +62 +62 +43 Only 8 bytes overhead total, as the main increase in size is all removed via __init. Suggested-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org> Reviewed-by: NJuergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andrew.cooper3@citrix.com Cc: andriy.shevchenko@linux.intel.com Cc: bigeasy@linutronix.de Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: ffainelli@freebox.fr Cc: george.dunlap@citrix.com Cc: glin@suse.com Cc: jlee@suse.com Cc: josh@joshtriplett.org Cc: julien.grall@linaro.org Cc: konrad.wilk@oracle.com Cc: kozerkov@parallels.com Cc: lenb@kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-acpi@vger.kernel.org Cc: lv.zheng@intel.com Cc: matt@codeblueprint.co.uk Cc: mbizon@freebox.fr Cc: rjw@rjwysocki.net Cc: robert.moore@intel.com Cc: rusty@rustcorp.com.au Cc: tiwai@suse.de Cc: toshi.kani@hp.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1460592286-300-5-git-send-email-mcgrof@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 4月, 2016 3 次提交
-
-
由 Andy Lutomirski 提交于
Enabling CONFIG_PARAVIRT had an unintended side effect: rdmsr() turned into rdmsr_safe() and wrmsr() turned into wrmsr_safe(), even on bare metal. Undo that by using the new unsafe paravirt MSR callbacks. Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/414fabd6d3527703077c6c2a797223d0a9c3b081.1459605520.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
This adds paravirt callbacks for unsafe MSR access. On native, they call native_{read,write}_msr(). On Xen, they use xen_{read,write}_msr_safe(). Nothing uses them yet for ease of bisection. The next patch will use them in rdmsrl(), wrmsrl(), etc. I intentionally didn't make them warn on #GP on Xen. I think that should be done separately by the Xen maintainers. Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/880eebc5dcd2ad9f310d41345f82061ea500e9fa.1459605520.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
These callbacks match the _safe variants, so name them accordingly. This will make room for unsafe PV callbacks. Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: KVM list <kvm@vger.kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/9ee3fb6a196a514c93325bdfa15594beecf04876.1459605520.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 2月, 2016 1 次提交
-
-
由 Josh Poimboeuf 提交于
A function created with the PV_CALLEE_SAVE_REGS_THUNK macro doesn't set up a new stack frame before the call instruction, which breaks frame pointer convention if CONFIG_FRAME_POINTER is enabled and can result in a bad stack trace. Also, the thunk functions aren't annotated as ELF callable functions. Create a stack frame when CONFIG_FRAME_POINTER is enabled and add the ELF function type. Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: Alok Kataria <akataria@vmware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris J Arges <chris.j.arges@canonical.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/a2cad74e87c4aba7fd0f54a1af312e66a824a575.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 20 12月, 2015 1 次提交
-
-
由 David Vrabel 提交于
Adding the rtc platform device in non-privileged Xen PV guests causes an IRQ conflict because these guests do not have legacy PIC and may allocate irqs in the legacy range. In a single VCPU Xen PV guest we should have: /proc/interrupts: CPU0 0: 4934 xen-percpu-virq timer0 1: 0 xen-percpu-ipi spinlock0 2: 0 xen-percpu-ipi resched0 3: 0 xen-percpu-ipi callfunc0 4: 0 xen-percpu-virq debug0 5: 0 xen-percpu-ipi callfuncsingle0 6: 0 xen-percpu-ipi irqwork0 7: 321 xen-dyn-event xenbus 8: 90 xen-dyn-event hvc_console ... But hvc_console cannot get its interrupt because it is already in use by rtc0 and the console does not work. genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0) We can avoid this problem by realizing that unprivileged PV guests (both Xen and lguests) are not supposed to have rtc_cmos device and so adding it is not necessary. Privileged guests (i.e. Xen's dom0) do use it but they should not have irq conflicts since they allocate irqs above legacy range (above gsi_top, in fact). Instead of explicitly testing whether the guest is privileged we can extend pv_info structure to include information about guest's RTC support. Reported-and-tested-by: NSander Eikelenboom <linux@eikelenboom.it> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Cc: vkuznets@redhat.com Cc: xen-devel@lists.xenproject.org Cc: konrad.wilk@oracle.com Cc: stable@vger.kernel.org # 4.2+ Link: http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrovsky@oracle.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 26 11月, 2015 1 次提交
-
-
由 Juergen Gross 提交于
pte_update_defer can be removed as it is always set to the same function as pte_update. So any usage of pte_update_defer() can be replaced by pte_update(). pmd_update and pmd_update_defer are always set to paravirt_nop, so they can just be nuked. Signed-off-by: NJuergen Gross <jgross@suse.com> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Cc: jeremy@goop.org Cc: chrisw@sous-sol.org Cc: akataria@vmware.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xen.org Cc: konrad.wilk@oracle.com Cc: david.vrabel@citrix.com Cc: boris.ostrovsky@oracle.com Link: http://lkml.kernel.org/r/1447771879-1806-1-git-send-email-jgross@suse.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 23 11月, 2015 2 次提交
-
-
由 Boris Ostrovsky 提交于
As result of commit "x86/xen: Avoid fast syscall path for Xen PV guests", usergs_sysret32 pv op is not called by Xen PV guests anymore and since they were the only ones who used it we can safely remove it. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-4-git-send-email-boris.ostrovsky@oracle.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Boris Ostrovsky 提交于
As result of commit "x86/xen: Avoid fast syscall path for Xen PV guests", the irq_enable_sysexit pv op is not called by Xen PV guests anymore and since they were the only ones who used it we can safely remove it. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Acked-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1447970147-1733-3-git-send-email-boris.ostrovsky@oracle.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 19 11月, 2015 1 次提交
-
-
由 Juergen Gross 提交于
The only member of that structure is startup_ipi_hook which is always set to paravirt_nop. Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com> Cc: jeremy@goop.org Cc: chrisw@sous-sol.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xen.org Cc: konrad.wilk@oracle.com Cc: boris.ostrovsky@oracle.com Link: http://lkml.kernel.org/r/1447767872-16730-1-git-send-email-jgross@suse.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 23 8月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
As of cf991de2 ("x86/asm/msr: Make wrmsrl_safe() a function"), wrmsrl_safe is a function, but wrmsrl is still a macro. The wrmsrl macro performs invalid shifts if the value argument is 32 bits. This makes it unnecessarily awkward to write code that puts an unsigned long into an MSR. To make this work, syscall_init needs tweaking to stop passing a function pointer to wrmsrl. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Willy Tarreau <w@1wt.eu> Link: http://lkml.kernel.org/r/690f0c629a1085d054e2d1ef3da073cfb3f7db92.1437678821.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 7月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
We've had ->read_tsc() and ->read_tscp() paravirt hooks since the very beginning of paravirt, i.e., d3561b7f ("[PATCH] paravirt: header and stubs for paravirtualisation"). AFAICT, the only paravirt guest implementation that ever replaced these calls was vmware, and it's gone. Arguably even vmware shouldn't have hooked RDTSC -- we fully support systems that don't have a TSC at all, so there's no point for a paravirt implementation to pretend that we have a TSC but to replace it. I also doubt that these hooks actually worked. Calls to rdtscl() and rdtscll(), which respected the hooks, were used seemingly interchangeably with native_read_tsc(), which did not. Just remove them. If anyone ever needs them again, they can try to make a case for why they need them. Before, on a paravirt config: text data bss dec hex filename 12618257 1816384 1093632 15528273 ecf151 vmlinux After: text data bss dec hex filename 12617207 1816384 1093632 15527223 eced37 vmlinux Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/d08a2600fb298af163681e5efd8e599d889a5b97.1434501121.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 5月, 2015 1 次提交
-
-
由 Ingo Molnar 提交于
Valentin Rothberg reported that we use CONFIG_QUEUED_SPINLOCKS in arch/x86/kernel/paravirt_patch_32.c, while the symbol is called CONFIG_QUEUED_SPINLOCK. (Note the extra 'S') But the typo was natural: the proper English term for such a generic object would be 'queued spinlocks' - so rename this and related symbols accordingly to the plural form. Reported-by: NValentin Rothberg <valentinrothberg@gmail.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hp.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 5月, 2015 1 次提交
-
-
由 Peter Zijlstra (Intel) 提交于
We use the regular paravirt call patching to switch between: native_queued_spin_lock_slowpath() __pv_queued_spin_lock_slowpath() native_queued_spin_unlock() __pv_queued_spin_unlock() We use a callee saved call for the unlock function which reduces the i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions again. We further optimize the unlock path by patching the direct call with a "movb $0,%arg1" if we are indeed using the native unlock code. This makes the unlock code almost as fast as the !PARAVIRT case. This significantly lowers the overhead of having CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWaiman Long <Waiman.Long@hp.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <paolo.bonzini@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1429901803-29771-10-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 4月, 2015 1 次提交
-
-
由 Kirill A. Shutemov 提交于
We would want to use number of page table level to define mm_struct. Let's expose it as CONFIG_PGTABLE_LEVELS. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Tested-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 4月, 2015 1 次提交
-
-
由 Borislav Petkov 提交于
Commit: 4214a16b ("x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER") removed the last user of ENABLE_INTERRUPTS_SYSEXIT32. Kill the macro now too. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/1428049714-829-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 04 2月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 19 11月, 2014 1 次提交
-
-
由 Dave Hansen 提交于
asm-generic/mm_hooks.h provides some generic fillers for the 90% of architectures that do not need to hook some mmap-manipulation functions. A comment inside says: > Define generic no-op hooks for arch_dup_mmap and > arch_exit_mmap, to be included in asm-FOO/mmu_context.h > for any arch FOO which doesn't need to hook these. So, does x86 need to hook these? It depends on CONFIG_PARAVIRT. We *conditionally* include this generic header if we have CONFIG_PARAVIRT=n. That's madness. With this patch, x86 stops using asm-generic/mmu_hooks.h entirely. We use our own copies of the functions. The paravirt code provides some stubs if it is disabled, and we always call those stubs in our x86-private versions of arch_exit_mmap() and arch_dup_mmap(). Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20141118182349.14567FA5@viggo.jf.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 30 1月, 2014 1 次提交
-
-
由 Andi Kleen 提交于
The paravirt thunks use a hack of using a static reference to a static function to reference that function from the top level statement. This assumes that gcc always generates static function names in a specific format, which is not necessarily true. Simply make these functions global and asmlinkage or __visible. This way the static __used variables are not needed and everything works. Functions with arguments are __visible to keep the register calling convention on 32bit. Changed in paravirt and in all users (Xen and vsmp) v2: Use __visible for functions with arguments Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ido Yariv <ido@wizery.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-5-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 09 8月, 2013 3 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (ie, no contention). In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker: Unlocker Locker test for lock pickup -> fail unlock test slowpath -> false set slowpath flags block Whereas this works in any ordering: Unlocker Locker set slowpath flags test for lock pickup -> fail block unlock test slowpath -> true, kick If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (ie, head == tail, so nobody is waiting), then it clears the slowpath flag. The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier so that its safe to subsequently read back the slowflag state, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read may just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock. Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked "add" is the only unlocking code. Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't the generated code isn't too bad, but its definitely suboptimal. Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in. Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules. Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.comSigned-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com> Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jeremy Fitzhardinge 提交于
Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.comReviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: NAttilio Rao <attilio.rao@citrix.com> Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jeremy Fitzhardinge 提交于
Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow patch (long spin on lock, unlocking a contended lock). Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.) To address this, we add two hooks: - __ticket_spin_lock which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder. - __ticket_spin_unlock, which on releasing a contended lock (there are more cpus with tail tickets), it looks to see if the next cpu is blocked and wakes it if so. When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away. Results: ======= setup: 32 core machine with 32 vcpu KVM guest (HT off) with 8GB RAM base = 3.11-rc patched = base + pvspinlock V12 +-----------------+----------------+--------+ dbench (Throughput in MB/sec. Higher is better) +-----------------+----------------+--------+ | base (stdev %)|patched(stdev%) | %gain | +-----------------+----------------+--------+ | 15035.3 (0.3) |15150.0 (0.6) | 0.8 | | 1470.0 (2.2) | 1713.7 (1.9) | 16.6 | | 848.6 (4.3) | 967.8 (4.3) | 14.0 | | 652.9 (3.5) | 685.3 (3.7) | 5.0 | +-----------------+----------------+--------+ pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases, and undercommits results are flat Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.comReviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: NAttilio Rao <attilio.rao@citrix.com> [ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t] Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 12 4月, 2013 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
The two use-cases where we needed to store the GDT were during ACPI S3 suspend and resume. As the patches: x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed. have demonstrated - there are other mechanism by which the GDT is saved and reloaded during early resume path. Hence we do not need to worry about the pvops call-chain for saving the GDT and can and can eliminate it. The other areas where the store_gdt is used are never going to be hit when running under the pvops platforms. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1365194544-14648-4-git-send-email-konrad.wilk@oracle.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 11 4月, 2013 1 次提交
-
-
由 Boris Ostrovsky 提交于
Invoking arch_flush_lazy_mmu_mode() results in calls to preempt_enable()/disable() which may have performance impact. Since lazy MMU is not used on bare metal we can patch away arch_flush_lazy_mmu_mode() so that it is never called in such environment. [ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU updates" may cause a minor performance regression on bare metal. This patch resolves that performance regression. It is somewhat unclear to me if this is a good -stable candidate. ] Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.comTested-by: NJosh Boyer <jwboyer@redhat.com> Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> SEE NOTE ABOVE
-
- 19 12月, 2012 1 次提交
-
-
由 David Rientjes 提交于
With CONFIG_PARAVIRT=y and CONFIG_TRANSPARENT_HUGEPAGE=n, the build breaks because set_pmd_at() is undeclared: mm/memory.c: In function 'do_pmd_numa_page': mm/memory.c:3520: error: implicit declaration of function 'set_pmd_at' mm/mprotect.c: In function 'change_pmd_protnuma': mm/mprotect.c:120: error: implicit declaration of function 'set_pmd_at' This is because paravirt defines set_pmd_at() only when CONFIG_TRANSPARENT_HUGEPAGE=y and such a restriction is unneeded. The fix is to define it for all CONFIG_PARAVIRT configurations. Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 6月, 2012 1 次提交
-
-
由 Alex Shi 提交于
x86 has no flush_tlb_range support in instruction level. Currently the flush_tlb_range just implemented by flushing all page table. That is not the best solution for all scenarios. In fact, if we just use 'invlpg' to flush few lines from TLB, we can get the performance gain from later remain TLB lines accessing. But the 'invlpg' instruction costs much of time. Its execution time can compete with cr3 rewriting, and even a bit more on SNB CPU. So, on a 512 4KB TLB entries CPU, the balance points is at: (512 - X) * 100ns(assumed TLB refill cost) = X(TLB flush entries) * 100ns(assumed invlpg cost) Here, X is 256, that is 1/2 of 512 entries. But with the mysterious CPU pre-fetcher and page miss handler Unit, the assumed TLB refill cost is far lower then 100ns in sequential access. And 2 HT siblings in one core makes the memory access more faster if they are accessing the same memory. So, in the patch, I just do the change when the target entries is less than 1/16 of whole active tlb entries. Actually, I have no data support for the percentage '1/16', so any suggestions are welcomed. As to hugetlb, guess due to smaller page table, and smaller active TLB entries, I didn't see benefit via my benchmark, so no optimizing now. My micro benchmark show in ideal scenarios, the performance improves 70 percent in reading. And in worst scenario, the reading/writing performance is similar with unpatched 3.4-rc4 kernel. Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP 'always': multi thread testing, '-t' paramter is thread number: with patch unpatched 3.4-rc4 ./mprotect -t 1 14ns 24ns ./mprotect -t 2 13ns 22ns ./mprotect -t 4 12ns 19ns ./mprotect -t 8 14ns 16ns ./mprotect -t 16 28ns 26ns ./mprotect -t 32 54ns 51ns ./mprotect -t 128 200ns 199ns Single process with sequencial flushing and memory accessing: with patch unpatched 3.4-rc4 ./mprotect 7ns 11ns ./mprotect -p 4096 -l 8 -n 10240 21ns 21ns [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com has additional performance numbers. ] Signed-off-by: NAlex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 08 6月, 2012 1 次提交
-
-
由 Andre Przywara 提交于
There were paravirt_ops hooks for the full register set variant of {rd,wr}msr_safe which are actually not used by anyone anymore. Remove them to make the code cleaner and avoid silent breakages when the pvops members were uninitialized. This has been boot-tested natively and under Xen with PVOPS enabled and disabled on one machine. Signed-off-by: NAndre Przywara <andre.przywara@amd.com> Link: http://lkml.kernel.org/r/1338562358-28182-2-git-send-email-bp@amd64.orgAcked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 06 6月, 2012 1 次提交
-
-
由 Andi Kleen 提交于
Add a version of rdpmc() that directly reads into a u64 Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338944211-28275-4-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 20 4月, 2012 1 次提交
-
-
由 H. Peter Anvin 提交于
GET_CR2_INTO_RCX is asinine: it is only used in one place, the actual paravirt call returns the value in %rax, not %rcx; and the one place that wants it wants the result in %r9. We actually generate as a result of this call: call ... movq %rax, %rcx xorq %rax, %rax /* this value isn't even used... */ movq %rcx, %r9 At least make the macro do what the paravirt call does, which is put the value into %rax. Nevermind the fact that the macro clobbers all the volatile registers. Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1334794610-5546-4-git-send-email-hpa@zytor.com Cc: Glauber de Oliveira Costa <glommer@parallels.com>
-
- 05 3月, 2012 1 次提交
-
-
由 Paul Gortmaker 提交于
If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any other BUG variant in a static inline (i.e. not in a #define) then that header really should be including <linux/bug.h> and not just expecting it to be implicitly present. We can make this change risk-free, since if the files using these headers didn't have exposure to linux/bug.h already, they would have been causing compile failures/warnings. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
- 24 2月, 2012 1 次提交
-
-
由 Ingo Molnar 提交于
static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]() So here's a boot tested patch on top of Jason's series that does all the cleanups I talked about and turns jump labels into a more intuitive to use facility. It should also address the various misconceptions and confusions that surround jump labels. Typical usage scenarios: #include <linux/static_key.h> struct static_key key = STATIC_KEY_INIT_TRUE; if (static_key_false(&key)) do unlikely code else do likely code Or: if (static_key_true(&key)) do likely code else do unlikely code The static key is modified via: static_key_slow_inc(&key); ... static_key_slow_dec(&key); The 'slow' prefix makes it abundantly clear that this is an expensive operation. I've updated all in-kernel code to use this everywhere. Note that I (intentionally) have not pushed through the rename blindly through to the lowest levels: the actual jump-label patching arch facility should be named like that, so we want to decouple jump labels from the static-key facility a bit. On non-jump-label enabled architectures static keys default to likely()/unlikely() branches. Signed-off-by: NIngo Molnar <mingo@elte.hu> Acked-by: NJason Baron <jbaron@redhat.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: a.p.zijlstra@chello.nl Cc: mathieu.desnoyers@efficios.com Cc: davem@davemloft.net Cc: ddaney.cavm@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.huSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 14 7月, 2011 1 次提交
-
-
由 Glauber Costa 提交于
This patch adds a function pointer in one of the many paravirt_ops structs, to allow guests to register a steal time function. Besides a steal time function, we also declare two jump_labels. They will be used to allow the steal time code to be easily bypassed when not in use. Signed-off-by: NGlauber Costa <glommer@redhat.com> Acked-by: NRik van Riel <riel@redhat.com> Tested-by: NEric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 26 1月, 2011 1 次提交
-
-
由 Andrea Arcangeli 提交于
This fixes TRANSPARENT_HUGEPAGE=y with PARAVIRT=y and HIGHMEM64=n. The #ifdef that this patch removes was erratically introduced to fix a build error for noPAE (where pmd.pmd doesn't exist). So then the kernel built but it failed at runtime because set_pmd_at was a noop. This will correct it by enabling set_pmd_at for noPAE mode too. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reported-by: Nwerner <w.landgraf@ru.ru> Reported-by: NMinchan Kim <minchan.kim@gmail.com> Tested-by: NMinchan Kim <minchan.kim@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 1月, 2011 1 次提交
-
-
由 Andrea Arcangeli 提交于
Paravirt ops pmd_update/pmd_update_defer/pmd_set_at. Not all might be necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs pmd_update_defer), but this is to keep full simmetry with pte paravirt ops, which looks cleaner and simpler from a common code POV. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 12月, 2010 1 次提交
-
-
由 Cliff Wickman 提交于
halt() should use native_halt() safe_halt() uses native_safe_halt() If CONFIG_PARAVIRT=y, halt() is defined in arch/x86/include/asm/paravirt.h as static inline void halt(void) { PVOP_VCALL0(pv_irq_ops.safe_halt); } Otherwise (no CONFIG_PARAVIRT) halt() in arch/x86/include/asm/irqflags.h is static inline void halt(void) { native_halt(); } So it looks to me like the CONFIG_PARAVIRT case of using native_safe_halt() for a halt() is an oversight. Am I missing something? It probably hasn't shown up as a problem because the local apic is disabled on a shutdown or restart. But if we disable interrupts and call halt() we shouldn't expect that the halt() will re-enable interrupts. Signed-off-by: NCliff Wickman <cpw@sgi.com> LKML-Reference: <E1PSBcz-0001g1-FM@eag09.americas.sgi.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 11 11月, 2010 1 次提交
-
-
由 Steven Rostedt 提交于
When running ktest.pl randconfig tests, I would sometimes trigger a lockdep annotation bug (possible reason: unannotated irqs-on). This triggering happened right after function tracer self test was executed. After doing a config bisect I found that this was caused with having function tracer, paravirt guest, prove locking, and rcu torture all enabled. The rcu torture just enhanced the likelyhood of triggering the bug. Prove locking was needed, since it was the thing that was bugging. Function tracer would trace and disable interrupts in all sorts of funny places. paravirt guest would turn arch_local_irq_* into functions that would be traced. Besides the fact that tracing arch_local_irq_* is just a bad idea, this is what is happening. The bug happened simply in the local_irq_restore() code: if (raw_irqs_disabled_flags(flags)) { \ raw_local_irq_restore(flags); \ trace_hardirqs_off(); \ } else { \ trace_hardirqs_on(); \ raw_local_irq_restore(flags); \ } \ The raw_local_irq_restore() was defined as arch_local_irq_restore(). Now imagine, we are about to enable interrupts. We go into the else case and call trace_hardirqs_on() which tells lockdep that we are enabling interrupts, so it sets the current->hardirqs_enabled = 1. Then we call raw_local_irq_restore() which calls arch_local_irq_restore() which gets traced! Now in the function tracer we disable interrupts with local_irq_save(). This is fine, but flags is stored that we have interrupts disabled. When the function tracer calls local_irq_restore() it does it, but this time with flags set as disabled, so we go into the if () path. This keeps interrupts disabled and calls trace_hardirqs_off() which sets current->hardirqs_enabled = 0. When the tracer is finished and proceeds with the original code, we enable interrupts but leave current->hardirqs_enabled as 0. Which now breaks lockdeps internal processing. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-