1. 13 9月, 2017 1 次提交
  2. 01 9月, 2017 1 次提交
  3. 24 6月, 2017 1 次提交
  4. 13 6月, 2017 1 次提交
  5. 05 6月, 2017 1 次提交
    • A
      x86/mm: Pass flush_tlb_info to flush_tlb_others() etc · a2055abe
      Andy Lutomirski 提交于
      Rather than passing all the contents of flush_tlb_info to
      flush_tlb_others(), pass a pointer to the structure directly. For
      consistency, this also removes the unnecessary cpu parameter from
      uv_flush_tlb_others() to make its signature match the other
      *flush_tlb_others() functions.
      
      This serves two purposes:
      
       - It will dramatically simplify future patches that change struct
         flush_tlb_info, which I'm planning to do.
      
       - struct flush_tlb_info is an adequate description of what to do
         for a local flush, too, so by reusing it we can remove duplicated
         code between local and remove flushes in a future patch.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      [ Fix build warning. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a2055abe
  6. 04 4月, 2017 1 次提交
  7. 27 3月, 2017 1 次提交
  8. 25 2月, 2017 1 次提交
  9. 21 2月, 2017 1 次提交
  10. 22 11月, 2016 1 次提交
    • P
      x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted() · 3cded417
      Peter Zijlstra 提交于
      Avoid the pointless function call to pv_lock_ops.vcpu_is_preempted()
      when a paravirt spinlock enabled kernel is ran on native hardware.
      
      Do this by patching out the CALL instruction with "XOR %RAX,%RAX"
      which has the same effect (0 return value).
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: benh@kernel.crashing.org
      Cc: boqun.feng@gmail.com
      Cc: borntraeger@de.ibm.com
      Cc: bsingharora@gmail.com
      Cc: dave@stgolabs.net
      Cc: jgross@suse.com
      Cc: kernellwp@gmail.com
      Cc: konrad.wilk@oracle.com
      Cc: mpe@ellerman.id.au
      Cc: paulmck@linux.vnet.ibm.com
      Cc: paulus@samba.org
      Cc: pbonzini@redhat.com
      Cc: rkrcmar@redhat.com
      Cc: will.deacon@arm.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3cded417
  11. 01 11月, 2016 1 次提交
    • A
      x86/fpu: Remove clts() · af25ed59
      Andy Lutomirski 提交于
      The kernel doesn't use clts() any more.  Remove it and all of its
      paravirt infrastructure.
      
      A careful reader may notice that xen_clts() appears to have been
      buggy -- it didn't update xen_cr0_value.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm list <kvm@vger.kernel.org>
      Link: http://lkml.kernel.org/r/3d3c8ca62f17579b9849a013d71e59a4d5d1b079.1477951965.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      af25ed59
  12. 30 9月, 2016 2 次提交
  13. 22 4月, 2016 2 次提交
    • L
      x86/paravirt: Remove paravirt_enabled() · 867fe800
      Luis R. Rodriguez 提交于
      Now that all previous paravirt_enabled() uses were replaced with proper
      x86 semantics by the previous patches we can remove the unused
      paravirt_enabled() mechanism.
      Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andrew.cooper3@citrix.com
      Cc: andriy.shevchenko@linux.intel.com
      Cc: bigeasy@linutronix.de
      Cc: boris.ostrovsky@oracle.com
      Cc: david.vrabel@citrix.com
      Cc: ffainelli@freebox.fr
      Cc: george.dunlap@citrix.com
      Cc: glin@suse.com
      Cc: jlee@suse.com
      Cc: josh@joshtriplett.org
      Cc: julien.grall@linaro.org
      Cc: konrad.wilk@oracle.com
      Cc: kozerkov@parallels.com
      Cc: lenb@kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-acpi@vger.kernel.org
      Cc: lv.zheng@intel.com
      Cc: matt@codeblueprint.co.uk
      Cc: mbizon@freebox.fr
      Cc: rjw@rjwysocki.net
      Cc: robert.moore@intel.com
      Cc: rusty@rustcorp.com.au
      Cc: tiwai@suse.de
      Cc: toshi.kani@hp.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/1460592286-300-15-git-send-email-mcgrof@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      867fe800
    • L
      x86/rtc: Replace paravirt rtc check with platform legacy quirk · 8d152e7a
      Luis R. Rodriguez 提交于
      We have 4 types of x86 platforms that disable RTC:
      
        * Intel MID
        * Lguest - uses paravirt
        * Xen dom-U - uses paravirt
        * x86 on legacy systems annotated with an ACPI legacy flag
      
      We can consolidate all of these into a platform specific legacy
      quirk set early in boot through i386_start_kernel() and through
      x86_64_start_reservations(). This deals with the RTC quirks which
      we can rely on through the hardware subarch, the ACPI check can
      be dealt with separately.
      
      For Xen things are bit more complex given that the @X86_SUBARCH_XEN
      x86_hardware_subarch is shared on for Xen which uses the PV path for
      both domU and dom0. Since the semantics for differentiating between
      the two are Xen specific we provide a platform helper to help override
      default legacy features -- x86_platform.set_legacy_features(). Use
      of this helper is highly discouraged, its only purpose should be
      to account for the lack of semantics available within your given
      x86_hardware_subarch.
      
      As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
      follows:
      
      TOTAL   TEXT   init.text    x86_early_init_platform_quirks()
      +70     +62    +62          +43
      
      Only 8 bytes overhead total, as the main increase in size is
      all removed via __init.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andrew.cooper3@citrix.com
      Cc: andriy.shevchenko@linux.intel.com
      Cc: bigeasy@linutronix.de
      Cc: boris.ostrovsky@oracle.com
      Cc: david.vrabel@citrix.com
      Cc: ffainelli@freebox.fr
      Cc: george.dunlap@citrix.com
      Cc: glin@suse.com
      Cc: jlee@suse.com
      Cc: josh@joshtriplett.org
      Cc: julien.grall@linaro.org
      Cc: konrad.wilk@oracle.com
      Cc: kozerkov@parallels.com
      Cc: lenb@kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-acpi@vger.kernel.org
      Cc: lv.zheng@intel.com
      Cc: matt@codeblueprint.co.uk
      Cc: mbizon@freebox.fr
      Cc: rjw@rjwysocki.net
      Cc: robert.moore@intel.com
      Cc: rusty@rustcorp.com.au
      Cc: tiwai@suse.de
      Cc: toshi.kani@hp.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/1460592286-300-5-git-send-email-mcgrof@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8d152e7a
  14. 13 4月, 2016 3 次提交
  15. 24 2月, 2016 1 次提交
    • J
      x86/paravirt: Create a stack frame in PV_CALLEE_SAVE_REGS_THUNK · 87b240cb
      Josh Poimboeuf 提交于
      A function created with the PV_CALLEE_SAVE_REGS_THUNK macro doesn't set
      up a new stack frame before the call instruction, which breaks frame
      pointer convention if CONFIG_FRAME_POINTER is enabled and can result in
      a bad stack trace.  Also, the thunk functions aren't annotated as ELF
      callable functions.
      
      Create a stack frame when CONFIG_FRAME_POINTER is enabled and add the
      ELF function type.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/a2cad74e87c4aba7fd0f54a1af312e66a824a575.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      87b240cb
  16. 20 12月, 2015 1 次提交
    • D
      x86/paravirt: Prevent rtc_cmos platform device init on PV guests · d8c98a1d
      David Vrabel 提交于
      Adding the rtc platform device in non-privileged Xen PV guests causes
      an IRQ conflict because these guests do not have legacy PIC and may
      allocate irqs in the legacy range.
      
      In a single VCPU Xen PV guest we should have:
      
      /proc/interrupts:
                 CPU0
        0:       4934  xen-percpu-virq      timer0
        1:          0  xen-percpu-ipi       spinlock0
        2:          0  xen-percpu-ipi       resched0
        3:          0  xen-percpu-ipi       callfunc0
        4:          0  xen-percpu-virq      debug0
        5:          0  xen-percpu-ipi       callfuncsingle0
        6:          0  xen-percpu-ipi       irqwork0
        7:        321   xen-dyn-event     xenbus
        8:         90   xen-dyn-event     hvc_console
        ...
      
      But hvc_console cannot get its interrupt because it is already in use
      by rtc0 and the console does not work.
      
        genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
      
      We can avoid this problem by realizing that unprivileged PV guests (both
      Xen and lguests) are not supposed to have rtc_cmos device and so
      adding it is not necessary.
      
      Privileged guests (i.e. Xen's dom0) do use it but they should not have
      irq conflicts since they allocate irqs above legacy range (above
      gsi_top, in fact).
      
      Instead of explicitly testing whether the guest is privileged we can
      extend pv_info structure to include information about guest's RTC
      support.
      Reported-and-tested-by: NSander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: vkuznets@redhat.com
      Cc: xen-devel@lists.xenproject.org
      Cc: konrad.wilk@oracle.com
      Cc: stable@vger.kernel.org # 4.2+
      Link: http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrovsky@oracle.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      d8c98a1d
  17. 26 11月, 2015 1 次提交
  18. 23 11月, 2015 2 次提交
  19. 19 11月, 2015 1 次提交
  20. 23 8月, 2015 1 次提交
  21. 06 7月, 2015 1 次提交
    • A
      x86/asm/tsc, x86/paravirt: Remove read_tsc() and read_tscp() paravirt hooks · 9261e050
      Andy Lutomirski 提交于
      We've had ->read_tsc() and ->read_tscp() paravirt hooks since
      the very beginning of paravirt, i.e.,
      
        d3561b7f ("[PATCH] paravirt: header and stubs for paravirtualisation").
      
      AFAICT, the only paravirt guest implementation that ever
      replaced these calls was vmware, and it's gone. Arguably even
      vmware shouldn't have hooked RDTSC -- we fully support systems
      that don't have a TSC at all, so there's no point for a paravirt
      implementation to pretend that we have a TSC but to replace it.
      
      I also doubt that these hooks actually worked. Calls to rdtscl()
      and rdtscll(), which respected the hooks, were used seemingly
      interchangeably with native_read_tsc(), which did not.
      
      Just remove them. If anyone ever needs them again, they can try
      to make a case for why they need them.
      
      Before, on a paravirt config:
        text    	data     bss     dec     hex filename
        12618257      1816384 1093632 15528273 ecf151 vmlinux
      
      After:
        text		data     bss     dec     hex filename
        12617207      1816384 1093632 15527223 eced37 vmlinux
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: virtualization@lists.linux-foundation.org
      Link: http://lkml.kernel.org/r/d08a2600fb298af163681e5efd8e599d889a5b97.1434501121.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9261e050
  22. 11 5月, 2015 1 次提交
    • I
      locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS · 62c7a1e9
      Ingo Molnar 提交于
      Valentin Rothberg reported that we use CONFIG_QUEUED_SPINLOCKS
      in arch/x86/kernel/paravirt_patch_32.c, while the symbol is
      called CONFIG_QUEUED_SPINLOCK. (Note the extra 'S')
      
      But the typo was natural: the proper English term for such
      a generic object would be 'queued spinlocks' - so rename
      this and related symbols accordingly to the plural form.
      Reported-by: NValentin Rothberg <valentinrothberg@gmail.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      62c7a1e9
  23. 08 5月, 2015 1 次提交
    • P
      locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching · f233f7f1
      Peter Zijlstra (Intel) 提交于
      We use the regular paravirt call patching to switch between:
      
        native_queued_spin_lock_slowpath()	__pv_queued_spin_lock_slowpath()
        native_queued_spin_unlock()		__pv_queued_spin_unlock()
      
      We use a callee saved call for the unlock function which reduces the
      i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions
      again.
      
      We further optimize the unlock path by patching the direct call with a
      "movb $0,%arg1" if we are indeed using the native unlock code. This
      makes the unlock code almost as fast as the !PARAVIRT case.
      
      This significantly lowers the overhead of having
      CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-10-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f233f7f1
  24. 15 4月, 2015 1 次提交
  25. 03 4月, 2015 1 次提交
  26. 04 2月, 2015 1 次提交
  27. 19 11月, 2014 1 次提交
    • D
      x86: Cleanly separate use of asm-generic/mm_hooks.h · a1ea1c03
      Dave Hansen 提交于
      asm-generic/mm_hooks.h provides some generic fillers for the 90%
      of architectures that do not need to hook some mmap-manipulation
      functions.  A comment inside says:
      
      > Define generic no-op hooks for arch_dup_mmap and
      > arch_exit_mmap, to be included in asm-FOO/mmu_context.h
      > for any arch FOO which doesn't need to hook these.
      
      So, does x86 need to hook these?  It depends on CONFIG_PARAVIRT.
      We *conditionally* include this generic header if we have
      CONFIG_PARAVIRT=n.  That's madness.
      
      With this patch, x86 stops using asm-generic/mmu_hooks.h entirely.
      We use our own copies of the functions.  The paravirt code
      provides some stubs if it is disabled, and we always call those
      stubs in our x86-private versions of arch_exit_mmap() and
      arch_dup_mmap().
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: x86@kernel.org
      Link: http://lkml.kernel.org/r/20141118182349.14567FA5@viggo.jf.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      a1ea1c03
  28. 30 1月, 2014 1 次提交
  29. 09 8月, 2013 3 次提交
    • J
      x86, ticketlock: Add slowpath logic · 96f853ea
      Jeremy Fitzhardinge 提交于
      Maintain a flag in the LSB of the ticket lock tail which indicates
      whether anyone is in the lock slowpath and may need kicking when
      the current holder unlocks.  The flags are set when the first locker
      enters the slowpath, and cleared when unlocking to an empty queue (ie,
      no contention).
      
      In the specific implementation of lock_spinning(), make sure to set
      the slowpath flags on the lock just before blocking.  We must do
      this before the last-chance pickup test to prevent a deadlock
      with the unlocker:
      
      Unlocker			Locker
      				test for lock pickup
      					-> fail
      unlock
      test slowpath
      	-> false
      				set slowpath flags
      				block
      
      Whereas this works in any ordering:
      
      Unlocker			Locker
      				set slowpath flags
      				test for lock pickup
      					-> fail
      				block
      unlock
      test slowpath
      	-> true, kick
      
      If the unlocker finds that the lock has the slowpath flag set but it is
      actually uncontended (ie, head == tail, so nobody is waiting), then it
      clears the slowpath flag.
      
      The unlock code uses a locked add to update the head counter.  This also
      acts as a full memory barrier so that its safe to subsequently
      read back the slowflag state, knowing that the updated lock is visible
      to the other CPUs.  If it were an unlocked add, then the flag read may
      just be forwarded from the store buffer before it was visible to the other
      CPUs, which could result in a deadlock.
      
      Unfortunately this means we need to do a locked instruction when
      unlocking with PV ticketlocks.  However, if PV ticketlocks are not
      enabled, then the old non-locked "add" is the only unlocking code.
      
      Note: this code relies on gcc making sure that unlikely() code is out of
      line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
      doesn't the generated code isn't too bad, but its definitely suboptimal.
      
      Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
      version of this change, which has been folded in.
      Thanks to Stephan Diestelhorst for commenting on some code which relied
      on an inaccurate reading of the x86 memory ordering rules.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.comSigned-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
      Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      96f853ea
    • J
      x86, pvticketlock: Use callee-save for lock_spinning · 354714dd
      Jeremy Fitzhardinge 提交于
      Although the lock_spinning calls in the spinlock code are on the
      uncommon path, their presence can cause the compiler to generate many
      more register save/restores in the function pre/postamble, which is in
      the fast path.  To avoid this, convert it to using the pvops callee-save
      calling convention, which defers all the save/restores until the actual
      function is called, keeping the fastpath clean.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.comReviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: NAttilio Rao <attilio.rao@citrix.com>
      Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      354714dd
    • J
      x86, spinlock: Replace pv spinlocks with pv ticketlocks · 545ac138
      Jeremy Fitzhardinge 提交于
      Rather than outright replacing the entire spinlock implementation in
      order to paravirtualize it, keep the ticket lock implementation but add
      a couple of pvops hooks on the slow patch (long spin on lock, unlocking
      a contended lock).
      
      Ticket locks have a number of nice properties, but they also have some
      surprising behaviours in virtual environments.  They enforce a strict
      FIFO ordering on cpus trying to take a lock; however, if the hypervisor
      scheduler does not schedule the cpus in the correct order, the system can
      waste a huge amount of time spinning until the next cpu can take the lock.
      
      (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
      http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
      
      To address this, we add two hooks:
       - __ticket_spin_lock which is called after the cpu has been
         spinning on the lock for a significant number of iterations but has
         failed to take the lock (presumably because the cpu holding the lock
         has been descheduled).  The lock_spinning pvop is expected to block
         the cpu until it has been kicked by the current lock holder.
       - __ticket_spin_unlock, which on releasing a contended lock
         (there are more cpus with tail tickets), it looks to see if the next
         cpu is blocked and wakes it if so.
      
      When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
      functions causes all the extra code to go away.
      
      Results:
      =======
      setup: 32 core machine with 32 vcpu KVM guest (HT off)  with 8GB RAM
      base = 3.11-rc
      patched = base + pvspinlock V12
      
      +-----------------+----------------+--------+
       dbench (Throughput in MB/sec. Higher is better)
      +-----------------+----------------+--------+
      |   base (stdev %)|patched(stdev%) | %gain  |
      +-----------------+----------------+--------+
      | 15035.3   (0.3) |15150.0   (0.6) |   0.8  |
      |  1470.0   (2.2) | 1713.7   (1.9) |  16.6  |
      |   848.6   (4.3) |  967.8   (4.3) |  14.0  |
      |   652.9   (3.5) |  685.3   (3.7) |   5.0  |
      +-----------------+----------------+--------+
      
      pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases,
      and undercommits results are flat
      Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.comReviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: NAttilio Rao <attilio.rao@citrix.com>
      [ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t]
      Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      545ac138
  30. 12 4月, 2013 1 次提交
  31. 11 4月, 2013 1 次提交
  32. 19 12月, 2012 1 次提交
    • D
      x86, paravirt: fix build error when thp is disabled · c36e0501
      David Rientjes 提交于
      With CONFIG_PARAVIRT=y and CONFIG_TRANSPARENT_HUGEPAGE=n, the build breaks
      because set_pmd_at() is undeclared:
      
        mm/memory.c: In function 'do_pmd_numa_page':
        mm/memory.c:3520: error: implicit declaration of function 'set_pmd_at'
        mm/mprotect.c: In function 'change_pmd_protnuma':
        mm/mprotect.c:120: error: implicit declaration of function 'set_pmd_at'
      
      This is because paravirt defines set_pmd_at() only when
      CONFIG_TRANSPARENT_HUGEPAGE=y and such a restriction is unneeded.  The
      fix is to define it for all CONFIG_PARAVIRT configurations.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c36e0501
  33. 28 6月, 2012 1 次提交
    • A
      x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range · e7b52ffd
      Alex Shi 提交于
      x86 has no flush_tlb_range support in instruction level. Currently the
      flush_tlb_range just implemented by flushing all page table. That is not
      the best solution for all scenarios. In fact, if we just use 'invlpg' to
      flush few lines from TLB, we can get the performance gain from later
      remain TLB lines accessing.
      
      But the 'invlpg' instruction costs much of time. Its execution time can
      compete with cr3 rewriting, and even a bit more on SNB CPU.
      
      So, on a 512 4KB TLB entries CPU, the balance points is at:
      	(512 - X) * 100ns(assumed TLB refill cost) =
      		X(TLB flush entries) * 100ns(assumed invlpg cost)
      
      Here, X is 256, that is 1/2 of 512 entries.
      
      But with the mysterious CPU pre-fetcher and page miss handler Unit, the
      assumed TLB refill cost is far lower then 100ns in sequential access. And
      2 HT siblings in one core makes the memory access more faster if they are
      accessing the same memory. So, in the patch, I just do the change when
      the target entries is less than 1/16 of whole active tlb entries.
      Actually, I have no data support for the percentage '1/16', so any
      suggestions are welcomed.
      
      As to hugetlb, guess due to smaller page table, and smaller active TLB
      entries, I didn't see benefit via my benchmark, so no optimizing now.
      
      My micro benchmark show in ideal scenarios, the performance improves 70
      percent in reading. And in worst scenario, the reading/writing
      performance is similar with unpatched 3.4-rc4 kernel.
      
      Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
      'always':
      
      multi thread testing, '-t' paramter is thread number:
      	       	        with patch   unpatched 3.4-rc4
      ./mprotect -t 1           14ns		24ns
      ./mprotect -t 2           13ns		22ns
      ./mprotect -t 4           12ns		19ns
      ./mprotect -t 8           14ns		16ns
      ./mprotect -t 16          28ns		26ns
      ./mprotect -t 32          54ns		51ns
      ./mprotect -t 128         200ns		199ns
      
      Single process with sequencial flushing and memory accessing:
      
      		       	with patch   unpatched 3.4-rc4
      ./mprotect		    7ns			11ns
      ./mprotect -p 4096  -l 8 -n 10240
      			    21ns		21ns
      
      [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
        has additional performance numbers. ]
      Signed-off-by: NAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      e7b52ffd