1. 08 May, 2015 14 commits
    • P
      locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching · f233f7f1
      Peter Zijlstra (Intel) committed
      We use the regular paravirt call patching to switch between:
      
        native_queued_spin_lock_slowpath()	__pv_queued_spin_lock_slowpath()
        native_queued_spin_unlock()		__pv_queued_spin_unlock()
      
      We use a callee saved call for the unlock function which reduces the
      i-cache footprint and allows 'inlining' of SPIN_UNLOCK functions
      again.
      
      We further optimize the unlock path by patching the direct call with a
      "movb $0,%arg1" if we are indeed using the native unlock code. This
      makes the unlock code almost as fast as the !PARAVIRT case.
      
      This significantly lowers the overhead of having
      CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-10-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f233f7f1
    • W
      locking/pvqspinlock: Implement simple paravirt support for the qspinlock · a23db284
      Waiman Long committed
      Provide a separate (second) version of the spin_lock_slowpath for
      paravirt along with a special unlock path.
      
      The second slowpath is generated by adding a few pv hooks to the
      normal slowpath, but where those will compile away for the native
      case, they expand into special wait/wake code for the pv version.
      
      The actual MCS queue can use extra storage in the mcs_nodes[] array to
      keep track of state and therefore uses directed wakeups.
      
      The head contender has no such storage directly visible to the
      unlocker.  So the unlocker searches a hash table with open addressing
      using a simple binary Galois linear feedback shift register.
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1429901803-29771-9-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a23db284
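The open-addressed lookup named in the a23db284 message can be pictured with a toy table whose probe offsets come from a Galois LFSR. This is only a sketch of the idea as the message words it, not the kernel's code: the table size, `pv_entry`, `pv_probe()`, `hash_ptr()` and the 4-bit polynomial are all assumptions chosen for illustration, and there is no deletion or concurrency handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 16u   /* hypothetical size; 2^4 so a 4-bit LFSR covers it */

struct pv_entry {
    const void *lock;   /* key: lock address, NULL means empty slot */
    int cpu;            /* value: CPU waiting at the head of the queue */
};

static struct pv_entry table[SLOTS];

/* One step of a 4-bit Galois LFSR (x^4 + x^3 + 1).
 * Starting from any nonzero state it visits all 15 nonzero states. */
static unsigned lfsr4_next(unsigned s)
{
    unsigned lsb = s & 1u;

    assert(s != 0 && s < SLOTS);
    s >>= 1;
    if (lsb)
        s ^= 0xCu;
    return s;
}

/* Hypothetical hash: drop alignment bits, keep the low 4 bits. */
static unsigned hash_ptr(const void *p)
{
    return (unsigned)(((uintptr_t)p >> 2) & (SLOTS - 1));
}

/* Find the slot for @lock; claim the first empty one if @insert is set. */
static struct pv_entry *pv_probe(const void *lock, int insert)
{
    unsigned home = hash_ptr(lock);
    unsigned off = 0, s = 1;

    for (unsigned i = 0; i < SLOTS; i++) {
        struct pv_entry *e = &table[(home + off) & (SLOTS - 1)];

        if (e->lock == lock)
            return e;
        if (e->lock == NULL) {
            if (!insert)
                return NULL;
            e->lock = lock;
            return e;
        }
        off = s;            /* next offset follows the LFSR sequence */
        s = lfsr4_next(s);
    }
    return NULL;            /* every slot probed: table is full */
}
```

Offset 0 plus the 15 nonzero LFSR states give 16 distinct offsets, so a probe visits every slot exactly once before giving up.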
    • P
      locking/qspinlock: Revert to test-and-set on hypervisors · 2aa79af6
      Peter Zijlstra (Intel) committed
      When we detect a hypervisor (!paravirt, see qspinlock paravirt support
      patches), revert to a simple test-and-set lock to avoid the horrors
      of queue preemption.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2aa79af6
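For readers unfamiliar with the term, a test-and-set lock of the kind 2aa79af6 falls back to can be sketched in portable C11 as below. The names `tas_lock`, `tas_trylock()`, `tas_lock_acquire()` and `tas_unlock()` are hypothetical, and the sketch uses `<stdatomic.h>` rather than the kernel's own atomic primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct tas_lock {
    atomic_int val;     /* 0 = unlocked, 1 = locked */
};

/* Read first so a contended lock is not hammered with atomic writes,
 * then try to flip 0 -> 1. */
static bool tas_trylock(struct tas_lock *l)
{
    int expected = 0;

    if (atomic_load_explicit(&l->val, memory_order_relaxed) != 0)
        return false;
    return atomic_compare_exchange_strong_explicit(&l->val, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void tas_lock_acquire(struct tas_lock *l)
{
    while (!tas_trylock(l))
        ;   /* spin; there is no queue, so no waiter can stall the others */
}

static void tas_unlock(struct tas_lock *l)
{
    atomic_store_explicit(&l->val, 0, memory_order_release);
}
```

The lock is unfair, but no waiter depends on another waiter being scheduled, which is the property the commit message is after when vCPUs can be preempted.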
    • W
      locking/qspinlock: Use a simple write to grab the lock · 2c83e8e9
      Waiman Long committed
      Currently, atomic_cmpxchg() is used to get the lock. However, this
      is not really necessary if there is more than one task in the queue
      and the queue head doesn't need to reset the tail code. For that case,
      a simple write to set the lock bit is enough as the queue head will
      be the only one eligible to get the lock as long as it checks that
      both the lock and pending bits are not set. The current pending bit
      waiting code will ensure that the bit will not be set as soon as the
      tail code in the lock is set.
      
      With that change, there is a slight improvement in the performance
      of the queued spinlock in the 5M loop micro-benchmark run on a 4-socket
      Westmere-EX machine as shown in the tables below.
      
      		[Standalone/Embedded - same node]
        # of tasks	Before patch	After patch	%Change
        ----------	-----------	----------	-------
             3	 2324/2321	2248/2265	 -3%/-2%
             4	 2890/2896	2819/2831	 -2%/-2%
             5	 3611/3595	3522/3512	 -2%/-2%
             6	 4281/4276	4173/4160	 -3%/-3%
             7	 5018/5001	4875/4861	 -3%/-3%
             8	 5759/5750	5563/5568	 -3%/-3%
      
      		[Standalone/Embedded - different nodes]
        # of tasks	Before patch	After patch	%Change
        ----------	-----------	----------	-------
             3	12242/12237	12087/12093	 -1%/-1%
             4	10688/10696	10507/10521	 -2%/-2%
      
      It was also found that this change produced a much bigger performance
      improvement in the newer IvyBridge-EX chip and was essential to close
      the performance gap between the ticket spinlock and queued spinlock.
      
      The disk workload of the AIM7 benchmark was run on a 4-socket
      Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
      on a 3.14 based kernel. The results of the test runs were:
      
                      AIM7 XFS Disk Test
        kernel                 JPM    Real Time   Sys Time    Usr Time
        -----                  ---    ---------   --------    --------
        ticketlock            5678233    3.17       96.61       5.81
        qspinlock             5750799    3.13       94.83       5.97
      
                      AIM7 EXT4 Disk Test
        kernel                 JPM    Real Time   Sys Time    Usr Time
        -----                  ---    ---------   --------    --------
        ticketlock            1114551   16.15      509.72       7.11
        qspinlock             2184466    8.24      232.99       6.01
      
      The ext4 filesystem run had a much higher spinlock contention than
      the xfs filesystem run.
      
      The "ebizzy -m" test was also run with the following results:
      
        kernel               records/s  Real Time   Sys Time    Usr Time
        -----                ---------  ---------   --------    --------
        ticketlock             2075       10.00      216.35       3.49
        qspinlock              3023       10.00      198.20       4.80
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-7-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2c83e8e9
    • P
      locking/qspinlock: Optimize for smaller NR_CPUS · 69f9cae9
      Peter Zijlstra (Intel) committed
      When we allow for a max NR_CPUS < 2^14 we can optimize the pending
      wait-acquire and the xchg_tail() operations.
      
      By growing the pending bit to a byte, we reduce the tail to 16bit.
      This means we can use xchg16 for the tail part and do away with all
      the repeated cmpxchg() operations.
      
      This in turn allows us to unconditionally acquire; the locked state
      as observed by the wait loops cannot change. And because both locked
      and pending are now a full byte we can use simple stores for the
      state transition, obviating one atomic operation entirely.
      
      This optimization is needed to make the qspinlock achieve performance
      parity with ticket spinlock at light load.
      
      All this is horribly broken on Alpha pre EV56 (and any other arch that
      cannot do single-copy atomic byte stores).
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      69f9cae9
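The byte-sized locked and pending fields and the 16-bit tail described in 69f9cae9 can be visualised with a union like the one below. It is a sketch under assumptions: the type name `toy_qspinlock` and the helper names are made up, the field order assumes a little-endian machine, and plain stores stand in for the once-only or atomic accessors a real implementation would need.

```c
#include <stdint.h>

/* Three views of the same 4-byte lock word (little-endian assumed). */
union toy_qspinlock {
    uint32_t val;                   /* whole word */
    struct {
        uint8_t locked;             /* byte 0 */
        uint8_t pending;            /* byte 1 */
    };
    struct {
        uint16_t locked_pending;    /* bytes 0-1 as a single unit */
        uint16_t tail;              /* bytes 2-3: encoded queue tail */
    };
};

/* The queue head takes the lock with a single byte store. */
static void set_locked(union toy_qspinlock *l)
{
    l->locked = 1;
}

/* The pending waiter becomes the owner: one 16-bit store writes
 * pending = 0 and locked = 1 together without touching the tail. */
static void clear_pending_set_locked(union toy_qspinlock *l)
{
    l->locked_pending = 1;
}
```

Because the tail sits in its own 16-bit half, a 16-bit exchange can replace it without a compare-and-swap loop over the whole word.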
    • W
      locking/qspinlock: Extract out code snippets for the next patch · 6403bd7d
      Waiman Long committed
      This is a preparatory patch that extracts out the following 2 code
      snippets to prepare for the next performance optimization patch.
      
       1) the logic for the exchange of new and previous tail code words
          into a new xchg_tail() function.
       2) the logic for clearing the pending bit and setting the locked bit
          into a new clear_pending_set_locked() function.
      
      This patch also simplifies the trylock operation before queuing by
      calling queued_spin_trylock() directly.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6403bd7d
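The two helpers that 6403bd7d factors out can be approximated in C11 as follows. The function names come from the commit message; everything else is an assumption for illustration, including the `toy_qspinlock` type, the `Q_*` constants and the bit layout (locked in bits 0-7, pending in bit 8, tail above that).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define Q_LOCKED_VAL   (1u << 0)
#define Q_PENDING_VAL  (1u << 8)
#define Q_TAIL_MASK    (~0u << 9)

struct toy_qspinlock {
    _Atomic uint32_t val;
};

/* Install a new tail code word while preserving the locked and pending
 * bits; returns the previous tail. Needs a cmpxchg loop because the
 * tail shares the word with the other fields. */
static uint32_t xchg_tail(struct toy_qspinlock *lock, uint32_t tail)
{
    uint32_t old = atomic_load_explicit(&lock->val, memory_order_relaxed);
    uint32_t newval;

    do {
        newval = (old & ~Q_TAIL_MASK) | tail;
    } while (!atomic_compare_exchange_weak_explicit(&lock->val, &old, newval,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed));
    return old & Q_TAIL_MASK;
}

/* Caller holds the pending bit and has seen locked == 0.
 * Adding (LOCKED - PENDING) clears pending and sets locked in one step. */
static void clear_pending_set_locked(struct toy_qspinlock *lock)
{
    atomic_fetch_add_explicit(&lock->val, Q_LOCKED_VAL - Q_PENDING_VAL,
                              memory_order_acquire);
}
```

Isolating the two operations this way lets a later change swap in cheaper variants without touching the slowpath logic that calls them.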
    • P
      locking/qspinlock: Add pending bit · c1fb159d
      Peter Zijlstra (Intel) committed
      Because the qspinlock needs to touch a second cacheline (the per-cpu
      mcs_nodes[]); add a pending bit and allow a single in-word spinner
      before we punt to the second cacheline.
      
      It is possible to observe the pending bit without the locked bit when
      the last owner has just released but the pending owner has not yet
      taken ownership.
      
      In this case we would normally queue -- because the pending bit is
      already taken. However, in this case the pending bit is guaranteed
      to be released 'soon', therefore wait for it and avoid queueing.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c1fb159d
    • W
      locking/qspinlock, x86: Enable x86-64 to use queued spinlocks · d73a3397
      Waiman Long committed
      This patch makes the necessary changes at the x86 architecture
      specific layer to enable the use of queued spinlocks for x86-64. As
      x86-32 machines are typically not multi-socket, the benefit of queued
      spinlocks may not be apparent, so queued spinlocks are not enabled there.
      
      Currently, there are some incompatibilities between the para-virtualized
      spinlock code (which hard-codes the use of ticket spinlock) and the
      queued spinlocks. Therefore, the use of queued spinlocks is disabled
      when the para-virtualized spinlock is enabled.
      
      The arch/x86/include/asm/qspinlock.h header file includes some x86
      specific optimization which will make the queued spinlock code
      perform better than the generic implementation.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-3-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d73a3397
    • W
      locking/qspinlock: Introduce a simple generic 4-byte queued spinlock · a33fda35
      Waiman Long committed
      This patch introduces a new generic queued spinlock implementation that
      can serve as an alternative to the default ticket spinlock. Compared
      with the ticket spinlock, this queued spinlock should be almost as fair
      as the ticket spinlock. It has about the same speed in single-thread
      and it can be much faster in high contention situations especially when
      the spinlock is embedded within the data structure to be protected.
      
      Only in light to moderate contention where the average queue depth
      is around 1-3 will this queued spinlock be potentially a bit slower
      due to the higher slowpath overhead.
      
      This queued spinlock is especially suited to NUMA machines with a large
      number of cores as the chance of spinlock contention is much higher
      in those machines. The cost of contention is also higher because of
      slower inter-node memory traffic.
      
      Due to the fact that spinlocks are acquired with preemption disabled,
      the process will not be migrated to another CPU while it is trying
      to get a spinlock. Ignoring interrupt handling, a CPU can only be
      contending in one spinlock at any one time. Counting soft IRQ, hard
      IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting
      activities.  By allocating a set of per-cpu queue nodes and using them
      to form a waiting queue, we can encode the queue node address into a
      much smaller 24-bit size (including CPU number and queue node index)
      leaving one byte for the lock.
      
      Please note that the queue node is only needed when waiting for the
      lock. Once the lock is acquired, the queue node can be released to
      be used later.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a33fda35
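The 24-bit tail encoding that a33fda35 describes (CPU number plus queue node index, with one byte left for the lock) might look like the sketch below. The helper names `encode_tail()` and `decode_tail()`, the `TAIL_*` constants and the exact bit positions are assumptions. Storing `cpu + 1` is an illustrative way to keep an all-zero tail meaning "no queue".

```c
#include <stdint.h>

/* Assumed layout: bits 0-7 lock byte, bits 8-9 node index, bits 10-31 CPU + 1. */
#define TAIL_IDX_OFFSET  8
#define TAIL_IDX_BITS    2      /* 4 nesting levels: task, soft IRQ, hard IRQ, NMI */
#define TAIL_IDX_MASK    (((1u << TAIL_IDX_BITS) - 1) << TAIL_IDX_OFFSET)
#define TAIL_CPU_OFFSET  (TAIL_IDX_OFFSET + TAIL_IDX_BITS)

/* Pack (cpu, idx) into the upper 24 bits of the lock word. */
static uint32_t encode_tail(unsigned cpu, unsigned idx)
{
    return ((uint32_t)(cpu + 1) << TAIL_CPU_OFFSET) |
           ((uint32_t)idx << TAIL_IDX_OFFSET);
}

/* Recover (cpu, idx); the pair selects one per-cpu queue node. */
static void decode_tail(uint32_t tail, unsigned *cpu, unsigned *idx)
{
    *cpu = (tail >> TAIL_CPU_OFFSET) - 1;
    *idx = (tail & TAIL_IDX_MASK) >> TAIL_IDX_OFFSET;
}
```

Since a CPU has at most four concurrent lock-waiting contexts, two index bits are enough, and the (cpu, idx) pair replaces a full node pointer in the lock word.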
    • P
      kernel: Replace reference to ASSIGN_ONCE() with WRITE_ONCE() in comment · 663fdcbe
      Preeti U Murthy committed
      Looks like commit:
      
       43239cbe ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)")
      
      left behind a reference to ASSIGN_ONCE(). Update this to WRITE_ONCE().
      Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: borntraeger@de.ibm.com
      Cc: dave@stgolabs.net
      Cc: paulmck@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/20150430115721.22278.94082.stgit@preeti.in.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      663fdcbe
    • W
      locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write() · 59aabfc7
      Waiman Long committed
      In up_write()/up_read(), rwsem_wake() will be called whenever it
      detects that some writers/readers are waiting. The rwsem_wake()
      function will take the wait_lock and call __rwsem_do_wake() to do the
      real wakeup.  For a heavily contended rwsem, doing a spin_lock() on
      wait_lock will cause further contention on the heavily contended rwsem
      cacheline resulting in delay in the completion of the up_read/up_write
      operations.
      
      This patch makes the wait_lock taking and the call to __rwsem_do_wake()
      optional if at least one spinning writer is present. The spinning
      writer will be able to take the rwsem and call rwsem_wake() later
      when it calls up_write(). With the presence of a spinning writer,
      rwsem_wake() will now try to acquire the lock using trylock. If that
      fails, it will just quit.
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
      Acked-by: Jason Low <jason.low2@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1430428337-16802-2-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      59aabfc7
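The wakeup policy in 59aabfc7 (block on wait_lock normally, but only try it once when a spinning writer exists) can be modelled with a toy structure. This is a simplified sketch, not the rwsem code: `toy_rwsem`, its fields and `toy_rwsem_wake()` are hypothetical, an `atomic_flag` stands in for the wait_lock spinlock, and a counter stands in for `__rwsem_do_wake()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_rwsem {
    atomic_flag wait_lock;      /* stand-in for the wait_lock spinlock */
    atomic_bool has_spinner;    /* true while a writer is optimistically spinning */
    int wakeups;                /* stand-in for the real wakeup work */
};

/* Returns true if the wakeup was performed, false if it was skipped
 * because a spinner exists and wait_lock was busy. */
static bool toy_rwsem_wake(struct toy_rwsem *sem)
{
    if (atomic_load(&sem->has_spinner)) {
        /* trylock: on failure just quit; the spinner is expected to take
         * the rwsem and trigger the wakeup on its own release later */
        if (atomic_flag_test_and_set_explicit(&sem->wait_lock,
                                              memory_order_acquire))
            return false;
    } else {
        /* no spinner: somebody must do the wakeup, so wait for the lock */
        while (atomic_flag_test_and_set_explicit(&sem->wait_lock,
                                                 memory_order_acquire))
            ;
    }

    sem->wakeups++;
    atomic_flag_clear_explicit(&sem->wait_lock, memory_order_release);
    return true;
}
```

Skipping the blocking acquire avoids extra traffic on the already contended cacheline at the cost of deferring the wakeup to the spinner.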
    • L
      Merge tag 'pm+acpi-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 3e0283a5
      Linus Torvalds committed
      Pull power management and ACPI fixes from Rafael Wysocki:
       "These include three regression fixes (PCI resources management,
        ACPI/PNP device enumeration, ACPI SBS on MacBook) and two ACPI
        documentation fixes related to GPIO.
      
        Specifics:
      
         - Fix for a PCI resources management regression introduced during the
           4.0 cycle and related to the handling of ACPI resources'
           Producer/Consumer flags that turn out to be useless (Jiang Liu)
      
         - Fix for a MacBook regression related to the Smart Battery Subsystem
           (SBS) driver causing various problems (stalls on boot, failure to
           detect or report battery) to happen and introduced during the 3.18
           cycle (Chris Bainbridge)
      
         - Fix for an ACPI/PNP device enumeration regression introduced during
           the 3.16 cycle caused by failing to include two PNP device IDs into
           the list of IDs that PNP device objects need to be created for
           (Witold Szczeponik)
      
         - Fixes for two minor mistakes in the ACPI GPIO properties
           documentation (Antonio Ospite, Rafael J Wysocki)"
      
      * tag 'pm+acpi-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / PNP: add two IDs to list for PNPACPI device enumeration
        ACPI / documentation: Fix ambiguity in the GPIO properties document
        ACPI / documentation: fix a sentence about GPIO resources
        ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook
        x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus
      3e0283a5
    • R
      Merge branches 'acpi-resources', 'acpi-battery', 'acpi-doc' and 'acpi-pnp' · 9a5d9315
      Rafael J. Wysocki committed
      * acpi-resources:
        x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus
      
      * acpi-battery:
        ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook
      
      * acpi-doc:
        ACPI / documentation: Fix ambiguity in the GPIO properties document
        ACPI / documentation: fix a sentence about GPIO resources
      
      * acpi-pnp:
        ACPI / PNP: add two IDs to list for PNPACPI device enumeration
      9a5d9315
    • L
      Merge tag 'for-f2fs-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 68c2f356
      Linus Torvalds committed
      Pull f2fs fixes from Jaegeuk Kim:
       "Fix a performance regression and a bug"
      
      * tag 'for-f2fs-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
        f2fs: fix wrong error hanlder in f2fs_follow_link
        Revert "f2fs: enhance multi-threads performance"
      68c2f356
  2. 07 May, 2015 7 commits
    • L
      Merge tag 'pinctrl-v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · fbb7b92f
      Linus Torvalds committed
      Pull pin control fixes from Linus Walleij:
       "Here is a smallish set of pin control fixes for the v4.1 cycle,
        collected the last two weeks:
      
         - fix a real nasty legacy bug that has screwed up the protection of
           adding pinctrl maps dynamically.  Normally this didn't happen so
           much but Doug Anderson ran into it and fixed it, kudos!
      
        - minor driver fixes for Qualcomm spmi, mediatek and Marvell drivers"
      
      * tag 'pinctrl-v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: Don't just pretend to protect pinctrl_maps, do it for real
        pinctrl: mediatek: mtk-common: initialize unmask
        pinctrl: qcom-spmi-mpp: Fix input value report
        pinctrl: qcom-spmi: Fix pin direction configuration
        pinctrl: mvebu: Fix mapping of pin 63 (gpo -> gpio)
      fbb7b92f
    • L
      Merge tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfio · 7bbcd1b8
      Linus Torvalds committed
      Pull vfio fixes from Alex Williamson:
       "Fix some undesirable behavior with the vfio device request interface:
      
         - increase verbosity of device request channel (Alex Williamson)
      
         - fix runaway interruptible timeout (Alex Williamson)"
      
      * tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfio:
        vfio: Fix runaway interruptible timeout
        vfio-pci: Log device requests more verbosely
      7bbcd1b8
    • L
      Merge tag 'for-linus' of git://github.com/dledford/linux · 8cb7c15b
      Linus Torvalds committed
      Pull infiniband updates from Doug Ledford:
       "Minor updates for 4.1-rc
      
        Most of the changes are fairly small and well confined.  The iWARP
        address reporting changes are the only ones that are a medium size.  I
        had these queued up prior to rc1, but due to the shuffle in
        maintainers, they did not get submitted when I expected.  My apologies
        for that.  I feel comfortable with them however due to the testing
        they've received, so I left them in this submission"
      
      * tag 'for-linus' of git://github.com/dledford/linux:
        MAINTAINERS: Update InfiniBand subsystem maintainer
        MAINTAINERS: add include/rdma/ to InfiniBand subsystem
        IPoIB/CM: Fix indentation level
        iw_cxgb4: Remove negative advice dmesg warnings
        IB/core: Fix unaligned accesses
        IB/core: change rdma_gid2ip into void function as it always return zero
        IB/qib: use arch_phys_wc_add()
        IB/qib: add acounting for MTRR
        IB/core: dma unmap optimizations
        IB/core: dma map/unmap locking optimizations
        RDMA/cxgb4: Report the actual address of the remote connecting peer
        RDMA/nes: Report the actual address of the remote connecting peer
        RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients
        iw_cxgb4: enforce qp/cq id requirements
        iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQs
        iw_cxgb4: 32b platform fixes
        iw_cxgb4: Cleanup register defines/MACROS
        RDMA/CMA: Canonize IPv4 on IPV6 sockets properly
      8cb7c15b
    • L
      Merge tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 0e1dc427
      Linus Torvalds committed
      Pull xen bug fixes from David Vrabel:
      
       - fix blkback regression if using persistent grants
      
       - fix various event channel related suspend/resume bugs
      
       - fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS
      
       - SWIOTLB on ARM now uses frames <4 GiB (if available) so devices only
         capable of 32-bit DMA work.
      
      * tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
        hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests
        xen/events: Set irq_info->evtchn before binding the channel to CPU in __startup_pirq()
        xen/console: Update console event channel on resume
        xen/xenbus: Update xenbus event channel on resume
        xen/events: Clear cpu_evtchn_mask before resuming
        xen-pciback: Add name prefix to global 'permissive' variable
        xen: Suspend ticks on all CPUs during suspend
        xen/grant: introduce func gnttab_unmap_refs_sync()
        xen/blkback: safely unmap purge persistent grants
      0e1dc427
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3d54ac9e
      Linus Torvalds committed
      Pull x86 fixes from Ingo Molnar:
       "EFI fixes, an FPU fix, a ticket spinlock boundary condition fix and
        two build fixes"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/fpu: Always restore_xinit_state() when use_eager_cpu()
        x86: Make cpu_tss available to external modules
        efi: Fix error handling in add_sysfs_runtime_map_entry()
        x86/spinlocks: Fix regression in spinlock contention detection
        x86/mm: Clean up types in xlate_dev_mem_ptr()
        x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
        efivarfs: Ensure VariableName is NUL-terminated
      3d54ac9e
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d8fce2db
      Linus Torvalds committed
      Pull perf fixes from Ingo Molnar:
       "Mostly tooling fixes, but also an uncore PMU driver fix and an uncore
        PMU driver hardware-enablement addition"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf probe: Fix segfault if passed with ''.
        perf report: Fix -T/--threads option to work again
        perf bench numa: Fix immediate meeting of convergence condition
        perf bench numa: Fixes of --quiet argument
        perf bench futex: Fix hung wakeup tasks after requeueing
        perf probe: Fix bug with global variables handling
        perf top: Fix a segfault when kernel map is restricted.
        tools lib traceevent: Fix build failure on 32-bit arch
        perf kmem: Fix compiles on RHEL6/OL6
        tools lib api: Undefine _FORTIFY_SOURCE before setting it
        perf kmem: Consistently use PRIu64 for printing u64 values
        perf trace: Disable events and drain events when forked workload ends
        perf trace: Enable events when doing system wide tracing and starting a workload
        perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
        perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
        perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
      d8fce2db
    • L
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 02f0f572
      Linus Torvalds committed
      Pull RCU fix from Ingo Molnar:
       "An RCU Kconfig fix that eliminates an annoying interactive kconfig
        question for CONFIG_RCU_TORTURE_TEST_SLOW_INIT"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rcu: Control grace-period delays directly from value
      02f0f572
  3. 06 May, 2015 19 commits