1. 23 11月, 2015 3 次提交
  2. 11 9月, 2015 1 次提交
  3. 03 8月, 2015 1 次提交
    • W
      locking/pvqspinlock: Only kick CPU at unlock time · 75d22702
      Waiman Long 提交于
      For an over-committed guest with more vCPUs than physical CPUs
      available, it is possible that a vCPU may be kicked twice before
      getting the lock - once before it becomes queue head and once again
      before it gets the lock. All these CPU kicking and halting (VMEXIT)
      can be expensive and slow down system performance.
      
      This patch adds a new vCPU state (vcpu_hashed) which enables the code
      to delay CPU kicking until at unlock time. Once this state is set,
      the new lock holder will set _Q_SLOW_VAL and fill in the hash table
      on behalf of the halted queue head vCPU. The original vcpu_halted
      state will be used by pv_wait_node() only to differentiate other
      queue nodes from the qeue head.
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1436647018-49734-2-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      75d22702
  4. 08 5月, 2015 7 次提交
    • W
      locking/pvqspinlock: Implement simple paravirt support for the qspinlock · a23db284
      Waiman Long 提交于
      Provide a separate (second) version of the spin_lock_slowpath for
      paravirt along with a special unlock path.
      
      The second slowpath is generated by adding a few pv hooks to the
      normal slowpath, but where those will compile away for the native
      case, they expand into special wait/wake code for the pv version.
      
      The actual MCS queue can use extra storage in the mcs_nodes[] array to
      keep track of state and therefore uses directed wakeups.
      
      The head contender has no such storage directly visible to the
      unlocker.  So the unlocker searches a hash table with open addressing
      using a simple binary Galois linear feedback shift register.
      Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1429901803-29771-9-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a23db284
    • P
      locking/qspinlock: Revert to test-and-set on hypervisors · 2aa79af6
      Peter Zijlstra (Intel) 提交于
      When we detect a hypervisor (!paravirt, see qspinlock paravirt support
      patches), revert to a simple test-and-set lock to avoid the horrors
      of queue preemption.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2aa79af6
    • W
      locking/qspinlock: Use a simple write to grab the lock · 2c83e8e9
      Waiman Long 提交于
      Currently, atomic_cmpxchg() is used to get the lock. However, this
      is not really necessary if there is more than one task in the queue
      and the queue head don't need to reset the tail code. For that case,
      a simple write to set the lock bit is enough as the queue head will
      be the only one eligible to get the lock as long as it checks that
      both the lock and pending bits are not set. The current pending bit
      waiting code will ensure that the bit will not be set as soon as the
      tail code in the lock is set.
      
      With that change, the are some slight improvement in the performance
      of the queued spinlock in the 5M loop micro-benchmark run on a 4-socket
      Westere-EX machine as shown in the tables below.
      
      		[Standalone/Embedded - same node]
        # of tasks	Before patch	After patch	%Change
        ----------	-----------	----------	-------
             3	 2324/2321	2248/2265	 -3%/-2%
             4	 2890/2896	2819/2831	 -2%/-2%
             5	 3611/3595	3522/3512	 -2%/-2%
             6	 4281/4276	4173/4160	 -3%/-3%
             7	 5018/5001	4875/4861	 -3%/-3%
             8	 5759/5750	5563/5568	 -3%/-3%
      
      		[Standalone/Embedded - different nodes]
        # of tasks	Before patch	After patch	%Change
        ----------	-----------	----------	-------
             3	12242/12237	12087/12093	 -1%/-1%
             4	10688/10696	10507/10521	 -2%/-2%
      
      It was also found that this change produced a much bigger performance
      improvement in the newer IvyBridge-EX chip and was essentially to close
      the performance gap between the ticket spinlock and queued spinlock.
      
      The disk workload of the AIM7 benchmark was run on a 4-socket
      Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
      on a 3.14 based kernel. The results of the test runs were:
      
                      AIM7 XFS Disk Test
        kernel                 JPM    Real Time   Sys Time    Usr Time
        -----                  ---    ---------   --------    --------
        ticketlock            5678233    3.17       96.61       5.81
        qspinlock             5750799    3.13       94.83       5.97
      
                      AIM7 EXT4 Disk Test
        kernel                 JPM    Real Time   Sys Time    Usr Time
        -----                  ---    ---------   --------    --------
        ticketlock            1114551   16.15      509.72       7.11
        qspinlock             2184466    8.24      232.99       6.01
      
      The ext4 filesystem run had a much higher spinlock contention than
      the xfs filesystem run.
      
      The "ebizzy -m" test was also run with the following results:
      
        kernel               records/s  Real Time   Sys Time    Usr Time
        -----                ---------  ---------   --------    --------
        ticketlock             2075       10.00      216.35       3.49
        qspinlock              3023       10.00      198.20       4.80
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-7-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2c83e8e9
    • P
      locking/qspinlock: Optimize for smaller NR_CPUS · 69f9cae9
      Peter Zijlstra (Intel) 提交于
      When we allow for a max NR_CPUS < 2^14 we can optimize the pending
      wait-acquire and the xchg_tail() operations.
      
      By growing the pending bit to a byte, we reduce the tail to 16bit.
      This means we can use xchg16 for the tail part and do away with all
      the repeated compxchg() operations.
      
      This in turn allows us to unconditionally acquire; the locked state
      as observed by the wait loops cannot change. And because both locked
      and pending are now a full byte we can use simple stores for the
      state transition, obviating one atomic operation entirely.
      
      This optimization is needed to make the qspinlock achieve performance
      parity with ticket spinlock at light load.
      
      All this is horribly broken on Alpha pre EV56 (and any other arch that
      cannot do single-copy atomic byte stores).
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      69f9cae9
    • W
      locking/qspinlock: Extract out code snippets for the next patch · 6403bd7d
      Waiman Long 提交于
      This is a preparatory patch that extracts out the following 2 code
      snippets to prepare for the next performance optimization patch.
      
       1) the logic for the exchange of new and previous tail code words
          into a new xchg_tail() function.
       2) the logic for clearing the pending bit and setting the locked bit
          into a new clear_pending_set_locked() function.
      
      This patch also simplifies the trylock operation before queuing by
      calling queued_spin_trylock() directly.
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6403bd7d
    • P
      locking/qspinlock: Add pending bit · c1fb159d
      Peter Zijlstra (Intel) 提交于
      Because the qspinlock needs to touch a second cacheline (the per-cpu
      mcs_nodes[]); add a pending bit and allow a single in-word spinner
      before we punt to the second cacheline.
      
      It is possible so observe the pending bit without the locked bit when
      the last owner has just released but the pending owner has not yet
      taken ownership.
      
      In this case we would normally queue -- because the pending bit is
      already taken. However, in this case the pending bit is guaranteed
      to be released 'soon', therefore wait for it and avoid queueing.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c1fb159d
    • W
      locking/qspinlock: Introduce a simple generic 4-byte queued spinlock · a33fda35
      Waiman Long 提交于
      This patch introduces a new generic queued spinlock implementation that
      can serve as an alternative to the default ticket spinlock. Compared
      with the ticket spinlock, this queued spinlock should be almost as fair
      as the ticket spinlock. It has about the same speed in single-thread
      and it can be much faster in high contention situations especially when
      the spinlock is embedded within the data structure to be protected.
      
      Only in light to moderate contention where the average queue depth
      is around 1-3 will this queued spinlock be potentially a bit slower
      due to the higher slowpath overhead.
      
      This queued spinlock is especially suit to NUMA machines with a large
      number of cores as the chance of spinlock contention is much higher
      in those machines. The cost of contention is also higher because of
      slower inter-node memory traffic.
      
      Due to the fact that spinlocks are acquired with preemption disabled,
      the process will not be migrated to another CPU while it is trying
      to get a spinlock. Ignoring interrupt handling, a CPU can only be
      contending in one spinlock at any one time. Counting soft IRQ, hard
      IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting
      activities.  By allocating a set of per-cpu queue nodes and used them
      to form a waiting queue, we can encode the queue node address into a
      much smaller 24-bit size (including CPU number and queue node index)
      leaving one byte for the lock.
      
      Please note that the queue node is only needed when waiting for the
      lock. Once the lock is acquired, the queue node can be released to
      be used later.
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Douglas Hatch <doug.hatch@hp.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Scott J Norton <scott.norton@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a33fda35