1. 18 4月, 2008 1 次提交
  2. 01 10月, 2006 2 次提交
    • M
      [PATCH] Directed yield: direct yield of spinlocks for powerpc · cdc39363
      Martin Schwidefsky 提交于
      Powerpc already has a directed yield for CONFIG_PREEMPT="n".  To make it
      work with CONFIG_PREEMPT="y" as well the _raw_{spin,read,write}_relax
      primitives need to be defined to call __spin_yield() for spinlocks and
      __rw_yield() for rw-locks.
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      cdc39363
    • M
      [PATCH] Directed yield: cpu_relax variants for spinlocks and rw-locks · ef6edc97
      Martin Schwidefsky 提交于
      On systems running with virtual cpus there is optimization potential in
      regard to spinlocks and rw-locks.  If the virtual cpu that has taken a lock
      is known to a cpu that wants to acquire the same lock it is beneficial to
      yield the timeslice of the virtual cpu in favour of the cpu that has the
      lock (directed yield).
      
      With CONFIG_PREEMPT="n" this can be implemented by the architecture without
      common code changes.  Powerpc already does this.
      
      With CONFIG_PREEMPT="y" the lock loops are coded with _raw_spin_trylock,
      _raw_read_trylock and _raw_write_trylock in kernel/spinlock.c.  If the lock
      could not be taken cpu_relax is called.  A directed yield is not possible
      because cpu_relax doesn't know anything about the lock.  To be able to
      yield the lock in favour of the current lock holder variants of cpu_relax
      for spinlocks and rw-locks are needed.  The new _raw_spin_relax,
      _raw_read_relax and _raw_write_relax primitives differ from cpu_relax
      insofar that they have an argument: a pointer to the lock structure.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ef6edc97
  3. 13 9月, 2006 1 次提交
    • P
      [POWERPC] Fix MMIO ops to provide expected barrier behaviour · f007cacf
      Paul Mackerras 提交于
      This changes the writeX family of functions to have a sync instruction
      before the MMIO store rather than after, because the generally expected
      behaviour is that the device receiving the MMIO store can be guaranteed
      to see the effects of any preceding writes to normal memory.
      
      To preserve ordering between writeX and readX, and to preserve ordering
      between preceding stores and the readX, the readX family of functions
      have had an sync added before the load.
      
      Although writeX followed by spin_unlock is not officially guaranteed
      to keep the writeX inside the spin-locked region unless an mmiowb()
      is used, there are currently drivers that depend on the previous
      behaviour on powerpc, which was that the mmiowb wasn't actually required.
      Therefore we have a per-cpu flag that is set by writeX, cleared by
      __raw_spin_lock and mmiowb, and tested by __raw_spin_unlock.  If it is
      set, __raw_spin_unlock does a sync and clears it.
      
      This changes both 32-bit and 64-bit readX/writeX.  32-bit already has a
      sync in __raw_spin_unlock (since lwsync doesn't exist on 32-bit), and thus
      doesn't need the per-cpu flag.
      
      Tested on G5 (PPC970) and POWER5.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      f007cacf
  4. 13 1月, 2006 2 次提交
    • A
      [PATCH] powerpc: use lwsync in atomics, bitops, lock functions · 144b9c13
      Anton Blanchard 提交于
      eieio is only a store - store ordering. When used to order an unlock
      operation loads may leak out of the critical region. This is potentially
      buggy, one example is if a user wants to atomically read a couple of
      values.
      
      We can solve this with an lwsync which orders everything except store - load.
      
      I removed the (now unused) EIEIO_ON_SMP macros and the c versions
      isync_on_smp and eieio_on_smp now we dont use them. I also removed some
      old comments that were used to identify inline spinlocks in assembly,
      they dont make sense now our locks are out of line.
      
      Another interesting thing was that read_unlock was using an eieio even
      though the rest of the spinlock code had already been converted to
      use lwsync.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      144b9c13
    • D
      [PATCH] powerpc: Remove lppaca structure from the PACA · 3356bb9f
      David Gibson 提交于
      At present the lppaca - the structure shared with the iSeries
      hypervisor and phyp - is contained within the PACA, our own low-level
      per-cpu structure.  This doesn't have to be so, the patch below
      removes it, making a separate array of lppaca structures.
      
      This saves approximately 500*NR_CPUS bytes of image size and kernel
      memory, because we don't need aligning gap between the Linux and
      hypervisor portions of every PACA.  On the other hand it means an
      extra level of dereference in many accesses to the lppaca.
      
      The patch also gets rid of several places where we assign the paca
      address to a local variable for no particular reason.
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      3356bb9f
  5. 09 1月, 2006 1 次提交
  6. 19 11月, 2005 1 次提交
  7. 01 11月, 2005 1 次提交
  8. 11 9月, 2005 1 次提交
    • I
      [PATCH] spinlock consolidation · fb1c8f93
      Ingo Molnar 提交于
      This patch (written by me and also containing many suggestions of Arjan van
      de Ven) does a major cleanup of the spinlock code.  It does the following
      things:
      
       - consolidates and enhances the spinlock/rwlock debugging code
      
       - simplifies the asm/spinlock.h files
      
       - encapsulates the raw spinlock type and moves generic spinlock
         features (such as ->break_lock) into the generic code.
      
       - cleans up the spinlock code hierarchy to get rid of the spaghetti.
      
      Most notably there's now only a single variant of the debugging code,
      located in lib/spinlock_debug.c.  (previously we had one SMP debugging
      variant per architecture, plus a separate generic one for UP builds)
      
      Also, i've enhanced the rwlock debugging facility, it will now track
      write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
      All locks have lockup detection now, which will work for both soft and hard
      spin/rwlock lockups.
      
      The arch-level include files now only contain the minimally necessary
      subset of the spinlock code - all the rest that can be generalized now
      lives in the generic headers:
      
       include/asm-i386/spinlock_types.h       |   16
       include/asm-x86_64/spinlock_types.h     |   16
      
      I have also split up the various spinlock variants into separate files,
      making it easier to see which does what. The new layout is:
      
         SMP                         |  UP
         ----------------------------|-----------------------------------
         asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
         linux/spinlock_types.h      |  linux/spinlock_types.h
         asm/spinlock_smp.h          |  linux/spinlock_up.h
         linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
         linux/spinlock.h            |  linux/spinlock.h
      
      /*
       * here's the role of the various spinlock/rwlock related include files:
       *
       * on SMP builds:
       *
       *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
       *                        initializers
       *
       *  linux/spinlock_types.h:
       *                        defines the generic type and initializers
       *
       *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
       *                        implementations, mostly inline assembly code
       *
       *   (also included on UP-debug builds:)
       *
       *  linux/spinlock_api_smp.h:
       *                        contains the prototypes for the _spin_*() APIs.
       *
       *  linux/spinlock.h:     builds the final spin_*() APIs.
       *
       * on UP builds:
       *
       *  linux/spinlock_type_up.h:
       *                        contains the generic, simplified UP spinlock type.
       *                        (which is an empty structure on non-debug builds)
       *
       *  linux/spinlock_types.h:
       *                        defines the generic type and initializers
       *
       *  linux/spinlock_up.h:
       *                        contains the __raw_spin_*()/etc. version of UP
       *                        builds. (which are NOPs on non-debug, non-preempt
       *                        builds)
       *
       *   (included on UP-non-debug builds:)
       *
       *  linux/spinlock_api_up.h:
       *                        builds the _spin_*() APIs.
       *
       *  linux/spinlock.h:     builds the final spin_*() APIs.
       */
      
      All SMP and UP architectures are converted by this patch.
      
      arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
      crosscompilers.  m32r, mips, sh, sparc, have not been tested yet, but should
      be mostly fine.
      
      From: Grant Grundler <grundler@parisc-linux.org>
      
        Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
        Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
        non-SMP kernels.  That should be trivial to fix up later if necessary.
      
        I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
        some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
        are well tested and contained entirely inside arch specific code.  I do NOT
        expect any new issues to arise with them.
      
       If someone does ever need to use debug/metrics with them, then they will
        need to unravel this hairball between spinlocks, atomic ops, and bit ops
        that exist only because parisc has exactly one atomic instruction: LDCW
        (load and clear word).
      
      From: "Luck, Tony" <tony.luck@intel.com>
      
         ia64 fix
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjanv@infradead.org>
      Signed-off-by: NGrant Grundler <grundler@parisc-linux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Signed-off-by: NHirokazu Takata <takata@linux-m32r.org>
      Signed-off-by: NMikael Pettersson <mikpe@csd.uu.se>
      Signed-off-by: NBenoit Boissinot <benoit.boissinot@ens-lyon.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fb1c8f93
  9. 01 5月, 2005 1 次提交
    • J
      [PATCH] ppc64: reverse prediction on spinlock busy loop code · d637413f
      Jake Moilanen 提交于
      On our raw spinlocks, we currently have an attempt at the lock, and if we do
      not get it we enter a spin loop.  This spinloop will likely continue for
      awhile, and we pridict likely.
      
      Shouldn't we predict that we will get out of the loop so our next instructions
      are already prefetched.  Even when we miss because the lock is still held, it
      won't matter since we are waiting anyways.
      
      I did a couple quick benchmarks, but the results are inconclusive.
      
      	16-way 690 running specjbb with original code
      	# ./specjbb 3000 16 1 1 19 30 120
      	    ...
      	Valid run, Score is 59282
      
      	16-way 690 running specjbb with unlikely code
      	# ./specjbb 3000 16 1 1 19 30 120
      	    ...
      	Valid run, Score is 59541
      
      I saw a smaller increase on a JS20 (~1.6%)
      
      	JS20 specjbb w/ original code
      	# ./specjbb 400 2 1 1 19 30 120
      	   ...
      	Valid run, Score is 20460
      
      	JS20 specjbb w/ unlikely code
      	# ./specjbb 400 2 1 1 19 30 120
      	   ...
      	Valid run, Score is 20803
      
      Anton said:
      
      Mispredicting the spinlock busy loop also means we slow down the rate at which
      we do the loads which can be good for heavily contended locks.
      
      Note: There are some gcc issues with our default build and branch prediction,
      but a CONFIG_POWER4_ONLY build should emit them correctly.  I'm working with
      Alan Modra on it now.
      Signed-off-by: NJake Moilanen <moilanen@austin.ibm.com>
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d637413f
  10. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4