    [IA64] Spinlock optimizations · f5210891
    Committed by Christoph Lameter
    1. Nontemporal store for spin unlock.
    
    A nontemporal store will not update the LRU setting for the cacheline. The
    cacheline with the lock may therefore be evicted faster from the cpu
    caches. Doing so may be useful since it increases the chance that the
    exclusive cache line has been evicted when another cpu is trying to
    acquire the lock.
    
    The time between dropping and reacquiring a lock on the same cpu is
    typically very small so the danger of the cacheline being
    evicted is negligible.
    
    2. Avoid semaphore operation in write_unlock and use nontemporal store
    
    write_lock uses a cmpxchg like the regular spin_lock, but write_unlock uses
    clear_bit, which requires a load and then a loop over a cmpxchg. This
    patch makes write_unlock simply use a nontemporal byte store to clear
    the highest 8 bits. The lower 3 bytes (24 bits) are then still
    available to count the readers.
    
    Using a byte store reduces the number of possible readers from 2^31
    to 2^24 (about 16 million).
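
The effect of the byte store on the lock word can be sketched in portable C. This is only an illustration, not the code in spinlock.h: the type demo_rwlock_t, the DEMO_* masks and the helper functions are hypothetical names, the write bit is assumed to sit in bit 31, and a plain volatile store stands in for the nontemporal store. The top byte is located at run time so the sketch behaves the same on little-endian and big-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: bit 31 marks a writer, bits 0..23 count readers. */
#define DEMO_WRITE_BIT   0x80000000u
#define DEMO_READER_MASK 0x00FFFFFFu

typedef struct { uint32_t word; } demo_rwlock_t;

/* Index of the most significant byte of a uint32_t on this host. */
static unsigned demo_top_byte_index(void)
{
    const uint32_t probe = 0xFF000000u;
    unsigned char bytes[sizeof probe];
    unsigned i;

    memcpy(bytes, &probe, sizeof probe);
    for (i = 0; i < sizeof probe; i++)
        if (bytes[i] == 0xFF)
            return i;
    return 0; /* not reached for a 4-byte uint32_t */
}

/*
 * Release the write lock with a single one-byte store to the top byte.
 * No load and no cmpxchg loop are needed. The store clears bits 24..31,
 * so only bits 0..23 remain usable for the reader count.
 */
static void demo_write_unlock(demo_rwlock_t *l)
{
    volatile unsigned char *bytes = (volatile unsigned char *)&l->word;

    assert(l->word & DEMO_WRITE_BIT); /* caller must hold the write lock */
    bytes[demo_top_byte_index()] = 0;
}

/* Number of distinct reader-count values that fit in 24 bits: 2^24. */
static uint32_t demo_reader_count_range(void)
{
    return DEMO_READER_MASK + 1u;
}
```

Starting from a word of DEMO_WRITE_BIT | 5, demo_write_unlock leaves the value 5: the low 24 bits are untouched, while anything stored in bits 24..30 would be lost. That is why the reader count has to fit in 24 bits.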
    
    These patches were discussed already:
    
    http://marc.theaimsgroup.com/?t=111472054400001&r=1&w=2
    http://marc.theaimsgroup.com/?l=linux-ia64&m=111401837707849&w=2
    
    The nontemporal stores only work with GCC. If the compiler does not
    support inline asm, fallback C code is used. The fallback preserves the
    byte store but cannot issue nontemporal stores.
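
The compile-time split between the inline-asm path and the C fallback might look roughly like the sketch below. It is an assumption-laden illustration rather than the patch itself: demo_spinlock_t, demo_spin_unlock and DEMO_HAVE_NTA_STORE are hypothetical names, the guard macros are a guess at how GCC on IA64 could be detected, and the IA64 branch (a 4-byte release store of r0 with the .nta hint) has not been compiled or tested here. On any other compiler or architecture only the fallback branch is built.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lock type: 0 means free, nonzero means held. */
typedef struct { volatile uint32_t lock; } demo_spinlock_t;

#if defined(__GNUC__) && defined(__ia64__)
#define DEMO_HAVE_NTA_STORE 1
/*
 * Assumed IA64 path: store r0 (always zero) with release semantics and
 * the nontemporal hint, so the store does not refresh the cache line's
 * LRU state.
 */
static inline void demo_spin_unlock(demo_spinlock_t *x)
{
    __asm__ __volatile__("st4.rel.nta [%0] = r0"
                         :: "r"(&x->lock) : "memory");
}
#else
#define DEMO_HAVE_NTA_STORE 0
/*
 * Fallback path: an ordinary store. It still clears the lock but carries
 * no nontemporal hint. A real implementation would also need whatever
 * release barrier the target architecture requires.
 */
static inline void demo_spin_unlock(demo_spinlock_t *x)
{
    x->lock = 0;
}
#endif

/* Single-threaded self-check: a held lock reads as free after unlock. */
static int demo_unlock_clears(void)
{
    demo_spinlock_t l = { 1 };

    demo_spin_unlock(&l);
    return l.lock == 0;
}
```

Both branches leave the lock word at zero. The only difference is the cache hint, which is a performance matter and does not affect correctness.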
    Signed-off-by: Christoph Lameter <clameter@sgi.com>
    Signed-off-by: Tony Luck <tony.luck@intel.com>
spinlock.h 6.7 KB