• W
    tick/broadcast: Reduce lock cacheline contention · 668802c2
    Waiman Long 提交于
    It was observed that on an Intel x86 system without the ARAT (Always
    running APIC timer) feature and with fairly large number of CPUs as
    well as CPUs coming in and out of intel_idle frequently, the lock
    contention on the tick_broadcast_lock can become significant.
    
    To reduce contention, the lock is put into its own cacheline and all
    the cpumask_var_t variables are put into the __read_mostly section.
    
    Running the SP benchmark of the NAS Parallel Benchmarks on a 4-socket
    16-core 32-thread Nehalam system, the performance number improved
    from 3353.94 Mop/s to 3469.31 Mop/s when this patch was applied on
    a 4.9.6 kernel.  This is a 3.4% improvement.
    Signed-off-by: NWaiman Long <longman@redhat.com>
    Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Link: http://lkml.kernel.org/r/1485799063-20857-1-git-send-email-longman@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
    668802c2
cpumask.h 24.1 KB