    [PATCH] sched: reduce overhead of calc_load · db1b1fef
    Jack Steiner committed
    Currently, count_active_tasks() calls both nr_running() &
    nr_uninterruptible().  Each of these functions does a "for_each_cpu" &
    reads values from the runqueue of each cpu.  Although this is not a lot of
    instructions, each runqueue may be located on a different node.  Depending
    on the architecture, a unique TLB entry may be required to access each
    runqueue.
    
    Since there may be more runqueues than cpu TLB entries, a scan of all
    runqueues can thrash the TLB.  Each memory reference incurs a TLB miss &
    refill.
    
    In addition, the runqueue cacheline that contains nr_running &
    nr_uninterruptible may be evicted from the cache between the two passes.
    This causes unnecessary cache misses.
    
    Combining nr_running() & nr_uninterruptible() into a single function
    substantially reduces the TLB & cache misses on large systems.  This should
    have no measurable effect on smaller systems.
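
    The difference between the two approaches can be sketched in plain C.
    This is a simplified user-space illustration, not the patch itself: the
    struct, the fixed NR_CPUS value, the sample counts, and the helper names
    two_pass_active() and one_pass_active() are assumptions made for the
    example.  It also assumes that per-cpu nr_uninterruptible counts can be
    negative and that a negative total is clamped to zero.

```c
#define NR_CPUS 4	/* hypothetical cpu count for this sketch */

/* Simplified stand-in for a per-cpu runqueue (assumed layout). */
struct rq {
	unsigned long nr_running;
	long nr_uninterruptible;	/* a per-cpu value may be negative */
};

/* Sample data, invented for illustration only. */
static struct rq runqueues[NR_CPUS] = {
	{ 3, 1 }, { 0, 2 }, { 5, -1 }, { 2, 0 },
};

/* First pass of the old scheme: touch every runqueue once. */
static unsigned long sum_running(void)
{
	unsigned long sum = 0;
	int i;

	for (i = 0; i < NR_CPUS; i++)
		sum += runqueues[i].nr_running;
	return sum;
}

/* Second pass of the old scheme: touch every runqueue again. */
static unsigned long sum_uninterruptible(void)
{
	long sum = 0;
	int i;

	for (i = 0; i < NR_CPUS; i++)
		sum += runqueues[i].nr_uninterruptible;
	/* Assumed behaviour: clamp a negative total to zero. */
	return sum < 0 ? 0 : (unsigned long)sum;
}

/* Old scheme: two full scans over all runqueues. */
static unsigned long two_pass_active(void)
{
	return sum_running() + sum_uninterruptible();
}

/* New scheme: read both fields while each runqueue is visited once. */
static unsigned long one_pass_active(void)
{
	unsigned long running = 0;
	long uninterruptible = 0;
	int i;

	for (i = 0; i < NR_CPUS; i++) {
		running += runqueues[i].nr_running;
		uninterruptible += runqueues[i].nr_uninterruptible;
	}
	if (uninterruptible < 0)
		uninterruptible = 0;
	return running + (unsigned long)uninterruptible;
}
```

    Both helpers return the same total; the single-pass version visits each
    runqueue once, so each runqueue's TLB entry and cacheline are needed
    only once per call.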
    
    On a 128p IA64 system running a memory stress workload, the new function
    reduced the overhead of calc_load() from 605 usec/call to 324 usec/call.
    Signed-off-by: Jack Steiner <steiner@sgi.com>
    Acked-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sched.c 154.8 KB