    rcu: Reduce cache-miss initialization latencies for large systems · 8932a63d
    Committed by Paul E. McKenney
    Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper
    limit of 16 on the leaf-level fanout for the rcu_node tree.  This was
    needed to reduce lock contention that was induced by the synchronization
    of scheduling-clock interrupts, which was in turn needed to improve
    energy efficiency for moderate-sized lightly loaded servers.
    
    However, reducing the leaf-level fanout means that there are more
    leaf-level rcu_node structures in the tree, which in turn means that
    RCU's grace-period initialization incurs more cache misses.  This is
    not a problem on moderate-sized servers with only a few tens of CPUs,
    but becomes a major source of real-time latency spikes on systems with
    many hundreds of CPUs.  In addition, the workloads running on these large
    systems tend to be CPU-bound, which eliminates the energy-efficiency
    advantages of synchronizing scheduling-clock interrupts.  Therefore,
    these systems need maximal values for the rcu_node leaf-level fanout.
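
The trade-off described above can be illustrated with a little arithmetic. The following is only a sketch: `leaf_nodes()` is a hypothetical helper, not a kernel function, and it assumes every leaf rcu_node structure covers up to `leaf_fanout` CPUs, so the leaf count is the CPU count divided by the fanout, rounded up.

```c
/*
 * Hypothetical helper (not part of the kernel): number of leaf-level
 * rcu_node structures needed for ncpus CPUs, assuming each leaf
 * covers at most leaf_fanout CPUs.
 */
unsigned int leaf_nodes(unsigned int ncpus, unsigned int leaf_fanout)
{
	/* Ceiling division: a partially filled leaf still needs a structure. */
	return (ncpus + leaf_fanout - 1) / leaf_fanout;
}
```

Under that assumption, a 4096-CPU system has 256 leaf structures with a fanout of 16 but only 64 with a fanout of 64, so a larger fanout leaves fewer structures to touch during grace-period initialization.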
    
    This commit addresses this problem by introducing a new kernel parameter
    named RCU_FANOUT_LEAF that directly controls the leaf-level fanout.
    This parameter defaults to 16 to handle the common case of moderate-sized,
    lightly loaded servers, but may be set higher on larger systems.
    
    Reported-by: Mike Galbraith <efault@gmx.de>
    Reported-by: Dimitri Sivanich <sivanich@sgi.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
rcutree.c 79.3 KB