• V
    tracing: Use NUMA allocation for per-cpu ring buffer pages · 7ea59064
    Vaibhav Nagarnaik 提交于
    The tracing ring buffer is a group of per-cpu ring buffers where
    allocation and logging is done on a per-cpu basis. The events that are
    generated on a particular CPU are logged in the corresponding buffer.
    This is to provide wait-free writes between CPUs and good NUMA node
    locality while accessing the ring buffer.
    
    However, the allocation routines consider NUMA locality only for buffer
    page metadata and not for the actual buffer page. This causes the pages
    to be allocated on the NUMA node local to the CPU where the allocation
    routine is running at the time.
    
    This patch fixes the problem by using a NUMA node specific allocation
    routine so that the pages are allocated from a NUMA node local to the
    logging CPU.
    
    I tested with the getuid_microbench from autotest. It is a simple binary
    that calls getuid() in a loop and measures the average time for the
    syscall to complete. The following command was used to test:
    $ getuid_microbench 1000000
    
    Compared the numbers found on kernel with and without this patch and
    found that logging latency decreases by 30-50 ns/call.
    tracing with non-NUMA allocation - 569 ns/call
    tracing with NUMA allocation     - 512 ns/call
    Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Michael Rubin <mrubin@google.com>
    Cc: David Sharp <dhsharp@google.com>
    Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
    7ea59064
ring_buffer_benchmark.c 10.5 KB