• R
    net: Optimize snmp stat aggregation by walking all the percpu data at once · a3a77372
    Raghavendra K T 提交于
    Docker container creation linearly increased from around 1.6 sec to 7.5 sec
    (at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field.
    
    reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
    through per cpu data of an item (iteratively for around 36 items).
    
    idea: This patch tries to aggregate the statistics by going through
    all the items of each cpu sequentially which is reducing cache
    misses.
    
    Docker creation got faster by more than 2x after the patch.
    
    Result:
                           Before           After
    Docker creation time   6.836s           3.25s
    cache miss             2.7%             1.41%
    
    perf before:
        50.73%  docker           [kernel.kallsyms]       [k] snmp_fold_field
         9.07%  swapper          [kernel.kallsyms]       [k] snooze_loop
         3.49%  docker           [kernel.kallsyms]       [k] veth_stats_one
         2.85%  swapper          [kernel.kallsyms]       [k] _raw_spin_lock
    
    perf after:
        10.57%  docker           docker                [.] scanblock
         8.37%  swapper          [kernel.kallsyms]     [k] snooze_loop
         6.91%  docker           [kernel.kallsyms]     [k] snmp_get_cpu_field
         6.67%  docker           [kernel.kallsyms]     [k] veth_stats_one
    
    changes/ideas suggested:
    Using buffer in stack (Eric), Usage of memset (David), Using memcpy in
    place of unaligned_put (Joe).
    Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
    Acked-by: NEric Dumazet <edumazet@google.com>
    Signed-off-by: NDavid S. Miller <davem@davemloft.net>
    a3a77372
addrconf.c 144.5 KB