    ipv4: Allow amount of dirty memory from fib resizing to be controllable · 9ab948a9
    Committed by David Ahern
    The fib_trie implementation calls synchronize_rcu when a certain number
    of pages are dirty from freed entries. The number of pages was determined
    experimentally in 2009 (commit c3059477).
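
    The threshold behaviour described above can be modelled with a toy
    counter. This is only a sketch and not the kernel code: count_syncs and
    the uniform 4kB size per freed entry are assumptions made for
    illustration, and each "sync" stands in for one synchronize_rcu call.

```python
def count_syncs(freed_sizes, threshold_bytes):
    """Count how often a sync would be forced (hypothetical model)."""
    dirty = 0   # bytes freed since the last sync
    syncs = 0   # number of forced syncs (stand-in for synchronize_rcu)
    for size in freed_sizes:
        dirty += size
        if dirty >= threshold_bytes:
            dirty = 0
            syncs += 1
    return syncs


# Assumption: 4096 freed entries of 4kB each (16MB of dirty memory in total).
frees = [4096] * 4096
low_threshold_syncs = count_syncs(frees, 128 * 4096)          # 128 pages
high_threshold_syncs = count_syncs(frees, 16 * 1024 * 1024)   # 16MB
print(low_threshold_syncs, high_threshold_syncs)
```

    In this model the number of syncs falls in proportion to the threshold,
    at the cost of more freed memory being held back between syncs.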
    
    At the current setting, synchronize_rcu is called often: 51 times in one
    second in one test, with an average delay of 8 msec when adding a fib
    entry. The cumulative effect is a significant slowdown when modifying the
    fib. This shows up in the output of 'time' as the difference between real
    time and sys+user.
    For example, using 720,022 single path routes and 'ip -batch'[1]:
    
        $ time ./ip -batch ipv4/routes-1-hops
        real    0m14.214s
        user    0m2.513s
        sys     0m6.783s
    
    So roughly 35% of the actual time to install the routes is from the ip
    command getting scheduled out, most notably due to synchronize_rcu (this
    is observed using 'perf sched timehist').
    
    This patch makes the amount of dirty memory configurable, from 64kB, where
    synchronize_rcu is called often (small, low end systems that are memory
    sensitive), to 64MB, where synchronize_rcu is called rarely during a large
    FIB change (for high end systems with lots of memory). The default is 512kB
    which corresponds to the current setting of 128 pages with a 4kB page size.
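
    The sizes quoted above work out as follows. This sketch assumes the
    setting is expressed in bytes and that out-of-range values are rejected;
    the constant names and in_range are hypothetical helpers for
    illustration, not kernel identifiers.

```python
KIB = 1024
MIB = 1024 * KIB
PAGE_SIZE = 4 * KIB  # assumption: 4kB pages, as in the text above

# Limits and default as stated in the commit message (assumed to be bytes).
FIB_SYNC_MEM_MIN = 64 * KIB
FIB_SYNC_MEM_MAX = 64 * MIB
FIB_SYNC_MEM_DEFAULT = 128 * PAGE_SIZE  # the old fixed setting of 128 pages


def in_range(value_bytes):
    """Return True if the value lies within the configurable range."""
    return FIB_SYNC_MEM_MIN <= value_bytes <= FIB_SYNC_MEM_MAX


print(FIB_SYNC_MEM_DEFAULT)   # default threshold in bytes
print(16 * MIB)               # the 16MB setting in bytes
print(in_range(16 * MIB), in_range(32 * KIB))
```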
    
    As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu
    in a second blocking for up to 30 msec in a single instance, and a total
    of almost 100 msec across the 4 calls in the second. The trade off is
    allowing FIB entries to consume more memory in a given time window,
    in exchange for much better fib insertion rates (~30% increase in prefixes/sec).
    With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch
    file runs in:
    
        $ time ./ip -batch ipv4/routes-1-hops
        real    0m9.692s
        user    0m2.491s
        sys     0m6.769s
    
    So the dead time is reduced to about 1/2 second or <5% of the real time.
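
    The dead-time figures can be reproduced from the two 'time' outputs in
    this message. The sketch below treats dead time as real - (user + sys),
    as described earlier; dead_time is a hypothetical helper name.

```python
def dead_time(real_s, user_s, sys_s):
    """Return (dead seconds, dead fraction of real time)."""
    dead = real_s - (user_s + sys_s)
    return dead, dead / real_s


# Timings copied from the 'time' outputs above.
before = dead_time(14.214, 2.513, 6.783)   # current setting
after = dead_time(9.692, 2.491, 6.769)     # fib_sync_mem set to 16MB

for label, (dead, frac) in (("before", before), ("after", after)):
    print(f"{label}: {dead:.3f}s dead, {frac:.1%} of real time")
```

    This gives about 4.9 seconds (roughly 35%) before the change and about
    0.43 seconds (under 5%) with the 16MB setting.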
    
    [1] 'ip' modified to not request ACK messages which improves route
        insertion times by about 20%
    Signed-off-by: David Ahern <dsahern@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>