1. 09 Apr, 2019 8 commits
  2. 05 Apr, 2019 3 commits
  3. 04 Apr, 2019 2 commits
    • ipv6: Flip to fib_nexthop_info · c0a72077
      Committed by David Ahern
      Export fib_nexthop_info and fib_add_nexthop for use by IPv6 code.
      Remove rt6_nexthop_info and rt6_add_nexthop in favor of the IPv4
      versions. Update fib_nexthop_info for IPv6 linkdown check and
      RTA_GATEWAY for AF_INET6.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Add fib_nh_common to fib_result · eba618ab
      Committed by David Ahern
      Most of the IPv4 code only needs data from fib_nh_common. Add
      a fib_nh_common selection to fib_result and update users to use it.
      
      Right now, fib_nh_common in fib_result will point to a fib_nh struct
      that is embedded within a fib_info:
      
              fib_info  --> fib_nh
                            fib_nh
                            ...
                            fib_nh
                              ^
          fib_result->nhc ----+
      
      Later, nhc can point to a fib_nh within a nexthop struct:
      
              fib_info --> nexthop --> fib_nh
                                         ^
          fib_result->nhc ---------------+
      
      or for a nexthop group:
      
              fib_info --> nexthop --> nexthop --> fib_nh
                                       nexthop --> fib_nh
                                       ...
                                       nexthop --> fib_nh
                                                     ^
          fib_result->nhc ---------------------------+
      
      In all cases, nhsel within fib_result indicates which leg of the
      multipath route is used.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
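The first diagram in the fib_nh_common commit (a fib_nh embedded in a fib_info, with the result pointing at the selected leg) can be illustrated with a small user-space sketch. This is not the kernel code: the structures are heavily simplified, `MAX_NHS`, `fib_result_select()` and `nhc_to_fib_nh()` are hypothetical names chosen for illustration, and the field set is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. The real definitions
 * carry many more members; the fields here are assumptions. */
struct fib_nh_common {
    int nhc_oif;                     /* output interface index */
    int nhc_weight;                  /* multipath weight */
};

struct fib_nh {
    struct fib_nh_common nh_common;  /* common part embedded in the IPv4 nexthop */
    unsigned int nh_tclassid;        /* example of an IPv4-only field */
};

#define MAX_NHS 4                    /* hypothetical fixed bound for the sketch */

struct fib_info {
    int fib_nhs;                     /* number of nexthops in use */
    struct fib_nh fib_nh[MAX_NHS];   /* nexthops embedded in the fib_info */
};

struct fib_result {
    struct fib_info *fi;
    struct fib_nh_common *nhc;       /* selected nexthop, seen through the common part */
    int nhsel;                       /* index of the selected multipath leg */
};

/* Hypothetical helper: select leg `sel`, recording both the index and
 * the pointer to the common part. Returns 0 on success, -1 if out of range. */
static int fib_result_select(struct fib_result *res, struct fib_info *fi, int sel)
{
    if (sel < 0 || sel >= fi->fib_nhs)
        return -1;
    res->fi = fi;
    res->nhsel = sel;
    res->nhc = &fi->fib_nh[sel].nh_common;
    return 0;
}

/* Hypothetical helper: recover the enclosing fib_nh for the few callers
 * that need an IPv4-only field. Valid only while nhc points into a fib_nh. */
static struct fib_nh *nhc_to_fib_nh(struct fib_nh_common *nhc)
{
    return (struct fib_nh *)((char *)nhc - offsetof(struct fib_nh, nh_common));
}
```

Callers that only need common data read it through `res->nhc` and never learn which container the nexthop lives in, which is what would let the pointer target a fib_nh inside a nexthop struct later, as the commit message describes.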
  4. 02 Apr, 2019 4 commits
  5. 30 Mar, 2019 11 commits
  6. 29 Mar, 2019 5 commits
  7. 28 Mar, 2019 2 commits
  8. 25 Mar, 2019 2 commits
  9. 24 Mar, 2019 3 commits
    • tcp: add one skb cache for rx · 8b27dae5
      Committed by Eric Dumazet
      Often, recvmsg() system calls and BH handling for a particular
      TCP socket are done on different cpus.

      This means the incoming skb has to be allocated on one cpu,
      but freed on another.

      This incurs high spinlock contention in the slab layer for small RPCs,
      but also a high number of cache line ping-pongs for larger packets.
      
      A full size GRO packet might use 45 page fragments, meaning
      that up to 45 put_page() can be involved.
      
      Moreover, performing the __kfree_skb() in the recvmsg() context
      adds latency for user applications, and increases the probability
      of trapping them in backlog processing, since the BH handler
      might find the socket owned by the user.
      
      This patch, combined with the prior one, increases RPC
      performance by about 10% on servers with a large number of cores.

      (A tcp_rr workload with 10,000 flows and 112 threads reaches 9 Mpps
       instead of 8 Mpps.)
      
      This also increases single bulk flow performance on 40Gbit+ links,
      since in this case there are often two cpus working in tandem:
      
       - CPU handling the NIC rx interrupts, feeding the receive queue,
        and (after this patch) freeing the skbs that were consumed.
      
       - CPU in recvmsg() system call, essentially 100 % busy copying out
        data to user space.
      
      Having at most one skb in a per-socket cache has very little risk
      of memory exhaustion, and since it is protected by socket lock,
      its management is essentially free.
      
      Note that if RPS/RFS is used, we do not enable this feature, because
      there is a high chance that the same cpu is handling both the recvmsg()
      system call and the TCP rx path, but that another cpu did the skb
      allocations in the device driver right before the RPS/RFS logic.

      To properly handle this case, it seems we would need to record
      on which cpu the skb was allocated, and use a different channel
      to give skbs back to this cpu.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
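The hand-back scheme described in the rx commit can be illustrated with a small user-space sketch. It is not the kernel implementation: `struct sock_sketch`, `rx_cache`, `rps_in_use`, `reader_consume()` and `bh_receive()` are hypothetical names, `malloc()`/`free()` stand in for skb allocation and `__kfree_skb()`, and the caller is assumed to hold the socket lock around both functions.

```c
#include <assert.h>
#include <stdlib.h>

struct buf { char data[64]; };       /* stand-in for an skb */

struct sock_sketch {
    struct buf *rx_cache;            /* at most one consumed buffer awaiting free */
    int rps_in_use;                  /* nonzero: RPS/RFS active, feature disabled */
};

/* Counters so the behaviour can be observed from a test. */
static int frees_in_reader;
static int frees_in_bh;

/* Reader side (recvmsg() context, socket lock assumed held): park the
 * consumed buffer in the one-slot cache instead of freeing it here. */
static void reader_consume(struct sock_sketch *sk, struct buf *b)
{
    if (!sk->rps_in_use && sk->rx_cache == NULL) {
        sk->rx_cache = b;
        return;
    }
    /* Slot occupied or feature disabled: fall back to freeing locally. */
    free(b);
    frees_in_reader++;
}

/* BH side (socket lock assumed held): take whatever the reader parked and
 * free it on this cpu, which is likely the one that allocated it. */
static void bh_receive(struct sock_sketch *sk)
{
    struct buf *to_free = sk->rx_cache;

    sk->rx_cache = NULL;
    if (to_free) {
        free(to_free);
        frees_in_bh++;
    }
}
```

Because the slot holds at most one buffer, a second consumed buffer is freed by the reader as before, which bounds the extra memory per socket to one buffer.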
    • tcp: add one skb cache for tx · 472c2e07
      Committed by Eric Dumazet
      On hosts with a lot of cores, RPC workloads suffer from heavy contention on slab spinlocks.
      
          20.69%  [kernel]       [k] queued_spin_lock_slowpath
           5.64%  [kernel]       [k] _raw_spin_lock
           3.83%  [kernel]       [k] syscall_return_via_sysret
           3.48%  [kernel]       [k] __entry_text_start
           1.76%  [kernel]       [k] __netif_receive_skb_core
           1.64%  [kernel]       [k] __fget
      
      For each sendmsg(), we allocate one skb, and free it when the ACK packet arrives.

      In many cases, ACK packets are handled by another cpu, and this unfortunately
      incurs heavy costs for the slab layer.

      This patch uses an extra pointer in the socket structure, so that we try to reuse
      the same skb and avoid these expensive costs.
      
      We cache at most one skb per socket so this should be safe as far as
      memory pressure is concerned.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
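The "extra pointer" reuse described in the tx commit can be sketched in user space as a one-slot recycling cache. This is an illustration under stated assumptions, not the kernel code: `struct tx_sock_sketch`, `tx_cache`, `tx_alloc()` and `tx_free()` are hypothetical names, `calloc()`/`free()` stand in for the slab allocator, and the caller is assumed to serialize access to the socket.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct txbuf { char data[128]; };    /* stand-in for an skb */

struct tx_sock_sketch {
    struct txbuf *tx_cache;          /* the extra pointer: at most one recycled buffer */
};

/* Count trips to the general allocator so reuse is observable. */
static int slab_allocs;
static int slab_frees;

/* sendmsg() path: prefer the cached buffer, fall back to the allocator. */
static struct txbuf *tx_alloc(struct tx_sock_sketch *sk)
{
    struct txbuf *b = sk->tx_cache;

    if (b) {
        sk->tx_cache = NULL;
        memset(b, 0, sizeof(*b));    /* hand out a clean buffer */
        return b;
    }
    slab_allocs++;
    return calloc(1, sizeof(*b));
}

/* ACK path: keep one buffer for the next send, free any surplus. */
static void tx_free(struct tx_sock_sketch *sk, struct txbuf *b)
{
    if (sk->tx_cache == NULL) {
        sk->tx_cache = b;
        return;
    }
    slab_frees++;
    free(b);
}
```

In a steady request/response pattern with one buffer in flight, every send after the first is served from the slot, so the allocator and its locks are not touched at all.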
    • net: convert rps_needed and rfs_needed to new static branch api · dc05360f
      Committed by Eric Dumazet
      We prefer static_branch_unlikely() over static_key_false() these days.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>