1. 15 Mar 2021 (3 commits)
  2. 12 Mar 2021 (1 commit)
    • tcp: plug skb_still_in_host_queue() to TSQ · f4dae54e
      Authored by Eric Dumazet
      Jakub and Neil reported an increase of RTO timers whenever
      TX completions are delayed a bit more (by increasing
      NIC TX coalescing parameters)
      
      Main issue is that TCP stack has a logic preventing a packet
      being retransmit if the prior clone has not yet been
      orphaned or freed.
      
      This logic came with commit 1f3279ae ("tcp: avoid
      retransmits of TCP packets hanging in host queues")
      
      Thankfully, in the case skb_still_in_host_queue() detects
      the initial clone is still in flight, it can use TSQ logic
      that will eventually retry later, at the moment the clone
      is freed or orphaned.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Neil Spring <ntspring@fb.com>
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4dae54e
  3. 04 Mar 2021 (1 commit)
  4. 27 Feb 2021 (1 commit)
  5. 14 Feb 2021 (3 commits)
    • skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing · 9243adfc
      Authored by Alexander Lobakin
      napi_frags_finish() and napi_skb_finish() can only be called inside
      NAPI Rx context, so we can feed NAPI cache with skbuff_heads that
      got NAPI_MERGED_FREE verdict instead of immediate freeing.
      Replace __kfree_skb() with __kfree_skb_defer() in napi_skb_finish()
      and move napi_skb_free_stolen_head() to skbuff.c, so it can drop skbs
      to NAPI cache.
      As many drivers call napi_alloc_skb()/napi_get_frags() on their
      receive path, this becomes especially useful.
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9243adfc
    • skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads · f450d539
      Authored by Alexander Lobakin
      Instead of just bulk-flushing skbuff_heads queued up through
      napi_consume_skb() or __kfree_skb_defer(), try to reuse them
      on allocation path.
      If the cache is empty on allocation, bulk-allocate the first
      16 elements, which is more efficient than per-skb allocation.
      If the cache is full on freeing, bulk-wipe the second half of
      the cache (32 elements).
      This also includes custom KASAN poisoning/unpoisoning to be
      double sure there are no use-after-free cases.
      
      To not change current behaviour, introduce a new function,
      napi_build_skb(), to optionally use a new approach later
      in drivers.
      
      Note on the selected bulk size, 16:
       - this equals XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE
         and especially VETH_XDP_BATCH, which is also used to
         bulk-allocate skbuff_heads and was tested on powerful
         setups;
       - this also showed the best performance in the actual
         test series (from the array of {8, 16, 32}).
      
      Suggested-by: Edward Cree <ecree.xilinx@gmail.com> # Divide on two halves
      Suggested-by: Eric Dumazet <edumazet@google.com>   # KASAN poisoning
      Cc: Dmitry Vyukov <dvyukov@google.com>             # Help with KASAN
      Cc: Paolo Abeni <pabeni@redhat.com>                # Reduced batch size
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f450d539
    • skbuff: remove __kfree_skb_flush() · fec6e49b
      Authored by Alexander Lobakin
      This function isn't really needed, as the NAPI skb queue gets
      bulk-freed anyway when there's no more room, and it may even reduce
      the efficiency of bulk operations.
      It will be even less needed after reusing the skb cache on the
      allocation path, so remove it and thereby lighten network softirqs a bit.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fec6e49b
  6. 07 Feb 2021 (1 commit)
    • net: Introduce {netdev,napi}_alloc_frag_align() · 3f6e687d
      Authored by Kevin Hao
      The current implementation of {netdev,napi}_alloc_frag() makes no
      alignment guarantee for the returned buffer address, but some hardware
      requires DMA buffers to be correctly aligned. Buffers allocated by
      {netdev,napi}_alloc_frag() therefore need a workaround like the
      following before they can be used for DMA:
          buf = napi_alloc_frag(really_needed_size + align);
          buf = PTR_ALIGN(buf, align);
      
      This workaround is ugly and wastes a lot of memory when the buffers
      are used by a network driver for TX/RX. Alignment support has been
      added to the page_frag functions, so add the corresponding
      {netdev,napi}_alloc_frag_align() helpers.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3f6e687d
  7. 05 Feb 2021 (2 commits)
  8. 23 Jan 2021 (1 commit)
  9. 21 Jan 2021 (1 commit)
  10. 20 Jan 2021 (1 commit)
  11. 12 Jan 2021 (2 commits)
  12. 08 Jan 2021 (11 commits)
  13. 02 Dec 2020 (1 commit)
  14. 18 Nov 2020 (1 commit)
  15. 03 Nov 2020 (1 commit)
    • net: add kcov handle to skb extensions · 6370cc3b
      Authored by Aleksandr Nogikh
      Remote KCOV coverage collection enables coverage-guided fuzzing of the
      code that is not reachable during normal system call execution. It is
      especially helpful for fuzzing networking subsystems, where it is
      common to perform packet handling in separate work queues even for the
      packets that originated directly from the user space.
      
      Enable coverage-guided frame injection by adding kcov remote handle to
      skb extensions. Default initialization in __alloc_skb and
      __build_skb_around ensures that no socket buffer that was generated
      during a system call will be missed.
      
      Code that is of interest and that performs packet processing should be
      annotated with kcov_remote_start()/kcov_remote_stop().
      
      An alternative approach is to determine kcov_handle solely on the
      basis of the device/interface that received the specific socket
      buffer. However, in this case it would be impossible to distinguish
      between packets that originated during normal background network
      processes or were intentionally injected from the user space.
      Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6370cc3b
  16. 04 Oct 2020 (1 commit)
    • net/sched: act_vlan: Add {POP,PUSH}_ETH actions · 19fbcb36
      Authored by Guillaume Nault
      Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to
      respectively pop and push a base Ethernet header at the beginning of a
      frame.
      
      POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any,
      must be stripped before calling POP_ETH.
      
      PUSH_ETH is restricted to skbs with no mac_header, and only the MAC
      addresses can be configured. The Ethertype is automatically set from
      skb->protocol. These restrictions ensure that all skb's fields remain
      consistent, so that this action can't confuse other parts of the
      networking stack (like GSO).
      
      Since openvswitch already had these actions, consolidate the code in
      skbuff.c (like for vlan and mpls push/pop).
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19fbcb36
  17. 01 Oct 2020 (1 commit)
    • bpf: Add redirect_neigh helper as redirect drop-in · b4ab3141
      Authored by Daniel Borkmann
      Add a redirect_neigh() helper as redirect() drop-in replacement
      for the xmit side. Main idea for the helper is to be very similar
      in semantics to the latter just that the skb gets injected into
      the neighboring subsystem in order to let the stack do the work
      it knows best anyway to populate the L2 addresses of the packet
      and then hand over to dev_queue_xmit() as redirect() does.
      
      This solves two bigger items: i) skbs don't need to go up to the
      stack on the host facing veth ingress side for traffic egressing
      the container to achieve the same for populating L2 which also
      has the huge advantage that ii) the skb->sk won't get orphaned in
      ip_rcv_core() when entering the IP routing layer on the host stack.
      
      Given that skb->sk also doesn't get orphaned when crossing the netns,
      as per 9c4c3252 ("skbuff: preserve sock reference when scrubbing
      the skb."), the helper can then push the skbs directly to the phys
      device where FQ scheduler can do its work and TCP stack gets proper
      backpressure given we hold on to skb->sk as long as skb is still
      residing in queues.
      
      With the helper used in BPF data path to then push the skb to the
      phys device, I observed a stable/consistent TCP_STREAM improvement
      on veth devices for traffic going container -> host -> host ->
      container from ~10Gbps to ~15Gbps for a single stream in my test
      environment.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Cc: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
      b4ab3141
  18. 10 Sep 2020 (1 commit)
  19. 27 Aug 2020 (1 commit)
  20. 25 Aug 2020 (1 commit)
  21. 24 Aug 2020 (1 commit)
  22. 21 Aug 2020 (1 commit)
  23. 04 Aug 2020 (1 commit)
    • net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct · 038ebb1a
      Authored by wenxu
      When openvswitch conntrack offload is used with the act_ct action,
      fragmented packets are defragmented by the ingress tc act_ct action
      and can miss the next chain. The packet is then passed to the
      openvswitch datapath without the mru, and the over-MTU packet is
      dropped by the openvswitch output action:
      
      "kernel: net2: dropped over-mtu packet: 1528 > 1500"
      
      This patch adds the mru to tc_skb_ext for the defrag-and-miss-next-chain
      situation, and also adds the mru to qdisc_skb_cb. act_ct sets the mru
      in qdisc_skb_cb when the packet is defragmented, and on a chain miss
      the mru is copied to tc_skb_ext, where the ovs datapath can retrieve it.
      
      Fixes: b57dc7c1 ("net/sched: Introduce action ct")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      038ebb1a
  24. 25 Jul 2020 (1 commit)