1. 15 Mar 2021 (3 commits)
  2. 12 Mar 2021 (1 commit)
    • tcp: plug skb_still_in_host_queue() to TSQ · f4dae54e
      Authored by Eric Dumazet
      Jakub and Neil reported an increase of RTO timers whenever
      TX completions are delayed a bit more (by increasing
      NIC TX coalescing parameters)
      
      Main issue is that TCP stack has a logic preventing a packet
      being retransmit if the prior clone has not yet been
      orphaned or freed.
      
      This logic came with commit 1f3279ae ("tcp: avoid
      retransmits of TCP packets hanging in host queues")
      
      Thankfully, in the case skb_still_in_host_queue() detects
      the initial clone is still in flight, it can use TSQ logic
      that will eventually retry later, at the moment the clone
      is freed or orphaned.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Neil Spring <ntspring@fb.com>
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4dae54e
  3. 04 Mar 2021 (1 commit)
  4. 27 Feb 2021 (1 commit)
  5. 14 Feb 2021 (3 commits)
    • skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing · 9243adfc
      Authored by Alexander Lobakin
      napi_frags_finish() and napi_skb_finish() can only be called inside
      NAPI Rx context, so we can feed NAPI cache with skbuff_heads that
      got NAPI_MERGED_FREE verdict instead of immediate freeing.
      Replace __kfree_skb() with __kfree_skb_defer() in napi_skb_finish()
      and move napi_skb_free_stolen_head() to skbuff.c, so it can drop skbs
      to NAPI cache.
      As many drivers call napi_alloc_skb()/napi_get_frags() on their
      receive path, this becomes especially useful.
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9243adfc
    • skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads · f450d539
      Authored by Alexander Lobakin
      Instead of just bulk-flushing skbuff_heads queued up through
      napi_consume_skb() or __kfree_skb_defer(), try to reuse them
      on allocation path.
      If the cache is empty on allocation, bulk-allocate the first
      16 elements, which is more efficient than per-skb allocation.
      If the cache is full on freeing, bulk-wipe the second half of
      the cache (32 elements).
      This also includes custom KASAN poisoning/unpoisoning to be
      double sure there are no use-after-free cases.
      
      To not change current behaviour, introduce a new function,
      napi_build_skb(), to optionally use a new approach later
      in drivers.
      
      Note on the selected bulk size, 16:
       - this equals XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE
         and especially VETH_XDP_BATCH, which is also used to
         bulk-allocate skbuff_heads and was tested on powerful
         setups;
       - this also showed the best performance in the actual
         test series (from the array of {8, 16, 32}).
      
      Suggested-by: Edward Cree <ecree.xilinx@gmail.com> # Divide on two halves
      Suggested-by: Eric Dumazet <edumazet@google.com>   # KASAN poisoning
      Cc: Dmitry Vyukov <dvyukov@google.com>             # Help with KASAN
      Cc: Paolo Abeni <pabeni@redhat.com>                # Reduced batch size
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f450d539
    • skbuff: remove __kfree_skb_flush() · fec6e49b
      Authored by Alexander Lobakin
      This function isn't really needed, as the NAPI skb queue gets
      bulk-freed anyway when there's no more room, and it may even reduce
      the efficiency of bulk operations.
      It will be even less needed after reusing the skb cache on the
      allocation path, so remove it and thereby lighten network softirqs a bit.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fec6e49b
  6. 07 Feb 2021 (1 commit)
    • net: Introduce {netdev,napi}_alloc_frag_align() · 3f6e687d
      Authored by Kevin Hao
      The current implementation of {netdev,napi}_alloc_frag() makes no
      alignment guarantee for the returned buffer address, but some hardware
      requires DMA buffers to be correctly aligned. Buffers allocated by
      {netdev,napi}_alloc_frag() therefore need a workaround like the
      following before they can be used for DMA:
          buf = napi_alloc_frag(really_needed_size + align);
          buf = PTR_ALIGN(buf, align);
      
      This workaround is ugly and wastes a lot of memory when the buffers
      are used by a network driver for TX/RX. Alignment support has been
      added to the page_frag functions, so add the corresponding
      {netdev,napi}_alloc_frag_align() helpers.
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3f6e687d
  7. 05 Feb 2021 (2 commits)
  8. 23 Jan 2021 (1 commit)
  9. 21 Jan 2021 (1 commit)
  10. 20 Jan 2021 (1 commit)
  11. 12 Jan 2021 (2 commits)
  12. 08 Jan 2021 (11 commits)
  13. 02 Dec 2020 (1 commit)
  14. 18 Nov 2020 (1 commit)
  15. 03 Nov 2020 (1 commit)
    • net: add kcov handle to skb extensions · 6370cc3b
      Authored by Aleksandr Nogikh
      Remote KCOV coverage collection enables coverage-guided fuzzing of the
      code that is not reachable during normal system call execution. It is
      especially helpful for fuzzing networking subsystems, where it is
      common to perform packet handling in separate work queues even for the
      packets that originated directly from the user space.
      
      Enable coverage-guided frame injection by adding kcov remote handle to
      skb extensions. Default initialization in __alloc_skb and
      __build_skb_around ensures that no socket buffer that was generated
      during a system call will be missed.
      
      Code that is of interest and that performs packet processing should be
      annotated with kcov_remote_start()/kcov_remote_stop().
      
      An alternative approach is to determine kcov_handle solely on the
      basis of the device/interface that received the specific socket
      buffer. However, in this case it would be impossible to distinguish
      between packets that originated during normal background network
      processes or were intentionally injected from the user space.
      Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6370cc3b
  16. 04 Oct 2020 (1 commit)
    • net/sched: act_vlan: Add {POP,PUSH}_ETH actions · 19fbcb36
      Authored by Guillaume Nault
      Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to
      respectively pop and push a base Ethernet header at the beginning of a
      frame.
      
      POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any,
      must be stripped before calling POP_ETH.
      
      PUSH_ETH is restricted to skbs with no mac_header, and only the MAC
      addresses can be configured. The Ethertype is automatically set from
      skb->protocol. These restrictions ensure that all skb's fields remain
      consistent, so that this action can't confuse other parts of the
      networking stack (like GSO).
      
      Since openvswitch already had these actions, consolidate the code in
      skbuff.c (like for vlan and mpls push/pop).
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19fbcb36
  17. 01 Oct 2020 (1 commit)
    • bpf: Add redirect_neigh helper as redirect drop-in · b4ab3141
      Authored by Daniel Borkmann
      Add a redirect_neigh() helper as redirect() drop-in replacement
      for the xmit side. Main idea for the helper is to be very similar
      in semantics to the latter just that the skb gets injected into
      the neighboring subsystem in order to let the stack do the work
      it knows best anyway to populate the L2 addresses of the packet
      and then hand over to dev_queue_xmit() as redirect() does.
      
      This solves two bigger items: i) skbs don't need to go up to the
      stack on the host facing veth ingress side for traffic egressing
      the container to achieve the same for populating L2 which also
      has the huge advantage that ii) the skb->sk won't get orphaned in
      ip_rcv_core() when entering the IP routing layer on the host stack.
      
      Given that skb->sk also doesn't get orphaned when crossing the netns,
      as per 9c4c3252 ("skbuff: preserve sock reference when scrubbing
      the skb."), the helper can then push the skbs directly to the phys
      device where FQ scheduler can do its work and TCP stack gets proper
      backpressure given we hold on to skb->sk as long as skb is still
      residing in queues.
      
      With the helper used in BPF data path to then push the skb to the
      phys device, I observed a stable/consistent TCP_STREAM improvement
      on veth devices for traffic going container -> host -> host ->
      container from ~10Gbps to ~15Gbps for a single stream in my test
      environment.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Cc: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
      b4ab3141
  18. 10 Sep 2020 (1 commit)
  19. 27 Aug 2020 (1 commit)
  20. 25 Aug 2020 (1 commit)
  21. 24 Aug 2020 (1 commit)
  22. 21 Aug 2020 (1 commit)
  23. 04 Aug 2020 (1 commit)
    • net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct · 038ebb1a
      Authored by wenxu
      When openvswitch conntrack offload is used with the act_ct action,
      fragmented packets are defragmented by the ingress tc act_ct action
      and can miss the next chain. The packet is then passed to the
      openvswitch datapath without the mru, and the over-MTU packet is
      dropped by the openvswitch output action:
      
      "kernel: net2: dropped over-mtu packet: 1528 > 1500"
      
      This patch adds the mru to tc_skb_ext for the defrag-and-miss-next-chain
      situation, and also adds the mru to qdisc_skb_cb. act_ct sets the mru
      in qdisc_skb_cb when the packet is defragmented, and on a chain miss
      the mru is copied to tc_skb_ext, where the ovs datapath can retrieve it.
      
      Fixes: b57dc7c1 ("net/sched: Introduce action ct")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      038ebb1a
  24. 25 Jul 2020 (1 commit)