  1. 11 Jun, 2021 · 1 commit
  2. 15 Apr, 2021 · 1 commit
    • skbuff: revert "skbuff: remove some unnecessary operation in skb_segment_list()" · 17c3df70
      Committed by Paolo Abeni
      The commit 1ddc3229 ("skbuff: remove some unnecessary operation
      in skb_segment_list()") introduced an issue very similar to the
      one already fixed by commit 53475c5d ("net: fix use-after-free when
      UDP GRO with shared fraglist").

      If the GSO skb goes through skb_clone() and pskb_expand_head() before
      entering skb_segment_list(), the latter will unshare the frag_list
      skbs and release the old list. With the reverted commit in place,
      when skb_segment_list() completes, skb->next points to the just
      released list, and later on the kernel will hit a use-after-free.
      
      Note that since commit e0e3070a ("udp: properly complete L4 GRO
      over UDP tunnel packet") the critical scenario can also be
      reproduced by receiving UDP-over-vxlan traffic with:

      NIC (NETIF_F_GRO_FRAGLIST enabled) -> vxlan -> UDP sink

      Attaching a packet socket to the NIC will cause skb_clone(), and the
      tunnel decapsulation will call pskb_expand_head().
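
      With the revert, the segment chain is rebuilt one segment at a time
      through a tail pointer, so the head ends up linked to the (possibly
      re-cloned) segments rather than to the released originals. A minimal
      sketch of the restored chaining inside the segmentation loop
      (surrounding bookkeeping omitted):

          if (!tail)
              skb->next = nskb;
          else
              tail->next = nskb;

          tail = nskb;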
      
      Fixes: 1ddc3229 ("skbuff: remove some unnecessary operation in skb_segment_list()")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 02 Apr, 2021 · 1 commit
  4. 11 Mar, 2021 · 1 commit
    • skbuff: remove some unnecessary operation in skb_segment_list() · 1ddc3229
      Committed by Yunsheng Lin
      The GRO list uses skb_shinfo(skb)->frag_list to link two skbs
      together, and NAPI_GRO_CB(p)->last->next is used when there are more
      skbs; see skb_gro_receive_list(). GSO expects each segmented skb to
      be linked together using skb->next, so only the first skb->next
      needs to be set to skb_shinfo(skb)->frag_list when doing GSO list
      segmentation.

      For the same reason, nskb->next does not need to be set to list_skb
      before jumping to the error handling, because nskb->next already
      points to list_skb.

      nskb is also the last skb at the end of the loop, so remove the tail
      variable and use nskb instead.
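
      A minimal sketch of the resulting simplification (error paths and
      per-segment fixups omitted): the frag_list skbs are already chained
      through ->next, so the head is the only link that has to be set up.

          skb->next = skb_shinfo(skb)->frag_list;
          skb_shinfo(skb)->frag_list = NULL;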
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 06 Mar, 2021 · 1 commit
  6. 02 Mar, 2021 · 1 commit
    • net: expand textsearch ts_state to fit skb_seq_state · b228c9b0
      Committed by Willem de Bruijn
      The referenced commit expands the skb_seq_state used by
      skb_find_text with a 4B frag_off field, growing it to 48B.

      This exceeds the containing ts_state->cb, causing stack corruption:
      
      [   73.238353] Kernel panic - not syncing: stack-protector: Kernel stack
      is corrupted in: skb_find_text+0xc5/0xd0
      [   73.247384] CPU: 1 PID: 376 Comm: nping Not tainted 5.11.0+ #4
      [   73.252613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.14.0-2 04/01/2014
      [   73.260078] Call Trace:
      [   73.264677]  dump_stack+0x57/0x6a
      [   73.267866]  panic+0xf6/0x2b7
      [   73.270578]  ? skb_find_text+0xc5/0xd0
      [   73.273964]  __stack_chk_fail+0x10/0x10
      [   73.277491]  skb_find_text+0xc5/0xd0
      [   73.280727]  string_mt+0x1f/0x30
      [   73.283639]  ipt_do_table+0x214/0x410
      
      The struct is passed between skb_find_text and its callbacks
      skb_prepare_seq_read, skb_seq_read and skb_abort_seq_read through
      the textsearch interface using TS_SKB_CB.
      
      I assumed that this mapped to skb->cb like other .._SKB_CB wrappers.
      skb->cb is 48B. But it maps to ts_state->cb, which is only 40B.
      
      skb->cb was increased from 40B to 48B after ts_state was introduced,
      in commit 3e3850e9 ("[NETFILTER]: Fix xfrm lookup in
      ip_route_me_harder/ip6_route_me_harder").
      
      Increase ts_state.cb[] to 48B to fit the struct.

      Also add a BUILD_BUG_ON to avoid a repeat.
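
      A minimal sketch of the shape of the change (layout inferred from
      this description, not quoted from the tree):

          /* include/linux/textsearch.h */
          struct ts_state {
              unsigned int offset;
              char cb[48];    /* was 40; must hold struct skb_seq_state */
          };

          /* net/core/skbuff.c, skb_find_text() */
          struct ts_state state;

          BUILD_BUG_ON(sizeof(struct skb_seq_state) > sizeof(state.cb));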
      
      The alternative is to add a direct dependency from textsearch onto
      linux/skbuff.h, but I think the intent is for textsearch to have no
      such dependencies on its callers.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=211911
      Fixes: 97550f6f ("net: compound page support in skb_seq_read")
      Reported-by: Kris Karas <bugs-a17@moonlit-rail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 14 Feb, 2021 · 11 commits
  8. 07 Feb, 2021 · 1 commit
    • net: Introduce {netdev,napi}_alloc_frag_align() · 3f6e687d
      Committed by Kevin Hao
      The current implementation of {netdev,napi}_alloc_frag() provides no
      alignment guarantee for the returned buffer address, but some
      hardware does require the DMA buffer to be aligned correctly. So if
      the buffers allocated by {netdev,napi}_alloc_frag() are used by such
      hardware for DMA, we would have to resort to workarounds like:
          buf = napi_alloc_frag(really_needed_size + align);
          buf = PTR_ALIGN(buf, align);
      
      This code seems ugly and would waste a lot of memory if the buffers
      are used in a network driver for TX/RX. We have added alignment
      support to the page_frag functions, so add the corresponding
      {netdev,napi}_alloc_frag_align() functions.
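
      With the new helpers the allocator takes care of the alignment
      itself; a minimal usage sketch (align is assumed to be a power of
      two, matching the page_frag alignment support mentioned above):

          buf = napi_alloc_frag_align(really_needed_size, align);
          if (unlikely(!buf))
              return NULL;   /* same failure handling as napi_alloc_frag() */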
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  9. 03 Feb, 2021 · 1 commit
  10. 23 Jan, 2021 · 1 commit
  11. 20 Jan, 2021 · 1 commit
  12. 17 Jan, 2021 · 2 commits
  13. 15 Jan, 2021 · 1 commit
    • net: avoid 32 x truesize under-estimation for tiny skbs · 3226b158
      Committed by Eric Dumazet
      Both virtio-net and napi_get_frags() allocate skbs with a very small
      skb->head.

      While using page fragments instead of a kmalloc-backed skb->head might
      give a small performance improvement in some cases, there is a huge
      risk of underestimating memory usage.

      For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32
      allocations per page (an order-3 page on x86), or even 64 on PowerPC.

      We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
      but consuming far more memory for TCP buffers than instructed in
      tcp_mem[2].

      Even if we force napi_alloc_skb() to only use order-0 pages, the issue
      would still be there on arches with PAGE_SIZE >= 32768.

      This patch makes sure that small skb heads are kmalloc-backed, so that
      other objects in the slab page can be reused instead of being held as
      long as skbs are sitting in socket queues.
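
      A sketch of the kind of check this implies in __napi_alloc_skb()
      (threshold and flags paraphrased from this description, not quoted
      from the tree):

          /* heads this small no longer get a page-fragment backing */
          if (len <= SKB_WITH_OVERHEAD(1024) ||
              len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
              (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
              skb = __alloc_skb(len, gfp_mask,
                                SKB_ALLOC_RX | SKB_ALLOC_NAPI, NUMA_NO_NODE);
              if (!skb)
                  goto skb_fail;
              goto skb_success;
          }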
      
      Note that we might in the future use the sk_buff napi cache
      instead of going through the more expensive __alloc_skb().

      Another idea would be to use separate page sizes depending
      on the allocated length (to never have more than 4 frags per page).

      I would like to thank Greg Thelen for his precious help on this
      matter; analysing crash dumps is always a time-consuming task.
      
      Fixes: fd11a83d ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  14. 12 Jan, 2021 · 1 commit
    • net: compound page support in skb_seq_read · 97550f6f
      Committed by Willem de Bruijn
      skb_seq_read iterates over an skb, returning pointer and length of
      the next data range with each call.
      
      It relies on kmap_atomic to access highmem pages when needed.
      
      An skb frag may be backed by a compound page, but kmap_atomic maps
      only a single page. There are not enough kmap slots to always map all
      pages concurrently.
      
      Instead, if kmap_atomic is needed, iterate over each page.
      
      As this increases the number of calls, avoid this unless needed.
      The necessary condition is captured in skb_frag_must_loop.
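
      A simplified sketch of the per-page clamping inside skb_seq_read()
      (variable names assumed for illustration, not quoted from the patch):

          if (skb_frag_must_loop(skb_frag_page(frag))) {
              /* kmap_atomic can map only one page at a time, so clamp
               * this chunk to a single page of the compound frag */
              pg_idx = (pg_off + st->frag_off) >> PAGE_SHIFT;
              pg_off = offset_in_page(pg_off + st->frag_off);
              pg_sz = min_t(unsigned int, pg_sz - st->frag_off,
                            PAGE_SIZE - pg_off);
          }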
      
      I tried to make the change as obvious as possible. It should be easy
      to verify that nothing changes if skb_frag_must_loop returns false.
      
      Tested:
        On an x86 platform with
          CONFIG_HIGHMEM=y
          CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y
          CONFIG_NETFILTER_XT_MATCH_STRING=y
      
        Run
          ip link set dev lo mtu 1500
          iptables -A OUTPUT -m string --string 'badstring' --algo bm -j ACCEPT
          dd if=/dev/urandom of=in bs=1M count=20
          nc -l -p 8000 > /dev/null &
          nc -w 1 -q 0 localhost 8000 < in
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  15. 09 Jan, 2021 · 1 commit
    • net: fix use-after-free when UDP GRO with shared fraglist · 53475c5d
      Committed by Dongseok Yi
      skbs in a fraglist could be shared by a BPF filter loaded at TC. If
      TC writes to them, it will call skb_ensure_writable ->
      pskb_expand_head to create a private linear section for the
      head_skb, and then call skb_clone_fraglist -> skb_get on each skb in
      the fraglist.

      skb_segment_list overwrites part of the skb linear section of each
      fragment itself. Even after skb_clone, the frag_skbs share their
      linear section with their clone in PF_PACKET.

      Both the sk_receive_queue of PF_PACKET and that of PF_INET (or
      PF_INET6) can hold a link to the same frag_skbs chain. If a new skb
      (not frags) is queued to one of the sk_receive_queues, multiple
      ptypes can see and release this chain. It causes a use-after-free.
      
      [ 4443.426215] ------------[ cut here ]------------
      [ 4443.426222] refcount_t: underflow; use-after-free.
      [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190
      refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO)
      [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8
      [ 4443.426808] Call trace:
      [ 4443.426813]  refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426823]  skb_release_data+0x144/0x264
      [ 4443.426828]  kfree_skb+0x58/0xc4
      [ 4443.426832]  skb_queue_purge+0x64/0x9c
      [ 4443.426844]  packet_set_ring+0x5f0/0x820
      [ 4443.426849]  packet_setsockopt+0x5a4/0xcd0
      [ 4443.426853]  __sys_setsockopt+0x188/0x278
      [ 4443.426858]  __arm64_sys_setsockopt+0x28/0x38
      [ 4443.426869]  el0_svc_common+0xf0/0x1d0
      [ 4443.426873]  el0_svc_handler+0x74/0x98
      [ 4443.426880]  el0_svc+0x8/0xc
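
      A minimal sketch of the shape of the fix in skb_segment_list()
      (abbreviated; error handling shortened): clone any shared frag skb
      before its linear section is rewritten, so the PF_PACKET clone keeps
      its own copy of the data.

          if (skb_shared(nskb)) {
              tmp = skb_clone(nskb, GFP_ATOMIC);
              if (tmp) {
                  consume_skb(nskb);
                  nskb = tmp;
                  err = skb_unclone(nskb, GFP_ATOMIC);
              } else {
                  err = -ENOMEM;
              }
          }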
      
      Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
      Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/1610072918-174177-1-git-send-email-dseok.yi@samsung.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  16. 08 Jan, 2021 · 11 commits
  17. 15 Dec, 2020 · 1 commit
  18. 04 Dec, 2020 · 1 commit
  19. 02 Dec, 2020 · 1 commit