  1. 11 Jun, 2021 · 1 commit
  2. 15 Apr, 2021 · 1 commit
    • skbuff: revert "skbuff: remove some unnecessary operation in skb_segment_list()" · 17c3df70
      Committed by Paolo Abeni
      The commit 1ddc3229 ("skbuff: remove some unnecessary operation
      in skb_segment_list()") introduced an issue very similar to the
      one already fixed by commit 53475c5d ("net: fix use-after-free when
      UDP GRO with shared fraglist").

      If the GSO skb goes through skb_clone() and pskb_expand_head() before
      entering skb_segment_list(), the latter will unshare the frag_list
      skbs and release the old list. With the reverted commit in place,
      when skb_segment_list() completes, skb->next points to the just
      released list, and later on the kernel will hit a use-after-free.
      
      Note that since commit e0e3070a ("udp: properly complete L4 GRO
      over UDP tunnel packet") the critical scenario can also be
      reproduced by receiving UDP-over-vxlan traffic with:

      NIC (NETIF_F_GRO_FRAGLIST enabled) -> vxlan -> UDP sink

      Attaching a packet socket to the NIC will cause skb_clone(), and the
      tunnel decapsulation will call pskb_expand_head().
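
      With the revert, the segment chain is rebuilt one segment at a time
      through a tail pointer, so the head ends up linked to the (possibly
      re-cloned) segments rather than to the released originals. A minimal
      sketch of the restored chaining inside the segmentation loop
      (surrounding bookkeeping omitted):

          if (!tail)
              skb->next = nskb;
          else
              tail->next = nskb;

          tail = nskb;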
      
      Fixes: 1ddc3229 ("skbuff: remove some unnecessary operation in skb_segment_list()")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 02 Apr, 2021 · 1 commit
  4. 11 Mar, 2021 · 1 commit
    • skbuff: remove some unnecessary operation in skb_segment_list() · 1ddc3229
      Committed by Yunsheng Lin
      The GRO list uses skb_shinfo(skb)->frag_list to link two skbs
      together, and NAPI_GRO_CB(p)->last->next is used when there are more
      skbs; see skb_gro_receive_list(). GSO expects each segmented skb to
      be linked together using skb->next, so only the first skb->next
      needs to be set to skb_shinfo(skb)->frag_list when doing GSO list
      segmentation.

      For the same reason, nskb->next does not need to be set to list_skb
      before jumping to the error handling, because nskb->next already
      points to list_skb.

      nskb is also the last skb at the end of the loop, so remove the tail
      variable and use nskb instead.
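
      A minimal sketch of the resulting simplification (error paths and
      per-segment fixups omitted): the frag_list skbs are already chained
      through ->next, so the head is the only link that has to be set up.

          skb->next = skb_shinfo(skb)->frag_list;
          skb_shinfo(skb)->frag_list = NULL;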
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 06 Mar, 2021 · 1 commit
  6. 02 Mar, 2021 · 1 commit
    • net: expand textsearch ts_state to fit skb_seq_state · b228c9b0
      Committed by Willem de Bruijn
      The referenced commit expands the skb_seq_state used by
      skb_find_text with a 4B frag_off field, growing it to 48B.

      This exceeds the containing ts_state->cb, causing stack corruption:
      
      [   73.238353] Kernel panic - not syncing: stack-protector: Kernel stack
      is corrupted in: skb_find_text+0xc5/0xd0
      [   73.247384] CPU: 1 PID: 376 Comm: nping Not tainted 5.11.0+ #4
      [   73.252613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.14.0-2 04/01/2014
      [   73.260078] Call Trace:
      [   73.264677]  dump_stack+0x57/0x6a
      [   73.267866]  panic+0xf6/0x2b7
      [   73.270578]  ? skb_find_text+0xc5/0xd0
      [   73.273964]  __stack_chk_fail+0x10/0x10
      [   73.277491]  skb_find_text+0xc5/0xd0
      [   73.280727]  string_mt+0x1f/0x30
      [   73.283639]  ipt_do_table+0x214/0x410
      
      The struct is passed between skb_find_text and its callbacks
      skb_prepare_seq_read, skb_seq_read and skb_abort_seq_read through
      the textsearch interface using TS_SKB_CB.
      
      I assumed that this mapped to skb->cb like other .._SKB_CB wrappers.
      skb->cb is 48B. But it maps to ts_state->cb, which is only 40B.
      
      skb->cb was increased from 40B to 48B after ts_state was introduced,
      in commit 3e3850e9 ("[NETFILTER]: Fix xfrm lookup in
      ip_route_me_harder/ip6_route_me_harder").
      
      Increase ts_state.cb[] to 48B to fit the struct.

      Also add a BUILD_BUG_ON to avoid a repeat.
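
      A minimal sketch of the shape of the change (layout inferred from
      this description, not quoted from the tree):

          /* include/linux/textsearch.h */
          struct ts_state {
              unsigned int offset;
              char cb[48];    /* was 40; must hold struct skb_seq_state */
          };

          /* net/core/skbuff.c, skb_find_text() */
          struct ts_state state;

          BUILD_BUG_ON(sizeof(struct skb_seq_state) > sizeof(state.cb));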
      
      The alternative is to add a direct dependency from textsearch onto
      linux/skbuff.h, but I think the intent is for textsearch to have no
      such dependencies on its callers.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=211911
      Fixes: 97550f6f ("net: compound page support in skb_seq_read")
      Reported-by: Kris Karas <bugs-a17@moonlit-rail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 14 Feb, 2021 · 11 commits
  8. 07 Feb, 2021 · 1 commit
    • net: Introduce {netdev,napi}_alloc_frag_align() · 3f6e687d
      Committed by Kevin Hao
      The current implementation of {netdev,napi}_alloc_frag() provides no
      alignment guarantee for the returned buffer address, but some
      hardware does require the DMA buffer to be aligned correctly. So if
      the buffers allocated by {netdev,napi}_alloc_frag() are used by such
      hardware for DMA, we would have to resort to workarounds like:
          buf = napi_alloc_frag(really_needed_size + align);
          buf = PTR_ALIGN(buf, align);
      
      This code seems ugly and would waste a lot of memory if the buffers
      are used in a network driver for TX/RX. We have added alignment
      support to the page_frag functions, so add the corresponding
      {netdev,napi}_alloc_frag_align() functions.
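
      With the new helpers the allocator takes care of the alignment
      itself; a minimal usage sketch (align is assumed to be a power of
      two, matching the page_frag alignment support mentioned above):

          buf = napi_alloc_frag_align(really_needed_size, align);
          if (unlikely(!buf))
              return NULL;   /* same failure handling as napi_alloc_frag() */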
      Signed-off-by: Kevin Hao <haokexin@gmail.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  9. 03 Feb, 2021 · 1 commit
  10. 23 Jan, 2021 · 1 commit
  11. 20 Jan, 2021 · 1 commit
  12. 17 Jan, 2021 · 2 commits
  13. 15 Jan, 2021 · 1 commit
    • net: avoid 32 x truesize under-estimation for tiny skbs · 3226b158
      Committed by Eric Dumazet
      Both virtio-net and napi_get_frags() allocate skbs with a very small
      skb->head.

      While using page fragments instead of a kmalloc-backed skb->head might
      give a small performance improvement in some cases, there is a huge
      risk of underestimating memory usage.

      For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32
      allocations per page (an order-3 page on x86), or even 64 on PowerPC.

      We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
      but consuming far more memory for TCP buffers than instructed in
      tcp_mem[2].

      Even if we force napi_alloc_skb() to only use order-0 pages, the issue
      would still be there on arches with PAGE_SIZE >= 32768.

      This patch makes sure that small skb heads are kmalloc-backed, so that
      other objects in the slab page can be reused instead of being held as
      long as skbs are sitting in socket queues.
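
      A sketch of the kind of check this implies in __napi_alloc_skb()
      (threshold and flags paraphrased from this description, not quoted
      from the tree):

          /* heads this small no longer get a page-fragment backing */
          if (len <= SKB_WITH_OVERHEAD(1024) ||
              len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
              (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
              skb = __alloc_skb(len, gfp_mask,
                                SKB_ALLOC_RX | SKB_ALLOC_NAPI, NUMA_NO_NODE);
              if (!skb)
                  goto skb_fail;
              goto skb_success;
          }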
      
      Note that we might in the future use the sk_buff napi cache
      instead of going through the more expensive __alloc_skb().

      Another idea would be to use separate page sizes depending
      on the allocated length (to never have more than 4 frags per page).

      I would like to thank Greg Thelen for his precious help on this
      matter; analysing crash dumps is always a time-consuming task.
      
      Fixes: fd11a83d ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  14. 12 Jan, 2021 · 1 commit
    • net: compound page support in skb_seq_read · 97550f6f
      Committed by Willem de Bruijn
      skb_seq_read iterates over an skb, returning pointer and length of
      the next data range with each call.
      
      It relies on kmap_atomic to access highmem pages when needed.
      
      An skb frag may be backed by a compound page, but kmap_atomic maps
      only a single page. There are not enough kmap slots to always map all
      pages concurrently.
      
      Instead, if kmap_atomic is needed, iterate over each page.
      
      As this increases the number of calls, avoid this unless needed.
      The necessary condition is captured in skb_frag_must_loop.
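
      A simplified sketch of the per-page clamping inside skb_seq_read()
      (variable names assumed for illustration, not quoted from the patch):

          if (skb_frag_must_loop(skb_frag_page(frag))) {
              /* kmap_atomic can map only one page at a time, so clamp
               * this chunk to a single page of the compound frag */
              pg_idx = (pg_off + st->frag_off) >> PAGE_SHIFT;
              pg_off = offset_in_page(pg_off + st->frag_off);
              pg_sz = min_t(unsigned int, pg_sz - st->frag_off,
                            PAGE_SIZE - pg_off);
          }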
      
      I tried to make the change as obvious as possible. It should be easy
      to verify that nothing changes if skb_frag_must_loop returns false.
      
      Tested:
        On an x86 platform with
          CONFIG_HIGHMEM=y
          CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y
          CONFIG_NETFILTER_XT_MATCH_STRING=y
      
        Run
          ip link set dev lo mtu 1500
          iptables -A OUTPUT -m string --string 'badstring' --algo bm -j ACCEPT
          dd if=/dev/urandom of=in bs=1M count=20
          nc -l -p 8000 > /dev/null &
          nc -w 1 -q 0 localhost 8000 < in
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  15. 09 Jan, 2021 · 1 commit
    • net: fix use-after-free when UDP GRO with shared fraglist · 53475c5d
      Committed by Dongseok Yi
      skbs in a fraglist could be shared by a BPF filter loaded at TC. If
      TC writes to them, it will call skb_ensure_writable ->
      pskb_expand_head to create a private linear section for the
      head_skb, and then call skb_clone_fraglist -> skb_get on each skb in
      the fraglist.

      skb_segment_list overwrites part of the skb linear section of each
      fragment itself. Even after skb_clone, the frag_skbs share their
      linear section with their clone in PF_PACKET.

      Both the sk_receive_queue of PF_PACKET and that of PF_INET (or
      PF_INET6) can hold a link to the same frag_skbs chain. If a new skb
      (not frags) is queued to one of the sk_receive_queues, multiple
      ptypes can see and release this chain. It causes a use-after-free.
      
      [ 4443.426215] ------------[ cut here ]------------
      [ 4443.426222] refcount_t: underflow; use-after-free.
      [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190
      refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO)
      [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8
      [ 4443.426808] Call trace:
      [ 4443.426813]  refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426823]  skb_release_data+0x144/0x264
      [ 4443.426828]  kfree_skb+0x58/0xc4
      [ 4443.426832]  skb_queue_purge+0x64/0x9c
      [ 4443.426844]  packet_set_ring+0x5f0/0x820
      [ 4443.426849]  packet_setsockopt+0x5a4/0xcd0
      [ 4443.426853]  __sys_setsockopt+0x188/0x278
      [ 4443.426858]  __arm64_sys_setsockopt+0x28/0x38
      [ 4443.426869]  el0_svc_common+0xf0/0x1d0
      [ 4443.426873]  el0_svc_handler+0x74/0x98
      [ 4443.426880]  el0_svc+0x8/0xc
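
      A minimal sketch of the shape of the fix in skb_segment_list()
      (abbreviated; error handling shortened): clone any shared frag skb
      before its linear section is rewritten, so the PF_PACKET clone keeps
      its own copy of the data.

          if (skb_shared(nskb)) {
              tmp = skb_clone(nskb, GFP_ATOMIC);
              if (tmp) {
                  consume_skb(nskb);
                  nskb = tmp;
                  err = skb_unclone(nskb, GFP_ATOMIC);
              } else {
                  err = -ENOMEM;
              }
          }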
      
      Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
      Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/1610072918-174177-1-git-send-email-dseok.yi@samsung.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  16. 08 Jan, 2021 · 11 commits
  17. 15 Dec, 2020 · 1 commit
  18. 04 Dec, 2020 · 1 commit
  19. 02 Dec, 2020 · 1 commit