1. 08 Feb 2023, 1 commit
  2. 13 Dec 2022, 1 commit
  3. 14 Jun 2022, 1 commit
    • skbuff: fix coalescing for page_pool fragment recycling · 3a44f609
      Authored by Jean-Philippe Brucker
      mainline inclusion
      from mainline-v5.18-rc2
      commit 1effe8ca
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I56XHY
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1effe8ca4e34
      
      ----------------------------------------------------------------------
      
      Fix a use-after-free when using page_pool with page fragments. We
      encountered this problem during normal RX in the hns3 driver:
      
      (1) Initially we have three descriptors in the RX queue. The first one
          allocates PAGE1 through page_pool, and the other two allocate one
          half of PAGE2 each. Page references look like this:
      
                      RX_BD1 _______ PAGE1
                      RX_BD2 _______ PAGE2
                      RX_BD3 _________/
      
      (2) Handle RX on the first descriptor. Allocate SKB1, eventually added
          to the receive queue by tcp_queue_rcv().
      
      (3) Handle RX on the second descriptor. Allocate SKB2 and pass it to
          netif_receive_skb():
      
          netif_receive_skb(SKB2)
            ip_rcv(SKB2)
              SKB3 = skb_clone(SKB2)
      
          SKB2 and SKB3 share a reference to PAGE2 through
          skb_shinfo()->dataref. The other ref to PAGE2 is still held by
          RX_BD3:
      
                            SKB2 ---+- PAGE2
                            SKB3 __/   /
                      RX_BD3 _________/
      
       (3b) Now while handling TCP, coalesce SKB3 with SKB1:
      
            tcp_v4_rcv(SKB3)
              tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
              kfree_skb_partial(SKB3)
                skb_release_data(SKB3)                // drops one dataref
      
                            SKB1 _____ PAGE1
                                 \____
                            SKB2 _____ PAGE2
                                       /
                      RX_BD3 _________/
      
          In skb_try_coalesce(), __skb_frag_ref() takes a page reference to
          PAGE2, where it should instead have increased the page_pool frag
          reference, pp_frag_count. Without coalescing, when releasing both
          SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
          when releasing SKB1 and SKB2, two references to PAGE2 will be
          dropped, resulting in underflow.
      
       (3c) Drop SKB2:
      
            af_packet_rcv(SKB2)
              consume_skb(SKB2)
                skb_release_data(SKB2)                // drops second dataref
                  page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count
      
                            SKB1 _____ PAGE1
                                 \____
                                       PAGE2
                                       /
                      RX_BD3 _________/
      
      (4) Userspace calls recvmsg()
          Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
          release the SKB3 page as well:
      
          tcp_eat_recv_skb(SKB1)
            skb_release_data(SKB1)
              page_pool_return_skb_page(PAGE1)
              page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count
      
      (5) PAGE2 is freed, but the third RX descriptor was still using it!
          In our case this causes IOMMU faults, but it would silently corrupt
          memory if the IOMMU was disabled.
      
      Change the logic that checks whether pp_recycle SKBs can be coalesced.
      We still reject differing pp_recycle between 'from' and 'to' SKBs, but
      in order to avoid the situation described above, we also reject
      coalescing when both 'from' and 'to' are pp_recycled and 'from' is
      cloned.
      
      The new logic allows coalescing a cloned pp_recycle SKB into a page
      refcounted one, because in this case the release (4) will drop the right
      reference, the one taken by skb_try_coalesce().
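
      A minimal sketch of the resulting check in skb_try_coalesce()
      (net/core/skbuff.c), reconstructed from the description above; treat
      the comment wording and exact placement as illustrative rather than
      the verbatim hunk:

          /* Avoid mixing slab-allocated and page_pool-allocated pages in
           * one skb. Additionally, reject a cloned pp_recycle 'from' when
           * 'to' is also pp_recycle, since the clone's frags may hold
           * pp_frag_count references that __skb_frag_ref() cannot take.
           * A cloned pp_recycle 'from' may still be coalesced into a page
           * refcounted 'to': the final release then drops the page
           * reference taken here.
           */
          if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
              return false;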
      
      Fixes: 53e0961d ("page_pool: add frag page recycling support in page pool")
      Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiantao Xiao <xiaojiantao1@h-partners.com>
      Reviewed-by: Jian Shen <shenjian15@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  4. 28 May 2022, 2 commits
  5. 27 Jan 2022, 1 commit
  6. 14 Jan 2022, 1 commit
  7. 11 Nov 2021, 4 commits
  8. 15 Oct 2021, 3 commits
  9. 09 Apr 2021, 1 commit
  10. 08 Feb 2021, 1 commit
  11. 29 Jan 2021, 2 commits
    • net: avoid 32 x truesize under-estimation for tiny skbs · 2eaab298
      Authored by Eric Dumazet
      stable inclusion
      from stable-5.10.10
      commit 024158d3b5715e830bf4b51c4452132937c8f1e8
      bugzilla: 47610
      
      --------------------------------
      
      [ Upstream commit 3226b158 ]
      
      Both virtio net and napi_get_frags() allocate skbs with a very small
      skb->head.
      
      While using page fragments instead of a kmalloc-backed skb->head might
      give a small performance improvement in some cases, there is a huge
      risk of underestimating memory usage.
      
      For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32
      allocations per page (an order-3 page on x86), or even 64 on PowerPC.
      
      We have been tracking OOM issues on GKE hosts hitting tcp_mem limits,
      while consuming far more memory for TCP buffers than instructed by
      tcp_mem[2].
      
      Even if we force napi_alloc_skb() to use only order-0 pages, the
      issue would still be there on arches with PAGE_SIZE >= 32768.
      
      This patch makes sure that small skb heads are kmalloc-backed, so that
      other objects in the slab page can be reused instead of being held for
      as long as skbs are sitting in socket queues.
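
      A sketch of the shape of the fix in __napi_alloc_skb()
      (net/core/skbuff.c); the 1024-byte cutoff follows the upstream patch,
      but treat the exact thresholds as illustrative:

          /* Route small (and overly large) head allocations through
           * kmalloc via __alloc_skb() instead of the page-fragment
           * allocator, so slab objects can be reused even while skbs
           * sit in socket queues.
           */
          if (len <= SKB_WITH_OVERHEAD(1024) ||
              len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
              (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
              skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
              if (!skb)
                  goto skb_fail;
              goto skb_success;
          }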
      
      Note that we might in the future use the sk_buff napi cache,
      instead of going through a more expensive __alloc_skb()
      
      Another idea would be to use separate page sizes depending
      on the allocated length (to never have more than 4 frags per page)
      
      I would like to thank Greg Thelen for his precious help on this matter,
      analysing crash dumps is always a time consuming task.
      
      Fixes: fd11a83d ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
    • net: fix use-after-free when UDP GRO with shared fraglist · da767576
      Authored by Dongseok Yi
      stable inclusion
      from stable-5.10.10
      commit 24cd3317418955fd9489e4e7e9c2b643691b270b
      bugzilla: 47610
      
      --------------------------------
      
      [ Upstream commit 53475c5d ]
      
      skbs in the fraglist can be shared by a BPF filter loaded at TC. If
      TC writes to them, it calls skb_ensure_writable -> pskb_expand_head
      to create a private linear section for the head_skb, and then calls
      skb_clone_fraglist -> skb_get on each skb in the fraglist.
      
      skb_segment_list overwrites part of the skb linear section of each
      fragment itself. Even after skb_clone, the frag_skbs share their
      linear section with their clone in PF_PACKET.
      
      The sk_receive_queue of both PF_PACKET and PF_INET (or PF_INET6) can
      hold a link to the same frag_skbs chain. If a new skb (not frags) is
      queued to one of these sk_receive_queues, multiple ptypes can see and
      release the same frag_skbs, causing a use-after-free.
      
      [ 4443.426215] ------------[ cut here ]------------
      [ 4443.426222] refcount_t: underflow; use-after-free.
      [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190
      refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO)
      [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8
      [ 4443.426808] Call trace:
      [ 4443.426813]  refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426823]  skb_release_data+0x144/0x264
      [ 4443.426828]  kfree_skb+0x58/0xc4
      [ 4443.426832]  skb_queue_purge+0x64/0x9c
      [ 4443.426844]  packet_set_ring+0x5f0/0x820
      [ 4443.426849]  packet_setsockopt+0x5a4/0xcd0
      [ 4443.426853]  __sys_setsockopt+0x188/0x278
      [ 4443.426858]  __arm64_sys_setsockopt+0x28/0x38
      [ 4443.426869]  el0_svc_common+0xf0/0x1d0
      [ 4443.426873]  el0_svc_handler+0x74/0x98
      [ 4443.426880]  el0_svc+0x8/0xc
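
      A sketch of the fix in skb_segment_list() (net/core/skbuff.c),
      reconstructed from the description above; error unwinding is elided
      and the exact control flow may differ from the final hunk:

          /* If a frag_skb is shared, switch to a private, uncloned copy
           * before the linear section is overwritten, so that freeing
           * the segment list cannot drop references still held by other
           * ptypes (e.g. PF_PACKET).
           */
          err = 0;
          if (skb_shared(nskb)) {
              tmp = skb_clone(nskb, GFP_ATOMIC);
              if (tmp) {
                  consume_skb(nskb);
                  nskb = tmp;
                  err = skb_unclone(nskb, GFP_ATOMIC);
              } else {
                  err = -ENOMEM;
              }
          }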
      
      Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
      Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/1610072918-174177-1-git-send-email-dseok.yi@samsung.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
  12. 27 Jan 2021, 1 commit
  13. 04 Dec 2020, 1 commit
  14. 28 Nov 2020, 1 commit
  15. 06 Oct 2020, 1 commit
  16. 05 Oct 2020, 1 commit
  17. 04 Oct 2020, 1 commit
    • net/sched: act_vlan: Add {POP,PUSH}_ETH actions · 19fbcb36
      Authored by Guillaume Nault
      Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to
      respectively pop and push a base Ethernet header at the beginning of a
      frame.
      
      POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any,
      must be stripped before calling POP_ETH.
      
      PUSH_ETH is restricted to skbs with no mac_header, and only the MAC
      addresses can be configured. The Ethertype is automatically set from
      skb->protocol. These restrictions ensure that all the skb's fields
      remain consistent, so that this action can't confuse other parts of
      the networking stack (like GSO).
      
      Since openvswitch already had these actions, consolidate the code in
      skbuff.c (like for vlan and mpls push/pop).
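
      A minimal sketch of the push helper consolidated into skbuff.c,
      assuming the upstream names skb_eth_push()/skb_eth_pop(); the body
      below illustrates the restrictions described above and is not the
      verbatim implementation:

          int skb_eth_push(struct sk_buff *skb, const unsigned char *dst,
                           const unsigned char *src)
          {
              struct ethhdr *eth;
              int err;

              /* Only skbs with no mac_header and no VLAN tag may get a
               * new Ethernet header, so all fields stay consistent.
               */
              if (skb_network_offset(skb) || skb_vlan_tag_present(skb))
                  return -EPROTO;

              err = skb_cow_head(skb, sizeof(*eth));
              if (err < 0)
                  return err;

              skb_push(skb, sizeof(*eth));
              skb_reset_mac_header(skb);
              skb_reset_mac_len(skb);

              eth = eth_hdr(skb);
              ether_addr_copy(eth->h_dest, dst);
              ether_addr_copy(eth->h_source, src);
              /* Ethertype comes from skb->protocol, never from the user. */
              eth->h_proto = skb->protocol;

              skb_postpush_rcsum(skb, eth, sizeof(*eth));
              return 0;
          }
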
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 21 Sep 2020, 1 commit
  19. 25 Aug 2020, 1 commit
  20. 21 Aug 2020, 2 commits
  21. 19 Aug 2020, 2 commits
  22. 17 Aug 2020, 1 commit
  23. 09 Aug 2020, 1 commit
  24. 04 Aug 2020, 2 commits
  25. 01 Aug 2020, 1 commit
  26. 20 May 2020, 1 commit
  27. 18 May 2020, 1 commit
  28. 02 May 2020, 1 commit
  29. 31 Mar 2020, 1 commit
  30. 24 Mar 2020, 1 commit
    • net: Make skb_segment not to compute checksum if network controller supports checksumming · 1454c9fa
      Authored by Yadu Kishore
      Problem:
      TCP checksum in the output path is not being offloaded during GSO
      in the following case:
      The network driver does not support scatter-gather but supports
      checksum offload with NETIF_F_HW_CSUM.
      
      Cause:
      skb_segment calls skb_copy_and_csum_bits if the network driver
      does not announce NETIF_F_SG. It does not check if the driver
      supports NETIF_F_HW_CSUM.
      So for devices that want to offload the checksum but do not support
      SG, there is currently no way to do so when GSO is enabled.
      
      Solution:
      In skb_segment, check whether the network controller does the
      checksumming, and if so call skb_copy_bits instead of
      skb_copy_and_csum_bits.
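
      A sketch of the corresponding change in the non-SG copy path of
      skb_segment() (net/core/skbuff.c); 'csum' stands for the existing
      can_checksum_protocol() result already computed in that function,
      and the exact hunk may differ (csum_start bookkeeping elided):

          if (!sg) {
              if (!csum) {
                  /* Device cannot checksum: compute it while copying. */
                  if (!nskb->remcsum_offload)
                      nskb->ip_summed = CHECKSUM_NONE;
                  SKB_GSO_CB(nskb)->csum =
                      skb_copy_and_csum_bits(head_skb, offset,
                                             skb_put(nskb, len), len, 0);
              } else {
                  /* Device checksums (e.g. NETIF_F_HW_CSUM): plain copy,
                   * leaving ip_summed as CHECKSUM_PARTIAL for offload.
                   */
                  skb_copy_bits(head_skb, offset, skb_put(nskb, len), len);
              }
              continue;
          }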
      
      Testing:
      Without the patch, ran iperf TCP traffic with NETIF_F_HW_CSUM enabled
      in the network driver and observed that TCP checksum offload was not
      happening, since the skbs received by the driver in the output path
      had skb->ip_summed set to CHECKSUM_NONE.
      
      With the patch, ran iperf TCP traffic and observed that the TCP
      checksum is offloaded, with skb->ip_summed set to CHECKSUM_PARTIAL.
      Also tested with the patch by disabling NETIF_F_HW_CSUM in the driver
      to cover the newly introduced if-else code path in skb_segment.
      
      Link: https://lore.kernel.org/netdev/CA+FuTSeYGYr3Umij+Mezk9CUcaxYwqEe5sPSuXF8jPE2yMFJAw@mail.gmail.com
      Signed-off-by: Yadu Kishore <kyk.segfault@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>