1. 14 Oct, 2019 1 commit
  2. 02 Oct, 2019 1 commit
    • netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Committed by Florian Westphal
      commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
      recycle always drop skb extensions.  The additional skb_ext_del() that is
      performed via nf_reset on napi skb recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
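
      A minimal sketch of the renamed helper after this change (illustrative;
      only the conntrack reset remains, the extension handling is gone):

          static inline void nf_reset_ct(struct sk_buff *skb)
          {
          #if IS_ENABLED(CONFIG_NF_CONNTRACK)
                  nf_conntrack_put(skb_nfct(skb));  /* drop conntrack reference */
                  skb->_nfct = 0;
          #endif
                  /* no skb_ext_del(skb, SKB_EXT_BRIDGE_NF) anymore */
          }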
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 28 Sep, 2019 1 commit
    • sk_buff: drop all skb extensions on free and skb scrubbing · 174e2381
      Committed by Florian Westphal
      Now that we have a 3rd extension, add a new helper that drops the
      extension space and use it when we need to scrub an sk_buff.
      
      At this time, scrubbing clears secpath and bridge netfilter data, but
      retains the tc skb extension; after this patch, all three get cleared.
      
      NAPI reuse/free assumes we can only have a secpath attached to skb, but
      it seems better to clear all extensions there as well.
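
      A sketch of the new helper, assuming the naming used in this series
      (the unlikely hint is the v2 change noted below):

          static inline void skb_ext_reset(struct sk_buff *skb)
          {
                  if (unlikely(skb->active_extensions)) {
                          __skb_ext_put(skb->extensions);
                          skb->active_extensions = 0;
                  }
          }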
      
      v2: add unlikely hint (Eric Dumazet)
      
      Fixes: 95a7233c ("net: openvswitch: Set OvS recirc_id from tc chain index")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 07 Sep, 2019 1 commit
    • net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list · 3dcbdb13
      Committed by Shmulik Ladkani
      Historically, support for frag_list packets entering skb_segment() was
      limited to frag_list members terminating on exact same gso_size
      boundaries. This is verified with a BUG_ON since commit 89319d38
      ("net: Add frag_list support to skb_segment"), quote:
      
          As such we require all frag_list members terminate on exact MSS
          boundaries.  This is checked using BUG_ON.
          As there should only be one producer in the kernel of such packets,
          namely GRO, this requirement should not be difficult to maintain.
      
      However, since commit 6578171a ("bpf: add bpf_skb_change_proto helper"),
      the "exact MSS boundaries" assumption no longer holds:
      An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
      leaves the frag_list members as originally merged by GRO with the
      original 'gso_size'. Example of such programs are bpf-based NAT46 or
      NAT64.
      
      This led to a kernel BUG_ON for flows involving:
       - GRO generating a frag_list skb
       - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
       - skb_segment() of the skb
      
      See example BUG_ON reports in [0].
      
      In commit 13acc94e ("net: permit skb_segment on head_frag frag_list skb"),
      skb_segment() was modified to support the "gso_size mangling" case of
      a frag_list GRO'ed skb, but *only* for frag_list members having
      head_frag==true (having a page-fragment head).
      
      Alas, GRO packets having frag_list members with a linear kmalloced head
      (head_frag==false) still hit the BUG_ON.
      
      This commit adds support to skb_segment() for a 'head_skb' packet having
      a frag_list whose members are *non* head_frag, with gso_size mangled, by
      disabling SG and thus falling back to copying the data from the given
      'head_skb' into the generated segmented skbs - as suggested by Willem de
      Bruijn [1].
      
      Since this approach involves the penalty of skb_copy_and_csum_bits()
      when building the segments, care was taken to enable this
      solution only when required:
       - untrusted gso_size, by testing SKB_GSO_DODGY is set
         (SKB_GSO_DODGY is set by any gso_size mangling functions in
          net/core/filter.c)
       - the frag_list is non-empty, its item is a non-head_frag, *and* the
         headlen of the given 'head_skb' does not match the gso_size.
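
      A sketch of how that check could look inside skb_segment() (simplified;
      variable names only approximate net/core/skbuff.c):

          /* untrusted gso_size plus a linear-headed frag_list member whose
           * head length doesn't line up: fall back to copying */
          if ((skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY) &&
              list_skb && !list_skb->head_frag && skb_headlen(list_skb) &&
              skb_headlen(head_skb) != mss)
                  features &= ~NETIF_F_SG;  /* forces the copy path */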
      
      [0]
      https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
      https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/
      
      [1]
      https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/
      
      Fixes: 6578171a ("bpf: add bpf_skb_change_proto helper")
      Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 06 Sep, 2019 1 commit
    • net: openvswitch: Set OvS recirc_id from tc chain index · 95a7233c
      Committed by Paul Blakey
      Offloaded OvS datapath rules are translated one to one to tc rules,
      for example the following simplified OvS rule:
      
      recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
      
      Will be translated to the following tc rule:
      
      $ tc filter add dev dev1 ingress \
      	    prio 1 chain 0 proto ip \
      		flower tcp ct_state -trk \
      		action ct pipe \
      		action goto chain 2
      
      Received packets will first travel through tc, and if they aren't stolen
      by it, like in the above rule, they will continue to OvS datapath.
      Since we already did some actions (action ct in this case) which might
      modify the packets, and updated action stats, we would like to continue
      the processing with the correct recirc_id in OvS (here recirc_id(2))
      where we left off.
      
      To support this, introduce a new skb extension for tc, which
      will be used for translating tc chain to ovs recirc_id to
      handle these miss cases. The last tc chain index will be set
      by the tc goto chain action and read by the OvS datapath.
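
      A sketch of the hand-off through the new extension (simplified; the
      OvS key assignment is illustrative):

          struct tc_skb_ext *ext;

          /* tc side: the goto_chain action records the last chain */
          ext = skb_ext_add(skb, TC_SKB_EXT);
          if (ext)
                  ext->chain = chain_index;

          /* OvS datapath side: recover the recirc_id on a miss */
          ext = skb_ext_find(skb, TC_SKB_EXT);
          if (ext)
                  key->recirc_id = ext->chain;
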
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 31 Jul, 2019 1 commit
  7. 23 Jul, 2019 2 commits
  8. 17 Jul, 2019 1 commit
  9. 09 Jul, 2019 5 commits
  10. 10 Jun, 2019 2 commits
    • net: Don't disable interrupts in __netdev_alloc_skb() · 92dcabd7
      Committed by Sebastian Andrzej Siewior
      __netdev_alloc_skb() can be used from any context and is used by NAPI
      and non-NAPI drivers. Non-NAPI drivers use it in interrupt context and
      NAPI drivers use it during initial allocation (->ndo_open() or
      ->ndo_change_mtu()). Some NAPI drivers share the same function for the
      initial allocation and the allocation in their NAPI callback.
      
      The interrupts are disabled in order to ensure locked access from every
      context to `netdev_alloc_cache'.
      
      Let __netdev_alloc_skb() check if interrupts are disabled. If they are, use
      `netdev_alloc_cache'. Otherwise disable BH and use `napi_alloc_cache.page'.
      The IRQ check is cheaper compared to disabling & enabling interrupts and
      memory allocation with disabled interrupts does not work on -RT.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Don't disable interrupts in napi_alloc_frag() · 7ba7aeab
      Committed by Sebastian Andrzej Siewior
      netdev_alloc_frag() can be used from any context and is used by NAPI
      and non-NAPI drivers. Non-NAPI drivers use it in interrupt context
      and NAPI drivers use it during initial allocation (->ndo_open() or
      ->ndo_change_mtu()). Some NAPI drivers share the same function for the
      initial allocation and the allocation in their NAPI callback.
      
      The interrupts are disabled in order to ensure locked access from every
      context to `netdev_alloc_cache'.
      
      Let netdev_alloc_frag() check if interrupts are disabled. If they are,
      use `netdev_alloc_cache' otherwise disable BH and invoke
      __napi_alloc_frag() for the allocation. The IRQ check is cheaper
      compared to disabling & enabling interrupts and memory allocation with
      disabled interrupts does not work on -RT.
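
      The same pattern for the frag allocator, sketched (assuming a
      __netdev_alloc_frag() helper backed by `netdev_alloc_cache'):

          void *netdev_alloc_frag(unsigned int fragsz)
          {
                  void *data;

                  if (in_irq() || irqs_disabled())
                          return __netdev_alloc_frag(fragsz, GFP_ATOMIC);

                  local_bh_disable();
                  data = __napi_alloc_frag(fragsz, GFP_ATOMIC);
                  local_bh_enable();
                  return data;
          }
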
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 05 Jun, 2019 1 commit
  12. 31 May, 2019 3 commits
  13. 25 May, 2019 1 commit
    • bpf: sockmap, fix use after free from sleep in psock backlog workqueue · bd95e678
      Committed by John Fastabend
      Backlog work for psock (sk_psock_backlog) might sleep while waiting
      for memory to free up when sending packets. However, while sleeping
      the socket may be closed and removed from the map by the user space
      side.
      
      This breaks an assumption in sk_stream_wait_memory, which expects the
      wait queue to still be there when it wakes up, resulting in the
      use-after-free shown below. To fix this, mark sendmsg as MSG_DONTWAIT
      to avoid the sleep altogether. We already set the flag for the
      sendpage case but missed the case where sendmsg is used.
      Sockmap is currently the only user of skb_send_sock_locked() so only
      the sockmap paths should be impacted.
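
      A sketch of the fix in skb_send_sock_locked() (simplified):

          /* send non-blocking so the backlog worker never sleeps in
           * sk_stream_wait_memory() while the socket can be torn down */
          struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
          struct kvec kv = { .iov_base = skb->data + offset, .iov_len = slen };

          ret = kernel_sendmsg_locked(sk, &msg, &kv, 1, slen);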
      
      ==================================================================
      BUG: KASAN: use-after-free in remove_wait_queue+0x31/0x70
      Write of size 8 at addr ffff888069a0c4e8 by task kworker/0:2/110
      
      CPU: 0 PID: 110 Comm: kworker/0:2 Not tainted 5.0.0-rc2-00335-g28f9d1a3-dirty #14
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
      Workqueue: events sk_psock_backlog
      Call Trace:
       print_address_description+0x6e/0x2b0
       ? remove_wait_queue+0x31/0x70
       kasan_report+0xfd/0x177
       ? remove_wait_queue+0x31/0x70
       ? remove_wait_queue+0x31/0x70
       remove_wait_queue+0x31/0x70
       sk_stream_wait_memory+0x4dd/0x5f0
       ? sk_stream_wait_close+0x1b0/0x1b0
       ? wait_woken+0xc0/0xc0
       ? tcp_current_mss+0xc5/0x110
       tcp_sendmsg_locked+0x634/0x15d0
       ? tcp_set_state+0x2e0/0x2e0
       ? __kasan_slab_free+0x1d1/0x230
       ? kmem_cache_free+0x70/0x140
       ? sk_psock_backlog+0x40c/0x4b0
       ? process_one_work+0x40b/0x660
       ? worker_thread+0x82/0x680
       ? kthread+0x1b9/0x1e0
       ? ret_from_fork+0x1f/0x30
       ? check_preempt_curr+0xaf/0x130
       ? iov_iter_kvec+0x5f/0x70
       ? kernel_sendmsg_locked+0xa0/0xe0
       skb_send_sock_locked+0x273/0x3c0
       ? skb_splice_bits+0x180/0x180
       ? start_thread+0xe0/0xe0
       ? update_min_vruntime.constprop.27+0x88/0xc0
       sk_psock_backlog+0xb3/0x4b0
       ? strscpy+0xbf/0x1e0
       process_one_work+0x40b/0x660
       worker_thread+0x82/0x680
       ? process_one_work+0x660/0x660
       kthread+0x1b9/0x1e0
       ? __kthread_create_on_node+0x250/0x250
       ret_from_fork+0x1f/0x30
      
      Fixes: 20bf50de ("skbuff: Function to send an skbuf on a socket")
      Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
      Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  14. 18 Apr, 2019 1 commit
  15. 17 Apr, 2019 1 commit
  16. 04 Apr, 2019 1 commit
    • net-gro: Fix GRO flush when receiving a GSO packet. · 0ab03f35
      Committed by Steffen Klassert
      Currently we may incorrectly merge a received GSO packet,
      or a packet with a frag_list, into a packet sitting in the
      gro_hash list. skb_segment() may then crash because
      the assumptions about the skb layout are not met.
      The correct behaviour would be to flush the packet in the
      gro_hash list and send the received GSO packet directly
      afterwards. Commit d61d072e ("net-gro: avoid reorders")
      sets NAPI_GRO_CB(skb)->flush in this case, but this is not
      checked before merging. This patch makes sure to check this
      flag and to not merge in that case.
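
      A sketch of the added check in skb_gro_receive() (simplified):

          /* refuse the merge when an earlier flush decision was recorded */
          if (unlikely(p->len + len >= 65536 || NAPI_GRO_CB(skb)->flush))
                  return -E2BIG;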
      
      Fixes: d61d072e ("net-gro: avoid reorders")
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 28 Mar, 2019 1 commit
  18. 18 Feb, 2019 1 commit
  19. 05 Jan, 2019 1 commit
    • net, skbuff: do not prefer skb allocation fails early · f8c468e8
      Committed by David Rientjes
      Commit dcda9b04 ("mm, tree wide: replace __GFP_REPEAT by
      __GFP_RETRY_MAYFAIL with more useful semantic") replaced __GFP_REPEAT in
      alloc_skb_with_frags() with __GFP_RETRY_MAYFAIL when the allocation may
      directly reclaim.
      
      The previous behavior would require reclaim up to 1 << order pages for
      skb aligned header_len of order > PAGE_ALLOC_COSTLY_ORDER before failing,
      otherwise the allocations in alloc_skb() would loop in the page allocator
      looking for memory.  __GFP_RETRY_MAYFAIL allows both allocations to fail
      under memory pressure, including the HEAD allocation.
      
      This can cause, among many other things, write() to fail with ENOTCONN
      during RPC when under memory pressure.
      
      These allocations should succeed as they did previous to dcda9b04
      even if it requires calling the oom killer and additional looping in the
      page allocator to find memory.  There is no way to specify the previous
      behavior of __GFP_REPEAT, but it's unlikely to be necessary since the
      previous behavior only guaranteed that 1 << order pages would be reclaimed
      before failing for order > PAGE_ALLOC_COSTLY_ORDER.  That reclaim is not
      guaranteed to be contiguous memory, so repeating for such large orders is
      usually not beneficial.
      
      Remove the setting of __GFP_RETRY_MAYFAIL to restore the previous
      behavior: specifically, do not allow alloc_skb() to fail for small orders,
      and oom kill if necessary rather than allowing RPCs to fail.
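
      A sketch of the removed logic in alloc_skb_with_frags() (illustrative):

          gfp_t gfp_head = gfp_mask;

          /* removed by this patch: tagging the head allocation MAYFAIL
           * let it fail under pressure instead of looping/oom-killing */
          if (gfp_head & __GFP_DIRECT_RECLAIM)
                  gfp_head |= __GFP_RETRY_MAYFAIL;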
      
      Fixes: dcda9b04 ("mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic")
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 22 Dec, 2018 2 commits
  21. 20 Dec, 2018 3 commits
    • net: switch secpath to use skb extension infrastructure · 4165079b
      Committed by Florian Westphal
      Remove skb->sp and allocate secpath storage via extension
      infrastructure.  This also reduces sk_buff by 8 bytes on x86_64.
      
      Total size of allyesconfig kernel is reduced slightly, as there is
      less inlined code (one conditional atomic op instead of two on
      skb_clone).
      
      No differences in throughput in following ipsec performance tests:
      - transport mode with aes on 10GB link
      - tunnel mode between two network namespaces with aes and null cipher
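
      With the conversion, the secpath is reached through the extension API;
      a sketch of the accessor (assuming the SKB_EXT_SEC_PATH id from this
      series):

          static inline struct sec_path *skb_sec_path(const struct sk_buff *skb)
          {
          #ifdef CONFIG_XFRM
                  return skb_ext_find(skb, SKB_EXT_SEC_PATH);
          #else
                  return NULL;
          #endif
          }
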
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: convert bridge_nf to use skb extension infrastructure · de8bda1d
      Committed by Florian Westphal
      This converts the bridge netfilter (calling iptables hooks from bridge)
      facility to use the extension infrastructure.
      
      The bridge_nf specific hooks in the skb clone and free paths are removed;
      they have been replaced by the skb_ext hooks, which do the same as the
      bridge nf allocation hooks did.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sk_buff: add skb extension infrastructure · df5042f4
      Committed by Florian Westphal
      This adds an optional extension infrastructure, with ipsec (xfrm) and
      bridge netfilter as first users.
      objdiff shows no changes if kernel is built without xfrm and br_netfilter
      support.
      
      The third (planned future) user is Multipath TCP which is still
      out-of-tree.
      MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
      numbers used by individual subflows.
      
      This DSS mapping is read/written from tcp option space on receive and
      written to tcp option space on transmitted tcp packets that are part of
      an MPTCP connection.
      
      Extending skb_shared_info or adding a private data field to skb fclones
      doesn't work for incoming skb, so a different DSS propagation method would
      be required for the receive side.
      
      mptcp has the same requirements as secpath/bridge netfilter:
      
      1. extension memory is released when the sk_buff is free'd.
      2. data is shared after cloning an skb (clone inherits extension)
      3. adding extension to an skb will COW the extension buffer if needed.
      
      The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the
      mapping for tx and rx processing.
      
      Two new members are added to sk_buff:
      1. 'active_extensions' byte (filling a hole), telling which extensions
         are available for this skb.
         This has two purposes.
         a) avoids the need to initialize the pointer.
         b) allows "deleting" an extension by clearing its bit
         value in ->active_extensions.
      
         While it would be possible to store the active_extensions byte
         in the extension struct instead of sk_buff, there is one problem
         with this:
          When an extension has to be disabled, we can always clear the
          bit in skb->active_extensions.  But in case it would be stored in the
          extension buffer itself, we might have to COW it first, if
          we are dealing with a cloned skb.  On kmalloc failure we would
          be unable to turn an extension off.
      
      2. extension pointer, located at the end of the sk_buff.
         If the active_extensions byte is 0, the pointer is undefined,
         it is not initialized on skb allocation.
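
      A sketch of the layout and lookup implied by the two members above
      (simplified; offsets are kept in 8-byte chunks):

          struct skb_ext {
                  refcount_t refcnt;
                  u8 offset[SKB_EXT_NUM]; /* per extension, in 8-byte chunks */
                  u8 chunks;              /* total size, same unit */
                  char data[] __aligned(8);
          };

          static inline void *skb_ext_find(const struct sk_buff *skb,
                                           enum skb_ext_id id)
          {
                  if (skb->active_extensions & (1 << id)) {
                          struct skb_ext *ext = skb->extensions;

                          return (void *)ext + (ext->offset[id] << 3);
                  }
                  return NULL;
          }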
      
      This adds extra code to skb clone and free paths (to deal with
      refcount/free of extension area) but this replaces similar code that
      manages skb->nf_bridge and skb->sp structs in the followup patches of
      the series.
      
      It is possible to add support for extensions that are not preserved on
      clones/copies.
      
      To do this, it would be needed to define a bitmask of all extensions that
      need copy/cow semantics, and change __skb_ext_copy() to check
      ->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set
      ->active_extensions to 0 on the new clone.
      
      This isn't done here because all extensions that get added here
      need the copy/cow semantics.
      
      v2:
      Allocate entire extension space using kmem_cache.
      Upside is that this allows better tracking of used memory,
      downside is that we will allocate more space than strictly needed in
      most cases (it's unlikely that all extensions are active/needed at the
      same time for the same skb).
      The allocated memory (except the small extension header) is not cleared,
      so there is no additional overhead aside from memory usage.
      
      Avoid atomic_dec_and_test operation on skb_ext_put()
      by using similar trick as kfree_skbmem() does with fclone_ref:
      If the refcount is 1, there is no concurrent user and we can free right away.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 05 Dec, 2018 1 commit
    • skbuff: Rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark' · 875e8939
      Committed by Ido Schimmel
      Commit abf4bb6b ("skbuff: Add the offload_mr_fwd_mark field") added
      the 'offload_mr_fwd_mark' field to indicate that a packet has already
      undergone L3 multicast routing by a capable device. The field is used to
      prevent the kernel from forwarding a packet through a netdev through
      which the device has already forwarded the packet.
      
      Currently, no unicast packet is routed by both the device and the
      kernel, but this is about to change by subsequent patches and we need to
      be able to mark such packets, so that they will not be forwarded twice.
      
      Instead of adding yet another field to 'struct sk_buff', we can just
      rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark', as a packet
      either has a multicast or a unicast destination IP.
      
      While at it, add a comment about both 'offload_fwd_mark' and
      'offload_l3_fwd_mark'.
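
      A sketch of the renamed bit next to its L2 counterpart (comments per
      the description above):

          __u8 offload_fwd_mark:1;    /* already forwarded (L2) by HW */
          __u8 offload_l3_fwd_mark:1; /* already routed (L3) by HW */
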
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 04 Dec, 2018 2 commits
    • udp: elide zerocopy operation in hot path · 52900d22
      Committed by Willem de Bruijn
      With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
      Release of its last reference triggers a completion notification.
      
      The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
      the skbs, because it can build, send and free skbs within its loop,
      possibly reaching refcount zero and freeing the ubuf_info too soon.
      
      The UDP stack currently also takes this extra ref, but does not need
      it as all skbs are sent after return from __ip(6)_append_data.
      
      Avoid the extra refcount_inc and refcount_dec_and_test, and generally
      the sock_zerocopy_put in the common path, by passing the initial
      reference to the first skb.
      
      This approach is taken instead of initializing the refcount to 0, as
      that would generate error "refcount_t: increment on 0" on the
      next skb_zcopy_set.
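
      A sketch of the reference hand-off (simplified; the have_ref flag
      consumes the caller's initial reference exactly once):

          static inline void skb_zcopy_set(struct sk_buff *skb,
                                           struct ubuf_info *uarg,
                                           bool *have_ref)
          {
                  if (skb && uarg && !skb_zcopy(skb)) {
                          if (unlikely(have_ref && *have_ref))
                                  *have_ref = false; /* first skb takes it */
                          else
                                  sock_zerocopy_get(uarg);
                          skb_shinfo(skb)->destructor_arg = uarg;
                          skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
                  }
          }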
      
      Changes
        v3 -> v4
          - Move skb_zcopy_set below the only kfree_skb that might cause
            a premature uarg destroy before skb_zerocopy_put_abort
            - Move the entire skb_shinfo assignment block, to keep that
              cacheline access in one place
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: msg_zerocopy · b5947e5d
      Committed by Willem de Bruijn
      Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and
      interpret flag MSG_ZEROCOPY.
      
      This patch was previously part of the zerocopy RFC patchsets. Zerocopy
      is not effective at small MTU. With segmentation offload building
      larger datagrams, the benefit of page flipping outweighs the cost of
      generating a completion notification.
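
      A minimal userspace sketch of the new UDP path (error handling and
      msghdr setup omitted):

          int one = 1;

          setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
          send(fd, buf, len, MSG_ZEROCOPY);
          /* the completion notification arrives on the error queue */
          recvmsg(fd, &msg, MSG_ERRQUEUE);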
      
      tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on
      test patch and making skb_orphan_frags_rx same as skb_orphan_frags:
      
          ipv4 udp -t 1
          tx=191312 (11938 MB) txc=0 zc=n
          rx=191312 (11938 MB)
          ipv4 udp -z -t 1
          tx=304507 (19002 MB) txc=304507 zc=y
          rx=304507 (19002 MB)
          ok
          ipv6 udp -t 1
          tx=174485 (10888 MB) txc=0 zc=n
          rx=174485 (10888 MB)
          ipv6 udp -z -t 1
          tx=294801 (18396 MB) txc=294801 zc=y
          rx=294801 (18396 MB)
          ok
      
      Changes
        v1 -> v2
          - Fixup reverse christmas tree violation
        v2 -> v3
          - Split refcount avoidance optimization into separate patch
            - Fix refcount leak on error in fragmented case
              (thanks to Paolo Abeni for pointing this one out!)
            - Fix refcount inc on zero
            - Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data.
              This is needed since commit 5cf4a853 ("tcp: really ignore
      	MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 30 Nov, 2018 1 commit
  25. 26 Nov, 2018 1 commit
    • net: remove unsafe skb_insert() · 4bffc669
      Committed by Eric Dumazet
      I do not see how one can effectively use skb_insert() without holding
      some kind of lock. Otherwise other cpus could have changed the list
      right before we have a chance of acquiring list->lock.
      
      Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this
      one probably meant to use __skb_insert() since it appears nesqp->pau_list
      is protected by nesqp->pau_lock. This looks like nesqp->pau_lock
      could be removed, since nesqp->pau_list.lock could be used instead.
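
      The safe pattern the driver presumably wanted, sketched:

          /* hold the list lock yourself and use the unlocked variant */
          spin_lock_irqsave(&list->lock, flags);
          __skb_insert(newsk, old->prev, old, list);
          spin_unlock_irqrestore(&list->lock, flags);
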
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Faisal Latif <faisal.latif@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: linux-rdma <linux-rdma@vger.kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 22 Nov, 2018 1 commit
    • net: skb_scrub_packet(): Scrub offload_fwd_mark · b5dd186d
      Committed by Petr Machata
      When a packet is trapped and the corresponding SKB marked as
      already-forwarded, it retains this marking even after it is forwarded
      across veth links into another bridge. There, since it ingresses the
      bridge over veth, which doesn't have offload_fwd_mark, it triggers a
      warning in nbp_switchdev_frame_mark().
      
      Then nbp_switchdev_allowed_egress() decides not to allow egress from
      this bridge through another veth, because the SKB is already marked, and
      the mark (of 0) of course matches. Thus the packet is incorrectly
      blocked.
      
      Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That
      function is called from tunnels and also from veth, and thus catches the
      cases where traffic is forwarded between bridges and transformed in a
      way that invalidates the marking.
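
      A sketch of the reset added to skb_scrub_packet() (field names as they
      existed at the time):

          #ifdef CONFIG_NET_SWITCHDEV
                  skb->offload_fwd_mark = 0;
                  skb->offload_mr_fwd_mark = 0;
          #endif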
      
      Fixes: 6bc506b4 ("bridge: switchdev: Add forward mark support for stacked devices")
      Fixes: abf4bb6b ("skbuff: Add the offload_mr_fwd_mark field")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Suggested-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 20 Nov, 2018 1 commit
    • net: skb_scrub_packet(): Scrub offload_fwd_mark · 6f9a5069
      Committed by Petr Machata
      When a packet is trapped and the corresponding SKB marked as
      already-forwarded, it retains this marking even after it is forwarded
      across veth links into another bridge. There, since it ingresses the
      bridge over veth, which doesn't have offload_fwd_mark, it triggers a
      warning in nbp_switchdev_frame_mark().
      
      Then nbp_switchdev_allowed_egress() decides not to allow egress from
      this bridge through another veth, because the SKB is already marked, and
      the mark (of 0) of course matches. Thus the packet is incorrectly
      blocked.
      
      Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That
      function is called from tunnels and also from veth, and thus catches the
      cases where traffic is forwarded between bridges and transformed in a
      way that invalidates the marking.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Suggested-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 17 Nov, 2018 1 commit