1. 11 March 2021 (2 commits)
    • net, bpf: Fix ip6ip6 crash with collect_md populated skbs · a188bb56
      Daniel Borkmann committed
      I ran into a crash where an ip6ip6 tunnel device which was /not/ set
      to collect_md mode was receiving collect_md populated skbs for xmit.
      
      The BPF prog was populating the skb via bpf_skb_set_tunnel_key(), which
      assigns a special metadata dst entry, and then redirecting the skb to
      the device, taking ip6_tnl_start_xmit() -> ipxip6_tnl_xmit() ->
      ip6_tnl_xmit(). The latter performs a neigh lookup based on
      skb_dst(skb), where we trigger a NULL pointer dereference on
      dst->ops->neigh_lookup() since md_dst_ops does not populate the
      neigh_lookup callback with a fake handler.
      
      Transform the md_dst_ops into generic dst_blackhole_ops that can also be
      reused elsewhere when needed, and use them for the metadata dst entries as
      callback ops.
      
      Also, remove the dst_md_discard{,_out}() ops and rely on
      dst_discard{,_out}() from dst_init(), which free the skb the same way
      modulo the splat. Given we can recover just fine from there, avoid any
      potential splats if this ever gets triggered in the future (or worse,
      a panic when panic_on_warn is set).
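      
      For illustration, a minimal sketch of what such generic blackhole ops
      could look like (based on the description above; the individual helper
      names and the exact dst_ops layout are assumptions, not a verbatim
      copy of the patch):
      
          #include <net/dst.h>
          
          /* Harmless stub callbacks: a lookup on a metadata dst no longer
           * dereferences a NULL function pointer, it just gets a NULL result.
           */
          static struct neighbour *dst_blackhole_neigh_lookup(const struct dst_entry *dst,
                                                              struct sk_buff *skb,
                                                              const void *daddr)
          {
                  return NULL;
          }
          
          static struct dst_entry *dst_blackhole_check(struct dst_entry *dst, __u32 cookie)
          {
                  return NULL;
          }
          
          static u32 *dst_blackhole_cow_metrics(struct dst_entry *dst, unsigned long old)
          {
                  return NULL;
          }
          
          static void dst_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk,
                                                struct sk_buff *skb, u32 mtu,
                                                bool confirm_neigh)
          {
          }
          
          static void dst_blackhole_redirect(struct dst_entry *dst, struct sock *sk,
                                             struct sk_buff *skb)
          {
          }
          
          /* Used as the ops for metadata dst entries in place of the old
           * md_dst_ops, which left neigh_lookup unset.
           */
          static struct dst_ops dst_blackhole_ops = {
                  .family         = AF_UNSPEC,
                  .neigh_lookup   = dst_blackhole_neigh_lookup,
                  .check          = dst_blackhole_check,
                  .cow_metrics    = dst_blackhole_cow_metrics,
                  .update_pmtu    = dst_blackhole_update_pmtu,
                  .redirect       = dst_blackhole_redirect,
          };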
      
      Fixes: f38a9eb1 ("dst: Metadata destinations")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Consolidate common blackhole dst ops · c4c877b2
      Daniel Borkmann committed
      Move the generic blackhole dst ops to the core and use them from both
      ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No
      functional change otherwise. We need these in other locations as well,
      and having to define them over and over again is not great.
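      
      As a sketch of the consolidation (the exact per-family struct contents
      are an assumption; only the callbacks relevant here are shown), the
      per-family blackhole ops can then point at the shared helpers instead
      of carrying private copies:
      
          /* net/ipv4/route.c: reuse the core dst_blackhole_* helpers. */
          static struct dst_ops ipv4_dst_blackhole_ops = {
                  .family         = AF_INET,
                  .neigh_lookup   = dst_blackhole_neigh_lookup,
                  .check          = dst_blackhole_check,
                  .cow_metrics    = dst_blackhole_cow_metrics,
                  .update_pmtu    = dst_blackhole_update_pmtu,
                  .redirect       = dst_blackhole_redirect,
          };
      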
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 March 2021 (1 commit)
  3. 02 March 2021 (1 commit)
    • net: expand textsearch ts_state to fit skb_seq_state · b228c9b0
      Willem de Bruijn committed
      The referenced commit expands the skb_seq_state used by
      skb_find_text with a 4B frag_off field, growing it to 48B.
      
      This exceeds container ts_state->cb, causing a stack corruption:
      
      [   73.238353] Kernel panic - not syncing: stack-protector: Kernel stack
      is corrupted in: skb_find_text+0xc5/0xd0
      [   73.247384] CPU: 1 PID: 376 Comm: nping Not tainted 5.11.0+ #4
      [   73.252613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.14.0-2 04/01/2014
      [   73.260078] Call Trace:
      [   73.264677]  dump_stack+0x57/0x6a
      [   73.267866]  panic+0xf6/0x2b7
      [   73.270578]  ? skb_find_text+0xc5/0xd0
      [   73.273964]  __stack_chk_fail+0x10/0x10
      [   73.277491]  skb_find_text+0xc5/0xd0
      [   73.280727]  string_mt+0x1f/0x30
      [   73.283639]  ipt_do_table+0x214/0x410
      
      The struct is passed between skb_find_text and its callbacks
      skb_prepare_seq_read, skb_seq_read and skb_abort_seq_read through
      the textsearch interface using TS_SKB_CB.
      
      I assumed that this mapped to skb->cb like other .._SKB_CB wrappers.
      skb->cb is 48B. But it maps to ts_state->cb, which is only 40B.
      
      skb->cb was increased from 40B to 48B after ts_state was introduced,
      in commit 3e3850e9 ("[NETFILTER]: Fix xfrm lookup in
      ip_route_me_harder/ip6_route_me_harder").
      
      Increase ts_state.cb[] to 48 to fit the struct.
      
      Also add a BUILD_BUG_ON to avoid a repeat.
      
      The alternative is to directly add a dependency from textsearch onto
      linux/skbuff.h, but I think the intent is for textsearch to have no
      such dependencies on its callers.
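      
      Concretely, a minimal sketch of the fix as described above (surrounding
      code is elided; the exact layout of struct ts_state is assumed):
      
          /* include/linux/textsearch.h: widen the scratch area so that
           * struct skb_seq_state (now 48B) fits again.
           */
          struct ts_state {
                  unsigned int    offset;
                  char            cb[48];         /* was 40 */
          };
          
          /* net/core/skbuff.c: fail the build rather than corrupt the
           * stack if skb_seq_state ever outgrows ts_state->cb again.
           */
          unsigned int skb_find_text(struct sk_buff *skb, unsigned int from,
                                     unsigned int to, struct ts_config *config)
          {
                  struct ts_state state;
                  unsigned int ret;
          
                  BUILD_BUG_ON(sizeof(struct skb_seq_state) > sizeof(state.cb));
          
                  config->get_next_block = skb_ts_get_next_block;
                  config->finish = skb_ts_finish;
          
                  skb_prepare_seq_read(skb, from, to, TS_SKB_CB(&state));
          
                  ret = textsearch_find(config, &state);
                  return (ret <= to - from ? ret : UINT_MAX);
          }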
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=211911
      Fixes: 97550f6f ("net: compound page support in skb_seq_read")
      Reported-by: Kris Karas <bugs-a17@moonlit-rail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 14 February 2021 (11 commits)
  5. 13 February 2021 (6 commits)
  6. 12 February 2021 (4 commits)
  7. 10 February 2021 (3 commits)
  8. 09 February 2021 (1 commit)
  9. 07 February 2021 (2 commits)
  10. 06 February 2021 (1 commit)
  11. 05 February 2021 (3 commits)
  12. 04 February 2021 (2 commits)
  13. 03 February 2021 (1 commit)
  14. 31 January 2021 (1 commit)
    • neighbour: Prevent a dead entry from updating gc_list · eb4e8fac
      Chinmay Agarwal committed
      The following race condition was detected:
      <CPU A, t0> - neigh_flush_dev() is under execution and calls
      neigh_mark_dead(n), marking the neighbour entry 'n' as dead.
      
      <CPU B, t1> - Executing: __netif_receive_skb() ->
      __netif_receive_skb_core() -> arp_rcv() -> arp_process().
      arp_process() calls __neigh_lookup(), which takes a reference on
      neighbour entry 'n'.
      
      <CPU A, t2> - Moves further along neigh_flush_dev() and calls
      neigh_cleanup_and_release(n), but since the reference count was
      increased at t1, 'n' couldn't be destroyed.
      
      <CPU B, t3> - Moves further along arp_process() and calls
      neigh_update() -> __neigh_update() -> neigh_update_gc_list(), which
      adds the neighbour entry back to gc_list (neigh_mark_dead() had
      removed it from gc_list earlier, at t0).
      
      <CPU B, t4> - arp_process() finally calls neigh_release(n), destroying
      the neighbour entry.
      
      This leads to 'n' still being part of gc_list, but the actual
      neighbour structure has been freed.
      
      The situation can be prevented if we disallow a dead entry from ever
      updating gc_list, which is what this patch achieves; see the sketch
      below.
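      
      A minimal sketch of that check (assuming the fix lands in
      neigh_update_gc_list() and that the surrounding list handling looks
      roughly like this; not a verbatim copy of the patch):
      
          static void neigh_update_gc_list(struct neighbour *n)
          {
                  bool on_gc_list, exempt_from_gc;
          
                  write_lock_bh(&n->tbl->lock);
                  write_lock(&n->lock);
          
                  /* A dead entry has already been removed from gc_list by
                   * neigh_mark_dead(); never let it be re-added.
                   */
                  if (n->dead)
                          goto out;
          
                  exempt_from_gc = n->nud_state & NUD_PERMANENT ||
                                   n->flags & NTF_EXT_LEARNED;
                  on_gc_list = !list_empty(&n->gc_list);
          
                  if (exempt_from_gc && on_gc_list) {
                          list_del_init(&n->gc_list);
                          atomic_dec(&n->tbl->gc_entries);
                  } else if (!exempt_from_gc && !on_gc_list) {
                          list_add_tail(&n->gc_list, &n->tbl->gc_list);
                          atomic_inc(&n->tbl->gc_entries);
                  }
          
          out:
                  write_unlock(&n->lock);
                  write_unlock_bh(&n->tbl->lock);
          }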
      
      Fixes: 9c29a2f5 ("neighbor: Fix locking order for gc_list changes")
      Signed-off-by: Chinmay Agarwal <chinagar@codeaurora.org>
      Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20210127165453.GA20514@chinagar-linux.qualcomm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  15. 30 January 2021 (1 commit)
    • net: Remove redundant calls of sk_tx_queue_clear(). · df610cd9
      Kuniyuki Iwashima committed
      Commit 41b14fb8 ("net: Do not clear the sock TX queue in
      sk_set_socket()") removes sk_tx_queue_clear() from sk_set_socket() and
      adds it instead in sk_alloc() and sk_clone_lock() to fix an issue
      introduced in commit e022f0b4 ("net: Introduce sk_tx_queue_mapping").
      On the other hand, the original commit had already put
      sk_tx_queue_clear() in sk_prot_alloc(), the common callee of sk_alloc()
      and sk_clone_lock(). Thus sk_tx_queue_clear() is called twice in each
      path.
      
      If we remove sk_tx_queue_clear() in sk_alloc() and sk_clone_lock(), it
      currently works well because (i) sk_tx_queue_mapping is defined between
      sk_dontcopy_begin and sk_dontcopy_end, and (ii) sock_copy(), called
      after sk_prot_alloc() in sk_clone_lock(), does not overwrite
      sk_tx_queue_mapping. However, if sk_tx_queue_mapping were ever moved
      out of the no-copy area, that would unintentionally introduce a bug.
      
      Therefore, this patch adds a compile-time check to pin down the order
      of sock_copy() and sk_tx_queue_clear(), and removes sk_tx_queue_clear()
      from sk_prot_alloc() so that it does only the allocation, leaving its
      callers to initialize the fields. A sketch of the check follows below.
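      
      A sketch of what such a compile-time ordering check could look like
      (the placement in sock_copy() and the offsetof-based condition are
      assumptions about how the no-copy area is expressed):
      
          /* net/core/sock.c, sock_copy(): if sk_tx_queue_mapping ever moves
           * out of the no-copy area, sk_tx_queue_clear() would have to be
           * called after sock_copy() in sk_clone_lock() -- fail the build
           * so the ordering is reconsidered.
           */
          BUILD_BUG_ON(offsetof(struct sock, sk_tx_queue_mapping) <
                       offsetof(struct sock, sk_dontcopy_begin) ||
                       offsetof(struct sock, sk_tx_queue_mapping) >=
                       offsetof(struct sock, sk_dontcopy_end));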
      
      CC: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Acked-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20210128150217.6060-1-kuniyu@amazon.co.jp
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>