1. 23 6月, 2022 2 次提交
  2. 21 6月, 2022 2 次提交
  3. 20 6月, 2022 2 次提交
    • Z
      net/tls: fix tls_sk_proto_close executed repeatedly · 69135c57
      Ziyang Xuan 提交于
      After setting the sock ktls, update ctx->sk_proto to sock->sk_prot by
      tls_update(), so now ctx->sk_proto->close is tls_sk_proto_close(). When
      close the sock, tls_sk_proto_close() is called for sock->sk_prot->close
      is tls_sk_proto_close(). But ctx->sk_proto->close() will be executed later
      in tls_sk_proto_close(). Thus tls_sk_proto_close() executed repeatedly
      occurred. That will trigger the following bug.
      
      =================================================================
      KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
      RIP: 0010:tls_sk_proto_close+0xd8/0xaf0 net/tls/tls_main.c:306
      Call Trace:
       <TASK>
       tls_sk_proto_close+0x356/0xaf0 net/tls/tls_main.c:329
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:428
       __sock_release+0xcd/0x280 net/socket.c:650
       sock_close+0x18/0x20 net/socket.c:1365
      
      Updating a proto which is same with sock->sk_prot is incorrect. Add proto
      and sock->sk_prot equality check at the head of tls_update() to fix it.
      
      Fixes: 95fa1454 ("bpf: sockmap/tls, close can race with map free")
      Reported-by: syzbot+29c3c12f3214b85ad081@syzkaller.appspotmail.com
      Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      69135c57
    • E
      erspan: do not assume transport header is always set · 301bd140
      Eric Dumazet 提交于
      Rewrite tests in ip6erspan_tunnel_xmit() and
      erspan_fb_xmit() to not assume transport header is set.
      
      syzbot reported:
      
      WARNING: CPU: 0 PID: 1350 at include/linux/skbuff.h:2911 skb_transport_header include/linux/skbuff.h:2911 [inline]
      WARNING: CPU: 0 PID: 1350 at include/linux/skbuff.h:2911 ip6erspan_tunnel_xmit+0x15af/0x2eb0 net/ipv6/ip6_gre.c:963
      Modules linked in:
      CPU: 0 PID: 1350 Comm: aoe_tx0 Not tainted 5.19.0-rc2-syzkaller-00160-g274295c6 #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      RIP: 0010:skb_transport_header include/linux/skbuff.h:2911 [inline]
      RIP: 0010:ip6erspan_tunnel_xmit+0x15af/0x2eb0 net/ipv6/ip6_gre.c:963
      Code: 0f 47 f0 40 88 b5 7f fe ff ff e8 8c 16 4b f9 89 de bf ff ff ff ff e8 a0 12 4b f9 66 83 fb ff 0f 85 1d f1 ff ff e8 71 16 4b f9 <0f> 0b e9 43 f0 ff ff e8 65 16 4b f9 48 8d 85 30 ff ff ff ba 60 00
      RSP: 0018:ffffc90005daf910 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 000000000000ffff RCX: 0000000000000000
      RDX: ffff88801f032100 RSI: ffffffff882e8d3f RDI: 0000000000000003
      RBP: ffffc90005dafab8 R08: 0000000000000003 R09: 000000000000ffff
      R10: 000000000000ffff R11: 0000000000000000 R12: ffff888024f21d40
      R13: 000000000000a288 R14: 00000000000000b0 R15: ffff888025a2e000
      FS: 0000000000000000(0000) GS:ffff88802c800000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2e425000 CR3: 000000006d099000 CR4: 0000000000152ef0
      Call Trace:
      <TASK>
      __netdev_start_xmit include/linux/netdevice.h:4805 [inline]
      netdev_start_xmit include/linux/netdevice.h:4819 [inline]
      xmit_one net/core/dev.c:3588 [inline]
      dev_hard_start_xmit+0x188/0x880 net/core/dev.c:3604
      sch_direct_xmit+0x19f/0xbe0 net/sched/sch_generic.c:342
      __dev_xmit_skb net/core/dev.c:3815 [inline]
      __dev_queue_xmit+0x14a1/0x3900 net/core/dev.c:4219
      dev_queue_xmit include/linux/netdevice.h:2994 [inline]
      tx+0x6a/0xc0 drivers/block/aoe/aoenet.c:63
      kthread+0x1e7/0x3b0 drivers/block/aoe/aoecmd.c:1229
      kthread+0x2e9/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
      </TASK>
      
      Fixes: d5db21a3 ("erspan: auto detect truncated ipv6 packets.")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: William Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      301bd140
  4. 18 6月, 2022 3 次提交
  5. 17 6月, 2022 4 次提交
    • R
      ipv4: ping: fix bind address validity check · b4a028c4
      Riccardo Paolo Bestetti 提交于
      Commit 8ff978b8 ("ipv4/raw: support binding to nonlocal addresses")
      introduced a helper function to fold duplicated validity checks of bind
      addresses into inet_addr_valid_or_nonlocal(). However, this caused an
      unintended regression in ping_check_bind_addr(), which previously would
      reject binding to multicast and broadcast addresses, but now these are
      both incorrectly allowed as reported in [1].
      
      This patch restores the original check. A simple reordering is done to
      improve readability and make it evident that multicast and broadcast
      addresses should not be allowed. Also, add an early exit for INADDR_ANY
      which replaces lost behavior added by commit 0ce779a9 ("net: Avoid
      unnecessary inet_addr_type() call when addr is INADDR_ANY").
      
      Furthermore, this patch introduces regression selftests to catch these
      specific cases.
      
      [1] https://lore.kernel.org/netdev/CANP3RGdkAcDyAZoT1h8Gtuu0saq+eOrrTiWbxnOs+5zn+cpyKg@mail.gmail.com/
      
      Fixes: 8ff978b8 ("ipv4/raw: support binding to nonlocal addresses")
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Reported-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NCarlos Llamas <cmllamas@google.com>
      Signed-off-by: NRiccardo Paolo Bestetti <pbl@bestov.io>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4a028c4
    • H
      tipc: fix use-after-free Read in tipc_named_reinit · 911600bf
      Hoang Le 提交于
      syzbot found the following issue on:
      ==================================================================
      BUG: KASAN: use-after-free in tipc_named_reinit+0x94f/0x9b0
      net/tipc/name_distr.c:413
      Read of size 8 at addr ffff88805299a000 by task kworker/1:9/23764
      
      CPU: 1 PID: 23764 Comm: kworker/1:9 Not tainted
      5.18.0-rc4-syzkaller-00878-g17d49e6e #0
      Hardware name: Google Compute Engine/Google Compute Engine,
      BIOS Google 01/01/2011
      Workqueue: events tipc_net_finalize_work
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0xeb/0x495
      mm/kasan/report.c:313
       print_report mm/kasan/report.c:429 [inline]
       kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
       tipc_named_reinit+0x94f/0x9b0 net/tipc/name_distr.c:413
       tipc_net_finalize+0x234/0x3d0 net/tipc/net.c:138
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
       </TASK>
      [...]
      ==================================================================
      
      In the commit
      d966ddcc ("tipc: fix a deadlock when flushing scheduled work"),
      the cancel_work_sync() function just to make sure ONLY the work
      tipc_net_finalize_work() is executing/pending on any CPU completed before
      tipc namespace is destroyed through tipc_exit_net(). But this function
      is not guaranteed the work is the last queued. So, the destroyed instance
      may be accessed in the work which will try to enqueue later.
      
      In order to completely fix, we re-order the calling of cancel_work_sync()
      to make sure the work tipc_net_finalize_work() was last queued and it
      must be completed by calling cancel_work_sync().
      
      Reported-by: syzbot+47af19f3307fc9c5c82e@syzkaller.appspotmail.com
      Fixes: d966ddcc ("tipc: fix a deadlock when flushing scheduled work")
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NHoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      911600bf
    • E
      net: fix data-race in dev_isalive() · cc26c266
      Eric Dumazet 提交于
      dev_isalive() is called under RTNL or dev_base_lock protection.
      
      This means that changes to dev->reg_state should be done with both locks held.
      
      syzbot reported:
      
      BUG: KCSAN: data-race in register_netdevice / type_show
      
      write to 0xffff888144ecf518 of 1 bytes by task 20886 on cpu 0:
      register_netdevice+0xb9f/0xdf0 net/core/dev.c:10050
      lapbeth_new_device drivers/net/wan/lapbether.c:414 [inline]
      lapbeth_device_event+0x4a0/0x6c0 drivers/net/wan/lapbether.c:456
      notifier_call_chain kernel/notifier.c:87 [inline]
      raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:455
      __dev_notify_flags+0x1d6/0x3a0
      dev_change_flags+0xa2/0xc0 net/core/dev.c:8607
      do_setlink+0x778/0x2230 net/core/rtnetlink.c:2780
      __rtnl_newlink net/core/rtnetlink.c:3546 [inline]
      rtnl_newlink+0x114c/0x16a0 net/core/rtnetlink.c:3593
      rtnetlink_rcv_msg+0x811/0x8c0 net/core/rtnetlink.c:6089
      netlink_rcv_skb+0x13e/0x240 net/netlink/af_netlink.c:2501
      rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:6107
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x58a/0x660 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x661/0x750 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg net/socket.c:734 [inline]
      __sys_sendto+0x21e/0x2c0 net/socket.c:2119
      __do_sys_sendto net/socket.c:2131 [inline]
      __se_sys_sendto net/socket.c:2127 [inline]
      __x64_sys_sendto+0x74/0x90 net/socket.c:2127
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      read to 0xffff888144ecf518 of 1 bytes by task 20423 on cpu 1:
      dev_isalive net/core/net-sysfs.c:38 [inline]
      netdev_show net/core/net-sysfs.c:50 [inline]
      type_show+0x24/0x90 net/core/net-sysfs.c:112
      dev_attr_show+0x35/0x90 drivers/base/core.c:2095
      sysfs_kf_seq_show+0x175/0x240 fs/sysfs/file.c:59
      kernfs_seq_show+0x75/0x80 fs/kernfs/file.c:162
      seq_read_iter+0x2c3/0x8e0 fs/seq_file.c:230
      kernfs_fop_read_iter+0xd1/0x2f0 fs/kernfs/file.c:235
      call_read_iter include/linux/fs.h:2052 [inline]
      new_sync_read fs/read_write.c:401 [inline]
      vfs_read+0x5a5/0x6a0 fs/read_write.c:482
      ksys_read+0xe8/0x1a0 fs/read_write.c:620
      __do_sys_read fs/read_write.c:630 [inline]
      __se_sys_read fs/read_write.c:628 [inline]
      __x64_sys_read+0x3e/0x50 fs/read_write.c:628
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      value changed: 0x00 -> 0x01
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 20423 Comm: udevd Tainted: G W 5.19.0-rc2-syzkaller-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc26c266
    • J
      Revert "net: Add a second bind table hashed by port and address" · 593d1ebe
      Joanne Koong 提交于
      This reverts:
      
      commit d5a42de8 ("net: Add a second bind table hashed by port and address")
      commit 538aaf9b ("selftests: Add test for timing a bind request to a port with a populated bhash entry")
      Link: https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/
      
      There are a few things that need to be fixed here:
      * Updating bhash2 in cases where the socket's rcv saddr changes
      * Adding bhash2 hashbucket locks
      
      Links to syzbot reports:
      https://lore.kernel.org/netdev/00000000000022208805e0df247a@google.com/
      https://lore.kernel.org/netdev/0000000000003f33bc05dfaf44fe@google.com/
      
      Fixes: d5a42de8 ("net: Add a second bind table hashed by port and address")
      Reported-by: syzbot+015d756bbd1f8b5c8f09@syzkaller.appspotmail.com
      Reported-by: syzbot+98fd2d1422063b0f8c44@syzkaller.appspotmail.com
      Reported-by: syzbot+0a847a982613c6438fba@syzkaller.appspotmail.com
      Signed-off-by: NJoanne Koong <joannelkoong@gmail.com>
      Link: https://lore.kernel.org/r/20220615193213.2419568-1-joannelkoong@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      593d1ebe
  6. 15 6月, 2022 2 次提交
    • J
      bpf: Fix request_sock leak in sk lookup helpers · 3046a827
      Jon Maxwell 提交于
      A customer reported a request_socket leak in a Calico cloud environment. We
      found that a BPF program was doing a socket lookup with takes a refcnt on
      the socket and that it was finding the request_socket but returning the parent
      LISTEN socket via sk_to_full_sk() without decrementing the child request socket
      1st, resulting in request_sock slab object leak. This patch retains the
      existing behaviour of returning full socks to the caller but it also decrements
      the child request_socket if one is present before doing so to prevent the leak.
      
      Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
      thanks to Antoine Tenart for the reproducer and patch input.
      
      v2 of this patch contains, refactor as per Daniel Borkmann's suggestions to
      validate RCU flags on the listen socket so that it balances with bpf_sk_release()
      and update comments as per Martin KaFai Lau's suggestion. One small change to
      Daniels suggestion, put "sk = sk2" under "if (sk2 != sk)" to avoid an extra
      instruction.
      
      Fixes: f7355a6c ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
      Fixes: edbf8c01 ("bpf: add skc_lookup_tcp helper")
      Co-developed-by: NAntoine Tenart <atenart@kernel.org>
      Signed-off-by: NAntoine Tenart <atenart@kernel.org>
      Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: NCurtis Taylor <cutaylor-pub@yahoo.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/56d6f898-bde0-bb25-3427-12a330b29fb8@iogearbox.net
      Link: https://lore.kernel.org/bpf/20220615011540.813025-1-jmaxwell37@gmail.com
      3046a827
    • D
      net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg · 219b51a6
      Duoming Zhou 提交于
      The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
      and block until it receives a packet from the remote. If the client
      doesn`t connect to server and calls read() directly, it will not
      receive any packets forever. As a result, the deadlock will happen.
      
      The fail log caused by deadlock is shown below:
      
      [  369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds.
      [  369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  369.613058] Call Trace:
      [  369.613315]  <TASK>
      [  369.614072]  __schedule+0x2f9/0xb20
      [  369.615029]  schedule+0x49/0xb0
      [  369.615734]  __lock_sock+0x92/0x100
      [  369.616763]  ? destroy_sched_domains_rcu+0x20/0x20
      [  369.617941]  lock_sock_nested+0x6e/0x70
      [  369.618809]  ax25_bind+0xaa/0x210
      [  369.619736]  __sys_bind+0xca/0xf0
      [  369.620039]  ? do_futex+0xae/0x1b0
      [  369.620387]  ? __x64_sys_futex+0x7c/0x1c0
      [  369.620601]  ? fpregs_assert_state_consistent+0x19/0x40
      [  369.620613]  __x64_sys_bind+0x11/0x20
      [  369.621791]  do_syscall_64+0x3b/0x90
      [  369.622423]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  369.623319] RIP: 0033:0x7f43c8aa8af7
      [  369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
      [  369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7
      [  369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005
      [  369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700
      [  369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe
      [  369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700
      
      This patch replaces skb_recv_datagram() with an open-coded variant of it
      releasing the socket lock before the __skb_wait_for_more_packets() call
      and re-acquiring it after such call in order that other functions that
      need socket lock could be executed.
      
      what's more, the socket lock will be released only when recvmsg() will
      block and that should produce nicer overall behavior.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Suggested-by: NThomas Osterried <thomas@osterried.de>
      Signed-off-by: NDuoming Zhou <duoming@zju.edu.cn>
      Reported-by: Thomas Habets <thomas@@habets.se>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      219b51a6
  7. 14 6月, 2022 1 次提交
  8. 10 6月, 2022 2 次提交
    • A
      net: seg6: fix seg6_lookup_any_nexthop() to handle VRFs using flowi_l3mdev · a3bd2102
      Andrea Mayer 提交于
      Commit 40867d74 ("net: Add l3mdev index to flow struct and avoid oif
      reset for port devices") adds a new entry (flowi_l3mdev) in the common
      flow struct used for indicating the l3mdev index for later rule and
      table matching.
      The l3mdev_update_flow() has been adapted to properly set the
      flowi_l3mdev based on the flowi_oif/flowi_iif. In fact, when a valid
      flowi_iif is supplied to the l3mdev_update_flow(), this function can
      update the flowi_l3mdev entry only if it has not yet been set (i.e., the
      flowi_l3mdev entry is equal to 0).
      
      The SRv6 End.DT6 behavior in VRF mode leverages a VRF device in order to
      force the routing lookup into the associated routing table. This routing
      operation is performed by seg6_lookup_any_nextop() preparing a flowi6
      data structure used by ip6_route_input_lookup() which, in turn,
      (indirectly) invokes l3mdev_update_flow().
      
      However, seg6_lookup_any_nexthop() does not initialize the new
      flowi_l3mdev entry which is filled with random garbage data. This
      prevents l3mdev_update_flow() from properly updating the flowi_l3mdev
      with the VRF index, and thus SRv6 End.DT6 (VRF mode)/DT46 behaviors are
      broken.
      
      This patch correctly initializes the flowi6 instance allocated and used
      by seg6_lookup_any_nexhtop(). Specifically, the entire flowi6 instance
      is wiped out: in case new entries are added to flowi/flowi6 (as happened
      with the flowi_l3mdev entry), we should no longer have incorrectly
      initialized values. As a result of this operation, the value of
      flowi_l3mdev is also set to 0.
      
      The proposed fix can be tested easily. Starting from the commit
      referenced in the Fixes, selftests [1],[2] indicate that the SRv6
      End.DT6 (VRF mode)/DT46 behaviors no longer work correctly. By applying
      this patch, those behaviors are back to work properly again.
      
      [1] - tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
      [2] - tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
      
      Fixes: 40867d74 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
      Reported-by: NAnton Makarov <am@3a-alliance.com>
      Signed-off-by: NAndrea Mayer <andrea.mayer@uniroma2.it>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220608091917.20345-1-andrea.mayer@uniroma2.itSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      a3bd2102
    • M
      tls: Rename TLS_INFO_ZC_SENDFILE to TLS_INFO_ZC_TX · b489a6e5
      Maxim Mikityanskiy 提交于
      To embrace possible future optimizations of TLS, rename zerocopy
      sendfile definitions to more generic ones:
      
      * setsockopt: TLS_TX_ZEROCOPY_SENDFILE- > TLS_TX_ZEROCOPY_RO
      * sock_diag: TLS_INFO_ZC_SENDFILE -> TLS_INFO_ZC_RO_TX
      
      RO stands for readonly and emphasizes that the application shouldn't
      modify the data being transmitted with zerocopy to avoid potential
      disconnection.
      
      Fixes: c1318b39 ("tls: Add opt-in zerocopy mode of sendfile()")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
      Link: https://lore.kernel.org/r/20220608153425.3151146-1-maximmi@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      b489a6e5
  9. 09 6月, 2022 12 次提交
  10. 08 6月, 2022 2 次提交
    • M
      xsk: Fix handling of invalid descriptors in XSK TX batching API · d678cbd2
      Maciej Fijalkowski 提交于
      xdpxceiver run on a AF_XDP ZC enabled driver revealed a problem with XSK
      Tx batching API. There is a test that checks how invalid Tx descriptors
      are handled by AF_XDP. Each valid descriptor is followed by invalid one
      on Tx side whereas the Rx side expects only to receive a set of valid
      descriptors.
      
      In current xsk_tx_peek_release_desc_batch() function, the amount of
      available descriptors is hidden inside xskq_cons_peek_desc_batch(). This
      can be problematic in cases where invalid descriptors are present due to
      the fact that xskq_cons_peek_desc_batch() returns only a count of valid
      descriptors. This means that it is impossible to properly update XSK
      ring state when calling xskq_cons_release_n().
      
      To address this issue, pull out the contents of
      xskq_cons_peek_desc_batch() so that callers (currently only
      xsk_tx_peek_release_desc_batch()) will always be able to update the
      state of ring properly, as total count of entries is now available and
      use this value as an argument in xskq_cons_release_n(). By
      doing so, xskq_cons_peek_desc_batch() can be dropped altogether.
      
      Fixes: 9349eb3a ("xsk: Introduce batched Tx descriptor interfaces")
      Signed-off-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NMagnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20220607142200.576735-1-maciej.fijalkowski@intel.com
      d678cbd2
    • F
      netfilter: use get_random_u32 instead of prandom · b1fd94e7
      Florian Westphal 提交于
      bh might occur while updating per-cpu rnd_state from user context,
      ie. local_out path.
      
      BUG: using smp_processor_id() in preemptible [00000000] code: nginx/2725
      caller is nft_ng_random_eval+0x24/0x54 [nft_numgen]
      Call Trace:
       check_preemption_disabled+0xde/0xe0
       nft_ng_random_eval+0x24/0x54 [nft_numgen]
      
      Use the random driver instead, this also avoids need for local prandom
      state. Moreover, prandom now uses the random driver since d4150779
      ("random32: use real rng for non-deterministic randomness").
      
      Based on earlier patch from Pablo Neira.
      
      Fixes: 6b2faee0 ("netfilter: nft_meta: place prandom handling in a helper")
      Fixes: 978d8f90 ("netfilter: nft_numgen: add map lookups for numgen random operations")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b1fd94e7
  11. 07 6月, 2022 2 次提交
  12. 06 6月, 2022 3 次提交
  13. 03 6月, 2022 3 次提交
    • P
      netfilter: nf_tables: always initialize flowtable hook list in transaction · 2c9e4559
      Pablo Neira Ayuso 提交于
      The hook list is used if nft_trans_flowtable_update(trans) == true. However,
      initialize this list for other cases for safety reasons.
      
      Fixes: 78d9f48f ("netfilter: nf_tables: add devices to existing flowtable")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      2c9e4559
    • E
      net/af_packet: make sure to pull mac header · e9d3f809
      Eric Dumazet 提交于
      GSO assumes skb->head contains link layer headers.
      
      tun device in some case can provide base 14 bytes,
      regardless of VLAN being used or not.
      
      After blamed commit, we can end up setting a network
      header offset of 18+, we better pull the missing
      bytes to avoid a posible crash in GSO.
      
      syzbot report was:
      kernel BUG at include/linux/skbuff.h:2699!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 3601 Comm: syz-executor210 Not tainted 5.18.0-syzkaller-11338-g2c5ca23f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__skb_pull include/linux/skbuff.h:2699 [inline]
      RIP: 0010:skb_mac_gso_segment+0x48f/0x530 net/core/gro.c:136
      Code: 00 48 c7 c7 00 96 d4 8a c6 05 cb d3 45 06 01 e8 26 bb d0 01 e9 2f fd ff ff 49 c7 c4 ea ff ff ff e9 f1 fe ff ff e8 91 84 19 fa <0f> 0b 48 89 df e8 97 44 66 fa e9 7f fd ff ff e8 ad 44 66 fa e9 48
      RSP: 0018:ffffc90002e2f4b8 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000000000
      RDX: ffff88805bb58000 RSI: ffffffff8760ed0f RDI: 0000000000000004
      RBP: 0000000000005dbc R08: 0000000000000004 R09: 0000000000000fe0
      R10: 0000000000000fe4 R11: 0000000000000000 R12: 0000000000000fe0
      R13: ffff88807194d780 R14: 1ffff920005c5e9b R15: 0000000000000012
      FS:  000055555730f300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000200015c0 CR3: 0000000071ff8000 CR4: 0000000000350ee0
      Call Trace:
       <TASK>
       __skb_gso_segment+0x327/0x6e0 net/core/dev.c:3411
       skb_gso_segment include/linux/netdevice.h:4749 [inline]
       validate_xmit_skb+0x6bc/0xf10 net/core/dev.c:3669
       validate_xmit_skb_list+0xbc/0x120 net/core/dev.c:3719
       sch_direct_xmit+0x3d1/0xbe0 net/sched/sch_generic.c:327
       __dev_xmit_skb net/core/dev.c:3815 [inline]
       __dev_queue_xmit+0x14a1/0x3a00 net/core/dev.c:4219
       packet_snd net/packet/af_packet.c:3071 [inline]
       packet_sendmsg+0x21cb/0x5550 net/packet/af_packet.c:3102
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:734
       ____sys_sendmsg+0x6eb/0x810 net/socket.c:2492
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2546
       __sys_sendmsg net/socket.c:2575 [inline]
       __do_sys_sendmsg net/socket.c:2584 [inline]
       __se_sys_sendmsg net/socket.c:2582 [inline]
       __x64_sys_sendmsg+0x132/0x220 net/socket.c:2582
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f4b95da06c9
      Code: 28 c3 e8 4a 15 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffd7defc4c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007ffd7defc4f0 RCX: 00007f4b95da06c9
      RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
      RBP: 0000000000000003 R08: bb1414ac00000050 R09: bb1414ac00000050
      R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffd7defc4e0 R14: 00007ffd7defc4d8 R15: 00007ffd7defc4d4
       </TASK>
      
      Fixes: dfed913e ("net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e9d3f809
    • E
      net: CONFIG_DEBUG_NET depends on CONFIG_NET · eb0b39ef
      Eric Dumazet 提交于
      It makes little sense to debug networking stacks
      if networking is not compiled in.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      eb0b39ef