1. 07 Oct, 2021 1 commit
    • M
      net: prefer socket bound to interface when not in VRF · 8d6c414c
      Committed by Mike Manning
      The commit 6da5b0f0 ("net: ensure unbound datagram socket to be
      chosen when not in a VRF") modified compute_score() so that a device
      match is always made, not just in the case of an l3mdev skb, then
      increments the score also for unbound sockets. This ensures that
      sockets bound to an l3mdev are never selected when not in a VRF.
      But as unbound and bound sockets are now scored equally, this results
      in the last opened socket being selected if there are matches in the
      default VRF for an unbound socket and a socket bound to a dev that is
      not an l3mdev. However, handling prior to this commit was to always
      select the bound socket in this case. Reinstate this handling by
      incrementing the score only for bound sockets. The required isolation
      due to choosing between an unbound socket and a socket bound to an
      l3mdev remains in place due to the device match always being made.
      The same approach is taken for compute_score() for stream sockets.
      
      Fixes: 6da5b0f0 ("net: ensure unbound datagram socket to be chosen when not in a VRF")
      Fixes: e7819058 ("net: ensure unbound stream socket to be chosen when not in a VRF")
      Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/cf0a8523-b362-1edf-ee78-eef63cbbb428@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8d6c414c
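
The scoring rule described in commit 8d6c414c can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel's compute_score(): `struct sock_model`, `dev_match_model()` and `score_model()` are hypothetical names, the l3mdev_accept sysctls are ignored, and the base score of 1 and the increment of 4 are illustrative values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket: only the bound interface matters here. */
struct sock_model {
	int bound_dev_if;	/* 0 means "not bound to any interface" */
};

/*
 * Device match, always evaluated. dif is the ingress ifindex, sdif is the
 * ifindex of the enslaving l3mdev (0 when the packet is not in a VRF).
 */
static bool dev_match_model(const struct sock_model *sk, int dif, int sdif)
{
	if (!sk->bound_dev_if)
		return sdif == 0;	/* unbound: only outside a VRF */
	return sk->bound_dev_if == dif || sk->bound_dev_if == sdif;
}

/* Returns -1 for "no match", otherwise a score; higher wins. */
int score_model(const struct sock_model *sk, int dif, int sdif)
{
	int score = 1;	/* illustrative base score */

	if (!dev_match_model(sk, dif, sdif))
		return -1;
	if (sk->bound_dev_if)
		score += 4;	/* only bound sockets get the bonus */
	return score;
}
```

In this model, for a packet arriving on ifindex 3 outside any VRF, a socket bound to ifindex 3 outscores an unbound socket, and a socket bound to an l3mdev with a different ifindex is rejected by the device match.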
  2. 06 Oct, 2021 2 commits
  3. 05 Oct, 2021 4 commits
    • E
      netlink: annotate data races around nlk->bound · 7707a4d0
      Committed by Eric Dumazet
      While existing code is correct, KCSAN is reporting
      a data-race in netlink_insert / netlink_sendmsg [1]
      
      It is correct to read nlk->bound without a lock, as netlink_autobind()
      will acquire all needed locks.
      
      [1]
      BUG: KCSAN: data-race in netlink_insert / netlink_sendmsg
      
      write to 0xffff8881031c8b30 of 1 bytes by task 18752 on cpu 0:
       netlink_insert+0x5cc/0x7f0 net/netlink/af_netlink.c:597
       netlink_autobind+0xa9/0x150 net/netlink/af_netlink.c:842
       netlink_sendmsg+0x479/0x7c0 net/netlink/af_netlink.c:1892
       sock_sendmsg_nosec net/socket.c:703 [inline]
       sock_sendmsg net/socket.c:723 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
       ___sys_sendmsg net/socket.c:2446 [inline]
       __sys_sendmsg+0x1ed/0x270 net/socket.c:2475
       __do_sys_sendmsg net/socket.c:2484 [inline]
       __se_sys_sendmsg net/socket.c:2482 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff8881031c8b30 of 1 bytes by task 18751 on cpu 1:
       netlink_sendmsg+0x270/0x7c0 net/netlink/af_netlink.c:1891
       sock_sendmsg_nosec net/socket.c:703 [inline]
       sock_sendmsg net/socket.c:723 [inline]
       __sys_sendto+0x2a8/0x370 net/socket.c:2019
       __do_sys_sendto net/socket.c:2031 [inline]
       __se_sys_sendto net/socket.c:2027 [inline]
       __x64_sys_sendto+0x74/0x90 net/socket.c:2027
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00 -> 0x01
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 18751 Comm: syz-executor.0 Not tainted 5.14.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: da314c99 ("netlink: Replace rhash_portid with bound")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7707a4d0
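
The lockless read of a "bound" flag discussed in commit 7707a4d0 can be modelled in userspace. The sketch below is an analogue using C11 atomics, not the kernel's annotations: `struct nl_model`, `model_init()`, `model_insert()` and `model_get_portid()` are hypothetical names, and release/acquire ordering is used here as a stand-in for whatever barriers and marked accesses the kernel code relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a socket with a port id published via a flag. */
struct nl_model {
	uint32_t portid;
	atomic_bool bound;
};

void model_init(struct nl_model *nlk)
{
	nlk->portid = 0;
	atomic_init(&nlk->bound, false);
}

/* Writer: fill in the port id first, then publish the flag. */
void model_insert(struct nl_model *nlk, uint32_t portid)
{
	nlk->portid = portid;
	atomic_store_explicit(&nlk->bound, true, memory_order_release);
}

/*
 * Reader: a marked, lockless read of the flag. If it is set, the port id
 * written before the release store is guaranteed to be visible.
 */
bool model_get_portid(struct nl_model *nlk, uint32_t *portid)
{
	if (!atomic_load_explicit(&nlk->bound, memory_order_acquire))
		return false;	/* caller would fall back to autobind */
	*portid = nlk->portid;
	return true;
}
```

Marking both sides of the access makes the intentional race explicit to the compiler and to race detectors, which is the point of the annotation.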
    • E
      net/sched: sch_taprio: properly cancel timer from taprio_destroy() · a56d447f
      Committed by Eric Dumazet
      There is a comment in qdisc_create() about us not calling ops->reset()
      in some cases.
      
      err_out4:
      	/*
      	 * Any broken qdiscs that would require a ops->reset() here?
      	 * The qdisc was never in action so it shouldn't be necessary.
      	 */
      
      As taprio sets a timer before actually receiving a packet, we need
      to cancel it from ops->destroy, just in case ops->reset has not
      been called.
      
      syzbot reported:
      
      ODEBUG: free active (active state 0) object type: hrtimer hint: advance_sched+0x0/0x9a0 arch/x86/include/asm/atomic64_64.h:22
      WARNING: CPU: 0 PID: 8441 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Modules linked in:
      CPU: 0 PID: 8441 Comm: syz-executor813 Not tainted 5.14.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd e0 d3 e3 89 4c 89 ee 48 c7 c7 e0 c7 e3 89 e8 5b 86 11 05 <0f> 0b 83 05 85 03 92 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
      RSP: 0018:ffffc9000130f330 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
      RDX: ffff88802baeb880 RSI: ffffffff815d87b5 RDI: fffff52000261e58
      RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff815d25ee R11: 0000000000000000 R12: ffffffff898dd020
      R13: ffffffff89e3ce20 R14: ffffffff81653630 R15: dffffc0000000000
      FS:  0000000000f0d300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffb64b3e000 CR3: 0000000036557000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __debug_check_no_obj_freed lib/debugobjects.c:987 [inline]
       debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1018
       slab_free_hook mm/slub.c:1603 [inline]
       slab_free_freelist_hook+0x171/0x240 mm/slub.c:1653
       slab_free mm/slub.c:3213 [inline]
       kfree+0xe4/0x540 mm/slub.c:4267
       qdisc_create+0xbcf/0x1320 net/sched/sch_api.c:1299
       tc_modify_qdisc+0x4c8/0x1a60 net/sched/sch_api.c:1663
       rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2403
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2457
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      
      Fixes: 44d4775c ("net/sched: sch_taprio: reset child qdiscs before freeing them")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Davide Caratti <dcaratti@redhat.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Acked-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a56d447f
    • E
      net: bridge: fix under estimation in br_get_linkxstats_size() · 0854a051
      Committed by Eric Dumazet
      Commit de179966 ("net: bridge: add STP xstats")
      added an additional nla_reserve_64bit() in br_fill_linkxstats(),
      but forgot to update br_get_linkxstats_size() accordingly.
      
      This can trigger the following in rtnl_stats_get()
      
      	WARN_ON(err == -EMSGSIZE);
      
      Fixes: de179966 ("net: bridge: add STP xstats")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0854a051
    • E
      net: bridge: use nla_total_size_64bit() in br_get_linkxstats_size() · dbe0b880
      Committed by Eric Dumazet
      bridge_fill_linkxstats() is using nla_reserve_64bit().
      
      We must use nla_total_size_64bit() instead of nla_total_size()
      for the corresponding data structure.
      
      Fixes: 1080ab95 ("net: bridge: add support for IGMP/MLD stats and export them via netlink")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dbe0b880
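
The two bridge commits above (0854a051 and dbe0b880) both concern a size estimate that must cover everything the fill function later reserves. The arithmetic can be sketched as follows. The `M_`/`m_` prefixed helpers are local copies written for this example and assume the usual netlink attribute layout (4-byte header, 4-byte alignment); the extra padding term models the worst case where a zero-length padding attribute is needed to align 64-bit data. The payload sizes 40 and 24 used in the usage note are made up.

```c
#include <assert.h>

#define M_NLA_ALIGNTO	4
#define M_NLA_ALIGN(len) (((len) + M_NLA_ALIGNTO - 1) & ~(M_NLA_ALIGNTO - 1))
#define M_NLA_HDRLEN	4	/* attribute header: 16-bit length + 16-bit type */

/* Space for one attribute: header plus payload, rounded up to 4 bytes. */
int m_nla_total_size(int payload)
{
	return M_NLA_ALIGN(M_NLA_HDRLEN + payload);
}

/* Same, plus a possible empty padding attribute for 64-bit alignment. */
int m_nla_total_size_64bit(int payload)
{
	return M_NLA_ALIGN(M_NLA_HDRLEN + payload) + M_NLA_ALIGN(M_NLA_HDRLEN);
}

/* Under-estimate: one 64-bit attribute sized without padding, one missing. */
int xstats_size_before(int mcast_len)
{
	return m_nla_total_size(mcast_len) + m_nla_total_size(0);
}

/* Estimate covering both 64-bit attributes plus the nest header. */
int xstats_size_after(int mcast_len, int stp_len)
{
	return m_nla_total_size_64bit(mcast_len) +
	       m_nla_total_size_64bit(stp_len) +
	       m_nla_total_size(0);
}
```

With the made-up payload sizes 40 and 24, the earlier estimate is 48 bytes while the corrected one is 84 bytes, so a message buffer sized from the former could run out of room during the fill.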
  4. 04 Oct, 2021 1 commit
  5. 02 Oct, 2021 2 commits
    • P
      netfilter: nf_tables: honor NLM_F_CREATE and NLM_F_EXCL in event notification · 6fb721cf
      Committed by Pablo Neira Ayuso
      Include the NLM_F_CREATE and NLM_F_EXCL flags in netlink event
      notifications, otherwise userspace cannot distinguish between create and
      add commands.
      
      Fixes: 96518518 ("netfilter: add nftables")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      6fb721cf
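
As a rough illustration of why commit 6fb721cf matters to listeners, the sketch below shows how a userspace monitor might classify an event once the flags are present. The constants are local copies (prefixed `M_`) of the values of NLM_F_EXCL and NLM_F_CREATE; `event_command()` and the "other" fallback are hypothetical and not part of any library.

```c
#include <assert.h>
#include <string.h>

#define M_NLM_F_EXCL	0x200	/* do not touch the object if it exists */
#define M_NLM_F_CREATE	0x400	/* create the object if it does not exist */

/* Classify a notification by the flags echoed from the original request. */
const char *event_command(unsigned int nlmsg_flags)
{
	if (!(nlmsg_flags & M_NLM_F_CREATE))
		return "other";	/* hypothetical fallback label */
	return (nlmsg_flags & M_NLM_F_EXCL) ? "create" : "add";
}
```

Without the flags in the notification, both cases would look identical to the listener.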
    • E
      net_sched: fix NULL deref in fifo_set_limit() · 560ee196
      Committed by Eric Dumazet
      syzbot reported another NULL deref in fifo_set_limit() [1]
      
      I could repro the issue with:
      
      unshare -n
      tc qd add dev lo root handle 1:0 tbf limit 200000 burst 70000 rate 100Mbit
      tc qd replace dev lo parent 1:0 pfifo_fast
      tc qd change dev lo root handle 1:0 tbf limit 300000 burst 70000 rate 100Mbit
      
      pfifo_fast does not have a change() operation.
      Make fifo_set_limit() more robust about this.
      
      [1]
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 1cf99067 P4D 1cf99067 PUD 7ca49067 PMD 0
      Oops: 0010 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 14443 Comm: syz-executor959 Not tainted 5.15.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:0x0
      Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
      RSP: 0018:ffffc9000e2f7310 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffffffff8d6ecc00 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff888024c27910 RDI: ffff888071e34000
      RBP: ffff888071e34000 R08: 0000000000000001 R09: ffffffff8fcfb947
      R10: 0000000000000001 R11: 0000000000000000 R12: ffff888024c27910
      R13: ffff888071e34018 R14: 0000000000000000 R15: ffff88801ef74800
      FS:  00007f321d897700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000722c3000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       fifo_set_limit net/sched/sch_fifo.c:242 [inline]
       fifo_set_limit+0x198/0x210 net/sched/sch_fifo.c:227
       tbf_change+0x6ec/0x16d0 net/sched/sch_tbf.c:418
       qdisc_change net/sched/sch_api.c:1332 [inline]
       tc_modify_qdisc+0xd9a/0x1a60 net/sched/sch_api.c:1634
       rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: fb0305ce ("net-sched: consolidate default fifo qdisc setup")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20210930212239.3430364-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      560ee196
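
The failure mode in commit 560ee196 is a call through a NULL operation pointer. A minimal userspace sketch of the defensive check follows; all types and names (`struct qdisc_model`, `set_limit_model()`, and so on) are hypothetical stand-ins, not the kernel's structures, and the check shown is one plausible way to make such a helper robust.

```c
#include <assert.h>
#include <stddef.h>

struct qdisc_model;

/* Hypothetical ops table: change() is optional and may be NULL. */
struct qdisc_ops_model {
	const char *id;
	int (*change)(struct qdisc_model *q, unsigned int limit);
};

struct qdisc_model {
	const struct qdisc_ops_model *ops;
	unsigned int limit;
};

static int fifo_change_model(struct qdisc_model *q, unsigned int limit)
{
	q->limit = limit;
	return 0;
}

/* A fifo-like qdisc with change(), and one without (like pfifo_fast). */
const struct qdisc_ops_model pfifo_ops_model = { "pfifo", fifo_change_model };
const struct qdisc_ops_model pfifo_fast_ops_model = { "pfifo_fast", NULL };

int set_limit_model(struct qdisc_model *q, unsigned int limit)
{
	/* Nothing to do if the child cannot be reconfigured. */
	if (!q->ops->change)
		return 0;
	return q->ops->change(q, limit);
}
```

In the model, a child without change() is left untouched instead of causing a jump to address 0.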
  6. 01 Oct, 2021 1 commit
    • J
      SUNRPC: fix sign error causing rpcsec_gss drops · 2ba5acfb
      Committed by J. Bruce Fields
      If sd_max is unsigned, then sd_max - GSS_SEQ_WIN is a very large number
      whenever sd_max is less than GSS_SEQ_WIN, and the comparison:
      
      	seq_num <= sd->sd_max - GSS_SEQ_WIN
      
      in gss_check_seq_num is pretty much always true, even when that's
      clearly not what was intended.
      
      This was causing pynfs to hang when using krb5, because pynfs uses zero
      as the initial gss sequence number.  That's perfectly legal, but this
      logic error causes knfsd to drop the rpc in that case.  Out-of-order
      sequence IDs in the first GSS_SEQ_WIN (128) calls will also cause this.
      
      Fixes: 10b9d99a ("SUNRPC: Augment server-side rpcgss tracepoints")
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      2ba5acfb
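
The unsigned wrap-around described in commit 2ba5acfb is easy to reproduce in userspace. The sketch below contrasts the problematic comparison with one overflow-free rearrangement; the function names are hypothetical, and widening to 64 bits is a choice made for this example rather than a statement of how the kernel fix is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GSS_SEQ_WIN 128

/* sd_max - GSS_SEQ_WIN wraps to a huge value whenever sd_max < 128. */
bool below_window_buggy(uint32_t seq_num, uint32_t sd_max)
{
	return seq_num <= sd_max - GSS_SEQ_WIN;
}

/* Move the window to the other side so no subtraction can wrap. */
bool below_window_fixed(uint32_t seq_num, uint32_t sd_max)
{
	return (uint64_t)seq_num + GSS_SEQ_WIN <= sd_max;
}
```

With an initial sequence number of zero and `sd_max` of zero, the first form reports the request as below the window, while the second does not.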
  7. 30 Sep, 2021 4 commits
    • E
      af_unix: fix races in sk_peer_pid and sk_peer_cred accesses · 35306eb2
      Committed by Eric Dumazet
      Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
      are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
      
      In order to fix this issue, this patch adds a new spinlock that needs
      to be used whenever these fields are read or written.
      
      Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
      reading sk->sk_peer_pid which makes no sense, as this field
      is only possibly set by AF_UNIX sockets.
      We will have to clean this in a separate patch.
      This could be done by reverting b48596d1 "Bluetooth: L2CAP: Add get_peer_pid callback"
      or implementing what was truly expected.
      
      Fixes: 109f6e39 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jann Horn <jannh@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      35306eb2
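
The approach in commit 35306eb2, a dedicated lock taken for every read and write of the peer fields, can be sketched in userspace. The model below uses a tiny C11 `atomic_flag` spinlock and plain integer fields; the kernel deals with reference-counted objects, which this sketch does not attempt to reproduce. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model: two peer fields that must change together. */
struct peer_model {
	atomic_flag lock;
	int pid;
	unsigned int uid;
};

void peer_init(struct peer_model *p)
{
	atomic_flag_clear(&p->lock);
	p->pid = 0;
	p->uid = 0;
}

static void peer_lock(struct peer_model *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void peer_unlock(struct peer_model *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Writer: update both fields under the lock. */
void peer_set(struct peer_model *p, int pid, unsigned int uid)
{
	peer_lock(p);
	p->pid = pid;
	p->uid = uid;
	peer_unlock(p);
}

/* Reader: take a consistent snapshot of both fields under the lock. */
void peer_get(struct peer_model *p, int *pid, unsigned int *uid)
{
	peer_lock(p);
	*pid = p->pid;
	*uid = p->uid;
	peer_unlock(p);
}
```

Because readers and writers share the same lock, a reader can never observe the pid of one peer paired with the credentials of another.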
    • J
      net: dev_addr_list: handle first address in __hw_addr_add_ex · a5b8fd65
      Committed by Jakub Kicinski
      struct dev_addr_list is used for device addresses, unicast addresses
      and multicast addresses. The first of those needs special handling
      of the main address - netdev->dev_addr points directly to the data
      of the entry and drivers write to it freely, so we can't maintain
      it in the rbtree (for now, at least, to be fixed in net-next).
      
      The current workaround sprinkles special handling of the first
      address on the list throughout the code, but it missed the case
      where an address is being added. The first address will not be
      visible during subsequent adds.
      
      Syzbot found a warning where unicast addresses are modified
      without holding the rtnl lock; tl;dr is that team generates
      the same modification multiple times, not necessarily when
      the right locks are held.
      
      In the repro we have:
      
        macvlan -> team -> veth
      
      macvlan adds a unicast address to the team. Team then pushes
      that address down to its members (veths). Next something unrelated
      makes team sync member addrs again, and because of the bug
      the addr entries get duplicated in the veths. macvlan gets
      removed, removes its addr from team which removes only one
      of the duplicated addresses from veths. This removal is done
      under rtnl. Next syzbot uses iptables to add a multicast addr
      to team (which does not hold rtnl lock). Team syncs veth addrs,
      but because veths' unicast list still has the duplicate it will
      also get synced, even though this update is intended for mc addresses.
      Again, uc address updates need rtnl lock, boom.
      
      Reported-by: syzbot+7a2ab2cdc14d134de553@syzkaller.appspotmail.com
      Fixes: 406f42fa ("net-next: When a bond have a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a5b8fd65
    • V
      net: sched: flower: protect fl_walk() with rcu · d5ef1906
      Committed by Vlad Buslov
      The patch that refactored fl_walk() to use idr_for_each_entry_continue_ul()
      also removed rcu protection of individual filters, which causes the following
      use-after-free when a filter is deleted concurrently. Fix fl_walk() to obtain
      the rcu read lock while iterating and taking the filter reference, and
      temporarily release the lock while calling the arg->fn() callback, which can sleep.
      
      KASAN trace:
      
      [  352.773640] ==================================================================
      [  352.775041] BUG: KASAN: use-after-free in fl_walk+0x159/0x240 [cls_flower]
      [  352.776304] Read of size 4 at addr ffff8881c8251480 by task tc/2987
      
      [  352.777862] CPU: 3 PID: 2987 Comm: tc Not tainted 5.15.0-rc2+ #2
      [  352.778980] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  352.781022] Call Trace:
      [  352.781573]  dump_stack_lvl+0x46/0x5a
      [  352.782332]  print_address_description.constprop.0+0x1f/0x140
      [  352.783400]  ? fl_walk+0x159/0x240 [cls_flower]
      [  352.784292]  ? fl_walk+0x159/0x240 [cls_flower]
      [  352.785138]  kasan_report.cold+0x83/0xdf
      [  352.785851]  ? fl_walk+0x159/0x240 [cls_flower]
      [  352.786587]  kasan_check_range+0x145/0x1a0
      [  352.787337]  fl_walk+0x159/0x240 [cls_flower]
      [  352.788163]  ? fl_put+0x10/0x10 [cls_flower]
      [  352.789007]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
      [  352.790102]  tcf_chain_dump+0x231/0x450
      [  352.790878]  ? tcf_chain_tp_delete_empty+0x170/0x170
      [  352.791833]  ? __might_sleep+0x2e/0xc0
      [  352.792594]  ? tfilter_notify+0x170/0x170
      [  352.793400]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
      [  352.794477]  tc_dump_tfilter+0x385/0x4b0
      [  352.795262]  ? tc_new_tfilter+0x1180/0x1180
      [  352.796103]  ? __mod_node_page_state+0x1f/0xc0
      [  352.796974]  ? __build_skb_around+0x10e/0x130
      [  352.797826]  netlink_dump+0x2c0/0x560
      [  352.798563]  ? netlink_getsockopt+0x430/0x430
      [  352.799433]  ? __mutex_unlock_slowpath.constprop.0+0x220/0x220
      [  352.800542]  __netlink_dump_start+0x356/0x440
      [  352.801397]  rtnetlink_rcv_msg+0x3ff/0x550
      [  352.802190]  ? tc_new_tfilter+0x1180/0x1180
      [  352.802872]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
      [  352.803668]  ? tc_new_tfilter+0x1180/0x1180
      [  352.804344]  ? _copy_from_iter_nocache+0x800/0x800
      [  352.805202]  ? kasan_set_track+0x1c/0x30
      [  352.805900]  netlink_rcv_skb+0xc6/0x1f0
      [  352.806587]  ? rht_deferred_worker+0x6b0/0x6b0
      [  352.807455]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
      [  352.808324]  ? netlink_ack+0x4d0/0x4d0
      [  352.809086]  ? netlink_deliver_tap+0x62/0x3d0
      [  352.809951]  netlink_unicast+0x353/0x480
      [  352.810744]  ? netlink_attachskb+0x430/0x430
      [  352.811586]  ? __alloc_skb+0xd7/0x200
      [  352.812349]  netlink_sendmsg+0x396/0x680
      [  352.813132]  ? netlink_unicast+0x480/0x480
      [  352.813952]  ? __import_iovec+0x192/0x210
      [  352.814759]  ? netlink_unicast+0x480/0x480
      [  352.815580]  sock_sendmsg+0x6c/0x80
      [  352.816299]  ____sys_sendmsg+0x3a5/0x3c0
      [  352.817096]  ? kernel_sendmsg+0x30/0x30
      [  352.817873]  ? __ia32_sys_recvmmsg+0x150/0x150
      [  352.818753]  ___sys_sendmsg+0xd8/0x140
      [  352.819518]  ? sendmsg_copy_msghdr+0x110/0x110
      [  352.820402]  ? ___sys_recvmsg+0xf4/0x1a0
      [  352.821110]  ? __copy_msghdr_from_user+0x260/0x260
      [  352.821934]  ? _raw_spin_lock+0x81/0xd0
      [  352.822680]  ? __handle_mm_fault+0xef3/0x1b20
      [  352.823549]  ? rb_insert_color+0x2a/0x270
      [  352.824373]  ? copy_page_range+0x16b0/0x16b0
      [  352.825209]  ? perf_event_update_userpage+0x2d0/0x2d0
      [  352.826190]  ? __fget_light+0xd9/0xf0
      [  352.826941]  __sys_sendmsg+0xb3/0x130
      [  352.827613]  ? __sys_sendmsg_sock+0x20/0x20
      [  352.828377]  ? do_user_addr_fault+0x2c5/0x8a0
      [  352.829184]  ? fpregs_assert_state_consistent+0x52/0x60
      [  352.830001]  ? exit_to_user_mode_prepare+0x32/0x160
      [  352.830845]  do_syscall_64+0x35/0x80
      [  352.831445]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  352.832331] RIP: 0033:0x7f7bee973c17
      [  352.833078] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [  352.836202] RSP: 002b:00007ffcbb368e28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  352.837524] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7bee973c17
      [  352.838715] RDX: 0000000000000000 RSI: 00007ffcbb368e50 RDI: 0000000000000003
      [  352.839838] RBP: 00007ffcbb36d090 R08: 00000000cea96d79 R09: 00007f7beea34a40
      [  352.841021] R10: 00000000004059bb R11: 0000000000000246 R12: 000000000046563f
      [  352.842208] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcbb36d088
      
      [  352.843784] Allocated by task 2960:
      [  352.844451]  kasan_save_stack+0x1b/0x40
      [  352.845173]  __kasan_kmalloc+0x7c/0x90
      [  352.845873]  fl_change+0x282/0x22db [cls_flower]
      [  352.846696]  tc_new_tfilter+0x6cf/0x1180
      [  352.847493]  rtnetlink_rcv_msg+0x471/0x550
      [  352.848323]  netlink_rcv_skb+0xc6/0x1f0
      [  352.849097]  netlink_unicast+0x353/0x480
      [  352.849886]  netlink_sendmsg+0x396/0x680
      [  352.850678]  sock_sendmsg+0x6c/0x80
      [  352.851398]  ____sys_sendmsg+0x3a5/0x3c0
      [  352.852202]  ___sys_sendmsg+0xd8/0x140
      [  352.852967]  __sys_sendmsg+0xb3/0x130
      [  352.853718]  do_syscall_64+0x35/0x80
      [  352.854457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  352.855830] Freed by task 7:
      [  352.856421]  kasan_save_stack+0x1b/0x40
      [  352.857139]  kasan_set_track+0x1c/0x30
      [  352.857854]  kasan_set_free_info+0x20/0x30
      [  352.858609]  __kasan_slab_free+0xed/0x130
      [  352.859348]  kfree+0xa7/0x3c0
      [  352.859951]  process_one_work+0x44d/0x780
      [  352.860685]  worker_thread+0x2e2/0x7e0
      [  352.861390]  kthread+0x1f4/0x220
      [  352.862022]  ret_from_fork+0x1f/0x30
      
      [  352.862955] Last potentially related work creation:
      [  352.863758]  kasan_save_stack+0x1b/0x40
      [  352.864378]  kasan_record_aux_stack+0xab/0xc0
      [  352.865028]  insert_work+0x30/0x160
      [  352.865617]  __queue_work+0x351/0x670
      [  352.866261]  rcu_work_rcufn+0x30/0x40
      [  352.866917]  rcu_core+0x3b2/0xdb0
      [  352.867561]  __do_softirq+0xf6/0x386
      
      [  352.868708] Second to last potentially related work creation:
      [  352.869779]  kasan_save_stack+0x1b/0x40
      [  352.870560]  kasan_record_aux_stack+0xab/0xc0
      [  352.871426]  call_rcu+0x5f/0x5c0
      [  352.872108]  queue_rcu_work+0x44/0x50
      [  352.872855]  __fl_put+0x17c/0x240 [cls_flower]
      [  352.873733]  fl_delete+0xc7/0x100 [cls_flower]
      [  352.874607]  tc_del_tfilter+0x510/0xb30
      [  352.886085]  rtnetlink_rcv_msg+0x471/0x550
      [  352.886875]  netlink_rcv_skb+0xc6/0x1f0
      [  352.887636]  netlink_unicast+0x353/0x480
      [  352.888285]  netlink_sendmsg+0x396/0x680
      [  352.888942]  sock_sendmsg+0x6c/0x80
      [  352.889583]  ____sys_sendmsg+0x3a5/0x3c0
      [  352.890311]  ___sys_sendmsg+0xd8/0x140
      [  352.891019]  __sys_sendmsg+0xb3/0x130
      [  352.891716]  do_syscall_64+0x35/0x80
      [  352.892395]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  352.893666] The buggy address belongs to the object at ffff8881c8251000
                      which belongs to the cache kmalloc-2k of size 2048
      [  352.895696] The buggy address is located 1152 bytes inside of
                      2048-byte region [ffff8881c8251000, ffff8881c8251800)
      [  352.897640] The buggy address belongs to the page:
      [  352.898492] page:00000000213bac35 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c8250
      [  352.900110] head:00000000213bac35 order:3 compound_mapcount:0 compound_pincount:0
      [  352.901541] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
      [  352.902908] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042f00
      [  352.904391] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
      [  352.905861] page dumped because: kasan: bad access detected
      
      [  352.907323] Memory state around the buggy address:
      [  352.908218]  ffff8881c8251380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  352.909471]  ffff8881c8251400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  352.910735] >ffff8881c8251480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  352.912012]                    ^
      [  352.912642]  ffff8881c8251500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  352.913919]  ffff8881c8251580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  352.915185] ==================================================================
      
      Fixes: d39d7149 ("idr: introduce idr_for_each_entry_continue_ul()")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Acked-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d5ef1906
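
The iteration pattern described in commit d5ef1906 (hold the read lock while looking up an entry and taking a reference, drop it around a callback that may sleep, then retake it) can be modelled in userspace. This is a single-threaded sketch with hypothetical names: `read_lock_depth` merely stands in for the RCU read-side critical section so the ordering can be checked, and a plain integer stands in for the reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct filter_model {
	int refcnt;	/* 0 means the filter is already being destroyed */
	int handle;
};

static int read_lock_depth;	/* models rcu_read_lock() nesting */

/* Take a reference only if the object is still alive. */
static bool get_ref_not_zero(struct filter_model *f)
{
	if (f->refcnt == 0)
		return false;
	f->refcnt++;
	return true;
}

typedef int (*walk_fn)(struct filter_model *f, void *arg);

/* Returns the number of filters successfully passed to fn. */
int walk_model(struct filter_model *tbl, size_t n, walk_fn fn, void *arg)
{
	int count = 0;

	read_lock_depth++;			/* enter read-side section */
	for (size_t i = 0; i < n; i++) {
		struct filter_model *f = &tbl[i];
		int err;

		if (!get_ref_not_zero(f))
			continue;		/* being deleted: skip it */
		read_lock_depth--;		/* fn may sleep: leave section */
		err = fn(f, arg);
		f->refcnt--;			/* drop our reference */
		read_lock_depth++;		/* re-enter before continuing */
		if (err < 0)
			break;
		count++;
	}
	read_lock_depth--;
	return count;
}

/* Example callback: checks the invariants and sums the handles. */
int sum_handles_cb(struct filter_model *f, void *arg)
{
	assert(read_lock_depth == 0);	/* called outside the section */
	assert(f->refcnt >= 2);		/* walker's reference is held */
	*(int *)arg += f->handle;
	return 0;
}
```

The reference keeps the filter alive while the lock is dropped, which is what makes it safe to let the callback sleep.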
    • P
      net: introduce and use lock_sock_fast_nested() · 49054556
      Committed by Paolo Abeni
      Syzkaller reported a false positive deadlock involving
      the nl socket lock and the subflow socket lock:
      
      MPTCP: kernel_bind error, err=-98
      ============================================
      WARNING: possible recursive locking detected
      5.15.0-rc1-syzkaller #0 Not tainted
      --------------------------------------------
      syz-executor998/6520 is trying to acquire lock:
      ffff8880795718a0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738
      
      but task is already holding lock:
      ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
      ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(k-sk_lock-AF_INET);
        lock(k-sk_lock-AF_INET);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by syz-executor998/6520:
       #0: ffffffff8d176c50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 net/netlink/genetlink.c:802
       #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline]
       #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x3e0/0x580 net/netlink/genetlink.c:790
       #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
       #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720
      
      stack backtrace:
      CPU: 1 PID: 6520 Comm: syz-executor998 Not tainted 5.15.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_deadlock_bug kernel/locking/lockdep.c:2944 [inline]
       check_deadlock kernel/locking/lockdep.c:2987 [inline]
       validate_chain kernel/locking/lockdep.c:3776 [inline]
       __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5015
       lock_acquire kernel/locking/lockdep.c:5625 [inline]
       lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
       lock_sock_fast+0x36/0x100 net/core/sock.c:3229
       mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
       __sock_release net/socket.c:649 [inline]
       sock_release+0x87/0x1b0 net/socket.c:677
       mptcp_pm_nl_create_listen_socket+0x238/0x2c0 net/mptcp/pm_netlink.c:900
       mptcp_nl_cmd_add_addr+0x359/0x930 net/mptcp/pm_netlink.c:1170
       genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731
       genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
       genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_no_sendpage+0x101/0x150 net/core/sock.c:2980
       kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3504
       kernel_sendpage net/socket.c:3501 [inline]
       sock_sendpage+0xe5/0x140 net/socket.c:1003
       pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       direct_splice_actor+0x110/0x180 fs/splice.c:936
       splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891
       do_splice_direct+0x1b3/0x280 fs/splice.c:979
       do_sendfile+0xae9/0x1240 fs/read_write.c:1249
       __do_sys_sendfile64 fs/read_write.c:1314 [inline]
       __se_sys_sendfile64 fs/read_write.c:1300 [inline]
       __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1300
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f215cb69969
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffc96bb3868 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 00007f215cbad072 RCX: 00007f215cb69969
      RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 00007ffc96bb3a08 R09: 00007ffc96bb3a08
      R10: 0000000100000002 R11: 0000000000000246 R12: 00007ffc96bb387c
      R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000
      
      The problem originates from an incorrect lock annotation in the mptcp
      code and is only visible since commit 2dcb96ba ("net: core: Correct
      the sock::sk_lock.owned lockdep annotations"), but has been present
      since the initial implementation of port-based endpoint support.
      
      This patch addresses the issue by introducing a nested variant of
      lock_sock_fast() and using it in the relevant code path.
      
      Fixes: 1729cf18 ("mptcp: create the listening socket for new port")
      Fixes: 2dcb96ba ("net: core: Correct the sock::sk_lock.owned lockdep annotations")
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-and-tested-by: syzbot+1dd53f7a89b299d59eaf@syzkaller.appspotmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      49054556
  8. 29 Sep, 2021 1 commit
  9. 28 Sep, 2021 6 commits
    • K
      af_unix: Return errno instead of NULL in unix_create1(). · f4bd73b5
      Kuniyuki Iwashima authored
      unix_create1() returns NULL on error, and the callers assume that it never
      fails for reasons other than out of memory.  So, the callers always return
      -ENOMEM when unix_create1() fails.
      
      However, it also returns NULL when the number of af_unix sockets exceeds
      twice the limit controlled by sysctl: fs.file-max.  In this case, the
      callers should return -ENFILE like alloc_empty_file().
      
      This patch changes unix_create1() to return the correct error value instead
      of NULL on error.
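
      The idea of returning an encoded errno instead of NULL can be
      sketched in userspace as follows. This is only an illustration:
      err_ptr(), is_err() and ptr_err() are hand-written stand-ins for the
      kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers, and demo_sock,
      demo_create() and demo_open() are hypothetical names, not the code
      changed by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define DEMO_MAX_ERRNO 4095

static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static int is_err(const void *ptr)
{
	/* The top DEMO_MAX_ERRNO addresses are reserved for error codes. */
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_sock { int id; };	/* hypothetical stand-in for struct sock */

/*
 * Hypothetical analogue of unix_create1(): report why creation failed
 * instead of collapsing every failure into NULL.
 */
static struct demo_sock *demo_create(long nr_socks, long max_files)
{
	struct demo_sock *sk;

	assert(max_files >= 0);
	if (nr_socks >= 2 * max_files)
		return err_ptr(-ENFILE);	/* too many sockets */

	sk = calloc(1, sizeof(*sk));
	if (!sk)
		return err_ptr(-ENOMEM);	/* allocation failure */
	return sk;
}

/* Caller side: propagate the precise errno rather than assuming -ENOMEM. */
static int demo_open(long nr_socks, long max_files)
{
	struct demo_sock *sk = demo_create(nr_socks, max_files);

	if (is_err(sk))
		return (int)ptr_err(sk);
	free(sk);
	return 0;
}
```

      With this shape, a caller that hits the socket-count limit sees
      -ENFILE, while an allocation failure still surfaces as -ENOMEM.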
      
      Out of curiosity, the assumption has been wrong since 1999 due to this
      change introduced in 2.2.4 [0].
      
        diff -u --recursive --new-file v2.2.3/linux/net/unix/af_unix.c linux/net/unix/af_unix.c
        --- v2.2.3/linux/net/unix/af_unix.c	Tue Jan 19 11:32:53 1999
        +++ linux/net/unix/af_unix.c	Sun Mar 21 07:22:00 1999
        @@ -388,6 +413,9 @@
         {
         	struct sock *sk;
      
        +	if (atomic_read(&unix_nr_socks) >= 2*max_files)
        +		return NULL;
        +
         	MOD_INC_USE_COUNT;
         	sk = sk_alloc(PF_UNIX, GFP_KERNEL, 1);
         	if (!sk) {
      
      [0]: https://cdn.kernel.org/pub/linux/kernel/v2.2/patch-2.2.4.gz
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4bd73b5
    • E
      net: udp: annotate data race around udp_sk(sk)->corkflag · a9f59707
      Eric Dumazet authored
      up->corkflag field can be read or written without any lock.
      Annotate accesses to avoid possible syzbot/KCSAN reports.
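
      As a rough userspace analogue of annotating a lockless field, the
      sketch below uses C11 relaxed atomics where the kernel would use
      READ_ONCE()/WRITE_ONCE(). The demo_udp_sock type and both helper
      names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-socket cork flag. */
struct demo_udp_sock {
	atomic_int corkflag;
};

/* Writer side (for example a setsockopt path), analogous to WRITE_ONCE(). */
static void demo_set_cork(struct demo_udp_sock *up, int val)
{
	assert(up);
	atomic_store_explicit(&up->corkflag, val ? 1 : 0,
			      memory_order_relaxed);
}

/* Reader side (for example a sendmsg path), analogous to READ_ONCE(). */
static int demo_is_corked(struct demo_udp_sock *up)
{
	assert(up);
	return atomic_load_explicit(&up->corkflag, memory_order_relaxed);
}
```

      Relaxed ordering is used because the point is only to make each
      access a single, marked load or store; no ordering against other
      fields is implied.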
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9f59707
    • P
      netfilter: nf_tables: reverse order in rule replacement expansion · 2c964c55
      Pablo Neira Ayuso authored
      Deactivate the old rule first, then append the new rule, so that the
      rule replacement notification via netlink first reports the deletion
      of the old rule with handle X and then the addition of the new rule
      (reusing handle X of the replaced old rule).
      
      Note that the abort path releases the transaction that has been created
      by nft_delrule() on error.
      
      Fixes: ca089878 ("netfilter: nf_tables: deactivate expressions in rule replecement routine")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      2c964c55
    • P
      netfilter: nf_tables: add position handle in event notification · e189ae16
      Pablo Neira Ayuso authored
      Add the position handle so that the rule location can be identified
      from netlink events. Otherwise, userspace cannot incrementally update
      a userspace cache through monitoring events.
      
      Skip the handle dump if the rule has been either inserted (at the
      beginning of the ruleset) or appended (at the end of the ruleset); the
      NLM_F_APPEND netlink flag is sufficient in these two cases.
      
      Handle NLM_F_REPLACE as NLM_F_APPEND since the rule replacement
      expansion appends it after the specified rule handle.
      
      Fixes: 96518518 ("netfilter: add nftables")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      e189ae16
    • F
      netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1 · 339031ba
      Florian Westphal authored
      This is a revert of
      7b1957b0 ("netfilter: nf_defrag_ipv4: use net_generic infra")
      and a partial revert of
      8b0adbe3 ("netfilter: nf_defrag_ipv6: use net_generic infra").
      
      If conntrack is built in and the kernel is booted with:
      nf_conntrack.enable_hooks=1
      
      ... the kernel will fail to boot due to a NULL dereference in
      nf_defrag_ipv4_enable(): it is called before the ipv4 defrag initcall
      has run, so net_generic() returns NULL.
      
      To resolve this, move the user refcount back to struct net so calls
      to those functions are possible even before their initcalls have run.
      
      Fixes: 7b1957b0 ("netfilter: nf_defrag_ipv4: use net_generic infra")
      Fixes: 8b0adbe3 ("netfilter: nf_defrag_ipv6: use net_generic infra")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      339031ba
    • D
      bpf, test, cgroup: Use sk_{alloc,free} for test cases · 435b08ec
      Daniel Borkmann authored
      BPF test infra has some hacks in place which kzalloc() a socket and perform
      minimal init via sock_net_set() and sock_init_data(). As a result, the sk's
      skcd->cgroup is NULL since it didn't go through proper initialization, as
      would have been the case with sk_alloc(). Rather than re-adding a NULL test
      in sock_cgroup_ptr() just for this, use the sk_{alloc,free}() pair for the
      test socket. The latter also allows getting rid of the bpf_sk_storage_free()
      special case.
      
      Fixes: 8520e224 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode")
      Fixes: b7a1848e ("bpf: add BPF_PROG_TEST_RUN support for flow dissector")
      Fixes: 2cb494a3 ("bpf: add tests for direct packet access from CGROUP_SKB")
      Reported-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com
      Reported-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com
      Tested-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/bpf/20210927123921.21535-2-daniel@iogearbox.net
      435b08ec
  10. 27 Sep, 2021 2 commits
  11. 26 Sep, 2021 1 commit
    • net: prevent user from passing illegal stab size · b193e15a
      王贇 authored
      We observed the report below when playing with a netlink socket:
      
        UBSAN: shift-out-of-bounds in net/sched/sch_api.c:580:10
        shift exponent 249 is too large for 32-bit type
        CPU: 0 PID: 685 Comm: a.out Not tainted
        Call Trace:
         dump_stack_lvl+0x8d/0xcf
         ubsan_epilogue+0xa/0x4e
         __ubsan_handle_shift_out_of_bounds+0x161/0x182
         __qdisc_calculate_pkt_len+0xf0/0x190
         __dev_queue_xmit+0x2ed/0x15b0
      
      It seems that the kernel does not check the stab log values passed in
      from userspace, and later uses the insane value to calculate pkt_len.
      
      This patch adds a check on size_log/cell_log to avoid the insane
      calculation.
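
      A minimal sketch of this kind of check, in userspace C, could look
      like the following. The struct, the function names and the
      DEMO_LOG_MAX bound are assumptions chosen for illustration; the
      actual limit and error reporting used by the patch are not shown in
      this message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed upper bound for shifts applied to a 32-bit value. */
#define DEMO_LOG_MAX 30

struct demo_sizespec {
	uint8_t cell_log;	/* user-supplied shift for the cell size */
	uint8_t size_log;	/* user-supplied shift for the result */
};

/* Reject shift exponents that would be undefined for a 32-bit type. */
static int demo_validate(const struct demo_sizespec *s)
{
	assert(s);
	if (s->cell_log > DEMO_LOG_MAX || s->size_log > DEMO_LOG_MAX)
		return -EINVAL;
	return 0;
}

/* Only shift once validation has succeeded. */
static int demo_scale(const struct demo_sizespec *s, uint32_t len,
		      uint32_t *out)
{
	int err = demo_validate(s);

	if (err)
		return err;
	*out = (len >> s->cell_log) << s->size_log;
	return 0;
}
```

      An exponent such as the 249 seen in the UBSAN report is rejected up
      front, so the shift is never evaluated with it.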
      
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b193e15a
  12. 24 Sep, 2021 4 commits
  13. 23 Sep, 2021 7 commits
    • I
      nexthop: Fix memory leaks in nexthop notification chain listeners · 3106a084
      Ido Schimmel authored
      syzkaller discovered memory leaks [1] that can be reduced to the
      following commands:
      
       # ip nexthop add id 1 blackhole
       # devlink dev reload pci/0000:06:00.0
      
      As part of the reload flow, mlxsw will unregister its netdevs and then
      unregister from the nexthop notification chain. Before unregistering
      from the notification chain, mlxsw will receive delete notifications for
      nexthop objects using netdevs registered by mlxsw or their uppers. mlxsw
      will not receive notifications for nexthops using netdevs that are not
      dismantled as part of the reload flow. For example, the blackhole
      nexthop above that internally uses the loopback netdev as its nexthop
      device.
      
      One way to fix this problem is to have listeners flush their nexthop
      tables after unregistering from the notification chain. This is
      error-prone, as evidenced by this patch, and also not symmetric with
      the registration path, where a listener receives a dump of all the
      existing nexthops.
      
      Therefore, fix this problem by replaying delete notifications for the
      listener being unregistered. This is symmetric to the registration path
      and also consistent with the netdev notification chain.
      
      The above means that unregister_nexthop_notifier(), like
      register_nexthop_notifier(), will have to take RTNL in order to iterate
      over the existing nexthops and that any callers of the function cannot
      hold RTNL. This is true for mlxsw and netdevsim, but not for the VXLAN
      driver. To avoid a deadlock, change the latter to unregister its nexthop
      listener without holding RTNL, making it symmetric to the registration
      path.
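
      The symmetry argued for above can be sketched with a toy notifier
      in plain C. Everything here (demo_listener, demo_register(),
      demo_unregister(), the fixed-size nexthop table) is a hypothetical
      model, not the nexthop code itself, and locking such as RTNL is left
      out entirely.

```c
#include <assert.h>
#include <stddef.h>

enum demo_event { DEMO_NH_ADD, DEMO_NH_DEL };

/* A listener keeps per-nexthop state; here just a count of live objects. */
struct demo_listener {
	int tracked;
	void (*notify)(struct demo_listener *l, enum demo_event ev, int nh_id);
};

#define DEMO_MAX_NH 8
static int demo_nexthops[DEMO_MAX_NH];	/* ids of existing nexthops */
static size_t demo_nr_nexthops;

static void demo_replay(struct demo_listener *l, enum demo_event ev)
{
	for (size_t i = 0; i < demo_nr_nexthops; i++)
		l->notify(l, ev, demo_nexthops[i]);
}

/* Registration dumps every existing nexthop to the new listener... */
static void demo_register(struct demo_listener *l)
{
	assert(l && l->notify);
	demo_replay(l, DEMO_NH_ADD);
}

/* ...and unregistration mirrors it by replaying deletes. */
static void demo_unregister(struct demo_listener *l)
{
	assert(l && l->notify);
	demo_replay(l, DEMO_NH_DEL);
}

/* Example callback: allocate on add, release on delete. */
static void demo_count(struct demo_listener *l, enum demo_event ev, int nh_id)
{
	(void)nh_id;
	l->tracked += (ev == DEMO_NH_ADD) ? 1 : -1;
}
```

      Because every add seen at registration is matched by a delete at
      unregistration, a listener that frees its object on delete ends up
      with nothing left allocated, including for nexthops whose device is
      never dismantled.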
      
      [1]
      unreferenced object 0xffff88806173d600 (size 512):
        comm "syz-executor.0", pid 1290, jiffies 4295583142 (age 143.507s)
        hex dump (first 32 bytes):
          41 9d 1e 60 80 88 ff ff 08 d6 73 61 80 88 ff ff  A..`......sa....
          08 d6 73 61 80 88 ff ff 01 00 00 00 00 00 00 00  ..sa............
        backtrace:
          [<ffffffff81a6b576>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<ffffffff81a6b576>] slab_post_alloc_hook+0x96/0x490 mm/slab.h:522
          [<ffffffff81a716d3>] slab_alloc_node mm/slub.c:3206 [inline]
          [<ffffffff81a716d3>] slab_alloc mm/slub.c:3214 [inline]
          [<ffffffff81a716d3>] kmem_cache_alloc_trace+0x163/0x370 mm/slub.c:3231
          [<ffffffff82e8681a>] kmalloc include/linux/slab.h:591 [inline]
          [<ffffffff82e8681a>] kzalloc include/linux/slab.h:721 [inline]
          [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_group_create drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4918 [inline]
          [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_new drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5054 [inline]
          [<ffffffff82e8681a>] mlxsw_sp_nexthop_obj_event+0x59a/0x2910 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5239
          [<ffffffff813ef67d>] notifier_call_chain+0xbd/0x210 kernel/notifier.c:83
          [<ffffffff813f0662>] blocking_notifier_call_chain kernel/notifier.c:318 [inline]
          [<ffffffff813f0662>] blocking_notifier_call_chain+0x72/0xa0 kernel/notifier.c:306
          [<ffffffff8384b9c6>] call_nexthop_notifiers+0x156/0x310 net/ipv4/nexthop.c:244
          [<ffffffff83852bd8>] insert_nexthop net/ipv4/nexthop.c:2336 [inline]
          [<ffffffff83852bd8>] nexthop_add net/ipv4/nexthop.c:2644 [inline]
          [<ffffffff83852bd8>] rtm_new_nexthop+0x14e8/0x4d10 net/ipv4/nexthop.c:2913
          [<ffffffff833e9a78>] rtnetlink_rcv_msg+0x448/0xbf0 net/core/rtnetlink.c:5572
          [<ffffffff83608703>] netlink_rcv_skb+0x173/0x480 net/netlink/af_netlink.c:2504
          [<ffffffff833de032>] rtnetlink_rcv+0x22/0x30 net/core/rtnetlink.c:5590
          [<ffffffff836069de>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
          [<ffffffff836069de>] netlink_unicast+0x5ae/0x7f0 net/netlink/af_netlink.c:1340
          [<ffffffff83607501>] netlink_sendmsg+0x8e1/0xe30 net/netlink/af_netlink.c:1929
          [<ffffffff832fde84>] sock_sendmsg_nosec net/socket.c:704 [inline]
          [<ffffffff832fde84>] sock_sendmsg net/socket.c:724 [inline]
          [<ffffffff832fde84>] ____sys_sendmsg+0x874/0x9f0 net/socket.c:2409
          [<ffffffff83304a44>] ___sys_sendmsg+0x104/0x170 net/socket.c:2463
          [<ffffffff83304c01>] __sys_sendmsg+0x111/0x1f0 net/socket.c:2492
          [<ffffffff83304d5d>] __do_sys_sendmsg net/socket.c:2501 [inline]
          [<ffffffff83304d5d>] __se_sys_sendmsg net/socket.c:2499 [inline]
          [<ffffffff83304d5d>] __x64_sys_sendmsg+0x7d/0xc0 net/socket.c:2499
      
      Fixes: 2a014b20 ("mlxsw: spectrum_router: Add support for nexthop objects")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3106a084
    • J
      mac80211: mesh: fix potentially unaligned access · b9731062
      Johannes Berg authored
      The pointer here points directly into the frame, so the
      access is potentially unaligned. Use get_unaligned_le16
      to avoid that.
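
      For readers unfamiliar with the helper, a userspace equivalent of
      an unaligned little-endian 16-bit read can be written as below. The
      function name demo_get_unaligned_le16() is made up for this sketch;
      it mirrors the behaviour of the kernel helper rather than its
      implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assemble the value byte by byte so the access is valid at any address
 * and gives the same result on any host endianness.
 */
static uint16_t demo_get_unaligned_le16(const void *p)
{
	const uint8_t *b = p;

	assert(b);
	return (uint16_t)(b[0] | (b[1] << 8));
}
```

      Dereferencing a uint16_t pointer at an odd offset into a frame
      buffer would instead be a misaligned access, which some
      architectures trap on.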
      
      Fixes: 3f52b7e3 ("mac80211: mesh power save basics")
      Link: https://lore.kernel.org/r/20210920154009.3110ff75be0c.Ib6a2ff9e9cc9bc6fca50fce631ec1ce725cc926b@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      b9731062
    • L
      mac80211: limit injected vht mcs/nss in ieee80211_parse_tx_radiotap · 13cb6d82
      Lorenzo Bianconi authored
      Limit max values for vht mcs and nss in ieee80211_parse_tx_radiotap
      routine in order to fix the following warning reported by syzbot:
      
      WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_rate_set_vht include/net/mac80211.h:989 [inline]
      WARNING: CPU: 0 PID: 10717 at include/net/mac80211.h:989 ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244
      Modules linked in:
      CPU: 0 PID: 10717 Comm: syz-executor.5 Not tainted 5.14.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:ieee80211_rate_set_vht include/net/mac80211.h:989 [inline]
      RIP: 0010:ieee80211_parse_tx_radiotap+0x101e/0x12d0 net/mac80211/tx.c:2244
      RSP: 0018:ffffc9000186f3e8 EFLAGS: 00010216
      RAX: 0000000000000618 RBX: ffff88804ef76500 RCX: ffffc900143a5000
      RDX: 0000000000040000 RSI: ffffffff888f478e RDI: 0000000000000003
      RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000100
      R10: ffffffff888f46f9 R11: 0000000000000000 R12: 00000000fffffff8
      R13: ffff88804ef7653c R14: 0000000000000001 R15: 0000000000000004
      FS:  00007fbf5718f700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2de23000 CR3: 000000006a671000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      Call Trace:
       ieee80211_monitor_select_queue+0xa6/0x250 net/mac80211/iface.c:740
       netdev_core_pick_tx+0x169/0x2e0 net/core/dev.c:4089
       __dev_queue_xmit+0x6f9/0x3710 net/core/dev.c:4165
       __bpf_tx_skb net/core/filter.c:2114 [inline]
       __bpf_redirect_no_mac net/core/filter.c:2139 [inline]
       __bpf_redirect+0x5ba/0xd20 net/core/filter.c:2162
       ____bpf_clone_redirect net/core/filter.c:2429 [inline]
       bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2401
       bpf_prog_eeb6f53a69e5c6a2+0x59/0x234
       bpf_dispatcher_nop_func include/linux/bpf.h:717 [inline]
       __bpf_prog_run include/linux/filter.h:624 [inline]
       bpf_prog_run include/linux/filter.h:631 [inline]
       bpf_test_run+0x381/0xa30 net/bpf/test_run.c:119
       bpf_prog_test_run_skb+0xb84/0x1ee0 net/bpf/test_run.c:663
       bpf_prog_test_run kernel/bpf/syscall.c:3307 [inline]
       __sys_bpf+0x2137/0x5df0 kernel/bpf/syscall.c:4605
       __do_sys_bpf kernel/bpf/syscall.c:4691 [inline]
       __se_sys_bpf kernel/bpf/syscall.c:4689 [inline]
       __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4689
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665f9
      
      Reported-by: syzbot+0196ac871673f0c20f68@syzkaller.appspotmail.com
      Fixes: 646e76bb ("mac80211: parse VHT info in injected frames")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/c26c3f02dcb38ab63b2f2534cb463d95ee81bb13.1632141760.git.lorenzo@kernel.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      13cb6d82
    • Y
      mac80211: Drop frames from invalid MAC address in ad-hoc mode · a6555f84
      YueHaibing authored
      WARNING: CPU: 1 PID: 9 at net/mac80211/sta_info.c:554
      sta_info_insert_rcu+0x121/0x12a0
      Modules linked in:
      CPU: 1 PID: 9 Comm: kworker/u8:1 Not tainted 5.14.0-rc7+ #253
      Workqueue: phy3 ieee80211_iface_work
      RIP: 0010:sta_info_insert_rcu+0x121/0x12a0
      ...
      Call Trace:
       ieee80211_ibss_finish_sta+0xbc/0x170
       ieee80211_ibss_work+0x13f/0x7d0
       ieee80211_iface_work+0x37a/0x500
       process_one_work+0x357/0x850
       worker_thread+0x41/0x4d0
      
      If an Ad-Hoc node receives packets with an invalid source MAC address,
      it hits a WARN_ON in sta_info_insert_check(), which can spam the log.
      
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Link: https://lore.kernel.org/r/20210827144230.39944-1-yuehaibing@huawei.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      a6555f84
    • C
      mac80211: Fix ieee80211_amsdu_aggregate frag_tail bug · fe94bac6
      Chih-Kang Chang authored
      ieee80211_amsdu_aggregate() sets a pointer, frag_tail, to the end of
      skb_shinfo(head)->frag_list and uses it to link the other skb at the
      end of the function. However, when the call chain
      ieee80211_amsdu_aggregate() -> ieee80211_amsdu_realloc_pad() ->
      pskb_expand_head() runs, the address of skb_shinfo(head)->frag_list
      changes, and ieee80211_amsdu_aggregate() does not update frag_tail
      after the call to pskb_expand_head(). As a result, the second skb
      cannot be linked to the head skb properly. Update the address of
      frag_tail to fix this.
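
      The stale-pointer hazard can be modelled in userspace as follows.
      This is a simplified sketch under stated assumptions: demo_head,
      demo_shinfo, demo_frag and demo_expand() are hypothetical stand-ins
      for the skb structures and pskb_expand_head(), and the fix shown
      (looking the tail up again after the buffer moved) illustrates the
      idea rather than reproducing the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_frag { struct demo_frag *next; };

/* Stand-in for the shared info that lives inside the head's buffer. */
struct demo_shinfo { struct demo_frag *frag_list; };

struct demo_head { struct demo_shinfo *shinfo; };

/* In the spirit of pskb_expand_head(): the buffer moves, old addresses die. */
static int demo_expand(struct demo_head *h)
{
	struct demo_shinfo *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	memcpy(n, h->shinfo, sizeof(*n));
	free(h->shinfo);
	h->shinfo = n;
	return 0;
}

/* Walk to the slot where the next fragment has to be linked. */
static struct demo_frag **demo_find_tail(struct demo_head *h)
{
	struct demo_frag **tail = &h->shinfo->frag_list;

	while (*tail)
		tail = &(*tail)->next;
	return tail;
}

static int demo_aggregate(struct demo_head *h, struct demo_frag *skb)
{
	struct demo_frag **frag_tail = demo_find_tail(h);

	assert(skb && !skb->next);
	if (demo_expand(h))
		return -1;
	/* frag_tail may point into the freed buffer now: look it up again. */
	frag_tail = demo_find_tail(h);
	*frag_tail = skb;
	return 0;
}
```

      Without the second lookup, the store through frag_tail would land
      in the old buffer and the new fragment would never become reachable
      from the head.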
      
      Fixes: 6e0456b5 ("mac80211: add A-MSDU tx support")
      Signed-off-by: Chih-Kang Chang <gary.chang@realtek.com>
      Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
      Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
      Link: https://lore.kernel.org/r/20210830073240.12736-1-pkshih@realtek.com
      [reword comment]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      fe94bac6
    • F
      Revert "mac80211: do not use low data rates for data frames with no ack flag" · 98d46b02
      Felix Fietkau authored
      This reverts commit d3333223 ("mac80211: do not use low data rates for
      data frames with no ack flag").
      
      Returning false early in rate_control_send_low breaks sending broadcast
      packets, since rate control will not select a rate for it.
      
      Before re-introducing a fixed version of this patch, we should probably also
      make some changes to rate control to be more conservative in selecting rates
      for no-ack packets and also prevent using probing rates on them, since we won't
      get any feedback.
      
      Fixes: d3333223 ("mac80211: do not use low data rates for data frames with no ack flag")
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Link: https://lore.kernel.org/r/20210906083559.9109-1-nbd@nbd.name
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      98d46b02
    • N
      xfrm: fix rcu lock in xfrm_notify_userpolicy() · 93ec1320
      Nicolas Dichtel authored
      As stated in the comment above xfrm_nlmsg_multicast(), rcu read lock must
      be held before calling this function.
      
      Reported-by: syzbot+3d9866419b4aa8f985d6@syzkaller.appspotmail.com
      Fixes: 703b94b93c19 ("xfrm: notify default policy on update")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      93ec1320
  14. 22 Sep, 2021 1 commit
    • P
      mptcp: ensure tx skbs always have the MPTCP ext · 977d293e
      Paolo Abeni authored
      Due to signed/unsigned comparison, the expression:
      
      	info->size_goal - skb->len > 0
      
      evaluates to true when the size goal is smaller than the
      skb size. That results in lack of tx cache refill, so that
      the skb allocated by the core TCP code lacks the required
      MPTCP skb extensions.
      
      Due to the above, syzbot is able to trigger the following WARN_ON():
      
      WARNING: CPU: 1 PID: 810 at net/mptcp/protocol.c:1366 mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
      Modules linked in:
      CPU: 1 PID: 810 Comm: syz-executor.4 Not tainted 5.14.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
      Code: ff 4c 8b 74 24 50 48 8b 5c 24 58 e9 0f fb ff ff e8 13 44 8b f8 4c 89 e7 45 31 ed e8 98 57 2e fe e9 81 f4 ff ff e8 fe 43 8b f8 <0f> 0b 41 bd ea ff ff ff e9 6f f4 ff ff 4c 89 e7 e8 b9 8e d2 f8 e9
      RSP: 0018:ffffc9000531f6a0 EFLAGS: 00010216
      RAX: 000000000000697f RBX: 0000000000000000 RCX: ffffc90012107000
      RDX: 0000000000040000 RSI: ffffffff88eac9e2 RDI: 0000000000000003
      RBP: ffff888078b15780 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff88eac017 R11: 0000000000000000 R12: ffff88801de0a280
      R13: 0000000000006b58 R14: ffff888066278280 R15: ffff88803c2fe9c0
      FS:  00007fd9f866e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007faebcb2f718 CR3: 00000000267cb000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __mptcp_push_pending+0x1fb/0x6b0 net/mptcp/protocol.c:1547
       mptcp_release_cb+0xfe/0x210 net/mptcp/protocol.c:3003
       release_sock+0xb4/0x1b0 net/core/sock.c:3206
       sk_stream_wait_memory+0x604/0xed0 net/core/stream.c:145
       mptcp_sendmsg+0xc39/0x1bc0 net/mptcp/protocol.c:1749
       inet6_sendmsg+0x99/0xe0 net/ipv6/af_inet6.c:643
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x2a0/0x3e0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2163 [inline]
       new_sync_write+0x40b/0x640 fs/read_write.c:507
       vfs_write+0x7cf/0xae0 fs/read_write.c:594
       ksys_write+0x1ee/0x250 fs/read_write.c:647
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665f9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd9f866e188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000000000056c038 RCX: 00000000004665f9
      RDX: 00000000000e7b78 RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056c038
      R13: 0000000000a9fb1f R14: 00007fd9f866e300 R15: 0000000000022000
      
      Fix the issue by rewriting the relevant expression to avoid
      sign-related problems (note: size_goal is always >= 0).
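
      The signed/unsigned pitfall is easy to reproduce outside the
      kernel. In the sketch below, the function names are hypothetical and
      the parameter types (int for the size goal, unsigned int for the
      length) are assumptions made to mirror the expression quoted
      earlier; the sign-safe variant is one possible rewrite, not
      necessarily the one used by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Buggy form: size_goal is converted to unsigned int before the
 * subtraction, so a "negative" difference wraps to a large positive value.
 */
static bool demo_has_room_buggy(int size_goal, unsigned int len)
{
	return size_goal - len > 0;
}

/* Sign-safe form, relying on the stated invariant size_goal >= 0. */
static bool demo_has_room(int size_goal, unsigned int len)
{
	assert(size_goal >= 0);
	return (unsigned int)size_goal > len;
}
```

      With a size goal of 100 and a length of 200, the buggy form reports
      that there is room left, while the sign-safe form does not.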
      
      Additionally, ensure that the skb in the tx cache always carries
      the relevant extension.
      
      Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com
      Fixes: 1094c6fe ("mptcp: fix possible divide by zero")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      977d293e
  15. 21 Sep, 2021 3 commits
    • V
      net: dsa: don't allocate the slave_mii_bus using devres · 5135e96a
      Vladimir Oltean authored
      The Linux device model permits both the ->shutdown and ->remove driver
      methods to get called during a shutdown procedure. Example: a DSA switch
      which sits on an SPI bus, and the SPI bus driver calls this on its
      ->shutdown method:
      
      spi_unregister_controller
      -> device_for_each_child(&ctlr->dev, NULL, __unregister);
         -> spi_unregister_device(to_spi_device(dev));
            -> device_del(&spi->dev);
      
      So this is a simple pattern which can theoretically appear on any bus,
      although the only other buses on which I've been able to find it are
      I2C:
      
      i2c_del_adapter
      -> device_for_each_child(&adap->dev, NULL, __unregister_client);
         -> i2c_unregister_device(client);
            -> device_unregister(&client->dev);
      
      The implication of this pattern is that devices on these buses can be
      unregistered after having been shut down. The drivers for these devices
      might choose to return early either from ->remove or ->shutdown if the
      other callback has already run once, and they might choose that the
      ->shutdown method should only perform a subset of the teardown done by
      ->remove (to avoid unnecessary delays when rebooting).
      
      So in other words, the device driver may choose on ->remove to not
      do anything (therefore to not unregister an MDIO bus it has registered
      on ->probe), because this ->remove is actually triggered by the
      device_shutdown path, and its ->shutdown method has already run and done
      the minimally required cleanup.
      
      This used to be fine until the blamed commit, but now, the following
      BUG_ON triggers:
      
      void mdiobus_free(struct mii_bus *bus)
      {
      	/* For compatibility with error handling in drivers. */
      	if (bus->state == MDIOBUS_ALLOCATED) {
      		kfree(bus);
      		return;
      	}
      
      	BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
      	bus->state = MDIOBUS_RELEASED;
      
      	put_device(&bus->dev);
      }
      
      In other words, there is an attempt to free an MDIO bus which was not
      unregistered. The attempt to free it comes from the devres release
      callbacks of the SPI device, which are executed after the device is
      unregistered.
      
      I'm not saying that the fact that MDIO buses allocated using devres
      would automatically get unregistered wasn't strange. I'm just saying
      that the commit didn't care about auditing existing call paths in the
      kernel, and now, the following code sequences are potentially buggy:
      
      (a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
          located on a bus that unregisters its children on shutdown. After
          the blamed patch, either both the alloc and the register should use
          devres, or none should.
      
      (b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
          mdiobus_unregister at all in the remove path. After the blamed
          patch, nobody unregisters the MDIO bus anymore, so this is even more
          buggy than the previous case: that one needs a specific bus
          configuration to be seen, whereas this one is an unconditional bug.
      
      In this case, DSA falls into category (a), it tries to be helpful and
      registers an MDIO bus on behalf of the switch, which might be on such a
      bus. I've no idea why it does it under devres.
      
      It does this on probe:
      
      	if (!ds->slave_mii_bus && ds->ops->phy_read)
      		alloc and register mdio bus
      
      and this on remove:
      
      	if (ds->slave_mii_bus && ds->ops->phy_read)
      		unregister mdio bus
      
      I _could_ imagine using devres because the condition used on remove is
      different than the condition used on probe. So strictly speaking, DSA
      cannot determine whether the ds->slave_mii_bus it sees on remove is the
      ds->slave_mii_bus that _it_ has allocated on probe. Using devres would
      have solved that problem. But nonetheless, the existing code already
      proceeds to unregister the MDIO bus, even though it might be
      unregistering an MDIO bus it has never registered. So I can only guess
      that no driver that implements ds->ops->phy_read also allocates and
      registers ds->slave_mii_bus itself.
      
      So in that case, if unregistering is fine, freeing must be fine too.
      
      Stop using devres and free the MDIO bus manually. This will make devres
      stop attempting to free a still registered MDIO bus on ->shutdown.
      
      Fixes: ac3a68d5 ("net: phy: don't abuse devres in devm_mdiobus_register()")
      Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5135e96a
    • V
      net: dsa: fix dsa_tree_setup error path · e5845aa0
      Vladimir Oltean authored
      Since the blamed commit, dsa_tree_teardown_switches() was split into two
      smaller functions, dsa_tree_teardown_switches and dsa_tree_teardown_ports.
      
      However, the error path of dsa_tree_setup stopped calling dsa_tree_teardown_ports.
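
      The general shape of such an error path, where every completed
      setup step needs a matching teardown label, can be sketched as
      below. The demo_tree structure, the three numbered steps and the
      fail_at knob are purely illustrative assumptions; they do not
      reflect the real step order inside dsa_tree_setup().

```c
#include <assert.h>

/* Hypothetical state: which parts are up, and which step should fail. */
struct demo_tree {
	int switches_up;
	int ports_up;
	int master_up;
	int fail_at;	/* 0 = no failure, otherwise the failing step */
};

static int demo_step(struct demo_tree *t, int step, int *flag)
{
	if (t->fail_at == step)
		return -1;
	*flag = 1;
	return 0;
}

static int demo_tree_setup(struct demo_tree *t)
{
	int err;

	assert(t);
	err = demo_step(t, 1, &t->switches_up);
	if (err)
		return err;
	err = demo_step(t, 2, &t->ports_up);
	if (err)
		goto teardown_switches;
	err = demo_step(t, 3, &t->master_up);
	if (err)
		goto teardown_ports;
	return 0;

teardown_ports:
	t->ports_up = 0;	/* the unwind step that must not be skipped */
teardown_switches:
	t->switches_up = 0;
	return err;
}
```

      When a teardown helper is split in two, as described above, each
      half needs its own place in the unwind sequence, otherwise a late
      failure leaves part of the state set up.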
      
      Fixes: a57d8c21 ("net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5845aa0
    • K
      net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work · a18cee47
      Karsten Graul authored
      The abort_work is scheduled when a connection was detected to be
      out-of-sync after a link failure. The work calls smc_conn_kill(),
      which calls smc_close_active_abort() and that might end up calling
      smc_close_cancel_work().
      smc_close_cancel_work() cancels any pending close_work and tx_work but
      needs to release the sock_lock beforehand and acquires it again
      afterwards. So if the sock_lock was NOT held on entry, it may be
      left held after the abort_work completes. That's why the sock_lock is
      acquired before the call to smc_conn_kill() in __smc_lgr_terminate(),
      but this is missing in smc_conn_abort_work().
      
      Fix that by acquiring the sock_lock first and releasing it after the
      call to smc_conn_kill().
      
      Fixes: b286a065 ("net/smc: handle incoming CDC validation message")
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a18cee47