1. 25 4月, 2022 8 次提交
    • E
      tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT · 4bfe744f
      Eric Dumazet 提交于
      I had this bug sitting for too long in my pile, it is time to fix it.
      
      Thanks to Doug Porter for reminding me of it!
      
      We had various attempts in the past, including commit
      0cbe6a8f ("tcp: remove SOCK_QUEUE_SHRUNK"),
      but the issue is that TCP stack currently only generates
      EPOLLOUT from input path, when tp->snd_una has advanced
      and skb(s) cleaned from rtx queue.
      
      If a flow has a big RTT, and/or receives SACKs, it is possible
      that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0
      and no more data can be sent until tp->snd_una finally advances.
      
      What is needed is to also check if POLLOUT needs to be generated
      whenever tp->snd_nxt is advanced, from output path.
      
      This bug triggers more often after an idle period, as
      we do not receive ACK for at least one RTT. tcp_notsent_lowat
      could be a fraction of what CWND and pacing rate would allow to
      send during this RTT.
      
      In a followup patch, I will remove the bogus call
      to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED)
      from tcp_check_space(). Fact that we have decided to generate
      an EPOLLOUT does not mean the application has immediately
      refilled the transmit queue. This optimistic call
      might have been the reason the bug seemed not too serious.
      
      Tested:
      
      200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2]
      
      $ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat
      $ cat bench_rr.sh
      SUM=0
      for i in {1..10}
      do
       V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"`
       echo $V
       SUM=$(($SUM + $V))
      done
      echo SUM=$SUM
      
      Before patch:
      $ bench_rr.sh
      130000000
      80000000
      140000000
      140000000
      140000000
      140000000
      130000000
      40000000
      90000000
      110000000
      SUM=1140000000
      
      After patch:
      $ bench_rr.sh
      430000000
      590000000
      530000000
      450000000
      450000000
      350000000
      450000000
      490000000
      480000000
      460000000
      SUM=4680000000  # This is 410 % of the value before patch.
      
      Fixes: c9bee3b7 ("tcp: TCP_NOTSENT_LOWAT socket option")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDoug Porter <dsp@fb.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4bfe744f
    • V
      net: dsa: flood multicast to CPU when slave has IFF_PROMISC · 7c762e70
      Vladimir Oltean 提交于
      Certain DSA switches can eliminate flooding to the CPU when none of the
      ports have the IFF_ALLMULTI or IFF_PROMISC flags set. This is done by
      synthesizing a call to dsa_port_bridge_flags() for the CPU port, a call
      which normally comes from the bridge driver via switchdev.
      
      The bridge port flags and IFF_PROMISC|IFF_ALLMULTI have slightly
      different semantics, and due to inattention/lack of proper testing, the
      IFF_PROMISC flag allows unknown unicast to be flooded to the CPU, but
      not unknown multicast.
      
      This must be fixed by setting both BR_FLOOD (unicast) and BR_MCAST_FLOOD
      in the synthesized dsa_port_bridge_flags() call, since IFF_PROMISC means
      that packets should not be filtered regardless of their MAC DA.
      
      Fixes: 7569459a ("net: dsa: manage flooding on the CPU ports")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c762e70
    • P
      ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode · 31c417c9
      Peilin Ye 提交于
      As pointed out by Jakub Kicinski, currently using TUNNEL_SEQ in
      collect_md mode is racy for [IP6]GRE[TAP] devices.  Consider the
      following sequence of events:
      
      1. An [IP6]GRE[TAP] device is created in collect_md mode using "ip link
         add ... external".  "ip" ignores "[o]seq" if "external" is specified,
         so TUNNEL_SEQ is off, and the device is marked as NETIF_F_LLTX (i.e.
         it uses lockless TX);
      2. Someone sets TUNNEL_SEQ on outgoing skb's, using e.g.
         bpf_skb_set_tunnel_key() in an eBPF program attached to this device;
      3. gre_fb_xmit() or __gre6_xmit() processes these skb's:
      
      	gre_build_header(skb, tun_hlen,
      			 flags, protocol,
      			 tunnel_id_to_key32(tun_info->key.tun_id),
      			 (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
      					      : 0);   ^^^^^^^^^^^^^^^^^
      
      Since we are not using the TX lock (&txq->_xmit_lock), multiple CPUs may
      try to do this tunnel->o_seqno++ in parallel, which is racy.  Fix it by
      making o_seqno atomic_t.
      
      As mentioned by Eric Dumazet in commit b790e01a ("ip_gre: lockless
      xmit"), making o_seqno atomic_t increases "chance for packets being out
      of order at receiver" when NETIF_F_LLTX is on.
      
      Maybe a better fix would be:
      
      1. Do not ignore "oseq" in external mode.  Users MUST specify "oseq" if
         they want the kernel to allow sequencing of outgoing packets;
      2. Reject all outgoing TUNNEL_SEQ packets if the device was not created
         with "oseq".
      
      Unfortunately, that would break userspace.
      
      We could now make [IP6]GRE[TAP] devices always NETIF_F_LLTX, but let us
      do it in separate patches to keep this fix minimal.
      Suggested-by: NJakub Kicinski <kuba@kernel.org>
      Fixes: 77a5196a ("gre: add sequence number for collect md mode.")
      Signed-off-by: NPeilin Ye <peilin.ye@bytedance.com>
      Acked-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31c417c9
    • P
      ip6_gre: Make o_seqno start from 0 in native mode · fde98ae9
      Peilin Ye 提交于
      For IP6GRE and IP6GRETAP devices, currently o_seqno starts from 1 in
      native mode.  According to RFC 2890 2.2., "The first datagram is sent
      with a sequence number of 0."  Fix it.
      
      It is worth mentioning that o_seqno already starts from 0 in collect_md
      mode, see the "if (tunnel->parms.collect_md)" clause in __gre6_xmit(),
      where tunnel->o_seqno is passed to gre_build_header() before getting
      incremented.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: NPeilin Ye <peilin.ye@bytedance.com>
      Acked-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fde98ae9
    • P
      ip_gre: Make o_seqno start from 0 in native mode · ff827beb
      Peilin Ye 提交于
      For GRE and GRETAP devices, currently o_seqno starts from 1 in native
      mode.  According to RFC 2890 2.2., "The first datagram is sent with a
      sequence number of 0."  Fix it.
      
      It is worth mentioning that o_seqno already starts from 0 in collect_md
      mode, see gre_fb_xmit(), where tunnel->o_seqno is passed to
      gre_build_header() before getting incremented.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NPeilin Ye <peilin.ye@bytedance.com>
      Acked-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff827beb
    • L
      net/smc: sync err code when tcp connection was refused · 4e2e65e2
      liuyacan 提交于
      In the current implementation, when TCP initiates a connection
      to an unavailable [ip,port], ECONNREFUSED will be stored in the
      TCP socket, but SMC will not. However, some apps (like curl) use
      getsockopt(,,SO_ERROR,,) to get the error information, which makes
      them miss the error message and behave strangely.
      
      Fixes: 50717a37 ("net/smc: nonblocking connect rework")
      Signed-off-by: Nliuyacan <liuyacan@corp.netease.com>
      Reviewed-by: NTony Lu <tonylu@linux.alibaba.com>
      Acked-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e2e65e2
    • M
      netfilter: Update ip6_route_me_harder to consider L3 domain · 8ddffdb9
      Martin Willi 提交于
      The commit referenced below fixed packet re-routing if Netfilter mangles
      a routing key property of a packet and the packet is routed in a VRF L3
      domain. The fix, however, addressed IPv4 re-routing, only.
      
      This commit applies the same behavior for IPv6. While at it, untangle
      the nested ternary operator to make the code more readable.
      
      Fixes: 6d8b49c3 ("netfilter: Update ip_route_me_harder to consider L3 domain")
      Cc: stable@vger.kernel.org
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      8ddffdb9
    • R
      netfilter: flowtable: Remove the empty file · b9b1e0da
      Rongguang Wei 提交于
      CONFIG_NF_FLOW_TABLE_IPV4 is already removed and the real user is also
      removed(nf_flow_table_ipv4.c is empty).
      
      Fixes: c42ba429 ("netfilter: flowtable: remove ipv4/ipv6 modules")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b9b1e0da
  2. 24 4月, 2022 1 次提交
    • X
      sctp: check asoc strreset_chunk in sctp_generate_reconf_event · 165e3e17
      Xin Long 提交于
      A null pointer reference issue can be triggered when the response of a
      stream reconf request arrives after the timer is triggered, such as:
      
        send Incoming SSN Reset Request --->
        CPU0:
         reconf timer is triggered,
         go to the handler code before hold sk lock
                                  <--- reply with Outgoing SSN Reset Request
        CPU1:
         process Outgoing SSN Reset Request,
         and set asoc->strreset_chunk to NULL
        CPU0:
         continue the handler code, hold sk lock,
         and try to hold asoc->strreset_chunk, crash!
      
      In Ying Xu's testing, the call trace is:
      
        [ ] BUG: kernel NULL pointer dereference, address: 0000000000000010
        [ ] RIP: 0010:sctp_chunk_hold+0xe/0x40 [sctp]
        [ ] Call Trace:
        [ ]  <IRQ>
        [ ]  sctp_sf_send_reconf+0x2c/0x100 [sctp]
        [ ]  sctp_do_sm+0xa4/0x220 [sctp]
        [ ]  sctp_generate_reconf_event+0xbd/0xe0 [sctp]
        [ ]  call_timer_fn+0x26/0x130
      
      This patch is to fix it by returning from the timer handler if asoc
      strreset_chunk is already set to NULL.
      
      Fixes: 7b9438de ("sctp: add stream reconf timer")
      Reported-by: NYing Xu <yinxu@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      165e3e17
  3. 23 4月, 2022 3 次提交
  4. 22 4月, 2022 2 次提交
  5. 19 4月, 2022 3 次提交
    • E
      netlink: reset network and mac headers in netlink_dump() · 99c07327
      Eric Dumazet 提交于
      netlink_dump() is allocating an skb, reserves space in it
      but forgets to reset network header.
      
      This allows a BPF program, invoked later from sk_filter()
      to access uninitialized kernel memory from the reserved
      space.
      
      Theorically mac header reset could be omitted, because
      it is set to a special initial value.
      bpf_internal_load_pointer_neg_helper calls skb_mac_header()
      without checking skb_mac_header_was_set().
      Relying on skb->len not being too big seems fragile.
      We also could add a sanity check in bpf_internal_load_pointer_neg_helper()
      to avoid surprises in the future.
      
      syzbot report was:
      
      BUG: KMSAN: uninit-value in ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
       ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
       __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
       bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
       __bpf_prog_run include/linux/filter.h:626 [inline]
       bpf_prog_run include/linux/filter.h:633 [inline]
       __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
       bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
       sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
       sk_filter include/linux/filter.h:905 [inline]
       netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
       netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1039
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_read+0x52c/0x14c0 fs/read_write.c:786
       vfs_readv fs/read_write.c:906 [inline]
       do_readv+0x432/0x800 fs/read_write.c:943
       __do_sys_readv fs/read_write.c:1034 [inline]
       __se_sys_readv fs/read_write.c:1031 [inline]
       __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was stored to memory at:
       ___bpf_prog_run+0x96c/0xb420 kernel/bpf/core.c:1558
       __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
       bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
       __bpf_prog_run include/linux/filter.h:626 [inline]
       bpf_prog_run include/linux/filter.h:633 [inline]
       __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
       bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
       sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
       sk_filter include/linux/filter.h:905 [inline]
       netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
       netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1039
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_read+0x52c/0x14c0 fs/read_write.c:786
       vfs_readv fs/read_write.c:906 [inline]
       do_readv+0x432/0x800 fs/read_write.c:943
       __do_sys_readv fs/read_write.c:1034 [inline]
       __se_sys_readv fs/read_write.c:1031 [inline]
       __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:737 [inline]
       slab_alloc_node mm/slub.c:3244 [inline]
       __kmalloc_node_track_caller+0xde3/0x14f0 mm/slub.c:4972
       kmalloc_reserve net/core/skbuff.c:354 [inline]
       __alloc_skb+0x545/0xf90 net/core/skbuff.c:426
       alloc_skb include/linux/skbuff.h:1158 [inline]
       netlink_dump+0x30f/0x16c0 net/netlink/af_netlink.c:2242
       netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1039
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_read+0x52c/0x14c0 fs/read_write.c:786
       vfs_readv fs/read_write.c:906 [inline]
       do_readv+0x432/0x800 fs/read_write.c:943
       __do_sys_readv fs/read_write.c:1034 [inline]
       __se_sys_readv fs/read_write.c:1031 [inline]
       __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      CPU: 0 PID: 3470 Comm: syz-executor751 Not tainted 5.17.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: db65a3aa ("netlink: Trim skb to alloc size to avoid MSG_TRUNC")
      Fixes: 9063e21f ("netlink: autosize skb lengthes")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220415181442.551228-1-eric.dumazet@gmail.comSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
      99c07327
    • P
      ipvs: correctly print the memory size of ip_vs_conn_tab · eba1a872
      Pengcheng Yang 提交于
      The memory size of ip_vs_conn_tab changed after we use hlist
      instead of list.
      
      Fixes: 731109e7 ("ipvs: use hlist instead of list")
      Signed-off-by: NPengcheng Yang <yangpc@wangsu.com>
      Acked-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      eba1a872
    • K
      net: dsa: hellcreek: Calculate checksums in tagger · 0763120b
      Kurt Kanzenbach 提交于
      In case the checksum calculation is offloaded to the DSA master network
      interface, it will include the switch trailing tag. As soon as the switch strips
      that tag on egress, the calculated checksum is wrong.
      
      Therefore, add the checksum calculation to the tagger (if required) before
      adding the switch tag. This way, the hellcreek code works with all DSA master
      interfaces regardless of their declared feature set.
      
      Fixes: 01ef09ca ("net: dsa: Add tag handling for Hirschmann Hellcreek switches")
      Signed-off-by: NKurt Kanzenbach <kurt@linutronix.de>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220415103320.90657-1-kurt@linutronix.deSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
      0763120b
  6. 17 4月, 2022 1 次提交
  7. 16 4月, 2022 5 次提交
    • E
      ipv6: make ip6_rt_gc_expire an atomic_t · 9cb7c013
      Eric Dumazet 提交于
      Reads and Writes to ip6_rt_gc_expire always have been racy,
      as syzbot reported lately [1]
      
      There is a possible risk of under-flow, leading
      to unexpected high value passed to fib6_run_gc(),
      although I have not observed this in the field.
      
      Hosts hitting ip6_dst_gc() very hard are under pretty bad
      state anyway.
      
      [1]
      BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc
      
      read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1:
       ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
       dst_alloc+0x9b/0x160 net/core/dst.c:86
       ip6_dst_alloc net/ipv6/route.c:344 [inline]
       icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
       mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
       mld_send_cr net/ipv6/mcast.c:2119 [inline]
       mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
       process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
       worker_thread+0x618/0xa70 kernel/workqueue.c:2436
       kthread+0x1a9/0x1e0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30
      
      read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0:
       ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
       dst_alloc+0x9b/0x160 net/core/dst.c:86
       ip6_dst_alloc net/ipv6/route.c:344 [inline]
       icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
       mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
       mld_send_cr net/ipv6/mcast.c:2119 [inline]
       mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
       process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
       worker_thread+0x618/0xa70 kernel/workqueue.c:2436
       kthread+0x1a9/0x1e0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000bb3 -> 0x00000ba9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: mld mld_ifc_work
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      9cb7c013
    • D
      net: Handle l3mdev in ip_tunnel_init_flow · db53cd3d
      David Ahern 提交于
      Ido reported that the commit referenced in the Fixes tag broke
      a gre use case with dummy devices. Add a check to ip_tunnel_init_flow
      to see if the oif is an l3mdev port and if so set the oif to 0 to
      avoid the oif comparison in fib_lookup_good_nhc.
      
      Fixes: 40867d74 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
      Reported-by: NIdo Schimmel <idosch@idosch.org>
      Signed-off-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      db53cd3d
    • D
      l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu · 83daab06
      David Ahern 提交于
      Next patch uses l3mdev_master_upper_ifindex_by_index_rcu which throws
      a splat with debug kernels:
      
      [13783.087570] ------------[ cut here ]------------
      [13783.093974] RTNL: assertion failed at net/core/dev.c (6702)
      [13783.100761] WARNING: CPU: 3 PID: 51132 at net/core/dev.c:6702 netdev_master_upper_dev_get+0x16a/0x1a0
      
      [13783.184226] CPU: 3 PID: 51132 Comm: kworker/3:3 Not tainted 5.17.0-custom-100090-g6f963aafb1cc #682
      [13783.194788] Hardware name: Mellanox Technologies Ltd. MSN2010/SA002610, BIOS 5.6.5 08/24/2017
      [13783.204755] Workqueue: mld mld_ifc_work [ipv6]
      [13783.210338] RIP: 0010:netdev_master_upper_dev_get+0x16a/0x1a0
      [13783.217209] Code: 0f 85 e3 fe ff ff e8 65 ac ec fe ba 2e 1a 00 00 48 c7 c6 60 6f 38 83 48 c7 c7 c0 70 38 83 c6 05 5e b5 d7 01 01 e8 c6 29 52 00 <0f> 0b e9 b8 fe ff ff e8 5a 6c 35 ff e9 1c ff ff ff 48 89 ef e8 7d
      [13783.238659] RSP: 0018:ffffc9000b37f5a8 EFLAGS: 00010286
      [13783.244995] RAX: 0000000000000000 RBX: ffff88812ee5c000 RCX: 0000000000000000
      [13783.253379] RDX: ffff88811ce09d40 RSI: ffffffff812d0fcd RDI: fffff5200166fea7
      [13783.261769] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8882375f4287
      [13783.270138] R10: ffffed1046ebe850 R11: 0000000000000001 R12: dffffc0000000000
      [13783.278510] R13: 0000000000000275 R14: ffffc9000b37f688 R15: ffff8881273b4af8
      [13783.286870] FS:  0000000000000000(0000) GS:ffff888237400000(0000) knlGS:0000000000000000
      [13783.296352] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [13783.303177] CR2: 00007ff25fc9b2e8 CR3: 0000000174d23000 CR4: 00000000001006e0
      [13783.311546] Call Trace:
      [13783.314660]  <TASK>
      [13783.317553]  l3mdev_master_upper_ifindex_by_index_rcu+0x43/0xe0
      ...
      
      Change l3mdev_master_upper_ifindex_by_index_rcu to use
      netdev_master_upper_dev_get_rcu.
      
      Fixes: 6a6d6681 ("l3mdev: add function to retreive upper master")
      Signed-off-by: NIdo Schimmel <idosch@idosch.org>
      Signed-off-by: NDavid Ahern <dsahern@kernel.org>
      Cc: Alexis Bauvin <abauvin@scaleway.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      83daab06
    • E
      net/sched: cls_u32: fix possible leak in u32_init_knode() · ec5b0f60
      Eric Dumazet 提交于
      While investigating a related syzbot report,
      I found that whenever call to tcf_exts_init()
      from u32_init_knode() is failing, we end up
      with an elevated refcount on ht->refcnt
      
      To avoid that, only increase the refcount after
      all possible errors have been evaluated.
      
      Fixes: b9a24bb7 ("net_sched: properly handle failure case of tcf_exts_init()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ec5b0f60
    • E
      net/sched: cls_u32: fix netns refcount changes in u32_change() · 3db09e76
      Eric Dumazet 提交于
      We are now able to detect extra put_net() at the moment
      they happen, instead of much later in correct code paths.
      
      u32_init_knode() / tcf_exts_init() populates the ->exts.net
      pointer, but as mentioned in tcf_exts_init(),
      the refcount on netns has not been elevated yet.
      
      The refcount is taken only once tcf_exts_get_net()
      is called.
      
      So the two u32_destroy_key() calls from u32_change()
      are attempting to release an invalid reference on the netns.
      
      syzbot report:
      
      refcount_t: decrement hit 0; leaking memory.
      WARNING: CPU: 0 PID: 21708 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
      Modules linked in:
      CPU: 0 PID: 21708 Comm: syz-executor.5 Not tainted 5.18.0-rc2-next-20220412-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
      Code: 1d 14 b6 b2 09 31 ff 89 de e8 6d e9 89 fd 84 db 75 e0 e8 84 e5 89 fd 48 c7 c7 40 aa 26 8a c6 05 f4 b5 b2 09 01 e8 e5 81 2e 05 <0f> 0b eb c4 e8 68 e5 89 fd 0f b6 1d e3 b5 b2 09 31 ff 89 de e8 38
      RSP: 0018:ffffc900051af1b0 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000040000 RSI: ffffffff8160a0c8 RDI: fffff52000a35e28
      RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff81604a9e R11: 0000000000000000 R12: 1ffff92000a35e3b
      R13: 00000000ffffffef R14: ffff8880211a0194 R15: ffff8880577d0a00
      FS:  00007f25d183e700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f19c859c028 CR3: 0000000051009000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       __refcount_dec include/linux/refcount.h:344 [inline]
       refcount_dec include/linux/refcount.h:359 [inline]
       ref_tracker_free+0x535/0x6b0 lib/ref_tracker.c:118
       netns_tracker_free include/net/net_namespace.h:327 [inline]
       put_net_track include/net/net_namespace.h:341 [inline]
       tcf_exts_put_net include/net/pkt_cls.h:255 [inline]
       u32_destroy_key.isra.0+0xa7/0x2b0 net/sched/cls_u32.c:394
       u32_change+0xe01/0x3140 net/sched/cls_u32.c:909
       tc_new_tfilter+0x98d/0x2200 net/sched/cls_api.c:2148
       rtnetlink_rcv_msg+0x80d/0xb80 net/core/rtnetlink.c:6016
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2495
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1921
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e2/0x800 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f25d0689049
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f25d183e168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f25d079c030 RCX: 00007f25d0689049
      RDX: 0000000000000000 RSI: 0000000020000340 RDI: 0000000000000005
      RBP: 00007f25d06e308d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffd0b752e3f R14: 00007f25d183e300 R15: 0000000000022000
       </TASK>
      
      Fixes: 35c55fc1 ("cls_u32: use tcf_exts_get_net() before call_rcu()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      3db09e76
  8. 15 4月, 2022 6 次提交
    • P
      openvswitch: fix OOB access in reserve_sfa_size() · cefa91b2
      Paolo Valerio 提交于
      Given a sufficiently large number of actions, while copying and
      reserving memory for a new action of a new flow, if next_offset is
      greater than MAX_ACTIONS_BUFSIZE, the function reserve_sfa_size() does
      not return -EMSGSIZE as expected, but it allocates MAX_ACTIONS_BUFSIZE
      bytes increasing actions_len by req_size. This can then lead to an OOB
      write access, especially when further actions need to be copied.
      
      Fix it by rearranging the flow action size check.
      
      KASAN splat below:
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in reserve_sfa_size+0x1ba/0x380 [openvswitch]
      Write of size 65360 at addr ffff888147e4001c by task handler15/836
      
      CPU: 1 PID: 836 Comm: handler15 Not tainted 5.18.0-rc1+ #27
      ...
      Call Trace:
       <TASK>
       dump_stack_lvl+0x45/0x5a
       print_report.cold+0x5e/0x5db
       ? __lock_text_start+0x8/0x8
       ? reserve_sfa_size+0x1ba/0x380 [openvswitch]
       kasan_report+0xb5/0x130
       ? reserve_sfa_size+0x1ba/0x380 [openvswitch]
       kasan_check_range+0xf5/0x1d0
       memcpy+0x39/0x60
       reserve_sfa_size+0x1ba/0x380 [openvswitch]
       __add_action+0x24/0x120 [openvswitch]
       ovs_nla_add_action+0xe/0x20 [openvswitch]
       ovs_ct_copy_action+0x29d/0x1130 [openvswitch]
       ? __kernel_text_address+0xe/0x30
       ? unwind_get_return_address+0x56/0xa0
       ? create_prof_cpu_mask+0x20/0x20
       ? ovs_ct_verify+0xf0/0xf0 [openvswitch]
       ? prep_compound_page+0x198/0x2a0
       ? __kasan_check_byte+0x10/0x40
       ? kasan_unpoison+0x40/0x70
       ? ksize+0x44/0x60
       ? reserve_sfa_size+0x75/0x380 [openvswitch]
       __ovs_nla_copy_actions+0xc26/0x2070 [openvswitch]
       ? __zone_watermark_ok+0x420/0x420
       ? validate_set.constprop.0+0xc90/0xc90 [openvswitch]
       ? __alloc_pages+0x1a9/0x3e0
       ? __alloc_pages_slowpath.constprop.0+0x1da0/0x1da0
       ? unwind_next_frame+0x991/0x1e40
       ? __mod_node_page_state+0x99/0x120
       ? __mod_lruvec_page_state+0x2e3/0x470
       ? __kasan_kmalloc_large+0x90/0xe0
       ovs_nla_copy_actions+0x1b4/0x2c0 [openvswitch]
       ovs_flow_cmd_new+0x3cd/0xb10 [openvswitch]
       ...
      
      Cc: stable@vger.kernel.org
      Fixes: f28cd2af ("openvswitch: fix flow actions reallocation")
      Signed-off-by: NPaolo Valerio <pvalerio@redhat.com>
      Acked-by: NEelco Chaudron <echaudro@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cefa91b2
    • P
      ip6_gre: Fix skb_under_panic in __gre6_xmit() · ab198e1d
      Peilin Ye 提交于
      Feng reported an skb_under_panic BUG triggered by running
      test_ip6gretap() in tools/testing/selftests/bpf/test_tunnel.sh:
      
      [   82.492551] skbuff: skb_under_panic: text:ffffffffb268bb8e len:403 put:12 head:ffff9997c5480000 data:ffff9997c547fff8 tail:0x18b end:0x2c0 dev:ip6gretap11
      <...>
      [   82.607380] Call Trace:
      [   82.609389]  <TASK>
      [   82.611136]  skb_push.cold.109+0x10/0x10
      [   82.614289]  __gre6_xmit+0x41e/0x590
      [   82.617169]  ip6gre_tunnel_xmit+0x344/0x3f0
      [   82.620526]  dev_hard_start_xmit+0xf1/0x330
      [   82.623882]  sch_direct_xmit+0xe4/0x250
      [   82.626961]  __dev_queue_xmit+0x720/0xfe0
      <...>
      [   82.633431]  packet_sendmsg+0x96a/0x1cb0
      [   82.636568]  sock_sendmsg+0x30/0x40
      <...>
      
      The following sequence of events caused the BUG:
      
      1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
         calculated based on old flags (see ip6gre_calc_hlen());
      2. packet_snd() reserves header room for skb A, assuming
         tunnel->tun_hlen is 4;
      3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
         skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
      4. __gre6_xmit() detects the new tunnel key, and recalculates
         "tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
         TUNNEL_SEQ);
      5. gre_build_header() calls skb_push() with insufficient reserved header
         room, triggering the BUG.
      
      As sugguested by Cong, fix it by moving the call to skb_cow_head() after
      the recalculation of tun_hlen.
      
      Reproducer:
      
        OBJ=$LINUX/tools/testing/selftests/bpf/test_tunnel_kern.o
      
        ip netns add at_ns0
        ip link add veth0 type veth peer name veth1
        ip link set veth0 netns at_ns0
        ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
        ip netns exec at_ns0 ip link set dev veth0 up
        ip link set dev veth1 up mtu 1500
        ip addr add dev veth1 172.16.1.200/24
      
        ip netns exec at_ns0 ip addr add ::11/96 dev veth0
        ip netns exec at_ns0 ip link set dev veth0 up
        ip addr add dev veth1 ::22/96
        ip link set dev veth1 up
      
        ip netns exec at_ns0 \
        	ip link add dev ip6gretap00 type ip6gretap seq flowlabel 0xbcdef key 2 \
        	local ::11 remote ::22
      
        ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
        ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
        ip netns exec at_ns0 ip link set dev ip6gretap00 up
      
        ip link add dev ip6gretap11 type ip6gretap external
        ip addr add dev ip6gretap11 10.1.1.200/24
        ip addr add dev ip6gretap11 fc80::200/24
        ip link set dev ip6gretap11 up
      
        tc qdisc add dev ip6gretap11 clsact
        tc filter add dev ip6gretap11 egress bpf da obj $OBJ sec ip6gretap_set_tunnel
        tc filter add dev ip6gretap11 ingress bpf da obj $OBJ sec ip6gretap_get_tunnel
      
        ping6 -c 3 -w 10 -q ::11
      
      Fixes: 6712abc1 ("ip6_gre: add ip6 gre and gretap collect_md mode")
      Reported-by: NFeng Zhou <zhoufeng.zf@bytedance.com>
      Co-developed-by: NCong Wang <cong.wang@bytedance.com>
      Signed-off-by: NCong Wang <cong.wang@bytedance.com>
      Signed-off-by: NPeilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab198e1d
    • P
      ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit() · f40c064e
      Peilin Ye 提交于
      Do not update tunnel->tun_hlen in data plane code.  Use a local variable
      instead, just like "tunnel_hlen" in net/ipv4/ip_gre.c:gre_fb_xmit().
      Co-developed-by: NCong Wang <cong.wang@bytedance.com>
      Signed-off-by: NCong Wang <cong.wang@bytedance.com>
      Signed-off-by: NPeilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f40c064e
    • H
      net/packet: fix packet_sock xmit return value checking · 29e8e659
      Hangbin Liu 提交于
      packet_sock xmit could be dev_queue_xmit, which also returns negative
      errors. So only checking positive errors is not enough, or userspace
      sendmsg may return success while packet is not send out.
      
      Move the net_xmit_errno() assignment in the braces as checkpatch.pl said
      do not use assignment in if condition.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29e8e659
    • T
      net/smc: Fix sock leak when release after smc_shutdown() · 1a74e993
      Tony Lu 提交于
      Since commit e5d5aadc ("net/smc: fix sk_refcnt underflow on linkdown
      and fallback"), for a fallback connection, __smc_release() does not call
      sock_put() if its state is already SMC_CLOSED.
      
      When calling smc_shutdown() after falling back, its state is set to
      SMC_CLOSED but does not call sock_put(), so this patch calls it.
      
      Reported-and-tested-by: syzbot+6e29a053eb165bd50de5@syzkaller.appspotmail.com
      Fixes: e5d5aadc ("net/smc: fix sk_refcnt underflow on linkdown and fallback")
      Signed-off-by: NTony Lu <tonylu@linux.alibaba.com>
      Acked-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a74e993
    • D
      rxrpc: Restore removed timer deletion · ee3b0826
      David Howells 提交于
      A recent patch[1] from Eric Dumazet flipped the order in which the
      keepalive timer and the keepalive worker were cancelled in order to fix a
      syzbot reported issue[2].  Unfortunately, this enables the mirror image bug
      whereby the timer races with rxrpc_exit_net(), restarting the worker after
      it has been cancelled:
      
      	CPU 1		CPU 2
      	===============	=====================
      			if (rxnet->live)
      			<INTERRUPT>
      	rxnet->live = false;
       	cancel_work_sync(&rxnet->peer_keepalive_work);
      			rxrpc_queue_work(&rxnet->peer_keepalive_work);
      	del_timer_sync(&rxnet->peer_keepalive_timer);
      
      Fix this by restoring the removed del_timer_sync() so that we try to remove
      the timer twice.  If the timer runs again, it should see ->live == false
      and not restart the worker.
      
      Fixes: 1946014c ("rxrpc: fix a race in rxrpc_exit_net()")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Eric Dumazet <edumazet@google.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: linux-afs@lists.infradead.org
      Link: https://lore.kernel.org/r/20220404183439.3537837-1-eric.dumazet@gmail.com/ [1]
      Link: https://syzkaller.appspot.com/bug?extid=724378c4bb58f703b09a [2]
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee3b0826
  9. 14 4月, 2022 1 次提交
  10. 13 4月, 2022 3 次提交
    • L
      nfc: nci: add flush_workqueue to prevent uaf · ef27324e
      Lin Ma 提交于
      Our detector found a concurrent use-after-free bug when detaching an
      NCI device. The main reason for this bug is the unexpected scheduling
      between the used delayed mechanism (timer and workqueue).
      
      The race can be demonstrated below:
      
      Thread-1                           Thread-2
                                       | nci_dev_up()
                                       |   nci_open_device()
                                       |     __nci_request(nci_reset_req)
                                       |       nci_send_cmd
                                       |         queue_work(cmd_work)
      nci_unregister_device()          |
        nci_close_device()             | ...
          del_timer_sync(cmd_timer)[1] |
      ...                              | Worker
      nci_free_device()                | nci_cmd_work()
        kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]
      
      In short, the cleanup routine thought that the cmd_timer has already
      been detached by [1] but the mod_timer can re-attach the timer [2], even
      it is already released [3], resulting in UAF.
      
      This UAF is easy to trigger, crash trace by POC is like below
      
      [   66.703713] ==================================================================
      [   66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
      [   66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
      [   66.703974]
      [   66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
      [   66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
      [   66.703974] Call Trace:
      [   66.703974]  <TASK>
      [   66.703974]  dump_stack_lvl+0x57/0x7d
      [   66.703974]  print_report.cold+0x5e/0x5db
      [   66.703974]  ? enqueue_timer+0x448/0x490
      [   66.703974]  kasan_report+0xbe/0x1c0
      [   66.703974]  ? enqueue_timer+0x448/0x490
      [   66.703974]  enqueue_timer+0x448/0x490
      [   66.703974]  __mod_timer+0x5e6/0xb80
      [   66.703974]  ? mark_held_locks+0x9e/0xe0
      [   66.703974]  ? try_to_del_timer_sync+0xf0/0xf0
      [   66.703974]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
      [   66.703974]  ? queue_work_on+0x61/0x80
      [   66.703974]  ? lockdep_hardirqs_on+0xbf/0x130
      [   66.703974]  process_one_work+0x8bb/0x1510
      [   66.703974]  ? lockdep_hardirqs_on_prepare+0x410/0x410
      [   66.703974]  ? pwq_dec_nr_in_flight+0x230/0x230
      [   66.703974]  ? rwlock_bug.part.0+0x90/0x90
      [   66.703974]  ? _raw_spin_lock_irq+0x41/0x50
      [   66.703974]  worker_thread+0x575/0x1190
      [   66.703974]  ? process_one_work+0x1510/0x1510
      [   66.703974]  kthread+0x2a0/0x340
      [   66.703974]  ? kthread_complete_and_exit+0x20/0x20
      [   66.703974]  ret_from_fork+0x22/0x30
      [   66.703974]  </TASK>
      [   66.703974]
      [   66.703974] Allocated by task 267:
      [   66.703974]  kasan_save_stack+0x1e/0x40
      [   66.703974]  __kasan_kmalloc+0x81/0xa0
      [   66.703974]  nci_allocate_device+0xd3/0x390
      [   66.703974]  nfcmrvl_nci_register_dev+0x183/0x2c0
      [   66.703974]  nfcmrvl_nci_uart_open+0xf2/0x1dd
      [   66.703974]  nci_uart_tty_ioctl+0x2c3/0x4a0
      [   66.703974]  tty_ioctl+0x764/0x1310
      [   66.703974]  __x64_sys_ioctl+0x122/0x190
      [   66.703974]  do_syscall_64+0x3b/0x90
      [   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   66.703974]
      [   66.703974] Freed by task 406:
      [   66.703974]  kasan_save_stack+0x1e/0x40
      [   66.703974]  kasan_set_track+0x21/0x30
      [   66.703974]  kasan_set_free_info+0x20/0x30
      [   66.703974]  __kasan_slab_free+0x108/0x170
      [   66.703974]  kfree+0xb0/0x330
      [   66.703974]  nfcmrvl_nci_unregister_dev+0x90/0xd0
      [   66.703974]  nci_uart_tty_close+0xdf/0x180
      [   66.703974]  tty_ldisc_kill+0x73/0x110
      [   66.703974]  tty_ldisc_hangup+0x281/0x5b0
      [   66.703974]  __tty_hangup.part.0+0x431/0x890
      [   66.703974]  tty_release+0x3a8/0xc80
      [   66.703974]  __fput+0x1f0/0x8c0
      [   66.703974]  task_work_run+0xc9/0x170
      [   66.703974]  exit_to_user_mode_prepare+0x194/0x1a0
      [   66.703974]  syscall_exit_to_user_mode+0x19/0x50
      [   66.703974]  do_syscall_64+0x48/0x90
      [   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      To fix the UAF, this patch adds flush_workqueue() to ensure the
      nci_cmd_work is finished before the following del_timer_sync.
      This combination will promise the timer is actually detached.
      
      Fixes: 6a2968aa ("NFC: basic NCI protocol implementation")
      Signed-off-by: NLin Ma <linma@zju.edu.cn>
      Reviewed-by: NKrzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef27324e
    • V
      Revert "net: dsa: setup master before ports" · 762c2998
      Vladimir Oltean 提交于
      This reverts commit 11fd667d.
      
      dsa_slave_change_mtu() updates the MTU of the DSA master and of the
      associated CPU port, but only if it detects a change to the master MTU.
      
      The blamed commit in the Fixes: tag below addressed a regression where
      dsa_slave_change_mtu() would return early and not do anything due to
      ds->ops->port_change_mtu() not being implemented.
      
      However, that commit also had the effect that the master MTU got set up
      to the correct value by dsa_master_setup(), but the associated CPU port's
      MTU did not get updated. This causes breakage for drivers that rely on
      the ->port_change_mtu() DSA call to account for the tagging overhead on
      the CPU port, and don't set up the initial MTU during the setup phase.
      
      Things actually worked before because they were in a fragile equilibrium
      where dsa_slave_change_mtu() was called before dsa_master_setup() was.
      So dsa_slave_change_mtu() could actually detect a change and update the
      CPU port MTU too.
      
      Restore the code to the way things used to work by reverting the reorder
      of dsa_tree_setup_master() and dsa_tree_setup_ports(). That change did
      not have a concrete motivation going for it anyway, it just looked
      better.
      
      Fixes: 066dfc42 ("Revert "net: dsa: stop updating master MTU from master.c"")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      762c2998
    • S
      esp: limit skb_page_frag_refill use to a single page · 5bd8baab
      Sabrina Dubroca 提交于
      Commit ebe48d36 ("esp: Fix possible buffer overflow in ESP
      transformation") tried to fix skb_page_frag_refill usage in ESP by
      capping allocsize to 32k, but that doesn't completely solve the issue,
      as skb_page_frag_refill may return a single page. If that happens, we
      will write out of bounds, despite the check introduced in the previous
      patch.
      
      This patch forces COW in cases where we would end up calling
      skb_page_frag_refill with a size larger than a page (first in
      esp_output_head with tailen, then in esp_output_tail with
      skb->data_len).
      
      Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
      Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      5bd8baab
  11. 12 4月, 2022 5 次提交
  12. 11 4月, 2022 2 次提交