1. 22 Apr, 2022 2 commits
  2. 17 Apr, 2022 1 commit
  3. 16 Apr, 2022 2 commits
    • ipv6: make ip6_rt_gc_expire an atomic_t · 9cb7c013
      Committed by Eric Dumazet
      Reads and writes to ip6_rt_gc_expire have always been racy,
      as syzbot recently reported [1].

      There is a risk of underflow, leading to an unexpectedly
      high value being passed to fib6_run_gc(),
      although I have not observed this in the field.

      Hosts hitting ip6_dst_gc() very hard are in a pretty bad
      state anyway.
      
      [1]
      BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc
      
      read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1:
       ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
       dst_alloc+0x9b/0x160 net/core/dst.c:86
       ip6_dst_alloc net/ipv6/route.c:344 [inline]
       icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
       mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
       mld_send_cr net/ipv6/mcast.c:2119 [inline]
       mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
       process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
       worker_thread+0x618/0xa70 kernel/workqueue.c:2436
       kthread+0x1a9/0x1e0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30
      
      read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0:
       ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
       dst_alloc+0x9b/0x160 net/core/dst.c:86
       ip6_dst_alloc net/ipv6/route.c:344 [inline]
       icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
       mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
       mld_send_cr net/ipv6/mcast.c:2119 [inline]
       mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
       process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
       worker_thread+0x618/0xa70 kernel/workqueue.c:2436
       kthread+0x1a9/0x1e0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000bb3 -> 0x00000ba9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: mld mld_ifc_work
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
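The race described in this commit can be illustrated outside the kernel. The following is a minimal user-space sketch using C11 atomics, not the kernel patch itself: `gc_expire`, `gc_expire_bump()` and `gc_expire_decay()` are hypothetical stand-ins for `ip6_rt_gc_expire` and the updates made in `ip6_dst_gc()`. The point it tries to show is that computing the decay from a single snapshot keeps the result non-negative, whereas a racy `x -= x >> e` may read `x` twice.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for net->ipv6.ip6_rt_gc_expire. */
static atomic_int gc_expire = 30;

/* Bump the expiry once per GC run; the atomic read-modify-write
 * cannot lose a concurrent increment. Returns the new value. */
static int gc_expire_bump(void)
{
	return atomic_fetch_add(&gc_expire, 1) + 1;
}

/* Decay step: expire -= expire >> elasticity, computed from one
 * snapshot so that the subtraction can never go below zero. */
static int gc_expire_decay(unsigned int elasticity)
{
	assert(elasticity < 31);

	int val = atomic_load(&gc_expire);
	int next = val - (val >> elasticity);

	atomic_store(&gc_expire, next);
	return next;
}
```

Note that the load/store pair in the decay step is not a single atomic operation; concurrent callers may still overwrite each other's result, but each stored value is derived from one consistent read.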
    • ipv6: fix NULL deref in ip6_rcv_core() · 0339d25a
      Committed by Eric Dumazet
      idev can be NULL, as the surrounding code suggests.
      
      Fixes: 4daf841a ("net: ipv6: add skb drop reasons to ip6_rcv_core()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Menglong Dong <imagedong@tencent.com>
      Cc: Jiang Biao <benbjiang@tencent.com>
      Cc: Hao Peng <flyingpeng@tencent.com>
      Link: https://lore.kernel.org/r/20220413205653.1178458-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 15 Apr, 2022 2 commits
    • ip6_gre: Fix skb_under_panic in __gre6_xmit() · ab198e1d
      Committed by Peilin Ye
      Feng reported an skb_under_panic BUG triggered by running
      test_ip6gretap() in tools/testing/selftests/bpf/test_tunnel.sh:
      
      [   82.492551] skbuff: skb_under_panic: text:ffffffffb268bb8e len:403 put:12 head:ffff9997c5480000 data:ffff9997c547fff8 tail:0x18b end:0x2c0 dev:ip6gretap11
      <...>
      [   82.607380] Call Trace:
      [   82.609389]  <TASK>
      [   82.611136]  skb_push.cold.109+0x10/0x10
      [   82.614289]  __gre6_xmit+0x41e/0x590
      [   82.617169]  ip6gre_tunnel_xmit+0x344/0x3f0
      [   82.620526]  dev_hard_start_xmit+0xf1/0x330
      [   82.623882]  sch_direct_xmit+0xe4/0x250
      [   82.626961]  __dev_queue_xmit+0x720/0xfe0
      <...>
      [   82.633431]  packet_sendmsg+0x96a/0x1cb0
      [   82.636568]  sock_sendmsg+0x30/0x40
      <...>
      
      The following sequence of events caused the BUG:
      
      1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
         calculated based on old flags (see ip6gre_calc_hlen());
      2. packet_snd() reserves header room for skb A, assuming
         tunnel->tun_hlen is 4;
      3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
         skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
      4. __gre6_xmit() detects the new tunnel key, and recalculates
         "tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
         TUNNEL_SEQ);
      5. gre_build_header() calls skb_push() with insufficient reserved header
         room, triggering the BUG.
      
      As suggested by Cong, fix it by moving the call to skb_cow_head() after
      the recalculation of tun_hlen.
      
      Reproducer:
      
        OBJ=$LINUX/tools/testing/selftests/bpf/test_tunnel_kern.o
      
        ip netns add at_ns0
        ip link add veth0 type veth peer name veth1
        ip link set veth0 netns at_ns0
        ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
        ip netns exec at_ns0 ip link set dev veth0 up
        ip link set dev veth1 up mtu 1500
        ip addr add dev veth1 172.16.1.200/24
      
        ip netns exec at_ns0 ip addr add ::11/96 dev veth0
        ip netns exec at_ns0 ip link set dev veth0 up
        ip addr add dev veth1 ::22/96
        ip link set dev veth1 up
      
        ip netns exec at_ns0 \
        	ip link add dev ip6gretap00 type ip6gretap seq flowlabel 0xbcdef key 2 \
        	local ::11 remote ::22
      
        ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
        ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
        ip netns exec at_ns0 ip link set dev ip6gretap00 up
      
        ip link add dev ip6gretap11 type ip6gretap external
        ip addr add dev ip6gretap11 10.1.1.200/24
        ip addr add dev ip6gretap11 fc80::200/24
        ip link set dev ip6gretap11 up
      
        tc qdisc add dev ip6gretap11 clsact
        tc filter add dev ip6gretap11 egress bpf da obj $OBJ sec ip6gretap_set_tunnel
        tc filter add dev ip6gretap11 ingress bpf da obj $OBJ sec ip6gretap_get_tunnel
      
        ping6 -c 3 -w 10 -q ::11
      
      Fixes: 6712abc1 ("ip6_gre: add ip6 gre and gretap collect_md mode")
      Reported-by: Feng Zhou <zhoufeng.zf@bytedance.com>
      Co-developed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
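The ordering problem in the event sequence above can be modelled with a toy buffer. This is a sketch under stated assumptions: `struct toy_skb`, `toy_cow_head()` and `toy_push()` are hypothetical user-space models of the sk_buff headroom, `skb_cow_head()` and `skb_push()`, and the header lengths 4 and 12 are the example values from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy packet buffer: only the headroom is tracked. */
struct toy_skb {
	size_t data;	/* bytes of headroom in front of the payload */
};

/* Model of skb_cow_head(): guarantee at least 'need' bytes of headroom. */
static void toy_cow_head(struct toy_skb *skb, size_t need)
{
	if (skb->data < need)
		skb->data = need;	/* the kernel would reallocate here */
}

/* Model of skb_push(): returns false where the kernel would hit
 * skb_under_panic because the header does not fit. */
static bool toy_push(struct toy_skb *skb, size_t len)
{
	if (len > skb->data)
		return false;
	skb->data -= len;
	return true;
}

/* Buggy order: headroom is ensured for the old header length,
 * then the header is pushed with the recalculated length. */
static bool xmit_buggy(struct toy_skb *skb, size_t old_hlen, size_t new_hlen)
{
	toy_cow_head(skb, old_hlen);
	return toy_push(skb, new_hlen);
}

/* Fixed order: recalculate first, then ensure headroom, then push. */
static bool xmit_fixed(struct toy_skb *skb, size_t new_hlen)
{
	size_t tun_hlen = new_hlen;	/* local value, recalculated first */

	toy_cow_head(skb, tun_hlen);
	return toy_push(skb, tun_hlen);
}
```

With 4 bytes of headroom reserved, `xmit_buggy(&skb, 4, 12)` fails in this model, while `xmit_fixed(&skb, 12)` succeeds because the headroom check sees the final length.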
    • ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit() · f40c064e
      Committed by Peilin Ye
      Do not update tunnel->tun_hlen in data plane code.  Use a local variable
      instead, just like "tunnel_hlen" in net/ipv4/ip_gre.c:gre_fb_xmit().
      Co-developed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 13 Apr, 2022 9 commits
  6. 12 Apr, 2022 1 commit
    • net: remove noblock parameter from recvmsg() entities · ec095263
      Committed by Oliver Hartkopp
      The internal recvmsg() functions have two parameters, 'flags' and 'noblock',
      that were merged inside skb_recv_datagram(). As a follow-up to commit
      f4b41f06 ("net: remove noblock parameter from skb_recv_datagram()"),
      this patch removes the separate 'noblock' parameter for recvmsg().

      As in the referenced patch for skb_recv_datagram(), the 'flags' and
      'noblock' parameters are unnecessarily split up, e.g. in
      
      err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
                                 flags & ~MSG_DONTWAIT, &addr_len);
      
      or in
      
      err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg,
                            sk, msg, size, flags & MSG_DONTWAIT,
                            flags & ~MSG_DONTWAIT, &addr_len);
      
      instead of simply using only flags all the time and checking for MSG_DONTWAIT
      where needed (to preserve the formerly separate no(n)block condition).
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
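The equivalence this commit relies on can be sketched in user space. The functions below are hypothetical and only model the blocking decision, not any real recvmsg() implementation; `MSG_DONTWAIT` and `MSG_PEEK` come from `<sys/socket.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>	/* MSG_DONTWAIT, MSG_PEEK */

/* Old shape: the caller splits MSG_DONTWAIT out into 'noblock'. */
static bool old_would_block(int noblock, int flags)
{
	(void)flags;	/* MSG_DONTWAIT was already masked out by the caller */
	return !noblock;
}

/* New shape: a single 'flags' argument, tested where needed. */
static bool new_would_block(int flags)
{
	return !(flags & MSG_DONTWAIT);
}

/* Both shapes are expected to agree for any flags value. */
static void check_equivalent(int flags)
{
	assert(old_would_block(flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT) ==
	       new_would_block(flags));
}
```

Calling `check_equivalent(MSG_PEEK)` and `check_equivalent(MSG_PEEK | MSG_DONTWAIT)` exercises both the blocking and the non-blocking case.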
  7. 11 Apr, 2022 3 commits
    • ipv6: fix panic when forwarding a pkt with no in6 dev · e3fa461d
      Committed by Nicolas Dichtel
      kongweibin reported a kernel panic in ip6_forward() when the input
      interface has no associated in6 dev.
      
      The following tc commands were used to reproduce this panic:
      tc qdisc del dev vxlan100 root
      tc qdisc add dev vxlan100 root netem corrupt 5%
      
      CC: stable@vger.kernel.org
      Fixes: ccd27f05 ("ipv6: fix 'disable_policy' for fwd packets")
      Reported-by: kongweibin <kongweibin2@huawei.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nft_fib: reverse path filter for policy-based routing on iif · be8be04e
      Committed by Pablo Neira Ayuso
      If policy-based routing using the iif selector is used, then the fib
      expression fails to look up the reverse path from the prerouting
      hook because the input interface cannot be inferred. To support
      this scenario, extend the fib expression so that it can be used after
      the route lookup, from the forward hook.

      This patch also adds support for the input hook for usability reasons.
      Since the prerouting hook cannot be used for the scenario described
      above, users need two rules: one for the forward chain and another
      for the input chain, to perform the reverse path check for locally
      targeted traffic.
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: icmp: add skb drop reasons to icmp protocol · b384c95a
      Committed by Menglong Dong
      Replace kfree_skb() used in icmp_rcv() and icmpv6_rcv() with
      kfree_skb_reason().
      
      To obtain the reason for an skb drop after ICMP message handling,
      we change the return type of 'handler()' in 'struct icmp_control' from
      'bool' to 'enum skb_drop_reason'. This changes its original
      meaning: 'false' used to mean failure, but 'SKB_NOT_DROPPED_YET' now
      means success. Therefore, all 'handler' functions and their call sites
      need to be updated. The following 'handler' functions are involved:
      
      icmp_unreach()
      icmp_redirect()
      icmp_echo()
      icmp_timestamp()
      icmp_discard()
      
      The following new drop reasons are added:
      
      SKB_DROP_REASON_ICMP_CSUM
      SKB_DROP_REASON_INVALID_PROTO
      
      The reason 'INVALID_PROTO' is introduced for the case where the packet
      doesn't follow RFC 1122 and is dropped. This is not a common case, and
      I believe we can locate the problem from the data in the packet. For now,
      'INVALID_PROTO' is used for ICMP broadcasts with wrong types.

      Maybe there should be a document file for these reasons, for example
      listing all the cases that cause the 'UNHANDLED_PROTO' and 'INVALID_PROTO'
      drop reasons, so that users can locate their problems according to the
      document.
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
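The change of return convention described in this commit can be sketched as follows. Everything here is hypothetical and reduced for illustration: `enum toy_drop_reason` mimics a small part of `enum skb_drop_reason`, and `toy_accept()` / `toy_reject()` are invented handlers, not the kernel's `icmp_*()` functions. The sketch shows why call sites must change: success is now the zero value, the opposite of the old `bool` convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced copy of the drop-reason enum. */
enum toy_drop_reason {
	TOY_NOT_DROPPED_YET = 0,	/* success: zero, unlike the old 'true' */
	TOY_DROP_REASON_ICMP_CSUM,
	TOY_DROP_REASON_INVALID_PROTO,
};

/* Handler table entry, modelled on the 'handler' member described above. */
struct toy_icmp_control {
	enum toy_drop_reason (*handler)(int type);
};

static enum toy_drop_reason toy_accept(int type)
{
	(void)type;
	return TOY_NOT_DROPPED_YET;
}

static enum toy_drop_reason toy_reject(int type)
{
	(void)type;
	return TOY_DROP_REASON_INVALID_PROTO;
}

static const struct toy_icmp_control toy_pointers[] = {
	{ .handler = toy_accept },
	{ .handler = toy_reject },
};

/* Returns 1 if the packet was handled, 0 if it was dropped;
 * the drop reason is stored in *reason either way. */
static int toy_rcv(size_t idx, int type, enum toy_drop_reason *reason)
{
	assert(idx < sizeof(toy_pointers) / sizeof(toy_pointers[0]));

	*reason = toy_pointers[idx].handler(type);
	/* An old bool-style test "if (!ret) drop" would be inverted here. */
	return *reason == TOY_NOT_DROPPED_YET;
}
```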
  8. 08 Apr, 2022 1 commit
  9. 07 Apr, 2022 2 commits
  10. 06 Apr, 2022 3 commits
  11. 05 Apr, 2022 1 commit
  12. 20 Mar, 2022 2 commits
  13. 16 Mar, 2022 1 commit
    • net: Add l3mdev index to flow struct and avoid oif reset for port devices · 40867d74
      Committed by David Ahern
      The fundamental premise of VRF and l3mdev core code is binding a socket
      to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
      Legacy code resets flowi_oif to the l3mdev losing any original port
      device binding. Ben (among others) has demonstrated use cases where the
      original port device binding is important and needs to be retained.
      This patch handles that by adding a new entry to the common flow struct
      that can indicate the l3mdev index for later rule and table matching
      avoiding the need to reset flowi_oif.
      
      In addition to allowing more use cases that require port device binds,
      this patch brings a few datapath simplifications:

      1. l3mdev_fib_rule_match is only called when walking fib rules and
         always after l3mdev_update_flow. That allows an optimization to bail
         early for non-VRF use cases when flowi_l3mdev is not set. Also,
         only that index needs to be checked for the FIB table id.

      2. l3mdev_update_flow can be called with flowi_oif set to an l3mdev
         (e.g., VRF) device. By resetting flowi_oif only for this case, the
         FLOWI_FLAG_SKIP_NH_OIF flag is no longer needed and can be removed,
         removing several checks in the datapath. The flowi_iif path can be
         simplified to only be called if it is not loopback (loopback cannot
         be assigned to an L3 domain) and the l3mdev index is not already
         set.
      
      3. Avoid another device lookup in the output path when the fib lookup
         returns a reject failure.
      
      Note: 2 functional tests for local traffic with reject fib rules are
      updated to reflect the new direct failure at FIB lookup time for ping
      rather than the failure on packet path. The current code fails like this:
      
          HINT: Fails since address on vrf device is out of device scope
          COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
          ping: Warning: source address might be selected on device other than: eth1
          PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.
      
          --- 172.16.3.1 ping statistics ---
          1 packets transmitted, 0 received, 100% packet loss, time 0ms
      
      where the test now directly fails:
      
          HINT: Fails since address on vrf device is out of device scope
          COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
          ping: connect: No route to host
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Tested-by: Ben Greear <greearb@candelatech.com>
      Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  14. 14 Mar, 2022 1 commit
    • esp6: fix check on ipv6_skip_exthdr's return value · 4db4075f
      Committed by Sabrina Dubroca
      Commit 5f9c55c8 ("ipv6: check return value of ipv6_skip_exthdr")
      introduced an incorrect check, which leads to all ESP packets over
      either TCPv6 or UDPv6 encapsulation being dropped. In this particular
      case, offset is negative, since skb->data points to the ESP header in
      the following chain of headers, while skb->network_header points to
      the IPv6 header:
      
          IPv6 | ext | ... | ext | UDP | ESP | ...
      
      That doesn't seem to be a problem, especially considering that if we
      reach esp6_input_done2, we're guaranteed to have a full set of headers
      available (otherwise the packet would have been dropped earlier in the
      stack). However, it means that the return value will (intentionally)
      be negative. We can make the test more specific, as the expected
      return value of ipv6_skip_exthdr will be the (negated) size of either
      a UDP header, or a TCP header with possible options.
      
      In the future, we should probably either make ipv6_skip_exthdr
      explicitly accept negative offsets (and adjust its return value for
      error cases), or make ipv6_skip_exthdr only take non-negative
      offsets (and audit all callers).
      
      Fixes: 5f9c55c8 ("ipv6: check return value of ipv6_skip_exthdr")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
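The difference between the overly broad check and a more specific one can be sketched as below. This is one possible reading of "more specific", not the code of the actual patch, which may use a simpler test. The helper names and the `TOY_*` constants are hypothetical; the header sizes (8 bytes for UDP, 20 to 60 bytes for TCP) are the standard ones.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_UDP_HLEN		8	/* fixed UDP header size */
#define TOY_TCP_HLEN_MIN	20	/* TCP header without options */
#define TOY_TCP_HLEN_MAX	60	/* TCP header with maximum options */

static_assert(TOY_TCP_HLEN_MIN <= TOY_TCP_HLEN_MAX, "TCP header range");

/* Too broad: treats every negative return value as an error,
 * which rejects the legitimate encapsulated-ESP case. */
static bool check_too_broad(int ret)
{
	return ret >= 0;
}

/* More specific: accept only the negated size of a UDP header,
 * or of a TCP header with possible options. */
static bool check_specific(int ret)
{
	if (ret >= 0 || ret < -TOY_TCP_HLEN_MAX)
		return false;

	int hlen = -ret;

	return hlen == TOY_UDP_HLEN ||
	       (hlen >= TOY_TCP_HLEN_MIN && hlen <= TOY_TCP_HLEN_MAX);
}
```

In this model a return value of -8 (UDP encapsulation) is rejected by the broad check but accepted by the specific one, while -1 is rejected by both.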
  15. 12 Mar, 2022 1 commit
  16. 09 Mar, 2022 1 commit
  17. 07 Mar, 2022 3 commits
  18. 04 Mar, 2022 1 commit
  19. 03 Mar, 2022 3 commits
    • net: Postpone skb_clear_delivery_time() until knowing the skb is delivered locally · cd14e9b7
      Committed by Martin KaFai Lau
      The previous patches handled the delivery_time in the ingress path
      before the routing decision is made.  This patch can postpone clearing
      delivery_time in a skb until knowing it is delivered locally and also
      set the (rcv) timestamp if needed.  This patch moves the
      skb_clear_delivery_time() from dev.c to ip_local_deliver_finish()
      and ip6_input_finish().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: Get rcv timestamp if needed when handling hop-by-hop IOAM option · b6561f84
      Committed by Martin KaFai Lau
      IOAM is a hop-by-hop option with a temporary IANA allocation (49).
      Since it is hop-by-hop, it is processed before the input routing decision.
      One of the traced data fields is the (rcv) timestamp.
      
      When the locally generated skb is looping from egress to ingress over
      a virtual interface (e.g. veth, loopback...), skb->tstamp may have the
      delivery time before it is known that it will be delivered locally
      and received by another sk.
      
      As with the handling of network tapping (tcpdump) in the earlier patch,
      this patch gets the timestamp if needed without overwriting the
      delivery_time in skb->tstamp.  skb_tstamp_cond() is added to do the
      ktime_get_real() with an extra cond arg to check on top of the
      netstamp_needed_key static key.  skb_tstamp_cond() will also be used in
      a later patch and it needs the netstamp_needed_key check.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
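A helper of the kind described in this commit could behave roughly as sketched below. This is a user-space approximation under stated assumptions, not the kernel's skb_tstamp_cond(): `struct toy_skb`, `toy_netstamp_needed` (standing in for the netstamp_needed_key static key) and `toy_now_real_ns()` (standing in for ktime_get_real()) are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

struct toy_skb {
	int64_t tstamp;			/* ns: receive time or delivery time */
	bool mono_delivery_time;	/* true if tstamp holds a delivery time */
};

/* Hypothetical stand-in for the netstamp_needed_key static key. */
static bool toy_netstamp_needed;

/* Hypothetical stand-in for ktime_get_real(). */
static int64_t toy_now_real_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Return a receive timestamp without touching skb->tstamp:
 * reuse tstamp if it already is a receive time, otherwise take
 * a fresh one only when the key or the extra 'cond' asks for it. */
static int64_t toy_tstamp_cond(const struct toy_skb *skb, bool cond)
{
	if (!skb->mono_delivery_time && skb->tstamp)
		return skb->tstamp;

	if (toy_netstamp_needed || cond)
		return toy_now_real_ns();

	return 0;
}
```

Because the skb is taken as `const`, the delivery time stored in `tstamp` is left intact for later users, which is the property the commit message emphasizes.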
    • net: ipv6: Handle delivery_time in ipv6 defrag · 335c8cf3
      Committed by Martin KaFai Lau
      A later patch will postpone the delivery_time clearing until the stack
      knows the skb is being delivered locally (i.e. calling
      skb_clear_delivery_time() at ip_local_deliver_finish() for IPv4
      and at ip6_input_finish() for IPv6).  That will allow other kernel
      forwarding paths (e.g. ip[6]_forward) to keep the delivery_time as well.

      Very similar IPv6 defrag code has been duplicated in
      multiple places: regular IPv6, nf_conntrack, and 6lowpan.
      
      Unlike the IPv4 defrag which is done before ip_local_deliver_finish(),
      the regular IPv6 defrag is done after ip6_input_finish().
      Thus, no change should be needed in the regular IPv6 defrag
      logic because skb_clear_delivery_time() should have been called.
      
      6lowpan also does not need special handling on delivery_time
      because it is a non-inet packet_type.
      
      However, nf_conntrack has a case in NF_INET_PRE_ROUTING that needs
      to do the IPv6 defrag earlier.  Thus, it needs to save the
      mono_delivery_time bit in the inet_frag_queue, which is similar
      to how it is handled in the previous patch for the IPv4 defrag.
      
      This patch chooses to do it consistently and stores the mono_delivery_time
      in the inet_frag_queue for all cases such that it will be easier
      for the future refactoring effort on the IPv6 reasm code.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>