1. 07 Dec, 2021 4 commits
  2. 02 Dec, 2021 1 commit
  3. 30 Nov, 2021 1 commit
  4. 29 Nov, 2021 1 commit
    • M
      ipv6: fix memory leak in fib6_rule_suppress · cdef4852
      msizanoen1 committed
      The kernel leaks memory when a `fib` rule is present in IPv6 nftables
      firewall rules and a suppress_prefix rule is present in the IPv6 routing
      rules (used by certain tools such as wg-quick). In such scenarios, every
      incoming packet will leak an allocation in the `ip6_dst_cache` slab cache.
      
      After some hours of `bpftrace`-ing and source code reading, I tracked
      down the issue to ca7a03c4 ("ipv6: do not free rt if
      FIB_LOOKUP_NOREF is set on suppress rule").
      
      The problem with that change is that the generic `args->flags` always has
      `FIB_LOOKUP_NOREF` set[1][2], but the IPv6-specific flag
      `RT6_LOOKUP_F_DST_NOREF` might not be set, leading to `fib6_rule_suppress`
      not decreasing the refcount when needed.
      
      How to reproduce:
       - Add the following nftables rule to a prerouting chain:
           meta nfproto ipv6 fib saddr . mark . iif oif missing drop
         This can be done with:
           sudo nft create table inet test
           sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
           sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
       - Run:
           sudo ip -6 rule add table main suppress_prefixlength 0
       - Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase
         with every incoming ipv6 packet.
      
      This patch exposes the protocol-specific flags to the protocol-specific
      `suppress` function and checks the protocol-specific `flags` argument
      for RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when
      decreasing the refcount.
      
      [1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
      [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105
      Fixes: ca7a03c4 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cdef4852
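The flag mismatch described in this commit message can be modelled in user space. The following is a minimal sketch, not kernel code: `struct model_rt`, the `model_*` functions, and the numeric flag values are hypothetical stand-ins chosen for illustration, and the model assumes every lookup result ends up suppressed. It only shows why consulting the generic flag skips the refcount decrement while the lookup itself took a reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names come from the commit message; the values are illustrative. */
#define FIB_LOOKUP_NOREF        0x1u   /* generic fib_rules flag */
#define RT6_LOOKUP_F_DST_NOREF  0x80u  /* IPv6-specific lookup flag */

/* Hypothetical stand-in for a reference-counted route entry. */
struct model_rt {
    int refcnt;
};

/* The lookup takes a reference unless the IPv6-specific flag says not to. */
static void model_lookup(struct model_rt *rt, unsigned int ip6_flags)
{
    if (!(ip6_flags & RT6_LOOKUP_F_DST_NOREF))
        rt->refcnt++;
}

/* Pre-fix behaviour: the suppress step consults the generic flags. */
static void model_suppress_buggy(struct model_rt *rt, unsigned int generic_flags)
{
    if (!(generic_flags & FIB_LOOKUP_NOREF))
        rt->refcnt--;
}

/* Post-fix behaviour: the suppress step consults the IPv6-specific flags. */
static void model_suppress_fixed(struct model_rt *rt, unsigned int ip6_flags)
{
    if (!(ip6_flags & RT6_LOOKUP_F_DST_NOREF))
        rt->refcnt--;
}

/* Returns the number of references still held after `packets` lookups. */
static int model_leak_after(int packets, bool fixed)
{
    struct model_rt rt = { .refcnt = 0 };
    const unsigned int generic_flags = FIB_LOOKUP_NOREF; /* always set */
    const unsigned int ip6_flags = 0; /* RT6_LOOKUP_F_DST_NOREF not set */

    for (int i = 0; i < packets; i++) {
        model_lookup(&rt, ip6_flags);
        if (fixed)
            model_suppress_fixed(&rt, ip6_flags);
        else
            model_suppress_buggy(&rt, generic_flags);
    }
    return rt.refcnt;
}
```

In this model, `model_leak_after(n, false)` grows by one reference per packet, which mirrors the per-packet growth of `ip6_dst_cache` reported above, while `model_leak_after(n, true)` stays balanced.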
  5. 25 Nov, 2021 4 commits
  6. 22 Nov, 2021 2 commits
  7. 19 Nov, 2021 1 commit
  8. 18 Nov, 2021 3 commits
  9. 17 Nov, 2021 1 commit
  10. 16 Nov, 2021 6 commits
  11. 14 Nov, 2021 1 commit
  12. 11 Nov, 2021 1 commit
  13. 06 Nov, 2021 1 commit
  14. 03 Nov, 2021 2 commits
  15. 02 Nov, 2021 1 commit
    • J
      net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter · 18ac597a
      James Prestwood committed
      In most situations the neighbor discovery cache should be cleared on a
      NOCARRIER event which is currently done unconditionally. But for wireless
      roams the neighbor discovery cache can and should remain intact since
      the underlying network has not changed.
      
      This patch introduces a sysctl option ndisc_evict_nocarrier which can
      be disabled by a wireless supplicant during a roam. This allows packets
      to be sent after a roam immediately without having to wait for
      neighbor discovery.
      
      A user reported roughly a 1 second delay after a roam before packets
      could be sent out (note, on IPv4). This delay was due to the ARP
      cache being cleared. During testing of this same scenario using IPv6
      no delay was noticed, but regardless there is no reason to clear
      the ndisc cache for wireless roams.
      Signed-off-by: James Prestwood <prestwoj@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      18ac597a
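A supplicant could toggle this option around a roam roughly as follows. This is a hedged sketch: the helper names are hypothetical, and the per-interface path `net/ipv6/conf/<ifname>/ndisc_evict_nocarrier` under the sysctl root is an assumption based on the usual layout of IPv6 interface sysctls, not something the commit message states. The sysctl root is a parameter so the path logic can be exercised without root privileges.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the (assumed) sysctl path for an interface.
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int ndisc_evict_path(char *buf, size_t len,
                            const char *sysctl_root, const char *ifname)
{
    int n = snprintf(buf, len, "%s/net/ipv6/conf/%s/ndisc_evict_nocarrier",
                     sysctl_root, ifname);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write 0 (keep the neighbor cache on NOCARRIER) or 1 (evict, the default
 * behaviour described in the commit message). Returns 0 on success.
 */
static int set_ndisc_evict_nocarrier(const char *sysctl_root,
                                     const char *ifname, int enable)
{
    char path[256];
    FILE *f;
    int ok;

    if (ndisc_evict_path(path, sizeof(path), sysctl_root, ifname) != 0)
        return -1;

    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Under these assumptions, a supplicant would call `set_ndisc_evict_nocarrier("/proc/sys", "wlan0", 0)` before the roam and restore the value with `1` afterwards; `wlan0` is only an example interface name.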
  16. 01 Nov, 2021 1 commit
  17. 28 Oct, 2021 3 commits
    • A
      ipv6: enable net.ipv6.route.max_size sysctl in network namespace · 06e6c88f
      Alexander Kuznetsov committed
      We want to increase the route cache size in a network namespace
      created with a user namespace. Currently, IPv6 route settings
      are disabled for non-initial network namespaces.
      Allowing this sysctl is safe since commit <6126891c>, because
      the route cache is accounted to kmem, so users in a user
      namespace cannot DoS the system.
      Signed-off-by: Alexander Kuznetsov <wwfq@yandex-team.ru>
      Acked-by: Dmitry Yakunin <zeil@yandex-team.ru>
      Acked-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06e6c88f
    • E
      tcp: do not clear skb->csum if already zero · 4f226674
      Eric Dumazet committed
      Freshly allocated skbs have their csum field cleared already.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f226674
    • E
      inet: remove races in inet{6}_getname() · 9dfc685e
      Eric Dumazet committed
      syzbot reported data-races in inet_getname() multiple times,
      it is time we fix this instead of pretending applications
      should not trigger them.
      
      getsockname() and getpeername() are not really considered fast path.
      
      v2: added the missing BPF_CGROUP_RUN_SA_PROG() declaration
          needed when CONFIG_CGROUP_BPF=n, as reported by
          kernel test robot <lkp@intel.com>
      
      syzbot typical report:
      
      BUG: KCSAN: data-race in __inet_hash_connect / inet_getname
      
      write to 0xffff888136d66cf8 of 2 bytes by task 14374 on cpu 1:
       __inet_hash_connect+0x7ec/0x950 net/ipv4/inet_hashtables.c:831
       inet_hash_connect+0x85/0x90 net/ipv4/inet_hashtables.c:853
       tcp_v4_connect+0x782/0xbb0 net/ipv4/tcp_ipv4.c:275
       __inet_stream_connect+0x156/0x6e0 net/ipv4/af_inet.c:664
       inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:728
       __sys_connect_file net/socket.c:1896 [inline]
       __sys_connect+0x254/0x290 net/socket.c:1913
       __do_sys_connect net/socket.c:1923 [inline]
       __se_sys_connect net/socket.c:1920 [inline]
       __x64_sys_connect+0x3d/0x50 net/socket.c:1920
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888136d66cf8 of 2 bytes by task 14408 on cpu 0:
       inet_getname+0x11f/0x170 net/ipv4/af_inet.c:790
       __sys_getsockname+0x11d/0x1b0 net/socket.c:1946
       __do_sys_getsockname net/socket.c:1961 [inline]
       __se_sys_getsockname net/socket.c:1958 [inline]
       __x64_sys_getsockname+0x3e/0x50 net/socket.c:1958
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000 -> 0xdee0
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 14408 Comm: syz-executor.3 Not tainted 5.15.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211026213014.3026708-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9dfc685e
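The KCSAN report above shows a 2-byte field written during connect while getsockname() reads it without synchronization. The commit message does not spell out the mechanism of the fix; since it notes that these calls are not fast path, the sketch below assumes the reader simply takes the same lock as the writer. It is a user-space model using a pthread mutex in place of the socket lock, and `struct model_sock` and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the socket fields involved in the race. */
struct model_sock {
    pthread_mutex_t lock;  /* models the socket lock */
    uint32_t saddr;        /* local address */
    uint16_t sport;        /* local port, the 2-byte field in the report */
};

/* Snapshot returned to the caller, as getsockname() would fill in. */
struct model_name {
    uint32_t addr;
    uint16_t port;
};

static int model_sock_init(struct model_sock *sk)
{
    sk->saddr = 0;
    sk->sport = 0;
    return pthread_mutex_init(&sk->lock, NULL);
}

/* Writer: assigns the local address and port under the lock. */
static void model_connect(struct model_sock *sk, uint32_t addr, uint16_t port)
{
    pthread_mutex_lock(&sk->lock);
    sk->saddr = addr;
    sk->sport = port;
    pthread_mutex_unlock(&sk->lock);
}

/* Reader: copies both fields under the same lock, so it never observes
 * a half-updated address/port pair. */
static struct model_name model_getname(struct model_sock *sk)
{
    struct model_name out;

    pthread_mutex_lock(&sk->lock);
    out.addr = sk->saddr;
    out.port = sk->sport;
    pthread_mutex_unlock(&sk->lock);
    return out;
}
```

The cost is one lock acquisition per getsockname() or getpeername() call, which is the trade-off the message accepts by observing that these calls are not performance critical.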
  18. 26 Oct, 2021 5 commits
  19. 23 Oct, 2021 1 commit