1. 29 11月, 2021 1 次提交
    • M
      ipv6: fix memory leak in fib6_rule_suppress · cdef4852
      msizanoen1 提交于
      The kernel leaks memory when a `fib` rule is present in IPv6 nftables
      firewall rules and a suppress_prefix rule is present in the IPv6 routing
      rules (used by certain tools such as wg-quick). In such scenarios, every
      incoming packet will leak an allocation in `ip6_dst_cache` slab cache.
      
      After some hours of `bpftrace`-ing and source code reading, I tracked
      down the issue to ca7a03c4 ("ipv6: do not free rt if
      FIB_LOOKUP_NOREF is set on suppress rule").
      
      The problem with that change is that the generic `args->flags` always have
      `FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag
      `RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not
      decreasing the refcount when needed.
      
      How to reproduce:
       - Add the following nftables rule to a prerouting chain:
           meta nfproto ipv6 fib saddr . mark . iif oif missing drop
         This can be done with:
           sudo nft create table inet test
           sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
           sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
       - Run:
           sudo ip -6 rule add table main suppress_prefixlength 0
       - Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase
         with every incoming ipv6 packet.
      
      This patch exposes the protocol-specific flags to the protocol
      specific `suppress` function, and check the protocol-specific `flags`
      argument for RT6_LOOKUP_F_DST_NOREF instead of the generic
      FIB_LOOKUP_NOREF when decreasing the refcount, like this.
      
      [1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
      [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105
      Fixes: ca7a03c4 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
      Cc: stable@vger.kernel.org
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdef4852
  2. 20 7月, 2021 1 次提交
    • V
      memcg: enable accounting for IP address and routing-related objects · 6126891c
      Vasily Averin 提交于
      An netadmin inside container can use 'ip a a' and 'ip r a'
      to assign a large number of ipv4/ipv6 addresses and routing entries
      and force kernel to allocate megabytes of unaccounted memory
      for long-lived per-netdevice related kernel objects:
      'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node',
      'struct rt6_info', 'struct fib_rules' and ip_fib caches.
      
      These objects can be manually removed, though usually they lives
      in memory till destroy of its net namespace.
      
      It makes sense to account for them to restrict the host's memory
      consumption from inside the memcg-limited container.
      
      One of such objects is the 'struct fib6_node' mostly allocated in
      net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section:
      
       write_lock_bh(&table->tb6_lock);
       err = fib6_add(&table->tb6_root, rt, info, mxc);
       write_unlock_bh(&table->tb6_lock);
      
      In this case it is not enough to simply add SLAB_ACCOUNT to corresponding
      kmem cache. The proper memory cgroup still cannot be found due to the
      incorrect 'in_interrupt()' check used in memcg_kmem_bypass().
      
      Obsoleted in_interrupt() does not describe real execution context properly.
      >From include/linux/preempt.h:
      
       The following macros are deprecated and should not be used in new code:
       in_interrupt()	- We're in NMI,IRQ,SoftIRQ context or have BH disabled
      
      To verify the current execution context new macro should be used instead:
       in_task()	- We're in task context
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6126891c
  3. 04 6月, 2021 1 次提交
  4. 17 11月, 2020 1 次提交
  5. 09 9月, 2020 1 次提交
    • B
      fib: fix fib_rule_ops indirect call wrappers when CONFIG_IPV6=m · 923f614c
      Brian Vazquez 提交于
      If CONFIG_IPV6=m, the IPV6 functions won't be found by the linker:
      
      ld: net/core/fib_rules.o: in function `fib_rules_lookup':
      fib_rules.c:(.text+0x606): undefined reference to `fib6_rule_match'
      ld: fib_rules.c:(.text+0x611): undefined reference to `fib6_rule_match'
      ld: fib_rules.c:(.text+0x68c): undefined reference to `fib6_rule_action'
      ld: fib_rules.c:(.text+0x693): undefined reference to `fib6_rule_action'
      ld: fib_rules.c:(.text+0x6aa): undefined reference to `fib6_rule_suppress'
      ld: fib_rules.c:(.text+0x6bc): undefined reference to `fib6_rule_suppress'
      make: *** [Makefile:1166: vmlinux] Error 1
      Reported-by: NSven Joachim <svenjoac@gmx.de>
      Fixes: b9aaec8f ("fib: use indirect call wrappers in the most common fib_rules_ops")
      Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Signed-off-by: NBrian Vazquez <brianvv@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      923f614c
  6. 04 8月, 2020 1 次提交
  7. 02 8月, 2020 1 次提交
  8. 30 7月, 2020 1 次提交
  9. 29 7月, 2020 1 次提交
  10. 17 2月, 2020 1 次提交
  11. 05 10月, 2019 3 次提交
  12. 06 6月, 2019 1 次提交
  13. 05 6月, 2019 1 次提交
  14. 09 5月, 2019 1 次提交
  15. 28 4月, 2019 1 次提交
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
  16. 09 10月, 2018 1 次提交
  17. 31 7月, 2018 1 次提交
  18. 30 6月, 2018 1 次提交
    • R
      net: fib_rules: bring back rule_exists to match rule during add · 35e8c7ba
      Roopa Prabhu 提交于
      After commit f9d4b0c1 ("fib_rules: move common handling of newrule
      delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find
      for existing rule lookup in both the add and del paths. While this
      is good for the delete path, it solves a few problems but opens up
      a few invalid key matches in the add path.
      
      $ip -4 rule add table main tos 10 fwmark 1
      $ip -4 rule add table main tos 10
      RTNETLINK answers: File exists
      
      The problem here is rule_find does not check if the key masks in
      the new and old rule are the same and hence ends up matching a more
      secific rule. Rule key masks cannot be easily compared today without
      an elaborate if-else block. Its best to introduce key masks for easier
      and accurate rule comparison in the future. Until then, due to fear of
      regressions this patch re-introduces older loose rule_exists during add.
      Also fixes both rule_exists and rule_find to cover missing attributes.
      
      Fixes: f9d4b0c1 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35e8c7ba
  19. 27 6月, 2018 1 次提交
  20. 26 4月, 2018 1 次提交
  21. 24 4月, 2018 1 次提交
  22. 23 4月, 2018 2 次提交
  23. 30 3月, 2018 1 次提交
  24. 28 3月, 2018 1 次提交
  25. 01 3月, 2018 1 次提交
  26. 24 2月, 2018 1 次提交
  27. 22 2月, 2018 1 次提交
  28. 13 2月, 2018 1 次提交
  29. 14 11月, 2017 1 次提交
  30. 01 11月, 2017 1 次提交
  31. 10 8月, 2017 1 次提交
  32. 04 8月, 2017 1 次提交
  33. 14 7月, 2017 1 次提交
  34. 03 7月, 2017 1 次提交
    • E
      net: avoid one splat in fib_nl_delrule() · 5361e209
      Eric Dumazet 提交于
      We need to use refcount_set() on a newly created rule to avoid
      following error :
      
      [   64.601749] ------------[ cut here ]------------
      [   64.601757] WARNING: CPU: 0 PID: 6476 at lib/refcount.c:184 refcount_sub_and_test+0x75/0xa0
      [   64.601758] Modules linked in: w1_therm wire cdc_acm ehci_pci ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
      [   64.601769] CPU: 0 PID: 6476 Comm: ip Tainted: G        W       4.12.0-smp-DEV #274
      [   64.601771] task: ffff8837bf482040 task.stack: ffff8837bdc08000
      [   64.601773] RIP: 0010:refcount_sub_and_test+0x75/0xa0
      [   64.601774] RSP: 0018:ffff8837bdc0f5c0 EFLAGS: 00010286
      [   64.601776] RAX: 0000000000000026 RBX: 0000000000000001 RCX: 0000000000000000
      [   64.601777] RDX: 0000000000000026 RSI: 0000000000000096 RDI: ffffed06f7b81eae
      [   64.601778] RBP: ffff8837bdc0f5d0 R08: 0000000000000004 R09: fffffbfff4a54c25
      [   64.601779] R10: 00000000cbc500e5 R11: ffffffffa52a6128 R12: ffff881febcf6f24
      [   64.601779] R13: ffff881fbf4eaf00 R14: ffff881febcf6f80 R15: ffff8837d7a4ed00
      [   64.601781] FS:  00007ff5a2f6b700(0000) GS:ffff881fff800000(0000) knlGS:0000000000000000
      [   64.601782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   64.601783] CR2: 00007ffcdc70d000 CR3: 0000001f9c91e000 CR4: 00000000001406f0
      [   64.601783] Call Trace:
      [   64.601786]  refcount_dec_and_test+0x11/0x20
      [   64.601790]  fib_nl_delrule+0xc39/0x1630
      [   64.601793]  ? is_bpf_text_address+0xe/0x20
      [   64.601795]  ? fib_nl_newrule+0x25e0/0x25e0
      [   64.601798]  ? depot_save_stack+0x133/0x470
      [   64.601801]  ? ns_capable+0x13/0x20
      [   64.601803]  ? __netlink_ns_capable+0xcc/0x100
      [   64.601806]  rtnetlink_rcv_msg+0x23a/0x6a0
      [   64.601808]  ? rtnl_newlink+0x1630/0x1630
      [   64.601811]  ? memset+0x31/0x40
      [   64.601813]  netlink_rcv_skb+0x2d7/0x440
      [   64.601815]  ? rtnl_newlink+0x1630/0x1630
      [   64.601816]  ? netlink_ack+0xaf0/0xaf0
      [   64.601818]  ? kasan_unpoison_shadow+0x35/0x50
      [   64.601820]  ? __kmalloc_node_track_caller+0x4c/0x70
      [   64.601821]  rtnetlink_rcv+0x28/0x30
      [   64.601823]  netlink_unicast+0x422/0x610
      [   64.601824]  ? netlink_attachskb+0x650/0x650
      [   64.601826]  netlink_sendmsg+0x7b7/0xb60
      [   64.601828]  ? netlink_unicast+0x610/0x610
      [   64.601830]  ? netlink_unicast+0x610/0x610
      [   64.601832]  sock_sendmsg+0xba/0xf0
      [   64.601834]  ___sys_sendmsg+0x6a9/0x8c0
      [   64.601835]  ? copy_msghdr_from_user+0x520/0x520
      [   64.601837]  ? __alloc_pages_nodemask+0x160/0x520
      [   64.601839]  ? memcg_write_event_control+0xd60/0xd60
      [   64.601841]  ? __alloc_pages_slowpath+0x1d50/0x1d50
      [   64.601843]  ? kasan_slab_free+0x71/0xc0
      [   64.601845]  ? mem_cgroup_commit_charge+0xb2/0x11d0
      [   64.601847]  ? lru_cache_add_active_or_unevictable+0x7d/0x1a0
      [   64.601849]  ? __handle_mm_fault+0x1af8/0x2810
      [   64.601851]  ? may_open_dev+0xc0/0xc0
      [   64.601852]  ? __pmd_alloc+0x2c0/0x2c0
      [   64.601853]  ? __fdget+0x13/0x20
      [   64.601855]  __sys_sendmsg+0xc6/0x150
      [   64.601856]  ? __sys_sendmsg+0xc6/0x150
      [   64.601857]  ? SyS_shutdown+0x170/0x170
      [   64.601859]  ? handle_mm_fault+0x28a/0x650
      [   64.601861]  SyS_sendmsg+0x12/0x20
      [   64.601863]  entry_SYSCALL_64_fastpath+0x13/0x94
      
      Fixes: 717d1e99 ("net: convert fib_rule.refcnt from atomic_t to refcount_t")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5361e209
  35. 01 7月, 2017 1 次提交
  36. 21 6月, 2017 1 次提交
    • S
      fib_rules: Resolve goto rules target on delete · bdaf32c3
      Serhey Popovych 提交于
      We should avoid marking goto rules unresolved when their
      target is actually reachable after rule deletion.
      
      Consolder following sample scenario:
      
        # ip -4 ru sh
        0:      from all lookup local
        32000:  from all goto 32100
        32100:  from all lookup main
        32100:  from all lookup default
        32766:  from all lookup main
        32767:  from all lookup default
      
        # ip -4 ru del pref 32100 table main
        # ip -4 ru sh
        0:      from all lookup local
        32000:  from all goto 32100 [unresolved]
        32100:  from all lookup default
        32766:  from all lookup main
        32767:  from all lookup default
      
      After removal of first rule with preference 32100 we
      mark all goto rules as unreachable, even when rule with
      same preference as removed one still present.
      
      Check if next rule with same preference is available
      and make all rules with goto action pointing to it.
      Signed-off-by: NSerhey Popovych <serhe.popovych@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdaf32c3
  37. 28 4月, 2017 1 次提交