1. 17 5月, 2019 2 次提交
    • W
      ipv6: fix src addr routing with the exception table · 510e2ced
      Wei Wang 提交于
      When inserting route cache into the exception table, the key is
      generated with both src_addr and dest_addr with src addr routing.
      However, current logic always assumes the src_addr used to generate the
      key is a /128 host address. This is not true in the following scenarios:
      1. When the route is a gateway route or does not have next hop.
         (rt6_is_gw_or_nonexthop() == false)
      2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
      This means, when looking for a route cache in the exception table, we
      have to do the lookup twice: first time with the passed in /128 host
      address, second time with the src_addr stored in fib6_info.
      
      This solves the pmtu discovery issue reported by Mikael Magnusson where
      a route cache with a lower mtu info is created for a gateway route with
      src addr. However, the lookup code is not able to find this route cache.
      
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Reported-by: NMikael Magnusson <mikael.kernel@lists.m7n.se>
      Bisected-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Cc: Martin Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      510e2ced
    • E
      ipv6: prevent possible fib6 leaks · 61fb0d01
      Eric Dumazet 提交于
      At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible
      for finding all percpu routes and set their ->from pointer
      to NULL, so that fib6_ref can reach its expected value (1).
      
      The problem right now is that other cpus can still catch the
      route being deleted, since there is no rcu grace period
      between the route deletion and call to fib6_drop_pcpu_from()
      
      This can leak the fib6 and associated resources, since no
      notifier will take care of removing the last reference(s).
      
      I decided to add another boolean (fib6_destroying) instead
      of reusing/renaming exception_bucket_flushed to ease stable backports,
      and properly document the memory barriers used to implement this fix.
      
      This patch has been co-developped with Wei Wang.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Martin Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61fb0d01
  2. 02 5月, 2019 1 次提交
    • M
      ipv6: A few fixes on dereferencing rt->from · 886b7a50
      Martin KaFai Lau 提交于
      It is a followup after the fix in
      commit 9c69a132 ("route: Avoid crash from dereferencing NULL rt->from")
      
      rt6_do_redirect():
      1. NULL checking is needed on rt->from because a parallel
         fib6_info delete could happen that sets rt->from to NULL.
         (e.g. rt6_remove_exception() and fib6_drop_pcpu_from()).
      
      2. fib6_info_hold() is not enough.  Same reason as (1).
         Meaning, holding dst->__refcnt cannot ensure
         rt->from is not NULL or rt->from->fib6_ref is not 0.
      
         Instead of using fib6_info_hold_safe() which ip6_rt_cache_alloc()
         is already doing, this patch chooses to extend the rcu section
         to keep "from" dereference-able after checking for NULL.
      
      inet6_rtm_getroute():
      1. NULL checking is also needed on rt->from for a similar reason.
         Note that inet6_rtm_getroute() is using RTNL_FLAG_DOIT_UNLOCKED.
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      886b7a50
  3. 01 5月, 2019 1 次提交
    • E
      ipv6: fix races in ip6_dst_destroy() · 0e233874
      Eric Dumazet 提交于
      We had many syzbot reports that seem to be caused by use-after-free
      of struct fib6_info.
      
      ip6_dst_destroy(), fib6_drop_pcpu_from() and rt6_remove_exception()
      are writers vs rt->from, and use non consistent synchronization among
      themselves.
      
      Switching to xchg() will solve the issues with no possible
      lockdep issues.
      
      BUG: KASAN: user-memory-access in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
      BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:294 [inline]
      BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:292 [inline]
      BUG: KASAN: user-memory-access in fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
      BUG: KASAN: user-memory-access in fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
      Write of size 4 at addr 0000000000ffffb4 by task syz-executor.1/7649
      
      CPU: 0 PID: 7649 Comm: syz-executor.1 Not tainted 5.1.0-rc6+ #183
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       kasan_report.cold+0x5/0x40 mm/kasan/report.c:321
       check_memory_region_inline mm/kasan/generic.c:185 [inline]
       check_memory_region+0x123/0x190 mm/kasan/generic.c:191
       kasan_check_write+0x14/0x20 mm/kasan/common.c:108
       atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
       fib6_info_release include/net/ip6_fib.h:294 [inline]
       fib6_info_release include/net/ip6_fib.h:292 [inline]
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
       fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
       fib6_del_route net/ipv6/ip6_fib.c:1813 [inline]
       fib6_del+0xac2/0x10a0 net/ipv6/ip6_fib.c:1844
       fib6_clean_node+0x3a8/0x590 net/ipv6/ip6_fib.c:2006
       fib6_walk_continue+0x495/0x900 net/ipv6/ip6_fib.c:1928
       fib6_walk+0x9d/0x100 net/ipv6/ip6_fib.c:1976
       fib6_clean_tree+0xe0/0x120 net/ipv6/ip6_fib.c:2055
       __fib6_clean_all+0x118/0x2a0 net/ipv6/ip6_fib.c:2071
       fib6_clean_all+0x2b/0x40 net/ipv6/ip6_fib.c:2082
       rt6_sync_down_dev+0x134/0x150 net/ipv6/route.c:4057
       rt6_disable_ip+0x27/0x5f0 net/ipv6/route.c:4062
       addrconf_ifdown+0xa2/0x1220 net/ipv6/addrconf.c:3705
       addrconf_notify+0x19a/0x2260 net/ipv6/addrconf.c:3630
       notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1753
       call_netdevice_notifiers_extack net/core/dev.c:1765 [inline]
       call_netdevice_notifiers net/core/dev.c:1779 [inline]
       dev_close_many+0x33f/0x6f0 net/core/dev.c:1522
       rollback_registered_many+0x43b/0xfd0 net/core/dev.c:8177
       rollback_registered+0x109/0x1d0 net/core/dev.c:8242
       unregister_netdevice_queue net/core/dev.c:9289 [inline]
       unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9282
       unregister_netdevice include/linux/netdevice.h:2658 [inline]
       __tun_detach+0xd5b/0x1000 drivers/net/tun.c:727
       tun_detach drivers/net/tun.c:744 [inline]
       tun_chr_close+0xe0/0x180 drivers/net/tun.c:3443
       __fput+0x2e5/0x8d0 fs/file_table.c:278
       ____fput+0x16/0x20 fs/file_table.c:309
       task_work_run+0x14a/0x1c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x90a/0x2fa0 kernel/exit.c:876
       do_group_exit+0x135/0x370 kernel/exit.c:980
       __do_sys_exit_group kernel/exit.c:991 [inline]
       __se_sys_exit_group kernel/exit.c:989 [inline]
       __x64_sys_exit_group+0x44/0x50 kernel/exit.c:989
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458da9
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffeafc2a6a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 000000000000001c RCX: 0000000000458da9
      RDX: 0000000000412a80 RSI: 0000000000a54ef0 RDI: 0000000000000043
      RBP: 00000000004be552 R08: 000000000000000c R09: 000000000004c0d1
      R10: 0000000002341940 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00007ffeafc2a7f0 R14: 000000000004c065 R15: 00007ffeafc2a800
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e233874
  4. 30 4月, 2019 1 次提交
    • S
      vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach · 1d3fd8a1
      Stephen Suryaputra 提交于
      When there is no route to an IPv6 dest addr, skb_dst(skb) points
      to loopback dev in the case of that the IP6CB(skb)->iif is
      enslaved to a vrf. This causes Ip6InNoRoutes to be incremented on the
      loopback dev. This also causes the lookup to fail on icmpv6_send() and
      the dest unreachable to not sent and Ip6OutNoRoutes gets incremented on
      the loopback dev.
      
      To reproduce:
      * Gateway configuration:
              ip link add dev vrf_258 type vrf table 258
              ip link set dev enp0s9 master vrf_258
              ip addr add 66:1/64 dev enp0s9
              ip -6 route add unreachable default metric 8192 table 258
              sysctl -w net.ipv6.conf.all.forwarding=1
              sysctl -w net.ipv6.conf.enp0s9.forwarding=1
      * Sender configuration:
              ip addr add 66::2/64 dev enp0s9
              ip -6 route add default via 66::1
      and ping 67::1 for example from the sender.
      
      Fix this by counting on the original netdev and reset the skb dst to
      force a fresh lookup.
      
      v2: Fix typo of destination address in the repro steps.
      v3: Simplify the loopback check (per David Ahern) and use reverse
          Christmas tree format (per David Miller).
      Signed-off-by: NStephen Suryaputra <ssuryaextr@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Tested-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d3fd8a1
  5. 28 4月, 2019 2 次提交
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
    • M
      netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek 提交于
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae0be8de
  6. 24 4月, 2019 2 次提交
  7. 23 4月, 2019 1 次提交
  8. 18 4月, 2019 13 次提交
  9. 16 4月, 2019 1 次提交
    • J
      route: Avoid crash from dereferencing NULL rt->from · 9c69a132
      Jonathan Lemon 提交于
      When __ip6_rt_update_pmtu() is called, rt->from is RCU dereferenced, but is
      never checked for null - rt6_flush_exceptions() may have removed the entry.
      
      [ 1913.989004] RIP: 0010:ip6_rt_cache_alloc+0x13/0x170
      [ 1914.209410] Call Trace:
      [ 1914.214798]  <IRQ>
      [ 1914.219226]  __ip6_rt_update_pmtu+0xb0/0x190
      [ 1914.228649]  ip6_tnl_xmit+0x2c2/0x970 [ip6_tunnel]
      [ 1914.239223]  ? ip6_tnl_parse_tlv_enc_lim+0x32/0x1a0 [ip6_tunnel]
      [ 1914.252489]  ? __gre6_xmit+0x148/0x530 [ip6_gre]
      [ 1914.262678]  ip6gre_tunnel_xmit+0x17e/0x3c7 [ip6_gre]
      [ 1914.273831]  dev_hard_start_xmit+0x8d/0x1f0
      [ 1914.283061]  sch_direct_xmit+0xfa/0x230
      [ 1914.291521]  __qdisc_run+0x154/0x4b0
      [ 1914.299407]  net_tx_action+0x10e/0x1f0
      [ 1914.307678]  __do_softirq+0xca/0x297
      [ 1914.315567]  irq_exit+0x96/0xa0
      [ 1914.322494]  smp_apic_timer_interrupt+0x68/0x130
      [ 1914.332683]  apic_timer_interrupt+0xf/0x20
      [ 1914.341721]  </IRQ>
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c69a132
  10. 13 4月, 2019 1 次提交
  11. 12 4月, 2019 10 次提交
  12. 09 4月, 2019 1 次提交
  13. 04 4月, 2019 1 次提交
  14. 30 3月, 2019 3 次提交