1. 29 4月, 2020 1 次提交
  2. 24 4月, 2020 1 次提交
    • F
      ipv6: Honor all IPv6 PIO Valid Lifetime values · b75326c2
      Fernando Gont 提交于
      RFC4862 5.5.3 e) prevents received Router Advertisements from reducing
      the Valid Lifetime of configured addresses to less than two hours, thus
      preventing hosts from reacting to the information provided by a router
      that has positive knowledge that a prefix has become invalid.
      
      This patch makes hosts honor all Valid Lifetime values, as per
      draft-gont-6man-slaac-renum-06, Section 4.2. This is meant to help
      mitigate the problem discussed in draft-ietf-v6ops-slaac-renum.
      
      Note: Attacks aiming at disabling an advertised prefix via a Valid
      Lifetime of 0 are not really more harmful than other attacks
      that can be performed via forged RA messages, such as those
      aiming at completely disabling a next-hop router via an RA that
      advertises a Router Lifetime of 0, or performing a Denial of
      Service (DoS) attack by advertising illegitimate prefixes via
      forged PIOs.  In scenarios where RA-based attacks are of concern,
      proper mitigations such as RA-Guard [RFC6105] [RFC7113] should
      be implemented.
      Signed-off-by: NFernando Gont <fgont@si6networks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b75326c2
  3. 03 4月, 2020 1 次提交
    • H
      neigh: support smaller retrans_time settting · 19e16d22
      Hangbin Liu 提交于
      Currently, we limited the retrans_time to be greater than HZ/2. i.e.
      setting retrans_time less than 500ms will not work. This makes the user
      unable to achieve a more accurate control for bonding arp fast failover.
      
      Update the sanity check to HZ/100, which is 10ms, to let users have more
      ability on the retrans_time control.
      
      v3: sync the behavior with IPv6 and update all the timer handler
      v2: use HZ instead of hard code number
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19e16d22
  4. 02 4月, 2020 1 次提交
    • J
      ipv6: don't auto-add link-local address to lag ports · 744fdc82
      Jarod Wilson 提交于
      Bonding slave and team port devices should not have link-local addresses
      automatically added to them, as it can interfere with openvswitch being
      able to properly add tc ingress.
      
      Basic reproducer, courtesy of Marcelo:
      
      $ ip link add name bond0 type bond
      $ ip link set dev ens2f0np0 master bond0
      $ ip link set dev ens2f1np2 master bond0
      $ ip link set dev bond0 up
      $ ip a s
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
      group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
      mq master bond0 state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
      5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
      mq master bond0 state DOWN group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
      11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
      noqueue state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link
             valid_lft forever preferred_lft forever
      
      (above trimmed to relevant entries, obviously)
      
      $ sysctl net.ipv6.conf.ens2f0np0.addr_gen_mode=0
      net.ipv6.conf.ens2f0np0.addr_gen_mode = 0
      $ sysctl net.ipv6.conf.ens2f1np2.addr_gen_mode=0
      net.ipv6.conf.ens2f1np2.addr_gen_mode = 0
      
      $ ip a l ens2f0np0
      2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
      mq master bond0 state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
             valid_lft forever preferred_lft forever
      $ ip a l ens2f1np2
      5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
      mq master bond0 state DOWN group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
             valid_lft forever preferred_lft forever
      
      Looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is
      this a slave interface?" check added by commit c2edacf8, and
      results in an address getting added, while w/the proposed patch added,
      no address gets added. This simply adds the same gating check to another
      code path, and thus should prevent the same devices from erroneously
      obtaining an ipv6 link-local address.
      
      Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
      Reported-by: NMoshe Levi <moshele@mellanox.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Marcelo Ricardo Leitner <mleitner@redhat.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      744fdc82
  5. 30 3月, 2020 2 次提交
  6. 13 3月, 2020 1 次提交
  7. 11 3月, 2020 1 次提交
    • H
      ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface · 60380488
      Hangbin Liu 提交于
      Rafał found an issue that for non-Ethernet interface, if we down and up
      frequently, the memory will be consumed slowly.
      
      The reason is we add allnodes/allrouters addressed in multicast list in
      ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast
      addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up()
      for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb
      getting bigger and bigger. The call stack looks like:
      
      addrconf_notify(NETDEV_REGISTER)
      	ipv6_add_dev
      		ipv6_dev_mc_inc(ff01::1)
      		ipv6_dev_mc_inc(ff02::1)
      		ipv6_dev_mc_inc(ff02::2)
      
      addrconf_notify(NETDEV_UP)
      	addrconf_dev_config
      		/* Alas, we support only Ethernet autoconfiguration. */
      		return;
      
      addrconf_notify(NETDEV_DOWN)
      	addrconf_ifdown
      		ipv6_mc_down
      			igmp6_group_dropped(ff02::2)
      				mld_add_delrec(ff02::2)
      			igmp6_group_dropped(ff02::1)
      			igmp6_group_dropped(ff01::1)
      
      After investigating, I can't found a rule to disable multicast on
      non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM,
      tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up()
      in inetdev_event(). Even for IPv6, we don't check the dev type and call
      ipv6_add_dev(), ipv6_dev_mc_inc() after register device.
      
      So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for
      non-Ethernet interface.
      
      v2: Also check IFF_MULTICAST flag to make sure the interface supports
          multicast
      Reported-by: NRafał Miłecki <zajec5@gmail.com>
      Tested-by: NRafał Miłecki <zajec5@gmail.com>
      Fixes: 74235a25 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels")
      Fixes: 1666d49e ("mld: do not remove mld souce list info when set link down")
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60380488
  8. 04 3月, 2020 2 次提交
    • H
      net/ipv6: remove the old peer route if change it to a new one · d0098e4c
      Hangbin Liu 提交于
      When we modify the peer route and changed it to a new one, we should
      remove the old route first. Before the fix:
      
      + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 256 pref medium
      2001:db8::2 proto kernel metric 256 pref medium
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 256 pref medium
      2001:db8::2 proto kernel metric 256 pref medium
      
      After the fix:
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 256 pref medium
      2001:db8::3 proto kernel metric 256 pref medium
      
      This patch depend on the previous patch "net/ipv6: need update peer route
      when modify metric" to update new peer route after delete old one.
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0098e4c
    • H
      net/ipv6: need update peer route when modify metric · 61794012
      Hangbin Liu 提交于
      When we modify the route metric, the peer address's route need also
      be updated. Before the fix:
      
      + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 metric 60
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 60 pref medium
      2001:db8::2 proto kernel metric 60 pref medium
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 61 pref medium
      2001:db8::2 proto kernel metric 60 pref medium
      
      After the fix:
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 61 pref medium
      2001:db8::2 proto kernel metric 61 pref medium
      
      Fixes: 8308f3ff ("net/ipv6: Add support for specifying metric of connected routes")
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61794012
  9. 01 3月, 2020 1 次提交
  10. 08 2月, 2020 1 次提交
    • E
      ipv6/addrconf: fix potential NULL deref in inet6_set_link_af() · db3fa271
      Eric Dumazet 提交于
      __in6_dev_get(dev) called from inet6_set_link_af() can return NULL.
      
      The needed check has been recently removed, let's add it back.
      
      While do_setlink() does call validate_linkmsg() :
      ...
      err = validate_linkmsg(dev, tb); /* OK at this point */
      ...
      
      It is possible that the following call happening before the
      ->set_link_af() removes IPv6 if MTU is less than 1280 :
      
      if (tb[IFLA_MTU]) {
          err = dev_set_mtu_ext(dev, nla_get_u32(tb[IFLA_MTU]), extack);
          if (err < 0)
                goto errout;
          status |= DO_SETLINK_MODIFIED;
      }
      ...
      
      if (tb[IFLA_AF_SPEC]) {
         ...
         err = af_ops->set_link_af(dev, af);
            ->inet6_set_link_af() // CRASH because idev is NULL
      
      Please note that IPv4 is immune to the bug since inet_set_link_af() does :
      
      struct in_device *in_dev = __in_dev_get_rcu(dev);
      if (!in_dev)
          return -EAFNOSUPPORT;
      
      This problem has been mentioned in commit cf7afbfe ("rtnl: make
      link af-specific updates atomic") changelog :
      
          This method is not fail proof, while it is currently sufficient
          to make set_link_af() inerrable and thus 100% atomic, the
          validation function method will not be able to detect all error
          scenarios in the future, there will likely always be errors
          depending on states which are f.e. not protected by rtnl_mutex
          and thus may change between validation and setting.
      
      IPv6: ADDRCONF(NETDEV_CHANGE): lo: link becomes ready
      general protection fault, probably for non-canonical address 0xdffffc0000000056: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x00000000000002b0-0x00000000000002b7]
      CPU: 0 PID: 9698 Comm: syz-executor712 Not tainted 5.5.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733
      Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00
      RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6
      RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0
      RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0
      R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000
      FS:  0000000000c46880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055f0494ca0d0 CR3: 000000009e4ac000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       do_setlink+0x2a9f/0x3720 net/core/rtnetlink.c:2754
       rtnl_group_changelink net/core/rtnetlink.c:3103 [inline]
       __rtnl_newlink+0xdd1/0x1790 net/core/rtnetlink.c:3257
       rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3377
       rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:672
       ____sys_sendmsg+0x753/0x880 net/socket.c:2343
       ___sys_sendmsg+0x100/0x170 net/socket.c:2397
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2430
       __do_sys_sendmsg net/socket.c:2439 [inline]
       __se_sys_sendmsg net/socket.c:2437 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4402e9
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffd62fbcf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004402e9
      RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8
      R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000401b70
      R13: 0000000000401c00 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      ---[ end trace cfa7664b8fdcdff3 ]---
      RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733
      Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00
      RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6
      RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0
      RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0
      R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000
      FS:  0000000000c46880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000004 CR3: 000000009e4ac000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 7dc2bcca ("Validate required parameters in inet6_validate_link_af")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Bisected-and-reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db3fa271
  11. 14 12月, 2019 1 次提交
  12. 17 10月, 2019 1 次提交
  13. 16 10月, 2019 1 次提交
    • M
      blackhole_netdev: fix syzkaller reported issue · b0818f80
      Mahesh Bandewar 提交于
      While invalidating the dst, we assign backhole_netdev instead of
      loopback device. However, this device does not have idev pointer
      and hence no ip6_ptr even if IPv6 is enabled. Possibly this has
      triggered the syzbot reported crash.
      
      The syzbot report does not have reproducer, however, this is the
      only device that doesn't have matching idev created.
      
      Crash instruction is :
      
      static inline bool ip6_ignore_linkdown(const struct net_device *dev)
      {
              const struct inet6_dev *idev = __in6_dev_get(dev);
      
              return !!idev->cnf.ignore_routes_with_linkdown; <= crash
      }
      
      Also ipv6 always assumes presence of idev and never checks for it
      being NULL (as does the above referenced code). So adding a idev
      for the blackhole_netdev to avoid this class of crashes in the future.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0818f80
  14. 05 10月, 2019 2 次提交
    • D
      ipv6: Handle missing host route in __ipv6_ifa_notify · 2d819d25
      David Ahern 提交于
      Rajendra reported a kernel panic when a link was taken down:
      
          [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
          [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
      
          <snip>
      
          [ 6870.570501] Call Trace:
          [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
          [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
          [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
          [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
          [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
          [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
          [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
          [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
          [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
          [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
          [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
          [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
          [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
      
      addrconf_dad_work is kicked to be scheduled when a device is brought
      up. There is a race between addrcond_dad_work getting scheduled and
      taking the rtnl lock and a process taking the link down (under rtnl).
      The latter removes the host route from the inet6_addr as part of
      addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
      to use the host route in __ipv6_ifa_notify. If the down event removes
      the host route due to the race to the rtnl, then the BUG listed above
      occurs.
      
      Since the DAD sequence can not be aborted, add a check for the missing
      host route in __ipv6_ifa_notify. The only way this should happen is due
      to the previously mentioned race. The host route is created when the
      address is added to an interface; it is only removed on a down event
      where the address is kept. Add a warning if the host route is missing
      AND the device is up; this is a situation that should never happen.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: NRajendra Dendukuri <rajendra.dendukuri@broadcom.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d819d25
    • D
      Revert "ipv6: Handle race in addrconf_dad_work" · 8ae72cbf
      David Ahern 提交于
      This reverts commit a3ce2a21.
      
      Eric reported tests failings with commit. After digging into it,
      the bottom line is that the DAD sequence is not to be messed with.
      There are too many cases that are expected to proceed regardless
      of whether a device is up.
      
      Revert the patch and I will send a different solution for the
      problem Rajendra reported.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ae72cbf
  15. 02 10月, 2019 2 次提交
    • D
      ipv6: Handle race in addrconf_dad_work · a3ce2a21
      David Ahern 提交于
      Rajendra reported a kernel panic when a link was taken down:
      
      [ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
      [ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290
      
      <snip>
      
      [ 6870.570501] Call Trace:
      [ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
      [ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
      [ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
      [ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
      [ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
      [ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
      [ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
      [ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
      [ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
      [ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
      [ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
      [ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
      [ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0
      
      addrconf_dad_work is kicked to be scheduled when a device is brought
      up. There is a race between addrcond_dad_work getting scheduled and
      taking the rtnl lock and a process taking the link down (under rtnl).
      The latter removes the host route from the inet6_addr as part of
      addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
      to use the host route in ipv6_ifa_notify. If the down event removes
      the host route due to the race to the rtnl, then the BUG listed above
      occurs.
      
      This scenario does not occur when the ipv6 address is not kept
      (net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the
      state of the ifp to DEAD. Handle when the addresses are kept by checking
      IF_READY which is reset by addrconf_ifdown.
      
      The 'dead' flag for an inet6_addr is set only under rtnl, in
      addrconf_ifdown and it means the device is getting removed (or IPv6 is
      disabled). The interesting cases for changing the idev flag are
      addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown
      (reset the flag). The former does not have the idev lock - only rtnl;
      the latter has both. Based on that the existing dead + IF_READY check
      can be moved to right after the rtnl_lock in addrconf_dad_work.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: NRajendra Dendukuri <rajendra.dendukuri@broadcom.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3ce2a21
    • N
      ipv6: minor code reorg in inet6_fill_ifla6_attrs() · 0d7982ce
      Nicolas Dichtel 提交于
      Just put related code together to ease code reading: the memcpy() is
      related to the nla_reserve().
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d7982ce
  16. 24 8月, 2019 1 次提交
  17. 22 8月, 2019 1 次提交
    • H
      ipv6/addrconf: allow adding multicast addr if IFA_F_MCAUTOJOIN is set · f17f7648
      Hangbin Liu 提交于
      In commit 93a714d6 ("multicast: Extend ip address command to enable
      multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN
      to make user able to add multicast address on ethernet interface.
      
      This works for IPv4, but not for IPv6. See the inet6_addr_add code.
      
      static int inet6_addr_add()
      {
      	...
      	if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
      		ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...)
      	}
      
      	ifp = ipv6_add_addr(idev, cfg, true, extack); <- always fail with maddr
      	if (!IS_ERR(ifp)) {
      		...
      	} else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
      		ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...)
      	}
      }
      
      But in ipv6_add_addr() it will check the address type and reject multicast
      address directly. So this feature is never worked for IPv6.
      
      We should not remove the multicast address check totally in ipv6_add_addr(),
      but could accept multicast address only when IFA_F_MCAUTOJOIN flag supplied.
      
      v2: update commit description
      
      Fixes: 93a714d6 ("multicast: Extend ip address command to enable multicast group join/leave on")
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f17f7648
  18. 19 7月, 2019 1 次提交
  19. 05 6月, 2019 1 次提交
    • D
      ipv6: Plumb support for nexthop object in a fib6_info · f88d8ea6
      David Ahern 提交于
      Add struct nexthop and nh_list list_head to fib6_info. nh_list is the
      fib6_info side of the nexthop <-> fib_info relationship. Since a fib6_info
      referencing a nexthop object can not have 'sibling' entries (the old way
      of doing multipath routes), the nh_list is a union with fib6_siblings.
      
      Add f6i_list list_head to 'struct nexthop' to track fib6_info entries
      using a nexthop instance. Update __remove_nexthop_fib to walk f6_list
      and delete fib entries using the nexthop.
      
      Add a few nexthop helpers for use when a nexthop is added to fib6_info:
      - nexthop_fib6_nh - return first fib6_nh in a nexthop object
      - fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh
        if the fib6_info references a nexthop object
      - nexthop_path_fib6_result - similar to ipv4, select a path within a
        multipath nexthop object. If the nexthop is a blackhole, set
        fib6_result type to RTN_BLACKHOLE, and set the REJECT flag
      
      Update the fib6_info references to check for nh and take a different path
      as needed:
      - rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT
        be coalesced with other fib entries into a multipath route
      - rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references
        a nexthop
      - addrconf (host routes), RA's and info entries (anything configured via
        ndisc) does not use nexthop objects
      - fib6_info_destroy_rcu - put reference to nexthop object
      - fib6_purge_rt - drop fib6_info from f6i_list
      - fib6_select_path - update to use the new nexthop_path_fib6_result when
        fib entry uses a nexthop object
      - rt6_device_match - update to catch use of nexthop object as a blackhole
        and set fib6_type and flags.
      - ip6_route_info_create - don't add space for fib6_nh if fib entry is
        going to reference a nexthop object, take a reference to nexthop object,
        disallow use of source routing
      - rt6_nlmsg_size - add space for RTA_NH_ID
      - add rt6_fill_node_nexthop to add nexthop data on a dump
      
      As with ipv4, most of the changes push existing code into the else branch
      of whether the fib entry uses a nexthop object.
      
      Update the nexthop code to walk f6i_list on a nexthop deleted to remove
      fib entries referencing it.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f88d8ea6
  20. 03 6月, 2019 1 次提交
  21. 31 5月, 2019 1 次提交
  22. 25 5月, 2019 2 次提交
    • D
      ipv6: Make fib6_nh optional at the end of fib6_info · 1cf844c7
      David Ahern 提交于
      Move fib6_nh to the end of fib6_info and make it an array of
      size 0. Pass a flag to fib6_info_alloc indicating if the
      allocation needs to add space for a fib6_nh.
      
      The current code path always has a fib6_nh allocated with a
      fib6_info; with nexthop objects they will be separate.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1cf844c7
    • D
      ipv6: Move pcpu cached routes to fib6_nh · f40b6ae2
      David Ahern 提交于
      rt6_info are specific instances of a fib entry and are tied to a
      device and gateway - ie., a nexthop. Before nexthop objects, IPv6 fib
      entries have separate fib6_info for each nexthop in a multipath route,
      so the location of the pcpu cache in the fib6_info struct worked.
      However, with nexthop objects a fib6_info can point to a set of nexthops
      (yet another alignment of ipv6 with ipv4). Accordingly, the pcpu
      cache needs to be moved to the fib6_nh struct so the cached entries
      are local to the nexthop specification used to create the rt6_info.
      
      Initialization and free of the pcpu entries moved to fib6_nh_init and
      fib6_nh_release.
      
      Change in location only, from fib6_info down to fib6_nh; no other
      functional change intended.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f40b6ae2
  23. 23 5月, 2019 1 次提交
    • M
      Validate required parameters in inet6_validate_link_af · 7dc2bcca
      Maxim Mikityanskiy 提交于
      inet6_set_link_af requires that at least one of IFLA_INET6_TOKEN or
      IFLA_INET6_ADDR_GET_MODE is passed. If none of them is passed, it
      returns -EINVAL, which may cause do_setlink() to fail in the middle of
      processing other commands and give the following warning message:
      
        A link change request failed with some changes committed already.
        Interface eth0 may have been left with an inconsistent configuration,
        please check.
      
      Check the presence of at least one of them in inet6_validate_link_af to
      detect invalid parameters at an early stage, before do_setlink does
      anything. Also validate the address generation mode at an early stage.
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7dc2bcca
  24. 28 4月, 2019 2 次提交
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
    • M
      netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek 提交于
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae0be8de
  25. 09 4月, 2019 1 次提交
  26. 30 3月, 2019 2 次提交
  27. 05 3月, 2019 1 次提交
  28. 12 2月, 2019 1 次提交
    • Z
      net: fix IPv6 prefix route residue · e75913c9
      Zhiqiang Liu 提交于
      Follow those steps:
       # ip addr add 2001:123::1/32 dev eth0
       # ip addr add 2001:123:456::2/64 dev eth0
       # ip addr del 2001:123::1/32 dev eth0
       # ip addr del 2001:123:456::2/64 dev eth0
      and then prefix route of 2001:123::1/32 will still exist.
      
      This is because ipv6_prefix_equal in check_cleanup_prefix_route
      func does not check whether two IPv6 addresses have the same
      prefix length. If the prefix of one address starts with another
      shorter address prefix, even though their prefix lengths are
      different, the return value of ipv6_prefix_equal is true.
      
      Here I add a check of whether two addresses have the same prefix
      to decide whether their prefixes are equal.
      
      Fixes: 5b84efec ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE")
      Signed-off-by: NZhiqiang Liu <liuzhiqiang26@huawei.com>
      Reported-by: NWenhao Zhang <zhangwenhao8@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e75913c9
  29. 23 1月, 2019 3 次提交
  30. 20 1月, 2019 2 次提交