1. 22 May, 2020 1 commit
    • net: don't return invalid table id error when we fall back to PF_UNSPEC · 41b4bd98
      Sabrina Dubroca committed
      In case we can't find a ->dumpit callback for the requested
      (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're
      in the same situation as if userspace had requested a PF_UNSPEC
      dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all
      the registered RTM_GETROUTE handlers.
      
      The requested table id may or may not exist for all of those
      families. commit ae677bbb ("net: Don't return invalid table id
      error when dumping all families") fixed the problem when userspace
      explicitly requests a PF_UNSPEC dump, but missed the fallback case.
      
      For example, when we pass ipv6.disable=1 to a kernel with
      CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y,
      the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in
      rtnl_dump_all, and listing IPv6 routes will unexpectedly print:
      
        # ip -6 r
        Error: ipv4: MR table does not exist.
        Dump terminated
      
      commit ae677bbb introduced the dump_all_families variable, which
      gets set when userspace requests a PF_UNSPEC dump. However, we can't
      simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the
      fallback case to get dump_all_families == true, because some message
      types (for example RTM_GETRULE and RTM_GETNEIGH) only register the
      PF_UNSPEC handler and use the family to filter in the kernel what is
      dumped to userspace. We would then export more entries, that userspace
      would have to filter. iproute does that, but other programs may not.
      
      Instead, this patch removes dump_all_families and updates the
      RTM_GETROUTE handlers to check if the family that is being dumped is
      their own. When it's not, which covers both the intentional PF_UNSPEC
      dumps (as dump_all_families did) and the fallback case, ignore the
      missing table id error.
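
The handler-side check described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code: `struct dump_request`, `mr_dump_routes()`, `table_exists()` and the `FAMILY_*` values are hypothetical stand-ins for the real netlink structures.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the netlink family values. */
enum { FAMILY_UNSPEC = 0, FAMILY_IPMR = 128 };

/* Hypothetical view of a dump request as seen by one RTM_GETROUTE handler. */
struct dump_request {
	int family;    /* family carried in the netlink header */
	int table_id;  /* table requested by userspace */
};

/* Pretend only table 253 exists for this handler. */
static bool table_exists(int table_id)
{
	return table_id == 253;
}

/*
 * Returns 0 when the dump proceeds or is silently skipped,
 * -ENOENT when the missing table must be reported to userspace.
 */
static int mr_dump_routes(const struct dump_request *req, bool *dumped)
{
	*dumped = false;
	if (!table_exists(req->table_id)) {
		/* Not our own family: PF_UNSPEC dump or fallback, so ignore. */
		if (req->family != FAMILY_IPMR)
			return 0;
		return -ENOENT;
	}
	*dumped = true;
	return 0;
}
```

With this shape, a request for a missing table only fails when the handler's own family was asked for explicitly.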
      
      Fixes: cb167893 ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 17 May, 2020 1 commit
  3. 13 May, 2020 1 commit
    • netlabel: cope with NULL catmap · eead1c2e
      Paolo Abeni committed
      The cipso and calipso code can set the MLS_CAT attribute on
      successful parsing, even if the corresponding catmap has
      not been allocated, as per current configuration and external
      input.
      
      Later, SELinux code tries to access the catmap via
      netlbl_catmap_getlong() if the MLS_CAT flag is present. That may cause
      a NULL pointer dereference while processing incoming network traffic.
      
      Address the issue by setting the MLS_CAT flag only if the catmap is
      really allocated. Additionally, let netlbl_catmap_getlong() cope
      with a NULL catmap.
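
A NULL-tolerant accessor of the kind described above might look like the sketch below. The single-word `struct catmap`, the function name `catmap_getlong()` and the error code for an out-of-range offset are simplifying assumptions, not the netlabel API; the point is only that a NULL map is treated as an empty one rather than dereferenced.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified category bitmap (one 64-bit word). */
struct catmap {
	uint32_t startbit;  /* first bit covered by this word */
	uint64_t bitmap;
};

/*
 * Fetch the bitmap word that covers *offset.
 * A NULL catmap is treated as "no categories set": *offset is set to an
 * all-ones sentinel and 0 is returned, so callers never dereference it.
 */
static int catmap_getlong(const struct catmap *map, uint32_t *offset,
			  uint64_t *bitmap)
{
	if (map == NULL) {
		*offset = (uint32_t)-1;
		return 0;
	}
	if (*offset < map->startbit || *offset >= map->startbit + 64)
		return -ENOENT;  /* assumed error code for this sketch */
	*offset = map->startbit;
	*bitmap = map->bitmap;
	return 0;
}
```
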
      Reported-by: Matthew Sheets <matthew.sheets@gd-ms.com>
      Fixes: 4b8feff2 ("netlabel: fix the horribly broken catmap functions")
      Fixes: ceba1832 ("calipso: Set the calipso socket label to match the secattr.")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 08 May, 2020 1 commit
    • Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu" · 09454fd0
      Maciej Żenczykowski committed
      This reverts commit 19bda36c:
      
      | ipv6: add mtu lock check in __ip6_rt_update_pmtu
      |
      | Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu.
      | It leaded to that mtu lock doesn't really work when receiving the pkt
      | of ICMPV6_PKT_TOOBIG.
      |
      | This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4
      | did in __ip_rt_update_pmtu.
      
      The above reasoning is incorrect.  IPv6 *requires* icmp based pmtu to work.
      There's already a comment to this effect elsewhere in the kernel:
      
        $ git grep -p -B1 -A3 'RTAX_MTU lock'
        net/ipv6/route.c=4813=
      
        static int rt6_mtu_change_route(struct fib6_info *f6i, void *p_arg)
        ...
          /* In IPv6 pmtu discovery is not optional,
             so that RTAX_MTU lock cannot disable it.
             We still use this lock to block changes
             caused by addrconf/ndisc.
          */
      
      This reverts to the pre-4.9 behaviour.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Fixes: 19bda36c ("ipv6: add mtu lock check in __ip6_rt_update_pmtu")
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 07 May, 2020 1 commit
    • seg6: fix SRH processing to comply with RFC8754 · 0cb7498f
      Ahmed Abdelsalam committed
      The Segment Routing Header (SRH) which defines the SRv6 dataplane is defined
      in RFC8754.
      
      RFC8754 (section 4.1) defines the SR source node behavior which encapsulates
      packets into an outer IPv6 header and SRH. The SR source node encodes the
      full list of Segments that defines the packet path in the SRH. Then, the
      first segment from the list of Segments is copied into the Destination address
      of the outer IPv6 header and the packet is sent to the first hop in its path
      towards the destination.
      
      If the Segment list has only one segment, the SR source node can omit the SRH
      as the only segment is added in the destination address.
      
      RFC8754 (section 4.1.1) defines the Reduced SRH, when a source does not
      require the entire SID list to be preserved in the SRH. A reduced SRH does
      not contain the first segment of the related SR Policy (the first segment is
      the one already in the DA of the IPv6 header), and the Last Entry field is
      set to n-2, where n is the number of elements in the SR Policy.
      
      RFC8754 (section 4.3.1.1) defines the SRH processing and the logic to
      validate the SRH (S09, S10, S11) which works for both reduced and
      non-reduced behaviors.
      
      This patch updates seg6_validate_srh() to validate the SRH as per RFC8754.
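
The length checks referred to as S09 and S10, together with the Last Entry arithmetic for a reduced SRH, can be sketched as below. This follows the RFC pseudocode as understood here and is not the body of seg6_validate_srh(); `struct srh_fields`, `srh_lengths_valid()` and `srh_last_entry()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the SRH fields needed for the length checks (hypothetical subset). */
struct srh_fields {
	uint8_t hdr_ext_len;    /* in 8-octet units, not counting the first 8 */
	uint8_t segments_left;
	uint8_t last_entry;     /* index of the last element of the Segment List */
};

static bool srh_lengths_valid(const struct srh_fields *srh)
{
	/* Each segment is 16 octets, i.e. two 8-octet units. */
	int max_last_entry = (srh->hdr_ext_len / 2) - 1;

	if (srh->last_entry > max_last_entry)
		return false;
	/* "+ 1" admits a reduced SRH, whose first segment lives only in the DA. */
	if (srh->segments_left > srh->last_entry + 1)
		return false;
	return true;
}

/* Last Entry for an SR Policy of n segments (n >= 2 when reduced). */
static int srh_last_entry(int n, bool reduced)
{
	return reduced ? n - 2 : n - 1;
}
```

For a policy of three segments, a reduced SRH carries two entries (Last Entry 1) while Segments Left is still 2, which the second check accepts.
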
      Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 02 May, 2020 1 commit
    • ipv6: Use global sernum for dst validation with nexthop objects · 8f34e53b
      David Ahern committed
      Nik reported a bug with pcpu dst cache when nexthop objects are
      used illustrated by the following:
          $ ip netns add foo
          $ ip -netns foo li set lo up
          $ ip -netns foo addr add 2001:db8:11::1/128 dev lo
          $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
          $ ip li add veth1 type veth peer name veth2
          $ ip li set veth1 up
          $ ip addr add 2001:db8:10::1/64 dev veth1
          $ ip li set dev veth2 netns foo
          $ ip -netns foo li set veth2 up
          $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
          $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
          $ ip -6 route add 2001:db8:11::1/128 nhid 100
      
          Create a pcpu entry on cpu 0:
          $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
      
          Re-add the route entry:
          $ ip -6 ro del 2001:db8:11::1
          $ ip -6 route add 2001:db8:11::1/128 nhid 100
      
          Route get on cpu 0 returns the stale pcpu:
          $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
          RTNETLINK answers: Network is unreachable
      
          While cpu 1 works:
          $ taskset -a -c 1 ip -6 route get 2001:db8:11::1
          2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium
      
      Conversion of FIB entries to work with external nexthop objects
      missed an important difference between IPv4 and IPv6 - how dst
      entries are invalidated when the FIB changes. IPv4 has a per-network
      namespace generation id (rt_genid) that is bumped on changes to the FIB.
      Checking if a dst_entry is still valid means comparing rt_genid in the
      rtable to the current value of rt_genid for the namespace.
      
      IPv6 also has a per network namespace counter, fib6_sernum, but the
      count is saved per fib6_node. With the per-node counter only dst_entries
      based on fib entries under the node are invalidated when changes are
      made to the routes - limiting the scope of invalidations. IPv6 uses a
      reference in the rt6_info, 'from', to track the corresponding fib entry
      used to create the dst_entry. When validating a dst_entry, the 'from'
      is used to backtrack to the fib6_node and check the sernum of it to the
      cookie passed to the dst_check operation.
      
      With the inline format (nexthop definition inline with the fib6_info),
      dst_entries cached in the fib6_nh have a 1:1 correlation between fib
      entries, nexthop data and dst_entries. With external nexthops, IPv6
      looks more like IPv4 which means multiple fib entries across disparate
      fib6_nodes can all reference the same fib6_nh. That means validation
      of dst_entries based on external nexthops needs to use the IPv4 format
      - the per-network namespace counter.
      
      Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
      rt6_get_cookie to return sernum if it is set and update dst_check for
      IPv6 to look for sernum set and base the check on it if so. Finally,
      rt6_get_pcpu_route needs to validate the cached entry before returning
      a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
      __mkroute_output for IPv4).
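
The two validation styles contrasted above can be modelled in a few lines. This is a simplified user-space sketch, not the rt6_info code: the structures and the helpers `dst_get_cookie()` and `dst_still_valid()` are hypothetical, and a zero `sernum` is assumed to mean "inline nexthop".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the objects named in the text. */
struct fib_node { uint32_t sernum; };         /* per-node serial number */
struct netns { uint32_t global_sernum; };     /* per-namespace counter */
struct dst_entry {
	const struct fib_node *from_node;     /* node reached through 'from' */
	uint32_t sernum;                      /* non-zero only for external nexthops */
};

/* Cookie remembered by the caller when the dst was handed out. */
static uint32_t dst_get_cookie(const struct dst_entry *dst)
{
	return dst->sernum ? dst->sernum : dst->from_node->sernum;
}

/* A cached dst is valid only if the relevant counter has not moved. */
static bool dst_still_valid(const struct dst_entry *dst,
			    const struct netns *net, uint32_t cookie)
{
	if (dst->sernum)  /* external nexthop: compare with the global counter */
		return dst->sernum == net->global_sernum;
	return dst->from_node->sernum == cookie;  /* inline: per-node check */
}
```

In this model a route change that bumps only the namespace counter invalidates dst entries built on external nexthops, even when the fib6_node they were created from is untouched.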
      
      This problem only affects routes using the new, external nexthops.
      
      Thanks to the kbuild test robot for catching the IS_ENABLED needed
      around rt_genid_ipv6 before I sent this out.
      
      Fixes: 5b98324e ("ipv6: Allow routes to use nexthop objects")
      Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 23 April, 2020 1 commit
  8. 21 April, 2020 1 commit
  9. 19 April, 2020 1 commit
    • ipv6: rpl: fix full address compression · 62e69776
      Alexander Aring committed
      This patch prevents the cmpri and cmpre values from being set to 16,
      which cannot be represented because they are 4-bit values. Currently,
      assigning the value 16 to them overflows.
      
      According to the standard, a value of 16 would mean a fully elided
      address, which cannot be expressed as a compression value. The reason
      is that the current IPv6 header destination address should never show
      up inside the segments of the RPL header. When it does, we run into the
      overflow and the address ends up with no compression at all, i.e. cmpri
      or cmpre is set to 0.
      
      As we sometimes handle cmpri and cmpre as unsigned char and sometimes
      as 4-bit values inside the RPL header, the current behaviour results in
      an invalid header format. This patch simply uses the best compression
      method if we ever run into the case where the destination address shows
      up inside the RPL segments. We avoid the overflow and the RPL header is
      still valid, even when the destination address is inside the RPL
      segments.
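
The overflow itself is easy to reproduce with a 4-bit bit-field. The sketch below is illustrative only: `struct rpl_cmpr` and both helpers are hypothetical, and clamping to 15 is an assumption about what "best compression" means for a 4-bit field, not a copy of the patch.

```c
/* Hypothetical header fragment with two 4-bit compression fields. */
struct rpl_cmpr {
	unsigned int cmpri : 4;
	unsigned int cmpre : 4;
};

/* Storing into a 4-bit unsigned field keeps only the low 4 bits. */
static unsigned int store_cmpri(unsigned int value)
{
	struct rpl_cmpr h = { 0, 0 };

	h.cmpri = value;  /* 16 wraps to 0 here */
	return h.cmpri;
}

/* Clamp to the largest representable value before storing. */
static unsigned int store_cmpri_clamped(unsigned int value)
{
	return store_cmpri(value > 15 ? 15 : value);
}
```

Storing 16 unclamped yields 0, which is exactly the "no compression at all" outcome the commit message describes.
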
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 16 April, 2020 1 commit
  11. 08 April, 2020 1 commit
    • net: icmp6: do not select saddr from iif when route has prefsrc set · b93cfb9c
      Tim Stallard committed
      Since commit fac6fce9 ("net: icmp6: provide input address for
      traceroute6") ICMPv6 errors have source addresses from the ingress
      interface. However, this overrides source address selection that has
      been influenced by setting preferred source addresses on routes.
      
      This can result in ICMP errors being lost to upstream BCP38 filters
      when the wrong source addresses are used, breaking path MTU discovery
      and traceroute.
      
      This patch sets the modified source address selection to only take place
      when the route used has no prefsrc set.
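
Reduced to its decision, the new rule can be sketched as below. The real selection logic has more conditions than this; `struct route` and `icmp6_pick_saddr()` are hypothetical, and addresses are plain strings purely for readability.

```c
#include <stddef.h>

/* Hypothetical, simplified route: only whether a preferred source is set. */
struct route {
	const char *prefsrc;  /* NULL when the route has no "src" attribute */
};

/*
 * Pick the source address for an ICMPv6 error.
 * iif_addr is the address on the ingress interface.
 */
static const char *icmp6_pick_saddr(const struct route *rt,
				    const char *iif_addr)
{
	if (rt->prefsrc != NULL)
		return rt->prefsrc;  /* honour the route's preferred source */
	return iif_addr;             /* otherwise keep the ingress address */
}
```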
      
      It can be tested with:
      
      ip link add v1 type veth peer name v2
      ip netns add test
      ip netns exec test ip link set lo up
      ip link set v2 netns test
      ip link set v1 up
      ip netns exec test ip link set v2 up
      ip addr add 2001:db8::1/64 dev v1 nodad
      ip addr add 2001:db8::3 dev v1 nodad
      ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad
      ip netns exec test ip route add unreachable 2001:db8:1::1
      ip netns exec test ip addr add 2001:db8:100::1 dev lo
      ip netns exec test ip route add 2001:db8::1 dev v2 src 2001:db8:100::1
      ip route add 2001:db8:1000::1 via 2001:db8::2
      traceroute6 -s 2001:db8::1 2001:db8:1000::1
      traceroute6 -s 2001:db8::3 2001:db8:1000::1
      ip netns delete test
      
      Output before:
      $ traceroute6 -s 2001:db8::1 2001:db8:1000::1
      traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets
       1  2001:db8::2 (2001:db8::2)  0.843 ms !N  0.396 ms !N  0.257 ms !N
      $ traceroute6 -s 2001:db8::3 2001:db8:1000::1
      traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets
       1  2001:db8::2 (2001:db8::2)  0.772 ms !N  0.257 ms !N  0.357 ms !N
      
      After:
      $ traceroute6 -s 2001:db8::1 2001:db8:1000::1
      traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets
       1  2001:db8:100::1 (2001:db8:100::1)  8.885 ms !N  0.310 ms !N  0.174 ms !N
      $ traceroute6 -s 2001:db8::3 2001:db8:1000::1
      traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets
       1  2001:db8::2 (2001:db8::2)  1.403 ms !N  0.205 ms !N  0.313 ms !N
      
      Fixes: fac6fce9 ("net: icmp6: provide input address for traceroute6")
      Signed-off-by: Tim Stallard <code@timstallard.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 07 April, 2020 1 commit
    • ipv6: rpl: fix loop iteration · a7f9a6f4
      Alexander Aring committed
      This patch fixes the loop iteration by not walking over the last
      segment. The cmpri compression value excludes the last segment. As the
      code shows, the result of the last iteration is overwritten by the
      cmpre value handling, which is for the last segment.
      
      I think this doesn't cause any buffer overflows, because we work with
      worst-case temporary buffer sizes, but it results in suboptimal
      compression settings in some cases.
      
      Fixes: 8610c7c6 ("net: ipv6: add support for rpl sr exthdr")
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 03 April, 2020 1 commit
    • neigh: support smaller retrans_time setting · 19e16d22
      Hangbin Liu committed
      Currently, we limit retrans_time to be no less than HZ/2, i.e.
      setting retrans_time to less than 500ms will not work. This prevents
      users from achieving finer control over bonding ARP fast failover.
      
      Update the sanity check to HZ/100, which is 10ms, to give users finer
      control over retrans_time.
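
The arithmetic behind the two bounds can be checked with a small sketch. It models the sanity check as a simple lower clamp on a delay in ticks; `HZ` is fixed at 1000 here only as an assumed example value, and the helper names are hypothetical.

```c
/* HZ is the kernel tick rate; 1000 is only an assumed example value. */
#define HZ 1000

/* Convert a delay in ticks to milliseconds. */
static long ticks_to_ms(long ticks)
{
	return ticks * 1000 / HZ;
}

/* Old lower bound: nothing shorter than HZ/2 (500 ms). */
static long clamp_retrans_old(long ticks)
{
	return ticks < HZ / 2 ? HZ / 2 : ticks;
}

/* New lower bound: nothing shorter than HZ/100 (10 ms). */
static long clamp_retrans_new(long ticks)
{
	return ticks < HZ / 100 ? HZ / 100 : ticks;
}
```

A requested delay of HZ/10 (100 ms) is raised to 500 ms under the old bound and left alone under the new one.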
      
      v3: sync the behavior with IPv6 and update all the timer handler
      v2: use HZ instead of hard code number
      
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 02 April, 2020 2 commits
    • net: ipv6: rpl_iptunnel: remove redundant assignments to variable err · d16fa759
      Colin Ian King committed
      The variable err is being initialized with a value that is never
      read and it is being updated later with a new value.  The initialization
      is redundant and can be removed.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: don't auto-add link-local address to lag ports · 744fdc82
      Jarod Wilson committed
      Bonding slave and team port devices should not have link-local addresses
      automatically added to them, as it can interfere with openvswitch being
      able to properly add tc ingress.
      
      Basic reproducer, courtesy of Marcelo:
      
      $ ip link add name bond0 type bond
      $ ip link set dev ens2f0np0 master bond0
      $ ip link set dev ens2f1np2 master bond0
      $ ip link set dev bond0 up
      $ ip a s
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
      group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
      mq master bond0 state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
      5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
      mq master bond0 state DOWN group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
      11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
      noqueue state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link
             valid_lft forever preferred_lft forever
      
      (above trimmed to relevant entries, obviously)
      
      $ sysctl net.ipv6.conf.ens2f0np0.addr_gen_mode=0
      net.ipv6.conf.ens2f0np0.addr_gen_mode = 0
      $ sysctl net.ipv6.conf.ens2f1np2.addr_gen_mode=0
      net.ipv6.conf.ens2f1np2.addr_gen_mode = 0
      
      $ ip a l ens2f0np0
      2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
      mq master bond0 state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
             valid_lft forever preferred_lft forever
      $ ip a l ens2f1np2
      5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
      mq master bond0 state DOWN group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
             valid_lft forever preferred_lft forever
      
      It looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is
      this a slave interface?" check added by commit c2edacf8, which
      results in an address getting added, whereas with the proposed patch
      applied, no address gets added. This patch simply adds the same gating
      check to another code path, and thus should prevent the same devices
      from erroneously obtaining an IPv6 link-local address.
      
      Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
      Reported-by: Moshe Levi <moshele@mellanox.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Marcelo Ricardo Leitner <mleitner@redhat.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 01 April, 2020 1 commit
  16. 31 March, 2020 2 commits
  17. 30 March, 2020 4 commits
  18. 27 March, 2020 1 commit
  19. 26 March, 2020 1 commit
    • esp6: add gso_segment for esp6 beet mode · 7f9e40eb
      Xin Long committed
      Similar to xfrm6_tunnel/transport_gso_segment(), _gso_segment()
      is added to do gso_segment for esp6 beet mode. Before calling
      inet6_offloads[proto]->callbacks.gso_segment, it needs to do:
      
        - Get the upper proto from ph header to get its gso_segment
          when xo->proto is IPPROTO_BEETPH.
      
        - Add SKB_GSO_TCPV6 to gso_type if x->sel.family != AF_INET6
          and the proto == IPPROTO_TCP, so that the current tcp ipv6
          packet can be segmented.
      
        - Calculate a right value for skb->transport_header and move
          skb->data to the transport header position.
      
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  20. 24 March, 2020 1 commit
  21. 16 March, 2020 1 commit
  22. 15 March, 2020 1 commit
    • netfilter: Replace zero-length array with flexible-array member · 6daf1414
      Gustavo A. R. Silva committed
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
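
Why allocations are unaffected can be seen in a small stand-alone example. It reuses the `struct foo` from the message; `struct boo`'s single field and the helper `foo_alloc()` are hypothetical additions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

struct boo { int value; };

struct foo {
	int stuff;
	struct boo array[];  /* flexible array member: must be last */
};

/* Allocate a foo with room for n trailing elements. */
static struct foo *foo_alloc(size_t n)
{
	/* sizeof(*f) does not count the array, so the element space is added. */
	struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

	if (f != NULL)
		f->stuff = (int)n;
	return f;
}
```

The allocation expression is the same one that would be written for a zero-length array, since the trailing member contributes nothing to `sizeof(struct foo)` in either form.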
      
      Lastly, fix checkpatch.pl warning
      WARNING: __aligned(size) is preferred over __attribute__((aligned(size)))
      in net/bridge/netfilter/ebtables.c
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  23. 13 March, 2020 1 commit
  24. 12 March, 2020 1 commit
  25. 11 March, 2020 1 commit
    • ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface · 60380488
      Hangbin Liu committed
      Rafał found that, for a non-Ethernet interface, memory is slowly
      consumed if the link is brought down and up frequently.
      
      The reason is that we add the allnodes/allrouters addresses to the
      multicast list in ipv6_add_dev(). When the link goes down, we call
      ipv6_mc_down() and store all multicast addresses via mld_add_delrec().
      But when the link comes up, we don't call ipv6_mc_up() for a
      non-Ethernet interface to remove the addresses. This makes
      idev->mc_tomb grow bigger and bigger. The call stack looks like:
      
      addrconf_notify(NETDEV_REGISTER)
      	ipv6_add_dev
      		ipv6_dev_mc_inc(ff01::1)
      		ipv6_dev_mc_inc(ff02::1)
      		ipv6_dev_mc_inc(ff02::2)
      
      addrconf_notify(NETDEV_UP)
      	addrconf_dev_config
      		/* Alas, we support only Ethernet autoconfiguration. */
      		return;
      
      addrconf_notify(NETDEV_DOWN)
      	addrconf_ifdown
      		ipv6_mc_down
      			igmp6_group_dropped(ff02::2)
      				mld_add_delrec(ff02::2)
      			igmp6_group_dropped(ff02::1)
      			igmp6_group_dropped(ff01::1)
      
      After investigating, I couldn't find a rule that disables multicast on
      non-Ethernet interfaces. In RFC2460, the link could be Ethernet, PPP,
      ATM, tunnels, etc. IPv4 doesn't check the dev type when calling
      ip_mc_up() in inetdev_event(). Even for IPv6, we don't check the dev
      type when calling ipv6_add_dev() and ipv6_dev_mc_inc() after
      registering the device.
      
      So I think it's OK to fix this memory consumption by calling
      ipv6_mc_up() for non-Ethernet interfaces.
      
      v2: Also check IFF_MULTICAST flag to make sure the interface supports
          multicast
      
      Reported-by: Rafał Miłecki <zajec5@gmail.com>
      Tested-by: Rafał Miłecki <zajec5@gmail.com>
      Fixes: 74235a25 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels")
      Fixes: 1666d49e ("mld: do not remove mld souce list info when set link down")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 04 March, 2020 3 commits
    • ipv6: Use math to point per net sysctls into the appropriate struct net · d2f7e56d
      Cambda Zhu committed
      The data pointers of ipv6 sysctl are set one by one which is hard to
      maintain, especially with kconfig. This patch simplifies it by using
      math to point the per net sysctls into the appropriate struct net,
      just like what we did for ipv4.
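
The "math" mentioned above amounts to shifting every data pointer by the distance between the namespace's structure and the initial one. The sketch below shows that idea in user space; `struct net_cfg`, `struct ctl_entry`, `template_table` and `table_for_net()` are hypothetical, and the pointer shift goes through `uintptr_t` because this is ordinary C rather than kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-namespace settings and sysctl table entry. */
struct net_cfg { int bindv6only; int flowlabel_consistency; };
struct ctl_entry { const char *name; void *data; };

static struct net_cfg init_cfg;

/* Template table: every data pointer refers to a field of init_cfg. */
static const struct ctl_entry template_table[] = {
	{ "bindv6only", &init_cfg.bindv6only },
	{ "flowlabel_consistency", &init_cfg.flowlabel_consistency },
};

#define N_ENTRIES (sizeof(template_table) / sizeof(template_table[0]))

/* Copy the template and shift every pointer by (cfg - &init_cfg). */
static void table_for_net(struct ctl_entry *out, struct net_cfg *cfg)
{
	uintptr_t delta = (uintptr_t)cfg - (uintptr_t)&init_cfg;
	size_t i;

	memcpy(out, template_table, sizeof(template_table));
	for (i = 0; i < N_ENTRIES; i++)
		out[i].data = (void *)((uintptr_t)out[i].data + delta);
}
```

Adding a new entry then only requires a line in the template; no per-entry assignment has to be kept in sync.
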
      Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: remove the old peer route if change it to a new one · d0098e4c
      Hangbin Liu committed
      When we modify the peer route and change it to a new one, we should
      remove the old route first. Before the fix:
      
      + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 256 pref medium
      2001:db8::2 proto kernel metric 256 pref medium
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 256 pref medium
      2001:db8::2 proto kernel metric 256 pref medium
      
      After the fix:
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::3
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 256 pref medium
      2001:db8::3 proto kernel metric 256 pref medium
      
      This patch depends on the previous patch "net/ipv6: need update peer route
      when modify metric" to update the new peer route after deleting the old one.
      
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: need update peer route when modify metric · 61794012
      Hangbin Liu committed
      When we modify the route metric, the peer address's route also needs
      to be updated. Before the fix:
      
      + ip addr add dev dummy1 2001:db8::1 peer 2001:db8::2 metric 60
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 60 pref medium
      2001:db8::2 proto kernel metric 60 pref medium
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 61 pref medium
      2001:db8::2 proto kernel metric 60 pref medium
      
      After the fix:
      + ip addr change dev dummy1 2001:db8::1 peer 2001:db8::2 metric 61
      + ip -6 route show dev dummy1
      2001:db8::1 proto kernel metric 61 pref medium
      2001:db8::2 proto kernel metric 61 pref medium
      
      Fixes: 8308f3ff ("net/ipv6: Add support for specifying metric of connected routes")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 01 March, 2020 1 commit
  28. 29 February, 2020 1 commit
    • ipv6: Replace zero-length array with flexible-array member · b0c9a2d9
      Gustavo A. R. Silva committed
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 27 February, 2020 2 commits
    • ipv6: xfrm6_tunnel.c: Use built-in RCU list checking · edf0d283
      Madhuparna Bhowmik committed
      hlist_for_each_entry_rcu() has built-in RCU and lock checking.
      
      Pass cond argument to list_for_each_entry_rcu() to silence
      false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
      by default.
      
      Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • ipv6: restrict IPV6_ADDRFORM operation · b6f61189
      Eric Dumazet committed
      IPV6_ADDRFORM is able to transform an IPv6 socket into an IPv4 one.
      While this operation sounds illogical, we have to support it.
      
      One of the things it does for TCP socket is to switch sk->sk_prot
      to tcp_prot.
      
      We now have other layers playing with sk->sk_prot, so we should make
      sure to not interfere with them.
      
      This patch makes sure sk_prot is the default pointer for TCP IPv6 socket.
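
The guard described above can be modelled as follows. This is a user-space sketch under stated assumptions, not the setsockopt code: `struct proto` and `struct sock` are reduced to what the check needs, `addrform_to_ipv4()` is a hypothetical name, and the -EBUSY return value is an assumption of this example.

```c
#include <errno.h>

/* Hypothetical stand-ins for protocol operation tables. */
struct proto { const char *name; };

static const struct proto tcpv6_prot = { "TCPv6" };
static const struct proto tcp_prot = { "TCP" };

struct sock { const struct proto *sk_prot; };

/*
 * Convert a TCP IPv6 socket to IPv4 only when sk_prot is still the
 * default TCPv6 table; otherwise another layer owns the pointer.
 */
static int addrform_to_ipv4(struct sock *sk)
{
	if (sk->sk_prot != &tcpv6_prot)
		return -EBUSY;  /* assumed error code for this sketch */
	sk->sk_prot = &tcp_prot;
	return 0;
}
```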
      
      syzbot reported:
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD a0113067 P4D a0113067 PUD a8771067 PMD 0
      Oops: 0010 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 10686 Comm: syz-executor.0 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
      RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
      RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
      R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
      R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
      FS:  00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       inet_release+0x165/0x1c0 net/ipv4/af_inet.c:427
       __sock_release net/socket.c:605 [inline]
       sock_close+0xe1/0x260 net/socket.c:1283
       __fput+0x2e4/0x740 fs/file_table.c:280
       ____fput+0x15/0x20 fs/file_table.c:313
       task_work_run+0x176/0x1b0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:188 [inline]
       exit_to_usermode_loop arch/x86/entry/common.c:164 [inline]
       prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195
       syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278
       do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x45c429
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f2ae75dac78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: 0000000000000000 RBX: 00007f2ae75db6d4 RCX: 000000000045c429
      RDX: 0000000000000001 RSI: 000000000000011a RDI: 0000000000000004
      RBP: 000000000076bf20 R08: 0000000000000038 R09: 0000000000000000
      R10: 0000000020000180 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000000a9d R14: 00000000004ccfb4 R15: 000000000076bf2c
      Modules linked in:
      CR2: 0000000000000000
      ---[ end trace 82567b5207e87bae ]---
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
      RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
      RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
      R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
      R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
      FS:  00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot+1938db17e275e85dc328@syzkaller.appspotmail.com
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. 25 Feb, 2020 2 commits
  31. 21 Feb, 2020 1 commit
    • K
      net: ip6_gre: Distribute switch variables for initialization · 46d30cb1
      Kees Cook committed
      Variables declared in a switch statement before any case statements
      cannot be automatically initialized with compiler instrumentation (as
      they are not part of any execution flow). With GCC's proposed automatic
      stack variable initialization feature, this triggers a warning (and they
      don't get initialized). Clang's automatic stack variable initialization
      (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
      doesn't initialize such variables[1]. Note that these warnings (or silent
      skipping) happen before the dead-store elimination optimization phase,
      so even when the automatic initializations are later elided in favor of
      direct initializations, the warnings remain.
      
      To avoid these problems, move such variables into the "case" where
      they're used or lift them up into the main function body.
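
The transformation can be sketched with a small standalone example. The functions scale_before and scale_after are hypothetical and are not the kernel code touched by this patch; they only show the declaration moving from the top of the switch body into the case that uses it.

```c
/*
 * Before: 'factor' is declared inside the switch but ahead of every case
 * label. Control always jumps straight to a label, so nothing placed at
 * that spot (including a compiler-inserted initialization) ever runs.
 */
int scale_before(int kind, int value)
{
	switch (kind) {
		int factor;
	case 1:
		factor = 2;
		return value * factor;
	default:
		return value;
	}
}

/*
 * After: the variable lives in a block belonging to the case that uses
 * it, so its initialization is part of the normal execution flow.
 */
int scale_after(int kind, int value)
{
	switch (kind) {
	case 1: {
		int factor = 2;

		return value * factor;
	}
	default:
		return value;
	}
}
```

Both functions return the same results; only the placement of the declaration differs. Lifting the declaration to the top of the function body, the other option named above, would work equally well.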
      
      net/ipv6/ip6_gre.c: In function ‘ip6gre_err’:
      net/ipv6/ip6_gre.c:440:32: warning: statement will never be executed [-Wswitch-unreachable]
        440 |   struct ipv6_tlv_tnl_enc_lim *tel;
            |                                ^~~
      
      net/ipv6/ip6_tunnel.c: In function ‘ip6_tnl_err’:
      net/ipv6/ip6_tunnel.c:520:32: warning: statement will never be executed [-Wswitch-unreachable]
        520 |   struct ipv6_tlv_tnl_enc_lim *tel;
            |                                ^~~
      
      [1] https://bugs.llvm.org/show_bug.cgi?id=44916

      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>