1. 06 Apr, 2018 1 commit
  2. 28 Mar, 2018 1 commit
  3. 16 Mar, 2018 1 commit
    • D
      net/ipv6: Change address check to always take a device argument · 232378e8
      David Ahern authored
      ipv6_chk_addr_and_flags determines if an address is a local address and
      optionally if it is an address on a specific device. For example, it is
      called by ip6_route_info_create to determine if a given gateway address
      is a local address. The address check currently does not consider L3
      domains and as a result does not allow a route to be added in one VRF
      if the nexthop points to an address in a second VRF. e.g.,
      
          $ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23
          Error: Invalid gateway address.
      
      where 2001:db8:102::23 is an address on an interface in vrf r1.
      
      ipv6_chk_addr_and_flags needs to allow callers to always pass in a device
      with a separate argument to not limit the address to the specific device.
      The device is used to determine the L3 domain of interest.
      
      To that end add an argument to skip the device check and update callers
      to always pass a device where possible and use the new argument to mean
      any address in the domain.
      
      Update a handful of users of ipv6_chk_addr with a NULL dev argument. This
      patch handles the change to these callers without adding the domain check.
      
      ip6_validate_gw needs to handle 2 cases - one where the device is given
      as part of the nexthop spec and the other where the device is resolved.
      There is at least 1 VRF case where deferring the check to only after
      the route lookup has resolved the device fails with an unintuitive error
      "RTNETLINK answers: No route to host" as opposed to the preferred
      "Error: Gateway can not be a local address." The 'no route to host'
      error is because of the fallback to a full lookup. The check is done
      twice to avoid this error.
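
The check described above can be sketched in userspace C. This is a minimal model, not the kernel code: `demo_dev`, `demo_addr`, and `demo_chk_addr` are hypothetical names, and the L3 domain is reduced to an integer id. It only illustrates how a device argument can select the domain while a separate flag decides whether the address must sit on that exact device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for kernel objects. */
struct demo_dev {
    const char *name;
    int l3_domain;               /* 0 = default table, otherwise a VRF id */
};

struct demo_addr {
    const char *addr;            /* textual IPv6 address */
    const struct demo_dev *dev;  /* device the address is configured on */
};

/*
 * Return true if addr is a local address in the L3 domain of dev.
 * With skip_dev_check == false the address must be on dev itself;
 * with skip_dev_check == true any device in the same domain matches.
 */
static bool demo_chk_addr(const struct demo_addr *tbl, size_t n,
                          const char *addr, const struct demo_dev *dev,
                          bool skip_dev_check)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].addr, addr) != 0)
            continue;
        if (tbl[i].dev->l3_domain != dev->l3_domain)
            continue;            /* address belongs to another domain */
        if (skip_dev_check || tbl[i].dev == dev)
            return true;
    }
    return false;
}
```

In this model, an address that lives in vrf r1 is not reported as local when the lookup is made with a device in vrf r2, which is the behaviour the route-add example needs.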
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 10 Mar, 2018 1 commit
    • E
      net: do not create fallback tunnels for non-default namespaces · 79134e6c
      Eric Dumazet authored
      fallback tunnels (like tunl0, gre0, gretap0, erspan0, sit0,
      ip6tnl0, ip6gre0) are automatically created when the corresponding
      module is loaded.
      
      These tunnels are also automatically created when a new network
      namespace is created, at a great cost.
      
      In many cases, netns are used for isolation purposes, and these
      extra network devices are a waste of resources. We are using
      thousands of netns per host, and hit the netns creation/delete
      bottleneck a lot. (Many thanks to Kirill for recent work on this)
      
      Add a new sysctl so that we can opt-out from this automatic creation.
      
      Note that these tunnels are still created for the initial namespace,
      to be the least intrusive for typical setups.
      
      Tested:
      lpk43:~# cat add_del_unshare.sh
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do  unshare -n /bin/true >/dev/null ; done) &
      done
      wait
      
      lpk43:~# echo 0 >/proc/sys/net/core/fb_tunnels_only_for_init_net
      lpk43:~# time ./add_del_unshare.sh
      
      real	0m37.521s
      user	0m0.886s
      sys	7m7.084s
      lpk43:~# echo 1 >/proc/sys/net/core/fb_tunnels_only_for_init_net
      lpk43:~# time ./add_del_unshare.sh
      
      real	0m4.761s
      user	0m0.851s
      sys	1m8.343s
      lpk43:~#
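
A minimal sketch of the opt-out decision, assuming a simplified namespace type. `demo_net` and `demo_net_has_fallback_tunnels` are hypothetical names used for illustration, and the global variable only mirrors the value written to `/proc/sys/net/core/fb_tunnels_only_for_init_net` in the test above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a network namespace. */
struct demo_net {
    bool is_init_net;   /* true only for the initial namespace */
};

/* Mirrors the sysctl value: 0 = create everywhere, 1 = init namespace only. */
static int demo_fb_tunnels_only_for_init_net;

/* Decide whether fallback tunnels (tunl0, sit0, ...) should be created. */
static bool demo_net_has_fallback_tunnels(const struct demo_net *net)
{
    return net->is_init_net || !demo_fb_tunnels_only_for_init_net;
}
```

The initial namespace always gets its fallback devices, so typical single-namespace setups see no change.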
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 05 Mar, 2018 1 commit
  6. 28 Feb, 2018 2 commits
  7. 26 Jan, 2018 1 commit
  8. 03 Jan, 2018 2 commits
  9. 20 Dec, 2017 1 commit
    • X
      ip6_tunnel: get the min mtu properly in ip6_tnl_xmit · c9fefa08
      Xin Long authored
      Now it's using IPV6_MIN_MTU as the min mtu in ip6_tnl_xmit, but
      IPV6_MIN_MTU actually only works when the inner packet is ipv6.
      
      With IPV6_MIN_MTU for ipv4 packets, the new pmtu for inner dst
      couldn't be set less than 1280. It would cause tx_err and the
      packet to be dropped when the outer dst pmtu is close to 1280.
      
      Jianlin found it by running ipv4 traffic with the topo:
      
        (client) gre6 <---> eth1 (route) eth2 <---> gre6 (server)
      
      After changing eth2 mtu to 1300, the performance became very
      low, or the connection was even broken. The issue also affects
      ip4ip6 and ip6ip6 tunnels.
      
      So if the inner packet is ipv4, 576 should be considered as the
      min mtu.
      
      Note that for ip4ip6 and ip6ip6 tunnels, the inner packet can
      only be ipv4 or ipv6, but for gre6 tunnel, it may also be ARP.
      This patch using 576 as the min mtu for non-ipv6 packet works
      for all those cases.
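
The rule can be sketched as a small clamp function. This is an illustration built from the commit message only: the `DEMO_` names are hypothetical, the protocol is taken in host byte order, and 576 is the value named above for non-IPv6 inner packets.

```c
#include <stdint.h>

#define DEMO_ETH_P_IP     0x0800  /* EtherType for IPv4 */
#define DEMO_ETH_P_ARP    0x0806  /* EtherType for ARP */
#define DEMO_ETH_P_IPV6   0x86DD  /* EtherType for IPv6 */
#define DEMO_IPV6_MIN_MTU 1280
#define DEMO_IPV4_MIN_MTU 576     /* value named in the commit message */

/* Clamp a computed tunnel MTU to the minimum for the inner protocol. */
static unsigned int demo_clamp_tunnel_mtu(unsigned int mtu, uint16_t proto)
{
    unsigned int min_mtu = (proto == DEMO_ETH_P_IPV6) ? DEMO_IPV6_MIN_MTU
                                                      : DEMO_IPV4_MIN_MTU;

    return mtu < min_mtu ? min_mtu : mtu;
}
```

With an outer path MTU of 1300, an inner MTU just below 1280 is kept as-is for IPv4 or ARP and only raised to 1280 for IPv6.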
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 08 Dec, 2017 1 commit
  11. 05 Dec, 2017 1 commit
  12. 13 Nov, 2017 3 commits
  13. 25 Oct, 2017 2 commits
    • M
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland authored
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
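
The effect of the two rules in the script can be illustrated on a tiny example. The macros below are simplified userspace stand-ins (the kernel definitions are more elaborate), the `demo_` names are hypothetical, and `__typeof__` is a GCC/Clang extension.

```c
/* Simplified stand-ins for the kernel accessors, for illustration only. */
#define DEMO_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int demo_shared_flag;

/* Before: ACCESS_ONCE(demo_shared_flag) = v;   (first rule) */
static void demo_set_flag(int v)
{
    DEMO_WRITE_ONCE(demo_shared_flag, v);
}

/* Before: return ACCESS_ONCE(demo_shared_flag);   (second rule) */
static int demo_get_flag(void)
{
    return DEMO_READ_ONCE(demo_shared_flag);
}
```

The assignment form has to be matched first; otherwise the left-hand side of a store would be rewritten to a read accessor.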
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • S
      ip6_tunnel: Allow rcv/xmit even if remote address is a local address · 908d140a
      Shmulik Ladkani authored
      Currently, ip6_tnl_xmit_ctl drops tunneled packets if the remote
      address (outer v6 destination) is one of host's locally configured
      addresses.
      Same applies to ip6_tnl_rcv_ctl: it drops packets if the remote address
      (outer v6 source) is a local address.
      
      This prevents using ipxip6 (and ip6_gre) tunnels whose local/remote
      endpoints are on same host; OTOH v4 tunnels (ipip or gre) allow such
      configurations.
      
      An example where this proves useful is a system where entities are
      identified by their unique v6 addresses, and use tunnels to encapsulate
      traffic between them. The limitation prevents placing several entities
      on same host.
      
      Introduce IP6_TNL_F_ALLOW_LOCAL_REMOTE, which allows bypassing this
      restriction.
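
A minimal sketch of how such a flag can gate the check. The function name and the bit value are hypothetical and chosen for illustration only; the real flag value lives in the kernel UAPI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit value chosen for illustration only, not the kernel's definition. */
#define DEMO_F_ALLOW_LOCAL_REMOTE 0x1u

/*
 * Return true if the tunnel may send or receive a packet, given whether
 * the remote endpoint address is also configured on this host.
 */
static bool demo_tnl_addr_ok(uint32_t flags, bool remote_is_local)
{
    if (remote_is_local && !(flags & DEMO_F_ALLOW_LOCAL_REMOTE))
        return false;   /* old behaviour: drop */
    return true;
}
```

Without the flag the behaviour is unchanged, so existing tunnel configurations keep dropping such packets.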
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 18 Oct, 2017 1 commit
  15. 01 Oct, 2017 1 commit
  16. 20 Sep, 2017 1 commit
    • E
      ipv6: speedup ipv6 tunnels dismantle · bb401cae
      Eric Dumazet authored
      Implement exit_batch() method to dismantle more devices
      per round.
      
      (rtnl_lock() ...
       unregister_netdevice_many() ...
       rtnl_unlock())
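
The gain from batching can be modelled by counting lock rounds. This is a toy model under stated assumptions, not the kernel code: the `demo_` names are hypothetical and a counter stands in for each rtnl_lock()/rtnl_unlock() pair.

```c
#include <stddef.h>

/* Counts simulated rtnl_lock()/rtnl_unlock() pairs. */
static unsigned int demo_lock_rounds;

/* Per-namespace exit: one lock round for every dying namespace. */
static unsigned int demo_exit_each(const unsigned int *devs_per_ns, size_t n_ns)
{
    unsigned int freed = 0;

    for (size_t i = 0; i < n_ns; i++) {
        demo_lock_rounds++;          /* lock, unregister, unlock */
        freed += devs_per_ns[i];
    }
    return freed;
}

/* Batched exit: one lock round covers queueing and unregistering all devices. */
static unsigned int demo_exit_batch(const unsigned int *devs_per_ns, size_t n_ns)
{
    unsigned int queued = 0;

    demo_lock_rounds++;              /* single lock ... unlock */
    for (size_t i = 0; i < n_ns; i++)
        queued += devs_per_ns[i];    /* collected on one list */
    return queued;
}
```

Both variants release the same number of devices; only the number of lock rounds differs.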
      
      Tested:
      $ cat add_del_unshare.sh
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
      done
      wait ; grep net_namespace /proc/slabinfo
      
      Before patch :
      $ time ./add_del_unshare.sh
      net_namespace        110    267   5504    1    2 : tunables    8    4    0 : slabdata    110    267      0
      
      real    3m25.292s
      user    0m0.644s
      sys     0m40.153s
      
      After patch:
      
      $ time ./add_del_unshare.sh
      net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0
      
      real	1m38.965s
      user	0m0.688s
      sys	0m37.017s
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 19 Sep, 2017 1 commit
    • X
      ip6_tunnel: do not allow loading ip6_tunnel if ipv6 is disabled in cmdline · 8c22dab0
      Xin Long authored
      If ipv6 has been disabled from cmdline since kernel started, it makes
      no sense to allow users to create any ip6 tunnel. Otherwise, it could
      cause some potential problems.
      
      Jianlin found a kernel crash caused by this in ip6_gre when he set
      ipv6.disable=1 in grub:
      
      [  209.588865] Unable to handle kernel paging request for data at address 0x00000080
      [  209.588872] Faulting instruction address: 0xc000000000a3aa6c
      [  209.588879] Oops: Kernel access of bad area, sig: 11 [#1]
      [  209.589062] NIP [c000000000a3aa6c] fib_rules_lookup+0x4c/0x260
      [  209.589071] LR [c000000000b9ad90] fib6_rule_lookup+0x50/0xb0
      [  209.589076] Call Trace:
      [  209.589097] fib6_rule_lookup+0x50/0xb0
      [  209.589106] rt6_lookup+0xc4/0x110
      [  209.589116] ip6gre_tnl_link_config+0x214/0x2f0 [ip6_gre]
      [  209.589125] ip6gre_newlink+0x138/0x3a0 [ip6_gre]
      [  209.589134] rtnl_newlink+0x798/0xb80
      [  209.589142] rtnetlink_rcv_msg+0xec/0x390
      [  209.589151] netlink_rcv_skb+0x138/0x150
      [  209.589159] rtnetlink_rcv+0x48/0x70
      [  209.589169] netlink_unicast+0x538/0x640
      [  209.589175] netlink_sendmsg+0x40c/0x480
      [  209.589184] ___sys_sendmsg+0x384/0x4e0
      [  209.589194] SyS_sendmsg+0xd4/0x140
      [  209.589201] SyS_socketcall+0x3e0/0x4f0
      [  209.589209] system_call+0x38/0xe0
      
      This patch is to return -EOPNOTSUPP in ip6_tunnel_init if ipv6 has been
      disabled from cmdline.
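
A minimal sketch of the early return, assuming a boolean that stands in for the boot-time `ipv6.disable=1` state. `demo_ipv6_enabled` and `demo_ip6_tunnel_init` are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the boot-time state; false models ipv6.disable=1. */
static bool demo_ipv6_enabled = true;

/* Module init: refuse to load when IPv6 is disabled. */
static int demo_ip6_tunnel_init(void)
{
    if (!demo_ipv6_enabled)
        return -EOPNOTSUPP;

    /* ... registration of the tunnel driver would follow here ... */
    return 0;
}
```

Failing in the init function means no tunnel device can be created later, so the lookup path shown in the call trace is never reached.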
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 13 Sep, 2017 1 commit
  19. 09 Sep, 2017 1 commit
  20. 27 Jun, 2017 3 commits
  21. 19 Jun, 2017 1 commit
  22. 17 Jun, 2017 1 commit
  23. 08 Jun, 2017 1 commit
    • D
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller authored
      Network devices can allocate resources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally also does a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call netdev_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      netdev_ops->ndo_init() succeeds, by invoking both netdev_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
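
The split between releasing private state and freeing the device can be sketched in userspace C. Everything here is a simplified model with hypothetical `demo_` names: malloc/free stand in for kernel allocation, and counters make the calls observable.

```c
#include <stdbool.h>
#include <stdlib.h>

struct demo_netdev {
    void *priv;                                     /* allocated by init */
    bool needs_free_netdev;                         /* free dev on unregister? */
    void (*priv_destructor)(struct demo_netdev *);  /* frees priv only */
    int (*init)(struct demo_netdev *);
    void (*uninit)(struct demo_netdev *);
};

static int demo_uninit_calls;
static int demo_priv_frees;

static int demo_init(struct demo_netdev *dev)
{
    dev->priv = malloc(16);
    return dev->priv ? 0 : -1;
}

static void demo_uninit(struct demo_netdev *dev)
{
    (void)dev;
    demo_uninit_calls++;
}

static void demo_priv_destructor(struct demo_netdev *dev)
{
    free(dev->priv);
    dev->priv = NULL;
    demo_priv_frees++;
}

/* On failure, private state is released but dev is left for the caller to free. */
static int demo_register(struct demo_netdev *dev, bool later_step_fails)
{
    int err = dev->init(dev);

    if (err)
        return err;
    if (later_step_fails) {
        dev->uninit(dev);
        if (dev->priv_destructor)
            dev->priv_destructor(dev);
        return -1;
    }
    return 0;
}

/* Unregister: release private state, then optionally free the device itself. */
static void demo_unregister(struct demo_netdev *dev)
{
    dev->uninit(dev);
    if (dev->priv_destructor)
        dev->priv_destructor(dev);
    if (dev->needs_free_netdev)
        free(dev);
}
```

Because the destructor no longer frees the device, the failure path in the register function can call it without causing a double free in the caller.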
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 05 Jun, 2017 1 commit
  25. 27 May, 2017 1 commit
    • P
      ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets · 0e9a7095
      Peter Dawson authored
      This fix addresses two problems in the way the DSCP field is formulated
       on the encapsulating header of IPv6 tunnels.
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661
      
      1) The IPv6 tunneling code was manipulating the DSCP field of the
       encapsulating packet using the 32b flowlabel. Since the flowlabel is
       only the lower 20b it was incorrect to assume that the upper 12b
       containing the DSCP and ECN fields would remain intact when formulating
       the encapsulating header. This fix handles the 'inherit' and
       'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.
      
      2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
       incorrect and resulted in the DSCP value always being set to 0.
      
      Commit 90427ef5 ("ipv6: fix flow labels when the traffic class
       is non-0") caused the regression by masking out the flowlabel
       which exposed the incorrect handling of the DSCP portion of the
       flowlabel in ip6_tunnel and ip6_gre.
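
The bit layout behind the first problem can be shown with plain arithmetic. This sketch works on the first 32 bits of the IPv6 header in host byte order (the kernel handles them in network order); the `demo_` names are hypothetical.

```c
#include <stdint.h>

#define DEMO_FLOWLABEL_MASK 0x000FFFFFu  /* low 20 bits: flow label */

/* Layout: version (4 bits) | traffic class (8 bits) | flow label (20 bits). */
static uint32_t demo_build_flowinfo(uint8_t dsfield, uint32_t flowlabel)
{
    return (6u << 28) | ((uint32_t)dsfield << 20) |
           (flowlabel & DEMO_FLOWLABEL_MASK);
}

/* Extract the 8-bit traffic class (DSCP plus ECN). */
static uint8_t demo_get_dsfield(uint32_t flowinfo)
{
    return (uint8_t)((flowinfo >> 20) & 0xFF);
}
```

Once a value is masked down to the 20-bit flow label, the traffic class bits are gone, which is why the DSCP has to be carried in a separate dsfield variable.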
      
      Fixes: 90427ef5 ("ipv6: fix flow labels when the traffic class is non-0")
      Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 02 May, 2017 1 commit
    • C
      ip6_tunnel: Fix missing tunnel encapsulation limit option · 89a23c8b
      Craig Gallek authored
      The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
      IPV6_TLV_PADN options when an encapsulation limit is defined (the
      default is a limit of 4).  An MTU adjustment is done to account for
      these options as well.  However, the options are never present in the
      generated packets.
      
      The issue appears to be a subtlety between IPV6_DSTOPTS and
      IPV6_RTHDRDSTOPTS defined in RFC 3542.  When the IPIP tunnel driver was
      written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
      dst0opt of struct ipv6_txoptions.  Later, ipv6_push_nfrags_opts was
      (correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
      are to be used.  This caused the options to no longer be included in v6
      encapsulated packets.
      
      The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
      instead.  IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.
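
For reference, the 8-byte destination options header that carries the two TLVs can be sketched as follows. The option type values (4 for the tunnel encapsulation limit, 1 for PadN) are the standard ones; the function and macro names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_IPV6_TLV_PADN            1
#define DEMO_IPV6_TLV_TNL_ENCAP_LIMIT 4

/* Fill an 8-byte destination options header carrying the encapsulation limit. */
static void demo_build_encap_limit_opt(uint8_t out[8], uint8_t next_header,
                                       uint8_t encap_limit)
{
    memset(out, 0, 8);
    out[0] = next_header;                    /* Next Header */
    out[1] = 0;                              /* Hdr Ext Len: 0 means 8 bytes */
    out[2] = DEMO_IPV6_TLV_TNL_ENCAP_LIMIT;  /* option type */
    out[3] = 1;                              /* option data length */
    out[4] = encap_limit;                    /* the limit itself */
    out[5] = DEMO_IPV6_TLV_PADN;             /* pad to an 8-byte boundary */
    out[6] = 1;                              /* one padding byte follows */
    out[7] = 0;
}
```

These 8 bytes are the overhead that the MTU adjustment mentioned above accounts for.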
      
      Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.")
      Fixes: 333fad53: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 27 Apr, 2017 1 commit
    • W
      ipv6: check skb->protocol before lookup for nexthop · 199ab00f
      WANG Cong authored
      Andrey reported an out-of-bounds access in ip6_tnl_xmit(), this
      is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
      neigh key as an IPv6 address:
      
              neigh = dst_neigh_lookup(skb_dst(skb),
                                       &ipv6_hdr(skb)->daddr);
              if (!neigh)
                      goto tx_err_link_failure;
      
              addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
              addr_type = ipv6_addr_type(addr6);
      
              if (addr_type == IPV6_ADDR_ANY)
                      addr6 = &ipv6_hdr(skb)->daddr;
      
              memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
      
      Also the network header of the skb at this point should be still IPv4
      for 4in6 tunnels, we should not just use it as IPv6 header.
      
      This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
      is, we are safe to do the nexthop lookup using skb_dst() and
      ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
      dest address we can pick here, we have to rely on callers to fill it
      from tunnel config, so just fall to ip6_route_output() to make the
      decision.
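
The decision described above can be reduced to a small selector. This is only a sketch: the enum and function names are hypothetical, and the protocol is compared in host byte order.

```c
#include <stdint.h>

#define DEMO_ETH_P_IP   0x0800  /* EtherType for IPv4 */
#define DEMO_ETH_P_IPV6 0x86DD  /* EtherType for IPv6 */

enum demo_daddr_source {
    DEMO_FROM_NEIGH_LOOKUP,   /* nexthop lookup on the inner IPv6 header */
    DEMO_FROM_ROUTE_OUTPUT,   /* leave the choice to the route lookup */
};

/* Only an IPv6 inner packet may be used for the neighbour-based lookup. */
static enum demo_daddr_source demo_pick_daddr_source(uint16_t skb_protocol)
{
    return skb_protocol == DEMO_ETH_P_IPV6 ? DEMO_FROM_NEIGH_LOOKUP
                                           : DEMO_FROM_ROUTE_OUTPUT;
}
```

For an IPv4 inner packet the neighbour key is only 4 bytes long, so reading it as a 16-byte IPv6 address is what produced the out-of-bounds access.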
      
      Fixes: ea3dc960 ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 22 Apr, 2017 1 commit
  29. 02 Feb, 2017 1 commit
  30. 25 Jan, 2017 2 commits
  31. 17 Jan, 2017 1 commit
    • J
      ip6_tunnel: Account for tunnel header in tunnel MTU · 02ca0423
      Jakub Sitnicki authored
      With ip6gre we have a tunnel header which also makes the tunnel MTU
      smaller. We need to reserve room for it. Previously we were using up
      space reserved for the Tunnel Encapsulation Limit option
      header (RFC 2473).
      
      Also, after commit b05229f4 ("gre6: Cleanup GREv6 transmit path,
      call common GRE functions") our contract with the caller has
      changed. Now we check if the packet length exceeds the tunnel MTU after
      the tunnel header has been pushed, unlike before.
      
      This is reflected in the check where we look at the packet length minus
      the size of the tunnel header, which is already accounted for in tunnel
      MTU.
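
The accounting can be illustrated with a little arithmetic. This is a sketch under assumptions: the `demo_` names are hypothetical, the fixed IPv6 header is 40 bytes, and 8 bytes are reserved when the encapsulation limit option is in use.

```c
#include <stdbool.h>

#define DEMO_IPV6_HDR_LEN  40  /* fixed IPv6 header */
#define DEMO_ENCAP_LIM_LEN  8  /* destination options header with the limit */

/* Tunnel MTU: link MTU minus outer IPv6 header, tunnel header, and option. */
static unsigned int demo_tunnel_mtu(unsigned int link_mtu,
                                    unsigned int tun_hlen, bool encap_limit)
{
    unsigned int mtu = link_mtu - DEMO_IPV6_HDR_LEN - tun_hlen;

    if (encap_limit)
        mtu -= DEMO_ENCAP_LIM_LEN;
    return mtu;
}

/* Length check done after the tunnel header has already been pushed. */
static bool demo_pkt_too_big(unsigned int pkt_len, unsigned int tun_hlen,
                             unsigned int mtu)
{
    return pkt_len - tun_hlen > mtu;
}
```

For a 1500-byte link and a 4-byte GRE header with the encapsulation limit enabled, the tunnel MTU in this model is 1448.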
      
      Fixes: b05229f4 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
      Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 25 Dec, 2016 1 commit