  1. 28 Feb 2018: 1 commit
  2. 26 Jan 2018: 1 commit
  3. 25 Jan 2018: 1 commit
  4. 14 Dec 2017: 1 commit
  5. 20 Sep 2017: 1 commit
    • ipv4: speedup ipv6 tunnels dismantle · 64bc1781
      Eric Dumazet committed
      Implement exit_batch() method to dismantle more devices
      per round.
      
      (rtnl_lock() ...
       unregister_netdevice_many() ...
       rtnl_unlock())
      
      Tested:
      $ cat add_del_unshare.sh
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
      done
      wait ; grep net_namespace /proc/slabinfo
      
      Before patch:
      $ time ./add_del_unshare.sh
      net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0
      
      real    1m38.965s
      user    0m0.688s
      sys     0m37.017s
      
      After patch:
      $ time ./add_del_unshare.sh
      net_namespace        135    291   5504    1    2 : tunables    8    4    0 : slabdata    135    291      0
      
      real	0m22.117s
      user	0m0.728s
      sys	0m35.328s
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
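The batching idea in this commit can be pictured with a toy user-space model. Everything in it is hypothetical: toy_lock() merely stands in for rtnl_lock(), and the namespace and device counts are made up. It only counts how many locked rounds each strategy needs, which is the per-round overhead that exit_batch() amortizes by unregistering many devices at once.

```c
#include <assert.h>

/* Toy model, not kernel code: lock_rounds counts how often a global lock
 * standing in for rtnl_lock() is taken. */
static int lock_rounds;

static void toy_lock(void)   { lock_rounds++; }
static void toy_unlock(void) { /* nothing to release in the toy model */ }

/* Per-namespace exit(): every dying namespace takes the lock separately. */
static int dismantle_per_net(int nets, int devs_per_net)
{
	int removed = 0;

	lock_rounds = 0;
	for (int n = 0; n < nets; n++) {
		toy_lock();
		removed += devs_per_net;  /* unregister this namespace's tunnels */
		toy_unlock();
	}
	return removed;
}

/* exit_batch() style: queue the tunnels of every dying namespace on one
 * list, then unregister the whole list in a single locked round. */
static int dismantle_batched(int nets, int devs_per_net)
{
	int queued = 0;

	lock_rounds = 0;
	toy_lock();
	for (int n = 0; n < nets; n++)
		queued += devs_per_net;   /* add to the shared kill list */
	/* one unregister_netdevice_many()-style call would happen here */
	toy_unlock();
	return queued;
}
```

With 40 dying namespaces, both strategies remove the same number of devices, but the per-namespace path enters the locked section 40 times while the batched path enters it once.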
  6. 13 Sep 2017: 1 commit
  7. 09 Sep 2017: 1 commit
  8. 17 Jun 2017: 1 commit
  9. 08 Jun 2017: 1 commit
    • net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller committed
      Network devices can allocate resources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally also does a free_netdev().
      
      This creates a problem for the logic in register_netdevice(),
      because all callers of register_netdevice() manage the freeing
      of the netdev and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call netdev_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      netdev_ops->ndo_init() succeeds, by invoking both netdev_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: David S. Miller <davem@davemloft.net>
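The ownership split described in this commit can be sketched in user space. The struct and helpers below are hypothetical stand-ins for struct net_device, free_netdev(), the register failure path and the end of unregister_netdevice(); only the two field names, priv_destructor and needs_free_netdev, come from the commit message. The point shown is that the private destructor never frees the device, so the register failure path can call it safely while the caller still does the one and only free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_device with the two new fields. */
struct toy_netdev {
	void *priv_state;                                 /* set up by ndo_init() */
	void (*priv_destructor)(struct toy_netdev *dev);  /* private cleanup only */
	bool needs_free_netdev;                           /* free at end of unregister? */
};

static int frees;  /* counts toy_free_netdev() calls */

static void toy_free_netdev(struct toy_netdev *dev)
{
	frees++;
	free(dev);
}

/* A driver's private destructor: releases private state, never dev itself. */
static void toy_priv_destructor(struct toy_netdev *dev)
{
	free(dev->priv_state);
	dev->priv_state = NULL;
}

/* Register fails after ndo_init() succeeded: undo the private state only;
 * the caller still owns dev and frees it. */
static int toy_register_fails(struct toy_netdev *dev)
{
	/* ndo_uninit() would run here */
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
	return -1;
}

/* End of unregister: private cleanup, then free only if requested. */
static void toy_unregister(struct toy_netdev *dev)
{
	if (dev->priv_destructor)
		dev->priv_destructor(dev);
	if (dev->needs_free_netdev)
		toy_free_netdev(dev);
}
```

On both paths the private state is released exactly once and the device itself is freed exactly once, which is the leak-free behaviour the commit aims for.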
  10. 22 Apr 2017: 1 commit
    • ip_tunnel: Allow policy-based routing through tunnels · 9830ad4c
      Craig Gallek committed
      This feature allows the administrator to set an fwmark for
      packets traversing a tunnel.  This allows the use of independent
      routing tables for tunneled packets without the use of iptables.
      
      There is no concept of per-packet routing decisions through IPv4
      tunnels, so this implementation does not need to work with
      per-packet route lookups as the v6 implementation may
      (with IP6_TNL_F_USE_ORIG_FWMARK).
      
      Further, since the v4 tunnel ioctls share data structures
      (which cannot be trivially modified) with the kernel's internal
      tunnel configuration structures, the mark attribute must be stored
      in the tunnel structure itself and passed as a parameter when
      creating or changing tunnel attributes.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 18 Nov 2016: 1 commit
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan committed
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      thus is an unsigned entity. Using a negative value is an out-of-bounds
      access by definition.
      
      2)
      On x86_64, unsigned 32-bit data that is mixed with pointers
      via array indexing, or via offsets added to or subtracted from pointers,
      is preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction, which isn't necessary if the variable is
      unsigned, because on x86_64 writes to 32-bit registers zero-extend by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      The patch snips ~1730 bytes off an allyesconfig kernel (without all the junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation, with the register
      allocator being used differently. gcc decides that some variable
      needs to live in the new r8+ registers, and every access now requires a REX
      prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
      used, which is longer than [r8].
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
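A user-space analogue of the net_generic() lookup after this change may help. The struct layout and the name toy_net_generic() are hypothetical simplifications, not the kernel's definitions; only the 1-based "id - 1" indexing and the unsigned id come from the commit. The range check is an addition of this sketch: with an unsigned id, 0 would wrap around to UINT_MAX in id - 1 rather than becoming -1, so it is rejected explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the per-netns generic pointer array. */
struct toy_net_generic {
	unsigned int len;   /* number of valid slots in ptr[] */
	void *ptr[4];
};

/* ids are 1-based, so slot id - 1 is used. Because id is unsigned, the
 * compiler can use it as an index without a sign extension on x86_64. */
static inline void *toy_net_generic(const struct toy_net_generic *ng,
				    unsigned int id)
{
	if (id == 0 || id > ng->len)   /* id 0 would wrap to UINT_MAX below */
		return NULL;
	return ng->ptr[id - 1];
}
```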
  12. 21 Oct 2016: 1 commit
    • ipv4/6: use core net MTU range checking · b96f9afe
      Jarod Wilson committed
      ipv4/ip_tunnel:
      - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen
      - preserve all ndo_change_mtu checks for now to prevent regressions
      
      ipv6/ip6_tunnel:
      - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len
      - preserve all ndo_change_mtu checks for now to prevent regressions
      
      ipv6/ip6_vti:
      - min_mtu = 1280, max_mtu = 65535
      - remove redundant vti6_change_mtu
      
      ipv6/sit:
      - min_mtu = 1280, max_mtu = 0xFFF8 - t_hlen
      - remove redundant ipip6_tunnel_change_mtu
      
      CC: netdev@vger.kernel.org
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
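The core range check this commit relies on can be sketched with the ip_tunnel bounds listed above (min_mtu = 68, max_mtu = 0xFFF8 - hard_header_len - t_hlen). The struct and helper names are hypothetical; in the kernel the bounds live in dev->min_mtu and dev->max_mtu and the generic code rejects out-of-range values before the driver's ndo_change_mtu() runs. The example assumes hard_header_len = 0 and t_hlen = 20 (a bare outer IPv4 header, as for plain IPIP), which gives max_mtu = 65528 - 20 = 65508.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the MTU fields of struct net_device. */
struct toy_dev {
	unsigned int mtu;
	unsigned int min_mtu;
	unsigned int max_mtu;
};

/* Bounds as listed for ipv4/ip_tunnel in the commit message. */
static void toy_ip_tunnel_bounds(struct toy_dev *dev,
				 unsigned int hard_header_len,
				 unsigned int t_hlen)
{
	dev->min_mtu = 68;
	dev->max_mtu = 0xFFF8 - hard_header_len - t_hlen;
}

/* One generic check instead of a copy in every driver's change_mtu hook. */
static int toy_set_mtu(struct toy_dev *dev, unsigned int new_mtu)
{
	if (new_mtu < dev->min_mtu || new_mtu > dev->max_mtu)
		return -EINVAL;
	dev->mtu = new_mtu;
	return 0;
}
```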
  13. 17 Sep 2016: 1 commit
  14. 16 Jun 2016: 1 commit
  15. 21 May 2016: 1 commit
  16. 30 Apr 2016: 1 commit
  17. 24 Feb 2016: 1 commit
    • tunnel: Clear IPCB(skb)->opt before dst_link_failure called · 5146d1f1
      Bernie Harris committed
      IPCB may contain data from previous layers (in the observed case, the
      qdisc layer). In the observed scenario, the data was misinterpreted as
      IP header options, which later caused the ihl to be set to an invalid
      value (<5). This resulted in an infinite loop in the MIPS implementation
      of ip_fast_csum.
      
      This patch clears IPCB(skb)->opt before dst_link_failure can be called for
      various types of tunnels. This change only applies to encapsulated IPv4
      packets.
      
      The code introduced in 11c21a30 which clears all of IPCB has been removed
      to be consistent with these changes, and instead the opt field is cleared
      unconditionally in ip_tunnel_xmit. The change in ip_tunnel_xmit applies to
      SIT, GRE, and IPIP tunnels.
      
      The relevant vti, l2tp, and pptp functions already contain similar code for
      clearing the IPCB.
      Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
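Why stale control-block bytes can turn into "IP options" is easiest to see with a toy model of the shared skb scratch space. The structs below are hypothetical simplifications of the skb control buffer and of IPCB(skb), not kernel types; a union is used so that the two views of the same bytes are legal C. The sketch clears only the opt field, mirroring what the commit describes for ip_tunnel_xmit.

```c
#include <assert.h>
#include <string.h>

struct toy_ip_options {
	unsigned char optlen;      /* non-zero would mean "IP options present" */
	unsigned char data[15];
};

struct toy_inet_skb_parm {
	struct toy_ip_options opt; /* plays the role of IPCB(skb)->opt */
	unsigned short flags;
};

struct toy_skb {
	union {
		unsigned char cb[48];            /* raw control buffer */
		struct toy_inet_skb_parm ipcb;   /* IPv4 view of the same bytes */
	} u;
};

/* An earlier layer (the commit observed a qdisc) leaves its own data in cb. */
static void toy_qdisc_use_cb(struct toy_skb *skb)
{
	memset(skb->u.cb, 0xAB, sizeof(skb->u.cb));
}

/* Clear only the options before anything may interpret them. */
static void toy_clear_ipcb_opt(struct toy_skb *skb)
{
	memset(&skb->u.ipcb.opt, 0, sizeof(skb->u.ipcb.opt));
}
```

Before the clear, the leftover 0xAB bytes read as a non-zero option length; afterwards the options are empty while the remaining IPCB fields are left untouched.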
  18. 17 Feb 2016: 1 commit
  19. 10 Feb 2016: 1 commit
    • vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices · 7e059158
      David Wragg committed
      Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
      transmit vxlan packets of any size, constrained only by the ability to
      send out the resulting packets.  4.3 introduced netdevs corresponding
      to tunnel vports.  These netdevs have an MTU, which limits the size of
      a packet that can be successfully encapsulated.  The default MTU
      values are low (1500 or less), which is awkwardly small in the context
      of physical networks supporting jumbo frames, and leads to a
      conspicuous change in behaviour for userspace.
      
      Instead, set the MTU on openvswitch-created netdevs to be the relevant
      maximum (i.e. the maximum IP packet size minus any relevant overhead),
      effectively restoring the behaviour prior to 4.3.
      Signed-off-by: David Wragg <david@weave.works>
      Signed-off-by: David S. Miller <davem@davemloft.net>
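The "maximum IP packet size minus any relevant overhead" rule can be worked through for VXLAN over IPv4. This is only an arithmetic sketch; the constants the patch itself uses are not shown in this listing. It assumes the 65535-byte IPv4 limit implied by the 16-bit Total Length field and the usual VXLAN overhead of an outer IPv4 header without options (20), UDP (8), VXLAN (8) and the inner Ethernet header (14), for 50 bytes in total.

```c
#include <assert.h>

enum {
	IPV4_MAX_PACKET = 65535,  /* largest value of the 16-bit Total Length field */
	OUTER_IPV4_HLEN = 20,     /* outer IPv4 header without options */
	OUTER_UDP_HLEN  = 8,
	VXLAN_HLEN      = 8,
	INNER_ETH_HLEN  = 14,     /* VXLAN carries a whole Ethernet frame */
};

/* Largest inner MTU whose encapsulated packet still fits in one IPv4 packet. */
static int toy_vxlan_max_mtu(void)
{
	return IPV4_MAX_PACKET -
	       (OUTER_IPV4_HLEN + OUTER_UDP_HLEN + VXLAN_HLEN + INNER_ETH_HLEN);
}
```

Under these assumptions the result is 65535 - 50 = 65485 bytes, far above the 1500-byte default the commit calls awkwardly small.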
  20. 26 Dec 2015: 1 commit
  21. 01 Dec 2015: 1 commit
  22. 11 Aug 2015: 1 commit
  23. 09 Jul 2015: 1 commit
  24. 08 Apr 2015: 1 commit
  25. 04 Apr 2015: 2 commits
  26. 03 Apr 2015: 1 commit
  27. 20 Jan 2015: 1 commit
  28. 17 Dec 2014: 2 commits
  29. 13 Nov 2014: 1 commit
  30. 06 Nov 2014: 1 commit
    • net: Move fou_build_header into fou.c and refactor · 63487bab
      Tom Herbert committed
      Move fou_build_header out of ip_tunnel.c and into fou.c, splitting
      it up into fou_build_header, gue_build_header, and fou_build_udp.
      This allows other users to use the FOU or GUE TX path. Change ip_tunnel_encap
      to call fou_build_header or gue_build_header based on the tunnel
      encapsulation type. Similarly, add fou_encap_hlen and gue_encap_hlen
      functions, which are called by ip_encap_hlen. The new net/fou.h has
      prototypes and defines for this.
      
      Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
      can use FOU/GUE and fou module is also selected.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
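The dispatch described in this commit (ip_encap_hlen calling fou_encap_hlen or gue_encap_hlen depending on the encapsulation type) can be sketched as a switch. All toy_ names are hypothetical, and the sizes are assumptions: an 8-byte UDP header for FOU, and UDP plus a 4-byte base GUE header with no optional GUE fields for GUE.

```c
#include <assert.h>

/* Hypothetical mirror of the tunnel encapsulation types. */
enum toy_encap_type { TOY_ENCAP_NONE, TOY_ENCAP_FOU, TOY_ENCAP_GUE };

#define TOY_UDP_HLEN      8  /* UDP header */
#define TOY_GUE_BASE_HLEN 4  /* base GUE header, assuming no optional fields */

static int toy_fou_encap_hlen(void) { return TOY_UDP_HLEN; }
static int toy_gue_encap_hlen(void) { return TOY_UDP_HLEN + TOY_GUE_BASE_HLEN; }

/* Pick the per-type helper based on the tunnel's encapsulation type. */
static int toy_ip_encap_hlen(enum toy_encap_type type)
{
	switch (type) {
	case TOY_ENCAP_NONE:
		return 0;
	case TOY_ENCAP_FOU:
		return toy_fou_encap_hlen();
	case TOY_ENCAP_GUE:
		return toy_gue_encap_hlen();
	}
	return -1;  /* unknown encapsulation type */
}
```

Keeping the per-type helpers next to their header builders in one place is what lets other transmit paths reuse them, which is the motivation the commit gives for the move.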
  31. 04 Oct 2014: 2 commits
  32. 26 Sep 2014: 1 commit
  33. 23 Sep 2014: 1 commit
  34. 20 Sep 2014: 1 commit
  35. 31 Jul 2014: 1 commit
  36. 16 Jul 2014: 1 commit
    • net: set name_assign_type in alloc_netdev() · c835a677
      Tom Gundersen committed
      Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
      all users to pass NET_NAME_UNKNOWN.
      
      Coccinelle patch:
      
      @@
      expression sizeof_priv, name, setup, txqs, rxqs, count;
      @@
      
      (
      -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
      +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
      |
      -alloc_netdev_mq(sizeof_priv, name, setup, count)
      +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
      |
      -alloc_netdev(sizeof_priv, name, setup)
      +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
      )
      
      v9: move comments here from the wrong commit
      Signed-off-by: Tom Gundersen <teg@jklm.no>
      Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  37. 09 Jul 2014: 1 commit
    • ip_tunnel: fix ip_tunnel_lookup · e0056593
      Dmitry Popov committed
      This patch fixes 3 similar bugs where incoming packets might be routed into
      wrong non-wildcard tunnels:
      
      1) Consider the following setup:
          ip address add 1.1.1.1/24 dev eth0
          ip address add 1.1.1.2/24 dev eth0
          ip tunnel add ipip1 remote 2.2.2.2 local 1.1.1.1 mode ipip dev eth0
          ip link set ipip1 up
      
      Incoming ipip packets from 2.2.2.2 were routed into ipip1 even if they had dst =
      1.1.1.2. Moreover, even if there was a wildcard tunnel like
         ip tunnel add ipip0 remote 2.2.2.2 local any mode ipip dev eth0
      that was created before the explicit one (with local 1.1.1.1), incoming ipip
      packets with src = 2.2.2.2 and dst = 1.1.1.2 were still routed into ipip1.
      
      The same issue existed with all tunnels that use ip_tunnel_lookup (gre, vti).
      
      2)  ip address add 1.1.1.1/24 dev eth0
          ip tunnel add ipip1 remote 2.2.146.85 local 1.1.1.1 mode ipip dev eth0
          ip link set ipip1 up
      
      Incoming ipip packets with dst = 1.1.1.1 were routed into ipip1, no matter what
      the src address was. Any remote IP address with ip_tunnel_hash = 0 triggered this
      issue; 2.2.146.85 is just one example, and there are more than 4 million of them.
      And again, a wildcard tunnel like
         ip tunnel add ipip0 remote any local 1.1.1.1 mode ipip dev eth0
      would never be matched if it was created before an explicit tunnel like the one above.
      
      GRE and vti tunnels had the same issue.
      
      3)  ip address add 1.1.1.1/24 dev eth0
          ip tunnel add gre1 remote 2.2.146.84 local 1.1.1.1 key 1 mode gre dev eth0
          ip link set gre1 up
      
      Any incoming gre packet with key = 1 was routed into gre1, no matter what its
      src/dst addresses were. Any remote IP address with ip_tunnel_hash = 0 triggered
      the issue; 2.2.146.84 is just one example, and there are more than 4 million of them.
      A wildcard tunnel like
         ip tunnel add gre2 remote any local any key 1 mode gre dev eth0
      would never be matched if it was created before an explicit tunnel like the one above.
      
      All this happened because, while looking for a wildcard tunnel, we didn't
      check that the matched tunnel actually is a wildcard one. Fixed.
      Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
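The missing wildcard check can be reproduced with a toy lookup. This is a user-space sketch, not ip_tunnel_lookup() itself. The hash (remote % 4), the fixed-size table and all toy_ names are hypothetical; like the real hash, this one sends remote 0 to a bucket that some real remote addresses also land in. Tunnels are scanned newest first, as with hlist_add_head(). With fixed == 0 the two wildcard passes trust the candidate without checking it; with fixed == 1 they also require local == 0 or remote == 0 respectively, which is the check the commit adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP4(a, b, c, d) ((uint32_t)(a) << 24 | (uint32_t)(b) << 16 | \
			 (uint32_t)(c) << 8 | (uint32_t)(d))
#define TOY_BUCKETS 4
#define TOY_MAX 8

struct toy_tunnel {
	uint32_t local;   /* 0 means "local any" */
	uint32_t remote;  /* 0 means "remote any" */
};

static struct toy_tunnel tunnels[TOY_MAX];
static int ntunnels;

/* Hypothetical hash: remote 0 shares bucket 0 with some real addresses. */
static unsigned int toy_hash(uint32_t remote)
{
	return remote % TOY_BUCKETS;
}

static void toy_add(uint32_t local, uint32_t remote)
{
	tunnels[ntunnels].local = local;
	tunnels[ntunnels].remote = remote;
	ntunnels++;
}

/* src/dst are the outer addresses of the incoming packet. */
static const struct toy_tunnel *toy_lookup(uint32_t src, uint32_t dst, int fixed)
{
	int i;

	/* Pass 1: exact local/remote match. */
	for (i = ntunnels - 1; i >= 0; i--)
		if (tunnels[i].remote == src && tunnels[i].local == dst)
			return &tunnels[i];

	/* Pass 2: remote matches; local should be "any". */
	for (i = ntunnels - 1; i >= 0; i--) {
		if (tunnels[i].remote != src)
			continue;
		if (fixed && tunnels[i].local != 0)
			continue;
		return &tunnels[i];
	}

	/* Pass 3: bucket of remote 0; local matches, remote should be "any". */
	for (i = ntunnels - 1; i >= 0; i--) {
		if (toy_hash(tunnels[i].remote) != toy_hash(0) ||
		    tunnels[i].local != dst)
			continue;
		if (fixed && tunnels[i].remote != 0)
			continue;
		return &tunnels[i];
	}
	return NULL;
}
```

Scenario 1 maps to a wildcard-local tunnel added before an explicit one with the same remote: the unfixed lookup returns the explicit tunnel for dst 1.1.1.2, and the fixed one returns the wildcard. Scenario 2 maps to an explicit remote such as 2.2.2.4, whose toy hash is 0: the unfixed third pass picks it for any source, and the fixed pass skips it.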