1. 07 Jun, 2018 1 commit
  2. 01 May, 2018 1 commit
  3. 27 Apr, 2018 1 commit
  4. 06 Apr, 2018 1 commit
  5. 28 Mar, 2018 1 commit
  6. 19 Mar, 2018 3 commits
    • vti6: Fix dev->max_mtu setting · f8a554b4
      Committed by Stefano Brivio
      We shouldn't allow a tunnel to have IP_MAX_MTU as MTU, because
      another IPv6 header is going on top of our packets. Without this
      patch, we might end up building packets bigger than IP_MAX_MTU.
      
      Fixes: b96f9afe ("ipv4/6: use core net MTU range checking")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • vti6: Keep set MTU on link creation or change, validate it · 7a67e69a
      Committed by Stefano Brivio
      In vti6_link_config(), if MTU is already given on link creation
      or change, validate and use it instead of recomputing it. To do
      that, we need to propagate the knowledge that MTU was set by
      userspace all the way down to vti6_link_config().
      
      To keep this simple, vti6_dev_init() sets the new 'keep_mtu'
      argument of vti6_link_config() to true: on initialization, we
      don't have convenient access to netlink attributes there, but we
      will anyway check whether dev->mtu is set in vti6_link_config().
      If it's non-zero, it was set to the value of the IFLA_MTU
      attribute during creation. Otherwise, determine a reasonable
      value.
      
      Fixes: ed1efb2a ("ipv6: Add support for IPsec virtual tunnel interfaces")
      Fixes: 53c81e95 ("ip6_vti: adjust vti mtu according to mtu of lower device")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • vti6: Properly adjust vti6 MTU from MTU of lower device · c6741fbe
      Committed by Stefano Brivio
      If a lower device is found, we don't need to subtract
      LL_MAX_HEADER to calculate our MTU: just use its MTU, the link
      layer headers are already taken into account by it.
      
      If the lower device is not found, start from ETH_DATA_LEN
      instead, and only in this case subtract a worst-case
      LL_MAX_HEADER.
      
      We then need to subtract our additional IPv6 header from the
      calculation.
      
      While at it, note that vti6 doesn't have a hardware header, so
      it doesn't need to set dev->hard_header_len. And as
      vti6_link_config() now always sets the MTU, there's no need to
      set a default value in vti6_dev_setup().
      
      This makes the behaviour consistent with IPv4 vti, after
      commit a3245236 ("vti4: Don't count header length twice."),
      which was accidentally reverted by merge commit f895f0cf
      ("Merge branch 'master' of
      git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec").
      
      While commit 53c81e95 ("ip6_vti: adjust vti mtu according to
      mtu of lower device") improved on the original situation, this
      was still not ideal. As reported in that commit message itself,
      if we start from an underlying veth MTU of 9000, we end up with
      an MTU of 8832, that is, 9000 - LL_MAX_HEADER - sizeof(ipv6hdr).
      This should simply be 8960, or 9000 - sizeof(ipv6hdr) instead:
      we found the lower device (veth) and we know we don't have any
      additional link layer header, so there's no need to subtract a
      hypothetical worst-case number.
      
      Fixes: 53c81e95 ("ip6_vti: adjust vti mtu according to mtu of lower device")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  7. 05 Mar, 2018 1 commit
  8. 28 Feb, 2018 1 commit
  9. 26 Jan, 2018 1 commit
  10. 21 Dec, 2017 1 commit
    • ip6_vti: adjust vti mtu according to mtu of lower device · 53c81e95
      Committed by Alexey Kodanev
      LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams over
      ip6_vti that require fragmentation and the underlying device has an
      MTU smaller than 1500 plus some extra space for headers. This happens
      because ip6_vti, by default, sets the MTU to ETH_DATA_LEN and does not
      update it depending on the destination address or link parameters.
      Further attempts to send UDP packets may succeed because the pmtu gets
      updated on ICMPV6_PKT_TOOBIG in vti6_err().
      
      If the lower device has a larger MTU, e.g. 9000, ip6_vti works but does
      not use the maximum possible size: output packets are limited to 1500.
      
      The above cases require manual MTU setup after ip6_vti creation. However,
      ip_vti already updates the MTU based on the lower device with
      ip_tunnel_bind_dev().
      
      Here is an example where the lower device MTU is set to 9000:
      
        # ip a sh ltp_ns_veth2
            ltp_ns_veth2@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 ...
              inet 10.0.0.2/24 scope global ltp_ns_veth2
              inet6 fd00::2/64 scope global
      
        # ip li add vti6 type vti6 local fd00::2 remote fd00::1
        # ip li show vti6
            vti6@NONE: <POINTOPOINT,NOARP> mtu 1500 ...
              link/tunnel6 fd00::2 peer fd00::1
      
      After the patch:
        # ip li add vti6 type vti6 local fd00::2 remote fd00::1
        # ip li show vti6
            vti6@NONE: <POINTOPOINT,NOARP> mtu 8832 ...
              link/tunnel6 fd00::2 peer fd00::1
      Reported-by: Petr Vorel <pvorel@suse.cz>
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 27 Sep, 2017 1 commit
    • vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit · 36f6ee22
      Committed by Alexey Kodanev
      When running LTP IPsec tests, KASan might report:
      
      BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
      Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0
      ...
      Call Trace:
        <IRQ>
        dump_stack+0x63/0x89
        print_address_description+0x7c/0x290
        kasan_report+0x28d/0x370
        ? vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
        __asan_report_load4_noabort+0x19/0x20
        vti_tunnel_xmit+0xeee/0xff0 [ip_vti]
        ? vti_init_net+0x190/0x190 [ip_vti]
        ? save_stack_trace+0x1b/0x20
        ? save_stack+0x46/0xd0
        dev_hard_start_xmit+0x147/0x510
        ? icmp_echo.part.24+0x1f0/0x210
        __dev_queue_xmit+0x1394/0x1c60
      ...
      Freed by task 0:
        save_stack_trace+0x1b/0x20
        save_stack+0x46/0xd0
        kasan_slab_free+0x70/0xc0
        kmem_cache_free+0x81/0x1e0
        kfree_skbmem+0xb1/0xe0
        kfree_skb+0x75/0x170
        kfree_skb_list+0x3e/0x60
        __dev_queue_xmit+0x1298/0x1c60
        dev_queue_xmit+0x10/0x20
        neigh_resolve_output+0x3a8/0x740
        ip_finish_output2+0x5c0/0xe70
        ip_finish_output+0x4ba/0x680
        ip_output+0x1c1/0x3a0
        xfrm_output_resume+0xc65/0x13d0
        xfrm_output+0x1e4/0x380
        xfrm4_output_finish+0x5c/0x70
      
      This can be fixed by reading skb->len before calling dst_output(),
      since the skb may already have been freed by the time it returns.
      
      Fixes: b9959fd3 ("vti: switch to new ip tunnel code")
      Fixes: 22e1b23d ("vti6: Support inter address family tunneling.")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 20 Sep, 2017 1 commit
    • ipv6: speedup ipv6 tunnels dismantle · bb401cae
      Committed by Eric Dumazet
      Implement exit_batch() method to dismantle more devices
      per round.
      
      (rtnl_lock() ...
       unregister_netdevice_many() ...
       rtnl_unlock())
      
      Tested:
      $ cat add_del_unshare.sh
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
      done
      wait ; grep net_namespace /proc/slabinfo
      
      Before patch :
      $ time ./add_del_unshare.sh
      net_namespace        110    267   5504    1    2 : tunables    8    4    0 : slabdata    110    267      0
      
      real    3m25.292s
      user    0m0.644s
      sys     0m40.153s
      
      After patch:
      
      $ time ./add_del_unshare.sh
      net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0
      
      real	1m38.965s
      user	0m0.688s
      sys	0m37.017s
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 19 Jul, 2017 1 commit
  14. 27 Jun, 2017 3 commits
  15. 08 Jun, 2017 1 commit
    • net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      Committed by David S. Miller
      Network devices can allocate resources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally also does a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call netdev_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      netdev_ops->ndo_init() succeeds, by invoking both netdev_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 22 Apr, 2017 1 commit
  17. 25 Feb, 2017 1 commit
  18. 16 Feb, 2017 1 commit
  19. 27 Jan, 2017 1 commit
  20. 07 Jan, 2017 1 commit
  21. 18 Nov, 2016 1 commit
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Committed by Alexey Dobriyan
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and is
      thus an unsigned entity. Using a negative value is an out-of-bounds
      access by definition.
      
      2)
      On x86_64, unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added to or subtracted from pointers
      are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the variable is
      unsigned, because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately, some functions actually grow bigger.
      This is a seemingly random artefact of code generation, with the register
      allocator being used differently. gcc decides that some variable
      needs to live in the new r8+ registers and every access now requires a REX
      prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
      used, which is longer than [r8].
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 05 Nov, 2016 1 commit
    • net: inet: Support UID-based routing in IP protocols. · e2d118a1
      Committed by Lorenzo Colitti
      - Use the UID in routing lookups made by protocol connect() and
        sendmsg() functions.
      - Make sure that routing lookups triggered by incoming packets
        (e.g., Path MTU discovery) take the UID of the socket into
        account.
      - For packets not associated with a userspace socket (e.g., ping
        replies), use UID 0 inside the user namespace corresponding to
        the network namespace the socket belongs to. This allows
        all namespaces to apply routing and iptables rules to
        kernel-originated traffic in that namespace by matching UID 0.
        This is better than using the UID of the kernel socket that is
        sending the traffic, because the UID of kernel sockets created
        at namespace creation time (e.g., the per-processor ICMP and
        TCP sockets) is the UID of the user that created the socket,
        which might not be mapped in the namespace.
      
      Tested: compiles allnoconfig, allyesconfig, allmodconfig
      Tested: https://android-review.googlesource.com/253302
      Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 21 Oct, 2016 1 commit
    • ipv4/6: use core net MTU range checking · b96f9afe
      Committed by Jarod Wilson
      ipv4/ip_tunnel:
      - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen
      - preserve all ndo_change_mtu checks for now to prevent regressions
      
      ipv6/ip6_tunnel:
      - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len
      - preserve all ndo_change_mtu checks for now to prevent regressions
      
      ipv6/ip6_vti:
      - min_mtu = 1280, max_mtu = 65535
      - remove redundant vti6_change_mtu
      
      ipv6/sit:
      - min_mtu = 1280, max_mtu = 0xFFF8 - t_hlen
      - remove redundant ipip6_tunnel_change_mtu
      
      CC: netdev@vger.kernel.org
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 11 Oct, 2016 1 commit
    • vti6: flush x-netns xfrm cache when vti interface is removed · 7f92083e
      Committed by Nicolas Dichtel
      This is the same fix as commit a5d0dc81 ("vti: flush x-netns xfrm
      cache when vti interface is removed").
      
      This patch fixes a refcnt problem when a x-netns vti6 interface is removed:
      unregister_netdevice: waiting for vti6_test to become free. Usage count = 1
      
      Here is a script to reproduce the problem:
      
      ip link set dev ntfp2 up
      ip addr add dev ntfp2 2001::1/64
      ip link add vti6_test type vti6 local 2001::1 remote 2001::2 key 1
      ip netns add secure
      ip link set vti6_test netns secure
      ip netns exec secure ip link set vti6_test up
      ip netns exec secure ip link s lo up
      ip netns exec secure ip addr add dev vti6_test 2003::1/64
      ip -6 xfrm policy add dir out tmpl src 2001::1 dst 2001::2 proto esp \
      	   mode tunnel mark 1
      ip -6 xfrm policy add dir in tmpl src 2001::2 dst 2001::1 proto esp \
      	   mode tunnel mark 1
      ip xfrm state add src 2001::1 dst 2001::2 proto esp spi 1 mode tunnel \
      	   enc des3_ede 0x112233445566778811223344556677881122334455667788 mark 1
      ip xfrm state add src 2001::2 dst 2001::1 proto esp spi 1 mode tunnel \
      	   enc des3_ede 0x112233445566778811223344556677881122334455667788 mark 1
      ip netns exec secure  ping6 -c 4 2003::2
      ip netns del secure
      
      CC: Lance Richardson <lrichard@redhat.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Lance Richardson <lrichard@redhat.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  25. 21 Sep, 2016 1 commit
    • vti6: fix input path · 63c43787
      Committed by Nicolas Dichtel
      Since commit 1625f452, vti6 is broken, all input packets are dropped
      (LINUX_MIB_XFRMINNOSTATES is incremented).
      
      XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
      xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set that value to NULL in
      xfrm6_rcv_spi().
      
      A new function, xfrm6_rcv_tnl(), which allows passing a value to
      xfrm6_rcv_spi(), is added so that xfrm6_rcv() is not touched (this function
      is used in several handlers).
      
      CC: Alexey Kodanev <alexey.kodanev@oracle.com>
      Fixes: 1625f452 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  26. 09 Sep, 2016 1 commit
  27. 11 Aug, 2016 1 commit
  28. 17 Feb, 2016 1 commit
  29. 08 Oct, 2015 1 commit
  30. 18 Sep, 2015 1 commit
  31. 02 Jun, 2015 1 commit
  32. 28 May, 2015 2 commits
  33. 07 Apr, 2015 1 commit
  34. 03 Apr, 2015 1 commit
  35. 01 Apr, 2015 1 commit