1. 03 5月, 2016 7 次提交
  2. 28 4月, 2016 9 次提交
  3. 26 4月, 2016 8 次提交
    • D
      net: ipv6: Delete host routes on an ifdown · 38bd10c4
      David Ahern 提交于
      It was a simple idea -- save IPv6 configured addresses on a link down
      so that IPv6 behaves similar to IPv4. As always the devil is in the
      details and the IPv6 stack as too many behavioral differences from IPv4
      making the simple idea more complicated than it needs to be.
      
      The current implementation for keeping IPv6 addresses can panic or spit
      out a warning in one of many paths:
      
      1. IPv6 route gets an IPv4 route as its 'next' which causes a panic in
         rt6_fill_node while handling a route dump request.
      
      2. rt->dst.obsolete is set to DST_OBSOLETE_DEAD hitting the WARN_ON in
         fib6_del
      
      3. Panic in fib6_purge_rt because rt6i_ref count is not 1.
      
      The root cause of all these is references related to the host route for
      an address that is retained.
      
      So, this patch deletes the host route every time the ifdown loop runs.
      Since the host route is deleted and will be re-generated an up there is
      no longer a need for the l3mdev fix up. On the 'admin up' side move
      addrconf_permanent_addr into the NETDEV_UP event handling so that it
      runs only once versus on UP and CHANGE events.
      
      All of the current panics and warnings appear to be related to
      addresses on the loopback device, but given the catastrophic nature when
      a bug is triggered this patch takes the conservative approach and evicts
      all host routes rather than trying to determine when it can be re-used
      and when it can not. That can be a later optimizaton if desired.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38bd10c4
    • D
      Revert "ipv6: Revert optional address flusing on ifdown." · 6a923934
      David S. Miller 提交于
      This reverts commit 841645b5.
      
      Ok, this puts the feature back.  I've decided to apply David A.'s
      bug fix and run with that rather than make everyone wait another
      whole release for this feature.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a923934
    • T
      ila: add checksum neutral ILA translations · 90bfe662
      Tom Herbert 提交于
      Support checksum neutral ILA as described in the ILA draft. The low
      order 16 bits of the identifier are used to contain the checksum
      adjustment value.
      
      The csum-mode parameter is added to described checksum processing. There
      are three values:
       - adjust transport checksum (previous behavior)
       - do checksum neutral mapping
       - do nothing
      
      On output the csum-mode in the ila_params is checked and acted on. If
      mode is checksum neutral mapping then to mapping and set C-bit.
      
      On input, C-bit is checked. If it is set checksum-netural mapping is
      done (regardless of csum-mode in ila params) and C-bit will be cleared.
      If it is not set then action in csum-mode is taken.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90bfe662
    • T
      ila: xlat changes · 642c2c95
      Tom Herbert 提交于
      Change model of xlat to be used only for input where lookup is done on
      the locator part of an address (comparing to locator_match as key
      in rhashtable). This is needed for checksum neutral translation
      which obfuscates the low order 16 bits of the identifier. It also
      permits hosts to be in muliple ILA domains (each locator can map
      to a different SIR address). A check is also added to disallow
      translating non-ILA addresses (check of type in identifier).
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      642c2c95
    • T
      ila: Add struct definitions and helpers · 351596aa
      Tom Herbert 提交于
      Add structures for identifiers, locators, and an ila address which
      is composed of a locator and identifier and in6_addr can be cast to
      it. This includes a three bit type field and enums for the types defined
      in ILA I-D.
      
      In ILA lwt don't allow user to set a translation for a non-ILA
      address (type of identifier is zero meaning it is an IID). This also
      requires that the destination prefix is at least 65 bytes (64
      bit locator and first byte of identifier).
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      351596aa
    • E
      net: better drop monitoring in ip{6}_recv_error() · 960a2628
      Eric Dumazet 提交于
      We should call consume_skb(skb) when skb is properly consumed,
      or kfree_skb(skb) when skb must be dropped in error case.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      960a2628
    • D
      ipv6: Revert optional address flusing on ifdown. · 841645b5
      David S. Miller 提交于
      This reverts the following three commits:
      
      70af921d
      799977d9
      f1705ec1
      
      The feature was ill conceived, has terrible semantics, and has added
      nothing but regressions to the already fragile ipv6 stack.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      841645b5
    • N
      ipv6: use nla_put_u64_64bit() · f13a82d8
      Nicolas Dichtel 提交于
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f13a82d8
  4. 22 4月, 2016 1 次提交
  5. 20 4月, 2016 2 次提交
  6. 17 4月, 2016 4 次提交
  7. 16 4月, 2016 1 次提交
    • E
      tcp: do not mess with listener sk_wmem_alloc · b3d05147
      Eric Dumazet 提交于
      When removing sk_refcnt manipulation on synflood, I missed that
      using skb_set_owner_w() was racy, if sk->sk_wmem_alloc had already
      transitioned to 0.
      
      We should hold sk_refcnt instead, but this is a big deal under attack.
      (Doing so increase performance from 3.2 Mpps to 3.8 Mpps only)
      
      In this patch, I chose to not attach a socket to syncookies skb.
      
      Performance is now 5 Mpps instead of 3.2 Mpps.
      
      Following patch will remove last known false sharing in
      tcp_rcv_state_process()
      
      Fixes: 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3d05147
  8. 15 4月, 2016 7 次提交
    • M
      ipv6: udp: Do a route lookup and update during release_cb · e646b657
      Martin KaFai Lau 提交于
      This patch adds a release_cb for UDPv6.  It does a route lookup
      and updates sk->sk_dst_cache if it is needed.  It picks up the
      left-over job from ip6_sk_update_pmtu() if the sk was owned
      by user during the pmtu update.
      
      It takes a rcu_read_lock to protect the __sk_dst_get() operations
      because another thread may do ip6_dst_store() without taking the
      sk lock (e.g. sendmsg).
      
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reported-by: NWei Wang <weiwan@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e646b657
    • M
      ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update · 33c162a9
      Martin KaFai Lau 提交于
      There is a case in connected UDP socket such that
      getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
      sequence could be the following:
      1. Create a connected UDP socket
      2. Send some datagrams out
      3. Receive a ICMPV6_PKT_TOOBIG
      4. No new outgoing datagrams to trigger the sk_dst_check()
         logic to update the sk->sk_dst_cache.
      5. getsockopt(IPV6_MTU) returns the mtu from the invalid
         sk->sk_dst_cache instead of the newly created RTF_CACHE clone.
      
      This patch updates the sk->sk_dst_cache for a connected datagram sk
      during pmtu-update code path.
      
      Note that the sk->sk_v6_daddr is used to do the route lookup
      instead of skb->data (i.e. iph).  It is because a UDP socket can become
      connected after sending out some datagrams in un-connected state.  or
      It can be connected multiple times to different destinations.  Hence,
      iph may not be related to where sk is currently connected to.
      
      It is done under '!sock_owned_by_user(sk)' condition because
      the user may make another ip6_datagram_connect()  (i.e changing
      the sk->sk_v6_daddr) while dst lookup is happening in the pmtu-update
      code path.
      
      For the sock_owned_by_user(sk) == true case, the next patch will
      introduce a release_cb() which will update the sk->sk_dst_cache.
      
      Test:
      
      Server (Connected UDP Socket):
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Route Details:
      [root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
      2fac::/64 dev eth0  proto kernel  metric 256  pref medium
      2fac:face::/64 via 2fac::face dev eth0  metric 1024  pref medium
      
      A simple python code to create a connected UDP socket:
      
      import socket
      import errno
      
      HOST = '2fac::1'
      PORT = 8080
      
      s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
      s.bind((HOST, PORT))
      s.connect(('2fac:face::face', 53))
      print("connected")
      while True:
          try:
      	data = s.recv(1024)
          except socket.error as se:
      	if se.errno == errno.EMSGSIZE:
      		pmtu = s.getsockopt(41, 24)
      		print("PMTU:%d" % pmtu)
      		break
      s.close()
      
      Python program output after getting a ICMPV6_PKT_TOOBIG:
      [root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
      connected
      PMTU:1300
      
      Cache routes after recieving TOOBIG:
      [root@arch-fb-vm1 ~]# ip -6 r show table cache
      2fac:face::face via 2fac::face dev eth0  metric 0
          cache  expires 463sec mtu 1300 pref medium
      
      Client (Send the ICMPV6_PKT_TOOBIG):
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      scapy is used to generate the TOOBIG message.  Here is the scapy script I have
      used:
      
      >>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::
      1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
      >>> sendp(p, iface='qemubr0')
      
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reported-by: NWei Wang <weiwan@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33c162a9
    • M
      ipv6: datagram: Refactor dst lookup and update codes to a new function · 7e2040db
      Martin KaFai Lau 提交于
      This patch moves the route lookup and update codes for connected
      datagram sk to a newly created function ip6_datagram_dst_update()
      
      It will be reused during the pmtu update in the later patch.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e2040db
    • M
      ipv6: datagram: Refactor flowi6 init codes to a new function · 80fbdb20
      Martin KaFai Lau 提交于
      Move flowi6 init codes for connected datagram sk to a newly created
      function ip6_datagram_flow_key_init().
      
      Notes:
      1. fl6_flowlabel is used instead of fl6.flowlabel in __ip6_datagram_connect
      2. ipv6_addr_is_multicast(&fl6->daddr) is used instead of
         (addr_type & IPV6_ADDR_MULTICAST) in ip6_datagram_flow_key_init()
      
      This new function will be reused during pmtu update in the later patch.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      80fbdb20
    • A
      GSO: Support partial segmentation offload · 802ab55a
      Alexander Duyck 提交于
      This patch adds support for something I am referring to as GSO partial.
      The basic idea is that we can support a broader range of devices for
      segmentation if we use fixed outer headers and have the hardware only
      really deal with segmenting the inner header.  The idea behind the naming
      is due to the fact that everything before csum_start will be fixed headers,
      and everything after will be the region that is handled by hardware.
      
      With the current implementation it allows us to add support for the
      following GSO types with an inner TSO_MANGLEID or TSO6 offload:
      NETIF_F_GSO_GRE
      NETIF_F_GSO_GRE_CSUM
      NETIF_F_GSO_IPIP
      NETIF_F_GSO_SIT
      NETIF_F_UDP_TUNNEL
      NETIF_F_UDP_TUNNEL_CSUM
      
      In the case of hardware that already supports tunneling we may be able to
      extend this further to support TSO_TCPV4 without TSO_MANGLEID if the
      hardware can support updating inner IPv4 headers.
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      802ab55a
    • A
      GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values · 1530545e
      Alexander Duyck 提交于
      This patch does two things.
      
      First it allows TCP to aggregate TCP frames with a fixed IPv4 ID field.  As
      a result we should now be able to aggregate flows that were converted from
      IPv6 to IPv4.  In addition this allows us more flexibility for future
      implementations of segmentation as we may be able to use a fixed IP ID when
      segmenting the flow.
      
      The second thing this does is that it places limitations on the outer IPv4
      ID header in the case of tunneled frames.  Specifically it forces the IP ID
      to be incrementing by 1 unless the DF bit is set in the outer IPv4 header.
      This way we can avoid creating overlapping series of IP IDs that could
      possibly be fragmented if the frame goes through GRO and is then
      resegmented via GSO.
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1530545e
    • A
      GSO: Add GSO type for fixed IPv4 ID · cbc53e08
      Alexander Duyck 提交于
      This patch adds support for TSO using IPv4 headers with a fixed IP ID
      field.  This is meant to allow us to do a lossless GRO in the case of TCP
      flows that use a fixed IP ID such as those that convert IPv6 header to IPv4
      headers.
      
      In addition I am adding a feature that for now I am referring to TSO with
      IP ID mangling.  Basically when this flag is enabled the device has the
      option to either output the flow with incrementing IP IDs or with a fixed
      IP ID regardless of what the original IP ID ordering was.  This is useful
      in cases where the DF bit is set and we do not care if the original IP ID
      value is maintained.
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cbc53e08
  9. 14 4月, 2016 1 次提交