1. 29 Aug 2015, 1 commit
  2. 09 Sep 2014, 1 commit
    • inet: remove dead inetpeer sequence code · a7f26b7e
      Authored by Willem de Bruijn
      inetpeer sequence numbers are no longer incremented, so there is no
      need to check and flush the tree. The function that increments the
      sequence number was already dead code and was removed in "ipv4:
      remove unused function" (068a6e18). Remove the code that checks for
      a change, too.
      
      Verifying that v4_seq and v6_seq are never incremented and thus that
      flush_check compares bp->flush_seq to 0 is trivial.
      
      The second part of the change removes flush_check completely, even
      though bp->flush_seq is non-zero exactly once, at initialization.
      This change is correct because the only time this branch is taken
      is when bp->root == peer_avl_empty_rcu, in which case both the
      branch and inetpeer_invalidate_tree are a no-op.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7f26b7e
  3. 03 Jun 2014, 1 commit
    • inetpeer: get rid of ip_id_count · 73f156a6
      Authored by Eric Dumazet
      Ideally, we would need to generate IP ID using a per destination IP
      generator.
      
      Linux kernels used the inet_peer cache for this purpose, but this
      had a huge cost on servers that disable MTU discovery:
      
      1) each inet_peer struct consumes 192 bytes
      
      2) inetpeer cache uses a binary tree of inet_peer structs,
         with a nominal size of ~66000 elements under load.
      
      3) lookups in this tree are hitting a lot of cache lines, as tree depth
         is about 20.
      
      4) If the server handles many TCP flows, there is a high probability
         of not finding the inet_peer, allocating a fresh one, and
         inserting it in the tree with the same initial ip_id_count
         (cf. secure_ip_id()).
      
      5) We garbage collect inet_peer aggressively.
      
      IP ID generation does not have to be 'perfect'.
      
      The goal is to avoid duplicates within a short period of time, so
      that reassembly units have a chance to complete reassembly of the
      fragments belonging to one message before receiving other fragments
      with a recycled ID.
      
      We simply use an array of generators and a Jenkins hash of the
      destination IP as the key.
      
      ipv6_select_ident() is put back into net/ipv6/ip6_output.c, where it
      belongs (it is only used from that file).
      
      secure_ip_id() and secure_ipv6_id() are no longer needed.
      
      Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
      unnecessary decrement/increment of the number of segments.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73f156a6
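The scheme this commit describes — an array of ID generators indexed by a hash of the destination IP, advanced by the number of segments — can be sketched in a few lines of userspace C. This is a minimal toy, not the kernel's code: the bucket count, `toy_hash()` (a stand-in for jhash with a random secret), and all names are illustrative assumptions.

```c
#include <stdint.h>

#define NR_ID_BUCKETS 2048   /* hypothetical generator-array size */

static uint16_t id_bucket[NR_ID_BUCKETS];

/* Toy stand-in for the Jenkins hash keyed on the destination IP
 * plus a boot-time secret. */
static uint32_t toy_hash(uint32_t daddr, uint32_t secret)
{
    uint32_t h = daddr ^ secret;
    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h;
}

/* Pick a generator by destination address and advance it by the
 * number of segments, so every segment gets a distinct IP ID. */
static uint16_t select_ident_segs(uint32_t daddr, uint32_t secret, int segs)
{
    uint32_t b = toy_hash(daddr, secret) % NR_ID_BUCKETS;
    uint16_t id = id_bucket[b];
    id_bucket[b] = (uint16_t)(id + segs);
    return id;
}
```

Because the bucket is chosen by destination, two flows to the same host draw from the same ID sequence, while flows to different hosts usually hit different buckets; occasional collisions are acceptable since IDs only need to stay unique over a reassembly window.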
  4. 27 Apr 2014, 1 commit
  5. 18 Apr 2014, 1 commit
  6. 29 Dec 2013, 1 commit
  7. 27 Dec 2013, 1 commit
  8. 20 Sep 2013, 1 commit
    • ip: generate unique IP identifier if local fragmentation is allowed · 703133de
      Authored by Ansis Atteka
      If local fragmentation is allowed, then ip_select_ident() and
      ip_select_ident_more() need to generate unique IDs to ensure
      correct defragmentation on the peer.

      For example, if IPsec (tunnel mode) has to encrypt large skbs
      that have the local_df bit set, then all IP fragments belonging
      to different ESP datagrams would use the same identifier. If one
      of these IP fragments got lost or reordered, the peer could
      stitch together IP fragments that did not belong to the same
      datagram, leading to packet loss or data corruption.
      Signed-off-by: Ansis Atteka <aatteka@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      703133de
  9. 28 Sep 2012, 1 commit
  10. 22 Aug 2012, 1 commit
  11. 11 Jul 2012, 2 commits
  12. 21 Jun 2012, 1 commit
    • inetpeer: inetpeer_invalidate_tree() cleanup · da557374
      Authored by Eric Dumazet
      There is no need to use cmpxchg() in inetpeer_invalidate_tree(),
      since we hold the base lock.

      Also use correct RCU annotations to remove sparse errors
      (CONFIG_SPARSE_RCU_POINTER=y):
      
      net/ipv4/inetpeer.c:144:19: error: incompatible types in comparison
      expression (different address spaces)
      net/ipv4/inetpeer.c:149:20: error: incompatible types in comparison
      expression (different address spaces)
      net/ipv4/inetpeer.c:595:10: error: incompatible types in comparison
      expression (different address spaces)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da557374
  13. 11 Jun 2012, 1 commit
    • inet: Add family scope inetpeer flushes. · b48c80ec
      Authored by David S. Miller
      This implementation can deal with having many inetpeer roots, which is
      a necessary prerequisite for per-FIB table rooted peer tables.
      
      Each family (AF_INET, AF_INET6) has a sequence number which we bump
      when we get a family invalidation request.
      
      Each peer lookup cheaply checks whether the flush sequence of the
      root we are using is out of date, and if so flushes it and updates
      the sequence number.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b48c80ec
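The per-family sequence scheme described above can be modelled in a few lines: one global counter per family, and a cheap per-lookup comparison that lazily flushes a stale root. This is a hypothetical miniature, not the kernel's structures — the field names and the integer standing in for the AVL tree are illustrative assumptions.

```c
/* Miniature of the per-family flush scheme: each family keeps a
 * sequence number, and each peer base remembers the sequence it
 * last flushed at. Names and layout are simplified stand-ins. */
struct peer_base {
    unsigned int flush_seq;   /* sequence this root last synced to */
    int nr_peers;             /* stand-in for the AVL tree contents */
};

static unsigned int v4_seq;   /* bumped on an AF_INET invalidation */

static void inetpeer_invalidate_family(void)
{
    v4_seq++;
}

/* Cheap check done at every lookup: flush this root only when its
 * recorded sequence is out of date, then re-sync it. */
static void flush_check(struct peer_base *base)
{
    if (base->flush_seq != v4_seq) {
        base->nr_peers = 0;          /* flush the tree */
        base->flush_seq = v4_seq;
    }
}
```

The point of the design is that an invalidation touches only one counter, no matter how many roots exist; each root pays for its own flush the next time it is actually used.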
  14. 10 Jun 2012, 3 commits
  15. 09 Jun 2012, 1 commit
  16. 07 Jun 2012, 1 commit
  17. 08 Mar 2012, 2 commits
  18. 18 Jan 2012, 1 commit
  19. 17 Jan 2012, 1 commit
  20. 07 Aug 2011, 1 commit
    • net: Compute protocol sequence numbers and fragment IDs using MD5. · 6e5714ea
      Authored by David S. Miller
      Computers have become a lot faster since we compromised on the
      partial MD4 hash which we use currently for performance reasons.
      
      MD5 is a much safer choice, and is in line with both RFC 1948 and
      other ISS generators (OpenBSD, Solaris, etc.).
      
      Furthermore, having only 24 bits of the sequence number be truly
      unpredictable is a very serious limitation. So the periodic
      regeneration and the 8-bit counter have been removed. We compute
      and use a full 32-bit sequence number.
      
      For ipv6, DCCP was found to use a 32-bit truncated initial sequence
      number (it needs 43-bits) and that is fixed here as well.
      Reported-by: Dan Kaminsky <dan@doxpara.com>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e5714ea
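The shape of an RFC 1948-style ISN — hash the connection 4-tuple with a boot-time secret, then add a clock so the sequence space still advances — can be sketched as below. Note the hedge: `toy_hash32()` is a hypothetical mixer standing in for the MD5 transform this commit introduces; only the structure of the computation is illustrated, not its cryptographic strength.

```c
#include <stdint.h>

/* Hypothetical 32-bit mixer standing in for MD5; NOT secure,
 * shown only to illustrate the shape of the computation. */
static uint32_t toy_hash32(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    uint32_t h = (a ^ d) * 2654435761u;
    h = (h ^ b) * 2246822519u;
    h = (h ^ c) * 3266489917u;
    return h ^ (h >> 15);
}

/* RFC 1948-style ISN: hash the 4-tuple with a boot-time secret,
 * then add a clock. All 32 bits come from the hash -- no 8-bit
 * counter, no periodic regeneration of the secret. */
static uint32_t secure_tcp_seq(uint32_t saddr, uint32_t daddr,
                               uint16_t sport, uint16_t dport,
                               uint32_t secret, uint32_t clock)
{
    uint32_t ports = ((uint32_t)sport << 16) | dport;
    return toy_hash32(saddr, daddr, ports, secret) + clock;
}
```

For a fixed connection tuple the per-tuple hash term is constant, so consecutive connections to the same peer still see a monotonically advancing sequence space, while an off-path attacker cannot predict the base value.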
  21. 22 Jul 2011, 1 commit
  22. 12 Jul 2011, 1 commit
    • inetpeer: kill inet_putpeer race · 6d1a3e04
      Authored by Eric Dumazet
      We can currently free inetpeer entries too early:
      
      [  782.636674] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f130f44c)
      [  782.636677] 1f7b13c100000000000000000000000002000000000000000000000000000000
      [  782.636686]  i i i i u u u u i i i i u u u u i i i i u u u u u u u u u u u u
      [  782.636694]                          ^
      [  782.636696]
      [  782.636698] Pid: 4638, comm: ssh Not tainted 3.0.0-rc5+ #270 Hewlett-Packard HP Compaq 6005 Pro SFF PC/3047h
      [  782.636702] EIP: 0060:[<c13fefbb>] EFLAGS: 00010286 CPU: 0
      [  782.636707] EIP is at inet_getpeer+0x25b/0x5a0
      [  782.636709] EAX: 00000002 EBX: 00010080 ECX: f130f3c0 EDX: f0209d30
      [  782.636711] ESI: 0000bc87 EDI: 0000ea60 EBP: f0209ddc ESP: c173134c
      [  782.636712]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      [  782.636714] CR0: 8005003b CR2: f0beca80 CR3: 30246000 CR4: 000006d0
      [  782.636716] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      [  782.636717] DR6: ffff4ff0 DR7: 00000400
      [  782.636718]  [<c13fbf76>] rt_set_nexthop.clone.45+0x56/0x220
      [  782.636722]  [<c13fc449>] __ip_route_output_key+0x309/0x860
      [  782.636724]  [<c141dc54>] tcp_v4_connect+0x124/0x450
      [  782.636728]  [<c142ce43>] inet_stream_connect+0xa3/0x270
      [  782.636731]  [<c13a8da1>] sys_connect+0xa1/0xb0
      [  782.636733]  [<c13a99dd>] sys_socketcall+0x25d/0x2a0
      [  782.636736]  [<c149deb8>] sysenter_do_call+0x12/0x28
      [  782.636738]  [<ffffffff>] 0xffffffff
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d1a3e04
  23. 09 Jun 2011, 1 commit
    • inetpeer: remove unused list · 4b9d9be8
      Authored by Eric Dumazet
      Andi Kleen and Tim Chen reported huge contention on inetpeer
      unused_peers.lock, on memcached workload on a 40 core machine, with
      disabled route cache.
      
      It appears we constantly flip peer refcnts between the values 0 and
      1, and must insert/remove peers from unused_peers.list while holding
      a contended spinlock.
      
      Remove this list completely and perform a garbage collection on-the-fly,
      at lookup time, using the expired nodes we met during the tree
      traversal.
      
      This removes a lot of code, makes locking more standard, and obsoletes
      two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
      removes two pointers in inet_peer structure.
      
      There is still a false-sharing effect, because refcnt is in the
      first cache line of the object [where the links and keys used by
      lookups are located]; we might move it to the end of the inet_peer
      structure to keep this first cache line mostly read-only for CPUs.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Andi Kleen <andi@firstfloor.org>
      CC: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b9d9be8
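The on-the-fly garbage collection this commit describes — reaping expired, unreferenced nodes met during the lookup walk instead of maintaining a global unused list — can be modelled with a toy structure. A singly linked chain stands in for the AVL tree here, and all names and fields are illustrative assumptions, not the kernel's.

```c
#include <stddef.h>

/* Toy model of lookup-time garbage collection; a linked chain
 * stands in for the AVL tree, and all names are illustrative. */
struct peer {
    long key;
    int refcnt;
    long expires;          /* stand-in for the entry's expiry time */
    struct peer *next;     /* stand-in for tree links */
    struct peer *gc_next;
};

/* Walk the chain looking for `key`; every expired, unreferenced
 * node met on the way is threaded onto a private gc list and
 * unlinked afterwards -- no global unused_peers list, no extra
 * contended spinlock. */
static struct peer *lookup_and_gc(struct peer **head, long key, long now)
{
    struct peer *p, *found = NULL, *gc = NULL;

    for (p = *head; p; p = p->next) {
        if (p->key == key)
            found = p;
        else if (p->refcnt == 0 && p->expires <= now) {
            p->gc_next = gc;   /* remember it for reaping */
            gc = p;
        }
    }
    for (; gc; gc = gc->gc_next) {     /* reap the expired nodes */
        struct peer **pp = head;
        while (*pp != gc)
            pp = &(*pp)->next;
        *pp = gc->next;
    }
    return found;
}
```

The design trade is that cleanup work piggybacks on traversals the lookup already pays for, so no separate timer, list, or lock is needed.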
  24. 28 May 2011, 1 commit
  25. 13 Apr 2011, 1 commit
    • inetpeer: reduce stack usage · 66944e1c
      Authored by Eric Dumazet
      On 64-bit arches, we use 752 bytes of stack when cleanup_once() is
      called from inet_getpeer().

      Let's share the AVL stack to save ~376 bytes.
      
      Before the patch:
      
      # objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl
      
      0x000006c3 unlink_from_pool [inetpeer.o]:		376
      0x00000721 unlink_from_pool [inetpeer.o]:		376
      0x00000cb1 inet_getpeer [inetpeer.o]:			376
      0x00000e6d inet_getpeer [inetpeer.o]:			376
      0x0004 inet_initpeers [inetpeer.o]:			112
      # size net/ipv4/inetpeer.o
         text	   data	    bss	    dec	    hex	filename
         5320	    432	     21	   5773	   168d	net/ipv4/inetpeer.o
      
      After the patch:

      # objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl
      0x00000c11 inet_getpeer [inetpeer.o]:			376
      0x00000dcd inet_getpeer [inetpeer.o]:			376
      0x00000ab9 peer_check_expire [inetpeer.o]:		328
      0x00000b7f peer_check_expire [inetpeer.o]:		328
      0x0004 inet_initpeers [inetpeer.o]:			112
      # size net/ipv4/inetpeer.o
         text	   data	    bss	    dec	    hex	filename
         5163	    432	     21	   5616	   15f0	net/ipv4/inetpeer.o
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Scot Doyle <lkml@scotdoyle.com>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
      Reviewed-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      66944e1c
  26. 14 Mar 2011, 2 commits
  27. 09 Mar 2011, 1 commit
  28. 05 Mar 2011, 1 commit
    • inetpeer: seqlock optimization · 65e8354e
      Authored by Eric Dumazet
      David noticed:
      
      ------------------
      Eric, I was profiling the non-routing-cache case and something that
      stuck out is the case of calling inet_getpeer() with create==0.
      
      If an entry is not found, we have to redo the lookup under a spinlock
      to make certain that a concurrent writer rebalancing the tree does
      not "hide" an existing entry from us.
      
      This makes the case of a create==0 lookup for a not-present entry
      really expensive.  It is on the order of 600 cpu cycles on my
      Niagara2.
      
      I added a hack to not do the relookup under the lock when create==0
      and it now costs less than 300 cycles.
      
      This is now a pretty common operation with the way we handle COW'd
      metrics, so I think it's definitely worth optimizing.
      -----------------
      
      One solution is to use a seqlock instead of a spinlock to protect struct
      inet_peer_base.
      
      After a failed AVL tree lookup, we can easily detect whether a
      writer made changes during our lookup. Taking the lock and redoing
      the lookup is only necessary in that case.
      
      Note: add one private rcu_deref_locked() macro to keep in one spot
      the access to the spinlock embedded in the seqlock.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      65e8354e
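The seqlock read path this commit relies on — detect whether a writer interleaved with the lockless lookup, and retry under the lock only then — can be emulated in userspace. This toy mirrors the names of the kernel API but is a simplified stand-in: a real read_seqbegin() also spins while a writer holds the lock, and memory barriers are omitted here.

```c
/* Userspace toy of the seqlock read path: an odd sequence means a
 * writer is active; a changed sequence means a writer interleaved
 * with our lookup and we must redo it under the lock. Simplified
 * stand-in for the kernel API (no spinning, no barriers). */
struct toy_seqlock { unsigned int seq; };

static unsigned int read_seqbegin(struct toy_seqlock *l)
{
    return l->seq & ~1u;               /* round down to even */
}

static int read_seqretry(struct toy_seqlock *l, unsigned int start)
{
    return (l->seq & 1) || l->seq != start;
}

static void write_seqlock(struct toy_seqlock *l)   { l->seq++; } /* odd */
static void write_sequnlock(struct toy_seqlock *l) { l->seq++; } /* even */
```

The inet_getpeer() pattern then becomes: read the sequence, do the lockless lookup, and only if the entry was not found AND read_seqretry() reports a concurrent writer, take the lock and repeat — so the common create==0 miss avoids the lock entirely.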
  29. 11 Feb 2011, 2 commits
  30. 05 Feb 2011, 1 commit
  31. 28 Jan 2011, 1 commit
  32. 25 Jan 2011, 1 commit
  33. 02 Dec 2010, 1 commit
  34. 01 Dec 2010, 1 commit