1. 09 Sep 2009, 1 commit
  2. 03 Sep 2009, 2 commits
    • tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Committed by Wu Fengguang
      This fixed a lockdep warning which appeared when doing stress
      memory tests over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin(), and it's
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But in fact, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield()
      looks weird, but it is no worse than the implicit sleep inside
      GFP_KERNEL. Both could loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
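The shape of the change can be sketched in userspace C (names and flag values are illustrative stand-ins, not kernel GFP flags; the kernel simply passes sk->sk_allocation where GFP_KERNEL was hard-coded):

```c
#include <assert.h>

/* Userspace sketch: instead of hard-coding GFP_KERNEL, a
 * tcp_send_fin()-style call site consults the per-socket allocation
 * mode (sk->sk_allocation in the kernel).  A socket that can be used
 * under page reclaim (e.g. NFS writeback) can then run with an atomic,
 * non-sleeping mode, breaking the lockdep cycle above. */
enum gfp_model { MODEL_GFP_KERNEL = 1, MODEL_GFP_ATOMIC = 2 };

struct sock_model {
    enum gfp_model sk_allocation;   /* chosen at socket setup time */
};

/* call site: uses the socket's mode, never a hard-coded constant */
static enum gfp_model fin_alloc_mode(const struct sock_model *sk)
{
    return sk->sk_allocation;
}
```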
    • ip: Report qdisc packet drops · 6ce9e7b5
      Committed by Eric Dumazet
      Christoph Lameter pointed out that packet drops at the qdisc level
      were not accounted in SNMP counters. Drops were reported to the
      user (as -ENOBUFS errors) and SNMP counters updated only if the
      application had set IP_RECVERR.
      
      IP_RECVERR is used to enable extended reliable error message
      passing, but it should not be needed just to update system-wide
      SNMP stats.
      
      This patch changes things a bit to allow SNMP counters to be updated,
      regardless of IP_RECVERR being set or not on the socket.
      
      Example after a UDP tx flood:
      # netstat -s 
      ...
      IP:
          1487048 outgoing packets dropped
      ...
      Udp:
      ...
          SndbufErrors: 1487048
      
      
      send() syscalls do, however, still return an OK status, so as
      not to break applications.
      
      Note: the send() manual page explicitly says for the ENOBUFS error:
      
       "The output queue for a network interface was full.
        This generally indicates that the interface has stopped sending,
        but may be caused by transient congestion.
        (Normally, this does not occur in Linux. Packets are just silently
        dropped when a device queue overflows.) "
      
      This is not true for IP_RECVERR-enabled sockets: a send() syscall
      that hits a qdisc drop returns an ENOBUFS error.
      
      Many thanks to Christoph, David, and last but not least, Alexey !
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
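The new accounting behaviour can be modeled in a few lines of userspace C (names are hypothetical; this is a sketch of the behaviour described above, not the kernel code):

```c
#include <assert.h>
#include <errno.h>

/* Sketch: a qdisc drop (ENOBUFS) always bumps the SNMP counter, but
 * the error reaches the caller only when the socket has IP_RECVERR
 * set; plain sockets keep seeing success so applications don't break. */
struct snmp_model { long out_discards; };
struct sock_model { int recverr; };

static int ip_send_result(struct snmp_model *mib, struct sock_model *sk,
                          int qdisc_err)
{
    if (qdisc_err == -ENOBUFS) {
        mib->out_discards++;        /* counted unconditionally now */
        if (!sk->recverr)
            qdisc_err = 0;          /* hidden from plain sockets */
    }
    return qdisc_err;
}
```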
  3. 02 Sep 2009, 3 commits
  4. 01 Sep 2009, 4 commits
  5. 29 Aug 2009, 6 commits
    • tcp: Remove redundant copy of MD5 authentication key · 9a7030b7
      Committed by John Dykstra
      Remove the copy of the MD5 authentication key from tcp_check_req().
      This key has already been copied by tcp_v4_syn_recv_sock() or
      tcp_v6_syn_recv_sock().
      Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix premature termination of FIN_WAIT2 time-wait sockets · 80a1096b
      Committed by Octavian Purdila
      There is a race condition in the time-wait sockets code that can lead
      to premature termination of FIN_WAIT2 and, subsequently, to RST
      generation when the FIN,ACK from the peer finally arrives:
      
      Time     TCP header
      0.000000 30755 > http [SYN] Seq=0 Win=2920 Len=0 MSS=1460 TSV=282912 TSER=0
      0.000008 http > 30755 [SYN, ACK] Seq=0 Ack=1 Win=2896 Len=0 MSS=1460 TSV=...
      0.136899 HEAD /1b.html?n1Lg=v1 HTTP/1.0 [Packet size limited during capture]
      0.136934 HTTP/1.0 200 OK [Packet size limited during capture]
      0.136945 http > 30755 [FIN, ACK] Seq=187 Ack=207 Win=2690 Len=0 TSV=270521...
      0.136974 30755 > http [ACK] Seq=207 Ack=187 Win=2734 Len=0 TSV=283049 TSER=...
      0.177983 30755 > http [ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283089 TSER=...
      0.238618 30755 > http [FIN, ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283151...
      0.238625 http > 30755 [RST] Seq=188 Win=0 Len=0
      
      Say twdr->slot = 1 and we are running inet_twdr_hangman and in this
      instance inet_twdr_do_twkill_work returns 1. At that point we will
      mark slot 1 and schedule inet_twdr_twkill_work. We will also make
      twdr->slot = 2.
      
      Next, a connection is closed and tcp_time_wait(TCP_FIN_WAIT2, timeo)
      is called which will create a new FIN_WAIT2 time-wait socket and will
      place it in the last to be reached slot, i.e. twdr->slot = 1.
      
      At this point say inet_twdr_twkill_work will run which will start
      destroying the time-wait sockets in slot 1, including the just added
      TCP_FIN_WAIT2 one.
      
      To avoid this issue we increment the slot only if all entries in the
      slot have been purged.
      
      This change may delay slot cleanup by a time-wait death row
      period, but only if the worker thread didn't have time to
      run/purge the current slot in the next period (6 seconds with
      default sysctl settings). However, on such a busy system even
      without this change we would probably see delays...
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
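A simplified userspace model of the fix (names and structure loosely modeled on the time-wait death row; not the kernel implementation):

```c
#include <assert.h>

#define TWDR_SLOTS 8

/* Model: the hangman advances twdr->slot only once the current slot
 * is fully purged, so a FIN_WAIT2 socket added to the slot still being
 * worked on cannot be destroyed a whole cycle early. */
struct twdr_model {
    int slot;
    int pending[TWDR_SLOTS];        /* sockets left in each slot */
};

static void twdr_hangman(struct twdr_model *twdr, int budget)
{
    int cur = twdr->slot;

    if (twdr->pending[cur] > budget)
        twdr->pending[cur] -= budget;   /* rest deferred to the worker */
    else
        twdr->pending[cur] = 0;

    /* the old code advanced unconditionally; now only when empty */
    if (twdr->pending[cur] == 0)
        twdr->slot = (cur + 1) % TWDR_SLOTS;
}
```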
    • fib_trie: resize rework · 80b71b80
      Committed by Jens Låås
      Here is rework and cleanup of the resize function.
      
      We had a few bugs: we were using ->parent where we should have
      used node_parent(), and ->parent is not assigned by inflate()
      inside the inflate loop.
      
      Also fix the thresholds to be powers of 2 to fit the
      halve-and-double strategy.
      
      max_resize is renamed to max_work, which better indicates
      its function.
      
      Reaching max_work is not an error, so the warning is removed;
      max_work only limits the amount of work done per resize
      (limiting CPU usage, outstanding memory, etc.).
      
      The clean-up makes it relatively easy to add fixed-size
      root nodes if we would like to decrease the memory pressure
      on routers with large routing tables and dynamic routing.
      If we'll need that...
      
      It has been tested with 280k routes.
      
      Work done together with Robert Olsson.
      Signed-off-by: Jens Låås <jens.laas@its.uu.se>
      Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
      Signed-off-by: David S. Miller <davem@davemloft.net>
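Why power-of-2 thresholds fit a halve-and-double strategy can be shown with a generic userspace analogy (this is not the fib_trie code, and the grow/shrink points here are invented for illustration):

```c
#include <assert.h>

/* Generic halve/double sizing rule: grow when more than half full,
 * shrink when under a quarter.  Because the two thresholds are a
 * power of 2 apart, a node that was just doubled cannot immediately
 * qualify for halving, and vice versa, so resizing cannot oscillate. */
static unsigned next_size(unsigned size, unsigned used)
{
    if (used > size / 2)
        return size * 2;
    if (size > 1 && used < size / 4)
        return size / 2;
    return size;
}
```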
    • net: ip_rt_send_redirect() optimization · 30038fc6
      Committed by Eric Dumazet
      While doing some forwarding benchmarks, I noticed
      ip_rt_send_redirect() is rather expensive, even if send_redirects is
      false for the device.
      
      The fix is to avoid two atomic ops: we don't really need to take
      a reference on in_dev.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: keepalive cleanups · df19a626
      Committed by Eric Dumazet
      Introduce a keepalive_probes(tp) helper and use it, like
      keepalive_time_when(tp) and keepalive_intvl_when(tp).
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
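The helper follows the same pattern as keepalive_time_when()/keepalive_intvl_when(): per-socket value if the application set one, otherwise the sysctl default. A userspace rendition of that pattern (struct simplified; 9 is the usual kernel default, used here purely for illustration):

```c
#include <assert.h>

/* System-wide default, overridable per socket via TCP_KEEPCNT. */
static int sysctl_tcp_keepalive_probes = 9;

struct tcp_sock_model { int keepalive_probes; };

/* Per-socket value if set (non-zero), else the sysctl default. */
static int keepalive_probes(const struct tcp_sock_model *tp)
{
    return tp->keepalive_probes ? tp->keepalive_probes
                                : sysctl_tcp_keepalive_probes;
}
```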
    • ipv4: af_inet.c cleanups · 3d1427f8
      Committed by Eric Dumazet
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 28 Aug 2009, 1 commit
  7. 25 Aug 2009, 3 commits
  8. 24 Aug 2009, 1 commit
  9. 15 Aug 2009, 1 commit
  10. 14 Aug 2009, 1 commit
  11. 10 Aug 2009, 9 commits
  12. 06 Aug 2009, 1 commit
  13. 05 Aug 2009, 1 commit
  14. 31 Jul 2009, 2 commits
    • xfrm: select sane defaults for xfrm[4|6] gc_thresh · a33bc5c1
      Committed by Neil Horman
      Choose saner defaults for xfrm[4|6] gc_thresh values on init
      
      Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
      (set to 1024).  Given that the ipv4 and ipv6 routing caches are sized
      dynamically at boot time, the static selections can be nonsensical.
      This patch dynamically selects an appropriate gc threshold based on
      the corresponding main routing table size, using the assumption that
      we should in the worst case be able to handle as many connections as
      the routing table can.
      
      For ipv4, the maximum route cache size is 16 * the number of hash
      buckets in the route cache.  Given that xfrm4 starts garbage
      collection at the gc_thresh and prevents new allocations at 2 *
      gc_thresh, we set gc_thresh to half the maximum route cache size.
      
      For ipv6 it's a bit trickier: there is no maximum route cache
      size, but the ipv6 dst_ops gc_thresh is statically set to 1024.
      It seems sane to select a similar gc_thresh for the xfrm6 code
      that is half the number of hash buckets in the v6 route cache
      times 16 (as the v4 code does).
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
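The sizing rule described above reduces to simple arithmetic; a sketch (function name and bucket counts are illustrative):

```c
#include <assert.h>

/* With a worst case of 16 cached routes per hash bucket, start
 * garbage collection at half of that maximum, so the hard stop at
 * 2 * gc_thresh lands exactly on the route cache's own capacity. */
static unsigned long xfrm_gc_thresh(unsigned long hash_buckets)
{
    unsigned long max_cache_size = 16 * hash_buckets;

    return max_cache_size / 2;
}
```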
    • ipv4: ARP neigh procfs buffer overflow · a3e8ee68
      Committed by Roel Kluin
      If arp_format_neigh_entry() is called with n->dev->addr_len == 0,
      a write to hbuffer[-1] occurs.
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
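A userspace sketch of the overflow and a guarded fix (helper name and buffer size are illustrative, not the kernel code):

```c
#include <assert.h>
#include <string.h>

#define HBUFFERLEN 30

/* The formatting loop emits "xx:" per hardware-address byte and then
 * rewinds one position to drop the trailing colon.  With addr_len == 0
 * the loop never runs, so an unguarded rewind writes hbuffer[-1];
 * the guard below is the essence of the fix. */
static void format_hw_addr(char *hbuffer, const unsigned char *addr,
                           int addr_len)
{
    static const char hex[] = "0123456789abcdef";
    int k = 0, j;

    for (j = 0; k < HBUFFERLEN - 3 && j < addr_len; j++) {
        hbuffer[k++] = hex[addr[j] >> 4];
        hbuffer[k++] = hex[addr[j] & 15];
        hbuffer[k++] = ':';
    }
    if (k != 0)
        --k;                    /* drop trailing ':' only if written */
    hbuffer[k] = '\0';
}
```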
  15. 28 Jul 2009, 1 commit
    • xfrm: export xfrm garbage collector thresholds via sysctl · a44a4a00
      Committed by Neil Horman
      Export garbage collector thresholds for xfrm[4|6]_dst_ops
      
      A problem was reported to me recently in which a system with a
      high volume of ipsec connections eventually began returning
      ENOBUFS for new connections.
      
      It seemed that after about 2000 connections we started being unable to
      create more.  A quick look revealed that the xfrm code used a dst_ops
      structure that limited the gc_thresh value to 1024, and always
      dropped route cache entries after 2x the gc_thresh.
      
      It seems the most direct solution is to export the gc_thresh values in
      the xfrm[4|6] dst_ops as sysctls, like the main routing table does, so
      that higher volumes of connections can be supported.  This patch has
      been tested and allows the reporter to increase their ipsec connection
      volume successfully.
      Reported-by: Joe Nall <joe@nall.com>
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      
      ipv4/xfrm4_policy.c |   18 ++++++++++++++++++
      ipv6/xfrm6_policy.c |   18 ++++++++++++++++++
      2 files changed, 36 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
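With the thresholds exported, an administrator can raise them at runtime via the new sysctls; a config fragment (the values below are examples only, not recommendations):

```shell
# Raise the xfrm GC thresholds added by this patch; new route cache
# entries for ipsec are refused at 2x these thresholds.
sysctl -w net.ipv4.xfrm4_gc_thresh=32768
sysctl -w net.ipv6.xfrm6_gc_thresh=32768
```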
  16. 24 Jul 2009, 1 commit
  17. 20 Jul 2009, 2 commits