1. 21 3月, 2010 1 次提交
    • S
      NET_DMA: free skbs periodically · 73852e81
      Steven J. Magnani 提交于
      Under NET_DMA, data transfer can grind to a halt when userland issues a
      large read on a socket with a high RCVLOWAT (i.e., 512 KB for both).
      This appears to be because the NET_DMA design queues up lots of memcpy
      operations, but doesn't issue or wait for them (and thus free the
      associated skbs) until it is time for tcp_recvmesg() to return.
      The socket hangs when its TCP window goes to zero before enough data is
      available to satisfy the read.
      
      Periodically issue asynchronous memcpy operations, and free skbs for ones
      that have completed, to prevent sockets from going into zero-window mode.
      Signed-off-by: NSteven J. Magnani <steve@digidescorp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73852e81
  2. 19 3月, 2010 1 次提交
  3. 19 2月, 2010 2 次提交
    • A
      net: TCP thin dupack · 7e380175
      Andreas Petlund 提交于
      This patch enables fast retransmissions after one dupACK for
      TCP if the stream is identified as thin. This will reduce
      latencies for thin streams that are not able to trigger fast
      retransmissions due to high packet interarrival time. This
      mechanism is only active if enabled by iocontrol or syscontrol
      and the stream is identified as thin.
      Signed-off-by: NAndreas Petlund <apetlund@simula.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e380175
    • A
      net: TCP thin linear timeouts · 36e31b0a
      Andreas Petlund 提交于
      This patch will make TCP use only linear timeouts if the
      stream is thin. This will help to avoid the very high latencies
      that thin stream suffer because of exponential backoff. This
      mechanism is only active if enabled by iocontrol or syscontrol
      and the stream is identified as thin. A maximum of 6 linear
      timeouts is tried before exponential backoff is resumed.
      Signed-off-by: NAndreas Petlund <apetlund@simula.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36e31b0a
  4. 17 2月, 2010 1 次提交
    • T
      percpu: add __percpu sparse annotations to net · 7d720c3e
      Tejun Heo 提交于
      Add __percpu sparse annotations to net.
      
      These annotations are to make sparse consider percpu variables to be
      in a different address space and warn if accessed without going
      through percpu accessors.  This patch doesn't affect normal builds.
      
      The macro and type tricks around snmp stats make things a bit
      interesting.  DEFINE/DECLARE_SNMP_STAT() macros mark the target field
      as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly.  All
      snmp_mib_*() users which used to cast the argument to (void **) are
      updated to cast it to (void __percpu **).
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d720c3e
  5. 24 12月, 2009 2 次提交
  6. 09 12月, 2009 1 次提交
  7. 03 12月, 2009 2 次提交
    • W
      TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS · e56fb50f
      William Allen Simpson 提交于
      Provide per socket control of the TCP cookie option and SYN/SYNACK data.
      
      This is a straightforward re-implementation of an earlier (year-old)
      patch that no longer applies cleanly, with permission of the original
      author (Adam Langley):
      
          http://thread.gmane.org/gmane.linux.network/102586
      
      The principle difference is using a TCP option to carry the cookie nonce,
      instead of a user configured offset in the data.
      
      Allocations have been rearranged to avoid requiring GFP_ATOMIC.
      
      Requires:
         net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED
         TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
         TCPCT part 1d: define TCP cookie option, extend existing struct's
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e56fb50f
    • W
      TCPCT part 1b: generate Responder Cookie secret · da5c78c8
      William Allen Simpson 提交于
      Define (missing) hash message size for SHA1.
      
      Define hashing size constants specific to TCP cookies.
      
      Add new function: tcp_cookie_generator().
      
      Maintain global secret values for tcp_cookie_generator().
      
      This is a significantly revised implementation of earlier (15-year-old)
      Photuris [RFC-2522] code for the KA9Q cooperative multitasking platform.
      
      Linux RCU technique appears to be well-suited to this application, though
      neither of the circular queue items are freed.
      
      These functions will also be used in subsequent patches that implement
      additional features.
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      da5c78c8
  8. 01 12月, 2009 1 次提交
  9. 14 11月, 2009 1 次提交
  10. 23 10月, 2009 1 次提交
  11. 20 10月, 2009 2 次提交
  12. 19 10月, 2009 1 次提交
    • E
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet 提交于
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c720c7e8
  13. 13 10月, 2009 1 次提交
  14. 03 10月, 2009 1 次提交
  15. 02 10月, 2009 1 次提交
  16. 01 10月, 2009 1 次提交
  17. 22 9月, 2009 1 次提交
  18. 15 9月, 2009 1 次提交
  19. 03 9月, 2009 1 次提交
    • W
      tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Wu Fengguang 提交于
      This fixed a lockdep warning which appeared when doing stress
      memory tests over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin(), and it's
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
      weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
      loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa133076
  20. 29 8月, 2009 1 次提交
  21. 10 7月, 2009 1 次提交
    • J
      net: adding memory barrier to the poll and receive callbacks · a57de0b4
      Jiri Olsa 提交于
      Adding memory barrier after the poll_wait function, paired with
      receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
      to wrap the memory barrier.
      
      Without the memory barrier, following race can happen.
      The race fires, when following code paths meet, and the tp->rcv_nxt
      and __add_wait_queue updates stay in CPU caches.
      
      CPU1                         CPU2
      
      sys_select                   receive packet
        ...                        ...
        __add_wait_queue           update tp->rcv_nxt
        ...                        ...
        tp->rcv_nxt check          sock_def_readable
        ...                        {
        schedule                      ...
                                      if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                              wake_up_interruptible(sk->sk_sleep)
                                      ...
                                   }
      
      If there was no cache the code would work ok, since the wait_queue and
      rcv_nxt are opposit to each other.
      
      Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
      passed the tp->rcv_nxt check and sleeps, or will get the new value for
      tp->rcv_nxt and will return with new data mask.
      In both cases the process (CPU1) is being added to the wait queue, so the
      waitqueue_active (CPU2) call cannot miss and will wake up CPU1.
      
      The bad case is when the __add_wait_queue changes done by CPU1 stay in its
      cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1 will then
      endup calling schedule and sleep forever if there are no more data on the
      socket.
      
      Calls to poll_wait in following modules were ommited:
      	net/bluetooth/af_bluetooth.c
      	net/irda/af_irda.c
      	net/irda/irnet/irnet_ppp.c
      	net/mac80211/rc80211_pid_debugfs.c
      	net/phonet/socket.c
      	net/rds/af_rds.c
      	net/rfkill/core.c
      	net/sunrpc/cache.c
      	net/sunrpc/rpc_pipe.c
      	net/tipc/socket.c
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a57de0b4
  22. 30 6月, 2009 1 次提交
  23. 29 5月, 2009 1 次提交
  24. 27 5月, 2009 5 次提交
  25. 19 5月, 2009 1 次提交
  26. 17 4月, 2009 1 次提交
  27. 01 4月, 2009 1 次提交
  28. 19 3月, 2009 1 次提交
  29. 16 3月, 2009 3 次提交
    • I
      tcp: make sure xmit goal size never becomes zero · afece1c6
      Ilpo Järvinen 提交于
      It's not too likely to happen, would basically require crafted
      packets (must hit the max guard in tcp_bound_to_half_wnd()).
      It seems that nothing that bad would happen as there's tcp_mems
      and congestion window that prevent runaway at some point from
      hurting all too much (I'm not that sure what all those zero
      sized segments we would generate do though in write queue).
      Preventing it regardless is certainly the best way to go.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afece1c6
    • I
      tcp: cache result of earlier divides when mss-aligning things · 2a3a041c
      Ilpo Järvinen 提交于
      The results is very unlikely change every so often so we
      hardly need to divide again after doing that once for a
      connection. Yet, if divide still becomes necessary we
      detect that and do the right thing and again settle for
      non-divide state. Takes the u16 space which was previously
      taken by the plain xmit_size_goal.
      
      This should take care part of the tso vs non-tso difference
      we found earlier.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a3a041c
    • I
      tcp: simplify tcp_current_mss · 0c54b85f
      Ilpo Järvinen 提交于
      There's very little need for most of the callsites to get
      tp->xmit_goal_size updated. That will cost us divide as is,
      so slice the function in two. Also, the only users of the
      tp->xmit_goal_size are directly behind tcp_current_mss(),
      so there's no need to store that variable into tcp_sock
      at all! The drop of xmit_goal_size currently leaves 16-bit
      hole and some reorganization would again be necessary to
      change that (but I'm aiming to fill that hole with u16
      xmit_goal_size_segs to cache the results of the remaining
      divide to get that tso on regression).
      
      Bring xmit_goal_size parts into tcp.c
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c54b85f
  30. 02 3月, 2009 1 次提交