1. 24 9月, 2010 1 次提交
  2. 02 9月, 2010 1 次提交
  3. 23 8月, 2010 1 次提交
    • H
      tcp: allow effective reduction of TCP's rcv-buffer via setsockopt · e88c64f0
      Hagen Paul Pfeifer 提交于
      Via setsockopt it is possible to reduce the socket RX buffer
      (SO_RCVBUF). TCP method to select the initial window and window scaling
      option in tcp_select_initial_window() currently misbehaves and do not
      consider a reduced RX socket buffer via setsockopt.
      
      Even though the server's RX buffer is reduced via setsockopt() to 256
      byte (Initial Window 384 byte => 256 * 2 - (256 * 2 / 4)) the window
      scale option is still 7:
      
      192.168.1.38.40676 > 78.47.222.210.5001: Flags [S], seq 2577214362, win 5840, options [mss 1460,sackOK,TS val 338417 ecr 0,nop,wscale 0], length 0
      78.47.222.210.5001 > 192.168.1.38.40676: Flags [S.], seq 1570631029, ack 2577214363, win 384, options [mss 1452,sackOK,TS val 2435248895 ecr 338417,nop,wscale 7], length 0
      192.168.1.38.40676 > 78.47.222.210.5001: Flags [.], ack 1, win 5840, options [nop,nop,TS val 338421 ecr 2435248895], length 0
      
      Within tcp_select_initial_window() the original space argument - a
      representation of the rx buffer size - is expanded during
      tcp_select_initial_window(). Only sysctl_tcp_rmem[2], sysctl_rmem_max
      and window_clamp are considered to calculate the initial window.
      
      This patch adjust the window_clamp argument if the user explicitly
      reduce the receive buffer.
      Signed-off-by: NHagen Paul Pfeifer <hagen@jauu.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e88c64f0
  4. 20 7月, 2010 1 次提交
  5. 13 7月, 2010 1 次提交
  6. 29 6月, 2010 1 次提交
  7. 16 6月, 2010 1 次提交
    • C
      tcp: unify tcp flag macros · a3433f35
      Changli Gao 提交于
      unify tcp flag macros: TCPHDR_FIN, TCPHDR_SYN, TCPHDR_RST, TCPHDR_PSH,
      TCPHDR_ACK, TCPHDR_URG, TCPHDR_ECE and TCPHDR_CWR. TCBCB_FLAG_* are replaced
      with the corresponding TCPHDR_*.
      Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
      ----
       include/net/tcp.h                      |   24 ++++++-------
       net/ipv4/tcp.c                         |    8 ++--
       net/ipv4/tcp_input.c                   |    2 -
       net/ipv4/tcp_output.c                  |   59 ++++++++++++++++-----------------
       net/netfilter/nf_conntrack_proto_tcp.c |   32 ++++++-----------
       net/netfilter/xt_TCPMSS.c              |    4 --
       6 files changed, 58 insertions(+), 71 deletions(-)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3433f35
  8. 18 5月, 2010 1 次提交
    • E
      tcp: tcp_synack_options() fix · de213e5e
      Eric Dumazet 提交于
      Commit 33ad798c (tcp: options clean up) introduced a problem
      if MD5+SACK+timestamps were used in initial SYN message.
      
      Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP
      sessions, but since 40 bytes of tcp options space are not enough to
      store all the bits needed, we chose to disable timestamps in this case.
      
      We send a SYN-ACK _without_ timestamp option, but socket has timestamps
      enabled and all further outgoing messages contain a TS block, all with
      the initial timestamp of the remote peer.
      
      Fix is to really disable timestamps option for the whole session.
      Reported-by: NBijay Singh <Bijay.Singh@guavus.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de213e5e
  9. 16 5月, 2010 1 次提交
    • E
      net: Introduce sk_route_nocaps · a465419b
      Eric Dumazet 提交于
      TCP-MD5 sessions have intermittent failures, when route cache is
      invalidated. ip_queue_xmit() has to find a new route, calls
      sk_setup_caps(sk, &rt->u.dst), destroying the 
      
      sk->sk_route_caps &= ~NETIF_F_GSO_MASK
      
      that MD5 desperately try to make all over its way (from
      tcp_transmit_skb() for example)
      
      So we send few bad packets, and everything is fine when
      tcp_transmit_skb() is called again for this socket.
      
      Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
      socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.
      Reported-by: NBhaskar Dutta <bhaskie@gmail.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a465419b
  10. 23 4月, 2010 1 次提交
  11. 21 4月, 2010 1 次提交
  12. 16 4月, 2010 1 次提交
  13. 12 4月, 2010 2 次提交
    • D
      tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb · 2e8e18ef
      David S. Miller 提交于
      Back in commit 04a0551c
      ("loopback: Drop obsolete ip_summed setting") we stopped
      setting CHECKSUM_UNNECESSARY in the loopback xmit.
      
      This is because such a setting was a lie since it implies that the
      checksum field of the packet is properly filled in.
      
      Instead what happens normally is that CHECKSUM_PARTIAL is set and
      skb->csum is calculated as needed.
      
      But this was only happening for TCP data packets (via the
      skb->ip_summed assignment done in tcp_sendmsg()).  It doesn't
      happen for non-data packets like ACKs etc.
      
      Fix this by setting skb->ip_summed in the common non-data packet
      constructor.  It already is setting skb->csum to zero.
      
      But this reminds us that we still have things like ip_output.c's
      ip_dev_loopback_xmit() which sets skb->ip_summed to the value
      CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not
      valid.  So we'll have to address that at some point too.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e8e18ef
    • H
      inet: Remove unused send_check length argument · bb296246
      Herbert Xu 提交于
      inet: Remove unused send_check length argument
      
      This patch removes the unused length argument from the send_check
      function in struct inet_connection_sock_af_ops.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: NYinghai <yinghai.lu@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bb296246
  14. 11 4月, 2010 1 次提交
  15. 09 4月, 2010 1 次提交
    • D
      tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb · 2626419a
      David S. Miller 提交于
      Back in commit 04a0551c
      ("loopback: Drop obsolete ip_summed setting") we stopped
      setting CHECKSUM_UNNECESSARY in the loopback xmit.
      
      This is because such a setting was a lie since it implies that the
      checksum field of the packet is properly filled in.
      
      Instead what happens normally is that CHECKSUM_PARTIAL is set and
      skb->csum is calculated as needed.
      
      But this was only happening for TCP data packets (via the
      skb->ip_summed assignment done in tcp_sendmsg()).  It doesn't
      happen for non-data packets like ACKs etc.
      
      Fix this by setting skb->ip_summed in the common non-data packet
      constructor.  It already is setting skb->csum to zero.
      
      But this reminds us that we still have things like ip_output.c's
      ip_dev_loopback_xmit() which sets skb->ip_summed to the value
      CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not
      valid.  So we'll have to address that at some point too.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2626419a
  16. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  17. 09 3月, 2010 1 次提交
  18. 24 12月, 2009 2 次提交
    • L
      net: Add rtnetlink init_rcvwnd to set the TCP initial receive window · 31d12926
      laurent chavey 提交于
      Add rtnetlink init_rcvwnd to set the TCP initial receive window size
      advertised by passive and active TCP connections.
      The current Linux TCP implementation limits the advertised TCP initial
      receive window to the one prescribed by slow start. For short lived
      TCP connections used for transaction type of traffic (i.e. http
      requests), bounding the advertised TCP initial receive window results
      in increased latency to complete the transaction.
      Support for setting initial congestion window is already supported
      using rtnetlink init_cwnd, but the feature is useless without the
      ability to set a larger TCP initial receive window.
      The rtnetlink init_rcvwnd allows increasing the TCP initial receive
      window, allowing TCP connection to advertise larger TCP receive window
      than the ones bounded by slow start.
      Signed-off-by: NLaurent Chavey <chavey@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31d12926
    • K
      tcp: Remove check in __tcp_push_pending_frames · 12d50c46
      Krishna Kumar 提交于
      tcp_push checks tcp_send_head and calls __tcp_push_pending_frames,
      which again checks tcp_send_head, and this unnecessary check is
      done for every other caller of __tcp_push_pending_frames.
      
      Remove tcp_send_head check in __tcp_push_pending_frames and add
      the check to tcp_push_pending_frames. Other functions call
      __tcp_push_pending_frames only when tcp_send_head would evaluate
      to true.
      Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com>
      Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12d50c46
  19. 16 12月, 2009 1 次提交
    • D
      tcp: Revert per-route SACK/DSACK/TIMESTAMP changes. · bb5b7c11
      David S. Miller 提交于
      It creates a regression, triggering badness for SYN_RECV
      sockets, for example:
      
      [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
      [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
      [19148.023035] REGS: eeecbd30 TRAP: 0700   Not tainted  (2.6.32)
      [19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002442  XER: 00000000
      [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000
      
      This is likely caused by the change in the 'estab' parameter
      passed to tcp_parse_options() when invoked by the functions
      in net/ipv4/tcp_minisocks.c
      
      But even if that is fixed, the ->conn_request() changes made in
      this patch series is fundamentally wrong.  They try to use the
      listening socket's 'dst' to probe the route settings.  The
      listening socket doesn't even have a route, and you can't
      get the right route (the child request one) until much later
      after we setup all of the state, and it must be done by hand.
      
      This stuff really isn't ready, so the best thing to do is a
      full revert.  This reverts the following commits:
      
      f55017a9
      022c3f7d
      1aba721e
      cda42ebd
      345cda2f
      dc343475
      05eaade2
      6a2a2d6bSigned-off-by: NDavid S. Miller <davem@davemloft.net>
      bb5b7c11
  20. 03 12月, 2009 6 次提交
  21. 24 11月, 2009 1 次提交
  22. 29 10月, 2009 3 次提交
  23. 19 10月, 2009 1 次提交
    • E
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet 提交于
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c720c7e8
  24. 02 10月, 2009 1 次提交
    • O
      IPv4 TCP fails to send window scale option when window scale is zero · 89e95a61
      Ori Finkelman 提交于
      Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
      and SYN headers even if our window scale is zero.
      
      This fixes the following observed behavior:
      
      1. Client sends a SYN with TCP window scaling option and non zero window scale
      value to a Linux box.
      2. Linux box notes large receive window from client.
      3. Linux decides on a zero value of window scale for its part.
      4. Due to compare against requested window scale size option, Linux does not to
       send windows scale TCP option header on SYN/ACK at all.
      
      With the following result:
      
      Client box thinks TCP window scaling is not supported, since SYN/ACK had no
      TCP window scale option, while Linux thinks that TCP window scaling is
      supported (and scale might be non zero), since SYN had  TCP window scale
      option and we have a mismatched idea between the client and server
      regarding window sizes.
      
      Probably it also fixes up the following bug (not observed in practice):
      
      1. Linux box opens TCP connection to some server.
      2. Linux decides on zero value of window scale.
      3. Due to compare against computed window scale size option, Linux does
      not to set windows scale TCP  option header on SYN.
      
      With the expected result that the server OS does not use window scale option
      due to not receiving such an option in the SYN headers, leading to suboptimal
      performance.
      Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com>
      Signed-off-by: NOri Finkelman <ori@comsleep.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89e95a61
  25. 03 9月, 2009 1 次提交
    • W
      tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Wu Fengguang 提交于
      This fixed a lockdep warning which appeared when doing stress
      memory tests over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin(), and it's
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
      weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
      loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa133076
  26. 24 7月, 2009 1 次提交
  27. 20 7月, 2009 1 次提交
  28. 30 6月, 2009 1 次提交
  29. 03 6月, 2009 1 次提交
  30. 05 5月, 2009 1 次提交
    • I
      tcp: extend ECN sysctl to allow server-side only ECN · 255cac91
      Ilpo Järvinen 提交于
      This should be very safe compared with full enabled, so I see
      no reason why it shouldn't be done right away. As ECN can only
      be negotiated if the SYN sending party is also supporting it,
      somebody in the loop probably knows what he/she is doing. If
      SYN does not ask for ECN, the server side SYN-ACK is identical
      to what it is without ECN. Thus it's quite safe.
      
      The chosen value is safe w.r.t to existing configs which
      choose to currently set manually either 0 or 1 but
      silently upgrades those who have not explicitly requested
      ECN off.
      
      Whether to just enable both sides comes up time to time but
      unless that gets done now we can at least make the servers
      aware of ECN already. As there are some known problems to occur
      if ECN is enabled, it's currently questionable whether there's
      any real gain from enabling clients as servers mostly won't
      support it anyway (so we'd hit just the negative sides). After
      enabling the servers and getting that deployed, the client end
      enable really has some potential gain too.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      255cac91
  31. 20 4月, 2009 1 次提交