1. 24 Oct, 2009 1 commit
  2. 21 Oct, 2009 1 commit
  3. 20 Oct, 2009 1 commit
  4. 19 Oct, 2009 3 commits
  5. 15 Oct, 2009 1 commit
  6. 13 Oct, 2009 4 commits
    • tcp: replace ehash_size by ehash_mask · f373b53b
      Eric Dumazet committed
      Storing the mask (size - 1) instead of the size allows fast path to be
      a bit faster.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
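The idea in the ehash_mask commit can be illustrated with a minimal sketch. The struct and function names below are hypothetical stand-ins, not the kernel's own definitions; the only point shown is that storing `size - 1` once lets the lookup path use a single AND, provided the bucket count is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a hash table descriptor; not the kernel struct. */
struct demo_hashinfo {
	uint32_t ehash_mask;	/* number of buckets minus one */
};

/* Store the mask once. Masking only works for power-of-two bucket counts. */
static void demo_hashinfo_init(struct demo_hashinfo *h, uint32_t nbuckets)
{
	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
	h->ehash_mask = nbuckets - 1;
}

/* Fast path: a single AND, with no "size - 1" recomputed per lookup. */
static uint32_t demo_bucket(const struct demo_hashinfo *h, uint32_t hash)
{
	return hash & h->ehash_mask;
}
```

For a power-of-two bucket count, `demo_bucket()` selects the same bucket as `hash % nbuckets` would.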
    • udp: Fix udp_poll() and ioctl() · 85584672
      Eric Dumazet committed
      udp_poll() can in some circumstances drop frames with incorrect checksums.
      
      Problem is we now have to lock the socket while dropping frames, or risk
      sk_forward corruption.
      
      This bug is present since commit 95766fff
      ([UDP]: Add memory accounting.)
      
      While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix tcp_defer_accept to consider the timeout · 6d01a026
      Willy Tarreau committed
      I was trying to use TCP_DEFER_ACCEPT and noticed that if the
      client does not talk, the connection is never accepted and
      remains in SYN_RECV state until the retransmits expire, where
      it finally is deleted. This is bad when some firewall such as
      netfilter sits between the client and the server because the
      firewall sees the connection in ESTABLISHED state while the
      server will finally silently drop it without sending an RST.
      
      This behaviour contradicts the man page which says it should
      wait only for some time :
      
             TCP_DEFER_ACCEPT (since Linux 2.4)
                Allows a listener to be awakened only when data arrives
                on the socket.  Takes an integer value  (seconds), this
                can  bound  the  maximum  number  of attempts TCP will
                make to complete the connection. This option should not
                be used in code intended to be portable.
      
      Also, looking at ipv4/tcp.c, a retransmit counter is correctly
      computed :
      
              case TCP_DEFER_ACCEPT:
                      icsk->icsk_accept_queue.rskq_defer_accept = 0;
                      if (val > 0) {
                              /* Translate value in seconds to number of
                               * retransmits */
                              while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                                     val > ((TCP_TIMEOUT_INIT / HZ) <<
                                             icsk->icsk_accept_queue.rskq_defer_accept))
                                      icsk->icsk_accept_queue.rskq_defer_accept++;
                              icsk->icsk_accept_queue.rskq_defer_accept++;
                      }
                      break;
      
      ==> rskq_defer_accept is used as a counter of retransmits.
      
      But in tcp_minisocks.c, this counter is only checked. And in
      fact, I have found no location which updates it. So I think
      that what was intended was to decrease it in tcp_minisocks
      whenever it is checked, which the trivial patch below does.
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
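The seconds-to-retransmits translation quoted in the tcp_defer_accept commit can be tried outside the kernel with a small sketch. The function name is hypothetical, and the initial timeout is passed in as a parameter because its value (3 seconds is assumed in the usage note) is not stated in the commit message.

```c
#include <assert.h>

/*
 * Mirror of the quoted loop: find how many exponentially spaced
 * retransmits are needed to cover "val" seconds, given an initial
 * timeout of timeout_init_secs (an assumed parameter, not a kernel name).
 */
static unsigned int demo_defer_accept_retrans(int val, unsigned int timeout_init_secs)
{
	unsigned int n = 0;

	assert(timeout_init_secs > 0);

	if (val > 0) {
		/* n < 32 is checked first, so the shift is at most 31 bits. */
		while (n < 32 &&
		       (unsigned long long)val >
		       ((unsigned long long)timeout_init_secs << n))
			n++;
		n++;
	}
	return n;
}
```

With an assumed 3-second initial timeout, a request of 10 seconds maps to 3 retransmits and a request of 30 seconds maps to 5. A value of 0 leaves the counter at 0, which disables the feature.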
    • net: Generalize socket rx gap / receive queue overflow cmsg · 3b885787
      Neil Horman committed
      Create a new socket level option to report number of queue overflows
      
      Recently I augmented the AF_PACKET protocol to report the number of frames lost
      on the socket receive queue between any two enqueued frames.  This value was
      exported via a SOL_PACKET level cmsg.  After I completed that work it was
      requested that this feature be generalized so that any datagram oriented socket
      could make use of this option.  As such I've created this patch. It creates a
      new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
      SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
      overflowed between any two given frames.  It also augments the AF_PACKET
      protocol to take advantage of this new feature (as it previously did not touch
      sk->sk_drops, which this patch uses to record the overflow count).  Tested
      successfully by me.
      
      Notes:
      
      1) Unlike my previous patch, this patch simply records the sk_drops value, which
      is not a number of drops between packets, but rather a total number of drops.
      Deltas must be computed in user space.
      
      2) While this patch currently works with datagram oriented protocols, it will
      also be accepted by non-datagram oriented protocols. I'm not sure if that's
      agreeable to everyone, but my argument in favor of doing so is that, for those
      protocols which aren't applicable to this option, sk_drops will always be zero,
      and reporting no drops on a receive queue that isn't used for those
      non-participating protocols seems reasonable to me.  This also saves us having
      to code in a per-protocol opt in mechanism.
      
      3) This applies cleanly to net-next assuming that commit
      97775007 (my af packet cmsg patch) is reverted
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
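Note 1 of the SO_RXQ_OVFL commit says the cmsg carries a running total and that deltas must be computed in user space. A minimal sketch of that bookkeeping follows. The tracker type and function name are hypothetical, and the counter is assumed to be an unsigned 32-bit value, so the subtraction stays correct across wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-socket state kept by the application. */
struct demo_drop_tracker {
	uint32_t last_total;	/* total reported with the previous frame */
};

/*
 * Given the total drop count reported with the current frame, return
 * the number of drops since the previous frame. Unsigned arithmetic
 * makes the result correct even if the 32-bit total has wrapped.
 */
static uint32_t demo_drops_since_last(struct demo_drop_tracker *t, uint32_t total)
{
	uint32_t delta;

	assert(t != NULL);
	delta = total - t->last_total;
	t->last_total = total;
	return delta;
}
```

An application would call this once per received frame, passing the value read from the SOL_SOCKET level cmsg, and start with `last_total` set to 0.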
  7. 08 Oct, 2009 3 commits
  8. 07 Oct, 2009 3 commits
  9. 05 Oct, 2009 3 commits
  10. 03 Oct, 2009 1 commit
  11. 02 Oct, 2009 3 commits
    • net: Use sk_mark for routing lookup in more places · 914a9ab3
      Atis Elsts committed
      This patch against v2.6.31 adds support for route lookup using sk_mark in some 
      more places. The benefits from this patch are the following.
      First, SO_MARK option now has effect on UDP sockets too.
      Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing 
      lookup correctly if TCP sockets with SO_MARK were used.
      Signed-off-by: Atis Elsts <atis@mikrotik.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    • IPv4 TCP fails to send window scale option when window scale is zero · 89e95a61
      Ori Finkelman committed
      Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
      and SYN headers even if our window scale is zero.
      
      This fixes the following observed behavior:
      
      1. Client sends a SYN with TCP window scaling option and non zero window scale
      value to a Linux box.
      2. Linux box notes large receive window from client.
      3. Linux decides on a zero value of window scale for its part.
      4. Due to a comparison against the requested window scale size option, Linux
      does not send the window scale TCP option header on the SYN/ACK at all.
      
      With the following result:
      
      Client box thinks TCP window scaling is not supported, since SYN/ACK had no
      TCP window scale option, while Linux thinks that TCP window scaling is
      supported (and scale might be non zero), since SYN had  TCP window scale
      option and we have a mismatched idea between the client and server
      regarding window sizes.
      
      Probably it also fixes up the following bug (not observed in practice):
      
      1. Linux box opens TCP connection to some server.
      2. Linux decides on zero value of window scale.
      3. Due to a comparison against the computed window scale size option, Linux
      does not set the window scale TCP option header on the SYN.
      
      With the expected result that the server OS does not use window scale option
      due to not receiving such an option in the SYN headers, leading to suboptimal
      performance.
      Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
      Signed-off-by: Ori Finkelman <ori@comsleep.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4/tcp.c: fix min() type mismatch warning · 4fdb78d3
      Andrew Morton committed
      net/ipv4/tcp.c: In function 'do_tcp_setsockopt':
      net/ipv4/tcp.c:2050: warning: comparison of distinct pointer types lacks a cast
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 01 Oct, 2009 1 commit
  13. 25 Sep, 2009 2 commits
  14. 24 Sep, 2009 1 commit
  15. 22 Sep, 2009 1 commit
  16. 16 Sep, 2009 1 commit
    • tcp: fix CONFIG_TCP_MD5SIG + CONFIG_PREEMPT timer BUG() · 657e9649
      Robert Varga committed
      I have recently come across a preemption imbalance detected by:
      
      <4>huh, entered ffffffff80644630 with preempt_count 00000102, exited with 00000101?
      <0>------------[ cut here ]------------
      <2>kernel BUG at /usr/src/linux/kernel/timer.c:664!
      <0>invalid opcode: 0000 [1] PREEMPT SMP
      
      with ffffffff80644630 being inet_twdr_hangman().
      
      This appeared after I enabled CONFIG_TCP_MD5SIG and played with it a
      bit, so I looked at what might have caused it.
      
      One thing that struck me as strange is tcp_twsk_destructor(), as it
      calls tcp_put_md5sig_pool() -- which entails a put_cpu(), causing the
      detected imbalance. Found on 2.6.23.9, but 2.6.31 is affected as well,
      as far as I can tell.
      Signed-off-by: Robert Varga <nite@hq.alert.sk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 15 Sep, 2009 3 commits
  18. 09 Sep, 2009 1 commit
  19. 03 Sep, 2009 2 commits
    • tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Wu Fengguang committed
      This fixed a lockdep warning which appeared when doing stress
      memory tests over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin(), and it's
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But the fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
      weird, but it is no worse than the implicit sleep inside GFP_KERNEL. Both could
      loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip: Report qdisc packet drops · 6ce9e7b5
      Eric Dumazet committed
      Christoph Lameter pointed out that packet drops at qdisc level were not
      accounted in SNMP counters. Only if the application sets IP_RECVERR are drops
      reported to the user (-ENOBUFS errors) and SNMP counters updated.
      
      IP_RECVERR is used to enable extended reliable error message passing,
      but these are not needed to update system wide SNMP stats.
      
      This patch changes things a bit to allow SNMP counters to be updated,
      regardless of IP_RECVERR being set or not on the socket.
      
      Example after a UDP tx flood:
      # netstat -s 
      ...
      IP:
          1487048 outgoing packets dropped
      ...
      Udp:
      ...
          SndbufErrors: 1487048
      
      
      send() syscalls do, however, still return an OK status, to not
      break applications.
      
      Note : send() manual page explicitly says for -ENOBUFS error :
      
       "The output queue for a network interface was full.
        This generally indicates that the interface has stopped sending,
        but may be caused by transient congestion.
        (Normally, this does not occur in Linux. Packets are just silently
        dropped when a device queue overflows.) "
      
      This is not true for IP_RECVERR enabled sockets : a send() syscall
      that hit a qdisc drop returns an ENOBUFS error.
      
      Many thanks to Christoph, David, and last but not least, Alexey !
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 02 Sep, 2009 3 commits
  21. 01 Sep, 2009 1 commit
    • Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value. · 6fa12c85
      Damian Lukowski committed
      RFC 1122 specifies two threshold values R1 and R2 for connection timeouts,
      which may represent a number of allowed retransmissions or a timeout value.
      Currently linux uses sysctl_tcp_retries{1,2} to specify the thresholds
      in number of allowed retransmissions.
      
      For any desired threshold R2 (by means of time) one can specify tcp_retries2
      (by means of number of retransmissions) such that TCP will not time out
      earlier than R2. This is the case, because the RTO schedule follows a fixed
      pattern, namely exponential backoff.
      
      However, the RTO behaviour is no longer predictable if RTO backoffs can be
      reverted, as is the case in the draft
      "Make TCP more Robust to Long Connectivity Disruptions"
      (http://tools.ietf.org/html/draft-zimmermann-tcp-lcd).
      
      In the worst case TCP would time out a connection after 3.2 seconds, if the
      initial RTO equaled MIN_RTO and each backoff has been reverted.
      
      This patch introduces a function retransmits_timed_out(N),
      which calculates the timeout of a TCP connection, assuming an initial
      RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions.
      
      Whenever timeout decisions are made by comparing the retransmission counter
      to some value N, this function can be used, instead.
      
      The meaning of tcp_retries2 will be changed, as many more RTO retransmissions
      can occur than the value indicates. However, it yields a timeout which is
      similar to the one of an unpatched, exponentially backing off TCP in the same
      scenario. As no application could rely on an RTO greater than MIN_RTO, there
      should be no risk of a regression.
      Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
      Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
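The time threshold described in the Revert Backoff commit (an initial RTO of MIN_RTO followed by N exponentially backed-off retransmissions) can be worked through with a short sketch. This is not the kernel's retransmits_timed_out(); the function name is hypothetical, the bounds are passed as parameters, and clamping each interval at a maximum RTO is an assumption about how the backoff schedule is capped.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total time, in milliseconds, spanned by one initial interval of
 * rto_min_ms plus n further intervals that double each time and are
 * clamped at rto_max_ms (the clamping is an assumption of this sketch).
 */
static uint64_t demo_backoff_timeout_ms(unsigned int n,
					uint64_t rto_min_ms,
					uint64_t rto_max_ms)
{
	uint64_t rto = rto_min_ms;
	uint64_t total = 0;
	unsigned int i;

	assert(rto_min_ms > 0 && rto_max_ms >= rto_min_ms);

	for (i = 0; i <= n; i++) {
		total += rto;
		/* Double the interval, but never beyond rto_max_ms. */
		rto = (rto > rto_max_ms / 2) ? rto_max_ms : rto * 2;
	}
	return total;
}
```

Assuming a 200 ms minimum and a 120 s maximum RTO, n = 15 gives 924600 ms: ten uncapped intervals summing to 204600 ms, plus six intervals of 120000 ms each. A timeout check would then compare the time elapsed since the first unacknowledged transmission against this value instead of comparing a retransmission counter against N.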