1. 16 12月, 2009 1 次提交
    • D
      tcp: Revert per-route SACK/DSACK/TIMESTAMP changes. · bb5b7c11
      David S. Miller 提交于
      It creates a regression, triggering badness for SYN_RECV
      sockets, for example:
      
      [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
      [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
      [19148.023035] REGS: eeecbd30 TRAP: 0700   Not tainted  (2.6.32)
      [19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002442  XER: 00000000
      [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000
      
      This is likely caused by the change in the 'estab' parameter
      passed to tcp_parse_options() when invoked by the functions
      in net/ipv4/tcp_minisocks.c
      
      But even if that is fixed, the ->conn_request() changes made in
      this patch series is fundamentally wrong.  They try to use the
      listening socket's 'dst' to probe the route settings.  The
      listening socket doesn't even have a route, and you can't
      get the right route (the child request one) until much later
      after we setup all of the state, and it must be done by hand.
      
      This stuff really isn't ready, so the best thing to do is a
      full revert.  This reverts the following commits:
      
      f55017a9
      022c3f7d
      1aba721e
      cda42ebd
      345cda2f
      dc343475
      05eaade2
      6a2a2d6bSigned-off-by: NDavid S. Miller <davem@davemloft.net>
      bb5b7c11
  2. 09 12月, 2009 1 次提交
  3. 04 12月, 2009 1 次提交
    • E
      net: Batch inet_twsk_purge · b099ce26
      Eric W. Biederman 提交于
      This function walks the whole hashtable so there is no point in
      passing it a network namespace.  Instead I purge all timewait
      sockets from dead network namespaces that I find.  If the namespace
      is one of the once I am trying to purge I am guaranteed no new timewait
      sockets can be formed so this will get them all.  If the namespace
      is one I am not acting for it might form a few more but I will
      call inet_twsk_purge again and  shortly to get rid of them.  In
      any even if the network namespace is dead timewait sockets are
      useless.
      
      Move the calls of inet_twsk_purge into batch_exit routines so
      that if I am killing a bunch of namespaces at once I will just
      call inet_twsk_purge once and save a lot of redundant unnecessary
      work.
      
      My simple 4k network namespace exit test the cleanup time dropped from
      roughly 8.2s to 1.6s.  While the time spent running inet_twsk_purge fell
      to about 2ms.  1ms for ipv4 and 1ms for ipv6.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b099ce26
  4. 03 12月, 2009 3 次提交
    • W
      TCPCT part 1g: Responder Cookie => Initiator · 4957faad
      William Allen Simpson 提交于
      Parse incoming TCP_COOKIE option(s).
      
      Calculate <SYN,ACK> TCP_COOKIE option.
      
      Send optional <SYN,ACK> data.
      
      This is a significantly revised implementation of an earlier (year-old)
      patch that no longer applies cleanly, with permission of the original
      author (Adam Langley):
      
          http://thread.gmane.org/gmane.linux.network/102586
      
      Requires:
         TCPCT part 1a: add request_values parameter for sending SYNACK
         TCPCT part 1b: generate Responder Cookie secret
         TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
         TCPCT part 1d: define TCP cookie option, extend existing struct's
         TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
         TCPCT part 1f: Initiator Cookie => Responder
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4957faad
    • W
      TCPCT part 1d: define TCP cookie option, extend existing struct's · 435cf559
      William Allen Simpson 提交于
      Data structures are carefully composed to require minimal additions.
      For example, the struct tcp_options_received cookie_plus variable fits
      between existing 16-bit and 8-bit variables, requiring no additional
      space (taking alignment into consideration).  There are no additions to
      tcp_request_sock, and only 1 pointer in tcp_sock.
      
      This is a significantly revised implementation of an earlier (year-old)
      patch that no longer applies cleanly, with permission of the original
      author (Adam Langley):
      
          http://thread.gmane.org/gmane.linux.network/102586
      
      The principle difference is using a TCP option to carry the cookie nonce,
      instead of a user configured offset in the data.  This is more flexible and
      less subject to user configuration error.  Such a cookie option has been
      suggested for many years, and is also useful without SYN data, allowing
      several related concepts to use the same extension option.
      
          "Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
          http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html
      
          "Re: what a new TCP header might look like", May 12, 1998.
          ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail
      
      These functions will also be used in subsequent patches that implement
      additional features.
      
      Requires:
         TCPCT part 1a: add request_values parameter for sending SYNACK
         TCPCT part 1b: generate Responder Cookie secret
         TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      435cf559
    • W
      TCPCT part 1a: add request_values parameter for sending SYNACK · e6b4d113
      William Allen Simpson 提交于
      Add optional function parameters associated with sending SYNACK.
      These parameters are not needed after sending SYNACK, and are not
      used for retransmission.  Avoids extending struct tcp_request_sock,
      and avoids allocating kernel memory.
      
      Also affects DCCP as it uses common struct request_sock_ops,
      but this parameter is currently reserved for future use.
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6b4d113
  5. 14 11月, 2009 1 次提交
  6. 06 11月, 2009 1 次提交
  7. 29 10月, 2009 1 次提交
  8. 19 10月, 2009 1 次提交
    • E
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet 提交于
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c720c7e8
  9. 08 10月, 2009 1 次提交
  10. 07 10月, 2009 1 次提交
    • B
      Use sk_mark for IPv6 routing lookups · 51953d5b
      Brian Haley 提交于
      Atis Elsts wrote:
      > Not sure if there is need to fill the mark from skb in tunnel xmit functions. In any case, it's not done for GRE or IPIP tunnels at the moment.
      
      Ok, I'll just drop that part, I'm not sure what should be done in this case.
      
      > Also, in this patch you are doing that for SIT (v6-in-v4) tunnels only, and not doing it for v4-in-v6 or v6-in-v6 tunnels. Any reason for that?
      
      I just sent that patch out too quickly, here's a better one with the updates.
      
      Add support for IPv6 route lookups using sk_mark.
      Signed-off-by: NBrian Haley <brian.haley@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51953d5b
  11. 15 9月, 2009 2 次提交
  12. 04 9月, 2009 1 次提交
    • C
      ipv6: Fix tcp_v6_send_response(): it didn't set skb transport header · a8fdf2b3
      Cosmin Ratiu 提交于
      Here is a patch which fixes an issue observed when using TCP over IPv6
      and AH from IPsec.
      
      When a connection gets closed the 4-way method and the last ACK from
      the server gets dropped, the subsequent FINs from the client do not
      get ACKed because tcp_v6_send_response does not set the transport
      header pointer. This causes ah6_output to try to allocate a lot of
      memory, which typically fails, so the ACKs never make it out of the
      stack.
      
      I have reproduced the problem on kernel 2.6.7, but after looking at
      the latest kernel it seems the problem is still there.
      Signed-off-by: NCosmin Ratiu <cratiu@ixiacom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8fdf2b3
  13. 03 9月, 2009 1 次提交
    • W
      tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Wu Fengguang 提交于
      This fixed a lockdep warning which appeared when doing stress
      memory tests over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin(), and it's
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
      weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
      loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa133076
  14. 02 9月, 2009 2 次提交
  15. 20 7月, 2009 2 次提交
  16. 23 6月, 2009 1 次提交
  17. 03 6月, 2009 1 次提交
  18. 22 5月, 2009 1 次提交
  19. 27 4月, 2009 1 次提交
  20. 25 2月, 2009 1 次提交
  21. 30 1月, 2009 1 次提交
    • H
      gro: Avoid copying headers of unmerged packets · 86911732
      Herbert Xu 提交于
      Unfortunately simplicity isn't always the best.  The fraginfo
      interface turned out to be suboptimal.  The problem was quite
      obvious.  For every packet, we have to copy the headers from
      the frags structure into skb->head, even though for 99% of the
      packets this part is immediately thrown away after the merge.
      
      LRO didn't have this problem because it directly read the headers
      from the frags structure.
      
      This patch attempts to address this by creating an interface
      that allows GRO to access the headers in the first frag without
      having to copy it.  Because all drivers that use frags place the
      headers in the first frag this optimisation should be enough.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86911732
  22. 09 1月, 2009 1 次提交
  23. 07 1月, 2009 1 次提交
  24. 30 12月, 2008 1 次提交
  25. 26 11月, 2008 2 次提交
  26. 20 11月, 2008 1 次提交
  27. 17 11月, 2008 1 次提交
    • E
      net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls · 3ab5aee7
      Eric Dumazet 提交于
      RCU was added to UDP lookups, using a fast infrastructure :
      - sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
        price of call_rcu() at freeing time.
      - hlist_nulls permits to use few memory barriers.
      
      This patch uses same infrastructure for TCP/DCCP established
      and timewait sockets.
      
      Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
      using short lived TCP connections. A followup patch, converting
      rwlocks to spinlocks will even speedup this case.
      
      __inet_lookup_established() is pretty fast now we dont have to
      dirty a contended cache line (read_lock/read_unlock)
      
      Only established and timewait hashtable are converted to RCU
      (bind table and listen table are still using traditional locking)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ab5aee7
  28. 30 10月, 2008 1 次提交
  29. 29 10月, 2008 1 次提交
  30. 20 10月, 2008 1 次提交
  31. 10 10月, 2008 4 次提交
    • G
      tcpv6: fix error with CONFIG_TCP_MD5SIG disabled · fa3e5b4e
      Guo-Fu Tseng 提交于
      This patch fix error with CONFIG_TCP_MD5SIG disabled.
      Signed-off-by: NGuo-Fu Tseng <cooldavid@cooldavid.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa3e5b4e
    • I
      tcpv6: combine tcp_v6_send_(reset|ack) · 626e264d
      Ilpo Järvinen 提交于
      $ codiff tcp_ipv6.o.old tcp_ipv6.o.new
      net/ipv6/tcp_ipv6.c:
        tcp_v6_md5_hash_hdr | -144
        tcp_v6_send_ack     | -585
        tcp_v6_send_reset   | -540
       3 functions changed, 1269 bytes removed, diff: -1269
      
      net/ipv6/tcp_ipv6.c:
        tcp_v6_send_response | +791
       1 function changed, 791 bytes added, diff: +791
      
      tcp_ipv6.o.new:
       4 functions changed, 791 bytes added, 1269 bytes removed, diff: -478
      
      I choose to leave the reset related netns comment in place (not
      the one that is killed) as I cannot understand its English so
      it's a bit hard for me to evaluate its usefulness :-).
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      626e264d
    • I
      tcpv6: convert opt[] -> topt in tcp_v6_send_reset · 81ada62d
      Ilpo Järvinen 提交于
      after this I get:
      
      $ diff-funcs tcp_v6_send_reset tcp_ipv6.c tcp_ipv6.c tcp_v6_send_ack
       --- tcp_ipv6.c:tcp_v6_send_reset()
       +++ tcp_ipv6.c:tcp_v6_send_ack()
      @@ -1,4 +1,5 @@
      -static void tcp_v6_send_reset(struct sock *sk, struct sk_buff *skb)
      +static void tcp_v6_send_ack(struct sk_buff *skb, u32 seq, u32 ack, u32 win,
      u32 ts,
      +                           struct tcp_md5sig_key *key)
       {
              struct tcphdr *th = tcp_hdr(skb), *t1;
              struct sk_buff *buff;
      @@ -7,31 +8,14 @@
              struct sock *ctl_sk = net->ipv6.tcp_sk;
              unsigned int tot_len = sizeof(struct tcphdr);
              __be32 *topt;
      -#ifdef CONFIG_TCP_MD5SIG
      -       struct tcp_md5sig_key *key;
      -#endif
      -
      -       if (th->rst)
      -               return;
      -
      -       if (!ipv6_unicast_destination(skb))
      -               return;
      
      +       if (ts)
      +               tot_len += TCPOLEN_TSTAMP_ALIGNED;
       #ifdef CONFIG_TCP_MD5SIG
      -       if (sk)
      -               key = tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr);
      -       else
      -               key = NULL;
      -
              if (key)
                      tot_len += TCPOLEN_MD5SIG_ALIGNED;
       #endif
      
      -       /*
      -        * We need to grab some memory, and put together an RST,
      -        * and then put it into the queue to be sent.
      -        */
      -
              buff = alloc_skb(MAX_HEADER + sizeof(struct ipv6hdr) + tot_len,
                               GFP_ATOMIC);
              if (buff == NULL)
      @@ -46,18 +30,20 @@
              t1->dest = th->source;
              t1->source = th->dest;
              t1->doff = tot_len / 4;
      -       t1->rst = 1;
      -
      -       if(th->ack) {
      -               t1->seq = th->ack_seq;
      -       } else {
      -               t1->ack = 1;
      -               t1->ack_seq = htonl(ntohl(th->seq) + th->syn + th->fin
      -                                   + skb->len - (th->doff<<2));
      -       }
      +       t1->seq = htonl(seq);
      +       t1->ack_seq = htonl(ack);
      +       t1->ack = 1;
      +       t1->window = htons(win);
      
              topt = (__be32 *)(t1 + 1);
      
      +       if (ts) {
      +               *topt++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) |
      +                               (TCPOPT_TIMESTAMP << 8) |
      TCPOLEN_TIMESTAMP);
      +               *topt++ = htonl(tcp_time_stamp);
      +               *topt++ = htonl(ts);
      +       }
      +
       #ifdef CONFIG_TCP_MD5SIG
              if (key) {
                      *topt++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) |
      @@ -84,15 +70,10 @@
              fl.fl_ip_sport = t1->source;
              security_skb_classify_flow(skb, &fl);
      
      -       /* Pass a socket to ip6_dst_lookup either it is for RST
      -        * Underlying function will use this to retrieve the network
      -        * namespace
      -        */
              if (!ip6_dst_lookup(ctl_sk, &buff->dst, &fl)) {
                      if (xfrm_lookup(&buff->dst, &fl, NULL, 0) >= 0) {
                              ip6_xmit(ctl_sk, buff, &fl, NULL, 0);
                              TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
      -                       TCP_INC_STATS_BH(net, TCP_MIB_OUTRSTS);
                              return;
                      }
              }
      
      
      ...which starts to be trivial to combine.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81ada62d
    • I