1. 13 7月, 2010 1 次提交
  2. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  3. 22 3月, 2010 1 次提交
  4. 06 3月, 2010 1 次提交
  5. 16 12月, 2009 1 次提交
    • D
      tcp: Revert per-route SACK/DSACK/TIMESTAMP changes. · bb5b7c11
      David S. Miller 提交于
      It creates a regression, triggering badness for SYN_RECV
      sockets, for example:
      
      [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
      [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
      [19148.023035] REGS: eeecbd30 TRAP: 0700   Not tainted  (2.6.32)
      [19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002442  XER: 00000000
      [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000
      
      This is likely caused by the change in the 'estab' parameter
      passed to tcp_parse_options() when invoked by the functions
      in net/ipv4/tcp_minisocks.c
      
      But even if that is fixed, the ->conn_request() changes made in
      this patch series is fundamentally wrong.  They try to use the
      listening socket's 'dst' to probe the route settings.  The
      listening socket doesn't even have a route, and you can't
      get the right route (the child request one) until much later
      after we setup all of the state, and it must be done by hand.
      
      This stuff really isn't ready, so the best thing to do is a
      full revert.  This reverts the following commits:
      
      f55017a9
      022c3f7d
      1aba721e
      cda42ebd
      345cda2f
      dc343475
      05eaade2
      6a2a2d6bSigned-off-by: NDavid S. Miller <davem@davemloft.net>
      bb5b7c11
  6. 03 12月, 2009 3 次提交
    • W
      TCPCT part 1g: Responder Cookie => Initiator · 4957faad
      William Allen Simpson 提交于
      Parse incoming TCP_COOKIE option(s).
      
      Calculate <SYN,ACK> TCP_COOKIE option.
      
      Send optional <SYN,ACK> data.
      
      This is a significantly revised implementation of an earlier (year-old)
      patch that no longer applies cleanly, with permission of the original
      author (Adam Langley):
      
          http://thread.gmane.org/gmane.linux.network/102586
      
      Requires:
         TCPCT part 1a: add request_values parameter for sending SYNACK
         TCPCT part 1b: generate Responder Cookie secret
         TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
         TCPCT part 1d: define TCP cookie option, extend existing struct's
         TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
         TCPCT part 1f: Initiator Cookie => Responder
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4957faad
    • W
      TCPCT part 1d: define TCP cookie option, extend existing struct's · 435cf559
      William Allen Simpson 提交于
      Data structures are carefully composed to require minimal additions.
      For example, the struct tcp_options_received cookie_plus variable fits
      between existing 16-bit and 8-bit variables, requiring no additional
      space (taking alignment into consideration).  There are no additions to
      tcp_request_sock, and only 1 pointer in tcp_sock.
      
      This is a significantly revised implementation of an earlier (year-old)
      patch that no longer applies cleanly, with permission of the original
      author (Adam Langley):
      
          http://thread.gmane.org/gmane.linux.network/102586
      
      The principle difference is using a TCP option to carry the cookie nonce,
      instead of a user configured offset in the data.  This is more flexible and
      less subject to user configuration error.  Such a cookie option has been
      suggested for many years, and is also useful without SYN data, allowing
      several related concepts to use the same extension option.
      
          "Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
          http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html
      
          "Re: what a new TCP header might look like", May 12, 1998.
          ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail
      
      These functions will also be used in subsequent patches that implement
      additional features.
      
      Requires:
         TCPCT part 1a: add request_values parameter for sending SYNACK
         TCPCT part 1b: generate Responder Cookie secret
         TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      435cf559
    • W
      TCPCT part 1a: add request_values parameter for sending SYNACK · e6b4d113
      William Allen Simpson 提交于
      Add optional function parameters associated with sending SYNACK.
      These parameters are not needed after sending SYNACK, and are not
      used for retransmission.  Avoids extending struct tcp_request_sock,
      and avoids allocating kernel memory.
      
      Also affects DCCP as it uses common struct request_sock_ops,
      but this parameter is currently reserved for future use.
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6b4d113
  7. 22 11月, 2009 1 次提交
  8. 14 11月, 2009 1 次提交
  9. 05 11月, 2009 1 次提交
  10. 29 10月, 2009 2 次提交
  11. 20 10月, 2009 2 次提交
    • J
      tcp: accept socket after TCP_DEFER_ACCEPT period · d1b99ba4
      Julian Anastasov 提交于
      Willy Tarreau and many other folks in recent years
      were concerned what happens when the TCP_DEFER_ACCEPT period
      expires for clients which sent ACK packet. They prefer clients
      that actively resend ACK on our SYN-ACK retransmissions to be
      converted from open requests to sockets and queued to the
      listener for accepting after the deferring period is finished.
      Then application server can decide to wait longer for data
      or to properly terminate the connection with FIN if read()
      returns EAGAIN which is an indication for accepting after
      the deferring period. This change still can have side effects
      for applications that expect always to see data on the accepted
      socket. Others can be prepared to work in both modes (with or
      without TCP_DEFER_ACCEPT period) and their data processing can
      ignore the read=EAGAIN notification and to allocate resources for
      clients which proved to have no data to send during the deferring
      period. OTOH, servers that use TCP_DEFER_ACCEPT=1 as flag (not
      as a timeout) to wait for data will notice clients that didn't
      send data for 3 seconds but that still resend ACKs.
      Thanks to Willy Tarreau for the initial idea and to
      Eric Dumazet for the review and testing the change.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1b99ba4
    • D
      Revert "tcp: fix tcp_defer_accept to consider the timeout" · a1a2ad91
      David S. Miller 提交于
      This reverts commit 6d01a026.
      
      Julian Anastasov, Willy Tarreau and Eric Dumazet have come up
      with a more correct way to deal with this.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1a2ad91
  12. 13 10月, 2009 1 次提交
    • W
      tcp: fix tcp_defer_accept to consider the timeout · 6d01a026
      Willy Tarreau 提交于
      I was trying to use TCP_DEFER_ACCEPT and noticed that if the
      client does not talk, the connection is never accepted and
      remains in SYN_RECV state until the retransmits expire, where
      it finally is deleted. This is bad when some firewall such as
      netfilter sits between the client and the server because the
      firewall sees the connection in ESTABLISHED state while the
      server will finally silently drop it without sending an RST.
      
      This behaviour contradicts the man page which says it should
      wait only for some time :
      
             TCP_DEFER_ACCEPT (since Linux 2.4)
                Allows a listener to be awakened only when data arrives
                on the socket.  Takes an integer value  (seconds), this
                can  bound  the  maximum  number  of attempts TCP will
                make to complete the connection. This option should not
                be used in code intended to be portable.
      
      Also, looking at ipv4/tcp.c, a retransmit counter is correctly
      computed :
      
              case TCP_DEFER_ACCEPT:
                      icsk->icsk_accept_queue.rskq_defer_accept = 0;
                      if (val > 0) {
                              /* Translate value in seconds to number of
                               * retransmits */
                              while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                                     val > ((TCP_TIMEOUT_INIT / HZ) <<
                                             icsk->icsk_accept_queue.rskq_defer_accept))
                                      icsk->icsk_accept_queue.rskq_defer_accept++;
                              icsk->icsk_accept_queue.rskq_defer_accept++;
                      }
                      break;
      
      ==> rskq_defer_accept is used as a counter of retransmits.
      
      But in tcp_minisocks.c, this counter is only checked. And in
      fact, I have found no location which updates it. So I think
      that what was intended was to decrease it in tcp_minisocks
      whenever it is checked, which the trivial patch below does.
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d01a026
  13. 16 9月, 2009 1 次提交
    • R
      tcp: fix CONFIG_TCP_MD5SIG + CONFIG_PREEMPT timer BUG() · 657e9649
      Robert Varga 提交于
      I have recently came across a preemption imbalance detected by:
      
      <4>huh, entered ffffffff80644630 with preempt_count 00000102, exited with 00000101?
      <0>------------[ cut here ]------------
      <2>kernel BUG at /usr/src/linux/kernel/timer.c:664!
      <0>invalid opcode: 0000 [1] PREEMPT SMP
      
      with ffffffff80644630 being inet_twdr_hangman().
      
      This appeared after I enabled CONFIG_TCP_MD5SIG and played with it a
      bit, so I looked at what might have caused it.
      
      One thing that struck me as strange is tcp_twsk_destructor(), as it
      calls tcp_put_md5sig_pool() -- which entails a put_cpu(), causing the
      detected imbalance. Found on 2.6.23.9, but 2.6.31 is affected as well,
      as far as I can tell.
      Signed-off-by: NRobert Varga <nite@hq.alert.sk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      657e9649
  14. 15 9月, 2009 1 次提交
  15. 03 9月, 2009 1 次提交
    • W
      tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Wu Fengguang 提交于
      This fixed a lockdep warning which appeared when doing stress
      memory tests over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin(), and it's
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
      weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
      loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa133076
  16. 29 8月, 2009 1 次提交
  17. 26 6月, 2009 1 次提交
    • W
      tcp: missing check ACK flag of received segment in FIN-WAIT-2 state · 1ac530b3
      Wei Yongjun 提交于
      RFC0793 defined that in FIN-WAIT-2 state if the ACK bit is off drop
      the segment and return[Page 72]. But this check is missing in function
      tcp_timewait_state_process(). This cause the segment with FIN flag but
      no ACK has two diffent action:
      
      Case 1:
          Node A                      Node B
                    <-------------    FIN,ACK
                                      (enter FIN-WAIT-1)
          ACK       ------------->
                                      (enter FIN-WAIT-2)
          FIN       ------------->    discard
                                      (move sk to tw list)
      
      Case 2:
          Node A                      Node B
                    <-------------    FIN,ACK
                                      (enter FIN-WAIT-1)
          ACK       ------------->
                                      (enter FIN-WAIT-2)
                                      (move sk to tw list)
          FIN       ------------->
      
                    <-------------    ACK
      
      This patch fixed the problem.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ac530b3
  18. 16 3月, 2009 1 次提交
    • I
      tcp: consolidate paws check · c887e6d2
      Ilpo Järvinen 提交于
      Wow, it was quite tricky to merge that stream of negations
      but I think I finally got it right:
      
      check & replace_ts_recent:
      (s32)(rcv_tsval - ts_recent) >= 0                  => 0
      (s32)(ts_recent - rcv_tsval) <= 0                  => 0
      
      discard:
      (s32)(ts_recent - rcv_tsval)  > TCP_PAWS_WINDOW    => 1
      (s32)(ts_recent - rcv_tsval) <= TCP_PAWS_WINDOW    => 0
      
      I toggled the return values of tcp_paws_check around since
      the old encoding added yet-another negation making tracking
      of truth-values really complicated.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c887e6d2
  19. 03 3月, 2009 1 次提交
  20. 02 3月, 2009 1 次提交
  21. 03 11月, 2008 1 次提交
  22. 08 10月, 2008 1 次提交
    • I
      tcp: kill pointless urg_mode · 33f5f57e
      Ilpo Järvinen 提交于
      It all started from me noticing that this urgent check in
      tcp_clean_rtx_queue is unnecessarily inside the loop. Then
      I took a longer look to it and found out that the users of
      urg_mode can trivially do without, well almost, there was
      one gotcha.
      
      Bonus: those funny people who use urg with >= 2^31 write_seq -
      snd_una could now rejoice too (that's the only purpose for the
      between being there, otherwise a simple compare would have done
      the thing). Not that I assume that the rest of the tcp code
      happily lives with such mind-boggling numbers :-). Alas, it
      turned out to be impossible to set wmem to such numbers anyway,
      yes I really tried a big sendfile after setting some wmem but
      nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf...
      So I hacked a bit variable to long and found out that it seems
      to work... :-)
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33f5f57e
  23. 08 8月, 2008 1 次提交
  24. 07 8月, 2008 1 次提交
    • G
      tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup · 6edafaaf
      Gui Jianfeng 提交于
      If the following packet flow happen, kernel will panic.
      MathineA			MathineB
      		SYN
      	---------------------->    
              	SYN+ACK
      	<----------------------
      		ACK(bad seq)
      	---------------------->
      When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr))
      is finally called by tcp_v4_reqsk_send_ack(), but the first parameter(skb->sk) is 
      NULL at that moment, so kernel panic happens.
      This patch fixes this bug.
      
      OOPS output is as following:
      [  302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42
      [  302.817075] Oops: 0000 [#1] SMP 
      [  302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
      [  302.849946] 
      [  302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
      [  302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0
      [  302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
      [  302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
      [  302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
      [  302.868333]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      [  302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
      [  302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 
      [  302.883275]        c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 
      [  302.890971]        ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 
      [  302.900140] Call Trace:
      [  302.902392]  [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35
      [  302.907060]  [<c05d28f8>] tcp_check_req+0x156/0x372
      [  302.910082]  [<c04250a3>] printk+0x14/0x18
      [  302.912868]  [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf
      [  302.917423]  [<c05d26be>] tcp_v4_rcv+0x563/0x5b9
      [  302.920453]  [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183
      [  302.923865]  [<c05bb10a>] ip_rcv_finish+0x286/0x2a3
      [  302.928569]  [<c059e438>] dev_alloc_skb+0x11/0x25
      [  302.931563]  [<c05a211f>] netif_receive_skb+0x2d6/0x33a
      [  302.934914]  [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32]
      [  302.938735]  [<c05a3b48>] net_rx_action+0x5c/0xfe
      [  302.941792]  [<c042856b>] __do_softirq+0x5d/0xc1
      [  302.944788]  [<c042850e>] __do_softirq+0x0/0xc1
      [  302.948999]  [<c040564b>] do_softirq+0x55/0x88
      [  302.951870]  [<c04501b1>] handle_fasteoi_irq+0x0/0xa4
      [  302.954986]  [<c04284da>] irq_exit+0x35/0x69
      [  302.959081]  [<c0405717>] do_IRQ+0x99/0xae
      [  302.961896]  [<c040422b>] common_interrupt+0x23/0x28
      [  302.966279]  [<c040819d>] default_idle+0x2a/0x3d
      [  302.969212]  [<c0402552>] cpu_idle+0xb2/0xd2
      [  302.972169]  =======================
      [  302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 
      [  303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
      [  303.018360] Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6edafaaf
  25. 17 7月, 2008 2 次提交
  26. 13 6月, 2008 1 次提交
    • D
      tcp: Revert 'process defer accept as established' changes. · ec0a1966
      David S. Miller 提交于
      This reverts two changesets, ec3c0982
      ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
      the follow-on bug fix 9ae27e0a
      ("tcp: Fix slab corruption with ipv6 and tcp6fuzz").
      
      This change causes several problems, first reported by Ingo Molnar
      as a distcc-over-loopback regression where connections were getting
      stuck.
      
      Ilpo Järvinen first spotted the locking problems.  The new function
      added by this code, tcp_defer_accept_check(), only has the
      child socket locked, yet it is modifying state of the parent
      listening socket.
      
      Fixing that is non-trivial at best, because we can't simply just grab
      the parent listening socket lock at this point, because it would
      create an ABBA deadlock.  The normal ordering is parent listening
      socket --> child socket, but this code path would require the
      reverse lock ordering.
      
      Next is a problem noticed by Vitaliy Gusev, he noted:
      
      ----------------------------------------
      >--- a/net/ipv4/tcp_timer.c
      >+++ b/net/ipv4/tcp_timer.c
      >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
      > 		goto death;
      > 	}
      >
      >+	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
      >+		tcp_send_active_reset(sk, GFP_ATOMIC);
      >+		goto death;
      
      Here socket sk is not attached to listening socket's request queue. tcp_done()
      will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
      release this sk) as socket is not DEAD. Therefore socket sk will be lost for
      freeing.
      ----------------------------------------
      
      Finally, Alexey Kuznetsov argues that there might not even be any
      real value or advantage to these new semantics even if we fix all
      of the bugs:
      
      ----------------------------------------
      Hiding from accept() sockets with only out-of-order data only
      is the only thing which is impossible with old approach. Is this really
      so valuable? My opinion: no, this is nothing but a new loophole
      to consume memory without control.
      ----------------------------------------
      
      So revert this thing for now.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec0a1966
  27. 12 6月, 2008 1 次提交
  28. 22 3月, 2008 1 次提交
    • P
      [TCP]: TCP_DEFER_ACCEPT updates - process as established · ec3c0982
      Patrick McManus 提交于
      Change TCP_DEFER_ACCEPT implementation so that it transitions a
      connection to ESTABLISHED after handshake is complete instead of
      leaving it in SYN-RECV until some data arrvies. Place connection in
      accept queue when first data packet arrives from slow path.
      
      Benefits:
        - established connection is now reset if it never makes it
         to the accept queue
      
       - diagnostic state of established matches with the packet traces
         showing completed handshake
      
       - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be
         enforced with reasonable accuracy instead of rounding up to next
         exponential back-off of syn-ack retry.
      Signed-off-by: NPatrick McManus <mcmanus@ducksong.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec3c0982
  29. 04 3月, 2008 1 次提交
  30. 01 3月, 2008 1 次提交
  31. 11 10月, 2007 3 次提交
  32. 26 4月, 2007 2 次提交