1. 02 10月, 2017 4 次提交
  2. 23 8月, 2017 1 次提交
  3. 02 7月, 2017 1 次提交
  4. 01 7月, 2017 1 次提交
  5. 25 4月, 2017 2 次提交
    • W
      net/tcp_fastopen: Add snmp counter for blackhole detection · 46c2fa39
      Wei Wang 提交于
      This counter records the number of times the firewall blackhole issue is
      detected and active TFO is disabled.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46c2fa39
    • W
      net/tcp_fastopen: Disable active side TFO in certain scenarios · cf1ef3f0
      Wei Wang 提交于
      Middlebox firewall issues can potentially cause server's data being
      blackholed after a successful 3WHS using TFO. Following are the related
      reports from Apple:
      https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
      Slide 31 identifies an issue where the client ACK to the server's data
      sent during a TFO'd handshake is dropped.
      C ---> syn-data ---> S
      C <--- syn/ack ----- S
      C (accept & write)
      C <---- data ------- S
      C ----- ACK -> X     S
      		[retry and timeout]
      
      https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
      Slide 5 shows a similar situation that the server's data gets dropped
      after 3WHS.
      C ---- syn-data ---> S
      C <--- syn/ack ----- S
      C ---- ack --------> S
      S (accept & write)
      C?  X <- data ------ S
      		[retry and timeout]
      
      This is the worst failure b/c the client can not detect such behavior to
      mitigate the situation (such as disabling TFO). Failing to proceed, the
      application (e.g., SSL library) may simply timeout and retry with TFO
      again, and the process repeats indefinitely.
      
      The proposed solution is to disable active TFO globally under the
      following circumstances:
      1. client side TFO socket detects out of order FIN
      2. client side TFO socket receives out of order RST
      
      We disable active side TFO globally for 1hr at first. Then if it
      happens again, we disable it for 2h, then 4h, 8h, ...
      And we reset the timeout to 1hr if a client side TFO sockets not opened
      on loopback has successfully received data segs from server.
      And we examine this condition during close().
      
      The rational behind it is that when such firewall issue happens,
      application running on the client should eventually close the socket as
      it is not able to get the data it is expecting. Or application running
      on the server should close the socket as it is not able to receive any
      response from client.
      In both cases, out of order FIN or RST will get received on the client
      given that the firewall will not block them as no data are in those
      frames.
      And we want to disable active TFO globally as it helps if the middle box
      is very close to the client and most of the connections are likely to
      fail.
      
      Also, add a debug sysctl:
        tcp_fastopen_blackhole_detect_timeout_sec:
          the initial timeout to use when firewall blackhole issue happens.
          This can be set and read.
          When setting it to 0, it means to disable the active disable logic.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf1ef3f0
  6. 26 1月, 2017 2 次提交
    • W
      net/tcp-fastopen: Add new API support · 19f6d3f3
      Wei Wang 提交于
      This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
      alternative way to perform Fast Open on the active side (client). Prior
      to this patch, a client needs to replace the connect() call with
      sendto(MSG_FASTOPEN). This can be cumbersome for applications who want
      to use Fast Open: these socket operations are often done in lower layer
      libraries used by many other applications. Changing these libraries
      and/or the socket call sequences are not trivial. A more convenient
      approach is to perform Fast Open by simply enabling a socket option when
      the socket is created w/o changing other socket calls sequence:
        s = socket()
          create a new socket
        setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
          newly introduced sockopt
          If set, new functionality described below will be used.
          Return ENOTSUPP if TFO is not supported or not enabled in the
          kernel.
      
        connect()
          With cookie present, return 0 immediately.
          With no cookie, initiate 3WHS with TFO cookie-request option and
          return -1 with errno = EINPROGRESS.
      
        write()/sendmsg()
          With cookie present, send out SYN with data and return the number of
          bytes buffered.
          With no cookie, and 3WHS not yet completed, return -1 with errno =
          EINPROGRESS.
          No MSG_FASTOPEN flag is needed.
      
        read()
          Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but
          write() is not called yet.
          Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is
          established but no msg is received yet.
          Return number of bytes read if socket is established and there is
          msg received.
      
      The new API simplifies life for applications that always perform a write()
      immediately after a successful connect(). Such applications can now take
      advantage of Fast Open by merely making one new setsockopt() call at the time
      of creating the socket. Nothing else about the application's socket call
      sequence needs to change.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19f6d3f3
    • W
      net/tcp-fastopen: refactor cookie check logic · 065263f4
      Wei Wang 提交于
      Refactor the cookie check logic in tcp_send_syn_data() into a function.
      This function will be called else where in later changes.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      065263f4
  7. 20 1月, 2017 1 次提交
    • A
      tcp: initialize max window for a new fastopen socket · 0dbd7ff3
      Alexey Kodanev 提交于
      Found that if we run LTP netstress test with large MSS (65K),
      the first attempt from server to send data comparable to this
      MSS on fastopen connection will be delayed by the probe timer.
      
      Here is an example:
      
           < S  seq 0:0 win 43690 options [mss 65495 wscale 7 tfo cookie] length 32
           > S. seq 0:0 ack 1 win 43690 options [mss 65495 wscale 7] length 0
           < .  ack 1 win 342 length 0
      
      Inside tcp_sendmsg(), tcp_send_mss() returns max MSS in 'mss_now',
      as well as in 'size_goal'. This results the segment not queued for
      transmition until all the data copied from user buffer. Then, inside
      __tcp_push_pending_frames(), it breaks on send window test and
      continues with the check probe timer.
      
      Fragmentation occurs in tcp_write_wakeup()...
      
      +0.2 > P. seq 1:43777 ack 1 win 342 length 43776
           < .  ack 43777, win 1365 length 0
           > P. seq 43777:65001 ack 1 win 342 options [...] length 21224
           ...
      
      This also contradicts with the fact that we should bound to the half
      of the window if it is large.
      
      Fix this flaw by correctly initializing max_window. Before that, it
      could have large values that affect further calculations of 'size_goal'.
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0dbd7ff3
  8. 14 1月, 2017 1 次提交
    • S
      tcp: fix tcp_fastopen unaligned access complaints on sparc · 003c9410
      Shannon Nelson 提交于
      Fix up a data alignment issue on sparc by swapping the order
      of the cookie byte array field with the length field in
      struct tcp_fastopen_cookie, and making it a proper union
      to clean up the typecasting.
      
      This addresses log complaints like these:
          log_unaligned: 113 callbacks suppressed
          Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
          Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
          Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
          Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
          Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      003c9410
  9. 09 9月, 2016 1 次提交
  10. 02 9月, 2016 1 次提交
  11. 03 5月, 2016 1 次提交
  12. 28 4月, 2016 1 次提交
  13. 15 3月, 2016 1 次提交
    • M
      tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In · a44d6eac
      Martin KaFai Lau 提交于
      Per RFC4898, they count segments sent/received
      containing a positive length data segment (that includes
      retransmission segments carrying data).  Unlike
      tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments
      carrying no data (e.g. pure ack).
      
      The patch also updates the segs_in in tcp_fastopen_add_skb()
      so that segs_in >= data_segs_in property is kept.
      
      Together with retransmission data, tcpi_data_segs_out
      gives a better signal on the rxmit rate.
      
      v6: Rebase on the latest net-next
      
      v5: Eric pointed out that checking skb->len is still needed in
      tcp_fastopen_add_skb() because skb can carry a FIN without data.
      Hence, instead of open coding segs_in and data_segs_in, tcp_segs_in()
      helper is used.  Comment is added to the fastopen case to explain why
      segs_in has to be reset and tcp_segs_in() has to be called before
      __skb_pull().
      
      v4: Add comment to the changes in tcp_fastopen_add_skb()
      and also add remark on this case in the commit message.
      
      v3: Add const modifier to the skb parameter in tcp_segs_in()
      
      v2: Rework based on recent fix by Eric:
      commit a9d99ce2 ("tcp: fix tcpi_segs_in after connection establishment")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Chris Rapier <rapier@psc.edu>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a44d6eac
  14. 07 2月, 2016 1 次提交
    • E
      tcp: fastopen: call tcp_fin() if FIN present in SYNACK · e3e17b77
      Eric Dumazet 提交于
      When we acknowledge a FIN, it is not enough to ack the sequence number
      and queue the skb into receive queue. We also have to call tcp_fin()
      to properly update socket state and send proper poll() notifications.
      
      It seems we also had the problem if we received a SYN packet with the
      FIN flag set, but it does not seem an urgent issue, as no known
      implementation can do that.
      
      Fixes: 61d2bcae ("tcp: fastopen: accept data/FIN present in SYNACK message")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3e17b77
  15. 06 2月, 2016 2 次提交
  16. 27 1月, 2016 1 次提交
  17. 23 10月, 2015 1 次提交
    • E
      tcp/dccp: fix hashdance race for passive sessions · 5e0724d0
      Eric Dumazet 提交于
      Multiple cpus can process duplicates of incoming ACK messages
      matching a SYN_RECV request socket. This is a rare event under
      normal operations, but definitely can happen.
      
      Only one must win the race, otherwise corruption would occur.
      
      To fix this without adding new atomic ops, we use logic in
      inet_ehash_nolisten() to detect the request was present in the same
      ehash bucket where we try to insert the new child.
      
      If request socket was not found, we have to undo the child creation.
      
      This actually removes a spin_lock()/spin_unlock() pair in
      reqsk_queue_unlink() for the fast path.
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e0724d0
  18. 05 10月, 2015 1 次提交
    • E
      tcp: fix fastopen races vs lockless listener · 7656d842
      Eric Dumazet 提交于
      There are multiple races that need fixes :
      
      1) skb_get() + queue skb + kfree_skb() is racy
      
      An accept() can be done on another cpu, data consumed immediately.
      tcp_recvmsg() uses __kfree_skb() as it is assumed all skb found in
      socket receive queue are private.
      
      Then the kfree_skb() in tcp_rcv_state_process() uses an already freed skb
      
      2) tcp_reqsk_record_syn() needs to be done before tcp_try_fastopen()
      for the same reasons.
      
      3) We want to send the SYNACK before queueing child into accept queue,
      otherwise we might reintroduce the ooo issue fixed in
      commit 7c85af88 ("tcp: avoid reorders for TFO passive connections")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7656d842
  19. 03 10月, 2015 1 次提交
    • E
      tcp: attach SYNACK messages to request sockets instead of listener · ca6fb065
      Eric Dumazet 提交于
      If a listen backlog is very big (to avoid syncookies), then
      the listener sk->sk_wmem_alloc is the main source of false
      sharing, as we need to touch it twice per SYNACK re-transmit
      and TX completion.
      
      (One SYN packet takes listener lock once, but up to 6 SYNACK
      are generated)
      
      By attaching the skb to the request socket, we remove this
      source of contention.
      
      Tested:
      
       listen(fd, 10485760); // single listener (no SO_REUSEPORT)
       16 RX/TX queue NIC
       Sustain a SYNFLOOD attack of ~320,000 SYN per second,
       Sending ~1,400,000 SYNACK per second.
       Perf profiles now show listener spinlock being next bottleneck.
      
          20.29%  [kernel]  [k] queued_spin_lock_slowpath
          10.06%  [kernel]  [k] __inet_lookup_established
           5.12%  [kernel]  [k] reqsk_timer_handler
           3.22%  [kernel]  [k] get_next_timer_interrupt
           3.00%  [kernel]  [k] tcp_make_synack
           2.77%  [kernel]  [k] ipt_do_table
           2.70%  [kernel]  [k] run_timer_softirq
           2.50%  [kernel]  [k] ip_finish_output
           2.04%  [kernel]  [k] cascade
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca6fb065
  20. 30 9月, 2015 1 次提交
    • E
      tcp: prepare fastopen code for upcoming listener changes · 0536fcc0
      Eric Dumazet 提交于
      While auditing TCP stack for upcoming 'lockless' listener changes,
      I found I had to change fastopen_init_queue() to properly init the object
      before publishing it.
      
      Otherwise an other cpu could try to lock the spinlock before it gets
      properly initialized.
      
      Instead of adding appropriate barriers, just remove dynamic memory
      allocations :
      - Structure is 28 bytes on 64bit arches. Using additional 8 bytes
        for holding a pointer seems overkill.
      - Two listeners can share same cache line and performance would suffer.
      
      If we really want to save few bytes, we would instead dynamically allocate
      whole struct request_sock_queue in the future.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0536fcc0
  21. 29 9月, 2015 1 次提交
    • E
      tcp: avoid reorders for TFO passive connections · 7c85af88
      Eric Dumazet 提交于
      We found that a TCP Fast Open passive connection was vulnerable
      to reorders, as the exchange might look like
      
      [1] C -> S S <FO ...> <request>
      [2] S -> C S. ack request <options>
      [3] S -> C . <answer>
      
      packets [2] and [3] can be generated at almost the same time.
      
      If C receives the 3rd packet before the 2nd, it will drop it as
      the socket is in SYN_SENT state and expects a SYNACK.
      
      S will have to retransmit the answer.
      
      Current OOO avoidance in linux is defeated because SYNACK
      packets are attached to the LISTEN socket, while DATA packets
      are attached to the children. They might be sent by different cpus,
      and different TX queues might be selected.
      
      It turns out that for TFO, we created a child, which is a
      full blown socket in TCP_SYN_RECV state, and we simply can attach
      the SYNACK packet to this socket.
      
      This means that at the time tcp_sendmsg() pushes DATA packet,
      skb->ooo_okay will be set iff the SYNACK packet had been sent
      and TX completed.
      
      This removes the reorder source at the host level.
      
      We also removed the export of tcp_try_fastopen(), as it is no
      longer called from IPv6.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c85af88
  22. 23 6月, 2015 1 次提交
    • C
      tcp: Do not call tcp_fastopen_reset_cipher from interrupt context · dfea2aa6
      Christoph Paasch 提交于
      tcp_fastopen_reset_cipher really cannot be called from interrupt
      context. It allocates the tcp_fastopen_context with GFP_KERNEL and
      calls crypto_alloc_cipher, which allocates all kind of stuff with
      GFP_KERNEL.
      
      Thus, we might sleep when the key-generation is triggered by an
      incoming TFO cookie-request which would then happen in interrupt-
      context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:
      
      [   36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
      [   36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
      [   36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
      [   36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [   36.008250]  00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
      [   36.009630]  ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
      [   36.011076]  0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
      [   36.012494] Call Trace:
      [   36.012953]  <IRQ>  [<ffffffff8171d53a>] dump_stack+0x4f/0x6d
      [   36.014085]  [<ffffffff810967d3>] ___might_sleep+0x103/0x170
      [   36.015117]  [<ffffffff81096892>] __might_sleep+0x52/0x90
      [   36.016117]  [<ffffffff8118e887>] kmem_cache_alloc_trace+0x47/0x190
      [   36.017266]  [<ffffffff81680d82>] ? tcp_fastopen_reset_cipher+0x42/0x130
      [   36.018485]  [<ffffffff81680d82>] tcp_fastopen_reset_cipher+0x42/0x130
      [   36.019679]  [<ffffffff81680f01>] tcp_fastopen_init_key_once+0x61/0x70
      [   36.020884]  [<ffffffff81680f2c>] __tcp_fastopen_cookie_gen+0x1c/0x60
      [   36.022058]  [<ffffffff816814ff>] tcp_try_fastopen+0x58f/0x730
      [   36.023118]  [<ffffffff81671788>] tcp_conn_request+0x3e8/0x7b0
      [   36.024185]  [<ffffffff810e3872>] ? __module_text_address+0x12/0x60
      [   36.025327]  [<ffffffff8167b2e1>] tcp_v4_conn_request+0x51/0x60
      [   36.026410]  [<ffffffff816727e0>] tcp_rcv_state_process+0x190/0xda0
      [   36.027556]  [<ffffffff81661f97>] ? __inet_lookup_established+0x47/0x170
      [   36.028784]  [<ffffffff8167c2ad>] tcp_v4_do_rcv+0x16d/0x3d0
      [   36.029832]  [<ffffffff812e6806>] ? security_sock_rcv_skb+0x16/0x20
      [   36.030936]  [<ffffffff8167cc8a>] tcp_v4_rcv+0x77a/0x7b0
      [   36.031875]  [<ffffffff816af8c3>] ? iptable_filter_hook+0x33/0x70
      [   36.032953]  [<ffffffff81657d22>] ip_local_deliver_finish+0x92/0x1f0
      [   36.034065]  [<ffffffff81657f1a>] ip_local_deliver+0x9a/0xb0
      [   36.035069]  [<ffffffff81657c90>] ? ip_rcv+0x3d0/0x3d0
      [   36.035963]  [<ffffffff81657569>] ip_rcv_finish+0x119/0x330
      [   36.036950]  [<ffffffff81657ba7>] ip_rcv+0x2e7/0x3d0
      [   36.037847]  [<ffffffff81610652>] __netif_receive_skb_core+0x552/0x930
      [   36.038994]  [<ffffffff81610a57>] __netif_receive_skb+0x27/0x70
      [   36.040033]  [<ffffffff81610b72>] process_backlog+0xd2/0x1f0
      [   36.041025]  [<ffffffff81611482>] net_rx_action+0x122/0x310
      [   36.042007]  [<ffffffff81076743>] __do_softirq+0x103/0x2f0
      [   36.042978]  [<ffffffff81723e3c>] do_softirq_own_stack+0x1c/0x30
      
      This patch moves the call to tcp_fastopen_init_key_once to the places
      where a listener socket creates its TFO-state, which always happens in
      user-context (either from the setsockopt, or implicitly during the
      listen()-call)
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Fixes: 222e83d2 ("tcp: switch tcp_fastopen key generation to net_get_random_once")
      Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfea2aa6
  23. 23 5月, 2015 1 次提交
    • E
      tcp: fix a potential deadlock in tcp_get_info() · d654976c
      Eric Dumazet 提交于
      Taking socket spinlock in tcp_get_info() can deadlock, as
      inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
      while packet processing can use the reverse locking order.
      
      We could avoid this locking for TCP_LISTEN states, but lockdep would
      certainly get confused as all TCP sockets share same lockdep classes.
      
      [  523.722504] ======================================================
      [  523.728706] [ INFO: possible circular locking dependency detected ]
      [  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
      [  523.739202] -------------------------------------------------------
      [  523.745474] ss/18032 is trying to acquire lock:
      [  523.750002]  (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
      [  523.758129]
      [  523.758129] but task is already holding lock:
      [  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
      [  523.774661]
      [  523.774661] which lock already depends on the new lock.
      [  523.774661]
      [  523.782850]
      [  523.782850] the existing dependency chain (in reverse order) is:
      [  523.790326]
      -> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
      [  523.796599]        [<ffffffff811126bb>] lock_acquire+0xbb/0x270
      [  523.802565]        [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
      [  523.808628]        [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
      [  523.815273]        [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
      [  523.822067]        [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
      [  523.828199]        [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
      [  523.834331]        [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
      [  523.840202]        [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
      [  523.847214]        [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
      [  523.853440]        [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
      [  523.859624]        [<ffffffff81659db7>] ip_rcv+0x307/0x420
      
      Lets use u64_sync infrastructure instead. As a bonus, 64bit
      arches get optimized, as these are nop for them.
      
      Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d654976c
  24. 30 4月, 2015 1 次提交
    • E
      tcp: add tcpi_bytes_received to tcp_info · bdd1f9ed
      Eric Dumazet 提交于
      This patch tracks total number of payload bytes received on a TCP socket.
      This is the sum of all changes done to tp->rcv_nxt
      
      RFC4898 named this : tcpEStatsAppHCThruOctetsReceived
      
      This is a 64bit field, and can be fetched both from TCP_INFO
      getsockopt() if one has a handle on a TCP socket, or from inet_diag
      netlink facility (iproute2/ss patch will follow)
      
      Note that tp->bytes_received was placed near tp->rcv_nxt for
      best data locality and minimal performance impact.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Matt Mathis <mattmathis@google.com>
      Cc: Eric Salo <salo@google.com>
      Cc: Martin Lau <kafai@fb.com>
      Cc: Chris Rapier <rapier@psc.edu>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdd1f9ed
  25. 08 4月, 2015 1 次提交
  26. 04 4月, 2015 1 次提交
  27. 21 3月, 2015 1 次提交
    • E
      inet: get rid of central tcp/dccp listener timer · fa76ce73
      Eric Dumazet 提交于
      One of the major issue for TCP is the SYNACK rtx handling,
      done by inet_csk_reqsk_queue_prune(), fired by the keepalive
      timer of a TCP_LISTEN socket.
      
      This function runs for awful long times, with socket lock held,
      meaning that other cpus needing this lock have to spin for hundred of ms.
      
      SYNACK are sent in huge bursts, likely to cause severe drops anyway.
      
      This model was OK 15 years ago when memory was very tight.
      
      We now can afford to have a timer per request sock.
      
      Timer invocations no longer need to lock the listener,
      and can be run from all cpus in parallel.
      
      With following patch increasing somaxconn width to 32 bits,
      I tested a listener with more than 4 million active request sockets,
      and a steady SYNFLOOD of ~200,000 SYN per second.
      Host was sending ~830,000 SYNACK per second.
      
      This is ~100 times more what we could achieve before this patch.
      
      Later, we will get rid of the listener hash and use ehash instead.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa76ce73
  28. 18 3月, 2015 2 次提交
  29. 17 3月, 2015 1 次提交
  30. 01 3月, 2015 1 次提交
  31. 13 2月, 2015 1 次提交
  32. 10 2月, 2015 1 次提交
  33. 30 9月, 2014 1 次提交