1. 21 7月, 2018 5 次提交
    • D
      tls: check RCV_SHUTDOWN in tls_wait_data · fcf4793e
      Doron Roberts-Kedes 提交于
      The current code does not check sk->sk_shutdown & RCV_SHUTDOWN.
      tls_sw_recvmsg may return a positive value in the case where bytes have
      already been copied when the socket is shutdown. sk->sk_err has been
      cleared, causing the tls_wait_data to hang forever on a subsequent
      invocation. Checking sk->sk_shutdown & RCV_SHUTDOWN, as in tcp_recvmsg,
      fixes this problem.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Acked-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fcf4793e
    • D
      Merge branch 'tcp-fix-DCTCP-ECE-Ack-series' · f7a6eb1e
      David S. Miller 提交于
      Yuchung Cheng says:
      
      ====================
      fix DCTCP ECE Ack series
      
      This patch set address that the existing DCTCP implementation does not
      fully implement the ACK policy specified in the RFC. This improves
      the responsiveness of CE status change particularly on flows with
      small inflight.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7a6eb1e
    • Y
      tcp: do not delay ACK in DCTCP upon CE status change · a0496ef2
      Yuchung Cheng 提交于
      Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change
      has to be sent immediately so the sender can respond quickly:
      
      """ When receiving packets, the CE codepoint MUST be processed as follows:
      
         1.  If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to
             true and send an immediate ACK.
      
         2.  If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE
             to false and send an immediate ACK.
      """
      
      Previously DCTCP implementation may continue to delay the ACK. This
      patch fixes that to implement the RFC by forcing an immediate ACK.
      
      Tested with this packetdrill script provided by Larry Brakmo
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
      0.110 < [ect0] . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
         +0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
      
      0.200 < [ect0] . 1:1001(1000) ack 1 win 257
      0.200 > [ect01] . 1:1(0) ack 1001
      
      0.200 write(4, ..., 1) = 1
      0.200 > [ect01] P. 1:2(1) ack 1001
      
      0.200 < [ect0] . 1001:2001(1000) ack 2 win 257
      +0.005 < [ce] . 2001:3001(1000) ack 2 win 257
      
      +0.000 > [ect01] . 2:2(0) ack 2001
      // Previously the ACK below would be delayed by 40ms
      +0.000 > [ect01] E. 2:2(0) ack 3001
      
      +0.500 < F. 9501:9501(0) ack 4 win 257
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a0496ef2
    • Y
      tcp: do not cancel delay-AcK on DCTCP special ACK · 27cde44a
      Yuchung Cheng 提交于
      Currently when a DCTCP receiver delays an ACK and receive a
      data packet with a different CE mark from the previous one's, it
      sends two immediate ACKs acking previous and latest sequences
      respectly (for ECN accounting).
      
      Previously sending the first ACK may mark off the delayed ACK timer
      (tcp_event_ack_sent). This may subsequently prevent sending the
      second ACK to acknowledge the latest sequence (tcp_ack_snd_check).
      The culprit is that tcp_send_ack() assumes it always acknowleges
      the latest sequence, which is not true for the first special ACK.
      
      The fix is to not make the assumption in tcp_send_ack and check the
      actual ack sequence before cancelling the delayed ACK. Further it's
      safer to pass the ack sequence number as a local variable into
      tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid
      future bugs like this.
      Reported-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27cde44a
    • Y
      tcp: helpers to send special DCTCP ack · 2987babb
      Yuchung Cheng 提交于
      Refactor and create helpers to send the special ACK in DCTCP.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2987babb
  2. 20 7月, 2018 2 次提交
    • Z
      net-next/hinic: fix a problem in hinic_xmit_frame() · f7482683
      Zhao Chen 提交于
      The calculation of "wqe_size" is not correct when the tx queue is busy in
      hinic_xmit_frame().
      
      When there are no free WQEs, the tx flow will unmap the skb buffer, then
      ring the doobell for the pending packets. But the "wqe_size" which used
      to calculate the doorbell address is not correct. The wqe size should be
      cleared to 0, otherwise, it will cause a doorbell error.
      
      This patch fixes the problem.
      Reported-by: NZhou Wang <wangzhou1@hisilicon.com>
      Signed-off-by: NZhao Chen <zhaochen6@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7482683
    • T
      net/page_pool: Fix inconsistent lock state warning · 4905bd9a
      Tariq Toukan 提交于
      Fix the warning below by calling the ptr_ring_consume_bh,
      which uses spin_[un]lock_bh.
      
      [  179.064300] ================================
      [  179.069073] WARNING: inconsistent lock state
      [  179.073846] 4.18.0-rc2+ #18 Not tainted
      [  179.078133] --------------------------------
      [  179.082907] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  179.089637] swapper/21/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [  179.095478] 00000000963d1995 (&(&r->consumer_lock)->rlock){+.?.}, at:
      __page_pool_empty_ring+0x61/0x100
      [  179.105988] {SOFTIRQ-ON-W} state was registered at:
      [  179.111443]   _raw_spin_lock+0x35/0x50
      [  179.115634]   __page_pool_empty_ring+0x61/0x100
      [  179.120699]   page_pool_destroy+0x32/0x50
      [  179.125204]   mlx5e_free_rq+0x38/0xc0 [mlx5_core]
      [  179.130471]   mlx5e_close_channel+0x20/0x120 [mlx5_core]
      [  179.136418]   mlx5e_close_channels+0x26/0x40 [mlx5_core]
      [  179.142364]   mlx5e_close_locked+0x44/0x50 [mlx5_core]
      [  179.148509]   mlx5e_close+0x42/0x60 [mlx5_core]
      [  179.153936]   __dev_close_many+0xb1/0x120
      [  179.158749]   dev_close_many+0xa2/0x170
      [  179.163364]   rollback_registered_many+0x148/0x460
      [  179.169047]   rollback_registered+0x56/0x90
      [  179.174043]   unregister_netdevice_queue+0x7e/0x100
      [  179.179816]   unregister_netdev+0x18/0x20
      [  179.184623]   mlx5e_remove+0x2a/0x50 [mlx5_core]
      [  179.190107]   mlx5_remove_device+0xe5/0x110 [mlx5_core]
      [  179.196274]   mlx5_unregister_interface+0x39/0x90 [mlx5_core]
      [  179.203028]   cleanup+0x5/0xbfc [mlx5_core]
      [  179.208031]   __x64_sys_delete_module+0x16b/0x240
      [  179.213640]   do_syscall_64+0x5a/0x210
      [  179.218151]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  179.224218] irq event stamp: 334398
      [  179.228438] hardirqs last  enabled at (334398): [<ffffffffa511d8b7>]
      rcu_process_callbacks+0x1c7/0x790
      [  179.239178] hardirqs last disabled at (334397): [<ffffffffa511d872>]
      rcu_process_callbacks+0x182/0x790
      [  179.249931] softirqs last  enabled at (334386): [<ffffffffa509732e>] irq_enter+0x5e/0x70
      [  179.259306] softirqs last disabled at (334387): [<ffffffffa509741c>] irq_exit+0xdc/0xf0
      [  179.268584]
      [  179.268584] other info that might help us debug this:
      [  179.276572]  Possible unsafe locking scenario:
      [  179.276572]
      [  179.283877]        CPU0
      [  179.286954]        ----
      [  179.290033]   lock(&(&r->consumer_lock)->rlock);
      [  179.295546]   <Interrupt>
      [  179.298830]     lock(&(&r->consumer_lock)->rlock);
      [  179.304550]
      [  179.304550]  *** DEADLOCK ***
      
      Fixes: ff7d6b27 ("page_pool: refurbish version of page_pool code")
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4905bd9a
  3. 19 7月, 2018 28 次提交
  4. 18 7月, 2018 5 次提交