Commit d41a69f1, authored by Eric Dumazet, committed by David S. Miller

tcp: make tcp_sendmsg() aware of socket backlog

Large sendmsg()/write() calls hold the socket lock for the duration of the
call, unless the sk->sk_sndbuf limit is hit. This is bad because incoming
packets are parked in the socket backlog for a long time.
Critical decisions like fast retransmit might be delayed.
Receivers have to maintain a big out-of-order queue with additional CPU
overhead, and TX may also stall once windows are full.

Bidirectional flows are particularly hurt since the backlog can become
quite big if the copy from user space triggers IO (page faults).

Some applications learnt to use sendmsg() (or sendmmsg()) with small
chunks to avoid this issue.
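
The user-space workaround mentioned above might look roughly like the following sketch. send_chunked() and its chunk parameter are hypothetical names chosen for illustration; only send() and the errno values are real API. Each call hands the kernel a small piece, so the socket lock is dropped between calls and the backlog gets a chance to drain.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of this commit): send len bytes from buf
 * in pieces of at most chunk bytes, one send() call per piece.
 * Returns the number of bytes sent, or -1 with errno set.
 */
static ssize_t send_chunked(int fd, const char *buf, size_t len, size_t chunk)
{
	size_t off = 0;

	if (chunk == 0) {
		errno = EINVAL;
		return -1;
	}

	while (off < len) {
		size_t n = len - off < chunk ? len - off : chunk;
		ssize_t ret = send(fd, buf + off, n, 0);

		if (ret < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry this piece */
			return -1;
		}
		off += (size_t)ret;	/* send() may accept fewer than n bytes */
	}
	return (ssize_t)off;
}
```

With this patch the kernel performs an equivalent check itself, so an application should no longer need such a loop just to keep the backlog short.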

The kernel should know better, right?

Add a generic sk_flush_backlog() helper and use it right
before a new skb is allocated. Typically we put 64KB of payload
per skb (unless MSG_EOR is requested) and checking socket backlog
every 64KB gives good results.

As a matter of fact, tests with TSO/GSO disabled give very nice
results, as we manage to keep a small write queue and smaller
perceived rtt.

Note that sk_flush_backlog() maintains socket ownership,
so it is not equivalent to a {release_sock(sk); lock_sock(sk);}
sequence. This preserves the implicit atomicity rules that sendmsg()
was giving to (possibly buggy) applications.
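
The ownership point can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: every model_* name is hypothetical, a C11 atomic_flag stands in for sk->sk_lock.slock, and a plain bool stands in for sk->sk_lock.owned. Unlike the real inline helper, which peeks at the backlog tail with READ_ONCE() before taking the spinlock, the model checks the tail under the lock to stay race-free in portable C.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Every name below is a hypothetical user-space stand-in, not kernel API. */
struct model_pkt {
	struct model_pkt *next;
};

struct model_sock {
	atomic_flag slock;		/* stands in for sk->sk_lock.slock */
	bool owned;			/* stands in for sk->sk_lock.owned */
	struct model_pkt *head, *tail;	/* stands in for sk->sk_backlog */
	int processed;			/* packets handled so far */
};

static void model_sock_init(struct model_sock *sk)
{
	atomic_flag_clear(&sk->slock);
	sk->owned = false;
	sk->head = sk->tail = NULL;
	sk->processed = 0;
}

static void model_spin_lock(struct model_sock *sk)
{
	while (atomic_flag_test_and_set_explicit(&sk->slock, memory_order_acquire))
		;	/* busy-wait */
}

static void model_spin_unlock(struct model_sock *sk)
{
	atomic_flag_clear_explicit(&sk->slock, memory_order_release);
}

/* Drain the backlog. Called with the spinlock held; the lock is dropped
 * while packets are processed, as __release_sock() does in the kernel.
 */
static void model_drain(struct model_sock *sk)
{
	struct model_pkt *p;

	while ((p = sk->head) != NULL) {
		sk->head = sk->tail = NULL;
		model_spin_unlock(sk);
		while (p) {
			struct model_pkt *next = p->next;

			p->next = NULL;
			sk->processed++;
			p = next;
		}
		model_spin_lock(sk);
	}
}

/* Receive path: queue to the backlog while a sender owns the socket. */
static void model_rx(struct model_sock *sk, struct model_pkt *p)
{
	model_spin_lock(sk);
	if (sk->owned) {
		p->next = NULL;
		if (sk->tail)
			sk->tail->next = p;
		else
			sk->head = p;
		sk->tail = p;
	} else {
		sk->processed++;	/* nobody owns the socket: handle now */
	}
	model_spin_unlock(sk);
}

static void model_lock_sock(struct model_sock *sk)
{
	model_spin_lock(sk);
	sk->owned = true;	/* the real lock_sock() sleeps if already owned */
	model_spin_unlock(sk);
}

/* Flush while keeping ownership: 'owned' is never cleared here. */
static bool model_flush_backlog(struct model_sock *sk)
{
	bool pending;

	model_spin_lock(sk);
	pending = sk->tail != NULL;
	if (pending)
		model_drain(sk);
	model_spin_unlock(sk);
	return pending;
}

static void model_release_sock(struct model_sock *sk)
{
	model_spin_lock(sk);
	model_drain(sk);
	sk->owned = false;
	model_spin_unlock(sk);
}
```

In the model, a release/lock pair would clear owned for a moment, and in the kernel that window lets another thread take the socket in the middle of a sendmsg(). model_flush_backlog() never opens that window, which is the property the paragraph above describes.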

In this simple implementation, I chose to not call tcp_release_cb(),
but we might consider this later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parent 5413d1ba

@@ -926,6 +926,17 @@ void sk_stream_kill_queues(struct sock *sk);
 void sk_set_memalloc(struct sock *sk);
 void sk_clear_memalloc(struct sock *sk);

+void __sk_flush_backlog(struct sock *sk);
+
+static inline bool sk_flush_backlog(struct sock *sk)
+{
+	if (unlikely(READ_ONCE(sk->sk_backlog.tail))) {
+		__sk_flush_backlog(sk);
+		return true;
+	}
+	return false;
+}
+
 int sk_wait_data(struct sock *sk, long *timeo, const struct sk_buff *skb);

 struct request_sock_ops;
@@ -2048,6 +2048,13 @@ static void __release_sock(struct sock *sk)
 	sk->sk_backlog.len = 0;
 }

+void __sk_flush_backlog(struct sock *sk)
+{
+	spin_lock_bh(&sk->sk_lock.slock);
+	__release_sock(sk);
+	spin_unlock_bh(&sk->sk_lock.slock);
+}
+
 /**
  * sk_wait_data - wait for data to arrive at sk_receive_queue
  * @sk: sock to wait on
@@ -1136,11 +1136,12 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	/* This should be in poll */
 	sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);

-	mss_now = tcp_send_mss(sk, &size_goal, flags);
-
 	/* Ok commence sending. */
 	copied = 0;

+restart:
+	mss_now = tcp_send_mss(sk, &size_goal, flags);
+
 	err = -EPIPE;
 	if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 		goto out_err;
@@ -1166,6 +1167,9 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 			if (!sk_stream_memory_free(sk))
 				goto wait_for_sndbuf;

+			if (sk_flush_backlog(sk))
+				goto restart;
+
 			skb = sk_stream_alloc_skb(sk,
 						  select_size(sk, sg),
 						  sk->sk_allocation,