From 2f53384424251c06038ae612e56231b96ab610ee Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 3 Apr 2012 09:37:01 +0000 Subject: [PATCH] tcp: allow splice() to build full TSO packets MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096) The call to tcp_push() at the end of do_tcp_sendpages() forces an immediate xmit when pipe is not already filled, and tso_fragment() try to split these skb to MSS multiples. 4096 bytes are usually split in a skb with 2 MSS, and a remaining sub-mss skb (assuming MTU=1500) This makes slow start suboptimal because many small frames are sent to qdisc/driver layers instead of big ones (constrained by cwnd and packets in flight of course) In fact, applications using sendmsg() (adding an additional memory copy) instead of vmsplice()/splice()/sendfile() are a bit faster because of this anomaly, especially if serving small files in environments with large initial [c]wnd. Call tcp_push() only if MSG_MORE is not set in the flags parameter. This bit is automatically provided by splice() internals but for the last page, or on all pages if user specified SPLICE_F_MORE splice() flag. In some workloads, this can reduce number of sent logical packets by an order of magnitude, making zero-copy TCP actually faster than one-copy :) Reported-by: Tom Herbert Cc: Nandita Dukkipati Cc: Neal Cardwell Cc: Tom Herbert Cc: Yuchung Cheng Cc: H.K. Jerry Chu Cc: Maciej Żenczykowski Cc: Mahesh Bandewar Cc: Ilpo Järvinen Signed-off-by: Eric Dumazet com> Signed-off-by: David S. Miller --- net/ipv4/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index cfd7edda0a8e..2ff6f45a76f4 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -860,7 +860,7 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse } out: - if (copied) + if (copied && !(flags & MSG_MORE)) tcp_push(sk, flags, mss_now, tp->nonagle); return copied; -- GitLab