提交 · 12d50c46dc0f7fd2e625c4befaa5fa5740a7a594 · OpenHarmony / kernel_linux

09 12月, 2009 1 次提交

tcp: Remove runtime check that can never be true. · 3dc78932

由 David S. Miller 提交于 12月 08, 2009

GCC even warns about it, as reported by Andrew Morton:

net/ipv4/tcp.c: In function 'do_tcp_getsockopt':
net/ipv4/tcp.c:2544: warning: comparison is always false due to limited range of data type
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3dc78932

03 12月, 2009 2 次提交

TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS · e56fb50f

由 William Allen Simpson 提交于 12月 02, 2009

Provide per socket control of the TCP cookie option and SYN/SYNACK data.

This is a straightforward re-implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):

    http://thread.gmane.org/gmane.linux.network/102586

The principle difference is using a TCP option to carry the cookie nonce,
instead of a user configured offset in the data.

Allocations have been rearranged to avoid requiring GFP_ATOMIC.

Requires:
   net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED
   TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
   TCPCT part 1d: define TCP cookie option, extend existing struct's

Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e56fb50f

TCPCT part 1b: generate Responder Cookie secret · da5c78c8

由 William Allen Simpson 提交于 12月 02, 2009

Define (missing) hash message size for SHA1.

Define hashing size constants specific to TCP cookies.

Add new function: tcp_cookie_generator().

Maintain global secret values for tcp_cookie_generator().

This is a significantly revised implementation of earlier (15-year-old)
Photuris [RFC-2522] code for the KA9Q cooperative multitasking platform.

Linux RCU technique appears to be well-suited to this application, though
neither of the circular queue items are freed.

These functions will also be used in subsequent patches that implement
additional features.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

da5c78c8

01 12月, 2009 1 次提交

tcp: tcp_disconnect() should clear window_clamp · 1fdf475a

由 Eric Dumazet 提交于 11月 30, 2009

NFS can reuse its TCP socket after calling tcp_disconnect().

We noticed window scaling was not negotiated in SYN packet of next
connection request.

Fix is to clear tp->window_clamp in tcp_disconnect().
Reported-by: NKrzysztof Oledzki <ole@ans.pl>
Tested-by: NKrzysztof Oledzki <ole@ans.pl>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fdf475a

14 11月, 2009 1 次提交

tcp: provide more information on the tcp receive_queue bugs · d792c100

由 Ilpo Järvinen 提交于 11月 13, 2009

The addition of rcv_nxt allows to discern whether the skb
was out of place or tp->copied. Also catch fancy combination
of flags if necessary (sadly we might miss the actual causer
flags as it might have already returned).

Btw, we perhaps would want to forward copied_seq in
somewhere or otherwise we might have some nice loop with
WARN stuff within but where to do that safely I don't
know at this stage until more is known (but it is not
made significantly worse by this patch).
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d792c100

23 10月, 2009 1 次提交

net: use WARN() for the WARN_ON in commit · c62f4c45

由 Arjan van de Ven 提交于 10月 22, 2009

Commit b6b39e8f (tcp: Try to catch MSG_PEEK bug) added a printk()
to the WARN_ON() that's in tcp.c. This patch changes this combination
to WARN(); the advantage of WARN() is that the printk message shows up
inside the message, so that kerneloops.org will collect the message.

In addition, this gets rid of an extra if() statement.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c62f4c45

20 10月, 2009 2 次提交

tcp: Try to catch MSG_PEEK bug · b6b39e8f

由 Herbert Xu 提交于 10月 19, 2009

This patch tries to print out more information when we hit the
MSG_PEEK bug in tcp_recvmsg.  It's been around since at least
2005 and it's about time that we finally fix it.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6b39e8f

tcp: fix TCP_DEFER_ACCEPT retrans calculation · b103cf34

由 Julian Anastasov 提交于 10月 19, 2009

Fix TCP_DEFER_ACCEPT conversion between seconds and
retransmission to match the TCP SYN-ACK retransmission periods
because the time is converted to such retransmissions. The old
algorithm selects one more retransmission in some cases. Allow
up to 255 retransmissions.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b103cf34

19 10月, 2009 1 次提交

inet: rename some inet_sock fields · c720c7e8

由 Eric Dumazet 提交于 10月 15, 2009

In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

Goal is to transfert fields used at lookup time in the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c720c7e8

13 10月, 2009 1 次提交

tcp: replace ehash_size by ehash_mask · f373b53b

由 Eric Dumazet 提交于 10月 09, 2009

Storing the mask (size - 1) instead of the size allows fast path to be
a bit faster.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f373b53b

03 10月, 2009 1 次提交

net: splice() from tcp to pipe should take into account O_NONBLOCK · 42324c62

由 Eric Dumazet 提交于 10月 01, 2009

tcp_splice_read() doesnt take into account socket's O_NONBLOCK flag

Before this patch :

splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE);
causes a random endless block (if pipe is full) and
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
will return 0 immediately if the TCP buffer is empty.

User application has no way to instruct splice() that socket should be in blocking mode
but pipe in nonblock more.

Many projects cannot use splice(tcp -> pipe) because of this flaw.

http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD
http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0687.html

Linus introduced  SPLICE_F_NONBLOCK in commit 29e35094
(splice: add SPLICE_F_NONBLOCK flag )

  It doesn't make the splice itself necessarily nonblocking (because the
  actual file descriptors that are spliced from/to may block unless they
  have the O_NONBLOCK flag set), but it makes the splice pipe operations
  nonblocking.

Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only

This patch instruct tcp_splice_read() to use the underlying file O_NONBLOCK
flag, as other socket operations do.

Users will then call :

splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK );

to block on data coming from socket (if file is in blocking mode),
and not block on pipe output (to avoid deadlock)

First version of this patch was submitted by Octavian Purdila
Reported-by: NVolker Lendecke <vl@samba.org>
Reported-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Acked-by: NJens Axboe <jens.axboe@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42324c62

02 10月, 2009 1 次提交

net/ipv4/tcp.c: fix min() type mismatch warning · 4fdb78d3

由 Andrew Morton 提交于 10月 01, 2009

net/ipv4/tcp.c: In function 'do_tcp_setsockopt':
net/ipv4/tcp.c:2050: warning: comparison of distinct pointer types lacks a cast
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4fdb78d3

01 10月, 2009 1 次提交

net: Make setsockopt() optlen be unsigned. · b7058842

由 David S. Miller 提交于 9月 30, 2009

This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b7058842

22 9月, 2009 1 次提交

mm: replace various uses of num_physpages by totalram_pages · 4481374c

由 Jan Beulich 提交于 9月 21, 2009

Sizing of memory allocations shouldn't depend on the number of physical
pages found in a system, as that generally includes (perhaps a huge amount
of) non-RAM pages.  The amount of what actually is usable as storage
should instead be used as a basis here.

Some of the calculations (i.e.  those not intending to use high memory)
should likely even use (totalram_pages - totalhigh_pages).
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NIngo Molnar <mingo@elte.hu>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4481374c

15 9月, 2009 1 次提交

tcp: fix ssthresh u16 leftover · 0b6a05c1

由 Ilpo Järvinen 提交于 9月 15, 2009

It was once upon time so that snd_sthresh was a 16-bit quantity.
...That has not been true for long period of time. I run across
some ancient compares which still seem to trust such legacy.
Put all that magic into a single place, I hopefully found all
of them.

Compile tested, though linking of allyesconfig is ridiculous
nowadays it seems.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b6a05c1

03 9月, 2009 1 次提交

tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076

由 Wu Fengguang 提交于 9月 02, 2009

This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:

	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
			tcp_send_fin => alloc_skb_fclone => page reclaim

David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.

But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.

CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa133076

29 8月, 2009 1 次提交

tcp: keepalive cleanups · df19a626

由 Eric Dumazet 提交于 8月 28, 2009

Introduce keepalive_probes(tp) helper, and use it, like 
keepalive_time_when(tp) and keepalive_intvl_when(tp)
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df19a626

10 7月, 2009 1 次提交

net: adding memory barrier to the poll and receive callbacks · a57de0b4

由 Jiri Olsa 提交于 7月 08, 2009

Adding memory barrier after the poll_wait function, paired with
receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
to wrap the memory barrier.

Without the memory barrier, following race can happen.
The race fires, when following code paths meet, and the tp->rcv_nxt
and __add_wait_queue updates stay in CPU caches.

CPU1                         CPU2

sys_select                   receive packet
  ...                        ...
  __add_wait_queue           update tp->rcv_nxt
  ...                        ...
  tp->rcv_nxt check          sock_def_readable
  ...                        {
  schedule                      ...
                                if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                        wake_up_interruptible(sk->sk_sleep)
                                ...
                             }

If there was no cache the code would work ok, since the wait_queue and
rcv_nxt are opposit to each other.

Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
passed the tp->rcv_nxt check and sleeps, or will get the new value for
tp->rcv_nxt and will return with new data mask.
In both cases the process (CPU1) is being added to the wait queue, so the
waitqueue_active (CPU2) call cannot miss and will wake up CPU1.

The bad case is when the __add_wait_queue changes done by CPU1 stay in its
cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1 will then
endup calling schedule and sleep forever if there are no more data on the
socket.

Calls to poll_wait in following modules were ommited:
	net/bluetooth/af_bluetooth.c
	net/irda/af_irda.c
	net/irda/irnet/irnet_ppp.c
	net/mac80211/rc80211_pid_debugfs.c
	net/phonet/socket.c
	net/rds/af_rds.c
	net/rfkill/core.c
	net/sunrpc/cache.c
	net/sunrpc/rpc_pipe.c
	net/tipc/socket.c
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a57de0b4

30 6月, 2009 1 次提交

tcp: Do not tack on TSO data to non-TSO packet · 6828b92b

由 Herbert Xu 提交于 6月 28, 2009

If a socket starts out on a non-TSO route, and then switches to
a TSO route, then we will tack on data to the tail of the tx queue
even if it started out life as non-TSO. This is suboptimal because
all of it will then be copied and checksummed unnecessarily.

This patch fixes this by ensuring that skb->ip_summed is set to
CHECKSUM_PARTIAL before appending extra data beyond the MSS.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6828b92b

29 5月, 2009 1 次提交
- D
  tcp: Use SKB queue and list helpers instead of doing it by-hand. · 91521944
  由 David S. Miller 提交于 5月 28, 2009
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  91521944
27 5月, 2009 5 次提交

tcp: Do not check flush when comparing options for GRO · a2a804cd

由 Herbert Xu 提交于 5月 26, 2009

There is no need to repeatedly check flush when comparing TCP
options for GRO as it will be false 99% of the time where it
matters.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2a804cd

gro: Avoid unnecessary comparison after skb_gro_header · a5b1cf28

由 Herbert Xu 提交于 5月 26, 2009

For the overwhelming majority of cases, skb_gro_header's return
value cannot be NULL.  Yet we must check it because of its current
form.  This patch splits it up into multiple functions in order
to avoid this.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5b1cf28

tcp: Optimise len/mss comparison · 30a3ae30

由 Herbert Xu 提交于 5月 26, 2009

Instead of checking len > mss || len == 0, we can accomplish
both by checking (len - 1) > mss using the unsigned wraparound.
At nearly a million times a second, this might just help.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30a3ae30

tcp: Remove unnecessary window comparisons for GRO · 4a9a2968

由 Herbert Xu 提交于 5月 26, 2009

The window has already been checked as part of the flag word
so there is no need to check it explicitly.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a9a2968

tcp: Optimise GRO port comparisons · 745898ea

由 Herbert Xu 提交于 5月 26, 2009

Instead of doing two 16-bit operations for the source/destination
ports, we can do one 32-bit operation to take care both.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

745898ea

19 5月, 2009 1 次提交

tcp: fix MSG_PEEK race check · 77527313

由 Ilpo Järvinen 提交于 5月 10, 2009

Commit 518a09ef (tcp: Fix recvmsg MSG_PEEK influence of
blocking behavior) lets the loop run longer than the race check
did previously expect, so we need to be more careful with this
check and consider the work we have been doing.

I tried my best to deal with urg hole madness too which happens
here:
	if (!sock_flag(sk, SOCK_URGINLINE)) {
		++*seq;
		...
by using additional offset by one but I certainly have very
little interest in testing that part.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: NFrans Pop <elendil@planet.nl>
Tested-by: NIan Zimmermann <itz@buug.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77527313

17 4月, 2009 1 次提交

gro: Fix use after free in tcp_gro_receive · a0a69a01

由 Herbert Xu 提交于 4月 17, 2009

After calling skb_gro_receive skb->len can no longer be relied
on since if the skb was merged using frags, then its pages will
have been removed and the length reduced.

This caused tcp_gro_receive to prematurely end merging which
resulted in suboptimal performance with ixgbe.

The fix is to store skb->len on the stack.
Reported-by: NMark Wagner <mwagner@redhat.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a0a69a01

01 4月, 2009 1 次提交

ipv4: remove unused parameter from tcp_recv_urg(). · 377f0a08

由 Rami Rosen 提交于 3月 31, 2009

Signed-off-by: NRami Rosen <ramirose@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

377f0a08

19 3月, 2009 1 次提交

tcp: remove parameter from tcp_recv_urg(). · beedad92

由 Rami Rosen 提交于 3月 18, 2009

This patch removes an unused parameter (addr_len) from tcp_recv_urg()
method in net/ipv4/tcp.c.
Signed-off-by: NRami Rosen <ramirose@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

beedad92

16 3月, 2009 3 次提交

tcp: make sure xmit goal size never becomes zero · afece1c6

由 Ilpo Järvinen 提交于 3月 14, 2009

It's not too likely to happen, would basically require crafted
packets (must hit the max guard in tcp_bound_to_half_wnd()).
It seems that nothing that bad would happen as there's tcp_mems
and congestion window that prevent runaway at some point from
hurting all too much (I'm not that sure what all those zero
sized segments we would generate do though in write queue).
Preventing it regardless is certainly the best way to go.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

afece1c6

tcp: cache result of earlier divides when mss-aligning things · 2a3a041c

由 Ilpo Järvinen 提交于 3月 14, 2009

The results is very unlikely change every so often so we
hardly need to divide again after doing that once for a
connection. Yet, if divide still becomes necessary we
detect that and do the right thing and again settle for
non-divide state. Takes the u16 space which was previously
taken by the plain xmit_size_goal.

This should take care part of the tso vs non-tso difference
we found earlier.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2a3a041c

tcp: simplify tcp_current_mss · 0c54b85f

由 Ilpo Järvinen 提交于 3月 14, 2009

There's very little need for most of the callsites to get
tp->xmit_goal_size updated. That will cost us divide as is,
so slice the function in two. Also, the only users of the
tp->xmit_goal_size are directly behind tcp_current_mss(),
so there's no need to store that variable into tcp_sock
at all! The drop of xmit_goal_size currently leaves 16-bit
hole and some reorganization would again be necessary to
change that (but I'm aiming to fill that hole with u16
xmit_goal_size_segs to cache the results of the remaining
divide to get that tso on regression).

Bring xmit_goal_size parts into tcp.c
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c54b85f

02 3月, 2009 1 次提交

tcp: in sendmsg/pages open code the real goto target · 0d6a775e

由 Ilpo Järvinen 提交于 2月 28, 2009

copied was assigned zero right before the goto, so if (copied)
cannot ever be true.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d6a775e

09 2月, 2009 1 次提交

gro: Optimise TCP packet reception · aa6320d3

由 Herbert Xu 提交于 2月 08, 2009

gro: Optimise TCP packet reception

As this function can be called more than half a million times for
10GbE, it's important to optimise it as much as we can.

This patch uses bit ops to logical ops, as well as open coding
memcmp to exploit alignment properties.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa6320d3

30 1月, 2009 1 次提交

gro: Avoid copying headers of unmerged packets · 86911732

由 Herbert Xu 提交于 1月 29, 2009

Unfortunately simplicity isn't always the best.  The fraginfo
interface turned out to be suboptimal.  The problem was quite
obvious.  For every packet, we have to copy the headers from
the frags structure into skb->head, even though for 99% of the
packets this part is immediately thrown away after the merge.

LRO didn't have this problem because it directly read the headers
from the frags structure.

This patch attempts to address this by creating an interface
that allows GRO to access the headers in the first frag without
having to copy it.  Because all drivers that use frags place the
headers in the first frag this optimisation should be enough.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

86911732

27 1月, 2009 1 次提交

tcp: Fix length tcp_splice_data_recv passes to skb_splice_bits. · 9fa5fdf2

由 Dimitris Michailidis 提交于 1月 26, 2009

tcp_splice_data_recv has two lengths to consider: the len parameter it
gets from tcp_read_sock, which specifies the amount of data in the skb,
and rd_desc->count, which is the amount of data the splice caller still
wants. Currently it passes just the latter to skb_splice_bits, which then
splices min(rd_desc->count, skb->len - offset) bytes.

Most of the time this is fine, except when the skb contains urgent data.
In that case len goes only up to the urgent byte and is less than
skb->len - offset. By ignoring len tcp_splice_data_recv may a) splice
data tcp_read_sock told it not to, b) return to tcp_read_sock a value > len.

Now, tcp_read_sock doesn't handle used > len and leaves the socket in a
bad state (both sk_receive_queue and copied_seq are bad at that point)
resulting in duplicated data and corruption.

Fix by passing min(rd_desc->count, len) to skb_splice_bits.
Signed-off-by: NDimitris Michailidis <dm@chelsio.com>
Acked-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fa5fdf2

15 1月, 2009 1 次提交

gso: Ensure that the packet is long enough · 4e704ee3

由 Herbert Xu 提交于 1月 14, 2009

When we get a GSO packet from an untrusted source, we need to
ensure that it is sufficiently long so that we don't end up
crashing.

Based on discovery and patch by Ian Campbell.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Tested-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e704ee3

14 1月, 2009 1 次提交

tcp: splice as many packets as possible at once · 33966dd0

由 Willy Tarreau 提交于 1月 13, 2009

As spotted by Willy Tarreau, current splice() from tcp socket to pipe is not
optimal. It processes at most one segment per call.
This results in low performance and very high overhead due to syscall rate
when splicing from interfaces which do not support LRO.

Willy provided a patch inside tcp_splice_read(), but a better fix
is to let tcp_read_sock() process as many segments as possible, so
that tcp_rcv_space_adjust() and tcp_cleanup_rbuf() are called less
often.

With this change, splice() behaves like tcp_recvmsg(), being able
to consume many skbs in one system call. With typical 1460 bytes
of payload per frame, that means splice(SPLICE_F_NONBLOCK) can return
16*1460 = 23360 bytes.
Signed-off-by: NWilly Tarreau <w@1wt.eu>
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33966dd0

09 1月, 2009 1 次提交

tcp6: Add GRO support · 684f2176

由 Herbert Xu 提交于 1月 08, 2009

This patch adds GRO support for TCP over IPv6.  The code is exactly
the same as the IPv4 version except for the pseudo-header checksum
computation.

Note that I've removed the unused tcphdr argument from tcp_v6_check
rather than invent a bogus value for GRO.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

684f2176

07 1月, 2009 1 次提交

net_dma: convert to dma_find_channel · f67b4599

由 Dan Williams 提交于 1月 06, 2009

Use the general-purpose channel allocation provided by dmaengine.
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

f67b4599

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年