提交 · 9aaa156cf9b7e9d9ed899f254283b91c4e3c36c8 · openeuler / Kernel

27 5月, 2009 4 次提交

gro: Store shinfo in local variable in skb_gro_receive · 9aaa156c

由 Herbert Xu 提交于 5月 26, 2009

This patch stores the two shinfo pointers in local variables
because they're used over and over again in skb_gro_receive.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9aaa156c

gro: Nasty optimisations for page frags in skb_gro_receive · 66e92fcf

由 Herbert Xu 提交于 5月 26, 2009

This patch reverses the direction of the frags array copy in
skb_gro_receive in order simplify the loop conditional.  It
also avoids touching the first element of the original frags
array.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66e92fcf

gro: Localise offset/headlen in skb_gro_offset · 67147ba9

由 Herbert Xu 提交于 5月 26, 2009

This patch stores the offset/headlen in local variables as they're
used repeatedly in skb_gro_offset.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67147ba9

gro: Open-code frags copy in skb_gro_receive · 42da6994

由 Herbert Xu 提交于 5月 26, 2009

gcc does a poor job at generating code for the memcpy of the frags
array in skb_gro_receive, which is the primary purpose of that
function when merging frags.  In particular, it can't utilise the
alignment information of the source and destination.  This patch
open-codes the copy so we process words instead of bytes.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42da6994

25 5月, 2009 2 次提交

skbuff: Copy csum instead of csum_start/csum_offset · 9bcb97ca

由 Herbert Xu 提交于 5月 22, 2009

Hi:

skbuff: Copy csum instead of csum_start/csum_offset

It's easier to copy the u32 csum instead of its two u16
constituents.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

Cheers,
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bcb97ca

skbuff: Move new code into __copy_skb_header · 82c49a35

由 Herbert Xu 提交于 5月 22, 2009

Hi:

skbuff: Move new __skb_clone code into __copy_skb_header

It seems that people just keep on adding stuff to __skb_clone
instead __copy_skb_header.  This is wrong as it means your brand-new
attributes won't always get copied as you intended.

This patch moves them to the right place, and adds a comment to
prevent this from happening again.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

Thanks,
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82c49a35

19 5月, 2009 1 次提交

net: fix skb_seq_read returning wrong offset/length for page frag data · 995b3379

由 Thomas Chenault 提交于 5月 18, 2009

When called with a consumed value that is less than skb_headlen(skb)
bytes into a page frag, skb_seq_read() incorrectly returns an
offset/length relative to skb->data. Ensure that data which should come
from a page frag does.
Signed-off-by: NThomas Chenault <thomas_chenault@dell.com>
Tested-by: NShyam Iyer <shyam_iyer@dell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

995b3379

07 5月, 2009 1 次提交

net: update skb_recycle_check() for hardware timestamping changes · b8050075

由 Lennert Buytenhek 提交于 5月 06, 2009

Commit ac45f602 ("net: infrastructure
for hardware time stamping") added two skb initialization actions to
__alloc_skb(), which need to be added to skb_recycle_check() as well.
Signed-off-by: NLennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: NPatrick Ohly <patrick.ohly@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8050075

30 4月, 2009 1 次提交

net: Fix oops when splicing skbs from a frag_list. · 7a67e56f

由 Jarek Poplawski 提交于 4月 30, 2009

Lennert Buytenhek wrote:
> Since 4fb66994 ("net: Optimize memory
> usage when splicing from sockets.") I'm seeing this oops (e.g. in
> 2.6.30-rc3) when splicing from a TCP socket to /dev/null on a driver
> (mv643xx_eth) that uses LRO in the skb mode (lro_receive_skb) rather
> than the frag mode:

My patch incorrectly assumed skb->sk was always valid, but for
"frag_listed" skbs we can only use skb->sk of their parent.
Reported-by: NLennert Buytenhek <buytenh@wantstofly.org>
Debugged-by: NLennert Buytenhek <buytenh@wantstofly.org>
Tested-by: NLennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a67e56f

29 3月, 2009 1 次提交

gso: Fix support for linear packets · 2f181855

由 Herbert Xu 提交于 3月 28, 2009

When GRO/frag_list support was added to GSO, I made an error
which broke the support for segmenting linear GSO packets (GSO
packets are normally non-linear in the payload).

These days most of these packets are constructed by the tun
driver, which prefers to allocate linear memory if possible.
This is fixed in the latest kernel, but for 2.6.29 and earlier
it is still the norm.

Therefore this bug causes failures with GSO when used with tun
in 2.6.29.
Reported-by: NJames Huang <jamesclhuang@gmail.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f181855

14 3月, 2009 1 次提交

Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying... · ead2ceb0

由 Neil Horman 提交于 3月 11, 2009

Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>

 include/linux/skbuff.h |    4 +++-
 net/core/datagram.c    |    2 +-
 net/core/skbuff.c      |   22 ++++++++++++++++++++++
 net/ipv4/arp.c         |    2 +-
 net/ipv4/udp.c         |    2 +-
 net/packet/af_packet.c |    2 +-
 6 files changed, 29 insertions(+), 5 deletions(-)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ead2ceb0

27 2月, 2009 1 次提交

core: remove some pointless conditionals before kfree_skb() · f3fbbe0f

由 Wei Yongjun 提交于 2月 25, 2009

Remove some pointless conditionals before kfree_skb().
Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3fbbe0f

18 2月, 2009 1 次提交

net: Kill skb_truesize_check(), it only catches false-positives. · 92a0acce

由 David S. Miller 提交于 2月 17, 2009

A long time ago we had bugs, primarily in TCP, where we would modify
skb->truesize (for TSO queue collapsing) in ways which would corrupt
the socket memory accounting.

skb_truesize_check() was added in order to try and catch this error
more systematically.

However this debugging check has morphed into a Frankenstein of sorts
and these days it does nothing other than catch false-positives.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92a0acce

16 2月, 2009 1 次提交

net: infrastructure for hardware time stamping · ac45f602

由 Patrick Ohly 提交于 2月 12, 2009

The additional per-packet information (16 bytes for time stamps, 1
byte for flags) is stored for all packets in the skb_shared_info
struct. This implementation detail is hidden from users of that
information via skb_* accessor functions. A separate struct resp.
union is used for the additional information so that it can be
stored/copied easily outside of skb_shared_info.

Compared to previous implementations (reusing the tstamp field
depending on the context, optional additional structures) this
is the simplest solution. It does not extend sk_buff itself.

TX time stamping is implemented in software if the device driver
doesn't support hardware time stamping.

The new semantic for hardware/software time stamping around
ndo_start_xmit() is based on two assumptions about existing
network device drivers which don't support hardware time
stamping and know nothing about it:
 - they leave the new skb_shared_tx unmodified
 - the keep the connection to the originating socket in skb->sk
   alive, i.e., don't call skb_orphan()

Given that skb_shared_tx is new, the first assumption is safe.
The second is only true for some drivers. As a result, software
TX time stamping currently works with the bnx2 driver, but not
with the unmodified igb driver (the two drivers this patch series
was tested with).
Signed-off-by: NPatrick Ohly <patrick.ohly@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac45f602

13 2月, 2009 1 次提交

net: Fix page seeking for skb_splice_bits(). · ce3dd395

由 Jarek Poplawski 提交于 2月 12, 2009

struct page walking should be done with proper accessor functions, not
directly.

With doubts from David S. Miller and Herbert Xu.
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce3dd395

10 2月, 2009 1 次提交

net: Move skbuff symbol exports after each symbol's definition. · b4ac530f

由 David S. Miller 提交于 2月 10, 2009

net/core/skbuff.c is a hodge-podge of symbol export placement.
Some of the exports are right after the definition of the
symbol being exported, others are clumped together into a big
group at the end of the file.

Make things consistent.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4ac530f

06 2月, 2009 1 次提交

gro: Fix frag_list merging on imprecisely split packets · 56035022

由 Herbert Xu 提交于 2月 05, 2009

The previous fix ad0f9904 (gro:
Fix handling of imprecisely split packets) only fixed the case
of frags merging, frag_list merging in the same circumstances
were still broken.

In particular, the packet headers end up in the data stream.

This patch fixes this plus another issue where an imprecisely
split packet header may be read incorrectly (this is mostly
harmless since it'll simply cause the packet to not match and
be rejected for GRO).

Thanks to Emil Tantilov and Jeff Kirsher for helping to track
this down.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56035022

01 2月, 2009 1 次提交

net: Optimize memory usage when splicing from sockets. · 4fb66994

由 Jarek Poplawski 提交于 2月 01, 2009

The recent fix of data corruption when splicing from sockets uses
memory very inefficiently allocating a new page to copy each chunk of
linear part of skb. This patch uses the same page until it's full
(almost) by caching the page in sk_sndmsg_page field.

With changes from David S. Miller <davem@davemloft.net>
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4fb66994

30 1月, 2009 4 次提交

gro: Do not merge paged packets into frag_list · 81705ad1

由 Herbert Xu 提交于 1月 29, 2009

gro: Do not merge paged packets into frag_list

Bigger is not always better :)

It was easy to continue to merged packets into frag_list after the
page array is full.  However, this turns out to be worse than LRO
because frag_list is a much less efficient form of storage than the
page array.  So we're better off stopping the merge and starting
a new entry with an empty page array.

In future we can optimise this further by doing frag_list merging
but making sure that we continue to fill in the page array.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

81705ad1

gro: Avoid copying headers of unmerged packets · 86911732

由 Herbert Xu 提交于 1月 29, 2009

Unfortunately simplicity isn't always the best.  The fraginfo
interface turned out to be suboptimal.  The problem was quite
obvious.  For every packet, we have to copy the headers from
the frags structure into skb->head, even though for 99% of the
packets this part is immediately thrown away after the merge.

LRO didn't have this problem because it directly read the headers
from the frags structure.

This patch attempts to address this by creating an interface
that allows GRO to access the headers in the first frag without
having to copy it.  Because all drivers that use frags place the
headers in the first frag this optimisation should be enough.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

86911732

net: Fix OOPS in skb_seq_read(). · 71b3346d

由 Shyam Iyer 提交于 1月 29, 2009

It oopsd for me in skb_seq_read. addr2line said it was
linux-2.6/net/core/skbuff.c:2228, which is this line:


	while (st->frag_idx < skb_shinfo(st->cur_skb)->nr_frags) {


I added some printks in there and it looks like we hit this:

        } else if (st->root_skb == st->cur_skb &&
                   skb_shinfo(st->root_skb)->frag_list) {
                 st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                 st->frag_idx = 0;
                 goto next_skb;
        }



Actually I did some testing and added a few printks and found that the
st->cur_skb->data was 0 and hence the ptr used by iscsi_tcp was null.
This caused the kernel panic.

 	if (abs_offset < block_limit) {
-		*data = st->cur_skb->data + abs_offset;
+		*data = st->cur_skb->data + (abs_offset - st->stepped_offset);

I enabled the debug_tcp and with a few printks found that the code did
not go to the next_skb label and could find that the sequence being
followed was this -

It hit this if condition -

        if (st->cur_skb->next) {
                st->cur_skb = st->cur_skb->next;
                st->frag_idx = 0;
                goto next_skb;

And so, now the st pointer is shifted to the next skb whereas actually
it should have hit the second else if first since the data is in the
frag_list.

        else if (st->root_skb == st->cur_skb &&
                 skb_shinfo(st->root_skb)->frag_list) {
                st->cur_skb = skb_shinfo(st->root_skb)->frag_list;
                goto next_skb;
        }

Reversing the two conditions the attached patch fixes the issue for me
on top of Herbert's patches. 
Signed-off-by: NShyam Iyer <shyam_iyer@dell.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71b3346d

net: Fix frag_list handling in skb_seq_read · 95e3b24c

由 Herbert Xu 提交于 1月 29, 2009

The frag_list handling was broken in skb_seq_read:

1) We didn't add the stepped offset when looking at the head
are of fragments other than the first.

2) We didn't take the stepped offset away when setting the data
pointer in the head area.

3) The frag index wasn't reset.

This patch fixes both issues.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95e3b24c

21 1月, 2009 1 次提交

gro: Fix merging of paged packets · 37fe4732

由 Herbert Xu 提交于 1月 17, 2009

The previous fix to paged packets broke the merging because it
reset the skb->len before we added it to the merged packet.  This
wasn't detected because it simply resulted in the truncation of
the packet while the missing bit is subsequently retransmitted.

The fix is to store skb->len before we clobber it.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37fe4732

20 1月, 2009 1 次提交

net: Fix data corruption when splicing from sockets. · 8b9d3728

由 Jarek Poplawski 提交于 1月 19, 2009

The trick in socket splicing where we try to convert the skb->data
into a page based reference using virt_to_page() does not work so
well.

The idea is to pass the virt_to_page() reference via the pipe
buffer, and refcount the buffer using a SKB reference.

But if we are splicing from a socket to a socket (via sendpage)
this doesn't work.

The from side processing will grab the page (and SKB) references.
The sendpage() calls will grab page references only, return, and
then the from side processing completes and drops the SKB ref.

The page based reference to skb->data is not enough to keep the
kmalloc() buffer backing it from being reused.  Yet, that is
all that the socket send side has at this point.

This leads to data corruption if the skb->data buffer is reused
by SLAB before the send side socket actually gets the TX packet
out to the device.

The fix employed here is to simply allocate a page and copy the
skb->data bytes into that page.

This will hurt performance, but there is no clear way to fix this
properly without a copy at the present time, and it is important
to get rid of the data corruption.

With fixes from Herbert Xu.
Tested-by: NWilly Tarreau <w@1wt.eu>
Foreseen-by: NChangli Gao <xiaosuo@gmail.com>
Diagnosed-by: NWilly Tarreau <w@1wt.eu>
Reported-by: NWilly Tarreau <w@1wt.eu>
Fixed-by: NJens Axboe <jens.axboe@oracle.com>
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b9d3728

15 1月, 2009 1 次提交

gro: Fix page ref count for skbs freed normally · f5572068

由 Herbert Xu 提交于 1月 14, 2009

When an skb with page frags is merged into an existing one, we
cannibalise its reference count.  This is OK when the skb is
reused because we set nr_frags to zero in that case.  However,
for the case where the skb is freed through kfree_skb, we didn't
clear nr_frags which causes the page to be freed prematurely.

This is fixed by moving the skb resetting into skb_gro_receive.
Reported-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5572068

05 1月, 2009 2 次提交

gro: Add page frag support · 5d38a079

由 Herbert Xu 提交于 1月 04, 2009

This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
in one skb, rather than using the less efficient frag_list.

It also adds a new interface, napi_gro_frags to allow drivers
to inject page frags directly into the stack without allocating
an skb.  This is intended to be the GRO equivalent for LRO's
lro_receive_frags interface.

The existing GSO interface can already handle page frags with
or without an appended frag_list so nothing needs to be changed
there.

The merging itself is rather simple.  We store any new frag entries
after the last existing entry, without checking whether the first
new entry can be merged with the last existing entry.  Making this
check would actually be easy but since no existing driver can
produce contiguous frags anyway it would just be mental masturbation.

If the total number of entries would exceed the capacity of a
single skb, we simply resort to using frag_list as we do now.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d38a079

gro: Use gso_size to store MSS · b530256d

由 Herbert Xu 提交于 1月 04, 2009

In order to allow GRO packets without frag_list at all, we need to
store the MSS in the packet itself.  The obvious place is gso_size.
The only thing to watch out for is if the packet ends up not being
GRO then we need to clear gso_size before pushing the packet into
the stack.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b530256d

16 12月, 2008 2 次提交

net: Add skb_gro_receive · 71d93b39

由 Herbert Xu 提交于 12月 15, 2008

This patch adds the helper skb_gro_receive to merge packets for
GRO.  The current method is to allocate a new header skb and then
chain the original packets to its frag_list.  This is done to
make it easier to integrate into the existing GSO framework.

In future as GSO is moved into the drivers, we can undo this and
simply chain the original packets together.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71d93b39

net: Add frag_list support to skb_segment · 89319d38

由 Herbert Xu 提交于 12月 15, 2008

This patch adds limited support for handling frag_list packets in
skb_segment.  The intention is to support GRO (Generic Receive Offload)
packets which will be constructed by chaining normal packets using
frag_list.

As such we require all frag_list members terminate on exact MSS
boundaries.  This is checked using BUG_ON.

As there should only be one producer in the kernel of such packets,
namely GRO, this requirement should not be difficult to maintain.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89319d38

26 11月, 2008 2 次提交

net: make skb_truesize_bug() call WARN() · 8f480c0e

由 Arjan van de Ven 提交于 11月 25, 2008

The truesize message check is important enough to make it print "BUG"
to the user console... lets also make it important enough to spit a
backtrace/module list etc so that kerneloops.org can track them.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f480c0e

tcp: skb_shift cannot cache frag ptrs past pskb_expand_head · 9f782db3

由 Ilpo Järvinen 提交于 11月 25, 2008

Since pskb_expand_head creates copy of the shared area we
cannot keep any frag ptr past de-cloning. This fixes the
tcpdump recvfrom -EFAULT problem.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f782db3

25 11月, 2008 2 次提交

tcp: handle shift/merge of cloned skbs too · 0ace2856

由 Ilpo Järvinen 提交于 11月 24, 2008

This caused me to get repeatably:

  tcpdump: pcap_loop: recvfrom: Bad address

Happens occassionally when I tcpdump my for-looped test xfers:
  while [ : ]; do echo -n "$(date '+%s.%N') "; ./sendfile; sleep 20; done

Rest of the relevant commands:
  ethtool -K eth0 tso off
  tc qdisc add dev eth0 root netem drop 4%
  tcpdump -n -s0 -i eth0 -w sacklog.all

Running net-next under kvm, connection goes to the same host
(basically just out of kvm). The connection itself works ok
and data gets sent without corruption even with a large
number of tests while tcpdump fails usually within less than
5 tests.

Whether it only happens because of this change or not, I
don't know for sure but it's the only thing with which
I've seen that error. The non-cloned variant works w/o it
for much longer time. I'm yet to debug where the error
actually comes from.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ace2856

tcp: Try to restore large SKBs while SACK processing · 832d11c5

由 Ilpo Järvinen 提交于 11月 24, 2008

During SACK processing, most of the benefits of TSO are eaten by
the SACK blocks that one-by-one fragment SKBs to MSS sized chunks.
Then we're in problems when cleanup work for them has to be done
when a large cumulative ACK comes. Try to return back to pre-split
state already while more and more SACK info gets discovered by
combining newly discovered SACK areas with the previous skb if
that's SACKed as well.

This approach has a number of benefits:

1) The processing overhead is spread more equally over the RTT
2) Write queue has less skbs to process (affect everything
   which has to walk in the queue past the sacked areas)
3) Write queue is consistent whole the time, so no other parts
   of TCP has to be aware of this (this was not the case with
   some other approach that was, well, quite intrusive all
   around).
4) Clean_rtx_queue can release most of the pages using single
   put_page instead of previous PAGE_SIZE/mss+1 calls

In case a hole is fully filled by the new SACK block, we attempt
to combine the next skb too which allows construction of skbs
that are even larger than what tso split them to and it handles
hole per on every nth patterns that often occur during slow start
overshoot pretty nicely. Though this to be really useful also
a retransmission would have to get lost since cumulative ACKs
advance one hole at a time in the most typical case.

TODO: handle upwards only merging. That should be rather easy
when segment is fully sacked but I'm leaving that as future
work item (it won't make very large difference anyway since
this current approach already covers quite a lot of normal
cases).

I was earlier thinking of some sophisticated way of tracking
timestamps of the first and the last segment but later on
realized that it won't be that necessary at all to store the
timestamp of the last segment. The cases that can occur are
basically either:
  1) ambiguous => no sensible measurement can be taken anyway
  2) non-ambiguous is due to reordering => having the timestamp
     of the last segment there is just skewing things more off
     than does some good since the ack got triggered by one of
     the holes (besides some substle issues that would make
     determining right hole/skb even harder problem). Anyway,
     it has nothing to do with this change then.

I choose to route some abnormal looking cases with goto noop,
some could be handled differently (eg., by stopping the
walking at that skb but again). In general, they either
shouldn't happen at all or are rare enough to make no difference
in practice.

In theory this change (as whole) could cause some macroscale
regression (global) because of cache misses that are taken over
the round-trip time but it gets very likely better because of much
less (local) cache misses per other write queue walkers and the
big recovery clearing cumulative ack.

Worth to note that these benefits would be very easy to get also
without TSO/GSO being on as long as the data is in pages so that
we can merge them. Currently I won't let that happen because
DSACK splitting at fragment that would mess up pcounts due to
sk_can_gso in tcp_set_skb_tso_segs. Once DSACKs fragments gets
avoided, we have some conditions that can be made less strict.

TODO: I will probably have to convert the excessive pointer
passing to struct sacktag_state... :-)

My testing revealed that considerable amount of skbs couldn't
be shifted because they were cloned (most likely still awaiting
tx reclaim)...

[The rest is considering future work instead since I got
repeatably EFAULT to tcpdump's recvfrom when I added
pskb_expand_head to deal with clones, so I separated that
into another, later patch]

...To counter that, I gave up on the fifth advantage:

5) When growing previous SACK block, less allocs for new skbs
   are done, basically a new alloc is needed only when new hole
   is detected and when the previous skb runs out of frags space

...which now only happens of if reclaim is fast enough to dispose
the clone before the SACK block comes in (the window is RTT long),
otherwise we'll have to alloc some.

With clones being handled I got these numbers (will be somewhat
worse without that), taken with fine-grained mibs:

                  TCPSackShifted 398
                   TCPSackMerged 877
            TCPSackShiftFallback 320
      TCPSACKCOLLAPSEFALLBACKGSO 0
  TCPSACKCOLLAPSEFALLBACKSKBBITS 0
  TCPSACKCOLLAPSEFALLBACKSKBDATA 0
    TCPSACKCOLLAPSEFALLBACKBELOW 0
    TCPSACKCOLLAPSEFALLBACKFIRST 1
 TCPSACKCOLLAPSEFALLBACKPREVBITS 318
      TCPSACKCOLLAPSEFALLBACKMSS 1
   TCPSACKCOLLAPSEFALLBACKNOHEAD 0
    TCPSACKCOLLAPSEFALLBACKSHIFT 0
          TCPSACKCOLLAPSENOOPSEQ 0
  TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
     TCPSACKCOLLAPSENOOPSMALLLEN 0
             TCPSACKCOLLAPSEHOLE 12
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

832d11c5

11 11月, 2008 1 次提交

net: fix setting of skb->tail in skb_recycle_check() · 5cd33db2

由 Lennert Buytenhek 提交于 11月 10, 2008

Since skb_reset_tail_pointer() reads skb->data, we need to set
skb->data before calling skb_reset_tail_pointer(). This was causing
spurious skb_over_panic()s from skb_put() being called on a recycled
skb that had its skb->tail set to beyond where it should have been.

Bug report from Peter van Valderen <linux@ddcrew.com>.
Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5cd33db2

02 11月, 2008 1 次提交

net: add documentation for skb recycling · d1a203ea

由 Stephen Hemminger 提交于 11月 01, 2008

Commit 04a4bb55 ("net: add
skb_recycle_check() to enable netdriver skb recycling") added a
method for network drivers to recycle skbuffs, but while use of
this mechanism was documented in the commit message, it should
really have been added as a docbook comment as well -- this
patch does that.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1a203ea

01 11月, 2008 1 次提交

mac80211: Re-enable aggregation · 8b30b1fe

由 Sujith 提交于 10月 24, 2008

Wireless HW without any dedicated queues for aggregation
do not need the ampdu_queues mechanism present right now
in mac80211. Since mac80211 is still incomplete wrt TX MQ
changes, do not allow aggregation sessions for drivers that
set ampdu_queues.

This is only an interim hack until Intel fixes the requeue issue.
Signed-off-by: NSujith <Sujith.Manoharan@atheros.com>
Signed-off-by: NLuis Rodriguez <Luis.Rodriguez@Atheros.com>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

8b30b1fe

29 10月, 2008 1 次提交

net: reduce structures when XFRM=n · def8b4fa

由 Alexey Dobriyan 提交于 10月 28, 2008

ifdef out
* struct sk_buff::sp		(pointer)
* struct dst_entry::xfrm	(pointer)
* struct sock::sk_policy	(2 pointers)
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

def8b4fa

14 10月, 2008 1 次提交

net: Rationalise email address: Network Specific Parts · 113aa838

由 Alan Cox 提交于 10月 13, 2008

Clean up the various different email addresses of mine listed in the code
to a single current and valid address. As Dave says his network merges
for 2.6.28 are now done this seems a good point to send them in where
they won't risk disrupting real changes.
Signed-off-by: NAlan Cox <alan@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

113aa838

08 10月, 2008 1 次提交

net: packet split receive api · 654bed16

由 Peter Zijlstra 提交于 10月 07, 2008

Add some packet-split receive hooks.

For one this allows to do NUMA node affine page allocs. Later on these
hooks will be extended to do emergency reserve allocations for
fragments.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

654bed16

01 10月, 2008 1 次提交

net: BUG instead of corrupting memory in pskb_expand_head · 4edd87ad

由 Herbert Xu 提交于 10月 01, 2008

If the caller of pskb_expand_head specifies a negative nhead
we'll silently overwrite other people's memory.  This patch
makes it BUG instead.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4edd87ad

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功