1. 01 Aug 2014, 1 commit
    • net: Correctly set segment mac_len in skb_segment(). · fcdfe3a7
      Authored by Vlad Yasevich
      When performing segmentation, the mac_len value is copied right
      out of the original skb.  However, this value is not always set correctly
      (like when the packet is VLAN-tagged) and we'll end up copying a bad
      value.
      
      One way to demonstrate this is to configure a VM which tags
      packets internally and turn off VLAN acceleration on the forwarding
      bridge port.  The packets show up corrupt like this:
      16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q
      (0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0,
              0x0000:  8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d.
              0x0010:  0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0..
              0x0020:  29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3
              0x0030:  000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp
              0x0040:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
              0x0050:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
              0x0060:  6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
              ...
      
      This also leads to awful throughput as GSO packets are dropped and
      cause retransmissions.
      
      The solution is to set the mac_len using the values already available
      in the new skb.  We've already adjusted all of the header offsets, so we
      might as well correctly figure out the mac_len using skb_reset_mac_len().
      After this change, packets are segmented correctly and performance
      is restored.
      
      CC: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fcdfe3a7
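      A minimal sketch of the fix described above; the wrapper name
      segment_set_mac_len() is hypothetical, while skb_reset_mac_len() is the
      helper the commit message names for use inside skb_segment():

              #include <linux/skbuff.h>

              /* Derive mac_len from the new segment's own header offsets instead
               * of copying head_skb->mac_len, which may be stale for packets whose
               * VLAN tag was inserted in software.
               */
              static inline void segment_set_mac_len(struct sk_buff *nskb)
              {
                      /* old behaviour: nskb->mac_len = head_skb->mac_len; */
                      skb_reset_mac_len(nskb);  /* mac_len = network_header - mac_header */
              }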
  2. 26 Jun 2014, 1 commit
  3. 15 Jun 2014, 1 commit
    • net: Fix save software checksum complete · 46fb51eb
      Authored by Tom Herbert
      Geert reported issues regarding checksum complete and UDP.
      The logic introduced in commit 7e3cead5
      ("net: Save software checksum complete") is not correct.
      
      This patch:
      1) Restores code in __skb_checksum_complete_head except for setting
         CHECKSUM_UNNECESSARY. This function may be calculating checksum on
         something less than skb->len.
      2) Adds saving checksum to __skb_checksum_complete. The full packet
         checksum 0..skb->len is calculated without adding in pseudo header.
         This value is saved in skb->csum and then the pseudo header is added
         to that to derive the checksum for validation.
      3) In both __skb_checksum_complete_head and __skb_checksum_complete,
         set skb->csum_valid to whether checksum of zero was computed. This
         allows skb_csum_unnecessary to return true without changing to
         CHECKSUM_UNNECESSARY which was done previously.
      4) Copy new csum related bits in __copy_skb_header.
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      46fb51eb
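      A sketch approximating the reworked __skb_checksum_complete() described in
      points 2) and 3) above (details may differ from the in-tree code; skb->csum
      is assumed to hold the pseudo-header checksum on entry):

              __sum16 __skb_checksum_complete(struct sk_buff *skb)
              {
                      __wsum csum;
                      __sum16 sum;

                      /* full packet checksum 0..skb->len, no pseudo header added */
                      csum = skb_checksum(skb, 0, skb->len, 0);

                      /* add the pseudo-header checksum to validate */
                      sum = csum_fold(csum_add(skb->csum, csum));
                      if (likely(!sum) && unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
                          !skb->csum_complete_sw)
                              netdev_rx_csum_fault(skb->dev);

                      /* save the full packet checksum for later users */
                      skb->csum = csum;
                      skb->ip_summed = CHECKSUM_COMPLETE;
                      skb->csum_complete_sw = 1;
                      /* record validity without forcing CHECKSUM_UNNECESSARY */
                      skb->csum_valid = !sum;

                      return sum;
              }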
  4. 12 Jun 2014, 1 commit
    • net: add __pskb_copy_fclone and pskb_copy_for_clone · bad93e9d
      Authored by Octavian Purdila
      There are several instances where a pskb_copy or __pskb_copy is
      immediately followed by an skb_clone.
      
      Add a couple of new functions to allow the copy skb to be allocated
      from the fclone cache and thus speed up subsequent skb_clone calls.
      
      Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Marek Lindner <mareklindner@neomailbox.ch>
      Cc: Simon Wunderlich <sw@simonwunderlich.de>
      Cc: Antonio Quartulli <antonio@meshcoding.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Johan Hedberg <johan.hedberg@gmail.com>
      Cc: Arvid Brodin <arvid.brodin@alten.se>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
      Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Allan Stephens <allan.stephens@windriver.com>
      Cc: Andrew Hendry <andrew.hendry@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bad93e9d
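      The new helpers boil down to a thin wrapper layer; a sketch of the API as
      described (the flag plumbing into __alloc_skb() is omitted):

              /* Like __pskb_copy(), but when 'fclone' is true the copy is allocated
               * from the fclone cache so that a subsequent skb_clone() is cheap.
               */
              struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom,
                                                 gfp_t gfp_mask, bool fclone);

              static inline struct sk_buff *__pskb_copy(struct sk_buff *skb, int headroom,
                                                        gfp_t gfp_mask)
              {
                      return __pskb_copy_fclone(skb, headroom, gfp_mask, false);
              }

              /* For the call sites that copy and then immediately clone. */
              static inline struct sk_buff *pskb_copy_for_clone(struct sk_buff *skb,
                                                                gfp_t gfp_mask)
              {
                      return __pskb_copy_fclone(skb, skb_headroom(skb), gfp_mask, true);
              }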
  5. 11 Jun 2014, 1 commit
    • net: fix UDP tunnel GSO of frag_list GRO packets · 5882a07c
      Authored by Wei-Chun Chao
      This patch fixes a kernel BUG_ON in skb_segment. It is hit when
      testing two VMs on openvswitch with one VM acting as VXLAN gateway.
      
      During VXLAN packet GSO, skb_segment is called with skb->data
      pointing to inner TCP payload. skb_segment calls skb_network_protocol
      to retrieve the inner protocol. skb_network_protocol actually expects
      skb->data to point to MAC and it calls pskb_may_pull with ETH_HLEN.
      This ends up pulling in ETH_HLEN data from header tail. As a result,
      pskb_trim logic is skipped and BUG_ON is hit later.
      
      Move skb_push in front of skb_network_protocol so that skb->data
      lines up properly.
      
      kernel BUG at net/core/skbuff.c:2999!
      Call Trace:
      [<ffffffff816ac412>] tcp_gso_segment+0x122/0x410
      [<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
      [<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
      [<ffffffff816b3658>] skb_udp_tunnel_segment+0xd8/0x390
      [<ffffffff816b3c00>] udp4_ufo_fragment+0x120/0x140
      [<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
      [<ffffffff8109d742>] ? default_wake_function+0x12/0x20
      [<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
      [<ffffffff8164b4d0>] __skb_gso_segment+0x60/0xc0
      [<ffffffff8164b6b3>] dev_hard_start_xmit+0x183/0x550
      [<ffffffff8166c91e>] sch_direct_xmit+0xfe/0x1d0
      [<ffffffff8164bc94>] __dev_queue_xmit+0x214/0x4f0
      [<ffffffff8164bf90>] dev_queue_xmit+0x10/0x20
      [<ffffffff81687edb>] ip_finish_output+0x66b/0x890
      [<ffffffff81688a58>] ip_output+0x58/0x90
      [<ffffffff816c628f>] ? fib_table_lookup+0x29f/0x350
      [<ffffffff816881c9>] ip_local_out_sk+0x39/0x50
      [<ffffffff816cbfad>] iptunnel_xmit+0x10d/0x130
      [<ffffffffa0212200>] vxlan_xmit_skb+0x1d0/0x330 [vxlan]
      [<ffffffffa02a3919>] vxlan_tnl_send+0x129/0x1a0 [openvswitch]
      [<ffffffffa02a2cd6>] ovs_vport_send+0x26/0xa0 [openvswitch]
      [<ffffffffa029931e>] do_output+0x2e/0x50 [openvswitch]
      Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5882a07c
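      A simplified sketch of the reordering described above (fragment from
      skb_segment(); surrounding code omitted and variable names assumed to
      follow the kernel function):

              /* Before the fix, skb_network_protocol() ran while skb->data still
               * pointed at the inner TCP payload. Push the headers back first so
               * skb->data sits at the MAC header, as that helper expects.
               */
              __skb_push(head_skb, doffset);

              proto = skb_network_protocol(head_skb, &dummy);
              if (unlikely(!proto))
                      return ERR_PTR(-EINVAL);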
  6. 05 Jun 2014, 1 commit
    • net: Support for multiple checksums with gso · 7e2b10c1
      Authored by Tom Herbert
      When creating a GSO packet segment we may need to set more than
      one checksum in the packet (for instance a TCP checksum and
      UDP checksum for VXLAN encapsulation). To be efficient, we want
      to do checksum calculation for any part of the packet at most once.
      
      This patch adds csum_start offset to skb_gso_cb. This tracks the
      starting offset for skb->csum which is initially set in skb_segment.
      When a protocol needs to compute a transport checksum it calls
      gso_make_checksum which computes the checksum value from the start
      of transport header to csum_start and then adds in skb->csum to get
      the full checksum. skb->csum and csum_start are then updated to reflect
      the checksum of the resultant packet starting from the transport header.
      
      This patch also adds a flag to skbuff, encap_hdr_csum, which is set
      in *gso_segment functions to indicate that a tunnel protocol needs
      checksum calculation.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7e2b10c1
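      A simplified sketch of the helper described above (the in-tree version may
      differ in detail): skb->csum covers the bytes from csum_start to the end of
      the segment, so gso_make_checksum() only has to checksum the gap between
      the transport header and csum_start and combine the two.

              static inline __sum16 gso_make_checksum(struct sk_buff *skb, __wsum res)
              {
                      int plen = SKB_GSO_CB(skb)->csum_start - skb_headroom(skb) -
                                 skb_transport_offset(skb);
                      __wsum partial;

                      /* checksum transport header .. csum_start, seeded with skb->csum */
                      partial = csum_partial(skb_transport_header(skb), plen, skb->csum);

                      /* leave state behind for the next (outer) protocol layer */
                      skb->csum = res;
                      SKB_GSO_CB(skb)->csum_start -= plen;

                      return csum_fold(partial);
              }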
  7. 17 May 2014, 1 commit
  8. 13 May 2014, 1 commit
  9. 12 Apr 2014, 1 commit
    • net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      Authored by David S. Miller
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->sk_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      a read of freed memory.
      
      Furthermore, skb->len can be modified by the consumer, so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about its
      value, lots of call sites pass in arbitrary values such as '0' and
      even '1'.
      
      So just remove the length argument from the callback; that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      676d2369
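      In effect the callback in struct sock loses its second argument, and call
      sites stop dereferencing the queued skb; a before/after sketch:

              /* before: 'bytes' may refer to an skb that was already consumed */
              void (*sk_data_ready)(struct sock *sk, int bytes);

                      skb_queue_tail(&sk->sk_receive_queue, skb);
                      sk->sk_data_ready(sk, skb->len);        /* potential use after free */

              /* after: no length argument; readers inspect the queue themselves */
              void (*sk_data_ready)(struct sock *sk);

                      skb_queue_tail(&sk->sk_receive_queue, skb);
                      sk->sk_data_ready(sk);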
  10. 11 Apr 2014, 1 commit
  11. 29 Mar 2014, 1 commit
    • net: Account for all vlan headers in skb_mac_gso_segment · 53d6471c
      Authored by Vlad Yasevich
      skb_network_protocol() already accounts for multiple vlan
      headers that may be present in the skb.  However, skb_mac_gso_segment()
      doesn't know anything about it and assumes that skb->mac_len
      is set correctly to skip all mac headers.  That may not
      always be the case.  If we are simply forwarding the packet (via
      bridge or macvtap), all vlan headers may not be accounted for.
      
      A simple solution is to allow skb_network_protocol to return
      the vlan depth it has calculated.  This way skb_mac_gso_segment
      will correctly skip all mac headers.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      53d6471c
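      A sketch of how skb_mac_gso_segment() consumes the depth reported back by
      skb_network_protocol() (simplified; the packet_offload lookup and the rest
      of the function are omitted):

              int vlan_depth = skb->mac_len;          /* starting guess */
              __be16 type = skb_network_protocol(skb, &vlan_depth);

              if (unlikely(!type))
                      return ERR_PTR(-EINVAL);

              /* skip every mac/vlan header, not just skb->mac_len bytes */
              __skb_pull(skb, vlan_depth);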
  12. 28 Mar 2014, 1 commit
  13. 14 Mar 2014, 1 commit
  14. 12 Mar 2014, 5 commits
  15. 17 Feb 2014, 1 commit
  16. 12 Feb 2014, 1 commit
  17. 29 Jan 2014, 1 commit
  18. 27 Jan 2014, 1 commit
  19. 23 Jan 2014, 1 commit
    • fuse: fix pipe_buf_operations · 28a625cb
      Authored by Miklos Szeredi
      Having this struct in module memory could Oops if the module is
      unloaded while the buffer still persists in a pipe.
      
      Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal,
      merge them into nosteal_pipe_buf_ops (this is the same as
      default_pipe_buf_ops except that stealing the page from the buffer is not
      allowed).
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: stable@vger.kernel.org
      28a625cb
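      A minimal sketch of the idea (the exact pipe_buf_operations field set varied
      across kernel versions of that era): the ops now live in core kernel code,
      and .steal reports failure so a page can never be stolen out of the buffer.

              static int generic_pipe_buf_nosteal(struct pipe_inode_info *pipe,
                                                  struct pipe_buffer *buf)
              {
                      return 1;       /* non-zero: refuse to give up the page */
              }

              const struct pipe_buf_operations nosteal_pipe_buf_ops = {
                      .can_merge = 0,
                      .confirm   = generic_pipe_buf_confirm,
                      .release   = generic_pipe_buf_release,
                      .steal     = generic_pipe_buf_nosteal,
                      .get       = generic_pipe_buf_get,
              };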
  20. 15 Jan 2014, 1 commit
    • net: add skb_checksum_setup · ed1f50c3
      Authored by Paul Durrant
      This patch adds a function that sets up the partial checksum offset for IP
      packets (and optionally re-calculates the pseudo-header checksum) to the
      core network code.
      The implementation was previously private and duplicated between xen-netback
      and xen-netfront; however, it is not xen-specific and is potentially useful
      to any network driver.
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ed1f50c3
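      A hedged usage sketch for a driver receive path (the error accounting and
      surrounding names are illustrative, not taken from any particular driver);
      the second argument asks skb_checksum_setup() to also recalculate the
      pseudo-header checksum:

              if (skb_checksum_setup(skb, true)) {    /* true: refresh pseudo-header csum */
                      dev->stats.rx_errors++;         /* hypothetical accounting */
                      kfree_skb(skb);
                      return;
              }
              napi_gro_receive(napi, skb);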
  21. 07 Jan 2014, 1 commit
  22. 22 Dec 2013, 1 commit
  23. 18 Dec 2013, 1 commit
  24. 06 Dec 2013, 1 commit
  25. 22 Nov 2013, 1 commit
  26. 11 Nov 2013, 1 commit
    • netfilter: push reasm skb through instead of original frag skbs · 6aafeef0
      Authored by Jiri Pirko
      Pushing the original fragments through causes several problems. For
      example, frags may not be matched correctly. Take the following
      example:
      
      <example>
      On HOSTA do:
      ip6tables -I INPUT -p icmpv6 -j DROP
      ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
      
      and on HOSTB you do:
      ping6 HOSTA -s2000    (MTU is 1500)
      
      Incoming echo requests will be filtered out on HOSTA.  This issue does
      not occur with packets smaller than the MTU (where fragmentation does not
      happen).
      </example>
      
      As was discussed previously, the only correct solution seems to be to use
      the reassembled skb instead of the separate frags.  Doing this has the
      positive side effects of shrinking sk_buff by one pointer (nfct_reasm)
      and allowing the reasm dances in ipvs and conntrack to be removed.
      
      Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
      entirely and use code in net/ipv6/reassembly.c instead.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6aafeef0
  27. 08 Nov 2013, 2 commits
  28. 05 Nov 2013, 2 commits
  29. 04 Nov 2013, 1 commit
  30. 20 Oct 2013, 1 commit
    • net: generalize skb_segment() · 030737bc
      Authored by Eric Dumazet
      While implementing GSO/TSO support for IPIP, I found that skb_segment()
      was assuming the network header immediately followed the mac header.
      
      That's not really true in the case where inet_gso_segment() is stacked:
      by the time tcp_gso_segment() is called, the network header points
      to the inner IP header.
      
      Let's instead assume nothing and pick up the current offsets found in the
      original skb; we have the skb_headers_offset_update() helper for that.
      
      Also move the csum_start update inside skb_headers_offset_update().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      030737bc
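      A sketch of the helper after this change (approximating net/core/skbuff.c);
      the CHECKSUM_PARTIAL branch is the csum_start update the commit folds in:

              static void skb_headers_offset_update(struct sk_buff *skb, int off)
              {
                      /* only adjust this if it really is csum_start, not a csum value */
                      if (skb->ip_summed == CHECKSUM_PARTIAL)
                              skb->csum_start += off;

                      /* header fields are stored as offsets relative to skb->head */
                      skb->transport_header += off;
                      skb->network_header += off;
                      if (skb_mac_header_was_set(skb))
                              skb->mac_header += off;
                      skb->inner_transport_header += off;
                      skb->inner_network_header += off;
                      skb->inner_mac_header += off;
              }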
  31. 10 Oct 2013, 1 commit
    • net: gro: allow to build full sized skb · 8a29111c
      Authored by Eric Dumazet
      skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb,
      typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags.
      
      It's relatively easy to extend the skb using frag_list to allow
      more frags to be appended into the last sk_buff.
      
      This still builds very efficient skbs, and allows reaching 45 MSS per
      skb.
      
      (A 45-MSS GRO packet uses one skb plus a frag_list containing 2 additional
      sk_buffs.)
      
      High speed TCP flows benefit from this extension by lowering TCP stack
      cpu usage (fewer packets stored in the receive queue, fewer ACK packets
      processed).
      
      Forwarding setups could be hurt, as such skbs will need to be
      linearized, although it's not a new problem, as GRO could already
      provide skbs with a frag_list.
      
      We could make the 65536-byte threshold a tunable to mitigate this.
      
      (The first time we need to linearize an skb in skb_needs_linearize(), we
      could lower the tunable to ~16*1460 so that subsequent skb_gro_receive()
      calls build smaller skbs.)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a29111c
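      Back-of-the-envelope numbers behind the figures quoted above, assuming an
      MSS of 1448 bytes (1460 minus TCP timestamps):

              /* frags-only GRO skb: MAX_SKB_FRAGS (17 with 4K pages) page frags
               *     17 * 1448 = 24616 bytes   (~17 MSS, as quoted above)
               *
               * with the frag_list extension: head skb plus 2 chained sk_buffs
               *     45 * 1448 = 65160 bytes   (just under the 65536-byte ceiling)
               */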
  32. 04 Sep 2013, 1 commit
  33. 02 Aug 2013, 1 commit
  34. 25 Jul 2013, 1 commit