提交 · de8261c2fa364397ed872fad1244d75364689168 · openeuler / Kernel

14 2月, 2012 1 次提交

gro: fix truesize underestimation · de8261c2

由 Eric Dumazet 提交于 2月 13, 2012

skb_gro_receive() doesnt update truesize properly when adding one skb to
frag_list.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de8261c2

17 12月, 2011 1 次提交

net:core: use IS_ENABLED · a3bf7ae9

由 Igor Maravić 提交于 12月 12, 2011

Use IS_ENABLED(CONFIG_FOO)
instead of defined(CONFIG_FOO) || defined (CONFIG_FOO_MODULE)
Signed-off-by: NIgor Maravić <igorm@etf.rs>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a3bf7ae9

05 12月, 2011 1 次提交

tcp: take care of misalignments · 117632e6

由 Eric Dumazet 提交于 12月 03, 2011

We discovered that TCP stack could retransmit misaligned skbs if a
malicious peer acknowledged sub MSS frame. This currently can happen
only if output interface is non SG enabled : If SG is enabled, tcp
builds headless skbs (all payload is included in fragments), so the tcp
trimming process only removes parts of skb fragments, header stay
aligned.

Some arches cant handle misalignments, so force a head reallocation and
shrink headroom to MAX_TCP_HEADER.

Dont care about misaligments on x86 and PPC (or other arches setting
NET_IP_ALIGN to 0)

This patch introduces __pskb_copy() which can specify the headroom of
new head, and pskb_copy() becomes a wrapper on top of __pskb_copy()
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

117632e6

23 11月, 2011 1 次提交

net: correct comments of skb_shift · 20e994a0

由 Feng King 提交于 11月 21, 2011

when skb_shift, we want to shift paged data from skb to tgt frag area.
Original comments revert the shift order
Signed-off-by: NFeng King <kinwin2008@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

20e994a0

17 11月, 2011 1 次提交

net: introduce and use netdev_features_t for device features sets · c8f44aff

由 Michał Mirosław 提交于 11月 15, 2011

v2:	add couple missing conversions in drivers
	split unexporting netdev_fix_features()
	implemented %pNF
	convert sock::sk_route_(no?)caps
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8f44aff

15 11月, 2011 1 次提交

net: introduce build_skb() · b2b5ce9d

由 Eric Dumazet 提交于 11月 14, 2011

One of the thing we discussed during netdev 2011 conference was the idea
to change some network drivers to allocate/populate their skb at RX
completion time, right before feeding the skb to network stack.

In old days, we allocated skbs when populating the RX ring.

This means bringing into cpu cache sk_buff and skb_shared_info cache
lines (since we clear/initialize them), then 'queue' skb->data to NIC.

By the time NIC fills a frame in skb->data buffer and host can process
it, cpu probably threw away the cache lines from its caches, because lot
of things happened between the allocation and final use.

So the deal would be to allocate only the data buffer for the NIC to
populate its RX ring buffer. And use build_skb() at RX completion to
attach a data buffer (now filled with an ethernet frame) to a new skb,
initialize the skb_shared_info portion, and give the hot skb to network
stack.

build_skb() is the function to allocate an skb, caller providing the
data buffer that should be attached to it. Drivers are expected to call
skb_reserve() right after build_skb() to adjust skb->data to the
Ethernet frame (usually skipping NET_SKB_PAD and NET_IP_ALIGN, but some
drivers might add a hardware provided alignment)

Data provided to build_skb() MUST have been allocated by a prior
kmalloc() call, with enough room to add SKB_DATA_ALIGN(sizeof(struct
skb_shared_info)) bytes at the end of the data without corrupting
incoming frame.

data = kmalloc(NET_SKB_PAD + NET_IP_ALIGN + 1536 +
               SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
	       GFP_ATOMIC);
...
skb = build_skb(data);
if (!skb) {
	recycle_data(data);
} else {
	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
	...
}
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Eilon Greenstein <eilong@broadcom.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: Tom Herbert <therbert@google.com>
CC: Jamal Hadi Salim <hadi@mojatatu.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Thomas Graf <tgraf@infradead.org>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b2b5ce9d

10 11月, 2011 1 次提交

net: add wireless TX status socket option · 6e3e939f

由 Johannes Berg 提交于 11月 09, 2011

The 802.1X EAPOL handshake hostapd does requires
knowing whether the frame was ack'ed by the peer.
Currently, we fudge this pretty badly by not even
transmitting the frame as a normal data frame but
injecting it with radiotap and getting the status
out of radiotap monitor as well. This is rather
complex, confuses users (mon.wlan0 presence) and
doesn't work with all hardware.

To get rid of that hack, introduce a real wifi TX
status option for data frame transmissions.

This works similar to the existing TX timestamping
in that it reflects the SKB back to the socket's
error queue with a SCM_WIFI_STATUS cmsg that has
an int indicating ACK status (0/1).

Since it is possible that at some point we will
want to have TX timestamping and wifi status in a
single errqueue SKB (there's little point in not
doing that), redefine SO_EE_ORIGIN_TIMESTAMPING
to SO_EE_ORIGIN_TXSTATUS which can collect more
than just the timestamp; keep the old constant
as an alias of course. Currently the internal APIs
don't make that possible, but it wouldn't be hard
to split them up in a way that makes it possible.

Thanks to Neil Horman for helping me figure out
the functions that add the control messages.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

6e3e939f

04 11月, 2011 1 次提交

net: Add back alignment for size for __alloc_skb · bc417e30

由 Tony Lindgren 提交于 11月 02, 2011

Commit 87fb4b7b (net: more
accurate skb truesize) changed the alignment of size. This
can cause problems at least on some machines with NFS root:

Unhandled fault: alignment exception (0x801) at 0xc183a43a
Internal error: : 801 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (3.1.0-08784-g5eeee4a #733)
pc : [<c02fbba0>]    lr : [<c02fbb9c>]    psr: 60000013
sp : c180fef8  ip : 00000000  fp : c181f580
r10: 00000000  r9 : c044b28c  r8 : 00000001
r7 : c183a3a0  r6 : c1835be0  r5 : c183a412  r4 : 000001f2
r3 : 00000000  r2 : 00000000  r1 : ffffffe6  r0 : c183a43a
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 10004000  DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc180e270)
Stack: (0xc180fef8 to 0xc1810000)
fee0:                                                       00000024 00000000
ff00: 00000000 c183b9c0 c183b8e0 c044b28c c0507ccc c019dfc4 c180ff2c c0503cf8
ff20: c180ff4c c180ff4c 00000000 c1835420 c182c740 c18349c0 c05233c0 00000000
ff40: 00000000 c00e6bb8 c180e000 00000000 c04dd82c c0507e7c c050cc18 c183b9c0
ff60: c05233c0 00000000 00000000 c01f34f4 c0430d70 c019d364 c04dd898 c04dd898
ff80: c04dd82c c0507e7c c180e000 00000000 c04c584c c01f4918 c04dd898 c04dd82c
ffa0: c04ddd28 c180e000 00000000 c0008758 c181fa60 3231d82c 00000037 00000000
ffc0: 00000000 c04dd898 c04dd82c c04ddd28 00000013 00000000 00000000 00000000
ffe0: 00000000 c04b2224 00000000 c04b21a0 c001056c c001056c 00000000 00000000
Function entered at [<c02fbba0>] from [<c019dfc4>]
Function entered at [<c019dfc4>] from [<c01f34f4>]
Function entered at [<c01f34f4>] from [<c01f4918>]
Function entered at [<c01f4918>] from [<c0008758>]
Function entered at [<c0008758>] from [<c04b2224>]
Function entered at [<c04b2224>] from [<c001056c>]
Code: e1a00005 e3a01028 ebfa7cb0 e35a0000 (e5858028)

Here PC is at __alloc_skb and &shinfo->dataref is unaligned because
skb->end can be unaligned without this patch.

As explained by Eric Dumazet <eric.dumazet@gmail.com>, this happens
only with SLOB, and not with SLAB or SLUB:

* Eric Dumazet <eric.dumazet@gmail.com> [111102 15:56]:
>
> Your patch is absolutely needed, I completely forgot about SLOB :(
>
> since, kmalloc(386) on SLOB gives exactly ksize=386 bytes, not nearest
> power of two.
>
> [   60.305763] malloc(size=385)->ffff880112c11e38 ksize=386 -> nsize=2
> [   60.305921] malloc(size=385)->ffff88007c92ce28 ksize=386 -> nsize=2
> [   60.306898] malloc(size=656)->ffff88007c44ad28 ksize=656 -> nsize=272
> [   60.325385] malloc(size=656)->ffff88007c575868 ksize=656 -> nsize=272
> [   60.325531] malloc(size=656)->ffff88011c777230 ksize=656 -> nsize=272
> [   60.325701] malloc(size=656)->ffff880114011008 ksize=656 -> nsize=272
> [   60.346716] malloc(size=385)->ffff880114142008 ksize=386 -> nsize=2
> [   60.346900] malloc(size=385)->ffff88011c777690 ksize=386 -> nsize=2
Signed-off-by: NTony Lindgren <tony@atomide.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc417e30

21 10月, 2011 1 次提交

net: add opaque struct around skb frag page · a8605c60

由 Ian Campbell 提交于 10月 19, 2011

I've split this bit out of the skb frag destructor patch since it helps enforce
the use of the fragment API.
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8605c60

20 10月, 2011 1 次提交

net: Allow skb_recycle_check to be done in stages · 3d153a7c

由 Andy Fleming 提交于 10月 13, 2011

skb_recycle_check resets the skb if it's eligible for recycling.
However, there are times when a driver might want to optionally
manipulate the skb data with the skb before resetting the skb,
but after it has determined eligibility.  We do this by splitting the
eligibility check from the skb reset, creating two inline functions to
accomplish that task.
Signed-off-by: NAndy Fleming <afleming@freescale.com>
Acked-by: NDavid Daney <david.daney@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d153a7c

19 10月, 2011 1 次提交

net: add skb frag size accessors · 9e903e08

由 Eric Dumazet 提交于 10月 18, 2011

To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e903e08

14 10月, 2011 1 次提交

net: more accurate skb truesize · 87fb4b7b

由 Eric Dumazet 提交于 10月 13, 2011

skb truesize currently accounts for sk_buff struct and part of skb head.
kmalloc() roundings are also ignored.

Considering that skb_shared_info is larger than sk_buff, its time to
take it into account for better memory accounting.

This patch introduces SKB_TRUESIZE(X) macro to centralize various
assumptions into a single place.

At skb alloc phase, we put skb_shared_info struct at the exact end of
skb head, to allow a better use of memory (lowering number of
reallocations), since kmalloc() gives us power-of-two memory blocks.

Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
aligned to cache lines, as before.

Note: This patch might trigger performance regressions because of
misconfigured protocol stacks, hitting per socket or global memory
limits that were previously not reached. But its a necessary step for a
more accurate memory accounting.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Andi Kleen <ak@linux.intel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87fb4b7b

16 9月, 2011 1 次提交

net: copy userspace buffers on device forwarding · 48c83012

由 Michael S. Tsirkin 提交于 8月 31, 2011

dev_forward_skb loops an skb back into host networking
stack which might hang on the memory indefinitely.
In particular, this can happen in macvtap in bridged mode.
Copy the userspace fragments to avoid blocking the
sender in that case.

As this patch makes skb_copy_ubufs extern now,
I also added some documentation and made it clear
the SKBTX_DEV_ZEROCOPY flag automatically instead
of doing it in all callers. This can be made into a separate
patch if people feel it's worth it.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48c83012

25 8月, 2011 1 次提交

net: convert core to skb paged frag APIs · ea2ab693

由 Ian Campbell 提交于 8月 22, 2011

Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: netdev@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea2ab693

21 8月, 2011 1 次提交

net: Preserve ooo_okay when copying skb header · 6461be3a

由 Changli Gao 提交于 8月 19, 2011

Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6461be3a

18 8月, 2011 1 次提交

rps: Add flag to skb to indicate rxhash is based on L4 tuple · bdeab991

由 Tom Herbert 提交于 8月 14, 2011

The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4 tuple for the
packet which includes the port information in the encapsulated
transport packet.  This is used by the stack to preserve the
rxhash value in __skb_rx_tunnel.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bdeab991

02 8月, 2011 1 次提交

net: add kerneldoc to skb_copy_bits() · 22019b17

由 Eric Dumazet 提交于 7月 29, 2011

Since skb_copy_bits() is called from assembly, add a fat comment to make
clear we should think twice before changing its prototype.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22019b17

22 7月, 2011 1 次提交

skbuff: fix error handling in pskb_copy() · 1511022c

由 Dan Carpenter 提交于 7月 19, 2011

There are two problems:
1) "n" was allocated with alloc_skb() so we should free it with
   kfree_skb() instead of regular kfree().
2) We return the freed pointer instead of NULL.
Signed-off-by: NDan Carpenter <error27@gmail.com>
Reviewed-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1511022c

09 7月, 2011 1 次提交

skbuff: clear tx zero-copy flag · a48332f8

由 Shirley Ma 提交于 7月 09, 2011

This patch clears tx zero-copy flag as needed.
Signed-off-by: NShirley Ma <xma@us.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a48332f8

07 7月, 2011 1 次提交

skbuff: skb supports zero-copy buffers · a6686f2f

由 Shirley Ma 提交于 7月 06, 2011

This patch adds userspace buffers support in skb shared info. A new
struct skb_ubuf_info is needed to maintain the userspace buffers
argument and index, a callback is used to notify userspace to release
the buffers once lower device has done DMA (Last reference to that skb
has gone).

If there is any userspace apps to reference these userspace buffers,
then these userspaces buffers will be copied into kernel. This way we
can prevent userspace apps from holding these userspace buffers too long.

Use destructor_arg to point to the userspace buffer info; a new tx flags
SKBTX_DEV_ZEROCOPY is added for zero-copy buffer check.
Signed-off-by: NShirley Ma <xma@...ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6686f2f

21 5月, 2011 1 次提交

sanitize <linux/prefetch.h> usage · 268bb0ce

由 Linus Torvalds 提交于 5月 20, 2011

Commit e66eed65 ("list: remove prefetching from regular list
iterators") removed the include of prefetch.h from list.h, which
uncovered several cases that had apparently relied on that rather
obscure header file dependency.

So this fixes things up a bit, using

   grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
   grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

to guide us in finding files that either need <linux/prefetch.h>
inclusion, or have it despite not needing it.

There are more of them around (mostly network drivers), but this gets
many core ones.
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

268bb0ce

18 5月, 2011 1 次提交

net: add skb_dst_force() in sock_queue_err_skb() · abb57ea4

由 Eric Dumazet 提交于 5月 18, 2011

Commit 7fee226a (add a noref bit on skb dst) forgot to use
skb_dst_force() on packets queued in sk_error_queue

This triggers following warning, for applications using IP_CMSG_PKTINFO
receiving one error status


------------[ cut here ]------------
WARNING: at include/linux/skbuff.h:457 ip_cmsg_recv_pktinfo+0xa6/0xb0()
Hardware name: 2669UYD
Modules linked in: isofs vboxnetadp vboxnetflt nfsd ebtable_nat ebtables
lib80211_crypt_ccmp uinput xcbc hdaps tp_smapi thinkpad_ec radeonfb fb_ddc
radeon ttm drm_kms_helper drm ipw2200 intel_agp intel_gtt libipw i2c_algo_bit
i2c_i801 agpgart rng_core cfbfillrect cfbcopyarea cfbimgblt video raid10 raid1
raid0 linear md_mod vboxdrv
Pid: 4697, comm: miredo Not tainted 2.6.39-rc6-00569-g5895198c-dirty #22
Call Trace:
 [<c17746b6>] ? printk+0x1d/0x1f
 [<c1058302>] warn_slowpath_common+0x72/0xa0
 [<c15bbca6>] ? ip_cmsg_recv_pktinfo+0xa6/0xb0
 [<c15bbca6>] ? ip_cmsg_recv_pktinfo+0xa6/0xb0
 [<c1058350>] warn_slowpath_null+0x20/0x30
 [<c15bbca6>] ip_cmsg_recv_pktinfo+0xa6/0xb0
 [<c15bbdd7>] ip_cmsg_recv+0x127/0x260
 [<c154f82d>] ? skb_dequeue+0x4d/0x70
 [<c1555523>] ? skb_copy_datagram_iovec+0x53/0x300
 [<c178e834>] ? sub_preempt_count+0x24/0x50
 [<c15bdd2d>] ip_recv_error+0x23d/0x270
 [<c15de554>] udp_recvmsg+0x264/0x2b0
 [<c15ea659>] inet_recvmsg+0xd9/0x130
 [<c1547752>] sock_recvmsg+0xf2/0x120
 [<c11179cb>] ? might_fault+0x4b/0xa0
 [<c15546bc>] ? verify_iovec+0x4c/0xc0
 [<c1547660>] ? sock_recvmsg_nosec+0x100/0x100
 [<c1548294>] __sys_recvmsg+0x114/0x1e0
 [<c1093895>] ? __lock_acquire+0x365/0x780
 [<c1148b66>] ? fget_light+0xa6/0x3e0
 [<c1148b7f>] ? fget_light+0xbf/0x3e0
 [<c1148aee>] ? fget_light+0x2e/0x3e0
 [<c1549f29>] sys_recvmsg+0x39/0x60

Close bug https://bugzilla.kernel.org/show_bug.cgi?id=34622Reported-by: NWitold Baryluk <baryluk@smp.if.uj.edu.pl>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

abb57ea4

31 3月, 2011 1 次提交

Fix common misspellings · 25985edc

由 Lucas De Marchi 提交于 3月 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

17 3月, 2011 1 次提交

net: introduce rx_handler results and logic around that · 8a4eb573

由 Jiri Pirko 提交于 3月 12, 2011

This patch allows rx_handlers to better signalize what to do next to
it's caller. That makes skb->deliver_no_wcard no longer needed.

kernel-doc for rx_handler_result is taken from Nicolas' patch.
Signed-off-by: NJiri Pirko <jpirko@redhat.com>
Reviewed-by: NNicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a4eb573

02 3月, 2011 1 次提交

inet: Remove unused sk_sndmsg_* from UFO · 5a2ef920

由 Herbert Xu 提交于 3月 01, 2011

UFO doesn't really use the sk_sndmsg_* parameters so touching
them is pointless.  It can't use them anyway since the whole
point of UFO is to use the original pages without copying.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a2ef920

28 1月, 2011 1 次提交

net: add kmemcheck annotation in __alloc_skb() · c2aa3665

由 Eric Dumazet 提交于 1月 25, 2011

pskb_expand_head() triggers a kmemcheck warning when copy of
skb_shared_info is done in pskb_expand_head()

This is because destructor_arg field is not necessarily initialized at
this point. Add kmemcheck_annotate_variable() call in __alloc_skb() to
instruct kmemcheck this is a normal situation.

Resolves bugzilla.kernel.org 27212

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=27212Reported-by: NChristian Casteyde <casteyde.christian@free.fr>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2aa3665

25 1月, 2011 2 次提交

net: change netdev->features to u32 · 04ed3e74

由 Michał Mirosław 提交于 1月 24, 2011

Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
  struct netdev, as per Eric Dumazet's suggestion -DaveM ]
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04ed3e74

GRO: fix merging a paged skb after non-paged skbs · d1dc7abf

由 Michal Schmidt 提交于 1月 24, 2011

Suppose that several linear skbs of the same flow were received by GRO. They
were thus merged into one skb with a frag_list. Then a new skb of the same flow
arrives, but it is a paged skb with data starting in its frags[].

Before adding the skb to the frag_list skb_gro_receive() will of course adjust
the skb to throw away the headers. It correctly modifies the page_offset and
size of the frag, but it leaves incorrect information in the skb:
 ->data_len is not decreased at all.
 ->len is decreased only by headlen, as if no change were done to the frag.
Later in a receiving process this causes skb_copy_datagram_iovec() to return
-EFAULT and this is seen in userspace as the result of the recv() syscall.

In practice the bug can be reproduced with the sfc driver. By default the
driver uses an adaptive scheme when it switches between using
napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
reproduced when under rx load with enough successful GRO merging the driver
decides to switch from the former to the latter.

Manual control is also possible, so reproducing this is easy with netcat:
 - on machine1 (with sfc): nc -l 12345 > /dev/null
 - on machine2: nc machine1 12345 < /dev/zero
 - on machine1:
   echo 1 > /sys/module/sfc/parameters/rx_alloc_method  # use skbs
   echo 2 > /sys/module/sfc/parameters/rx_alloc_method  # use pages
 - See that nc has quit suddenly.

[v2: Modified by Eric Dumazet to avoid advancing skb->data past the end
     and to use a temporary variable.]
Signed-off-by: NMichal Schmidt <mschmidt@redhat.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1dc7abf

13 1月, 2011 1 次提交

netfilter: fix compilation when conntrack is disabled but tproxy is enabled · 2fc72c7b

由 KOVACS Krisztian 提交于 1月 12, 2011

The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.

Original report:
http://marc.info/?l=linux-netdev&m=129010118516341&w=2Reported-by: NRandy Dunlap <randy.dunlap@oracle.com>
Acked-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NKOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

2fc72c7b

17 12月, 2010 1 次提交

net: Use skb_checksum_start_offset() · 55508d60

由 Michał Mirosław 提交于 12月 14, 2010

Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset().

Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb).
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

55508d60

16 12月, 2010 1 次提交

netfilter: fix compilation when conntrack is disabled but tproxy is enabled · ae90bdea

由 KOVACS Krisztian 提交于 12月 15, 2010

The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.

Original report:
http://marc.info/?l=linux-netdev&m=129010118516341&w=2Reported-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NKOVACS Krisztian <hidden@balabit.hu>
Acked-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

ae90bdea

04 12月, 2010 1 次提交

net: don't reallocate skb->head unless the current one hasn't the needed extra size or is shared · ca44ac38

由 Changli Gao 提交于 11月 29, 2010

skb head being allocated by kmalloc(), it might be larger than what
actually requested because of discrete kmem caches sizes. Before
reallocating a new skb head, check if the current one has the needed
extra size.

Do this check only if skb head is not shared.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca44ac38

17 10月, 2010 1 次提交

net: allocate skbs on local node · 564824b0

由 Eric Dumazet 提交于 10月 11, 2010

commit b30973f8 (node-aware skb allocation) spread a wrong habit of
allocating net drivers skbs on a given memory node : The one closest to
the NIC hardware. This is wrong because as soon as we try to scale
network stack, we need to use many cpus to handle traffic and hit
slub/slab management on cross-node allocations/frees when these cpus
have to alloc/free skbs bound to a central node.

skb allocated in RX path are ephemeral, they have a very short
lifetime : Extra cost to maintain NUMA affinity is too expensive. What
appeared as a nice idea four years ago is in fact a bad one.

In 2010, NIC hardwares are multiqueue, or we use RPS to spread the load,
and two 10Gb NIC might deliver more than 28 million packets per second,
needing all the available cpus.

Cost of cross-node handling in network and vm stacks outperforms the
small benefit hardware had when doing its DMA transfert in its 'local'
memory node at RX time. Even trying to differentiate the two allocations
done for one skb (the sk_buff on local node, the data part on NIC
hardware node) is not enough to bring good performance.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

564824b0

09 9月, 2010 1 次提交

gro: Re-fix different skb headrooms · 64289c8e

由 Jarek Poplawski 提交于 9月 04, 2010

The patch: "gro: fix different skb headrooms" in its part:
"2) allocate a minimal skb for head of frag_list" is buggy. The copied
skb has p->data set at the ip header at the moment, and skb_gro_offset
is the length of ip + tcp headers. So, after the change the length of
mac header is skipped. Later skb_set_mac_header() sets it into the
NET_SKB_PAD area (if it's long enough) and ip header is misaligned at
NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the
original skb was wrongly allocated, so let's copy it as it was.

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
fixes commit: 3d3be433Reported-by: NPlamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Tested-by: NPlamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64289c8e

07 9月, 2010 2 次提交

skb: Add tracepoints to freeing skb · 07dc22e7

由 Koki Sanagi 提交于 8月 23, 2010

This patch adds tracepoint to consume_skb and add trace_kfree_skb
before __kfree_skb in skb_free_datagram_locked and net_tx_action.
Combinating with tracepoint on dev_hard_start_xmit, we can check
how long it takes to free transmitted packets. And using it, we can
calculate how many packets driver had at that time. It is useful when
a drop of transmitted packet is a problem.

            sshd-6828  [000] 112689.258154: consume_skb: skbaddr=f2d99bb8
Signed-off-by: NKoki Sanagi <sanagi.koki@jp.fujitsu.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4C724364.50903@jp.fujitsu.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

07dc22e7

net: pskb_expand_head() optimization · 1fd63041

由 Eric Dumazet 提交于 9月 02, 2010

pskb_expand_head() blindly takes references on fragments before calling
skb_release_data(), potentially releasing these references.

We can add a fast path, avoiding these atomic operations, if we own the
last reference on skb->head.

Based on a previous patch from David
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fd63041

04 9月, 2010 1 次提交

net: remove two kmemcheck annotations · 52ee7a04

由 Eric Dumazet 提交于 9月 03, 2010

__alloc_skb() uses a memset() to clear all the beginning of skb,
including bitfields contained in 'flags1' & 'flags2'.

We dont need any more to use kmemcheck_annotate_bitfield() on these
fields. However, we still need it for the clone part, which is not
cleared.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52ee7a04

02 9月, 2010 2 次提交

gro: fix different skb headrooms · 3d3be433

由 Eric Dumazet 提交于 9月 01, 2010

Packets entering GRO might have different headrooms, even for a given
flow (because of implementation details in drivers, like copybreak).
We cant force drivers to deliver packets with a fixed headroom.

1) fix skb_segment()

skb_segment() makes the false assumption headrooms of fragments are same
than the head. When CHECKSUM_PARTIAL is used, this can give csum_start
errors, and crash later in skb_copy_and_csum_dev()

2) allocate a minimal skb for head of frag_list

skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to
allocate a fresh skb. This adds NET_SKB_PAD to a padding already
provided by netdevice, depending on various things, like copybreak.

Use alloc_skb() to allocate an exact padding, to reduce cache line
needs:
NET_SKB_PAD + NET_IP_ALIGN

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626

Many thanks to Plamen Petrov, testing many debugging patches !
With help of Jarek Poplawski.
Reported-by: NPlamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d3be433

net: skbuff.c cleanup · 6602cebb

由 Eric Dumazet 提交于 9月 01, 2010

(skb->data - skb->head) can be changed by skb_headroom(skb)

Remove some uses of NET_SKBUFF_DATA_USES_OFFSET, using
(skb_end_pointer(skb) - skb->head) or
(skb_tail_pointer(skb) - skb->head) : compiler does the right thing,
and this is more readable for us ;)

(struct skb_shared_info *) casts in pskb_expand_head() to help memcpy()
to use aligned moves.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6602cebb

23 8月, 2010 1 次提交

net: Rename skb_has_frags to skb_has_frag_list · 21dc3301

由 David S. Miller 提交于 8月 23, 2010

SKBs can be "fragmented" in two ways, via a page array (called
skb_shinfo(skb)->frags[]) and via a list of SKBs (called
skb_shinfo(skb)->frag_list).

Since skb_has_frags() tests the latter, it's name is confusing
since it sounds more like it's testing the former.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

21dc3301

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功