- 09 7月, 2012 1 次提交
-
-
由 Gao feng 提交于
we set max_prioidx to the first zero bit index of prioidx_map in function get_prioidx. So when we delete the low index netprio cgroup and adding a new netprio cgroup again,the max_prioidx will be set to the low index. when we set the high index cgroup's net_prio.ifpriomap,the function write_priomap will call update_netdev_tables to alloc memory which size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1), so the size of array that map->priomap point to is max_prioidx +1, which is low than what we actually need. fix this by adding check in get_prioidx,only set max_prioidx when max_prioidx low than the new prioidx. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 6月, 2012 1 次提交
-
-
由 Vinson Lee 提交于
Make logging level consistent with other deprecation messages in net subsystem. Signed-off-by: NVinson Lee <vlee@twitter.com> Cc: David Mackey <tdmackey@twitter.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 6月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
Orphaning skb in dev_hard_start_xmit() makes bonding behavior unfriendly for applications sending big UDP bursts : Once packets pass the bonding device and come to real device, they might hit a full qdisc and be dropped. Without orphaning, the sender is automatically throttled because sk->sk_wmemalloc reaches sk->sk_sndbuf (assuming sk_sndbuf is not too big) We could try to defer the orphaning adding another test in dev_hard_start_xmit(), but all this seems of little gain, now that BQL tends to make packets more likely to be parked in Qdisc queues instead of NIC TX ring, in cases where performance matters. Reverts commits : fc6055a5 net: Introduce skb_orphan_try() 87fd308c net: skb_tx_hash() fix relative to skb_orphan_try() and removes SKBTX_DRV_NEEDS_SK_REF flag Reported-and-bisected-by: NJean-Michel Hautbois <jhautbois@gmail.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Tested-by: NOliver Hartkopp <socketcan@hartkopp.net> Acked-by: NOliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 6月, 2012 2 次提交
-
-
由 Eric Dumazet 提交于
Bogdan Hamciuc diagnosed and fixed following bug in netpoll_send_udp() : "skb->len += len;" instead of "skb_put(skb, len);" Meaning that _if_ a network driver needs to call skb_realloc_headroom(), only packet headers would be copied, leaving garbage in the payload. However the skb_realloc_headroom() must be avoided as much as possible since it requires memory and netpoll tries hard to work even if memory is exhausted (using a pool of preallocated skbs) It appears netpoll_send_udp() reserved 16 bytes for the ethernet header, which happens to work for typicall drivers but not all. Right thing is to use LL_RESERVED_SPACE(dev) (And also add dev->needed_tailroom of tailroom) This patch combines both fixes. Many thanks to Bogdan for raising this issue. Reported-by: NBogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Tested-by: NBogdan Hamciuc <bogdan.hamciuc@freescale.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: NNeil Horman <nhorman@tuxdriver.com> Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered by splice_shrink_spd() called from vmsplice_to_pipe() commit 35f3d14d (pipe: add support for shrinking and growing pipes) added capability to adjust pipe->buffers. Problem is some paths don't hold pipe mutex and assume pipe->buffers doesn't change for their duration. Fix this by adding nr_pages_max field in struct splice_pipe_desc, and use it in place of pipe->buffers where appropriate. splice_shrink_spd() loses its struct pipe_inode_info argument. Reported-by: NDave Jones <davej@redhat.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Tom Herbert <therbert@google.com> Cc: stable <stable@vger.kernel.org> # 2.6.35 Tested-by: NDave Jones <davej@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 6月, 2012 1 次提交
-
-
由 Randy Dunlap 提交于
Fix kernel-doc warnings in net/core: Warning(net/core/skbuff.c:3368): No description found for parameter 'delta_truesize' Warning(net/core/filter.c:628): No description found for parameter 'pfp' Warning(net/core/filter.c:628): Excess function parameter 'sk' description in 'sk_unattached_filter_create' Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 6月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
Denys found out "ip neigh" output was truncated to about 54 neighbours. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDenys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 6月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
drop_monitor calls several sleeping functions while in atomic context. BUG: sleeping function called from invalid context at mm/slub.c:943 in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2 Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55 Call Trace: [<ffffffff810697ca>] __might_sleep+0xca/0xf0 [<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0 [<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130 [<ffffffff815343fb>] __alloc_skb+0x4b/0x230 [<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor] [<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor] [<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor] [<ffffffff810568e0>] process_one_work+0x130/0x4c0 [<ffffffff81058249>] worker_thread+0x159/0x360 [<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240 [<ffffffff8105d403>] kthread+0x93/0xa0 [<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10 [<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80 [<ffffffff816be6d0>] ? gs_change+0xb/0xb Rework the logic to call the sleeping functions in right context. Use standard timer/workqueue api to let system chose any cpu to perform the allocation and netlink send. Also avoid a loop if reset_per_cpu_data() cannot allocate memory : use mod_timer() to wait 1/10 second before next try. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 6月, 2012 1 次提交
-
-
由 Jason Wang 提交于
We need to validate the number of pages consumed by data_len, otherwise frags array could be overflowed by userspace. So this patch validate data_len and return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 5月, 2012 1 次提交
-
-
由 Neil Horman 提交于
Now that we have module alias macros for generic netlink families, lets use those to mark modules with the appropriate family names for loading Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 5月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
Move tcp_try_coalesce() protocol independent part to skb_try_coalesce(). skb_try_coalesce() can be used in IPv4 defrag and IPv6 reassembly, to build optimized skbs (less sk_buff, and possibly less 'headers') skb_try_coalesce() is zero copy, unless the copy can fit in destination header (its a rare case) kfree_skb_partial() is also moved to net/core/skbuff.c and exported, because IPv6 will need it in patch (ipv6: use skb coalescing in reassembly). Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 5月, 2012 3 次提交
-
-
由 Eric Dumazet 提交于
No need to export napi_frags_skb() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
commit c57b5468 (pktgen: fix crash at module unload) did a very poor job with list primitives. 1) list_splice() arguments were in the wrong order 2) list_splice(list, head) has undefined behavior if head is not initialized. 3) We should use the list_splice_init() variant to clear pktgen_threads list. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Fix two issues introduced in commit a1c7fff7 ( net: netdev_alloc_skb() use build_skb() ) - Must be IRQ safe (non NAPI drivers can use it) - Must not leak the frag if build_skb() fails to allocate sk_buff This patch introduces netdev_alloc_frag() for drivers willing to use build_skb() instead of __netdev_alloc_skb() variants. Factorize code so that : __dev_alloc_skb() is a wrapper around __netdev_alloc_skb(), and dev_alloc_skb() a wrapper around netdev_alloc_skb() Use __GFP_COLD flag. Almost all network drivers now benefit from skb->head_frag infrastructure. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 5月, 2012 2 次提交
-
-
由 Neil Horman 提交于
When I first wrote drop monitor I wrote it to just build monolithically. There is no reason it can't be built modularly as well, so lets give it that flexibiity. I've tested this by building it as both a module and monolithically, and it seems to work quite well Change notes: v2) * fixed for_each_present_cpu loops to be more correct as per Eric D. * Converted exit path failures to BUG_ON as per Ben H. v3) * Converted del_timer to del_timer_sync to close race noted by Ben H. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Ben Hutchings <bhutchings@solarflare.com> Reviewed-by: NBen Hutchings <bhutchings@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
netdev_alloc_skb() is used by networks driver in their RX path to allocate an skb to receive an incoming frame. With recent skb->head_frag infrastructure, it makes sense to change netdev_alloc_skb() to use build_skb() and a frag allocator. This permits a zero copy splice(socket->pipe), and better GRO or TCP coalescing. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 5月, 2012 3 次提交
-
-
由 Joe Perches 提交于
Use the current logging style. This enables use of dynamic debugging as well. Convert printk(KERN_<LEVEL> to pr_<level>. Add pr_fmt. Remove embedded prefixes, use %s, __func__ instead. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Convert printk(KERN_DEBUG to pr_debug which can enable dynamic debugging. Remove embedded prefixes from the conversions as pr_fmt adds them. Align arguments. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
- sock_flag() accepts a const pointer - sock_flag() returns a boolean Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 5月, 2012 2 次提交
-
-
由 Paul Gortmaker 提交于
We are going to delete the Token ring support. This removes any special processing in the core networking for token ring, (aside from net/tr.c itself), leaving the drivers and remaining tokenring support present but inert. The mass removal of the drivers and net/tr.c will be in a separate commit, so that the history of these files that we still care about won't have the giant deletion tied into their history. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Joe Perches 提交于
Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 5月, 2012 2 次提交
-
-
由 Eric Dumazet 提交于
commit 7d3d43da (net: In unregister_netdevice_notifier unregister the netdevices.) makes pktgen crashing at module unload. [ 296.820578] BUG: spinlock bad magic on CPU#6, rmmod/3267 [ 296.820719] lock: ffff880310c38000, .magic: ffff8803, .owner: <none>/-1, .owner_cpu: -1 [ 296.820943] Pid: 3267, comm: rmmod Not tainted 3.4.0-rc5+ #254 [ 296.821079] Call Trace: [ 296.821211] [<ffffffff8168a715>] spin_dump+0x8a/0x8f [ 296.821345] [<ffffffff8168a73b>] spin_bug+0x21/0x26 [ 296.821507] [<ffffffff812b4741>] do_raw_spin_lock+0x131/0x140 [ 296.821648] [<ffffffff8169188e>] _raw_spin_lock+0x1e/0x20 [ 296.821786] [<ffffffffa00cc0fd>] __pktgen_NN_threads+0x4d/0x140 [pktgen] [ 296.821928] [<ffffffffa00ccf8d>] pktgen_device_event+0x10d/0x1e0 [pktgen] [ 296.822073] [<ffffffff8154ed4f>] unregister_netdevice_notifier+0x7f/0x100 [ 296.822216] [<ffffffffa00d2a0b>] pg_cleanup+0x48/0x73 [pktgen] [ 296.822357] [<ffffffff8109528e>] sys_delete_module+0x17e/0x2a0 [ 296.822502] [<ffffffff81699652>] system_call_fastpath+0x16/0x1b Hold the pktgen_thread_lock while splicing pktgen_threads, and test pktgen_exiting in pktgen_device_event() to make unload faster. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This reverts commit 8a83a00b. It causes regressions for S390 devices, because it does an unconditional DST drop on SKBs for vlans and the QETH device needs the neighbour entry hung off the DST for certain things on transmit. Arnd can't remember exactly why he even needed this change. Conflicts: drivers/net/macvlan.c net/8021q/vlan_dev.c net/core/dev.c Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 5月, 2012 2 次提交
-
-
由 Stuart Hodgson 提交于
ETHTOOL_GMODULEINFO returns a new struct ethtool_modinfo that will return the type and size of plug-in module eeprom (such as SFP+) for parsing by userland program. ETHTOOL_GMODULEEEPROM returns the raw eeprom information using the existing ethtool_eeprom structture to return the data Signed-off-by: NStuart Hodgson <smhodgson@solarflare.com> Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
-
由 Ben Hutchings 提交于
We want to support reading module (SFP+, XFP, ...) EEPROMs as well as NIC EEPROMs. They will need a different command number and driver operation, but the structure and arguments will be the same and so we can share most of the code here. Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
-
- 09 5月, 2012 1 次提交
-
-
由 Hans Schillstrom 提交于
To build ip_vs as a module sysctl_rmem_max and sysctl_wmem_max needs to be exported. The dependency was added by "ipvs: wakeup master thread" patch. Signed-off-by: NHans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: NSimon Horman <horms@verge.net.au> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 07 5月, 2012 3 次提交
-
-
由 Alexander Duyck 提交于
With the recent changes for how we compute the skb truesize it occurs to me we are probably going to have a lot of calls to skb_end_pointer - skb->head. Instead of running all over the place doing that it would make more sense to just make it a separate inline skb_end_offset(skb) that way we can return the correct value without having gcc having to do all the optimization to cancel out skb->head - skb->head. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
Since there is now only one spot that actually uses "fastpath" there isn't much point in carrying it. Instead we can just use a check for skb_cloned to verify if we can perform the fast-path free for the head or not. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
The fast-path for pskb_expand_head contains a check where the size plus the unaligned size of skb_shared_info is compared against the size of the data buffer. This code path has two issues. First is the fact that after the recent changes by Eric Dumazet to __alloc_skb and build_skb the shared info is always placed in the optimal spot for a buffer size making this check unnecessary. The second issue is the fact that the check doesn't take into account the aligned size of shared info. As a result the code burns cycles doing a memcpy with nothing actually being shifted. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 5月, 2012 2 次提交
-
-
由 Alexander Duyck 提交于
This patch adds support for a skb_head_is_locked helper function. It is meant to be used any time we are considering transferring the head from skb->head to a paged frag. If the head is locked it means we cannot remove the head from the skb so it must be copied or we must take the skb as a whole. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
GRO is very optimistic in skb truesize estimates, only taking into account the used part of fragments. Be conservative, and use more precise computation, so that bloated GRO skbs can be collapsed eventually. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 5月, 2012 4 次提交
-
-
由 Eric W. Biederman 提交于
These function are no longer needed replace them with their more useful equivalents. Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 David S. Miller 提交于
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change is meant ot prevent stealing the skb->head to use as a page in the event that the skb->head was cloned. This allows the other clones to track each other via shinfo->dataref. Without this we break down to two methods for tracking the reference count, one being dataref, the other being the page count. As a result it becomes difficult to track how many references there are to skb->head. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neil Horman 提交于
I just noticed after some recent updates, that the init path for the drop monitor protocol has a minor error. drop monitor maintains a per cpu structure, that gets initalized from a single cpu. Normally this is fine, as the protocol isn't in use yet, but I recently made a change that causes a failed skb allocation to reschedule itself . Given the current code, the implication is that this workqueue reschedule will take place on the wrong cpu. If drop monitor is used early during the boot process, its possible that two cpus will access a single per-cpu structure in parallel, possibly leading to data corruption. This patch fixes the situation, by storing the cpu number that a given instance of this per-cpu data should be accessed from. In the case of a need for a reschedule, the cpu stored in the struct is assigned the rescheule, rather than the currently executing cpu Tested successfully by myself. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> CC: David Miller <davem@davemloft.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 5月, 2012 4 次提交
-
-
由 Eric Dumazet 提交于
TCP or UDP stacks have big enough latencies that prefetching next pointer is worth it. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
__skb_splice_bits() can check if skb to be spliced has its skb->head mapped to a page fragment, instead of a kmalloc() area. If so we can avoid a copy of the skb head and get a reference on underlying page. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
GRO can check if skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We 'upgrade' skb->head as a fragment in itself This avoids the frag_list fallback, and permits to build true GRO skb (one sk_buff and up to 16 fragments), using less memory. This reduces number of cache misses when user makes its copy, since a single sk_buff is fetched. This is a followup of patch "net: allow skb->head to be a page fragment" Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
skb->head is currently allocated from kmalloc(). This is convenient but has the drawback the data cannot be converted to a page fragment if needed. We have three spots were it hurts : 1) GRO aggregation When a linear skb must be appended to another skb, GRO uses the frag_list fallback, very inefficient since we keep all struct sk_buff around. So drivers enabling GRO but delivering linear skbs to network stack aren't enabling full GRO power. 2) splice(socket -> pipe). We must copy the linear part to a page fragment. This kind of defeats splice() purpose (zero copy claim) 3) TCP coalescing. Recently introduced, this permits to group several contiguous segments into a single skb. This shortens queue lengths and save kernel memory, and greatly reduce probabilities of TCP collapses. This coalescing doesnt work on linear skbs (or we would need to copy data, this would be too slow) Given all these issues, the following patch introduces the possibility of having skb->head be a fragment in itself. We use a new skb flag, skb->head_frag to carry this information. build_skb() is changed to accept a frag_size argument. Drivers willing to provide a page fragment instead of kmalloc() data will set a non zero value, set to the fragment size. Then, on situations we need to convert the skb head to a frag in itself, we can check if skb->head_frag is set and avoid the copies or various fallbacks we have. This means drivers currently using frags could be updated to avoid the current skb->head allocation and reduce their memory footprint (aka skb truesize). (thats 512 or 1024 bytes saved per skb). This also makes bpf/netfilter faster since the 'first frag' will be part of skb linear part, no need to copy data. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 4月, 2012 1 次提交
-
-
由 Jeffrin Jose 提交于
Fixed a coding style issue relating to spaces in net/core/sock.c Signed-off-by: NJeffrin Jose <ahiliation@yahoo.co.in> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-