- 09 6月, 2012 1 次提交
-
-
由 Gao feng 提交于
now inetpeer doesn't support namespace,the information will be leaking across namespace. this patch move the global vars v4_peers and v6_peers to netns_ipv4 and netns_ipv6 as a field peers. add struct pernet_operations inetpeer_ops to initial pernet inetpeer data. and change family_to_base and inet_getpeer to support namespace. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 6月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
commit 5faa5df1 (inetpeer: Invalidate the inetpeer tree along with the routing cache) added a race : Before freeing an inetpeer, we must respect a RCU grace period, and make sure no user will attempt to increase refcnt. inetpeer_invalidate_tree() waits for a RCU grace period before inserting inetpeer tree into gc_list and waking the worker. At that time, no concurrent lookup can find a inetpeer in this tree. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 6月, 2012 4 次提交
-
-
由 Joe Perches 提交于
Adding casts of objects to the same type is unnecessary and confusing for a human reader. For example, this cast: int y; int *p = (int *)&y; I used the coccinelle script below to find and remove these unnecessary casts. I manually removed the conversions this script produces of casts with __force and __user. @@ type T; T *p; @@ - (T *)p + p Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Remove some dropwatch/drop_monitor false positives. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tcp_make_synack() clones the dst, and callers release it. We can avoid two atomic operations per SYNACK if tcp_make_synack() consumes dst instead of cloning it. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
There is no value using sock_wmalloc() in tcp_make_synack(). A listener socket only sends SYNACK packets, they are not queued in a socket queue, only in Qdisc and device layers, so the number of in flight packets is limited in these layers. We used sock_wmalloc() with the %force parameter set to 1 to ignore socket limits anyway. This patch removes two atomic operations per SYNACK packet. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 6月, 2012 2 次提交
-
-
由 Eric Dumazet 提交于
While testing how linux behaves on SYNFLOOD attack on multiqueue device (ixgbe), I found that SYNACK messages were dropped at Qdisc level because we send them all on a single queue. Obvious choice is to reflect incoming SYN packet @queue_mapping to SYNACK packet. Under stress, my machine could only send 25.000 SYNACK per second (for 200.000 incoming SYN per second). NIC : ixgbe with 16 rx/tx queues. After patch, not a single SYNACK is dropped. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Another problem on SYNFLOOD/DDOS attack is the inetpeer cache getting larger and larger, using lots of memory and cpu time. tcp_v4_send_synack() ->inet_csk_route_req() ->ip_route_output_flow() ->rt_set_nexthop() ->rt_init_metrics() ->inet_getpeer( create = true) This is a side effect of commit a4daad6b (net: Pre-COW metrics for TCP) added in 2.6.39 Possible solution : Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS Before patch : # grep peer /proc/slabinfo inet_peer_cache 4175430 4175430 192 42 2 : tunables 0 0 0 : slabdata 99415 99415 0 Samples: 41K of event 'cycles', Event count (approx.): 30716565122 + 20,24% ksoftirqd/0 [kernel.kallsyms] [k] inet_getpeer + 8,19% ksoftirqd/0 [kernel.kallsyms] [k] peer_avl_rebalance.isra.1 + 4,81% ksoftirqd/0 [kernel.kallsyms] [k] sha_transform + 3,64% ksoftirqd/0 [kernel.kallsyms] [k] fib_table_lookup + 2,36% ksoftirqd/0 [ixgbe] [k] ixgbe_poll + 2,16% ksoftirqd/0 [kernel.kallsyms] [k] __ip_route_output_key + 2,11% ksoftirqd/0 [kernel.kallsyms] [k] kernel_map_pages + 2,11% ksoftirqd/0 [kernel.kallsyms] [k] ip_route_input_common + 2,01% ksoftirqd/0 [kernel.kallsyms] [k] __inet_lookup_established + 1,83% ksoftirqd/0 [kernel.kallsyms] [k] md5_transform + 1,75% ksoftirqd/0 [kernel.kallsyms] [k] check_leaf.isra.9 + 1,49% ksoftirqd/0 [kernel.kallsyms] [k] ipt_do_table + 1,46% ksoftirqd/0 [kernel.kallsyms] [k] hrtimer_interrupt + 1,45% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_alloc + 1,29% ksoftirqd/0 [kernel.kallsyms] [k] inet_csk_search_req + 1,29% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb + 1,16% ksoftirqd/0 [kernel.kallsyms] [k] copy_user_generic_string + 1,15% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_free + 1,02% ksoftirqd/0 [kernel.kallsyms] [k] tcp_make_synack + 0,93% ksoftirqd/0 [kernel.kallsyms] [k] _raw_spin_lock_bh + 0,87% ksoftirqd/0 [kernel.kallsyms] [k] __call_rcu + 0,84% ksoftirqd/0 [kernel.kallsyms] [k] rt_garbage_collect + 0,84% ksoftirqd/0 [kernel.kallsyms] [k] fib_rules_lookup Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 5月, 2012 1 次提交
-
-
由 Glauber Costa 提交于
We call the destroy function when a cgroup starts to be removed, such as by a rmdir event. However, because of our reference counters, some objects are still inflight. Right now, we are decrementing the static_keys at destroy() time, meaning that if we get rid of the last static_key reference, some objects will still have charges, but the code to properly uncharge them won't be run. This becomes a problem specially if it is ever enabled again, because now new charges will be added to the staled charges making keeping it pretty much impossible. We just need to be careful with the static branch activation: since there is no particular preferred order of their activation, we need to make sure that we only start using it after all call sites are active. This is achieved by having a per-memcg flag that is only updated after static_key_slow_inc() returns. At this time, we are sure all sites are active. This is made per-memcg, not global, for a reason: it also has the effect of making socket accounting more consistent. The first memcg to be limited will trigger static_key() activation, therefore, accounting. But all the others will then be accounted no matter what. After this patch, only limited memcgs will have its sockets accounted. [akpm@linux-foundation.org: move enum sock_flag_bits into sock.h, document enum sock_flag_bits, convert memcg_proto_active() and memcg_proto_activated() to test_bit(), redo tcp_update_limit() comment to 80 cols] Signed-off-by: NGlauber Costa <glommer@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Acked-by: NDavid Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 5月, 2012 1 次提交
-
-
由 Benjamin Poirier 提交于
Corrects the function that determines the esp payload size. The calculations done in esp{4,6}_get_mtu() lead to overlength frames in transport mode for certain mtu values and suboptimal frames for others. According to what is done, mainly in esp{,6}_output() and tcp_mtu_to_mss(), net_header_len must be taken into account before doing the alignment calculation. Signed-off-by: NBenjamin Poirier <bpoirier@suse.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 5月, 2012 3 次提交
-
-
由 Eric Dumazet 提交于
Sergio Correia reported following warning : WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110() WARN(skb && !before(tp->copied_seq, TCP_SKB_CB(skb)->end_seq), "cleanup rbuf bug: copied %X seq %X rcvnxt %X\n", tp->copied_seq, TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt); It appears TCP coalescing, and more specifically commit b081f85c (net: implement tcp coalescing in tcp_queue_rcv()) should take care of possible segment overlaps in receive queue. This was properly done in the case of out_or_order_queue by the caller. For example, segment at tail of queue have sequence 1000-2000, and we add a segment with sequence 1500-2500. This can happen in case of retransmits. In this case, just don't do the coalescing. Reported-by: NSergio Correia <lists@uece.net> Signed-off-by: NEric Dumazet <edumazet@google.com> Tested-by: NSergio Correia <lists@uece.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yanmin Zhang 提交于
We hit a kernel OOPS. <3>[23898.789643] BUG: sleeping function called from invalid context at /data/buildbot/workdir/ics/hardware/intel/linux-2.6/arch/x86/mm/fault.c:1103 <3>[23898.862215] in_atomic(): 0, irqs_disabled(): 0, pid: 10526, name: Thread-6683 <4>[23898.967805] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me to suspend... <4>[23899.258526] Pid: 10526, comm: Thread-6683 Tainted: G W 3.0.8-137685-ge7742f9 #1 <4>[23899.357404] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me to suspend... <4>[23899.904225] Call Trace: <4>[23899.989209] [<c1227f50>] ? pgtable_bad+0x130/0x130 <4>[23900.000416] [<c1238c2a>] __might_sleep+0x10a/0x110 <4>[23900.007357] [<c1228021>] do_page_fault+0xd1/0x3c0 <4>[23900.013764] [<c18e9ba9>] ? restore_all+0xf/0xf <4>[23900.024024] [<c17c007b>] ? napi_complete+0x8b/0x690 <4>[23900.029297] [<c1227f50>] ? pgtable_bad+0x130/0x130 <4>[23900.123739] [<c1227f50>] ? pgtable_bad+0x130/0x130 <4>[23900.128955] [<c18ea0c3>] error_code+0x5f/0x64 <4>[23900.133466] [<c1227f50>] ? pgtable_bad+0x130/0x130 <4>[23900.138450] [<c17f6298>] ? __ip_route_output_key+0x698/0x7c0 <4>[23900.144312] [<c17f5f8d>] ? __ip_route_output_key+0x38d/0x7c0 <4>[23900.150730] [<c17f63df>] ip_route_output_flow+0x1f/0x60 <4>[23900.156261] [<c181de58>] ip4_datagram_connect+0x188/0x2b0 <4>[23900.161960] [<c18e981f>] ? _raw_spin_unlock_bh+0x1f/0x30 <4>[23900.167834] [<c18298d6>] inet_dgram_connect+0x36/0x80 <4>[23900.173224] [<c14f9e88>] ? _copy_from_user+0x48/0x140 <4>[23900.178817] [<c17ab9da>] sys_connect+0x9a/0xd0 <4>[23900.183538] [<c132e93c>] ? alloc_file+0xdc/0x240 <4>[23900.189111] [<c123925d>] ? sub_preempt_count+0x3d/0x50 Function free_fib_info resets nexthop_nh->nh_dev to NULL before releasing fi. Other cpu might be accessing fi. Fixing it by delaying the releasing. With the patch, we ran MTBF testing on Android mobile for 12 hours and didn't trigger the issue. Thank Eric for very detailed review/checking the issue. Signed-off-by: NYanmin Zhang <yanmin_zhang@linux.intel.com> Signed-off-by: NKun Jiang <kunx.jiang@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tim Bird 提交于
UDP stack needs a minimum hash size value for proper operation and also uses alloc_large_system_hash() for proper NUMA distribution of its hash tables and automatic sizing depending on available system memory. On some low memory situations, udp_table_init() must ignore the alloc_large_system_hash() result and reallocs a bigger memory area. As we cannot easily free old hash table, we leak it and kmemleak can issue a warning. This patch adds a low limit parameter to alloc_large_system_hash() to solve this problem. We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table allocation. Reported-by: NMark Asselstine <mark.asselstine@windriver.com> Reported-by: NTim Bird <tim.bird@am.sony.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 5月, 2012 4 次提交
-
-
由 Eldad Zack 提交于
Replace simple_strtoul with kstrtoul in three similar occurrences, all setup handlers: * route.c: set_rhash_entries * tcp.c: set_thash_entries * udp.c: set_uhash_entries Also check if the conversion failed. Signed-off-by: NEldad Zack <eldad@fogrefinery.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eldad Zack 提交于
The __setup macro should follow the corresponding setup handler. Signed-off-by: NEldad Zack <eldad@fogrefinery.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
ip_frag_reasm() can use skb_try_coalesce() to build optimized skb, reducing memory used by them (truesize), and reducing number of cache line misses and overhead for the consumer. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Move tcp_try_coalesce() protocol independent part to skb_try_coalesce(). skb_try_coalesce() can be used in IPv4 defrag and IPv6 reassembly, to build optimized skbs (less sk_buff, and possibly less 'headers') skb_try_coalesce() is zero copy, unless the copy can fit in destination header (its a rare case) kfree_skb_partial() is also moved to net/core/skbuff.c and exported, because IPv6 will need it in patch (ipv6: use skb coalescing in reassembly). Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 5月, 2012 3 次提交
-
-
由 Eric Dumazet 提交于
- match() method returns a boolean - return (A && B && C && D) -> return A && B && C && D - fix indentation Signed-off-by: NEric Dumazet <edumazet@google.com>
-
由 Willy Tarreau 提交于
Since recent changes on TCP splicing (starting with commits 2f533844 "tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp: tcp_sendpages() should call tcp_push() once"), I started seeing massive stalls when forwarding traffic between two sockets using splice() when pipe buffers were larger than socket buffers. Latest changes (net: netdev_alloc_skb() use build_skb()) made the problem even more apparent. The reason seems to be that if do_tcp_sendpages() fails on out of memory condition without being able to send at least one byte, tcp_push() is not called and the buffers cannot be flushed. After applying the attached patch, I cannot reproduce the stalls at all and the data rate it perfectly stable and steady under any condition which previously caused the problem to be permanent. The issue seems to have been there since before the kernel migrated to git, which makes me think that the stalls I occasionally experienced with tux during stress-tests years ago were probably related to the same issue. This issue was first encountered on 3.0.31 and 3.2.17, so please backport to -stable. Signed-off-by: NWilly Tarreau <w@1wt.eu> Acked-by: NEric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org>
-
由 Eric Dumazet 提交于
bool conversions where possible. __inline__ -> inline space cleanups Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 5月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
bool/const conversions where possible __inline__ -> inline space cleanups Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 5月, 2012 4 次提交
-
-
由 Joe Perches 提交于
Use the current debugging style and enable dynamic_debug. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paul Gortmaker 提交于
We are going to delete the Token ring support. This removes any special processing in the core networking for token ring, (aside from net/tr.c itself), leaving the drivers and remaining tokenring support present but inert. The mass removal of the drivers and net/tr.c will be in a separate commit, so that the history of these files that we still care about won't have the giant deletion tied into their history. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
由 Joe Perches 提交于
Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jan Beulich 提交于
By making this a standalone config option (auto-selected as needed), selecting CRYPTO from here rather than from XFRM (which is boolean) allows the core crypto code to become a module again even when XFRM=y. Signed-off-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 5月, 2012 4 次提交
-
-
由 Pavel Emelyanov 提交于
As proposed by Eric, make the tcp_input.o thinner. add/remove: 1/1 grow/shrink: 1/4 up/down: 868/-1329 (-461) function old new delta tcp_try_rmem_schedule - 864 +864 tcp_ack 4811 4815 +4 tcp_validate_incoming 817 815 -2 tcp_collapse 860 858 -2 tcp_send_rcvq 555 353 -202 tcp_data_queue 3435 3033 -402 tcp_prune_queue 721 - -721 Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
As noted by Eric, no checks are performed on the data size we're putting in the read queue during repair. Thus, validate the given data size with the common rmem management routine. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
It actually works on the input queue and will use its read mem routines, thus it's better to have in in the tcp_input.c file. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Due to RCU lookups and RCU based release, fib_info objects can be found during lookup which have fi->fib_dead set. We must ignore these entries, otherwise we risk dereferencing the parts of the entry which are being torn down. Reported-by: NYevgen Pronenko <yevgen.pronenko@sonymobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 5月, 2012 1 次提交
-
-
由 Pablo Neira Ayuso 提交于
This patch removes ip_queue support which was marked as obsolete years ago. The nfnetlink_queue modules provides more advanced user-space packet queueing mechanism. This patch also removes capability code included in SELinux that refers to ip_queue. Otherwise, we break compilation. Several warning has been sent regarding this to the mailing list in the past month without anyone rising the hand to stop this with some strong argument. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 08 5月, 2012 1 次提交
-
-
由 Jiri Pirko 提交于
Until now, struct mreq has not been recognized and it was worked with as with struct in_addr. That means imr_multiaddr was copied to imr_address. So do recognize struct mreq here and copy that correctly. Signed-off-by: NJiri Pirko <jpirko@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 5月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
It appears some networks play bad games with the two bits reserved for ECN. This can trigger false congestion notifications and very slow transferts. Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can disable TCP ECN negociation if it happens we receive mangled CT bits in the SYN packet. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Perry Lorier <perryl@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Wilmer van der Gaast <wilmer@google.com> Cc: Ankur Jain <jankur@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Dave Täht <dave.taht@bufferbloat.net> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 5月, 2012 1 次提交
-
-
由 Alexander Duyck 提交于
This patch adds support for a skb_head_is_locked helper function. It is meant to be used any time we are considering transferring the head from skb->head to a paged frag. If the head is locked it means we cannot remove the head from the skb so it must be copied or we must take the skb as a whole. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 5月, 2012 7 次提交
-
-
由 Eric W. Biederman 提交于
As a first step to converting struct cred to be all kuid_t and kgid_t values convert the group values stored in group_info to always be kgid_t values. Unless user namespaces are used this change should have no effect. Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Alexander Duyck 提交于
This change cleans up the last bits of tcp_try_coalesce so that we only need one goto which jumps to the end of the function. The idea is to make the code more readable by putting things in a linear order so that we start execution at the top of the function, and end it at the bottom. I also made a slight tweak to the code for handling frags when we are a clone. Instead of making it an if (clone) loop else nr_frags = 0 I changed the logic so that if (!clone) we just set the number of frags to 0 which disables the for loop anyway. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change reorders the code related to the use of an skb->head_frag so it is placed before we check the rest of the frags. This allows the code to read more linearly instead of like some sort of loop. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch addresses several issues in the way we were tracking the truesize in tcp_try_coalesce. First it was using ksize which prevents us from having a 0 sized head frag and getting a usable result. To resolve that this patch uses the end pointer which is set based off either ksize, or the frag_size supplied in build_skb. This allows us to compute the original truesize of the entire buffer and remove that value leaving us with just what was added as pages. The second issue was the use of skb->len if there is a mergeable head frag. We should only need to remove the size of an data aligned sk_buff from our current skb->truesize to compute the delta for a buffer with a reused head. By using skb->len the value of truesize was being artificially reduced which means that head frags could use more memory than buffers using standard allocations. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change is meant ot prevent stealing the skb->head to use as a page in the event that the skb->head was cloned. This allows the other clones to track each other via shinfo->dataref. Without this we break down to two methods for tracking the reference count, one being dataref, the other being the page count. As a result it becomes difficult to track how many references there are to skb->head. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Extend tcp coalescing implementing it from tcp_queue_rcv(), the main receiver function when application is not blocked in recvmsg(). Function tcp_queue_rcv() is moved a bit to allow its call from tcp_data_queue() This gives good results especially if GRO could not kick, and if skb head is a fragment. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Before stealing fragments or skb head, we must make sure skbs are not cloned. Alexander was worried about destination skb being cloned : In bridge setups, a driver could be fooled if skb->data_len would not match skb nr_frags. If source skb is cloned, we must take references on pages instead. Bug happened using tcpdump (if not using mmap()) Introduce kfree_skb_partial() helper to cleanup code. Reported-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-