- 30 Nov 2011, 7 commits
-
-
Submitted by Eric Dumazet

Instead of using a custom flow dissector, use skb_flow_dissect() and benefit from tunnelling support.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

Instead of using a custom flow dissector, use skb_flow_dissect() and benefit from tunnelling support.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

tcp_sendmsg() uses the select_size() helper to choose the skb head size when a new skb must be allocated. If GSO is enabled for the socket, the current strategy is to force all payload data outside of the headroom, in PAGE fragments. This strategy is unwelcome for small packets, wasting memory.

Experiments show that the best results are obtained when using 2048 bytes for the skb head (this includes the skb overhead and various headers).

This patch provides better len/truesize ratios for packets sent to the loopback device, and reduces memory needs for in-flight loopback packets, particularly on arches with big pages. If a sender sends many 1-byte packets to an unresponsive application, the receiver's rmem_alloc will grow faster and it will stop queuing these packets sooner, or will collapse its receive queue to free excess memory.

netperf -t TCP_RR results are improved by ~4%, and many workloads are improved as well (tbench, mysql...).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
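A kernel-context sketch of the head-size choice described above (simplified from the description, not the verbatim patch):

```c
/* Kernel-context sketch, simplified from the description above;
 * not the verbatim patch. */
static int select_size_sketch(const struct sock *sk, bool sg)
{
	int tmp = tcp_sk(sk)->mss_cache;

	/* For GSO sockets, keep small payloads in a 2048-byte linear
	 * head (skb overhead and headers included) instead of page
	 * fragments. */
	if (sg && sk_can_gso(sk))
		tmp = SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);

	return tmp;
}
```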
-
Submitted by Eric Dumazet

On Monday 28 November 2011 at 19:06 -0500, David Miller wrote:

> From: Dimitris Michailidis <dm@chelsio.com>
> Date: Mon, 28 Nov 2011 08:25:39 -0800
>
> >> +bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys
> >> *flow)
> >> +{
> >> +	int poff, nhoff = skb_network_offset(skb);
> >> +	u8 ip_proto;
> >> +	u16 proto = skb->protocol;
> >
> > __be16 instead of u16 for proto?
>
> I'll take care of this when I apply these patches.

(CC trimmed)

Thanks David! Here is a small patch to use one 64bit load/store on x86_64 instead of two 32bit load/stores.

[PATCH net-next] flow_dissector: use a 64bit load/store

The gcc compiler is smart enough to use a single load/store if we memcpy(dptr, sptr, 8) on x86_64, regardless of CONFIG_CC_OPTIMIZE_FOR_SIZE.

In the IP header, daddr immediately follows saddr; this won't change in the future. We only need to make sure our flow_keys (src, dst) fields won't break the rule.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
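A userspace illustration of the load/store point, with a hypothetical flow_keys_demo struct standing in for the kernel's flow_keys:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel's flow_keys: dst must stay immediately
 * after src, mirroring daddr following saddr in the IP header, so
 * both addresses can move with one 8-byte memcpy. */
struct flow_keys_demo {
	uint32_t src;
	uint32_t dst;
};

int main(void)
{
	uint32_t addrs[2] = { 0x0a000001, 0x0a000002 };	/* saddr, daddr */
	struct flow_keys_demo k;

	/* gcc emits a single 64-bit load/store for this on x86_64,
	 * regardless of CONFIG_CC_OPTIMIZE_FOR_SIZE */
	memcpy(&k.src, addrs, 8);
	printf("src=%#x dst=%#x\n", k.src, k.dst);
	return 0;
}
```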
-
Submitted by Tom Herbert

Networking stack support for byte queue limits, using the dynamic queue limits library. Byte queue limits are maintained per transmit queue, and a dql structure has been added to the netdev_queue structure for this purpose.

Configuration of bql is in the tx-<n> sysfs directory for the queue, under the byte_queue_limits directory. Configuration includes:

  limit_min: bql minimum limit
  limit_max: bql maximum limit
  hold_time: bql slack hold time

Also under the directory are:

  limit: current byte limit
  inflight: current number of bytes on the queue

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
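A kernel-context sketch of how a single-queue driver might feed these limits, using the netdev_sent_queue()/netdev_completed_queue() helpers that accompany this series (ring handling elided; this is an illustration, not a real driver):

```c
/* Kernel-context sketch: account bytes handed to and completed by
 * the hardware so BQL can size the queue. */
static netdev_tx_t demo_xmit(struct sk_buff *skb, struct net_device *dev)
{
	unsigned int len = skb->len;

	/* ...post skb to the hardware ring here... */
	netdev_sent_queue(dev, len);	/* bytes now in flight */
	return NETDEV_TX_OK;
}

static void demo_tx_complete(struct net_device *dev,
			     unsigned int pkts, unsigned int bytes)
{
	/* releases the queue once inflight bytes drop below the limit */
	netdev_completed_queue(dev, pkts, bytes);
}
```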
-
Submitted by Tom Herbert

This patch moves the xps-specific parts of netdev_queue_release into their own function, which netdev_queue_release can call. This allows netdev_queue_release to be more generic (for adding new attributes to tx queues).

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Tom Herbert

Create separate queue state flags so that either the stack or the driver can turn on XOFF. Also add a set of functions used in the stack to determine whether a queue is really stopped (either by the stack or by the driver).

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
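The description suggests a check of this shape (kernel-context sketch; the bit names follow the description above and may not match the final identifiers exactly):

```c
/* Kernel-context sketch: a queue counts as stopped if either the
 * driver or the stack set its XOFF bit. */
static inline bool demo_xmit_stopped(const struct netdev_queue *txq)
{
	return txq->state & ((1 << __QUEUE_STATE_DRV_XOFF) |
			     (1 << __QUEUE_STATE_STACK_XOFF));
}
```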
-
- 29 Nov 2011, 17 commits
-
-
Submitted by Neal Cardwell

Since 2005 (c1b4a7e6) tcp_tso_should_defer has been using tcp_max_burst() as a target limit for deciding how large to make outgoing TSO packets when not using sysctl_tcp_tso_win_divisor. But since 2008 (dd9e0dda) tcp_max_burst() returns the reordering degree. We should not have tcp_tso_should_defer attempt to build larger segments just because there is more reordering. This commit splits the notion of deferral size used in TSO from the notion of burst size used in cwnd moderation, and returns the TSO deferral limit to its original value.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Pascal Hambourg

Use memcmp() instead of a cast to u16 when checking the PAD field.

Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
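A userspace illustration of why memcmp() is the safer comparison (the pad value and helper are illustrative, not the driver's code):

```c
#include <stdio.h>
#include <string.h>

/* The expected 2-byte pad value here is illustrative. */
static const unsigned char expected_pad[2] = { 0x00, 0x00 };

/* memcmp() neither requires 16-bit alignment nor depends on host
 * byte order, unlike "*(u16 *)field == constant". */
static int pad_ok(const unsigned char *field)
{
	return memcmp(field, expected_pad, sizeof(expected_pad)) == 0;
}

int main(void)
{
	unsigned char hdr[2] = { 0x00, 0x00 };

	printf("pad ok: %d\n", pad_ok(hdr));
	return 0;
}
```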
-
Submitted by Pascal Hambourg

A routed payload requires less headroom than a bridged payload, so do not reallocate headroom when it is not needed. Also, add the worst-case AAL5 overhead to netdev->hard_header_len.

Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

We can test/set multiple bits from sk_flags at once, to shorten the socket setup/dismantle phases a bit.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

Igor Maravic reported an error caused by jump_label_dec() being called from IRQ context:

BUG: sleeping function called from invalid context at kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper
1 lock held by swapper/0:
 #0:  (&n->timer){+.-...}, at: [<ffffffff8107ce90>] call_timer_fn+0x0/0x340
Pid: 0, comm: swapper Not tainted 3.2.0-rc2-net-next-mpls+ #1
Call Trace:
 <IRQ>
 [<ffffffff8104f417>] __might_sleep+0x137/0x1f0
 [<ffffffff816b9a2f>] mutex_lock_nested+0x2f/0x370
 [<ffffffff810a89fd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8109a37f>] ? local_clock+0x6f/0x80
 [<ffffffff810a90a5>] ? lock_release_holdtime.part.22+0x15/0x1a0
 [<ffffffff81557929>] ? sock_def_write_space+0x59/0x160
 [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90
 [<ffffffff810969cd>] atomic_dec_and_mutex_lock+0x5d/0x80
 [<ffffffff8112fc1d>] jump_label_dec+0x1d/0x50
 [<ffffffff81566525>] net_disable_timestamp+0x15/0x20
 [<ffffffff81557a75>] sock_disable_timestamp+0x45/0x50
 [<ffffffff81557b00>] __sk_free+0x80/0x200
 [<ffffffff815578d0>] ? sk_send_sigurg+0x70/0x70
 [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90
 [<ffffffff81557cba>] sock_wfree+0x3a/0x70
 [<ffffffff8155c2b0>] skb_release_head_state+0x70/0x120
 [<ffffffff8155c0b6>] __kfree_skb+0x16/0x30
 [<ffffffff8155c119>] kfree_skb+0x49/0x170
 [<ffffffff815e936e>] arp_error_report+0x3e/0x90
 [<ffffffff81575bd9>] neigh_invalidate+0x89/0xc0
 [<ffffffff81578dbe>] neigh_timer_handler+0x9e/0x2a0
 [<ffffffff81578d20>] ? neigh_update+0x640/0x640
 [<ffffffff81073558>] __do_softirq+0xc8/0x3a0

Since jump_label_{inc|dec} must be called from process context only, we must defer jump_label_dec() if net_disable_timestamp() is called from interrupt context.

Reported-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
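One possible shape of the deferral, sketched in kernel context; this is an assumption about the mechanism (the actual commit may defer differently, e.g. via a deferred counter drained later from process context):

```c
/* Sketch: jump_label_dec() can sleep, so punt the decrement to a
 * workqueue when called from an interrupt. */
static void netstamp_clear(struct work_struct *work)
{
	jump_label_dec(&netstamp_needed);
}
static DECLARE_WORK(netstamp_work, netstamp_clear);

void net_disable_timestamp(void)
{
	if (in_interrupt())
		schedule_work(&netstamp_work);	/* defer: may sleep */
	else
		jump_label_dec(&netstamp_needed);
}
```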
-
Submitted by Ralf Baechle

The Linux coding style wants the return statement on its own line.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Ralf Baechle

nr_route.ndigis is unsigned int, so the nr_route.ndigis < 0 expression is never true and can be dropped. Doing the nr_ax25_dev_get call later allows the nr_route.ndigis test to bail out without having to call dev_put.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Osterried <thomas@osterried.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
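A userspace illustration of the dead comparison (the upper bound here is an arbitrary stand-in, not the NET/ROM limit):

```c
#include <stdio.h>

/* ndigis mirrors nr_route_struct.ndigis: unsigned, so "< 0" can
 * never be true. */
struct demo_route {
	unsigned int ndigis;
};

int main(void)
{
	struct demo_route r = { .ndigis = 0 };

	if (r.ndigis < 0)	/* dead code: dropped by the patch */
		puts("never reached");
	if (r.ndigis > 3)	/* the check that actually matters */
		puts("too many digipeaters");
	printf("ndigis=%u accepted\n", r.ndigis);
	return 0;
}
```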
-
Submitted by Ralf Baechle

struct nr_route_struct's mnemonic field permits a string of up to 7 bytes. If userland passes a string that is not zero-terminated, adding a node to the routing table might result in the kernel copying an over-long string. The mnemonic is part of the NET/ROM routing protocol; NET/ROM routing table updates only broadcast 6 bytes. The 7th byte in the mnemonic array exists only as a '\0' termination character for the kernel code's convenience.

Fixed by rejecting mnemonic strings that have no terminating '\0' in the first 7 characters. Do this test only for NETROM_NODE, to avoid breaking NETROM_NEIGH, where userland might pass an uninitialized mnemonic field.

Initial patch by Dan Carpenter <dan.carpenter@oracle.com>.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Walter Harms <wharms@bfs.de>
Cc: Thomas Osterried <thomas@osterried.de>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
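A userspace sketch of the termination check described above (field size per the description; the helper name is illustrative):

```c
#include <stdio.h>
#include <string.h>

#define MNEMONIC_LEN 7	/* 6 on-air bytes + 1 byte kept for '\0' */

/* Accept a mnemonic only if a terminating '\0' appears within its
 * first 7 bytes, as the fix describes. */
static int mnemonic_valid(const char m[MNEMONIC_LEN])
{
	return memchr(m, '\0', MNEMONIC_LEN) != NULL;
}

int main(void)
{
	char good[MNEMONIC_LEN] = "NODE";
	char bad[MNEMONIC_LEN] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G' };

	printf("good=%d bad=%d\n", mnemonic_valid(good), mnemonic_valid(bad));
	return 0;
}
```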
-
Submitted by Ralf Baechle

Very large, nonsensical arguments or use in very extreme conditions could result in integer overflows. Check ioctl arguments to avoid such overflows and return -EINVAL for overly large arguments. To allow the use of AX.25 for even the most extreme setups (think packet radio to the Phase 5E Mars probe) we make no further attempt to clamp the argument range.

Originally reported by Fan Long <longfancn@gmail.com>, and a first patch was sent by Xi Wang <xi.wang@gmail.com>.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Xi Wang <xi.wang@gmail.com>
Cc: Joerg Reuter <jreuter@yaina.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Thomas Osterried <thomas@osterried.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
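A userspace sketch of the bound-checking idea (the HZ stand-in and limits are assumptions for illustration, not the kernel's exact checks):

```c
#include <limits.h>
#include <stdio.h>

#define DEMO_HZ 100	/* stand-in tick rate, an assumption */

/* Reject arguments whose later scaling (e.g. by HZ) would overflow
 * an int; the kernel would return -EINVAL here. */
static int check_timer_arg(long value)
{
	if (value < 1 || value > INT_MAX / DEMO_HZ)
		return -1;
	return 0;
}

int main(void)
{
	printf("10 -> %d, LONG_MAX -> %d\n",
	       check_timer_arg(10), check_timer_arg(LONG_MAX));
	return 0;
}
```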
-
Submitted by Ben Hutchings

Support for specific hardware belongs under drivers/net/, not net/.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Ben Hutchings

Any headers included by drivers should be under include/, and the definitions they use are not really private to the core, as the name "dsa_priv.h" suggests.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Ben Hutchings

I mistakenly exported functions from slave.c that are only called from dsa.c, part of the same module.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

The current SFB double hashing does not fulfill SFB theory if two flows share the same rxhash value. Using skb_flow_dissect() gives much better hash dispersion, and provides tunnelling support as well. The double-hashing issue was pointed out by Florian Westphal.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

Instead of using a custom flow dissector, use skb_flow_dissect() and benefit from tunnelling support. This lack of tunnelling support was pointed out by Dan Siemon.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

No functional changes. This uses the code we factorized into skb_flow_dissect().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet

We use at least two flow dissectors in the network stack, with known limitations and code duplication. Introduce skb_flow_dissect() to factorize this, heavily inspired by the existing dissector in __skb_get_rxhash().

Note: we extensively use skb_header_pointer(), which lets us avoid touching the skb at all.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
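A kernel-context sketch of using the new helper, based on the signature quoted elsewhere in this log; the hashing choice is illustrative, not from the patch:

```c
/* Sketch: extract flow keys without modifying the skb, then hash. */
static u32 demo_classify(const struct sk_buff *skb)
{
	struct flow_keys keys;

	if (!skb_flow_dissect(skb, &keys))
		return 0;	/* packet could not be dissected */

	/* fields were read via skb_header_pointer(); the skb itself
	 * was never modified */
	return jhash_2words((__force u32)keys.src, (__force u32)keys.dst, 0);
}
```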
-
Submitted by Eric Dumazet

Now that sk_route_caps is u64, it is dangerous to use an int to store the result of an AND operation on it. It won't work if NETIF_F_SG is moved to the upper part of the u64.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
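A userspace illustration of the truncation hazard, using a hypothetical high-bit flag in place of NETIF_F_SG:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical high-bit capability flag: if NETIF_F_SG ever moved
 * into the upper 32 bits of the u64 mask, an int would silently
 * truncate the AND result to zero. */
#define DEMO_NETIF_F_SG (UINT64_C(1) << 40)

int main(void)
{
	uint64_t caps = DEMO_NETIF_F_SG;

	int truncated = caps & DEMO_NETIF_F_SG;	/* low 32 bits: 0 */
	bool correct = caps & DEMO_NETIF_F_SG;	/* true */

	printf("int: %d, bool: %d\n", truncated, correct);
	return 0;
}
```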
-
- 28 Nov 2011, 5 commits
-
-
Submitted by Neal Cardwell

The problem: senders were overriding cwnd values picked during an undo by calling tcp_moderate_cwnd() in tcp_try_to_open().

The fix: don't moderate cwnd in tcp_try_to_open() if we're in TCP_CA_Open, since doing so is generally unnecessary and specifically would override a DSACK-based undo of a cwnd reduction made in fast recovery.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Neal Cardwell

Previously, SACK-enabled connections hung around in the TCP_CA_Disorder state while snd_una == high_seq, just waiting to accumulate DSACKs and hopefully undo a cwnd reduction. This could and did lead to the following unfortunate scenario: if some incoming ACKs advance snd_una beyond high_seq then we were setting undo_marker to 0 and moving to TCP_CA_Open, so if (due to reordering in the ACK return path) we shortly thereafter received a DSACK then we were no longer able to undo the cwnd reduction.

The change: simplify the congestion avoidance state machine by removing the behavior where SACK-enabled connections hung around in the TCP_CA_Disorder state just waiting for DSACKs. Instead, when snd_una advances to high_seq or beyond we typically move to TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or TCP_CA_Disorder if we later receive enough DSACKs.

Other patches in this series will provide other changes that are necessary to fully fix this problem.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Neal Cardwell

The bug: when the ACK field is below snd_una (which can happen when ACKs are reordered), senders ignored DSACKs (preventing undo) and did not call tcp_fastretrans_alert, so they did not increment prr_delivered to reflect newly-SACKed sequence ranges, and did not call tcp_xmit_retransmit_queue, thus passing up chances to send out more retransmitted and new packets based on any newly-SACKed packets.

The change: when the ACK field is below snd_una (the "old_ack" goto label), call tcp_fastretrans_alert to allow undo based on any newly-arrived DSACKs and try to send out more packets based on newly-SACKed packets.

Other patches in this series will provide other changes that are necessary to fully fix this problem.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Neal Cardwell

The bug: senders ignored DSACKs after recovery when there were no outstanding packets (a common scenario for HTTP servers).

The change: when there are no outstanding packets (the "no_queue" goto label), call tcp_fastretrans_alert() in order to use DSACKs to undo congestion window reductions.

Other patches in this series will provide other changes that are necessary to fully fix this problem.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Neal Cardwell

Allow callers to decide whether an ACK is a duplicate ACK. This is a prerequisite to allowing fastretrans_alert to be called from new contexts, such as the no_queue and old_ack code paths, from which we have extra info that tells us whether an ACK is a dupack.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 Nov 2011, 11 commits
-
-
Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Ben Hutchings

Change the kconfig types to tristate and adjust the condition for declaring net_device::dsa_ptr to allow for this. Adjust the makefile so that if NET_DSA_MV88E6123_61_65=y and NET_DSA_MV88E6131=m, or vice versa, both drivers are built in. We could leave these options as bool and make NET_DSA_MV88E6XXX a user-selected option, but that would break existing configurations.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Ben Hutchings

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Ben Hutchings

These drivers share a lot of code, so if we make them modular they should be built into the same module. Therefore, link them together and merge their respective module init and exit functions.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Ben Hutchings

These files have circular dependencies, so if we make DSA modular then they must be built into the same module. Therefore, link them together and merge their respective module init and exit functions.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Ben Hutchings

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Ben Hutchings

eth_type_trans() will use these functions if DSA is enabled, which blocks building DSA as a module.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Steffen Klassert

The pmtu information on the inetpeer is visible to both output and input routes. On packet forwarding, we might propagate a learned pmtu to the sender. As we update the pmtu information of the inetpeer on demand, the original sender of the forwarded packets might never notice when the pmtu to that inetpeer increases. So use the mtu of the outgoing device on packet forwarding instead of the pmtu to the final destination.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Steffen Klassert

We move all mtu handling from dst_mtu() down to the protocol layer, so each protocol can implement mtu handling in its own manner.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Steffen Klassert

We plan to invoke the dst_ops->default_mtu() method unconditionally from dst_mtu(). So rename the method to dst_ops->mtu() to match the name with the new meaning.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Steffen Klassert

As it is, we return zero as the default mtu of blackhole routes. This may lead to the propagation of a bogus pmtu if the default_mtu method of a blackhole route is invoked. So return dst->dev->mtu as the default mtu instead.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
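The fix amounts to a one-line method of this shape (kernel-context sketch; the function name here is illustrative):

```c
/* Report the device MTU instead of zero so callers never propagate
 * a bogus PMTU from a blackhole route. */
static unsigned int demo_blackhole_default_mtu(const struct dst_entry *dst)
{
	return dst->dev->mtu;	/* was: 0 */
}
```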
-