- 24 Aug, 2010 8 commits
-
-
Committed by Eric Dumazet
No need to use a temporary struct rtnl_link_stats64 variable; just copy the source to the skb buffer.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Changli Gao
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by David S. Miller
It uses ip_send_check() and stuff like that.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Gerrit Renker
The current CCID-2 RTT estimator code is partly broken and lags behind the suggestion in RFC2988 of using scaled variants for SRTT/RTTVAR. That code is replaced by the present patch, which reuses the Linux TCP RTT estimator code.

Further details:
----------------
 1. The minimum RTO of previously one second has been replaced with TCP's, since RFC4341, sec. 5 says that the minimum of 1 sec. (suggested in RFC2988, 2.4) is not necessary. Instead, TCP_RTO_MIN is used, which agrees with DCCP's concept of a default RTT (RFC 4340, 3.4).
 2. The maximum RTO has been set to DCCP_RTO_MAX (64 sec), which agrees with RFC2988, (2.5).
 3. De-inlined the function ccid2_new_ack().
 4. Added a FIXME: the RTT is sampled several times per Ack Vector, which will give the wrong estimate. It should be replaced with one sample per Ack. However, at the moment this cannot be resolved easily, since
    - it depends on TX history code (which also needs some work),
    - the cleanest solution is not to use the `sent' time at all (saves 4 bytes per entry) and use DCCP timestamps / elapsed time to estimate the RTT, which however is non-trivial to get right (but needs to be done).

Reasons for reusing the Linux TCP estimator algorithm:
------------------------------------------------------
Some time was spent to find a better alternative, using basic RFC2988 as a first step. Further analysis and experimentation showed that the Linux TCP RTO estimator is superior to a basic RFC2988 implementation. A summary is on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid2/rto_estimator/

In addition, this estimator fared well in a recent empirical evaluation:

    Rewaskar, Sushant, Jasleen Kaur and F. Donelson Smith. A Performance Study of Loss Detection/Recovery in Real-world TCP Implementations. Proceedings of 15th IEEE International Conference on Network Protocols (ICNP-07), 2007.

Thus there is significant benefit in reusing the existing TCP code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
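To illustrate what "scaled variants for SRTT/RTTVAR" means, here is a minimal user-space sketch of the basic RFC 2988 estimator in scaled integer form. It is not the kernel code from this patch (which reuses the Linux TCP estimator, a refinement of this scheme). The struct and function names are hypothetical, times are in milliseconds, and the 200 ms lower bound is an assumption standing in for TCP_RTO_MIN; the 64 s upper bound follows the DCCP_RTO_MAX value quoted above.

```c
#include <stdlib.h>

/* Assumed bounds: 200 ms stands in for TCP_RTO_MIN, 64 s for DCCP_RTO_MAX. */
#define RTO_MIN_MS 200L
#define RTO_MAX_MS 64000L

/* Hypothetical estimator state: SRTT is stored scaled by 8 and RTTVAR
 * scaled by 4, so the 1/8 and 1/4 gains become plain shifts. */
struct rtt_est {
    long srtt8;   /* 8 * SRTT, 0 means "no sample yet" */
    long rttvar4; /* 4 * RTTVAR */
};

/* Feed one RTT measurement (in ms) into the estimator. */
static void rtt_sample(struct rtt_est *e, long r_ms)
{
    if (r_ms <= 0)
        r_ms = 1;
    if (e->srtt8 == 0) {
        /* First sample: SRTT = R, RTTVAR = R / 2. */
        e->srtt8 = r_ms << 3;
        e->rttvar4 = r_ms << 1;
    } else {
        long err = r_ms - (e->srtt8 >> 3);
        /* RTTVAR = 3/4 RTTVAR + 1/4 |SRTT - R|, using the old SRTT. */
        e->rttvar4 += labs(err) - (e->rttvar4 >> 2);
        /* SRTT = 7/8 SRTT + 1/8 R. */
        e->srtt8 += err;
    }
}

/* RTO = SRTT + 4 * RTTVAR, clamped to [RTO_MIN_MS, RTO_MAX_MS]. */
static long rto_ms(const struct rtt_est *e)
{
    long rto = (e->srtt8 >> 3) + e->rttvar4;

    if (rto < RTO_MIN_MS)
        rto = RTO_MIN_MS;
    if (rto > RTO_MAX_MS)
        rto = RTO_MAX_MS;
    return rto;
}
```

Starting from a zeroed `struct rtt_est`, a first sample of 100 ms gives an RTO of 300 ms, and a second identical sample lowers it to 250 ms as the variance term decays.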
-
Committed by Gerrit Renker
This removes the dec_pipe function and improves the way the RTO timer is rearmed when a new acknowledgment comes in.

Details and justification for removal:
--------------------------------------
 1) The BUG_ON in dec_pipe is never triggered: pipe is only decremented for TX history entries between tail and head, for which it had previously been incremented in tx_packet_sent; and it is not decremented twice for the same entry, since it is
    - either decremented when a corresponding Ack Vector cell in state 0 or 1 was received (and then ccid2s_acked==1),
    - or it is decremented when ccid2s_acked==0, as part of the loss detection in tx_packet_recv (and hence it cannot have been decremented earlier).
 2) Restarting the RTO timer happens for every single entry in each Ack Vector parsed by tx_packet_recv (according to RFC 4340, 11.4 this can happen up to 16192 times per Ack Vector).
 3) The RTO timer should not be restarted when all outstanding data has been acknowledged. This is currently done similar to (2), in dec_pipe, when pipe has reached 0.

The patch consolidates the code which rearms the RTO timer, combining the segments from new_ack and dec_pipe. As a result, the code becomes clearer (compare with tcp_rearm_rto()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
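The consolidated rearm logic described in this commit can be sketched roughly as follows. This is a user-space illustration only: the struct, its fields and the function name are hypothetical, and the real code operates on a kernel timer rather than on plain integers.

```c
/* Hypothetical sender state; all names are illustrative, not the kernel's. */
struct tx_state {
    unsigned int pipe;   /* packets sent but not yet acked or declared lost */
    long rto;            /* current retransmission timeout */
    int timer_armed;     /* 1 while the RTO timer is pending */
    long timer_expires;  /* absolute expiry time when armed */
};

/* Called once per processed acknowledgment, not once per Ack Vector cell:
 * stop the timer when nothing is outstanding, otherwise restart it. */
static void rearm_rto(struct tx_state *hc, long now)
{
    if (hc->pipe == 0) {
        hc->timer_armed = 0;
    } else {
        hc->timer_armed = 1;
        hc->timer_expires = now + hc->rto;
    }
}
```

Keeping the decision in one place is what removes the need for a separate helper that both decrements pipe and touches the timer.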
-
Committed by Gerrit Renker
This removes the ccid2_hc_tx_check_sanity function: it is redundant.

Details:
The tx_check_sanity function performs three tests:
 1) it checks that the circular TX list is sorted
    - in ascending order of sequence number (ccid2s_seq)
    - and time (ccid2s_sent),
    - in the direction from `tail' (hctx_seqt) to `head' (hctx_seqh);
 2) it ensures that the entire list has the length seqbufc * CCID2_SEQBUF_LEN;
 3) it ensures that pipe equals the number of packets that were not marked `acked' (ccid2s_acked) between `tail' and `head'.

The following argues that each of these tests is redundant; this can be verified by going through the code.

(1) is not necessary, since both time and GSS increase from one packet to the next, so that subsequent insertions in tx_packet_sent (which advance the `head' pointer) will be in ascending order of time and sequence number.

In (2), the length of the list is always equal to seqbufc times CCID2_SEQBUF_LEN (set to 1024) unless allocation caused an earlier failure, because:
 * at initialisation (tx_init), there is one chunk of size 1024 and seqbufc=1;
 * subsequent calls to tx_alloc_seq take place whenever head->next == tail in tx_packet_sent; then a new chunk of size 1024 is inserted between head and tail, and seqbufc is incremented by one.

To show that (3) is redundant requires looking at two cases. The `pipe' variable of the TX socket is incremented only in tx_packet_sent, and decremented in tx_packet_recv. When head == tail (TX history empty) then pipe should be 0, which is the case directly after initialisation and after a retransmission timeout has occurred (ccid2_hc_tx_rto_expire).

The first case involves parsing Ack Vectors for packets recorded in the live portion of the buffer, between tail and head. For each packet marked by the receiver as received (state 0) or ECN-marked (state 1), pipe is decremented by one, so for all such packets the BUG_ON in tx_check_sanity will not trigger.

The second case is the loss detection in the second half of tx_packet_recv, below the comment "Check for NUMDUPACK". The first while-loop here ensures that the sequence number of `seqp' is either above or equal to `high_ack', or otherwise equal to the highest sequence number sent so far (of the entry head->prev, as head points to the next unsent entry). The next while-loop ("while (1)") counts the number of acked packets starting from that position of seqp, going backwards in the direction from head->prev to tail. If NUMDUPACK=3 such packets were counted within this loop, `seqp' points to the last acknowledged packet of these, and the "if (done == NUMDUPACK)" block is entered next.

The while-loop contained within that block in turn traverses the list backwards, from head to tail; the position of `seqp' is saved in the variable `last_acked'. For each packet not marked as `acked', a congestion event is triggered within the loop, and pipe is decremented. The loop terminates when `seqp' has reached `tail', whereupon tail is set to the position previously stored in `last_acked'. Thus, between `last_acked' and the previous position of `tail',
 - pipe has been decremented earlier if the packet was marked as state 0 or 1;
 - pipe was decremented if the packet was not marked as acked.

That is, pipe has been decremented by the number of packets between `last_acked' and the previous position of `tail'. As a consequence, pipe now again reflects the number of packets which have not (yet) been acked between the new position of tail (at `last_acked') and head->prev, or 0 if head==tail. The result is that the BUG_ON condition in check_sanity will also not be triggered, hence test (3) is also redundant.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Gerrit Renker
The CCIDs are activated as the last of the features, at the end of the handshake, where the LISTEN state of the master socket is inherited into the server state of the child socket. Thus, the only states visible to CCIDs now are OPEN/PARTOPEN, and the closing states.

This makes it possible to remove tests which were previously necessary to protect against referencing a socket in the listening state (in CCID-3), but which have now become redundant.

As a further byproduct of enabling the CCIDs only after the connection has been fully established, several typecast-initialisations of ccid3_hc_{rx,tx}_sock can now be eliminated:
 * the CCID is loaded, so it is not necessary to test if it is NULL,
 * if it is possible to load a CCID and leave the private area NULL, then this is a bug, which should crash loudly - and earlier,
 * the test for state==OPEN || state==PARTOPEN now reduces only to the closing phase (e.g. when the node has received an unexpected Reset).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Gerrit Renker
This patch collects cosmetics-only changes to separate these from code changes:
 * update with regard to CodingStyle and whitespace changes,
 * documentation:
   - adding/revising comments,
   - remove CCID-3 RX socket documentation which is either duplicate or refers to fields that no longer exist,
 * expand embedded tfrc_tx_info struct inline for consistency, removing indirections via #define.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Aug, 2010 5 commits
-
-
Committed by David S. Miller
SKBs can be "fragmented" in two ways, via a page array (called skb_shinfo(skb)->frags[]) and via a list of SKBs (called skb_shinfo(skb)->frag_list). Since skb_has_frags() tests the latter, its name is confusing since it sounds more like it's testing the former.
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hagen Paul Pfeifer
Via setsockopt it is possible to reduce the socket RX buffer (SO_RCVBUF). The TCP method that selects the initial window and window scaling option, tcp_select_initial_window(), currently misbehaves and does not consider an RX socket buffer reduced via setsockopt.

Even though the server's RX buffer is reduced via setsockopt() to 256 bytes (Initial Window 384 bytes => 256 * 2 - (256 * 2 / 4)), the window scale option is still 7:

192.168.1.38.40676 > 78.47.222.210.5001: Flags [S], seq 2577214362, win 5840, options [mss 1460,sackOK,TS val 338417 ecr 0,nop,wscale 0], length 0
78.47.222.210.5001 > 192.168.1.38.40676: Flags [S.], seq 1570631029, ack 2577214363, win 384, options [mss 1452,sackOK,TS val 2435248895 ecr 338417,nop,wscale 7], length 0
192.168.1.38.40676 > 78.47.222.210.5001: Flags [.], ack 1, win 5840, options [nop,nop,TS val 338421 ecr 2435248895], length 0

Within tcp_select_initial_window() the original space argument - a representation of the rx buffer size - is expanded during tcp_select_initial_window(). Only sysctl_tcp_rmem[2], sysctl_rmem_max and window_clamp are considered to calculate the initial window.

This patch adjusts the window_clamp argument if the user explicitly reduces the receive buffer.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
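The arithmetic in this message can be checked with a small user-space sketch. It assumes that the kernel doubles the value passed to SO_RCVBUF and that a quarter of the buffer is reserved as overhead, which is what the "256 * 2 - (256 * 2 / 4)" expression above implies. The function names are hypothetical, and the 4 MiB figure in the usage note is only an example of a large system-wide limit, not a value taken from the commit.

```c
#include <stdint.h>

/* Advertised window for a given buffer size, assuming a quarter of the
 * buffer is reserved for overhead (space - space / 4). */
static uint32_t win_from_space(uint32_t space)
{
    return space - (space >> 2);
}

/* Smallest window scale shift (at most 14) that lets `space` fit into
 * the 16-bit window field of the TCP header. */
static unsigned int wscale_for(uint32_t space)
{
    unsigned int wscale = 0;

    while (space > 65535 && wscale < 14) {
        space >>= 1;
        wscale++;
    }
    return wscale;
}
```

With these helpers, `win_from_space(256 * 2)` yields the 384 bytes seen in the SYN-ACK, `wscale_for(384)` is 0, and `wscale_for(4u << 20)` is 7. This shows how a scale factor derived from a large global limit can end up far out of proportion to a deliberately small receive buffer.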
-
Committed by Simon Horman
While looking at using netdev_rx_handler_register for openvswitch, Jesse Gross suggested that an unlikely() might be worthwhile in that code. I'm interested to see if it's appropriate for the bridge code.
Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Changli Gao
vlan_hwaccel_do_receive() always returns 0, so make it return void.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Stephen Rothwell
for the declaration of csum_ipv6_magic. Fixes this build error on PowerPC (at least):

net/sched/act_csum.c: In function 'tcf_csum_ipv6_icmp':
net/sched/act_csum.c:178: error: implicit declaration of function 'csum_ipv6_magic'

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 Aug, 2010 4 commits
-
-
Committed by Changli Gao
We can use rxhash to classify the traffic into flows. As rxhash may be supplied by the NIC or RPS, it is cheaper.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
struct net_device has its own struct net_device_stats member, so use this one instead of a private copy in the irlan_cb struct.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Dmitry Kozlov
PPP: introduce the "pptp" module, which implements the point-to-point tunneling protocol using the pppox framework
NET: introduce the "gre" module for demultiplexing GRE packets on version criteria (required so that pptp and ip_gre may coexist)
NET: ip_gre: update to use the "gre" module

This patch introduces pptp support to the Linux kernel, which dramatically speeds up pptp vpn connections and decreases cpu usage in comparison with the existing user-space implementation (poptop/pptpclient). There is the accel-pptp project (https://sourceforge.net/projects/accel-pptp/) to utilize this module; it contains a plugin for pppd to use pptp in client mode and a modified pptpd (poptop) to build a high-performance pptp NAS.

There were many changes from the initially submitted patch; the most important are:
 1. using rcu instead of read-write locks
 2. using static bitmap instead of dynamically allocated
 3. using vmalloc for memory allocation instead of BITS_PER_LONG + __get_free_pages
 4. fixed many coding style issues

Thanks to Eric Dumazet.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Changli Gao
__skb_get_rxhash() was broken after the commit:

commit bfb564e7
Author: Krishna Kumar <krkumar2@in.ibm.com>
Date: Wed Aug 4 06:15:52 2010 +0000

    core: Factor out flow calculation from get_rps_cpu

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 Aug, 2010 9 commits
-
-
Committed by Grégoire Baron
net/sched: add ACT_CSUM action to update packets checksums

ACT_CSUM can be called just after ACT_PEDIT in order to re-compute some altered checksums in IPv4 and IPv6 packets. The following checksums are supported by this patch:
 - IPv4: IPv4 header, ICMP, IGMP, TCP, UDP & UDPLite
 - IPv6: ICMPv6, TCP, UDP & UDPLite

It's possible to request, in the same action, updates of different kinds of checksums, if the packet flow mixes TCP, UDP and UDPLite, ...

An example of usage is given in the associated iproute2 patch.

Version 3 changes:
 - remove useless goto instructions
 - improve IPv6 hop options decoding

Version 2 changes:
 - coding style correction
 - remove useless arguments of some functions
 - use stack in tcf_csum_dump()
 - add tcf_csum_skb_nextlayer() to factor code

Signed-off-by: Gregoire Baron <baronchon@n7mm.org>
Acked-by: jamal <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
Now that cmpxchg() is available on all arches, we can use it in build_ehash_secret() and rt_bind_peer() instead of using spinlocks.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
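The idea of replacing a spinlock with a compare-and-exchange for one-time initialisation can be sketched in user space with C11 atomics. This is an illustration of the pattern only, not the kernel code: the variable and function names are hypothetical, and the kernel's cmpxchg() is replaced here by atomic_compare_exchange_strong().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared secret; 0 means "not initialised yet". */
static _Atomic uint32_t hash_secret;

/* Publish `candidate` only if no secret has been set so far, and return
 * whichever value ended up being stored. No lock is needed: if two
 * threads race, exactly one compare-and-exchange succeeds. */
static uint32_t publish_secret_once(uint32_t candidate)
{
    uint32_t expected = 0;

    if (candidate == 0)
        candidate = 1; /* 0 is reserved as the "unset" marker */
    atomic_compare_exchange_strong(&hash_secret, &expected, candidate);
    return atomic_load(&hash_secret);
}
```

A second caller that loses the race simply observes the value written by the first, so every caller agrees on the same secret.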
-
Committed by Changli Gao
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Changli Gao
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Changli Gao
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Changli Gao
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Changli Gao
The SPI isn't at the beginning of an AH message.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
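As a rough illustration of the point above: in an ESP header the SPI is the first 32-bit field, whereas in an AH header it follows the next-header, payload-length and reserved fields, so it starts at byte offset 4. The helper below is a hypothetical user-space sketch, not the kernel function touched by this commit.

```c
#include <stddef.h>
#include <stdint.h>

#define PROTO_ESP 50 /* IP protocol number of ESP */
#define PROTO_AH  51 /* IP protocol number of AH */

/* Extract the SPI (host byte order) from the start of an ESP or AH
 * header. Returns 0 on success, -1 for other protocols or short input. */
static int read_spi(uint8_t proto, const uint8_t *hdr, size_t len,
                    uint32_t *spi)
{
    size_t off;

    if (proto == PROTO_ESP)
        off = 0; /* SPI is the first field */
    else if (proto == PROTO_AH)
        off = 4; /* skip next header, payload length, reserved */
    else
        return -1;

    if (len < off + 4)
        return -1;

    *spi = ((uint32_t)hdr[off] << 24) | ((uint32_t)hdr[off + 1] << 16) |
           ((uint32_t)hdr[off + 2] << 8) | (uint32_t)hdr[off + 3];
    return 0;
}
```

Reading an AH header at offset 0 would hash the next-header and length bytes instead of the SPI, which is the kind of mistake the commit message points at.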
-
Committed by Changli Gao
Fragmented IP packets may have no transport header, so when computing rxhash, we should skip them.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
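A check of this kind can be sketched as below, assuming the IPv4 fragment field has already been converted to host byte order. The macro and function names are hypothetical; the bit values are the standard IPv4 "more fragments" flag and 13-bit fragment offset mask.

```c
#include <stdint.h>

#define SKETCH_IP_MF     0x2000 /* "more fragments" flag */
#define SKETCH_IP_OFFSET 0x1FFF /* 13-bit fragment offset mask */

/* A packet is a fragment if more fragments follow or if its offset is
 * non-zero. `frag_off` is the IPv4 flags/offset field in host order. */
static int ipv4_is_fragment(uint16_t frag_off)
{
    return (frag_off & (SKETCH_IP_MF | SKETCH_IP_OFFSET)) != 0;
}

/* Port numbers may only be mixed into a flow hash when the transport
 * header is guaranteed to be present, i.e. for unfragmented packets. */
static int hash_may_use_ports(uint16_t frag_off)
{
    return !ipv4_is_fragment(frag_off);
}
```

Non-first fragments carry no transport header at all, and treating first fragments the same way keeps every fragment of a datagram in the same flow.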
-
Committed by Changli Gao
skb_get_rxhash() assumes the network header pointer of the skb is set properly after the commit:

commit bfb564e7
Author: Krishna Kumar <krkumar2@in.ibm.com>
Date: Wed Aug 4 06:15:52 2010 +0000

    core: Factor out flow calculation from get_rps_cpu

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Aug, 2010 9 commits
-
-
Committed by Eric Dumazet
After skb is queued, it's illegal to dereference it. Cache skb->len into a temporary variable.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
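The pattern described here can be shown with a small user-space analogue. The types and functions are hypothetical stand-ins: `queue_pkt()` takes ownership of the packet and may free it at any time, which is modelled here by freeing it immediately.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical packet and statistics types. */
struct pkt {
    size_t len;
};

struct tx_stats {
    unsigned long tx_packets;
    unsigned long tx_bytes;
};

/* Stand-in for a queueing function that takes ownership: once it has
 * been called, the caller must not touch the packet again. */
static void queue_pkt(struct pkt *p)
{
    free(p);
}

static void xmit(struct pkt *p, struct tx_stats *st)
{
    size_t len = p->len; /* cache before ownership is handed over */

    queue_pkt(p);
    st->tx_packets++;
    st->tx_bytes += len; /* uses the cached copy, never p->len */
}
```

Reading `p->len` after `queue_pkt()` would be a use-after-free in this sketch, which mirrors why the length is copied into a local variable first.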
-
Committed by Phil Oester
When adding a new vlan, if the underlying interface has no carrier, then the newly added vlan interface should also have no carrier. At present, this is not true - the newly added vlan is added with carrier up. Fix by checking state of real device.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
No need to clear device stats in lec_open()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Oliver Hartkopp
This patch removes the abstraction introduced by the union skb_shared_tx in the shared skb data. The access of the different union elements at several places led to some confusion about accessing the shared tx_flags e.g. in skb_orphan_try().
http://marc.info/?l=linux-netdev&m=128084897415886&w=2
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
struct rds_rdma_notify contains a 32-bit hole on 64-bit arches; make sure it is zeroed before copying it to user.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
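The kind of hole meant here can be reproduced with any struct that places a 32-bit member after a 64-bit one. The sketch below uses a hypothetical struct with that layout (not the kernel definition) and zeroes the whole object before filling it in, so that padding bytes do not carry stale stack contents when the struct is later copied out as raw bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: on typical 64-bit ABIs the compiler adds four
 * padding bytes after `status` to keep the 8-byte alignment. */
struct notify {
    uint64_t user_token;
    int32_t status;
};

/* Clear the full object, padding included, before setting the fields. */
static void fill_notify(struct notify *n, uint64_t token, int32_t status)
{
    memset(n, 0, sizeof(*n));
    n->user_token = token;
    n->status = status;
}
```

Initialising only the named members would leave the padding with whatever was on the stack before, and a byte-wise copy to another protection domain would then leak it.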
-
Committed by Johannes Berg
Since commit 1dacc76d

  Author: Johannes Berg <johannes@sipsolutions.net>
  Date: Wed Jul 1 11:26:02 2009 +0000

      net/compat/wext: send different messages to compat tasks

we had a race condition when setting and then restoring frag_list. Eric attempted to fix it, but the fix created even worse problems.

However, the original motivation I had when I added the code that turned out to be racy is no longer clear to me, since we only copy up to skb->len to userspace, which doesn't include the frag_list length. As a result, not doing any frag_list clearing and restoring avoids the race condition, while not introducing any other problems.

Additionally, while preparing this patch I found that since none of the remaining netlink code is really aware of the frag_list, we need to use the original skb's information for packet information and credentials. This fixes, for example, the group information received by compat tasks.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@kernel.org [2.6.31+, for 2.6.35 revert 1235f504]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Julia Lawall
Error codes are stored in err, but the return value is always 0. Return err instead.

The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
local idexpression x;
constant C;
@@

if (...) { ...
  x = -C
  ... when != x
(
  return <+...x...+>;
|
  return NULL;
|
  return;
|
* return ...;
)
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Julia Lawall
Error codes are stored in err, but the return value is always 0. Return err instead.

The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
local idexpression x;
constant C;
@@

if (...) { ...
  x = -C
  ... when != x
(
  return <+...x...+>;
|
  return NULL;
|
  return;
|
* return ...;
)
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Dan Carpenter
There is no need to check "s". nla_data() doesn't return NULL. Also we already dereferenced "s" at this point so it would have oopsed earlier if it were NULL.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 Aug, 2010 5 commits
-
-
Committed by Jarek Poplawski
>Xin Xiaohui wrote:
> I looked into the code dev_gro_receive(), found the code here:
> if the frags[0] is pulled to 0, then the page will be released,
> and memmove() frags left.
> Is that right? I'm not sure if memmove do right or not, but
> frags[0].size is never set after memove at least. what I think
> a simple way is not to do anything if we found frags[0].size == 0.
> The patch is as followed.
...

This version of the patch fixes the bug directly in memmove.

Reported-by: "Xin, Xiaohui" <xiaohui.xin@intel.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Allan Stephens
Ensure that TIPC does not re-establish communication with a neighboring node until it has finished updating all data structures containing information about that node to reflect the earlier loss of contact. Previously, it was possible for TIPC to perform its purge of name table entries relating to the node once contact had already been re-established, resulting in the unwanted removal of valid name table entries.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Allan Stephens
Cause a socket whose TIPC_CONN_TIMEOUT option is zero to wait indefinitely for a response to a connection request using connect(). Previously, specifying a timeout of 0 ms resulted in an immediate timeout, which was inconsistent with the behavior specified by POSIX for a socket's receive and send timeout.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Allan Stephens
Eliminate printing of dashes after name table column headers (to adhere more closely to the standard format used in tipc-config), and simplify name table display logic using array lookups rather than if-then-else logic.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Allan Stephens
Eliminate unnecessary checking for null node pointer and redundant check of second active link array entry.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-