- 02 9月, 2009 1 次提交
-
-
由 Stephen Hemminger 提交于
These tables are never modified at runtime. Move to read-only section. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 9月, 2009 4 次提交
-
-
由 Damian Lukowski 提交于
RFC 1122 specifies two threshold values R1 and R2 for connection timeouts, which may represent a number of allowed retransmissions or a timeout value. Currently linux uses sysctl_tcp_retries{1,2} to specify the thresholds in number of allowed retransmissions. For any desired threshold R2 (by means of time) one can specify tcp_retries2 (by means of number of retransmissions) such that TCP will not time out earlier than R2. This is the case, because the RTO schedule follows a fixed pattern, namely exponential backoff. However, the RTO behaviour is not predictable any more if RTO backoffs can be reverted, as it is the case in the draft "Make TCP more Robust to Long Connectivity Disruptions" (http://tools.ietf.org/html/draft-zimmermann-tcp-lcd). In the worst case TCP would time out a connection after 3.2 seconds, if the initial RTO equaled MIN_RTO and each backoff has been reverted. This patch introduces a function retransmits_timed_out(N), which calculates the timeout of a TCP connection, assuming an initial RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions. Whenever timeout decisions are made by comparing the retransmission counter to some value N, this function can be used, instead. The meaning of tcp_retries2 will be changed, as many more RTO retransmissions can occur than the value indicates. However, it yields a timeout which is similar to the one of an unpatched, exponentially backing off TCP in the same scenario. As no application could rely on an RTO greater than MIN_RTO, there should be no risk of a regression. Signed-off-by: NDamian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Damian Lukowski 提交于
Here, an ICMP host/network unreachable message, whose payload fits to TCP's SND.UNA, is taken as an indication that the RTO retransmission has not been lost due to congestion, but because of a route failure somewhere along the path. With true congestion, a router won't trigger such a message and the patched TCP will operate as standard TCP. This patch reverts one RTO backoff, if an ICMP host/network unreachable message, whose payload fits to TCP's SND.UNA, arrives. Based on the new RTO, the retransmission timer is reset to reflect the remaining time, or - if the revert clocked out the timer - a retransmission is sent out immediately. Backoffs are only reverted, if TCP is in RTO loss recovery, i.e. if there have been retransmissions and reversible backoffs, already. Changes from v2: 1) Renaming of skb in tcp_v4_err() moved to another patch. 2) Reintroduced tcp_bound_rto() and __tcp_set_rto(). 3) Fixed code comments. Signed-off-by: NDamian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Damian Lukowski 提交于
This supplementary patch renames skb to icmp_skb in tcp_v4_err() in order to disambiguate from another sk_buff variable, which will be introduced in a separate patch. Signed-off-by: NDamian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 8月, 2009 6 次提交
-
-
由 John Dykstra 提交于
Remove the copy of the MD5 authentication key from tcp_check_req(). This key has already been copied by tcp_v4_syn_recv_sock() or tcp_v6_syn_recv_sock(). Signed-off-by: NJohn Dykstra <john.dykstra1@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Octavian Purdila 提交于
There is a race condition in the time-wait sockets code that can lead to premature termination of FIN_WAIT2 and, subsequently, to RST generation when the FIN,ACK from the peer finally arrives: Time TCP header 0.000000 30755 > http [SYN] Seq=0 Win=2920 Len=0 MSS=1460 TSV=282912 TSER=0 0.000008 http > 30755 aSYN, ACK] Seq=0 Ack=1 Win=2896 Len=0 MSS=1460 TSV=... 0.136899 HEAD /1b.html?n1Lg=v1 HTTP/1.0 [Packet size limited during capture] 0.136934 HTTP/1.0 200 OK [Packet size limited during capture] 0.136945 http > 30755 [FIN, ACK] Seq=187 Ack=207 Win=2690 Len=0 TSV=270521... 0.136974 30755 > http [ACK] Seq=207 Ack=187 Win=2734 Len=0 TSV=283049 TSER=... 0.177983 30755 > http [ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283089 TSER=... 0.238618 30755 > http [FIN, ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283151... 0.238625 http > 30755 [RST] Seq=188 Win=0 Len=0 Say twdr->slot = 1 and we are running inet_twdr_hangman and in this instance inet_twdr_do_twkill_work returns 1. At that point we will mark slot 1 and schedule inet_twdr_twkill_work. We will also make twdr->slot = 2. Next, a connection is closed and tcp_time_wait(TCP_FIN_WAIT2, timeo) is called which will create a new FIN_WAIT2 time-wait socket and will place it in the last to be reached slot, i.e. twdr->slot = 1. At this point say inet_twdr_twkill_work will run which will start destroying the time-wait sockets in slot 1, including the just added TCP_FIN_WAIT2 one. To avoid this issue we increment the slot only if all entries in the slot have been purged. This change may delay the slots cleanup by a time-wait death row period but only if the worker thread didn't had the time to run/purge the current slot in the next period (6 seconds with default sysctl settings). However, on such a busy system even without this change we would probably see delays... Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jens Låås 提交于
Here is rework and cleanup of the resize function. Some bugs we had. We were using ->parent when we should use node_parent(). Also we used ->parent which is not assigned by inflate in inflate loop. Also a fix to set thresholds to power 2 to fit halve and double strategy. max_resize is renamed to max_work which better indicates it's function. Reaching max_work is not an error, so warning is removed. max_work only limits amount of work done per resize. (limits CPU-usage, outstanding memory etc). The clean-up makes it relatively easy to add fixed sized root-nodes if we would like to decrease the memory pressure on routers with large routing tables and dynamic routing. If we'll need that... Its been tested with 280k routes. Work done together with Robert Olsson. Signed-off-by: NJens Låås <jens.laas@its.uu.se> Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
While doing some forwarding benchmarks, I noticed ip_rt_send_redirect() is rather expensive, even if send_redirects is false for the device. Fix is to avoid two atomic ops, we dont really need to take a reference on in_dev Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Introduce keepalive_probes(tp) helper, and use it, like keepalive_time_when(tp) and keepalive_intvl_when(tp) Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 8月, 2009 1 次提交
-
-
由 Tom Goff 提交于
The GRE header length should be subtracted when the tunnel MTU is calculated. This just corrects for the associativity change introduced by commit 42aa9162 ("gre: Move MTU setting out of ipgre_tunnel_bind_dev"). Signed-off-by: NTom Goff <thomas.goff@boeing.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 8月, 2009 1 次提交
-
-
由 Jan Engelhardt 提交于
String literals are constant, and usually, we can also tag the array of pointers const too, moving it to the .rodata section. Signed-off-by: NJan Engelhardt <jengelh@medozas.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 8月, 2009 1 次提交
-
-
由 Randy Dunlap 提交于
Fix build errors when SYSCTLs are not enabled: (.init.text+0x5154): undefined reference to `net_ipv4_ctl_path' (.init.text+0x5176): undefined reference to `register_net_sysctl_table' xfrm4_policy.c:(.exit.text+0x573): undefined reference to `unregister_net_sysctl_table Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 7月, 2009 2 次提交
-
-
由 Neil Horman 提交于
Choose saner defaults for xfrm[4|6] gc_thresh values on init Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values (set to 1024). Given that the ipv4 and ipv6 routing caches are sized dynamically at boot time, the static selections can be non-sensical. This patch dynamically selects an appropriate gc threshold based on the corresponding main routing table size, using the assumption that we should in the worst case be able to handle as many connections as the routing table can. For ipv4, the maximum route cache size is 16 * the number of hash buckets in the route cache. Given that xfrm4 starts garbage collection at the gc_thresh and prevents new allocations at 2 * gc_thresh, we set gc_thresh to half the maximum route cache size. For ipv6, its a bit trickier. there is no maximum route cache size, but the ipv6 dst_ops gc_thresh is statically set to 1024. It seems sane to select a simmilar gc_thresh for the xfrm6 code that is half the number of hash buckets in the v6 route cache times 16 (like the v4 code does). Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 roel kluin 提交于
If arp_format_neigh_entry() can be called with n->dev->addr_len == 0, then a write to hbuffer[-1] occurs. Signed-off-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 7月, 2009 1 次提交
-
-
由 Neil Horman 提交于
Export garbage collector thresholds for xfrm[4|6]_dst_ops Had a problem reported to me recently in which a high volume of ipsec connections on a system began reporting ENOBUFS for new connections eventually. It seemed that after about 2000 connections we started being unable to create more. A quick look revealed that the xfrm code used a dst_ops structure that limited the gc_thresh value to 1024, and always dropped route cache entries after 2x the gc_thresh. It seems the most direct solution is to export the gc_thresh values in the xfrm[4|6] dst_ops as sysctls, like the main routing table does, so that higher volumes of connections can be supported. This patch has been tested and allows the reporter to increase their ipsec connection volume successfully. Reported-by: NJoe Nall <joe@nall.com> Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> ipv4/xfrm4_policy.c | 18 ++++++++++++++++++ ipv6/xfrm6_policy.c | 18 ++++++++++++++++++ 2 files changed, 36 insertions(+) Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 7月, 2009 1 次提交
-
-
由 Andi Kleen 提交于
While looking for something else I spent some time adding one liner comments to the tcp_output.c functions that didn't have any. That makes the comments more consistent. I hope I documented everything right. No code changes. v2: Incorporated feedback from Ilpo. v3: Change style of one liner comments, add a few more comments. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 7月, 2009 5 次提交
-
-
由 John Dykstra 提交于
When the TCP connection handshake completes on the passive side, a variety of state must be set up in the "child" sock, including the key if MD5 authentication is being used. Fix TCP for both address families to label the key with the peer's destination address, rather than the address from the listening sock, which is usually the wildcard. Reported-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NJohn Dykstra <john.dykstra1@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Dykstra 提交于
Fix MD5 signature checking so that an IPv4 active open to an IPv6 socket can succeed. In particular, use the correct address family's signature generation function for the SYN/ACK. Reported-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NJohn Dykstra <john.dykstra1@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jarek Poplawski 提交于
While looking for other fib_trie problems reported by Pawel Staszewski I noticed there are a few uses of tnode_get_child() and node_parent() in lookups instead of their rcu versions. Signed-off-by: NJarek Poplawski <jarkao2@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jarek Poplawski 提交于
During large updates there could be triggered warnings like: "Fix inflate_threshold_root. Now=25 size=11 bits" if inflate() of the root node isn't finished in 10 loops. It should be much rarer now, after changing the threshold from 15 to 25, and a temporary problem, so this patch tries to handle it automatically using a fix variable to increase by one inflate threshold for next root resizes (up to the 35 limit, max fix = 10). The fix variable is decreased when root's inflate() finishes below 7 loops (even if some other, smaller table/ trie is updated -- for simplicity the fix variable is global for now). Reported-by: NPawel Staszewski <pstaszewski@itcare.pl> Reported-by: NJorge Boncompte [DTI2] <jorge@dti2.net> Tested-by: NPawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: NJarek Poplawski <jarkao2@gmail.com> Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jarek Poplawski 提交于
During trie_rebalance() we free memory after resizing with call_rcu(), but large updates, especially with PREEMPT_NONE configs, can cause memory stresses, so this patch calls synchronize_rcu() in tnode_free_flush() after each sync_pages to guarantee such freeing (especially before resizing the root node). The value of sync_pages = 128 is based on Pawel Staszewski's tests as the lowest which doesn't hinder updating times. (For testing purposes there was a sysfs module parameter to change it on demand, but it's removed until we're sure it could be really useful.) The patch is based on suggestions by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reported-by: NPawel Staszewski <pstaszewski@itcare.pl> Tested-by: NPawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: NJarek Poplawski <jarkao2@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 7月, 2009 1 次提交
-
-
由 Eric Dumazet 提交于
Pure style cleanups. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 7月, 2009 1 次提交
-
-
由 Andreas Jaggi 提交于
Fixes two bugs: - ToS/DiffServ inheritance was unintentionally activated when using impair fixed ToS values - ECN bit was lost during ToS/DiffServ inheritance Signed-off-by: NAndreas Jaggi <aj@open.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 7月, 2009 1 次提交
-
-
由 Sridhar Samudrala 提交于
- validate and forward GSO UDP/IPv4 packets from untrusted sources. - do software UFO if the outgoing device doesn't support UFO. Signed-off-by: NSridhar Samudrala <sri@us.ibm.com> Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 7月, 2009 1 次提交
-
-
由 Eric Dumazet 提交于
After commit 2b85a34e (net: No more expensive sock_hold()/sock_put() on each tx) we do not take any more references on sk->sk_refcnt on outgoing packets. I forgot to delete two __sock_put() from ip_push_pending_frames() and ip6_push_pending_frames(). Reported-by: NEmil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Tested-by: NEmil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 7月, 2009 1 次提交
-
-
由 Jiri Olsa 提交于
Adding memory barrier after the poll_wait function, paired with receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper to wrap the memory barrier. Without the memory barrier, following race can happen. The race fires, when following code paths meet, and the tp->rcv_nxt and __add_wait_queue updates stay in CPU caches. CPU1 CPU2 sys_select receive packet ... ... __add_wait_queue update tp->rcv_nxt ... ... tp->rcv_nxt check sock_def_readable ... { schedule ... if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) wake_up_interruptible(sk->sk_sleep) ... } If there was no cache the code would work ok, since the wait_queue and rcv_nxt are opposit to each other. Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already passed the tp->rcv_nxt check and sleeps, or will get the new value for tp->rcv_nxt and will return with new data mask. In both cases the process (CPU1) is being added to the wait queue, so the waitqueue_active (CPU2) call cannot miss and will wake up CPU1. The bad case is when the __add_wait_queue changes done by CPU1 stay in its cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then endup calling schedule and sleep forever if there are no more data on the socket. Calls to poll_wait in following modules were ommited: net/bluetooth/af_bluetooth.c net/irda/af_irda.c net/irda/irnet/irnet_ppp.c net/mac80211/rc80211_pid_debugfs.c net/phonet/socket.c net/rds/af_rds.c net/rfkill/core.c net/sunrpc/cache.c net/sunrpc/rpc_pipe.c net/tipc/socket.c Signed-off-by: NJiri Olsa <jolsa@redhat.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 7月, 2009 1 次提交
-
-
由 Jarek Poplawski 提交于
Pawel Staszewski wrote: <blockquote> Some time ago i report this: http://bugzilla.kernel.org/show_bug.cgi?id=6648 and now with 2.6.29 / 2.6.29.1 / 2.6.29.3 and 2.6.30 it back dmesg output: oprofile: using NMI interrupt. Fix inflate_threshold_root. Now=15 size=11 bits ... Fix inflate_threshold_root. Now=15 size=11 bits cat /proc/net/fib_triestat Basic info: size of leaf: 40 bytes, size of tnode: 56 bytes. Main: Aver depth: 2.28 Max depth: 6 Leaves: 276539 Prefixes: 289922 Internal nodes: 66762 1: 35046 2: 13824 3: 9508 4: 4897 5: 2331 6: 1149 7: 5 9: 1 18: 1 Pointers: 691228 Null ptrs: 347928 Total size: 35709 kB </blockquote> It seems, the current threshold for root resizing is too aggressive, and it causes misleading warnings during big updates, but it might be also responsible for memory problems, especially with non-preempt configs, when RCU freeing is delayed long after call_rcu. It should be also mentioned that because of non-atomic changes during resizing/rebalancing the current lookup algorithm can miss valid leaves so it's additional argument to shorten these activities even at a cost of a minimally longer searching. This patch restores values before the patch "[IPV4]: fib_trie root node settings", commit: 965ffea4 from v2.6.22. Pawel's report: <blockquote> I dont see any big change of (cpu load or faster/slower routing/propagating routes from bgpd or something else) - in avg there is from 2% to 3% more of CPU load i dont know why but it is - i change from "preempt" to "no preempt" 3 times and check this my "mpstat -P ALL 1 30" always avg cpu load was from 2 to 3% more compared to "no preempt" [...] cat /proc/net/fib_triestat Basic info: size of leaf: 20 bytes, size of tnode: 36 bytes. Main: Aver depth: 2.44 Max depth: 6 Leaves: 277814 Prefixes: 291306 Internal nodes: 66420 1: 32737 2: 14850 3: 10332 4: 4871 5: 2313 6: 942 7: 371 8: 3 17: 1 Pointers: 599098 Null ptrs: 254865 Total size: 18067 kB </blockquote> According to this and other similar reports average depth is slightly increased (~0.2), and root nodes are shorter (log 17 vs. 18), but there is no visible performance decrease. So, until memory handling is improved or added parameters for changing this individually, this patch resets to safer defaults. Reported-by: NPawel Staszewski <pstaszewski@itcare.pl> Reported-by: NJorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: NJarek Poplawski <jarkao2@gmail.com> Tested-by: NPawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 7月, 2009 1 次提交
-
-
由 Patrick McHardy 提交于
This patch is the result of an automatic spatch transformation to convert all ndo_start_xmit() return values of 0 to NETDEV_TX_OK. Some occurences are missed by the automatic conversion, those will be handled in a seperate patch. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 7月, 2009 1 次提交
-
-
由 Wei Yongjun 提交于
The SCTP pushed the skb data above the sctp chunk header, so the check of pskb_may_pull(skb, xprth + 4 - skb->data) in _decode_session4() will never return 0 because xprth + 4 - skb->data < 0, the ports decode of sctp will always fail. Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com> Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 7月, 2009 2 次提交
-
-
由 Eric W. Biederman 提交于
This reverts commit 73ce7b01. After discovering that we don't listen to gratuitious arps in 2.6.30 I tracked the failure down to this commit. The patch makes absolutely no sense. RFC2131 RFC3927 and RFC5227. are all in agreement that an arp request with sip == 0 should be used for the probe (to prevent learning) and an arp request with sip == tip should be used for the gratitous announcement that people can learn from. It appears the author of the broken patch got those two cases confused and modified the code to drop all gratuitous arp traffic. Ouch! Cc: stable@kernel.org Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jarek Poplawski 提交于
Alas current delaying of freeing old tnodes by RCU in trie_rebalance is still not enough because we can free a top tnode before updating a t->trie pointer. Reported-by: NPawel Staszewski <pstaszewski@itcare.pl> Tested-by: NPawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: NJarek Poplawski <jarkao2@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 6月, 2009 2 次提交
-
-
由 Herbert Xu 提交于
If a socket starts out on a non-TSO route, and then switches to a TSO route, then we will tack on data to the tail of the tx queue even if it started out life as non-TSO. This is suboptimal because all of it will then be copied and checksummed unnecessarily. This patch fixes this by ensuring that skb->ip_summed is set to CHECKSUM_PARTIAL before appending extra data beyond the MSS. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
If a socket starts out on a non-TSO route, and then switches to a TSO route, then the tail on the tx queue can morph into a TSO packet, causing mischief because the rest of the stack does not expect a partially linear TSO packet. This patch fixes this by ensuring that skb->ip_summed is set to CHECKSUM_PARTIAL before declaring a packet as TSO. Reported-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 6月, 2009 1 次提交
-
-
由 Patrick McHardy 提交于
When NAT helpers change the TCP packet size, the highest seen sequence number needs to be corrected. This is currently only done upwards, when the packet size is reduced the sequence number is unchanged. This causes TCP conntrack to falsely detect unacknowledged data and decrease the timeout. Fix by updating the highest seen sequence number in both directions after packet mangling. Tested-by: NKrzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: NPatrick McHardy <kaber@trash.net>
-
- 27 6月, 2009 1 次提交
-
-
由 Herbert Xu 提交于
As transparent proxying looks up the socket early and assigns it to the skb for later processing, we must drop any existing socket ownership prior to that in order to distinguish between the case where tproxy is active and where it is not. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 6月, 2009 1 次提交
-
-
由 Wei Yongjun 提交于
RFC0793 defined that in FIN-WAIT-2 state if the ACK bit is off drop the segment and return[Page 72]. But this check is missing in function tcp_timewait_state_process(). This cause the segment with FIN flag but no ACK has two diffent action: Case 1: Node A Node B <------------- FIN,ACK (enter FIN-WAIT-1) ACK -------------> (enter FIN-WAIT-2) FIN -------------> discard (move sk to tw list) Case 2: Node A Node B <------------- FIN,ACK (enter FIN-WAIT-1) ACK -------------> (enter FIN-WAIT-2) (move sk to tw list) FIN -------------> <------------- ACK This patch fixed the problem. Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 6月, 2009 1 次提交
-
-
由 Neil Horman 提交于
When route caching is disabled (rt_caching returns false), We still use route cache entries that are created and passed into rt_intern_hash once. These routes need to be made usable for the one call path that holds a reference to them, and they need to be reclaimed when they're finished with their use. To be made usable, they need to be associated with a neighbor table entry (which they currently are not), otherwise iproute_finish2 just discards the packet, since we don't know which L2 peer to send the packet to. To do this binding, we need to follow the path a bit higher up in rt_intern_hash, which calls arp_bind_neighbour, but not assign the route entry to the hash table. Currently, if caching is off, we simply assign the route to the rp pointer and are reutrn success. This patch associates us with a neighbor entry first. Secondly, we need to make sure that any single use routes like this are known to the garbage collector when caching is off. If caching is off, and we try to hash in a route, it will leak when its refcount reaches zero. To avoid this, this patch calls rt_free on the route cache entry passed into rt_intern_hash. This places us on the gc list for the route cache garbage collector, so that when its refcount reaches zero, it will be reclaimed (Thanks to Alexey for this suggestion). I've tested this on a local system here, and with these patches in place, I'm able to maintain routed connectivity to remote systems, even if I set /proc/sys/net/ipv4/rt_cache_rebuild_count to -1, which forces rt_caching to return false. Signed-off-by: NNeil Horman <nhorman@redhat.com> Reported-by: NJarek Poplawski <jarkao2@gmail.com> Reported-by: NMaxime Bizon <mbizon@freebox.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 6月, 2009 1 次提交
-
-
由 Neil Horman 提交于
Don't drop route if we're not caching I recently got a report of an oops on a route lookup. Maxime was testing what would happen if route caching was turned off (doing so by setting making rt_caching always return 0), and found that it triggered an oops. I looked at it and found that the problem stemmed from the fact that the route lookup routines were returning success from their lookup paths (which is good), but never set the **rp pointer to anything (which is bad). This happens because in rt_intern_hash, if rt_caching returns false, we call rt_drop and return 0. This almost emulates slient success. What we should be doing is assigning *rp = rt and _not_ dropping the route. This way, during slow path lookups, when we create a new route cache entry, we don't immediately discard it, rather we just don't add it into the cache hash table, but we let this one lookup use it for the purpose of this route request. Maxime has tested and reports it prevents the oops. There is still a subsequent routing issue that I'm looking into further, but I'm confident that, even if its related to this same path, this patch makes sense to take. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-