- 03 7月, 2008 1 次提交
-
-
由 Pavel Emelyanov 提交于
There are some places in TCP that select one MIB index to bump snmp statistics like this: if (<something>) NET_INC_STATS_BH(<some_id>); else if (<something_else>) NET_INC_STATS_BH(<some_other_id>); ... else NET_INC_STATS_BH(<default_id>); or in a more tricky but still similar way. On the other hand, this NET_INC_STATS_BH is a camouflaged increment of percpu variable, which is not that small. Factoring those cases out de-bloats 235 bytes on non-preemptible i386 config and drives parts of the code into 80 columns. add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-235 (-235) function old new delta tcp_fastretrans_alert 1437 1424 -13 tcp_dsack_set 137 124 -13 tcp_xmit_retransmit_queue 690 676 -14 tcp_try_undo_recovery 283 265 -18 tcp_sacktag_write_queue 1550 1515 -35 tcp_update_reordering 162 106 -56 tcp_retransmit_timer 990 904 -86 Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 7月, 2008 1 次提交
-
-
由 Stephen Hemminger 提交于
Convert the sysctl values for icmp ratelimit to use milliseconds instead of jiffies which is based on kernel configured HZ. Internal kernel jiffies are not a proper unit for any userspace API. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 6月, 2008 4 次提交
-
-
由 Eli Cohen 提交于
When an SKB cannot be chained to a session, the current code attempts to "restore" its ip_summed field from lro_mgr->ip_summed. However, lro_mgr->ip_summed does not hold the original value; in fact, we'd better not touch skb->ip_summed since it is not modified by the code in the path leading to a failure to chain it. Also use a cleaer comment to the describe the ip_summed field of struct net_lro_mgr. Issue raised by Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
The problem is that while we work w/o the inet_frags.lock even read-locked the secret rebuild timer may occur (on another CPU, since BHs are still disabled in the inet_frag_find) and change the rnd seed for ipv4/6 fragments. It was caused by my patch fd9e6354 ([INET]: Omit double hash calculations in xxx_frag_intern) late in the 2.6.24 kernel, so this should probably be queued to -stable. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
I found another case where we are sending information to userspace in the wrong HZ scale. This should have been fixed back in 2.5 :-( This means an ABI change but as it stands there is no way for an application like ss to get the right value. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Miquel van Smoorenburg 提交于
The tcp_mem array which contains limits on the total amount of memory used by TCP sockets is calculated based on nr_all_pages. On a 32 bits x86 system, we should base this on the number of lowmem pages. Signed-off-by: NMiquel van Smoorenburg <miquels@cistron.nl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 6月, 2008 2 次提交
-
-
由 Ben Hutchings 提交于
Add skb_warn_if_lro() to test whether an skb was received with LRO and warn if so. Change br_forward(), ip_forward() and ip6_forward() to call it) and discard the skb if it returns true. Signed-off-by: NBen Hutchings <bhutchings@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Hutchings 提交于
Large Receive Offload (LRO) is only appropriate for packets that are destined for the host, and should be disabled if received packets may be forwarded. It can also confuse the GSO on output. Add dev_disable_lro() function which uses the appropriate ethtool ops to disable LRO if enabled. Add calls to dev_disable_lro() in br_add_if() and functions that enable IPv4 and IPv6 forwarding. Signed-off-by: NBen Hutchings <bhutchings@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 6月, 2008 3 次提交
-
-
由 Eric Dumazet 提交于
In commits 33c732c3 ([IPV4]: Add raw drops counter) and a92aa318 ([IPV6]: Add raw drops counter), Wang Chen added raw drops counter for /proc/net/raw & /proc/net/raw6 This patch adds this capability to UDP sockets too (/proc/net/udp & /proc/net/udp6). This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also be examined for each udp socket. # grep Udp: /proc/net/snmp Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors Udp: 23971006 75 899420 16390693 146348 0 # cat /proc/net/udp sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt --- uid timeout inode ref pointer drops 75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000 --- 0 0 2358 2 ffff81082a538c80 0 111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000 --- 0 0 2286 2 ffff81042dd35c80 146348 In this example, only port 111 (0x006F) was flooded by messages that user program could not read fast enough. 146348 messages were lost. Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steffen Klassert 提交于
When generating the ip header for the transformed packet we just copy the frag_off field of the ip header from the original packet to the ip header of the new generated packet. If we receive a packet as a chain of fragments, all but the last of the new generated packets have the IP_MF flag set. We have to mask the frag_off field to only keep the IP_DF flag from the original packet. This got lost with git commit 36cf9acf ("[IPSEC]: Separate inner/outer mode processing on output") Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Fix three ct_extend/NAT extension related races: - When cleaning up the extension area and removing it from the bysource hash, the nat->ct pointer must not be set to NULL since it may still be used in a RCU read side - When replacing a NAT extension area in the bysource hash, the nat->ct pointer must be assigned before performing the replacement - When reallocating extension storage in ct_extend, the old memory must not be freed immediately since it may still be used by a RCU read side Possibly fixes https://bugzilla.redhat.com/show_bug.cgi?id=449315 and/or http://bugzilla.kernel.org/show_bug.cgi?id=10875Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 6月, 2008 9 次提交
-
-
由 Pavel Emelyanov 提交于
Although this hash takes addresses into account, the ehash chains can also be too long when, for instance, communications via lo occur. So, prepare the inet_hashfn to take struct net into account. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Listening-on-one-port sockets in many namespaces produce long chains in the listening_hash-es, so prepare the inet_lhashfn to take struct net into account. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Binding to some port in many namespaces may create too long chains in bhash-es, so prepare the hashfn to take struct net into account. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Every caller already has this one. The new argument is currently unused, but this will be fixed shortly. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
They both calculate the hash chain, but currently do not have a struct net pointer, so pass one there via additional argument, all the more so their callers already have such. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Currently the chain to store a UDP socket is calculated with simple (x & (UDP_HTABLE_SIZE - 1)). But taking net into account would make this calculation a bit more complex, so moving it into a function would help. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rami Rosen 提交于
1) Remove ICMP_MIN_LENGTH, as it is unused. 2) Remove unneeded tcp_v4_send_check() declaration. Signed-off-by: NRami Rosen <ramirose@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
I just noticed "cat /proc/net/raw" was buggy, missing '\n' separators. I believe this was introduced by commit 8cd850ef ([RAW]: Cleanup IPv4 raw_seq_show.) This trivial patch restores correct behavior, and applies to current Linus tree (should also be applied to stable tree as well.) Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Ingo's system is still seeing strange behavior, and he reports that is goes away if the rest of the deferred accept changes are reverted too. Therefore this reverts e4c78840 ("[TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack") and 539fae89 ("[TCP]: TCP_DEFER_ACCEPT updates - defer timeout conflicts with max_thresh"). Just like the other revert, these ideas can be revisited for 2.6.27 Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 6月, 2008 1 次提交
-
-
由 Brian Haley 提交于
Change struct proto destroy function pointer to return void. Noticed by Al Viro. Signed-off-by: NBrian Haley <brian.haley@hp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 6月, 2008 1 次提交
-
-
由 David S. Miller 提交于
This reverts two changesets, ec3c0982 ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and the follow-on bug fix 9ae27e0a ("tcp: Fix slab corruption with ipv6 and tcp6fuzz"). This change causes several problems, first reported by Ingo Molnar as a distcc-over-loopback regression where connections were getting stuck. Ilpo Järvinen first spotted the locking problems. The new function added by this code, tcp_defer_accept_check(), only has the child socket locked, yet it is modifying state of the parent listening socket. Fixing that is non-trivial at best, because we can't simply just grab the parent listening socket lock at this point, because it would create an ABBA deadlock. The normal ordering is parent listening socket --> child socket, but this code path would require the reverse lock ordering. Next is a problem noticed by Vitaliy Gusev, he noted: ---------------------------------------- >--- a/net/ipv4/tcp_timer.c >+++ b/net/ipv4/tcp_timer.c >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data) > goto death; > } > >+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) { >+ tcp_send_active_reset(sk, GFP_ATOMIC); >+ goto death; Here socket sk is not attached to listening socket's request queue. tcp_done() will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should release this sk) as socket is not DEAD. Therefore socket sk will be lost for freeing. ---------------------------------------- Finally, Alexey Kuznetsov argues that there might not even be any real value or advantage to these new semantics even if we fix all of the bugs: ---------------------------------------- Hiding from accept() sockets with only out-of-order data only is the only thing which is impossible with old approach. Is this really so valuable? My opinion: no, this is nothing but a new loophole to consume memory without control. ---------------------------------------- So revert this thing for now. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 6月, 2008 5 次提交
-
-
由 Adrian Bunk 提交于
This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: NAdrian Bunk <bunk@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 YOSHIFUJI Hideaki 提交于
As we do for other socket/timewait-socket specific parameters, let the callers pass appropriate arguments to tcp_v{4,6}_do_calc_md5_hash(). Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
-
由 YOSHIFUJI Hideaki 提交于
We can share most part of the hash calculation code because the only difference between IPv4 and IPv6 is their pseudo headers. Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
-
由 YOSHIFUJI Hideaki 提交于
Protocol is always TCP, so remove useless protocol argument. Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
-
由 YOSHIFUJI Hideaki 提交于
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
-
- 11 6月, 2008 3 次提交
-
-
由 Krzysztof Piotr Oledzki 提交于
Most legacy software do not like tables > 255 as rtm_table is u8 so tb_id is sent &0xff and it is possible to mismatch for example table 510 with table 254 (main). This patch introduces RT_TABLE_COMPAT=252 so the code uses it if tb_id > 255. It makes such old applications happy, new ones are still able to use RTA_TABLE to get a proper table id. Signed-off-by: NKrzysztof Piotr Oledzki <ole@ans.pl> Acked-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Thomas Graf 提交于
Makes people happy who try to keep a list of addresses up to date by listening to notifications. Signed-off-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnaldo Carvalho de Melo 提交于
Wei Yongjun noticed that we may call reqsk_free on request sock objects where the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc where we initialize ->opt to NULL and set ->pktopts to NULL in inet6_reqsk_alloc. Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 6月, 2008 4 次提交
-
-
由 Rami Rosen 提交于
- No need to perform data_len = 0 in the switch command, since data_len is initialized to 0 in the beginning of the ipq_build_packet_message() method. - {ip,ip6}_queue: We can reach nlmsg_failure only from one place; skb is sure to be NULL when getting there; since skb is NULL, there is no need to check this fact and call kfree_skb(). Signed-off-by: NRami Rosen <ramirose@gmail.com> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Fabian Hugelshofer 提交于
Currently the last packet of a connection isn't accounted when its causing abnormal termination. Introduces nf_ct_kill_acct() which increments the accounting counters on conntrack kill. The new function was necessary, because there are calls to nf_ct_kill() which don't need accounting: nf_conntrack_proto_tcp.c line ~847: Kills ct and returns NF_REPEAT. We don't want to count twice. nf_conntrack_proto_tcp.c line ~880: Kills ct and returns NF_DROP. I think we don't want to count dropped packets. nf_conntrack_netlink.c line ~824: As far as I can see ctnetlink_del_conntrack() is used to destroy a conntrack on behalf of the user. There is an sk_buff, but I don't think this is an actual packet. Incrementing counters here is therefore not desired. Signed-off-by: NFabian Hugelshofer <hugelshofer2006@gmx.ch> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Encapsulate the common if (del_timer(&ct->timeout)) ct->timeout.function((unsigned long)ct) sequence in a new function. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 James Morris 提交于
The following patch implements a new "security" table for iptables, so that MAC (SELinux etc.) networking rules can be managed separately to standard DAC rules. This is to help with distro integration of the new secmark-based network controls, per various previous discussions. The need for a separate table arises from the fact that existing tools and usage of iptables will likely clash with centralized MAC policy management. The SECMARK and CONNSECMARK targets will still be valid in the mangle table to prevent breakage of existing users. Signed-off-by: NJames Morris <jmorris@namei.org> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 6月, 2008 1 次提交
-
-
由 Chris Wright 提交于
- Don't trust a length which is greater than the working buffer. An invalid length could cause overflow when calculating buffer size for decoding oid. - An oid length of zero is invalid and allows for an off-by-one error when decoding oid because the first subid actually encodes first 2 subids. - A primitive encoding may not have an indefinite length. Thanks to Wei Wang from McAfee for report. Cc: Steven French <sfrench@us.ibm.com> Cc: stable@kernel.org Acked-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NChris Wright <chrisw@sous-sol.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 6月, 2008 5 次提交
-
-
由 Octavian Purdila 提交于
skb_splice_bits temporary drops the socket lock while iterating over the socket queue in order to break a reverse locking condition which happens with sendfile. This, however, opens a window of opportunity for tcp_collapse() to aggregate skbs and thus potentially free the current skb used in skb_splice_bits and tcp_read_sock. This patch fixes the problem by (re-)getting the same "logical skb" after the lock has been temporary dropped. Based on idea and initial patch from Evgeniy Polyakov. Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com> Acked-by: NEvgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sridhar Samudrala 提交于
TCP "resets sent" counter is not incremented when a TCP Reset is sent via tcp_send_active_reset(). Signed-off-by: NSridhar Samudrala <sri@us.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Denis V. Lunev 提交于
The program below just leaks the raw kernel socket int main() { int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP); struct sockaddr_in addr; memset(&addr, 0, sizeof(addr)); inet_aton("127.0.0.1", &addr.sin_addr); addr.sin_family = AF_INET; addr.sin_port = htons(2048); sendto(fd, "a", 1, MSG_MORE, &addr, sizeof(addr)); return 0; } Corked packet is allocated via sock_wmalloc which holds the owner socket, so one should uncork it and flush all pending data on close. Do this in the same way as in UDP. Signed-off-by: NDenis V. Lunev <den@openvz.org> Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
This bug is able to corrupt fackets_out in very rare cases. In order for this to cause corruption: 1) DSACK in the middle of previous SACK block must be generated. 2) In order to take that particular branch, part or all of the DSACKed segment must already be SACKed so that we have that in cache in the first place. 3) The new info must be top enough so that fackets_out will be updated on this iteration. ...then fack_count is updated while skb wasn't, then we walk again that particular segment thus updating fack_count twice for a single skb and finally that value is assigned to fackets_out by tcp_sacktag_one. It is safe to call tcp_sacktag_one just once for a segment (at DSACK), no need to call again for plain SACK. Potential problem of the miscount are limited to premature entry to recovery and to inflated reordering metric (which could even cancel each other out in the most the luckiest scenarios :-)). Both are quite insignificant in worst case too and there exists also code to reset them (fackets_out once sacked_out becomes zero and reordering metric on RTO). This has been reported by a number of people, because it occurred quite rarely, it has been very evasive. Andy Furniss was able to get it to occur couple of times so that a bit more info was collected about the problem using a debug patch, though it still required lot of checking around. Thanks also to others who have tried to help here. This is listed as Bugzilla #10346. The bug was introduced by me in commit 68f8353b ([TCP]: Rewrite SACK block processing & sack_recv_cache use), I probably thought back then that there's need to scan that entry twice or didn't dare to make it go through it just once there. Going through twice would have required restoring fack_count after the walk but as noted above, I chose to drop the additional walk step altogether here. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Denis V. Lunev 提交于
IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data actually. In this case ip_flush_pending_frames should be called instead of ip6_flush_pending_frames. Signed-off-by: NDenis V. Lunev <den@openvz.org> Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
-