- 11 10月, 2007 40 次提交
-
-
由 Eric W. Biederman 提交于
Currently we never call unregister_netdev for the loopback device so it is impossible for us to reach inetdev_destroy with the loopback device. So the test in inetdev_destroy is unnecessary. Further when testing with my network namespace patches removing unregistering the loopback device and calling inetdev_destroy works fine so there appears to be no reason for avoiding unregistering the loopback device. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
In case of ACK reordering, the SACK block might be valid in it's time but is already obsoleted since we've received another kind of confirmation about arrival of the segments through snd_una advancement of an earlier packet. I didn't bother to build distinguishing of valid and invalid SACK blocks but simply made reordered SACK blocks that are too old always not counted regardless of their "real" validity which could be determined by using the ack field of the reordered packet (won't be significant IMHO). DSACKs can very well be considered useful even in this situation, so won't do any of this for them. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
I previously added checking to position that is rather poor as state has already been adjusted quite a bit. Re-placing it above all state changes should be more robust though the return should never ever get executed regardless of its place :-). Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Lezcano 提交于
This patch replaces all occurences to the static variable loopback_dev to a pointer loopback_dev. That provides the mindless, trivial, uninteressting change part for the dynamic allocation for the loopback. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDaniel Lezcano <dlezcano@fr.ibm.com> Acked-By: NKirill Korotaev <dev@sw.ru> Acked-by: NBenjamin Thery <benjamin.thery@bull.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
There's no reason to clear the sacktag skb hint when small part of the rexmit queue changes. Account changes (if any) instead when fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so no need to discard SACK tag hint at all. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Most of the description that follows comes from my mail to netdev (some editing done): Main obstacle to FRTO use is its deployment as it has to be on the sender side where as wireless link is often the receiver's access link. Take initiative on behalf of unlucky receivers and enable it by default in future Linux TCP senders. Also IETF seems to interested in advancing FRTO from experimental [1]. How does FRTO help? =================== FRTO detects spurious RTOs and avoids a number of unnecessary retransmissions and a couple of other problems that can arise due to incorrect guess made at RTO (i.e., that segments were lost when they actually got delayed which is likely to occur e.g. in wireless environments with link-layer retransmission). Though FRTO cannot prevent the first (potentially unnecessary) retransmission at RTO, I suspect that it won't cost that much even if you have to pay for each bit (won't be that high percentage out of all packets after all :-)). However, usually when you have a spurious RTO, not only the first segment unnecessarily retransmitted but the *whole window*. It goes like this: all cumulative ACKs got delayed due to in-order delivery, then TCP will actually send 1.5*original cwnd worth of data in the RTO's slow-start when the delayed ACKs arrive (basically the original cwnd worth of it unnecessarily). In case one is interested in minimizing unnecessary retransmissions e.g. due to cost, those rexmissions must never see daylight. Besides, in the worst case the generated burst overloads the bottleneck buffers which is likely to significantly delay the further progress of the flow. In case of ll rexmissions, ACK compression often occurs at the same time making the burst very "sharp edged" (in that case TCP often loses most of the segments above high_seq => very bad performance too). When FRTO is enabled, those unnecessary retransmissions are fully avoided except for the first segment and the cwnd behavior after detected spurious RTO is determined by the response (one can tune that by sysctl). Basic version (non-SACK enhanced one), FRTO can fail to detect spurious RTO as spurious and falls back to conservative behavior. ACK lossage is much less significant than reordering, usually the FRTO can detect spurious RTO if at least 2 cumulative ACKs from original window are preserved (excluding the ACK that advances to high_seq). With SACK-enhanced version, the detection is quite robust. FRTO should remove the need to set a high lower bound for the RTO estimator due to delay spikes that occur relatively common in some environments (esp. in wireless/cellular ones). [1] http://www1.ietf.org/mail-archive/web/tcpm/current/msg02862.htmlSigned-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Basically this change enables it, previously other undo_marker users were left with nothing. Reverse undo_marker logic completely to get it set right in CA_Loss. On the other hand, when spurious RTO is detected, clear it. Clearing might be too heavy for some scenarios but seems safe enough starting point for now and shouldn't have much effect except in majority of cases (if in any). By adding a new FLAG_ we avoid looping through write_queue when RTO occurs. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Implements following cleanups: - Comment re-placement (CodingStyle) - tcp_tso_acked() local (wrapper-like) variable removal (readability) - __-types removed (IMHO they make local variables jumpy looking and just was space) - acked -> flag (naming conventions elsewhere in TCP code) - linebreak adjustments (readability) - nested if()s combined (reduced indentation) - clarifying newlines added Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
The accounting code is pretty much the same, so it's a shame we do it in two places. I'm not too sure if added fully_acked check in MTU probing is really what we want perhaps the added end_seq could be used in the after() comparison. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
In addition, fix its function comment spacing. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
-
由 Ilpo Järvinen 提交于
Substraction for fackets_out is unconditional when snd_una advances, thus there's no need to do it inside the loop. Just make sure correct bounds are honored. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
In general, it should not be necessary to call tcp_fragment for already SACKed skbs, but it's better to be safe than sorry. And indeed, it can be called from sacktag when a DSACK arrives or some ACK (with SACK) reordering occurs (sacktag could be made to avoid the call in the latter case though I'm not sure if it's worth of the trouble and added complexity to cover such marginal case). The collapse case has return for SACKED_ACKED case earlier, so just WARN_ON if internal inconsistency is detected for some reason. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rick Jones 提交于
Return some useful information such as the maximum listen backlog and the current listen backlog in the tcp_info structure and INET_DIAG_INFO. Signed-off-by: NRick Jones <rick.jones2@hp.com> Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David L Stevens 提交于
Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches "remove" (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for "Icmp" to get the named data from new counters. Signed-off-by: NDavid L Stevens <dlstevens@us.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Denis Cheng 提交于
Signed-off-by: NDenis Cheng <crquan@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
I was looking at Patrick's fix to inet_diag and it occured to me that we're using a pointer argument to return values unnecessarily in netlink_run_queue. Changing it to return the value will allow the compiler to generate better code since the value won't have to be memory-backed. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Denis V. Lunev 提交于
This patch slightly cleanups FIB rules framework. rules_list as a pointer on struct fib_rules_ops is useless. It is always assigned with a static per/subsystem list in IPv4, IPv6 and DecNet. Signed-off-by: NDenis V. Lunev <den@openvz.org> Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paul Moore 提交于
The bulk of the CIPSO option parsing/processing in the cipso_v4_sock_getattr() and cipso_v4_skb_getattr() functions are identical, the only real difference being where the functions obtain the CIPSO option itself. This patch creates a new function, cipso_v4_getattr(), which contains the common CIPSO option parsing/processing code and modifies the existing functions to call this new helper function. Signed-off-by: NPaul Moore <paul.moore@hp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ralf Baechle 提交于
It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: NRalf Baechle <ralf@linux-mips.org> Signed-off-by: NJeff Garzik <jeff@garzik.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
On loaded/big hosts, rt_check_expire() if of litle use, because it generally breaks out of its main loop because of a jiffies change. It can take a long time (read : timer invocations) to actually scan the whole hash table, freeing unused entries. Converting it to use a workqueue instead of softirq is a nice move because we can allow rt_check_expire() to do the scan it is supposed to do, without hogging the CPU. This has an impact on the average number of entries in cache, reducing ram usage. Cache is more responsive to parameter changes (/proc/sys/net/ipv4/route/gc_timeout and /proc/sys/net/ipv4/route/gc_interval) Note: Maybe the default value of gc_interval (60 seconds) is too high, since this means we actually need 5 (300/60) invocations to scan the whole table. Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Thomas Graf 提交于
This change allows the generic attribute interface to be used within the netfilter subsystem where this flag was initially introduced. The byte-order flag is yet unused, it's intended use is to allow automatic byte order convertions for all atomic types. Signed-off-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Every user of the network device notifiers is either a protocol stack or a pseudo device. If a protocol stack that does not have support for multiple network namespaces receives an event for a device that is not in the initial network namespace it quite possibly can get confused and do the wrong thing. To avoid problems until all of the protocol stacks are converted this patch modifies all netdev event handlers to ignore events on devices that are not in the initial network namespace. As the rest of the code is made network namespace aware these checks can be removed. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
This patch modifies every packet receive function registered with dev_add_pack() to drop packets if they are not from the initial network namespace. This should ensure that the various network stacks do not receive packets in a anything but the initial network namespace until the code has been converted and is ready for them. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
This patch passes in the namespace a new socket should be created in and has the socket code do the appropriate reference counting. By virtue of this all socket create methods are touched. In addition the socket create methods are modified so that they will fail if you attempt to create a socket in a non-default network namespace. Failing if we attempt to create a socket outside of the default network namespace ensures that as we incrementally make the network stack network namespace aware we will not export functionality that someone has not audited and made certain is network namespace safe. Allowing us to partially enable network namespaces before all of the exotic protocols are supported. Any protocol layers I have missed will fail to compile because I now pass an extra parameter into the socket creation code. [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ] Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Hopefully captured all single statement cases under net/. I'm not too sure if there is some policy about #includes that are "guaranteed" (ie., in the current tree) to be available through some other #included header, so I just added linux/kernel.h to each changed file that didn't #include it previously. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Masahide NAKAMURA 提交于
IPv4 IPsec tunnel gateway incorrectly sends redirect to sender if it is onlink host when network device the IPsec tunnelled packet is arrived is the same as the one the decapsulated packet is sent. With this patch, it omits to send the redirect when the forwarding skbuff carries secpath, since such skbuff should be assumed as a decapsulated packet from IPsec tunnel by own. Request for comments: Alternatively we'd have another way to change net/ipv4/route.c (__mkroute_input) to use RTCF_DOREDIRECT flag unless skbuff has no secpath. It is better than this patch at performance point of view because IPv4 redirect judgement is done at routing slow-path. However, it should be taken care of resource changes between SAD(XFRM states) and routing table. In other words, When IPv4 SAD is changed does the related routing entry go to its slow-path? If not, it is reasonable to apply this patch. Signed-off-by: NMasahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
This patch causes UDP port allocation to be randomized like TCP. The earlier code would always choose same port (ie first empty list). Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
In DSACK case, some events are not extraordinary, such as packet duplication generated DSACK. They can arrive easily below snd_una when undo_marker is not set (TCP being in CA_Open), counting such DSACKs amoung SACK discards will likely just mislead if they occur in some scenario when there are other problems as well. Similarly, excessively delayed packets could cause "normal" DSACKs. Therefore, separate counters are allocated for DSACK events. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
SACK processing code has been a sort of russian roulette as no validation of SACK blocks is previously attempted. Besides, it is not very clear what all kinds of broken SACK blocks really mean (e.g., one that has start and end sequence numbers reversed). So now close the roulette once and for all. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Only thing that tiny function does is rearming the RTO (if necessary), name it accordingly. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Makes caller side more obvious, there's no need to have a wrapper for this oneliner! Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
This patch converts the messy macro for MASK_PFX to inline function and expands TKEY_GET_MASK in the one place it is used. Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
Try this out: * replace macro's with inlines * get rid of places doing multiple evaluations of NODE_PARENT [akpm@linux-foundation.org: rcu_dereference wants an lval] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ilpo Järvinen 提交于
Previously code had IsReno/IsFack defined as macros that were local to tcp_input.c though sack_ok field has user elsewhere too for the same purpose. This changes them to static inlines as preferred according the current coding style and unifies the access to sack_ok across multiple files. Magic bitops of sack_ok for FACK and DSACK are also abstracted to functions with appropriate names. Note: - One sack_ok = 1 remains but that's self explanary, i.e., it enables sack - Couple of !IsReno cases are changed to tcp_is_sack - There were no users for IsDSack => I dropped it Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-