- 24 Feb 2009, 1 commit
-
-
Committed by Jesper Dangaard Brouer

The IP_ADVANCED_ROUTER Kconfig help text describes the rp_filter proc option. Recent changes added a loose mode. Instead of documenting this change in two places, refer to the document that describes it: Documentation/networking/ip-sysctl.txt. I'm considering moving the rp_filter description out of the Kconfig file into ip-sysctl.txt.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Feb 2009, 7 commits
-
-
Committed by Eric W. Biederman

To remove the possibility of packets flying around while network devices are being cleaned up, use register_pernet_subsys instead of register_pernet_device.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric W. Biederman

Recently I had a kernel panic in icmp_send during a network namespace cleanup. There were packets in the arp queue that failed to be sent, and we attempted to generate an ICMP host unreachable message, but failed because icmp_sk_exit had already been called. The network devices are removed from a network namespace and their arp queues are flushed before we attempt to shut down the subsystems, so this error should have been impossible.

It turns out icmp_init was using register_pernet_device instead of register_pernet_subsys, which resulted in icmp being shut down while we still had the possibility of packets in flight, making a nasty NULL pointer dereference in interrupt context possible. Changing this to register_pernet_subsys fixes the problem in my testing.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
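The change itself is one registration call; a minimal sketch of what the swap in net/ipv4/icmp.c presumably looks like (the pernet_operations wiring shown here is illustrative context, only the function swap is taken from the message):

    #include <net/net_namespace.h>

    static struct pernet_operations icmp_sk_ops = {
            .init = icmp_sk_init,
            .exit = icmp_sk_exit,
    };

    int __init icmp_init(void)
    {
            /* was: return register_pernet_device(&icmp_sk_ops);
             * A pernet subsystem is torn down only after all pernet devices,
             * so ICMP stays usable while devices are being cleaned up. */
            return register_pernet_subsys(&icmp_sk_ops);
    }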
-
Committed by Jesper Dangaard Brouer

While going through net/ipv4/Kconfig, clean up whitespace.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Jesper Dangaard Brouer

The reverse path filter (rp_filter) will NOT get enabled when forwarding is enabled. I have read the code and tested it in practice. Most distributions do enable it in their startup scripts.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Stephen Hemminger

Get rid of a compile warning about a non-const format string.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Stephen Hemminger

Extend the existing reverse path filter option to allow either strict or loose filtering (see http://en.wikipedia.org/wiki/Reverse_path_filtering). For compatibility with existing usage, the value 1 is chosen for strict mode and 2 for loose mode.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
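The strict/loose distinction comes down to what the reverse route lookup must return; a hedged sketch of the decision (not the exact fib_validate_source() code; all identifiers here are illustrative):

    #include <linux/netdevice.h>

    /* rpf is the effective rp_filter value for the inbound device:
     * 0 = off, 1 = strict, 2 = loose. route_back_dev is the output device
     * of the best route back to the packet's source (NULL if no route). */
    static bool rpf_accept(int rpf, const struct net_device *in_dev,
                           const struct net_device *route_back_dev)
    {
            if (!rpf)
                    return true;                    /* filtering disabled */
            if (!route_back_dev)
                    return false;                   /* no return route at all */
            if (rpf == 2)
                    return true;                    /* loose: any return route will do */
            return route_back_dev == in_dev;        /* strict: must leave via the ingress device */
    }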
-
Committed by Paul Moore

The CIPSO protocol engine incorrectly stated that the FIPS-188 specification could be found in the kernel's Documentation directory. This patch corrects that by removing the comment and directing users to the FIPS-188 documentation hosted online. For the sake of completeness, I've also included a link to the CIPSO draft specification on the NetLabel website. Thanks to Randy Dunlap for spotting the error and letting me know.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
-
- 22 Feb 2009, 1 commit
-
-
Committed by Herbert Xu

Our TCP stack does not set the urgent flag if the urgent pointer does not fit in 16 bits, i.e., if it is more than 64K from the sequence number of a packet. This behaviour is different from the BSDs, and clearly contradicts the purpose of urgent mode, which is to send the notification (though not necessarily the associated data) as soon as possible. Our current behaviour may in fact delay the urgent notification indefinitely if the receiver window does not open up.

Simply matching BSD, however, may break legacy applications which incorrectly rely on the out-of-band delivery of urgent data, and conversely the in-band delivery of non-urgent data. Alexey Kuznetsov suggested a safe solution of following BSD only if the urgent pointer itself has not yet been transmitted. This way we guarantee that when the remote end sees the packet with non-urgent data marked as urgent due to wrap-around, we would have advanced the urgent pointer beyond, either to the actual urgent data or to an as-yet untransmitted packet.

The only potential downside is that applications on the remote end may see multiple SIGURG notifications. However, this would occur anyway with other TCP stacks. More importantly, the outcome of such a duplicate notification is likely to be harmless, since the signal itself does not carry any information other than the fact that we're in urgent mode.

Thanks to Ilpo Järvinen for fixing a critical bug in this and Jeff Chua for reporting that bug.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
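A sketch of the transmit-side logic this describes, for a segment whose header is being built (field names follow common tcp_output.c conventions; treat the exact conditions as an approximation of the in-tree code):

    /* tcb->seq is the segment's starting sequence number, tp->snd_up the
     * urgent pointer, tp->snd_nxt the next sequence to be sent. */
    if (unlikely(tcp_urg_mode(tp) && before(tcb->seq, tp->snd_up))) {
            if (before(tp->snd_up, tcb->seq + 0x10000)) {
                    /* Urgent pointer fits in 16 bits: point straight at it. */
                    th->urg_ptr = htons(tp->snd_up - tcb->seq);
                    th->urg = 1;
            } else if (after(tcb->seq + 0xFFFF, tp->snd_nxt)) {
                    /* Urgent data is further than 64K away, but the urgent
                     * pointer itself has not been transmitted yet: saturate
                     * the pointer so the peer learns about urgent mode now. */
                    th->urg_ptr = htons(0xFFFF);
                    th->urg = 1;
            }
    }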
-
- 19 Feb 2009, 1 commit
-
-
Committed by Ilpo Järvinen

This is obsolete since the passes got combined.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Feb 2009, 2 commits
-
-
Committed by Thomas Gleixner

Impact: syntax fix

Interestingly enough, this compiles without any complaints:

    orphans = percpu_counter_sum_positive(&tcp_orphan_count),
    sockets = percpu_counter_sum_positive(&tcp_sockets_allocated),

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
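It compiles because the trailing comma turns the two assignments into a single comma-operator expression, so the compiler sees one well-formed statement. The fix is presumably just to terminate each assignment with a semicolon (illustration only):

    /* One expression, glued together by the comma operator (legal C). */
    orphans = percpu_counter_sum_positive(&tcp_orphan_count),
    sockets = percpu_counter_sum_positive(&tcp_sockets_allocated);

    /* Intended: two independent statements. */
    orphans = percpu_counter_sum_positive(&tcp_orphan_count);
    sockets = percpu_counter_sum_positive(&tcp_sockets_allocated);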
-
Committed by Patrick Ohly

Instructions for time stamping outgoing packets are taken from the socket layer and later copied into the new skb.

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Feb 2009, 2 commits
-
-
Committed by Herbert Xu

gro: Optimise TCP packet reception

As this function can be called more than half a million times per second for 10GbE, it's important to optimise it as much as we can. This patch replaces logical ops with bit ops and open-codes memcmp to exploit alignment properties.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu

As this function can be called more than half a million times per second for 10GbE, it's important to optimise it as much as we can. This patch makes some obvious changes to use 2-byte and 4-byte operations instead of byte-oriented ones where possible. Bit ops are also used in place of logical ops to reduce branching.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
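As an illustration of the kind of transformation these two GRO patches describe (a generic sketch, not the actual inet_gro_receive()/tcp_gro_receive() code):

    /* Branchy comparison: each mismatch takes a conditional jump. */
    if (iph->saddr != iph2->saddr || iph->daddr != iph2->daddr)
            flush = 1;

    /* Branch-free equivalent on 32-bit words: any differing bit survives
     * the XOR and is OR-ed into the flush accumulator. */
    flush |= (__force u32)(iph->saddr ^ iph2->saddr) |
             (__force u32)(iph->daddr ^ iph2->daddr);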
-
- 07 Feb 2009, 1 commit
-
-
Committed by Ilpo Järvinen

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 06 Feb 2009, 3 commits
-
-
Committed by Jesper Dangaard Brouer

Like the UDP header fix, pskb_may_pull() can potentially alter the SKB buffer, so the saddr and daddr pointers may point into the old skb->data buffer. I haven't seen corruptions, as they only occur if the old skb->data buffer is reallocated by another user and written to very quickly (or poisoned by SLAB debugging).

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by David S. Miller

This reverts commit 64ff3b93. Jeff Chua reports that it breaks rlogin for him.

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Jesper Dangaard Brouer

The UDP header pointer assignment must happen after calling pskb_may_pull(), as pskb_may_pull() can potentially alter the SKB buffer. This was exposed by running multicast traffic through the NIU driver, as it won't pre-pull the protocol headers into the linear area on receive.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
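A minimal sketch of the ordering issue both of these pskb_may_pull() fixes address (hypothetical receive-path excerpt, not the exact in-tree code):

    /* Wrong: uh is derived from skb->data before pskb_may_pull(), which may
     * reallocate the buffer and leave uh pointing into freed memory. */
    struct udphdr *uh = udp_hdr(skb);
    if (!pskb_may_pull(skb, sizeof(struct udphdr)))
            goto drop;

    /* Right: take the header pointer only after the pull has succeeded. */
    if (!pskb_may_pull(skb, sizeof(struct udphdr)))
            goto drop;
    uh = udp_hdr(skb);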
-
- 03 Feb 2009, 1 commit
-
-
Committed by Eric Dumazet

Commit 93821778 (udp: Fix rcv socket locking) accidentally removed the sk_drops increments for UDP IPv4 sockets. This field can be used to detect incorrect sizing of socket receive buffers.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
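A sketch of the accounting being restored (placement in the UDP receive path is illustrative):

    /* When the datagram cannot be queued (e.g. the receive buffer is full),
     * record the drop so userspace can spot undersized buffers via sk_drops. */
    if (rc < 0)
            atomic_inc(&sk->sk_drops);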
-
- 02 Feb 2009, 2 commits
-
-
Committed by Herbert Xu

sk_alloc now sets sk_family, so this is redundant. In fact it caught my eye because sock_init_data already uses sk_family, so this assignment is too late anyway.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet

Also switch bsockets to atomic_t, since it might be changed in parallel.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Feb 2009, 3 commits
-
-
Committed by Stephen Hemminger

From: Stephen Hemminger <shemminger@vyatta.com>

Fix a regression introduced by a9d8f911 ("inet: Allowing more than 64k connections and heavily optimize bind(0) time."). Based upon initial patches and feedback from Evgeniy Polyakov and Eric Dumazet.

From Eric Dumazet:
--------------------
Also there might be a problem at line 175:

    if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
            spin_unlock(&head->lock);
            goto again;

If we entered inet_csk_get_port() with a non-null snum, we can "goto again" while it was not expected.
--------------------

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Stephen Hemminger

This adds another inet device option to enable gratuitous ARP when a device is brought up or an address is changed. This is handy for clusters or virtualization.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
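For context, a gratuitous ARP announcement is a broadcast ARP request whose sender and target IP are both the local address; a hedged sketch of firing one on address change (arp_send() is an existing helper; the option-check macro name is my guess, not taken from the patch):

    /* Announce 'addr' on 'dev' when the new per-device option is enabled:
     * sender IP == target IP == addr, and a NULL dest_hw means broadcast. */
    if (IN_DEV_ARP_NOTIFY(in_dev))
            arp_send(ARPOP_REQUEST, ETH_P_ARP, addr, dev, addr, NULL,
                     dev->dev_addr, NULL);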
-
Committed by Harvey Harrison

Base versions handle constant folding now.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
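Assuming this is the usual __constant_* byteorder cleanup, the point is that the plain helpers now fold constants at compile time, so the explicit variants are redundant (illustrative example):

    /* Before: spell out the constant-folding variant. */
    skb->protocol = __constant_htons(ETH_P_IP);

    /* After: htons() on a compile-time constant folds just as well. */
    skb->protocol = htons(ETH_P_IP);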
-
- 30 Jan 2009, 2 commits
-
-
Committed by Herbert Xu

Unfortunately simplicity isn't always the best. The fraginfo interface turned out to be suboptimal. The problem was quite obvious: for every packet, we have to copy the headers from the frags structure into skb->head, even though for 99% of the packets this part is immediately thrown away after the merge. LRO didn't have this problem because it directly read the headers from the frags structure.

This patch attempts to address this by creating an interface that allows GRO to access the headers in the first frag without having to copy them. Because all drivers that use frags place the headers in the first frag, this optimisation should be enough.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Zores

Signed-off-by: Benjamin Zores <benjamin.zores@alcatel-lucent.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 Jan 2009, 3 commits
-
-
Committed by Dimitris Michailidis

tcp_splice_data_recv has two lengths to consider: the len parameter it gets from tcp_read_sock, which specifies the amount of data in the skb, and rd_desc->count, which is the amount of data the splice caller still wants. Currently it passes just the latter to skb_splice_bits, which then splices min(rd_desc->count, skb->len - offset) bytes.

Most of the time this is fine, except when the skb contains urgent data. In that case len goes only up to the urgent byte and is less than skb->len - offset. By ignoring len, tcp_splice_data_recv may a) splice data tcp_read_sock told it not to, and b) return to tcp_read_sock a value > len. Now, tcp_read_sock doesn't handle used > len and leaves the socket in a bad state (both sk_receive_queue and copied_seq are bad at that point), resulting in duplicated data and corruption.

Fix by passing min(rd_desc->count, len) to skb_splice_bits.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
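A sketch of the callback with the fix applied (abridged from the description above; the tcp_splice_state handling is simplified):

    static int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
                                    unsigned int offset, size_t len)
    {
            struct tcp_splice_state *tss = rd_desc->arg.data;
            int ret;

            /* Splice no more than the caller still wants (rd_desc->count) and
             * no more than tcp_read_sock says is usable in this skb (len). */
            ret = skb_splice_bits(skb, offset, tss->pipe,
                                  min(rd_desc->count, len), tss->flags);
            if (ret > 0)
                    rd_desc->count -= ret;
            return ret;
    }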
-
Committed by Eric Dumazet

Commit 9088c560 (udp: Improve port randomization) introduced a regression for the UDP bind() syscall with a null port (getting a random port) in case a lot of ports are already in use. This is because we do about 28000 scans of very long chains (220 sockets per chain), with many spin_lock_bh()/spin_unlock_bh() calls.

Fix this using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE) so that we scan each chain at most once. Instead of 250 ms per bind() call, we get a time of 2.9 ms after the patch.

Based on a report from Vitaly Mayatskikh.

Reported-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
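The core idea as a simplified sketch (identifiers, struct layout and locking are illustrative, not the exact udp_lib_get_port() code): ports that hash to the same chain differ by multiples of UDP_HTABLE_SIZE, so one pass over the chain can mark every taken candidate in a small bitmap, and port selection then tests bits instead of re-walking the chain.

    DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);

    bitmap_zero(bitmap, PORTS_PER_CHAIN);
    spin_lock_bh(&hslot->lock);
    sk_for_each(sk2, node, &hslot->head)            /* single walk of the chain */
            __set_bit(inet_sk(sk2)->num / UDP_HTABLE_SIZE, bitmap);

    for (port = first; port <= last; port += UDP_HTABLE_SIZE)
            if (!test_bit(port / UDP_HTABLE_SIZE, bitmap))
                    goto found;                     /* free port, no more chain walks */
    spin_unlock_bh(&hslot->lock);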
-
Committed by Timo Teras

Instead of keeping a candidate tunnel device from every category, keep only the single candidate with the best score. This optimizes stack usage and speeds up the exit code.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Jan 2009, 9 commits
-
-
Committed by Benjamin Thery

This last patch makes the appropriate changes to use and propagate the network namespace where needed in the IPv4 multicast routing code. This consists mainly of replacing all the remaining init_net occurrences with the current netns pointer retrieved from sockets, net devices or mfc_caches, depending on the routines' contexts. Some routines receive a new 'struct net' parameter to propagate the current netns:

* vif_add/vif_delete
* ipmr_new_tunnel
* mroute_clean_tables
* ipmr_cache_find
* ipmr_cache_report
* ipmr_cache_unresolved
* ipmr_mfc_add/ipmr_mfc_delete
* ipmr_get_route
* rt_fill_info (in route.c)

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

Declare the IPv4 multicast forwarding /proc/net entries per-namespace:

    /proc/net/ip_mr_vif
    /proc/net/ip_mr_cache

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Declare the variable 'reg_vif_num' per-namespace: move it into struct netns_ipv4. At the moment, this variable is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
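The commits in this series all follow the same pattern; a hedged sketch of what moving such a global into struct netns_ipv4 looks like (field names are taken from the commit messages, their exact types and placement are illustrative):

    /* include/net/netns/ipv4.h (sketch) */
    struct netns_ipv4 {
            /* ... existing fields ... */
            struct sock     *mroute_sk;     /* was the global mroute_socket */
            int             mroute_do_assert;
            int             mroute_do_pim;
            int             reg_vif_num;
    };

    /* net/ipv4/ipmr.c (sketch): accesses go through the owning namespace. */
    net->ipv4.reg_vif_num = vifi;           /* was: reg_vif_num = vifi; */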
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Declare the IPv4 multicast routing variables 'mroute_do_assert' and 'mroute_do_pim' per-namespace in struct netns_ipv4. At the moment, these variables are only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Declare the variable cache_resolve_queue_len per-namespace: move it into struct netns_ipv4. This variable counts the number of unresolved cache entries queued in the list mfc_unres_queue. This list is kept global to all netns, as the number of entries per namespace is limited to 10 (hardcoded in the routine ipmr_cache_unresolved). Entries belonging to different namespaces in mfc_unres_queue will be identified by matching the mfc_net member introduced previously in struct mfc_cache.

Keeping this list global to all netns also allows us to keep a single timer (ipmr_expire_timer) to handle their expiration. In some places the cache_resolve_queue_len value was tested to decide whether to arm or delete the timer. These tests were equivalent to testing mfc_unres_queue instead, and are replaced in this patch.

At the moment, cache_resolve_queue_len is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Dynamically allocate the IPv4 multicast forwarding cache, mfc_cache_array, and move it to struct netns_ipv4. At the moment, mfc_cache_array is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

This patch stores in struct mfc_cache the network namespace each mfc_cache belongs to. The new member is mfc_net.

mfc_net is assigned at cache allocation and doesn't change during the rest of the cache entry's life. A new net parameter is added to ipmr_cache_alloc/ipmr_cache_alloc_unres. This will help to retrieve the current netns around the IPv4 multicast routing code. At the moment, all mfc_cache entries are allocated in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
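A hedged sketch of the struct change and of the allocation helper gaining a net parameter (member layout and error handling abridged; how the namespace reference is recorded is an assumption):

    /* include/linux/mroute.h (sketch) */
    struct mfc_cache {
            struct mfc_cache *next;
            struct net      *mfc_net;       /* namespace this entry belongs to */
            __be32          mfc_origin;
            __be32          mfc_mcastgrp;
            /* ... */
    };

    /* net/ipv4/ipmr.c (sketch) */
    static struct mfc_cache *ipmr_cache_alloc(struct net *net)
    {
            struct mfc_cache *c = kmem_cache_zalloc(mrt_cachep, GFP_KERNEL);

            if (c)
                    c->mfc_net = net;       /* the real helper may also hold a reference */
            return c;
    }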
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Dynamically allocate the interface table vif_table and move it to struct netns_ipv4, and update the VIF_EXISTS() macro. At the moment, vif_table is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Make the IPv4 multicast routing mroute_socket per-namespace: move it into struct netns_ipv4. At the moment, mroute_socket is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 Jan 2009, 2 commits
-
-
Committed by Timo Teras

Check the device on the receive path and allow otherwise identical tunnel devices as long as the physical device differs. This is useful for NBMA tunnels, where you want to use a different GRE IP for each public IP available via different physical devices.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Evgeniy Polyakov

With a simple extension to the binding mechanism, which allows binding more than 64k sockets (or a smaller amount, depending on sysctl parameters), we have to traverse the whole bind hash table to find an empty bucket. While that is not a problem for, say, 32k connections, bind() completion time grows exponentially (since after each successful binding we have to traverse one more bucket to find an empty one), even if we start each time from a random offset inside the hash table. So, when the hash table is full and we want to add another socket, we have to traverse the whole table no matter what; effectively this is the worst-case performance, and it is constant.

The attached picture shows bind() time depending on the number of already-bound sockets.

The green area corresponds to the usual binding-to-port-zero process, which turns on the kernel port selection described above. The red area is the bind process when the number of reuse-bound sockets is not limited by 64k (or the sysctl parameters). The same exponential growth (hidden by the green area) occurs before the number of ports reaches the sysctl limit. At this point the bind hash table has exactly one reuse-enabled socket per bucket, but it is possible that they have different addresses. The kernel actually selects the first port to try randomly, so at the beginning bind will take roughly constant time, but over time the number of ports to check after the random start will increase. That growth is exponential, but because of the random selection above, not every next port selection necessarily takes longer than the previous one. So we have to consider the lower area of the graph (if you could zoom in, you would find many different times placed there), since one area can hide another.

The blue area corresponds to the port selection optimization. This is a rather simple design approach: the hash table now maintains an (imprecise and racily updated) count of currently bound sockets, and when the number of such sockets becomes greater than a predefined value (I use the maximum port range defined by the sysctls), we stop traversing the whole bind hash table and just stop at the first matching bucket after the random start. The limit above roughly corresponds to the case when the bind hash table is full and we have turned on the mechanism allowing more reuse-enabled sockets to be bound, so it does not change the behaviour of other sockets.

Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
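A hedged sketch of the optimization's core check inside the randomized port-selection loop of inet_csk_get_port() (surrounding logic abridged; treat field names as approximations of the description above):

    /* hashinfo->bsockets approximately counts currently bound sockets. */
    if (tb->fastreuse > 0 && sk->sk_reuse && sk->sk_state != TCP_LISTEN) {
            smallest_rover = rover;         /* remember a reusable bucket */
            if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) {
                    /* More bound sockets than ports in the range: the table
                     * is effectively full, so stop scanning and reuse the
                     * bucket found after the random starting point. */
                    snum = smallest_rover;
                    goto have_snum;
            }
    }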
-