- 09 11月, 2009 5 次提交
-
-
由 Eric Dumazet 提交于
UDP multicast rx path is a bit complex and can hold a spinlock for a long time. Using a small (32 or 64 entries) stack of socket pointers can help to perform expensive operations (skb_clone(), udp_queue_rcv_skb()) outside of the lock, in most cases. It's also a base for a future RCU conversion of multicast recption. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NLucian Adrian Grijincu <lgrijincu@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We first locate the (local port) hash chain head If few sockets are in this chain, we proceed with previous lookup algo. If too many sockets are listed, we take a look at the secondary (port, address) hash chain we added in previous patch. We choose the shortest chain and proceed with a RCU lookup on the elected chain. But, if we chose (port, address) chain, and fail to find a socket on given address, we must try another lookup on (port, INADDR_ANY) chain to find socket not bound to a particular IP. -> No extra cost for typical setups, where the first lookup will probabbly be performed. RCU lookups everywhere, we dont acquire spinlock. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Extends udp_table to contain a secondary hash table. socket anchor for this second hash is free, because UDP doesnt use skc_bind_node : We define an union to hold both skc_bind_node & a new hlist_nulls_node udp_portaddr_node udp_lib_get_port() inserts sockets into second hash chain (additional cost of one atomic op) udp_lib_unhash() deletes socket from second hash chain (additional cost of one atomic op) Note : No spinlock lockdep annotation is needed, because lock for the secondary hash chain is always get after lock for primary hash chain. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Union sk_hash with two u16 hashes for udp (no extra memory taken) One 16 bits hash on (local port) value (the previous udp 'hash') One 16 bits hash on (local address, local port) values, initialized but not yet used. This second hash is using jenkin hash for better distribution. Because the 'port' is xored later, a partial hash is performed on local address + net_hash_mix(net) Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Adds a counter in udp_hslot to keep an accurate count of sockets present in chain. This will permit to upcoming UDP lookup algo to chose the shortest chain when secondary hash is added. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 11月, 2009 1 次提交
-
-
由 Eric W. Biederman 提交于
There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
-
- 06 11月, 2009 5 次提交
-
-
由 Jozsef Kadlecsik 提交于
Vitezslav Samel discovered that since 2.6.30.4+ active FTP can not work over NAT. The "cause" of the problem was a fix of unacknowledged data detection with NAT (commit a3a9f79e). However, actually, that fix uncovered a long standing bug in TCP conntrack: when NAT was enabled, we simply updated the max of the right edge of the segments we have seen (td_end), by the offset NAT produced with changing IP/port in the data. However, we did not update the other parameter (td_maxend) which is affected by the NAT offset. Thus that could drift away from the correct value and thus resulted breaking active FTP. The patch below fixes the issue by *not* updating the conntrack parameters from NAT, but instead taking into account the NAT offsets in conntrack in a consistent way. (Updating from NAT would be more harder and expensive because it'd need to re-calculate parameters we already calculated in conntrack.) Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When sending fragmentation expiration ICMP V4/V6 messages, we can avoid touching device refcount, thanks to RCU Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Paris 提交于
Before calling capable(CAP_NET_RAW) check if this operations is on behalf of the kernel or on behalf of userspace. Do not do the security check if it is on behalf of the kernel. Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Paris 提交于
The generic __sock_create function has a kern argument which allows the security system to make decisions based on if a socket is being created by the kernel or by userspace. This patch passes that flag to the net_proto_family specific create function, so it can do the same thing. Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Paris 提交于
struct can_proto had a capability field which wasn't ever used. It is dropped entirely. struct inet_protosw had a capability field which can be more clearly expressed in the code by just checking if sock->type = SOCK_RAW. Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 11月, 2009 3 次提交
-
-
由 Gilad Ben-Yossef 提交于
Trying to parse the option of a SYN packet that we have no route entry for should just use global wide defaults for route entry options. Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com> Tested-by: Valdis.Kletnieks@vt.edu Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gilad Ben-Yossef 提交于
Calling IPv4 specific inet_csk_route_req in tcp_check_req is a bad idea and crashes machine on IPv6 connections, as reported by Valdis Kletnieks Also, all we are really interested in is the timestamp option in the header, so calling tcp_parse_options() with the "estab" set to false flag is an overkill as it tries to parse half a dozen other TCP options. We know whether timestamp should be enabled or not using data from request_sock. Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com> Tested-by: Valdis.Kletnieks@vt.edu Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
As pointed by Stephen Rothwell, commit c6d14c84 added a warning : net/ipv4/devinet.c: In function 'inet_select_addr': net/ipv4/devinet.c:902: warning: label 'out' defined but not used delete unused 'out' label and do some cleanups as well Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 11月, 2009 1 次提交
-
-
由 Eric Dumazet 提交于
Adds RCU management to the list of netdevices. Convert some for_each_netdev() users to RCU version, if it can avoid read_lock-ing dev_base_lock Ie: read_lock(&dev_base_loack); for_each_netdev(net, dev) some_action(); read_unlock(&dev_base_lock); becomes : rcu_read_lock(); for_each_netdev_rcu(net, dev) some_action(); rcu_read_unlock(); Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 11月, 2009 2 次提交
-
-
由 Eric Dumazet 提交于
We can avoid touching device refcount in icmp_send(), using dev_get_by_index_rcu() Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Use dev_get_by_index_rcu() instead of __dev_get_by_index() and dev_base_lock rwlock Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 10月, 2009 2 次提交
-
-
由 Herbert Xu 提交于
Nathan Neulinger noticed that gretap devices get their MAC address from the local IP address, which results in invalid MAC addresses half of the time. This is because gretap is still using the tunnel netdev ops rather than the correct tap netdev ops struct. This patch also fixes changelink to not clobber the MAC address for the gretap case. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NStephen Hemminger <shemminger@vyatta.com> Tested-by: NNathan Neulinger <nneul@mst.edu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
On UDP sockets, we must call skb_free_datagram() with socket locked, or risk sk_forward_alloc corruption. This requirement is not respected in SUNRPC. Add a convenient helper, skb_free_datagram_locked() and use it in SUNRPC Reported-by: NFrancis Moreau <francis.moro@gmail.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 10月, 2009 1 次提交
-
-
由 jamal 提交于
Policy routing is not looked up by mark on reverse path filtering. This fixes it. Signed-off-by: NJamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 10月, 2009 10 次提交
-
-
由 Cyrill Gorcunov 提交于
proto_ops->getname implies copying protocol specific data into storage unit (particulary to __kernel_sockaddr_storage). So when we implement new protocol support we should keep such a detail in mind (which is easy to forget about). Lets introduce DECLARE_SOCKADDR helper which check if storage unit is not overfowed at build time. Eventually inet_getname is switched to use DECLARE_SOCKADDR (to show example of usage). Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 roel kluin 提交于
optlen is unsigned so the `< 0' test is never true. Signed-off-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gilad Ben-Yossef 提交于
Add and use no DSCAK bit in the features field. Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: NOri Finkelman <ori@comsleep.com> Sigend-off-by: NYony Amit <yony@comsleep.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gilad Ben-Yossef 提交于
Add and use no window scale bit in the features field. Note that this is not the same as setting a window scale of 0 as would happen with window limit on route. Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: NOri Finkelman <ori@comsleep.com> Sigend-off-by: NYony Amit <yony@comsleep.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gilad Ben-Yossef 提交于
Implement querying and acting upon the no timestamp bit in the feature field. Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: NOri Finkelman <ori@comsleep.com> Sigend-off-by: NYony Amit <yony@comsleep.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gilad Ben-Yossef 提交于
Implement querying and acting upon the no sack bit in the features field. Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: NOri Finkelman <ori@comsleep.com> Sigend-off-by: NYony Amit <yony@comsleep.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gilad Ben-Yossef 提交于
We need tcp_parse_options to be aware of dst_entry to take into account per dst_entry TCP options settings Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: NOri Finkelman <ori@comsleep.com> Sigend-off-by: NYony Amit <yony@comsleep.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gilad Ben-Yossef 提交于
Since we only use tcp_parse_options here to check for the exietence of TCP timestamp option in the header, it is better to call with the "established" flag on. Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com> Signed-off-by: NOri Finkelman <ori@comsleep.com> Signed-off-by: NYony Amit <yony@comsleep.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neil Horman 提交于
Augment raw_send_hdrinc to correct for incorrect ip header length values A series of oopses was reported to me recently. Apparently when using AF_RAW sockets to send data to peers that were reachable via ipsec encapsulation, people could panic or BUG halt their systems. I've tracked the problem down to user space sending an invalid ip header over an AF_RAW socket with IP_HDRINCL set to 1. Basically what happens is that userspace sends down an ip frame that includes only the header (no data), but sets the ip header ihl value to a large number, one that is larger than the total amount of data passed to the sendmsg call. In raw_send_hdrincl, we allocate an skb based on the size of the data in the msghdr that was passed in, but assume the data is all valid. Later during ipsec encapsulation, xfrm4_tranport_output moves the entire frame back in the skbuff to provide headroom for the ipsec headers. During this operation, the skb->transport_header is repointed to a spot computed by skb->network_header + the ip header length (ihl). Since so little data was passed in relative to the value of ihl provided by the raw socket, we point transport header to an unknown location, resulting in various crashes. This fix for this is pretty straightforward, simply validate the value of of iph->ihl when sending over a raw socket. If (iph->ihl*4U) > user data buffer size, drop the frame and return -EINVAL. I just confirmed this fixes the reported crashes. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 10月, 2009 3 次提交
-
-
由 Andreas Petlund 提交于
Corrected a spelling error in a function name. Signed-off-by: NAndreas Petlund <apetlund@simula.no> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Speedup module unloading by factorizing synchronize_rcu() calls Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 10月, 2009 2 次提交
-
-
由 Eric Dumazet 提交于
GRE tunnels use one rwlock to protect their hash tables. This locking scheme can be converted to RCU for free, since netdevice already must wait for a RCU grace period at dismantle time. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
IPIP tunnels use one rwlock to protect their hash tables. This locking scheme can be converted to RCU for free, since netdevice already must wait for a RCU grace period at dismantle time. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 10月, 2009 1 次提交
-
-
由 Arjan van de Ven 提交于
Commit b6b39e8f (tcp: Try to catch MSG_PEEK bug) added a printk() to the WARN_ON() that's in tcp.c. This patch changes this combination to WARN(); the advantage of WARN() is that the printk message shows up inside the message, so that kerneloops.org will collect the message. In addition, this gets rid of an extra if() statement. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 10月, 2009 1 次提交
-
-
由 Krishna Kumar 提交于
dst_negative_advice() should check for changed dst and reset sk_tx_queue_mapping accordingly. Pass sock to the callers of dst_negative_advice. (sk_reset_txq is defined just for use by dst_negative_advice. The only way I could find to get around this is to move dst_negative_() from dst.h to dst.c, include sock.h in dst.c, etc) Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 10月, 2009 3 次提交
-
-
由 Herbert Xu 提交于
This patch tries to print out more information when we hit the MSG_PEEK bug in tcp_recvmsg. It's been around since at least 2005 and it's about time that we finally fix it. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Dykstra 提交于
Use symbols instead of magic constants while checking PMTU discovery setsockopt. Remove redundant test in ip_rt_frag_needed() (done by caller). Signed-off-by: NJohn Dykstra <john.dykstra1@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls. This function should be called only with RTNL or dev_base_lock held, or reader could see a corrupt hash chain and eventually enter an endless loop. Fix is to call dev_get_by_index()/dev_put(). If this happens to be performance critical, we could define a new dev_exist_by_index() function to avoid touching dev refcount. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-