- 04 10月, 2010 4 次提交
-
-
由 Eric Dumazet 提交于
Use RCU & RTNL protection for mfc_cache_array[] ipmr_cache_find() is called under rcu_read_lock(); Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Use RCU and RTNL to protect (struct mr_table)->mroute_sk Readers use RCU, writers use RTNL. ip_ra_control() already use an RCU grace period before ip_ra_destroy_rcu(), so we dont need synchronize_rcu() in mrtsock_destruct() Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
No need to get a reference on reg_dev and release it, we are in a rcu_read_lock() protected section. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
This table is only used in gre.c Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 10月, 2010 3 次提交
-
-
由 Eric Dumazet 提交于
ip_route_output_slow() is enclosed in an rcu_read_lock() protected section, so that no references are taken/released on device, thanks to __ip_dev_find() & dev_get_by_index_rcu() Tested with ip route cache disabled, and a stress test : Before patch: elapsed time : real 1m38.347s user 0m11.909s sys 23m51.501s Profile: 13788.00 22.7% ip_route_output_slow [kernel] 7875.00 13.0% dst_destroy [kernel] 3925.00 6.5% fib_semantic_match [kernel] 3144.00 5.2% fib_rules_lookup [kernel] 3061.00 5.0% dst_alloc [kernel] 2276.00 3.7% rt_set_nexthop [kernel] 1762.00 2.9% fib_table_lookup [kernel] 1538.00 2.5% _raw_read_lock [kernel] 1358.00 2.2% ip_output [kernel] After patch: real 1m28.808s user 0m13.245s sys 20m37.293s 10950.00 17.2% ip_route_output_slow [kernel] 10726.00 16.9% dst_destroy [kernel] 5170.00 8.1% fib_semantic_match [kernel] 3937.00 6.2% dst_alloc [kernel] 3635.00 5.7% rt_set_nexthop [kernel] 2900.00 4.6% fib_rules_lookup [kernel] 2240.00 3.5% fib_table_lookup [kernel] 1427.00 2.2% _raw_read_lock [kernel] 1157.00 1.8% kmem_cache_alloc [kernel] Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
ip_dev_find(net, addr) finds a device given an IPv4 source address and takes a reference on it. Introduce __ip_dev_find(), taking a third argument, to optionally take the device reference. Callers not asking the reference to be taken should be in an rcu_read_lock() protected section. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
While doing stress tests with a disabled IP route cache, I found __mkroute_output() was touching three times in_device atomic refcount. Use RCU to touch it once to reduce cache line ping pongs. Before patch time to perform the test real 1m42.009s user 0m12.545s sys 25m0.726s Profile : 16109.00 26.4% ip_route_output_slow vmlinux 7434.00 12.2% dst_destroy vmlinux 3280.00 5.4% fib_rules_lookup vmlinux 3252.00 5.3% fib_semantic_match vmlinux 2622.00 4.3% fib_table_lookup vmlinux 2535.00 4.1% dst_alloc vmlinux 1750.00 2.9% _raw_read_lock vmlinux 1532.00 2.5% rt_set_nexthop vmlinux After patch real 1m36.503s user 0m12.977s sys 23m25.608s 14234.00 22.4% ip_route_output_slow vmlinux 8717.00 13.7% dst_destroy vmlinux 4052.00 6.4% fib_rules_lookup vmlinux 3951.00 6.2% fib_semantic_match vmlinux 3191.00 5.0% dst_alloc vmlinux 1764.00 2.8% fib_table_lookup vmlinux 1692.00 2.7% _raw_read_lock vmlinux 1605.00 2.5% rt_set_nexthop vmlinux Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 9月, 2010 6 次提交
-
-
由 Eric Dumazet 提交于
HARD_TX_LOCK no longer protects tunnels from dead loops, but xmit_recursion percpu counter. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
Function only used in tcp_input.c Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
arp_broken_ops is only used in arp.c Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
IPIP tunnels can benefit from lockless xmits, using NETIF_F_LLTX Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending 10000000 UDP frames via one ipip tunnel (size:200 bytes per frame) Before patch : real 2m53.321s user 0m10.277s sys 46m0.597s After patch: real 0m32.063s user 0m9.237s sys 8m16.255s Last problem to solve is the contention on dst : 16118.00 28.3% __ip_route_output_key vmlinux 6135.00 10.8% dst_release vmlinux 3220.00 5.6% ip_finish_output vmlinux 2149.00 3.8% ip_route_output_flow vmlinux 1575.00 2.8% ip_append_data vmlinux 1481.00 2.6% ip_push_pending_frames vmlinux 1349.00 2.4% __xfrm_lookup vmlinux 1216.00 2.1% csum_partial_copy_generic vmlinux 1208.00 2.1% udp_sendmsg vmlinux Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX Note: If tunnels are created with the "oseq" option, LLTX is not enabled : Even using an atomic_t o_seq, we would increase chance for packets being out of order at receiver. Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending 10000000 UDP frames via one gre tunnel (size:200 bytes per frame) Before patch : real 3m0.094s user 0m9.365s sys 47m50.103s After patch: real 0m29.756s user 0m11.097s sys 7m33.012s Last problem to solve is the contention on dst : 38660.00 21.4% __ip_route_output_key vmlinux 20786.00 11.5% dst_release vmlinux 14191.00 7.8% __xfrm_lookup vmlinux 12410.00 6.9% ip_finish_output vmlinux 4540.00 2.5% ip_push_pending_frames vmlinux 4427.00 2.4% ip_append_data vmlinux 4265.00 2.4% __alloc_skb vmlinux 4140.00 2.3% __ip_local_out vmlinux 3991.00 2.2% dev_queue_xmit vmlinux Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
commit 3c97af99 (ipip: percpu stats accounting) forgot the fallback tunnel case (tunl0), and can crash pretty fast. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 9月, 2010 1 次提交
-
-
由 Tom Herbert 提交于
This patch allows a host to be configured to respond to any address in a specified range as if it were local, without actually needing to configure the address on an interface. This is done through routing table configuration. For instance, to configure a host to respond to any address in 10.1/16 received on eth0 as a local address we can do: ip rule add from all iif eth0 lookup 200 ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200 This host is now reachable by any 10.1/16 address (route lookup on input for packets received on eth0 can find the route). On output, the rule will not be matched so that this host can still send packets to 10.1/16 (not sent on loopback). Presumably, external routing can be configured to make sense out of this. To make this work, we needed to modify the logic in finding the interface which is assigned a given source address for output (dev_ip_find). We perform a normal fib_lookup instead of just a lookup on the local table, and in the lookup we ignore the input interface for matching. This patch is useful to implement IP-anycast for subnets of virtual addresses. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 9月, 2010 2 次提交
-
-
由 Eric Dumazet 提交于
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets. Other seldom used fields are kept in netdev->stats structure, possibly unsafe. This is a preliminary work to support lockless transmit path, and correct RX stats, that are already unsafe. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Le lundi 27 septembre 2010 à 14:29 +0100, Ben Hutchings a écrit : > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > > index 5d6ddcb..de39b22 100644 > > --- a/net/ipv4/ip_gre.c > > +++ b/net/ipv4/ip_gre.c > [...] > > @@ -377,7 +405,7 @@ static struct ip_tunnel *ipgre_tunnel_locate(struct net *net, > > if (parms->name[0]) > > strlcpy(name, parms->name, IFNAMSIZ); > > else > > - sprintf(name, "gre%%d"); > > + strcpy(name, "gre%d"); > > > > dev = alloc_netdev(sizeof(*t), name, ipgre_tunnel_setup); > > if (!dev) > [...] > > This is a valid fix, but doesn't belong in this patch! > Sorry ? It was not a fix, but at most a cleanup ;) Anyway I forgot the gretap case... [PATCH 2/4 v2] ip_gre: percpu stats accounting Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets. Other seldom used fields are kept in netdev->stats structure, possibly unsafe. This is a preliminary work to support lockless transmit path, and correct RX stats, that are already unsafe. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 9月, 2010 1 次提交
-
-
由 Eric Dumazet 提交于
__in_dev_get_rtnl(dev_out) is called while RTNL is not held, thus triggers a lockdep fault. At this point, we only perform a raw test of dev_out->ip_ptr being NULL, we dont need to make sure ip_ptr cant changed right after. We can use rcu_dereference_raw() for this. Reported-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 9月, 2010 1 次提交
-
-
由 Eric Dumazet 提交于
While investigating a bit, I found ip_fragment() slow path was taken because ip_append_data() provides following layout for a send(MTU + N*(MTU - 20)) syscall : - one skb with 1500 (mtu) bytes - N fragments of 1480 (mtu-20) bytes (before adding IP header) last fragment gets 17 bytes of trail data because of following bit: if (datalen == length + fraggap) alloclen += rt->dst.trailer_len; Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm... another bug ?) In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu, so we take slow path, building another skb chain. In order to avoid taking slow path, we should correct ip_append_data() to make sure last fragment has real trail space, under mtu... Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 9月, 2010 1 次提交
-
-
由 Eric Dumazet 提交于
Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 9月, 2010 4 次提交
-
-
由 Ulrich Weber 提交于
otherwise ECT(1) bit will get interpreted as RTO_ONLINK and routing will fail with XfrmOutBundleGenError. Signed-off-by: NUlrich Weber <uweber@astaro.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Olsa 提交于
we need to check proper socket type within ipv4_conntrack_defrag function before referencing the nodefrag flag. For example the tun driver receive path produces skbs with AF_UNSPEC socket type, and so current code is causing unwanted fragmented packets going out. Signed-off-by: NJiri Olsa <jolsa@redhat.com> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Fix checksum calculation in nf_nat_snmp_basic. Based on patches by Clark Wang <wtweeker@163.com> and Stephen Hemminger <shemminger@vyatta.com>. https://bugzilla.kernel.org/show_bug.cgi?id=17622Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Changli Gao 提交于
ip_route_me_harder can't create the route cache when the outdev is the same with the indev for the skbs whichout a valid protocol set. __mkroute_input functions has this check: 1998 if (skb->protocol != htons(ETH_P_IP)) { 1999 /* Not IP (i.e. ARP). Do not create route, if it is 2000 * invalid for proxy arp. DNAT routes are always valid. 2001 * 2002 * Proxy arp feature have been extended to allow, ARP 2003 * replies back to the same interface, to support 2004 * Private VLAN switch technologies. See arp.c. 2005 */ 2006 if (out_dev == in_dev && 2007 IN_DEV_PROXY_ARP_PVLAN(in_dev) == 0) { 2008 err = -EINVAL; 2009 goto cleanup; 2010 } 2011 } This patch gives the new skb a valid protocol to bypass this check. In order to make ipt_REJECT work with bridges, you also need to enable ip_forward. This patch also fixes a regression. When we used skb_copy_expand(), we didn't have this issue stated above, as the protocol was properly set. Signed-off-by: NChangli Gao <xiaosuo@gmail.com> Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 9月, 2010 1 次提交
-
-
由 Eric Dumazet 提交于
Special care should be taken when slow path is hit in ip_fragment() : When walking through frags, we transfert truesize ownership from skb to frags. Then if we hit a slow_path condition, we must undo this or risk uncharging frags->truesize twice, and in the end, having negative socket sk_wmem_alloc counter, or even freeing socket sooner than expected. Many thanks to Nick Bowler, who provided a very clean bug report and test program. Thanks to Jarek for reviewing my first patch and providing a V2 While Nick bisection pointed to commit 2b85a34e (net: No more expensive sock_hold()/sock_put() on each tx), underlying bug is older (2.6.12-rc5) A side effect is to extend work done in commit b2722b1c (ip_fragment: also adjust skb->truesize for packets not owned by a socket) to ipv6 as well. Reported-and-bisected-by: NNick Bowler <nbowler@elliptictech.com> Tested-by: NNick Bowler <nbowler@elliptictech.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 9月, 2010 4 次提交
-
-
由 Tom Marshall 提交于
If a RST comes in immediately after checking sk->sk_err, tcp_poll will return POLLIN but not POLLOUT. Fix this by checking sk->sk_err at the end of tcp_poll. Additionally, ensure the correct order of operations on SMP machines with memory barriers. Signed-off-by: NTom Marshall <tdm.code@gmail.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Thomas Egerer 提交于
The family parameter xfrm_state_find is used to find a state matching a certain policy. This value is set to the template's family (encap_family) right before xfrm_state_find is called. The family parameter is however also used to construct a temporary state in xfrm_state_find itself which is wrong for inter-family scenarios because it produces a selector for the wrong family. Since this selector is included in the xfrm_user_acquire structure, user space programs misinterpret IPv6 addresses as IPv4 and vice versa. This patch splits up the original init_tempsel function into a part that initializes the selector respectively the props and id of the temporary state, to allow for differing ip address families whithin the state. Signed-off-by: NThomas Egerer <thomas.egerer@secunet.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Under load, netif_rx() can drop incoming packets but administrators dont have a chance to spot which device needs some tuning (RPS activation for example) This patch adds rx_dropped accounting in vlans and tunnels. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
ipv6 can be a module, we should test CONFIG_IPV6 and CONFIG_IPV6_MODULE to enable ipv6 bits in ip_gre. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 9月, 2010 1 次提交
-
-
由 Bandan Das 提交于
Related dicussion here : http://lkml.org/lkml/2010/9/3/16 Introduce a function br_parse_ip_options that will audit the skb and possibly refill IP options before a packet enters the IP stack. If no options are present, the function will zero out the skb cb area so that it is not misinterpreted as options by some unsuspecting IP layer routine. If packet consistency fails, drop it. Signed-off-by: NBandan Das <bandan.das@stratus.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 9月, 2010 3 次提交
-
-
由 Eric Dumazet 提交于
dev->ip_ptr is protected by rtnl and rcu. Yet some places dont use appropriate primitives and/or locking rules. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
As RTNL is held while doing tunnels inserts and deletes, we can remove ipgre_lock spinlock. My initial RCU conversion was conservative and converted the rwlock to spinlock, with no RTNL requirement. Use appropriate rcu annotations and modern lockdep checks as well. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
As RTNL is held while doing tunnels inserts and deletes, we can remove ipip_lock spinlock. My initial RCU conversion was conservative and converted the rwlock to spinlock, with no RTNL requirement. Use appropriate rcu annotations and modern lockdep checks as well. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 9月, 2010 2 次提交
-
-
由 Michael Kerrisk 提交于
While integrating your man-pages patch for IP_NODEFRAG, I noticed that this option is settable by setsockopt(), but not gettable by getsockopt(). I suppose this is not intended. The (untested, trivial) patch below adds getsockopt() support. Signed-off-by: NMichael kerrisk <mtk.manpages@gmail.com> Acked-by: NJiri Olsa <jolsa@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Bob Arendt 提交于
After all these years, it turns out that the /proc/sys/net/ipv4/conf/*/force_igmp_version parameter isn't fully implemented. *Symptom*: When set force_igmp_version to a value of 2, the kernel should only perform multicast IGMPv2 operations (IETF rfc2236). An host-initiated Join message will be sent as a IGMPv2 Join message. But if a IGMPv3 query message is received, the host responds with a IGMPv3 join message. Per rfc3376 and rfc2236, a IGMPv2 host should treat a IGMPv3 query as a IGMPv2 query and respond with an IGMPv2 Join message. *Consequences*: This is an issue when a IGMPv3 capable switch is the querier and will only issue IGMPv3 queries (which double as IGMPv2 querys) and there's an intermediate switch that is only IGMPv2 capable. The intermediate switch processes the initial v2 Join, but fails to recognize the IGMPv3 Join responses to the Query, resulting in a dropped connection when the intermediate v2-only switch times it out. *Identifying issue in the kernel source*: The issue is in this section of code (in net/ipv4/igmp.c), which is called when an IGMP query is received (from mainline 2.6.36-rc3 gitweb): ... A IGMPv3 query has a length >= 12 and no sources. This routine will exit after line 880, setting the general query timer (random timeout between 0 and query response time). This calls igmp_gq_timer_expire(): ... .. which only sends a v3 response. So if a v3 query is received, the kernel always sends a v3 response. IGMP queries happen once every 60 sec (per vlan), so the traffic is low. A IGMPv3 query *is* a strict superset of a IGMPv2 query, so this patch properly short circuit's the v3 behaviour. One issue is that this does not address force_igmp_version=1. Then again, I've never seen any IGMPv1 multicast equipment in the wild. However there is a lot of v2-only equipment. If it's necessary to support the IGMPv1 case as well: 837 if (len == 8 || IGMP_V2_SEEN(in_dev) || IGMP_V1_SEEN(in_dev)) { Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 9月, 2010 1 次提交
-
-
由 Eric Dumazet 提交于
Use rcu_dereference_rtnl() helper Change hard coded constants in fib_flag_trans() 7 -> RTN_UNREACHABLE 8 -> RTN_PROHIBIT Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 9月, 2010 1 次提交
-
-
由 Eric Dumazet 提交于
xfrm4_tunnel_register() & xfrm6_tunnel_register() should use rcu_assign_pointer() to make sure previous writes (to handler->next) are committed to memory before chain insertion. deregister functions dont need a particular barrier. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 9月, 2010 4 次提交
-
-
由 Eric Dumazet 提交于
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: NKrzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Tested-by: NKrzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and friends. inet_protos[] & inet6_protos[] are moved to read_mostly section Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jianzhao Wang 提交于
Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error triggered by IKE for example), hence this kind of route is always temporary and so we should check if a better route exists for next packets. Bug has been introduced by commit d11a4dc1. Signed-off-by: NJianzhao Wang <jianzhao.wang@6wind.com> Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jarek Poplawski 提交于
Hi, Here is one more of these warnings and a patch below: Sep 5 23:52:33 del kernel: [46044.244833] =================================================== Sep 5 23:52:33 del kernel: [46044.269681] [ INFO: suspicious rcu_dereference_check() usage. ] Sep 5 23:52:33 del kernel: [46044.277000] --------------------------------------------------- Sep 5 23:52:33 del kernel: [46044.285185] net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection! Sep 5 23:52:33 del kernel: [46044.293627] Sep 5 23:52:33 del kernel: [46044.293632] other info that might help us debug this: Sep 5 23:52:33 del kernel: [46044.293634] Sep 5 23:52:33 del kernel: [46044.325333] Sep 5 23:52:33 del kernel: [46044.325335] rcu_scheduler_active = 1, debug_locks = 0 Sep 5 23:52:33 del kernel: [46044.348013] 1 lock held by pppd/1717: Sep 5 23:52:33 del kernel: [46044.357548] #0: (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20 Sep 5 23:52:33 del kernel: [46044.367647] Sep 5 23:52:33 del kernel: [46044.367652] stack backtrace: Sep 5 23:52:33 del kernel: [46044.387429] Pid: 1717, comm: pppd Not tainted 2.6.35.4.4a #3 Sep 5 23:52:33 del kernel: [46044.398764] Call Trace: Sep 5 23:52:33 del kernel: [46044.409596] [<c12f9aba>] ? printk+0x18/0x1e Sep 5 23:52:33 del kernel: [46044.420761] [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0 Sep 5 23:52:33 del kernel: [46044.432229] [<c12b7235>] trie_firstleaf+0x65/0x70 Sep 5 23:52:33 del kernel: [46044.443941] [<c12b74d4>] fib_table_flush+0x14/0x170 Sep 5 23:52:33 del kernel: [46044.455823] [<c1033e92>] ? local_bh_enable_ip+0x62/0xd0 Sep 5 23:52:33 del kernel: [46044.467995] [<c12fc39f>] ? _raw_spin_unlock_bh+0x2f/0x40 Sep 5 23:52:33 del kernel: [46044.480404] [<c12b24d0>] ? fib_sync_down_dev+0x120/0x180 Sep 5 23:52:33 del kernel: [46044.493025] [<c12b069d>] fib_flush+0x2d/0x60 Sep 5 23:52:33 del kernel: [46044.505796] [<c12b06f5>] fib_disable_ip+0x25/0x50 Sep 5 23:52:33 del kernel: [46044.518772] [<c12b10d3>] fib_netdev_event+0x73/0xd0 Sep 5 23:52:33 del kernel: [46044.531918] [<c1048dfd>] notifier_call_chain+0x2d/0x70 Sep 5 23:52:33 del kernel: [46044.545358] [<c1048f0a>] raw_notifier_call_chain+0x1a/0x20 Sep 5 23:52:33 del kernel: [46044.559092] [<c124f687>] call_netdevice_notifiers+0x27/0x60 Sep 5 23:52:33 del kernel: [46044.573037] [<c124faec>] __dev_notify_flags+0x5c/0x80 Sep 5 23:52:33 del kernel: [46044.586489] [<c124fb47>] dev_change_flags+0x37/0x60 Sep 5 23:52:33 del kernel: [46044.599394] [<c12a8a8d>] devinet_ioctl+0x54d/0x630 Sep 5 23:52:33 del kernel: [46044.612277] [<c12aabb7>] inet_ioctl+0x97/0xc0 Sep 5 23:52:34 del kernel: [46044.625208] [<c123f6af>] sock_ioctl+0x6f/0x270 Sep 5 23:52:34 del kernel: [46044.638046] [<c109d2b0>] ? handle_mm_fault+0x420/0x6c0 Sep 5 23:52:34 del kernel: [46044.650968] [<c123f640>] ? sock_ioctl+0x0/0x270 Sep 5 23:52:34 del kernel: [46044.663865] [<c10c3188>] vfs_ioctl+0x28/0xa0 Sep 5 23:52:34 del kernel: [46044.676556] [<c10c38fa>] do_vfs_ioctl+0x6a/0x5c0 Sep 5 23:52:34 del kernel: [46044.688989] [<c1048676>] ? up_read+0x16/0x30 Sep 5 23:52:34 del kernel: [46044.701411] [<c1021376>] ? do_page_fault+0x1d6/0x3a0 Sep 5 23:52:34 del kernel: [46044.714223] [<c10b6588>] ? fget_light+0xf8/0x2f0 Sep 5 23:52:34 del kernel: [46044.726601] [<c1241f98>] ? sys_socketcall+0x208/0x2c0 Sep 5 23:52:34 del kernel: [46044.739140] [<c10c3eb3>] sys_ioctl+0x63/0x70 Sep 5 23:52:34 del kernel: [46044.751967] [<c12fca3d>] syscall_call+0x7/0xb Sep 5 23:52:34 del kernel: [46044.764734] [<c12f0000>] ? cookie_v6_check+0x3d0/0x630 --------------> This patch fixes the warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by pppd/1717: #0: (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20 stack backtrace: Pid: 1717, comm: pppd Not tainted 2.6.35.4a #3 Call Trace: [<c12f9aba>] ? printk+0x18/0x1e [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0 [<c12b7235>] trie_firstleaf+0x65/0x70 [<c12b74d4>] fib_table_flush+0x14/0x170 ... Allow trie_firstleaf() to be called either under rcu_read_lock() protection or with RTNL held. The same annotation is added to node_parent_rcu() to prevent a similar warning a bit later. Followup of commits 634a4b20 and 4eaa0e3c. Signed-off-by: NJarek Poplawski <jarkao2@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-