- 12 10月, 2012 1 次提交
-
-
由 Simon Derr 提交于
Race scenario: thread A thread B p9_write_work() p9_fd_request() if (list_empty (&m->unsent_req_list)) ... spin_lock(&client->lock); req->status = REQ_STATUS_UNSENT; list_add_tail(..., &m->unsent_req_list); spin_unlock(&client->lock); .... if (n & POLLOUT && !test_and_set_bit(Wworksched, &m->wsched) schedule_work(&m->wq); --> not done because Wworksched is set clear_bit(Wworksched, &m->wsched); return; --> nobody will take care of sending the new request. This is not very likely to happen though, because p9_write_work() being called with an empty unsent_req_list is not frequent. But this also means that taking the lock earlier will not be costly. Signed-off-by: NSimon Derr <simon.derr@bull.net> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
- 10 10月, 2012 1 次提交
-
-
由 jeff.liu 提交于
This is the revised patch for fixing rds-ping spinlock recursion according to Venkat's suggestions. RDS ping/pong over TCP feature has been broken for years(2.6.39 to 3.6.0) since we have to set TCP cork and call kernel_sendmsg() between ping/pong which both need to lock "struct sock *sk". However, this lock has already been hold before rds_tcp_data_ready() callback is triggerred. As a result, we always facing spinlock resursion which would resulting in system panic. Given that RDS ping is only used to test the connectivity and not for serious performance measurements, we can queue the pong transmit to rds_wq as a delayed response. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> CC: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> CC: David S. Miller <davem@davemloft.net> CC: James Morris <james.l.morris@oracle.com> Signed-off-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 10月, 2012 13 次提交
-
-
由 Michel Lespinasse 提交于
Empty nodes have no color. We can make use of this property to simplify the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros. Also, we can get rid of the rb_init_node function which had been introduced by commit 88d19cf3 ("timers: Add rb_init_node() to allow for stack allocated rb nodes") to avoid some issue with the empty node's color not being initialized. I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are doing there, though. axboe introduced them in commit 10fd48f2 ("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev"). The way I see it, the 'empty node' abstraction is only used by rbtree users to flag nodes that they haven't inserted in any rbtree, so asking the predecessor or successor of such nodes doesn't make any sense. One final rb_init_node() caller was recently added in sysctl code to implement faster sysctl name lookups. This code doesn't make use of RB_EMPTY_NODE at all, and from what I could see it only called rb_init_node() under the mistaken assumption that such initialization was required before node insertion. [sfr@canb.auug.org.au: fix net/ceph/osd_client.c build] Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Julian Anastasov 提交于
After the change "Make neigh lookups directly in output packet path" (commit a263b309) IPVS can not reach the real server for DR mode because we resolve the destination address from IP header, not from route neighbour. Use the new FLOWI_FLAG_KNOWN_NH flag to request output routes with known nexthop, so that it has preference on resolving. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julian Anastasov 提交于
Add flag to request that output route should be returned with known rt_gateway, in case we want to use it as nexthop for neighbour resolving. The returned route can be cached as follows: - in NH exception: because the cached routes are not shared with other destinations - in FIB NH: when using gateway because all destinations for NH share same gateway As last option, to return rt_gateway!=0 we have to set DST_NOCACHE. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julian Anastasov 提交于
Add new flag to remember when route is via gateway. We will use it to allow rt_gateway to contain address of directly connected host for the cases when DST_NOCACHE is used or when the NH exception caches per-destination route without DST_NOCACHE flag, i.e. when routes are not used for other destinations. By this way we force the neighbour resolving to work with the routed destination but we can use different address in the packet, feature needed for IPVS-DR where original packet for virtual IP is routed via route to real IP. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julian Anastasov 提交于
Avoid checking nh_pcpu_rth_output in fast path, abort fib_info creation on alloc_percpu failure. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julian Anastasov 提交于
After the change "Adjust semantics of rt->rt_gateway" (commit f8126f1d) rt_gateway can be 0 but ip_forward() compares it directly with nexthop. What we want here is to check if traffic is to directly connected nexthop and to fail if using gateway. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julian Anastasov 提交于
After "Cache input routes in fib_info nexthops" (commit d2d68ba9) and "Elide fib_validate_source() completely when possible" (commit 7a9bc9b8) we can not send ICMP redirects. It seems we should not cache the RTCF_DOREDIRECT flag in nh_rth_input because the same fib_info can be used for traffic that is not redirected, eg. from other input devices or from sources that are not in same subnet. As result, we have to disable the caching of RTCF_DOREDIRECT flag and to force source validation for the case when forwarding traffic to the input device. If traffic comes from directly connected source we allow redirection as it was done before both changes. Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS is disabled, this can avoid source address validation and to help caching the routes. After the change "Adjust semantics of rt->rt_gateway" (commit f8126f1d) we should make sure our ICMP_REDIR_HOST messages contain daddr instead of 0.0.0.0 when target is directly connected. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive() if a new skb is allocated (to serve as an anchor for frag_list) We copy NAPI_GRO_CB() only (not the IPV6 specific part) in : *NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p); So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead of IPPROTO_TCP (6) ipv6_gro_complete() isnt able to call ops->gro_complete() [ tcp6_gro_complete() ] Fix this by moving proto in NAPI_GRO_CB() and getting rid of IPV6_GRO_CB Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Zumbiehl 提交于
6a32e4f9 made the vlan code skip marking vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if there was an rx_handler, as the rx_handler could cause the frame to be received on a different (virtual) vlan-capable interface where that vlan might be configured. As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause frames for unknown vlans to be delivered to the protocol stack as if they had been received untagged. For example, if an ipv6 router advertisement that's tagged for a locally not configured vlan is received on an interface with macvlan interfaces attached, macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the macvlan interfaces, which caused it to be passed to the protocol stack, leading to ipv6 addresses for the announced prefix being configured even though those are completely unusable on the underlying interface. The fix moves marking as PACKET_OTHERHOST after the rx_handler so the rx_handler, if there is one, sees the frame unchanged, but afterwards, before the frame is delivered to the protocol stack, it gets marked whether there is an rx_handler or not. Signed-off-by: NFlorian Zumbiehl <florz@florz.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Current GRO can hold packets in gro_list for almost unlimited time, in case napi->poll() handler consumes its budget over and over. In this case, napi_complete()/napi_gro_flush() are not called. Another problem is that gro_list is flushed in non friendly way : We scan the list and complete packets in the reverse order. (youngest packets first, oldest packets last) This defeats priorities that sender could have cooked. Since GRO currently only store TCP packets, we dont really notice the bug because of retransmits, but this behavior can add unexpected latencies, particularly on mice flows clamped by elephant flows. This patch makes sure no packet can stay more than 1 ms in queue, and only in stress situations. It also complete packets in the right order to minimize latencies. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jesse Gross <jesse@nicira.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steffen Klassert 提交于
We report cached pmtu values even if they are already expired. Change this to not report these values after they are expired and fix a race in the expire time calculation, as suggested by Eric Dumazet. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steffen Klassert 提交于
When a local tool like tracepath tries to send packets bigger than the device mtu, we create a nh exeption and set the pmtu to device mtu. The device mtu does not expire, so check if the device mtu is smaller than the reported pmtu and don't crerate a nh exeption in that case. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steffen Klassert 提交于
Some protocols, like IPsec still cache routes. So we need to invalidate the old route on pmtu events to avoid the reuse of stale routes. We also need to update the mtu and expire time of the route if we already use a nh exception route, otherwise we ignore newly learned pmtu values after the first expiration. With this patch we always invalidate or update the route on pmtu events. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 10月, 2012 3 次提交
-
-
由 Eric Dumazet 提交于
Before accessing skb first fragment, better make sure there is one. This is probably not needed for old kernels, since an ethernet frame cannot contain only an ethernet header, but the recent GRO addition to tunnels makes this patch needed. Also skb_gro_reset_offset() can be static, it actually allows compiler to inline it. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
IPv4 side of the problem was addressed in commit a9e050f4 (net: tcp: GRO should be ECN friendly) This patch does the same, but for IPv6 : A Traffic Class mismatch doesnt mean flows are different, but instead should force a flush of previous packets. This patch removes artificial packet reordering problem. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 ramesh.nagappa@gmail.com 提交于
The retry loop in neigh_resolve_output() and neigh_connected_output() call dev_hard_header() with out reseting the skb to network_header. This causes the retry to fail with skb_under_panic. The fix is to reset the network_header within the retry loop. Signed-off-by: NRamesh Nagappa <ramesh.nagappa@ericsson.com> Reviewed-by: NShawn Lu <shawn.lu@ericsson.com> Reviewed-by: NRobert Coulson <robert.coulson@ericsson.com> Reviewed-by: NBillie Alsup <billie.alsup@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 10月, 2012 2 次提交
-
-
由 Eric Dumazet 提交于
Over time, skb recycling infrastructure got litle interest and many bugs. Generic rx path skb allocation is now using page fragments for efficient GRO / TCP coalescing, and recyling a tx skb for rx path is not worth the pain. Last identified bug is that fat skbs can be recycled and it can endup using high order pages after few iterations. With help from Maxime Bizon, who pointed out that commit 87151b86 (net: allow pskb_expand_head() to get maximum tailroom) introduced this regression for recycled skbs. Instead of fixing this bug, lets remove skb recycling. Drivers wanting really hot skbs should use build_skb() anyway, to allocate/populate sk_buff right before netif_receive_skb() Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gao feng 提交于
I get a panic when I use ss -a and rmmod inet_diag at the same time. It's because netlink_dump uses inet_diag_dump which belongs to module inet_diag. I search the codes and find many modules have the same problem. We need to add a reference to the module which the cb->dump belongs to. Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo. Change From v3: change netlink_dump_start to inline,suggestion from Pablo and Eric. Change From v2: delete netlink_dump_done,and call module_put in netlink_dump and netlink_sock_destruct. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 10月, 2012 2 次提交
-
-
由 Andi Kleen 提交于
Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andi Kleen 提交于
Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 10月, 2012 7 次提交
-
-
由 Gao feng 提交于
as we hold dst_entry before we call __ip6_del_rt, so we should alse call dst_release not only return -ENOENT when the rt6_info is ip6_null_entry. and we already hold the dst entry, so I think it's safe to call dst_release out of the write-read lock. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dave Jones 提交于
Validation of userspace input shouldn't trigger dmesg spamming. Signed-off-by: NDave Jones <davej@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Erik Hugne 提交于
When large buffers are sent over connected TIPC sockets, it is likely that the sk_backlog will be filled up on the receiver side, but the TIPC flow control mechanism is happily unaware of this since that is based on message count. The sender will receive a TIPC_ERR_OVERLOAD message when this occurs and drop it's side of the connection, leaving it stale on the receiver end. By increasing the sk_rcvbuf to a 'worst case' value, we avoid the overload caused by a full backlog queue and the flow control will work properly. This worst case value is the max TIPC message size times the flow control window, multiplied by two because a sender will transmit up to double the window size before a port is marked congested. We multiply this by 2 to account for the sk_buff and other overheads. Signed-off-by: NErik Hugne <erik.hugne@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dave Jones 提交于
Fuzzing causes these printks to spew constantly. Changing them to DEBUG statements is consistent with other usage in the file, and makes them disappear when CONFIG_IRDA_DEBUG is disabled. Signed-off-by: NDave Jones <davej@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
Suppose we have an SCTP connection with two paths. After connection is established, path1 is not available, thus this path is marked as inactive. Then traffic goes through path2, but for some reasons packets are delayed (after rto.max). Because packets are delayed, the retransmit mechanism will switch again to path1. At this time, we receive a delayed SACK from path2. When we update the state of the path in sctp_check_transmitted(), we do not take into account the source address of the SACK, hence we update the wrong path. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NVlad Yasevich <vyasevich@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
Just to avoid confusion when people only reads this prototype. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NVlad Yasevich <vyasevich@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops.) introduced a regression for forwarding. This was hard to reproduce but the symptom was that packets were delivered to local host instead of being forwarded. David suggested to add fib_type to fib_info so that we dont inadvertently share same fib_info for different purposes. With help from Julian Anastasov who provided very helpful hints, reproduced here : <quote> Can it be a problem related to fib_info reuse from different routes. For example, when local IP address is created for subnet we have: broadcast 192.168.0.255 dev DEV proto kernel scope link src 192.168.0.1 192.168.0.0/24 dev DEV proto kernel scope link src 192.168.0.1 local 192.168.0.1 dev DEV proto kernel scope host src 192.168.0.1 The "dev DEV proto kernel scope link src 192.168.0.1" is a reused fib_info structure where we put cached routes. The result can be same fib_info for 192.168.0.255 and 192.168.0.0/24. RTN_BROADCAST is cached only for input routes. Incoming broadcast to 192.168.0.255 can be cached and can cause problems for traffic forwarded to 192.168.0.0/24. So, this patch should solve the problem because it separates the broadcast from unicast traffic. And the ip_route_input_slow caching will work for local and broadcast input routes (above routes 1 and 3) just because they differ in scope and use different fib_info. </quote> Many thanks to Chris Clayton for his patience and help. Reported-by: NChris Clayton <chris2553@googlemail.com> Bisected-by: NChris Clayton <chris2553@googlemail.com> Reported-by: NDave Jones <davej@redhat.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Tested-by: NChris Clayton <chris2553@googlemail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 10月, 2012 3 次提交
-
-
由 Antonio Quartulli 提交于
skb_reset_mac_len() relies on the value of the skb->network_header pointer, therefore we must wait for such pointer to be recalculated before computing the new mac_len value. Signed-off-by: NAntonio Quartulli <ordex@autistici.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), a route to fe80::/64 is added in the main table: unreachable fe80::/64 dev lo proto kernel metric 256 error -101 This route does not match any prefix (no fe80:: address on lo). In fact, addrconf_dev_config() will not add link local address because this function filters interfaces by type. If the link local address is added manually, the route to the link local prefix will be automatically added by addrconf_add_linklocal(). Note also, that this route is not deleted when the address is removed. After looking at the code, it seems that addrconf_add_lroute() is redundant with addrconf_add_linklocal(), because this function will add the link local route when the link local address is configured. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Linus Torvalds 提交于
The network merge brought in a few users of functions that got deprecated by the workqueue cleanups: the 'system_nrt_wq' is now the same as the regular system_wq, since all workqueues are now non- reentrant. Similarly, remove one use of flush_work_sync() - the regular flush_work() has become synchronous, and the "_sync()" version is thus deprecated as being superfluous. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 10月, 2012 8 次提交
-
-
由 stephen hemminger 提交于
Needed for VXLAN. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
Later changes need to be able to refer to neighbour attributes when doing fdb_add. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chuck Lever 提交于
An ULP is supposed to be able to replace a GSS rpc_auth object with another GSS rpc_auth object using rpcauth_create(). However, rpcauth_create() in 3.5 reliably fails with -EEXIST in this case. This is because when gss_create() attempts to create the upcall pipes, sometimes they are already there. For example if a pipe FS mount event occurs, or a previous GSS flavor was in use for this rpc_clnt. It turns out that's not the only problem here. While working on a fix for the above problem, we noticed that replacing an rpc_clnt's rpc_auth is not safe, since dereferencing the cl_auth field is not protected in any way. So we're deprecating the ability of rpcauth_create() to switch an rpc_clnt's security flavor during normal operation. Instead, let's add a fresh API that clones an rpc_clnt and gives the clone a new flavor before it's used. This makes immediate use of the new __rpc_clone_client() helper. This can be used in a similar fashion to rpcauth_create() when a client is hunting for the correct security flavor. Instead of replacing an rpc_clnt's security flavor in a loop, the ULP replaces the whole rpc_clnt. To fix the -EEXIST problem, any ULP logic that relies on replacing an rpc_clnt's rpc_auth with rpcauth_create() must be changed to use this API instead. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Chuck Lever 提交于
rpc_clone_client() does most of the same tasks as rpc_new_client(), so there is an opportunity for code re-use. Create a generic helper that makes it easy to clone an RPC client while replacing any of the clnt's parameters. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Chuck Lever 提交于
Clean up: Some function names have changed, but debugging messages were never updated. Automate the construction of the function name in debugging messages. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Chuck Lever 提交于
Clean up: The blank space in front of the message must be spaces. Tabs show up on the console as a graphical character. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Sage Weil 提交于
If we are creating an osd request and get an invalid layout, return an EINVAL to the caller. We switch up the return to have an error code instead of NULL implying -ENOMEM. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
If we encounter an invalid (e.g., zeroed) mapping, return an error and avoid a divide by zero. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-