- 17 Jan, 2014 2 commits

Committed by Ying Xue

Compared with how other stacks wait for events in accept(), the TIPC implementation may be perceived as different, and sometimes even incorrect. Because sk_sleep() and sk->sk_receive_queue are not protected by the socket lock, a process calling accept() may be woken up improperly, or may not be woken up at all. Standardizing on the inet_csk_wait_for_connect routine brings several benefits: it avoids the 'thundering herd' phenomenon, adds a timeout mechanism for accept(), copes with a pending signal, and keeps sk_sleep() and sk->sk_receive_queue always protected within socket lock scope.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Ying Xue

Compared with how other stacks wait for events in connect(), the TIPC implementation may be perceived as different, and sometimes even incorrect. For instance, both sock->state and sk_sleep() are passed directly to wait_event_interruptible_timeout() as arguments, and the socket lock has to be released before wait_event_interruptible_timeout() is called. The two variables are therefore exposed outside socket lock protection and may hold stale values, so the process calling connect() may not be woken up even when the correct event arrives, or may be woken up even though the wake condition is not actually satisfied. Standardizing the behaviour on the sk_stream_wait_connect routine avoids these risks. Additionally, the implementation of the connect routine is simplified as a whole, allowing it to return correct values in all cases.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 02 Jan, 2014 1 commit

Committed by wangweidong

Commit 3b8401fe ("tipc: kill unnecessary goto's") did not leave the code as readable as it could be, so fix it. This patch is cosmetic and does not change the operation of TIPC in any way.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 30 Dec, 2013 1 commit

Committed by Ying Xue

A deadlock might occur if the name table is withdrawn in the socket release routine while packets are still being received from the bearer.

         CPU0                          CPU1
    T0:  recv_msg()                    release()
    T1:  tipc_recv_msg()               tipc_withdraw()
    T2:  [grab node lock]              [grab port lock]
    T3:  tipc_link_wakeup_ports()      tipc_nametbl_withdraw()
    T4:  [grab port lock]*             named_cluster_distribute()
    T5:  wakeupdispatch()              tipc_link_send()
    T6:                                [grab node lock]*

The opposite order of taking the port lock and the node lock on the two paths above may result in a deadlock. If the socket lock, instead of the port lock, is used to protect the port instance in tipc_withdraw(), the reversed lock order is eliminated, and the deadlock with it.

Reported-by: Lars Everbrand <lars.everbrand@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 17 Dec, 2013 3 commits

Committed by wangweidong

Instead of reacquiring the socket lock and taking the normal exit path when a connection times out, we bail out early and return -ETIMEDOUT.

Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by wangweidong

Remove a number of needless 'goto exit' statements in send_stream when the socket is in an unconnected state. This patch is cosmetic and does not alter the operation of TIPC in any way.

Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by wangweidong

We remove a number of unnecessary variables and branches in TIPC. This patch is cosmetic and does not change the operation of TIPC in any way.

Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 21 Nov, 2013 1 commit

Committed by Hannes Frederic Sowa

This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user.

This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory.

Optimize for the case where recvfrom is called with NULL as the address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so because all the recvmsg handlers must cope with the case where a plain read() is called on them. read() also sets msg_name to NULL.

Also document these changes in include/linux/net.h as suggested by David Miller.

Changes since RFC:

Set msg->msg_name = NULL if the user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto, as it would bail out earlier while trying to copy in the address. It also more naturally reflects the logic of the callers of verify_iovec.

With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ".

This change does not alter the user-visible error logic, as we ignore msg_namelen as long as msg_name is NULL.

Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style.

Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
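
The msg_namelen contract described in this commit can be illustrated with a small userspace model. This is only a sketch: struct model_msghdr, struct model_addr, model_recvmsg() and model_recv() are hypothetical names invented for illustration and are not kernel definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parts of struct msghdr that matter here. */
struct model_msghdr {
    void *msg_name;     /* caller's address buffer, or NULL */
    int   msg_namelen;  /* always 0 on entry to the handler */
};

/* Hypothetical protocol address; 16 bytes in total. */
struct model_addr {
    unsigned short family;
    unsigned char  data[14];
};

/* Model of a recvmsg handler: it only sets msg_namelen when it has
 * actually written a fully initialized address. */
static void model_recvmsg(struct model_msghdr *msg, unsigned short family)
{
    if (msg->msg_name) {
        struct model_addr *addr = msg->msg_name;

        memset(addr, 0, sizeof(*addr));   /* no uninitialized bytes */
        addr->family = family;
        msg->msg_namelen = sizeof(*addr);
    }
    /* msg_name == NULL: msg_namelen stays 0, so nothing is copied out. */
}

/* Model of the caller: zero msg_namelen before invoking the handler and
 * report how many address bytes would be returned to the user. */
static int model_recv(void *name_buf, unsigned short family)
{
    struct model_msghdr msg = { .msg_name = name_buf, .msg_namelen = 0 };

    model_recvmsg(&msg, family);
    return msg.msg_namelen;
}
```

With this shape, a handler that forgets to fill in the address reports a length of 0 instead of exposing whatever happened to be in the buffer.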

- 19 Oct, 2013 2 commits

Committed by Ying Xue

Eliminate the sparse warnings below:

    net/tipc/link.c:1210:37: warning: cast removes address space of expression
    net/tipc/link.c:1218:59: warning: incorrect type in argument 2 (different address spaces)
    net/tipc/link.c:1218:59:    expected void const [noderef] <asn:1>*from
    net/tipc/link.c:1218:59:    got unsigned char const [usertype] *[assigned] sect_crs
    net/tipc/socket.c:341:49: warning: Using plain integer as NULL pointer
    net/tipc/socket.c:1371:36: warning: Using plain integer as NULL pointer
    net/tipc/socket.c:1694:57: warning: Using plain integer as NULL pointer

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Andreas Bofjäll <andreas.bofjall@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Ying Xue

tipc_msg_build() now copies message data from iovec to skb_buff using memcpy_fromiovecend(), which doesn't need to be passed the iovec length to perform the copying. So we remove the parameter indicating iovec length in all functions where TIPC messages are built and sent.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 31 Aug, 2013 1 commit

Committed by Erik Hugne

Should a connect fail, because the publication/server is unavailable or due to some other error, a positive value is returned and errno is never set. If the application code checks for an explicit zero return from connect (success) or a negative return (failure), it will not catch the error, and subsequent send() calls will fail, as shown in the strace snippet below.

    socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3
    connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111
    sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe)

The reason for this behaviour is that TIPC wrongly inverts error codes set in sk_err.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
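
The sign convention at issue in this commit can be sketched in a few lines of userspace C. This is a model under stated assumptions, not the TIPC code: struct model_sock and model_connect_result() are hypothetical names, and the only convention relied on is the usual kernel one, where a pending socket error is stored as a positive errno value and negated when returned from a system call path.

```c
#include <errno.h>

/* Hypothetical model of a socket's pending-error field. */
struct model_sock {
    int sk_err;   /* 0, or a positive errno value such as ECONNREFUSED */
};

/* Return 0 on success or a negative errno, clearing the pending error.
 * Getting the sign wrong at either the store or the return site makes
 * the failure look like a positive (non-error) return value. */
static int model_connect_result(struct model_sock *sk)
{
    int err = sk->sk_err;

    sk->sk_err = 0;
    return err ? -err : 0;
}
```

On Linux, ECONNREFUSED has the value 111, which matches the stray return value of connect() in the strace output above.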

- 18 Jun, 2013 8 commits

Committed by Paul Gortmaker

No runtime code changes here. Just a realignment of the function arguments to start where the first one was, fitting as many arguments as possible on an 80-character line.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Ying Xue

Directly save the sock structure pointer instead of a void pointer to avoid unnecessary cast conversions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Ying Xue

After the removal of the native API, there is now only one way to create a TIPC port instance -- the function tipc_createport_raw(). We make it more readable by renaming it to tipc_createport().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Ying Xue

As the new socket-based TIPC server infrastructure has been introduced, we can now convert the configuration server to use it. Then we can take future steps to simplify the configuration server locking policy.

Some minor reordering of initialization is done, due to the dependency on having tipc_socket_init completed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Ying Xue

As the new TIPC server infrastructure has been introduced, we can now convert the TIPC topology server to it. We get two benefits from doing this:

1) It simplifies the topology server locking policy. In the original locking policy, we placed one spin lock pointer in the tipc_subscriber structure to reuse the lock of the subscriber's server port, controlling access to members of the tipc_subscriber instance. That is, we used only one lock to ensure that both tipc_port and tipc_subscriber members were safely accessed. Now we introduce another spin lock for the tipc_subscriber structure that protects only its own members, to get a finer-grained locking policy. Moreover, the change will allow us to make the topology server code more readable and maintainable.

2) It fixes a bug where sent subscription events may be lost when the topology port is congested. Using the new service, the topology server now queues sent events into an outgoing buffer, and then wakes up a sender process which has been blocked in workqueue context. The process will keep picking events from the buffer and sending them to their respective subscribers, using the kernel socket interface, until the buffer is empty. Even if the socket is congested during transmission there is no risk that events may be dropped, since the sender process may block when needed.

Some minor reordering of initialization is done, since we now have a scenario where the topology server must be started after socket initialization has taken place, as the former depends on the latter. Overall, we see a simplification of the TIPC subscriber code in making this changeover.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Ying Xue

TIPC has two internal servers, one providing a subscription service for topology events, and another providing the configuration interface. These servers have previously been running in BH context, accessing the TIPC-port (aka native) API directly. Apart from these servers, even the TIPC socket implementation is partially built on this API.

As this API may simultaneously be called via different paths and in different contexts, a complex and costly lock policy is required in order to protect TIPC internal resources.

To eliminate the need for this complex lock policy, we introduce a new, generic service API that uses kernel sockets for message passing instead of the native API. Once the topology and configuration servers are converted to use this new service, all code pertaining to the native API can be removed. This entails a significant reduction in code amount and complexity, and opens up for a complete rework of the locking policy in TIPC.

The new service also solves another problem:

As the current topology server works in BH context, it cannot easily be blocked when sending of events fails due to congestion. In such cases events may have to be silently dropped, something that is unacceptable. Therefore, the new service keeps a dedicated outbound queue receiving messages from BH context. Once messages are inserted into this queue, we immediately schedule a work item from a special workqueue. This way, messages/events from the topology server are in reality sent in process context, and the server can block if necessary.

Analogously, there is a new workqueue for receiving messages. Once a notification about an arriving message is received in BH context, we schedule a work item from the receive workqueue to do the job of receiving the message in process context.

As both sending and receiving of messages are now done in process context, subscribed events can no longer be dropped.

As of this commit, the new server infrastructure is built but not yet called by the existing TIPC code. Since the conversion changes required in order to use it are significant, the addition is kept here as a separate commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Erik Hugne

TIPC's implied connect feature, aka piggyback connect, allows applications to save one syscall and all SYN/SYN-ACK signalling overhead when setting up a connection.

Until now, this has only been supported for SEQPACKET sockets. Here, we make it possible to use this feature even with stream sockets.

At the connecting side, the connection is completed when the first data message arrives from the accepting peer. This means that we must allow the connecting user to call blocking recv() before the socket has reached state SS_CONNECTED. So we must relax the state machine check at recv_stream(), and allow the recv() call even if the socket is in state SS_CONNECTING.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Committed by Ying Xue

As per feedback from the netdev community, we change the buffer overflow protection algorithm in receiving sockets so that it always respects the nominal upper limit set in sk_rcvbuf.

Instead of scaling up from a small sk_rcvbuf value, which leads to violation of the configured sk_rcvbuf limit, we now calculate the weighted per-message limit by scaling down from a much bigger value, still in the same field, according to the importance priority of the received message.

To allow for administrative tunability of the socket receive buffer size, we create a tipc_rmem sysctl variable to allow the user to configure an even bigger value via the sysctl command. It is an array of three values (min/default/max), to be consistent with things like tcp_rmem.

By default, the value initialized in tipc_rmem[1] is equal to the receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message. This value is also set as the default value of sk_rcvbuf.

Originally-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
[Ying: added sysctl variation to Jon's original patch]
Signed-off-by: Ying Xue <ying.xue@windriver.com>
[PG: don't compile sysctl.c if not config'd; add Documentation]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
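
One way to picture the "scale down from the ceiling" idea in this commit is the sketch below. It is an illustration only: the enum, its numeric values, and the choice of halving the limit per importance level are assumptions of this model, and model_rcvbuf_limit() is a hypothetical name.

```c
/* Importance levels, lowest to highest; the numeric values are an
 * assumption of this sketch. */
enum model_importance {
    MODEL_LOW = 0,
    MODEL_MEDIUM = 1,
    MODEL_HIGH = 2,
    MODEL_CRITICAL = 3
};

/* Scale down from the configured ceiling: a critical message may use
 * (up to rounding) the whole of rcvbuf, and each lower importance level
 * gets half as much. The result never exceeds rcvbuf. */
static unsigned int model_rcvbuf_limit(unsigned int rcvbuf,
                                       enum model_importance imp)
{
    /* Evaluated left to right: (rcvbuf >> 3) << imp. */
    return rcvbuf >> MODEL_CRITICAL << imp;
}
```

Because the starting point is the configured value itself, no importance level can push the per-message limit above sk_rcvbuf, which is the property the commit message asks for.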

- 08 Apr, 2013 1 commit

Committed by Mathias Krause

The code in set_orig_addr() does not initialize all of the members of struct sockaddr_tipc when filling the sockaddr info -- namely the union is only partly filled. This will make recv_msg() and recv_stream() -- the only users of this function -- leak kernel stack memory, as the msg_name member is a local variable in net/socket.c.

In addition, both recv_msg() and recv_stream() fail to update the msg_namelen member to 0 while otherwise returning with 0, i.e. "success". This is the case for, e.g., non-blocking sockets. This will lead to a 128 byte kernel stack leak in net/socket.c.

Fix the first issue by initializing the memory of the union with memset(0). Fix the second one by setting msg_namelen to 0 early, as it will be updated later if we're going to fill the msg_name member.

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 16 Feb, 2013 3 commits

Committed by Ying Xue

As the number of iovecs in a send request is already limited to UIO_MAXIOV (i.e. 1024) in __sys_sendmsg(), it is unnecessary to check it again in the TIPC stack.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Committed by Ying Xue

Change overload control to be purely byte-based, using sk->sk_rmem_alloc as byte counter, and compare it to a calculated upper limit for the socket receive queue.

For all connection messages, irrespective of message importance, the overload limit is set to a constant value (i.e., 67MB). This limit should normally never be reached because of the lower limit used by the flow control algorithm, and is there only as a last resort in case a faulty peer doesn't respect the send window limit.

For datagram messages, message importance is taken into account when calculating the overload limit. The calculation is based on sk->sk_rcvbuf, and is hence configurable via the socket option SO_RCVBUF.

Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Committed by Ying Xue

The tipc function discard_rx_queue() is just a duplicated implementation of __skb_queue_purge(). Remove the former and directly invoke __skb_queue_purge().

In doing so, the underscores convey to the code reader more information about the current locking state that is assumed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

- 08 Dec, 2012 8 commits

Committed by Paul Gortmaker

In TIPC's accept() routine, there is a large block of code relating to initialization of a new socket, all within an if condition checking if the allocation succeeded.

Here, we simply flip the check of the if, so that the main execution path stays at the same indentation level, which improves readability. If the allocation fails, we jump to an already existing exit label.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Committed by Ying Xue

The TIPC accept() call grabs the socket lock on a newly allocated socket while holding the socket lock on an old socket. But lockdep worries that this might be a recursive lock attempt:

    [ INFO: possible recursive locking detected ]
    ---------------------------------------------
    kworker/u:0/6 is trying to acquire lock:
     (sk_lock-AF_TIPC){+.+.+.}, at: [<c8c1226c>] accept+0x15c/0x310 [tipc]

    but task is already holding lock:
     (sk_lock-AF_TIPC){+.+.+.}, at: [<c8c12138>] accept+0x28/0x310 [tipc]

    other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(sk_lock-AF_TIPC);
      lock(sk_lock-AF_TIPC);

     *** DEADLOCK ***

     May be due to missing lock nesting notation
    [...]

Tell lockdep that this locking is safe by using lock_sock_nested(). This is similar to what was done in commit 5131a184 for SCTP code ("SCTP: lock_sock_nested in sctp_sock_migrate").

Also note that this isn't something that is seen normally, as it was uncovered with some experimental work-in-progress code not yet ready for mainline. So there is no need for stable backports or similar of this commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Committed by Ying Xue

As connection setup is now completed asynchronously in BH context, in the function filter_connect(), the corresponding code in recv_msg() becomes redundant.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Committed by Ying Xue

TIPC has so far only supported blocking connect(), meaning that a call to connect() doesn't return until either the connection is fully established, or an error occurs. This has proved insufficient for many users, so we now introduce non-blocking connect(), analogous to how this is done in TCP and other protocols.

With this feature, if a connection cannot be established instantly, connect() will return the error code "-EINPROGRESS". If the user later calls connect() again, the return code will be either "-EALREADY" or "-EISCONN", depending on whether the connection has been established or not.

The user must have explicitly set the socket to be non-blocking (SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless for some reason they had set this already (the socket would anyway remain blocking in current TIPC) this change should be completely backwards compatible.

It is also now possible to call select() or poll() to wait for the completion of a connection.

An effect of the above is that the actual completion of a connection may now be performed asynchronously, independent of the calls from user space. Therefore, we now execute this code in BH context, in the function filter_rcv(), which is executed upon reception of messages in the socket.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
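
From the application side, the behaviour described in this commit is typically used as in the sketch below. It follows the generic POSIX non-blocking connect pattern (EINPROGRESS, then poll() for POLLOUT, then SO_ERROR) and is protocol-independent; connect_nonblocking() is a hypothetical helper name, and the assumption that a TIPC socket behaves the same way in every detail is not something the commit message confirms.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>

/* Connect a socket without blocking inside connect() itself.
 * Returns 0 on success, or -1 with errno set on failure or timeout. */
static int connect_nonblocking(int fd, const struct sockaddr *addr,
                               socklen_t addrlen, int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    if (connect(fd, addr, addrlen) == 0)
        return 0;                      /* completed immediately */
    if (errno != EINPROGRESS)
        return -1;                     /* immediate failure */

    /* Wait until the socket becomes writable, i.e. the attempt finished. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n = poll(&pfd, 1, timeout_ms);

    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (n < 0)
        return -1;

    /* Fetch the final outcome of the asynchronous connect. */
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

A caller would create the socket as usual, pass its descriptor and peer address to the helper, and treat a return of -1 with errno set to ETIMEDOUT as "still not connected after timeout_ms".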

Committed by Ying Xue

Handling of connection-related message reception is currently scattered around at different places in the code. This makes it harder to verify that things are handled correctly in all possible scenarios. So we consolidate the existing processing of connection-oriented message reception in a single routine. In the process, we convert the chain of if/else into a switch/case for improved readability.

A cast on the socket_state in the switch is needed to avoid compile warnings on 32 bit, like "net/tipc/socket.c:1252:2: warning: case value ‘4294967295’ not in enumerated type". This happens because existing tipc code pseudo-extends the default linux socket state values with:

    #define SS_LISTENING -1 /* socket is listening */
    #define SS_READY -2 /* socket is connectionless */

It may make sense to add these as _positive_ values to the existing socket state enum list someday, vs. these already existing defines.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: add cast to fix warning; remove returns from middle of switch]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
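
The cast mentioned in this commit can be illustrated with a small userspace sketch. The enum, the macros, and model_state_name() below are hypothetical models with a MODEL_ prefix, not the kernel's definitions; they only mimic the shape of an enum that is extended with negative #define values.

```c
/* Model of an enum with only non-negative members. */
typedef enum {
    MODEL_SS_FREE = 0,
    MODEL_SS_UNCONNECTED,
    MODEL_SS_CONNECTING,
    MODEL_SS_CONNECTED,
    MODEL_SS_DISCONNECTING
} model_socket_state;

/* Extra pseudo-states outside the enum's declared range. */
#define MODEL_SS_LISTENING (-1)
#define MODEL_SS_READY     (-2)

/* Switching on (int)state lets the negative case labels be compared as
 * plain integers instead of as out-of-range enum values. */
static const char *model_state_name(model_socket_state state)
{
    switch ((int)state) {
    case MODEL_SS_CONNECTED:
        return "connected";
    case MODEL_SS_LISTENING:
        return "listening";
    case MODEL_SS_READY:
        return "ready";
    default:
        return "other";
    }
}
```

Without the cast, a compiler that gives the enum an unsigned underlying type can report the negative labels as values such as 4294967295 that are not part of the enumerated type, which is the warning quoted above.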

Committed by Paul Gortmaker

Currently we have tipc_disconnect and tipc_disconnect_port. It is not clear from the names alone what they do or how they differ. It turns out that tipc_disconnect just deals with the port locking and then calls tipc_disconnect_port, which does all the work.

If we rename as follows:

    tipc_disconnect_port --> __tipc_disconnect

then we will be following typical linux convention, where:

    __tipc_disconnect: "raw" function that does all the work.

    tipc_disconnect: wrapper that deals with locking and then calls the real core __tipc_disconnect function

With this, the difference is immediately evident, and locking violations are more apt to be spotted by chance while working on, or even just while reading, the code.

On the connect side of things, we currently only have the single "tipc_connect2port" function. It does both the locking at enter/exit, and the core of the work. Pending changes will make it desirable to have the connect be a two-part locking wrapper + worker function, just like the disconnect is already.

Here, we make the connect look just like the updated disconnect case, for the above reason, and for consistency. In the process, we also get rid of the "2port" suffix that was on the original name, since it adds no descriptive value.

On close examination, one might notice that the above connect changes implicitly move the call to tipc_link_get_max_pkt() to be within the scope of the tipc_port_lock() protected region, when it was not previously. We don't see any issues with this, and it is in keeping with __tipc_connect doing the work and tipc_connect just handling the locking.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
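
The wrapper/worker naming convention discussed in this commit can be sketched in userspace with a pthread mutex. This is a model only: struct model_port and both functions are hypothetical, and the leading double underscore is used here purely to mirror the kernel convention (such identifiers are reserved in ordinary C application code).

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical port object; not the kernel's port structure. */
struct model_port {
    pthread_mutex_t lock;
    bool connected;
};

/* Worker: does the real work. The caller must already hold port->lock. */
static int __model_disconnect(struct model_port *port)
{
    if (!port->connected)
        return -1;
    port->connected = false;
    return 0;
}

/* Wrapper: takes the lock, calls the worker, releases the lock. */
static int model_disconnect(struct model_port *port)
{
    int res;

    pthread_mutex_lock(&port->lock);
    res = __model_disconnect(port);
    pthread_mutex_unlock(&port->lock);
    return res;
}
```

Code that already holds the lock calls the double-underscore worker directly, while everything else goes through the wrapper, so a call to the worker from an unlocked context stands out when reading the code.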

Committed by Jon Maloy

The sk_recv_queue upper limit for connectionless sockets has empirically turned out to be too low. When we double the current limit we get far fewer rejected messages and no noticeable negative side effects.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Committed by Ying Xue

As a complement to the per-socket sk_recv_queue limit, TIPC keeps a global atomic counter for the sum of sk_recv_queue sizes across all tipc sockets. When incremented, the counter is compared to an upper threshold value, and if this is reached, the message is rejected with error code TIPC_OVERLOAD.

This check was originally meant to protect the node against buffer exhaustion and general CPU overload. However, all experience indicates that the feature is not only redundant on Linux, but even harmful. Users run into the limit very often, causing disturbances for their applications, while removing it seems to have no negative effects at all. We have also seen that overall performance is boosted significantly when this bottleneck is removed.

Furthermore, we don't see any other network protocols maintaining such a mechanism, something strengthening our conviction that this control can be eliminated.

As a result, the atomic variable tipc_queue_size is now unused and so it can be deleted. There is a getsockopt call that used to allow reading it; we retain that but just return zero for maximum compatibility.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
[PG: phase out tipc_queue_size as pointed out by Neil Horman]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

- 22 Nov, 2012 3 commits

Committed by Ying Xue

When a socket is shut down, we should wake up all threads sleeping on it, instead of just one of them. Otherwise, when several threads are polling the same socket, and one of them does shutdown(), the remaining threads may end up sleeping forever.

Also, to align socket usage with common practice in other stacks, we use one of the common socket callback handlers, sk_state_change(), to wake up pending users. This is similar to the usage in e.g. inet_shutdown() [net/ipv4/af_inet.c].

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Committed by Erik Hugne

If an implied connect is attempted on a nonblocking STREAM/SEQPACKET socket during link congestion, the connect message will be discarded and sendmsg will return EAGAIN. This is normal behavior, and the application is expected to poll the socket until POLLOUT is set, after which the connection attempt can be retried. However, the POLLOUT flag is never set for unconnected sockets and poll() always returns a zero mask. The application is then left without a trigger for when it can make another attempt at sending the message.

The solution is to check if we're polling on an unconnected socket and set the POLLOUT flag if the TIPC port owned by this socket is not congested. The TIPC ports waiting on a specific link will be marked as 'not congested' when the link congestion has abated.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Committed by Ying Xue

When an application blocks at poll/select on a TIPC socket while requesting a specific event mask, both the filter_rcv() and wakeupdispatch() cases will wake it up unconditionally whenever the state changes (i.e. an incoming message arrives, or congestion has subsided). No mask is used.

To avoid this, we populate sk->sk_data_ready and sk->sk_write_space with tipc_data_ready and tipc_write_space respectively, which brings tipc more into alignment with the rest of the networking code. These pass the exact set of possible events to the waker in fs/select.c, hence avoiding waking up blocked processes unnecessarily.

In doing so, we uncover another issue -- that there needs to be a memory barrier in these poll/receive callbacks, otherwise we are subject to the same race as documented above wq_has_sleeper() [in commit a57de0b4 "net: adding memory barrier to the poll and receive callbacks"]. So we need to replace poll_wait() with sock_poll_wait() and use rcu protection for the sk->sk_wq pointer in these two new functions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

- 05 Oct, 2012 1 commit

Committed by Erik Hugne

When large buffers are sent over connected TIPC sockets, it is likely that the sk_backlog will be filled up on the receiver side, but the TIPC flow control mechanism is happily unaware of this since it is based on message count.

The sender will receive a TIPC_ERR_OVERLOAD message when this occurs and drop its side of the connection, leaving it stale on the receiver end.

By increasing the sk_rcvbuf to a 'worst case' value, we avoid the overload caused by a full backlog queue and the flow control will work properly.

This worst case value is the max TIPC message size times the flow control window, multiplied by two because a sender will transmit up to double the window size before a port is marked congested. We multiply this by 2 to account for the sk_buff and other overheads.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
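
The worst-case arithmetic described in this commit can be written out as a short sketch. The two constants below are assumptions made for illustration (a maximum message size of 66000 bytes and a flow control window of 512 messages); the commit message itself does not state them, so check the kernel headers of the version you care about. The MODEL_ names and model_worst_case_rcvbuf() are hypothetical.

```c
/* Assumed values, for illustration only. */
#define MODEL_MAX_USER_MSG_SIZE 66000u  /* max TIPC message size, bytes */
#define MODEL_FLOW_CONTROL_WIN  512u    /* flow control window, messages */

/* First * 2: a sender may transmit up to twice the window before the
 * port is marked congested. Second * 2: sk_buff and other overheads. */
static unsigned int model_worst_case_rcvbuf(void)
{
    return MODEL_MAX_USER_MSG_SIZE * MODEL_FLOW_CONTROL_WIN * 2u * 2u;
}
```

Under these assumed constants the result is 135168000 bytes, which still fits comfortably in a 32-bit unsigned integer.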

- 14 Jul, 2012 1 commit

Committed by Erik Hugne

All messages should go directly to the kernel log. The TIPC-specific error, warning, info and debug trace macros are removed and all references replaced with pr_err, pr_warn, pr_info and pr_debug.

Commonly used sub-strings are explicitly declared as a const char to reduce .text size.

Note that this means the debug messages (changed to pr_debug) are now enabled through dynamic debugging, instead of a TIPC-specific Kconfig option (TIPC_DEBUG). The latter will be phased out completely.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: use pr_fmt as suggested by Joe Perches <joe@perches.com>]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

- 11 Jul, 2012 1 commit

Committed by Ben Hutchings

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 04 Jun, 2012 1 commit

Committed by Joe Perches

Adding casts of objects to the same type is unnecessary and confusing for a human reader.

For example, this cast:

    int y;
    int *p = (int *)&y;

I used the coccinelle script below to find and remove these unnecessary casts. I manually removed the conversions this script produces of casts with __force and __user.

    @@
    type T;
    T *p;
    @@

    - (T *)p
    + p

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 01 May, 2012 1 commit

Committed by Paul Gortmaker

Some of the comment blocks are floating in limbo between two functions, or between blocks of code. Delete the extra line feeds between any comment and its associated following block of code, to be consistent with the majority of the rest of the kernel. Also delete trailing newlines at EOF and fix a couple of trivial typos in existing comments.

This is a 100% cosmetic change with no runtime impact. We get rid of over 500 lines of non-code, and since they are blank-line deletions, they won't even show up as noise in git blame.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

- 27 Apr, 2012 1 commit

Committed by Allan Stephens

Adds a check to ensure TIPC sockets reject incoming payload messages that have an unrecognized message type.

Remove the old open question about whether TIPC_ERR_NO_PORT is the proper return value. It is appropriate here since there are valid instances where another node can make use of the reply, and at this point in time the host is already broadcasting TIPC data, so there are no real security concerns.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>