- 27 Feb, 2017 1 commit
-
-
Committed by David Howells
Calls made through the in-kernel interface can end up getting stuck because of a missed variable update in a loop in rxrpc_recvmsg_data(). The problem is like this:

(1) A new packet comes in and doesn't cause a notification to be given to the client as there's still another packet in the ring - the assumption being that the client will keep drawing off data until the ring is empty.

(2) The client is in rxrpc_recvmsg_data(), inside the big while loop that iterates through the packets. This copies the window pointers into variables rather than using the information in the call struct because:
    (a) MSG_PEEK might be in effect;
    (b) we need a barrier after reading call->rx_top to pair with the barrier in the softirq routine that loads the buffer.

(3) The reading of call->rx_top is done outside of the loop, and top is never updated whilst we're in the loop. This means that even though there's a new packet available, we don't see it and may return -EFAULT to the caller - who will happily return to the scheduler and await the next notification.

(4) No further notifications are forthcoming until there's an abort as the ring isn't empty.

The fix is to move the read of call->rx_top inside the loop - but it needs to be done before the condition is checked.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
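The shape of the fix can be sketched in userspace C. This is only an illustration under assumptions, not the kernel code: `struct ring`, `ring_push()`, `ring_drain()` and `push_once()` are hypothetical names, and C11 acquire/release atomics stand in for the kernel's barriers. The point it shows is that the producer index is re-read before every test of the loop condition, so an item published mid-loop is still seen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8u /* hypothetical capacity; must be a power of two */

struct ring {
    _Atomic unsigned int top;  /* last sequence number published by the producer */
    unsigned int hard_ack;     /* last sequence number consumed */
    int slots[RING_SIZE];
};

/* Producer: fill the slot first, then publish the new top with release semantics. */
static void ring_push(struct ring *r, int value)
{
    unsigned int seq = atomic_load_explicit(&r->top, memory_order_relaxed) + 1;

    r->slots[seq & (RING_SIZE - 1)] = value;
    atomic_store_explicit(&r->top, seq, memory_order_release);
}

/* Test helper: simulates one packet arriving while the consumer is mid-loop. */
static void push_once(struct ring *r)
{
    static int done;

    if (!done) {
        done = 1;
        ring_push(r, 20);
    }
}

/*
 * Consumer: top is re-read (with acquire semantics) before each evaluation
 * of the loop condition, rather than once before the loop.  on_item is an
 * optional hook called after each item is consumed.
 */
static size_t ring_drain(struct ring *r, int *out, size_t max,
                         void (*on_item)(struct ring *))
{
    unsigned int seq = r->hard_ack + 1;
    unsigned int top;
    size_t n = 0;

    while (top = atomic_load_explicit(&r->top, memory_order_acquire),
           n < max && (int)(top - seq) >= 0) {
        out[n++] = r->slots[seq & (RING_SIZE - 1)];
        r->hard_ack = seq++;
        if (on_item)
            on_item(r);
    }
    return n;
}
```

If `top` were loaded once before the loop instead, the drain would stop after the first item and the second would sit in the ring with no further notification, which is the stall described in the commit message.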
-
- 25 Feb, 2017 1 commit
-
-
Committed by Marc Dionne
In the rxrpc_read() function, which allows a user to read the contents of a key, we miscalculate the expected length of an encoded rxkad token by not taking into account the key length. However, the data is stored later anyway with an ENCODE_DATA() call - and an assertion failure then ensues when the lengths are checked at the end.

Fix this by including the key length in the token size estimation.

The following assertion is produced:

Assertion failed - 384(0x180) == 380(0x17c) is false
------------[ cut here ]------------
kernel BUG at ../net/rxrpc/key.c:1221!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 2 PID: 2957 Comm: keyctl Not tainted 4.10.0-fscache+ #483
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
task: ffff8804013a8500 task.stack: ffff8804013ac000
RIP: 0010:rxrpc_read+0x10de/0x11b6
RSP: 0018:ffff8804013afe48 EFLAGS: 00010296
RAX: 000000000000003b RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000040001 RSI: 00000000000000f6 RDI: 0000000000000300
RBP: ffff8804013afed8 R08: 0000000000000001 R09: 0000000000000001
R10: ffff8804013afd90 R11: 0000000000000002 R12: 00005575f7c911b4
R13: 00005575f7c911b3 R14: 0000000000000157 R15: ffff880408a5d640
FS: 00007f8dfbc73700(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005575f7c91008 CR3: 000000040120a000 CR4: 00000000001406e0
Call Trace:
 keyctl_read_key+0xb6/0xd7
 SyS_keyctl+0x83/0xe7
 do_syscall_64+0x80/0x191
 entry_SYSCALL64_slow_path+0x25/0x25

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
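The class of bug can be illustrated with a small sketch. It does not reproduce the real rxkad token layout: `struct toy_token`, its fields and all function names are hypothetical, and the only assumption carried over is XDR-style encoding (4-byte words, data padded to a multiple of four). It shows a size estimate and an encoder that must agree, with an assertion of the same kind as the one that fired above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a byte count up to the next multiple of 4 (XDR padding). */
#define XDR_ROUND(n) (((n) + 3u) & ~3u)

/* Hypothetical token: NOT the real rxkad layout. */
struct toy_token {
    uint32_t id;
    uint32_t key_len;
    const uint8_t *key;
    uint32_t ticket_len;
    const uint8_t *ticket;
};

/* Size estimate: every variable-length field needs a length word plus padded data. */
static size_t toy_token_size(const struct toy_token *t)
{
    size_t size = 4;                       /* id */

    size += 4 + XDR_ROUND(t->key_len);     /* key: the term that is easy to leave out */
    size += 4 + XDR_ROUND(t->ticket_len);  /* ticket */
    return size;
}

/* Write one big-endian 32-bit word; returns the number of bytes written. */
static size_t put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return 4;
}

/* Write a length word followed by zero-padded data. */
static size_t put_data(uint8_t *p, const uint8_t *d, uint32_t len)
{
    size_t padded = XDR_ROUND(len);

    put_u32(p, len);
    memset(p + 4, 0, padded);
    memcpy(p + 4, d, len);
    return 4 + padded;
}

/* Encode the token; returns bytes written, or 0 if the buffer is too small. */
static size_t toy_token_encode(const struct toy_token *t, uint8_t *buf, size_t buflen)
{
    size_t expected = toy_token_size(t);
    size_t off = 0;

    if (buflen < expected)
        return 0;
    off += put_u32(buf + off, t->id);
    off += put_data(buf + off, t->key, t->key_len);
    off += put_data(buf + off, t->ticket, t->ticket_len);
    assert(off == expected);  /* estimate and encoder must agree */
    return off;
}
```

Dropping the key term from `toy_token_size()` while still encoding the key would make the final assertion fail, which is the pattern the commit describes.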
-
- 18 Feb, 2017 1 commit
-
-
Committed by David Howells
Change module filename from af-rxrpc.ko to rxrpc.ko so as to be consistent with the other protocol drivers.

Also adjust the documentation to reflect this.

Further, there is no longer a standalone rxkad module, as it has been merged into the rxrpc core, so get rid of references to that.

Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Jan, 2017 1 commit
-
-
Committed by David Howells
Allow listen() with a backlog of 0 to be used to disable listening on an AF_RXRPC socket. This also releases any preallocation, thereby making it easier for a kernel service to account for all allocated call structures when shutting down the service.

The socket cannot thereafter have listening reenabled, but must rather be closed and reopened.

Signed-off-by: David Howells <dhowells@redhat.com>
-
- 05 Jan, 2017 4 commits
-
-
Committed by David Howells
Show a call's hard-ACK cursors in /proc/net/rxrpc_calls so that a call's progress can be more easily monitored.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Add the following extra tracing information:

(1) Modify the rxrpc_transmit tracepoint to record the Tx window size as this is varied by the slow-start algorithm.

(2) Modify the rxrpc_rx_ack tracepoint to record more information from received ACK packets.

(3) Add an rxrpc_rx_data tracepoint to record the information in DATA packets.

(4) Add an rxrpc_disconnect_call tracepoint to record call disconnection, including the reason the call was disconnected.

(5) Add an rxrpc_improper_term tracepoint to record implicit termination of a call by a client starting a new call on a particular connection channel without first transmitting the final ACK for the previous call.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Fix the way enum values are translated into strings in AF_RXRPC tracepoints. The problem with just doing a lookup in a normal flat array of strings or chars is that external tracing infrastructure can't find it. Rather, TRACE_DEFINE_ENUM must be used.

Also sort the enums and string tables to make it easier to keep them in order so that a future patch to __print_symbolic() can be optimised to try a direct lookup into the table first before iterating over it.

A couple of _proto() macro calls are removed because they referred to tables that got moved to the tracing infrastructure. The relevant data can be found by way of tracing.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by yuan linyu
sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) are already aligned. Remove the use of CMSG_ALIGN(sizeof(struct cmsghdr)) and CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) to keep the code consistent.

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 Dec, 2016 1 commit
-
-
Committed by Matthew Wilcox
Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference to IDR_SIZE.

Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 08 Nov, 2016 1 commit
-
-
Committed by Paolo Abeni
A new argument is added to __skb_recv_datagram to provide an explicit skb destructor, invoked under the receive queue lock. The UDP protocol uses this argument to perform memory reclaiming on dequeue, so that the UDP protocol no longer sets skb->destructor. Instead, explicit memory reclaiming is performed at close() time and when skbs are removed from the receive queue.

The in-kernel UDP protocol users now need to call a skb_recv_udp() variant instead of skb_recv_datagram() to properly perform memory accounting on dequeue.

Overall, this allows the receive queue lock to be acquired only once on dequeue.

Tested using pktgen with random src port, 64 bytes packet, wire-speed on a 10G link as sender and udp_sink as the receiver, using an l4 tuple rxhash to stress the contention, and one or more udp_sink instances with reuseport.

nr sinks    vanilla    patched
1           440        560
3           2150       2300
6           3650       3800
9           4450       4600
12          6250       6450

v1 -> v2:
- do rmem and allocated memory scheduling under the receive lock
- do bulk scheduling in first_packet_length() and in udp_destruct_sock()
- avoid the typedef for the dequeue callback

Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 Oct, 2016 2 commits
-
-
Committed by David Howells
ip6_route_output() doesn't return a negative error when it fails, rather the ->error field of the returned dst_entry struct needs to be checked.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 75b54cb5 ("rxrpc: Add IPv6 support")
Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Fix the following checker warning:

net/rxrpc/call_object.c:279 rxrpc_new_client_call() warn: passing zero to 'ERR_PTR'

where a value that's always zero is passed to ERR_PTR() so that it can be passed to a tracepoint in an auxiliary pointer field.

Just pass NULL instead to the tracepoint.

Fixes: a84a46d7 ("rxrpc: Add some additional call tracing")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
-
- 06 Oct, 2016 12 commits
-
-
Committed by David Howells
Don't request an ACK on the last DATA packet of a call's Tx phase as for a client there will be a reply packet or some sort of ACK to shift phase. If the ACK is requested, OpenAFS sends a REQUESTED-ACK ACK with soft-ACKs in it and doesn't follow up with a hard-ACK.

If we don't set the flag, OpenAFS will send a DELAY ACK that hard-ACKs the reply data, thereby allowing the call to terminate cleanly.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
We need to generate a DELAY ACK from the service end of an operation if we start doing the actual operation work and it takes longer than expected. This will hard-ACK the request data and allow the client to release its resources.

To make this work:

(1) We have to set the ack timer and propose an ACK when the call moves to the RXRPC_CALL_SERVER_ACK_REQUEST state, and clear the pending ACK and cancel the timer when we start transmitting the reply (the first DATA packet of the reply implicitly ACKs the request phase).

(2) It must be possible to set the timer when the caller is holding call->state_lock, so split the lock-getting part of the timer function out.

(3) Add trace notes for the ACK we're requesting and the timer we clear.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
In rxrpc_kernel_recv_data(), when we return the error number incurred by a failed call, we must negate it before returning it as it's stored as positive (that's what we have to pass back to userspace).

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
The call's background processor work item needs to notify the socket when it completes a call so that recvmsg() or the AFS fs can deal with it. Without this, call expiry isn't handled.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
When a call expires, it must be queued for the background processor to deal with, otherwise a service call that is improperly terminated will just sit there awaiting an ACK and won't expire.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
OpenAFS doesn't always correctly terminate client calls that it makes - this includes calls the OpenAFS servers make to the cache manager service. It should end the client call with either:

(1) An ACK that has firstPacket set to one greater than the seq number of the reply DATA packet with the LAST_PACKET flag set (thereby hard-ACK'ing all packets). nAcks should be 0 and acks[] should be empty (ie. no soft-ACKs).

(2) An ACKALL packet.

OpenAFS, though, may send an ACK packet with firstPacket set to the last seq number or less and soft-ACKs listed for all packets up to and including the last DATA packet.

The transmitter, however, is obliged to keep the call live and the soft-ACK'd DATA packets around until they're hard-ACK'd as the receiver is permitted to drop any merely soft-ACK'd packet and request retransmission by sending an ACK packet with a NACK in it.

Further, OpenAFS will also terminate a client call by beginning the next client call on the same connection channel. This implicitly completes the previous call.

This patch handles implicit ACK of a call on a channel by the reception of the first packet of the next call on that channel.

If another call doesn't come along to implicitly ACK a call, then we have to time the call out. There are some bugs there that will be addressed in subsequent patches.

Signed-off-by: David Howells <dhowells@redhat.com>
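Condition (1) above can be written as a small predicate. This is a hedged sketch rather than the kernel's code: `struct ack_info` and `ack_completes_call()` are hypothetical, with `first_packet` and `n_acks` standing in for the firstPacket and nAcks fields named in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal view of an ACK packet. */
struct ack_info {
    uint32_t first_packet;  /* models firstPacket: first sequence number not yet hard-ACK'd */
    uint8_t  n_acks;        /* models nAcks: number of soft-ACK/NACK entries in acks[] */
};

/*
 * True if the ACK hard-ACKs every packet up to and including the DATA packet
 * that carried LAST_PACKET (sequence number last_seq) and lists no soft-ACKs.
 */
static bool ack_completes_call(const struct ack_info *ack, uint32_t last_seq)
{
    return ack->first_packet == last_seq + 1 && ack->n_acks == 0;
}
```

An ACK of the kind OpenAFS may send instead, with firstPacket at or below the last sequence number and soft-ACKs for the rest, fails this test, so the transmitter has to keep the packets around until an implicit ACK or a timeout.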
-
Committed by David Howells
Separate the output of PING ACKs from the output of other sorts of ACK so that if we receive a PING ACK and schedule transmission of a PING RESPONSE ACK, the response doesn't get cancelled by a PING ACK we happen to be scheduling transmission of at the same time.

If a PING RESPONSE gets lost, the other side might just sit there waiting for it and refuse to proceed otherwise.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Split rxrpc_send_data_packet() to separate ACK generation (which is more complicated) from ABORT generation. This simplifies the code a bit and fixes the following warning:

In file included from ../net/rxrpc/output.c:20:0:
net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/rxrpc/output.c:103:24: note: 'top' was declared here
net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
When a reply is deemed lost, we send a ping to find out whether the other end received all the request data packets we sent. This should be limited to client calls and we shouldn't do this on service calls.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
If a call comes in to a local endpoint that isn't listening for any incoming calls at the moment, an oops will happen. We need to check that the local endpoint's service pointer isn't NULL before we dereference it.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Remove a duplicate const keyword.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
struct rxrpc_local->service is marked __rcu - this means that accesses of it need to be managed using RCU wrappers. There are two such places in rxrpc_release_sock() where the value is checked and cleared. Fix this by using the appropriate wrappers.

Signed-off-by: David Howells <dhowells@redhat.com>
-
- 30 Sep, 2016 12 commits
-
-
Committed by David Howells
The call timer's concept of an inactive call timeout (of which there are three) is one that has the same expiration time as the call expiration timeout (the expiration timer is never inactive). However, I'm not resetting the timeouts when they expire, leading to repeated processing of expired timeouts when other timeout events occur.

Fix this by:

(1) Move the timer expiry detection into rxrpc_set_timer() inside the locked section. This means that if a timeout is set that will expire immediately, we deal with it immediately.

(2) If a timeout is at or before now then it has expired. When an expiry is detected, an event is raised, the timeout is automatically inactivated and the event processor is queued.

(3) If a timeout is at or after the expiry timeout then it is inactive. Inactive timeouts do not contribute to the timer setting.

(4) The call timer callback can now just call rxrpc_set_timer() to handle things.

(5) The call processor work function now checks the event flags rather than checking the timeouts directly.

Signed-off-by: David Howells <dhowells@redhat.com>
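Rules (2) and (3) can be sketched as follows. This is an illustrative model under assumptions, not rxrpc_set_timer() itself: the struct, field and event names are hypothetical, only two timeouts are modelled besides the expiry, times are plain 64-bit counters, and locking and queueing of the event processor are left out.

```c
#include <stdint.h>

/* Hypothetical event flags. */
enum {
    EV_ACK     = 1 << 0,
    EV_RESEND  = 1 << 1,
    EV_EXPIRED = 1 << 2,
};

/* Hypothetical per-call timer state. */
struct call_timers {
    uint64_t expire_at;  /* call expiry; never inactive */
    uint64_t ack_at;     /* inactive when >= expire_at */
    uint64_t resend_at;  /* inactive when >= expire_at */
    unsigned int events;
};

/* Apply the rules to one timeout and fold it into the next timer setting. */
static void check_timeout(struct call_timers *c, uint64_t *at, unsigned int ev,
                          uint64_t now, uint64_t *next)
{
    if (*at >= c->expire_at)
        return;                 /* rule (3): inactive, contributes nothing */
    if (*at <= now) {
        c->events |= ev;        /* rule (2): expired, raise the event... */
        *at = c->expire_at;     /* ...and inactivate so it is not reprocessed */
        return;
    }
    if (*at < *next)
        *next = *at;            /* still pending: candidate for the timer */
}

/* Returns the time at which the timer should next fire. */
static uint64_t set_timer(struct call_timers *c, uint64_t now)
{
    uint64_t next = c->expire_at;

    if (c->expire_at <= now)
        c->events |= EV_EXPIRED;
    check_timeout(c, &c->ack_at, EV_ACK, now, &next);
    check_timeout(c, &c->resend_at, EV_RESEND, now, &next);
    return next;
}
```

Because an expired timeout is reset to the expiry time when its event is raised, a second call at the same instant raises nothing new, which is the repeated-processing problem the commit sets out to fix.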
-
Committed by David Howells
Keep the call timeouts as ktimes rather than jiffies so that they can be expressed as functions of RTT.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Remove error from struct rxrpc_skb_priv as it is no longer used.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
The offset field in struct rxrpc_skb_priv is unnecessary as the value can always be calculated.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
When we receive an ACK from the peer that tells us what the peer's receive window (rwind) is, we should reduce ssthresh to rwind if rwind is smaller than ssthresh.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Switch to Congestion Avoidance mode at cwnd == ssthresh rather than relying on cwnd getting incremented beyond ssthresh and the window size, the mode being shifted and then cwnd being corrected.

We need to make sure we switch into CA mode so that we stop marking every packet for ACK.

Signed-off-by: David Howells <dhowells@redhat.com>
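This commit and the preceding rwind one both adjust the slow-start state machine, and the two rules can be sketched together. The sketch is a simplified model under assumptions, not the kernel's congestion code: `struct cong` and the function names are hypothetical, counts are in packets as the slow-start commit notes, and the growth step is one packet per newly ACK'd packet.

```c
#include <stdbool.h>

enum cong_mode { SLOW_START, CONGEST_AVOIDANCE };

/* Hypothetical congestion state, counted in packets. */
struct cong {
    enum cong_mode mode;
    unsigned int cwnd;      /* congestion window */
    unsigned int ssthresh;  /* slow-start threshold */
};

/* Peer advertised its receive window: never let ssthresh exceed it. */
static void cong_note_rwind(struct cong *c, unsigned int rwind)
{
    if (rwind < c->ssthresh)
        c->ssthresh = rwind;
}

/* One new packet was ACK'd while in slow start: grow, then switch as soon as cwnd reaches ssthresh. */
static void cong_on_ack(struct cong *c)
{
    if (c->mode != SLOW_START)
        return;
    if (c->cwnd < c->ssthresh)
        c->cwnd++;
    if (c->cwnd >= c->ssthresh)
        c->mode = CONGEST_AVOIDANCE;
}

/* In slow start every DATA packet asks for an ACK; in CA mode this stops. */
static bool cong_request_ack(const struct cong *c)
{
    return c->mode == SLOW_START;
}
```

Switching at equality means cwnd never has to overshoot ssthresh and be corrected afterwards, and the request-ACK marking stops as soon as the threshold is reached.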
-
Committed by David Howells
Note the serial number of the packet being ACK'd in the congestion management trace rather than the serial number of the ACK packet. Whilst the serial number of the ACK packet is useful for matching ACK packets in the output of wireshark, the serial number that the ACK is in response to is of more use in working out how different trace lines relate.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Set the request-ACK on more DATA packets whilst we're in slow start mode so that we get sufficient ACKs back to supply information to configure the window.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Reduce the rxrpc_local::services list to just a pointer as we don't permit multiple service endpoints to bind to a single transport endpoint (this is excluded by rxrpc_lookup_local()).

The reason we don't allow this is that if you send a request to an AFS filesystem service, it will try to talk back to your cache manager on the port you sent from (this is how file change notifications are handled). To prevent someone from stealing your CM callbacks, we don't let AF_RXRPC sockets share a UDP socket if at least one of them has a service bound.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
In rxrpc_activate_channels(), the connection cache state is checked outside of the lock, which means it can change whilst we're waking calls up, thereby changing whether or not we're allowed to wake calls up.

Fix this by moving the check inside the locked region. The check to see if all the channels are currently busy can stay outside of the locked region.

Whilst we're at it:

(1) Split the locked section out into its own function so that we can call it from other places in a later patch.

(2) Determine the mask of channels dependent on the state as we're going to add another state in a later patch that will restrict the number of simultaneous calls to 1 on a connection.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
In rxrpc_send_data_packet() make the loss-injection path return through the same code as the transmission path so that the RTT determination is initiated and any future timer shuffling will be done, despite the packet having been binned.

Whilst we're at it:

(1) Add to the tx_data tracepoint an indication of whether or not we're retransmitting a data packet.

(2) When we're deciding whether or not to request an ACK, rather than checking if we're in fast-retransmit mode check instead if we're retransmitting.

(3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're not altering the sk_buff refcount nor are we just seeing it after getting it off the Tx list.

(4) The rxrpc_skb_tx_lost note is then no longer used so remove it.

(5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Exclusive connections are currently reusable (which they shouldn't be) because rxrpc_alloc_client_connection() checks the exclusive flag in the rxrpc_connection struct before it's initialised from the function parameters. This means that the DONT_REUSE flag doesn't get set.

Fix this by checking the function parameters for the exclusive flag.

Signed-off-by: David Howells <dhowells@redhat.com>
-
- 25 Sep, 2016 4 commits
-
-
Committed by David Howells
Implement RxRPC slow-start, which is similar to RFC 5681 for TCP. A tracepoint is added to log the state of the congestion management algorithm and the decisions it makes.

Notes:

(1) Since we send fixed-size DATA packets (apart from the final packet in each phase), counters and calculations are in terms of packets rather than bytes.

(2) The ACK packet carries the equivalent of TCP SACK.

(3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly suited to SACK of a small number of packets. It seems that, almost inevitably, by the time three 'duplicate' ACKs have been seen, we have narrowed the loss down to one or two missing packets, and the FLIGHT_SIZE calculation ends up as 2.

(4) In rxrpc_resend(), if there was no data that apparently needed retransmission, we transmit a PING ACK to ask the peer to tell us what its Rx window state is.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
If we've sent all the request data in a client call but haven't seen any sign of the reply data yet, schedule an ACK to be sent to the server to find out if the reply data got lost.

If the server hasn't yet hard-ACK'd the request data, we send a PING ACK to demand a response to find out whether we need to retransmit.

If the server says it has received all of the data, we send an IDLE ACK to tell the server that we haven't received anything in the receive phase as yet.

To make this work, a non-immediate PING ACK must carry a delay. I've chosen the same as the IDLE ACK for the moment.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Generate a summary of the Tx buffer packet state when an ACK is received for use in a later patch that does congestion management.

Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
When determining the resend timer value, we have a value in nsec but the timer is in jiffies which may be a million or more times more coarse. nsecs_to_jiffies() rounds down - which means that the resend timeout expressed as jiffies is very likely earlier than the one expressed as nanoseconds from which it was derived.

The problem is that rxrpc_resend() gets triggered by the timer, but can't then find anything to resend yet. It sets the timer again - but gets kicked off immediately again and again until the nanosecond-based expiry time is reached and we actually retransmit.

Fix this by adding 1 to the jiffies-based resend_at value to counteract the rounding and make sure that the timer happens after the nanosecond-based expiry is passed.

Alternatives would be to adjust the timestamp on the packets to align with the jiffy scale or to switch back to using jiffy timestamps.

Signed-off-by: David Howells <dhowells@redhat.com>
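The rounding problem is easy to see with concrete numbers. The sketch below is a userspace model, not the kernel helpers: `TOY_HZ` and the function names are hypothetical, and a tick rate of 250 per second (4 ms per tick) is assumed purely for the arithmetic.

```c
#include <stdint.h>

#define TOY_HZ       250ull          /* assumed tick rate: 4 ms per tick */
#define NSEC_PER_SEC 1000000000ull
#define NSEC_PER_TICK (NSEC_PER_SEC / TOY_HZ)

/* Convert nanoseconds to ticks, rounding down (integer division). */
static uint64_t ns_to_ticks(uint64_t ns)
{
    return ns / NSEC_PER_TICK;
}

/* Convert ticks back to nanoseconds. */
static uint64_t ticks_to_ns(uint64_t ticks)
{
    return ticks * NSEC_PER_TICK;
}

/* Tick at which to fire the resend timer: add 1 to counteract the rounding. */
static uint64_t resend_tick(uint64_t resend_at_ns)
{
    return ns_to_ticks(resend_at_ns) + 1;
}
```

With a resend time of 10.5 ms, the rounded-down conversion gives tick 2 (8 ms), so a timer set there fires 2.5 ms early and finds nothing to resend. Adding 1 gives tick 3 (12 ms), which is after the nanosecond-based expiry.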
-