1. 06 Apr, 2017 1 commit
  2. 17 Mar, 2017 1 commit
    • D
      rxrpc: Ignore BUSY packets on old calls · 4d4a6ac7
      David Howells committed
      If we receive a BUSY packet for a call we think we've just completed, the
      packet is handed off to the connection processor to deal with - but the
      connection processor doesn't expect a BUSY packet and so flags a protocol
      error.
      
      Fix this by simply ignoring the BUSY packet for the moment.
      
      The symptom of this may appear as a system call failing with EPROTO.  This
      may be triggered by pressing ctrl-C under some circumstances.
      
      This comes about because we abort calls due to interruption by a signal (which
      we shouldn't do, but that's going to be a large fix and mostly in fs/afs/).
      What happens is that we abort the call and may also abort follow up calls
      too (this needs offloading somehow).  So we see a transmission of something
      like the following sequence of packets:
      
      	DATA for call N
      	ABORT call N
      	DATA for call N+1
      	ABORT call N+1
      
      in very quick succession on the same channel.  However, the peer may have
      deferred the processing of the ABORT from the call N to a background thread
      and thus sees the DATA message from the call N+1 coming in before it has
      cleared the channel.  Thus it sends a BUSY packet[*].
      
      [*] Note that some implementations (OpenAFS, for example) mark the BUSY
          packet with one plus the callNumber of the call prior to call N.
          Ordinarily, this would be call N, but there's no requirement for the
          calls on a channel to be numbered strictly sequentially (the number is
          required to increase).
      
          This is wrong and means that the callNumber in the BUSY packet should
          be ignored (it really ought to be N+1 since that's what it's in
          response to).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4d4a6ac7
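The effect of the fix can be sketched in userspace C.  This is a minimal illustration, not the kernel code: the enum, the function name and the set of handled types are assumptions made for the example, and a real handler deals with more packet types than are shown.

```c
#include <errno.h>
#include <stdint.h>

/* Rx packet type numbers; the identifier names are illustrative. */
enum pkt_type {
	PKT_TYPE_BUSY      = 3,
	PKT_TYPE_CHALLENGE = 6,
	PKT_TYPE_RESPONSE  = 7,
};

/*
 * Hypothetical connection-level handler for a packet that arrived for a
 * call we consider finished.  Returns 0 if the packet was dealt with (or
 * deliberately dropped) and -EPROTO for anything it does not expect.
 * Simplified: only the cases relevant to the illustration are listed.
 */
int conn_handle_old_call_packet(uint8_t type)
{
	switch (type) {
	case PKT_TYPE_CHALLENGE:
	case PKT_TYPE_RESPONSE:
		return 0;	/* security negotiation, handled elsewhere */
	case PKT_TYPE_BUSY:
		return 0;	/* stale BUSY: ignore it instead of flagging an error */
	default:
		return -EPROTO;	/* unexpected type: protocol error */
	}
}
```

Without the BUSY case the packet would fall through to the default branch, which is how the EPROTO failure described above would reach the caller.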
  3. 11 Mar, 2017 1 commit
    • D
      rxrpc: Wake up the transmitter if Rx window size increases on the peer · 702f2ac8
      David Howells committed
      The RxRPC ACK packet may contain an extension that includes the peer's
      current Rx window size for this call.  We adjust the local Tx window size
      to match.  However, the transmitter can stall if the receive window is
      reduced to 0 by the peer and then reopened.
      
      This is because the normal way that the transmitter is re-energised is by
      dropping something out of our Tx queue and thus making space.  When a
      single gap is made, the transmitter is woken up.  However, because there's
      nothing in the Tx queue at this point, this doesn't happen.
      
      To fix this, perform a wake_up() any time we see the peer's Rx window size
      increasing.
      
      The observable symptom is that calls start failing with ETIMEDOUT and the
      following:
      
      	kAFS: SERVER DEAD state=-62
      
      appears in dmesg.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      702f2ac8
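A rough userspace model of the rule "wake the transmitter whenever the peer's advertised Rx window grows" follows.  The struct, its fields and the function name are hypothetical, and a counter stands in for the kernel's wake_up().

```c
#include <stdbool.h>
#include <stdint.h>

struct tx_state {
	uint32_t tx_winsize;	/* our Tx window, tracking the peer's Rx window */
	unsigned int wakeups;	/* counts wake-ups in place of a real wake_up() */
};

/*
 * Apply the Rx window size advertised in a received ACK.  Returns true if
 * the transmitter was woken, which happens only when the window grows.
 */
bool apply_peer_rwind(struct tx_state *tx, uint32_t rwind)
{
	bool grew = rwind > tx->tx_winsize;

	tx->tx_winsize = rwind;
	if (grew)
		tx->wakeups++;	/* wake even if the Tx queue is empty */
	return grew;
}
```

The point of the sketch is that the wake-up is tied to the window change itself, so a 0 -> N transition is noticed even when nothing is being removed from the Tx queue.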
  4. 10 Mar, 2017 1 commit
  5. 08 Mar, 2017 1 commit
  6. 04 Mar, 2017 1 commit
  7. 02 Mar, 2017 2 commits
    • I
      sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1
      Ingo Molnar committed
      sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
      
      Fix up affected files that include this signal functionality via sched.h.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      174cd4b1
    • D
      rxrpc: Fix deadlock between call creation and sendmsg/recvmsg · 540b1c48
      David Howells committed
      All the routines by which rxrpc is accessed from the outside are serialised
      by means of the socket lock (sendmsg, recvmsg, bind,
      rxrpc_kernel_begin_call(), ...) and this presents a problem:
      
       (1) If a number of calls on the same socket are in the process of
           connecting to the same peer, a maximum of four concurrent live calls
           are permitted before further calls need to wait for a slot.
      
       (2) If a call is waiting for a slot, it is deep inside sendmsg() or
           rxrpc_kernel_begin_call() and the entry function is holding the socket
           lock.
      
       (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
           from servicing the other calls as they need to take the socket lock to
           do so.
      
       (4) The socket is stuck until a call is aborted and makes its slot
           available to the waiter.
      
      Fix this by:
      
       (1) Provide each call with a mutex ('user_mutex') that arbitrates access
           by the users of rxrpc separately for each specific call.
      
       (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
           they've got a call and taken its mutex.
      
           Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
           set but someone else has the lock.  Should I instead only return
           EWOULDBLOCK if there's nothing currently to be done on a socket, and
           sleep in this particular instance because there is something to be
           done, but we appear to be blocked by the interrupt handler doing its
           ping?
      
       (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
           call, locking its user mutex and adding it to the socket's call tree.
           The call is returned locked so that sendmsg() can add data to it
           immediately.
      
           From the moment the call is in the socket tree, it is subject to
           access by sendmsg() and recvmsg() - even if it isn't connected yet.
      
       (4) Lock new service calls in the UDP data_ready handler (in
           rxrpc_new_incoming_call()) because they may already be in the socket's
           tree and the data_ready handler makes them live immediately if a user
           ID has already been preassigned.
      
           Note that the new call is locked before any notifications are sent
           that it is live, so doing mutex_trylock() *ought* to always succeed.
           Userspace is prevented from doing sendmsg() on calls that are in a
           too-early state in rxrpc_do_sendmsg().
      
       (5) Make rxrpc_new_incoming_call() return the call with the user mutex
           held so that a ping can be scheduled immediately under it.
      
           Note that it might be worth moving the ping call into
           rxrpc_new_incoming_call() and then we can drop the mutex there.
      
       (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
           release the socket after adding the call to the socket's tree.  This
           is slightly tricky as we've dequeued the call by that point and have
           to requeue it.
      
           Note that requeuing emits a trace event.
      
       (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
           new mutex immediately and don't bother with the socket mutex at all.
      
      This patch has the nice bonus that calls on the same socket are now to some
      extent parallelisable.
      
      Note that we might want to move rxrpc_service_prealloc() calls out from the
      socket lock and give it its own lock, so that we don't hang progress in
      other calls because we're waiting for the allocator.
      
      We probably also want to avoid calling rxrpc_notify_socket() from within
      the socket lock (rxrpc_accept_call()).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Marc Dionne <marc.c.dionne@auristor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      540b1c48
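The locking hand-off in steps (1) and (2) of the fix can be sketched with POSIX mutexes.  This is only a userspace analogy under stated assumptions: the struct layouts and function names are invented for the example, and a linear array stands in for the socket's call tree.

```c
#include <pthread.h>
#include <stddef.h>

struct call {
	pthread_mutex_t user_mutex;	/* arbitrates user access to this call */
	unsigned long user_id;
};

struct sock {
	pthread_mutex_t lock;		/* the socket lock */
	struct call *calls;		/* stand-in for the socket's call tree */
	size_t nr_calls;
};

/*
 * Look a call up under the socket lock, take the call's own mutex, then
 * drop the socket lock before doing any work on the call.  Returns the
 * call with user_mutex held, or NULL if no call has that ID.
 */
struct call *sock_get_locked_call(struct sock *sk, unsigned long user_id)
{
	struct call *found = NULL;

	pthread_mutex_lock(&sk->lock);
	for (size_t i = 0; i < sk->nr_calls; i++) {
		if (sk->calls[i].user_id == user_id) {
			found = &sk->calls[i];
			break;
		}
	}
	if (found)
		pthread_mutex_lock(&found->user_mutex);
	pthread_mutex_unlock(&sk->lock);	/* other calls can now be serviced */
	return found;
}

/* Release a call obtained from sock_get_locked_call(). */
void call_unlock(struct call *call)
{
	pthread_mutex_unlock(&call->user_mutex);
}
```

Because the socket lock is released before the caller sleeps or copies data, a call waiting for a slot no longer prevents other calls on the same socket from being serviced.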
  8. 27 Feb, 2017 1 commit
    • D
      rxrpc: Kernel calls get stuck in recvmsg · d7e15835
      David Howells committed
      Calls made through the in-kernel interface can end up getting stuck because
      of a missed variable update in a loop in rxrpc_recvmsg_data().  The problem
      is like this:
      
       (1) A new packet comes in and doesn't cause a notification to be given to
           the client as there's still another packet in the ring - the
           assumption being that the client will keep drawing off data until
           the ring is empty.
      
       (2) The client is in rxrpc_recvmsg_data(), inside the big while loop that
           iterates through the packets.  This copies the window pointers into
           variables rather than using the information in the call struct
           because:
      
           (a) MSG_PEEK might be in effect;
      
           (b) we need a barrier after reading call->rx_top to pair with the
           	 barrier in the softirq routine that loads the buffer.
      
       (3) The reading of call->rx_top is done outside of the loop, and top is
           never updated whilst we're in the loop.  This means that even though
           there's a new packet available, we don't see it and may return -EFAULT
           to the caller - who will happily return to the scheduler and await the
           next notification.
      
       (4) No further notifications are forthcoming until there's an abort as the
           ring isn't empty.
      
      The fix is to move the read of call->rx_top inside the loop - but it needs
      to be done before the condition is checked.
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7e15835
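The shape of the fix, re-reading the top pointer on every iteration and before the loop condition is evaluated, can be modelled in userspace with C11 atomics.  The ring structure, the hook mechanism and all names below are assumptions for illustration; sequence-number wraparound is ignored.

```c
#include <stdatomic.h>
#include <stdint.h>

struct rx_ring {
	_Atomic uint32_t rx_top;	/* highest sequence queued by the producer */
	uint32_t rx_hard_ack;		/* highest sequence consumed so far */
};

typedef void (*rx_hook_t)(struct rx_ring *ring);

/*
 * Consume every packet in the ring, including ones that arrive while we
 * are looping.  The acquire load pairs with the producer's release store.
 * after_each is a test hook that runs after each packet is consumed.
 */
unsigned int rx_drain(struct rx_ring *ring, rx_hook_t after_each)
{
	unsigned int consumed = 0;

	for (;;) {
		/* Re-read top each time round, before checking the condition */
		uint32_t top = atomic_load_explicit(&ring->rx_top,
						    memory_order_acquire);

		if (ring->rx_hard_ack >= top)
			break;
		ring->rx_hard_ack++;
		consumed++;
		if (after_each)
			after_each(ring);
	}
	return consumed;
}

/* Stand-in for the softirq producer: queue packets until three exist. */
void deliver_up_to_three(struct rx_ring *ring)
{
	if (atomic_load_explicit(&ring->rx_top, memory_order_relaxed) < 3)
		atomic_fetch_add_explicit(&ring->rx_top, 1,
					  memory_order_release);
}
```

Had top been sampled once before the loop, a drain that started with one packet queued would stop after that packet and leave the later arrivals sitting in the ring with no further notification due.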
  9. 25 Feb, 2017 1 commit
    • M
      rxrpc: Fix an assertion in rxrpc_read() · 774521f3
      Marc Dionne committed
      In the rxrpc_read() function, which allows a user to read the contents of a
      key, we miscalculate the expected length of an encoded rxkad token by not
      taking into account the key length.  However, the data is stored later
      anyway with an ENCODE_DATA() call - and an assertion failure then ensues
      when the lengths are checked at the end.
      
      Fix this by including the key length in the token size estimation.
      
      The following assertion is produced:
      
      Assertion failed - 384(0x180) == 380(0x17c) is false
      ------------[ cut here ]------------
      kernel BUG at ../net/rxrpc/key.c:1221!
      invalid opcode: 0000 [#1] SMP
      Modules linked in:
      CPU: 2 PID: 2957 Comm: keyctl Not tainted 4.10.0-fscache+ #483
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8804013a8500 task.stack: ffff8804013ac000
      RIP: 0010:rxrpc_read+0x10de/0x11b6
      RSP: 0018:ffff8804013afe48 EFLAGS: 00010296
      RAX: 000000000000003b RBX: 0000000000000003 RCX: 0000000000000000
      RDX: 0000000000040001 RSI: 00000000000000f6 RDI: 0000000000000300
      RBP: ffff8804013afed8 R08: 0000000000000001 R09: 0000000000000001
      R10: ffff8804013afd90 R11: 0000000000000002 R12: 00005575f7c911b4
      R13: 00005575f7c911b3 R14: 0000000000000157 R15: ffff880408a5d640
      FS:  00007f8dfbc73700(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00005575f7c91008 CR3: 000000040120a000 CR4: 00000000001406e0
      Call Trace:
       keyctl_read_key+0xb6/0xd7
       SyS_keyctl+0x83/0xe7
       do_syscall_64+0x80/0x191
       entry_SYSCALL64_slow_path+0x25/0x25
      Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      774521f3
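The general hazard, a size estimate that disagrees with what the encoder later writes, can be shown with a generic XDR-style opaque field.  This is an illustration only and not the rxkad token layout: the helper names are hypothetical, and it simply demonstrates that an estimate must count the 4-byte length word as well as the padded payload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a byte count up to the next multiple of 4, as XDR requires. */
#define XDR_PAD(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Estimated encoded size of an opaque field: length word plus padded data. */
size_t xdr_opaque_size(size_t len)
{
	return 4 + XDR_PAD(len);
}

/*
 * Encode an opaque field as a big-endian length word followed by the data,
 * zero-padded to a 4-byte boundary.  Returns the number of bytes written.
 */
size_t xdr_encode_opaque(uint8_t *out, const void *data, uint32_t len)
{
	size_t padded = XDR_PAD(len);

	out[0] = (uint8_t)(len >> 24);
	out[1] = (uint8_t)(len >> 16);
	out[2] = (uint8_t)(len >> 8);
	out[3] = (uint8_t)len;
	memset(out + 4, 0, padded);
	memcpy(out + 4, data, len);
	return 4 + padded;
}
```

An estimate that omitted the leading word would come out 4 bytes short of what the encoder writes, which is the size of the discrepancy in the assertion above (384 versus 380).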
  10. 18 Feb, 2017 1 commit
  11. 09 Jan, 2017 1 commit
  12. 05 Jan, 2017 4 commits
    • D
      rxrpc: Show a call's hard-ACK cursors in /proc/net/rxrpc_calls · 3e018daf
      David Howells committed
      Show a call's hard-ACK cursors in /proc/net/rxrpc_calls so that a call's
      progress can be more easily monitored.
      Signed-off-by: David Howells <dhowells@redhat.com>
      3e018daf
    • D
      rxrpc: Add some more tracing · b1d9f7fd
      David Howells committed
      Add the following extra tracing information:
      
       (1) Modify the rxrpc_transmit tracepoint to record the Tx window size as
           this is varied by the slow-start algorithm.
      
       (2) Modify the rxrpc_rx_ack tracepoint to record more information from
           received ACK packets.
      
       (3) Add an rxrpc_rx_data tracepoint to record the information in DATA
           packets.
      
       (4) Add an rxrpc_disconnect_call tracepoint to record call disconnection,
           including the reason the call was disconnected.
      
       (5) Add an rxrpc_improper_term tracepoint to record implicit termination
           of a call by a client by starting a new call on a particular
           connection channel without first transmitting the final ACK for the
           previous call.
      Signed-off-by: David Howells <dhowells@redhat.com>
      b1d9f7fd
    • D
      rxrpc: Fix handling of enums-to-string translation in tracing · b54a134a
      David Howells committed
      Fix the way enum values are translated into strings in AF_RXRPC
      tracepoints.  The problem with just doing a lookup in a normal flat array
      of strings or chars is that external tracing infrastructure can't find it.
      Rather, TRACE_DEFINE_ENUM must be used.
      
      Also sort the enums and string tables to make it easier to keep them in
      order so that a future patch to __print_symbolic() can be optimised to try
      a direct lookup into the table first before iterating over it.
      
      A couple of _proto() macro calls are removed because they referred to tables
      that got moved to the tracing infrastructure.  The relevant data can be
      found by way of tracing.
      Signed-off-by: David Howells <dhowells@redhat.com>
      b54a134a
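Keeping an enum and its string table in step is commonly done with a single list macro that is expanded twice.  The sketch below shows that general technique in plain userspace C; it does not use TRACE_DEFINE_ENUM or __print_symbolic(), and the list contents and names are made up for the example.

```c
#include <stddef.h>

/* Single source of truth: one entry per enum value and its display string. */
#define CALL_TRACES(X)		\
	X(call_new,   "NEW")	\
	X(call_put,   "PUT")	\
	X(call_queue, "QUE")

/* First expansion: the enum itself. */
enum call_trace {
#define X(name, str) name,
	CALL_TRACES(X)
#undef X
	call_trace__nr
};

/* Second expansion: a string table indexed by enum value. */
static const char *const call_trace_strings[] = {
#define X(name, str) [name] = str,
	CALL_TRACES(X)
#undef X
};

/* Direct table lookup; returns "?" for out-of-range values. */
const char *call_trace_name(enum call_trace t)
{
	return (size_t)t < call_trace__nr ? call_trace_strings[t] : "?";
}
```

Because both expansions come from the same list, the table order always matches the enum order, which is the property a direct-lookup optimisation relies on.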
    • Y
      scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr)) · 1ff8cebf
      yuan linyu committed
      sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) are already aligned.
      Remove the use of CMSG_ALIGN(sizeof(struct cmsghdr)) and
      CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) to keep the code consistent.
      Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ff8cebf
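The same property can be checked from userspace, assuming the platform's CMSG_LEN() macro adds the aligned header size to the payload length, as it does on common libcs.  The function name is hypothetical.

```c
#include <stdbool.h>
#include <sys/socket.h>

/*
 * CMSG_LEN(0) is the aligned size of the control-message header with no
 * payload.  If the header is already aligned, this equals the plain sizeof.
 */
bool cmsghdr_size_is_aligned(void)
{
	return CMSG_LEN(0) == sizeof(struct cmsghdr);
}
```

When this holds, wrapping sizeof(struct cmsghdr) in an alignment macro changes nothing, which is the redundancy the patch removes on the kernel side.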
  13. 15 Dec, 2016 1 commit
  14. 08 Nov, 2016 1 commit
    • P
      udp: do fwd memory scheduling on dequeue · 7c13f97f
      Paolo Abeni committed
      A new argument is added to __skb_recv_datagram to provide
      an explicit skb destructor, invoked under the receive queue
      lock.
      The UDP protocol uses this argument to perform memory
      reclaiming on dequeue, so that the UDP protocol no longer
      sets skb->destructor.
      Instead explicit memory reclaiming is performed at close() time and
      when skbs are removed from the receive queue.
      The in kernel UDP protocol users now need to call a
      skb_recv_udp() variant instead of skb_recv_datagram() to
      properly perform memory accounting on dequeue.
      
      Overall, this allows acquiring the receive queue lock
      only once on dequeue.
      
      Tested using pktgen with random src port, 64 bytes packet,
      wire-speed on a 10G link as sender and udp_sink as the receiver,
      using an l4 tuple rxhash to stress the contention, and one or more
      udp_sink instances with reuseport.
      
      nr sinks	vanilla		patched
      1		440		560
      3		2150		2300
      6		3650		3800
      9		4450		4600
      12		6250		6450
      
      v1 -> v2:
       - do rmem and allocated memory scheduling under the receive lock
       - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
       - avoid the typedef for the dequeue callback
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c13f97f
  15. 13 Oct, 2016 2 commits
  16. 06 Oct, 2016 12 commits
    • D
      rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase · bf7d620a
      David Howells committed
      Don't request an ACK on the last DATA packet of a call's Tx phase as for a
      client there will be a reply packet or some sort of ACK to shift phase.  If
      the ACK is requested, OpenAFS sends a REQUESTED-ACK ACK with soft-ACKs in
      it and doesn't follow up with a hard-ACK.
      
      If we don't set the flag, OpenAFS will send a DELAY ACK that hard-ACKs the
      reply data, thereby allowing the call to terminate cleanly.
      Signed-off-by: David Howells <dhowells@redhat.com>
      bf7d620a
    • D
      rxrpc: Need to produce an ACK for service op if op takes a long time · 9749fd2b
      David Howells committed
      We need to generate a DELAY ACK from the service end of an operation if we
      start doing the actual operation work and it takes longer than expected.
      This will hard-ACK the request data and allow the client to release its
      resources.
      
      To make this work:
      
       (1) We have to set the ack timer and propose an ACK when the call moves to
           the RXRPC_CALL_SERVER_ACK_REQUEST state and clear the pending ACK and cancel
           the timer when we start transmitting the reply (the first DATA packet
           of the reply implicitly ACKs the request phase).
      
       (2) It must be possible to set the timer when the caller is holding
           call->state_lock, so split the lock-getting part of the timer function
           out.
      
       (3) Add trace notes for the ACK we're requesting and the timer we clear.
      Signed-off-by: David Howells <dhowells@redhat.com>
      9749fd2b
    • D
      rxrpc: Return negative error code to kernel service · cf69207a
      David Howells committed
      In rxrpc_kernel_recv_data(), when we return the error number incurred by a
      failed call, we must negate it before returning it as it's stored as
      positive (that's what we have to pass back to userspace).
      Signed-off-by: David Howells <dhowells@redhat.com>
      cf69207a
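The sign convention can be sketched as follows.  The struct and function names are hypothetical; the only point is that one stored positive value is presented differently to the two kinds of consumer.

```c
#include <errno.h>

struct call_outcome {
	int error;	/* 0 on success, otherwise a positive errno value */
};

/* Userspace is handed the stored value as it is, i.e. positive. */
int user_visible_error(const struct call_outcome *c)
{
	return c->error;
}

/* In-kernel callers expect the usual negative-errno return convention. */
int kernel_visible_error(const struct call_outcome *c)
{
	return -c->error;
}
```

Returning the stored value unnegated to a kernel service would make a failure look like a positive (successful) return value.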
    • D
      rxrpc: Add missing notification · 94bc669e
      David Howells committed
      The call's background processor work item needs to notify the socket when
      it completes a call so that recvmsg() or the AFS fs can deal with it.
      Without this, call expiry isn't handled.
      Signed-off-by: David Howells <dhowells@redhat.com>
      94bc669e
    • D
      rxrpc: Queue the call on expiry · d7833d00
      David Howells committed
      When a call expires, it must be queued for the background processor to deal
      with; otherwise a service call that is improperly terminated will just sit
      there awaiting an ACK and won't expire.
      Signed-off-by: David Howells <dhowells@redhat.com>
      d7833d00
    • D
      rxrpc: Partially handle OpenAFS's improper termination of calls · b3156274
      David Howells committed
      OpenAFS doesn't always correctly terminate client calls that it makes -
      this includes calls the OpenAFS servers make to the cache manager service.
      It should end the client call with either:
      
       (1) An ACK that has firstPacket set to one greater than the seq number of
           the reply DATA packet with the LAST_PACKET flag set (thereby
           hard-ACK'ing all packets).  nAcks should be 0 and acks[] should be
           empty (ie. no soft-ACKs).
      
       (2) An ACKALL packet.
      
      OpenAFS, though, may send an ACK packet with firstPacket set to the last
      seq number or less and soft-ACKs listed for all packets up to and including
      the last DATA packet.
      
      The transmitter, however, is obliged to keep the call live and the
      soft-ACK'd DATA packets around until they're hard-ACK'd as the receiver is
      permitted to drop any merely soft-ACK'd packet and request retransmission
      by sending an ACK packet with a NACK in it.
      
      Further, OpenAFS will also terminate a client call by beginning the next
      client call on the same connection channel.  This implicitly completes the
      previous call.
      
      This patch handles implicit ACK of a call on a channel by the reception of
      the first packet of the next call on that channel.
      
      If another call doesn't come along to implicitly ACK a call, then we have
      to time the call out.  There are some bugs there that will be addressed in
      subsequent patches.
      Signed-off-by: David Howells <dhowells@redhat.com>
      b3156274
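Implicit completion by the next call on a channel can be modelled roughly as below.  The channel structure, its fields and the function name are assumptions for the sketch, and call-number wraparound is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

struct channel {
	uint32_t call_id;		/* call currently occupying the channel */
	bool awaiting_final_ack;	/* reply sent but not yet hard-ACK'd */
	unsigned int implicit_acks;	/* calls completed by a successor call */
};

/*
 * Handle the first packet seen for call_id on this channel.  A packet for
 * a later call implicitly ACKs whatever the previous call was still
 * waiting for.  Returns true if the packet starts a new call.
 */
bool channel_rx_first_packet(struct channel *chan, uint32_t call_id)
{
	if (call_id <= chan->call_id)
		return false;		/* belongs to the current or an old call */

	if (chan->awaiting_final_ack) {
		chan->awaiting_final_ack = false;
		chan->implicit_acks++;	/* previous call is now complete */
	}
	chan->call_id = call_id;
	return true;
}
```

If no successor call ever arrives, nothing in this path completes the previous call, which is why the timeout handling mentioned above is still needed.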
    • D
      rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs · a5af7e1f
      David Howells committed
      Separate the output of PING ACKs from the output of other sorts of ACK so
      that if we receive a PING ACK and schedule transmission of a PING RESPONSE
      ACK, the response doesn't get cancelled by a PING ACK we happen to be
      scheduling transmission of at the same time.
      
      If a PING RESPONSE gets lost, the other side might just sit there waiting
      for it and refuse to proceed otherwise.
      Signed-off-by: David Howells <dhowells@redhat.com>
      a5af7e1f
    • D
      rxrpc: Fix warning by splitting rxrpc_send_call_packet() · 26cb02aa
      David Howells committed
      Split rxrpc_send_call_packet() to separate ACK generation (which is more
      complicated) from ABORT generation.  This simplifies the code a bit and
      fixes the following warning:
      
      In file included from ../net/rxrpc/output.c:20:0:
      net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
      net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      net/rxrpc/output.c:103:24: note: 'top' was declared here
      net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
      26cb02aa
    • D
      rxrpc: Only ping for lost reply in client call · a9f312d9
      David Howells committed
      When a reply is deemed lost, we send a ping to find out whether the other end
      received all the request data packets we sent.  This should be limited to
      client calls and we shouldn't do this on service calls.
      Signed-off-by: David Howells <dhowells@redhat.com>
      a9f312d9
    • D
      rxrpc: Fix oops on incoming call to serviceless endpoint · 7212a57e
      David Howells committed
      If a call comes in to a local endpoint that isn't listening for any
      incoming calls at the moment, an oops will happen.  We need to check that
      the local endpoint's service pointer isn't NULL before we dereference it.
      Signed-off-by: David Howells <dhowells@redhat.com>
      7212a57e
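The guard amounts to a NULL check before the first dereference.  A minimal sketch, with hypothetical types and with error codes chosen only for illustration:

```c
#include <errno.h>
#include <stddef.h>

struct service {
	unsigned int backlog;	/* preallocated slots for incoming calls */
};

struct local_endpoint {
	struct service *service;	/* NULL when nothing is listening */
};

/*
 * Decide whether an incoming call can be handed to a service.  Returns 0
 * if it can, or a negative errno if it has to be rejected.
 */
int route_incoming_call(const struct local_endpoint *local)
{
	const struct service *svc = local->service;

	if (!svc)
		return -ECONNREFUSED;	/* no listener: reject, don't dereference */
	return svc->backlog > 0 ? 0 : -EBUSY;
}
```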
    • D
      rxrpc: Fix duplicate const · 19c0dbd5
      David Howells committed
      Remove a duplicate const keyword.
      Signed-off-by: David Howells <dhowells@redhat.com>
      19c0dbd5
    • D
      rxrpc: Accesses of rxrpc_local::service need to be RCU managed · b63452c1
      David Howells committed
      struct rxrpc_local->service is marked __rcu - this means that accesses of
      it need to be managed using RCU wrappers.  There are two such places in
      rxrpc_release_sock() where the value is checked and cleared.  Fix this by
      using the appropriate wrappers.
      Signed-off-by: David Howells <dhowells@redhat.com>
      b63452c1
  17. 30 Sep, 2016 8 commits