1. 28 September 2018 (3 commits)
    • rxrpc: Make service call handling more robust · 0099dc58
      Committed by David Howells
      Make the following changes to improve the robustness of the code that sets
      up a new service call:
      
       (1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a
           service ID check and pass that along to rxrpc_new_incoming_call().
           This means that I can remove the check from rxrpc_new_incoming_call()
           without the need to worry about the socket attached to the local
           endpoint getting replaced - which would invalidate the check.
      
       (2) Cache the rxrpc_peer struct, thereby allowing the peer search to be
           done once.  The peer is passed to rxrpc_new_incoming_call(), thereby
           saving the need to repeat the search.
      
           This also reduces the possibility of rxrpc_publish_service_conn()
           BUG()'ing due to the detection of a duplicate connection, despite the
           initial search done by rxrpc_find_connection_rcu() having turned up
           nothing.
      
           This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be
           non-reentrant and the result of the initial search should still hold
           true, but it has proven possible to hit.
      
           I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short
           the iteration over the hash table if it finds a matching peer with a
           zero usage count, but I don't know for sure since it's only ever been
           hit once that I know of.
      
           Another possibility is that a bug in rxrpc_data_ready() that checked
           the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag
           might've let through a packet that caused a spurious and invalid call
           to be set up.  That is addressed in another patch.
      
       (3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero
           usage count rather than stopping and returning not found, just in case
           there's another peer record behind it in the bucket.
      
       (4) Don't search the peer records in rxrpc_alloc_incoming_call(), but
           rather either use the peer cached in (2) or, if one wasn't found,
           preemptively install a new one.
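
      As a rough illustration of the lookup change in (3), here is a sketch
      with hypothetical hash-table and comparator names (peer_hash,
      peer_matches); the real code lives in net/rxrpc/peer_object.c:

      	/* Sketch: walk the RCU hash bucket, skipping peers whose usage count
      	 * has already hit zero rather than stopping at them, so that a live
      	 * record later in the same bucket can still be found.
      	 */
      	static struct rxrpc_peer *peer_lookup_sketch(struct rxrpc_local *local, const struct sockaddr_rxrpc *srx, unsigned long hash_key)
      	{
      		struct rxrpc_peer *peer;

      		hash_for_each_possible_rcu(peer_hash, peer, hash_link, hash_key) {
      			if (peer_matches(peer, local, srx) &&	/* hypothetical comparator */
      			    atomic_read(&peer->usage) > 0)	/* skip dead records */
      				return peer;
      		}
      		return NULL;
      	}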
      
      Fixes: 8496af50 ("rxrpc: Use RCU to access a peer's service connection tree")
      Signed-off-by: David Howells <dhowells@redhat.com>
      0099dc58
    • rxrpc: Emit BUSY packets when supposed to rather than ABORTs · ece64fec
      Committed by David Howells
      In the input path, a received sk_buff can be marked for rejection by
      setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data
      (such as an abort code) in skb->priority.  The rejection is handled by
      queueing the sk_buff up for dealing with in process context.  The output
      code reads the mark and priority and, theoretically, generates an
      appropriate response packet.
      
      However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT
      message with a random abort code is generated (since skb->priority wasn't
      set to anything).
      
      Fix this by outputting the appropriate sort of packet.
      
      Also, whilst we're at it, most of the marks are no longer used, so remove
      them and rename the remaining two to something more obvious.
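
      A rough sketch of the dispatch the reject path needs, using the renamed
      mark values (a sketch only; the real code is in net/rxrpc/output.c):

      	/* Sketch: choose the response packet type from the mark that the
      	 * input path set on the rejected sk_buff.
      	 */
      	static void fill_reject_sketch(struct sk_buff *skb, struct rxrpc_wire_header *whdr, __be32 *code, size_t *size)
      	{
      		switch (skb->mark) {
      		case RXRPC_SKB_MARK_REJECT_BUSY:
      			whdr->type = RXRPC_PACKET_TYPE_BUSY;	/* no payload needed */
      			*size = sizeof(*whdr);
      			break;
      		case RXRPC_SKB_MARK_REJECT_ABORT:
      			whdr->type = RXRPC_PACKET_TYPE_ABORT;
      			*code = htonl(skb->priority);	/* abort code from skb->priority */
      			*size = sizeof(*whdr) + sizeof(*code);
      			break;
      		}
      	}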
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
      ece64fec
    • rxrpc: Fix checks as to whether we should set up a new call · dc71db34
      Committed by David Howells
      There's a check in rxrpc_data_ready() that's checking the CLIENT_INITIATED
      flag in the packet type field rather than in the packet flags field.
      
      Fix this by creating a pair of helper functions to check whether the packet
      is going to the client or to the server and use them generally.
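
      The helper pair might look something like the following sketch, testing
      the flags field of the parsed header rather than the packet type:

      	/* Sketch: direction checks keyed on the RXRPC_CLIENT_INITIATED flag. */
      	static inline bool rxrpc_to_server(const struct rxrpc_skb_priv *sp)
      	{
      		return sp->hdr.flags & RXRPC_CLIENT_INITIATED;
      	}

      	static inline bool rxrpc_to_client(const struct rxrpc_skb_priv *sp)
      	{
      		return !rxrpc_to_server(sp);
      	}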
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
      dc71db34
  2. 09 August 2018 (1 commit)
    • rxrpc: Fix the keepalive generator [ver #2] · 330bdcfa
      Committed by David Howells
      AF_RXRPC has a keepalive message generator that generates a message for a
      peer ~20s after the last transmission to that peer to keep firewall ports
      open.  The implementation is incorrect in the following ways:
      
       (1) It mixes up ktime_t and time64_t types.
      
       (2) It uses ktime_get_real(), the output of which may jump forward or
           backward due to adjustments to the time of day.
      
       (3) If the current time jumps forward too much or jumps backwards, the
           generator function will crank the base of the time ring round one slot
           at a time (ie. a 1s period) until it catches up, spewing out VERSION
           packets as it goes.
      
      Fix the problem by:
      
       (1) Only using time64_t.  There's no need for sub-second resolution.
      
       (2) Use ktime_get_seconds() rather than ktime_get_real() so that time
           isn't perceived to go backwards.
      
       (3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
           parts:
      
           (a) The "worker" function that manages the buckets and the timer.
      
           (b) The "dispatch" function that takes the pending peers and
           	 potentially transmits a keepalive packet before putting them back
           	 in the ring into the slot appropriate to the revised last-Tx time.
      
       (4) Taking everything that's pending out of the ring and splicing it into
           a temporary collector list for processing.
      
           In the case that there's been a significant jump forward, the ring
           gets entirely emptied and then the time base can be warped forward
           before the peers are processed.
      
           The warping can't happen if the ring isn't empty because the slot a
           peer is in is keepalive-time dependent, relative to the base time.
      
       (5) Limit the number of iterations of the bucket array when scanning it.
      
       (6) Set the timer to skip any empty slots as there's no point waking up if
           there's nothing to do yet.
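
      A simplified sketch of the bucket arithmetic implied by (1)-(4), keeping
      everything in time64_t seconds (the ring size here is an illustrative
      assumption):

      	#define RXRPC_KEEPALIVE_TIME 20				/* seconds between keepalives */
      	#define KEEPALIVE_SLOTS (RXRPC_KEEPALIVE_TIME + 1)	/* assumed ring size */

      	/* Sketch: pick the keepalive bucket for a peer relative to the ring's
      	 * base time, clamping anything overdue into the "send now" slot.
      	 */
      	static int keepalive_slot_sketch(time64_t base, time64_t last_tx)
      	{
      		time64_t keepalive_at = last_tx + RXRPC_KEEPALIVE_TIME;
      		s64 slot = keepalive_at - base;

      		if (slot < 0)
      			slot = 0;			/* overdue: transmit on this pass */
      		if (slot >= KEEPALIVE_SLOTS)
      			slot = KEEPALIVE_SLOTS - 1;	/* not due yet: park at the far end */
      		return (int)slot;
      	}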
      
      This can be triggered by an incoming call from a server after a reboot with
      AF_RXRPC and AFS built into the kernel causing a peer record to be set up
      before userspace is started.  The system clock is then adjusted by
      userspace, thereby potentially causing the keepalive generator to have a
      meltdown - which leads to a message like:
      
      	watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23]
      	...
      	Workqueue: krxrpcd rxrpc_peer_keepalive_worker
      	EIP: lock_acquire+0x69/0x80
      	...
      	Call Trace:
      	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
      	 ? _raw_spin_lock_bh+0x29/0x60
      	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
      	 ? rxrpc_peer_keepalive_worker+0x5e/0x350
      	 ? __lock_acquire+0x3d3/0x870
      	 ? process_one_work+0x110/0x340
      	 ? process_one_work+0x166/0x340
      	 ? process_one_work+0x110/0x340
      	 ? worker_thread+0x39/0x3c0
      	 ? kthread+0xdb/0x110
      	 ? cancel_delayed_work+0x90/0x90
      	 ? kthread_stop+0x70/0x70
      	 ? ret_from_fork+0x19/0x24
      
      Fixes: ace45bec ("rxrpc: Fix firewall route keepalive")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      330bdcfa
  3. 01 August 2018 (3 commits)
    • rxrpc: Transmit more ACKs during data reception · d0b35a42
      Committed by David Howells
      Immediately flush any outstanding ACK on entry to rxrpc_recvmsg_data() -
      which transfers data to the target buffers - if we previously had an Rx
      underrun (ie. we returned -EAGAIN because we ran out of received data).
      This lets the server know that we've managed to receive something.
      
      Also flush any outstanding ACK after calling the function if it hit -EAGAIN
      to let the server know we processed some data.
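
      Sketched below in terms of an assumed underrun flag and the internal ACK
      transmitter (a fragment, not the actual diff; the real change is in
      net/rxrpc/recvmsg.c):

      	/* Sketch: flush a pending ACK before and after the data-copy step
      	 * whenever the previous pass ran out of received data.
      	 */
      	if (test_and_clear_bit(RXRPC_CALL_RX_UNDERRUN, &call->flags) &&
      	    call->ackr_reason)
      		rxrpc_send_ack_packet(call, false, NULL);

      	ret = rxrpc_recvmsg_data(sock, call, msg, iter, len, flags, &copied);

      	if (ret == -EAGAIN) {
      		set_bit(RXRPC_CALL_RX_UNDERRUN, &call->flags);
      		if (call->ackr_reason)
      			rxrpc_send_ack_packet(call, false, NULL);	/* tell the server how far we got */
      	}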
      
      It might be better to send more ACKs, possibly on a time-based scheme, but
      that needs some more consideration.
      
      With this and some additional AFS patches, it is possible to get large
      unencrypted O_DIRECT reads to be almost as fast as NFS over TCP.  It looks
      like it might be theoretically possible to improve performance yet more for
      a server running a single operation as investigation of packet timestamps
      indicates that the server keeps stalling.
      
      The issue appears to be that rxrpc runs in to trouble with ACK packets
      getting batched together (up to ~32 at a time) somewhere between the IP
      transmit queue on the client and the ethernet receive queue on the server.
      
      However, this case isn't too much of a worry as even a lightly loaded
      server should be receiving sufficient packet flux to flush the ACK packets
      to the UDP socket.
      Signed-off-by: David Howells <dhowells@redhat.com>
      d0b35a42
    • rxrpc: Increase the size of a call's Rx window · 4075295a
      Committed by David Howells
      Increase the size of a call's Rx window from 32 to 63 - ie. one less than
      the size of the ring buffer.  This makes large data transfers perform
      better when the Tx window on the other side is around 64 (as is the case
      with Auristor's YFS fileserver).
      
      If the server window size is ~32 or smaller, this should make no
      difference.
      Signed-off-by: David Howells <dhowells@redhat.com>
      4075295a
    • rxrpc: Trace packet transmission · 4764c0da
      Committed by David Howells
      Trace successful packet transmission (kernel_sendmsg() succeeded, that is)
      in AF_RXRPC.  We can share the enum that defines the transmission points
      with the trace_rxrpc_tx_fail() tracepoint, so rename its constants to be
      applicable to both.
      
      Also, save the internal call->debug_id in the rxrpc_channel struct so that
      it can be used in retransmission trace lines.
      Signed-off-by: David Howells <dhowells@redhat.com>
      4764c0da
  4. 05 June 2018 (1 commit)
    • rxrpc: Fix handling of call quietly cancelled out on server · 1a025028
      Committed by David Howells
      Sometimes an in-progress call will stop responding on the fileserver when
      the fileserver quietly cancels the call with an internally marked abort
      (RX_CALL_DEAD), without sending an ABORT to the client.
      
      This causes the client's call to eventually expire from lack of incoming
      packets directed its way, which currently leads to it being cancelled
      locally with ETIME.  Note that it's not currently clear as to why this
      happens as it's really hard to reproduce.
      
      The rotation policy implemented by kAFS, however, doesn't differentiate
      between ETIME meaning we didn't get any response from the server and ETIME
      meaning the call got cancelled mid-flow.  The latter leads to an oops when
      fetching data as the rotation partially resets the afs_read descriptor,
      which can result in a cleared page pointer being dereferenced because that
      page has already been filled.
      
      Handle this by the following means:
      
       (1) Set a flag on a call when we receive a packet for it.
      
       (2) Store the highest packet serial number so far received for a call
           (bearing in mind this may wrap).
      
       (3) If, when the "not received anything recently" timeout expires on a
           call, we've received at least one packet for a call and the connection
           as a whole has received packets more recently than that call, then
           cancel the call locally with ECONNRESET rather than ETIME.
      
           This indicates that the call was definitely in progress on the server.
      
       (4) In kAFS, if the rotation algorithm sees ECONNRESET rather than ETIME,
           don't try the next server, but rather abort the call.
      
           This avoids the oops as we don't try to reuse the afs_read struct.
           Rather, as-yet unfetched pages will be reread at a later date.
      
      Also:
      
       (5) Add an rxrpc tracepoint to log detection of the call being reset.
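
      A condensed sketch of the decision in (3), with approximate field and
      tracepoint names:

      	/* Sketch: when the "not received anything recently" timer expires,
      	 * decide whether the server quietly killed the call or simply went
      	 * away.
      	 */
      	int error = -ETIME;

      	if (test_bit(RXRPC_CALL_RX_HEARD, &call->flags) &&
      	    (int)(call->conn->hi_serial - call->rx_serial) > 0) {
      		/* The connection heard from the peer more recently than this
      		 * call did, so the call was dropped on the server side.
      		 */
      		trace_rxrpc_call_reset(call);
      		error = -ECONNRESET;
      	}
      	/* ...then complete the call locally with 'error'. */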
      
      Without this, I occasionally see an oops like the following:
      
          general protection fault: 0000 [#1] SMP PTI
          ...
          RIP: 0010:_copy_to_iter+0x204/0x310
          RSP: 0018:ffff8800cae0f828 EFLAGS: 00010206
          RAX: 0000000000000560 RBX: 0000000000000560 RCX: 0000000000000560
          RDX: ffff8800cae0f968 RSI: ffff8800d58b3312 RDI: 0005080000000000
          RBP: ffff8800cae0f968 R08: 0000000000000560 R09: ffff8800ca00f400
          R10: ffff8800c36f28d4 R11: 00000000000008c4 R12: ffff8800cae0f958
          R13: 0000000000000560 R14: ffff8800d58b3312 R15: 0000000000000560
          FS:  00007fdaef108080(0000) GS:ffff8800ca680000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fb28a8fa000 CR3: 00000000d2a76002 CR4: 00000000001606e0
          Call Trace:
           skb_copy_datagram_iter+0x14e/0x289
           rxrpc_recvmsg_data.isra.0+0x6f3/0xf68
           ? trace_buffer_unlock_commit_regs+0x4f/0x89
           rxrpc_kernel_recv_data+0x149/0x421
           afs_extract_data+0x1e0/0x798
           ? afs_wait_for_call_to_complete+0xc9/0x52e
           afs_deliver_fs_fetch_data+0x33a/0x5ab
           afs_deliver_to_call+0x1ee/0x5e0
           ? afs_wait_for_call_to_complete+0xc9/0x52e
           afs_wait_for_call_to_complete+0x12b/0x52e
           ? wake_up_q+0x54/0x54
           afs_make_call+0x287/0x462
           ? afs_fs_fetch_data+0x3e6/0x3ed
           ? rcu_read_lock_sched_held+0x5d/0x63
           afs_fs_fetch_data+0x3e6/0x3ed
           afs_fetch_data+0xbb/0x14a
           afs_readpages+0x317/0x40d
           __do_page_cache_readahead+0x203/0x2ba
           ? ondemand_readahead+0x3a7/0x3c1
           ondemand_readahead+0x3a7/0x3c1
           generic_file_buffered_read+0x18b/0x62f
           __vfs_read+0xdb/0xfe
           vfs_read+0xb2/0x137
           ksys_read+0x50/0x8c
           do_syscall_64+0x7d/0x1a0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Note the weird value in RDI which is a result of trying to kmap() a NULL
      page pointer.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a025028
  5. 16 May 2018 (1 commit)
  6. 11 May 2018 (1 commit)
    • rxrpc: Fix missing start of call timeout · c54e43d7
      Committed by David Howells
      The expect_rx_by call timeout is supposed to be set when a call is started
      to indicate that we need to receive a packet by that point.  This is
      currently put back every time we receive a packet, but it isn't started
      when we first send a packet.  Without this, the call may wait forever if
      the server doesn't deign to reply.
      
      Fix this by setting the timeout upon a successful UDP sendmsg call for the
      first DATA packet.  The timeout is initiated only for initial transmission
      and not for subsequent retries as we don't want the retry mechanism to
      extend the timeout indefinitely.
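
      A sketch of where the timer now gets armed, guarded so that only the
      first transmission of DATA packet 1 starts it (names approximate; the
      real code is in net/rxrpc/output.c):

      	/* Sketch: after kernel_sendmsg() succeeds for the call's first DATA
      	 * packet, start the expect-a-reply timer; retransmissions skip this.
      	 */
      	if (ret >= 0 && sp->hdr.seq == 1 &&
      	    !test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER, &call->flags)) {
      		unsigned long now = jiffies;
      		unsigned long expect_rx_by = now + call->next_rx_timo;

      		WRITE_ONCE(call->expect_rx_by, expect_rx_by);
      		rxrpc_reduce_call_timer(call, expect_rx_by, now,
      					rxrpc_timer_set_for_normal);
      	}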
      
      Fixes: a158bdd3 ("rxrpc: Fix call timeouts")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      c54e43d7
  7. 31 March 2018 (6 commits)
    • rxrpc: Fix leak of rxrpc_peer objects · 17226f12
      Committed by David Howells
      When a new client call is requested, an rxrpc_conn_parameters struct object
      is passed in with a bunch of parameters set, such as the local endpoint to
      use.  A pointer to the target peer record is also placed in there by
      rxrpc_get_client_conn() - and this is removed if and only if a new
      connection object is allocated.  Thus it leaks if a new connection object
      isn't allocated.
      
      Fix this by putting any peer object attached to the rxrpc_conn_parameters
      object in the function that allocated it.
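
      Put another way, the cleanup now lives with whoever filled in the
      parameter block; a minimal sketch:

      	/* Sketch: after connection setup, the function that filled in the
      	 * rxrpc_conn_parameters puts whatever peer ref is still attached to
      	 * it, so nothing leaks when no new connection object was allocated.
      	 */
      	static void release_conn_params_sketch(struct rxrpc_conn_parameters *cp)
      	{
      		if (cp->peer) {
      			rxrpc_put_peer(cp->peer);	/* ref taken during connection setup */
      			cp->peer = NULL;
      		}
      	}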
      
      Fixes: 19ffa01c ("rxrpc: Use structs to hold connection params and protocol info")
      Signed-off-by: David Howells <dhowells@redhat.com>
      17226f12
    • rxrpc: Add a tracepoint to track rxrpc_peer refcounting · 1159d4b4
      Committed by David Howells
      Add a tracepoint to track reference counting on the rxrpc_peer struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
      1159d4b4
    • rxrpc: Fix apparent leak of rxrpc_local objects · 31f5f9a1
      Committed by David Howells
      rxrpc_local objects cannot be disposed of until all the connections that
      point to them have been RCU'd, as a connection object holds a refcount on the
      local endpoint it is communicating through.  Currently, this can cause an
      assertion failure to occur when a network namespace is destroyed as there's
      no check that the RCU destructors for the connections have been run before
      we start trying to destroy local endpoints.
      
      The kernel reports:
      
      	rxrpc: AF_RXRPC: Leaked local 0000000036a41bc1 {5}
      	------------[ cut here ]------------
      	kernel BUG at ../net/rxrpc/local_object.c:439!
      
      Fix this by keeping a count of the live connections and waiting for it to
      go to zero at the end of rxrpc_destroy_all_connections().
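
      A sketch of the counting scheme (illustrative only; the counter would
      live in struct rxrpc_net, and the wait/wake API shown is just one way to
      express it):

      	/* Sketch: count live connections per network namespace and wait for
      	 * the RCU destructors to finish before tearing down local endpoints.
      	 */
      	static void conn_created_sketch(struct rxrpc_net *rxnet)
      	{
      		atomic_inc(&rxnet->nr_conns);
      	}

      	static void conn_rcu_destroyed_sketch(struct rxrpc_net *rxnet)
      	{
      		if (atomic_dec_and_test(&rxnet->nr_conns))
      			wake_up_var(&rxnet->nr_conns);
      	}

      	static void wait_for_conns_sketch(struct rxrpc_net *rxnet)
      	{
      		/* Called at the end of rxrpc_destroy_all_connections(). */
      		wait_var_event(&rxnet->nr_conns, !atomic_read(&rxnet->nr_conns));
      	}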
      
      Fixes: dee46364 ("rxrpc: Add RCU destruction for connections and calls")
      Signed-off-by: David Howells <dhowells@redhat.com>
      31f5f9a1
    • rxrpc: Add a tracepoint to track rxrpc_local refcounting · 09d2bf59
      Committed by David Howells
      Add a tracepoint to track reference counting on the rxrpc_local struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
      09d2bf59
    • rxrpc: Fix potential call vs socket/net destruction race · d3be4d24
      Committed by David Howells
      rxrpc_call structs don't pin sockets or network namespaces, but may attempt
      to access both after their refcount reaches 0 so that they can detach
      themselves from the network namespace.  However, there's no guarantee that
      the socket still exists at this point (so sock_net(&call->socket->sk) may
      be invalid) and the namespace may have gone away if the call isn't pinning
      a peer.
      
      Fix this by (a) carrying a net pointer in the rxrpc_call struct and (b)
      waiting for all calls to be destroyed when the network namespace goes away.
      
      This was detected by checker:
      
      net/rxrpc/call_object.c:634:57: warning: incorrect type in argument 1 (different address spaces)
      net/rxrpc/call_object.c:634:57:    expected struct sock const *sk
      net/rxrpc/call_object.c:634:57:    got struct sock [noderef] <asn:4>*<noident>
      
      Fixes: 2baec2c3 ("rxrpc: Support network namespacing")
      Signed-off-by: David Howells <dhowells@redhat.com>
      d3be4d24
    • rxrpc: Fix firewall route keepalive · ace45bec
      Committed by David Howells
      Fix the firewall route keepalive part of AF_RXRPC, which currently
      functions incorrectly by replying to VERSION REPLY packets from the server
      with VERSION REQUEST packets.
      
      Instead, send VERSION REPLY packets to the peers of service connections to
      act as keep-alives 20s after the latest packet was transmitted to that
      peer.
      
      Also, just discard VERSION REPLY packets rather than replying to them.
      Signed-off-by: David Howells <dhowells@redhat.com>
      ace45bec
  8. 28 March 2018 (2 commits)
    • rxrpc: Trace call completion · 1bae5d22
      Committed by David Howells
      Add a tracepoint to track rxrpc calls moving into the completed state and
      to log the completion type and the recorded error value and abort code.
      Signed-off-by: David Howells <dhowells@redhat.com>
      1bae5d22
    • rxrpc, afs: Use debug_ids rather than pointers in traces · a25e21f0
      Committed by David Howells
      In rxrpc and afs, use the debug_ids that are monotonically allocated to
      various objects as they're created, rather than pointers, since kernel
      pointers are now hashed and therefore less useful.  Further, the debug
      IDs aren't reused anywhere near as quickly.
      
      In addition, allow kernel services that use rxrpc, such as afs, to take
      numbers from the rxrpc counter, assign them to their own call struct and
      pass them in to rxrpc for both client and service calls so that the trace
      lines for each will have the same ID tag.
      Signed-off-by: David Howells <dhowells@redhat.com>
      a25e21f0
  9. 24 November 2017 (8 commits)
    • rxrpc: Fix conn expiry timers · 3d18cbb7
      Committed by David Howells
      Fix the rxrpc connection expiry timers so that connections for closed
      AF_RXRPC sockets get deleted in a more timely fashion, freeing up the
      transport UDP port much more quickly.
      
       (1) Replace the delayed work items with work items plus timers so that
           timer_reduce() can be used to shorten them and so that the timer
           doesn't requeue the work item if the net namespace is dead.
      
       (2) Don't use queue_delayed_work() as that won't alter the timeout if the
           timer is already running.
      
       (3) Don't rearm the timers if the network namespace is dead.
      Signed-off-by: David Howells <dhowells@redhat.com>
      3d18cbb7
    • rxrpc: Fix service endpoint expiry · f859ab61
      Committed by David Howells
      Make RxRPC service endpoints expire like they're supposed to by the following
      means:
      
       (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
           global service conn timeout, otherwise the first rxrpc_net struct to
           die will cause connections on all others to expire immediately from
           then on.
      
       (2) Mark local service endpoints for which the socket has been closed
           (->service_closed) so that the expiration timeout can be much
           shortened for service and client connections going through that
           endpoint.
      
       (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
           count reaches 1, not 0, as idle conns have a 1 count.
      
       (4) The accumulator for the earliest time we might want to schedule for
           should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
           the comparison functions use signed arithmetic.
      
       (5) Simplify the expiration handling, adding the expiration value to the
           idle timestamp each time rather than keeping track of the time in the
           past before which the idle timestamp must go to be expired.  This is
           much easier to read.
      
       (6) Ignore the timeouts if the net namespace is dead.
      
       (7) Restart the service reaper work item rather the client reaper.
      Signed-off-by: David Howells <dhowells@redhat.com>
      f859ab61
    • rxrpc: Add keepalive for a call · 415f44e4
      Committed by David Howells
      We need to transmit a packet every so often to act as a keepalive for the
      peer (which has a timeout from the last time it received a packet) and also
      to prevent any intervening firewalls from closing the route.
      
      Do this by resetting a timer every time we transmit a packet.  If the timer
      ever expires, we transmit a PING ACK packet and thereby also elicit a PING
      RESPONSE ACK from the other side - which prevents our last-rx timeout from
      expiring.
      
      The timer is set to 1/6 of the last-rx timeout so that we can detect the
      other side going away if it misses 6 replies in a row.
      
      This is particularly necessary for servers where the processing of the
      service function may take a significant amount of time.
      Signed-off-by: David Howells <dhowells@redhat.com>
      415f44e4
    • rxrpc: Add a timeout for detecting lost ACKs/lost DATA · bd1fdf8c
      Committed by David Howells
      Add an extra timeout that is set/updated when we send a DATA packet that
      has the request-ack flag set.  This allows us to detect if we don't get an
      ACK in response to the latest flagged packet.
      
      The ACK packet is adjudged to have been lost if it doesn't turn up within
      2*RTT of the transmission.
      
      If the timeout occurs, we schedule the sending of a PING ACK to find out
      the state of the other side.  If a new DATA packet is ready to go sooner,
      we cancel the sending of the ping and set the request-ack flag on that
      instead.
      
      If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
      we had at the time of the ping transmission, we adjudge all the DATA
      packets sent between the response tx_top and the ping-time tx_top to have
      been lost and retransmit immediately.
      
      Rather than sending a PING ACK, we could just pick a DATA packet and
      speculatively retransmit that with request-ack set.  It should result in
      either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu of
      the PING-RESPONSE ACK mentioned above.
      Signed-off-by: David Howells <dhowells@redhat.com>
      bd1fdf8c
    • rxrpc: Fix call timeouts · a158bdd3
      Committed by David Howells
      Fix the rxrpc call expiration timeouts and make them settable from
      userspace.  By analogy with other rx implementations, there should be three
      timeouts:
      
       (1) "Normal timeout"
      
           This is set for all calls and is triggered if we haven't received any
           packets from the peer in a while.  It is measured from the last time
           we received any packet on that call.  This is not reset by any
           connection packets (such as CHALLENGE/RESPONSE packets).
      
           If a service operation takes a long time, the server should generate
           PING ACKs at a duration that's substantially less than the normal
           timeout so as to keep both sides alive.  This is set at 1/6 of the normal
           timeout.
      
       (2) "Idle timeout"
      
           This is set only for a service call and is triggered if we stop
           receiving the DATA packets that comprise the request data.  It is
           measured from the last time we received a DATA packet.
      
       (3) "Hard timeout"
      
           This can be set for a call and specifies the maximum lifetime of that
           call.  It should not be specified by default.  Some operations (such
           as volume transfer) take a long time.
      
      Allow userspace to set/change the timeouts on a call with sendmsg, using a
      control message:
      
      	RXRPC_SET_CALL_TIMEOUTS
      
      The data to the message is a number of 32-bit words, not all of which need
      be given:
      
      	u32 hard_timeout;	/* sec from first packet */
      	u32 idle_timeout;	/* msec from packet Rx */
      	u32 normal_timeout;	/* msec from data Rx */
      
      This can be set in combination with any other sendmsg() that affects a
      call.
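
      From userspace, the control message might be attached roughly like this
      (a sketch, assuming the SOL_RXRPC level and RXRPC_SET_CALL_TIMEOUTS type
      from linux/rxrpc.h; the timeout values are arbitrary examples):

      	#include <stdint.h>
      	#include <string.h>
      	#include <sys/socket.h>
      	#include <linux/rxrpc.h>

      	/* Sketch: attach hard/idle/normal timeouts to a call via sendmsg(). */
      	static void add_call_timeouts(struct msghdr *msg)
      	{
      		static char ctrl[CMSG_SPACE(3 * sizeof(uint32_t))];
      		uint32_t timeouts[3] = { 120, 5000, 10000 };	/* hard (s), idle (ms), normal (ms) */
      		struct cmsghdr *cmsg;

      		msg->msg_control = ctrl;
      		msg->msg_controllen = sizeof(ctrl);

      		cmsg = CMSG_FIRSTHDR(msg);
      		cmsg->cmsg_level = SOL_RXRPC;
      		cmsg->cmsg_type = RXRPC_SET_CALL_TIMEOUTS;
      		cmsg->cmsg_len = CMSG_LEN(sizeof(timeouts));
      		memcpy(CMSG_DATA(cmsg), timeouts, sizeof(timeouts));
      	}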
      Signed-off-by: David Howells <dhowells@redhat.com>
      a158bdd3
    • rxrpc: Split the call params from the operation params · 48124178
      Committed by David Howells
      When rxrpc_sendmsg() parses the control message buffer, it places the
      parameters extracted into a structure, but lumps together call parameters
      (such as user call ID) with operation parameters (such as whether to send
      data, send an abort or accept a call).
      
      Split the call parameters out into their own structure, a copy of which is
      then embedded in the operation parameters struct.
      
      The call parameters struct is then passed down into the places that need it
      instead of passing the individual parameters.  This allows for extra call
      parameters to be added.
      Signed-off-by: David Howells <dhowells@redhat.com>
      48124178
    • rxrpc: Delay terminal ACK transmission on a client call · 3136ef49
      Committed by David Howells
      Delay terminal ACK transmission on a client call by deferring it to the
      connection processor.  This allows it to be skipped if we can send the next
      call instead, the first DATA packet of which will implicitly ack this call.
      Signed-off-by: David Howells <dhowells@redhat.com>
      3136ef49
    • rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls · 9faaff59
      Committed by David Howells
      Provide a different lockdep key for rxrpc_call::user_mutex when the call is
      made on a kernel socket, such as by the AFS filesystem.
      
      The problem is that lockdep registers a false positive: userspace
      calling the sendmsg syscall on a user socket holds call->user_mutex
      whilst userspace memory is accessed, whereas the AFS filesystem may
      perform operations with mmap_sem held by the caller.
      
      In such a case, the following warning is produced.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.14.0-fscache+ #243 Tainted: G            E
      ------------------------------------------------------
      modpost/16701 is trying to acquire lock:
       (&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs]
      
      but task is already holding lock:
       (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 (&mm->mmap_sem){++++}:
             __might_fault+0x61/0x89
             _copy_from_iter_full+0x40/0x1fa
             rxrpc_send_data+0x8dc/0xff3
             rxrpc_do_sendmsg+0x62f/0x6a1
             rxrpc_sendmsg+0x166/0x1b7
             sock_sendmsg+0x2d/0x39
             ___sys_sendmsg+0x1ad/0x22b
             __sys_sendmsg+0x41/0x62
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #2 (&call->user_mutex){+.+.}:
             __mutex_lock+0x86/0x7d2
             rxrpc_new_client_call+0x378/0x80e
             rxrpc_kernel_begin_call+0xf3/0x154
             afs_make_call+0x195/0x454 [kafs]
             afs_vl_get_capabilities+0x193/0x198 [kafs]
             afs_vl_lookup_vldb+0x5f/0x151 [kafs]
             afs_create_volume+0x2e/0x2f4 [kafs]
             afs_mount+0x56a/0x8d7 [kafs]
             mount_fs+0x6a/0x109
             vfs_kern_mount+0x67/0x135
             do_mount+0x90b/0xb57
             SyS_mount+0x72/0x98
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #1 (k-sk_lock-AF_RXRPC){+.+.}:
             lock_sock_nested+0x74/0x8a
             rxrpc_kernel_begin_call+0x8a/0x154
             afs_make_call+0x195/0x454 [kafs]
             afs_fs_get_capabilities+0x17a/0x17f [kafs]
             afs_probe_fileserver+0xf7/0x2f0 [kafs]
             afs_select_fileserver+0x83f/0x903 [kafs]
             afs_fetch_status+0x89/0x11d [kafs]
             afs_iget+0x16f/0x4f8 [kafs]
             afs_mount+0x6c6/0x8d7 [kafs]
             mount_fs+0x6a/0x109
             vfs_kern_mount+0x67/0x135
             do_mount+0x90b/0xb57
             SyS_mount+0x72/0x98
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #0 (&vnode->io_lock){+.+.}:
             lock_acquire+0x174/0x19f
             __mutex_lock+0x86/0x7d2
             afs_begin_vnode_operation+0x33/0x77 [kafs]
             afs_fetch_data+0x80/0x12a [kafs]
             afs_readpages+0x314/0x405 [kafs]
             __do_page_cache_readahead+0x203/0x2ba
             filemap_fault+0x179/0x54d
             __do_fault+0x17/0x60
             __handle_mm_fault+0x6d7/0x95c
             handle_mm_fault+0x24e/0x2a3
             __do_page_fault+0x301/0x486
             do_page_fault+0x236/0x259
             page_fault+0x22/0x30
             __clear_user+0x3d/0x60
             padzero+0x1c/0x2b
             load_elf_binary+0x785/0xdc7
             search_binary_handler+0x81/0x1ff
             do_execveat_common.isra.14+0x600/0x888
             do_execve+0x1f/0x21
             SyS_execve+0x28/0x2f
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      other info that might help us debug this:
      
      Chain exists of:
        &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&mm->mmap_sem);
                                     lock(&call->user_mutex);
                                     lock(&mm->mmap_sem);
        lock(&vnode->io_lock);
      
       *** DEADLOCK ***
      
      1 lock held by modpost/16701:
       #0:  (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
      
      stack backtrace:
      CPU: 0 PID: 16701 Comm: modpost Tainted: G            E   4.14.0-fscache+ #243
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Call Trace:
       dump_stack+0x67/0x8e
       print_circular_bug+0x341/0x34f
       check_prev_add+0x11f/0x5d4
       ? add_lock_to_list.isra.12+0x8b/0x8b
       ? add_lock_to_list.isra.12+0x8b/0x8b
       ? __lock_acquire+0xf77/0x10b4
       __lock_acquire+0xf77/0x10b4
       lock_acquire+0x174/0x19f
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       __mutex_lock+0x86/0x7d2
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_fetch_data+0x80/0x12a [kafs]
       afs_readpages+0x314/0x405 [kafs]
       __do_page_cache_readahead+0x203/0x2ba
       ? filemap_fault+0x179/0x54d
       filemap_fault+0x179/0x54d
       __do_fault+0x17/0x60
       __handle_mm_fault+0x6d7/0x95c
       handle_mm_fault+0x24e/0x2a3
       __do_page_fault+0x301/0x486
       do_page_fault+0x236/0x259
       page_fault+0x22/0x30
      RIP: 0010:__clear_user+0x3d/0x60
      RSP: 0018:ffff880071e93da0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c
      RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720
      RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00
      R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000
       padzero+0x1c/0x2b
       load_elf_binary+0x785/0xdc7
       search_binary_handler+0x81/0x1ff
       do_execveat_common.isra.14+0x600/0x888
       do_execve+0x1f/0x21
       SyS_execve+0x28/0x2f
       do_syscall_64+0x89/0x1be
       entry_SYSCALL64_slow_path+0x25/0x25
      RIP: 0033:0x7fdb6009ee07
      RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
      RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07
      RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900
      RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000
      R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000
      R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000
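
      A sketch of the kind of fix this implies: give user_mutex its own lock
      class when the call is created on a kernel socket (helper and key names
      are hypothetical):

      	static struct lock_class_key rxrpc_kernel_call_user_mutex_key;

      	/* Sketch: calls made by in-kernel users (e.g. kafs) get a separate
      	 * lockdep class for user_mutex so the mmap_sem chain above isn't
      	 * reported against them.
      	 */
      	static void call_init_user_mutex_sketch(struct rxrpc_call *call, bool kernel)
      	{
      		mutex_init(&call->user_mutex);
      		if (kernel)
      			lockdep_set_class(&call->user_mutex,
      					  &rxrpc_kernel_call_user_mutex_key);
      	}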
      Signed-off-by: David Howells <dhowells@redhat.com>
      9faaff59
  10. 02 November 2017 (1 commit)
    • rxrpc: Lock around calling a kernel service Rx notification · 20acbd9a
      Committed by David Howells
      Place a spinlock around the invocation of call->notify_rx() for a kernel
      service call, and take the lock again when ending the call so that the
      notification pointer can be replaced with a pointer to a dummy function.
      
      This is required because it's possible for rxrpc_notify_socket() to be
      called after the call has been ended by the kernel service if called from
      the asynchronous work function rxrpc_process_call().
      
      However, rxrpc_notify_socket() currently only holds the RCU read lock when
      invoking ->notify_rx(), which means that the afs_call struct would need to
      be disposed of by call_rcu() rather than by kfree().
      
      But we shouldn't see any notifications from a call after calling
      rxrpc_kernel_end_call(), so a lock is required in rxrpc code.
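
      A sketch of the locking described above (the lock field name is an
      assumption about struct rxrpc_call):

      	/* Dummy handler installed once the kernel service has ended the call. */
      	static void dummy_notify_rx_sketch(struct sock *sk, struct rxrpc_call *call, unsigned long user_call_ID)
      	{
      	}

      	/* Sketch of the notification side (rxrpc_notify_socket()). */
      	static void do_notify_sketch(struct rxrpc_sock *rx, struct rxrpc_call *call)
      	{
      		spin_lock_bh(&call->notify_lock);
      		call->notify_rx(&rx->sk, call, call->user_call_ID);
      		spin_unlock_bh(&call->notify_lock);
      	}

      	/* Sketch of the teardown side (rxrpc_kernel_end_call()). */
      	static void disable_notify_sketch(struct rxrpc_call *call)
      	{
      		spin_lock_bh(&call->notify_lock);
      		call->notify_rx = dummy_notify_rx_sketch;
      		spin_unlock_bh(&call->notify_lock);
      	}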
      
      Without this, we may see the call wait queue as having a corrupt spinlock:
      
          BUG: spinlock bad magic on CPU#0, kworker/0:2/1612
          general protection fault: 0000 [#1] SMP
          ...
          Workqueue: krxrpcd rxrpc_process_call
          task: ffff88040b83c400 task.stack: ffff88040adfc000
          RIP: 0010:spin_bug+0x161/0x18f
          RSP: 0018:ffff88040adffcc0 EFLAGS: 00010002
          RAX: 0000000000000032 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff81ab16cf
          RDX: ffff88041fa14c01 RSI: ffff88041fa0ccb8 RDI: ffff88041fa0ccb8
          RBP: ffff88040adffcd8 R08: 00000000ffffffff R09: 00000000ffffffff
          R10: ffff88040adffc60 R11: 000000000000022c R12: ffff88040aca2208
          R13: ffffffff81a58114 R14: 0000000000000000 R15: 0000000000000000
          ....
          Call Trace:
           do_raw_spin_lock+0x1d/0x89
           _raw_spin_lock_irqsave+0x3d/0x49
           ? __wake_up_common_lock+0x4c/0xa7
           __wake_up_common_lock+0x4c/0xa7
           ? __lock_is_held+0x47/0x7a
           __wake_up+0xe/0x10
           afs_wake_up_call_waiter+0x11b/0x122 [kafs]
           rxrpc_notify_socket+0x12b/0x258
           rxrpc_process_call+0x18e/0x7d0
           process_one_work+0x298/0x4de
           ? rescuer_thread+0x280/0x280
           worker_thread+0x1d1/0x2ae
           ? rescuer_thread+0x280/0x280
           kthread+0x12c/0x134
           ? kthread_create_on_node+0x3a/0x3a
           ret_from_fork+0x27/0x40
      
      In this case, note the corrupt data in EBX.  The address of the offending
      afs_call is in R12, plus the offset to the spinlock.
      Signed-off-by: David Howells <dhowells@redhat.com>
      20acbd9a
  11. 29 August 2017 (3 commits)
    • rxrpc: Allow failed client calls to be retried · c038a58c
      Committed by David Howells
      Allow a client call that failed on network error to be retried, provided
      that the Tx queue still holds DATA packet 1.  This allows an operation to
      be submitted to another server or another address for the same server
      without having to repackage and re-encrypt the data so far processed.
      
      Two new functions are provided:
      
       (1) rxrpc_kernel_check_call() - This is used to find out the completion
           state of a call to guess whether it can be retried and whether it
           should be retried.
      
       (2) rxrpc_kernel_retry_call() - Disconnect the call from its current
           connection, reset the state and submit it as a new client call to a
           new address.  The new address need not match the previous address.
      
      A call may be retried even if all the data hasn't been loaded into it yet;
      a partially constructed call will be retained at the same point it was at when
      an error condition was detected.  msg_data_left() can be used to find out
      how much data was packaged before the error occurred.
      Signed-off-by: David Howells <dhowells@redhat.com>
      c038a58c
    • rxrpc: Fix IPv6 support · 7b674e39
      Committed by David Howells
      Fix IPv6 support in AF_RXRPC in the following ways:
      
       (1) When extracting the address from a received IPv4 packet, if the local
           transport socket is open for IPv6 then fill out the sockaddr_rxrpc
           struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead
           of an AF_INET one.
      
       (2) When sending CHALLENGE or RESPONSE packets, the transport length needs
           to be set from the sockaddr_rxrpc::transport_len field rather than
           sizeof() on the IPv4 transport address.
      
       (3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up
           the address correctly before searching for the affected peer.
      Signed-off-by: David Howells <dhowells@redhat.com>
      7b674e39
    • net: rxrpc: Replace time_t type with time64_t type · 10674a03
      Committed by Baolin Wang
      The 'expiry' field of 'struct key_preparsed_payload' has been changed to
      the 'time64_t' type, which is year-2038 safe on 32-bit systems.

      In the net/rxrpc subsystem, we need to convert from 'u32' to 'time64_t'
      when copying a ticket's expiry time to 'prep->expiry', so this patch
      introduces two helper functions to perform that conversion.

      This patch also uses ktime_get_real_seconds() to get the current time
      instead of get_seconds(), which is not year-2038 safe on 32-bit systems.
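
      The helpers might look something like this sketch (names approximate):

      	/* Sketch: widen a 32-bit on-the-wire expiry to time64_t, and narrow
      	 * it back with clamping when re-encoding a ticket.
      	 */
      	static time64_t rxrpc_u32_to_time64(u32 time_in_secs)
      	{
      		return (time64_t)time_in_secs;
      	}

      	static u32 rxrpc_time64_to_u32(time64_t time_in_secs)
      	{
      		if (time_in_secs <= 0)
      			return 0;
      		if (time_in_secs >= UINT_MAX)
      			return UINT_MAX;
      		return (u32)time_in_secs;
      	}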
      Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      10674a03
  12. 21 July 2017 (1 commit)
    • rxrpc: Move the packet.h include file into net/rxrpc/ · ddc6c70f
      Committed by David Howells
      Move the protocol description header file into net/rxrpc/ and rename it to
      protocol.h.  It's no longer necessary to expose it as packets are no longer
      exposed to kernel services (such as AFS) that use the facility.
      
      The abort codes are transferred to the UAPI header instead as we pass these
      back to userspace and also to kernel services.
      Signed-off-by: David Howells <dhowells@redhat.com>
      ddc6c70f
  13. 15 June 2017 (1 commit)
    • rxrpc: Cache the congestion window setting · f7aec129
      Committed by David Howells
      Cache the congestion window setting that was determined during a call's
      transmission phase when it finishes so that it can be used by the next call
      to the same peer, thereby shortcutting the slow-start algorithm.
      
      The value is stored in the rxrpc_peer struct and is accessed without
      locking.  Each call takes the value that happens to be there when it starts
      and just overwrites the value when it finishes.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7aec129
  14. 08 June 2017 (1 commit)
    • rxrpc: Provide a cmsg to specify the amount of Tx data for a call · e754eba6
      Committed by David Howells
      Provide a control message that can be specified on the first sendmsg() of a
      client call or the first sendmsg() of a service response to indicate the
      total length of the data to be transmitted for that call.
      
      Currently, because the length of the payload of an encrypted DATA packet is
      encrypted in front of the data, the packet cannot be encrypted until we
      know how much data it will hold.
      
      By specifying the length at the beginning of the transmit phase, each DATA
      packet length can be set before we start loading data from userspace (where
      several sendmsg() calls may contribute to a particular packet).
      
      An error will be returned if too little or too much data is presented in
      the Tx phase.
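
      From userspace, the total length might be declared on the first
      sendmsg() like this (a sketch, assuming the RXRPC_TX_LENGTH control
      message from linux/rxrpc.h, which carries a signed 64-bit byte count):

      	#include <stdint.h>
      	#include <string.h>
      	#include <sys/socket.h>
      	#include <linux/rxrpc.h>

      	/* Sketch: tell rxrpc up front how much data the call will transmit. */
      	static void add_tx_length(struct msghdr *msg, int64_t total_len)
      	{
      		static char ctrl[CMSG_SPACE(sizeof(int64_t))];
      		struct cmsghdr *cmsg;

      		msg->msg_control = ctrl;
      		msg->msg_controllen = sizeof(ctrl);

      		cmsg = CMSG_FIRSTHDR(msg);
      		cmsg->cmsg_level = SOL_RXRPC;
      		cmsg->cmsg_type = RXRPC_TX_LENGTH;
      		cmsg->cmsg_len = CMSG_LEN(sizeof(total_len));
      		memcpy(CMSG_DATA(cmsg), &total_len, sizeof(total_len));
      	}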
      Signed-off-by: David Howells <dhowells@redhat.com>
      e754eba6
  15. 05 June 2017 (4 commits)
    • rxrpc: Add service upgrade support for client connections · 4e255721
      Committed by David Howells
      Make it possible for a client to use AuriStor's service upgrade facility.
      
      The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
      the first sendmsg() of a call.  This takes no parameters.
      
      When recvmsg() starts returning data from the call, the service ID field in
      the returned msg_name will reflect the result of the upgrade attempt.  If
      the upgrade was ignored, srx_service will match what was set in the
      sendmsg(); if the upgrade happened the srx_service will be altered to
      indicate the service the server upgraded to.
      
      Note that:
      
       (1) The choice of upgrade service is up to the server
      
       (2) Further client calls to the same server that would share a connection
           are blocked if an upgrade probe is in progress.
      
       (3) This should only be used to probe the service.  Clients should then
           use the returned service ID in all subsequent communications with that
           server (and not set the upgrade).  Note that the kernel will not
           retain this information should the connection expire from its cache.
      
       (4) If a server that supports upgrading is replaced by one that doesn't,
           whilst a connection is live, and if the replacement is running, say,
           OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
           server will not respond to packets sent to the upgraded connection.
      
           At this point, calls will time out and the server must be reprobed.
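
      On the client side, requesting the upgrade probe might look like this
      (a sketch, assuming the RXRPC_UPGRADE_SERVICE control message from
      linux/rxrpc.h, which carries no payload):

      	#include <sys/socket.h>
      	#include <linux/rxrpc.h>

      	/* Sketch: mark the first sendmsg() of a call as an upgrade probe. */
      	static void request_service_upgrade(struct msghdr *msg)
      	{
      		static char ctrl[CMSG_SPACE(0)];
      		struct cmsghdr *cmsg;

      		msg->msg_control = ctrl;
      		msg->msg_controllen = sizeof(ctrl);

      		cmsg = CMSG_FIRSTHDR(msg);
      		cmsg->cmsg_level = SOL_RXRPC;
      		cmsg->cmsg_type = RXRPC_UPGRADE_SERVICE;
      		cmsg->cmsg_len = CMSG_LEN(0);
      	}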
      Signed-off-by: David Howells <dhowells@redhat.com>
      4e255721
    • rxrpc: Implement service upgrade · 4722974d
      Committed by David Howells
      Implement AuriStor's service upgrade facility.  There are three problems
      that this is meant to deal with:
      
       (1) Various of the standard AFS RPC calls have IPv4 addresses in their
           requests and/or replies - but there's no room for including IPv6
           addresses.
      
       (2) Definition of IPv6-specific RPC operations in the standard operation
           sets has not yet been achieved.
      
       (3) One could envision the creation of a new service on the same port as
           the original service.  The new service could implement improved
           operations - and the client could try this first, falling back to the
           original service if it's not there.
      
           Unfortunately, certain servers ignore packets addressed to a service
           they don't implement and don't respond in any way - not even with an
           ABORT.  This means that the client must then wait for the call timeout
           to occur.
      
      What service upgrade does is to see if the connection is marked as being
      'upgradeable' and if so, change the service ID in the server and thus the
      request and reply formats.  Note that the upgrade isn't mandatory - a
      server that supports only the original call set will ignore the upgrade
      request.
      
      In the protocol, the procedure is then as follows:
      
       (1) To request an upgrade, the first DATA packet in a new connection must
           have the userStatus set to 1 (this is normally 0).  The userStatus
           value is normally ignored by the server.
      
       (2) If the server doesn't support upgrading, the reply packets will
           contain the same service ID as for the first request packet.
      
       (3) If the server does support upgrading, all future reply packets on that
           connection will contain the new service ID and the new service ID will
           be applied to *all* further calls on that connection as well.
      
       (4) The RPC op used to probe the upgrade must take the same request data
           as the shadow call in the upgrade set (but may return a different
           reply).  GetCapability RPC ops were added to all standard sets for
           just this purpose.  Ops where the request formats differ cannot be
           used for probing.
      
       (5) The client must wait for completion of the probe before sending any
           further RPC ops to the same destination.  It should then use the
           service ID that recvmsg() reported back in all future calls.
      
       (6) The shadow service must have call definitions for all the operation
           IDs defined by the original service.
      
      
      To support service upgrading, a server should:
      
       (1) Call bind() twice on its AF_RXRPC socket before calling listen().
           Each bind() should supply a different service ID, but the transport
           addresses must be the same.  This allows the server to receive
           requests with either service ID.
      
       (2) Enable automatic upgrading by calling setsockopt(), specifying
           RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
           unsigned shorts as the argument:
      
      	unsigned short optval[2];
      
           This specifies a pair of service IDs.  They must be different and must
           match the service IDs bound to the socket.  Member 0 is the service ID
           to upgrade from and member 1 is the service ID to upgrade to.
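
      A usage sketch of (2), assuming the SOL_RXRPC level and the
      RXRPC_UPGRADEABLE_SERVICE option from linux/rxrpc.h (the service IDs
      and the server_fd descriptor are arbitrary examples):

      	unsigned short service_ids[2] = { 52, 1234 };	/* upgrade from, upgrade to */

      	if (setsockopt(server_fd, SOL_RXRPC, RXRPC_UPGRADEABLE_SERVICE,
      		       service_ids, sizeof(service_ids)) < 0)
      		perror("setsockopt(RXRPC_UPGRADEABLE_SERVICE)");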
      Signed-off-by: David Howells <dhowells@redhat.com>
      4722974d
    • rxrpc: Permit multiple service binding · 28036f44
      Committed by David Howells
      Permit bind() to be called on an AF_RXRPC socket more than once (currently
      maximum twice) to bind multiple listening services to it.  There are some
      restrictions:
      
       (1) All bind() calls involved must have a non-zero service ID.
      
       (2) The service IDs must all be different.
      
       (3) The rest of the address (notably the transport part) must be the same
           in all (a single UDP socket is shared).
      
       (4) This must be done before listen() or sendmsg() is called.
      
      This allows someone to connect to the service socket with different service
      IDs and lays the foundation for service upgrading.
      
      The service ID used by an incoming call can be extracted from the msg_name
      returned by recvmsg().
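
      A server wanting two listening services might set its socket up roughly
      like this (a sketch; the port and service IDs are arbitrary examples and
      only the relevant sockaddr_rxrpc fields are filled in):

      	#include <string.h>
      	#include <sys/socket.h>
      	#include <netinet/in.h>
      	#include <linux/rxrpc.h>

      	/* Sketch: bind two service IDs to one AF_RXRPC socket, then listen. */
      	static int bind_two_services(int fd)
      	{
      		struct sockaddr_rxrpc srx;

      		memset(&srx, 0, sizeof(srx));
      		srx.srx_family = AF_RXRPC;
      		srx.srx_service = 52;				/* first service ID */
      		srx.transport_type = SOCK_DGRAM;
      		srx.transport_len = sizeof(srx.transport.sin);
      		srx.transport.sin.sin_family = AF_INET;
      		srx.transport.sin.sin_port = htons(7000);	/* same transport both times */

      		if (bind(fd, (struct sockaddr *)&srx, sizeof(srx)) < 0)
      			return -1;

      		srx.srx_service = 1234;				/* second, different service ID */
      		if (bind(fd, (struct sockaddr *)&srx, sizeof(srx)) < 0)
      			return -1;

      		return listen(fd, 10);
      	}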
      Signed-off-by: David Howells <dhowells@redhat.com>
      28036f44
    • rxrpc: Separate the connection's protocol service ID from the lookup ID · 68d6d1ae
      Committed by David Howells
      Keep the rxrpc_connection struct's idea of the service ID that is exposed
      in the protocol separate from the service ID that's used as a lookup key.
      
      This allows the protocol service ID on a client connection to get upgraded
      without making the connection unfindable for other client calls that also
      would like to use the upgraded connection.
      
      The connection's actual service ID is then returned through recvmsg() by
      way of msg_name.
      
      Whilst we're at it, we get rid of the last_service_id field from each
      channel.  The service ID is per-connection, not per-call and an entire
      connection is upgraded in one go.
      Signed-off-by: David Howells <dhowells@redhat.com>
      68d6d1ae
  16. 26 May 2017 (1 commit)
    • rxrpc: Support network namespacing · 2baec2c3
      Committed by David Howells
      Support network namespacing in AF_RXRPC with the following changes:
      
       (1) All the local endpoint, peer and call lists, locks, counters, etc. are
           moved into the per-namespace record.
      
       (2) All the connection tracking is moved into the per-namespace record
           with the exception of the client connection ID tree, which is kept
           global so that connection IDs are kept unique per-machine.
      
       (3) Each namespace gets its own epoch.  This allows each network namespace
           to pretend to be a separate client machine.
      
       (4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and
           the contents reflect the namespace.
      
      fs/afs/ should be okay with this patch as it explicitly requires the current
      net namespace to be init_net to permit a mount to proceed at the moment.  It
      will, however, need updating so that cells, IP addresses and DNS records are
      per-namespace also.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2baec2c3
  17. 06 April 2017 (1 commit)
    • rxrpc: Trace protocol errors in received packets · fb46f6ee
      Committed by David Howells
      Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received
      packets.  The following changes are made:
      
       (1) Add a function, __rxrpc_abort_eproto(), to note a protocol error on a
           call and mark the call aborted.  This is wrapped by
           rxrpc_abort_eproto() that makes the why string usable in trace.
      
       (2) Add trace_rxrpc_rx_proto() or rxrpc_abort_eproto() to protocol error
           generation points, replacing rxrpc_abort_call() with the latter.
      
       (3) Only send an abort packet in rxkad_verify_packet*() if we actually
           managed to abort the call.
      
      Note that a trace event is also emitted if a kernel user (e.g. afs) tries
      to send data through a call when it's not in the transmission phase, though
      it's not technically a receive event.
      Signed-off-by: David Howells <dhowells@redhat.com>
      fb46f6ee
  18. 02 March 2017 (1 commit)
    • rxrpc: Fix deadlock between call creation and sendmsg/recvmsg · 540b1c48
      Committed by David Howells
      All the routines by which rxrpc is accessed from the outside are serialised
      by means of the socket lock (sendmsg, recvmsg, bind,
      rxrpc_kernel_begin_call(), ...) and this presents a problem:
      
       (1) If a number of calls on the same socket are in the process of
           connection to the same peer, a maximum of four concurrent live calls
           are permitted before further calls need to wait for a slot.
      
       (2) If a call is waiting for a slot, it is deep inside sendmsg() or
           rxrpc_kernel_begin_call() and the entry function is holding the socket
           lock.
      
       (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
           from servicing the other calls as they need to take the socket lock to
           do so.
      
       (4) The socket is stuck until a call is aborted and makes its slot
           available to the waiter.
      
      Fix this by:
      
       (1) Provide each call with a mutex ('user_mutex') that arbitrates access
           by the users of rxrpc separately for each specific call.
      
       (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
           they've got a call and taken its mutex.
      
           Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
           set but someone else has the lock.  Should I instead only return
           EWOULDBLOCK if there's nothing currently to be done on a socket, and
           sleep in this particular instance because there is something to be
           done, but we appear to be blocked by the interrupt handler doing its
           ping?
      
       (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
           call, locking its user mutex and adding it to the socket's call tree.
           The call is returned locked so that sendmsg() can add data to it
           immediately.
      
           From the moment the call is in the socket tree, it is subject to
           access by sendmsg() and recvmsg() - even if it isn't connected yet.
      
       (4) Lock new service calls in the UDP data_ready handler (in
           rxrpc_new_incoming_call()) because they may already be in the socket's
           tree and the data_ready handler makes them live immediately if a user
           ID has already been preassigned.
      
           Note that the new call is locked before any notifications are sent
           that it is live, so doing mutex_trylock() *ought* to always succeed.
           Userspace is prevented from doing sendmsg() on calls that are in a
           too-early state in rxrpc_do_sendmsg().
      
       (5) Make rxrpc_new_incoming_call() return the call with the user mutex
           held so that a ping can be scheduled immediately under it.
      
           Note that it might be worth moving the ping call into
           rxrpc_new_incoming_call() and then we can drop the mutex there.
      
       (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
           release the socket after adding the call to the socket's tree.  This
           is slightly tricky as we've dequeued the call by that point and have
           to requeue it.
      
           Note that requeuing emits a trace event.
      
       (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
           new mutex immediately and don't bother with the socket mutex at all.
      
      This patch has the nice bonus that calls on the same socket are now to some
      extent parallelisable.
      
      Note that we might want to move rxrpc_service_prealloc() calls out from the
      socket lock and give it its own lock, so that we don't hang progress in
      other calls because we're waiting for the allocator.
      
      We probably also want to avoid calling rxrpc_notify_socket() from within
      the socket lock (rxrpc_accept_call()).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Marc Dionne <marc.c.dionne@auristor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      540b1c48