1. 14 Sep, 2016 2 commits
  2. 08 Sep, 2016 3 commits
    • rxrpc: Rewrite the data and ack handling code · 248f219c
      David Howells authored
      Rewrite the data and ack handling code such that:
      
       (1) Parsing of received ACK and ABORT packets and the distribution and
           filing of DATA packets happen entirely within the data_ready context
           called from the UDP socket.  This allows us to process and discard ACK
           and ABORT packets much more quickly (they're no longer stashed on a
           queue for a background thread to process).
      
       (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim().  We instead
           keep track of the offset and length of the content of each packet in
           the sk_buff metadata.  This means we don't do any allocation in the
           receive path.
      
       (3) Jumbo DATA packet parsing is now done in data_ready context.  Rather
           than cloning the packet once for each subpacket and pulling/trimming
           it, we file the packet multiple times with an annotation for each
           indicating which subpacket is there.  From that we can directly
           calculate the offset and length.
      
       (4) A call's receive queue can be accessed without taking locks (memory
           barriers do have to be used, though).
      
       (5) Incoming calls are set up from preallocated resources and immediately
           made live.  They can then have packets queued upon them and ACKs
           generated.  If insufficient resources exist, DATA packet #1 is given a
           BUSY reply and other DATA packets are discarded.
      
       (6) sk_buffs no longer take a ref on their parent call.
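
      The offset and length calculation described in point (3) can be
      sketched in userspace as follows.  This is an illustration only, not
      the kernel code: the sizes JUMBO_DATALEN (1412) and JUMBO_HDRLEN (4)
      and the names subpacket_span() and struct span are assumptions made
      for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes, for illustration only. */
enum { JUMBO_DATALEN = 1412, JUMBO_HDRLEN = 4 };

struct span {
	size_t offset;	/* where the subpacket's content starts */
	size_t len;	/* how many bytes of content it holds */
};

/*
 * Work out where subpacket `index` sits inside a jumbo packet whose first
 * subpacket starts at `base` and whose total length is `pkt_len`.  Every
 * subpacket but the last holds JUMBO_DATALEN bytes and is followed by a
 * jumbo header; the last one takes whatever remains.
 */
static inline struct span subpacket_span(size_t base, size_t pkt_len,
					 unsigned int index, bool is_last)
{
	struct span s;

	s.offset = base + (size_t)index * (JUMBO_DATALEN + JUMBO_HDRLEN);
	s.len = is_last ? pkt_len - s.offset : JUMBO_DATALEN;
	return s;
}
```

      Because the span is derived purely from the annotation (the subpacket
      index), the same buffer can be filed several times without cloning or
      trimming it.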
      
      To make this work, the following changes are made:
      
       (1) Each call's receive buffer is now a circular buffer of sk_buff
           pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
           between the call and the socket.  This permits each sk_buff to be in
           the buffer multiple times.  The receive buffer is reused for the
           transmit buffer.
      
       (2) A circular buffer of annotations (rxtx_annotations) is kept parallel
           to the data buffer.  Transmission phase annotations indicate whether a
           buffered packet has been ACK'd or not and whether it needs
           retransmission.
      
           Receive phase annotations indicate whether a slot holds a whole packet
           or a jumbo subpacket and, if the latter, which subpacket.  They also
           note whether the packet has been decrypted in place.
      
       (3) DATA packet window tracking is much simplified.  Each phase has just
           two numbers representing the window (rx_hard_ack/rx_top and
           tx_hard_ack/tx_top).
      
           The hard_ack number is the sequence number before the base of the
           window, representing the last packet the other side says it has consumed.
           hard_ack starts from 0 and the first packet is sequence number 1.
      
           The top number is the sequence number of the highest-numbered packet
           residing in the buffer.  Packets between hard_ack+1 and top are
           soft-ACK'd to indicate they've been received, but not yet consumed.
      
           Four macros, before(), before_eq(), after() and after_eq() are added
           to compare sequence numbers within the window.  This allows for the
           top of the window to wrap when the hard-ack sequence number gets close
           to the limit.
      
           Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
           to indicate when rx_top and tx_top point at the packets with the
           LAST_PACKET bit set, indicating the end of the phase.
      
       (4) Calls are queued on the socket 'receive queue' rather than packets.
           This means that we don't have to invent dummy packets to queue to
           indicate abnormal/terminal states and we don't have to keep metadata
           packets (such as ABORTs) around.
      
       (5) The offset and length of a (sub)packet's content are now passed to
           the verify_packet security op.  This is currently expected to decrypt
           the packet in place and validate it.
      
           However, there's now nowhere to store the revised offset and length of
           the actual data within the decrypted blob (there may be a header and
           padding to skip) because an sk_buff may represent multiple packets, so
           a locate_data security op is added to retrieve these details from the
           sk_buff content when needed.
      
       (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
           individually secured and needs to be individually decrypted.  The code
           to do this is broken out into rxrpc_recvmsg_data() and shared with the
           kernel API.  It now iterates over the call's receive buffer rather
           than walking the socket receive queue.
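
      The circular buffer, its parallel annotations and the two-number
      window from points (1) to (3) can be sketched as below.  This is a
      simplified userspace model under stated assumptions: RING_SIZE, struct
      rx_ring, ring_file() and ring_consume() are hypothetical names, plain
      pointers stand in for sk_buffs, and the comparison helpers are written
      as small functions rather than macros.  No locking or memory barriers
      are shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64u			/* assumed; must be a power of two */
#define RING_MASK (RING_SIZE - 1)

typedef uint32_t seq_t;

/* Wrap-safe comparisons: valid while the values are less than 2^31 apart. */
static inline bool before(seq_t a, seq_t b)    { return (int32_t)(a - b) < 0; }
static inline bool before_eq(seq_t a, seq_t b) { return (int32_t)(a - b) <= 0; }
static inline bool after(seq_t a, seq_t b)     { return (int32_t)(a - b) > 0; }
static inline bool after_eq(seq_t a, seq_t b)  { return (int32_t)(a - b) >= 0; }

struct rx_ring {
	const void *buf[RING_SIZE];	/* stands in for sk_buff pointers */
	uint8_t annot[RING_SIZE];	/* parallel per-slot annotation */
	seq_t hard_ack;			/* last sequence number consumed */
	seq_t top;			/* highest sequence number filed */
};

/* File a packet under `seq`; refuse it if outside the window or a duplicate. */
static inline bool ring_file(struct rx_ring *r, seq_t seq,
			     const void *pkt, uint8_t annot)
{
	unsigned int ix = seq & RING_MASK;

	if (before_eq(seq, r->hard_ack) || after(seq, r->hard_ack + RING_SIZE))
		return false;
	if (r->buf[ix])
		return false;
	r->buf[ix] = pkt;
	r->annot[ix] = annot;
	if (after(seq, r->top))
		r->top = seq;
	return true;
}

/* Consume the next in-order packet, advancing hard_ack; NULL if it is absent. */
static inline const void *ring_consume(struct rx_ring *r, uint8_t *annot)
{
	seq_t seq = r->hard_ack + 1;
	unsigned int ix = seq & RING_MASK;
	const void *pkt = r->buf[ix];

	if (!pkt)
		return NULL;
	*annot = r->annot[ix];
	r->buf[ix] = NULL;
	r->hard_ack = seq;
	return pkt;
}
```

      Filing the same pointer under two sequence numbers with different
      annotations models a jumbo packet occupying several slots.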
      
      Additional changes:
      
       (1) The timers are condensed to a single timer that is set for the soonest
           of three timeouts (delayed ACK generation, DATA retransmission and
           call lifespan).
      
       (2) Transmission of ACK and ABORT packets is effected immediately from
           process-context socket ops/kernel API calls that cause them instead of
           them being punted off to a background work item.  The data_ready
           handler still has to defer to the background, though.
      
       (3) A shutdown op is added to the AF_RXRPC socket so that the AFS
           filesystem can shut down the socket and flush its own work items
           before closing the socket to deal with any in-progress service calls.
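
      The single-timer arrangement in point (1) amounts to arming one timer
      for the earliest of the three deadlines and then working out which
      events are due when it fires.  A minimal sketch, assuming hypothetical
      names (struct call_timeouts, next_timer_expiry(), due_events()) and an
      arbitrary monotonic time unit:

```c
#include <stdint.h>

struct call_timeouts {
	uint64_t ack_at;	/* delayed ACK generation */
	uint64_t resend_at;	/* DATA retransmission */
	uint64_t expire_at;	/* call lifespan */
};

enum { EV_ACK = 1, EV_RESEND = 2, EV_EXPIRE = 4 };

static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* The time at which the one shared timer should fire. */
static inline uint64_t next_timer_expiry(const struct call_timeouts *t)
{
	return min_u64(t->ack_at, min_u64(t->resend_at, t->expire_at));
}

/* When the timer fires at `now`, return a bitmask of the events that are due. */
static inline unsigned int due_events(const struct call_timeouts *t,
				      uint64_t now)
{
	unsigned int ev = 0;

	if (now >= t->ack_at)
		ev |= EV_ACK;
	if (now >= t->resend_at)
		ev |= EV_RESEND;
	if (now >= t->expire_at)
		ev |= EV_EXPIRE;
	return ev;
}
```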
      
      Future additional changes that will need to be considered:
      
       (1) Make sure that a call doesn't hog the front of the queue by receiving
           data from the network as fast as userspace is consuming it to the
           exclusion of other calls.
      
       (2) Transmit delayed ACKs from within recvmsg() when we've consumed
           sufficiently more packets to avoid the background work item needing to
           run.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Preallocate peers, conns and calls for incoming service requests · 00e90712
      David Howells authored
      Make it possible for the data_ready handler called from the UDP transport
      socket to completely instantiate an rxrpc_call structure and make it
      immediately live by preallocating all the memory it might need.  The idea
      is to cut out the background thread usage as much as possible.
      
      [Note that the preallocated structs are not actually used in this patch -
       that will be done in a future patch.]
      
      If insufficient resources are available in the preallocation buffers, it
      will be possible to discard the DATA packet in the data_ready handler or
      schedule a BUSY packet without the need to schedule an attempt at
      allocation in a background thread.
      
      To this end:
      
       (1) Preallocate rxrpc_peer, rxrpc_connection and rxrpc_call structs, each
           up to a maximum of the listen backlog size.  The backlog size is
           limited to a maximum of 32.  Only this many of each can be in the
           preallocation buffer.
      
       (2) For userspace sockets, the preallocation is charged initially by
           listen() and will be recharged by accepting or rejecting pending
           new incoming calls.
      
       (3) For kernel services {,re,dis}charging of the preallocation buffers is
           handled manually.  Two notifier callbacks have to be provided before
           kernel_listen() is invoked:
      
           (a) An indication that a new call has been instantiated.  This can be
               used to trigger background recharging.
      
           (b) An indication that a call is being discarded.  This is used when
               the socket is being released.
      
           A function, rxrpc_kernel_charge_accept() is called by the kernel
           service to preallocate a single call.  It should be passed the user ID
           to be used for that call and a callback to associate the rxrpc call
           with the kernel service's side of the ID.
      
       (4) Discard the preallocation when the socket is closed.
      
       (5) Temporarily bump the refcount on the call allocated in
           rxrpc_incoming_call() so that rxrpc_release_call() can ditch the
           preallocation ref on service calls unconditionally.  This will no
           longer be necessary once the preallocation is used.
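
      The charge/take cycle described in points (1) to (3) can be modelled
      as below.  This is a simplified sketch, not the kernel data structure:
      struct backlog, backlog_charge() and backlog_take() are hypothetical
      names, and a simple stack is used where a real implementation might
      prefer a ring.  Only the limit of 32 comes from the text above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BACKLOG 32u		/* the maximum backlog stated above */

struct call {
	int user_id;
	bool live;
};

struct backlog {
	struct call *slots[MAX_BACKLOG];
	unsigned int count;	/* preallocated calls currently available */
	unsigned int limit;	/* listen backlog size, at most MAX_BACKLOG */
};

/* Process context: add one call that the caller has already allocated. */
static inline bool backlog_charge(struct backlog *b, struct call *c)
{
	if (b->count >= b->limit || b->count >= MAX_BACKLOG)
		return false;
	b->slots[b->count++] = c;
	return true;
}

/*
 * data_ready context: take a call without allocating.  A NULL return means
 * the caller has to answer with BUSY or discard the packet.
 */
static inline struct call *backlog_take(struct backlog *b)
{
	struct call *c;

	if (b->count == 0)
		return NULL;
	c = b->slots[--b->count];
	c->live = true;
	return c;
}
```

      Taking a call frees a slot, which is what allows accepting or
      rejecting a pending call to recharge the buffer afterwards.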
      
      Note that this does not yet control the number of active service calls on a
      client - that will come in a later patch.
      
      A future development would be to provide a setsockopt() call that allows a
      userspace server to manually charge the preallocation buffer.  This would
      allow user call IDs to be provided in advance and the awkward manual accept
      stage to be bypassed.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Convert rxrpc_local::services to an hlist · de8d6c74
      David Howells authored
      Convert the rxrpc_local::services list to an hlist so that it can be
      accessed under RCU conditions more readily.
      Signed-off-by: David Howells <dhowells@redhat.com>
  3. 07 Sep, 2016 2 commits
    • rxrpc: Calls shouldn't hold socket refs · 8d94aa38
      David Howells authored
      rxrpc calls shouldn't hold refs on the sock struct.  This was done so that
      the socket wouldn't go away whilst the call was in progress, such that the
      call could reach the socket's queues.
      
      However, we can mark the socket as requiring an RCU release and rely on the
      RCU read lock.
      
      To make this work, we do:
      
       (1) rxrpc_release_call() removes the call's call user ID.  This is now
           only called from socket operations and not from the call processor:
      
      	rxrpc_accept_call() / rxrpc_kernel_accept_call()
      	rxrpc_reject_call() / rxrpc_kernel_reject_call()
      	rxrpc_kernel_end_call()
      	rxrpc_release_calls_on_socket()
      	rxrpc_recvmsg()
      
           Though it is also called in the cleanup path of
           rxrpc_accept_incoming_call() before we assign a user ID.
      
       (2) Pass the socket pointer into rxrpc_release_call() rather than getting
           it from the call so that we can get rid of uninitialised calls.
      
       (3) Fix call processor queueing to pass a ref to the work queue and to
           release that ref at the end of the processor function (or to pass it
           back to the work queue if we have to requeue).
      
       (4) Skip out of the call processor function asap if the call is complete
           and don't requeue it if the call is complete.
      
       (5) Clean up the call immediately that the refcount reaches 0 rather than
           trying to defer it.  Actual deallocation is deferred to RCU, however.
      
       (6) Don't hold socket refs for allocated calls.
      
       (7) Use the RCU read lock when queueing a message on a socket and treat
           the call's socket pointer according to RCU rules and check it for
           NULL.
      
           We also need to use the RCU read lock when viewing a call through
           procfs.
      
       (8) Transmit the final ACK/ABORT to a client call in rxrpc_release_call()
           if this hasn't been done yet so that we can then disconnect the call.
           Once the call is disconnected, it won't have any access to the
           connection struct and the UDP socket for the call work processor to be
           able to send the ACK.  Terminal retransmission will be handled by the
           connection processor.
      
       (9) Release all calls immediately on the closing of a socket rather than
           trying to defer this.  Incomplete calls will be aborted.
      
      The call refcount model is much simplified.  Refs are held on the call by:
      
       (1) A socket's user ID tree.
      
       (2) A socket's incoming call secureq and acceptq.
      
       (3) A kernel service that has a call in progress.
      
       (4) A queued call work processor.  We have to take care to put any call
           that we failed to queue.
      
       (5) sk_buffs on a socket's receive queue.  A future patch will get rid of
           this.
      
      Whilst we're at it, we can do:
      
       (1) Get rid of the RXRPC_CALL_EV_RELEASE event.  Release is now done
           entirely from the socket routines and never from the call's processor.
      
       (2) Get rid of the RXRPC_CALL_DEAD state.  Calls now end in the
           RXRPC_CALL_COMPLETE state.
      
       (3) Get rid of the rxrpc_call::destroyer work item.  Calls are now torn
           down when their refcount reaches 0 and then handed over to RCU for
           final cleanup.
      
       (4) Get rid of the rxrpc_call::deadspan timer.  Calls are cleaned up
           immediately they're finished with and don't hang around.
           Post-completion retransmission is handled by the connection processor
           once the call is disconnected.
      
       (5) Get rid of the dead call expiry setting as there's no longer a timer
           to set.
      
       (6) rxrpc_destroy_all_calls() can just check that the call list is empty.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Improve the call tracking tracepoint · fff72429
      David Howells authored
      Improve the call tracking tracepoint by showing more differentiation
      between some of the put and get events, including:
      
        (1) Getting and putting refs for the socket call user ID tree.
      
        (2) Getting and putting refs for queueing and failing to queue the call
            processor work item.
      
      Note that these aren't necessarily used in this patch, but will be taken
      advantage of in future patches.
      
      An enum is added for the event subtype numbers rather than coding them
      directly as decimal numbers and a table of 3-letter strings is provided
      rather than a sequence of ?: operators.
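
      The enum-plus-table approach can be illustrated as follows.  The
      identifiers and the 3-letter strings here are hypothetical and chosen
      for the example; they are not the names used in the patch.

```c
/* Event subtypes as an enum instead of bare decimal numbers. */
enum call_trace {
	call_trace_new,
	call_trace_get,
	call_trace_put,
	call_trace_get_userid,
	call_trace_put_userid,
	call_trace__nr		/* number of subtypes; sizes the table */
};

/* One 3-letter string (plus NUL) per subtype, indexed by the enum. */
static const char call_trace_names[call_trace__nr][4] = {
	[call_trace_new]	= "NEW",
	[call_trace_get]	= "GET",
	[call_trace_put]	= "PUT",
	[call_trace_get_userid]	= "Gus",
	[call_trace_put_userid]	= "Pus",
};

/* A table lookup replaces a chain of ?: operators. */
static inline const char *call_trace_name(enum call_trace t)
{
	return (unsigned int)t < call_trace__nr ? call_trace_names[t] : "???";
}
```

      Adding a new subtype then means adding one enumerator and one table
      row rather than extending a conditional chain.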
      Signed-off-by: David Howells <dhowells@redhat.com>
  4. 02 Sep, 2016 1 commit
    • rxrpc: Don't expose skbs to in-kernel users [ver #2] · d001648e
      David Howells authored
      Don't expose skbs to in-kernel users, such as the AFS filesystem, but
      instead provide a notification hook that indicates that a call needs
      attention and another that indicates that there's a new call to be
      collected.
      
      This makes the following possibilities more achievable:
      
       (1) Call refcounting can be made simpler if skbs don't hold refs to calls.
      
       (2) skbs referring to non-data events will be able to be freed much sooner
           rather than being queued for AFS to pick up as rxrpc_kernel_recv_data
           will be able to consult the call state.
      
       (3) We can shortcut the receive phase when a call is remotely aborted
           because we don't have to go through all the packets to get to the one
           cancelling the operation.
      
       (4) It makes it easier to do encryption/decryption directly between AFS's
           buffers and sk_buffs.
      
       (5) Encryption/decryption can more easily be done in the AFS's thread
           contexts - usually that of the userspace process that issued a syscall
           - rather than in one of rxrpc's background threads on a workqueue.
      
       (6) AFS will be able to wait synchronously on a call inside AF_RXRPC.
      
      To make this work, the following interface function has been added:
      
           int rxrpc_kernel_recv_data(
      		struct socket *sock, struct rxrpc_call *call,
      		void *buffer, size_t bufsize, size_t *_offset,
      		bool want_more, u32 *_abort_code);
      
      This is the recvmsg equivalent.  It allows the caller to find out about the
      state of a specific call and to transfer received data into a buffer
      piecemeal.
      
      afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction
      logic between them.  They don't wait synchronously yet because the socket
      lock needs to be dealt with.
      
      Five interface functions have been removed:
      
      	rxrpc_kernel_is_data_last()
      	rxrpc_kernel_get_abort_code()
      	rxrpc_kernel_get_error_number()
      	rxrpc_kernel_free_skb()
      	rxrpc_kernel_data_consumed()
      
      As a temporary hack, sk_buffs going to an in-kernel call are queued on the
      rxrpc_call struct (->knlrecv_queue) rather than being handed over to the
      in-kernel user.  To process the queue internally, a temporary function,
      temp_deliver_data() has been added.  This will be replaced with common code
      between the rxrpc_recvmsg() path and the rxrpc_kernel_recv_data() path in a
      future patch.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 30 Aug, 2016 2 commits
  6. 23 Aug, 2016 1 commit
  7. 06 Aug, 2016 1 commit
    • rxrpc: Fix races between skb free, ACK generation and replying · 372ee163
      David Howells authored
      Inside the kafs filesystem it is possible to occasionally have a call
      processed and terminated before we've had a chance to check whether we need
      to clean up the rx queue for that call because afs_send_simple_reply() ends
      the call when it is done, but this is done in a workqueue item that might
      happen to run to completion before afs_deliver_to_call() completes.
      
      Further, it is possible for rxrpc_kernel_send_data() to be called to send a
      reply before the last request-phase data skb is released.  The rxrpc skb
      destructor is where the ACK processing is done and the call state is
      advanced upon release of the last skb.  ACK generation is also deferred to
      a work item because it's possible that the skb destructor is not called in
      a context where kernel_sendmsg() can be invoked.
      
      To this end, the following changes are made:
      
       (1) rxrpc_kernel_data_consumed() is added.  This should be called whenever
           an skb is emptied so as to crank the ACK and call states.  This does
           not release the skb, however.  rxrpc_kernel_free_skb() must now be
           called to achieve that.  These together replace
           rxrpc_kernel_data_delivered().
      
       (2) rxrpc_kernel_data_consumed() is wrapped by afs_data_consumed().
      
           This makes afs_deliver_to_call() easier to work with, as the skb can
           simply be discarded unconditionally here without trying to work out what
           the return value of the ->deliver() function means.
      
           The ->deliver() functions can, via afs_data_complete(),
           afs_transfer_reply() and afs_extract_data() mark that an skb has been
           consumed (thereby cranking the state) without the need to
           conditionally free the skb to make sure the state is correct on an
           incoming call for when the call processor tries to send the reply.
      
       (3) rxrpc_recvmsg() now has to call rxrpc_kernel_data_consumed() when it
           has finished with a packet and MSG_PEEK isn't set.
      
       (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().
      
           Because of this, we no longer need to clear the destructor and put the
           call before we free the skb in cases where we don't want the ACK/call
           state to be cranked.
      
       (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
           than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
           the delivery function already), and the caller is now responsible for
           producing an abort if that was the last packet.
      
       (6) There are many bits of unmarshalling code where:
      
       		ret = afs_extract_data(call, skb, last, ...);
      		switch (ret) {
      		case 0:		break;
      		case -EAGAIN:	return 0;
      		default:	return ret;
      		}
      
           is to be found.  As -EAGAIN can now be passed back to the caller, we
           now just return if ret < 0:
      
       		ret = afs_extract_data(call, skb, last, ...);
      		if (ret < 0)
      			return ret;
      
       (7) Checks for trailing data and empty final data packets have been
           consolidated as afs_data_complete().  So:
      
      		if (skb->len > 0)
      			return -EBADMSG;
      		if (!last)
      			return 0;
      
           becomes:
      
      		ret = afs_data_complete(call, skb, last);
      		if (ret < 0)
      			return ret;
      
       (8) afs_transfer_reply() now checks the amount of data it has against the
           amount of data desired and the amount of data in the skb and returns
           an error to induce an abort if we don't get exactly what we want.
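
      The return-value convention from points (5) to (7) can be modelled in
      userspace as below.  This is a sketch under assumptions, not the kafs
      code: data_complete() is a hypothetical stand-in for
      afs_data_complete() that takes the remaining byte count in place of an
      skb, and classify_delivery() is an invented helper showing how a
      caller might turn -EAGAIN on the last packet into an abort.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in for the consolidated check: trailing data is an error, and more
 * packets are requested with -EAGAIN unless this was the last one.
 */
static inline int data_complete(size_t remaining, bool last)
{
	if (remaining > 0)
		return -EBADMSG;
	if (!last)
		return -EAGAIN;
	return 0;
}

enum delivery { DELIVERY_OK, DELIVERY_WAIT, DELIVERY_ABORT };

/*
 * Caller side: -EAGAIN means "wait for more data", unless the packet just
 * delivered was the last, in which case the data was short and the call
 * should be aborted.  Any other negative value also leads to an abort.
 */
static inline enum delivery classify_delivery(int ret, bool last)
{
	if (ret == 0)
		return DELIVERY_OK;
	if (ret == -EAGAIN && !last)
		return DELIVERY_WAIT;
	return DELIVERY_ABORT;
}
```

      With this convention the unmarshalling code only needs to test for
      ret < 0 and pass the value up.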
      
      Without these changes, the following oops can occasionally be observed,
      particularly if some printks are inserted into the delivery path:
      
      general protection fault: 0000 [#1] SMP
      Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
      CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G            E   4.7.0-fsdevel+ #1303
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Workqueue: kafsd afs_async_workfn [kafs]
      task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
      RIP: 0010:[<ffffffff8108fd3c>]  [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1
      RSP: 0018:ffff88040c073bc0  EFLAGS: 00010002
      RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
      RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
      FS:  0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
      Stack:
       0000000000000006 000000000be04930 0000000000000000 ffff880400000000
       ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
       ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
      Call Trace:
       [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74
       [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1
       [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189
       [<ffffffff810915f4>] lock_acquire+0x122/0x1b6
       [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6
       [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
       [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49
       [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
       [<ffffffff814c928f>] skb_dequeue+0x18/0x61
       [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs]
       [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs]
       [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs]
       [<ffffffff81063a3a>] process_one_work+0x29d/0x57c
       [<ffffffff81064ac2>] worker_thread+0x24a/0x385
       [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0
       [<ffffffff810696f5>] kthread+0xf3/0xfb
       [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40
       [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 06 Jul, 2016 3 commits
    • rxrpc: Move peer lookup from call-accept to new-incoming-conn · d991b4a3
      David Howells authored
      Move the lookup of a peer from a call that's being accepted into the
      function that creates a new incoming connection.  This will allow us to
      avoid incrementing the peer's usage count in some cases in future.
      
      Note that I haven't bothered to integrate rxrpc_get_addr_from_skb() with
      rxrpc_extract_addr_from_skb() as I'm going to delete the former in the very
      near future.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Move usage count getting into rxrpc_queue_conn() · 2c4579e4
      David Howells authored
      Rather than calling rxrpc_get_connection() manually before calling
      rxrpc_queue_conn(), do it inside the queue wrapper.
      
      This allows us to do some important fixes:
      
       (1) If the usage count is 0, do nothing.  This prevents connections from
           being reanimated once they're dead.
      
       (2) If rxrpc_queue_work() fails because the work item is already queued,
           retract the usage count increment which would otherwise be lost.
      
       (3) Don't take a ref on the connection in the work function.  By passing
           the ref through the work item, this is unnecessary.  Doing it in the
           work function is too late anyway.  Previously, connection-directed
           packets held a ref on the connection, but that's not really the best
           idea.
      
      And another useful change:
      
       (*) Don't need to take a refcount on the connection in the data_ready
           handler unless we invoke the connection's work item.  We're using RCU
           there so that's otherwise redundant.
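
      The three fixes listed above follow a common pattern: take a ref only
      if the object is still alive, hand that ref to the work item, and
      give it back if the item turns out to be queued already.  A userspace
      sketch using C11 atomics, in which struct conn, conn_queue(),
      conn_work() and the work_pending flag are hypothetical stand-ins for
      the kernel's connection, queue wrapper and workqueue:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct conn {
	atomic_int usage;		/* reference count */
	atomic_bool work_pending;	/* stands in for "work item queued" */
};

/* Take a ref only if the count is not already zero. */
static inline bool conn_get_unless_zero(struct conn *c)
{
	int u = atomic_load(&c->usage);

	while (u != 0) {
		/* On failure, u is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&c->usage, &u, u + 1))
			return true;
	}
	return false;
}

static inline void conn_put(struct conn *c)
{
	atomic_fetch_sub(&c->usage, 1);
}

/* Queue the work item, passing it a ref; false if dead or already queued. */
static inline bool conn_queue(struct conn *c)
{
	if (!conn_get_unless_zero(c))
		return false;		/* dead: must not be reanimated */
	if (!atomic_exchange(&c->work_pending, true))
		return true;		/* the ref now belongs to the work item */
	conn_put(c);			/* already queued: retract the increment */
	return false;
}

/* Work function: takes no ref of its own and drops the one it was passed. */
static inline void conn_work(struct conn *c)
{
	atomic_store(&c->work_pending, false);
	/* ... event processing would go here ... */
	conn_put(c);
}
```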
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Turn connection #defines into enums and put outside struct def · bba304db
      David Howells authored
      Turn the connection event and state #define lists into enums and move
      outside of the struct definition.
      
      Whilst we're at it, change _SERVER to _SERVICE in those identifiers and add
      EV_ into the event name to distinguish them from flags and states.
      
      Also add a symbol indicating the number of states and use that in the state
      text array.
      Signed-off-by: David Howells <dhowells@redhat.com>
  9. 22 Jun, 2016 4 commits
  10. 15 Jun, 2016 2 commits
    • rxrpc: Rework local endpoint management · 4f95dd78
      David Howells authored
      Rework the local RxRPC endpoint management.
      
      Local endpoint objects are maintained in a flat list as before.  This
      should be okay as there shouldn't be more than one per open AF_RXRPC socket
      (there can be fewer as local endpoints can be shared if their local service
      ID is 0 and they share the same local transport parameters).
      
      Changes:
      
       (1) Local endpoints may now only be shared if they have local service ID 0
           (ie. they're not being used for listening).
      
           This prevents a scenario where process A is listening on the Cache
           Manager port and process B contacts a fileserver - which may then
           attempt to send CM requests back to B.  But if A and B are sharing a
           local endpoint, A will get the CM requests meant for B.
      
       (2) We use a mutex to handle lookups and don't provide RCU-only lookups
           since we only expect to access the list when opening a socket or
           destroying an endpoint.
      
           The local endpoint object is pointed to by the transport socket's
           sk_user_data for the life of the transport socket - allowing us to
           refer to it directly from the sk_data_ready and sk_error_report
           callbacks.
      
       (3) atomic_inc_not_zero() now exists and can be used to only share a local
           endpoint if the last reference hasn't yet gone.
      
       (4) We can remove rxrpc_local_lock - a spinlock that had to be taken with
           BH processing disabled given that we assume sk_user_data won't change
           under us.
      
       (5) The transport socket is shut down before we clear the sk_user_data
           pointer so that we can be sure that the transport socket's callbacks
           won't be invoked once the RCU destruction is scheduled.
      
       (6) Local endpoints have a work item that handles both destruction and
           event processing.  This means that destruction doesn't then need to
           wait for event processing.  The event queues can then be cleared after
           the transport socket is shut down.
      
       (7) Local endpoints are no longer available for resurrection beyond the
           life of the sockets that had them open.  As soon as their last ref
           goes, they are scheduled for destruction and may not have their usage
           count moved from 0.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Rework peer object handling to use hash table and RCU · be6e6707
      David Howells authored
      Rework peer object handling to use a hash table instead of a flat list and
      to use RCU.  Peer objects are no longer destroyed by passing them to a
      workqueue to process, but rather are just passed to the RCU garbage
      collector as kfree'able objects.
      
      The hash function uses the local endpoint plus all the components of the
      remote address, except for the RxRPC service ID.  Peers thus represent a
      UDP port on the remote machine as contacted by a UDP port on this machine.
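
      As an illustration of which fields feed the hash, the sketch below
      mixes a local endpoint identifier with the remote family, port and
      address, and deliberately skips the service ID.  FNV-1a is swapped in
      here purely for the example and is not the hash used by the patch;
      struct remote_addr and peer_hash() are likewise hypothetical, and only
      IPv4 is modelled.

```c
#include <stddef.h>
#include <stdint.h>

struct remote_addr {
	uint16_t service;	/* RxRPC service ID: not part of the hash */
	uint16_t family;
	uint16_t port;
	uint8_t addr[4];	/* IPv4 address bytes */
};

/* Fold `len` bytes into a running 32-bit FNV-1a hash. */
static inline uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Hash the local endpoint plus every remote component except the service ID. */
static inline uint32_t peer_hash(uintptr_t local_id,
				 const struct remote_addr *ra)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, &local_id, sizeof(local_id));
	h = fnv1a(h, &ra->family, sizeof(ra->family));
	h = fnv1a(h, &ra->port, sizeof(ra->port));
	h = fnv1a(h, ra->addr, sizeof(ra->addr));
	return h;
}
```

      Two addresses that differ only in service ID therefore land on the
      same peer, matching the statement that a peer represents a remote UDP
      port rather than a particular service.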
      
      The RCU read lock is used to handle non-creating lookups so that they can
      be called from bottom half context in the sk_error_report handler without
      having to lock the hash table against modification.
      rxrpc_lookup_peer_rcu() *does* take a reference on the peer object as in
      the future, this will be passed to a work item for error distribution in
      the error_report path and this function will cease being used in the
      data_ready path.
      
      Creating lookups are done under spinlock rather than mutex as they might be
      set up due to an external stimulus if the local endpoint is a server.
      
      Captured network error messages (ICMP) are handled with respect to this
      struct and MTU size and RTT are cached here.
      Signed-off-by: David Howells <dhowells@redhat.com>
  11. 13 Jun, 2016 1 commit
    • rxrpc: Rename files matching ar-*.c to git rid of the "ar-" prefix · 8c3e34a4
      David Howells authored
      Rename files matching net/rxrpc/ar-*.c to get rid of the "ar-" prefix.
      This will aid splitting those files by making it easier to come up with new
      names.
      
      Note that not all files are simply renamed from ar-X.c to X.c.  The
      following exceptions are made:
      
       (*) ar-call.c -> call_object.c
           ar-ack.c -> call_event.c
      
           call_object.c is going to contain the core of the call object
           handling.  Call event handling is all going to be in call_event.c.
      
       (*) ar-accept.c -> call_accept.c
      
           Incoming call handling is going to be here.
      
       (*) ar-connection.c -> conn_object.c
           ar-connevent.c -> conn_event.c
      
           The former file is going to have the basic connection object handling,
           but there will likely be some differentiation between client
           connections and service connections in additional files later.  The
           latter file will have all the connection-level event handling.
      
       (*) ar-local.c -> local_object.c
      
           This will have the local endpoint object handling code.  The local
           endpoint event handling code will later be split out into
           local_event.c.
      
       (*) ar-peer.c -> peer_object.c
      
           This will have the peer endpoint object handling code.  Peer event
           handling code will be placed in peer_event.c (for the moment, there is
           none).
      
       (*) ar-error.c -> peer_event.c
      
           This will become the peer event handling code, though for the moment
           it's actually driven from the local endpoint's perspective.
      
      Note that I haven't renamed ar-transport.c to transport_object.c as the
      intention is to delete it when the rxrpc_transport struct is excised.
      
      The only file that actually has its contents changed is net/rxrpc/Makefile.
      
      net/rxrpc/ar-internal.h will need its section marker comments updating, but
      I'll do that in a separate patch to make it easier for git to follow the
      history across the rename.  I may also want to rename ar-internal.h at some
      point - but that would mean updating all the #includes and I'd rather do
      that in a separate step.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      8c3e34a4
  12. 04 Jun, 2016 1 commit
    • J
      rxrpc: Use pr_<level> and pr_fmt, reduce object size a few KB · 9b6d5398
      Joe Perches committed
      Use the more common kernel logging style and reduce object size.
      
      The logging message prefix changes from a mixture of
      "RxRPC:" and "RXRPC:" to "af_rxrpc: ".
      
      $ size net/rxrpc/built-in.o*
         text	   data	    bss	    dec	    hex	filename
        64172	   1972	   8304	  74448	  122d0	net/rxrpc/built-in.o.new
        67512	   1972	   8304	  77788	  12fdc	net/rxrpc/built-in.o.old
      
      Miscellanea:
      
      o Consolidate the ASSERT macros to use a single pr_err call with
        decimal and hexadecimal output and a stringified #OP argument
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b6d5398
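The prefixing scheme described in the commit above can be illustrated with a minimal userspace sketch. It assumes stand-in macros: `pr_err_to()` and `report_bad_packet()` are hypothetical names invented here, and the real kernel `pr_err()` writes to the kernel log rather than to a buffer. In kernel code, `pr_fmt` has to be defined before `<linux/printk.h>` is included for the prefix to take effect.

```c
#include <stdio.h>

/* Assumption: userspace stand-in for the kernel's pr_fmt() hook.  Every
 * format string passed through it gains the same literal prefix. */
#define pr_fmt(fmt) "af_rxrpc: " fmt

/* Hypothetical helper macro: formats into a buffer instead of the kernel
 * log so that the result can be inspected.  Uses the GNU ##__VA_ARGS__
 * extension, as the kernel's own logging macros do. */
#define pr_err_to(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), ##__VA_ARGS__)

/* Hypothetical caller: reports an unexpected packet type. */
static int report_bad_packet(char *buf, size_t size, int type)
{
	return pr_err_to(buf, size, "bad packet type %d\n", type);
}
```

Because the prefix is joined to the format string by literal concatenation at compile time, each call site no longer needs to carry its own "RxRPC:" or "RXRPC:" text.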
  13. 12 Apr, 2016 1 commit
  14. 04 Mar, 2016 4 commits
    • D
      rxrpc: Adjust some whitespace and comments · b4f1342f
      David Howells committed
      Remove some excess whitespace, insert some missing spaces and adjust a
      couple of comments.
      Signed-off-by: David Howells <dhowells@redhat.com>
      b4f1342f
    • D
      rxrpc: Keep the skb private record of the Rx header in host byte order · 0d12f8a4
      David Howells committed
      Currently, a copy of the Rx packet header is copied into the sk_buff
      private data so that we can advance the pointer into the buffer,
      potentially discarding the original.  At the moment, this copy is held in
      network byte order, but this means we're doing a lot of unnecessary
      translations.
      
      The reasons it was done this way are that we need the values in network
      byte order occasionally and we can use the copy, slightly modified, as part
      of an iov array when sending an ack or an abort packet.
      
      However, it seems more reasonable on review that it would be better kept in
      host byte order and that we make up a new header when we want to send
      another packet.
      
      To this end, rename the original header struct to rxrpc_wire_header (with
      BE fields) and institute a variant called rxrpc_host_header that has host
      order fields.  Change the struct in the sk_buff private data into an
      rxrpc_host_header and translate the values when filling it in.
      
      This further allows us to keep values kept in various structures in host
      byte order rather than network byte order and allows removal of some fields
      that are byteswapped duplicates.
      Signed-off-by: David Howells <dhowells@redhat.com>
      0d12f8a4
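The wire/host header split described in the commit above can be sketched in userspace C as follows. This is an illustration under assumptions: the struct and function names are hypothetical, the field list is a simplified subset chosen for the example, and the real rxrpc_wire_header and rxrpc_host_header definitions live in the kernel sources.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical, simplified wire header: every multi-byte field is held in
 * network (big-endian) byte order, exactly as it appears in the packet. */
struct wire_header {
	uint32_t epoch;
	uint32_t cid;
	uint32_t call_number;
	uint32_t seq;
	uint32_t serial;
	uint8_t  type;
	uint8_t  flags;
	uint16_t service_id;
};

/* Hypothetical host-order variant: same fields, native byte order, so the
 * rest of the code can compare and increment them without byte swapping. */
struct host_header {
	uint32_t epoch;
	uint32_t cid;
	uint32_t call_number;
	uint32_t seq;
	uint32_t serial;
	uint8_t  type;
	uint8_t  flags;
	uint16_t service_id;
};

/* Translate once on receive, when filling in the private record. */
static void wire_to_host(const struct wire_header *w, struct host_header *h)
{
	h->epoch       = ntohl(w->epoch);
	h->cid         = ntohl(w->cid);
	h->call_number = ntohl(w->call_number);
	h->seq         = ntohl(w->seq);
	h->serial      = ntohl(w->serial);
	h->type        = w->type;
	h->flags       = w->flags;
	h->service_id  = ntohs(w->service_id);
}

/* Build a fresh wire header when another packet is to be sent. */
static void host_to_wire(const struct host_header *h, struct wire_header *w)
{
	w->epoch       = htonl(h->epoch);
	w->cid         = htonl(h->cid);
	w->call_number = htonl(h->call_number);
	w->seq         = htonl(h->seq);
	w->serial      = htonl(h->serial);
	w->type        = h->type;
	w->flags       = h->flags;
	w->service_id  = htons(h->service_id);
}
```

The design trade-off is one translation per received packet and one per transmitted header, in exchange for no byte swapping at each point where a field is examined.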
    • D
      rxrpc: Rename call events to begin RXRPC_CALL_EV_ · 4c198ad1
      David Howells committed
      Rename call event names to begin RXRPC_CALL_EV_ to distinguish them from the
      flags.
      Signed-off-by: David Howells <dhowells@redhat.com>
      4c198ad1
    • D
      rxrpc: Fix a case where a call event bit is being used as a flag bit · e721498a
      David Howells committed
      Fix a case where RXRPC_CALL_RELEASE (an event) is being used to specify a
      flag bit.  RXRPC_CALL_RELEASED should be used instead.
      Signed-off-by: David Howells <dhowells@redhat.com>
      e721498a
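The class of bug fixed by the commit above can be illustrated with a small userspace sketch. Everything here is hypothetical: the enum names, their numeric values and the `mock_call` struct are invented for the example and are not the kernel's definitions. The point is only that event numbers and flag numbers index different bitmaps, so passing an event number where a flag number is expected silently touches the wrong bit.

```c
/* Hypothetical event numbers: bit positions in the events word. */
enum call_event {
	CALL_EV_ABORT,		/* bit 0 of events */
	CALL_EV_RELEASE		/* bit 1 of events */
};

/* Hypothetical flag numbers: bit positions in the flags word. */
enum call_flag {
	CALL_FLAG_RELEASED,	/* bit 0 of flags */
	CALL_FLAG_HAS_USERID	/* bit 1 of flags */
};

struct mock_call {
	unsigned long events;
	unsigned long flags;
};

/* Set a bit in the flags word. */
static void set_call_flag(struct mock_call *call, enum call_flag flag)
{
	call->flags |= 1UL << flag;
}

/* Test a bit in the flags word. */
static int test_call_flag(const struct mock_call *call, enum call_flag flag)
{
	return (call->flags >> flag) & 1UL;
}
```

In C an enumerator of one enum type converts to another without a diagnostic by default, which is why a distinct naming prefix for events helps make this kind of mix-up visible in review.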
  15. 30 Mar, 2010 1 commit
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  16. 23 Mar, 2010 2 commits
  17. 13 Aug, 2008 1 commit
  18. 18 Feb, 2008 1 commit
    • J
      net/rxrpc: Use BUG_ON · 163e3cb7
      Julia Lawall committed
      if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
      side-effects to allow a definition of BUG_ON that drops the code completely.
      
      The semantic patch that makes this change is as follows:
      (http://www.emn.fr/x-info/coccinelle/)
      
      // <smpl>
      @ disable unlikely @ expression E,f; @@
      
      (
        if (<... f(...) ...>) { BUG(); }
      |
      - if (unlikely(E)) { BUG(); }
      + BUG_ON(E);
      )
      
      @@ expression E,f; @@
      
      (
        if (<... f(...) ...>) { BUG(); }
      |
      - if (E) { BUG(); }
      + BUG_ON(E);
      )
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      163e3cb7
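The before and after forms targeted by the semantic patch in the commit above can be sketched in userspace C. This is an illustration under assumptions: the macros are simplified stand-ins (the kernel's real `BUG()` and `BUG_ON()` are architecture- and config-dependent), `NO_BUG_CHECKS` is a hypothetical switch invented here, and the two `checked_div_*()` functions are made-up callers.

```c
#include <stdlib.h>

/* Stand-in for the kernel's BUG(): stop the program. */
#define BUG() abort()

#ifdef NO_BUG_CHECKS
/* Hypothetical build option that drops the check completely.  This is only
 * safe when the condition has no side effects, which is why the semantic
 * patch leaves tests containing function calls alone. */
#define BUG_ON(cond) do { } while (0)
#else
#define BUG_ON(cond) do { if (cond) BUG(); } while (0)
#endif

/* Before: open-coded test followed by BUG().  The test stays in the
 * object code whatever definition of BUG_ON is in effect. */
static int checked_div_old(int a, int b)
{
	if (b == 0)
		BUG();
	return a / b;
}

/* After: the whole check is expressed through BUG_ON(), so a build that
 * defines BUG_ON() as empty removes the comparison as well. */
static int checked_div_new(int a, int b)
{
	BUG_ON(b == 0);
	return a / b;
}
```

Both functions behave identically in a normal build; the difference only shows up when `BUG_ON()` is defined away.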
  19. 27 Apr, 2007 2 commits
    • D
      [AF_RXRPC]: Add an interface to the AF_RXRPC module for the AFS filesystem to use · 651350d1
      David Howells committed
      Add an interface to the AF_RXRPC module so that the AFS filesystem module can
      more easily make use of the services available.  AFS still opens a socket but
      then uses the action functions in lieu of sendmsg() and registers an intercept
      function to grab messages before they're queued on the socket Rx queue.
      
      This permits AFS (or whatever) to:
      
       (1) Avoid the overhead of using the recvmsg() call.
      
       (2) Use different keys directly on individual client calls on one socket
           rather than having to open a whole slew of sockets, one for each key it
           might want to use.
      
       (3) Avoid calling request_key() at the point of issue of a call or opening of
           a socket.  This is done instead by AFS at the point of open(), unlink() or
           other VFS operation and the key handed through.
      
       (4) Request the use of something other than GFP_KERNEL to allocate memory.
      
      Furthermore:
      
       (*) The socket buffer markings used by RxRPC are made available for AFS so
           that it can interpret the cooked RxRPC messages itself.
      
       (*) rxgen (un)marshalling abort codes are made available.
      
      
      The following documentation for the kernel interface is added to
      Documentation/networking/rxrpc.txt:
      
      =========================
      AF_RXRPC KERNEL INTERFACE
      =========================
      
      The AF_RXRPC module also provides an interface for use by in-kernel utilities
      such as the AFS filesystem.  This permits such a utility to:
      
       (1) Use different keys directly on individual client calls on one socket
           rather than having to open a whole slew of sockets, one for each key it
           might want to use.
      
       (2) Avoid having RxRPC call request_key() at the point of issue of a call or
           opening of a socket.  Instead the utility is responsible for requesting a
           key at the appropriate point.  AFS, for instance, would do this during VFS
           operations such as open() or unlink().  The key is then handed through
           when the call is initiated.
      
       (3) Request the use of something other than GFP_KERNEL to allocate memory.
      
       (4) Avoid the overhead of using the recvmsg() call.  RxRPC messages can be
           intercepted before they get put into the socket Rx queue and the socket
           buffers manipulated directly.
      
      To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket,
      bind an address as appropriate and listen if it's to be a server socket, but
      then it passes this to the kernel interface functions.
      
      The kernel interface functions are as follows:
      
       (*) Begin a new client call.
      
      	struct rxrpc_call *
      	rxrpc_kernel_begin_call(struct socket *sock,
      				struct sockaddr_rxrpc *srx,
      				struct key *key,
      				unsigned long user_call_ID,
      				gfp_t gfp);
      
           This allocates the infrastructure to make a new RxRPC call and assigns
           call and connection numbers.  The call will be made on the UDP port that
           the socket is bound to.  The call will go to the destination address of a
           connected client socket unless an alternative is supplied (srx is
           non-NULL).
      
           If a key is supplied then this will be used to secure the call instead of
           the key bound to the socket with the RXRPC_SECURITY_KEY sockopt.  Calls
           secured in this way will still share connections if at all possible.
      
           The user_call_ID is equivalent to that supplied to sendmsg() in the
           control data buffer.  It is entirely feasible to use this to point to a
           kernel data structure.
      
           If this function is successful, an opaque reference to the RxRPC call is
           returned.  The caller now holds a reference on this and it must be
           properly ended.
      
       (*) End a client call.
      
      	void rxrpc_kernel_end_call(struct rxrpc_call *call);
      
           This is used to end a previously begun call.  The user_call_ID is expunged
           from AF_RXRPC's knowledge and will not be seen again in association with
           the specified call.
      
       (*) Send data through a call.
      
      	int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg,
      				   size_t len);
      
           This is used to supply either the request part of a client call or the
           reply part of a server call.  msg.msg_iovlen and msg.msg_iov specify the
           data buffers to be used.  msg_iov may not be NULL and must point
           exclusively to in-kernel virtual addresses.  msg.msg_flags may be given
           MSG_MORE if there will be subsequent data sends for this call.
      
           The msg must not specify a destination address, control data or any flags
           other than MSG_MORE.  len is the total amount of data to transmit.
      
       (*) Abort a call.
      
      	void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code);
      
           This is used to abort a call if it's still in an abortable state.  The
           abort code specified will be placed in the ABORT message sent.
      
       (*) Intercept received RxRPC messages.
      
      	typedef void (*rxrpc_interceptor_t)(struct sock *sk,
      					    unsigned long user_call_ID,
      					    struct sk_buff *skb);
      
      	void
      	rxrpc_kernel_intercept_rx_messages(struct socket *sock,
      					   rxrpc_interceptor_t interceptor);
      
           This installs an interceptor function on the specified AF_RXRPC socket.
           All messages that would otherwise wind up in the socket's Rx queue are
           then diverted to this function.  Note that care must be taken to process
           the messages in the right order to maintain DATA message sequentiality.
      
           The interceptor function itself is provided with the address of the socket
           handling the incoming message, the ID assigned by the kernel utility
           to the call and the socket buffer containing the message.
      
           The skb->mark field indicates the type of message:
      
      	MARK				MEANING
      	===============================	=======================================
      	RXRPC_SKB_MARK_DATA		Data message
      	RXRPC_SKB_MARK_FINAL_ACK	Final ACK received for an incoming call
      	RXRPC_SKB_MARK_BUSY		Client call rejected as server busy
      	RXRPC_SKB_MARK_REMOTE_ABORT	Call aborted by peer
      	RXRPC_SKB_MARK_NET_ERROR	Network error detected
      	RXRPC_SKB_MARK_LOCAL_ERROR	Local error encountered
      	RXRPC_SKB_MARK_NEW_CALL		New incoming call awaiting acceptance
      
           The remote abort message can be probed with rxrpc_kernel_get_abort_code().
           The two error messages can be probed with rxrpc_kernel_get_error_number().
           A new call can be accepted with rxrpc_kernel_accept_call().
      
           Data messages can have their contents extracted with the usual bunch of
           socket buffer manipulation functions.  A data message can be determined to
           be the last one in a sequence with rxrpc_kernel_is_data_last().  When a
           data message has been used up, rxrpc_kernel_data_delivered() should be
           called on it.
      
           Non-data messages should be handed to rxrpc_kernel_free_skb() to dispose
           of.  It is possible to get extra refs on all types of message for later
           freeing, but this may pin the state of a call until the message is finally
           freed.
      
       (*) Accept an incoming call.
      
      	struct rxrpc_call *
      	rxrpc_kernel_accept_call(struct socket *sock,
      				 unsigned long user_call_ID);
      
           This is used to accept an incoming call and to assign it a call ID.  This
           function is similar to rxrpc_kernel_begin_call() and calls accepted must
           be ended in the same way.
      
           If this function is successful, an opaque reference to the RxRPC call is
           returned.  The caller now holds a reference on this and it must be
           properly ended.
      
       (*) Reject an incoming call.
      
      	int rxrpc_kernel_reject_call(struct socket *sock);
      
           This is used to reject the first incoming call on the socket's queue with
           a BUSY message.  -ENODATA is returned if there were no incoming calls.
           Other errors may be returned if the call had been aborted (-ECONNABORTED)
           or had timed out (-ETIME).
      
       (*) Record the delivery of a data message and free it.
      
      	void rxrpc_kernel_data_delivered(struct sk_buff *skb);
      
           This is used to record a data message as having been delivered and to
           update the ACK state for the call.  The socket buffer will be freed.
      
       (*) Free a message.
      
      	void rxrpc_kernel_free_skb(struct sk_buff *skb);
      
           This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC
           socket.
      
       (*) Determine if a data message is the last one on a call.
      
      	bool rxrpc_kernel_is_data_last(struct sk_buff *skb);
      
           This is used to determine if a socket buffer holds the last data message
           to be received for a call (true will be returned if it does, false
           if not).
      
           The data message will be part of the reply on a client call and the
           request on an incoming call.  In the latter case there will be more
           messages, but in the former case there will not.
      
       (*) Get the abort code from an abort message.
      
      	u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb);
      
           This is used to extract the abort code from a remote abort message.
      
       (*) Get the error number from a local or network error message.
      
      	int rxrpc_kernel_get_error_number(struct sk_buff *skb);
      
           This is used to extract the error number from a message indicating either
           a local error occurred or a network error occurred.
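The message handling rules listed above can be summarised as a dispatch on the mark value. The following is a userspace sketch, not kernel code: `struct mock_skb`, `enum rx_action` and `classify_message()` are hypothetical names, and the numeric values given to the RXRPC_SKB_MARK_* constants are arbitrary stand-ins (the real values come from the kernel headers). A real interceptor would receive a `struct sk_buff *` and call the rxrpc_kernel_*() functions named in the comments.

```c
/* Assumption: arbitrary stand-in values for the documented marks. */
enum {
	RXRPC_SKB_MARK_DATA,
	RXRPC_SKB_MARK_FINAL_ACK,
	RXRPC_SKB_MARK_BUSY,
	RXRPC_SKB_MARK_REMOTE_ABORT,
	RXRPC_SKB_MARK_NET_ERROR,
	RXRPC_SKB_MARK_LOCAL_ERROR,
	RXRPC_SKB_MARK_NEW_CALL
};

/* Hypothetical stand-in for struct sk_buff: only the mark is modelled. */
struct mock_skb {
	unsigned int mark;
};

/* What the interceptor should do with a message, per the text above. */
enum rx_action {
	RX_CONSUME_DATA,	/* read it, then rxrpc_kernel_data_delivered() */
	RX_ACCEPT_OR_REJECT,	/* rxrpc_kernel_accept_call() or _reject_call() */
	RX_READ_ABORT_CODE,	/* rxrpc_kernel_get_abort_code(), then free */
	RX_READ_ERROR_NUMBER,	/* rxrpc_kernel_get_error_number(), then free */
	RX_JUST_FREE		/* rxrpc_kernel_free_skb() */
};

static enum rx_action classify_message(const struct mock_skb *skb)
{
	switch (skb->mark) {
	case RXRPC_SKB_MARK_DATA:
		return RX_CONSUME_DATA;
	case RXRPC_SKB_MARK_NEW_CALL:
		return RX_ACCEPT_OR_REJECT;
	case RXRPC_SKB_MARK_REMOTE_ABORT:
		return RX_READ_ABORT_CODE;
	case RXRPC_SKB_MARK_NET_ERROR:
	case RXRPC_SKB_MARK_LOCAL_ERROR:
		return RX_READ_ERROR_NUMBER;
	default:
		/* FINAL_ACK, BUSY and anything unrecognised carry no payload
		 * to extract in this sketch. */
		return RX_JUST_FREE;
	}
}
```

Only data messages go through rxrpc_kernel_data_delivered(); every other type ends with rxrpc_kernel_free_skb() once any abort code or error number has been read.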
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      651350d1
    • D
      [AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both · 17926a79
      David Howells committed
      Provide AF_RXRPC sockets that can be used to talk to AFS servers, or serve
      answers to AFS clients.  KerberosIV security is fully supported.  The patches
      and some example test programs can be found in:
      
      	http://people.redhat.com/~dhowells/rxrpc/
      
      This will eventually replace the old implementation of kernel-only RxRPC
      currently resident in net/rxrpc/.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      17926a79