1. 01 7月, 2017 2 次提交
  2. 08 6月, 2017 2 次提交
    • D
      rxrpc: Provide a cmsg to specify the amount of Tx data for a call · e754eba6
      David Howells 提交于
      Provide a control message that can be specified on the first sendmsg() of a
      client call or the first sendmsg() of a service response to indicate the
      total length of the data to be transmitted for that call.
      
      Currently, because the length of the payload of an encrypted DATA packet is
      encrypted in front of the data, the packet cannot be encrypted until we
      know how much data it will hold.
      
      By specifying the length at the beginning of the transmit phase, each DATA
      packet length can be set before we start loading data from userspace (where
      several sendmsg() calls may contribute to a particular packet).
      
      An error will be returned if too little or too much data is presented in
      the Tx phase.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      e754eba6
    • D
      rxrpc: Provide a getsockopt call to query what cmsgs types are supported · 515559ca
      David Howells 提交于
      Provide a getsockopt() call that can query what cmsg types are supported by
      AF_RXRPC.
      515559ca
  3. 05 6月, 2017 3 次提交
    • D
      rxrpc: Implement service upgrade · 4722974d
      David Howells 提交于
      Implement AuriStor's service upgrade facility.  There are three problems
      that this is meant to deal with:
      
       (1) Various of the standard AFS RPC calls have IPv4 addresses in their
           requests and/or replies - but there's no room for including IPv6
           addresses.
      
       (2) Definition of IPv6-specific RPC operations in the standard operation
           sets has not yet been achieved.
      
       (3) One could envision the creation a new service on the same port that as
           the original service.  The new service could implement improved
           operations - and the client could try this first, falling back to the
           original service if it's not there.
      
           Unfortunately, certain servers ignore packets addressed to a service
           they don't implement and don't respond in any way - not even with an
           ABORT.  This means that the client must then wait for the call timeout
           to occur.
      
      What service upgrade does is to see if the connection is marked as being
      'upgradeable' and if so, change the service ID in the server and thus the
      request and reply formats.  Note that the upgrade isn't mandatory - a
      server that supports only the original call set will ignore the upgrade
      request.
      
      In the protocol, the procedure is then as follows:
      
       (1) To request an upgrade, the first DATA packet in a new connection must
           have the userStatus set to 1 (this is normally 0).  The userStatus
           value is normally ignored by the server.
      
       (2) If the server doesn't support upgrading, the reply packets will
           contain the same service ID as for the first request packet.
      
       (3) If the server does support upgrading, all future reply packets on that
           connection will contain the new service ID and the new service ID will
           be applied to *all* further calls on that connection as well.
      
       (4) The RPC op used to probe the upgrade must take the same request data
           as the shadow call in the upgrade set (but may return a different
           reply).  GetCapability RPC ops were added to all standard sets for
           just this purpose.  Ops where the request formats differ cannot be
           used for probing.
      
       (5) The client must wait for completion of the probe before sending any
           further RPC ops to the same destination.  It should then use the
           service ID that recvmsg() reported back in all future calls.
      
       (6) The shadow service must have call definitions for all the operation
           IDs defined by the original service.
      
      
      To support service upgrading, a server should:
      
       (1) Call bind() twice on its AF_RXRPC socket before calling listen().
           Each bind() should supply a different service ID, but the transport
           addresses must be the same.  This allows the server to receive
           requests with either service ID.
      
       (2) Enable automatic upgrading by calling setsockopt(), specifying
           RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
           unsigned shorts as the argument:
      
      	unsigned short optval[2];
      
           This specifies a pair of service IDs.  They must be different and must
           match the service IDs bound to the socket.  Member 0 is the service ID
           to upgrade from and member 1 is the service ID to upgrade to.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      4722974d
    • D
      rxrpc: Permit multiple service binding · 28036f44
      David Howells 提交于
      Permit bind() to be called on an AF_RXRPC socket more than once (currently
      maximum twice) to bind multiple listening services to it.  There are some
      restrictions:
      
       (1) All bind() calls involved must have a non-zero service ID.
      
       (2) The service IDs must all be different.
      
       (3) The rest of the address (notably the transport part) must be the same
           in all (a single UDP socket is shared).
      
       (4) This must be done before listen() or sendmsg() is called.
      
      This allows someone to connect to the service socket with different service
      IDs and lays the foundation for service upgrading.
      
      The service ID used by an incoming call can be extracted from the msg_name
      returned by recvmsg().
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      28036f44
    • D
      rxrpc: Separate the connection's protocol service ID from the lookup ID · 68d6d1ae
      David Howells 提交于
      Keep the rxrpc_connection struct's idea of the service ID that is exposed
      in the protocol separate from the service ID that's used as a lookup key.
      
      This allows the protocol service ID on a client connection to get upgraded
      without making the connection unfindable for other client calls that also
      would like to use the upgraded connection.
      
      The connection's actual service ID is then returned through recvmsg() by
      way of msg_name.
      
      Whilst we're at it, we get rid of the last_service_id field from each
      channel.  The service ID is per-connection, not per-call and an entire
      connection is upgraded in one go.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      68d6d1ae
  4. 26 5月, 2017 1 次提交
    • D
      rxrpc: Support network namespacing · 2baec2c3
      David Howells 提交于
      Support network namespacing in AF_RXRPC with the following changes:
      
       (1) All the local endpoint, peer and call lists, locks, counters, etc. are
           moved into the per-namespace record.
      
       (2) All the connection tracking is moved into the per-namespace record
           with the exception of the client connection ID tree, which is kept
           global so that connection IDs are kept unique per-machine.
      
       (3) Each namespace gets its own epoch.  This allows each network namespace
           to pretend to be a separate client machine.
      
       (4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and
           the contents reflect the namespace.
      
      fs/afs/ should be okay with this patch as it explicitly requires the current
      net namespace to be init_net to permit a mount to proceed at the moment.  It
      will, however, need updating so that cells, IP addresses and DNS records are
      per-namespace also.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2baec2c3
  5. 02 3月, 2017 1 次提交
    • D
      rxrpc: Fix deadlock between call creation and sendmsg/recvmsg · 540b1c48
      David Howells 提交于
      All the routines by which rxrpc is accessed from the outside are serialised
      by means of the socket lock (sendmsg, recvmsg, bind,
      rxrpc_kernel_begin_call(), ...) and this presents a problem:
      
       (1) If a number of calls on the same socket are in the process of
           connection to the same peer, a maximum of four concurrent live calls
           are permitted before further calls need to wait for a slot.
      
       (2) If a call is waiting for a slot, it is deep inside sendmsg() or
           rxrpc_kernel_begin_call() and the entry function is holding the socket
           lock.
      
       (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
           from servicing the other calls as they need to take the socket lock to
           do so.
      
       (4) The socket is stuck until a call is aborted and makes its slot
           available to the waiter.
      
      Fix this by:
      
       (1) Provide each call with a mutex ('user_mutex') that arbitrates access
           by the users of rxrpc separately for each specific call.
      
       (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
           they've got a call and taken its mutex.
      
           Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
           set but someone else has the lock.  Should I instead only return
           EWOULDBLOCK if there's nothing currently to be done on a socket, and
           sleep in this particular instance because there is something to be
           done, but we appear to be blocked by the interrupt handler doing its
           ping?
      
       (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
           call, locking its user mutex and adding it to the socket's call tree.
           The call is returned locked so that sendmsg() can add data to it
           immediately.
      
           From the moment the call is in the socket tree, it is subject to
           access by sendmsg() and recvmsg() - even if it isn't connected yet.
      
       (4) Lock new service calls in the UDP data_ready handler (in
           rxrpc_new_incoming_call()) because they may already be in the socket's
           tree and the data_ready handler makes them live immediately if a user
           ID has already been preassigned.
      
           Note that the new call is locked before any notifications are sent
           that it is live, so doing mutex_trylock() *ought* to always succeed.
           Userspace is prevented from doing sendmsg() on calls that are in a
           too-early state in rxrpc_do_sendmsg().
      
       (5) Make rxrpc_new_incoming_call() return the call with the user mutex
           held so that a ping can be scheduled immediately under it.
      
           Note that it might be worth moving the ping call into
           rxrpc_new_incoming_call() and then we can drop the mutex there.
      
       (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
           release the socket after adding the call to the socket's tree.  This
           is slightly tricky as we've dequeued the call by that point and have
           to requeue it.
      
           Note that requeuing emits a trace event.
      
       (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
           new mutex immediately and don't bother with the socket mutex at all.
      
      This patch has the nice bonus that calls on the same socket are now to some
      extent parallelisable.
      
      Note that we might want to move rxrpc_service_prealloc() calls out from the
      socket lock and give it its own lock, so that we don't hang progress in
      other calls because we're waiting for the allocator.
      
      We probably also want to avoid calling rxrpc_notify_socket() from within
      the socket lock (rxrpc_accept_call()).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NMarc Dionne <marc.c.dionne@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      540b1c48
  6. 09 1月, 2017 1 次提交
  7. 15 12月, 2016 1 次提交
  8. 06 10月, 2016 1 次提交
  9. 30 9月, 2016 1 次提交
    • D
      rxrpc: Reduce the rxrpc_local::services list to a pointer · 1e9e5c95
      David Howells 提交于
      Reduce the rxrpc_local::services list to just a pointer as we don't permit
      multiple service endpoints to bind to a single transport endpoints (this is
      excluded by rxrpc_lookup_local()).
      
      The reason we don't allow this is that if you send a request to an AFS
      filesystem service, it will try to talk back to your cache manager on the
      port you sent from (this is how file change notifications are handled).  To
      prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
      sockets share a UDP socket if at least one of them has a service bound.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      1e9e5c95
  10. 17 9月, 2016 2 次提交
    • D
      rxrpc: Improve skb tracing · 71f3ca40
      David Howells 提交于
      Improve sk_buff tracing within AF_RXRPC by the following means:
      
       (1) Use an enum to note the event type rather than plain integers and use
           an array of event names rather than a big multi ?: list.
      
       (2) Distinguish Rx from Tx packets and account them separately.  This
           requires the call phase to be tracked so that we know what we might
           find in rxtx_buffer[].
      
       (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
           event type.
      
       (4) A pair of 'rotate' events are added to indicate packets that are about
           to be rotated out of the Rx and Tx windows.
      
       (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
           packet loss injection recording.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
       
      71f3ca40
    • D
      rxrpc: Make IPv6 support conditional on CONFIG_IPV6 · d1912747
      David Howells 提交于
      Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
      This is then made conditional on CONFIG_IPV6.
      
      Without this, the following can be seen:
      
         net/built-in.o: In function `rxrpc_init_peer':
      >> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1912747
  11. 14 9月, 2016 3 次提交
    • D
      rxrpc: Add IPv6 support · 75b54cb5
      David Howells 提交于
      Add IPv6 support to AF_RXRPC.  With this, AF_RXRPC sockets can be created:
      
      	service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6);
      
      instead of:
      
      	service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
      
      The AFS filesystem doesn't support IPv6 at the moment, though, since that
      requires upgrades to some of the RPC calls.
      
      Note that a good portion of this patch is replacing "%pI4:%u" in print
      statements with "%pISpc" which is able to handle both protocols and print
      the port.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      75b54cb5
    • D
      rxrpc: Create an address for sendmsg() to bind unbound socket with · cd5892c7
      David Howells 提交于
      Create an address for sendmsg() to bind unbound socket with rather than
      using a completely blank address otherwise the transport socket creation
      will fail because it will try to use address family 0.
      
      We use the address family specified in the protocol argument when the
      AF_RXRPC socket was created and SOCK_DGRAM as the default.  For anything
      else, bind() must be used.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cd5892c7
    • D
      rxrpc: Adjust the call ref tracepoint to show kernel API refs · cbd00891
      David Howells 提交于
      Adjust the call ref tracepoint to show references held on a call by the
      kernel API separately as much as possible and add an additional trace to at
      the allocation point from the preallocation buffer for an incoming call.
      
      Note that this doesn't show the allocation of a client call for the kernel
      separately at the moment.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cbd00891
  12. 08 9月, 2016 3 次提交
    • D
      rxrpc: Rewrite the data and ack handling code · 248f219c
      David Howells 提交于
      Rewrite the data and ack handling code such that:
      
       (1) Parsing of received ACK and ABORT packets and the distribution and the
           filing of DATA packets happens entirely within the data_ready context
           called from the UDP socket.  This allows us to process and discard ACK
           and ABORT packets much more quickly (they're no longer stashed on a
           queue for a background thread to process).
      
       (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim().  We instead
           keep track of the offset and length of the content of each packet in
           the sk_buff metadata.  This means we don't do any allocation in the
           receive path.
      
       (3) Jumbo DATA packet parsing is now done in data_ready context.  Rather
           than cloning the packet once for each subpacket and pulling/trimming
           it, we file the packet multiple times with an annotation for each
           indicating which subpacket is there.  From that we can directly
           calculate the offset and length.
      
       (4) A call's receive queue can be accessed without taking locks (memory
           barriers do have to be used, though).
      
       (5) Incoming calls are set up from preallocated resources and immediately
           made live.  They can than have packets queued upon them and ACKs
           generated.  If insufficient resources exist, DATA packet #1 is given a
           BUSY reply and other DATA packets are discarded).
      
       (6) sk_buffs no longer take a ref on their parent call.
      
      To make this work, the following changes are made:
      
       (1) Each call's receive buffer is now a circular buffer of sk_buff
           pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
           between the call and the socket.  This permits each sk_buff to be in
           the buffer multiple times.  The receive buffer is reused for the
           transmit buffer.
      
       (2) A circular buffer of annotations (rxtx_annotations) is kept parallel
           to the data buffer.  Transmission phase annotations indicate whether a
           buffered packet has been ACK'd or not and whether it needs
           retransmission.
      
           Receive phase annotations indicate whether a slot holds a whole packet
           or a jumbo subpacket and, if the latter, which subpacket.  They also
           note whether the packet has been decrypted in place.
      
       (3) DATA packet window tracking is much simplified.  Each phase has just
           two numbers representing the window (rx_hard_ack/rx_top and
           tx_hard_ack/tx_top).
      
           The hard_ack number is the sequence number before base of the window,
           representing the last packet the other side says it has consumed.
           hard_ack starts from 0 and the first packet is sequence number 1.
      
           The top number is the sequence number of the highest-numbered packet
           residing in the buffer.  Packets between hard_ack+1 and top are
           soft-ACK'd to indicate they've been received, but not yet consumed.
      
           Four macros, before(), before_eq(), after() and after_eq() are added
           to compare sequence numbers within the window.  This allows for the
           top of the window to wrap when the hard-ack sequence number gets close
           to the limit.
      
           Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
           to indicate when rx_top and tx_top point at the packets with the
           LAST_PACKET bit set, indicating the end of the phase.
      
       (4) Calls are queued on the socket 'receive queue' rather than packets.
           This means that we don't need have to invent dummy packets to queue to
           indicate abnormal/terminal states and we don't have to keep metadata
           packets (such as ABORTs) around
      
       (5) The offset and length of a (sub)packet's content are now passed to
           the verify_packet security op.  This is currently expected to decrypt
           the packet in place and validate it.
      
           However, there's now nowhere to store the revised offset and length of
           the actual data within the decrypted blob (there may be a header and
           padding to skip) because an sk_buff may represent multiple packets, so
           a locate_data security op is added to retrieve these details from the
           sk_buff content when needed.
      
       (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
           individually secured and needs to be individually decrypted.  The code
           to do this is broken out into rxrpc_recvmsg_data() and shared with the
           kernel API.  It now iterates over the call's receive buffer rather
           than walking the socket receive queue.
      
      Additional changes:
      
       (1) The timers are condensed to a single timer that is set for the soonest
           of three timeouts (delayed ACK generation, DATA retransmission and
           call lifespan).
      
       (2) Transmission of ACK and ABORT packets is effected immediately from
           process-context socket ops/kernel API calls that cause them instead of
           them being punted off to a background work item.  The data_ready
           handler still has to defer to the background, though.
      
       (3) A shutdown op is added to the AF_RXRPC socket so that the AFS
           filesystem can shut down the socket and flush its own work items
           before closing the socket to deal with any in-progress service calls.
      
      Future additional changes that will need to be considered:
      
       (1) Make sure that a call doesn't hog the front of the queue by receiving
           data from the network as fast as userspace is consuming it to the
           exclusion of other calls.
      
       (2) Transmit delayed ACKs from within recvmsg() when we've consumed
           sufficiently more packets to avoid the background work item needing to
           run.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      248f219c
    • D
      rxrpc: Preallocate peers, conns and calls for incoming service requests · 00e90712
      David Howells 提交于
      Make it possible for the data_ready handler called from the UDP transport
      socket to completely instantiate an rxrpc_call structure and make it
      immediately live by preallocating all the memory it might need.  The idea
      is to cut out the background thread usage as much as possible.
      
      [Note that the preallocated structs are not actually used in this patch -
       that will be done in a future patch.]
      
      If insufficient resources are available in the preallocation buffers, it
      will be possible to discard the DATA packet in the data_ready handler or
      schedule a BUSY packet without the need to schedule an attempt at
      allocation in a background thread.
      
      To this end:
      
       (1) Preallocate rxrpc_peer, rxrpc_connection and rxrpc_call structs to a
           maximum number each of the listen backlog size.  The backlog size is
           limited to a maxmimum of 32.  Only this many of each can be in the
           preallocation buffer.
      
       (2) For userspace sockets, the preallocation is charged initially by
           listen() and will be recharged by accepting or rejecting pending
           new incoming calls.
      
       (3) For kernel services {,re,dis}charging of the preallocation buffers is
           handled manually.  Two notifier callbacks have to be provided before
           kernel_listen() is invoked:
      
           (a) An indication that a new call has been instantiated.  This can be
           	 used to trigger background recharging.
      
           (b) An indication that a call is being discarded.  This is used when
           	 the socket is being released.
      
           A function, rxrpc_kernel_charge_accept() is called by the kernel
           service to preallocate a single call.  It should be passed the user ID
           to be used for that call and a callback to associate the rxrpc call
           with the kernel service's side of the ID.
      
       (4) Discard the preallocation when the socket is closed.
      
       (5) Temporarily bump the refcount on the call allocated in
           rxrpc_incoming_call() so that rxrpc_release_call() can ditch the
           preallocation ref on service calls unconditionally.  This will no
           longer be necessary once the preallocation is used.
      
      Note that this does not yet control the number of active service calls on a
      client - that will come in a later patch.
      
      A future development would be to provide a setsockopt() call that allows a
      userspace server to manually charge the preallocation buffer.  This would
      allow user call IDs to be provided in advance and the awkward manual accept
      stage to be bypassed.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      00e90712
    • D
      rxrpc: Convert rxrpc_local::services to an hlist · de8d6c74
      David Howells 提交于
      Convert the rxrpc_local::services list to an hlist so that it can be
      accessed under RCU conditions more readily.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      de8d6c74
  13. 07 9月, 2016 2 次提交
    • D
      rxrpc: Calls shouldn't hold socket refs · 8d94aa38
      David Howells 提交于
      rxrpc calls shouldn't hold refs on the sock struct.  This was done so that
      the socket wouldn't go away whilst the call was in progress, such that the
      call could reach the socket's queues.
      
      However, we can mark the socket as requiring an RCU release and rely on the
      RCU read lock.
      
      To make this work, we do:
      
       (1) rxrpc_release_call() removes the call's call user ID.  This is now
           only called from socket operations and not from the call processor:
      
      	rxrpc_accept_call() / rxrpc_kernel_accept_call()
      	rxrpc_reject_call() / rxrpc_kernel_reject_call()
      	rxrpc_kernel_end_call()
      	rxrpc_release_calls_on_socket()
      	rxrpc_recvmsg()
      
           Though it is also called in the cleanup path of
           rxrpc_accept_incoming_call() before we assign a user ID.
      
       (2) Pass the socket pointer into rxrpc_release_call() rather than getting
           it from the call so that we can get rid of uninitialised calls.
      
       (3) Fix call processor queueing to pass a ref to the work queue and to
           release that ref at the end of the processor function (or to pass it
           back to the work queue if we have to requeue).
      
       (4) Skip out of the call processor function asap if the call is complete
           and don't requeue it if the call is complete.
      
       (5) Clean up the call immediately that the refcount reaches 0 rather than
           trying to defer it.  Actual deallocation is deferred to RCU, however.
      
       (6) Don't hold socket refs for allocated calls.
      
       (7) Use the RCU read lock when queueing a message on a socket and treat
           the call's socket pointer according to RCU rules and check it for
           NULL.
      
           We also need to use the RCU read lock when viewing a call through
           procfs.
      
       (8) Transmit the final ACK/ABORT to a client call in rxrpc_release_call()
           if this hasn't been done yet so that we can then disconnect the call.
           Once the call is disconnected, it won't have any access to the
           connection struct and the UDP socket for the call work processor to be
           able to send the ACK.  Terminal retransmission will be handled by the
           connection processor.
      
       (9) Release all calls immediately on the closing of a socket rather than
           trying to defer this.  Incomplete calls will be aborted.
      
      The call refcount model is much simplified.  Refs are held on the call by:
      
       (1) A socket's user ID tree.
      
       (2) A socket's incoming call secureq and acceptq.
      
       (3) A kernel service that has a call in progress.
      
       (4) A queued call work processor.  We have to take care to put any call
           that we failed to queue.
      
       (5) sk_buffs on a socket's receive queue.  A future patch will get rid of
           this.
      
      Whilst we're at it, we can do:
      
       (1) Get rid of the RXRPC_CALL_EV_RELEASE event.  Release is now done
           entirely from the socket routines and never from the call's processor.
      
       (2) Get rid of the RXRPC_CALL_DEAD state.  Calls now end in the
           RXRPC_CALL_COMPLETE state.
      
       (3) Get rid of the rxrpc_call::destroyer work item.  Calls are now torn
           down when their refcount reaches 0 and then handed over to RCU for
           final cleanup.
      
       (4) Get rid of the rxrpc_call::deadspan timer.  Calls are cleaned up
           immediately they're finished with and don't hang around.
           Post-completion retransmission is handled by the connection processor
           once the call is disconnected.
      
       (5) Get rid of the dead call expiry setting as there's no longer a timer
           to set.
      
       (6) rxrpc_destroy_all_calls() can just check that the call list is empty.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      8d94aa38
    • D
      rxrpc: Improve the call tracking tracepoint · fff72429
      David Howells 提交于
      Improve the call tracking tracepoint by showing more differentiation
      between some of the put and get events, including:
      
        (1) Getting and putting refs for the socket call user ID tree.
      
        (2) Getting and putting refs for queueing and failing to queue the call
            processor work item.
      
      Note that these aren't necessarily used in this patch, but will be taken
      advantage of in future patches.
      
      An enum is added for the event subtype numbers rather than coding them
      directly as decimal numbers and a table of 3-letter strings is provided
      rather than a sequence of ?: operators.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      fff72429
  14. 05 9月, 2016 1 次提交
  15. 02 9月, 2016 1 次提交
    • D
      rxrpc: Don't expose skbs to in-kernel users [ver #2] · d001648e
      David Howells 提交于
      Don't expose skbs to in-kernel users, such as the AFS filesystem, but
      instead provide a notification hook the indicates that a call needs
      attention and another that indicates that there's a new call to be
      collected.
      
      This makes the following possibilities more achievable:
      
       (1) Call refcounting can be made simpler if skbs don't hold refs to calls.
      
       (2) skbs referring to non-data events will be able to be freed much sooner
           rather than being queued for AFS to pick up as rxrpc_kernel_recv_data
           will be able to consult the call state.
      
       (3) We can shortcut the receive phase when a call is remotely aborted
           because we don't have to go through all the packets to get to the one
           cancelling the operation.
      
       (4) It makes it easier to do encryption/decryption directly between AFS's
           buffers and sk_buffs.
      
       (5) Encryption/decryption can more easily be done in the AFS's thread
           contexts - usually that of the userspace process that issued a syscall
           - rather than in one of rxrpc's background threads on a workqueue.
      
       (6) AFS will be able to wait synchronously on a call inside AF_RXRPC.
      
      To make this work, the following interface function has been added:
      
           int rxrpc_kernel_recv_data(
      		struct socket *sock, struct rxrpc_call *call,
      		void *buffer, size_t bufsize, size_t *_offset,
      		bool want_more, u32 *_abort_code);
      
      This is the recvmsg equivalent.  It allows the caller to find out about the
      state of a specific call and to transfer received data into a buffer
      piecemeal.
      
      afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction
      logic between them.  They don't wait synchronously yet because the socket
      lock needs to be dealt with.
      
      Five interface functions have been removed:
      
      	rxrpc_kernel_is_data_last()
          	rxrpc_kernel_get_abort_code()
          	rxrpc_kernel_get_error_number()
          	rxrpc_kernel_free_skb()
          	rxrpc_kernel_data_consumed()
      
      As a temporary hack, sk_buffs going to an in-kernel call are queued on the
      rxrpc_call struct (->knlrecv_queue) rather than being handed over to the
      in-kernel user.  To process the queue internally, a temporary function,
      temp_deliver_data() has been added.  This will be replaced with common code
      between the rxrpc_recvmsg() path and the kernel_rxrpc_recv_data() path in a
      future patch.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d001648e
  16. 30 8月, 2016 1 次提交
    • D
      rxrpc: Pass struct socket * to more rxrpc kernel interface functions · 4de48af6
      David Howells 提交于
      Pass struct socket * to more rxrpc kernel interface functions.  They should
      be starting from this rather than the socket pointer in the rxrpc_call
      struct if they need to access the socket.
      
      I have left:
      
      	rxrpc_kernel_is_data_last()
      	rxrpc_kernel_get_abort_code()
      	rxrpc_kernel_get_error_number()
      	rxrpc_kernel_free_skb()
      	rxrpc_kernel_data_consumed()
      
      unmodified as they're all about to be removed (and, in any case, don't
      touch the socket).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      4de48af6
  17. 23 8月, 2016 1 次提交
  18. 13 7月, 2016 1 次提交
  19. 06 7月, 2016 2 次提交
  20. 22 6月, 2016 8 次提交
    • D
      rxrpc: Kill off the rxrpc_transport struct · aa390bbe
      David Howells 提交于
      The rxrpc_transport struct is now redundant, given that the rxrpc_peer
      struct is now per peer port rather than per peer host, so get rid of it.
      
      Service connection lists are transferred to the rxrpc_peer struct, as is
      the conn_lock.  Previous patches moved the client connection handling out
      of the rxrpc_transport struct and discarded the connection bundling code.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      aa390bbe
    • D
      rxrpc: Kill the client connection bundle concept · 999b69f8
      David Howells 提交于
      Kill off the concept of maintaining a bundle of connections to a particular
      target service to increase the number of call slots available for any
      beyond four for that service (there are four call slots per connection).
      
      This will make cleaning up the connection handling code easier and
      facilitate removal of the rxrpc_transport struct.  Bundling can be
      reintroduced later if necessary.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      999b69f8
    • D
      rxrpc: Provide more refcount helper functions · 5627cc8b
      David Howells 提交于
      Provide refcount helper functions for connections so that the code doesn't
      touch local or connection usage counts directly.
      
      Also make it such that local and peer put functions can take a NULL
      pointer.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5627cc8b
    • D
      rxrpc: Validate the net address given to rxrpc_kernel_begin_call() · f4552c2d
      David Howells 提交于
      Validate the net address given to rxrpc_kernel_begin_call() before using
      it.
      
      Whilst this should be mostly unnecessary for in-kernel users, it does clear
      the tail of the address struct in case we want to hash or compare the whole
      thing.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f4552c2d
    • D
      rxrpc: Use IDR to allocate client conn IDs on a machine-wide basis · 4a3388c8
      David Howells 提交于
      Use the IDR facility to allocate client connection IDs on a machine-wide
      basis so that each client connection has a unique identifier.  When the
      connection ID space wraps, we advance the epoch by 1, thereby effectively
      having a 62-bit ID space.  The IDR facility is then used to look up client
      connections during incoming packet routing instead of using an rbtree
      rooted on the transport.
      
      This change allows for the removal of the transport in the future and also
      means that client connections can be looked up directly in the data-ready
      handler by connection ID.
      
      The ID management code is placed in a new file, conn-client.c, to which all
      the client connection-specific code will eventually move.
      
      Note that the IDR tree gets very expensive on memory if the connection IDs
      are widely scattered throughout the number space, so we shall need to
      retire connections that have, say, an ID more than four times the maximum
      number of client conns away from the current allocation point to try and
      keep the IDs concentrated.  We will also need to retire connections from an
      old epoch.
      
      Also note that, for the moment, a pointer to the transport has to be passed
      through into the ID allocation function so that we can take a BH lock to
      prevent a locking issue against in-BH lookup of client connections.  This
      will go away later when RCU is used for server connections also.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      4a3388c8
    • D
      rxrpc: Fix exclusive connection handling · cc8feb8e
      David Howells 提交于
      "Exclusive connections" are meant to be used for a single client call and
      then scrapped.  The idea is to limit the use of the negotiated security
      context.  The current code, however, isn't doing this: it is instead
      restricting the socket to a single virtual connection and doing all the
      calls over that.
      
      This is changed such that the socket no longer maintains a special virtual
      connection over which it will do all the calls, but rather gets a new one
      each time a new exclusive call is made.
      
      Further, using a socket option for this is a poor choice.  It should be
      done on sendmsg with a control message marker instead so that calls can be
      marked exclusive individually.  To that end, add RXRPC_EXCLUSIVE_CALL
      which, if passed to sendmsg() as a control message element, will cause the
      call to be done on an single-use connection.
      
      The socket option (RXRPC_EXCLUSIVE_CONNECTION) still exists and, if set,
      will override any lack of RXRPC_EXCLUSIVE_CALL being specified so that
      programs using the setsockopt() will appear to work the same.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc8feb8e
    • D
      rxrpc: Use structs to hold connection params and protocol info · 19ffa01c
      David Howells 提交于
      Define and use a structure to hold connection parameters.  This makes it
      easier to pass multiple connection parameters around.
      
      Define and use a structure to hold protocol information used to hash a
      connection for lookup on incoming packet.  Most of these fields will be
      disposed of eventually, including the duplicate local pointer.
      
      Whilst we're at it rename "proto" to "family" when referring to a protocol
      family.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      19ffa01c
    • D
      rxrpc: checking for IS_ERR() instead of NULL · 0e4699e4
      Dan Carpenter 提交于
      rxrpc_lookup_peer_rcu() and rxrpc_lookup_peer() return NULL on error, never
      error pointers, so IS_ERR() can't be used.
      
      Fix three callers of those functions.
      
      Fixes: be6e6707 ('rxrpc: Rework peer object handling to use hash table and RCU')
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      0e4699e4
  21. 15 6月, 2016 2 次提交
    • D
      rxrpc: Rework local endpoint management · 4f95dd78
      David Howells 提交于
      Rework the local RxRPC endpoint management.
      
      Local endpoint objects are maintained in a flat list as before.  This
      should be okay as there shouldn't be more than one per open AF_RXRPC socket
      (there can be fewer as local endpoints can be shared if their local service
      ID is 0 and they share the same local transport parameters).
      
      Changes:
      
       (1) Local endpoints may now only be shared if they have local service ID 0
           (ie. they're not being used for listening).
      
           This prevents a scenario where process A is listening of the Cache
           Manager port and process B contacts a fileserver - which may then
           attempt to send CM requests back to B.  But if A and B are sharing a
           local endpoint, A will get the CM requests meant for B.
      
       (2) We use a mutex to handle lookups and don't provide RCU-only lookups
           since we only expect to access the list when opening a socket or
           destroying an endpoint.
      
           The local endpoint object is pointed to by the transport socket's
           sk_user_data for the life of the transport socket - allowing us to
           refer to it directly from the sk_data_ready and sk_error_report
           callbacks.
      
       (3) atomic_inc_not_zero() now exists and can be used to only share a local
           endpoint if the last reference hasn't yet gone.
      
       (4) We can remove rxrpc_local_lock - a spinlock that had to be taken with
           BH processing disabled given that we assume sk_user_data won't change
           under us.
      
       (5) The transport socket is shut down before we clear the sk_user_data
           pointer so that we can be sure that the transport socket's callbacks
           won't be invoked once the RCU destruction is scheduled.
      
       (6) Local endpoints have a work item that handles both destruction and
           event processing.  The means that destruction doesn't then need to
           wait for event processing.  The event queues can then be cleared after
           the transport socket is shut down.
      
       (7) Local endpoints are no longer available for resurrection beyond the
           life of the sockets that had them open.  As soon as their last ref
           goes, they are scheduled for destruction and may not have their usage
           count moved from 0.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      4f95dd78
    • D
      rxrpc: Rework peer object handling to use hash table and RCU · be6e6707
      David Howells 提交于
      Rework peer object handling to use a hash table instead of a flat list and
      to use RCU.  Peer objects are no longer destroyed by passing them to a
      workqueue to process, but rather are just passed to the RCU garbage
      collector as kfree'able objects.
      
      The hash function uses the local endpoint plus all the components of the
      remote address, except for the RxRPC service ID.  Peers thus represent a
      UDP port on the remote machine as contacted by a UDP port on this machine.
      
      The RCU read lock is used to handle non-creating lookups so that they can
      be called from bottom half context in the sk_error_report handler without
      having to lock the hash table against modification.
      rxrpc_lookup_peer_rcu() *does* take a reference on the peer object as in
      the future, this will be passed to a work item for error distribution in
      the error_report path and this function will cease being used in the
      data_ready path.
      
      Creating lookups are done under spinlock rather than mutex as they might be
      set up due to an external stimulus if the local endpoint is a server.
      
      Captured network error messages (ICMP) are handled with respect to this
      struct and MTU size and RTT are cached here.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      be6e6707