1. 31 Jul 2018 (1 commit)
  2. 29 Jun 2018 (1 commit)
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Committed by Linus Torvalds
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
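      For context, the "trivial" ->poll() pattern the revert restores looks
      roughly like the hedged sketch below, where a stacking driver forwards
      to the underlying file in a single indirect call.  The names here
      (fwd_file, fwd_poll, lower) are hypothetical:

          #include <linux/fs.h>
          #include <linux/poll.h>

          struct fwd_file {
                  struct file *lower;     /* hypothetical underlying file */
          };

          static __poll_t fwd_poll(struct file *file, poll_table *wait)
          {
                  struct fwd_file *f = file->private_data;

                  /* Just call down to the underlying file's ->poll(). */
                  return vfs_poll(f->lower, wait);
          }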
  3. 26 May 2018 (1 commit)
  4. 11 May 2018 (1 commit)
  5. 31 Mar 2018 (2 commits)
    • rxrpc: Fix leak of rxrpc_peer objects · 17226f12
      Committed by David Howells
      When a new client call is requested, an rxrpc_conn_parameters struct object
      is passed in with a bunch of parameters set, such as the local endpoint to
      use.  A pointer to the target peer record is also placed in there by
      rxrpc_get_client_conn() - and this is removed if and only if a new
      connection object is allocated.  Thus it leaks if a new connection object
      isn't allocated.
      
      Fix this by putting any peer object attached to the rxrpc_conn_parameters
      object in the function that allocated it.
      
      Fixes: 19ffa01c ("rxrpc: Use structs to hold connection params and protocol info")
      Signed-off-by: David Howells <dhowells@redhat.com>
      17226f12
    • rxrpc: Fix firewall route keepalive · ace45bec
      Committed by David Howells
      Fix the firewall route keepalive part of AF_RXRPC, which currently
      functions incorrectly by replying to VERSION REPLY packets from the
      server with VERSION REQUEST packets.
      
      Instead, send VERSION REPLY packets to the peers of service connections to
      act as keep-alives 20s after the latest packet was transmitted to that
      peer.
      
      Also, just discard VERSION REPLY packets rather than replying to them.
      Signed-off-by: David Howells <dhowells@redhat.com>
      ace45bec
  6. 28 Mar 2018 (1 commit)
    • rxrpc, afs: Use debug_ids rather than pointers in traces · a25e21f0
      Committed by David Howells
      In rxrpc and afs, use the debug_ids that are monotonically allocated to
      various objects as they're allocated, rather than pointers, since kernel
      pointers are now hashed and thus less useful.  Further, the debug IDs
      aren't reused anywhere near as quickly.
      
      In addition, allow kernel services that use rxrpc, such as afs, to take
      numbers from the rxrpc counter, assign them to their own call struct and
      pass them in to rxrpc for both client and service calls so that the trace
      lines for each will have the same ID tag.
      Signed-off-by: David Howells <dhowells@redhat.com>
      a25e21f0
  7. 27 Mar 2018 (1 commit)
  8. 12 Feb 2018 (1 commit)
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Committed by Linus Torvalds
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9a08845
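      A hedged sketch of what a driver poll method looks like after the
      replacement (sample_waitq and sample_data_ready() are hypothetical):

          static DECLARE_WAIT_QUEUE_HEAD(sample_waitq);
          static bool sample_data_ready(void);    /* hypothetical */

          static __poll_t sample_poll(struct file *file, poll_table *wait)
          {
                  __poll_t mask = 0;

                  poll_wait(file, &sample_waitq, wait);
                  if (sample_data_ready())
                          mask |= EPOLLIN | EPOLLRDNORM;  /* was POLLIN | POLLRDNORM */
                  return mask;
          }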
  9. 03 Dec 2017 (1 commit)
  10. 28 Nov 2017 (1 commit)
  11. 24 Nov 2017 (3 commits)
    • rxrpc: Fix conn expiry timers · 3d18cbb7
      Committed by David Howells
      Fix the rxrpc connection expiry timers so that connections for closed
      AF_RXRPC sockets get deleted in a more timely fashion, freeing up the
      transport UDP port much more quickly.
      
       (1) Replace the delayed work items with work items plus timers so that
           timer_reduce() can be used to shorten them and so that the timer
           doesn't requeue the work item if the net namespace is dead.
      
       (2) Don't use queue_delayed_work() as that won't alter the timeout if the
           timer is already running.
      
       (3) Don't rearm the timers if the network namespace is dead.
      Signed-off-by: David Howells <dhowells@redhat.com>
      3d18cbb7
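      A hedged sketch of points (1)-(3) above; the reaper names are
      illustrative, but timer_reduce() is the real API and, unlike
      queue_delayed_work() on an already-pending item, only ever shortens
      the timeout:

          static bool rxnet_live;                 /* hypothetical namespace-alive flag */
          static struct work_struct reap_work;    /* hypothetical reaper work item */
          static struct timer_list reap_timer;

          static void reap_timer_expired(struct timer_list *t)
          {
                  if (rxnet_live)                 /* (3): a dead netns doesn't requeue */
                          schedule_work(&reap_work);
          }

          static void propose_reap_at(unsigned long expires_at)
          {
                  timer_reduce(&reap_timer, expires_at);
          }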
    • rxrpc: Fix service endpoint expiry · f859ab61
      Committed by David Howells
      Make RxRPC service endpoints expire as they're supposed to, by the
      following means:
      
       (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
           global service conn timeout, otherwise the first rxrpc_net struct to
           die will cause connections on all others to expire immediately from
           then on.
      
       (2) Mark local service endpoints for which the socket has been closed
           (->service_closed) so that the expiration timeout can be much
           shortened for service and client connections going through that
           endpoint.
      
       (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
           count reaches 1, not 0, as idle conns have a 1 count.
      
       (4) The accumulator for the earliest time we might want to schedule for
           should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
           the comparison functions use signed arithmetic.
      
       (5) Simplify the expiration handling, adding the expiration value to the
           idle timestamp each time rather than keeping track of the time in the
           past before which the idle timestamp must go to be expired.  This is
           much easier to read.
      
       (6) Ignore the timeouts if the net namespace is dead.
      
       (7) Restart the service reaper work item rather than the client reaper.
      Signed-off-by: David Howells <dhowells@redhat.com>
      f859ab61
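      Point (4) matters because the jiffies helpers compare with signed
      arithmetic; a hedged fragment of the accumulator pattern
      (conn_expires_at is illustrative):

          unsigned long earliest = jiffies + MAX_JIFFY_OFFSET;    /* not ULONG_MAX */

          if (time_before(conn_expires_at, earliest))
                  earliest = conn_expires_at;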
    • rxrpc: Split the call params from the operation params · 48124178
      Committed by David Howells
      When rxrpc_sendmsg() parses the control message buffer, it places the
      parameters extracted into a structure, but lumps together call parameters
      (such as user call ID) with operation parameters (such as whether to send
      data, send an abort or accept a call).
      
      Split the call parameters out into their own structure, a copy of which is
      then embedded in the operation parameters struct.
      
      The call parameters struct is then passed down into the places that need it
      instead of passing the individual parameters.  This allows for extra call
      parameters to be added.
      Signed-off-by: David Howells <dhowells@redhat.com>
      48124178
  12. 02 Nov 2017 (1 commit)
    • rxrpc: Lock around calling a kernel service Rx notification · 20acbd9a
      Committed by David Howells
      Place a spinlock around the invocation of call->notify_rx() for a kernel
      service call, take it again when ending the call, and replace the
      notification pointer with a pointer to a dummy function.
      
      This is required because it's possible for rxrpc_notify_socket() to be
      called after the call has been ended by the kernel service if called from
      the asynchronous work function rxrpc_process_call().
      
      However, rxrpc_notify_socket() currently only holds the RCU read lock when
      invoking ->notify_rx(), which means that the afs_call struct would need to
      be disposed of by call_rcu() rather than by kfree().
      
      But we shouldn't see any notifications from a call after calling
      rxrpc_kernel_end_call(), so a lock is required in rxrpc code.
      
      Without this, we may see the call wait queue as having a corrupt spinlock:
      
          BUG: spinlock bad magic on CPU#0, kworker/0:2/1612
          general protection fault: 0000 [#1] SMP
          ...
          Workqueue: krxrpcd rxrpc_process_call
          task: ffff88040b83c400 task.stack: ffff88040adfc000
          RIP: 0010:spin_bug+0x161/0x18f
          RSP: 0018:ffff88040adffcc0 EFLAGS: 00010002
          RAX: 0000000000000032 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff81ab16cf
          RDX: ffff88041fa14c01 RSI: ffff88041fa0ccb8 RDI: ffff88041fa0ccb8
          RBP: ffff88040adffcd8 R08: 00000000ffffffff R09: 00000000ffffffff
          R10: ffff88040adffc60 R11: 000000000000022c R12: ffff88040aca2208
          R13: ffffffff81a58114 R14: 0000000000000000 R15: 0000000000000000
          ....
          Call Trace:
           do_raw_spin_lock+0x1d/0x89
           _raw_spin_lock_irqsave+0x3d/0x49
           ? __wake_up_common_lock+0x4c/0xa7
           __wake_up_common_lock+0x4c/0xa7
           ? __lock_is_held+0x47/0x7a
           __wake_up+0xe/0x10
           afs_wake_up_call_waiter+0x11b/0x122 [kafs]
           rxrpc_notify_socket+0x12b/0x258
           rxrpc_process_call+0x18e/0x7d0
           process_one_work+0x298/0x4de
           ? rescuer_thread+0x280/0x280
           worker_thread+0x1d1/0x2ae
           ? rescuer_thread+0x280/0x280
           kthread+0x12c/0x134
           ? kthread_create_on_node+0x3a/0x3a
           ret_from_fork+0x27/0x40
      
      In this case, note the corrupt data in RBX.  The address of the
      offending afs_call is in R12, plus the offset to the spinlock.
      Signed-off-by: David Howells <dhowells@redhat.com>
      20acbd9a
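      A hedged, abridged sketch of the locking described above; notify_lock
      is the spinlock this commit adds, and dummy_notify_rx stands in for
      the dummy function:

          /* Notification side (rxrpc_notify_socket()): */
          spin_lock(&call->notify_lock);
          call->notify_rx(sk, call, call->user_call_id);
          spin_unlock(&call->notify_lock);

          /* rxrpc_kernel_end_call() side: swap in a no-op notifier so a late
           * notification can't touch the kernel service's (freed) afs_call.
           */
          spin_lock(&call->notify_lock);
          call->notify_rx = dummy_notify_rx;
          spin_unlock(&call->notify_lock);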
  13. 24 Oct 2017 (1 commit)
  14. 22 Oct 2017 (1 commit)
  15. 18 Oct 2017 (2 commits)
    • rxrpc: Provide functions for allowing cleaner handling of signals · f4d15fb6
      Committed by David Howells
      Provide a couple of functions to allow cleaner handling of signals in a
      kernel service.  They are:
      
       (1) rxrpc_kernel_get_rtt()
      
           This allows the kernel service to find out the RTT time for a call, so
           as to better judge how large a timeout to employ.
      
           Note, though, that whilst this returns a value in nanoseconds, the
           timeouts can only actually be in jiffies.
      
       (2) rxrpc_kernel_check_life()
      
           This returns a number that is updated when ACKs are received from the
           peer (notably including PING RESPONSE ACKs which we can elicit by
           sending PING ACKs to see if the call still exists on the server).
      
           The caller should compare the numbers of two calls to see if the call
           is still alive.
      
      These can be used to provide an extending timeout rather than returning
      immediately in the case that a signal occurs that would otherwise abort an
      RPC operation.  The timeout would be extended if the server is still
      responsive and the call is still apparently alive on the server.
      
      For most operations this isn't that necessary - but for FS.StoreData it is:
      OpenAFS writes the data to storage as it comes in without making a backup,
      so if we immediately abort it when partially complete on a CTRL+C, say, we
      have no idea of the state of the file after the abort.
      Signed-off-by: David Howells <dhowells@redhat.com>
      f4d15fb6
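      A hedged sketch of how a kernel service might use (2); the wrapper and
      its names are invented, while rxrpc_kernel_check_life() returning a
      u32 counter follows the description above:

          #include <net/af_rxrpc.h>

          /* True if the server has ACK'd something since the last check,
           * i.e. extending the timeout still seems worthwhile.
           */
          static bool sample_call_progressing(struct socket *rxsock,
                                              struct rxrpc_call *rxcall,
                                              u32 *last_life)
          {
                  u32 life = rxrpc_kernel_check_life(rxsock, rxcall);

                  if (life == *last_life)
                          return false;           /* no ACKs seen: give up */
                  *last_life = life;
                  return true;
          }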
    • rxrpc: Support service upgrade from a kernel service · a68f4a27
      Committed by David Howells
      Provide support for a kernel service to make use of the service upgrade
      facility.  This involves:
      
       (1) Pass an upgrade request flag to rxrpc_kernel_begin_call().
      
       (2) Make rxrpc_kernel_recv_data() return the call's current service ID so
           that the caller can detect service upgrade and see what the service
           was upgraded to.
      Signed-off-by: David Howells <dhowells@redhat.com>
      a68f4a27
  16. 29 Aug 2017 (2 commits)
    • rxrpc: Allow failed client calls to be retried · c038a58c
      Committed by David Howells
      Allow a client call that failed on network error to be retried, provided
      that the Tx queue still holds DATA packet 1.  This allows an operation to
      be submitted to another server or another address for the same server
      without having to repackage and re-encrypt the data so far processed.
      
      Two new functions are provided:
      
       (1) rxrpc_kernel_check_call() - This is used to find out the completion
           state of a call to guess whether it can be retried and whether it
           should be retried.
      
       (2) rxrpc_kernel_retry_call() - Disconnect the call from its current
           connection, reset the state and submit it as a new client call to a
           new address.  The new address need not match the previous address.
      
      A call may be retried even if all the data hasn't been loaded into it
      yet; a partially constructed call will be retained at the point it had
      reached when the error condition was detected.  msg_data_left() can be
      used to find out how much data was packaged before the error occurred.
      Signed-off-by: David Howells <dhowells@redhat.com>
      c038a58c
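      A hedged sketch of the two calls together.  The signatures are assumed
      from the description above; retry_looks_sensible() and new_srx are
      illustrative, and the return-value conventions are not spelled out in
      the commit message:

          enum rxrpc_call_completion compl;
          u32 abort_code;
          int ret;

          ret = rxrpc_kernel_check_call(rxsock, rxcall, &compl, &abort_code);
          /* Caller-specific policy: inspect ret/compl/abort_code to decide
           * whether a retry to another address makes sense.
           */
          if (retry_looks_sensible(ret, compl, abort_code))
                  ret = rxrpc_kernel_retry_call(rxsock, rxcall, &new_srx, key);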
    • rxrpc: Remove some excess whitespace · 3ec0efde
      Committed by David Howells
      Remove indentation from some blank lines.
      Signed-off-by: David Howells <dhowells@redhat.com>
      3ec0efde
  17. 01 Jul 2017 (2 commits)
  18. 08 Jun 2017 (2 commits)
    • rxrpc: Provide a cmsg to specify the amount of Tx data for a call · e754eba6
      Committed by David Howells
      Provide a control message that can be specified on the first sendmsg() of a
      client call or the first sendmsg() of a service response to indicate the
      total length of the data to be transmitted for that call.
      
      Currently, because the length of the payload of an encrypted DATA packet is
      encrypted in front of the data, the packet cannot be encrypted until we
      know how much data it will hold.
      
      By specifying the length at the beginning of the transmit phase, each DATA
      packet length can be set before we start loading data from userspace (where
      several sendmsg() calls may contribute to a particular packet).
      
      An error will be returned if too little or too much data is presented in
      the Tx phase.
      Signed-off-by: David Howells <dhowells@redhat.com>
      e754eba6
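      A hedged userspace sketch of attaching such a control message to the
      first sendmsg() of a call (RXRPC_TX_LENGTH is the cmsg type this
      commit adds; its header location and the 64-bit payload are
      assumptions, and buffer setup is abridged):

          #include <string.h>
          #include <sys/socket.h>
          #include <linux/rxrpc.h>        /* assumed home of RXRPC_TX_LENGTH */

          static void set_tx_length(struct msghdr *msg, int64_t total)
          {
                  struct cmsghdr *cm = CMSG_FIRSTHDR(msg);

                  cm->cmsg_level = SOL_RXRPC;
                  cm->cmsg_type  = RXRPC_TX_LENGTH;
                  cm->cmsg_len   = CMSG_LEN(sizeof(total));
                  memcpy(CMSG_DATA(cm), &total, sizeof(total));
          }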
    • rxrpc: Provide a getsockopt call to query what cmsgs types are supported · 515559ca
      Committed by David Howells
      Provide a getsockopt() call that can query what cmsg types are supported by
      AF_RXRPC.
      515559ca
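      A hedged userspace sketch (RXRPC_SUPPORTED_CMSG is the option this
      commit adds; that it yields a single int is an assumption):

          int supported;
          socklen_t optlen = sizeof(supported);

          if (getsockopt(fd, SOL_RXRPC, RXRPC_SUPPORTED_CMSG,
                         &supported, &optlen) == 0)
                  printf("cmsg types up to %d supported\n", supported);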
  19. 05 Jun 2017 (3 commits)
    • rxrpc: Implement service upgrade · 4722974d
      Committed by David Howells
      Implement AuriStor's service upgrade facility.  There are three problems
      that this is meant to deal with:
      
       (1) Various of the standard AFS RPC calls have IPv4 addresses in their
           requests and/or replies - but there's no room for including IPv6
           addresses.
      
       (2) Definition of IPv6-specific RPC operations in the standard operation
           sets has not yet been achieved.
      
       (3) One could envisage the creation of a new service on the same port
           as the original service.  The new service could implement improved
           operations, and the client could try this first, falling back to
           the original service if it's not there.
      
           Unfortunately, certain servers ignore packets addressed to a service
           they don't implement and don't respond in any way - not even with an
           ABORT.  This means that the client must then wait for the call timeout
           to occur.
      
      What service upgrade does is to see if the connection is marked as being
      'upgradeable' and if so, change the service ID in the server and thus the
      request and reply formats.  Note that the upgrade isn't mandatory - a
      server that supports only the original call set will ignore the upgrade
      request.
      
      In the protocol, the procedure is then as follows:
      
       (1) To request an upgrade, the first DATA packet in a new connection must
           have the userStatus set to 1 (this is normally 0).  The userStatus
           value is normally ignored by the server.
      
       (2) If the server doesn't support upgrading, the reply packets will
           contain the same service ID as for the first request packet.
      
       (3) If the server does support upgrading, all future reply packets on that
           connection will contain the new service ID and the new service ID will
           be applied to *all* further calls on that connection as well.
      
       (4) The RPC op used to probe the upgrade must take the same request data
           as the shadow call in the upgrade set (but may return a different
           reply).  GetCapability RPC ops were added to all standard sets for
           just this purpose.  Ops where the request formats differ cannot be
           used for probing.
      
       (5) The client must wait for completion of the probe before sending any
           further RPC ops to the same destination.  It should then use the
           service ID that recvmsg() reported back in all future calls.
      
       (6) The shadow service must have call definitions for all the operation
           IDs defined by the original service.
      
      
      To support service upgrading, a server should:
      
       (1) Call bind() twice on its AF_RXRPC socket before calling listen().
           Each bind() should supply a different service ID, but the transport
           addresses must be the same.  This allows the server to receive
           requests with either service ID.
      
       (2) Enable automatic upgrading by calling setsockopt(), specifying
           RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
           unsigned shorts as the argument:
      
      	unsigned short optval[2];
      
           This specifies a pair of service IDs.  They must be different and must
           match the service IDs bound to the socket.  Member 0 is the service ID
           to upgrade from and member 1 is the service ID to upgrade to.
      Signed-off-by: David Howells <dhowells@redhat.com>
      4722974d
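      Putting the server-side steps together, a hedged userspace sketch in
      which the service IDs (52, 1234) and the port are arbitrary examples:

          struct sockaddr_rxrpc srx = {
                  .srx_family               = AF_RXRPC,
                  .srx_service              = 52,         /* ID to upgrade from */
                  .transport_type           = SOCK_DGRAM,
                  .transport_len            = sizeof(srx.transport.sin),
                  .transport.sin.sin_family = AF_INET,
                  .transport.sin.sin_port   = htons(7000),
          };
          unsigned short optval[2] = { 52, 1234 };        /* from, to */
          int fd = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);

          bind(fd, (struct sockaddr *)&srx, sizeof(srx));
          srx.srx_service = 1234;                         /* ID to upgrade to */
          bind(fd, (struct sockaddr *)&srx, sizeof(srx));
          setsockopt(fd, SOL_RXRPC, RXRPC_UPGRADEABLE_SERVICE,
                     optval, sizeof(optval));
          listen(fd, 100);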
    • rxrpc: Permit multiple service binding · 28036f44
      Committed by David Howells
      Permit bind() to be called on an AF_RXRPC socket more than once (currently
      maximum twice) to bind multiple listening services to it.  There are some
      restrictions:
      
       (1) All bind() calls involved must have a non-zero service ID.
      
       (2) The service IDs must all be different.
      
       (3) The rest of the address (notably the transport part) must be the same
           in all (a single UDP socket is shared).
      
       (4) This must be done before listen() or sendmsg() is called.
      
      This allows someone to connect to the service socket with different service
      IDs and lays the foundation for service upgrading.
      
      The service ID used by an incoming call can be extracted from the msg_name
      returned by recvmsg().
      Signed-off-by: David Howells <dhowells@redhat.com>
      28036f44
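      A hedged sketch of pulling that service ID out of msg_name (error
      handling omitted):

          struct sockaddr_rxrpc srx;
          struct msghdr msg = {
                  .msg_name    = &srx,
                  .msg_namelen = sizeof(srx),
          };

          if (recvmsg(fd, &msg, 0) >= 0)
                  printf("incoming call for service %u\n", srx.srx_service);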
    • rxrpc: Separate the connection's protocol service ID from the lookup ID · 68d6d1ae
      Committed by David Howells
      Keep the rxrpc_connection struct's idea of the service ID that is exposed
      in the protocol separate from the service ID that's used as a lookup key.
      
      This allows the protocol service ID on a client connection to get upgraded
      without making the connection unfindable for other client calls that also
      would like to use the upgraded connection.
      
      The connection's actual service ID is then returned through recvmsg() by
      way of msg_name.
      
      Whilst we're at it, we get rid of the last_service_id field from each
      channel.  The service ID is per-connection, not per-call, and an entire
      connection is upgraded in one go.
      Signed-off-by: David Howells <dhowells@redhat.com>
      68d6d1ae
  20. 26 May 2017 (1 commit)
    • rxrpc: Support network namespacing · 2baec2c3
      Committed by David Howells
      Support network namespacing in AF_RXRPC with the following changes:
      
       (1) All the local endpoint, peer and call lists, locks, counters, etc. are
           moved into the per-namespace record.
      
       (2) All the connection tracking is moved into the per-namespace record
           with the exception of the client connection ID tree, which is kept
           global so that connection IDs are kept unique per-machine.
      
       (3) Each namespace gets its own epoch.  This allows each network namespace
           to pretend to be a separate client machine.
      
       (4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and
           the contents reflect the namespace.
      
      fs/afs/ should be okay with this patch as it explicitly requires the current
      net namespace to be init_net to permit a mount to proceed at the moment.  It
      will, however, need updating so that cells, IP addresses and DNS records are
      per-namespace also.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2baec2c3
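      A hedged sketch of the standard pernet plumbing this implies (the
      actual names in the patch may differ):

          static unsigned int rxrpc_net_id;

          static struct pernet_operations rxrpc_net_ops = {
                  .init = rxrpc_init_net,         /* set up per-ns lists/locks */
                  .exit = rxrpc_exit_net,         /* tear down, stop timers */
                  .id   = &rxrpc_net_id,
                  .size = sizeof(struct rxrpc_net),
          };

          /* On module init: */
          register_pernet_subsys(&rxrpc_net_ops);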
  21. 02 Mar 2017 (1 commit)
    • rxrpc: Fix deadlock between call creation and sendmsg/recvmsg · 540b1c48
      Committed by David Howells
      All the routines by which rxrpc is accessed from the outside are serialised
      by means of the socket lock (sendmsg, recvmsg, bind,
      rxrpc_kernel_begin_call(), ...) and this presents a problem:
      
       (1) If a number of calls on the same socket are in the process of
           connecting to the same peer, a maximum of four concurrent live
           calls are permitted before further calls need to wait for a slot.
      
       (2) If a call is waiting for a slot, it is deep inside sendmsg() or
           rxrpc_kernel_begin_call() and the entry function is holding the socket
           lock.
      
       (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
           from servicing the other calls as they need to take the socket lock to
           do so.
      
       (4) The socket is stuck until a call is aborted and makes its slot
           available to the waiter.
      
      Fix this by:
      
       (1) Provide each call with a mutex ('user_mutex') that arbitrates access
           by the users of rxrpc separately for each specific call.
      
       (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
           they've got a call and taken its mutex.
      
           Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
           set but someone else has the lock.  Should I instead only return
           EWOULDBLOCK if there's nothing currently to be done on a socket, and
           sleep in this particular instance because there is something to be
           done, but we appear to be blocked by the interrupt handler doing its
           ping?
      
       (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
           call, locking its user mutex and adding it to the socket's call tree.
           The call is returned locked so that sendmsg() can add data to it
           immediately.
      
           From the moment the call is in the socket tree, it is subject to
           access by sendmsg() and recvmsg() - even if it isn't connected yet.
      
       (4) Lock new service calls in the UDP data_ready handler (in
           rxrpc_new_incoming_call()) because they may already be in the socket's
           tree and the data_ready handler makes them live immediately if a user
           ID has already been preassigned.
      
           Note that the new call is locked before any notifications are sent
           that it is live, so doing mutex_trylock() *ought* to always succeed.
           Userspace is prevented from doing sendmsg() on calls that are in a
           too-early state in rxrpc_do_sendmsg().
      
       (5) Make rxrpc_new_incoming_call() return the call with the user mutex
           held so that a ping can be scheduled immediately under it.
      
           Note that it might be worth moving the ping call into
           rxrpc_new_incoming_call() and then we can drop the mutex there.
      
       (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
           release the socket after adding the call to the socket's tree.  This
           is slightly tricky as we've dequeued the call by that point and have
           to requeue it.
      
           Note that requeuing emits a trace event.
      
       (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
           new mutex immediately and don't bother with the socket mutex at all.
      
      This patch has the nice bonus that calls on the same socket are now to some
      extent parallelisable.
      
      Note that we might want to move rxrpc_service_prealloc() calls out from the
      socket lock and give it its own lock, so that we don't hang progress in
      other calls because we're waiting for the allocator.
      
      We probably also want to avoid calling rxrpc_notify_socket() from within
      the socket lock (rxrpc_accept_call()).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Marc Dionne <marc.c.dionne@auristor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      540b1c48
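      A hedged, abridged sketch of the lock-ordering dance in fix (2), where
      'rx' is the rxrpc socket and the retry policy is simplified:

          if (!mutex_trylock(&call->user_mutex)) {
                  /* Don't block other calls on this socket while we sleep
                   * waiting for this call's own mutex.
                   */
                  release_sock(&rx->sk);
                  if (mutex_lock_interruptible(&call->user_mutex) < 0)
                          return -ERESTARTSYS;
          }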
  22. 09 Jan 2017 (1 commit)
  23. 15 Dec 2016 (1 commit)
  24. 06 Oct 2016 (1 commit)
  25. 30 Sep 2016 (1 commit)
    • rxrpc: Reduce the rxrpc_local::services list to a pointer · 1e9e5c95
      Committed by David Howells
      Reduce the rxrpc_local::services list to just a pointer, as we don't
      permit multiple service endpoints to bind to a single transport
      endpoint (this is excluded by rxrpc_lookup_local()).
      
      The reason we don't allow this is that if you send a request to an AFS
      filesystem service, it will try to talk back to your cache manager on the
      port you sent from (this is how file change notifications are handled).  To
      prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
      sockets share a UDP socket if at least one of them has a service bound.
      Signed-off-by: David Howells <dhowells@redhat.com>
      1e9e5c95
  26. 17 Sep 2016 (2 commits)
    • rxrpc: Improve skb tracing · 71f3ca40
      Committed by David Howells
      Improve sk_buff tracing within AF_RXRPC by the following means:
      
       (1) Use an enum to note the event type rather than plain integers and use
           an array of event names rather than a big multi ?: list.
      
       (2) Distinguish Rx from Tx packets and account them separately.  This
           requires the call phase to be tracked so that we know what we might
           find in rxtx_buffer[].
      
       (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
           event type.
      
       (4) A pair of 'rotate' events are added to indicate packets that are about
           to be rotated out of the Rx and Tx windows.
      
       (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
           packet loss injection recording.
      Signed-off-by: David Howells <dhowells@redhat.com>
      71f3ca40
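      A hedged sketch of point (1), with invented member and string values:

          enum rxrpc_skb_trace {
                  rxrpc_skb_rx_received,
                  rxrpc_skb_rx_rotated,
                  rxrpc_skb_tx_new,
                  rxrpc_skb_tx_freed,
          };

          static const char *const rxrpc_skb_traces[] = {
                  [rxrpc_skb_rx_received] = "Rx RCV",
                  [rxrpc_skb_rx_rotated]  = "Rx ROT",
                  [rxrpc_skb_tx_new]      = "Tx NEW",
                  [rxrpc_skb_tx_freed]    = "Tx FRE",
          };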
    • rxrpc: Make IPv6 support conditional on CONFIG_IPV6 · d1912747
      Committed by David Howells
      Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
      This is then made conditional on CONFIG_IPV6.
      
      Without this, the following can be seen:
      
         net/built-in.o: In function `rxrpc_init_peer':
      >> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1912747
  27. 14 Sep 2016 (3 commits)
    • rxrpc: Add IPv6 support · 75b54cb5
      Committed by David Howells
      Add IPv6 support to AF_RXRPC.  With this, AF_RXRPC sockets can be created:
      
      	service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6);
      
      instead of:
      
      	service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
      
      The AFS filesystem doesn't support IPv6 at the moment, though, since that
      requires upgrades to some of the RPC calls.
      
      Note that a good portion of this patch is replacing "%pI4:%u" in print
      statements with "%pISpc" which is able to handle both protocols and print
      the port.
      Signed-off-by: David Howells <dhowells@redhat.com>
      75b54cb5
    • rxrpc: Create an address for sendmsg() to bind unbound socket with · cd5892c7
      Committed by David Howells
      Create an address for sendmsg() to bind an unbound socket with, rather
      than using a completely blank address; otherwise the transport socket
      creation will fail because it will try to use address family 0.
      
      We use the address family specified in the protocol argument when the
      AF_RXRPC socket was created and SOCK_DGRAM as the default.  For anything
      else, bind() must be used.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cd5892c7
    • rxrpc: Adjust the call ref tracepoint to show kernel API refs · cbd00891
      Committed by David Howells
      Adjust the call ref tracepoint to show references held on a call by the
      kernel API separately as much as possible, and add an additional trace
      at the point of allocation from the preallocation buffer for an
      incoming call.
      
      Note that this doesn't show the allocation of a client call for the kernel
      separately at the moment.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cbd00891
  28. 08 Sep 2016 (1 commit)
    • rxrpc: Rewrite the data and ack handling code · 248f219c
      Committed by David Howells
      Rewrite the data and ack handling code such that:
      
       (1) Parsing of received ACK and ABORT packets and the distribution and the
           filing of DATA packets happens entirely within the data_ready context
           called from the UDP socket.  This allows us to process and discard ACK
           and ABORT packets much more quickly (they're no longer stashed on a
           queue for a background thread to process).
      
       (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim().  We instead
           keep track of the offset and length of the content of each packet in
           the sk_buff metadata.  This means we don't do any allocation in the
           receive path.
      
       (3) Jumbo DATA packet parsing is now done in data_ready context.  Rather
           than cloning the packet once for each subpacket and pulling/trimming
           it, we file the packet multiple times with an annotation for each
           indicating which subpacket is there.  From that we can directly
           calculate the offset and length.
      
       (4) A call's receive queue can be accessed without taking locks (memory
           barriers do have to be used, though).
      
       (5) Incoming calls are set up from preallocated resources and immediately
           made live.  They can then have packets queued upon them and ACKs
           generated.  If insufficient resources exist, DATA packet #1 is given
           a BUSY reply and other DATA packets are discarded.
      
       (6) sk_buffs no longer take a ref on their parent call.
      
      To make this work, the following changes are made:
      
       (1) Each call's receive buffer is now a circular buffer of sk_buff
           pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
           between the call and the socket.  This permits each sk_buff to be in
           the buffer multiple times.  The receive buffer is reused for the
           transmit buffer.
      
       (2) A circular buffer of annotations (rxtx_annotations) is kept parallel
           to the data buffer.  Transmission phase annotations indicate whether a
           buffered packet has been ACK'd or not and whether it needs
           retransmission.
      
           Receive phase annotations indicate whether a slot holds a whole packet
           or a jumbo subpacket and, if the latter, which subpacket.  They also
           note whether the packet has been decrypted in place.
      
       (3) DATA packet window tracking is much simplified.  Each phase has just
           two numbers representing the window (rx_hard_ack/rx_top and
           tx_hard_ack/tx_top).
      
           The hard_ack number is the sequence number before the base of the
           representing the last packet the other side says it has consumed.
           hard_ack starts from 0 and the first packet is sequence number 1.
      
           The top number is the sequence number of the highest-numbered packet
           residing in the buffer.  Packets between hard_ack+1 and top are
           soft-ACK'd to indicate they've been received, but not yet consumed.
      
           Four macros, before(), before_eq(), after() and after_eq() are added
           to compare sequence numbers within the window.  This allows for the
           top of the window to wrap when the hard-ack sequence number gets close
           to the limit.
      
           Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
           to indicate when rx_top and tx_top point at the packets with the
           LAST_PACKET bit set, indicating the end of the phase.
      
       (4) Calls are queued on the socket 'receive queue' rather than packets.
           This means that we don't have to invent dummy packets to queue to
           indicate abnormal/terminal states and we don't have to keep metadata
           packets (such as ABORTs) around.
      
       (5) The offset and length of a (sub)packet's content are now passed to
           the verify_packet security op.  This is currently expected to decrypt
           the packet in place and validate it.
      
           However, there's now nowhere to store the revised offset and length of
           the actual data within the decrypted blob (there may be a header and
           padding to skip) because an sk_buff may represent multiple packets, so
           a locate_data security op is added to retrieve these details from the
           sk_buff content when needed.
      
       (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
           individually secured and needs to be individually decrypted.  The code
           to do this is broken out into rxrpc_recvmsg_data() and shared with the
           kernel API.  It now iterates over the call's receive buffer rather
           than walking the socket receive queue.
      
      Additional changes:
      
       (1) The timers are condensed to a single timer that is set for the soonest
           of three timeouts (delayed ACK generation, DATA retransmission and
           call lifespan).
      
       (2) Transmission of ACK and ABORT packets is effected immediately from
           process-context socket ops/kernel API calls that cause them instead of
           them being punted off to a background work item.  The data_ready
           handler still has to defer to the background, though.
      
       (3) A shutdown op is added to the AF_RXRPC socket so that the AFS
           filesystem can shut down the socket and flush its own work items
           before closing the socket to deal with any in-progress service calls.
      
      Future additional changes that will need to be considered:
      
       (1) Make sure that a call doesn't hog the front of the queue by receiving
           data from the network as fast as userspace is consuming it to the
           exclusion of other calls.
      
       (2) Transmit delayed ACKs from within recvmsg() when we've consumed
           sufficiently more packets to avoid the background work item needing to
           run.
      Signed-off-by: David Howells <dhowells@redhat.com>
      248f219c
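      A sketch of the window comparisons described in (3): rxrpc sequence
      numbers are u32, and the signed-difference trick lets the window wrap
      safely:

          typedef u32 rxrpc_seq_t;

          static inline bool before(rxrpc_seq_t seq1, rxrpc_seq_t seq2)
          {
                  return (s32)(seq1 - seq2) < 0;
          }

          static inline bool before_eq(rxrpc_seq_t seq1, rxrpc_seq_t seq2)
          {
                  return (s32)(seq1 - seq2) <= 0;
          }

          #define after(seq1, seq2)    before((seq2), (seq1))
          #define after_eq(seq1, seq2) before_eq((seq2), (seq1))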