1. 31 Mar, 2018 9 commits
    • D
      rxrpc: Add a tracepoint to track rxrpc_local refcounting · 09d2bf59
      David Howells committed
      Add a tracepoint to track reference counting on the rxrpc_local struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
      09d2bf59
    • D
      rxrpc: Fix potential call vs socket/net destruction race · d3be4d24
      David Howells committed
      rxrpc_call structs don't pin sockets or network namespaces, but may attempt
      to access both after their refcount reaches 0 so that they can detach
      themselves from the network namespace.  However, there's no guarantee that
      the socket still exists at this point (so sock_net(&call->socket->sk) may
      be invalid) and the namespace may have gone away if the call isn't pinning
      a peer.
      
      Fix this by (a) carrying a net pointer in the rxrpc_call struct and (b)
      waiting for all calls to be destroyed when the network namespace goes away.
      
      This was detected by checker:
      
      net/rxrpc/call_object.c:634:57: warning: incorrect type in argument 1 (different address spaces)
      net/rxrpc/call_object.c:634:57:    expected struct sock const *sk
      net/rxrpc/call_object.c:634:57:    got struct sock [noderef] <asn:4>*<noident>
      
      Fixes: 2baec2c3 ("rxrpc: Support network namespacing")
      Signed-off-by: David Howells <dhowells@redhat.com>
      d3be4d24
    • D
      rxrpc: Fix checker warnings and errors · 88f2a825
      David Howells committed
      Fix various issues detected by checker.
      
      Errors:
      
       (*) rxrpc_discard_prealloc() should be using rcu_assign_pointer to set
           call->socket.
      
      Warnings:
      
       (*) rxrpc_service_connection_reaper() should be passing NULL rather than 0 to
           trace_rxrpc_conn() as the where argument.
      
       (*) rxrpc_disconnect_client_call() should get its net pointer via the
           call->conn rather than call->sock to avoid a warning about accessing
           an RCU pointer without protection.
      
       (*) Proc seq start/stop functions need annotation as they pass locks
           between the functions.
      
      False positives:
      
       (*) Checker doesn't correctly handle the seq-retry lock context balance in
           rxrpc_find_service_conn_rcu().
      
       (*) Checker thinks execution may proceed past the BUG() in
           rxrpc_publish_service_conn().
      
       (*) Variable length array warnings from SKCIPHER_REQUEST_ON_STACK() in
           rxkad.c.
      Signed-off-by: David Howells <dhowells@redhat.com>
      88f2a825
    • S
      rxrpc: remove unused static variables · edb63e2b
      Sebastian Andrzej Siewior committed
      The rxrpc_security_methods and rxrpc_security_sem user has been removed
      in 648af7fc ("rxrpc: Absorb the rxkad security module"). This was
      noticed by kbuild test robot for the -RT tree but is also true for !RT.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
      edb63e2b
    • M
      rxrpc: Fix resend event time calculation · 59299aa1
      Marc Dionne committed
      Commit a158bdd3 ("rxrpc: Fix call timeouts") reworked the time calculation
      for the next resend event.  For this calculation, "oldest" will be before
      "now", so ktime_sub(oldest, now) will yield a negative value.  When passed
      to nsecs_to_jiffies which expects an unsigned value, the end result will be
      a very large value, and a resend event scheduled far into the future.  This
      could cause calls to stall if some packets were lost.
      
      Fix by ordering the arguments to ktime_sub correctly.
      
      Fixes: a158bdd3 ("rxrpc: Fix call timeouts")
      Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      59299aa1
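The argument-order problem in the commit above can be reproduced in userspace. This is a minimal sketch, not the kernel code: `ktime_ns`, `ktime_sub_ns()`, `nsecs_to_ticks()` and the 1 ms tick length are hypothetical stand-ins for `ktime_t`, `ktime_sub()`, `nsecs_to_jiffies()` and HZ.

```c
#include <stdint.h>

/* Stand-in for the kernel's signed nanosecond ktime_t (assumption). */
typedef int64_t ktime_ns;

/* Illustrative tick length of 1 ms; the real value depends on HZ. */
#define NSEC_PER_TICK 1000000ULL

/* Signed subtraction in the style of ktime_sub(lhs, rhs): lhs - rhs. */
static ktime_ns ktime_sub_ns(ktime_ns lhs, ktime_ns rhs)
{
    return lhs - rhs;
}

/* Takes an unsigned nanosecond count, so a negative input wraps around. */
static uint64_t nsecs_to_ticks(uint64_t ns)
{
    return ns / NSEC_PER_TICK;
}

/* Buggy order: oldest - now is negative once oldest lies in the past. */
static uint64_t elapsed_ticks_buggy(ktime_ns oldest, ktime_ns now)
{
    return nsecs_to_ticks((uint64_t)ktime_sub_ns(oldest, now));
}

/* Fixed order: now - oldest is the non-negative age of the oldest packet. */
static uint64_t elapsed_ticks_fixed(ktime_ns oldest, ktime_ns now)
{
    return nsecs_to_ticks((uint64_t)ktime_sub_ns(now, oldest));
}
```

With `oldest` 5 ms before `now`, the fixed helper yields 5 ticks, while the buggy one yields a value on the order of 10^13 ticks, which matches the "scheduled far into the future" symptom described in the commit message.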
    • D
      rxrpc: Don't treat call aborts as conn aborts · 57b0c9d4
      David Howells committed
      If a call-level abort is received for the previous call to complete on a
      connection channel, then that abort is queued for the connection processor
      to handle.  Unfortunately, the connection processor then assumes without
      checking that the abort is connection-level (ie. callNumber is 0) and
      distributes it over all active calls on that connection, thereby
      incorrectly aborting them.
      
      Fix this by discarding aborts aimed at a completed call.
      
      Further, discard all packets aimed at a call that's complete if there's
      currently an active call on a channel, since the DATA packets associated
      with the new call automatically terminate the old call.
      
      Fixes: 18bfeba5 ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      57b0c9d4
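The distinction drawn in the commit above, between a connection-level abort (callNumber 0) and an abort aimed at one particular call, can be sketched as a small classifier. Everything here is hypothetical: the enum, the function name and the idea of passing the channel's active call number as a plain integer are illustrative assumptions, not the rxrpc implementation.

```c
#include <stdint.h>

/* Possible outcomes for a received ABORT packet (illustrative only). */
enum abort_action {
    ABORT_WHOLE_CONNECTION, /* callNumber 0: applies to every call on the conn */
    ABORT_ACTIVE_CALL,      /* aimed at the call currently using the channel */
    ABORT_DISCARD           /* aimed at a completed or unknown call: drop it */
};

/*
 * pkt_call:    callNumber from the packet header
 * active_call: call number currently active on the channel, 0 if none
 */
static enum abort_action classify_abort(uint32_t pkt_call, uint32_t active_call)
{
    if (pkt_call == 0)
        return ABORT_WHOLE_CONNECTION;
    if (active_call != 0 && pkt_call == active_call)
        return ABORT_ACTIVE_CALL;
    /* A call-level abort for a call that is no longer active must not
     * be spread over the other calls on the connection. */
    return ABORT_DISCARD;
}
```

The point of the sketch is the last branch: a non-zero call number that does not match the active call is dropped rather than treated as a connection abort.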
    • D
      rxrpc: Fix Tx ring annotation after initial Tx failure · 03877bf6
      David Howells committed
      rxrpc calls have a ring of packets that are awaiting ACK or retransmission
      and a parallel ring of annotations that tracks the state of those packets.
      If the initial transmission of a packet on the underlying UDP socket fails
      then the packet annotation is marked for resend - but the setting of this
      mark accidentally erases the last-packet mark also stored in the same
      annotation slot.  If this happens, a call won't switch out of the Tx phase
      when all the packets have been transmitted.
      
      Fix this by retaining the last-packet mark and only altering the packet
      state.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
      03877bf6
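The annotation bug above is a classic case of assigning a whole byte when only a bit-field inside it should change. A minimal sketch, assuming a layout where the low two bits hold the packet state and one higher bit is the last-packet flag; the macro names and values are hypothetical and not taken from the rxrpc sources.

```c
#include <stdint.h>

#define ANNO_STATE_MASK 0x03u /* low bits: packet state (assumed layout) */
#define ANNO_UNACK      0x00u
#define ANNO_ACK        0x01u
#define ANNO_NAK        0x02u
#define ANNO_RETRANS    0x03u
#define ANNO_LAST       0x04u /* flag: final packet of the Tx phase */

/* Buggy shape: overwriting the slot also erases the ANNO_LAST flag. */
static uint8_t mark_retrans_buggy(uint8_t anno)
{
    (void)anno;
    return ANNO_RETRANS;
}

/* Fixed shape: replace only the state bits and keep every other flag. */
static uint8_t mark_retrans_fixed(uint8_t anno)
{
    return (uint8_t)((anno & ~ANNO_STATE_MASK) | ANNO_RETRANS);
}
```

For a slot holding `ANNO_LAST | ANNO_UNACK`, the fixed helper returns `ANNO_LAST | ANNO_RETRANS`, whereas the buggy helper returns a value with the last-packet flag cleared.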
    • D
      rxrpc: Fix a bit of time confusion · f82eb88b
      David Howells committed
      The rxrpc_reduce_call_timer() function should be passed the 'current time'
      in jiffies, not the current ktime time.  It's confusing in rxrpc_resend
      because that has to deal with both.  Pass the correct current time in.
      
      Note that this only affects the trace produced and not the functioning of
      the code.
      
      Fixes: a158bdd3 ("rxrpc: Fix call timeouts")
      Signed-off-by: David Howells <dhowells@redhat.com>
      f82eb88b
    • D
      rxrpc: Fix firewall route keepalive · ace45bec
      David Howells committed
      Fix the firewall route keepalive part of AF_RXRPC, which is currently
      functioning incorrectly by replying to VERSION REPLY packets from the server
      with VERSION REQUEST packets.
      
      Instead, send VERSION REPLY packets to the peers of service connections to
      act as keep-alives 20s after the latest packet was transmitted to that
      peer.
      
      Also, just discard VERSION REPLY packets rather than replying to them.
      Signed-off-by: David Howells <dhowells@redhat.com>
      ace45bec
  2. 28 Mar, 2018 4 commits
  3. 27 Mar, 2018 1 commit
  4. 24 Mar, 2018 1 commit
  5. 16 Mar, 2018 1 commit
  6. 23 Feb, 2018 1 commit
  7. 17 Feb, 2018 1 commit
    • D
      rxrpc: Work around usercopy check · a16b8d0c
      David Howells committed
      Due to a check recently added to copy_to_user(), it's now not permitted to
      copy from slab-held data to userspace unless the slab is whitelisted.  This
      affects rxrpc_recvmsg() when it attempts to place an RXRPC_USER_CALL_ID
      control message in the userspace control message buffer.  A warning is
      generated by usercopy_warn() because the source is the copy of the
      user_call_ID retained in the rxrpc_call struct.
      
      Work around the issue by copying the user_call_ID to a variable on the
      stack and passing that to put_cmsg().
      
      The warning generated looks like:
      
      	Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'dmaengine-unmap-128' (offset 680, size 8)!
      	WARNING: CPU: 0 PID: 1401 at mm/usercopy.c:81 usercopy_warn+0x7e/0xa0
      	...
      	RIP: 0010:usercopy_warn+0x7e/0xa0
      	...
      	Call Trace:
      	 __check_object_size+0x9c/0x1a0
      	 put_cmsg+0x98/0x120
      	 rxrpc_recvmsg+0x6fc/0x1010 [rxrpc]
      	 ? finish_wait+0x80/0x80
      	 ___sys_recvmsg+0xf8/0x240
      	 ? __clear_rsb+0x25/0x3d
      	 ? __clear_rsb+0x15/0x3d
      	 ? __clear_rsb+0x25/0x3d
      	 ? __clear_rsb+0x15/0x3d
      	 ? __clear_rsb+0x25/0x3d
      	 ? __clear_rsb+0x15/0x3d
      	 ? __clear_rsb+0x25/0x3d
      	 ? __clear_rsb+0x15/0x3d
      	 ? finish_task_switch+0xa6/0x2b0
      	 ? trace_hardirqs_on_caller+0xed/0x180
      	 ? _raw_spin_unlock_irq+0x29/0x40
      	 ? __sys_recvmsg+0x4e/0x90
      	 __sys_recvmsg+0x4e/0x90
      	 do_syscall_64+0x7a/0x220
      	 entry_SYSCALL_64_after_hwframe+0x26/0x9b
      Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a16b8d0c
  8. 12 Feb, 2018 1 commit
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds committed
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  9. 09 Feb, 2018 1 commit
    • D
      rxrpc: Don't put crypto buffers on the stack · 8c2f826d
      David Howells committed
      Don't put buffers of data to be handed to crypto on the stack as this may
      cause an assertion failure in the kernel (see below).  Fix this by using a
      kmalloc'd buffer instead.
      
      kernel BUG at ./include/linux/scatterlist.h:147!
      ...
      RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc]
      RSP: 0018:ffffbe2fc06cfca8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff989277d59900 RCX: 0000000000000028
      RDX: 0000259dc06cfd88 RSI: 0000000000000025 RDI: ffffbe30406cfd88
      RBP: ffffbe2fc06cfd60 R08: ffffbe2fc06cfd08 R09: ffffbe2fc06cfd08
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff7c5f80d9f95
      R13: ffffbe2fc06cfd88 R14: ffff98927a3f7aa0 R15: ffffbe2fc06cfd08
      FS:  0000000000000000(0000) GS:ffff98927fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055b1ff28f0f8 CR3: 000000001b412003 CR4: 00000000003606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rxkad_respond_to_challenge+0x297/0x330 [rxrpc]
       rxrpc_process_connection+0xd1/0x690 [rxrpc]
       ? process_one_work+0x1c3/0x680
       ? __lock_is_held+0x59/0xa0
       process_one_work+0x249/0x680
       worker_thread+0x3a/0x390
       ? process_one_work+0x680/0x680
       kthread+0x121/0x140
       ? kthread_create_worker_on_cpu+0x70/0x70
       ret_from_fork+0x3a/0x50
      Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8c2f826d
  10. 08 Feb, 2018 1 commit
    • D
      rxrpc: Fix received abort handling · 17e9e23b
      David Howells committed
      AF_RXRPC is incorrectly sending back to the server any abort it receives
      for a client connection.  This is due to the final-ACK offload to the
      connection event processor patch.  The abort code is copied into the
      last-call information on the connection channel and then the event
      processor is set.
      
      Instead, the following should be done:
      
       (1) In the case of a final-ACK for a successful call, the ACK should be
           scheduled as before.
      
       (2) In the case of a locally generated ABORT, the ABORT details should be
           cached for sending in response to further packets related to that
           call and no further action scheduled at call disconnect time.
      
       (3) In the case of an ACK received from the peer, the call should be
           considered dead, no ABORT should be transmitted at this time.  In
           response to further non-ABORT packets from the peer relating to this
           call, an RX_USER_ABORT ABORT should be transmitted.
      
       (4) In the case of a call killed due to network error, an RX_USER_ABORT
           ABORT should be cached for transmission in response to further
           packets, but no ABORT should be sent at this time.
      
      Fixes: 3136ef49 ("rxrpc: Delay terminal ACK transmission on a client call")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      17e9e23b
  11. 17 Jan, 2018 1 commit
    • A
      net: delete /proc THIS_MODULE references · 96890d62
      Alexey Dobriyan committed
      /proc has been ignoring struct file_operations::owner field for 10 years.
      Specifically, it started with commit 786d7e16
      ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
      inode->i_fop is initialized with proxy struct file_operations for
      regular files:
      
      	-               if (de->proc_fops)
      	-                       inode->i_fop = de->proc_fops;
      	+               if (de->proc_fops) {
      	+                       if (S_ISREG(inode->i_mode))
      	+                               inode->i_fop = &proc_reg_file_ops;
      	+                       else
      	+                               inode->i_fop = de->proc_fops;
      	+               }
      
      VFS stopped pinning module at this point.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96890d62
  12. 03 Dec, 2017 1 commit
  13. 29 Nov, 2017 3 commits
    • G
      rxrpc: Fix variable overwrite · 282ef472
      Gustavo A. R. Silva committed
      Values assigned to both variable resend_at and ack_at are overwritten
      before they can be used.
      
      The correct fix here is to add 'now' to the previously computed value in
      resend_at and ack_at.
      
      Addresses-Coverity-ID: 1462262
      Addresses-Coverity-ID: 1462263
      Addresses-Coverity-ID: 1462264
      Fixes: beb8e5e4 ("rxrpc: Express protocol timeouts in terms of RTT")
      Link: https://marc.info/?i=17004.1511808959%40warthog.procyon.org.uk
      Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      282ef472
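The overwrite pattern that Coverity flagged in the commit above can be shown in miniature. This is only a sketch of the shape of the bug; the function names are hypothetical and `now` and `delay` are plain tick counts rather than the kernel's variables.

```c
/* Buggy shape: the computed delay is thrown away by the second assignment,
 * so the deadline is always "now". */
static unsigned long deadline_overwritten(unsigned long now, unsigned long delay)
{
    unsigned long at = delay;

    at = now; /* overwrites the value computed above */
    return at;
}

/* Fixed shape: add 'now' to the previously computed delay. */
static unsigned long deadline_fixed(unsigned long now, unsigned long delay)
{
    unsigned long at = delay;

    at += now;
    return at;
}
```

With `now = 1000` and `delay = 25`, the overwritten variant returns 1000 and the fixed variant returns 1025.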
    • D
      rxrpc: Fix ACK generation from the connection event processor · 5fc62f6a
      David Howells committed
      Repeat terminal ACKs and terminal ACKs are now generated from the
      connection event processor rather than from call handling, as this allows us to
      discard client call structures as soon as possible and free up the channel
      for a follow-on call.
      
      However, in ACKs so generated, the additional information trailer is
      malformed because the padding that's meant to be in the middle isn't
      included in what's transmitted.
      
      Fix it so that the 3 bytes of padding are included in the transmission.
      
      Further, the trailer is misaligned because of the padding, so assignment to
      the u16 and u32 fields inside it might cause problems on some arches, so
      fix this by breaking the padding and the trailer out of the packed struct.
      
      (This also deals with potential compiler weirdies where some of the nested
      structs are packed and some aren't).
      
      The symptoms can be seen in wireshark as terminal DUPLICATE or IDLE ACK
      packets in which the Max MTU, Interface MTU and rwind fields have weird
      values and the Max Packets field is apparently missing.
      Reported-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      5fc62f6a
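The layout issue described above, a packed header followed by 3 bytes of padding and then an information trailer, can be sketched in userspace. The struct names, field names and the 18-byte header are illustrative assumptions; byte-order conversion is omitted, and the soft-ACK array is assumed to be empty, as it would be for a terminal ACK.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative ACK header, packed so that it occupies exactly 18 bytes. */
struct ack_header {
    uint16_t buffer_space;
    uint16_t max_skew;
    uint32_t first_packet;
    uint32_t previous_packet;
    uint32_t serial;
    uint8_t  reason;
    uint8_t  n_acks;
} __attribute__((packed));

/* Trailer kept as its own naturally aligned struct, outside the packed one. */
struct ack_trailer {
    uint32_t rx_mtu;
    uint32_t max_mtu;
    uint32_t rwind;
    uint32_t jumbo_max;
};

#define ACK_PAD_LEN 3 /* padding bytes between the header and the trailer */

/* Serialise header, padding and trailer into 'out'.
 * Returns the number of bytes written, or 0 if 'out' is too small. */
static size_t build_terminal_ack(uint8_t *out, size_t out_len,
                                 const struct ack_header *hdr,
                                 const struct ack_trailer *trailer)
{
    size_t need = sizeof(*hdr) + ACK_PAD_LEN + sizeof(*trailer);
    size_t off = 0;

    if (out_len < need)
        return 0;
    memcpy(out + off, hdr, sizeof(*hdr));
    off += sizeof(*hdr);
    memset(out + off, 0, ACK_PAD_LEN); /* the bytes the buggy code left out */
    off += ACK_PAD_LEN;
    memcpy(out + off, trailer, sizeof(*trailer));
    off += sizeof(*trailer);
    return off;
}
```

Copying the trailer with `memcpy()` from a separate, aligned struct avoids ever storing to a misaligned `uint32_t` inside a packed struct, which is the concern the commit raises for some architectures.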
    • D
      rxrpc: Clean up whitespace · 3d7682af
      David Howells committed
      Clean up some whitespace from rxrpc.
      Signed-off-by: David Howells <dhowells@redhat.com>
      3d7682af
  14. 28 Nov, 2017 1 commit
  15. 24 Nov, 2017 12 commits
    • D
      rxrpc: Fix conn expiry timers · 3d18cbb7
      David Howells committed
      Fix the rxrpc connection expiry timers so that connections for closed
      AF_RXRPC sockets get deleted in a more timely fashion, freeing up the
      transport UDP port much more quickly.
      
       (1) Replace the delayed work items with work items plus timers so that
           timer_reduce() can be used to shorten them and so that the timer
           doesn't requeue the work item if the net namespace is dead.
      
       (2) Don't use queue_delayed_work() as that won't alter the timeout if the
           timer is already running.
      
       (3) Don't rearm the timers if the network namespace is dead.
      Signed-off-by: David Howells <dhowells@redhat.com>
      3d18cbb7
    • D
      rxrpc: Fix service endpoint expiry · f859ab61
      David Howells committed
      Make RxRPC service endpoints expire like they're supposed to by the following
      means:
      
       (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
           global service conn timeout, otherwise the first rxrpc_net struct to
           die will cause connections on all others to expire immediately from
           then on.
      
       (2) Mark local service endpoints for which the socket has been closed
           (->service_closed) so that the expiration timeout can be much
           shortened for service and client connections going through that
           endpoint.
      
       (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
           count reaches 1, not 0, as idle conns have a 1 count.
      
       (4) The accumulator for the earliest time we might want to schedule for
           should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
           the comparison functions use signed arithmetic.
      
       (5) Simplify the expiration handling, adding the expiration value to the
           idle timestamp each time rather than keeping track of the time in the
           past before which the idle timestamp must go to be expired.  This is
           much easier to read.
      
       (6) Ignore the timeouts if the net namespace is dead.
      
       (7) Restart the service reaper work item rather than the client reaper.
      Signed-off-by: David Howells <dhowells@redhat.com>
      f859ab61
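Point (4) in the list above depends on how wrap-safe tick comparisons work. The sketch below mimics the kernel's signed-difference comparison in userspace; `tick_before()`, `earliest()` and `MAX_TICK_OFFSET` are hypothetical names modelled on `time_before()` and `MAX_JIFFY_OFFSET`.

```c
#include <limits.h>
#include <stdbool.h>

/* Largest offset that still compares correctly; modelled on MAX_JIFFY_OFFSET. */
#define MAX_TICK_OFFSET ((LONG_MAX >> 1) - 1)

/* Wrap-safe "a is earlier than b", using a signed difference. */
static bool tick_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Accumulate the earliest of the deadlines seen so far. */
static unsigned long earliest(unsigned long acc, unsigned long candidate)
{
    return tick_before(candidate, acc) ? candidate : acc;
}
```

Starting the accumulator at `ULONG_MAX` fails because, under signed arithmetic, `ULONG_MAX` looks like "one tick before zero", so any small candidate is judged later than it and never replaces it. Starting at `now + MAX_TICK_OFFSET` keeps the accumulator within the range where the signed comparison is meaningful.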
    • D
      rxrpc: Add keepalive for a call · 415f44e4
      David Howells committed
      We need to transmit a packet every so often to act as a keepalive for the
      peer (which has a timeout from the last time it received a packet) and also
      to prevent any intervening firewalls from closing the route.
      
      Do this by resetting a timer every time we transmit a packet.  If the timer
      ever expires, we transmit a PING ACK packet and thereby also elicit a PING
      RESPONSE ACK from the other side - which prevents our last-rx timeout from
      expiring.
      
      The timer is set to 1/6 of the last-rx timeout so that we can detect the
      other side going away if it misses 6 replies in a row.
      
      This is particularly necessary for servers where the processing of the
      service function may take a significant amount of time.
      Signed-off-by: David Howells <dhowells@redhat.com>
      415f44e4
    • D
      rxrpc: Add a timeout for detecting lost ACKs/lost DATA · bd1fdf8c
      David Howells committed
      Add an extra timeout that is set/updated when we send a DATA packet that
      has the request-ack flag set.  This allows us to detect if we don't get an
      ACK in response to the latest flagged packet.
      
      The ACK packet is adjudged to have been lost if it doesn't turn up within
      2*RTT of the transmission.
      
      If the timeout occurs, we schedule the sending of a PING ACK to find out
      the state of the other side.  If a new DATA packet is ready to go sooner,
      we cancel the sending of the ping and set the request-ack flag on that
      instead.
      
      If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
      we had at the time of the ping transmission, we adjudge all the DATA
      packets sent between the response tx_top and the ping-time tx_top to have
      been lost and retransmit immediately.
      
      Rather than sending a PING ACK, we could just pick a DATA packet and
      speculatively retransmit that with request-ack set.  It should result in
      either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu of the
      PING-RESPONSE ACK mentioned above.
      Signed-off-by: David Howells <dhowells@redhat.com>
      bd1fdf8c
    • D
      rxrpc: Express protocol timeouts in terms of RTT · beb8e5e4
      David Howells committed
      Express protocol timeouts for data retransmission and deferred ack
      generation in terms of RTT rather than specified timeouts once we have
      sufficient RTT samples.
      
      For the moment, this requires just one RTT sample to be able to use this
      for ack deferral and two for data retransmission.
      
      The data retransmission timeout is set at RTT*1.5 and the ACK deferral
      timeout is set at RTT.
      
      Note that the calculated timeout is limited to a minimum of 4ns to make
      sure it doesn't happen too quickly.
      Signed-off-by: David Howells <dhowells@redhat.com>
      beb8e5e4
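The timeout rules stated in the commit above (RTT*1.5 for retransmission, RTT for ACK deferral, a minimum value, and a sample-count threshold before RTT is trusted) can be written out as arithmetic. This is a sketch under assumptions: the function names, the nanosecond unit for all values and the `fallback_ns` parameter for the pre-existing fixed timeout are hypothetical.

```c
#include <stdint.h>

/* Minimum timeout, using the figure given in the commit text. */
#define MIN_TIMEOUT_NS 4ULL

static uint64_t clamp_min(uint64_t t)
{
    return t < MIN_TIMEOUT_NS ? MIN_TIMEOUT_NS : t;
}

/* Data retransmission: RTT * 1.5 once at least two RTT samples exist,
 * otherwise the caller's fixed fallback timeout. */
static uint64_t resend_timeout_ns(uint64_t rtt_ns, unsigned int samples,
                                  uint64_t fallback_ns)
{
    if (samples < 2)
        return fallback_ns;
    return clamp_min(rtt_ns + rtt_ns / 2);
}

/* ACK deferral: one RTT once at least one RTT sample exists. */
static uint64_t ack_delay_ns(uint64_t rtt_ns, unsigned int samples,
                             uint64_t fallback_ns)
{
    if (samples < 1)
        return fallback_ns;
    return clamp_min(rtt_ns);
}
```

For an RTT of 1 ms with two samples, `resend_timeout_ns()` gives 1.5 ms and `ack_delay_ns()` gives 1 ms; with a single sample, only the ACK deferral uses the RTT.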
    • D
      rxrpc: Don't transmit DELAY ACKs immediately on proposal · 8637abaa
      David Howells committed
      Don't transmit a DELAY ACK immediately on proposal when the Rx window is
      rotated, but rather defer it to the work function.  This means that we have
      a chance to queue/consume more received packets before we actually send the
      DELAY ACK, or even cancel it entirely, thereby reducing the number of
      packets transmitted.
      
      We do, however, want to continue sending other types of packet immediately,
      particularly REQUESTED ACKs, as they may be used for RTT calculation by the
      other side.
      Signed-off-by: David Howells <dhowells@redhat.com>
      8637abaa
    • D
      rxrpc: Fix call timeouts · a158bdd3
      David Howells committed
      Fix the rxrpc call expiration timeouts and make them settable from
      userspace.  By analogy with other rx implementations, there should be three
      timeouts:
      
       (1) "Normal timeout"
      
           This is set for all calls and is triggered if we haven't received any
           packets from the peer in a while.  It is measured from the last time
           we received any packet on that call.  This is not reset by any
           connection packets (such as CHALLENGE/RESPONSE packets).
      
           If a service operation takes a long time, the server should generate
           PING ACKs at an interval that's substantially less than the normal
           timeout so as to keep both sides alive.  This is set at 1/6 of the
           normal timeout.
      
       (2) "Idle timeout"
      
           This is set only for a service call and is triggered if we stop
           receiving the DATA packets that comprise the request data.  It is
           measured from the last time we received a DATA packet.
      
       (3) "Hard timeout"
      
           This can be set for a call and specifies the maximum lifetime of that
           call.  It should not be specified by default, as some operations (such
           as volume transfer) take a long time.
      
      Allow userspace to set/change the timeouts on a call with sendmsg, using a
      control message:
      
      	RXRPC_SET_CALL_TIMEOUTS
      
      The data to the message is a number of 32-bit words, not all of which need
      be given:
      
      	u32 hard_timeout;	/* sec from first packet */
      	u32 idle_timeout;	/* msec from packet Rx */
      	u32 normal_timeout;	/* msec from data Rx */
      
      This can be set in combination with any other sendmsg() that affects a
      call.
      Signed-off-by: David Howells <dhowells@redhat.com>
      a158bdd3
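From userspace, the control message described above would be attached to a sendmsg() call. The helper below is a hedged sketch that only packs the 32-bit words into a control buffer using the standard CMSG macros; `pack_call_timeouts()` is a hypothetical name, and the `level` and `type` arguments are left to the caller, who would pass SOL_RXRPC and RXRPC_SET_CALL_TIMEOUTS on a real AF_RXRPC socket.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Pack 1 to 3 timeout words, in the order hard (sec), idle (msec),
 * normal (msec), into a single control message.
 * Returns 0 on success, -1 if n is out of range or the buffer is too small.
 */
static int pack_call_timeouts(struct msghdr *msg, void *buf, size_t buflen,
                              int level, int type,
                              const uint32_t *words, unsigned int n)
{
    struct cmsghdr *cmsg;
    size_t payload = n * sizeof(uint32_t);

    if (n < 1 || n > 3 || buflen < CMSG_SPACE(payload))
        return -1;

    memset(buf, 0, buflen);
    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(payload);

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = level;
    cmsg->cmsg_type = type;
    cmsg->cmsg_len = CMSG_LEN(payload); /* shorter payload = fewer timeouts */
    memcpy(CMSG_DATA(cmsg), words, payload);
    return 0;
}
```

Because not all words need be given, passing `n = 1` would set only the hard timeout. The buffer handed in should be suitably aligned for a `struct cmsghdr`, for example by declaring it inside a union with one.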
    • D
      rxrpc: Split the call params from the operation params · 48124178
      David Howells committed
      When rxrpc_sendmsg() parses the control message buffer, it places the
      parameters extracted into a structure, but lumps together call parameters
      (such as user call ID) with operation parameters (such as whether to send
      data, send an abort or accept a call).
      
      Split the call parameters out into their own structure, a copy of which is
      then embedded in the operation parameters struct.
      
      The call parameters struct is then passed down into the places that need it
      instead of passing the individual parameters.  This allows for extra call
      parameters to be added.
      Signed-off-by: David Howells <dhowells@redhat.com>
      48124178
    • D
      rxrpc: Delay terminal ACK transmission on a client call · 3136ef49
      David Howells committed
      Delay terminal ACK transmission on a client call by deferring it to the
      connection processor.  This allows it to be skipped if we can send the next
      call instead, the first DATA packet of which will implicitly ack this call.
      Signed-off-by: David Howells <dhowells@redhat.com>
      3136ef49
    • D
      rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls · 9faaff59
      David Howells committed
      Provide a different lockdep key for rxrpc_call::user_mutex when the call is
      made on a kernel socket, such as by the AFS filesystem.
      
      The problem is that lockdep registers a false positive between userspace
      calling the sendmsg syscall on a user socket where call->user_mutex is held
      whilst userspace memory is accessed whereas the AFS filesystem may perform
      operations with mmap_sem held by the caller.
      
      In such a case, the following warning is produced.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.14.0-fscache+ #243 Tainted: G            E
      ------------------------------------------------------
      modpost/16701 is trying to acquire lock:
       (&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs]
      
      but task is already holding lock:
       (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 (&mm->mmap_sem){++++}:
             __might_fault+0x61/0x89
             _copy_from_iter_full+0x40/0x1fa
             rxrpc_send_data+0x8dc/0xff3
             rxrpc_do_sendmsg+0x62f/0x6a1
             rxrpc_sendmsg+0x166/0x1b7
             sock_sendmsg+0x2d/0x39
             ___sys_sendmsg+0x1ad/0x22b
             __sys_sendmsg+0x41/0x62
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #2 (&call->user_mutex){+.+.}:
             __mutex_lock+0x86/0x7d2
             rxrpc_new_client_call+0x378/0x80e
             rxrpc_kernel_begin_call+0xf3/0x154
             afs_make_call+0x195/0x454 [kafs]
             afs_vl_get_capabilities+0x193/0x198 [kafs]
             afs_vl_lookup_vldb+0x5f/0x151 [kafs]
             afs_create_volume+0x2e/0x2f4 [kafs]
             afs_mount+0x56a/0x8d7 [kafs]
             mount_fs+0x6a/0x109
             vfs_kern_mount+0x67/0x135
             do_mount+0x90b/0xb57
             SyS_mount+0x72/0x98
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #1 (k-sk_lock-AF_RXRPC){+.+.}:
             lock_sock_nested+0x74/0x8a
             rxrpc_kernel_begin_call+0x8a/0x154
             afs_make_call+0x195/0x454 [kafs]
             afs_fs_get_capabilities+0x17a/0x17f [kafs]
             afs_probe_fileserver+0xf7/0x2f0 [kafs]
             afs_select_fileserver+0x83f/0x903 [kafs]
             afs_fetch_status+0x89/0x11d [kafs]
             afs_iget+0x16f/0x4f8 [kafs]
             afs_mount+0x6c6/0x8d7 [kafs]
             mount_fs+0x6a/0x109
             vfs_kern_mount+0x67/0x135
             do_mount+0x90b/0xb57
             SyS_mount+0x72/0x98
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #0 (&vnode->io_lock){+.+.}:
             lock_acquire+0x174/0x19f
             __mutex_lock+0x86/0x7d2
             afs_begin_vnode_operation+0x33/0x77 [kafs]
             afs_fetch_data+0x80/0x12a [kafs]
             afs_readpages+0x314/0x405 [kafs]
             __do_page_cache_readahead+0x203/0x2ba
             filemap_fault+0x179/0x54d
             __do_fault+0x17/0x60
             __handle_mm_fault+0x6d7/0x95c
             handle_mm_fault+0x24e/0x2a3
             __do_page_fault+0x301/0x486
             do_page_fault+0x236/0x259
             page_fault+0x22/0x30
             __clear_user+0x3d/0x60
             padzero+0x1c/0x2b
             load_elf_binary+0x785/0xdc7
             search_binary_handler+0x81/0x1ff
             do_execveat_common.isra.14+0x600/0x888
             do_execve+0x1f/0x21
             SyS_execve+0x28/0x2f
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      other info that might help us debug this:
      
      Chain exists of:
        &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&mm->mmap_sem);
                                     lock(&call->user_mutex);
                                     lock(&mm->mmap_sem);
        lock(&vnode->io_lock);
      
       *** DEADLOCK ***
      
      1 lock held by modpost/16701:
       #0:  (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
      
      stack backtrace:
      CPU: 0 PID: 16701 Comm: modpost Tainted: G            E   4.14.0-fscache+ #243
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Call Trace:
       dump_stack+0x67/0x8e
       print_circular_bug+0x341/0x34f
       check_prev_add+0x11f/0x5d4
       ? add_lock_to_list.isra.12+0x8b/0x8b
       ? add_lock_to_list.isra.12+0x8b/0x8b
       ? __lock_acquire+0xf77/0x10b4
       __lock_acquire+0xf77/0x10b4
       lock_acquire+0x174/0x19f
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       __mutex_lock+0x86/0x7d2
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_fetch_data+0x80/0x12a [kafs]
       afs_readpages+0x314/0x405 [kafs]
       __do_page_cache_readahead+0x203/0x2ba
       ? filemap_fault+0x179/0x54d
       filemap_fault+0x179/0x54d
       __do_fault+0x17/0x60
       __handle_mm_fault+0x6d7/0x95c
       handle_mm_fault+0x24e/0x2a3
       __do_page_fault+0x301/0x486
       do_page_fault+0x236/0x259
       page_fault+0x22/0x30
      RIP: 0010:__clear_user+0x3d/0x60
      RSP: 0018:ffff880071e93da0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c
      RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720
      RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00
      R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000
       padzero+0x1c/0x2b
       load_elf_binary+0x785/0xdc7
       search_binary_handler+0x81/0x1ff
       do_execveat_common.isra.14+0x600/0x888
       do_execve+0x1f/0x21
       SyS_execve+0x28/0x2f
       do_syscall_64+0x89/0x1be
       entry_SYSCALL64_slow_path+0x25/0x25
      RIP: 0033:0x7fdb6009ee07
      RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
      RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07
      RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900
      RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000
      R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000
      R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000
      Signed-off-by: David Howells <dhowells@redhat.com>
      9faaff59
    • D
      rxrpc: Don't set upgrade by default in sendmsg() · 48ca2463
      David Howells committed
      Don't set upgrade by default when creating a call from sendmsg().  This is
      a holdover from when I was testing the code.
      Signed-off-by: David Howells <dhowells@redhat.com>
      48ca2463
    • D
      rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing · 03a6c822
      David Howells committed
      The caller of rxrpc_accept_call() must release the lock on call->user_mutex
      returned by that function.
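
      The ownership rule can be illustrated with a minimal userspace sketch.
      Everything here is a stand-in rather than the rxrpc code: toy_mutex only
      records whether it is held, and accept_call_locked() and
      accept_and_release() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock that only records whether it is held (not a real mutex). */
struct toy_mutex {
    int held;
};

static void toy_lock(struct toy_mutex *m)
{
    assert(!m->held);   /* taking it twice would deadlock a real mutex */
    m->held = 1;
}

static void toy_unlock(struct toy_mutex *m)
{
    assert(m->held);
    m->held = 0;
}

/* Hypothetical call object; only the fields the sketch needs. */
struct call {
    struct toy_mutex user_mutex;
    int accepted;
};

/* Hypothetical accept helper: returns with user_mutex still held. */
static struct call *accept_call_locked(struct call *c)
{
    toy_lock(&c->user_mutex);
    c->accepted = 1;
    return c;
}

/* The caller owns the lock on return and must drop it when done. */
static int accept_and_release(struct call *c)
{
    struct call *call = accept_call_locked(c);
    int ok = call->accepted;

    toy_unlock(&call->user_mutex);
    return ok;
}
```

      If the caller skipped the toy_unlock() step, the next attempt to take
      user_mutex on the same call would trip the assertion in toy_lock(), which
      is the toy equivalent of the leaked lock this fix addresses.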
      Signed-off-by: David Howells <dhowells@redhat.com>
      03a6c822
  16. 22 Nov, 2017 1 commit
    • K
      treewide: setup_timer() -> timer_setup() · e99e88a9
      Kees Cook committed
      This converts all remaining users of the old setup_timer() API to
      timer_setup() in cases where the callback argument is the structure
      that already holds the struct timer_list. There should be no behavioral
      changes: only the pointer passed into the callback changes, and the
      same pointers remain available after conversion. The conversion handles
      the following examples, along with some other variations.
      
      Casting from unsigned long:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, ptr);
      
      and forced object casts:
      
          void my_callback(struct something *ptr)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);
      
      become:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
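
      To show why the converted callback can still reach its enclosing
      structure, here is a minimal userspace sketch of the pattern. It is not
      the kernel implementation: struct timer_list, timer_setup() and
      container_of() are simplified stand-ins, fire_timer() is a hypothetical
      helper that plays the role of timer expiry, and __typeof__ is a GNU C
      extension.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct timer_list (assumption). */
struct timer_list {
    void (*function)(struct timer_list *t);
};

/* Simplified container_of(): recover the enclosing struct from a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as from_timer(): the type is taken from the variable itself. */
#define from_timer(var, callback_timer, timer_fieldname) \
    container_of(callback_timer, __typeof__(*var), timer_fieldname)

/* Hypothetical structure that embeds its timer, as in the examples above. */
struct something {
    int fired;
    struct timer_list my_timer;
};

/* Simplified timer_setup(): flags are accepted but ignored in this sketch. */
static void timer_setup(struct timer_list *t,
                        void (*fn)(struct timer_list *), unsigned int flags)
{
    (void)flags;
    t->function = fn;
}

/* Converted callback: receives the timer and derives its container. */
static void my_callback(struct timer_list *t)
{
    struct something *ptr = from_timer(ptr, t, my_timer);

    ptr->fired++;
}

/* Stand-in for timer expiry: invoke the callback with the timer itself. */
static void fire_timer(struct timer_list *t)
{
    t->function(t);
}
```

      Because the callback is handed the timer itself, no separate data
      argument has to be stored or cast; the pointer arithmetic in
      container_of() recovers the same object the old code received through
      unsigned long data.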
      
      Direct function assignments:
      
          void my_callback(unsigned long data)
          {
              struct something *ptr = (struct something *)data;
          ...
          }
          ...
          ptr->my_timer.function = my_callback;
      
      have a temporary cast added, along with converting the args:
      
          void my_callback(struct timer_list *t)
          {
              struct something *ptr = from_timer(ptr, t, my_timer);
          ...
          }
          ...
          ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;
      
      And finally, callbacks without a data assignment:
      
          void my_callback(unsigned long data)
          {
          ...
          }
          ...
          setup_timer(&ptr->my_timer, my_callback, 0);
      
      have their argument renamed to verify they're unused during conversion:
      
          void my_callback(struct timer_list *unused)
          {
          ...
          }
          ...
          timer_setup(&ptr->my_timer, my_callback, 0);
      
      The conversion is done with the following Coccinelle script:
      
      spatch --very-quiet --all-includes --include-headers \
      	-I ./arch/x86/include -I ./arch/x86/include/generated \
      	-I ./include -I ./arch/x86/include/uapi \
      	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
      	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
      	--dir . \
      	--cocci-file ~/src/data/timer_setup.cocci
      
      @fix_address_of@
      expression e;
      @@
      
       setup_timer(
      -&(e)
      +&e
       , ...)
      
      // Update any raw setup_timer() usages that have a NULL callback, but
      // would otherwise match change_timer_function_usage, since the latter
      // will update all function assignments done in the face of a NULL
      // function initialization in setup_timer().
      @change_timer_function_usage_NULL@
      expression _E;
      identifier _timer;
      type _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, NULL, _E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E->_timer, NULL, (_cast_data)_E);
      +timer_setup(&_E->_timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, &_E);
      +timer_setup(&_E._timer, NULL, 0);
      |
      -setup_timer(&_E._timer, NULL, (_cast_data)&_E);
      +timer_setup(&_E._timer, NULL, 0);
      )
      
      @change_timer_function_usage@
      expression _E;
      identifier _timer;
      struct timer_list _stl;
      identifier _callback;
      type _cast_func, _cast_data;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
      +timer_setup(&_E._timer, _callback, 0);
      |
       _E->_timer@_stl.function = _callback;
      |
       _E->_timer@_stl.function = &_callback;
      |
       _E->_timer@_stl.function = (_cast_func)_callback;
      |
       _E->_timer@_stl.function = (_cast_func)&_callback;
      |
       _E._timer@_stl.function = _callback;
      |
       _E._timer@_stl.function = &_callback;
      |
       _E._timer@_stl.function = (_cast_func)_callback;
      |
       _E._timer@_stl.function = (_cast_func)&_callback;
      )
      
      // callback(unsigned long arg)
      @change_callback_handle_cast
       depends on change_timer_function_usage@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      (
      	... when != _origarg
      	_handletype *_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(_handletype *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      |
      	... when != _origarg
      	_handletype *_handle;
      	... when != _handle
      	_handle =
      -(void *)_origarg;
      +from_timer(_handle, t, _timer);
      	... when != _origarg
      )
       }
      
      // callback(unsigned long arg) without existing variable
      @change_callback_handle_cast_no_arg
       depends on change_timer_function_usage &&
                           !change_callback_handle_cast@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _origtype;
      identifier _origarg;
      type _handletype;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *t
       )
       {
      +	_handletype *_origarg = from_timer(_origarg, t, _timer);
      +
      	... when != _origarg
      -	(_handletype *)_origarg
      +	_origarg
      	... when != _origarg
       }
      
      // Avoid already converted callbacks.
      @match_callback_converted
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
      	    !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       { ... }
      
      // callback(struct something *handle)
      @change_callback_handle_arg
       depends on change_timer_function_usage &&
      	    !match_callback_converted &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      @@
      
       void _callback(
      -_handletype *_handle
      +struct timer_list *t
       )
       {
      +	_handletype *_handle = from_timer(_handle, t, _timer);
      	...
       }
      
      // If change_callback_handle_arg ran on an empty function, remove
      // the added handler.
      @unchange_callback_handle_arg
       depends on change_timer_function_usage &&
      	    change_callback_handle_arg@
      identifier change_timer_function_usage._callback;
      identifier change_timer_function_usage._timer;
      type _handletype;
      identifier _handle;
      identifier t;
      @@
      
       void _callback(struct timer_list *t)
       {
      -	_handletype *_handle = from_timer(_handle, t, _timer);
       }
      
      // We only want to refactor the setup_timer() data argument if we've found
      // the matching callback. This undoes changes in change_timer_function_usage.
      @unchange_timer_function_usage
       depends on change_timer_function_usage &&
                  !change_callback_handle_cast &&
                  !change_callback_handle_cast_no_arg &&
      	    !change_callback_handle_arg@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type change_timer_function_usage._cast_data;
      @@
      
      (
      -timer_setup(&_E->_timer, _callback, 0);
      +setup_timer(&_E->_timer, _callback, (_cast_data)_E);
      |
      -timer_setup(&_E._timer, _callback, 0);
      +setup_timer(&_E._timer, _callback, (_cast_data)&_E);
      )
      
      // If we fixed a callback from a .function assignment, fix the
      // assignment cast now.
      @change_timer_function_assignment
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression change_timer_function_usage._E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_func;
      typedef TIMER_FUNC_TYPE;
      @@
      
      (
       _E->_timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E->_timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      |
       _E._timer.function =
      -(_cast_func)&_callback
      +(TIMER_FUNC_TYPE)_callback
       ;
      )
      
      // Sometimes timer functions are called directly. Replace matched args.
      @change_timer_function_calls
       depends on change_timer_function_usage &&
                  (change_callback_handle_cast ||
                   change_callback_handle_cast_no_arg ||
                   change_callback_handle_arg)@
      expression _E;
      identifier change_timer_function_usage._timer;
      identifier change_timer_function_usage._callback;
      type _cast_data;
      @@
      
       _callback(
      (
      -(_cast_data)_E
      +&_E->_timer
      |
      -(_cast_data)&_E
      +&_E._timer
      |
      -_E
      +&_E->_timer
      )
       )
      
      // If a timer has been configured without a data argument, it can be
      // converted without regard to the callback argument, since it is unused.
      @match_timer_function_unused_data@
      expression _E;
      identifier _timer;
      identifier _callback;
      @@
      
      (
      -setup_timer(&_E->_timer, _callback, 0);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0L);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E->_timer, _callback, 0UL);
      +timer_setup(&_E->_timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0L);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_E._timer, _callback, 0UL);
      +timer_setup(&_E._timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0L);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(&_timer, _callback, 0UL);
      +timer_setup(&_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0L);
      +timer_setup(_timer, _callback, 0);
      |
      -setup_timer(_timer, _callback, 0UL);
      +timer_setup(_timer, _callback, 0);
      )
      
      @change_callback_unused_data
       depends on match_timer_function_unused_data@
      identifier match_timer_function_unused_data._callback;
      type _origtype;
      identifier _origarg;
      @@
      
       void _callback(
      -_origtype _origarg
      +struct timer_list *unused
       )
       {
      	... when != _origarg
       }
      Signed-off-by: Kees Cook <keescook@chromium.org>
      e99e88a9