1. 20 5月, 2020 1 次提交
  2. 11 5月, 2020 1 次提交
    • D
      rxrpc: Fix the excessive initial retransmission timeout · c410bf01
      David Howells 提交于
      rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
      sufficiently sampled.  This can cause problems with some fileservers with
      calls to the cache manager in the afs filesystem being dropped from the
      fileserver because a packet goes missing and the retransmission timeout is
      greater than the call expiry timeout.
      
      Fix this by:
      
       (1) Copying the RTT/RTO calculation code from Linux's TCP implementation
           and altering it to fit rxrpc.
      
       (2) Altering the various users of the RTT to make use of the new SRTT
           value.
      
       (3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO
           value instead (which is needed in jiffies), along with a backoff.
      
      Notes:
      
       (1) rxrpc provides RTT samples by matching the serial numbers on outgoing
           DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
           against the reference serial number in incoming REQUESTED ACK and
           PING-RESPONSE ACK packets.
      
       (2) Each packet that is transmitted on an rxrpc connection gets a new
           per-connection serial number, even for retransmissions, so an ACK can
           be cross-referenced to a specific trigger packet.  This allows RTT
           information to be drawn from retransmitted DATA packets also.
      
       (3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
           on an rxrpc_call because many RPC calls won't live long enough to
           generate more than one sample.
      
       (4) The calculated SRTT value is in units of 8ths of a microsecond rather
           than nanoseconds.
      
      The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.
      
      Fixes: 17926a79 ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      c410bf01
  3. 14 3月, 2020 1 次提交
    • D
      afs: Fix client call Rx-phase signal handling · 7d7587db
      David Howells 提交于
      Fix the handling of signals in client rxrpc calls made by the afs
      filesystem.  Ignore signals completely, leaving call abandonment or
      connection loss to be detected by timeouts inside AF_RXRPC.
      
      Allowing a filesystem call to be interrupted after the entire request has
      been transmitted and an abort sent means that the server may or may not
      have done the action - and we don't know.  It may even be worse than that
      for older servers.
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      7d7587db
  4. 31 1月, 2020 1 次提交
    • D
      rxrpc: Fix insufficient receive notification generation · f71dbf2f
      David Howells 提交于
      In rxrpc_input_data(), rxrpc_notify_socket() is called if the base sequence
      number of the packet is immediately following the hard-ack point at the end
      of the function.  However, this isn't sufficient, since the recvmsg side
      may have been advancing the window and then overrun the position in which
      we're adding - at which point rx_hard_ack >= seq0 and no notification is
      generated.
      
      Fix this by always generating a notification at the end of the input
      function.
      
      Without this, a long call may stall, possibly indefinitely.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f71dbf2f
  5. 27 1月, 2020 1 次提交
    • D
      rxrpc: Fix use-after-free in rxrpc_receive_data() · 122d74fa
      David Howells 提交于
      The subpacket scanning loop in rxrpc_receive_data() references the
      subpacket count in the private data part of the sk_buff in the loop
      termination condition.  However, when the final subpacket is pasted into
      the ring buffer, the function is no longer has a ref on the sk_buff and
      should not be looking at sp->* any more.  This point is actually marked in
      the code when skb is cleared (but sp is not - which is an error).
      
      Fix this by caching sp->nr_subpackets in a local variable and using that
      instead.
      
      Also clear 'sp' to catch accesses after that point.
      
      This can show up as an oops in rxrpc_get_skb() if sp->nr_subpackets gets
      trashed by the sk_buff getting freed and reused in the meantime.
      
      Fixes: e2de6c40 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      122d74fa
  6. 21 12月, 2019 1 次提交
  7. 05 9月, 2019 1 次提交
  8. 27 8月, 2019 5 次提交
    • D
      rxrpc: Use skb_unshare() rather than skb_cow_data() · d0d5c0cd
      David Howells 提交于
      The in-place decryption routines in AF_RXRPC's rxkad security module
      currently call skb_cow_data() to make sure the data isn't shared and that
      the skb can be written over.  This has a problem, however, as the softirq
      handler may be still holding a ref or the Rx ring may be holding multiple
      refs when skb_cow_data() is called in rxkad_verify_packet() - and so
      skb_shared() returns true and __pskb_pull_tail() dislikes that.  If this
      occurs, something like the following report will be generated.
      
      	kernel BUG at net/core/skbuff.c:1463!
      	...
      	RIP: 0010:pskb_expand_head+0x253/0x2b0
      	...
      	Call Trace:
      	 __pskb_pull_tail+0x49/0x460
      	 skb_cow_data+0x6f/0x300
      	 rxkad_verify_packet+0x18b/0xb10 [rxrpc]
      	 rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc]
      	 rxrpc_kernel_recv_data+0x126/0x240 [rxrpc]
      	 afs_extract_data+0x51/0x2d0 [kafs]
      	 afs_deliver_fs_fetch_data+0x188/0x400 [kafs]
      	 afs_deliver_to_call+0xac/0x430 [kafs]
      	 afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs]
      	 afs_make_call+0x282/0x3f0 [kafs]
      	 afs_fs_fetch_data+0x164/0x300 [kafs]
      	 afs_fetch_data+0x54/0x130 [kafs]
      	 afs_readpages+0x20d/0x340 [kafs]
      	 read_pages+0x66/0x180
      	 __do_page_cache_readahead+0x188/0x1a0
      	 ondemand_readahead+0x17d/0x2e0
      	 generic_file_read_iter+0x740/0xc10
      	 __vfs_read+0x145/0x1a0
      	 vfs_read+0x8c/0x140
      	 ksys_read+0x4a/0xb0
      	 do_syscall_64+0x43/0xf0
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fix this by using skb_unshare() instead in the input path for DATA packets
      that have a security index != 0.  Non-DATA packets don't need in-place
      encryption and neither do unencrypted DATA packets.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Reported-by: NJulian Wollrath <jwollrath@web.de>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      d0d5c0cd
    • D
      rxrpc: Use the tx-phase skb flag to simplify tracing · 987db9f7
      David Howells 提交于
      Use the previously-added transmit-phase skbuff private flag to simplify the
      socket buffer tracing a bit.  Which phase the skbuff comes from can now be
      divined from the skb rather than having to be guessed from the call state.
      
      We can also reduce the number of rxrpc_skb_trace values by eliminating the
      difference between Tx and Rx in the symbols.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      987db9f7
    • D
      rxrpc: Pass the input handler's data skb reference to the Rx ring · 4858e403
      David Howells 提交于
      Pass the reference held on a DATA skb in the rxrpc input handler into the
      Rx ring rather than getting an additional ref for this and then dropping
      the original ref at the end.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      4858e403
    • D
      rxrpc: Use info in skbuff instead of reparsing a jumbo packet · e2de6c40
      David Howells 提交于
      Use the information now cached in the skbuff private data to avoid the need
      to reparse a jumbo packet.  We can find all the subpackets by dead
      reckoning, so it's only necessary to note how many there are, whether the
      last one is flagged as LAST_PACKET and whether any have the REQUEST_ACK
      flag set.
      
      This is necessary as once recvmsg() can see the packet, it can start
      modifying it, such as doing in-place decryption.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      e2de6c40
    • D
      rxrpc: Improve jumbo packet counting · c3c9e3df
      David Howells 提交于
      Improve the information stored about jumbo packets so that we don't need to
      reparse them so much later.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NJeffrey Altman <jaltman@auristor.com>
      c3c9e3df
  9. 09 8月, 2019 2 次提交
    • D
      rxrpc: Don't bother generating maxSkew in the ACK packet · e8c3af6b
      David Howells 提交于
      Don't bother generating maxSkew in the ACK packet as it has been obsolete
      since AFS 3.1.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NJeffrey Altman <jaltman@auristor.com>
      e8c3af6b
    • D
      rxrpc: Fix local endpoint refcounting · 730c5fd4
      David Howells 提交于
      The object lifetime management on the rxrpc_local struct is broken in that
      the rxrpc_local_processor() function is expected to clean up and remove an
      object - but it may get requeued by packets coming in on the backing UDP
      socket once it starts running.
      
      This may result in the assertion in rxrpc_local_rcu() firing because the
      memory has been scheduled for RCU destruction whilst still queued:
      
      	rxrpc: Assertion failed
      	------------[ cut here ]------------
      	kernel BUG at net/rxrpc/local_object.c:468!
      
      Note that if the processor comes around before the RCU free function, it
      will just do nothing because ->dead is true.
      
      Fix this by adding a separate refcount to count active users of the
      endpoint that causes the endpoint to be destroyed when it reaches 0.
      
      The original refcount can then be used to refcount objects through the work
      processor and cause the memory to be rcu freed when that reaches 0.
      
      Fixes: 4f95dd78 ("rxrpc: Rework local endpoint management")
      Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      730c5fd4
  10. 31 5月, 2019 1 次提交
  11. 25 4月, 2019 1 次提交
    • E
      rxrpc: fix race condition in rxrpc_input_packet() · 032be5f1
      Eric Dumazet 提交于
      After commit 5271953c ("rxrpc: Use the UDP encap_rcv hook"),
      rxrpc_input_packet() is directly called from lockless UDP receive
      path, under rcu_read_lock() protection.
      
      It must therefore use RCU rules :
      
      - udp_sk->sk_user_data can be cleared at any point in this function.
        rcu_dereference_sk_user_data() is what we need here.
      
      - Also, since sk_user_data might have been set in rxrpc_open_socket()
        we must observe a proper RCU grace period before kfree(local) in
        rxrpc_lookup_local()
      
      v4: @local can be NULL in xrpc_lookup_local() as reported by kbuild test robot <lkp@intel.com>
              and Julia Lawall <julia.lawall@lip6.fr>, thanks !
      
      v3,v2 : addressed David Howells feedback, thanks !
      
      syzbot reported :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 19236 Comm: syz-executor703 Not tainted 5.1.0-rc6 #79
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__lock_acquire+0xbef/0x3fb0 kernel/locking/lockdep.c:3573
      Code: 00 0f 85 a5 1f 00 00 48 81 c4 10 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 4a 21 00 00 49 81 7d 00 20 54 9c 89 0f 84 cf f4
      RSP: 0018:ffff88809d7aef58 EFLAGS: 00010002
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000026 RSI: 0000000000000000 RDI: 0000000000000001
      RBP: ffff88809d7af090 R08: 0000000000000001 R09: 0000000000000001
      R10: ffffed1015d05bc7 R11: ffff888089428600 R12: 0000000000000000
      R13: 0000000000000130 R14: 0000000000000001 R15: 0000000000000001
      FS:  00007f059044d700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004b6040 CR3: 00000000955ca000 CR4: 00000000001406f0
      Call Trace:
       lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4211
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
       skb_queue_tail+0x26/0x150 net/core/skbuff.c:2972
       rxrpc_reject_packet net/rxrpc/input.c:1126 [inline]
       rxrpc_input_packet+0x4a0/0x5536 net/rxrpc/input.c:1414
       udp_queue_rcv_one_skb+0xaf2/0x1780 net/ipv4/udp.c:2011
       udp_queue_rcv_skb+0x128/0x730 net/ipv4/udp.c:2085
       udp_unicast_rcv_skb.isra.0+0xb9/0x360 net/ipv4/udp.c:2245
       __udp4_lib_rcv+0x701/0x2ca0 net/ipv4/udp.c:2301
       udp_rcv+0x22/0x30 net/ipv4/udp.c:2482
       ip_protocol_deliver_rcu+0x60/0x8f0 net/ipv4/ip_input.c:208
       ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
       NF_HOOK include/linux/netfilter.h:289 [inline]
       NF_HOOK include/linux/netfilter.h:283 [inline]
       ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:255
       dst_input include/net/dst.h:450 [inline]
       ip_rcv_finish+0x1e1/0x300 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:289 [inline]
       NF_HOOK include/linux/netfilter.h:283 [inline]
       ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0x115/0x1a0 net/core/dev.c:4987
       __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5099
       netif_receive_skb_internal+0x117/0x660 net/core/dev.c:5202
       napi_frags_finish net/core/dev.c:5769 [inline]
       napi_gro_frags+0xade/0xd10 net/core/dev.c:5843
       tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
       call_write_iter include/linux/fs.h:1866 [inline]
       do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
       do_iter_write fs/read_write.c:957 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:938
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
       do_writev+0x15e/0x370 fs/read_write.c:1037
       __do_sys_writev fs/read_write.c:1110 [inline]
       __se_sys_writev fs/read_write.c:1107 [inline]
       __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 5271953c ("rxrpc: Use the UDP encap_rcv hook")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      032be5f1
  12. 13 4月, 2019 1 次提交
    • J
      rxrpc: Fix detection of out of order acks · 1a2391c3
      Jeffrey Altman 提交于
      The rxrpc packet serial number cannot be safely used to compute out of
      order ack packets for several reasons:
      
       1. The allocation of serial numbers cannot be assumed to imply the order
          by which acks are populated and transmitted.  In some rxrpc
          implementations, delayed acks and ping acks are transmitted
          asynchronously to the receipt of data packets and so may be transmitted
          out of order.  As a result, they can race with idle acks.
      
       2. Serial numbers are allocated by the rxrpc connection and not the call
          and as such may wrap independently if multiple channels are in use.
      
      In any case, what matters is whether the ack packet provides new
      information relating to the bounds of the window (the firstPacket and
      previousPacket in the ACK data).
      
      Fix this by discarding packets that appear to wind back the window bounds
      rather than on serial number procession.
      
      Fixes: 298bc15b ("rxrpc: Only take the rwind and mtu values from latest ACK")
      Signed-off-by: NJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a2391c3
  13. 09 10月, 2018 2 次提交
    • D
      rxrpc: Fix the packet reception routine · c1e15b49
      David Howells 提交于
      The rxrpc_input_packet() function and its call tree was built around the
      assumption that data_ready() handler called from UDP to inform a kernel
      service that there is data to be had was non-reentrant.  This means that
      certain locking could be dispensed with.
      
      This, however, turns out not to be the case with a multi-queue network card
      that can deliver packets to multiple cpus simultaneously.  Each of those
      cpus can be in the rxrpc_input_packet() function at the same time.
      
      Fix by adding or changing some structure members:
      
       (1) Add peer->rtt_input_lock to serialise access to the RTT buffer.
      
       (2) Make conn->service_id into a 32-bit variable so that it can be
           cmpxchg'd on all arches.
      
       (3) Add call->input_lock to serialise access to the Rx/Tx state.  Note
           that although the Rx and Tx states are (almost) entirely separate,
           there's no point completing the separation and having separate locks
           since it's a bi-phasal RPC protocol rather than a bi-direction
           streaming protocol.  Data transmission and data reception do not take
           place simultaneously on any particular call.
      
      and making the following functional changes:
      
       (1) In rxrpc_input_data(), hold call->input_lock around the core to
           prevent simultaneous producing of packets into the Rx ring and
           updating of tracking state for a particular call.
      
       (2) In rxrpc_input_ping_response(), only read call->ping_serial once, and
           check it before checking RXRPC_CALL_PINGING as that's a cheaper test.
           The bit test and bit clear can then be combined.  No further locking
           is needed here.
      
       (3) In rxrpc_input_ack(), take call->input_lock after we've parsed much of
           the ACK packet.  The superseded ACK check is then done both before and
           after the lock is taken.
      
           The handing of ackinfo data is split, parsing before the lock is taken
           and processing with it held.  This is keyed on rxMTU being non-zero.
      
           Congestion management is also done within the locked section.
      
       (4) In rxrpc_input_ackall(), take call->input_lock around the Tx window
           rotation.  The ACKALL packet carries no information and is only really
           useful after all packets have been transmitted since it's imprecise.
      
       (5) In rxrpc_input_implicit_end_call(), we use rx->incoming_lock to
           prevent calls being simultaneously implicitly ended on two cpus and
           also to prevent any races with incoming call setup.
      
       (6) In rxrpc_input_packet(), use cmpxchg() to effect the service upgrade
           on a connection.  It is only permitted to happen once for a
           connection.
      
       (7) In rxrpc_new_incoming_call(), we have to recheck the routing inside
           rx->incoming_lock to see if someone else set up the call, connection
           or peer whilst we were getting there.  We can't trust the values from
           the earlier routing check unless we pin refs on them - which we want
           to avoid.
      
           Further, we need to allow for an incoming call to have its state
           changed on another CPU between us making it live and us adjusting it
           because the conn is now in the RXRPC_CONN_SERVICE state.
      
       (8) In rxrpc_peer_add_rtt(), take peer->rtt_input_lock around the access
           to the RTT buffer.  Don't need to lock around setting peer->rtt.
      
      For reference, the inventory of state-accessing or state-altering functions
      used by the packet input procedure is:
      
      > rxrpc_input_packet()
        * PACKET CHECKING
      
        * ROUTING
          > rxrpc_post_packet_to_local()
          > rxrpc_find_connection_rcu() - uses RCU
            > rxrpc_lookup_peer_rcu() - uses RCU
            > rxrpc_find_service_conn_rcu() - uses RCU
            > idr_find() - uses RCU
      
        * CONNECTION-LEVEL PROCESSING
          - Service upgrade
            - Can only happen once per conn
            ! Changed to use cmpxchg
          > rxrpc_post_packet_to_conn()
          - Setting conn->hi_serial
            - Probably safe not using locks
            - Maybe use cmpxchg
      
        * CALL-LEVEL PROCESSING
          > Old-call checking
            > rxrpc_input_implicit_end_call()
              > rxrpc_call_completed()
      	> rxrpc_queue_call()
      	! Need to take rx->incoming_lock
      	> __rxrpc_disconnect_call()
      	> rxrpc_notify_socket()
          > rxrpc_new_incoming_call()
            - Uses rx->incoming_lock for the entire process
              - Might be able to drop this earlier in favour of the call lock
            > rxrpc_incoming_call()
            	! Conflicts with rxrpc_input_implicit_end_call()
          > rxrpc_send_ping()
            - Don't need locks to check rtt state
            > rxrpc_propose_ACK
      
        * PACKET DISTRIBUTION
          > rxrpc_input_call_packet()
            > rxrpc_input_data()
      	* QUEUE DATA PACKET ON CALL
      	> rxrpc_reduce_call_timer()
      	  - Uses timer_reduce()
      	! Needs call->input_lock()
      	> rxrpc_receiving_reply()
      	  ! Needs locking around ack state
      	  > rxrpc_rotate_tx_window()
      	  > rxrpc_end_tx_phase()
      	> rxrpc_proto_abort()
      	> rxrpc_input_dup_data()
      	- Fills the Rx buffer
      	- rxrpc_propose_ACK()
      	- rxrpc_notify_socket()
      
            > rxrpc_input_ack()
      	* APPLY ACK PACKET TO CALL AND DISCARD PACKET
      	> rxrpc_input_ping_response()
      	  - Probably doesn't need any extra locking
      	  ! Need READ_ONCE() on call->ping_serial
      	  > rxrpc_input_check_for_lost_ack()
      	    - Takes call->lock to consult Tx buffer
      	  > rxrpc_peer_add_rtt()
      	    ! Needs to take a lock (peer->rtt_input_lock)
      	    ! Could perhaps manage with cmpxchg() and xadd() instead
      	> rxrpc_input_requested_ack
      	  - Consults Tx buffer
      	    ! Probably needs a lock
      	  > rxrpc_peer_add_rtt()
      	> rxrpc_propose_ack()
      	> rxrpc_input_ackinfo()
      	  - Changes call->tx_winsize
      	    ! Use cmpxchg to handle change
      	    ! Should perhaps track serial number
      	  - Uses peer->lock to record MTU specification changes
      	> rxrpc_proto_abort()
      	! Need to take call->input_lock
      	> rxrpc_rotate_tx_window()
      	> rxrpc_end_tx_phase()
      	> rxrpc_input_soft_acks()
      	- Consults the Tx buffer
      	> rxrpc_congestion_management()
      	  - Modifies the Tx annotations
      	  ! Needs call->input_lock()
      	  > rxrpc_queue_call()
      
            > rxrpc_input_abort()
      	* APPLY ABORT PACKET TO CALL AND DISCARD PACKET
      	> rxrpc_set_call_completion()
      	> rxrpc_notify_socket()
      
            > rxrpc_input_ackall()
      	* APPLY ACKALL PACKET TO CALL AND DISCARD PACKET
      	! Need to take call->input_lock
      	> rxrpc_rotate_tx_window()
      	> rxrpc_end_tx_phase()
      
          > rxrpc_reject_packet()
      
      There are some functions used by the above that queue the packet, after
      which the procedure is terminated:
      
       - rxrpc_post_packet_to_local()
         - local->event_queue is an sk_buff_head
         - local->processor is a work_struct
       - rxrpc_post_packet_to_conn()
         - conn->rx_queue is an sk_buff_head
         - conn->processor is a work_struct
       - rxrpc_reject_packet()
         - local->reject_queue is an sk_buff_head
         - local->processor is a work_struct
      
      And some that offload processing to process context:
      
       - rxrpc_notify_socket()
         - Uses RCU lock
         - Uses call->notify_lock to call call->notify_rx
         - Uses call->recvmsg_lock to queue recvmsg side
       - rxrpc_queue_call()
         - call->processor is a work_struct
       - rxrpc_propose_ACK()
         - Uses call->lock to wrap __rxrpc_propose_ACK()
      
      And a bunch that complete a call, all of which use call->state_lock to
      protect the call state:
      
       - rxrpc_call_completed()
       - rxrpc_set_call_completion()
       - rxrpc_abort_call()
       - rxrpc_proto_abort()
         - Also uses rxrpc_queue_call()
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      c1e15b49
    • D
      rxrpc: Only take the rwind and mtu values from latest ACK · 298bc15b
      David Howells 提交于
      Move the out-of-order and duplicate ACK packet check to before the call to
      rxrpc_input_ackinfo() so that the receive window size and MTU size are only
      checked in the latest ACK packet and don't regress.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      298bc15b
  14. 08 10月, 2018 4 次提交
    • D
      rxrpc: Carry call state out of locked section in rxrpc_rotate_tx_window() · dfe99522
      David Howells 提交于
      Carry the call state out of the locked section in rxrpc_rotate_tx_window()
      rather than sampling it afterwards.  This is only used to select tracepoint
      data, but could have changed by the time we do the tracepoint.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      dfe99522
    • D
      rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window() · c479d5f2
      David Howells 提交于
      We should only call the function to end a call's Tx phase if we rotated the
      marked-last packet out of the transmission buffer.
      
      Make rxrpc_rotate_tx_window() return an indication of whether it just
      rotated the packet marked as the last out of the transmit buffer, carrying
      the information out of the locked section in that function.
      
      We can then check the return value instead of examining RXRPC_CALL_TX_LAST.
      
      Fixes: 70790dbe ("rxrpc: Pass the last Tx packet marker in the annotation buffer")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      c479d5f2
    • D
      rxrpc: Don't need to take the RCU read lock in the packet receiver · bfd28211
      David Howells 提交于
      We don't need to take the RCU read lock in the rxrpc packet receive
      function because it's held further up the stack in the IP input routine
      around the UDP receive routines.
      
      Fix this by dropping the RCU read lock calls from rxrpc_input_packet().
      This simplifies the code.
      
      Fixes: 70790dbe ("rxrpc: Pass the last Tx packet marker in the annotation buffer")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      bfd28211
    • D
      rxrpc: Use the UDP encap_rcv hook · 5271953c
      David Howells 提交于
      Use the UDP encap_rcv hook to cut the bit out of the rxrpc packet reception
      in which a packet is placed onto the UDP receive queue and then immediately
      removed again by rxrpc.  Going via the queue in this manner seems like it
      should be unnecessary.
      
      This does, however, require the invention of a value to place in encap_type
      as that's one of the conditions to switch packets out to the encap_rcv
      hook.  Possibly the value doesn't actually matter for anything other than
      sockopts on the UDP socket, which aren't accessible outside of rxrpc
      anyway.
      
      This seems to cut a bit of time out of the time elapsed between each
      sk_buff being timestamped and turning up in rxrpc (the final number in the
      following trace excerpts).  I measured this by making the rxrpc_rx_packet
      trace point print the time elapsed between the skb being timestamped and
      the current time (in ns), e.g.:
      
      	... 424.278721: rxrpc_rx_packet: ...  ACK 25026
      
      So doing a 512MiB DIO read from my test server, with an unmodified kernel:
      
      	N       min     max     sum		mean    stddev
      	27605   2626    7581    7.83992e+07     2840.04 181.029
      
      and with the patch applied:
      
      	N       min     max     sum		mean    stddev
      	27547   1895    12165   6.77461e+07     2459.29 255.02
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5271953c
  15. 05 10月, 2018 2 次提交
    • D
      rxrpc: Fix the data_ready handler · 2cfa2271
      David Howells 提交于
      Fix the rxrpc_data_ready() function to pick up all packets and to not miss
      any.  There are two problems:
      
       (1) The sk_data_ready pointer on the UDP socket is set *after* it is
           bound.  This means that it's open for business before we're ready to
           dequeue packets and there's a tiny window exists in which a packet can
           sneak onto the receive queue, but we never know about it.
      
           Fix this by setting the pointers on the socket prior to binding it.
      
       (2) skb_recv_udp() will return an error (such as ENETUNREACH) if there was
           an error on the transmission side, even though we set the
           sk_error_report hook.  Because rxrpc_data_ready() returns immediately
           in such a case, it never actually removes its packet from the receive
           queue.
      
           Fix this by abstracting out the UDP dequeuing and checksumming into a
           separate function that keeps hammering on skb_recv_udp() until it
           returns -EAGAIN, passing the packets extracted to the remainder of the
           function.
      
      and two potential problems:
      
       (3) It might be possible in some circumstances or in the future for
           packets to be being added to the UDP receive queue whilst rxrpc is
           running consuming them, so the data_ready() handler might get called
           less often than once per packet.
      
           Allow for this by fully draining the queue on each call as (2).
      
       (4) If a packet fails the checksum check, the code currently returns after
           discarding the packet without checking for more.
      
           Allow for this by fully draining the queue on each call as (2).
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      2cfa2271
    • D
      rxrpc: Fix some missed refs to init_net · 5e33a23b
      David Howells 提交于
      Fix some refs to init_net that should've been changed to the appropriate
      network namespace.
      
      Fixes: 2baec2c3 ("rxrpc: Support network namespacing")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      5e33a23b
  16. 04 10月, 2018 1 次提交
  17. 28 9月, 2018 5 次提交
    • D
      rxrpc: Make service call handling more robust · 0099dc58
      David Howells 提交于
      Make the following changes to improve the robustness of the code that sets
      up a new service call:
      
       (1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a
           service ID check and pass that along to rxrpc_new_incoming_call().
           This means that I can remove the check from rxrpc_new_incoming_call()
           without the need to worry about the socket attached to the local
           endpoint getting replaced - which would invalidate the check.
      
       (2) Cache the rxrpc_peer struct, thereby allowing the peer search to be
           done once.  The peer is passed to rxrpc_new_incoming_call(), thereby
           saving the need to repeat the search.
      
           This also reduces the possibility of rxrpc_publish_service_conn()
           BUG()'ing due to the detection of a duplicate connection, despite the
           initial search done by rxrpc_find_connection_rcu() having turned up
           nothing.
      
           This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be
           non-reentrant and the result of the initial search should still hold
           true, but it has proven possible to hit.
      
           I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short
           the iteration over the hash table if it finds a matching peer with a
           zero usage count, but I don't know for sure since it's only ever been
           hit once that I know of.
      
           Another possibility is that a bug in rxrpc_data_ready() that checked
           the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag
           might've let through a packet that caused a spurious and invalid call
           to be set up.  That is addressed in another patch.
      
       (3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero
           usage count rather than stopping and returning not found, just in case
           there's another peer record behind it in the bucket.
      
       (4) Don't search the peer records in rxrpc_alloc_incoming_call(), but
           rather either use the peer cached in (2) or, if one wasn't found,
           preemptively install a new one.
      
      Fixes: 8496af50 ("rxrpc: Use RCU to access a peer's service connection tree")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      0099dc58
    • D
      rxrpc: Improve up-front incoming packet checking · 403fc2a1
      David Howells 提交于
      Do more up-front checking on incoming packets to weed out invalid ones and
      also ones aimed at services that we don't support.
      
      Whilst we're at it, replace the clearing of call and skew if we don't find
      a connection with just initialising the variables to zero at the top of the
      function.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      403fc2a1
    • D
      rxrpc: Emit BUSY packets when supposed to rather than ABORTs · ece64fec
      David Howells 提交于
      In the input path, a received sk_buff can be marked for rejection by
      setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data
      (such as an abort code) in skb->priority.  The rejection is handled by
      queueing the sk_buff up for dealing with in process context.  The output
      code reads the mark and priority and, theoretically, generates an
      appropriate response packet.
      
      However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT
      message with a random abort code is generated (since skb->priority wasn't
      set to anything).
      
      Fix this by outputting the appropriate sort of packet.
      
      Also, whilst we're at it, most of the marks are no longer used, so remove
      them and rename the remaining two to something more obvious.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      ece64fec
    • D
      rxrpc: Fix RTT gathering · b604dd98
      David Howells 提交于
      Fix RTT information gathering in AF_RXRPC by the following means:
      
       (1) Enable Rx timestamping on the transport socket with SO_TIMESTAMPNS.
      
       (2) If the sk_buff doesn't have a timestamp set when rxrpc_data_ready()
           collects it, set it at that point.
      
       (3) Allow ACKs to be requested on the last packet of a client call, but
           not a service call.  We need to be careful lest we undo:
      
      	bf7d620a
      	Author: David Howells <dhowells@redhat.com>
      	Date:   Thu Oct 6 08:11:51 2016 +0100
      	rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase
      
           but that only really applies to service calls that we're handling,
           since the client side gets to send the final ACK (or not).
      
       (4) When about to transmit an ACK or DATA packet, record the Tx timestamp
           before only; don't update the timestamp afterwards.
      
       (5) Switch the ordering between recording the serial and recording the
           timestamp to always set the serial number first.  The serial number
           shouldn't be seen referenced by an ACK packet until we've transmitted
           the packet bearing it - so in the Rx path, we don't need the timestamp
           until we've checked the serial number.
      
      Fixes: cf1a6474 ("rxrpc: Add per-peer RTT tracker")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      b604dd98
    • D
      rxrpc: Fix checks as to whether we should set up a new call · dc71db34
      David Howells 提交于
      There's a check in rxrpc_data_ready() that's checking the CLIENT_INITIATED
      flag in the packet type field rather than in the packet flags field.
      
      Fix this by creating a pair of helper functions to check whether the packet
      is going to the client or to the server and use them generally.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      dc71db34
  18. 11 9月, 2018 1 次提交
  19. 01 8月, 2018 2 次提交
  20. 05 6月, 2018 1 次提交
    • D
      rxrpc: Fix handling of call quietly cancelled out on server · 1a025028
      David Howells 提交于
      Sometimes an in-progress call will stop responding on the fileserver when
      the fileserver quietly cancels the call with an internally marked abort
      (RX_CALL_DEAD), without sending an ABORT to the client.
      
      This causes the client's call to eventually expire from lack of incoming
      packets directed its way, which currently leads to it being cancelled
      locally with ETIME.  Note that it's not currently clear as to why this
      happens as it's really hard to reproduce.
      
      The rotation policy implement by kAFS, however, doesn't differentiate
      between ETIME meaning we didn't get any response from the server and ETIME
      meaning the call got cancelled mid-flow.  The latter leads to an oops when
      fetching data as the rotation partially resets the afs_read descriptor,
      which can result in a cleared page pointer being dereferenced because that
      page has already been filled.
      
      Handle this by the following means:
      
       (1) Set a flag on a call when we receive a packet for it.
      
       (2) Store the highest packet serial number so far received for a call
           (bearing in mind this may wrap).
      
       (3) If, when the "not received anything recently" timeout expires on a
           call, we've received at least one packet for a call and the connection
           as a whole has received packets more recently than that call, then
           cancel the call locally with ECONNRESET rather than ETIME.
      
           This indicates that the call was definitely in progress on the server.
      
       (4) In kAFS, if the rotation algorithm sees ECONNRESET rather than ETIME,
           don't try the next server, but rather abort the call.
      
           This avoids the oops as we don't try to reuse the afs_read struct.
           Rather, as-yet ungotten pages will be reread at a later data.
      
      Also:
      
       (5) Add an rxrpc tracepoint to log detection of the call being reset.
      
      Without this, I occasionally see an oops like the following:
      
          general protection fault: 0000 [#1] SMP PTI
          ...
          RIP: 0010:_copy_to_iter+0x204/0x310
          RSP: 0018:ffff8800cae0f828 EFLAGS: 00010206
          RAX: 0000000000000560 RBX: 0000000000000560 RCX: 0000000000000560
          RDX: ffff8800cae0f968 RSI: ffff8800d58b3312 RDI: 0005080000000000
          RBP: ffff8800cae0f968 R08: 0000000000000560 R09: ffff8800ca00f400
          R10: ffff8800c36f28d4 R11: 00000000000008c4 R12: ffff8800cae0f958
          R13: 0000000000000560 R14: ffff8800d58b3312 R15: 0000000000000560
          FS:  00007fdaef108080(0000) GS:ffff8800ca680000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fb28a8fa000 CR3: 00000000d2a76002 CR4: 00000000001606e0
          Call Trace:
           skb_copy_datagram_iter+0x14e/0x289
           rxrpc_recvmsg_data.isra.0+0x6f3/0xf68
           ? trace_buffer_unlock_commit_regs+0x4f/0x89
           rxrpc_kernel_recv_data+0x149/0x421
           afs_extract_data+0x1e0/0x798
           ? afs_wait_for_call_to_complete+0xc9/0x52e
           afs_deliver_fs_fetch_data+0x33a/0x5ab
           afs_deliver_to_call+0x1ee/0x5e0
           ? afs_wait_for_call_to_complete+0xc9/0x52e
           afs_wait_for_call_to_complete+0x12b/0x52e
           ? wake_up_q+0x54/0x54
           afs_make_call+0x287/0x462
           ? afs_fs_fetch_data+0x3e6/0x3ed
           ? rcu_read_lock_sched_held+0x5d/0x63
           afs_fs_fetch_data+0x3e6/0x3ed
           afs_fetch_data+0xbb/0x14a
           afs_readpages+0x317/0x40d
           __do_page_cache_readahead+0x203/0x2ba
           ? ondemand_readahead+0x3a7/0x3c1
           ondemand_readahead+0x3a7/0x3c1
           generic_file_buffered_read+0x18b/0x62f
           __vfs_read+0xdb/0xfe
           vfs_read+0xb2/0x137
           ksys_read+0x50/0x8c
           do_syscall_64+0x7d/0x1a0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Note the weird value in RDI which is a result of trying to kmap() a NULL
      page pointer.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a025028
  21. 11 5月, 2018 1 次提交
    • D
      rxrpc: Fix missing start of call timeout · c54e43d7
      David Howells 提交于
      The expect_rx_by call timeout is supposed to be set when a call is started
      to indicate that we need to receive a packet by that point.  This is
      currently put back every time we receive a packet, but it isn't started
      when we first send a packet.  Without this, the call may wait forever if
      the server doesn't deign to reply.
      
      Fix this by setting the timeout upon a successful UDP sendmsg call for the
      first DATA packet.  The timeout is initiated only for initial transmission
      and not for subsequent retries as we don't want the retry mechanism to
      extend the timeout indefinitely.
      
      Fixes: a158bdd3 ("rxrpc: Fix call timeouts")
      Reported-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      c54e43d7
  22. 04 4月, 2018 1 次提交
  23. 31 3月, 2018 2 次提交
    • D
      rxrpc: Don't treat call aborts as conn aborts · 57b0c9d4
      David Howells 提交于
      If a call-level abort is received for the previous call to complete on a
      connection channel, then that abort is queued for the connection processor
      to handle.  Unfortunately, the connection processor then assumes without
      checking that the abort is connection-level (ie. callNumber is 0) and
      distributes it over all active calls on that connection, thereby
      incorrectly aborting them.
      
      Fix this by discarding aborts aimed at a completed call.
      
      Further, discard all packets aimed at a call that's complete if there's
      currently an active call on a channel, since the DATA packets associated
      with the new call automatically terminate the old call.
      
      Fixes: 18bfeba5 ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
      Reported-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      57b0c9d4
    • D
      rxrpc: Fix firewall route keepalive · ace45bec
      David Howells 提交于
      Fix the firewall route keepalive part of AF_RXRPC which is currently
      function incorrectly by replying to VERSION REPLY packets from the server
      with VERSION REQUEST packets.
      
      Instead, send VERSION REPLY packets to the peers of service connections to
      act as keep-alives 20s after the latest packet was transmitted to that
      peer.
      
      Also, just discard VERSION REPLY packets rather than replying to them.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      ace45bec
  24. 28 3月, 2018 1 次提交
    • D
      rxrpc, afs: Use debug_ids rather than pointers in traces · a25e21f0
      David Howells 提交于
      In rxrpc and afs, use the debug_ids that are monotonically allocated to
      various objects as they're allocated rather than pointers as kernel
      pointers are now hashed making them less useful.  Further, the debug ids
      aren't reused anywhere nearly as quickly.
      
      In addition, allow kernel services that use rxrpc, such as afs, to take
      numbers from the rxrpc counter, assign them to their own call struct and
      pass them in to rxrpc for both client and service calls so that the trace
      lines for each will have the same ID tag.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a25e21f0