1. 25 Jul 2023, 1 commit
  2. 09 Feb 2021, 1 commit
    • rxrpc: Fix memory leak in rxrpc_lookup_local · 61fdc34e
      Authored by Takeshi Misawa
      stable inclusion
      from stable-5.10.13
      commit 2e83a57a23a6c08308baebec660d1fdd82b3a4ea
      bugzilla: 47995
      
      --------------------------------
      
      commit b8323f72 upstream.
      
      Commit 9ebeddef ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record")
      made rxrpc_alloc_peer() take a ref on the local endpoint, which is then
      released in __rxrpc_put_peer() and rxrpc_put_peer_locked():
      
      	struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *local, gfp_t gfp)
      	-               peer->local = local;
      	+               peer->local = rxrpc_get_local(local);
      
      rxrpc_discard_prealloc() also needs to release this ref when discarding
      preallocated peers.
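      
      The shape of the fix, sketched (the real loop lives in
      rxrpc_discard_prealloc() in net/rxrpc/call_accept.c; the surrounding
      ring-buffer details are abbreviated from the commit text, not verbatim):
      
      	/* Drop the ref on the local endpoint taken by rxrpc_alloc_peer()
      	 * before freeing each preallocated peer that was never used.
      	 */
      	while (CIRC_CNT(head, tail, size) > 0) {
      		struct rxrpc_peer *peer = b->peer_backlog[tail];
      
      		rxrpc_put_local(peer->local);
      		kfree(peer);
      		tail = (tail + 1) & (size - 1);
      	}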
      
      syzbot report:
      BUG: memory leak
      unreferenced object 0xffff8881080ddc00 (size 256):
        comm "syz-executor339", pid 8462, jiffies 4294942238 (age 12.350s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 0a 00 00 00 00 c0 00 08 81 88 ff ff  ................
        backtrace:
          [<000000002b6e495f>] kmalloc include/linux/slab.h:552 [inline]
          [<000000002b6e495f>] kzalloc include/linux/slab.h:682 [inline]
          [<000000002b6e495f>] rxrpc_alloc_local net/rxrpc/local_object.c:79 [inline]
          [<000000002b6e495f>] rxrpc_lookup_local+0x1c1/0x760 net/rxrpc/local_object.c:244
          [<000000006b43a77b>] rxrpc_bind+0x174/0x240 net/rxrpc/af_rxrpc.c:149
          [<00000000fd447a55>] afs_open_socket+0xdb/0x200 fs/afs/rxrpc.c:64
          [<000000007fd8867c>] afs_net_init+0x2b4/0x340 fs/afs/main.c:126
          [<0000000063d80ec1>] ops_init+0x4e/0x190 net/core/net_namespace.c:152
          [<00000000073c5efa>] setup_net+0xde/0x2d0 net/core/net_namespace.c:342
          [<00000000a6744d5b>] copy_net_ns+0x19f/0x3e0 net/core/net_namespace.c:483
          [<0000000017d3aec3>] create_new_namespaces+0x199/0x4f0 kernel/nsproxy.c:110
          [<00000000186271ef>] unshare_nsproxy_namespaces+0x9b/0x120 kernel/nsproxy.c:226
          [<000000002de7bac4>] ksys_unshare+0x2fe/0x5c0 kernel/fork.c:2957
          [<00000000349b12ba>] __do_sys_unshare kernel/fork.c:3025 [inline]
          [<00000000349b12ba>] __se_sys_unshare kernel/fork.c:3023 [inline]
          [<00000000349b12ba>] __x64_sys_unshare+0x12/0x20 kernel/fork.c:3023
          [<000000006d178ef7>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
          [<00000000637076d4>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 9ebeddef ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record")
      Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
      Reported-and-tested-by: syzbot+305326672fed51b205f7@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/161183091692.3506637.3206605651502458810.stgit@warthog.procyon.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
  3. 05 Oct 2020, 1 commit
    • rxrpc: Fix accept on a connection that need securing · 2d914c1b
      Authored by David Howells
      When a new incoming call arrives at a userspace rxrpc socket on a new
      connection that has a security class set, the code currently pushes it onto
      the accept queue to hold a ref on it for the socket.  This doesn't work,
      however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
      state and discards the ref.  This means that the call runs out of refs too
      early and the kernel oopses.
      
      By contrast, a kernel rxrpc socket manually pre-charges the incoming call
      pool with calls that already have user call IDs assigned, so they are ref'd
      by the call tree on the socket.
      
      Change the mode of operation for userspace rxrpc server sockets to work
      like this too.  Although this is a UAPI change, server sockets aren't
      currently functional.
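      
      Under the new mode, a userspace server pre-charges the incoming call pool
      itself, attaching a user call ID up front.  A hedged sketch, assuming the
      RXRPC_CHARGE_ACCEPT sendmsg control introduced for this purpose
      (server_fd and next_id are illustrative):
      
      	unsigned long user_call_id = next_id++;
      	char buf[CMSG_SPACE(sizeof(user_call_id))];
      	struct msghdr msg = {
      		.msg_control	= buf,
      		.msg_controllen	= sizeof(buf),
      	};
      	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
      
      	/* Preallocate one incoming-call slot, owned by user_call_id */
      	cmsg->cmsg_level = SOL_RXRPC;
      	cmsg->cmsg_type	 = RXRPC_CHARGE_ACCEPT;
      	cmsg->cmsg_len	 = CMSG_LEN(sizeof(user_call_id));
      	memcpy(CMSG_DATA(cmsg), &user_call_id, sizeof(user_call_id));
      	sendmsg(server_fd, &msg, 0);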
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
  4. 24 Aug 2020, 1 commit
  5. 21 Jun 2020, 1 commit
    • rxrpc: Fix notification call on completion of discarded calls · 0041cd5a
      Authored by David Howells
      When preallocated service calls are being discarded, they're passed to
      ->discard_new_call() to have the caller clean up any attached higher-layer
      preallocated pieces before being marked completed.  However, the act of
      marking them completed now invokes the call's notification function - which
      causes a problem because that function might assume that the previously
      freed pieces of memory are still there.
      
      Fix this by setting a dummy notification function on the socket after
      calling ->discard_new_call().
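      
      A sketch of that approach (names follow the commit text; the real change
      is in rxrpc_discard_prealloc()):
      
      	/* Notification stub installed on discarded calls so that completing
      	 * them cannot call back into higher-layer state that was just freed
      	 * by ->discard_new_call().
      	 */
      	static void rxrpc_dummy_notify(struct sock *sk, struct rxrpc_call *call,
      				       unsigned long user_call_ID)
      	{
      	}
      
      	if (rx->discard_new_call) {
      		rx->discard_new_call(call, call->user_call_ID);
      		call->notify_rx = rxrpc_dummy_notify;
      	}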
      
      Without the fix, the bug produces the following KASAN report when the
      kafs module is removed.
      
      ==================================================================
      BUG: KASAN: use-after-free in afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
      Write of size 1 at addr ffff8880946c39e4 by task kworker/u4:1/21
      
      CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.8.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x18f/0x20d lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd3/0x413 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
       rxrpc_notify_socket+0x1db/0x5d0 net/rxrpc/recvmsg.c:40
       __rxrpc_set_call_completion.part.0+0x172/0x410 net/rxrpc/recvmsg.c:76
       __rxrpc_call_completed net/rxrpc/recvmsg.c:112 [inline]
       rxrpc_call_completed+0xca/0xf0 net/rxrpc/recvmsg.c:111
       rxrpc_discard_prealloc+0x781/0xab0 net/rxrpc/call_accept.c:233
       rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
       afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
       afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
       ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
       cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
       process_one_work+0x965/0x1690 kernel/workqueue.c:2269
       worker_thread+0x96/0xe10 kernel/workqueue.c:2415
       kthread+0x3b5/0x4a0 kernel/kthread.c:291
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
      
      Allocated by task 6820:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc mm/kasan/common.c:494 [inline]
       __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:467
       kmem_cache_alloc_trace+0x153/0x7d0 mm/slab.c:3551
       kmalloc include/linux/slab.h:555 [inline]
       kzalloc include/linux/slab.h:669 [inline]
       afs_alloc_call+0x55/0x630 fs/afs/rxrpc.c:141
       afs_charge_preallocation+0xe9/0x2d0 fs/afs/rxrpc.c:757
       afs_open_socket+0x292/0x360 fs/afs/rxrpc.c:92
       afs_net_init+0xa6c/0xe30 fs/afs/main.c:125
       ops_init+0xaf/0x420 net/core/net_namespace.c:151
       setup_net+0x2de/0x860 net/core/net_namespace.c:341
       copy_net_ns+0x293/0x590 net/core/net_namespace.c:482
       create_new_namespaces+0x3fb/0xb30 kernel/nsproxy.c:110
       unshare_nsproxy_namespaces+0xbd/0x1f0 kernel/nsproxy.c:231
       ksys_unshare+0x43d/0x8e0 kernel/fork.c:2983
       __do_sys_unshare kernel/fork.c:3051 [inline]
       __se_sys_unshare kernel/fork.c:3049 [inline]
       __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3049
       do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 21:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       kasan_set_free_info mm/kasan/common.c:316 [inline]
       __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:455
       __cache_free mm/slab.c:3426 [inline]
       kfree+0x109/0x2b0 mm/slab.c:3757
       afs_put_call+0x585/0xa40 fs/afs/rxrpc.c:190
       rxrpc_discard_prealloc+0x764/0xab0 net/rxrpc/call_accept.c:230
       rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
       afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
       afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
       ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
       cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
       process_one_work+0x965/0x1690 kernel/workqueue.c:2269
       worker_thread+0x96/0xe10 kernel/workqueue.c:2415
       kthread+0x3b5/0x4a0 kernel/kthread.c:291
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
      
      The buggy address belongs to the object at ffff8880946c3800
       which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 484 bytes inside of
       1024-byte region [ffff8880946c3800, ffff8880946c3c00)
      The buggy address belongs to the page:
      page:ffffea000251b0c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
      flags: 0xfffe0000000200(slab)
      raw: 00fffe0000000200 ffffea0002546508 ffffea00024fa248 ffff8880aa000c40
      raw: 0000000000000000 ffff8880946c3000 0000000100000002 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880946c3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880946c3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8880946c3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
       ffff8880946c3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880946c3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Reported-by: syzbot+d3eccef36ddbd02713e9@syzkaller.appspotmail.com
      Fixes: 5ac0d622 ("rxrpc: Fix missing notification")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 11 May 2020, 1 commit
    • rxrpc: Fix the excessive initial retransmission timeout · c410bf01
      Authored by David Howells
      rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
      sufficiently sampled.  This can cause problems with some fileservers with
      calls to the cache manager in the afs filesystem being dropped from the
      fileserver because a packet goes missing and the retransmission timeout is
      greater than the call expiry timeout.
      
      Fix this by:
      
       (1) Copying the RTT/RTO calculation code from Linux's TCP implementation
           and altering it to fit rxrpc.
      
       (2) Altering the various users of the RTT to make use of the new SRTT
           value.
      
       (3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO
           value instead (which is needed in jiffies), along with a backoff.
      
      Notes:
      
       (1) rxrpc provides RTT samples by matching the serial numbers on outgoing
           DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
           against the reference serial number in incoming REQUESTED ACK and
           PING-RESPONSE ACK packets.
      
       (2) Each packet that is transmitted on an rxrpc connection gets a new
           per-connection serial number, even for retransmissions, so an ACK can
           be cross-referenced to a specific trigger packet.  This allows RTT
           information to be drawn from retransmitted DATA packets also.
      
       (3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
           on an rxrpc_call because many RPC calls won't live long enough to
           generate more than one sample.
      
       (4) The calculated SRTT value is in units of 8ths of a microsecond rather
           than nanoseconds.
      
      The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.
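      
      For illustration, the TCP-derived calculation is essentially RFC 6298 in
      the same fixed-point style as Linux TCP; a simplified sketch, with SRTT
      kept in 8ths of a microsecond as note (4) describes (the struct and field
      names are illustrative, not the actual rxrpc ones):
      
      	struct peer_rtt {
      		u32 srtt_us;	/* 8 * smoothed RTT */
      		u32 mdev_us;	/* 4 * RTT variance */
      		u32 rto_us;	/* retransmission timeout */
      	};
      
      	static void peer_rtt_sample(struct peer_rtt *p, u32 rtt_us)
      	{
      		if (!p->srtt_us) {
      			p->srtt_us = rtt_us << 3;	/* SRTT <- R */
      			p->mdev_us = rtt_us << 1;	/* RTTVAR <- R/2 */
      		} else {
      			s32 err = rtt_us - (p->srtt_us >> 3);
      
      			p->srtt_us += err;	/* SRTT <- 7/8 SRTT + 1/8 R */
      			/* RTTVAR <- 3/4 RTTVAR + 1/4 |R - SRTT| */
      			p->mdev_us += abs(err) - (p->mdev_us >> 2);
      		}
      		/* RTO = SRTT + 4 * RTTVAR, then clamped and backed off
      		 * exponentially on each retransmission.
      		 */
      		p->rto_us = (p->srtt_us >> 3) + p->mdev_us;
      	}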
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: David Howells <dhowells@redhat.com>
  7. 21 Dec 2019, 3 commits
    • rxrpc: Fix missing security check on incoming calls · 063c60d3
      Authored by David Howells
      Fix rxrpc_new_incoming_call() to check that we have a suitable service key
      available for the combination of service ID and security class of a new
      incoming call - and to reject calls for which we don't.
      
      The missing check causes an assertion like the following to appear:
      
      	rxrpc: Assertion failed - 6(0x6) == 12(0xc) is false
      	kernel BUG at net/rxrpc/call_object.c:456!
      
      Where call->state is RXRPC_CALL_SERVER_SECURING (6) rather than
      RXRPC_CALL_COMPLETE (12).
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Don't take call->user_mutex in rxrpc_new_incoming_call() · 13b7955a
      Authored by David Howells
      Standard kernel mutexes cannot be used in any way from interrupt or softirq
      context, so the user_mutex which manages access to a call cannot be a mutex
      since on a new call the mutex must start off locked and be unlocked within
      the softirq handler to prevent userspace interfering with a call we're
      setting up.
      
      Commit a0855d24 ("locking/mutex: Complain
      upon mutex API misuse in IRQ contexts") causes big warnings to be splashed
      in dmesg for each new call that comes in from the server.  Whilst it
      *seems* like it should be okay, since the accept path uses trylock, there
      are issues with PI boosting and marking the wrong task as the owner.
      
      Fix this by not taking the mutex in the softirq path at all.  It's not
      obvious that there should be any need for it as the state is set before the
      first notification is generated for the new call.
      
      There's also no particular reason why the link-assessing ping should be
      triggered inside the mutex.  It's not actually transmitted there anyway,
      but rather it has to be deferred to a workqueue.
      
      Further, I don't think that there's any particular reason that the socket
      notification needs to be done from within rx->incoming_lock, so the amount
      of time that lock is held can be shortened too and the ping prepared before
      the new call notification is sent.
      
      Fixes: 540b1c48 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      cc: Ingo Molnar <mingo@redhat.com>
      cc: Will Deacon <will@kernel.org>
      cc: Davidlohr Bueso <dave@stgolabs.net>
    • rxrpc: Unlock new call in rxrpc_new_incoming_call() rather than the caller · f33121cb
      Authored by David Howells
      Move the unlock and the ping transmission for a new incoming call into
      rxrpc_new_incoming_call() rather than doing it in the caller.  This makes
      it clearer to see what's going on.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      cc: Ingo Molnar <mingo@redhat.com>
      cc: Will Deacon <will@kernel.org>
      cc: Davidlohr Bueso <dave@stgolabs.net>
  8. 07 Oct 2019, 3 commits
    • rxrpc: Fix call crypto state cleanup · 91fcfbe8
      Authored by David Howells
      Fix the cleanup of the crypto state on a call after the call has been
      disconnected.  As the call has been disconnected, its connection ref has
      been discarded and so we can't go through that to get to the security ops
      table.
      
      Fix this by caching the security ops pointer in the rxrpc_call struct and
      using that when freeing the call security state.  Also use this in other
      places we're dealing with call-specific security.
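      
      The shape of the fix, sketched from the description above (the struct
      layout is abbreviated; free_call_crypto() is the op added by the commit
      being fixed):
      
      	struct rxrpc_call {
      		/* ... */
      		const struct rxrpc_security *security;	/* cached ops table */
      	};
      
      	/* Cached at call setup, while the connection ref is still held */
      	call->security = conn->security;
      
      	/* Cleanup no longer reaches through the discarded connection ref */
      	if (call->security)
      		call->security->free_call_crypto(call);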
      
      The symptoms look like:
      
          BUG: KASAN: use-after-free in rxrpc_release_call+0xb2d/0xb60
          net/rxrpc/call_object.c:481
          Read of size 8 at addr ffff888062ffeb50 by task syz-executor.5/4764
      
      Fixes: 1db88c53 ("rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto")
      Reported-by: syzbot+eed305768ece6682bb7f@syzkaller.appspotmail.com
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix trace-after-put looking at the put call record · 48c9e0ec
      Authored by David Howells
      rxrpc_put_call() calls trace_rxrpc_call() after it has done the decrement
      of the refcount - which looks at the debug_id in the call record.  But
      unless the refcount was reduced to zero, we no longer have the right to
      look in the record and, indeed, it may be deleted by some other thread.
      
      Fix this by getting the debug_id out before decrementing the refcount and
      then passing that into the tracepoint.
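      
      The pattern, sketched with arguments abbreviated (the connection variant
      in the next commit is identical in shape):
      
      	void rxrpc_put_call(struct rxrpc_call *call, enum rxrpc_call_trace op)
      	{
      		unsigned int debug_id = call->debug_id;	/* copy before the put */
      		int n;
      
      		n = atomic_dec_return(&call->usage);
      		trace_rxrpc_call(debug_id, op, n, ...);	/* no deref of call */
      		if (n == 0)
      			rxrpc_cleanup_call(call);
      	}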
      
      Fixes: e34d4234 ("rxrpc: Trace rxrpc_call usage")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix trace-after-put looking at the put connection record · 4c1295dc
      Authored by David Howells
      rxrpc_put_*conn() calls trace_rxrpc_conn() after they have done the
      decrement of the refcount - which looks at the debug_id in the connection
      record.  But unless the refcount was reduced to zero, we no longer have the
      right to look in the record and, indeed, it may be deleted by some other
      thread.
      
      Fix this by getting the debug_id out before decrementing the refcount and
      then passing that into the tracepoint.
      
      Fixes: 363deeab ("rxrpc: Add connection tracepoint and client conn state tracepoint")
      Signed-off-by: David Howells <dhowells@redhat.com>
  9. 31 May 2019, 1 commit
  10. 16 Oct 2018, 1 commit
  11. 09 Oct 2018, 2 commits
    • rxrpc: Fix the packet reception routine · c1e15b49
      Authored by David Howells
      The rxrpc_input_packet() function and its call tree were built around the
      assumption that the data_ready() handler, called from UDP to inform a
      kernel service that there is data to be had, was non-reentrant.  This
      meant that certain locking could be dispensed with.
      
      This, however, turns out not to be the case with a multi-queue network card
      that can deliver packets to multiple cpus simultaneously.  Each of those
      cpus can be in the rxrpc_input_packet() function at the same time.
      
      Fix by adding or changing some structure members:
      
       (1) Add peer->rtt_input_lock to serialise access to the RTT buffer.
      
       (2) Make conn->service_id into a 32-bit variable so that it can be
           cmpxchg'd on all arches.
      
       (3) Add call->input_lock to serialise access to the Rx/Tx state.  Note
           that although the Rx and Tx states are (almost) entirely separate,
           there's no point completing the separation and having separate locks
           since it's a bi-phasal RPC protocol rather than a bidirectional
           streaming protocol.  Data transmission and data reception do not take
           place simultaneously on any particular call.
      
      and making the following functional changes:
      
       (1) In rxrpc_input_data(), hold call->input_lock around the core to
           prevent simultaneous producing of packets into the Rx ring and
           updating of tracking state for a particular call.
      
       (2) In rxrpc_input_ping_response(), only read call->ping_serial once, and
           check it before checking RXRPC_CALL_PINGING as that's a cheaper test.
           The bit test and bit clear can then be combined.  No further locking
           is needed here.
      
       (3) In rxrpc_input_ack(), take call->input_lock after we've parsed much of
           the ACK packet.  The superseded ACK check is then done both before and
           after the lock is taken.
      
           The handling of ackinfo data is split, parsing before the lock is taken
           and processing with it held.  This is keyed on rxMTU being non-zero.
      
           Congestion management is also done within the locked section.
      
       (4) In rxrpc_input_ackall(), take call->input_lock around the Tx window
           rotation.  The ACKALL packet carries no information and is only really
           useful after all packets have been transmitted since it's imprecise.
      
       (5) In rxrpc_input_implicit_end_call(), we use rx->incoming_lock to
           prevent calls being simultaneously implicitly ended on two cpus and
           also to prevent any races with incoming call setup.
      
       (6) In rxrpc_input_packet(), use cmpxchg() to effect the service upgrade
           on a connection (see the sketch after this list).  It is only
           permitted to happen once for a connection.
      
       (7) In rxrpc_new_incoming_call(), we have to recheck the routing inside
           rx->incoming_lock to see if someone else set up the call, connection
           or peer whilst we were getting there.  We can't trust the values from
           the earlier routing check unless we pin refs on them - which we want
           to avoid.
      
           Further, we need to allow for an incoming call to have its state
           changed on another CPU between us making it live and us adjusting it
           because the conn is now in the RXRPC_CONN_SERVICE state.
      
       (8) In rxrpc_peer_add_rtt(), take peer->rtt_input_lock around the access
           to the RTT buffer.  Don't need to lock around setting peer->rtt.
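      
      The lock-free upgrade in change (6) relies on conn->service_id now being
      32 bits wide per structure change (2); a sketch (the variable names are
      illustrative):
      
      	/* Effect the service upgrade exactly once per connection: only the
      	 * CPU whose cmpxchg() observes the pre-upgrade ID applies it.
      	 */
      	if (cmpxchg(&conn->service_id, old_service_id, upgrade_to) !=
      	    old_service_id)
      		return;	/* another CPU already performed the upgrade */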
      
      For reference, the inventory of state-accessing or state-altering functions
      used by the packet input procedure is:
      
      > rxrpc_input_packet()
        * PACKET CHECKING
      
        * ROUTING
          > rxrpc_post_packet_to_local()
          > rxrpc_find_connection_rcu() - uses RCU
            > rxrpc_lookup_peer_rcu() - uses RCU
            > rxrpc_find_service_conn_rcu() - uses RCU
            > idr_find() - uses RCU
      
        * CONNECTION-LEVEL PROCESSING
          - Service upgrade
            - Can only happen once per conn
            ! Changed to use cmpxchg
          > rxrpc_post_packet_to_conn()
          - Setting conn->hi_serial
            - Probably safe not using locks
            - Maybe use cmpxchg
      
        * CALL-LEVEL PROCESSING
          > Old-call checking
            > rxrpc_input_implicit_end_call()
              > rxrpc_call_completed()
      	> rxrpc_queue_call()
      	! Need to take rx->incoming_lock
      	> __rxrpc_disconnect_call()
      	> rxrpc_notify_socket()
          > rxrpc_new_incoming_call()
            - Uses rx->incoming_lock for the entire process
              - Might be able to drop this earlier in favour of the call lock
            > rxrpc_incoming_call()
            	! Conflicts with rxrpc_input_implicit_end_call()
          > rxrpc_send_ping()
            - Don't need locks to check rtt state
            > rxrpc_propose_ACK
      
        * PACKET DISTRIBUTION
          > rxrpc_input_call_packet()
            > rxrpc_input_data()
      	* QUEUE DATA PACKET ON CALL
      	> rxrpc_reduce_call_timer()
      	  - Uses timer_reduce()
      	! Needs call->input_lock()
      	> rxrpc_receiving_reply()
      	  ! Needs locking around ack state
      	  > rxrpc_rotate_tx_window()
      	  > rxrpc_end_tx_phase()
      	> rxrpc_proto_abort()
      	> rxrpc_input_dup_data()
      	- Fills the Rx buffer
      	- rxrpc_propose_ACK()
      	- rxrpc_notify_socket()
      
            > rxrpc_input_ack()
      	* APPLY ACK PACKET TO CALL AND DISCARD PACKET
      	> rxrpc_input_ping_response()
      	  - Probably doesn't need any extra locking
      	  ! Need READ_ONCE() on call->ping_serial
      	  > rxrpc_input_check_for_lost_ack()
      	    - Takes call->lock to consult Tx buffer
      	  > rxrpc_peer_add_rtt()
      	    ! Needs to take a lock (peer->rtt_input_lock)
      	    ! Could perhaps manage with cmpxchg() and xadd() instead
      	> rxrpc_input_requested_ack
      	  - Consults Tx buffer
      	    ! Probably needs a lock
      	  > rxrpc_peer_add_rtt()
      	> rxrpc_propose_ack()
      	> rxrpc_input_ackinfo()
      	  - Changes call->tx_winsize
      	    ! Use cmpxchg to handle change
      	    ! Should perhaps track serial number
      	  - Uses peer->lock to record MTU specification changes
      	> rxrpc_proto_abort()
      	! Need to take call->input_lock
      	> rxrpc_rotate_tx_window()
      	> rxrpc_end_tx_phase()
      	> rxrpc_input_soft_acks()
      	- Consults the Tx buffer
      	> rxrpc_congestion_management()
      	  - Modifies the Tx annotations
      	  ! Needs call->input_lock()
      	  > rxrpc_queue_call()
      
            > rxrpc_input_abort()
      	* APPLY ABORT PACKET TO CALL AND DISCARD PACKET
      	> rxrpc_set_call_completion()
      	> rxrpc_notify_socket()
      
            > rxrpc_input_ackall()
      	* APPLY ACKALL PACKET TO CALL AND DISCARD PACKET
      	! Need to take call->input_lock
      	> rxrpc_rotate_tx_window()
      	> rxrpc_end_tx_phase()
      
          > rxrpc_reject_packet()
      
      There are some functions used by the above that queue the packet, after
      which the procedure is terminated:
      
       - rxrpc_post_packet_to_local()
         - local->event_queue is an sk_buff_head
         - local->processor is a work_struct
       - rxrpc_post_packet_to_conn()
         - conn->rx_queue is an sk_buff_head
         - conn->processor is a work_struct
       - rxrpc_reject_packet()
         - local->reject_queue is an sk_buff_head
         - local->processor is a work_struct
      
      And some that offload processing to process context:
      
       - rxrpc_notify_socket()
         - Uses RCU lock
         - Uses call->notify_lock to call call->notify_rx
         - Uses call->recvmsg_lock to queue recvmsg side
       - rxrpc_queue_call()
         - call->processor is a work_struct
       - rxrpc_propose_ACK()
         - Uses call->lock to wrap __rxrpc_propose_ACK()
      
      And a bunch that complete a call, all of which use call->state_lock to
      protect the call state:
      
       - rxrpc_call_completed()
       - rxrpc_set_call_completion()
       - rxrpc_abort_call()
       - rxrpc_proto_abort()
         - Also uses rxrpc_queue_call()
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix connection-level abort handling · 64753092
      Authored by David Howells
      Fix connection-level abort handling to cache the abort and error codes
      properly so that a new incoming call can be properly aborted if it races
      with the parent connection being aborted by another CPU.
      
      The abort_code and error parameters can then be dropped from
      rxrpc_abort_calls().
      
      Fixes: f5c17aae ("rxrpc: Calls should only have one terminal state")
      Signed-off-by: David Howells <dhowells@redhat.com>
  12. 05 Oct 2018, 1 commit
  13. 04 Oct 2018, 1 commit
  14. 28 Sep 2018, 2 commits
    • rxrpc: Make service call handling more robust · 0099dc58
      Authored by David Howells
      Make the following changes to improve the robustness of the code that sets
      up a new service call:
      
       (1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a
           service ID check and pass that along to rxrpc_new_incoming_call().
           This means that I can remove the check from rxrpc_new_incoming_call()
           without the need to worry about the socket attached to the local
           endpoint getting replaced - which would invalidate the check.
      
       (2) Cache the rxrpc_peer struct, thereby allowing the peer search to be
           done once.  The peer is passed to rxrpc_new_incoming_call(), thereby
           saving the need to repeat the search.
      
           This also reduces the possibility of rxrpc_publish_service_conn()
           BUG()'ing due to the detection of a duplicate connection, despite the
           initial search done by rxrpc_find_connection_rcu() having turned up
           nothing.
      
           This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be
           non-reentrant and the result of the initial search should still hold
           true, but it has proven possible to hit.
      
           I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short
           the iteration over the hash table if it finds a matching peer with a
           zero usage count, but I don't know for sure since it's only ever been
           hit once that I know of.
      
           Another possibility is that a bug in rxrpc_data_ready() that checked
           the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag
           might've let through a packet that caused a spurious and invalid call
           to be set up.  That is addressed in another patch.
      
       (3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero
           usage count rather than stopping and returning not found, just in case
           there's another peer record behind it in the bucket.
      
       (4) Don't search the peer records in rxrpc_alloc_incoming_call(), but
           rather either use the peer cached in (2) or, if one wasn't found,
           preemptively install a new one.
      
      Fixes: 8496af50 ("rxrpc: Use RCU to access a peer's service connection tree")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Emit BUSY packets when supposed to rather than ABORTs · ece64fec
      Authored by David Howells
      In the input path, a received sk_buff can be marked for rejection by
      setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data
      (such as an abort code) in skb->priority.  The rejection is handled by
      queueing the sk_buff up for dealing with in process context.  The output
      code reads the mark and priority and, theoretically, generates an
      appropriate response packet.
      
      However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT
      message with a random abort code is generated (since skb->priority wasn't
      set to anything).
      
      Fix this by outputting the appropriate sort of packet.
      
      Also, whilst we're at it, most of the marks are no longer used, so remove
      them and rename the remaining two to something more obvious.
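      
      A sketch of the input-side marking after the rename (mark names as
      renamed by this patch; only aborts carry auxiliary data):
      
      	/* Reject with BUSY - no auxiliary data is needed */
      	skb->mark = RXRPC_SKB_MARK_REJECT_BUSY;
      
      	/* Reject with ABORT - the abort code rides in skb->priority */
      	skb->mark = RXRPC_SKB_MARK_REJECT_ABORT;
      	skb->priority = RX_INVALID_OPERATION;
      
      The output side then switches on skb->mark to build a BUSY packet or an
      ABORT packet accordingly.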
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
  15. 02 Aug 2018, 1 commit
  16. 31 Mar 2018, 4 commits
    • rxrpc: Fix apparent leak of rxrpc_local objects · 31f5f9a1
      Authored by David Howells
      rxrpc_local objects cannot be disposed of until all the connections that
      point to them have been RCU'd as a connection object holds refcount on the
      local endpoint it is communicating through.  Currently, this can cause an
      assertion failure to occur when a network namespace is destroyed as there's
      no check that the RCU destructors for the connections have been run before
      we start trying to destroy local endpoints.
      
      The kernel reports:
      
      	rxrpc: AF_RXRPC: Leaked local 0000000036a41bc1 {5}
      	------------[ cut here ]------------
      	kernel BUG at ../net/rxrpc/local_object.c:439!
      
      Fix this by keeping a count of the live connections and waiting for it to
      go to zero at the end of rxrpc_destroy_all_connections().
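      
      A sketch of the counting scheme (assuming an atomic counter in the
      per-netns record and the wait_var_event() facility):
      
      	atomic_inc(&rxnet->nr_conns);		/* on connection allocation */
      
      	/* in the connection's RCU destructor */
      	if (atomic_dec_and_test(&rxnet->nr_conns))
      		wake_up_var(&rxnet->nr_conns);
      
      	/* at the end of rxrpc_destroy_all_connections() */
      	wait_var_event(&rxnet->nr_conns, !atomic_read(&rxnet->nr_conns));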
      
      Fixes: dee46364 ("rxrpc: Add RCU destruction for connections and calls")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add a tracepoint to track rxrpc_local refcounting · 09d2bf59
      Authored by David Howells
      Add a tracepoint to track reference counting on the rxrpc_local struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix potential call vs socket/net destruction race · d3be4d24
      Authored by David Howells
      rxrpc_call structs don't pin sockets or network namespaces, but may attempt
      to access both after their refcount reaches 0 so that they can detach
      themselves from the network namespace.  However, there's no guarantee that
      the socket still exists at this point (so sock_net(&call->socket->sk) may
      be invalid) and the namespace may have gone away if the call isn't pinning
      a peer.
      
      Fix this by (a) carrying a net pointer in the rxrpc_call struct and (b)
      waiting for all calls to be destroyed when the network namespace goes away.
      
      This was detected by checker:
      
      net/rxrpc/call_object.c:634:57: warning: incorrect type in argument 1 (different address spaces)
      net/rxrpc/call_object.c:634:57:    expected struct sock const *sk
      net/rxrpc/call_object.c:634:57:    got struct sock [noderef] <asn:4>*<noident>
      
      Fixes: 2baec2c3 ("rxrpc: Support network namespacing")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix checker warnings and errors · 88f2a825
      Authored by David Howells
      Fix various issues detected by checker.
      
      Errors:
      
       (*) rxrpc_discard_prealloc() should be using rcu_assign_pointer to set
           call->socket.
      
      Warnings:
      
       (*) rxrpc_service_connection_reaper() should be passing NULL rather than 0 to
           trace_rxrpc_conn() as the where argument.
      
       (*) rxrpc_disconnect_client_call() should get its net pointer via the
           call->conn rather than call->sock to avoid a warning about accessing
           an RCU pointer without protection.
      
       (*) Proc seq start/stop functions need annotation as they pass locks
           between the functions.
      
      False positives:
      
       (*) Checker doesn't correctly handle of seq-retry lock context balance in
           rxrpc_find_service_conn_rcu().
      
       (*) Checker thinks execution may proceed past the BUG() in
           rxrpc_publish_service_conn().
      
       (*) Variable length array warnings from SKCIPHER_REQUEST_ON_STACK() in
           rxkad.c.
      Signed-off-by: David Howells <dhowells@redhat.com>
  17. 28 Mar 2018, 1 commit
    • rxrpc, afs: Use debug_ids rather than pointers in traces · a25e21f0
      Authored by David Howells
      In rxrpc and afs, use the debug_ids that are monotonically allocated to
      various objects as they're allocated rather than pointers as kernel
      pointers are now hashed making them less useful.  Further, the debug ids
      aren't reused anywhere nearly as quickly.
      
      In addition, allow kernel services that use rxrpc, such as afs, to take
      numbers from the rxrpc counter, assign them to their own call struct and
      pass them in to rxrpc for both client and service calls so that the trace
      lines for each will have the same ID tag.
      Signed-off-by: David Howells <dhowells@redhat.com>
  18. 24 Nov 2017, 1 commit
    • rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls · 9faaff59
      Authored by David Howells
      Provide a different lockdep key for rxrpc_call::user_mutex when the call is
      made on a kernel socket, such as by the AFS filesystem.
      
      The problem is that lockdep registers a false positive between userspace
      calling the sendmsg syscall on a user socket where call->user_mutex is held
      whilst userspace memory is accessed whereas the AFS filesystem may perform
      operations with mmap_sem held by the caller.
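      
      The essence of the fix is to give kernel-socket calls their own lock
      class (a sketch; the test on the socket is illustrative):
      
      	static struct lock_class_key rxrpc_kernel_call_user_mutex_key;
      
      	/* Kernel-socket calls take mmap_sem then user_mutex; userspace
      	 * sendmsg takes user_mutex then mmap_sem.  A separate lockdep class
      	 * stops the two orders being chained into a false cycle.
      	 */
      	mutex_init(&call->user_mutex);
      	if (rx->sk.sk_kern_sock)
      		lockdep_set_class(&call->user_mutex,
      				  &rxrpc_kernel_call_user_mutex_key);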
      
      In such a case, the following warning is produced.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.14.0-fscache+ #243 Tainted: G            E
      ------------------------------------------------------
      modpost/16701 is trying to acquire lock:
       (&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs]
      
      but task is already holding lock:
       (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 (&mm->mmap_sem){++++}:
             __might_fault+0x61/0x89
             _copy_from_iter_full+0x40/0x1fa
             rxrpc_send_data+0x8dc/0xff3
             rxrpc_do_sendmsg+0x62f/0x6a1
             rxrpc_sendmsg+0x166/0x1b7
             sock_sendmsg+0x2d/0x39
             ___sys_sendmsg+0x1ad/0x22b
             __sys_sendmsg+0x41/0x62
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #2 (&call->user_mutex){+.+.}:
             __mutex_lock+0x86/0x7d2
             rxrpc_new_client_call+0x378/0x80e
             rxrpc_kernel_begin_call+0xf3/0x154
             afs_make_call+0x195/0x454 [kafs]
             afs_vl_get_capabilities+0x193/0x198 [kafs]
             afs_vl_lookup_vldb+0x5f/0x151 [kafs]
             afs_create_volume+0x2e/0x2f4 [kafs]
             afs_mount+0x56a/0x8d7 [kafs]
             mount_fs+0x6a/0x109
             vfs_kern_mount+0x67/0x135
             do_mount+0x90b/0xb57
             SyS_mount+0x72/0x98
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #1 (k-sk_lock-AF_RXRPC){+.+.}:
             lock_sock_nested+0x74/0x8a
             rxrpc_kernel_begin_call+0x8a/0x154
             afs_make_call+0x195/0x454 [kafs]
             afs_fs_get_capabilities+0x17a/0x17f [kafs]
             afs_probe_fileserver+0xf7/0x2f0 [kafs]
             afs_select_fileserver+0x83f/0x903 [kafs]
             afs_fetch_status+0x89/0x11d [kafs]
             afs_iget+0x16f/0x4f8 [kafs]
             afs_mount+0x6c6/0x8d7 [kafs]
             mount_fs+0x6a/0x109
             vfs_kern_mount+0x67/0x135
             do_mount+0x90b/0xb57
             SyS_mount+0x72/0x98
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      -> #0 (&vnode->io_lock){+.+.}:
             lock_acquire+0x174/0x19f
             __mutex_lock+0x86/0x7d2
             afs_begin_vnode_operation+0x33/0x77 [kafs]
             afs_fetch_data+0x80/0x12a [kafs]
             afs_readpages+0x314/0x405 [kafs]
             __do_page_cache_readahead+0x203/0x2ba
             filemap_fault+0x179/0x54d
             __do_fault+0x17/0x60
             __handle_mm_fault+0x6d7/0x95c
             handle_mm_fault+0x24e/0x2a3
             __do_page_fault+0x301/0x486
             do_page_fault+0x236/0x259
             page_fault+0x22/0x30
             __clear_user+0x3d/0x60
             padzero+0x1c/0x2b
             load_elf_binary+0x785/0xdc7
             search_binary_handler+0x81/0x1ff
             do_execveat_common.isra.14+0x600/0x888
             do_execve+0x1f/0x21
             SyS_execve+0x28/0x2f
             do_syscall_64+0x89/0x1be
             return_from_SYSCALL_64+0x0/0x75
      
      other info that might help us debug this:
      
      Chain exists of:
        &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&mm->mmap_sem);
                                     lock(&call->user_mutex);
                                     lock(&mm->mmap_sem);
        lock(&vnode->io_lock);
      
       *** DEADLOCK ***
      
      1 lock held by modpost/16701:
       #0:  (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
      
      stack backtrace:
      CPU: 0 PID: 16701 Comm: modpost Tainted: G            E   4.14.0-fscache+ #243
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Call Trace:
       dump_stack+0x67/0x8e
       print_circular_bug+0x341/0x34f
       check_prev_add+0x11f/0x5d4
       ? add_lock_to_list.isra.12+0x8b/0x8b
       ? add_lock_to_list.isra.12+0x8b/0x8b
       ? __lock_acquire+0xf77/0x10b4
       __lock_acquire+0xf77/0x10b4
       lock_acquire+0x174/0x19f
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       __mutex_lock+0x86/0x7d2
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       ? afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_begin_vnode_operation+0x33/0x77 [kafs]
       afs_fetch_data+0x80/0x12a [kafs]
       afs_readpages+0x314/0x405 [kafs]
       __do_page_cache_readahead+0x203/0x2ba
       ? filemap_fault+0x179/0x54d
       filemap_fault+0x179/0x54d
       __do_fault+0x17/0x60
       __handle_mm_fault+0x6d7/0x95c
       handle_mm_fault+0x24e/0x2a3
       __do_page_fault+0x301/0x486
       do_page_fault+0x236/0x259
       page_fault+0x22/0x30
      RIP: 0010:__clear_user+0x3d/0x60
      RSP: 0018:ffff880071e93da0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c
      RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720
      RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00
      R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000
       padzero+0x1c/0x2b
       load_elf_binary+0x785/0xdc7
       search_binary_handler+0x81/0x1ff
       do_execveat_common.isra.14+0x600/0x888
       do_execve+0x1f/0x21
       SyS_execve+0x28/0x2f
       do_syscall_64+0x89/0x1be
       entry_SYSCALL64_slow_path+0x25/0x25
      RIP: 0033:0x7fdb6009ee07
      RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
      RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07
      RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900
      RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000
      R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000
      R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000
      Signed-off-by: David Howells <dhowells@redhat.com>
  19. 29 Aug 2017, 1 commit
    • rxrpc: Fix IPv6 support · 7b674e39
      Authored by David Howells
      Fix IPv6 support in AF_RXRPC in the following ways:
      
       (1) When extracting the address from a received IPv4 packet, if the local
           transport socket is open for IPv6 then fill out the sockaddr_rxrpc
           struct for an IPv4-mapped-to-IPv6 AF_INET6 transport address instead
           of an AF_INET one.
      
       (2) When sending CHALLENGE or RESPONSE packets, the transport length needs
           to be set from the sockaddr_rxrpc::transport_len field rather than
           sizeof() on the IPv4 transport address.
      
       (3) When processing an IPv4 ICMP packet received by an IPv6 socket, set up
           the address correctly before searching for the affected peer.
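      
      For (1), the mapped form is the standard ::ffff:a.b.c.d encoding; a
      sketch of filling out the transport address from a received IPv4 packet
      (fields per struct sockaddr_rxrpc):
      
      	srx->transport_type = SOCK_DGRAM;
      	srx->transport_len = sizeof(srx->transport.sin6);
      	srx->transport.sin6.sin6_family = AF_INET6;
      	srx->transport.sin6.sin6_port = udp_hdr(skb)->source;
      	/* Map 1.2.3.4 to ::ffff:1.2.3.4 */
      	srx->transport.sin6.sin6_addr.s6_addr32[0] = 0;
      	srx->transport.sin6.sin6_addr.s6_addr32[1] = 0;
      	srx->transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
      	srx->transport.sin6.sin6_addr.s6_addr32[3] = ip_hdr(skb)->saddr;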
      Signed-off-by: David Howells <dhowells@redhat.com>
  20. 19 Aug 2017, 1 commit
    • rxrpc: Fix oops when discarding a preallocated service call · 9a19bad7
      Authored by David Howells
      rxrpc_service_prealloc_one() doesn't set the socket pointer on any new call
      it preallocates, but does add it to the rxrpc net namespace call list.
      This, however, causes rxrpc_put_call() to oops when the call is discarded
      when the socket is closed.  rxrpc_put_call() needs the socket to be able to
      reach the namespace so that it can use a lock held therein.
      
      Fix this by setting a call's socket pointer immediately before discarding
      it.
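      
      A sketch of the discard path with the fix applied (abbreviated; names
      follow the commit text):
      
      	struct rxrpc_call *call = b->call_backlog[tail];
      
      	/* Give the call a socket so rxrpc_put_call() can reach the netns
      	 * (the checker-warnings commit 88f2a825 above later turns this
      	 * assignment into rcu_assign_pointer()).
      	 */
      	call->socket = rx;
      	rxrpc_put_call(call, rxrpc_call_put);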
      
      This can be triggered by unloading the kafs module, resulting in an oops
      like the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      IP: rxrpc_put_call+0x1e2/0x32d
      PGD 0
      P4D 0
      Oops: 0000 [#1] SMP
      Modules linked in: kafs(E-)
      CPU: 3 PID: 3037 Comm: rmmod Tainted: G            E   4.12.0-fscache+ #213
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8803fc92e2c0 task.stack: ffff8803fef74000
      RIP: 0010:rxrpc_put_call+0x1e2/0x32d
      RSP: 0018:ffff8803fef77e08 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffff8803fab99ac0 RCX: 000000000000000f
      RDX: ffffffff81c50a40 RSI: 000000000000000c RDI: ffff8803fc92ea88
      RBP: ffff8803fef77e30 R08: ffff8803fc87b941 R09: ffffffff82946d20
      R10: ffff8803fef77d10 R11: 00000000000076fc R12: 0000000000000005
      R13: ffff8803fab99c20 R14: 0000000000000001 R15: ffffffff816c6aee
      FS:  00007f915a059700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000030 CR3: 00000003fef39000 CR4: 00000000001406e0
      Call Trace:
       rxrpc_discard_prealloc+0x325/0x341
       rxrpc_listen+0xf9/0x146
       kernel_listen+0xb/0xd
       afs_close_socket+0x3e/0x173 [kafs]
       afs_exit+0x1f/0x57 [kafs]
       SyS_delete_module+0x10f/0x19a
       do_syscall_64+0x8a/0x149
       entry_SYSCALL64_slow_path+0x25/0x25
      
      Fixes: 2baec2c3 ("rxrpc: Support network namespacing")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 15 Jun 2017, 1 commit
    • rxrpc: Cache the congestion window setting · f7aec129
      Authored by David Howells
      Cache the congestion window setting that was determined during a call's
      transmission phase when it finishes so that it can be used by the next call
      to the same peer, thereby shortcutting the slow-start algorithm.
      
      The value is stored in the rxrpc_peer struct and is accessed without
      locking.  Each call takes the value that happens to be there when it starts
      and just overwrites the value when it finishes.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 05 Jun 2017, 2 commits
    • rxrpc: Implement service upgrade · 4722974d
      Authored by David Howells
      Implement AuriStor's service upgrade facility.  There are three problems
      that this is meant to deal with:
      
       (1) Various of the standard AFS RPC calls have IPv4 addresses in their
           requests and/or replies - but there's no room for including IPv6
           addresses.
      
       (2) Definition of IPv6-specific RPC operations in the standard operation
           sets has not yet been achieved.
      
       (3) One could envision the creation of a new service on the same port as
           the original service.  The new service could implement improved
           operations - and the client could try this first, falling back to the
           original service if it's not there.
      
           Unfortunately, certain servers ignore packets addressed to a service
           they don't implement and don't respond in any way - not even with an
           ABORT.  This means that the client must then wait for the call timeout
           to occur.
      
      What service upgrade does is to see if the connection is marked as being
      'upgradeable' and if so, change the service ID in the server and thus the
      request and reply formats.  Note that the upgrade isn't mandatory - a
      server that supports only the original call set will ignore the upgrade
      request.
      
      In the protocol, the procedure is then as follows:
      
       (1) To request an upgrade, the first DATA packet in a new connection must
           have the userStatus set to 1 (this is normally 0).  The userStatus
           value is normally ignored by the server.
      
       (2) If the server doesn't support upgrading, the reply packets will
           contain the same service ID as for the first request packet.
      
       (3) If the server does support upgrading, all future reply packets on that
           connection will contain the new service ID and the new service ID will
           be applied to *all* further calls on that connection as well.
      
       (4) The RPC op used to probe the upgrade must take the same request data
           as the shadow call in the upgrade set (but may return a different
           reply).  GetCapability RPC ops were added to all standard sets for
           just this purpose.  Ops where the request formats differ cannot be
           used for probing.
      
       (5) The client must wait for completion of the probe before sending any
           further RPC ops to the same destination.  It should then use the
           service ID that recvmsg() reported back in all future calls.
      
       (6) The shadow service must have call definitions for all the operation
           IDs defined by the original service.
      
      
      To support service upgrading, a server should:
      
       (1) Call bind() twice on its AF_RXRPC socket before calling listen().
           Each bind() should supply a different service ID, but the transport
           addresses must be the same.  This allows the server to receive
           requests with either service ID.
      
       (2) Enable automatic upgrading by calling setsockopt(), specifying
           RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
           unsigned shorts as the argument:
      
      	unsigned short optval[2];
      
           This specifies a pair of service IDs.  They must be different and must
           match the service IDs bound to the socket.  Member 0 is the service ID
           to upgrade from and member 1 is the service ID to upgrade to.
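      
      A userspace sketch of those two steps (the service IDs, port and lack of
      error handling are illustrative):
      
      	unsigned short optval[2] = { 1, 101 };	/* upgrade from 1 to 101 */
      	struct sockaddr_rxrpc srx = {
      		.srx_family	= AF_RXRPC,
      		.srx_service	= 1,
      		.transport_type	= SOCK_DGRAM,
      		.transport_len	= sizeof(srx.transport.sin6),
      		.transport.sin6	= { .sin6_family = AF_INET6,
      				    .sin6_port	 = htons(7000) },
      	};
      	int fd = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6);
      
      	bind(fd, (struct sockaddr *)&srx, sizeof(srx));	/* original service */
      	srx.srx_service = 101;
      	bind(fd, (struct sockaddr *)&srx, sizeof(srx));	/* shadow service */
      	setsockopt(fd, SOL_RXRPC, RXRPC_UPGRADEABLE_SERVICE,
      		   optval, sizeof(optval));
      	listen(fd, 10);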
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Permit multiple service binding · 28036f44
      Authored by David Howells
      Permit bind() to be called on an AF_RXRPC socket more than once (currently
      maximum twice) to bind multiple listening services to it.  There are some
      restrictions:
      
       (1) All bind() calls involved must have a non-zero service ID.
      
       (2) The service IDs must all be different.
      
       (3) The rest of the address (notably the transport part) must be the same
           in all (a single UDP socket is shared).
      
       (4) This must be done before listen() or sendmsg() is called.
      
      This allows someone to connect to the service socket with different service
      IDs and lays the foundation for service upgrading.
      
      The service ID used by an incoming call can be extracted from the msg_name
      returned by recvmsg().
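      
      A sketch of extracting it on the server (msg_name is filled in with a
      struct sockaddr_rxrpc):
      
      	struct sockaddr_rxrpc srx;
      	struct msghdr msg = {
      		.msg_name	= &srx,
      		.msg_namelen	= sizeof(srx),
      	};
      
      	if (recvmsg(fd, &msg, 0) >= 0)
      		printf("incoming call for service %u\n", srx.srx_service);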
      Signed-off-by: David Howells <dhowells@redhat.com>
  23. 26 May 2017, 1 commit
    • rxrpc: Support network namespacing · 2baec2c3
      Authored by David Howells
      Support network namespacing in AF_RXRPC with the following changes:
      
       (1) All the local endpoint, peer and call lists, locks, counters, etc. are
           moved into the per-namespace record.
      
       (2) All the connection tracking is moved into the per-namespace record
           with the exception of the client connection ID tree, which is kept
           global so that connection IDs are kept unique per-machine.
      
       (3) Each namespace gets its own epoch.  This allows each network namespace
           to pretend to be a separate client machine.
      
       (4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and
           the contents reflect the namespace.
      
      fs/afs/ should be okay with this patch as it explicitly requires the current
      net namespace to be init_net to permit a mount to proceed at the moment.  It
      will, however, need updating so that cells, IP addresses and DNS records are
      per-namespace also.
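      
      Structurally this is the usual pernet registration pattern; a sketch of
      the hookup (the contents of struct rxrpc_net and the exact registration
      variant are abbreviated assumptions, not the verbatim code):
      
      	static int __net_init rxrpc_init_net(struct net *net)
      	{
      		struct rxrpc_net *rxnet = net_generic(net, rxrpc_net_id);
      
      		/* initialise per-namespace lists, locks, counters, epoch,
      		 * and create /proc/net/rxrpc/ for this namespace
      		 */
      		return 0;
      	}
      
      	static struct pernet_operations rxrpc_net_ops = {
      		.init	= rxrpc_init_net,
      		.exit	= rxrpc_exit_net,
      		.id	= &rxrpc_net_id,
      		.size	= sizeof(struct rxrpc_net),
      	};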
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 06 Apr 2017, 1 commit
  25. 02 Mar 2017, 1 commit
    • rxrpc: Fix deadlock between call creation and sendmsg/recvmsg · 540b1c48
      Authored by David Howells
      All the routines by which rxrpc is accessed from the outside are serialised
      by means of the socket lock (sendmsg, recvmsg, bind,
      rxrpc_kernel_begin_call(), ...) and this presents a problem:
      
       (1) If a number of calls on the same socket are in the process of
           connection to the same peer, a maximum of four concurrent live calls
           are permitted before further calls need to wait for a slot.
      
       (2) If a call is waiting for a slot, it is deep inside sendmsg() or
           rxrpc_kernel_begin_call() and the entry function is holding the socket
           lock.
      
       (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
           from servicing the other calls as they need to take the socket lock to
           do so.
      
       (4) The socket is stuck until a call is aborted and makes its slot
           available to the waiter.
      
      Fix this by:
      
       (1) Provide each call with a mutex ('user_mutex') that arbitrates access
           by the users of rxrpc separately for each specific call.
      
       (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
           they've got a call and taken its mutex.
      
           Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
           set but someone else has the lock.  Should I instead only return
           EWOULDBLOCK if there's nothing currently to be done on a socket, and
           sleep in this particular instance because there is something to be
           done, but we appear to be blocked by the interrupt handler doing its
           ping?
      
       (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
           call, locking its user mutex and adding it to the socket's call tree.
           The call is returned locked so that sendmsg() can add data to it
           immediately.
      
           From the moment the call is in the socket tree, it is subject to
           access by sendmsg() and recvmsg() - even if it isn't connected yet.
      
       (4) Lock new service calls in the UDP data_ready handler (in
           rxrpc_new_incoming_call()) because they may already be in the socket's
           tree and the data_ready handler makes them live immediately if a user
           ID has already been preassigned.
      
           Note that the new call is locked before any notifications are sent
           that it is live, so doing mutex_trylock() *ought* to always succeed.
           Userspace is prevented from doing sendmsg() on calls that are in a
           too-early state in rxrpc_do_sendmsg().
      
       (5) Make rxrpc_new_incoming_call() return the call with the user mutex
           held so that a ping can be scheduled immediately under it.
      
           Note that it might be worth moving the ping call into
           rxrpc_new_incoming_call() and then we can drop the mutex there.
      
       (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
           release the socket after adding the call to the socket's tree.  This
           is slightly tricky as we've dequeued the call by that point and have
           to requeue it.
      
           Note that requeuing emits a trace event.
      
       (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
           new mutex immediately and don't bother with the socket mutex at all.
      
      This patch has the nice bonus that calls on the same socket are now to some
      extent parallelisable.
      
      Note that we might want to move rxrpc_service_prealloc() calls out from the
      socket lock and give it its own lock, so that we don't hang progress in
      other calls because we're waiting for the allocator.
      
      We probably also want to avoid calling rxrpc_notify_socket() from within
      the socket lock (rxrpc_accept_call()).
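      
      The central pattern from fix (2), sketched: the socket lock is held only
      long enough to find the call and take its mutex (names illustrative):
      
      	lock_sock(&rx->sk);
      	call = rxrpc_find_call_by_user_ID(rx, user_call_ID); /* gets a ref */
      	if (call && mutex_lock_interruptible(&call->user_mutex) < 0) {
      		release_sock(&rx->sk);
      		return -ERESTARTSYS;
      	}
      	release_sock(&rx->sk);	/* other calls on this socket may proceed */
      
      	/* ... the long-running sendmsg/recvmsg work then happens under
      	 * call->user_mutex only ...
      	 */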
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Marc Dionne <marc.c.dionne@auristor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 09 Jan 2017, 1 commit
  27. 06 Oct 2016, 2 commits
    • rxrpc: Fix warning by splitting rxrpc_send_call_packet() · 26cb02aa
      Authored by David Howells
      Split rxrpc_send_data_packet() to separate ACK generation (which is more
      complicated) from ABORT generation.  This simplifies the code a bit and
      fixes the following warning:
      
      In file included from ../net/rxrpc/output.c:20:0:
      net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
      net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      net/rxrpc/output.c:103:24: note: 'top' was declared here
      net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix oops on incoming call to serviceless endpoint · 7212a57e
      Authored by David Howells
      If a call comes in to a local endpoint that isn't listening for any
      incoming calls at the moment, an oops will happen.  We need to check that
      the local endpoint's service pointer isn't NULL before we dereference it.
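      
      The fix amounts to checking the service pointer before use (a sketch):
      
      	rx = rcu_dereference(local->service);
      	if (!rx)
      		goto reject_packet;	/* no one is listening here */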
      Signed-off-by: David Howells <dhowells@redhat.com>
  28. 30 Sep 2016, 1 commit
    • rxrpc: Reduce the rxrpc_local::services list to a pointer · 1e9e5c95
      Authored by David Howells
      Reduce the rxrpc_local::services list to just a pointer as we don't permit
      multiple service endpoints to bind to a single transport endpoint (this is
      excluded by rxrpc_lookup_local()).
      
      The reason we don't allow this is that if you send a request to an AFS
      filesystem service, it will try to talk back to your cache manager on the
      port you sent from (this is how file change notifications are handled).  To
      prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
      sockets share a UDP socket if at least one of them has a service bound.
      Signed-off-by: David Howells <dhowells@redhat.com>
  29. 17 Sep 2016, 1 commit