1. 20 Mar 2021, 1 commit
  2. 10 Feb 2021, 1 commit
    • vsock: fix locking in vsock_shutdown() · 1c5fae9c
      Committed by Stefano Garzarella
      In vsock_shutdown() we touched some socket fields, such as 'state'
      and 'sk_flags', without holding the socket lock.
      
      Also, after the introduction of multi-transport, we are accessing
      'vsk->transport' in vsock_send_shutdown() without holding the lock,
      and this call can be made while the connection is in progress, so
      the transport can change in the meantime.
      
      To avoid issues, we now hold the socket lock when we enter
      vsock_shutdown() and release it when we leave.
      
      Among the transports that implement the 'shutdown' callback, only
      hyperv_transport acquired the lock. Since the caller now holds it,
      we no longer take it.
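      
      A condensed sketch of the resulting locking pattern (simplified from
      vsock_shutdown(); the state checks and error paths of the real patch
      are abbreviated):
      
          /* Simplified sketch, not the verbatim patch. */
          static int vsock_shutdown(struct socket *sock, int mode)
          {
                  struct sock *sk = sock->sk;
          
                  lock_sock(sk);  /* protects 'state', sk_flags, vsk->transport */
          
                  sock->state = SS_DISCONNECTING;  /* previously set without the lock */
          
                  /* Receive and send shutdowns are treated alike. */
                  mode &= (RCV_SHUTDOWN | SEND_SHUTDOWN);
                  if (mode) {
                          sk->sk_shutdown |= mode;
                          sk->sk_state_change(sk);
          
                          if (sk->sk_type == SOCK_STREAM) {
                                  sock_reset_flag(sk, SOCK_DONE);
                                  /* transport->shutdown() now runs under the lock */
                                  vsock_send_shutdown(sk, mode);
                          }
                  }
          
                  release_sock(sk);
                  return 0;
          }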
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 07 Feb 2021, 2 commits
  4. 02 Feb 2021, 1 commit
  5. 15 Dec 2020, 2 commits
    • af_vsock: Assign the vsock transport considering the vsock address flags · 7f816984
      Committed by Andra Paraschiv
      The vsock flags field can be set in the connect path (user space app)
      and the (listen) receive path (kernel space logic).
      
      When the vsock transport is assigned, the remote CID is used to
      distinguish between connection types.
      
      Use the vsock flags value (in addition to the CID) from the remote
      address to decide which vsock transport to assign, as sketched below.
      For the sibling-VMs use case, all vsock packets need to be forwarded
      to the host, so always assign the guest-to-host (G2H) transport if the
      VMADDR_FLAG_TO_HOST flag is set. For the other use cases, the
      transport assignment logic is unchanged.
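      
      A condensed sketch of the selection logic after this change (based on
      the commit description; the variable names follow af_vsock.c
      conventions and local-transport handling is omitted):
      
          /* Sketch of the transport selection in vsock_assign_transport(). */
          unsigned int remote_cid = vsk->remote_addr.svm_cid;
          u8 remote_flags = vsk->remote_addr.svm_flags;  /* 1 byte, see v3 -> v4 note */
      
          if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
              (remote_flags & VMADDR_FLAG_TO_HOST))
                  new_transport = transport_g2h;  /* forward to the host (sibling VMs) */
          else
                  new_transport = transport_h2g;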
      
      Changelog
      
      v3 -> v4
      
      * Update the "remote_flags" local variable type to reflect the change of
        the "svm_flags" field to be 1 byte in size.
      
      v2 -> v3
      
      * Update the bitwise check logic so that it does not compare the
        result to the flag value.
      
      v1 -> v2
      
      * Use bitwise operator to check the vsock flag.
      * Use the updated "VMADDR_FLAG_TO_HOST" flag naming.
      * Merge the checks for the g2h transport assignment in one "if" block.
      Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path · 1b5f2ab9
      Committed by Andra Paraschiv
      The vsock flags can be set during the connect() setup logic, when
      initializing the vsock address data structure. The vsock transport is
      then assigned, taking this flags field into account.
      
      The vsock transport is also assigned on the (listen) receive path,
      where the flags field needs to be set according to the use case.
      
      Set the vsock flags of the remote address to the value that marks
      packets for forwarding to the host, if both of the following
      conditions are met (see the sketch after the list):
      
      * The source CID of the packet is higher than VMADDR_CID_HOST.
      * The destination CID of the packet is higher than VMADDR_CID_HOST.
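      
      A minimal sketch of the flag setup (modeled on vsock_assign_transport();
      'psk' is assumed to be the parent listening socket, non-NULL only on
      the receive path):
      
          /* Receive path: mark the connection for forwarding to the host. */
          if (psk && vsk->local_addr.svm_cid > VMADDR_CID_HOST &&
              vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
                  vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;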
      
      Changelog
      
      v3 -> v4
      
      * No changes.
      
      v2 -> v3
      
      * No changes.
      
      v1 -> v2
      
      * Set the vsock flag on the receive path in the vsock transport
        assignment logic.
      * Use bitwise operator for the vsock flag setup.
      * Use the updated "VMADDR_FLAG_TO_HOST" flag naming.
      Signed-off-by: Andra Paraschiv <andraprs@amazon.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 15 Nov 2020, 1 commit
    • vsock: forward all packets to the host when no H2G is registered · 65b422d9
      Committed by Stefano Garzarella
      Before commit c0cfa2d8 ("vsock: add multi-transports support"),
      if a G2H transport was loaded (e.g. the virtio transport), every
      packet was forwarded to the host, regardless of the destination CID.
      The H2G transports implemented until then (vhost-vsock, VMCI) always
      responded with an error if the destination CID was not
      VMADDR_CID_HOST.
      
      Since that commit, we use the remote CID to decide which transport
      to use, so packets with a remote CID > VMADDR_CID_HOST (2) are sent
      only through the H2G transport. If no H2G transport is available,
      packets are discarded directly in the guest.
      
      Some use cases (e.g. Nitro Enclaves [1]) rely on the old behaviour
      to implement sibling-VM communication, so we restore the old
      behaviour when no H2G transport is registered.
      It is then up to the host to discard packets if the destination is
      not the right one, as was already the case before multi-transport
      support was added.
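      
      A minimal sketch of the restored fallback (condensed from the
      selection logic in vsock_assign_transport(); not the verbatim patch):
      
          /* With no H2G transport registered, fall back to G2H so that the
           * host, not the guest, decides whether to discard the packet. */
          if (remote_cid <= VMADDR_CID_HOST || !transport_h2g)
                  new_transport = transport_g2h;
          else
                  new_transport = transport_h2g;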
      
      Tested with nested QEMU/KVM by me and Nitro Enclaves by Andra.
      
      [1] Documentation/virt/ne_overview.rst
      
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Fixes: c0cfa2d8 ("vsock: add multi-transports support")
      Reported-by: Andra Paraschiv <andraprs@amazon.com>
      Tested-by: Andra Paraschiv <andraprs@amazon.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20201112133837.34183-1-sgarzare@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. 30 Oct 2020, 2 commits
  8. 27 Oct 2020, 1 commit
  9. 13 Aug 2020, 1 commit
    • vsock: fix potential null pointer dereference in vsock_poll() · 1980c058
      Committed by Stefano Garzarella
      syzbot reported an issue where vsock_poll() finds the socket state
      set to TCP_ESTABLISHED while 'transport' is NULL:
        general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
        CPU: 0 PID: 8227 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:vsock_poll+0x75a/0x8e0 net/vmw_vsock/af_vsock.c:1038
        Call Trace:
         sock_poll+0x159/0x460 net/socket.c:1266
         vfs_poll include/linux/poll.h:90 [inline]
         do_pollfd fs/select.c:869 [inline]
         do_poll fs/select.c:917 [inline]
         do_sys_poll+0x607/0xd40 fs/select.c:1011
         __do_sys_poll fs/select.c:1069 [inline]
         __se_sys_poll fs/select.c:1057 [inline]
         __x64_sys_poll+0x18c/0x440 fs/select.c:1057
         do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This issue can happen if the TCP_ESTABLISHED state is set after we
      read vsk->transport in vsock_poll().
      
      We could add barriers to synchronize, but since this can only happen
      during connection setup, we can simply check that 'transport' is valid.
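      
      A sketch of the resulting check (condensed from vsock_poll(); the
      surrounding poll-mask computation is abbreviated):
      
          /* Read vsk->transport once and tolerate it still being NULL
           * while the connection is being set up. */
          const struct vsock_transport *transport = vsk->transport;
      
          if (transport && transport->stream_is_active(vsk) &&
              !(sk->sk_shutdown & RCV_SHUTDOWN))
                  mask |= EPOLLIN | EPOLLRDNORM;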
      
      Fixes: c0cfa2d8 ("vsock: add multi-transports support")
      Reported-and-tested-by: syzbot+a61bac2fcc1a7c6623fe@syzkaller.appspotmail.com
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 25 Jul 2020, 1 commit
  11. 20 Jul 2020, 1 commit
  12. 28 May 2020, 1 commit
  13. 28 Feb 2020, 1 commit
    • vsock: fix potential deadlock in transport->release() · 3f74957f
      Committed by Stefano Garzarella
      Some transports (hyperv, virtio) acquire the sock lock during the
      .release() callback.
      
      In vsock_stream_connect() we call vsock_assign_transport(); if the
      socket was previously assigned to another transport,
      vsk->transport->release() is called, but the sock lock is already
      held by vsock_stream_connect(), causing the deadlock reported by
      syzbot:
      
          INFO: task syz-executor280:9768 blocked for more than 143 seconds.
            Not tainted 5.6.0-rc1-syzkaller #0
          "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
          syz-executor280 D27912  9768   9766 0x00000000
          Call Trace:
           context_switch kernel/sched/core.c:3386 [inline]
           __schedule+0x934/0x1f90 kernel/sched/core.c:4082
           schedule+0xdc/0x2b0 kernel/sched/core.c:4156
           __lock_sock+0x165/0x290 net/core/sock.c:2413
           lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
           virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
           vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
           vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
           __sys_connect_file+0x161/0x1c0 net/socket.c:1857
           __sys_connect+0x174/0x1b0 net/socket.c:1874
           __do_sys_connect net/socket.c:1885 [inline]
           __se_sys_connect net/socket.c:1882 [inline]
           __x64_sys_connect+0x73/0xb0 net/socket.c:1882
           do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      To avoid this issue, this patch removes the lock acquisition from the
      .release() callbacks of the hyperv and virtio transports, and instead
      holds the lock when calling vsk->transport->release() in the vsock core.
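      
      A minimal sketch of the reordering in __vsock_release() (based on the
      description above; not the verbatim patch):
      
          /* Take the sock lock first, then call into the transport, whose
           * .release() callback no longer locks the socket itself. */
          lock_sock_nested(sk, level);
      
          if (vsk->transport)
                  vsk->transport->release(vsk);  /* runs with the lock held */
          else if (sk->sk_type == SOCK_STREAM)
                  vsock_remove_sock(vsk);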
      
      Reported-by: syzbot+731710996d79d0d58fbc@syzkaller.appspotmail.com
      Fixes: 408624af ("vsock: use local transport when it is loaded")
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 12 Dec 2019, 2 commits
  15. 22 Nov 2019, 1 commit
  16. 15 Nov 2019, 9 commits
  17. 07 Nov 2019, 1 commit
  18. 06 Nov 2019, 1 commit
  19. 29 Oct 2019, 1 commit
  20. 02 Oct 2019, 1 commit
    • vsock: Fix a lockdep warning in __vsock_release() · 0d9138ff
      Committed by Dexuan Cui
      Lockdep complains if two locks from the same class are held at the
      same time.
      
      Fix the warning below for the hyperv and virtio sockets (the vmci
      socket code doesn't have this issue) by using lock_sock_nested() when
      __vsock_release() is called recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.3.0+ #1 Not tainted
      --------------------------------------------
      server/1795 is trying to acquire lock:
      ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
      
      but task is already holding lock:
      ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_VSOCK);
        lock(sk_lock-AF_VSOCK);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by server/1795:
       #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
       #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      stack backtrace:
      CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
      Call Trace:
       dump_stack+0x67/0x90
       __lock_acquire.cold.67+0xd2/0x20b
       lock_acquire+0xb5/0x1c0
       lock_sock_nested+0x6d/0x90
       hvs_release+0x10/0x120 [hv_sock]
       __vsock_release+0x24/0xf0 [vsock]
       __vsock_release+0xa0/0xf0 [vsock]
       vsock_release+0x12/0x30 [vsock]
       __sock_release+0x37/0xa0
       sock_close+0x14/0x20
       __fput+0xc1/0x250
       task_work_run+0x98/0xc0
       do_exit+0x344/0xc60
       do_group_exit+0x47/0xb0
       get_signal+0x15c/0xc50
       do_signal+0x30/0x720
       exit_to_usermode_loop+0x50/0xa0
       do_syscall_64+0x24e/0x270
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f4184e85f31
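      
      A minimal sketch of the fix (modeled on __vsock_release(); the
      teardown details are abbreviated). The recursive call that releases
      pending sockets on the accept queue passes SINGLE_DEPTH_NESTING, so
      the nested lock uses a distinct lockdep subclass; with level == 0
      this behaves like plain lock_sock():
      
          static void __vsock_release(struct sock *sk, int level)
          {
                  struct sock *pending;
      
                  lock_sock_nested(sk, level);
                  /* ... tear down the socket state ... */
                  while ((pending = vsock_dequeue_accept(sk)) != NULL) {
                          sock_set_flag(pending, SOCK_DONE);
                          __vsock_release(pending, SINGLE_DEPTH_NESTING);
                  }
                  release_sock(sk);
          }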
      Tested-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 15 Jun 2019, 1 commit
    • vsock: correct removal of socket from the list · d5afa82c
      Committed by Sunil Muthuswamy
      The current vsock code for removing a socket from the list is both
      racy and inefficient. It takes the lock, checks whether the socket is
      in the list, drops the lock and, if the socket was on the list, takes
      the lock again and deletes it. This is racy because as soon as the
      lock is dropped after the presence check, that condition can no
      longer be relied upon for any decision. It is also inefficient
      because, when the socket is present in the list, the lock is taken
      twice. Fix this by checking for presence and deleting within a single
      critical section, as sketched below.
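      
      A sketch of the corrected pattern (modeled on the bound-socket helper
      in af_vsock.c; the other list variants follow the same shape):
      
          /* Check and delete within one critical section instead of
           * check, drop the lock, retake it, then delete. */
          void vsock_remove_bound(struct vsock_sock *vsk)
          {
                  spin_lock_bh(&vsock_table_lock);
                  if (__vsock_in_bound_table(vsk))
                          __vsock_remove_bound(vsk);
                  spin_unlock_bh(&vsock_table_lock);
          }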
      Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 05 Jun 2019, 1 commit
  23. 04 Feb 2019, 1 commit
  24. 16 Jan 2019, 1 commit
  25. 15 Dec 2018, 1 commit
  26. 08 Aug 2018, 1 commit
    • vsock: split dwork to avoid reinitializations · 455f05ec
      Committed by Cong Wang
      syzbot reported that we reinitialize an active delayed
      work in vsock_stream_connect():
      
      	ODEBUG: init active (active state 0) object type: timer_list hint:
      	delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
      	WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
      	debug_print_object+0x16a/0x210 lib/debugobjects.c:326
      
      The pattern is apparently wrong: we should initialize the delayed
      work only once and then repeatedly schedule it. So we move the
      initialization out to the allocation side. And to avoid confusion,
      we split the shared dwork into two separate works instead of reusing
      the same one, as sketched below.
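      
      A minimal sketch of the resulting pattern (the work and handler names
      follow the commit description but are assumptions, not the verbatim
      patch):
      
          /* Initialized once, at socket allocation time in __vsock_create(). */
          INIT_DELAYED_WORK(&vsk->connect_work, vsock_connect_timeout);
          INIT_DELAYED_WORK(&vsk->pending_work, vsock_pending_work);
      
          /* Later, e.g. in vsock_stream_connect(): only (re)schedule,
           * never re-initialize. */
          schedule_delayed_work(&vsk->connect_work, timeout);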
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com>
      Cc: Andy King <acking@vmware.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 29 Jun 2018, 1 commit
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Committed by Linus Torvalds
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  28. 26 May 2018, 1 commit