1. 10 11月, 2019 1 次提交
  2. 08 10月, 2019 1 次提交
    • D
      vsock: Fix a lockdep warning in __vsock_release() · 3c1f0704
      Dexuan Cui 提交于
      [ Upstream commit 0d9138ffac24cf8b75366ede3a68c951e6dcc575 ]
      
      Lockdep is unhappy if two locks from the same class are held.
      
      Fix the below warning for hyperv and virtio sockets (vmci socket code
      doesn't have the issue) by using lock_sock_nested() when __vsock_release()
      is called recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.3.0+ #1 Not tainted
      --------------------------------------------
      server/1795 is trying to acquire lock:
      ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
      
      but task is already holding lock:
      ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_VSOCK);
        lock(sk_lock-AF_VSOCK);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by server/1795:
       #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
       #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      stack backtrace:
      CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
      Call Trace:
       dump_stack+0x67/0x90
       __lock_acquire.cold.67+0xd2/0x20b
       lock_acquire+0xb5/0x1c0
       lock_sock_nested+0x6d/0x90
       hvs_release+0x10/0x120 [hv_sock]
       __vsock_release+0x24/0xf0 [vsock]
       __vsock_release+0xa0/0xf0 [vsock]
       vsock_release+0x12/0x30 [vsock]
       __sock_release+0x37/0xa0
       sock_close+0x14/0x20
       __fput+0xc1/0x250
       task_work_run+0x98/0xc0
       do_exit+0x344/0xc60
       do_group_exit+0x47/0xb0
       get_signal+0x15c/0xc50
       do_signal+0x30/0x720
       exit_to_usermode_loop+0x50/0xa0
       do_syscall_64+0x24e/0x270
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f4184e85f31
      Tested-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c1f0704
  3. 04 8月, 2019 1 次提交
  4. 08 8月, 2018 1 次提交
    • C
      vsock: split dwork to avoid reinitializations · 455f05ec
      Cong Wang 提交于
      syzbot reported that we reinitialize an active delayed
      work in vsock_stream_connect():
      
      	ODEBUG: init active (active state 0) object type: timer_list hint:
      	delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
      	WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
      	debug_print_object+0x16a/0x210 lib/debugobjects.c:326
      
      The pattern is apparently wrong, we should only initialize
      the dealyed work once and could repeatly schedule it. So we
      have to move out the initializations to allocation side.
      And to avoid confusion, we can split the shared dwork
      into two, instead of re-using the same one.
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com>
      Cc: Andy king <acking@vmware.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      455f05ec
  5. 29 6月, 2018 1 次提交
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  6. 26 5月, 2018 1 次提交
  7. 17 4月, 2018 1 次提交
    • S
      VSOCK: make af_vsock.ko removable again · 05e489b1
      Stefan Hajnoczi 提交于
      Commit c1eef220 ("vsock: always call
      vsock_init_tables()") introduced a module_init() function without a
      corresponding module_exit() function.
      
      Modules with an init function can only be removed if they also have an
      exit function.  Therefore the vsock module was considered "permanent"
      and could not be removed.
      
      This patch adds an empty module_exit() function so that "rmmod vsock"
      works.  No explicit cleanup is required because:
      
      1. Transports call vsock_core_exit() upon exit and cannot be removed
         while sockets are still alive.
      2. vsock_diag.ko does not perform any action that requires cleanup by
         vsock.ko.
      
      Fixes: c1eef220 ("vsock: always call vsock_init_tables()")
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: NJorgen Hansen <jhansen@vmware.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05e489b1
  8. 13 2月, 2018 1 次提交
    • D
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko 提交于
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  9. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  10. 27 1月, 2018 1 次提交
  11. 28 11月, 2017 1 次提交
  12. 26 10月, 2017 1 次提交
  13. 06 10月, 2017 3 次提交
  14. 23 5月, 2017 1 次提交
  15. 22 3月, 2017 1 次提交
  16. 10 3月, 2017 1 次提交
    • D
      net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      David Howells 提交于
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set are
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_rcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdfbabfb
  17. 02 3月, 2017 1 次提交
  18. 27 9月, 2016 1 次提交
  19. 02 8月, 2016 2 次提交
  20. 27 6月, 2016 1 次提交
  21. 06 5月, 2016 1 次提交
    • I
      VSOCK: do not disconnect socket when peer has shutdown SEND only · dedc58e0
      Ian Campbell 提交于
      The peer may be expecting a reply having sent a request and then done a
      shutdown(SHUT_WR), so tearing down the whole socket at this point seems
      wrong and breaks for me with a client which does a SHUT_WR.
      
      Looking at other socket family's stream_recvmsg callbacks doing a shutdown
      here does not seem to be the norm and removing it does not seem to have
      had any adverse effects that I can see.
      
      I'm using Stefan's RFC virtio transport patches, I'm unsure of the impact
      on the vmci transport.
      Signed-off-by: NIan Campbell <ian.campbell@docker.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Cc: Andy King <acking@vmware.com>
      Cc: Dmitry Torokhov <dtor@vmware.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Cc: Adit Ranadive <aditr@vmware.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dedc58e0
  22. 23 3月, 2016 2 次提交
  23. 13 2月, 2016 1 次提交
    • L
      vsock: Fix blocking ops call in prepare_to_wait · 59888180
      Laura Abbott 提交于
      We receoved a bug report from someone using vmware:
      
      WARNING: CPU: 3 PID: 660 at kernel/sched/core.c:7389
      __might_sleep+0x7d/0x90()
      do not call blocking ops when !TASK_RUNNING; state=1 set at
      [<ffffffff810fa68d>] prepare_to_wait+0x2d/0x90
      Modules linked in: vmw_vsock_vmci_transport vsock snd_seq_midi
      snd_seq_midi_event snd_ens1371 iosf_mbi gameport snd_rawmidi
      snd_ac97_codec ac97_bus snd_seq coretemp snd_seq_device snd_pcm
      snd_timer snd soundcore ppdev crct10dif_pclmul crc32_pclmul
      ghash_clmulni_intel vmw_vmci vmw_balloon i2c_piix4 shpchp parport_pc
      parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc btrfs
      xor raid6_pq 8021q garp stp llc mrp crc32c_intel serio_raw mptspi vmwgfx
      drm_kms_helper ttm drm scsi_transport_spi mptscsih e1000 ata_generic
      mptbase pata_acpi
      CPU: 3 PID: 660 Comm: vmtoolsd Not tainted
      4.2.0-0.rc1.git3.1.fc23.x86_64 #1
      Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
      Reference Platform, BIOS 6.00 05/20/2014
       0000000000000000 0000000049e617f3 ffff88006ac37ac8 ffffffff818641f5
       0000000000000000 ffff88006ac37b20 ffff88006ac37b08 ffffffff810ab446
       ffff880068009f40 ffffffff81c63bc0 0000000000000061 0000000000000000
      Call Trace:
       [<ffffffff818641f5>] dump_stack+0x4c/0x65
       [<ffffffff810ab446>] warn_slowpath_common+0x86/0xc0
       [<ffffffff810ab4d5>] warn_slowpath_fmt+0x55/0x70
       [<ffffffff8112551d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
       [<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
       [<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
       [<ffffffff810da2bd>] __might_sleep+0x7d/0x90
       [<ffffffff812163b3>] __might_fault+0x43/0xa0
       [<ffffffff81430477>] copy_from_iter+0x87/0x2a0
       [<ffffffffa039460a>] __qp_memcpy_to_queue+0x9a/0x1b0 [vmw_vmci]
       [<ffffffffa0394740>] ? qp_memcpy_to_queue+0x20/0x20 [vmw_vmci]
       [<ffffffffa0394757>] qp_memcpy_to_queue_iov+0x17/0x20 [vmw_vmci]
       [<ffffffffa0394d50>] qp_enqueue_locked+0xa0/0x140 [vmw_vmci]
       [<ffffffffa039593f>] vmci_qpair_enquev+0x4f/0xd0 [vmw_vmci]
       [<ffffffffa04847bb>] vmci_transport_stream_enqueue+0x1b/0x20
      [vmw_vsock_vmci_transport]
       [<ffffffffa047ae05>] vsock_stream_sendmsg+0x2c5/0x320 [vsock]
       [<ffffffff810fabd0>] ? wake_atomic_t_function+0x70/0x70
       [<ffffffff81702af8>] sock_sendmsg+0x38/0x50
       [<ffffffff81702ff4>] SYSC_sendto+0x104/0x190
       [<ffffffff8126e25a>] ? vfs_read+0x8a/0x140
       [<ffffffff817042ee>] SyS_sendto+0xe/0x10
       [<ffffffff8186d9ae>] entry_SYSCALL_64_fastpath+0x12/0x76
      
      transport->stream_enqueue may call copy_to_user so it should
      not be called inside a prepare_to_wait. Narrow the scope of
      the prepare_to_wait to avoid the bad call. This also applies
      to vsock_stream_recvmsg as well.
      Reported-by: NVinson Lee <vlee@freedesktop.org>
      Tested-by: NVinson Lee <vlee@freedesktop.org>
      Signed-off-by: NLaura Abbott <labbott@fedoraproject.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59888180
  24. 09 12月, 2015 1 次提交
  25. 04 12月, 2015 1 次提交
  26. 02 11月, 2015 1 次提交
    • S
      VSOCK: define VSOCK_SS_LISTEN once only · ea3803c1
      Stefan Hajnoczi 提交于
      The SS_LISTEN socket state is defined by both af_vsock.c and
      vmci_transport.c.  This is risky since the value could be changed in one
      file and the other would be out of sync.
      
      Rename from SS_LISTEN to VSOCK_SS_LISTEN since the constant is not part
      of enum socket_state (SS_CONNECTED, ...).  This way it is clear that the
      constant is vsock-specific.
      
      The big text reflow in af_vsock.c was necessary to keep to the maximum
      line length.  Text is unchanged except for s/SS_LISTEN/VSOCK_SS_LISTEN/.
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea3803c1
  27. 21 10月, 2015 1 次提交
  28. 11 5月, 2015 1 次提交
  29. 03 3月, 2015 1 次提交
  30. 24 11月, 2014 1 次提交
  31. 06 5月, 2014 1 次提交
  32. 21 11月, 2013 1 次提交
    • H
      net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426
      Hannes Frederic Sowa 提交于
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3d33426
  33. 06 8月, 2013 1 次提交
  34. 28 7月, 2013 1 次提交
  35. 24 6月, 2013 2 次提交