1. 08 10月, 2019 1 次提交
    • D
      vsock: Fix a lockdep warning in __vsock_release() · 3c1f0704
      Dexuan Cui 提交于
      [ Upstream commit 0d9138ffac24cf8b75366ede3a68c951e6dcc575 ]
      
      Lockdep is unhappy if two locks from the same class are held.
      
      Fix the below warning for hyperv and virtio sockets (vmci socket code
      doesn't have the issue) by using lock_sock_nested() when __vsock_release()
      is called recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.3.0+ #1 Not tainted
      --------------------------------------------
      server/1795 is trying to acquire lock:
      ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]
      
      but task is already holding lock:
      ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(sk_lock-AF_VSOCK);
        lock(sk_lock-AF_VSOCK);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by server/1795:
       #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
       #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]
      
      stack backtrace:
      CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
      Call Trace:
       dump_stack+0x67/0x90
       __lock_acquire.cold.67+0xd2/0x20b
       lock_acquire+0xb5/0x1c0
       lock_sock_nested+0x6d/0x90
       hvs_release+0x10/0x120 [hv_sock]
       __vsock_release+0x24/0xf0 [vsock]
       __vsock_release+0xa0/0xf0 [vsock]
       vsock_release+0x12/0x30 [vsock]
       __sock_release+0x37/0xa0
       sock_close+0x14/0x20
       __fput+0xc1/0x250
       task_work_run+0x98/0xc0
       do_exit+0x344/0xc60
       do_group_exit+0x47/0xb0
       get_signal+0x15c/0xc50
       do_signal+0x30/0x720
       exit_to_usermode_loop+0x50/0xa0
       do_syscall_64+0x24e/0x270
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f4184e85f31
      Tested-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c1f0704
  2. 16 9月, 2019 1 次提交
    • D
      hv_sock: Fix hang when a connection is closed · 91a71a61
      Dexuan Cui 提交于
      [ Upstream commit 685703b497bacea8765bb409d6b73455b73c540e ]
      
      There is a race condition for an established connection that is being closed
      by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
      'remove_sock' is false):
      
      1 for the initial value;
      1 for the sk being in the bound list;
      1 for the sk being in the connected list;
      1 for the delayed close_work.
      
      After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
      decrease the refcnt to 3.
      
      Concurrently, hvs_close_connection() runs in another thread:
        calls vsock_remove_sock() to decrease the refcnt by 2;
        call sock_put() to decrease the refcnt to 0, and free the sk;
        next, the "release_sock(sk)" may hang due to use-after-free.
      
      In the above, after hvs_release() finishes, if hvs_close_connection() runs
      faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
      because at the beginning of hvs_close_connection(), the refcnt is still 4.
      
      The issue can be resolved if an extra reference is taken when the
      connection is established.
      
      Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Reviewed-by: NSunil Muthuswamy <sunilmut@microsoft.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      91a71a61
  3. 04 8月, 2019 2 次提交
    • S
      vsock: correct removal of socket from the list · 8a474bc4
      Sunil Muthuswamy 提交于
      commit d5afa82c977ea06f7119058fa0eb8519ea501031 upstream.
      
      The current vsock code for removal of socket from the list is both
      subject to race and inefficient. It takes the lock, checks whether
      the socket is in the list, drops the lock and if the socket was on the
      list, deletes it from the list. This is subject to race because as soon
      as the lock is dropped once it is checked for presence, that condition
      cannot be relied upon for any decision. It is also inefficient because
      if the socket is present in the list, it takes the lock twice.
      Signed-off-by: NSunil Muthuswamy <sunilmut@microsoft.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a474bc4
    • S
      hv_sock: Add support for delayed close · 9d3586bc
      Sunil Muthuswamy 提交于
      commit a9eeb998c28d5506616426bd3a216bd5735a18b8 upstream.
      
      Currently, hvsock does not implement any delayed or background close
      logic. Whenever the hvsock socket is closed, a FIN is sent to the peer, and
      the last reference to the socket is dropped, which leads to a call to
      .destruct where the socket can hang indefinitely waiting for the peer to
      close it's side. The can cause the user application to hang in the close()
      call.
      
      This change implements proper STREAM(TCP) closing handshake mechanism by
      sending the FIN to the peer and the waiting for the peer's FIN to arrive
      for a given timeout. On timeout, it will try to terminate the connection
      (i.e. a RST). This is in-line with other socket providers such as virtio.
      
      This change does not address the hang in the vmbus_hvsock_device_unregister
      where it waits indefinitely for the host to rescind the channel. That
      should be taken up as a separate fix.
      Signed-off-by: NSunil Muthuswamy <sunilmut@microsoft.com>
      Reviewed-by: NDexuan Cui <decui@microsoft.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d3586bc
  4. 31 7月, 2019 1 次提交
    • S
      hvsock: fix epollout hang from race condition · 49fb03de
      Sunil Muthuswamy 提交于
      [ Upstream commit cb359b60416701c8bed82fec79de25a144beb893 ]
      
      Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will
      not return even when the hvsock socket is writable, under some race
      condition. This can happen under the following sequence:
      - fd = socket(hvsocket)
      - fd_out = dup(fd)
      - fd_in = dup(fd)
      - start a writer thread that writes data to fd_out with a combination of
        epoll_wait(fd_out, EPOLLOUT) and
      - start a reader thread that reads data from fd_in with a combination of
        epoll_wait(fd_in, EPOLLIN)
      - On the host, there are two threads that are reading/writing data to the
        hvsocket
      
      stack:
      hvs_stream_has_space
      hvs_notify_poll_out
      vsock_poll
      sock_poll
      ep_poll
      
      Race condition:
      check for epollout from ep_poll():
      	assume no writable space in the socket
      	hvs_stream_has_space() returns 0
      check for epollin from ep_poll():
      	assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE)
      	hvs_stream_has_space() will clear the channel pending send size
      	host will not notify the guest because the pending send size has
      		been cleared and so the hvsocket will never mark the
      		socket writable
      
      Now, the EPOLLOUT will never return even if the socket write buffer is
      empty.
      
      The fix is to set the pending size to the default size and never change it.
      This way the host will always notify the guest whenever the writable space
      is bigger than the pending size. The host is already optimized to *only*
      notify the guest when the pending size threshold boundary is crossed and
      not everytime.
      
      This change also reduces the cpu usage somewhat since hv_stream_has_space()
      is in the hotpath of send:
      vsock_stream_sendmsg()->hv_stream_has_space()
      Earlier hv_stream_has_space was setting/clearing the pending size on every
      call.
      Signed-off-by: NSunil Muthuswamy <sunilmut@microsoft.com>
      Reviewed-by: NDexuan Cui <decui@microsoft.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      49fb03de
  5. 22 6月, 2019 1 次提交
  6. 26 5月, 2019 2 次提交
    • J
      vsock/virtio: Initialize core virtio vsock before registering the driver · 9f12f4c9
      Jorge E. Moreira 提交于
      [ Upstream commit ba95e5dfd36647622d8897a2a0470dde60e59ffd ]
      
      Avoid a race in which static variables in net/vmw_vsock/af_vsock.c are
      accessed (while handling interrupts) before they are initialized.
      
      [    4.201410] BUG: unable to handle kernel paging request at ffffffffffffffe8
      [    4.207829] IP: vsock_addr_equals_addr+0x3/0x20
      [    4.211379] PGD 28210067 P4D 28210067 PUD 28212067 PMD 0
      [    4.211379] Oops: 0000 [#1] PREEMPT SMP PTI
      [    4.211379] Modules linked in:
      [    4.211379] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 4.14.106-419297-gd7e28cc1f241 #1
      [    4.211379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [    4.211379] Workqueue: virtio_vsock virtio_transport_rx_work
      [    4.211379] task: ffffa3273d175280 task.stack: ffffaea1800e8000
      [    4.211379] RIP: 0010:vsock_addr_equals_addr+0x3/0x20
      [    4.211379] RSP: 0000:ffffaea1800ebd28 EFLAGS: 00010286
      [    4.211379] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffffffffb94e42f0
      [    4.211379] RDX: 0000000000000400 RSI: ffffffffffffffe0 RDI: ffffaea1800ebdd0
      [    4.211379] RBP: ffffaea1800ebd58 R08: 0000000000000001 R09: 0000000000000001
      [    4.211379] R10: 0000000000000000 R11: ffffffffb89d5d60 R12: ffffaea1800ebdd0
      [    4.211379] R13: 00000000828cbfbf R14: 0000000000000000 R15: ffffaea1800ebdc0
      [    4.211379] FS:  0000000000000000(0000) GS:ffffa3273fd00000(0000) knlGS:0000000000000000
      [    4.211379] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    4.211379] CR2: ffffffffffffffe8 CR3: 000000002820e001 CR4: 00000000001606e0
      [    4.211379] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    4.211379] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    4.211379] Call Trace:
      [    4.211379]  ? vsock_find_connected_socket+0x6c/0xe0
      [    4.211379]  virtio_transport_recv_pkt+0x15f/0x740
      [    4.211379]  ? detach_buf+0x1b5/0x210
      [    4.211379]  virtio_transport_rx_work+0xb7/0x140
      [    4.211379]  process_one_work+0x1ef/0x480
      [    4.211379]  worker_thread+0x312/0x460
      [    4.211379]  kthread+0x132/0x140
      [    4.211379]  ? process_one_work+0x480/0x480
      [    4.211379]  ? kthread_destroy_worker+0xd0/0xd0
      [    4.211379]  ret_from_fork+0x35/0x40
      [    4.211379] Code: c7 47 08 00 00 00 00 66 c7 07 28 00 c7 47 08 ff ff ff ff c7 47 04 ff ff ff ff c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 8b 47 08 <3b> 46 08 75 0a 8b 47 04 3b 46 04 0f 94 c0 c3 31 c0 c3 90 66 2e
      [    4.211379] RIP: vsock_addr_equals_addr+0x3/0x20 RSP: ffffaea1800ebd28
      [    4.211379] CR2: ffffffffffffffe8
      [    4.211379] ---[ end trace f31cc4a2e6df3689 ]---
      [    4.211379] Kernel panic - not syncing: Fatal exception in interrupt
      [    4.211379] Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [    4.211379] Rebooting in 5 seconds..
      
      Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug")
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Stefano Garzarella <sgarzare@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: kvm@vger.kernel.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: netdev@vger.kernel.org
      Cc: kernel-team@android.com
      Cc: stable@vger.kernel.org [4.9+]
      Signed-off-by: NJorge E. Moreira <jemoreira@google.com>
      Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Acked-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f12f4c9
    • S
      vsock/virtio: free packets during the socket release · 4af8a327
      Stefano Garzarella 提交于
      [ Upstream commit ac03046ece2b158ebd204dfc4896fd9f39f0e6c8 ]
      
      When the socket is released, we should free all packets
      queued in the per-socket list in order to avoid a memory
      leak.
      Signed-off-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4af8a327
  7. 02 5月, 2019 1 次提交
    • A
      vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock · 8e596397
      Adalbert Lazăr 提交于
      [ Upstream commit 4c404ce23358d5d8fbdeb7a6021a9b33d3c3c167 ]
      
      Previous to commit 22b5c0b63f32 ("vsock/virtio: fix kernel panic
      after device hot-unplug"), vsock_core_init() was called from
      virtio_vsock_probe(). Now, virtio_transport_reset_no_sock() can be called
      before vsock_core_init() has the chance to run.
      
      [Wed Feb 27 14:17:09 2019] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
      [Wed Feb 27 14:17:09 2019] #PF error: [normal kernel read fault]
      [Wed Feb 27 14:17:09 2019] PGD 0 P4D 0
      [Wed Feb 27 14:17:09 2019] Oops: 0000 [#1] SMP PTI
      [Wed Feb 27 14:17:09 2019] CPU: 3 PID: 59 Comm: kworker/3:1 Not tainted 5.0.0-rc7-390-generic-hvi #390
      [Wed Feb 27 14:17:09 2019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [Wed Feb 27 14:17:09 2019] Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport]
      [Wed Feb 27 14:17:09 2019] RIP: 0010:virtio_transport_reset_no_sock+0x8c/0xc0 [vmw_vsock_virtio_transport_common]
      [Wed Feb 27 14:17:09 2019] Code: 35 8b 4f 14 48 8b 57 08 31 f6 44 8b 4f 10 44 8b 07 48 8d 7d c8 e8 84 f8 ff ff 48 85 c0 48 89 c3 74 2a e8 f7 31 03 00 48 89 df <48> 8b 80 10 01 00 00 e8 68 fb 69 ed 48 8b 75 f0 65 48 33 34 25 28
      [Wed Feb 27 14:17:09 2019] RSP: 0018:ffffb42701ab7d40 EFLAGS: 00010282
      [Wed Feb 27 14:17:09 2019] RAX: 0000000000000000 RBX: ffff9d79637ee080 RCX: 0000000000000003
      [Wed Feb 27 14:17:09 2019] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9d79637ee080
      [Wed Feb 27 14:17:09 2019] RBP: ffffb42701ab7d78 R08: ffff9d796fae70e0 R09: ffff9d796f403500
      [Wed Feb 27 14:17:09 2019] R10: ffffb42701ab7d90 R11: 0000000000000000 R12: ffff9d7969d09240
      [Wed Feb 27 14:17:09 2019] R13: ffff9d79624e6840 R14: ffff9d7969d09318 R15: ffff9d796d48ff80
      [Wed Feb 27 14:17:09 2019] FS:  0000000000000000(0000) GS:ffff9d796fac0000(0000) knlGS:0000000000000000
      [Wed Feb 27 14:17:09 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [Wed Feb 27 14:17:09 2019] CR2: 0000000000000110 CR3: 0000000427f22000 CR4: 00000000000006e0
      [Wed Feb 27 14:17:09 2019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [Wed Feb 27 14:17:09 2019] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [Wed Feb 27 14:17:09 2019] Call Trace:
      [Wed Feb 27 14:17:09 2019]  virtio_transport_recv_pkt+0x63/0x820 [vmw_vsock_virtio_transport_common]
      [Wed Feb 27 14:17:09 2019]  ? kfree+0x17e/0x190
      [Wed Feb 27 14:17:09 2019]  ? detach_buf_split+0x145/0x160
      [Wed Feb 27 14:17:09 2019]  ? __switch_to_asm+0x40/0x70
      [Wed Feb 27 14:17:09 2019]  virtio_transport_rx_work+0xa0/0x106 [vmw_vsock_virtio_transport]
      [Wed Feb 27 14:17:09 2019] NET: Registered protocol family 40
      [Wed Feb 27 14:17:09 2019]  process_one_work+0x167/0x410
      [Wed Feb 27 14:17:09 2019]  worker_thread+0x4d/0x460
      [Wed Feb 27 14:17:09 2019]  kthread+0x105/0x140
      [Wed Feb 27 14:17:09 2019]  ? rescuer_thread+0x360/0x360
      [Wed Feb 27 14:17:09 2019]  ? kthread_destroy_worker+0x50/0x50
      [Wed Feb 27 14:17:09 2019]  ret_from_fork+0x35/0x40
      [Wed Feb 27 14:17:09 2019] Modules linked in: vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common input_leds vsock serio_raw i2c_piix4 mac_hid qemu_fw_cfg autofs4 cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net psmouse drm net_failover pata_acpi virtio_blk failover floppy
      
      Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug")
      Reported-by: NAlexandru Herghelegiu <aherghelegiu@bitdefender.com>
      Signed-off-by: NAdalbert Lazăr <alazar@bitdefender.com>
      Co-developed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8e596397
  8. 14 3月, 2019 2 次提交
  9. 23 2月, 2019 1 次提交
  10. 10 1月, 2019 1 次提交
  11. 08 8月, 2018 1 次提交
    • C
      vsock: split dwork to avoid reinitializations · 455f05ec
      Cong Wang 提交于
      syzbot reported that we reinitialize an active delayed
      work in vsock_stream_connect():
      
      	ODEBUG: init active (active state 0) object type: timer_list hint:
      	delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
      	WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
      	debug_print_object+0x16a/0x210 lib/debugobjects.c:326
      
      The pattern is apparently wrong, we should only initialize
      the dealyed work once and could repeatly schedule it. So we
      have to move out the initializations to allocation side.
      And to avoid confusion, we can split the shared dwork
      into two, instead of re-using the same one.
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com>
      Cc: Andy king <acking@vmware.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      455f05ec
  12. 29 6月, 2018 1 次提交
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  13. 22 6月, 2018 1 次提交
  14. 26 5月, 2018 1 次提交
  15. 17 4月, 2018 1 次提交
    • S
      VSOCK: make af_vsock.ko removable again · 05e489b1
      Stefan Hajnoczi 提交于
      Commit c1eef220 ("vsock: always call
      vsock_init_tables()") introduced a module_init() function without a
      corresponding module_exit() function.
      
      Modules with an init function can only be removed if they also have an
      exit function.  Therefore the vsock module was considered "permanent"
      and could not be removed.
      
      This patch adds an empty module_exit() function so that "rmmod vsock"
      works.  No explicit cleanup is required because:
      
      1. Transports call vsock_core_exit() upon exit and cannot be removed
         while sockets are still alive.
      2. vsock_diag.ko does not perform any action that requires cleanup by
         vsock.ko.
      
      Fixes: c1eef220 ("vsock: always call vsock_init_tables()")
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: NJorgen Hansen <jhansen@vmware.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05e489b1
  16. 13 2月, 2018 1 次提交
    • D
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko 提交于
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  17. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  18. 27 1月, 2018 1 次提交
  19. 06 12月, 2017 1 次提交
  20. 29 11月, 2017 1 次提交
  21. 28 11月, 2017 1 次提交
  22. 26 11月, 2017 1 次提交
  23. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  24. 26 10月, 2017 1 次提交
  25. 21 10月, 2017 1 次提交
    • D
      hv_sock: add locking in the open/close/release code paths · b4562ca7
      Dexuan Cui 提交于
      Without the patch, when hvs_open_connection() hasn't completely established
      a connection (e.g. it has changed sk->sk_state to SS_CONNECTED, but hasn't
      inserted the sock into the connected queue), vsock_stream_connect() may see
      the sk_state change and return the connection to the userspace, and next
      when the userspace closes the connection quickly, hvs_release() may not see
      the connection in the connected queue; finally hvs_open_connection()
      inserts the connection into the queue, but we won't be able to purge the
      connection for ever.
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
      Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4562ca7
  26. 06 10月, 2017 4 次提交
    • S
      VSOCK: add sock_diag interface · 413a4317
      Stefan Hajnoczi 提交于
      This patch adds the sock_diag interface for querying sockets from
      userspace.  Tools like ss(8) and netstat(8) can use this interface to
      list open sockets.
      
      The userspace ABI is defined in <linux/vm_sockets_diag.h> and includes
      netlink request and response structs.  The request can query sockets
      based on their sk_state (e.g. listening sockets only) and the response
      contains socket information fields including the local/remote addresses,
      inode number, etc.
      
      This patch does not dump VMCI pending sockets because I have only tested
      the virtio transport, which does not use pending sockets.  Support can
      be added later by extending vsock_diag_dump() if needed by VMCI users.
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      413a4317
    • S
      VSOCK: use TCP state constants for sk_state · 3b4477d2
      Stefan Hajnoczi 提交于
      There are two state fields: socket->state and sock->sk_state.  The
      socket->state field uses SS_UNCONNECTED, SS_CONNECTED, etc while the
      sock->sk_state typically uses values that match TCP state constants
      (TCP_CLOSE, TCP_ESTABLISHED).  AF_VSOCK does not follow this convention
      and instead uses SS_* constants for both fields.
      
      The sk_state field will be exposed to userspace through the vsock_diag
      interface for ss(8), netstat(8), and other programs.
      
      This patch switches sk_state to TCP state constants so that the meaning
      of this field is consistent with other address families.  Not just
      AF_INET and AF_INET6 use the TCP constants, AF_UNIX and others do too.
      
      The following mapping was used to convert the code:
      
        SS_FREE -> TCP_CLOSE
        SS_UNCONNECTED -> TCP_CLOSE
        SS_CONNECTING -> TCP_SYN_SENT
        SS_CONNECTED -> TCP_ESTABLISHED
        SS_DISCONNECTING -> TCP_CLOSING
        VSOCK_SS_LISTEN -> TCP_LISTEN
      
      In __vsock_create() the sk_state initialization was dropped because
      sock_init_data() already initializes sk_state to TCP_CLOSE.
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b4477d2
    • S
      VSOCK: move __vsock_in_bound/connected_table() to af_vsock.h · bf359b81
      Stefan Hajnoczi 提交于
      The vsock_diag.ko module will need to check socket table membership.
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf359b81
    • S
      VSOCK: export socket tables for sock_diag interface · 44f20980
      Stefan Hajnoczi 提交于
      The socket table symbols need to be exported from vsock.ko so that the
      vsock_diag.ko module will be able to traverse sockets.
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44f20980
  27. 20 9月, 2017 1 次提交
  28. 29 8月, 2017 1 次提交
    • D
      hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK) · ae0078fc
      Dexuan Cui 提交于
      Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
      mechanism between the host and the guest. It uses VMBus ringbuffer as the
      transportation layer.
      
      With hv_sock, applications between the host (Windows 10, Windows Server
      2016 or newer) and the guest can talk with each other using the traditional
      socket APIs.
      
      More info about Hyper-V Sockets is available here:
      
      "Make your own integration services":
      https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service
      
      The patch implements the necessary support in Linux guest by introducing a new
      vsock transport for AF_VSOCK.
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Andy King <acking@vmware.com>
      Cc: Dmitry Torokhov <dtor@vmware.com>
      Cc: George Zhang <georgezhang@vmware.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Cc: Reilly Grant <grantr@vmware.com>
      Cc: Asias He <asias@redhat.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
      Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae0078fc
  29. 21 6月, 2017 1 次提交
  30. 16 6月, 2017 2 次提交
    • J
      networking: make skb_put & friends return void pointers · 4df864c1
      Johannes Berg 提交于
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      which actually doesn't cover pskb_put since there are only three
      users overall.
      
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4df864c1
    • J
      networking: introduce and use skb_put_data() · 59ae1d12
      Johannes Berg 提交于
      A common pattern with skb_put() is to just want to memcpy()
      some data into the new space, introduce skb_put_data() for
      this.
      
      An spatch similar to the one for skb_put_zero() converts many
      of the places using it:
      
          @@
          identifier p, p2;
          expression len, skb, data;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_data(skb, data, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_data(skb, data, len);
          )
          (
          p2 = (t2)p;
          -memcpy(p2, data, len);
          |
          -memcpy(p, data, len);
          )
      
          @@
          type t, t2;
          identifier p, p2;
          expression skb, data;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_data(skb, data, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_data(skb, data, sizeof(t));
          )
          (
          p2 = (t2)p;
          -memcpy(p2, data, sizeof(*p));
          |
          -memcpy(p, data, sizeof(*p));
          )
      
          @@
          expression skb, len, data;
          @@
          -memcpy(skb_put(skb, len), data, len);
          +skb_put_data(skb, data, len);
      
      (again, manually post-processed to retain some comments)
      Reviewed-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59ae1d12
  31. 23 5月, 2017 1 次提交
  32. 03 5月, 2017 1 次提交
  33. 25 4月, 2017 1 次提交