1. 16 2月, 2018 17 次提交
    • L
      RDMA/uverbs: Fix circular locking dependency · 1ff5325c
      Leon Romanovsky 提交于
      Avoid circular locking dependency by calling
      to uobj_alloc_commit() outside of xrcd_tree_mutex lock.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.15.0+ #87 Not tainted
      ------------------------------------------------------
      syzkaller401056/269 is trying to acquire lock:
       (&uverbs_dev->xrcd_tree_mutex){+.+.}, at: [<000000006c12d2cd>] uverbs_free_xrcd+0xd2/0x360
      
      but task is already holding lock:
       (&ucontext->uobjects_lock){+.+.}, at: [<00000000da010f09>] uverbs_cleanup_ucontext+0x168/0x730
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&ucontext->uobjects_lock){+.+.}:
             __mutex_lock+0x111/0x1720
             rdma_alloc_commit_uobject+0x22c/0x600
             ib_uverbs_open_xrcd+0x61a/0xdd0
             ib_uverbs_write+0x7f9/0xef0
             __vfs_write+0x10d/0x700
             vfs_write+0x1b0/0x550
             SyS_write+0xc7/0x1a0
             entry_SYSCALL_64_fastpath+0x1e/0x8b
      
      -> #0 (&uverbs_dev->xrcd_tree_mutex){+.+.}:
             lock_acquire+0x19d/0x440
             __mutex_lock+0x111/0x1720
             uverbs_free_xrcd+0xd2/0x360
             remove_commit_idr_uobject+0x6d/0x110
             uverbs_cleanup_ucontext+0x2f0/0x730
             ib_uverbs_cleanup_ucontext.constprop.3+0x52/0x120
             ib_uverbs_close+0xf2/0x570
             __fput+0x2cd/0x8d0
             task_work_run+0xec/0x1d0
             do_exit+0x6a1/0x1520
             do_group_exit+0xe8/0x380
             SyS_exit_group+0x1e/0x20
             entry_SYSCALL_64_fastpath+0x1e/0x8b
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&ucontext->uobjects_lock);
                                     lock(&uverbs_dev->xrcd_tree_mutex);
                                     lock(&ucontext->uobjects_lock);
        lock(&uverbs_dev->xrcd_tree_mutex);
      
       *** DEADLOCK ***
      
      3 locks held by syzkaller401056/269:
       #0:  (&file->cleanup_mutex){+.+.}, at: [<00000000c9f0c252>] ib_uverbs_close+0xac/0x570
       #1:  (&ucontext->cleanup_rwsem){++++}, at: [<00000000b6994d49>] uverbs_cleanup_ucontext+0xf6/0x730
       #2:  (&ucontext->uobjects_lock){+.+.}, at: [<00000000da010f09>] uverbs_cleanup_ucontext+0x168/0x730
      
      stack backtrace:
      CPU: 0 PID: 269 Comm: syzkaller401056 Not tainted 4.15.0+ #87
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      Call Trace:
       dump_stack+0xde/0x164
       ? dma_virt_map_sg+0x22c/0x22c
       ? uverbs_cleanup_ucontext+0x168/0x730
       ? console_unlock+0x502/0xbd0
       print_circular_bug.isra.24+0x35e/0x396
       ? print_circular_bug_header+0x12e/0x12e
       ? find_usage_backwards+0x30/0x30
       ? entry_SYSCALL_64_fastpath+0x1e/0x8b
       validate_chain.isra.28+0x25d1/0x40c0
       ? check_usage+0xb70/0xb70
       ? graph_lock+0x160/0x160
       ? find_usage_backwards+0x30/0x30
       ? cyc2ns_read_end+0x10/0x10
       ? print_irqtrace_events+0x280/0x280
       ? __lock_acquire+0x93d/0x1630
       __lock_acquire+0x93d/0x1630
       lock_acquire+0x19d/0x440
       ? uverbs_free_xrcd+0xd2/0x360
       __mutex_lock+0x111/0x1720
       ? uverbs_free_xrcd+0xd2/0x360
       ? uverbs_free_xrcd+0xd2/0x360
       ? __mutex_lock+0x828/0x1720
       ? mutex_lock_io_nested+0x1550/0x1550
       ? uverbs_cleanup_ucontext+0x168/0x730
       ? __lock_acquire+0x9a9/0x1630
       ? mutex_lock_io_nested+0x1550/0x1550
       ? uverbs_cleanup_ucontext+0xf6/0x730
       ? lock_contended+0x11a0/0x11a0
       ? uverbs_free_xrcd+0xd2/0x360
       uverbs_free_xrcd+0xd2/0x360
       remove_commit_idr_uobject+0x6d/0x110
       uverbs_cleanup_ucontext+0x2f0/0x730
       ? sched_clock_cpu+0x18/0x200
       ? uverbs_close_fd+0x1c0/0x1c0
       ib_uverbs_cleanup_ucontext.constprop.3+0x52/0x120
       ib_uverbs_close+0xf2/0x570
       ? ib_uverbs_remove_one+0xb50/0xb50
       ? ib_uverbs_remove_one+0xb50/0xb50
       __fput+0x2cd/0x8d0
       task_work_run+0xec/0x1d0
       do_exit+0x6a1/0x1520
       ? fsnotify_first_mark+0x220/0x220
       ? exit_notify+0x9f0/0x9f0
       ? entry_SYSCALL_64_fastpath+0x5/0x8b
       ? entry_SYSCALL_64_fastpath+0x5/0x8b
       ? trace_hardirqs_on_thunk+0x1a/0x1c
       ? time_hardirqs_on+0x27/0x670
       ? time_hardirqs_off+0x27/0x490
       ? syscall_return_slowpath+0x6c/0x460
       ? entry_SYSCALL_64_fastpath+0x5/0x8b
       do_group_exit+0xe8/0x380
       SyS_exit_group+0x1e/0x20
       entry_SYSCALL_64_fastpath+0x1e/0x8b
      RIP: 0033:0x431ce9
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: <stable@vger.kernel.org> # 4.11
      Fixes: fd3c7904 ("IB/core: Change idr objects to use the new schema")
      Reported-by: NNoa Osherovich <noaos@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      1ff5325c
    • L
      RDMA/uverbs: Fix bad unlock balance in ib_uverbs_close_xrcd · 5c2e1c4f
      Leon Romanovsky 提交于
      There is no matching lock for this mutex. Git history suggests this is
      just a missed remnant from an earlier version of the function before
      this locking was moved into uverbs_free_xrcd.
      
      Originally this lock was protecting the xrcd_table_delete()
      
      =====================================
      WARNING: bad unlock balance detected!
      4.15.0+ #87 Not tainted
      -------------------------------------
      syzkaller223405/269 is trying to release lock (&uverbs_dev->xrcd_tree_mutex) at:
      [<00000000b8703372>] ib_uverbs_close_xrcd+0x195/0x1f0
      but there are no more locks to release!
      
      other info that might help us debug this:
      1 lock held by syzkaller223405/269:
       #0:  (&uverbs_dev->disassociate_srcu){....}, at: [<000000005af3b960>] ib_uverbs_write+0x265/0xef0
      
      stack backtrace:
      CPU: 0 PID: 269 Comm: syzkaller223405 Not tainted 4.15.0+ #87
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      Call Trace:
       dump_stack+0xde/0x164
       ? dma_virt_map_sg+0x22c/0x22c
       ? ib_uverbs_write+0x265/0xef0
       ? console_unlock+0x502/0xbd0
       ? ib_uverbs_close_xrcd+0x195/0x1f0
       print_unlock_imbalance_bug+0x131/0x160
       lock_release+0x59d/0x1100
       ? ib_uverbs_close_xrcd+0x195/0x1f0
       ? lock_acquire+0x440/0x440
       ? lock_acquire+0x440/0x440
       __mutex_unlock_slowpath+0x88/0x670
       ? wait_for_completion+0x4c0/0x4c0
       ? rdma_lookup_get_uobject+0x145/0x2f0
       ib_uverbs_close_xrcd+0x195/0x1f0
       ? ib_uverbs_open_xrcd+0xdd0/0xdd0
       ib_uverbs_write+0x7f9/0xef0
       ? cyc2ns_read_end+0x10/0x10
       ? ib_uverbs_open_xrcd+0xdd0/0xdd0
       ? uverbs_devnode+0x110/0x110
       ? cyc2ns_read_end+0x10/0x10
       ? cyc2ns_read_end+0x10/0x10
       ? sched_clock_cpu+0x18/0x200
       __vfs_write+0x10d/0x700
       ? uverbs_devnode+0x110/0x110
       ? kernel_read+0x170/0x170
       ? __fget+0x358/0x5d0
       ? security_file_permission+0x93/0x260
       vfs_write+0x1b0/0x550
       SyS_write+0xc7/0x1a0
       ? SyS_read+0x1a0/0x1a0
       ? trace_hardirqs_on_thunk+0x1a/0x1c
       entry_SYSCALL_64_fastpath+0x1e/0x8b
      RIP: 0033:0x4335c9
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: <stable@vger.kernel.org> # 4.11
      Fixes: fd3c7904 ("IB/core: Change idr objects to use the new schema")
      Reported-by: NNoa Osherovich <noaos@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      5c2e1c4f
    • L
      RDMA/restrack: Increment CQ restrack object before committing · 0cba0efc
      Leon Romanovsky 提交于
      Once the uobj is committed it is immediately possible another thread
      could destroy it, which worst case, can result in a use-after-free
      of the restrack objects.
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Fixes: 08f294a1 ("RDMA/core: Add resource tracking for create and destroy CQs")
      Reported-by: NNoa Osherovich <noaos@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      0cba0efc
    • L
      RDMA/uverbs: Protect from command mask overflow · 3f802b16
      Leon Romanovsky 提交于
      The command number is not bounds checked against the command mask before it
      is shifted, resulting in an ubsan hit. This does not cause malfunction since
      the command number is eventually bounds checked, but we can make this ubsan
      clean by moving the bounds check to before the mask check.
      
      ================================================================================
      UBSAN: Undefined behaviour in
      drivers/infiniband/core/uverbs_main.c:647:21
      shift exponent 207 is too large for 64-bit type 'long long unsigned int'
      CPU: 0 PID: 446 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #61
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      Call Trace:
      dump_stack+0xde/0x164
      ? dma_virt_map_sg+0x22c/0x22c
      ubsan_epilogue+0xe/0x81
      __ubsan_handle_shift_out_of_bounds+0x293/0x2f7
      ? debug_check_no_locks_freed+0x340/0x340
      ? __ubsan_handle_load_invalid_value+0x19b/0x19b
      ? lock_acquire+0x440/0x440
      ? lock_acquire+0x19d/0x440
      ? __might_fault+0xf4/0x240
      ? ib_uverbs_write+0x68d/0xe20
      ib_uverbs_write+0x68d/0xe20
      ? __lock_acquire+0xcf7/0x3940
      ? uverbs_devnode+0x110/0x110
      ? cyc2ns_read_end+0x10/0x10
      ? sched_clock_cpu+0x18/0x200
      ? sched_clock_cpu+0x18/0x200
      __vfs_write+0x10d/0x700
      ? uverbs_devnode+0x110/0x110
      ? kernel_read+0x170/0x170
      ? __fget+0x35b/0x5d0
      ? security_file_permission+0x93/0x260
      vfs_write+0x1b0/0x550
      SyS_write+0xc7/0x1a0
      ? SyS_read+0x1a0/0x1a0
      ? trace_hardirqs_on_thunk+0x1a/0x1c
      entry_SYSCALL_64_fastpath+0x18/0x85
      RIP: 0033:0x448e29
      RSP: 002b:00007f033f567c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007f033f5686bc RCX: 0000000000448e29
      RDX: 0000000000000060 RSI: 0000000020001000 RDI: 0000000000000012
      RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000056a0 R14: 00000000006e8740 R15: 0000000000000000
      ================================================================================
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: <stable@vger.kernel.org> # 4.5
      Fixes: 2dbd5186 ("IB/core: IB/core: Allow legacy verbs through extended interfaces")
      Reported-by: NNoa Osherovich <noaos@mellanox.com>
      Reviewed-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      3f802b16
    • J
      IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroy · ec6f8401
      Jason Gunthorpe 提交于
      If remove_commit fails then the lock is left locked while the uobj still
      exists. Eventually the kernel will deadlock.
      
      lockdep detects this and says:
      
       test/4221 is leaving the kernel with locks still held!
       1 lock held by test/4221:
        #0:  (&ucontext->cleanup_rwsem){.+.+}, at: [<000000001e5c7523>] rdma_explicit_destroy+0x37/0x120 [ib_uverbs]
      
      Fixes: 4da70da2 ("IB/core: Explicitly destroy an object while keeping uobject")
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      ec6f8401
    • J
      IB/uverbs: Improve lockdep_check · 104f268d
      Jason Gunthorpe 提交于
      This is really being used as an assert that the expected usecnt
      is being held and implicitly that the usecnt is valid. Rename it to
      assert_uverbs_usecnt and tighten the checks to only accept valid
      values of usecnt (eg 0 and < -1 are invalid).
      
      The tigher checkes make the assertion cover more cases and is more
      likely to find bugs via syzkaller/etc.
      
      Fixes: 38321256 ("IB/core: Add support for idr types")
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      104f268d
    • L
      RDMA/uverbs: Protect from races between lookup and destroy of uobjects · 6623e3e3
      Leon Romanovsky 提交于
      The race is between lookup_get_idr_uobject and
      uverbs_idr_remove_uobj -> uverbs_uobject_put.
      
      We deliberately do not call sychronize_rcu after the idr_remove in
      uverbs_idr_remove_uobj for performance reasons, instead we call
      kfree_rcu() during uverbs_uobject_put.
      
      However, this means we can obtain pointers to uobj's that have
      already been released and must protect against krefing them
      using kref_get_unless_zero.
      
      ==================================================================
      BUG: KASAN: use-after-free in copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
      Read of size 4 at addr ffff88005fda1ac8 by task syz-executor2/441
      
      CPU: 1 PID: 441 Comm: syz-executor2 Not tainted 4.15.0-rc2+ #56
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      Call Trace:
      dump_stack+0x8d/0xd4
      print_address_description+0x73/0x290
      kasan_report+0x25c/0x370
      ? copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
      copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
      ? uverbs_try_lock_object+0x68/0xc0
      ? modify_qp.isra.7+0xdc4/0x10e0
      modify_qp.isra.7+0xdc4/0x10e0
      ib_uverbs_modify_qp+0xfe/0x170
      ? ib_uverbs_query_qp+0x970/0x970
      ? __lock_acquire+0xa11/0x1da0
      ib_uverbs_write+0x55a/0xad0
      ? ib_uverbs_query_qp+0x970/0x970
      ? ib_uverbs_query_qp+0x970/0x970
      ? ib_uverbs_open+0x760/0x760
      ? futex_wake+0x147/0x410
      ? sched_clock_cpu+0x18/0x180
      ? check_prev_add+0x1680/0x1680
      ? do_futex+0x3b6/0xa30
      ? sched_clock_cpu+0x18/0x180
      __vfs_write+0xf7/0x5c0
      ? ib_uverbs_open+0x760/0x760
      ? kernel_read+0x110/0x110
      ? lock_acquire+0x370/0x370
      ? __fget+0x264/0x3b0
      vfs_write+0x18a/0x460
      SyS_write+0xc7/0x1a0
      ? SyS_read+0x1a0/0x1a0
      ? trace_hardirqs_on_thunk+0x1a/0x1c
      entry_SYSCALL_64_fastpath+0x18/0x85
      RIP: 0033:0x448e29
      RSP: 002b:00007f443fee0c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007f443fee16bc RCX: 0000000000448e29
      RDX: 0000000000000078 RSI: 00000000209f8000 RDI: 0000000000000012
      RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000008e98 R14: 00000000006ebf38 R15: 0000000000000000
      
      Allocated by task 1:
      kmem_cache_alloc_trace+0x16c/0x2f0
      mlx5_alloc_cmd_msg+0x12e/0x670
      cmd_exec+0x419/0x1810
      mlx5_cmd_exec+0x40/0x70
      mlx5_core_mad_ifc+0x187/0x220
      mlx5_MAD_IFC+0xd7/0x1b0
      mlx5_query_mad_ifc_gids+0x1f3/0x650
      mlx5_ib_query_gid+0xa4/0xc0
      ib_query_gid+0x152/0x1a0
      ib_query_port+0x21e/0x290
      mlx5_port_immutable+0x30f/0x490
      ib_register_device+0x5dd/0x1130
      mlx5_ib_add+0x3e7/0x700
      mlx5_add_device+0x124/0x510
      mlx5_register_interface+0x11f/0x1c0
      mlx5_ib_init+0x56/0x61
      do_one_initcall+0xa3/0x250
      kernel_init_freeable+0x309/0x3b8
      kernel_init+0x14/0x180
      ret_from_fork+0x24/0x30
      
      Freed by task 1:
      kfree+0xeb/0x2f0
      mlx5_free_cmd_msg+0xcd/0x140
      cmd_exec+0xeba/0x1810
      mlx5_cmd_exec+0x40/0x70
      mlx5_core_mad_ifc+0x187/0x220
      mlx5_MAD_IFC+0xd7/0x1b0
      mlx5_query_mad_ifc_gids+0x1f3/0x650
      mlx5_ib_query_gid+0xa4/0xc0
      ib_query_gid+0x152/0x1a0
      ib_query_port+0x21e/0x290
      mlx5_port_immutable+0x30f/0x490
      ib_register_device+0x5dd/0x1130
      mlx5_ib_add+0x3e7/0x700
      mlx5_add_device+0x124/0x510
      mlx5_register_interface+0x11f/0x1c0
      mlx5_ib_init+0x56/0x61
      do_one_initcall+0xa3/0x250
      kernel_init_freeable+0x309/0x3b8
      kernel_init+0x14/0x180
      ret_from_fork+0x24/0x30
      
      The buggy address belongs to the object at ffff88005fda1ab0
      which belongs to the cache kmalloc-32 of size 32
      The buggy address is located 24 bytes inside of
      32-byte region [ffff88005fda1ab0, ffff88005fda1ad0)
      The buggy address belongs to the page:
      page:00000000d5655c19 count:1 mapcount:0 mapping: (null)
      index:0xffff88005fda1fc0
      flags: 0x4000000000000100(slab)
      raw: 4000000000000100 0000000000000000 ffff88005fda1fc0 0000000180550008
      raw: ffffea00017f6780 0000000400000004 ffff88006c803980 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
      ffff88005fda1980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb
      ffff88005fda1a00: fb fb fc fc fb fb fb fb fc fc 00 00 00 00 fc fc
      ffff88005fda1a80: fb fb fb fb fc fc fb fb fb fb fc fc fb fb fb fb
      ffff88005fda1b00: fc fc 00 00 00 00 fc fc fb fb fb fb fc fc fb fb
      ffff88005fda1b80: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc
      ==================================================================@
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: <stable@vger.kernel.org> # 4.11
      Fixes: 38321256 ("IB/core: Add support for idr types")
      Reported-by: NNoa Osherovich <noaos@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      6623e3e3
    • J
      IB/uverbs: Hold the uobj write lock after allocate · d9dc7a35
      Jason Gunthorpe 提交于
      This clarifies the design intention that time between allocate and
      commit has the uobj exclusive to the caller. We already guarantee
      this by delaying publishing the uobj pointer via idr_insert,
      fd_install, list_add, etc.
      
      Additionally holding the usecnt lock during this period provides
      extra clarity and more protection against future mistakes.
      
      Fixes: 38321256 ("IB/core: Add support for idr types")
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      d9dc7a35
    • M
      IB/uverbs: Fix possible oops with duplicate ioctl attributes · 4d39a959
      Matan Barak 提交于
      If the same attribute is listed twice by the user in the ioctl attribute
      list then error unwind can cause the kernel to deref garbage.
      
      This happens when an object with WRITE access is sent twice. The second
      parse properly fails but corrupts the state required for the error unwind
      it triggers.
      
      Fixing this by making duplicates in the attribute list invalid. This is
      not something we need to support.
      
      The ioctl interface is currently recommended to be disabled in kConfig.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      4d39a959
    • M
      IB/uverbs: Add ioctl support for 32bit processes · 9dfb2ff4
      Matan Barak 提交于
      32 bit processes running on a 64 bit kernel call compat_ioctl so that
      implementations can revise any structure layout issues. Point compat_ioctl
      at our normal ioctl because:
      
      - All our structures are designed to be the same on 32 and 64 bit, ie we
        use __aligned_u64 when required and are careful to manage padding.
      
      - Any pointers are stored in u64's and userspace is expected
        to prepare them properly.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      9dfb2ff4
    • J
      IB/uverbs: Use __aligned_u64 for uapi headers · 5d2beb57
      Jason Gunthorpe 提交于
      This has no impact on the structure layout since these structs already
      have their u64s already properly aligned, but it does document that we
      have this requirement for 32 bit compatibility.
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      5d2beb57
    • M
      IB/uverbs: Fix method merging in uverbs_ioctl_merge · 3d89459e
      Matan Barak 提交于
      Fix a bug in uverbs_ioctl_merge that looked at the object's iterator
      number instead of the method's iterator number when merging methods.
      
      While we're at it, make the uverbs_ioctl_merge code a bit more clear
      and faster.
      
      Fixes: 118620d3 ('IB/core: Add uverbs merge trees functionality')
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      3d89459e
    • J
      IB/uverbs: Use u64_to_user_ptr() not a union · 2f36028c
      Jason Gunthorpe 提交于
      The union approach will get the endianness wrong sometimes if the kernel's
      pointer size is 32 bits resulting in EFAULTs when trying to copy to/from
      user.
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      2f36028c
    • J
      IB/uverbs: Use inline data transfer for UHW_IN · 6c976c30
      Jason Gunthorpe 提交于
      The rule for the API is pointers less than 8 bytes are inlined into
      the .data field of the attribute. Fix the creation of the driver udata
      struct to follow this rule and point to the .data itself when the size
      is less than 8 bytes.
      
      Otherwise if the UHW struct is less than 8 bytes the driver will get
      EFAULT during copy_from_user.
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      6c976c30
    • M
      IB/uverbs: Always use the attribute size provided by the user · 89d9e8d3
      Matan Barak 提交于
      This fixes several bugs around the copy_to/from user path:
       - copy_to used the user provided size of the attribute
         and could copy data beyond the end of the kernel buffer into
         userspace.
       - copy_from didn't know the size of the kernel buffer and
         could have left kernel memory unexpectedly un-initialized.
       - copy_from did not use the user length to determine if the
         attribute data is inlined or not.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      89d9e8d3
    • L
      RDMA/restrack: Remove unimplemented XRCD object · 415bb699
      Leon Romanovsky 提交于
      Resource tracking of XRCD objects is not implemented in current
      version of restrack and hence can be removed.
      
      Fixes: 02d8883f ("RDMA/restrack: Add general infrastructure to track RDMA resources")
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      415bb699
    • A
      IB/ipoib: Do not warn if IPoIB debugfs doesn't exist · 14fa91e0
      Alaa Hleihel 提交于
      netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event
      multiple times until all refs are gone, which will result in calling
      ipoib_delete_debug_files multiple times and printing a warning.
      
      Remove the WARN_ONCE since checks of NULL pointers before calling
      debugfs_remove are not needed.
      
      Fixes: 771a5258 ("IB/IPoIB: ibX: failed to create mcg debug file")
      Signed-off-by: NAlaa Hleihel <alaa@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      14fa91e0
  2. 12 2月, 2018 7 次提交
    • L
      Linux 4.16-rc1 · 7928b2cb
      Linus Torvalds 提交于
      7928b2cb
    • A
      unify {de,}mangle_poll(), get rid of kernel-side POLL... · 7a163b21
      Al Viro 提交于
      except, again, POLLFREE and POLL_BUSY_LOOP.
      
      With this, we finally get to the promised end result:
      
       - POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any
         stray instances of ->poll() still using those will be caught by
         sparse.
      
       - eventpoll.c and select.c warning-free wrt __poll_t
      
       - no more kernel-side definitions of POLL... - userland ones are
         visible through the entire kernel (and used pretty much only for
         mangle/demangle)
      
       - same behavior as after the first series (i.e. sparc et.al. epoll(2)
         working correctly).
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7a163b21
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
    • L
      Merge branch 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · ee5daa13
      Linus Torvalds 提交于
      Pull more poll annotation updates from Al Viro:
       "This is preparation to solving the problems you've mentioned in the
        original poll series.
      
        After this series, the kernel is ready for running
      
            for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
                  L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
                  for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
            done
      
        as a for bulk search-and-replace.
      
        After that, the kernel is ready to apply the patch to unify
        {de,}mangle_poll(), and then get rid of kernel-side POLL... uses
        entirely, and we should be all done with that stuff.
      
        Basically, that's what you suggested wrt KPOLL..., except that we can
        use EPOLL... instead - they already are arch-independent (and equal to
        what is currently kernel-side POLL...).
      
        After the preparations (in this series) switch to returning EPOLL...
        from ->poll() instances is completely mechanical and kernel-side
        POLL... can go away. The last step (killing kernel-side POLL... and
        unifying {de,}mangle_poll() has to be done after the
        search-and-replace job, since we need userland-side POLL... for
        unified {de,}mangle_poll(), thus the cherry-pick at the last step.
      
        After that we will have:
      
         - POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of
           ->poll() still using those will be caught by sparse.
      
         - eventpoll.c and select.c warning-free wrt __poll_t
      
         - no more kernel-side definitions of POLL... - userland ones are
           visible through the entire kernel (and used pretty much only for
           mangle/demangle)
      
         - same behavior as after the first series (i.e. sparc et.al. epoll(2)
           working correctly)"
      
      * 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        annotate ep_scan_ready_list()
        ep_send_events_proc(): return result via esed->res
        preparation to switching ->poll() to returning EPOLL...
        add EPOLLNVAL, annotate EPOLL... and event_poll->event
        use linux/poll.h instead of asm/poll.h
        xen: fix poll misannotation
        smc: missing poll annotations
      ee5daa13
    • L
      Merge tag 'xtensa-20180211' of git://github.com/jcmvbkbc/linux-xtensa · 3fc928dc
      Linus Torvalds 提交于
      Pull xtense fix from Max Filippov:
       "Build fix for xtensa architecture with KASAN enabled"
      
      * tag 'xtensa-20180211' of git://github.com/jcmvbkbc/linux-xtensa:
        xtensa: fix build with KASAN
      3fc928dc
    • L
      Merge tag 'nios2-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2 · 60d7a21a
      Linus Torvalds 提交于
      Pull nios2 update from Ley Foon Tan:
      
       - clean up old Kconfig options from defconfig
      
       - remove leading 0x and 0s from bindings notation in dts files
      
      * tag 'nios2-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
        nios2: defconfig: Cleanup from old Kconfig options
        nios2: dts: Remove leading 0x and 0s from bindings notation
      60d7a21a
    • M
      xtensa: fix build with KASAN · f8d0cbf2
      Max Filippov 提交于
      The commit 917538e2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT
      usage") removed KASAN_SHADOW_SCALE_SHIFT definition from
      include/linux/kasan.h and added it to architecture-specific headers,
      except for xtensa. This broke the xtensa build with KASAN enabled.
      Define KASAN_SHADOW_SCALE_SHIFT in arch/xtensa/include/asm/kasan.h
      
      Reported by: kbuild test robot <fengguang.wu@intel.com>
      Fixes: 917538e2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage")
      Acked-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
      f8d0cbf2
  3. 11 2月, 2018 8 次提交
    • K
      nios2: defconfig: Cleanup from old Kconfig options · e0691ebb
      Krzysztof Kozlowski 提交于
      Remove old, dead Kconfig option INET_LRO. It is gone since
      commit 7bbf3cae ("ipv4: Remove inet_lro library").
      Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
      Acked-by: NLey Foon Tan <ley.foon.tan@intel.com>
      e0691ebb
    • M
      nios2: dts: Remove leading 0x and 0s from bindings notation · 5d13c731
      Mathieu Malaterre 提交于
      Improve the DTS files by removing all the leading "0x" and zeros to fix the
      following dtc warnings:
      
      Warning (unit_address_format): Node /XXX unit name should not have leading "0x"
      
      and
      
      Warning (unit_address_format): Node /XXX unit name should not have leading 0s
      
      Converted using the following command:
      
      find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} +
      
      For simplicity, two sed expressions were used to solve each warnings separately.
      
      To make the regex expression more robust a few other issues were resolved,
      namely setting unit-address to lower case, and adding a whitespace before the
      the opening curly brace:
      
      https://elinux.org/Device_Tree_Linux#Linux_conventions
      
      This is a follow up to commit 4c9847b7 ("dt-bindings: Remove leading 0x from bindings notation")
      Reported-by: NDavid Daney <ddaney@caviumnetworks.com>
      Suggested-by: NRob Herring <robh@kernel.org>
      Signed-off-by: NMathieu Malaterre <malat@debian.org>
      Acked-by: NLey Foon Tan <ley.foon.tan@intel.com>
      5d13c731
    • L
      Merge tag 'pci-v4.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · d48fcbd8
      Linus Torvalds 提交于
      Pull PCI fix from Bjorn Helgaas:
       "Fix a POWER9/powernv INTx regression from the merge window (Alexey
        Kardashevskiy)"
      
      * tag 'pci-v4.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        powerpc/pci: Fix broken INTx configuration via OF
      d48fcbd8
    • L
      Merge tag 'for-linus-20180210' of git://git.kernel.dk/linux-block · 9454473c
      Linus Torvalds 提交于
      Pull block fixes from Jens Axboe:
       "A few fixes to round off the merge window on the block side:
      
         - a set of bcache fixes by way of Michael Lyle, from the usual bcache
           suspects.
      
         - add a simple-to-hook-into function for bpf EIO error injection.
      
         - fix blk-wbt that mischarectized flushes as reads. Improve the logic
           so that flushes and writes are accounted as writes, and only reads
           as reads. From me.
      
         - fix requeue crash in BFQ, from Paolo"
      
      * tag 'for-linus-20180210' of git://git.kernel.dk/linux-block:
        block, bfq: add requeue-request hook
        bcache: fix for data collapse after re-attaching an attached device
        bcache: return attach error when no cache set exist
        bcache: set writeback_rate_update_seconds in range [1, 60] seconds
        bcache: fix for allocator and register thread race
        bcache: set error_limit correctly
        bcache: properly set task state in bch_writeback_thread()
        bcache: fix high CPU occupancy during journal
        bcache: add journal statistic
        block: Add should_fail_bio() for bpf error injection
        blk-wbt: account flush requests correctly
      9454473c
    • L
      Merge tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86 · cc5cb5af
      Linus Torvalds 提交于
      Pull x86 platform driver updates from Darren Hart:
       "Mellanox fixes and new system type support.
      
        Mostly data for new system types with a correction and an
        uninitialized variable fix"
      
      [ Pulling from github because git.infradead.org currently seems to be
        down for some reason, but Darren had a backup location    - Linus ]
      
      * tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86:
        platform/x86: mlx-platform: Add support for new 200G IB and Ethernet systems
        platform/x86: mlx-platform: Add support for new msn201x system type
        platform/x86: mlx-platform: Add support for new msn274x system type
        platform/x86: mlx-platform: Fix power cable setting for msn21xx family
        platform/x86: mlx-platform: Add define for the negative bus
        platform/x86: mlx-platform: Use defines for bus assignment
        platform/mellanox: mlxreg-hotplug: Fix uninitialized variable
      cc5cb5af
    • L
      Merge tag 'chrome-platform-for-linus-4.16' of... · e9d46f74
      Linus Torvalds 提交于
      Merge tag 'chrome-platform-for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform
      
      Pull chrome platform updates from Benson Leung:
      
       - move cros_ec_dev to drivers/mfd
      
       - other small maintenance fixes
      
      [ The cros_ec_dev movement came in earlier through the MFD tree  - Linus ]
      
      * tag 'chrome-platform-for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform:
        platform/chrome: Use proper protocol transfer function
        platform/chrome: cros_ec_lpc: Add support for Google Glimmer
        platform/chrome: cros_ec_lpc: Register the driver if ACPI entry is missing.
        platform/chrome: cros_ec_lpc: remove redundant pointer request
        cros_ec: fix nul-termination for firmware build info
        platform/chrome: chromeos_laptop: make chromeos_laptop const
      e9d46f74
    • L
      Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 15303ba5
      Linus Torvalds 提交于
      Pull KVM updates from Radim Krčmář:
       "ARM:
      
         - icache invalidation optimizations, improving VM startup time
      
         - support for forwarded level-triggered interrupts, improving
           performance for timers and passthrough platform devices
      
         - a small fix for power-management notifiers, and some cosmetic
           changes
      
        PPC:
      
         - add MMIO emulation for vector loads and stores
      
         - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
           requiring the complex thread synchronization of older CPU versions
      
         - improve the handling of escalation interrupts with the XIVE
           interrupt controller
      
         - support decrement register migration
      
         - various cleanups and bugfixes.
      
        s390:
      
         - Cornelia Huck passed maintainership to Janosch Frank
      
         - exitless interrupts for emulated devices
      
         - cleanup of cpuflag handling
      
         - kvm_stat counter improvements
      
         - VSIE improvements
      
         - mm cleanup
      
        x86:
      
         - hypervisor part of SEV
      
         - UMIP, RDPID, and MSR_SMI_COUNT emulation
      
         - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
      
         - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more
           AVX512 features
      
         - show vcpu id in its anonymous inode name
      
         - many fixes and cleanups
      
         - per-VCPU MSR bitmaps (already merged through x86/pti branch)
      
         - stable KVM clock when nesting on Hyper-V (merged through
           x86/hyperv)"
      
      * tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits)
        KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
        KVM: PPC: Book3S HV: Branch inside feature section
        KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
        KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
        KVM: PPC: Book3S PR: Fix broken select due to misspelling
        KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs()
        KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled
        KVM: PPC: Book3S HV: Drop locks before reading guest memory
        kvm: x86: remove efer_reload entry in kvm_vcpu_stat
        KVM: x86: AMD Processor Topology Information
        x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
        kvm: embed vcpu id to dentry of vcpu anon inode
        kvm: Map PFN-type memory regions as writable (if possible)
        x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n
        KVM: arm/arm64: Fixup userspace irqchip static key optimization
        KVM: arm/arm64: Fix userspace_irqchip_in_use counting
        KVM: arm/arm64: Fix incorrect timer_is_pending logic
        MAINTAINERS: update KVM/s390 maintainers
        MAINTAINERS: add Halil as additional vfio-ccw maintainer
        MAINTAINERS: add David as a reviewer for KVM/s390
        ...
      15303ba5
    • A
      powerpc/pci: Fix broken INTx configuration via OF · c591c2e3
      Alexey Kardashevskiy 提交于
      59f47eff ("powerpc/pci: Use of_irq_parse_and_map_pci() helper")
      replaced of_irq_parse_pci() + irq_create_of_mapping() with
      of_irq_parse_and_map_pci(), but neglected to capture the virq
      returned by irq_create_of_mapping(), so virq remained zero, which
      caused INTx configuration to fail.
      
      Save the virq value returned by of_irq_parse_and_map_pci() and correct
      the virq declaration to match the of_irq_parse_and_map_pci() signature.
      
      Fixes: 59f47eff "powerpc/pci: Use of_irq_parse_and_map_pci() helper"
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      [bhelgaas: changelog]
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      c591c2e3
  4. 10 2月, 2018 8 次提交