1. 06 Sep, 2018 5 commits
  2. 05 Sep, 2018 3 commits
    • iw_cxgb4: only allow 1 flush on user qps · 308aa2b8
      Committed by Steve Wise
      Once the qp has been flushed, it cannot be flushed again.  However, the
      user qp flush logic wasn't enforcing this.  The bug can cause
      touch-after-free crashes like:
      
      Unable to handle kernel paging request for data at address 0x000001ec
      Faulting instruction address: 0xc008000016069100
      Oops: Kernel access of bad area, sig: 11 [#1]
      ...
      NIP [c008000016069100] flush_qp+0x80/0x480 [iw_cxgb4]
      LR [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
      Call Trace:
      [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
      [c00800001606e868] c4iw_ib_modify_qp+0x118/0x200 [iw_cxgb4]
      [c0080000119eae80] ib_security_modify_qp+0xd0/0x3d0 [ib_core]
      [c0080000119c4e24] ib_modify_qp+0xc4/0x2c0 [ib_core]
      [c008000011df0284] iwcm_modify_qp_err+0x44/0x70 [iw_cm]
      [c008000011df0fec] destroy_cm_id+0xcc/0x370 [iw_cm]
      [c008000011ed4358] rdma_destroy_id+0x3c8/0x520 [rdma_cm]
      [c0080000134b0540] ucma_close+0x90/0x1b0 [rdma_ucm]
      [c000000000444da4] __fput+0xe4/0x2f0
      
      So fix flush_qp() to only flush the wq once.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
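The "flush only once" rule described in this commit can be sketched in plain C. This is a userspace illustration under stated assumptions, not the driver's code: `struct demo_qp`, its `flushed` flag, and `demo_flush_qp()` are hypothetical names invented for the example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a queue pair; not the iw_cxgb4 structure. */
struct demo_qp {
    bool flushed;     /* set once the work queue has been flushed */
    int flush_count;  /* how many times the flush body actually ran */
};

/* Flush the queue at most once; later calls return without touching it. */
static bool demo_flush_qp(struct demo_qp *qp)
{
    if (qp->flushed)
        return false;   /* already flushed: do nothing */
    qp->flushed = true;
    qp->flush_count++;  /* placeholder for the real flush work */
    return true;
}
```

Calling `demo_flush_qp()` twice on the same object runs the flush body only the first time, which is the property the fix is meant to enforce.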
    • IB/core: Release object lock if destroy failed · e4ff3d22
      Committed by Artemy Kovalyov
      The object lock was supposed to always be released during destroy, but
      when the destruction retry series was integrated with the destroy series
      it created a failure path that missed the unlock.
      
      Keeping with convention, if destroy fails the caller must undo all locking.
      
      Fixes: 87ad80ab ("IB/uverbs: Consolidate uobject destruction")
      Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
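The locking convention in this commit message might look roughly like the following. It is a simplified model, assuming the destroy step drops the lock itself on success and leaves it held on failure; `struct demo_uobject`, `demo_destroy()`, and `demo_lock_and_destroy()` are hypothetical and do not mirror the kernel's uobject code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical object with a lock flag; not the kernel's ib_uobject. */
struct demo_uobject {
    bool locked;
    bool busy;       /* when true, destruction fails */
    bool destroyed;
};

/* Assumed behaviour: on success destroy drops the lock; on failure it does not. */
static int demo_destroy(struct demo_uobject *obj)
{
    if (obj->busy)
        return -EBUSY;
    obj->destroyed = true;
    obj->locked = false;
    return 0;
}

/* Caller: take the lock, attempt destroy, and undo the locking on failure. */
static int demo_lock_and_destroy(struct demo_uobject *obj)
{
    int ret;

    obj->locked = true;
    ret = demo_destroy(obj);
    if (ret)
        obj->locked = false;  /* failure path must not leave the lock held */
    return ret;
}
```

Either way the function returns, the lock flag ends up cleared, so a failed destroy cannot leave the object locked.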
    • RDMA/ucma: check fd type in ucma_migrate_id() · 0d23ba60
      Committed by Jann Horn
      The current code grabs the private_data of whatever file descriptor
      userspace has supplied and implicitly casts it to a `struct ucma_file *`,
      potentially causing a type confusion.
      
      This is probably fine in practice because the pointer is only used for
      comparisons and is never actually dereferenced; and even in the
      comparisons, it is unlikely that a file from another filesystem would have
      a ->private_data pointer that happens to also be valid in this context.
      But ->private_data is not always guaranteed to be a valid pointer to an
      object owned by the file's filesystem; for example, some filesystems just
      cram numbers in there.
      
      Check the type of the supplied file descriptor to be safe, analogous to how
      other places in the kernel do it.
      
      Fixes: 88314e4d ("RDMA/cma: add support for rdma_migrate_id()")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
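One way to picture the fd type check: before trusting `private_data`, compare the file's operations table against the one the subsystem owns. The sketch below models that idea in userspace C; `struct demo_file`, `struct demo_file_ops`, and `demo_get_ucma_private()` are hypothetical names, and the actual patch may differ in detail.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a file and its operations table. */
struct demo_file_ops {
    const char *name;
};

struct demo_file {
    const struct demo_file_ops *f_op;
    void *private_data;
};

static const struct demo_file_ops demo_ucma_fops = { "ucma" };
static const struct demo_file_ops demo_other_fops = { "other" };

/*
 * Return private_data only when the file really belongs to "our" subsystem,
 * identified by its operations table; otherwise refuse with NULL.
 */
static void *demo_get_ucma_private(const struct demo_file *f)
{
    if (f == NULL || f->f_op != &demo_ucma_fops)
        return NULL;
    return f->private_data;
}
```

A file opened through some other subsystem (here `demo_other_fops`) is rejected even if its `private_data` happens to hold a plausible-looking value.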
  3. 23 Aug, 2018 1 commit
    • mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7
      Committed by Michal Hocko
      There are several blockable mmu notifiers which might sleep in
      mmu_notifier_invalidate_range_start and that is a problem for the
      oom_reaper because it needs to guarantee a forward progress so it cannot
      depend on any sleepable locks.
      
      Currently we simply back off and mark an oom victim with blockable mmu
      notifiers as done after a short sleep.  That can result in selecting a new
      oom victim prematurely because the previous one still hasn't torn its
      memory down yet.
      
      We can do much better though.  Even if mmu notifiers use sleepable locks
      there is no reason to automatically assume those locks are held.  Moreover,
      the majority of notifiers only care about a portion of the address space,
      and there is absolutely zero reason to fail when we are unmapping an
      unrelated range.  Many notifiers really do block and wait for HW, though;
      that is harder to handle, and there we have to bail out.
      
      This patch handles the low hanging fruit.
      __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
      are not allowed to sleep if the flag is set to false.  This is achieved by
      using trylock instead of the sleepable lock for most callbacks and
      continuing as long as we do not block down the call chain.
      
      I think we can improve that even further because there is a common pattern
      to do a range lookup first and then do something about that.  The first
      part can be done without a sleeping lock in most cases AFAICS.
      
      The oom_reaper end then simply retries if there is at least one notifier
      which couldn't make any progress in !blockable mode.  A retry loop is
      already implemented to wait for the mmap_sem and this is basically the
      same thing.
      
      The simplest way for driver developers to test this code path is to wrap
      userspace code which uses these notifiers into a memcg and set the hard
      limit to hit the oom.  This can be done e.g. after the test faults in all
      the mmu notifier managed memory, by setting the hard limit to something
      really small.  Then we are looking for a proper process tear down.
      
      [akpm@linux-foundation.org: coding style fixes]
      [akpm@linux-foundation.org: minor code simplification]
      Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
      Reported-by: David Rientjes <rientjes@google.com>
      Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
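The blockable / non-blockable split described in this commit can be illustrated with a small userspace model. It is only a sketch: a C11 `atomic_flag` stands in for the notifier's sleepable lock, a spin loop stands in for sleeping, and `struct demo_notifier` and `demo_invalidate_range_start()` are hypothetical names rather than kernel interfaces.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical notifier state; `busy` stands in for a sleepable lock. */
struct demo_notifier {
    atomic_flag busy;
    int invalidations;
};

/*
 * Try to take the lock. A non-blockable caller gives up with -EAGAIN as soon
 * as the lock is contended; a blockable caller keeps waiting.
 */
static int demo_invalidate_range_start(struct demo_notifier *n, bool blockable)
{
    while (atomic_flag_test_and_set(&n->busy)) {
        if (!blockable)
            return -EAGAIN;  /* trylock failed: caller should retry later */
        /* blockable caller: keep waiting (a real lock would sleep here) */
    }
    n->invalidations++;      /* placeholder for the real invalidation work */
    atomic_flag_clear(&n->busy);
    return 0;
}
```

A caller in the position of the oom_reaper would pass `blockable = false` and retry when it sees `-EAGAIN`, instead of sleeping on the lock.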
  4. 22 Aug, 2018 1 commit
    • IB/ucm: fix UCM link error · 845b397a
      Committed by Arnd Bergmann
      Building UCM with CONFIG_INFINIBAND_USER_ACCESS=m results in a
      set of link errors including:
      
      drivers/infiniband/core/ucm.o: In function `ib_ucm_event_handler':
      ucm.c:(.text+0x6dc): undefined reference to `ib_copy_path_rec_to_user'
      drivers/infiniband/core/ucma.o: In function `ucma_event_handler':
      ucma.c:(.text+0xdc0): undefined reference to `ib_copy_ah_attr_to_user'
      
      To get it to build-test again, this makes the option itself a
      tristate, which lets Kconfig figure out the dependency correctly.
      
      Fixes: 486edfb1 ("IB/ucm: Fix compiling ucm.c")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  5. 21 Aug, 2018 1 commit
    • IB/hfi1: Invalid NUMA node information can cause a divide by zero · c513de49
      Committed by Michael J. Ruhl
      If the system BIOS does not supply NUMA node information to the
      PCI devices, the NUMA node is selected by choosing the current
      node.
      
      This can lead to the following crash:
      
      divide error: 0000 SMP
      CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G          IOE
      ------------   3.10.0-693.21.1.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
      SE5C610.86B.01.01.0005.101720141054 10/17/2014
      Workqueue: events work_for_cpu_fn
      task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
      RIP: 0010: [<ffffffffc020ac69>] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
      RSP: 0018:ffff88017448bbf8  EFLAGS: 00010246
      RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
      RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
      R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
      R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
      FS:  0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Call Trace:
       hfi1_init_dd+0x14b3/0x27a0 [hfi1]
       ? pcie_capability_write_word+0x46/0x70
       ? hfi1_pcie_init+0xc0/0x200 [hfi1]
       do_init_one+0x153/0x4c0 [hfi1]
       ? sched_clock_cpu+0x85/0xc0
       init_one+0x1b5/0x260 [hfi1]
       local_pci_probe+0x4a/0xb0
       work_for_cpu_fn+0x1a/0x30
       process_one_work+0x17f/0x440
       worker_thread+0x278/0x3c0
       ? manage_workers.isra.24+0x2a0/0x2a0
       kthread+0xd1/0xe0
       ? insert_kthread_work+0x40/0x40
       ret_from_fork+0x77/0xb0
       ? insert_kthread_work+0x40/0x40
      
      If the BIOS is not supplying NUMA information:
        - set the default table count to 1 for all possible nodes
        - select node 0 (instead of the current NUMA node) to get consistent
          performance
        - generate an error indicating that the BIOS should be upgraded
      Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
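The three remedies listed in this commit amount to making sure a per-node count can never be used as a zero divisor. A rough sketch of the first two, with hypothetical names (`DEMO_MAX_NODES`, `demo_select_node()`, `demo_init_counts()`, `demo_per_device_share()`) and a made-up table size; `DEMO_NO_NODE` mirrors the kernel's `NUMA_NO_NODE` value of -1:

```c
#define DEMO_NO_NODE (-1)   /* mirrors the kernel's NUMA_NO_NODE value */
#define DEMO_MAX_NODES 4    /* assumed table size, for illustration only */

/* Fall back to node 0 when firmware reported no node. */
static int demo_select_node(int reported)
{
    return reported == DEMO_NO_NODE ? 0 : reported;
}

/* Start every per-node count at 1 so it is always a safe divisor. */
static void demo_init_counts(int counts[DEMO_MAX_NODES])
{
    for (int i = 0; i < DEMO_MAX_NODES; i++)
        counts[i] = 1;
}

/* Split `total` items among the devices counted on `node`. */
static int demo_per_device_share(int total, const int counts[DEMO_MAX_NODES],
                                 int node)
{
    return total / counts[node];
}
```

With the counts defaulted to 1, the division is well defined even when no device has been registered against the selected node.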
  6. 16 Aug, 2018 6 commits
  7. 15 Aug, 2018 6 commits
  8. 14 Aug, 2018 1 commit
    • IB/ucm: Fix compiling ucm.c · 486edfb1
      Committed by Jason Gunthorpe
      Even though this interface is marked CONFIG_BROKEN we still expect it to
      compile, at least until we delete it completely.
      
      Also mark INFINIBAND_USER_ACCESS_UCM with COMPILE_TEST so these situations
      can be detected.
      
      Fixes: e7ff98ae ("RDMA/cma: Constify path record, ib_cm_event, listen_id pointers")
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  9. 13 Aug, 2018 5 commits
    • IB/uverbs: Do not check for device disassociation during ioctl · 4ce719f8
      Committed by Jason Gunthorpe
      Now that the ioctl path and uobjects are converted to use uverbs_api, it
      is safe to remove the disassociation protection from the common ioctl
      code.
      
      This completes the work to make destroy functions continue to work even
      after device disassociation.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/uverbs: Remove struct uverbs_root_spec and all supporting code · 51d0a2b4
      Committed by Jason Gunthorpe
      Everything now uses the uverbs_uapi data structure.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/uverbs: Use uverbs_api to unmarshal ioctl commands · 3a863577
      Committed by Jason Gunthorpe
      Convert the ioctl method syscall path to use the uverbs_api data
      structures. The new uapi structure includes all the same information, just
      in a different and more optimal way.
      
       - Use attr_bkey instead of 2 level radix trees for everything related to
         attributes. This includes the attribute storage, presence, and
         detection of missing mandatory attributes.
       - Avoid iterating over all attribute storage at finish, instead use
         find_first_bit with the attr_bkey to locate only those attrs that need
         cleanup.
       - Organize things to always run, and always rely on, cleanup. This
         avoids a bunch of tricky error unwind cases.
       - Locate the method using the radix tree, and locate the attributes
         using a very efficient incremental radix tree lookup
       - Use the precomputed destroy_bkey to handle uobject destruction
       - Use the precomputed allocation sizes and precomputed 'need_stack'
         to avoid maths in the fast path. This is optimal if userspace
         does not pass (many) unsupported attributes.
      
      Overall this results in much better codegen for the attribute accessors,
      everything is now stored in bitmaps or linear arrays indexed by attr_bkey.
      The compiler can compute attr_bkey values at compile time for all method
      attributes, meaning things like uverbs_attr_is_valid() now compile into
      single instruction bit tests.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
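The bitmap idea in this commit (presence tests as single bit tests, cleanup driven by finding set bits) can be sketched as below. This is an illustration only: a single 64-bit word is assumed to hold all attribute bits, and `demo_attr_is_valid()`, `demo_find_first_bit()`, and `demo_collect_cleanup()` are hypothetical helpers, not the kernel's `uverbs_attr_is_valid()` or `find_first_bit()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ATTRS 64   /* assumed: one 64-bit word covers every bkey */

/* Presence test: one shift and mask on the bitmap. */
static bool demo_attr_is_valid(uint64_t present, unsigned int bkey)
{
    return bkey < DEMO_MAX_ATTRS && ((present >> bkey) & 1u) != 0;
}

/* Index of the lowest set bit, or DEMO_MAX_ATTRS when no bit is set. */
static unsigned int demo_find_first_bit(uint64_t bits)
{
    for (unsigned int i = 0; i < DEMO_MAX_ATTRS; i++)
        if ((bits >> i) & 1u)
            return i;
    return DEMO_MAX_ATTRS;
}

/*
 * Visit only the attributes flagged for cleanup, lowest bkey first, and
 * record the visiting order in `order`. Returns how many were visited.
 */
static unsigned int demo_collect_cleanup(uint64_t needs_cleanup,
                                         unsigned int *order)
{
    unsigned int n = 0;

    while (needs_cleanup != 0) {
        order[n++] = demo_find_first_bit(needs_cleanup);
        needs_cleanup &= needs_cleanup - 1;  /* clear the lowest set bit */
    }
    return n;
}
```

The cleanup loop touches only the flagged entries rather than walking every attribute slot, which is the saving the commit message points to.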
    • IB/uverbs: Use uverbs_alloc for allocations · b61815e2
      Committed by Jason Gunthorpe
      Several handlers need temporary allocations for the life of the method;
      switch them to use the uverbs_alloc allocator.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    • IB/uverbs: Add a simple allocator to uverbs_attr_bundle · 461bb2ee
      Committed by Jason Gunthorpe
      This is similar in spirit to devm, it keeps track of any allocations
      linked to this method call and ensures they are all freed when the method
      exits. Further, if there is space in the internal/onstack buffer then the
      allocator will hand out that memory and avoid an expensive call to
      kmalloc/kfree in the syscall path.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
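A per-call allocator of the kind described here could look roughly like this in userspace C. It is a sketch under assumptions, not the uverbs implementation: `struct demo_bundle`, `DEMO_INLINE_BYTES`, `demo_bundle_alloc()`, and `demo_bundle_release()` are hypothetical, and `malloc`/`free` stand in for the kernel allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_INLINE_BYTES 64   /* assumed size of the inline buffer */

/* Header placed in front of each heap allocation; keeps payload aligned. */
union demo_heap_node {
    union demo_heap_node *next;
    max_align_t align;
};

struct demo_bundle {
    _Alignas(max_align_t) unsigned char inline_buf[DEMO_INLINE_BYTES];
    size_t used;                 /* bytes handed out from inline_buf */
    union demo_heap_node *heap;  /* list of fallback heap allocations */
    int heap_allocs;
};

/* Hand out inline memory when it fits; otherwise fall back to the heap. */
static void *demo_bundle_alloc(struct demo_bundle *b, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    union demo_heap_node *node;

    if (size <= DEMO_INLINE_BYTES) {
        size_t rounded = (size + align - 1) / align * align;

        if (rounded <= DEMO_INLINE_BYTES - b->used) {
            void *p = b->inline_buf + b->used;

            b->used += rounded;
            return p;
        }
    }
    if (size > SIZE_MAX - sizeof(*node))
        return NULL;
    node = malloc(sizeof(*node) + size);
    if (node == NULL)
        return NULL;
    node->next = b->heap;        /* remember it so release can free it */
    b->heap = node;
    b->heap_allocs++;
    return node + 1;
}

/* Free everything tied to this call in one place. */
static void demo_bundle_release(struct demo_bundle *b)
{
    while (b->heap != NULL) {
        union demo_heap_node *next = b->heap->next;

        free(b->heap);
        b->heap = next;
    }
    b->used = 0;
    b->heap_allocs = 0;
}
```

Handlers never free individual allocations; a single `demo_bundle_release()` at the end of the call reclaims everything, which is the devm-like property the commit message describes.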
  10. 11 Aug, 2018 5 commits
    • IB/uverbs: Remove the ib_uverbs_attr pointer from each attr · 6a1f444f
      Committed by Jason Gunthorpe
      Memory in the bundle is valuable, do not waste it holding an 8 byte
      pointer for the rare case of writing to a PTR_OUT. We can compute the
      pointer by storing a small 1 byte array offset and the base address of the
      uattr memory in the bundle private memory.
      
      This also means we can access the kernel's copy of the ib_uverbs_attr, so
      drop the copy of flags as well.
      
      Since the uattr base should be private bundle information this also
      de-inlines the already too big uverbs_copy_to inline and moves
      create_udata into uverbs_ioctl.c so they can see the private struct
      definition.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
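The space saving described in this commit, a 1 byte index plus a shared base address instead of an 8 byte pointer per attribute, can be modelled as follows. All names (`struct demo_uattr`, `struct demo_attr`, `struct demo_bundle_priv`, `demo_attr_to_uattr()`) and field layouts are hypothetical and chosen only to show the arithmetic.

```c
#include <stdint.h>

/* Hypothetical user-supplied attribute record. */
struct demo_uattr {
    uint16_t attr_id;
    uint16_t len;
    uint64_t data;
};

/* Per-attribute state keeps a 1 byte array index, not a full pointer. */
struct demo_attr {
    uint8_t uattr_idx;
};

/* Private per-call state holds the base address once for all attributes. */
struct demo_bundle_priv {
    struct demo_uattr *uattr_base;
};

/* Recompute the pointer on demand from base address and index. */
static struct demo_uattr *demo_attr_to_uattr(const struct demo_bundle_priv *priv,
                                             const struct demo_attr *attr)
{
    return &priv->uattr_base[attr->uattr_idx];
}
```

The pointer is rebuilt only in the rare path that needs it, while every attribute slot shrinks by the difference between a pointer and one byte.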
    • IB/uverbs: Provide implementation private memory for the uverbs_attr_bundle · 4b3dd2bb
      Committed by Jason Gunthorpe
      This already existed as the anonymous 'ctx' structure, but this was not
      really a useful form. Hoist this struct into bundle_priv and rework the
      internal things to use it instead.
      
      Move a bunch of the processing internal state into the priv and reduce the
      excessive use of function arguments.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    • IB/uverbs: Use uverbs_api to manage the object type inside the uobject · 6b0d08f4
      Committed by Jason Gunthorpe
      Currently the struct uverbs_obj_type stored in the ib_uobject is part of
      the .rodata segment of the module that defines the object. This is a
      problem if drivers define new uapi objects as we will be left with a
      dangling pointer after device disassociation.
      
      Switch the uverbs_obj_type for struct uverbs_api_object, which is
      allocated memory that is part of the uverbs_api and is guaranteed to
      always exist. Further this moves the 'type_class' into this memory which
      means access to the IDR/FD function pointers is also guaranteed. Drivers
      cannot define new types.
      
      This makes it safe to continue to use all uobjects, including driver
      defined ones, after disassociation.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/uverbs: Build the specs into a radix tree at runtime · 9ed3e5f4
      Committed by Jason Gunthorpe
      This radix tree data structure is intended to replace the 'hash' structure
      used today for parsing ioctl methods during system calls. This first
      commit introduces the structure and builds it from the existing .rodata
      descriptions.
      
      The so-called hash arrangement is actually a 5 level open coded radix tree.
      This new version uses a 3 level radix tree built using the radix tree
      library.
      
      Overall this is much less code and much easier to build as the radix tree
      API allows for dynamic modification during the building. There is a small
      memory penalty to pay for this, but since the radix tree is allocated on
      a per device basis, a few kb of RAM seems immaterial considering the
      gained simplicity.
      
      The radix tree is similar to the existing tree, but also has an 'attr_bkey'
      concept, which is a small-valued index for each method attribute. This is
      used to simplify and improve performance of everything in the next
      patches.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
    • IB/uverbs: Have the core code create the uverbs_root_spec · 7d96c9b1
      Committed by Jason Gunthorpe
      There is no reason for drivers to do this; the core code should take care of
      everything. The drivers will provide their information from rodata to
      describe their modifications to the core's base uapi specification.
      
      The core uses this to build up the runtime uapi for each device.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
  11. 10 Aug, 2018 1 commit
  12. 08 Aug, 2018 4 commits
  13. 03 Aug, 2018 1 commit