1. 02 Apr, 2019 1 commit
  2. 28 Mar, 2019 1 commit
  3. 07 Mar, 2019 1 commit
    • IB/hfi1: Close race condition on user context disable and close · bc5add09
      Committed by Michael J. Ruhl
      When disabling and removing a receive context, it is possible for an
      asynchronous event (i.e., an IRQ) to occur.  Because of this, there is a
      race between cleaning up the context and the context being used by the
      asynchronous event.
      
      cpu 0  (context cleanup)
          rcd->ref_count-- (ref_count == 0)
          hfi1_rcd_free()
      cpu 1  (IRQ (with rcd index))
          rcd_get_by_index()
          lock
          ref_count++      <-- reference count race (WARNING)
          return rcd
          unlock
      cpu 0
          hfi1_free_ctxtdata() <-- incorrect free location
          lock
          remove rcd from array
          unlock
          free rcd
      
      This race will cause the following WARNING trace:
      
      WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
      CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
      Call Trace:
        dump_stack+0x19/0x1b
        __warn+0xd8/0x100
        warn_slowpath_null+0x1d/0x20
        hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
        is_rcv_urgent_int+0x24/0x90 [hfi1]
        general_interrupt+0x1b6/0x210 [hfi1]
        __handle_irq_event_percpu+0x44/0x1c0
        handle_irq_event_percpu+0x32/0x80
        handle_irq_event+0x3c/0x60
        handle_edge_irq+0x7f/0x150
        handle_irq+0xe4/0x1a0
        do_IRQ+0x4d/0xf0
        common_interrupt+0x162/0x162
      
      The race can also lead to a use after free which could be similar to:
      
      general protection fault: 0000 [#1] SMP
      CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
      task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230
      Call Trace:
        ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
        hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
        hfi1_aio_write+0xba/0x110 [hfi1]
        do_sync_readv_writev+0x7b/0xd0
        do_readv_writev+0xce/0x260
        ? handle_mm_fault+0x39d/0x9b0
        ? pick_next_task_fair+0x5f/0x1b0
        ? sched_clock_cpu+0x85/0xc0
        ? __schedule+0x13a/0x890
        vfs_writev+0x35/0x60
        SyS_writev+0x7f/0x110
        system_call_fastpath+0x22/0x27
      
      Use the appropriate kref API to verify access.
      
      Reorder context cleanup so that the context is removed from the array
      before it is cleaned up and freed.
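
The commit text does not name the kref call it uses; the kernel's kref_get_unless_zero() is the usual fit for this pattern. The following is a minimal userspace sketch of that idea using C11 atomics, not the driver's code. The `struct ctx` type and both helper names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a receive context; not the driver's struct. */
struct ctx {
    atomic_int refs;
};

/*
 * Take a reference only if the count has not already dropped to zero.
 * This mirrors the semantics of kref_get_unless_zero(): a lookup that
 * races with the final put fails instead of reviving a dying object.
 */
static bool ctx_get_unless_zero(struct ctx *c)
{
    int old = atomic_load(&c->refs);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&c->refs, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool ctx_put(struct ctx *c)
{
    return atomic_fetch_sub(&c->refs, 1) == 1;
}
```

With this shape, an interrupt-side lookup that arrives after the count reached zero gets a failure it can handle, rather than a pointer to a context that is about to be freed.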
      
      Cc: stable@vger.kernel.org # v4.14.0+
      Fixes: f683c80c ("IB/hfi1: Resolve kernel panics by reference counting receive contexts")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  4. 06 Mar, 2019 1 commit
    • mm: replace all open encodings for NUMA_NO_NODE · 98fa15f3
      Committed by Anshuman Khandual
      Patch series "Replace all open encodings for NUMA_NO_NODE", v3.
      
      All these places for replacement were found by running the following
      grep patterns on the entire kernel code.  Please let me know if this
      might have missed some instances.  This might also have replaced some
      false positives.  I will appreciate suggestions, inputs and review.
      
      1. git grep "nid == -1"
      2. git grep "node == -1"
      3. git grep "nid = -1"
      4. git grep "node = -1"
      
      This patch (of 2):
      
      At present there are multiple places where an invalid node number is
      encoded as -1.  Even though this is implicitly understood, it is always
      better to use a macro.  Replace these open encodings for an invalid node
      number with the global macro NUMA_NO_NODE.  This helps remove NUMA
      related assumptions like 'invalid node' from various places, redirecting
      them to a common definition.
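
As a small illustration of the replacement described above, here is a hedged userspace sketch. The macro value matches the kernel definition in include/linux/numa.h; the two helper functions are hypothetical and exist only to show the before and after form of a comparison.

```c
#include <stdbool.h>

/* Same value the kernel uses for "no node assigned". */
#define NUMA_NO_NODE (-1)

/* Hypothetical helper: true when nid names a real node. */
static bool node_is_valid(int nid)
{
    /* Previously open-coded as: nid != -1 */
    return nid != NUMA_NO_NODE;
}

/* Hypothetical helper: use a fallback node when none was requested. */
static int node_or_default(int nid, int fallback)
{
    /* Previously open-coded as: nid == -1 ? fallback : nid */
    return nid == NUMA_NO_NODE ? fallback : nid;
}
```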
      
      Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[ixgbe]
      Acked-by: Jens Axboe <axboe@kernel.dk>			[mtip32xx]
      Acked-by: Vinod Koul <vkoul@kernel.org>			[dmaengine.c]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Acked-by: Doug Ledford <dledford@redhat.com>		[drivers/infiniband]
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 06 Feb, 2019 3 commits
  6. 01 Feb, 2019 3 commits
  7. 08 Jan, 2019 1 commit
  8. 04 Oct, 2018 1 commit
  9. 01 Sep, 2018 3 commits
  10. 04 Jul, 2018 1 commit
  11. 22 Jun, 2018 4 commits
  12. 20 Jun, 2018 3 commits
  13. 05 Jun, 2018 3 commits
  14. 24 May, 2018 1 commit
  15. 10 May, 2018 2 commits
    • IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support · 5d18ee67
      Committed by Sebastian Sanchez
      Currently the driver doesn't support completion vectors. These
      are used to indicate which sets of CQs should be grouped together
      into the same vector. A vector is a CQ processing thread that
      runs on a specific CPU.
      
      If an application has several CQs bound to different completion
      vectors, and each completion vector runs on different CPUs, then
      the completion queue workload is balanced. This helps scale as more
      nodes are used.
      
      Implement CQ completion vector support using a global workqueue
      where a CQ entry is queued to the CPU corresponding to the CQ's
      completion vector. Since the workqueue is global, it's guaranteed
      to always be there when queueing CQ entries; therefore, the RCU
      locking for cq->rdi->worker in the hot path is superfluous.
      
      Each completion vector is assigned to a different CPU. The number of
      completion vectors available is computed by taking the number of
      online, physical CPUs from the local NUMA node and subtracting the
      CPUs used for kernel receive queues and the general interrupt.
      Special use cases:
      
        * If there are no CPUs left for completion vectors, the same CPU
          as the general interrupt is used; therefore, there would only
          be one completion vector available.
      
        * For multi-HFI systems, the number of completion vectors available
          for each device is the total number of completion vectors in
          the local NUMA node divided by the number of devices in the same
          NUMA node. If there's a division remainder, the first device to
          get initialized gets an extra completion vector.
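
The counting rules above can be sketched as plain arithmetic. This is an illustration under stated assumptions, not the driver's implementation: both function names and all parameters are hypothetical, and `init_order` is assumed to be 0 for the first device initialized on the NUMA node.

```c
/*
 * Completion vectors available on a NUMA node: online physical CPUs
 * minus those used for kernel receive queues and the general interrupt.
 * If nothing is left, the general-interrupt CPU is shared, giving one.
 */
static int node_comp_vectors(int online_cpus, int rcv_cpus,
                             int general_irq_cpus)
{
    int left = online_cpus - rcv_cpus - general_irq_cpus;

    return left > 0 ? left : 1;
}

/*
 * Share of a node's completion vectors for one device. When the
 * division leaves a remainder, the first device to be initialized
 * (init_order == 0, an assumption of this sketch) gets one extra.
 */
static int device_comp_vectors(int node_vectors, int num_devices,
                               int init_order)
{
    int base = node_vectors / num_devices;
    int extra = (init_order == 0 && node_vectors % num_devices != 0) ? 1 : 0;

    return base + extra;
}
```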
      
      Upon a CQ creation, an invalid completion vector could be specified.
      Handle it as follows:
      
        * If the completion vector is less than 0, set it to 0.
      
        * Set the completion vector to the passed completion vector
          modulo the number of device completion vectors available.
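
The two handling rules can be sketched as a single helper. The function name is hypothetical, and the sketch assumes `num_vectors` is at least 1, which follows from the rule that one completion vector is always available.

```c
/*
 * Map a caller-supplied completion vector onto the range of vectors
 * the device actually has. Negative requests become 0; anything else
 * wraps around with a modulo. Assumes num_vectors >= 1.
 */
static int normalize_comp_vector(int requested, int num_vectors)
{
    if (requested < 0)
        requested = 0;

    return requested % num_vectors;
}
```

For example, with four device completion vectors, a request for vector 5 lands on vector 1 and a request for vector -3 lands on vector 0.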
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/{hfi1, qib}: Add handling of kernel restart · 8d3e7113
      Committed by Alex Estrin
      A warm restart will fail to unload the driver, leaving the link state
      potentially flapping until the BIOS resets the adapter.
      Correct the issue by hooking the PCI shutdown method,
      which brings the port down.
      
      Cc: <stable@vger.kernel.org> # 4.9.x
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Alex Estrin <alex.estrin@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  16. 04 May, 2018 3 commits
  17. 02 Feb, 2018 4 commits
    • IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node · 953a9ceb
      Committed by Kamenee Arumugam
      The kzalloc_node API doesn't check for overflows in size multiplication.
      The kcalloc API does check for overflows in size multiplication,
      but it is not NUMA-aware.
      
      Converting to kcalloc_node corrects an allocation used in the hot
      path so that it lands on the local NUMA node, and ensures overflow-free
      multiplication for the size of a memory allocation.
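
To make the overflow concern concrete, here is a minimal userspace sketch of an array allocation that rejects a count and element size whose product would wrap. It is an analogy only: the function name is hypothetical, it uses malloc() rather than any kernel allocator, and it does not model NUMA node placement.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate a zeroed array of n elements of 'size' bytes each.
 * Returns NULL if n * size would overflow size_t, instead of
 * silently allocating a buffer that is too small.
 */
static void *array_zalloc_checked(size_t n, size_t size)
{
    void *p;

    if (size != 0 && n > SIZE_MAX / size)
        return NULL;            /* n * size would overflow */

    p = malloc(n * size);
    if (p)
        memset(p, 0, n * size);
    return p;
}
```

An open-coded `n * size` passed to a single-size allocator has no such check, which is the gap the conversion closes.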
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times · 07190076
      Committed by Kamenee Arumugam
      HFI's counters SendWaitCnt and SendWaitVLCnt are in units
      of TXE cycle time (at 805MHz). OPA counters PortXmitWait and
      PortVLXmitWait are in units of flit times.
      Convert the counter values to flit units using the following
      conversion formula:
      
      PortXmitWait =
          SendWaitCnt * 2 * (4 / link_width) * (25 Gbps / link_speed)
      PortVLXmitWait =
          SendWaitVLCnt * 2 * (4 / link_width) * (25 Gbps / link_speed)
      
      At link up or downgrade events, the link width can change. To ensure
      accurate counter calculations, sample the counters after the events,
      during counter requests, and then aggregate the OPA counters.
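
The conversion formula can be sketched with integer arithmetic as follows. This is an illustration, not the driver's code. The function name is hypothetical, and two units are assumptions of this sketch: `link_width` is taken as a lane count (1 to 4) and `link_speed_mbps` as the per-lane speed in Mb/s, so 25 Gbps is 25000.

```c
#include <stdint.h>

/*
 * Convert a wait counter in TXE cycles to flit times:
 *   flits = cycles * 2 * (4 / link_width) * (25 Gbps / link_speed)
 * All multiplications are done before the single division so that
 * integer truncation happens only once. Very large counter values
 * could overflow the 64-bit intermediate; this sketch ignores that.
 */
static uint64_t wait_cycles_to_flits(uint64_t send_wait_cnt,
                                     unsigned int link_width,
                                     unsigned int link_speed_mbps)
{
    if (link_width == 0 || link_speed_mbps == 0)
        return 0;               /* link is down: nothing to convert */

    return send_wait_cnt * 2 * 4 * 25000 /
           ((uint64_t)link_width * link_speed_mbps);
}
```

Because the result depends on the link width and speed at the time of sampling, a counter sampled before a width change has to be converted with the old width, which is why the commit samples around link events and aggregates.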
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/hfi1: Fix for early release of sdma context · 473291b3
      Committed by Alex Estrin
      With the IRQF_SHARED flag set and CONFIG_DEBUG_SHIRQ enabled,
      module removal may result in a panic in the sdma_interrupt() routine
      if the associated sdma context was released before pci_free_irq():
      
      [ 9198.939885] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [ 9198.940514] IP: sdma_make_progress+0xa5/0x450 [hfi1]
      [ 9198.941114] PGD 170bdc0067 P4D 170bdc0067 PUD 172063e067 PMD 0
      [ 9198.941783] Oops: 0000 [#1] SMP
      .....
      [ 9198.958877] CPU: 132 PID: 64173 Comm: rmmod Tainted: G           OE   4.14.0-rc4+ #1
      [ 9198.961032] Hardware name: Intel Corporation S7200AP/S7200AP, BIOS S72C610.86B.01.02.0118.080620171935 08/06/2017
      [ 9198.963323] task: ffff9681397f0000 task.stack: ffffae1647c40000
      [ 9198.965695] RIP: 0010:sdma_make_progress+0xa5/0x450 [hfi1]
      [ 9198.968082] RSP: 0018:ffffae1647c43be8 EFLAGS: 00010046
      [ 9198.970503] RAX: 0000000000000000 RBX: ffff9680ce8b5ca8 RCX: 0000000000000000
      [ 9198.973006] RDX: 0000000000000000 RSI: 0000000001a00d28 RDI: ffff9680ce8b5ca0
      [ 9198.975546] RBP: ffffae1647c43c40 R08: ffff96814325ec00 R09: 00000000ffffffff
      [ 9198.978142] R10: 000000004325e501 R11: ffff96814325ec00 R12: ffff9680ce8b5c44
      [ 9198.980779] R13: ffff9680ce8b5ca0 R14: 0000000000000000 R15: ffff9680ce8b5b00
      [ 9198.983462] FS:  00007f31196ba740(0000) GS:ffff96819df00000(0000) knlGS:0000000000000000
      [ 9198.986231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 9198.989036] CR2: 0000000000000000 CR3: 000000170833f000 CR4: 00000000001406e0
      [ 9198.991911] Call Trace:
      [ 9198.994847]  sdma_engine_interrupt+0x82/0x100 [hfi1]
      [ 9198.997852]  sdma_interrupt+0x61/0xc0 [hfi1]
      [ 9199.000852]  __free_irq+0x1b3/0x2d0
      [ 9199.003873]  free_irq+0x35/0x70
      [ 9199.006909]  pci_free_irq+0x1c/0x30
      [ 9199.009999]  clean_up_interrupts+0x53/0xf0 [hfi1]
      [ 9199.013137]  hfi1_start_cleanup+0x117/0x190 [hfi1]
      [ 9199.016315]  postinit_cleanup+0x1d/0x270 [hfi1]
      [ 9199.019529]  remove_one+0x1f3/0x210 [hfi1]
      [ 9199.022738]  pci_device_remove+0x39/0xc0
      [ 9199.025974]  device_release_driver_internal+0x141/0x210
      [ 9199.029268]  driver_detach+0x3f/0x80
      [ 9199.032580]  bus_remove_driver+0x55/0xd0
      [ 9199.035931]  driver_unregister+0x2c/0x50
      [ 9199.039321]  pci_unregister_driver+0x2a/0xa0
      [ 9199.042755]  hfi1_mod_cleanup+0x10/0xb50 [hfi1]
      [ 9199.046196]  SyS_delete_module+0x171/0x250
      ...
      
      Fix by exporting sdma_clean() and removing its call from sdma_exit().
      sdma_exit() now only manipulates the engine state,
      leaving the freeing of memory to sdma_clean(), which is now called
      just before the dd is freed.
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Michael J Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Alex Estrin <alex.estrin@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/hfi1: Re-order IRQ cleanup to address driver cleanup race · 82a97926
      Committed by Michael J. Ruhl
      The pci_request_irq() interface always adds the IRQF_SHARED bit to
      all IRQ requests.
      
      When the kernel is built with CONFIG_DEBUG_SHIRQ config flag, if the
      IRQF_SHARED bit is set, a call to the IRQ handler is made from the
      __free_irq() function. This is testing a race condition between the
      IRQ cleanup and an IRQ racing the cleanup.  The HFI driver should be
      able to handle this race, but does not.
      
      This race can cause traces that start with this footprint:
      
      BUG: unable to handle kernel NULL pointer dereference at   (null)
      Call Trace:
       <hfi1 irq handler>
       ...
       __free_irq+0x1b3/0x2d0
       free_irq+0x35/0x70
       pci_free_irq+0x1c/0x30
       clean_up_interrupts+0x53/0xf0 [hfi1]
       hfi1_start_cleanup+0x122/0x190 [hfi1]
       postinit_cleanup+0x1d/0x280 [hfi1]
       remove_one+0x233/0x250 [hfi1]
       pci_device_remove+0x39/0xc0
      
      Export IRQ cleanup function so it can be called from other modules.
      
      Using the exported cleanup function:
      
        Re-order the driver cleanup code to clean up IRQ resources before
        other resources, eliminating the race.
      
        Re-order error path for init so that the race does not occur.
      
      Reduce severity on spurious error message for SDMA IRQs to info.
      Reviewed-by: Alex Estrin <alex.estrin@intel.com>
      Reviewed-by: Patel Jay P <jay.p.patel@intel.com>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  18. 06 Jan, 2018 1 commit
  19. 14 Nov, 2017 1 commit
  20. 31 Oct, 2017 1 commit
  21. 18 Oct, 2017 1 commit
    • IB/hfi1: Convert timers to use timer_setup() · 8064135e
      Committed by Kees Cook
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly. Switch the test of the .data field
      to .function, since .data will be going away.
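
The callback style this conversion moves to can be sketched in userspace. In the kernel, from_timer() is built on container_of(), which recovers the enclosing structure from a pointer to its embedded timer. Everything below is a hypothetical stand-in: `struct timer_stub`, `struct engine`, `owner_of`, and both functions are names invented for this sketch, not kernel APIs.

```c
#include <stddef.h>

/* Stand-in for struct timer_list: the callback receives the timer itself. */
struct timer_stub {
    void (*function)(struct timer_stub *);
};

/* Hypothetical owner structure with an embedded timer. */
struct engine {
    int ticks;
    struct timer_stub timer;
};

/* Same idea as container_of(): step back from the member to its owner. */
#define owner_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback: derive the owner from the timer pointer, no .data needed. */
static void engine_timeout(struct timer_stub *t)
{
    struct engine *e = owner_of(t, struct engine, timer);

    e->ticks++;
}

/* Stand-in for timer_setup(): only records the callback. */
static void timer_stub_setup(struct timer_stub *t,
                             void (*fn)(struct timer_stub *))
{
    t->function = fn;
}
```

Since the owner is derived from the timer's address, a separate data field is no longer required, and checking whether `function` is set can serve as the "is this timer initialized" test.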
      
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: linux-rdma@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>