1. 24 May, 2018 1 commit
  2. 10 May, 2018 2 commits
    • IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support · 5d18ee67
      Committed by Sebastian Sanchez
      Currently the driver doesn't support completion vectors. These
      are used to indicate which sets of CQs should be grouped together
      into the same vector. A vector is a CQ processing thread that
      runs on a specific CPU.
      
      If an application has several CQs bound to different completion
      vectors, and each completion vector runs on different CPUs, then
      the completion queue workload is balanced. This helps scale as more
      nodes are used.
      
      Implement CQ completion vector support using a global workqueue
      where a CQ entry is queued to the CPU corresponding to the CQ's
      completion vector. Since the workqueue is global, it's guaranteed
      to always be there when queueing CQ entries; therefore, the RCU
      locking for cq->rdi->worker in the hot path is superfluous.
      
      Each completion vector is assigned to a different CPU. The number of
      completion vectors available is computed by taking the number of
      online, physical CPUs from the local NUMA node and subtracting the
      CPUs used for kernel receive queues and the general interrupt.
      Special use cases:
      
        * If there are no CPUs left for completion vectors, the same CPU
          for the general interrupt is used; therefore, there would only
          be one completion vector available.
      
        * For multi-HFI systems, the number of completion vectors available
          for each device is the total number of completion vectors in
          the local NUMA node divided by the number of devices in the same
          NUMA node. If there's a division remainder, the first device to
          get initialized gets an extra completion vector.
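
The counting rules above can be sketched in user-space C. This is a minimal illustration of the arithmetic as the commit message describes it, not the driver's code: the function names and parameters are hypothetical, and the CPU counts are passed in as plain numbers instead of being read from CPU masks.

```c
#include <assert.h>

/*
 * Completion vectors available on one NUMA node: the node's online,
 * physical CPUs minus those reserved for kernel receive queues and the
 * general interrupt. If nothing is left, the general interrupt CPU is
 * shared, so exactly one completion vector is available.
 * (Hypothetical helper; names are not taken from the driver.)
 */
unsigned int node_comp_vectors(unsigned int node_cpus,
                               unsigned int rcv_queue_cpus,
                               unsigned int general_irq_cpus)
{
    unsigned int reserved = rcv_queue_cpus + general_irq_cpus;

    if (node_cpus <= reserved)
        return 1;
    return node_cpus - reserved;
}

/*
 * Share of the node's completion vectors given to one device when
 * several devices sit on the same NUMA node. On a division remainder,
 * the first device to be initialized gets one extra vector.
 */
unsigned int dev_comp_vectors(unsigned int node_vectors,
                              unsigned int devices_on_node,
                              int first_initialized)
{
    unsigned int share;

    assert(devices_on_node > 0);
    share = node_vectors / devices_on_node;
    if (first_initialized && node_vectors % devices_on_node != 0)
        share++;
    return share;
}
```

For example, with 16 CPUs on the node, 4 receive-queue CPUs and 1 general interrupt CPU, `node_comp_vectors(16, 4, 1)` gives 11; split across two devices, the first gets 6 and the second 5. The commit message does not say what happens when there are fewer vectors than devices, so the sketch does not treat that case specially.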
      
      Upon a CQ creation, an invalid completion vector could be specified.
      Handle it as follows:
      
        * If the completion vector is less than 0, set it to 0.
      
        * Set the completion vector to the passed completion vector
          modulo the number of device completion vectors
          available.
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
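
The handling of an invalid completion vector at CQ creation amounts to a clamp followed by a modulo. A minimal sketch, assuming a hypothetical helper name and that the device exposes at least one completion vector:

```c
#include <assert.h>

/*
 * Map a caller-supplied completion vector onto a valid one:
 * negative values become 0, and anything else wraps around the
 * number of completion vectors the device offers.
 * (Hypothetical helper; not the driver's function name.)
 */
int normalize_comp_vector(int comp_vector, int num_comp_vectors)
{
    assert(num_comp_vectors > 0);
    if (comp_vector < 0)
        comp_vector = 0;
    return comp_vector % num_comp_vectors;
}
```

With four vectors available, a request for vector 5 lands on vector 1 and a request for vector -3 lands on vector 0.
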
    • IB/{hfi1, qib}: Add handling of kernel restart · 8d3e7113
      Committed by Alex Estrin
      A warm restart fails to unload the driver, leaving the link state
      potentially flapping until the BIOS resets the adapter.
      Correct the issue by hooking the PCI shutdown method,
      which brings the port down.
      
      Cc: <stable@vger.kernel.org> # 4.9.x
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Alex Estrin <alex.estrin@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  3. 04 May, 2018 3 commits
  4. 02 February, 2018 4 commits
    • IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node · 953a9ceb
      Committed by Kamenee Arumugam
      The kzalloc_node API doesn't check for overflow in the size
      multiplication, while the kcalloc API does check for overflow
      but is not NUMA-aware.
      
      This conversion places an allocation used in the hot path on the
      local NUMA node and ensures overflow-free multiplication for the
      size of a memory allocation.
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
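
The overflow concern in this commit can be illustrated in user space. The sketch below is a stand-in, not kernel code: `mul_size_checked()` and `mock_calloc_node()` are hypothetical names, and the `node` argument is only there to mirror the NUMA-aware shape of the call; it is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Store n * size in *out and return 1 if the product fits in size_t;
 * return 0 if the multiplication would overflow.
 */
int mul_size_checked(size_t n, size_t size, size_t *out)
{
    assert(out != NULL);
    if (size != 0 && n > SIZE_MAX / size)
        return 0;
    *out = n * size;
    return 1;
}

/*
 * Hypothetical user-space analogue of an overflow-checked, zeroed
 * array allocation. The node hint is accepted but unused here.
 */
void *mock_calloc_node(size_t n, size_t size, int node)
{
    size_t bytes;

    (void)node;
    if (!mul_size_checked(n, size, &bytes))
        return NULL; /* refuse instead of allocating a short buffer */
    return calloc(1, bytes);
}
```

An unchecked `n * size` that wraps around would produce a buffer smaller than the caller expects; the checked variant fails the allocation instead.
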
    • IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times · 07190076
      Committed by Kamenee Arumugam
      HFI's counters SendWaitCnt and SendWaitVlCnt are in units
      of TXE cycle time (at 805MHz). OPA counters PortXmitWait and
      PortVLXmitWait are in units of flit times.
      Convert the counter values to flit units using the following
      conversion formula:
      
      PortXmitWait =
      	SendWaitCnt * 2 * (4 / link_width) * (25 Gbps / link_speed)
      PortVLXmitWait =
      	SendWaitVlCnt * 2 * (4 / link_width) * (25 Gbps / link_speed)
      
      At link up or downgrade events, the link width can change. To ensure
      accurate counter calculations, sample the counters after the events,
      during counter requests, and then aggregate the OPA counters.
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
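
The conversion formula in this commit can be written with integer arithmetic. This is a sketch under stated assumptions, not the driver's implementation: the function name is hypothetical, `link_width` is taken to be the number of active lanes (1 to 4), and `link_speed_mbps` is the link speed in Mbps so that speeds such as 12.5 Gbps need no floating point.

```c
#include <assert.h>
#include <stdint.h>

#define FULL_LINK_WIDTH      4u      /* lanes, the "4" in the formula */
#define FULL_LINK_SPEED_MBPS 25000u  /* the "25 Gbps" in the formula */

/*
 * Convert a wait counter from TXE cycles to flit times:
 *   flits = cnt * 2 * (4 / link_width) * (25 Gbps / link_speed)
 * computed as cnt * num / den, split into quotient and remainder so
 * the intermediate product stays small.
 */
uint64_t wait_cycles_to_flits(uint64_t send_wait_cnt,
                              unsigned int link_width,
                              unsigned int link_speed_mbps)
{
    uint64_t num = 2ull * FULL_LINK_WIDTH * FULL_LINK_SPEED_MBPS;
    uint64_t den;

    assert(link_width > 0 && link_speed_mbps > 0);
    den = (uint64_t)link_width * link_speed_mbps;
    return (send_wait_cnt / den) * num + (send_wait_cnt % den) * num / den;
}
```

At full width and full speed the factor is 2, so 1000 cycles become 2000 flit times; on a link downgraded to 2 lanes at 12.5 Gbps the same 1000 cycles become 8000. Because the factor depends on the current width and speed, a value converted this way is only valid for the interval in which the link did not change, which is why the commit samples and aggregates the counters around link events.
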
    • IB/hfi1: Fix for early release of sdma context · 473291b3
      Committed by Alex Estrin
      With the IRQF_SHARED flag set and CONFIG_DEBUG_SHIRQ enabled,
      module removal may result in a panic in the sdma_interrupt() routine
      if the associated sdma context was released before pci_free_irq():
      
      [ 9198.939885] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [ 9198.940514] IP: sdma_make_progress+0xa5/0x450 [hfi1]
      [ 9198.941114] PGD 170bdc0067 P4D 170bdc0067 PUD 172063e067 PMD 0
      [ 9198.941783] Oops: 0000 [#1] SMP
      .....
      [ 9198.958877] CPU: 132 PID: 64173 Comm: rmmod Tainted: G           OE   4.14.0-rc4+ #1
      [ 9198.961032] Hardware name: Intel Corporation S7200AP/S7200AP, BIOS S72C610.86B.01.02.0118.080620171935 08/06/2017
      [ 9198.963323] task: ffff9681397f0000 task.stack: ffffae1647c40000
      [ 9198.965695] RIP: 0010:sdma_make_progress+0xa5/0x450 [hfi1]
      [ 9198.968082] RSP: 0018:ffffae1647c43be8 EFLAGS: 00010046
      [ 9198.970503] RAX: 0000000000000000 RBX: ffff9680ce8b5ca8 RCX: 0000000000000000
      [ 9198.973006] RDX: 0000000000000000 RSI: 0000000001a00d28 RDI: ffff9680ce8b5ca0
      [ 9198.975546] RBP: ffffae1647c43c40 R08: ffff96814325ec00 R09: 00000000ffffffff
      [ 9198.978142] R10: 000000004325e501 R11: ffff96814325ec00 R12: ffff9680ce8b5c44
      [ 9198.980779] R13: ffff9680ce8b5ca0 R14: 0000000000000000 R15: ffff9680ce8b5b00
      [ 9198.983462] FS:  00007f31196ba740(0000) GS:ffff96819df00000(0000) knlGS:0000000000000000
      [ 9198.986231] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 9198.989036] CR2: 0000000000000000 CR3: 000000170833f000 CR4: 00000000001406e0
      [ 9198.991911] Call Trace:
      [ 9198.994847]  sdma_engine_interrupt+0x82/0x100 [hfi1]
      [ 9198.997852]  sdma_interrupt+0x61/0xc0 [hfi1]
      [ 9199.000852]  __free_irq+0x1b3/0x2d0
      [ 9199.003873]  free_irq+0x35/0x70
      [ 9199.006909]  pci_free_irq+0x1c/0x30
      [ 9199.009999]  clean_up_interrupts+0x53/0xf0 [hfi1]
      [ 9199.013137]  hfi1_start_cleanup+0x117/0x190 [hfi1]
      [ 9199.016315]  postinit_cleanup+0x1d/0x270 [hfi1]
      [ 9199.019529]  remove_one+0x1f3/0x210 [hfi1]
      [ 9199.022738]  pci_device_remove+0x39/0xc0
      [ 9199.025974]  device_release_driver_internal+0x141/0x210
      [ 9199.029268]  driver_detach+0x3f/0x80
      [ 9199.032580]  bus_remove_driver+0x55/0xd0
      [ 9199.035931]  driver_unregister+0x2c/0x50
      [ 9199.039321]  pci_unregister_driver+0x2a/0xa0
      [ 9199.042755]  hfi1_mod_cleanup+0x10/0xb50 [hfi1]
      [ 9199.046196]  SyS_delete_module+0x171/0x250
      ...
      
      Fix by exporting sdma_clean() and removing its call from sdma_exit().
      sdma_exit() now just manipulates the engine state,
      leaving the freeing of memory to sdma_clean(), which is now called
      just before the dd is freed.
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Michael J Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Alex Estrin <alex.estrin@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/hfi1: Re-order IRQ cleanup to address driver cleanup race · 82a97926
      Committed by Michael J. Ruhl
      The pci_request_irq() interface always adds the IRQF_SHARED bit to
      all IRQ requests.
      
      When the kernel is built with the CONFIG_DEBUG_SHIRQ config flag and
      the IRQF_SHARED bit is set, a call to the IRQ handler is made from the
      __free_irq() function. This tests for a race between the IRQ cleanup
      and an IRQ arriving during the cleanup. The HFI driver should be
      able to handle this race, but does not.
      
      This race can cause traces that start with this footprint:
      
      BUG: unable to handle kernel NULL pointer dereference at   (null)
      Call Trace:
       <hfi1 irq handler>
       ...
       __free_irq+0x1b3/0x2d0
       free_irq+0x35/0x70
       pci_free_irq+0x1c/0x30
       clean_up_interrupts+0x53/0xf0 [hfi1]
       hfi1_start_cleanup+0x122/0x190 [hfi1]
       postinit_cleanup+0x1d/0x280 [hfi1]
       remove_one+0x233/0x250 [hfi1]
       pci_device_remove+0x39/0xc0
      
      Export IRQ cleanup function so it can be called from other modules.
      
      Using the exported cleanup function:
      
        Re-order the driver cleanup code to clean up IRQ resources before
        other resources, eliminating the race.
      
        Re-order error path for init so that the race does not occur.
      
      Reduce severity on spurious error message for SDMA IRQs to info.
      Reviewed-by: Alex Estrin <alex.estrin@intel.com>
      Reviewed-by: Patel Jay P <jay.p.patel@intel.com>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
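
Why the cleanup order matters can be shown with a small user-space mock. Everything here is hypothetical (`mock_dev`, `mock_free_irq()` and the rest are not driver or kernel symbols): the mock only imitates the behaviour the commit message describes, where freeing a shared IRQ triggers one extra call to the handler. Instead of crashing on a released context, the mock handler counts how often it would have dereferenced NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a per-engine context that the IRQ handler touches. */
struct mock_ctx {
    int progress;
};

struct mock_dev {
    struct mock_ctx *ctx; /* released during cleanup */
    int irq_registered;   /* 1 while the handler is installed */
    int null_ctx_hits;    /* handler calls that found ctx == NULL */
};

int mock_dev_init(struct mock_dev *dev)
{
    assert(dev != NULL);
    dev->ctx = calloc(1, sizeof(*dev->ctx));
    dev->irq_registered = (dev->ctx != NULL);
    dev->null_ctx_hits = 0;
    return dev->ctx ? 0 : -1;
}

/* A real handler would oops here; the mock records the bad access. */
void mock_irq_handler(struct mock_dev *dev)
{
    if (!dev->ctx) {
        dev->null_ctx_hits++;
        return;
    }
    dev->ctx->progress++;
}

/* Imitates freeing a shared IRQ with the debug option enabled:
 * the handler is invoked one more time while the IRQ is torn down. */
void mock_free_irq(struct mock_dev *dev)
{
    if (!dev->irq_registered)
        return;
    dev->irq_registered = 0;
    mock_irq_handler(dev);
}

void mock_release_ctx(struct mock_dev *dev)
{
    free(dev->ctx);
    dev->ctx = NULL;
}

/* Old order: context released first, so the extra call sees NULL. */
int cleanup_ctx_first(struct mock_dev *dev)
{
    mock_release_ctx(dev);
    mock_free_irq(dev);
    return dev->null_ctx_hits;
}

/* Re-ordered cleanup: IRQ resources go first, then everything else. */
int cleanup_irq_first(struct mock_dev *dev)
{
    mock_free_irq(dev);
    mock_release_ctx(dev);
    return dev->null_ctx_hits;
}
```

Running `cleanup_ctx_first()` on a freshly initialized mock device reports one bad access, while `cleanup_irq_first()` reports none, which is the effect of cleaning up IRQ resources before other resources.
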
  5. 06 January, 2018 1 commit
  6. 14 November, 2017 1 commit
  7. 31 October, 2017 1 commit
  8. 18 October, 2017 1 commit
    • IB/hfi1: Convert timers to use timer_setup() · 8064135e
      Committed by Kees Cook
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly. Switches test of .data field to
      .function, since .data will be going away.
      
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: linux-rdma@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
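
The pattern behind this conversion, where a callback receives the timer pointer and recovers its enclosing structure, can be sketched in user space. The types and helpers below are mock stand-ins with hypothetical names, not the kernel's `struct timer_list` API; `mock_container_of()` plays the role that `from_timer()` plays in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Mock timer: the callback takes the timer itself, not an opaque
 * data word. */
struct mock_timer {
    void (*function)(struct mock_timer *t);
};

/* Recover the structure that embeds `member` from a pointer to it. */
#define mock_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical owner of a timer. */
struct mock_engine {
    int id;
    int fired;
    struct mock_timer err_timer;
};

void mock_timer_setup(struct mock_timer *t,
                      void (*fn)(struct mock_timer *))
{
    assert(fn != NULL);
    t->function = fn;
}

/* Callback: derive the engine from the embedded timer pointer. */
void err_timer_cb(struct mock_timer *t)
{
    struct mock_engine *e =
        mock_container_of(t, struct mock_engine, err_timer);

    e->fired++;
}

/* With no data field left, "is this timer set up?" is answered by
 * testing the function pointer. */
void mock_timer_fire(struct mock_timer *t)
{
    if (t->function)
        t->function(t);
}
```

Firing a timer that was never set up is a no-op in this mock, and after `mock_timer_setup(&e.err_timer, err_timer_cb)` each fire increments `e.fired` on the engine that embeds the timer.
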
  9. 27 September, 2017 3 commits
  10. 23 August, 2017 3 commits
  11. 01 August, 2017 3 commits
  12. 28 June, 2017 2 commits
  13. 05 May, 2017 7 commits
  14. 29 April, 2017 3 commits
  15. 21 April, 2017 3 commits
  16. 06 April, 2017 1 commit
  17. 19 February, 2017 1 commit