1. 03 July 2020, 2 commits
    • IB/hfi1: Do not destroy link_wq when the device is shut down · 2315ec12
      Committed by Kaike Wan
      The workqueue link_wq should only be destroyed when the hfi1 driver is
      unloaded, not when the device is shut down.
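
      A hedged sketch of the fix's shape (the helper names are illustrative,
      not the driver's actual functions; in hfi1, link_wq hangs off the
      per-port data):

        /* Sketch: the shutdown path at most quiesces the workqueue;
         * only driver unload may destroy it, since timers and IRQs
         * can still post link events while the device is going down.
         */
        static void example_shutdown(struct hfi1_pportdata *ppd)
        {
                flush_workqueue(ppd->link_wq);
        }

        static void example_unload(struct hfi1_pportdata *ppd)
        {
                destroy_workqueue(ppd->link_wq);
                ppd->link_wq = NULL;
        }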
      
      Fixes: 71d47008 ("IB/hfi1: Create workqueue for link events")
      Link: https://lore.kernel.org/r/20200623204053.107638.70315.stgit@awfm-01.aw.intel.com
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      2315ec12
    • IB/hfi1: Do not destroy hfi1_wq when the device is shut down · 28b70cd9
      Committed by Kaike Wan
      The workqueue hfi1_wq is destroyed in function shutdown_device(), which is
      called by either shutdown_one() or remove_one(). The function
      shutdown_one() is called when the kernel is rebooted while remove_one() is
      called when the hfi1 driver is unloaded. When the kernel is rebooted,
      hfi1_wq is destroyed while all qps are still active, leading to a kernel
      crash:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
        IP: [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0
        PGD 0
        Oops: 0000 [#1] SMP
        Modules linked in: dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) ib_isert iscsi_target_mod target_core_mod ib_ucm mlx4_ib iTCO_wdt iTCO_vendor_support mxm_wmi sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm rpcrdma sunrpc irqbypass crc32_pclmul ghash_clmulni_intel rdma_ucm aesni_intel ib_uverbs lrw gf128mul opa_vnic glue_helper ablk_helper ib_iser cryptd ib_umad rdma_cm iw_cm ses enclosure libiscsi scsi_transport_sas pcspkr joydev ib_ipoib(OE) scsi_transport_iscsi ib_cm sg ipmi_ssif mei_me lpc_ich i2c_i801 mei ioatdma ipmi_si dm_multipath ipmi_devintf ipmi_msghandler wmi acpi_pad acpi_power_meter hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm hfi1(OE)
        crct10dif_pclmul crct10dif_common crc32c_intel drm ahci mlx4_core libahci rdmavt(OE) igb megaraid_sas ib_core libata drm_panel_orientation_quirks ptp pps_core devlink dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod
        CPU: 19 PID: 0 Comm: swapper/19 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
        Hardware name: Phegda X2226A/S2600CW, BIOS SE5C610.86B.01.01.0024.021320181901 02/13/2018
        task: ffff8a799ba0d140 ti: ffff8a799bad8000 task.ti: ffff8a799bad8000
        RIP: 0010:[<ffffffff94cb7b02>] [<ffffffff94cb7b02>] __queue_work+0x32/0x3e0
        RSP: 0018:ffff8a90dde43d80 EFLAGS: 00010046
        RAX: 0000000000000082 RBX: 0000000000000086 RCX: 0000000000000000
        RDX: ffff8a90b924fcb8 RSI: 0000000000000000 RDI: 000000000000001b
        RBP: ffff8a90dde43db8 R08: ffff8a799ba0d6d8 R09: ffff8a90dde53900
        R10: 0000000000000002 R11: ffff8a90dde43de8 R12: ffff8a90b924fcb8
        R13: 000000000000001b R14: 0000000000000000 R15: ffff8a90d2890000
        FS: 0000000000000000(0000) GS:ffff8a90dde40000(0000) knlGS:0000000000000000
        CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000102 CR3: 0000001a70410000 CR4: 00000000001607e0
        Call Trace:
        [<ffffffff94cb8105>] queue_work_on+0x45/0x50
        [<ffffffffc03f781e>] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
        [<ffffffffc03f78a2>] hfi1_schedule_send+0x32/0x70 [hfi1]
        [<ffffffffc02cf2d9>] rvt_rc_timeout+0xe9/0x130 [rdmavt]
        [<ffffffff94ce563a>] ? trigger_load_balance+0x6a/0x280
        [<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt]
        [<ffffffff94ca7f58>] call_timer_fn+0x38/0x110
        [<ffffffffc02cf1f0>] ? rvt_free_qpn+0x40/0x40 [rdmavt]
        [<ffffffff94caa3bd>] run_timer_softirq+0x24d/0x300
        [<ffffffff94ca0f05>] __do_softirq+0xf5/0x280
        [<ffffffff9537832c>] call_softirq+0x1c/0x30
        [<ffffffff94c2e675>] do_softirq+0x65/0xa0
        [<ffffffff94ca1285>] irq_exit+0x105/0x110
        [<ffffffff953796c8>] smp_apic_timer_interrupt+0x48/0x60
        [<ffffffff95375df2>] apic_timer_interrupt+0x162/0x170
        <EOI>
        [<ffffffff951adfb7>] ? cpuidle_enter_state+0x57/0xd0
        [<ffffffff951ae10e>] cpuidle_idle_call+0xde/0x230
        [<ffffffff94c366de>] arch_cpu_idle+0xe/0xc0
        [<ffffffff94cfc3ba>] cpu_startup_entry+0x14a/0x1e0
        [<ffffffff94c57db7>] start_secondary+0x1f7/0x270
        [<ffffffff94c000d5>] start_cpu+0x5/0x14
      
      The solution is to destroy the workqueue only when the hfi1 driver is
      unloaded, not when the device is shut down. In addition, when the
      device is shut down, no more work is scheduled on the workqueues, and
      the workqueues are flushed.
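
      A hedged sketch of the scheduling guard (dd->shutdown is a
      hypothetical flag set on the shutdown path; the real driver's guard
      may differ):

        /* Refuse to queue new work once shutdown has started, so the
         * flushed workqueue stays empty until driver unload destroys it.
         */
        static bool example_schedule(struct hfi1_devdata *dd,
                                     struct workqueue_struct *wq,
                                     struct work_struct *work)
        {
                if (READ_ONCE(dd->shutdown))
                        return false; /* device going down: drop the work */
                return queue_work(wq, work);
        }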
      
      Fixes: 8d3e7113 ("IB/{hfi1, qib}: Add handling of kernel restart")
      Link: https://lore.kernel.org/r/20200623204047.107638.77646.stgit@awfm-01.aw.intel.com
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      28b70cd9
  2. 21 May 2020, 3 commits
  3. 19 March 2020, 1 commit
  4. 10 January 2020, 3 commits
  5. 04 January 2020, 1 commit
  6. 07 November 2019, 1 commit
  7. 06 May 2019, 1 commit
    • IB/hfi1: Fix WQ_MEM_RECLAIM warning · 4c4b1996
      Committed by Mike Marciniszyn
      The work-item cancellations that occur when a QP is destroyed can
      elicit the following trace:
      
       workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1]
       WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100
       Call Trace:
        __flush_work.isra.29+0x8c/0x1a0
        ? __switch_to_asm+0x40/0x70
        __cancel_work_timer+0x103/0x190
        ? schedule+0x32/0x80
        iowait_cancel_work+0x15/0x30 [hfi1]
        rvt_reset_qp+0x1f8/0x3e0 [rdmavt]
        rvt_destroy_qp+0x65/0x1f0 [rdmavt]
        ? _cond_resched+0x15/0x30
        ib_destroy_qp+0xe9/0x230 [ib_core]
        ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib]
        process_one_work+0x171/0x370
        worker_thread+0x49/0x3f0
        kthread+0xf8/0x130
        ? max_active_store+0x80/0x80
        ? kthread_bind+0x10/0x10
        ret_from_fork+0x35/0x40
      
      Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM flag.
      
      The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become
      entangled with memory reclaim, so this flag is appropriate.
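
      A hedged illustration of where the flag lands (the workqueue name and
      the other flags here are illustrative, not quoted from the driver):

        /* A workqueue that gets flushed from memory-reclaim context must
         * itself be WQ_MEM_RECLAIM, or check_flush_dependency() warns as
         * in the trace above.
         */
        static struct workqueue_struct *example_alloc_hfi1_wq(void)
        {
                return alloc_workqueue("hfi1_wq",
                                       WQ_MEM_RECLAIM | WQ_CPU_INTENSIVE,
                                       num_possible_cpus());
        }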
      
      Fixes: 0a226edd ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines")
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      4c4b1996
  8. 02 April 2019, 1 commit
  9. 28 March 2019, 1 commit
  10. 07 March 2019, 1 commit
    • IB/hfi1: Close race condition on user context disable and close · bc5add09
      Committed by Michael J. Ruhl
      When disabling and removing a receive context, it is possible for an
      asynchronous event (i.e., an IRQ) to occur. Because of this, there is
      a race between cleaning up the context and the context being used by
      the asynchronous event.
      
      cpu 0  (context cleanup)
          rcd->ref_count-- (ref_count == 0)
          hfi1_rcd_free()
      cpu 1  (IRQ (with rcd index))
          rcd_get_by_index()
          lock
          ref_count++      <-- reference count race (WARNING)
          return rcd
          unlock
      cpu 0
          hfi1_free_ctxtdata() <-- incorrect free location
          lock
          remove rcd from array
          unlock
          free rcd
      
      This race will cause the following WARNING trace:
      
      WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
      CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
      Call Trace:
        dump_stack+0x19/0x1b
        __warn+0xd8/0x100
        warn_slowpath_null+0x1d/0x20
        hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
        is_rcv_urgent_int+0x24/0x90 [hfi1]
        general_interrupt+0x1b6/0x210 [hfi1]
        __handle_irq_event_percpu+0x44/0x1c0
        handle_irq_event_percpu+0x32/0x80
        handle_irq_event+0x3c/0x60
        handle_edge_irq+0x7f/0x150
        handle_irq+0xe4/0x1a0
        do_IRQ+0x4d/0xf0
        common_interrupt+0x162/0x162
      
      The race can also lead to a use after free which could be similar to:
      
      general protection fault: 0000 [#1] SMP
      CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
      task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000
      __kmalloc+0x94/0x230
      Call Trace:
        ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
        hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
        hfi1_aio_write+0xba/0x110 [hfi1]
        do_sync_readv_writev+0x7b/0xd0
        do_readv_writev+0xce/0x260
        ? handle_mm_fault+0x39d/0x9b0
        ? pick_next_task_fair+0x5f/0x1b0
        ? sched_clock_cpu+0x85/0xc0
        ? __schedule+0x13a/0x890
        vfs_writev+0x35/0x60
        SyS_writev+0x7f/0x110
        system_call_fastpath+0x22/0x27
      
      Use the appropriate kref API to verify access.
      
      Reorder context cleanup to ensure that the context is removed from
      the array before its cleanup and free occur.
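
      A hedged sketch of the kref pattern involved (simplified; the locking
      and field names follow hfi1's style but are not quoted from it):

        /* Take a reference only while the count is still non-zero.
         * kref_get_unless_zero() fails once teardown has begun, which
         * closes the window behind the WARNING above.
         */
        static struct hfi1_ctxtdata *example_rcd_get(struct hfi1_devdata *dd,
                                                     u16 index)
        {
                struct hfi1_ctxtdata *rcd;
                unsigned long flags;

                spin_lock_irqsave(&dd->uctxt_lock, flags);
                rcd = dd->rcd[index];
                if (rcd && !kref_get_unless_zero(&rcd->kref))
                        rcd = NULL; /* already being torn down */
                spin_unlock_irqrestore(&dd->uctxt_lock, flags);
                return rcd;
        }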
      
      Cc: stable@vger.kernel.org # v4.14.0+
      Fixes: f683c80c ("IB/hfi1: Resolve kernel panics by reference counting receive contexts")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      bc5add09
  11. 06 March 2019, 1 commit
    • mm: replace all open encodings for NUMA_NO_NODE · 98fa15f3
      Committed by Anshuman Khandual
      Patch series "Replace all open encodings for NUMA_NO_NODE", v3.
      
      All these places for replacement were found by running the following
      grep patterns on the entire kernel code.  Please let me know if this
      might have missed some instances.  This might also have replaced some
      false positives.  I will appreciate suggestions, inputs and review.
      
      1. git grep "nid == -1"
      2. git grep "node == -1"
      3. git grep "nid = -1"
      4. git grep "node = -1"
      
      This patch (of 2):
      
      At present there are multiple places where an invalid node number is
      encoded as -1. Even though this is implicitly understood, it is
      always better to have a macro for it. Replace these open encodings
      for an invalid node number with the global macro NUMA_NO_NODE. This
      helps remove NUMA-related assumptions like 'invalid node' from
      various places, redirecting them to a common definition.
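
      A small before/after sketch of the kind of replacement this series
      makes (the surrounding helper is hypothetical; NUMA_NO_NODE itself
      comes from <linux/numa.h>):

        #include <linux/numa.h>     /* NUMA_NO_NODE, defined as -1 */
        #include <linux/slab.h>
        #include <linux/topology.h>

        static void *example_alloc(size_t size, int node)
        {
                /* Before the series this read: if (node == -1) */
                if (node == NUMA_NO_NODE)
                        node = numa_mem_id(); /* fall back to local node */
                return kzalloc_node(size, GFP_KERNEL, node);
        }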
      
      Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[ixgbe]
      Acked-by: Jens Axboe <axboe@kernel.dk>			[mtip32xx]
      Acked-by: Vinod Koul <vkoul@kernel.org>			[dmaengine.c]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Acked-by: Doug Ledford <dledford@redhat.com>		[drivers/infiniband]
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      98fa15f3
  12. 06 February 2019, 3 commits
  13. 01 February 2019, 3 commits
  14. 08 January 2019, 1 commit
  15. 04 October 2018, 1 commit
  16. 01 September 2018, 3 commits
  17. 04 July 2018, 1 commit
  18. 22 June 2018, 4 commits
  19. 20 June 2018, 3 commits
  20. 05 June 2018, 3 commits
  21. 24 May 2018, 1 commit
  22. 10 May 2018, 1 commit
    • IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support · 5d18ee67
      Committed by Sebastian Sanchez
      Currently the driver doesn't support completion vectors. These
      are used to indicate which sets of CQs should be grouped together
      into the same vector. A vector is a CQ processing thread that
      runs on a specific CPU.
      
      If an application has several CQs bound to different completion
      vectors, and each completion vector runs on different CPUs, then
      the completion queue workload is balanced. This helps scale as more
      nodes are used.
      
      Implement CQ completion vector support using a global workqueue
      where a CQ entry is queued to the CPU corresponding to the CQ's
      completion vector. Since the workqueue is global, it is guaranteed
      to always be there when queueing CQ entries; therefore, the RCU
      locking for cq->rdi->worker in the hot path is superfluous.
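
      A hedged sketch of the queueing idea (the global pointer and the
      per-CQ CPU field are illustrative of the scheme, not quoted from
      rdmavt):

        /* One workqueue shared by all CQs; each completion is steered
         * to the CPU assigned to the CQ's completion vector.
         */
        static struct workqueue_struct *comp_vector_wq;

        static void example_cq_enter(struct rvt_cq *cq)
        {
                queue_work_on(cq->comp_vector_cpu, comp_vector_wq,
                              &cq->comptask);
        }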
      
      Each completion vector is assigned to a different CPU. The number of
      completion vectors available is computed by taking the number of
      online, physical CPUs from the local NUMA node and subtracting the
      CPUs used for kernel receive queues and the general interrupt.
      Special use cases:
      
        * If there are no CPUs left for completion vectors, the same CPU
          for the general interrupt is used; Therefore, there would only
          be one completion vector available.
      
        * For multi-HFI systems, the number of completion vectors available
          for each device is the total number of completion vectors in
          the local NUMA node divided by the number of devices in the same
          NUMA node. If there is a division remainder, the first device to
          be initialized gets an extra completion vector (see the sizing
          sketch after this list).
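
      A hedged sketch of that sizing arithmetic (all names are
      illustrative; the real driver derives these counts from its MSI-X
      layout):

        /* Completion vectors left after kernel receive queues and the
         * general interrupt claim CPUs on the local NUMA node, split
         * across the devices on that node.
         */
        static int example_comp_vect_count(int node_cpus, int krcvqs,
                                           int ndevs, int dev_index)
        {
                int avail = node_cpus - krcvqs - 1; /* -1: general interrupt */
                int per_dev, rem;

                if (avail <= 0)
                        return 1; /* fall back to the general-interrupt CPU */

                per_dev = avail / ndevs;
                rem = avail % ndevs;
                /* Devices initialized earlier absorb the remainder. */
                return per_dev + (dev_index < rem ? 1 : 0);
        }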
      
      Upon CQ creation, an invalid completion vector could be specified.
      Handle it as follows (a small sanitizing sketch follows this list):
      
        * If the completion vector is less than 0, set it to 0.
      
        * Set the completion vector to the passed completion vector modulo
          the number of device completion vectors available.
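
      A hedged sketch of that sanitizing rule (the helper name is
      hypothetical):

        /* Clamp negative vectors to 0, then wrap into range. */
        static int example_sanitize_comp_vect(int comp_vector,
                                              int n_comp_vects)
        {
                if (comp_vector < 0)
                        comp_vector = 0;
                return comp_vector % n_comp_vects;
        }
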
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      5d18ee67