1. 04 Apr, 2019: 2 commits
  2. 28 Mar, 2019: 4 commits
    • IB/hfi1: Fix the allocation of RSM table · d0294344
      Kaike Wan committed
      The receive side mapping (RSM) on hfi1 hardware is a special
      matching mechanism to direct an incoming packet to a given
      hardware receive context. It has 4 instances of matching capabilities
      (RSM0 - RSM3) that share the same RSM table (RMT). The RMT has a total of
      256 entries, each of which points to a receive context.
      
      Currently, three instances of RSM have been used:
      1. RSM0 by QOS;
      2. RSM1 by PSM FECN;
      3. RSM2 by VNIC.
      
      Each RSM instance should reserve enough entries in RMT to function
      properly. Since both PSM and VNIC could allocate any receive context
      between dd->first_dyn_alloc_ctxt and dd->num_rcv_contexts, PSM FECN must
      reserve enough RMT entries to cover the entire receive context index
      range (dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt) instead of only
      the user receive contexts allocated for PSM
      (dd->num_user_contexts). Consequently, the sizing of
      dd->num_user_contexts in set_up_context_variables is incorrect.
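
      A minimal sketch of the sizing argument, in C: it compares the number of RMT
      entries PSM FECN would reserve under the old rule (dd->num_user_contexts
      only) and under the corrected rule (the whole dynamic context range). The
      struct, the function names and the example numbers are hypothetical and only
      mirror the dd-> fields named above; this is not the hfi1 driver code.

```c
#include <assert.h>
#include <stdbool.h>

#define RMT_TOTAL_ENTRIES 256 /* one RMT shared by RSM0-RSM3 */

/* Hypothetical stand-in for the dd-> fields named in the commit message. */
struct devdata_sketch {
	unsigned int first_dyn_alloc_ctxt; /* first dynamically allocated context */
	unsigned int num_rcv_contexts;     /* total number of receive contexts */
	unsigned int num_user_contexts;    /* contexts allocated for PSM */
};

/* Old rule: one RMT entry per PSM user context only. */
static unsigned int fecn_entries_old(const struct devdata_sketch *dd)
{
	return dd->num_user_contexts;
}

/*
 * Corrected rule: PSM and VNIC may each use any context in
 * [first_dyn_alloc_ctxt, num_rcv_contexts), so cover that whole range.
 */
static unsigned int fecn_entries_new(const struct devdata_sketch *dd)
{
	assert(dd->first_dyn_alloc_ctxt <= dd->num_rcv_contexts);
	return dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt;
}

/* Do the QOS, FECN and VNIC reservations fit in the shared RMT? */
static bool rmt_fits(unsigned int qos, unsigned int fecn, unsigned int vnic)
{
	return qos + fecn + vnic <= RMT_TOTAL_ENTRIES;
}
```

      With first_dyn_alloc_ctxt = 8, num_rcv_contexts = 40 and
      num_user_contexts = 24, the old rule reserves 24 entries while the corrected
      rule reserves 32, enough to cover a context anywhere in the dynamic range.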
      
      Fixes: 2280740f ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/hfi1: Eliminate opcode tests on mr deref · a8639a79
      Kaike Wan committed
      When an old ack_queue entry is used to store an incoming request, the old
      entry may need to be cleaned up if it is still referencing an MR.
      Originally, only RDMA READ requests needed to reference an MR on the
      responder side, so the opcode was tested when cleaning up the old entry.
      The introduction of TID RDMA-specific operations in the ack_queue makes
      these opcode tests wrong: multiple opcodes (RDMA READ, TID RDMA READ, and
      TID RDMA WRITE) may need MR reference cleanup.
      
      Remove the opcode specific tests associated with the ack_queue.
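
      The change can be pictured with the simplified C sketch below, which drops
      an MR reference whenever the old entry still holds one, instead of first
      checking the opcode. The structures and helpers (struct ack_entry_sketch,
      mr_put(), release_old_entry()) are hypothetical stand-ins, not the
      rdmavt/hfi1 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the rdmavt/hfi1 structures. */
struct mr_sketch {
	int refcount;
};

struct ack_entry_sketch {
	int opcode;           /* RDMA READ, TID RDMA READ, TID RDMA WRITE, ... */
	struct mr_sketch *mr; /* non-NULL while the entry still holds an MR ref */
};

static void mr_put(struct mr_sketch *mr)
{
	assert(mr->refcount > 0);
	mr->refcount--;
}

/*
 * Release any MR reference the old entry still holds, whatever opcode it
 * was used for. Testing only for RDMA READ would leak the references held
 * by TID RDMA READ/WRITE entries.
 */
static void release_old_entry(struct ack_entry_sketch *e)
{
	if (e->mr) {
		mr_put(e->mr);
		e->mr = NULL; /* makes a second cleanup a no-op */
	}
}
```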
      
      Fixes: f48ad614 ("IB/hfi1: Move driver out of staging")
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/hfi1: Clear the IOWAIT pending bits when QP is put into error state · 93b289b9
      Kaike Wan committed
      When a QP is put into the error state, it may be waiting for send engine
      resources. In this case, the QP is removed from the send engine's waiting
      list, but its IOWAIT pending bits are not cleared. This normally has no
      major impact because the QP is being destroyed. However, the QP still needs
      to wind down its operations, such as draining the send queue by scheduling
      the send engine. Clearing the pending bits avoids any potential
      complications. In addition, if the QP eventually hangs, clearing the
      pending bits helps debugging by presenting a consistent picture if the
      user dumps the qp_stats.
      
      This patch clears a QP's IOWAIT_PENDING_IB and IOWAIT_PENDING_TID bits in
      priv->s_iowait.flags in this case.
      
      Fixes: 5da0fc9d ("IB/hfi1: Prepare resource waits for dual leg")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Alex Estrin <alex.estrin@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/hfi1: Failed to drain send queue when QP is put into error state · 662d6646
      Kaike Wan committed
      When a QP is put into error state, all pending requests in the send work
      queue should be drained. The following sequence of events could lead to a
      failure, causing a request to hang:
      
      (1) The QP builds a packet and tries to send it through the SDMA engine.
          However, the PIO engine is still busy. Consequently, this packet is put
          on the QP's tx list and the QP is put on the PIO waiting list. The field
          qp->s_flags is set with HFI1_S_WAIT_PIO_DRAIN;
      
      (2) The QP is put into the error state by the user application and
          notify_error_qp() is called, which removes the QP from the PIO waiting
          list and the packet from the QP's tx list. In addition, qp->s_flags is
          cleared of the RVT_S_ANY_WAIT_IO bits, which do not include the
          HFI1_S_WAIT_PIO_DRAIN bit;
      
      (3) The hfi1_schedule_send() function is called to drain the QP's send
          queue. Subsequently, hfi1_do_send() is called. Since the flag bit
          HFI1_S_WAIT_PIO_DRAIN is set in qp->s_flags, hfi1_send_ok() fails. As
          a result, hfi1_do_send() bails out without draining any request from
          the send queue;
      
      (4) The PIO engine completes the sending and tries to wake up any QP on
          its waiting list. But the QP has already been removed from the PIO
          waiting list and therefore sleeps forever.
      
      The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2).
      HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN.
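
      The difference between the two masks in step (2) can be checked with the C
      sketch below. The bit values are invented for illustration, and send_ok()
      only models the part of hfi1_send_ok() that matters here (any wait bit
      blocks sending); it is not the driver's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values chosen for illustration only. */
#define SK_RVT_S_WAIT_DMA        (1u << 0)
#define SK_RVT_S_WAIT_PIO        (1u << 1)
#define SK_RVT_S_ANY_WAIT_IO     (SK_RVT_S_WAIT_DMA | SK_RVT_S_WAIT_PIO)
#define SK_HFI1_S_WAIT_PIO_DRAIN (1u << 8)
#define SK_HFI1_S_ANY_WAIT_IO    (SK_RVT_S_ANY_WAIT_IO | SK_HFI1_S_WAIT_PIO_DRAIN)

/* Simplified check: sending proceeds only if no wait bit is set. */
static bool send_ok(uint32_t s_flags)
{
	return (s_flags & SK_HFI1_S_ANY_WAIT_IO) == 0;
}

/* Step (2) before the fix: only the rdmavt wait bits are cleared. */
static uint32_t notify_error_old(uint32_t s_flags)
{
	return s_flags & ~SK_RVT_S_ANY_WAIT_IO;
}

/* Step (2) after the fix: the device-specific drain bit is cleared too. */
static uint32_t notify_error_new(uint32_t s_flags)
{
	return s_flags & ~SK_HFI1_S_ANY_WAIT_IO;
}
```

      With HFI1_S_WAIT_PIO_DRAIN left set, the old path still fails send_ok(),
      which is exactly the bail-out described in step (3).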
      
      Fixes: 2e2ba09e ("IB/rdmavt, IB/hfi1: Create device dependent s_flags")
      Cc: <stable@vger.kernel.org> # 4.19.x+
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Alex Estrin <alex.estrin@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  3. 07 Mar, 2019: 1 commit
    • IB/hfi1: Close race condition on user context disable and close · bc5add09
      Michael J. Ruhl committed
      When disabling and removing a receive context, an asynchronous event
      (e.g. an IRQ) can occur. Because of this, there is a race between
      cleaning up the context and the context being used by the asynchronous
      event.
      
      cpu 0  (context cleanup)
          rc->ref_count-- (ref_count == 0)
          hfi1_rcd_free()
      cpu 1  (IRQ (with rcd index))
          rcd_get_by_index()
          lock
          ref_count++     <-- reference count race (WARNING)
          return rcd
          unlock
      cpu 0
          hfi1_free_ctxtdata() <-- incorrect free location
          lock
          remove rcd from array
          unlock
          free rcd
      
      This race will cause the following WARNING trace:
      
      WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
      CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
      Call Trace:
        dump_stack+0x19/0x1b
        __warn+0xd8/0x100
        warn_slowpath_null+0x1d/0x20
        hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
        is_rcv_urgent_int+0x24/0x90 [hfi1]
        general_interrupt+0x1b6/0x210 [hfi1]
        __handle_irq_event_percpu+0x44/0x1c0
        handle_irq_event_percpu+0x32/0x80
        handle_irq_event+0x3c/0x60
        handle_edge_irq+0x7f/0x150
        handle_irq+0xe4/0x1a0
        do_IRQ+0x4d/0xf0
        common_interrupt+0x162/0x162
      
      The race can also lead to a use-after-free, with a trace similar to:
      
      general protection fault: 0000 [#1] SMP
      CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
      task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230
      Call Trace:
        ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
        hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
        hfi1_aio_write+0xba/0x110 [hfi1]
        do_sync_readv_writev+0x7b/0xd0
        do_readv_writev+0xce/0x260
        ? handle_mm_fault+0x39d/0x9b0
        ? pick_next_task_fair+0x5f/0x1b0
        ? sched_clock_cpu+0x85/0xc0
        ? __schedule+0x13a/0x890
        vfs_writev+0x35/0x60
        SyS_writev+0x7f/0x110
        system_call_fastpath+0x22/0x27
      
      Use the appropriate kref API to verify access.
      
      Reorder the context cleanup so that the context is removed before it is
      cleaned up.
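
      The commit does not spell out which kref call it switched to;
      kref_get_unless_zero() is the kref helper with "take a reference only if
      the object is still live" semantics, and treating it as the fix is an
      assumption. The user-space sketch below models that idea with C11 atomics
      on a hypothetical struct rcd_sketch; the lock and the array removal from
      the timeline above are not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a refcounted receive context. */
struct rcd_sketch {
	atomic_int ref_count;
};

/*
 * Take a reference only if the count is still non-zero. Once cleanup has
 * dropped the last reference, lookups fail instead of resurrecting it.
 */
static bool rcd_get_unless_zero(struct rcd_sketch *rcd)
{
	int old = atomic_load(&rcd->ref_count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&rcd->ref_count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool rcd_put(struct rcd_sketch *rcd)
{
	int prev = atomic_fetch_sub(&rcd->ref_count, 1);

	assert(prev > 0);
	return prev == 1;
}
```

      Once cleanup has dropped the count to zero, the IRQ-side lookup gets false
      back and must not touch the context, avoiding the increment from zero that
      the WARNING above reports.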
      
      Cc: stable@vger.kernel.org # v4.14.0+
      Fixes: f683c80c ("IB/hfi1: Resolve kernel panics by reference counting receive contexts")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  4. 06 Mar, 2019: 1 commit
    • mm: replace all open encodings for NUMA_NO_NODE · 98fa15f3
      Anshuman Khandual committed
      Patch series "Replace all open encodings for NUMA_NO_NODE", v3.
      
      All the places for replacement were found by running the following grep
      patterns on the entire kernel code. Please let me know if this might have
      missed some instances. It might also have replaced some false positives.
      I would appreciate suggestions, input and review.
      
      1. git grep "nid == -1"
      2. git grep "node == -1"
      3. git grep "nid = -1"
      4. git grep "node = -1"
      
      This patch (of 2):
      
      At present there are multiple places where an invalid node number is
      encoded as -1. Even though this is implicitly understood, it is better to
      use a macro. Replace these open encodings for an invalid node number with
      the global macro NUMA_NO_NODE. This removes NUMA-related assumptions such
      as 'invalid node' from various places and redirects them to a common
      definition.
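
      As a small illustration of the replacement, the sketch below rewrites a
      "nid == -1" test using the macro. NUMA_NO_NODE is the kernel's macro
      (defined as (-1) in include/linux/numa.h); the fallback definition and the
      helper node_is_unspecified() are hypothetical and exist only so the example
      compiles outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Fallback so the sketch builds in user space; the kernel provides this. */
#ifndef NUMA_NO_NODE
#define NUMA_NO_NODE (-1)
#endif

/* Before: return nid == -1;  After: the intent is spelled out by the macro. */
static bool node_is_unspecified(int nid)
{
	return nid == NUMA_NO_NODE;
}
```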
      
      Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[ixgbe]
      Acked-by: Jens Axboe <axboe@kernel.dk>			[mtip32xx]
      Acked-by: Vinod Koul <vkoul@kernel.org>			[dmaengine.c]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Acked-by: Doug Ledford <dledford@redhat.com>		[drivers/infiniband]
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 22 Feb, 2019: 1 commit
  6. 16 Feb, 2019: 1 commit
  7. 08 Feb, 2019: 2 commits
  8. 06 Feb, 2019: 28 commits