1. 26 Mar, 2019 20 commits
  2. 18 Mar, 2019 4 commits
  3. 07 Mar, 2019 1 commit
    • IB/hfi1: Close race condition on user context disable and close · bc5add09
      Michael J. Ruhl authored
      When disabling and removing a receive context, it is possible for an
      asynchronous event (i.e., an IRQ) to occur.  Because of this, there is a
      race between cleaning up the context and the context being used by the
      asynchronous event.
      
      cpu 0  (context cleanup)
          rcd->ref_count-- (ref_count == 0)
          hfi1_rcd_free()
      cpu 1  (IRQ (with rcd index))
          rcd_get_by_index()
          lock
          ref_count++     <-- reference count race (WARNING)
          return rcd
          unlock
      cpu 0
          hfi1_free_ctxtdata() <-- incorrect free location
          lock
          remove rcd from array
          unlock
          free rcd
      
      This race will cause the following WARNING trace:
      
      WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
      CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
      Call Trace:
        dump_stack+0x19/0x1b
        __warn+0xd8/0x100
        warn_slowpath_null+0x1d/0x20
        hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
        is_rcv_urgent_int+0x24/0x90 [hfi1]
        general_interrupt+0x1b6/0x210 [hfi1]
        __handle_irq_event_percpu+0x44/0x1c0
        handle_irq_event_percpu+0x32/0x80
        handle_irq_event+0x3c/0x60
        handle_edge_irq+0x7f/0x150
        handle_irq+0xe4/0x1a0
        do_IRQ+0x4d/0xf0
        common_interrupt+0x162/0x162
      
      The race can also lead to a use after free which could be similar to:
      
      general protection fault: 0000 [#1] SMP
      CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
      task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230
      Call Trace:
        ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
        hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
        hfi1_aio_write+0xba/0x110 [hfi1]
        do_sync_readv_writev+0x7b/0xd0
        do_readv_writev+0xce/0x260
        ? handle_mm_fault+0x39d/0x9b0
        ? pick_next_task_fair+0x5f/0x1b0
        ? sched_clock_cpu+0x85/0xc0
        ? __schedule+0x13a/0x890
        vfs_writev+0x35/0x60
        SyS_writev+0x7f/0x110
        system_call_fastpath+0x22/0x27
      
      Use the appropriate kref API to verify access.
      
      Reorder context cleanup to ensure context removal before cleanup occurs
      correctly.
      
      Cc: stable@vger.kernel.org # v4.14.0+
      Fixes: f683c80c ("IB/hfi1: Resolve kernel panics by reference counting receive contexts")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      bc5add09
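The "verify access" idea in the hfi1 commit above can be illustrated in user space. The following is a minimal sketch, not the driver's code: `struct ctx`, `ctx_get_unless_zero()`, `ctx_put()` and `ctx_lookup()` are hypothetical names, and C11 atomics stand in for the kernel's kref. The commit message does not name the exact call; the kernel's `kref_get_unless_zero()` is one kref helper that follows this pattern, in which a lookup refuses to take a reference once the count has already dropped to zero.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a receive context; not the hfi1 structure. */
struct ctx {
    atomic_int refcount;
};

/* Take a reference only if the count has not already reached zero.
 * This mirrors the idea behind the kernel's kref_get_unless_zero(). */
static bool ctx_get_unless_zero(struct ctx *c)
{
    int old = atomic_load(&c->refcount);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller dropped the last one
 * and is therefore responsible for freeing the object. */
static bool ctx_put(struct ctx *c)
{
    return atomic_fetch_sub(&c->refcount, 1) == 1;
}

/* Lookup as an asynchronous handler might do it: a context whose count
 * already hit zero is treated as absent instead of being revived. */
static struct ctx *ctx_lookup(struct ctx *slot)
{
    if (slot != NULL && ctx_get_unless_zero(slot))
        return slot;
    return NULL;
}
```

With a plain increment, the lookup on cpu 1 would raise the count from 0 back to 1 while cpu 0 is already freeing the object. With the conditional increment, the lookup fails and the caller has to handle a missing context.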
  4. 06 Mar, 2019 1 commit
    • mm: replace all open encodings for NUMA_NO_NODE · 98fa15f3
      Anshuman Khandual authored
      Patch series "Replace all open encodings for NUMA_NO_NODE", v3.
      
      All these places for replacement were found by running the following
      grep patterns on the entire kernel code.  Please let me know if this
      might have missed some instances.  This might also have replaced some
      false positives.  I will appreciate suggestions, inputs and review.
      
      1. git grep "nid == -1"
      2. git grep "node == -1"
      3. git grep "nid = -1"
      4. git grep "node = -1"
      
      This patch (of 2):
      
      At present there are multiple places where an invalid node number is
      encoded as -1.  Even though this is implicitly understood, it is always
      better to use a macro.  Replace these open encodings for an invalid node
      number with the global macro NUMA_NO_NODE.  This helps remove NUMA
      related assumptions like 'invalid node' from various places, redirecting
      them to a common definition.
      
      Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[ixgbe]
      Acked-by: Jens Axboe <axboe@kernel.dk>			[mtip32xx]
      Acked-by: Vinod Koul <vkoul@kernel.org>			[dmaengine.c]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Acked-by: Doug Ledford <dledford@redhat.com>		[drivers/infiniband]
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      98fa15f3
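The kind of replacement the NUMA_NO_NODE commit above describes can be sketched as follows. In the kernel, `NUMA_NO_NODE` is defined as `(-1)` in `<linux/numa.h>`; it is redefined here only so that the example builds in user space. `effective_node()` is a hypothetical helper invented for illustration and does not appear in the patch.

```c
/* In the kernel this macro comes from <linux/numa.h>; it is repeated
 * here only so the sketch compiles outside the kernel tree. */
#define NUMA_NO_NODE (-1)

/* Hypothetical helper: fall back to the local node when the caller
 * did not specify one. */
static int effective_node(int nid, int local_node)
{
    /* Open-coded form being replaced: if (nid == -1) */
    if (nid == NUMA_NO_NODE)
        return local_node;
    return nid;
}
```

The behavior is unchanged, since the macro expands to the same value; the gain is that the "invalid node" convention is spelled out in one shared definition.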
  5. 05 Mar, 2019 2 commits
  6. 04 Mar, 2019 2 commits
  7. 27 Feb, 2019 1 commit
  8. 23 Feb, 2019 3 commits
    • RDMA: Handle ucontext allocations by IB/core · a2a074ef
      Leon Romanovsky authored
      Following the PD conversion patch, do the same for ucontext allocations.
      
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      a2a074ef
    • bnxt_re: fix the regression due to changes in alloc_pbl · c50866e2
      Devesh Sharma authored
      While adding the use of the for_each_sg_dma_page iterator to Broadcom's
      RDMA driver, a regression was introduced in the __alloc_pbl path. The
      change left bnxt_re in a DOA state in the for-next branch.
      
      Fix the regression to avoid the host crash when a user space object is
      created. Restrict the unconditional access to hwq.pg_arr when hwq is
      initialized for user space objects.
      
      Fixes: 161ebe24 ("RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL")
      Reported-by: Gal Pressman <galpress@amazon.com>
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      c50866e2
    • IB/mlx4: Increase the timeout for CM cache · 2612d723
      Håkon Bugge authored
      With CX-3 virtual functions, either from a bare-metal machine or
      pass-through from a VM, MAD packets are proxied through the PF driver.
      
      Since the VF drivers have separate name spaces for MAD Transaction Ids
      (TIDs), the PF driver has to re-map the TIDs and keep the bookkeeping
      in a cache.
      
      Following the RDMA Connection Manager (CM) protocol, it is clear when
      an entry has to be evicted from the cache. But life is not perfect;
      remote peers may die or be rebooted. Hence, there is a timeout to wipe
      out a cache entry when the PF driver assumes the remote peer has gone.
      
      During workloads where a high number of QPs are destroyed concurrently,
      an excessive number of CM DREQ retries has been observed.
      
      The problem can be demonstrated in a bare-metal environment, where two
      nodes have instantiated 8 VFs each. This uses dual-ported HCAs, so we
      have 16 vPorts per physical server.
      
      64 processes are associated with each vPort and create and destroy
      one QP for each of the remote 64 processes. That is, 1024 QPs per
      vPort, all in all 16K QPs. The QPs are created/destroyed using the
      CM.
      
      When tearing down these 16K QPs, excessive CM DREQ retries (and
      duplicates) are observed. With some cat/paste/awk wizardry on the
      infiniband_cm sysfs, we observe, as the sum over the 16 vPorts on one
      of the nodes:
      
      cm_rx_duplicates:
            dreq  2102
      cm_rx_msgs:
            drep  1989
            dreq  6195
             rep  3968
             req  4224
             rtu  4224
      cm_tx_msgs:
            drep  4093
            dreq 27568
             rep  4224
             req  3968
             rtu  3968
      cm_tx_retries:
            dreq 23469
      
      Note that the active/passive side is equally distributed between the
      two nodes.
      
      Enabling pr_debug in cm.c gives tons of:
      
      [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave: 1,sl_cm_id: 0xd393089f} is NULL!
      
      By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the
      tear-down phase of the application is reduced from approximately 90 to
      50 seconds. Retries/duplicates are also significantly reduced:
      
      cm_rx_duplicates:
            dreq  2460
      []
      cm_tx_retries:
            dreq  3010
             req    47
      
      Increasing the timeout further didn't help, as these duplicates and
      retries stem from a too short CMA timeout, which was 20 (~4 seconds)
      on the systems. By increasing the CMA timeout to 22 (~17 seconds), the
      numbers fell to about 10 for both of them.
      
      Adjustment of the CMA timeout is not part of this commit.
      
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      2612d723
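The figures quoted in the mlx4 commit above can be checked with a little arithmetic. The sketch below assumes the CMA timeout values use the usual InfiniBand exponent encoding, where a value n means 4.096 µs × 2^n; the commit message does not state this explicitly, but the encoding reproduces its "~4 seconds" and "~17 seconds" figures. The function names are hypothetical.

```c
#include <stdint.h>

/* Convert an InfiniBand-style timeout exponent to nanoseconds.
 * Assumed encoding: 4.096 us * 2^exp, i.e. 4096 ns shifted left by exp. */
static uint64_t ib_timeout_ns(unsigned int exp)
{
    return (uint64_t)4096 << exp;
}

/* Total QP count in the described test: QPs per vPort times vPorts. */
static unsigned int total_qps(unsigned int qps_per_vport, unsigned int vports)
{
    return qps_per_vport * vports;
}
```

Under that assumption, a value of 20 gives 2^32 ns (about 4.3 seconds) and 22 gives 2^34 ns (about 17.2 seconds), while 1024 QPs on each of 16 vPorts gives 16384 QPs, the "16K" in the text.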
  9. 22 Feb, 2019 3 commits
  10. 21 Feb, 2019 1 commit
  11. 20 Feb, 2019 2 commits
    • iw_cxgb4: Make function read_tcb() static · 3b8f8b95
      Wei Yongjun authored
      Fixes the following sparse warning:
      
      drivers/infiniband/hw/cxgb4/cm.c:658:6: warning:
       symbol 'read_tcb' was not declared. Should it be static?
      
      Fixes: 11a27e21 ("iw_cxgb4: complete the cached SRQ buffers")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Acked-by: Raju Rangoju <rajur@chelsio.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      3b8f8b95
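The sparse warning in the iw_cxgb4 commit above fires when a function with external linkage is defined without a prior declaration. A minimal sketch of the two usual remedies, using hypothetical names unrelated to the cxgb4 code: mark a file-local helper `static`, or declare an exported function before defining it (normally in a header).

```c
/* File-local helper: 'static' gives it internal linkage, so no
 * declaration elsewhere is expected and the warning does not apply. */
static int double_it(int x)
{
    return 2 * x;
}

/* Exported function: the prototype would normally live in a header
 * that is included by both the definition and its callers. */
int api_quadruple(int x);

int api_quadruple(int x)
{
    return double_it(double_it(x));
}
```

Making a helper `static` also lets the compiler inline or discard it freely, since nothing outside the file can reference it.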
    • RDMA/hns: Bugfix for set hem of SCC · 6ac16e40
      Yangyang Li authored
      The method of setting HEM for the SCC context is different from that of
      other contexts. For SCC, the driver should notify the hardware of the
      detailed idx in bt0, while for other contexts it only needs to notify
      the bt step and the hardware will calculate the idx.
      
      This fixes the following error when unloading the hip08 driver:
      
      [  123.570768] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
      [  123.579023] {1}[Hardware Error]: event severity: recoverable
      [  123.584670] {1}[Hardware Error]:  Error 0, type: recoverable
      [  123.590317] {1}[Hardware Error]:   section_type: PCIe error
      [  123.595877] {1}[Hardware Error]:   version: 4.0
      [  123.600395] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
      [  123.606562] {1}[Hardware Error]:   device_id: 0000:7d:00.0
      [  123.612034] {1}[Hardware Error]:   slot: 0
      [  123.616120] {1}[Hardware Error]:   secondary_bus: 0x00
      [  123.621245] {1}[Hardware Error]:   vendor_id: 0x19e5, device_id: 0xa222
      [  123.627847] {1}[Hardware Error]:   class_code: 000002
      [  123.632977] hns3 0000:7d:00.0: aer_status: 0x00000000, aer_mask: 0x00000000
      [  123.639928] hns3 0000:7d:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID
      [  123.647400] hns3 0000:7d:00.0: aer_uncor_severity: 0x00000000
      [  123.653136] hns3 0000:7d:00.0: PCI error detected, state(=1)!!
      [  123.658959] hns3 0000:7d:00.0: ROCEE uncorrected RAS error identified
      [  123.665395] hns3 0000:7d:00.0: ROCEE RAS AXI rresp error
      [  123.670713] hns3 0000:7d:00.0: requesting reset due to PCI error
      [  123.676715] hns3 0000:7d:00.0: received reset event , reset type is 5
      [  123.683147] hns3 0000:7d:00.0: AER: Device recovery successful
      [  123.688978] hns3 0000:7d:00.0: PF Reset requested
      [  123.693684] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF
      [  123.700633] hns3 0000:7d:00.0: inform reset to vf(1) failded -5!
      
      Fixes: 6a157f7d ("RDMA/hns: Add SCC context allocation support for hip08")
      Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
      Reviewed-by: Yixian Liu <liuyixian@huawei.com>
      Reviewed-by: Lijun Ou <oulijun@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      6ac16e40