1. 17 Oct, 2018 16 commits
  2. 16 Oct, 2018 22 commits
    • IB/mlx5: Fix MR cache initialization · 013c2403
      Artemy Kovalyov committed
      Schedule MR cache work only after the bucket has been initialized.
      
      Cc: <stable@vger.kernel.org> # 4.10
      Fixes: 49780d42 ("IB/mlx5: Expose MR cache for mlx5_ib")
      Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
      Reviewed-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/cm: Respect returned status of cm_init_av_by_path · e54b6a3b
      Leon Romanovsky committed
      Add the missing check for failure of cm_init_av_by_path.
      
      Fixes: e1444b5a ("IB/cm: Fix automatic path migration support")
      Reported-by: Slava Shwartsman <slavash@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/ipoib: Clear IPCB before icmp_send · 4d6e4d12
      Denis Drozdov committed
      IPCB should be cleared before icmp_send, since it may contain data from
      previous layers and the data could be misinterpreted as ip header options,
      which later caused the ihl to be set to an invalid value and resulted in
      the following stack corruption:
      
      [ 1083.031512] ib0: packet len 57824 (> 2048) too long to send, dropping
      [ 1083.031843] ib0: packet len 37904 (> 2048) too long to send, dropping
      [ 1083.032004] ib0: packet len 4040 (> 2048) too long to send, dropping
      [ 1083.032253] ib0: packet len 63800 (> 2048) too long to send, dropping
      [ 1083.032481] ib0: packet len 23960 (> 2048) too long to send, dropping
      [ 1083.033149] ib0: packet len 63800 (> 2048) too long to send, dropping
      [ 1083.033439] ib0: packet len 63800 (> 2048) too long to send, dropping
      [ 1083.033700] ib0: packet len 63800 (> 2048) too long to send, dropping
      [ 1083.034124] ib0: packet len 63800 (> 2048) too long to send, dropping
      [ 1083.034387] ==================================================================
      [ 1083.034602] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xf08/0x1310
      [ 1083.034798] Write of size 4 at addr ffff880353457c5f by task kworker/u16:0/7
      [ 1083.034990]
      [ 1083.035104] CPU: 7 PID: 7 Comm: kworker/u16:0 Tainted: G           O      4.19.0-rc5+ #1
      [ 1083.035316] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
      [ 1083.035573] Workqueue: ipoib_wq ipoib_cm_skb_reap [ib_ipoib]
      [ 1083.035750] Call Trace:
      [ 1083.035888]  dump_stack+0x9a/0xeb
      [ 1083.036031]  print_address_description+0xe3/0x2e0
      [ 1083.036213]  kasan_report+0x18a/0x2e0
      [ 1083.036356]  ? __ip_options_echo+0xf08/0x1310
      [ 1083.036522]  __ip_options_echo+0xf08/0x1310
      [ 1083.036688]  icmp_send+0x7b9/0x1cd0
      [ 1083.036843]  ? icmp_route_lookup.constprop.9+0x1070/0x1070
      [ 1083.037018]  ? netif_schedule_queue+0x5/0x200
      [ 1083.037180]  ? debug_show_all_locks+0x310/0x310
      [ 1083.037341]  ? rcu_dynticks_curr_cpu_in_eqs+0x85/0x120
      [ 1083.037519]  ? debug_locks_off+0x11/0x80
      [ 1083.037673]  ? debug_check_no_obj_freed+0x207/0x4c6
      [ 1083.037841]  ? check_flags.part.27+0x450/0x450
      [ 1083.037995]  ? debug_check_no_obj_freed+0xc3/0x4c6
      [ 1083.038169]  ? debug_locks_off+0x11/0x80
      [ 1083.038318]  ? skb_dequeue+0x10e/0x1a0
      [ 1083.038476]  ? ipoib_cm_skb_reap+0x2b5/0x650 [ib_ipoib]
      [ 1083.038642]  ? netif_schedule_queue+0xa8/0x200
      [ 1083.038820]  ? ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
      [ 1083.038996]  ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
      [ 1083.039174]  process_one_work+0x912/0x1830
      [ 1083.039336]  ? wq_pool_ids_show+0x310/0x310
      [ 1083.039491]  ? lock_acquire+0x145/0x3a0
      [ 1083.042312]  worker_thread+0x87/0xbb0
      [ 1083.045099]  ? process_one_work+0x1830/0x1830
      [ 1083.047865]  kthread+0x322/0x3e0
      [ 1083.050624]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [ 1083.053354]  ret_from_fork+0x3a/0x50
      
      For instance, __ip_options_echo fails to proceed when invalid srr and
      optlen values are passed from another layer via IPCB:
      
      [  762.139568] IPv4: __ip_options_echo rr=0 ts=0 srr=43 cipso=0
      [  762.139720] IPv4: ip_options_build: IPCB 00000000f3cd969e opt 000000002ccb3533
      [  762.139838] IPv4: __ip_options_echo in srr: optlen 197 soffset 84
      [  762.139852] IPv4: ip_options_build srr=0 is_frag=0 rr_needaddr=0 ts_needaddr=0 ts_needtime=0 rr=0 ts=0
      [  762.140269] ==================================================================
      [  762.140713] IPv4: __ip_options_echo rr=0 ts=0 srr=0 cipso=0
      [  762.141078] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x12ec/0x1680
      [  762.141087] Write of size 4 at addr ffff880353457c7f by task kworker/u16:0/7
      Signed-off-by: Denis Drozdov <denisd@mellanox.com>
      Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
      Reviewed-by: Feras Daoud <ferasda@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
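The IPCB problem described in the commit above can be illustrated in user space. This is a minimal sketch, not the driver code: `fake_skb`, `CB_SIZE`, `lower_layer_fill`, `ip_layer_view` and `clear_cb` are hypothetical names, and the two-field `ip_layer_cb` layout is a simplified assumption. It only shows why a scratch area shared between layers should be zeroed before another layer reinterprets it.

```c
#include <assert.h>
#include <string.h>

#define CB_SIZE 48 /* assumed size of the per-packet scratch area */

/* Hypothetical stand-in for a packet buffer with a shared scratch area. */
struct fake_skb {
    unsigned char cb[CB_SIZE];
};

/* Simplified, assumed view that the IP layer takes of the scratch area. */
struct ip_layer_cb {
    unsigned char optlen;
    unsigned char srr;
};

/* A lower layer leaves its own private bytes in the scratch area. */
static void lower_layer_fill(struct fake_skb *skb, unsigned char pattern)
{
    memset(skb->cb, pattern, sizeof(skb->cb));
}

/* The IP layer reinterprets the same bytes as option metadata. */
static struct ip_layer_cb ip_layer_view(const struct fake_skb *skb)
{
    struct ip_layer_cb view;

    memcpy(&view, skb->cb, sizeof(view));
    return view;
}

/* Zero the scratch area before handing the buffer to the IP layer. */
static void clear_cb(struct fake_skb *skb)
{
    memset(skb->cb, 0, sizeof(skb->cb));
}
```

Without the call to `clear_cb`, `ip_layer_view` reports whatever the lower layer left behind as `optlen` and `srr`, which is the kind of misinterpretation the commit message describes.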
    • RDMA/restrack: Protect from reentry to resource return path · fe9bc164
      Leon Romanovsky committed
      Nullify the resource task struct pointer to ensure that subsequent calls
      won't try to release task_struct again.
      
      ------------[ cut here ]------------
      ODEBUG: free active (active state 1) object type: rcu_head hint:
      (null)
      WARNING: CPU: 0 PID: 6048 at lib/debugobjects.c:329
      debug_print_object+0x16a/0x210 lib/debugobjects.c:326
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 0 PID: 6048 Comm: syz-executor022 Not tainted
      4.19.0-rc7-next-20181008+ #89
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x244/0x3ab lib/dump_stack.c:113
        panic+0x238/0x4e7 kernel/panic.c:184
        __warn.cold.8+0x163/0x1ba kernel/panic.c:536
        report_bug+0x254/0x2d0 lib/bug.c:186
        fixup_bug arch/x86/kernel/traps.c:178 [inline]
        do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
        do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
        invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
      RIP: 0010:debug_print_object+0x16a/0x210 lib/debugobjects.c:326
      Code: 41 88 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 92 00 00 00 48 8b 14
      dd
      60 02 41 88 4c 89 fe 48 c7 c7 00 f8 40 88 e8 36 2f b4 fd <0f> 0b 83 05
      a9
      f4 5e 06 01 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f
      RSP: 0018:ffff8801d8c3eda8 EFLAGS: 00010086
      RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff8164d235 RDI: 0000000000000005
      RBP: ffff8801d8c3ede8 R08: ffff8801d70aa280 R09: ffffed003b5c3eda
      R10: ffffed003b5c3eda R11: ffff8801dae1f6d7 R12: 0000000000000001
      R13: ffffffff8939a760 R14: 0000000000000000 R15: ffffffff8840fca0
        __debug_check_no_obj_freed lib/debugobjects.c:786 [inline]
        debug_check_no_obj_freed+0x3ae/0x58d lib/debugobjects.c:818
        kmem_cache_free+0x202/0x290 mm/slab.c:3759
        free_task_struct kernel/fork.c:163 [inline]
        free_task+0x16e/0x1f0 kernel/fork.c:457
        __put_task_struct+0x2e6/0x620 kernel/fork.c:730
        put_task_struct include/linux/sched/task.h:96 [inline]
        finish_task_switch+0x66c/0x900 kernel/sched/core.c:2715
        context_switch kernel/sched/core.c:2834 [inline]
        __schedule+0x8d7/0x21d0 kernel/sched/core.c:3480
        schedule+0xfe/0x460 kernel/sched/core.c:3524
        freezable_schedule include/linux/freezer.h:172 [inline]
        futex_wait_queue_me+0x3f9/0x840 kernel/futex.c:2530
        futex_wait+0x45c/0xa50 kernel/futex.c:2645
        do_futex+0x31a/0x26d0 kernel/futex.c:3528
        __do_sys_futex kernel/futex.c:3589 [inline]
        __se_sys_futex kernel/futex.c:3557 [inline]
        __x64_sys_futex+0x472/0x6a0 kernel/futex.c:3557
        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x446549
      Code: e8 2c b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 2b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f3a998f5da8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
      RAX: ffffffffffffffda RBX: 00000000006dbc38 RCX: 0000000000446549
      RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00000000006dbc38
      RBP: 00000000006dbc30 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc3c
      R13: 2f646e6162696e69 R14: 666e692f7665642f R15: 00000000006dbd2c
      Kernel Offset: disabled
      
      Reported-by: syzbot+71aff6ea121ffefc280f@syzkaller.appspotmail.com
      Fixes: ed7a01fd ("RDMA/restrack: Release task struct which was hold by CM_ID object")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
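The "nullify the pointer" idea from the commit above can be sketched as an idempotent release helper. This is a user-space illustration under assumptions: `fake_task`, `fake_res`, `fake_put_task` and `res_release_task` are hypothetical names, and a plain integer stands in for the task reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted task. */
struct fake_task {
    int refcount;
};

/* Hypothetical tracked resource that holds one task reference. */
struct fake_res {
    struct fake_task *task;
};

/* Drop one reference (stand-in for the real put operation). */
static void fake_put_task(struct fake_task *task)
{
    task->refcount--;
}

/*
 * Release the task reference at most once: after the first call the
 * pointer is NULL, so a second call returns without touching the task.
 */
static void res_release_task(struct fake_res *res)
{
    if (!res->task)
        return;

    fake_put_task(res->task);
    res->task = NULL;
}
```

Calling `res_release_task` twice on the same resource drops the reference only once, which is the reentry protection the commit title refers to.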
    • RDMA/mlx5: Add support for flow tag to raw create flow · ba4a4119
      Mark Bloch committed
      A user can provide a hint which will be attached to the packet and written
      to the CQE on receive. This can be used as a way to offload operations
      into the HW, for example checking whether a packet is tunneled and, if
      so, passing 0x1 as the hint. The software can then use that hint to
      decapsulate the packet and parse only the inner headers, thus saving CPU
      cycles.
      
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/mlx5: Remove extraneous error check · 645ba597
      Gal Pressman committed
      Remove the double error check from the create user RQ error flow.
      
      Fixes: 79b20a6c ("IB/mlx5: Add receive Work Queue verbs")
      Signed-off-by: Gal Pressman <pressmangal@gmail.com>
      Reviewed-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/mlx5: Verify DEVX object type · 2351776e
      Yishai Hadas committed
      Verify that the input DEVX object type matches the created object.
      
      As the obj_id in the firmware is not globally unique the object type must
      be considered upon checking for a valid object id.
      
      Once both the type and the id match we know that the lock was taken on the
      correct object by the uverbs layer.
      
      Fixes: e662e14d ("IB/mlx5: Add DEVX support for modify and query commands")
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
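One way to picture the "type and id must both match" rule from the commit above is to fold both values into a single key. The sketch below is an assumption about how such a check could look, not the driver's code: `encode_obj` and `obj_matches` are hypothetical helpers, and the 16-bit type and 32-bit id widths are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fold an object type and an object id into one comparable key. */
static uint64_t encode_obj(uint16_t type, uint32_t id)
{
    return ((uint64_t)type << 32) | id;
}

/*
 * An input (type, id) pair is valid only if it matches the key recorded
 * at creation time. The same id under a different type is rejected.
 */
static bool obj_matches(uint64_t created_key, uint16_t type, uint32_t id)
{
    return created_key == encode_obj(type, id);
}
```

Because ids are not globally unique, comparing the id alone would accept an object of the wrong type that happens to share the id; comparing the combined key does not.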
    • RDMA/hns: Add FRMR support for hip08 · 68a997c5
      Yixian Liu committed
      This patch adds fast register physical memory region (FRMR) support for
      hip08.
      
      Signed-off-by: Yixian Liu <liuyixian@huawei.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Avoid resource leak in case the NQ registration fails · 5df95099
      Selvin Xavier committed
      If the NQ alloc/enable fails, free the already allocated/enabled NQ
      before reporting failure. Also, track the alloc/enable using proper
      state checking.
      
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
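The cleanup pattern described in the commit above (free what was already set up, guided by per-object state) can be sketched as follows. Everything here is hypothetical: `fake_nq`, `nq_alloc`, `nq_enable`, `nq_teardown` and `setup_nqs` are illustrative names, and the `fail_enable_at` parameter exists only to simulate a failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue object that records how far its setup got. */
struct fake_nq {
    bool allocated;
    bool enabled;
};

static int nq_alloc(struct fake_nq *nq)
{
    nq->allocated = true;
    return 0;
}

/* Simulated enable step; 'fail' forces an error for illustration. */
static int nq_enable(struct fake_nq *nq, bool fail)
{
    if (fail)
        return -1;
    nq->enabled = true;
    return 0;
}

/* Undo only the steps that the state flags say were completed. */
static void nq_teardown(struct fake_nq *nq)
{
    if (nq->enabled)
        nq->enabled = false;
    if (nq->allocated)
        nq->allocated = false;
}

/* Set up 'count' queues; on failure, release everything set up so far. */
static int setup_nqs(struct fake_nq *nqs, int count, int fail_enable_at)
{
    int i, rc;

    for (i = 0; i < count; i++) {
        rc = nq_alloc(&nqs[i]);
        if (rc)
            goto unwind;
        rc = nq_enable(&nqs[i], i == fail_enable_at);
        if (rc)
            goto unwind;
    }
    return 0;

unwind:
    /* Includes the partially set up entry i; the flags keep this safe. */
    for (; i >= 0; i--)
        nq_teardown(&nqs[i]);
    return rc;
}
```

Tracking state in the object keeps the teardown helper safe to call on a partially initialized entry, so the error path does not need to know exactly which step failed.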
    • RDMA/bnxt_re: Wait for delayed work to finish before device removal · a08b9e9a
      Selvin Xavier committed
      The delayed work bnxt_re_worker could still be running even after
      cancel_delayed_work returns. This causes a crash as the driver proceeds
      with device removal. To make sure that the work has finished before
      returning, use cancel_delayed_work_sync.
      
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Limit max_pkey to 16 bit value · 854a2020
      Devesh Sharma committed
      Some FW versions return pkey values greater than 0xFFFF. The
      pkey_tbl_len field of ib_port_attr is a 16-bit value, so restrict
      max_pkeys to 0xFFFF.
      
      Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
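The reason for the limit in the commit above is that storing a wider value in a 16-bit field truncates it silently. A minimal sketch, assuming the firmware value arrives as a 32-bit integer; `clamp_max_pkeys` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a firmware-reported count so it fits in a 16-bit field. */
static uint16_t clamp_max_pkeys(uint32_t fw_value)
{
    if (fw_value > 0xFFFF)
        return 0xFFFF;
    return (uint16_t)fw_value;
}
```

For comparison, a plain cast such as `(uint16_t)0x10000u` yields 0, so an unclamped value just above the limit would be reported as an empty table.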
    • RDMA/bnxt_re: Fix qp async event reporting · 4c01f2e3
      Devesh Sharma committed
      Report affiliated async events on the qp-async event channel instead of
      the global event channel.
      
      Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Report out of sequence hw counters · 316dd282
      Selvin Xavier committed
      Expose out of sequence errors received from FW. This counter is a 32-bit
      counter, so the driver has to accumulate it: the driver stores the
      previous value to calculate the difference in the next query.
      
      Also, update the HW statistics structure with new fields.
      
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
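The accumulation scheme described in the commit above (remember the previous 32-bit reading, add the difference on each query) can be sketched like this. The names `oos_counter` and `oos_update` are hypothetical, and handling wraparound through unsigned subtraction is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator for a 32-bit hardware counter. */
struct oos_counter {
    uint32_t prev;  /* last raw value read from the hardware */
    uint64_t total; /* running 64-bit sum exposed to users */
};

/*
 * Fold a new raw reading into the running total. Unsigned subtraction
 * is modulo 2^32, so the delta stays correct across one wraparound.
 */
static void oos_update(struct oos_counter *c, uint32_t hw_now)
{
    uint32_t delta = hw_now - c->prev;

    c->total += delta;
    c->prev = hw_now;
}
```

With this approach the 64-bit total keeps growing even after the hardware counter wraps, provided the counter is queried at least once per wrap.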
    • RDMA/bnxt_re: Expose rx discards and drop counters · 5c80c913
      Selvin Xavier committed
      Expose the RoCE discard and drop counters from the HW statistics context.
      
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Prevent driver crash due to NULL pointer in error message print · bb22c36c
      Somnath Kotur committed
      crsqe->resp would be NULL if the host command timed out before
      getting a response from HW. Check for a NULL pointer to avoid a potential
      crash while printing the error message.
      
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Drop L2 async events silently · f2bd4d09
      Devesh Sharma committed
      In some FW versions, the RoCE driver also receives an async notification
      that was directed to the L2 driver. The RoCE driver does not handle this
      and prints a message to syslog. Drop these notifications silently.
      
      Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Avoid accessing nq->bar_reg_iomem in failure case · ed51efd2
      Selvin Xavier committed
      In the failure path, nq->bar_reg_iomem is accessed before it has been
      initialized. Avoid this by calling bnxt_qplib_nq_stop_irq only if the
      initialization is complete.
      
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: 1ac5a404 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
      Fixes: 6e04b103 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes")
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Avoid NULL check after accessing the pointer · eae4ad1b
      Selvin Xavier committed
      This was reported by the smatch checker. rcfw->creq_bar_reg_iomem is
      accessed in bnxt_qplib_rcfw_stop_irq, so checking the variable
      afterwards doesn't make sense. Also, rcfw->creq_bar_reg_iomem will never
      be NULL, so remove this check.
      
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: 6e04b103 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes")
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Remove the unnecessary version macro definition · 1b7042d7
      Selvin Xavier committed
      The version macro is not required as the driver does not maintain a
      version. Remove the references to this macro too.
      
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Fix recursive lock warning in debug kernel · d455f29f
      Selvin Xavier committed
      Fix a possible recursive lock warning. It's a false warning, as the locks
      belong to two different HW queue data structures: cmdq and creq. A debug
      kernel throws the following warning and stack trace.
      
      [  783.914967] ============================================
      [  783.914970] WARNING: possible recursive locking detected
      [  783.914973] 4.19.0-rc2+ #33 Not tainted
      [  783.914976] --------------------------------------------
      [  783.914979] swapper/2/0 is trying to acquire lock:
      [  783.914982] 000000002aa3949d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
      [  783.914999]
      but task is already holding lock:
      [  783.915002] 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re]
      [  783.915013]
      other info that might help us debug this:
      [  783.915016]  Possible unsafe locking scenario:
      
      [  783.915019]        CPU0
      [  783.915021]        ----
      [  783.915034]   lock(&(&hwq->lock)->rlock);
      [  783.915035]   lock(&(&hwq->lock)->rlock);
      [  783.915037]
       *** DEADLOCK ***
      
      [  783.915038]  May be due to missing lock nesting notation
      
      [  783.915039] 1 lock held by swapper/2/0:
      [  783.915040]  #0: 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re]
      [  783.915044]
      stack backtrace:
      [  783.915046] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.0-rc2+ #33
      [  783.915047] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014
      [  783.915048] Call Trace:
      [  783.915049]  <IRQ>
      [  783.915054]  dump_stack+0x90/0xe3
      [  783.915058]  __lock_acquire+0x106c/0x1080
      [  783.915061]  ? sched_clock+0x5/0x10
      [  783.915063]  lock_acquire+0xbd/0x1a0
      [  783.915065]  ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
      [  783.915069]  _raw_spin_lock_irqsave+0x4a/0x90
      [  783.915071]  ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
      [  783.915073]  bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
      [  783.915078]  tasklet_action_common.isra.17+0x197/0x1b0
      [  783.915081]  __do_softirq+0xcb/0x3a6
      [  783.915084]  irq_exit+0xe9/0x100
      [  783.915085]  do_IRQ+0x6a/0x120
      [  783.915087]  common_interrupt+0xf/0xf
      [  783.915088]  </IRQ>
      
      Use nested notation for the spin_lock to avoid this warning.
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Add missing spin lock initialization · 5a23e0b1
      Selvin Xavier committed
      Add the missing initialization of the cq_lock and qplib.flush_lock.
      
      Fixes: 942c9b6c ("RDMA/bnxt_re: Avoid Hard lockup during error CQE processing")
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • Merge branch 'for-rc' into rdma.git for-next · 59bfc59a
      Jason Gunthorpe committed
      From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
      
      This is required to resolve dependencies of the next series of RDMA
      patches.
      
      The code motion conflicts in drivers/infiniband/core/cache.c were
      resolved.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  3. 11 Oct, 2018 1 commit
    • IB/mlx5: Unmap DMA addr from HCA before IOMMU · dd9a4034
      Valentine Fatiev committed
      The function that puts the MR back in the cache also removes the DMA
      address from the HCA. Therefore we need to call this function before we
      remove the DMA mapping from the IOMMU. Otherwise the HCA may access
      memory that is no longer DMA mapped.
      
      Call trace:
      NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc6+ #4
      Hardware name: HP ProLiant DL360p Gen8, BIOS P71 08/20/2012
      RIP: 0010:intel_idle+0x73/0x120
      Code: 80 5c 01 00 0f ae 38 0f ae f0 31 d2 65 48 8b 04 25 80 5c 01 00 48 89 d1 0f 60 02
      RSP: 0018:ffffffff9a403e38 EFLAGS: 00000046
      RAX: 0000000000000030 RBX: 0000000000000005 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: ffffffff9a5790c0 RDI: 0000000000000000
      RBP: 0000000000000030 R08: 0000000000000000 R09: 0000000000007cf9
      R10: 000000000000030a R11: 0000000000000018 R12: 0000000000000000
      R13: ffffffff9a5792b8 R14: ffffffff9a5790c0 R15: 0000002b48471e4d
      FS:  0000000000000000(0000) GS:ffff9c6caf400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f5737185000 CR3: 0000000590c0a002 CR4: 00000000000606f0
      Call Trace:
       cpuidle_enter_state+0x7e/0x2e0
       do_idle+0x1ed/0x290
       cpu_startup_entry+0x6f/0x80
       start_kernel+0x524/0x544
       ? set_init_arg+0x55/0x55
       secondary_startup_64+0xa4/0xb0
      DMAR: DRHD: handling fault status reg 2
      DMAR: [DMA Read] Request device [04:00.0] fault addr b34d2000 [fault reason 06] PTE Read access is not set
      DMAR: [DMA Read] Request device [01:00.2] fault addr bff8b000 [fault reason 06] PTE Read access is not set
      
      Fixes: f3f134f5 ("RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory")
      Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
      Reviewed-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  4. 06 Oct, 2018 1 commit