- 26 3月, 2019 20 次提交
-
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Matthew Wilcox 提交于
Remove the custom spinlock as the XArray handles its own locking. Signed-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Mike Marciniszyn 提交于
The adaptive PIO implementation only considers the current packet size when deciding between SDMA and pio for a packet. This causes credit return forces if small and large packets are interleaved. Add a running average to avoid costly credit forces so that a large sequence of small packets is required to go below the threshold that chooses pio. Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Erez Alfasi 提交于
"__attribute__" set of macros has been standardized, have became more potentially portable and consistent code back in v2.6.21 by commit 82ddcb04 ("[PATCH] extend the set of "__attribute__" shortcut macros"). Moreover, nowadays checkpatch.pl warns about using __attribute__((packed)) instead of __packed. This patch converts all the "__attribute__ ((packed))" annotations to "__packed" within the RDMA subsystem. Signed-off-by: NErez Alfasi <ereza@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Lijun Ou 提交于
The src_mac array is not used in hns_roce_v2_modify_qp function. Signed-off-by: NLijun Ou <oulijun@huawei.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Lijun Ou 提交于
According to IB protocol, the send with invalidate operation will not invalidate mr that was created through a register mr or reregister mr. Fixes: e93df010 ("RDMA/hns: Support local invalidate for hip08 in kernel space") Signed-off-by: NLijun Ou <oulijun@huawei.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Lijun Ou 提交于
The driver should not print the error information when the hip08 driver not support virtual function. Signed-off-by: NLijun Ou <oulijun@huawei.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Lijun Ou 提交于
According to IB protocol, some fields of qp context are filled with optional when the relatived attr_mask are set. The relatived attr_mask include IB_QP_TIMEOUT, IB_QP_RETRY_CNT, IB_QP_RNR_RETRY and IB_QP_MIN_RNR_TIMER. Besides, we move some assignments of the fields of qp context into the outside of the specific qp state jump function. Signed-off-by: NLijun Ou <oulijun@huawei.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Lijun Ou 提交于
According to hip08 UM(User Manual), the raq_psn field size is [23:0]. Signed-off-by: NLijun Ou <oulijun@huawei.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Lijun Ou 提交于
Only when the IB_QP_RQ_PSN flags of attr_mask is set is it valid to assign the relatived fields of rq'psn into the qp context when modified qp. Signed-off-by: NLijun Ou <oulijun@huawei.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Lijun Ou 提交于
Only when the IB_QP_SQ_PSN flags of attr_mask is set is it valid to assign the relatived fields of psn into the qp context when modified qp. Signed-off-by: NLijun Ou <oulijun@huawei.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Matthew Wilcox 提交于
Signed-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Matthew Wilcox 提交于
It would make sense to convert this to an allocating XArray and remove the kfifo that is currently used to allocate the CQID, but that work is better done by someone who has the hardware to test with. Signed-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 18 3月, 2019 4 次提交
-
-
由 Feng Tang 提交于
There is a panic reported that on a system with x722 ethernet, when doing the operations like: # ip link add br0 type bridge # ip link set eno1 master br0 # systemctl restart systemd-networkd The system will panic "BUG: unable to handle kernel null pointer dereference at 0000000000000034", with call chain: i40iw_inetaddr_event notifier_call_chain blocking_notifier_call_chain notifier_call_chain __inet_del_ifa inet_rtm_deladdr rtnetlink_rcv_msg netlink_rcv_skb rtnetlink_rcv netlink_unicast netlink_sendmsg sock_sendmsg __sys_sendto It is caused by "local_ipaddr = ntohl(in->ifa_list->ifa_address)", while the in->ifa_list is NULL. So add a check for the "in->ifa_list == NULL" case, and skip the ARP operation accordingly. Signed-off-by: NFeng Tang <feng.tang@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Aya Levin 提交于
Add mapping of link mode: CAUI4 100Gbps CR4/KR4 with 4 lines and 25Gbps. Fix mapping of link mode: GAUI2 50Gbps CR2/KR2 to be 2 lines with 25Gbps. Fixes: 08e8676f ("IB/mlx5: Add support for 50Gbps per lane link modes") Signed-off-by: NAya Levin <ayal@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Yishai Hadas 提交于
To prevent a hardware memory leak when a DEVX DCT object is destroyed without calling DRAIN DCT before, (e.g. under cleanup flow), need to manage its creation and destruction via mlx5 core. In that case the DRAIN DCT command will be called and only once that it will be completed the DESTROY DCT command will be called. Otherwise, the DESTROY DCT may fail and a hardware leak may occur. As of that change the DRAIN DCT command should not be exposed any more from DEVX, it's managed internally by the driver to work as expected by the device specification. Fixes: 7efce369 ("IB/mlx5: Add obj create and destroy functionality") Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jack Morgenstein 提交于
Code review revealed a race condition which could allow the catas error flow to interrupt the alias guid query post mechanism at random points. Thiis is fixed by doing cancel_delayed_work_sync() instead of cancel_delayed_work() during the alias guid mechanism destroy flow. Fixes: a0c64a17 ("mlx4: Add alias_guid mechanism") Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 07 3月, 2019 1 次提交
-
-
由 Michael J. Ruhl 提交于
When disabling and removing a receive context, it is possible for an asynchronous event (i.e IRQ) to occur. Because of this, there is a race between cleaning up the context, and the context being used by the asynchronous event. cpu 0 (context cleanup) rc->ref_count-- (ref_count == 0) hfi1_rcd_free() cpu 1 (IRQ (with rcd index)) rcd_get_by_index() lock ref_count+++ <-- reference count race (WARNING) return rcd unlock cpu 0 hfi1_free_ctxtdata() <-- incorrect free location lock remove rcd from array unlock free rcd This race will cause the following WARNING trace: WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 Call Trace: dump_stack+0x19/0x1b __warn+0xd8/0x100 warn_slowpath_null+0x1d/0x20 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] is_rcv_urgent_int+0x24/0x90 [hfi1] general_interrupt+0x1b6/0x210 [hfi1] __handle_irq_event_percpu+0x44/0x1c0 handle_irq_event_percpu+0x32/0x80 handle_irq_event+0x3c/0x60 handle_edge_irq+0x7f/0x150 handle_irq+0xe4/0x1a0 do_IRQ+0x4d/0xf0 common_interrupt+0x162/0x162 The race can also lead to a use after free which could be similar to: general protection fault: 0000 1 SMP CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230 Call Trace: ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_aio_write+0xba/0x110 [hfi1] do_sync_readv_writev+0x7b/0xd0 do_readv_writev+0xce/0x260 ? handle_mm_fault+0x39d/0x9b0 ? pick_next_task_fair+0x5f/0x1b0 ? sched_clock_cpu+0x85/0xc0 ? __schedule+0x13a/0x890 vfs_writev+0x35/0x60 SyS_writev+0x7f/0x110 system_call_fastpath+0x22/0x27 Use the appropriate kref API to verify access. Reorder context cleanup to ensure context removal before cleanup occurs correctly. Cc: stable@vger.kernel.org # v4.14.0+ Fixes: f683c80c ("IB/hfi1: Resolve kernel panics by reference counting receive contexts") Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 06 3月, 2019 1 次提交
-
-
由 Anshuman Khandual 提交于
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 3月, 2019 2 次提交
-
-
由 YueHaibing 提交于
The the below commit, hns_roce_v2_modify_qp is called inside spinlock while using GFP_KERNEL. Change it to GFP_ATOMIC. Fixes: 0425e3e6 ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Shaobo He 提交于
In function `c4iw_dealloc_mw`, variable mhp's value is printed after freed, it is clearer to have the print before the kfree. Otherwise racing threads could allocate another mhp with the same pointer value and create confusing tracing. Signed-off-by: NShaobo He <shaobo@cs.utah.edu> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 04 3月, 2019 2 次提交
-
-
由 Moni Shoua 提交于
The write access of an implicit MR is inherited to all of its children. Therefore we must set the correct write access to the parent MR. Pass full access_flags when creating umem to let it calculate write access correctly. Fixes: da6a496a ("IB/mlx5: Ranges in implicit ODP MR inherit its write access") Signed-off-by: NMoni Shoua <monis@mellanox.com> Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Devesh Sharma 提交于
Kernel space provider driver should clean the CQs belonging to kernel space consumers only. The current implementation is doing reverse of it. Fixing the same by avoiding the call to __clean_cq on a kernel qp during destroy. Fixes: c50866e2 ("bnxt_re: fix the regression due to changes in alloc_pbl") Signed-off-by: NDevesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 27 2月, 2019 1 次提交
-
-
由 Jakub Kicinski 提交于
Being able to build devlink as a module causes growing pains. First all drivers had to add a meta dependency to make sure they are not built in when devlink is built as a module. Now we are struggling to invoke ethtool compat code reliably. Make devlink code built-in, users can still not build it at all but the dynamically loadable module option is removed. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 2月, 2019 3 次提交
-
-
由 Leon Romanovsky 提交于
Following the PD conversion patch, do the same for ucontext allocations. Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Devesh Sharma 提交于
While adding the use of for_each_sg_dma_page iterator for Brodcom's rdma driver, there was a regression added in the __alloc_pbl path. The change left bnxt_re in DOA state in for-next branch. Fixing the regression to avoid the host crash when a user space object is created. Restricting the unconditional access to hwq.pg_arr when hwq is initialized for user space objects. Fixes: 161ebe24 ("RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL") Reported-by: NGal Pressman <galpress@amazon.com> Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDevesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Håkon Bugge 提交于
Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. Following the RDMA Connection Manager (CM) protocol, it is clear when an entry has to evicted form the cache. But life is not perfect, remote peers may die or be rebooted. Hence, it's a timeout to wipe out a cache entry, when the PF driver assumes the remote peer has gone. During workloads where a high number of QPs are destroyed concurrently, excessive amount of CM DREQ retries has been observed The problem can be demonstrated in a bare-metal environment, where two nodes have instantiated 8 VFs each. This using dual ported HCAs, so we have 16 vPorts per physical server. 64 processes are associated with each vPort and creates and destroys one QP for each of the remote 64 processes. That is, 1024 QPs per vPort, all in all 16K QPs. The QPs are created/destroyed using the CM. When tearing down these 16K QPs, excessive CM DREQ retries (and duplicates) are observed. With some cat/paste/awk wizardry on the infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the nodes: cm_rx_duplicates: dreq 2102 cm_rx_msgs: drep 1989 dreq 6195 rep 3968 req 4224 rtu 4224 cm_tx_msgs: drep 4093 dreq 27568 rep 4224 req 3968 rtu 3968 cm_tx_retries: dreq 23469 Note that the active/passive side is equally distributed between the two nodes. Enabling pr_debug in cm.c gives tons of: [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave: 1,sl_cm_id: 0xd393089f} is NULL! By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the tear-down phase of the application is reduced from approximately 90 to 50 seconds. Retries/duplicates are also significantly reduced: cm_rx_duplicates: dreq 2460 [] cm_tx_retries: dreq 3010 req 47 Increasing the timeout further didn't help, as these duplicates and retries stems from a too short CMA timeout, which was 20 (~4 seconds) on the systems. By increasing the CMA timeout to 22 (~17 seconds), the numbers fell down to about 10 for both of them. Adjustment of the CMA timeout is not part of this commit. Signed-off-by: NHåkon Bugge <haakon.bugge@oracle.com> Acked-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 22 2月, 2019 3 次提交
-
-
由 Moni Shoua 提交于
When prefetching odp mr it is required to verify that pd of the mr is identical to the pd for which the advise_mr request arrived with. This check was missing from synchronous flow and is added now. Fixes: 813e90b1 ("IB/mlx5: Add advise_mr() support") Reported-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Moni Shoua 提交于
When deferring a prefetch request we need to protect against MR or PD being destroyed while the request is still enqueued. The first step is to validate that PD owns the lkey that describes the MR and that the MR that the lkey refers to is owned by that PD. The second step is to dequeue all requests when MR is destroyed. Since PD can't be destroyed while it owns MRs it is guaranteed that when a worker wakes up the request it refers to is still valid. Now, it is possible to refrain from taking a reference on the device since it is assured to be present as pd. While that, replace the dedicated ordered workqueue with the system unbound workqueue to reuse an existing resource and improve performance. This will also fix a bug of queueing to the wrong workqueue. Fixes: 813e90b1 ("IB/mlx5: Add advise_mr() support") Reported-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Gustavo A. R. Silva 提交于
Fix the following warning by adding a missing break: drivers/infiniband/hw/hfi1/tid_rdma.c: In function ‘hfi1_tid_rdma_wqe_interlock’: drivers/infiniband/hw/hfi1/tid_rdma.c:3251:3: warning: this statement may fall through [-Wimplicit-fallthrough=] switch (prev->wr.opcode) { ^~~~~~ drivers/infiniband/hw/hfi1/tid_rdma.c:3259:2: note: here case IB_WR_RDMA_READ: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Fixes: c6c23117 ("IB/hfi1: Add interlock between TID RDMA WRITE and other requests") Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: NKaike Wan <Kaike.wan@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 21 2月, 2019 1 次提交
-
-
由 Davidlohr Bueso 提交于
The current check does not take into account the previous value of pinned_vm; thus it is quite bogus as is. Fix this by checking the new value after the (optimistic) atomic inc. Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 20 2月, 2019 2 次提交
-
-
由 Wei Yongjun 提交于
Fixes the following sparse warning: drivers/infiniband/hw/cxgb4/cm.c:658:6: warning: symbol 'read_tcb' was not declared. Should it be static? Fixes: 11a27e21 ("iw_cxgb4: complete the cached SRQ buffers") Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Acked-by: NRaju Rangoju <rajur@chelsio.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Yangyang Li 提交于
The method of set hem for scc context is different from other contexts. It should notify the hardware with the detailed idx in bt0 for scc, while for other contexts, it only need to notify the bt step and the hardware will calculate the idx. Here fixes the following error when unloading the hip08 driver: [ 123.570768] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 123.579023] {1}[Hardware Error]: event severity: recoverable [ 123.584670] {1}[Hardware Error]: Error 0, type: recoverable [ 123.590317] {1}[Hardware Error]: section_type: PCIe error [ 123.595877] {1}[Hardware Error]: version: 4.0 [ 123.600395] {1}[Hardware Error]: command: 0x0006, status: 0x0010 [ 123.606562] {1}[Hardware Error]: device_id: 0000:7d:00.0 [ 123.612034] {1}[Hardware Error]: slot: 0 [ 123.616120] {1}[Hardware Error]: secondary_bus: 0x00 [ 123.621245] {1}[Hardware Error]: vendor_id: 0x19e5, device_id: 0xa222 [ 123.627847] {1}[Hardware Error]: class_code: 000002 [ 123.632977] hns3 0000:7d:00.0: aer_status: 0x00000000, aer_mask: 0x00000000 [ 123.639928] hns3 0000:7d:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID [ 123.647400] hns3 0000:7d:00.0: aer_uncor_severity: 0x00000000 [ 123.653136] hns3 0000:7d:00.0: PCI error detected, state(=1)!! [ 123.658959] hns3 0000:7d:00.0: ROCEE uncorrected RAS error identified [ 123.665395] hns3 0000:7d:00.0: ROCEE RAS AXI rresp error [ 123.670713] hns3 0000:7d:00.0: requesting reset due to PCI error [ 123.676715] hns3 0000:7d:00.0: received reset event , reset type is 5 [ 123.683147] hns3 0000:7d:00.0: AER: Device recovery successful [ 123.688978] hns3 0000:7d:00.0: PF Reset requested [ 123.693684] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF [ 123.700633] hns3 0000:7d:00.0: inform reset to vf(1) failded -5! Fixes: 6a157f7d ("RDMA/hns: Add SCC context allocation support for hip08") Signed-off-by: NYangyang Li <liyangyang20@huawei.com> Reviewed-by: NYixian Liu <liuyixian@huawei.com> Reviewed-by: NLijun Ou <oulijun@huawei.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-