- 09 12月, 2015 1 次提交
-
-
由 Sagi Grimberg 提交于
mlx4 devices (ConnectX-2, ConnectX-3) has a limitation where rdma read work queue entries cannot exceed 512 bytes. A rdma_read wqe needs to fit in 512 bytes: - wqe control segment (16 bytes) - rdma segment (16 bytes) - scatter elements (16 bytes each) So max_sge_rd should be: (512 - 16 - 16) / 16 = 30. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 29 10月, 2015 2 次提交
-
-
由 Sagi Grimberg 提交于
No ULP uses it anymore, go ahead and remove it. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Acked-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Support the new memory registration API by allocating a private page list array in mlx4_ib_mr and populate it when mlx4_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR by setting the exact WQE as IB_WR_FAST_REG_MR, just take the needed information from different places: - page_size, iova, length, access flags (ib_mr) - page array (mlx4_ib_mr) - key (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Tested-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 22 10月, 2015 3 次提交
-
-
由 Matan Barak 提交于
Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: NMatan Barak <matanb@mellanox.com> Reviewed-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Eran Ben Elisha 提交于
MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is now supported downstream. In addition, this flag was supported only for IB_QPT_UD, now, with the new implementation it is supported for all QP types. Support IB_USER_VERBS_EX_CMD_CREATE_QP in order to get the flag from user space using the extension create qp command. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Eran Ben Elisha 提交于
This is an infrastructure step for allocating and attaching more than one counter to QPs on the same port. Allocate a counters table and manage the insertion and removals of the counters in load and unload of mlx4 IB. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 29 9月, 2015 1 次提交
-
-
由 Bodong Wang 提交于
Signed-off-by: NBodong Wang <bodong@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 31 8月, 2015 4 次提交
-
-
由 Yishai Hadas 提交于
Implements the IB core disassociate_ucontext API. The driver detaches the HW resources for a given user context to prevent a dependency between application termination and device disconnecting. This is done by managing the VMAs that were mapped to the HW bars such as door bell and blueflame. When need to detach remap them to an arbitrary kernel page returned by the zap API. Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NJack Morgenstein <jackm@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Moni Shoua 提交于
Manage RoCE gid table with logic in IB/core, which is common to all vendors, and remove the mechanism from the mlx4 IB driver. Since management of the GID cache may lead to index mismatch with the hardware GID table, a translation between indexes is required when modifying a QP or creating an address handle. Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Moni Shoua 提交于
get_netdev: get the net_device on the physical port of the IB transport port. In port aggregation mode it is required to return the netdev of the active port. modify_gid: note for a change in the RoCE gid cache. Handle this by writing to the harsware GID table. It is possible that indexes in cahce and hardware tables won't match so a translation is required when modifying a QP or creating an address handle. Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 29 8月, 2015 1 次提交
-
-
由 Sagi Grimberg 提交于
Applications must not assume that max_sge and max_sge_rd are the same, Hence expose max_sge_rd correctly as well. Reported-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 15 7月, 2015 4 次提交
-
-
由 Doug Ledford 提交于
There is little chance our memory allocation will fail, so we can combine initializing the work structs with allocating them instead of looping through all of them once to allocate and again to initialize. Then when we need to actually find out if our device is up or in the process of going down, have all of our work structs batched up, take the spin_lock once and only once, and do all of the batch under the one spin_lock invocation instead of incurring all of the locked memory cycles we would otherwise incur to take/release the spin_lock over and over again. Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Doug Ledford 提交于
We create a number of work structs to be queued up to a workqueue, and on completion of the workqueue handler, the workqueue handler frees the allocated memory. If, however, we don't queue the work struct because the device is going down, then we need to free the memory ourselves. Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Maninder Singh 提交于
On failure, we loop through all possible pointers and test them before calling kfree. But really, why even attempt to free items we didn't allocate when we can easily loop through exactly and only the devices for which the original memory allocation succeeded and free just those. Signed-off-by: NManinder Singh <maninder1.s@samsung.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Matan Barak 提交于
mlx4 VFs can provide CQE raw time-stamping services, but they don't have the hca core clock mapped to their PCI bars. As such, we should not attempt to query and report the clock offset to user space for VFs. Doing so causes query_device over VFs to fail with -ENOSUPP. Fixes: 4b664c43 ('IB/mlx4: Add support for CQ time-stamping') Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 16 6月, 2015 1 次提交
-
-
由 Eran Ben Elisha 提交于
This is an infrastructure step to attach all the QPs opened from the IB driver to a counter in order to collect VF stats from the PF using those counters. If the port's type is Ethernet, the counter policy demands two counters per port (one for RoCE and one for Ethernet). The port default counter (allocated in mlx4_core) is used for the Ethernet netdev QPs and we allocate another counter for RoCE. If the port's traffic is Infiniband, the counter policy demands one counter per port, so it can use the port's default counter. Also, Add 'allocated' flag for each counter in order to clean it at unload. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 6月, 2015 5 次提交
-
-
由 Ira Weiny 提交于
Add max MAD size to the device immutable data set and have all drivers that support MADs report the current IB MAD size (IB_MGMT_MAD_SIZE) to the core. Verify MAD size data in both the MAD core and when reading the immutable data. OPA drivers will report alternate MAD sizes in subsequent patches. Signed-off-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Matan Barak 提交于
This includes: * support allocation of CQ with the TIMESTAMP_COMPLETION creation flag. * add timestamp_mask and hca_core_clock to query_device, reporting the number of supported timestamp bits (mask) and the hca_core_clock frequency. * return hca core clock's offset in query_device vendor's data, this is needed in order to read the HCA's core clock. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Matan Barak 提交于
In order to read the HCA's cycle counter efficiently in user space, we need to map the HCA's register. This is done through mmap call. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Matan Barak 提交于
Vendors should be able to pass vendor specific data to/from user-space via query_device uverb. In order to do this, we need to pass the vendors' specific udata. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Matan Barak 提交于
Currently, ib_create_cq uses cqe and comp_vecotr instead of the extendible ib_cq_init_attr struct. Earlier patches already changed the vendors to work with ib_cq_init_attr. This patch changes the consumers too. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 02 6月, 2015 1 次提交
-
-
由 Roland Dreier 提交于
The unwinding clean up code are err_create_flow starts at the current index i. That means we shouldn't increment i until we're really sure we won't have to destroy the current flow; otherwise we might increment the index, fail inside an is_bonded block, and end up accessing off the end of the reg_id[] array. This was detected by Coverity (CID 1271229). Signed-off-by: NRoland Dreier <roland@purestorage.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 31 5月, 2015 2 次提交
-
-
由 Matan Barak 提交于
Previously, mlx4_en allocated EQs and used them exclusively. This affected RoCE performance, as applications which are events sensitive were limited to use only the legacy EQs. Change that by introducing an EQ pool. This pool is managed by mlx4_core. EQs are assigned to ports (when there are limited number of EQs, multiple ports could be assigned to the same EQs). An exception to this rule is the ASYNC EQ which handles various events. Legacy EQs are completely removed as all EQs could be shared. When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for EQ serving on a specific port. The core driver calculates which EQ should be assigned to that request. Because IRQs are shared between IB and Ethernet modules, their names only include the PCI device BDF address. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NIdo Shamay <idos@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Matan Barak 提交于
In SRIOV, when simple (i.e - Ethernet L2 only) flow steering rules are created, always create them at MLX4_DOMAIN_NIC priority (instead of the real priority the function created them at). This is done in order to let multiple functions add broadcast/multicast rules without affecting other functions, which is necessary for DPDK in SRIOV. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 5月, 2015 2 次提交
-
-
由 Ira Weiny 提交于
Remove query_protocol callback Use the new Core Capability bits for: rdma_protocol_* rdma_cap_ib_mad rdma_cap_ib_smi rdma_cap_ib_cm rdma_cap_iw_cm rdma_cap_ib_sa rdma_cap_ib_mcast rdma_cap_af_ib rdma_cap_eth_ah Signed-off-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Ira Weiny 提交于
As of commit 5eb620c8 "IB/core: Add helpers for uncached GID and P_Key searches"; pkey_tbl_len and gid_tbl_len are immutable data which are stored in the ib_device. The per port core capability flags to be added later are also immutable data to be stored in the ib_device object. In preparation for this create a structure for per port immutable data and place the pkey and gid table lengths within this structure. "get_port_immutable" is added as a mandatory device function to allow the drivers to fill in this data. Signed-off-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 19 5月, 2015 1 次提交
-
-
由 Michael Wang 提交于
Add new callback query_protocol() and implement for each HW. Mapping List: node-type link-layer transport protocol nes RNIC ETH IWARP IWARP amso1100 RNIC ETH IWARP IWARP cxgb3 RNIC ETH IWARP IWARP cxgb4 RNIC ETH IWARP IWARP usnic USNIC_UDP ETH USNIC_UDP USNIC_UDP ocrdma IB_CA ETH IB IBOE mlx4 IB_CA IB/ETH IB IB/IBOE mlx5 IB_CA IB IB IB ehca IB_CA IB IB IB ipath IB_CA IB IB IB mthca IB_CA IB IB IB qib IB_CA IB IB IB Signed-off-by: NMichael Wang <yun.wang@profitbricks.com> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Tested-by: NIra Weiny <ira.weiny@intel.com> Reviewed-by: NSean Hefty <sean.hefty@intel.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 13 5月, 2015 1 次提交
-
-
由 Joe Perches 提交于
These KERN_<LEVEL> uses are unnecessary with pr_<level> and cause bad logging output so remove them. Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 16 4月, 2015 2 次提交
-
-
由 Yishai Hadas 提交于
Change the default mode to be HOST assigned instead of SM assigned. This is the expected operational mode, because it doesn't depend on SM availability. As PF generates random GUIDs as the initial admin values, this gives out of the box experience. Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Yishai Hadas 提交于
Request GIDs from the SM on demand, i.e., when a VF actually needs them, and release them when the GIDs are no longer in use. In cloud environments, this is useful for GID migrations, in which a GID is assigned to a VF on the destination HCA, while the VF on the source HCA is shutdown (but the GID was not administratively released). Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 03 4月, 2015 1 次提交
-
-
由 Ido Shamay 提交于
The calls to SET_PORT used hard-code numbers, when supplying command's opcode modifiers, fix that to use well defined constants. Signed-off-by: NIdo Shamay <idos@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 3月, 2015 1 次提交
-
-
由 Moni Shoua 提交于
Processing an event is done in a different context from the one when the event was dispatched. This requires a check that the slave net device is still valid when the event is being processed. The check is done under the iboe lock which ensure correctness. Fixes: a5750090 ('IB/mlx4: Add port aggregation support') Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 2月, 2015 1 次提交
-
-
由 Or Gerlitz 提交于
The MLX4_PROT_IB_IPV4 protocol should only be used with RoCEv2 and such. Removing this wrong usage allows to run multicast applications over RoCE. Fixes: d487ee77 ("IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table") Reported-by: NCarol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 10 2月, 2015 2 次提交
-
-
由 Yishai Hadas 提交于
The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work-request could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset. It includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled we return simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one. The above change requires an extra check in the data path to make sure that when device is in error state, the simulated CQEs will be returned and no further WQEs will be posted. Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Moni Shoua 提交于
When attaching a QP to a multicast address in bonded mode, there was an assumption that the port of the QP must be #1. This assumption isn't the case under the flow which enables maximal usage of the physical ports. Fix it by always checking the port of the original flow and create the mirrored flow on the other port. Fixes: c6215745 ('IB/mlx4: Load balance ports in port aggregation mode') Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 2月, 2015 3 次提交
-
-
由 Moni Shoua 提交于
When the mlx4 IB (RoCE) device works in link aggregation mode, it exposes a single port to upper layers. Therefore, applications always set '1' in port_num attribute when modifying a QP or creating an address handle. To make sure that a node uses all available ports the mlx4 driver will override the port_num attribute with a round robin policy. Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Moni Shoua 提交于
In port aggregation mode flows for port #1 (the only port) should be mirrored on port #2. This is because packets can arrive from either physical ports. Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Moni Shoua 提交于
Register the interface with the mlx4 core driver with port aggregation support and check for port aggregation mode when the 'add' function is called. In this mode, only one physical port is reported to the upper layer (RoCE/IB core stack and ULPs). Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 1月, 2015 1 次提交
-
-
由 Yishai Hadas 提交于
Maintain a persistent memory that should survive reset flow/PCI error. This comes as a preparation for coming series to support above flows. Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-