- 15 7月, 2015 1 次提交
-
-
由 Ira Weiny 提交于
We recently added BUG_ON's which were inappropriate for a condition which should never happen. Change these to be WARN_ON_ONCE as a debugging aid. Fixes: 4cd7c947 ('IB/mad: Add support for additional MAD info to/from drivers') Signed-off-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 16 6月, 2015 3 次提交
-
-
由 Eran Ben Elisha 提交于
This is an infrastructure step for querying VF and PF counters. This code was in the IB driver, move it to the mlx4 core driver so it will be accessible for more use cases. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eran Ben Elisha 提交于
As IB VFs are not capable to read the port counters through MADs, move there to read their own QP counters to gather statistics. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eran Ben Elisha 提交于
This is an infrastructure step to attach all the QPs opened from the IB driver to a counter in order to collect VF stats from the PF using those counters. If the port's type is Ethernet, the counter policy demands two counters per port (one for RoCE and one for Ethernet). The port default counter (allocated in mlx4_core) is used for the Ethernet netdev QPs and we allocate another counter for RoCE. If the port's traffic is Infiniband, the counter policy demands one counter per port, so it can use the port's default counter. Also, Add 'allocated' flag for each counter in order to clean it at unload. Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 6月, 2015 3 次提交
-
-
由 Ira Weiny 提交于
In order to support alternate sized MADs (and variable sized MADs on OPA devices) add in/out MAD size parameters to the process_mad core call. In addition, add an out_mad_pkey_index to communicate the pkey index the driver wishes the MAD stack to use when sending OPA MAD responses. The out MAD size and the out MAD PKey index are required by the MAD stack to generate responses on OPA devices. Furthermore, the in and out MAD parameters are made generic by specifying them as ib_mad_hdr rather than ib_mad. Drivers are modified as needed and are protected by BUG_ON flags if the MAD sizes passed to them is incorrect. Signed-off-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Ira Weiny 提交于
In preparation to support the new OPA MAD Base version, add a base version parameter to ib_create_send_mad and set it to IB_MGMT_BASE_VERSION for current users. Definition of the new base version and it's processing will occur in later patches. Signed-off-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Matan Barak 提交于
Currently, ib_create_cq uses cqe and comp_vecotr instead of the extendible ib_cq_init_attr struct. Earlier patches already changed the vendors to work with ib_cq_init_attr. This patch changes the consumers too. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 02 6月, 2015 1 次提交
-
-
由 Ira Weiny 提交于
The process_mad device function declares some parameters as "in". Make those parameters const and adjust the call tree under process_mad in the various drivers accordingly. Signed-off-by: NIra Weiny <ira.weiny@intel.com> Reviewed-by: NHal Rosenstock <hal@mellanox.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 25 5月, 2015 1 次提交
-
-
由 Or Gerlitz 提交于
When multiplexling a MAD sent from VF, we should convert the port used by the guest to send the packet to the actual physical port which will be used to transmit the packet, before building the relevant address-handle (AH). This is needed under VPI for single ported VFs, since the code that builds the AH (mlx4_ib_query_ah()) makes decisions based on the input port. If we use the port number provided by the guest, it might have different protocol vs. the one this packat has to go from, and hence the result could be wrong. So far, the conversion was done after the AH was built and it worked for single ported Eth VFs which were not enabled under VPI. When adding support for single ported IB VFs and VPI, we hit that. Fixes: 449fc488 ('net/mlx4: Adapt code for N-Port VF') Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 4月, 2015 1 次提交
-
-
由 Sebastian Ott 提交于
Since ib_dma_map_single can fail use ib_dma_mapping_error to check for errors. Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Acked-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 19 3月, 2015 1 次提交
-
-
由 Majd Dibbiny 提交于
For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of overflow, according to the IB spec, we have to saturate a counter to its max value, do that. Fixes: c3779134 ('IB/mlx4: Support PMA counters for IBoE') Signed-off-by: NMajd Dibbiny <majd@mellanox.com> Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com> Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 1月, 2015 1 次提交
-
-
由 Yishai Hadas 提交于
Maintain a persistent memory that should survive reset flow/PCI error. This comes as a preparation for coming series to support above flows. Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 8月, 2014 1 次提交
-
-
由 Ira Weiny 提交于
Registrations options are specified through flags. Definitions of flags will be in subsequent patches. Signed-off-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 30 5月, 2014 1 次提交
-
-
由 Jack Morgenstein 提交于
Currently, VFs in SRIOV VFs are denied QP0 access. The main reason for this decision is security, since Subnet Management Datagrams (SMPs) are not restricted by network partitioning and may affect the physical network topology. Moreover, even the SM may be denied access from portions of the network by setting management keys unknown to the SM. However, it is desirable to grant SMI access to certain privileged VFs, so that certain network management activities may be conducted within virtual machines instead of the hypervisor. This commit does the following: 1. Create QP0 tunnel QPs for all VFs. 2. Discard SMI mads sent-from/received-for non-privileged VFs in the hypervisor MAD multiplex/demultiplex logic. SMI mads from/for privileged VFs are allowed to pass. 3. MAD_IFC wrapper changes/fixes. For non-privileged VFs, only host-view MAD_IFC commands are allowed, and only for SMI LID-Routed GET mads. For privileged VFs, there are no restrictions. This commit does not allow privileged VFs as yet. To determine if a VF is privileged, it calls function mlx4_vf_smi_enabled(). This function returns 0 unconditionally for now. The next two commits allow defining and activating privileged VFs. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 21 3月, 2014 1 次提交
-
-
由 Matan Barak 提交于
Adds support for N-Port VFs, this includes: 1. Adding support in the wrapped FW command In wrapped commands, we need to verify and convert the slave's port into the real physical port. Furthermore, when sending the response back to the slave, a reverse conversion should be made. 2. Adjusting sqpn for QP1 para-virtualization The slave assumes that sqpn is used for QP1 communication. If the slave is assigned to a port != (first port), we need to adjust the sqpn that will direct its QP1 packets into the correct endpoint. 3. Adjusting gid[5] to modify the port for raw ethernet In B0 steering, gid[5] contains the port. It needs to be adjusted into the physical port. 4. Adjusting number of ports in the query / ports caps in the FW commands When a slave queries the hardware, it needs to view only the physical ports it's assigned to. 5. Adjusting the sched_qp according to the port number The QP port is encoded in the sched_qp, thus in modify_qp we need to encode the correct port in sched_qp. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 3月, 2014 3 次提交
-
-
由 Jack Morgenstein 提交于
Since there is no connection between the MAC/VLAN and the GID when using IP-based addressing, the proxy QP1 (running on the slave) must pass the source-mac, destination-mac, and vlan_id information separately from the GID. Additionally, the Host must pass the remote source-mac and vlan_id back to the slave, This is achieved as follows: Outgoing MADs: 1. Source MAC: obtained from the CQ completion structure (struct ib_wc, smac field). 2. Destination MAC: obtained from the tunnel header 3. vlan_id: obtained from the tunnel header. Incoming MADs 1. The source (i.e., remote) MAC and vlan_id are passed in the tunnel header to the proxy QP1. VST mode support: For outgoing MADs, the vlan_id obtained from the header is discarded, and the vlan_id specified by the Hypervisor is used instead. For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the "invalid" vlan (0xffff) is substituted when forwarding to the slave. Signed-off-by: NMoni Shoua <monis@mellanox.co.il> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
The GIDs are statically distributed, as follows: PF: gets 16 GIDs VFs: Remaining GIDS are divided evenly between VFs activated by the driver. If the division is not even, lower-numbered VFs get an extra GID. For an IB interface, the number of gids per guest remains as before: one gid per guest. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
This requires the following modifications: 1. Fix build_mlx4_header to properly fill in the ETH fields 2. Adjust mux and demux QP1 flow to support RoCE. This commit still assumes only one GID per slave for RoCE. The commit enabling multiple GIDs is a subsequent commit, and is done separately because of its complexity. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 8月, 2013 1 次提交
-
-
由 Jack Morgenstein 提交于
When creating tunnel QPs for special QP tunneling, look for the default pkey in the slave's virtual pkey table. If it is present, use the real pkey index where the default pkey is located. If the default pkey is not found in the pkey table, use the real pkey index which is stored at index 0 in the slave's virtual pkey table (this is the current behavior). This change is required to support cloud computing, where the paravirtualized index of the default pkey is moved to index 1 or higher. The pkey at paravirtualized index 0 is used for the default IPoIB interface created by the VF. Its possible for the pkey value at paravirtualized index 0 to be invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey index 127, which contains pkey = 0). At some point after the VF probe, the cloud computing interface at the hypervisor maps virtual index 0 for the VF to the pkey index containing the pkey that IPoIB will use in its operation. However, when the tunnel QP is created, the pkey at the slave's virtual index 0 is still mapped to the invalid pkey index, so tunnel QP creation fails. This commit causes the hypervisor to search for the default pkey in the slave's pkey table -- and this pkey is present in the table (at index > 0) at tunnel QP creation time, so that the tunnel QP creation will succeed. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 08 5月, 2013 1 次提交
-
-
由 Andrew Morton 提交于
Use preferable function name which implies using a pseudo-random number generator. Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 4月, 2013 1 次提交
-
-
由 Akinobu Mita 提交于
Use more preferable function name which implies using a pseudo-random number generator. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Steve Wise <swise@chelsio.com> Cc: linux-rdma@vger.kernel.org Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 16 2月, 2013 1 次提交
-
-
由 Dan Carpenter 提交于
We have to decrement "i" before calling mlx4_ib_free_demux_ctx() or we free something that wasn't allocated. That's fine for free_pv_object() but it would lead to a NULL dereference calling mlx4_ib_free_demux_ctx(). The null dereference is because ->tun is NULL when we check: if (!ctx->tun[i]) Also we didn't free ->sriov.demux[0] so it was a small leak. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 19 10月, 2012 1 次提交
-
-
由 Jack Morgenstein 提交于
In the MAD paravirtualization code, one of the checks performed when forwarding QP1 (GSI) packets from wire to slave was a P_Key check: the P_Key received in the MAD must be present in the guest's paravirtualized P_Key table, and at least one of the (packet P_Key, guest P_Key) must be a full-membership P_Key. However, if everyone involved has only limited membership in the default P_Key, then packets sent by full-member remote hosts arrive at the PPF but are not passed on to the VFs with the current P_Key1 check. Fix this as follows: 1. Don't care if P_Key received over wire is full or not. If it successfully passed HW checks on the real QP1, then simply pass it to guest regardless of whether the guest has full or limited membership in its P_Key table. 2. If the guest (including paravirtualized master) has both full and limited P_Key forms in its table, preferentially pass the paravirtualized P_Key index of the full P_Key form in the tunnel header. 3. In the multicast join flow (mlx4/mcg.c), use the index for the default P_Key (wherever it is located) in replies generated from within the mcg module (previously, P_Key index 0 was used in all cases). Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 01 10月, 2012 13 次提交
-
-
由 Jack Morgenstein 提交于
When we have VFs and PFs on same host, the VFs are activated within the mlx4_core module before the mlx4_ib kernel module is loaded. When the mlx4_ib module initializes the PF (master), it now creates MAD paravirtualization contexts for any VFs that already active. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
Previously, the structure of a guest's proxy QPs followed the structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1, qp1 port 2, ...). The guest then did offset calculations on the sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP(). This is now changed so that the guest does no offset calculations regarding proxy or tunnel QPs to use. This change frees the PPF from needing to adhere to a specific order in allocating proxy and tunnel QPs. Now QUERY_FUNC_CAP provides each port individually with its proxy qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are used directly where required (with no offset calculations). To accomplish this change, several fields were added to the phys_caps structure for use by the PPF and by non-SR-IOV mode: base_sqpn -- in non-sriov mode, this was formerly sqp_start. base_proxy_sqpn -- the first physical proxy qp number -- used by PPF base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF. The current code in the PPF still adheres to the previous layout of sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this layout without affecting VF or (paravirtualized) PF code. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
This is necessary in order to support > 1 VF/PF in a VM for software that uses the node guid as a discriminator, such as librdmacm. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
1. Allow only master to change node description. 2. Prevent AH leakage in send mads. 3. Take device part number from PCI structure, so that guests see the VF part number (and not the PF part number). 4. Place the device revision ID into caps structure at startup. 5. SET_PORT in update_gids_task needs to go through wrapper on master. 6. In mlx4_ib_event(), PORT_MGMT_EVENT needs be handled in a work queue on the master, since it propagates events to slaves using GEN_EQE. 7. Do not support FMR on slaves. 8. Add spinlock to slave_event(), since it is called both in interrupt context and in process context (due to 6 above, and also if smp_snoop is used). This fix was found and implemented by Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
This directory is added only for the master -- slaves do not have it. The sysfs iov directory is used to manage and examine the port P_Key and guid paravirtualization. Under iov/ports, the administrator may examine the gid and P_Key tables as they are present in the device (and as are seen in the "network view" presented to the SM). Under the iov/<pci slot number> directories, the admin may map the index numbers in the physical tables (as under iov/ports) to the paravirtualized index numbers that guests see. For example, if the administrator, for port 1 on guest 2 maps physical pkey index 10 to virtual index 1, then that guest, whenever it uses its pkey index 1, will actually be using the real pkey index 10. Based on patch from Erez Shitrit <erezsh@mellanox.com> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
P_Key change and guid change events are not of interest to all slaves, but only to those slaves which "see" the table slots whose contents have change. For example, if the guid at port 1, index 5 has changed in the PPF, we wish to propagate the gid-change event only to the function which has that guid index mapped to its port/guid table (in this case it is slave #5). Other functions should not get the event, since the event does not affect them. Similarly with P_Keys -- P_Key change events are forwarded only to slaves which have that P_Key index mapped to their virtual P_Key table. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
For IB ports, we paravirtualize the GUID at index 0 on slaves. The GUID at index 0 seen by a slave is the actual GUID occupying the GUID table at the slave-id index. The driver, by default, requests at startup time that subnet manager populate its entire guid table with GUIDs. These guids are then mapped (paravirtualized) to the slaves, and appear for each slave as its GUID at index 0. Until each slave has such a guid, its port status is DOWN. The guid table is cached to support special QP paravirtualization, and event propagation to slaves on guid change (we test to see if the guid really changed before propagating an event to the slave). To support this caching, add capability to __mlx4_ib_query_gid() to obtain the network view (i.e., physical view) gid at index X, not just the host (paravirtualized) view. Based on a patch from Erez Shitrit <erezsh@mellanox.com> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Amir Vadai 提交于
In CM para-virtualization: 1. Incoming requests are steered to the correct vHCA according to the embedded GID. 2. Communication IDs on outgoing requests are replaced by a globally unique ID, generated by the PPF, since there is no synchronization of ID generation between guests (and so these IDs are not guaranteed to be globally unique). The guest's comm ID is stored, and is returned to the response MAD when it arrives. Signed-off-by: NAmir Vadai <amirv@mellanox.co.il> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Oren Duer 提交于
MCG paravirtualization support includes: - Creating multicast groups by VFs, and keeping accounting of them - Leaving multicast groups by VFs - Updating SM only with real changes in the overall picture of MCGs status - Creation of MGID=0 groups (let SM choose MGID) Note that the MCG module maintains its own internal MCG object reference counts. The reason for this is that the IB core is used to track only the multicast groups joins generated by the PF it runs over. The PF IB core layer is unaware of slaves, so it cannot be used to keep track of MCG joins they generate. Signed-off-by: NOren Duer <oren@mellanox.co.il> Signed-off-by: NEli Cohen <eli@mellanox.com> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
The MAD_IFC firmware command fulfills two functions. First, it is used in the QP0/QP1 MAD-handling flow to obtain information from the FW (for answering queries), and for setting variables in the HCA (MAD SET packets). For this, MAD_IFC should provide the FW (physical) view of the data. This is the view that OpenSM needs. We call this the "network view". In the second case, MAD_IFC is used by various verbs to obtain data regarding the local HCA (e.g., ib_query_device()). We call this the "host view". This data needs to be paravirtualized. MAD_IFC therefore needs a wrapper function, and also needs another flag indicating whether it should provide the network view (when it is called by ib_process_mad in special-qp packet handling), or the host view (when it is called while implementing a verb). There are currently 2 flag parameters in mlx4_MAD_IFC already: ignore_bkey and ignore_mkey. These two parameters are replaced by a single "mad_ifc_flags" parameter, with different bits set for each flag. A third flag is added: "network-view/host-view". Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
Special QPs are paravirtualized. vHCAs are not given direct access to QP0/1. Rather, these QPs are operated by a special context hosted by the PF, which mediates access to/from vHCAs. This is done by opening a "tunnel" per vHCA port per QP0/1. A tunnel comprises a pair of UD QPs: a "Tunnel QP" in the PF-context and a "Proxy QP" in the vHCA. All vHCA MAD traffic must pass through the corresponding tunnel. vHCA QPs cannot be assigned to VL15 and are denied of the well-known QKey. Outgoing messages are "de-multiplexed" (i.e., directed to the wire via the real special QP). Incoming messages are "multiplexed" (i.e. steered by the PPF to the correct VF or to the PF) QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual) QP0s, but they never receive any SMPs and all SMPs sent are discarded. QP1 traffic is allowed for all vHCAs, but special care is required to bridge the gap between the host and network views. Specifically: - Transaction IDs are mapped to guarantee uniqueness among vHCAs - CM para-virtualization o Incoming requests are steered to the correct vHCA according to the embedded GID o Local communication IDs are mapped to ensure uniqueness among vHCAs (see the patch that adds CM paravirtualization.) - Multicast para-virtualization o The PF context aggregates membership state from all vHCAs o The SA is contacted only when the aggregate membership changes o If the aggregate does not change, the PF context will provide the requesting vHCA with the proper response. (see the patch that adds multicast group paravirtualization) Incoming MADs are steered according to: - the DGID If a GRH is present - the mapped transaction ID for response MADs - the embedded GID in CM requests - the remote communication ID in other CM messages Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
This requires: 1. Replacing the paravirtualized P_Key index (inserted by the guest) with the real P_Key index. 2. For UD QPs, placing the guest's true source GID index in the address path structure mgid field, and setting the ud_force_mgid bit so that the mgid is taken from the QP context and not from the WQE when posting sends. 3. For UC and RC QPs, placing the guest's true source GID index in the address path structure mgid field. 4. For tunnel and proxy QPs, setting the Q_Key value reserved for that proxy/tunnel pair. Since not all the above adjustments occur in all the QP transitions, the QP transitions require separate wrapper functions. Secondly, initialize the P_Key virtualization table to its default values: Master virtualized table is 1-1 with the real P_Key table, guest virtualized table has P_Key index 0 mapped to the real P_Key index 0, and all the other P_Key indices mapped to the reserved (invalid) P_Key at index 127. Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache. and generating events on the master only if a P_Key actually changed. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
Allocate SR-IOV paravirtualization resources and MAD demuxing contexts on the master. This has two parts. The first part is to initialize the structures to contain the contexts. This is done at master startup time in mlx4_ib_init_sriov(). The second part is to actually create the tunneling resources required on the master to support a slave. This is performed the master detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event generated when a slave initializes its comm channel). For the master, there is no such startup event, so it creates its own tunneling resources when it starts up. In addition, the master also creates the real special QPs. The ib_core layer on the master causes creation of proxy special QPs, since the master is also paravirtualized at the ib_core layer. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 11 8月, 2012 1 次提交
-
-
由 Jack Morgenstein 提交于
The sm_lock spinlock is taken in the process context by mlx4_ib_modify_device, and in the interrupt context by update_sm_ah, so we need to take that spinlock with irqsave, and release it with irqrestore. Lockdeps reports this as follows: [ INFO: inconsistent lock state ] 3.5.0+ #20 Not tainted inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes: (&(&ibdev->sm_lock)->rlock){?.+...}, at: [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib] {HARDIRQ-ON-W} state was registered at: [<ffffffff810b84a0>] mark_irqflags+0x120/0x190 [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0 [<ffffffff810b9f51>] lock_acquire+0xb1/0x150 [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50 [<ffffffffa028d563>] mlx4_ib_modify_device+0x63/0x240 [mlx4_ib] [<ffffffffa026d1fc>] ib_modify_device+0x1c/0x20 [ib_core] [<ffffffffa026c353>] set_node_desc+0x83/0xc0 [ib_core] [<ffffffff8136a150>] dev_attr_store+0x20/0x30 [<ffffffff81201fd6>] sysfs_write_file+0xe6/0x170 [<ffffffff8118da38>] vfs_write+0xc8/0x190 [<ffffffff8118dc01>] sys_write+0x51/0x90 [<ffffffff8155b869>] system_call_fastpath+0x16/0x1b ... *** DEADLOCK *** 1 lock held by swapper/0/0: stack backtrace: Pid: 0, comm: swapper/0 Not tainted 3.5.0+ #20 Call Trace: <IRQ> [<ffffffff810b7bea>] print_usage_bug+0x18a/0x190 [<ffffffff810b7370>] ? print_irq_inversion_bug+0x210/0x210 [<ffffffff810b7fb2>] mark_lock_irq+0xf2/0x280 [<ffffffff810b8290>] mark_lock+0x150/0x240 [<ffffffff810b84ef>] mark_irqflags+0x16f/0x190 [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0 [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib] [<ffffffff810b9f51>] lock_acquire+0xb1/0x150 [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib] [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50 [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib] [<ffffffffa026b2fa>] ? ib_create_ah+0x1a/0x40 [ib_core] [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib] [<ffffffff810c27c3>] ? is_module_address+0x23/0x30 [<ffffffffa028b05b>] handle_port_mgmt_change_event+0xeb/0x150 [mlx4_ib] [<ffffffffa028c177>] mlx4_ib_event+0x117/0x160 [mlx4_ib] [<ffffffff81552501>] ? _raw_spin_lock_irqsave+0x61/0x70 [<ffffffffa022718c>] mlx4_dispatch_event+0x6c/0x90 [mlx4_core] [<ffffffffa0221b40>] mlx4_eq_int+0x500/0x950 [mlx4_core] Reported by: Or Gerlitz <ogerlitz@mellanox.com> Tested-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 12 7月, 2012 1 次提交
-
-
由 Jack Morgenstein 提交于
To allow easy paravirtualization of P_Key and GID table sizes, keep paravirtualized sizes in mlx4_dev->caps, but save the actual physical sizes from FW in struct: mlx4_dev->phys_cap. In addition, in SR-IOV mode, do the following: 1. Reduce reported P_Key table size by 1. This is done to reserve the highest P_Key index for internal use, for declaring an invalid P_Key in P_Key paravirtualization. We require a P_Key index which always contain an invalid P_Key value for this purpose (i.e., one which cannot be modified by the subnet manager). The way to do this is to reduce the P_Key table size reported to the subnet manager by 1, so that it will not attempt to access the P_Key at index #127. 2. Paravirtualize the GID table size to 1. Thus, each guest sees only a single GID (at its paravirtualized index 0). In addition, since we are paravirtualizing the GID table size to 1, we add paravirtualization of the master GID event here (i.e., we do not do ib_dispatch_event() for the GUID change event on the master, since its (only) GUID never changes). Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 11 7月, 2012 1 次提交
-
-
由 Jack Morgenstein 提交于
The port management change event can replace smp_snoop. If the capability bit for this event is set in dev-caps, the event is used (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event mask in the MAP_EQ fw command). In this case, when the driver passes incoming SMP PORT_INFO SET mads to the FW, the FW generates port management change events to signal any changes to the driver. If the FW generates these events, smp_snoop shouldn't be invoked in ib_process_mad(), or duplicate events will occur (once from the FW-generated event, and once from smp_snoop). In the case where the FW does not generate port management change events smp_snoop needs to be invoked to create these events. The flow in smp_snoop has been modified to make use of the same procedures as in the fw-generated-event event case to generate the port management events (LID change, Client-rereg, Pkey change, and/or GID change). Port management change event handling required changing the mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument (last argument) had to be changed to unsigned long in order to accomodate passing the EQE pointer. We also needed to move the definition of struct mlx4_eqe from net/mlx4.h to file device.h -- to make it available to the IB driver, to handle port management change events. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 09 7月, 2012 1 次提交
-
-
由 Jack Morgenstein 提交于
Define pr_fmt and add some pr_debug prints. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-