- 15 1月, 2014 1 次提交
-
-
由 Matan Barak 提交于
This patch add the support for Ethernet L2 attributes in the verbs/cm/cma structures. When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority in a similar manner that the IB L2 (and the L4 PKEY) attributes are used. Thus, those attributes were added to the following structures: * ib_ah_attr - added dmac * ib_qp_attr - added smac and vlan_id, (sl remains vlan priority) * ib_wc - added smac, vlan_id * ib_sa_path_rec - added smac, dmac, vlan_id * cm_av - added smac and vlan_id For the path record structure, extra care was taken to avoid the new fields when packing it into wire format, so we don't break the IB CM and SA wire protocol. On the active side, the CM fills. its internal structures from the path provided by the ULP. We add there taking the ETH L2 attributes and placing them into the CM Address Handle (struct cm_av). On the passive side, the CM fills its internal structures from the WC associated with the REQ message. We add there taking the ETH L2 attributes from the WC. When the HW driver provides the required ETH L2 attributes in the WC, they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core code checks for the presence of these flags, and in their absence does address resolution from the ib_init_ah_from_wc() helper function. ib_modify_qp_is_ok is also updated to consider the link layer. Some parameters are mandatory for Ethernet link layer, while they are irrelevant for IB. Vendor drivers are modified to support the new function signature. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 18 11月, 2013 2 次提交
-
-
由 Matan Barak 提交于
This commit reverts commit 7afbddfa ("IB/core: Temporarily disable create_flow/destroy_flow uverbs"). Since the uverbs extensions functionality was experimental for v3.12, this patch re-enables the support for them and flow-steering for v3.13. Signed-off-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Yann Droneaud 提交于
Commit 400dbc96 ("IB/core: Infrastructure for extensible uverbs commands") added an infrastructure for extensible uverbs commands while later commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through uverbs") exported ib_create_flow()/ib_destroy_flow() functions using this new infrastructure. According to the commit 400dbc96, the purpose of this infrastructure is to support passing around provider (eg. hardware) specific buffers when userspace issue commands to the kernel, so that it would be possible to extend uverbs (eg. core) buffers independently from the provider buffers. But the new kernel command function prototypes were not modified to take advantage of this extension. This issue was exposed by Roland Dreier in a previous review[1]. So the following patch is an attempt to a revised extensible command infrastructure. This improved extensible command infrastructure distinguish between core (eg. legacy)'s command/response buffers from provider (eg. hardware)'s command/response buffers: each extended command implementing function is given a struct ib_udata to hold core (eg. uverbs) input and output buffers, and another struct ib_udata to hold the hw (eg. provider) input and output buffers. Having those buffers identified separately make it easier to increase one buffer to support extension without having to add some code to guess the exact size of each command/response parts: This should make the extended functions more reliable. Additionally, instead of relying on command identifier being greater than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on unused bits in command field: on the 32 bits provided by command field, only 6 bits are really needed to encode the identifier of commands currently supported by the kernel. (Even using only 6 bits leaves room for about 23 new commands). So this patch makes use of some high order bits in command field to store flags, leaving enough room for more command identifiers than one will ever need (eg. 256). The new flags are used to specify if the command should be processed as an extended one or a legacy one. While designing the new command format, care was taken to make usage of flags itself extensible. Using high order bits of the commands field ensure that newer libibverbs on older kernel will properly fail when trying to call extended commands. On the other hand, older libibverbs on newer kernel will never be able to issue calls to extended commands. The extended command header includes the optional response pointer so that output buffer length and output buffer pointer are located together in the command, allowing proper parameters checking. This should make implementing functions easier and safer. Additionally the extended header ensure 64bits alignment, while making all sizes multiple of 8 bytes, extending the maximum buffer size: legacy extended Maximum command buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) Maximum response buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) For the purpose of doing proper buffer size accounting, the headers size are no more taken in account in "in_words". One of the odds of the current extensible infrastructure, reading twice the "legacy" command header, is fixed by removing the "legacy" command header from the extended command header: they are processed as two different parts of the command: memory is read once and information are not duplicated: it's making clear that's an extended command scheme and not a different command scheme. The proposed scheme will format input (command) and output (response) buffers this way: - command: legacy header + extended header + command data (core + hw): +----------------------------------------+ | flags | 00 00 | command | | in_words | out_words | +----------------------------------------+ | response | | response | | provider_in_words | provider_out_words | | padding | +----------------------------------------+ | | . <uverbs input> . . (in_words * 8) . | | +----------------------------------------+ | | . <provider input> . . (provider_in_words * 8) . | | +----------------------------------------+ - response, if present: +----------------------------------------+ | | . <uverbs output space> . . (out_words * 8) . | | +----------------------------------------+ | | . <provider output space> . . (provider_out_words * 8) . | | +----------------------------------------+ The overall design is to ensure that the extensible infrastructure is itself extensible while begin more reliable with more input and bound checking. Note: The unused field in the extended header would be perfect candidate to hold the command "comp_mask" (eg. bit field used to handle compatibility). This was suggested by Roland Dreier in a previous review[2]. But "comp_mask" field is likely to be present in the uverb input and/or provider input, likewise for the response, as noted by Matan Barak[3], so it doesn't make sense to put "comp_mask" in the header. [1]: http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com [2]: http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com [3]: http://marc.info/?i=525C1149.6000701@mellanox.comSigned-off-by: NYann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com [ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ] Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 16 11月, 2013 2 次提交
-
-
由 Eli Cohen 提交于
Move the check on max supported CQEs after the final number of entries is evaluated. Signed-off-by: NEli Cohen <eli@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Eli Cohen 提交于
When calling get_sw_cqe() we need pass the consumer_index and not the masked value. Failure to do so will cause incorrect result of get_sw_cqe() possibly leading to endless loop. This problem was reported and analyzed by Michael Rice from HP. Signed-off-by: NEli Cohen <eli@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 08 11月, 2013 1 次提交
-
-
由 Jack Morgenstein 提交于
To guarantee that all unused fields in all FW commands for both inboxes and outboxes are zeroed out, initialize the mailbox buffer to all zeroes. This is especially important for SRIOV comm-channel virtual commands (such as QUERY_FUNC_CAP), where if new fields are added to support new features, the driver can depend on older kernels passing zeroes in these fields. In addition to zeroing out the mailbox buffer at allocation time, all (now unnecessary) calls to memset by the callers of mlx4_alloc_cmd_mailbox() are removed. Signed-off-by: NMajd Dibbiny <majd@mellanox.com> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 11月, 2013 1 次提交
-
-
由 Jack Morgenstein 提交于
This is step #1 for implementing SRIOV resource quotas for VFs. Quotas are implemented per resource type for VFs and the PF, to prevent any entity from simply grabbing all the resources for itself and leaving the other entities unable to obtain such resources. Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs, MAC, VLAN, and Counters. The quota system works as follows: Each entity (VF or PF) is given a max number of a given resource (its quota), and a guaranteed minimum number for each resource (starvation prevention). For QPs, CQs, SRQs, MPTs and MTTs: 50% of the available quantity for the resource is divided equally among the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool. The other 50% is the "free pool", allocated on a first-come-first-serve basis. For each VF/PF, resources are first allocated from its "guaranteed-minimum" pool. When that pool is exhausted, the driver attempts to allocate from the resource "free-pool". The quota (i.e., max) for the VFs and the PF is: The free-pool amount (50% of the real max) + the guaranteed minimum For MACs: Guarantee 2 MACs per VF/PF per port. As a result, since we have only 128 MACs per port, reduce the allowable number of VFs from 64 to 63. Any remaining MACs are put into a free pool. For VLANs: For the PF, the per-port quota is 128 and guarantee is 64 (to allow the PF to register at least a VLAN per VF in VST mode). For the VFs, the per-port quota is 64 and the guarantee is 0. We assume that VGT VFs are trusted not to abuse the VLAN resource. For Counters: For all functions (PF and VFs), the quota is 128 and the guarantee is 0. In this patch, we define the needed structures, which are added to the resource-tracker struct. In addition, we do initialization for the resource quota, and adjust the query_device response to use quotas rather than resource maxima. As part of the implementation, we introduce a new field in mlx4_dev: quotas. This field holds the resource quotas used to report maxima to the upper layers (ib_core, via query_device). The HCA maxima of these values are passed to the VFs (via QUERY_HCA) so that they may continue to use these in handling QPs, CQs, SRQs and MPTs. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 10月, 2013 1 次提交
-
-
由 Yann Droneaud 提交于
The create_flow/destroy_flow uverbs and the associated extensions to the user-kernel verbs ABI are under review and are too experimental to freeze at this point. So userspace is not exposed to experimental features and an uinstable ABI, temporarily disable this for v3.12 (with a Kconfig option behind staging to reenable it if desired). The feature will be enabled after proper cleanup for v3.13. Signed-off-by: NYann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com [ Add a Kconfig option to reenable these verbs. - Roland ] Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 29 8月, 2013 1 次提交
-
-
由 Hadar Hen Zion 提交于
Implement ib_create_flow() and ib_destroy_flow(). Translate the verbs structures provided by the user to HW structures and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands. On the ATTACH command completion, the firmware provides a 64-bit registration ID, which is placed into struct mlx4_ib_flow that wraps the instance of struct ib_flow which is retuned to caller. Later, this reg ID is used for detaching that flow from the firmware. Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 01 8月, 2013 1 次提交
-
-
由 Jack Morgenstein 提交于
When creating tunnel QPs for special QP tunneling, look for the default pkey in the slave's virtual pkey table. If it is present, use the real pkey index where the default pkey is located. If the default pkey is not found in the pkey table, use the real pkey index which is stored at index 0 in the slave's virtual pkey table (this is the current behavior). This change is required to support cloud computing, where the paravirtualized index of the default pkey is moved to index 1 or higher. The pkey at paravirtualized index 0 is used for the default IPoIB interface created by the VF. Its possible for the pkey value at paravirtualized index 0 to be invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey index 127, which contains pkey = 0). At some point after the VF probe, the cloud computing interface at the hypervisor maps virtual index 0 for the VF to the pkey index containing the pkey that IPoIB will use in its operation. However, when the tunnel QP is created, the pkey at the slave's virtual index 0 is still mapped to the invalid pkey index, so tunnel QP creation fails. This commit causes the hypervisor to search for the default pkey in the slave's pkey table -- and this pkey is present in the table (at index > 0) at tunnel QP creation time, so that the tunnel QP creation will succeed. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 29 5月, 2013 1 次提交
-
-
由 Jiri Pirko 提交于
So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: NJiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 5月, 2013 1 次提交
-
-
由 Andrew Morton 提交于
Use preferable function name which implies using a pseudo-random number generator. Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 4月, 2013 1 次提交
-
-
由 Jeff Layton 提交于
Signed-off-by: NJeff Layton <jlayton@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 4月, 2013 3 次提交
-
-
由 Eli Cohen 提交于
When the link type is Ethernet, setting the link type in the QP context will enable TCP/IP stateless offloads (checksum, LSO, RSS) for RAW PACKET Ethernet QPs. For IB UD QPs this worked OK since the value assumed by the firmware for IB link layer is zero. Signed-off-by: NEli Cohen <eli@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Dotan Barak 提交于
Fix the asymmetric behavior w.r.t VLAN insertion/stripping for RAW PACKET QPs -- we don't insert on send and need not strip on receive. Signed-off-by: NDotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Amir Vadai 提交于
The patch allows to enable/disable HW timestamping for incoming and/or outgoing packets. It adds and initializes all structs and callbacks needed by kernel TS API. To enable/disable HW timestamping appropriate ioctl should be used. Currently HWTSTAMP_FILTER_ALL/NONE and HWTSAMP_TX_ON/OFF only are supported. When enabling TS on receive flow - VLAN stripping will be disabled. Also were made all relevant changes in RX/TX flows to consider TS request and plant HW timestamps into relevant structures. mlx4_ib was fixed to compile with new mlx4_cq_alloc() signature. Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com> Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 4月, 2013 2 次提交
-
-
由 Akinobu Mita 提交于
Use more preferable function name which implies using a pseudo-random number generator. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Steve Wise <swise@chelsio.com> Cc: linux-rdma@vger.kernel.org Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Shlomo Pongratz 提交于
An XRC target QP may redirect to more than one XRC SRQ. This means that for work completions associated with a XRC TGT QP, the srq field in the QP has no usage and the real XRC SRQ need to be retrived using the information from the XRCETH placed into the CQE, do that. Signed-off-by: NShlomo Pongratz <shlomop@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 14 3月, 2013 1 次提交
-
-
由 Tejun Heo 提交于
Commit 6a920060 ("IB/mlx4: convert to idr_alloc()") forgot to remove idr_pre_get() call in mlx4_ib_cm_paravirt_init(). It's unnecessary and idr_pre_get() will soon be deprecated. Remove it. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 2月, 2013 2 次提交
-
-
由 Tejun Heo 提交于
MAX_IDR_MASK is another weirdness in the idr interface. As idr covers whole positive integer range, it's defined as 0x7fffffff or INT_MAX. Its usage in idr_find(), idr_replace() and idr_remove() is bizarre. They basically mask off the sign bit and operate on the rest, so if the caller, by accident, passes in a negative number, the sign bit will be masked off and the remaining part will be used as if that was the input, which is worse than crashing. The constant is visible in idr.h and there are several users in the kernel. * drivers/i2c/i2c-core.c:i2c_add_numbered_adapter() Basically used to test if adap->nr is a negative number which isn't -1 and returns -EINVAL if so. idr_alloc() already has negative @start checking (w/ WARN_ON_ONCE), so this can go away. * drivers/infiniband/core/cm.c:cm_alloc_id() drivers/infiniband/hw/mlx4/cm.c:id_map_alloc() Used to wrap cyclic @start. Can be replaced with max(next, 0). Note that this type of cyclic allocation using idr is buggy. These are prone to spurious -ENOSPC failure after the first wraparound. * fs/super.c:get_anon_bdev() The ID allocated from ida is masked off before being tested whether it's inside valid range. ida allocated ID can never be a negative number and the masking is unnecessary. Update idr_*() functions to fail with -EINVAL when negative @id is specified and update other MAX_IDR_MASK users as described above. This leaves MAX_IDR_MASK without any user, remove it and relocate other MAX_IDR_* constants to lib/idr.c. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Jean Delvare <khali@linux-fr.org> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: NWolfram Sang <wolfram@the-dreams.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
Convert to the much saner new idr interface. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 2月, 2013 6 次提交
-
-
由 Shani Michaeli 提交于
Indicate memory windows support through device capabilities, kernel verb entries and the relevant uverbs command mask entries. Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NShani Michaeli <shanim@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Shani Michaeli 提交于
* Implement memory windows binding in mlx4_ib_post_send. * Implement mlx4_ib_bind_mw by deferring to mlx4_ib_post_send. * Rename MLX4_WQE_FMR_PERM_* flags to MLX4_WQE_FMR_AND_BIND_PERM_*, indicating that they are used both for fast registration work requests, and for memory window bind work requests. Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NShani Michaeli <shanim@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Shani Michaeli 提交于
Implement MW allocation and deallocation in mlx4_core and mlx4_ib. Pass down the enable bind flag when registering memory regions. Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NShani Michaeli <shanim@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Syam Sidhardhan 提交于
kfree on NULL pointer is a no-op. Signed-off-by: NSyam Sidhardhan <s.syam@samsung.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Paul Bolle 提交于
Building qp.o triggers this gcc warning: drivers/infiniband/hw/mlx4/qp.c: In function ‘mlx4_ib_post_send’: drivers/infiniband/hw/mlx4/qp.c:1862:62: warning: ‘vlan’ may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/infiniband/hw/mlx4/qp.c:1752:6: note: ‘vlan’ was declared here Looking at the code it is clear 'vlan' is only set and used if 'is_eth' is non-zero. But by initializing 'vlan' to 0xffff, on gcc (Ubuntu 4.7.2-22ubuntu1) 4.7.2 on x86-64 at least, we fix the warning, and the compiler was already setting 'vlan' to 0 in the generated code, so there's no real downside. Signed-off-by: NPaul Bolle <pebolle@tiscali.nl> [ Get rid of unnecessary move of 'is_vlan' initialization. - Roland ] Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Roland Dreier 提交于
Matches the way they're used, and actually lets at least x86-64 generate better code: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-38 (-38) function old new delta mlx4_ib_post_send 4416 4378 -38 Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 22 2月, 2013 2 次提交
-
-
由 Shani Michaeli 提交于
MR deregistration fails when memory windows are bound to the MR. Handle such failures by propagating them to the caller ULP. Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NShani Michaeli <shanim@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Shani Michaeli 提交于
Remove unused fields from the local invalidate WQE segment structure. Signed-off-by: NHaggai Eran <haggaie@mellanox.com> Signed-off-by: NShani Michaeli <shanim@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 16 2月, 2013 2 次提交
-
-
由 Julia Lawall 提交于
Delete successive tests to the same location. The code tested the result of a previous allocation, that itself was already tested. It is changed to test the result of the most recent allocation. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @s exists@ local idexpression y; expression x,e; @@ *if ( \(x == NULL\|IS_ERR(x)\|y != 0\) ) { ... when forall return ...; } ... when != \(y = e\|y += e\|y -= e\|y |= e\|y &= e\|y++\|y--\|&y\) when != \(XT_GETPAGE(...,y)\|WMI_CMD_BUF(...)\) *if ( \(x == NULL\|IS_ERR(x)\|y != 0\) ) { ... when forall return ...; } // </smpl> Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Dan Carpenter 提交于
We have to decrement "i" before calling mlx4_ib_free_demux_ctx() or we free something that wasn't allocated. That's fine for free_pv_object() but it would lead to a NULL dereference calling mlx4_ib_free_demux_ctx(). The null dereference is because ->tun is NULL when we check: if (!ctx->tun[i]) Also we didn't free ->sriov.demux[0] so it was a small leak. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 30 11月, 2012 1 次提交
-
-
由 Jack Morgenstein 提交于
lockdep warns about taking a hard-irq-unsafe lock (sriov->id_map_lock) inside a hard-irq-safe lock (sriov->going_down_lock). Since id_map_lock is never taken in the interrupt context, we can simply reverse the order of taking the two spinlocks, thus avoiding the warning and the depencency. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Cc: <stable@vger.kernel.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 27 11月, 2012 1 次提交
-
-
由 Or Gerlitz 提交于
ConnectX-3 devices can use either 64- or 32-byte completion queue entries (CQEs) and event queue entries (EQEs). Using 64-byte EQEs/CQEs performs better because each entry is aligned to a complete cacheline. This patch queries the HCA's capabilities, and if it supports 64-byte CQEs and EQES the driver will configure the HW to work in 64-byte mode. The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ. Since this mode is global, userspace (libmlx4) must be updated to work with the configured CQE size, and guests using SR-IOV virtual functions need to know both EQE and CQE size. In case one of the 64-byte CQE/EQE capabilities is activated, the patch makes sure that older guest drivers that use the QUERY_DEV_FUNC command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that they need an update to be able to work with the PPF. This is done by changing the returned pf_context_behaviour not to be zero any more. In case none of these capabilities is activated that value remains zero and older guest drivers can run OK. The SRIOV related flow is as follows 1. the PPF does the detection of the new capabilities using QUERY_DEV_CAP command. 2. the PPF activates the new capabilities using INIT_HCA. 3. the VF detects if the PPF activated the capabilities using QUERY_HCA, and if this is the case activates them for itself too. Note that the VF detects that it must be aware to the new PF behaviour using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode. User space notification is done through a new field introduced in struct mlx4_ib_ucontext which holds device capabilities for which user space must take action. This changes the binary interface so the ABI towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only when **needed** i.e. only when the driver does use 64-byte CQEs or future device capabilities which must be in sync by user space. This practice allows to work with unmodified libmlx4 on older devices (e.g A0, B0) which don't support 64-byte CQEs. In order to keep existing systems functional when they update to a newer kernel that contains these changes in VF and userspace ABI, a module parameter enable_64b_cqe_eqe must be set to enable 64-byte mode; the default is currently false. Signed-off-by: NEli Cohen <eli@mellanox.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 19 10月, 2012 3 次提交
-
-
由 Eli Cohen 提交于
A client re-register event invokes cleanup of all MCGs. This is required to protect against misbehaved guests leading to corruption of join/leave database. However, since cleaning up the MCGs is a heavy operation, it is pushed to a work queue for further processing. Client re-register is also propagated to ULPs (e.g IPoIB). However, since the cleanup is performed in a workqueue, the ULP could leave and re-join groups before the cleanup occurs. In this case, when the cleanup takes place, it prunes the (newly-joined) MCGs and the ULP is left without actual MCGs while believing it joined them. Fix this by setting the flushing flag before invoking the cleanup task and clearing it after flushing is complete. Signed-off-by: NEli Cohen <eli@mellanox.com> Reviewed-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
In the MAD paravirtualization code, one of the checks performed when forwarding QP1 (GSI) packets from wire to slave was a P_Key check: the P_Key received in the MAD must be present in the guest's paravirtualized P_Key table, and at least one of the (packet P_Key, guest P_Key) must be a full-membership P_Key. However, if everyone involved has only limited membership in the default P_Key, then packets sent by full-member remote hosts arrive at the PPF but are not passed on to the VFs with the current P_Key1 check. Fix this as follows: 1. Don't care if P_Key received over wire is full or not. If it successfully passed HW checks on the real QP1, then simply pass it to guest regardless of whether the guest has full or limited membership in its P_Key table. 2. If the guest (including paravirtualized master) has both full and limited P_Key forms in its table, preferentially pass the paravirtualized P_Key index of the full P_Key form in the tunnel header. 3. In the multicast join flow (mlx4/mcg.c), use the index for the default P_Key (wherever it is located) in replies generated from within the mcg module (previously, P_Key index 0 was used in all cases). Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Doug Ledford 提交于
Line 110 uses UL as a compiler cast for the 0x constant, but it's not large enough to hold a 64-bit value on a 32-bit arch. Signed-off-by: NDoug Ledford <dledford@redhat.com> [ Use "-1" instead of "FFFFFFFFFFFFFFFFULL". - Roland ] Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 06 10月, 2012 1 次提交
-
-
由 Fengguang Wu 提交于
To avoid name conflicts: drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined While at it, also make the other names more consistent and add parentheses. [akpm@linux-foundation.org: repair fallout] [sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change] Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: walter harms <wharms@bfs.de> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 10月, 2012 3 次提交
-
-
由 Jack Morgenstein 提交于
When we have VFs and PFs on same host, the VFs are activated within the mlx4_core module before the mlx4_ib kernel module is loaded. When the mlx4_ib module initializes the PF (master), it now creates MAD paravirtualization contexts for any VFs that already active. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
Previously, the structure of a guest's proxy QPs followed the structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1, qp1 port 2, ...). The guest then did offset calculations on the sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP(). This is now changed so that the guest does no offset calculations regarding proxy or tunnel QPs to use. This change frees the PPF from needing to adhere to a specific order in allocating proxy and tunnel QPs. Now QUERY_FUNC_CAP provides each port individually with its proxy qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are used directly where required (with no offset calculations). To accomplish this change, several fields were added to the phys_caps structure for use by the PPF and by non-SR-IOV mode: base_sqpn -- in non-sriov mode, this was formerly sqp_start. base_proxy_sqpn -- the first physical proxy qp number -- used by PPF base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF. The current code in the PPF still adheres to the previous layout of sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this layout without affecting VF or (paravirtualized) PF code. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
This is necessary in order to support > 1 VF/PF in a VM for software that uses the node guid as a discriminator, such as librdmacm. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-