1. 01 Jan, 2014 (1 commit)
  2. 20 Dec, 2013 (2 commits)
  3. 10 Dec, 2013 (1 commit)
    • mlx4_core: Roll back round robin bitmap allocation commit for CQs, SRQs, and MPTs · 7c6d74d2
      Committed by Jack Morgenstein
      Commit f4ec9e95 ("mlx4_core: Change bitmap allocator to work in round-robin
      fashion") introduced round-robin allocation for all resources which
      allocate via a bitmap.
      
      Round-robin allocation is desirable for MCGs, counters, PDs, UARs, and
      XRCDs.  These are simply numbers, with no ICM memory mapping involved.
      
      Round-robin is required for QPs, since we had a problem with immediate
      reuse of a 24-bit QP number (commit f4ec9e95).
      
      However, for the other resources which use the bitmap allocator and do
      involve mapping ICM memory -- MPTs, CQs, and SRQs -- round-robin is not
      desirable.
      
      What happens in these cases is the following:
      
      ICM memory is allocated and mapped in chunks of 256K.
      
      Since the resource allocation index increases monotonically, the allocator
      will eventually require mapping a new chunk. Chunks are also unmapped when
      their reference count drops back to zero.  Thus, if a single app starts and
      exits frequently, we will have the following situation:
      
      When the app starts, a new chunk must be allocated and mapped.
      
      When the app exits, the chunk reference count goes back to zero, and the
      chunk is unmapped and freed. Therefore, the app must pay the cost of allocation
      and mapping of ICM memory each time it runs (although the price is paid only when
      allocating the initial entry in the new chunk).
      
      For apps which allocate MPTs/SRQs/CQs and which operate as described above,
      this presented a performance problem.
      
      We therefore roll back the round-robin allocator modification for MPTs,
      CQs, and SRQs.
      Reported-by: Matthew Finlay <matt@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c6d74d2
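      To make the failure mode concrete, below is a minimal, self-contained C
      sketch (not the mlx4 source; the toy_* names are invented for
      illustration) contrasting bottom-up allocation with round-robin
      allocation for an app that starts and exits in a loop:

          #include <stdio.h>
          #include <stdbool.h>

          #define NBITS 16

          struct toy_bitmap {
                  bool used[NBITS];
                  int last;               /* round-robin cursor */
          };

          /* Bottom-up: always take the lowest free bit.  A start/exit-looping
           * app keeps getting the same low index back, so the same ICM chunk
           * would stay mapped across runs. */
          static int toy_alloc_bottom_up(struct toy_bitmap *bm)
          {
                  for (int i = 0; i < NBITS; i++)
                          if (!bm->used[i]) {
                                  bm->used[i] = true;
                                  return i;
                          }
                  return -1;
          }

          /* Round-robin: resume the search after the last allocation (the
           * behaviour that is right for QPs but wrong for MPTs/CQs/SRQs). */
          static int toy_alloc_round_robin(struct toy_bitmap *bm)
          {
                  for (int n = 0; n < NBITS; n++) {
                          int i = (bm->last + 1 + n) % NBITS;
                          if (!bm->used[i]) {
                                  bm->used[i] = true;
                                  bm->last = i;
                                  return i;
                          }
                  }
                  return -1;
          }

          int main(void)
          {
                  struct toy_bitmap bm = { .last = -1 };

                  /* Simulate an app that starts, allocates one entry, exits. */
                  for (int run = 0; run < 4; run++) {
                          int i = toy_alloc_round_robin(&bm);
                          printf("run %d -> index %d\n", run, i);
                          bm.used[i] = false;     /* app exits, entry freed */
                  }
                  /* Prints indices 0 1 2 3: the cursor drifts forward, so each
                   * time it crosses a chunk boundary a real driver must map
                   * (and, on exit, unmap) a fresh 256K ICM chunk, whereas
                   * toy_alloc_bottom_up() would return index 0 every run. */
                  return 0;
          }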
  4. 04 Dec, 2013 (1 commit)
  5. 08 Nov, 2013 (1 commit)
  6. 05 Nov, 2013 (1 commit)
    • mlx4: Structures and init/teardown for VF resource quotas · 5a0d0a61
      Committed by Jack Morgenstein
      This is step #1 in implementing SR-IOV resource quotas for VFs.
      
      Quotas are implemented per resource type for VFs and the PF, to prevent
      any entity from simply grabbing all the resources for itself and leaving
      the other entities unable to obtain such resources.
      
      Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs,
      MACs, VLANs, and counters.
      
      The quota system works as follows:
      Each entity (VF or PF) is given a max number of a given resource (its quota),
      and a guaranteed minimum number for each resource (starvation prevention).
      
      For QPs, CQs, SRQs, MPTs and MTTs:
      50% of the available quantity of the resource is divided equally among
      the PF and all the active VFs (i.e., the number of VFs specified by the
      mlx4_core module parameter "num_vfs"). This 50% represents the
      "guaranteed minimum" pool.
      The other 50% is the "free pool", allocated on a first-come-first-serve basis.
      For each VF/PF, resources are first allocated from its "guaranteed-minimum"
      pool. When that pool is exhausted, the driver attempts to allocate from
      the resource "free-pool".
      
      The quota (i.e., max) for the VFs and the PF is:
        The free-pool amount (50% of the real max) + the guaranteed minimum
      
      For MACs:
        Guarantee 2 MACs per VF/PF per port. As a result, since we have only
        128 MACs per port, reduce the allowable number of VFs from 64 to 63.
        Any remaining MACs are put into a free pool.
      
      For VLANs:
        For the PF, the per-port quota is 128 and the guarantee is 64
           (to allow the PF to register at least one VLAN per VF in VST mode).
        For the VFs, the per-port quota is 64 and the guarantee is 0.
            We assume that VGT VFs are trusted not to abuse the VLAN resource.
      
      For Counters:
        For all functions (PF and VFs), the quota is 128 and the guarantee is 0.
      
      In this patch, we define the needed structures, which are added to the
      resource-tracker struct.  In addition, we initialize the resource quotas
      and adjust the query_device response to report quotas rather than
      resource maxima.
      
      As part of the implementation, we introduce a new field in
      mlx4_dev: quotas.  This field holds the resource quotas used
      to report maxima to the upper layers (ib_core, via query_device).
      
      The HCA maxima of these values are passed to the VFs (via QUERY_HCA) so
      that they can continue to use them when handling QPs, CQs, SRQs, and
      MPTs.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a0d0a61
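      The 50/50 arithmetic above can be summarized in a small standalone C
      sketch (illustrative only, not the driver's code; the struct and
      function names are invented):

          #include <stdio.h>

          struct quota {
                  int guaranteed;   /* per-function starvation-prevention minimum */
                  int quota;        /* per-function hard maximum */
                  int free_pool;    /* shared, first-come-first-serve */
          };

          /* QP/CQ/SRQ/MPT/MTT case: 50% split equally among all functions,
           * the rest is the free pool, and quota = free pool + guaranteed. */
          static struct quota compute_quota(int real_max, int num_vfs)
          {
                  struct quota q;
                  int num_funcs = num_vfs + 1;    /* all active VFs plus the PF */

                  q.guaranteed = (real_max / 2) / num_funcs;
                  q.free_pool  = real_max - q.guaranteed * num_funcs;
                  q.quota      = q.free_pool + q.guaranteed;
                  return q;
          }

          int main(void)
          {
                  /* e.g., 128K QPs on the HCA with num_vfs=7 */
                  struct quota q = compute_quota(1 << 17, 7);

                  printf("guaranteed=%d free_pool=%d quota=%d\n",
                         q.guaranteed, q.free_pool, q.quota);
                  return 0;
          }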
  7. 18 Oct, 2013 (1 commit)
  8. 02 Aug, 2013 (1 commit)
  9. 29 Jul, 2013 (1 commit)
  10. 26 Jun, 2013 (2 commits)
  11. 24 Jun, 2013 (1 commit)
  12. 18 Jun, 2013 (1 commit)
  13. 05 Jun, 2013 (1 commit)
  14. 25 Apr, 2013 (2 commits)
  15. 08 Mar, 2013 (1 commit)
  16. 26 Feb, 2013 (1 commit)
  17. 08 Feb, 2013 (1 commit)
  18. 06 Feb, 2013 (1 commit)
  19. 05 Feb, 2013 (1 commit)
  20. 01 Feb, 2013 (1 commit)
  21. 28 Jan, 2013 (1 commit)
  22. 19 Jan, 2013 (1 commit)
  23. 20 Dec, 2012 (2 commits)
  24. 08 Dec, 2012 (1 commit)
  25. 04 Dec, 2012 (1 commit)
  26. 27 Nov, 2012 (1 commit)
    • mlx4: 64-byte CQE/EQE support · 08ff3235
      Committed by Or Gerlitz
      ConnectX-3 devices can use either 64- or 32-byte completion queue
      entries (CQEs) and event queue entries (EQEs).  Using 64-byte
      EQEs/CQEs performs better because each entry is aligned to a complete
      cacheline.  This patch queries the HCA's capabilities, and if it
      supports 64-byte CQEs and EQEs, the driver configures the HW to
      work in 64-byte mode.
      
      The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.
      
      Since this mode is global, userspace (libmlx4) must be updated to work
      with the configured CQE size, and guests using SR-IOV virtual
      functions need to know both EQE and CQE size.
      
      In case one of the 64-byte CQE/EQE capabilities is activated, the
      patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
      command (e.g., as done in mlx4_core of Linux 3.3..3.6) will notice that
      they need an update to be able to work with the PPF. This is done by
      changing the returned pf_context_behaviour so that it is no longer zero.
      In case none of these capabilities is activated, that value remains zero
      and older guest drivers can run OK.
      
      The SR-IOV-related flow is as follows:
      
      1. the PPF does the detection of the new capabilities using
         QUERY_DEV_CAP command.
      
      2. the PPF activates the new capabilities using INIT_HCA.
      
      3. the VF detects if the PPF activated the capabilities using
         QUERY_HCA, and if this is the case activates them for itself too.
      
      Note that the VF detects that it must be aware of the new PF behaviour
      using QUERY_FUNC_CAP.  Steps 1 and 2 also apply in native mode.
      
      User space notification is done through a new field introduced in
      struct mlx4_ib_ucontext which holds device capabilities for which user
      space must take action. This changes the binary interface, so the ABI
      towards libmlx4 exposed through uverbs is bumped from 3 to 4, but only
      when **needed**, i.e., only when the driver actually uses 64-byte CQEs
      or future device capabilities with which user space must stay in sync.
      This practice allows unmodified libmlx4 to keep working on older
      devices (e.g., A0, B0) which don't support 64-byte CQEs.
      
      In order to keep existing systems functional when they update to a
      newer kernel that contains these changes in VF and userspace ABI, a
      module parameter enable_64b_cqe_eqe must be set to enable 64-byte
      mode; the default is currently false.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      08ff3235
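      A condensed, kernel-style C sketch of the gating logic described above
      (the flag names and bit positions here are illustrative assumptions,
      not necessarily the driver's exact definitions):

          #include <stdbool.h>
          #include <stdint.h>

          #define DEV_CAP_FLAG_64B_EQE     (UINT64_C(1) << 61)  /* assumed bit */
          #define DEV_CAP_FLAG_64B_CQE     (UINT64_C(1) << 62)  /* assumed bit */
          #define PF_CONTEXT_BEHAVIOUR_64B (UINT64_C(1) << 0)   /* non-zero!   */

          static bool enable_64b_cqe_eqe;   /* module parameter, default false */

          /* PPF side: decide at INIT_HCA time whether to enable 64-byte mode.
           * Returning a non-zero pf_context_behaviour makes old (3.3..3.6)
           * guest drivers notice they need an update instead of silently
           * misparsing 64-byte entries. */
          static uint64_t choose_pf_context_behaviour(uint64_t dev_cap_flags)
          {
                  if (enable_64b_cqe_eqe &&
                      (dev_cap_flags & DEV_CAP_FLAG_64B_EQE) &&
                      (dev_cap_flags & DEV_CAP_FLAG_64B_CQE))
                          return PF_CONTEXT_BEHAVIOUR_64B;

                  return 0;  /* legacy 32-byte mode; old guests run unmodified */
          }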
  27. 24 Oct, 2012 (1 commit)
  28. 01 Oct, 2012 (9 commits)
    • mlx4_core: Disable SENSE_PORT for multifunction devices · aadf4f3f
      Committed by Roland Dreier
      In the current driver, the SENSE_PORT firmware command is issued as a
      "wrapped" command, but the command handling code doesn't have a
      wrapper, so it will never do anything other than log an error message.
      The latest ConnectX-3 2.11.500 firmware reports the SENSE_PORT
      capability even in multi-function (SR-IOV) mode, so the driver will
      try to issue the command.
      
      At least until the driver has a proper wrapper for SENSE_PORT, make
      sure we disable the command for multi-function devices.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      aadf4f3f
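      A minimal sketch of the interim workaround, assuming the capability bit
      is simply cleared after QUERY_DEV_CAP (the helper name is an invention
      here; the mlx4 identifiers follow the driver's style):

          /* After QUERY_DEV_CAP: never advertise SENSE_PORT on a
           * multi-function device, so the wrapped command -- which has no
           * wrapper handler yet -- is never issued. */
          static void maybe_disable_sense_port(struct mlx4_dev *dev,
                                               struct mlx4_dev_cap *dev_cap)
          {
                  if (mlx4_is_mfunc(dev))
                          dev_cap->flags &= ~MLX4_DEV_CAP_FLAG_SENSE_SUPPORT;
          }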
    • mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs · ca3e57a5
      Committed by Roland Dreier
      Instead of having a hard-coded "PCI device ID != 0x1003" check (which
      obviously breaks as newer devices with IDs other than 0x1003 become
      available), set a flag in our PCI device table for the older devices
      where we're supposed to force using SENSE_PORT.  This also avoids
      enabling SENSE_PORT for virtual functions by mistake.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      ca3e57a5
    • mlx4_core: Stash PCI ID driver_data in mlx4_priv structure · 839f1243
      Committed by Roland Dreier
      That way we can check flags later on, when we've finished with the
      pci_device_id structure.  Also convert the "is VF" flag to an enum:
      "Never do in the preprocessor what can be done in C."
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      839f1243
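      Taken together, the two SENSE_PORT cleanups above amount to something
      like the following kernel-style sketch (the device IDs, enum values,
      and __mlx4_init_one helper are illustrative assumptions; only the
      driver_data mechanism itself is standard PCI practice):

          enum {
                  MLX4_PCI_DEV_IS_VF              = 1 << 0,
                  MLX4_PCI_DEV_FORCE_SENSE_PORT   = 1 << 1,
          };

          static const struct pci_device_id mlx4_pci_table[] = {
                  /* older ConnectX-1/-2 device: force SENSE_PORT on */
                  { PCI_VDEVICE(MELLANOX, 0x6340), MLX4_PCI_DEV_FORCE_SENSE_PORT },
                  /* ConnectX-3: no flags; trust the reported capability */
                  { PCI_VDEVICE(MELLANOX, 0x1003), 0 },
                  /* ConnectX-3 virtual function: never sense ports */
                  { PCI_VDEVICE(MELLANOX, 0x1004), MLX4_PCI_DEV_IS_VF },
                  { 0, }
          };

          static int mlx4_init_one(struct pci_dev *pdev,
                                   const struct pci_device_id *id)
          {
                  /* Stash id->driver_data in driver-private state so later
                   * code (e.g. the SENSE_PORT decision) no longer needs the
                   * pci_device_id structure. */
                  return __mlx4_init_one(pdev, id->driver_data);
          }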
    • mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem · f3d4c89e
      Committed by Roland Dreier
      On an SR-IOV master device, __mlx4_init_one() calls mlx4_init_hca()
      before mlx4_multi_func_init().  However, for unlucky configurations,
      mlx4_init_hca() might call mlx4_SENSE_PORT() (via mlx4_dev_cap()), and
      that calls mlx4_cmd_imm() with MLX4_CMD_WRAPPED set.
      
      On a multifunction device with MLX4_CMD_WRAPPED, __mlx4_cmd() calls
      into mlx4_slave_cmd(), which immediately tries to do
      
      	down(&priv->cmd.slave_sem);
      
      but priv->cmd.slave_sem isn't initialized until mlx4_multi_func_init()
      (which we haven't called yet).  The next thing it tries to do is access
      priv->mfunc.vhcr, but that hasn't been allocated yet.
      
      Fix this by moving the initialization of slave_sem and vhcr up into
      mlx4_cmd_init(). Also, since slave_sem is really just being used as a
      mutex, convert it into a slave_cmd_mutex.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      f3d4c89e
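      The shape of the fix, condensed into a kernel-style sketch (the struct
      layout is abridged and the function body is illustrative; only the
      initialization-ordering point matters):

          /* Was: struct semaphore slave_sem, initialized only in
           * mlx4_multi_func_init().  Now: a mutex, initialized in
           * mlx4_cmd_init(), which runs before any command can be issued. */
          int mlx4_cmd_init(struct mlx4_dev *dev)
          {
                  struct mlx4_priv *priv = mlx4_priv(dev);

                  mutex_init(&priv->cmd.slave_cmd_mutex);
                  /* ... the vhcr buffer is likewise allocated here, so an
                   * early wrapped command finds it in place ... */
                  return 0;
          }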
    • mlx4_core: Trivial cleanups to driver log messages · 84b1f153
      Committed by Roland Dreier
      Also, put the format strings onto one line.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      84b1f153
    • mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations · 47605df9
      Committed by Jack Morgenstein
      Previously, the structure of a guest's proxy QPs followed the
      structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
      qp1 port 2, ...).  The guest then did offset calculations on the
      sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
      
      This is now changed so that the guest does no offset calculations to
      determine which proxy or tunnel QPs to use.  This change frees the PPF
      from needing to adhere to a specific order when allocating proxy and
      tunnel QPs.
      
      Now QUERY_FUNC_CAP provides each port individually with its proxy
      qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
      used directly where required (with no offset calculations).
      
      To accomplish this change, several fields were added to the phys_caps
      structure for use by the PPF and by non-SR-IOV mode:
      
          base_sqpn -- in non-SRIOV mode, this was formerly sqp_start.
          base_proxy_sqpn -- the first physical proxy QP number (used by the PPF).
          base_tunnel_sqpn -- the first physical tunnel QP number (used by the PPF).
      
      The current code in the PPF still adheres to the previous layout of
      sqps, proxy-sqps and tunnel-sqps.  However, the PPF can change this
      layout without affecting VF or (paravirtualized) PF code.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      47605df9
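      A sketch of the new bookkeeping (the three field names come from the
      commit text; the surrounding struct is abridged, and the per-port guest
      view is a hypothetical illustration):

          #include <linux/types.h>

          struct mlx4_phys_caps {
                  u32 base_sqpn;        /* non-SRIOV: formerly sqp_start       */
                  u32 base_proxy_sqpn;  /* first physical proxy QP (PPF only)  */
                  u32 base_tunnel_sqpn; /* first physical tunnel QP (PPF only) */
          };

          /* What a guest now receives per port from QUERY_FUNC_CAP and uses
           * directly, with no offset arithmetic: */
          struct guest_port_sqp_info {
                  u32 proxy_qp0, proxy_qp1;
                  u32 tunnel_qp0, tunnel_qp1;
          };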
    • mlx4: Paravirtualize Node Guids for slaves · afa8fd1d
      Committed by Jack Morgenstein
      This is necessary in order to support more than one VF per PF in a VM
      for software that uses the node GUID as a discriminator, such as
      librdmacm.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      afa8fd1d
    • mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop · 54679e14
      Committed by Jack Morgenstein
      This requires:
      
      1. Replacing the paravirtualized P_Key index (inserted by the guest)
         with the real P_Key index.
      
      2. For UD QPs, placing the guest's true source GID index in the
         address path structure mgid field, and setting the ud_force_mgid
         bit so that the mgid is taken from the QP context and not from the
         WQE when posting sends.
      
      3. For UC and RC QPs, placing the guest's true source GID index in the
         address path structure mgid field.
      
      4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
         proxy/tunnel pair.
      
      Since not all the above adjustments occur in all the QP transitions,
      the QP transitions require separate wrapper functions.
      
      Second, initialize the P_Key virtualization tables to their default
      values: the master's virtualized table is 1-1 with the real P_Key
      table, while a guest's virtualized table has P_Key index 0 mapped to
      real P_Key index 0, and all other P_Key indices mapped to the reserved
      (invalid) P_Key at index 127.
      
      Finally, add logic in smp_snoop for maintaining the phys_pkey_cache
      and generating events on the master only if a P_Key actually changed.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      54679e14
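      The default P_Key table mapping described above, as a standalone C
      sketch (the table length matches the text; the array and macro names
      are illustrative):

          #include <stdio.h>

          #define PKEY_TABLE_LEN   128
          #define INVALID_PKEY_IX  127    /* reserved (invalid) P_Key slot */

          static int master_virt2phys[PKEY_TABLE_LEN];
          static int guest_virt2phys[PKEY_TABLE_LEN];

          static void init_pkey_virt_tables(void)
          {
                  for (int i = 0; i < PKEY_TABLE_LEN; i++) {
                          master_virt2phys[i] = i;            /* 1-1 mapping */
                          guest_virt2phys[i]  = INVALID_PKEY_IX;
                  }
                  guest_virt2phys[0] = 0;    /* guests start with only index 0 */
          }

          int main(void)
          {
                  init_pkey_virt_tables();
                  printf("guest ix 0 -> %d, guest ix 5 -> %d\n",
                         guest_virt2phys[0], guest_virt2phys[5]);
                  return 0;
          }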
    • mlx4_core: Add proxy and tunnel QPs to the reserved QP area · e2c76824
      Committed by Jack Morgenstein
      In addition, pass the proxy and tunnel QP numbers to slaves so the
      driver can perform special QP paravirtualization.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      e2c76824