提交 · b1b6b4da7867d220f0da5f6686b869b304c5459b · openeuler / Kernel

20 9月, 2014 1 次提交

net/mlx4_en: Add mlx4_en_get_cqe helper · b1b6b4da

由 Ido Shamay 提交于 9月 18, 2014

This function derives the base address of the CQE from the CQE size,
and calculates the real CQE context segment in it from the factor
(this is like before). Before this change the code used the factor to
calculate the base address of the CQE as well.

The factor indicates in which segment of the cqe stride the cqe information
is located. For 32-byte strides, the segment is 0, and for 64 byte strides,
the segment is 1 (bytes 32..63). Using the factor was ok as long as we had
only 32 and 64 byte strides. However, with larger strides, the factor is zero,
and so cannot be used to calculate the base of the CQE.

The helper uses the same method of CQE buffer pulling made by all other
components that reads the CQE buffer (mlx4_ib driver and libmlx4).
Signed-off-by: NIdo Shamay <idos@mellanox.com>
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1b6b4da

04 9月, 2014 1 次提交

mlx4_en: Convert the normal skb free path to dev_consume_skb_any() · b89df95d

由 Rick Jones 提交于 9月 03, 2014

It would appear the mlx4_en driver was still making a call to
dev_kfree_skb_any() where dev_consume_skb_any() would be more
appropriate.  This should make dropped packet profiling/tracking
easier/better over a NIC driven by mlx4_en.
Signed-off-by: NRick Jones <rick.jones2@hp.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b89df95d

23 7月, 2014 1 次提交

net/mlx4_en: Disable blueflame using ethtool private flags · 0fef9d03

由 Amir Vadai 提交于 7月 22, 2014

Enable the user to turn off the hardware feature called BlueFlame.
Since it is something specific to mlx4_en hardware, we control
the feature via ethtool private flags.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0fef9d03

09 7月, 2014 1 次提交

net/mlx4_en: Ignore budget on TX napi polling · fbc6daf1

由 Amir Vadai 提交于 7月 08, 2014

It is recommended that TX work not count against the quota.
The cost of TX packet liberation is a minute percentage of what it costs to
process an RX frame. Furthermore, that SKB freeing makes memory available for
other paths in the stack.

Give the TX a larger budget and be more aggressive about cleaning up the Tx
descriptors this budget could be changed using ethtool:
$ ethtool -C eth1 tx-frames-irq <budget>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbc6daf1

03 7月, 2014 1 次提交

net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map · 35f6f453

由 Amir Vadai 提交于 6月 29, 2014

IRQ affinity notifier can only have a single notifier - cpu_rmap
notifier. Can't use it to track changes in IRQ affinity map.
Detect IRQ affinity changes by comparing CPU to current IRQ affinity map
during NAPI poll thread.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ben Hutchings <ben@decadent.org.uk>
Fixes: 2eacc23c ("net/mlx4_core: Enforce irq affinity changes immediatly")
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35f6f453

03 6月, 2014 1 次提交

IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO · 40f2287b

由 Jiri Kosina 提交于 5月 11, 2014

Modify the various routines used to allocate memory resources which
serve QPs in mlx4 to get an input GFP directive.  Have the Ethernet
driver to use GFP_KERNEL in it's QP allocations as done prior to this
commit, and the IB driver to use GFP_NOIO when the IB verbs
IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided.
Signed-off-by: NMel Gorman <mgorman@suse.de>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

40f2287b

15 5月, 2014 1 次提交

net/mlx4_core: Enforce irq affinity changes immediatly · 2eacc23c

由 Yuval Atias 提交于 5月 14, 2014

During heavy traffic, napi is constatntly polling the complition queue
and no interrupt is fired. Because of that, changes to irq affinity are
ignored until traffic is stopped and resumed.

By registering to the irq notifier mechanism, and forcing interrupt when
affinity is changed, irq affinity changes will be immediatly enforced.
Signed-off-by: NYuval Atias <yuvala@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2eacc23c

09 5月, 2014 1 次提交

mellanox: Logging message cleanups · 1a91de28

由 Joe Perches 提交于 5月 07, 2014

Use a more current logging style.

o Coalesce formats
o Add missing spaces for coalesced formats
o Align arguments for modified formats
o Add missing newlines for some logging messages
o Use DRV_NAME as part of format instead of %s, DRV_NAME to
  reduce overall text.
o Use ..., ##__VA_ARGS__ instead of args... in macros
o Correct a few format typos
o Use a single line message where appropriate
Signed-off-by: NJoe Perches <joe@perches.com>
Acked-By: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a91de28

13 3月, 2014 1 次提交

mlx4: Call dev_kfree_skby_any instead of dev_kfree_skb. · e81f44b6

由 Eric W. Biederman 提交于 3月 11, 2014

Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e81f44b6

03 3月, 2014 4 次提交

net/mlx4_en: Use union for BlueFlame WQE · ec570940

由 Amir Vadai 提交于 3月 02, 2014

When BlueFlame is turned on, control segment of the TX WQE is changed,
and the second line of it is used for QPN.
Changed code to use a union in the mlx4_wqe_ctrl_seg instead of casting.
This makes the code clearer and solves the static checker warning:

drivers/net/ethernet/mellanox/mlx4/en_tx.c:839 mlx4_en_xmit()
	warn: potential memory corrupting cast 4 vs 2 bytes

CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec570940

net/mlx4_en: Move queue stopped/waked counters to be per ring · 15bffdff

由 Eugenia Emantayev 提交于 3月 02, 2014

Give accurate counters and avoids cache misses when several rings
update the counters of stop/wake queue.
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

15bffdff

net/mlx4_en: Pad ethernet packets smaller than 17 bytes · 93591aaa

由 Eugenia Emantayev 提交于 3月 02, 2014

Hardware can't accept packets smaller than 17 bytes. Therefore need to
pad with zeros.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93591aaa

net/mlx4_en: Verify mlx4_en module parameters · b97b33a3

由 Eugenia Emantayev 提交于 3月 02, 2014

Verify mlx4_en module parameters.
In case they are out of range - reset to default values.
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b97b33a3

17 2月, 2014 1 次提交

netdevice: add queue selection fallback handler for ndo_select_queue · 99932d4f

由 Daniel Borkmann 提交于 2月 16, 2014

Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).

This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.
Suggested-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99932d4f

11 1月, 2014 1 次提交

net: core: explicitly select a txq before doing l2 forwarding · f663dd9a

由 Jason Wang 提交于 1月 10, 2014

Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:

- NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
  instead of lower device which misses the necessary txq synchronization for
  lower device such as txq stopping or frozen required by dev watchdog or
  control path.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
  watchdog.
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
  when tso is disabled for lower device.

Fix this by explicitly introducing a new param for .ndo_select_queue() for just
selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
forwarding transmission.

With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
dev_queue_xmit() to do the transmission.

In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f663dd9a

01 1月, 2014 1 次提交

net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling · 837052d0

由 Or Gerlitz 提交于 12月 23, 2013

When the device tunneling offloads mode is vxlan do the following

 - call SET_PORT with the relevant setting

 - add DMFS steering vxlan rule for the device self and multicast mac addresses
   of the form: {<ETH, outer-mac> <VXLAN, ANY vnid> <ETH, ANY mac>} --> RSS QP

 - set relevant QPC fields in RSS context and RX ring QPs

 - in TX flow, set WQE fields to generate HW checksum, and handle gso skbs
   which are marked for encapsulation such that the HW will segment them properly.

 - in RX flow, read HW offloaded checksum for encapsulated packets from the CQE

 - advertize hw_enc_features and NETIF_F_GSO_UDP_TUNNEL to the networking stack
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

837052d0

20 12月, 2013 2 次提交

net/mlx4_en: Add NAPI support for transmit side · 0276a330

由 Eugenia Emantayev 提交于 12月 19, 2013

Add NAPI for TX side,
implement its support and provide NAPI callback.
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0276a330

net/mlx4_en: Configure the XPS queue mapping on driver load · d03a68f8

由 Ido Shamay 提交于 12月 19, 2013

Only TX rings of User Piority 0 are mapped.
TX rings of other UP's are using UP 0 mapping.
XPS is not in use when num_tc is set.
Signed-off-by: NIdo Shamay <idos@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d03a68f8

08 11月, 2013 2 次提交

net/mlx4_en: Datapath structures are allocated per NUMA node · 163561a4

由 Eugenia Emantayev 提交于 11月 07, 2013

For each RX/TX ring and its CQ, allocation is done on a NUMA node that
corresponds to the core that the data structure should operate on.
The assumption is that the core number is reflected by the ring index.
The affected allocations are the ring/CQ data structures,
the TX/RX info and the shared HW/SW buffer.
For TX rings, each core has rings of all UPs.
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: NHadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

163561a4

net/mlx4_en: Datapath resources allocated dynamically · 41d942d5

由 Eugenia Emantayev 提交于 11月 07, 2013

Currently all TX/RX rings and completion queues are part of the
netdev priv structure and are allocated statically. This patch
will change the priv to hold only arrays of pointers and therefore
all TX/RX rings and completetion queues will be allocated
dynamically. This is in preparation for NUMA aware allocations.
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: NHadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

41d942d5

22 8月, 2013 3 次提交

net/mlx4_en: Reduce scope of local variables in mlx4_en_xmit · 5f1cd200

由 Amir Vadai 提交于 8月 21, 2013

Some variables could have their scope reduced.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f1cd200

net/mlx4_en: Fix handling of dma_map failure · 237a3a3b

由 Amir Vadai 提交于 8月 21, 2013

Result of skb_frag_dma_map() and dma_map_single() wasn't checked.
Added a check and proper handling in case of failure.
Moved the mapping to the beginning of mlx4_en_xmit(), before updating
the ring data structure to make error handling easier.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

237a3a3b

net/mlx4_en: Notify user when TX ring in error state · bd2f631d

由 Amir Vadai 提交于 8月 21, 2013

When hardware gets into error state, must notify user about it.
When QP in error state no traffic will be tx'ed from the attached
tx_ring.

Driver should know how to recover from this unexpected state. I will send later
on the recovery flow, but having the print shouldn't be delayed.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd2f631d

29 7月, 2013 1 次提交

net/mlx4_en: Fix BlueFlame race · 2d4b6466

由 Eugenia Emantayev 提交于 7月 25, 2013

Fix a race between BlueFlame flow and stamping in post send flow.
Example:
	SW: Build WQE 0 on the TX buffer, except the ownership bit
	SW: Set ownership for WQE 0 on the TX buffer
	SW: Ring doorbell for WQE 0
	SW: Build WQE 1 on the TX buffer, except the ownership bit
	SW: Set ownership for WQE 1 on the TX buffer
	HW: Read WQE 0 and then WQE 1, before doorbell was rung/BF was done for WQE 1
	HW: Produce CQEs for WQE 0 and WQE 1
	SW: Process the CQEs, and stamp WQE 0 and WQE 1 accordingly (on the TX buffer)
	SW: Copy WQE 1 from the TX buffer to the BF register - ALREADY STAMPED!
	HW: CQE error with index 0xFFFF  - the BF WQE's control segment is STAMPED,
		so the BF index is 0xFFFF. Error: Invalid Opcode.
As a result QP enters the error state and no traffic can be sent.

Solution:
When stamping - do not stamp last completed wqe.
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d4b6466

05 6月, 2013 1 次提交

mlx4: use __netdev_pick_tx instead of __skb_tx_hash in mlx4_en_select_queue · c08355fb

由 govindarajulu.v 提交于 6月 03, 2013

mlx4_en_select_queue() uses __skb_tx_hash to select the transmit queue.
XPS settings are ignored by this. Instead, we can use __netdev_pick_tx
to select the transmit queue.

Compile test only.
Signed-off-by: Ngovindarajulu.v <govindarajulu90@gmail.com>
Acked-By: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c08355fb

25 4月, 2013 2 次提交

net/mlx4_en: Support software timestamping · eb0cabbd

由 Amir Vadai 提交于 4月 23, 2013

Kernel software timestamping requires that the driver calls skb_tx_timestamp
just before passing the skb to the HW MAC layer. This patch adds this call.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb0cabbd

net/mlx4_en: Add HW timestamping (TS) support · ec693d47

由 Amir Vadai 提交于 4月 23, 2013

The patch allows to enable/disable HW timestamping for incoming and/or
outgoing packets. It adds and initializes all structs and callbacks
needed by kernel TS API.
To enable/disable HW timestamping appropriate ioctl should be used.
Currently HWTSTAMP_FILTER_ALL/NONE and HWTSAMP_TX_ON/OFF only are
supported.
When enabling TS on receive flow - VLAN stripping will be disabled.
Also were made all relevant changes in RX/TX flows to consider TS request
and plant HW timestamps into relevant structures.
mlx4_ib was fixed to compile with new mlx4_cq_alloc() signature.
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec693d47

08 2月, 2013 2 次提交

mlx4_en: Fix BQL reset TX queue call point · 41b74920

由 Tom Herbert 提交于 2月 06, 2013

Fix issue in Mellanox driver related to BQL.  netdev_tx_reset_queue
was not being called in certain situations where the device was
being start and stopped.  Moved netdev_tx_reset_queue from the reset
device path to mlx4_en_free_tx_buf which is where the rings are
cleaned in a reset (specifically from device being stopped).
Signed-off-by: NTom Herbert <therbert@google.com>
Acked-By: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

41b74920

net/mlx4_en: Optimize loopback related checks in data path · 79aeaccd

由 Yan Burman 提交于 2月 07, 2013

Currently there are relatively complex conditional checks in the fast path,
for TX loopback enabling and resulting RX filter logic.
Move elaborate if's out of data path, replace them with a single flag
for each state and update that state from appropriate places.
Also, in native (non SRIOV) mode and not in loopback or in selftest,
there is no need to try and filter out packets that HW loopback-ed,
as in native mode we do not loopback packets anymore.
Signed-off-by: NYan Burman <yanb@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79aeaccd

28 1月, 2013 1 次提交

net/mlx4_en: Fix a race when closing TX queue · 72259225

由 Amir Vadai 提交于 1月 24, 2013

There is a possible race where the TX completion handler can clean the
entire TX queue between the decision that the queue is full and actually
closing it. To avoid this situation, check again if the queue is really
full, if not, reopen the transmit and continue with sending the packet.

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72259225

21 1月, 2013 1 次提交

net/mlx4_en: remove redundant code · 546bfedb

由 Eric Dumazet 提交于 1月 17, 2013

remove redundant code from build_inline_wqe()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-By: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

546bfedb

19 1月, 2013 1 次提交

net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode · 213815a1

由 Yan Burman 提交于 1月 17, 2013

Commit 5b4c4d36 "mlx4_en: Allow communication between functions on
same host" introduced a regression under which a bridge acting as vSwitch
whose uplink is an mlx4 Ethernet device become non-operative in native
(non sriov) mode. This happens since broadcast ARP requests sent by VMs
were loopback-ed by the HW and hence the bridge learned VM source MACs
on both the VM and the uplink ports.

The fix is to place the DMAC in the send WQE only under SRIOV/eSwitch
configuration or when the device is in selftest.
Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NYan Burman <yanb@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

213815a1

03 12月, 2012 1 次提交

net/mlx4_en: Set number of rx/tx channels using ethtool · d317966b

由 Amir Vadai 提交于 12月 02, 2012

Add support to changing number of rx/tx channels using
ethtool ('ethtool -[lL]'). Where the number of tx channels specified in ethtool
is the number of rings per user priority - not total number of tx rings.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d317966b

27 11月, 2012 1 次提交

mlx4: 64-byte CQE/EQE support · 08ff3235

由 Or Gerlitz 提交于 10月 21, 2012

ConnectX-3 devices can use either 64- or 32-byte completion queue
entries (CQEs) and event queue entries (EQEs).  Using 64-byte
EQEs/CQEs performs better because each entry is aligned to a complete
cacheline.  This patch queries the HCA's capabilities, and if it
supports 64-byte CQEs and EQES the driver will configure the HW to
work in 64-byte mode.

The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.

Since this mode is global, userspace (libmlx4) must be updated to work
with the configured CQE size, and guests using SR-IOV virtual
functions need to know both EQE and CQE size.

In case one of the 64-byte CQE/EQE capabilities is activated, the
patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
they need an update to be able to work with the PPF. This is done by
changing the returned pf_context_behaviour not to be zero any more. In
case none of these capabilities is activated that value remains zero
and older guest drivers can run OK.

The SRIOV related flow is as follows

1. the PPF does the detection of the new capabilities using
   QUERY_DEV_CAP command.

2. the PPF activates the new capabilities using INIT_HCA.

3. the VF detects if the PPF activated the capabilities using
   QUERY_HCA, and if this is the case activates them for itself too.

Note that the VF detects that it must be aware to the new PF behaviour
using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.

User space notification is done through a new field introduced in
struct mlx4_ib_ucontext which holds device capabilities for which user
space must take action. This changes the binary interface so the ABI
towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
when **needed** i.e. only when the driver does use 64-byte CQEs or
future device capabilities which must be in sync by user space. This
practice allows to work with unmodified libmlx4 on older devices (e.g
A0, B0) which don't support 64-byte CQEs.

In order to keep existing systems functional when they update to a
newer kernel that contains these changes in VF and userspace ABI, a
module parameter enable_64b_cqe_eqe must be set to enable 64-byte
mode; the default is currently false.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

08ff3235

26 10月, 2012 2 次提交

net/mlx4_en: Don't use vlan tag value as an indication for vlan presence · 2b39a061

由 Moni Shoua 提交于 10月 25, 2012

The vlan tag can be zero. This is why it can't serve as an indication
that packet requires VLAN header in the TX flow.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2b39a061

net/mlx4_en: Fix double-release-range in tx-rings · 7208ca30

由 Jack Morgenstein 提交于 10月 25, 2012

The QP range is reserved as a single block. However, when freeing the
en resources, the tx-ring QPs are released both in mlx4_en_destroy_tx_ring
(one at a time) and in mlx4_en_free_resources (as a block release).

Fix by eliminating the one-at-a-time release in mlx4_en_destroy_tx_ring.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7208ca30

02 10月, 2012 1 次提交

mlx4: dont orphan skbs in mlx4_en_xmit() · 8112ec3b

由 Eric Dumazet 提交于 9月 28, 2012

After commit e22979d9 (mlx4_en: Moving to Interrupts for TX
completions) we no longer need to orphan skbs in mlx4_en_xmit()
since skb wont stay a long time in TX ring before their release.

Orphaning skbs in ndo_start_xmit() should be avoided as much as
possible, since it breaks TCP Small Queue or other flow control
mechanisms (per socket limits)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NYevgeny Petrilin <yevgenyp@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8112ec3b

04 8月, 2012 1 次提交

net/mlx4_en: Fixing TX queue stop/wake flow · c18520bd

由 Yevgeny Petrilin 提交于 8月 03, 2012

Removing the ring->blocked flag, it is redundant and leads to a race:

We close the TX queue and then set the "blocked" flag.
Between those 2 operations the completion function can check the "blocked"
flag, sees that it is 0, and wouldn't open the TX queue.

Using netif_tx_queue_stopped to check the state of the queue to avoid this race.
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c18520bd

18 5月, 2012 1 次提交

net/mlx4_en: num cores tx rings for every UP · bc6a4744

由 Amir Vadai 提交于 5月 17, 2012

Change the TX ring scheme such that the number of rings for untagged packets
and for tagged packets (per each of the vlan priorities) is the same, unlike
the current situation where for tagged traffic there's one ring per priority
and for untagged rings as the number of core.

Queue selection is done as follows:

If the mqprio qdisc is operates on the interface, such that the core networking
code invoked the device setup_tc ndo callback, a mapping of skb->priority =>
queue set is forced - for both, tagged and untagged traffic.

Else, the egress map skb->priority => User priority is used for tagged traffic, and
all untagged traffic is sent through tx rings of UP 0.

The patch follows the convergence of discussing that issue with John Fastabend
over this thread http://comments.gmane.org/gmane.linux.network/229877

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Liran Liss <liranl@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc6a4744

24 4月, 2012 1 次提交

mlx4_en: Byte Queue Limit support · 5b263f53

由 Yevgeny Petrilin 提交于 4月 23, 2012

Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5b263f53

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功