提交 · a1991c749a7bfacb766a056aff11864216670dd6 · openanolis / cloud-kernel

25 2月, 2014 1 次提交

net/mlx4: Set number of RX rings in a utility function · 02512482

由 Ido Shamay 提交于 2月 21, 2014

mlx4_en_add() is too long.
Moving set number of RX rings to a utiltity function to improve
readability and modulization of the code.
Signed-off-by: NIdo Shamay <idos@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02512482

17 2月, 2014 1 次提交

netdevice: add queue selection fallback handler for ndo_select_queue · 99932d4f

由 Daniel Borkmann 提交于 2月 16, 2014

Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).

This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.
Suggested-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99932d4f

14 1月, 2014 1 次提交

net/mlx4_en: call gro handler for encapsulated frames · e6a76758

由 Eric Dumazet 提交于 1月 09, 2014

In order to use the native GRO handling of encapsulated protocols on
mlx4, we need to call napi_gro_receive() instead of netif_receive_skb()
unless busy polling is in action.

While we are at it, rename mlx4_en_cq_ll_polling() to
mlx4_en_cq_busy_polling()

Tested with GRE tunnel : GRO aggregation is now performed on the
ethernet device instead of being done later on gre device.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Acked-By: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6a76758

11 1月, 2014 1 次提交

net: core: explicitly select a txq before doing l2 forwarding · f663dd9a

由 Jason Wang 提交于 1月 10, 2014

Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:

- NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
  instead of lower device which misses the necessary txq synchronization for
  lower device such as txq stopping or frozen required by dev watchdog or
  control path.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
  watchdog.
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
  when tso is disabled for lower device.

Fix this by explicitly introducing a new param for .ndo_select_queue() for just
selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
forwarding transmission.

With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
dev_queue_xmit() to do the transmission.

In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f663dd9a

02 1月, 2014 1 次提交

mlx4_en: Add PTP hardware clock · ad7d4eae

由 Shawn Bohrer 提交于 12月 31, 2013

This adds a PHC to the mlx4_en driver. We use reader/writer spinlocks to
protect the timecounter since every packet received needs to call
timecounter_cycle2time() when timestamping is enabled.  This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.

This driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-3 card.
Signed-off-by: NShawn Bohrer <sbohrer@rgmadvisors.com>
Acked-By: NHadar Hen Zion <hadarh@mellanox.com>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad7d4eae

01 1月, 2014 1 次提交

net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling · 837052d0

由 Or Gerlitz 提交于 12月 23, 2013

When the device tunneling offloads mode is vxlan do the following

 - call SET_PORT with the relevant setting

 - add DMFS steering vxlan rule for the device self and multicast mac addresses
   of the form: {<ETH, outer-mac> <VXLAN, ANY vnid> <ETH, ANY mac>} --> RSS QP

 - set relevant QPC fields in RSS context and RX ring QPs

 - in TX flow, set WQE fields to generate HW checksum, and handle gso skbs
   which are marked for encapsulation such that the HW will segment them properly.

 - in RX flow, read HW offloaded checksum for encapsulated packets from the CQE

 - advertize hw_enc_features and NETIF_F_GSO_UDP_TUNNEL to the networking stack
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

837052d0

20 12月, 2013 2 次提交

net/mlx4_en: Add NAPI support for transmit side · 0276a330

由 Eugenia Emantayev 提交于 12月 19, 2013

Add NAPI for TX side,
implement its support and provide NAPI callback.
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0276a330

net/mlx4_en: Configure the XPS queue mapping on driver load · d03a68f8

由 Ido Shamay 提交于 12月 19, 2013

Only TX rings of User Piority 0 are mapped.
TX rings of other UP's are using UP 0 mapping.
XPS is not in use when num_tc is set.
Signed-off-by: NIdo Shamay <idos@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d03a68f8

08 11月, 2013 2 次提交

net/mlx4_en: Datapath structures are allocated per NUMA node · 163561a4

由 Eugenia Emantayev 提交于 11月 07, 2013

For each RX/TX ring and its CQ, allocation is done on a NUMA node that
corresponds to the core that the data structure should operate on.
The assumption is that the core number is reflected by the ring index.
The affected allocations are the ring/CQ data structures,
the TX/RX info and the shared HW/SW buffer.
For TX rings, each core has rings of all UPs.
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: NHadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

163561a4

net/mlx4_en: Datapath resources allocated dynamically · 41d942d5

由 Eugenia Emantayev 提交于 11月 07, 2013

Currently all TX/RX rings and completion queues are part of the
netdev priv structure and are allocated statically. This patch
will change the priv to hold only arrays of pointers and therefore
all TX/RX rings and completetion queues will be allocated
dynamically. This is in preparation for NUMA aware allocations.
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: NHadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

41d942d5

09 10月, 2013 1 次提交

net/mlx4_en: Rename name of mlx4_en_rx_alloc members · 70fbe079

由 Amir Vadai 提交于 10月 07, 2013

Add page prefix to page related members: @size and @offset into
@page_size and @page_offset

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70fbe079

02 8月, 2013 1 次提交

net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL · e0d1095a

由 Cong Wang 提交于 8月 01, 2013

Eliezer renames several *ll_poll to *busy_poll, but forgets
CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too.

Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <amwang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0d1095a

26 6月, 2013 1 次提交

mlx4: allow order-0 memory allocations in RX path · 51151a16

由 Eric Dumazet 提交于 6月 23, 2013

Signed-off-by: NEric Dumazet <edumazet@google.com>

mlx4 exclusively uses order-2 allocations in RX path, which are
likely to fail under memory pressure.

We therefore drop frames more than needed.

This patch tries order-3, order-2, order-1 and finally order-0
allocations to keep good performance, yet allow allocations if/when
memory gets fragmented.

By using larger pages, and avoiding unnecessary get_page()/put_page()
on compound pages, this patch improves performance as well, lowering
false sharing on struct page.

Also use GFP_KERNEL allocations in initialization path, as allocating 12
MB (390 order-3 pages) can easily fail with GFP_ATOMIC.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51151a16

20 6月, 2013 2 次提交

net/mlx4_en: Low Latency recv statistics · 8501841a

由 Amir Vadai 提交于 6月 18, 2013

Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8501841a

net/mlx4_en: Add Low Latency Socket (LLS) support · 9e77a2b8

由 Amir Vadai 提交于 6月 18, 2013

Add basic support for LLS.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e77a2b8

05 6月, 2013 1 次提交

net/mlx4: use one page fragment per incoming frame · e6309cff

由 Eric Dumazet 提交于 6月 03, 2013

mlx4 driver has a suboptimal memory allocation strategy for regular
MTU=1500 frames, as it uses two page fragments :

One of 512 bytes and one of 1024 bytes.

This makes GRO less effective, as each GSO packet contains 8 MSS instead
of 16 MSS.

Performance of a single TCP flow gains 25 % increase with the following
patch.

Before patch :

A:~# netperf -H 192.168.0.2 -Cc
MIGRATED TCP STREAM TEST ...
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      13798.47   3.06     4.20     0.436   0.598

After patch :

A:~# netperf -H 192.68.0.2 -Cc
MIGRATED TCP STREAM TEST ...
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      17273.80   3.44     4.19     0.391   0.477
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-By: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6309cff

25 4月, 2013 2 次提交

net/mlx4_en: Add a service task · b6c39bfc

由 Amir Vadai 提交于 4月 23, 2013

Add a service task to run tasks that needed to be executed periodically.
Currently the only task is a watchdog to catch NIC clock overflow, to make
timestamping accurate.
Will move the statistics task into this framework in a later patch.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6c39bfc

net/mlx4_en: Add HW timestamping (TS) support · ec693d47

由 Amir Vadai 提交于 4月 23, 2013

The patch allows to enable/disable HW timestamping for incoming and/or
outgoing packets. It adds and initializes all structs and callbacks
needed by kernel TS API.
To enable/disable HW timestamping appropriate ioctl should be used.
Currently HWTSTAMP_FILTER_ALL/NONE and HWTSAMP_TX_ON/OFF only are
supported.
When enabling TS on receive flow - VLAN stripping will be disabled.
Also were made all relevant changes in RX/TX flows to consider TS request
and plant HW timestamps into relevant structures.
mlx4_ib was fixed to compile with new mlx4_cq_alloc() signature.
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec693d47

08 4月, 2013 1 次提交

net/mlx4_en: Enable DCB ETS ops only when supported by the firmware · 540b3a39

由 Or Gerlitz 提交于 4月 07, 2013

Enable the DCB ETS ops only when supported by the firmware. For older firmware/cards
which don't support ETS, advertize only PFC DCB ops.
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

540b3a39

08 3月, 2013 1 次提交

net/mlx4_en: Fix race when setting the device MAC address · bfa8ab47

由 Yan Burman 提交于 3月 07, 2013

Remove unnecessary use of workqueue for the device MAC address setting
flow, and fix a race when setting MAC address which was introduced by
commit c07cb4b0 "net/mlx4_en: Manage hash of MAC addresses per port"

The race happened when mlx4_en_replace_mac was being executed in parallel
with a successive call to ndo_set_mac_address, e.g witn an A/B/A MAC
setting configuration test, the third set fails.

With this change we also properly report an error if set MAC fails.
Signed-off-by: NYan Burman <yanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bfa8ab47

08 2月, 2013 6 次提交

net/mlx4_en: Add unicast MAC filtering · cc5387f7

由 Yan Burman 提交于 2月 07, 2013

Implement and advertise unicast MAC filtering, such that setting macvlan
instance over mlx4_en interfaces will not require the networking core
to put mlx4_en devices in promiscuous mode.

If for some reason adding a unicast address filter fails e.g as of missing space in
the HW mac table, the device forces itself into promiscuous mode (and out of this
forced state when enough space is available).
Signed-off-by: NYan Burman <yanb@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc5387f7

net/mlx4_en: Manage hash of MAC addresses per port · c07cb4b0

由 Yan Burman 提交于 2月 07, 2013

As a preparation step for supporting multiple unicast addresses, store MAC addresses in hash table.
Remove the radix tree for MAC addresses per QP, as it's not in use.
Signed-off-by: NYan Burman <yanb@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c07cb4b0

net/mlx4_en: Re-arrange ndo_set_rx_mode related code · 0eb74fdd

由 Yan Burman 提交于 2月 07, 2013

Currently, mlx4_en_do_set_multicast serves as the ndo_set_rx_mode entry for mlx4_en,
doing all related work. Split it to few calls, one per required functionality
(e.g multicast, promiscuous, etc) and rename some structures and calls
to use rx_mode notation instead of multicast.
Signed-off-by: NYan Burman <yanb@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0eb74fdd

net/mlx4: Move Ethernet related functionality from mlx4_core to mlx4_en · 16a10ffd

由 Yan Burman 提交于 2月 07, 2013

Move low level code that deals with management of Ethernet MACs and QPs from mlx4_core to mlx4_en.
Also convert the new functions to deal with MACs in form of char array instead of u64.

Actual functions moved:
mlx4_replace_mac
mlx4_get_eth_qp
mlx4_put_eth_qp

To conduct this change, some functionality had to be exported from the core,
the following functions were added:
mlx4_get_base_qp
__mlx4_replace_mac (low level function for CX1/A0 compatibility)
Signed-off-by: NYan Burman <yanb@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16a10ffd

net/mlx4_en: Optimize Rx fast path filter checks · 6bbb6d99

由 Yan Burman 提交于 2月 07, 2013

Currently, RX path code that does RX filtering is not optimized
and does an expensive conversion. In order to use ether_addr_equal_64bits
which is optimized for such cases, we need the MAC address kept by the device
to be in the form of unsigned char array instead of u64. Store the MAC address
as unsigned char array and convert to/from u64 out of the fast path when needed.
Side effect of this is that we no longer need priv->mac, since it's the same
as dev->dev_addr.

This optimization was suggested by Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NYan Burman <yanb@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6bbb6d99

net/mlx4_en: Optimize loopback related checks in data path · 79aeaccd

由 Yan Burman 提交于 2月 07, 2013

Currently there are relatively complex conditional checks in the fast path,
for TX loopback enabling and resulting RX filter logic.
Move elaborate if's out of data path, replace them with a single flag
for each state and update that state from appropriate places.
Also, in native (non SRIOV) mode and not in loopback or in selftest,
there is no need to try and filter out packets that HW loopback-ed,
as in native mode we do not loopback packets anymore.
Signed-off-by: NYan Burman <yanb@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79aeaccd

01 2月, 2013 2 次提交

net/mlx4_en: Fix transmit timeout when driver restarts port · 3484aac1

由 Amir Vadai 提交于 1月 30, 2013

Under heavy CPU load, changing, ring size/mtu/etc. could result in transmit
timeout, since stop-start port might take more than 10 seconds.
Calling netif_detach_device to prevent tx queue transmit timeout.

netif_detach_device() is not called under ndo_stop, because netif_carrier_off
will prevent the timeout, and device should not be marked as not present, or
else user won't be able to start it later on.

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3484aac1

net/mlx4_en: Fix ethtool rules leftovers after module unloaded · 0d256c0e

由 Hadar Hen Zion 提交于 1月 30, 2013

As part of the driver unload flow, all steering rules must be deleted,
make sure to remove the rules that were set through ethtool.
Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d256c0e

03 12月, 2012 1 次提交

net/mlx4_en: Set number of rx/tx channels using ethtool · d317966b

由 Amir Vadai 提交于 12月 02, 2012

Add support to changing number of rx/tx channels using
ethtool ('ethtool -[lL]'). Where the number of tx channels specified in ethtool
is the number of rings per user priority - not total number of tx rings.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d317966b

27 11月, 2012 1 次提交

mlx4: 64-byte CQE/EQE support · 08ff3235

由 Or Gerlitz 提交于 10月 21, 2012

ConnectX-3 devices can use either 64- or 32-byte completion queue
entries (CQEs) and event queue entries (EQEs).  Using 64-byte
EQEs/CQEs performs better because each entry is aligned to a complete
cacheline.  This patch queries the HCA's capabilities, and if it
supports 64-byte CQEs and EQES the driver will configure the HW to
work in 64-byte mode.

The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.

Since this mode is global, userspace (libmlx4) must be updated to work
with the configured CQE size, and guests using SR-IOV virtual
functions need to know both EQE and CQE size.

In case one of the 64-byte CQE/EQE capabilities is activated, the
patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
they need an update to be able to work with the PPF. This is done by
changing the returned pf_context_behaviour not to be zero any more. In
case none of these capabilities is activated that value remains zero
and older guest drivers can run OK.

The SRIOV related flow is as follows

1. the PPF does the detection of the new capabilities using
   QUERY_DEV_CAP command.

2. the PPF activates the new capabilities using INIT_HCA.

3. the VF detects if the PPF activated the capabilities using
   QUERY_HCA, and if this is the case activates them for itself too.

Note that the VF detects that it must be aware to the new PF behaviour
using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.

User space notification is done through a new field introduced in
struct mlx4_ib_ucontext which holds device capabilities for which user
space must take action. This changes the binary interface so the ABI
towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
when **needed** i.e. only when the driver does use 64-byte CQEs or
future device capabilities which must be in sync by user space. This
practice allows to work with unmodified libmlx4 on older devices (e.g
A0, B0) which don't support 64-byte CQEs.

In order to keep existing systems functional when they update to a
newer kernel that contains these changes in VF and userspace ABI, a
module parameter enable_64b_cqe_eqe must be set to enable 64-byte
mode; the default is currently false.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

08ff3235

20 11月, 2012 1 次提交

mlx4_en: Remove remnants of LRO support · f1d29a3f

由 Ben Hutchings 提交于 11月 16, 2012

Commit fa37a958 ('mlx4_en: Moving to
work with GRO') left behind the Kconfig depends/select, some dead
code and comments referring to LRO.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Acked-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f1d29a3f

08 11月, 2012 1 次提交

mlx4: change TX coalescing defaults · ecfd2ce1

由 Eric Dumazet 提交于 11月 05, 2012

mlx4 currently uses a too high tx coalescing setting, deferring
TX completion interrupts by up to 128 us.

With the recent skb_orphan() removal in commit 8112ec3b,
performance of a single TCP flow is capped to ~4 Gbps, unless
we increase tcp_limit_output_bytes.

I suggest using 16 us instead of 128 us, allowing a finer control.

Performance of a single TCP flow is restored to previous levels,
while keeping TCP small queues fully enabled with default sysctl.

This patch is also a BQL prereq.
Reported-by: NVimalkumar <j.vimal@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ecfd2ce1

04 8月, 2012 1 次提交

net/mlx4_en: Fixing TX queue stop/wake flow · c18520bd

由 Yevgeny Petrilin 提交于 8月 03, 2012

Removing the ring->blocked flag, it is redundant and leads to a race:

We close the TX queue and then set the "blocked" flag.
Between those 2 operations the completion function can check the "blocked"
flag, sees that it is 0, and wouldn't open the TX queue.

Using netif_tx_queue_stopped to check the state of the queue to avoid this race.
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c18520bd

20 7月, 2012 1 次提交

mlx4_en: map entire pages to increase throughput · 4cce66cd

由 Thadeu Lima de Souza Cascardo 提交于 7月 16, 2012

In its receive path, mlx4_en driver maps each page chunk that it pushes
to the hardware and unmaps it when pushing it up the stack. This limits
throughput to about 3Gbps on a Power7 8-core machine.

One solution is to map the entire allocated page at once. However, this
requires that we keep track of every page fragment we give to a
descriptor. We also need to work with the discipline that all fragments will
be released (in the sense that it will not be reused by the driver
anymore) in the order they are allocated to the driver.

This requires that we don't reuse any fragments, every single one of
them must be reallocated. We do that by releasing all the fragments that
are processed and only after finished processing the descriptors, we
start the refill.

We also must somehow guarantee that we either refill all fragments in a
descriptor or none at all, without resorting to giving up a page
fragment that we would have already given. Otherwise, we would break the
discipline of only releasing the fragments in the order they were
allocated.

This has passed page allocation fault injections (restricted to the
driver by using required-start and required-end) and device hotplug
while 16 TCP streams were able to deliver more than 9Gbps.
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cce66cd

19 7月, 2012 1 次提交

net/mlx4_en: Add accelerated RFS support · 1eb8c695

由 Amir Vadai 提交于 7月 18, 2012

Use RFS infrastructure and flow steering in HW to keep CPU
affinity of rx interrupts and application per TCP stream.

A flow steering filter is added to the HW whenever the RFS
ndo callback is invoked by core networking code.

Because the invocation takes place in interrupt context, the
actual setup of HW is done using workqueue. Whenever new filter
is added, the driver checks for expiry of existing filters.

Since there's window in time between the point where the core
RFS code invoked the ndo callback, to the point where the HW
is configured from the workqueue context, the 2nd, 3rd etc
packets from that stream will cause the net core to invoke
the callback again and again.

To prevent inefficient/double configuration of the HW, the filters
are kept in a database which is indexed using hash function to enable
fast access.
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1eb8c695

08 7月, 2012 4 次提交

net/mlx4_en: Add support for drop action through ethtool · cabdc8ee

由 Hadar Hen Zion 提交于 7月 05, 2012

The drop action is implemented by allocating a QP and keeping it in a reset state
such that the HW drops any packets which are steered to that QP. When a drop action
is requested, we attach the relevant flow to that QP.
Sign-off-by: NHadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cabdc8ee

net/mlx4_en: Manage flow steering rules with ethtool · 82067281

由 Hadar Hen Zion 提交于 7月 05, 2012

Implement the ethtool APIs for attaching L2/L3/L4 based flow steering
rules to the netdevice RX rings. Added set_rxnfc callback and enhanced
the existing get_rxnfc callback.
Signed-off-by: NHadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82067281

{NET, IB}/mlx4: Add device managed flow steering firmware API · 0ff1fb65

由 Hadar Hen Zion 提交于 7月 05, 2012

The driver is modified to support three operation modes.

If supported by firmware use the device managed flow steering
API, that which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode use it, and finally,
if none of the above, use the A0 steering mode.

When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.

When attaching rule using device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.

Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should be done to allow
configuring the flow-steering hash function with common, non
proprietary means.
Signed-off-by: NHadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ff1fb65

net/mlx4_en: Re-design multicast attachments flow · 6d199937

由 Yevgeny Petrilin 提交于 7月 05, 2012

Currently, for every change in the net device multicast list, the driver
detaches all the addresses from the HW device, and then attaches the
updated list. This behavior is wrong from two aspects: first, it causes
a load of firmware commands and second, there is period of time where
the correct addresses are not attached, which turned into packet loss.

To improve - a copy of the multicast list is saved by the driver. For
every change in the multicast list, the multicast list copy is used
to find the delta between those two lists and add or remove multicast
addresses as needed.
Reported-by: NShawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: NHadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d199937

26 6月, 2012 1 次提交

net/mlx4_en: Release QP range in free_resources · 044ca2a5

由 Yevgeny Petrilin 提交于 6月 25, 2012

Add a missing resource release in ring cleanup.
Not doing this leaves a range of QPs that are being reserved,
and no one can use them.
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

044ca2a5

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功