Commits · 225c6c8c6bbbc32455df3d1c0fb1e1e1fb51c533 · openeuler / Kernel

14 Nov, 2014 2 commits

net/mlx4_core: Use correct variable type for mlx4_slave_cap · 225c6c8c

Authored by Matan Barak on Nov 13, 2014

We've used an incorrect type for the loop counter and the
mlx4_QUERY_FUNC_CAP function. The current input modifier
is either a port or a boolean.
Since the number of ports is always a positive value < 255,
we should use u8 instead of an integer with casting.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_core: Fix wrong reading of reserved_eqs · 7c68dd43

Authored by Matan Barak on Nov 13, 2014

We mistakenly read the reserved_eqs field as a standard
numeric value rather than a log2 value.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

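To illustrate the difference between the two readings, here is a minimal sketch of decoding a field that stores log2 of a count. The helper name is hypothetical and this is not the driver's code; the actual field layout in the firmware response is not shown in the commit message.

```c
#include <assert.h>

/* Hypothetical helper: convert a log2-encoded field into the count
 * it represents. Reading the raw value 3 as a count would give 3,
 * while the intended meaning is 2^3 = 8. */
static unsigned int count_from_log2(unsigned int log2_val)
{
	assert(log2_val < 32);	/* keep the shift well defined */
	return 1u << log2_val;
}
```

With this reading, a raw field value of 3 stands for 8 reserved entries rather than 3.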

12 Nov, 2014 2 commits

net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE · f8c6455b

Authored by Shani Michaeli on Nov 09, 2014

When processing received traffic, pass CHECKSUM_COMPLETE status to the
stack, with calculated checksum for non TCP/UDP packets (such
as GRE or ICMP).

Although the stack expects checksum which doesn't include the pseudo
header, the HW adds it. To address that, we are subtracting the pseudo
header checksum from the checksum value provided by the HW.

In the IPv6 case, we also compute/add the IP header checksum which
is not added by the HW for such packets.

Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

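The pseudo-header subtraction described above relies on ones' complement arithmetic, where subtracting a value is the same as adding its bitwise complement. The following is a user-space sketch only; the function names are hypothetical, the data is treated as 16-bit words, and the IPv6 header handling mentioned in the message is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement sum over n 16-bit words. */
static uint16_t csum_words(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	assert(n < 65536);	/* the 32-bit accumulator cannot overflow */
	for (size_t i = 0; i < n; i++)
		sum += w[i];
	return csum_fold16(sum);
}

/* Ones' complement subtraction: a - b is computed as a + ~b. */
static uint16_t csum_sub16(uint16_t a, uint16_t b)
{
	return csum_fold16((uint32_t)a + (uint16_t)~b);
}
```

If the hardware value covers both the pseudo header and the payload, `csum_sub16(hw_sum, pseudo_sum)` yields the payload-only sum that the stack expects in this model.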

net/mlx4_en: Extend usage of napi_gro_frags · dd65beac

Authored by Shani Michaeli on Nov 09, 2014

We can call napi_gro_frags for all the received traffic regardless
of the checksum status. Specifically, received packets whose status
is CHECKSUM_NONE (and soon to be added CHECKSUM_COMPLETE)
are eligible for napi_gro_frags as well.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


11 Nov, 2014 2 commits

mlx4: restore conditional call to napi_complete_done() · 2e1af7d7

Authored by Eric Dumazet on Nov 10, 2014

After commit 1a288172 ("mlx4: use napi_complete_done()") we ended up
calling napi_complete_done() in the case NAPI poll consumed all its
budget.

This added extra interrupt pressure; this patch restores the proper
behavior.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1a288172 ("mlx4: use napi_complete_done()")
Signed-off-by: David S. Miller <davem@davemloft.net>

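The conditional call can be pictured with a small user-space model of a poll routine. Everything prefixed with `fake_` is a hypothetical stand-in and not the kernel API; the sketch only shows the rule that polling is completed when less than the full budget was consumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI context; not the kernel's struct. */
struct fake_napi {
	bool scheduled;	/* true while polling stays active */
	int pending;	/* packets waiting in the ring */
};

/* Stand-in for napi_complete_done(): ends polling so that interrupts
 * would be re-armed. */
static void fake_napi_complete_done(struct fake_napi *n, int work_done)
{
	(void)work_done;
	n->scheduled = false;
}

/* Process at most 'budget' packets. Complete only when the budget was
 * not exhausted; otherwise stay scheduled so we are polled again. */
static int fake_poll(struct fake_napi *n, int budget)
{
	int done = 0;

	assert(budget > 0);
	while (done < budget && n->pending > 0) {
		n->pending--;
		done++;
	}
	if (done < budget)
		fake_napi_complete_done(n, done);
	return done;
}
```

With 100 pending packets and a budget of 64, the first call returns 64 and leaves the context scheduled; the second returns 36 and completes.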

mlx4: use napi_complete_done() · 1a288172

Authored by Eric Dumazet on Nov 06, 2014

To enable gro_flush_timeout, a driver has to use napi_complete_done()
instead of napi_complete().

Tested:
 Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)

Without this feature, we send back about 305,000 ACK per second.

GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)

Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)

The receiver makes fewer calls to the upper stacks and wakes up less often.
This also reduces cpu usage on the sender, as it receives fewer ACK
packets.

Note that reducing the number of wakeups increases cpu efficiency, but can
decrease QPS, as applications won't have the chance to warm up cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00
0.00      0.50

B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00
0.00      0.50
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


07 Nov, 2014 2 commits

net/mlx5_core: Fix race on driver load · 364d1798

Authored by Eli Cohen on Nov 06, 2014

When events arrive at driver load, the event handler gets called even before
the spinlock and list are initialized. Fix this by moving the initialization
before EQs creation.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx5_core: Fix race in create EQ · a158906d

Authored by Eli Cohen on Nov 06, 2014

After the EQ is created, it can possibly generate interrupts and the interrupt
handler is referencing eq->dev. It is therefore required to set eq->dev before
calling request_irq() so if an event is generated before request_irq() returns,
we will have a valid eq->dev field.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

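The ordering requirement can be modelled in user space with a fake request_irq() that fires the handler before it returns. All `fake_` names are hypothetical and only mimic the shape of the problem; this is not the mlx5 code.

```c
#include <assert.h>
#include <stddef.h>

struct fake_dev { int events; };
struct fake_eq { struct fake_dev *dev; };

typedef void (*fake_irq_handler)(struct fake_eq *eq);

/* The handler dereferences eq->dev, as the commit message describes. */
static void fake_eq_handler(struct fake_eq *eq)
{
	assert(eq->dev != NULL);
	eq->dev->events++;
}

/* Stand-in for request_irq(): simulates an event arriving before the
 * call returns by invoking the handler immediately. */
static int fake_request_irq(fake_irq_handler handler, struct fake_eq *eq)
{
	handler(eq);
	return 0;
}

static int fake_create_eq(struct fake_dev *dev, struct fake_eq *eq)
{
	eq->dev = dev;	/* must be set before the handler can run */
	return fake_request_irq(fake_eq_handler, eq);
}
```

Swapping the two statements in `fake_create_eq()` would make the handler run with `eq->dev` still unset, which is the race the commit closes.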

04 Nov, 2014 5 commits

net/mlx4_core: Add retrieval of CONFIG_DEV parameters · d475c95b

Authored by Matan Barak on Nov 02, 2014

Add code to issue CONFIG_DEV "get" firmware command.

This command is used in order to obtain certain parameters used for
supporting various RX checksumming options and vxlan UDP port.

The GET operation is allowed for VFs too.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_en: Add __GFP_COLD gfp flags in alloc_pages · 1ab25f86

Authored by Ido Shamay on Nov 02, 2014

Needed in order to get cache cold pages (L3 flushed) for HW scatter.

Otherwise memory may flush those entries when the packet comes from
PCI, causing back pressure resulting in BW decrease.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_en: Remove RX buffers alignment to IP_ALIGN · 5f6e9800

Authored by Ido Shamay on Nov 02, 2014

When IP_ALIGN has a non-zero value, the hardware writes to an unaligned
address. The only read from this address happens when copying the header
from the first frag into the linear buffer (further access to the IP
address will be from the linear buffer, in which the headers are
aligned). Since the penalty of unaligned access by the hardware is
greater than that of the software memcpy, change frag_align to always be 0.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_core: Protect port type setting by mutex · 0a984556

Authored by Amir Vadai on Nov 02, 2014

We need to protect set_port_type() against concurrent calls, as the sysfs
code could call it from multiple contexts in parallel.

The port_mutex is not enough because we need to protect from concurrent
modification of 'info' and stopping of the port sensing work.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_core: Prevent VF from changing port configuration · 6e806699

Authored by Saeed Mahameed on Nov 02, 2014

Added a wrapper to the ACCESS_REG command for handling guest access to HW
registers; it blocks write operations but allows reads.

This prevents SRIOV guests from changing the port PTYS configuration,
such as speed/advertised link modes.

Fixes: adbc7ac5 ('net/mlx4_core: Introduce ACCESS_REG CMD [...]')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

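As a rough sketch of the read-allowed, write-denied policy, a wrapper could look like the following. The method codes, the function name and the choice of error code are assumptions made for illustration; the real command encoding and return value are not given in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical method codes; the real firmware encoding may differ. */
enum fake_reg_method { FAKE_REG_QUERY, FAKE_REG_WRITE };

/* Hypothetical wrapper: guests may query registers but not write them. */
static int fake_access_reg_wrapper(bool is_guest, enum fake_reg_method m)
{
	assert(m == FAKE_REG_QUERY || m == FAKE_REG_WRITE);
	if (is_guest && m == FAKE_REG_WRITE)
		return -EPERM;	/* assumed error code for a denied write */
	return 0;		/* the command would be forwarded here */
}
```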

31 Oct, 2014 3 commits

mlx4: Avoid leaking steering rules on flow creation error flow · 571e1b2c

Authored by Or Gerlitz on Oct 30, 2014

If mlx4_ib_create_flow() attempts to create more than one rule with the
firmware, and one of these registrations fails, we leaked the
already created flow rules.

One example of the leak is when the registration of the VXLAN ghost
steering rule fails, we didn't unregister the original rule requested
by the user, introduced in commit d2fce8a9 "mlx4: Set
user-space raw Ethernet QPs to properly handle VXLAN traffic".

While here, add dump of the VXLAN portion of steering rules
so it can actually be seen when flow creation fails.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

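The fix follows the usual unwind-on-error pattern: when registration number i fails, everything registered before it is released. The sketch below models that with a counter in place of real firmware rules; all names are hypothetical and it is not the mlx4_ib code.

```c
#include <assert.h>

#define FAKE_MAX_RULES 4

static int live_rules;	/* rules currently registered in this model */

/* Stand-in for a firmware registration that may fail. */
static int fake_register(int should_fail)
{
	if (should_fail)
		return -1;
	live_rules++;
	return 0;
}

static void fake_unregister(void)
{
	live_rules--;
}

/* Register n rules; fail_at is the index that fails (-1 for none).
 * On failure, release the rules that were already created. */
static int fake_create_flow(int n, int fail_at)
{
	int i, err = 0;

	assert(n >= 0 && n <= FAKE_MAX_RULES);
	for (i = 0; i < n; i++) {
		err = fake_register(i == fail_at);
		if (err)
			goto unwind;
	}
	return 0;

unwind:
	while (--i >= 0)
		fake_unregister();
	return err;
}
```

Without the `unwind` loop, a failure on the second rule would leave the first one registered, which is the leak described above.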

net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN · a4f2dacb

Authored by Or Gerlitz on Oct 30, 2014

For VXLAN/NVGRE encapsulation, the current HW doesn't support offloading
both the outer UDP TX checksum and the inner TCP/UDP TX checksum.

The driver doesn't advertize SKB_GSO_UDP_TUNNEL_CSUM, however we are wrongly
telling the HW to offload the outer UDP checksum for encapsulated packets,
fix that.

Fixes: 837052d0 ('net/mlx4_en: Add netdev support for TCP/IP
		     offloads of vxlan tunneling')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


mlx4: use napi_schedule_irqoff() · 477b35b4

Authored by Eric Dumazet on Oct 29, 2014

mlx4_en_rx_irq() and mlx4_en_tx_irq() run from hard interrupt context.

They can use napi_schedule_irqoff() instead of napi_schedule()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


29 Oct, 2014 13 commits

net/mlx4_en: Report actual number of rings in indirection table · d5ec899a

Authored by Amir Vadai on Oct 27, 2014

Hardware requires the number of rings in indirection table to be a power
of 2. When setting number of channels to a non power of 2 number,
indirection table is using only the closest power of 2 rings.
Report this number in 'ethtool -x' and not the total number of rx rings.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

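A small sketch of how the reported number relates to the configured one, assuming "closest power of 2" means rounding down (the table cannot use more rings than exist). The function name is hypothetical; the kernel has its own rounddown_pow_of_two() helper, and this is a portable stand-in.

```c
#include <assert.h>

/* Largest power of two that is less than or equal to n (n > 0). */
static unsigned int rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n > 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}
```

Under this assumption, 6 configured channels give an indirection table over 4 rings, while 8 channels use all 8.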

net/mlx4_en: Move spinlocks and work initalizations to beginning of init_netdev · 207af6c5

Authored by Eugenia Emantayev on Oct 27, 2014

Upon failures, destroy_netdev is called, and spinlocks/works must be
initialized before calling it. Otherwise kernel panic may occur.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_en: Call napi_synchronize on stop_port · f4a36751

Authored by Ido Shamay on Oct 27, 2014

This is instead of calling the actual implementation of
napi_synchronize, for better encapsulation.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_en: Cleanups suggested by clang static checker · c2a3d4b4

Authored by Jack Morgenstein on Oct 27, 2014

clang flagged the following. All are actually cosmetic cleanups, not really bugs:

drivers/net/ethernet/mellanox/mlx4/en_main.c:233:3: warning: Value stored to 'err' is never read
                err = -ENOMEM;
                ^     ~~~~~~~
drivers/net/ethernet/mellanox/mlx4/en_main.c:293:3: warning: Value stored to 'err' is never read
                err = -ENOMEM;

drivers/net/ethernet/mellanox/mlx4/en_netdev.c:648:16: warning: Assigned value is garbage or undefined
        entry->reg_id = reg_id;
                      ^ ~~~~~~
drivers/net/ethernet/mellanox/mlx4/en_netdev.c:659:2: warning: Function call argument is an uninitialized value
        mlx4_en_uc_steer_release(priv, priv->dev->dev_addr, *qpn, reg_id);
(NOTE: reg_id is only used in the device-managed flow steering path, in which it is always initialized.
 This is not a bug. Cleanup here is therefore cosmetic only).

drivers/net/ethernet/mellanox/mlx4/en_rx.c:122:3: warning: Value stored to 'frag_info' is never read
                frag_info = &priv->frag_info[i];
                ^           ~~~~~~~~~~~~~~~~~~~
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_en: Add ethtool support for [rx|tx]vlan offload set to OFF/ON · 537f6f95

Authored by Saeed Mahameed on Oct 27, 2014

Move mlx4_en_reset_config to en_netdev.c as it now serves more general purpose.
Add support for turning OFF/ON the rx/tx vlan offload.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_en: Add support for setting rxvlan offload OFF/ON · 7787fa66

Authored by Saeed Mahameed on Oct 27, 2014

Rename mlx4_en_timestamp_config to mlx4_en_reset_config and extend it to support
choosing RX vlan offload configuration.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_en: Use PTYS register to set ethtool settings (Speed) · d48b3ab4

Authored by Saeed Mahameed on Oct 27, 2014

Added Support to set speed or advertised link modes via ethtool:
ethtool -s <ifname> [speed <speed>] [advertise <link modes>]
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_en: Use PTYS register to query ethtool settings · 2c762679

Authored by Saeed Mahameed on Oct 27, 2014

- If dev cap MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL is ON, query PTYS register to fill ethtool settings,
  else use default values.
- Use autoneg port cap and dev backplane autoneg cap to report autoneg interface capabilities.
- Fix typo in mlx4_en_port_state struct field (transciver to transceiver).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


ethtool, net/mlx4_en: Add 100M, 20G, 56G speeds ethtool reporting support · dcf972a3

Authored by Saeed Mahameed on Oct 27, 2014

Added 100M, 20G and 56G ethtool speed reporting support.
Update mlx4_en_test_speed self test with the new speeds.

Defined new link speeds in include/uapi/linux/ethtool.h:
+#define SPEED_20000	20000
+#define SPEED_40000	40000
+#define SPEED_56000	56000
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_core: Add ethernet backplane autoneg device capability · a53e3e8c

Authored by Saeed Mahameed on Oct 27, 2014

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev cap · adbc7ac5

Authored by Saeed Mahameed on Oct 27, 2014

Adding ACCESS REG mlx4 command and use it to implement Query method for
PTYS (Port Type and Speed Register).
Query and store eth_prot_ctrl dev cap.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support · 7202da8b

Authored by Saeed Mahameed on Oct 27, 2014

Added get_module_info/get_module_eeprom ethtool support for reading cable info.

Added new cable types enum in include/uapi/linux/ethtool.h for ethtool use.
+#define ETH_MODULE_SFF_8636            0x3
+#define ETH_MODULE_SFF_8636_LEN        256
+#define ETH_MODULE_SFF_8436            0x4
+#define ETH_MODULE_SFF_8436_LEN        256
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx4_core: Introduce mlx4_get_module_info for cable module info reading · 32a173c7

Authored by Saeed Mahameed on Oct 27, 2014

Added new MAD_IFC command to read cable module info with attribute id (0xFF60).
Update include/linux/mlx4/device.h with function declaration (mlx4_get_module_info)
and the needed defines/enums for future use.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


27 Oct, 2014 2 commits

net/mlx4_core: Call synchronize_irq() before freeing EQ buffer · bf1bac5b

Authored by Eli Cohen on Oct 23, 2014

After moving the EQ ownership to software effectively destroying it, call
synchronize_irq() to ensure that any handler routines running on other CPU
cores finish execution. Only then free the EQ buffer.
The same thing is done when we destroy a CQ which is one of the sources
generating interrupts. In the case of CQ we want to avoid completion handlers
on a CQ that was destroyed. In this case we do the same to avoid receiving
asynchronous events after the EQ has been destroyed and its buffers freed.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx5_core: Call synchronize_irq() before freeing EQ buffer · 96e4be06

Authored by Eli Cohen on Oct 23, 2014

After destroying the EQ, the object responsible for generating interrupts, call
synchronize_irq() to ensure that any handler routines running on other CPU
cores finish execution. Only then free the EQ buffer. This patch solves a very
rare case when we get panic on driver unload.
The same thing is done when we destroy a CQ which is one of the sources
generating interrupts. In the case of CQ we want to avoid completion handlers
on a CQ that was destroyed. In this case we do the same to avoid receiving
asynchronous events after the EQ has been destroyed and its buffers freed.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


11 Oct, 2014 1 commit

mlx4: fix race accessing page->_count · 98226208

Authored by Eric Dumazet on Oct 10, 2014

It is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
lose a refcount increment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


09 Oct, 2014 1 commit

net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers · 53511453

Authored by Eric Dumazet on Oct 08, 2014

Add two helpers so that drivers do not have to care of BQL being
available or not.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jim Davis <jim.epost@gmail.com>
Fixes: 29d40c90 ("net/mlx4_en: Use prefetch in tx path")
Signed-off-by: David S. Miller <davem@davemloft.net>


08 Oct, 2014 1 commit

net/mlx4_en: remove NETDEV_TX_BUSY · fe971b95

Authored by Eric Dumazet on Oct 06, 2014

Drivers should avoid NETDEV_TX_BUSY as much as possible.

They should stop the tx queue before qdisc even tries to push another
packet, to avoid requeues.

For a driver supporting skb->xmit_more, this is likely to be a prereq
anyway, otherwise we could have a tx deadlock : We need to force a
doorbell if TX ring is full.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

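The idea of stopping the queue ahead of time, so that the transmit path never has to report a busy ring, can be modelled as below. The structure and function names are hypothetical, and the worst-case descriptor count per packet is an assumed parameter; this is a sketch of the technique, not the mlx4_en transmit path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TX ring model. */
struct fake_txq {
	unsigned int size;		/* ring capacity in descriptors */
	unsigned int used;		/* descriptors currently in flight */
	unsigned int max_per_pkt;	/* assumed worst case per packet */
	bool stopped;
};

/* Queue a packet, then stop the queue if the next worst-case packet
 * might not fit. Returns false only when called on a stopped queue. */
static bool fake_xmit(struct fake_txq *q, unsigned int ndesc)
{
	assert(ndesc > 0 && ndesc <= q->max_per_pkt);
	if (q->stopped)
		return false;
	q->used += ndesc;
	if (q->size - q->used < q->max_per_pkt)
		q->stopped = true;
	return true;
}

/* Completion path: free descriptors and wake the queue once there is
 * room for a worst-case packet again. */
static void fake_tx_complete(struct fake_txq *q, unsigned int ndesc)
{
	assert(ndesc <= q->used);
	q->used -= ndesc;
	if (q->stopped && q->size - q->used >= q->max_per_pkt)
		q->stopped = false;
}
```

Because the queue is stopped while there is still room for the packet just queued, an open queue always has space for the next packet, provided the ring is at least `max_per_pkt` descriptors large to begin with.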

06 Oct, 2014 6 commits

net/mlx4_en: Use the new tx_copybreak to set inline threshold · 1556b874