Commits · 0714256c3d76793b6ce52e74b4fa207cfb502246 · openeuler / Kernel

02 Jul, 2019 9 commits

mlxsw: pci: PTP: Hook into packet transmit path · 0714256c

Authored by Petr Machata on Jun 30, 2019

On Spectrum-1, timestamps are delivered separately from the packets, and
need to be paired up. Therefore, at some point after mlxsw_sp_port_xmit()
is invoked, it is necessary to involve the chip-specific driver code to
allow it to do the necessary bookkeeping and matching.

On Spectrum-2, timestamps are delivered in CQE. For that reason,
position the point of driver involvement into mlxsw_pci_cqe_sdq_handle()
to make it hopefully easier to extend for Spectrum-2 in the future.

To tell the driver what port the packet was sent on, keep tx_info
in SKB control buffer.

Introduce a new driver core interface mlxsw_core_ptp_transmitted(), a
driver callback ptp_transmitted, and a PTP op transmitted. The callee is
responsible for taking care of releasing the SKB passed to the new
interfaces, and correspondingly have the new stub callbacks just call
dev_kfree_skb_any().

Follow-up patches will introduce the actual content into
mlxsw_sp1_ptp_transmitted() in particular.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
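
The pairing that the message above describes for Spectrum-1 can be pictured with a small Python model (not driver code). It assumes that a packet and its timestamp share some lookup key, for example a (port, sequence ID) tuple; the key layout, the class and the method names are all hypothetical, and the real matching logic only arrives in the follow-up patches.

```python
from typing import Dict, Hashable, Optional, Tuple


class TimestampMatcher:
    """Pair packets with timestamps that arrive separately, in either order."""

    def __init__(self) -> None:
        # Hypothetical bookkeeping: one pending entry per key on each side.
        self._packets: Dict[Hashable, object] = {}
        self._timestamps: Dict[Hashable, int] = {}

    def packet_transmitted(self, key: Hashable, packet: object) -> Optional[Tuple[object, int]]:
        """Record a transmitted packet; return (packet, timestamp) once both are known."""
        if key in self._timestamps:
            return packet, self._timestamps.pop(key)
        self._packets[key] = packet
        return None

    def timestamp_arrived(self, key: Hashable, timestamp: int) -> Optional[Tuple[object, int]]:
        """Record a timestamp; return (packet, timestamp) once both are known."""
        if key in self._packets:
            return self._packets.pop(key), timestamp
        self._timestamps[key] = timestamp
        return None
```

Whichever half arrives first is parked, and the second arrival completes the pair, which is why the transmit path has to hand the packet to chip-specific code instead of freeing it right away.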

mlxsw: core: Add support for using SKB control buffer · d7cd206d

Authored by Petr Machata on Jun 30, 2019

The SKB control buffer is useful (and used) for bookkeeping of information
related to that SKB. Add helpers so that the mlxsw driver(s) can safely use
the buffer as well. The structure is currently empty, individual users will
add members to it as necessary.

Note that SKB allocation functions already clear the buffer, so the cleanup
is only necessary when ndo_start_xmit is called.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: PTP: Hook into packet receive path · aed4b572

Authored by Petr Machata on Jun 30, 2019

When configured, the Spectrum hardware can recognize PTP packets and
trap them to the CPU using dedicated traps, PTP0 and PTP1.

One reason to get PTP packets under dedicated traps is to have a
separate policer suitable for the amount of PTP traffic expected when
the switch is operated as a boundary clock. For this, add two new trap
groups, MLXSW_REG_HTGT_TRAP_GROUP_SP_PTP0 and _PTP1, and associate the
two PTP traps with these two groups.

In the driver, specifically for Spectrum-1, event PTP packets will need
to be paired up with their timestamps. Those arrive through a different
set of traps, added later in the patch set. To support this future use,
introduce a new PTP op, ptp_receive.

It is possible to configure which PTP messages should be trapped under
which PTP trap. On Spectrum systems, we will use PTP0 for event
packets (which need timestamping), and PTP1 for control packets (which
do not). Thus configure PTP0 trap with a custom callback that defers to
the ptp_receive op.

Additionally, L2 PTP packets are actually trapped through the LLDP trap,
not through any of the PTP traps. So treat the LLDP trap the same way as
the PTP0 trap. Unlike PTP traps, which are currently still disabled,
LLDP trap is active. Correspondingly, have all the implementations of
the ptp_receive op return true, which the handler treats as a signal to
forward the packet immediately.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
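
As an illustration of the event/control split behind PTP0 and PTP1, the sketch below classifies a PTP message by the messageType nibble defined in IEEE 1588-2008, where Sync, Delay_Req, Pdelay_Req and Pdelay_Resp are the event messages. The trap names come from the commit message; the function name is hypothetical, and doing this in software is only a model, since the hardware performs the classification according to register configuration.

```python
# IEEE 1588-2008 messageType values (low nibble of the first header byte).
EVENT_MESSAGE_TYPES = {
    0x0: "Sync",
    0x1: "Delay_Req",
    0x2: "Pdelay_Req",
    0x3: "Pdelay_Resp",
}
GENERAL_MESSAGE_TYPES = {
    0x8: "Follow_Up",
    0x9: "Delay_Resp",
    0xA: "Pdelay_Resp_Follow_Up",
    0xB: "Announce",
    0xC: "Signaling",
    0xD: "Management",
}


def ptp_trap_for(first_header_byte: int) -> str:
    """Return the trap an event or control message would map to in this model."""
    message_type = first_header_byte & 0x0F  # upper nibble is transportSpecific
    if message_type in EVENT_MESSAGE_TYPES:
        return "PTP0"  # event messages need timestamping
    if message_type in GENERAL_MESSAGE_TYPES:
        return "PTP1"  # control messages do not
    raise ValueError(f"reserved PTP messageType {message_type:#x}")
```

For example, `ptp_trap_for(0x00)` (Sync) selects PTP0, while `ptp_trap_for(0x0B)` (Announce) selects PTP1.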

mlxsw: spectrum: Add support for traps specific to Spectrum-1 · dadbc6bc

Authored by Petr Machata on Jun 30, 2019

On Spectrum-1, timestamps for PTP packets are delivered through queues
of ingress and egress timestamps. There are two event traps
corresponding to activity on each of those queues. This mechanism is
absent on Spectrum-2, and therefore the traps should only be registered
on Spectrum-1.

Carry a chip-specific listener array in mlxsw_sp->listeners and
listeners_count. Register listeners from that array in
mlxsw_sp_traps_init(). Add a new listener array for Spectrum-1 traps and
configure the newly-added mlxsw_sp->listeners with this array.

The listener array is empty for now, the events will be added in a later
patch.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Extract a helper for trap registration · 4b6b91ed

Authored by Petr Machata on Jun 30, 2019

On Spectrum-1, timestamps for PTP packets are delivered through queues
of ingress and egress timestamps. There are two event traps
corresponding to activity on each of those queues. This mechanism is
absent on Spectrum-2, and therefore the traps should only be registered
on Spectrum-1.

Extract out of mlxsw_sp_traps_init() a generic helper,
mlxsw_sp_traps_register(), and likewise with _unregister(). The new helpers
will later be called with Spectrum-1-specific traps.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Add Monitoring Global Configuration Register · 41ce78b9

Authored by Petr Machata on Jun 30, 2019

This register serves to configure global parameters of certain
monitoring operations. The following patches will use it to configure
the device so that, when PTP timestamps are delivered through the PTP
FIFO traps, the FIFO in question is cleared as well.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Add Time Precision Packet Timestamping Reading · 98b9028e

Authored by Petr Machata on Jun 30, 2019

MTPPTR is used for reading the per-port PTP timestamp FIFO.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Add Monitoring Precision Time Protocol Trap Register · 4dfecb65

Authored by Petr Machata on Jun 30, 2019

This register is used for configuring under which trap to deliver PTP
packets, depending on the type of the packet.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Add Monitoring Time Precision Packet Port Configuration Register · da28e878

Authored by Petr Machata on Jun 30, 2019

This register configures which PTP messages should be
timestamped. This is a global configuration, despite the register name.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Jun, 2019 10 commits

net/mlx5e: Disallow tc redirect offload cases we don't support · f6dc1264

Authored by Paul Blakey on Jun 24, 2019

After changing the parent_id to be the same for both NICs of the same
hardware device, netdev_port_same_parent_id now returns true for
more cases (all the lower devices in the hierarchy are on the same
hardware device).

If merged eswitch isn't enabled, these cases aren't supported, so disallow
them.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Expose same physical switch_id for all representors · 7ff40a46

Authored by Paul Blakey on May 16, 2019

Report system_image_guid as the E-Switch switch_id. This ensures
that when a NIC contains multiple PCI functions and has the
merged eswitch capability, all representors from
multiple PFs publish the same switch_id.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Don't refresh TIRs when updating representor SQs · a90f88fe

Authored by Gavi Teitz on May 23, 2019

Refreshing TIRs is done in order to update the TIRs with the current
state of SQs in the transport domain, so that the TIRs can filter out
undesired self-loopback packets based on the source SQ of the packet.

Representor TIRs will only receive packets that originate from their
associated vport, due to dedicated steering, and therefore will never
receive self-loopback packets, whose source vport will be the vport of
the E-Switch manager, and therefore not the vport associated with the
representor. As such, it is not necessary to refresh the representors'
TIRs, since self-loopback packets can't reach them.

Since representors only exist in switchdev mode, and there is no
scenario in which a representor will exist in the transport domain
alongside a non-representor, it is not necessary to refresh the
transport domain's TIRs upon changing the state of a representor's
queues. Therefore, do not refresh TIRs upon such a change. Achieve
this by adding an update_rx callback to the mlx5e_profile, which
refreshes TIRs for non-representors and does nothing for representors,
and replace instances of mlx5e_refresh_tirs() upon changing the state
of the queues with update_rx().
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: reduce stack usage in mlx5_eswitch_termtbl_create · 5233794b

Authored by Arnd Bergmann on Jun 18, 2019

Putting an empty 'mlx5_flow_spec' structure on the stack is a bit
wasteful and causes a warning on 32-bit architectures when building
with clang -fsanitize-coverage:

drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c: In function 'mlx5_eswitch_termtbl_create':
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads_termtbl.c:90:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Since the structure is never written to, we can statically allocate
it to avoid the stack usage. To be on the safe side, mark all
subsequent function arguments that we pass it into as 'const'
as well.

Fixes: 10caabda ("net/mlx5e: Use termination table for VLAN push actions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Set drvinfo in generic manner · f72e6c3e

Authored by Parav Pandit on May 27, 2019

Consider both PCI and non-PCI device types when setting the device name
in the get_drvinfo() callback, using the existing generic device.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Correct phys_port_name for PF port · 08706736

Authored by Parav Pandit on Jun 05, 2019

Currently the PF phys_port_name is reported as pfNvf-1, because the vport
number of the PF vport is 65535.
Correct the PF's phys_port_name to the agreed-upon name, pfN.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Report netdevice MPLS features · 5dc9520b

Authored by Ariel Levkovich on Jun 05, 2019

Set supported device features in the netdevice MPLS features mask.
This will enable HW checksumming and TSO for MPLS-tagged traffic.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Move to HW checksumming advertising · e4683f35

Authored by Ariel Levkovich on Jun 05, 2019

This patch changes the way the driver advertises its checksum offload
capabilities within the net device features bit mask.

Instead of advertising protocol-specific checksumming capabilities,
which are limited today to IPv4 and IPv6, we move to reporting
generic HW checksumming capabilities.

This will allow the network stack to let the mlx5 device offload checksums
for cases where the IP header is encapsulated within another protocol
and skb->protocol doesn't indicate one of the IP versions,
specifically the case where an MPLS label encapsulates the IP header and
skb->protocol indicates the MPLS ethertype rather than IP.

Moving to HW_CSUM reporting is required in the basic net device hw
features mask and also in the extensions (vlan and encapsulation
features), since the extensions are always ANDed with the basic
feature set during the packet's traversal through the stack's tx flow.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
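
Why generic checksumming has to be advertised in both the basic mask and the extension masks can be seen with a toy bitmask model. The flag names echo the kernel's checksum feature flags, but the bit values and the helper name below are illustrative assumptions, not the kernel's definitions.

```python
# Illustrative bit values only; the kernel's NETIF_F_* bits differ.
IP_CSUM = 1 << 0    # IPv4-specific checksum offload
IPV6_CSUM = 1 << 1  # IPv6-specific checksum offload
HW_CSUM = 1 << 2    # protocol-agnostic checksum offload


def effective_features(basic: int, extension: int) -> int:
    """Model of the stack intersecting an extension mask with the basic mask."""
    return basic & extension


# Only the extension mask was converted: the generic bit is lost.
mixed = effective_features(IP_CSUM | IPV6_CSUM, HW_CSUM)

# Both masks advertise the generic capability: it survives the intersection.
converted = effective_features(HW_CSUM, HW_CSUM)
```

In this model `mixed` is 0, so no checksum offload would remain for the encapsulated packet, while `converted` keeps `HW_CSUM`.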

net/mlx5: MPFS, Allow adding the same MAC more than once · e7e0bee8

Authored by Gavi Teitz on Jun 11, 2019

Remove the limitation preventing adding a vport's MAC address to the
Multi-Physical Function Switch (MPFS) more than once per E-switch, as
there is no difference in the MPFS if an address is being used by an
E-switch more than once.

This allows the E-switch to have multiple vports with the same MAC
address, allowing vports to be classified by VLAN id instead of by MAC
if desired.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: MPFS, Cleanup add MAC flow · 6311f308

Authored by Gavi Teitz on Jun 11, 2019

Unify and isolate the error handling flow in mlx5_mpfs_add_mac(),
removing code duplication.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

27 Jun, 2019 11 commits

net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it · 92ab1eb3

Authored by Jianbo Liu on Jun 25, 2019

Now that the ingress ACL rules save the vhca id and vport number to the
packet's metadata REG_C_0, and metadata matching has been added for the
rules in both the fast path and the slow path, enable this feature if it
is supported.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Add match on vport metadata for rule in slow path · a5641cb5

Authored by Jianbo Liu on Jun 25, 2019

In the slow path, a packet that is not matched by any offloaded rule is
forwarded to the eswitch manager vport for further processing.
Add matching on metadata for peer miss rules in the FDB, and for rules which
forward packets to the correct representor in the esw manager NIC_RX table.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager · c1286050

Authored by Jianbo Liu on Jun 25, 2019

In order to match on metadata in the slow path when demuxing traffic
to representors, explicitly enable the feature that allows HW to pass
metadata REG_C_0 from the FDB to the eswitch manager NIC_RX table.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Add query and modify esw vport context functions · 57843868

Authored by Jianbo Liu on Jun 25, 2019

Add esw vport query and modify functions. Exposing them is needed for
enabling or disabling the registers passed as metadata to the vport NIC_RX
table in the slow path.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Add match on vport metadata for rule in fast path · c01cfd0f

Authored by Jianbo Liu on Jun 25, 2019

If FW's capabilities and configurations meet the requirements of vport
metadata matching, this feature will be used. As the information
about the vport number and vhca_id related to a packet is already stored in
its metadata register, which is used as an indicator for a particular
vport, we can now change to match on this metadata for all the
offloading rules in the fast path.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Specifying known origin of packets matching the flow · 8d212ff0

Authored by Jianbo Liu on Jun 25, 2019

In vport metadata matching, the source port number is replaced by metadata.
Because FW has no idea what the metadata contains, a syndrome will
occur. Specify a known origin to avoid the syndrome.
However, there is no functional change, because ANY_VPORT (0) is filled
in flow_source, the same default value as before; this is a pre-step towards
metadata matching for the fast path.
Two other values can be filled in flow_source. With
0x1, a packet matching this rule is from the uplink, while 0x2 is for a packet
from other local vports.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs · 7445cfb1

Authored by Jianbo Liu on Jun 25, 2019

When a dual-port VHCA sends a RoCE packet on its non-native port, and the
packet arrives at its affiliated vport FDB, a mismatch might occur on the
rules that match the packet source vport, as in this case the source vport
is not represented by a single VHCA only. So we change to match on metadata
instead of source vport.
To do that, a rule is created in all vport and uplink ingress ACLs to
save the source vport number and vhca id in the packet's metadata in order
to match on it later.
The metadata register used is the first of the 32-bit type C registers. It
can be used for matching and header modify operations. The higher 16 bits
of this register are for the vhca id, and the lower 16 are for the vport
number.
This change is not for dual-port RoCE only. If HW and FW allow, the vport
metadata matching is enabled by default.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
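
The register layout described above (vhca id in the upper 16 bits, vport number in the lower 16 bits of a 32-bit value) can be written out as a pair of helpers. This is a Python sketch of the bit layout only; the function names are hypothetical and do not correspond to driver functions.

```python
FIELD_BITS = 16
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0xFFFF


def pack_vport_metadata(vhca_id: int, vport_num: int) -> int:
    """Build the 32-bit metadata value: vhca id high, vport number low."""
    if not 0 <= vhca_id <= FIELD_MASK:
        raise ValueError("vhca_id must fit in 16 bits")
    if not 0 <= vport_num <= FIELD_MASK:
        raise ValueError("vport_num must fit in 16 bits")
    return (vhca_id << FIELD_BITS) | vport_num


def unpack_vport_metadata(metadata: int) -> tuple:
    """Split a 32-bit metadata value back into (vhca_id, vport_num)."""
    return (metadata >> FIELD_BITS) & FIELD_MASK, metadata & FIELD_MASK
```

Because the vhca id is part of the value, two vports with the same number on different VHCAs still produce distinct metadata, which is what lets rules match unambiguously in the dual-port case.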

net/mlx5: Add flow context for flow tag · bb0ee7dc

Authored by Jianbo Liu on Jun 25, 2019

Refactor the flow data structures: add a new flow_context and move
flow_tag into it, as flow_tag doesn't belong to the rule action.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Introduce a helper API to check VF vport · 91d6291c

Authored by Parav Pandit on Jun 25, 2019

Introduce a helper API mlx5_eswitch_is_vf_vport() to check
if a given vport_num belongs to a VF or not.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Support allocating modify header context from ingress ACL · 84b0d6a7

Authored by Jianbo Liu on Jun 25, 2019

That modify header action can then be attached to a steering rule in
the ingress ACL.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Get vport ACL namespace by vport index · f53297d6

Authored by Jianbo Liu on Jun 25, 2019

The ingress and egress ACL root namespaces are created per vport and
stored in arrays. However, the vport number is not the same as the
index. Pass the array index, instead of the vport number, to get the
correct ingress and egress ACL namespace.

Fixes: 9b93ab98 ("net/mlx5: Separate ingress/egress namespaces for each vport")
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
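
The difference between a vport number and an array index can be shown with a deliberately simplified Python model. It assumes a layout in which ordinary vports use their number as the index and one special vport with number 0xFFFF is stored in the last slot; this layout, the constant and the function name are hypothetical and are not taken from the driver.

```python
SPECIAL_VPORT_NUM = 0xFFFF  # hypothetical number of a special vport


def vport_num_to_index(vport_num: int, total_vports: int) -> int:
    """Map a vport number to its slot in a per-vport array (assumed layout)."""
    if vport_num == SPECIAL_VPORT_NUM:
        return total_vports - 1  # special vport occupies the last slot
    if 0 <= vport_num < total_vports - 1:
        return vport_num  # ordinary vports are stored by number
    raise ValueError(f"unknown vport number {vport_num:#x}")


# A per-vport array of namespaces, one entry per vport.
namespaces = ["ns0", "ns1", "ns2", "ns-special"]
special_ns = namespaces[vport_num_to_index(SPECIAL_VPORT_NUM, len(namespaces))]
```

Indexing `namespaces` directly with 0xFFFF would fall outside the array in this model, whereas the translated index selects the intended entry.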

26 Jun, 2019 4 commits

linux/dim: Move implementation to .c files · 4f75da36

Authored by Tal Gilboa on Jan 10, 2019

Moved all logic from dim.h and net_dim.h to dim.c and net_dim.c.
This is both more structurally appealing and allows exposing only
externally used functions.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

linux/dim: Rename externally used net_dim members · 8960b389

Authored by Tal Gilboa on Jan 31, 2019

Removed 'net' prefix from functions and structs used by external drivers.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

linux/dim: Rename net_dim_sample() to net_dim_update_sample() · e5b6ab02

Authored by Tal Gilboa on Jan 14, 2019

This avoids confusion between the function and the similarly
named struct, and prepares for removing the 'net' prefix from
dim members.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

linux/dim: Rename externally exposed macros · c002bd52

Authored by Tal Gilboa on Nov 05, 2018

Renamed macros in use by external drivers.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

25 Jun, 2019 1 commit

net/mlx5: Convert mkey_table to XArray · 792c4e9d

Authored by Matthew Wilcox on Jun 20, 2019

The lock protecting the data structure does not need to be an rwlock.  The
only read access to the lock is in an error path, and if that's limiting
your scalability, you have bigger performance problems.

Eliminate mlx5_mkey_table in favour of using the xarray directly.
reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may
be called in interrupt context.

This also fixes a minor bug where SRCU locking was being used on the radix
tree read side, when RCU was needed too.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

24 Jun, 2019 3 commits

mlxsw: core: Add support for negative temperature readout · f485cc36

Authored by Vadim Pasternak on Jun 24, 2019

Extend the macro MLXSW_REG_MTMP_TEMP_TO_MC() to support negative
temperature readout, since the chip and other thermal components are
capable of operating at negative temperatures.
Without such support, a negative temperature is interpreted as a very high
temperature, which causes a wrong readout and thermal shutdown.
For negative values, 2's complement is used.
Tested in chamber.
Example of chip ambient temperature readout with chamber temperature:
-10 Celsius:
temp1:             -6.0C  (highest =  -5.0C)
-5 Celsius:
temp1:             -1.0C  (highest =  -1.0C)

v2 (Andrew Lunn):
* Replace '%u' with '%d' in mlxsw_hwmon_module_temp_show()
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
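
The effect of the 2's complement handling can be reproduced with a short Python model. It assumes a 16-bit raw field in units of 0.125 degrees Celsius; neither the field width nor the unit is stated in the commit message, so both are assumptions of this sketch, as are the function names.

```python
RAW_BITS = 16              # assumed width of the raw temperature field
MILLI_C_PER_UNIT = 125     # assumed resolution: 0.125 C per unit


def temp_to_mc_unsigned(raw: int) -> int:
    """Old behaviour in this model: treat the raw field as unsigned."""
    return raw * MILLI_C_PER_UNIT


def temp_to_mc(raw: int) -> int:
    """Interpret the raw field as a 2's complement value, in millidegrees C."""
    if raw & (1 << (RAW_BITS - 1)):  # sign bit set: value is negative
        raw -= 1 << RAW_BITS
    return raw * MILLI_C_PER_UNIT
```

Under these assumptions a raw value of `0xFFD0` decodes to -6000 millidegrees (-6.0 C), whereas the unsigned interpretation yields 8186000 millidegrees, which is the kind of bogus "very high temperature" the message warns about.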

mlxsw: core: Add the hottest thermal zone detection · 6f73862f

Authored by Vadim Pasternak on Jun 24, 2019

When multiple sensors are mapped to the same cooling device, the
cooling device should be set according to the worst sensor among the
sensors associated with this cooling device.

Provide hottest thermal zone detection and force the cooling device
to follow the temperature trends of the hottest zone only.
Prevent competition for cooling device control from other zones
by means of a "stable trend" indication. A cooling device will not perform
any actions associated with a zone with a "stable trend".

When another thermal zone is detected as the hottest, the cooling device is
switched to follow the temperature trends of the new hottest zone.

The thermal zone score is represented by a 32-bit unsigned integer and
calculated according to the following formula:
For T < TZ<t><i>, where t from {normal trip = 0, high trip = 1, hot
trip = 2, critical = 3}:
TZ<i> score = (T + (TZ<t><i> - T) / 2) / (TZ<t><i> - T) * 256 ** j;
Highest thermal zone score s is set as MAX(TZ<i> score);
Following this formula, if TZ<i> is at a higher trip point than TZ<k>,
the higher score is always assigned to TZ<i>.

For two thermal zones located at the same kind of trip point, the higher
score will be assigned to the zone which is closer to the next trip
point. Thus, the highest score will always be assigned objectively to
the hottest thermal zone.

All the thermal zones initially are to be configured with mode
"enabled" with the "step_wise" governor.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
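
The score formula can be tried out with a small Python sketch. It makes several assumptions that the message does not spell out: that j is the index t of the first trip point above T, that the divisions are integer divisions, that temperatures are non-negative, and that a zone above all trip points gets the maximum 32-bit value. The trip values in the example are illustrative, not the driver's defaults.

```python
SCORE_MAX = 0xFFFFFFFF  # assumption: score used when T is above every trip


def zone_score(temp: int, trips: list) -> int:
    """Score a zone from its temperature and ascending trip temperatures.

    trips is ordered [normal, high, hot, critical]; temp and trips share
    one unit (for example millidegrees Celsius) and temp is non-negative.
    """
    for j, trip in enumerate(trips):
        if temp < trip:
            delta = trip - temp
            # (T + delta / 2) / delta is a rounded integer division of T by delta.
            return ((temp + delta // 2) // delta) * 256 ** j
    return SCORE_MAX


trips = [75000, 85000, 105000, 110000]  # illustrative trip points
scores = {
    "zone_a": zone_score(60000, trips),  # below normal trip, far from it
    "zone_b": zone_score(70000, trips),  # below normal trip, close to it
    "zone_c": zone_score(80000, trips),  # already past the normal trip
}
hottest = max(scores, key=scores.get)
```

With these inputs the scores are 4, 14 and 4096, so the zone closer to its next trip point outranks the one further away, and the zone at the higher trip point outranks both.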

mlxsw: core: Extend thermal core with per inter-connect device thermal zones · f14f4e62

Authored by Vadim Pasternak on Jun 24, 2019

Add a dedicated thermal zone for each inter-connect device. The
current temperature is obtained from the inter-connect temperature sensor
and the default trip points are set to the same values as the default ASIC
trip points. These settings can be changed from user space.
A cooling device (fan) is bound to all inter-connect devices.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Jun, 2019 2 commits

xdp: tracking page_pool resources and safe removal · 99c07c43

Authored by Jesper Dangaard Brouer on Jun 18, 2019

This patch is needed before we can allow drivers to use page_pool for
DMA-mappings. Today with page_pool and the XDP return API, it is possible to
remove the page_pool object (from the rhashtable) while there are still
in-flight packet-pages. This is safely handled via RCU, and failed lookups in
__xdp_return() fall back to calling put_page() when the page_pool object is
gone. In case the page is still DMA mapped, this will result in the page not
getting correctly DMA unmapped.

To solve this, the page_pool is extended with tracking in-flight pages. And
XDP disconnect system queries page_pool and waits, via workqueue, for all
in-flight pages to be returned.

To avoid killing performance when tracking in-flight pages, the implementation
uses two (unsigned) counters that are placed on different cache-lines and
can be used to deduce the number of in-flight packets. This is done by mapping
the unsigned "sequence" counters onto signed two's complement arithmetic
operations. This is e.g. used by the kernel's time_after macros, described in
kernel commits 1ba3aab3 and 5a581b36, and also explained in RFC1982.

The trick is that these two incrementing counters only need to be read and
compared when checking if it's safe to free the page_pool structure, which
will only happen when the driver has disconnected the RX/alloc side. Thus, on
a non-fast-path.

It is chosen that page_pool tracking is also enabled for the non-DMA
use-case, as this can be used for statistics later.

After this patch, using page_pool requires more strict resource "release",
e.g. via page_pool_release_page() that was introduced in this patchset, and
previous patches implement/fix this more strict requirement.

Drivers no longer call page_pool_destroy(). Drivers already call
xdp_rxq_info_unreg(), which calls xdp_rxq_info_unreg_mem_model(), which will
attempt to disconnect the mem id and, if the attempt fails, schedule the
disconnect for later via a delayed workqueue.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
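
The two-counter trick can be modelled in a few lines of Python. The sketch emulates 32-bit unsigned counters and a signed 32-bit difference, in the spirit of the serial number arithmetic the message refers to; the counter and function names are illustrative and are not the page_pool field names.

```python
U32_MASK = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed two's complement number."""
    value &= U32_MASK
    return value - (1 << 32) if value & 0x80000000 else value


def inflight(hold_cnt: int, release_cnt: int) -> int:
    """Pages handed out minus pages returned, correct across counter wrap-around."""
    return to_signed32(hold_cnt - release_cnt)


# The "hold" counter has wrapped past 2**32 while "release" has not yet.
release_cnt = 0xFFFFFFFE
hold_cnt = (release_cnt + 4) & U32_MASK  # wraps around to 2
pending = inflight(hold_cnt, release_cnt)
```

Here `pending` is 4 even though `hold_cnt` is numerically smaller than `release_cnt`. Each counter is only ever incremented by its own side, and the subtraction happens only when checking whether the pool can be freed.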

mlx5: more strict use of page_pool API · 29b006a6

Authored by Jesper Dangaard Brouer on Jun 18, 2019

The mlx5 driver is using page_pool, but not for DMA-mapping (currently), and
is a little too relaxed about returning or releasing page resources, as it
is not strictly necessary when not using DMA-mappings.

As this patchset is working towards tracking page_pool resources, to know
about in-flight frames on shutdown, fix the places where mlx5 leaks
page_pool resources.

In case of dma_mapping_error, recycle the page into the page_pool.

In mlx5e_free_rq(), move the page_pool_destroy() call to after the
mlx5e_page_release() calls, as it is more correct.

In mlx5e_page_release(), when no recycle was requested, release the page
from the page_pool via page_pool_release_page().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel synced successfully almost 2 years ago
