Commit 513334e1, authored by David S. Miller

Merge branch 'mlx5-next'

Saeed Mahameed says:

====================
Mellanox 100G SRIOV E-Switch offload and VF representors

We are happy to announce SRIOV E-Switch offload and VF netdev representors.

Or Gerlitz says:

Currently, the way SR-IOV embedded switches are dealt with in Linux is limited
in its expressiveness and flexibility, but this is not necessarily due to
hardware limitations. The kernel software model for controlling the SR-IOV
switch simply does not allow the configuration of anything more complex than
MAC/VLAN based forwarding.

Hence the benefits brought by SRIOV come at a price of management flexibility,
when compared to software virtual switches which are used in Para-Virtual (PV)
schemes and allow implementing complex policies and virtual topologies. Such
SW switching typically involved a complex per-packet processing within the host
kernel using subsystems such as TC, Bridge, Netfilter and Open-vswitch.

We'd like to change that and get the best of both worlds: the performance of SR-IOV
with the management flexibility of software switches. This will eventually include
a richer model for controlling the SR-IOV switch for flow-based switching and
tunneling. Under this model, the e-switch is configured dynamically and a fallback
to software exists in case the hardware is unable to offload all required flows.

This series from Hadar Hen-Zion and myself is the 1st step in that direction.
Specifically, it provides full control of the SRIOV embedded switching by host
software and paves the way to offload switching rules and policies with downstream
patches.

To allow for host based SW control on the SRIOV HW switch, we introduce a per-VF
representor host netdevice. The VF representor plays the same role as TAP devices
in a PV setup. A packet sent through the VF representor on the host arrives at
the VF, and a packet sent through the VF is received by its representor. The
administrator can hook the representor netdev into a kernel switching component.
Once they do that, packets from the VF are subject to steering (matching and
actions) of that software component.

Doing so indeed hurts the performance benefits of SRIOV as it forces all the
traffic to go through the hypervisor. However, this SW representation is what
would eventually allow us to introduce a hybrid model, where we offload steering
for some of the VF/VM traffic to the HW while other VM traffic keeps going
through the hypervisor. Examples of the latter are the first packets of flows, which
are needed for SW switch learning and/or matching against a policy database, or
types of traffic for which offloading is not desired or not supported by the
current HW eswitch generation.

The embedded switch is managed through a PCI device driver. As such, we introduce
a devlink/pci based scheme for setting the mode of the e-switch. The current mode
(where steering is done based on mac/vlan, etc) is referred to as "legacy" and the
new mode as "offloads".

For the mlx5 driver / ConnectX4 HW case, the VF representors implement a functional
subset of mlx5e Ethernet netdevices using their own profile. This design buys us a robust
implementation with code reuse and sharing.

The representors are created by the host PCI driver when (1) in SRIOV and (2) the
e-switch is set to offloads mode. Currently, in mlx5 the e-switch management is done
through the PF vport (0) and hence the VF representors along with the existing PF
netdev which represents the uplink share the PCI PF device instance.
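
The mode selection and the representor creation condition described above can be sketched as follows. This is only a minimal model under stated assumptions: the names esw_mode, esw_mode_from_name, esw_should_create_reps and esw_num_reps are hypothetical and are not the driver's or devlink's definitions, and the mode strings simply follow the wording of this cover letter.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for illustration; not the driver's definitions. */
enum esw_mode {
	ESW_MODE_LEGACY,   /* steering based on MAC/VLAN */
	ESW_MODE_OFFLOADS, /* software-controlled switching via representors */
};

/* Translate a mode name into a mode. Returns 0 on success, -1 if unknown. */
int esw_mode_from_name(const char *name, enum esw_mode *mode)
{
	if (strcmp(name, "legacy") == 0) {
		*mode = ESW_MODE_LEGACY;
		return 0;
	}
	if (strcmp(name, "offloads") == 0) {
		*mode = ESW_MODE_OFFLOADS;
		return 0;
	}
	return -1;
}

/* Representors exist only when SRIOV is active and the mode is offloads. */
bool esw_should_create_reps(int nvfs, enum esw_mode mode)
{
	return nvfs > 0 && mode == ESW_MODE_OFFLOADS;
}

/* This model assumes one representor per VF. */
int esw_num_reps(int nvfs, enum esw_mode mode)
{
	return esw_should_create_reps(nvfs, mode) ? nvfs : 0;
}
```

With four VFs, esw_num_reps() yields four representors in offloads mode and none in legacy mode, which mirrors conditions (1) and (2) above.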

The series is built from two major components, the first relates to the e-switch
management and the second to VF representors.

We start with a refactoring that treats the existing SRIOV e-switch code as operating
in legacy mode. Next, we add the code for the offloads mode which programs the e-switch
to operate in a way which serves for software based switching:

1. miss rule which matches all packets that do not match any other HW switching rule
and forwards them to the e-switch management port (0) for further processing.

2. infrastructure for send-to-vport rules which conceptually bypass other "normal"
steering rules which are present in the e-switch datapath. Such rules apply only for packets
that originate in the e-switch manager vport (0).
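
The interplay of the two rule kinds above can be illustrated with a toy lookup function. This is a sketch under assumptions, not the driver's flow steering code: the structures, the flow_id field standing in for a generic match, and the function fdb_forward are all hypothetical, and the only facts taken from the text are the lookup order, the restriction of send-to-vport rules to packets from vport 0, and the miss destination.

```c
#include <stddef.h>
#include <stdint.h>

#define MGMT_VPORT 0 /* e-switch manager vport, as stated in the text */

/* Hypothetical packet metadata; flow_id stands in for any match criteria. */
struct pkt {
	uint16_t src_vport; /* vport the packet entered the e-switch from */
	uint32_t sqn;       /* send queue number, relevant for vport 0 only */
	uint32_t flow_id;
};

struct send_to_vport_rule {
	uint32_t sqn;       /* send queue of a representor */
	uint16_t dst_vport; /* VF vport that representor stands for */
};

struct switching_rule {
	uint16_t src_vport;
	uint32_t flow_id;
	uint16_t dst_vport;
};

struct fdb {
	const struct send_to_vport_rule *s2v;
	size_t n_s2v;
	const struct switching_rule *rules;
	size_t n_rules;
};

/* Return the destination vport for a packet in this toy model. */
uint16_t fdb_forward(const struct fdb *fdb, const struct pkt *p)
{
	size_t i;

	/* Send-to-vport rules bypass the normal rules, but apply only to
	 * packets that originate in the manager vport.
	 */
	if (p->src_vport == MGMT_VPORT) {
		for (i = 0; i < fdb->n_s2v; i++)
			if (fdb->s2v[i].sqn == p->sqn)
				return fdb->s2v[i].dst_vport;
	}

	/* "Normal" switching rules. */
	for (i = 0; i < fdb->n_rules; i++)
		if (fdb->rules[i].src_vport == p->src_vport &&
		    fdb->rules[i].flow_id == p->flow_id)
			return fdb->rules[i].dst_vport;

	/* Miss rule: everything unmatched goes to the manager vport. */
	return MGMT_VPORT;
}
```

In this model a packet from a VF that matches no rule lands on vport 0, and a packet sent on a representor's send queue is delivered straight to the VF it represents, regardless of the other rules.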

Since all the VF reps run over the same e-switch port, we use more logic in the host PCI
driver to do HW steering of missed packets into the HW queue opened by the respective VF
representor. Finally here, we add the devlink APIs to configure the e-switch mode.

The second part from Hadar starts with some refactoring work which allows for multiple
mlx5e NIC instances to be created over the same PCI function, use common resources
and avoid wrong loopbacks.

Next comes the heart of the change, which is a profile definition that allows both the
"conventional" mlx5e NIC use cases, such as native mode (non SRIOV), VF and PF, and the VF
representor to share the Ethernet driver code. This is done by a small surgery that ended up
with a few internal callbacks that should be implemented by a profile instance. The profile
for the conventional NIC is implemented, to preserve the existing functionality.
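
The profile idea can be sketched with a much smaller callback table than the one the series adds. The following is a simplified stand-in, assuming hypothetical names (nic_priv, nic_profile, nic_attach) and an arbitrary channel count of 8 for the conventional profile; only the single channel of the representor profile and the general shape of the callbacks come from the patches.

```c
#include <stddef.h>

/* Hypothetical private state shared by every profile. */
struct nic_priv {
	int num_channels;
	int rx_ready;
	int tx_ready;
};

/* Callbacks a profile instance has to implement in this sketch. */
struct nic_profile {
	int  (*init_rx)(struct nic_priv *priv);
	void (*cleanup_rx)(struct nic_priv *priv);
	int  (*init_tx)(struct nic_priv *priv);
	void (*cleanup_tx)(struct nic_priv *priv);
	int  (*max_nch)(void);
};

static int common_init_rx(struct nic_priv *priv) { priv->rx_ready = 1; return 0; }
static void common_cleanup_rx(struct nic_priv *priv) { priv->rx_ready = 0; }
static int common_init_tx(struct nic_priv *priv) { priv->tx_ready = 1; return 0; }
static void common_cleanup_tx(struct nic_priv *priv) { priv->tx_ready = 0; }

static int rep_max_nch(void) { return 1; }  /* representor: one channel */
static int conv_max_nch(void) { return 8; } /* arbitrary value for the sketch */

const struct nic_profile rep_profile = {
	.init_rx = common_init_rx, .cleanup_rx = common_cleanup_rx,
	.init_tx = common_init_tx, .cleanup_tx = common_cleanup_tx,
	.max_nch = rep_max_nch,
};

const struct nic_profile conv_profile = {
	.init_rx = common_init_rx, .cleanup_rx = common_cleanup_rx,
	.init_tx = common_init_tx, .cleanup_tx = common_cleanup_tx,
	.max_nch = conv_max_nch,
};

/* Shared bring-up code: the same flow runs for every profile. */
int nic_attach(struct nic_priv *priv, const struct nic_profile *profile)
{
	int err;

	priv->num_channels = profile->max_nch();

	err = profile->init_tx(priv);
	if (err)
		return err;

	err = profile->init_rx(priv);
	if (err) {
		/* Unwind TX so a failed attach leaves no resources behind. */
		profile->cleanup_tx(priv);
		return err;
	}
	return 0;
}
```

The point of the design is that nic_attach() never needs to know which kind of netdevice it is bringing up; the differences live entirely in the profile instance handed to it.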

The last two patches add e-switch registration API for the VF representors and the
implementation of the VF representors netdevice profile. Being an mlx5e instance, the
VF representor uses HW send/recv queues, completion queues and such. It currently doesn't
support NIC offloads but some of them could be added later on. The VF representor has
switchdev ops, where currently the only supported API is the one that gets the HW ID,
which is needed to identify multiple representors belonging to the same e-switch.
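
As an illustration of how such a HW ID can be used, the sketch below compares the IDs reported by two ports to decide whether they belong to the same e-switch. The type port_parent_id, the 32-byte bound and both helper functions are assumptions made for this example; in the patches the ID is filled from the MAC address of vport 0, which is what ppid_from_mac() imitates.

```c
#include <stdbool.h>
#include <string.h>

#define PPID_MAX_LEN 32 /* assumed upper bound for this sketch */
#define MAC_LEN 6

/* Hypothetical container for a port's parent (switch) ID. */
struct port_parent_id {
	unsigned char id[PPID_MAX_LEN];
	unsigned char id_len;
};

/* Build a parent ID from a 6-byte MAC address. */
void ppid_from_mac(struct port_parent_id *ppid, const unsigned char *mac)
{
	memset(ppid, 0, sizeof(*ppid));
	memcpy(ppid->id, mac, MAC_LEN);
	ppid->id_len = MAC_LEN;
}

/* Two ports sit on the same e-switch when their parent IDs are equal. */
bool same_eswitch(const struct port_parent_id *a,
		  const struct port_parent_id *b)
{
	return a->id_len == b->id_len &&
	       memcmp(a->id, b->id, a->id_len) == 0;
}
```

A management tool could apply same_eswitch() to the IDs of all representors it sees in order to group them per e-switch.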

The architecture + solution (software and firmware) work was done by a team consisting
of Ilya Lesokhin, Haggai Eran, Rony Efraim, Tal Anker, Natan Oppenheimer, Saeed Mahameed,
Hadar and Or, thank you all!

v1 --> v2 fixes:
* removed unneeded variable (patch #3)
* removed unused value DEVLINK_ESWITCH_MODE_NONE (patch #8)
* changed the devlink mode name from "offloads" to "switchdev" which
   better describes what we are referring to here, using a known concept (patch #8)
* correctly refer to devlink e-switch modes (patch #10)
* use the correct mlx5e way to define the VF rep statistics  (patch #16)

v2 --> v3 fixes:
* Rebased on top of 6fde0e63 'be2net: signedness bug in be_msix_enable()'
* Handled compilation error introduced by rebase on top "f5074d0c Merge branch 'mlx5-100G-fixes'"
* This series applies perfectly even with 'mlx5 resiliency and xmit path fixes' merged to net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
......@@ -4,6 +4,7 @@
config MLX5_CORE
tristate "Mellanox Technologies ConnectX-4 and Connect-IB core driver"
depends on MAY_USE_DEVLINK
depends on PCI
default n
---help---
......
......@@ -5,9 +5,9 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
fs_counters.o rl.o
mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o \
en_main.o en_fs.o en_ethtool.o en_tx.o en_rx.o \
en_rx_am.o en_txrx.o en_clock.o vxlan.o en_tc.o \
en_arfs.o
mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \
en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \
en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
en_tc.o en_arfs.o en_rep.o
mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o
......@@ -44,6 +44,7 @@
#include <linux/mlx5/vport.h>
#include <linux/mlx5/transobj.h>
#include <linux/rhashtable.h>
#include <net/switchdev.h>
#include "wq.h"
#include "mlx5_core.h"
#include "en_stats.h"
......@@ -552,9 +553,15 @@ struct mlx5e_flow_steering {
struct mlx5e_arfs_tables arfs;
};
struct mlx5e_direct_tir {
u32 tirn;
struct mlx5e_rqt {
u32 rqtn;
bool enabled;
};
struct mlx5e_tir {
u32 tirn;
struct mlx5e_rqt rqt;
struct list_head list;
};
enum {
......@@ -562,6 +569,22 @@ enum {
MLX5E_NIC_PRIO
};
struct mlx5e_profile {
void (*init)(struct mlx5_core_dev *mdev,
struct net_device *netdev,
const struct mlx5e_profile *profile, void *ppriv);
void (*cleanup)(struct mlx5e_priv *priv);
int (*init_rx)(struct mlx5e_priv *priv);
void (*cleanup_rx)(struct mlx5e_priv *priv);
int (*init_tx)(struct mlx5e_priv *priv);
void (*cleanup_tx)(struct mlx5e_priv *priv);
void (*enable)(struct mlx5e_priv *priv);
void (*disable)(struct mlx5e_priv *priv);
void (*update_stats)(struct mlx5e_priv *priv);
int (*max_nch)(struct mlx5_core_dev *mdev);
int max_tc;
};
struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_sq **txq_to_sq_map;
......@@ -570,18 +593,14 @@ struct mlx5e_priv {
unsigned long state;
struct mutex state_lock; /* Protects Interface state */
struct mlx5_uar cq_uar;
u32 pdn;
u32 tdn;
struct mlx5_core_mkey mkey;
struct mlx5_core_mkey umr_mkey;
struct mlx5e_rq drop_rq;
struct mlx5e_channel **channel;
u32 tisn[MLX5E_MAX_NUM_TC];
u32 indir_rqtn;
u32 indir_tirn[MLX5E_NUM_INDIR_TIRS];
struct mlx5e_direct_tir direct_tir[MLX5E_MAX_NUM_CHANNELS];
struct mlx5e_rqt indir_rqt;
struct mlx5e_tir indir_tir[MLX5E_NUM_INDIR_TIRS];
struct mlx5e_tir direct_tir[MLX5E_MAX_NUM_CHANNELS];
u32 tx_rates[MLX5E_MAX_NUM_SQS];
struct mlx5e_flow_steering fs;
......@@ -599,6 +618,8 @@ struct mlx5e_priv {
struct mlx5e_stats stats;
struct mlx5e_tstamp tstamp;
u16 q_counter;
const struct mlx5e_profile *profile;
void *ppriv;
};
enum mlx5e_link_mode {
......@@ -788,5 +809,39 @@ int mlx5e_rx_flow_steer(struct net_device *dev, const struct sk_buff *skb,
#endif
u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev);
int mlx5e_create_tir(struct mlx5_core_dev *mdev,
struct mlx5e_tir *tir, u32 *in, int inlen);
void mlx5e_destroy_tir(struct mlx5_core_dev *mdev,
struct mlx5e_tir *tir);
int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev);
void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev);
int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5_core_dev *mdev);
struct mlx5_eswitch_rep;
int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep);
void mlx5e_vport_rep_unload(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep);
int mlx5e_nic_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep);
void mlx5e_nic_rep_unload(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep);
int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv);
void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv);
int mlx5e_attr_get(struct net_device *dev, struct switchdev_attr *attr);
int mlx5e_create_direct_rqts(struct mlx5e_priv *priv);
void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt);
int mlx5e_create_direct_tirs(struct mlx5e_priv *priv);
void mlx5e_destroy_direct_tirs(struct mlx5e_priv *priv);
int mlx5e_create_tises(struct mlx5e_priv *priv);
void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv);
int mlx5e_close(struct net_device *netdev);
int mlx5e_open(struct net_device *netdev);
void mlx5e_update_stats_work(struct work_struct *work);
void *mlx5e_create_netdev(struct mlx5_core_dev *mdev,
const struct mlx5e_profile *profile, void *ppriv);
void mlx5e_destroy_netdev(struct mlx5_core_dev *mdev, struct mlx5e_priv *priv);
struct rtnl_link_stats64 *
mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats);
#endif /* __MLX5_EN_H__ */
......@@ -93,14 +93,14 @@ static enum mlx5e_traffic_types arfs_get_tt(enum arfs_type type)
static int arfs_disable(struct mlx5e_priv *priv)
{
struct mlx5_flow_destination dest;
u32 *tirn = priv->indir_tirn;
struct mlx5e_tir *tir = priv->indir_tir;
int err = 0;
int tt;
int i;
dest.type = MLX5_FLOW_DESTINATION_TYPE_TIR;
for (i = 0; i < ARFS_NUM_TYPES; i++) {
dest.tir_num = tirn[i];
dest.tir_num = tir[i].tirn;
tt = arfs_get_tt(i);
/* Modify ttc rules destination to bypass the aRFS tables*/
err = mlx5_modify_rule_destination(priv->fs.ttc.rules[tt],
......@@ -176,7 +176,7 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv,
struct arfs_table *arfs_t = &priv->fs.arfs.arfs_tables[type];
struct mlx5_flow_destination dest;
u8 match_criteria_enable = 0;
u32 *tirn = priv->indir_tirn;
struct mlx5e_tir *tir = priv->indir_tir;
u32 *match_criteria;
u32 *match_value;
int err = 0;
......@@ -192,16 +192,16 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv,
dest.type = MLX5_FLOW_DESTINATION_TYPE_TIR;
switch (type) {
case ARFS_IPV4_TCP:
dest.tir_num = tirn[MLX5E_TT_IPV4_TCP];
dest.tir_num = tir[MLX5E_TT_IPV4_TCP].tirn;
break;
case ARFS_IPV4_UDP:
dest.tir_num = tirn[MLX5E_TT_IPV4_UDP];
dest.tir_num = tir[MLX5E_TT_IPV4_UDP].tirn;
break;
case ARFS_IPV6_TCP:
dest.tir_num = tirn[MLX5E_TT_IPV6_TCP];
dest.tir_num = tir[MLX5E_TT_IPV6_TCP].tirn;
break;
case ARFS_IPV6_UDP:
dest.tir_num = tirn[MLX5E_TT_IPV6_UDP];
dest.tir_num = tir[MLX5E_TT_IPV6_UDP].tirn;
break;
default:
err = -EINVAL;
......
/*
* Copyright (c) 2016, Mellanox Technologies. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include "en.h"
/* mlx5e global resources should be placed in this file.
* Global resources are common to all the netdevices created on the same nic.
*/
int mlx5e_create_tir(struct mlx5_core_dev *mdev,
struct mlx5e_tir *tir, u32 *in, int inlen)
{
int err;
err = mlx5_core_create_tir(mdev, in, inlen, &tir->tirn);
if (err)
return err;
list_add(&tir->list, &mdev->mlx5e_res.td.tirs_list);
return 0;
}
void mlx5e_destroy_tir(struct mlx5_core_dev *mdev,
struct mlx5e_tir *tir)
{
mlx5_core_destroy_tir(mdev, tir->tirn);
list_del(&tir->list);
}
static int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn,
struct mlx5_core_mkey *mkey)
{
struct mlx5_create_mkey_mbox_in *in;
int err;
in = mlx5_vzalloc(sizeof(*in));
if (!in)
return -ENOMEM;
in->seg.flags = MLX5_PERM_LOCAL_WRITE |
MLX5_PERM_LOCAL_READ |
MLX5_ACCESS_MODE_PA;
in->seg.flags_pd = cpu_to_be32(pdn | MLX5_MKEY_LEN64);
in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
err = mlx5_core_create_mkey(mdev, mkey, in, sizeof(*in), NULL, NULL,
NULL);
kvfree(in);
return err;
}
int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev)
{
struct mlx5e_resources *res = &mdev->mlx5e_res;
int err;
err = mlx5_alloc_map_uar(mdev, &res->cq_uar, false);
if (err) {
mlx5_core_err(mdev, "alloc_map uar failed, %d\n", err);
return err;
}
err = mlx5_core_alloc_pd(mdev, &res->pdn);
if (err) {
mlx5_core_err(mdev, "alloc pd failed, %d\n", err);
goto err_unmap_free_uar;
}
err = mlx5_core_alloc_transport_domain(mdev, &res->td.tdn);
if (err) {
mlx5_core_err(mdev, "alloc td failed, %d\n", err);
goto err_dealloc_pd;
}
err = mlx5e_create_mkey(mdev, res->pdn, &res->mkey);
if (err) {
mlx5_core_err(mdev, "create mkey failed, %d\n", err);
goto err_dealloc_transport_domain;
}
INIT_LIST_HEAD(&mdev->mlx5e_res.td.tirs_list);
return 0;
err_dealloc_transport_domain:
mlx5_core_dealloc_transport_domain(mdev, res->td.tdn);
err_dealloc_pd:
mlx5_core_dealloc_pd(mdev, res->pdn);
err_unmap_free_uar:
mlx5_unmap_free_uar(mdev, &res->cq_uar);
return err;
}
void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev)
{
struct mlx5e_resources *res = &mdev->mlx5e_res;
mlx5_core_destroy_mkey(mdev, &res->mkey);
mlx5_core_dealloc_transport_domain(mdev, res->td.tdn);
mlx5_core_dealloc_pd(mdev, res->pdn);
mlx5_unmap_free_uar(mdev, &res->cq_uar);
}
int mlx5e_refresh_tirs_self_loopback_enable(struct mlx5_core_dev *mdev)
{
struct mlx5e_tir *tir;
void *in;
int inlen;
int err = 0;
inlen = MLX5_ST_SZ_BYTES(modify_tir_in);
in = mlx5_vzalloc(inlen);
if (!in)
return -ENOMEM;
MLX5_SET(modify_tir_in, in, bitmask.self_lb_en, 1);
list_for_each_entry(tir, &mdev->mlx5e_res.td.tirs_list, list) {
err = mlx5_core_modify_tir(mdev, tir->tirn, in, inlen);
if (err)
goto out;
}
out:
/* Free the command buffer on both the success and the error path */
kvfree(in);
return err;
}
......@@ -876,7 +876,7 @@ static void mlx5e_modify_tirs_hash(struct mlx5e_priv *priv, void *in, int inlen)
mlx5e_build_tir_ctx_hash(tirc, priv);
for (i = 0; i < MLX5E_NUM_INDIR_TIRS; i++)
mlx5_core_modify_tir(mdev, priv->indir_tirn[i], in, inlen);
mlx5_core_modify_tir(mdev, priv->indir_tir[i].tirn, in, inlen);
}
static int mlx5e_set_rxfh(struct net_device *dev, const u32 *indir,
......@@ -898,7 +898,7 @@ static int mlx5e_set_rxfh(struct net_device *dev, const u32 *indir,
mutex_lock(&priv->state_lock);
if (indir) {
u32 rqtn = priv->indir_rqtn;
u32 rqtn = priv->indir_rqt.rqtn;
memcpy(priv->params.indirection_rqt, indir,
sizeof(priv->params.indirection_rqt));
......
......@@ -655,7 +655,7 @@ static int mlx5e_generate_ttc_table_rules(struct mlx5e_priv *priv)
if (tt == MLX5E_TT_ANY)
dest.tir_num = priv->direct_tir[0].tirn;
else
dest.tir_num = priv->indir_tirn[tt];
dest.tir_num = priv->indir_tir[tt].tirn;
rules[tt] = mlx5e_generate_ttc_rule(priv, ft, &dest,
ttc_rules[tt].etype,
ttc_rules[tt].proto);
......
/*
* Copyright (c) 2016, Mellanox Technologies. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include <generated/utsrelease.h>
#include <linux/mlx5/fs.h>
#include <net/switchdev.h>
#include "eswitch.h"
#include "en.h"
static const char mlx5e_rep_driver_name[] = "mlx5e_rep";
static void mlx5e_rep_get_drvinfo(struct net_device *dev,
struct ethtool_drvinfo *drvinfo)
{
strlcpy(drvinfo->driver, mlx5e_rep_driver_name,
sizeof(drvinfo->driver));
strlcpy(drvinfo->version, UTS_RELEASE, sizeof(drvinfo->version));
}
static const struct counter_desc sw_rep_stats_desc[] = {
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_packets) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_bytes) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_packets) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, tx_bytes) },
};
#define NUM_VPORT_REP_COUNTERS ARRAY_SIZE(sw_rep_stats_desc)
static void mlx5e_rep_get_strings(struct net_device *dev,
u32 stringset, uint8_t *data)
{
int i;
switch (stringset) {
case ETH_SS_STATS:
for (i = 0; i < NUM_VPORT_REP_COUNTERS; i++)
strcpy(data + (i * ETH_GSTRING_LEN),
sw_rep_stats_desc[i].format);
break;
}
}
static void mlx5e_update_sw_rep_counters(struct mlx5e_priv *priv)
{
struct mlx5e_sw_stats *s = &priv->stats.sw;
struct mlx5e_rq_stats *rq_stats;
struct mlx5e_sq_stats *sq_stats;
int i, j;
memset(s, 0, sizeof(*s));
for (i = 0; i < priv->params.num_channels; i++) {
rq_stats = &priv->channel[i]->rq.stats;
s->rx_packets += rq_stats->packets;
s->rx_bytes += rq_stats->bytes;
for (j = 0; j < priv->params.num_tc; j++) {
sq_stats = &priv->channel[i]->sq[j].stats;
s->tx_packets += sq_stats->packets;
s->tx_bytes += sq_stats->bytes;
}
}
}
static void mlx5e_rep_get_ethtool_stats(struct net_device *dev,
struct ethtool_stats *stats, u64 *data)
{
struct mlx5e_priv *priv = netdev_priv(dev);
int i;
if (!data)
return;
mutex_lock(&priv->state_lock);
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_update_sw_rep_counters(priv);
mutex_unlock(&priv->state_lock);
for (i = 0; i < NUM_VPORT_REP_COUNTERS; i++)
data[i] = MLX5E_READ_CTR64_CPU(&priv->stats.sw,
sw_rep_stats_desc, i);
}
static int mlx5e_rep_get_sset_count(struct net_device *dev, int sset)
{
switch (sset) {
case ETH_SS_STATS:
return NUM_VPORT_REP_COUNTERS;
default:
return -EOPNOTSUPP;
}
}
static const struct ethtool_ops mlx5e_rep_ethtool_ops = {
.get_drvinfo = mlx5e_rep_get_drvinfo,
.get_link = ethtool_op_get_link,
.get_strings = mlx5e_rep_get_strings,
.get_sset_count = mlx5e_rep_get_sset_count,
.get_ethtool_stats = mlx5e_rep_get_ethtool_stats,
};
int mlx5e_attr_get(struct net_device *dev, struct switchdev_attr *attr)
{
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
u8 mac[ETH_ALEN];
if (esw->mode == SRIOV_NONE)
return -EOPNOTSUPP;
switch (attr->id) {
case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
mlx5_query_nic_vport_mac_address(priv->mdev, 0, mac);
attr->u.ppid.id_len = ETH_ALEN;
memcpy(&attr->u.ppid.id, &mac, ETH_ALEN);
break;
default:
return -EOPNOTSUPP;
}
return 0;
}
int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv)
{
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
struct mlx5_eswitch_rep *rep = priv->ppriv;
struct mlx5e_channel *c;
int n, tc, err, num_sqs = 0;
u16 *sqs;
sqs = kcalloc(priv->params.num_channels * priv->params.num_tc, sizeof(u16), GFP_KERNEL);
if (!sqs)
return -ENOMEM;
for (n = 0; n < priv->params.num_channels; n++) {
c = priv->channel[n];
for (tc = 0; tc < c->num_tc; tc++)
sqs[num_sqs++] = c->sq[tc].sqn;
}
err = mlx5_eswitch_sqs2vport_start(esw, rep, sqs, num_sqs);
kfree(sqs);
return err;
}
int mlx5e_nic_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
{
struct mlx5e_priv *priv = rep->priv_data;
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
return mlx5e_add_sqs_fwd_rules(priv);
return 0;
}
void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv)
{
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
struct mlx5_eswitch_rep *rep = priv->ppriv;
mlx5_eswitch_sqs2vport_stop(esw, rep);
}
void mlx5e_nic_rep_unload(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep)
{
struct mlx5e_priv *priv = rep->priv_data;
if (test_bit(MLX5E_STATE_OPENED, &priv->state))
mlx5e_remove_sqs_fwd_rules(priv);
}
static int mlx5e_rep_get_phys_port_name(struct net_device *dev,
char *buf, size_t len)
{
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_eswitch_rep *rep = priv->ppriv;
int ret;
ret = snprintf(buf, len, "%d", rep->vport - 1);
if (ret >= len)
return -EOPNOTSUPP;
return 0;
}
static const struct switchdev_ops mlx5e_rep_switchdev_ops = {
.switchdev_port_attr_get = mlx5e_attr_get,
};
static const struct net_device_ops mlx5e_netdev_ops_rep = {
.ndo_open = mlx5e_open,
.ndo_stop = mlx5e_close,
.ndo_start_xmit = mlx5e_xmit,
.ndo_get_phys_port_name = mlx5e_rep_get_phys_port_name,
.ndo_get_stats64 = mlx5e_get_stats,
};
static void mlx5e_build_rep_netdev_priv(struct mlx5_core_dev *mdev,
struct net_device *netdev,
const struct mlx5e_profile *profile,
void *ppriv)
{
struct mlx5e_priv *priv = netdev_priv(netdev);
u8 cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
priv->params.log_sq_size =
MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE;
priv->params.rq_wq_type = MLX5_WQ_TYPE_LINKED_LIST;
priv->params.log_rq_size = MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE;
priv->params.min_rx_wqes = mlx5_min_rx_wqes(priv->params.rq_wq_type,
BIT(priv->params.log_rq_size));
priv->params.rx_am_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
mlx5e_set_rx_cq_mode_params(&priv->params, cq_period_mode);
priv->params.tx_max_inline = mlx5e_get_max_inline_cap(mdev);
priv->params.num_tc = 1;
priv->params.lro_wqe_sz =
MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ;
priv->mdev = mdev;
priv->netdev = netdev;
priv->params.num_channels = profile->max_nch(mdev);
priv->profile = profile;
priv->ppriv = ppriv;
mutex_init(&priv->state_lock);
INIT_DELAYED_WORK(&priv->update_stats_work, mlx5e_update_stats_work);
}
static void mlx5e_build_rep_netdev(struct net_device *netdev)
{
netdev->netdev_ops = &mlx5e_netdev_ops_rep;
netdev->watchdog_timeo = 15 * HZ;
netdev->ethtool_ops = &mlx5e_rep_ethtool_ops;
#ifdef CONFIG_NET_SWITCHDEV
netdev->switchdev_ops = &mlx5e_rep_switchdev_ops;
#endif
netdev->features |= NETIF_F_VLAN_CHALLENGED;
eth_hw_addr_random(netdev);
}
static void mlx5e_init_rep(struct mlx5_core_dev *mdev,
struct net_device *netdev,
const struct mlx5e_profile *profile,
void *ppriv)
{
mlx5e_build_rep_netdev_priv(mdev, netdev, profile, ppriv);
mlx5e_build_rep_netdev(netdev);
}
static int mlx5e_init_rep_rx(struct mlx5e_priv *priv)
{
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
struct mlx5_eswitch_rep *rep = priv->ppriv;
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5_flow_rule *flow_rule;
int err;
int i;
err = mlx5e_create_direct_rqts(priv);
if (err) {
mlx5_core_warn(mdev, "create direct rqts failed, %d\n", err);
return err;
}
err = mlx5e_create_direct_tirs(priv);
if (err) {
mlx5_core_warn(mdev, "create direct tirs failed, %d\n", err);
goto err_destroy_direct_rqts;
}
flow_rule = mlx5_eswitch_create_vport_rx_rule(esw,
rep->vport,
priv->direct_tir[0].tirn);
if (IS_ERR(flow_rule)) {
err = PTR_ERR(flow_rule);
goto err_destroy_direct_tirs;
}
rep->vport_rx_rule = flow_rule;
return 0;
err_destroy_direct_tirs:
mlx5e_destroy_direct_tirs(priv);
err_destroy_direct_rqts:
for (i = 0; i < priv->params.num_channels; i++)
mlx5e_destroy_rqt(priv, &priv->direct_tir[i].rqt);
return err;
}
static void mlx5e_cleanup_rep_rx(struct mlx5e_priv *priv)
{
struct mlx5_eswitch_rep *rep = priv->ppriv;
int i;
mlx5_del_flow_rule(rep->vport_rx_rule);
mlx5e_destroy_direct_tirs(priv);
for (i = 0; i < priv->params.num_channels; i++)
mlx5e_destroy_rqt(priv, &priv->direct_tir[i].rqt);
}
static int mlx5e_init_rep_tx(struct mlx5e_priv *priv)
{
int err;
err = mlx5e_create_tises(priv);
if (err) {
mlx5_core_warn(priv->mdev, "create tises failed, %d\n", err);
return err;
}
return 0;
}
static int mlx5e_get_rep_max_num_channels(struct mlx5_core_dev *mdev)
{
#define MLX5E_PORT_REPRESENTOR_NCH 1
return MLX5E_PORT_REPRESENTOR_NCH;
}
static struct mlx5e_profile mlx5e_rep_profile = {
.init = mlx5e_init_rep,
.init_rx = mlx5e_init_rep_rx,
.cleanup_rx = mlx5e_cleanup_rep_rx,
.init_tx = mlx5e_init_rep_tx,
.cleanup_tx = mlx5e_cleanup_nic_tx,
.update_stats = mlx5e_update_sw_rep_counters,
.max_nch = mlx5e_get_rep_max_num_channels,
.max_tc = 1,
};
int mlx5e_vport_rep_load(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep)
{
rep->priv_data = mlx5e_create_netdev(esw->dev, &mlx5e_rep_profile, rep);
if (!rep->priv_data) {
pr_warn("Failed to create representor for vport %d\n",
rep->vport);
return -EINVAL;
}
return 0;
}
void mlx5e_vport_rep_unload(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep)
{
struct mlx5e_priv *priv = rep->priv_data;
mlx5e_destroy_netdev(esw->dev, priv);
}
......@@ -40,17 +40,6 @@
#define UPLINK_VPORT 0xFFFF
#define MLX5_DEBUG_ESWITCH_MASK BIT(3)
#define esw_info(dev, format, ...) \
pr_info("(%s): E-Switch: " format, (dev)->priv.name, ##__VA_ARGS__)
#define esw_warn(dev, format, ...) \
pr_warn("(%s): E-Switch: " format, (dev)->priv.name, ##__VA_ARGS__)
#define esw_debug(dev, format, ...) \
mlx5_core_dbg_mask(dev, MLX5_DEBUG_ESWITCH_MASK, format, ##__VA_ARGS__)
enum {
MLX5_ACTION_NONE = 0,
MLX5_ACTION_ADD = 1,
......@@ -92,6 +81,9 @@ enum {
MC_ADDR_CHANGE | \
PROMISC_CHANGE)
int esw_offloads_init(struct mlx5_eswitch *esw, int nvports);
void esw_offloads_cleanup(struct mlx5_eswitch *esw, int nvports);
static int arm_vport_context_events_cmd(struct mlx5_core_dev *dev, u16 vport,
u32 events_mask)
{
......@@ -428,7 +420,7 @@ esw_fdb_set_vport_promisc_rule(struct mlx5_eswitch *esw, u32 vport)
return __esw_fdb_set_vport_rule(esw, vport, true, mac_c, mac_v);
}
static int esw_create_fdb_table(struct mlx5_eswitch *esw, int nvports)
static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw, int nvports)
{
int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
struct mlx5_core_dev *dev = esw->dev;
......@@ -479,7 +471,7 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, int nvports)
esw_warn(dev, "Failed to create flow group err(%d)\n", err);
goto out;
}
esw->fdb_table.addr_grp = g;
esw->fdb_table.legacy.addr_grp = g;
/* Allmulti group : One rule that forwards any mcast traffic */
MLX5_SET(create_flow_group_in, flow_group_in, match_criteria_enable,
......@@ -494,7 +486,7 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, int nvports)
esw_warn(dev, "Failed to create allmulti flow group err(%d)\n", err);
goto out;
}
esw->fdb_table.allmulti_grp = g;
esw->fdb_table.legacy.allmulti_grp = g;
/* Promiscuous group :
* One rule that forward all unmatched traffic from previous groups
......@@ -511,17 +503,17 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, int nvports)
esw_warn(dev, "Failed to create promisc flow group err(%d)\n", err);
goto out;
}
esw->fdb_table.promisc_grp = g;
esw->fdb_table.legacy.promisc_grp = g;
out:
if (err) {
if (!IS_ERR_OR_NULL(esw->fdb_table.allmulti_grp)) {
mlx5_destroy_flow_group(esw->fdb_table.allmulti_grp);
esw->fdb_table.allmulti_grp = NULL;
if (!IS_ERR_OR_NULL(esw->fdb_table.legacy.allmulti_grp)) {
mlx5_destroy_flow_group(esw->fdb_table.legacy.allmulti_grp);
esw->fdb_table.legacy.allmulti_grp = NULL;
}
if (!IS_ERR_OR_NULL(esw->fdb_table.addr_grp)) {
mlx5_destroy_flow_group(esw->fdb_table.addr_grp);
esw->fdb_table.addr_grp = NULL;
if (!IS_ERR_OR_NULL(esw->fdb_table.legacy.addr_grp)) {
mlx5_destroy_flow_group(esw->fdb_table.legacy.addr_grp);
esw->fdb_table.legacy.addr_grp = NULL;
}
if (!IS_ERR_OR_NULL(esw->fdb_table.fdb)) {
mlx5_destroy_flow_table(esw->fdb_table.fdb);
......@@ -533,20 +525,20 @@ static int esw_create_fdb_table(struct mlx5_eswitch *esw, int nvports)
return err;
}
static void esw_destroy_fdb_table(struct mlx5_eswitch *esw)
static void esw_destroy_legacy_fdb_table(struct mlx5_eswitch *esw)
{
if (!esw->fdb_table.fdb)
return;
esw_debug(esw->dev, "Destroy FDB Table\n");
mlx5_destroy_flow_group(esw->fdb_table.promisc_grp);
mlx5_destroy_flow_group(esw->fdb_table.allmulti_grp);
mlx5_destroy_flow_group(esw->fdb_table.addr_grp);
mlx5_destroy_flow_group(esw->fdb_table.legacy.promisc_grp);
mlx5_destroy_flow_group(esw->fdb_table.legacy.allmulti_grp);
mlx5_destroy_flow_group(esw->fdb_table.legacy.addr_grp);
mlx5_destroy_flow_table(esw->fdb_table.fdb);
esw->fdb_table.fdb = NULL;
esw->fdb_table.addr_grp = NULL;
esw->fdb_table.allmulti_grp = NULL;
esw->fdb_table.promisc_grp = NULL;
esw->fdb_table.legacy.addr_grp = NULL;
esw->fdb_table.legacy.allmulti_grp = NULL;
esw->fdb_table.legacy.promisc_grp = NULL;
}
/* E-Switch vport UC/MC lists management */
......@@ -578,7 +570,8 @@ static int esw_add_uc_addr(struct mlx5_eswitch *esw, struct vport_addr *vaddr)
if (err)
goto abort;
-if (esw->fdb_table.fdb) /* SRIOV is enabled: Forward UC MAC to vport */
+/* SRIOV is enabled: Forward UC MAC to vport */
+if (esw->fdb_table.fdb && esw->mode == SRIOV_LEGACY)
 vaddr->flow_rule = esw_fdb_set_vport_rule(esw, mac, vport);
esw_debug(esw->dev, "\tADDED UC MAC: vport[%d] %pM index:%d fr(%p)\n",
......@@ -1540,10 +1533,10 @@ static void esw_disable_vport(struct mlx5_eswitch *esw, int vport_num)
}
/* Public E-Switch API */
-int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs)
+int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs, int mode)
 {
 int err;
-int i;
+int i, enabled_events;
if (!esw || !MLX5_CAP_GEN(esw->dev, vport_group_manager) ||
MLX5_CAP_GEN(esw->dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
......@@ -1561,16 +1554,20 @@ int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs)
if (!MLX5_CAP_ESW_EGRESS_ACL(esw->dev, ft_support))
esw_warn(esw->dev, "E-Switch egress ACL is not supported by FW\n");
-esw_info(esw->dev, "E-Switch enable SRIOV: nvfs(%d)\n", nvfs);
+esw_info(esw->dev, "E-Switch enable SRIOV: nvfs(%d) mode (%d)\n", nvfs, mode);
+esw->mode = mode;
 esw_disable_vport(esw, 0);
-err = esw_create_fdb_table(esw, nvfs + 1);
+if (mode == SRIOV_LEGACY)
+err = esw_create_legacy_fdb_table(esw, nvfs + 1);
+else
+err = esw_offloads_init(esw, nvfs + 1);
 if (err)
 goto abort;
+enabled_events = (mode == SRIOV_LEGACY) ? SRIOV_VPORT_EVENTS : UC_ADDR_CHANGE;
 for (i = 0; i <= nvfs; i++)
-esw_enable_vport(esw, i, SRIOV_VPORT_EVENTS);
+esw_enable_vport(esw, i, enabled_events);
esw_info(esw->dev, "SRIOV enabled: active vports(%d)\n",
esw->enabled_vports);
......@@ -1584,16 +1581,18 @@ int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs)
 void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw)
 {
 struct esw_mc_addr *mc_promisc;
+int nvports;
 int i;
 if (!esw || !MLX5_CAP_GEN(esw->dev, vport_group_manager) ||
 MLX5_CAP_GEN(esw->dev, port_type) != MLX5_CAP_PORT_TYPE_ETH)
 return;
-esw_info(esw->dev, "disable SRIOV: active vports(%d)\n",
-esw->enabled_vports);
+esw_info(esw->dev, "disable SRIOV: active vports(%d) mode(%d)\n",
+esw->enabled_vports, esw->mode);
 mc_promisc = esw->mc_promisc;
+nvports = esw->enabled_vports;
for (i = 0; i < esw->total_vports; i++)
esw_disable_vport(esw, i);
......@@ -1601,8 +1600,12 @@ void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw)
if (mc_promisc && mc_promisc->uplink_rule)
mlx5_del_flow_rule(mc_promisc->uplink_rule);
-esw_destroy_fdb_table(esw);
+if (esw->mode == SRIOV_LEGACY)
+esw_destroy_legacy_fdb_table(esw);
+else if (esw->mode == SRIOV_OFFLOADS)
+esw_offloads_cleanup(esw, nvports);
+esw->mode = SRIOV_NONE;
/* VPORT 0 (PF) must be enabled back with non-sriov configuration */
esw_enable_vport(esw, 0, UC_ADDR_CHANGE);
}
......@@ -1660,6 +1663,14 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
goto abort;
}
esw->offloads.vport_reps =
kzalloc(total_vports * sizeof(struct mlx5_eswitch_rep),
GFP_KERNEL);
if (!esw->offloads.vport_reps) {
err = -ENOMEM;
goto abort;
}
mutex_init(&esw->state_lock);
for (vport_num = 0; vport_num < total_vports; vport_num++) {
......@@ -1673,6 +1684,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
esw->total_vports = total_vports;
esw->enabled_vports = 0;
esw->mode = SRIOV_NONE;
dev->priv.eswitch = esw;
esw_enable_vport(esw, 0, UC_ADDR_CHANGE);
......@@ -1683,6 +1695,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
destroy_workqueue(esw->work_queue);
kfree(esw->l2_table.bitmap);
kfree(esw->vports);
kfree(esw->offloads.vport_reps);
kfree(esw);
return err;
}
......@@ -1700,6 +1713,7 @@ void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw)
destroy_workqueue(esw->work_queue);
kfree(esw->l2_table.bitmap);
kfree(esw->mc_promisc);
kfree(esw->offloads.vport_reps);
kfree(esw->vports);
kfree(esw);
}
......
......@@ -35,6 +35,7 @@
#include <linux/if_ether.h>
#include <linux/if_link.h>
#include <net/devlink.h>
#include <linux/mlx5/device.h>
#define MLX5_MAX_UC_PER_VPORT(dev) \
......@@ -46,6 +47,8 @@
#define MLX5_L2_ADDR_HASH_SIZE (BIT(BITS_PER_BYTE))
#define MLX5_L2_ADDR_HASH(addr) (addr[5])
#define FDB_UPLINK_VPORT 0xffff
/* L2 -mac address based- hash helpers */
struct l2addr_node {
struct hlist_node hlist;
......@@ -134,9 +137,48 @@ struct mlx5_l2_table {
struct mlx5_eswitch_fdb {
void *fdb;
-struct mlx5_flow_group *addr_grp;
-struct mlx5_flow_group *allmulti_grp;
-struct mlx5_flow_group *promisc_grp;
+union {
+struct legacy_fdb {
+struct mlx5_flow_group *addr_grp;
+struct mlx5_flow_group *allmulti_grp;
+struct mlx5_flow_group *promisc_grp;
+} legacy;
+struct offloads_fdb {
+struct mlx5_flow_group *send_to_vport_grp;
+struct mlx5_flow_group *miss_grp;
+struct mlx5_flow_rule *miss_rule;
+} offloads;
+};
};
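Because the legacy and offloads group pointers now live in one anonymous union, they share storage, and `esw->mode` is the only thing that says which view is valid. The following is a minimal user-space sketch of that idea, not driver code: the `sketch_` names are hypothetical, `void *` stands in for the mlx5 flow group and rule types, and it assumes a compiler with C11 anonymous unions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for SRIOV_NONE / SRIOV_LEGACY / SRIOV_OFFLOADS. */
enum sketch_mode { SKETCH_NONE, SKETCH_LEGACY, SKETCH_OFFLOADS };

/* Same shape as struct mlx5_eswitch_fdb, with void * replacing mlx5 types. */
struct sketch_fdb {
	void *fdb;
	union {
		struct {
			void *addr_grp;
			void *allmulti_grp;
			void *promisc_grp;
		} legacy;
		struct {
			void *send_to_vport_grp;
			void *miss_grp;
			void *miss_rule;
		} offloads;
	};
};

/* Return the first group of whichever view the mode selects, or NULL. */
static void *sketch_first_group(const struct sketch_fdb *t,
				enum sketch_mode mode)
{
	assert(t != NULL);
	switch (mode) {
	case SKETCH_LEGACY:
		return t->legacy.addr_grp;
	case SKETCH_OFFLOADS:
		return t->offloads.send_to_vport_grp;
	default:
		return NULL;
	}
}
```

Reading the table through the wrong view returns whatever the other mode stored there, which is why the teardown path in the patch branches on `esw->mode` before touching any group pointer.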
enum {
SRIOV_NONE,
SRIOV_LEGACY,
SRIOV_OFFLOADS
};
struct mlx5_esw_sq {
struct mlx5_flow_rule *send_to_vport_rule;
struct list_head list;
};
struct mlx5_eswitch_rep {
int (*load)(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep);
void (*unload)(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep);
u16 vport;
struct mlx5_flow_rule *vport_rx_rule;
void *priv_data;
struct list_head vport_sqs_list;
bool valid;
};
struct mlx5_esw_offload {
struct mlx5_flow_table *ft_offloads;
struct mlx5_flow_group *vport_rx_group;
struct mlx5_eswitch_rep *vport_reps;
};
struct mlx5_eswitch {
......@@ -153,13 +195,15 @@ struct mlx5_eswitch {
*/
struct mutex state_lock;
struct esw_mc_addr *mc_promisc;
struct mlx5_esw_offload offloads;
int mode;
};
/* E-Switch API */
int mlx5_eswitch_init(struct mlx5_core_dev *dev);
void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw);
void mlx5_eswitch_vport_event(struct mlx5_eswitch *esw, struct mlx5_eqe *eqe);
-int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs);
+int mlx5_eswitch_enable_sriov(struct mlx5_eswitch *esw, int nvfs, int mode);
void mlx5_eswitch_disable_sriov(struct mlx5_eswitch *esw);
int mlx5_eswitch_set_vport_mac(struct mlx5_eswitch *esw,
int vport, u8 mac[ETH_ALEN]);
......@@ -177,4 +221,30 @@ int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
int vport,
struct ifla_vf_stats *vf_stats);
struct mlx5_flow_rule *
mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport, u32 tirn);
int mlx5_eswitch_sqs2vport_start(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep,
u16 *sqns_array, int sqns_num);
void mlx5_eswitch_sqs2vport_stop(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep);
int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode);
int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode);
void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep);
void mlx5_eswitch_unregister_vport_rep(struct mlx5_eswitch *esw,
int vport);
#define MLX5_DEBUG_ESWITCH_MASK BIT(3)
#define esw_info(dev, format, ...) \
pr_info("(%s): E-Switch: " format, (dev)->priv.name, ##__VA_ARGS__)
#define esw_warn(dev, format, ...) \
pr_warn("(%s): E-Switch: " format, (dev)->priv.name, ##__VA_ARGS__)
#define esw_debug(dev, format, ...) \
mlx5_core_dbg_mask(dev, MLX5_DEBUG_ESWITCH_MASK, format, ##__VA_ARGS__)
#endif /* __MLX5_ESWITCH_H__ */
/*
* Copyright (c) 2016, Mellanox Technologies. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include <linux/etherdevice.h>
#include <linux/mlx5/driver.h>
#include <linux/mlx5/mlx5_ifc.h>
#include <linux/mlx5/vport.h>
#include <linux/mlx5/fs.h>
#include "mlx5_core.h"
#include "eswitch.h"
static struct mlx5_flow_rule *
mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch *esw, int vport, u32 sqn)
{
struct mlx5_flow_destination dest;
struct mlx5_flow_rule *flow_rule;
int match_header = MLX5_MATCH_MISC_PARAMETERS;
u32 *match_v, *match_c;
void *misc;
match_v = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
match_c = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
if (!match_v || !match_c) {
esw_warn(esw->dev, "FDB: Failed to alloc match parameters\n");
flow_rule = ERR_PTR(-ENOMEM);
goto out;
}
misc = MLX5_ADDR_OF(fte_match_param, match_v, misc_parameters);
MLX5_SET(fte_match_set_misc, misc, source_sqn, sqn);
MLX5_SET(fte_match_set_misc, misc, source_port, 0x0); /* source vport is 0 */
misc = MLX5_ADDR_OF(fte_match_param, match_c, misc_parameters);
MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_sqn);
MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_port);
dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
dest.vport_num = vport;
flow_rule = mlx5_add_flow_rule(esw->fdb_table.fdb, match_header, match_c,
match_v, MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
0, &dest);
if (IS_ERR(flow_rule))
esw_warn(esw->dev, "FDB: Failed to add send to vport rule err %ld\n", PTR_ERR(flow_rule));
out:
kfree(match_v);
kfree(match_c);
return flow_rule;
}
void mlx5_eswitch_sqs2vport_stop(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep)
{
struct mlx5_esw_sq *esw_sq, *tmp;
if (esw->mode != SRIOV_OFFLOADS)
return;
list_for_each_entry_safe(esw_sq, tmp, &rep->vport_sqs_list, list) {
mlx5_del_flow_rule(esw_sq->send_to_vport_rule);
list_del(&esw_sq->list);
kfree(esw_sq);
}
}
int mlx5_eswitch_sqs2vport_start(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep,
u16 *sqns_array, int sqns_num)
{
struct mlx5_flow_rule *flow_rule;
struct mlx5_esw_sq *esw_sq;
int vport;
int err;
int i;
if (esw->mode != SRIOV_OFFLOADS)
return 0;
vport = rep->vport == 0 ?
FDB_UPLINK_VPORT : rep->vport;
for (i = 0; i < sqns_num; i++) {
esw_sq = kzalloc(sizeof(*esw_sq), GFP_KERNEL);
if (!esw_sq) {
err = -ENOMEM;
goto out_err;
}
/* Add re-inject rule to the PF/representor sqs */
flow_rule = mlx5_eswitch_add_send_to_vport_rule(esw,
vport,
sqns_array[i]);
if (IS_ERR(flow_rule)) {
err = PTR_ERR(flow_rule);
kfree(esw_sq);
goto out_err;
}
esw_sq->send_to_vport_rule = flow_rule;
list_add(&esw_sq->list, &rep->vport_sqs_list);
}
return 0;
out_err:
mlx5_eswitch_sqs2vport_stop(esw, rep);
return err;
}
static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
{
struct mlx5_flow_destination dest;
struct mlx5_flow_rule *flow_rule = NULL;
u32 *match_v, *match_c;
int err = 0;
match_v = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
match_c = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
if (!match_v || !match_c) {
esw_warn(esw->dev, "FDB: Failed to alloc match parameters\n");
err = -ENOMEM;
goto out;
}
dest.type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
dest.vport_num = 0;
flow_rule = mlx5_add_flow_rule(esw->fdb_table.fdb, 0, match_c, match_v,
MLX5_FLOW_CONTEXT_ACTION_FWD_DEST, 0, &dest);
if (IS_ERR(flow_rule)) {
err = PTR_ERR(flow_rule);
esw_warn(esw->dev, "FDB: Failed to add miss flow rule err %d\n", err);
goto out;
}
esw->fdb_table.offloads.miss_rule = flow_rule;
out:
kfree(match_v);
kfree(match_c);
return err;
}
#define MAX_PF_SQ 256
static int esw_create_offloads_fdb_table(struct mlx5_eswitch *esw, int nvports)
{
int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
struct mlx5_core_dev *dev = esw->dev;
struct mlx5_flow_namespace *root_ns;
struct mlx5_flow_table *fdb = NULL;
struct mlx5_flow_group *g;
u32 *flow_group_in;
void *match_criteria;
int table_size, ix, err = 0;
flow_group_in = mlx5_vzalloc(inlen);
if (!flow_group_in)
return -ENOMEM;
root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_FDB);
if (!root_ns) {
esw_warn(dev, "Failed to get FDB flow namespace\n");
err = -ENOMEM; /* err is still 0 here; without this the failure would be returned as success */
goto ns_err;
}
esw_debug(dev, "Create offloads FDB table, log_max_size(%d)\n",
MLX5_CAP_ESW_FLOWTABLE_FDB(dev, log_max_ft_size));
table_size = nvports + MAX_PF_SQ + 1;
fdb = mlx5_create_flow_table(root_ns, 0, table_size, 0);
if (IS_ERR(fdb)) {
err = PTR_ERR(fdb);
esw_warn(dev, "Failed to create FDB Table err %d\n", err);
goto fdb_err;
}
esw->fdb_table.fdb = fdb;
/* create send-to-vport group */
memset(flow_group_in, 0, inlen);
MLX5_SET(create_flow_group_in, flow_group_in, match_criteria_enable,
MLX5_MATCH_MISC_PARAMETERS);
match_criteria = MLX5_ADDR_OF(create_flow_group_in, flow_group_in, match_criteria);
MLX5_SET_TO_ONES(fte_match_param, match_criteria, misc_parameters.source_sqn);
MLX5_SET_TO_ONES(fte_match_param, match_criteria, misc_parameters.source_port);
ix = nvports + MAX_PF_SQ;
MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, 0);
MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, ix - 1);
g = mlx5_create_flow_group(fdb, flow_group_in);
if (IS_ERR(g)) {
err = PTR_ERR(g);
esw_warn(dev, "Failed to create send-to-vport flow group err(%d)\n", err);
goto send_vport_err;
}
esw->fdb_table.offloads.send_to_vport_grp = g;
/* create miss group */
memset(flow_group_in, 0, inlen);
MLX5_SET(create_flow_group_in, flow_group_in, match_criteria_enable, 0);
MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, ix);
MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, ix + 1);
g = mlx5_create_flow_group(fdb, flow_group_in);
if (IS_ERR(g)) {
err = PTR_ERR(g);
esw_warn(dev, "Failed to create miss flow group err(%d)\n", err);
goto miss_err;
}
esw->fdb_table.offloads.miss_grp = g;
err = esw_add_fdb_miss_rule(esw);
if (err)
goto miss_rule_err;
return 0;
miss_rule_err:
mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp);
miss_err:
mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp);
send_vport_err:
mlx5_destroy_flow_table(fdb);
fdb_err:
ns_err:
kvfree(flow_group_in);
return err;
}
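The index arithmetic in esw_create_offloads_fdb_table() is easier to follow when written out on its own. The sketch below is a standalone illustration, not driver code: it only reproduces the sizes and the group start/end indices computed above, and the `sketch_` names are hypothetical.

```c
#include <assert.h>

#define MAX_PF_SQ 256 /* same value as in the patch */

/* Hypothetical summary of the offloads FDB layout. */
struct sketch_fdb_layout {
	int table_size;          /* entries requested for the flow table */
	int send_to_vport_start; /* first index of the send-to-vport group */
	int send_to_vport_end;   /* last index of that group, inclusive */
	int miss_start;          /* first index of the miss group */
};

/* Mirror the arithmetic used when the offloads FDB table is created. */
static struct sketch_fdb_layout sketch_compute_fdb_layout(int nvports)
{
	struct sketch_fdb_layout l;
	int ix;

	assert(nvports > 0);
	ix = nvports + MAX_PF_SQ;

	l.table_size = ix + 1;
	l.send_to_vport_start = 0;
	l.send_to_vport_end = ix - 1;
	l.miss_start = ix;
	return l;
}
```

For example, with 4 VFs the caller passes nvports = 5, which gives a 262-entry table: indices 0 to 260 belong to the send-to-vport group and the miss group starts at index 261.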
static void esw_destroy_offloads_fdb_table(struct mlx5_eswitch *esw)
{
if (!esw->fdb_table.fdb)
return;
esw_debug(esw->dev, "Destroy offloads FDB Table\n");
mlx5_del_flow_rule(esw->fdb_table.offloads.miss_rule);
mlx5_destroy_flow_group(esw->fdb_table.offloads.send_to_vport_grp);
mlx5_destroy_flow_group(esw->fdb_table.offloads.miss_grp);
mlx5_destroy_flow_table(esw->fdb_table.fdb);
}
static int esw_create_offloads_table(struct mlx5_eswitch *esw)
{
struct mlx5_flow_namespace *ns;
struct mlx5_flow_table *ft_offloads;
struct mlx5_core_dev *dev = esw->dev;
int err = 0;
ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_OFFLOADS);
if (!ns) {
esw_warn(esw->dev, "Failed to get offloads flow namespace\n");
return -ENOMEM;
}
ft_offloads = mlx5_create_flow_table(ns, 0, dev->priv.sriov.num_vfs + 2, 0);
if (IS_ERR(ft_offloads)) {
err = PTR_ERR(ft_offloads);
esw_warn(esw->dev, "Failed to create offloads table, err %d\n", err);
return err;
}
esw->offloads.ft_offloads = ft_offloads;
return 0;
}
static void esw_destroy_offloads_table(struct mlx5_eswitch *esw)
{
struct mlx5_esw_offload *offloads = &esw->offloads;
mlx5_destroy_flow_table(offloads->ft_offloads);
}
static int esw_create_vport_rx_group(struct mlx5_eswitch *esw)
{
int inlen = MLX5_ST_SZ_BYTES(create_flow_group_in);
struct mlx5_flow_group *g;
struct mlx5_priv *priv = &esw->dev->priv;
u32 *flow_group_in;
void *match_criteria, *misc;
int err = 0;
int nvports = priv->sriov.num_vfs + 2;
flow_group_in = mlx5_vzalloc(inlen);
if (!flow_group_in)
return -ENOMEM;
/* create vport rx group */
memset(flow_group_in, 0, inlen);
MLX5_SET(create_flow_group_in, flow_group_in, match_criteria_enable,
MLX5_MATCH_MISC_PARAMETERS);
match_criteria = MLX5_ADDR_OF(create_flow_group_in, flow_group_in, match_criteria);
misc = MLX5_ADDR_OF(fte_match_param, match_criteria, misc_parameters);
MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_port);
MLX5_SET(create_flow_group_in, flow_group_in, start_flow_index, 0);
MLX5_SET(create_flow_group_in, flow_group_in, end_flow_index, nvports - 1);
g = mlx5_create_flow_group(esw->offloads.ft_offloads, flow_group_in);
if (IS_ERR(g)) {
err = PTR_ERR(g);
mlx5_core_warn(esw->dev, "Failed to create vport rx group err %d\n", err);
goto out;
}
esw->offloads.vport_rx_group = g;
out:
kvfree(flow_group_in); /* allocated with mlx5_vzalloc(), so it may be vmalloc memory */
return err;
}
static void esw_destroy_vport_rx_group(struct mlx5_eswitch *esw)
{
mlx5_destroy_flow_group(esw->offloads.vport_rx_group);
}
struct mlx5_flow_rule *
mlx5_eswitch_create_vport_rx_rule(struct mlx5_eswitch *esw, int vport, u32 tirn)
{
struct mlx5_flow_destination dest;
struct mlx5_flow_rule *flow_rule;
int match_header = MLX5_MATCH_MISC_PARAMETERS;
u32 *match_v, *match_c;
void *misc;
match_v = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
match_c = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
if (!match_v || !match_c) {
esw_warn(esw->dev, "Failed to alloc match parameters\n");
flow_rule = ERR_PTR(-ENOMEM);
goto out;
}
misc = MLX5_ADDR_OF(fte_match_param, match_v, misc_parameters);
MLX5_SET(fte_match_set_misc, misc, source_port, vport);
misc = MLX5_ADDR_OF(fte_match_param, match_c, misc_parameters);
MLX5_SET_TO_ONES(fte_match_set_misc, misc, source_port);
dest.type = MLX5_FLOW_DESTINATION_TYPE_TIR;
dest.tir_num = tirn;
flow_rule = mlx5_add_flow_rule(esw->offloads.ft_offloads, match_header, match_c,
match_v, MLX5_FLOW_CONTEXT_ACTION_FWD_DEST,
0, &dest);
if (IS_ERR(flow_rule)) {
esw_warn(esw->dev, "fs offloads: Failed to add vport rx rule err %ld\n", PTR_ERR(flow_rule));
goto out;
}
out:
kfree(match_v);
kfree(match_c);
return flow_rule;
}
static int esw_offloads_start(struct mlx5_eswitch *esw)
{
int err, num_vfs = esw->dev->priv.sriov.num_vfs;
if (esw->mode != SRIOV_LEGACY) {
esw_warn(esw->dev, "Can't set offloads mode, SRIOV legacy not enabled\n");
return -EINVAL;
}
mlx5_eswitch_disable_sriov(esw);
err = mlx5_eswitch_enable_sriov(esw, num_vfs, SRIOV_OFFLOADS);
if (err)
esw_warn(esw->dev, "Failed to set eswitch to offloads, err %d\n", err);
return err;
}
int esw_offloads_init(struct mlx5_eswitch *esw, int nvports)
{
struct mlx5_eswitch_rep *rep;
int vport;
int err;
err = esw_create_offloads_fdb_table(esw, nvports);
if (err)
return err;
err = esw_create_offloads_table(esw);
if (err)
goto create_ft_err;
err = esw_create_vport_rx_group(esw);
if (err)
goto create_fg_err;
for (vport = 0; vport < nvports; vport++) {
rep = &esw->offloads.vport_reps[vport];
if (!rep->valid)
continue;
err = rep->load(esw, rep);
if (err)
goto err_reps;
}
return 0;
err_reps:
for (vport--; vport >= 0; vport--) {
rep = &esw->offloads.vport_reps[vport];
if (!rep->valid)
continue;
rep->unload(esw, rep);
}
esw_destroy_vport_rx_group(esw);
create_fg_err:
esw_destroy_offloads_table(esw);
create_ft_err:
esw_destroy_offloads_fdb_table(esw);
return err;
}
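The representor loop in esw_offloads_init() follows a load-then-unwind pattern: load every valid representor in order, and on the first failure unload only those loaded so far, in reverse. Below is a minimal user-space sketch of that pattern under stated assumptions: `struct sketch_rep`, its `fail_on_load` knob and the `sketch_` functions are all hypothetical and stand in for the driver's `load`/`unload` callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representor: only the fields the pattern needs. */
struct sketch_rep {
	bool valid;        /* registered, like rep->valid in the patch */
	bool loaded;       /* set by load, cleared by unload */
	bool fail_on_load; /* knob to simulate a failing load callback */
};

static int sketch_rep_load(struct sketch_rep *rep)
{
	if (rep->fail_on_load)
		return -1;
	rep->loaded = true;
	return 0;
}

static void sketch_rep_unload(struct sketch_rep *rep)
{
	rep->loaded = false;
}

/* Load all valid reps; on failure, unwind the ones already loaded. */
static int sketch_load_all(struct sketch_rep *reps, int n)
{
	int i, err;

	assert(reps != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!reps[i].valid)
			continue;
		err = sketch_rep_load(&reps[i]);
		if (err)
			goto unwind;
	}
	return 0;

unwind:
	/* Start below the failing index and walk back to 0. */
	for (i--; i >= 0; i--) {
		if (!reps[i].valid)
			continue;
		sketch_rep_unload(&reps[i]);
	}
	return err;
}
```

Starting the unwind at the index below the failing one matters: the failing representor never finished loading, so calling its unload callback could tear down state that was never set up.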
static int esw_offloads_stop(struct mlx5_eswitch *esw)
{
int err, num_vfs = esw->dev->priv.sriov.num_vfs;
mlx5_eswitch_disable_sriov(esw);
err = mlx5_eswitch_enable_sriov(esw, num_vfs, SRIOV_LEGACY);
if (err)
esw_warn(esw->dev, "Failed to set eswitch legacy mode, err %d\n", err);
return err;
}
void esw_offloads_cleanup(struct mlx5_eswitch *esw, int nvports)
{
struct mlx5_eswitch_rep *rep;
int vport;
for (vport = 0; vport < nvports; vport++) {
rep = &esw->offloads.vport_reps[vport];
if (!rep->valid)
continue;
rep->unload(esw, rep);
}
esw_destroy_vport_rx_group(esw);
esw_destroy_offloads_table(esw);
esw_destroy_offloads_fdb_table(esw);
}
static int mlx5_esw_mode_from_devlink(u16 mode, u16 *mlx5_mode)
{
switch (mode) {
case DEVLINK_ESWITCH_MODE_LEGACY:
*mlx5_mode = SRIOV_LEGACY;
break;
case DEVLINK_ESWITCH_MODE_SWITCHDEV:
*mlx5_mode = SRIOV_OFFLOADS;
break;
default:
return -EINVAL;
}
return 0;
}
int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
{
struct mlx5_core_dev *dev;
u16 cur_mlx5_mode, mlx5_mode = 0;
dev = devlink_priv(devlink);
if (!MLX5_CAP_GEN(dev, vport_group_manager))
return -EOPNOTSUPP;
cur_mlx5_mode = dev->priv.eswitch->mode;
if (cur_mlx5_mode == SRIOV_NONE)
return -EOPNOTSUPP;
if (mlx5_esw_mode_from_devlink(mode, &mlx5_mode))
return -EINVAL;
if (cur_mlx5_mode == mlx5_mode)
return 0;
if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV)
return esw_offloads_start(dev->priv.eswitch);
else if (mode == DEVLINK_ESWITCH_MODE_LEGACY)
return esw_offloads_stop(dev->priv.eswitch);
else
return -EINVAL;
}
int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
{
struct mlx5_core_dev *dev;
dev = devlink_priv(devlink);
if (!MLX5_CAP_GEN(dev, vport_group_manager))
return -EOPNOTSUPP;
if (dev->priv.eswitch->mode == SRIOV_NONE)
return -EOPNOTSUPP;
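/* Report the devlink value, not the driver's SRIOV_* value; SRIOV_NONE was rejected above */
*mode = (dev->priv.eswitch->mode == SRIOV_OFFLOADS) ?
DEVLINK_ESWITCH_MODE_SWITCHDEV : DEVLINK_ESWITCH_MODE_LEGACY;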
return 0;
}
void mlx5_eswitch_register_vport_rep(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep)
{
struct mlx5_esw_offload *offloads = &esw->offloads;
memcpy(&offloads->vport_reps[rep->vport], rep,
sizeof(struct mlx5_eswitch_rep));
INIT_LIST_HEAD(&offloads->vport_reps[rep->vport].vport_sqs_list);
offloads->vport_reps[rep->vport].valid = true;
}
void mlx5_eswitch_unregister_vport_rep(struct mlx5_eswitch *esw,
int vport)
{
struct mlx5_esw_offload *offloads = &esw->offloads;
struct mlx5_eswitch_rep *rep;
rep = &offloads->vport_reps[vport];
if (esw->mode == SRIOV_OFFLOADS && esw->vports[vport].enabled)
rep->unload(esw, rep);
offloads->vport_reps[vport].valid = false;
}
......@@ -83,6 +83,11 @@
#define ANCHOR_NUM_LEVELS 1
#define ANCHOR_NUM_PRIOS 1
#define ANCHOR_MIN_LEVEL (BY_PASS_MIN_LEVEL + 1)
#define OFFLOADS_MAX_FT 1
#define OFFLOADS_NUM_PRIOS 1
#define OFFLOADS_MIN_LEVEL (ANCHOR_MIN_LEVEL + 1)
struct node_caps {
size_t arr_sz;
long *caps;
......@@ -98,7 +103,7 @@ static struct init_tree_node {
int num_levels;
} root_fs = {
.type = FS_TYPE_NAMESPACE,
-.ar_size = 4,
+.ar_size = 5,
.children = (struct init_tree_node[]) {
ADD_PRIO(0, BY_PASS_MIN_LEVEL, 0,
FS_REQUIRED_CAPS(FS_CAP(flow_table_properties_nic_receive.flow_modify_en),
......@@ -107,6 +112,9 @@ static struct init_tree_node {
FS_CAP(flow_table_properties_nic_receive.flow_table_modify)),
ADD_NS(ADD_MULTIPLE_PRIO(MLX5_BY_PASS_NUM_PRIOS,
BY_PASS_PRIO_NUM_LEVELS))),
ADD_PRIO(0, OFFLOADS_MIN_LEVEL, 0, {},
ADD_NS(ADD_MULTIPLE_PRIO(OFFLOADS_NUM_PRIOS, OFFLOADS_MAX_FT))),
ADD_PRIO(0, KERNEL_MIN_LEVEL, 0, {},
ADD_NS(ADD_MULTIPLE_PRIO(1, 1),
ADD_MULTIPLE_PRIO(KERNEL_NIC_NUM_PRIOS,
......@@ -1369,6 +1377,7 @@ struct mlx5_flow_namespace *mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
switch (type) {
case MLX5_FLOW_NAMESPACE_BYPASS:
case MLX5_FLOW_NAMESPACE_OFFLOADS:
case MLX5_FLOW_NAMESPACE_KERNEL:
case MLX5_FLOW_NAMESPACE_LEFTOVERS:
case MLX5_FLOW_NAMESPACE_ANCHOR:
......
......@@ -51,6 +51,7 @@
#ifdef CONFIG_RFS_ACCEL
#include <linux/cpu_rmap.h>
#endif
#include <net/devlink.h>
#include "mlx5_core.h"
#include "fs_core.h"
#ifdef CONFIG_MLX5_CORE_EN
......@@ -1315,19 +1316,28 @@ struct mlx5_core_event_handler {
void *data);
};
static const struct devlink_ops mlx5_devlink_ops = {
#ifdef CONFIG_MLX5_CORE_EN
.eswitch_mode_set = mlx5_devlink_eswitch_mode_set,
.eswitch_mode_get = mlx5_devlink_eswitch_mode_get,
#endif
};
static int init_one(struct pci_dev *pdev,
const struct pci_device_id *id)
{
 struct mlx5_core_dev *dev;
+struct devlink *devlink;
 struct mlx5_priv *priv;
 int err;
-dev = kzalloc(sizeof(*dev), GFP_KERNEL);
-if (!dev) {
+devlink = devlink_alloc(&mlx5_devlink_ops, sizeof(*dev));
+if (!devlink) {
 dev_err(&pdev->dev, "kzalloc failed\n");
 return -ENOMEM;
 }
+dev = devlink_priv(devlink);
priv = &dev->priv;
priv->pci_dev_data = id->driver_data;
......@@ -1364,15 +1374,21 @@ static int init_one(struct pci_dev *pdev,
goto clean_health;
}
err = devlink_register(devlink, &pdev->dev);
if (err)
goto clean_load;
return 0;
clean_load:
mlx5_unload_one(dev, priv);
clean_health:
mlx5_health_cleanup(dev);
close_pci:
mlx5_pci_close(dev, priv);
clean_dev:
pci_set_drvdata(pdev, NULL);
-kfree(dev);
+devlink_free(devlink);
return err;
}
......@@ -1380,8 +1396,10 @@ static int init_one(struct pci_dev *pdev,
static void remove_one(struct pci_dev *pdev)
{
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
struct devlink *devlink = priv_to_devlink(dev);
struct mlx5_priv *priv = &dev->priv;
devlink_unregister(devlink);
if (mlx5_unload_one(dev, priv)) {
dev_err(&dev->pdev->dev, "mlx5_unload_one failed\n");
mlx5_health_cleanup(dev);
......@@ -1390,7 +1408,7 @@ static void remove_one(struct pci_dev *pdev)
mlx5_health_cleanup(dev);
mlx5_pci_close(dev, priv);
pci_set_drvdata(pdev, NULL);
-kfree(dev);
+devlink_free(devlink);
}
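With this change the mlx5 core device is no longer a separate kzalloc() allocation: it lives in the private area of the devlink instance, which is why init_one() and remove_one() now free the devlink object instead of `dev`. The following is a minimal user-space sketch of that layout, assuming a header followed by a flexible array member; the `sketch_` names are hypothetical and this is not the kernel's devlink implementation, which also aligns the private area explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header: ops pointer followed by the driver's private area. */
struct sketch_devlink {
	const void *ops;
	char priv[]; /* starts at a pointer-size boundary in this sketch */
};

/* Hypothetical driver state stored inside the private area. */
struct sketch_core_dev {
	int issi;
	int num_vfs;
};

/* One zeroed allocation covers both the header and the private area. */
static struct sketch_devlink *sketch_devlink_alloc(const void *ops,
						   size_t priv_size)
{
	struct sketch_devlink *dl = calloc(1, sizeof(*dl) + priv_size);

	if (dl)
		dl->ops = ops;
	return dl;
}

/* Header -> private area, the role devlink_priv() plays in the patch. */
static void *sketch_devlink_priv(struct sketch_devlink *dl)
{
	assert(dl != NULL);
	return dl->priv;
}

/* Private area -> header, the role priv_to_devlink() plays in the patch. */
static struct sketch_devlink *sketch_priv_to_devlink(void *priv)
{
	assert(priv != NULL);
	return (struct sketch_devlink *)((char *)priv -
			offsetof(struct sketch_devlink, priv));
}
```

Since both objects share one allocation, releasing the header releases the driver state as well, and freeing the private pointer directly would hand the allocator an address it never returned.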
static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev,
......
......@@ -167,7 +167,7 @@ int mlx5_core_sriov_configure(struct pci_dev *pdev, int num_vfs)
mlx5_core_init_vfs(dev, num_vfs);
#ifdef CONFIG_MLX5_CORE_EN
-mlx5_eswitch_enable_sriov(dev->priv.eswitch, num_vfs);
+mlx5_eswitch_enable_sriov(dev->priv.eswitch, num_vfs, SRIOV_LEGACY);
#endif
return num_vfs;
......@@ -209,7 +209,8 @@ int mlx5_sriov_init(struct mlx5_core_dev *dev)
mlx5_core_init_vfs(dev, cur_vfs);
#ifdef CONFIG_MLX5_CORE_EN
if (cur_vfs)
-mlx5_eswitch_enable_sriov(dev->priv.eswitch, cur_vfs);
+mlx5_eswitch_enable_sriov(dev->priv.eswitch, cur_vfs,
+SRIOV_LEGACY);
#endif
enable_vfs(dev, cur_vfs);
......
......@@ -578,6 +578,18 @@ enum mlx5_pci_status {
MLX5_PCI_STATUS_ENABLED,
};
struct mlx5_td {
struct list_head tirs_list;
u32 tdn;
};
struct mlx5e_resources {
struct mlx5_uar cq_uar;
u32 pdn;
struct mlx5_td td;
struct mlx5_core_mkey mkey;
};
struct mlx5_core_dev {
struct pci_dev *pdev;
/* sync pci state */
......@@ -602,6 +614,7 @@ struct mlx5_core_dev {
struct mlx5_profile *profile;
atomic_t num_qps;
u32 issi;
struct mlx5e_resources mlx5e_res;
#ifdef CONFIG_RFS_ACCEL
struct cpu_rmap *rmap;
#endif
......
......@@ -54,6 +54,7 @@ static inline void build_leftovers_ft_param(int *priority,
enum mlx5_flow_namespace_type {
MLX5_FLOW_NAMESPACE_BYPASS,
MLX5_FLOW_NAMESPACE_OFFLOADS,
MLX5_FLOW_NAMESPACE_KERNEL,
MLX5_FLOW_NAMESPACE_LEFTOVERS,
MLX5_FLOW_NAMESPACE_ANCHOR,
......
......@@ -90,6 +90,9 @@ struct devlink_ops {
u16 tc_index,
enum devlink_sb_pool_type pool_type,
u32 *p_cur, u32 *p_max);
int (*eswitch_mode_get)(struct devlink *devlink, u16 *p_mode);
int (*eswitch_mode_set)(struct devlink *devlink, u16 mode);
};
static inline void *devlink_priv(struct devlink *devlink)
......
......@@ -57,6 +57,8 @@ enum devlink_command {
DEVLINK_CMD_SB_OCC_SNAPSHOT,
DEVLINK_CMD_SB_OCC_MAX_CLEAR,
DEVLINK_CMD_ESWITCH_MODE_GET,
DEVLINK_CMD_ESWITCH_MODE_SET,
/* add new commands above here */
__DEVLINK_CMD_MAX,
......@@ -95,6 +97,11 @@ enum devlink_sb_threshold_type {
#define DEVLINK_SB_THRESHOLD_TO_ALPHA_MAX 20
enum devlink_eswitch_mode {
DEVLINK_ESWITCH_MODE_LEGACY,
DEVLINK_ESWITCH_MODE_SWITCHDEV,
};
enum devlink_attr {
/* don't change the order or add anything between, this is ABI! */
DEVLINK_ATTR_UNSPEC,
......@@ -125,6 +132,7 @@ enum devlink_attr {
DEVLINK_ATTR_SB_TC_INDEX, /* u16 */
DEVLINK_ATTR_SB_OCC_CUR, /* u32 */
DEVLINK_ATTR_SB_OCC_MAX, /* u32 */
DEVLINK_ATTR_ESWITCH_MODE, /* u16 */
/* add new attributes above here, update the policy in devlink.c */
......
......@@ -1394,6 +1394,78 @@ static int devlink_nl_cmd_sb_occ_max_clear_doit(struct sk_buff *skb,
return -EOPNOTSUPP;
}
static int devlink_eswitch_fill(struct sk_buff *msg, struct devlink *devlink,
enum devlink_command cmd, u32 portid,
u32 seq, int flags, u16 mode)
{
void *hdr;
hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
if (!hdr)
return -EMSGSIZE;
if (devlink_nl_put_handle(msg, devlink))
goto nla_put_failure;
if (nla_put_u16(msg, DEVLINK_ATTR_ESWITCH_MODE, mode))
goto nla_put_failure;
genlmsg_end(msg, hdr);
return 0;
nla_put_failure:
genlmsg_cancel(msg, hdr);
return -EMSGSIZE;
}
static int devlink_nl_cmd_eswitch_mode_get_doit(struct sk_buff *skb,
struct genl_info *info)
{
struct devlink *devlink = info->user_ptr[0];
const struct devlink_ops *ops = devlink->ops;
struct sk_buff *msg;
u16 mode;
int err;
if (!ops || !ops->eswitch_mode_get)
return -EOPNOTSUPP;
err = ops->eswitch_mode_get(devlink, &mode);
if (err)
return err;
msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
if (!msg)
return -ENOMEM;
err = devlink_eswitch_fill(msg, devlink, DEVLINK_CMD_ESWITCH_MODE_GET,
info->snd_portid, info->snd_seq, 0, mode);
if (err) {
nlmsg_free(msg);
return err;
}
return genlmsg_reply(msg, info);
}
static int devlink_nl_cmd_eswitch_mode_set_doit(struct sk_buff *skb,
struct genl_info *info)
{
struct devlink *devlink = info->user_ptr[0];
const struct devlink_ops *ops = devlink->ops;
u16 mode;
if (!info->attrs[DEVLINK_ATTR_ESWITCH_MODE])
return -EINVAL;
mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]);
if (ops && ops->eswitch_mode_set)
return ops->eswitch_mode_set(devlink, mode);
return -EOPNOTSUPP;
}
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
......@@ -1407,6 +1479,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE] = { .type = NLA_U8 },
[DEVLINK_ATTR_SB_THRESHOLD] = { .type = NLA_U32 },
[DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 },
[DEVLINK_ATTR_ESWITCH_MODE] = { .type = NLA_U16 },
};
static const struct genl_ops devlink_nl_ops[] = {
......@@ -1525,6 +1598,20 @@ static const struct genl_ops devlink_nl_ops[] = {
DEVLINK_NL_FLAG_NEED_SB |
DEVLINK_NL_FLAG_LOCK_PORTS,
},
{
.cmd = DEVLINK_CMD_ESWITCH_MODE_GET,
.doit = devlink_nl_cmd_eswitch_mode_get_doit,
.policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
{
.cmd = DEVLINK_CMD_ESWITCH_MODE_SET,
.doit = devlink_nl_cmd_eswitch_mode_set_doit,
.policy = devlink_nl_policy,
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
};
/**
......