提交 · 5b6a07297bdca1701dc983bf084d6c0b2569ff18 · openeuler / Kernel

18 8月, 2022 7 次提交

net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ocelot->stats · e780e319

由 Vladimir Oltean 提交于 8月 16, 2022

Rather than reading the stats64 counters directly from the 32-bit
hardware, it's better to rely on the output produced by the periodic
ocelot_port_update_stats().

It would be even better to call ocelot_port_update_stats() right from
ocelot_get_stats64() to make sure we report the current values rather
than the ones from 2 seconds ago. But we need to export
ocelot_port_update_stats() from the switch lib towards the switchdev
driver for that, and future work will largely undo that.

There are more ocelot-based drivers waiting to be introduced, an example
of which is the SPI-controlled VSC7512. In that driver's case, it will
be impossible to call ocelot_port_update_stats() from ndo_get_stats64
context, since the latter is atomic, and reading the stats over SPI is
sleepable. So the compromise taken here, which will also hold going
forward, is to report 64-bit counters to stats64, which are not 100% up
to date.

Fixes: a556c76a ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

e780e319

net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset · d4c36765

由 Vladimir Oltean 提交于 8月 16, 2022

With so many counter addresses recently discovered as being wrong, it is
desirable to at least have a central database of information, rather
than two: one through the SYS_COUNT_* registers (used for
ndo_get_stats64), and the other through the offset field of struct
ocelot_stat_layout elements (used for ethtool -S).

The strategy will be to keep the SYS_COUNT_* definitions as the single
source of truth, but for that we need to expand our current definitions
to cover all registers. Then we need to convert the ocelot region
creation logic, and stats worker, to the read semantics imposed by going
through SYS_COUNT_* absolute register addresses, rather than offsets
of 32-bit words relative to SYS_COUNT_RX_OCTETS (which should have been
SYS_CNT, by the way).
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

d4c36765

net: mscc: ocelot: make struct ocelot_stat_layout array indexable · 91904600

由 Vladimir Oltean 提交于 8月 16, 2022

The ocelot counters are 32-bit and require periodic reading, every 2
seconds, by ocelot_port_update_stats(), so that wraparounds are
detected.

Currently, the counters reported by ocelot_get_stats64() come from the
32-bit hardware counters directly, rather than from the 64-bit
accumulated ocelot->stats, and this is a problem for their integrity.

The strategy is to make ocelot_get_stats64() able to cherry-pick
individual stats from ocelot->stats the way in which it currently reads
them out from SYS_COUNT_* registers. But currently it can't, because
ocelot->stats is an opaque u64 array that's used only to feed data into
ethtool -S.

To solve that problem, we need to make ocelot->stats indexable, and
associate each element with an element of struct ocelot_stat_layout used
by ethtool -S.

This makes ocelot_stat_layout a fat (and possibly sparse) array, so we
need to change the way in which we access it. We no longer need
OCELOT_STAT_END as a sentinel, because we know the array's size
(OCELOT_NUM_STATS). We just need to skip the array elements that were
left unpopulated for the switch revision (ocelot, felix, seville).
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

91904600

net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_work · 18d8e67d

由 Vladimir Oltean 提交于 8月 16, 2022

The 2 methods can run concurrently, and one will change the window of
counters (SYS_STAT_CFG_STAT_VIEW) that the other sees. The fix is
similar to what commit 7fbf6795 ("net: mscc: ocelot: fix mutex lock
error during ethtool stats read") has done for ethtool -S.

Fixes: a556c76a ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

18d8e67d

net: mscc: ocelot: turn stats_lock into a spinlock · 22d842e3

由 Vladimir Oltean 提交于 8月 16, 2022

ocelot_get_stats64() currently runs unlocked and therefore may collide
with ocelot_port_update_stats() which indirectly accesses the same
counters. However, ocelot_get_stats64() runs in atomic context, and we
cannot simply take the sleepable ocelot->stats_lock mutex. We need to
convert it to an atomic spinlock first. Do that as a preparatory change.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

22d842e3

net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counter · 173ca866

由 Vladimir Oltean 提交于 8月 16, 2022

This register, used as part of stats->tx_dropped in
ocelot_get_stats64(), has a wrong address. At the address currently
given, there is actually the c_tx_green_prio_6 counter.

Fixes: a556c76a ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

173ca866

net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters · 5152de7b

由 Vladimir Oltean 提交于 8月 16, 2022

Reading stats using the SYS_COUNT_* register definitions is only used by
ocelot_get_stats64() from the ocelot switchdev driver, however,
currently the bucket definitions are incorrect.

Separately, on both RX and TX, we have the following problems:
- a 256-1023 bucket which actually tracks the 256-511 packets
- the 1024-1526 bucket actually tracks the 512-1023 packets
- the 1527-max bucket actually tracks the 1024-1526 packets

=> nobody tracks the packets from the real 1527-max bucket

Additionally, the RX_PAUSE, RX_CONTROL, RX_LONGS and RX_CLASSIFIED_DROPS
all track the wrong thing. However this doesn't seem to have any
consequence, since ocelot_get_stats64() doesn't use these.

Even though this problem only manifests itself for the switchdev driver,
we cannot split the fix for ocelot and for DSA, since it requires fixing
the bucket definitions from enum ocelot_reg, which makes us necessarily
adapt the structures from felix and seville as well.

Fixes: 84705fc1 ("net: dsa: felix: introduce support for Seville VSC9953 switch")
Fixes: 56051948 ("net: dsa: ocelot: add driver for Felix switch family")
Fixes: a556c76a ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

5152de7b

08 7月, 2022 1 次提交

net: ocelot: fix wrong time_after usage · f46fd3d7

由 Pavel Skripkin 提交于 7月 06, 2022

Accidentally noticed, that this driver is the only user of
while (time_after(jiffies...)).

It looks like typo, because likely this while loop will finish after 1st
iteration, because time_after() returns true when 1st argument _is after_
2nd one.

There is one possible problem with this poll loop: the scheduler could put
the thread to sleep, and it does not get woken up for
OCELOT_FDMA_CH_SAFE_TIMEOUT_US. During that time, the hardware has done
its thing, but you exit the while loop and return -ETIMEDOUT.

Fix it by using sane poll API that avoids all problems described above

Fixes: 753a026c ("net: ocelot: add FDMA support")
Suggested-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NPavel Skripkin <paskripkin@gmail.com>
Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220706132845.27968-1-paskripkin@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

f46fd3d7

19 6月, 2022 1 次提交

net: dsa: felix: update base time of time-aware shaper when adjusting PTP time · 8670dc33

由 Xiaoliang Yang 提交于 6月 17, 2022

When adjusting the PTP clock, the base time of the TAS configuration
will become unreliable. We need reset the TAS configuration by using a
new base time.

For example, if the driver gets a base time 0 of Qbv configuration from
user, and current time is 20000. The driver will set the TAS base time
to be 20000. After the PTP clock adjustment, the current time becomes
10000. If the TAS base time is still 20000, it will be a future time,
and TAS entry list will stop running. Another example, if the current
time becomes to be 10000000 after PTP clock adjust, a large time offset
can cause the hardware to hang.

This patch introduces a tas_clock_adjust() function to reset the TAS
module by using a new base time after the PTP clock adjustment. This can
avoid issues above.

Due to PTP clock adjustment can occur at any time, it may conflict with
the TAS configuration. We introduce a new TAS lock to serialize the
access to the TAS registers.
Signed-off-by: NXiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8670dc33

23 5月, 2022 5 次提交

net: mscc: ocelot: switch from {,un}set to {,un}assign for tag_8021q CPU ports · c295f983

由 Vladimir Oltean 提交于 5月 22, 2022

There is a desire for the felix driver to gain support for multiple
tag_8021q CPU ports, but the current model prevents it.

This is because ocelot_apply_bridge_fwd_mask() only takes into
consideration whether a port is a tag_8021q CPU port, but not whose CPU
port it is.

We need a model where we can have a direct affinity between an ocelot
port and a tag_8021q CPU port. This serves as the basis for multiple CPU
ports.

Declare a "dsa_8021q_cpu" backpointer in struct ocelot_port which
encodes that affinity. Repurpose the "ocelot_set_dsa_8021q_cpu" API to
"ocelot_assign_dsa_8021q_cpu" to express the change of paradigm.

Note that this change makes the first practical use of the new
ocelot_port->index field in ocelot_port_unassign_dsa_8021q_cpu(), where
we need to remove the old tag_8021q CPU port from the reserved VLAN range.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c295f983

net: dsa: felix: directly call ocelot_port_{set,unset}_dsa_8021q_cpu · 8c166acb

由 Vladimir Oltean 提交于 5月 22, 2022

Absorb the final details of calling ocelot_port_{,un}set_dsa_8021q_cpu(),
i.e. the need to lock &ocelot->fwd_domain_lock, into the callee, to
simplify the caller and permit easier code reuse later.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c166acb

net: dsa: felix: update bridge fwd mask from ocelot lib when changing tag_8021q CPU · a72e23dd

由 Vladimir Oltean 提交于 5月 22, 2022

Add more logic to ocelot_port_{,un}set_dsa_8021q_cpu() from the ocelot
switch lib by encapsulating the ocelot_apply_bridge_fwd_mask() call that
felix used to have.

This is necessary because the CPU port change procedure will also need
to do this, and it's good to reduce code duplication by having an entry
point in the ocelot switch lib that does all that is needed.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a72e23dd

net: dsa: felix: move the updating of PGID_CPU to the ocelot lib · 61be79ba

由 Vladimir Oltean 提交于 5月 22, 2022

PGID_CPU must be updated every time a port is configured or unconfigured
as a tag_8021q CPU port. The ocelot switch lib already has a hook for
that operation, so move the updating of PGID_CPU to those hooks.

These bits are pretty specific to DSA, so normally I would keep them out
of the common switch lib, but when tag_8021q is in use, this has
implications upon the forwarding mask determined by
ocelot_apply_bridge_fwd_mask() and called extensively by the switch lib.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61be79ba

net: mscc: ocelot: offload tc action "ok" using an empty action vector · 4149af28

由 Vladimir Oltean 提交于 5月 22, 2022

The "ok" tc action is useful when placed in front of a more generic
filter to exclude some more specific rules from matching it.

The ocelot switches can offload this tc action by creating an empty
action vector (no _ENA fields set to 1). This makes sense for all of
VCAP IS1, IS2 and ES0 (but not for PSFP).

Add support for this action. Note that this makes the
gact_drop_and_ok_test() selftest pass, where "action ok" is used in
front of an "action drop" rule, both offloaded to VCAP IS2.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4149af28

21 5月, 2022 1 次提交

net: mscc: fix the alignment in ocelot_port_fdb_del() · f7b5a89c

由 Alaa Mohamed 提交于 5月 20, 2022

align the extack argument of the ocelot_port_fdb_del()
function.
Signed-off-by: NAlaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Link: https://lore.kernel.org/r/20220520002040.4442-1-eng.alaamohamedsoliman.am@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

f7b5a89c

13 5月, 2022 4 次提交

net: mscc: ocelot: move ocelot_port_private :: chip_port to ocelot_port :: index · 7e708760

由 Vladimir Oltean 提交于 5月 11, 2022

Currently the ocelot switch lib is unaware of the index of a struct
ocelot_port, since that is kept in the encapsulating structures of outer
drivers (struct dsa_port :: index, struct ocelot_port_private :: chip_port).

With the upcoming increase in complexity associated with assigning DSA
tag_8021q CPU ports to certain user ports, it becomes necessary for the
switch lib to be able to retrieve the index of a certain ocelot_port.

Therefore, introduce a new u8 to ocelot_port (same size as the chip_port
used by the ocelot switchdev driver) and rework the existing code to
populate and use it.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

7e708760

net: dsa: felix: bring the NPI port indirection for host flooding to surface · 910ee6cc

由 Vladimir Oltean 提交于 5月 11, 2022

For symmetry with host FDBs and MDBs where the indirection is now
handled outside the ocelot switch lib, do the same for bridge port
flags (unicast/multicast/broadcast flooding).

The only caller of the ocelot switch lib which uses the NPI port is the
Felix DSA driver.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

910ee6cc

net: dsa: felix: bring the NPI port indirection for host MDBs to surface · 0ddf83cd

由 Vladimir Oltean 提交于 5月 11, 2022

For symmetry with host FDBs where the indirection is now handled outside
the ocelot switch lib, do the same for host MDB entries. The only caller
of the ocelot switch lib which uses the NPI port is the Felix DSA driver.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

0ddf83cd

net: dsa: felix: program host FDB entries towards PGID_CPU for tag_8021q too · e9b3ba43

由 Vladimir Oltean 提交于 5月 11, 2022

I remembered why we had the host FDB migration procedure in place.

It is true that host FDB entry migration can be done by changing the
value of PGID_CPU, but the problem is that only host FDB entries learned
while operating in NPI mode go to PGID_CPU. When the CPU port operates
in tag_8021q mode, the FDB entries are learned towards the unicast PGID
equal to the physical port number of this CPU port, bypassing the
PGID_CPU indirection.

So host FDB entries learned in tag_8021q mode are not migrated any
longer towards the NPI port.

Fix this by extracting the NPI port -> PGID_CPU redirection from the
ocelot switch lib, moving it to the Felix DSA driver, and applying it
for any CPU port regardless of its kind (NPI or tag_8021q).

Fixes: a51c1c3f ("net: dsa: felix: stop migrating FDBs back and forth on tag proto change")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

e9b3ba43

09 5月, 2022 1 次提交

rtnetlink: add extack support in fdb del handlers · ca4567f1

由 Alaa Mohamed 提交于 5月 05, 2022

Add extack support to .ndo_fdb_del in netdevice.h and
all related methods.
Signed-off-by: NAlaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca4567f1

08 5月, 2022 1 次提交

eth: switch to netif_napi_add_weight() · b707b89f

由 Jakub Kicinski 提交于 5月 06, 2022

Switch all Ethernet drivers which use custom napi weights
to the new API.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b707b89f

07 5月, 2022 1 次提交

net: dsa: felix: perform MDB migration based on ocelot->multicast list · 28de0f9f

由 Vladimir Oltean 提交于 5月 05, 2022

The felix driver is the only user of dsa_port_walk_mdbs(), and there
isn't even a good reason for it, considering that the host MDB entries
are already saved by the ocelot switch lib in the ocelot->multicast list.

Rewrite the multicast entry migration procedure around the
ocelot->multicast list so we can delete dsa_port_walk_mdbs().
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

28de0f9f

06 5月, 2022 5 次提交

net: mscc: ocelot: avoid corrupting hardware counters when moving VCAP filters · 93a84170

由 Vladimir Oltean 提交于 5月 05, 2022

Given the following order of operations:

(1) we add filter A using tc-flower
(2) we send a packet that matches it
(3) we read the filter's statistics to find a hit count of 1
(4) we add a second filter B with a higher preference than A, and A
    moves one position to the right to make room in the TCAM for it
(5) we send another packet, and this matches the second filter B
(6) we read the filter statistics again.

When this happens, the hit count of filter A is 2 and of filter B is 1,
despite a single packet having matched each filter.

Furthermore, in an alternate history, reading the filter stats a second
time between steps (3) and (4) makes the hit count of filter A remain at
1 after step (6), as expected.

The reason why this happens has to do with the filter->stats.pkts field,
which is written to hardware through the call path below:

               vcap_entry_set
               /      |      \
              /       |       \
             /        |        \
            /         |         \
es0_entry_set   is1_entry_set   is2_entry_set
            \         |         /
             \        |        /
              \       |       /
        vcap_data_set(data.counter, ...)

The primary role of filter->stats.pkts is to transport the filter hit
counters from the last readout all the way from vcap_entry_get() ->
ocelot_vcap_filter_stats_update() -> ocelot_cls_flower_stats().
The reason why vcap_entry_set() writes it to hardware is so that the
counters (saturating and having a limited bit width) are cleared
after each user space readout.

The writing of filter->stats.pkts to hardware during the TCAM entry
movement procedure is an unintentional consequence of the code design,
because the hit count isn't up to date at this point.

So at step (4), when filter A is moved by ocelot_vcap_filter_add() to
make room for filter B, the hardware hit count is 0 (no packet matched
on it in the meantime), but filter->stats.pkts is 1, because the last
readout saw the earlier packet. The movement procedure programs the old
hit count back to hardware, so this creates the impression to user space
that more packets have been matched than they really were.

The bug can be seen when running the gact_drop_and_ok_test() from the
tc_actions.sh selftest.

Fix the issue by reading back the hit count to tmp->stats.pkts before
migrating the VCAP filter. Sure, this is a best-effort technique, since
the packets that hit the rule between vcap_entry_get() and
vcap_entry_set() won't be counted, but at least it allows the counters
to be reliably used for selftests where the traffic is under control.

The vcap_entry_get() name is a bit unintuitive, but it only reads back
the counter portion of the TCAM entry, not the entire entry.

The index from which we retrieve the counter is also a bit unintuitive
(i - 1 during add, i + 1 during del), but this is the way in which TCAM
entry movement works. The "entry index" isn't a stored integer for a
TCAM filter, instead it is dynamically computed by
ocelot_vcap_block_get_filter_index() based on the entry's position in
the &block->rules list. That position (as well as block->count) is
automatically updated by ocelot_vcap_filter_add_to_block() on add, and
by ocelot_vcap_block_remove_filter() on del. So "i" is the new filter
index, and "i - 1" or "i + 1" respectively are the old addresses of that
TCAM entry (we only support installing/deleting one filter at a time).

Fixes: b5962294 ("net: mscc: ocelot: Add support for tcam")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

93a84170

net: mscc: ocelot: restrict tc-trap actions to VCAP IS2 lookup 0 · 477d2b91

由 Vladimir Oltean 提交于 5月 05, 2022

Once the CPU port was added to the destination port mask of a packet, it
can never be cleared, so even packets marked as dropped by the MASK_MODE
of a VCAP IS2 filter will still reach it. This is why we need the
OCELOT_POLICER_DISCARD to "kill dropped packets dead" and make software
stop seeing them.

We disallow policer rules from being put on any other chain than the one
for the first lookup, but we don't do this for "drop" rules, although we
should. This change is merely ascertaining that the rules dont't
(completely) work and letting the user know.

The blamed commit is the one that introduced the multi-chain architecture
in ocelot. Prior to that, we should have always offloaded the filters to
VCAP IS2 lookup 0, where they did work.

Fixes: 1397a2eb ("net: mscc: ocelot: create TCAM skeleton from tc filter chains")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

477d2b91

net: mscc: ocelot: fix VCAP IS2 filters matching on both lookups · 6741e118

由 Vladimir Oltean 提交于 5月 05, 2022

The VCAP IS2 TCAM is looked up twice per packet, and each filter can be
configured to only match during the first, second lookup, or both, or
none.

The blamed commit wrote the code for making VCAP IS2 filters match only
on the given lookup. But right below that code, there was another line
that explicitly made the lookup a "don't care", and this is overwriting
the lookup we've selected. So the code had no effect.

Some of the more noticeable effects of having filters match on both
lookups:

- in "tc -s filter show dev swp0 ingress", we see each packet matching a
VCAP IS2 filter counted twice. This throws off scripts such as
tools/testing/selftests/net/forwarding/tc_actions.sh and makes them
fail.

- a "tc-drop" action offloaded to VCAP IS2 needs a policer as well,
because once the CPU port becomes a member of the destination port
mask of a packet, nothing removes it, not even a PERMIT/DENY mask mode
with a port mask of 0. But VCAP IS2 rules with the POLICE_ENA bit in
the action vector can only appear in the first lookup. What happens
when a filter matches both lookups is that the action vector is
combined, and this makes the POLICE_ENA bit ineffective, since the
last lookup in which it has appeared is the second one. In other
words, "tc-drop" actions do not drop packets for the CPU port, dropped
packets are still seen by software unless there was an FDB entry that
directed those packets to some other place different from the CPU.

The last bit used to work, because in the initial commit b5962294
("net: mscc: ocelot: Add support for tcam"), we were writing the FIRST
field of the VCAP IS2 half key with a 1, not with a "don't care".
The change to "don't care" was made inadvertently by me in commit
c1c3993e ("net: mscc: ocelot: generalize existing code for VCAP"),
which I just realized, and which needs a separate fix from this one,
for "stable" kernels that lack the commit blamed below.

Fixes: 226e9cd8 ("net: mscc: ocelot: only install TCAM entries into a specific lookup and PAG")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

6741e118

net: mscc: ocelot: fix last VCAP IS1/IS2 filter persisting in hardware when deleted · 16bbebd3

由 Vladimir Oltean 提交于 5月 05, 2022

ocelot_vcap_filter_del() works by moving the next filters over the
current one, and then deleting the last filter by calling vcap_entry_set()
with a del_filter which was specially created by memsetting its memory
to zeroes. vcap_entry_set() then programs this to the TCAM and action
RAM via the cache registers.

The problem is that vcap_entry_set() is a dispatch function which looks
at del_filter->block_id. But since del_filter is zeroized memory, the
block_id is 0, or otherwise said, VCAP_ES0. So practically, what we do
is delete the entry at the same TCAM index from VCAP ES0 instead of IS1
or IS2.

The code was not always like this. vcap_entry_set() used to simply be
is2_entry_set(), and then, the logic used to work.

Restore the functionality by populating the block_id of the del_filter
based on the VCAP block of the filter that we're deleting. This makes
vcap_entry_set() know what to do.

Fixes: 1397a2eb ("net: mscc: ocelot: create TCAM skeleton from tc filter chains")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

16bbebd3

net: mscc: ocelot: mark traps with a bool instead of keeping them in a list · e1846cff

由 Vladimir Oltean 提交于 5月 05, 2022

Since the blamed commit, VCAP filters can appear on more than one list.
If their action is "trap", they are chained on ocelot->traps via
filter->trap_list. This is in addition to their normal placement on the
VCAP block->rules list head.

Therefore, when we free a VCAP filter, we must remove it from all lists
it is a member of, including ocelot->traps.

There are at least 2 bugs which are direct consequences of this design
decision.

First is the incorrect usage of list_empty(), meant to denote whether
"filter" is chained into ocelot->traps via filter->trap_list.
This does not do the correct thing, because list_empty() checks whether
"head->next == head", but in our case, head->next == head->prev == NULL.
So we dereference NULL pointers and die when we call list_del().

Second is the fact that not all places that should remove the filter
from ocelot->traps do so. One example is ocelot_vcap_block_remove_filter(),
which is where we have the main kfree(filter). By keeping freed filters
in ocelot->traps we end up in a use-after-free in
felix_update_trapping_destinations().

Attempting to fix all the buggy patterns is a whack-a-mole game which
makes the driver unmaintainable. Actually this is what the previous
patch version attempted to do:
https://patchwork.kernel.org/project/netdevbpf/patch/20220503115728.834457-3-vladimir.oltean@nxp.com/

but it introduced another set of bugs, because there are other places in
which create VCAP filters, not just ocelot_vcap_filter_create():

- ocelot_trap_add()
- felix_tag_8021q_vlan_add_rx()
- felix_tag_8021q_vlan_add_tx()

Relying on the convention that all those code paths must call
INIT_LIST_HEAD(&filter->trap_list) is not going to scale.

So let's do what should have been done in the first place and keep a
bool in struct ocelot_vcap_filter which denotes whether we are looking
at a trapping rule or not. Iterating now happens over the main VCAP IS2
block->rules. The advantage is that we no longer risk having stale
references to a freed filter, since it is only present in that list.

Fixes: e42bd4ed ("net: mscc: ocelot: keep traps in a list")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

e1846cff

05 5月, 2022 5 次提交

net: mscc: ocelot: don't use magic numbers for OCELOT_POLICER_DISCARD · 91d350d6

由 Vladimir Oltean 提交于 5月 03, 2022

OCELOT_POLICER_DISCARD helps "kill dropped packets dead" since a
PERMIT/DENY mask mode with a port mask of 0 isn't enough to stop the CPU
port from receiving packets removed from the forwarding path.

The hardcoded initialization done for it in ocelot_vcap_init() is
confusing. All we need from it is to have a rate and a burst size of 0.

Reuse qos_policer_conf_set() for that purpose.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

91d350d6

net: mscc: ocelot: drop port argument from qos_policer_conf_set · 8e90c499

由 Vladimir Oltean 提交于 5月 03, 2022

The "port" argument is used for nothing else except printing on the
error path. Print errors on behalf of the policer index, which is less
confusing anyway.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

8e90c499

net: mscc: ocelot: use list_for_each_entry in ocelot_vcap_filter_add_to_block · 09fd1e0d

由 Vladimir Oltean 提交于 5月 03, 2022

Unify the code paths for adding to an empty list and to a list with
elements by keeping a "pos" list_head element that indicates where to
insert. Initialize "pos" with the list head itself in case
list_for_each_entry() doesn't iterate over any element.

Note that list_for_each_safe() isn't needed because no element is
removed from the list while iterating.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

09fd1e0d

net: mscc: ocelot: add to tail of empty list in ocelot_vcap_filter_add_to_block · 3825a0d0

由 Vladimir Oltean 提交于 5月 03, 2022

This makes no functional difference but helps in minimizing the delta
for a future change.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

3825a0d0

net: mscc: ocelot: use list_add_tail in ocelot_vcap_filter_add_to_block() · 0a448bba

由 Vladimir Oltean 提交于 5月 03, 2022

list_add(..., pos->prev) and list_add_tail(..., pos) are equivalent, use
the later form to unify with the case where the list is empty later.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

0a448bba

30 4月, 2022 1 次提交

net: ethernet: ocelot: remove the need for num_stats initializer · 2f187bfa

由 Colin Foster 提交于 4月 29, 2022

There is a desire to share the oclot_stats_layout struct outside of the
current vsc7514 driver. In order to do so, the length of the array needs to
be known at compile time, and defined in the struct ocelot and struct
felix_info.

Since the array is defined in a .c file and would be declared in the header
file via:
extern struct ocelot_stat_layout[];
the size of the array will not be known at compile time to outside modules.

To fix this, remove the need for defining the number of stats at compile
time and allow this number to be determined at initialization.
Signed-off-by: NColin Foster <colin.foster@in-advantage.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f187bfa

25 4月, 2022 3 次提交

net: mscc: ocelot: don't add VID 0 to ocelot->vlans when leaving VLAN-aware bridge · 1fcb8fb3

由 Vladimir Oltean 提交于 4月 22, 2022

DSA, through dsa_port_bridge_leave(), first notifies the port of the
fact that it left a bridge, then, if that bridge was VLAN-aware, it
notifies the port of the change in VLAN awareness state, towards
VLAN-unaware mode.

So ocelot_port_vlan_filtering() can be called when ocelot_port->bridge
is NULL, and this makes ocelot_add_vlan_unaware_pvid() create a struct
ocelot_bridge_vlan with a vid of 0 and an "untagged" setting of true on
that port.

In a way this structure correctly reflects the reality, but by design,
VID 0 (OCELOT_STANDALONE_PVID) was not meant to be kept in the bridge
VLAN list of the driver, but managed separately.

Having OCELOT_STANDALONE_PVID in ocelot->vlans makes us trip up on
several sanity checks that did not expect to have this VID there.
For example, after we leave a VLAN-aware bridge and we re-join it, we
can no longer program egress-tagged VLANs to hardware:

 # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
 # ip link set swp0 master br0
 # ip link set swp0 nomaster
 # ip link set swp0 master br0
 # bridge vlan add dev swp0 vid 100
Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs.

But this configuration is in fact supported by the hardware, since we
could use OCELOT_PORT_TAG_NATIVE. According to its comment:

/* all VLANs except the native VLAN and VID 0 are egress-tagged */

yet when assessing the eligibility for this mode, we do not check for
VID 0 in ocelot_port_uses_native_vlan(), instead we just ensure that
ocelot_port_num_untagged_vlans() == 1. This is simply because VID 0
doesn't have a bridge VLAN structure.

The way I identify the problem is that ocelot_port_vlan_filtering(false)
only means to call ocelot_add_vlan_unaware_pvid() when we dynamically
turn off VLAN awareness for a bridge we are under, and the PVID changes
from the bridge PVID to a reserved PVID based on the bridge number.

Since OCELOT_STANDALONE_PVID is statically added to the VLAN table
during ocelot_vlan_init() and never removed afterwards, calling
ocelot_add_vlan_unaware_pvid() for it is not intended and does not serve
any purpose.

Fix the issue by avoiding the call to ocelot_add_vlan_unaware_pvid(vid=0)
when we're resetting VLAN awareness after leaving the bridge, to become
a standalone port.

Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fcb8fb3

net: mscc: ocelot: ignore VID 0 added by 8021q module · 9323ac36

由 Vladimir Oltean 提交于 4月 22, 2022

Both the felix DSA driver and ocelot switchdev driver declare
dev->features & NETIF_F_HW_VLAN_CTAG_FILTER under certain circumstances*,
so the 8021q module will add VID 0 to our RX filter when the port goes
up, to ensure 802.1p traffic is not dropped.

We treat VID 0 as a special value (OCELOT_STANDALONE_PVID) which
deliberately does not have a struct ocelot_bridge_vlan associated with
it. Instead, this gets programmed to the VLAN table in ocelot_vlan_init().

If we allow external calls to modify VID 0, we reach the following
situation:

# ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
# ip link set swp0 master br0
# ip link set swp0 up # this adds VID 0 to ocelot->vlans with untagged=false
bridge vlan
port vlan-id
swp0 1 PVID Egress Untagged # the bridge also adds VID 1
br0 1 PVID Egress Untagged
# bridge vlan add dev swp0 vid 100 untagged
Error: mscc_ocelot_switch_lib: Port with egress-tagged VLANs cannot have more than one egress-untagged (native) VLAN.

This configuration should have been accepted, because
ocelot_port_manage_port_tag() should select OCELOT_PORT_TAG_NATIVE.
Yet it isn't, because we have an entry in ocelot->vlans which says
VID 0 should be egress-tagged, something the hardware can't do.

Fix this by suppressing additions/deletions on VID 0 and managing this
VLAN exclusively using OCELOT_STANDALONE_PVID.

*DSA toggles it when the port becomes VLAN-aware by joining a VLAN-aware
bridge. Ocelot declares it unconditionally for some reason.

Fixes: 54c31984 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9323ac36

net: mscc: ocelot: Remove useless code · 985e254c

由 Haowen Bai 提交于 4月 21, 2022

payload only memset but no use at all, so we drop them.
Signed-off-by: NHaowen Bai <baihaowen@meizu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

985e254c

19 4月, 2022 1 次提交

net: mscc: ocelot: fix broken IP multicast flooding · 4cf35a2b

由 Vladimir Oltean 提交于 4月 15, 2022

When the user runs:
bridge link set dev $br_port mcast_flood on

this command should affect not only L2 multicast, but also IPv4 and IPv6
multicast.

In the Ocelot switch, unknown multicast gets flooded according to
different PGIDs according to its type, and PGID_MC only handles L2
multicast. Therefore, by leaving PGID_MCIPV4 and PGID_MCIPV6 at their
default value of 0, unknown IP multicast traffic is never flooded.

Fixes: 421741ea ("net: mscc: ocelot: offload bridge port flags to device")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220415151950.219660-1-vladimir.oltean@nxp.comSigned-off-by: NPaolo Abeni <pabeni@redhat.com>

4cf35a2b

18 3月, 2022 3 次提交

net: mscc: ocelot: offload per-flow mirroring using tc-mirred and VCAP IS2 · f2a0e216

由 Vladimir Oltean 提交于 3月 16, 2022

Per-flow mirroring with the VCAP IS2 TCAM (in itself handled as an
offload for tc-flower) is done by setting the MIRROR_ENA bit from the
action vector of the filter. The packet is mirrored to the port mask
configured in the ANA:ANA:MIRRORPORTS register (the same port mask as
the destinations for port-based mirroring).

Functionality was tested with:

tc qdisc add dev swp3 clsact
tc filter add dev swp3 ingress protocol ip \
	flower skip_sw ip_proto icmp \
	action mirred egress mirror dev swp1

and pinging through swp3, while seeing that the ICMP replies are
mirrored towards swp1.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

f2a0e216

net: mscc: ocelot: establish functions for handling VCAP aux resources · c3d427ea

由 Vladimir Oltean 提交于 3月 16, 2022

Some VCAP filters utilize resources which are global to the switch, like
for example VCAP IS2 policers take an index into a global policer pool.

In commit c9a7fe12 ("net: mscc: ocelot: add action of police on
vcap_is2"), Xiaoliang expressed this by hooking into the low-level
ocelot_vcap_filter_add_to_block() and ocelot_vcap_block_remove_filter()
functions, and allocating/freeing the policers from there.

Evaluating the code, there probably isn't a better place, but we'll need
to do something similar for the mirror ports, and the code will start to
look even more hacked up than it is right now.

Create two ocelot_vcap_filter_{add,del}_aux_resources() functions to
contain the madness, and pollute less the body of other functions such
as ocelot_vcap_filter_add_to_block().
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

c3d427ea

net: mscc: ocelot: add port mirroring support using tc-matchall · ccb6ed42

由 Vladimir Oltean 提交于 3月 16, 2022

Ocelot switches perform port-based ingress mirroring if
ANA:PORT:PORT_CFG field SRC_MIRROR_ENA is set, and egress mirroring if
the port is in ANA:ANA:EMIRRORPORTS.

Both ingress-mirrored and egress-mirrored frames are copied to the port
mask from ANA:ANA:MIRRORPORTS.

So the choice of limiting to a single mirror port via ocelot_mirror_get()
and ocelot_mirror_put() may seem bizarre, but the hardware model doesn't
map very well to the user space model. If the user wants to mirror the
ingress of swp1 towards swp2 and the ingress of swp3 towards swp4, we'd
have to program ANA:ANA:MIRRORPORTS with BIT(2) | BIT(4), and that would
make swp1 be mirrored towards swp4 too, and swp3 towards swp2. But there
are no tc-matchall rules to describe those actions.

Now, we could offload a matchall rule with multiple mirred actions, one
per desired mirror port, and force the user to stick to the multi-action
rule format for subsequent matchall filters. But both DSA and ocelot
have the flow_offload_has_one_action() check for the matchall offload,
plus the fact that it will get cumbersome to cross-check matchall
mirrors with flower mirrors (which will be added in the next patch).

As a result, we limit the configuration to a single mirror port, with
the possibility of lifting the restriction in the future.

Frames injected from the CPU don't get egress-mirrored, since they are
sent with the BYPASS bit in the injection frame header, and this
bypasses the analyzer module (effectively also the mirroring logic).
I don't know what to do/say about this.

Functionality was tested with:

tc qdisc add dev swp3 clsact
tc filter add dev swp3 ingress \
	matchall skip_sw \
	action mirred egress mirror dev swp1

and pinging through swp3, while seeing that the ICMP replies are
mirrored towards swp1.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

ccb6ed42

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功