- 28 4月, 2021 2 次提交
-
-
由 Yangbo Lu 提交于
Convert to a common ocelot_port_txtstamp_request() for TX timestamp request handling. Signed-off-by: NYangbo Lu <yangbo.lu@nxp.com> Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com> Acked-by: NRichard Cochran <richardcochran@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yangbo Lu 提交于
Free skb->cb usage in core driver and let device drivers decide to use or not. The reason having a DSA_SKB_CB(skb)->clone was because dsa_skb_tx_timestamp() which may set the clone pointer was called before p->xmit() which would use the clone if any, and the device driver has no way to initialize the clone pointer. This patch just put memset(skb->cb, 0, sizeof(skb->cb)) at beginning of dsa_slave_xmit(). Some new features in the future, like one-step timestamp may need more bytes of skb->cb to use in dsa_skb_tx_timestamp(), and p->xmit(). Signed-off-by: NYangbo Lu <yangbo.lu@nxp.com> Acked-by: NRichard Cochran <richardcochran@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 3月, 2021 1 次提交
-
-
由 Vladimir Oltean 提交于
The premise of this change is that the switchdev port attributes and objects offloaded by ocelot might have been missed when we are joining an already existing bridge port, such as a bonding interface. The patch pulls these switchdev attributes and objects from the bridge, on behalf of the 'bridge port' net device which might be either the ocelot switch interface, or the bonding upper interface. The ocelot_net.c belongs strictly to the switchdev ocelot driver, while ocelot.c is part of a library shared with the DSA felix driver. The ocelot_port_bridge_leave function (part of the common library) used to call ocelot_port_vlan_filtering(false), something which is not necessary for DSA, since the framework deals with that already there. So we move this function to ocelot_switchdev_unsync, which is specific to the switchdev driver. The code movement described above makes ocelot_port_bridge_leave no longer return an error code, so we change its type from int to void. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 3月, 2021 1 次提交
-
-
由 Vladimir Oltean 提交于
The ocelot switches are a bit odd in that they do not have an STP state to put the ports into. Instead, the forwarding configuration is delayed from the typical port_bridge_join into stp_state_set, when the port enters the BR_STATE_FORWARDING state. I can only guess that the implementation of this quirk is the reason that led to the simplification of the driver such that only one bridge could be offloaded at a time. We can simplify the data structures somewhat, and introduce a per-port bridge device pointer and STP state, similar to how the LAG offload works now (there we have a per-port bonding device pointer and TX enabled state). This allows offloading multiple bridges with relative ease, while still keeping in place the quirk to delay the programming of the PGIDs. We actually need this change now because we need to remove the bogus restriction from ocelot_bridge_stp_state_set that ocelot->bridge_mask needs to contain BIT(port), otherwise that function is a no-op. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 3月, 2021 2 次提交
-
-
由 Horatiu Vultur 提交于
This patch extends MRP support for Ocelot. It allows to have multiple rings and when the node has the MRC role it forwards MRP Test frames in HW. For MRM there is no change. Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Horatiu Vultur 提交于
Add a new PGID that is used not to forward frames anywhere. It is used by MRP to make sure that MRP Test frames will not reach CPU port. Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 2月, 2021 1 次提交
-
-
由 Horatiu Vultur 提交于
Add basic support for MRP. The HW will just trap all MRP frames on the ring ports to CPU and allow the SW to process them. In this way it is possible to for this node to behave both as MRM and MRC. Current limitations are: - it doesn't support Interconnect roles. - it supports only a single ring. - the HW should be able to do forwarding of MRP Test frames so the SW will not need to do this. So it would be able to have the role MRC without SW support. Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 2月, 2021 4 次提交
-
-
由 Vladimir Oltean 提交于
For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Since the felix DSA driver will need to poll the CPU port module for extracted frames as well, let's create some common functions that read an Extraction Frame Header, and then an skb, from a CPU extraction group. We abuse the struct ocelot_ops :: port_to_netdev function a little bit, in order to retrieve the DSA port net_device or the ocelot switchdev net_device based on the source port information from the Extraction Frame Header, but it's all in the benefit of code simplification - netdev_alloc_skb needs it. Originally, the port_to_netdev method was intended for parsing act->dev from tc flower offload code. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
The Injection Frame Header and Extraction Frame Header that the switch prepends to frames over the NPI port is also prepended to frames delivered over the CPU port module's queues. Let's unify the handling of the frame headers by making the ocelot driver call some helpers exported by the DSA tagger. Among other things, this allows us to get rid of the strange cpu_to_be32 when transmitting the Injection Frame Header on ocelot, since the packing API uses network byte order natively (when "quirks" is 0). The comments above ocelot_gen_ifh talk about setting pop_cnt to 3, and the cpu extraction queue mask to something, but the code doesn't do it, so we don't do it either. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
The felix DSA driver will inject some frames through register MMIO, same as ocelot switchdev currently does. So we need to be able to reuse the common code. Also create some shim definitions, since the DSA tagger can be compiled without support for the switch driver. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 2月, 2021 2 次提交
-
-
由 Vladimir Oltean 提交于
We should not be unconditionally enabling address learning, since doing that is actively detrimential when a port is standalone and not offloading a bridge. Namely, if a port in the switch is standalone and others are offloading the bridge, then we could enter a situation where we learn an address towards the standalone port, but the bridged ports could not forward the packet there, because the CPU is the only path between the standalone and the bridged ports. The solution of course is to not enable address learning unless the bridge asks for it. We need to set up the initial port flags for no learning and flooding everything, and also when the port joins and leaves the bridge. The flood configuration was already configured ok for standalone mode in ocelot_init, we just need to disable learning in ocelot_init_port. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NAlexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
In preparation of offloading the bridge port flags which have independent settings for unknown multicast and for broadcast, we should also start reserving one destination Port Group ID for the flooding of broadcast packets, to allow configuring it individually. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 2月, 2021 1 次提交
-
-
由 Vladimir Oltean 提交于
There are several issues which may be seen when the link goes down while forwarding traffic, all of which can be attributed to the fact that the port flushing procedure from the reference manual was not closely followed. With flow control enabled on both the ingress port and the egress port, it may happen when a link goes down that Ethernet packets are in flight. In flow control mode, frames are held back and not dropped. When there is enough traffic in flight (example: iperf3 TCP), then the ingress port might enter congestion and never exit that state. This is a problem, because it is the egress port's link that went down, and that has caused the inability of the ingress port to send packets to any other port. This is solved by flushing the egress port's queues when it goes down. There is also a problem when performing stream splitting for IEEE 802.1CB traffic (not yet upstream, but a sort of multicast, basically). There, if one port from the destination ports mask goes down, splitting the stream towards the other destinations will no longer be performed. This can be traced down to this line: ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG); which should have been instead, as per the reference manual: ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA, DEV_MAC_ENA_CFG); Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is the case, but apparently multicasting to several ports will cause issues if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set. I am not sure what the state of the Ocelot VSC7514 driver is, but probably not as bad as Felix/Seville, since VSC7514 uses phylib and has the following in ocelot_adjust_link: if (!phydev->link) return; therefore the port is not really put down when the link is lost, unlike the DSA drivers which use .phylink_mac_link_down for that. Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h which are not exported in include/soc/mscc/ and a bugfix patch should probably not move headers around. Fixes: bdeced75 ("net: dsa: felix: Add PCS operations for PHYLINK") Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 2月, 2021 4 次提交
-
-
由 Vladimir Oltean 提交于
The ocelot switch has been supporting LAG offload since its initial commit, however felix could not make use of that, due to lack of a LAG abstraction in DSA. Now that we have that, let's forward DSA's calls towards the ocelot library, who will deal with setting up the bonding. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
At present there is an issue when ocelot is offloading a bonding interface, but one of the links of the physical ports goes down. Traffic keeps being hashed towards that destination, and of course gets dropped on egress. Monitor the netdev notifier events emitted by the bonding driver for changes in the physical state of lower interfaces, to determine which ports are active and which ones are no longer. Then extend ocelot_get_bond_mask to return either the configured bonding interfaces, or the active ones, depending on a boolean argument. The code that does rebalancing only needs to do so among the active ports, whereas the bridge forwarding mask and the logical port IDs still need to look at the permanently bonded ports. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
We can now simplify the implementation by always using ocelot_get_bond_mask to look up the other ports that are offloading the same bonding interface as us. In ocelot_set_aggr_pgids, the code had a way to uniquely iterate through LAGs. We need to achieve the same behavior by marking each LAG as visited, which we do now by using a temporary 32-bit "visited" bitmask. This is ok and we do not need dynamic memory allocation, because we know that this switch architecture will not have more than 32 ports (the PGID port masks are 32-bit anyway). Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
Since this code should be called from pure switchdev as well as from DSA, we must find a way to determine the bonding mask not by looking directly at the net_device lowers of the bonding interface, since those could have different private structures. We keep a pointer to the bonding upper interface, if present, in struct ocelot_port. Then the bonding mask becomes the bitwise OR of all ports that have the same bonding upper interface. This adds a duplication of functionality with the current "lags" array, but the duplication will be short-lived, since further patches will remove the latter completely. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NAlexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 30 1月, 2021 2 次提交
-
-
由 Vladimir Oltean 提交于
Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
Context: Ocelot switches put the injection/extraction frame header in front of the Ethernet header. When used in NPI mode, a DSA master would see junk instead of the destination MAC address, and it would most likely drop the packets. So the Ocelot frame header can have an optional prefix, which is just "ff:ff:ff:ff:ff:fe > ff:ff:ff:ff:ff:ff" padding put before the actual tag (still before the real Ethernet header) such that the DSA master thinks it's looking at a broadcast frame with a strange EtherType. Unfortunately, a lesson learned in commit 69df578c ("net: mscc: ocelot: eliminate confusion between CPU and NPI port") seems to have been forgotten in the meanwhile. The CPU port module and the NPI port have independent settings for the length of the tag prefix. However, the driver is using the same variable to program both of them. There is no reason really to use any tag prefix with the CPU port module, since that is not connected to any Ethernet port. So this patch makes the inj_prefix and xtr_prefix variables apply only to the NPI port (which the switchdev ocelot_vsc7514 driver does not use). Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 16 1月, 2021 5 次提交
-
-
由 Vladimir Oltean 提交于
Using devlink-sb, we can configure 12/16 (the important 75%) of the switch's controlling watermarks for congestion drops, and we can monitor 50% of the watermark occupancies (we can monitor the reservation watermarks, but not the sharing watermarks, which are exposed as pool sizes). The following definitions can be made: SB_BUF=0 # The devlink-sb for frame buffers SB_REF=1 # The devlink-sb for frame references POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances # have one of these. POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances # have one of these. Editing the hardware watermarks is done in the following way: BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly by modifying the corresponding pool size. By default, the pool size has maximum size, so this can be skipped. devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \ size 129840 thtype static Since by default there is no buffer reservation, the above command has maxed out BUF_COL_SHR_I(dp=0). Configuring the per-port reservation watermark (P_RSRV) is done in the following way: devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \ pool $POOL_ING th 1000 The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this command, the sharing watermarks are internally reconfigured with 1000 bytes less, i.e. from 129840 bytes to 128840 bytes. Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in the following way: for tc in {0..7}; do devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \ type ingress pool $POOL_ING \ th 3000 done The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes. The sharing watermarks are again reconfigured with 24000 bytes less. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
Add devlink integration into the mscc_ocelot switchdev driver. All physical ports (i.e. the unused ones as well) except the CPU port module at ocelot->num_phys_ports are registered with devlink, and that requires keeping the devlink_port structure outside struct ocelot_port_private, since the latter has a 1:1 mapping with a struct net_device (which does not exist for unused ports). Since we use devlink_port_type_eth_set to link the devlink port to the net_device, we can as well remove the .ndo_get_phys_port_name and .ndo_get_port_parent_id implementations, since devlink takes care of retrieving the port name and number automatically, once .ndo_get_devlink_port is implemented. Note that the felix DSA driver is already integrated with devlink by default, since that is a thing that the DSA core takes care of. This is the reason why these devlink stubs were put in ocelot_net.c and not in the common library. It is also the reason why ocelot::devlink is a pointer and not a full structure embedded inside struct ocelot: because the mscc_ocelot driver allocates that by itself (as the container of struct ocelot, in fact), but in the case of felix, it is DSA who allocates the devlink, and felix just propagates the pointer towards struct ocelot. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
We should be moving anything that isn't DSA-specific or SoC-specific out of the felix DSA driver, and into the common mscc_ocelot switch library. The number of traffic classes is one of the aspects that is common between all ocelot switches, so it belongs in the library. This patch also makes seville use 8 TX queues, and therefore enables prioritization via the QOS_CLASS field in the NPI injection header. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
We'll need to read back the watermark thresholds and occupancy from hardware (for devlink-sb integration), not only to write them as we did so far in ocelot_port_set_maxlen. So introduce 2 new functions in struct ocelot_ops, similar to wm_enc, and implement them for the 3 supported mscc_ocelot switches. Remove the INUSE and MAXUSE unpacking helpers for the QSYS_RES_STAT register, because that doesn't scale with the number of switches that mscc_ocelot supports now. They have different bit widths for the watermarks, and we need function pointers to abstract that difference away. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
Instead of reading these values from the reference manual and writing them down into the driver, it appears that the hardware gives us the option of detecting them dynamically. The number of frame references corresponds to what the reference manual notes, however it seems that the frame buffers are reported as slightly less than the books would indicate. On VSC9959 (Felix), the books say it should have 128KB of packet buffer, but the registers indicate only 129840 bytes (126.79 KB). Also, the unit of measurement for FREECNT from the documentation of all these devices is incorrect (taken from an older generation). This was confirmed by Younes Leroul from Microchip support. Not having anything better to do with these values at the moment* (this will change soon), let's just print them. *The frame buffer size is, in fact, used to calculate the tail dropping watermarks. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 12 1月, 2021 1 次提交
-
-
由 Vladimir Oltean 提交于
Since the introduction of the switchdev API, port attributes were transmitted to drivers for offloading using a two-step transactional model, with a prepare phase that was supposed to catch all errors, and a commit phase that was supposed to never fail. Some classes of failures can never be avoided, like hardware access, or memory allocation. In the latter case, merely attempting to move the memory allocation to the preparation phase makes it impossible to avoid memory leaks, since commit 91cf8ece ("switchdev: Remove unused transaction item queue") which has removed the unused mechanism of passing on the allocated memory between one phase and another. It is time we admit that separating the preparation from the commit phase is something that is best left for the driver to decide, and not something that should be baked into the API, especially since there are no switchdev callers that depend on this. This patch removes the struct switchdev_trans member from switchdev port attribute notifier structures, and converts drivers to not look at this member. In part, this patch contains a revert of my previous commit 2e554a7a ("net: dsa: propagate switchdev vlan_filtering prepare phase to drivers"). For the most part, the conversion was trivial except for: - Rocker's world implementation based on Broadcom OF-DPA had an odd implementation of ofdpa_port_attr_bridge_flags_set. The conversion was done mechanically, by pasting the implementation twice, then only keeping the code that would get executed during prepare phase on top, then only keeping the code that gets executed during the commit phase on bottom, then simplifying the resulting code until this was obtained. - DSA's offloading of STP state, bridge flags, VLAN filtering and multicast router could be converted right away. But the ageing time could not, so a shim was introduced and this was left for a further commit. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Acked-by: NLinus Walleij <linus.walleij@linaro.org> Acked-by: NJiri Pirko <jiri@nvidia.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB Reviewed-by: NIdo Schimmel <idosch@nvidia.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 15 12月, 2020 1 次提交
-
-
由 Vladimir Oltean 提交于
Currently ocelot_set_rx_mode calls ocelot_mact_learn directly, which has a very nice ocelot_mact_wait_for_completion at the end. Introduced in commit 639c1b26 ("net: mscc: ocelot: Register poll timeout should be wall time not attempts"), this function uses readx_poll_timeout which triggers a lot of lockdep warnings and is also dangerous to use from atomic context, potentially leading to lockups and panics. Steen Hegelund added a poll timeout of 100 ms for checking the MAC table, a duration which is clearly absurd to poll in atomic context. So we need to defer the MAC table access to process context, which we do via a dynamically allocated workqueue which contains all there is to know about the MAC table operation it has to do. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20201212191612.222019-1-vladimir.oltean@nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 06 12月, 2020 1 次提交
-
-
由 Vladimir Oltean 提交于
The current assumption is that the felix DSA driver has flooding knobs per traffic class, while ocelot switchdev has a single flooding knob. This was correct for felix VSC9959 and ocelot VSC7514, but with the introduction of seville VSC9953, we see a switch driven by felix.c which has a single flooding knob. So it is clear that we must do what should have been done from the beginning, which is not to overwrite the configuration done by ocelot.c in felix, but instead to teach the common ocelot library about the differences in our switches, and set up the flooding PGIDs centrally. The effect that the bogus iteration through FELIX_NUM_TC has upon seville is quite dramatic. ANA_FLOODING is located at 0x00b548, and ANA_FLOODING_IPMC is located at 0x00b54c. So the bogus iteration will actually overwrite ANA_FLOODING_IPMC when attempting to write ANA_FLOODING[1]. There is no ANA_FLOODING[1] in sevile, just ANA_FLOODING. And when ANA_FLOODING_IPMC is overwritten with a bogus value, the effect is that ANA_FLOODING_IPMC gets the value of 0x0003CF7D: MC6_DATA = 61, MC6_CTRL = 61, MC4_DATA = 60, MC4_CTRL = 0. Because MC4_CTRL is zero, this means that IPv4 multicast control packets are not flooded, but dropped. An invalid configuration, and this is how the issue was actually spotted. Reported-by: NEldar Gasanov <eldargasanov2@gmail.com> Reported-by: NMaxim Kochetkov <fido_max@inbox.ru> Tested-by: NEldar Gasanov <eldargasanov2@gmail.com> Fixes: 84705fc1 ("net: dsa: felix: introduce support for Seville VSC9953 switch") Fixes: 3c7b51bd ("net: dsa: felix: allow flooding for all traffic classes") Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NAlexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201204175416.1445937-1-vladimir.oltean@nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 03 11月, 2020 3 次提交
-
-
由 Vladimir Oltean 提交于
Put the preparation phase of switchdev VLAN objects to some good use, and move the check we already had, for preventing the existence of more than one egress-untagged VLAN per port, to the preparation phase of the addition. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
Currently we are checking in some places whether the port has a native VLAN on egress or not, by comparing the ocelot_port->vid value with zero. That works, because VID 0 can never be a native VLAN configured by the bridge, but now we want to make similar checks for the pvid. That won't work, because there are cases when we do have the pvid set to 0 (not by the bridge, by ourselves, but still.. it's confusing). And we can't encode a negative value into an u16, so add a bool to the structure. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vladimir Oltean 提交于
This is a mechanical patch only. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 31 10月, 2020 1 次提交
-
-
由 Vladimir Oltean 提交于
There is one main difference in mscc_ocelot between IP multicast and L2 multicast. With IP multicast, destination ports are encoded into the upper bytes of the multicast MAC address. Example: to deliver the address 01:00:5E:11:22:33 to ports 3, 8, and 9, one would need to program the address of 00:03:08:11:22:33 into hardware. Whereas for L2 multicast, the MAC table entry points to a Port Group ID (PGID), and that PGID contains the port mask that the packet will be forwarded to. As to why it is this way, no clue. My guess is that not all port combinations can be supported simultaneously with the limited number of PGIDs, and this was somehow an issue for IP multicast but not for L2 multicast. Anyway. Prior to this change, the raw L2 multicast code was bogus, due to the fact that there wasn't really any way to test it using the bridge code. There were 2 issues: - A multicast PGID was allocated for each MDB entry, but it wasn't in fact programmed to hardware. It was dummy. - In fact we don't want to reserve a multicast PGID for every single MDB entry. That would be odd because we can only have ~60 PGIDs, but thousands of MDB entries. So instead, we want to reserve a multicast PGID for every single port combination for multicast traffic. And since we can have 2 (or more) MDB entries delivered to the same port group (and therefore PGID), we need to reference-count the PGIDs. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 05 10月, 2020 1 次提交
-
-
由 Vladimir Oltean 提交于
A driver may refuse to enable VLAN filtering for any reason beyond what the DSA framework cares about, such as: - having tc-flower rules that rely on the switch being VLAN-aware - the particular switch does not support VLAN, even if the driver does (the DSA framework just checks for the presence of the .port_vlan_add and .port_vlan_del pointers) - simply not supporting this configuration to be toggled at runtime Currently, when a driver rejects a configuration it cannot support, it does this from the commit phase, which triggers various warnings in switchdev. So propagate the prepare phase to drivers, to give them the ability to refuse invalid configurations cleanly and avoid the warnings. Since we need to modify all function prototypes and check for the prepare phase from within the drivers, take that opportunity and move the existing driver restrictions within the prepare phase where that is possible and easy. Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: Woojung Huh <woojung.huh@microchip.com> Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com> Cc: Sean Wang <sean.wang@mediatek.com> Cc: Landen Chao <Landen.Chao@mediatek.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Jonathan McDowell <noodles@earth.li> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 10月, 2020 2 次提交
-
-
由 Vladimir Oltean 提交于
For Ocelot switches, there are 2 ingress pipelines for flow offload rules: VCAP IS1 (Ingress Classification) and IS2 (Security Enforcement). IS1 and IS2 support different sets of actions. The pipeline order for a packet on ingress is: Basic classification -> VCAP IS1 -> VCAP IS2 Furthermore, IS1 is looked up 3 times, and IS2 is looked up twice (each TCAM entry can be configured to match only on the first lookup, or only on the second, or on both etc). Because the TCAMs are completely independent in hardware, and because of the fixed pipeline, we actually have very limited options when it comes to offloading complex rules to them while still maintaining the same semantics with the software data path. This patch maps flow offload rules to ingress TCAMs according to a predefined chain index number. There is going to be a script in selftests that clarifies the usage model. There is also an egress TCAM (VCAP ES0, the Egress Rewriter), which is modeled on top of the default chain 0 of the egress qdisc, because it doesn't have multiple lookups. Suggested-by: NAllan W. Nielsen <allan.nielsen@microchip.com> Co-developed-by: NXiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: NXiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Since the mscc_ocelot_switch_lib is common between a pure switchdev and a DSA driver, the procedure of retrieving a net_device for a certain port index differs, as those are registered by their individual front-ends. Up to now that has been dealt with by always passing the port index to the switch library, but now, we're going to need to work with net_device pointers from the tc-flower offload, for things like indev, or mirred. It is not desirable to refactor that, so let's make sure that the flower offload core has the ability to translate between a net_device and a port index properly. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Acked-by: NAlexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 9月, 2020 5 次提交
-
-
由 Vladimir Oltean 提交于
The numbers in struct vcap_props are not intuitive to derive, because they are not a straightforward copy-and-paste from the reference manual but instead rely on a fairly detailed level of understanding of the layout of an entry in the TCAM and in the action RAM. For this reason, bugs are very easy to introduce here. Ease the work of hardware porters and read from hardware the constants that were exported for this particular purpose. Note that this implies that struct vcap_props can no longer be const. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
As a preparation step for the offloading to ES0, let's create the infrastructure for talking with this hardware block. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
As a preparation step for the offloading to IS1, let's create the infrastructure for talking with this hardware block. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
In the Ocelot switches there are 3 TCAMs: VCAP ES0, IS1 and IS2, which have the same configuration interface, but different sets of keys and actions. The driver currently only supports VCAP IS2. In preparation of VCAP IS1 and ES0 support, the existing code must be generalized to work with any VCAP. In that direction, we should move the structures that depend upon VCAP instantiation, like vcap_is2_keys and vcap_is2_actions, out of struct ocelot and into struct vcap_props .keys and .actions, a structure that is replicated 3 times, once per VCAP. We'll pass that structure as an argument to each function that does the key and action packing - only the control logic needs to distinguish between ocelot->vcap[VCAP_IS2] or IS1 or ES0. Another change is to make use of the newly introduced ocelot_target_read and ocelot_target_write API, since the 3 VCAPs have the same registers but put at different addresses. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
There are some targets (register blocks) in the Ocelot switch that are instantiated more than once. For example, the VCAP IS1, IS2 and ES0 blocks all share the same register layout for interacting with the cache for the TCAM and the action RAM. For the VCAPs, the procedure for servicing them is actually common. We just need an API specifying which VCAP we are talking to, and we do that via these raw ocelot_target_read and ocelot_target_write accessors. In plain ocelot_read, the target is encoded into the register enum itself: u16 target = reg >> TARGET_OFFSET; For the VCAPs, the registers are currently defined like this: enum ocelot_reg { [...] S2_CORE_UPDATE_CTRL = S2 << TARGET_OFFSET, S2_CORE_MV_CFG, S2_CACHE_ENTRY_DAT, S2_CACHE_MASK_DAT, S2_CACHE_ACTION_DAT, S2_CACHE_CNT_DAT, S2_CACHE_TG_DAT, [...] }; which is precisely what we want to avoid, because we'd have to duplicate the same register map for S1 and for S0, and then figure out how to pass VCAP instance-specific registers to the ocelot_read calls (basically another lookup table that undoes the effect of shifting with TARGET_OFFSET). So for some targets, propose a more raw API, similar to what is currently done with ocelot_port_readl and ocelot_port_writel. Those targets can only be accessed with ocelot_target_{read,write} and not with ocelot_{read,write} after the conversion, which is fine. The VCAP registers are not actually modified to use this new API as of this patch. They will be modified in the next one. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Acked-by: NAlexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-