1. 08 May, 2020 4 commits
    • net: dsa: sja1105: implement tc-gate using time-triggered virtual links · 834f8933
      Vladimir Oltean committed
      Restrict the TTEthernet hardware support on this switch to operate as
      closely as possible to IEEE 802.1Qci. This means that it can
      perform PTP-time-based ingress admission control on streams identified
      by {DMAC, VID, PCP}, which is useful when trying to ensure the
      determinism of traffic scheduled via IEEE 802.1Qbv.
      
      The oddity comes from the fact that in hardware (and in TTEthernet at
      large), virtual links always need a full-blown action, including not
      only the type of policing, but also the list of destination ports. So in
      practice, a single tc-gate action will result in all packets getting
      dropped. Additional actions (either "trap" or "redirect") need to be
      specified in the same filter rule such that the conforming packets are
      actually forwarded somewhere.
      
      Apart from the VL Lookup, Policing and Forwarding tables which need to
      be programmed for each flow (virtual link), the Schedule engine also
      needs to be told to open/close the admission gates for each individual
      virtual link. A fairly accurate (and detailed) description of how that
      works is already present in sja1105_tas.c, since it is already used to
      trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). The
      key point to remember is that the schedule engine supports 8
      "subschedules" (execution threads that iterate through the global
      schedule in parallel), and that no 2 hardware threads may execute a
      schedule entry at the same time. For tc-taprio, each egress port used
      one of these 8 subschedules, leaving a total of 4 subschedules unused.
      In principle we could have allocated 1 subschedule for the tc-gate
      offload of each ingress port, but actually the schedules of all virtual
      links installed on each ingress port would have needed to be merged
      together, before they could have been programmed to hardware. So
      simplify our life and just merge the entire tc-gate configuration, for
      all virtual links on all ingress ports, into a single subschedule. Be
      sure to check that against the usual hardware scheduling conflicts, and
      program it to hardware alongside any tc-taprio subschedule that may be
      present.
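
The merge described above can be sketched roughly as follows. This is an illustration only, not the driver's code: the names `GateEntry` and `merge_gate_schedules` are hypothetical, and the sketch assumes that all virtual links share one base time and that every cycle time divides the longest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateEntry:
    is_open: bool      # True for OPEN, False for CLOSE
    interval_ns: int   # how long this gate state lasts


def merge_gate_schedules(schedules):
    """Flatten per-virtual-link gate lists into one time-sorted event list.

    `schedules` maps a (hypothetical) virtual link id to its GateEntry list.
    Assumes a shared base time and cycle times that divide the longest one.
    """
    cycle = {vl: sum(e.interval_ns for e in entries)
             for vl, entries in schedules.items()}
    common_cycle = max(cycle.values())
    events = []
    for vl, entries in schedules.items():
        if common_cycle % cycle[vl] != 0:
            raise ValueError(f"cycle of VL {vl} does not divide {common_cycle}")
        # Repeat shorter schedules until they fill the common cycle.
        for start in range(0, common_cycle, cycle[vl]):
            t = start
            for e in entries:
                events.append((t, vl, e.is_open))
                t += e.interval_ns
    return sorted(events)


merged = merge_gate_schedules({
    1: [GateEntry(True, 60000), GateEntry(False, 40000)],
    2: [GateEntry(True, 25000), GateEntry(False, 25000)],
})
for time_ns, vl, is_open in merged:
    print(time_ns, vl, "OPEN" if is_open else "CLOSE")
```

The resulting single list is what would then have to be checked against the tc-taprio subschedules for conflicts before being programmed.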
      
      The following scenarios were tested:
      
      1. Quantitative testing:
      
         tc qdisc add dev swp2 clsact
         tc filter add dev swp2 ingress flower skip_sw \
                 dst_mac 42:be:24:9b:76:20 \
                 action gate index 1 base-time 0 \
                 sched-entry OPEN 1200 -1 -1 \
                 sched-entry CLOSE 1200 -1 -1 \
                 action trap
      
         ping 192.168.1.2 -f
         PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
         .............................
         --- 192.168.1.2 ping statistics ---
         948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms
      
      2. Qualitative testing (with a phase-aligned schedule - the clocks are
         synchronized by ptp4l, not shown here):
      
         Receiver (sja1105):
      
         tc qdisc add dev swp2 clsact
         now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
                 sec=$(echo $now | awk -F. '{print $1}') && \
                 base_time="$(((sec + 2) * 1000000000))" && \
                 echo "base time ${base_time}"
         tc filter add dev swp2 ingress flower skip_sw \
                 dst_mac 42:be:24:9b:76:20 \
                 action gate base-time ${base_time} \
                 sched-entry OPEN  60000 -1 -1 \
                 sched-entry CLOSE 40000 -1 -1 \
                 action trap
      
         Sender (enetc):
         now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
                 sec=$(echo $now | awk -F. '{print $1}') && \
                 base_time="$(((sec + 2) * 1000000000))" && \
                 echo "base time ${base_time}"
         tc qdisc add dev eno0 parent root taprio \
                 num_tc 8 \
                 map 0 1 2 3 4 5 6 7 \
                 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
                 base-time ${base_time} \
                 sched-entry S 01  50000 \
                 sched-entry S 00  50000 \
                 flags 2
      
         ping -A 192.168.1.1
         PING 192.168.1.1 (192.168.1.1): 56 data bytes
         ...
         ^C
         --- 192.168.1.1 ping statistics ---
         1425 packets transmitted, 1424 packets received, 0% packet loss
         round-trip min/avg/max = 0.322/0.361/0.990 ms
      
         And just for comparison, with the tc-taprio schedule deleted:
      
         ping -A 192.168.1.1
         PING 192.168.1.1 (192.168.1.1): 56 data bytes
         ...
         ^C
         --- 192.168.1.1 ping statistics ---
         33 packets transmitted, 19 packets received, 42% packet loss
         round-trip min/avg/max = 0.336/0.464/0.597 ms
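
As a rough sanity check on the quantitative test (scenario 1), the expected loss for a gate that alternates between OPEN and CLOSE can be estimated from the duty cycle of the schedule. The sketch below is a first-order estimate only: the helper name is hypothetical, and it ignores frame transmission time and how packet arrivals are distributed over the cycle.

```python
def expected_loss_fraction(entries):
    """entries: list of (state, interval_ns), state being 'OPEN' or 'CLOSE'."""
    total = sum(interval for _, interval in entries)
    closed = sum(interval for state, interval in entries if state == "CLOSE")
    return closed / total


# Schedule from scenario 1: OPEN 1200 ns, CLOSE 1200 ns.
quantitative = [("OPEN", 1200), ("CLOSE", 1200)]
estimate = expected_loss_fraction(quantitative)

# Loss reported by ping in scenario 1: 948 transmitted, 467 received.
measured = (948 - 467) / 948

print(f"estimated {estimate:.1%}, measured {measured:.4%}")
```

The measured figure sits close to the 50% that the duty cycle predicts.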
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: support flow-based redirection via virtual links · dfacc5a2
      Vladimir Oltean committed
      Implement tc-flower offloads for redirect, trap and drop using
      non-critical virtual links.
      
      Commands which were tested to work are:
      
        # Send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the
        # CPU and to swp3. This type of key (DA only) is used when the port's
        # VLAN awareness state is off.
        tc qdisc add dev swp2 clsact
        tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \
                action mirred egress redirect dev swp3 \
                action trap
      
        # Drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID
        # of 100 and a PCP of 0.
        tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \
                dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop
      
      Under the hood, all rules match on DMAC, VID and PCP, but when VLAN
      filtering is disabled, those are set internally by the driver to the
      port-based defaults. Because we would be put in an awkward situation if
      the user were to change the VLAN filtering state while there are active
      rules (packets would no longer match on the specified keys), we simply
      deny changing vlan_filtering unless the list of flows offloaded via
      virtual links is empty. Then the user can re-add new rules.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: make room for virtual link parsing in flower offload · b70bb8d4
      Vladimir Oltean committed
      Virtual links are a sja1105 hardware concept of executing various flow
      actions based on a key extracted from the frame's DMAC, VID and PCP.
      
      Currently the tc-flower offload code supports only parsing the DMAC if
      that is the broadcast MAC address, and the VLAN PCP. Extract the key
      parsing logic from the L2 policers functionality and move it into its
      own function, after adding extra logic for matching on any DMAC and VID.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: add static tables for virtual links · 94f94d4a
      Vladimir Oltean committed
      This patch adds the register definitions for the:
      - VL Lookup Table
      - VL Policing Table
      - VL Forwarding Table
      - VL Forwarding Parameters Table
      
      These are needed in order to perform TTEthernet operations: QoS
      classification, flow-based policing and/or frame redirecting with the
      switch.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 April, 2020 1 commit
    • net: dsa: sja1105: enable internal pull-down for RX_DV/CRS_DV/RX_CTL and RX_ER · 135e3018
      Vladimir Oltean committed
      Some boards do not have the RX_ER MII signal connected. Normally in such
      a situation, the pin would be grounded, but some boards leave it
      electrically floating.
      
      When sending traffic to those switch ports, one can see that the
      N_SOFERR statistics counter increments once per packet. The
      user manual states for this counter that it may count the number of
      frames "that have the MII error input being asserted prior to or
      up to the SOF delimiter byte". So the switch MAC is sampling an
      electrically floating signal, and preventing proper traffic reception
      because of that.
      
      As a workaround, enable the internal weak pull-downs on the input pads
      for the MII control signals. This way, a floating signal would be
      internally tied to ground.
      
      The logic levels of signals which _are_ externally driven should not be
      bothered by this 40-50 KOhm internal resistor. So it is not an issue to
      enable the internal pull-down unconditionally, irrespective of PHY
      interface type (MII, RMII, RGMII, SGMII) and of board layout.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 31 March, 2020 1 commit
    • net: dsa: sja1105: add broadcast and per-traffic class policers · a6af7763
      Vladimir Oltean committed
      This patch adds complete support for manipulating the L2 Policing Tables
      from this switch. There are 45 table entries: one entry for each port
      and traffic class, plus one dedicated entry for broadcast traffic for
      each ingress port.
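
The count of 45 follows from 5 ports with 8 traffic classes each, plus 5 broadcast entries. A minimal sketch of one possible index layout is shown below. The ordering (per-traffic-class entries first, broadcast entries last) and the helper names are assumptions made for illustration, not something stated in this commit message.

```python
NUM_PORTS = 5   # assumption: 4 front-panel ports plus the CPU port
NUM_TC = 8      # one traffic class per VLAN PCP value


def tc_policer_index(port, tc):
    # Hypothetical layout: entries 0..39 are indexed by (port, traffic class).
    return port * NUM_TC + tc


def bcast_policer_index(port):
    # Hypothetical layout: entries 40..44 hold the per-port broadcast policers.
    return NUM_PORTS * NUM_TC + port


total_entries = NUM_PORTS * NUM_TC + NUM_PORTS
all_indices = (
    {tc_policer_index(p, tc) for p in range(NUM_PORTS) for tc in range(NUM_TC)}
    | {bcast_policer_index(p) for p in range(NUM_PORTS)}
)
print(total_entries, len(all_indices))
```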
      
      Policing entries are shareable, and we use this functionality to support
      shared block filters.
      
      We are modeling broadcast policers as simple tc-flower matches on
      dst_mac. As for the traffic class policers, the switch only deduces the
      traffic class from the VLAN PCP field, so it makes sense to model this
      as a tc-flower match on vlan_prio.
      
      How to limit broadcast traffic coming from all front-panel ports to a
      cumulated total of 10 Mbit/s:
      
      tc qdisc add dev sw0p0 ingress_block 1 clsact
      tc qdisc add dev sw0p1 ingress_block 1 clsact
      tc qdisc add dev sw0p2 ingress_block 1 clsact
      tc qdisc add dev sw0p3 ingress_block 1 clsact
      tc filter add block 1 flower skip_sw dst_mac ff:ff:ff:ff:ff:ff \
      	action police rate 10mbit burst 64k
      
      How to limit traffic with VLAN PCP 0 (also includes untagged traffic) to
      100 Mbit/s on port 0 only:
      
      tc filter add dev sw0p0 ingress protocol 802.1Q flower skip_sw \
      	vlan_prio 0 action police rate 100mbit burst 64k
      
      The broadcast, VLAN PCP and port policers are compatible with one
      another (can be installed at the same time on a port).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 30 March, 2020 1 commit
  5. 28 March, 2020 1 commit
    • net: dsa: sja1105: implement the port MTU callbacks · c279c726
      Vladimir Oltean committed
      On this switch, the frame length enforcements are performed by the
      ingress policers. There are 2 types of those: regular L2 (also called
      best-effort) and Virtual Link policers (an ARINC664/AFDX concept for
      defining L2 streams with certain QoS abilities). To avoid future
      confusion, I prefer to call the reset reason "Best-effort policers",
      even though the VL policers are not yet supported.
      
      We also need to change the setup of the initial static config, such that
      DSA calls to .change_mtu (which are expensive) become no-ops and don't
      reset the switch 5 times.
      
      A driver-level decision is to unconditionally allow single VLAN-tagged
      traffic on all ports. The CPU port must accept an additional VLAN header
      for the DSA tag, which is again a driver-level decision.
      
      The policers actually count bytes not only from the SDU, but also from
      the Ethernet header and FCS, so those need to be accounted for as well.
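
The accounting described above can be illustrated with a small calculation. This is a sketch under stated assumptions: the function name is hypothetical, and the constants are the standard Ethernet sizes (14-byte header, 4-byte VLAN tag, 4-byte FCS) rather than values quoted from the driver.

```python
ETH_HLEN = 14     # destination MAC + source MAC + EtherType
VLAN_HLEN = 4     # one 802.1Q tag
ETH_FCS_LEN = 4   # frame check sequence


def policer_max_len(mtu, is_cpu_port=False):
    """Largest frame, in bytes, that the ingress policer should admit."""
    # Every port accepts single VLAN-tagged traffic.
    length = mtu + ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN
    # The CPU port carries one more VLAN header for the DSA tag.
    if is_cpu_port:
        length += VLAN_HLEN
    return length


print(policer_max_len(1500))                    # user port
print(policer_max_len(1500, is_cpu_port=True))  # CPU port
```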
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 24 March, 2020 2 commits
    • net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT · 747e5eb3
      Vladimir Oltean committed
      The SJA1105 switch family has a PTP_CLK pin which emits a signal with
      fixed 50% duty cycle, but variable frequency and programmable start time.
      
      On the second generation (P/Q/R/S) switches, this pin supports even more
      functionality. The use case described by the hardware documents talks
      about synchronization via oneshot pulses: given 2 sja1105 switches,
      arbitrarily designated as a master and a slave, the master emits a
      single pulse on PTP_CLK, while the slave is configured to timestamp this
      pulse received on its PTP_CLK pin (which must obviously be configured as
      input). The difference between the timestamps then exactly becomes the
      slave offset to the master.
      
      The only trouble with the above is that the hardware is very much tied
      into this use case only, and not very generic beyond that:
       - When emitting a oneshot pulse, instead of being told when to emit it,
         the switch just does it "now" and tells you later what time it was,
         via the PTPSYNCTS register. [ Incidentally, this is the same register
         that the slave uses to collect the ext_ts timestamp from, too. ]
       - On the sync slave, there is no interrupt mechanism on reception of a
         new extts, and no FIFO to buffer them, because in the foreseen use
         case, software is in control of both the master and the slave pins,
         so it "knows" when there's something to collect.
      
      These 2 problems mean that:
       - We don't support (at least yet) the quirky oneshot mode exposed by
         the hardware, just normal periodic output.
       - We abuse the hardware a little bit when we expose generic extts.
         Because there's no interrupt mechanism, we need to poll at double the
         frequency we expect to receive a pulse. Currently that means a
         non-configurable "twice a second".
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: make the AVB table dynamically reconfigurable · 0a7e984c
      Vladimir Oltean committed
      The AVB table contains the CAS_MASTER field (to be added in the next
      patch) which decides the direction of the PTP_CLK pin.
      
      Reconfiguring this field dynamically is highly preferable to having to
      reset the switch and upload a new static configuration, so we add
      support for exactly that.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 20 March, 2020 1 commit
    • net: dsa: sja1105: Add support for the SGMII port · ffe10e67
      Vladimir Oltean committed
      SJA1105 switches R and S have one SerDes port with an 802.3z
      quasi-compatible PCS, hardwired on port 4. The other ports are still
      MII/RMII/RGMII. The PCS performs rate adaptation to lower link speeds;
      the MAC on this port is hardwired at gigabit. Only full duplex is
      supported.
      
      The SGMII port can be configured as part of the static config tables, as
      well as through a dedicated SPI address region for its pseudo-clause-22
      registers. However it looks like the static configuration is not
      able to change some out-of-reset values (like the value of MII_BMCR), so
      at the end of the day, having code for it is utterly pointless. We are
      just going to use the pseudo-C22 interface.
      
      Because the PCS gets reset when the switch resets, we have to add even
      more restoration logic to sja1105_static_config_reload, otherwise the
      SGMII port breaks after operations such as enabling PTP timestamping
      which require a switch reset.
      
      From a PHYLINK perspective, the switch supports *only* SGMII (it doesn't
      support 1000Base-X). It also doesn't expose access to the raw config
      word for in-band AN in registers MII_ADV/MII_LPA.
      It is able to work in the following modes:
       - Forced speed
       - SGMII in-band AN slave (speed received from PHY)
       - SGMII in-band AN master (acting as a PHY)
      
      The latter mode is not supported by this patch. It is even unclear to me
      how that would be described. There is some code for it left in the
      patch, but 'an_master' is always passed as false.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 15 November, 2019 3 commits
    • net: dsa: sja1105: Simplify reset handling · abfb228a
      Vladimir Oltean committed
      We don't really need 10k species of reset. Remove everything except cold
      reset which is what is actually used. Too bad the hardware designers
      couldn't agree to use the same bit field for rev 1 and rev 2, so the
      (*reset_cmd) function pointer is there to stay.
      
      However let's simplify the prototype and give it a struct dsa_switch (we
      want to avoid forward-declarations of structures, in this case struct
      sja1105_private, wherever we can).
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Implement state machine for TAS with PTP clock source · 86db36a3
      Vladimir Oltean committed
      Tested using the following bash script and the tc from iproute2-next:
      
      	#!/bin/bash
      
      	set -e -u -o pipefail
      
      	NSEC_PER_SEC="1000000000"
      
      	gatemask() {
      		local tc_list="$1"
      		local mask=0
      
      		for tc in ${tc_list}; do
      			mask=$((${mask} | (1 << ${tc})))
      		done
      
      		printf "%02x" ${mask}
      	}
      
      	if ! systemctl is-active --quiet ptp4l; then
      		echo "Please start the ptp4l service"
      		exit 1
      	fi
      
      	now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
      	# Phase-align the base time to the start of the next second.
      	sec=$(echo "${now}" | gawk -F. '{ print $1; }')
      	base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"
      
      	tc qdisc add dev swp5 parent root handle 100 taprio \
      		num_tc 8 \
      		map 0 1 2 3 4 5 6 7 \
      		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
      		base-time ${base_time} \
      		sched-entry S $(gatemask 7) 100000 \
      		sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
      		clockid CLOCK_TAI flags 2
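
For reference, the two pieces of arithmetic in the script above (the gate mask and the phase-aligned base time) can be reproduced as below. This is an illustrative re-implementation, not part of the original test script, and the sample timestamp is made up.

```python
NSEC_PER_SEC = 1_000_000_000


def gatemask(tc_list):
    """Two-digit hex mask with one bit set per traffic class in tc_list."""
    mask = 0
    for tc in tc_list:
        mask |= 1 << tc
    return format(mask, "02x")


def next_second_base_time(now):
    """Base time in ns at the start of the second after `now` ('sec.nsec')."""
    sec = int(now.split(".")[0])
    return (sec + 1) * NSEC_PER_SEC


print(gatemask([7]))                          # first sched-entry
print(gatemask([0, 1, 2, 3, 4, 5, 6]))        # second sched-entry
print(next_second_base_time("1573776000.250000000"))  # hypothetical PHC time
```

The two masks are complementary: traffic class 7 owns the first 100 us window and the remaining classes share the following 400 us.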
      
      The "state machine" is a workqueue invoked after each manipulation
      command on the PTP clock (reset, adjust time, set time, adjust
      frequency) which checks over the state of the time-aware scheduler.
      So it is not monitored periodically, only in reaction to a PTP command
      typically triggered from a userspace daemon (linuxptp). Otherwise there
      is no reason for things to go wrong.
      
      Now that the timecounter/cyclecounter has been replaced with hardware
      operations on the PTP clock, the TAS Kconfig now depends upon PTP and
      the standalone clocksource operating mode has been removed.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Make the PTP command read-write · 41603d78
      Vladimir Oltean committed
      The PTPSTRTSCH and PTPSTOPSCH bits are actually readable and indicate
      whether the time-aware scheduler is running or not. We will be using
      that for monitoring the scheduler in the next patch, so refactor the PTP
      command API in order to allow that.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 13 November, 2019 1 commit
  10. 12 November, 2019 1 commit
    • net: dsa: sja1105: Implement the .gettimex64 system call for PTP · 34d76e9f
      Vladimir Oltean committed
      Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace
      applications (e.g. phc2sys) to compensate for the delays incurred while
      reading the PHC's time.
      
      The task itself of taking the software timestamp is delegated to the SPI
      subsystem, through the newly introduced API in struct spi_transfer. The
      goal is to cross-timestamp I/O operations on the switch's PTP clock with
      values in the local system clock (CLOCK_REALTIME). For that we need to
      understand a bit of the hardware internals.
      
      The 'read PTP time' message is a 12 byte structure, first 4 bytes of
      which represent the SPI header, and the last 8 bytes represent the
      64-bit PTP time. The switch itself starts processing the command
      immediately after receiving the last bit of the address, i.e. at the
      middle of byte 3 (last byte of header). The PTP time is shadowed to a
      buffer register in the switch, and retrieved atomically during the
      subsequent SPI frames.
      
      A similar thing goes on for the 'write PTP time' message, although in
      that case the switch waits until the 64-bit PTP time becomes fully
      available before taking any action. So the byte that needs to be
      software-timestamped is byte 11 (last) of the transfer.
      
      The patch creates a common (and local) sja1105_xfer implementation for
      the SPI I/O, and offers 3 front-ends:
      
      - sja1105_xfer_u32 and sja1105_xfer_u64: these are capable of optionally
        requesting a PTP timestamp
      
      - sja1105_xfer_buf: this is for large transfers (e.g. the static config
        buffer) and other misc data, and there is no point in giving
        timestamping capabilities to this.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 19 October, 2019 1 commit
  12. 16 October, 2019 2 commits
    • net: dsa: sja1105: Use the correct style for SPDX License Identifier · b790b554
      Nishad Kamdar committed
      This patch corrects the SPDX License Identifier style
      in header files related to Distributed Switch Architecture
      drivers for NXP SJA1105 series Ethernet switch support.
      It uses an explicit block comment for the SPDX License
      Identifier.
      
      Changes made by using a script provided by Joe Perches here:
      https://lkml.org/lkml/2019/2/7/46.
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Switch to scatter/gather API for SPI · 08839c06
      Vladimir Oltean committed
      This reworks the SPI transfer implementation to make use of more of the
      SPI core features. The main benefit is to avoid the memcpy in
      sja1105_xfer_buf().
      
      The memcpy was only needed because the function was transferring a
      single buffer at a time. So it needed to copy the caller-provided buffer
      at buf + 4, to store the SPI message header in the "headroom" area.
      
      But the SPI core supports scatter-gather messages, comprised of multiple
      transfers. We can actually use those to break apart every SPI message
      into 2 transfers: one for the header and one for the actual payload.
      
      To keep the behavior the same regarding the chip select signal, it is
      necessary to tell the SPI core to de-assert the chip select after each
      chunk. This was not needed before, because each spi_message contained
      only 1 single transfer.
      
      The meaning of the per-transfer cs_change=1 is:
      
      - If the transfer is the last one of the message, keep CS asserted
      - Otherwise, deassert CS
      
      We need to deassert CS in the "otherwise" case, which was implicit
      before.
      
      Avoiding the memcpy creates yet another opportunity. The device can't
      process more than 256 bytes of SPI payload at a time, so the
      sja1105_xfer_long_buf() function used to exist, to split the larger
      caller buffer into chunks.
      
      But these chunks couldn't be used as scatter/gather buffers for
      spi_message until now, because of that memcpy (we would have needed more
      memory for each chunk). So we can now remove the sja1105_xfer_long_buf()
      function and have a single implementation for long and short buffers.
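
The chunking scheme described above can be sketched as follows. This is a model of the idea rather than the driver's implementation: `Transfer`, `make_header` and `build_message` are hypothetical names, the 4-byte header contents are a stand-in (the real bit layout is not described here), and the address step of one register per 4 payload bytes is an assumption.

```python
import struct
from dataclasses import dataclass

MAX_CHUNK_LEN = 256   # payload bytes the device accepts at a time (from the text)


@dataclass
class Transfer:
    data: object      # bytes header, or a memoryview slice of the caller buffer
    cs_change: bool   # True: deassert CS after this transfer (unless it is last)


def make_header(reg_addr):
    # Hypothetical stand-in for the real 4-byte SPI message header.
    return struct.pack(">I", reg_addr)


def build_message(reg_addr, buf):
    """Split buf into header+payload transfer pairs without copying payload."""
    view = memoryview(buf)
    offsets = range(0, len(view), MAX_CHUNK_LEN)
    transfers = []
    for i, off in enumerate(offsets):
        last_chunk = i == len(offsets) - 1
        # Assumption: registers are 4 bytes wide, so the address advances by
        # one for every 4 payload bytes already sent.
        transfers.append(Transfer(make_header(reg_addr + off // 4), False))
        # Deassert CS between chunks; the final transfer keeps cs_change
        # clear so that CS is released normally at the end of the message.
        transfers.append(Transfer(view[off:off + MAX_CHUNK_LEN], not last_chunk))
    return transfers


buf = bytearray(600)
message = build_message(0x20000, buf)
print([len(t.data) for t in message])
print([t.cs_change for t in message])
```

Because each payload is a `memoryview` slice, it points into the caller's buffer, which mirrors the way the reworked code hands pointers to the SPI core instead of copying each chunk next to its header.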
      
      Another benefit is lower usage of stack memory. Previously we had to
      store 2 SPI buffers for each chunk. Due to the elimination of the
      memcpy, we can now send pointers to the actual chunks from the
      caller-supplied buffer to the SPI core.
      
      Since the patch merges two functions into a rewritten implementation,
      the function prototype was also changed, mainly for cosmetic consistency
      with the structures used within it.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 15 October, 2019 3 commits
    • net: dsa: sja1105: Change the PTP command access pattern · 66427778
      Vladimir Oltean committed
      The PTP command register contains enable bits for:
      - Putting the 64-bit PTPCLKVAL register in add/subtract or write mode
      - Taking timestamps off of the corrected vs free-running clock
      - Starting/stopping the TTEthernet scheduling
      - Starting/stopping PPS output
      - Resetting the switch
      
      When a command needs to be issued (e.g. "change the PTPCLKVAL from write
      mode to add/subtract mode"), one cannot simply write to the command
      register setting the PTPCLKADD bit to 1, because that would zeroize the
      other settings. One also cannot do a read-modify-write (that would be
      too easy for this hardware) because not all bits of the command register
      are readable over SPI.
      
      So this leaves us with the only option of keeping the value of the PTP
      command register in the driver, and operating on that.
      
      Actually there are 2 types of PTP operations now:
      - Operations that modify the cached PTP command. These operate on
        ptp_data->cmd as a pointer.
      - Operations that apply all previously cached PTP settings, but don't
        otherwise cache what they did themselves. The sja1105_ptp_reset
        function is such an example. It copies the ptp_data->cmd on stack
        before modifying and writing it to SPI.
      
      This practically means that struct sja1105_ptp_cmd is no longer an
      implementation detail, since it needs to be stored in full into struct
      sja1105_ptp_data, and hence in struct sja1105_private. So the (*ptp_cmd)
      function prototype can change and take struct sja1105_ptp_cmd as second
      argument now.
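
The two kinds of operation can be modelled as below. This is a simplified sketch, not the driver's code: the classes and the `reset` field are hypothetical, only some command bits are represented, and a plain list stands in for writes on the SPI bus.

```python
from dataclasses import dataclass, replace


@dataclass
class PtpCmd:
    ptpclkadd: int = 0    # 1: PTPCLKVAL in add/subtract mode, 0: write mode
    ptpstrtsch: int = 0   # start scheduling
    ptpstopsch: int = 0   # stop scheduling
    reset: int = 0        # hypothetical name for the one-shot reset bit


class PtpData:
    def __init__(self, bus):
        self.cmd = PtpCmd()   # cached copy, since the register is not fully readable
        self.bus = bus        # list standing in for SPI writes

    def set_clkval_mode(self, add_mode):
        # Persistent setting: modify the cached command, then write all of it.
        self.cmd.ptpclkadd = 1 if add_mode else 0
        self.bus.append(replace(self.cmd))

    def ptp_reset(self):
        # One-shot operation: work on a copy so the reset bit is never cached.
        one_shot = replace(self.cmd, reset=1)
        self.bus.append(one_shot)


bus = []
ptp = PtpData(bus)
ptp.set_clkval_mode(add_mode=True)
ptp.ptp_reset()
ptp.set_clkval_mode(add_mode=False)
for written in bus:
    print(written)
```

Each write carries every cached setting, so issuing one command does not zeroize the others, and the reset bit appears only in the write that requested it.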
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Move PTP data to its own private structure · a9d6ed7a
      Vladimir Oltean committed
      This is a non-functional change with 2 goals (both for the case when
      CONFIG_NET_DSA_SJA1105_PTP is not enabled):
      
      - Reduce the size of the sja1105_private structure.
      - Make the PTP code more self-contained.
      
      Leaving priv->ptp_data.lock to be initialized in sja1105_main.c is not a
      leftover: it will be used in a future patch "net: dsa: sja1105: Restore
      PTP time after switch reset".
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Make all public PTP functions take dsa_switch as argument · 61c77126
      Vladimir Oltean committed
      The new rule (as already started for sja1105_tas.h) is for functions of
      optional driver components (ones which may be disabled via Kconfig - PTP
      and TAS) to take struct dsa_switch *ds instead of struct sja1105_private
      *priv as first argument.
      
      This is so that forward-declarations of struct sja1105_private can be
      avoided.
      
      So make sja1105_ptp.h the second user of this rule.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 03 October, 2019 2 commits
  15. 17 September, 2019 1 commit
    • net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload · 317ab5b8
      Vladimir Oltean committed
      This qdisc offload is the closest thing to what the SJA1105 supports in
      hardware for time-based egress shaping. The switch core really is built
      around SAE AS6802/TTEthernet (a TTTech standard) but can be made to
      operate similarly to IEEE 802.1Qbv with some constraints:
      
      - The gate control list is a global list for all ports. There are 8
        execution threads that iterate through this global list in parallel.
        I don't know why 8, there are only 4 front-panel ports.
      
      - Care must be taken by the user to make sure that two execution threads
        never get to execute a GCL entry simultaneously. I created an O(n^4)
        checker for this hardware limitation, prior to accepting a taprio
        offload configuration as valid.
      
      - The spec says that if a GCL entry's interval is shorter than the frame
        length, you shouldn't send it (and end up in head-of-line blocking).
        Well, this switch does anyway.
      
      - The switch has no concept of ADMIN and OPER configurations. Because
        it's so simple, the TAS settings are loaded through the static config
        tables interface, so there isn't even place for any discussion about
        'graceful switchover between ADMIN and OPER'. You just reset the
        switch and upload a new OPER config.
      
      - The switch accepts multiple time sources for the gate events. Right
        now I am using the standalone clock source as opposed to PTP. So the
        base time parameter doesn't really do much. Support for the PTP clock
        source will be added in a future series.
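
The hardware limitation in the second bullet (no two execution threads may trigger a GCL entry at the same instant) can be checked along the lines below. This is a sketch of the constraint only. It uses set intersection over the common hyperperiod instead of the O(n^4) loops mentioned above, the function names are hypothetical, and the trigger offsets are assumed to be measured from a shared time origin.

```python
from math import gcd


def trigger_times(offsets, cycle, horizon):
    """All instants in [0, horizon) at which a cyclic subschedule fires."""
    return {start + off for start in range(0, horizon, cycle) for off in offsets}


def find_conflicts(subschedules):
    """subschedules: list of (cycle_ns, [entry trigger offsets in ns]).

    Returns (index_a, index_b, time_ns) for every simultaneous trigger.
    """
    conflicts = []
    for a in range(len(subschedules)):
        for b in range(a + 1, len(subschedules)):
            cycle_a, offsets_a = subschedules[a]
            cycle_b, offsets_b = subschedules[b]
            # Both schedules repeat exactly after the least common multiple.
            horizon = cycle_a * cycle_b // gcd(cycle_a, cycle_b)
            common = (trigger_times(offsets_a, cycle_a, horizon)
                      & trigger_times(offsets_b, cycle_b, horizon))
            conflicts.extend((a, b, t) for t in sorted(common))
    return conflicts


clashing = [(500000, [0, 100000]), (250000, [50000, 100000])]
clean = [(500000, [0, 100000]), (250000, [50000, 150000])]
print(find_conflicts(clashing))
print(find_conflicts(clean))
```

A configuration would be accepted for offload only when the returned list is empty.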
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 10 June, 2019 3 commits
  17. 09 June, 2019 3 commits
    • net: dsa: sja1105: Add a global sja1105_tagger_data structure · 844d7edc
      Vladimir Oltean committed
      This will be used to keep state for RX timestamping. It is global
      because the switch serializes timestampable and meta frames when
      trapping them towards the CPU port (lower port indices have higher
      priority) and therefore having one state machine per port would create
      unnecessary complications.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      844d7edc
    • V
      net: dsa: sja1105: Add logic for TX timestamping · 47ed985e
      Vladimir Oltean authored
      On TX, timestamping is performed synchronously from the
      port_deferred_xmit worker thread.
      In management routes, the switch is requested to take egress timestamps
      (again partial), which are reconstructed and appended to a clone of the
      skb that was just sent.  The cloning is done by DSA and we retrieve the
      pointer from the structure that DSA keeps in skb->cb.
      Then these clones are enqueued to the socket's error queue for
      application-level processing.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47ed985e
    • V
      net: dsa: sja1105: Add support for the PTP clock · bb77f36a
      Vladimir Oltean authored
      The design of this PHC driver is influenced by the switch's behavior
      w.r.t. timestamping.  It exposes two PTP counters, one free-running
      (PTPTSCLK) and the other offset- and frequency-corrected in hardware
      through PTPCLKVAL, PTPCLKADD and PTPCLKRATE.  The MACs can sample either
      of these for frame timestamps.
      
      However, the user manual warns that taking timestamps based on the
      corrected clock is less than useful, as the switch can deliver corrupted
      timestamps in a variety of circumstances.
      
      Therefore, this PHC uses the free-running PTPTSCLK together with a
      timecounter/cyclecounter structure that translates it into a software
      time domain.  Thus, the settime/adjtime and adjfine callbacks are
      hardware no-ops.
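
The idea of layering a software time domain over a free-running counter can be sketched as follows. This is a simplified illustration and not the kernel's timecounter code: the `soft_clock` structure and its helpers are hypothetical names, the 32-bit counter width and the 8 ns tick used in the usage note are assumed values, and the fractional nanosecond remainder that a real implementation would carry over is dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software clock layered on a free-running hardware counter. */
struct soft_clock {
	uint64_t last_cycles;  /* last raw counter value seen */
	uint64_t mask;         /* wrap-around mask for the counter width */
	uint64_t ns;           /* software time corresponding to last_cycles */
	uint32_t mult;         /* ns = (cycles * mult) >> shift */
	uint32_t shift;
};

/* Fold a fresh raw counter reading into the software time and return it. */
static uint64_t soft_clock_read(struct soft_clock *c, uint64_t raw_cycles)
{
	uint64_t delta;

	assert(c->shift < 64);

	raw_cycles &= c->mask;
	/* Masking the difference handles a single counter wrap-around. */
	delta = (raw_cycles - c->last_cycles) & c->mask;
	c->last_cycles = raw_cycles;
	c->ns += (delta * c->mult) >> c->shift;
	return c->ns;
}

/* adjtime only shifts the software time; the hardware is not touched. */
static void soft_clock_adjtime(struct soft_clock *c, int64_t delta_ns)
{
	c->ns += (uint64_t)delta_ns;
}

/* settime re-anchors the software time at the current raw reading. */
static void soft_clock_settime(struct soft_clock *c, uint64_t raw_cycles,
			       uint64_t ns)
{
	c->last_cycles = raw_cycles & c->mask;
	c->ns = ns;
}
```

With `mult = 8` and `shift = 0` (an assumed 8 ns tick), reading the counter 100 ticks after initialization advances the software time by 800 ns, and neither `soft_clock_adjtime()` nor `soft_clock_settime()` writes anything to the hardware.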
      
      The timestamps (introduced in a further patch) will also be translated
      to the correct time domain before being handed over to the userspace PTP
      stack.
      
      The introduction of a second set of PHC operations that operate on the
      hardware PTPCLKVAL/PTPCLKADD/PTPCLKRATE in the future is somewhat
      unavoidable, as the TTEthernet core uses the corrected PTP time domain.
      However, the free-running counter + timecounter structure combination
      will suffice for now, as the resulting timestamps yield a sub-50 ns
      synchronization offset in steady state using linuxptp.
      
      For this patch, in the absence of frame timestamping, the operations of the
      switch PHC were tested by syncing it to the system time as a local slave
      clock with:
      
      phc2sys -s CLOCK_REALTIME -c swp2 -O 0 -m -S 0.01
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb77f36a
  18. 05 Jun, 2019 2 commits
  19. 06 May, 2019 1 commit
    • V
      net: dsa: sja1105: Add support for traffic through standalone ports · 227d07a0
      Vladimir Oltean authored
      In order to support this, we are creating a makeshift switch tag out of
      a VLAN trunk configured on the CPU port. Termination of normal traffic
      on switch ports only works when not under a vlan_filtering bridge.
      Termination of management (PTP, BPDU) traffic works under all
      circumstances because it uses a different tagging mechanism
      (incl_srcpt). We are making use of the generic CONFIG_NET_DSA_TAG_8021Q
      code and leveraging it from our own CONFIG_NET_DSA_TAG_SJA1105.
      
      There are two types of traffic: regular and link-local.
      
      The link-local traffic received on the CPU port is trapped from the
      switch's regular forwarding decisions because it matched one of the two
      DMAC filters for management traffic.
      
      On transmission, the switch requires special massaging for these
      link-local frames. Due to a weird implementation of the switching IP, by
      default it drops link-local frames that originate on the CPU port.
      It needs to be told where to forward them to, through an SPI command
      ("management route") that is valid for only a single frame.
      So when we're sending link-local traffic, we are using the
      dsa_defer_xmit mechanism.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      227d07a0
  20. 03 May, 2019 6 commits
    • V
      net: dsa: sja1105: Prevent PHY jabbering during switch reset · 1a4c6940
      Vladimir Oltean authored
      Resetting the switch at runtime is currently done while changing the
      vlan_filtering setting (due to the required TPID change).
      
      But reset is asynchronous with packet egress, and the switch core will
      not wait for egress to finish before carrying on with the reset
      operation.
      
      As a result, a connected PHY such as the BCM5464 would see an
      unterminated Ethernet frame and start to jabber (repeat the last seen
      Ethernet symbols - jabber is by definition an oversized Ethernet frame
      with bad FCS). This behavior is strange in itself, but it also causes
      the MACs of some link partners (such as the FRDM-LS1012A) to completely
      lock up.
      
      So as a remedy for this situation, when switch reset is required, simply
      inhibit Tx on all ports, and wait long enough for the one frame that may
      still be left in the egress queue (not even the Tx inhibit command is
      instantaneous) to be flushed.
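
As a rough worked example of how long such a wait might need to be, the sketch below computes the time to serialize one frame at a given link speed. The commit message does not say how the wait is derived, so the function name `tx_drain_time_us()`, the 1538-byte figure in the usage note (a 1518-byte frame plus preamble and inter-frame gap) and the rounding are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Microseconds needed to put frame_bytes on the wire at speed_mbps,
 * rounded up. 1 Mbps is 1 bit per microsecond, so bits / Mbps gives us.
 */
static uint32_t tx_drain_time_us(uint32_t frame_bytes, uint32_t speed_mbps)
{
	uint64_t bits = (uint64_t)frame_bytes * 8;

	assert(speed_mbps != 0);

	return (uint32_t)((bits + speed_mbps - 1) / speed_mbps);
}
```

Under these assumptions, `tx_drain_time_us(1538, 10)` gives 1231 us for the slowest link and `tx_drain_time_us(1538, 1000)` gives 13 us at gigabit speed, so the slowest configured port speed dominates the wait.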
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a4c6940
    • V
      net: dsa: sja1105: Add support for configuring address ageing time · 8456721d
      Vladimir Oltean authored
      If STP is active, this setting is applied on bridged ports each time an
      Ethernet link is established (topology changes).
      
      Since the setting is global to the switch and a reset is required to
      change it, resets are prevented if the new callback does not change the
      value that the hardware is already programmed with.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8456721d
    • V
    • V
      net: dsa: sja1105: Error out if RGMII delays are requested in DT · f5b8631c
      Vladimir Oltean authored
      Documentation/devicetree/bindings/net/ethernet.txt is confusing because
      it says what the MAC should not do, but not what it *should* do:
      
        * "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, the MAC
           should not add an RX delay in this case)
      
      The gap in semantics is threefold:
      1. Is it illegal for the MAC to apply the Rx internal delay by itself,
         and simplify the phy_mode (mask off "rgmii-rxid" into "rgmii") before
         passing it to of_phy_connect? The documentation would suggest yes.
      2. For "rgmii-rxid", while the situation with the Rx clock skew is more
         or less clear (needs to be added by the PHY), what should the MAC
         driver do about the Tx delays? Is it an implicit wild card for the
         MAC to apply delays in the Tx direction if it can? What if those were
         already added as serpentine PCB traces, how could that be made more
         obvious through DT bindings so that the MAC doesn't attempt to add
         them twice and again potentially break the link?
      3. If the interface is a fixed-link and therefore the PHY object is
         fixed (a purely software entity that obviously cannot add clock
         skew), what is the meaning of the above property?
      
      So an interpretation of the RGMII bindings was chosen that hopefully
      does not contradict their intention but also makes them more applicable.
      The SJA1105 driver acts upon "rgmii-*id" phy-mode bindings
      if the port is in the PHY role (either explicitly, or if it is a
      fixed-link). Otherwise it always passes the duty of setting up delays to
      the PHY driver.
      
      The error behavior that this patch adds is required on SJA1105E/T where
      the MAC really cannot apply internal delays. If the other end of the
      fixed-link cannot apply RGMII delays either (this would be specified
      through its own DT bindings), then the situation requires PCB delays.
      
      For SJA1105P/Q/R/S, however, this is supported in hardware and the error is
      thus only temporary. I created a stub function pointer for configuring
      delays per-port on RXC and TXC, and will implement it when I have access
      to a board with this hardware setup.
      
      Meanwhile do not allow the user to select an invalid configuration.
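
The interpretation described above can be summarized as a small decision function. This is a sketch under the assumptions stated in this message, with hypothetical enum and function names rather than the driver's actual code: delays are only the switch's responsibility when the port is in the PHY role (explicitly or as a fixed-link), and since the switch cannot apply them yet, that case is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the "rgmii", "rgmii-id", ... phy-mode values. */
enum rgmii_mode {
	MODE_RGMII,
	MODE_RGMII_ID,
	MODE_RGMII_RXID,
	MODE_RGMII_TXID,
};

enum delay_decision {
	DELAY_NOT_REQUESTED,  /* plain "rgmii": nothing to do */
	DELAY_LEFT_TO_PHY,    /* the attached PHY driver sets up the delays */
	DELAY_REJECTED,       /* the switch would have to add them: error out */
};

static enum delay_decision rgmii_delay_decision(enum rgmii_mode mode,
						bool explicit_phy_role,
						bool fixed_link)
{
	assert(mode >= MODE_RGMII && mode <= MODE_RGMII_TXID);

	if (mode == MODE_RGMII)
		return DELAY_NOT_REQUESTED;

	/* A fixed-link has no real PHY that could add the clock skew. */
	if (explicit_phy_role || fixed_link)
		return DELAY_REJECTED;

	return DELAY_LEFT_TO_PHY;
}
```

Once per-port delay configuration exists for P/Q/R/S, the `DELAY_REJECTED` branch would presumably become a call into that code for those chips only.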
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f5b8631c
    • V
      net: dsa: sja1105: Add support for FDB and MDB management · 291d1e72
      Vladimir Oltean authored
      Currently only the (more difficult) first generation E/T series is
      supported. Here the TCAM is only 4-way associative, and to know where
      the hardware will search for a FDB entry, we need to perform the same
      hash algorithm in order to install the entry in the correct bin.
      
      On P/Q/R/S, the TCAM should be fully associative. However the SPI
      command interface is different, and because I don't have access to a
      new-generation device at the moment, support for it is TODO.
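
To illustrate why software has to reproduce the hash on a 4-way associative table, here is a toy model of bin selection. Everything in it is an assumption made for illustration: `toy_hash()` is a placeholder and not the hardware's algorithm, the table size of 8 bins is arbitrary, and `fdb_add()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FDB_WAYS 4  /* entries per bin, as in a 4-way associative table */
#define FDB_BINS 8  /* arbitrary size, for illustration only */

struct fdb_entry {
	bool valid;
	uint8_t mac[6];
	uint16_t vid;
	uint8_t port;
};

static struct fdb_entry fdb[FDB_BINS * FDB_WAYS];

/* Placeholder hash; the real hardware algorithm is not reproduced here. */
static unsigned int toy_hash(const uint8_t mac[6], uint16_t vid)
{
	unsigned int h = vid;

	for (int i = 0; i < 6; i++)
		h = h * 31 + mac[i];
	return h % FDB_BINS;
}

/*
 * Install or update an entry in the bin the lookup would search.
 * Returns the table index used, or -1 if all ways of that bin are taken.
 */
static int fdb_add(const uint8_t mac[6], uint16_t vid, uint8_t port)
{
	unsigned int bin = toy_hash(mac, vid);
	int free_idx = -1;

	assert(mac != NULL);

	for (int way = 0; way < FDB_WAYS; way++) {
		int idx = (int)bin * FDB_WAYS + way;
		struct fdb_entry *e = &fdb[idx];

		if (!e->valid) {
			if (free_idx < 0)
				free_idx = idx;
			continue;
		}
		if (e->vid == vid && !memcmp(e->mac, mac, 6)) {
			e->port = port;  /* same key: update in place */
			return idx;
		}
	}
	if (free_idx < 0)
		return -1;

	fdb[free_idx].valid = true;
	memcpy(fdb[free_idx].mac, mac, 6);
	fdb[free_idx].vid = vid;
	fdb[free_idx].port = port;
	return free_idx;
}
```

The point of the model is that an entry written to any other bin would never be found, and that a fifth address hashing to a full bin cannot be installed even if the rest of the table is empty.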
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      291d1e72
    • V
      net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch · 8aa9ebcc
      Vladimir Oltean authored
      At this moment the following is supported:
      * Link state management through phylib
      * Autonomous L2 forwarding managed through iproute2 bridge commands.
      
      IP termination must currently be done through the master netdevice,
      since the switch is unmanaged at this point and using
      DSA_TAG_PROTO_NONE.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Georg Waibel <georg.waibel@sensor-technik.de>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8aa9ebcc