1. 25 May, 2021 1 commit
  2. 22 May, 2021 4 commits
    • net: dsa: sja1105: don't use burst SPI reads for port statistics · 039b167d
      Committed by Vladimir Oltean
      The current internal sja1105 driver API is optimized for retrieving many
      statistics counters at once. But the switch does not do atomic snapshotting
      for them anyway.
      
      In case we start reporting the hardware port counters through
      ndo_get_stats64 as well, not just ethtool, it would be good to be able
      to read individual port counters and not all of them.
      
      Additionally, since Arnd Bergmann's commit ae1804de ("dsa: sja1105:
      dynamically allocate stats structure"), sja1105_get_ethtool_stats
      allocates memory dynamically, since struct sja1105_port_status was
      deemed to consume too much stack memory. That is not ideal.
      The large structure is only needed because of the burst read.
      If we read statistics one by one, we can consume less memory, and
      we can avoid dynamic allocation.
      
      Additionally, latency-sensitive interfaces such as PTP operations (for
      phc2sys) might suffer if the SPI mutex is being held for too long, which
      happens in the case of SPI burst reads. By reading counters one by one,
      we give a chance for higher priority processes to preempt and take the
      SPI bus mutex for accessing the PTP clock.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: stop reporting the queue levels in ethtool port counters · 30a2e9c0
      Committed by Vladimir Oltean
      The queue levels are not counters, but instead they represent the
      occupancy of the MAC TX queues. Having these in ethtool port counters is
      not helpful, so remove them.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: adapt to a SPI controller with a limited max transfer size · 718bad0e
      Committed by Vladimir Oltean
      The static config of the sja1105 switch is a long stream of bytes which
      is programmed to the hardware in chunks (portions with the chip select
      continuously asserted) of max 256 bytes each. Each chunk is a
      spi_message composed of 2 spi_transfers: the buffer with the data and a
      preceding buffer with the SPI access header.
      
      However, certain SPI controllers, such as the spi-sc18is602 I2C-to-SPI
      bridge, cannot keep the chip select asserted for that long.
      The spi_max_transfer_size() and spi_max_message_size() functions are how
      the controller can impose its hardware limitations upon the SPI
      peripheral driver.
      
      For the sja1105 driver to work with these controllers, both buffers must
      be smaller than the transfer limit, and their sum must be smaller than
      the message limit.
      
      Regression-tested on a switch connected to a controller with no
      limitations (spi-fsl-dspi) as well as with one with caps for both
      max_transfer_size and max_message_size (spi-sc18is602).
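
      The sizing rule above can be sketched as follows, in Python rather
      than the driver's C. The 4-byte header length is taken from the SPI
      message layout described elsewhere in this log; the function name and
      the use of None for "no limit" are assumptions made for illustration,
      not the driver's actual code:

```python
HDR_LEN = 4          # SPI access header that precedes every chunk
HW_CHUNK_MAX = 256   # the switch accepts at most 256 payload bytes per chunk


def max_chunk_len(max_transfer_size=None, max_message_size=None):
    """Largest payload chunk that respects both controller limits.

    None stands for "the controller reports no limit" (a convention
    chosen for this sketch only).
    """
    limit = HW_CHUNK_MAX
    if max_transfer_size is not None:
        if max_transfer_size < HDR_LEN:
            raise ValueError("controller cannot send the header transfer")
        # Each transfer (header or chunk) must fit the transfer limit.
        limit = min(limit, max_transfer_size)
    if max_message_size is not None:
        # Header and chunk travel in the same message, so their sum counts.
        limit = min(limit, max_message_size - HDR_LEN)
    if limit <= 0:
        raise ValueError("no room left for payload")
    return limit
```

      With both limits at 200 bytes, as in the spi-sc18is602 example, the
      sketch yields chunks of 196 bytes.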
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: send multiple spi_messages instead of using cs_change · ca021f0d
      Committed by Vladimir Oltean
      The sja1105 driver has been described by Mark Brown as "not using the
      [ SPI ] API at all idiomatically" due to the use of cs_change:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210520135031.2969183-1-olteanv@gmail.com/
      
      According to include/linux/spi/spi.h, the chip select is supposed to be
      asserted for the entire length of a SPI message, as long as cs_change is
      false for all member transfers. The cs_change flag changes the following:
      
      (i) When a non-final SPI transfer has cs_change = true, the chip select
          should temporarily deassert and then reassert starting with the next
          transfer.
      (ii) When a final SPI transfer has cs_change = true, the chip select
           should remain asserted until the following SPI message.
      
      The sja1105 driver only uses cs_change for its first property, to form a
      single SPI message whose layout can be seen below:
      
                                                   this is an entire, single spi_message
                 _______________________________________________________________________________________________
                /                                                                                               \
                +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
                | hdr_xfer[0] | chunk_xfer[0] | hdr_xfer[1] | chunk_xfer[1] |     | hdr_xfer[n] | chunk_xfer[n] |
                +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
      cs_change      false          true           false           true                false          false
      
                 ____________________________  _____________________________       _____________________________
      CS line __/                            \/                             \ ... /                             \__
      
      The fact of the matter is that spi_max_message_size() has an ambiguous
      meaning if any non-final transfer has cs_change = true.
      
      If the SPI master has a limitation in that it cannot keep the chip
      select asserted for more than, say, 200 bytes (like the spi-sc18is602),
      the normal thing for it to do is to implement .max_transfer_size and
      .max_message_size, and limit both to 200: in the "worst case" where
      cs_change is always false, then the controller can, indeed, not send
      messages larger than 200 bytes.
      
      But the SPI controller's max_message_size does not necessarily mean
      that we cannot send messages larger than that.
      Notably, if the SPI master special-cases the transfers with cs_change
      and treats every chip select toggling as an entirely new transaction,
      then a SPI message can easily exceed that limit. So there is a
      temptation to ignore the controller's reported max_message_size when
      using cs_change = true in non-final transfers.
      
      But that can lead to false conclusions. As Mark points out, the SPI
      controller might have a different kind of limitation with the max
      message size, that has nothing at all to do with how long it can keep
      the chip select asserted.
      For example, that might be the case if the device is able to offload the
      chip select changes to the hardware as part of the data stream, and it
      packs the entire stream of commands+data (corresponding to a SPI
      message) into a single DMA transfer that is itself limited in size.
      
      So the only thing we can do is avoid ambiguity by not using cs_change at
      all. Instead of sending a single spi_message, we now send multiple SPI
      messages as follows:
      
                        spi_message 0                 spi_message 1                       spi_message n
                 ____________________________   ___________________________        _____________________________
                /                            \ /                           \      /                             \
                +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
                | hdr_xfer[0] | chunk_xfer[0] | hdr_xfer[1] | chunk_xfer[1] |     | hdr_xfer[n] | chunk_xfer[n] |
                +-------------+---------------+-------------+---------------+ ... +-------------+---------------+
      cs_change      false          false          false           false               false          false
      
                 ____________________________  _____________________________       _____________________________
      CS line __/                            \/                             \ ... /                             \__
      
      which is clearer because the max_message_size limit is now easier to
      enforce. What is transmitted on the wire stays, of course, the same.
      
      Additionally, because we send no more than 2 transfers at a time, we now
      avoid dynamic memory allocation too, which might be seen as an
      improvement by some.
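
      A rough Python sketch of the new layout, where every spi_message
      carries exactly one header transfer and one chunk transfer. The
      header tuple is only a placeholder and the function name is
      hypothetical; the real driver builds these in C:

```python
def build_messages(buf, chunk_len):
    """Return one (header, chunk) pair per spi_message."""
    view = memoryview(buf)  # slices of a memoryview do not copy the data
    messages = []
    for offset in range(0, len(view), chunk_len):
        chunk = view[offset:offset + chunk_len]
        # Placeholder for the SPI access header of this chunk.
        header = ("hdr", offset, len(chunk))
        messages.append((header, chunk))
    return messages
```

      Because each message has a fixed shape of 2 transfers, its size is
      simply header length plus chunk length, which is straightforward to
      compare against the controller's max_message_size.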
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 13 February, 2021 1 commit
    • net: dsa: sja1105: offload bridge port flags to device · 4d942354
      Committed by Vladimir Oltean
      The chip can configure unicast flooding, broadcast flooding and learning.
      Learning is per port, while flooding is per {ingress, egress} port pair
      and we need to configure the same value for all possible ingress ports
      towards the requested one.
      
      While multicast flooding is not officially supported, we can hack it by
      using a feature of the second generation (P/Q/R/S) devices, which is that
      FDB entries are maskable, and multicast addresses always have an odd
      first octet. So by putting a match-all for 01:00:00:00:00:00 addr and
      01:00:00:00:00:00 mask at the end of the FDB, we make sure that it is
      always checked last, and does not take precedence in front of any other
      MDB. So it behaves effectively as an unknown multicast entry.
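
      The masked match can be sketched in Python as below. The value
      01:00:00:00:00:00 (the group bit of the first octet) is what the
      "odd first octet" observation implies; the helper names are
      hypothetical and this is not the driver's FDB code:

```python
UNKNOWN_MCAST = 0x01_00_00_00_00_00  # group bit of the first octet


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def entry_matches(entry_addr, entry_mask, dmac):
    """A maskable entry matches when the masked bits are equal."""
    return (dmac & entry_mask) == (entry_addr & entry_mask)
```

      Any destination with an odd first octet matches this entry, which
      includes the broadcast address, so the entry only behaves as "unknown
      multicast" because it is checked after all more specific entries.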
      
      For the first generation switches, this feature is not available, so
      unknown multicast will always be treated the same as unknown unicast.
      So the only thing we can do is request the user to offload the settings
      for these 2 flags in tandem, i.e.
      
      ip link set swp2 type bridge_slave flood off
      Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
      ip link set swp2 type bridge_slave flood off mcast_flood off
      ip link set swp2 type bridge_slave mcast_flood on
      Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 26 September, 2020 1 commit
  5. 23 June, 2020 1 commit
  6. 29 May, 2020 1 commit
    • net: dsa: sja1105: offload the Credit-Based Shaper qdisc · 4d752508
      Committed by Vladimir Oltean
      SJA1105, being AVB/TSN switches, provide hardware assist for the
      Credit-Based Shaper as described in the IEEE 802.1Q-2018 document.
      
      First generation has 10 shapers, freely assignable to any of the 4
      external ports and 8 traffic classes, and second generation has 16
      shapers.
      
      The Credit-Based Shaper tables are accessed through the dynamic
      reconfiguration interface, so we have to restore them manually after a
      switch reset. The tables are backed up by the static config only on
      P/Q/R/S, and we don't want to add custom code only for that family,
      since the procedure that is in place now works for both.
      
      Tested with the following commands:
      
      data_rate_kbps=67000
      port_transmit_rate_kbps=1000000
      idleslope=$data_rate_kbps
      sendslope=$(($idleslope - $port_transmit_rate_kbps))
      locredit=$((-0x80000000))
      hicredit=$((0x7fffffff))
      tc qdisc add dev swp2 root handle 1: mqprio hw 0 num_tc 8 \
              map 0 1 2 3 4 5 6 7 \
              queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7
      tc qdisc replace dev swp2 parent 1:1 cbs \
              idleslope $idleslope \
              sendslope $sendslope \
              hicredit $hicredit \
              locredit $locredit \
              offload 1
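
      For reference, the shell arithmetic in the test commands evaluates
      as in this small Python sketch (the variable names mirror the shell
      variables; nothing here is driver code):

```python
data_rate_kbps = 67000
port_transmit_rate_kbps = 1000000

idleslope = data_rate_kbps
# sendslope is negative: credit drains while the queue is transmitting.
sendslope = idleslope - port_transmit_rate_kbps
locredit = -0x80000000   # most negative 32-bit signed value
hicredit = 0x7FFFFFFF    # most positive 32-bit signed value

print(idleslope, sendslope, hicredit, locredit)
```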
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 13 May, 2020 1 commit
    • net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously · 38b5beea
      Committed by Vladimir Oltean
      In VLAN-unaware mode, sja1105 uses VLAN tags with a custom TPID of
      0xdadb. While in the yet-to-be introduced best_effort_vlan_filtering
      mode, it needs to work with normal VLAN TPID values.
      
      A complication arises when we must transmit a VLAN-tagged packet to the
      switch when it's in VLAN-aware mode. We need to construct a packet with
      2 VLAN tags, and the switch will use the outer header for routing and
      pop it on egress. But sadly, here the 2 hardware generations don't
      behave the same:
      
      - E/T switches won't pop an ETH_P_8021AD tag on egress, it seems
        (packets will remain double-tagged).
      - P/Q/R/S switches will drop a packet with 2 ETH_P_8021Q tags (it looks
        like it tries to prevent VLAN hopping).
      
      But it looks like the reverse is also true:
      
      - E/T switches have no problem popping the outer tag from packets with
        2 ETH_P_8021Q tags.
      - P/Q/R/S will have no problem popping a single tag even if that is
        ETH_P_8021AD.
      
      So it is clear that if we want the hardware to work with dsa_8021q
      tagging in VLAN-aware mode, we need to send different TPIDs depending on
      revision. Keep that information in priv->info->qinq_tpid.
      
      The per-port tagger structure will hold an xmit_tpid value that depends
      not only upon the qinq_tpid, but also upon the VLAN awareness state
      itself (in case we must transmit using 0xdadb).
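
      The selection logic can be sketched in Python as below. The mapping
      of generation to qinq_tpid is inferred from the bullet points above
      (E/T copes with two ETH_P_8021Q tags, P/Q/R/S with an outer
      ETH_P_8021AD tag); the dictionary and function names are
      hypothetical:

```python
ETH_P_8021Q = 0x8100
ETH_P_8021AD = 0x88A8
TPID_VLAN_UNAWARE = 0xDADB  # custom TPID used in VLAN-unaware mode

# Per-generation TPID for the outer tag in VLAN-aware mode (inferred).
QINQ_TPID = {
    "E/T": ETH_P_8021Q,
    "P/Q/R/S": ETH_P_8021AD,
}


def xmit_tpid(generation, vlan_aware):
    """TPID the tagger should use when transmitting towards the switch."""
    if not vlan_aware:
        return TPID_VLAN_UNAWARE
    return QINQ_TPID[generation]
```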
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 08 May, 2020 1 commit
    • net: dsa: sja1105: implement tc-gate using time-triggered virtual links · 834f8933
      Committed by Vladimir Oltean
      Restrict the TTEthernet hardware support on this switch to operate as
      closely as possible to IEEE 802.1Qci. This means that it can
      perform PTP-time-based ingress admission control on streams identified
      by {DMAC, VID, PCP}, which is useful when trying to ensure the
      determinism of traffic scheduled via IEEE 802.1Qbv.
      
      The oddity comes from the fact that in hardware (and in TTEthernet at
      large), virtual links always need a full-blown action, including not
      only the type of policing, but also the list of destination ports. So in
      practice, a single tc-gate action will result in all packets getting
      dropped. Additional actions (either "trap" or "redirect") need to be
      specified in the same filter rule such that the conforming packets are
      actually forwarded somewhere.
      
      Apart from the VL Lookup, Policing and Forwarding tables which need to
      be programmed for each flow (virtual link), the Schedule engine also
      needs to be told to open/close the admission gates for each individual
      virtual link. A fairly accurate (and detailed) description of how that
      works is already present in sja1105_tas.c, since it is already used to
      trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). The
      key point to remember is that the schedule engine supports 8
      "subschedules" (execution threads that iterate through the global
      schedule in parallel, with the constraint that no 2 hardware threads
      may execute a schedule entry at the same time). For tc-taprio, each
      egress port used one of these 8 subschedules, leaving a total of 4
      subschedules unused.
      In principle we could have allocated 1 subschedule for the tc-gate
      offload of each ingress port, but actually the schedules of all virtual
      links installed on each ingress port would have needed to be merged
      together, before they could have been programmed to hardware. So
      simplify our life and just merge the entire tc-gate configuration, for
      all virtual links on all ingress ports, into a single subschedule. Be
      sure to check that against the usual hardware scheduling conflicts, and
      program it to hardware alongside any tc-taprio subschedule that may be
      present.
      
      The following scenarios were tested:
      
      1. Quantitative testing:
      
         tc qdisc add dev swp2 clsact
         tc filter add dev swp2 ingress flower skip_sw \
                 dst_mac 42:be:24:9b:76:20 \
                 action gate index 1 base-time 0 \
                 sched-entry OPEN 1200 -1 -1 \
                 sched-entry CLOSE 1200 -1 -1 \
                 action trap
      
         ping 192.168.1.2 -f
         PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
         .............................
         --- 192.168.1.2 ping statistics ---
         948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms
      
      2. Qualitative testing (with a phase-aligned schedule - the clocks are
         synchronized by ptp4l, not shown here):
      
         Receiver (sja1105):
      
         tc qdisc add dev swp2 clsact
         now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
                 sec=$(echo $now | awk -F. '{print $1}') && \
                 base_time="$(((sec + 2) * 1000000000))" && \
                 echo "base time ${base_time}"
         tc filter add dev swp2 ingress flower skip_sw \
                 dst_mac 42:be:24:9b:76:20 \
                 action gate base-time ${base_time} \
                 sched-entry OPEN  60000 -1 -1 \
                 sched-entry CLOSE 40000 -1 -1 \
                 action trap
      
         Sender (enetc):
         now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
                 sec=$(echo $now | awk -F. '{print $1}') && \
                 base_time="$(((sec + 2) * 1000000000))" && \
                 echo "base time ${base_time}"
         tc qdisc add dev eno0 parent root taprio \
                 num_tc 8 \
                 map 0 1 2 3 4 5 6 7 \
                 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
                 base-time ${base_time} \
                 sched-entry S 01  50000 \
                 sched-entry S 00  50000 \
                 flags 2
      
         ping -A 192.168.1.1
         PING 192.168.1.1 (192.168.1.1): 56 data bytes
         ...
         ^C
         --- 192.168.1.1 ping statistics ---
         1425 packets transmitted, 1424 packets received, 0% packet loss
         round-trip min/avg/max = 0.322/0.361/0.990 ms
      
         And just for comparison, with the tc-taprio schedule deleted:
      
         ping -A 192.168.1.1
         PING 192.168.1.1 (192.168.1.1): 56 data bytes
         ...
         ^C
         --- 192.168.1.1 ping statistics ---
         33 packets transmitted, 19 packets received, 42% packet loss
         round-trip min/avg/max = 0.336/0.464/0.597 ms
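
      As a sanity check on scenario 1 (quantitative testing), the gate is
      open for 1200 ns and closed for 1200 ns, so roughly half of the
      flood-ping packets should be dropped. A short Python calculation,
      using only the numbers printed above, agrees with the reported loss:

```python
open_ns, close_ns = 1200, 1200
expected_loss = close_ns / (open_ns + close_ns)

transmitted, received = 948, 467
observed_loss = (transmitted - received) / transmitted

print(f"expected {expected_loss:.1%}, observed {observed_loss:.4%}")
```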
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 21 April, 2020 1 commit
    • net: dsa: sja1105: enable internal pull-down for RX_DV/CRS_DV/RX_CTL and RX_ER · 135e3018
      Committed by Vladimir Oltean
      Some boards do not have the RX_ER MII signal connected. Normally in such
      a situation, the pin would be grounded, but some boards leave it
      electrically floating.
      
      When sending traffic to those switch ports, one can see that the
      N_SOFERR statistics counter increments once per packet. The
      user manual states for this counter that it may count the number of
      frames "that have the MII error input being asserted prior to or
      up to the SOF delimiter byte". So the switch MAC is sampling an
      electrically floating signal, and preventing proper traffic reception
      because of that.
      
      As a workaround, enable the internal weak pull-downs on the input pads
      for the MII control signals. This way, a floating signal would be
      internally tied to ground.
      
      The logic levels of signals which _are_ externally driven should not be
      bothered by this 40-50 KOhm internal resistor. So it is not an issue to
      enable the internal pull-down unconditionally, irrespective of PHY
      interface type (MII, RMII, RGMII, SGMII) and of board layout.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 30 March, 2020 1 commit
  11. 24 March, 2020 1 commit
    • net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT · 747e5eb3
      Committed by Vladimir Oltean
      The SJA1105 switch family has a PTP_CLK pin which emits a signal with
      fixed 50% duty cycle, but variable frequency and programmable start time.
      
      On the second generation (P/Q/R/S) switches, this pin supports even more
      functionality. The use case described by the hardware documents talks
      about synchronization via oneshot pulses: given 2 sja1105 switches,
      arbitrarily designated as a master and a slave, the master emits a
      single pulse on PTP_CLK, while the slave is configured to timestamp this
      pulse received on its PTP_CLK pin (which must obviously be configured as
      input). The difference between the timestamps then exactly becomes the
      slave offset to the master.
      
      The only trouble with the above is that the hardware is very much tied
      into this use case only, and not very generic beyond that:
       - When emitting a oneshot pulse, instead of being told when to emit it,
         the switch just does it "now" and tells you later what time it was,
         via the PTPSYNCTS register. [ Incidentally, this is the same register
         that the slave uses to collect the ext_ts timestamp from, too. ]
       - On the sync slave, there is no interrupt mechanism on reception of a
         new extts, and no FIFO to buffer them, because in the foreseen use
         case, software is in control of both the master and the slave pins,
         so it "knows" when there's something to collect.
      
      These 2 problems mean that:
       - We don't support (at least yet) the quirky oneshot mode exposed by
         the hardware, just normal periodic output.
       - We abuse the hardware a little bit when we expose generic extts.
         Because there's no interrupt mechanism, we need to poll at double the
         frequency we expect to receive a pulse. Currently that means a
         non-configurable "twice a second".
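
      The polling rule amounts to the following arithmetic, shown as a
      Python sketch with a hypothetical helper name. An expected pulse rate
      of 1 Hz gives the "twice a second" polling mentioned above:

```python
def extts_poll_interval_s(expected_pulse_hz):
    """Poll at double the expected pulse frequency."""
    return 1.0 / (2 * expected_pulse_hz)


print(extts_poll_interval_s(1))
```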
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 20 March, 2020 1 commit
    • net: dsa: sja1105: Add support for the SGMII port · ffe10e67
      Committed by Vladimir Oltean
      SJA1105 switches R and S have one SerDes port with an 802.3z
      quasi-compatible PCS, hardwired on port 4. The other ports are still
      MII/RMII/RGMII. The PCS performs rate adaptation to lower link speeds;
      the MAC on this port is hardwired at gigabit. Only full duplex is
      supported.
      
      The SGMII port can be configured as part of the static config tables, as
      well as through a dedicated SPI address region for its pseudo-clause-22
      registers. However it looks like the static configuration is not
      able to change some out-of-reset values (like the value of MII_BMCR), so
      at the end of the day, having code for it is utterly pointless. We are
      just going to use the pseudo-C22 interface.
      
      Because the PCS gets reset when the switch resets, we have to add even
      more restoration logic to sja1105_static_config_reload, otherwise the
      SGMII port breaks after operations such as enabling PTP timestamping
      which require a switch reset.
      
      From the PHYLINK perspective, the switch supports *only* SGMII (it doesn't
      support 1000Base-X). It also doesn't expose access to the raw config
      word for in-band AN in registers MII_ADV/MII_LPA.
      It is able to work in the following modes:
       - Forced speed
       - SGMII in-band AN slave (speed received from PHY)
       - SGMII in-band AN master (acting as a PHY)
      
      The latter mode is not supported by this patch. It is even unclear to me
      how that would be described. There is some code for it left in the
      patch, but 'an_master' is always passed as false.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 15 November, 2019 3 commits
    • net: dsa: sja1105: Simplify reset handling · abfb228a
      Committed by Vladimir Oltean
      We don't really need 10k species of reset. Remove everything except cold
      reset which is what is actually used. Too bad the hardware designers
      couldn't agree to use the same bit field for rev 1 and rev 2, so the
      (*reset_cmd) function pointer is there to stay.
      
      However let's simplify the prototype and give it a struct dsa_switch (we
      want to avoid forward-declarations of structures, in this case struct
      sja1105_private, wherever we can).
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Implement state machine for TAS with PTP clock source · 86db36a3
      Committed by Vladimir Oltean
      Tested using the following bash script and the tc from iproute2-next:
      
      	#!/bin/bash
      
      	set -e -u -o pipefail
      
      	NSEC_PER_SEC="1000000000"
      
      	gatemask() {
      		local tc_list="$1"
      		local mask=0
      
      		for tc in ${tc_list}; do
      			mask=$((${mask} | (1 << ${tc})))
      		done
      
      		printf "%02x" ${mask}
      	}
      
      	if ! systemctl is-active --quiet ptp4l; then
      		echo "Please start the ptp4l service"
      		exit 1
      	fi
      
      	now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
      	# Phase-align the base time to the start of the next second.
      	sec=$(echo "${now}" | gawk -F. '{ print $1; }')
      	base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"
      
      	tc qdisc add dev swp5 parent root handle 100 taprio \
      		num_tc 8 \
      		map 0 1 2 3 4 5 6 7 \
      		queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
      		base-time ${base_time} \
      		sched-entry S $(gatemask 7) 100000 \
      		sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
      		clockid CLOCK_TAI flags 2
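
      For readers checking the gate masks by hand, here is a Python
      equivalent of the gatemask() shell helper above. It is only a
      re-implementation for illustration and is not part of the test
      script:

```python
def gatemask(traffic_classes):
    """Return the hex gate mask with one bit set per traffic class."""
    mask = 0
    for tc in traffic_classes:
        mask |= 1 << tc
    return f"{mask:02x}"


print(gatemask([7]), gatemask(range(7)))
```

      The two schedule entries therefore use masks 80 (only traffic class 7
      open) and 7f (traffic classes 0 to 6 open).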
      
      The "state machine" is a workqueue invoked after each manipulation
      command on the PTP clock (reset, adjust time, set time, adjust
      frequency) which checks over the state of the time-aware scheduler.
      So it is not monitored periodically, only in reaction to a PTP command
      typically triggered from a userspace daemon (linuxptp). Otherwise there
      is no reason for things to go wrong.
      
      Now that the timecounter/cyclecounter has been replaced with hardware
      operations on the PTP clock, the TAS Kconfig now depends upon PTP and
      the standalone clocksource operating mode has been removed.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Make the PTP command read-write · 41603d78
      Committed by Vladimir Oltean
      The PTPSTRTSCH and PTPSTOPSCH bits are actually readable and indicate
      whether the time-aware scheduler is running or not. We will be using
      that for monitoring the scheduler in the next patch, so refactor the PTP
      command API in order to allow that.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 12 November, 2019 2 commits
    • net: dsa: sja1105: Restore PTP time after switch reset · 6cf99c13
      Committed by Vladimir Oltean
      The PTP time of the switch is not preserved when uploading a new static
      configuration. Work around this hardware oddity by reading its PTP time
      before a static config upload, and restoring it afterwards.
      
      Static config changes are expected to occur at runtime even in scenarios
      directly related to PTP, i.e. the Time-Aware Scheduler of the switch is
      programmed in this way.
      
      Perhaps the larger implication of this patch is that the PTP .gettimex64
      and .settime functions need to be exposed to sja1105_main.c, where the
      PTP lock needs to be held during this entire process. So their core
      implementation needs to move to some common functions which get exposed
      in sja1105_ptp.h.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Implement the .gettimex64 system call for PTP · 34d76e9f
      Committed by Vladimir Oltean
      Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace
      applications (e.g. phc2sys) to compensate for the delays incurred while
      reading the PHC's time.
      
      The task itself of taking the software timestamp is delegated to the SPI
      subsystem, through the newly introduced API in struct spi_transfer. The
      goal is to cross-timestamp I/O operations on the switch's PTP clock with
      values in the local system clock (CLOCK_REALTIME). For that we need to
      understand a bit of the hardware internals.
      
      The 'read PTP time' message is a 12 byte structure, first 4 bytes of
      which represent the SPI header, and the last 8 bytes represent the
      64-bit PTP time. The switch itself starts processing the command
      immediately after receiving the last bit of the address, i.e. at the
      middle of byte 3 (last byte of header). The PTP time is shadowed to a
      buffer register in the switch, and retrieved atomically during the
      subsequent SPI frames.
      
      A similar thing goes on for the 'write PTP time' message, although in
      that case the switch waits until the 64-bit PTP time becomes fully
      available before taking any action. So the byte that needs to be
      software-timestamped is byte 11 (last) of the transfer.
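
      The byte positions described above follow directly from the 12-byte
      layout, as this Python sketch shows. The function name is
      hypothetical and the sketch only does the index arithmetic, not the
      actual SPI timestamping:

```python
HDR_LEN = 4        # SPI header
PTP_TIME_LEN = 8   # 64-bit PTP time


def byte_to_timestamp(is_write):
    """Zero-based index of the byte whose transfer should be timestamped."""
    if is_write:
        # The switch acts only once the full 64-bit time has arrived.
        return HDR_LEN + PTP_TIME_LEN - 1
    # For reads, the switch latches the time at the end of the header.
    return HDR_LEN - 1
```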
      
      The patch creates a common (and local) sja1105_xfer implementation for
      the SPI I/O, and offers 3 front-ends:
      
      - sja1105_xfer_u32 and sja1105_xfer_u64: these are capable of optionally
        requesting a PTP timestamp
      
      - sja1105_xfer_buf: this is for large transfers (e.g. the static config
        buffer) and other misc data, and there is no point in giving
        timestamping capabilities to this.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 19 October, 2019 1 commit
  16. 16 October, 2019 2 commits
    • net: dsa: sja1105: Switch to scatter/gather API for SPI · 08839c06
      Committed by Vladimir Oltean
      This reworks the SPI transfer implementation to make use of more of the
      SPI core features. The main benefit is to avoid the memcpy in
      sja1105_xfer_buf().
      
      The memcpy was only needed because the function was transferring a
      single buffer at a time. So it needed to copy the caller-provided buffer
      at buf + 4, to store the SPI message header in the "headroom" area.
      
      But the SPI core supports scatter-gather messages, comprised of multiple
      transfers. We can actually use those to break apart every SPI message
      into 2 transfers: one for the header and one for the actual payload.
      
      To keep the behavior the same regarding the chip select signal, it is
      necessary to tell the SPI core to de-assert the chip select after each
      chunk. This was not needed before, because each spi_message contained
      only 1 single transfer.
      
      The meaning of the per-transfer cs_change=1 is:
      
      - If the transfer is the last one of the message, keep CS asserted
      - Otherwise, deassert CS
      
      We need to deassert CS in the "otherwise" case, which was implicit
      before.
      
      Avoiding the memcpy creates yet another opportunity. The device can't
      process more than 256 bytes of SPI payload at a time, so the
      sja1105_xfer_long_buf() function used to exist, to split the larger
      caller buffer into chunks.
      
      But these chunks couldn't be used as scatter/gather buffers for
      spi_message until now, because of that memcpy (we would have needed more
      memory for each chunk). So we can now remove the sja1105_xfer_long_buf()
      function and have a single implementation for long and short buffers.
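
      The chunking described above can be sketched in user space as
      follows. This is an illustration under stated assumptions, not the
      driver's code: struct xfer is a simplified stand-in for the kernel's
      struct spi_transfer, build_xfers() is a hypothetical helper, and the
      4-byte header layout (a big-endian word address) is a placeholder
      rather than the switch's real header format. The payload transfers
      point straight into the caller's buffer, so nothing is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN       4   /* SPI header bytes preceding each payload */
#define MAX_CHUNK_LEN 256 /* most payload the device accepts per message */

/* Simplified, hypothetical stand-in for the kernel's struct spi_transfer. */
struct xfer {
	const uint8_t *buf;
	size_t len;
	int cs_change;
};

/*
 * Describe buf[0..len) as a list of (header, payload) transfer pairs,
 * one pair per chunk of at most MAX_CHUNK_LEN bytes.
 * hdrs must provide one row per chunk (at least max_xfers / 2 rows).
 * Returns the number of entries written to xfers, or 0 if they do not fit.
 */
static size_t build_xfers(const uint8_t *buf, size_t len, uint32_t base_addr,
			  uint8_t (*hdrs)[HDR_LEN], struct xfer *xfers,
			  size_t max_xfers)
{
	size_t offset = 0, n = 0, chunk = 0;

	assert(buf && hdrs && xfers);

	while (offset < len) {
		size_t chunk_len = len - offset;
		/* Placeholder addressing: one address per 32-bit word. */
		uint32_t addr = base_addr + (uint32_t)(offset / 4);

		if (chunk_len > MAX_CHUNK_LEN)
			chunk_len = MAX_CHUNK_LEN;
		if (n + 2 > max_xfers)
			return 0;

		/* Placeholder header: the word address, big-endian. */
		hdrs[chunk][0] = (uint8_t)(addr >> 24);
		hdrs[chunk][1] = (uint8_t)(addr >> 16);
		hdrs[chunk][2] = (uint8_t)(addr >> 8);
		hdrs[chunk][3] = (uint8_t)addr;

		/* First transfer of the pair: the header. */
		xfers[n].buf = hdrs[chunk];
		xfers[n].len = HDR_LEN;
		xfers[n].cs_change = 0;
		n++;

		/*
		 * Second transfer: a pointer into the caller's buffer.
		 * Ask for a CS toggle after every chunk except the last,
		 * where cs_change=1 would instead keep CS asserted.
		 */
		xfers[n].buf = buf + offset;
		xfers[n].len = chunk_len;
		xfers[n].cs_change = (offset + chunk_len < len);
		n++;

		offset += chunk_len;
		chunk++;
	}

	return n;
}
```

      With a 600-byte buffer this yields three pairs with payload lengths
      of 256, 256 and 88 bytes, and only the first two payload transfers
      carry cs_change=1.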
      
      Another benefit is lower usage of stack memory. Previously we had to
      store 2 SPI buffers for each chunk. Due to the elimination of the
      memcpy, we can now send pointers to the actual chunks from the
      caller-supplied buffer to the SPI core.
      
      Since the patch merges two functions into a rewritten implementation,
      the function prototype was also changed, mainly for cosmetic consistency
      with the structures used within it.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      08839c06
    • V
      net: dsa: sja1105: Move sja1105_spi_transfer into sja1105_xfer · 8a559400
      Vladimir Oltean committed
      This is a cosmetic patch that reduces some boilerplate in the SPI
      interaction of the driver.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a559400
  17. 15 Oct, 2019 1 commit
  18. 05 Oct, 2019 1 commit
  19. 03 Oct, 2019 2 commits
  20. 01 Oct, 2019 1 commit
  21. 28 Jun, 2019 1 commit
  22. 10 Jun, 2019 3 commits
  23. 09 Jun, 2019 3 commits
    • V
      net: dsa: sja1105: Add logic for TX timestamping · 47ed985e
      Vladimir Oltean committed
      On TX, timestamping is performed synchronously from the
      port_deferred_xmit worker thread.
      In management routes, the switch is requested to take egress timestamps
      (again partial), which are reconstructed and appended to a clone of the
      skb that was just sent.  The cloning is done by DSA and we retrieve the
      pointer from the structure that DSA keeps in skb->cb.
      Then these clones are enqueued to the socket's error queue for
      application-level processing.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47ed985e
    • V
      net: dsa: sja1105: Add support for the PTP clock · bb77f36a
      Vladimir Oltean committed
      The design of this PHC driver is influenced by the switch's behavior
      w.r.t. timestamping.  It exposes two PTP counters, one free-running
      (PTPTSCLK) and the other offset- and frequency-corrected in hardware
      through PTPCLKVAL, PTPCLKADD and PTPCLKRATE.  The MACs can sample either
      of these for frame timestamps.
      
      However, the user manual warns that taking timestamps based on the
      corrected clock is less than useful, as the switch can deliver corrupted
      timestamps in a variety of circumstances.
      
      Therefore, this PHC uses the free-running PTPTSCLK together with a
      timecounter/cyclecounter structure that translates it into a software
      time domain.  Thus, the settime/adjtime and adjfine callbacks are
      hardware no-ops.
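
      The idea of layering a software time domain over a free-running
      counter can be sketched as below. This is a simplified user-space
      model, not the kernel's timecounter implementation: struct
      soft_clock and its helpers are hypothetical names, the raw counter
      value is passed in as an argument instead of being read over SPI,
      the 8 ns tick used in the usage note is an assumption, and the
      fractional-nanosecond remainder that a full implementation would
      carry is dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software clock layered over a free-running counter. */
struct soft_clock {
	uint64_t last_cycles; /* raw counter value at the last update */
	uint64_t nsec;        /* software time corresponding to last_cycles */
	uint32_t mult;        /* ns per tick, scaled by 2^shift */
	uint32_t shift;
};

static void soft_clock_init(struct soft_clock *c, uint64_t cycles_now,
			    uint64_t start_ns, uint32_t mult, uint32_t shift)
{
	assert(c && shift < 32);
	c->last_cycles = cycles_now;
	c->nsec = start_ns;
	c->mult = mult;
	c->shift = shift;
}

/* Advance the software time by the ticks elapsed since the last update. */
static uint64_t soft_clock_read(struct soft_clock *c, uint64_t cycles_now)
{
	uint64_t delta = cycles_now - c->last_cycles;

	c->nsec += (delta * c->mult) >> c->shift;
	c->last_cycles = cycles_now;
	return c->nsec;
}

/* adjtime: shift the software time only; the hardware is not touched. */
static void soft_clock_adjtime(struct soft_clock *c, int64_t delta_ns)
{
	c->nsec += (uint64_t)delta_ns;
}

/* settime: re-anchor the software time at the current counter value. */
static void soft_clock_settime(struct soft_clock *c, uint64_t cycles_now,
			       uint64_t ns)
{
	c->last_cycles = cycles_now;
	c->nsec = ns;
}
```

      For example, with mult = 8 << 28 and shift = 28 (an assumed 8 ns
      tick), initializing at counter value 1000 and reading at 1100 gives
      800 ns, and a following soft_clock_adjtime(&c, 50) makes the next
      read at the same counter value return 850 ns.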
      
      The timestamps (introduced in a further patch) will also be translated
      to the correct time domain before being handed over to the userspace PTP
      stack.
      
      The introduction of a second set of PHC operations that operate on the
      hardware PTPCLKVAL/PTPCLKADD/PTPCLKRATE in the future is somewhat
      unavoidable, as the TTEthernet core uses the corrected PTP time domain.
      However, the free-running counter + timecounter structure combination
      will suffice for now, as the resulting timestamps yield a sub-50 ns
      synchronization offset in steady state using linuxptp.
      
      For this patch, in absence of frame timestamping, the operations of the
      switch PHC were tested by syncing it to the system time as a local slave
      clock with:
      
      phc2sys -s CLOCK_REALTIME -c swp2 -O 0 -m -S 0.01
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb77f36a
    • V
      net: dsa: sja1105: Export symbols for upcoming PTP driver · 28e8fb3e
      Vladimir Oltean committed
      These are needed for the situation where the switch driver and the PTP
      driver are both built as modules.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      28e8fb3e
  24. 05 Jun, 2019 1 commit
    • V
      net: dsa: sja1105: Make room for P/Q/R/S FDB operations · 9dfa6911
      Vladimir Oltean committed
      The DSA callbacks were written with the E/T (first generation) in mind,
      which is quite different.
      
      For P/Q/R/S completely new implementations need to be provided, which
      are held as function pointers in the priv->info structure.  We are
      taking a slightly roundabout way for this (a function from
      sja1105_main.c reads a structure defined in sja1105_spi.c that
      points to a function defined in sja1105_main.c), but it is what it is.
      
      The FDB dump callback works for both families, hence no function pointer
      for that.
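
      The per-family dispatch described above might look roughly like the
      sketch below. All names here (struct switch_info, struct
      switch_priv, et_fdb_add, pqrs_fdb_add, port_fdb_add) are
      hypothetical and the function bodies are stubs; only the pattern of
      a generic callback reaching a family-specific implementation
      through a function pointer in the info structure is taken from the
      text.

```c
#include <assert.h>
#include <stdint.h>

struct switch_priv;

/* Per-family description, including family-specific FDB operations. */
struct switch_info {
	const char *name;
	int (*fdb_add)(struct switch_priv *priv, int port,
		       const uint8_t *addr, uint16_t vid);
};

struct switch_priv {
	const struct switch_info *info;
	int last_family; /* records which stub ran; for illustration only */
};

/* Stub for the first-generation (E/T) implementation. */
static int et_fdb_add(struct switch_priv *priv, int port,
		      const uint8_t *addr, uint16_t vid)
{
	(void)port; (void)addr; (void)vid;
	priv->last_family = 1;
	return 0;
}

/* Stub for the second-generation (P/Q/R/S) implementation. */
static int pqrs_fdb_add(struct switch_priv *priv, int port,
			const uint8_t *addr, uint16_t vid)
{
	(void)port; (void)addr; (void)vid;
	priv->last_family = 2;
	return 0;
}

static const struct switch_info et_info = {
	.name = "E/T",
	.fdb_add = et_fdb_add,
};

static const struct switch_info pqrs_info = {
	.name = "P/Q/R/S",
	.fdb_add = pqrs_fdb_add,
};

/* Generic callback: defers to whichever family the device belongs to. */
static int port_fdb_add(struct switch_priv *priv, int port,
			const uint8_t *addr, uint16_t vid)
{
	assert(priv && priv->info && priv->info->fdb_add);
	return priv->info->fdb_add(priv, port, addr, vid);
}
```

      A priv whose info points at pqrs_info reaches pqrs_fdb_add()
      through port_fdb_add() without the generic code knowing which
      family it is driving.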
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9dfa6911
  25. 09 May, 2019 2 commits
  26. 03 May, 2019 2 commits