1. 11 Mar, 2021 11 commits
  2. 09 Mar, 2021 1 commit
    • V
      net: enetc: allow hardware timestamping on TX queues with tc-etf enabled · 29d98f54
      Vladimir Oltean committed
      The txtime is passed to the driver in skb->skb_mstamp_ns, which is
      actually in a union with skb->tstamp (the place where software
      timestamps are kept).
      
      Since commit b50a5c70 ("net: allow simultaneous SW and HW transmit
      timestamping"), __sock_recv_timestamp has some logic for making sure
      that the two calls to skb_tstamp_tx:
      
      skb_tx_timestamp(skb) # Software timestamp in the driver
      -> skb_tstamp_tx(skb, NULL)
      
      and
      
      skb_tstamp_tx(skb, &shhwtstamps) # Hardware timestamp in the driver
      
      will both do the right thing and in a race-free manner, meaning that
      skb_tx_timestamp will deliver a cmsg with the software timestamp only,
      and skb_tstamp_tx with a non-NULL hwtstamps argument will deliver a cmsg
      with the hardware timestamp only.
      
      Why are races even possible? Well, because although the software timestamp
      skb->tstamp is private per skb, the hardware timestamp skb_hwtstamps(skb)
      lives in skb_shinfo(skb), an area which is shared between skbs and their
      clones. And skb_tstamp_tx works by cloning the packets when timestamping
      them, therefore attempting to perform hardware timestamping on an skb's
      clone will also change the hardware timestamp of the original skb. And
      the original skb might have been yet again cloned for software
      timestamping, at an earlier stage.
      
      So the logic in __sock_recv_timestamp can't be as simple as saying
      "does this skb have a hardware timestamp? if yes I'll send the hardware
      timestamp to the socket, otherwise I'll send the software timestamp",
      precisely because the hardware timestamp is shared.
      Instead, it's quite the other way around: __sock_recv_timestamp says
      "does this skb have a software timestamp? if yes, I'll send the software
      timestamp, otherwise the hardware one". This works because the software
      timestamp is not shared with clones.
      
      But that means we have a problem when we attempt hardware timestamping
      with skbs that don't have the skb->tstamp == 0. __sock_recv_timestamp
      will say "oh, yeah, this must be some sort of odd clone" and will not
      deliver the hardware timestamp to the socket. And this is exactly what
      is happening when we have txtime enabled on the socket: as mentioned,
      that is put in a union with skb->tstamp, so it is quite easy to mistake
      it.
      
      Do what other drivers do (Intel igb/igc) and write zero to skb->tstamp
      before taking the hardware timestamp. It's of no use to us now (we're
      already on the TX confirmation path).
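
      The decision described above can be sketched as a small user-space
      model. This is only an illustration under stated assumptions: the
      model_* names and types are hypothetical and are not kernel APIs, and
      the real __sock_recv_timestamp handles more cases than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_shinfo {
	uint64_t hwtstamp;	/* shared between an skb and its clones */
};

struct model_skb {
	uint64_t tstamp;	/* private per skb; also carries txtime (union) */
	struct model_shinfo *shinfo;
};

enum delivered { DELIVER_NONE, DELIVER_SW, DELIVER_HW };

/* Follows the rule quoted above: a non-zero software timestamp wins. */
static enum delivered model_recv_timestamp(const struct model_skb *skb)
{
	if (skb->tstamp != 0)
		return DELIVER_SW;
	if (skb->shinfo->hwtstamp != 0)
		return DELIVER_HW;
	return DELIVER_NONE;
}

/* TX confirmation: optionally clear tstamp before recording the HW time. */
static enum delivered model_tx_confirm(struct model_skb *skb, uint64_t hw_ns,
				       bool clear_tstamp)
{
	if (clear_tstamp)
		skb->tstamp = 0;	/* txtime is no longer needed here */
	skb->shinfo->hwtstamp = hw_ns;
	return model_recv_timestamp(skb);
}
```

      With a non-zero txtime left in tstamp, model_tx_confirm() reports
      DELIVER_SW and the hardware timestamp is never chosen; clearing the
      field first makes it report DELIVER_HW.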
      
      Fixes: 0d08c9ec ("enetc: add support time specific departure base on the qos etf")
      Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29d98f54
  3. 02 Mar, 2021 6 commits
    • V
      net: enetc: keep RX ring consumer index in sync with hardware · 3a5d12c9
      Vladimir Oltean committed
      The RX rings have a producer index owned by hardware, where newly
      received frame buffers are placed, and a consumer index owned by
      software, where newly allocated buffers are placed, in expectation of
      hardware being able to place frame data in them.
      
      Hardware increments the producer index when a frame is received, however
      it is not allowed to increment the producer index to match the consumer
      index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received
      BDs. Whenever the producer index matches the value of the consumer
      index, the ring has no unprocessed received frames and all BDs in the
      ring have been initialized/prepared by software, i.e. hardware owns all
      BDs in the ring.
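
      The ownership rule in the paragraph above can be written as a single
      expression. This is a hedged sketch: model_hw_free_bds is a
      hypothetical helper, not a driver or hardware function, and it only
      encodes the rule as described here.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of BDs hardware may still fill, given the
 * producer index (pi), the consumer index (ci) and the ring length (len).
 * Hardware may not advance pi onto ci, so at most len - 1 BDs are usable.
 */
static uint32_t model_hw_free_bds(uint32_t pi, uint32_t ci, uint32_t len)
{
	return (ci + len - pi - 1) % len;
}
```

      For a 512-entry ring, equal indices give 511 free BDs, while a
      producer index one behind the consumer index gives none.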
      
      The code uses the next_to_clean variable to keep track of the producer
      index, and the next_to_use variable to keep track of the consumer index.
      
      The RX rings are seeded from enetc_refill_rx_ring, which is called from
      two places:
      
      1. initially the ring is seeded until full with enetc_bd_unused(rx_ring),
         i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511:
      
      .ndo_open
      -> enetc_open
         -> enetc_setup_bdrs
            -> enetc_setup_rxbdr
               -> enetc_refill_rx_ring
      
      2. then during the data path processing, it is refilled with 16 buffers
         at a time:
      
      enetc_msix
      -> napi_schedule
         -> enetc_poll
            -> enetc_clean_rx_ring
               -> enetc_refill_rx_ring
      
      There is just one problem: the initial seeding done during .ndo_open
      updates just the producer index (ENETC_RBPIR) with 0, and the software
      next_to_clean and next_to_use variables. Notably, it will not update the
      consumer index to make the hardware aware of the newly added buffers.
      
      Wait, what? So how does it work?
      
      Well, the reset values of the producer index and of the consumer index
      of a ring are both zero. As per the description in the second paragraph,
      it means that the ring is full of buffers waiting for hardware to put
      frames in them, which by coincidence is almost true, because we have in
      fact seeded 511 buffers into the ring.
      
      But will the hardware attempt to access the 512th entry of the ring,
      which has an invalid BD in it? Well, no, because in order to do that, it
      would have to first populate the first 511 entries, and the NAPI
      enetc_poll will kick in by then. Eventually, after 16 processed slots
      have become available in the RX ring, enetc_clean_rx_ring will call
      enetc_refill_rx_ring and then will [ finally ] update the consumer index
      with the new software next_to_use variable. From now on, the
      next_to_clean and next_to_use variables are in sync with the producer
      and consumer ring indices.
      
      So the day is saved, right? Well, not quite. Freeing the memory
      allocated for the rings is done in:
      
      enetc_close
      -> enetc_clear_bdrs
         -> enetc_clear_rxbdr
            -> this just disables the ring
      -> enetc_free_rxtx_rings
         -> enetc_free_rx_ring
            -> sets next_to_clean and next_to_use to 0
      
      but again, nothing is committed to the hardware producer and consumer
      indices (yay!). The assumption is that the ring is disabled, so the
      indices don't matter anyway, and it's the responsibility of the "open"
      code path to set those up.
      
      .. Except that the "open" code path does not set those up properly.
      
      While initially, things almost work, during subsequent enetc_close ->
      enetc_open sequences, we have problems. To be precise, the enetc_open
      that is subsequent to enetc_close will again refill the ring with 511
      entries, but it will leave the consumer index untouched. Untouched
      means, of course, equal to the value it had before disabling the ring
      and draining the old buffers in enetc_close.
      
      But as mentioned, enetc_setup_rxbdr will at least update the producer
      index though, through this line of code:
      
      	enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);
      
      so at this stage we'll have:
      
      next_to_clean=0 (in hardware 0)
      next_to_use=511 (in hardware we'll have the refill index prior to enetc_close)
      
      Again, the next_to_clean and producer index are in sync and set to
      correct values, so the driver manages to limp on. Eventually, 16 ring
      entries will be consumed by enetc_poll, and the savior
      enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then
      update the hardware consumer ring based upon the new next_to_use.
      
      So.. it works?
      Well, by coincidence, it almost does, but there's a circumstance where
      enetc_clean_rx_ring won't be there to save us. If the previous value of
      the consumer index was 15, there's a problem, because the NAPI poll
      sequence will only issue a refill when 16 or more buffers have been
      consumed.
      
      It's easiest to illustrate this with an example:
      
      ip link set eno0 up
      ip addr add 192.168.100.1/24 dev eno0
      ping 192.168.100.1 -c 20 # ping this port from another board
      ip link set eno0 down
      ip link set eno0 up
      ping 192.168.100.1 -c 20 # ping it again from the same other board
      
      One by one:
      
      1. ip link set eno0 up
      -> calls enetc_setup_rxbdr:
         -> calls enetc_refill_rx_ring(511 buffers)
         -> next_to_clean=0 (in hw 0)
         -> next_to_use=511 (in hw 0)
      
      2. ping 192.168.100.1 -c 20 # ping this port from another board
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15)
      
      20 packets transmitted, 20 packets received, 0% packet loss
      
      3. ip link set eno0 down
      enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15)
      
      4. ip link set eno0 up
      -> calls enetc_setup_rxbdr:
         -> calls enetc_refill_rx_ring(511 buffers)
         -> next_to_clean=0 (in hw 0)
         -> next_to_use=511 (in hw 15)
      
      5. ping 192.168.100.1 -c 20 # ping it again from the same other board
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15)
      
      20 packets transmitted, 12 packets received, 40% packet loss
      
      And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal
      to 15 for that to happen), no nothing. The hardware enters the condition where
      the producer (14) + 1 is equal to the consumer (15) index, which makes it
      believe it has no more free buffers to put packets in, so it starts discarding
      them:
      
      ip netns exec ns0 ethtool -S eno0 | grep -v ': 0'
      NIC statistics:
           Rx ring  0 discarded frames: 8
      
      Summarized, if the interface receives between 16 and 32 (mod 512) frames
      and then there is a link flap, then the port will eventually die with no
      way to recover. If it receives fewer than 16 (mod 512) frames, then the
      initial NAPI poll [ before the link flap ] will not update the consumer
      index in hardware (it will remain zero) which will be ok when the buffers
      are later reinitialized. If more than 32 (mod 512) frames are received,
      the initial NAPI poll has the chance to refill the ring twice, updating
      the consumer index to at least 32. So after the link flap, the consumer
      index is still wrong, but the post-flap NAPI poll gets a chance to
      refill the ring once (because it passes through cleaned_cnt=15) and
      brings the consumer index back in sync with next_to_use.
      
      The solution to this problem is actually simple: we just need to write
      next_to_use into the hardware consumer index at enetc_open time, which
      always brings it back in sync after an initial buffer seeding process.
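
      The failure and the fix can be replayed with a small user-space
      model of one RX ring. This is a sketch under assumptions, not driver
      code: the model_* names are hypothetical, every accepted frame is
      cleaned immediately, and a refill happens as soon as 16 buffers have
      been cleaned.

```c
#include <stdbool.h>

#define RING_LEN	512	/* RX BDs per ring, as in the example above */
#define REFILL_BUNDLE	16	/* buffers cleaned before a refill is issued */

/* Hypothetical model of one RX ring; not the driver's data structures. */
struct model_ring {
	unsigned int hw_pi, hw_ci;		/* hardware producer/consumer */
	unsigned int next_to_clean, next_to_use;	/* software indices */
	unsigned int cleaned_cnt;
};

/* Open path: seed 511 buffers and reset the hardware producer index. */
static void model_open(struct model_ring *r, bool write_ci)
{
	r->next_to_clean = 0;
	r->next_to_use = RING_LEN - 1;
	r->cleaned_cnt = 0;
	r->hw_pi = 0;
	if (write_ci)			/* the fix: sync the consumer index */
		r->hw_ci = r->next_to_use;
}

/* Close path: software indices are zeroed, hardware ones are untouched. */
static void model_close(struct model_ring *r)
{
	r->next_to_clean = 0;
	r->next_to_use = 0;
}

/* Offer 'frames' frames; return how many the hardware accepted. */
static unsigned int model_receive(struct model_ring *r, unsigned int frames)
{
	unsigned int accepted = 0;

	while (frames--) {
		if ((r->hw_pi + 1) % RING_LEN == r->hw_ci)
			continue;	/* no free BD from hardware's view: discard */
		r->hw_pi = (r->hw_pi + 1) % RING_LEN;
		r->next_to_clean = (r->next_to_clean + 1) % RING_LEN;
		accepted++;
		if (++r->cleaned_cnt >= REFILL_BUNDLE) {
			r->next_to_use = (r->next_to_use + r->cleaned_cnt) % RING_LEN;
			r->hw_ci = r->next_to_use;	/* commit consumer index */
			r->cleaned_cnt = 0;
		}
	}
	return accepted;
}

/* Frames accepted after a link flap, given the pre-flap frame count. */
static unsigned int model_flap(unsigned int before, unsigned int after,
			       bool write_ci)
{
	struct model_ring r = { 0 };

	model_open(&r, write_ci);
	model_receive(&r, before);
	model_close(&r);
	model_open(&r, write_ci);
	return model_receive(&r, after);
}
```

      In this model, model_flap(22, 20, false) accepts only 14 frames after
      the flap, which matches the 14 enetc_clean_rx_ring lines logged in
      step 5, while model_flap(22, 20, true) accepts all 20. Pre-flap counts
      of 10 or 40 frames also recover without the fix, in line with the
      summary above.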
      
      The simpler thing would be to put the write to the consumer index into
      enetc_refill_rx_ring directly, but there are issues with the MDIO
      locking: in the NAPI poll code we have the enetc_lock_mdio() taken from
      top-level and we use the unlocked enetc_wr_reg_hot, whereas in
      enetc_open, the enetc_lock_mdio() is not taken at the top level, but
      instead by each individual enetc_wr_reg, so we are forced to put an
      additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of
      the code is left as a refactoring exercise.
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a5d12c9
    • V
      net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr · 96a5223b
      Vladimir Oltean committed
      The Station Interface Receive Interrupt Detect Register (SIRXIDR)
      contains a 16-bit wide mask of 'interrupt detected' events for each ring
      associated with a port. Bit i is write-1-to-clear for RX ring i.
      
      I have no explanation whatsoever how this line of code came to be
      inserted in the blamed commit. I checked the downstream versions of that
      patch and none of them have it.
      
      The somewhat comical aspect of it is that we're writing a binary number
      to the SIRXIDR register, which is derived from enetc_bd_unused(rx_ring).
      Since the RX rings have 512 buffer descriptors, we end up writing 511 to
      this register, which is 0x1ff, so we are effectively clearing the
      'interrupt detected' event for rings 0-8.
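
      The arithmetic above (511 = 0x1ff, i.e. bits 0 through 8) can be
      checked with a short helper. This is an illustrative sketch:
      model_rings_in_mask is a hypothetical name and does not touch any
      register.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a driver function: list the ring indices whose
 * bit is set in a value written to a 16-bit per-ring event mask.
 * Returns the number of rings found; 'rings' receives their indices.
 */
static unsigned int model_rings_in_mask(uint16_t written, unsigned int rings[16])
{
	unsigned int ring, n = 0;

	for (ring = 0; ring < 16; ring++)
		if (written & (1u << ring))
			rings[n++] = ring;
	return n;
}
```

      Passing 511 yields nine rings, from index 0 to index 8.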
      
      This register is not what is used for interrupt handling though - it
      only provides a summary for the entire SI. The hardware provides one
      separate Interrupt Detect Register per RX ring, which auto-clears upon
      read. So there doesn't seem to be any adverse effect caused by this
      bogus write.
      
      There is, however, one reason why this should be handled as a bugfix:
      next_to_clean _should_ be committed to hardware, just not to that
      register, and this was obscuring the fact that it wasn't. This is fixed
      in the next patch, and removing the bogus line now allows the fix patch
      to be backported beyond that point.
      
      Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96a5223b
    • V
      net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets · 827b6fd0
      Vladimir Oltean committed
      When the enetc ports have rx-vlan-offload enabled, they report a TPID of
      ETH_P_8021Q regardless of what was actually in the packet. When
      rx-vlan-offload is disabled, packets have the proper TPID. Fix this
      inconsistency by finishing the TODO left in the code.
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      827b6fd0
    • V
      net: enetc: take the MDIO lock only once per NAPI poll cycle · 6d36ecdb
      Vladimir Oltean committed
      The workaround for the ENETC MDIO erratum caused a performance
      degradation of 82 Kpps (seen with IP forwarding of two 1Gbps streams of
      64B packets). This is due to excessive locking and unlocking in the fast
      path, which can be avoided.
      
      By taking the MDIO read-side lock only once per NAPI poll cycle, we are
      able to regain 54 Kpps (65%) of the performance hit. The rest of the
      performance degradation comes from the TX data path, but unfortunately
      it doesn't look like we can optimize that away easily: even with
      netdev_xmit_more(), there just isn't any skb batching done that would
      let us take the MDIO lock less often than once per packet.
      
      We need to change the register accessor type for enetc_get_tx_tstamp,
      because it now runs under the enetc_lock_mdio as per the new call path
      detailed below:
      
      enetc_msix
      -> napi_schedule
         -> enetc_poll
            -> enetc_lock_mdio
            -> enetc_clean_tx_ring
               -> enetc_get_tx_tstamp
            -> enetc_clean_rx_ring
            -> enetc_unlock_mdio
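
      The difference between the two locking schemes can be illustrated by
      counting lock acquisitions. This is a rough sketch with hypothetical
      model_* names: the "lock" is only a counter, standing in for the
      locked accessor versus the "hot" accessor that expects the caller to
      hold the lock.

```c
/* Hypothetical counters standing in for the MDIO read-side lock. */
struct model_mdio {
	unsigned long lock_ops;		/* number of lock acquisitions */
	unsigned long reg_accesses;	/* number of register accesses */
};

static void model_lock(struct model_mdio *m)   { m->lock_ops++; }
static void model_unlock(struct model_mdio *m) { (void)m; }

/* Locked accessor: takes the lock around every register access. */
static void model_wr_reg(struct model_mdio *m)
{
	model_lock(m);
	m->reg_accesses++;
	model_unlock(m);
}

/* "Hot" accessor: the caller is expected to hold the lock already. */
static void model_wr_reg_hot(struct model_mdio *m)
{
	m->reg_accesses++;
}

/* Old scheme: one lock/unlock pair per register access in the poll. */
static void model_poll_per_access(struct model_mdio *m, unsigned int accesses)
{
	while (accesses--)
		model_wr_reg(m);
}

/* New scheme: lock once at the top of the poll, use hot accessors inside. */
static void model_poll_locked_once(struct model_mdio *m, unsigned int accesses)
{
	model_lock(m);
	while (accesses--)
		model_wr_reg_hot(m);
	model_unlock(m);
}
```

      For a poll cycle with 64 register accesses, the first scheme takes
      the lock 64 times and the second takes it once.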
      
      Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d36ecdb
    • V
      net: enetc: initialize RFS/RSS memories for unused ports too · 3222b5b6
      Vladimir Oltean committed
      Michael reports that since linux-next-20210211, the AER messages for ECC
      errors have started reappearing, and this time they can be reliably
      reproduced with the first ping on one of his LS1028A boards.
      
      $ ping 1[   33.258069] pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
      72.16.0.1
      PING [   33.267050] pcieport 0000:00:1f.0: AER: can't find device of ID0000
      172.16.0.1 (172.16.0.1): 56 data bytes
      64 bytes from 172.16.0.1: seq=0 ttl=64 time=17.124 ms
      64 bytes from 172.16.0.1: seq=1 ttl=64 time=0.273 ms
      
      $ devmem 0x1f8010e10 32
      0xC0000006
      
      It isn't clear why this is necessary, but it seems that for the errors
      to go away, we must clear the entire RFS and RSS memory, not just for
      the ports in use.
      
      Sadly the code is structured in such a way that we can't have unified
      logic for the used and unused ports. For the minimal initialization of
      an unused port, we need just to enable and ioremap the PF memory space,
      and a control buffer descriptor ring. Unused ports must then free the
      CBDR because the driver will exit, but used ports cannot pick up from
      where that code path left off, since the CBDR API does not reinitialize a
      ring when setting it up, so its producer and consumer indices are out of
      sync between the software and hardware state. So a separate
      enetc_init_unused_port function was created, and it gets called right
      after the PF memory space is enabled.
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Reported-by: Michael Walle <michael@walle.cc>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Michael Walle <michael@walle.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3222b5b6
    • V
      net: enetc: don't overwrite the RSS indirection table when initializing · c646d10d
      Vladimir Oltean committed
      After the blamed patch, all RX traffic gets hashed to CPU 0 because the
      hashing indirection table set up in:
      
      enetc_pf_probe
      -> enetc_alloc_si_resources
         -> enetc_configure_si
            -> enetc_setup_default_rss_table
      
      is overwritten later in:
      
      enetc_pf_probe
      -> enetc_init_port_rss_memory
      
      which zero-initializes the entire port RSS table in order to avoid ECC errors.
      
      The trouble really is that enetc_init_port_rss_memory needs
      enetc_alloc_si_resources to be called first, because it depends upon
      enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si
      thing could have been better thought out: it has no business being in a
      function called "alloc_si_resources", especially since its counterpart,
      "free_si_resources", does nothing to unwind the configuration of the SI.
      
      The point is, we need to pull enetc_configure_si out of
      enetc_alloc_si_resources, and move it after enetc_init_port_rss_memory.
      This allows us to set up the default RSS indirection table after
      initializing the memory.
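
      The ordering problem can be shown with a toy indirection table. This
      is a sketch under assumptions: the table size of 64, the round-robin
      default policy and all model_* names are chosen for illustration and
      are not taken from the driver.

```c
#include <string.h>

#define MODEL_RSS_ENTRIES 64	/* assumed table size, for illustration */

/* Spread entries round-robin over the RX groups (assumed default policy). */
static void model_setup_default_rss(int table[], int n, int num_groups)
{
	for (int i = 0; i < n; i++)
		table[i] = i % num_groups;
}

/* Zero the whole table, as the ECC workaround does for the RSS memory. */
static void model_init_rss_memory(int table[], int n)
{
	memset(table, 0, (size_t)n * sizeof(table[0]));
}

/* Count entries that steer traffic away from group 0. */
static int model_nonzero_entries(const int table[], int n)
{
	int count = 0;

	for (int i = 0; i < n; i++)
		if (table[i] != 0)
			count++;
	return count;
}
```

      Setting up the default table and then zeroing the memory leaves no
      entry pointing away from group 0; zeroing first and configuring
      afterwards keeps half of the entries on the second group when two
      groups are used.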
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c646d10d
  4. 18 Nov, 2020 1 commit
  5. 05 Nov, 2020 1 commit
  6. 12 Oct, 2020 1 commit
    • C
      enetc: Migrate to PHYLINK and PCS_LYNX · 71b77a7a
      Claudiu Manoil committed
      This is a methodical transition of the driver from phylib
      to phylink, following the guidelines from sfp-phylink.rst.
      The MAC register configurations based on interface mode
      were moved from the probing path to the mac_config() hook.
      MAC enable and disable commands (enabling Rx and Tx paths
      at MAC level) were also extracted and assigned to their
      corresponding phylink hooks.
      As part of the migration to phylink, the serdes configuration
      from the driver was offloaded to the PCS_LYNX module,
      introduced in commit 0da4c3d3 ("net: phy: add Lynx PCS module"),
      the PCS_LYNX module being a mandatory component required to
      make the enetc driver work with phylink.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Reviewed-by: Ioana Ciornei <ioana.cionei@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      71b77a7a
  7. 04 Aug, 2020 1 commit
    • J
      enetc: use napi_schedule to be compatible with PREEMPT_RT · 215602a8
      Jiafei Pan committed
      The driver calls napi_schedule_irqoff() from a context where, in RT,
      hardirqs are not disabled, since the IRQ handler is force-threaded.
      
      In the call path of this function, __raise_softirq_irqoff() is modifying
      its per-CPU mask of pending softirqs that must be processed, using
      or_softirq_pending(). The or_softirq_pending() function is not atomic,
      but since interrupts are supposed to be disabled, nobody should be
      preempting it, and the operation should be safe.
      
      Nonetheless, when running with hardirqs on, as in the PREEMPT_RT case,
      it isn't safe, and the pending softirqs mask can get corrupted,
      resulting in softirqs being lost and never processed.
      
      To have common code that works with PREEMPT_RT and with mainline Linux,
      we can use plain napi_schedule() instead. The difference is that
      napi_schedule() (via __napi_schedule) also calls local_irq_save, which
      disables hardirqs if they aren't already. But, since they already are
      disabled in non-RT, this means that in practice we don't see any
      measurable difference in throughput or latency with this patch.
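
      The lost-update hazard described above can be replayed
      deterministically by splitting the non-atomic OR into its load and
      store steps. This is only a model: the function names are
      hypothetical and the "interrupt" is simulated by ordinary code placed
      between the two steps.

```c
#include <stdint.h>

/* Hypothetical model of a non-atomic "pending |= bit" that gets interrupted. */
static uint32_t model_or_pending_interrupted(uint32_t pending,
					     uint32_t bit_a, uint32_t bit_b)
{
	uint32_t tmp = pending;	/* context A loads the mask */

	pending |= bit_b;	/* an interrupt sets another bit meanwhile */
	pending = tmp | bit_a;	/* A stores its stale copy back: bit_b is lost */
	return pending;
}

/* Same two updates, but A runs to completion before the interrupt. */
static uint32_t model_or_pending_irqs_off(uint32_t pending,
					  uint32_t bit_a, uint32_t bit_b)
{
	pending |= bit_a;	/* A completes without being interrupted */
	pending |= bit_b;	/* the interrupt runs afterwards */
	return pending;
}
```

      In the interrupted variant only bit_a survives, which corresponds to
      a pending softirq that is never processed; with interrupts held off,
      both bits end up in the mask.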
      Signed-off-by: Jiafei Pan <Jiafei.Pan@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      215602a8
  8. 22 Jul, 2020 5 commits
    • C
      enetc: Add adaptive interrupt coalescing · ae0e6a5d
      Claudiu Manoil committed
      Use the generic dynamic interrupt moderation (dim)
      framework to implement adaptive interrupt coalescing
      on Rx.  With the per-packet interrupt scheme, a high
      interrupt rate has been noted for moderate traffic flows
      leading to high CPU utilization.  The 'dim' scheme
      implemented by the current patch addresses this issue
      improving CPU utilization while using minimal coalescing
      time thresholds in order to preserve a good latency.
      On the Tx side use an optimal time threshold value by
      default.  This value has been optimized for Tx TCP
      streams at a rate of around 85kpps on a 1G link,
      at which rate half of the Tx ring size (128) gets filled
      in 1500 usecs.  Scaling this down to 2.5G links yields
      the current value of 600 usecs, which is conservative
      and gives good enough results for 1G links too (see
      next).
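
      The Tx threshold arithmetic above can be checked with integer math.
      This is a small sketch with hypothetical helper names; link speed is
      passed in tenths of Gbps to avoid floating point.

```c
/* Packets arriving in 'usecs' microseconds at 'pps' packets per second. */
static unsigned long model_pkts_in(unsigned long pps, unsigned long usecs)
{
	return pps * usecs / 1000000UL;
}

/*
 * Scale a time threshold chosen for a 1G link to a faster link.
 * 'tenths_gbps' is the link speed in tenths of Gbps (25 means 2.5G).
 */
static unsigned long model_scale_usecs(unsigned long usecs_at_1g,
				       unsigned long tenths_gbps)
{
	return usecs_at_1g * 10UL / tenths_gbps;
}
```

      At 85 kpps, 1500 usecs correspond to 127 whole packets, i.e. roughly
      the 128 entries quoted above, and scaling 1500 usecs to a 2.5G link
      gives 600 usecs.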
      
      Below are some measurement results from before and after
      this patch (and its dependencies), taken on a system with
      2 ARM Cortex-A72 CPUs @ 1.3 GHz (32 KB L1 data cache),
      using 60-second-long netperf TCP stream tests on a 1 Gbit link
      (maximum throughput):
      
      1) 1 Rx TCP flow, both Rx and Tx processed by the same NAPI
      thread on the same CPU:
                CPU utilization         int rate (ints/sec)
      Before:   50%-60% (over 50%)      92k
      After:    13%-22%                 3.5k-12k
      Comment:  Major CPU utilization improvement for a single
                Rx TCP flow (i.e. netperf -t TCP_MAERTS) on a single
                CPU. Usually settles under 16% for longer tests.
      
      2) 4 Rx TCP flows + 4 Tx TCP flows (+ pings to check the latency):
                Total CPU utilization   Total int rate (ints/sec)
      Before:   ~80% (spikes to 90%)    ~100k
      After:    60% (more steady)       ~4k
      Comment:  Important improvement for this load test, while the
      	  ping test outcome does not show any notable
      	  difference compared to before.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae0e6a5d
    • C
      enetc: Add interrupt coalescing support · 91571081
      Claudiu Manoil committed
      Enable programming of the interrupt coalescing registers
      and allow manual configuration of the coalescing time
      thresholds via ethtool.  Packet thresholds have been fixed
      to predetermined values as there's no point in making them
      run-time configurable, also anticipating the dynamic interrupt
      moderation (DIM) algorithm which uses fixed packet thresholds
      as well.  If the interface is up when the operation mode of
      traffic interrupt events is changed by the user (i.e. switching
      from default per-packet interrupts to coalesced interrupts),
      the traffic needs to be paused in the process.
      This patch also prepares the ground for introducing DIM on Rx.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      91571081
    • C
      enetc: Fix interrupt coalescing register naming · 12460a0a
      Claudiu Manoil committed
      Interrupt coalescing registers naming in the current revision
      of the Ref Man (RM) is ICR, deprecating the ICIR name used
      in earlier (draft) versions of the RM.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      12460a0a
    • C
      enetc: Factor out the traffic start/stop procedures · bbb96dc7
      Claudiu Manoil committed
      A reliable traffic pause (and reconfiguration) procedure
      is needed to be able to safely make h/w configuration
      changes during run-time, like changing the mode in which the
      interrupts are operating (i.e. with or without coalescing),
      as opposed to making on-the-fly register updates that
      may be subject to h/w or s/w concurrency issues.
      To this end, the code responsible for the run-time device
      configuration, which starts and stops the traffic
      flow through the device, has been extracted from
      the enetc_open/_close procedures into the separate standalone
      enetc_start/_stop procedures. Traffic stop should be as
      graceful as possible: it lets the executing napi threads
      finish while the interrupts stay disabled.  But since
      the napi thread will try to re-enable interrupts by clearing
      the device's unmask register, the enable_irq/ disable_irq
      API has been used to avoid this potential concurrency issue
      and make the traffic pause procedure more reliable.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbb96dc7
    • C
      enetc: Refine buffer descriptor ring sizes · 02293dd4
      Claudiu Manoil committed
      It's time to differentiate between Rx and Tx ring sizes.
      Not only are Tx rings processed differently than Rx rings,
      but their default number also differs - i.e. up to 8 Tx rings
      per device (8 traffic classes) vs. 2 Rx rings (one per CPU).
      So let's set Tx rings sizes to half the size of the Rx rings
      for now, to be conservative.
      The default ring sizes were decreased as well (to the next
      lower power of 2), to reduce the memory footprint, buffering
      etc., since the measurements I've made so far show that the
      rings are very unlikely to get full.
      This change also anticipates the introduction of the
      dynamic interrupt moderation (dim) algorithm which operates
      on maximum packet thresholds of 256 packets for Rx and 128
      packets for Tx.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02293dd4
  9. 27 Jun, 2020 1 commit
    • C
      enetc: Fix tx rings bitmap iteration range, irq handling · 0574e200
      Claudiu Manoil committed
      The rings bitmap of an interrupt vector encodes
      which of the device's rings were assigned to that
      interrupt vector.
      Hence the iteration range of the tx rings bitmap
      (for_each_set_bit()) should be the total number of
      Tx rings of that netdevice instead of the number of
      rings assigned to the interrupt vector.
      Since there are 2 cores, and one interrupt vector for
      each core, the number of rings assigned to an interrupt
      vector is half the number of available rings.
      The impact of this error is that the upper half of the
      tx rings could still generate interrupts during napi
      polling.
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0574e200
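The iteration-range mistake described in this fix can be reproduced with a small userspace sketch. `count_bits_below()` mimics what a `for_each_set_bit(i, &map, size)` loop would visit. The ring count, the interleaved ring-to-vector assignment and the `VEC1_TX_MAP` value are assumptions chosen for illustration, not values read from the driver.

```c
#include <assert.h>
#include <limits.h>

#define NUM_TX_RINGS	8	/* assumed: Tx rings owned by the netdevice */
#define NUM_VECTORS	2	/* one interrupt vector per core */

/* Assumed assignment: vector 1 serves Tx rings 1, 3, 5 and 7. */
#define VEC1_TX_MAP	0xAAUL

/*
 * Count the set bits of `map` at positions below `size`, i.e. the bits a
 * for_each_set_bit(i, &map, size) loop would visit.
 */
static int count_bits_below(unsigned long map, int size)
{
	int i, n = 0;

	assert(size >= 0 && size <= (int)(sizeof(map) * CHAR_BIT));
	for (i = 0; i < size; i++)
		if (map & (1UL << i))
			n++;
	return n;
}

/* Buggy bound: the number of rings assigned to the vector (8 / 2 = 4). */
static int rings_visited_buggy(void)
{
	return count_bits_below(VEC1_TX_MAP, NUM_TX_RINGS / NUM_VECTORS);
}

/* Fixed bound: the total number of Tx rings of the netdevice. */
static int rings_visited_fixed(void)
{
	return count_bits_below(VEC1_TX_MAP, NUM_TX_RINGS);
}
```

With these assumed values the buggy bound visits only rings 1 and 3, so rings 5 and 7 in the upper half are never handled, while the fixed bound visits all four rings of the vector.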
  10. 20 Jun, 2020 1 commit
    • C
      enetc: Fix HW_VLAN_CTAG_TX|RX toggling · 9deba33f
      Claudiu Manoil committed
      VLAN tag insertion/extraction offload is correctly
      activated at probe time but deactivation of this feature
      (i.e. via ethtool) is broken.  Toggling works only for
      Tx/Rx ring 0 of a PF, and is ignored for the other rings,
      including the VF rings.
      To fix this, the existing VLAN offload toggling code
      was extended to all the rings assigned to a netdevice,
      instead of the default ring 0 (likely a leftover from the
      early validation days of this feature).  And the code was
      moved to the common set_features() function to fix toggling
      for the VF driver too.
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9deba33f
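The difference between toggling ring 0 only and toggling every ring of the netdevice can be sketched with a fake register file. Everything here is hypothetical (`struct fake_dev`, the `VLAN_OFFLOAD_EN` bit position, the helper names); the real driver writes hardware registers through its own accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RINGS	8
#define VLAN_OFFLOAD_EN	(1u << 0)	/* hypothetical bit, not the real layout */

/* Stand-in for a netdevice with one mode register per ring. */
struct fake_dev {
	int num_rings;
	uint32_t ring_mode[MAX_RINGS];
};

static void set_offload_on_ring(struct fake_dev *dev, int ring, bool en)
{
	assert(ring >= 0 && ring < dev->num_rings);
	if (en)
		dev->ring_mode[ring] |= VLAN_OFFLOAD_EN;
	else
		dev->ring_mode[ring] &= ~VLAN_OFFLOAD_EN;
}

/* Pre-fix behaviour: only the default ring 0 is touched. */
static void toggle_ring0_only(struct fake_dev *dev, bool en)
{
	set_offload_on_ring(dev, 0, en);
}

/* Post-fix behaviour: every ring assigned to the netdevice is touched. */
static void toggle_all_rings(struct fake_dev *dev, bool en)
{
	int i;

	for (i = 0; i < dev->num_rings; i++)
		set_offload_on_ring(dev, i, en);
}

static int rings_with_offload(const struct fake_dev *dev)
{
	int i, n = 0;

	for (i = 0; i < dev->num_rings; i++)
		if (dev->ring_mode[i] & VLAN_OFFLOAD_EN)
			n++;
	return n;
}

/* Start from the probe-time state (offload on everywhere), then disable. */
static int rings_left_on_after_disable(int num_rings, bool fixed)
{
	struct fake_dev dev = { .num_rings = num_rings };

	assert(num_rings > 0 && num_rings <= MAX_RINGS);
	toggle_all_rings(&dev, true);
	if (fixed)
		toggle_all_rings(&dev, false);
	else
		toggle_ring0_only(&dev, false);
	return rings_with_offload(&dev);
}
```

In this model, disabling the feature through the ring-0-only path leaves the offload active on all remaining rings, which is the symptom the commit describes.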
  11. 19 Jun, 2020 1 commit
  12. 02 May, 2020 2 commits
    • P
      net: enetc: add tc flower psfp offload driver · 888ae5a3
      Po Liu committed
      This patch adds tc flower offload for the enetc IEEE 802.1Qci (PSFP)
      function. Four main blocks implement flow policing and filtering
      for ingress flows with the IEEE 802.1Qci features: stream identify
      (strictly speaking defined in P802.1cb, but needed for 802.1Qci),
      stream filtering, stream gate and flow metering. Each block holds
      many entries, addressed by index, that carry its parameters.
      A frame is first matched by stream identify, then enters the
      stream filter block through the handle shared between stream
      identify and stream filtering. Next it enters the stream gate
      selected by the stream filtering entry, where it is policed by the
      gate and, optionally, limited by the max SDU set in the filter
      block. Finally it is policed by the flow metering block, whose
      index is chosen in the filtering block.
      An entry of one block may therefore be linked from many upstream
      entries: assigning the same index to several of them lets multiple
      streams share the same stream filter, stream gate or flow meter.
      To implement these features, each stream filtered by source/destination
      MAC address, for some streams together with the VLAN ID, is
      treated as one flow chain. The chain is identified by the chain_index
      that already exists in the tc filter concept. The driver maintains
      this chain along with the gate modules. A stream filter entry is
      created from the gate index, the optional flow meter entry ID and
      a priority value.
      Offloading transfers only the gate action and the flow filtering
      parameters. The driver creates one stream filter entry (or finds an
      existing one with the same gate ID, flow meter ID and priority) and
      writes it to the hardware, so stream filtering does not need to be
      transferred by action offloading.
      This architecture mirrors the relationship between tc filters and
      actions: the tc filter maintains the list of flow features by key,
      and the actions are maintained in the action list.
      
      Below is an example of the tc commands:
      > tc qdisc add dev eth0 ingress
      > ip link set eth0 address 10:00:80:00:00:00
      > tc filter add dev eth0 parent ffff: protocol ip chain 11 \
      	flower skip_sw dst_mac 10:00:80:00:00:00 \
      	action gate index 10 \
      	sched-entry open 200000000 1 8000000 \
      	sched-entry close 100000000 -1 -1
      
      The command binds dst_mac 10:00:80:00:00:00 to index 11 of the stream
      identify module and then sets up gate index 10 of the stream gate
      module. The first sched-entry keeps the gate open for 200ms, limits
      the traffic volume to 8MB and directs the frames to ingress queue 1.
      The second sched-entry then keeps the gate closed for 100ms.
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      888ae5a3
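The timing in the example gate command above can be checked with a few lines of C. This is a sketch under stated assumptions: the `struct gate_entry` layout and function names are invented for illustration, the cycle is assumed to be the sum of the entry intervals with a base time of 0, and the `-1` values are carried through as given in the command without interpreting them further.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory form of one "sched-entry" from the tc command. */
struct gate_entry {
	bool open;		/* open or close */
	uint64_t interval_ns;	/* how long this state lasts */
	int ipv;		/* third field of the entry; -1 as given */
	int64_t max_octets;	/* fourth field of the entry; -1 as given */
};

/* The two entries from the example command. */
static const struct gate_entry example_entries[] = {
	{ true,  200000000ULL,  1, 8000000 },	/* open 200 ms, queue 1, 8 MB */
	{ false, 100000000ULL, -1, -1 },	/* closed 100 ms */
};

/* Assumed cycle length: the sum of all entry intervals. */
static uint64_t gate_cycle_ns(const struct gate_entry *e, int n)
{
	uint64_t sum = 0;
	int i;

	for (i = 0; i < n; i++)
		sum += e[i].interval_ns;
	return sum;
}

/* Gate state at time t_ns, assuming the schedule starts at t = 0. */
static bool gate_is_open_at(const struct gate_entry *e, int n, uint64_t t_ns)
{
	uint64_t cycle = gate_cycle_ns(e, n);
	uint64_t off;
	int i;

	assert(cycle > 0);
	off = t_ns % cycle;
	for (i = 0; i < n; i++) {
		if (off < e[i].interval_ns)
			return e[i].open;
		off -= e[i].interval_ns;
	}
	return false;	/* not reached, since off < cycle */
}
```

Under these assumptions the schedule repeats every 300 ms: the gate is open during the first 200 ms of each cycle and closed for the remaining 100 ms.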
    • P
      net: enetc: add hw tc hw offload features for PSPF capability · 79e49982
      Po Liu committed
      This patch lets ethtool enable/disable the tc flower offload
      feature. The ENETC hardware implements PSFP (per-stream filtering
      and policing). When the tc hw offload feature is enabled, the driver
      enables the IEEE 802.1Qci feature. This only sets the register enable
      bit for the feature: it does not enable any stream filtering, stream
      gate or stream identify entry, but it does read the capabilities
      of each feature.
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79e49982
  13. 11 Mar, 2020 2 commits
    • C
      enetc: Add dynamic allocation of extended Rx BD rings · 434cebab
      Claudiu Manoil committed
      Hardware timestamping support (PTP) on Rx requires extended
      buffer descriptors, double the size of normal Rx descriptors.
      On the current controller revision only the timestamping offload
      requires extended Rx descriptors.
      Since Rx timestamping can be turned on/off at runtime, make Rx ring
      allocation configurable at runtime too. As a result, the static
      config option FSL_ENETC_HW_TIMESTAMPING can be dropped and the
      extended descriptors can be used only when Rx timestamping gets
      activated.
      The extension has the same size as the base descriptor, making
      the descriptor iterators easy to update for the extended case.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      434cebab
    • C
      enetc: Clean up Rx BD iteration · 714239ac
      Claudiu Manoil committed
      Improve the maintainability of the code iterating the Rx buffer
      descriptors, preparing it to iterate extended Rx BDs as well.
      Don't explicitly increment the h/w descriptor pointers by one;
      provide an iterator that takes care of the h/w details.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      714239ac
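The two commits in this group rely on the extension being the same size as the base descriptor, so a ring with extended descriptors can be walked with a stride of two base-sized slots. The following is a rough sketch of that idea; the struct, the helper names and the 16-byte `BD_SIZE` are assumptions for illustration and are not the driver's actual iterator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BD_SIZE	16	/* assumed size of one base Rx descriptor, in bytes */

/* Hypothetical ring description; the real driver keeps more state. */
struct rx_ring {
	int bd_count;	/* number of logical descriptors in the ring */
	bool ext_en;	/* true when Rx timestamping needs extended BDs */
};

/* Slots (in base-descriptor units) occupied by one logical descriptor. */
static int rxbd_stride(const struct rx_ring *r)
{
	return r->ext_en ? 2 : 1;
}

/* Slot index of the base part of logical descriptor i. */
static int rxbd_slot(const struct rx_ring *r, int i)
{
	assert(i >= 0 && i < r->bd_count);
	return i * rxbd_stride(r);
}

/* Slot index of the extension, which directly follows the base part. */
static int rxbd_ext_slot(const struct rx_ring *r, int i)
{
	assert(r->ext_en);
	return rxbd_slot(r, i) + 1;
}

/* Advance the logical index with wrap-around; callers never see the stride. */
static int rxbd_next(const struct rx_ring *r, int i)
{
	assert(i >= 0 && i < r->bd_count);
	return (i + 1 == r->bd_count) ? 0 : i + 1;
}

/* Memory needed for the ring, which doubles when extensions are enabled. */
static size_t rx_ring_bytes(const struct rx_ring *r)
{
	return (size_t)r->bd_count * rxbd_stride(r) * BD_SIZE;
}
```

Because callers only deal with logical indices, switching `ext_en` at runtime changes the allocation size and the slot arithmetic in one place, while the loop that consumes descriptors stays the same.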
  14. 03 Jan, 2020 1 commit
    • P
      enetc: add support time specific departure base on the qos etf · 0d08c9ec
      Po Liu committed
      ENETC implements a time specific departure capability, which enables
      the user to specify when a frame can be transmitted. When this
      capability is enabled, the device delays the transmission of
      the frame so that it goes out at the precisely specified time.
      The departure time can be up to 0.5 seconds in the future. If the
      departure time in the transmit BD has not yet been reached, based
      on the current time, the packet will not be transmitted.
      
      This feature is driven by the ETF qdisc, which the user can set up
      with tc commands. Here are the example commands:
      
      tc qdisc add dev eth0 root handle 1: mqprio \
      	   num_tc 8 map 0 1 2 3 4 5 6 7 hw 1
      tc qdisc replace dev eth0 parent 1:8 etf \
      	   clockid CLOCK_TAI delta 30000  offload
      
      This example sets up the queue mapping first and then configures
      queue 7 to dequeue frames 30us ahead of their departure time.
      
      When sending test frames, the user should then enable the SO_TXTIME
      option on the socket.
      
      There are also some limitations for this feature in hardware:
      - Transmit checksum offloads and time specific departure operation
      are mutually exclusive.
      - Time Aware Shaper feature (Qbv) offload and time specific departure
      operation are mutually exclusive.
      Signed-off-by: Po Liu <Po.Liu@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d08c9ec
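On the application side, a departure time travels with each frame as an SCM_TXTIME control message on a socket where SO_TXTIME has been enabled with setsockopt() and a `struct sock_txtime` (from `linux/net_tstamp.h`) whose clockid matches the qdisc, CLOCK_TAI in the example above. The sketch below only builds and reads back that control message and checks the 0.5 second horizon mentioned in the commit; it does not open a socket. The helper names are hypothetical, the fallback value for `SCM_TXTIME` assumes the generic Linux socket constants, and the exact boundary handling of the 0.5 s limit in hardware is a guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SCM_TXTIME
#define SCM_TXTIME 61	/* assumed: SO_TXTIME in the generic Linux headers */
#endif

#define TXTIME_HORIZON_NS 500000000ULL	/* "up to 0.5 seconds in the future" */

/* Hypothetical sanity check before handing a departure time to the kernel. */
static bool txtime_in_window(uint64_t now_ns, uint64_t txtime_ns)
{
	return txtime_ns > now_ns && txtime_ns - now_ns <= TXTIME_HORIZON_NS;
}

/* Attach one SCM_TXTIME control message carrying txtime_ns to msg. */
static void attach_txtime(struct msghdr *msg, void *ctrl, size_t ctrl_len,
			  uint64_t txtime_ns)
{
	struct cmsghdr *cm;

	assert(ctrl_len >= CMSG_SPACE(sizeof(txtime_ns)));
	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(txtime_ns));

	cm = CMSG_FIRSTHDR(msg);
	assert(cm != NULL);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TXTIME;
	cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
	memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));
}

/* Build the control message and read the timestamp back out of it. */
static uint64_t txtime_roundtrip(uint64_t txtime_ns)
{
	union {
		char buf[CMSG_SPACE(sizeof(uint64_t))];
		struct cmsghdr align;	/* forces suitable alignment */
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cm;
	uint64_t out = 0;

	memset(&msg, 0, sizeof(msg));
	attach_txtime(&msg, ctrl.buf, sizeof(ctrl.buf), txtime_ns);

	cm = CMSG_FIRSTHDR(&msg);
	if (cm && cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_TXTIME)
		memcpy(&out, CMSG_DATA(cm), sizeof(out));
	return out;
}
```

In a real sender the same `struct msghdr` would also carry the payload in `msg_iov` and be passed to sendmsg(), with the timestamp computed from clock_gettime(CLOCK_TAI, ...) plus the desired offset.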
  15. 11 Dec, 2019 1 commit
  16. 07 Dec, 2019 1 commit
  17. 26 Nov, 2019 1 commit
  18. 22 Nov, 2019 1 commit
    • M
      enetc: make enetc_setup_tc_mqprio static · 13baf667
      Mao Wenan committed
      When compiling with ARCH=mips CROSS_COMPILE=mips-linux-gnu- using
      make C=2 drivers/net/ethernet/freescale/enetc/enetc.o
      
      one warning can be found:
      drivers/net/ethernet/freescale/enetc/enetc.c:1439:5:
      warning: symbol 'enetc_setup_tc_mqprio' was not declared.
      Should it be static?
      
      This patch makes the symbol enetc_setup_tc_mqprio static.
      Fixes: 34c6adf1 ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload")
      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      13baf667
  19. 17 Nov, 2019 1 commit