- 11 Mar, 2021 11 commits
-
-
Committed by Vladimir Oltean

There is no reason for this forward declaration to exist other than poor ordering of the functions.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

This patch moves the NAPI enetc_poll after enetc_clean_rx_ring such that we can delete the forward declarations.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

When we iterate through the BDs in the RX ring, the software producer index (which is already passed by value to enetc_rxbd_next) lags behind, and we end up with this funny looking "++i == rx_ring->bd_count" check so that we drag it after us.

Let's pass the software producer index "i" by reference, so that enetc_rxbd_next can increment it by itself (mod rx_ring->bd_count), especially since enetc_rxbd_next has to increment the index anyway.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
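The by-reference index handling described in this commit can be sketched roughly as follows. This is a simplified userspace illustration, not the driver's code: `struct rx_ring_sketch` and `rxbd_next_sketch` are hypothetical names, and the real enetc_rxbd_next also advances the buffer descriptor pointer, which is left out here.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for the driver's RX ring. */
struct rx_ring_sketch {
	int bd_count; /* number of buffer descriptors in the ring */
};

/*
 * Advance the software index in place, wrapping at bd_count, so that
 * callers no longer need their own "++i == bd_count" check.
 */
static void rxbd_next_sketch(const struct rx_ring_sketch *ring, int *i)
{
	assert(ring->bd_count > 0);

	if (++(*i) == ring->bd_count)
		*i = 0;
}
```

A caller would keep a local index and pass its address on every iteration, so the index and the descriptor position can no longer drift apart.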
-
Committed by Vladimir Oltean

Since commit 3222b5b6 ("net: enetc: initialize RFS/RSS memories for unused ports too") there is a requirement to initialize the memories of unused PFs too, which has left the probe path in a bit of a rough shape, because we basically have a minimal initialization path for unused PFs which is separate from the main initialization path.

Now that initializing a control BD ring is as simple as calling enetc_setup_cbdr, let's move that outside of enetc_alloc_si_resources (unused PFs don't need classification rules, so no point in allocating them just to free them later).

But enetc_alloc_si_resources is called both for PFs and for VFs, so now that enetc_setup_cbdr is no longer called from this common function, it means that the VF probe path needs to explicitly call enetc_setup_cbdr too.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

It makes no sense from an API perspective to first initialize some portion of struct enetc_cbdr outside enetc_setup_cbdr, then leave that function to initialize the rest. enetc_setup_cbdr should be able to perform all initialization given a zero-initialized struct enetc_cbdr.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

All call sites call enetc_clear_cbdr and enetc_free_cbdr one after another, so let's combine the two functions into a single method named enetc_teardown_cbdr which does both, and in the same order.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

enetc_clear_cbdr depends on struct enetc_hw because it must disable the ring through a register write. We'd like to remove that dependency, so let's do what's already done with the producer and consumer indices, which is to save the iomem address in a variable kept in struct enetc_cbdr.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

enetc_alloc_cbdr and enetc_setup_cbdr are always called one after another, so we can simplify the callers and make enetc_setup_cbdr do everything that's needed.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

We shouldn't need to pass the struct device *dev to enetc CBDR APIs over and over again, so save this inside struct enetc_cbdr::dma_dev and avoid calling it from the enetc_free_cbdr functions.

This breaks the dependency of the cbdr API on struct enetc_si (the station interface).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

Since there is a dedicated file in this driver for interacting with control BD rings, it makes sense to move these functions there.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

As explained in commit 29d98f54 ("net: enetc: allow hardware timestamping on TX queues with tc-etf enabled"), hardware TX timestamping requires an skb with skb->tstamp = 0. When a packet is sent with SO_TXTIME, the skb->skb_mstamp_ns corrupts the value of skb->tstamp, so the drivers need to explicitly reset skb->tstamp to zero after consuming the TX time.

Create a helper named skb_txtime_consumed() which does just that. All drivers which offload TC_SETUP_QDISC_ETF should implement it, and it would make it easier to assess during review whether they do the right thing in order to be compatible with hardware timestamping or not.

Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
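To make the aliasing problem concrete, here is a minimal userspace sketch of a txtime field sharing storage with the software timestamp, and of a helper that zeroes it after use. `struct skb_sketch` and `skb_txtime_consumed_sketch` are hypothetical stand-ins; the field types are assumptions chosen for illustration and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct sk_buff. */
struct skb_sketch {
	union {
		int64_t  tstamp;        /* software timestamp */
		uint64_t skb_mstamp_ns; /* txtime requested via SO_TXTIME */
	};
};

/*
 * Once the driver has consumed the txtime, reset the shared storage so
 * that later timestamping code does not mistake it for a software
 * timestamp.
 */
static inline void skb_txtime_consumed_sketch(struct skb_sketch *skb)
{
	assert(skb != 0);
	skb->tstamp = 0;
}
```

Because both members occupy the same bytes, writing a txtime makes tstamp non-zero as a side effect, which is exactly the state the helper undoes.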
-
- 09 Mar, 2021 1 commit
-
-
Committed by Vladimir Oltean

The txtime is passed to the driver in skb->skb_mstamp_ns, which is actually in a union with skb->tstamp (the place where software timestamps are kept).

Since commit b50a5c70 ("net: allow simultaneous SW and HW transmit timestamping"), __sock_recv_timestamp has some logic for making sure that the two calls to skb_tstamp_tx:

  skb_tx_timestamp(skb) # Software timestamp in the driver
  -> skb_tstamp_tx(skb, NULL)

and

  skb_tstamp_tx(skb, &shhwtstamps) # Hardware timestamp in the driver

will both do the right thing and in a race-free manner, meaning that skb_tx_timestamp will deliver a cmsg with the software timestamp only, and skb_tstamp_tx with a non-NULL hwtstamps argument will deliver a cmsg with the hardware timestamp only.

Why are races even possible? Well, because although the software timestamp skb->tstamp is private per skb, the hardware timestamp skb_hwtstamps(skb) lives in skb_shinfo(skb), an area which is shared between skbs and their clones. And skb_tstamp_tx works by cloning the packets when timestamping them, therefore attempting to perform hardware timestamping on an skb's clone will also change the hardware timestamp of the original skb. And the original skb might have been yet again cloned for software timestamping, at an earlier stage.

So the logic in __sock_recv_timestamp can't be as simple as saying "does this skb have a hardware timestamp? if yes I'll send the hardware timestamp to the socket, otherwise I'll send the software timestamp", precisely because the hardware timestamp is shared. Instead, it's quite the other way around: __sock_recv_timestamp says "does this skb have a software timestamp? if yes, I'll send the software timestamp, otherwise the hardware one". This works because the software timestamp is not shared with clones.

But that means we have a problem when we attempt hardware timestamping with skbs that don't have skb->tstamp == 0. __sock_recv_timestamp will say "oh, yeah, this must be some sort of odd clone" and will not deliver the hardware timestamp to the socket. And this is exactly what is happening when we have txtime enabled on the socket: as mentioned, that is put in a union with skb->tstamp, so it is quite easy to mistake it.

Do what other drivers do (intel igb/igc) and write zero to skb->tstamp before taking the hardware timestamp. It's of no use to us now (we're already on the TX confirmation path).

Fixes: 0d08c9ec ("enetc: add support time specific departure base on the qos etf")
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
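The "software timestamp first, otherwise hardware" decision described in this commit message can be modeled with a few lines of C. This is only a sketch of the rule as the message states it, not the actual __sock_recv_timestamp code; `enum ts_source_sketch` and `pick_timestamp_sketch` are hypothetical names, and timestamps are reduced to plain integers where zero means "not set".

```c
#include <assert.h>
#include <stdint.h>

enum ts_source_sketch {
	TS_SKETCH_NONE,
	TS_SKETCH_SOFTWARE,
	TS_SKETCH_HARDWARE,
};

/*
 * A non-zero software timestamp always wins, because it is private to
 * the skb; the hardware timestamp is only considered when the software
 * one is zero.
 */
static enum ts_source_sketch pick_timestamp_sketch(int64_t sw_tstamp,
						   int64_t hw_tstamp)
{
	if (sw_tstamp != 0)
		return TS_SKETCH_SOFTWARE;
	if (hw_tstamp != 0)
		return TS_SKETCH_HARDWARE;
	return TS_SKETCH_NONE;
}
```

Under this rule a leftover txtime in the shared field looks like a software timestamp and hides the hardware one, which is why the driver zeroes the field before taking the hardware timestamp.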
-
- 02 Mar, 2021 6 commits
-
-
Committed by Vladimir Oltean

The RX rings have a producer index owned by hardware, where newly received frame buffers are placed, and a consumer index owned by software, where newly allocated buffers are placed, in expectation of hardware being able to place frame data in them.

Hardware increments the producer index when a frame is received, however it is not allowed to increment the producer index to match the consumer index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received BDs. Whenever the producer index matches the value of the consumer index, the ring has no unprocessed received frames and all BDs in the ring have been initialized/prepared by software, i.e. hardware owns all BDs in the ring.

The code uses the next_to_clean variable to keep track of the producer index, and the next_to_use variable to keep track of the consumer index.

The RX rings are seeded from enetc_refill_rx_ring, which is called from two places:

1. initially the ring is seeded until full with enetc_bd_unused(rx_ring), i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511:

   .ndo_open
   -> enetc_open
      -> enetc_setup_bdrs
         -> enetc_setup_rxbdr
            -> enetc_refill_rx_ring

2. then during the data path processing, it is refilled with 16 buffers at a time:

   enetc_msix
   -> napi_schedule
      -> enetc_poll
         -> enetc_clean_rx_ring
            -> enetc_refill_rx_ring

There is just one problem: the initial seeding done during .ndo_open updates just the producer index (ENETC_RBPIR) with 0, and the software next_to_clean and next_to_use variables. Notably, it will not update the consumer index to make the hardware aware of the newly added buffers.

Wait, what? So how does it work?

Well, the reset values of the producer index and of the consumer index of a ring are both zero. As per the description in the second paragraph, it means that the ring is full of buffers waiting for hardware to put frames in them, which by coincidence is almost true, because we have in fact seeded 511 buffers into the ring.

But will the hardware attempt to access the 512th entry of the ring, which has an invalid BD in it? Well, no, because in order to do that, it would have to first populate the first 511 entries, and the NAPI enetc_poll will kick in by then. Eventually, after 16 processed slots have become available in the RX ring, enetc_clean_rx_ring will call enetc_refill_rx_ring and then will [ finally ] update the consumer index with the new software next_to_use variable. From now on, the next_to_clean and next_to_use variables are in sync with the producer and consumer ring indices.

So the day is saved, right? Well, not quite. Freeing the memory allocated for the rings is done in:

   enetc_close
   -> enetc_clear_bdrs
      -> enetc_clear_rxbdr
         -> this just disables the ring
   -> enetc_free_rxtx_rings
      -> enetc_free_rx_ring
         -> sets next_to_clean and next_to_use to 0

but again, nothing is committed to the hardware producer and consumer indices (yay!). The assumption is that the ring is disabled, so the indices don't matter anyway, and it's the responsibility of the "open" code path to set those up.

.. Except that the "open" code path does not set those up properly.

While initially, things almost work, during subsequent enetc_close -> enetc_open sequences, we have problems. To be precise, the enetc_open that is subsequent to enetc_close will again refill the ring with 511 entries, but it will leave the consumer index untouched. Untouched means, of course, equal to the value it had before disabling the ring and draining the old buffers in enetc_close.

But as mentioned, enetc_setup_rxbdr will at least update the producer index though, through this line of code:

   enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);

so at this stage we'll have:

   next_to_clean=0 (in hardware 0)
   next_to_use=511 (in hardware we'll have the refill index prior to enetc_close)

Again, the next_to_clean and producer index are in sync and set to correct values, so the driver manages to limp on. Eventually, 16 ring entries will be consumed by enetc_poll, and the savior enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then update the hardware consumer ring based upon the new next_to_use.

So.. it works?

Well, by coincidence, it almost does, but there's a circumstance where enetc_clean_rx_ring won't be there to save us. If the previous value of the consumer index was 15, there's a problem, because the NAPI poll sequence will only issue a refill when 16 or more buffers have been consumed.

It's easiest to illustrate this with an example:

   ip link set eno0 up
   ip addr add 192.168.100.1/24 dev eno0
   ping 192.168.100.1 -c 20 # ping this port from another board
   ip link set eno0 down
   ip link set eno0 up
   ping 192.168.100.1 -c 20 # ping it again from the same other board

One by one:

1. ip link set eno0 up
   -> calls enetc_setup_rxbdr:
      -> calls enetc_refill_rx_ring(511 buffers)
      -> next_to_clean=0 (in hw 0)
      -> next_to_use=511 (in hw 0)

2. ping 192.168.100.1 -c 20 # ping this port from another board
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0)
   enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15)

   20 packets transmitted, 20 packets received, 0% packet loss

3. ip link set eno0 down
   enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15)

4. ip link set eno0 up
   -> calls enetc_setup_rxbdr:
      -> calls enetc_refill_rx_ring(511 buffers)
      -> next_to_clean=0 (in hw 0)
      -> next_to_use=511 (in hw 15)

5. ping 192.168.100.1 -c 20 # ping it again from the same other board
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15)
   enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15)

   20 packets transmitted, 12 packets received, 40% packet loss

And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal to 15 for that to happen), no nothing. The hardware enters the condition where the producer (14) + 1 is equal to the consumer (15) index, which makes it believe it has no more free buffers to put packets in, so it starts discarding them:

   ip netns exec ns0 ethtool -S eno0 | grep -v ': 0'
   NIC statistics:
        Rx ring 0 discarded frames: 8

Summarized, if the interface receives between 16 and 32 (mod 512) frames and then there is a link flap, then the port will eventually die with no way to recover. If it receives fewer than 16 (mod 512) frames, then the initial NAPI poll [ before the link flap ] will not update the consumer index in hardware (it will remain zero) which will be ok when the buffers are later reinitialized. If more than 32 (mod 512) frames are received, the initial NAPI poll has the chance to refill the ring twice, updating the consumer index to at least 32. So after the link flap, the consumer index is still wrong, but the post-flap NAPI poll gets a chance to refill the ring once (because it passes through cleaned_cnt=15) and makes the consumer index be again back in sync with next_to_use.

The solution to this problem is actually simple: we just need to write next_to_use into the hardware consumer index at enetc_open time, which always brings it back in sync after an initial buffer seeding process.

The simpler thing would be to put the write to the consumer index into enetc_refill_rx_ring directly, but there are issues with the MDIO locking: in the NAPI poll code we have the enetc_lock_mdio() taken from top-level and we use the unlocked enetc_wr_reg_hot, whereas in enetc_open, the enetc_lock_mdio() is not taken at the top level, but instead by each individual enetc_wr_reg, so we are forced to put an additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of the code is left as a refactoring exercise.

Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
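The failure described in this commit can be reproduced with a small userspace model of the four indices. This is a deliberately simplified sketch under stated assumptions, not driver code: all names ending in `_sketch` are hypothetical, every received frame is assumed to be cleaned immediately, a refill is modeled as happening after every 16 cleaned frames, and the hardware is modeled as dropping a frame whenever producer + 1 equals the consumer index.

```c
#include <assert.h>

#define BD_COUNT_SKETCH 512
#define BUNDLE_SKETCH   16

struct ring_sketch {
	int hw_pi;         /* hardware producer index */
	int hw_ci;         /* hardware consumer index */
	int next_to_clean; /* software view of the producer index */
	int next_to_use;   /* software view of the consumer index */
	int cleaned;       /* frames cleaned since the last refill */
};

/* Model of the open path: seed 511 buffers, reset the producer index. */
static void ring_open_sketch(struct ring_sketch *r, int write_ci_on_open)
{
	r->next_to_clean = 0;
	r->next_to_use = BD_COUNT_SKETCH - 1;
	r->cleaned = 0;
	r->hw_pi = 0;
	if (write_ci_on_open) /* the fix: also commit the consumer index */
		r->hw_ci = r->next_to_use;
}

/* Offer 'frames' frames to the ring; return how many were accepted. */
static int ring_rx_sketch(struct ring_sketch *r, int frames)
{
	int delivered = 0;

	assert(frames >= 0);
	for (int n = 0; n < frames; n++) {
		if ((r->hw_pi + 1) % BD_COUNT_SKETCH == r->hw_ci)
			continue; /* no free BD from hardware's view: discard */
		r->hw_pi = (r->hw_pi + 1) % BD_COUNT_SKETCH;
		r->next_to_clean = (r->next_to_clean + 1) % BD_COUNT_SKETCH;
		delivered++;
		if (++r->cleaned == BUNDLE_SKETCH) {
			r->next_to_use = (r->next_to_use + BUNDLE_SKETCH) %
					 BD_COUNT_SKETCH;
			r->hw_ci = r->next_to_use; /* refill commits the index */
			r->cleaned = 0;
		}
	}
	return delivered;
}

/* Receive, flap the link (close + open), then receive again. */
static int rx_after_link_flap_sketch(int frames_before, int frames_after,
				     int write_ci_on_open)
{
	struct ring_sketch r = {0}; /* reset values: both hw indices are 0 */

	ring_open_sketch(&r, write_ci_on_open);
	ring_rx_sketch(&r, frames_before);
	ring_open_sketch(&r, write_ci_on_open);
	return ring_rx_sketch(&r, frames_after);
}
```

In this model, 22 frames before the flap leave the hardware consumer index at 15, so only 14 of the next 20 frames are accepted unless the open path writes the consumer index. The frame counts differ from the ping statistics in the trace because the model counts raw frames rather than echo replies.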
-
Committed by Vladimir Oltean

The Station Interface Receive Interrupt Detect Register (SIRXIDR) contains a 16-bit wide mask of 'interrupt detected' events for each ring associated with a port. Bit i is write-1-to-clear for RX ring i.

I have no explanation whatsoever how this line of code came to be inserted in the blamed commit. I checked the downstream versions of that patch and none of them have it.

The somewhat comical aspect of it is that we're writing a binary number to the SIRXIDR register, which is derived from enetc_bd_unused(rx_ring). Since the RX rings have 512 buffer descriptors, we end up writing 511 to this register, which is 0x1ff, so we are effectively clearing the 'interrupt detected' event for rings 0-8.

This register is not what is used for interrupt handling though - it only provides a summary for the entire SI. The hardware provides one separate Interrupt Detect Register per RX ring, which auto-clears upon read. So there doesn't seem to be any adverse effect caused by this bogus write.

There is, however, one reason why this should be handled as a bugfix: next_to_clean _should_ be committed to hardware, just not to that register, and this was obscuring the fact that it wasn't. This is fixed in the next patch, and removing the bogus line now allows the fix patch to be backported beyond that point.

Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
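The "511 is 0x1ff, so rings 0-8" arithmetic can be checked with a couple of bit helpers. The sketch below treats the value purely as a 16-bit mask with one bit per ring; the function names are hypothetical and nothing here touches real registers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of per-ring bits set in a 16-bit write-1-to-clear mask. */
static int w1c_rings_touched_sketch(uint16_t value)
{
	int count = 0;

	for (int bit = 0; bit < 16; bit++)
		if (value & (1u << bit))
			count++;
	return count;
}

/* Highest ring index whose bit is set, or -1 if the mask is empty. */
static int w1c_highest_ring_sketch(uint16_t value)
{
	for (int bit = 15; bit >= 0; bit--)
		if (value & (1u << bit))
			return bit;
	return -1;
}
```

For a value of 511 the helpers report nine contiguous bits with the highest at index 8, matching the "rings 0-8" statement in the commit message.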
-
Committed by Vladimir Oltean

When the enetc ports have rx-vlan-offload enabled, they report a TPID of ETH_P_8021Q regardless of what was actually in the packet. When rx-vlan-offload is disabled, packets have the proper TPID. Fix this inconsistency by finishing the TODO left in the code.

Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

The workaround for the ENETC MDIO erratum caused a performance degradation of 82 Kpps (seen with IP forwarding of two 1Gbps streams of 64B packets). This is due to excessive locking and unlocking in the fast path, which can be avoided.

By taking the MDIO read-side lock only once per NAPI poll cycle, we are able to regain 54 Kpps (65%) of the performance hit. The rest of the performance degradation comes from the TX data path, but unfortunately it doesn't look like we can optimize that away easily: even with netdev_xmit_more(), there just isn't any skb batching done to help with taking the MDIO lock less often than once per packet.

We need to change the register accessor type for enetc_get_tx_tstamp, because it now runs under the enetc_lock_mdio as per the new call path detailed below:

   enetc_msix
   -> napi_schedule
      -> enetc_poll
         -> enetc_lock_mdio
         -> enetc_clean_tx_ring
            -> enetc_get_tx_tstamp
         -> enetc_clean_rx_ring
         -> enetc_unlock_mdio

Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
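The difference between locking around every register access and locking once per poll cycle can be illustrated with a counting mock. This is not the driver's locking code: `struct mdio_lock_sketch` is a hypothetical stand-in that only counts acquisitions, and the register accesses themselves are left as comments.

```c
#include <assert.h>

/* Hypothetical stand-in for the MDIO read-side lock. */
struct mdio_lock_sketch {
	unsigned long acquisitions;
	int held;
};

static void lock_sketch(struct mdio_lock_sketch *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void unlock_sketch(struct mdio_lock_sketch *l)
{
	assert(l->held);
	l->held = 0;
}

/* Per-access scheme: each register access takes the lock by itself. */
static void poll_lock_per_access_sketch(struct mdio_lock_sketch *l,
					int reg_accesses)
{
	for (int n = 0; n < reg_accesses; n++) {
		lock_sketch(l);
		/* a locked register accessor would run here */
		unlock_sketch(l);
	}
}

/* Per-poll scheme: lock once, then use unlocked accessors inside. */
static void poll_lock_once_sketch(struct mdio_lock_sketch *l,
				  int reg_accesses)
{
	lock_sketch(l);
	for (int n = 0; n < reg_accesses; n++) {
		/* an unlocked accessor relies on the caller's lock */
		assert(l->held);
	}
	unlock_sketch(l);
}
```

With 64 register accesses in one poll, the first variant acquires the lock 64 times and the second only once, which is the kind of saving the commit attributes its regained throughput to.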
-
Committed by Vladimir Oltean

Michael reports that since linux-next-20210211, the AER messages for ECC errors have started reappearing, and this time they can be reliably reproduced with the first ping on one of his LS1028A boards.

   $ ping 1[ 33.258069] pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
   72.16.0.1
   PING [ 33.267050] pcieport 0000:00:1f.0: AER: can't find device of ID0000
   172.16.0.1 (172.16.0.1): 56 data bytes
   64 bytes from 172.16.0.1: seq=0 ttl=64 time=17.124 ms
   64 bytes from 172.16.0.1: seq=1 ttl=64 time=0.273 ms

   $ devmem 0x1f8010e10 32
   0xC0000006

It isn't clear why this is necessary, but it seems that for the errors to go away, we must clear the entire RFS and RSS memory, not just for the ports in use.

Sadly the code is structured in such a way that we can't have unified logic for the used and unused ports. For the minimal initialization of an unused port, we need just to enable and ioremap the PF memory space, and a control buffer descriptor ring. Unused ports must then free the CBDR because the driver will exit, but used ports cannot pick up from where that code path left off, since the CBDR API does not reinitialize a ring when setting it up, so its producer and consumer indices are out of sync between the software and hardware state. So a separate enetc_init_unused_port function was created, and it gets called right after the PF memory space is enabled.

Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
Reported-by: Michael Walle <michael@walle.cc>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vladimir Oltean

After the blamed patch, all RX traffic gets hashed to CPU 0 because the hashing indirection table set up in:

   enetc_pf_probe
   -> enetc_alloc_si_resources
      -> enetc_configure_si
         -> enetc_setup_default_rss_table

is overwritten later in:

   enetc_pf_probe
   -> enetc_init_port_rss_memory

which zero-initializes the entire port RSS table in order to avoid ECC errors.

The trouble really is that enetc_init_port_rss_memory needs enetc_alloc_si_resources to be called, because it depends upon enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si thing could have been better thought out: it has nothing to do in a function called "alloc_si_resources", especially since its counterpart, "free_si_resources", does nothing to unwind the configuration of the SI.

The point is, we need to pull enetc_configure_si out of enetc_alloc_si_resources, and move it after enetc_init_port_rss_memory. This allows us to set up the default RSS indirection table after initializing the memory.

Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 Nov, 2020 1 commit
-
-
Committed by Alex Marginean

Due to a hardware issue, an access to MDIO registers that is concurrent with other ENETC register accesses may lead to the MDIO access being dropped or corrupted. The workaround introduces locking for all register accesses to the ENETC register space.

To reduce performance impact, a readers-writers locking scheme has been implemented. The writer in this case is the MDIO access code (irrelevant whether that MDIO access is a register read or write), and the reader is any access code to non-MDIO ENETC registers. Also, the datapath functions acquire the read lock fewer times and use _hot accessors. All the rest of the code uses the _wa accessors which lock every register access.

The commit introducing MDIO support is commit ebfcb23d ("enetc: Add ENETC PF level external MDIO support"), but due to subsequent refactoring this patch is applicable on top of a later commit.

Fixes: 6517798d ("enetc: Make MDIO accessors more generic and export to include/linux/fsl")
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201112182608.26177-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 05 Nov, 2020 1 commit
-
-
Committed by Claudiu Manoil

Tx checksumming has been defeatured and completely removed from the h/w reference manual. Made a little cleanup for the TSE case as this is complementary code.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201103140213.3294-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 12 Oct, 2020 1 commit
-
-
Committed by Claudiu Manoil

This is a methodical transition of the driver from phylib to phylink, following the guidelines from sfp-phylink.rst. The MAC register configurations based on interface mode were moved from the probing path to the mac_config() hook. MAC enable and disable commands (enabling Rx and Tx paths at MAC level) were also extracted and assigned to their corresponding phylink hooks.

As part of the migration to phylink, the serdes configuration from the driver was offloaded to the PCS_LYNX module, introduced in commit 0da4c3d3 ("net: phy: add Lynx PCS module"), the PCS_LYNX module being a mandatory component required to make the enetc driver work with phylink.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.cionei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 04 Aug, 2020 1 commit
-
-
Committed by Jiafei Pan

The driver calls napi_schedule_irqoff() from a context where, in RT, hardirqs are not disabled, since the IRQ handler is force-threaded.

In the call path of this function, __raise_softirq_irqoff() is modifying its per-CPU mask of pending softirqs that must be processed, using or_softirq_pending(). The or_softirq_pending() function is not atomic, but since interrupts are supposed to be disabled, nobody should be preempting it, and the operation should be safe.

Nonetheless, when running with hardirqs on, as in the PREEMPT_RT case, it isn't safe, and the pending softirqs mask can get corrupted, resulting in softirqs being lost and never processed.

To have common code that works with PREEMPT_RT and with mainline Linux, we can use plain napi_schedule() instead. The difference is that napi_schedule() (via __napi_schedule) also calls local_irq_save, which disables hardirqs if they aren't already. But, since they already are disabled in non-RT, this means that in practice we don't see any measurable difference in throughput or latency with this patch.

Signed-off-by: Jiafei Pan <Jiafei.Pan@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 Jul, 2020 5 commits
-
-
Committed by Claudiu Manoil

Use the generic dynamic interrupt moderation (dim) framework to implement adaptive interrupt coalescing on Rx. With the per-packet interrupt scheme, a high interrupt rate has been noted for moderate traffic flows leading to high CPU utilization. The 'dim' scheme implemented by the current patch addresses this issue, improving CPU utilization while using minimal coalescing time thresholds in order to preserve a good latency.

On the Tx side use an optimal time threshold value by default. This value has been optimized for Tx TCP streams at a rate of around 85kpps on a 1G link, at which rate half of the Tx ring size (128) gets filled in 1500 usecs. Scaling this down to 2.5G links yields the current value of 600 usecs, which is conservative and gives good enough results for 1G links too (see next).

Below are some measurement results for before and after this patch (and related dependencies) basically, for a 2 ARM Cortex-A72 @1.3GHz CPUs system (32 KB L1 data cache), using 60secs log netperf TCP stream tests @ 1Gbit link (maximum throughput):

1) 1 Rx TCP flow, both Rx and Tx processed by the same NAPI thread on the same CPU:

            CPU utilization        int rate (ints/sec)
   Before:  50%-60% (over 50%)     92k
   After:   13%-22%                3.5k-12k

   Comment: Major CPU utilization improvement for a single flow Rx TCP flow (i.e. netperf -t TCP_MAERTS) on a single CPU. Usually settles under 16% for longer tests.

2) 4 Rx TCP flows + 4 Tx TCP flows (+ pings to check the latency):

            Total CPU utilization  Total int rate (ints/sec)
   Before:  ~80% (spikes to 90%)   ~100k
   After:   60% (more steady)      ~4k

   Comment: Important improvement for this load test, while the ping test outcome does not show any notable difference compared to before.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
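The Tx threshold figures quoted in this commit follow from simple arithmetic, which the sketch below works through. The helper names are hypothetical and this is not how the driver computes anything; it only checks that 128 descriptors at about 85 kpps take roughly 1500 usecs to fill, and that scaling 1500 usecs from a 1G to a 2.5G link gives 600 usecs.

```c
#include <assert.h>
#include <stdint.h>

/* Time, in microseconds, to fill 'descriptors' slots at a packet rate. */
static uint32_t ring_fill_time_us_sketch(uint32_t descriptors,
					 uint32_t pkts_per_sec)
{
	assert(pkts_per_sec > 0);
	return (uint32_t)(((uint64_t)descriptors * 1000000u) / pkts_per_sec);
}

/* Scale a time threshold inversely with link speed (speeds in Mbps). */
static uint32_t scale_time_us_sketch(uint32_t time_us, uint32_t from_mbps,
				     uint32_t to_mbps)
{
	assert(to_mbps > 0);
	return (uint32_t)(((uint64_t)time_us * from_mbps) / to_mbps);
}
```

With integer division, 128 descriptors at 85000 packets per second come out at 1505 usecs, which the commit rounds to 1500 before scaling by 1000/2500 to reach 600.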
-
Committed by Claudiu Manoil

Enable programming of the interrupt coalescing registers and allow manual configuration of the coalescing time thresholds via ethtool. Packet thresholds have been fixed to predetermined values as there's no point in making them run-time configurable, also anticipating the dynamic interrupt moderation (DIM) algorithm which uses fixed packet thresholds as well.

If the interface is up when the operation mode of traffic interrupt events is changed by the user (i.e. switching from default per-packet interrupts to coalesced interrupts), the traffic needs to be paused in the process.

This patch also prepares the ground for introducing DIM on Rx.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Claudiu Manoil

The interrupt coalescing registers are named ICR in the current revision of the Ref Man (RM), deprecating the ICIR name used in earlier (draft) versions of the RM.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Authored by Claudiu Manoil
A reliable traffic pause (and reconfiguration) procedure is needed to be able to safely make h/w configuration changes at run-time, like changing the mode in which the interrupts operate (i.e. with or without coalescing), as opposed to making on-the-fly register updates that may be subject to h/w or s/w concurrency issues. To this end, the code responsible for the run-time device configuration that starts and stops the traffic flow through the device has been extracted from the enetc_open/_close procedures into the separate standalone enetc_start/_stop procedures. Traffic stop should be as graceful as possible: it lets the executing napi threads finish while the interrupts stay disabled. But since the napi thread will try to re-enable interrupts by clearing the device's unmask register, the enable_irq/disable_irq API has been used to avoid this potential concurrency issue and make the traffic pause procedure more reliable.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Authored by Claudiu Manoil
It's time to differentiate between Rx and Tx ring sizes. Not only are Tx rings processed differently from Rx rings, but their default number also differs, i.e. up to 8 Tx rings per device (8 traffic classes) vs. 2 Rx rings (one per CPU). So let's set the Tx ring sizes to half the size of the Rx rings for now, to be conservative. The default ring sizes were decreased as well (to the next lower power of 2), to reduce the memory footprint, buffering etc., since the measurements I've made so far show that the rings are very unlikely to get full. This change also anticipates the introduction of the dynamic interrupt moderation (DIM) algorithm, which operates on maximum packet thresholds of 256 packets for Rx and 128 packets for Tx.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
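The sizing rule described above (Rx default lowered to the next lower power of 2, Tx set to half of Rx) can be modelled in a few lines. This is a Python sketch, not the driver's code; the function names and the previous default of 1024 are hypothetical, chosen only to illustrate the rule. The two DIM thresholds are the ones quoted in the message.

```python
DIM_MAX_RX_PKTS = 256  # maximum DIM packet threshold for Rx (from the commit message)
DIM_MAX_TX_PKTS = 128  # maximum DIM packet threshold for Tx (from the commit message)


def next_lower_pow2(n):
    """Largest power of two strictly below n (n must be at least 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return 1 << ((n - 1).bit_length() - 1)


def default_ring_sizes(old_default):
    """Apply the rule from the message: shrink Rx, then make Tx half of Rx."""
    rx = next_lower_pow2(old_default)
    tx = rx // 2
    return rx, tx


# Hypothetical previous default; the commit message does not state it.
rx_size, tx_size = default_ring_sizes(1024)

# Both rings should still be able to hold the maximum DIM packet thresholds.
assert rx_size >= DIM_MAX_RX_PKTS and tx_size >= DIM_MAX_TX_PKTS
```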
-
- 27 Jun, 2020 1 commit
-
-
Authored by Claudiu Manoil
The rings bitmap of an interrupt vector encodes which of the device's rings were assigned to that interrupt vector. Hence the iteration range of the Tx rings bitmap (for_each_set_bit()) should be the total number of Tx rings of that netdevice instead of the number of rings assigned to the interrupt vector. Since there are 2 cores, and one interrupt vector for each core, the number of rings assigned to an interrupt vector is half the number of available rings. The impact of this error is that the upper half of the Tx rings could still generate interrupts during napi polling.
Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
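The effect of the wrong iteration range is easy to reproduce in a small model. The Python sketch below imitates the semantics of the kernel's for_each_set_bit() (visit set bits below a given size); the ring count and the bitmap value are hypothetical, picked so that one vector owns every other ring.

```python
def for_each_set_bit(bitmap, size):
    """Yield the indices of set bits below `size`, like the kernel macro."""
    for bit in range(size):
        if bitmap & (1 << bit):
            yield bit


NUM_TX_RINGS = 8           # hypothetical: total Tx rings of the netdevice
tx_rings_map = 0b01010101  # hypothetical: this vector owns rings 0, 2, 4, 6

# Number of rings assigned to this vector (half of the available rings).
rings_in_vector = bin(tx_rings_map).count("1")

# Buggy range: stops at the count of assigned rings, so high bits are never seen.
buggy = list(for_each_set_bit(tx_rings_map, rings_in_vector))

# Fixed range: covers every possible ring index of the netdevice.
fixed = list(for_each_set_bit(tx_rings_map, NUM_TX_RINGS))

missed = sorted(set(fixed) - set(buggy))
print("visited (buggy):", buggy, "visited (fixed):", fixed, "missed:", missed)
```

In this model the rings that the buggy loop never visits are exactly the ones in the upper half of the index range, which matches the impact described in the message.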
-
- 20 Jun, 2020 1 commit
-
-
Authored by Claudiu Manoil
VLAN tag insertion/extraction offload is correctly activated at probe time, but deactivation of this feature (i.e. via ethtool) is broken. Toggling works only for Tx/Rx ring 0 of a PF, and is ignored for the other rings, including the VF rings. To fix this, the existing VLAN offload toggling code was extended to all the rings assigned to a netdevice, instead of only the default ring 0 (likely a leftover from the early validation days of this feature). The code was also moved to the common set_features() function to fix toggling for the VF driver too.
Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Jun, 2020 1 commit
-
-
Authored by Gustavo A. R. Silva
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle, and audited and fixed manually.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
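As background on what the helper computes: the kernel's struct_size() returns the size of a structure plus its trailing flexible array, and saturates at SIZE_MAX if the calculation overflows. The Python model below only illustrates that arithmetic; it assumes a 64-bit size_t, takes plain byte sizes instead of a pointer and member name, and cannot show the type-derivation aspect that the commit message refers to.

```python
SIZE_MAX = 2**64 - 1  # assumption: 64-bit size_t


def struct_size_model(header_size, elem_size, count):
    """Model of struct_size(): header plus count elements, saturating at SIZE_MAX."""
    total = header_size + elem_size * count
    return total if total <= SIZE_MAX else SIZE_MAX


def open_coded_model(header_size, elem_size, count):
    """Model of the open-coded sum in size_t arithmetic: silently wraps around."""
    return (header_size + elem_size * count) & SIZE_MAX


# Ordinary case: both forms agree.
small = struct_size_model(16, 8, 4)

# Overflow case: the helper saturates, the open-coded version wraps to a tiny size.
huge_count = 2**61
saturated = struct_size_model(16, 8, huge_count)
wrapped = open_coded_model(16, 8, huge_count)
```

A saturated size makes the subsequent allocation fail, whereas a wrapped size would produce an undersized buffer.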
-
- 02 May, 2020 2 commits
-
-
Authored by Po Liu
This patch adds tc flower offload for the ENETC IEEE 802.1Qci (PSFP) function. Four main functional blocks implement flow policing and filtering of ingress flows with the IEEE 802.1Qci features: stream identification (strictly, this is defined in P802.1CB, but it is needed for 802.1Qci), stream filtering, stream gating and flow metering. Each block holds many entries, addressed by index, to which parameters are assigned. A frame is first matched by stream identification, then passes to the stream filter block through the handle shared between stream identification and stream filtering. It then passes to the stream gate assigned by the stream filter entry, where it is policed by the gate and limited by the max SDU set in the filter block (optional). Finally, it is policed by the flow metering block, whose index is chosen in the filtering block. Each entry of a block may therefore be linked from many upstream entries: assigning the same index lets several streams share the same stream filter, stream gate or flow meter.

To implement these features, each stream, filtered by source/destination MAC address and for some streams also by VLAN ID, is treated as one flow chain. It is identified by the chain_index that already exists in the tc filter concept. The driver maintains this chain together with the gate modules. A stream filter entry is created from the gate index, the (optional) flow meter entry id and a priority value. Offloading only transfers the gate action and the flow filtering parameters. The driver creates one stream filter entry (or finds an existing one with the same gate id, flow meter id and priority) and sets it in the hardware, so stream filtering does not need to be transferred by the action offload. This architecture mirrors the relationship between tc filters and actions: tc filter maintains the list for each flow feature by keys, and actions are maintained in the action list.

Below is an example using tc commands:

    > tc qdisc add dev eth0 ingress
    > ip link set eth0 address 10:00:80:00:00:00
    > tc filter add dev eth0 parent ffff: protocol ip chain 11 \
          flower skip_sw dst_mac 10:00:80:00:00:00 \
          action gate index 10 \
          sched-entry open 200000000 1 8000000 \
          sched-entry close 100000000 -1 -1

The command sets dst_mac 10:00:80:00:00:00 at index 11 of the stream identify module, then sets gate index 10 of the stream gate module. The gate is kept open for 200ms, with the traffic volume limited to 8MB in this sched-entry, and the frames are directed to ingress queue 1.
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
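To make the two sched-entry arguments of the example easier to read, the Python sketch below splits each one into its fields and adds up the intervals. The field names are illustrative; the interpretation (gate state, interval in nanoseconds, internal priority value used as the ingress queue, octet limit, with -1 meaning "not set") follows the explanation in the commit message, and treating the cycle as the sum of the intervals is an assumption of this sketch.

```python
def parse_sched_entry(text):
    """Split 'open|close <interval_ns> <ipv> <max_octets>' into named fields."""
    state, interval_ns, ipv, max_octets = text.split()
    return {
        "gate_open": state == "open",
        "interval_ns": int(interval_ns),
        "ipv": int(ipv),                # -1 means no priority override
        "max_octets": int(max_octets),  # -1 means no octet limit
    }


entries = [
    parse_sched_entry("open 200000000 1 8000000"),
    parse_sched_entry("close 100000000 -1 -1"),
]

# Assumed here: one gate cycle is the sum of the entry intervals.
cycle_ns = sum(entry["interval_ns"] for entry in entries)
open_ms = entries[0]["interval_ns"] / 1_000_000

print(f"gate open for {open_ms:.0f} ms out of a {cycle_ns / 1_000_000:.0f} ms cycle")
```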
-
Authored by Po Liu
This patch lets ethtool enable/disable the tc flower offload features. The ENETC hardware has the PSFP feature, which performs per-stream policing. When the tc hw offloading feature is enabled, the driver enables the IEEE 802.1Qci feature. This only sets the register enable bit for the feature; it does not enable any per-stream filtering, stream gate or stream identify entry, but it does read the capabilities of each feature.
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Mar, 2020 2 commits
-
-
Authored by Claudiu Manoil
Hardware timestamping support (PTP) on Rx requires extended buffer descriptors, double the size of normal Rx descriptors. On the current controller revision only the timestamping offload requires extended Rx descriptors. Since Rx timestamping can be turned on/off at runtime, make Rx ring allocation configurable at runtime too. As a result, the static config option FSL_ENETC_HW_TIMESTAMPING can be dropped and the extended descriptors can be used only when Rx timestamping gets activated. The extension has the same size as the base descriptor, making the descriptor iterators easy to update for the extended case.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Authored by Claudiu Manoil
Improve maintainability of the code iterating the Rx buffer descriptors (BDs), to prepare it for iterating extended Rx BDs as well. Don't explicitly increment the h/w descriptor pointers by one; provide an iterator that takes care of the h/w details instead.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
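The idea of hiding the descriptor stride behind an iterator can be sketched as follows. This is a Python model rather than the driver's C code: the function name and the slot arithmetic are illustrative, and the stride of 2 for the extended case relies on the statement in this log that the extension has the same size as the base descriptor.

```python
def rxbd_next(index, bd_count, extended=False):
    """Advance a ring index with wrap-around and return (index, slot).

    `slot` is the position of the descriptor counted in base-descriptor
    units: extended descriptors occupy two units each (base + extension).
    """
    index += 1
    if index == bd_count:
        index = 0  # wrap to the start of the ring
    stride = 2 if extended else 1
    return index, index * stride


# Walk a 4-entry ring once in each mode; callers never touch the stride.
normal_walk, extended_walk = [], []
i = j = 0
for _ in range(4):
    i, slot = rxbd_next(i, 4)
    normal_walk.append(slot)
    j, slot = rxbd_next(j, 4, extended=True)
    extended_walk.append(slot)
```

Because the caller only asks for the next descriptor, switching between normal and extended rings at runtime does not change the loop that consumes them.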
-
- 03 Jan, 2020 1 commit
-
-
Authored by Po Liu
ENETC implements a time-specific departure capability, which enables the user to specify when a frame can be transmitted. When this capability is enabled, the device delays the transmission of the frame so that it is transmitted at the precisely specified time. The departure time can be set up to 0.5 seconds in the future. If, based on the current time, the departure time in the transmit BD has not yet been reached, the packet will not be transmitted.

The feature is driven through the ETF qdisc, which the user can load with tc commands. Here are the example commands:

    tc qdisc add dev eth0 root handle 1: mqprio \
        num_tc 8 map 0 1 2 3 4 5 6 7 hw 1
    tc qdisc replace dev eth0 parent 1:8 etf \
        clockid CLOCK_TAI delta 30000 offload

These examples first set the queue mapping and then configure queue 7 to dequeue 30us ahead of the departure time. A user sending test frames should then set the SO_TXTIME option on the socket.

There are also some limitations for this feature in hardware:
- Transmit checksum offloads and time specific departure operation are mutually exclusive.
- Time Aware Shaper feature (Qbv) offload and time specific departure operation are mutually exclusive.
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
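The timing rules in this message can be summarised in a small model. The Python sketch below is illustrative only: the function names are made up, the 0.5 second limit and the 30000 ns delta come from the message, and the way a past departure time is handled by the hardware is not covered here.

```python
MAX_AHEAD_NS = 500_000_000  # departure time may be up to 0.5 s in the future
ETF_DELTA_NS = 30_000       # 'delta 30000' from the example: dequeue 30 us early


def within_departure_window(now_ns, txtime_ns):
    """True if the requested departure time is no more than 0.5 s ahead."""
    return txtime_ns - now_ns <= MAX_AHEAD_NS


def etf_dequeue_time(txtime_ns, delta_ns=ETF_DELTA_NS):
    """Time at which the qdisc hands the frame to the driver."""
    return txtime_ns - delta_ns


def may_transmit(now_ns, txtime_ns):
    """The device holds the frame until the departure time is reached."""
    return now_ns >= txtime_ns


now = 1_000_000_000
txtime = now + 2_000_000  # hypothetical: send 2 ms from now
dequeue_at = etf_dequeue_time(txtime)
```

In this model the frame reaches the driver at `dequeue_at`, 30 us before `txtime`, and the device then holds it until `may_transmit()` becomes true.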
-
- 11 Dec, 2019 1 commit
-
-
Authored by Michael Walle
Provide a software TX timestamp and add it to the ethtool query interface. skb_tx_timestamp() is also needed if one would like to use PHY timestamping.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Dec, 2019 1 commit
-
-
Authored by Yangbo Lu
EEE support has not been enabled on ENETC, but ENETC may be connected to a PHY which supports EEE and advertises EEE by default, while its link partner also advertises EEE. If this happens, the PHY enters low power mode when the traffic rate is low, which causes packet loss. This patch disables EEE advertisement by default for any PHY that ENETC connects to, to prevent this unwanted outcome.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Nov, 2019 1 commit
-
-
Authored by Po Liu
The ENETC hardware supports the Credit Based Shaper (CBS), which is part of IEEE 802.1Qav. The CBS offload is driven through the sch_cbs interface when it is set in the kernel QoS configuration. Here is an example command to set 20Mbits bandwidth on a 1Gbits port for traffic class 7:

    tc qdisc add dev eth0 root handle 1: mqprio \
        num_tc 8 map 0 1 2 3 4 5 6 7 hw 1
    tc qdisc replace dev eth0 parent 1:8 cbs \
        locredit -1470 hicredit 30 \
        sendslope -980000 idleslope 20000 offload 1

Signed-off-by: Po Liu <Po.Liu@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
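The four CBS numbers in the example command can be derived from the requested bandwidth. The Python sketch below uses the usual relations from the tc-cbs documentation (sendslope = idleslope - port rate, credits scaled by frame size); the 1500-byte frame size, used here for both the maximum frame and the maximum interference size, is an assumption that happens to reproduce the values in the command.

```python
def cbs_params(idleslope_kbps, port_rate_kbps, max_frame_bytes):
    """Derive sendslope (kbit/s) and hicredit/locredit (bytes) for tc-cbs."""
    sendslope = idleslope_kbps - port_rate_kbps
    # Assumes max interference size equals max frame size (an assumption).
    hicredit = max_frame_bytes * idleslope_kbps // port_rate_kbps
    locredit = max_frame_bytes * sendslope // port_rate_kbps
    return {
        "idleslope": idleslope_kbps,
        "sendslope": sendslope,
        "hicredit": hicredit,
        "locredit": locredit,
    }


# 20 Mbit/s reserved on a 1 Gbit/s port, 1500-byte frames (assumed).
params = cbs_params(idleslope_kbps=20_000, port_rate_kbps=1_000_000,
                    max_frame_bytes=1500)
print(params)
```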
-
- 22 Nov, 2019 1 commit
-
-
Authored by Mao Wenan
When compiling with ARCH=mips CROSS_COMPILE=mips-linux-gnu- and running

    make C=2 drivers/net/ethernet/freescale/enetc/enetc.o

one warning can be found:

    drivers/net/ethernet/freescale/enetc/enetc.c:1439:5: warning: symbol 'enetc_setup_tc_mqprio' was not declared. Should it be static?

This patch makes the symbol enetc_setup_tc_mqprio static.
Fixes: 34c6adf1 ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Nov, 2019 1 commit
-
-
Authored by Po Liu
ENETC has a PSPEED register field that indicates the link speed of the hardware, and it needs to be kept up to date with the port speed for Qbv scheduling purposes. Otherwise, if PSPEED and the PHY speed do not match, a frame occupying the MAC may keep a gate slot from being freed. So update PSPEED when the link is adjusted; this is implemented in adjust_link.
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-