1. 08 Jan, 2021 · 29 commits
    • Merge branch 'offload-software-learnt-bridge-addresses-to-dsa' · c214cc3a
      Jakub Kicinski committed
      Vladimir Oltean says:
      
      ====================
      Offload software learnt bridge addresses to DSA
      
      This series tries to make DSA behave a bit more sanely when bridged with
      "foreign" (non-DSA) interfaces and source address learning is not
      supported on the hardware CPU port (which would make things work more
      seamlessly without software intervention). When a station A connected to
      a DSA switch port needs to talk to another station B connected to a
      non-DSA port through the Linux bridge, DSA must explicitly add a route
      for station B towards its CPU port.
      
      Initial RFC was posted here:
      https://patchwork.ozlabs.org/project/netdev/cover/20201108131953.2462644-1-olteanv@gmail.com/
      
      v2 was posted here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20201213024018.772586-1-vladimir.oltean@nxp.com/
      
      v3 was posted here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20201213140710.1198050-1-vladimir.oltean@nxp.com/
      
      This is a resend of the previous v3 with some added Reviewed-by tags.
      ====================
      
      Link: https://lore.kernel.org/r/20210106095136.224739-1-olteanv@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: ocelot: request DSA to fix up lack of address learning on CPU port · c54913c1
      Vladimir Oltean committed
      Given the following setup:
      
      ip link add br0 type bridge
      ip link set eno0 master br0
      ip link set swp0 master br0
      ip link set swp1 master br0
      ip link set swp2 master br0
      ip link set swp3 master br0
      
      Currently, packets received on a DSA slave interface (such as swp0)
      which should be routed by the software bridge towards a non-switch port
      (such as eno0) are also flooded towards the other switch ports (swp1,
      swp2, swp3) because the destination is unknown to the hardware switch.
      
      This patch addresses the issue by monitoring the addresses learnt by the
      software bridge on eno0, and adding/deleting them as static FDB entries
      on the CPU port accordingly.
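A rough user-space model of the bookkeeping described above might look like the following. It is a sketch under stated assumptions, not the ocelot or DSA code: `struct cpu_fdb`, `cpu_fdb_find()`, `cpu_fdb_add()` and `cpu_fdb_del()` are hypothetical names, and the fixed-size table merely stands in for the static FDB entries on the switch's CPU port.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: a tiny table of static FDB entries on the CPU port. */
#define FDB_MAX 8

struct cpu_fdb {
	unsigned char mac[FDB_MAX][6];
	unsigned short vid[FDB_MAX];
	bool used[FDB_MAX];
};

/* Return the slot holding {mac, vid}, or -1 if there is none. */
static int cpu_fdb_find(const struct cpu_fdb *fdb, const unsigned char *mac,
			unsigned short vid)
{
	for (int i = 0; i < FDB_MAX; i++)
		if (fdb->used[i] && fdb->vid[i] == vid &&
		    memcmp(fdb->mac[i], mac, 6) == 0)
			return i;
	return -1;
}

/* Called when the software bridge learns @mac on a foreign port (eno0). */
static int cpu_fdb_add(struct cpu_fdb *fdb, const unsigned char *mac,
		       unsigned short vid)
{
	if (cpu_fdb_find(fdb, mac, vid) >= 0)
		return 0;		/* already installed */
	for (int i = 0; i < FDB_MAX; i++) {
		if (!fdb->used[i]) {
			memcpy(fdb->mac[i], mac, 6);
			fdb->vid[i] = vid;
			fdb->used[i] = true;
			return 0;
		}
	}
	return -1;			/* table full */
}

/* Called when the software bridge forgets @mac on the foreign port. */
static int cpu_fdb_del(struct cpu_fdb *fdb, const unsigned char *mac,
		       unsigned short vid)
{
	int i = cpu_fdb_find(fdb, mac, vid);

	if (i < 0)
		return -1;		/* nothing to delete */
	fdb->used[i] = false;
	return 0;
}
```

In this model, a learn event on eno0 maps to `cpu_fdb_add()` and the matching forget event to `cpu_fdb_del()`, so frames from swp0..swp3 towards that address would be steered to the CPU port instead of flooded.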
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors · d5f19486
      Vladimir Oltean committed
      Some DSA switches (and not only DSA ones) cannot learn source MAC addresses from
      packets injected from the CPU. They only perform hardware address
      learning from inbound traffic.
      
      This can be problematic when we have a bridge spanning some DSA switch
      ports and some non-DSA ports (which we'll call "foreign interfaces" from
      DSA's perspective).
      
      There are 2 classes of problems created by the lack of learning on
      CPU-injected traffic:
      - excessive flooding, due to the fact that DSA treats those addresses as
        unknown
      - the risk of stale routes, which can lead to temporary packet loss
      
      To illustrate the second class, consider the following situation, which
      is common in production equipment (wireless access points, where there
      is a WLAN interface and an Ethernet switch, and these form a single
      bridging domain).
      
       AP 1:
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
             |                                                       ^        ^
             |                                                       |        |
             |                                                       |        |
             |                                                    Client A  Client B
             |
             |
             |
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       AP 2
      
      - br0 of AP 1 will know that Clients A and B are reachable via wlan0
      - the hardware fdb of a DSA switch driver today is not kept in sync with
        the software entries on other bridge ports, so it will not know that
        clients A and B are reachable via the CPU port UNLESS the hardware
        switch itself performs SA learning from traffic injected from the CPU.
        However, a substantial number of switches don't.
      - the hardware fdb of the DSA switch on AP 2 may autonomously learn that
        Clients A and B are reachable through swp0. The software br0 of AP 2
        may or may not learn this as well. In the example we're
        illustrating, some Ethernet traffic has been going on, and br0 from AP
        2 has indeed learnt that it can reach Client B through swp0.
      
      One of the wireless clients, say Client B, disconnects from AP 1 and
      roams to AP 2. The topology now looks like this:
      
       AP 1:
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
             |                                                            ^
             |                                                            |
             |                                                         Client A
             |
             |
             |                                                         Client B
             |                                                            |
             |                                                            v
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       AP 2
      
      - br0 of AP 1 still knows that Client A is reachable via wlan0 (no change)
      - br0 of AP 1 will (possibly) know that Client B has left wlan0. There
        are cases where it might never find out though. Either way, DSA today
        does not process that notification in any way.
      - the hardware FDB of the DSA switch on AP 1 may learn autonomously that
        Client B can be reached via swp0, if it receives any packet with
        Client B's source MAC address over Ethernet.
      - the hardware FDB of the DSA switch on AP 2 still thinks that Client B
        can be reached via swp0. It does not know that it has roamed to wlan0,
        because it doesn't perform SA learning from the CPU port.
      
      Now Client A contacts Client B.
      AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet
      segment.
      AP 2 sees a frame on swp0 and its fdb says that the destination is swp0.
      Hairpinning is disabled => drop.
      
      This problem comes from the fact that these switches have a 'blind spot'
      for addresses coming from software bridging. The generic solution is not
      to assume that hardware learning can be enabled somehow, but to listen
      to more bridge learning events. It turns out that the bridge driver does
      learn in software from all inbound frames, in __br_handle_local_finish.
      A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the
      addresses serviced by the bridge on 'foreign' interfaces. The software
      bridge also does the right thing on migration, by notifying that the old
      entry is deleted, so that does not need to be special-cased in DSA. When
      it is deleted, we just need to delete our static FDB entry towards the
      CPU too, and wait.
      
      The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE
      events received on its own interfaces, such as static FDB entries.
      
      Luckily we can change that, and DSA can listen to all switchdev FDB
      add/del events in the system and figure out if those events were emitted
      by a bridge that spans at least one of DSA's own ports. If that is
      true, DSA will also offload that address towards its own CPU port, in
      case there are bridge clients attached to the DSA switch that want to
      talk to the station connected to the foreign interface.
      
      In terms of implementation, we need to keep the fdb_info->added_by_user
      check for the case where the switchdev event was targeted directly at a
      DSA switch port. But we don't need to look at that flag for snooped
      events. So the check currently happens too late and needs to move earlier.
      This also simplifies the code a bit, since we avoid uselessly allocating
      and freeing switchdev_work.
      
      Some improvements could probably be made in the future. For example,
      multi-bridge support is rudimentary at the moment. If there are two
      bridges spanning a DSA switch's ports, and both of them need to service
      the same MAC address, then the migration of one of those stations will
      trigger the deletion of the FDB entry from the CPU port while it is
      still used by the other bridge. That could be improved with reference
      counting but is left for another time.
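The reference counting mentioned above could, for instance, follow the pattern below. This is a hypothetical sketch and not part of this series: `struct rc_entry`, `rc_add()` and `rc_del()` are invented names, and `in_hw` only records whether the entry would currently be programmed in the switch. The first user programs the entry; only the last user removes it.

```c
#include <stdbool.h>

/* Hypothetical refcounted CPU-port FDB entry (not in this series). */
struct rc_entry {
	unsigned char mac[6];
	unsigned short vid;
	unsigned int refcount;	/* number of bridges needing this entry */
	bool in_hw;		/* whether the entry is programmed in hardware */
};

/* Returns true if the hardware had to be programmed. */
static bool rc_add(struct rc_entry *e)
{
	if (e->refcount++ == 0) {
		e->in_hw = true;	/* first user: program the switch */
		return true;
	}
	return false;			/* already programmed for another bridge */
}

/* Returns true if the hardware entry was removed. */
static bool rc_del(struct rc_entry *e)
{
	if (e->refcount == 0)
		return false;		/* unbalanced delete: ignore */
	if (--e->refcount == 0) {
		e->in_hw = false;	/* last user gone: remove from the switch */
		return true;
	}
	return false;			/* another bridge still needs it */
}
```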
      
      This behavior needs to be enabled at driver level by setting
      ds->assisted_learning_on_cpu_port = true. This is because we don't want
      to inflict a potential performance penalty (accesses through
      MDIO/I2C/SPI are expensive) on hardware that really doesn't need it
      because address learning on the CPU port works there.
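As a simplified model of the resulting decision in the notifier, an FDB event could be classified as below. All names are hypothetical; the `assisted_learning_on_cpu_port` parameter merely mirrors the `ds->assisted_learning_on_cpu_port` flag. The added_by_user check applies only to events aimed at a DSA port, while snooped events on foreign interfaces are gated on the bridge spanning a DSA port and on the driver opting in.

```c
#include <stdbool.h>

/* Hypothetical outcome of the notifier decision (simplified model). */
enum fdb_target { FDB_IGNORE, FDB_TO_USER_PORT, FDB_TO_CPU_PORT };

struct fdb_event {
	bool dev_is_dsa_port;		/* dsa_slave_dev_check(dev) would be true */
	bool added_by_user;		/* fdb_info->added_by_user */
	bool bridge_has_dsa_port;	/* dev's bridge also spans a DSA port */
};

static enum fdb_target classify_fdb_event(const struct fdb_event *ev,
					  bool assisted_learning_on_cpu_port)
{
	if (ev->dev_is_dsa_port)
		/* Only user-requested entries are offloaded to a DSA port. */
		return ev->added_by_user ? FDB_TO_USER_PORT : FDB_IGNORE;

	/* Foreign interface: added_by_user is not looked at. */
	if (!ev->bridge_has_dsa_port || !assisted_learning_on_cpu_port)
		return FDB_IGNORE;
	return FDB_TO_CPU_PORT;
}
```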
      Reported-by: DENG Qingfang <dqfext@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: exit early in dsa_slave_switchdev_event if we can't program the FDB · 5fb4a451
      Vladimir Oltean committed
      Right now, the following would happen for a switch driver that does not
      implement .port_fdb_add or .port_fdb_del:
      
      dsa_slave_switchdev_event returns NOTIFY_OK and schedules:
      -> dsa_slave_switchdev_event_work
         -> dsa_port_fdb_add
            -> dsa_port_notify(DSA_NOTIFIER_FDB_ADD)
               -> dsa_switch_fdb_add
                  -> if (!ds->ops->port_fdb_add) return -EOPNOTSUPP;
         -> an error is printed with dev_dbg, and
            dsa_fdb_offload_notify(switchdev_work) is not called.
      
      We can avoid scheduling the worker for nothing and return NOTIFY_DONE.
      Because we don't call dsa_fdb_offload_notify, the static FDB entry will
      remain just in the software bridge.
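A minimal sketch of the early exit, assuming a trimmed-down, hypothetical `struct model_switch_ops` in place of the real `struct dsa_switch_ops`. The return values copy NOTIFY_DONE (0x0000) and NOTIFY_OK (0x0001) from <linux/notifier.h>; `work_scheduled` only counts the work items that would have been queued.

```c
#include <stddef.h>

/* Local copies of the notifier return codes from <linux/notifier.h>. */
#define MODEL_NOTIFY_DONE 0x0000
#define MODEL_NOTIFY_OK   0x0001

/* Hypothetical, trimmed-down stand-in for struct dsa_switch_ops. */
struct model_switch_ops {
	int (*port_fdb_add)(int port, const unsigned char *addr,
			    unsigned short vid);
	int (*port_fdb_del)(int port, const unsigned char *addr,
			    unsigned short vid);
};

static int work_scheduled;	/* counts deferred work items queued */

static int model_fdb_event(const struct model_switch_ops *ops)
{
	/* Bail out before allocating/scheduling work nobody can service. */
	if (!ops->port_fdb_add || !ops->port_fdb_del)
		return MODEL_NOTIFY_DONE;

	work_scheduled++;	/* stands in for queueing the switchdev work */
	return MODEL_NOTIFY_OK;
}

/* Dummy driver callback used to populate the ops table. */
static int dummy_fdb(int port, const unsigned char *addr, unsigned short vid)
{
	(void)port;
	(void)addr;
	(void)vid;
	return 0;
}
```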
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: move switchdev event implementation under the same switch/case statement · 447d290a
      Vladimir Oltean committed
      We'll need to start listening to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
      events even for interfaces where dsa_slave_dev_check returns false, so
      we need that check inside the switch-case statement for SWITCHDEV_FDB_*.
      
      This movement also avoids a useless allocation / free of switchdev_work
      on the untreated "default event" case.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work · c4bb76a9
      Vladimir Oltean committed
      Currently DSA doesn't add FDB entries on the CPU port, because it only
      does so through switchdev, which is associated with a net_device, and
      there are none of those for the CPU port.
      
      But actually FDB addresses on the CPU port have some use cases of their
      own, if the switchdev operations are initiated from within the DSA
      layer. There is just one problem with the existing code: it passes a
      structure in dsa_switchdev_event_work which was retrieved directly from
      switchdev, so it contains a net_device. We need to generalize the
      contents to something that covers the CPU port as well: the "ds, port"
      tuple is fine for that.
      
      Note that the new procedure for notifying the successful FDB offload is
      inspired by the rocker model.
      
      Also, nothing was being done if added_by_user was false. Let's check for
      that much earlier, and not bother scheduling the worker for nothing.
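To illustrate the generalization, here is a hypothetical work item that carries a `(ds, port)` pair plus a private copy of the address, instead of a `switchdev_notifier_fdb_info` that points at a net_device. The types and the `model_fdb_work_init()` helper are invented for this sketch.

```c
#include <string.h>

#define MODEL_ETH_ALEN 6

/* Hypothetical stand-in for the switch structure. */
struct model_switch {
	int index;
};

/* Work item without a net_device: the (ds, port) pair can address user
 * ports and the CPU port alike. */
struct model_fdb_work {
	struct model_switch *ds;
	int port;
	unsigned char addr[MODEL_ETH_ALEN];
	unsigned short vid;
	int event;		/* e.g. add or delete */
};

static void model_fdb_work_init(struct model_fdb_work *w,
				struct model_switch *ds, int port,
				const unsigned char *addr, unsigned short vid,
				int event)
{
	w->ds = ds;
	w->port = port;
	/* Keep a private copy so the deferred work does not depend on the
	 * caller's buffer, assumed valid only during the notifier call. */
	memcpy(w->addr, addr, MODEL_ETH_ALEN);
	w->vid = vid;
	w->event = event;
}
```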
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: be louder when a non-legacy FDB operation fails · 2fd18650
      Vladimir Oltean committed
      The dev_close() call was added in commit c9eb3e0f ("net: dsa: Add
      support for learning FDB through notification") "to indicate inconsistent
      situation" when we could not delete an FDB entry from the port.
      
      bridge fdb del d8:58:d7:00:ca:6d dev swp0 self master
      
      If the above command fails, it is both unhelpful and a bit drastic to
      print the error only at netdev_dbg log level while, on the other hand,
      bringing the interface down.
      
      So increase the verbosity of the error message, and drop dev_close().
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bridge: notify switchdev of disappearance of old FDB entry upon migration · 90dc8fd3
      Vladimir Oltean committed
      Currently the bridge emits atomic switchdev notifications for
      dynamically learnt FDB entries. Monitoring these notifications works
      wonders for switchdev drivers that want to keep their hardware FDB in
      sync with the bridge's FDB.
      
      For example, Station A wants to talk to Station B in the diagram below,
      and we are concerned with the behavior of the bridge on the DUT device:
      
                         DUT
       +-------------------------------------+
       |                 br0                 |
       | +------+ +------+ +------+ +------+ |
       | |      | |      | |      | |      | |
       | | swp0 | | swp1 | | swp2 | | eth0 | |
       +-------------------------------------+
            |        |                  |
        Station A    |                  |
                     |                  |
               +--+------+--+    +--+------+--+
               |  |      |  |    |  |      |  |
               |  | swp0 |  |    |  | swp0 |  |
       Another |  +------+  |    |  +------+  | Another
        switch |     br0    |    |     br0    | switch
               |  +------+  |    |  +------+  |
               |  |      |  |    |  |      |  |
               |  | swp1 |  |    |  | swp1 |  |
               +--+------+--+    +--+------+--+
                                        |
                                    Station B
      
      Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has
      the following property: frames injected from its control interface bypass
      the internal address analyzer logic, and therefore, this hardware does
      not learn from the source address of packets transmitted by the network
      stack through it. So, since bridging between eth0 (where Station B is
      attached) and swp0 (where Station A is attached) is done in software,
      the switchdev hardware will never learn the source address of Station B.
      So the traffic towards that destination will be treated as unknown, i.e.
      flooded.
      
      This is where the bridge notifications come in handy. When br0 on the
      DUT sees frames with Station B's MAC address on eth0, the switchdev
      driver gets these notifications and can install a rule to send frames
      towards Station B's address that are incoming from swp0, swp1, swp2,
      only towards the control interface. This is all switchdev driver private
      business, which the notification makes possible.
      
      All is fine until someone unplugs Station B's cable and moves it to the
      other switch:
      
                         DUT
       +-------------------------------------+
       |                 br0                 |
       | +------+ +------+ +------+ +------+ |
       | |      | |      | |      | |      | |
       | | swp0 | | swp1 | | swp2 | | eth0 | |
       +-------------------------------------+
            |        |                  |
        Station A    |                  |
                     |                  |
               +--+------+--+    +--+------+--+
               |  |      |  |    |  |      |  |
               |  | swp0 |  |    |  | swp0 |  |
       Another |  +------+  |    |  +------+  | Another
        switch |     br0    |    |     br0    | switch
               |  +------+  |    |  +------+  |
               |  |      |  |    |  |      |  |
               |  | swp1 |  |    |  | swp1 |  |
               +--+------+--+    +--+------+--+
                     |
                 Station B
      
      Luckily for the use cases we care about, Station B is noisy enough that
      the DUT hears it (on swp1 this time). swp1 receives the frames and
      delivers them to the bridge, which enters the unlikely path in br_fdb_update
      of updating an existing entry. It moves the entry in the software bridge
      to swp1 and emits an addition notification towards that.
      
      As far as the switchdev driver is concerned, all that it needs to ensure
      is that traffic between Station A and Station B is not forever broken.
      If it does nothing, then the stale rule to send frames for Station B
      towards the control interface remains in place. However, Station B is
      no longer reachable via the control interface, but via a port that can
      offload the bridge port learning attribute. It's just that the port is
      prevented from learning this address, since the rule overrides FDB
      updates. So the rule needs to go. The question is via what mechanism.
      
      It sure would be possible for this switchdev driver to keep track of all
      addresses which are sent to the control interface, and then also listen
      for bridge notifier events on its own ports, searching for the ones that
      have a MAC address which was previously sent to the control interface.
      But this is cumbersome and inefficient. Instead, with one small change,
      the bridge could notify of the address deletion from the old port, in a
      symmetrical manner with how it did for the insertion. Then the switchdev
      driver would not be required to monitor learn/forget events for its own
      ports. It could just delete the rule towards the control interface upon
      bridge entry migration. This would make hardware address learning be
      possible again. Then it would take a few more packets until the hardware
      and software FDB would be in sync again.
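The symmetric notification can be modelled in a few lines. This is a simplified sketch, not `br_fdb_update()` itself; `model_fdb_update()`, the integer port numbers and the event log are hypothetical. When a known address is heard on a different port, a deletion for the old port is emitted before the addition for the new one, which is what lets a switchdev driver drop its rule towards the control interface.

```c
/* Hypothetical event types sent "to switchdev" in this model. */
enum model_ev { EV_ADD, EV_DEL };

struct model_notif {
	enum model_ev ev;
	int port;
};

struct model_fdb_entry {
	unsigned char mac[6];
	int port;		/* -1 means not yet learnt */
};

/* Notifications are appended here for inspection. */
static struct model_notif notifs[16];
static int n_notifs;

static void notify(enum model_ev ev, int port)
{
	if (n_notifs < 16)
		notifs[n_notifs++] = (struct model_notif){ ev, port };
}

/* Simplified model of learning the entry's address on @port. */
static void model_fdb_update(struct model_fdb_entry *e, int port)
{
	if (e->port == port)
		return;			/* refresh only: no notification */
	if (e->port >= 0)
		notify(EV_DEL, e->port); /* migration: announce removal first */
	e->port = port;
	notify(EV_ADD, port);
}
```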
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'r8169-improve-rtl8168g-phy-suspend-quirk' · dd15c4a0
      Jakub Kicinski committed
      Heiner Kallweit says:
      
      ====================
      r8169: improve RTL8168g PHY suspend quirk
      
      According to Realtek the ERI register 0x1a8 quirk is needed to work
      around a hw issue with the PHY on RTL8168g. The register needs to be
      changed before powering down the PHY. Currently we don't meet this
      requirement, however I'm not aware of any problems caused by this.
      Therefore I see the change as an improvement.
      
      The PHY driver has no means to access the chip ERI registers,
      therefore we have to intercept MDIO writes to the BMCR register.
      If the BMCR_PDOWN bit is going to be set, then let's apply the
      quirk before actually powering down the PHY.
      ====================
      
      Link: https://lore.kernel.org/r/9303c2cf-c521-beea-c09f-63b5dfa91b9c@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • r8169: improve RTL8168g PHY suspend quirk · acb58657
      Heiner Kallweit committed
      According to Realtek the ERI register 0x1a8 quirk is needed to work
      around a hw issue with the PHY on RTL8168g. The register needs to be
      changed before powering down the PHY. Currently we don't meet this
      requirement, however I'm not aware of any problems caused by this.
      Therefore I see the change as an improvement.
      
      The PHY driver has no means to access the chip ERI registers,
      therefore we have to intercept MDIO writes to the BMCR register.
      If the BMCR_PDOWN bit is going to be set, then let's apply the
      quirk before actually powering down the PHY.
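The interception can be sketched as follows. MII_BMCR (0x00) and BMCR_PDOWN (0x0800) match the values in <linux/mii.h>; everything else (`model_mdio_write()`, `model_apply_eri_quirk()`, the register array and the counter) is hypothetical and only stands in for the driver's MDIO write path and the ERI 0x1a8 adjustment.

```c
#include <stdint.h>

/* Values as defined in <linux/mii.h>. */
#define MODEL_MII_BMCR   0x00
#define MODEL_BMCR_PDOWN 0x0800

static uint16_t phy_regs[32];	/* fake PHY register file */
static int quirk_count;		/* how many times the quirk ran */

/* Hypothetical stand-in for the ERI register 0x1a8 tweak. */
static void model_apply_eri_quirk(void)
{
	quirk_count++;
}

static void model_mdio_write(int reg, uint16_t val)
{
	/* The PHY driver cannot reach ERI registers, so the MAC driver
	 * applies the quirk before a write that sets BMCR_PDOWN. */
	if (reg == MODEL_MII_BMCR && (val & MODEL_BMCR_PDOWN))
		model_apply_eri_quirk();
	phy_regs[reg & 0x1f] = val;
}
```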
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • r8169: move ERI access functions to avoid forward declaration · c6cff9df
      Heiner Kallweit committed
      No functional change here. We just move a code block to avoid a
      function forward declaration in a subsequent change.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: replace mutex_is_locked with lockdep_assert_held in phylib · e6e918d4
      Heiner Kallweit committed
      Switch to lockdep_assert_held(_once), similar to what is being done
      in other subsystems. One advantage is that there's zero runtime
      overhead if lockdep support isn't enabled.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/ccc40b9d-8ee0-43a1-5009-2cc95ca79c85@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: bcm7xxx: Add an entry for BCM72116 · 8b86850b
      Florian Fainelli committed
      BCM72116 features a 28nm integrated EPHY; add an entry to match this PHY
      OUI.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20210106170944.1253046-1-f.fainelli@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'udp_tunnel_nic-post-conversion-cleanup' · 0b86235d
      Jakub Kicinski committed
      udp_tunnel_nic: post conversion cleanup
      
      It has been two releases since we added the common infra for UDP
      tunnel port offload, and we have not heard of any major issues.
      Remove the old direct driver NDOs completely, and perform minor
      simplifications in the tunnel drivers.
      
      Link: https://lore.kernel.org/r/20210106210637.1839662-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • udp_tunnel: reshuffle NETIF_F_RX_UDP_TUNNEL_PORT checks · b9ef3fec
      Jakub Kicinski committed
      Move the NETIF_F_RX_UDP_TUNNEL_PORT feature check into
      udp_tunnel_nic_*_port() helpers, since they're always
      done right before the call.
      
      Add similar checks before calling the notifier.
      udp_tunnel_nic invokes the notifier without checking
      features, which could result in some wasted cycles.
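A sketch of moving the feature check into the helper, so that callers and the notifier path need not repeat it. `MODEL_F_RX_UDP_TUNNEL_PORT`, `struct model_dev` and `model_udp_tunnel_add_port()` are hypothetical stand-ins for NETIF_F_RX_UDP_TUNNEL_PORT, the net_device and the udp_tunnel_nic_*_port() helpers; the bit value is arbitrary.

```c
#include <stdint.h>

/* Hypothetical feature bit standing in for NETIF_F_RX_UDP_TUNNEL_PORT. */
#define MODEL_F_RX_UDP_TUNNEL_PORT (1ULL << 0)

struct model_dev {
	uint64_t features;
	int ports_added;	/* counts ports pushed to the "NIC" */
};

/* The helper owns the feature check, so callers need not repeat it. */
static void model_udp_tunnel_add_port(struct model_dev *dev)
{
	if (!(dev->features & MODEL_F_RX_UDP_TUNNEL_PORT))
		return;		/* offload disabled: skip the work entirely */
	dev->ports_added++;
}
```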
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: remove ndo_udp_tunnel_* callbacks · 30bfce10
      Jakub Kicinski committed
      All UDP tunnel port management is now routed via udp_tunnel_nic
      infra directly. Remove the old callbacks.
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • udp_tunnel: remove REGISTER/UNREGISTER handling from tunnel drivers · dedc33e7
      Jakub Kicinski committed
      udp_tunnel_nic handles the REGISTER and UNREGISTER events; now that all
      drivers use that infra, we can drop the event handling in the tunnel
      drivers.
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • udp_tunnel: hard-wire NDOs to udp_tunnel_nic_*_port() helpers · 876c4384
      Jakub Kicinski committed
      All drivers use the udp_tunnel_nic_*_port() helpers; prepare for
      NDO removal by invoking those helpers directly.
      
      The helpers are safe to call on all devices: they check whether the
      device has the UDP tunnel state initialized.
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: broadcom: Drop OF dependency from BGMAC_PLATFORM · ddb4d32e
      Florian Fainelli committed
      All of the OF code that is used has stubs and will compile and link
      just fine; keeping COMPILE_TEST is enough.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Link: https://lore.kernel.org/r/20210106191546.1358324-1-f.fainelli@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bcm63xx_enet-major-makeover-of-driver' · c61ce06f
      Jakub Kicinski committed
      Sieng Piaw Liew says:
      
      ====================
      bcm63xx_enet: major makeover of driver
      
      This patch series aims to improve the bcm63xx_enet driver by integrating
      the latest networking features, such as batched rx processing, BQL and
      build_skb.
      
      The newer enetsw SoCs are found to be able to do unaligned rx DMA by adding
      NET_IP_ALIGN padding which, combined with these patches, improved packet
      processing performance by ~50% on BCM6328.
      
      Older non-enetsw SoCs still benefit mainly from rx batching. Performance
      improvement of ~30% is observed on BCM6333.
      
      The BCM63xx SoCs are designed for routers. As such, having BQL is
      beneficial as well as trivial to add.
      
      v3:
      * Simplify xmit_more patch by not moving around the code needlessly.
      * Fix indentation in xmit_more patch.
      * Fix indentation in build_skb patch.
      * Split rx ring cleanup patch from build_skb patch and precede build_skb
        patch for better understanding, as suggested by Florian Fainelli.
      
      v2:
      * Add xmit_more support and rx loop improvement patches.
      * Moved BQL netdev_reset_queue() to bcm_enet_stop()/bcm_enetsw_stop()
        functions as suggested by Florian Fainelli.
      * Improved commit messages.
      ====================
      
      Link: https://lore.kernel.org/r/20210106144208.1935-1-liew.s.piaw@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bcm63xx_enet: improve rx loop · ae2259ee
      Sieng Piaw Liew committed
      Use existing rx processed count to track against budget, thereby making
      budget decrement operation redundant.
      
      rx_desc_count can be calculated outside the rx loop, making the loop a
      bit smaller.
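The budget handling can be modelled like this. `model_rx_poll()` and the `ready` counter are hypothetical stand-ins for the driver's NAPI poll routine and its descriptor ring. The count of processed frames is compared with the budget directly, so no separate budget decrement is needed.

```c
/* Hypothetical ring model: *ready is how many frames the hardware has
 * completed. Returns the number of frames processed in this poll. */
static int model_rx_poll(int *ready, int budget)
{
	int processed = 0;

	/* The processed count doubles as the budget check. */
	while (processed < budget && *ready > 0) {
		(*ready)--;	/* stands in for handling one descriptor */
		processed++;
	}
	return processed;
}
```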
      Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bcm63xx_enet: convert to build_skb · d27de0ef
      Sieng Piaw Liew committed
      We can increase the efficiency of the rx path by using buffers to receive
      packets and then building SKBs around them just before passing them into
      the network stack. In contrast, preallocating SKBs too early reduces CPU
      cache efficiency.
      
      Check if we're in NAPI context when refilling RX. Normally we're almost
      always running in NAPI context. Dispatch to napi_alloc_frag directly
      instead of relying on netdev_alloc_frag which does the same but
      with the overhead of local_bh_disable/enable.
      
      Tested on BCM6328 320 MHz and iperf3 -M 512 to measure packet/sec
      performance. Included netif_receive_skb_list and NET_IP_ALIGN
      optimizations.
      
      Before:
      [ ID] Interval           Transfer     Bandwidth       Retr
      [  4]   0.00-10.00  sec  49.9 MBytes  41.9 Mbits/sec  197         sender
      [  4]   0.00-10.00  sec  49.3 MBytes  41.3 Mbits/sec            receiver
      
      After:
      [ ID] Interval           Transfer     Bandwidth       Retr
      [  4]   0.00-30.00  sec   171 MBytes  47.8 Mbits/sec  272         sender
      [  4]   0.00-30.00  sec   170 MBytes  47.6 Mbits/sec            receiver
      Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bcm63xx_enet: consolidate rx SKB ring cleanup code · 3d0b7265
      Sieng Piaw Liew committed
      The rx SKB ring uses the same cleanup code at various points.
      Combine it into a single function to reduce lines of code.
      Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bcm63xx_enet: alloc rx skb with NET_IP_ALIGN · c4a20786
      Sieng Piaw Liew committed
      Use netdev_alloc_skb_ip_align on newer SoCs with an integrated switch
      (enetsw) when refilling RX. This increases packet processing performance
      by about 30% (with netif_receive_skb_list).
      
      Non-enetsw SoCs cannot function with the extra pad, so they continue to
      use the regular netdev_alloc_skb.
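As background that is not part of the original commit: netdev_alloc_skb_ip_align() reserves NET_IP_ALIGN bytes of headroom so that the IP header, which follows the 14-byte Ethernet header, starts on a 4-byte boundary. The sketch below shows only that offset arithmetic. It assumes NET_IP_ALIGN = 2 (the kernel default, which some architectures override) and a receive buffer that itself starts 4-byte aligned.

```python
ETH_HLEN = 14      # Ethernet header: 6-byte dst MAC + 6-byte src MAC + 2-byte EtherType
NET_IP_ALIGN = 2   # assumption: the kernel's default value; some architectures override it


def ip_header_offset(pad: int) -> int:
    """Byte offset of the IP header from a 4-byte-aligned buffer start."""
    return pad + ETH_HLEN


for pad in (0, NET_IP_ALIGN):
    off = ip_header_offset(pad)
    print(f"pad={pad}: IP header at offset {off}, 4-byte aligned: {off % 4 == 0}")
```

Without the pad, the IP header lands at offset 14, so 32-bit loads of its fields are misaligned. On MIPS-based SoCs like these, misaligned loads are typically expensive.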
      
      Tested on a 320 MHz BCM6328 using iperf3 -M 512 to measure packets/sec
      performance.
      
      Before:
      [ ID] Interval           Transfer     Bandwidth       Retr
      [  4]   0.00-30.00  sec   120 MBytes  33.7 Mbits/sec  277         sender
      [  4]   0.00-30.00  sec   120 MBytes  33.5 Mbits/sec            receiver
      
      After (+netif_receive_skb_list):
      [ ID] Interval           Transfer     Bandwidth       Retr
      [  4]   0.00-30.00  sec   155 MBytes  43.3 Mbits/sec  354         sender
      [  4]   0.00-30.00  sec   154 MBytes  43.1 Mbits/sec            receiver
      Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c4a20786
    • S
      bcm63xx_enet: add xmit_more support · 375281d3
      Sieng Piaw Liew committed
      Support bulking hardware TX queue by using netdev_xmit_more().
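netdev_xmit_more() returns true when the stack is about to hand the driver another packet immediately. A driver can therefore fill the descriptor and defer the costly doorbell, which is the register write telling the DMA engine to start, until the last packet of a burst. Many drivers use the condition `!netdev_xmit_more() || netif_xmit_stopped(txq)` for this. The toy model below counts doorbell writes under that rule. Its class and function names are hypothetical, and it is not the bcm63xx_enet code.

```python
class TxRing:
    """Toy TX ring (hypothetical) that defers the doorbell while more packets follow."""

    def __init__(self) -> None:
        self.pending = []   # descriptors filled but not yet announced to hardware
        self.doorbells = 0  # number of (expensive) MMIO doorbell writes

    def xmit(self, pkt: bytes, xmit_more: bool, queue_stopped: bool = False) -> None:
        self.pending.append(pkt)            # fill one descriptor
        if not xmit_more or queue_stopped:  # end of burst (or queue full): kick once
            self.doorbells += 1
            self.pending.clear()            # hardware now owns every queued descriptor


def send_burst(ring: TxRing, n: int) -> None:
    """Send n packets; the stack signals xmit_more for all but the last."""
    for i in range(n):
        ring.xmit(b"pkt%d" % i, xmit_more=(i < n - 1))


ring = TxRing()
send_burst(ring, 8)
print(ring.doorbells)  # 1 doorbell write instead of 8
```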
      Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      375281d3
    • S
      bcm63xx_enet: add BQL support · 4c59b0f5
      Sieng Piaw Liew committed
      Add Byte Queue Limits support to reduce/remove bufferbloat in
      bcm63xx_enet.
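With BQL, a driver reports bytes handed to hardware through netdev_tx_sent_queue() and bytes completed through netdev_tx_completed_queue(). The stack stops the queue while in-flight bytes exceed a limit, and restarts it once they fall back under. The sketch below is a simplified model that assumes a fixed limit. Real BQL tunes the limit dynamically (the dynamic queue limits algorithm), and the class here is hypothetical.

```python
class ByteQueue:
    """Toy Byte Queue Limits model with a fixed limit (real BQL adapts the limit)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.inflight = 0     # bytes given to hardware but not yet completed
        self.stopped = False

    def sent(self, nbytes: int) -> None:
        # Analogous to netdev_tx_sent_queue(): account bytes, stop once over the limit.
        self.inflight += nbytes
        if self.limit - self.inflight < 0:
            self.stopped = True

    def completed(self, nbytes: int) -> None:
        # Analogous to netdev_tx_completed_queue(): release bytes, restart if back under.
        self.inflight -= nbytes
        if self.stopped and self.limit - self.inflight >= 0:
            self.stopped = False


q = ByteQueue(limit=3000)
q.sent(1500)
q.sent(1500)
print(q.stopped)  # False: exactly at the limit
q.sent(1500)
print(q.stopped)  # True: 4500 bytes in flight
q.completed(3000)
print(q.stopped)  # False: 1500 bytes in flight
```

Bounding bytes rather than descriptors keeps the hardware queue short. Excess packets then wait in the qdisc layer, where queue-management disciplines such as fq_codel can act on them.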
      Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4c59b0f5
    • S
      bcm63xx_enet: batch process rx path · 9cbfea02
      Sieng Piaw Liew committed
      Use netif_receive_skb_list to batch process rx skbs.
      Tested on a 320 MHz BCM6328 using iperf3 -M 512; performance increases
      by 12.5%.
      
      Before:
      [ ID] Interval           Transfer     Bandwidth       Retr
      [  4]   0.00-30.00  sec   120 MBytes  33.7 Mbits/sec  277         sender
      [  4]   0.00-30.00  sec   120 MBytes  33.5 Mbits/sec            receiver
      
      After:
      [ ID] Interval           Transfer     Bandwidth       Retr
      [  4]   0.00-30.00  sec   136 MBytes  37.9 Mbits/sec  203         sender
      [  4]   0.00-30.00  sec   135 MBytes  37.7 Mbits/sec            receiver
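The 12.5% figure can be checked against the bitrates in the tables above. The sketch below computes the relative gain for the sender and receiver rows.

```python
def pct_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


sender = pct_gain(33.7, 37.9)    # Mbits/sec, sender rows
receiver = pct_gain(33.5, 37.7)  # Mbits/sec, receiver rows
print(f"sender +{sender:.1f}%, receiver +{receiver:.1f}%")
```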
      Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9cbfea02
    • K
      qmi_wwan: Increase headroom for QMAP SKBs · 2e423387
      Kristian Evensen committed
      When measuring the throughput (iperf3 + TCP) while routing on a
      not-so-powerful device (Mediatek MT7621, 880MHz CPU), I noticed that I
      achieved significantly lower speeds with QMI-based modems than with, for
      example, a USB LAN dongle. The CPU was saturated in all of my tests.
      
      With the dongle I got ~300 Mbit/s, while I only measured ~200 Mbit/s
      with the modems. All offloads, etc., were switched off for the dongle,
      and I configured the modems to use QMAP (16k aggregation). The tests
      with the dongle were performed in my local (gigabit) network, while the
      LTE network the modems were connected to delivers 700-800 Mbit/s.
      
      Profiling the kernel revealed the cause of the performance difference.
      In qmimux_rx_fixup(), an SKB is allocated for each packet contained in
      the URB. This SKB has too little headroom, causing the check in
      skb_cow() (called from ip_forward()) to fail. pskb_expand_head() is then
      called and the SKB is reallocated. In the output from perf, I see that a
      significant amount of time is spent in pskb_expand_head() + support
      functions.
      
      In order to ensure that the SKB has enough headroom, this commit
      increases the amount of memory allocated in qmimux_rx_fixup() by
      LL_MAX_HEADER. The reason for using LL_MAX_HEADER and not a more
      accurate value is that we do not know the type of the outgoing network
      interface. After making this change, I achieve the same throughput with
      the modems as with the dongle.
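The reallocation described above comes from a simple headroom comparison: skb_cow() expands the head when the skb's headroom is smaller than what the egress device needs. ip_forward() requests roughly LL_RESERVED_SPACE() of the output device, plus any extra dst header length. The sketch below transcribes only that comparison and the LL_RESERVED_SPACE() rounding (HH_DATA_MOD = 16), and ignores the cloned-skb case. The LL_MAX_HEADER value of 96 is an assumption, because the real value is config-dependent (32, 96 or 128). The headroom of 8 bytes used in the example is hypothetical, since the commit does not state the original value.

```python
HH_DATA_MOD = 16
LL_MAX_HEADER = 96  # assumption: the kernel value depends on config (32, 96 or 128)


def ll_reserved_space(hard_header_len: int, needed_headroom: int = 0) -> int:
    """Python transcription of the kernel's LL_RESERVED_SPACE() rounding."""
    return ((hard_header_len + needed_headroom) & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD


def needs_realloc(headroom: int, required: int) -> bool:
    """Simplified skb_cow() decision: expand the head if headroom is too small."""
    return headroom < required


eth_egress = ll_reserved_space(14)  # plain Ethernet egress device: 16 bytes
print(needs_realloc(headroom=8, required=eth_egress))              # hypothetical small headroom
print(needs_realloc(headroom=LL_MAX_HEADER, required=eth_egress))  # headroom after this commit
```

Reserving LL_MAX_HEADER covers any link-layer header the kernel was built for, which is why it works without knowing the outgoing interface in advance.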
      Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Link: https://lore.kernel.org/r/20210106122403.1321180-1-kristian.evensen@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2e423387
    • J
      Merge tag 'linux-can-next-for-5.12-20210106' of... · c10b377f
      Jakub Kicinski committed
      Merge tag 'linux-can-next-for-5.12-20210106' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can-next 2021-01-06
      
      The first 16 patches are by me and target the tcan4x5x SPI glue driver for the
      m_can CAN driver. First there are several cleanup commits, then the SPI
      regmap part is converted to 8 bits per word, to make it possible to use that
      driver on SPI controllers that only support the 8 bits per word mode (such as
      the SPI cores on the Raspberry Pi).
      
      Oliver Hartkopp contributes a patch for the CAN_RAW protocol. The getsockopt()
      for CAN_RAW_FILTER is changed to return -ERANGE if the filterset does not fit
      into the provided user space buffer.
      
      The last two patches are by Joakim Zhang and add wakeup support to the flexcan
      driver for the i.MX8QM SoC. The dt-bindings docs are extended to describe the
      added property.
      
      * tag 'linux-can-next-for-5.12-20210106' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
        can: flexcan: add CAN wakeup function for i.MX8QM
        dt-bindings: can: fsl,flexcan: add fsl,scu-index property to indicate a resource
        can: raw: return -ERANGE when filterset does not fit into user space buffer
        can: tcan4x5x: add support for half-duplex controllers
        can: tcan4x5x: rework SPI access
        can: tcan4x5x: add {wr,rd}_table
        can: tcan4x5x: add max_raw_{read,write} of 256
        can: tcan4x5x: tcan4x5x_regmap: set reg_stride to 4
        can: tcan4x5x: fix max register value
        can: tcan4x5x: tcan4x5x_regmap_init(): use spi as context pointer
        can: tcan4x5x: tcan4x5x_regmap_write(): remove not needed casts and replace 4 by sizeof
        can: tcan4x5x: rename regmap_spi_gather_write() -> tcan4x5x_regmap_gather_write()
        can: tcan4x5x: remove regmap async support
        can: tcan4x5x: tcan4x5x_bus: remove not needed read_flag_mask
        can: tcan4x5x: mark struct regmap_bus tcan4x5x_bus as constant
        can: tcan4x5x: move regmap code into seperate file
        can: tcan4x5x: rename tcan4x5x.c -> tcan4x5x-core.c
        can: tcan4x5x: beautify indention of tcan4x5x_of_match and tcan4x5x_id_table
        can: tcan4x5x: replace DEVICE_NAME by KBUILD_MODNAME
      ====================
      
      Link: https://lore.kernel.org/r/20210107094900.173046-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c10b377f
  2. 07 Jan, 2021: 1 commit
  3. 06 Jan, 2021: 10 commits