1. 29 Jul, 2021 19 commits
    • Merge branch 'skb-gro-optimize' · 8cb79af5
      Committed by David S. Miller
      Paolo Abeni says:
      
      ====================
      sk_buff: optimize GRO for the common case
      
      This is a trimmed down revision of "sk_buff: optimize layout for GRO",
      specifically dropping the changes to the sk_buff layout[1].
      
      This series tries to accomplish 2 goals:
      - optimize the GRO stage for the most common scenario, avoiding a bunch
        of conditionals and some more code
      - let owned skbs enter the GRO engine, allowing backpressure in the
        veth GRO forward path.
      
      A new sk_buff flag (!!!) is introduced and maintained for GRO's sake.
      This field uses an existing hole, so there is no change to the sk_buff
      size.
      
      [1] two main reasons:
      - moving the skb->inner_ fields requires some extra care, as some in-kernel
        users access the fields regardless of skb->encapsulation.
      - extending the secmark size clashes with the ct and nft uAPIs
      
      Addressing all of the above is possible, I think, but for sure not in a single
      series.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • veth: use skb_prepare_for_gro() · d504fff0
      Committed by Paolo Abeni
      Leveraging the previous patch we can now avoid orphaning the
      skb in the veth gro path, allowing correct backpressure.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: allow 'slow_gro' for skb carrying sock reference · 5e10da53
      Committed by Paolo Abeni
      This change leverages the infrastructure introduced by the previous
      patches to allow soft devices to pass owned skbs to the GRO engine
      without impacting the fast path.
      
      It's up to the GRO caller to ensure the slow_gro bit is valid before
      invoking the GRO engine. The new helper skb_prepare_for_gro() is
      introduced for that purpose.
      
      When slow_gro is set, skbs are aggregated only if they have the same sk.
      Additionally, skb truesize on GRO recycle and free is correctly
      updated so that sk wmem is not changed by the GRO processing.
      
      RFC -> v1:
       - fixed bad truesize on dev_gro_receive NAPI_FREE
       - use the existing state bit
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: optimize GRO for the common case. · 9efb4b5b
      Committed by Paolo Abeni
      After the previous patches, at GRO time, skb->slow_gro is
      usually 0, unless the packets come from some H/W offload
      slowpath or tunnel.
      
      We can optimize the GRO code assuming !skb->slow_gro is likely.
      This removes multiple conditionals in the most common path, at the
      price of an additional one when we hit the above "slow-paths".
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
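The idea behind the series can be sketched with a small userspace model. This is only an illustration in Python, not kernel code: the `Skb` class, its setters and `state_differs()` are hypothetical names, and the model assumes the flag is set whenever a non-empty owner, dst or ct is attached and is never cleared afterwards, as the commit messages describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skb:
    """Toy stand-in for struct sk_buff; only the fields discussed here."""
    sk: Optional[str] = None
    dst: Optional[str] = None
    nfct: Optional[str] = None
    slow_gro: bool = False  # sticky: setters may set it, nothing clears it

    def set_owner(self, sk: Optional[str]) -> None:
        self.sk = sk
        self.slow_gro |= sk is not None

    def set_dst(self, dst: Optional[str]) -> None:
        self.dst = dst
        self.slow_gro |= dst is not None

    def set_nfct(self, ct: Optional[str]) -> None:
        self.nfct = ct
        self.slow_gro |= ct is not None


def state_differs(p: Skb, skb: Skb) -> bool:
    """Decide whether state fields prevent aggregating skb with p."""
    # Common case: neither packet ever carried extra state, so a single
    # test replaces the per-field comparisons.
    if not (p.slow_gro or skb.slow_gro):
        return False
    # Slow path: one extra conditional was paid, then every field is compared.
    return (p.sk, p.dst, p.nfct) != (skb.sk, skb.dst, skb.nfct)
```

Because the flag is sticky, a packet whose ct entry was attached and later dropped still takes the slow path; in this model that costs a few comparisons but never produces a wrong answer.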
    • sk_buff: track extension status in slow_gro · b0999f38
      Committed by Paolo Abeni
      Similar to the previous one, but tracking the
      active_extensions field status.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sk_buff: track dst status in slow_gro · 8a886b14
      Committed by Paolo Abeni
      Similar to the previous patch, but covering the dst field:
      the slow_gro flag is additionally set when a dst is attached
      to the skb.
      
      RFC -> v1:
       - use the existing flag instead of adding a new one
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sk_buff: introduce 'slow_gro' flags · 5fc88f93
      Committed by Paolo Abeni
      The new flag tracks whether any state field is set, in which case
      GRO requires 'unusual'/slow prepare steps.
      
      Set this flag when a ct entry is attached to the skb,
      and never clear it.
      
      The new bit uses an existing hole in the sk_buff struct.
      
      RFC -> v1:
       - use a single state bit, never clear it
       - avoid moving the _nfct field
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: networking: add ioam6-sysctl into index · 883d71a5
      Committed by Hu Haowen
      Append ioam6-sysctl to the toctree in order to get rid of build warnings.
      Signed-off-by: Hu Haowen <src.res@email.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: be stateless when installing FDB entries · b11f0a4c
      Committed by Vladimir Oltean
      Currently there are issues when adding a bridge FDB entry as VLAN-aware
      and deleting it as VLAN-unaware, or vice versa.
      
      However this is an unneeded complication, since the bridge always
      installs its default FDB entries in VLAN 0 to match on VLAN-unaware
      ports, and in the default_pvid (VLAN 1) to match on VLAN-aware ports.
      So instead of trying to outsmart the bridge, just install all entries it
      gives us, and they will start matching packets when the vlan_filtering
      mode changes.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'switchdev-notifiers' · b0fdb999
      Committed by David S. Miller
      Vladimir Oltean says:
      
      ====================
      Plug the last 2 holes in the switchdev notifiers for local FDB entries
      
      The work for trapping local FDB entries to the CPU in switchdev/DSA
      started with the "RX filtering in DSA" series:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210629140658.2510288-1-olteanv@gmail.com/
      and was continued with further improvements such as "Fan out FDB entries
      pointing towards the bridge to all switchdev member ports":
      https://patchwork.kernel.org/project/netdevbpf/cover/20210719135140.278938-1-vladimir.oltean@nxp.com/
      https://patchwork.kernel.org/project/netdevbpf/cover/20210720173557.999534-1-vladimir.oltean@nxp.com/
      
      There are only 2 more issues left to be addressed (famous last words),
      and these are:
      - dynamically learned FDB entries towards interfaces foreign to DSA need
        to be replayed too
      - adding/deleting a VLAN on a port causes the local FDB entries in that
        VLAN to be prematurely deleted
      
      This patch series addresses both, and patch 2 depends on 1 to work properly.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: switchdev: treat local FDBs the same as entries towards the bridge · 52e4bec1
      Committed by Vladimir Oltean
      Currently the following script:
      
      1. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
      2. ip link set swp2 up && ip link set swp2 master br0
      3. ip link set swp3 up && ip link set swp3 master br0
      4. ip link set swp4 up && ip link set swp4 master br0
      5. bridge vlan del dev swp2 vid 1
      6. bridge vlan del dev swp3 vid 1
      7. ip link set swp4 nomaster
      8. ip link set swp3 nomaster
      
      produces the following output:
      
      [  641.010738] sja1105 spi0.1: port 2 failed to delete 00:1f:7b:63:02:48 vid 1 from fdb: -2
      
      [ swp2, swp3 and br0 all have the same MAC address, the one listed above ]
      
      In short, this happens because the number of FDB entry additions
      notified to switchdev is unbalanced with the number of deletions.
      
      At step 1, the bridge has a random MAC address. At step 2, the
      br_fdb_replay of swp2 receives this initial MAC address. Then the bridge
      inherits the MAC address of swp2 via br_fdb_change_mac_address(), and it
      notifies switchdev (only swp2 at this point) of the deletion of the
      random MAC address and the addition of 00:1f:7b:63:02:48 as a local FDB
      entry with fdb->dst == swp2, in VLANs 0 and the default_pvid (1).
      
      During step 7:
      
      del_nbp
      -> br_fdb_delete_by_port(br, p, vid=0, do_all=1);
         -> fdb_delete_local(br, p, f);
      
      br_fdb_delete_by_port() deletes all entries towards the ports,
      regardless of vid, because do_all is 1.
      
      fdb_delete_local() has logic to migrate local FDB entries deleted from
      one port to another port which shares the same MAC address and is in the
      same VLAN, or to the bridge device itself. This migration happens
      without notifying switchdev of the deletion on the old port and the
      addition on the new one, just fdb->dst is changed and the added_by_user
      flag is cleared.
      
      In the example above, the del_nbp(swp4) causes the
      "addr 00:1f:7b:63:02:48 vid 1" local FDB entry with fdb->dst == swp4
      that existed up until then to be migrated directly towards the bridge
      (fdb->dst == NULL). This is because it cannot be migrated to any of the
      other ports (swp2 and swp3 are not in VLAN 1).
      
      After the migration to br0 takes place, swp4 requests a deletion replay
      of all FDB entries. Since the "addr 00:1f:7b:63:02:48 vid 1" entry now
      points towards the bridge, a deletion of it is replayed. There was just
      a prior addition of this address, so the switchdev driver deletes this
      entry.
      
      Then, the del_nbp(swp3) at step 8 triggers another br_fdb_replay, and
      switchdev is notified again to delete "addr 00:1f:7b:63:02:48 vid 1".
      But it can't because it no longer has it, so it returns -ENOENT.
      
      There are other possibilities to trigger this issue, but this is by far
      the simplest to explain.
      
      To fix this, we must avoid the situation where the addition of an FDB
      entry is notified to switchdev as a local entry on a port, and the
      deletion is notified on the bridge itself.
      
      Considering that the 2 types of FDB entries are completely equivalent
      and we cannot have the same MAC address as a local entry on 2 bridge
      ports, or on a bridge port and pointing towards the bridge at the same
      time, it makes sense to hide away from switchdev completely the fact
      that a local FDB entry is associated with a given bridge port at all.
      Just say that it points towards the bridge; it should make no difference
      whatsoever to the switchdev driver and should even lead to a simpler
      overall implementation, with fewer cases to handle.
      
      This also avoids any modification at all to the core bridge driver, just
      what is reported to switchdev changes. With the local/permanent entries
      on bridge ports being already reported to user space, it is hard to
      believe that the bridge behavior can change in any backwards-incompatible
      way such as making all local FDB entries point towards the bridge.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
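The imbalance described in this commit can be reproduced with a deliberately simplified model. The sketch below is Python, not driver code: `ToyDriver` and `unbalanced_sequence()` are hypothetical, and the model assumes the driver keeps a single entry per (address, VLAN) pair and returns -ENOENT when asked to delete an entry it does not hold. Only the three notifications that matter for the error are modeled.

```python
import errno

ADDR = "00:1f:7b:63:02:48"


class ToyDriver:
    """Hypothetical switchdev driver holding one entry per (addr, vid)."""

    def __init__(self):
        self.fdb = set()

    def fdb_add(self, addr, vid):
        self.fdb.add((addr, vid))
        return 0

    def fdb_del(self, addr, vid):
        if (addr, vid) not in self.fdb:
            return -errno.ENOENT  # nothing left to delete
        self.fdb.remove((addr, vid))
        return 0


def unbalanced_sequence(driver):
    """One addition followed by two deletions, as in steps 2, 7 and 8."""
    return [
        driver.fdb_add(ADDR, 1),  # step 2: bridge inherits the MAC of swp2
        driver.fdb_del(ADDR, 1),  # step 7: deletion replayed when swp4 leaves
        driver.fdb_del(ADDR, 1),  # step 8: deletion replayed when swp3 leaves
    ]
```

The last call returns `-errno.ENOENT`, which corresponds to the `-2` in the sja1105 log line quoted above. Reporting local entries as pointing towards the bridge keeps additions and deletions keyed the same way, so the counts stay balanced.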
    • net: bridge: switchdev: replay the entire FDB for each port · b4454bc6
      Committed by Vladimir Oltean
      Currently when a switchdev port joins a bridge, we replay all FDB
      entries pointing towards that port or towards the bridge.
      
      However, this is insufficient in certain situations:
      
      (a) DSA, through its assisted_learning_on_cpu_port logic, snoops
          dynamically learned FDB entries on foreign interfaces.
          These are FDB entries that are pointing neither towards the newly
          joined switchdev port, nor towards the bridge. So these addresses
          would be missed when joining a bridge where a foreign interface has
          already learned some addresses, and they would also linger on if the
          DSA port leaves the bridge before the foreign interface forgets them.
          None of this happens if we replay the entire FDB when the port joins.
      
      (b) There is a desire to treat local FDB entries on a port (i.e. the
          port's termination MAC address) identically to FDB entries pointing
          towards the bridge itself. More details on the reason behind this in
          the next patch. The point is that this cannot be done given the
          current structure of br_fdb_replay() in this situation:
            ip link set swp0 master br0  # br0 inherits its MAC address from swp0
            ip link set swp1 master br0
          What is desirable is that when swp1 joins the bridge, br_fdb_replay()
          also notifies swp1 of br0's MAC address, but this won't in fact
          happen because the MAC address of br0 does not have fdb->dst == NULL
          (it doesn't point towards the bridge), but it has fdb->dst == swp0.
          So our current logic makes it impossible for that address to be
          replayed. But if we dump the entire FDB instead of just the entries
          with fdb->dst == swp1 and fdb->dst == NULL, then the inherited MAC
          address of br0 will be replayed too, which is what we need.
      
      A natural question arises: say there is an FDB entry to be replayed,
      like a MAC address dynamically learned on a foreign interface that
      belongs to a bridge where no switchdev port has joined yet. If 10
      switchdev ports belonging to the same driver join this bridge, one by
      one, won't every port get notified 10 times of the foreign FDB entry,
      amounting to a total of 100 notifications for this FDB entry in the
      switchdev driver?
      
      Well, yes, but this is where the "void *ctx" argument for br_fdb_replay
      is useful: every port of the switchdev driver is notified whenever any
      other port requests an FDB replay, but because the replay was initiated
      by a different port, its context is different from the initiating port's
      context, so it ignores those replays.
      
      So the foreign FDB entry will be installed only 10 times, once per port.
      This is done so that the following 4 code paths are always well balanced:
      (a) addition of foreign FDB entry is replayed when port joins bridge
      (b) deletion of foreign FDB entry is replayed when port leaves bridge
      (c) addition of foreign FDB entry is notified to all ports currently in bridge
      (d) deletion of foreign FDB entry is notified to all ports currently in bridge
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
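The arithmetic in this commit message (100 notifications, 10 installs) can be checked with a short Python model. It is a sketch only: `Port`, `handle()` and `simulate()` are hypothetical names, and the model assumes that every port of the driver sees every replayed event and acts on it only when the replay context is its own.

```python
class Port:
    """Hypothetical switchdev port that filters replays by context."""

    def __init__(self, name):
        self.name = name
        self.seen = 0          # replayed events delivered to this port
        self.installed = []    # entries this port actually programmed

    def handle(self, entry, ctx):
        self.seen += 1
        if ctx is not self:    # replay was requested by another port: ignore
            return
        self.installed.append(entry)


def simulate(n_ports, entry="foreign-mac"):
    """Let n_ports join one by one; each join replays the entry to all ports."""
    ports = [Port(f"swp{i}") for i in range(n_ports)]
    for joining in ports:
        for port in ports:
            port.handle(entry, ctx=joining)
    total_seen = sum(p.seen for p in ports)
    total_installed = sum(len(p.installed) for p in ports)
    return total_seen, total_installed
```

With 10 ports the model delivers 100 events but installs the entry 10 times, once per port, which is the balance the commit message argues for.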
    • Merge branch 'bnxt_en-ptp' · 1159da64
      Committed by David S. Miller
      Michael Chan says:
      
      ====================
      bnxt_en: PTP enhancements
      
      This series adds two PTP enhancements.  The first one is to register
      the PHC during probe time and keep it registered whether it is in
      ifup or ifdown state.  It will get unregistered and possibly
      reregistered if the firmware PTP capability changes after firmware
      reset.  The second one is to add the 1PPS (one pulse per second)
      feature to support input/output of the 1PPS signal.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Log if an invalid signal detected on TSIO pin · abf90ac2
      Committed by Pavan Chebbi
      The FW can report to the driver via an ASYNC event if it encounters an
      invalid signal on any TSIO pin. The driver will log this event
      so that the user can take corrective action.
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: Arvind Susarla <arvind.susarla@broadcom.com>
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Event handler for PPS events · 099fdeda
      Committed by Pavan Chebbi
      Once the PPS pins are configured, the FW can report
      PPS values using an ASYNC event. This patch adds the
      ASYNC event handler and the subsequent reporting of the
      events to the kernel.
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: 1PPS functions to configure TSIO pins · 9e518f25
      Committed by Pavan Chebbi
      Applications will send ioctls to set/clear PPS pin functions
      based on user input. This patch implements the driver
      callbacks that will configure the TSIO pins using firmware
      commands. After firmware reset, the TSIO pins will be
      reconfigured.
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: 1PPS support for 5750X family chips · caf3eedb
      Committed by Pavan Chebbi
      1PPS (One Pulse Per Second) is a signal generated either
      by the NIC PHC or an external timing source.
      Integrating the support to configure and use 1PPS using
      the TSIO pins along with PTP timestamps will add Grand
      Master capability to the 5750X family chipsets.
      
      This patch initializes the driver data structures and
      registers the 1PPS with the kernel, based on the TSIO pins'
      capability in the hardware. This will create a /dev/ppsX
      device which applications can use to receive PPS events.
      
      Later patches will define functions to configure and use
      the pins.
      Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Do not read the PTP PHC during chip reset · 30e96f48
      Committed by Michael Chan
      During error recovery or hot firmware upgrade, the chip may be under
      reset and the PHC register read cycles may cause completion timeouts.
      Check that the chip is not under reset condition before proceeding
      to read the PHC by checking the flag BNXT_STATE_IN_FW_RESET.  We also
      need to take the ptp_lock before we set this flag to prevent race
      conditions.
      
      We need this logic because the PHC now will stay registered after
      bnxt_close().
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
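The locking rule in this commit (check the reset flag before touching the PHC, and take ptp_lock before setting the flag) can be illustrated with a userspace analogy. The Python sketch below is not the driver's code: `ToyPtp` and its methods are hypothetical, a `threading.Lock` stands in for ptp_lock, a boolean stands in for BNXT_STATE_IN_FW_RESET, and returning `None` stands in for whatever error the driver reports.

```python
import threading


class ToyPtp:
    """Hypothetical model of a PHC that must not be read during reset."""

    def __init__(self):
        self.ptp_lock = threading.Lock()
        self.in_fw_reset = False   # stand-in for BNXT_STATE_IN_FW_RESET
        self.counter = 0           # stand-in for the hardware clock register

    def read_phc(self):
        with self.ptp_lock:
            if self.in_fw_reset:
                return None        # skip the register read while under reset
            self.counter += 1
            return self.counter

    def begin_fw_reset(self):
        # Taking the lock first means no reader can be between its flag
        # check and its register access when the flag flips.
        with self.ptp_lock:
            self.in_fw_reset = True

    def end_fw_reset(self):
        with self.ptp_lock:
            self.in_fw_reset = False
```

Without the lock around the flag update, a reader could pass the check just before the reset starts and still issue the read, which is the race the commit message refers to.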
    • bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one() · a521c8a0
      Committed by Michael Chan
      It was pointed out by Richard Cochran that registering the PHC during
      probe is better than during ifup, so move bnxt_ptp_init() back to
      bnxt_init_one().  In order to work correctly after firmware reset which
      may result in PTP config. changes, we modify bnxt_ptp_init() to return
      if the PHC has been registered earlier.  If PTP is no longer supported
      by the new firmware, we will unregister the PHC and clean up.
      
      This partially reverts:
      
      d7859afb ("bnxt_en: Move bnxt_ptp_init() to bnxt_open()")
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 28 Jul, 2021 21 commits