1. 09 Mar 2022, 1 commit
  2. 08 Mar 2022, 1 commit
  3. 07 Mar 2022, 2 commits
  4. 06 Mar 2022, 2 commits
    • net: tun: track dropped skb via kfree_skb_reason() · 4b4f052e
      Dongli Zhang authored
      The TUN device can be used as a vhost-net backend. E.g., tun_net_xmit() is the
      interface that forwards the skb from TUN to vhost-net/virtio-net.
      
      However, there are many "goto drop" paths in the TUN driver. Therefore,
      kfree_skb_reason() is invoked at each "goto drop" site to help userspace
      ftrace/eBPF track the reason for the packet loss; a sketch of the
      pattern follows the list below.
      
      The following reasons are introduced:
      
      - SKB_DROP_REASON_DEV_READY
      - SKB_DROP_REASON_NOMEM
      - SKB_DROP_REASON_HDR_TRUNC
      - SKB_DROP_REASON_TAP_FILTER
      - SKB_DROP_REASON_TAP_TXFILTER
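      
      A minimal sketch of the conversion pattern, with the check and stats
      handling simplified (the actual hunks in drivers/net/tun.c differ per
      drop site):
      
      	enum skb_drop_reason drop_reason;
      
      	if (!netif_running(dev)) {	/* illustrative check */
      		drop_reason = SKB_DROP_REASON_DEV_READY;
      		goto drop;
      	}
      	/* ... further checks, each setting its own drop_reason ... */
      	return NETDEV_TX_OK;
      
      drop:
      	kfree_skb_reason(skb, drop_reason);
      	return NET_XMIT_DROP;
      
      Each reason then shows up in the kfree_skb tracepoint, so ftrace/eBPF
      consumers can aggregate drops per cause.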
      
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Joe Jin <joe.jin@oracle.com>
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b4f052e
    • net: tap: track dropped skb via kfree_skb_reason() · 736f16de
      Dongli Zhang authored
      The TAP device can be used as a vhost-net backend. E.g., tap_handle_frame() is
      the interface that forwards the skb from TAP to vhost-net/virtio-net.
      
      However, there are many "goto drop" paths in the TAP driver. Therefore,
      kfree_skb_reason() is invoked at each "goto drop" site to help userspace
      ftrace/eBPF track the reason for the packet loss.
      
      The following reasons are introduced:
      
      - SKB_DROP_REASON_SKB_CSUM
      - SKB_DROP_REASON_SKB_GSO_SEG
      - SKB_DROP_REASON_SKB_UCOPY_FAULT
      - SKB_DROP_REASON_DEV_HDR
      - SKB_DROP_REASON_FULL_RING
      
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Joe Jin <joe.jin@oracle.com>
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      736f16de
  5. 04 Mar 2022, 7 commits
  6. 03 Mar 2022, 11 commits
    • bpf: Add __sk_buff->delivery_time_type and bpf_skb_set_skb_delivery_time() · 8d21ec0e
      Martin KaFai Lau authored
      * __sk_buff->delivery_time_type:
      This patch adds __sk_buff->delivery_time_type.  It tells if the
      delivery_time is stored in __sk_buff->tstamp or not.
      
      It will be most useful for ingress to tell if the __sk_buff->tstamp
      has the (rcv) timestamp or delivery_time.  If delivery_time_type
      is 0 (BPF_SKB_DELIVERY_TIME_NONE), it has the (rcv) timestamp.
      
      Two non-zero types are defined for the delivery_time_type,
      BPF_SKB_DELIVERY_TIME_MONO and BPF_SKB_DELIVERY_TIME_UNSPEC.  For UNSPEC,
      it can only happen in egress because only mono delivery_time can be
      forwarded to ingress now.  The clock of an UNSPEC delivery_time
      can be deduced from skb->sk->sk_clockid, which is also how
      sch_etf does it.
      
      * Provide forwarded delivery_time to tc-bpf@ingress:
      With the help of the new delivery_time_type, the tc-bpf has a way
      to tell if the __sk_buff->tstamp has the (rcv) timestamp or
      the delivery_time.  During bpf load time, the verifier will learn if
      the bpf prog has accessed the new __sk_buff->delivery_time_type.
      If it does, it means the tc-bpf@ingress is expecting the
      skb->tstamp could have the delivery_time.  The kernel will then
      read the skb->tstamp as-is during bpf insn rewrite without
      checking the skb->mono_delivery_time.  This is done by adding a
      new prog->delivery_time_access bit.  The same goes for
      writing skb->tstamp.
      
      * bpf_skb_set_delivery_time():
      The bpf_skb_set_delivery_time() helper is added to allow setting both
      delivery_time and the delivery_time_type at the same time.  If the
      tc-bpf does not need to change the delivery_time_type, it can directly
      write to the __sk_buff->tstamp as the existing tc-bpf has already been
      doing.  It will be most useful at ingress to change the
      __sk_buff->tstamp from the (rcv) timestamp to
      a mono delivery_time and then bpf_redirect_*().
      
      bpf only has a mono clock helper (bpf_ktime_get_ns), the currently
      known use case is the mono EDT for fq, and only the mono delivery_time
      can be kept across forwarding for now, so bpf_skb_set_delivery_time()
      only supports setting BPF_SKB_DELIVERY_TIME_MONO.  It can be extended
      later when use cases come up and the forwarding path also supports
      other clock bases.
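      
      A hedged sketch of the intended tc-bpf@ingress usage (skeleton only;
      DELAY_NS and EGRESS_IFINDEX are placeholder constants, not part of
      the patch):
      
      	#include <linux/bpf.h>
      	#include <bpf/bpf_helpers.h>
      
      	#define DELAY_NS	(1 * 1000 * 1000)	/* placeholder */
      	#define EGRESS_IFINDEX	2			/* placeholder */
      
      	SEC("tc")
      	int ingress_fwd(struct __sk_buff *skb)
      	{
      		/* rcv timestamp, not a delivery_time: stamp a mono EDT */
      		if (skb->delivery_time_type == BPF_SKB_DELIVERY_TIME_NONE)
      			bpf_skb_set_delivery_time(skb,
      						  bpf_ktime_get_ns() + DELAY_NS,
      						  BPF_SKB_DELIVERY_TIME_MONO);
      
      		return bpf_redirect(EGRESS_IFINDEX, 0);
      	}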
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d21ec0e
    • bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress · 7449197d
      Martin KaFai Lau authored
      The current tc-bpf@ingress reads and writes the __sk_buff->tstamp
      as a (rcv) timestamp, which currently could either be 0 (not available)
      or ktime_get_real().  This patch keeps backward compatibility with that
      (rcv) timestamp expectation at ingress.  If the skb->tstamp has
      the delivery_time, the bpf insn rewrite will read 0 for tc-bpf
      running at ingress, as it is not available.  When writing at ingress,
      it will also clear the skb->mono_delivery_time bit.
      
      /* BPF_READ: a = __sk_buff->tstamp */
      if (!skb->tc_at_ingress || !skb->mono_delivery_time)
      	a = skb->tstamp;
      else
      	a = 0;
      
      /* BPF_WRITE: __sk_buff->tstamp = a */
      if (skb->tc_at_ingress)
      	skb->mono_delivery_time = 0;
      skb->tstamp = a;
      
      [ A note on the BPF_CGROUP_INET_INGRESS which can also access
        skb->tstamp.  At that point, the skb is delivered locally
        and skb_clear_delivery_time() has already been done,
        so the skb->tstamp will only have the (rcv) timestamp. ]
      
      If the tc-bpf@egress writes 0 to skb->tstamp, the skb->mono_delivery_time
      bit has to be cleared as well.  It could be done together during
      convert_ctx_access().  However, a later patch will also expose
      the skb->mono_delivery_time bit as __sk_buff->delivery_time_type.
      Changing the delivery_time_type in the background may surprise
      the user, e.g. a second read of __sk_buff->delivery_time_type
      may need a READ_ONCE() to avoid compiler optimization.  Thus,
      anticipating the needs of that later patch, this patch checks for
      !skb->tstamp after running the tc-bpf and clears the
      skb->mono_delivery_time bit if needed.  See the earlier discussion
      on v4 [0].
      
      The bpf insn rewrite requires the skb's mono_delivery_time bit and
      tc_at_ingress bit.  They are moved up in sk_buff so that the bpf rewrite
      can be done at a fixed offset.  tc_skip_classify is moved together with
      tc_at_ingress.  To free up one bit for mono_delivery_time, csum_not_inet
      (currently used by sctp) is moved down.
      
      [0]: https://lore.kernel.org/bpf/20220217015043.khqwqklx45c4m4se@kafai-mbp.dhcp.thefacebook.com/
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7449197d
    • net: ipv6: Get rcv timestamp if needed when handling hop-by-hop IOAM option · b6561f84
      Martin KaFai Lau authored
      IOAM is a hop-by-hop option with a temporary IANA allocation (49).
      Since it is hop-by-hop, it is handled before the input routing decision.
      One of the traced data fields is the (rcv) timestamp.
      
      When the locally generated skb is looping from egress to ingress over
      a virtual interface (e.g. veth, loopback...), skb->tstamp may have the
      delivery time before it is known that it will be delivered locally
      and received by another sk.
      
      Like the handling of network tapping (tcpdump) in the earlier patch,
      this patch gets the timestamp if needed without overwriting the
      delivery_time in skb->tstamp.  skb_tstamp_cond() is added to do the
      ktime_get_real() with an extra cond arg checked on top of the
      netstamp_needed_key static key.  skb_tstamp_cond() will also be used in
      a later patch that needs the netstamp_needed_key check.
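      
      The helper plausibly reads as follows (a sketch consistent with the
      description above, not necessarily the exact hunk):
      
      	static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond)
      	{
      		/* keep a forwarded delivery_time out of the traced data */
      		if (!skb->mono_delivery_time && skb->tstamp)
      			return skb->tstamp;
      
      		if (static_branch_unlikely(&netstamp_needed_key) || cond)
      			return ktime_get_real();
      
      		return 0;
      	}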
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b6561f84
    • net: Set skb->mono_delivery_time and clear it after sch_handle_ingress() · d98d58a0
      Martin KaFai Lau authored
      The previous patches handled the delivery_time before sch_handle_ingress().
      
      This patch can now set the skb->mono_delivery_time bit to flag that
      skb->tstamp is used as the mono delivery_time (EDT) instead of the
      (rcv) timestamp, and also clear it with skb_clear_delivery_time() after
      sch_handle_ingress().  This allows bpf_redirect_*() to keep the mono
      delivery_time so that it can be used by a qdisc (fq) on the egressing
      interface.
      
      A later patch will postpone the skb_clear_delivery_time() until the
      stack learns that the skb is being delivered locally, which will
      let the other kernel forwarding paths (ip[6]_forward) keep
      the delivery_time as well.  Thus, like the previous patches on using
      the skb->mono_delivery_time bit, calling skb_clear_delivery_time()
      is not limited to CONFIG_NET_INGRESS, to avoid too much code
      churn in this set.
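      
      A sketch of what skb_clear_delivery_time() does per this description
      (assumed shape, not the verbatim hunk):
      
      	static inline void skb_clear_delivery_time(struct sk_buff *skb)
      	{
      		if (skb->mono_delivery_time) {
      			skb->mono_delivery_time = 0;
      			/* restore a (rcv) timestamp if anyone has asked for one */
      			if (static_branch_unlikely(&netstamp_needed_key))
      				skb->tstamp = ktime_get_real();
      			else
      				skb->tstamp = 0;
      		}
      	}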
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d98d58a0
    • net: Clear mono_delivery_time bit in __skb_tstamp_tx() · d93376f5
      Martin KaFai Lau authored
      __skb_tstamp_tx() may clone the egress skb and queue the clone to
      the sk_error_queue.  The outgoing skb may have the mono delivery_time
      while the (rcv) timestamp is expected for the clone, so the
      skb->mono_delivery_time bit needs to be cleared from the clone.
      
      This patch adds the skb->mono_delivery_time clearing to the existing
      __net_timestamp() and uses it in __skb_tstamp_tx().
      The __net_timestamp() fast-path usage in dev.c is changed to directly
      call ktime_get_real(), since the mono_delivery_time bit is not set at
      that point.
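      
      After the change, __net_timestamp() plausibly looks like this (a sketch):
      
      	static inline void __net_timestamp(struct sk_buff *skb)
      	{
      		skb->tstamp = ktime_get_real();
      		/* the clone carries a (rcv) timestamp, not a delivery_time */
      		skb->mono_delivery_time = 0;
      	}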
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d93376f5
    • net: Handle delivery_time in skb->tstamp during network tapping with af_packet · 27942a15
      Martin KaFai Lau authored
      A later patch will set the skb->mono_delivery_time bit to flag that
      skb->tstamp is used as the mono delivery_time (EDT) instead of the (rcv)
      timestamp.  skb_clear_tstamp() will then keep this delivery_time during
      forwarding.
      
      This patch is to make the network tapping (with af_packet) to handle
      the delivery_time stored in skb->tstamp.
      
      Regardless of tapping at ingress or egress, the tapped skb is
      received by the af_packet socket, so it is ingress to the af_packet
      socket and the (rcv) timestamp is expected.
      
      When tapping at egress, dev_queue_xmit_nit() is used.  It already
      expects that skb->tstamp may have the delivery_time, so it does
      skb_clone()+net_timestamp_set() to ensure the cloned skb has
      the (rcv) timestamp before passing it to the af_packet sk.
      This patch only adds clearing of the skb->mono_delivery_time
      bit in net_timestamp_set().
      
      When tapping at ingress, the code currently expects skb->tstamp to be
      either 0 or the (rcv) timestamp.  Meaning, the tapping-at-ingress path
      already expects that skb->tstamp could be 0, and it will get
      the (rcv) timestamp by ktime_get_real() when needed.
      
      There are two cases for tapping at ingress:
      
      In one case, af_packet queues the skb to its sk_receive_queue.
      The skb is either not shared or a new clone is created.  The newly
      added skb_clear_delivery_time() is called to clear the
      delivery_time (if any) and set the (rcv) timestamp if
      needed before the skb is queued to the sk_receive_queue.
      
      In the other case, the ingress skb is directly copied to the rx_ring,
      and tpacket_get_timestamp() is used to get the (rcv) timestamp.
      The newly added skb_tstamp() is used in tpacket_get_timestamp()
      to check the skb->mono_delivery_time bit before returning skb->tstamp.
      As mentioned earlier, tapping@ingress already expects that
      the skb may not have the (rcv) timestamp (because no sk has asked
      for it) and handles this case by directly calling ktime_get_real().
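      
      The new skb_tstamp() accessor plausibly reads (a sketch):
      
      	static inline ktime_t skb_tstamp(const struct sk_buff *skb)
      	{
      		/* a mono delivery_time is not a (rcv) timestamp */
      		if (unlikely(skb->mono_delivery_time))
      			return 0;
      
      		return skb->tstamp;
      	}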
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      27942a15
    • net: Add skb_clear_tstamp() to keep the mono delivery_time · de799101
      Martin KaFai Lau authored
      Right now, skb->tstamp is reset to 0 whenever the skb is forwarded.
      
      If skb->tstamp has the mono delivery_time, clearing it can hurt
      performance when the skb finally transmits out through fq@phy-dev.
      
      The earlier patch added a skb->mono_delivery_time bit to
      flag that skb->tstamp carries the mono delivery_time.
      
      This patch adds a skb_clear_tstamp() helper which keeps
      the mono delivery_time and clears everything else.
      
      The delivery_time clearing will be postponed until the stack knows the
      skb will be delivered locally.  That will be done in a later patch.
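      
      A minimal sketch of the helper, per the description above:
      
      	static inline void skb_clear_tstamp(struct sk_buff *skb)
      	{
      		/* keep a mono delivery_time across forwarding */
      		if (skb->mono_delivery_time)
      			return;
      
      		skb->tstamp = 0;
      	}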
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de799101
    • net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp · a1ac9c8a
      Martin KaFai Lau authored
      skb->tstamp was first used as the (rcv) timestamp.
      The major usage is to report it to the user (e.g. SO_TIMESTAMP).
      
      Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP)
      during egress and used by the qdisc (e.g. sch_fq) to make decision on when
      the skb can be passed to the dev.
      
      Currently, there is no way to tell whether skb->tstamp holds the (rcv)
      timestamp or the delivery_time, so it is always reset to 0 whenever
      forwarded between egress and ingress.
      
      While it makes sense to always clear the (rcv) timestamp in skb->tstamp
      to avoid confusing sch_fq that expects the delivery_time, it is a
      performance issue [0] to clear the delivery_time if the skb finally
      egress to a fq@phy-dev.  For example, when forwarding from egress to
      ingress and then finally back to egress:
      
                  tcp-sender => veth@netns => veth@hostns => fq@eth0@hostns
                                           ^              ^
                                           reset          reset
      
      This patch adds one bit skb->mono_delivery_time to flag the skb->tstamp
      is storing the mono delivery_time (EDT) instead of the (rcv) timestamp.
      
      The current use case is to keep the TCP mono delivery_time (EDT) and
      to be used with sch_fq.  A later patch will also allow tc-bpf@ingress
      to read and change the mono delivery_time.
      
      In the future, another bit (e.g. skb->user_delivery_time) can be added
      for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid.
      
      [ This patch is a prep work.  The following patches will
        get the other parts of the stack ready first.  Then another patch
        after that will finally set the skb->mono_delivery_time. ]
      
      A skb_set_delivery_time() function is added.  It is used by tcp_output.c
      and during ip[6] fragmentation to assign the delivery_time to
      skb->tstamp and also set skb->mono_delivery_time.
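      
      A minimal sketch of the new setter, per the description above:
      
      	static inline void skb_set_delivery_time(struct sk_buff *skb,
      						 ktime_t kt, bool mono)
      	{
      		skb->tstamp = kt;
      		/* only flag a real, mono delivery_time */
      		skb->mono_delivery_time = kt && mono;
      	}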
      
      A note on the change in ip_send_unicast_reply() in ip_output.c:
      it is only used by TCP to send reset/ack out of a ctl_sk.
      Like the new skb_set_delivery_time(), this patch sets
      the skb->mono_delivery_time to 0 for now as a placeholder.
      It will be enabled in a later patch.
      A similar case in tcp_ipv6 can be handled with
      skb_set_delivery_time() in tcp_v6_send_response().
      
      [0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1ac9c8a
    • net: rtnetlink: Add UAPI toggle for IFLA_OFFLOAD_XSTATS_L3_STATS · 5fd0b838
      Petr Machata authored
      The offloaded HW stats are designed to allow per-netdevice enablement and
      disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS,
      which should be carried by the RTM_SETSTATS message, and expresses a desire
      to toggle L3 offload xstats on or off.
      
      As part of the above, add an exported function rtnl_offload_xstats_notify()
      that drivers can use when they have installed or deinstalled the counters
      backing the HW stats.
      
      At this point, it is possible to enable, disable and query L3 offload
      xstats on netdevices.  (However, no driver actually implements these
      yet.)
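      
      A hedged sketch of the driver side: after installing or removing the
      backing counters, a driver re-announces the stats (the foo_* name is
      hypothetical):
      
      	/* called after the HW counters backing the L3 stats change */
      	static void foo_hw_stats_changed(struct net_device *dev)
      	{
      		rtnl_offload_xstats_notify(dev);
      	}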
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5fd0b838
    • net: dev: Add hardware stats support · 9309f97a
      Petr Machata authored
      Offloading switch device drivers may be able to collect statistics of the
      traffic taking place in the HW datapath that pertains to a certain soft
      netdevice, such as a VLAN.  Add the necessary infrastructure to allow
      exposing these statistics through the offloaded netdevice in question.
      The API was shaped by the following considerations:
      
      - Collection of HW statistics is not free: there may be a finite number of
        counters, and the act of counting may have a performance impact. It is
        therefore necessary to allow toggling whether HW counting should be done
        for any particular SW netdevice.
      
      - As the drivers are loaded and removed, a particular device may get
        offloaded and unoffloaded again. At the same time, the statistics values
        need to stay monotonic (modulo the eventual 64-bit wraparound),
        increasing only to reflect traffic measured in the device.
      
        To that end, the netdevice keeps around a lazily-allocated copy of struct
        rtnl_link_stats64. Device drivers then contribute to the values kept
        therein at various points. Even as the driver goes away, the struct stays
        around to maintain the statistics values.
      
      - Different HW devices may be able to count different things. The
        motivation behind this patch in particular is exposure of HW counters on
        Nvidia Spectrum switches, where the only practical approach to counting
        traffic on offloaded soft netdevices currently is to use router interface
        counters, and count L3 traffic. Correspondingly that is the statistics
        suite added in this patch.
      
        Other devices may be able to measure different kinds of traffic, and for
        that reason, the APIs are built to allow uniform access to different
        statistics suites.
      
      - Because soft netdevices and offloading drivers are only loosely bound, a
        netdevice uses a notifier chain to communicate with the drivers. Several
        new notifiers, NETDEV_OFFLOAD_XSTATS_*, have been added to carry messages
        to the offloading drivers.
      
      - Devices can have various conditions for when a particular counter is
        available. As the device is configured and reconfigured, the device
        offload may become or cease being suitable for counter binding. A
        netdevice can use a notifier type NETDEV_OFFLOAD_XSTATS_REPORT_USED to
        ping offloading drivers and determine whether anyone currently implements
        a given statistics suite. This information can then be propagated to user
        space.
      
        When the driver decides to unoffload a netdevice, it can use a
        newly-added function, netdev_offload_xstats_report_delta(), to record
        outstanding collected statistics, before destroying the HW counter.
      
      This patch adds a helper, call_netdevice_notifiers_info_robust(), for
      dispatching a notifier with the possibility of unwind when one of the
      consumers bails. Given the wish to eventually get rid of the global
      notifier block altogether, this helper only invokes the per-netns notifier
      block.
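      
      A hedged sketch of the monotonic-accumulation idea (a hypothetical
      helper; the real structs and notifier payloads live in netdevice.h and
      net/core/dev.c):
      
      	/* The core keeps a lazily-allocated rtnl_link_stats64 per netdev.
      	 * Drivers only ever add their deltas to it, so the values stay
      	 * monotonic across offload/unoffload cycles and driver reloads.
      	 */
      	static void offload_xstats_accumulate(struct rtnl_link_stats64 *dest,
      					      const struct rtnl_link_stats64 *delta)
      	{
      		dest->rx_packets += delta->rx_packets;
      		dest->tx_packets += delta->tx_packets;
      		dest->rx_bytes   += delta->rx_bytes;
      		dest->tx_bytes   += delta->tx_bytes;
      	}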
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9309f97a
    • flow_dissector: Add support for HSR · bf08824a
      Kurt Kanzenbach authored
      Network drivers such as igb or igc call eth_get_headlen() to determine the
      header length for the skbs they are about to construct in the receive path.
      
      When running HSR on top of these drivers, this triggers the BUG_ON() in
      skb_pull(): the skb headlen is not sufficient for HSR to work correctly,
      and skb_pull() notices that.
      
      For instance, eth_get_headlen() returns 14 bytes for TCP traffic over HSR,
      which is not correct.  The problem is that the flow dissection code does
      not take HSR into account.  Therefore, add support for it.
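      
      The fix plausibly amounts to a new case in __skb_flow_dissect() that
      steps over the 6-byte HSR tag and re-dissects the encapsulated protocol
      (a sketch, assuming the hsr_tag layout from net/hsr/hsr_main.h):
      
      	case htons(ETH_P_HSR): {
      		struct hsr_tag *hdr, _hdr;
      
      		hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr),
      					   data, hlen, &_hdr);
      		if (!hdr) {
      			fdret = FLOW_DISSECT_RET_OUT_BAD;
      			break;
      		}
      
      		proto = hdr->encap_proto;
      		nhoff += HSR_HLEN;
      		fdret = FLOW_DISSECT_RET_PROTO_AGAIN;
      		break;
      	}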
      Reported-by: Anthony Harivel <anthony.harivel@linutronix.de>
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Link: https://lore.kernel.org/r/20220228195856.88187-1-kurt@linutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      bf08824a
  7. 01 Mar 2022, 2 commits
  8. 28 Feb 2022, 1 commit
  9. 27 Feb 2022, 5 commits
    • net: dsa: tag_8021q: rename dsa_8021q_bridge_tx_fwd_offload_vid · b6362bdf
      Vladimir Oltean authored
      dsa_8021q_bridge_tx_fwd_offload_vid is no longer used just for
      bridge TX forwarding offload; it is the private VLAN reserved for
      VLAN-unaware bridging in a way that is compatible with FDB isolation.
      
      So just rename it to dsa_tag_8021q_bridge_vid.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b6362bdf
    • net: dsa: tag_8021q: merge RX and TX VLANs · 04b67e18
      Vladimir Oltean authored
      In the old Shared VLAN Learning mode of operation that tag_8021q
      previously used for forwarding, we needed to have distinct concepts for
      an RX and a TX VLAN.
      
      An RX VLAN could be installed on all ports that were members of a given
      bridge, so that autonomous forwarding could still work, while a TX VLAN
      was dedicated for precise packet steering, so it just contained the CPU
      port and one egress port.
      
      Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX
      all over, those lines have been blurred and we no longer have the need
      to do precise TX towards a port that is in a bridge. As for standalone
      ports, it is fine to use the same VLAN ID for both RX and TX.
      
      This patch changes the tag_8021q format by shifting the VLAN range it
      reserves, and halving it. Previously, our DIR bits were encoding the
      VLAN direction (RX/TX) and were set to either 1 or 2. This meant that
      tag_8021q reserved 2K VLANs, or 50% of the available range.
      
      Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q
      reserve only 1K VLANs, and a different range now (the last 1K). This is
      done so that we leave the old format in place in case we need to return
      to it.
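      
      Illustratively, the resulting VID layout might be rendered as below
      (a hypothetical sketch, see the code note that follows; the
      authoritative macros are in the tag_8021q headers):
      
      	/* bits 11:10 of the VID are now hardcoded to 3, so tag_8021q
      	 * owns only VIDs 3072..4095, i.e. the last 1K of the range
      	 */
      	#define DSA_8021Q_DIR_MASK	GENMASK(11, 10)
      	#define DSA_8021Q_DIR		FIELD_PREP(DSA_8021Q_DIR_MASK, 3)
      
      	static inline bool vid_is_dsa_8021q(u16 vid)
      	{
      		return (vid & DSA_8021Q_DIR_MASK) == DSA_8021Q_DIR;
      	}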
      
      In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan
      functions go away.  Any VLAN matched by vid_is_dsa_8021q is now both a TX
      and an RX VLAN; the two are no longer distinct.  For example, felix, which
      did different things for different VLAN types, now needs to handle the RX
      and the TX logic for the same VLAN.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      04b67e18
    • net: dsa: tag_8021q: add support for imprecise RX based on the VBID · d7f9787a
      Vladimir Oltean authored
      The sja1105 switch can't populate the PORT field of the tag_8021q header
      when sending a frame to the CPU with a non-zero VBID.
      
      Similar to dsa_find_designated_bridge_port_by_vid() which performs
      imprecise RX for VLAN-aware bridges, let's introduce a helper in
      tag_8021q for performing imprecise RX based on the VLAN that it has
      allocated for a VLAN-unaware bridge.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7f9787a
    • net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging · 91495f21
      Vladimir Oltean authored
      For VLAN-unaware bridging, tag_8021q uses something perhaps a bit too
      tied to the sja1105 switch: each port uses the same pvid which is also
      used for standalone operation (a unique one from which the source port
      and device ID can be retrieved when packets from that port are forwarded
      to the CPU).  Since each port has a unique pvid when performing
      autonomous forwarding, the switch must be configured for Shared VLAN
      Learning (SVL) such that the VLAN ID itself is ignored when performing
      FDB lookups.  Without SVL, packets would always be flooded, since the FDB
      lookup in the source port's VLAN would never find any entry.
      
      First of all, to make tag_8021q more palatable to switches which might
      not support Shared VLAN Learning, let's just use a common VLAN for all
      ports that are under the same bridge.
      
      Secondly, using Shared VLAN Learning means that FDB isolation can never
      be enforced. But if all ports under the same VLAN-unaware bridge share
      the same VLAN ID, it can.
      
      The disadvantage is that the CPU port can no longer perform precise
      source port identification for these packets. But at least we have a
      mechanism which has proven to be adequate for that situation: imprecise
      RX (dsa_find_designated_bridge_port_by_vid), which is what we use for
      termination on VLAN-aware bridges.
      
      The VLAN ID that VLAN-unaware bridges will use with tag_8021q is the
      same one as we were previously using for imprecise TX (bridge TX
      forwarding offload). It is already allocated, it is just a matter of
      using it.
      
      Note that because now all ports under the same bridge share the same
      VLAN, the complexity of performing a tag_8021q bridge join decreases
      dramatically. We no longer have to install the RX VLAN of a newly
      joining port into the port membership of the existing bridge ports.
      The newly joining port just becomes a member of the VLAN corresponding
      to that bridge, and the other ports are already members of it from when
      they joined the bridge themselves. So forwarding works properly.
      
      This means that we can unhook dsa_tag_8021q_bridge_{join,leave} from the
      cross-chip notifier level dsa_switch_bridge_{join,leave}. We can put
      these calls directly into the sja1105 driver.
      
      With this new mode of operation, a port controlled by tag_8021q can have
      two pvids whereas before it could only have one. The pvid for standalone
      operation is different from the pvid used for VLAN-unaware bridging.
      This is done, again, so that FDB isolation can be enforced.
      Let tag_8021q manage this by deleting the standalone pvid when a port
      joins a bridge, and restoring it when the port leaves.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      91495f21
    • PCI: Add Fungible Vendor ID to pci_ids.h · e8eb9e32
      Dimitris Michailidis authored
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-pci@vger.kernel.org
      Signed-off-by: Dimitris Michailidis <dmichail@fungible.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e8eb9e32
  10. 26 Feb 2022, 3 commits
    • net: neigh: use kfree_skb_reason() for __neigh_event_send() · a5736edd
      Menglong Dong authored
      Replace the kfree_skb() used in __neigh_event_send() with
      kfree_skb_reason().  The following drop reasons are added:
      
      SKB_DROP_REASON_NEIGH_FAILED
      SKB_DROP_REASON_NEIGH_QUEUEFULL
      SKB_DROP_REASON_NEIGH_DEAD
      
      The first two reasons above should cover the hot paths where skbs are
      dropped in the neighbour subsystem.
      Reviewed-by: Mengen Sun <mengensun@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a5736edd
    • net: ip: add skb drop reasons for ip egress path · 5e187189
      Menglong Dong authored
      Replace the kfree_skb() used in the packet egress path of the IP layer
      with kfree_skb_reason().  The functions involved include:
      
      __ip_queue_xmit()
      ip_finish_output()
      ip_mc_finish_output()
      ip6_output()
      ip6_finish_output()
      ip6_finish_output2()
      
      The following new drop reasons are introduced (a sketch of one
      conversion follows the list):
      
      SKB_DROP_REASON_IP_OUTNOROUTES
      SKB_DROP_REASON_BPF_CGROUP_EGRESS
      SKB_DROP_REASON_IPV6DISABLED
      SKB_DROP_REASON_NEIGH_CREATEFAIL
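      
      For example, the no_route path in __ip_queue_xmit() plausibly becomes
      (a sketch of the pattern, not necessarily the verbatim hunk):
      
      no_route:
      	rcu_read_unlock();
      	IP_INC_STATS(net, IPSTATS_MIB_OUTNOROUTES);
      	kfree_skb_reason(skb, SKB_DROP_REASON_IP_OUTNOROUTES);
      	return -EHOSTUNREACH;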
      Reviewed-by: Mengen Sun <mengensun@tencent.com>
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5e187189
    • tracing: Uninline trace_trigger_soft_disabled() partly · bc82c38a
      Christophe Leroy authored
      On a powerpc32 build with CONFIG_CC_OPTIMIZE_FOR_SIZE, the inline
      keyword is not honored and trace_trigger_soft_disabled() appears
      approx. 50 times in vmlinux.
      
      Adding -Winline to the build, the following message appears:
      
      	./include/linux/trace_events.h:712:1: error: inlining failed in call to 'trace_trigger_soft_disabled': call is unlikely and code size would grow [-Werror=inline]
      
      That function is rather big for an inlined function:
      
      	c003df60 <trace_trigger_soft_disabled>:
      	c003df60:	94 21 ff f0 	stwu    r1,-16(r1)
      	c003df64:	7c 08 02 a6 	mflr    r0
      	c003df68:	90 01 00 14 	stw     r0,20(r1)
      	c003df6c:	bf c1 00 08 	stmw    r30,8(r1)
      	c003df70:	83 e3 00 24 	lwz     r31,36(r3)
      	c003df74:	73 e9 01 00 	andi.   r9,r31,256
      	c003df78:	41 82 00 10 	beq     c003df88 <trace_trigger_soft_disabled+0x28>
      	c003df7c:	38 60 00 00 	li      r3,0
      	c003df80:	39 61 00 10 	addi    r11,r1,16
      	c003df84:	4b fd 60 ac 	b       c0014030 <_rest32gpr_30_x>
      	c003df88:	73 e9 00 80 	andi.   r9,r31,128
      	c003df8c:	7c 7e 1b 78 	mr      r30,r3
      	c003df90:	41 a2 00 14 	beq     c003dfa4 <trace_trigger_soft_disabled+0x44>
      	c003df94:	38 c0 00 00 	li      r6,0
      	c003df98:	38 a0 00 00 	li      r5,0
      	c003df9c:	38 80 00 00 	li      r4,0
      	c003dfa0:	48 05 c5 f1 	bl      c009a590 <event_triggers_call>
      	c003dfa4:	73 e9 00 40 	andi.   r9,r31,64
      	c003dfa8:	40 82 00 28 	bne     c003dfd0 <trace_trigger_soft_disabled+0x70>
      	c003dfac:	73 ff 02 00 	andi.   r31,r31,512
      	c003dfb0:	41 82 ff cc 	beq     c003df7c <trace_trigger_soft_disabled+0x1c>
      	c003dfb4:	80 01 00 14 	lwz     r0,20(r1)
      	c003dfb8:	83 e1 00 0c 	lwz     r31,12(r1)
      	c003dfbc:	7f c3 f3 78 	mr      r3,r30
      	c003dfc0:	83 c1 00 08 	lwz     r30,8(r1)
      	c003dfc4:	7c 08 03 a6 	mtlr    r0
      	c003dfc8:	38 21 00 10 	addi    r1,r1,16
      	c003dfcc:	48 05 6f 6c 	b       c0094f38 <trace_event_ignore_this_pid>
      	c003dfd0:	38 60 00 01 	li      r3,1
      	c003dfd4:	4b ff ff ac 	b       c003df80 <trace_trigger_soft_disabled+0x20>
      
      However, it is located in a hot path, so inlining it is important.
      But forcing inlining of the entire function with __always_inline
      increases the text size by approx. 20 kbytes.
      
      Instead, split the function in two parts: one part with the likely
      fast path, flagged __always_inline, and a second part out of line.
      
      With this change, on a powerpc32 build with CONFIG_CC_OPTIMIZE_FOR_SIZE,
      vmlinux text increases by only 1.4 kbytes, which is partly
      compensated by a 7-kbyte decrease of vmlinux data.
      
      On ppc64_defconfig, which has CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE, this
      change reduces vmlinux text by more than 30 kbytes.
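      
      The shape of the split, as a hedged sketch (the exact flag checks live
      in include/linux/trace_events.h and may differ):
      
      	/* out of line, in trace_events_trigger.c: the unlikely slow path */
      	bool __trace_trigger_soft_disabled(struct trace_event_file *file);
      
      	static __always_inline bool
      	trace_trigger_soft_disabled(struct trace_event_file *file)
      	{
      		unsigned long eflags = file->flags;
      
      		/* likely fast path: nothing special about this event */
      		if (likely(!(eflags & (EVENT_FILE_FL_TRIGGER_MODE |
      				       EVENT_FILE_FL_SOFT_DISABLED |
      				       EVENT_FILE_FL_PID_FILTER))))
      			return false;
      
      		if (eflags & EVENT_FILE_FL_TRIGGER_COND)
      			return false;
      
      		return __trace_trigger_soft_disabled(file);
      	}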
      
      Link: https://lkml.kernel.org/r/69ce0986a52d026d381d612801d978aa4f977460.1644563295.git.christophe.leroy@csgroup.eu
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      bc82c38a
  11. 24 Feb 2022, 5 commits