1. 30 8月, 2021 2 次提交
  2. 29 8月, 2021 2 次提交
  3. 28 8月, 2021 1 次提交
    • R
      ipv6: add IFLA_INET6_RA_MTU to expose mtu value · 49b99da2
      Rocco Yue 提交于
      The kernel provides a "/proc/sys/net/ipv6/conf/<iface>/mtu"
      file, which can temporarily record the mtu value of the last
      received RA message when the RA mtu value is lower than the
      interface mtu, but this proc has following limitations:
      
      (1) when the interface mtu (/sys/class/net/<iface>/mtu) is
      updeated, mtu6 (/proc/sys/net/ipv6/conf/<iface>/mtu) will
      be updated to the value of interface mtu;
      (2) mtu6 (/proc/sys/net/ipv6/conf/<iface>/mtu) only affect
      ipv6 connection, and not affect ipv4.
      
      Therefore, when the mtu option is carried in the RA message,
      there will be a problem that the user sometimes cannot obtain
      RA mtu value correctly by reading mtu6.
      
      After this patch set, if a RA message carries the mtu option,
      you can send a netlink msg which nlmsg_type is RTM_GETLINK,
      and then by parsing the attribute of IFLA_INET6_RA_MTU to
      get the mtu value carried in the RA message received on the
      inet6 device. In addition, you can also get a link notification
      when ra_mtu is updated so it doesn't have to poll.
      
      In this way, if the MTU values that the device receives from
      the network in the PCO IPv4 and the RA IPv6 procedures are
      different, the user can obtain the correct ipv6 ra_mtu value
      and compare the value of ra_mtu and ipv4 mtu, then the device
      can use the lower MTU value for both IPv4 and IPv6.
      Signed-off-by: NRocco Yue <rocco.yue@mediatek.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20210827150412.9267-1-rocco.yue@mediatek.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      49b99da2
  4. 27 8月, 2021 1 次提交
  5. 26 8月, 2021 3 次提交
  6. 25 8月, 2021 11 次提交
    • F
      netfilter: ecache: remove nf_exp_event_notifier structure · bd1431db
      Florian Westphal 提交于
      Reuse the conntrack event notofier struct, this allows to remove the
      extra register/unregister functions and avoids a pointer in struct net.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      bd1431db
    • F
      netfilter: ecache: prepare for event notifier merge · b86c0e64
      Florian Westphal 提交于
      This prepares for merge for ct and exp notifier structs.
      
      The 'fcn' member is renamed to something unique.
      Second, the register/unregister api is simplified.  There is only
      one implementation so there is no need to do any error checking.
      
      Replace the EBUSY logic with WARN_ON_ONCE.  This allows to remove
      error unwinding.
      
      The exp notifier register/unregister function is removed in
      a followup patch.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b86c0e64
    • F
      netfilter: ecache: remove one indent level · 478374a3
      Florian Westphal 提交于
      nf_conntrack_eventmask_report and nf_ct_deliver_cached_events shared
      most of their code.  This unifies the layout by changing
      
       if (nf_ct_is_confirmed(ct)) {
         foo
       }
      
       to
       if (!nf_ct_is_confirmed(ct)))
         return
       foo
      
      This removes one level of indentation.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      478374a3
    • S
      mctp: Remove the repeated declaration · 87e5ef4b
      Shaokun Zhang 提交于
      Function 'mctp_dev_get_rtnl' is declared twice, so remove the
      repeated declaration.
      
      Cc: Jeremy Kerr <jk@codeconstruct.com.au>
      Cc: Matt Johnston <matt@codeconstruct.com.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87e5ef4b
    • V
      net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid · 8ded9160
      Vladimir Oltean 提交于
      Introduced in commit 38b5beea ("net: dsa: sja1105: prepare tagger
      for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid
      function solved quite a different problem than our needs are now.
      
      Then, we used best-effort VLAN filtering and we were using the xmit_tpid
      to tunnel packets coming from an 8021q upper through the TX VLAN allocated
      by tag_8021q to that egress port. The need for a different VLAN protocol
      depending on switch revision came from the fact that this in itself was
      more of a hack to trick the hardware into accepting tunneled VLANs in
      the first place.
      
      Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we
      supported them again, we would not do that using the same method of
      {tunneling the VLAN on egress, retagging the VLAN on ingress} that we
      had in the best-effort VLAN filtering mode. It seems rather simpler that
      we just allocate a VLAN in the VLAN table that is simply not used by the
      bridge at all, or by any other port.
      
      Anyway, I have 2 gripes with the current sja1105_xmit_tpid:
      
      1. When sending packets on behalf of a VLAN-aware bridge (with the new
         TX forwarding offload framework) plus untagged (with the tag_8021q
         VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S
         and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent
         through the DSA master have a VLAN protocol of 0x8100 and others of
         0x88a8. This is strange and there is no reason for it now. If we have
         a bridge and are therefore forced to send using that bridge's TPID,
         we can as well blend with that bridge's VLAN protocol for all packets.
      
      2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver,
         because it looks inside dp->priv. It is desirable to keep as much
         separation between taggers and switch drivers as possible. Now it
         doesn't do that anymore.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ded9160
    • V
      net: dsa: sja1105: drop untagged packets on the CPU and DSA ports · b0b8c67e
      Vladimir Oltean 提交于
      The sja1105 driver is a bit special in its use of VLAN headers as DSA
      tags. This is because in VLAN-aware mode, the VLAN headers use an actual
      TPID of 0x8100, which is understood even by the DSA master as an actual
      VLAN header.
      
      Furthermore, control packets such as PTP and STP are transmitted with no
      VLAN header as a DSA tag, because, depending on switch generation, there
      are ways to steer these control packets towards a precise egress port
      other than VLAN tags. Transmitting control packets as untagged means
      leaving a door open for traffic in general to be transmitted as untagged
      from the DSA master, and for it to traverse the switch and exit a random
      switch port according to the FDB lookup.
      
      This behavior is a bit out of line with other DSA drivers which have
      native support for DSA tagging. There, it is to be expected that the
      switch only accepts DSA-tagged packets on its CPU port, dropping
      everything that does not match this pattern.
      
      We perhaps rely a bit too much on the switches' hardware dropping on the
      CPU port, and place no other restrictions in the kernel data path to
      avoid that. For example, sja1105 is also a bit special in that STP/PTP
      packets are transmitted using "management routes"
      (sja1105_port_deferred_xmit): when sending a link-local packet from the
      CPU, we must first write a SPI message to the switch to tell it to
      expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route
      it towards port 3 when it gets it. This entry expires as soon as it
      matches a packet received by the switch, and it needs to be reinstalled
      for the next packet etc. All in all quite a ghetto mechanism, but it is
      all that the sja1105 switches offer for injecting a control packet.
      The driver takes a mutex for serializing control packets and making the
      pairs of SPI writes of a management route and its associated skb atomic,
      but to be honest, a mutex is only relevant as long as all parties agree
      to take it. With the DSA design, it is possible to open an AF_PACKET
      socket on the DSA master net device, and blast packets towards
      01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use,
      it all goes kaput because management routes installed by the driver will
      match skbs sent by the DSA master, and not skbs generated by the driver
      itself. So they will end up being routed on the wrong port.
      
      So through the lens of that, maybe it would make sense to avoid that
      from happening by doing something in the network stack, like: introduce
      a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around
      dev_hard_start_xmit(), introduce the following check:
      
      	if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa)
      		kfree_skb(skb);
      
      Ok, maybe that is a bit drastic, but that would at least prevent a bunch
      of problems. For example, right now, even though the majority of DSA
      switches drop packets without DSA tags sent by the DSA master (and
      therefore the majority of garbage that user space daemons like avahi and
      udhcpcd and friends create), it is still conceivable that an aggressive
      user space program can open an AF_PACKET socket and inject a spoofed DSA
      tag directly on the DSA master. We have no protection against that; the
      packet will be understood by the switch and be routed wherever user
      space says. Furthermore: there are some DSA switches where we even have
      register access over Ethernet, using DSA tags. So even user space
      drivers are possible in this way. This is a huge hole.
      
      However, the biggest thing that bothers me is that udhcpcd attempts to
      ask for an IP address on all interfaces by default, and with sja1105, it
      will attempt to get a valid IP address on both the DSA master as well as
      on sja1105 switch ports themselves. So with IP addresses in the same
      subnet on multiple interfaces, the routing table will be messed up and
      the system will be unusable for traffic until it is configured manually
      to not ask for an IP address on the DSA master itself.
      
      It turns out that it is possible to avoid that in the sja1105 driver, at
      least very superficially, by requesting the switch to drop VLAN-untagged
      packets on the CPU port. With the exception of control packets, all
      traffic originated from tag_sja1105.c is already VLAN-tagged, so only
      STP and PTP packets need to be converted. For that, we need to uphold
      the equivalence between an untagged and a pvid-tagged packet, and to
      remember that the CPU port of sja1105 uses a pvid of 4095.
      
      Now that we drop untagged traffic on the CPU port, non-aggressive user
      space applications like udhcpcd stop bothering us, and sja1105 effectively
      becomes just as vulnerable to the aggressive kind of user space programs
      as other DSA switches are (ok, users can also create 8021q uppers on top
      of the DSA master in the case of sja1105, but in future patches we can
      easily deny that, but it still doesn't change the fact that VLAN-tagged
      packets can still be injected over raw sockets).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0b8c67e
    • G
      mptcp: MP_FAIL suboption sending · c25aeb4e
      Geliang Tang 提交于
      This patch added the MP_FAIL suboption sending support.
      
      Add a new flag named send_mp_fail in struct mptcp_subflow_context. If
      this flag is set, send out MP_FAIL suboption.
      
      Add a new member fail_seq in struct mptcp_out_options to save the data
      sequence number to put into the MP_FAIL suboption.
      
      An MP_FAIL option could be included in a RST or on the subflow-level
      ACK.
      Suggested-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NGeliang Tang <geliangtang@xiaomi.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c25aeb4e
    • P
      mptcp: shrink mptcp_out_options struct · d7b26908
      Paolo Abeni 提交于
      After the previous patch we can alias with a union several
      fields in mptcp_out_options. Such struct is stack allocated and
      memset() for each plain TCP out packet. Every saved byted counts.
      
      Before:
      pahole -EC mptcp_out_options
       # ...
      /* size: 136, cachelines: 3, members: 17 */
      
      After:
      pahole -EC mptcp_out_options
       # ...
      /* size: 56, cachelines: 1, members: 9 */
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7b26908
    • G
      net-next: When a bond have a massive amount of VLANs with IPv6 addresses,... · 406f42fa
      Gilad Naaman 提交于
      net-next: When a bond have a massive amount of VLANs with IPv6 addresses, performance of changing link state, attaching a VRF, changing an IPv6 address, etc. go down dramtically.
      
      The source of most of the slow down is the `dev_addr_lists.c` module,
      which mainatins a linked list of HW addresses.
      When using IPv6, this list grows for each IPv6 address added on a
      VLAN, since each IPv6 address has a multicast HW address associated with
      it.
      
      When performing any modification to the involved links, this list is
      traversed many times, often for nothing, all while holding the RTNL
      lock.
      
      Instead, this patch adds an auxilliary rbtree which cuts down
      traversal time significantly.
      
      Performance can be seen with the following script:
      
      	#!/bin/bash
      	ip netns del test || true 2>/dev/null
      	ip netns add test
      
      	echo 1 | ip netns exec test tee /proc/sys/net/ipv6/conf/all/keep_addr_on_down > /dev/null
      
      	set -e
      
      	ip -n test link add foo type veth peer name bar
      	ip -n test link add b1 type bond
      	ip -n test link add florp type vrf table 10
      
      	ip -n test link set bar master b1
      	ip -n test link set foo up
      	ip -n test link set bar up
      	ip -n test link set b1 up
      	ip -n test link set florp up
      
      	VLAN_COUNT=1500
      	BASE_DEV=b1
      
      	echo Creating vlans
      	ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
      	do ip -n test link add link $BASE_DEV name foo.\$i type vlan id \$i; done"
      
      	echo Bringing them up
      	ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
      	do ip -n test link set foo.\$i up; done"
      
      	echo Assiging IPv6 Addresses
      	ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
      	do ip -n test address add dev foo.\$i 2000::\$i/64; done"
      
      	echo Attaching to VRF
      	ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
      	do ip -n test link set foo.\$i master florp; done"
      
      On an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz machine, the performance
      before the patch is (truncated):
      
      	Creating vlans
      	real 108.35
      	Bringing them up
      	real 4.96
      	Assiging IPv6 Addresses
      	real 19.22
      	Attaching to VRF
      	real 458.84
      
      After the patch:
      
      	Creating vlans
      	real 5.59
      	Bringing them up
      	real 5.07
      	Assiging IPv6 Addresses
      	real 5.64
      	Attaching to VRF
      	real 25.37
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Lu Wei <luwei32@huawei.com>
      Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Cc: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: NGilad Naaman <gnaaman@drivenets.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      406f42fa
    • V
      PCI: Add pcie_ptm_enabled() · 014408cd
      Vinicius Costa Gomes 提交于
      Add a predicate that returns if PCIe PTM (Precision Time Measurement)
      is enabled.
      
      It will only return true if it's enabled in all the ports in the path
      from the device to the root.
      Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      014408cd
    • V
      Revert "PCI: Make pci_enable_ptm() private" · 1d71eb53
      Vinicius Costa Gomes 提交于
      Make pci_enable_ptm() accessible from the drivers.
      
      Exposing this to the driver enables the driver to use the
      'ptm_enabled' field of 'pci_dev' to check if PTM is enabled or not.
      
      This reverts commit ac6c26da ("PCI: Make pci_enable_ptm() private").
      Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      1d71eb53
  7. 24 8月, 2021 8 次提交
  8. 23 8月, 2021 1 次提交
    • V
      net: dsa: track unique bridge numbers across all DSA switch trees · f5e165e7
      Vladimir Oltean 提交于
      Right now, cross-tree bridging setups work somewhat by mistake.
      
      In the case of cross-tree bridging with sja1105, all switch instances
      need to agree upon a common VLAN ID for forwarding a packet that belongs
      to a certain bridging domain.
      
      With TX forwarding offload, the VLAN ID is the bridge VLAN for
      VLAN-aware bridging, and the tag_8021q TX forwarding offload VID
      (a VLAN which has non-zero VBID bits) for VLAN-unaware bridging.
      
      The VBID for VLAN-unaware bridging is derived from the dp->bridge_num
      value calculated by DSA independently for each switch tree.
      
      If ports from one tree join one bridge, and ports from another tree join
      another bridge, DSA will assign them the same bridge_num, even though
      the bridges are different. If cross-tree bridging is supported, this
      is an issue.
      
      Modify DSA to calculate the bridge_num globally across all switch trees.
      This has the implication for a driver that the dp->bridge_num value that
      DSA will assign to its ports might not be contiguous, if there are
      boards with multiple DSA drivers instantiated. Additionally, all
      bridge_num values eat up towards each switch's
      ds->num_fwd_offloading_bridges maximum, which is potentially unfortunate,
      and can be seen as a limitation introduced by this patch. However, that
      is the lesser evil for now.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5e165e7
  9. 22 8月, 2021 1 次提交
  10. 21 8月, 2021 3 次提交
  11. 20 8月, 2021 5 次提交
  12. 19 8月, 2021 2 次提交