1. 27 Aug, 2021 5 commits
  2. 26 Aug, 2021 8 commits
    • net: fix NULL pointer reference in cipso_v4_doi_free · 733c99ee
      王贇 committed
      In netlbl_cipsov4_add_std(), when the allocation of 'doi_def->map.std'
      fails, we sometimes observe a panic:
      
        BUG: kernel NULL pointer dereference, address:
        ...
        RIP: 0010:cipso_v4_doi_free+0x3a/0x80
        ...
        Call Trace:
         netlbl_cipsov4_add_std+0xf4/0x8c0
         netlbl_cipsov4_add+0x13f/0x1b0
         genl_family_rcv_msg_doit.isra.15+0x132/0x170
         genl_rcv_msg+0x125/0x240
      
      This is because cipso_v4_doi_free() does not check
      'doi_def->map.std' when 'doi_def->type' equals 1, which
      is possible, since netlbl_cipsov4_add_std() has not initialized
      it before allocating 'doi_def->map.std'.
      
      This patch adds the check to prevent a panic in such
      cases.
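
      As a rough illustration of the guard described above, here is a minimal userspace sketch in C. The struct layout and the names doi_alloc(), doi_free(), and MAP_TRANS are hypothetical stand-ins, not the kernel definitions; the only point is that the free path tolerates a map pointer that was never allocated.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures (illustration only). */
enum { MAP_TRANS = 1 };

struct std_map {
    int *lvl_table;
    int *cat_table;
};

struct doi_def {
    int type;
    struct std_map *std;    /* stays NULL if its allocation failed */
};

/* Build a definition; with_map == 0 models the failed map allocation. */
static struct doi_def *doi_alloc(int with_map)
{
    struct doi_def *def = calloc(1, sizeof(*def));

    if (def == NULL)
        return NULL;
    def->type = MAP_TRANS;
    if (with_map)
        def->std = calloc(1, sizeof(*def->std));
    return def;
}

/* Free a definition. Returns 1 if a nested map was released, 0 otherwise. */
static int doi_free(struct doi_def *def)
{
    int had_map = 0;

    if (def == NULL)
        return 0;
    /* The guard: only touch the map's members when the map exists. */
    if (def->type == MAP_TRANS && def->std != NULL) {
        free(def->std->lvl_table);
        free(def->std->cat_table);
        free(def->std);
        had_map = 1;
    }
    free(def);
    return had_map;
}
```

      Without the `def->std != NULL` test, the first free() call would dereference a NULL map, which mirrors the reported panic.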
      Reported-by: Abaci <abaci@linux.alibaba.com>
      Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: Return correct error on changing device netns · 96a6b93b
      Andrey Ignatov committed
      Currently, when a device is moved between network namespaces using the
      RTM_NEWLINK message type and one of the netns attributes (IFLA_NET_NS_PID,
      IFLA_NET_NS_FD, IFLA_TARGET_NETNSID) but without specifying IFLA_IFNAME, and
      the target namespace already has a device with the same name, userspace
      gets EINVAL, which is confusing and makes debugging harder.
      
      Fix it so that userspace gets the more appropriate EEXIST instead, which
      makes debugging much easier.
      
      Before:
      
        # ./ifname.sh
        + ip netns add ns0
        + ip netns exec ns0 ip link add l0 type dummy
        + ip netns exec ns0 ip link show l0
        8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 66:90:b5:d5:78:69 brd ff:ff:ff:ff:ff:ff
        + ip link add l0 type dummy
        + ip link show l0
        10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 6e:c6:1f:15:20:8d brd ff:ff:ff:ff:ff:ff
        + ip link set l0 netns ns0
        RTNETLINK answers: Invalid argument
      
      After:
      
        # ./ifname.sh
        + ip netns add ns0
        + ip netns exec ns0 ip link add l0 type dummy
        + ip netns exec ns0 ip link show l0
        8: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether 1e:4a:72:e3:e3:8f brd ff:ff:ff:ff:ff:ff
        + ip link add l0 type dummy
        + ip link show l0
        10: l0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
            link/ether f2:fc:fe:2b:7d:a6 brd ff:ff:ff:ff:ff:ff
        + ip link set l0 netns ns0
        RTNETLINK answers: File exists
      
      The problem is that do_setlink() passes its `char *ifname` argument,
      that it gets from a caller, to __dev_change_net_namespace() as is (as
      `const char *pat`), but semantics of ifname and pat can be different.
      
      For example, __rtnl_newlink() does this:
      
      net/core/rtnetlink.c
          3270	char ifname[IFNAMSIZ];
           ...
          3286	if (tb[IFLA_IFNAME])
          3287		nla_strscpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
          3288	else
          3289		ifname[0] = '\0';
           ...
          3364	if (dev) {
           ...
          3394		return do_setlink(skb, dev, ifm, extack, tb, ifname, status);
          3395	}
      
      , i.e. do_setlink() gets ifname pointer that is always valid no matter
      if user specified IFLA_IFNAME or not and then do_setlink() passes this
      ifname pointer as is to __dev_change_net_namespace() as pat argument.
      
      But the pat (pattern) in __dev_change_net_namespace() is used as:
      
      net/core/dev.c
         11198	err = -EEXIST;
         11199	if (__dev_get_by_name(net, dev->name)) {
         11200		/* We get here if we can't use the current device name */
         11201		if (!pat)
         11202			goto out;
         11203		err = dev_get_valid_name(net, dev, pat);
         11204		if (err < 0)
         11205			goto out;
         11206	}
      
      As a result, the `goto out` path on line 11202 is never taken and,
      instead of returning EEXIST defined on line 11198,
      __dev_change_net_namespace() returns an error from dev_get_valid_name(),
      which, in turn, will be EINVAL for the ifname[0] = '\0' set earlier.
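
      The difference between a NULL pattern and an empty-string pattern can be modelled in a few lines of userspace C. This is only a sketch: pick_name() and ifname_to_pat() are hypothetical helpers, and the empty-pattern branch merely imitates the EINVAL that the message attributes to dev_get_valid_name(). Mapping an empty buffer to NULL on the caller side is one plausible way to restore the EEXIST path.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the name-conflict branch:
 * name_taken says whether the target namespace already has the name.
 */
static int pick_name(int name_taken, const char *pat)
{
    if (!name_taken)
        return 0;           /* current name is free, nothing to do */
    if (pat == NULL)
        return -EEXIST;     /* no fallback pattern: report the clash */
    if (pat[0] == '\0')
        return -EINVAL;     /* an empty name is rejected as invalid */
    return 0;               /* assume the pattern yields a free name */
}

/* Caller-side conversion: an empty ifname buffer means "no pattern". */
static const char *ifname_to_pat(const char *ifname)
{
    return (ifname != NULL && ifname[0] != '\0') ? ifname : NULL;
}
```

      Passing the raw buffer reproduces the confusing EINVAL, while passing ifname_to_pat(ifname) yields EEXIST when the name is already taken.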
      
      Fixes: d8a5ec67 ("[NET]: netlink support for moving devices between network namespaces.")
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock: remove one redundant SKB_FRAG_PAGE_ORDER macro · 723783d0
      Yunsheng Lin committed
      Both SKB_FRAG_PAGE_ORDER are defined to the same value in
      net/core/sock.c and drivers/vhost/net.c.
      
      Move the SKB_FRAG_PAGE_ORDER definition to net/core/sock.h,
      as both net/core/sock.c and drivers/vhost/net.c include it,
      and it seems a reasonable file to put the macro.
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: use siphash instead of Jenkins in fnhe_hashfun() · 6457378f
      Eric Dumazet committed
      A group of security researchers brought to our attention
      the weakness of the hash function used in fnhe_hashfun().
      
      Let's use siphash instead of Jenkins Hash, to considerably
      reduce security risks.
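
      For readers unfamiliar with siphash, the following is a self-contained userspace sketch of SipHash-2-4 applied to a 32-bit address and reduced to a bucket index. It is not the kernel's implementation: siphash24() and exception_bucket() are hypothetical names, and taking the top `bits` bits of the result is an assumed reduction chosen for brevity. The property that matters is that the bucket depends on a secret 128-bit key, so an attacker who does not know the key cannot precompute colliding addresses.

```c
#include <stddef.h>
#include <stdint.h>

struct sip_state {
    uint64_t v0, v1, v2, v3;
};

static uint64_t rotl64(uint64_t x, unsigned int b)
{
    return (x << b) | (x >> (64 - b));
}

/* One SipRound, as defined by the SipHash specification. */
static void sip_round(struct sip_state *s)
{
    s->v0 += s->v1; s->v1 = rotl64(s->v1, 13); s->v1 ^= s->v0;
    s->v0 = rotl64(s->v0, 32);
    s->v2 += s->v3; s->v3 = rotl64(s->v3, 16); s->v3 ^= s->v2;
    s->v0 += s->v3; s->v3 = rotl64(s->v3, 21); s->v3 ^= s->v0;
    s->v2 += s->v1; s->v1 = rotl64(s->v1, 17); s->v1 ^= s->v2;
    s->v2 = rotl64(s->v2, 32);
}

/* SipHash-2-4 of len bytes under the 128-bit key (k0, k1). */
static uint64_t siphash24(const uint8_t *data, size_t len,
                          uint64_t k0, uint64_t k1)
{
    struct sip_state s = {
        k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
        k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL,
    };
    size_t full = len - (len % 8);
    uint64_t last = (uint64_t)len << 56;   /* length goes in the top byte */
    size_t i, j;

    for (i = 0; i < full; i += 8) {        /* whole little-endian words */
        uint64_t m = 0;

        for (j = 0; j < 8; j++)
            m |= (uint64_t)data[i + j] << (8 * j);
        s.v3 ^= m;
        sip_round(&s);
        sip_round(&s);
        s.v0 ^= m;
    }
    for (i = full; i < len; i++)           /* trailing bytes */
        last |= (uint64_t)data[i] << (8 * (i - full));
    s.v3 ^= last;
    sip_round(&s);
    sip_round(&s);
    s.v0 ^= last;
    s.v2 ^= 0xff;
    for (i = 0; i < 4; i++)                /* four finalization rounds */
        sip_round(&s);
    return s.v0 ^ s.v1 ^ s.v2 ^ s.v3;
}

/* Map an IPv4 address to one of 2^bits buckets (bits must be 1..32). */
static unsigned int exception_bucket(uint32_t daddr, uint64_t k0,
                                     uint64_t k1, unsigned int bits)
{
    uint8_t buf[4] = {
        (uint8_t)daddr, (uint8_t)(daddr >> 8),
        (uint8_t)(daddr >> 16), (uint8_t)(daddr >> 24),
    };

    return (unsigned int)(siphash24(buf, sizeof(buf), k0, k1) >> (64 - bits));
}
```

      In a real deployment the key would be generated randomly at boot and kept secret; the fixed keys a caller might pass here are for experimentation only.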
      
      Also remove the inline keyword, this really is distracting.
      
      Fixes: d546c621 ("ipv4: harden fnhe_hashfun()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Keyu Man <kman001@ucr.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: use siphash in rt6_exception_hash() · 4785305c
      Eric Dumazet committed
      A group of security researchers brought to our attention
      the weakness of the hash function used in rt6_exception_hash().
      
      Let's use siphash instead of Jenkins Hash, to considerably
      reduce security risks.
      
      Following patch deals with IPv4.
      
      Fixes: 35732d01 ("ipv6: introduce a hash table to store dst cache")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Keyu Man <kman001@ucr.edu>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cfg80211: use wiphy DFS domain if it is self-managed · 90bd5bee
      Sriram R committed
      Currently during CAC start or other radar events, the DFS
      domain is fetched from cfg based on global DFS domain,
      even if the wiphy regdomain disagrees.
      
      But this could be different for self-managed wiphys, for example
      when the self-managed driver updates its database or supports
      regions that have the DFS domain set to UNSET in the cfg80211 local
      regdomain.
      
      So for explicitly self-managed wiphys, just use their DFS
      domain.
      Signed-off-by: Sriram R <srirrama@codeaurora.org>
      Link: https://lore.kernel.org/r/1629934730-16388-1-git-send-email-srirrama@codeaurora.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • mac80211: parse transmit power envelope element · b0345850
      Wen Gong committed
      Parse and store the transmit power envelope element.
      Signed-off-by: Wen Gong <wgong@codeaurora.org>
      Link: https://lore.kernel.org/r/20210820122041.12157-8-wgong@codeaurora.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • SUNRPC: Fix XPT_BUSY flag leakage in svc_handle_xprt()... · 062b829c
      Trond Myklebust committed
      If the attempt to reserve a slot fails, we currently leak the XPT_BUSY
      flag on the socket. Among other things, this makes it impossible to close
      the socket.
      
      Fixes: 82011c80 ("SUNRPC: Move svc_xprt_received() call sites")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  3. 25 Aug, 2021 10 commits
    • net/sched: ets: fix crash when flipping from 'strict' to 'quantum' · cd9b50ad
      Davide Caratti committed
      While running kselftests, Hangbin observed that sch_ets.sh often crashes,
      and splats like the following one are seen in the output of 'dmesg':
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 159f12067 P4D 159f12067 PUD 159f13067 PMD 0
       Oops: 0000 [#1] SMP NOPTI
       CPU: 2 PID: 921 Comm: tc Not tainted 5.14.0-rc6+ #458
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:__list_del_entry_valid+0x2d/0x50
       Code: 48 8b 57 08 48 b9 00 01 00 00 00 00 ad de 48 39 c8 0f 84 ac 6e 5b 00 48 b9 22 01 00 00 00 00 ad de 48 39 ca 0f 84 cf 6e 5b 00 <48> 8b 32 48 39 fe 0f 85 af 6e 5b 00 48 8b 50 08 48 39 f2 0f 85 94
       RSP: 0018:ffffb2da005c3890 EFLAGS: 00010217
       RAX: 0000000000000000 RBX: ffff9073ba23f800 RCX: dead000000000122
       RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff9073ba23fbc8
       RBP: ffff9073ba23f890 R08: 0000000000000001 R09: 0000000000000001
       R10: 0000000000000001 R11: 0000000000000001 R12: dead000000000100
       R13: ffff9073ba23fb00 R14: 0000000000000002 R15: 0000000000000002
       FS:  00007f93e5564e40(0000) GS:ffff9073bba00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 000000014ad34000 CR4: 0000000000350ee0
       Call Trace:
        ets_qdisc_reset+0x6e/0x100 [sch_ets]
        qdisc_reset+0x49/0x1d0
        tbf_reset+0x15/0x60 [sch_tbf]
        qdisc_reset+0x49/0x1d0
        dev_reset_queue.constprop.42+0x2f/0x90
        dev_deactivate_many+0x1d3/0x3d0
        dev_deactivate+0x56/0x90
        qdisc_graft+0x47e/0x5a0
        tc_get_qdisc+0x1db/0x3e0
        rtnetlink_rcv_msg+0x164/0x4c0
        netlink_rcv_skb+0x50/0x100
        netlink_unicast+0x1a5/0x280
        netlink_sendmsg+0x242/0x480
        sock_sendmsg+0x5b/0x60
        ____sys_sendmsg+0x1f2/0x260
        ___sys_sendmsg+0x7c/0xc0
        __sys_sendmsg+0x57/0xa0
        do_syscall_64+0x3a/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7f93e44b8338
       Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
       RSP: 002b:00007ffc0db737a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000061255c06 RCX: 00007f93e44b8338
       RDX: 0000000000000000 RSI: 00007ffc0db73810 RDI: 0000000000000003
       RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
       R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000001
       R13: 0000000000687880 R14: 0000000000000000 R15: 0000000000000000
       Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev i2c_i801 pcspkr i2c_smbus lpc_ich virtio_balloon ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel ahci libahci ghash_clmulni_intel libata serio_raw virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod
       CR2: 0000000000000000
      
      When the change() function decreases the value of 'nstrict', we must take
      into account that packets might be already enqueued on a class that flips
      from 'strict' to 'quantum': otherwise that class will not be added to the
      bandwidth-sharing list. Then, a call to ets_qdisc_reset() will attempt to
      do list_del(&alist) with 'alist' filled with zero, hence the NULL pointer
      dereference.
      For classes flipping from 'strict' to 'quantum', initialize an empty list
      and eventually add it to the bandwidth-sharing list, if there are packets
      already enqueued. In this way, the kernel will:
       a) prevent crashing as described above.
       b) avoid retaining the backlog packets (for an arbitrarily long time) in
          case no packet is enqueued after a change from 'strict' to 'quantum'.
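
      The list handling described above can be sketched with a tiny userspace model in C. Everything here is hypothetical (a hand-rolled circular list, struct ets_class_model, class_to_quantum(), class_reset()); it only shows why initializing the list node before any later deletion avoids the NULL dereference, and why a class with backlog is queued on the active list right away.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n and leave it self-linked; crashes if n was zero-filled. */
static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical model of one class. */
struct ets_class_model {
    struct list_head alist;
    unsigned int backlog;   /* packets already enqueued */
};

/* Called for a class that flips from 'strict' to 'quantum'. */
static void class_to_quantum(struct ets_class_model *cl,
                             struct list_head *active)
{
    list_init(&cl->alist);                  /* never leave it zero-filled */
    if (cl->backlog > 0)
        list_add_tail(&cl->alist, active);  /* backlog gets scheduled */
}

/* Reset path: safe only because alist is always initialized. */
static void class_reset(struct ets_class_model *cl)
{
    list_del_init(&cl->alist);
    cl->backlog = 0;
}
```

      Deleting a self-linked node is a harmless no-op, whereas deleting a zero-filled node dereferences NULL, which is the crash shown in the trace.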
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Fixes: dcc68b4d ("net: sch_ets: Add a new Qdisc")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid · 8ded9160
      Vladimir Oltean committed
      Introduced in commit 38b5beea ("net: dsa: sja1105: prepare tagger
      for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid
      function solved quite a different problem than our needs are now.
      
      Then, we used best-effort VLAN filtering and we were using the xmit_tpid
      to tunnel packets coming from an 8021q upper through the TX VLAN allocated
      by tag_8021q to that egress port. The need for a different VLAN protocol
      depending on switch revision came from the fact that this in itself was
      more of a hack to trick the hardware into accepting tunneled VLANs in
      the first place.
      
      Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we
      supported them again, we would not do that using the same method of
      {tunneling the VLAN on egress, retagging the VLAN on ingress} that we
      had in the best-effort VLAN filtering mode. It seems rather simpler that
      we just allocate a VLAN in the VLAN table that is simply not used by the
      bridge at all, or by any other port.
      
      Anyway, I have 2 gripes with the current sja1105_xmit_tpid:
      
      1. When sending packets on behalf of a VLAN-aware bridge (with the new
         TX forwarding offload framework) plus untagged (with the tag_8021q
         VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S
         and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent
         through the DSA master have a VLAN protocol of 0x8100 and others of
         0x88a8. This is strange and there is no reason for it now. If we have
         a bridge and are therefore forced to send using that bridge's TPID,
         we can as well blend with that bridge's VLAN protocol for all packets.
      
      2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver,
         because it looks inside dp->priv. It is desirable to keep as much
         separation between taggers and switch drivers as possible. Now it
         doesn't do that anymore.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: drop untagged packets on the CPU and DSA ports · b0b8c67e
      Vladimir Oltean committed
      The sja1105 driver is a bit special in its use of VLAN headers as DSA
      tags. This is because in VLAN-aware mode, the VLAN headers use an actual
      TPID of 0x8100, which is understood even by the DSA master as an actual
      VLAN header.
      
      Furthermore, control packets such as PTP and STP are transmitted with no
      VLAN header as a DSA tag, because, depending on switch generation, there
      are ways to steer these control packets towards a precise egress port
      other than VLAN tags. Transmitting control packets as untagged means
      leaving a door open for traffic in general to be transmitted as untagged
      from the DSA master, and for it to traverse the switch and exit a random
      switch port according to the FDB lookup.
      
      This behavior is a bit out of line with other DSA drivers which have
      native support for DSA tagging. There, it is to be expected that the
      switch only accepts DSA-tagged packets on its CPU port, dropping
      everything that does not match this pattern.
      
      We perhaps rely a bit too much on the switches' hardware dropping on the
      CPU port, and place no other restrictions in the kernel data path to
      avoid that. For example, sja1105 is also a bit special in that STP/PTP
      packets are transmitted using "management routes"
      (sja1105_port_deferred_xmit): when sending a link-local packet from the
      CPU, we must first write a SPI message to the switch to tell it to
      expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route
      it towards port 3 when it gets it. This entry expires as soon as it
      matches a packet received by the switch, and it needs to be reinstalled
      for the next packet etc. All in all quite a ghetto mechanism, but it is
      all that the sja1105 switches offer for injecting a control packet.
      The driver takes a mutex for serializing control packets and making the
      pairs of SPI writes of a management route and its associated skb atomic,
      but to be honest, a mutex is only relevant as long as all parties agree
      to take it. With the DSA design, it is possible to open an AF_PACKET
      socket on the DSA master net device, and blast packets towards
      01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use,
      it all goes kaput because management routes installed by the driver will
      match skbs sent by the DSA master, and not skbs generated by the driver
      itself. So they will end up being routed on the wrong port.
      
      So through the lens of that, maybe it would make sense to avoid that
      from happening by doing something in the network stack, like: introduce
      a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around
      dev_hard_start_xmit(), introduce the following check:
      
      	if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa)
      		kfree_skb(skb);
      
      Ok, maybe that is a bit drastic, but that would at least prevent a bunch
      of problems. For example, right now, even though the majority of DSA
      switches drop packets without DSA tags sent by the DSA master (and
      therefore the majority of garbage that user space daemons like avahi and
      udhcpcd and friends create), it is still conceivable that an aggressive
      user space program can open an AF_PACKET socket and inject a spoofed DSA
      tag directly on the DSA master. We have no protection against that; the
      packet will be understood by the switch and be routed wherever user
      space says. Furthermore: there are some DSA switches where we even have
      register access over Ethernet, using DSA tags. So even user space
      drivers are possible in this way. This is a huge hole.
      
      However, the biggest thing that bothers me is that udhcpcd attempts to
      ask for an IP address on all interfaces by default, and with sja1105, it
      will attempt to get a valid IP address on both the DSA master as well as
      on sja1105 switch ports themselves. So with IP addresses in the same
      subnet on multiple interfaces, the routing table will be messed up and
      the system will be unusable for traffic until it is configured manually
      to not ask for an IP address on the DSA master itself.
      
      It turns out that it is possible to avoid that in the sja1105 driver, at
      least very superficially, by requesting the switch to drop VLAN-untagged
      packets on the CPU port. With the exception of control packets, all
      traffic originated from tag_sja1105.c is already VLAN-tagged, so only
      STP and PTP packets need to be converted. For that, we need to uphold
      the equivalence between an untagged and a pvid-tagged packet, and to
      remember that the CPU port of sja1105 uses a pvid of 4095.
      
      Now that we drop untagged traffic on the CPU port, non-aggressive user
      space applications like udhcpcd stop bothering us, and sja1105 effectively
      becomes just as vulnerable to the aggressive kind of user space programs
      as other DSA switches are (ok, users can also create 8021q uppers on top
      of the DSA master in the case of sja1105, but in future patches we can
      easily deny that, but it still doesn't change the fact that VLAN-tagged
      packets can still be injected over raw sockets).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add the mibs for MP_FAIL · eb7f3365
      Geliang Tang committed
      This patch added the mibs for MP_FAIL: MPTCP_MIB_MPFAILTX and
      MPTCP_MIB_MPFAILRX.
      Signed-off-by: Geliang Tang <geliangtang@xiaomi.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: send out MP_FAIL when data checksum fails · 478d7700
      Geliang Tang committed
      When a bad checksum is detected, set the send_mp_fail flag to send out
      the MP_FAIL option.
      
      Add a new function mptcp_has_another_subflow() to check whether there's
      only a single subflow.
      
      When multiple subflows are in use, close the affected subflow with a RST
      that includes an MP_FAIL option and discard the data with the bad
      checksum.
      
      Set the sk_state of the subsocket to TCP_CLOSE, then the flag
      MPTCP_WORK_CLOSE_SUBFLOW will be set in subflow_sched_work_if_closed,
      and the subflow will be closed.
      
      When a single subflow is in use, this is temporarily handled by sending
      MP_FAIL with a RST too.
      Signed-off-by: Geliang Tang <geliangtang@xiaomi.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: MP_FAIL suboption receiving · 5580d41b
      Geliang Tang committed
      This patch added handling for receiving MP_FAIL suboption.
      
      Add new members mp_fail and fail_seq to struct mptcp_options_received.
      When an MP_FAIL suboption is received, set mp_fail to 1 and save the
      sequence number to fail_seq.
      
      Then invoke mptcp_pm_mp_fail_received to deal with the MP_FAIL suboption.
      Signed-off-by: Geliang Tang <geliangtang@xiaomi.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: MP_FAIL suboption sending · c25aeb4e
      Geliang Tang committed
      This patch added the MP_FAIL suboption sending support.
      
      Add a new flag named send_mp_fail in struct mptcp_subflow_context. If
      this flag is set, send out MP_FAIL suboption.
      
      Add a new member fail_seq in struct mptcp_out_options to save the data
      sequence number to put into the MP_FAIL suboption.
      
      An MP_FAIL option could be included in a RST or on the subflow-level
      ACK.
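
      To make the wire format concrete, here is a hedged userspace sketch in C of encoding and decoding an MP_FAIL option as laid out in RFC 8684: option kind 30, length 12, subtype 0x6 in the upper four bits of the third byte, reserved bits set to zero, then the 8-byte data sequence number in network byte order. The helper names mp_fail_encode() and mp_fail_decode() and the enum constants are illustrative, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    OPT_KIND_MPTCP = 30,    /* TCP option kind used by MPTCP */
    OPT_SUB_MP_FAIL = 0x6,  /* MP_FAIL subtype */
    OPT_MP_FAIL_LEN = 12,   /* kind + len + subtype/reserved + 8-byte DSN */
};

/* Write an MP_FAIL option into buf. Returns bytes written, 0 on error. */
static size_t mp_fail_encode(uint8_t *buf, size_t buflen, uint64_t fail_seq)
{
    size_t i;

    if (buf == NULL || buflen < (size_t)OPT_MP_FAIL_LEN)
        return 0;
    buf[0] = OPT_KIND_MPTCP;
    buf[1] = OPT_MP_FAIL_LEN;
    buf[2] = OPT_SUB_MP_FAIL << 4;  /* subtype in the high nibble */
    buf[3] = 0;                     /* reserved */
    for (i = 0; i < 8; i++)         /* big-endian data sequence number */
        buf[4 + i] = (uint8_t)(fail_seq >> (56 - 8 * i));
    return OPT_MP_FAIL_LEN;
}

/* Parse an MP_FAIL option. Returns 0 on success, -1 if it does not match. */
static int mp_fail_decode(const uint8_t *buf, size_t len, uint64_t *fail_seq)
{
    uint64_t seq = 0;
    size_t i;

    if (buf == NULL || fail_seq == NULL || len < (size_t)OPT_MP_FAIL_LEN)
        return -1;
    if (buf[0] != OPT_KIND_MPTCP || buf[1] != OPT_MP_FAIL_LEN ||
        (buf[2] >> 4) != OPT_SUB_MP_FAIL)
        return -1;
    for (i = 0; i < 8; i++)
        seq = (seq << 8) | buf[4 + i];
    *fail_seq = seq;
    return 0;
}
```

      The sender would fill fail_seq from the value saved in the out-options, and the receiver would store the decoded value for later handling.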
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Geliang Tang <geliangtang@xiaomi.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: optimize out option generation · 1bff1e43
      Paolo Abeni committed
      Currently we have several protocol constraints on MPTCP options
      generation (e.g. MPC and MPJ subopt are mutually exclusive)
      and some additional ones required by our implementation
      (e.g. almost all ADD_ADDR variants are mutually exclusive with
      everything else).
      
      We can leverage the above to optimize the out option generation:
      we check DSS/MPC/MPJ presence in a mutually exclusive way,
      avoiding many unneeded conditionals in the common cases.
      
      Additionally extend the existing constraints on ADD_ADDR opt on
      all subvariants, so that it becomes fully mutually exclusive with
      the above and we can skip another conditional statement for the
      common case.
      
      This change is also needed by the next patch.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-next: When a bond has a massive amount of VLANs with IPv6 addresses,... · 406f42fa
      Gilad Naaman committed
      net-next: When a bond has a massive amount of VLANs with IPv6 addresses, the performance of changing link state, attaching a VRF, changing an IPv6 address, etc. goes down dramatically.
      
      The source of most of the slowdown is the `dev_addr_lists.c` module,
      which maintains a linked list of HW addresses.
      When using IPv6, this list grows for each IPv6 address added on a
      VLAN, since each IPv6 address has a multicast HW address associated with
      it.
      
      When performing any modification to the involved links, this list is
      traversed many times, often for nothing, all while holding the RTNL
      lock.
      
      Instead, this patch adds an auxiliary rbtree which cuts down
      traversal time significantly.
      
      Performance can be seen with the following script:
      
      	#!/bin/bash
      	ip netns del test 2>/dev/null || true
      	ip netns add test
      
      	echo 1 | ip netns exec test tee /proc/sys/net/ipv6/conf/all/keep_addr_on_down > /dev/null
      
      	set -e
      
      	ip -n test link add foo type veth peer name bar
      	ip -n test link add b1 type bond
      	ip -n test link add florp type vrf table 10
      
      	ip -n test link set bar master b1
      	ip -n test link set foo up
      	ip -n test link set bar up
      	ip -n test link set b1 up
      	ip -n test link set florp up
      
      	VLAN_COUNT=1500
      	BASE_DEV=b1
      
      	echo Creating vlans
      	ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
      	do ip -n test link add link $BASE_DEV name foo.\$i type vlan id \$i; done"
      
      	echo Bringing them up
      	ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
      	do ip -n test link set foo.\$i up; done"
      
      	echo Assiging IPv6 Addresses
      	ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
      	do ip -n test address add dev foo.\$i 2000::\$i/64; done"
      
      	echo Attaching to VRF
      	ip netns exec test time -p bash -c "for i in \$(seq 1 $VLAN_COUNT);
      	do ip -n test link set foo.\$i master florp; done"
      
      On an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz machine, the performance
      before the patch is (truncated):
      
      	Creating vlans
      	real 108.35
      	Bringing them up
      	real 4.96
      	Assiging IPv6 Addresses
      	real 19.22
      	Attaching to VRF
      	real 458.84
      
      After the patch:
      
      	Creating vlans
      	real 5.59
      	Bringing them up
      	real 5.07
      	Assiging IPv6 Addresses
      	real 5.64
      	Attaching to VRF
      	real 25.37
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Lu Wei <luwei32@huawei.com>
      Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Cc: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: change return type of br_handle_ingress_vlan_tunnel · a37c5c26
      Kangmin Park committed
      br_handle_ingress_vlan_tunnel() is only referenced in
      br_handle_frame(). If br_handle_ingress_vlan_tunnel() is called and
      returns a non-zero value, br_handle_frame() jumps to drop.
      
      However, br_handle_ingress_vlan_tunnel() always returns 0, so the
      code that checks the return value and jumps to drop is meaningless.
      
      Therefore, change the return type of br_handle_ingress_vlan_tunnel() to
      void and remove the if statement in br_handle_frame().
      Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20210823102118.17966-1-l4stpr0gr4m@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 24 Aug, 2021 15 commits
    • ethtool: extend coalesce setting uAPI with CQE mode · f3ccfda1
      Yufeng Mo committed
      In order to support more coalesce parameters through netlink,
      add two new parameters, kernel_coal and extack, to .set_coalesce
      and .get_coalesce, so that some extra information can be returned
      to the user through the netlink API.
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ethtool: add two coalesce attributes for CQE mode · 029ee6b1
      Yufeng Mo committed
      Currently, many drivers support CQE mode configuration: some
      configure it as a fixed value at initialization, and some provide an
      interface to change it through ethtool private flags. In order to make
      it more generic, add two new coalesce attributes,
      'ETHTOOL_A_COALESCE_USE_CQE_TX' and 'ETHTOOL_A_COALESCE_USE_CQE_RX',
      so that these parameters can be accessed through the ethtool netlink
      coalesce uAPI.
      
      Also add a new structure, kernel_ethtool_coalesce, so that the
      new parameters can be added to this struct.
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • page_pool: use relaxed atomic for release side accounting · 7fb9b66d
      Yunsheng Lin committed
      There is no need to synchronize the accounting update, so
      use a relaxed atomic to avoid some memory barriers in the
      data path.
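
      The idea can be illustrated in portable C11 with <stdatomic.h>. This is a userspace sketch under stated assumptions, not the page_pool code: struct pool_stats, pool_note_release(), and pool_inflight() are hypothetical names. A counter that is only used for accounting still needs an atomic read-modify-write, but it does not need to order surrounding memory accesses, so memory_order_relaxed is enough.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical accounting state for a pool of pages. */
struct pool_stats {
    uint32_t hold_cnt;          /* pages handed out (single writer) */
    atomic_uint release_cnt;    /* pages returned (many writers) */
};

/* Release side: atomic increment with no ordering guarantees attached. */
static void pool_note_release(struct pool_stats *p)
{
    atomic_fetch_add_explicit(&p->release_cnt, 1, memory_order_relaxed);
}

/* Pages still in flight; unsigned wrap-around keeps the difference valid. */
static int32_t pool_inflight(struct pool_stats *p)
{
    uint32_t released = atomic_load_explicit(&p->release_cnt,
                                             memory_order_relaxed);

    return (int32_t)(p->hold_cnt - released);
}
```

      On weakly ordered architectures a relaxed increment can avoid the barriers that a fully ordered read-modify-write would imply, which is the saving the message refers to.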
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: correct comments about fib6_node sernum · 446e7f21
      zhang kai committed
      correct comments in set and get fn_sernum
      Signed-off-by: zhang kai <zhangkaiheb@126.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: let drivers state that they need VLAN filtering while standalone · 58adf9dc
      Vladimir Oltean committed
      As explained in commit e358bef7 ("net: dsa: Give drivers the chance
      to veto certain upper devices"), the hellcreek driver uses some tricks
      to comply with the network stack expectations: it enforces port
      separation in standalone mode using VLANs. For untagged traffic,
      bridging between ports is prevented by using different PVIDs, and for
      VLAN-tagged traffic, it never accepts 8021q uppers with the same VID on
      two ports, so packets with one VLAN cannot leak from one port to another.
      
      That is almost fine*, and has worked because hellcreek relied on an
      implicit behavior of the DSA core that was changed by the previous
      patch: the standalone ports declare the 'rx-vlan-filter' feature as 'on
      [fixed]'. Since most of the DSA drivers are actually VLAN-unaware in
      standalone mode, that feature was actually incorrectly reflecting the
      hardware/driver state, so there was a desire to fix it. This leaves the
      hellcreek driver in a situation where it has to explicitly request this
      behavior from the DSA framework.
      
      We configure the ports as follows:
      
      - Standalone: 'rx-vlan-filter' is on. An 8021q upper on top of a
        standalone hellcreek port will go through dsa_slave_vlan_rx_add_vid
        and will add a VLAN to the hardware tables, giving the driver the
        opportunity to refuse it through .port_prechangeupper.
      
      - Bridged with vlan_filtering=0: 'rx-vlan-filter' is off. An 8021q upper
        on top of a bridged hellcreek port will not go through
        dsa_slave_vlan_rx_add_vid, because there will not be any attempt to
        offload this VLAN. The driver already disables VLAN awareness, so that
        upper should receive the traffic it needs.
      
      - Bridged with vlan_filtering=1: 'rx-vlan-filter' is on. An 8021q upper
        on top of a bridged hellcreek port will call dsa_slave_vlan_rx_add_vid,
        and can again be vetoed through .port_prechangeupper.
      
      *It is not actually completely fine, because if I follow through
      correctly, we can have the following situation:
      
      ip link add br0 type bridge vlan_filtering 0
      ip link set lan0 master br0 # lan0 now becomes VLAN-unaware
      ip link set lan0 nomaster # lan0 fails to become VLAN-aware again, therefore breaking isolation
      
      This patch fixes that corner case by extending the DSA core logic, based
      on this requested attribute, to change the VLAN awareness state of the
      switch (port) when it leaves the bridge.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      58adf9dc
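      The three port configurations listed in 58adf9dc can be summarised as a
      small decision function. This is only a sketch under stated assumptions:
      enum port_mode, rx_vlan_filter_on() and the needs_standalone_filtering
      flag are hypothetical names that model the driver's request, and the
      sketch ignores the vlan_filtering_is_global case.

```c
#include <stdbool.h>

/* Hypothetical model of the states a DSA user port can be in. */
enum port_mode {
	PORT_STANDALONE,
	PORT_BRIDGED_VLAN_UNAWARE,	/* bridge with vlan_filtering=0 */
	PORT_BRIDGED_VLAN_AWARE,	/* bridge with vlan_filtering=1 */
};

/*
 * Return whether 'rx-vlan-filter' would be on for a port.
 * needs_standalone_filtering models a driver (such as hellcreek) that
 * asks for VLAN filtering even while its ports are standalone.
 */
static bool rx_vlan_filter_on(enum port_mode mode,
			      bool needs_standalone_filtering)
{
	switch (mode) {
	case PORT_STANDALONE:
		return needs_standalone_filtering;
	case PORT_BRIDGED_VLAN_UNAWARE:
		return false;
	case PORT_BRIDGED_VLAN_AWARE:
		return true;
	}
	return false;
}
```

      With the flag set, a port that leaves a VLAN-unaware bridge goes back to
      PORT_STANDALONE and gets filtering turned on again, which is the corner
      case the commit message describes.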
    • V
      net: dsa: don't advertise 'rx-vlan-filter' when not needed · 06cfb2df
      Vladimir Oltean committed
      There have been multiple independent reports about
      dsa_slave_vlan_rx_add_vid being called (and consequently calling the
      drivers' .port_vlan_add) when it isn't needed, and sometimes (not
      always) causing problems in the process.
      
      Case 1:
      mv88e6xxx_port_vlan_prepare is stubborn and only accepts VLANs on
      bridged ports. That is understandably so, because standalone mv88e6xxx
      ports are VLAN-unaware, and VTU entries are said to be a scarce
      resource.
      
      In other words, the following fails lamentably on mv88e6xxx:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link set lan3 master br0
      ip link add link lan10 name lan10.1 type vlan id 1
      [485256.724147] mv88e6085 d0032004.mdio-mii:12: p10: hw VLAN 1 already used by port 3 in br0
      RTNETLINK answers: Operation not supported
      
      This has become a worse issue since commit 9b236d2a ("net: dsa:
      Advertise the VLAN offload netdev ability only if switch supports it").
      Up to that point, the driver was returning -EOPNOTSUPP and DSA was
      reconverting that error to 0, making the 8021q upper think all is ok
      (but obviously the error message was there even prior to this change).
      After that change the -EOPNOTSUPP is propagated to vlan_vid_add, and it
      is a hard error.
      
      Case 2:
      Ports that don't offload the Linux bridge (have a dp->bridge_dev = NULL
      because they don't implement .port_bridge_{join,leave}). Understandably,
      a standalone port should not offload VLANs either, it should remain VLAN
      unaware and any VLAN should be a software VLAN (as long as the hardware
      is not quirky, that is).
      
      In fact, dsa_slave_port_obj_add does do the right thing and rejects
      switchdev VLAN objects coming from the bridge when that bridge is not
      offloaded:
      
      	case SWITCHDEV_OBJ_ID_PORT_VLAN:
      		if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
      			return -EOPNOTSUPP;
      
      		err = dsa_slave_vlan_add(dev, obj, extack);
      
      But it seems that the bridge is able to trick us. The __vlan_vid_add
      from br_vlan.c has:
      
      	/* Try switchdev op first. In case it is not supported, fallback to
      	 * 8021q add.
      	 */
      	err = br_switchdev_port_vlan_add(dev, v->vid, flags, extack);
      	if (err == -EOPNOTSUPP)
      		return vlan_vid_add(dev, br->vlan_proto, v->vid);
      
      So it says "no, no, you need this VLAN in your life!". And we, naive as
      we are, say "oh, this comes from the vlan_vid_add code path, it must be
      an 8021q upper, sure, I'll take that". And we end up with that bridge
      VLAN installed on our port anyway. But this time, it has the wrong flags:
      if the bridge was trying to install VLAN 1 as a pvid/untagged VLAN,
      failed via switchdev, retried via vlan_vid_add, we have this comment:
      
      	/* This API only allows programming tagged, non-PVID VIDs */
      
      So what we do makes absolutely no sense.
      
      Backtracing a bit, we see the common pattern. We allow the network stack
      to think that our standalone ports are VLAN-aware, but they aren't, for
      the vast majority of switches. The quirky ones should not dictate the
      norm. The dsa_slave_vlan_rx_add_vid and dsa_slave_vlan_rx_kill_vid
      methods exist for drivers that need the 'rx-vlan-filter: on' feature in
      ethtool -k, which can be due to any of the following reasons:
      
      1. vlan_filtering_is_global = true, and some ports are under a
         VLAN-aware bridge while others are standalone, and the standalone
         ports would otherwise drop VLAN-tagged traffic. This is described in
         commit 061f6a50 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid
         implementation").
      
      2. the ports that are under a VLAN-aware bridge should also set this
         feature, for 8021q uppers having a VID not claimed by the bridge.
         In this case, the driver will essentially not even know that the VID
         is coming from the 8021q layer and not the bridge.
      
      3. Hellcreek. This driver needs it because in standalone mode, it uses
         unique VLANs per port to ensure separation. For separation of untagged
         traffic, it uses different PVIDs for each port, and for separation of
         VLAN-tagged traffic, it never accepts 8021q uppers with the same vid
         on two ports.
      
      If a driver does not fall under any of the above 3 categories, there is
      no reason why it should advertise the 'rx-vlan-filter' feature, therefore
      no reason why it should offload the VLANs added through vlan_vid_add.
      
      This commit fixes the problem by removing the 'rx-vlan-filter' feature
      from the slave devices when they operate in standalone mode, and when
      they offload a VLAN-unaware bridge.
      
      The way it works is that vlan_vid_add will now stop its processing here:
      
      vlan_add_rx_filter_info:
      	if (!vlan_hw_filter_capable(dev, proto))
      		return 0;
      
      So the VLAN will still be saved in the interface's VLAN RX filtering
      list, but because it does not declare VLAN filtering in its features,
      the 8021q module will return zero without committing that VLAN to
      hardware.
      
      This gives the drivers what they want, since it keeps the 8021q VLANs
      away from the VLAN table until VLAN awareness is enabled (point at which
      the ports are no longer standalone, hence in the mv88e6xxx case, the
      check in mv88e6xxx_port_vlan_prepare passes).
      
      Since the issue predates the existence of the hellcreek driver, case 3
      will be dealt with in a separate patch.
      
      The main change that this patch makes is to no longer set
      NETIF_F_HW_VLAN_CTAG_FILTER unconditionally, but toggle it dynamically
      (for most switches, never).
      
      The second part of the patch addresses an issue that the first part
      introduces: because the 'rx-vlan-filter' feature is now dynamically
      toggled, and our .ndo_vlan_rx_add_vid does not get called when
      'rx-vlan-filter' is off, we need to avoid bugs such as the following by
      replaying the VLANs from 8021q uppers every time we enable VLAN
      filtering:
      
      ip link add link lan0 name lan0.100 type vlan id 100
      ip addr add 192.168.100.1/24 dev lan0.100
      ping 192.168.100.2 # should work
      ip link add br0 type bridge vlan_filtering 0
      ip link set lan0 master br0
      ping 192.168.100.2 # should still work
      ip link set br0 type bridge vlan_filtering 1
      ping 192.168.100.2 # should still work but doesn't
      
      As reported by Florian, some drivers look at ds->vlan_filtering in
      their .port_vlan_add() implementation. So this patch also makes sure
      that ds->vlan_filtering is committed before calling the driver. This is
      the reason why it is first committed, then restored on the failure path.
      Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06cfb2df
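      The two halves of 06cfb2df (skip the hardware commit while
      'rx-vlan-filter' is off, then replay the remembered VLANs when filtering
      is enabled) can be modelled in a few lines. This is a userspace sketch,
      not kernel code: struct model_dev and the model_*() functions are
      hypothetical, and "hardware" is reduced to a per-VID flag.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VIDS 8

struct model_vlan {
	unsigned short vid;
	bool in_hw;		/* true once committed to the hardware table */
};

struct model_dev {
	bool rx_vlan_filter;	/* models the 'rx-vlan-filter' feature */
	struct model_vlan vlans[MODEL_MAX_VIDS];
	size_t n_vlans;
};

/* Always remember the VID; only touch hardware when filtering is on. */
static int model_vlan_vid_add(struct model_dev *dev, unsigned short vid)
{
	if (dev->n_vlans == MODEL_MAX_VIDS)
		return -1;
	dev->vlans[dev->n_vlans].vid = vid;
	dev->vlans[dev->n_vlans].in_hw = dev->rx_vlan_filter;
	dev->n_vlans++;
	return 0;
}

/* Turn filtering on and replay every remembered VID into hardware. */
static void model_enable_filtering(struct model_dev *dev)
{
	dev->rx_vlan_filter = true;
	for (size_t i = 0; i < dev->n_vlans; i++)
		dev->vlans[i].in_hw = true;
}

/* Count the VIDs currently present in the modelled hardware table. */
static size_t model_hw_count(const struct model_dev *dev)
{
	size_t n = 0;

	for (size_t i = 0; i < dev->n_vlans; i++)
		n += dev->vlans[i].in_hw ? 1 : 0;
	return n;
}
```

      Without the replay step, a VID added while filtering was off would stay
      out of the hardware table after filtering is enabled, which corresponds
      to the failing ping in the example from the commit message.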
    • V
      net: dsa: properly fall back to software bridging · 67b5fb5d
      Vladimir Oltean committed
      If the driver does not implement .port_bridge_{join,leave}, then we must
      fall back to standalone operation on that port, and trigger the error
      path of dsa_port_bridge_join. This sets dp->bridge_dev = NULL.
      
      In turn, having a non-NULL dp->bridge_dev when there is no offloading
      support makes the following things go wrong:
      
      - dsa_default_offload_fwd_mark makes the wrong decision in setting
        skb->offload_fwd_mark. It should set skb->offload_fwd_mark = 0 for
        ports that don't offload the bridge, which should instruct the bridge
        to forward in software. But this does not happen, dp->bridge_dev is
        incorrectly set to point to the bridge, so the bridge is told that
        packets have been forwarded in hardware, which they haven't.
      
      - switchdev objects (MDBs, VLANs) should not be offloaded by ports that
        don't offload the bridge. Standalone ports should behave as packet-in,
        packet-out and the bridge should not be able to manipulate the pvid of
        the port, or tag stripping on egress, or ingress filtering. This
        should already work fine because dsa_slave_port_obj_add has:
      
      	case SWITCHDEV_OBJ_ID_PORT_VLAN:
      		if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev))
      			return -EOPNOTSUPP;
      
      		err = dsa_slave_vlan_add(dev, obj, extack);
      
        but since dsa_port_offloads_bridge_port works based on dp->bridge_dev,
        this is again sabotaging us.
      
      All of the above already works when the port has an unoffloaded LAG
      interface, so this is well exercised code; we should apply it for plain
      unoffloaded bridge ports too.
      Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      67b5fb5d
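      The fallback described in 67b5fb5d can be sketched as follows. The names
      struct model_port, model_bridge_join() and model_offload_fwd_mark() are
      hypothetical, and returning -EOPNOTSUPP from the join path is an
      assumption made for illustration; the commit message only states that
      the error path resets dp->bridge_dev to NULL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a DSA port; bridge_dev is NULL when standalone. */
struct model_port {
	const void *bridge_dev;
};

/*
 * Try to join a bridge. If the driver cannot offload it, take the error
 * path and leave the port in standalone operation (bridge_dev == NULL).
 */
static int model_bridge_join(struct model_port *dp, const void *br,
			     bool driver_has_join)
{
	dp->bridge_dev = br;
	if (!driver_has_join) {
		dp->bridge_dev = NULL;	/* fall back to software bridging */
		return -EOPNOTSUPP;
	}
	return 0;
}

/* Only claim hardware forwarding when a bridge is actually offloaded. */
static bool model_offload_fwd_mark(const struct model_port *dp)
{
	return dp->bridge_dev != NULL;
}
```

      Because every later decision keys off bridge_dev, resetting it on the
      error path is what keeps the forwarding mark and the switchdev object
      checks consistent for unoffloaded ports.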
    • V
      net: dsa: don't call switchdev_bridge_port_unoffload for unoffloaded bridge ports · 09dba21b
      Vladimir Oltean committed
      For ports that have a NULL dp->bridge_dev, dsa_port_to_bridge_port()
      also returns NULL as expected.
      
      Issue #1 is that we are performing a NULL pointer dereference on brport_dev.
      
      Issue #2 is that these are ports on which switchdev_bridge_port_offload
      has not been called, so we should not call switchdev_bridge_port_unoffload
      on them either.
      
      Both issues are addressed by checking against a NULL brport_dev in
      dsa_port_pre_bridge_leave and exiting early.
      
      Fixes: 2f5dc00f ("net: bridge: switchdev: let drivers inform which bridge ports are offloaded")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      09dba21b
    • L
      mac80211: introduce individual TWT support in AP mode · f5a4c24e
      Lorenzo Bianconi committed
      Introduce TWT action frame parsing support to mac80211.
      Currently only individual TWT agreements are supported in AP mode.
      Whenever the AP receives a TWT action frame from an associated client,
      after performing sanity checks, it will notify the underlying driver with
      the requested parameters in order to check if they are supported and if
      there is enough room for a new agreement. The driver is expected to set
      the agreement result and report it to mac80211.
      
      Drivers supporting this have two new callbacks:
       - add_twt_setup (mandatory)
       - twt_teardown_request (optional)
      
      mac80211 will send an action frame reply according to the result
      reported by the driver.
      Tested-by: Peter Chiu <chui-hao.chiu@mediatek.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/257512f2e22ba42b9f2624942a128dd8f141de4b.1629741512.git.lorenzo@kernel.org
      [use le16p_replace_bits(), minor cleanups, use (void *) casts,
       fix to use ieee80211_get_he_iftype_cap() correctly]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      f5a4c24e
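      The mandatory/optional callback split described in f5a4c24e follows a
      common ops-table pattern. The sketch below is illustrative only: struct
      twt_ops, the simplified flow_id parameter, demo_accept_even_flows() and
      the twt_handle_*() helpers are hypothetical and do not reflect the real
      mac80211 callback signatures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified driver ops table for TWT handling. */
struct twt_ops {
	/* Mandatory: decide whether the requested agreement is accepted. */
	bool (*add_twt_setup)(unsigned int flow_id);
	/* Optional: may be NULL if the driver needs no teardown hook. */
	void (*twt_teardown_request)(unsigned int flow_id);
};

/* Example driver callback for the sketch: accept only even flow IDs. */
static bool demo_accept_even_flows(unsigned int flow_id)
{
	return (flow_id % 2) == 0;
}

/* Reject the request outright when the mandatory callback is missing. */
static bool twt_handle_setup(const struct twt_ops *ops, unsigned int flow_id)
{
	if (!ops || !ops->add_twt_setup)
		return false;
	return ops->add_twt_setup(flow_id);
}

/* Call the optional teardown callback only when the driver provides it. */
static void twt_handle_teardown(const struct twt_ops *ops,
				unsigned int flow_id)
{
	if (ops && ops->twt_teardown_request)
		ops->twt_teardown_request(flow_id);
}
```

      The caller would then build its reply from the boolean result, in the
      same spirit as mac80211 answering according to what the driver reports.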
    • Y
      mptcp: remove MPTCP_ADD_ADDR_IPV6 and MPTCP_ADD_ADDR_PORT · c233ef13
      Yonglong Li committed
      MPTCP_ADD_ADDR_IPV6 and MPTCP_ADD_ADDR_PORT are not necessary; the same
      information is available from pm.local or pm.remote.
      
      Drop mptcp_pm_should_add_signal_ipv6 and mptcp_pm_should_add_signal_port
      too.
      Co-developed-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c233ef13
    • Y
      mptcp: build ADD_ADDR/echo-ADD_ADDR option according pm.add_signal · f462a446
      Yonglong Li committed
      Build the ADD_ADDR/ADD_ADDR_ECHO option according to the
      MPTCP_ADD_ADDR_SIGNAL or MPTCP_ADD_ADDR_ECHO flag.
      
      In mptcp_pm_add_addr_signal(), use opts->addr to save the announced
      ADD_ADDR or ADD_ADDR_ECHO address.
      Co-developed-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Co-developed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f462a446
    • Y
      mptcp: fix ADD_ADDR and RM_ADDR maybe flush addr_signal each other · 119c0220
      Yonglong Li committed
      ADD_ADDR shares pm.addr_signal with RM_ADDR, so after RM_ADDR (or
      ADD_ADDR) has been handled, we should not clear the addr_signal bits
      that belong to ADD_ADDR (or RM_ADDR).
      Co-developed-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      119c0220
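      The problem fixed by 119c0220 is the usual pitfall of several flags
      sharing one word: finishing one action must clear only its own bits. A
      minimal sketch, assuming hypothetical bit values SIG_ADD_ADDR,
      SIG_ADD_ADDR_ECHO and SIG_RM_ADDR and a hypothetical helper
      signal_done() (these are not the kernel's definitions):

```c
#include <stdint.h>

/* Hypothetical flag bits sharing one addr_signal byte. */
enum {
	SIG_ADD_ADDR      = 1u << 0,
	SIG_ADD_ADDR_ECHO = 1u << 1,
	SIG_RM_ADDR       = 1u << 2,
};

/*
 * Mark the actions in done_mask as finished and return the new value.
 * Only the bits in done_mask are cleared; pending bits that belong to
 * other actions are preserved instead of being reset to zero.
 */
static uint8_t signal_done(uint8_t addr_signal, uint8_t done_mask)
{
	return (uint8_t)(addr_signal & (uint8_t)~done_mask);
}
```

      Resetting the whole byte to zero after sending RM_ADDR would silently
      drop a pending ADD_ADDR, which is the mutual flushing the commit title
      refers to.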
    • Y
      mptcp: make MPTCP_ADD_ADDR_SIGNAL and MPTCP_ADD_ADDR_ECHO separate · 18fc1a92
      Yonglong Li committed
      Use MPTCP_ADD_ADDR_SIGNAL only for the action of sending ADD_ADDR, and
      use MPTCP_ADD_ADDR_ECHO only for the action of sending ADD_ADDR echo.
      
      Use msk->pm.local to save the announced ADD_ADDR address only, and reuse
      msk->pm.remote to save the announced ADD_ADDR_ECHO address.
      
      This prepares for the next patch.
      Co-developed-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      18fc1a92
    • Y
      mptcp: move drop_other_suboptions check under pm lock · 1f5e9e2f
      Yonglong Li committed
      This patch moves the drop_other_suboptions check from
      mptcp_established_options_add_addr() into mptcp_pm_add_addr_signal(), so
      that it is done under the PM lock. This avoids the race between the
      check and mptcp_pm_add_addr_signal().
      
      To do this, add a new parameter to mptcp_pm_add_addr_signal() that
      returns the drop_other_suboptions value, and drop the other suboptions
      after the option length check if drop_other_suboptions is true.
      
      Additionally, always drop the other suboptions for a pure TCP ack:
      that makes both the code simpler and the MPTCP behaviour more
      consistent.
      Co-developed-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Co-developed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f5e9e2f
    • Y
      net: ipv4: Move ip_options_fragment() out of loop · faf482ca
      Yajun Deng committed
      ip_options_fragment() is only called when iter->offset is equal to zero,
      so move it out of the loop, and inline 'Copy the flags to each fragment.'
      Also remove the unused parameter of ip_frag_ipcb().
      Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      faf482ca
  5. 23 Aug, 2021 2 commits
    • V
      net: dsa: track unique bridge numbers across all DSA switch trees · f5e165e7
      Vladimir Oltean committed
      Right now, cross-tree bridging setups work somewhat by mistake.
      
      In the case of cross-tree bridging with sja1105, all switch instances
      need to agree upon a common VLAN ID for forwarding a packet that belongs
      to a certain bridging domain.
      
      With TX forwarding offload, the VLAN ID is the bridge VLAN for
      VLAN-aware bridging, and the tag_8021q TX forwarding offload VID
      (a VLAN which has non-zero VBID bits) for VLAN-unaware bridging.
      
      The VBID for VLAN-unaware bridging is derived from the dp->bridge_num
      value calculated by DSA independently for each switch tree.
      
      If ports from one tree join one bridge, and ports from another tree join
      another bridge, DSA will assign them the same bridge_num, even though
      the bridges are different. If cross-tree bridging is supported, this
      is an issue.
      
      Modify DSA to calculate the bridge_num globally across all switch trees.
      This has the implication for a driver that the dp->bridge_num value that
      DSA will assign to its ports might not be contiguous, if there are
      boards with multiple DSA drivers instantiated. Additionally, all
      bridge_num values eat up towards each switch's
      ds->num_fwd_offloading_bridges maximum, which is potentially unfortunate,
      and can be seen as a limitation introduced by this patch. However, that
      is the lesser evil for now.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f5e165e7
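      Allocating bridge numbers from one shared pool, as f5e165e7 describes,
      can be sketched with a single bitmap. The helpers bridge_num_get() and
      bridge_num_put() are hypothetical names for this illustration, and the
      max argument plays the role of a switch's limit on offloaded bridges.

```c
#include <limits.h>

/*
 * Take the lowest free bridge number below max from a bitmap shared by
 * all switch trees. Returns the number, or -1 when none is available.
 */
static int bridge_num_get(unsigned long *used, unsigned int max)
{
	unsigned int nbits = sizeof(*used) * CHAR_BIT;

	if (max > nbits)
		max = nbits;
	for (unsigned int i = 0; i < max; i++) {
		if (!(*used & (1UL << i))) {
			*used |= 1UL << i;
			return (int)i;
		}
	}
	return -1;
}

/* Give a bridge number back to the shared pool. */
static void bridge_num_put(unsigned long *used, int num)
{
	if (num >= 0 && (unsigned int)num < sizeof(*used) * CHAR_BIT)
		*used &= ~(1UL << num);
}
```

      Because both trees draw from the same bitmap, two different bridges can
      never receive the same number, at the cost that numbers seen by one
      driver may not be contiguous and all of them count towards each
      switch's maximum.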
    • S
      ip6_gre: add validation for csum_start · 9cf448c2
      Shreyansh Chouhan committed
      Validate csum_start in gre_handle_offloads before we call _gre_xmit so
      that we do not crash later when the csum_start value is used in the
      lco_csum function call.
      
      This patch deals with ipv6 code.
      
      Fixes: b05229f4 ("gre6: Cleanup GREv6 transmit path, call common
      GRE functions")
      Reported-by: syzbot+ff8e1b9f2f36481e2efc@syzkaller.appspotmail.com
      Signed-off-by: Shreyansh Chouhan <chouhan.shreyansh630@gmail.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9cf448c2
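      The kind of check 9cf448c2 adds can be sketched in userspace. This is a
      model under an explicit assumption: that "validating csum_start" means
      rejecting a checksum start that lies before the start of the packet
      data. struct model_skb and model_gre_handle_offloads() are hypothetical
      and only mirror the shape of such a check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two buffer offsets relevant to the check. */
struct model_skb {
	size_t data_off;	/* offset of the packet data from the buffer head */
	size_t csum_start;	/* offset from the buffer head where checksumming starts */
};

/*
 * Refuse to continue when checksum offload is requested but csum_start
 * points before the packet data (assumed validation rule, see lead-in).
 */
static int model_gre_handle_offloads(const struct model_skb *skb, bool csum)
{
	if (csum && skb->csum_start < skb->data_off)
		return -EINVAL;
	return 0;
}
```

      Failing early with an error keeps a bogus offset from reaching later
      checksum code, where it could otherwise be used to compute a length.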