1. 20 Oct, 2021 3 commits
    • net: hns3: Add configuration of TM QCN error event · 60484103
      Jiaran Zhang committed
      Configure the interrupt type and the FIFO interrupt enable for the TM QCN
      error event when it is enabled; otherwise the event is not reported when
      an error occurs.
      
      Fixes: d914971d ("net: hns3: remove redundant query in hclge_config_tm_hw_err_int()")
      Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vrf: Revert "Reset skb conntrack connection..." · 55161e67
      Eugene Crosser committed
      This reverts commit 09e856d5.
      
      When an interface is enslaved in a VRF, prerouting conntrack hook is
      called twice: once in the context of the original input interface, and
      once in the context of the VRF interface. If no special precautions are
      taken, this leads to creation of two conntrack entries instead of one,
      and breaks SNAT.
      
      Commit above was intended to avoid creation of extra conntrack entries
      when input interface is enslaved in a VRF. It did so by resetting
      conntrack related data associated with the skb when it enters VRF context.
      
      However it breaks netfilter operation. Imagine a use case when conntrack
      zone must be assigned based on the original input interface, rather than
      VRF interface (that would make original interfaces indistinguishable). One
      could create netfilter rules similar to these:
      
              chain rawprerouting {
                      type filter hook prerouting priority raw;
                      iif realiface1 ct zone set 1 return
                      iif realiface2 ct zone set 2 return
              }
      
      This works before the mentioned commit, but not after: zone assignment
      is "forgotten", and any subsequent NAT or filtering that is dependent
      on the conntrack zone does not work.
      
      Here is a reproducer script that demonstrates the difference in behaviour.
      
      ==========
      #!/bin/sh
      
      # This script demonstrates unexpected change of nftables behaviour
      # caused by commit 09e856d5 "vrf: Reset skb conntrack
      # connection on VRF rcv"
      # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1
      #
      # Before the commit, it was possible to assign conntrack zone to a
      # packet (or mark it for `notracking`) in the prerouting chain, raw
      # priority, based on the `iif` (interface from which the packet
      # arrived).
      # After the change, if the interface is enslaved in a VRF, such
      # assignment is lost. Instead, assignment based on the `iif` matching
      # the VRF master interface is honored. Thus it is impossible to
      # distinguish packets based on the original interface.
      #
      # This script demonstrates this change of behaviour: conntrack zone 1
      # or 2 is assigned depending on the match with the original interface
      # or the vrf master interface. It can be observed that conntrack entry
      # appears in different zone in the kernel versions before and after
      # the commit.
      
      IPIN=172.30.30.1
      IPOUT=172.30.30.2
      PFXL=30
      
      ip li sh vein >/dev/null 2>&1 && ip li del vein
      ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
      nft list table testct >/dev/null 2>&1 && nft delete table testct
      
      ip li add vein type veth peer veout
      ip li add tvrf type vrf table 9876
      ip li set veout master tvrf
      ip li set vein up
      ip li set veout up
      ip li set tvrf up
      /sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
      /sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0
      ip addr add $IPIN/$PFXL dev vein
      ip addr add $IPOUT/$PFXL dev veout
      
      nft -f - <<__END__
      table testct {
      	chain rawpre {
      		type filter hook prerouting priority raw;
      		iif { veout, tvrf } meta nftrace set 1
      		iif veout ct zone set 1 return
      		iif tvrf ct zone set 2 return
      		notrack
      	}
      	chain rawout {
      		type filter hook output priority raw;
      		notrack
      	}
      }
      __END__
      
      uname -rv
      conntrack -F
      ping -W 1 -c 1 -I vein $IPOUT
      conntrack -L

      Signed-off-by: Eugene Crosser <crosser@average.org>
      Acked-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Fix an error handling path in 'dsa_switch_parse_ports_of()' · ba69fd91
      Christophe JAILLET committed
      If we return before the end of the 'for_each_child_of_node()' iterator, the
      reference taken on 'port' must be released.
      
      Add the missing 'of_node_put()' calls.
      
      Fixes: 83c0afae ("net: dsa: Add new binding implementation")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Link: https://lore.kernel.org/r/15d5310d1d55ad51c1af80775865306d92432e03.1634587046.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
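The reference leak described in this commit can be illustrated with a small Python model. This is only a sketch, not the kernel code: `Node`, `get_next_child()` and `parse_ports()` are hypothetical stand-ins, and the model assumes the iterator behaves like `of_get_next_child()`, which takes a reference on the child it returns and drops the reference on the previous child.

```python
class Node:
    """Hypothetical stand-in for a device tree node with a reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0


def get_next_child(children, prev):
    """Drop the reference on prev, then take one on the next child (or return None)."""
    if prev is not None:
        prev.refcount -= 1
        idx = children.index(prev) + 1
    else:
        idx = 0
    if idx >= len(children):
        return None
    nxt = children[idx]
    nxt.refcount += 1
    return nxt


def parse_ports(children, bad_name, put_on_error):
    """Walk all children; bail out with -1 on the child named bad_name."""
    child = get_next_child(children, None)
    while child is not None:
        if child.name == bad_name:
            if put_on_error:
                # Models the of_node_put() call the fix adds before returning.
                child.refcount -= 1
            return -1
        child = get_next_child(children, child)
    return 0
```

In this model, a loop that runs to completion leaves every count at zero, while an early return without the extra put leaves one reference held on the current child.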
  2. 19 Oct, 2021 5 commits
  3. 18 Oct, 2021 16 commits
  4. 17 Oct, 2021 6 commits
  5. 16 Oct, 2021 4 commits
    • net: bridge: mcast: use multicast_membership_interval for IGMPv3 · fac3cb82
      Nikolay Aleksandrov committed
      When I added IGMPv3 support I decided to follow the RFC for computing
      the GMI dynamically:
      " 8.4. Group Membership Interval
      
         The Group Membership Interval is the amount of time that must pass
         before a multicast router decides there are no more members of a
         group or a particular source on a network.
      
         This value MUST be ((the Robustness Variable) times (the Query
         Interval)) plus (one Query Response Interval)."
      
      But that is inconsistent with how the bridge computed it for IGMPv2,
      where it is a user-configurable value with a correct default that
      user-space is responsible for maintaining. Using that value for IGMPv3 as
      well makes it consistent with the other timer values, which are also kept
      correct by the user instead of being computed dynamically. It also
      restores the previous, user-expected GMI behaviour for IGMPv3 queries,
      which were supported before IGMPv3 was added. Note that to properly compute it dynamically we would
      need to add support for "Robustness Variable" which is currently missing.

      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Fixes: 0436862e ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
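The RFC formula quoted in this commit message can be written out as a short calculation. The following is a minimal sketch in Python, not bridge code: the function name is hypothetical, and the example values are the RFC 3376 defaults (Robustness Variable 2, Query Interval 125 s, Query Response Interval 10 s), which are an assumption added here rather than something stated in the commit.

```python
def group_membership_interval(robustness, query_interval, query_response_interval):
    """GMI = (Robustness Variable * Query Interval) + one Query Response Interval."""
    return robustness * query_interval + query_response_interval


# RFC 3376 default timer values, in seconds (assumed for illustration).
default_gmi = group_membership_interval(2, 125, 10)
```

With those defaults the interval comes out at 260 seconds. The commit stops deriving this value dynamically for IGMPv3 and uses the user-configured multicast_membership_interval instead.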
    • vsock_diag_test: remove free_sock_stat() call in test_no_sockets · ba95a622
      Stefano Garzarella committed
      In `test_no_sockets` we don't expect any sockets, indeed
      check_no_sockets() prints an error and exits if `sockets` list is
      not empty, so free_sock_stat() call is unnecessary since it would
      only be called when the `sockets` list is empty.
      
      This was discovered by a strange warning printed by gcc v11.2.1:
        In file included from ../../include/linux/list.h:7,
                         from vsock_diag_test.c:18:
        vsock_diag_test.c: In function ‘test_no_sockets’:
        ../../include/linux/kernel.h:35:45: error: array subscript ‘struct vsock_stat[0]’ is partly outside array bounds of ‘struct list_head[1]’ [-Werror=array-bounds]
           35 |         const typeof(((type *)0)->member) * __mptr = (ptr);     \
              |                                             ^~~~~~
        ../../include/linux/list.h:352:9: note: in expansion of macro ‘container_of’
          352 |         container_of(ptr, type, member)
              |         ^~~~~~~~~~~~
        ../../include/linux/list.h:393:9: note: in expansion of macro ‘list_entry’
          393 |         list_entry((pos)->member.next, typeof(*(pos)), member)
              |         ^~~~~~~~~~
        ../../include/linux/list.h:522:21: note: in expansion of macro ‘list_next_entry’
          522 |                 n = list_next_entry(pos, member);                       \
              |                     ^~~~~~~~~~~~~~~
        vsock_diag_test.c:325:9: note: in expansion of macro ‘list_for_each_entry_safe’
          325 |         list_for_each_entry_safe(st, next, sockets, list) {
              |         ^~~~~~~~~~~~~~~~~~~~~~~~
        In file included from vsock_diag_test.c:18:
        vsock_diag_test.c:333:19: note: while referencing ‘sockets’
          333 |         LIST_HEAD(sockets);
              |                   ^~~~~~~
        ../../include/linux/list.h:23:26: note: in definition of macro ‘LIST_HEAD’
           23 |         struct list_head name = LIST_HEAD_INIT(name)
      
      It seems related to some compiler optimization and assumption
      about the empty `sockets` list, since this warning is printed
      only with -O2 or -O3. Also removing `exit(1)` from
      check_no_sockets() makes the warning disappear since in that
      case free_sock_stat() can be reached also when the list is
      not empty.

      Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20211014152045.173872-1-sgarzare@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 2151135a
      Jakub Kicinski committed
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-10-14
      
      Brett ensures RDMA nodes are removed during release and rebuild. He also
      corrects fw.mgmt.api to include the patch number for proper
      identification.
      
      Dave stops ida_free() being called when an IDA has not been allocated.
      
      Michal corrects the order of parameters being provided and the number of
      entries skipped for UDP tunnels.
      ====================
      
      Link: https://lore.kernel.org/r/20211014181953.3538330-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: When forwarding count rx stats on the orig netdev · 0857d6f8
      Stephen Suryaputra committed
      Commit bdb7cc64 ("ipv6: Count interface receive statistics on the
      ingress netdev") does not work when ip6_forward() executes on the skbs
      with vrf-enslaved netdev. Use IP6CB(skb)->iif to get to the right one.
      
      Add a selftest script to verify.
      
      Fixes: bdb7cc64 ("ipv6: Count interface receive statistics on the ingress netdev")
      Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20211014130845.410602-1-ssuryaextr@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 15 Oct, 2021 6 commits
    • Merge branch 'tcp-md5-vrf-fix' · 4884ddba
      David S. Miller committed
      Leonard Crestez says:
      
      ====================
      tcp: md5: Fix overlap between vrf and non-vrf keys
      
      With net.ipv4.tcp_l3mdev_accept=1 it is possible for a listen socket to
      accept connection from the same client address in different VRFs. It is
      also possible to set different MD5 keys for these clients which differ only
      in the tcpm_l3index field.
      
      This appears to work when distinguishing between different VRFs but not
      between non-VRF and VRF connections. In particular:
      
      * tcp_md5_do_lookup_exact will match a non-vrf key against a vrf key. This
      means that adding a key with l3index != 0 after a key with l3index == 0
      will cause the earlier key to be deleted. Both keys can be present if the
      non-vrf key is added later.
      * __tcp_md5_do_lookup can match a non-vrf key before a vrf key. This causes
      failures if the passwords differ.

      This can be fixed by making tcp_md5_do_lookup_exact perform an actual exact
      comparison on l3index and by making __tcp_md5_do_lookup prefer vrf-bound
      keys above other considerations like prefixlen.
      
      The fact that keys with l3index==0 affect VRF connections is usually not
      desirable, since VRFs are meant to be completely independent. This behavior
      needs to be preserved for backwards compatibility. Also, applications can
      just bind listen sockets to VRF and never specify TCP_MD5SIG_FLAG_IFINDEX at
      all.
      
      So far the combination of TCP_MD5SIG_FLAG_IFINDEX with tcpm_ifindex == 0
      was an error; accept this to mean "key only applies to default VRF". This
      is what applications using VRFs for traffic separation want.
      
      This also contains tests for the second part. It does not contain tests for
      overlapping keys, that would require more changes in nettest to add
      multiple keys. These scenarios are also covered by my tests for TCP-AO,
      especially around this area:
      https://github.com/cdleonard/tcp-authopt-test/blob/main/tcp_authopt_test/test_vrf_bind.py
      
      Changes since V2:
      * Rename --do-bind-key-ifindex to --force-bind-key-ifindex
      * Fix referencing TCP_MD5SIG_FLAG_IFINDEX as TCP_MD5SIG_IFINDEX
      Link to v2: https://lore.kernel.org/netdev/cover.1634107317.git.cdleonard@gmail.com/
      
      Changes since V1:
      * Accept (TCP_MD5SIG_IFINDEX with tcpm_ifindex == 0)
      * Add flags for explicitly including or excluding TCP_MD5SIG_FLAG_IFINDEX
      to nettest
      * Add few more tests in fcnal-test.sh.
      Link to v1: https://lore.kernel.org/netdev/3d8387d499f053dba5cd9184c0f7b8445c4470c6.1633542093.git.cdleonard@gmail.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net/fcnal: Test --{force,no}-bind-key-ifindex · 64e40177
      Leonard Crestez committed
      Test that applications binding listening sockets to VRFs without
      specifying TCP_MD5SIG_FLAG_IFINDEX will work as expected. This would
      be broken if __tcp_md5_do_lookup always made a strict comparison on
      l3index. See this email:
      
      https://lore.kernel.org/netdev/209548b5-27d2-2059-f2e9-2148f5a0291b@gmail.com/
      
      Applications using tcp_l3mdev_accept=1 and a single global socket (not
      bound to any interface) also should have a way to specify keys that are
      only for the default VRF, this is done by --force-bind-key-ifindex
      without otherwise binding to a device.
      Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: nettest: Add --{force,no}-bind-key-ifindex · 78a9cf61
      Leonard Crestez committed
      These options allow explicit control over the TCP_MD5SIG_FLAG_IFINDEX
      flag instead of always setting it based on binding to an interface.
      
      Do this by converting to getopt_long because nettest has too many
      single-character flags already and getopt_long is widely used in
      selftests.
      Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: md5: Allow MD5SIG_FLAG_IFINDEX with ifindex=0 · a76c2315
      Leonard Crestez committed
      Multiple VRFs are generally meant to be "separate" but right now md5
      keys for the default VRF also affect connections inside VRFs if the IP
      addresses happen to overlap.
      
      So far the combination of TCP_MD5SIG_FLAG_IFINDEX with tcpm_ifindex == 0
      was an error; accept this to mean "key only applies to default VRF".
      This is what applications using VRFs for traffic separation want.
      Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
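The semantics this commit describes can be summarised as a small predicate. This is a simplified Python model rather than the kernel implementation: `key_applies()` and its parameters are hypothetical names, and l3index 0 stands for the default VRF.

```python
def key_applies(flag_ifindex, key_l3index, conn_l3index):
    """Decide whether an MD5 key is a candidate for a connection.

    flag_ifindex models TCP_MD5SIG_FLAG_IFINDEX being set on the key.
    """
    if flag_ifindex:
        # Key is tied to one l3index; 0 now means "default VRF only".
        return key_l3index == conn_l3index
    # Without the flag the key keeps its old behaviour and applies everywhere.
    return True
```

Under this model a key added with the flag and ifindex 0 no longer affects connections inside a VRF, while a key added without the flag still does, which is the backwards-compatible case.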
    • tcp: md5: Fix overlap between vrf and non-vrf keys · 86f1e3a8
      Leonard Crestez committed
      With net.ipv4.tcp_l3mdev_accept=1 it is possible for a listen socket to
      accept connection from the same client address in different VRFs. It is
      also possible to set different MD5 keys for these clients which differ
      only in the tcpm_l3index field.
      
      This appears to work when distinguishing between different VRFs but not
      between non-VRF and VRF connections. In particular:
      
       * tcp_md5_do_lookup_exact will match a non-vrf key against a vrf key.
      This means that adding a key with l3index != 0 after a key with l3index
      == 0 will cause the earlier key to be deleted. Both keys can be present
      if the non-vrf key is added later.
       * __tcp_md5_do_lookup can match a non-vrf key before a vrf key. This
      causes failures if the passwords differ.

      Fix this by making tcp_md5_do_lookup_exact perform an actual exact
      comparison on l3index and by making __tcp_md5_do_lookup prefer
      vrf-bound keys above other considerations like prefixlen.
      
      Fixes: dea53bb8 ("tcp: Add l3index to tcp_md5sig_key and md5 functions")
      Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
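The two lookup rules described in this commit can be sketched as a simplified Python model. It is not the kernel code: `Md5Key`, `better_match()`, `lookup()` and `lookup_exact()` are hypothetical names, and keys are reduced to a prefix, an l3index (0 meaning not bound to a VRF) and a secret.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Md5Key:
    prefix: str   # e.g. "10.0.0.0/24"
    l3index: int  # 0 means the key is not bound to a VRF
    secret: str


def better_match(old, new):
    """Return True if `new` should replace `old` as the best match."""
    if old is None:
        return True
    # A VRF-bound key always wins over an unbound one.
    if old.l3index and not new.l3index:
        return False
    if not old.l3index and new.l3index:
        return True
    # Otherwise fall back to the longer prefix.
    return ip_network(new.prefix).prefixlen > ip_network(old.prefix).prefixlen


def lookup(keys, addr, l3index):
    """Best key for a peer address seen in the given l3index."""
    best = None
    for key in keys:
        # Keys bound to a different VRF never apply.
        if key.l3index and key.l3index != l3index:
            continue
        if ip_address(addr) in ip_network(key.prefix) and better_match(best, key):
            best = key
    return best


def lookup_exact(keys, prefix, l3index):
    """Exact match on both prefix and l3index, as used when adding a key."""
    for key in keys:
        if key.prefix == prefix and key.l3index == l3index:
            return key
    return None
```

In this model a VRF-bound /24 key is chosen over an unbound /32 key for a connection in that VRF, and adding a VRF-bound key does not find (and so would not replace) an unbound key with the same prefix.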
    • lan78xx: select CRC32 · 46393d61
      Vegard Nossum committed
      Fix the following build/link error by adding a dependency on the CRC32
      routines:
      
        ld: drivers/net/usb/lan78xx.o: in function `lan78xx_set_multicast':
        lan78xx.c:(.text+0x48cf): undefined reference to `crc32_le'
      
      The actual use of crc32_le() comes indirectly through ether_crc().
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>