1. 22 Apr 2022: 3 commits
  2. 20 Apr 2022: 11 commits
  3. 19 Apr 2022: 8 commits
    • netlink: reset network and mac headers in netlink_dump() · 99c07327
      Committed by Eric Dumazet
      netlink_dump() allocates an skb and reserves space in it,
      but forgets to reset the network header.
      
      This allows a BPF program, invoked later from sk_filter(),
      to access uninitialized kernel memory from the reserved
      space.
      
      Theoretically, the mac header reset could be omitted, because
      it is set to a special initial value. However,
      bpf_internal_load_pointer_neg_helper() calls skb_mac_header()
      without checking skb_mac_header_was_set(), and relying on
      skb->len not being too big seems fragile.
      We could also add a sanity check in bpf_internal_load_pointer_neg_helper()
      to avoid surprises in the future.
      
      syzbot report was:
      
      BUG: KMSAN: uninit-value in ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
       ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637
       __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
       bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
       __bpf_prog_run include/linux/filter.h:626 [inline]
       bpf_prog_run include/linux/filter.h:633 [inline]
       __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
       bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
       sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
       sk_filter include/linux/filter.h:905 [inline]
       netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
       netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1039
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_read+0x52c/0x14c0 fs/read_write.c:786
       vfs_readv fs/read_write.c:906 [inline]
       do_readv+0x432/0x800 fs/read_write.c:943
       __do_sys_readv fs/read_write.c:1034 [inline]
       __se_sys_readv fs/read_write.c:1031 [inline]
       __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was stored to memory at:
       ___bpf_prog_run+0x96c/0xb420 kernel/bpf/core.c:1558
       __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796
       bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline]
       __bpf_prog_run include/linux/filter.h:626 [inline]
       bpf_prog_run include/linux/filter.h:633 [inline]
       __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756
       bpf_prog_run_save_cb include/linux/filter.h:770 [inline]
       sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150
       sk_filter include/linux/filter.h:905 [inline]
       netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276
       netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1039
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_read+0x52c/0x14c0 fs/read_write.c:786
       vfs_readv fs/read_write.c:906 [inline]
       do_readv+0x432/0x800 fs/read_write.c:943
       __do_sys_readv fs/read_write.c:1034 [inline]
       __se_sys_readv fs/read_write.c:1031 [inline]
       __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:737 [inline]
       slab_alloc_node mm/slub.c:3244 [inline]
       __kmalloc_node_track_caller+0xde3/0x14f0 mm/slub.c:4972
       kmalloc_reserve net/core/skbuff.c:354 [inline]
       __alloc_skb+0x545/0xf90 net/core/skbuff.c:426
       alloc_skb include/linux/skbuff.h:1158 [inline]
       netlink_dump+0x30f/0x16c0 net/netlink/af_netlink.c:2242
       netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_read_iter+0x5a9/0x630 net/socket.c:1039
       do_iter_readv_writev+0xa7f/0xc70
       do_iter_read+0x52c/0x14c0 fs/read_write.c:786
       vfs_readv fs/read_write.c:906 [inline]
       do_readv+0x432/0x800 fs/read_write.c:943
       __do_sys_readv fs/read_write.c:1034 [inline]
       __se_sys_readv fs/read_write.c:1031 [inline]
       __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      CPU: 0 PID: 3470 Comm: syz-executor751 Not tainted 5.17.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: db65a3aa ("netlink: Trim skb to alloc size to avoid MSG_TRUNC")
      Fixes: 9063e21f ("netlink: autosize skb lengthes")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220415181442.551228-1-eric.dumazet@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
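The bug can be illustrated with a small userspace model of an skb's header offsets. This is a sketch, not kernel code: `struct model_skb`, `model_alloc()`, `model_reserve()`, `model_reset_network_header()` and `model_stale_bytes()` are hypothetical stand-ins for alloc_skb(), skb_reserve() and skb_reset_network_header(). The model assumes, as the commit text implies, that a freshly allocated skb's network header still points at the start of the buffer, so reserving headroom without resetting it leaves network-relative reads landing in bytes nobody wrote.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified skb: header positions are offsets from head. */
struct model_skb {
	unsigned char *head;    /* buffer from malloc(), contents uninitialized */
	size_t data;            /* start of the payload */
	size_t network_header;  /* where network-relative loads start */
};

/* Like alloc_skb(): data and network_header both start at offset 0. */
static struct model_skb *model_alloc(size_t size)
{
	struct model_skb *skb = calloc(1, sizeof(*skb));

	if (!skb)
		return NULL;
	skb->head = malloc(size);
	if (!skb->head) {
		free(skb);
		return NULL;
	}
	return skb;
}

/* Like skb_reserve(): advance data, leaving unwritten headroom behind it. */
static void model_reserve(struct model_skb *skb, size_t n)
{
	skb->data += n;
}

/* Like skb_reset_network_header(): move the header to the current data. */
static void model_reset_network_header(struct model_skb *skb)
{
	skb->network_header = skb->data;
}

/* Bytes of never-written headroom a network-relative read would see first. */
static size_t model_stale_bytes(const struct model_skb *skb)
{
	return skb->data > skb->network_header ?
	       skb->data - skb->network_header : 0;
}

static void model_free(struct model_skb *skb)
{
	free(skb->head);
	free(skb);
}
```

In the model, calling the reset right after the reserve, which is what the commit title describes, brings the stale region down to zero bytes.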
    • rtnetlink: return EINVAL when request cannot succeed · b6177d32
      Committed by Florent Fourcot
      A request without an interface name, interface index, or interface
      group cannot succeed, so we should return EINVAL.
      
      Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
      Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
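A minimal sketch of the selector check, assuming a request can carry an interface index, IFLA_IFNAME, IFLA_ALT_IFNAME and IFLA_GROUP. `struct link_req`, `enum link_selector` and `link_req_check()` are hypothetical; the real rtnetlink code inspects parsed netlink attributes rather than a struct of flags.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical view of which selectors a link request carries. */
struct link_req {
	int ifindex;          /* > 0 when an interface index was given */
	bool has_ifname;      /* IFLA_IFNAME present */
	bool has_alt_ifname;  /* IFLA_ALT_IFNAME present */
	bool has_group;       /* IFLA_GROUP present */
};

enum link_selector { SEL_INDEX, SEL_NAME, SEL_GROUP };

/*
 * Return the selector that identifies the target, or -EINVAL when the
 * request names no interface at all and therefore cannot succeed.
 */
static int link_req_check(const struct link_req *req)
{
	if (req->ifindex > 0)
		return SEL_INDEX;
	if (req->has_ifname || req->has_alt_ifname)
		return SEL_NAME;
	if (req->has_group)
		return SEL_GROUP;
	return -EINVAL;
}
```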
    • rtnetlink: return ENODEV when IFLA_ALT_IFNAME is used in dellink · dee04163
      Committed by Florent Fourcot
      If IFLA_ALT_IFNAME is set and the given interface is not found,
      we should return ENODEV, consistent with the IFLA_IFNAME
      behaviour.
      This commit extends the feature of commit 76c9ac0e,
      "net: rtnetlink: add possibility to use alternative names as message handle".
      
      CC: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
      Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • rtnetlink: enable alt_ifname for setlink/newlink · 5ea08b52
      Committed by Florent Fourcot
      The buffer called "ifname" given to rtnl_dev_get() is always
      valid when called by setlink/newlink, but contains only an empty
      string when IFLA_IFNAME is not given. As a result,
      IFLA_ALT_IFNAME is always ignored.
      
      This patch fixes rtnl_dev_get() by removing its ifname argument,
      and moves the ifname copy into do_setlink() where required.
      
      It extends the feature of commit 76c9ac0e,
      "net: rtnetlink: add possibility to use alternative names as message
      handle".
      
      CC: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
      Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
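The fix can be pictured as a lookup that takes the name straight from the request attributes instead of from a caller-supplied buffer that may hold an empty string. This is a sketch under stated assumptions: `struct fake_dev` and `dev_get_by_any_name()` are hypothetical, the linear array stands in for the kernel's name hash, and the sketch assumes a lookup by either attribute can match a device's primary or alternative name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device record with a primary name and one alternative name. */
struct fake_dev {
	const char *name;
	const char *alt_name;  /* NULL when the device has no alternative name */
};

/*
 * Look up a device by IFLA_IFNAME if it was supplied, else by
 * IFLA_ALT_IFNAME; NULL means "attribute absent". No caller buffer is
 * involved, so a missing IFLA_IFNAME can no longer mask IFLA_ALT_IFNAME.
 */
static const struct fake_dev *dev_get_by_any_name(const struct fake_dev *devs,
						  size_t n, const char *ifname,
						  const char *alt_ifname)
{
	const char *want = ifname ? ifname : alt_ifname;
	size_t i;

	if (!want)
		return NULL;
	for (i = 0; i < n; i++) {
		if (strcmp(devs[i].name, want) == 0 ||
		    (devs[i].alt_name && strcmp(devs[i].alt_name, want) == 0))
			return &devs[i];
	}
	return NULL;
}
```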
    • rtnetlink: return ENODEV when ifname does not exist and group is given · ef2a7c90
      Committed by Florent Fourcot
      When the interface does not exist and a group is given, the given
      parameters are applied to all interfaces of the given group. The given
      IFNAME/ALT_IFNAME are ignored in that case.
      
      That can be dangerous, since a typo (or a deleted interface) can produce
      weird side effects for the caller:
      
      Case 1:
      
       IFLA_IFNAME=valid_interface
       IFLA_GROUP=1
       MTU=1234
      
      Case 1 will update MTU and group of the given interface "valid_interface".
      
      Case 2:
      
       IFLA_IFNAME=doesnotexist
       IFLA_GROUP=1
       MTU=1234
      
      Case 2 will update the MTU of all interfaces in group 1. IFLA_IFNAME is
      ignored in this case.
      
      This behaviour is inconsistent and dangerous. To fix this issue,
      we now return ENODEV when the given IFNAME does not exist.
      
      Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
      Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
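The behaviour change comes down to the order of checks when no device matches. A hedged sketch, assuming a request without NLM_F_CREATE and that the name/index lookup has already run; `setlink_target()` and `enum link_target` are hypothetical, not rtnetlink functions.

```c
#include <errno.h>
#include <stdbool.h>

enum link_target { TARGET_DEVICE, TARGET_GROUP };

/*
 * Decide what a setlink/newlink request (no NLM_F_CREATE) applies to.
 *   name_given:  IFLA_IFNAME or IFLA_ALT_IFNAME was present
 *   dev_found:   the lookup found a device
 *   group_given: IFLA_GROUP was present
 * Returns a target, or a negative errno.
 */
static int setlink_target(bool name_given, bool dev_found, bool group_given)
{
	if (dev_found)
		return TARGET_DEVICE;  /* Case 1: update only this device */
	if (name_given)
		return -ENODEV;        /* Case 2: never fall back to the whole group */
	if (group_given)
		return TARGET_GROUP;   /* explicit group-wide request, no name given */
	return -ENODEV;                /* nothing to modify in this sketch */
}
```

Under this ordering, Case 2 from the commit message (unknown IFLA_IFNAME plus IFLA_GROUP=1) yields -ENODEV instead of silently changing every interface in group 1.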
    • net: sched: support hash selecting tx queue · 38a6f086
      Committed by Tonghao Zhang
      This patch allows users to pick a queue_mapping in the range
      from A to B, so that packets can be load-balanced across Tx
      queues A to B. The range is an unsigned 16-bit value
      in decimal format.
      
      $ tc filter ... action skbedit queue_mapping skbhash A B
      
      "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit")
      is enhanced with the flag SKBEDIT_F_TXQ_SKBHASH.
      
        +----+      +----+      +----+
        | P1 |      | P2 |      | Pn |
        +----+      +----+      +----+
          |           |           |
          +-----------+-----------+
                      |
                      | clsact/skbedit
                      |      MQ
                      v
          +-----------+-----------+
          | q0        | qn        | qm
          v           v           v
        HTB/FQ       FIFO   ...  FIFO
      
      For example:
      If P1 sends packets to different Pods on other hosts, and
      we want to distribute its flows across qn - qm, we can use
      skb->hash as the hash.
      
      setup commands:
      $ NETDEV=eth0
      $ ip netns add n1
      $ ip link add ipv1 link $NETDEV type ipvlan mode l2
      $ ip link set ipv1 netns n1
      $ ip netns exec n1 ifconfig ipv1 2.2.2.100/24 up
      
      $ tc qdisc add dev $NETDEV clsact
      $ tc filter add dev $NETDEV egress protocol ip prio 1 \
              flower skip_hw src_ip 2.2.2.100 action skbedit queue_mapping skbhash 2 6
      $ tc qdisc add dev $NETDEV handle 1: root mq
      $ tc qdisc add dev $NETDEV parent 1:1 handle 2: htb
      $ tc class add dev $NETDEV parent 2: classid 2:1 htb rate 100kbit
      $ tc class add dev $NETDEV parent 2: classid 2:2 htb rate 200kbit
      $ tc qdisc add dev $NETDEV parent 1:2 tbf rate 100mbit burst 100mb latency 1
      $ tc qdisc add dev $NETDEV parent 1:3 pfifo
      $ tc qdisc add dev $NETDEV parent 1:4 pfifo
      $ tc qdisc add dev $NETDEV parent 1:5 pfifo
      $ tc qdisc add dev $NETDEV parent 1:6 pfifo
      $ tc qdisc add dev $NETDEV parent 1:7 pfifo
      
      $ ip netns exec n1 iperf3 -c 2.2.2.1 -i 1 -t 10 -P 10
      
      pick txqueue from 2 - 6:
      $ ethtool -S $NETDEV | grep -i tx_queue_[0-9]_bytes
           tx_queue_0_bytes: 42
           tx_queue_1_bytes: 0
           tx_queue_2_bytes: 11442586444
           tx_queue_3_bytes: 7383615334
           tx_queue_4_bytes: 3981365579
           tx_queue_5_bytes: 3983235051
           tx_queue_6_bytes: 6706236461
           tx_queue_7_bytes: 42
           tx_queue_8_bytes: 0
           tx_queue_9_bytes: 0
      
      txqueues 2 - 6 are mapped to classid 1:3 - 1:7
      $ tc -s class show dev $NETDEV
      ...
      class mq 1:3 root leaf 8002:
       Sent 11949133672 bytes 7929798 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      class mq 1:4 root leaf 8003:
       Sent 7710449050 bytes 5117279 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      class mq 1:5 root leaf 8004:
       Sent 4157648675 bytes 2758990 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      class mq 1:6 root leaf 8005:
       Sent 4159632195 bytes 2759990 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      class mq 1:7 root leaf 8006:
       Sent 7003169603 bytes 4646912 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      ...
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Lobakin <alobakin@pm.me>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Talal Ahmad <talalahmad@google.com>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Cc: Antoine Tenart <atenart@kernel.org>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
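A minimal sketch of mapping a flow hash onto the queue range A..B. `pick_txq()` is hypothetical, and this sketch uses a plain modulo over the range width; the in-kernel skbedit code may differ in detail, for example by also capping the result to the device's real Tx queue count.

```c
#include <stdint.h>

/*
 * Pick a Tx queue in [qmin, qmax] (inclusive) from a flow hash.
 * Assumes qmin <= qmax and that both are valid queues on the device.
 */
static uint16_t pick_txq(uint32_t hash, uint16_t qmin, uint16_t qmax)
{
	uint32_t width = (uint32_t)qmax - qmin + 1;  /* number of queues in range */

	return (uint16_t)(qmin + hash % width);
}
```

Because every packet of a flow carries the same hash, a flow stays on one queue while different flows spread over the range. With `skbhash 2 6` this matches the ethtool counters above, where only tx_queue_2 through tx_queue_6 carry significant traffic.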
    • net: sched: use queue_mapping to pick tx queue · 2f1e85b1
      Committed by Tonghao Zhang
      This patch fixes the following issue:
      * If we install tc filters with act_skbedit in the clsact hook,
        they don't work, because netdev_core_pick_tx() overwrites
        queue_mapping.
      
        $ tc filter ... action skbedit queue_mapping 1
      
      This patch is also useful:
      * We can use FQ + EDT to implement efficient policies. Tx queues
        are picked by XPS, the netdev driver's ndo_select_queue, or the skb
        hash in netdev_core_pick_tx(). In fact, the netdev driver and the skb
        hash are _not_ under our control. XPS uses the CPU map to select Tx
        queues, but in most cases we can't figure out which task_struct of a
        pod/container is running on this CPU. We can use clsact filters to
        classify one pod's/container's traffic to one Tx queue. Why?
      
        In a container networking environment, there are two kinds of pod/
        container/net-namespace. For one kind (e.g. P1, P2), high throughput
        is key. But to avoid running out of network resources, the outbound
        traffic of these pods is limited: they use or share dedicated Tx
        queues assigned an HTB/TBF/FQ Qdisc. For the other kind of pods
        (e.g. Pn), low data-access latency is key, and the traffic is not
        limited. These pods use or share other dedicated Tx queues assigned
        a FIFO Qdisc. This choice provides two benefits. First, contention on
        the HTB/FQ Qdisc lock is significantly reduced since fewer CPUs contend
        for the same queue. More importantly, Qdisc contention can be eliminated
        completely if each CPU has its own FIFO Qdisc for the second kind of pods.
      
        There must be a mechanism in place to classify traffic by
        pod/container to different Tx queues. Note that clsact is outside of
        the Qdisc, while a Qdisc can run a classifier to select a sub-queue
        under the lock.
      
        In general, recording the decision in the skb seems a little heavy-handed.
        This patch introduces a per-CPU variable, suggested by Eric.
      
        The xmit.skip_txqueue flag is first cleared in __dev_queue_xmit().
        - A Tx Qdisc may install the skbedit action; the xmit.skip_txqueue flag
          is then set in qdisc->enqueue(), even though the Tx queue has already
          been selected in netdev_tx_queue_mapping() or netdev_core_pick_tx().
          Clearing the flag first in __dev_queue_xmit() is useful to:
        - Avoid picking the Tx queue with netdev_tx_queue_mapping() in the next
          netdev in a stack such as eth0 macvlan - eth0.3 vlan - eth0 ixgbe-phy.
          For example, eth0, a macvlan in a pod whose root Qdisc installs skbedit
          queue_mapping, sends packets to eth0.3, a vlan in the host. In
          __dev_queue_xmit() of eth0.3 the flag is cleared, so the Tx queue is not
          selected according to skb->queue_mapping, because there are no filters
          in clsact or the Tx Qdisc of this netdev. The same happens in eth0, the
          ixgbe device in the host.
        - Avoid picking the Tx queue for the next packet. If we set
          xmit.skip_txqueue in the Tx Qdisc (qdisc->enqueue()), the proper way to
          clear it is in __dev_queue_xmit() when processing the next packet.
      
        For performance reasons, a static key is used. If the user does not
        configure NET_EGRESS, this code is not compiled.
      
        +----+      +----+      +----+
        | P1 |      | P2 |      | Pn |
        +----+      +----+      +----+
          |           |           |
          +-----------+-----------+
                      |
                      | clsact/skbedit
                      |      MQ
                      v
          +-----------+-----------+
          | q0        | q1        | qn
          v           v           v
        HTB/FQ      HTB/FQ  ...  FIFO
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Lobakin <alobakin@pm.me>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Talal Ahmad <talalahmad@google.com>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Cc: Antoine Tenart <atenart@kernel.org>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
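The flag's life cycle can be modelled in a few lines. This is a single-CPU userspace sketch, not the kernel implementation: a plain static variable stands in for the per-CPU `xmit.skip_txqueue` field, and `xmit_begin()`, `skbedit_set_queue()` and `pick_tx_queue()` are hypothetical names for the roles played by __dev_queue_xmit(), the skbedit action and Tx queue selection.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the per-CPU xmit.skip_txqueue flag; only one CPU is modelled. */
static bool skip_txqueue;

struct pkt {
	uint16_t queue_mapping;  /* may hold a stale value from an earlier hop */
	uint32_t hash;           /* flow hash used when skbedit did not run */
};

/* Start of transmit on a device: forget decisions from earlier packets or devices. */
static void xmit_begin(void)
{
	skip_txqueue = false;
}

/* skbedit queue_mapping in clsact egress: record the queue and keep it. */
static void skbedit_set_queue(struct pkt *p, uint16_t q)
{
	p->queue_mapping = q;
	skip_txqueue = true;
}

/*
 * Queue selection: honour skbedit only if it ran for this packet on this
 * device. Out-of-range mappings fall back to queue 0, and the hash fallback
 * is a plain modulo; both are simplifications in this model.
 */
static uint16_t pick_tx_queue(const struct pkt *p, uint16_t nqueues)
{
	if (skip_txqueue)
		return p->queue_mapping < nqueues ? p->queue_mapping : 0;
	return (uint16_t)(p->hash % nqueues);
}
```

The second packet in the usage below carries a stale queue_mapping of 1, but because the flag was cleared at the start of its transmit, the model falls back to the hash, mirroring the "next packet" and "next netdev" cases described in the commit.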
    • net: dsa: hellcreek: Calculate checksums in tagger · 0763120b
      Committed by Kurt Kanzenbach
      If checksum calculation is offloaded to the DSA master network
      interface, the checksum will include the switch's trailing tag. As soon
      as the switch strips that tag on egress, the calculated checksum is wrong.
      
      Therefore, calculate the checksum in the tagger (if required) before
      adding the switch tag. This way, the hellcreek code works with all DSA
      master interfaces regardless of their declared feature set.
      
      Fixes: 01ef09ca ("net: dsa: Add tag handling for Hirschmann Hellcreek switches")
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220415103320.90657-1-kurt@linutronix.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
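Why a trailer breaks an offloaded checksum can be shown with the 16-bit ones' complement sum used by IP, TCP and UDP (RFC 1071): a checksum engine that sums to the end of the frame also sums the trailer byte. `inet_csum()` is a plain userspace sketch, not the kernel's csum helpers, and the one-byte trailer value used below is arbitrary, not Hellcreek's actual tag encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over a byte buffer (big-endian 16-bit words). */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(buf[i] << 8 | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;  /* pad the odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);  /* fold carries back in */
	return (uint16_t)~sum;
}
```

Computing the checksum in software before the tag is appended, as the commit does, means the checksum only covers bytes that are still present after the switch strips the tag.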
  4. 18 Apr 2022: 5 commits
    • devlink: add port to line card relationship set · b8375859
      Committed by Jiri Pirko
      To properly inform the user about the relationship between a port and
      a line card, introduce a driver API to set the line card for a port. Use
      this information to extend the port devlink netlink message with the line
      card index, and also include the line card index in phys_port_name and
      thereby in the netdevice name.
      
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
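Judging from the netdevice names in the line card provisioning example in this log (`enp1s0nl8p1` for line card 8, port 1), the line card index appears as an `l<N>` prefix before `p<M>` in phys_port_name. A hedged sketch of that formatting: `format_port_name()` is hypothetical, and the format string is inferred from those names rather than copied from the kernel.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a phys_port_name such as "l8p1". In this sketch a line card
 * index of 0 means "not on a line card", giving plain "p1".
 * Returns the snprintf() result (the length that would have been written).
 */
static int format_port_name(char *buf, size_t len, unsigned int lc_index,
			    unsigned int port)
{
	if (lc_index)
		return snprintf(buf, len, "l%up%u", lc_index, port);
	return snprintf(buf, len, "p%u", port);
}
```

Predictable interface naming presumably appends `n` plus phys_port_name to the PCI path, which would explain how the example ports end up as `enp1s0nl8p1` through `enp1s0nl8p16`.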
    • devlink: implement line card active state · fc9f50d5
      Committed by Jiri Pirko
      Allow the driver to mark a line card as active. Expose this state to
      userspace over the devlink netlink interface, with proper notifications.
      The 'active' state means that the line card was plugged in after
      being provisioned.
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
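A sketch of how the 'active' state sits after provisioning, assuming a simplified state set. The enum values, `struct linecard`, `linecard_set_active()` and `linecard_set_inactive()` are hypothetical stand-ins; the real devlink code defines its own state enum and sends a netlink notification on each change, which is reduced to a counter here.

```c
#include <errno.h>

enum lc_state {
	LC_UNPROVISIONED,
	LC_PROVISIONING,
	LC_PROVISIONED,
	LC_ACTIVE,
};

struct linecard {
	enum lc_state state;
	unsigned int notifications;  /* stands in for devlink netlink notifications */
};

/* Driver reports the card was plugged in: only a provisioned card becomes active. */
static int linecard_set_active(struct linecard *lc)
{
	if (lc->state != LC_PROVISIONED)
		return -EINVAL;
	lc->state = LC_ACTIVE;
	lc->notifications++;
	return 0;
}

/* Driver reports the card was removed: drop back to provisioned, keep placeholders. */
static void linecard_set_inactive(struct linecard *lc)
{
	if (lc->state == LC_ACTIVE) {
		lc->state = LC_PROVISIONED;
		lc->notifications++;
	}
}
```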
    • devlink: implement line card provisioning · fcdc8ce2
      Committed by Jiri Pirko
      To be able to configure everything needed on a port/netdevice of a
      line card without the line card being present, introduce line card
      provisioning. Setting a type starts the provisioning process, and the
      driver is expected to create placeholder instances (ports/netdevices)
      for that line card type.
      
      Allow the user to query the supported line card types with the line
      card get command. Then implement the netlink SET command to allow the
      user to set or unset the card type.
      
      On the driver API side, add provision/unprovision ops and an array of
      supported types to be advertised. Upon a provision op call, the driver
      should take care of creating the instances for the particular line card
      type. Introduce provision_set/clear() functions to be called by the
      driver once provisioning/unprovisioning is done on its side. These
      helpers are not to be called directly due to the async nature of
      provisioning.
      
      Example:
      $ devlink port # No ports are listed
      $ devlink lc
      pci/0000:01:00.0:
        lc 1 state unprovisioned
          supported_types:
             16x100G
        lc 2 state unprovisioned
          supported_types:
             16x100G
        lc 3 state unprovisioned
          supported_types:
             16x100G
        lc 4 state unprovisioned
          supported_types:
             16x100G
        lc 5 state unprovisioned
          supported_types:
             16x100G
        lc 6 state unprovisioned
          supported_types:
             16x100G
        lc 7 state unprovisioned
          supported_types:
             16x100G
        lc 8 state unprovisioned
          supported_types:
             16x100G
      
      $ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G
      $ devlink lc show pci/0000:01:00.0 lc 8
      pci/0000:01:00.0:
        lc 8 state active type 16x100G
          supported_types:
             16x100G
      $ devlink port
      pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false
      pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4
      pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4
      pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4
      pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4
      pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4
      pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4
      pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4
      pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4
      pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4
      pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4
      pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4
      pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4
      pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4
      pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4
      pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4
      pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4
      
      $ devlink lc set pci/0000:01:00.0 lc 8 notype
      
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
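The asynchronous hand-off between the devlink core and the driver can be sketched as follows. All names are hypothetical: `lc_set_type()` plays the user's `devlink lc set ... type` request, the `provision` callback plays the driver op, and `lc_provision_done()` plays the driver calling its provision_set() helper later, once the placeholder ports exist. Unprovisioning is omitted, and failure handling is simplified to rolling back to unprovisioned.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum lc_state { LC_UNPROVISIONED, LC_PROVISIONING, LC_PROVISIONED };

struct linecard;

/* Driver op: start provisioning and return; completion is reported later. */
typedef int (*provision_fn)(struct linecard *lc, const char *type);

struct linecard {
	enum lc_state state;
	const char *const *supported_types;  /* NULL-terminated, advertised by the driver */
	const char *type;
	provision_fn provision;
};

/* Core side: validate the type, mark the card as provisioning, call the driver. */
static int lc_set_type(struct linecard *lc, const char *type)
{
	const char *const *t;
	int err;

	if (lc->state != LC_UNPROVISIONED)
		return -EBUSY;
	for (t = lc->supported_types; *t; t++)
		if (strcmp(*t, type) == 0)
			break;
	if (!*t)
		return -EINVAL;  /* not in supported_types */
	lc->state = LC_PROVISIONING;
	lc->type = *t;
	err = lc->provision(lc, *t);
	if (err) {
		lc->state = LC_UNPROVISIONED;
		lc->type = NULL;
	}
	return err;
}

/* Driver side, called later once placeholder ports/netdevs are created. */
static void lc_provision_done(struct linecard *lc)
{
	if (lc->state == LC_PROVISIONING)
		lc->state = LC_PROVISIONED;
}

/* Example driver op: a real driver would start creating ports here. */
static int example_provision(struct linecard *lc, const char *type)
{
	(void)lc;
	(void)type;
	return 0;
}
```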
    • devlink: add support to create line card and expose to user · c246f9b5
      Committed by Jiri Pirko
      Extend the devlink API so that the driver is able to create and
      destroy line card instances. There can be multiple line cards per devlink
      device. Expose this new type of object to userspace over the devlink
      netlink API, with notifications.
      
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix signed/unsigned comparison · 843f7740
      Committed by Eric Dumazet
      Kernel test robot reported:
      
      smatch warnings:
      net/ipv4/tcp_input.c:5966 tcp_rcv_established() warn: unsigned 'reason' is never less than zero.
      
      I actually had one packetdrill failing because of this bug,
      and was about to send the fix :)
      
      v2: Andreas Schwab also pointed out that @reason needs to be negated
          before we reach tcp_drop_reason()
      
      Fixes: 4b506af9 ("tcp: add two drop reasons for tcp_ack()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
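The smatch warning is easy to reproduce in userspace: once a negative error value is stored in an unsigned variable, a `< 0` test is always false, so the error path it guards never runs. GCC gives an enum with no negative enumerators an unsigned underlying type, which is how the kernel variable became unsigned; this sketch uses `unsigned int` explicitly so it does not depend on that implementation-defined choice. `fake_ack()` and its negative-on-failure convention are assumptions modelled on the commit text, and the fixed version negates the value before use, as the v2 note describes.

```c
/* tcp_ack()-style helper: positive on success, negated drop reason on failure. */
static int fake_ack(int ok, unsigned int drop_reason)
{
	return ok ? 1 : -(int)drop_reason;
}

/* Buggy: the unsigned variable turns -5 into a huge positive value. */
static int buggy_is_drop(int ok, unsigned int drop_reason)
{
	unsigned int reason = fake_ack(ok, drop_reason);

	return reason < 0;  /* always false: smatch and -Wtype-limits flag this */
}

/* Fixed: keep the result signed, then negate it to recover the drop reason. */
static unsigned int fixed_drop_reason(int ok, unsigned int drop_reason)
{
	int reason = fake_ack(ok, drop_reason);

	if (reason < 0)
		return (unsigned int)-reason;
	return 0;  /* 0 means "not dropped" in this sketch */
}
```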
  5. 17 Apr 2022: 12 commits
  6. 16 Apr 2022: 1 commit
    • ipv6: make ip6_rt_gc_expire an atomic_t · 9cb7c013
      Committed by Eric Dumazet
      Reads and writes to ip6_rt_gc_expire have always been racy,
      as syzbot recently reported [1].
      
      There is a possible risk of underflow, leading to an unexpectedly
      high value being passed to fib6_run_gc(), although I have not
      observed this in the field.
      
      Hosts hitting ip6_dst_gc() very hard are in a pretty bad
      state anyway.
      
      [1]
      BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc
      
      read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1:
       ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
       dst_alloc+0x9b/0x160 net/core/dst.c:86
       ip6_dst_alloc net/ipv6/route.c:344 [inline]
       icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
       mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
       mld_send_cr net/ipv6/mcast.c:2119 [inline]
       mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
       process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
       worker_thread+0x618/0xa70 kernel/workqueue.c:2436
       kthread+0x1a9/0x1e0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30
      
      read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0:
       ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
       dst_alloc+0x9b/0x160 net/core/dst.c:86
       ip6_dst_alloc net/ipv6/route.c:344 [inline]
       icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
       mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
       mld_send_cr net/ipv6/mcast.c:2119 [inline]
       mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
       process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
       worker_thread+0x618/0xa70 kernel/workqueue.c:2436
       kthread+0x1a9/0x1e0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000bb3 -> 0x00000ba9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: mld mld_ifc_work
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
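A sketch of the racy counter turned into atomic operations, using C11 `<stdatomic.h>` in userspace rather than the kernel's atomic_t API. `gc_expire`, `gc_bump()` and `gc_decay()` are hypothetical names; the decay rule (subtract value >> elasticity) is an assumption modelled on ip6_dst_gc(), and the compare-exchange loop is a choice of this sketch, not necessarily what the commit does. The sketch runs single-threaded, so it shows the shape of the API rather than demonstrating the race itself.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for net->ipv6.ip6_rt_gc_expire. */
static atomic_int gc_expire;

/* Bump the expiry and return the new value as one atomic step. */
static int gc_bump(void)
{
	return atomic_fetch_add(&gc_expire, 1) + 1;
}

/*
 * Decay the expiry by val >> elasticity. The compare-exchange loop retries
 * if another thread changed the value in between, so each decay is computed
 * from the value it replaces and no concurrent update is silently lost.
 */
static int gc_decay(int elasticity)
{
	int old = atomic_load(&gc_expire);
	int next;

	do {
		next = old - (old >> elasticity);
	} while (!atomic_compare_exchange_weak(&gc_expire, &old, next));
	return next;
}
```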