1. 11 Aug, 2022 24 commits
  2. 10 Aug, 2022 16 commits
    • bpf, arm64: Fix bpf trampoline instruction endianness · aada4766
      Committed by Xu Kuohai
      The sparse tool complains as follows:
      
      arch/arm64/net/bpf_jit_comp.c:1684:16:
      	warning: incorrect type in assignment (different base types)
      arch/arm64/net/bpf_jit_comp.c:1684:16:
      	expected unsigned int [usertype] *branch
      arch/arm64/net/bpf_jit_comp.c:1684:16:
      	got restricted __le32 [usertype] *
      arch/arm64/net/bpf_jit_comp.c:1700:52:
      	error: subtraction of different types can't work (different base
      	types)
      arch/arm64/net/bpf_jit_comp.c:1734:29:
      	warning: incorrect type in assignment (different base types)
      arch/arm64/net/bpf_jit_comp.c:1734:29:
      	expected unsigned int [usertype] *
      arch/arm64/net/bpf_jit_comp.c:1734:29:
      	got restricted __le32 [usertype] *
      arch/arm64/net/bpf_jit_comp.c:1918:52:
      	error: subtraction of different types can't work (different base
      	types)
      
      This is because the variable branch in function invoke_bpf_prog and the
      variable branches in function prepare_trampoline are declared as type
      u32 *, which conflicts with ctx->image's type __le32 *, so sparse complains
      whenever an assignment or arithmetic operation involves these two
      variables and ctx->image.
      
      Since arm64 instructions are always little-endian, change the type of
      these two variables to __le32 * and call cpu_to_le32() to convert
      instruction to little-endian before writing it to memory. This is also
      in line with emit() which internally does cpu_to_le32(), too.
      
      Fixes: efc9909f ("bpf, arm64: Add bpf trampoline for arm64")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
      Link: https://lore.kernel.org/bpf/20220808040735.1232002-1-xukuohai@huawei.com
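The type mismatch described in this commit can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: `le32_t`, `to_le32()` and `emit_insn()` are hypothetical stand-ins for `__le32`, `cpu_to_le32()` and `emit()`. Wrapping the bytes in a struct means a plain `uint32_t *` cannot be assigned to an `le32_t *` without a compiler diagnostic, which is roughly the check sparse performs for `__le32`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __le32: a 32-bit value whose
 * in-memory byte order is always little-endian. */
typedef struct { uint8_t b[4]; } le32_t;

/* Userspace analogue of cpu_to_le32(): lay the bytes out explicitly so the
 * result is identical on little- and big-endian hosts. */
static le32_t to_le32(uint32_t v)
{
    le32_t out = { { (uint8_t)v, (uint8_t)(v >> 8),
                     (uint8_t)(v >> 16), (uint8_t)(v >> 24) } };
    return out;
}

/* Write one 32-bit instruction word into the image at index idx. */
static void emit_insn(le32_t *image, size_t idx, uint32_t insn)
{
    assert(image != NULL);
    image[idx] = to_le32(insn);
}
```

Every store into the image goes through `to_le32()`, so the byte layout does not depend on the endianness of the CPU running the JIT.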
    • genetlink: correct uAPI defines · f329a0eb
      Committed by Jakub Kicinski
      Commit 50a896cf ("genetlink: properly support per-op policy dumping")
      seems to have copy'n'pasted things a little incorrectly.
      
      The #define CTRL_ATTR_MCAST_GRP_MAX should have stayed right
      after the previous enum. The new CTRL_ATTR_POLICY_* needs
      its own define for MAX and that max should not contain the
      superfluous _DUMP in the name.
      
      We probably can't do anything about the CTRL_ATTR_POLICY_DUMP_MAX
      any more; there's likely code which uses it. For consistency
      (*cough* codegen *cough*) let's add the correctly named define
      nonetheless.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
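The naming convention this commit restores can be sketched as follows. All `EXAMPLE_*` names are hypothetical and chosen for illustration; they are not the real genetlink identifiers. The sketch assumes the usual pattern of a sentinel as the last enumerator, a `_MAX` define placed right after the enum it belongs to, and an alias that keeps an already-shipped misnamed define working.

```c
#include <assert.h>

/* Hypothetical attribute enum; the EXAMPLE_* names are illustrative only. */
enum {
    EXAMPLE_ATTR_UNSPEC,
    EXAMPLE_ATTR_DO,
    EXAMPLE_ATTR_DUMP,

    __EXAMPLE_ATTR_MAX,   /* always last: one past the highest attribute */
};
/* The MAX define sits directly after the enum it describes. */
#define EXAMPLE_ATTR_MAX (__EXAMPLE_ATTR_MAX - 1)

/* A misnamed define that user space may already use is kept as an alias. */
#define EXAMPLE_ATTR_DUMP_MAX EXAMPLE_ATTR_MAX

static_assert(EXAMPLE_ATTR_MAX == EXAMPLE_ATTR_DUMP,
              "MAX must equal the last real attribute");

/* Returns nonzero if attr is a real attribute of this enum. */
static int example_attr_is_valid(int attr)
{
    return attr > EXAMPLE_ATTR_UNSPEC && attr <= EXAMPLE_ATTR_MAX;
}
```

Keeping the old name as an alias means existing callers still compile, while generated code can rely on the regular `<PREFIX>_MAX` form.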
    • devlink: Fix use-after-free after a failed reload · 6b4db2e5
      Committed by Ido Schimmel
      After a failed devlink reload, devlink parameters are still registered,
      which means user space can set and get their values. In the case of the
      mlxsw "acl_region_rehash_interval" parameter, these operations will
      trigger a use-after-free [1].
      
      Fix this by rejecting set and get operations while in the failed state.
      Return the "-EOPNOTSUPP" error code which does not abort the parameters
      dump, but instead causes it to skip over the problematic parameter.
      
      Another possible fix is to perform these checks in the mlxsw parameter
      callbacks, but other drivers might be affected by the same problem and I
      am not aware of scenarios where these stricter checks will cause a
      regression.
      
      [1]
      mlxsw_spectrum3 0000:00:10.0: Port 125: Failed to register netdev
      mlxsw_spectrum3 0000:00:10.0: Failed to create ports
      
      ==================================================================
      BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
      Read of size 4 at addr ffff8880099dcfd8 by task kworker/u4:4/777
      
      CPU: 1 PID: 777 Comm: kworker/u4:4 Not tainted 5.19.0-rc7-custom-126601-gfe26f28c586d #1
      Hardware name: QEMU MSN4700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Workqueue: netns cleanup_net
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x92/0xbd lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:313 [inline]
       print_report.cold+0x5e/0x5cf mm/kasan/report.c:429
       kasan_report+0xb9/0xf0 mm/kasan/report.c:491
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report_generic.c:306
       mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
       mlxsw_sp_acl_region_rehash_intrvl_get+0x49/0x60 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c:1106
       mlxsw_sp_params_acl_region_rehash_intrvl_get+0x33/0x80 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3854
       devlink_param_get net/core/devlink.c:4981 [inline]
       devlink_nl_param_fill+0x238/0x12d0 net/core/devlink.c:5089
       devlink_param_notify+0xe5/0x230 net/core/devlink.c:5168
       devlink_ns_change_notify net/core/devlink.c:4417 [inline]
       devlink_ns_change_notify net/core/devlink.c:4396 [inline]
       devlink_reload+0x15f/0x700 net/core/devlink.c:4507
       devlink_pernet_pre_exit+0x112/0x1d0 net/core/devlink.c:12272
       ops_pre_exit_list net/core/net_namespace.c:152 [inline]
       cleanup_net+0x494/0xc00 net/core/net_namespace.c:582
       process_one_work+0x9fc/0x1710 kernel/workqueue.c:2289
       worker_thread+0x675/0x10b0 kernel/workqueue.c:2436
       kthread+0x30c/0x3d0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
       </TASK>
      
      The buggy address belongs to the physical page:
      page:ffffea0000267700 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99dc
      flags: 0x100000000000000(node=0|zone=1)
      raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880099dce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8880099dcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff8880099dcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                          ^
       ffff8880099dd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8880099dd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      ==================================================================
      
      Fixes: 98bbf70c ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
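The behaviour described in this commit (reject access in the failed state, and let a dump skip the rejected parameter rather than abort) can be modelled in a few lines. This is a simplified userspace sketch under stated assumptions: `struct example_dev`, `example_param_get()` and `example_param_dump()` are hypothetical and do not mirror the devlink data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a device exposing one parameter. */
struct example_dev {
    bool reload_failed;   /* set when a reload did not complete */
    int *param_storage;   /* backing memory; gone after a failed reload */
};

/* Reject access in the failed state instead of touching freed memory. */
static int example_param_get(const struct example_dev *dev, int *val)
{
    if (dev->reload_failed)
        return -EOPNOTSUPP;
    *val = *dev->param_storage;
    return 0;
}

/* Dump one parameter per device: -EOPNOTSUPP skips the entry, while any
 * other error aborts the whole dump. */
static int example_param_dump(const struct example_dev *devs, size_t n,
                              int *out, size_t *count)
{
    assert(devs != NULL && out != NULL && count != NULL);
    *count = 0;
    for (size_t i = 0; i < n; i++) {
        int val;
        int err = example_param_get(&devs[i], &val);

        if (err == -EOPNOTSUPP)
            continue;
        if (err)
            return err;
        out[(*count)++] = val;
    }
    return 0;
}
```

Placing the check in the common getter rather than in each driver callback matches the reasoning in the commit message: other drivers may have the same problem.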
    • net:bonding:support balance-alb interface with vlan to bridge · d5410ac7
      Committed by Sun Shouxin
      In my test, a balance-alb bond (bond0) has two slaves, eth0 and eth1,
      and the VLAN interface bond0.150 is created on top of bond0.
      After adding bond0.150 to a Linux bridge, I noticed that bond0,
      bond0.150 and the bridge were all assigned the same MAC as eth0.
      When bond0.150 receives a packet whose dest IP is the bridge's
      and whose dest MAC is eth1's, the Linux bridge finds no FDB entry
      for eth1's MAC and does not handle the packet as expected.
      This patch fixes the issue. The topology is shown below:
      
      eth1(mac:eth1_mac)--bond0(balance-alb,mac:eth0_mac)--eth0(mac:eth0_mac)
                            |
                         bond0.150(mac:eth0_mac)
                            |
                         bridge(ip:br_ip, mac:eth0_mac)--other port
      
      Suggested-by: Hu Yadi <huyd12@chinatelecom.cn>
      Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macsec: Fix traffic counters/statistics · 91ec9bd5
      Committed by Clayton Yager
      OutOctetsProtected, OutOctetsEncrypted, InOctetsValidated, and
      InOctetsDecrypted were incrementing by the total number of octets in frames
      instead of by the number of octets of User Data in frames.
      
      The Controlled Port statistics ifOutOctets and ifInOctets were incrementing
      by the total number of octets instead of the number of octets of the MSDUs
      plus octets of the destination and source MAC addresses.
      
      The Controlled Port statistics ifInDiscards and ifInErrors were not
      incrementing each time the counters they aggregate were.
      
      The Controlled Port statistic ifInErrors was not included in the output of
      macsec_get_stats64 so the value was not present in ip commands output.
      
      The ReceiveSA counters InPktsNotValid, InPktsNotUsingSA, and InPktsUnusedSA
      were not incrementing.
      
      Signed-off-by: Clayton Yager <Clayton_Yager@selinc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock: Set socket state back to SS_UNCONNECTED in vsock_connect_timeout() · a3e7b29e
      Committed by Peilin Ye
      Imagine two non-blocking vsock_connect() requests on the same socket.
      The first request schedules @connect_work, and after it times out,
      vsock_connect_timeout() sets *sock* state back to TCP_CLOSE, but keeps
      *socket* state as SS_CONNECTING.
      
      Later, the second request returns -EALREADY, meaning the socket "already
      has a pending connection in progress", even though the first request has
      already timed out.
      
      As suggested by Stefano, fix it by setting *socket* state back to
      SS_UNCONNECTED, so that the second request will return -ETIMEDOUT.
      
      Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock: Fix memory leak in vsock_connect() · 7e97cfed
      Committed by Peilin Ye
      An O_NONBLOCK vsock_connect() request may try to reschedule
      @connect_work.  Imagine the following sequence of vsock_connect()
      requests:
      
        1. The 1st, non-blocking request schedules @connect_work, which will
           expire after 200 jiffies.  Socket state is now SS_CONNECTING;
      
        2. Later, the 2nd, blocking request gets interrupted by a signal after
           a few jiffies while waiting for the connection to be established.
           Socket state is back to SS_UNCONNECTED, but @connect_work is still
           pending, and will expire after 100 jiffies.
      
        3. Now, the 3rd, non-blocking request tries to schedule @connect_work
           again.  Since @connect_work is already scheduled,
           schedule_delayed_work() silently returns.  sock_hold() is called
           twice, but sock_put() will only be called once in
           vsock_connect_timeout(), causing a memory leak reported by syzbot:
      
        BUG: memory leak
        unreferenced object 0xffff88810ea56a40 (size 1232):
          comm "syz-executor756", pid 3604, jiffies 4294947681 (age 12.350s)
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            28 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  (..@............
          backtrace:
            [<ffffffff837c830e>] sk_prot_alloc+0x3e/0x1b0 net/core/sock.c:1930
            [<ffffffff837cbe22>] sk_alloc+0x32/0x2e0 net/core/sock.c:1989
            [<ffffffff842ccf68>] __vsock_create.constprop.0+0x38/0x320 net/vmw_vsock/af_vsock.c:734
            [<ffffffff842ce8f1>] vsock_create+0xc1/0x2d0 net/vmw_vsock/af_vsock.c:2203
            [<ffffffff837c0cbb>] __sock_create+0x1ab/0x2b0 net/socket.c:1468
            [<ffffffff837c3acf>] sock_create net/socket.c:1519 [inline]
            [<ffffffff837c3acf>] __sys_socket+0x6f/0x140 net/socket.c:1561
            [<ffffffff837c3bba>] __do_sys_socket net/socket.c:1570 [inline]
            [<ffffffff837c3bba>] __se_sys_socket net/socket.c:1568 [inline]
            [<ffffffff837c3bba>] __x64_sys_socket+0x1a/0x20 net/socket.c:1568
            [<ffffffff84512815>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
            [<ffffffff84512815>] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
            [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
        <...>
      
      Use mod_delayed_work() instead: if @connect_work is already scheduled,
      reschedule it, and undo sock_hold() to keep the reference count
      balanced.
      
      Reported-and-tested-by: syzbot+b03f55bf128f9a38f064@syzkaller.appspotmail.com
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Co-developed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
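The reference-count imbalance from the sequence above can be reproduced with a small userspace model. Everything here is a hypothetical simplification: `struct example_sock` stands in for the socket, `example_schedule()` imitates the "do nothing if already pending" behaviour of schedule_delayed_work(), and `example_mod()` imitates mod_delayed_work() by always re-arming and reporting whether the work was already pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a reference-counted socket with one delayed work. */
struct example_sock {
    int refcnt;
    bool work_pending;
    long work_delay;
};

/* Imitates schedule_delayed_work(): no effect if the work is pending. */
static bool example_schedule(struct example_sock *sk, long delay)
{
    if (sk->work_pending)
        return false;
    sk->work_pending = true;
    sk->work_delay = delay;
    return true;
}

/* Imitates mod_delayed_work(): always (re)arms the timer and returns
 * true if the work was already pending. */
static bool example_mod(struct example_sock *sk, long delay)
{
    bool was_pending = sk->work_pending;

    sk->work_pending = true;
    sk->work_delay = delay;
    return was_pending;
}

/* Buggy path: takes a reference even when nothing new was queued. */
static void connect_buggy(struct example_sock *sk, long delay)
{
    sk->refcnt++;
    example_schedule(sk, delay);
}

/* Fixed path: undo the extra reference if the work was already pending. */
static void connect_fixed(struct example_sock *sk, long delay)
{
    sk->refcnt++;
    if (example_mod(sk, delay))
        sk->refcnt--;
}

/* The timeout handler runs once and drops exactly one reference. */
static void timeout_fires(struct example_sock *sk)
{
    assert(sk->work_pending);
    sk->work_pending = false;
    sk->refcnt--;
}
```

In this model, two buggy connect calls followed by one timeout leave the count one above its starting value, while the fixed path returns it to the starting value.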
    • Revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" · 6fd2c17f
      Committed by Jose Alonso
      This reverts commit 36a15e1c.
      
      Using FLAG_SEND_ZLP causes problems on other firmware/hardware
      versions that previously had no issues.
      
      FLAG_SEND_ZLP is not safe to use in this context.
      See:
      https://patchwork.ozlabs.org/project/netdev/patch/1270599787.8900.8.camel@Linuxdev4-laptop/#118378
      The original problem needs to be solved another way.
      
      Fixes: 36a15e1c ("net: usb: ax88179_178a needs FLAG_SEND_ZLP")
      Cc: stable@vger.kernel.org
      Reported-by: Ronald Wahl <ronald.wahl@raritan.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216327
      Link: https://bugs.archlinux.org/task/75491
      Signed-off-by: Jose Alonso <joalonsof@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlabel: fix typo in comment · 2cd0e8db
      Committed by Topi Miettinen
      'IPv4 and IPv4' should be 'IPv4 and IPv6'.
      
      Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'linux-can-fixes-for-6.0-20220810' of... · e7f16495
      Committed by David S. Miller
      Merge tag 'linux-can-fixes-for-6.0-20220810' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      this is a pull request of 4 patches for net/master, with the
      whitespace issue fixed.
      
      Fedor Pchelkin contributes 2 fixes for the j1939 CAN protocol.
      
      A patch by me for the ems_usb driver fixes an unaligned access
      warning.
      
      Sebastian Würl's patch for the mcp251x driver fixes a race condition
      in the receive interrupt.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'do-not-use-rt_tos-for-ipv6-flowlabel' · 996237d9
      Committed by Jakub Kicinski
      Matthias May says:
      
      ====================
      Do not use RT_TOS for IPv6 flowlabel
      
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      ====================
      
      Link: https://lore.kernel.org/r/20220805191906.9323-1-matthias.may@westermo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: do not use RT_TOS for IPv6 flowlabel · ab7e2e0d
      Committed by Matthias May
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      
      Fixes: 571912c6 ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlx5: do not use RT_TOS for IPv6 flowlabel · bcb0da7f
      Committed by Matthias May
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      
      Fixes: ce99f6b9 ("net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vxlan: do not use RT_TOS for IPv6 flowlabel · e488d4f5
      Committed by Matthias May
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      
      Fixes: 1400615d ("vxlan: allow setting ipv6 traffic class")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • geneve: do not use RT_TOS for IPv6 flowlabel · ca2bb695
      Committed by Matthias May
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      
      Fixes: 3a56f86f ("geneve: handle ipv6 priority like ipv4 tos")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • geneve: fix TOS inheriting for ipv4 · b4ab94d6
      Committed by Matthias May
      The current code retrieves the TOS field after the lookup
      on the ipv4 routing table. The routing process currently
      only allows routing based on the original 3 TOS bits, and
      not on the full 6 DSCP bits.
      As a result, the retrieved TOS is truncated to those 3 bits.
      However, for inheriting purposes the full 6 bits should be used.
      
      Extract the full 6 bits before the route lookup and use
      that instead of the cut off 3 TOS bits.
      
      Fixes: e305ac6c ("geneve: Add support to collect tunnel metadata.")
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Link: https://lore.kernel.org/r/20220805190006.8078-1-matthias.may@westermo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
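The difference between the 3 TOS bits and the 6 DSCP bits mentioned in this commit can be shown with plain bit masks. This is a sketch, not the geneve code: the two mask values and both function names are assumptions made for illustration (0x1C for the three legacy TOS bits used in route lookup, 0xFC for the six DSCP bits at the top of the TOS byte).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed masks for this sketch: three legacy TOS bits used for routing,
 * and the six DSCP bits in the upper part of the TOS byte. */
#define EXAMPLE_RT_TOS_MASK 0x1Cu
#define EXAMPLE_DSCP_MASK   0xFCu

static_assert((EXAMPLE_RT_TOS_MASK & EXAMPLE_DSCP_MASK) == EXAMPLE_RT_TOS_MASK,
              "the routing bits are a subset of the DSCP bits");

/* What survives if the TOS byte is read back after the route lookup. */
static uint8_t tos_after_lookup(uint8_t tos)
{
    return (uint8_t)(tos & EXAMPLE_RT_TOS_MASK);
}

/* What should be inherited: the full DSCP field, taken before the lookup. */
static uint8_t tos_for_inherit(uint8_t tos)
{
    return (uint8_t)(tos & EXAMPLE_DSCP_MASK);
}
```

For a TOS byte of 0xB8, the value read after the lookup keeps only 0x18, while the value captured beforehand keeps the full 0xB8.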