1. 09 Aug, 2022 1 commit
  2. 02 Aug, 2022 6 commits
  3. 01 Aug, 2022 6 commits
  4. 30 Jul, 2022 1 commit
  5. 29 Jul, 2022 15 commits
    • seg6: add support for SRv6 H.L2Encaps.Red behavior · 13f0296b
      Committed by Andrea Mayer
      The SRv6 H.L2Encaps.Red behavior described in [1] is an optimization of
      the SRv6 H.L2Encaps behavior [2].
      
      H.L2Encaps.Red reduces the length of the SRH by excluding the first
      segment (SID) in the SRH of the pushed IPv6 header. The first SID is
      only placed in the IPv6 Destination Address field of the pushed IPv6
      header.
      When the SRv6 Policy only contains one SID the SRH is omitted, unless
      there is an HMAC TLV to be carried.
      
      [1] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.4
      [2] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.3

      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: Anton Makarov <anton.makarov11235@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
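
The size effect of the reduced encapsulation described in the commit above can be sketched in a few lines of C. This is only an illustrative model, not the kernel's seg6 code: the function name srh_len() and its parameters are hypothetical, and it assumes the usual SRH layout of an 8-byte fixed part followed by 16 bytes per SID, with any TLV (such as an HMAC TLV) passed in as an opaque length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SRH_FIXED_LEN 8   /* fixed part of the SRH, in bytes */
#define SID_LEN       16  /* one SID is an IPv6 address */

/*
 * Hypothetical helper: length in bytes of the SRH pushed for a policy of
 * n_sids segments. With reduced == true the first SID lives only in the
 * IPv6 Destination Address, so it is left out of the SRH. tlv_len is the
 * length of any TLV to carry (0 if none).
 */
size_t srh_len(size_t n_sids, bool reduced, size_t tlv_len)
{
    size_t sids_in_srh;

    assert(n_sids >= 1);
    sids_in_srh = reduced ? n_sids - 1 : n_sids;

    /* Single-SID policy with nothing else to carry: no SRH at all. */
    if (reduced && sids_in_srh == 0 && tlv_len == 0)
        return 0;

    return SRH_FIXED_LEN + sids_in_srh * SID_LEN + tlv_len;
}
```

Under these assumptions a three-SID policy needs 56 bytes of SRH with the plain behavior and 40 bytes with the reduced one, and a single-SID policy without TLVs needs no SRH at all.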
    • seg6: add support for SRv6 H.Encaps.Red behavior · b07c8cdb
      Committed by Andrea Mayer
      The SRv6 H.Encaps.Red behavior described in [1] is an optimization of
      the SRv6 H.Encaps behavior [2].
      
      H.Encaps.Red reduces the length of the SRH by excluding the first
      segment (SID) in the SRH of the pushed IPv6 header. The first SID is
      only placed in the IPv6 Destination Address field of the pushed IPv6
      header.
      When the SRv6 Policy only contains one SID the SRH is omitted, unless
      there is an HMAC TLV to be carried.
      
      [1] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.2
      [2] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.1

      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: Anton Makarov <anton.makarov11235@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/af_packet: check len when min_header_len equals to 0 · dc633700
      Committed by Zhengchao Shao
      A user can use an AF_PACKET socket to send packets of length 0.
      When min_header_len equals 0, packet_snd() calls __dev_queue_xmit()
      to send them, and sock->type can be any type.
      
      Reported-by: syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com
      Fixes: fd189422 ("bpf: Don't redirect packets with invalid pkt_len")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ax25: fix incorrect dev_tracker usage · d7c4c9e0
      Committed by Eric Dumazet
      While investigating a separate rose issue [1] and enabling
      CONFIG_NET_DEV_REFCNT_TRACKER=y, Bernard reported an orthogonal ax25 issue [2].
      
      An ax25_dev can be used by one (or many) struct ax25_cb.
      We thus need a separate dev_tracker for each struct ax25_cb.
      
      After this patch is applied, we are able to focus on rose.
      
      [1] https://lore.kernel.org/netdev/fb7544a1-f42e-9254-18cc-c9b071f4ca70@free.fr/
      
      [2]
      [  205.798723] reference already released.
      [  205.798732] allocated in:
      [  205.798734]  ax25_bind+0x1a2/0x230 [ax25]
      [  205.798747]  __sys_bind+0xea/0x110
      [  205.798753]  __x64_sys_bind+0x18/0x20
      [  205.798758]  do_syscall_64+0x5c/0x80
      [  205.798763]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  205.798768] freed in:
      [  205.798770]  ax25_release+0x115/0x370 [ax25]
      [  205.798778]  __sock_release+0x42/0xb0
      [  205.798782]  sock_close+0x15/0x20
      [  205.798785]  __fput+0x9f/0x260
      [  205.798789]  ____fput+0xe/0x10
      [  205.798792]  task_work_run+0x64/0xa0
      [  205.798798]  exit_to_user_mode_prepare+0x18b/0x190
      [  205.798804]  syscall_exit_to_user_mode+0x26/0x40
      [  205.798808]  do_syscall_64+0x69/0x80
      [  205.798812]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  205.798827] ------------[ cut here ]------------
      [  205.798829] WARNING: CPU: 2 PID: 2605 at lib/ref_tracker.c:136 ref_tracker_free.cold+0x60/0x81
      [  205.798837] Modules linked in: rose netrom mkiss ax25 rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 i915 rtw88_8821ce rtw88_8821c x86_pkg_temp_thermal rtw88_pci intel_powerclamp rtw88_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel kvm_intel snd_intel_dspcfg mac80211 snd_hda_codec kvm i2c_algo_bit drm_buddy drm_dp_helper btusb drm_kms_helper snd_hwdep btrtl snd_hda_core btbcm joydev crct10dif_pclmul btintel crc32_pclmul ghash_clmulni_intel mei_hdcp btmtk intel_rapl_msr aesni_intel bluetooth input_leds snd_pcm crypto_simd syscopyarea processor_thermal_device_pci_legacy sysfillrect cryptd intel_soc_dts_iosf snd_seq sysimgblt ecdh_generic fb_sys_fops rapl libarc4 processor_thermal_device intel_cstate processor_thermal_rfim cec snd_timer ecc snd_seq_device cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl mei rc_core at24 snd intel_pch_thermal intel_rapl_common ttm soundcore int340x_thermal_zone video
      [  205.798948]  mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler drm msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid i2c_i801 i2c_smbus r8169 xhci_pci ahci libahci realtek lpc_ich xhci_pci_renesas [last unloaded: ax25]
      [  205.798992] CPU: 2 PID: 2605 Comm: ax25ipd Not tainted 5.18.11-F6BVP #3
      [  205.798996] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
      [  205.798999] RIP: 0010:ref_tracker_free.cold+0x60/0x81
      [  205.799005] Code: e8 d2 01 9b ff 83 7b 18 00 74 14 48 c7 c7 2f d7 ff 98 e8 10 6e fc ff 8b 7b 18 e8 b8 01 9b ff 4c 89 ee 4c 89 e7 e8 5d fd 07 00 <0f> 0b b8 ea ff ff ff e9 30 05 9b ff 41 0f b6 f7 48 c7 c7 a0 fa 4e
      [  205.799008] RSP: 0018:ffffaf5281073958 EFLAGS: 00010286
      [  205.799011] RAX: 0000000080000000 RBX: ffff9a0bd687ebe0 RCX: 0000000000000000
      [  205.799014] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 00000000ffffffff
      [  205.799016] RBP: ffffaf5281073a10 R08: 0000000000000003 R09: fffffffffffd5618
      [  205.799019] R10: 0000000000ffff10 R11: 000000000000000f R12: ffff9a0bc53384d0
      [  205.799022] R13: 0000000000000282 R14: 00000000ae000001 R15: 0000000000000001
      [  205.799024] FS:  0000000000000000(0000) GS:ffff9a0d0f300000(0000) knlGS:0000000000000000
      [  205.799028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  205.799031] CR2: 00007ff6b8311554 CR3: 000000001ac10004 CR4: 00000000001706e0
      [  205.799033] Call Trace:
      [  205.799035]  <TASK>
      [  205.799038]  ? ax25_dev_device_down+0xd9/0x1b0 [ax25]
      [  205.799047]  ? ax25_device_event+0x9f/0x270 [ax25]
      [  205.799055]  ? raw_notifier_call_chain+0x49/0x60
      [  205.799060]  ? call_netdevice_notifiers_info+0x52/0xa0
      [  205.799065]  ? dev_close_many+0xc8/0x120
      [  205.799070]  ? unregister_netdevice_many+0x13d/0x890
      [  205.799073]  ? unregister_netdevice_queue+0x90/0xe0
      [  205.799076]  ? unregister_netdev+0x1d/0x30
      [  205.799080]  ? mkiss_close+0x7c/0xc0 [mkiss]
      [  205.799084]  ? tty_ldisc_close+0x2e/0x40
      [  205.799089]  ? tty_ldisc_hangup+0x137/0x210
      [  205.799092]  ? __tty_hangup.part.0+0x208/0x350
      [  205.799098]  ? tty_vhangup+0x15/0x20
      [  205.799103]  ? pty_close+0x127/0x160
      [  205.799108]  ? tty_release+0x139/0x5e0
      [  205.799112]  ? __fput+0x9f/0x260
      [  205.799118]  ax25_dev_device_down+0xd9/0x1b0 [ax25]
      [  205.799126]  ax25_device_event+0x9f/0x270 [ax25]
      [  205.799135]  raw_notifier_call_chain+0x49/0x60
      [  205.799140]  call_netdevice_notifiers_info+0x52/0xa0
      [  205.799146]  dev_close_many+0xc8/0x120
      [  205.799152]  unregister_netdevice_many+0x13d/0x890
      [  205.799157]  unregister_netdevice_queue+0x90/0xe0
      [  205.799161]  unregister_netdev+0x1d/0x30
      [  205.799165]  mkiss_close+0x7c/0xc0 [mkiss]
      [  205.799170]  tty_ldisc_close+0x2e/0x40
      [  205.799173]  tty_ldisc_hangup+0x137/0x210
      [  205.799178]  __tty_hangup.part.0+0x208/0x350
      [  205.799184]  tty_vhangup+0x15/0x20
      [  205.799188]  pty_close+0x127/0x160
      [  205.799193]  tty_release+0x139/0x5e0
      [  205.799199]  __fput+0x9f/0x260
      [  205.799203]  ____fput+0xe/0x10
      [  205.799208]  task_work_run+0x64/0xa0
      [  205.799213]  do_exit+0x33b/0xab0
      [  205.799217]  ? __handle_mm_fault+0xc4f/0x15f0
      [  205.799224]  do_group_exit+0x35/0xa0
      [  205.799228]  __x64_sys_exit_group+0x18/0x20
      [  205.799232]  do_syscall_64+0x5c/0x80
      [  205.799238]  ? handle_mm_fault+0xba/0x290
      [  205.799242]  ? debug_smp_processor_id+0x17/0x20
      [  205.799246]  ? fpregs_assert_state_consistent+0x26/0x50
      [  205.799251]  ? exit_to_user_mode_prepare+0x49/0x190
      [  205.799256]  ? irqentry_exit_to_user_mode+0x9/0x20
      [  205.799260]  ? irqentry_exit+0x33/0x40
      [  205.799263]  ? exc_page_fault+0x87/0x170
      [  205.799268]  ? asm_exc_page_fault+0x8/0x30
      [  205.799273]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  205.799277] RIP: 0033:0x7ff6b80eaca1
      [  205.799281] Code: Unable to access opcode bytes at RIP 0x7ff6b80eac77.
      [  205.799283] RSP: 002b:00007fff6dfd4738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      [  205.799287] RAX: ffffffffffffffda RBX: 00007ff6b8215a00 RCX: 00007ff6b80eaca1
      [  205.799290] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
      [  205.799293] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000028
      [  205.799295] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6b8215a00
      [  205.799298] R13: 0000000000000000 R14: 00007ff6b821aee8 R15: 00007ff6b821af00
      [  205.799304]  </TASK>
      
      Fixes: feef318c ("ax25: fix UAF bugs of net_device caused by rebinding operation")
      Reported-by: Bernard F6BVP <f6bvp@free.fr>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Duoming Zhou <duoming@zju.edu.cn>
      Link: https://lore.kernel.org/r/20220728051821.3160118-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
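
Why the ax25 commit above needs one dev_tracker per struct ax25_cb can be illustrated with a toy user-space model. Everything here (toy_tracker, toy_dev, toy_cb, toy_warnings()) is hypothetical and far simpler than the kernel's ref_tracker; it only assumes that a tracker remembers a single outstanding hold and complains when it is released twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a reference tracker: remembers one outstanding hold. */
struct toy_tracker {
    bool held;
};

struct toy_dev {
    int refcnt;
    struct toy_tracker shared;  /* buggy layout: one tracker on the device */
};

struct toy_cb {
    struct toy_dev *dev;
    struct toy_tracker own;     /* fixed layout: one tracker per control block */
};

static void toy_hold(struct toy_dev *dev, struct toy_tracker *t)
{
    dev->refcnt++;
    t->held = true;
}

/* Returns false when the tracker says the reference was already released. */
static bool toy_put(struct toy_dev *dev, struct toy_tracker *t)
{
    bool ok = t->held;

    t->held = false;
    dev->refcnt--;
    return ok;
}

/*
 * Two control blocks bind to the same device and then release it.
 * Returns how many "reference already released" warnings would fire.
 */
int toy_warnings(bool per_cb_tracker)
{
    struct toy_dev dev = { 0 };
    struct toy_cb a = { .dev = &dev }, b = { .dev = &dev };
    int warnings = 0;

    toy_hold(&dev, per_cb_tracker ? &a.own : &dev.shared);
    toy_hold(&dev, per_cb_tracker ? &b.own : &dev.shared);
    if (!toy_put(&dev, per_cb_tracker ? &a.own : &dev.shared))
        warnings++;
    if (!toy_put(&dev, per_cb_tracker ? &b.own : &dev.shared))
        warnings++;

    assert(dev.refcnt == 0);  /* the plain refcount is balanced either way */
    return warnings;
}
```

In this model the reference count itself is always balanced; only the bookkeeping is wrong when two holders share one tracker, which is the kind of mismatch a "reference already released" warning points at.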
    • devlink: Hold the instance lock in health callbacks · c90005b5
      Committed by Moshe Shemesh
      Let the core take the devlink instance lock around health callbacks and
      remove the now redundant locking in the drivers.
      
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: devlink: remove region snapshots list dependency on devlink->lock · 2dec18ad
      Committed by Jiri Pirko
      After the mlx4 driver is converted to do a locked reload,
      devlink_region_snapshot_create() may be called from both locked and
      unlocked contexts.
      
      Note that in mlx4, region snapshots could be created on any command
      failure. That can happen in any flow that involves commands to FW,
      which means most of the driver flows.
      
      So resolve this by removing the dependency on devlink->lock for region
      snapshots list consistency and introducing a new mutex to ensure it.
      
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: devlink: remove region snapshot ID tracking dependency on devlink->lock · 5502e871
      Committed by Jiri Pirko
      After the mlx4 driver is converted to do a locked reload, the functions
      to get/put region snapshot IDs may be called from both locked and
      unlocked contexts.
      
      So resolve this by removing the dependency on devlink->lock for region
      snapshot ID tracking, using the internal xa_lock() to maintain
      snapshot_ids xa_array consistency.
      
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • devlink: introduce framework for selftests · 08f588fa
      Committed by Vikas Gupta
      Add a framework for running selftests.
      The framework exposes devlink commands and test suite(s) that let the
      user query the tests supported by the driver and execute them.
      
      New entries in devlink_nl_ops:
      devlink_nl_cmd_selftests_show_doit/dumpit: To query the selftests
      supported by the drivers.
      devlink_nl_cmd_selftests_run: To execute selftests. Users can
      provide a test mask for executing group tests or standalone tests.
      
      The Documentation/networking/devlink/ path is already part of MAINTAINERS
      and the new files come under this path, so no update to MAINTAINERS is
      needed.
      
      Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
      Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/tls: Multi-threaded calls to TX tls_dev_del · 7adc91e0
      Committed by Tariq Toukan
      Multiple TLS device-offloaded contexts can be added in parallel via
      concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
      sequential in tls_device_gc_task.
      
      This is not a sustainable behavior. This creates a rate gap between add
      and del operations (addition rate outperforms the deletion rate).  When
      running for enough time, the TLS device resources could get exhausted,
      failing to offload new connections.
      
      Replace the single-threaded garbage collector work with a per-context
      alternative, so they can be handled on several cores in parallel. Use
      a new dedicated destruct workqueue for this.
      
      Tested with mlx5 device:
      Before: 22141 add/sec,   103 del/sec
      After:  11684 add/sec, 11684 del/sec
      
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/tls: Perform immediate device ctx cleanup when possible · 113671b2
      Committed by Tariq Toukan
      TLS context destructor can be run in atomic context. Cleanup operations
      for device-offloaded contexts could require access and interaction with
      the device callbacks, which might sleep. Hence, the cleanup of such
      contexts must be deferred and completed inside an async work.
      
      For all others, this is not necessary, as cleanup is atomic. Invoke
      cleanup immediately for them, avoiding queueing redundant gc work.
      
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: rx: Fix unsigned comparison with less than zero · 8fd1e151
      Committed by Yang Li
      tls_rx_msg_size() returns an int, which can be a negative error
      code, but the result is assigned to the unsigned long variable 'sz',
      so the 'sz < 0' check can never be true. Make 'sz' an int.
      
      This eliminates the following coccicheck warning:
      ./net/tls/tls_strp.c:211:6-8: WARNING: Unsigned expression compared with zero: sz < 0
      
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20220728031019.32838-1-yang.lee@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
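
The class of bug fixed by the commit above is easy to reproduce outside the kernel. The sketch below is a stand-alone illustration, not the tls_strp.c code: toy_msg_size() is a hypothetical helper standing in for a function that returns either a length or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns a message size, or a negative errno on error. */
int toy_msg_size(bool fail)
{
    return fail ? -EINVAL : 128;
}

/*
 * Buggy pattern: the int result is stored in an unsigned long. A negative
 * errno wraps to a huge positive value, so a later "sz < 0" test is dead code.
 */
unsigned long size_as_unsigned(bool fail)
{
    unsigned long sz = toy_msg_size(fail);

    return sz;
}

/* Fixed pattern: keep the result in an int so the error check works. */
bool size_failed(bool fail)
{
    int sz = toy_msg_size(fail);

    return sz < 0;
}

/* Small self-check showing both behaviors. */
void sz_demo(void)
{
    assert(size_as_unsigned(true) > (unsigned long)LONG_MAX);
    assert(size_failed(true));
    assert(!size_failed(false));
}
```

The unsigned variant never reports the failure; it hands back a length close to ULONG_MAX instead, which is why tools such as coccicheck flag the comparison.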
    • tls: rx: fix the false positive warning · e20691fa
      Committed by Jakub Kicinski
      I went too far in the accessor conversion: we can't use tls_strp_msg()
      after decryption because the message may not be ready. What we care
      about on this path is that the output skb is detached, i.e. we didn't
      somehow just turn around and use the input skb with its TCP data
      still attached. So look at the anchor directly.
      
      Fixes: 84c61fe1 ("tls: rx: do not use the standard strparser")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: strp: rename and multithread the workqueue · d11ef9cc
      Committed by Jakub Kicinski
      Paolo points out that there seems to be no strong reason strparser
      uses a single-threaded workqueue. Perhaps there were some performance
      or pinning considerations? Since we don't know (and it's the slow path)
      let's default to the most natural, multi-threaded choice.
      
      Also rename the workqueue to "tls-".
      
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: rx: don't consider sock_rcvtimeo() cumulative · 70f03fc2
      Committed by Jakub Kicinski
      Eric indicates that restarting rcvtimeo on every wait may be fine.
      I thought that we should consider it cumulative, and made
      tls_rx_reader_lock() return the remaining timeo after acquiring
      the reader lock.
      
      tls_rx_rec_wait() gets its timeout passed in by value so it
      does not keep track of time previously spent.
      
      Make the lock waiting consistent with tls_rx_rec_wait() - don't
      keep track of time spent.
      
      Read the timeo fresh in tls_rx_rec_wait().
      It's unclear to me why callers are supposed to cache the value.
      
      Link: https://lore.kernel.org/all/CANn89iKcmSfWgvZjzNGbsrndmCch2HC_EPZ7qmGboDNaWoviNQ@mail.gmail.com/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
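
The difference between a cumulative receive timeout and one that restarts on every wait, as discussed in the commit above, can be shown with a toy model. The function names and the spent[] array of per-wait durations are hypothetical; this is not the tls_rx_reader_lock() or tls_rx_rec_wait() code, only a sketch of the two accounting policies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Cumulative policy: all waits share one budget of timeo ticks.
 * spent[i] is how long the i-th wait actually takes.
 * Returns true if every wait completes before the budget runs out.
 */
bool waits_ok_cumulative(long timeo, const long *spent, size_t n)
{
    long remaining = timeo;

    assert(spent != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (spent[i] >= remaining)
            return false;        /* shared budget exhausted */
        remaining -= spent[i];   /* carry the remainder into the next wait */
    }
    return true;
}

/*
 * Restarting policy: every wait reads the timeout fresh, so time spent in
 * earlier waits is not tracked. This is the policy the commit settles on.
 */
bool waits_ok_restarting(long timeo, const long *spent, size_t n)
{
    assert(spent != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (spent[i] >= timeo)
            return false;        /* this single wait took too long */
    }
    return true;
}
```

With a timeout of 100 ticks and two waits of 60 ticks each, the cumulative policy times out during the second wait while the restarting policy succeeds; a single 120-tick wait fails under both.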
    • net: ping6: Fix memleak in ipv6_renew_options(). · e2732600
      Committed by Kuniyuki Iwashima
      When we close ping6 sockets, some resources are left unfreed because
      pingv6_prot is missing sk->sk_prot->destroy().  As reported by
      syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.
      
          struct ipv6_sr_hdr *hdr;
          char data[24] = {0};
          int fd;
      
          hdr = (struct ipv6_sr_hdr *)data;
          hdr->hdrlen = 2;
          hdr->type = IPV6_SRCRT_TYPE_4;
      
          fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
          setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
          close(fd);
      
      To fix memory leaks, let's add a destroy function.
      
      Note the socket() syscall checks if the GID is within the range of
      net.ipv4.ping_group_range.  The default value is [1, 0] so that no
      GID meets the condition (1 <= GID <= 0).  Thus, the local DoS does
      not succeed until we change the default value.  However, at least
      Ubuntu/Fedora/RHEL loosen it.
      
          $ cat /usr/lib/sysctl.d/50-default.conf
          ...
          -net.ipv4.ping_group_range = 0 2147483647
      
      Also, other paths could be reported with these options, and
      some of them require CAP_NET_RAW.
      
        setsockopt
            IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
            IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
            IPV6_HOPOPTS (inet6_sk(sk)->opt)
            IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
            IPV6_RTHDR (inet6_sk(sk)->opt)
            IPV6_DSTOPTS (inet6_sk(sk)->opt)
            IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)
      
        getsockopt
            IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)
      
      For the record, I have included a splat that differs from syzbot's.
      
        unreferenced object 0xffff888006270c60 (size 96):
          comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
          hex dump (first 32 bytes):
            01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00  ....D...........
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
            [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
            [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
            [<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
            [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
            [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
            [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      
      [0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176
      
      Fixes: 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.")
      Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
      Reported-by: Ayushman Dutta <ayudutta@amazon.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
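
The role of the missing destroy hook in the ping6 commit above can be modeled in user space. This is a deliberately small sketch with hypothetical names (toy_sock, toy_proto, toy_close() and friends); it assumes only what the commit message states, namely that options attached via setsockopt() stay allocated at close time unless the protocol provides a destroy function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_OPT_SIZE 96  /* size of the leaked object in the report above */

struct toy_sock {
    void *opt;  /* stands in for inet6_sk(sk)->opt */
};

struct toy_proto {
    void (*destroy)(struct toy_sock *sk);  /* optional, like sk_prot->destroy */
};

/* Models setsockopt(IPV6_RTHDR): attach an allocated option to the socket. */
int toy_set_opt(struct toy_sock *sk)
{
    sk->opt = malloc(TOY_OPT_SIZE);
    return sk->opt ? 0 : -1;
}

/* Models the destroy function added by the fix: release socket options. */
void toy_destroy(struct toy_sock *sk)
{
    free(sk->opt);
    sk->opt = NULL;
}

/*
 * Models close(): run the protocol's destroy hook if there is one and
 * return how many bytes are still attached to the socket afterwards.
 */
size_t toy_close(const struct toy_proto *proto, struct toy_sock *sk)
{
    size_t leaked;

    assert(proto != NULL && sk != NULL);
    if (proto->destroy)
        proto->destroy(sk);
    leaked = sk->opt ? TOY_OPT_SIZE : 0;

    /* Free here only so that this demo itself does not leak. */
    free(sk->opt);
    sk->opt = NULL;
    return leaked;
}
```

A protocol without a destroy hook reports 96 bytes left behind on close, while one that registers toy_destroy() reports none, mirroring the before and after of the fix.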
  6. 28 Jul, 2022 4 commits
  7. 27 Jul, 2022 7 commits