提交 · 1ef255e257173f4bc44317ef2076e7e0de688fdf · openeuler / Kernel

09 8月, 2022 1 次提交

iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() · 1ef255e2

由 Al Viro 提交于 2年前

Most of the users immediately follow successful iov_iter_get_pages()
with advancing by the amount it had returned.

Provide inline wrappers doing that, convert trivial open-coded
uses of those.

BTW, iov_iter_get_pages() never returns more than it had been asked
to; such checks in cifs ought to be removed someday...
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1ef255e2

02 8月, 2022 6 次提交

net: devlink: Fix missing mutex_unlock() call · 80ef9286

由 Ammar Faizi 提交于 2年前

Commit 2dec18ad forgets to call mutex_unlock() before the function
returns in the error path:

   New smatch warnings:
   net/core/devlink.c:6392 devlink_nl_cmd_region_new() warn: inconsistent \
   returns '&region->snapshot_lock'.

Make sure we call mutex_unlock() in this error path.
Reported-by: Nkernel test robot <lkp@intel.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Fixes: 2dec18ad ("net: devlink: remove region snapshots list dependency on devlink->lock")
Signed-off-by: NAmmar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: NJiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220801115742.1309329-1-ammar.faizi@intel.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

80ef9286

net/tls: Remove redundant workqueue flush before destroy · d81c7cdd

由 Tariq Toukan 提交于 2年前

destroy_workqueue() safely destroys the workqueue after draining it.
No need for the explicit call to flush_workqueue(). Remove it.
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220801112444.26175-1-tariqt@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

d81c7cdd

net: dsa: Fix spelling mistakes and cleanup code · 062cf5eb

由 Xie Shaowen 提交于 2年前

fix follow spelling misktakes:
	desconstructed ==> deconstructed
	enforcment ==> enforcement
Reported-by: NHacash Robot <hacashRobot@santino.com>
Signed-off-by: NXie Shaowen <studentxswpy@163.com>
Link: https://lore.kernel.org/r/20220730092254.3102875-1-studentxswpy@163.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

062cf5eb

dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock · a41b17ff

由 Hangyu Hua 提交于 2年前

In the case of sk->dccps_qpolicy == DCCPQ_POLICY_PRIO, dccp_qpolicy_full
will drop a skb when qpolicy is full. And the lock in dccp_sendmsg is
released before sock_alloc_send_skb and then relocked after
sock_alloc_send_skb. The following conditions may lead dccp_qpolicy_push
to add skb to an already full sk_write_queue:

thread1--->lock
thread1--->dccp_qpolicy_full: queue is full. drop a skb
thread1--->unlock
thread2--->lock
thread2--->dccp_qpolicy_full: queue is not full. no need to drop.
thread2--->unlock
thread1--->lock
thread1--->dccp_qpolicy_push: add a skb. queue is full.
thread1--->unlock
thread2--->lock
thread2--->dccp_qpolicy_push: add a skb!
thread2--->unlock

Fix this by moving dccp_qpolicy_full.

Fixes: b1308dc0 ("[DCCP]: Set TX Queue Length Bounds via Sysctl")
Signed-off-by: NHangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20220729110027.40569-1-hbh25y@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

a41b17ff

net: rose: add netdev ref tracker to 'struct rose_sock' · 2df91e39

由 Eric Dumazet 提交于 2年前

This will help debugging netdevice refcount problems with
CONFIG_NET_DEV_REFCNT_TRACKER=y
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

2df91e39

net: rose: fix netdev reference changes · 93102782

由 Eric Dumazet 提交于 2年前

Bernard reported that trying to unload rose module would lead
to infamous messages:

unregistered_netdevice: waiting for rose0 to become free. Usage count = xx

This patch solves the issue, by making sure each socket referring to
a netdevice holds a reference count on it, and properly releases it
in rose_release().

rose_dev_first() is also fixed to take a device reference
before leaving the rcu_read_locked section.

Following patch will add ref_tracker annotations to ease
future bug hunting.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-by: NBernard Pidoux <f6bvp@free.fr>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Tested-by: NBernard Pidoux <f6bvp@free.fr>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

93102782

01 8月, 2022 6 次提交

net: devlink: enable parallel ops on netlink interface · 09b27846

由 Jiri Pirko 提交于 2年前

As the devlink_mutex was removed and all devlink instances are protected
individually by devlink->lock mutex, allow the netlink ops to run
in parallel and therefore allow user to execute commands on multiple
devlink instances simultaneously.
Signed-off-by: NJiri Pirko <jiri@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09b27846

net: devlink: remove devlink_mutex · d3efc2a6

由 Jiri Pirko 提交于 2年前

All accesses to devlink structure from userspace and drivers are locked
with devlink->lock instance mutex. Also, devlinks xa_array iteration is
taken care of by iteration helpers taking devlink reference.

Therefore, remove devlink_mutex as it is no longer needed.
Signed-off-by: NJiri Pirko <jiri@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3efc2a6

net: devlink: convert reload command to take implicit devlink->lock · 644a66c6

由 Jiri Pirko 提交于 2年前

Convert reload command to behave the same way as the rest of the
commands and let if be called with devlink->lock held. Remove the
temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK
flag is no longer used, remove it alongside.
Signed-off-by: NJiri Pirko <jiri@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

644a66c6

net: devlink: introduce "unregistering" mark and use it during devlinks iteration · c2368b19

由 Jiri Pirko 提交于 2年前

Add new mark called "unregistering" to be set at the beginning of
devlink_unregister() function. Check this mark during devlinks
iteration in order to prevent getting a reference of devlink which is
being currently unregistered.
Signed-off-by: NJiri Pirko <jiri@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2368b19

udp: Remove redundant __udp_sysctl_init() call from udp_init(). · 02a7cb28

由 Kuniyuki Iwashima 提交于 2年前

__udp_sysctl_init() is called for init_net via udp_sysctl_ops.

While at it, we can rename __udp_sysctl_init() to udp_sysctl_init().

Fixes: 1e802951 ("udp: Move the udp sysctl to namespace.")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02a7cb28

net/rds: Use PTR_ERR instead of IS_ERR for rdsdebug() · 5121db6a

由 Li Qiong 提交于 2年前

If 'local_odp_mr->r_trans_private' is a error code,
it is better to print the error code than to print
the value of IS_ERR().
Signed-off-by: NLi Qiong <liqiong@nfschina.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5121db6a

30 7月, 2022 1 次提交

dn_route: replace "jiffies-now>0" with "jiffies!=now" · 0f14a835

由 Yu Zhe 提交于 2年前

Use "jiffies != now" to replace "jiffies - now > 0" to make
code more readable. We want to put a limit on how long the
loop can run for before rescheduling.
Signed-off-by: NYu Zhe <yuzhe@nfschina.com>
Link: https://lore.kernel.org/r/20220729061712.22666-1-yuzhe@nfschina.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

0f14a835

29 7月, 2022 15 次提交

seg6: add support for SRv6 H.L2Encaps.Red behavior · 13f0296b

由 Andrea Mayer 提交于 2年前

The SRv6 H.L2Encaps.Red behavior described in [1] is an optimization of
the SRv6 H.L2Encaps behavior [2].

H.L2Encaps.Red reduces the length of the SRH by excluding the first
segment (SID) in the SRH of the pushed IPv6 header. The first SID is
only placed in the IPv6 Destination Address field of the pushed IPv6
header.
When the SRv6 Policy only contains one SID the SRH is omitted, unless
there is an HMAC TLV to be carried.

[1] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.4
[2] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.3Signed-off-by: NAndrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: NAnton Makarov <anton.makarov11235@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13f0296b

seg6: add support for SRv6 H.Encaps.Red behavior · b07c8cdb

由 Andrea Mayer 提交于 2年前

The SRv6 H.Encaps.Red behavior described in [1] is an optimization of
the SRv6 H.Encaps behavior [2].

H.Encaps.Red reduces the length of the SRH by excluding the first
segment (SID) in the SRH of the pushed IPv6 header. The first SID is
only placed in the IPv6 Destination Address field of the pushed IPv6
header.
When the SRv6 Policy only contains one SID the SRH is omitted, unless
there is an HMAC TLV to be carried.

[1] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.2
[2] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.1Signed-off-by: NAndrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: NAnton Makarov <anton.makarov11235@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b07c8cdb

net/af_packet: check len when min_header_len equals to 0 · dc633700

由 Zhengchao Shao 提交于 2年前

User can use AF_PACKET socket to send packets with the length of 0.
When min_header_len equals to 0, packet_snd will call __dev_queue_xmit
to send packets, and sock->type can be any type.

Reported-by: syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com
Fixes: fd189422 ("bpf: Don't redirect packets with invalid pkt_len")
Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc633700

ax25: fix incorrect dev_tracker usage · d7c4c9e0

由 Eric Dumazet 提交于 2年前

While investigating a separate rose issue [1], and enabling
CONFIG_NET_DEV_REFCNT_TRACKER=y, Bernard reported an orthogonal ax25 issue [2]

An ax25_dev can be used by one (or many) struct ax25_cb.
We thus need different dev_tracker, one per struct ax25_cb.

After this patch is applied, we are able to focus on rose.

[1] https://lore.kernel.org/netdev/fb7544a1-f42e-9254-18cc-c9b071f4ca70@free.fr/

[2]
[  205.798723] reference already released.
[  205.798732] allocated in:
[  205.798734]  ax25_bind+0x1a2/0x230 [ax25]
[  205.798747]  __sys_bind+0xea/0x110
[  205.798753]  __x64_sys_bind+0x18/0x20
[  205.798758]  do_syscall_64+0x5c/0x80
[  205.798763]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.798768] freed in:
[  205.798770]  ax25_release+0x115/0x370 [ax25]
[  205.798778]  __sock_release+0x42/0xb0
[  205.798782]  sock_close+0x15/0x20
[  205.798785]  __fput+0x9f/0x260
[  205.798789]  ____fput+0xe/0x10
[  205.798792]  task_work_run+0x64/0xa0
[  205.798798]  exit_to_user_mode_prepare+0x18b/0x190
[  205.798804]  syscall_exit_to_user_mode+0x26/0x40
[  205.798808]  do_syscall_64+0x69/0x80
[  205.798812]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.798827] ------------[ cut here ]------------
[  205.798829] WARNING: CPU: 2 PID: 2605 at lib/ref_tracker.c:136 ref_tracker_free.cold+0x60/0x81
[  205.798837] Modules linked in: rose netrom mkiss ax25 rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 i915 rtw88_8821ce rtw88_8821c x86_pkg_temp_thermal rtw88_pci intel_powerclamp rtw88_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel kvm_intel snd_intel_dspcfg mac80211 snd_hda_codec kvm i2c_algo_bit drm_buddy drm_dp_helper btusb drm_kms_helper snd_hwdep btrtl snd_hda_core btbcm joydev crct10dif_pclmul btintel crc32_pclmul ghash_clmulni_intel mei_hdcp btmtk intel_rapl_msr aesni_intel bluetooth input_leds snd_pcm crypto_simd syscopyarea processor_thermal_device_pci_legacy sysfillrect cryptd intel_soc_dts_iosf snd_seq sysimgblt ecdh_generic fb_sys_fops rapl libarc4 processor_thermal_device intel_cstate processor_thermal_rfim cec snd_timer ecc snd_seq_device cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl mei rc_core at24 snd intel_pch_thermal intel_rapl_common ttm soundcore int340x_thermal_zone video
[  205.798948]  mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler drm msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid i2c_i801 i2c_smbus r8169 xhci_pci ahci libahci realtek lpc_ich xhci_pci_renesas [last unloaded: ax25]
[  205.798992] CPU: 2 PID: 2605 Comm: ax25ipd Not tainted 5.18.11-F6BVP #3
[  205.798996] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
[  205.798999] RIP: 0010:ref_tracker_free.cold+0x60/0x81
[  205.799005] Code: e8 d2 01 9b ff 83 7b 18 00 74 14 48 c7 c7 2f d7 ff 98 e8 10 6e fc ff 8b 7b 18 e8 b8 01 9b ff 4c 89 ee 4c 89 e7 e8 5d fd 07 00 <0f> 0b b8 ea ff ff ff e9 30 05 9b ff 41 0f b6 f7 48 c7 c7 a0 fa 4e
[  205.799008] RSP: 0018:ffffaf5281073958 EFLAGS: 00010286
[  205.799011] RAX: 0000000080000000 RBX: ffff9a0bd687ebe0 RCX: 0000000000000000
[  205.799014] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 00000000ffffffff
[  205.799016] RBP: ffffaf5281073a10 R08: 0000000000000003 R09: fffffffffffd5618
[  205.799019] R10: 0000000000ffff10 R11: 000000000000000f R12: ffff9a0bc53384d0
[  205.799022] R13: 0000000000000282 R14: 00000000ae000001 R15: 0000000000000001
[  205.799024] FS:  0000000000000000(0000) GS:ffff9a0d0f300000(0000) knlGS:0000000000000000
[  205.799028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  205.799031] CR2: 00007ff6b8311554 CR3: 000000001ac10004 CR4: 00000000001706e0
[  205.799033] Call Trace:
[  205.799035]  <TASK>
[  205.799038]  ? ax25_dev_device_down+0xd9/0x1b0 [ax25]
[  205.799047]  ? ax25_device_event+0x9f/0x270 [ax25]
[  205.799055]  ? raw_notifier_call_chain+0x49/0x60
[  205.799060]  ? call_netdevice_notifiers_info+0x52/0xa0
[  205.799065]  ? dev_close_many+0xc8/0x120
[  205.799070]  ? unregister_netdevice_many+0x13d/0x890
[  205.799073]  ? unregister_netdevice_queue+0x90/0xe0
[  205.799076]  ? unregister_netdev+0x1d/0x30
[  205.799080]  ? mkiss_close+0x7c/0xc0 [mkiss]
[  205.799084]  ? tty_ldisc_close+0x2e/0x40
[  205.799089]  ? tty_ldisc_hangup+0x137/0x210
[  205.799092]  ? __tty_hangup.part.0+0x208/0x350
[  205.799098]  ? tty_vhangup+0x15/0x20
[  205.799103]  ? pty_close+0x127/0x160
[  205.799108]  ? tty_release+0x139/0x5e0
[  205.799112]  ? __fput+0x9f/0x260
[  205.799118]  ax25_dev_device_down+0xd9/0x1b0 [ax25]
[  205.799126]  ax25_device_event+0x9f/0x270 [ax25]
[  205.799135]  raw_notifier_call_chain+0x49/0x60
[  205.799140]  call_netdevice_notifiers_info+0x52/0xa0
[  205.799146]  dev_close_many+0xc8/0x120
[  205.799152]  unregister_netdevice_many+0x13d/0x890
[  205.799157]  unregister_netdevice_queue+0x90/0xe0
[  205.799161]  unregister_netdev+0x1d/0x30
[  205.799165]  mkiss_close+0x7c/0xc0 [mkiss]
[  205.799170]  tty_ldisc_close+0x2e/0x40
[  205.799173]  tty_ldisc_hangup+0x137/0x210
[  205.799178]  __tty_hangup.part.0+0x208/0x350
[  205.799184]  tty_vhangup+0x15/0x20
[  205.799188]  pty_close+0x127/0x160
[  205.799193]  tty_release+0x139/0x5e0
[  205.799199]  __fput+0x9f/0x260
[  205.799203]  ____fput+0xe/0x10
[  205.799208]  task_work_run+0x64/0xa0
[  205.799213]  do_exit+0x33b/0xab0
[  205.799217]  ? __handle_mm_fault+0xc4f/0x15f0
[  205.799224]  do_group_exit+0x35/0xa0
[  205.799228]  __x64_sys_exit_group+0x18/0x20
[  205.799232]  do_syscall_64+0x5c/0x80
[  205.799238]  ? handle_mm_fault+0xba/0x290
[  205.799242]  ? debug_smp_processor_id+0x17/0x20
[  205.799246]  ? fpregs_assert_state_consistent+0x26/0x50
[  205.799251]  ? exit_to_user_mode_prepare+0x49/0x190
[  205.799256]  ? irqentry_exit_to_user_mode+0x9/0x20
[  205.799260]  ? irqentry_exit+0x33/0x40
[  205.799263]  ? exc_page_fault+0x87/0x170
[  205.799268]  ? asm_exc_page_fault+0x8/0x30
[  205.799273]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.799277] RIP: 0033:0x7ff6b80eaca1
[  205.799281] Code: Unable to access opcode bytes at RIP 0x7ff6b80eac77.
[  205.799283] RSP: 002b:00007fff6dfd4738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  205.799287] RAX: ffffffffffffffda RBX: 00007ff6b8215a00 RCX: 00007ff6b80eaca1
[  205.799290] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
[  205.799293] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000028
[  205.799295] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6b8215a00
[  205.799298] R13: 0000000000000000 R14: 00007ff6b821aee8 R15: 00007ff6b821af00
[  205.799304]  </TASK>

Fixes: feef318c ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: NBernard F6BVP <f6bvp@free.fr>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20220728051821.3160118-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

d7c4c9e0

devlink: Hold the instance lock in health callbacks · c90005b5

由 Moshe Shemesh 提交于 2年前

Let the core take the devlink instance lock around health callbacks and
remove the now redundant locking in the drivers.
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

c90005b5

net: devlink: remove region snapshots list dependency on devlink->lock · 2dec18ad

由 Jiri Pirko 提交于 2年前

After mlx4 driver is converted to do locked reload,
devlink_region_snapshot_create() may be called from both locked and
unlocked context.

Note that in mlx4 region snapshots could be created on any command
failure. That can happen in any flow that involves commands to FW,
which means most of the driver flows.

So resolve this by removing dependency on devlink->lock for region
snapshots list consistency and introduce new mutex to ensure it.
Signed-off-by: NJiri Pirko <jiri@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

2dec18ad

net: devlink: remove region snapshot ID tracking dependency on devlink->lock · 5502e871

由 Jiri Pirko 提交于 2年前

After mlx4 driver is converted to do locked reload, functions to get/put
regions snapshot ID may be called from both locked and unlocked context.

So resolve this by removing dependency on devlink->lock for region
snapshot ID tracking by using internal xa_lock() to maintain
shapshot_ids xa_array consistency.
Signed-off-by: NJiri Pirko <jiri@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

5502e871

devlink: introduce framework for selftests · 08f588fa

由 Vikas Gupta 提交于 2年前

Add a framework for running selftests.
Framework exposes devlink commands and test suite(s) to the user
to execute and query the supported tests by the driver.

Below are new entries in devlink_nl_ops
devlink_nl_cmd_selftests_show_doit/dumpit: To query the supported
selftests by the drivers.
devlink_nl_cmd_selftests_run: To execute selftests. Users can
provide a test mask for executing group tests or standalone tests.

Documentation/networking/devlink/ path is already part of MAINTAINERS &
the new files come under this path. Hence no update needed to the
MAINTAINERS
Signed-off-by: NVikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: NAndy Gospodarek <gospo@broadcom.com>
Reviewed-by: NJiri Pirko <jiri@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

08f588fa

net/tls: Multi-threaded calls to TX tls_dev_del · 7adc91e0

由 Tariq Toukan 提交于 2年前

Multiple TLS device-offloaded contexts can be added in parallel via
concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
sequential in tls_device_gc_task.

This is not a sustainable behavior. This creates a rate gap between add
and del operations (addition rate outperforms the deletion rate).  When
running for enough time, the TLS device resources could get exhausted,
failing to offload new connections.

Replace the single-threaded garbage collector work with a per-context
alternative, so they can be handled on several cores in parallel. Use
a new dedicated destruct workqueue for this.

Tested with mlx5 device:
Before: 22141 add/sec,   103 del/sec
After:  11684 add/sec, 11684 del/sec
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

7adc91e0

net/tls: Perform immediate device ctx cleanup when possible · 113671b2

由 Tariq Toukan 提交于 2年前

TLS context destructor can be run in atomic context. Cleanup operations
for device-offloaded contexts could require access and interaction with
the device callbacks, which might sleep. Hence, the cleanup of such
contexts must be deferred and completed inside an async work.

For all others, this is not necessary, as cleanup is atomic. Invoke
cleanup immediately for them, avoiding queueing redundant gc work.
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

113671b2

tls: rx: Fix unsigned comparison with less than zero · 8fd1e151

由 Yang Li 提交于 2年前

The return from the call to tls_rx_msg_size() is int, it can be
a negative error code, however this is being assigned to an
unsigned long variable 'sz', so making 'sz' an int.

Eliminate the following coccicheck warning:
./net/tls/tls_strp.c:211:6-8: WARNING: Unsigned expression compared with zero: sz < 0
Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
Signed-off-by: NYang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220728031019.32838-1-yang.lee@linux.alibaba.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

8fd1e151

tls: rx: fix the false positive warning · e20691fa

由 Jakub Kicinski 提交于 2年前

I went too far in the accessor conversion, we can't use tls_strp_msg()
after decryption because the message may not be ready. What we care
about on this path is that the output skb is detached, i.e. we didn't
somehow just turn around and used the input skb with its TCP data
still attached. So look at the anchor directly.

Fixes: 84c61fe1 ("tls: rx: do not use the standard strparser")
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

e20691fa

tls: strp: rename and multithread the workqueue · d11ef9cc

由 Jakub Kicinski 提交于 2年前

Paolo points out that there seems to be no strong reason strparser
users a single threaded workqueue. Perhaps there were some performance
or pinning considerations? Since we don't know (and it's the slow path)
let's default to the most natural, multi-threaded choice.

Also rename the workqueue to "tls-".
Suggested-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

d11ef9cc

tls: rx: don't consider sock_rcvtimeo() cumulative · 70f03fc2

由 Jakub Kicinski 提交于 2年前

Eric indicates that restarting rcvtimeo on every wait may be fine.
I thought that we should consider it cumulative, and made
tls_rx_reader_lock() return the remaining timeo after acquiring
the reader lock.

tls_rx_rec_wait() gets its timeout passed in by value so it
does not keep track of time previously spent.

Make the lock waiting consistent with tls_rx_rec_wait() - don't
keep track of time spent.

Read the timeo fresh in tls_rx_rec_wait().
It's unclear to me why callers are supposed to cache the value.

Link: https://lore.kernel.org/all/CANn89iKcmSfWgvZjzNGbsrndmCch2HC_EPZ7qmGboDNaWoviNQ@mail.gmail.com/Signed-off-by: NJakub Kicinski <kuba@kernel.org>

70f03fc2

net: ping6: Fix memleak in ipv6_renew_options(). · e2732600

由 Kuniyuki Iwashima 提交于 2年前

When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy().  As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.

    struct ipv6_sr_hdr *hdr;
    char data[24] = {0};
    int fd;

    hdr = (struct ipv6_sr_hdr *)data;
    hdr->hdrlen = 2;
    hdr->type = IPV6_SRCRT_TYPE_4;

    fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
    setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
    close(fd);

To fix memory leaks, let's add a destroy function.

Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range.  The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0).  Thus, the local DoS does
not succeed until we change the default value.  However, at least
Ubuntu/Fedora/RHEL loosen it.

    $ cat /usr/lib/sysctl.d/50-default.conf
    ...
    -net.ipv4.ping_group_range = 0 2147483647

Also, there could be another path reported with these options, and
some of them require CAP_NET_RAW.

  setsockopt
      IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
      IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
      IPV6_HOPOPTS (inet6_sk(sk)->opt)
      IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
      IPV6_RTHDR (inet6_sk(sk)->opt)
      IPV6_DSTOPTS (inet6_sk(sk)->opt)
      IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)

  getsockopt
      IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)

For the record, I left a different splat with syzbot's one.

  unreferenced object 0xffff888006270c60 (size 96):
    comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
    hex dump (first 32 bytes):
      01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00  ....D...........
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
      [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
      [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
      [<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
      [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
      [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
      [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176

Fixes: 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
Reported-by: NAyushman Dutta <ayudutta@amazon.com>
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

e2732600

28 7月, 2022 4 次提交

net: devlink: remove redundant net_eq() check from sb_pool_get_dumpit() · 2bb88b2c

由 Jiri Pirko 提交于 2年前

The net_eq() check is already performed inside
devlinks_xa_for_each_registered_get() helper, so remove the redundant
appearance.
Signed-off-by: NJiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220727055912.568391-1-jiri@resnulli.usSigned-off-by: NJakub Kicinski <kuba@kernel.org>

2bb88b2c

net/sched: sch_cbq: change the type of cbq_set_lss to void · a482d47d

由 Zhengchao Shao 提交于 2年前

Change the type of cbq_set_lss to void.
Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220726030748.243505-1-shaozhengchao@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

a482d47d

sctp: leave the err path free in sctp_stream_init to sctp_stream_free · 181d8d20

由 Xin Long 提交于 2年前

A NULL pointer dereference was reported by Wei Chen:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  RIP: 0010:__list_del_entry_valid+0x26/0x80
  Call Trace:
   <TASK>
   sctp_sched_dequeue_common+0x1c/0x90
   sctp_sched_prio_dequeue+0x67/0x80
   __sctp_outq_teardown+0x299/0x380
   sctp_outq_free+0x15/0x20
   sctp_association_free+0xc3/0x440
   sctp_do_sm+0x1ca7/0x2210
   sctp_assoc_bh_rcv+0x1f6/0x340

This happens when calling sctp_sendmsg without connecting to server first.
In this case, a data chunk already queues up in send queue of client side
when processing the INIT_ACK from server in sctp_process_init() where it
calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in
all stream_out will be freed in sctp_stream_init's err path. Then in the
asoc freeing it will crash when dequeuing this data chunk as stream_out
is missing.

As we can't free stream out before dequeuing all data from send queue, and
this patch is to fix it by moving the err path stream_out/in freeing in
sctp_stream_init() to sctp_stream_free() which is eventually called when
freeing the asoc in sctp_association_free(). This fix also makes the code
in sctp_process_init() more clear.

Note that in sctp_association_init() when it fails in sctp_stream_init(),
sctp_association_free() will not be called, and in that case it should
go to 'stream_free' err path to free stream instead of 'fail_init'.

Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
Reported-by: NWei Chen <harperchen1110@gmail.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

181d8d20

tcp: md5: fix IPv4-mapped support · e62d2e11

由 Eric Dumazet 提交于 2年前

After the blamed commit, IPv4 SYN packets handled
by a dual stack IPv6 socket are dropped, even if
perfectly valid.

$ nstat | grep MD5
TcpExtTCPMD5Failure             5                  0.0

For a dual stack listener, an incoming IPv4 SYN packet
would call tcp_inbound_md5_hash() with @family == AF_INET,
while tp->af_specific is pointing to tcp_sock_ipv6_specific.

Only later when an IPv4-mapped child is created, tp->af_specific
is changed to tcp_sock_ipv6_mapped_specific.

Fixes: 7bbb765b ("net/tcp: Merge TCP-MD5 inbound callbacks")
Reported-by: NBrian Vazquez <brianvv@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Reviewed-by: NDmitry Safonov <dima@arista.com>
Tested-by: NLeonard Crestez <cdleonard@gmail.com>
Link: https://lore.kernel.org/r/20220726115743.2759832-1-edumazet@google.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

e62d2e11

27 7月, 2022 7 次提交

net/smc: Enable module load on netlink usage · 28ec53f3

由 Stefan Raspl 提交于 2年前

Previously, the smc and smc_diag modules were automatically loaded as
dependencies of the ism module whenever an ISM device was present.
With the pending rework of the ISM API, the smc module will no longer
automatically be loaded in presence of an ISM device. Usage of an AF_SMC
socket will still trigger loading of the smc modules, but usage of a
netlink socket will not.
This is addressed by setting the correct module aliases.
Signed-off-by: NStefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: NTony Lu <tonylu@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28ec53f3

net/smc: Pass on DMBE bit mask in IRQ handler · 8b2fed8e

由 Stefan Raspl 提交于 2年前

Make the DMBE bits, which are passed on individually in ism_move() as
parameter idx, available to the receiver.
Signed-off-by: NStefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: NTony Lu <tonylu@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b2fed8e

s390/ism: Cleanups · 0a2f4f98

由 Stefan Raspl 提交于 2年前

Reworked signature of the function to retrieve the system EID: No plausible
reason to use a double pointer. And neither to pass in the device as an
argument, as this identifier is by definition per system, not per device.
Plus some minor consistency edits.
Signed-off-by: NStefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: NTony Lu <tonylu@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a2f4f98

net/smc: Eliminate struct smc_ism_position · eb481b02

由 Heiko Carstens 提交于 2年前

This struct is used in a single place only, and its usage generates
inefficient code. Time to clean up!
Signed-off-by: NHeiko Carstens <hca@linux.ibm.com>
Reviewed-and-tested-by: NStefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: NTony Lu <tonylu@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb481b02

ip6mr: remove stray rcu_read_unlock() from ip6_mr_forward() · a7e555d4

由 Eric Dumazet 提交于 2年前

One rcu_read_unlock() should have been removed in blamed commit.

Fixes: 9b1c21d8 ("ip6mr: do not acquire mrt_lock while calling ip6_mr_forward()")
Reported-by: NVladimir Oltean <olteanv@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220725200554.2563581-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

a7e555d4

mptcp: Do not return EINPROGRESS when subflow creation succeeds · b5177ed9

由 Mat Martineau 提交于 2年前

New subflows are created within the kernel using O_NONBLOCK, so
EINPROGRESS is the expected return value from kernel_connect().
__mptcp_subflow_connect() has the correct logic to consider EINPROGRESS
to be a successful case, but it has also used that error code as its
return value.

Before v5.19 this was benign: all the callers ignored the return
value. Starting in v5.19 there is a MPTCP_PM_CMD_SUBFLOW_CREATE generic
netlink command that does use the return value, so the EINPROGRESS gets
propagated to userspace.

Make __mptcp_subflow_connect() always return 0 on success instead.

Fixes: ec3edaa7 ("mptcp: Add handling of outgoing MP_JOIN requests")
Fixes: 702c2f64 ("mptcp: netlink: allow userspace-driven subflow establishment")
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20220725205231.87529-1-mathew.j.martineau@linux.intel.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

b5177ed9

tls: rx: do not use the standard strparser · 84c61fe1

由 Jakub Kicinski 提交于 2年前

TLS is a relatively poor fit for strparser. We pause the input
every time a message is received, wait for a read which will
decrypt the message, start the parser, repeat. strparser is
built to delineate the messages, wrap them in individual skbs
and let them float off into the stack or a different socket.
TLS wants the data pages and nothing else. There's no need
for TLS to keep cloning (and occasionally skb_unclone()'ing)
the TCP rx queue.

This patch uses a pre-allocated skb and attaches the skbs
from the TCP rx queue to it as frags. TLS is careful never
to modify the input skb without CoW'ing / detaching it first.

Since we call TCP rx queue cleanup directly we also get back
the benefit of skb deferred free.

Overall this results in a 6% gain in my benchmarks.
Acked-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

84c61fe1

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功