- 30 Nov 2022: 32 commits
-
-
Committed by Hao Chen
driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I62HX2

----------------------------------------------------------------------

When the SerDes lanes support both 25Gb/s and 50Gb/s and the user wants to set the port speed to 50Gb/s, the port can be configured either as one 50Gb/s SerDes lane or as two 25Gb/s SerDes lanes. This patch therefore adds support for querying and setting the lane number through sysfs to cover this scenario.

Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Jiantao Xiao <xiaojiantao1@h-partners.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
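The lane arithmetic described above can be illustrated with a minimal user-space sketch in C. The lane-speed table and the helper names (`lanes_for_speed`, `valid_lane_counts`) are hypothetical. They are not taken from the hns3 driver or its sysfs interface, and the sketch ignores any per-port limit on the number of lanes.

```c
#include <stddef.h>

/* Hypothetical: speeds (Mb/s) at which a single SerDes lane may run. */
static const unsigned int lane_speeds_mbps[] = { 25000, 50000 };

/* Lanes needed to reach port_mbps with lanes of lane_mbps each,
 * or 0 if port_mbps is not an exact multiple of lane_mbps. */
static unsigned int lanes_for_speed(unsigned int port_mbps,
				    unsigned int lane_mbps)
{
	if (lane_mbps == 0 || port_mbps % lane_mbps != 0)
		return 0;
	return port_mbps / lane_mbps;
}

/* Store every valid lane count for port_mbps in out[] (in table order)
 * and return how many were stored, never more than out_len. */
static size_t valid_lane_counts(unsigned int port_mbps,
				unsigned int *out, size_t out_len)
{
	size_t n = 0;
	size_t nspeeds = sizeof(lane_speeds_mbps) / sizeof(lane_speeds_mbps[0]);

	for (size_t i = 0; i < nspeeds && n < out_len; i++) {
		unsigned int lanes = lanes_for_speed(port_mbps, lane_speeds_mbps[i]);

		if (lanes)
			out[n++] = lanes;
	}
	return n;
}
```

For a 50Gb/s port this yields two candidates: 2 lanes at 25Gb/s and 1 lane at 50Gb/s. That ambiguity is why the lane number has to be selectable by the user.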
-
Committed by Jian Shen
driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I62HX2

----------------------------------------------------------------------

Because the FD rules for queue bonding are created automatically by the hardware, the driver needs to specify an FD counter for each function. This makes it possible to query how many times the queue bonding FD rules have been hit.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jiantao Xiao <xiaojiantao1@h-partners.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Jian Shen
driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I62HX2

----------------------------------------------------------------------

For device version V3, the hardware supports queue bonding mode. A VF cannot enable queue bonding mode unless the PF has enabled it. So the VF needs to query whether the PF supports queue bonding mode when initializing, and periodically query whether the PF has enabled it.

Because resources are limited, and to avoid a single VF occupying too much FD rule space, only a trusted VF is allowed to enable queue bonding mode.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jiantao Xiao <xiaojiantao1@h-partners.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
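The enable condition can be summarised as a single predicate. This is only a sketch: `struct vf_qb_state` and `vf_may_enable_queue_bonding()` are hypothetical names, not hclgevf symbols.

```c
#include <stdbool.h>

/* Hypothetical view of what a VF knows about queue bonding. */
struct vf_qb_state {
	bool pf_supports_qb;	/* queried once when the VF initializes */
	bool pf_enabled_qb;	/* re-queried periodically */
	bool vf_trusted;	/* only trusted VFs may use FD rule space */
};

/* A VF may enable queue bonding only if all three conditions hold. */
static bool vf_may_enable_queue_bonding(const struct vf_qb_state *s)
{
	return s->pf_supports_qb && s->pf_enabled_qb && s->vf_trusted;
}
```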
-
Committed by Jian Shen
driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I62HX2

----------------------------------------------------------------------

For device version V3, the hardware supports queue bonding. It can identify the tuple information of a TCP stream and create flow director (FD) rules automatically, so that the TX and RX packets of the stream stay in the same queue pair.

The driver sets the FD_ADD field of the TX BD for a TCP SYN packet, and sets the FD_DEL field for a TCP FIN or RST packet. The hardware creates or removes an FD rule according to the TX BD. It also supports aging out a rule that has not been hit for a long time.

Queue bonding mode is disabled by default, and can be enabled or disabled with the ethtool priv-flags command.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jiantao Xiao <xiaojiantao1@h-partners.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
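Here is a minimal sketch of the per-packet decision described above, assuming the standard TCP header flag bits. The `BD_FD_ADD`/`BD_FD_DEL` values are placeholders, because this commit does not give the real TX BD layout. In the sketch, FIN and RST take precedence over SYN. A SYN+ACK also gets FD_ADD because its SYN bit is set; whether the driver treats SYN+ACK the same way is an assumption.

```c
#include <stdint.h>

/* TCP header flag bits (RFC 9293). */
#define TCP_FLAG_FIN 0x01
#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_RST 0x04
#define TCP_FLAG_ACK 0x10

/* Placeholder TX BD bits; the real hns3 layout is not shown here. */
#define BD_FD_ADD 0x1u
#define BD_FD_DEL 0x2u

/* Choose the FD hint to put in the TX BD for one outgoing TCP segment. */
static uint32_t qb_tx_bd_flags(uint8_t tcp_flags)
{
	if (tcp_flags & (TCP_FLAG_FIN | TCP_FLAG_RST))
		return BD_FD_DEL;	/* stream is closing: drop its rule */
	if (tcp_flags & TCP_FLAG_SYN)
		return BD_FD_ADD;	/* new stream: ask HW to add a rule */
	return 0;			/* ordinary data segment */
}
```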
-
Committed by Jian Shen
driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I62HX2

----------------------------------------------------------------------

Currently, the PF checks whether a VF is alive through the KEEP_ALIVE mailbox from the VF, which the VF sends every 2 seconds. Once the PF has not received the mailbox for more than 8 seconds, it regards the VF as abnormal. It then stops notifying the VF of state changes (link state, VF MAC, reset), even after it receives the KEEP_ALIVE mailbox again. This is unreasonable.

This patch fixes it. While the VF's KEEP_ALIVE mailbox is lost, the PF records the state changes that need to be notified to the VF. It notifies the VF once it receives the mailbox again.

This patch also introduces a new flag, HCLGE_VPORT_STATE_INITED, to distinguish whether the VF driver has been loaded. The VF queries these states when initializing, so it is unnecessary to notify it in that case.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jiantao Xiao <xiaojiantao1@h-partners.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
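The deferred-notification behaviour can be modelled as a small state machine. This is a simplified sketch, not hclge code. The `NOTIFY_*` bits, `struct vport` and the function names are hypothetical. The `inited` field only plays the role described for HCLGE_VPORT_STATE_INITED.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits for state changes the PF must push to a VF. */
#define NOTIFY_LINK  (1u << 0)
#define NOTIFY_MAC   (1u << 1)
#define NOTIFY_RESET (1u << 2)

struct vport {
	bool inited;		/* VF driver loaded (cf. HCLGE_VPORT_STATE_INITED) */
	bool alive;		/* KEEP_ALIVE received within the timeout */
	uint32_t pending;	/* changes recorded while the VF was not alive */
	uint32_t sent;		/* changes delivered, for illustration only */
};

/* Called when no KEEP_ALIVE has arrived for more than 8 seconds. */
static void pf_on_keep_alive_timeout(struct vport *vp)
{
	vp->alive = false;
}

static void pf_notify(struct vport *vp, uint32_t change)
{
	if (!vp->inited)
		return;			/* VF reads these states itself at init */
	if (!vp->alive) {
		vp->pending |= change;	/* record instead of dropping */
		return;
	}
	vp->sent |= change;
}

static void pf_on_keep_alive(struct vport *vp)
{
	vp->alive = true;
	vp->sent |= vp->pending;	/* flush what was missed */
	vp->pending = 0;
}
```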
-
Committed by GUO Zihua
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I62DVN
CVE: NA

--------------------------------

Syzkaller reported a UAF in mpi_key_length().

BUG: KASAN: use-after-free in mpi_key_length+0x34/0xb0
Read of size 2 at addr ffff888005737e14 by task syz-executor.15/6236

CPU: 1 PID: 6236 Comm: syz-executor.15 Kdump: loaded Tainted: GF OE 5.10.0.kasan.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20220525_182517-szxrtosci10000 04/01/2014
Call Trace:
 dump_stack+0x9c/0xd3
 print_address_description.constprop.0+0x19/0x170
 __kasan_report.cold+0x6c/0x84
 kasan_report+0x3a/0x50
 check_memory_region+0xfd/0x1f0
 mpi_key_length+0x34/0xb0
 pgp_calc_pkey_keyid.isra.0+0x100/0x5a0
 pgp_generate_fingerprint+0x159/0x330
 pgp_process_public_key+0x1c5/0x330
 pgp_parse_packets+0xf4/0x200
 pgp_key_parse+0xb6/0x340
 asymmetric_key_preparse+0x8a/0x120
 key_create_or_update+0x31f/0x8c0
 __se_sys_add_key+0x23e/0x400
 do_syscall_64+0x30/0x40
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

The root cause is that pgp_calc_pkey_keyid() calls mpi_key_length() to get the length of the public key, then deducts that length from keylen, which is an unsigned value. However, mpi_key_length() does not check the returned byte count for legitimacy. As a result, keylen can underflow and wrap around, hence the read overflow.

It turns out that the byte count check was mistakenly placed in mpi_read_from_buffer() when commit 94479061 ("mpi: introduce mpi_key_length()") extracted mpi_key_length() out of mpi_read_from_buffer().

This patch moves the check into mpi_key_length().

Fixes: commit 94479061 ("mpi: introduce mpi_key_length()")
Signed-off-by: GUO Zihua <guozihua@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
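The failure mode and the fix can be illustrated in user space. This sketch assumes the OpenPGP MPI encoding: a two-octet big-endian bit count followed by the value bytes. `mpi_len_checked()` and `consume_mpi()` are hypothetical helpers and do not reproduce the kernel's mpi_key_length() signature. Without the bounds check, `*keylen -= len` could wrap the unsigned `keylen` around to a huge value.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse the length of one MPI at buf. On success store the total
 * encoded size (2-byte header + value bytes) in *out and return 0;
 * return -1 if the claimed size does not fit in buflen. */
static int mpi_len_checked(const uint8_t *buf, size_t buflen, size_t *out)
{
	size_t nbits, nbytes;

	if (buflen < 2)
		return -1;
	nbits = ((size_t)buf[0] << 8) | buf[1];
	nbytes = (nbits + 7) / 8;
	if (nbytes > buflen - 2)
		return -1;	/* the check the fix moves into mpi_key_length() */
	*out = 2 + nbytes;
	return 0;
}

/* Caller pattern: advance past one MPI and shrink the unsigned keylen. */
static int consume_mpi(const uint8_t **p, size_t *keylen)
{
	size_t len;

	if (mpi_len_checked(*p, *keylen, &len) < 0)
		return -1;
	*p += len;
	*keylen -= len;		/* cannot wrap: len <= *keylen */
	return 0;
}
```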
-
Committed by Yuyao Lin
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I61XP8

--------------------------------

This reverts commit 098b0e01.

Commit cb477557 ("time: Prevent undefined behaviour in timespec64_to_ns()") added both upper and lower limit checks to timespec64_to_ns(), whereas timespec64_to_ktime() only checks the upper limit. Reverting this patch therefore fixes the overflow.

Signed-off-by: Yuyao Lin <linyuyao1@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
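For reference, here is a standalone sketch of the clamped conversion that the commit message attributes to timespec64_to_ns(). The constants mirror the kernel's definitions as we understand them: `KTIME_MAX` is `INT64_MAX`, and `KTIME_SEC_MAX`/`KTIME_SEC_MIN` are `KTIME_MAX`/`KTIME_MIN` divided by `NSEC_PER_SEC`. `struct ts64` and `ts_to_ns_clamped()` are hypothetical names.

```c
#include <stdint.h>

#define NSEC_PER_SEC  1000000000LL
#define KTIME_MAX     INT64_MAX
#define KTIME_MIN     (-KTIME_MAX - 1)
#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
#define KTIME_SEC_MIN (KTIME_MIN / NSEC_PER_SEC)

struct ts64 {
	int64_t tv_sec;
	long tv_nsec;	/* assumed normalised to [0, NSEC_PER_SEC) */
};

/* Convert to nanoseconds, saturating instead of overflowing in both
 * directions; strictly inside the limits the multiply cannot overflow. */
static int64_t ts_to_ns_clamped(const struct ts64 *ts)
{
	if (ts->tv_sec >= KTIME_SEC_MAX)
		return KTIME_MAX;
	if (ts->tv_sec <= KTIME_SEC_MIN)
		return KTIME_MIN;
	return ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}
```

A conversion that checks only the upper bound would still overflow for a large negative `tv_sec`. That asymmetry is the difference the commit message points out.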
-
Committed by Luís Henriques
stable inclusion
from stable-v5.10.146
commit 958b0ee23f5ac106e7cc11472b71aa2ea9a033bc
category: bugfix
bugzilla: 187444, https://gitee.com/openeuler/kernel/issues/I6261Z
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=958b0ee23f5ac106e7cc11472b71aa2ea9a033bc

--------------------------------

commit 29a5b8a1 upstream.

When walking through an inode's extents, the ext4_ext_binsearch_idx() function assumes that the extent header has been previously validated. However, there are no checks that verify that the number of entries (eh->eh_entries) is non-zero when depth is > 0. This leads to problems, because EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:

[ 135.245946] ------------[ cut here ]------------
[ 135.247579] kernel BUG at fs/ext4/extents.c:2258!
[ 135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
[ 135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
[ 135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[ 135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
[ 135.256475] Code:
[ 135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
[ 135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
[ 135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
[ 135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
[ 135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
[ 135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
[ 135.272394] FS: 00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[ 135.274510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
[ 135.277952] Call Trace:
[ 135.278635]  <TASK>
[ 135.279247]  ? preempt_count_add+0x6d/0xa0
[ 135.280358]  ? percpu_counter_add_batch+0x55/0xb0
[ 135.281612]  ? _raw_read_unlock+0x18/0x30
[ 135.282704]  ext4_map_blocks+0x294/0x5a0
[ 135.283745]  ? xa_load+0x6f/0xa0
[ 135.284562]  ext4_mpage_readpages+0x3d6/0x770
[ 135.285646]  read_pages+0x67/0x1d0
[ 135.286492]  ? folio_add_lru+0x51/0x80
[ 135.287441]  page_cache_ra_unbounded+0x124/0x170
[ 135.288510]  filemap_get_pages+0x23d/0x5a0
[ 135.289457]  ? path_openat+0xa72/0xdd0
[ 135.290332]  filemap_read+0xbf/0x300
[ 135.291158]  ? _raw_spin_lock_irqsave+0x17/0x40
[ 135.292192]  new_sync_read+0x103/0x170
[ 135.293014]  vfs_read+0x15d/0x180
[ 135.293745]  ksys_read+0xa1/0xe0
[ 135.294461]  do_syscall_64+0x3c/0x80
[ 135.295284]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

This patch simply adds an extra check in __ext4_ext_check(), verifying that eh_entries is not 0 when eh_depth is > 0.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
Cc: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20220822094235.2690-1-lhenriques@suse.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
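The added validation can be shown with a simplified model. `struct ext_hdr` is a host-endian stand-in for `struct ext4_extent_header`; the on-disk fields are little-endian `__le16`. `check_ext_hdr()` is a hypothetical helper that only captures the spirit of the checks in __ext4_ext_check(), including the new one.

```c
#include <stdint.h>

#define EXT_MAGIC 0xF30A	/* ext4 extent header magic */

/* Simplified, host-endian stand-in for struct ext4_extent_header. */
struct ext_hdr {
	uint16_t eh_magic;
	uint16_t eh_entries;	/* valid entries following the header */
	uint16_t eh_max;	/* capacity of this node */
	uint16_t eh_depth;	/* 0 = leaf, > 0 = index node */
};

/* Return 0 if the header looks sane, -1 otherwise. */
static int check_ext_hdr(const struct ext_hdr *eh, uint16_t expected_depth)
{
	if (eh->eh_magic != EXT_MAGIC)
		return -1;
	if (eh->eh_depth != expected_depth)
		return -1;
	if (eh->eh_max == 0 || eh->eh_entries > eh->eh_max)
		return -1;
	/* New check: an empty index node would make EXT_FIRST_INDEX() and
	 * EXT_LAST_INDEX() describe an empty range for the binary search. */
	if (eh->eh_depth > 0 && eh->eh_entries == 0)
		return -1;
	return 0;
}
```

An empty leaf (depth 0, no entries) is still accepted, because an empty file legitimately has one.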
-
Committed by Ziyang Xuan
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I61PL4
CVE: NA

--------------------------------

In the sockmap redirect scenario, destroying a socket while psock->ingress_msg is not empty triggers the following warning:

=================================================
WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x408/0x430
...
Call Trace:
 <IRQ>
 __sk_destruct+0x3d/0x590 net/core/sock.c:1784
 sk_destruct net/core/sock.c:1829 [inline]
 __sk_free+0x106/0x2a0 net/core/sock.c:1840
 sk_free+0x7d/0xb0 net/core/sock.c:1851
 sock_put include/net/sock.h:1813 [inline]
 tcp_v4_rcv+0x23af/0x26e0 net/ipv4/tcp_ipv4.c:2085
 ip_protocol_deliver_rcu+0xe5/0x440 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0xd2/0x110 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:304 [inline]
 ip_local_deliver+0x10a/0x260 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:459 [inline]
 ip_rcv_finish+0x126/0x160 net/ipv4/ip_input.c:428
 NF_HOOK include/linux/netfilter.h:304 [inline]
 ip_rcv+0xbf/0x1d0 net/ipv4/ip_input.c:539
 __netif_receive_skb_one_core+0x15f/0x190 net/core/dev.c:5366
 __netif_receive_skb+0x2e/0xe0 net/core/dev.c:5480
 process_backlog+0x132/0x2c0 net/core/dev.c:6386
 napi_poll+0x17e/0x4f0 net/core/dev.c:6837
 net_rx_action+0x183/0x3c0 net/core/dev.c:6907

This is because commit 7e41dfae18b1 ("[Huawei] bpf, sockmap: Add sk_rmem_alloc check for sockmap") does not consider the redirect scenario. It decreases sk_rmem_alloc without having increased it first, which results in an sk_rmem_alloc underflow.

Fixes: 8818e269 ("bpf, sockmap: Add sk_rmem_alloc check for sockmap")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Guan Jing
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I61E4M
CVE: NA

--------------------------------

When doing wakeups, attempt to limit superfluous scans of the LLC domain. ARM64 enables SIS_UTIL and disables SIS_PROP, so that the search for an idle CPU is based on the sum of util_avg.

Signed-off-by: Guan Jing <guanjing6@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Guan Jing
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I61E4M
CVE: NA

--------------------------------

The sched_domain_shared structure is only used through a pointer, and other drivers don't use it directly.

Signed-off-by: Guan Jing <guanjing6@huawei.com>
Reviewed-by: zhangjialin <zhangjialin11@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chen Yu
mainline inclusion
from mainline-v6.0-rc1
commit 70fb5ccf
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I61E4M
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=70fb5ccf2ebb09a0c8ebba775041567812d45

--------------------------------

[Problem Statement]
select_idle_cpu() might spend too much time searching for an idle CPU when the system is overloaded.

The following histogram is the time spent in select_idle_cpu() when running 224 instances of netperf on a system with 112 CPUs per LLC domain:

@usecs:
[0]                533 |
[1]               5495 |
[2, 4)           12008 |
[4, 8)          239252 |
[8, 16)        4041924 |@@@@@@@@@@@@@@
[16, 32)      12357398 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[32, 64)      14820255 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[64, 128)     13047682 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[128, 256)     8235013 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[256, 512)     4507667 |@@@@@@@@@@@@@@@
[512, 1K)      2600472 |@@@@@@@@@
[1K, 2K)        927912 |@@@
[2K, 4K)        218720 |
[4K, 8K)         98161 |
[8K, 16K)        37722 |
[16K, 32K)        6715 |
[32K, 64K)         477 |
[64K, 128K)          7 |

netperf latency usecs:
=======
case            load        Lat_99th    std%
TCP_RR          thread-224  257.39      ( 0.21)

The time spent in select_idle_cpu() is visible to netperf and might have a negative impact.

[Symptom analysis]
The patch [1] from Mel Gorman has been applied to track the efficiency of select_idle_sibling. The indicators are copied here:

SIS Search Efficiency (se_eff%):
        A ratio expressed as a percentage of runqueues scanned versus idle CPUs found. A 100% efficiency indicates that the target, prev or recent CPU of a task was idle at wakeup. The lower the efficiency, the more runqueues were scanned before an idle CPU was found.

SIS Domain Search Efficiency (dom_eff%):
        Similar, except only for the slower SIS path.

SIS Fast Success Rate (fast_rate%):
        Percentage of SIS that used target, prev or recent CPUs.

SIS Success rate (success_rate%):
        Percentage of scans that found an idle CPU.
The test is based on Aubrey's schedtests tool, including netperf, hackbench, schbench and tbench.

Test on vanilla kernel:

schedstat_parse.py -f netperf_vanilla.log
case            load        se_eff%   dom_eff%  fast_rate%  success_rate%
TCP_RR          28 threads  99.978    18.535    99.995      100.000
TCP_RR          56 threads  99.397    5.671     99.964      100.000
TCP_RR          84 threads  21.721    6.818     73.632      100.000
TCP_RR          112 threads 12.500    5.533     59.000      100.000
TCP_RR          140 threads 8.524     4.535     49.020      100.000
TCP_RR          168 threads 6.438     3.945     40.309      99.999
TCP_RR          196 threads 5.397     3.718     32.320      99.982
TCP_RR          224 threads 4.874     3.661     25.775      99.767
UDP_RR          28 threads  99.988    17.704    99.997      100.000
UDP_RR          56 threads  99.528    5.977     99.970      100.000
UDP_RR          84 threads  24.219    6.992     76.479      100.000
UDP_RR          112 threads 13.907    5.706     62.538      100.000
UDP_RR          140 threads 9.408     4.699     52.519      100.000
UDP_RR          168 threads 7.095     4.077     44.352      100.000
UDP_RR          196 threads 5.757     3.775     35.764      99.991
UDP_RR          224 threads 5.124     3.704     28.748      99.860

schedstat_parse.py -f schbench_vanilla.log
(each group has 28 tasks)
case            load        se_eff%   dom_eff%  fast_rate%  success_rate%
normal          1 mthread   99.152    6.400     99.941      100.000
normal          2 mthreads  97.844    4.003     99.908      100.000
normal          3 mthreads  96.395    2.118     99.917      99.998
normal          4 mthreads  55.288    1.451     98.615      99.804
normal          5 mthreads  7.004     1.870     45.597      61.036
normal          6 mthreads  3.354     1.346     20.777      34.230
normal          7 mthreads  2.183     1.028     11.257      21.055
normal          8 mthreads  1.653     0.825     7.849       15.549

schedstat_parse.py -f hackbench_vanilla.log
(each group has 28 tasks)
case            load        se_eff%   dom_eff%  fast_rate%  success_rate%
process-pipe    1 group     99.991    7.692     99.999      100.000
process-pipe    2 groups    99.934    4.615     99.997      100.000
process-pipe    3 groups    99.597    3.198     99.987      100.000
process-pipe    4 groups    98.378    2.464     99.958      100.000
process-pipe    5 groups    27.474    3.653     89.811      99.800
process-pipe    6 groups    20.201    4.098     82.763      99.570
process-pipe    7 groups    16.423    4.156     77.398      99.316
process-pipe    8 groups    13.165    3.920     72.232      98.828
process-sockets 1 group     99.977    5.882     99.999      100.000
process-sockets 2 groups    99.927    5.505     99.996      100.000
process-sockets 3 groups    99.397    3.250     99.980      100.000
process-sockets 4 groups    79.680    4.258     98.864      99.998
process-sockets 5 groups    7.673     2.503     63.659      92.115
process-sockets 6 groups    4.642     1.584     58.946      88.048
process-sockets 7 groups    3.493     1.379     49.816      81.164
process-sockets 8 groups    3.015     1.407     40.845      75.500
threads-pipe    1 group     99.997    0.000     100.000     100.000
threads-pipe    2 groups    99.894    2.932     99.997      100.000
threads-pipe    3 groups    99.611    4.117     99.983      100.000
threads-pipe    4 groups    97.703    2.624     99.937      100.000
threads-pipe    5 groups    22.919    3.623     87.150      99.764
threads-pipe    6 groups    18.016    4.038     80.491      99.557
threads-pipe    7 groups    14.663    3.991     75.239      99.247
threads-pipe    8 groups    12.242    3.808     70.651      98.644
threads-sockets 1 group     99.990    6.667     99.999      100.000
threads-sockets 2 groups    99.940    5.114     99.997      100.000
threads-sockets 3 groups    99.469    4.115     99.977      100.000
threads-sockets 4 groups    87.528    4.038     99.400      100.000
threads-sockets 5 groups    6.942     2.398     59.244      88.337
threads-sockets 6 groups    4.359     1.954     49.448      87.860
threads-sockets 7 groups    2.845     1.345     41.198      77.102
threads-sockets 8 groups    2.871     1.404     38.512      74.312

schedstat_parse.py -f tbench_vanilla.log
case            load        se_eff%   dom_eff%  fast_rate%  success_rate%
loopback        28 threads  99.976    18.369    99.995      100.000
loopback        56 threads  99.222    7.799     99.934      100.000
loopback        84 threads  19.723    6.819     70.215      100.000
loopback        112 threads 11.283    5.371     55.371      99.999
loopback        140 threads 0.000     0.000     0.000       0.000
loopback        168 threads 0.000     0.000     0.000       0.000
loopback        196 threads 0.000     0.000     0.000       0.000
loopback        224 threads 0.000     0.000     0.000       0.000

According to the test above, if the system becomes busy, the SIS Search Efficiency (se_eff%) drops significantly. Although some benchmarks would finally find an idle CPU (success_rate% = 100%), it is doubtful whether it is worth searching the whole LLC domain.
[Proposal]
It would be ideal to have a crystal ball to answer this question: how many CPUs must a wakeup path walk down before it can find an idle CPU? Many potential metrics could be used to predict the number. One candidate is the sum of util_avg in this LLC domain. The benefit of choosing util_avg is that it is a metric of accumulated historic activity, which seems to be smoother than instantaneous metrics (such as rq->nr_running). Besides, choosing the sum of util_avg would help predict the load of the LLC domain more precisely, because SIS_PROP uses one CPU's idle time to estimate the total LLC domain idle time.

In summary, the lower the util_avg is, the more CPUs select_idle_cpu() should scan for an idle CPU, and vice versa. When the sum of util_avg in this LLC domain hits 85% or above, the scan stops. The reason to choose 85% as the threshold is that it corresponds to the imbalance_pct (117) used when an LLC sched group is overloaded: 100 / 117 ≈ 85%.

Introduce the quadratic function:

    y = SCHED_CAPACITY_SCALE - p * x^2
    y' = y / SCHED_CAPACITY_SCALE

x is the ratio of sum_util compared to the CPU capacity:

    x = sum_util / (llc_weight * SCHED_CAPACITY_SCALE)

y' is the ratio of CPUs to be scanned in the LLC domain, and the number of CPUs to scan is calculated by:

    nr_scan = llc_weight * y'

p is chosen so that y drops to 0 when x reaches the overload threshold 100 / imbalance_pct, which gives p = SCHED_CAPACITY_SCALE * imbalance_pct^2 / 10000.

A quadratic function is chosen because:
[1] Compared to a linear function, it scans more aggressively when sum_util is low.
[2] Compared to an exponential function, it is easier to calculate.
[3] There seems to be no accurate mapping between the sum of util_avg and the number of CPUs to be scanned, so a heuristic scan is used for now.

For a platform with 112 CPUs per LLC, the number of CPUs to scan is:

sum_util%    0    5   15   25   35   45   55   65   75   85   86 ...
scan_nr    112  111  108  102   93   81   65   47   25    1    0 ...

For a platform with 16 CPUs per LLC, the number of CPUs to scan is:

sum_util%    0    5   15   25   35   45   55   65   75   85   86 ...
scan_nr     16   15   15   14   13   11    9    6    3    0    0 ...
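The nr_scan formula can be evaluated with integer arithmetic. This sketch assumes SCHED_CAPACITY_SCALE = 1024 and imbalance_pct = 117. It scales x by SCHED_CAPACITY_SCALE so everything stays in integers, and the function name is hypothetical. The intent is to reproduce the tables above, not to stand in for the scheduler's own code.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SCALE 1024ULL

/*
 * nr_scan = llc_weight * y / SCALE, where, with x' = sum_util / llc_weight,
 *   y = SCALE - min(SCALE, x'^2 * pct^2 / (10000 * SCALE))
 * so y reaches 0 once x' / SCALE >= 100 / pct (about 85% for pct = 117).
 */
static unsigned int sis_util_nr_scan(uint64_t sum_util, unsigned int llc_weight,
				     unsigned int imbalance_pct)
{
	uint64_t x, tmp, y;

	/* x': average utilization per CPU, in 0..SCALE */
	x = sum_util / llc_weight;

	/* p * x^2 in the same scale, capped so y cannot go negative */
	tmp = x * x * imbalance_pct * imbalance_pct;
	tmp /= 10000 * SCHED_CAPACITY_SCALE;
	if (tmp > SCHED_CAPACITY_SCALE)
		tmp = SCHED_CAPACITY_SCALE;

	y = SCHED_CAPACITY_SCALE - tmp;

	/* nr_scan = llc_weight * y' */
	return (unsigned int)(y * llc_weight / SCHED_CAPACITY_SCALE);
}
```

With llc_weight = 112, `sis_util_nr_scan(870 * 112, 112, 117)` (about 85%) gives 1 and `sis_util_nr_scan(880 * 112, 112, 117)` (about 86%) gives 0. These match the 85% and 86% columns of the 112-CPU table.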
Furthermore, to minimize the overhead of calculating the metrics in select_idle_cpu(), borrow the statistics from periodic load balance. As mentioned by Abel, on a platform with 112 CPUs per LLC, the sum_util calculated by periodic load balance after 112 ms would decay to about 0.5 * 0.5 * 0.5 * 0.7 = 8.75%, thus delaying the reflection of the latest utilization. But it is a trade-off. Checking the util_avg in newidle load balance would be more frequent, but it brings overhead: multiple CPUs write/read the per-LLC shared variable, which introduces cache contention. Tim also mentioned that it is acceptable to be non-optimal in terms of scheduling for short-term variations, but if there is a long-term trend in the load behavior, the scheduler can adjust for that.

When SIS_UTIL is enabled, select_idle_cpu() uses the nr_scan calculated by SIS_UTIL instead of the one from SIS_PROP. As Peter and Mel suggested, SIS_UTIL should be enabled by default.

This patch is based on util_avg, which is very sensitive to CPU frequency invariance. There is an issue that, when the max frequency has been clamped, util_avg would decay extremely fast when the CPU is idle. Commit addca285 ("cpufreq: intel_pstate: Handle no_turbo in frequency invariance") could be used to mitigate this symptom, by adjusting arch_max_freq_ratio when turbo is disabled. But this issue is still not thoroughly fixed, because the current code is unaware of the user-specified max CPU frequency.

[Test result]

netperf and tbench were launched with 25%, 50%, 75%, 100%, 125%, 150%, 175% and 200% of the CPU number respectively. hackbench and schbench were launched with 1, 2, 4 and 8 groups. Each test lasts for 100 seconds and is repeated 3 times.

The following is the benchmark result comparison between baseline (vanilla v5.19-rc1) and compare (patched kernel). A positive compare% indicates better performance.
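The 8.75% figure can be cross-checked against PELT's decay rate. This check assumes the PELT rule y^32 = 0.5 and treats one PELT period as roughly 1 ms; the kernel actually uses 1024 µs periods. Under those assumptions, 112 ms is about 3.5 half-lives, so the remaining fraction is 0.5^3.5 ≈ 8.8%. The commit's 0.5 * 0.5 * 0.5 * 0.7 approximates 0.5^0.5 as 0.7. `pelt_decay()` is a hypothetical helper, and y is hard-coded so the sketch needs no libm.

```c
/* Per-period PELT decay factor y, with y^32 = 0.5 (i.e. 0.5^(1/32)). */
#define PELT_Y 0.978572062087700

/* Fraction of a contribution that remains after the given number of periods. */
static double pelt_decay(unsigned int periods)
{
	double f = 1.0;

	while (periods--)
		f *= PELT_Y;
	return f;
}
```

`pelt_decay(32)` returns about 0.5, and `pelt_decay(112)` returns about 0.0884, close to the 8.75% quoted above.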
Each netperf test is:
netperf -4 -H 127.0.1 -t TCP/UDP_RR -c -C -l 100

netperf.throughput
=======
case            load        baseline(std%)  compare%( std%)
TCP_RR          28 threads  1.00 (  0.34)     -0.16 (  0.40)
TCP_RR          56 threads  1.00 (  0.19)     -0.02 (  0.20)
TCP_RR          84 threads  1.00 (  0.39)     -0.47 (  0.40)
TCP_RR          112 threads 1.00 (  0.21)     -0.66 (  0.22)
TCP_RR          140 threads 1.00 (  0.19)     -0.69 (  0.19)
TCP_RR          168 threads 1.00 (  0.18)     -0.48 (  0.18)
TCP_RR          196 threads 1.00 (  0.16)   +194.70 ( 16.43)
TCP_RR          224 threads 1.00 (  0.16)   +197.30 (  7.85)
UDP_RR          28 threads  1.00 (  0.37)     +0.35 (  0.33)
UDP_RR          56 threads  1.00 ( 11.18)     -0.32 (  0.21)
UDP_RR          84 threads  1.00 (  1.46)     -0.98 (  0.32)
UDP_RR          112 threads 1.00 ( 28.85)     -2.48 ( 19.61)
UDP_RR          140 threads 1.00 (  0.70)     -0.71 ( 14.04)
UDP_RR          168 threads 1.00 ( 14.33)     -0.26 ( 11.16)
UDP_RR          196 threads 1.00 ( 12.92)   +186.92 ( 20.93)
UDP_RR          224 threads 1.00 ( 11.74)   +196.79 ( 18.62)

Taking the 224 threads case as an example, the SIS search metrics changes are illustrated below:

     vanilla                   patched
     4544492      +237.5%     15338634  sched_debug.cpu.sis_domain_search.avg
       38539    +39686.8%     15333634  sched_debug.cpu.sis_failed.avg
   128300000       -87.9%     15551326  sched_debug.cpu.sis_scanned.avg
     5842896      +162.7%     15347978  sched_debug.cpu.sis_search.avg

There are 87.9% fewer CPU scans after patching, which indicates lower overhead. Besides, with this patch applied, there is 13% less rq lock contention in perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested.try_to_wake_up.default_wake_function.woken_wake_function. This might help explain the performance improvement: this patch allows the waking task to remain on the previous CPU, rather than grabbing other CPUs' locks.
Each hackbench test is:
hackbench -g $job --process/threads --pipe/sockets -l 1000000 -s 100

hackbench.throughput
=========
case            load      baseline(std%)  compare%( std%)
process-pipe    1 group   1.00 (  1.29)     +0.57 (  0.47)
process-pipe    2 groups  1.00 (  0.27)     +0.77 (  0.81)
process-pipe    4 groups  1.00 (  0.26)     +1.17 (  0.02)
process-pipe    8 groups  1.00 (  0.15)     -4.79 (  0.02)
process-sockets 1 group   1.00 (  0.63)     -0.92 (  0.13)
process-sockets 2 groups  1.00 (  0.03)     -0.83 (  0.14)
process-sockets 4 groups  1.00 (  0.40)     +5.20 (  0.26)
process-sockets 8 groups  1.00 (  0.04)     +3.52 (  0.03)
threads-pipe    1 group   1.00 (  1.28)     +0.07 (  0.14)
threads-pipe    2 groups  1.00 (  0.22)     -0.49 (  0.74)
threads-pipe    4 groups  1.00 (  0.05)     +1.88 (  0.13)
threads-pipe    8 groups  1.00 (  0.09)     -4.90 (  0.06)
threads-sockets 1 group   1.00 (  0.25)     -0.70 (  0.53)
threads-sockets 2 groups  1.00 (  0.10)     -0.63 (  0.26)
threads-sockets 4 groups  1.00 (  0.19)    +11.92 (  0.24)
threads-sockets 8 groups  1.00 (  0.08)     +4.31 (  0.11)

Each tbench test is:
tbench -t 100 $job 127.0.0.1

tbench.throughput
======
case            load        baseline(std%)  compare%( std%)
loopback        28 threads  1.00 (  0.06)     -0.14 (  0.09)
loopback        56 threads  1.00 (  0.03)     -0.04 (  0.17)
loopback        84 threads  1.00 (  0.05)     +0.36 (  0.13)
loopback        112 threads 1.00 (  0.03)     +0.51 (  0.03)
loopback        140 threads 1.00 (  0.02)     -1.67 (  0.19)
loopback        168 threads 1.00 (  0.38)     +1.27 (  0.27)
loopback        196 threads 1.00 (  0.11)     +1.34 (  0.17)
loopback        224 threads 1.00 (  0.11)     +1.67 (  0.22)

Each schbench test is:
schbench -m $job -t 28 -r 100 -s 30000 -c 30000

schbench.latency_90%_us
========
case            load        baseline(std%)  compare%( std%)
normal          1 mthread   1.00 ( 31.22)     -7.36 ( 20.25)*
normal          2 mthreads  1.00 (  2.45)     -0.48 (  1.79)
normal          4 mthreads  1.00 (  1.69)     +0.45 (  0.64)
normal          8 mthreads  1.00 (  5.47)     +9.81 ( 14.28)

*Considering the standard deviation, this -7.36% regression might not be valid.

Also, an OLTP workload with a commercial RDBMS has been tested, and there is no significant change.
There were concerns that unbalanced tasks among CPUs would cause problems. For example, suppose the LLC domain is composed of 8 CPUs, and 7 tasks are bound to CPU0~CPU6, while CPU7 is idle:

           CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
util_avg   1024   1024   1024   1024   1024   1024   1024      0

Since the util_avg ratio is 87.5% (= 7/8), which is higher than 85%, select_idle_cpu() will not scan, so CPU7 goes undetected during the scan. But according to Mel, it is unlikely that CPU7 will stay idle all the time, because CPU7 could pull some tasks via CPU_NEWLY_IDLE.

lkp (kernel test robot) has reported a regression on stress-ng.sock on a very busy system. According to the sched_debug statistics, it might be caused by SIS_UTIL terminating the scan and choosing the previous CPU earlier. This might introduce more context switches, especially involuntary preemption, which impacts a busy stress-ng. This regression shows that not all benchmarks in every scenario benefit from the idle CPU scan limit, and it needs further investigation.

Besides, there is a slight regression in hackbench's 16 groups case when the LLC domain has 16 CPUs. Prateek mentioned that we should scan aggressively in an LLC domain with 16 CPUs, because the cost of searching for an idle one among 16 CPUs is negligible. The current patch aims to propose a generic solution and only considers util_avg. Something like the following could be applied on top of the current patch to fulfill the requirement:

        if (llc_weight <= 16)
                nr_scan = nr_scan * 32 / llc_weight;

For an LLC domain with 16 CPUs, nr_scan will be doubled. The fewer CPUs the LLC domain has, the more nr_scan will be expanded. This needs further investigation.

There is also ongoing work [2] from Abel to filter out busy CPUs during wakeup, to further speed up the idle CPU scan. It could be a follow-up optimization on top of this change.
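The small-LLC expansion suggested above can be written as a standalone helper. This is a sketch of a proposal that the patch does not apply, and `expand_small_llc()` is a hypothetical name. The result can exceed llc_weight (an 8-CPU LLC scales by 4), and the sketch does not clamp it.

```c
/* Proposed (not applied) tweak: widen the idle scan in small LLC domains.
 * For llc_weight <= 16 the factor is 32 / llc_weight, i.e. x2 at 16 CPUs. */
static unsigned int expand_small_llc(unsigned int nr_scan, unsigned int llc_weight)
{
	if (llc_weight <= 16)
		nr_scan = nr_scan * 32 / llc_weight;
	return nr_scan;
}
```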
Suggested-by: Tim Chen <tim.c.chen@intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Tested-by: Mohini Narkhede <mohini.narkhede@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20220612163428.849378-1-yu.c.chen@intel.com
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Signed-off-by: Guan Jing <guanjing6@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Li Lingfeng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60QE9
CVE: NA

--------------------------------

As explained in 32c39e8a ("block: fix use after free for bd_holder_dir"), we should make sure the "disk" is still live and then grab a reference to 'bd_holder_dir'. However, the "disk" should be "the claimed slave bdev" rather than "the holding disk".

Fixes: 32c39e8a ("block: fix use after free for bd_holder_dir")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Michal Simek
mainline inclusion
from mainline-v5.13-rc1
commit 6a37d750
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60OLE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6a37d750037827d385672acdebf5788fc2ffa633

--------------------------------

A static analyzer tool found that the ret variable is not initialized, but the code expects a ret value >= 0 when pinconf is skipped in the first (pinmux) loop. The same expectation applies to pinmux in the pinconf loop. That's why ret is initialized to 0: to avoid an uninitialized ret value in the first loop, or reuse of the ret value from the first loop in the second.

Addresses-Coverity: ("Uninitialized variables")
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/e5203bae68eb94b4b8b4e67e5e7b4d86bb989724.1615534291.git.michal.simek@xilinx.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Yuyao Lin <linyuyao1@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
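Here is a minimal sketch of the uninitialized-`ret` pattern. The entry types are hypothetical and `fw_request_pin()` is a stand-in; this is not the pinctrl core code. If every entry is a configuration entry, the loop never assigns `ret`. Only the explicit `= 0` then makes the final `return ret` well defined.

```c
/* Hypothetical entry kinds standing in for pinmux / pinconf settings. */
enum entry_type { ENTRY_MUX, ENTRY_CONF };

/* Stand-in for a firmware "request pin" call; -22 mirrors -EINVAL. */
static int fw_request_pin(int pin)
{
	return pin >= 0 ? 0 : -22;
}

/* First loop: handle mux entries only; conf entries are skipped. */
static int request_mux_pins(const enum entry_type *types, const int *pins, int n)
{
	int ret = 0;	/* without this, ret is indeterminate if no MUX entry exists */

	for (int i = 0; i < n; i++) {
		if (types[i] != ENTRY_MUX)
			continue;
		ret = fw_request_pin(pins[i]);
		if (ret < 0)
			return ret;
	}
	return ret;
}
```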
-
Committed by Michal Simek
mainline inclusion
from mainline-v5.13-rc1
commit b991f8c3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60OLE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b991f8c3622c8c9d01a1ada382682a731932e651

--------------------------------

Currently the handling order depends on the order in which entries arrive, which corresponds to their order in DT. We have hit a case with DT overlays where the conf and mux descriptions are swapped. This results in a sequence where the firmware is asked to perform configuration before the pin has been requested.

This patch enforces the order so that the pin is always requested first, followed by pin configuration. This change ensures that the firmware gets requests in the right order.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Link: https://lore.kernel.org/r/cfbe01f791c2dd42a596cbda57e15599969b57aa.1615364211.git.michal.simek@xilinx.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Yuyao Lin <linyuyao1@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
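The ordering rule can be sketched as two passes over the entries. The entry type and the recorded firmware calls are hypothetical stand-ins. The only point is that requests ('R') always precede configuration ('C'), whatever order the DT provides.

```c
#include <stddef.h>
#include <string.h>

enum entry_type { ENTRY_MUX, ENTRY_CONF };

/* Log of hypothetical firmware calls: 'R' = request pin, 'C' = configure. */
static char fw_log[16];
static size_t fw_len;

static void fw_call(char op)
{
	if (fw_len + 1 < sizeof(fw_log)) {
		fw_log[fw_len++] = op;
		fw_log[fw_len] = '\0';
	}
}

static int fw_log_equals(const char *expected)
{
	return strcmp(fw_log, expected) == 0;
}

static void apply_entries(const enum entry_type *types, size_t n)
{
	/* Pass 1: request every pin that has a mux entry. */
	for (size_t i = 0; i < n; i++)
		if (types[i] == ENTRY_MUX)
			fw_call('R');
	/* Pass 2: only then apply the configuration entries. */
	for (size_t i = 0; i < n; i++)
		if (types[i] == ENTRY_CONF)
			fw_call('C');
}
```

Even with the conf entry listed first (as in the swapped DT overlay case), the log reads "RC".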
-
Committed by Yu Kuai
mainline inclusion from mainline-v5.16-rc2 commit 76dd2980 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5VGU9 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76dd298094f484c6250ebd076fa53287477b2328 -------------------------------- Our syzkaller reported a null pointer dereference; the root cause is as follows:

__blk_mq_alloc_map_and_rqs
 set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs
  blk_mq_alloc_map_and_rqs
   blk_mq_alloc_rqs
    // failed due to oom
    alloc_pages_node
    // set->tags[hctx_idx] is still NULL
    blk_mq_free_rqs
     drv_tags = set->tags[hctx_idx];
     // null pointer dereference is triggered
     blk_mq_clear_rq_mapping(drv_tags, ...)

This is because commit 63064be1 ("blk-mq: Add blk_mq_alloc_map_and_rqs()") merged the two steps: 1) set->tags[hctx_idx] = blk_mq_alloc_rq_map() 2) blk_mq_alloc_rqs(..., set->tags[hctx_idx]) into one step: set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs(). Since tags is not initialized yet in this case, fix the problem by checking whether tags is a NULL pointer in blk_mq_clear_rq_mapping(). Fixes: 63064be1 ("blk-mq: Add blk_mq_alloc_map_and_rqs()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/20221011142253.4015966-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
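The fix is a NULL guard at the top of the clearing routine. The sketch below uses heavily simplified, hypothetical structures (`struct tags`, `struct tag_set`, `clear_rq_mapping()`), not the real blk-mq types. It shows why an unpopulated slot left behind by a failed allocation must be tolerated.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for blk-mq tag structures. */
struct tags {
	size_t nr_tags;
	void **rqs; /* request pointers to be cleared */
};

struct tag_set {
	struct tags *tags[4]; /* one slot per hardware queue; NULL until allocated */
};

/*
 * Clear the request mappings of one hardware queue and return how many
 * were cleared. If allocation failed earlier, the slot may still be NULL,
 * so bail out instead of dereferencing it.
 */
static size_t clear_rq_mapping(struct tags *drv_tags)
{
	size_t cleared = 0;

	if (!drv_tags)
		return 0;

	for (size_t i = 0; i < drv_tags->nr_tags; i++) {
		if (drv_tags->rqs[i]) {
			drv_tags->rqs[i] = NULL;
			cleared++;
		}
	}
	return cleared;
}
```

Calling it with `set.tags[0]` from a zero-initialized `struct tag_set` models the OOM path in the trace: nothing is cleared and nothing is dereferenced.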
-
Committed by Yu Kuai
stable inclusion from stable-v5.10.152 commit 31b1570677e8bf85f48be8eb95e21804399b8295 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60HVY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=31b1570677e8bf85f48be8eb95e21804399b8295 ------------------------------- commit 285febab upstream. commit 8c5035df ("blk-wbt: call rq_qos_add() after wb_normal is initialized") moves wbt_set_write_cache() before rq_qos_add(), which is wrong because wbt_rq_qos() still returns NULL at that point. Fix the problem by removing wbt_set_write_cache() and setting 'rwb->wc' directly. Note that this patch also removes the redundant setting of 'rwb->wc'. Fixes: 8c5035df ("blk-wbt: call rq_qos_add() after wb_normal is initialized") Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20221009101038.1692875-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yu Kuai
stable inclusion from stable-v5.10.152 commit 910ba49b33450a878128adc7d9c419dd97efd923 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60HVY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=910ba49b33450a878128adc7d9c419dd97efd923 ------------------------------- commit 8c5035df upstream. Our test found a problem where the wbt inflight counter becomes negative, which will cause an io hang (note that this problem doesn't exist in mainline):

t1: device create               t2: issue io
add_disk
 blk_register_queue
  wbt_enable_default
   wbt_init
    rq_qos_add
    // wb_normal is still 0
                                /*
                                 * in mainline, disk can't be opened before
                                 * bdev_add(), however, in old kernels, disk
                                 * can be opened before blk_register_queue().
                                 */
                                blkdev_issue_flush
                                // disk size is 0, however, it's not checked
                                 submit_bio_wait
                                  submit_bio
                                   blk_mq_submit_bio
                                    rq_qos_throttle
                                     wbt_wait
                                      bio_to_wbt_flags
                                       rwb_enabled
                                       // wb_normal is 0, inflight is not increased

    wbt_queue_depth_changed(&rwb->rqos);
     wbt_update_limits
     // wb_normal is initialized
                                    rq_qos_track
                                     wbt_track
                                      rq->wbt_flags |= bio_to_wbt_flags(rwb, bio);
                                      // wb_normal is not 0, wbt_flags will be set
t3: io completion
blk_mq_free_request
 rq_qos_done
  wbt_done
   wbt_is_tracked
   // return true
   __wbt_done
    wbt_rqw_done
     atomic_dec_return(&rqw->inflight);
     // inflight is decreased

commit 8235b5c1 ("block: call bdev_add later in device_add_disk") can avoid this problem; however, it's better to fix this problem in wbt: 1) Older kernels can't backport that patch because it depends on lots of refactoring. 2) The root cause is that wbt calls rq_qos_add() before wb_normal is initialized.
Fixes: e34cbd30 ("blk-wbt: add general throttling mechanism") Cc: <stable@vger.kernel.org> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
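The counter underflow comes from the throttle and track steps disagreeing about whether wbt is enabled. Below is a minimal single-threaded model, assuming hypothetical simplified types and functions that only borrow the kernel's names for readability; it is not the blk-wbt code. Running the same I/O with the limits computed mid-flight versus before the policy is used reproduces the -1 versus 0 outcome.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the wbt state involved in this bug. */
struct wbt {
	int wb_normal; /* 0 until the limits have been computed */
	int inflight;  /* tracked in-flight writes */
};

struct request {
	bool tracked;
};

/* Throttle step: counts the I/O only if wbt already looks enabled. */
static void wbt_wait(struct wbt *w)
{
	if (w->wb_normal != 0)
		w->inflight++;
}

/* Track step: flags the request if wbt looks enabled *now*. */
static void wbt_track(struct wbt *w, struct request *rq)
{
	rq->tracked = (w->wb_normal != 0);
}

/* Completion: decrements the counter for every flagged request. */
static void wbt_done(struct wbt *w, struct request *rq)
{
	if (rq->tracked)
		w->inflight--;
}

static void wbt_update_limits(struct wbt *w)
{
	w->wb_normal = 8; /* arbitrary illustrative limit */
}

/*
 * Push one write through wait -> track -> done and return the final
 * inflight count. limits_before_io == false reproduces the race above.
 */
static int simulate(bool limits_before_io)
{
	struct wbt w = { 0, 0 };
	struct request rq = { false };

	if (limits_before_io)
		wbt_update_limits(&w); /* fixed order: limits set before use */
	wbt_wait(&w);
	if (!limits_before_io)
		wbt_update_limits(&w); /* buggy order: limits set mid-I/O */
	wbt_track(&w, &rq);
	wbt_done(&w, &rq);
	return w.inflight;
}
```

`simulate(false)` returns -1 (decrement without a matching increment), while `simulate(true)` returns 0, which mirrors why initializing `wb_normal` before `rq_qos_add()` fixes the imbalance.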
-
Committed by Lei Chen
stable inclusion from stable-v5.10.152 commit 392536023da18086d57565e716ed50193869b8e7 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60HVY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=392536023da18086d57565e716ed50193869b8e7 ------------------------------- commit 5a20d073 upstream. It's unnecessary to call wbt_update_limits explicitly within wbt_init, because it is called later by wbt_queue_depth_changed. Signed-off-by: Lei Chen <lennychen@tencent.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yu Kuai
mainline inclusion from mainline-v5.15-rc1 commit 89f871af category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I60HCD CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=89f871af1b26d98d983cba7ed0e86effa45ba5f8 -------------------------------- If blk_mq_request_issue_directly() fails in blk_insert_cloned_request(), the request has already been accounted as started. Currently, blk_insert_cloned_request() is only called by dm, and dm won't account such a request as done. In the normal path, io is accounted as started in blk_mq_bio_to_request() when the request is allocated, and accounted as done in __blk_mq_end_request_acct() whether it succeeded or failed. Thus add blk_account_io_done() to fix the problem. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220126012132.3111551-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflict: block/blk-core.c Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
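The invariant is that every "accounted start" is paired with exactly one "accounted done". Here is a minimal sketch with a hypothetical counter and hypothetical helper names, not the block layer's accounting API. Because no other path will close the accounting for a request whose direct issue failed, the insert path closes it itself.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical in-flight I/O accounting counter. */
struct io_stats {
	int in_flight;
};

static void account_io_start(struct io_stats *s) { s->in_flight++; }
static void account_io_done(struct io_stats *s) { s->in_flight--; }

/* Hypothetical direct-issue step: 0 on success, -EIO on failure. */
static int issue_directly(bool fail)
{
	return fail ? -EIO : 0;
}

/*
 * Accounting starts before the issue attempt, so a failed issue must also
 * end the accounting here; on success, the normal completion path is
 * expected to account the request done later.
 */
static int insert_cloned_request(struct io_stats *s, bool fail)
{
	int ret;

	account_io_start(s);
	ret = issue_directly(fail);
	if (ret < 0)
		account_io_done(s);
	return ret;
}
```

After a failed insert the counter is back to 0; after a successful one it stays at 1 until the (not modelled) completion path runs.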
-
Committed by Kai Ye
mainline inclusion from mainline-crypto commit 8f82f4ae category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZHPY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8f82f4ae8946d665f1e38da8e2b39b929d2435b1 ---------------------------------------------------------------------- Because the permission of the VF debugfs file is "0444", the VF function check in the qos write API is redundant. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Kai Ye
mainline inclusion from v6.1-rc4 commit 22d7a6c3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZHPY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22d7a6c39cabab811f42cb2daed2343c87b0aca5 ---------------------------------------------------------------------- The PCI BDF number check for qos writes now uses the PCI API: get the devfn directly from pci_dev, which allows deleting some redundant code. Also use kstrtoul() instead of sscanf() to simplify the code. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
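kstrtoul() only exists inside the kernel. The sketch below is a userspace analogue of its strict behaviour built on strtoul(); `parse_ulong()` is a hypothetical name. It accepts a whole-string unsigned number with an optional leading '+' and a single trailing newline, and rejects empty input, a minus sign, leading whitespace, trailing garbage and overflow. This strictness is what makes it a better fit than sscanf() for parsing a written qos value.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace approximation of kstrtoul(): parse the entire string as an
 * unsigned long in the given base. Returns 0 and stores the value on
 * success, or a negative errno value on failure.
 */
static int parse_ulong(const char *s, unsigned int base, unsigned long *res)
{
	char *end;
	unsigned long val;

	/* strtoul() would silently skip whitespace and negate "-1"; refuse both. */
	if (*s == '\0' || *s == '-' || isspace((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, (int)base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s)
		return -EINVAL; /* no digits at all */
	if (*end == '\n')
		end++;          /* allow one trailing newline, as sysfs writes often have */
	if (*end != '\0')
		return -EINVAL; /* trailing garbage */

	*res = val;
	return 0;
}
```

Usage: `parse_ulong("1000\n", 10, &v)` succeeds with `v == 1000`, while `"1000abc"` is rejected outright instead of being partially parsed.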
-
Committed by Kai Ye
mainline inclusion from v6.1-rc4 commit 3efe90af category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZHPY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3efe90af4c0c46c58dba1b306de142827153d9c0 ---------------------------------------------------------------------- Increase the buffer size to prevent a stack overflow found by fuzz testing. The maximum length of the qos configuration buffer is 256 bytes, but the 'val' buffer is currently only 32 bytes. sscanf() does not check the length of the destination memory, so the 'val' buffer may overflow the stack. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
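The commit fixes the overflow by enlarging the buffer. An alternative guard, shown here only as a sketch and not as what the driver does, is to give the `%s` conversion a field width so sscanf() can never write past the destination. `QOS_VAL_LEN` and `parse_qos_token()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define QOS_VAL_LEN 32 /* illustrative destination size, as in the original bug */

/*
 * A bare "%s" lets sscanf() copy as many bytes as the input supplies, so a
 * 256-byte input can overflow a 32-byte stack buffer. The field width
 * 31 (= QOS_VAL_LEN - 1; keep the two in sync) caps the copy and leaves
 * room for the terminating '\0'. Returns the stored length, or -1 if no
 * token was found.
 */
static int parse_qos_token(const char *input, char val[QOS_VAL_LEN])
{
	if (sscanf(input, "%31s", val) != 1)
		return -1;
	return (int)strlen(val);
}
```

With a 255-character input the function stores exactly 31 characters plus the terminator. Note that truncation is silent, so a real interface would still want to reject over-long input rather than accept a clipped value.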
-
Committed by Weili Qian
mainline inclusion from v6.1-rc4 commit ee1537fe category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZHPY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee1537fe3dd89860d0336563891f6cac707d0cb5 ---------------------------------------------------------------------- After the device is reset, the VF needs to re-enable the communication interrupt before it sends the restart complete message to the PF. If the interrupt is re-enabled after the VF notifies the PF, the PF may fail to send messages to the VF after receiving the VF's restart complete message. Fixes: 760fe22c ("crypto: hisilicon/qm - update reset flow") Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Weili Qian
mainline inclusion from v6.1-rc4 commit 94adb03f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZHPY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=94adb03fd58bbe355e3d7a9d0f701889313e4a51 ---------------------------------------------------------------------- Change the value of the clock gating register to 0x7fff to enable clock gating of the address prefetch module. When the device is idle, the clock is turned off to save power. Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Weili Qian
mainline inclusion from v6.1-rc4 commit f57e2928 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZHPY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f57e292897cac13b6ddee078aea21173b234ecb7 ---------------------------------------------------------------------- In qm_get_xqc_depth(), the parameters low_bits and high_bits store the values of the corresponding bits. However, the values stored in the two parameters are swapped, so the values returned to the callers are incorrect. Fixes: 129a9f34 ("crypto: hisilicon/qm - get qp num and depth from hardware registers") Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
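Extracting two packed fields is a mask-and-shift operation. This sketch assumes a layout of two 16-bit fields in one 32-bit value; `XQ_DEPTH_SHIFT`, `XQ_DEPTH_MASK` and `get_xqc_depth()` are hypothetical names, and the real register layout is defined by the driver. The bug described above amounts to assigning these two results to the wrong output parameters.

```c
#include <stdint.h>

/* Assumed layout: low depth in bits 15:0, high depth in bits 31:16. */
#define XQ_DEPTH_SHIFT 16
#define XQ_DEPTH_MASK  0xffffu

/* Split a packed value into its low and high 16-bit fields. */
static void get_xqc_depth(uint32_t reg, uint16_t *low_bits, uint16_t *high_bits)
{
	*low_bits = (uint16_t)(reg & XQ_DEPTH_MASK);
	*high_bits = (uint16_t)((reg >> XQ_DEPTH_SHIFT) & XQ_DEPTH_MASK);
}
```

For `reg = 0x04000200`, `low_bits` is 0x0200 and `high_bits` is 0x0400; swapping the two assignments would silently exchange them.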
-
Committed by Yicong Yang
mainline inclusion from v6.1-rc4 commit 7001141d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZHPY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7001141d34e550854425afa76e960513cf150a62 ---------------------------------------------------------------------- dev_to_node() can handle the case when CONFIG_NUMA is not set, so the check of CONFIG_NUMA is redundant and can be removed. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Zhiqi Song
mainline inclusion from v6.1-rc4 commit 45e6319b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZHPY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=45e6319bd5f2154d8b8c9f1eaa4ac030ba0d330c ---------------------------------------------------------------------- In hpre_remove(), when disabling qm SR-IOV fails, the remaining logic should still be executed to release the resources that have been allocated, instead of returning directly; otherwise resources will leak. Signed-off-by: Zhiqi Song <songzhiqi1@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
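In a teardown path, a failing step is reported but does not short-circuit the remaining cleanup. The sketch below uses a hypothetical `struct dev_state` and helper names. It returns the status only so the behaviour can be checked; a real PCI remove callback returns void.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state tracking which resources are still held. */
struct dev_state {
	bool sriov_enabled;
	bool debugfs_registered;
	bool qm_initialized;
};

/* Hypothetical SR-IOV disable step that can fail (e.g. VFs still in use). */
static int disable_sriov(struct dev_state *d, bool fail)
{
	if (fail)
		return -EBUSY;
	d->sriov_enabled = false;
	return 0;
}

/*
 * Remove path: a failed SR-IOV disable is remembered, but teardown keeps
 * going so that the other resources are still released.
 */
static int remove_device(struct dev_state *d, bool sriov_fail)
{
	int ret = disable_sriov(d, sriov_fail);

	/* No early return here: keep releasing what was allocated. */
	d->debugfs_registered = false;
	d->qm_initialized = false;
	return ret;
}
```

Even when `disable_sriov()` fails, the debugfs and qm resources in this model are released, which is the behaviour the commit restores.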
-
Committed by Kai Ye
mainline inclusion from v6.1-rc4 commit f5b657e5 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZHPY CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f5b657e5dbf830cfcb19b588b784b8190a5164a0 ---------------------------------------------------------------------- The default qos value is not initialized when SR-IOV is repeatedly enabled and disabled, so initialize the VF qos value in the SR-IOV enable process. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by openeuler-ci-bot
Merge Pull Request from: @haochengxie This patch builds the AMD perf BRS feature into the x86 kernel by default. The following PR must be merged first: https://gitee.com/openeuler/kernel/pulls/201 Link: https://gitee.com/openeuler/kernel/pulls/216 Reviewed-by: Liu Chao <liuchao173@huawei.com> Reviewed-by: Kai Liu <kai.liu@suse.com> Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by openeuler-ci-bot
Merge Pull Request from: @haochengxie Add new perf features by backporting these patches from mainline: 1. Performance Monitor V2 - Global Controls 2. IBS (Instruction Based Sampling) extensions 3. BRS (Branch Sampling) Link: https://gitee.com/openeuler/kernel/pulls/201 Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by openeuler-ci-bot
Merge Pull Request from: @Hongchen_Zhang This series of patches adds support for the LoongArch architecture. - Support information: New firmware + New World system / Old firmware + New World system, compiled with the new compiler. The corresponding New World firmware can be downloaded from: https://github.com/loongson/Firmware . The CLFS system can be used for verification; refer to the following links for details: https://github.com/sunhaiyong1978/CLFS-for-LoongArch/releases/tag/6.0 https://github.com/sunhaiyong1978/CLFS-for-LoongArch/blob/main/CLFS_For_LoongArch64.md#8-%E5%88%9B%E5%BB%BA%E5%90%AF%E5%8A%A8u%E7%9B%98 - Patches from https://github.com/loongson/linux/tree/loongarch-next , updated to 2022-09-03 - Testing: 3A5000+7A1000 and 3C5000+7A1000 boot-up and reboot tests OK, LTP 24-hour test OK Link: https://gitee.com/openeuler/kernel/pulls/265 Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 29 Nov, 2022 8 commits
-
-
Committed by openeuler-ci-bot
Merge Pull Request from: @anatas [Description] We are implementing a user-space live-patch mechanism based on uprobes. The uprobe handler may change the PC register; in that case, uprobes must skip single-stepping the probed instruction. [Testing] kernel options: UPROBES_SUPPORT_PC_ALTER=y Link: https://gitee.com/openeuler/kernel/pulls/250 Reviewed-by: Xu Kuohai <xukuohai@huawei.com> Reviewed-by: Liu Chao <liuchao173@huawei.com> Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by openeuler-ci-bot
Merge Pull Request from: @liujie-248683921 This series contains support to get basic metricgroups working for arm64 CPUs. Initial support is added for the HiSilicon hip08 platform. Some sample usage on a Huawei D06 board:

$ ./perf list metric

List of pre-defined events (to be used in -e):

Metrics:

  bp_misp_flush
       [BP misp flush L3 topdown metric]
  branch_mispredicts
       [Branch mispredicts L2 topdown metric]
  core_bound
       [Core bound L2 topdown metric]
  divider
       [Divider L3 topdown metric]
  exe_ports_util
       [EXE ports util L3 topdown metric]
  fetch_bandwidth_bound
       [Fetch bandwidth bound L2 topdown metric]
  fetch_latency_bound
       [Fetch latency bound L2 topdown metric]
  fsu_stall
       [FSU stall L3 topdown metric]
  idle_by_icache_miss

$ sudo ./perf stat -v -M core_bound sleep 1
Using CPUID 0x00000000480fd010
metric expr (exe_stall_cycle - (mem_stall_anyload + armv8_pmuv3_0@event=0x7005@)) / cpu_cycles for core_bound
found event cpu_cycles
found event armv8_pmuv3_0/event=0x7005/
found event exe_stall_cycle
found event mem_stall_anyload
adding {cpu_cycles -> armv8_pmuv3_0/event=0x7001/
mem_stall_anyload -> armv8_pmuv3_0/event=0x7004/
Control descriptor is not initialized
cpu_cycles: 989433 385050 385050
armv8_pmuv3_0/event=0x7005/: 19207 385050 385050
exe_stall_cycle: 900825 385050 385050
mem_stall_anyload: 253516 385050 385050

 Performance counter stats for 'sleep':

           989,433      cpu_cycles                #     0.63 core_bound
            19,207      armv8_pmuv3_0/event=0x7005/
           900,825      exe_stall_cycle
           253,516      mem_stall_anyload

       0.000805809 seconds time elapsed

       0.000875000 seconds user
       0.000000000 seconds sys

perf stat --topdown is not supported, as this requires the CPU PMU to expose (alias) events for the TopDown L1 metrics from sysfs, which arm does not do. To get that to work, we probably need to make perf use the pmu-events cpumap to learn about those alias events. Metric reuse support is added for the pmu-events parse metric testcase.
This had been broken on power9 recently:
https://lore.kernel.org/lkml/20210324015418.GC8931@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com/

Differences to v2:
- Add TB and RB tags (Thanks!)
- Rename metricgroup__find_metric() from metricgroup_find_metric()
- Change resolve_metric_simple() to rescan after any insert

Differences to v1:
- Add pmu_events_map__find() as arm64-specific function
- Fix metric reuse for pmu-events parse metric testcase

John Garry (6):
  perf metricgroup: Make find_metric() public with name change
  perf test: Handle metric reuse in pmu-events parsing test
  perf pmu: Add pmu_events_map__find()
  perf vendor events arm64: Add Hisi hip08 L1 metrics
  perf vendor events arm64: Add Hisi hip08 L2 metrics
  perf vendor events arm64: Add Hisi hip08 L3 metrics

 tools/perf/arch/arm64/util/Build                   |   1 +
 tools/perf/arch/arm64/util/pmu.c                   |  25 ++
 .../arch/arm64/hisilicon/hip08/metrics.json        | 233 ++++++++++++++++++
 tools/perf/tests/pmu-events.c                      |  83 ++++++-
 tools/perf/util/metricgroup.c                      |  12 +-
 tools/perf/util/metricgroup.h                      |   3 +-
 tools/perf/util/pmu.c                              |   5 +
 tools/perf/util/pmu.h                              |   1 +
 tools/perf/util/s390-sample-raw.c                  |   4 +-
 9 files changed, 356 insertions(+), 11 deletions(-)
 create mode 100644 tools/perf/arch/arm64/util/pmu.c
 create mode 100644 tools/perf/pmu-events/arch/arm64/hisilicon/hip08/metrics.json

Reference: https://patchwork.kernel.org/project/linux-arm-kernel/cover/1617791570-165223-1-git-send-email-john.garry@huawei.com/

Bugfix: perf vendor events arm64: Fix incorrect metrics and improve readability

First fix the incorrect hip08 metrics, then add some core events to the JSON file. Last, change the event codes to event names to improve readability.

changes in v2:
- adjust commit msg of 1st patch.
- fix tab in 3rd patch.
Shang XiaoJing (3):
  perf vendor events arm64: Fix incorrect Hisi hip08 L3 metrics
  perf vendor events arm64: Add HiSilicon hip08 core events
  perf vendor events arm64: Use event name instead of event code

 .../arm64/hisilicon/hip08/core-imp-def.json        | 132 ++++++++++++++++++
 .../arch/arm64/hisilicon/hip08/metrics.json        |  48 +++----
 2 files changed, 156 insertions(+), 24 deletions(-)

Reference: https://lore.kernel.org/all/20221021105035.10000-1-shangxiaojing@huawei.com/ Link: https://gitee.com/openeuler/kernel/pulls/268 Reviewed-by: Cheng Jian <cj.chengjian@huawei.com> Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
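The `core_bound` value in the sample `perf stat` run can be checked by hand from the metric expression and the four counts it printed: (900825 - (253516 + 19207)) / 989433 = 628102 / 989433, or about 0.6348, which perf shows rounded as 0.63. The helper below simply evaluates that expression. Its name and parameter names are illustrative and are not part of perf.

```c
/*
 * core_bound as printed by "perf stat -v" above:
 * (exe_stall_cycle - (mem_stall_anyload + event 0x7005)) / cpu_cycles
 */
static double core_bound(double exe_stall_cycle, double mem_stall_anyload,
			 double event_0x7005, double cpu_cycles)
{
	return (exe_stall_cycle - (mem_stall_anyload + event_0x7005)) / cpu_cycles;
}
```

Calling `core_bound(900825, 253516, 19207, 989433)` and printing it with `"%.2f"` gives the same "0.63" that appears in the perf output.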
-
Committed by openeuler-ci-bot
Merge Pull Request from: @hejunhao3 Synchronize the code of the mainline perf tool and support parsing of TRBE trace data.

[test log]

```shell
estuary:/$ perf record -e /cs_etm/@trbe3/ -C 3 -o trace.data taskset -c 3 uname -a
Linux (none) 5.10.0+ #7 SMP PREEMPT Thu Nov 24 11:26:48 CST 2022 aarch64 GNU/Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.851 MB trace.data ]
/estuary:/$ perf report --stdio -i trace.data -D > report.txt
estuary:/$ grep -rn "ETE" report.txt
5835:. ... CoreSight ETE Trace data: size 0xd0d00 bytes
estuary:/$ grep -rn "I_ASYNC : Alignment Synchronisation." report.txt | tail -n 5
497159: Idx:816712; ID:7; I_ASYNC : Alignment Synchronisation.
499987: Idx:820816; ID:7; I_ASYNC : Alignment Synchronisation.
502722: Idx:824928; ID:7; I_ASYNC : Alignment Synchronisation.
505083: Idx:829040; ID:7; I_ASYNC : Alignment Synchronisation.
507427: Idx:833132; ID:7; I_ASYNC : Alignment Synchronisation.
estuary:/$
estuary:/$ perf record -e /cs_etm/@tmc_etr0/ -C 2 -o trace.data taskset -c 2 uname -a
Linux (none) 5.10.0+ #7 SMP PREEMPT Thu Nov 24 11:26:48 CST 2022 aarch64 GNU/Linux
[82501.067549] coresight tmc_etr0: timeout while waiting for completion of Manual Flush
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.690 MB trace.data ]
/estuary:/$ perf report --stdio -i trace.data -D > report.txt
estuary:/$ grep -rn "ETE" report.txt
8528:. ... CoreSight ETE Trace data: size 0xa40d0 bytes
estuary:/$ grep -rn "I_ASYNC : Alignment Synchronisation." report.txt | tail -n 5
349615: Idx:633382; ID:5; I_ASYNC : Alignment Synchronisation.
350304: Idx:634786; ID:5; I_ASYNC : Alignment Synchronisation.
352589: Idx:639200; ID:5; I_ASYNC : Alignment Synchronisation.
354957: Idx:643604; ID:5; I_ASYNC : Alignment Synchronisation.
357246: Idx:648003; ID:5; I_ASYNC : Alignment Synchronisation.
estuary:/$
estuary:/$ perf record -C 0 -e arm_spe_0/branch_filter=1/ -o branch.data sleep 3s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.663 MB branch.data ]
estuary:/$ perf report -D -i branch.data > branch.log
estuary:/$ grep -rn "B COND" branch.log | tail -=n 5
133506:. 000996d4: 4a 01 B COND
133517:. 0009970c: 4a 01 B COND
133539:. 0009977c: 4a 01 B COND
133572:. 00099824: 4a 01 B COND
133671:. 00099a1c: 4a 01 B COND
estuary:/$ perf -v
perf version 5.10.gede0fc40b9bf
```

Link: https://gitee.com/openeuler/kernel/pulls/282 Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: Ling Mingqiang <lingmingqiang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by openeuler-ci-bot
Merge Pull Request from: @xiaosuli3109 Fibre Channel has been the standard connection type for storage area networks (SANs) in enterprise storage. Despite its name, Fibre Channel signaling can run on both twisted-pair copper wire and fiber-optic cables. The Fibre Channel Gen7 adapter supports 64G link speeds. Add debug print support to the driver. Fix Issue: https://gitee.com/openeuler/kernel/issues/I6337O Link: https://gitee.com/openeuler/kernel/pulls/283 Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Hongchen Zhang
LoongArch inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5OHOB -------------------------------- Ensure that the netswift 10G NIC driver module (.ko) can be shipped in the ISO on LoongArch. Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
-
Committed by Baoqi Zhang
LoongArch inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5OHOB -------------------------------- Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn> Change-Id: I7d70a63b5a813551b81f60f07dfedbbcd01d4336
-
Committed by Qing Zhang
LoongArch inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5OHOB -------------------------------- This GMAC module is integrated into the Loongson-2K SoC and the LS7A bridge chip. commit 30bba69d upstream. Backport: stmmac: dwmac-loongson: fix uninitialized variable in loongson_dwmac_probe() stmmac: dwmac-loongson: Fix unsigned comparison to zero stmmac: dwmac-loongson:Fix missing return value stmmac: dwmac-loongson: change loongson_dwmac_driver from global to static stmmac: pci: Add LS7A support for dwmac-loongson This patch also disables dwmac FLOW_AUTO, since DWMAC_LOONGSON does NOT support FLOW_AUTO. Signed-off-by: Qing Zhang <zhangqing@loongson.cn> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ming Wang <wangming01@loongson.cn> Change-Id: Ifefac7d47a05373ca7160d22a44d6c07b6a896e5
-
Committed by Longjun Luo
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I61AXT CVE: N/A Within uprobe handlers, the PC register may be modified. In this situation, there is no need to single-step the probed instruction; as kprobes already does, skip it. Signed-off-by: Longjun Luo <luolongjun@huawei.com>
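The decision boils down to comparing the PC before and after the handler runs. The following is a minimal sketch with hypothetical types and handlers (`struct regs`, `uprobe_handler_t`, `observe_only()`, `redirect()`); it is not the arm64 uprobes code, which in this series is gated by the UPROBES_SUPPORT_PC_ALTER option. If the handler redirected execution, the original instruction must not be single-stepped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal register state. */
struct regs {
	uint64_t pc;
};

/* Hypothetical handler type: may rewrite regs->pc to redirect execution. */
typedef void (*uprobe_handler_t)(struct regs *regs);

/*
 * Run the handler at the probed address and report whether the probed
 * instruction still needs to be single-stepped. If the handler moved the
 * PC elsewhere (e.g. to patched code), stepping the old instruction would
 * be wrong, so the step is skipped.
 */
static bool run_handler_needs_singlestep(struct regs *regs, uprobe_handler_t handler)
{
	uint64_t probed_pc = regs->pc;

	handler(regs);
	return regs->pc == probed_pc;
}

/* Example handlers for illustration only. */
static void observe_only(struct regs *regs) { (void)regs; }
static void redirect(struct regs *regs) { regs->pc = 0x2000; }
```

An observing handler leaves the PC at the probe address, so the step still happens; a live-patch handler that jumps to new code makes the function return false and execution resumes at the new PC.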
-