- 16 May, 2023 3 commits
-
-
Committed by Ilya Leoshkevich
stable inclusion
from stable-v5.10.151
commit bbaea0f1cd33d702d053d5bdaf6d6dec3932894c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I64L0X

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bbaea0f1cd33d702d053d5bdaf6d6dec3932894c

--------------------------------

commit db16c1fe upstream.

[backported for dependency only: extra_paholeopt variable setup and usage; we don't want floats generated in 5.10]

pahole v1.21 supports the --btf_gen_floats flag, which makes it generate the information about the floating-point types [1].

Adjust link-vmlinux.sh to pass this flag to pahole in case it's supported, which is determined using a simple version check.

[1] https://lore.kernel.org/dwarves/YHRiXNX1JUF2Az0A@kernel.org/

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210413190043.21918-1-iii@linux.ibm.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lipeng Sang <sanglipeng1@jd.com>
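The "simple version check" mentioned in this commit can be sketched as follows. This is only an illustration in C: the real check lives in the link-vmlinux.sh shell script, and the helper names `pahole_version_code` and `extra_paholeopt_for` are hypothetical. The threshold v1.21 comes from the commit text; note that, per the bracketed backport note, the 5.10 backport keeps only the extra_paholeopt variable and does not enable float generation.

```c
#include <stdio.h>

/*
 * Hypothetical helper: fold a version string such as "v1.21" into one
 * integer (121). Returns -1 when the string cannot be parsed.
 */
static int pahole_version_code(const char *version)
{
	int major, minor;

	if (sscanf(version, "v%d.%d", &major, &minor) != 2)
		return -1;
	return major * 100 + minor;
}

/*
 * Hypothetical helper: choose the extra pahole option for a given version.
 * The commit text states that --btf_gen_floats is supported from v1.21.
 */
static const char *extra_paholeopt_for(const char *version)
{
	return pahole_version_code(version) >= 121 ? "--btf_gen_floats" : "";
}
```

Calling `extra_paholeopt_for("v1.20")` yields an empty string, so an older pahole is invoked without the flag it would not understand.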
-
Committed by openeuler-ci-bot
Merge Pull Request from: @yunyingsun

Title: Add PMU support for Intel Emerald Rapids

Content:
This PR adds Performance Monitoring Unit (PMU) support for the next Intel Xeon platform, Emerald Rapids (EMR). It contains 6 commits in total: 4 EMR PMU enabling patches from v6.2 and 2 dependent patches from v5.14/v5.18:

(v6.2-rc6) 5a8a05f1 perf/x86/intel/cstate: Add Emerald Rapids
(v5.18-rc4) 528c9f1d perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support
(v5.14-rc1) 87bf399f perf/x86/cstate: Add ICELAKE_X and ICELAKE_D support
(v6.2-rc6) 6795e558 perf/x86/intel: Add Emerald Rapids
(v6.2-rc4) 5268a284 perf/x86/intel/uncore: Add Emerald Rapids
(v6.2-rc4) 69ced416 perf/x86/msr: Add Emerald Rapids

The four 6.2 patches above use the macro "INTEL_FAM6_EMERALDRAPIDS_X", which is introduced by:

(v6.1-rc1) 7beade0d x86/cpu: Add several Intel server CPU model numbers

This patch is already included in another PR: https://gitee.com/openeuler/kernel/pulls/469

Note: this PMU PR must be merged AFTER PR-469; otherwise the kernel build fails because the macro "INTEL_FAM6_EMERALDRAPIDS_X" is not defined.

Intel-kernel issue: https://gitee.com/openeuler/intel-kernel/issues/I6YO4Z

Test:
1. Platform-dependent core PMU events work with perf, like "L1-dcache-loads".
2. Platform-dependent uncore PMU events work with perf, like "uncore_imc_0/event=0x1/".
3. Offcore events work with perf.
4. PEBS works with perf.
5. Topdown works with perf.

With this PR (along with the patch from PR-469) applied to kernel OLK-5.10, all tests above PASS on EMR.

Known issue: N/A
Default config change: N/A

Link:https://gitee.com/openeuler/kernel/pulls/622

Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Aichun Shi <aichun.shi@intel.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by openeuler-ci-bot
Merge Pull Request from: @yunyingsun

Running Average Power Limit (RAPL) is an interface for reporting accumulated energy consumption of various system-on-chip (SoC) power domains. The RAPL energy reporting feature has been available for many generations of Intel SoC products.

To enable perf to collect RAPL events on the future Intel Xeon platform Emerald Rapids (EMR), the following upstream commit is needed:

(v6.2-rc3) 57512b57 perf/x86/rapl: Add support for Intel Emerald Rapids

This patch uses the macro "INTEL_FAM6_EMERALDRAPIDS_X", which is introduced by the upstream commit:

(v6.1-rc1) 7beade0d x86/cpu: Add several Intel server CPU model numbers

That commit has been backported to OLK-5.10 in this PR (under review, not merged yet): https://gitee.com/openeuler/kernel/pulls/469

PR-469 is therefore set as a dependency of this RAPL backport.

Note: this PR must be merged AFTER PR-469; otherwise the kernel build fails because the macro "INTEL_FAM6_EMERALDRAPIDS_X" is not defined.

Intel-Kernel Issue: https://gitee.com/openeuler/intel-kernel/issues/I6YGL6

Test:
With the patches in this PR included, the command below lists energy-psys events on EMR:
$ perf list power | grep energy-psys
Without the patches, no energy-psys event is available.

The test is verified to PASS on an Intel EMR pre-production platform.

Known Issue: N/A
Default config change: N/A

Link:https://gitee.com/openeuler/kernel/pulls/615

Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
- 15 May, 2023 1 commit
-
-
Committed by openeuler-ci-bot
Merge Pull Request from: @NNNNicole

Here is a typical case in which SMT expelling occasionally causes priority inversion. Assume there are two SMT cores, cA and cB; online tasks are running on cA while offline tasks run on cB. With SMT expelling, the online task drives off the offline tasks to occupy all SMT cores exclusively, which in turn starves the offline tasks so that they cannot release the resources that other, higher-priority tasks need.

Hence, this patch introduces another mechanism to alleviate this situation. For all offline tasks, a metric tracking the maximum task expelling duration is set up, with a default value of 5 seconds. If such an offline task exists, all offline tasks are allowed to run into a small sleep (msleep) loop in the kernel before they go into usermode. Further, if the two SMT cores (such as cA and cB) are idle or have no online tasks to run, these offline tasks continue to run in usermode for the next schedule.

kernel options:
CONFIG_QOS_SCHED_SMT_EXPELLER=y

Link:https://gitee.com/openeuler/kernel/pulls/640

Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com>
Reviewed-by: Liu Chao <liuchao173@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
- 12 May, 2023 10 commits
-
-
Committed by openeuler-ci-bot
Merge Pull Request from: @x56Jason

This PR enables support for Intel's new fast rep string operation performance enhancements.

Starting with the Golden Cove microarchitecture (SPR and Alderlake), Intel CPUs support several rep string operation performance enhancements, which include the following features:
- fast zero-length MOVSB
- fast short STOSB
- fast short CMPSB, SCASB

For more information, see section 3.8 of the "Intel® 64 and IA-32 Architectures Optimization Reference Manual".

## Intel-Kernel Issue
#I6YPV0

## Test
Launch a VM and run cpuid; the following CPU features are reported as true:
```
fast zero-length MOVSB = true
fast short STOSB = true
fast short CMPSB, SCASB = true
```
Without this patchset, these features are false.

## Known Issue
N/A

## Default Config Change
N/A

Link:https://gitee.com/openeuler/kernel/pulls/624

Reviewed-by: Aichun Shi <aichun.shi@intel.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
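The three feature flags listed in this pull request can be decoded from a CPUID register value as sketched below. The bit positions (10, 11, 12 of EAX for CPUID leaf 7, sub-leaf 1) are taken from my reading of the Linux cpufeatures definitions (FZRM, FSRS, FSRC), not from this pull request, so treat them as an assumption to verify against the Intel SDM. The struct and function names are hypothetical, and the sketch decodes a value passed in rather than executing the CPUID instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in CPUID.(EAX=7,ECX=1):EAX. */
#define CPUID_7_1_EAX_FZRM (1u << 10) /* fast zero-length MOVSB */
#define CPUID_7_1_EAX_FSRS (1u << 11) /* fast short STOSB */
#define CPUID_7_1_EAX_FSRC (1u << 12) /* fast short CMPSB, SCASB */

/* Hypothetical container for the decoded flags. */
struct fast_rep_features {
	bool fzrm;
	bool fsrs;
	bool fsrc;
};

/* Decode the three fast rep string flags from a raw EAX value. */
static struct fast_rep_features decode_fast_rep(uint32_t eax)
{
	struct fast_rep_features f = {
		.fzrm = (eax & CPUID_7_1_EAX_FZRM) != 0,
		.fsrs = (eax & CPUID_7_1_EAX_FSRS) != 0,
		.fsrc = (eax & CPUID_7_1_EAX_FSRC) != 0,
	};
	return f;
}
```

A guest that sees all three features would report an EAX value with bits 10 to 12 set, for example `0x1c00` if no other bit is set.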
-
Committed by openeuler-ci-bot
Merge Pull Request from: @zhiquan1-li

This PR includes incremental backported patches that mainly cover SGX bugfixes up to upstream v6.3. It contains 9 patches in total.

**Intel-kernel issue:**
https://gitee.com/openeuler/intel-kernel/issues/I6X1FF

**Test:**
1. Build succeeds for each commit
2. Kernel selftest - SGX: PASSED
```sh
cd tools/testing/selftests/sgx/
make
./test_sgx
```
3. SGX internal stress test: No new failure

**Known issue:**
None

**Default config change:**
None

Link:https://gitee.com/openeuler/kernel/pulls/594

Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Aichun Shi <aichun.shi@intel.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by openeuler-ci-bot
Merge Pull Request from: @allen-shi

IFS is a hardware feature that runs circuit-level tests on a CPU core to detect problems that are not caught by parity or ECC checks.

Intel In Field Scan (IFS) with multi-blob images was supported in [PR471](https://gitee.com/openeuler/kernel/pulls/471), which also introduced microcode interface changes:
1. Removed the /dev/cpu/microcode interface (used by iucode-tool, microcode_ctl).
2. Disabled microcode late loading by default (MICROCODE_LATE_LOADING), which removed the /sys/devices/system/cpu/microcode/reload interface.

This PR includes 14 commits in total and restores support for the two microcode interfaces by reverting the related commits in [PR471](https://gitee.com/openeuler/kernel/pulls/471).

**Intel-Kernel Issue**
[#I6L337](https://gitee.com/openeuler/intel-kernel/issues/I6L337)

**Test**
Built and ran the kernel successfully on openEuler 22.03 LTS SP1.
Test is PASS on SPR platform.

**Known Issue**
N/A

**Default config change**
N/A

Link:https://gitee.com/openeuler/kernel/pulls/580

Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by openeuler-ci-bot
Merge Pull Request from: @quanxian-2021

Title: x86/cpu: Add several Intel server CPU model numbers

Content:
x86/cpu: Add several Intel server CPU model numbers

These servers are all on the public versions of the roadmap. The model numbers for Grand Ridge, Granite Rapids, and Sierra Forest were included in the September 2022 edition of the Instruction Set Extensions document.

Intel-kernel issue: https://gitee.com/openeuler/intel-kernel/issues/I6M81K

Test: Boot test on EMR/GNR/SF server
Known issue: N/A
Default config change: N/A

Link:https://gitee.com/openeuler/kernel/pulls/469

Reviewed-by: Jason Zeng <jason.zeng@intel.com>
Reviewed-by: Aichun Shi <aichun.shi@intel.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by Guan Jing
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6SIY2

-------------------------------

Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
Committed by Guan Jing
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6SIY2

-------------------------------

Add the nosmtexpell command-line option to disable qos_smt_expell when it is not wanted.

Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
Committed by Guan Jing
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6SIY2

-------------------------------

Here is a typical case in which SMT expelling occasionally causes priority inversion. Assume there are two SMT cores, cA and cB; online tasks are running on cA while offline tasks run on cB. With SMT expelling, the online task drives off the offline tasks to occupy all SMT cores exclusively, which in turn starves the offline tasks so that they cannot release the resources that other, higher-priority tasks need.

Hence, this patch introduces another mechanism to alleviate this situation. For all offline tasks, a metric tracking the maximum task expelling duration is set up, with a default value of 5 seconds. If such an offline task exists, all offline tasks are allowed to run into a small sleep (msleep) loop in the kernel before they go into usermode. Further, if the two SMT cores (such as cA and cB) are idle or have no online tasks to run, these offline tasks continue to run in usermode for the next schedule.

Signed-off-by: Guan Jing <guanjing6@huawei.com>
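One possible reading of the mechanism described in this commit is sketched below as a small decision function. It is an interpretation, not the patch itself: the commit text does not spell out the exact conditions, and every name here (`MAX_EXPEL_MS`, `enum offline_action`, `offline_task_action`) is hypothetical. Only the 5 second default comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default maximum expelling duration from the commit text: 5 seconds. */
#define MAX_EXPEL_MS 5000u

/* Hypothetical outcomes for an offline task at a scheduling decision. */
enum offline_action {
	OFFLINE_STAY_EXPELLED,   /* limit not reached: online tasks keep the cores */
	OFFLINE_SLEEP_IN_KERNEL, /* limit reached: run in kernel, msleep before usermode */
	OFFLINE_RUN_USERMODE     /* no online task competes: continue in usermode */
};

/*
 * Assumed policy: offline tasks run freely when the sibling SMT cores have
 * no online tasks; otherwise they stay expelled until the maximum expelling
 * duration is reached, after which they may run in the kernel (so they can
 * release held resources) but sleep before returning to usermode.
 */
static enum offline_action offline_task_action(uint64_t expelled_ms,
					       bool siblings_have_online_tasks)
{
	if (!siblings_have_online_tasks)
		return OFFLINE_RUN_USERMODE;
	if (expelled_ms >= MAX_EXPEL_MS)
		return OFFLINE_SLEEP_IN_KERNEL;
	return OFFLINE_STAY_EXPELLED;
}
```

The point of the middle state is that a starved offline task gets enough kernel time to drop resources that higher-priority tasks are waiting on, without competing with online tasks in usermode.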
-
Committed by openeuler-ci-bot
Merge Pull Request from: @lu-tiancheng97

This patch changes the log level of the MPAM IRQ registration error.

MPAM interrupts are used to report error information and are non-functional interrupts. The current interrupt number is set to the default value 0. As a result, the device startup log contains an error indicating that the MPAM interrupt registration failed, which is sensitive. Therefore, the log level is changed to alarm.

Link:https://gitee.com/openeuler/kernel/pulls/753

Reviewed-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by Tiancheng Lu
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I71UGQ
CVE: NA

----------------------------------------------------------------

MPAM interrupts are used to report error information and are non-functional interrupts. The current interrupt number is set to the default value 0. As a result, the device startup log contains an error indicating that the MPAM interrupt registration failed, which is sensitive. Therefore, the log level is changed to alarm.

Signed-off-by: Tiancheng Lu <lutiancheng5@huawei.com>
-
Committed by Guan Jing
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6SIY2

-------------------------------

Track how many tasks are present with qos_offline_policy in each cfs_rq. This will be used by later commits.

Signed-off-by: Guan Jing <guanjing6@huawei.com>
-
- 10 May, 2023 22 commits
-
-
Committed by openeuler-ci-bot
Merge Pull Request from: @liujie-248683921

This is the follow-up work to support the cluster scheduler. Previously we added a cluster level to the scheduler for both ARM64[1] and X86[2] to support load balancing between clusters, which brings more memory bandwidth and decreases cache contention. This patchset, on the other hand, takes care of the wake-up path by giving CPUs within the same cluster a try before scanning the whole LLC, to benefit tasks that communicate with each other.

[1] bd0f49e67873 ("sched: Add cluster scheduler level in core and related Kconfig for ARM64")
[2] 9d6b58524779 ("sched: Add cluster scheduler level for x86")

Barry Song (2):
  sched/fair: Scan cluster before scanning LLC in wake-up path
  sched: Add per_cpu cluster domain info and cpus_share_lowest_cache API

In this PR, we also introduce a set of patches to make cluster scheduling configurable and open the kernel configuration for cluster.

Tim Chen (3):
  scheduler: Create SDTL_SKIP flag to skip topology level
  scheduler: Add runtime knob sysctl_sched_cluster
  scheduler: Add boot time enabling/disabling of cluster scheduling

Yicong Yang (1):
  scheduler: Disable cluster scheduling by default

Jie Liu (1):
  sched: Open the kernel configuration for cluster

The benchmark tests were performed on a Kunpeng 920 with 96 CPUs and 4 NUMA nodes. The baseline is the kernel with sched_cluster not enabled; it is compared against the kernel with sched_cluster enabled.

The tbench test was performed with 1, 3, 6, 12, 24, 48, and 96 threads on a single NUMA node, 2 NUMA nodes, and 4 NUMA nodes, respectively.
tbench results (node 0):
  1:   239.9910(0.00%)    253.1110 (  5.47%)
  3:   720.6490(0.00%)    754.6980 (  4.72%)
  6:  1423.2633(0.00%)   1494.7767 (  5.02%)
 12:  2793.9700(0.00%)   2959.4333 (  5.92%)
 24:  4951.6967(0.00%)   4880.1233 ( -1.45%)
 48:  4127.8067(0.00%)   4082.2900 ( -1.10%)
 96:  3737.0067(0.00%)   3700.7133 ( -0.97%)
tbench results (node 0-1):
  1:   241.8367(0.00%)    252.8930 (  4.57%)
  3:   719.8940(0.00%)    752.8933 (  4.58%)
  6:  1434.6667(0.00%)   1488.1167 (  3.73%)
 12:  2834.8233(0.00%)   2908.3833 (  2.59%)
 24:  5376.3200(0.00%)   5753.2133 (  7.01%)
 48:  9709.5933(0.00%)   9610.8667 ( -1.02%)
 96:  8208.3500(0.00%)   8079.8200 ( -1.57%)
tbench results (node 0-3):
  1:   248.7557(0.00%)    252.9087 (  1.67%)
  3:   730.8120(0.00%)    733.9887 (  0.43%)
  6:  1439.8500(0.00%)   1424.3133 ( -1.08%)
 12:  2821.1333(0.00%)   2782.7300 ( -1.36%)
 24:  5366.3633(0.00%)   5050.6467 ( -5.88%)
 48:  9362.8867(0.00%)   9323.3033 ( -0.42%)
 96: 12269.2900(0.00%)  15987.9000 ( 30.31%)

The netperf test was performed on threads 1, 3, 6, 12, 24, 48, and 96 on single numa, 2 numa, and 4 numa, respectively.
netperf results TCP_RR (node 0):
  1: 54557.1533(0.00%)  57056.0833 (  4.58%)
  3: 54073.4422(0.00%)  57013.9978 (  5.44%)
  6: 53158.2733(0.00%)  56904.6200 (  7.05%)
 12: 51908.7767(0.00%)  56452.1753 (  8.75%)
 24: 45868.0304(0.00%)  45737.4529 ( -0.28%)
 48: 18372.7595(0.00%)  18353.7237 ( -0.10%)
 96:  8298.8618(0.00%)   8276.4761 ( -0.27%)
netperf results TCP_RR (node 0-1):
  1: 54645.5400(0.00%)  57017.2833 (  4.34%)
  3: 53852.1678(0.00%)  56886.0478 (  5.63%)
  6: 54196.2400(0.00%)  56772.8533 (  4.75%)
 12: 53221.2439(0.00%)  56683.0367 (  6.50%)
 24: 51334.2392(0.00%)  55881.7862 (  8.86%)
 48: 40452.4043(0.00%)  43306.8335 (  7.06%)
 96: 19012.9919(0.00%)  19051.5740 (  0.20%)
netperf results TCP_RR (node 0-3):
  1: 55933.2733(0.00%)  57134.6267 (  2.15%)
  3: 54865.2733(0.00%)  56848.4200 (  3.61%)
  6: 54131.9867(0.00%)  56813.5367 (  4.95%)
 12: 53226.0636(0.00%)  56336.5736 (  5.84%)
 24: 51632.2987(0.00%)  55689.8818 (  7.86%)
 48: 46864.5843(0.00%)  50361.5243 (  7.46%)
 96: 41761.4341(0.00%)  42939.8937 (  2.82%)
netperf results UDP_RR (node 0):
  1: 64038.8467(0.00%)  66604.0933 (  4.01%)
  3: 64253.2456(0.00%)  66948.3744 (  4.19%)
  6: 63617.2783(0.00%)  66944.2483 (  5.23%)
 12: 61060.0514(0.00%)  66565.0756 (  9.02%)
 24: 54961.9269(0.00%)  54935.7403 ( -0.05%)
 48: 21988.5656(0.00%)  21964.7232 ( -0.11%)
 96:  9808.7866(0.00%)   9806.4410 ( -0.02%)
netperf results UDP_RR (node 0-1):
  1: 64101.6533(0.00%)  66924.6300 (  4.40%)
  3: 64058.6289(0.00%)  67014.3878 (  4.61%)
  6: 64000.8906(0.00%)  67007.2178 (  4.70%)
 12: 62794.1842(0.00%)  66901.7875 (  6.54%)
 24: 60655.0124(0.00%)  65935.2542 (  8.71%)
 48: 46036.2765(0.00%)  27424.1071 (-40.43%)
 96: 19524.9869(0.00%)  22832.9911 ( 16.94%)
netperf results UDP_RR (node 0-3):
  1: 65459.8033(0.00%)  66813.4067 (  2.07%)
  3: 64308.0756(0.00%)  66555.0544 (  3.49%)
  6: 63501.9544(0.00%)  66384.6244 (  4.54%)
 12: 62764.5350(0.00%)  66237.2536 (  5.53%)
 24: 62048.7932(0.00%)  65946.3492 (  6.28%)
 48: 57374.9211(0.00%)  61945.5453 (  7.97%)
 96: 37389.7065(0.00%)  52512.2749 ( 40.45%)

The unixbench test of threads 6, 24, and
48 is performed on single numa, 2 numa, and 4 numa. ===== unixbench Dhrystone 2 using register variables ===== unixbench results (node 0): 6: 22394.8000(0.00%) 22424.7000 ( 0.13%) 24: 89510.0000(0.00%) 89514.0000 ( 0.00%) 48: 89713.0000(0.00%) 89748.1000 ( 0.04%) unixbench results (node 0-1): 6: 22427.0000(0.00%) 22366.8000 ( -0.27%) 24: 89601.6000(0.00%) 89632.3000 ( 0.03%) 48: 179007.6000(0.00%) 178949.8000 ( -0.03%) unixbench results (node 0-3): 6: 22403.7000(0.00%) 22419.9000 ( 0.07%) 24: 89566.7000(0.00%) 89541.2000 ( -0.03%) 48: 179065.0000(0.00%) 179055.2000 ( -0.01%) ===== unixbench Double-Precision Whetstone ===== unixbench results (node 0): 6: 4783.0000(0.00%) 4782.9000 ( -0.00%) 24: 19131.6000(0.00%) 19131.7000 ( 0.00%) 48: 38257.6000(0.00%) 38258.0000 ( 0.00%) unixbench results (node 0-1): 6: 4782.9000(0.00%) 4782.9000 ( 0.00%) 24: 19131.6000(0.00%) 19131.8000 ( 0.00%) 48: 38263.1000(0.00%) 38263.1000 ( 0.00%) unixbench results (node 0-3): 6: 4782.9000(0.00%) 4782.9000 ( 0.00%) 24: 19131.7000(0.00%) 19131.6000 ( -0.00%) 48: 38263.1000(0.00%) 38263.2000 ( 0.00%) ===== unixbench Execl Throughput ===== unixbench results (node 0): 6: 4013.2000(0.00%) 4209.5000 ( 4.89%) 24: 11262.1000(0.00%) 11223.5000 ( -0.34%) 48: 9748.9000(0.00%) 10940.7000 ( 12.22%) unixbench results (node 0-1): 6: 3748.0000(0.00%) 3516.6000 ( -6.17%) 24: 10683.8000(0.00%) 9172.9000 ( -14.14%) 48: 10652.3000(0.00%) 10726.0000 ( 0.69%) unixbench results (node 0-3): 6: 2918.5000(0.00%) 2904.0000 ( -0.50%) 24: 6647.2000(0.00%) 6730.9000 ( 1.26%) 48: 6243.6000(0.00%) 6209.5000 ( -0.55%) ===== unixbench File Copy 1024 bufsize 2000 maxblocks ===== unixbench results (node 0): 6: 3494.8000(0.00%) 3189.5000 ( -8.74%) 24: 3334.5000(0.00%) 3086.5000 ( -7.44%) 48: 2415.2000(0.00%) 2630.1000 ( 8.90%) unixbench results (node 0-1): 6: 2357.7000(0.00%) 2693.8000 ( 14.26%) 24: 2779.9000(0.00%) 2705.6000 ( -2.67%) 48: 2409.6000(0.00%) 2367.2000 ( -1.76%) unixbench results (node 0-3): 6: 1565.7000(0.00%) 
1536.3000 ( -1.88%) 24: 1545.5000(0.00%) 1550.9000 ( 0.35%) 48: 1501.4000(0.00%) 1520.3000 ( 1.26%) ===== unixbench File Copy 256 bufsize 500 maxblocks ===== unixbench results (node 0): 6: 2355.0000(0.00%) 2129.7000 ( -9.57%) 24: 2075.1000(0.00%) 2028.6000 ( -2.24%) 48: 1719.0000(0.00%) 1717.3000 ( -0.10%) unixbench results (node 0-1): 6: 1888.6000(0.00%) 1816.2000 ( -3.83%) 24: 1862.0000(0.00%) 1800.4000 ( -3.31%) 48: 1444.2000(0.00%) 1501.1000 ( 3.94%) unixbench results (node 0-3): 6: 1113.8000(0.00%) 969.0000 ( -13.00%) 24: 984.4000(0.00%) 996.0000 ( 1.18%) 48: 946.0000(0.00%) 955.7000 ( 1.03%) ===== unixbench File Copy 4096 bufsize 8000 maxblocks ===== unixbench results (node 0): 6: 6048.9000(0.00%) 5567.4000 ( -7.96%) 24: 6343.4000(0.00%) 5674.2000 ( -10.55%) 48: 5040.7000(0.00%) 5241.9000 ( 3.99%) unixbench results (node 0-1): 6: 5695.3000(0.00%) 5180.0000 ( -9.05%) 24: 5098.0000(0.00%) 4768.4000 ( -6.47%) 48: 4643.8000(0.00%) 4541.2000 ( -2.21%) unixbench results (node 0-3): 6: 2992.6000(0.00%) 4231.4000 ( 41.40%) 24: 2926.5000(0.00%) 2853.9000 ( -2.48%) 48: 2718.4000(0.00%) 2703.7000 ( -0.54%) ===== unixbench Pipe Throughput ===== unixbench results (node 0): 6: 5819.5000(0.00%) 5845.3000 ( 0.44%) 24: 23273.3000(0.00%) 23314.9000 ( 0.18%) 48: 23316.0000(0.00%) 23323.7000 ( 0.03%) unixbench results (node 0-1): 6: 5835.2000(0.00%) 5843.8000 ( 0.15%) 24: 23278.5000(0.00%) 23376.6000 ( 0.42%) 48: 46502.1000(0.00%) 46638.4000 ( 0.29%) unixbench results (node 0-3): 6: 5827.9000(0.00%) 5843.2000 ( 0.26%) 24: 23304.2000(0.00%) 23328.7000 ( 0.11%) 48: 46608.1000(0.00%) 46665.3000 ( 0.12%) ===== unixbench Pipe-based Context Switching ===== unixbench results (node 0): 6: 2330.2000(0.00%) 2589.9000 ( 11.14%) 24: 10905.0000(0.00%) 10840.2000 ( -0.59%) 48: 8473.8000(0.00%) 8459.3000 ( -0.17%) unixbench results (node 0-1): 6: 2424.4000(0.00%) 2574.2000 ( 6.18%) 24: 8457.5000(0.00%) 10015.3000 ( 18.42%) 48: 19092.4000(0.00%) 17770.4000 ( -6.92%) unixbench results (node 
0-3): 6: 2365.6000(0.00%) 2585.7000 ( 9.30%) 24: 9125.8000(0.00%) 10219.2000 ( 11.98%) 48: 10861.7000(0.00%) 10656.3000 ( -1.89%) ===== unixbench Process Creation ===== unixbench results (node 0): 6: 2541.7000(0.00%) 2642.1000 ( 3.95%) 24: 6289.2000(0.00%) 6303.7000 ( 0.23%) 48: 6726.1000(0.00%) 6618.9000 ( -1.59%) unixbench results (node 0-1): 6: 2252.1000(0.00%) 2196.6000 ( -2.46%) 24: 5883.7000(0.00%) 5915.0000 ( 0.53%) 48: 7071.9000(0.00%) 7076.5000 ( 0.07%) unixbench results (node 0-3): 6: 1684.1000(0.00%) 1769.6000 ( 5.08%) 24: 4107.7000(0.00%) 4123.8000 ( 0.39%) 48: 4453.4000(0.00%) 4371.0000 ( -1.85%) ===== unixbench Shell Scripts (1 concurrent) ===== unixbench results (node 0): 6: 8748.0000(0.00%) 8686.8000 ( -0.70%) 24: 20378.0000(0.00%) 20350.6000 ( -0.13%) 48: 20197.5000(0.00%) 20047.7000 ( -0.74%) unixbench results (node 0-1): 6: 8265.6000(0.00%) 8115.8000 ( -1.81%) 24: 25387.6000(0.00%) 25443.6000 ( 0.22%) 48: 32417.7000(0.00%) 31579.4000 ( -2.59%) unixbench results (node 0-3): 6: 6963.4000(0.00%) 6963.7000 ( 0.00%) 24: 20347.2000(0.00%) 20397.7000 ( 0.25%) 48: 23783.0000(0.00%) 23854.7000 ( 0.30%) ===== unixbench Shell Scripts (8 concurrent) ===== unixbench results (node 0): 6: 19852.4000(0.00%) 19829.3000 ( -0.12%) 24: 19548.3000(0.00%) 19434.7000 ( -0.58%) 48: 19321.0000(0.00%) 19366.3000 ( 0.23%) unixbench results (node 0-1): 6: 24136.5000(0.00%) 23653.4000 ( -2.00%) 24: 31973.2000(0.00%) 31187.2000 ( -2.46%) 48: 31769.6000(0.00%) 30218.6000 ( -4.88%) unixbench results (node 0-3): 6: 18668.3000(0.00%) 18696.6000 ( 0.15%) 24: 21164.6000(0.00%) 21599.2000 ( 2.05%) 48: 18580.7000(0.00%) 18964.8000 ( 2.07%) ===== unixbench System Call Overhead ===== unixbench results (node 0): 6: 2236.9000(0.00%) 2057.4000 ( -8.02%) 24: 2907.7000(0.00%) 2910.1000 ( 0.08%) 48: 2919.5000(0.00%) 2921.1000 ( 0.05%) unixbench results (node 0-1): 6: 1106.3000(0.00%) 1016.8000 ( -8.09%) 24: 1186.4000(0.00%) 1178.5000 ( -0.67%) 48: 1215.7000(0.00%) 1212.5000 ( -0.26%) 
unixbench results (node 0-3): 6: 1363.0000(0.00%) 1082.2000 ( -20.60%) 24: 1569.9000(0.00%) 1457.7000 ( -7.15%) 48: 1487.5000(0.00%) 1456.4000 ( -2.09%) Perform the fio read test using bs 64k, iodepth 128, and numjobs 32 64 128. BW (MiB/s): 32 1077(0.00%) 1130(4.92%) 64 1077(0.00%) 1077(0.00%) 128 1080(0.00%) 1080.6(0.05%) IOPS (k): 32 17.2(0.00%) 17.2(0.00%) 64 17.2(0.00%) 17.2(0.00%) 128 17.3(0.00%) 17.3(0.00%) The lmbench test with threads being 32 is performed and the memory frequency is 2599 Mhz. Processor, Processes - times in microseconds - smaller is better null call 0.18(0.00%) 0.18( 0.00%) null I/O 4.785(0.00%) 4.35( 9.09%) stat 53.55(0.00%) 52.05( 2.80%) open clos 104.5(0.00%) 105.5(-0.96%) slct TCP 2.6425(0.00%) 2.645(-0.09%) sig inst 0.2525(0.00%) 0.25( 0.99%) sig hndl 2.285(0.00%) 2.2825( 0.11%) fork proc 1359.75(0.00%) 1380(-1.49%) exec proc 3625.25(0.00%) 3599.25( 0.72%) sh proc 5292.5(0.00%) 5288.75( 0.07%) Context switching - times in microseconds - smaller is better 2p/0K 2.7775(0.00%) 2.1575(22.32%) 2p/16K 2.9825(0.00%) 2.67(10.48%) 2p/64K 3.32(0.00%) 3.005( 9.49%) 8p/16K 8.2275(0.00%) 4.4025(46.49%) 8p/64K 10.1175(0.00%) 5.8775(41.91%) 16p/16K 8.92(0.00%) 5.8275(34.67%) 16p/64K 12.275(0.00%) 9.075(26.07%) *Local* Communication latencies in microseconds - smaller is better 2p/0K 2.7775(0.00%) 2.1575(22.32%) Pipe 6.98525(0.00%) 6.00875(13.98%) AF UNIX 7.6475(0.00%) 6.955( 9.06%) RPC/UDP 277.675(0.00%) 275.3( 0.86%) TCP 19.825(0.00%) 18.575( 6.31%) RPC/TCP 330.125(0.00%) 293.35(11.14%) File & VM system latencies in microseconds - smaller is better Mmap Latency 569.125(0.00%) 539.7( 5.17%) Prot Fault 0.529667(0.00%) 0.38525(27.27%) Page Fault 0.611225(0.00%) 0.6444(-5.43%) 100fd selct 1.16175(0.00%) 1.16275(-0.09%) *Local* Communication bandwidths in MB/s - bigger is better Pipe 71.25(0.00%) 76.5( 7.37%) AF UNIX 78.75(0.00%) 79.25( 0.63%) TCP 67.25(0.00%) 70.75( 5.20%) File reread 87.15(0.00%) 96.625(10.87%) Map reread 110.9(0.00%) 114.875( 3.58%) 
Bcopy libc 49.75(0.00%) 53.525( 7.59%) Bcopy hand 52.9(0.00%) 52.175(-1.37%) Mem read 102.75(0.00%) 118.75(15.57%) Mem write 44.175(0.00%) 51.25(16.02%) Link:https://gitee.com/openeuler/kernel/pulls/169 Reviewed-by: Liu Chao <liuchao173@huawei.com> Reviewed-by: Fred Kimmy <xweikong@163.com> Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
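The wake-up path change described in this pull request (try CPUs in the same cluster before scanning the whole LLC) can be sketched with plain bitmasks. This is a simplified model under stated assumptions: CPUs are bits in a 64-bit mask, "idle" is a precomputed mask, and the function names `first_cpu` and `select_idle_cpu_sketch` are hypothetical. The real scheduler code works on cpumasks and scheduling domains and is considerably more involved.

```c
#include <stdint.h>

/* Return the lowest-numbered CPU set in mask, or -1 if the mask is empty. */
static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * Pick an idle CPU for a waking task.
 *   idle:    mask of currently idle CPUs
 *   cluster: mask of CPUs in the target's cluster
 *   llc:     mask of CPUs sharing the last-level cache
 * The cluster is scanned first; only if it has no idle CPU is the rest of
 * the LLC scanned. Returns -1 when no idle CPU is found.
 */
static int select_idle_cpu_sketch(uint64_t idle, uint64_t cluster, uint64_t llc)
{
	int cpu = first_cpu(idle & cluster);

	if (cpu >= 0)
		return cpu;
	return first_cpu(idle & llc & ~cluster);
}
```

Scanning the cluster first favors CPUs that share a closer cache level with the waker, which matches the stated goal of helping tasks that communicate with each other.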
-
Committed by openeuler-ci-bot
Merge Pull Request from: @svishen

Backport a community patch that solves a page_pool problem.

issue: https://gitee.com/openeuler/kernel/issues/I718LV

Link:https://gitee.com/openeuler/kernel/pulls/678

Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by openeuler-ci-bot
Merge Pull Request from: @zhangjialin11

Pull new CVEs:
CVE-2022-4382
CVE-2023-0458
CVE-2023-2269
CVE-2023-2483
CVE-2023-31436
CVE-2023-2194
CVE-2023-2166
CVE-2023-2176
CVE-2023-2007

fs bugfixes from Baokun Li
bpf bugfixes from Liu Jian

Link:https://gitee.com/openeuler/kernel/pulls/724

Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Alan Stern
mainline inclusion
from mainline-v6.2-rc5
commit d18dcfe9
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I66IZK
CVE: CVE-2022-4382

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d18dcfe9860e842f394e37ba01ca9440ab2178f4

----------------------------------------------------------------------

The syzbot fuzzer and Gerald Lee have identified a use-after-free bug in the gadgetfs driver, involving processes concurrently mounting and unmounting the gadgetfs filesystem. In particular, gadgetfs_fill_super() can race with gadgetfs_kill_sb(), causing the latter to deallocate the_device while the former is using it. The output from KASAN says, in part:

BUG: KASAN: use-after-free in instrument_atomic_read_write include/linux/instrumented.h:102 [inline]
BUG: KASAN: use-after-free in atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:176 [inline]
BUG: KASAN: use-after-free in __refcount_sub_and_test include/linux/refcount.h:272 [inline]
BUG: KASAN: use-after-free in __refcount_dec_and_test include/linux/refcount.h:315 [inline]
BUG: KASAN: use-after-free in refcount_dec_and_test include/linux/refcount.h:333 [inline]
BUG: KASAN: use-after-free in put_dev drivers/usb/gadget/legacy/inode.c:159 [inline]
BUG: KASAN: use-after-free in gadgetfs_kill_sb+0x33/0x100 drivers/usb/gadget/legacy/inode.c:2086
Write of size 4 at addr ffff8880276d7840 by task syz-executor126/18689

CPU: 0 PID: 18689 Comm: syz-executor126 Not tainted 6.1.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
 <TASK>
...
 atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:176 [inline]
 __refcount_sub_and_test include/linux/refcount.h:272 [inline]
 __refcount_dec_and_test include/linux/refcount.h:315 [inline]
 refcount_dec_and_test include/linux/refcount.h:333 [inline]
 put_dev drivers/usb/gadget/legacy/inode.c:159 [inline]
 gadgetfs_kill_sb+0x33/0x100 drivers/usb/gadget/legacy/inode.c:2086
 deactivate_locked_super+0xa7/0xf0 fs/super.c:332
 vfs_get_super fs/super.c:1190 [inline]
 get_tree_single+0xd0/0x160 fs/super.c:1207
 vfs_get_tree+0x88/0x270 fs/super.c:1531
 vfs_fsconfig_locked fs/fsopen.c:232 [inline]

The simplest solution is to ensure that gadgetfs_fill_super() and gadgetfs_kill_sb() are serialized by making them both acquire a new mutex.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+33d7ad66d65044b93f16@syzkaller.appspotmail.com
Reported-and-tested-by: Gerald Lee <sundaywind2004@gmail.com>
Link: https://lore.kernel.org/linux-usb/CAO3qeMVzXDP-JU6v1u5Ags6Q-bb35kg3=C6d04DjzA9ffa5x1g@mail.gmail.com/
Fixes: e5d82a73 ("vfs: Convert gadgetfs to use the new mount API")
CC: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/Y6XCPXBpn3tmjdCC@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by Xia Fukun
stable inclusion
from stable-v5.10.165
commit 9f8e45720e0e7edb661d0082422f662ed243d8d8
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6Z6SU?from=project-issue
CVE: CVE-2023-0458

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=9f8e45720e0e7edb661d0082422f662ed243d8d8

--------------------------------

[ Upstream commit 73979060 ]

do_prlimit() adds the user-controlled resource value to a pointer that will subsequently be dereferenced. To help prevent this code path from being used as a Spectre "gadget", a barrier needs to be added after checking the range.

Reported-by: Jordy Zomer <jordyzomer@google.com>
Tested-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Xia Fukun <xiafukun@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
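The idea of a barrier "after checking the range" can be illustrated with a branch-free index clamp. The sketch below is a userspace model of that technique, not the patch: the commit text does not name the helper it uses, and the function names here (`index_mask_nospec`, `clamp_index_nospec`, `read_limit`) are hypothetical. The mask arithmetic follows the generic pattern the kernel uses for this purpose as I understand it; the kernel version additionally hides the index from the optimizer, which this model does not do, so it only demonstrates the arithmetic.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/*
 * Return all ones when index < size and zero otherwise, without a branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift of negative
 * values (true for GCC and Clang; implementation-defined in ISO C).
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/* Clamp index to 0 when it is out of range, leave it unchanged otherwise. */
static unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}

/*
 * Hypothetical lookup in the style of the text: check the range first,
 * then clamp so that a mispredicted branch cannot use a wild index.
 * Returns -1 for an out-of-range resource.
 */
static long read_limit(const long *limits, unsigned long nlimits,
		       unsigned long resource)
{
	if (resource >= nlimits)
		return -1;
	resource = clamp_index_nospec(resource, nlimits);
	return limits[resource];
}
```

The clamp is redundant on the architectural path, since the range check already rejected bad values; its purpose is to keep a speculatively executed load inside the array.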
-
Committed by Baokun Li
maillist inclusion
category: bugfix
bugzilla: 188724, https://gitee.com/openeuler/kernel/issues/I70Q22

Reference: https://www.spinics.net/lists/kernel/msg4779681.html

----------------------------------------

When ext4_iomap_overwrite_begin() calls ext4_iomap_begin(), mapping blocks may fail for some reason (e.g. memory allocation failure, bare disk write), and later "iomap->type != IOMAP_MAPPED" triggers WARN_ON(). When ext4_iomap_begin() returns an error, it is normal for iomap->type not to match the expectation. Therefore, we only check whether iomap->type is as expected when ext4_iomap_begin() has executed successfully.

Reported-by: syzbot+08106c4b7d60702dbc14@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
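The condition change described in this commit can be sketched as a small predicate. This is an illustrative model only: the enum and function names (`toy_iomap_type`, `overwrite_should_warn`) are hypothetical stand-ins, and the actual patch operates on the kernel's struct iomap inside ext4_iomap_overwrite_begin().

```c
#include <stdbool.h>

/* Toy stand-in for the kernel's iomap type values. */
enum toy_iomap_type {
	TOY_IOMAP_HOLE,
	TOY_IOMAP_MAPPED
};

/*
 * Decide whether the overwrite path should warn.
 *   ret:  return value of the mapping call (0 on success, negative on error)
 *   type: mapping type reported by that call
 * The type is only meaningful when the mapping call succeeded, so an error
 * return never triggers the warning.
 */
static bool overwrite_should_warn(int ret, enum toy_iomap_type type)
{
	return ret == 0 && type != TOY_IOMAP_MAPPED;
}
```

With the earlier behavior described in the text, a failed mapping call (for example one returning a negative errno) could still trip the warning because the type was checked unconditionally.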
-
Committed by Mike Snitzer
mainline inclusion
from mainline-v6.4-rc1
commit 3d32aaa7
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6YQZS
CVE: CVE-2023-2269

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=3d32aaa7e66d5c1479a3c31d6c2c5d45dd0d3b89

----------------------------------------

syzkaller found the following problematic rwsem locking (with write lock already held):

 down_read+0x9d/0x450 kernel/locking/rwsem.c:1509
 dm_get_inactive_table+0x2b/0xc0 drivers/md/dm-ioctl.c:773
 __dev_status+0x4fd/0x7c0 drivers/md/dm-ioctl.c:844
 table_clear+0x197/0x280 drivers/md/dm-ioctl.c:1537

In table_clear, it first acquires a write lock
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L1520
down_write(&_hash_lock);

Then before the lock is released at L1539, there is a path shown above:
table_clear -> __dev_status -> dm_get_inactive_table -> down_read
https://elixir.bootlin.com/linux/v6.2/source/drivers/md/dm-ioctl.c#L773
down_read(&_hash_lock);

It tries to acquire the same read lock again, resulting in the deadlock problem.

Fix this by moving table_clear()'s __dev_status() call to after its up_write(&_hash_lock);

Cc: stable@vger.kernel.org
Reported-by: Zheng Zhang <zheng.zhang@email.ucr.edu>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

Conflicts:
	drivers/md/dm-ioctl.c

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
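The lock ordering problem and its fix can be modeled with a toy reader/writer lock. This is a single-threaded sketch under stated assumptions: `struct toy_rwsem` and all `toy_*` and `table_clear_*` functions are hypothetical, and where the real rwsem would block forever the toy lock simply returns false so the outcome can be observed.

```c
#include <stdbool.h>

/* Toy reader/writer lock: acquisition fails instead of blocking. */
struct toy_rwsem {
	bool write_locked;
	int readers;
};

static bool toy_down_write(struct toy_rwsem *sem)
{
	if (sem->write_locked || sem->readers > 0)
		return false;
	sem->write_locked = true;
	return true;
}

static void toy_up_write(struct toy_rwsem *sem)
{
	sem->write_locked = false;
}

static bool toy_down_read(struct toy_rwsem *sem)
{
	if (sem->write_locked)
		return false; /* the real lock would wait here */
	sem->readers++;
	return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	sem->readers--;
}

/* Stand-in for the status helper, which takes the lock for reading. */
static bool toy_dev_status(struct toy_rwsem *sem)
{
	if (!toy_down_read(sem))
		return false;
	toy_up_read(sem);
	return true;
}

/* Old ordering: status is queried while the write lock is still held. */
static bool table_clear_buggy(struct toy_rwsem *sem)
{
	bool ok;

	toy_down_write(sem);
	ok = toy_dev_status(sem); /* read lock under write lock: cannot succeed */
	toy_up_write(sem);
	return ok;
}

/* Fixed ordering: release the write lock first, then query status. */
static bool table_clear_fixed(struct toy_rwsem *sem)
{
	toy_down_write(sem);
	/* ... clear the inactive table under the write lock ... */
	toy_up_write(sem);
	return toy_dev_status(sem);
}
```

In this model `table_clear_buggy()` reports failure because the read acquisition happens while the same lock is held for writing, which corresponds to the self-deadlock in the report; `table_clear_fixed()` succeeds.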
-
Committed by Zheng Wang
mainline inclusion
from mainline-v6.3-rc4
commit 6b6bc5b8
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6ZWOL
CVE: CVE-2023-2483

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6b6bc5b8bd2d4ca9e1efa9ae0f98a0b0687ace75

---------------------------

In emac_probe, &adpt->work_thread is bound with emac_work_thread. It is then started by the timeout handler emac_tx_timeout or by the IRQ handler emac_isr.

If we remove the driver, which calls emac_remove to clean up, there may be unfinished work.

The possible sequence is as follows:

CPU0                  CPU1

                    |emac_work_thread
emac_remove         |
free_netdev         |
kfree(netdev);      |
                    |emac_reinit_locked
                    |emac_mac_down
                    |//use netdev

Fix it by finishing the work before cleanup in emac_remove and by disabling the timeout response.

Fixes: b9b17deb ("net: emac: emac gigabit ethernet controller driver")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6b6bc5b8)
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by Gwangun Jung
stable inclusion from stable-v5.10.179 commit ddcf35deb8f2a1d9addc74b586cf4c5a1f5d6020 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6ZISA CVE: CVE-2023-31436 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ddcf35deb8f2a1d9addc74b586cf4c5a1f5d6020 -------------------------------- [ Upstream commit 30379334 ] If the TCA_QFQ_LMAX value is not offered through nlattr, lmax is determined by the MTU value of the network device. The MTU of the loopback device can be set up to 2^31-1. As a result, it is possible to have an lmax value that exceeds QFQ_MIN_LMAX. Due to the invalid lmax value, an index is generated that exceeds the QFQ_MAX_INDEX(=24) value, causing out-of-bounds read/write errors. The following reports a oob access: [ 84.582666] BUG: KASAN: slab-out-of-bounds in qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313) [ 84.583267] Read of size 4 at addr ffff88810f676948 by task ping/301 [ 84.583686] [ 84.583797] CPU: 3 PID: 301 Comm: ping Not tainted 6.3.0-rc5 #1 [ 84.584164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 84.584644] Call Trace: [ 84.584787] <TASK> [ 84.584906] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) [ 84.585108] print_report (mm/kasan/report.c:320 mm/kasan/report.c:430) [ 84.585570] kasan_report (mm/kasan/report.c:538) [ 84.585988] qfq_activate_agg.constprop.0 (net/sched/sch_qfq.c:1027 net/sched/sch_qfq.c:1060 net/sched/sch_qfq.c:1313) [ 84.586599] qfq_enqueue (net/sched/sch_qfq.c:1255) [ 84.587607] dev_qdisc_enqueue (net/core/dev.c:3776) [ 84.587749] __dev_queue_xmit (./include/net/sch_generic.h:186 net/core/dev.c:3865 net/core/dev.c:4212) [ 84.588763] ip_finish_output2 (./include/net/neighbour.h:546 net/ipv4/ip_output.c:228) [ 84.589460] ip_output (net/ipv4/ip_output.c:430) [ 84.590132] ip_push_pending_frames (./include/net/dst.h:444 net/ipv4/ip_output.c:126 
net/ipv4/ip_output.c:1586 net/ipv4/ip_output.c:1606) [ 84.590285] raw_sendmsg (net/ipv4/raw.c:649) [ 84.591960] sock_sendmsg (net/socket.c:724 net/socket.c:747) [ 84.592084] __sys_sendto (net/socket.c:2142) [ 84.593306] __x64_sys_sendto (net/socket.c:2150) [ 84.593779] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 84.593902] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [ 84.594070] RIP: 0033:0x7fe568032066 [ 84.594192] Code: 0e 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c09[ 84.594796] RSP: 002b:00007ffce388b4e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c Code starting with the faulting instruction =========================================== [ 84.595047] RAX: ffffffffffffffda RBX: 00007ffce388cc70 RCX: 00007fe568032066 [ 84.595281] RDX: 0000000000000040 RSI: 00005605fdad6d10 RDI: 0000000000000003 [ 84.595515] RBP: 00005605fdad6d10 R08: 00007ffce388eeec R09: 0000000000000010 [ 84.595749] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 84.595984] R13: 00007ffce388cc30 R14: 00007ffce388b4f0 R15: 0000001d00000001 [ 84.596218] </TASK> [ 84.596295] [ 84.596351] Allocated by task 291: [ 84.596467] kasan_save_stack (mm/kasan/common.c:46) [ 84.596597] kasan_set_track (mm/kasan/common.c:52) [ 84.596725] __kasan_kmalloc (mm/kasan/common.c:384) [ 84.596852] __kmalloc_node (./include/linux/kasan.h:196 mm/slab_common.c:967 mm/slab_common.c:974) [ 84.596979] qdisc_alloc (./include/linux/slab.h:610 ./include/linux/slab.h:731 net/sched/sch_generic.c:938) [ 84.597100] qdisc_create (net/sched/sch_api.c:1244) [ 84.597222] tc_modify_qdisc (net/sched/sch_api.c:1680) [ 84.597357] rtnetlink_rcv_msg (net/core/rtnetlink.c:6174) [ 84.597495] netlink_rcv_skb (net/netlink/af_netlink.c:2574) [ 84.597627] netlink_unicast (net/netlink/af_netlink.c:1340 net/netlink/af_netlink.c:1365) [ 84.597759] netlink_sendmsg (net/netlink/af_netlink.c:1942) [ 84.597891] sock_sendmsg 
(net/socket.c:724 net/socket.c:747) [ 84.598016] ____sys_sendmsg (net/socket.c:2501) [ 84.598147] ___sys_sendmsg (net/socket.c:2557) [ 84.598275] __sys_sendmsg (./include/linux/file.h:31 net/socket.c:2586) [ 84.598399] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 84.598520] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [ 84.598688] [ 84.598744] The buggy address belongs to the object at ffff88810f674000 [ 84.598744] which belongs to the cache kmalloc-8k of size 8192 [ 84.599135] The buggy address is located 2664 bytes to the right of [ 84.599135] allocated 7904-byte region [ffff88810f674000, ffff88810f675ee0) [ 84.599544] [ 84.599598] The buggy address belongs to the physical page: [ 84.599777] page:00000000e638567f refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10f670 [ 84.600074] head:00000000e638567f order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 84.600330] flags: 0x200000000010200(slab|head|node=0|zone=2) [ 84.600517] raw: 0200000000010200 ffff888100043180 dead000000000122 0000000000000000 [ 84.600764] raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000 [ 84.601009] page dumped because: kasan: bad access detected [ 84.601187] [ 84.601241] Memory state around the buggy address: [ 84.601396] ffff88810f676800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.601620] ffff88810f676880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.601845] >ffff88810f676900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602069] ^ [ 84.602243] ffff88810f676980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602468] ffff88810f676a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.602693] ================================================================== [ 84.602924] Disabling lock debugging due to kernel taint Fixes: 3015f3d2 ("pkt_sched: enable QFQ to support TSO/GSO") Reported-by: NGwangun Jung <exsociety@gmail.com> Signed-off-by: NGwangun Jung 
<exsociety@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
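The sch_qfq problem described in the commit above comes down to range-checking lmax no matter where it came from. The following is a minimal sketch, not the kernel patch: the constant values for `QFQ_MIN_LMAX` and `QFQ_MTU_SHIFT` are assumptions made for illustration, and `qfq_choose_lmax()` is a hypothetical helper.

```c
#include <stdint.h>

/* Values assumed for illustration; check net/sched/sch_qfq.c for the real ones. */
#define QFQ_MIN_LMAX   512u
#define QFQ_MTU_SHIFT  16

/*
 * Hypothetical helper: take lmax from the netlink attribute when present,
 * otherwise from the device MTU, and validate it in both cases.
 * Returns the accepted lmax, or -22 (-EINVAL) when it is out of range.
 */
static long qfq_choose_lmax(int has_attr, uint32_t attr_lmax, uint32_t dev_mtu)
{
    uint32_t lmax = has_attr ? attr_lmax : dev_mtu;

    if (lmax < QFQ_MIN_LMAX || lmax > (1u << QFQ_MTU_SHIFT))
        return -22;
    return (long)lmax;
}
```

With this shape, a loopback MTU of 2^31-1 is rejected up front instead of producing an index beyond QFQ_MAX_INDEX later on.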
-
Committed by Wei Chen
mainline inclusion
from mainline-v6.3-rc4
commit 92fbb6d1
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6XHPL
CVE: CVE-2023-2194

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=92fbb6d1296f81f41f65effd7f5f8c0f74943d15

--------------------------------

The data->block[0] variable comes from the user and is a number between 0 and 255. Without a proper check, the value may be large enough to cause an out-of-bounds access when performing the memcpy in slimpro_i2c_blkwr.

Fix this bug by checking the value of writelen.

Fixes: f6505fba ("i2c: add SLIMpro I2C device driver on APM X-Gene platform")
Signed-off-by: Wei Chen <harperchen1110@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Reviewed-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
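The writelen check described in the commit above can be sketched as follows. This is an illustration under assumptions, not the driver code: `blkwr_checked()` and `demo_blkwr()` are hypothetical helpers, and the destination buffer size of 32 bytes is assumed here as a stand-in for the driver's real buffer.

```c
#include <stddef.h>
#include <string.h>

/* Assumed bound for this sketch; the driver's actual buffer size may differ. */
#define TOY_BLOCK_MAX 32

/*
 * Hypothetical helper: block[0] is the user-controlled length byte (0..255),
 * followed by the payload. The length is validated before the copy.
 * Returns the number of bytes copied, or -22 (-EINVAL) if the length is too large.
 */
static int blkwr_checked(const unsigned char *block, unsigned char *dst)
{
    size_t writelen = block[0];

    if (writelen > TOY_BLOCK_MAX)
        return -22;
    memcpy(dst, &block[1], writelen);
    return (int)writelen;
}

/* Small driver for the sketch: builds a block with the given length byte. */
static int demo_blkwr(unsigned char user_len)
{
    unsigned char block[1 + 255] = { 0 };
    unsigned char dst[TOY_BLOCK_MAX];

    block[0] = user_len;
    return blkwr_checked(block, dst);
}
```

Lengths up to the bound are copied; anything larger is rejected before memcpy runs.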
-
Committed by Baokun Li
maillist inclusion
category: bugfix
bugzilla: 188499, https://gitee.com/openeuler/kernel/issues/I6TNVT
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linux-ext4/patch/20230412124126.2286716-2-libaokun1@huawei.com/

----------------------------------------

In our fault injection test, we create an ext4 file, migrate it to non-extent based file, then punch a hole and finally trigger a WARN_ON in the ext4_da_update_reserve_space():

EXT4-fs warning (device sda): ext4_da_update_reserve_space:369:
ino 14, used 11 with only 10 reserved data blocks

When writing back a non-extent based file, if we enable delalloc, the number of reserved blocks will be subtracted from the number of blocks mapped by ext4_ind_map_blocks(), and the extent status tree will be updated. We update the extent status tree by first removing the old extent_status and then inserting the new extent_status. If the block range we remove happens to be in an extent, then we need to allocate another extent_status with ext4_es_alloc_extent().

       use old    to remove   to add new
    |----------|------------|------------|
              old extent_status

The problem is that the allocation of a new extent_status failed due to a fault injection, and __es_shrink() did not get free memory, resulting in a return of -ENOMEM. Then do_writepages() retries after receiving -ENOMEM, we map to the same extent again, and the number of reserved blocks is again subtracted from the number of blocks in that extent. Since the blocks in the same extent are subtracted twice, we end up triggering WARN_ON at ext4_da_update_reserve_space() because used > ei->i_reserved_data_blocks.

For non-extent based file, we update the number of reserved blocks after ext4_ind_map_blocks() is executed, which causes a problem that when we call ext4_ind_map_blocks() to create a block, it doesn't always create a block, but we always reduce the number of reserved blocks. So we move the logic for updating reserved blocks to ext4_ind_map_blocks() to ensure that the number of reserved blocks is updated only after we do succeed in allocating some new blocks.

Fixes: 5f634d06 ("ext4: Fix quota accounting error with fallocate")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by Oliver Hartkopp
stable inclusion
from stable-v5.10.159
commit c42221efb1159d6a3c89e96685ee38acdce86b6f
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6WUDS
CVE: CVE-2023-2166

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c42221efb1159d6a3c89e96685ee38acdce86b6f

--------------------------------

commit 0acc4423 upstream.

Analogous to commit 8aa59e35 ("can: af_can: fix NULL pointer dereference in can_rx_register()"), we need to check for a missing initialization of ml_priv in the receive path of CAN frames.

Since commit 4e096a18 ("net: introduce CAN specific pointer in the struct net_device"), checking that dev->type is ARPHRD_CAN is no longer sufficient, because bonding or tun netdevices claim to be CAN devices but do not initialize ml_priv accordingly.

Fixes: 4e096a18 ("net: introduce CAN specific pointer in the struct net_device")
Reported-by: syzbot+2d7f58292cb5b29eb5ad@syzkaller.appspotmail.com
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20221206201259.3028-1-socketcan@hartkopp.net
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
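The receive-path condition described in the CAN commit above can be sketched like this. It is a simplified model, not the af_can code: `toy_netdev`, `can_rcv_should_accept()` and `demo_accept()` are hypothetical, and the device is reduced to the two fields the commit message talks about.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARPHRD_CAN 280 /* ARP hardware type for CAN, as in linux/if_arp.h */

/* Reduced stand-in for struct net_device (hypothetical). */
struct toy_netdev {
    int   type;
    void *ml_priv;
};

/* A frame is accepted only if the device claims CAN and has ml_priv set up. */
static bool can_rcv_should_accept(const struct toy_netdev *dev)
{
    return dev->type == ARPHRD_CAN && dev->ml_priv != NULL;
}

/* Small driver for the sketch. */
static bool demo_accept(int type, bool has_ml_priv)
{
    int dummy_priv = 0;
    struct toy_netdev dev = { type, has_ml_priv ? &dummy_priv : NULL };

    return can_rcv_should_accept(&dev);
}
```

A bonding or tun device that reports ARPHRD_CAN but never initialized ml_priv corresponds to `demo_accept(ARPHRD_CAN, false)`, which is rejected instead of being dereferenced.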
-
Committed by Patrisious Haddad
mainline inclusion
from mainline-v6.3-rc1
commit 8d037973
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6X49E
CVE: CVE-2023-2176

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d037973d48c026224ab285e6a06985ccac6f7bf

---------------------------

Refactor the rdma_bind_addr function so that it doesn't require the cma destination address to be changed before calling it.

It now updates the destination address internally, only when it is really needed and after passing all the required checks. This in turn results in cleaner and more sensible call and error handling flows for the functions that call it directly or indirectly.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reported-by: Wei Chen <harperchen1110@gmail.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/3d0e9a2fd62bc10ba02fed1c7c48a48638952320.1672819273.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit 8d037973)
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by Jason Gunthorpe
mainline inclusion
from mainline-v5.15-rc4
commit 305d568b
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6X49E
CVE: CVE-2023-2176

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=305d568b72f17f674155a2a8275f865f207b3808

---------------------------

The FSM can run in a circle allowing rdma_resolve_ip() to be called twice on the same id_priv. While this cannot happen without going through the work, it violates the invariant that the same address resolution background request cannot be active twice.

       CPU 1                                  CPU 2

rdma_resolve_addr():
  RDMA_CM_IDLE -> RDMA_CM_ADDR_QUERY
  rdma_resolve_ip(addr_handler)  #1

                         process_one_req(): for #1
                          addr_handler():
                            RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_BOUND
                            mutex_unlock(&id_priv->handler_mutex);
                            [.. handler still running ..]

rdma_resolve_addr():
  RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDR_QUERY
  rdma_resolve_ip(addr_handler)
    !! two requests are now on the req_list

rdma_destroy_id():
 destroy_id_handler_unlock():
  _destroy_id():
   cma_cancel_operation():
    rdma_addr_cancel()

                          // process_one_req() self removes it
                          spin_lock_bh(&lock);
                           cancel_delayed_work(&req->work);
                           if (!list_empty(&req->list)) == true

      ! rdma_addr_cancel() returns after process_on_req #1 is done

   kfree(id_priv)

                         process_one_req(): for #2
                          addr_handler():
                            mutex_lock(&id_priv->handler_mutex);
                            !! Use after free on id_priv

rdma_addr_cancel() expects there to be one req on the list and only cancels the first one. The self-removal behavior of the work only happens after the handler has returned. This yields a situation where the req_list can have two reqs for the same "handle" but rdma_addr_cancel() only cancels the first one.

The second req remains active beyond rdma_destroy_id() and will use-after-free id_priv once it inevitably triggers.

Fix this by remembering if the id_priv has called rdma_resolve_ip() and always cancel before calling it again. This ensures the req_list never gets more than one item in it and doesn't cost anything in the normal flow that never uses this strange error path.

Link: https://lore.kernel.org/r/0-v1-3bc675b8006d+22-syz_cancel_uaf_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: e51060f0 ("IB: IP address based RDMA connection manager")
Reported-by: syzbot+dc3dfba010d7671e05f5@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
(cherry picked from commit 305d568b)
Signed-off-by: Liu Jian <liujian56@huawei.com>

Conflicts:
	drivers/infiniband/core/cma_priv.h

Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by Arnd Bergmann
mainline inclusion
from mainline-v6.0-rc1~14
commit b04e75a4
category: bugfix
bugzilla: 188707, https://gitee.com/src-openeuler/kernel/issues/I6VK2F
CVE: CVE-2023-2007

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b04e75a4a8a81887386a0d2dbf605a48e779d2a0

----------------------------------------

The dpt_i2o driver was fixed to stop using virt_to_bus() in 2008, but it still has a stale reference in an error handling code path that could never work.

I submitted a patch to fix this reference earlier, but Hannes Reinecke suggested that removing the driver may be just as good here.

The i2o driver layer was removed in 2015 with commit 4a72a7af ("staging: remove i2o subsystem"), but the even older dpt_i2o scsi driver stayed around.

The last non-cleanup patches I could find were from Miquel van Smoorenburg and Mark Salyzyn back in 2008; they might know if there is any chance of the hardware still being used anywhere.

Link: https://lore.kernel.org/linux-scsi/CAK8P3a1XfwkTOV7qOs1fTxf4vthNBRXKNu8A5V7TWnHT081NGA@mail.gmail.com/T/
Link: https://lore.kernel.org/r/20220624155226.2889613-3-arnd@kernel.org
Cc: Miquel van Smoorenburg <mikevs@xs4all.net>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by Baokun Li
mainline inclusion
from mainline-v6.3-rc8
commit 1ba1199e
category: bugfix
bugzilla: 188601, https://gitee.com/openeuler/kernel/issues/I6TNTC
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1ba1199ec5747f475538c0d25a32804e5ba1dfde

--------------------------------

KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
 <TASK>
 dump_stack_lvl+0x7f/0xc0
 print_report+0x2ba/0x340
 kasan_report+0xc4/0x120
 kasan_check_range+0x1b7/0x2e0
 __kasan_check_write+0x24/0x40
 bdi_split_work_to_wbs+0x5c5/0x7b0
 sync_inodes_sb+0x195/0x630
 sync_inodes_one_sb+0x3a/0x50
 iterate_supers+0x106/0x1b0
 ksys_sync+0x98/0x160
[...]
==================================================================

The race that causes the above issue is as follows:

           cpu1                     cpu2
-------------------------|-------------------------
inode_switch_wbs
 INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
 queue_rcu_work(isw_wq, &isw->work)
 // queue_work async
  inode_switch_wbs_work_fn
   wb_put_many(old_wb, nr_switched)
    percpu_ref_put_many
     ref->data->release(ref)
     cgwb_release
      queue_work(cgwb_release_wq, &wb->release_work)
      // queue_work async
       &wb->release_work
       cgwb_release_workfn
                            ksys_sync
                             iterate_supers
                              sync_inodes_one_sb
                               sync_inodes_sb
                                bdi_split_work_to_wbs
                                 kmalloc(sizeof(*work), GFP_ATOMIC)
                                 // alloc memory failed
        percpu_ref_exit
         ref->data = NULL
         kfree(data)
                                 wb_get(wb)
                                  percpu_ref_get(&wb->refcnt)
                                   percpu_ref_get_many(ref, 1)
                                    atomic_long_add(nr, &ref->data->count)
                                     atomic64_add(i, v)
                                     // trigger null-ptr-deref

bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all wbs. If the allocation of new work fails, the on-stack fallback will be used and the reference count of the current wb is increased afterwards. If cgroup writeback membership switches occur before getting the reference count and the current wb is released as old_wb, then calling wb_get() or wb_put() will trigger the null pointer dereference above.

This issue was introduced in v4.3-rc7 (see fix tag1). Both sync_inodes_sb() and __writeback_inodes_sb_nr() calls to bdi_split_work_to_wbs() can trigger this issue. For scenarios called via sync_inodes_sb(), originally commit 7fc5854f ("writeback: synchronize sync(2) against cgroup writeback membership switches") reduced the possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io, thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(), and the issue becomes easily reproducible again.

To solve this problem, percpu_ref_exit() is called under RCU protection to avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs(). Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(), and skip the current wb if wb_tryget() fails because the wb has already been shutdown.

Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Hou Tao <houtao1@huawei.com>
Cc: yangerkun <yangerkun@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	mm/backing-dev.c

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
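The "try to get a reference, skip on failure" pattern that the writeback commit above adopts can be sketched with a plain counter. This is a toy model under assumptions: `toy_ref`, `toy_tryget()`, `toy_split_work()` and `demo_split()` are hypothetical, a count of zero stands for a wb that is already being released, and no concurrency or RCU is modelled.

```c
#include <stdbool.h>

/* Toy reference count; 0 means the object is already shutting down. */
struct toy_ref { long count; };

/* Take a reference only if the object is still live. */
static bool toy_tryget(struct toy_ref *r)
{
    if (r->count == 0)
        return false;
    r->count++;
    return true;
}

static void toy_put(struct toy_ref *r) { r->count--; }

/* Walk all entries and "queue work" only on those a reference could be taken on. */
static int toy_split_work(struct toy_ref *wbs, int n)
{
    int queued = 0;

    for (int i = 0; i < n; i++) {
        if (!toy_tryget(&wbs[i]))
            continue;           /* dying entry: skip it instead of touching it */
        queued++;
        toy_put(&wbs[i]);       /* reference dropped once the work is done */
    }
    return queued;
}

/* Three entries, the middle one already released. */
static int demo_split(void)
{
    struct toy_ref wbs[3] = { { 1 }, { 0 }, { 2 } };

    return toy_split_work(wbs, 3);
}
```

An unconditional get on the middle entry is the analogue of the null pointer dereference in the report; the tryget variant simply passes over it.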
-
Committed by Liu Jian
mainline inclusion
from mainline-v6.3-rc2
commit d900f3d2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I65HYE

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d900f3d20cc3169ce42ec72acc850e662a4d4db2

---------------------------

When the buffer length of the recvmsg system call is 0, we got the following soft lockup problem:

watchdog: BUG: soft lockup - CPU#3 stuck for 27s! [a.out:6149]
CPU: 3 PID: 6149 Comm: a.out Kdump: loaded Not tainted 6.2.0+ #30
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:remove_wait_queue+0xb/0xc0
Code: 5e 41 5f c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 <41> 56 41 55 41 54 55 48 89 fd 53 48 89 f3 4c 8d 6b 18 4c 8d 73 20
RSP: 0018:ffff88811b5978b8 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff88811a7d3780 RCX: ffffffffb7a4d768
RDX: dffffc0000000000 RSI: ffff88811b597908 RDI: ffff888115408040
RBP: 1ffff110236b2f1b R08: 0000000000000000 R09: ffff88811a7d37e7
R10: ffffed10234fa6fc R11: 0000000000000001 R12: ffff88811179b800
R13: 0000000000000001 R14: ffff88811a7d38a8 R15: ffff88811a7d37e0
FS:  00007f6fb5398740(0000) GS:ffff888237180000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 000000010b6ba002 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 tcp_msg_wait_data+0x279/0x2f0
 tcp_bpf_recvmsg_parser+0x3c6/0x490
 inet_recvmsg+0x280/0x290
 sock_recvmsg+0xfc/0x120
 ____sys_recvmsg+0x160/0x3d0
 ___sys_recvmsg+0xf0/0x180
 __sys_recvmsg+0xea/0x1a0
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

The logic in tcp_bpf_recvmsg_parser is as follows:

msg_bytes_ready:
	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
	if (!copied) {
		wait data;
		goto msg_bytes_ready;
	}

In this case, "copied" is always 0, so an infinite loop occurs.

According to the Linux system call man page, 0 should be returned in this case. Therefore, in tcp_bpf_recvmsg_parser(), if the length is 0, return directly. Also modify several other functions with the same problem.

Fixes: 1f5be6b3 ("udp: Implement udp_bpf_recvmsg() for sockmap")
Fixes: 9825d866 ("af_unix: Implement unix_dgram_bpf_recvmsg()")
Fixes: c5d2177a ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230303080946.1146638-1-liujian56@huawei.com
Signed-off-by: Liu Jian <liujian56@huawei.com>

Conflicts:
	net/ipv4/udp_bpf.c
	net/unix/unix_bpf.c

Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
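The zero-length retry loop described in the sockmap commit above can be modelled in a few lines. This is a sketch, not the kernel function: `toy_copy()` and `toy_recvmsg()` are hypothetical, the wait for data is omitted, and the lockup is modelled by a bounded spin count that returns -1 when exhausted.

```c
#include <stddef.h>

/* Stand-in for sk_msg_recvmsg(): copies at most len of the avail queued bytes. */
static size_t toy_copy(size_t avail, size_t len)
{
    return avail < len ? avail : len;
}

/*
 * Hypothetical receive loop. With check_len set, a zero-length request
 * returns 0 immediately; without it the loop can never make progress.
 * Returns bytes copied, or -1 if max_spins retries passed with nothing copied.
 */
static long toy_recvmsg(size_t avail, size_t len, int check_len, int max_spins)
{
    if (check_len && len == 0)
        return 0;

    for (int spin = 0; spin < max_spins; spin++) {
        size_t copied = toy_copy(avail, len);

        if (copied)
            return (long)copied;
        /* the real code waits for data here and then retries */
    }
    return -1;
}
```

With `len == 0` the copy step always yields 0 regardless of how much data is queued, so only the early return lets the call finish.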
-
Committed by John Fastabend
mainline inclusion
from mainline-v5.17-rc1
commit 218d747a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I65HYE

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=218d747a4142f281a256687bb513a135c905867b

---------------------------

sock_map_link() is called to update a sockmap entry with a sk. But, if the sock_map_init_proto() call fails then we return an error to the map_update op against the sockmap. In the error path though we need to cleanup psock and dec the refcnt on any programs associated with the map, because we refcnt them early in the update process to ensure they are pinned for the psock. (This avoids a race where user deletes programs while also updating the map with new socks.)

In current code we do the prog refcnt dec explicitly by calling bpf_prog_put() when the program was found in the map. But, after commit '38207a5e' in this error path we've already done the prog to psock assignment so the programs have a reference from the psock as well. This then causes the psock tear down logic, invoked by sk_psock_put() in the error path, to similarly call bpf_prog_put on the programs there.

To be explicit this logic does the prog->psock assignment:

  if (msg_*)
    psock_set_prog(...)

Then the error path under the out_progs label does a similar check and dec with:

  if (msg_*)
     bpf_prog_put(...)

And the teardown logic sk_psock_put() does ...

  psock_set_prog(msg_*, NULL)

... triggering another bpf_prog_put(...). Then KASAN gives us this splat, found by syzbot because we've created an imbalance between bpf_prog_inc and bpf_prog_put calling put twice on the program.

  BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline]
  BUG: KASAN: vmalloc-out-of-bounds in __bpf_prog_put kernel/bpf/syscall.c:1812 [inline] kernel/bpf/syscall.c:1829
  BUG: KASAN: vmalloc-out-of-bounds in bpf_prog_put+0x8c/0x4f0 kernel/bpf/syscall.c:1829 kernel/bpf/syscall.c:1829
  Read of size 8 at addr ffffc90000e76038 by task syz-executor020/3641

To fix, clean up the error path so it doesn't try to do the bpf_prog_put in the error path once progs are assigned; it then relies on the normal psock tear down logic to do complete cleanup.

For completeness we also cover the case where sk_psock_init_strp() fails, but this is not expected because it indicates an incorrect socket type and should be caught earlier.

Fixes: 38207a5e ("bpf, sockmap: Attach map progs to psock early for feature probes")
Reported-by: syzbot+bb73e71cf4b8fd376a4f@syzkaller.appspotmail.com
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220104214645.290900-1-john.fastabend@gmail.com
(cherry picked from commit 218d747a)
Signed-off-by: Liu Jian <liujian56@huawei.com>

Conflicts:
	net/core/sock_map.c

Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
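The reference imbalance described in the sock_map_link() commit above can be traced with a toy counter. This is an illustrative model only: `toy_prog`, `toy_put()` and `link_error_path()` are hypothetical, and the starting count of 1 is an assumption standing for the reference held by the map itself.

```c
/* Toy program object with a bare reference count (hypothetical). */
struct toy_prog { int refcnt; };

static void toy_put(struct toy_prog *p) { p->refcnt--; }

/*
 * Models the error path after the program has been assigned to the psock.
 * explicit_put_after_assign != 0 reproduces the old out_progs behaviour.
 * Returns the final reference count; 1 means the map's reference survived.
 */
static int link_error_path(int explicit_put_after_assign)
{
    struct toy_prog prog = { .refcnt = 1 }; /* reference held by the map (assumed) */

    prog.refcnt++;              /* taken early in the update to pin the program */
    /* prog -> psock assignment: the psock now owns that extra reference */

    if (explicit_put_after_assign)
        toy_put(&prog);         /* old error path: explicit put */

    toy_put(&prog);             /* psock teardown drops the psock's reference */
    return prog.refcnt;
}
```

With the explicit put still in place the count ends at 0, meaning the map's own reference was consumed; leaving cleanup to the psock teardown keeps it at 1.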
-
Committed by John Fastabend
mainline inclusion
from mainline-v5.16-rc5
commit c0d95d33
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I65HYE

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c0d95d3380ee099d735e08618c0d599e72f6c8b0

---------------------------

When a sock is added to a sock map we evaluate what proto op hooks need to be used. However, when the program is removed from the sock map we have not been evaluating if that changes the required program layout.

Before the patch listed in the 'fixes' tag this was not causing failures because the base program set handles all cases. Specifically, the case with a stream parser and the case without a stream parser are both handled. With the fix below we identified a race when running with a proto op that attempts to read skbs off both the stream parser and the skb->receive_queue. Namely, that a race existed where when the stream parser is empty checking the skb->receive_queue from recvmsg at the precise moment when the parser is paused and the receive_queue is not empty could result in skipping the stream parser. This may break a RX policy depending on the parser to run.

The fix tag then loads a specific proto ops that resolved this race. But, we missed removing that proto ops recv hook when the sock is removed from the sockmap. The result is the stream parser is stopped so no more skbs will be aggregated there, but the hook and BPF program continues to be attached on the psock. User space will then get an EBUSY when trying to read the socket because the recvmsg() handler is now waiting on a stopped stream parser.

To fix we rerun the proto ops init() function which will look at the new set of progs attached to the psock and reset the proto ops hook to the correct handlers. And in the above case where we remove the sock from the sock map the RX prog will no longer be listed so the proto ops is removed.

Fixes: c5d2177a ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211119181418.353932-3-john.fastabend@gmail.com
(cherry picked from commit c0d95d33)
Signed-off-by: Liu Jian <liujian56@huawei.com>

Conflicts:
	net/core/skmsg.c

Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by John Fastabend
mainline inclusion from mainline-v5.16-rc5 commit 38207a5e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I65HYE Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=38207a5e81230d6ffbdd51e5fa5681be5116dcae --------------------------- When a TCP socket is added to a sock map we look at the programs attached to the map to determine what proto op hooks need to be changed. Before the patch in the 'fixes' tag there were only two categories -- the empty set of programs or a TX policy. In any case the base set handled the receive case. After the fix we have an optimized program for receive that closes a small, but possible, race on receive. This program is loaded only when the map the psock is being added to includes a RX policy. Otherwise, the race is not possible so we don't need to handle the race condition. In order for the call to sk_psock_init() to correctly evaluate the above conditions all progs need to be set in the psock before the call. However, in the current code this is not the case. We end up evaluating the requirements on the old prog state. If your psock is attached to multiple maps -- for example a tx map and rx map -- then the second update would pull in the correct maps. But, the other pattern with a single rx enabled map the correct receive hooks are not used. The result is the race fixed by the patch in the fixes tag below may still be seen in this case. To fix we simply set all psock->progs before doing the call into sock_map_init(). With this the init() call gets the full list of programs and chooses the correct proto ops on the first iteration instead of requiring the second update to pull them in. This fixes the race case when only a single map is used. 
Fixes: c5d2177a ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211119181418.353932-2-john.fastabend@gmail.com
(cherry picked from commit 38207a5e)
Signed-off-by: Liu Jian <liujian56@huawei.com>

Conflicts:
net/core/sock_map.c

Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
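
Note on the ordering issue in the commit above: the following is a minimal userspace sketch, not the kernel code. All names (toy_psock, toy_progs, toy_psock_init and the two attach helpers) are hypothetical stand-ins chosen for illustration. It only shows why an init routine that inspects the program set picks the wrong hooks when the programs are copied in after it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's real types. */
enum toy_proto_cfg { TOY_BASE, TOY_TX, TOY_RX, TOY_TXRX };

struct toy_progs {
    bool has_tx;
    bool has_rx;
};

struct toy_psock {
    struct toy_progs progs;
    enum toy_proto_cfg cfg;
};

/* Picks the hook set from whatever programs are set at call time. */
static void toy_psock_init(struct toy_psock *psock)
{
    assert(psock != NULL);
    if (psock->progs.has_tx && psock->progs.has_rx)
        psock->cfg = TOY_TXRX;
    else if (psock->progs.has_rx)
        psock->cfg = TOY_RX;
    else if (psock->progs.has_tx)
        psock->cfg = TOY_TX;
    else
        psock->cfg = TOY_BASE;
}

/* Problematic order: init evaluates the old (empty) program state. */
static enum toy_proto_cfg toy_attach_init_first(struct toy_psock *psock,
                                                struct toy_progs map_progs)
{
    toy_psock_init(psock);
    psock->progs = map_progs;
    return psock->cfg;
}

/* Fixed order: programs are set first, so init sees the full list. */
static enum toy_proto_cfg toy_attach_progs_first(struct toy_psock *psock,
                                                 struct toy_progs map_progs)
{
    psock->progs = map_progs;
    toy_psock_init(psock);
    return psock->cfg;
}
```

With a single RX-only map, the first helper leaves the socket on the base hooks until a second update, while the second helper selects the RX hooks on the first attach.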
-
Committed by John Fastabend
mainline inclusion
from mainline-v5.17-rc1
commit 5b2c5540
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I65HYE
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b2c5540b8110eea0d67a78fb0ddb9654c58daeb

---------------------------

Applications can be slightly confused because we do not always return the error code they expect, i.e. the one the TCP stack normally returns. For example, on a socket error (sk->sk_err), instead of returning the sock_error we return EAGAIN. This usually means the application will 'try again' instead of aborting immediately. As another example, when a shutdown event is received we should abort immediately instead of waiting for data when the user provides a timeout.

These tend not to be fatal, and applications usually recover, but they introduce bogus errors to the user or unexpected latency. Before 'c5d2177a' we fell back to the TCP stack when no data was available, so we managed to catch many of these cases, although with the extra latency cost of calling tcp_msg_wait_data() first.

To fix this, let's duplicate the error handling in the TCP stack into tcp_bpf so that we get the same error codes.

These were found in our CI tests that run applications against sockmap and do longer-lived testing, at least compared to test_sockmap, which does short-lived ping/pong tests, and in some of the test clusters we deploy.

It's non-trivial to do these in a shorter-form CI test that would be appropriate for BPF selftests, but we are looking into it so we can ensure this keeps working going forward. As a preview, one idea is to pull in the packetdrill testing, which catches some of this.

Fixes: c5d2177a ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220104205918.286416-1-john.fastabend@gmail.com
(cherry picked from commit 5b2c5540)
Signed-off-by: Liu Jian <liujian56@huawei.com>

Conflicts:
net/ipv4/tcp_bpf.c

Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
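
Note on the error handling described in the commit above: the sketch below is a userspace illustration only, not the tcp_bpf code. The struct toy_sock, its fields, and toy_recv_no_data() are hypothetical, and the exact order of checks is an assumption modelled loosely on how a blocking receive usually behaves. It shows the idea that a pending socket error and a receive shutdown are reported before any EAGAIN or wait.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the socket state relevant to an empty queue. */
struct toy_sock {
    int  pending_err;   /* positive errno value, 0 if none */
    bool rcv_shutdown;  /* peer or local side shut down receive */
};

/*
 * Called when no data is queued. Returns a negative errno on failure,
 * 0 otherwise. *should_wait is set to true only when the caller may
 * block for data; a 0 return with *should_wait == false means EOF.
 */
static int toy_recv_no_data(struct toy_sock *sk, long timeo, bool *should_wait)
{
    assert(sk != NULL && should_wait != NULL);
    *should_wait = false;

    if (sk->pending_err) {
        int err = sk->pending_err;

        sk->pending_err = 0;    /* the error is reported once */
        return -err;            /* not -EAGAIN */
    }
    if (sk->rcv_shutdown)
        return 0;               /* abort at once, even with a timeout */
    if (timeo == 0)
        return -EAGAIN;         /* non-blocking caller */

    *should_wait = true;        /* only now is waiting appropriate */
    return 0;
}
```

The point of the ordering is that a caller with a timeout never sleeps on a socket that already has an error or has been shut down.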
-
Committed by John Fastabend
mainline inclusion
from mainline-v5.16-rc1
commit c5d2177a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I65HYE
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c5d2177a72a1659554922728fc407f59950aa929

---------------------------

A socket in a sockmap may have different combinations of programs attached depending on configuration. There can be no programs, in which case the socket acts as a sink only. There can be a TX program, in which case a BPF program is attached to the sending side but no RX program is attached. There can be an RX program only, where sends have no BPF program attached but receives are hooked with BPF. And finally, both TX and RX programs may be attached. This gives us the permutations: None, Tx, Rx, and TxRx.

To date most of our use cases have been the TX case, used as a fast datapath to directly copy between a local application and a userspace proxy, or Rx and TxRx cases for applications that operate an in-kernel proxy. The traffic in the first case, where we hook applications into a userspace application, looks like this:

  AppA  redirect   AppB
   Tx <-----------> Rx
   |                |
   +                +
   TCP <--> lo <--> TCP

In this case all traffic from AppA (after the three-way handshake) is copied into the AppB ingress queue and no traffic is ever on the TCP receive_queue.

In the second case the application never receives, except in some rare error cases, traffic on the actual user space socket. Instead the send happens in the kernel.

           AppProxy       socket pool
       sk0 ------------->{sk1,sk2, skn}
        ^                      |
        |                      |
        |                      v
       ingress              lb egress
       TCP                  TCP

Here, because traffic is never read off the socket with userspace recv() APIs, there is only ever one reader on the sk receive_queue, namely the BPF programs.

However, we've started to introduce a third configuration where the BPF program on receive should process the data, but then the normal case is to push the data into the receive queue of AppB.

       AppB
       recv()                (userspace)
     -----------------------
       tcp_bpf_recvmsg()     (kernel)
         |             |
         |             |
         |             |
       ingress_msgQ    |
         |             |
       RX_BPF          |
         |             |
         v             v
       sk->receive_queue

This is different from the App{A,B} redirect because traffic is first received on the sk->receive_queue.

Now for the issue. The tcp_bpf_recvmsg() handler first checks the ingress_msg queue for any data handled by the BPF RX program and returned with the PASS code, so that it was enqueued on the ingress msg queue. Then, if no data exists on that queue, it checks the socket receive queue. Unfortunately, this is the same receive_queue the BPF program is reading data off of. So we get a race: it's possible for the recvmsg() hook to pull data off the receive_queue before the BPF hook has a chance to read it. It typically happens when an application is banging on recv() and getting EAGAINs until it manages to race with the RX BPF program.

To fix this we note that before this patch, at attach time when the socket is loaded into the map, we check if it needs a TX program or just the base set of proto bpf hooks. Then it uses the above general RX hook regardless of whether we have a BPF program attached at RX or not. This patch now extends this check to handle all cases enumerated above: TX, RX, TXRX, and none. And to fix the above race when an RX program is attached, we use a new hook that is nearly identical to the old one, except that we no longer let the recv() call skip the RX BPF program. Now only the BPF program pulls data from sk->receive_queue, and recv() only pulls data from the ingress msgQ after BPF program handling.

With this resolved, our AppB from above has been up and running for many hours without detecting any errors. We do this by correlating counters in RX BPF events and the AppB to ensure data is never skipping the BPF program. Selftests were not able to detect this because we only run them for a short period of time on well-ordered send/recvs, so we don't get any of the noise we see in real application environments.

Fixes: 51199405 ("bpf: skb_verdict, support SK_PASS on RX BPF path")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jussi Maki <joamaki@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20211103204736.248403-4-john.fastabend@gmail.com
(cherry picked from commit c5d2177a)
Signed-off-by: Liu Jian <liujian56@huawei.com>

Conflicts:
net/ipv4/tcp_bpf.c

Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
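
Note on the receive race described in the commit above: the following is a simplified userspace model, not the kernel implementation. The struct toy_sk, its integer "queues", and both toy_recv_* functions are hypothetical. It contrasts a receive path that falls back to the raw receive queue with one that refuses to do so when an RX program is attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: queues are reduced to byte counters. */
struct toy_sk {
    int  ingress_msg;    /* bytes already passed by the RX BPF program */
    int  receive_queue;  /* raw bytes the RX BPF program has not seen */
    bool has_rx_prog;    /* an RX BPF program is attached */
};

/* Drains a queue and returns how many bytes it held. */
static int toy_take(int *queue)
{
    int n = *queue;

    *queue = 0;
    return n;
}

/* Old behaviour: an empty ingress queue falls through to the raw queue,
 * so recv() can consume data before the BPF program has seen it. */
static int toy_recv_old(struct toy_sk *sk)
{
    assert(sk != NULL);
    if (sk->ingress_msg > 0)
        return toy_take(&sk->ingress_msg);
    return toy_take(&sk->receive_queue);
}

/* New behaviour: with an RX program attached, recv() only reads data
 * the program has already handled; the raw queue is left alone. */
static int toy_recv_new(struct toy_sk *sk)
{
    assert(sk != NULL);
    if (sk->ingress_msg > 0)
        return toy_take(&sk->ingress_msg);
    if (sk->has_rx_prog)
        return 0;   /* a real caller would wait or see EAGAIN here */
    return toy_take(&sk->receive_queue);
}
```

In this model the old path hands 100 unprocessed bytes to the application, while the new path leaves them queued for the RX program, so there is a single reader on the raw queue.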
-
- 09 May, 2023 4 commits
-
-
Committed by openeuler-ci-bot
Merge Pull Request from: @aspiresky01

When the kernel is configured with allyesconfig, errors may occur at link time during the build.

Link: https://gitee.com/openeuler/kernel/pulls/675
Reviewed-by: Chiqijun <chiqijun@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Committed by zhoujiadong
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6X8PA
CVE: NA
Reference: NA

---------------------------------

When the kernel is configured with allyesconfig, errors may occur at link time during the build.

Signed-off-by: zhoujiadong <zhoujiadong5@huawei.com>
Reviewed-by: Wulike (Collin) <wulike1@huawei.com>
-
Committed by zhoujiadong
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6X8PA
CVE: NA
Reference: NA

---------------------------------

When the kernel is configured with allyesconfig, errors may occur at link time during the build.

Signed-off-by: zhoujiadong <zhoujiadong5@huawei.com>
Reviewed-by: Wulike (Collin) <wulike1@huawei.com>
-
Committed by openeuler-ci-bot
Merge Pull Request from: @liu-ziqin

https://gitee.com/openeuler/kernel/blob/OLK-5.10/arch/x86/events/zhaoxin/uncore.c#L1692 releases the variable 'box'; however, https://gitee.com/openeuler/kernel/blob/OLK-5.10/arch/x86/events/zhaoxin/uncore.c#L1694 dereferences the freed memory 'box' (box->pmu->type->name), resulting in a use-after-free bug.

This bug can be fixed by defining a variable 'name' to temporarily store the value of box->pmu->type->name, and replacing 'box->pmu->type->name' in the condition check at L1694 with 'name'.

Link: https://gitee.com/openeuler/kernel/pulls/665
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
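
Note on the use-after-free fix described above: the sketch below is a generic userspace illustration of the pattern, not the code in uncore.c. The toy_* structs, toy_release_box(), and the use of free() and strcmp() are assumptions made for the example. It shows saving the name pointer in a local variable before the box is freed, so that the later check never touches freed memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structures that mirror only the pointer chain
 * box->pmu->type->name mentioned in the description. */
struct toy_type { const char *name; };
struct toy_pmu  { struct toy_type *type; };
struct toy_box  { struct toy_pmu *pmu; };

/*
 * Frees a heap-allocated box, then compares the type name with
 * 'wanted'. Returns 1 on a match, 0 otherwise. The name string is
 * owned by the type, which outlives the box, so the saved pointer
 * stays valid after the free.
 */
static int toy_release_box(struct toy_box *box, const char *wanted)
{
    const char *name;

    assert(box != NULL && wanted != NULL);
    name = box->pmu->type->name;   /* read before the box is freed */
    free(box);
    /* Using box->pmu->type->name here would be a use-after-free. */
    return strcmp(name, wanted) == 0;
}
```

The same effect could be had by doing the comparison before the free; keeping a local copy of the pointer leaves the original control flow unchanged, which matches the approach the description outlines.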
-