- 08 6月, 2023 1 次提交
-
-
由 Weili Qian 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BJW9 CVE: NA ---------------------------------------------------------------------- The mailbox configuration(128 bits) needs to be obtained from the hardware at one time. If the mailbox configuration is obtained for multiple times, the read value may be incorrect. Use the instruction to read mailbox data instead of readl(). Fixes: 263c9959 ("crypto: hisilicon - add queue management driver for HiSilicon QM module") Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NJiangShui Yang <yangjiangshui@h-partners.com> (cherry picked from commit be56f84d)
-
- 07 6月, 2023 6 次提交
-
-
由 Liu Shixin 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6MH03 CVE: NA -------------------------------- When memory is fragmented, update_reserve_pages() may call migrate_pages() to collect continuous memory. This function can sleep, so we should use mutex lock instead of spin lock. Use KABI_EXTEND to fix kabi broken. Fixes: 0c06a1c0 ("mm/dynamic_hugetlb: add interface to configure the count of hugepages") Signed-off-by: NLiu Shixin <liushixin2@huawei.com> (cherry picked from commit e5d96a31)
-
由 Liu Shixin 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I6XOIE CVE: NA -------------------------------- The memory hotplug and memory failure will dissolve freed hugepages to buddy system, this is not the expected behavior for dynamic hugetlb. Skip the dissolve operation for hugepages belonging to dynamic hugetlb. For memory hotplug, the hotplug operation is not allowed, if dhugetlb pool existed. For memory failure, the hugepage will be discard directly. Signed-off-by: NLiu Shixin <liushixin2@huawei.com> (cherry picked from commit 2430060b)
-
由 Liu Shixin 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I6XOIE CVE: NA -------------------------------- To support dynamic hugetlb on arm64, we need to do two more things. The first one is to fix kabi broken in mem_cgroup, we use kabi_reserve_5 to fix it in previous patch. The second one is to check cont-bit hugetlb since this feature only support for PMD-size and PUD-size hugepage. This feature only support for 4KB pagesize, not support for 16KB and 64KB. Signed-off-by: NLiu Shixin <liushixin2@huawei.com> (cherry picked from commit 036ab0bb)
-
由 Liu Shixin 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I6XOIE CVE: NA -------------------------------- When enable dynamic hugetlb on arm64, the new member struct dhugetlb_pool* will be added to mem_cgroup. We need to use a KABI_RESERVE to fix broken of kabi. The previous struct dhugetlb_pool* is only used on x86_64. Signed-off-by: NLiu Shixin <liushixin2@huawei.com> (cherry picked from commit fc47477c)
-
由 Zhiqi Song 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BANJ CVE: NA ---------------------------------------------------------------------- We find that in the reset scenario, if the reset failed and the MSE is disabled, the value of capability registers will became invalid. When we remove the device under this situation, the unregister process will read the related irq vector from the capability register directly with the mask. Then we will get an invalid value which is out of range and can not be used to get the right irq number by pci_irq_vector(). This will cause the following call trace: | Call trace: | pci_irq_vector+0xfc/0x140 | hisi_qm_uninit+0x278/0x3b0 [hisi_qm] | hpre_remove+0x16c/0x1c0 [hisi_hpre] | pci_device_remove+0x6c/0x264 | device_release_driver_internal+0x1ec/0x3e0 | device_release_driver+0x3c/0x60 | pci_stop_bus_device+0xfc/0x22c | pci_stop_and_remove_bus_device+0x38/0x70 | pci_iov_remove_virtfn+0x108/0x1c0 | sriov_disable+0x7c/0x1e4 | pci_disable_sriov+0x4c/0x6c | hisi_qm_sriov_disable+0x90/0x160 [hisi_qm] | hpre_remove+0x1a8/0x1c0 [hisi_hpre] | pci_device_remove+0x6c/0x264 | device_release_driver_internal+0x1ec/0x3e0 | driver_detach+0x168/0x2d0 | bus_remove_driver+0xc0/0x230 | driver_unregister+0x58/0xdc | pci_unregister_driver+0x40/0x220 | hpre_exit+0x34/0x64 [hisi_hpre] | __arm64_sys_delete_module+0x374/0x620 [...] | Call trace: | free_msi_irqs+0x25c/0x300 | pci_disable_msi+0x19c/0x264 | pci_free_irq_vectors+0x4c/0x70 | hisi_qm_pci_uninit+0x44/0x90 [hisi_qm] | hisi_qm_uninit+0x28c/0x3b0 [hisi_qm] | hpre_remove+0x16c/0x1c0 [hisi_hpre] | pci_device_remove+0x6c/0x264 [...] So we pre-store the valid value of the capability register to a global array in qm init process, and read the register value from this array when we need it. This ensures we can always get valid values. Signed-off-by: NZhiqi Song <songzhiqi1@huawei.com> Signed-off-by: NJiangShui Yang <yangjiangshui@h-partners.com>
-
由 Wenkai Lin 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7BANJ CVE: NA ---------------------------------------------------------------------- Extract a public function to set qm algs and remove the similar code for setting qm algs in each module. Signed-off-by: NHao Fang <fanghao11@huawei.com> Signed-off-by: NWenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: NJiangShui Yang <yangjiangshui@h-partners.com>
-
- 06 6月, 2023 1 次提交
-
-
由 Weili Qian 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7AUVE CVE: NA ---------------------------------------------------------------------- Before the system is shutdown, the accelerator driver needs to stop the device and write data to the memory. This prevents the accelerator from accessing addresses and writing data to the memory after the memory is reclaimed by the system, causing device exceptions and generating NFE errors. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NJiangShui Yang <yangjiangshui@h-partners.com> (cherry picked from commit 23bdb7d8)
-
- 05 6月, 2023 1 次提交
-
-
由 Lu Wei 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I7AO8G CVE: NA -------------------------------- Commit 07f4c900 ("tcp/dccp: try to not exhaust ip_local_port_range in connect()") allocates even ports for connect() first while leaving odd ports for bind() and this works well in busy servers. But this strategy causes severe performance degradation in busy clients. when a client has used more than half of the local ports setted in proc/sys/net/ipv4/ip_local_port_range, if this client trys to connect to a server again, the connect time increases rapidly since it will traverse all the even ports though they are exhausted. So this path provides another strategy by introducing a system option: local_port_allocation. If it is a busy client, users should set it to 1 to use sequential allocation while it should be set to 0 in other situations. Its default value is 0. Signed-off-by: NLu Wei <luwei32@huawei.com> Signed-off-by: NLiu Jian <liujian56@huawei.com> (cherry picked from commit 726c5265)
-
- 02 6月, 2023 2 次提交
-
-
由 Zhangfei Gao 提交于
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I79JRM CVE: NA ---------------------------------------------------------------------- The inode can be different in a container, for example, a docker and host both open the same uacce parent device, which uses the same uacce struct but different inode, so uacce->inode is not enough. What's worse, when docker stopped, the inode will be destroyed as well, causing use-after-free in uacce_remove. So use q->filep->f_mapping to replace uacce->inode->i_mapping. Signed-off-by: NZhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: NJiangShui Yang <yangjiangshui@h-partners.com> (cherry picked from commit f29efbf3)
-
由 Longfang Liu 提交于
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I79JRM CVE: NA ---------------------------------------------------------------------- After the queue isolation function is enabled in the BIOS. If the current default number of queues is used to enable PF, the default number of queues will be greater than the number of queues supported by the function set in the BIOS, which will cause the driver to fail to load. After modification. If queue isolation is enabled. When the default queue parameter is larger than the number supported by the function. The number of enabled queues will be changed to the number supported by the function. So that the driver can be loaded normally. Signed-off-by: NLongfang Liu <liulongfang@huawei.com> Signed-off-by: NJiangShui Yang <yangjiangshui@h-partners.com> (cherry picked from commit 9bcf38ed)
-
- 01 6月, 2023 2 次提交
-
-
由 Hui Tang 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I7A718 -------------------------------- There are worse performance with the 'Fixes' when running "./lat_ctx -P $SYNC_MAX -s 64 16". The 'Fixes' which allocates memory for p->prefer_cpus even if "prefer_cpus" not be set. Before the 'Fixes', only test "p->prefer_cpus", after, add test "!cpumask_empty(p->prefer_cpus)" which causing performance degradation. select_task_rq_fair ->set_task_select_cpus ->prefer_cpus_valid ---- test cpumask_empty(p->prefer_cpus) Fixes: ebeb84ad ("cpuset: Introduce new interface for scheduler ...") Signed-off-by: NHui Tang <tanghui20@huawei.com> (cherry picked from commit d8f77f89)
-
由 Xiu Jianfeng 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I798WQ CVE: NA ---------------------------------------------------------------------- We found a refcount UAF bug as follows: refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 342 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x148 Workqueue: events cpuset_hotplug_workfn Call trace: refcount_warn_saturate+0xa0/0x148 __refcount_add.constprop.0+0x5c/0x80 css_task_iter_advance_css_set+0xd8/0x210 css_task_iter_advance+0xa8/0x120 css_task_iter_next+0x94/0x158 update_tasks_root_domain+0x58/0x98 rebuild_root_domains+0xa0/0x1b0 rebuild_sched_domains_locked+0x144/0x188 cpuset_hotplug_workfn+0x138/0x5a0 process_one_work+0x1e8/0x448 worker_thread+0x228/0x3e0 kthread+0xe0/0xf0 ret_from_fork+0x10/0x20 then a kernel panic will be triggered as below: Unable to handle kernel paging request at virtual address 00000000c0000010 Call trace: cgroup_apply_control_disable+0xa4/0x16c rebind_subsystems+0x224/0x590 cgroup_destroy_root+0x64/0x2e0 css_free_rwork_fn+0x198/0x2a0 process_one_work+0x1d4/0x4bc worker_thread+0x158/0x410 kthread+0x108/0x13c ret_from_fork+0x10/0x18 The race that cause this bug can be shown as below: (hotplug cpu) | (umount cpuset) mutex_lock(&cpuset_mutex) | mutex_lock(&cgroup_mutex) cpuset_hotplug_workfn | rebuild_root_domains | rebind_subsystems update_tasks_root_domain | spin_lock_irq(&css_set_lock) css_task_iter_start | list_move_tail(&cset->e_cset_node[ss->id] while(css_task_iter_next) | &dcgrp->e_csets[ss->id]); css_task_iter_end | spin_unlock_irq(&css_set_lock) mutex_unlock(&cpuset_mutex) | mutex_unlock(&cgroup_mutex) Inside css_task_iter_start/next/end, css_set_lock is hold and then released, so when iterating task(left side), the css_set may be moved to another list(right side), then it->cset_head points to the old list head and it->cset_pos->next points to the head node of new list, which can't be used as struct css_set. To fix this issue, introduce CSS_TASK_ITER_STOPPED flag for css_task_iter. when moving css_set to dcgrp->e_csets[ss->id] in rebind_subsystems(), stop the task iteration. Reported-by: NGaosheng Cui <cuigaosheng1@huawei.com> Link: https://www.spinics.net/lists/cgroups/msg37935.html Fixes: f9a25f77 ("cpusets: Rebuild root domain deadline accounting information") Signed-off-by: NXiu Jianfeng <xiujianfeng@huaweicloud.com> Signed-off-by: NGaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com> (cherry picked from commit e52586f4)
-
- 30 5月, 2023 8 次提交
-
-
由 Zhao Wenhui 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I737X1 ------------------------------- Expand qos_level from {-1,0} to [-2, 2], to distinguish the tasks expected to be with extremely high or low priority level. Using qos_level_weight to reweight the shares when calculating group's weight. Meanwhile, set offline task's schedule policy to SCHED_IDLE so that it can be preempted at check_preempt_wakeup. Signed-off-by: NZhao Wenhui <zhaowenhui8@huawei.com>
-
由 Guan Jing 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8 CVE: NA -------------------------------- Signed-off-by: NGuan Jing <guanjing6@huawei.com>
-
由 Guan Jing 提交于
mainline inclusion from mainline-v5.18-rc1 commit e496132e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I78WM8 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc3&id=e496132ebedd870b67f1f6d2428f9bb9d7ae27fd -------------------------------- Commit 7d2b5dd0 ("sched/numa: Allow a floating imbalance between NUMA nodes") allowed an imbalance between NUMA nodes such that communicating tasks would not be pulled apart by the load balancer. This works fine when there is a 1:1 relationship between LLC and node but can be suboptimal for multiple LLCs if independent tasks prematurely use CPUs sharing cache. Zen* has multiple LLCs per node with local memory channels and due to the allowed imbalance, it's far harder to tune some workloads to run optimally than it is on hardware that has 1 LLC per node. This patch allows an imbalance to exist up to the point where LLCs should be balanced between nodes. On a Zen3 machine running STREAM parallelised with OMP to have on instance per LLC the results and without binding, the results are 5.17.0-rc0 5.17.0-rc0 vanilla sched-numaimb-v6 MB/sec copy-16 162596.94 ( 0.00%) 580559.74 ( 257.05%) MB/sec scale-16 136901.28 ( 0.00%) 374450.52 ( 173.52%) MB/sec add-16 157300.70 ( 0.00%) 564113.76 ( 258.62%) MB/sec triad-16 151446.88 ( 0.00%) 564304.24 ( 272.61%) STREAM can use directives to force the spread if the OpenMP is new enough but that doesn't help if an application uses threads and it's not known in advance how many threads will be created. Coremark is a CPU and cache intensive benchmark parallelised with threads. When running with 1 thread per core, the vanilla kernel allows threads to contend on cache. With the patch; 5.17.0-rc0 5.17.0-rc0 vanilla sched-numaimb-v5 Min Score-16 368239.36 ( 0.00%) 389816.06 ( 5.86%) Hmean Score-16 388607.33 ( 0.00%) 427877.08 * 10.11%* Max Score-16 408945.69 ( 0.00%) 481022.17 ( 17.62%) Stddev Score-16 15247.04 ( 0.00%) 24966.82 ( -63.75%) CoeffVar Score-16 3.92 ( 0.00%) 5.82 ( -48.48%) It can also make a big difference for semi-realistic workloads like specjbb which can execute arbitrary numbers of threads without advance knowledge of how they should be placed. Even in cases where the average performance is neutral, the results are more stable. 5.17.0-rc0 5.17.0-rc0 vanilla sched-numaimb-v6 Hmean tput-1 71631.55 ( 0.00%) 73065.57 ( 2.00%) Hmean tput-8 582758.78 ( 0.00%) 556777.23 ( -4.46%) Hmean tput-16 1020372.75 ( 0.00%) 1009995.26 ( -1.02%) Hmean tput-24 1416430.67 ( 0.00%) 1398700.11 ( -1.25%) Hmean tput-32 1687702.72 ( 0.00%) 1671357.04 ( -0.97%) Hmean tput-40 1798094.90 ( 0.00%) 2015616.46 * 12.10%* Hmean tput-48 1972731.77 ( 0.00%) 2333233.72 ( 18.27%) Hmean tput-56 2386872.38 ( 0.00%) 2759483.38 ( 15.61%) Hmean tput-64 2909475.33 ( 0.00%) 2925074.69 ( 0.54%) Hmean tput-72 2585071.36 ( 0.00%) 2962443.97 ( 14.60%) Hmean tput-80 2994387.24 ( 0.00%) 3015980.59 ( 0.72%) Hmean tput-88 3061408.57 ( 0.00%) 3010296.16 ( -1.67%) Hmean tput-96 3052394.82 ( 0.00%) 2784743.41 ( -8.77%) Hmean tput-104 2997814.76 ( 0.00%) 2758184.50 ( -7.99%) Hmean tput-112 2955353.29 ( 0.00%) 2859705.09 ( -3.24%) Hmean tput-120 2889770.71 ( 0.00%) 2764478.46 ( -4.34%) Hmean tput-128 2871713.84 ( 0.00%) 2750136.73 ( -4.23%) Stddev tput-1 5325.93 ( 0.00%) 2002.53 ( 62.40%) Stddev tput-8 6630.54 ( 0.00%) 10905.00 ( -64.47%) Stddev tput-16 25608.58 ( 0.00%) 6851.16 ( 73.25%) Stddev tput-24 12117.69 ( 0.00%) 4227.79 ( 65.11%) Stddev tput-32 27577.16 ( 0.00%) 8761.05 ( 68.23%) Stddev tput-40 59505.86 ( 0.00%) 2048.49 ( 96.56%) Stddev tput-48 168330.30 ( 0.00%) 93058.08 ( 44.72%) Stddev tput-56 219540.39 ( 0.00%) 30687.02 ( 86.02%) Stddev tput-64 121750.35 ( 0.00%) 9617.36 ( 92.10%) Stddev tput-72 223387.05 ( 0.00%) 34081.13 ( 84.74%) Stddev tput-80 128198.46 ( 0.00%) 22565.19 ( 82.40%) Stddev tput-88 136665.36 ( 0.00%) 27905.97 ( 79.58%) Stddev tput-96 111925.81 ( 0.00%) 99615.79 ( 11.00%) Stddev tput-104 146455.96 ( 0.00%) 28861.98 ( 80.29%) Stddev tput-112 88740.49 ( 0.00%) 58288.23 ( 34.32%) Stddev tput-120 186384.86 ( 0.00%) 45812.03 ( 75.42%) Stddev tput-128 78761.09 ( 0.00%) 57418.48 ( 27.10%) Similarly, for embarassingly parallel problems like NPB-ep, there are improvements due to better spreading across LLC when the machine is not fully utilised. vanilla sched-numaimb-v6 Min ep.D 31.79 ( 0.00%) 26.11 ( 17.87%) Amean ep.D 31.86 ( 0.00%) 26.17 * 17.86%* Stddev ep.D 0.07 ( 0.00%) 0.05 ( 24.41%) CoeffVar ep.D 0.22 ( 0.00%) 0.20 ( 7.97%) Max ep.D 31.93 ( 0.00%) 26.21 ( 17.91%) Signed-off-by: NMel Gorman <mgorman@techsingularity.net> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NGautham R. Shenoy <gautham.shenoy@amd.com> Tested-by: NK Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20220208094334.16379-3-mgorman@techsingularity.netSigned-off-by: NGuan Jing <guanjing6@huawei.com>
-
由 Dave Chinner 提交于
mainline inclusion from mainline-v6.3-rc4 commit 1470afef category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1470afefc3c42df5d1662f87d079b46651bdc95b -------------------------------- Equivalent of for_each_cpu_and, except it ORs the two masks together so it iterates all the CPUs present in either mask. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NZeng Heng <zengheng4@huawei.com>
-
由 Yury Norov 提交于
mainline inclusion from mainline-v5.13-rc1 commit 586eaebe category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=586eaebea5988302c5a8b018096dd6c6f4564940 -------------------------------- find_bit would also benefit from small_const_nbits() optimizations. The detailed comment is provided by Rasmus Villemoes. Link: https://lkml.kernel.org/r/20210401003153.97325-6-yury.norov@gmail.comSuggested-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: NYury Norov <yury.norov@gmail.com> Acked-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Cc: Alexey Klimov <aklimov@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Sterba <dsterba@suse.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jianpeng Ma <jianpeng.ma@intel.com> Cc: Joe Perches <joe@perches.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Rich Felker <dalias@libc.org> Cc: Stefano Brivio <sbrivio@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Wolfram Sang <wsa+renesas@sang-engineering.com> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NZeng Heng <zengheng4@huawei.com>
-
由 Peter Zijlstra 提交于
mainline inclusion from mainline-v5.13-rc1 commit e40f74c5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6VS35 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e40f74c535b8a0ecf3ef0388b51a34cdadb34fb5 -------------------------------- Introduce a cpumask that indicates (for each CPU) what direction the CPU hotplug is currently going. Notably, it tracks rollbacks. Eg. when an up fails and we do a roll-back down, it will accurately reflect the direction. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NValentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210310150109.151441252@infradead.orgSigned-off-by: NZeng Heng <zengheng4@huawei.com>
-
由 Weili Qian 提交于
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I773SD CVE: NA ---------------------------------------------------------------------- support no-sva feature. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NJiangShui Yang <yangjiangshui@h-partners.com>
-
由 Kai Ye 提交于
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I773SD CVE: NA ---------------------------------------------------------------------- 1. UACCE_MODE_NOIOMMU for warpdrive. 2. some dfx logs 3. fix some static checking. Signed-off-by: NKai Ye <yekai13@huawei.com> Signed-off-by: NJiangShui Yang <yangjiangshui@h-partners.com>
-
- 29 5月, 2023 1 次提交
-
-
由 Yicong Yang 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I798Y2 CVE: NA ---------------------------------------------------------------------- Some HiSilicon SMMU PMCG suffers the erratum that the global PMU disable control sometimes fail to disable each used the counters. This will lead to error or inaccurate data since before we enable the counters the counter's still counting for the event used in last perf session. This patch tries to fix this by hardening the global disable process. Before disable the PMU, writing an invalid event type (0xff) to focibly stop the counters. Signed-off-by: NYicong Yang <yangyicong@hisilicon.com> Signed-off-by: NJunhao He <hejunhao3@huawei.com>
-
- 23 5月, 2023 3 次提交
-
-
由 Tariq Toukan 提交于
stable inclusion from stable-v5.10.153 commit bbcc06933f35651294ea1e963757502312c2171f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I64YCA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bbcc06933f35651294ea1e963757502312c2171f -------------------------------- [ Upstream commit bacd22df ] mlx5_cmd_cleanup_async_ctx should return only after all its callback handlers were completed. Before this patch, the below race between mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible and lead to a use-after-free: 1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e. elevated by 1, a single inflight callback). 2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1. 3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and is about to call wake_up(). 4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns immediately as the condition (num_inflight == 0) holds. 5. mlx5_cmd_cleanup_async_ctx returns. 6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx object. 7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed object. Fix it by syncing using a completion object. Mark it completed when num_inflight reaches 0. Trace: BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270 Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x57/0x7d print_report.cold+0x2d5/0x684 ? do_raw_spin_lock+0x23d/0x270 kasan_report+0xb1/0x1a0 ? do_raw_spin_lock+0x23d/0x270 do_raw_spin_lock+0x23d/0x270 ? rwlock_bug.part.0+0x90/0x90 ? __delete_object+0xb8/0x100 ? lock_downgrade+0x6e0/0x6e0 _raw_spin_lock_irqsave+0x43/0x60 ? __wake_up_common_lock+0xb9/0x140 __wake_up_common_lock+0xb9/0x140 ? __wake_up_common+0x650/0x650 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kasan_set_track+0x21/0x30 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kfree+0x1ba/0x520 ? do_raw_spin_unlock+0x54/0x220 mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core] ? dump_command+0xcc0/0xcc0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core] cmd_comp_notifier+0x7e/0xb0 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 ? irq_release+0x140/0x140 [mlx5_core] irq_int_handler+0x19/0x30 [mlx5_core] __handle_irq_event_percpu+0x1f2/0x620 handle_irq_event+0xb2/0x1d0 handle_edge_irq+0x21e/0xb00 __common_interrupt+0x79/0x1a0 common_interrupt+0x78/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:default_idle+0x42/0x60 Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00 RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242 RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110 RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3 R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005 R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000 ? default_idle_call+0xcc/0x450 default_idle_call+0xec/0x450 do_idle+0x394/0x450 ? arch_cpu_idle_exit+0x40/0x40 ? do_idle+0x17/0x450 cpu_startup_entry+0x19/0x20 start_secondary+0x221/0x2b0 ? set_cpu_sibling_map+0x2070/0x2070 secondary_startup_64_no_verify+0xcd/0xdb </TASK> Allocated by task 49502: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kvmalloc_node+0x48/0xe0 mlx5e_bulk_async_init+0x35/0x110 [mlx5_core] mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 49502: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x11d/0x1b0 kfree+0x1ba/0x520 mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: e355477e ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API") Signed-off-by: NTariq Toukan <tariqt@nvidia.com> Reviewed-by: NMoshe Shemesh <moshe@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NLipeng Sang <sanglipeng1@jd.com>
-
由 Hans Verkuil 提交于
stable inclusion from stable-v5.10.153 commit b6c7446d0a38725c64305bfb4728625d4f411f50 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I64YCA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b6c7446d0a38725c64305bfb4728625d4f411f50 -------------------------------- [ Upstream commit 8da7f097 ] If it is a progressive (non-interlaced) format, then ignore the interlaced timing values. Signed-off-by: NHans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: 7f68127f ([media] videodev2.h: defines to calculate blanking and frame sizes) Signed-off-by: NMauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NLipeng Sang <sanglipeng1@jd.com>
-
由 Alexander Stein 提交于
stable inclusion from stable-v5.10.153 commit 4953a989b72d2b809b18dde7a4c2844cba4232d4 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I64YCA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4953a989b72d2b809b18dde7a4c2844cba4232d4 -------------------------------- [ Upstream commit bb9ea2c3 ] The doc says the I²C device's name is used if devname is NULL, but actually the I²C device driver's name is used. Fixes: 06582930 ("media: v4l: subdev: Add a function to set an I²C sub-device's name") Signed-off-by: NAlexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: NSakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NLipeng Sang <sanglipeng1@jd.com>
-
- 22 5月, 2023 3 次提交
-
-
由 JofDiamonds 提交于
mainline inclusion from mainline-v6.4-rc3 commit 748cd572 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I776SR CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=748cd5729ac7421091316e32dcdffb0578563880 ---------------------------------------------------------------------- Right now there is no way to query whether BPF programs are attached to a sockmap or not. we can use the standard interface in libbpf to query, such as: bpf_prog_query(mapFd, BPF_SK_SKB_STREAM_PARSER, 0, NULL, ...); the mapFd is the fd of sockmap. Signed-off-by: NDi Zhu <zhudi2@huawei.com> Acked-by: NYonghong Song <yhs@fb.com> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20220119014005.1209-1-zhudi2@huawei.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org> Conflicts: net/core/sock_map.c include/linux/bpf.h Signed-off-by: NJofDiamonds <kwb0523@163.com> Reviewed-by: Nwuchangye <wuchangye@huawei.com>
-
由 Pablo Neira Ayuso 提交于
stable inclusion from stable-v5.10.180 commit e044a24447189419c3a7ccc5fa6da7516036dc55 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I71F49 CVE: CVE-2023-32233 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e044a24447189419c3a7ccc5fa6da7516036dc55 -------------------------------- commit c1592a89 upstream. Toggle deleted anonymous sets as inactive in the next generation, so users cannot perform any update on it. Clear the generation bitmask in case the transaction is aborted. The following KASAN splat shows a set element deletion for a bound anonymous set that has been already removed in the same transaction. [ 64.921510] ================================================================== [ 64.923123] BUG: KASAN: wild-memory-access in nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.924745] Write of size 8 at addr dead000000000122 by task test/890 [ 64.927903] CPU: 3 PID: 890 Comm: test Not tainted 6.3.0+ #253 [ 64.931120] Call Trace: [ 64.932699] <TASK> [ 64.934292] dump_stack_lvl+0x33/0x50 [ 64.935908] ? nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.937551] kasan_report+0xda/0x120 [ 64.939186] ? nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.940814] nf_tables_commit+0xa24/0x1490 [nf_tables] [ 64.942452] ? __kasan_slab_alloc+0x2d/0x60 [ 64.944070] ? nf_tables_setelem_notify+0x190/0x190 [nf_tables] [ 64.945710] ? kasan_set_track+0x21/0x30 [ 64.947323] nfnetlink_rcv_batch+0x709/0xd90 [nfnetlink] [ 64.948898] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink] Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLu Wei <luwei32@huawei.com> Reviewed-by: NYue Haibing <yuehaibing@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Weili Qian 提交于
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I76TVJ CVE: NA ---------------------------------------------------------------------- The debugfs files 'dev_state' and 'dev_timeout' are added. dev_state: if dev_timeout is set, dev_state indicates the status of stopping the queue. 0 indicates that the queue is stopped successfully. Other values indicate that the queue stops fail. if dev_timeout is not set, the value of dev_state is 0; dev_timeout: If the queue fails to stop, the queue is released after waiting dev_timeout * 20ms. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NJiangshui Yang <yangjiangshui@h-partners.com>
-
- 19 5月, 2023 6 次提交
-
-
由 Nanyong Sun 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B CVE: NA ---------------------------------------------------------------------- Add control file "memory.ksm" to enable ksm per cgroup. Echo to 1 will set all tasks currently in the cgroup to ksm merge any mode, which means ksm gets enabled for all vma's of a process. Meanwhile echo to 0 will disable ksm for them and unmerge the merged pages. Cat the file will show the above state and ksm related profits of this cgroup. Signed-off-by: NNanyong Sun <sunnanyong@huawei.com>
-
由 David Hildenbrand 提交于
mainline inclusion from mainline-v6.4-rc1 commit 24139c07 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=24139c07f413ef4b555482c758343d71392a19bc ---------------------------------------------------------------------- Patch series "mm/ksm: improve PR_SET_MEMORY_MERGE=0 handling and cleanup disabling KSM", v2. (1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE does, (2) add a selftest for it and (3) factor out disabling of KSM from s390/gmap code. This patch (of 3): Let's unmerge any KSM pages when setting PR_SET_MEMORY_MERGE=0, and clear the VM_MERGEABLE flag from all VMAs -- just like KSM would. Of course, only do that if we previously set PR_SET_MEMORY_MERGE=1. Link: https://lkml.kernel.org/r/20230422205420.30372-1-david@redhat.com Link: https://lkml.kernel.org/r/20230422205420.30372-2-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NStefan Roesch <shr@devkernel.io> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Conflicts: mm/ksm.c Signed-off-by: NNanyong Sun <sunnanyong@huawei.com>
-
由 Stefan Roesch 提交于
mainline inclusion from mainline-v6.4-rc1 commit d21077fb category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d21077fbc2fc987c2e593c34dc3b4d84e546dc9f ---------------------------------------------------------------------- This adds the general_profit KSM sysfs knob and the process profit metric knobs to ksm_stat. 1) expose general_profit metric The documentation mentions a general profit metric, however this metric is not calculated. In addition the formula depends on the size of internal structures, which makes it more difficult for an administrator to make the calculation. Adding the metric for a better user experience. 2) document general_profit sysfs knob 3) calculate ksm process profit metric The ksm documentation mentions the process profit metric and how to calculate it. This adds the calculation of the metric. 4) mm: expose ksm process profit metric in ksm_stat This exposes the ksm process profit metric in /proc/<pid>/ksm_stat. The documentation mentions the formula for the ksm process profit metric, however it does not calculate it. In addition the formula depends on the size of internal structures. So it makes sense to expose it. 5) document new procfs ksm knobs Link: https://lkml.kernel.org/r/20230418051342.1919757-3-shr@devkernel.ioSigned-off-by: NStefan Roesch <shr@devkernel.io> Reviewed-by: NBagas Sanjaya <bagasdotme@gmail.com> Acked-by: NDavid Hildenbrand <david@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NNanyong Sun <sunnanyong@huawei.com>
-
由 Stefan Roesch 提交于
mainline inclusion from mainline-v6.4-rc1 commit d7597f59 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d7597f59d1d33e9efbffa7060deb9ee5bd119e62 ---------------------------------------------------------------------- Patch series "mm: process/cgroup ksm support", v9. So far KSM can only be enabled by calling madvise for memory regions. To be able to use KSM for more workloads, KSM needs to have the ability to be enabled / disabled at the process / cgroup level. Use case 1: The madvise call is not available in the programming language. An example for this are programs with forked workloads using a garbage collected language without pointers. In such a language madvise cannot be made available. In addition the addresses of objects get moved around as they are garbage collected. KSM sharing needs to be enabled "from the outside" for these type of workloads. Use case 2: The same interpreter can also be used for workloads where KSM brings no benefit or even has overhead. We'd like to be able to enable KSM on a workload by workload basis. Use case 3: With the madvise call sharing opportunities are only enabled for the current process: it is a workload-local decision. A considerable number of sharing opportunities may exist across multiple workloads or jobs (if they are part of the same security domain). Only a higler level entity like a job scheduler or container can know for certain if its running one or more instances of a job. That job scheduler however doesn't have the necessary internal workload knowledge to make targeted madvise calls. Security concerns: In previous discussions security concerns have been brought up. The problem is that an individual workload does not have the knowledge about what else is running on a machine. Therefore it has to be very conservative in what memory areas can be shared or not. However, if the system is dedicated to running multiple jobs within the same security domain, its the job scheduler that has the knowledge that sharing can be safely enabled and is even desirable. Performance: Experiments with using UKSM have shown a capacity increase of around 20%. Here are the metrics from an instagram workload (taken from a machine with 64GB main memory): full_scans: 445 general_profit: 20158298048 max_page_sharing: 256 merge_across_nodes: 1 pages_shared: 129547 pages_sharing: 5119146 pages_to_scan: 4000 pages_unshared: 1760924 pages_volatile: 10761341 run: 1 sleep_millisecs: 20 stable_node_chains: 167 stable_node_chains_prune_millisecs: 2000 stable_node_dups: 2751 use_zero_pages: 0 zero_pages_sharing: 0 After the service is running for 30 minutes to an hour, 4 to 5 million shared pages are common for this workload when using KSM. Detailed changes: 1. New options for prctl system command This patch series adds two new options to the prctl system call. The first one allows to enable KSM at the process level and the second one to query the setting. The setting will be inherited by child processes. With the above setting, KSM can be enabled for the seed process of a cgroup and all processes in the cgroup will inherit the setting. 2. Changes to KSM processing When KSM is enabled at the process level, the KSM code will iterate over all the VMA's and enable KSM for the eligible VMA's. When forking a process that has KSM enabled, the setting will be inherited by the new child process. 3. Add general_profit metric The general_profit metric of KSM is specified in the documentation, but not calculated. This adds the general profit metric to /sys/kernel/debug/mm/ksm. 4. Add more metrics to ksm_stat This adds the process profit metric to /proc/<pid>/ksm_stat. 5. Add more tests to ksm_tests and ksm_functional_tests This adds an option to specify the merge type to the ksm_tests. This allows to test madvise and prctl KSM. It also adds a two new tests to ksm_functional_tests: one to test the new prctl options and the other one is a fork test to verify that the KSM process setting is inherited by client processes. This patch (of 3): So far KSM can only be enabled by calling madvise for memory regions. To be able to use KSM for more workloads, KSM needs to have the ability to be enabled / disabled at the process / cgroup level. 1. New options for prctl system command This patch series adds two new options to the prctl system call. The first one allows to enable KSM at the process level and the second one to query the setting. The setting will be inherited by child processes. With the above setting, KSM can be enabled for the seed process of a cgroup and all processes in the cgroup will inherit the setting. 2. Changes to KSM processing When KSM is enabled at the process level, the KSM code will iterate over all the VMA's and enable KSM for the eligible VMA's. When forking a process that has KSM enabled, the setting will be inherited by the new child process. 1) Introduce new MMF_VM_MERGE_ANY flag This introduces the new flag MMF_VM_MERGE_ANY flag. When this flag is set, kernel samepage merging (ksm) gets enabled for all vma's of a process. 2) Setting VM_MERGEABLE on VMA creation When a VMA is created, if the MMF_VM_MERGE_ANY flag is set, the VM_MERGEABLE flag will be set for this VMA. 3) support disabling of ksm for a process This adds the ability to disable ksm for a process if ksm has been enabled for the process with prctl. 4) add new prctl option to get and set ksm for a process This adds two new options to the prctl system call - enable ksm for all vmas of a process (if the vmas support it). - query if ksm has been enabled for a process. 3. Disabling MMF_VM_MERGE_ANY for storage keys in s390 In the s390 architecture when storage keys are used, the MMF_VM_MERGE_ANY will be disabled. Link: https://lkml.kernel.org/r/20230418051342.1919757-1-shr@devkernel.io Link: https://lkml.kernel.org/r/20230418051342.1919757-2-shr@devkernel.ioSigned-off-by: NStefan Roesch <shr@devkernel.io> Acked-by: NDavid Hildenbrand <david@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Conflicts: kernel/sys.c mm/ksm.c mm/mmap.c Signed-off-by: NNanyong Sun <sunnanyong@huawei.com>
-
由 xu xin 提交于
mainline inclusion from mainline-v6.1-rc1 commit cb4df4ca category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cb4df4cae4f2bd8cf7a32eff81178fce31600f7c ---------------------------------------------------------------------- Patch series "ksm: count allocated rmap_items and update documentation", v5. KSM can save memory by merging identical pages, but also can consume additional memory, because it needs to generate rmap_items to save each scanned page's brief rmap information. To determine how beneficial the ksm-policy (like madvise), they are using brings, so we add a new interface /proc/<pid>/ksm_stat for each process The value "ksm_rmap_items" in it indicates the total allocated ksm rmap_items of this process. The detailed description can be seen in the following patches' commit message. This patch (of 2): KSM can save memory by merging identical pages, but also can consume additional memory, because it needs to generate rmap_items to save each scanned page's brief rmap information. Some of these pages may be merged, but some may not be abled to be merged after being checked several times, which are unprofitable memory consumed. The information about whether KSM save memory or consume memory in system-wide range can be determined by the comprehensive calculation of pages_sharing, pages_shared, pages_unshared and pages_volatile. A simple approximate calculation: profit =~ pages_sharing * sizeof(page) - (all_rmap_items) * sizeof(rmap_item); where all_rmap_items equals to the sum of pages_sharing, pages_shared, pages_unshared and pages_volatile. But we cannot calculate this kind of ksm profit inner single-process wide because the information of ksm rmap_item's number of a process is lacked. For user applications, if this kind of information could be obtained, it helps upper users know how beneficial the ksm-policy (like madvise) they are using brings, and then optimize their app code. For example, one application madvise 1000 pages as MERGEABLE, while only a few pages are really merged, then it's not cost-efficient. So we add a new interface /proc/<pid>/ksm_stat for each process in which the value of ksm_rmap_itmes is only shown now and so more values can be added in future. So similarly, we can calculate the ksm profit approximately for a single process by: profit =~ ksm_merging_pages * sizeof(page) - ksm_rmap_items * sizeof(rmap_item); where ksm_merging_pages is shown at /proc/<pid>/ksm_merging_pages, and ksm_rmap_items is shown in /proc/<pid>/ksm_stat. Link: https://lkml.kernel.org/r/20220830143731.299702-1-xu.xin16@zte.com.cn Link: https://lkml.kernel.org/r/20220830143838.299758-1-xu.xin16@zte.com.cnSigned-off-by: Nxu xin <xu.xin16@zte.com.cn> Reviewed-by: NXiaokai Ran <ran.xiaokai@zte.com.cn> Reviewed-by: NYang Yang <yang.yang29@zte.com.cn> Signed-off-by: NCGEL ZTE <cgel.zte@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Conflicts: include/linux/mm_types.h Signed-off-by: NNanyong Sun <sunnanyong@huawei.com>
-
由 xu xin 提交于
mainline inclusion from mainline-v5.19-rc1 commit 76093853 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7609385337a4feb6236e42dcd0df2185683ce839 ---------------------------------------------------------------------- Some applications or containers want to use KSM by calling madvise() to advise areas of address space to be MERGEABLE. But they may not know which applications are more likely to cause real merges in the deployment. If this patch is applied, it helps them know their corresponding number of merged pages, and then optimize their app code. As current KSM only counts the number of KSM merging pages(e.g. ksm_pages_sharing and ksm_pages_shared) of the whole system, we cannot see the more fine-grained KSM merging, for the upper application optimization, the merging area cannot be set easily according to the KSM page merging probability of each process. Therefore, it is necessary to add extra statistical means so that the upper level users can know the detailed KSM merging information of each process. We add a new proc file named as ksm_merging_pages under /proc/<pid>/ to indicate the involved ksm merging pages of this process. [akpm@linux-foundation.org: fix comment typo, remove BUG_ON()s] Link: https://lkml.kernel.org/r/20220325082318.2352853-1-xu.xin16@zte.com.cnSigned-off-by: Nxu xin <xu.xin16@zte.com.cn> Reported-by: Nkernel test robot <lkp@intel.com> Reviewed-by: NYang Yang <yang.yang29@zte.com.cn> Reviewed-by: NRan Xiaokai <ran.xiaokai@zte.com.cn> Reported-by: NZeal Robot <zealci@zte.com.cn> Cc: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Ohhoon Kwon <ohoono.kwon@samsung.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Feng Tang <feng.tang@intel.com> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Zeal Robot <zealci@zte.com.cn> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Conflicts: include/linux/mm_types.h Signed-off-by: NNanyong Sun <sunnanyong@huawei.com>
-
- 18 5月, 2023 6 次提交
-
-
由 ZhangPeng 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I6CAIM -------------------------------- Add checks for new_addr in uswap_mremap() and src_addr in uswap_check_copy_mode(), including user mode checks, overlapping checks, etc. Signed-off-by: NZhangPeng <zhangpeng362@huawei.com>
-
由 ZhangPeng 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6CAIM -------------------------------- As follows, fix some type and logical bugs. 1) The type of index variable is changed from int to unsigned long to support large memory registration. 2) Fix the bug that USWAP_PAGES_DIRTY does not take effect. 3) Take the mmap_read_lock() when using the VMA in uswap_adjust_uffd_range(). 4) Do some code refactoring and cleancode. Signed-off-by: NZhangPeng <zhangpeng362@huawei.com>
-
由 ZhangPeng 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6CAIM -------------------------------- Split uswap_register() into uswap_register() and uswap_adjust_uffd_range(). Before validate_range(), use uswap_register() to handle uswap mode. After validate_range(), use uswap_adjust_uffd_range() to change address range to VMA range, which could reduce fragmentation caused by VMA splitting. By splitting uswap_register(), we could prevent the userswap registration of invalid input address ranges. Signed-off-by: NZhangPeng <zhangpeng362@huawei.com>
-
由 ZhangPeng 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I6CAIM -------------------------------- Replace enable_userswap with struct static_key_false userswap_enabled. Signed-off-by: NZhangPeng <zhangpeng362@huawei.com>
-
由 ZhangPeng 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I6CAIM -------------------------------- This patch moves the code related to enable_userswap and CONFIG_USERSWAP to mm/userswap.c. This allows for better encapsulation and easier maintenance. Signed-off-by: NZhangPeng <zhangpeng362@huawei.com>
-
由 ZhangPeng 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I6CAIM -------------------------------- The uffd_msg.reserved3 field is used to transfer the CPU information of the PF. Signed-off-by: NZhangPeng <zhangpeng362@huawei.com>
-