- 01 Nov 2022, 22 commits
-
-
Committed by He Sheng

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG

--------------------------------

ASID is a more common name than ASN. This patch also renames some related macros.

Signed-off-by: He Sheng <hesheng@wxiat.com>
Reviewed-by: Cui Wei <cuiwei@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Mao Minkai

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTM4

--------------------------------

SW64 uses r26 to calculate gp after a function returns, so r26 needs to be restored when the kretprobe trampoline is hit.

Signed-off-by: Mao Minkai <maominkai@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Mao Minkai

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTLH

--------------------------------

Add deep-set-template.S to rewrite memset() and optimize __clear_user().

Signed-off-by: Mao Minkai <maominkai@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Mao Minkai

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTLH

--------------------------------

Adjust the layout of clear_user.S to make sure we get the correct symbol name when tracing.

Signed-off-by: Mao Minkai <maominkai@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Hang Xiaoqian

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I56QAM

--------------------------------

stacktrace.c should always be compiled.

Signed-off-by: Hang Xiaoqian <hangxiaoqian@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Xu Chenjiao

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG

--------------------------------

Since suspend/resume is now supported, enable it by default, and add some new configs brought in by kernel updates.

Signed-off-by: Xu Chenjiao <xuchenjiao@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by He Sheng

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG

--------------------------------

In show_regs(), we really want to print the stack pointer for debugging.

Signed-off-by: He Sheng <hesheng@wxiat.com>
Reviewed-by: Cui Wei <cuiwei@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by He Sheng

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG

--------------------------------

It's better to use the HMC_* macros instead of numeric constants. This patch also adds __CALL_HMC_VOID to define hmcalls with no return value, including sflush().

Signed-off-by: He Sheng <hesheng@wxiat.com>
Reviewed-by: Cui Wei <cuiwei@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Min Fanlei

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTKO

--------------------------------

This patch adds live migration support for the guest OS. It requires the hmcode of both host and guest to be upgraded to activate this feature.

Signed-off-by: Min Fanlei <minfanlei@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Xu Chenjiao

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTK7

--------------------------------

The S3 sleeping state is a low-wake-latency sleeping state where all system context is lost except system memory. It puts the memory controller into self-refresh mode, in which the memory device maintains its stored data without any active command from the memory controller.

At present, only SW831 supports the S3 sleep option, and it has been tested successfully on the SW831 CRB. Note that the SROM, HMCode and BIOS firmware must be upgraded to enable this function.

Signed-off-by: Xu Chenjiao <xuchenjiao@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by He Sheng

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG

--------------------------------

Signed-off-by: He Sheng <hesheng@wxiat.com>
Reviewed-by: Cui Wei <cuiwei@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by He Sheng

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG

--------------------------------

Signed-off-by: He Sheng <hesheng@wxiat.com>
Reviewed-by: Cui Wei <cuiwei@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Chen Wang

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTJN

--------------------------------

Signed-off-by: Chen Wang <chenwang@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by He Chuyue

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG

--------------------------------

Signed-off-by: He Chuyue <hechuyue@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by He Chuyue

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTIW

--------------------------------

When perf disables the PMU, the following race can occur during process scheduling:

  Process A                    Process B
  Disable irq
  <PMC overflow>
  Disable PMU
                               Enable irq
                               <PMI comes>
                               -> sw64_perf_event_irq_handler()

While irqs are disabled, a PMC may still overflow and latch a PMI. After another process is scheduled and irqs are enabled, this stale PMI is raised immediately. To avoid this, clear the interrupt flag in hmcode when it disables the PMU. In the kernel, the handler simply returns for events that no longer exist.

Signed-off-by: He Chuyue <hechuyue@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Mao Minkai

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG

--------------------------------

Use the canonical header guard naming of the full path to the header.

Signed-off-by: Mao Minkai <maominkai@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Mao Minkai

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG

--------------------------------

Signed-off-by: Mao Minkai <maominkai@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Mao Minkai

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTGY

--------------------------------

Enable DEBUG_BUGVERBOSE by default to make debugging easier.

Signed-off-by: Mao Minkai <maominkai@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Mao Minkai

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTGY

--------------------------------

Signed-off-by: Mao Minkai <maominkai@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Mao Minkai

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTGY

--------------------------------

Signed-off-by: Mao Minkai <maominkai@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Mao Minkai

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTGY

--------------------------------

Signed-off-by: Mao Minkai <maominkai@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
Committed by Mao Minkai

Sunway inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XTGY

--------------------------------

Signed-off-by: Mao Minkai <maominkai@wxiat.com>
Reviewed-by: He Sheng <hesheng@wxiat.com>
Signed-off-by: Gu Zitao <guzitao@wxiat.com>
-
- 31 Oct 2022, 10 commits
-
-
Committed by openeuler-ci-bot

Merge Pull Request from: @JqyangCode

[OSPP 2022] Add a new eBPF helper, bpf_override_reg, which modifies the target function's registers, including input arguments and return values.

--------------------------------------

Error injection is a very important method for testing system stability, and while the kernel is still lacking in this regard, BPF can fill the gap well with its kprobe support. We can use validation to ensure that it only fires on calls we restrict. There is already bpf_override_function, but it bypasses the probed function and only sets the return value to a specified value. This does not cover some of our practical scenarios:

1. Other registers (such as input registers) need to be modified: when we receive a network packet, we convert it into a structure and pass it to the corresponding function for processing. To test fault tolerance of network data, we need to modify the members of this structure, which the existing helper cannot do.

2. The target cannot be probed as a function, or what needs to be modified is not a function but an instruction: when a sensor reads IO data, we may need to simulate an IO data error, and the read may not be a function but a few simple instructions.

In summary, it is worth extending an interface that can modify any register, which gives us a simple way to implement system error injection.

bugzilla: #I55FLX

Link: https://gitee.com/openeuler/kernel/pulls/178
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by JqyangCode

openeuler inclusion
category: feature
bugzilla: https://gitee.com/open_euler/dashboard?issue_id=I55FLX
CVE: NA
Reference: NA

-------------------

Error injection is a very important method for testing system stability, and while the kernel is still lacking in this regard, BPF can fill the gap well with its kprobe support. We can use validation to ensure that it only fires on calls we restrict. There is already bpf_override_function, but it bypasses the probed function and only sets the return value to a specified value. This does not cover some of our practical scenarios:

1. Other registers (such as input registers) need to be modified: when we receive a network packet, we convert it into a structure and pass it to the corresponding function for processing. To test fault tolerance of network data, we need to modify the members of this structure, which the existing helper cannot do.

2. The target cannot be probed as a function, or what needs to be modified is not a function but an instruction: when a sensor reads IO data, we may need to simulate an IO data error, and the read may not be a function but a few simple instructions.

In summary, it is worth extending an interface that can modify any register, which gives us a simple way to implement system error injection.

Signed-off-by: JqyangCode <teanix@163.com>
-
Committed by openeuler-ci-bot

Merge Pull Request from: @hizhisong

In some cases, we need to run several threads together that require low switching overhead and act as logical operations like PV operations. This kind of thread constantly falls asleep and wakes other threads up, and each thread switch requires the kernel to do several scheduling-related steps (select the proper core, wake the task, enqueue it, mark the task's scheduling flag, pick the task at the proper time, dequeue it, and do the context switch). These overheads are not acceptable for such threads.

Therefore, we need a mechanism to avoid the unnecessary overhead and swap threads directly without affecting the fairness of CFS tasks. To achieve this goal, we implemented the direct-thread-switch (DTS) mechanism based on the futex_swap patch [*], which switches to the DTS task directly with a shared schedule entity. We also ensured that the kernel basically remains secure and consistent.

[*] https://lore.kernel.org/lkml/20200722234538.166697-2-posk@posk.io/

Link: https://gitee.com/openeuler/kernel/pulls/182
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by briansun

openeuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
CVE: NA
Reference: https://lore.kernel.org/lkml/20200722234538.166697-2-posk@posk.io/

-------------------

In some scenarios, we need to run several threads together that require low switching overhead and act as logical operations like PV operations. This kind of thread constantly falls asleep and wakes other threads up, and each thread switch requires the kernel to do several scheduling-related steps (select the proper core, wake the task, enqueue it, mark the task's scheduling flag, pick the task at the proper time, dequeue it, and do the context switch). These overheads are not acceptable for such threads.

Therefore, we need a mechanism to avoid the unnecessary overhead and swap threads directly without affecting the fairness of CFS tasks. To achieve this goal, we implemented the direct-thread-switch (DTS) mechanism based on the futex_swap patch referenced above, which switches to the DTS task directly with a shared schedule entity. We also ensured that the kernel basically remains secure and consistent.

Signed-off-by: Zhi Song <hizhisong@gmail.com>
-
Committed by Peter Oskolkov

openeuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
CVE: NA

-------------------

This is the final patch in the FUTEX_SWAP patchset. It adds a test/benchmark to validate the behavior and compare the performance of the new FUTEX_SWAP futex operation. Detailed API design and behavior considerations are provided in the commit messages of the previous two patches.

Signed-off-by: Peter Oskolkov <posk@google.com>
-
Committed by Peter Oskolkov

openeuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
CVE: NA

-------------------

As described in the previous patch in this patchset ("futex: introduce FUTEX_SWAP operation"), it is often beneficial to wake a task and run it on the same CPU where the current task, which is about to sleep, is running.

Internally at Google, the switchto_switch syscall not only migrates the wakee to the current CPU, but also moves the waker's load stats to the wakee, thus ensuring that the migration to the current CPU does not interfere with load balancing. switchto_switch also does the context switch into the wakee, bypassing schedule().

This patchset does not go that far yet; it simply migrates the wakee to the current CPU and calls schedule(). In follow-up patches I will try to fine-tune the behavior by adjusting load stats and schedule(): our internal switchto_switch is still about 2x faster than FUTEX_SWAP (see numbers below).

And now about performance: the futex_swap benchmark from the last patch in this patchset produces this typical output:

  $ ./futex_swap -i 100000
  ------- running SWAP_WAKE_WAIT -----------
  completed 100000 swap and back iterations in 820683263 ns: 4103 ns per swap
  PASS
  ------- running SWAP_SWAP -----------
  completed 100000 swap and back iterations in 124034476 ns: 620 ns per swap
  PASS

In the above, the first benchmark (SWAP_WAKE_WAIT) calls FUTEX_WAKE, then FUTEX_WAIT; the second benchmark (SWAP_SWAP) calls FUTEX_SWAP. If the benchmark is restricted to a single cpu:

  $ taskset -c 1 ./futex_swap -i 1000000

the numbers are very similar, as expected (with wake+wait being a bit slower than swap due to two vs one syscalls).

Please also note that switchto_switch is about 2x faster than FUTEX_SWAP because it does a context switch to the wakee immediately, bypassing schedule(), so this is one of the options I'll explore in further patches (if/when this initial patchset is accepted).

Tested: see the last patch in this patchset.

Signed-off-by: Peter Oskolkov <posk@google.com>
-
Committed by Peter Oskolkov

openeuler inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
CVE: NA

-------------------

As Paul Turner presented at LPC in 2013 ...

- pdf: http://pdxplumbers.osuosl.org/2013/ocw//system/presentations/1653/original/LPC%20-%20User%20Threading.pdf
- video: https://www.youtube.com/watch?v=KXuZi9aeGTw

... Google has developed an M:N userspace threading subsystem backed by the Google-private SwitchTo Linux kernel API (page 17 in the pdf referenced above). This subsystem provides latency-sensitive services at Google with fine-grained user-space control/scheduling over what is running when, and it is used widely internally (called schedulers or fibers). This patchset is the first step to open-source this work.

As explained in the linked pdf and video, the SwitchTo API has three core operations: wait, resume, and swap (= switch). So this patchset adds a FUTEX_SWAP operation that, in addition to FUTEX_WAIT and FUTEX_WAKE, will provide a foundation on top of which user-space threading libraries can be built.

Another common use case for FUTEX_SWAP is message passing a la RPC between tasks: task/thread T1 prepares a message, wakes T2 to work on it, and waits for the results; when T2 is done, it wakes T1 and waits for more work to arrive. Currently the simplest way to implement this is:

  a. T1: futex-wake T2, futex-wait
  b. T2: wakes, does what it has been woken to do
  c. T2: futex-wake T1, futex-wait

With FUTEX_SWAP, steps a and c above can each be reduced to one futex operation that runs 5-10 times faster.

Patches in this patchset:

Patch 1: (this patch) introduce the FUTEX_SWAP futex operation that, internally, does wake + wait. The purpose of this patch is to work out the API.
Patch 2: a first rough attempt to make FUTEX_SWAP faster than what wake + wait can do.
Patch 3: a selftest that can also be used to benchmark FUTEX_SWAP vs FUTEX_WAKE + FUTEX_WAIT.

Tested: see patch 3 in this patchset.

Signed-off-by: Peter Oskolkov <posk@google.com>
-
Committed by openeuler-ci-bot

Merge Pull Request from: @gewus

Compressed inodes may suffer a read performance issue because they cannot use the extent cache, so I propose adding unaligned extent support to improve this. Currently, it only works for read-only format f2fs images.

Unaligned extent: in one compressed cluster, the physical block count is less than the logical block count, so we add an extra physical block length to the extent info in order to indicate such an extent's status. The idea is that if one whole cluster's blocks are physically contiguous, once its mapping info is read for the first time, we cache an unaligned (or aligned) extent info entry in the extent cache, expecting the mapping info to be hit when the cluster is reread.

Link: https://gitee.com/openeuler/kernel/pulls/172
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chao Yu

mainline inclusion
from mainline-v5.19
commit 94afd6d6
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XX1X
CVE: NA

----------------------

Compressed inodes may suffer a read performance issue because they cannot use the extent cache, so I propose adding unaligned extent support to improve this. Currently, it only works for read-only format f2fs images.

Unaligned extent: in one compressed cluster, the physical block count is less than the logical block count, so we add an extra physical block length to the extent info in order to indicate such an extent's status. The idea is that if one whole cluster's blocks are physically contiguous, once its mapping info is read for the first time, we cache an unaligned (or aligned) extent info entry in the extent cache, expecting the mapping info to be hit when the cluster is reread.

Merge policy:
- Aligned extents can be merged.
- An aligned extent and an unaligned extent cannot be merged.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Gewus <1319579758@qq.com>
-
Committed by Daeho Jeong

mainline inclusion
from mainline-v5.19
commit 4215d054
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5XX1X
CVE: NA

-----------------

Let's allow the extent cache for RO partitions.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Geuws <1319579758@qq.com>
-
- 29 Oct 2022, 2 commits
-
-
Committed by openeuler-ci-bot

Merge Pull Request from: @AoDaMo

This is the result of an OSPP 2022 project. /proc/meminfo is the main way for users to learn physical memory usage, but it cannot observe every physical page allocation. Some kernel modules call alloc_pages to get physical pages directly, and that part of physical memory is not counted by /proc/meminfo. In order to trace the specific process of physical page allocation and release, one solution is to insert a module into the kernel and register tracepoint handlers to obtain the physical page allocation information. So I added a new tracepoint named mm_page_alloc_enter at the entrance of mm/page_alloc.c:__alloc_pages(), and exported the relevant tracepoint symbols for kernel module programming.

Link: https://gitee.com/openeuler/kernel/pulls/189
Reviewed-by: Liu YongQiang <liuyongqiang13@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by openeuler-ci-bot

Merge Pull Request from: @morymiss

Multi-Gen LRU (MGLRU) is a feature that optimizes the memory reclaim strategy. According to Google's testing, with MGLRU the CPU utilization of kswapd is reduced by 40%, background kills are reduced by 85% when 75% of memory is occupied, and rendering delay is reduced by 18% when 50% of memory is occupied. This feature has been integrated into the Linux next branch. In this PR, the patches are integrated into the openEuler-22.09 branch.

MGLRU is designed mainly for memory reclaim. First, the LRU lists that manage pages are divided into more generations, and within each generation pages are divided into tiers by their refault counts, achieving finer-grained page management. In addition, when the kswapd daemon runs, it scans the PTEs of processes active since the last run, instead of directly scanning physical memory. This effectively exploits spatial locality, improves scanning efficiency, and reduces CPU overhead.

In the current test results, the IOPS and BW numbers from the fio tool are significantly better than with the conventional reclaim approach, showing that the system's memory management performance improves significantly under high IO load, and CPU usage of the kswapd process is significantly reduced in the CPU-occupancy test. Server, embedded, and mobile device scenarios were also tested; the results show MGLRU performs well in memory reclaim, especially under high load.

bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
Reference:
https://lore.kernel.org/lkml/20220407031525.2368067-1-yuzhao@google.com/
https://android-review.googlesource.com/c/kernel/common/+/2050906/10

Link: https://gitee.com/openeuler/kernel/pulls/167
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 28 Oct 2022, 6 commits
-
-
Committed by Yu Zhao

mainline inclusion
from mainline-v6.1-rc1
commit 8be976a0
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
CVE: NA
Reference: https://android-review.googlesource.com/c/kernel/common/+/2050919/10

----------------------------------------------------------------------

Add a design doc.

Link: https://lore.kernel.org/r/20220309021230.721028-15-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I1d66302e618416291ebf9647e20625fb76613c89
Signed-off-by: YuLinjia <3110442349@qq.com>
-
Committed by Yu Zhao

mainline inclusion
from mainline-v6.1-rc1
commit 07017acb
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
CVE: NA
Reference: https://android-review.googlesource.com/c/kernel/common/+/2050918/10

----------------------------------------------------------------------

Add an admin guide.

Link: https://lore.kernel.org/r/20220309021230.721028-14-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I6fafbd7eb3ef6819cfcd30376459f14893f17c63
Signed-off-by: YuLinjia <3110442349@qq.com>
-
Committed by Yu Zhao

mainline inclusion
from mainline-v6.1-rc1
commit d6c3af7d
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
CVE: NA
Reference: https://android-review.googlesource.com/c/kernel/common/+/2050917/10

----------------------------------------------------------------------

Add /sys/kernel/debug/lru_gen for working set estimation and proactive reclaim. These features are required to optimize job scheduling (bin packing) in data centers [1][2].

Compared with the page table-based approach and the PFN-based approach, e.g., mm/damon/[vp]addr.c, this lruvec-based approach has the following advantages:
1. It offers better choices because it is aware of memcgs, NUMA nodes, shared mappings and unmapped page cache.
2. It is more scalable because it is O(nr_hot_pages), whereas the PFN-based approach is O(nr_total_pages).

Add /sys/kernel/debug/lru_gen_full for debugging.

[1] https://dl.acm.org/doi/10.1145/3297858.3304053
[2] https://dl.acm.org/doi/10.1145/3503222.3507731

Link: https://lore.kernel.org/r/20220309021230.721028-13-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Ie558098e0a24a647f77f4eacc4d72576173fc0b8
Signed-off-by: YuLinjia <3110442349@qq.com>
-
Committed by Yu Zhao

mainline inclusion
from mainline-v6.1-rc1
commit 1332a809
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
CVE: NA
Reference: https://android-review.googlesource.com/c/kernel/common/+/2050916/10

----------------------------------------------------------------------

Add /sys/kernel/mm/lru_gen/min_ttl_ms for thrashing prevention, as requested by many desktop users [1].

When set to value N, it prevents the working set of N milliseconds from getting evicted. The OOM killer is triggered if this working set cannot be kept in memory. Based on the average human-detectable lag (~100ms), N=1000 usually eliminates intolerable lags due to thrashing. Larger values like N=3000 make lags less noticeable at the risk of premature OOM kills.

Compared with the size-based approach, e.g., [2], this time-based approach has the following advantages:
1. It is easier to configure because it is agnostic to applications and memory sizes.
2. It is more reliable because it is directly wired to the OOM killer.

[1] https://lore.kernel.org/lkml/Ydza%2FzXKY9ATRoh6@google.com/
[2] https://lore.kernel.org/lkml/20211130201652.2218636d@mail.inbox.lv/

Link: https://lore.kernel.org/r/20220309021230.721028-12-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I482d33f3beaf7723d2f3eeaaa5b4f12bcb9b48a1
Signed-off-by: YuLinjia <3110442349@qq.com>
-
Committed by Yu Zhao

mainline inclusion
from mainline-v6.1-rc1
commit 354ed597
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
CVE: NA
Reference: https://android-review.googlesource.com/c/kernel/common/+/2050915/10

----------------------------------------------------------------------

Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that can be disabled include:
  0x0001: the multi-gen LRU core
  0x0002: walking page table, when arch_has_hw_pte_young() returns true
  0x0004: clearing the accessed bit in non-leaf PMD entries, when CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
  [yYnN]: apply to all the components above

E.g.,
  echo y >/sys/kernel/mm/lru_gen/enabled
  cat /sys/kernel/mm/lru_gen/enabled
  0x0007
  echo 5 >/sys/kernel/mm/lru_gen/enabled
  cat /sys/kernel/mm/lru_gen/enabled
  0x0005

NB: the page table walks happen on the scale of seconds under heavy memory pressure, in which case the mmap_lock contention is a lesser concern, compared with the LRU lock contention and the I/O congestion. So far the only well-known case of mmap_lock contention happens on Android, due to Scudo [1] which allocates several thousand VMAs for merely a few hundred MBs. The SPF and the Maple Tree also have provided their own assessments [2][3]. However, if walking page tables does worsen the mmap_lock contention, the kill switch can be used to disable it. In this case the multi-gen LRU will suffer a minor performance degradation, as shown previously. Clearing the accessed bit in non-leaf PMD entries can also be disabled, since this behavior was not tested on x86 varieties other than Intel and AMD.

[1] https://source.android.com/devices/tech/debug/scudo
[2] https://lore.kernel.org/lkml/20220128131006.67712-1-michel@lespinasse.org/
[3] https://lore.kernel.org/lkml/20220202024137.2516438-1-Liam.Howlett@oracle.com/

Link: https://lore.kernel.org/r/20220309021230.721028-11-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I71801d9470a2588cad8bfd14fbcfafc7b010aa03
Signed-off-by: YuLinjia <3110442349@qq.com>
-
Committed by Yu Zhao
mainline inclusion
from mainline-v6.1-rc1
commit f76c8337
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
CVE: NA
Reference: https://android-review.googlesource.com/c/kernel/common/+/2050914/10

----------------------------------------------------------------------

When multiple memcgs are available, it is possible to make better
choices based on generations and tiers and therefore improve the
overall performance under global memory pressure. This patch adds a
rudimentary optimization to select memcgs that can drop single-use
unmapped clean pages first. Doing so reduces the chance of going into
the aging path or swapping. These two operations can be costly.

A typical example that benefits from this optimization is a server
running mixed types of workloads, e.g., heavy anon workload in one
memcg and heavy buffered I/O workload in the other.

Though this optimization can be applied to both kswapd and direct
reclaim, it is only added to kswapd to keep the patchset manageable.
Later improvements will cover the direct reclaim path.
Server benchmark results:
  Mixed workloads:
    fio (buffered I/O): -[23, 25]%
                IOPS         BW
      patch1-8: 2960k        11.3GiB/s
      patch1-9: 2248k        8783MiB/s

    memcached (anon): +[210, 214]%
                Ops/sec      KB/sec
      patch1-8: 606940.09    23576.89
      patch1-9: 1895197.49   73619.93

  Mixed workloads:
    fio (buffered I/O): -[4, 6]%
                      IOPS         BW
      5.18-ed464352:  2369k        9255MiB/s
      patch1-9:       2248k        8783MiB/s

    memcached (anon): +[510, 516]%
                      Ops/sec      KB/sec
      5.18-ed464352:  309189.58    12010.61
      patch1-9:       1895197.49   73619.93

  Configurations:
    (changes since patch 6)

    cat mixed.sh
    modprobe brd rd_nr=2 rd_size=56623104

    swapoff -a
    mkswap /dev/ram0
    swapon /dev/ram0

    mkfs.ext4 /dev/ram1
    mount -t ext4 /dev/ram1 /mnt

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=P:P -c 1 -t 36 \
      --ratio 1:0 --pipeline 8 -d 2000

    fio -name=mglru --numjobs=36 --directory=/mnt --size=1408m \
      --buffered=1 --ioengine=io_uring --iodepth=128 \
      --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
      --rw=randread --random_distribution=random --norandommap \
      --time_based --ramp_time=10m --runtime=90m --group_reporting &
    pid=$!

    sleep 200
    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=R:R -c 1 -t 36 \
      --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed

    kill -INT $pid
    wait

Client benchmark results:
  no change (CONFIG_MEMCG=n)

Link: https://lore.kernel.org/r/20220309021230.721028-10-yuzhao@google.com/
Signed-off-by: NYu Zhao <yuzhao@google.com>
Acked-by: NBrian Geffon <bgeffon@google.com>
Acked-by: NJan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: NOleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: NSteven Barrett <steven@liquorix.net>
Acked-by: NSuleiman Souhlal <suleiman@google.com>
Tested-by: NDaniel Byrne <djbyrne@mtu.edu>
Tested-by: NDonald Carr <d@chaos-reins.com>
Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: NKonstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: NShuang Zhai <szhai2@cs.rochester.edu>
Tested-by: NSofia Trinh <sofia.trinh@edi.works>
Tested-by: NVaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: NKalesh Singh <kaleshsingh@google.com>
Change-Id: I0641467dbd7c5ba0645602cec7fe8d6fdb750edb
Signed-off-by: NYuLinjia <3110442349@qq.com>
-