1. 01 Nov 2022, 22 commits
  2. 31 Oct 2022, 10 commits
    • !178 bpf: add a bpf_override_reg helper · ea43b636
      Committed by openeuler-ci-bot
      Merge Pull Request from: @JqyangCode 
       
      [Open Source Summer 2022] - Add a new helper function, bpf_override_reg, to eBPF; it modifies the registers of the target function, including input arguments, return values, etc.
      --------------------------------------
      Error injection is a very important method for testing system stability, but the
      kernel is still lacking in this regard. BPF can fill this gap well with its kprobe
      support.
      
      We can use validation methods to ensure that injection only fires on the calls we
      restrict. Although the existing bpf_override_function can complete some related
      operations, it bypasses the initial detection and only modifies the return value
      to a specified value. This does not cover some of our practical scenarios:
      
      1. Other registers (such as input registers) need to be modified: when we receive
      a network packet, we convert it into a structure and pass it to the corresponding
      function for processing. For fault tolerance of network data, we need to modify
      the members of this structure, which it cannot do.
      
      2. The function cannot be hooked, or what needs to be modified is not a function but an
      instruction: when a sensor reads IO data, we need to simulate an IO data error.
      In this case the IO read may not be a function call, just a few simple instructions.
      
      In summary, it is necessary to extend an interface that can modify any register, which
      gives us a simple way to implement system error injection.
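      
      For reference, a minimal sketch of the existing kprobe-based error injection path that
      this proposal extends, using the bpf_override_return() helper (which only changes the
      return value). The target should_failslab is chosen only as a well-known error-injection
      point; the proposed bpf_override_reg helper is not shown because its exact signature is
      not given in this description.
      
      #include <linux/bpf.h>
      #include <linux/ptrace.h>
      #include <bpf/bpf_helpers.h>
      
      /* Force the probed function to return -ENOMEM to its caller; requires
       * CONFIG_BPF_KPROBE_OVERRIDE and a target annotated for error injection. */
      SEC("kprobe/should_failslab")
      int inject_enomem(struct pt_regs *ctx)
      {
              bpf_override_return(ctx, (unsigned long)-12 /* -ENOMEM */);
              return 0;
      }
      
      char _license[] SEC("license") = "GPL";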
      
      bugzilla: #I55FLX 
       
      Link:https://gitee.com/openeuler/kernel/pulls/178 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      ea43b636
    • bpf: add a bpf_override_regs helper · e05ebae4
      Committed by JqyangCode
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/open_euler/dashboard?issue_id=I55FLX
      CVE: NA
      
      Reference: NA
      
      -------------------
      
      Error injection is a very important method for testing system stability, but the
      kernel is still lacking in this regard. BPF can fill this gap well with its kprobe
      support.
      
      We can use validation methods to ensure that injection only fires on the calls we
      restrict. Although the existing bpf_override_function can complete some related
      operations, it bypasses the initial detection and only modifies the return value
      to a specified value. This does not cover some of our practical scenarios:
      
      1. Other registers (such as input registers) need to be modified: when we receive
      a network packet, we convert it into a structure and pass it to the corresponding
      function for processing. For fault tolerance of network data, we need to modify
      the members of this structure, which it cannot do.
      
      2. The function cannot be hooked, or what needs to be modified is not a function but an
      instruction: when a sensor reads IO data, we need to simulate an IO data error.
      In this case the IO read may not be a function call, just a few simple instructions.
      
      In summary, it is necessary to extend an interface that can modify any register, which
      gives us a simple way to implement system error injection.
      Signed-off-by: JqyangCode <teanix@163.com>
      e05ebae4
    • !182 futex: introduce the direct-thread-switch mechanism · 525e6792
      Committed by openeuler-ci-bot
      Merge Pull Request from: @hizhisong 
       
      In some cases, we need to run several threads with low thrashing requirements together
      which act as logical operations like PV operations. Such a thread keeps falling asleep
      and waking other threads up, and each thread switch requires the kernel to pay several
      scheduling-related overheads (select the proper core to execute on, wake the task up,
      enqueue the task, mark the task's scheduling flag, pick the task at the proper time,
      dequeue the task and do the context switch).
      These overheads are not acceptable for the low-thrashing threads.
      Therefore, we need a mechanism that avoids this unnecessary overhead and
      swaps threads directly without affecting the fairness of CFS tasks.
      
      To achieve this goal, we implemented the direct-thread-switch mechanism
      based on the futex_swap patch*, which switches directly to the DTS task
      using a shared scheduling entity. We also ensured that the kernel basically
      remains secure and consistent.
      
      * https://lore.kernel.org/lkml/20200722234538.166697-2-posk@posk.io/
       
       
      Link:https://gitee.com/openeuler/kernel/pulls/182 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      525e6792
    • futex: introduce the direct-thread-switch mechanism · dad99a57
      Committed by briansun
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
      CVE: NA
      
      Reference: https://lore.kernel.org/lkml/20200722234538.166697-2-posk@posk.io/
      
      -------------------
      
      In some scenarios, we need to run several threads with low thrashing requirements
      together which act as logical operations like PV operations. Such a thread keeps
      falling asleep and waking other threads up, and each thread switch requires the
      kernel to pay several scheduling-related overheads (select the proper core to
      execute on, wake the task up, enqueue the task, mark the task's scheduling flag,
      pick the task at the proper time, dequeue the task and do the context switch).
      These overheads are not acceptable for the low-thrashing threads. Therefore, we
      need a mechanism that avoids this unnecessary overhead and swaps threads directly
      without affecting the fairness of CFS tasks.
      
      To achieve this goal, we implemented the direct-thread-switch mechanism
      based on the futex_swap patch*, which switches directly to the DTS task
      using a shared scheduling entity. We also ensured that the kernel basically
      remains secure and consistent.
      Signed-off-by: Zhi Song <hizhisong@gmail.com>
      dad99a57
    • selftests/futex: add futex_swap selftest · 8871eed8
      Committed by Peter Oskolkov
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
      CVE: NA
      
      -------------------
      
      This is the final patch in the FUTEX_SWAP patchset. It
      adds a test/benchmark to validate the behavior and
      compare the performance of the new FUTEX_SWAP futex operation.
      
      Detailed API design and behavior considerations are provided
      in the commit messages of the previous two patches.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      8871eed8
    • futex/sched: add wake_up_process_prefer_current_cpu, use in FUTEX_SWAP · fc4a2354
      Committed by Peter Oskolkov
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
      CVE: NA
      
      -------------------
      
      As described in the previous patch in this patchset
      ("futex: introduce FUTEX_SWAP operation"), it is often
      beneficial to wake a task and run it on the same CPU
      where the current, about-to-sleep task is running.
      
      Internally at Google, the switchto_switch syscall not only
      migrates the wakee to the current CPU, but also moves
      the waker's load stats to the wakee, thus ensuring
      that the migration to the current CPU does not interfere
      with load balancing. switchto_switch also does the
      context switch into the wakee, bypassing schedule().
      
      This patchset does not go that far yet; it simply
      migrates the wakee to the current CPU and calls schedule().
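      
      A sketch of that wake-up path, with names following the FUTEX_SWAP patchset
      (WF_CURRENT_CPU and the placement check in select_task_rq_fair() are assumptions
      about the patch internals, not mainline code):
      
      /* Sketch (kernel/sched/core.c): wake @next with a hint to place it on the
       * waker's CPU; select_task_rq_fair() would honour WF_CURRENT_CPU when the
       * current CPU is in the wakee's allowed mask. */
      int wake_up_process_prefer_current_cpu(struct task_struct *next)
      {
              return try_to_wake_up(next, TASK_NORMAL, WF_CURRENT_CPU);
      }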
      
      In follow-up patches I will try to fine-tune the behavior by adjusting
      load stats and schedule(): our internal switchto_switch
      is still about 2x faster than FUTEX_SWAP (see numbers below).
      
      And now about performance: futex_swap benchmark
      from the last patch in this patchset produces this typical
      output:
      
      $ ./futex_swap -i 100000
      
      ------- running SWAP_WAKE_WAIT -----------
      
      completed 100000 swap and back iterations in 820683263 ns: 4103 ns per swap
      PASS
      
      ------- running SWAP_SWAP -----------
      
      completed 100000 swap and back iterations in 124034476 ns: 620 ns per swap
      PASS
      
      In the above, the first benchmark (SWAP_WAKE_WAIT) calls FUTEX_WAKE,
      then FUTEX_WAIT; the second benchmark (SWAP_SWAP) calls FUTEX_SWAP.
      
      If the benchmark is restricted to a single cpu:
      
      $ taskset -c 1 ./futex_swap -i 1000000
      
      The numbers are very similar, as expected (with wake+wait being
      a bit slower than swap due to two vs one syscalls).
      
      Please also note that switchto_switch is about 2x faster than
      FUTEX_SWAP because it does a context switch to the wakee immediately,
      bypassing schedule(), so this is one of the options I'll
      explore in further patches (if/when this initial patchset is
      accepted).
      
      Tested: see the last patch in this patchset.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      fc4a2354
    • futex: introduce FUTEX_SWAP operation · b87ae678
      Committed by Peter Oskolkov
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
      CVE: NA
      
      -------------------
      
      As Paul Turner presented at LPC in 2013 ...
      - pdf: http://pdxplumbers.osuosl.org/2013/ocw//system/presentations/1653/original/LPC%20-%20User%20Threading.pdf
      - video: https://www.youtube.com/watch?v=KXuZi9aeGTw
      
      ... Google has developed an M:N userspace threading subsystem backed
      by Google-private SwitchTo Linux Kernel API (page 17 in the pdf referenced
      above). This subsystem provides latency-sensitive services at Google with
      fine-grained user-space control/scheduling over what is running when,
      and this subsystem is used widely internally (called schedulers or fibers).
      
      This patchset is the first step to open-source this work. As explained
      in the linked pdf and video, SwitchTo API has three core operations: wait,
      resume, and swap (=switch). So this patchset adds a FUTEX_SWAP operation
      that, in addition to FUTEX_WAIT and FUTEX_WAKE, will provide a foundation
      on top of which user-space threading libraries can be built.
      
      Another common use case for FUTEX_SWAP is message passing a-la RPC
      between tasks: task/thread T1 prepares a message,
      wakes T2 to work on it, and waits for the results; when T2 is done, it
      wakes T1 and waits for more work to arrive. Currently the simplest
      way to implement this is
      
      a. T1: futex-wake T2, futex-wait
      b. T2: wakes, does what it has been woken to do
      c. T2: futex-wake T1, futex-wait
      
      With FUTEX_SWAP, steps a and c above can be reduced to one futex operation
      that runs 5-10 times faster.
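      
      A hedged userspace sketch of that pattern: a single call that wakes one waiter on
      the peer's futex and waits on our own. FUTEX_SWAP is not in the mainline uapi
      headers, so the opcode value and the argument order below follow this patchset and
      should be treated as assumptions.
      
      #define _GNU_SOURCE
      #include <linux/futex.h>
      #include <sys/syscall.h>
      #include <unistd.h>
      #include <stdint.h>
      
      #ifndef FUTEX_SWAP
      #define FUTEX_SWAP 13   /* assumed opcode from this patchset */
      #endif
      
      /* Wait on @wait_on (expecting *wait_on == val) and wake one waiter on @wake
       * in a single syscall, replacing the separate FUTEX_WAKE + FUTEX_WAIT calls
       * in steps a and c above. */
      static long futex_swap(uint32_t *wait_on, uint32_t val, uint32_t *wake)
      {
              return syscall(SYS_futex, wait_on, FUTEX_SWAP, val, NULL, wake, 0);
      }
      
      With such a helper, T1 and T2 each issue one futex_swap() per round trip instead
      of a wake followed by a wait.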
      
      Patches in this patchset:
      
      Patch 1: (this patch) introduce FUTEX_SWAP futex operation that,
               internally, does wake + wait. The purpose of this patch is
               to work out the API.
      Patch 2: a first rough attempt to make FUTEX_SWAP faster than
               what wake + wait can do.
      Patch 3: a selftest that can also be used to benchmark FUTEX_SWAP vs
               FUTEX_WAKE + FUTEX_WAIT.
      
      Tested: see patch 3 in this patchset.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      b87ae678
    • !172 f2fs: extent cache: support unaligned extent · 79d1782a
      Committed by openeuler-ci-bot
      Merge Pull Request from: @gewus 
       
          Compressed inodes may suffer a read performance issue because they cannot
          use the extent cache, so I propose to add this unaligned extent support
          to improve it.

          Currently, it only works on readonly-format f2fs images.

          Unaligned extent: in one compressed cluster, the physical block count
          will be less than the logical block count, so we add an extra physical
          block length to the extent info in order to indicate such extent status.

          The idea is that if the blocks of one whole cluster are physically
          contiguous, once its mapping info has been read for the first time, we
          cache an unaligned (or aligned) extent info entry in the extent cache,
          expecting that the mapping info will be hit when rereading the cluster.
       
      Link:https://gitee.com/openeuler/kernel/pulls/172 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      79d1782a
    • f2fs: extent cache: support unaligned extent · 5282d0b8
      Committed by Chao Yu
      mainline inclusion
      from mainline-v5.19
      commit 94afd6d6
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5XX1X
      CVE: NA
      
      ----------------------
      
      Compressed inodes may suffer a read performance issue because they cannot
      use the extent cache, so I propose to add this unaligned extent support
      to improve it.
      
      Currently, it only works on readonly-format f2fs images.
      
      Unaligned extent: in one compressed cluster, the physical block count
      will be less than the logical block count, so we add an extra physical
      block length to the extent info in order to indicate such extent status.
      
      The idea is that if the blocks of one whole cluster are physically
      contiguous, once its mapping info has been read for the first time, we
      cache an unaligned (or aligned) extent info entry in the extent cache,
      expecting that the mapping info will be hit when rereading the cluster.
      
      Merge policy:
      - Aligned extents can be merged.
      - An aligned extent and an unaligned extent cannot be merged.
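      
      A sketch of the extent_info layout this implies; field names follow the mainline
      commit but the structure is simplified here and should be read as illustrative
      rather than the exact backported code.
      
      struct extent_info {
              unsigned int fofs;      /* start file offset (logical block) of the extent */
              unsigned int len;       /* length of the extent in logical blocks */
              u32 blk;                /* start block address of the extent */
      #ifdef CONFIG_F2FS_FS_COMPRESSION
              unsigned int c_len;     /* physical length of a compressed extent;
                                       * c_len < len marks the extent as unaligned */
      #endif
      };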
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Gewus <1319579758@qq.com>
      5282d0b8
    • f2fs: enable extent cache for compression files in read-only · 6f08e1da
      Committed by Daeho Jeong
      mainline inclusion
      from mainline-v5.19
      commit 4215d054
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5XX1X
      CVE: NA
      
      -----------------
      
      Let's allow extent cache for RO partition.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Geuws <1319579758@qq.com>
      6f08e1da
  3. 29 Oct 2022, 2 commits
    • !189 mm: page_alloc: Add a tracepoint to trace the call of __alloc_pages() and export symbols · 3b7ab23f
      Committed by openeuler-ci-bot
      Merge Pull Request from: @AoDaMo 
       
      This is the result of the OSPP 2022 Project
      
      /proc/meminfo is the main way for users to learn about physical memory usage, but it cannot observe every physical page allocation. Some kernel modules can call alloc_pages to get physical pages directly, and that part of physical memory will not be counted by /proc/meminfo.
          
      In order to trace the specific process of physical page allocation and release, one solution is to insert a new module into the kernel and register tracepoint handlers to obtain the physical page allocation information. So I added a new tracepoint named mm_page_alloc_enter at the entrance of mm/page_alloc.c:__alloc_pages(), and exported the relevant tracepoint symbols for kernel module programming.
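      
      A hedged sketch of the kind of module this enables. The probe prototype below
      (gfp flags plus order) is an assumption mirroring __alloc_pages() arguments; the
      real TP_PROTO of mm_page_alloc_enter and its header location may differ.
      
      #include <linux/module.h>
      #include <linux/tracepoint.h>
      #include <linux/gfp.h>
      #include <trace/events/kmem.h>          /* assumed home of mm_page_alloc_enter */
      
      /* Probe invoked at the entrance of __alloc_pages(); prototype is assumed. */
      static void probe_alloc_enter(void *data, gfp_t gfp, unsigned int order)
      {
              pr_debug("__alloc_pages entered: order=%u gfp=%#x\n",
                       order, (unsigned int)gfp);
      }
      
      static int __init tp_demo_init(void)
      {
              /* register_trace_<name>() is generated from the tracepoint definition;
               * it is usable from a module only once the tracepoint symbol is
               * exported, which is what this PR does. */
              return register_trace_mm_page_alloc_enter(probe_alloc_enter, NULL);
      }
      
      static void __exit tp_demo_exit(void)
      {
              unregister_trace_mm_page_alloc_enter(probe_alloc_enter, NULL);
              tracepoint_synchronize_unregister();
      }
      
      module_init(tp_demo_init);
      module_exit(tp_demo_exit);
      MODULE_LICENSE("GPL");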
       
      Link:https://gitee.com/openeuler/kernel/pulls/189 
      Reviewed-by: Liu YongQiang <liuyongqiang13@huawei.com> 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      3b7ab23f
    • !167 Summer OSPP 2022: Support Multi Gen LRU in openEuler · 63906034
      Committed by openeuler-ci-bot
      Merge Pull Request from: @morymiss 
       
      Multi-Gen LRU (MGLRU) is a feature that optimizes the memory reclaim strategy. According to Google's tests, with MGLRU the CPU utilization of kswapd is reduced by 40%, background kills are reduced by 85% when 75% of memory is occupied, and rendering delay is reduced by 18% when 50% of memory is occupied. This feature has been merged into the linux-next branch. In this PR, the patches are integrated into the openEuler-22.09 branch.
      
      MGLRU is mainly designed for memory reclaim. First, the LRU lists that manage pages are divided into multiple generations. Within each generation, pages are further divided into tiers by their refault counts, enabling more fine-grained page management. In addition, when the kswapd daemon runs, it scans the page tables of processes that have been active since the last run, instead of scanning physical memory directly. This exploits spatial locality effectively, improves scanning efficiency, and reduces CPU overhead.
      
      In the current test results, the IOPS and BW numbers reported by the fio tool are significantly better than with the conventional reclaim path, which shows that the memory management performance of the system improves markedly under high IO load, and the CPU usage of the kswapd process drops significantly in the CPU-usage test. In addition, server, embedded, and mobile device scenarios were also tested. The results show good MGLRU performance for memory reclaim, especially in high-load scenarios.
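      
      As a quick runtime sanity check for the backport, a minimal userspace sketch
      assuming the upstream MGLRU sysfs interface (/sys/kernel/mm/lru_gen/enabled);
      whether the openEuler-22.09 backport exposes this exact path is an assumption.
      
      #include <stdio.h>
      
      /* Print the MGLRU enable mask; a non-zero value means the multi-generational
       * LRU path is active for the corresponding page types. */
      int main(void)
      {
              char buf[32];
              FILE *f = fopen("/sys/kernel/mm/lru_gen/enabled", "r");
      
              if (!f) {
                      perror("lru_gen sysfs node");
                      return 1;
              }
              if (fgets(buf, sizeof(buf), f))
                      printf("lru_gen enabled mask: %s", buf);
              fclose(f);
              return 0;
      }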
      
      bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
      
      Reference:
      https://lore.kernel.org/lkml/20220407031525.2368067-1-yuzhao@google.com/
      https://android-review.googlesource.com/c/kernel/common/+/2050906/10 
       
      Link:https://gitee.com/openeuler/kernel/pulls/167 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      63906034
  4. 28 Oct 2022, 6 commits