提交 · 8f39ae66235cca9fa09570488c6943d891cb35d9 · openeuler / Kernel

07 4月, 2022 2 次提交

memcg: Export memory.events and memory.events.local from cgroupv2 to cgroupv1 · 8f39ae66

由 Lu Jialin 提交于 4月 07, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4X0YD?from=project-issue
CVE: NA

--------

Export "memory.events" and "memory.events.local" from cgroupv2 to
cgroupv1.

There are some differences between v2 and v1:

1)events of MEMCG_OOM_GROUP_KILL is not included in cgroupv1. Because,
there is no member of memory.oom.group.

2)events of MEMCG_MAX is represented with "limit_in_bytes" in cgroupv1 instead
of memory.max

3)event of oom_kill is include in memory.oom_control. make oom_kill include
its descendants' events and add oom_kill_local include its oom_kill event only.
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8f39ae66

mm/memcg_memfs_info: show files that having pages charged in mem_cgroup · 6b1d4d3a

由 Liu Shixin 提交于 4月 07, 2022

hulk inclusion
category: feature
bugzilla: 186182, https://gitee.com/openeuler/kernel/issues/I4UOJI
CVE: NA

--------------------------------

Support to print rootfs files and tmpfs files that having pages charged
in given memory cgroup. The files infomations can be printed through
interface "memory.memfs_files_info" or printed when OOM is triggered.

In order not to flush memory logs, we limit the maximum number of files
to be printed when oom through interface "max_print_files_in_oom". And
in order to filter out small files, we limit the minimum size of files
that can be printed through interface "size_threshold".
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6b1d4d3a

20 3月, 2022 1 次提交

mm/dynamic_hugetlb: use mem_cgroup_force_empty to reclaim pages · 94c749a3

由 Liu Shixin 提交于 3月 20, 2022

hulk inclusion
category: bugfix
bugzilla: 46904 https://gitee.com/openeuler/kernel/issues/I4Y0XO

--------------------------------

When all processes in the memory cgroup are finished, some memory may still be
occupied such as file cache. Use mem_cgroup_force_empty to reclaim these pages
that charged in the memory cgroup before merging all pages.
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

94c749a3

19 1月, 2022 5 次提交

mm/dynamic_hugetlb: add interface to disable normal pages allocation · cf8510b3

由 Liu Shixin 提交于 1月 18, 2022

hulk inclusion
category: feature
bugzilla: 46904, https://gitee.com/openeuler/kernel/issues/I4QSHG
CVE: NA

--------------------------------

Add new interface "dhugetlb.disable_normal_pages" to disable the allocation
of normal pages from a hpool. This makes dynamic hugetlb more flexible.
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

cf8510b3

mm/dynamic_hugetlb: alloc page from dhugetlb_pool · 32d6d14f

由 Liu Shixin 提交于 1月 18, 2022

hulk inclusion
category: feature
bugzilla: 46904, https://gitee.com/openeuler/kernel/issues/I4QSHG
CVE: NA

--------------------------------

Add function to alloc page from dhugetlb_pool.
When process is bound to a mem_cgroup configured with dhugtlb_pool, alloc
page from dhugetlb_pool firstly. If there is no page in dhugetlb_pool,
fallback to alloc page from buddy system.

As the process will alloc pages from dhugetlb_pool in the mem_cgroup,
process is not allowed to migrate to other mem_cgroup.
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

32d6d14f

mm/dynamic_hugetlb: add interface to configure the count of hugepages · 0c06a1c0

由 Liu Shixin 提交于 1月 18, 2022

hulk inclusion
category: feature
bugzilla: 46904, https://gitee.com/openeuler/kernel/issues/I4QSHG
CVE: NA

--------------------------------

Add two interfaces in mem_cgroup to configure the count of 1G/2M hugepages
in dhugetlb_pool.
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0c06a1c0

mm/dynamic_hugetlb: establish the dynamic hugetlb feature framework · a8a836a3

由 Liu Shixin 提交于 1月 18, 2022

hulk inclusion
category: feature
bugzilla: 46904, https://gitee.com/openeuler/kernel/issues/I4QSHG
CVE: NA

--------------------------------

Dynamic hugetlb is a self-developed feature based on the hugetlb and memcontrol.
It supports to split huge page dynamically in a memory cgroup. There is a new structure
dhugetlb_pool in every mem_cgroup to manage the pages configured to the mem_cgroup.
For the mem_cgroup configured with dhugetlb_pool, processes in the mem_cgroup will
preferentially use the pages in dhugetlb_pool.

Dynamic hugetlb supports three types of pages, including 1G/2M huge pages and 4K pages.
For the mem_cgroup configured with dhugetlb_pool, processes will be limited to alloc
1G/2M huge pages only from dhugetlb_pool. But there is no such constraint for 4K pages.
If there are insufficient 4K pages in the dhugetlb_pool, pages can also be allocated from
buddy system. So before using dynamic hugetlb, user must know how many huge pages they
need.

Usage:
1. Add 'dynamic_hugetlb=on' in cmdline to enable dynamic hugetlb feature.
2. Prealloc some 1G hugepages through hugetlb.
3. Create a mem_cgroup and configure dhugetlb_pool to mem_cgroup.
4. Configure the count of 1G/2M hugepages, and the remaining pages in dhugetlb_pool will
be used as basic pages.
5. Bound a process to mem_cgroup. then the memory for it will be allocated from dhugetlb_pool.

This patch add the corresponding structure dhugetlb_pool for dynamic hugetlb feature,
the interface 'dhugetlb.nr_pages' in mem_cgroup to configure dhugetlb_pool and the cmdline
'dynamic_hugetlb=on' to enable dynamic hugetlb feature.
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a8a836a3

mm: declare several functions · 98ecb3cd

由 Liu Shixin 提交于 1月 18, 2022

hulk inclusion
category: feature
bugzilla: 46904, https://gitee.com/openeuler/kernel/issues/I4QSHG
CVE: NA

--------------------------------

There are several functions that will be used in next patches for
dynamic hugetlb feature. Declare them.

No functional changes.
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

98ecb3cd

07 1月, 2022 4 次提交

memcg: Add static key for memcg kswapd · 84355bcc

由 Lu Jialin 提交于 1月 07, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4IMAK?from=project-issue
CVE: NA

--------

This patch adds a default-false static key to disable memcg kswapd
feature. User can enable by set memcg_kswapd in cmdline.
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: Nweiyang wang <wangweiyang2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

84355bcc

memcg: support memcg sync reclaim work as kswapd · 1496d67c

由 Lu Jialin 提交于 1月 07, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4IMAK?from=project-issue
CVE: NA

--------

Since memory.high reclaim is sync whether is in interrupt, it could
do more work than direct reclaim, i.e. write out dirty page, etc.

So, add PF_KSWAPD flag, so that current_is_kswapd() would return true
for memcg kswapd.

Memcg kswapd should stop when usage of memcg fit the memcg kswapd stop
flag. When the userland sets the memcg->memory.max, the stop_flag is
(memcg->memory.high - memcg->memory.max * 10 / 1000), which is similar
with global kswapd. Otherwise, the stop_flag is (memcg->memory.high -
memcg->memory.high / 6), which is similar with most difference between
watermark_low and watermark_high.

And, memcg kswapd should not break memory.low protection for now.
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Nweiyang wang <wangweiyang2@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1496d67c

memcg: Export memcg.high from cgroupv2 to cgroupv1 · 6a7b3e98

由 Lu Jialin 提交于 1月 07, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4IMAK?from=project-issue
CVE: NA

--------

Export memory.high from cgroupv2 to cgroupv1. Therefore, when the usage
of the memcg is larger than memory.high, some pages will be reclaimed
before return to userland, which will throttle the process.

Only export memory.high number in mem_cgroup_legacy_files and move
related functions in front of mem_cgroup_legacy_files. There is no need
to other changes.
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: Nweiyang wang <wangweiyang2@huawei.com>
Reviewed-by: Nweiyang wang <wangweiyang2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6a7b3e98

memcg: Export memcg.{min/low} from cgroupv2 to cgroupv1 · 27c047f4

由 Lu Jialin 提交于 1月 07, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4IMAK?from=project-issue
CVE: NA

--------

Export memcg.min and memcg.low from cgroupv2 to cgroupv1, in order to reduce
the negtive impact between cgroups when the system memory is insufficient.

Only export memory.{min/low} numbers in mem_cgroup_legacy_files and move
related functions in front of mem_cgroup_legacy_files. There is no need
to other changes.
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: Nweiyang wang <wangweiyang2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

27c047f4

30 12月, 2021 1 次提交

arm64/ascend: Add new enable_oom_killer interface for oom contrl · 6d494d7f

由 Weilong Chen 提交于 12月 30, 2021

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4K2U5
CVE: NA

-------------------------------------------------

Support disable oom-killer, and report oom events to bbox
vm.enable_oom_killer:
	0: disable oom killer
	1: enable oom killer (default,compatible with mainline)
Signed-off-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZhang Jian <zhangjian210@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6d494d7f

06 12月, 2021 1 次提交

memcg: prohibit unconditional exceeding the limit of dying tasks · 8937d4d7

由 Vasily Averin 提交于 12月 06, 2021

stable inclusion
from stable-5.10.80
commit 74293225f50391620aaef3507ebd6fd17e0003e1
bugzilla: 185821 https://gitee.com/openeuler/kernel/issues/I4L7CG

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=74293225f50391620aaef3507ebd6fd17e0003e1

--------------------------------

commit a4ebf1b6 upstream.

Memory cgroup charging allows killed or exiting tasks to exceed the hard
limit.  It is assumed that the amount of the memory charged by those
tasks is bound and most of the memory will get released while the task
is exiting.  This is resembling a heuristic for the global OOM situation
when tasks get access to memory reserves.  There is no global memory
shortage at the memcg level so the memcg heuristic is more relieved.

The above assumption is overly optimistic though.  E.g.  vmalloc can
scale to really large requests and the heuristic would allow that.  We
used to have an early break in the vmalloc allocator for killed tasks
but this has been reverted by commit b8c8a338 ("Revert "vmalloc:
back off when the current task is killed"").  There are likely other
similar code paths which do not check for fatal signals in an
allocation&charge loop.  Also there are some kernel objects charged to a
memcg which are not bound to a process life time.

It has been observed that it is not really hard to trigger these
bypasses and cause global OOM situation.

One potential way to address these runaways would be to limit the amount
of excess (similar to the global OOM with limited oom reserves).  This
is certainly possible but it is not really clear how much of an excess
is desirable and still protects from global OOMs as that would have to
consider the overall memcg configuration.

This patch is addressing the problem by removing the heuristic
altogether.  Bypass is only allowed for requests which either cannot
fail or where the failure is not desirable while excess should be still
limited (e.g.  atomic requests).  Implementation wise a killed or dying
task fails to charge if it has passed the OOM killer stage.  That should
give all forms of reclaim chance to restore the limit before the failure
(ENOMEM) and tell the caller to back off.

In addition, this patch renames should_force_charge() helper to
task_is_dying() because now its use is not associated witch forced
charging.

This patch depends on pagefault_out_of_memory() to not trigger
out_of_memory(), because then a memcg failure can unwind to VM_FAULT_OOM
and cause a global OOM killer.

Link: https://lkml.kernel.org/r/8f5cebbb-06da-4902-91f0-6566fc4b4203@virtuozzo.comSigned-off-by: NVasily Averin <vvs@virtuozzo.com>
Suggested-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8937d4d7

30 11月, 2021 11 次提交

memcg: unify memcg stat flushing · 863511a4