alios: mm: Support kidled
This enables scanning pages at a fixed interval to determine their access
frequency (hot/cold). The result is exported to user land per memory
cgroup through "memory.idle_page_stats". The design is outlined
below:
* A kernel thread is spawned when this feature is enabled by writing a
non-zero value to "/sys/kernel/mm/kidled/scan_period_in_seconds".
The thread sequentially scans the nodes and the pages that are
chained on their LRU lists.
* For each page, its age information is stored in the page flags or in
an array in the node. The age is the number of scan intervals during
which the page has not been accessed. The page flag (PG_idle) is also
leveraged: the page's age is increased by one if the idle flag is not
cleared between two consecutive scans. Otherwise, the page's age is reset.
The page's age information is also cleared when the page is freed, so
that stale age information is not fetched when it is allocated again.
* Initially, the thread sets the idle flag and clears the access bit in
the page's PTE. In the next scan period, the PTE access bit is
synchronized with the page flag: the flag is cleared if the access bit
is set, and kept otherwise. For unmapped pages, the flag is cleared
when the page is accessed.
* Finally, the page's age information is accumulated as statistics in
the unstable bucket of its memory cgroup. The unstable bucket
(statistics) is copied to the stable bucket once all pages on all nodes
have been scanned. The stable bucket (statistics) is exported to user
land through "memory.idle_page_stats".
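
The aging rule and the unstable/stable bucket hand-off described in the
design points can be illustrated with a small user-space model. This is
only a sketch, not the kernel implementation: the page size, the bucket
boundaries in BUCKET_BOUNDS, and the names Page, scan_page and scan_all
are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes
# Hypothetical lower bounds of the idle-age buckets, in scan intervals.
BUCKET_BOUNDS = (1, 2, 5, 15, 30, 60, 120, 240)


@dataclass
class Page:
    cgroup: str
    idle: bool = False      # models the PG_idle page flag
    accessed: bool = False  # models the PTE access bit
    age: int = 0            # scan intervals without an access


def scan_page(page):
    """Apply one scan to a page and update its age."""
    # Synchronize the access bit into the idle flag.
    if page.accessed:
        page.idle = False
    # The flag survived since the previous scan: the page stayed idle.
    if page.idle:
        page.age += 1
    else:
        page.age = 0
    # Re-arm for the next interval: set the flag, clear the access bit.
    page.idle = True
    page.accessed = False


def scan_all(pages, stable):
    """Scan every page, then publish the unstable buckets as stable."""
    unstable = {}
    for page in pages:
        scan_page(page)
        if page.age == 0:
            continue  # recently accessed pages are not counted as idle
        buckets = unstable.setdefault(page.cgroup, [0] * len(BUCKET_BOUNDS))
        buckets[bisect_right(BUCKET_BOUNDS, page.age) - 1] += PAGE_SIZE
    # Readers only see the result of a complete pass.
    stable.clear()
    stable.update(unstable)


if __name__ == "__main__":
    pages = [Page("test"), Page("test")]
    stable = {}
    for _ in range(3):
        scan_all(pages, stable)
    print(stable)
```

Publishing the buckets only after a full pass mirrors the point above
that readers of "memory.idle_page_stats" see the stable copy rather than
a half-updated one.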
TESTING
=======
* cgroup1, unmapped pagecache
# dd if=/dev/zero of=/ext4/test.data oflag=direct bs=1M count=128
#
# echo 1 > /sys/kernel/mm/kidled/use_hierarchy
# echo 15 > /sys/kernel/mm/kidled/scan_period_in_seconds
# mkdir -p /cgroup/memory
# mount -tcgroup -o memory none /cgroup/memory
# echo 1 > /cgroup/memory/memory.use_hierarchy
# mkdir -p /cgroup/memory/test
# echo 1 > /cgroup/memory/test/memory.use_hierarchy
#
# echo $$ > /cgroup/memory/test/cgroup.procs
# dd if=/ext4/test.data of=/dev/null bs=1M count=128
# < wait a few minutes >
# cat /cgroup/memory/test/memory.idle_page_stats | grep cfei
cfei 0 0 0 134217728 0 0 0 0
# cat /cgroup/memory/memory.idle_page_stats | grep cfei
cfei 0 0 0 134217728 0 0 0 0
* cgroup1, mapped pagecache
# < create same file and memory cgroups as above >
#
# echo $$ > /cgroup/memory/test/cgroup.procs
# < run program to mmap the whole created file and access the area >
# < wait a few minutes >
# cat /cgroup/memory/test/memory.idle_page_stats | grep cfei
cfei 0 134217728 0 0 0 0 0 0
# cat /cgroup/memory/memory.idle_page_stats | grep cfei
cfei 0 134217728 0 0 0 0 0 0
* cgroup1, mapped and locked pagecache
# < create same file and memory cgroups as above >
#
# echo $$ > /cgroup/memory/test/cgroup.procs
# < run program to mmap the whole created file and mlock the area >
# < wait a few minutes >
# cat /cgroup/memory/test/memory.idle_page_stats | grep cfui
cfui 0 134217728 0 0 0 0 0 0
# cat /cgroup/memory/memory.idle_page_stats | grep cfui
cfui 0 134217728 0 0 0 0 0 0
* cgroup1, anonymous and locked area
# < create memory cgroups as above >
#
# echo $$ > /cgroup/memory/test/cgroup.procs
# < run program to mmap anonymous area and mlock it >
# < wait a few minutes >
# cat /cgroup/memory/test/memory.idle_page_stats | grep csui
csui 0 0 134217728 0 0 0 0 0
# cat /cgroup/memory/memory.idle_page_stats | grep csui
csui 0 0 134217728 0 0 0 0 0
* Rerun the above test cases in cgroup2; the results are no different.
However, the cgroups are populated in a different way, as below:
# mkdir -p /cgroup
# mount -tcgroup2 none /cgroup
# echo "+memory" > /cgroup/cgroup.subtree_control
# mkdir -p /cgroup/test
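
The checks above read the statistics by eye. As a sketch of how the same
check could be scripted, the helper below parses lines in the format shown
in the test output (a tag followed by per-bucket byte counts) and compares
the total with the 128 MiB file created by dd. The function name
parse_idle_page_stats is hypothetical, and skipping lines that start with
"#" is an assumption about any header lines in the file.

```python
def parse_idle_page_stats(text):
    """Map each tag (e.g. "cfei") to its list of per-bucket byte counts."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumption: blank lines and "#" header lines carry no counters.
        if not fields or fields[0].startswith("#"):
            continue
        stats[fields[0]] = [int(value) for value in fields[1:]]
    return stats


if __name__ == "__main__":
    sample = "cfei 0 0 0 134217728 0 0 0 0\n"
    stats = parse_idle_page_stats(sample)
    # 128 blocks of 1 MiB were written by dd in the test above.
    print(sum(stats["cfei"]) == 128 * 1024 * 1024)
```

On a live system the text would come from reading
/cgroup/memory/test/memory.idle_page_stats instead of the sample string.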
Signed-off-by: Gavin Shan <shan.gavin@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
New files (mode 0 → 100644):
Documentation/vm/kidled.rst
include/linux/kidled.h
mm/kidled.c