mm/memory.c · dca02ff3dde48fa55e7f4461140b2b485d9a31d7 · openeuler / Kernel

mm: multi-gen LRU: groundwork · dca02ff3

Committed by Yu Zhao on Jan 25, 2021

mainline inclusion
from mainline-v6.1-rc1
commit ec1c86b2
category: feature
bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
CVE: NA
Reference: https://android-review.googlesource.com/c/kernel/common/+/2050910/10

----------------------------------------------------------------------

Evictable pages are divided into multiple generations for each lruvec.
The youngest generation number is stored in lrugen->max_seq for both
anon and file types as they are aged on an equal footing. The oldest
generation numbers are stored in lrugen->min_seq[] separately for anon
and file types as clean file pages can be evicted regardless of swap
constraints. These three variables are monotonically increasing.

Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits
in order to fit into the gen counter in page->flags. Each truncated
generation number is an index to lrugen->lists[]. The sliding window
technique is used to track at least MIN_NR_GENS and at most
MAX_NR_GENS generations. The gen counter stores a value within [1,
MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it
stores 0.
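
The numbering scheme above can be sketched in plain C. This is a minimal
userspace illustration, not the kernel code: the struct and helper names
are hypothetical, the values of MIN_NR_GENS and MAX_NR_GENS are assumed,
and truncation is modelled as a modulo by MAX_NR_GENS.

```c
#include <assert.h>

/* Assumed values for illustration; the text above only names the macros. */
#define MIN_NR_GENS 2U
#define MAX_NR_GENS 4U

/* Hypothetical stand-in for the per-lruvec generation state. */
struct lrugen_sketch {
	unsigned long max_seq;    /* youngest generation, shared by anon and file */
	unsigned long min_seq[2]; /* oldest generation: [0] anon, [1] file */
};

/* Smallest number of bits b such that (1 << b) >= n. */
static unsigned int order_base_2_sketch(unsigned long n)
{
	unsigned int bits = 0;

	while ((1UL << bits) < n)
		bits++;
	return bits;
}

/* Truncate a generation number into an index to lists[] (assumed: modulo). */
static unsigned int gen_from_seq(unsigned long seq)
{
	return seq % MAX_NR_GENS;
}

/* Gen counter value: index + 1 while the page is on a list, 0 otherwise. */
static unsigned int gen_counter(unsigned long seq, int on_list)
{
	unsigned int gen = gen_from_seq(seq);

	assert(gen < MAX_NR_GENS);
	return on_list ? gen + 1 : 0;
}

/* Number of generations currently tracked for one type. */
static unsigned long nr_gens(const struct lrugen_sketch *g, int type)
{
	return g->max_seq - g->min_seq[type] + 1;
}

/* Sliding window: at least MIN_NR_GENS and at most MAX_NR_GENS generations. */
static int window_ok(const struct lrugen_sketch *g, int type)
{
	unsigned long n = nr_gens(g, type);

	return n >= MIN_NR_GENS && n <= MAX_NR_GENS;
}
```

Under these assumptions the counter takes MAX_NR_GENS+1 distinct values
(0 plus the range [1, MAX_NR_GENS]), which is why the width is
order_base_2(MAX_NR_GENS+1) bits rather than order_base_2(MAX_NR_GENS).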

There are two conceptually independent procedures: "the aging", which
produces young generations, and "the eviction", which consumes old
generations. They form a closed-loop system, i.e., "the page reclaim".
Both procedures can be invoked from userspace for the purposes of
working set estimation and proactive reclaim. These features are
required to optimize job scheduling (bin packing) in data centers. The
variable size of the sliding window is designed for such use cases
[1][2].

To avoid confusion, the terms "hot" and "cold" will be applied to the
multi-gen LRU, as a new convention; the terms "active" and "inactive"
will be applied to the active/inactive LRU, as usual.

The protection of hot pages and the selection of cold pages are based
on page access channels and patterns. There are two access channels:
one through page tables and the other through file descriptors. The
protection of the former channel is by design stronger because:
1. The uncertainty in determining the access patterns of the former
   channel is higher due to the approximation of the accessed bit.
2. The cost of evicting the former channel is higher due to the TLB
   flushes required and the likelihood of encountering the dirty bit.
3. The penalty of underprotecting the former channel is higher because
   applications usually do not prepare themselves for major page
   faults like they do for blocked I/O. E.g., GUI applications
   commonly use dedicated I/O threads to avoid blocking the rendering
   threads.
There are also two access patterns: one with temporal locality and the
other without. For the reasons listed above, the former channel is
assumed to follow the former pattern unless VM_SEQ_READ or
VM_RAND_READ is present; the latter channel is assumed to follow the
latter pattern unless outlying refaults have been observed [3][4].
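
The default assumptions above amount to a small decision rule, sketched
here in plain C. The enum, the function name and the outlying_refaults
parameter are hypothetical and exist only for this illustration; the
flag values are defined locally for the sketch and are not taken from
this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values defined for this sketch only. */
#define VM_SEQ_READ  0x00008000UL
#define VM_RAND_READ 0x00010000UL

/* The two access channels named in the text (hypothetical enum). */
enum access_channel {
	CHANNEL_PAGE_TABLE,
	CHANNEL_FILE_DESCRIPTOR,
};

/*
 * Return true if a page reached through the given channel is assumed
 * to follow the pattern with temporal locality.
 */
static bool assume_temporal_locality(enum access_channel ch,
				     unsigned long vm_flags,
				     bool outlying_refaults)
{
	assert(ch == CHANNEL_PAGE_TABLE || ch == CHANNEL_FILE_DESCRIPTOR);

	/* Page tables: locality unless the mapping carries a read hint. */
	if (ch == CHANNEL_PAGE_TABLE)
		return !(vm_flags & (VM_SEQ_READ | VM_RAND_READ));

	/* File descriptors: no locality unless outlying refaults were seen. */
	return outlying_refaults;
}
```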

The next patch will address the "outlying refaults". Three macros,
i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are
added in this patch to make the entire patchset less diffy.

A page is added to the youngest generation on faulting. The aging
needs to check the accessed bit at least twice before handing this
page over to the eviction. The first check takes care of the accessed
bit set on the initial fault; the second check makes sure this page
has not been used since then. This protocol, AKA second chance,
requires a minimum of two generations, hence MIN_NR_GENS.
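
The second-chance protocol can be modelled with a toy simulation in
plain C. This is a simplified sketch under stated assumptions, not the
kernel implementation: the structs and helpers are hypothetical, each
aging pass is assumed to create one new youngest generation, and a page
is treated as available to the eviction once it is no longer in the
youngest generation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-lruvec state: only the youngest generation number. */
struct lruvec_sketch {
	unsigned long max_seq;
};

/* Hypothetical per-page state. */
struct page_sketch {
	unsigned long seq; /* generation the page belongs to */
	bool accessed;     /* stand-in for the accessed bit */
};

/* On faulting, the page joins the youngest generation, accessed bit set. */
static void fault_in(const struct lruvec_sketch *v, struct page_sketch *p)
{
	p->seq = v->max_seq;
	p->accessed = true;
}

/*
 * One aging pass: produce a new youngest generation, then test and
 * clear the accessed bit. A page found accessed moves to the youngest
 * generation; a page found idle stays where it is and so grows older.
 */
static void age(struct lruvec_sketch *v, struct page_sketch *p)
{
	v->max_seq++;
	if (p->accessed) {
		p->accessed = false;
		p->seq = v->max_seq;
	}
}

/* The eviction may take the page once it has left the youngest generation. */
static bool can_evict(const struct lruvec_sketch *v,
		      const struct page_sketch *p)
{
	assert(p->seq <= v->max_seq);
	return p->seq < v->max_seq;
}
```

In this model a freshly faulted page survives the first aging pass
because of the accessed bit set by the fault, and becomes evictable only
after a second pass finds the bit clear. At that point two generations
exist, the one holding the page and the youngest one, which mirrors the
reason given for MIN_NR_GENS.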

[1] https://dl.acm.org/doi/10.1145/3297858.3304053
[2] https://dl.acm.org/doi/10.1145/3503222.3507731
[3] https://lwn.net/Articles/495543/
[4] https://lwn.net/Articles/815342/

Link: https://lore.kernel.org/r/20220309021230.721028-6-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I333ec6a1d2abfa60d93d6adc190ed3eefe441512
Signed-off-by: YuLinjia <3110442349@qq.com>
