• Y
    mm: multi-gen LRU: groundwork · dca02ff3
    Yu Zhao 提交于
    mainline inclusion
    from mainline-v6.1-rc1
    commit ec1c86b2
    category: feature
    bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
    CVE: NA
    Reference: https://android-review.googlesource.com/c/kernel/common/+/2050910/10
    
    ----------------------------------------------------------------------
    
    Evictable pages are divided into multiple generations for each lruvec.
    The youngest generation number is stored in lrugen->max_seq for both
    anon and file types as they are aged on an equal footing. The oldest
    generation numbers are stored in lrugen->min_seq[] separately for anon
    and file types as clean file pages can be evicted regardless of swap
    constraints. These three variables are monotonically increasing.
    
    Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits
    in order to fit into the gen counter in page->flags. Each truncated
    generation number is an index to lrugen->lists[]. The sliding window
    technique is used to track at least MIN_NR_GENS and at most
    MAX_NR_GENS generations. The gen counter stores a value within [1,
    MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it
    stores 0.
    
    There are two conceptually independent procedures: "the aging", which
    produces young generations, and "the eviction", which consumes old
    generations. They form a closed-loop system, i.e., "the page reclaim".
    Both procedures can be invoked from userspace for the purposes of
    working set estimation and proactive reclaim. These features are
    required to optimize job scheduling (bin packing) in data centers. The
    variable size of the sliding window is designed for such use cases
    [1][2].
    
    To avoid confusion, the terms "hot" and "cold" will be applied to the
    multi-gen LRU, as a new convention; the terms "active" and "inactive"
    will be applied to the active/inactive LRU, as usual.
    
    The protection of hot pages and the selection of cold pages are based
    on page access channels and patterns. There are two access channels:
    one through page tables and the other through file descriptors. The
    protection of the former channel is by design stronger because:
    1. The uncertainty in determining the access patterns of the former
       channel is higher due to the approximation of the accessed bit.
    2. The cost of evicting the former channel is higher due to the TLB
       flushes required and the likelihood of encountering the dirty bit.
    3. The penalty of underprotecting the former channel is higher because
       applications usually do not prepare themselves for major page
       faults like they do for blocked I/O. E.g., GUI applications
       commonly use dedicated I/O threads to avoid blocking the rendering
       threads.
    There are also two access patterns: one with temporal locality and the
    other without. For the reasons listed above, the former channel is
    assumed to follow the former pattern unless VM_SEQ_READ or
    VM_RAND_READ is present; the latter channel is assumed to follow the
    latter pattern unless outlying refaults have been observed [3][4].
    
    The next patch will address the "outlying refaults". Three macros,
    i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are
    added in this patch to make the entire patchset less diffy.
    
    A page is added to the youngest generation on faulting. The aging
    needs to check the accessed bit at least twice before handing this
    page over to the eviction. The first check takes care of the accessed
    bit set on the initial fault; the second check makes sure this page
    has not been used since then. This protocol, AKA second chance,
    requires a minimum of two generations, hence MIN_NR_GENS.
    
    [1] https://dl.acm.org/doi/10.1145/3297858.3304053
    [2] https://dl.acm.org/doi/10.1145/3503222.3507731
    [3] https://lwn.net/Articles/495543/
    [4] https://lwn.net/Articles/815342/
    
    Link: https://lore.kernel.org/r/20220309021230.721028-6-yuzhao@google.com/Signed-off-by: NYu Zhao <yuzhao@google.com>
    Acked-by: NBrian Geffon <bgeffon@google.com>
    Acked-by: NJan Alexander Steffens (heftig) <heftig@archlinux.org>
    Acked-by: NOleksandr Natalenko <oleksandr@natalenko.name>
    Acked-by: NSteven Barrett <steven@liquorix.net>
    Acked-by: NSuleiman Souhlal <suleiman@google.com>
    Tested-by: NDaniel Byrne <djbyrne@mtu.edu>
    Tested-by: NDonald Carr <d@chaos-reins.com>
    Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
    Tested-by: NKonstantin Kharlamov <Hi-Angel@yandex.ru>
    Tested-by: NShuang Zhai <szhai2@cs.rochester.edu>
    Tested-by: NSofia Trinh <sofia.trinh@edi.works>
    Tested-by: NVaibhav Jain <vaibhav@linux.ibm.com>
    Bug: 227651406
    Signed-off-by: NKalesh Singh <kaleshsingh@google.com>
    Change-Id: I333ec6a1d2abfa60d93d6adc190ed3eefe441512
    Signed-off-by: NYuLinjia <3110442349@qq.com>
    dca02ff3
memory.c 148.9 KB