1. 02 1月, 2020 1 次提交
    • G
      alinux: mm: Support kidled · f55ac551
      Gavin Shan 提交于
      This enables scanning pages in fixed interval to determine their access
      frequency (hot/cold). The result is exported to user land on basis of
      memory cgroup by "memory.idle_page_stats". The design is highlighted as
      below:
      
         * A kernel thread is spawn when this feature is enabled by writing
           non-zero value to "/sys/kernel/mm/kidled/scan_period_in_seconds".
           The thread sequentially scans the nodes and their pages that have
           been chained up in LRU list.
      
         * For each page, its corresponding age information is stored in the
           page flags or array in node. The age represents the scanning intervals
           in which the page isn't accessed. Also, the page flag (PG_idle) is
           leveraged. The page's age is increased by one if the idle flag isn't
           cleared in two consective scans. Otherwise, the page's age is cleared out.
           Also, the page's age information is cleared when it's free'd so that
           the stale age information won't be fetched when it's allocated.
      
         * Initially, the flag is set, while the access bit in its PTE is cleared
           out by the thread. In next scanning period, its PTE access bit is
           synchronized with the page flag: clear the flag if access bit is set.
           The flag is kept otherwise. For unmapped pages, the flag is cleared
           when it's accessed.
      
         * Eventually, the page's aging information is updated to the unstable
           bucket of its corresponding memory cgroup, taking as statistics. The
           unstable bucket (statistics) is copied to stable bucket when all pages
           in all nodes are scanned for once. The stable bucket (statistics) is
           exported to user land through "memory.idle_page_stats".
      
      TESTING
      =======
      
         * cgroup1, unmapped pagecache
      
           # dd if=/dev/zero of=/ext4/test.data oflag=direct bs=1M count=128
           #
           # echo 1 > /sys/kernel/mm/kidled/use_hierarchy
           # echo 15 > /sys/kernel/mm/kidled/scan_period_in_seconds
           # mkdir -p /cgroup/memory
           # mount -tcgroup -o memory /cgroup/memory
           # echo 1 > /cgroup/memory/memory.use_hierarchy
           # mkdir -p /cgroup/memory/test
           # echo 1 > /cgroup/memory/test/memory.use_hierarchy
           #
           # echo $$ > /cgroup/memory/test/cgroup.procs
           # dd if=/ext4/test.data of=/dev/null bs=1M count=128
           # < wait a few minutes >
           # cat /cgroup/memory/test/memory.idle_page_stats | grep cfei
           # cat /cgroup/memory/test/memory.idle_page_stats | grep cfei
             cfei   0   0   0   134217728   0   0   0   0
           # cat /cgroup/memory/memory.idle_page_stats | grep cfei
             cfei   0   0   0   134217728   0   0   0   0
      
         * cgroup1, mapped pagecache
      
           # < create same file and memory cgroups as above >
           #
           # echo $$ > /cgroup/memory/test/cgroup.procs
           # < run program to mmap the whole created file and access the area >
           # < wait a few minutes >
           # cat /cgroup/memory/test/memory.idle_page_stats | grep cfei
             cfei   0   134217728   0   0   0   0   0   0
           # cat /cgroup/memory/memory.idle_page_stats | grep cfei
             cfei   0   134217728   0   0   0   0   0   0
      
         * cgroup1, mapped and locked pagecache
      
           # < create same file and memory cgroups as above >
           #
           # echo $$ > /cgroup/memory/test/cgroup.procs
           # < run program to mmap the whole created file and mlock the area >
           # < wait a few minutes >
           # cat /cgroup/memory/test/memory.idle_page_stats | grep cfui
             cfui   0   134217728   0   0   0   0   0   0
           # cat /cgroup/memory/memory.idle_page_stats | grep cfui
             cfui   0   134217728   0   0   0   0   0   0
      
         * cgroup1, anonymous and locked area
      
           # < create memory cgroups as above >
           #
           # echo $$ > /cgroup/memory/test/cgroup.procs
           # < run program to mmap anonymous area and mlock it >
           # < wait a few minutes >
           # cat /cgroup/memory/test/memory.idle_page_stats | grep csui
             csui   0   0   134217728   0   0   0   0   0
           # cat /cgroup/memory/memory.idle_page_stats | grep csui
             csui   0   0   134217728   0   0   0   0   0
      
         * Rerun above test cases in cgroup2 and the results are no exceptional.
           However, the cgroups are populated in different way as below:
      
           # mkdir -p /cgroup
           # mount -tcgroup2 none /cgroup
           # echo "+memory" > /cgroup/cgroup.subtree_control
           # mkdir -p /cgroup/test
      Signed-off-by: NGavin Shan <shan.gavin@linux.alibaba.com>
      Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
      f55ac551