Commit c4ba6014, authored by SeongJae Park, committed by Linus Torvalds

Documentation: add documents for DAMON

This commit adds documents for DAMON under
`Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`.

Link: https://lkml.kernel.org/r/20210716081449.22187-11-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Reviewed-by: Markus Boehme <markubo@amazon.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Parent 75c1c2b5
.. SPDX-License-Identifier: GPL-2.0
========================
Monitoring Data Accesses
========================
:doc:`DAMON </vm/damon/index>` allows light-weight data access monitoring.
Using DAMON, users can analyze the memory access patterns of their systems and
optimize those.
.. toctree::
:maxdepth: 2
start
usage
.. SPDX-License-Identifier: GPL-2.0
===============
Getting Started
===============
This document briefly describes how you can use DAMON by demonstrating its
default user space tool. Please note that this document describes only a part
of its features for brevity. Please refer to :doc:`usage` for more details.
TL; DR
======
Follow the commands below to monitor and visualize the memory access pattern of
your workload. ::
# # build the kernel with CONFIG_DAMON_*=y, install it, and reboot
# mount -t debugfs none /sys/kernel/debug/
# git clone https://github.com/awslabs/damo
# ./damo/damo record $(pidof <your workload>)
# ./damo/damo report heat --plot_ascii
The final command draws the access heatmap of ``<your workload>``. The heatmap
shows which memory region (x-axis) is accessed when (y-axis) and how frequently
(number; the higher the more accesses have been observed). ::
111111111111111111111111111111111111111111111111111111110000
111121111111111111111111111111211111111111111111111111110000
000000000000000000000000000000000000000000000000001555552000
000000000000000000000000000000000000000000000222223555552000
000000000000000000000000000000000000000011111677775000000000
000000000000000000000000000000000000000488888000000000000000
000000000000000000000000000000000177888400000000000000000000
000000000000000000000000000046666522222100000000000000000000
000000000000000000000014444344444300000000000000000000000000
000000000000000002222245555510000000000000000000000000000000
# access_frequency: 0 1 2 3 4 5 6 7 8 9
# x-axis: space (140286319947776-140286426374096: 101.496 MiB)
# y-axis: time (605442256436361-605479951866441: 37.695430s)
# resolution: 60x10 (1.692 MiB and 3.770s for each character)
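The ``resolution`` line in the footer follows from the two axis ranges. As a
minimal sketch, assuming the time range is expressed in nanoseconds (which the
reported ``37.695430s`` suggests), the per-character resolution can be
recomputed as below. The function name is only for illustration and is not
part of ``damo``.

```python
# Values copied from the footer of the example heatmap.
ADDR_START, ADDR_END = 140286319947776, 140286426374096  # bytes
TIME_START, TIME_END = 605442256436361, 605479951866441  # assumed nanoseconds
COLUMNS, ROWS = 60, 10                                   # "60x10"

MIB = 1024 * 1024


def heatmap_resolution(addr_start, addr_end, time_start, time_end,
                       columns, rows):
    """Return (MiB per column, seconds per row) of an ASCII heatmap."""
    space_mib = (addr_end - addr_start) / MIB
    duration_s = (time_end - time_start) / 1e9
    return space_mib / columns, duration_s / rows


mib_per_char, sec_per_char = heatmap_resolution(
    ADDR_START, ADDR_END, TIME_START, TIME_END, COLUMNS, ROWS)
print(f"{mib_per_char:.3f} MiB and {sec_per_char:.3f}s for each character")
```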
Prerequisites
=============
Kernel
------
You should first ensure your system is running on a kernel built with
``CONFIG_DAMON_*=y``.
User Space Tool
---------------
For the demonstration, we will use the default user space tool for DAMON,
called DAMON Operator (DAMO). It is available at
https://github.com/awslabs/damo. The examples below assume that ``damo`` is on
your ``$PATH``. It's not mandatory, though.
Because DAMO is using the debugfs interface (refer to :doc:`usage` for the
detail) of DAMON, you should ensure debugfs is mounted. Mount it manually as
below::
# mount -t debugfs none /sys/kernel/debug/
or append the following line to your ``/etc/fstab`` file so that your system
can automatically mount debugfs upon booting::
debugfs /sys/kernel/debug debugfs defaults 0 0
Recording Data Access Patterns
==============================
The commands below record the memory access patterns of a program and save the
monitoring results to a file. ::
$ git clone https://github.com/sjp38/masim
$ cd masim; make; ./masim ./configs/zigzag.cfg &
$ sudo damo record -o damon.data $(pidof masim)
The first two lines of the commands download an artificial memory access
generator program and run it in the background. The generator will repeatedly
access two 100 MiB sized memory regions one by one. You can substitute this
with your real workload. The last line asks ``damo`` to record the access
pattern in the ``damon.data`` file.
Visualizing Recorded Patterns
=============================
The following three commands visualize the recorded access patterns and save
the results as separate image files. ::
$ damo report heats --heatmap access_pattern_heatmap.png
$ damo report wss --range 0 101 1 --plot wss_dist.png
$ damo report wss --range 0 101 1 --sortby time --plot wss_chron_change.png
- ``access_pattern_heatmap.png`` will visualize the data access pattern in a
heatmap, showing which memory region (y-axis) got accessed when (x-axis)
and how frequently (color).
- ``wss_dist.png`` will show the distribution of the working set size.
- ``wss_chron_change.png`` will show how the working set size has
chronologically changed.
You can view the visualizations of this example workload at [1]_.
Visualizations of other realistic workloads are available at [2]_ [3]_ [4]_.
.. [1] https://damonitor.github.io/doc/html/v17/admin-guide/mm/damon/start.html#visualizing-recorded-patterns
.. [2] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
.. [3] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
.. [4] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
.. SPDX-License-Identifier: GPL-2.0
===============
Detailed Usages
===============
DAMON provides the following three interfaces for different users.
- *DAMON user space tool.*
This is for privileged people such as system administrators who want a
just-working human-friendly interface. Using this, users can use DAMON's
major features in a human-friendly way. It may not be highly tuned for
special cases, though. It supports only virtual address space monitoring.
- *debugfs interface.*
This is for privileged user space programmers who want more optimized use of
DAMON. Using this, users can use DAMON's major features by reading
from and writing to special debugfs files. Therefore, you can write and use
your personalized DAMON debugfs wrapper programs that read and write the
debugfs files on your behalf. The DAMON user space tool is also a reference
implementation of such programs. It supports only virtual address space
monitoring.
- *Kernel Space Programming Interface.*
This is for kernel space programmers. Using this, users can utilize every
feature of DAMON most flexibly and efficiently by writing kernel space
DAMON application programs. You can even extend DAMON for various
address spaces.
You can also write your own user space tool using the debugfs interface. A
reference implementation is available at https://github.com/awslabs/damo. If
you are a kernel programmer, refer to :doc:`/vm/damon/api` for the kernel
space programming interface. For these reasons, this document describes only
the debugfs interface.
debugfs Interface
=================
DAMON exports three files, ``attrs``, ``target_ids``, and ``monitor_on`` under
its debugfs directory, ``<debugfs>/damon/``.
Attributes
----------
Users can get and set the ``sampling interval``, ``aggregation interval``,
``regions update interval``, and min/max number of monitoring target regions by
reading from and writing to the ``attrs`` file. To know about the monitoring
attributes in detail, please refer to the :doc:`/vm/damon/design`. For
example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10 and
1000, and then check it again::
# cd <debugfs>/damon
# echo 5000 100000 1000000 10 1000 > attrs
# cat attrs
5000 100000 1000000 10 1000
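The example writes 5 ms as ``5000``, which implies that the three intervals
are given in microseconds. A minimal sketch of building that line from
millisecond values follows. ``format_attrs`` is a hypothetical helper name
used only for illustration; it is not provided by DAMON or ``damo``.

```python
def format_attrs(sampling_ms, aggregation_ms, update_ms,
                 min_nr_regions, max_nr_regions):
    """Build the line to write to the ``attrs`` file.

    The three intervals are converted from milliseconds to microseconds,
    which is the unit the example above implies (an assumption).
    """
    intervals_us = [int(ms * 1000)
                    for ms in (sampling_ms, aggregation_ms, update_ms)]
    fields = intervals_us + [min_nr_regions, max_nr_regions]
    return " ".join(str(field) for field in fields)


print(format_attrs(5, 100, 1000, 10, 1000))
```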
Target IDs
----------
Some types of address spaces support multiple monitoring targets. For example,
the virtual memory address spaces monitoring can have multiple processes as the
monitoring targets. Users can set the targets by writing relevant id values of
the targets to, and get the ids of the current targets by reading from the
``target_ids`` file. In case of the virtual address spaces monitoring, the
values should be pids of the monitoring target processes. For example, below
commands set processes having pids 42 and 4242 as the monitoring targets and
check it again::
# cd <debugfs>/damon
# echo 42 4242 > target_ids
# cat target_ids
42 4242
Note that setting the target ids doesn't start the monitoring.
Turning On/Off
--------------
Setting the files as described above has no effect unless you explicitly
start the monitoring. You can start, stop, and check the current status of the
monitoring by writing to and reading from the ``monitor_on`` file. Writing
``on`` to the file starts the monitoring of the targets with the attributes.
Writing ``off`` to the file stops those. DAMON also stops if every target
process is terminated. Below example commands turn on, off, and check the
status of DAMON::
# cd <debugfs>/damon
# echo on > monitor_on
# echo off > monitor_on
# cat monitor_on
off
Please note that you cannot write to the above-mentioned debugfs files while
the monitoring is turned on. If you write to the files while DAMON is running,
an error code such as ``-EBUSY`` will be returned.
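A personalized wrapper program around the three files could look like the
sketch below. The class and method names are hypothetical, and the directory
is passed in by the caller (for example ``/sys/kernel/debug/damon`` when
debugfs is mounted at its usual place). The check before writing mirrors the
rule that the files cannot be written while the monitoring is on.

```python
import os


class DamonDebugfs:
    """Hypothetical wrapper around the DAMON debugfs files."""

    def __init__(self, damon_dir):
        self.damon_dir = damon_dir

    def _write(self, name, value):
        with open(os.path.join(self.damon_dir, name), "w") as f:
            f.write(value)

    def _read(self, name):
        with open(os.path.join(self.damon_dir, name)) as f:
            return f.read().strip()

    def is_on(self):
        return self._read("monitor_on") == "on"

    def configure(self, attrs, pids):
        """Set monitoring attributes and target pids; monitoring must be off."""
        if self.is_on():
            raise RuntimeError("turn monitoring off before reconfiguring")
        self._write("attrs", " ".join(str(a) for a in attrs))
        self._write("target_ids", " ".join(str(p) for p in pids))

    def turn(self, on):
        """Start (True) or stop (False) the monitoring."""
        self._write("monitor_on", "on" if on else "off")
```

With such a helper, ``configure([5000, 100000, 1000000, 10, 1000], [42, 4242])``
followed by ``turn(True)`` would repeat the shell examples above.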
Tracepoint for Monitoring Results
=================================
DAMON provides the monitoring results via a tracepoint,
``damon:damon_aggregated``. While the monitoring is turned on, you could
record the tracepoint events and show results using tracepoint supporting tools
like ``perf``. For example::
# echo on > monitor_on
# perf record -e damon:damon_aggregated &
# sleep 5
# kill -9 $(pidof perf)
# echo off > monitor_on
# perf script
......@@ -27,6 +27,7 @@ the Linux memory management.
concepts
cma_debugfs
damon/index
hugetlbpage
idle_page_tracking
ksm
......
.. SPDX-License-Identifier: GPL-2.0
=============
API Reference
=============
Kernel space programs can use every feature of DAMON using the APIs below. All
you need to do is to include ``damon.h``, which is located in
``include/linux/`` of the source tree.
Structures
==========
.. kernel-doc:: include/linux/damon.h
Functions
=========
.. kernel-doc:: mm/damon/core.c
.. SPDX-License-Identifier: GPL-2.0
======
Design
======
Configurable Layers
===================
DAMON provides data access monitoring functionality while making the accuracy
and the overhead controllable. The fundamental access monitoring requires
primitives that are dependent on and optimized for the target address space. On
the other hand, the accuracy and overhead tradeoff mechanism, which is the core
of DAMON, is in the pure logic space. DAMON separates the two parts in
different layers and defines its interface to allow various low level
primitives implementations configurable with the core logic.
Due to this separated design and the configurable interface, users can extend
DAMON for any address space by configuring the core logics with appropriate low
level primitive implementations. If an appropriate one is not provided, users can
implement the primitives on their own.
For example, physical memory, virtual memory, swap space, those for specific
processes, NUMA nodes, files, and backing memory devices would be supportable.
Also, if some architectures or devices support special optimized access check
primitives, those will be easily configurable.
Reference Implementations of Address Space Specific Primitives
==============================================================
The low level primitives for the fundamental access monitoring are defined in
two parts:
1. Identification of the monitoring target address range for the address space.
2. Access check of specific address range in the target space.
DAMON currently provides the implementation of the primitives for only the
virtual address spaces. Below two subsections describe how it works.
VMA-based Target Address Range Construction
-------------------------------------------
Only small parts in the super-huge virtual address space of the processes are
mapped to the physical memory and accessed. Thus, tracking the unmapped
address regions is just wasteful. However, because DAMON can deal with some
level of noise using the adaptive regions adjustment mechanism, tracking every
mapping is not strictly required but could even incur a high overhead in some
cases. That said, excessively large unmapped areas inside the monitoring
target should be removed so that the adaptive mechanism does not spend time on
them.
For the reason, this implementation converts the complex mappings to three
distinct regions that cover every mapped area of the address space. The two
gaps between the three regions are the two biggest unmapped areas in the given
address space. The two biggest unmapped areas would be the gap between the
heap and the uppermost mmap()-ed region, and the gap between the lowermost
mmap()-ed region and the stack in most of the cases. Because these gaps are
exceptionally huge in usual address spaces, excluding these will be sufficient
to make a reasonable trade-off. Below shows this in detail::
<heap>
<BIG UNMAPPED REGION 1>
<uppermost mmap()-ed region>
(small mmap()-ed regions and munmap()-ed regions)
<lowermost mmap()-ed region>
<BIG UNMAPPED REGION 2>
<stack>
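The conversion can be sketched as follows: given the mapped areas sorted by
address, find the two biggest gaps between neighbors and cut the address space
there. This is a simplified illustration of the idea, not the kernel code, and
``three_regions`` is a name chosen only for this example.

```python
def three_regions(mapped):
    """Cover sorted, non-overlapping (start, end) areas with three regions.

    The two gaps left between the returned regions are the two biggest
    unmapped areas between neighboring mapped areas.
    """
    if len(mapped) < 3:
        raise ValueError("need at least three mapped areas")
    # Gap after area i is the distance to the start of area i + 1.
    gaps = [(mapped[i + 1][0] - mapped[i][1], i)
            for i in range(len(mapped) - 1)]
    # Indexes of the two biggest gaps, put back in address order.
    first, second = sorted(i for _, i in sorted(gaps, reverse=True)[:2])
    return [
        (mapped[0][0], mapped[first][1]),
        (mapped[first + 1][0], mapped[second][1]),
        (mapped[second + 1][0], mapped[-1][1]),
    ]


# Two heap-like areas, two mmap()-ed areas, and one stack-like area.
areas = [(10, 20), (25, 30), (200, 210), (220, 300), (700, 710)]
print(three_regions(areas))
```

Small gaps, such as the one between ``(200, 210)`` and ``(220, 300)``, stay
inside a region and are left for the adaptive regions adjustment to handle.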
PTE Accessed-bit Based Access Check
-----------------------------------
The implementation for the virtual address space uses PTE Accessed-bit for
basic access checks. It finds the relevant PTE Accessed bit from the address
by walking the page table for the target task of the address. In this way, the
implementation finds and clears the bit for next sampling target address and
checks whether the bit is set again after one sampling period. This could disturb
other kernel subsystems using the Accessed bits, namely Idle page tracking and
the reclaim logic. To avoid such disturbances, DAMON makes it mutually
exclusive with Idle page tracking and uses ``PG_idle`` and ``PG_young`` page
flags to solve the conflict with the reclaim logic, as Idle page tracking does.
Address Space Independent Core Mechanisms
=========================================
Below four sections describe each of the DAMON core mechanisms and the five
monitoring attributes, ``sampling interval``, ``aggregation interval``,
``regions update interval``, ``minimum number of regions``, and ``maximum
number of regions``.
Access Frequency Monitoring
---------------------------
The output of DAMON shows which pages are accessed how frequently for a given
duration. The resolution of the access frequency is controlled by setting
``sampling interval`` and ``aggregation interval``. In detail, DAMON checks
access to each page per ``sampling interval`` and aggregates the results. In
other words, it counts the number of accesses to each page. After each
``aggregation interval`` passes, DAMON calls the callback functions that were
previously registered by users so that users can read the aggregated results,
and then clears the results. This can be described in the simple pseudo-code
below::

    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        sleep(sampling_interval)

The monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.
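A runnable variant of the loop above is sketched below. It replaces real time
and real access checks with a simulated clock and a caller-supplied
``accessed`` function, both of which are assumptions made only so that the
example can run in user space. Note that every sampling step visits every
page, which is the cost the next section addresses.

```python
def monitor(pages, accessed, sampling_interval, aggregation_interval,
            total_time, callbacks):
    """Simulate per-page access frequency monitoring on a fake clock."""
    nr_accesses = {page: 0 for page in pages}
    now = 0
    while now < total_time:
        now += sampling_interval  # stands in for sleep(sampling_interval)
        for page in pages:        # cost grows with the number of pages
            if accessed(page, now):
                nr_accesses[page] += 1
        if now % aggregation_interval == 0:
            for callback in callbacks:
                callback(dict(nr_accesses))
            for page in pages:
                nr_accesses[page] = 0


results = []
monitor(
    pages=[0, 1, 2],
    # Page 0 is always hot, page 1 is hot every other sample, page 2 is idle.
    accessed=lambda page, now: page == 0 or (page == 1 and now % 10 == 0),
    sampling_interval=5,
    aggregation_interval=100,
    total_time=200,
    callbacks=[results.append],
)
print(results)
```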
Region Based Sampling
---------------------
To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that are assumed to have the same access frequencies into a region. As long as the
assumption (pages in a region have the same access frequencies) is kept, only
one page in the region is required to be checked. Thus, for each ``sampling
interval``, DAMON randomly picks one page in each region, waits for one
``sampling interval``, checks whether the page is accessed meanwhile, and
increases the access frequency of the region if so. Therefore, the monitoring
overhead is controllable by setting the number of regions. DAMON allows users
to set the minimum and the maximum number of regions for the trade-off.
This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.
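One sampling step of this scheme can be sketched as below. The ``Region``
class, the ``accessed`` callable, and the 4 KiB page size are assumptions for
illustration. The sketch also collapses the wait of one ``sampling interval``
between picking a page and checking it, so it shows only the bookkeeping.

```python
import random


class Region:
    """A contiguous address range with one shared access counter."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.nr_accesses = 0


def sample_regions(regions, accessed, rng, page_size=4096):
    """Check one randomly picked page per region and update its counter."""
    for region in regions:
        nr_pages = (region.end - region.start) // page_size
        page_addr = region.start + rng.randrange(nr_pages) * page_size
        if accessed(page_addr):
            region.nr_accesses += 1


PAGE = 4096
regions = [Region(0, 100 * PAGE), Region(100 * PAGE, 200 * PAGE)]
rng = random.Random(0)
for _ in range(20):
    # Only the first 100 pages are accessed in this simulated workload.
    sample_regions(regions, lambda addr: addr < 100 * PAGE, rng)
print([r.nr_accesses for r in regions])
```

The work per sampling step depends on the number of regions, not on the number
of pages they cover.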
Adaptive Regions Adjustment
---------------------------
Even if the initial monitoring target regions are somehow well constructed to
fulfill the assumption (pages in same region have similar access frequencies),
the data access pattern can be dynamically changed. This will result in low
monitoring quality. To keep the assumption as much as possible, DAMON
adaptively merges and splits each region based on their access frequency.
For each ``aggregation interval``, it compares the access frequencies of
adjacent regions and merges those if the frequency difference is small. Then,
after it reports and clears the aggregated access frequency of each region, it
splits each region into two or three regions if the total number of regions
will not exceed the user-specified maximum number of regions after the split.
In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.
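The merge and split steps can be sketched as below. Several details are
assumptions made for this illustration only: the ``threshold`` parameter that
defines a "small" difference, the size-weighted average used for a merged
region, and splitting each region into exactly two parts (the text above
allows two or three).

```python
import random


def merge_regions(regions, threshold):
    """Merge adjacent [start, end, nr_accesses] regions of similar frequency."""
    merged = [list(regions[0])]
    for start, end, nr in regions[1:]:
        last = merged[-1]
        if last[1] == start and abs(last[2] - nr) <= threshold:
            size_last, size_cur = last[1] - last[0], end - start
            # Size-weighted average of the two frequencies (an assumption).
            last[2] = (last[2] * size_last + nr * size_cur) // (
                size_last + size_cur)
            last[1] = end
        else:
            merged.append([start, end, nr])
    return merged


def split_regions(regions, max_nr_regions, rng, page_size=4096):
    """Split each region in two at a random page boundary, if the limit allows."""
    if len(regions) * 2 > max_nr_regions:
        return [list(r) for r in regions]
    result = []
    for start, end, nr in regions:
        nr_pages = (end - start) // page_size
        if nr_pages < 2:
            result.append([start, end, nr])
            continue
        mid = start + rng.randrange(1, nr_pages) * page_size
        result.append([start, mid, nr])
        result.append([mid, end, nr])
    return result


PAGE = 4096
regions = [[0, 4 * PAGE, 10], [4 * PAGE, 8 * PAGE, 11],
           [8 * PAGE, 12 * PAGE, 50]]
merged = merge_regions(regions, threshold=2)
split = split_regions(merged, max_nr_regions=10, rng=random.Random(1))
print(merged)
print(len(split))
```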
Dynamic Target Space Updates Handling
-------------------------------------
The monitoring target address range could be dynamically changed. For example,
virtual memory could be dynamically mapped and unmapped. Physical memory could
be hot-plugged.
As the changes could be quite frequent in some cases, DAMON checks the dynamic
memory mapping changes and applies them to the abstracted target area only
once per user-specified time interval (``regions update interval``).
.. SPDX-License-Identifier: GPL-2.0
==========================
Frequently Asked Questions
==========================
Why a new subsystem, instead of extending perf or other user space tools?
=========================================================================
First, because it needs to be as lightweight as possible so that it can be
used online, any unnecessary overhead such as kernel - user space context
switching cost should be avoided. Second, DAMON aims to be used by other
programs including the kernel. Therefore, having a dependency on specific
tools like perf is not desirable. These are the two biggest reasons why DAMON
is implemented in the kernel space.
Can 'idle pages tracking' or 'perf mem' substitute DAMON?
=========================================================
Idle page tracking is a low level primitive for access check of the physical
address space. 'perf mem' is similar, though it can use sampling to minimize
the overhead. On the other hand, DAMON is a higher-level framework for the
monitoring of various address spaces. It is focused on memory management
optimization and provides sophisticated accuracy/overhead handling mechanisms.
Therefore, 'idle pages tracking' and 'perf mem' could provide a subset of
DAMON's output, but cannot substitute DAMON.
Does DAMON support virtual memory only?
=======================================
No. The core of DAMON is address space independent. The address space
specific low level primitive parts including monitoring target regions
constructions and actual access checks can be implemented and configured on the
DAMON core by the users. In this way, DAMON users can monitor any address
space with any access check technique.
Nonetheless, DAMON provides vma tracking and PTE Accessed bit check based
implementations of the address space dependent functions for the virtual memory
by default, for a reference and convenient use. In the near future, we will
provide those for physical memory address space.
Can I simply monitor page granularity?
======================================
Yes. You can do so by setting the ``min_nr_regions`` attribute higher than the
working set size divided by the page size. Because the monitoring target
regions size is forced to be ``>=page size``, the region split will have no
effect.
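As a worked example of that rule, the sketch below computes the smallest
``min_nr_regions`` that is higher than the working set size divided by the
page size. The 4 KiB page size and the function name are assumptions for
illustration.

```python
def min_nr_regions_for_page_granularity(working_set_bytes, page_size=4096):
    """Smallest region count strictly above working set size / page size."""
    nr_pages = -(-working_set_bytes // page_size)  # ceiling division
    return nr_pages + 1


# A 100 MiB working set with 4 KiB pages spans 25600 pages.
print(min_nr_regions_for_page_granularity(100 * 1024 * 1024))
```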
.. SPDX-License-Identifier: GPL-2.0
==========================
DAMON: Data Access MONitor
==========================
DAMON is a data access monitoring framework subsystem for the Linux kernel.
The core mechanisms of DAMON (refer to :doc:`design` for the detail) make it
- *accurate* (the monitoring output is useful enough for DRAM level memory
management; it might not be appropriate for CPU cache levels, though),
- *light-weight* (the monitoring overhead is low enough to be applied online),
and
- *scalable* (the upper-bound of the overhead is in constant range regardless
of the size of target workloads).
Using this framework, therefore, the kernel's memory management mechanisms can
make advanced decisions. Experimental memory management optimization works
that incur high data access monitoring overhead could be implemented again.
In user space, meanwhile, users who have some special workloads can write
personalized applications for better understanding and optimizations of their
workloads and systems.
.. toctree::
:maxdepth: 2
faq
design
api
plans
......@@ -32,6 +32,7 @@ descriptions of data structures and algorithms.
arch_pgtable_helpers
balance
cleancache
damon/index
free_page_reporting
frontswap
highmem
......