Unverified commit dcc34901 - Author: openeuler-ci-bot, Committer: Gitee

!790 mm: enable ksm per process and cgroup

Merge Pull Request from: @sun_nanyong 
 
Patches 1~6: backport the mainline patchset (mm: process/cgroup ksm
support) and the patches it depends on.
Patch 7: add control file "memory.ksm" to enable KSM per cgroup.
 
Link: https://gitee.com/openeuler/kernel/pulls/790

Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> 
...@@ -51,3 +51,11 @@ Description: Control merging pages across different NUMA nodes.
When it is set to 0 only pages from the same node are merged,
otherwise pages from all nodes can be merged together (default).
What: /sys/kernel/mm/ksm/general_profit
Date: April 2023
KernelVersion: 6.4
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Measure how effective KSM is.
general_profit: how effective is KSM. The formula for the
calculation is in Documentation/admin-guide/mm/ksm.rst.
...@@ -99,6 +99,7 @@ Brief summary of control files.
memory.kmem.tcp.failcnt              show the number of tcp buf memory usage
                                     hits limits
memory.kmem.tcp.max_usage_in_bytes   show max tcp buf memory usage recorded
memory.ksm                           set/show ksm merge any mode
==================================== ==========================================
1. History
......
...@@ -159,6 +159,8 @@ stable_node_chains_prune_millisecs
The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
general_profit
how effective is KSM. The calculation is explained below.
pages_shared
how many shared pages are being used
pages_sharing
...@@ -184,6 +186,44 @@ The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
be increased accordingly.
Monitoring KSM profit
=====================
KSM can save memory by merging identical pages, but it can also consume
additional memory, because it needs to generate a number of rmap_items to
save each scanned page's brief rmap information. Some of these pages may
be merged, but some may not be able to be merged even after being checked
several times, and the memory consumed for those is unprofitable.
1) How to determine whether KSM saves memory or consumes memory system-wide?
Here is a simple approximate calculation for reference::
general_profit =~ pages_sharing * sizeof(page) - (all_rmap_items) *
sizeof(rmap_item);
where all_rmap_items can be easily obtained by summing ``pages_sharing``,
``pages_shared``, ``pages_unshared`` and ``pages_volatile``.
2) The KSM profit within a single process can be similarly obtained by the
following approximate calculation::
process_profit =~ ksm_merging_pages * sizeof(page) -
ksm_rmap_items * sizeof(rmap_item).
where ksm_merging_pages is shown under the directory ``/proc/<pid>/``,
and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``. The process profit
is also shown in ``/proc/<pid>/ksm_stat`` as ksm_process_profit.
From the application's perspective, a high ratio of ``ksm_rmap_items`` to
``ksm_merging_pages`` means a bad madvise-applied policy, so developers or
administrators have to rethink how to change the madvise policy. As an example
for reference, a page is usually 4K in size, while an rmap_item is 32 bytes
on 32-bit CPU architectures and 64 bytes on 64-bit CPU architectures. So if
the ``ksm_rmap_items/ksm_merging_pages`` ratio exceeds 64 on a 64-bit CPU,
or exceeds 128 on a 32-bit CPU, the app's madvise policy should be dropped,
because the KSM profit is approximately zero or negative.
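
As a worked illustration of the system-wide formula above, the following
minimal userspace sketch (for reference only, not part of this patchset)
reads the counters under ``/sys/kernel/mm/ksm/`` and prints the approximate
general profit; it assumes a 64-bit system, where an rmap_item is 64 bytes
as noted above::

    #include <stdio.h>
    #include <unistd.h>

    static long read_ksm_counter(const char *name)
    {
            char path[128];
            long val = -1;
            FILE *f;

            snprintf(path, sizeof(path), "/sys/kernel/mm/ksm/%s", name);
            f = fopen(path, "r");
            if (!f)
                    return -1;
            if (fscanf(f, "%ld", &val) != 1)
                    val = -1;
            fclose(f);
            return val;
    }

    int main(void)
    {
            long sharing = read_ksm_counter("pages_sharing");
            long shared = read_ksm_counter("pages_shared");
            long unshared = read_ksm_counter("pages_unshared");
            long volatile_pgs = read_ksm_counter("pages_volatile");
            /* all_rmap_items is the sum of the four counters above */
            long all_rmap_items = sharing + shared + unshared + volatile_pgs;
            long rmap_item_size = 64;   /* assumption: 64-bit architecture */

            printf("general_profit ~= %ld bytes\n",
                   sharing * sysconf(_SC_PAGESIZE) -
                   all_rmap_items * rmap_item_size);
            return 0;
    }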
--
Izik Eidus,
Hugh Dickins, 17 Nov 2009
...@@ -2573,6 +2573,13 @@ int gmap_mark_unmergeable(void)
struct vm_area_struct *vma;
int ret;
/*
* Make sure to disable KSM (if enabled for the whole process or
* individual VMAs). Note that nothing currently hinders user space
* from re-enabling it.
*/
clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
for (vma = mm->mmap; vma; vma = vma->vm_next) {
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
MADV_UNMERGEABLE, &vma->vm_flags);
......
...@@ -97,6 +97,7 @@
#include <linux/time_namespace.h>
#include <linux/resctrl.h>
#include <linux/share_pool.h>
#include <linux/ksm.h>
#include <trace/events/oom.h>
#include "internal.h"
#include "fd.h"
...@@ -3341,6 +3342,37 @@ static int proc_pid_patch_state(struct seq_file *m, struct pid_namespace *ns,
}
#endif /* CONFIG_LIVEPATCH */
#ifdef CONFIG_KSM
static int proc_pid_ksm_merging_pages(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
struct mm_struct *mm;
mm = get_task_mm(task);
if (mm) {
seq_printf(m, "%lu\n", mm->ksm_merging_pages);
mmput(mm);
}
return 0;
}
static int proc_pid_ksm_stat(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
struct mm_struct *mm;
mm = get_task_mm(task);
if (mm) {
seq_printf(m, "ksm_rmap_items %lu\n", mm->ksm_rmap_items);
seq_printf(m, "ksm_merging_pages %lu\n", mm->ksm_merging_pages);
seq_printf(m, "ksm_process_profit %ld\n", ksm_process_profit(mm));
mmput(mm);
}
return 0;
}
#endif /* CONFIG_KSM */
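
As an illustrative aside (not part of this patch), a process can dump its
own statistics from the new file via ``/proc/self/ksm_stat``; a minimal
sketch::

    #include <stdio.h>

    int main(void)
    {
            char line[128];
            FILE *f = fopen("/proc/self/ksm_stat", "r");

            if (!f) {
                    perror("fopen");
                    return 1;
            }
            /* expected keys, one per line as printed above:
             * ksm_rmap_items, ksm_merging_pages, ksm_process_profit */
            while (fgets(line, sizeof(line), f))
                    fputs(line, stdout);
            fclose(f);
            return 0;
    }

Note the entries are created with S_IRUSR, so only the file owner (and root)
can read them.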
#ifdef CONFIG_STACKLEAK_METRICS
static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
...@@ -3482,6 +3514,10 @@ static const struct pid_entry tgid_base_stuff[] = {
#ifdef CONFIG_ASCEND_SHARE_POOL
ONE("sp_group", 0444, proc_sp_group_state),
#endif
#ifdef CONFIG_KSM
ONE("ksm_merging_pages", S_IRUSR, proc_pid_ksm_merging_pages),
ONE("ksm_stat", S_IRUSR, proc_pid_ksm_stat),
#endif
};
static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
...@@ -3893,6 +3929,10 @@ static const struct pid_entry tid_base_stuff[] = {
#ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY
REG("preferred_cpuset", 0644, proc_preferred_cpuset_operations),
#endif
#ifdef CONFIG_KSM
ONE("ksm_merging_pages", S_IRUSR, proc_pid_ksm_merging_pages),
ONE("ksm_stat", S_IRUSR, proc_pid_ksm_stat),
#endif
};
static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx)
......
...@@ -21,13 +21,27 @@ struct mem_cgroup;
#ifdef CONFIG_KSM
int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
unsigned long end, int advice, unsigned long *vm_flags);
void ksm_add_vma(struct vm_area_struct *vma);
int ksm_enable_merge_any(struct mm_struct *mm);
int ksm_disable_merge_any(struct mm_struct *mm);
int __ksm_enter(struct mm_struct *mm);
void __ksm_exit(struct mm_struct *mm);
static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
int ret;
if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags)) {
ret = __ksm_enter(mm);
if (ret)
return ret;
}
if (test_bit(MMF_VM_MERGE_ANY, &oldmm->flags))
set_bit(MMF_VM_MERGE_ANY, &mm->flags);
return 0;
}
...@@ -54,8 +68,16 @@ struct page *ksm_might_need_to_copy(struct page *page,
void rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc);
void ksm_migrate_page(struct page *newpage, struct page *oldpage);
#ifdef CONFIG_PROC_FS
long ksm_process_profit(struct mm_struct *);
#endif /* CONFIG_PROC_FS */
#else /* !CONFIG_KSM */
static inline void ksm_add_vma(struct vm_area_struct *vma)
{
}
static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
return 0;
......
...@@ -392,7 +392,11 @@ struct mem_cgroup {
KABI_RESERVE(3)
KABI_RESERVE(4)
#endif
#ifdef CONFIG_KSM
KABI_USE(5, bool ksm_merge_any)
#else
KABI_RESERVE(5)
#endif
KABI_RESERVE(6)
KABI_RESERVE(7)
KABI_RESERVE(8)
......
...@@ -623,8 +623,21 @@ struct mm_struct {
#else
KABI_RESERVE(1)
#endif
#ifdef CONFIG_KSM
/*
* Represent how many pages of this process are involved in KSM
* merging.
*/
KABI_USE(2, unsigned long ksm_merging_pages)
/*
* Represent how many pages are checked for ksm merging
* including merged and not merged.
*/
KABI_USE(3, unsigned long ksm_rmap_items)
#else
KABI_RESERVE(2)
KABI_RESERVE(3)
#endif
KABI_RESERVE(4)
KABI_RESERVE(5)
KABI_RESERVE(6)
......
...@@ -78,4 +78,5 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
MMF_DISABLE_THP_MASK)
#define MMF_VM_MERGE_ANY 29
#endif /* _LINUX_SCHED_COREDUMP_H */
...@@ -258,4 +258,6 @@ struct prctl_mm_map {
# define PR_SCHED_CORE_SCOPE_THREAD_GROUP 1
# define PR_SCHED_CORE_SCOPE_PROCESS_GROUP 2
#define PR_SET_MEMORY_MERGE 67
#define PR_GET_MEMORY_MERGE 68
#endif /* _LINUX_PRCTL_H */
...@@ -14,6 +14,7 @@
#include <linux/highuid.h>
#include <linux/fs.h>
#include <linux/kmod.h>
#include <linux/ksm.h>
#include <linux/perf_event.h>
#include <linux/resource.h>
#include <linux/kernel.h>
...@@ -2529,6 +2530,26 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_SCHED_CORE:
error = sched_core_share_pid(arg2, arg3, arg4, arg5);
break;
#endif
#ifdef CONFIG_KSM
case PR_SET_MEMORY_MERGE:
if (arg3 || arg4 || arg5)
return -EINVAL;
if (mmap_write_lock_killable(me->mm))
return -EINTR;
if (arg2)
error = ksm_enable_merge_any(me->mm);
else
error = ksm_disable_merge_any(me->mm);
mmap_write_unlock(me->mm);
break;
case PR_GET_MEMORY_MERGE:
if (arg2 || arg3 || arg4 || arg5)
return -EINVAL;
error = !!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
break;
#endif
default:
error = -EINVAL;
......
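For reference, userspace opts a process in or out of KSM via this new
interface roughly as below. This is a hedged sketch: the constant values 67
and 68 come from the uapi header change above, and the unused arguments must
be zero or the kernel returns -EINVAL::

    #include <stdio.h>
    #include <sys/prctl.h>

    #ifndef PR_SET_MEMORY_MERGE
    #define PR_SET_MEMORY_MERGE 67
    #define PR_GET_MEMORY_MERGE 68
    #endif

    int main(void)
    {
            /* request MMF_VM_MERGE_ANY for the calling process */
            if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0))
                    perror("PR_SET_MEMORY_MERGE");

            /* returns 1 if the flag is set, 0 otherwise */
            printf("memory merge: %d\n",
                   (int)prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0));
            return 0;
    }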
...@@ -389,6 +389,7 @@ static inline struct rmap_item *alloc_rmap_item(void)
static inline void free_rmap_item(struct rmap_item *rmap_item)
{
ksm_rmap_items--;
rmap_item->mm->ksm_rmap_items--;
rmap_item->mm = NULL; /* debug safety */
kmem_cache_free(rmap_item_cache, rmap_item);
}
...@@ -518,6 +519,32 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
return (ret & VM_FAULT_OOM) ? -ENOMEM : 0;
}
static bool vma_ksm_compatible(struct vm_area_struct *vma)
{
if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE | VM_PFNMAP |
VM_IO | VM_DONTEXPAND | VM_HUGETLB |
VM_MIXEDMAP))
return false; /* just ignore the advice */
if (vma_is_dax(vma))
return false;
#ifdef CONFIG_COHERENT_DEVICE
if (is_cdm_vma(vma))
return false;
#endif
#ifdef VM_SAO
if (vma->vm_flags & VM_SAO)
return false;
#endif
#ifdef VM_SPARC_ADI
if (vma->vm_flags & VM_SPARC_ADI)
return false;
#endif
return true;
}
static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
unsigned long addr)
{
...@@ -642,6 +669,9 @@ static void remove_node_from_stable_tree(struct stable_node *stable_node)
ksm_pages_sharing--;
else
ksm_pages_shared--;
rmap_item->mm->ksm_merging_pages--;
VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
stable_node->rmap_hlist_len--;
put_anon_vma(rmap_item->anon_vma);
...@@ -791,6 +821,9 @@ static void remove_rmap_item_from_tree(struct rmap_item *rmap_item)
ksm_pages_sharing--;
else
ksm_pages_shared--;
rmap_item->mm->ksm_merging_pages--;
VM_BUG_ON(stable_node->rmap_hlist_len <= 0);
stable_node->rmap_hlist_len--;
...@@ -1004,6 +1037,7 @@ static int unmerge_and_remove_all_rmap_items(void)
free_mm_slot(mm_slot);
clear_bit(MMF_VM_MERGEABLE, &mm->flags);
clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
mmdrop(mm);
} else
spin_unlock(&ksm_mmlist_lock);
...@@ -2026,6 +2060,8 @@ static void stable_tree_append(struct rmap_item *rmap_item,
ksm_pages_sharing++;
else
ksm_pages_shared++;
rmap_item->mm->ksm_merging_pages++;
}
/*
...@@ -2219,6 +2255,7 @@ static struct rmap_item *get_next_rmap_item(struct mm_slot *mm_slot,
if (rmap_item) {
/* It has already been zeroed */
rmap_item->mm = mm_slot->mm;
rmap_item->mm->ksm_rmap_items++;
rmap_item->address = addr;
rmap_item->rmap_list = *rmap_list;
*rmap_list = rmap_item;
...@@ -2362,6 +2399,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
free_mm_slot(slot);
clear_bit(MMF_VM_MERGEABLE, &mm->flags);
clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
mmap_read_unlock(mm);
mmdrop(mm);
} else {
...@@ -2438,6 +2476,126 @@ static int ksm_scan_thread(void *nothing)
return 0;
}
static void __ksm_add_vma(struct vm_area_struct *vma)
{
unsigned long vm_flags = vma->vm_flags;
if (vm_flags & VM_MERGEABLE)
return;
if (vma_ksm_compatible(vma)) {
mmap_assert_write_locked(vma->vm_mm);
vma->vm_flags |= VM_MERGEABLE;
}
}
static int __ksm_del_vma(struct vm_area_struct *vma)
{
int err;
if (!(vma->vm_flags & VM_MERGEABLE))
return 0;
if (vma->anon_vma) {
err = unmerge_ksm_pages(vma, vma->vm_start, vma->vm_end);
if (err)
return err;
}
mmap_assert_write_locked(vma->vm_mm);
vma->vm_flags &= ~VM_MERGEABLE;
return 0;
}
/**
* ksm_add_vma - Mark vma as mergeable if compatible
*
* @vma: Pointer to vma
*/
void ksm_add_vma(struct vm_area_struct *vma)
{
struct mm_struct *mm = vma->vm_mm;
if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
__ksm_add_vma(vma);
}
static void ksm_add_vmas(struct mm_struct *mm)
{
struct vm_area_struct *vma;
for (vma = mm->mmap; vma; vma = vma->vm_next)
__ksm_add_vma(vma);
}
static int ksm_del_vmas(struct mm_struct *mm)
{
struct vm_area_struct *vma;
int err;
for (vma = mm->mmap; vma; vma = vma->vm_next) {
err = __ksm_del_vma(vma);
if (err)
return err;
}
return 0;
}
/**
* ksm_enable_merge_any - Add mm to mm ksm list and enable merging on all
* compatible VMA's
*
* @mm: Pointer to mm
*
* Returns 0 on success, otherwise error code
*/
int ksm_enable_merge_any(struct mm_struct *mm)
{
int err;
if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
return 0;
if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
err = __ksm_enter(mm);
if (err)
return err;
}
set_bit(MMF_VM_MERGE_ANY, &mm->flags);
ksm_add_vmas(mm);
return 0;
}
/**
* ksm_disable_merge_any - Disable merging on all compatible VMA's of the mm,
* previously enabled via ksm_enable_merge_any().
*
* Disabling merging implies unmerging any merged pages, like setting
* MADV_UNMERGEABLE would. If unmerging fails, the whole operation fails and
* merging on all compatible VMA's remains enabled.
*
* @mm: Pointer to mm
*
* Returns 0 on success, otherwise error code
*/
int ksm_disable_merge_any(struct mm_struct *mm)
{
int err;
if (!test_bit(MMF_VM_MERGE_ANY, &mm->flags))
return 0;
err = ksm_del_vmas(mm);
if (err) {
ksm_add_vmas(mm);
return err;
}
clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
return 0;
}
int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
unsigned long end, int advice, unsigned long *vm_flags)
{
...@@ -2446,30 +2604,11 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
switch (advice) {
case MADV_MERGEABLE:
if (vma->vm_flags & VM_MERGEABLE)
return 0;
if (!vma_ksm_compatible(vma))
return 0;
if (!test_bit(MMF_VM_MERGEABLE, &mm->flags)) {
err = __ksm_enter(mm);
...@@ -2567,6 +2706,7 @@ void __ksm_exit(struct mm_struct *mm)
if (easy_to_free) {
free_mm_slot(mm_slot);
clear_bit(MMF_VM_MERGE_ANY, &mm->flags);
clear_bit(MMF_VM_MERGEABLE, &mm->flags);
mmdrop(mm);
} else if (mm_slot) {
...@@ -2828,6 +2968,14 @@ static void wait_while_offlining(void)
}
#endif /* CONFIG_MEMORY_HOTREMOVE */
#ifdef CONFIG_PROC_FS
long ksm_process_profit(struct mm_struct *mm)
{
return mm->ksm_merging_pages * PAGE_SIZE -
mm->ksm_rmap_items * sizeof(struct rmap_item);
}
#endif /* CONFIG_PROC_FS */
#ifdef CONFIG_SYSFS
/*
* This all compiles without CONFIG_SYSFS, but is a waste of space.
...@@ -3093,6 +3241,18 @@ static ssize_t pages_volatile_show(struct kobject *kobj,
}
KSM_ATTR_RO(pages_volatile);
static ssize_t general_profit_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
long general_profit;
general_profit = ksm_pages_sharing * PAGE_SIZE -
ksm_rmap_items * sizeof(struct rmap_item);
return sysfs_emit(buf, "%ld\n", general_profit);
}
KSM_ATTR_RO(general_profit);
static ssize_t stable_node_dups_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
...@@ -3157,6 +3317,7 @@ static struct attribute *ksm_attrs[] = {
&stable_node_dups_attr.attr,
&stable_node_chains_prune_millisecs_attr.attr,
&use_zero_pages_attr.attr,
&general_profit_attr.attr,
NULL,
};
......
...@@ -71,6 +71,9 @@
#include <linux/uaccess.h>
#include <trace/events/vmscan.h>
#ifndef __GENKSYMS__
#include <linux/ksm.h>
#endif
struct cgroup_subsys memory_cgrp_subsys __read_mostly;
EXPORT_SYMBOL(memory_cgrp_subsys);
...@@ -250,10 +253,15 @@ enum res_type {
iter != NULL; \
iter = mem_cgroup_iter(NULL, iter, NULL))
static inline bool __task_is_dying(struct task_struct *task)
{
return tsk_is_oom_victim(task) || fatal_signal_pending(task) ||
(task->flags & PF_EXITING);
}
static inline bool task_is_dying(void)
{
return __task_is_dying(current);
}
/* Some nice accessors for the vmpressure. */
...@@ -5331,6 +5339,104 @@ static ssize_t memcg_high_async_ratio_write(struct kernfs_open_file *of,
return nbytes;
}
#ifdef CONFIG_KSM
static int memcg_set_ksm_for_tasks(struct mem_cgroup *memcg, bool enable)
{
struct task_struct *task;
struct mm_struct *mm;
struct css_task_iter it;
int ret = 0;
if (enable == READ_ONCE(memcg->ksm_merge_any))
return 0;
css_task_iter_start(&memcg->css, CSS_TASK_ITER_PROCS, &it);
while (!ret && (task = css_task_iter_next(&it))) {
if (__task_is_dying(task))
continue;
mm = get_task_mm(task);
if (!mm)
continue;
if (mmap_write_lock_killable(mm)) {
mmput(mm);
continue;
}
if (enable)
ret = ksm_enable_merge_any(mm);
else
ret = ksm_disable_merge_any(mm);
mmap_write_unlock(mm);
mmput(mm);
}
css_task_iter_end(&it);
return ret;
}
static int memory_ksm_show(struct seq_file *m, void *v)
{
unsigned long ksm_merging_pages = 0;
unsigned long ksm_rmap_items = 0;
long ksm_process_profits = 0;
unsigned int tasks = 0;
struct task_struct *task;
struct mm_struct *mm;
struct css_task_iter it;
struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
css_task_iter_start(&memcg->css, CSS_TASK_ITER_PROCS, &it);
while ((task = css_task_iter_next(&it))) {
mm = get_task_mm(task);
if (!mm)
continue;
if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
tasks++;
ksm_rmap_items += mm->ksm_rmap_items;
ksm_merging_pages += mm->ksm_merging_pages;
ksm_process_profits += ksm_process_profit(mm);
mmput(mm);
}
css_task_iter_end(&it);
seq_printf(m, "merge any state: %d\n", READ_ONCE(memcg->ksm_merge_any));
seq_printf(m, "merge any tasks: %u\n", tasks);
seq_printf(m, "ksm_rmap_items %lu\n", ksm_rmap_items);
seq_printf(m, "ksm_merging_pages %lu\n", ksm_merging_pages);
seq_printf(m, "ksm_process_profits %ld\n", ksm_process_profits);
return 0;
}
static ssize_t memory_ksm_write(struct kernfs_open_file *of, char *buf,
size_t nbytes, loff_t off)
{
bool enable;
int err;
struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
buf = strstrip(buf);
if (!buf)
return -EINVAL;
err = kstrtobool(buf, &enable);
if (err)
return err;
err = memcg_set_ksm_for_tasks(memcg, enable);
if (err)
return err;
WRITE_ONCE(memcg->ksm_merge_any, enable);
return nbytes;
}
#endif /* CONFIG_KSM */
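
As a usage sketch (the cgroup v1 mount point and group name here are
assumptions for illustration), enabling KSM for every task in a memory
cgroup comes down to writing a boolean to the new control file::

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* hypothetical group; the file is registered below in
             * mem_cgroup_legacy_files, i.e. the cgroup v1 hierarchy */
            const char *path = "/sys/fs/cgroup/memory/mygroup/memory.ksm";
            int fd = open(path, O_WRONLY);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            if (write(fd, "1", 1) != 1)   /* kstrtobool() accepts 1/0/y/n */
                    perror("write");
            close(fd);
            return 0;
    }

Reading the same file back returns the aggregate statistics printed by
memory_ksm_show() above.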
#ifdef CONFIG_CGROUP_V1_WRITEBACK
#include "../kernel/cgroup/cgroup-internal.h"
...@@ -5615,6 +5721,14 @@ static struct cftype mem_cgroup_legacy_files[] = {
.seq_show = wb_blkio_show,
.write = wb_blkio_write,
},
#endif
#ifdef CONFIG_KSM
{
.name = "ksm",
.flags = CFTYPE_NOT_ON_ROOT,
.write = memory_ksm_write,
.seq_show = memory_ksm_show,
},
#endif
{ }, /* terminate */
};
...@@ -5858,7 +5972,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
if (parent != root_mem_cgroup)
memory_cgrp_subsys.broken_hierarchy = true;
}
#ifdef CONFIG_KSM
memcg->ksm_merge_any = false;
#endif
/* The following stuff does not apply to the root */
if (!parent) {
root_mem_cgroup = memcg;
......
...@@ -49,6 +49,7 @@
#include <linux/sched/mm.h>
#include <linux/swapops.h>
#include <linux/share_pool.h>
#include <linux/ksm.h>
#include <linux/uaccess.h>
#include <asm/cacheflush.h>
...@@ -2131,6 +2132,7 @@ static unsigned long __mmap_region(struct mm_struct *mm, struct file *file,
allow_write_access(file);
}
file = vma->vm_file;
ksm_add_vma(vma);
out:
perf_event_mmap(vma);
...@@ -3436,6 +3438,7 @@ static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long fla
vma->vm_flags = flags;
vma->vm_page_prot = vm_get_page_prot(flags);
vma_link(mm, vma, prev, rb_link, rb_parent);
ksm_add_vma(vma);
out:
perf_event_mmap(vma);
mm->total_vm += len >> PAGE_SHIFT;
......