1. 08 Jan, 2022 1 commit
  2. 30 Dec, 2021 1 commit
  3. 13 Oct, 2021 1 commit
  4. 31 Aug, 2021 1 commit
    • mm: add pin memory method for checkpoint add restore · 7dc4c73d
      Jingxian He authored
      hulk inclusion
      category: feature
      bugzilla: 48159
      CVE: N/A
      
      ------------------------------
      
      We can use the checkpoint and restore in userspace (criu) method to dump
      and restore tasks when updating the kernel.
      Currently, criu needs to dump all memory data of tasks to files.
      When the memory size is very large (larger than 1 GB),
      dumping the data takes a very long time (more than 1 minute).
      
      By pinning the memory data of tasks and collecting the corresponding
      physical page mapping info in the checkpoint process, we can remap the
      physical pages to the restored tasks after upgrading the kernel. This
      pin memory method can restore the task data within one second.
      
      The pin memory area info is saved in a reserved memblock,
      which remains usable across the kernel update process.
      
      The pin memory driver provides the following ioctl commands for criu
      (see the usage sketch below):
      1) SET_PIN_MEM_AREA:
      Set a pin memory area, which can be remapped to the restored task.
      2) CLEAR_PIN_MEM_AREA:
      Clear the pin memory area info,
      which enables the user to reset the pin data.
      3) REMAP_PIN_MEM_AREA:
      Remap the pages of the pinned memory to the restored task.
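
      As a usage sketch, criu's checkpoint side might drive SET_PIN_MEM_AREA
      as below. This is illustrative only: the device node name, the ioctl
      numbers and the pin_mem_area layout are assumptions, not taken from the
      driver's actual uapi header.

        #include <fcntl.h>
        #include <sys/ioctl.h>
        #include <unistd.h>

        /* Hypothetical definitions; the real driver's header supplies these. */
        struct pin_mem_area {
        	int pid;                  /* task whose pages get pinned */
        	unsigned long virt_start; /* start of the area to pin */
        	unsigned long virt_end;   /* end of the area to pin */
        };
        #define PIN_MEM_MAGIC      'p'
        #define SET_PIN_MEM_AREA   _IOW(PIN_MEM_MAGIC, 1, struct pin_mem_area)
        #define REMAP_PIN_MEM_AREA _IOW(PIN_MEM_MAGIC, 3, int)

        /* Checkpoint side: mark one task area as pinned. */
        static int pin_task_area(int pid, unsigned long start, unsigned long end)
        {
        	struct pin_mem_area pma = { pid, start, end };
        	int fd = open("/dev/pinmem", O_RDWR); /* assumed node name */
        	int ret;

        	if (fd < 0)
        		return -1;
        	ret = ioctl(fd, SET_PIN_MEM_AREA, &pma);
        	close(fd);
        	return ret;
        }

      On the restore side, REMAP_PIN_MEM_AREA would then be issued for the
      restored task instead of reading the memory contents back from files.
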
      Signed-off-by: Jingxian He <hejingxian@huawei.com>
      Reviewed-by: Chen Wandun <chenwandun@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  5. 19 Jul, 2021 1 commit
  6. 14 Apr, 2021 2 commits
  7. 09 Apr, 2021 1 commit
  8. 28 Jan, 2021 2 commits
  9. 12 Dec, 2020 1 commit
    • proc: use untagged_addr() for pagemap_read addresses · 40d6366e
      Miles Chen authored
      When we try to access the pagemap of a tagged userspace pointer, we find
      that start_vaddr is not correct because of the tag.
      To fix it, we should untag the userspace pointers in pagemap_read().

      I tested with 5.10-rc4 and the issue remains.
      
      Explanation from Catalin in [1]:
      
       "Arguably, that's a user-space bug since tagged file offsets were never
        supported. In this case it's not even a tag at bit 56 as per the arm64
        tagged address ABI but rather down to bit 47. You could say that the
        problem is caused by the C library (malloc()) or whoever created the
        tagged vaddr and passed it to this function. It's not a kernel
        regression as we've never supported it.
      
        Now, pagemap is a special case where the offset is usually not
        generated as a classic file offset but rather derived by shifting a
        user virtual address. I guess we can make a concession for pagemap
        (only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)"
      
      My test code is based on [2]:
      
      A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8
      
      userspace program:
      
        uint64 OsLayer::VirtualToPhysical(void *vaddr) {
      	uint64 frame, paddr, pfnmask, pagemask;
      	int pagesize = sysconf(_SC_PAGESIZE);
      	off64_t off = ((uintptr_t)vaddr) / pagesize * 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0
      	int fd = open(kPagemapPath, O_RDONLY);
      	...
      
      	if (lseek64(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) {
      		int err = errno;
      		string errtxt = ErrorString(err);
      		if (fd >= 0)
      			close(fd);
      		return 0;
      	}
        ...
        }
      
      kernel fs/proc/task_mmu.c:
      
        static ssize_t pagemap_read(struct file *file, char __user *buf,
      		size_t count, loff_t *ppos)
        {
      	...
      	src = *ppos;
      	svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54
      	start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000
      	end_vaddr = mm->task_size;
      
      	/* watch out for wraparound */
      	// svpfn == 0xb400007662f54
      	// (mm->task_size >> PAGE_SHIFT) == 0x8000000
      	if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4
      		start_vaddr = end_vaddr;
      
      	ret = 0;
      	while (count && (start_vaddr < end_vaddr)) { // we never reach the correct entry because start_vaddr was set to end_vaddr
      		int len;
      		unsigned long end;
      		...
      	}
      	...
        }
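
      The fix is to untag the address derived from the file offset before
      using it. A sketch of the change in pagemap_read() (abbreviated; the
      guard conditions follow the upstream patch):

        src = *ppos;
        svpfn = src / PM_ENTRY_BYTES;
        end_vaddr = mm->task_size;

        /* watch out for wraparound */
        start_vaddr = end_vaddr;
        if (svpfn <= (ULONG_MAX >> PAGE_SHIFT))
        	start_vaddr = untagged_addr(svpfn << PAGE_SHIFT);

        /* Ensure the address is inside the task */
        if (start_vaddr > mm->task_size)
        	start_vaddr = end_vaddr;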
      
      [1] https://lore.kernel.org/patchwork/patch/1343258/
      [2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158
      
      Link: https://lkml.kernel.org/r/20201204024347.8295-1-miles.chen@mediatek.com
      Signed-off-by: Miles Chen <miles.chen@mediatek.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
      Cc: <stable@vger.kernel.org>	[5.4-]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 17 Oct, 2020 1 commit
  11. 14 Oct, 2020 3 commits
  12. 04 Sep, 2020 1 commit
    • arm64: mte: Add PROT_MTE support to mmap() and mprotect() · 9f341931
      Catalin Marinas authored
      To enable tagging on a memory range, the user must explicitly opt in via
      a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
      memory type in the AttrIndx field of a pte, simplify the or'ing of these
      bits over the protection_map[] attributes by making MT_NORMAL index 0.
      
      There are two conditions for arch_vm_get_page_prot() to return the
      MT_NORMAL_TAGGED memory type: (1) the user requested it via PROT_MTE,
      registered as VM_MTE in the vm_flags, and (2) the vma supports MTE,
      decided during the mmap() call (only) and registered as VM_MTE_ALLOWED.
      
      arch_calc_vm_prot_bits() is responsible for registering the user request
      as VM_MTE. The newly introduced arch_calc_vm_flag_bits() sets
      VM_MTE_ALLOWED if the mapping is MAP_ANONYMOUS. An MTE-capable
      filesystem (RAM-based) may be able to set VM_MTE_ALLOWED during its
      mmap() file ops call.
      
      In addition, update VM_DATA_DEFAULT_FLAGS to allow mprotect(PROT_MTE) on
      stack or brk area.
      
      The Linux mmap() syscall currently ignores unknown PROT_* flags. In the
      presence of MTE, an mmap(PROT_MTE) on a file which does not support MTE
      will not report an error and the memory will not be mapped as Normal
      Tagged. For consistency, mprotect(PROT_MTE) will not report an error
      either if the memory range does not support MTE. Two subsequent patches
      in the series will propose tightening of this behaviour.
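
      For illustration, opting in from userspace looks like the sketch below.
      PROT_MTE comes from the arm64 uapi headers; the 0x20 fallback is an
      assumption for toolchains whose headers predate MTE.

        #include <sys/mman.h>

        #ifndef PROT_MTE
        #define PROT_MTE 0x20 /* arm64-specific; normally from <asm/mman.h> */
        #endif

        int main(void)
        {
        	/* Anonymous mapping: VM_MTE_ALLOWED is set, so PROT_MTE takes effect. */
        	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_MTE,
        		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        	if (p == MAP_FAILED)
        		return 1;

        	/* Tagging can also be requested later on an existing range. */
        	return mprotect(p, 4096, PROT_READ | PROT_WRITE | PROT_MTE) ? 1 : 0;
        }
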
      Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
  13. 13 Aug, 2020 1 commit
  14. 10 Jun, 2020 2 commits
  15. 03 Jun, 2020 1 commit
    • /proc/PID/smaps: Add PMD migration entry parsing · c94b6923
      Huang Ying authored
      Currently, when reading /proc/PID/smaps, a PMD migration entry in the
      page table is simply ignored.  To improve the accuracy of
      /proc/PID/smaps, add parsing and processing for it.
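
      A sketch of the resulting logic in smaps_pmd_entry(), simplified from
      the patch: when the PMD is not present, decode a possible migration
      entry to find the page instead of ignoring it.

        struct page *page = NULL;

        if (pmd_present(*pmd)) {
        	/* FOLL_DUMP will return -EFAULT on huge zero page */
        	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
        } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
        	swp_entry_t entry = pmd_to_swp_entry(*pmd);

        	if (is_migration_entry(entry))
        		page = migration_entry_to_page(entry);
        }
        if (IS_ERR_OR_NULL(page))
        	return;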
      
      To test the patch, we ran pmbench to consume 400 MB of memory in the
      background, then ran /usr/bin/migratepages and `cat /proc/PID/smaps`
      every second.  The following issue can be reproduced within 60 seconds.
      
      Before the patch, for the fully populated 400 MB anonymous VMA, some THP
      pages under migration may be lost, as below.
      
        7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
        Size:             409600 kB
        KernelPageSize:        4 kB
        MMUPageSize:           4 kB
        Rss:              407552 kB
        Pss:              407552 kB
        Shared_Clean:          0 kB
        Shared_Dirty:          0 kB
        Private_Clean:         0 kB
        Private_Dirty:    407552 kB
        Referenced:       301056 kB
        Anonymous:        407552 kB
        LazyFree:              0 kB
        AnonHugePages:    405504 kB
        ShmemPmdMapped:        0 kB
        FilePmdMapped:        0 kB
        Shared_Hugetlb:        0 kB
        Private_Hugetlb:       0 kB
        Swap:                  0 kB
        SwapPss:               0 kB
        Locked:                0 kB
        THPeligible:		1
        VmFlags: rd wr mr mw me ac
      
      After the patch, it is always:
      
        7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
        Size:             409600 kB
        KernelPageSize:        4 kB
        MMUPageSize:           4 kB
        Rss:              409600 kB
        Pss:              409600 kB
        Shared_Clean:          0 kB
        Shared_Dirty:          0 kB
        Private_Clean:         0 kB
        Private_Dirty:    409600 kB
        Referenced:       294912 kB
        Anonymous:        409600 kB
        LazyFree:              0 kB
        AnonHugePages:    407552 kB
        ShmemPmdMapped:        0 kB
        FilePmdMapped:        0 kB
        Shared_Hugetlb:        0 kB
        Private_Hugetlb:       0 kB
        Swap:                  0 kB
        SwapPss:               0 kB
        Locked:                0 kB
        THPeligible:		1
        VmFlags: rd wr mr mw me ac
      Signed-off-by: N"Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NZi Yan <ziy@nvidia.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Yang Shi <yang.shi@linux.alibaba.com>
      Link: http://lkml.kernel.org/r/20200403123059.1846960-1-ying.huang@intel.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 23 Apr, 2020 1 commit
  17. 08 Apr, 2020 4 commits
  18. 17 Mar, 2020 1 commit
  19. 04 Feb, 2020 1 commit
    • mm: pagewalk: add 'depth' parameter to pte_hole · b7a16c7a
      Steven Price authored
      The pte_hole() callback is called at multiple levels of the page tables.
      Code dumping the kernel page tables needs to know at what depth the
      missing entry is.  Add this as an extra parameter to pte_hole().  When
      the depth isn't known (e.g.  processing a vma) then -1 is passed.
      
      The depth that is reported is the actual level where the entry is missing
      (ignoring any folding that is in place), i.e.  any levels where
      PTRS_PER_P?D is set to 1 are ignored.
      
      Note that depth starts at 0 for a PGD so that PUD/PMD/PTE retain their
      natural numbers as levels 2/3/4.
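
      A walker callback using the new parameter might look like this sketch
      (the pr_info reporting is illustrative, not taken from the series):

        static int ptdump_hole(unsigned long addr, unsigned long next,
        		       int depth, struct mm_walk *walk)
        {
        	/* depth: 0 = PGD, 1 = P4D, 2 = PUD, 3 = PMD, 4 = PTE; -1 = unknown */
        	static const char * const level[] =
        		{ "PGD", "P4D", "PUD", "PMD", "PTE" };

        	if (depth >= 0)
        		pr_info("hole at %s level: %#lx-%#lx\n",
        			level[depth], addr, next);
        	return 0;
        }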
      
      Link: http://lkml.kernel.org/r/20191218162402.45610-16-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Tested-by: Zong Li <zong.li@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 25 Sep, 2019 2 commits
  21. 07 Sep, 2019 2 commits
  22. 19 Jul, 2019 1 commit
    • mm: thp: fix false negative of shmem vma's THP eligibility · c0630669
      Yang Shi authored
      Commit 7635d9cb ("mm, thp, proc: report THP eligibility for each
      vma") introduced the THPeligible bit in processes' smaps.  But, when
      checking the eligibility of a shmem vma, __transparent_hugepage_enabled()
      is called to override the result from shmem_huge_enabled().  This may
      result in the anonymous vma's THP flag overriding shmem's.  For example,
      running a simple test which creates THP for shmem, but with anonymous
      THP disabled, reading the process's smaps may show:
      
        7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
        Size:               4096 kB
        ...
        [snip]
        ...
        ShmemPmdMapped:     4096 kB
        ...
        [snip]
        ...
        THPeligible:    0
      
      And, /proc/meminfo does show THP allocated and PMD mapped too:
      
        ShmemHugePages:     4096 kB
        ShmemPmdMapped:     4096 kB
      
      This doesn't make much sense.  The shmem objects should be treated
      separately from anonymous THP.  Calling shmem_huge_enabled() while
      checking MMF_DISABLE_THP sounds good enough.  And we can skip the stack
      and dax vma checks since we have already checked that the vma is shmem.

      Also check whether the vma is suitable for THP by calling
      transhuge_vma_suitable().

      There is also a minor fix to the smaps output format and documentation.
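
      The resulting check, sketched from the patch (the addr computation
      merely picks an address inside the vma for the size check):

        bool transparent_hugepage_enabled(struct vm_area_struct *vma)
        {
        	/* The addr is used to check if the vma size fits */
        	unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) -
        			     HPAGE_PMD_SIZE;

        	if (!transhuge_vma_suitable(vma, addr))
        		return false;
        	if (vma_is_anonymous(vma))
        		return __transparent_hugepage_enabled(vma);
        	/* shmem decides for itself; MMF_DISABLE_THP is checked in there */
        	if (vma_is_shmem(vma))
        		return shmem_huge_enabled(vma);

        	return false;
        }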
      
      Link: http://lkml.kernel.org/r/1560401041-32207-3-git-send-email-yang.shi@linux.alibaba.com
      Fixes: 7635d9cb ("mm, thp, proc: report THP eligibility for each vma")
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 13 Jul, 2019 5 commits
  24. 03 Jul, 2019 1 commit
  25. 15 May, 2019 2 commits
    • mm/mmu_notifier: use correct mmu_notifier events for each invalidation · 7269f999
      Jérôme Glisse authored
      This updates each existing invalidation to use the correct mmu notifier
      event that represents what is happening to the CPU page table.  See the
      patch which introduced the events for the rationale behind this.
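
      For example (a sketch using the events introduced earlier in the
      series), a protection change and a zap now pass events matching what
      happens to the page table instead of the blanket MMU_NOTIFY_UNMAP:

        /* change_protection() path: pages stay, their protection changes */
        mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA, 0,
        			vma, vma->vm_mm, start, end);

        /* zap path: the mappings for the range go away */
        mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0,
        			vma, vma->vm_mm, start, end);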
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-7-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmu_notifier: contextual information for event triggering invalidation · 6f4f13e8
      Jérôme Glisse authored
      CPU page table updates can happen for many reasons, not only as a result
      of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
      as a result of kernel activities (memory compression, reclaim,
      migration, ...).

      Users of the mmu notifier API track changes to the CPU page table and
      take specific actions for them.  However, the current API only provides
      the range of virtual addresses affected by the change, not why the
      change is happening.

      This patchset does the initial mechanical conversion of all the places
      that call mmu_notifier_range_init to also provide the default
      MMU_NOTIFY_UNMAP event as well as the vma if it is known (most
      invalidations happen against a given vma).  Passing down the vma allows
      the users of mmu notifier to inspect the new vma page protection.

      MMU_NOTIFY_UNMAP is always the safe default, as users of mmu notifier
      should assume that every mapping for the range is going away when that
      event happens.  A later patch converts the mm call paths to use more
      appropriate events for each call.

      This is done as 2 patches so that no call site is forgotten, especially
      as it uses the following coccinelle patch:
      
      %<----------------------------------------------------------------------
      @@
      identifier I1, I2, I3, I4;
      @@
      static inline void mmu_notifier_range_init(struct mmu_notifier_range *I1,
      +enum mmu_notifier_event event,
      +unsigned flags,
      +struct vm_area_struct *vma,
      struct mm_struct *I2, unsigned long I3, unsigned long I4) { ... }
      
      @@
      @@
      -#define mmu_notifier_range_init(range, mm, start, end)
      +#define mmu_notifier_range_init(range, event, flags, vma, mm, start, end)
      
      @@
      expression E1, E3, E4;
      identifier I1;
      @@
      <...
      mmu_notifier_range_init(E1,
      +MMU_NOTIFY_UNMAP, 0, I1,
      I1->vm_mm, E3, E4)
      ...>
      
      @@
      expression E1, E2, E3, E4;
      identifier FN, VMA;
      @@
      FN(..., struct vm_area_struct *VMA, ...) {
      <...
      mmu_notifier_range_init(E1,
      +MMU_NOTIFY_UNMAP, 0, VMA,
      E2, E3, E4)
      ...> }
      
      @@
      expression E1, E2, E3, E4;
      identifier FN, VMA;
      @@
      FN(...) {
      struct vm_area_struct *VMA;
      <...
      mmu_notifier_range_init(E1,
      +MMU_NOTIFY_UNMAP, 0, VMA,
      E2, E3, E4)
      ...> }
      
      @@
      expression E1, E2, E3, E4;
      identifier FN;
      @@
      FN(...) {
      <...
      mmu_notifier_range_init(E1,
      +MMU_NOTIFY_UNMAP, 0, NULL,
      E2, E3, E4)
      ...> }
      ---------------------------------------------------------------------->%
      
      Applied with:
      spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
      spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
      spatch --sp-file mmu-notifier.spatch --dir mm --in-place
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-6-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>