提交 · c661311f9d59bcd1763d64a3069555ec609fffac · openeuler / Kernel

16 5月, 2023 2 次提交

filemap: Correct the conditions for marking a folio as accessed · c661311f

由 Matthew Wilcox (Oracle) 提交于 5月 16, 2023

mainline inclusion
from mainline-5.19-rc4
commit 5ccc944d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6YDHU
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5ccc944dce3df5fd2fd683a7df4fd49d1068eba2

-------------------------------------------------

We had an off-by-one error which meant that we never marked the first page
in a read as accessed.  This was visible as a slowdown when re-reading
a file as pages were being evicted from cache too soon.  In reviewing
this code, we noticed a second bug where a multi-page folio would be
marked as accessed multiple times when doing reads that were less than
the size of the folio.

Abstract the comparison of whether two file positions are in the same
folio into a new function, fixing both of these bugs.
Reported-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>

Conflict: folios is not supported yet
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

c661311f

Revert "filemap: Correct the conditions for marking a folio as accessed" · 51c36bf0

由 Yu Kuai 提交于 5月 16, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6YDHU
CVE: NA

--------------------------------

This reverts commit 8c2e5597.

Because this commit make a mistake to judge if the page is the same.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

51c36bf0

18 1月, 2023 3 次提交

mm/filemap.c: remove bogus VM_BUG_ON · 08b4393c

由 Matthew Wilcox (Oracle) 提交于 1月 18, 2023

mainline inclusion
from mainline-v5.16-rc1
commit d417b49f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6110W
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d417b49fff3e2f21043c834841e8623a6098741d

--------------------------------

It is not safe to check page->index without holding the page lock.  It
can be changed if the page is moved between the swap cache and the page
cache for a shmem file, for example.  There is a VM_BUG_ON below which
checks page->index is correct after taking the page lock.

Link: https://lkml.kernel.org/r/20210818144932.940640-1-willy@infradead.org
Fixes: 5c211ba2 ("mm: add and use find_lock_entries")
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: <syzbot+c87be4f669d920c76330@syzkaller.appspotmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NMa Wupeng <mawupeng1@huawei.com>
Reviewed-by: Ntong tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

08b4393c

tmpfs: fix regressions from wider use of ZERO_PAGE · 2b42032b

由 Hugh Dickins 提交于 1月 18, 2023

mainline inclusion
from mainline-v5.18-rc3
commit 1bdec44b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6113U
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1bdec44b1eee32e311b44b5b06144bb7d9b33938

--------------------------------

Chuck Lever reported fsx-based xfstests generic 075 091 112 127 failing
when 5.18-rc1 NFS server exports tmpfs: bisected to recent tmpfs change.

Whilst nfsd_splice_action() does contain some questionable handling of
repeated pages, and Chuck was able to work around there, history from
Mark Hemment makes clear that there might be similar dangers elsewhere:
it was not a good idea for me to pass ZERO_PAGE down to unknown actors.

Revert shmem_file_read_iter() to using ZERO_PAGE for holes only when
iter_is_iovec(); in other cases, use the more natural iov_iter_zero()
instead of copy_page_to_iter().

We would use iov_iter_zero() throughout, but the x86 clear_user() is not
nearly so well optimized as copy to user (dd of 1T sparse tmpfs file
takes 57 seconds rather than 44 seconds).

And now pagecache_init() does not need to SetPageUptodate(ZERO_PAGE(0)):
which had caused boot failure on arm noMMU STM32F7 and STM32H7 boards

Link: https://lkml.kernel.org/r/9a978571-8648-e830-5735-1f4748ce2e30@google.com
Fixes: 56a8c8eb ("tmpfs: do not allocate pages on read")
Signed-off-by: NHugh Dickins <hughd@google.com>
Reported-by: NPatrice CHOTARD <patrice.chotard@foss.st.com>
Reported-by: NChuck Lever III <chuck.lever@oracle.com>
Tested-by: NChuck Lever III <chuck.lever@oracle.com>
Cc: Mark Hemment <markhemm@googlemail.com>
Cc: Patrice CHOTARD <patrice.chotard@foss.st.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NMa Wupeng <mawupeng1@huawei.com>
Reviewed-by: NNanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Ntong tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

2b42032b

tmpfs: do not allocate pages on read · 1e9ed93e

由 Hugh Dickins 提交于 1月 18, 2023

mainline inclusion
from mainline-v5.18-rc1
commit 56a8c8eb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6113U
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=56a8c8eb1eaf21261be8cdc4e3715239ac087342

--------------------------------

Mikulas asked in "Do we still need commit a0ee5ec5 ('tmpfs: allocate
on read when stacked')?" in [1]

Lukas noticed this unusual behavior of loop device backed by tmpfs in [2].

Normally, shmem_file_read_iter() copies the ZERO_PAGE when reading
holes; but if it looks like it might be a read for "a stacking
filesystem", it allocates actual pages to the page cache, and even marks
them as dirty.  And reads from the loop device do satisfy the test that
is used.

This oddity was added for an old version of unionfs, to help to limit
its usage to the limited size of the tmpfs mount involved; but about the
same time as the tmpfs mod went in (2.6.25), unionfs was reworked to
proceed differently; and the mod kept just in case others needed it.

Do we still need it? I cannot answer with more certainty than "Probably
not".  It's nasty enough that we really should try to delete it; but if
a regression is reported somewhere, then we might have to revert later.

It's not quite as simple as just removing the test (as Mikulas did):
xfstests generic/013 hung because splice from tmpfs failed on page not
up-to-date and page mapping unset.  That can be fixed just by marking
the ZERO_PAGE as Uptodate, which of course it is: do so in
pagecache_init() - it might be useful to others than tmpfs.

My intention, though, was to stop using the ZERO_PAGE here altogether:
surely iov_iter_zero() is better for this case? Sadly not: it relies on
clear_user(), and the x86 clear_user() is slower than its copy_user() [3].

But while we are still using the ZERO_PAGE, let's stop dirtying its
struct page cacheline with unnecessary get_page() and put_page().

Link: https://lore.kernel.org/linux-mm/alpine.LRH.2.02.2007210510230.6959@file01.intranet.prod.int.rdu2.redhat.com/ [1]
Link: https://lore.kernel.org/linux-mm/20211126075100.gd64odg2bcptiqeb@work/ [2]
Link: https://lore.kernel.org/lkml/2f5ca5e4-e250-a41c-11fb-a7f4ebc7e1c9@google.com/ [3]
Link: https://lkml.kernel.org/r/90bc5e69-9984-b5fa-a685-be55f2b64b@google.comSigned-off-by: NHugh Dickins <hughd@google.com>
Reported-by: NMikulas Patocka <mpatocka@redhat.com>
Reported-by: NLukas Czerner <lczerner@redhat.com>
Acked-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NMa Wupeng <mawupeng1@huawei.com>
Reviewed-by: Ntong tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

1e9ed93e

16 8月, 2022 1 次提交

Revert "mm/page_cache_limit: do shrink_page_cache when adding page to page cache" · 2e2d1098

由 Chen Wandun 提交于 8月 16, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I56I4P
CVE: NA
backport: openEuler-22.03-LTS

--------------------------------

This reverts commit 9fea105d.
Signed-off-by: NChen Wandun <chenwandun@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2e2d1098

19 7月, 2022 2 次提交

filemap: Correct the conditions for marking a folio as accessed · 8c2e5597

由 Matthew Wilcox (Oracle) 提交于 7月 19, 2022

mainline inclusion
from mainline-5.19-rc4
commit 5ccc944d
category: bugfix
bugzilla: 186896, https://gitee.com/src-openeuler/kernel/issues/I5GZC8
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5ccc944dce3df5fd2fd683a7df4fd49d1068eba2

-------------------------------------------------

We had an off-by-one error which meant that we never marked the first page
in a read as accessed.  This was visible as a slowdown when re-reading
a file as pages were being evicted from cache too soon.  In reviewing
this code, we noticed a second bug where a multi-page folio would be
marked as accessed multiple times when doing reads that were less than
the size of the folio.

Abstract the comparison of whether two file positions are in the same
folio into a new function, fixing both of these bugs.
Reported-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>

Conflict: folios is not supported yet
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8c2e5597

Revert "mm/filemap: fix that first page is not mark accessed in filemap_read()" · 46b64853

由 Yu Kuai 提交于 7月 19, 2022

hulk inclusion
category: bugfix
bugzilla: 186896, https://gitee.com/src-openeuler/kernel/issues/I5GZC8
CVE: NA

--------------------------------

This reverts commit 499ecade.

Prepare to backport solution from mainline.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

46b64853

13 7月, 2022 2 次提交

mm/filemap: fix UAF in find_lock_entries · 0359172b

由 Liu Shixin 提交于 7月 13, 2022

maillist inclusion
category: bugfix
bugzilla: 186821, https://gitee.com/openeuler/kernel/issues/I5G69G

Reference: https://lore.kernel.org/all/20220707020938.2122198-1-liushixin2@huawei.com/

--------------------------------

Release refcount after xas_set to fix UAF which may cause panic like this:

 page:ffffea000491fa40 refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1247e9
 head:ffffea000491fa00 order:3 compound_mapcount:0 compound_pincount:0
 memcg:ffff888104f91091
 flags: 0x2fffff80010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
...
page dumped because: VM_BUG_ON_PAGE(PageTail(page))
 ------------[ cut here ]------------
 kernel BUG at include/linux/page-flags.h:632!
 invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
 CPU: 1 PID: 7642 Comm: sh Not tainted 5.15.51-dirty #26
...
 Call Trace:
  <TASK>
  __invalidate_mapping_pages+0xe7/0x540
  drop_pagecache_sb+0x159/0x320
  iterate_supers+0x120/0x240
  drop_caches_sysctl_handler+0xaa/0xe0
  proc_sys_call_handler+0x2b4/0x480
  new_sync_write+0x3d6/0x5c0
  vfs_write+0x446/0x7a0
  ksys_write+0x105/0x210
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f52b5733130
...

This problem has been fixed on mainline by patch 6b24ca4a ("mm: Use
multi-index entries in the page cache") since it deletes the related code.

Fixes: 5c211ba2 ("mm: add and use find_lock_entries")
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Acked-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:
	mm/filemap.c
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0359172b

mm/filemap: fix that first page is not mark accessed in filemap_read() · 29c68be3

由 Yu Kuai 提交于 7月 13, 2022

hulk inclusion
category: bugfix
bugzilla: 186896, https://gitee.com/src-openeuler/kernel/issues/I5FRAP
CVE: NA

--------------------------------

In filemap_read(), first page will not mark accessed if previous page
is equal to the current page:

if (iocb->ki_pos >> PAGE_SHIFT != ra->prev_pos >> PAGE_SHIFT))
	folio_mark_accessed(fbatch.folios[0]);

However, 'prev_pos' is set to 'ki_pos + copied' during last read, which
means 'prev_pos' can be equal to 'ki_pos' in this read, thus previous
page can be miscaculated.

Fix the problem by setting 'prev_pos' to the start offset of last read,
so that 'prev_pos >> PAGE_SHIFT' will be previous page as expected.

Fixes: 06c04442 ("mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contig")
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

29c68be3

23 2月, 2022 1 次提交

mm: add support for page cache use reliable memory · 8d1faee5

由 Chen Wandun 提交于 2月 23, 2022

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4PM0Z
CVE: NA

--------------------------------

__page_cache_alloc is used to alloc page cache in most file system,
such as ext4, f2fs, so add GFP_RELIABLE flag to use reliable
memory when alloc page cache.
Signed-off-by: NChen Wandun <chenwandun@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8d1faee5

29 11月, 2021 1 次提交

mm/page_cache_limit: do shrink_page_cache when adding page to page cache · 9fea105d

由 Chen Wandun 提交于 11月 29, 2021

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4HOXK

------------------------------------------

Add hooks in function add_to page_cache and add_to_page_cache_lru
Signed-off-by: NChen Wandun <chenwandun@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9fea105d

02 9月, 2021 1 次提交

mm: fix some spelling mistakes in comments · 03161a93

由 Haitao Shi 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 60f7c503
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=60f7c503d971a731ee3c4f884a9f2e80d476730d

------------------------------------------------------------------------

Fix some spelling mistakes in comments:
	udpate ==> update
	succesful ==> successful
	exmaple ==> example
	unneccessary ==> unnecessary
	stoping ==> stopping
	uknown ==> unknown

Link: https://lkml.kernel.org/r/20201127011747.86005-1-shihaitao1@huawei.comSigned-off-by: NHaitao Shi <shihaitao1@huawei.com>
Reviewed-by: NMike Rapoport <rppt@linux.ibm.com>
Reviewed-by: NSouptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

03161a93

14 7月, 2021 17 次提交

mm/filemap: fix infinite loop in generic_file_buffered_read() · fb32e21b

由 Kent Overstreet 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 3644e2d2
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I40B5X
CVE: NA

-------------------------------------------------

If iter->count is 0 and iocb->ki_pos is page aligned, this causes
nr_pages to be 0.

Then in generic_file_buffered_read_get_pages() find_get_pages_contig()
returns 0 - because we asked for 0 pages, so we call
generic_file_buffered_read_no_cached_page() which attempts to add a page
to the page cache, which fails with -EEXIST, and then we loop. Oops...
Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Reported-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NNanyong Sun <sunnanyong@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fb32e21b

mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contig · eaa2e93a

由 Kent Overstreet 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 06c04442
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I40B5X
CVE: NA

-------------------------------------------------

Convert generic_file_buffered_read() to get pages to read from in batches,
and then copy data to userspace from many pages at once - in particular,
we now don't touch any cachelines that might be contended while we're in
the loop to copy data to userspace.

This is is a performance improvement on workloads that do buffered reads
with large blocksizes, and a very large performance improvement if that
file is also being accessed concurrently by different threads.

On smaller reads (512 bytes), there's a very small performance improvement
(1%, within the margin of error).

akpm: kernel test robot found a 32% speedup on one test:
https://lkml.kernel.org/r/20201030081456.GY31092@shao2-debian

Link: https://lkml.kernel.org/r/20201025212949.602194-3-kent.overstreet@gmail.comSigned-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NNanyong Sun <sunnanyong@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

eaa2e93a

mm/filemap/c: break generic_file_buffered_read up into multiple functions · 6d75606d

由 Kent Overstreet 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 723ef24b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I40B5X
CVE: NA

-------------------------------------------------

Patch series "generic_file_buffered_read() improvements", v2.

generic_file_buffered_read() has turned into a real monstrosity to work
with.  And it's a major performance improvement, for both small random and
large sequential reads.  On my test box, 4k buffered random reads go from
~150k to ~250k iops, and the improvements to big sequential reads are even
bigger.

This incorporates the fix for IOCB_WAITQ handling that Jens just posted as
well, also factors out lock_page_for_iocb() to improve handling of the
various iocb flags.

This patch (of 2):

This is prep work for changing generic_file_buffered_read() to use
find_get_pages_contig() to batch up all the pagecache lookups.

This patch should be functionally identical to the existing code and
changes as little as of the flow control as possible.  More refactoring
could be done, this patch is intended to be relatively minimal.

Link: https://lkml.kernel.org/r/20201025212949.602194-1-kent.overstreet@gmail.com
Link: https://lkml.kernel.org/r/20201025212949.602194-2-kent.overstreet@gmail.comSigned-off-by: NKent Overstreet <kent.overstreet@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NNanyong Sun <sunnanyong@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6d75606d

mm/lru: revise the comments of lru_lock · 17556e53

由 Hugh Dickins 提交于 7月 07, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 15b44736
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZF7C?from=project-issue
CVE: NA

--------------------------------------

Since we changed the pgdat->lru_lock to lruvec->lru_lock, it's time to fix
the incorrect comments in code.  Also fixed some zone->lru_lock comment
error from ancient time.  etc.

I struggled to understand the comment above move_pages_to_lru() (surely
it never calls page_referenced()), and eventually realized that most of
it had got separated from shrink_active_list(): move that comment back.

Link: https://lkml.kernel.org/r/1604566549-62481-20-git-send-email-alex.shi@linux.alibaba.comSigned-off-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Chen, Rong A" <rong.a.chen@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NJing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Nchenwandun <chenwandun@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

17556e53

dax: account DAX entries as nrpages · a79d5881

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.13-rc1
commit 7f0e07fb
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

Simplify mapping_needs_writeback() by accounting DAX entries as pages
instead of exceptional entries.

Link: https://lkml.kernel.org/r/20201026151849.24232-4-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: NVishal Verma <vishal.l.verma@intel.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	fs/dax.c
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a79d5881

mm: stop accounting shadow entries · f4834f6b

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.13-rc1
commit 46be67b4
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

We no longer need to keep track of how many shadow entries are present in
a mapping.  This saves a few writes to the inode and memory barriers.

Link: https://lkml.kernel.org/r/20201026151849.24232-3-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: NVishal Verma <vishal.l.verma@intel.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f4834f6b

mm/filemap: fix find_lock_entries hang on 32-bit THP · 0635e153

由 Hugh Dickins 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-9
commit 2d11e738
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

No problem on 64-bit, or without huge pages, but xfstests generic/308
hung uninterruptibly on 32-bit huge tmpfs.

Since commit 0cc3b0ec ("Clarify (and fix) in 4.13 MAX_LFS_FILESIZE
macros"), MAX_LFS_FILESIZE is only a PAGE_SIZE away from wrapping 32-bit
xa_index to 0, so the new find_lock_entries() has to be extra careful
when handling a THP.

Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2104211735430.3299@eggly.anvils
Fixes: 5c211ba2 ("mm: add and use find_lock_entries")
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0635e153

mm/filemap: fix mapping_seek_hole_data on THP & 32-bit · 17a01214

由 Hugh Dickins 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-9
commit ed98b015
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

No problem on 64-bit, or without huge pages, but xfstests generic/285
and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs, and on
32-bit architectures, with the new mapping_seek_hole_data().  Several
different bugs turned out to need fixing.

u64 cast to stop losing bits when converting unsigned long to loff_t
(and let's use shifts throughout, rather than mixed with * and /).

Use round_up() when advancing pos, to stop assuming that pos was already
THP-aligned when advancing it by THP-size.  (This use of round_up()
assumes that any THP has THP-aligned index: true at present and true
going forward, but could be recoded to avoid the assumption.)

Use xas_set() when iterating away from a THP, so that xa_index stays in
synch with start, instead of drifting away to return bogus offset.

Check start against end to avoid wrapping 32-bit xa_index to 0 (and to
handle these additional cases, seek_data or not, it's easier to break
the loop than goto: so rearrange exit from the function).

[hughd@google.com: remove unneeded u64 casts, per Matthew]
  Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2104221347240.1170@eggly.anvils

Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2104211737410.3299@eggly.anvils
Fixes: 41139aa4 ("mm/filemap: add mapping_seek_hole_data")
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

17a01214

mm: pass pvec directly to find_get_entries · 2d8a8dcc

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-rc1
commit cf2039af
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

All callers of find_get_entries() use a pvec, so pass it directly instead
of manipulating it in the caller.

Link: https://lkml.kernel.org/r/20201112212641.27837-14-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2d8a8dcc

mm: add an 'end' parameter to find_get_entries · 285ba345

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-rc1
commit ca122fe4
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

This simplifies the callers and leads to a more efficient implementation
since the XArray has this functionality already.

Link: https://lkml.kernel.org/r/20201112212641.27837-11-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

285ba345

mm: add and use find_lock_entries · 13bedf37

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 5c211ba2
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

We have three functions (shmem_undo_range(), truncate_inode_pages_range()
and invalidate_mapping_pages()) which want exactly this function, so add
it to filemap.c.  Before this patch, shmem_undo_range() would split any
compound page which overlaps either end of the range being punched in both
the first and second loops through the address space.  After this patch,
that functionality is left for the second loop, which is arguably more
appropriate since the first loop is supposed to run through all the pages
quickly, and splitting a page can sleep.

[willy@infradead.org: add assertion]
  Link: https://lkml.kernel.org/r/20201124041507.28996-3-willy@infradead.org

Link: https://lkml.kernel.org/r/20201112212641.27837-10-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

13bedf37

iomap: use mapping_seek_hole_data · 8c0ecdbb

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 54fa39ac
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

Enhance mapping_seek_hole_data() to handle partially uptodate pages and
convert the iomap seek code to call it.

Link: https://lkml.kernel.org/r/20201112212641.27837-9-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8c0ecdbb

mm/filemap: add mapping_seek_hole_data · 6865376f

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 41139aa4
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

Rewrite shmem_seek_hole_data() and move it to filemap.c.

[willy@infradead.org: don't put an xa_is_value() page]
  Link: https://lkml.kernel.org/r/20201124041507.28996-4-willy@infradead.org

Link: https://lkml.kernel.org/r/20201112212641.27837-8-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6865376f

mm/filemap: add helper for finding pages · c21b013c

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-rc1
commit c7bad633
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

There is a lot of common code in find_get_entries(),
find_get_pages_range() and find_get_pages_range_tag().  Factor out
find_get_entry() which simplifies all three functions.

[willy@infradead.org: remove VM_BUG_ON_PAGE()]
  Link: https://lkml.kernel.org/r/20201124041507.28996-2-willy@infradead.orgLink: https://lkml.kernel.org/r/20201112212641.27837-7-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c21b013c

mm/filemap: rename find_get_entry to mapping_get_entry · 45c701ea

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-rc1
commit bc5a3011
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

find_get_entry doesn't "find" anything.  It returns the entry at a
particular index.

Link: https://lkml.kernel.org/r/20201112212641.27837-6-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

45c701ea

mm: add FGP_ENTRY · 3da24cb4

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 44835d20
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

The functionality of find_lock_entry() and find_get_entry() can be
provided by pagecache_get_page(), which lets us delete find_lock_entry()
and make find_get_entry() static.

Link: https://lkml.kernel.org/r/20201112212641.27837-5-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3da24cb4

mm: make pagecache tagged lookups return only head pages · 4ffedb34

由 Matthew Wilcox (Oracle) 提交于 7月 12, 2021

mainline inclusion
from mainline-v5.12-rc1
commit c49f50d1
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZE5V
CVE: NA

-------------------------------------------------

Patch series "Overhaul multi-page lookups for THP", v4.

This THP prep patchset changes several page cache iteration APIs to only
return head pages.

 - It's only possible to tag head pages in the page cache, so only
   return head pages, not all their subpages.
 - Factor a lot of common code out of the various batch lookup routines
 - Add mapping_seek_hole_data()
 - Unify find_get_entries() and pagevec_lookup_entries()
 - Make find_get_entries only return head pages, like find_get_entry().

These are only loosely connected, but they seem to make sense together as
a series.

This patch (of 14):

Pagecache tags are used for dirty page writeback.  Since dirtiness is
tracked on a per-THP basis, we only want to return the head page rather
than each subpage of a tagged page.  All the filesystems which use huge
pages today are in-memory, so there are no tagged huge pages today.

Link: https://lkml.kernel.org/r/20201112212641.27837-2-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NTong Tiangen <tongtiangen@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4ffedb34

09 3月, 2021 1 次提交

mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked() · 19d76e3d

由 Waiman Long 提交于 2月 19, 2021

stable inclusion
from stable-5.10.15
commit 032f8e04c0353f015d243008f039bb6e18a173c7
bugzilla: 48167

--------------------------------

commit da74240e upstream.

Commit 3fea5a49 ("mm: memcontrol: convert page cache to a new
mem_cgroup_charge() API") introduced a bug in __add_to_page_cache_locked()
causing the following splat:

  page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
  pages's memcg:ffff8889a4116000
  ------------[ cut here ]------------
  kernel BUG at mm/memcontrol.c:2924!
  invalid opcode: 0000 [#1] SMP KASAN PTI
  CPU: 35 PID: 12345 Comm: cat Tainted: G S      W I       5.11.0-rc4-debug+ #1
  Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
  RIP: commit_charge+0xf4/0x130
  Call Trace:
    mem_cgroup_charge+0x175/0x770
    __add_to_page_cache_locked+0x712/0xad0
    add_to_page_cache_lru+0xc5/0x1f0
    cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
    __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
    __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
    nfs_readpages+0x24e/0x540 [nfs]
    read_pages+0x5b1/0xc40
    page_cache_ra_unbounded+0x460/0x750
    generic_file_buffered_read_get_pages+0x290/0x1710
    generic_file_buffered_read+0x2a9/0xc30
    nfs_file_read+0x13f/0x230 [nfs]
    new_sync_read+0x3af/0x610
    vfs_read+0x339/0x4b0
    ksys_read+0xf1/0x1c0
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Before that commit, there was a try_charge() and commit_charge() in
__add_to_page_cache_locked().  These two separated charge functions were
replaced by a single mem_cgroup_charge().  However, it forgot to add a
matching mem_cgroup_uncharge() when the xarray insertion failed with the
page released back to the pool.

Fix this by adding a mem_cgroup_uncharge() call when insertion error
happens.

Link: https://lkml.kernel.org/r/20210125042441.20030-1-longman@redhat.com
Fixes: 3fea5a49 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
Signed-off-by: NWaiman Long <longman@redhat.com>
Reviewed-by: NAlex Shi <alex.shi@linux.alibaba.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <smuchun@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

19d76e3d

12 12月, 2020 1 次提交

revert "mm/filemap: add static for function __add_to_page_cache_locked" · 16c0cc0c

由 Andrew Morton 提交于 12月 11, 2020

Revert commit 3351b16a ("mm/filemap: add static for function
__add_to_page_cache_locked") due to incompatibility with
ALLOW_ERROR_INJECTION which result in build errors.

Link: https://lkml.kernel.org/r/CAADnVQJ6tmzBXvtroBuEH6QA0H+q7yaSKxrVvVxhqr3KBZdEXg@mail.gmail.comTested-by: NJustin Forbes <jmforbes@linuxtx.org>
Tested-by: NGreg Thelen <gthelen@google.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Cc: Michal Kubecek <mkubecek@suse.cz>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Tony Luck <tony.luck@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

16c0cc0c

07 12月, 2020 1 次提交

mm/filemap: add static for function __add_to_page_cache_locked · 3351b16a

由 Alex Shi 提交于 12月 05, 2020

mm/filemap.c:830:14: warning: no previous prototype for `__add_to_page_cache_locked' [-Wmissing-prototypes]
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Link: https://lkml.kernel.org/r/1604661895-5495-1-git-send-email-alex.shi@linux.alibaba.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3351b16a

25 11月, 2020 1 次提交

mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback) · 073861ed

由 Hugh Dickins 提交于 11月 24, 2020

Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
no longer an ext4 page at all.

The problem is that PageWriteback is not accompanied by a page reference
(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
soon as TestClearPageWriteback has been done, that page could be removed
from page cache, freed, and reused for something else by the time that
wake_up_page() is reached.

https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
check; but I'm paranoid about even looking at an unreferenced struct page,
lest its memory might itself have already been reused or hotremoved (and
wake_up_page_bit() may modify that memory with its ClearPageWaiters()).

Then on crashing a second time, realized there's a stronger reason against
that approach.  If my testing just occasionally crashes on that check,
when the page is reused for part of a compound page, wouldn't it be much
more common for the page to get reused as an order-0 page before reaching
wake_up_page()?  And on rare occasions, might that reused page already be
marked PageWriteback by its new user, and already be waited upon?  What
would that look like?

It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
in write_cache_pages() (though I have never seen that crash myself).

Matthew Wilcox explaining this to himself:
 "page is allocated, added to page cache, dirtied, writeback starts,

  --- thread A ---
  filesystem calls end_page_writeback()
        test_clear_page_writeback()
  --- context switch to thread B ---
  truncate_inode_pages_range() finds the page, it doesn't have writeback set,
  we delete it from the page cache.  Page gets reallocated, dirtied, writeback
  starts again.  Then we call write_cache_pages(), see
  PageWriteback() set, call wait_on_page_writeback()
  --- context switch back to thread A ---
  wake_up_page(page, PG_writeback);
  ... thread B is woken, but because the wakeup was for the old use of
  the page, PageWriteback is still set.

  Devious"

And prior to 2a9127fc ("mm: rewrite wait_on_page_bit_common() logic")
this would have been much less likely: before that, wake_page_function()'s
non-exclusive case would stop walking and not wake if it found Writeback
already set again; whereas now the non-exclusive case proceeds to wake.

I have not thought of a fix that does not add a little overhead: the
simplest fix is for end_page_writeback() to get_page() before calling
test_clear_page_writeback(), then put_page() after wake_up_page().

Was there a chance of missed wakeups before, since a page freed before
reaching wake_up_page() would have PageWaiters cleared?  I think not,
because each waiter does hold a reference on the page.  This bug comes
when the old use of the page, the one we do TestClearPageWriteback on,
had *no* waiters, so no additional page reference beyond the page cache
(and whoever racily freed it).  The reuse of the page has a waiter
holding a reference, and its own PageWriteback set; but the belated
wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).

Reported-by: syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com
Reported-by: NQian Cai <cai@lca.pw>
Fixes: 2a9127fc ("mm: rewrite wait_on_page_bit_common() logic")
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

073861ed

17 11月, 2020 1 次提交

mm: never attempt async page lock if we've transferred data already · 0abed7c6

由 Jens Axboe 提交于 11月 16, 2020

We catch the case where we enter generic_file_buffered_read() with data
already transferred, but we also need to be careful not to allow an async
page lock if we're looping transferring data. If not, we could be
returning -EIOCBQUEUED instead of the transferred amount, and it could
result in double waitqueue additions as well.

Cc: stable@vger.kernel.org # v5.9
Fixes: 1a0a7853 ("mm: support async buffered reads in generic_file_buffered_read()")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0abed7c6

18 10月, 2020 1 次提交

mm: mark async iocb read as NOWAIT once some data has been copied · 13bd6914

由 Jens Axboe 提交于 10月 17, 2020

Once we've copied some data for an iocb that is marked with IOCB_WAITQ,
we should no longer attempt to async lock a new page. Instead make sure
we return the copied amount, and let the caller retry, instead of
returning -EIOCBQUEUED for a new page.

This should only be possible with read-ahead disabled on the below
device, and multiple threads racing on the same file. Haven't been able
to reproduce on anything else.

Cc: stable@vger.kernel.org # v5.9
Fixes: 1a0a7853 ("mm: support async buffered reads in generic_file_buffered_read()")
Reported-by: NKent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

13bd6914

17 10月, 2020 4 次提交

mm: fix some broken comments · 0e9aa675

由 Miaohe Lin 提交于 10月 15, 2020

Fix some broken comments including typo, grammar error and wrong function
name.
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200913095456.54873-1-linmiaohe@huawei.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0e9aa675

mm/filemap: fold ra_submit into do_sync_mmap_readahead · db660d46

由 David Howells 提交于 10月 15, 2020

Fold ra_submit() into its last remaining user and pass the
readahead_control struct to both do_page_cache_ra() and
page_cache_sync_ra().
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-9-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

db660d46

mm/filemap: fix page cache removal for arbitrary sized THPs · 887b22c6

由 Matthew Wilcox (Oracle) 提交于 10月 15, 2020

Patch series "Remove assumptions of THP size".

There are a number of places in the VM which assume that a THP is a PMD in
size.  That's true today, and remains true after this patch series, but
this is a prerequisite for switching to arbitrary-sized THPs.
thp_nr_pages() still returns either HPAGE_PMD_NR or 1, but will be changed
later.

This patch (of 11):

page_cache_free_page() assumes THPs are PMD_SIZE; fix that assumption.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Link: https://lkml.kernel.org/r/20200908195539.25896-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20200908195539.25896-2-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

887b22c6

mm/filemap: fix storing to a THP shadow entry · 198b62f8

由 Matthew Wilcox (Oracle) 提交于 10月 15, 2020

When a THP is removed from the page cache by reclaim, we replace it with a
shadow entry that occupies all slots of the XArray previously occupied by
the THP.  If the user then accesses that page again, we only allocate a
single page, but storing it into the shadow entry replaces all entries
with that one page.  That leads to bugs like

page dumped because: VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
------------[ cut here ]------------
kernel BUG at mm/filemap.c:2529!

https://bugzilla.kernel.org/show_bug.cgi?id=206569

This is hard to reproduce with mainline, but happens regularly with the
THP patchset (as so many more THPs are created).  This solution is take
from the THP patchset.  It splits the shadow entry into order-0 pieces at
the time that we bring a new page into cache.

Fixes: 99cb0dbd ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Qian Cai <cai@lca.pw>
Link: https://lkml.kernel.org/r/20200903183029.14930-4-willy@infradead.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

198b62f8

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功