1. 17 December 2021, 2 commits
  2. 25 November 2021, 1 commit
  3. 22 November 2021, 1 commit
    • iomap: Fix inline extent handling in iomap_readpage · d8af404f
      Committed by Andreas Gruenbacher
      Before commit 740499c7 ("iomap: fix the iomap_readpage_actor return
      value for inline data"), when hitting an IOMAP_INLINE extent,
      iomap_readpage_actor would report having read the entire page.  Since
      then, it only reports having read the inline data (iomap->length).
      
      This will force iomap_readpage into another iteration, and the
      filesystem will report an unaligned hole after the IOMAP_INLINE extent.
      But iomap_readpage_actor (now iomap_readpage_iter) isn't prepared to
      deal with unaligned extents; it will get things wrong on filesystems
      with a block size smaller than the page size, and we'll eventually run
      into the following warning in iomap_iter_advance:
      
        WARN_ON_ONCE(iter->processed > iomap_length(iter));
      
      Fix that by changing iomap_readpage_iter to return 0 when hitting an
      inline extent; this will cause iomap_iter to stop immediately.
      
      To fix readahead as well, change iomap_readahead_iter to pass on
      iomap_readpage_iter return values less than or equal to zero.
      
      Fixes: 740499c7 ("iomap: fix the iomap_readpage_actor return value for inline data")
      Cc: stable@vger.kernel.org # v5.15+
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
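
      As context, here is a minimal standalone sketch of the return-value
      convention the fix relies on (simplified stand-in types, not the
      verbatim kernel source in fs/iomap/buffered-io.c):

        /* Simplified stand-ins for the kernel's iomap structures. */
        enum iomap_type_sketch { SK_HOLE, SK_MAPPED, SK_INLINE };

        struct iomap_iter_sketch {
                enum iomap_type_sketch type;
                long long length;       /* bytes covered by this mapping */
        };

        /*
         * Per the commit: on an inline extent, return 0 so that the
         * iomap_iter() loop stops immediately instead of iterating again
         * over an unaligned hole after the inline data.
         */
        static long long readpage_iter_sketch(const struct iomap_iter_sketch *it)
        {
                if (it->type == SK_INLINE) {
                        /* ... copy the inline data into the page ... */
                        return 0;               /* stop iterating */
                }
                /* ... submit block-mapped I/O for this extent ... */
                return it->length;              /* processed; keep going */
        }

        /* Readahead propagates any non-positive readpage return value. */
        static long long readahead_iter_sketch(const struct iomap_iter_sketch *it)
        {
                long long ret = readpage_iter_sketch(it);

                if (ret <= 0)
                        return ret;     /* error or inline extent: stop */
                /* ... otherwise continue over the readahead batch ... */
                return ret;
        }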
  4. 18 October 2021, 1 commit
    • iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable · a6294593
      Committed by Andreas Gruenbacher
      Turn iov_iter_fault_in_readable into a function that returns the number
      of bytes not faulted in, similar to copy_to_user, instead of returning a
      non-zero value when any of the requested pages couldn't be faulted in.
      This supports the existing users that require all pages to be faulted in
      as well as new users that are happy if any pages can be faulted in.
      
      Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make
      sure this change doesn't silently break things.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
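
      The change is easiest to see from the caller's side. Below is a
      standalone sketch of the new convention (all names are illustrative
      stand-ins, not the kernel API):

        #include <stdio.h>

        /*
         * Like copy_to_user(), return the number of bytes NOT faulted in:
         * 0 means full success, and any shortfall smaller than the request
         * still lets a caller make partial progress.
         */
        static size_t fault_in_sketch(size_t requested, size_t faultable)
        {
                size_t done = requested < faultable ? requested : faultable;

                return requested - done;        /* bytes left un-faulted */
        }

        int main(void)
        {
                size_t bytes = 8192;
                size_t not_faulted = fault_in_sketch(bytes, 4096);

                if (not_faulted == bytes)
                        puts("nothing faulted in: fail with -EFAULT");
                else
                        printf("proceed with %zu bytes\n", bytes - not_faulted);
                return 0;
        }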
  5. 17 August 2021, 13 commits
  6. 06 August 2021, 2 commits
  7. 04 August 2021, 5 commits
  8. 16 July 2021, 3 commits
  9. 30 June 2021, 1 commit
  10. 10 June 2021, 1 commit
  11. 03 June 2021, 1 commit
  12. 15 May 2021, 1 commit
  13. 04 May 2021, 1 commit
  14. 09 April 2021, 1 commit
  15. 11 March 2021, 1 commit
  16. 27 February 2021, 1 commit
  17. 26 February 2021, 1 commit
  18. 03 December 2020, 1 commit
    • mm: memcontrol: Use helpers to read page's memcg data · bcfe06bf
      Committed by Roman Gushchin
      Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
      
      Currently a non-slab kernel page which has been charged to a memory cgroup
      can't be mapped to userspace.  The underlying reason is simple: the
      PageKmemcg flag is defined as a page type (like buddy, offline, etc.), so
      it takes a bit from the page->_mapcount counter.  Pages with a type set
      can't be mapped to userspace.
      
      But in general the kmemcg flag has nothing to do with mapping to
      userspace.  It only means that the page has been accounted by the page
      allocator, so it has to be properly uncharged on release.
      
      Some bpf maps map vmalloc-based memory to userspace, and their memory
      can't be accounted because of this implementation detail.
      
      This patchset removes this limitation by moving the PageKmemcg flag into
      one of the free bits of the page->mem_cgroup pointer.  It also formalizes
      accesses to page->mem_cgroup and page->obj_cgroups using new helpers,
      adds several checks, and removes a couple of obsolete functions.  As a
      result, the code becomes more robust, with fewer open-coded bit tricks.
      
      This patch (of 4):
      
      Currently there are many open-coded reads of the page->mem_cgroup pointer,
      as well as a couple of read helpers, which are barely used.
      
      This creates an obstacle to reusing some bits of the pointer to store
      additional information.  In fact, we already do this for slab pages,
      where the last bit indicates that the pointer has an attached vector of
      objcg pointers instead of a regular memcg pointer.
      
      This commit uses two existing helpers and introduces a new one to convert
      all read sides to calls of these helpers:
        struct mem_cgroup *page_memcg(struct page *page);
        struct mem_cgroup *page_memcg_rcu(struct page *page);
        struct mem_cgroup *page_memcg_check(struct page *page);
      
      page_memcg_check() is intended for cases where the page can be a slab
      page and have a memcg pointer pointing at an objcg vector.  It checks the
      lowest bit and, if it is set, returns NULL.  page_memcg() contains a
      VM_BUG_ON_PAGE() check that the page is not a slab page.
      
      To make sure nobody uses direct accesses, struct page's
      mem_cgroup/obj_cgroups fields are converted to a single unsigned long
      memcg_data.
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
      Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
      Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
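
      Below is a standalone sketch of the bit-tagging scheme described above
      (simplified; the real code tags struct page's memcg_data field with the
      kernel's MEMCG_DATA_* flags):

        #include <stdint.h>
        #include <stddef.h>

        struct mem_cgroup;                      /* opaque, as in the kernel */

        #define SK_OBJCGS 0x1UL                 /* low bit: objcg vector */

        struct page_sketch {
                uintptr_t memcg_data;           /* tagged pointer */
        };

        /*
         * Mirrors the described page_memcg_check() behaviour: if the
         * lowest bit is set, the field holds a slab page's objcg vector
         * rather than a memcg pointer, so return NULL.
         */
        static struct mem_cgroup *page_memcg_check_sketch(struct page_sketch *p)
        {
                uintptr_t data = p->memcg_data;

                if (data & SK_OBJCGS)
                        return NULL;
                return (struct mem_cgroup *)data;
        }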
  19. 05 November 2020, 2 commits
    • iomap: clean up writeback state logic on writepage error · 50e7d6c7
      Committed by Brian Foster
      The iomap writepage error handling logic is a mash of old and
      slightly broken XFS writepage logic. When keepwrite writeback state
      tracking was introduced in XFS in commit 0d085a52 ("xfs: ensure
      WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an
      additional cluster writeback context that scanned ahead of
      ->writepage() to process dirty pages over the current ->writepage()
      extent mapping. This context expected a dirty page and required
      retention of the TOWRITE tag on partial page processing so the
      higher level writeback context would revisit the page (in contrast
      to ->writepage(), which passes a page with the dirty bit already
      cleared).
      
      The cluster writeback mechanism was eventually removed and some of
      the error handling logic folded into the primary writeback path in
      commit 150d5be0 ("xfs: remove xfs_cancel_ioend"). This patch
      accidentally conflated the two contexts by using the keepwrite logic
      in ->writepage() without accounting for the fact that the page is
      not dirty. Further, the keepwrite logic has no practical effect on
      the core ->writepage() caller (write_cache_pages()) because it never
      revisits a page in the current function invocation.
      
      Technically, the page should be redirtied for the keepwrite logic to
      have any effect. Otherwise, write_cache_pages() may find the tagged
      page but will skip it since it is clean. Even if the page was
      redirtied, however, there is still no practical effect to keepwrite
      since write_cache_pages() does not wrap around within a single
      invocation of the function. Therefore, the dirty page would simply
      end up retagged on the next writeback sequence over the associated
      range.
      
      All that being said, none of this really matters because redirtying
      a partially processed page introduces a potential infinite redirty
      -> writeback failure loop that deviates from the current design
      principle of clearing the dirty state on writepage failure to avoid
      building up too much dirty, unreclaimable memory on the system.
      Therefore, drop the spurious keepwrite usage and dirty state
      clearing logic from iomap_writepage_map(), treat the partially
      processed page the same as a fully processed page, and let the
      imminent ioend failure clean up the writeback state.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
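
      A standalone sketch of the resulting error-path policy (types and
      helpers are stand-ins, not the exact iomap_writepage_map() code):

        #include <stdbool.h>

        struct page_sketch {
                bool dirty;
                bool writeback;
                bool error;
        };

        /*
         * On a mapping failure, treat a partially processed page like a
         * fully processed one: flag the error, do NOT redirty (avoiding an
         * unbounded redirty -> writeback-failure loop), and let the failing
         * ioend completion clear the writeback state.
         */
        static void writepage_map_error_sketch(struct page_sketch *page, int error)
        {
                if (error)
                        page->error = true;
                page->dirty = false;            /* deliberately not redirtied */
                page->writeback = true;         /* ioend failure will end it */
        }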
    • iomap: support partial page discard on writeback block mapping failure · 763e4cdc
      Committed by Brian Foster
      iomap writeback mapping failure only calls into ->discard_page() if
      the current page has not been added to the ioend. Accordingly, the
      XFS callback assumes a full page discard and invalidation. This is
      problematic for sub-page block size filesystems where some portion
      of a page might have been mapped successfully before a failure to
      map a delalloc block occurs. ->discard_page() is not called in that
      error scenario and the bio is explicitly failed by iomap via the
      error return from ->prepare_ioend(). As a result, the filesystem
      leaks delalloc blocks and corrupts the filesystem block counters.
      
      Since XFS is the only user of ->discard_page(), tweak the semantics
      to invoke the callback unconditionally on mapping errors and provide
      the file offset that failed to map. Update xfs_discard_page() to
      discard the corresponding portion of the file and pass the range
      along to iomap_invalidatepage(). The latter already properly handles
      both full and sub-page scenarios by not changing any iomap or page
      state on sub-page invalidations.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
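
      A standalone sketch of the sub-page range computation this implies
      (names are stand-ins; per the commit, the real ->discard_page()
      callback is now invoked unconditionally on mapping errors and receives
      the failing file offset):

        #include <stdio.h>

        #define SK_PAGE_SIZE 4096LL

        /*
         * Discard only the tail of the page from the failed offset onward;
         * blocks already added to the ioend are left alone, and sub-page
         * invalidation is already handled by iomap_invalidatepage().
         */
        static void discard_page_sketch(long long page_pos, long long failed_pos)
        {
                long long off = failed_pos - page_pos;  /* offset into page */
                long long len = SK_PAGE_SIZE - off;     /* bytes to discard */

                printf("punch delalloc and invalidate [%lld, %lld) in page\n",
                       off, off + len);
        }

        int main(void)
        {
                /* 1k blocks on a 4k page: blocks 0-1 mapped, block 2 failed */
                discard_page_sketch(8192, 8192 + 2048);
                return 0;
        }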