Commits · f4dc7fffa9873db50ec25624572f8217a6225de8 · openeuler / Kernel

27 Sep 2022 · 2 commits

efi: libstub: unify initrd loading between architectures · f4dc7fff

Committed by Ard Biesheuvel on Sep 16, 2022

Use an EFI configuration table to pass the initrd to the core kernel,
instead of per-arch methods. This cleans up the code considerably, and
should make it easier for architectures to get rid of their reliance on
DT for doing EFI boot in the future.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

efi: libstub: simplify efi_get_memory_map() and struct efi_boot_memmap · eab31265

Committed by Ard Biesheuvel on Jun 03, 2022

Currently, struct efi_boot_memmap is a struct that is passed around
between callers of efi_get_memory_map() and the users of the resulting
data, and which carries pointers to various variables whose values are
provided by the EFI GetMemoryMap() boot service.

This is overly complex, and it is much easier to carry these values in
the struct itself. So turn the struct into one that carries these data
items directly, including a flex array for the variable number of EFI
memory descriptors that the boot service may return.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

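The new layout can be sketched in plain C. Values that used to be reached through pointers now live in the struct itself, and the variable number of descriptors sits in a flexible array member, so one allocation holds everything. The names `boot_memmap_sketch`, `mem_desc_sketch` and `memmap_alloc()` are hypothetical, and the fields are only modelled on the description above. Real firmware may also report a `desc_size` larger than the C descriptor type; this sketch ignores that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one EFI memory descriptor. */
struct mem_desc_sketch {
	uint32_t type;
	uint64_t phys_start;
	uint64_t num_pages;
};

/* Hypothetical memory map: the values are carried inline instead of
 * through pointers, followed by a flexible array of descriptors. */
struct boot_memmap_sketch {
	size_t map_size;	/* bytes of descriptors returned */
	size_t desc_size;	/* bytes per descriptor */
	uint32_t desc_ver;
	size_t map_key;
	size_t buff_size;	/* bytes available in map[] */
	struct mem_desc_sketch map[];
};

/* One allocation covers the header and room for 'ndesc' descriptors. */
static struct boot_memmap_sketch *memmap_alloc(size_t ndesc)
{
	size_t buff = ndesc * sizeof(struct mem_desc_sketch);
	struct boot_memmap_sketch *m = malloc(sizeof(*m) + buff);

	if (!m)
		return NULL;
	memset(m, 0, sizeof(*m));
	m->desc_size = sizeof(struct mem_desc_sketch);
	m->buff_size = buff;
	return m;
}

/* Number of descriptors currently held in the map. */
static size_t memmap_count(const struct boot_memmap_sketch *m)
{
	return m->map_size / m->desc_size;
}
```

Callers then pass a single pointer around, and releasing the whole map is a single `free()`.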

06 Sep 2022 · 1 commit

efi/loongarch: Add efistub booting support · ead384d9

Committed by Huacai Chen on Aug 19, 2022

This patch adds efistub booting support, which is the standard UEFI boot
protocol for LoongArch to use.

We use generic efistub, which means we can pass boot information (i.e.,
system table, memory map, kernel command line, initrd) via a light FDT
and drop a lot of non-standard code.

We use a flat mapping to map the EFI runtime into the kernel's address
space. In EFI, VA = PA; in the kernel, VA = PA + PAGE_OFFSET. As a result,
the flat mapping is not an identity mapping, so SetVirtualAddressMap() is
still needed for the EFI runtime.
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
[ardb: change fpic to fpie as suggested by Xi Ruoyao]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

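The two address spaces differ by a constant offset, which is why the flat mapping still needs SetVirtualAddressMap(). A minimal sketch of that arithmetic follows. `SKETCH_PAGE_OFFSET` is an illustrative value chosen for this example, not taken from the patch, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative offset only; the real PAGE_OFFSET is defined by the
 * architecture, not by this sketch. */
#define SKETCH_PAGE_OFFSET 0x9000000000000000ULL

/* Under firmware, the runtime services see VA == PA. */
static uint64_t efi_va(uint64_t pa)
{
	return pa;
}

/* Under the kernel's flat mapping, VA == PA + PAGE_OFFSET. */
static uint64_t kernel_va(uint64_t pa)
{
	return pa + SKETCH_PAGE_OFFSET;
}
```

Because `kernel_va(pa)` and `efi_va(pa)` never coincide here, the firmware has to be told the new virtual addresses of its runtime regions. With an identity mapping the two would be the same.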

15 Aug 2022 · 1 commit

radix-tree: replace gfp.h inclusion with gfp_types.h · 9f162193

Committed by Yury Norov on Aug 11, 2022

Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we
have gfp_types.h for this.

Fixes powerpc allmodconfig build:

   In file included from include/linux/nodemask.h:97,
                    from include/linux/mmzone.h:17,
                    from include/linux/gfp.h:7,
                    from include/linux/radix-tree.h:12,
                    from include/linux/idr.h:15,
                    from include/linux/kernfs.h:12,
                    from include/linux/sysfs.h:16,
                    from include/linux/kobject.h:20,
                    from include/linux/pci.h:35,
                    from arch/powerpc/kernel/prom_init.c:24:
   include/linux/random.h: In function 'add_latent_entropy':
>> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'?
      25 |         add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy));
         |                                              ^~~~~~~~~~~~~~
         |                                              add_latent_entropy
   include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in
Reported-by: kernel test robot <lkp@intel.com>
CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

13 Aug 2022 · 2 commits

io_uring: make io_kiocb_to_cmd() typesafe · f2ccb5ae

Committed by Stefan Metzmacher on Aug 11, 2022

We need to make sure (at build time) that struct io_cmd_data is not
cast to a structure that's larger.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

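A build-time check of this kind can be sketched with a C11 `_Static_assert` inside a GNU C statement expression, the dialect the kernel is written in. The struct and macro names below are hypothetical stand-ins for `struct io_cmd_data` and `io_kiocb_to_cmd()`, and the 64-byte size is an arbitrary choice for the example. Like kernel code, which builds with `-fno-strict-aliasing`, the sketch reinterprets the command storage as another type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size command area inside each request. */
struct cmd_data_sketch {
	_Alignas(max_align_t) unsigned char data[64];
};

struct req_sketch {
	int opcode;
	struct cmd_data_sketch cmd;
};

/* Reinterpret the command area as 'cmd_type', refusing to compile if
 * 'cmd_type' is larger than the area. Uses a GNU statement expression. */
#define req_to_cmd(req, cmd_type) ({					\
	_Static_assert(sizeof(cmd_type) <= sizeof(struct cmd_data_sketch), \
		       "command type larger than command area");	\
	(cmd_type *)&(req)->cmd;					\
})
```

Passing any type larger than 64 bytes to `req_to_cmd()` then fails at compile time instead of silently overrunning the request.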

fs: don't randomize struct kiocb fields · addebd9a

Committed by Keith Busch on Aug 12, 2022

This is a size-sensitive structure, and randomizing it can introduce extra
padding that breaks io_uring's fixed-size expectations. There are few
fields here as it is, half of which need a fixed order to pack optimally,
so the randomization isn't providing much.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/io-uring/b6f508ca-b1b2-5f40-7998-e4cff1cf7212@kernel.dk/
Signed-off-by: Jens Axboe <axboe@kernel.dk>

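Why field order matters for such a structure can be seen with two hypothetical structs holding the same fields in different orders. On common ABIs the interleaved order needs padding to align the pointer, so it is never smaller than the packed order, and a randomized layout could land on the worse one. Exact sizes are ABI-dependent, so the sketch only compares them.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer first, then the two ints: no internal padding on common ABIs. */
struct packed_order {
	void *p;
	int x;
	int y;
};

/* Same fields, interleaved: padding may be needed before 'p' and after 'y'. */
struct interleaved_order {
	int x;
	void *p;
	int y;
};
```

On LP64 targets such as x86-64 these come out as 16 and 24 bytes respectively.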

11 Aug 2022 · 15 commits

vdpa: Add suspend operation · 848ecea1

Committed by Eugenio Pérez on Aug 10, 2022

This operation is optional: if it's not implemented, the backend feature
bit will not be exposed.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220810171512.2343333-2-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

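The "optional operation, feature bit only if implemented" pattern can be sketched as follows. `vdpa_ops_sketch`, `SKETCH_F_SUSPEND` and `backend_features()` are hypothetical names; the real vDPA ops table and backend feature bits are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend feature bit advertised to userspace. */
#define SKETCH_F_SUSPEND (UINT64_C(1) << 0)

struct vdpa_ops_sketch {
	int (*suspend)(void *dev); /* optional: may be NULL */
};

/* Advertise the suspend feature only when the driver implements it. */
static uint64_t backend_features(const struct vdpa_ops_sketch *ops)
{
	uint64_t features = 0;

	if (ops->suspend)
		features |= SKETCH_F_SUSPEND;
	return features;
}

/* Example driver callback that simply reports success. */
static int demo_suspend(void *dev)
{
	(void)dev;
	return 0;
}
```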

vdpa/mlx5: Implement suspend virtqueue callback · cae15c2e

Committed by Eli Cohen on Jul 14, 2022

Implement the suspend callback, allowing the virtqueues to be suspended so
they stop processing descriptors. This is required to query a consistent
state of the virtqueue while live migration is taking place.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220714113927.85729-2-elic@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: add helper virtio_find_vqs_ctx_size() · fe3dc04e

Committed by Xuan Zhuo on Aug 01, 2022

Introduce helper virtio_find_vqs_ctx_size() to call find_vqs and specify
the maximum size of each vq ring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-37-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: find_vqs() add arg sizes · a10fba03

Committed by Xuan Zhuo on Aug 01, 2022

find_vqs() gains a new parameter, sizes, to specify the size of each vq
vring.

Passing NULL as sizes means that all queues in find_vqs() use the maximum
size. A value of 0 in the array means that the corresponding queue uses
the maximum size.

In the split scenario, size is the largest size to use: because allocation
may be limited by memory, the virtio core may fall back to a smaller size.
The size must be a power of 2.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-34-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

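The size rules above can be sketched as two small helpers. One resolves the requested size for queue `i` (a NULL array or a 0 entry means the maximum); the other models the split-ring fallback by halving a power-of-two size until allocation succeeds. All names are hypothetical, and `try_alloc_limited()` merely simulates memory pressure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool is_pow2(unsigned int n)
{
	return n && !(n & (n - 1));
}

/* NULL 'sizes' or a 0 entry both mean "use the maximum". */
static unsigned int requested_size(const unsigned int *sizes, size_t i,
				   unsigned int max)
{
	if (!sizes || sizes[i] == 0)
		return max;
	return sizes[i];
}

/* Split ring: treat 'size' as an upper bound and halve it while the
 * allocation fails. Returns 0 if even a 1-entry ring cannot be allocated. */
static unsigned int alloc_split_ring(unsigned int size,
				     bool (*try_alloc)(unsigned int))
{
	assert(is_pow2(size));
	for (; size; size /= 2)
		if (try_alloc(size))
			return size;
	return 0;
}

/* Simulated allocator that only succeeds up to 256 entries. */
static bool try_alloc_limited(unsigned int size)
{
	return size <= 256;
}
```

With this simulated limit, a request for 1024 entries ends up with a 256-entry ring, while a request for 64 is granted as-is.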

virtio_pci: introduce helper to get/set queue reset · 0b50cece

Committed by Xuan Zhuo on Aug 01, 2022

Introduce new helpers to implement queue reset and get queue reset
status.

 https://github.com/oasis-tcs/virtio-spec/issues/124
 https://github.com/oasis-tcs/virtio-spec/issues/139
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-31-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio_pci: struct virtio_pci_common_cfg add queue_reset · 0cdd450e

Committed by Xuan Zhuo on Aug 01, 2022

Add queue_reset in virtio_pci_modern_common_cfg.

 https://github.com/oasis-tcs/virtio-spec/issues/124
 https://github.com/oasis-tcs/virtio-spec/issues/139
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-30-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio_ring: struct virtqueue introduce reset · 4913e854

Committed by Xuan Zhuo on Aug 01, 2022

Introduce a new member reset to the structure virtqueue to determine
whether the current vq is in the reset state. Subsequent patches will
use it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-29-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: allow to unbreak/break virtqueue individually · 32510631

Committed by Xuan Zhuo on Aug 01, 2022

This patch allows the newly introduced
__virtqueue_break()/__virtqueue_unbreak() to break/unbreak the
virtqueue.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-27-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio_pci: struct virtio_pci_common_cfg add queue_notify_data · ea024594

Committed by Xuan Zhuo on Aug 01, 2022

Add queue_notify_data to struct virtio_pci_common_cfg, as proposed in
https://github.com/oasis-tcs/virtio-spec/issues/89

In order not to affect the API, add a dedicated structure struct
virtio_pci_modern_common_cfg to virtio_pci_modern.h.

Since I want to add queue_reset after queue_notify_data, I submitted
this patch first.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-26-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio_ring: introduce virtqueue_resize() · c790e8e1

Committed by Xuan Zhuo on Aug 01, 2022

Introduce virtqueue_resize() to implement resizing of the vring.
Based on this, the driver can dynamically adjust the size of the vring,
for example via ethtool -G.

virtqueue_resize() implements resize based on the vq reset function. In
case of failure to allocate a new vring, it will give up resize and use
the original vring.

During this process, if re-enabling the reset vq fails, the vq can no
longer be used, although this is unlikely to happen.

The parameter recycle is used to recycle the buffer that is no longer
used.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-25-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

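The "allocate first, fall back to the original ring on failure" behaviour can be sketched with a toy ring. `ring_sketch`, `ring_resize()` and the recycle callback are hypothetical. The real function also drives the vq reset through the transport, which is left out here.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy ring: 'num' slots, where a non-zero slot holds an in-flight buffer. */
struct ring_sketch {
	unsigned int num;
	int *slots;
};

/* Allocate the new ring before touching the old one, so a failure leaves
 * the original ring as it was. In-flight buffers of the old ring are
 * handed to 'recycle' before it is freed. */
static int ring_resize(struct ring_sketch *r, unsigned int num,
		       void (*recycle)(int buf))
{
	int *slots;
	unsigned int i;

	if (num == 0)
		return -1;
	slots = calloc(num, sizeof(*slots));
	if (!slots)
		return -1; /* give up and keep using the original ring */

	for (i = 0; i < r->num; i++)
		if (r->slots[i])
			recycle(r->slots[i]);
	free(r->slots);
	r->slots = slots;
	r->num = num;
	return 0;
}

/* Example recycle callback that just counts buffers. */
static int recycled;

static void count_recycle(int buf)
{
	(void)buf;
	recycled++;
}
```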

virtio_ring: split: stop __vring_new_virtqueue as export symbol · 07d9629d

Committed by Xuan Zhuo on Aug 01, 2022

There is currently only one place that references __vring_new_virtqueue()
directly from outside the virtio core, and vring_new_virtqueue() can be
used there instead.

Subsequent patches will modify __vring_new_virtqueue, so stop exporting it
for now.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-8-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: struct virtio_config_ops add callbacks for queue_reset · 3086e9fc

Committed by Xuan Zhuo on Aug 01, 2022

Queue reset can be divided into the following four steps (example):
 1. transport: notify the device to reset the queue
 2. vring:     recycle the buffer submitted
 3. vring:     reset/resize the vring (may re-alloc)
 4. transport: mmap vring to device, and enable the queue

In order to support queue reset, add two callbacks in struct
virtio_config_ops to implement steps 1 and 4.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-3-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

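The split of responsibilities can be sketched as a driver routine that calls two transport callbacks around the two vring steps. The callback names `reset_queue`/`enable_queue` and everything else below are hypothetical; the trace string only records the order in which the steps ran.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical virtqueue; 'trace' records which steps ran, in order. */
struct vq_sketch {
	char trace[8];
};

/* Hypothetical transport callbacks for steps 1 and 4. */
struct config_ops_sketch {
	int (*reset_queue)(struct vq_sketch *vq);
	int (*enable_queue)(struct vq_sketch *vq);
};

/* Steps 2 and 3 belong to the vring layer. */
static void vring_recycle_buffers(struct vq_sketch *vq)
{
	strcat(vq->trace, "2");
}

static void vring_reinit(struct vq_sketch *vq)
{
	strcat(vq->trace, "3");
}

/* Run the four steps in order, stopping if the transport reset fails. */
static int queue_reset(const struct config_ops_sketch *ops,
		       struct vq_sketch *vq)
{
	int err = ops->reset_queue(vq);

	if (err)
		return err;
	vring_recycle_buffers(vq);
	vring_reinit(vq);
	return ops->enable_queue(vq);
}

/* Example transport implementation. */
static int demo_reset(struct vq_sketch *vq)
{
	strcat(vq->trace, "1");
	return 0;
}

static int demo_enable(struct vq_sketch *vq)
{
	strcat(vq->trace, "4");
	return 0;
}
```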

virtio: record the maximum queue num supported by the device. · da802961

Committed by Xuan Zhuo on Aug 01, 2022

virtio-net can display the maximum (supported by hardware) ring size in
ethtool -g eth0.

When the subsequent patch implements vring reset, it can use this to check
whether the ring size passed by the driver is valid.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-2-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

remoteproc: rename len of rpoc_vring to num · c2a052a4

Committed by Xuan Zhuo on Jun 24, 2022

Rename the member len in the structure rpoc_vring to num, and remove
'in bytes' from its comment. That wording is misleading, because the
member actually refers to the size of the virtio vring to be created, and
its unit is not bytes.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20220624025621.128843-2-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

net: fix refcount bug in sk_psock_get (2) · 2a013372

Committed by Hawkins Jiawei on Aug 05, 2022

Syzkaller reports refcount bug as follows:
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
Modules linked in:
CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda #0
 <TASK>
 __refcount_add_not_zero include/linux/refcount.h:163 [inline]
 __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
 refcount_inc_not_zero include/linux/refcount.h:245 [inline]
 sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
 sk_backlog_rcv include/net/sock.h:1061 [inline]
 __release_sock+0x134/0x3b0 net/core/sock.c:2849
 release_sock+0x54/0x1b0 net/core/sock.c:3404
 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
 __sys_shutdown_sock net/socket.c:2331 [inline]
 __sys_shutdown_sock net/socket.c:2325 [inline]
 __sys_shutdown+0xf1/0x1b0 net/socket.c:2343
 __do_sys_shutdown net/socket.c:2351 [inline]
 __se_sys_shutdown net/socket.c:2349 [inline]
 __x64_sys_shutdown+0x50/0x70 net/socket.c:2349
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
 </TASK>

During the SMC fallback process in the connect syscall, the kernel
replaces TCP with SMC. In order to forward wakeups to the smc socket
waitqueue after fallback, the kernel sets clcsk->sk_user_data to the
original smc socket in smc_fback_replace_callbacks().

Later, in the shutdown syscall, the kernel calls sk_psock_get(), which
treats clcsk->sk_user_data as a psock, triggering the refcnt warning.

So the root cause is that smc and psock both use the sk_user_data field,
so one can easily be mistaken for the other.

This patch solves it by using another bit (defined as
SK_USER_DATA_PSOCK) in PTRMASK to mark whether sk_user_data points to
a psock object or not.
This patch depends on a PTRMASK introduced in commit f1ff5ce2
("net, sk_msg: Clear sk_user_data pointer on clone if tagged").

Since more flags may be added to the sk_user_data field in the future,
this patch also refactors the sk_user_data flags code to be more generic
and improve its maintainability.

Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

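Tagging the pointer works because the objects stored in sk_user_data are aligned, so the low bits of the address are always zero and can carry flags. A minimal sketch, assuming at least 4-byte alignment; the flag values and helper names are hypothetical and do not reproduce the kernel's SK_USER_DATA_* definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits kept in the low bits of an aligned pointer. */
#define UD_OTHER_FLAG	((uintptr_t)1)
#define UD_PSOCK	((uintptr_t)2)
#define UD_PTRMASK	(~(UD_OTHER_FLAG | UD_PSOCK))

/* Combine an aligned pointer with flag bits. */
static uintptr_t ud_pack(void *ptr, uintptr_t flags)
{
	assert(((uintptr_t)ptr & ~UD_PTRMASK) == 0); /* low bits must be free */
	return (uintptr_t)ptr | flags;
}

/* Strip the flags to recover the pointer. */
static void *ud_ptr(uintptr_t ud)
{
	return (void *)(ud & UD_PTRMASK);
}

/* Only hand out the pointer as a psock when it was tagged as one, so an
 * unrelated object (such as an SMC socket) is never mistaken for a psock. */
static void *ud_psock(uintptr_t ud)
{
	return (ud & UD_PSOCK) ? ud_ptr(ud) : NULL;
}
```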

10 Aug 2022 · 3 commits

add barriers to buffer_uptodate and set_buffer_uptodate · d4252071

Committed by Mikulas Patocka on Aug 09, 2022

Let's have a look at this piece of code in __bread_slow:

	get_bh(bh);
	bh->b_end_io = end_buffer_read_sync;
	submit_bh(REQ_OP_READ, 0, bh);
	wait_on_buffer(bh);
	if (buffer_uptodate(bh))
		return bh;

Neither wait_on_buffer nor buffer_uptodate contains any memory barrier.
Consequently, if someone calls sb_bread and then reads the buffer data,
the read of the buffer data may be executed before wait_on_buffer(bh) on
architectures with weak memory ordering, and it may return invalid data.

Fix this bug by adding a memory barrier to set_buffer_uptodate and an
acquire barrier to buffer_uptodate (in a similar way as
folio_test_uptodate and folio_mark_uptodate).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

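The fix pairs a barrier on the writer with an acquire on the reader. The same pairing can be sketched with C11 atomics, where a release store and an acquire load stand in for the kernel's own barrier primitives (the patch itself uses a memory barrier on the set side, which is at least as strong). The buffer type and helpers are hypothetical, and the single-threaded usage only exercises the API, not the ordering guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

struct buf_sketch {
	char data[16];
	atomic_bool uptodate;
};

/* Writer side: publish with release ordering, so the stores that filled
 * 'data' cannot become visible after the flag. */
static void buf_set_uptodate(struct buf_sketch *b)
{
	atomic_store_explicit(&b->uptodate, true, memory_order_release);
}

/* Reader side: test with acquire ordering, so reads of 'data' issued after
 * a true result cannot be satisfied before the flag was observed. */
static bool buf_uptodate(struct buf_sketch *b)
{
	return atomic_load_explicit(&b->uptodate, memory_order_acquire);
}

/* Completion path: fill the buffer, then mark it up to date. */
static void buf_complete_read(struct buf_sketch *b, const char *src)
{
	strncpy(b->data, src, sizeof(b->data) - 1);
	b->data[sizeof(b->data) - 1] = '\0';
	buf_set_uptodate(b);
}
```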

NFS: Improve write error tracing · af887e43

Committed by Trond Myklebust on Aug 09, 2022

Don't leak request pointers, but use the "device:inode" labelling that
is used by all the other trace points. Furthermore, replace use of page
indexes with an offset, again in order to align behaviour with other
NFS trace points.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64 · 46dae32f

Committed by Youngmin Nam on Jul 12, 2022

In the ns_to_kernel_old_timeval() definition in kernel/time/time.c, the
function argument is declared with a const qualifier, but the prototype in
include/linux/time32.h is different.

- The function is defined in kernel/time/time.c as below:
  struct __kernel_old_timeval ns_to_kernel_old_timeval(const s64 nsec)

- The function is declared in include/linux/time32.h as below:
  extern struct __kernel_old_timeval ns_to_kernel_old_timeval(s64 nsec);

Because an argument of arithmetic type passed by value can't be modified in
the calling scope, there's no need to mark such arguments as const, which
was already mentioned during review (Link[1]) of the original patch.

Likewise remove the "const" keyword in both definition and declaration of
ns_to_timespec64() as requested by Arnd (Link[2]).

Fixes: a84d1169 ("y2038: Introduce struct __kernel_old_timeval")
Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/all/20220712094715.2918823-1-youngmin.nam@samsung.com
Link[1]: https://lore.kernel.org/all/20180310081123.thin6wphgk7tongy@gmail.com/
Link[2]: https://lore.kernel.org/all/CAK8P3a3nknJgEDESGdJH91jMj6R_xydFqWASd8r5BbesdvMBgA@mail.gmail.com/

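The mismatch compiles at all because a top-level const on a by-value parameter is not part of the function's type in C; it only stops the body from modifying its own local copy. A small illustration with a hypothetical function:

```c
#include <assert.h>
#include <stdint.h>

/* The declaration omits const... */
int64_t ns_to_usec_sketch(int64_t nsec);

/* ...while the definition adds it. Both have the same function type, so
 * the compiler accepts the pair; callers are unaffected either way. */
int64_t ns_to_usec_sketch(const int64_t nsec)
{
	return nsec / 1000;
}
```

Dropping the qualifier from both places, as the patch does, simply makes the two lines read alike.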

09 Aug 2022 · 16 commits

get rid of non-advancing variants · eba2d3d7

Committed by Al Viro on Jun 10, 2022

mechanical change; will be further massaged in subsequent commits
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() · 1ef255e2

Committed by Al Viro on Jun 09, 2022

Most of the users immediately follow successful iov_iter_get_pages()
with advancing by the amount it had returned.

Provide inline wrappers doing that, convert trivial open-coded
uses of those.

BTW, iov_iter_get_pages() never returns more than it had been asked
to; such checks in cifs ought to be removed someday...
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

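The wrapper pattern can be sketched with a toy iterator. All names are hypothetical, and the real iov_iter functions also pin pages and report errors, which the sketch leaves out. It only shows the "get, then advance by what was returned" sequence folded into one call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy iterator: current position and bytes remaining. */
struct iter_sketch {
	size_t pos;
	size_t count;
};

/* Non-advancing primitive: reports how much it could grab, never more
 * than asked for, but leaves the iterator where it was. */
static size_t get_pages_sketch(const struct iter_sketch *i, size_t maxsize)
{
	return i->count < maxsize ? i->count : maxsize;
}

static void advance_sketch(struct iter_sketch *i, size_t bytes)
{
	i->pos += bytes;
	i->count -= bytes;
}

/* Advancing wrapper: the open-coded "get, then advance" in one call. */
static size_t get_pages_advance_sketch(struct iter_sketch *i, size_t maxsize)
{
	size_t got = get_pages_sketch(i, maxsize);

	if (got > 0)
		advance_sketch(i, got);
	return got;
}
```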

ITER_PIPE: fold data_start() and pipe_space_for_user() together · 12d426ab

Committed by Al Viro on Jun 15, 2022

All their callers are next to each other; all of them
want the total amount of pages and, possibly, the
offset in the partial final buffer.

Combine into a new helper (pipe_npages()), fix the
bogosity in pipe_space_for_user(), while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ITER_PIPE: cache the type of last buffer · 10f525a8

Committed by Al Viro on Jun 15, 2022

We often need to find whether the last buffer is anon or not, and
currently it's rather clumsy:
	check if ->iov_offset is non-zero (i.e. that pipe is not empty)
	if so, get the corresponding pipe_buffer and check its ->ops
	if it's &default_pipe_buf_ops, we have an anon buffer.

Let's replace the use of ->iov_offset (which is nowhere near similar to
its role for other flavours) with signed field (->last_offset), with
the following rules:
	empty, no buffers occupied:		0
	anon, with bytes up to N-1 filled:	N
	zero-copy, with bytes up to N-1 filled:	-N

That way abs(i->last_offset) is equal to what used to be in i->iov_offset
and empty vs. anon vs. zero-copy can be distinguished by the sign of
i->last_offset.

	Checks for "should we extend the last buffer or should we start
a new one?" become easier to follow that way.

	Note that most of the operations can only be done in a sane
state - i.e. when the pipe has nothing past the current position of
iterator.  About the only thing that could be done outside of that
state is iov_iter_advance(), which transitions to the sane state by
truncating the pipe.  There are only two cases where we leave the
sane state:
	1) iov_iter_get_pages()/iov_iter_get_pages_alloc().  Will be
dealt with later, when we make get_pages advancing - the callers are
actually happier that way.
	2) iov_iter copied, then something is put into the copy.  Since
they share the underlying pipe, the original gets behind.  When we
decide that we are done with the copy (original is not usable until then)
we advance the original.  direct_io used to be done that way; nowadays
it operates on the original and we do iov_iter_revert() to discard
the excessive data.  At the moment there's nothing in the kernel that
could do that to ITER_PIPE iterators, so this reason for insane state
is theoretical right now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

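The signed encoding can be sketched as a handful of predicates over `last_offset`. The helper names are hypothetical, and `can_extend_last()` is an assumed example of the kind of check the text says becomes easier: here only an anon buffer that still has room can take more data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Encoding from the text: 0 = empty, N = anon buffer filled up to N-1,
 * -N = zero-copy buffer filled up to N-1. */
static bool last_is_empty(long last_offset)
{
	return last_offset == 0;
}

static bool last_is_anon(long last_offset)
{
	return last_offset > 0;
}

static bool last_is_zerocopy(long last_offset)
{
	return last_offset < 0;
}

/* What used to be stored in iov_offset. */
static long last_fill(long last_offset)
{
	return labs(last_offset);
}

/* Hypothetical check: extend the last buffer only if it is anon and not
 * yet full; otherwise a new buffer has to be started. */
static bool can_extend_last(long last_offset, long buf_size)
{
	return last_is_anon(last_offset) && last_offset < buf_size;
}
```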

new iov_iter flavour - ITER_UBUF · fcb14cb1

Committed by Al Viro on May 22, 2022

Equivalent of single-segment iovec.  Initialized by iov_iter_ubuf(),
checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC
ones.

We are going to expose things like ->write_iter() et al. to those
in subsequent commits.

New predicate (user_backed_iter()) that is true for ITER_IOVEC and
ITER_UBUF; places like direct-IO handling should use that for
checking that pages we modify after getting them from iov_iter_get_pages()
would need to be dirtied.

DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
will solve all problems - there's code that uses iter_is_iovec() to
decide how to poke around in iov_iter guts and for that the predicate
replacement obviously won't suffice.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

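The new predicate can be sketched over a hypothetical subset of iterator flavours. The enum below is illustrative and does not reproduce the kernel's `iter_type` values; it only shows why `user_backed_iter()` is broader than a UBUF check and narrower than "any iterator".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of iterator flavours. */
enum iter_kind_sketch {
	SK_ITER_IOVEC,
	SK_ITER_UBUF,
	SK_ITER_KVEC,
	SK_ITER_BVEC,
	SK_ITER_PIPE,
};

/* Counterpart of iter_is_ubuf(): matches the single-segment flavour only. */
static bool sketch_is_ubuf(enum iter_kind_sketch k)
{
	return k == SK_ITER_UBUF;
}

/* Counterpart of user_backed_iter(): true for both flavours whose pages
 * come from user memory and may need dirtying after being modified. */
static bool sketch_user_backed(enum iter_kind_sketch k)
{
	return k == SK_ITER_IOVEC || k == SK_ITER_UBUF;
}
```

Code that only asks "is this user memory?" can switch to the broader predicate; code that walks the iovec array itself cannot, as the text warns.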

highmem: delete a sentence from kmap_local_page() kdocs · 72f1c55a

Committed by Fabio M. De Francesco on Jul 28, 2022

kmap_local_page() should always be preferred in place of kmap() and
kmap_atomic().  "Only use when really necessary." is not consistent with
the Documentation/mm/highmem.rst and these kdocs it embeds.

Therefore, delete the above-mentioned sentence from kdocs.

Link: https://lkml.kernel.org/r/20220728154844.10874-7-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

highmem: specify that kmap_local_page() is callable from interrupts · 383bbef2

Committed by Fabio M. De Francesco on Jul 28, 2022

In a recent thread about converting kmap() to kmap_local_page(), the
safety of calling kmap_local_page() was questioned.[1]

"any context" should probably be enough detail for users who want to know
whether or not kmap_local_page() can be called from interrupts.  However,
Linux still has kmap_atomic() which might make users think they must use
the latter in interrupts.

Add "including interrupts" for better clarity.

[1] https://lore.kernel.org/lkml/3187836.aeNJFYEL58@opensuse/

Link: https://lkml.kernel.org/r/20220728154844.10874-3-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

highmem: remove unneeded spaces in kmap_local_page() kdocs · 729337bc

Committed by Fabio M. De Francesco on Jul 28, 2022

Patch series "highmem: Extend kmap_local_page() documentation", v2.

The Highmem interface is evolving and the current documentation does not
reflect the intended uses of each of the calls.  Furthermore, after a
recent series of reworks, the differences of the calls can still be
confusing and may lead to the expanded use of calls which are deprecated.

This series is the second round of changes towards an enhanced
documentation of the Highmem's interface; at this stage the patches are
only focused to kmap_local_page().

In addition it also contains some minor clean ups.


This patch (of 7):

In the kdocs of kmap_local_page(), the description of @page starts after
several unnecessary spaces.

Therefore, remove those spaces.

Link: https://lkml.kernel.org/r/20220728154844.10874-1-fmdefrancesco@gmail.com
Link: https://lkml.kernel.org/r/20220728154844.10874-2-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm, hwpoison: enable memory error handling on 1GB hugepage · 6f461488

Committed by Naoya Horiguchi on Jul 14, 2022

Now that the error handling code is ready, remove the blocking code and
enable memory error handling on 1GB hugepages.

Link: https://lkml.kernel.org/r/20220714042420.1847125-9-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm, hwpoison: set PG_hwpoison for busy hugetlb pages · 38f6d293

Committed by Naoya Horiguchi on Jul 14, 2022

If memory_failure() fails to grab the page refcount on a hugetlb page
because it's busy, it returns without setting PG_hwpoison on it.  This not
only loses a chance of error containment, but also breaks the rule that
action_result() should be called only when memory_failure() does some
handling work (even if that's just setting PG_hwpoison).  This
inconsistency could harm code maintainability.

So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case.

Link: https://lkml.kernel.org/r/20220714042420.1847125-6-naoya.horiguchi@linux.dev
Fixes: 405ce051 ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage · ac5fcde0

Committed by Naoya Horiguchi on Jul 14, 2022

Raw error info list needs to be removed when hwpoisoned hugetlb is
unpoisoned.  And unpoison handler needs to know how many errors there are
in the target hugepage.  So add them.

Hugepages with HPageVmemmapOptimized(hpage) or HPageRawHwpUnreliable(hpage)
set sometimes can't be unpoisoned, so skip them.

Link: https://lkml.kernel.org/r/20220714042420.1847125-5-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm, hwpoison, hugetlb: support saving mechanism of raw error pages · 161df60e

Committed by Naoya Horiguchi on Jul 14, 2022

When handling a memory error on a hugetlb page, the error handler tries to
dissolve it and turn it into 4kB pages.  If it's successfully dissolved,
the PageHWPoison flag is moved to the raw error page, so that's all right.
However, dissolving sometimes fails, and then the error page is left as a
hwpoisoned hugepage.  It would be useful to retry dissolving it to save
healthy pages, but that's not possible now because the information about
where the raw error pages are is lost.

Use the private field of a few tail pages to keep that information.  The
code path of shrinking hugepage pool uses this info to try delayed
dissolve.  In order to remember multiple errors in a hugepage, a
singly-linked list originated from SUBPAGE_INDEX_HWPOISON-th tail page is
constructed.  Only simple operations (adding an entry or clearing all) are
required and the list is assumed not to be very long, so this simple data
structure should be enough.

If we fail to save raw error info, the hwpoisoned hugepage has errors on
an unknown subpage and this new saving mechanism no longer works, so
disable saving new raw error info and freeing hwpoisoned hugepages.

Link: https://lkml.kernel.org/r/20220714042420.1847125-4-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

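The bookkeeping can be sketched as a plain singly-linked list with just the two operations the text calls for, plus the "stop saving once an entry could not be recorded" rule. In the kernel the list head lives in a tail page's private field; here it is an ordinary struct member, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* One recorded raw error: which subpage of the hugepage was hit. */
struct raw_err_sketch {
	struct raw_err_sketch *next;
	unsigned long pfn;
};

struct hugepage_sketch {
	struct raw_err_sketch *errors;	/* singly-linked list head */
	bool unreliable;		/* set once an error could not be saved */
};

/* Add one entry; on allocation failure mark the info unreliable, after
 * which no new entries are saved. */
static void raw_err_add(struct hugepage_sketch *h, unsigned long pfn)
{
	struct raw_err_sketch *e;

	if (h->unreliable)
		return;
	e = malloc(sizeof(*e));
	if (!e) {
		h->unreliable = true;
		return;
	}
	e->pfn = pfn;
	e->next = h->errors;
	h->errors = e;
}

/* The other operation needed: free every entry. Returns how many there were. */
static unsigned int raw_err_clear(struct hugepage_sketch *h)
{
	unsigned int n = 0;

	while (h->errors) {
		struct raw_err_sketch *e = h->errors;

		h->errors = e->next;
		free(e);
		n++;
	}
	return n;
}
```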

mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst · 838691a1

Committed by Muchun Song on Jun 28, 2022

All the comments which explain how HVO works have been moved to
vmemmap_dedup.rst since

  commit 4917f55b ("mm/sparse-vmemmap: improve memory savings for compound devmaps")

except some comments above page_fixed_fake_head().  This commit moves
those comments to vmemmap_dedup.rst and improves vmemmap_dedup.rst as well.

Link: https://lkml.kernel.org/r/20220628092235.91270-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability · 6213834c

Committed by Muchun Song on Jun 28, 2022

There is a discussion about the name of hugetlb_vmemmap_alloc/free in
thread [1].  David suggested renaming "alloc/free" to "optimize/restore"
to make the functionality clearer to users: "optimize" means the function
will optimize vmemmap pages, while "restore" means restoring the vmemmap
pages discarded before.  This commit does this.

Another discussion is the confusion RESERVE_VMEMMAP_NR isn't used
explicitly for vmemmap_addr but implicitly for vmemmap_end in
hugetlb_vmemmap_alloc/free.  David suggested we can compute what
hugetlb_vmemmap_init() does now at runtime.  We do not need to worry for
the overhead of computing at runtime since the calculation is simple
enough and those functions are not in a hot path.  This commit has the
following improvements:

  1) The function name suffixes ("optimize/restore") are more expressive.
  2) The logic becomes less weird in hugetlb_vmemmap_optimize/restore().
  3) The hugetlb_vmemmap_init() does not need to be exported anymore.
  4) The ->optimize_vmemmap_pages field in struct hstate is removed.
  5) is_power_of_2(sizeof(struct page)) is checked in only one place
     instead of two.
  6) Add more comments for hugetlb_vmemmap_optimize/restore().
  7) External users originally called hugetlb_optimize_vmemmap_pages() to
     detect whether a HugeTLB page's vmemmap pages are optimizable.  This
     commit removes it and introduces a new helper,
     hugetlb_vmemmap_optimizable(), to replace it.  The new name is more
     expressive.
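
Item 5 and item 7 above, together with the idea of computing at runtime what hugetlb_vmemmap_init() used to precompute, can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's code: it assumes exactly one base page of vmemmap stays mapped (is reserved) per HugeTLB page, takes the base page size and sizeof(struct page) as parameters (typically 4 KiB and 64 bytes on x86-64), and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool model_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Bytes of struct page metadata (vmemmap) describing one HugeTLB page. */
static unsigned long model_vmemmap_size(unsigned long huge_page_size,
					unsigned long base_page_size,
					unsigned long struct_page_size)
{
	assert(base_page_size > 0 && huge_page_size % base_page_size == 0);
	return (huge_page_size / base_page_size) * struct_page_size;
}

/*
 * Bytes of vmemmap that could be freed, computed on demand instead of
 * being cached in a per-hstate field.  The single power-of-two check
 * lives here; the model assumes no optimization is possible otherwise.
 */
static unsigned long model_vmemmap_optimizable_size(unsigned long huge_page_size,
						    unsigned long base_page_size,
						    unsigned long struct_page_size)
{
	unsigned long reserve = base_page_size; /* assumed: one page kept */
	unsigned long size;

	if (!model_is_power_of_2(struct_page_size))
		return 0;
	size = model_vmemmap_size(huge_page_size, base_page_size,
				  struct_page_size);
	return size > reserve ? size - reserve : 0;
}

/* Predicate in the spirit of hugetlb_vmemmap_optimizable(). */
static bool model_vmemmap_optimizable(unsigned long huge_page_size,
				      unsigned long base_page_size,
				      unsigned long struct_page_size)
{
	return model_vmemmap_optimizable_size(huge_page_size, base_page_size,
					      struct_page_size) != 0;
}
```

With these assumed sizes, a 2 MiB HugeTLB page has 32 KiB (8 pages) of vmemmap, of which 28 KiB (7 pages) is optimizable, and a 1 GiB page has 16 MiB (4096 pages), of which 4095 pages are optimizable. Deriving the value from the sizes each time, rather than storing it, is the trade-off the message describes: the arithmetic is cheap and not on a hot path.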

Link: https://lore.kernel.org/all/20220404074652.68024-2-songmuchun@bytedance.com/ [1]
Link: https://lkml.kernel.org/r/20220628092235.91270-7-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c · 998a2997

Committed by Muchun Song on Jun 28, 2022

When I first introduced vmemmap manipulation functions related to HugeTLB,
I thought those functions might be reused by other modules (e.g. ones
using a similar approach to optimize vmemmap pages; unfortunately, DAX
used the same approach but does not use those functions).  After two
years, we haven't seen any other users, so move those functions to
hugetlb_vmemmap.c.  This is pure code movement without any functional
change.

Link: https://lkml.kernel.org/r/20220628092235.91270-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


mm: hugetlb_vmemmap: introduce the name HVO · dff03381

Committed by Muchun Song on Jun 28, 2022

It is inconvenient to mention the feature of optimizing vmemmap pages
associated with HugeTLB pages when communicating with others, since it
was not given a specific or abbreviated name when it was first introduced.
Let us give it the name HVO (HugeTLB Vmemmap Optimization) from now on.

This commit also updates the documentation of "hugetlb_free_vmemmap" in
the way discussed in thread [1].

Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/ [1]
Link: https://lkml.kernel.org/r/20220628092235.91270-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

