1. 25 May 2022, 1 commit
  2. 23 May 2022, 3 commits
  3. 20 May 2022, 4 commits
  4. 19 May 2022, 15 commits
  5. 18 May 2022, 6 commits
    • random: handle latent entropy and command line from random_init() · 2f14062b
      Committed by Jason A. Donenfeld
      Currently, start_kernel() adds latent entropy and the command line to
      the entropy pool *after* the RNG has been initialized, deferring when
      it's actually used by things like stack canaries until the next time
      the pool is seeded. This surely is not intended.
      
      Rather than splitting up which entropy gets added where and when between
      start_kernel() and random_init(), just do everything in random_init(),
      which should eliminate these kinds of bugs in the future.
      
      While we're at it, rename the awkwardly titled "rand_initialize()" to
      the more standard "random_init()" nomenclature.
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      2f14062b
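
      A minimal kernel-style sketch of the resulting shape (illustrative, assuming the usual helpers add_latent_entropy() and add_device_randomness(); not the literal diff):

      void __init random_init(const char *command_line)
      {
              /* ...seed from the arch RNG / cycle counter first... */
              add_latent_entropy();
              add_device_randomness(command_line, strlen(command_line));
              /* only now may consumers such as the stack canary draw from the RNG */
      }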
    • random32: use real rng for non-deterministic randomness · d4150779
      Committed by Jason A. Donenfeld
      random32.c has two random number generators in it: one that is meant to
      be used deterministically, with some predefined seed, and one that does
      the same exact thing as random.c, except does it poorly. The first one
      has some use cases. The second one no longer does and can be replaced
      with calls to random.c's proper random number generator.
      
      The relatively recent siphash-based bad random32.c code was added in
      response to concerns that the prior random32.c was too deterministic.
      Out of fears that random.c was (at the time) too slow, this code was
      anonymously contributed. Then out of that emerged a kind of shadow
      entropy gathering system, with its own tentacles throughout various net
      code, added willy nilly.
      
      Stop👏making👏bespoke👏random👏number👏generators👏.
      
      Fortunately, recent advances in random.c mean that we can stop playing
      with this sketchiness, and just use get_random_u32(), which is now fast
      enough. In micro benchmarks using RDPMC, I'm seeing the same median
      cycle count between the two functions, with the mean being _slightly_
      higher due to batches refilling (which we can optimize further if need be).
      However, when doing *real* benchmarks of the net functions that actually
      use these random numbers, the mean cycles actually *decreased* slightly
      (with the median still staying the same), likely because the additional
      prandom code means icache misses and complexity, whereas random.c is
      generally already being used by something else nearby.
      
      The biggest benefit of this is that there are many users of prandom who
      probably should be using cryptographically secure random numbers. This
      makes all of those accidental cases become secure by just flipping a
      switch. Later on, we can do a tree-wide cleanup to remove the static
      inline wrapper functions that this commit adds.
      
      There are also some low-ish hanging fruits for making this even faster
      in the future: a get_random_u16() function for use in the networking
      stack will give a 2x performance boost there, using SIMD for ChaCha20
      will let us compute 4 or 8 or 16 blocks of output in parallel, instead
      of just one, giving us large buffers for cheap, and introducing a
      get_random_*_bh() function that assumes irqs are already disabled will
      shave off a few cycles for ordinary calls. These are things we can chip
      away at down the road.
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      d4150779
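
      A sketch of the kind of thin wrappers this leaves behind until the tree-wide cleanup mentioned above (hedged; illustrative rather than the exact header contents):

      static inline u32 prandom_u32(void)
      {
              return get_random_u32();
      }

      static inline void prandom_bytes(void *buf, size_t nbytes)
      {
              get_random_bytes(buf, nbytes);
      }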
    • siphash: use one source of truth for siphash permutations · e73aaae2
      Committed by Jason A. Donenfeld
      The SipHash family of permutations is currently used in three places:
      
      - siphash.c itself, used in the ordinary way it was intended.
      - random32.c, in a construction from an anonymous contributor.
      - random.c, as part of its fast_mix function.
      
      Each one of these places reinvents the wheel with the same C code, same
      rotation constants, and same symmetry-breaking constants.
      
      This commit tidies things up a bit by placing macros for the
      permutations and constants into siphash.h, where each of the three .c
      users can access them. It also leaves a note dissuading more users of
      them from emerging.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      e73aaae2
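
      A self-contained sketch of the shared definitions this centralises (the rotation and symmetry-breaking constants are the standard SipHash ones; the exact macro names in siphash.h may differ):

      #include <stdint.h>

      #define ROL64(v, n)  (((v) << (n)) | ((v) >> (64 - (n))))

      #define SIPHASH_PERMUTATION(a, b, c, d) ( \
              (a) += (b), (b) = ROL64((b), 13), (b) ^= (a), (a) = ROL64((a), 32), \
              (c) += (d), (d) = ROL64((d), 16), (d) ^= (c), \
              (a) += (d), (d) = ROL64((d), 21), (d) ^= (a), \
              (c) += (b), (b) = ROL64((b), 17), (b) ^= (c), (c) = ROL64((c), 32))

      #define SIPHASH_CONST_0  0x736f6d6570736575ULL  /* "somepseu" */
      #define SIPHASH_CONST_1  0x646f72616e646f6dULL  /* "dorandom" */
      #define SIPHASH_CONST_2  0x6c7967656e657261ULL  /* "lygenera" */
      #define SIPHASH_CONST_3  0x7465646279746573ULL  /* "tedbytes" */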
    • io_uring: add support for ring mapped supplied buffers · c7fb1942
      Committed by Jens Axboe
      Provided buffers allow an application to supply io_uring with buffers
      that can then be grabbed for a read/receive request, when the data
      source is ready to deliver data. The existing scheme relies on using
      IORING_OP_PROVIDE_BUFFERS to do that, but it can be difficult to use
      in real world applications. It's pretty efficient if the application
      is able to supply back batches of provided buffers when they have been
      consumed and the application is ready to recycle them, but if
      fragmentation occurs in the buffer space, it can become difficult to
      supply enough buffers in time. This hurts efficiency.
      
      Add a register op, IORING_REGISTER_PBUF_RING, which allows an application
      to setup a shared queue for each buffer group of provided buffers. The
      application can then supply buffers simply by adding them to this ring,
      and the kernel can consume them just as easily. The ring shares the head
      with the application, the tail remains private in the kernel.
      
      Provided buffers setup with IORING_REGISTER_PBUF_RING cannot use
      IORING_OP_{PROVIDE,REMOVE}_BUFFERS for adding or removing entries to the
      ring, they must use the mapped ring. Mapped provided buffer rings can
      co-exist with normal provided buffers, just not within the same group ID.
      
      To gauge overhead of the existing scheme and evaluate the mapped ring
      approach, a simple NOP benchmark was written. It uses a ring of 128
      entries, and submits/completes 32 at a time. 'Replenish' is how
      many buffers are provided back at a time after they have been
      consumed:
      
      Test			Replenish			NOPs/sec
      ================================================================
      No provided buffers	NA				~30M
      Provided buffers	32				~16M
      Provided buffers	 1				~10M
      Ring buffers		32				~27M
      Ring buffers		 1				~27M
      
      The ring mapped buffers perform almost as well as not using provided
      buffers at all, and they don't care if you provided 1 or more back at
      the same time. This means applications can just replenish as they go,
      rather than need to batch and compact, further reducing overhead in the
      application. The NOP benchmark above doesn't need to do any compaction,
      so that overhead isn't even reflected in the above test.
      Co-developed-by: Dylan Yudaken <dylany@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c7fb1942
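
      A hedged sketch (hypothetical names, simplified from the uapi structures) of what such a shared ring amounts to: an array of buffer descriptors mapped by both sides, with one index shared between them while the other stays private to the kernel, as described above:

      #include <stdint.h>

      struct pbuf_desc {
              uint64_t addr;   /* userspace address of the buffer    */
              uint32_t len;    /* buffer length in bytes             */
              uint16_t bid;    /* buffer ID reported back in the CQE */
              uint16_t resv;
      };

      struct pbuf_ring {
              uint32_t shared_idx;     /* advanced as buffers are added/consumed */
              uint32_t ring_mask;      /* entries - 1; entries is a power of two */
              struct pbuf_desc bufs[]; /* the descriptor ring itself             */
      };

      /* Replenishing is then: write a descriptor at (idx & ring_mask) and
       * publish the new index with a release store, no syscall required. */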
    • locking/atomic: Add generic try_cmpxchg64 support · 0aa7be05
      Committed by Uros Bizjak
      Add generic support for try_cmpxchg64{,_acquire,_release,_relaxed}
      and their fallbacks involving cmpxchg64.
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20220515184205.103089-2-ubizjak@gmail.com
      0aa7be05
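
      A userspace analogue of the try_cmpxchg64() calling convention the fallbacks provide (hedged sketch using C11 atomics, not the kernel macros): on failure the expected value is updated in place, so retry loops avoid an extra reload:

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdint.h>

      static bool try_cmpxchg64_sketch(_Atomic uint64_t *ptr, uint64_t *old, uint64_t new)
      {
              /* on failure, *old is overwritten with the value actually seen */
              return atomic_compare_exchange_strong(ptr, old, new);
      }

      static void add_sketch(_Atomic uint64_t *ctr, uint64_t delta)
      {
              uint64_t old = atomic_load(ctr);

              while (!try_cmpxchg64_sketch(ctr, &old, old + delta))
                      ;       /* 'old' already holds the latest value; just retry */
      }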
    • audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts · 69e9cd66
      Committed by Julian Orth
      Not calling the function for dummy contexts will cause the context to
      not be reset. During the next syscall, this will cause an error in
      __audit_syscall_entry:
      
      	WARN_ON(context->context != AUDIT_CTX_UNUSED);
      	WARN_ON(context->name_count);
      	if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
      		audit_panic("unrecoverable error in audit_syscall_entry()");
      		return;
      	}
      
      These problematic dummy contexts are created via the following call
      chain:
      
             exit_to_user_mode_prepare
          -> arch_do_signal_or_restart
          -> get_signal
          -> task_work_run
          -> tctx_task_work
          -> io_req_task_submit
          -> io_issue_sqe
          -> audit_uring_entry
      
      Cc: stable@vger.kernel.org
      Fixes: 5bd2182d ("audit,io_uring,io-wq: add some basic audit support to io_uring")
      Signed-off-by: Julian Orth <ju.orth@gmail.com>
      [PM: subject line tweaks]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      69e9cd66
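
      A hedged sketch of the shape of the fix (illustrative, not the exact diff): the io_uring exit hook must reach __audit_uring_exit() even for dummy contexts, so the per-task state is reset before the next syscall entry:

      static inline void audit_uring_exit(int success, long code)
      {
              if (audit_context())            /* previously also skipped dummy contexts */
                      __audit_uring_exit(success, code);
      }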
  6. 17 May 2022, 2 commits
  7. 16 May 2022, 5 commits
  8. 14 May 2022, 3 commits
    • io_uring: add IORING_ACCEPT_MULTISHOT for accept · 390ed29b
      Committed by Hao Xu
      Add an accept flag, IORING_ACCEPT_MULTISHOT, to support multishot accept.
      Signed-off-by: Hao Xu <howeyxu@tencent.com>
      Link: https://lore.kernel.org/r/20220514142046.58072-2-haoxu.linux@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      390ed29b
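
      In the uapi header this amounts to a single flag bit (the value shown is an assumption, as the first accept flag); with it set, the accept request stays armed and posts one CQE per accepted connection:

      #define IORING_ACCEPT_MULTISHOT  (1U << 0)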
    • timekeeping: Add raw clock fallback for random_get_entropy() · 1366992e
      Committed by Jason A. Donenfeld
      The addition of random_get_entropy_fallback() provides access to
      whichever time source has the highest frequency, which is useful for
      gathering entropy on platforms without available cycle counters. It's
      not necessarily as good as being able to quickly access a cycle counter
      that the CPU has, but it's still something, even when it falls back to
      being jiffies-based.
      
      In the event that a given arch does not define get_cycles(), falling
      back to the get_cycles() default implementation that returns 0 is really
      not the best we can do. Instead, at least calling
      random_get_entropy_fallback() would be preferable, because that always
      needs to return _something_, even falling back to jiffies eventually.
      It's not as though random_get_entropy_fallback() is super high precision
      or guaranteed to be entropic, but basically anything that's not zero all
      the time is better than returning zero all the time.
      
      Finally, since random_get_entropy_fallback() is used during extremely
      early boot when randomizing freelists in mm_init(), it can be called
      before timekeeping has been initialized. In that case there really is
      nothing we can do; jiffies hasn't even started ticking yet. So just give
      up and return 0.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Theodore Ts'o <tytso@mit.edu>
      1366992e
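
      A userspace analogue of the idea (hedged sketch; the kernel reads the raw clocksource directly rather than calling clock_gettime()): use the highest-resolution raw clock available and return 0 when nothing is usable yet:

      #include <stdint.h>
      #include <time.h>

      static uint64_t entropy_timestamp_fallback(void)
      {
              struct timespec ts;

              if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) != 0)
                      return 0;       /* nothing usable yet; give up, as described above */
              return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
      }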
    • security: declare member holding string literal const · 1af0e4a0
      Committed by Christian Göttsche
      The struct security_hook_list member lsm is assigned in
      security_add_hooks() with string literals passed from the individual
      security modules.  Declare the function parameter and the struct member
      const to signal their immutability.
      
      Reported by Clang [-Wwrite-strings]:
      
          security/selinux/hooks.c:7388:63: error: passing 'const char [8]'
            to parameter of type 'char *' discards qualifiers
            [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
                  security_add_hooks(selinux_hooks,
                                     ARRAY_SIZE(selinux_hooks), selinux);
                                                                ^~~~~~~~~
          ./include/linux/lsm_hooks.h:1629:11: note: passing argument to
            parameter 'lsm' here
                                          char *lsm);
                                                ^
      Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
      Reviewed-by: Paul Moore <paul@paul-moore.com>
      Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      1af0e4a0
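
      The pattern in miniature (illustrative names, not the actual lsm_hooks.h layout): a member that only ever points at a string literal is declared const, and the -Wwrite-strings complaint disappears:

      struct hook_entry {
              const char *lsm;        /* was: char *lsm */
      };

      static struct hook_entry entry = { .lsm = "selinux" }; /* now compiles cleanly */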
  9. 13 May 2022, 1 commit
    • io_uring: add flag for allocating a fully sparse direct descriptor space · a8da73a3
      Committed by Jens Axboe
      Currently to setup a fully sparse descriptor space upfront, the app needs
      to allocate an array of the full size and memset it to -1 and then pass
      that in. Make this a bit easier by allowing a flag that simply does
      this internally rather than needing to copy each slot separately.
      
      This works with IORING_REGISTER_FILES2 as the flag is set in struct
      io_uring_rsrc_register, and is only allowed when the type is
      IORING_RSRC_FILE as this doesn't make sense for registered buffers.
      Reviewed-by: Hao Xu <howeyxu@tencent.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a8da73a3
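
      A hedged sketch of the contrast (the sparse flag name is an assumption based on the description above, not quoted from the header):

      #include <string.h>

      /* Old approach: build and register a full table of -1 entries. */
      static void register_sparse_old(int *fds, unsigned int nr)
      {
              memset(fds, -1, nr * sizeof(*fds));     /* every slot starts empty */
              /* ...IORING_REGISTER_FILES2 with this array... */
      }

      /* New approach: no array at all; set a sparse flag (e.g.
       * IORING_RSRC_REGISTER_SPARSE) in struct io_uring_rsrc_register along
       * with the desired table size and let the kernel fill the slots. */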