Commits · cf6fd4bd559ee61a4454b161863c8de6f30f8dca · openeuler / Kernel

27 Nov, 2019 4 commits

io_uring: inline struct sqe_submit · cf6fd4bd

Authored by Pavel Begunkov on Nov 25, 2019

There is no point left in keeping struct sqe_submit. Inline it
into struct io_kiocb, so any req->submit.field is now just req->field

- moves initialisation of ring_file into io_get_req()
- removes duplicated req->sequence.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: store timeout's sqe->off in proper place · cc42e0ac

Authored by Pavel Begunkov on Nov 25, 2019

Timeouts' sequence offset (i.e. sqe->off) is stored in
req->submit.sequence under a false name. Keep it in timeout.data
instead. The unused space for sequence will be reclaimed in the
following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

net: disallow ancillary data for __sys_{send,recv}msg_file() · d69e0779

Authored by Jens Axboe on Nov 25, 2019

Only io_uring uses (and added) these, and we want to disallow the
use of sendmsg/recvmsg for anything but regular data transfers.
Use the newly added prep helper to split the msghdr copy out from
the core function, to check for msg_control and msg_controllen
settings. If either is set, we return -EINVAL.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
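
As a rough userspace illustration of the check described in this message (not the kernel code itself), the sketch below rejects a `struct msghdr` that carries ancillary data. The helper name `reject_ancillary_data` is hypothetical; only the `msg_control` / `msg_controllen` fields and the `-EINVAL` result come from the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return -EINVAL when the header requests ancillary
 * (control) data, 0 when it describes a plain data transfer.
 */
static int reject_ancillary_data(const struct msghdr *msg)
{
	/* Either field being set means the caller wants control data. */
	if (msg->msg_control != NULL || msg->msg_controllen != 0)
		return -EINVAL;
	return 0;
}
```

Running the check on the copied header before the core send/receive path keeps the validation in one place, which is presumably why the message relies on the split-out prep helper.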


net: separate out the msghdr copy from ___sys_{send,recv}msg() · 4257c8ca

Authored by Jens Axboe on Nov 25, 2019

This is in preparation for enabling the io_uring helpers for sendmsg
and recvmsg to first copy the header for validation before continuing
with the operation.

There should be no functional changes in this patch.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

26 Nov, 2019 36 commits

io_uring: remove superfluous check for sqe->off in io_accept() · 8042d6ce

Authored by Hrvoje Zeba on Nov 25, 2019

This field contains a pointer to addrlen and checking to see if it's set
returns -EINVAL if the caller sets addr & addrlen pointers.

Fixes: 17f2fe35 ("io_uring: add support for IORING_OP_ACCEPT")
Signed-off-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: async workers should inherit the user creds · 181e448d

Authored by Jens Axboe on Nov 25, 2019

If we don't inherit the original task creds, then we can confuse users
like fuse that pass creds in the request header. See link below on
identical aio issue.

Link: https://lore.kernel.org/linux-fsdevel/26f0d78e-99ca-2f1b-78b9-433088053a61@scylladb.com/T/#u
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io-wq: have io_wq_create() take a 'data' argument · 576a347b

Authored by Jens Axboe on Nov 25, 2019

We currently pass in 4 arguments outside of the bounded size. In
preparation for adding one more argument, let's bundle them up in
a struct to make it more readable.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix dead-hung for non-iter fixed rw · 311ae9e1

Authored by Pavel Begunkov on Nov 24, 2019

Read/write requests to devices without implemented read/write_iter
using fixed buffers can cause a general protection fault, which totally
hangs the machine.

io_import_fixed() initialises the iov_iter with a bvec, but loop_rw_iter()
accesses it as an iovec, dereferencing a random address.

kmap() page by page in this case.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add support for IORING_OP_CONNECT · f8e85cf2

Authored by Jens Axboe on Nov 23, 2019

This allows an application to call connect() in an async fashion. Like
other opcodes, we first try a non-blocking connect, then punt to async
context if we have to.

Note that we can still return -EINPROGRESS, and in that case the caller
should use IORING_OP_POLL_ADD to do an async wait for completion of the
connect request (just like for regular connect(2), except we can do it
async here too).

Signed-off-by: Jens Axboe <axboe@kernel.dk>
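
For comparison, the regular connect(2) pattern that this message refers to can be sketched as below. This is plain POSIX code, not io_uring: with io_uring the wait step would be an IORING_OP_POLL_ADD request rather than a poll(2) call. The function name `connect_and_wait` and the timeout parameter are assumptions made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Sketch of the classic connect(2) pattern: start a non-blocking connect,
 * and if it reports EINPROGRESS, wait for writability and read SO_ERROR.
 * Returns 0 on success or a negative errno value.
 */
static int connect_and_wait(int fd, const struct sockaddr *addr,
			    socklen_t addrlen, int timeout_ms)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -errno;
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -errno;

	if (connect(fd, addr, addrlen) == 0)
		return 0;	/* connected immediately */
	if (errno != EINPROGRESS)
		return -errno;

	/* The connect is in flight; wait until the socket becomes writable. */
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int ready = poll(&pfd, 1, timeout_ms);

	if (ready < 0)
		return -errno;
	if (ready == 0)
		return -ETIMEDOUT;

	/* SO_ERROR holds the final result of the asynchronous connect. */
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -errno;
	return -err;
}
```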


net: add __sys_connect_file() helper · bd3ded31

Authored by Jens Axboe on Nov 23, 2019

This is identical to __sys_connect(), except it takes a struct file
instead of an fd, and it also allows passing in extra file->f_flags
flags. The latter is done to support masking in O_NONBLOCK without
manipulating the original file flags.

No functional changes in this patch.

Cc: netdev@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: only return -EBUSY for submit on non-flushed backlog · c4a2ed72

Authored by Jens Axboe on Nov 21, 2019

We return -EBUSY on submit when we have a CQ ring overflow backlog, but
that can be a bit problematic if the application is using pure userspace
poll of the CQ ring. For that case, if the ring briefly overflowed and
we have pending entries in the backlog, the submit flushes the backlog
successfully but still returns -EBUSY. If we're able to fully flush the
CQ ring backlog, let the submission proceed.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
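
The behaviour described in this message can be modelled with a few lines of C. This is a toy model under stated assumptions, not the io_uring implementation: `struct toy_ring`, `toy_flush_backlog()` and `toy_submit()` are hypothetical names, and the ring is reduced to two counters.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model of a CQ ring with an overflow backlog (all names hypothetical). */
struct toy_ring {
	size_t cq_free;		/* free slots in the CQ ring */
	size_t backlog;		/* completions waiting in the overflow list */
};

/* Move as many backlog entries into the CQ ring as will fit. */
static void toy_flush_backlog(struct toy_ring *r)
{
	size_t n = r->backlog < r->cq_free ? r->backlog : r->cq_free;

	r->backlog -= n;
	r->cq_free -= n;
}

/*
 * Submit path: try to flush first, and refuse with -EBUSY only if entries
 * are still stuck in the backlog afterwards.
 */
static int toy_submit(struct toy_ring *r)
{
	if (r->backlog) {
		toy_flush_backlog(r);
		if (r->backlog)
			return -EBUSY;
	}
	return 0;	/* submission may proceed */
}
```

In this model a ring with room for the whole backlog lets the submission through, while a ring that can only absorb part of it still reports -EBUSY.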


io_uring: only !null ptr to io_issue_sqe() · f9bd67f6

Authored by Pavel Begunkov on Nov 21, 2019

Pass only non-null @nxt to io_issue_sqe() and handle it at the caller's
side. And propagate it.

- kiocb_done() is only called from io_read() and io_write(), which are
only called from io_issue_sqe(), so it's @nxt != NULL

- io_put_req_find_next() is called either with explicitly non-null local
nxt, or from one of the functions in io_issue_sqe() switch (or their
callees).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: simplify io_req_link_next() · b18fdf71

Authored by Pavel Begunkov on Nov 21, 2019

"if (nxt)" is always true, as it was checked in the while's condition.
io_wq_current_is_worker() is unnecessary, as non-async callers don't
pass nxt, so io_queue_async_work() will be called for them anyway.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: pass only !null to io_req_find_next() · 944e58bf

Authored by Pavel Begunkov on Nov 21, 2019

Make io_req_find_next() and io_req_link_next() accept only non-null
nxt, and handle it in callers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove io_free_req_find_next() · 70cf9f32

Authored by Pavel Begunkov on Nov 21, 2019

There is only one one-liner user of io_free_req_find_next(). Inline it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: add likely/unlikely in io_get_sqring() · 9835d6fa

Authored by Pavel Begunkov on Nov 21, 2019

The number of SQEs to submit is specified by a user, so io_get_sqring()
in most of the cases succeeds. Hint compilers about that.

Checking ASM generated by gcc 9.2.0 for x64, there is one branch
misprediction.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
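
A minimal sketch of such a branch hint follows. The `likely()` / `unlikely()` macros are written here on top of the GCC/Clang `__builtin_expect` builtin, in the style the kernel uses; `struct toy_sq` and `toy_get_sqe()` are hypothetical stand-ins, not the real io_get_sqring().

```c
#include <stdbool.h>

/* Branch hints built on the GCC/Clang builtin, in the style the kernel uses. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical submission ring: head chases tail over a power-of-two array. */
struct toy_sq {
	unsigned head;
	unsigned tail;
	unsigned mask;		/* number of entries minus one */
	int entries[8];
};

/* Fetch the next entry; the caller usually asks only for what it queued. */
static bool toy_get_sqe(struct toy_sq *sq, int *out)
{
	if (likely(sq->head != sq->tail)) {
		*out = sq->entries[sq->head & sq->mask];
		sq->head++;
		return true;
	}
	return false;	/* ring empty: the rare path */
}
```

The hint only influences how the compiler lays out the two paths; the function returns the same results with or without it.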


io_uring: rename __io_submit_sqe() · d732447f

Authored by Pavel Begunkov on Nov 21, 2019

__io_submit_sqe() is issuing requests, so call it as
such. Moreover, it ends by calling io_iopoll_req_issued().

Rename it and make terminology clearer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: improve trace_io_uring_defer() trace point · 915967f6

Authored by Jens Axboe on Nov 21, 2019

We don't have shadow requests anymore, so get rid of the shadow
argument. Add the user_data argument, as that's often useful to easily
match up requests, instead of having to look at request pointers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: drain next sqe instead of shadowing · 1b4a51b6

Authored by Pavel Begunkov on Nov 21, 2019

There's an issue with the shadow drain logic in that we drop the
completion lock after deciding to defer a request, then re-grab it later
and assume that the state is still the same. In the meantime, someone
else completing a request could have found and issued it. This can cause
a stall in the queue, by having a shadow request inserted that nobody is
going to drain.

Additionally, if we fail allocating the shadow request, we simply ignore
the drain.

Instead of using a shadow request, defer the next request/link instead.
This also has the following advantages:

- removes semi-duplicated code
- doesn't allocate memory for shadows
- works better if only the head is marked for drain
- doesn't need complex synchronisation

On the flip side, it removes the shadow->seq ==
last_drain_in_in_link->seq optimization. That shouldn't be a common
case, and can always be added back, if needed.

Fixes: 4fe2c963 ("io_uring: add support for link with drain")
Cc: Jackie Liu <liuyun01@kylinos.cn>
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: close lookup gap for dependent next work · b76da70f

Authored by Jens Axboe on Nov 20, 2019

When we find new work to process within the work handler, we queue the
linked timeout before we have issued the new work. This can be
problematic for very short timeouts, as we have a window where the new
work isn't visible.

Allow the work handler to store a callback function for this in the work
item, and flag it with IO_WQ_WORK_CB if the caller has done so. If that
is set, then io-wq will call the callback when it has set up the new work
item.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: allow finding next link independent of req reference count · 4d7dd462

Authored by Jens Axboe on Nov 20, 2019

We currently try and start the next link when we put the request, and
only if we were going to free it. This means that the optimization to
continue executing requests from the same context often fails, as we're
not putting the final reference.

Add REQ_F_LINK_NEXT to keep track of this, and allow io_uring to find the
next request more efficiently.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: io_allocate_scq_urings() should return a sane state · eb065d30

Authored by Jens Axboe on Nov 20, 2019

We currently rely on the ring destroy path to clean things up in case of
failure, but io_allocate_scq_urings() can leave things half initialized
if only part of it fails.

Be nice and return with either everything set up on success, or return an
error with things nicely cleaned up.

Reported-by: syzbot+0d818c0d39399188f393@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
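
The all-or-nothing behaviour asked for in this message is commonly written with goto-based unwinding. The sketch below is a userspace illustration with hypothetical names (`struct toy_ctx`, `toy_alloc_rings()`), using malloc() in place of the kernel allocators. A zero `sqe_sz` is treated as an allocation failure purely so the error path is easy to trigger.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical context holding two separately allocated buffers. */
struct toy_ctx {
	void *rings;
	void *sqes;
};

/*
 * Allocate both buffers or neither. On failure, everything allocated so far
 * is freed and the pointers are reset, so the caller never sees a
 * half-initialized context.
 */
static int toy_alloc_rings(struct toy_ctx *ctx, size_t ring_sz, size_t sqe_sz)
{
	ctx->rings = malloc(ring_sz);
	if (!ctx->rings)
		goto err;

	/* A zero size stands in for a failed second allocation. */
	ctx->sqes = sqe_sz ? malloc(sqe_sz) : NULL;
	if (!ctx->sqes)
		goto err_free_rings;

	return 0;

err_free_rings:
	free(ctx->rings);
	ctx->rings = NULL;
err:
	ctx->sqes = NULL;
	return -ENOMEM;
}
```

Because the function cleans up after itself, the caller can treat a non-zero return as "nothing was allocated" and does not need a separate teardown step for the partial case.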


io_uring: Always REQ_F_FREE_SQE for allocated sqe · bbad27b2

Authored by Pavel Begunkov on Nov 19, 2019

Always mark requests with allocated sqe and deallocate it in
__io_free_req(). It's easier to follow and doesn't add edge cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: io_fail_links() should only consider first linked timeout · 5d960724

Authored by Jens Axboe on Nov 19, 2019

We currently clear the linked timeout field if we cancel such a timeout,
but we should only attempt to cancel if it's the first one we see.
Others should simply be freed like other requests, as they haven't
been started yet.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: Fix leaking linked timeouts · 09fbb0a8

Authored by Pavel Begunkov on Nov 19, 2019

Let's have a dependent link: REQ -> LINK_TIMEOUT -> LINK_TIMEOUT

1. submission stage: submission references for REQ and LINK_TIMEOUT
are dropped. So, references respectively (1,1,2)

2. io_put(REQ) + FAIL_LINKS stage: calls io_fail_links(), which for all
linked timeouts will call cancel_timeout() and drop 1 reference.
So, references after: (0,0,1). That's a leak.

Make it treat only the first linked timeout as such, and pass others
through __io_double_put_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
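
The reference arithmetic in this message can be replayed with a toy model. It tracks only the counters; all function names are hypothetical, and the model assumes each request starts with two references, which is what the (1,1,2) figures imply. The `only_first` flag switches between the leaking behaviour and the fix, where later timeouts take a double put as with __io_double_put_req().

```c
#include <stdbool.h>

#define CHAIN_LEN 3	/* REQ -> LINK_TIMEOUT -> LINK_TIMEOUT */

/*
 * refs[0] is REQ, refs[1] and refs[2] are the two linked timeouts.
 * Assumption of this model: every request starts with two references.
 */
static void toy_submit_chain(int refs[CHAIN_LEN])
{
	refs[0] = 2 - 1;	/* REQ: submission reference dropped */
	refs[1] = 2 - 1;	/* first LINK_TIMEOUT: submission reference dropped */
	refs[2] = 2;		/* second LINK_TIMEOUT: not started yet */
}

/*
 * Drop the final REQ reference and fail the rest of the chain.
 * With only_first == false every linked timeout takes the cancel path
 * (one put each); with only_first == true just the first one does, and
 * later ones take a double put.
 */
static void toy_put_and_fail_links(int refs[CHAIN_LEN], bool only_first)
{
	bool seen_timeout = false;

	refs[0] -= 1;
	for (int i = 1; i < CHAIN_LEN; i++) {
		if (!only_first || !seen_timeout)
			refs[i] -= 1;	/* cancel path: single put */
		else
			refs[i] -= 2;	/* not started yet: double put */
		seen_timeout = true;
	}
}

/* Count requests that still hold references after the chain is torn down. */
static int toy_leaked(const int refs[CHAIN_LEN])
{
	int leaked = 0;

	for (int i = 0; i < CHAIN_LEN; i++)
		leaked += refs[i] != 0;
	return leaked;
}
```

With `only_first` set to false the counters end at (0,0,1), matching the leak described in step 2; with it set to true all three reach zero.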


io_uring: remove redundant check · f70193d6

Authored by Pavel Begunkov on Nov 19, 2019

Pass any IORING_OP_LINK_TIMEOUT request further, where it will
eventually fail in io_issue_sqe().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: break links for failed defer · d3b35796

Authored by Pavel Begunkov on Nov 19, 2019

If io_req_defer() failed, it needs to cancel a dependent link.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io-wq: remove extra space characters · b2e9c7d6

Authored by Dan Carpenter on Nov 19, 2019

These lines are indented an extra space character.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io-wq: wait for io_wq_create() to setup necessary workers · b60fda60

Authored by Jens Axboe on Nov 19, 2019

We currently have a race where if setup is really slow, we can be
calling io_wq_destroy() before we're done setting up. This will cause
the caller to get stuck waiting for the manager to set things up, but
the manager already exited.

Fix this by doing a sync setup of the manager. This also fixes the case
where if we failed creating workers, we'd also get stuck.

In practice this race window was really small, as we already wait for
the manager to start. Hence someone would have to call io_wq_destroy()
after the task has started, but before it started the first loop. The
reported test case forked tons of these, which is why it became an
issue.

Reported-by: syzbot+0f1cc17f85154f400465@syzkaller.appspotmail.com
Fixes: 771b53d0 ("io-wq: small threadpool implementation for io_uring")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: request cancellations should break links · fba38c27

Authored by Jens Axboe on Nov 18, 2019

We currently don't explicitly break links if a request is cancelled, but
we should. Add explicit link breakage for all types of request
cancellations that we support.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: correct poll cancel and linked timeout expiration completion · b0dd8a41

Authored by Jens Axboe on Nov 18, 2019

Currently a poll request fills a completion entry of 0, even if it got
cancelled. This is odd, and it makes it harder to support with chains.
Ensure that it returns -ECANCELED in the completion events if it got
cancelled, and furthermore ensure that the linked timeout that triggered
it completes with -ETIME if we did indeed trigger the completion
through a timeout.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: remove dead REQ_F_SEQ_PREV flag · e0e328c4

Authored by Jens Axboe on Nov 15, 2019

With the conversion to io-wq, we no longer use that flag. Kill it.

Fixes: 561fb04a ("io_uring: replace workqueue usage with io-wq")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: fix sequencing issues with linked timeouts · 94ae5e77

Authored by Jens Axboe on Nov 14, 2019

We have an issue with timeout links that are deeper in the submit chain,
because we only handle it upfront, not from later submissions. Move the
prep + issue of the timeout link to the async work prep handler, and do
it normally for non-async queue. If we validate and prepare the timeout
links upfront when we first see them, there's nothing stopping us from
supporting any sort of nesting.

Fixes: 2665abfd ("io_uring: add support for linked SQE timeouts")
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: make req->timeout be dynamically allocated · ad8a48ac

Authored by Jens Axboe on Nov 15, 2019

There are a few reasons for this:

- As a prep to improving the linked timeout logic
- io_timeout is the biggest member in the io_kiocb opcode union

This also enables a few cleanups, like unifying the timer setup between
IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT, and not needing multiple
arguments to the link/prep helpers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: make io_double_put_req() use normal completion path · 978db57e

Authored by Jens Axboe on Nov 14, 2019

If we don't use the normal completion path, we may skip killing links
that should be errored and freed. Add __io_double_put_req() for use
within the completion path itself, other calls should just use
io_double_put_req().

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: cleanup return values from the queueing functions · 0e0702da

Authored by Jens Axboe on Nov 14, 2019

__io_queue_sqe(), io_queue_sqe(), io_queue_link_head() all return 0/err,
but the caller doesn't care since the errors are handled inline. Clean
these up and just make them void.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

io_uring: io_async_cancel() should pass in 'nxt' request pointer · 95a5bbae

Authored by Jens Axboe on Nov 14, 2019

If we have a linked request, this enables us to pass it back directly
without having to go through async context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'edac_for_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 9c91e6a5

Authored by Linus Torvalds on Nov 25, 2019

Pull EDAC updates from Borislav Petkov:
 "A lot of changes this time around, details below.

  From the next cycle onwards, we'll switch the EDAC tree to topic
  branches (instead of a single edac-for-next branch) which should make
  the changes handling more flexible, hopefully. We'll see.

  Summary:

   - Rework error logging functions to accept a count of errors
     parameter (Hanna Hawa)

   - Part one of substantial EDAC core + ghes_edac driver cleanup
     (Robert Richter)

   - Print additional useful logging information in skx_* (Tony Luck)

   - Improve amd64_edac hw detection + cleanups (Yazen Ghannam)

   - Misc cleanups, fixes and code improvements"

* tag 'edac_for_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (35 commits)
  EDAC/altera: Use the Altera System Manager driver
  EDAC/altera: Cleanup the ECC Manager
  EDAC/altera: Use fast register IO for S10 IRQs
  EDAC/ghes: Do not warn when incrementing refcount on 0
  EDAC/Documentation: Describe CPER module definition and DIMM ranks
  EDAC: Unify the mc_event tracepoint call
  EDAC/ghes: Remove intermediate buffer pvt->detail_location
  EDAC/ghes: Fix grain calculation
  EDAC/ghes: Use standard kernel macros for page calculations
  EDAC: Remove misleading comment in struct edac_raw_error_desc
  EDAC/mc: Reduce indentation level in edac_mc_handle_error()
  EDAC/mc: Remove needless zero string termination
  EDAC/mc: Do not BUG_ON() in edac_mc_alloc()
  EDAC: Introduce an mci_for_each_dimm() iterator
  EDAC: Remove EDAC_DIMM_OFF() macro
  EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function
  EDAC/amd64: Get rid of the ECC disabled long message
  EDAC/ghes: Fix locking and memory barrier issues
  EDAC/amd64: Check for memory before fully initializing an instance
  EDAC/amd64: Use cached data when checking for ECC
  ...


Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 752272f1

Authored by Linus Torvalds on Nov 25, 2019

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - data abort report and injection
   - steal time support
   - GICv4 performance improvements
   - vgic ITS emulation fixes
   - simplify FWB handling
   - enable halt polling counters
   - make the emulated timer PREEMPT_RT compliant

  s390:
   - small fixes and cleanups
   - selftest improvements
   - yield improvements

  PPC:
   - add capability to tell userspace whether we can single-step the
     guest
   - improve the allocation of XIVE virtual processor IDs
   - rewrite interrupt synthesis code to deliver interrupts in virtual
     mode when appropriate.
   - minor cleanups and improvements.

  x86:
   - XSAVES support for AMD
   - more accurate report of nested guest TSC to the nested hypervisor
   - retpoline optimizations
   - support for nested 5-level page tables
   - PMU virtualization optimizations, and improved support for nested
     PMU virtualization
   - correct latching of INITs for nested virtualization
   - IOAPIC optimization
   - TSX_CTRL virtualization for more TAA happiness
   - improved allocation and flushing of SEV ASIDs
   - many bugfixes and cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints
  KVM: x86: Grab KVM's srcu lock when setting nested state
  KVM: x86: Open code shared_msr_update() in its only caller
  KVM: Fix jump label out_free_* in kvm_init()
  KVM: x86: Remove a spurious export of a static function
  KVM: x86: create mmu/ subdirectory
  KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page
  KVM: x86: remove set but not used variable 'called'
  KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning
  KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it
  KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality
  KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID
  KVM: x86: do not modify masked bits of shared MSRs
  KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
  KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
  KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
  KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT
  KVM: x86: Unexport kvm_vcpu_reload_apic_access_page()
  KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1
  KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization
  ...


Merge tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 3f3c8be9

Authored by Linus Torvalds on Nov 25, 2019

Pull xen updates from Juergen Gross:

 - a small series to remove the build constraint of Xen x86 MCE handling
   to 64-bit only

 - a bunch of minor cleanups

* tag 'for-linus-5.5a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Fix Kconfig indentation
  xen/mcelog: also allow building for 32-bit kernels
  xen/mcelog: add PPIN to record when available
  xen/mcelog: drop __MC_MSR_MCGCAP
  xen/gntdev: Use select for DMA_SHARED_BUFFER
  xen: mm: make xen_mm_init static
  xen: mm: include <xen/xen-ops.h> for missing declarations


openeuler / Kernel: last synced successfully over a year ago
