Commits · 43f8a6a74ee2442b9410ed297f5d4c77e7cb5ace · openeuler / Kernel

04 Dec, 2019 1 commit

smb3: query attributes on file close · 43f8a6a7

Authored by Steve French on Dec 02, 2019

Since timestamps on files on most servers can be updated at
close, and since the timestamp cache timeout on our dentries
defaults to one second, we can have stale timestamps in some
common cases (e.g. open, write, close, stat, wait one second,
stat will show a different mtime for the first and second stat).

The SMB2/SMB3 protocol allows querying timestamps at close
so add the code to request timestamp and attr information
(which is cheap for the server to provide) to be returned
when a file is closed. (It is not needed for the many
paths that call SMB2_close from compounded query infos
and close, nor is it needed for some of the cases where
a directory close immediately follows a directory open.)

Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>

43f8a6a7

03 Dec, 2019 4 commits

smb3: remove unused flag passed into close functions · 9e8fae25

Authored by Steve French on Dec 02, 2019

close was relayered to allow passing in an async flag which
is no longer needed in this path.  Remove the unneeded parameter
"flags" passed in on close.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>

9e8fae25

cifs: remove redundant assignment to pointer pneg_ctxt · a9f76cf8

Authored by Colin Ian King on Dec 02, 2019

The pointer pneg_ctxt is being initialized with a value that is never
read and it is being updated later with a new value.  The assignment
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

a9f76cf8

fs: cifs: Fix atime update check vs mtime · 69738cfd

Authored by Deepa Dinamani on Nov 29, 2019

According to the comment in the code and commit log, some apps
expect atime >= mtime; but the introduced code results in
atime==mtime.  Fix the comparison to guard against atime<mtime.

Fixes: 9b9c5bea ("cifs: do not return atime less than mtime")
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: stfrench@microsoft.com
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>

69738cfd
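
The comparison fix described in the commit above can be illustrated with a small userspace sketch. This is not the kernel code: ts_compare(), pick_atime_buggy() and pick_atime_fixed() are hypothetical helpers, and struct timespec stands in for the kernel's timestamp type. The sketch assumes the bug was a three-way comparison result used directly as a truth value, which would match the "results in atime==mtime" wording of the message.

```c
#include <assert.h>
#include <time.h>

/* Three-way comparison: <0 if a is earlier, 0 if equal, >0 if a is later. */
static int ts_compare(const struct timespec *a, const struct timespec *b)
{
    assert(a && b);
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/* Assumed buggy form: any difference (non-zero result) forces atime = mtime. */
static struct timespec pick_atime_buggy(struct timespec atime,
                                        struct timespec mtime)
{
    return ts_compare(&atime, &mtime) ? mtime : atime;
}

/* Fixed form: only an atime earlier than mtime is raised to mtime. */
static struct timespec pick_atime_fixed(struct timespec atime,
                                        struct timespec mtime)
{
    return ts_compare(&atime, &mtime) < 0 ? mtime : atime;
}
```

With an atime that is later than mtime, the buggy variant discards the newer atime, while the fixed variant keeps it and still guarantees atime >= mtime.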

CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks · 6f582b27

Authored by Pavel Shilovsky on Nov 27, 2019

Currently when the client creates a cifsFileInfo structure for
a newly opened file, it allocates a list of byte-range locks
with a pointer to the new cfile and attaches this list to the
inode's lock list. The latter happens before initializing all
other fields, e.g. cfile->tlink. Thus a partially initialized
cifsFileInfo structure becomes available to other threads that
walk through the inode's lock list. One example of such a thread
may be an oplock break worker thread that tries to push all
cached byte-range locks. This causes NULL-pointer dereference
in smb2_push_mandatory_locks() when accessing cfile->tlink:

[598428.945633] BUG: kernel NULL pointer dereference, address: 0000000000000038
...
[598428.945749] Workqueue: cifsoplockd cifs_oplock_break [cifs]
[598428.945793] RIP: 0010:smb2_push_mandatory_locks+0xd6/0x5a0 [cifs]
...
[598428.945834] Call Trace:
[598428.945870]  ? cifs_revalidate_mapping+0x45/0x90 [cifs]
[598428.945901]  cifs_oplock_break+0x13d/0x450 [cifs]
[598428.945909]  process_one_work+0x1db/0x380
[598428.945914]  worker_thread+0x4d/0x400
[598428.945921]  kthread+0x104/0x140
[598428.945925]  ? process_one_work+0x380/0x380
[598428.945931]  ? kthread_park+0x80/0x80
[598428.945937]  ret_from_fork+0x35/0x40

Fix this by reordering initialization steps of the cifsFileInfo
structure: initialize all the fields first and then add the new
byte-range lock list to the inode's lock list.

Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

6f582b27
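
The reordering described in the commit above follows a general "initialize fully, then publish" pattern. The following is a minimal single-threaded sketch of that pattern, not the cifs code: struct file_info, struct lock_list, struct inode_info and file_info_new() are hypothetical stand-ins, and the locking that real code needs around the shared list is only mentioned in a comment.

```c
#include <assert.h>
#include <stdlib.h>

struct file_info;

/* Per-file lock list that gets linked into a per-inode list. */
struct lock_list {
    struct file_info *cfile;
    struct lock_list *next;
};

struct file_info {
    void *tlink;              /* stand-in for cfile->tlink */
    struct lock_list *llist;
};

struct inode_info {
    struct lock_list *locks;  /* other threads may walk this list */
};

static struct file_info *file_info_new(struct inode_info *inode, void *tlink)
{
    struct file_info *cfile = calloc(1, sizeof(*cfile));
    struct lock_list *fdlocks = calloc(1, sizeof(*fdlocks));

    if (!cfile || !fdlocks) {
        free(cfile);
        free(fdlocks);
        return NULL;
    }

    /* Step 1: initialize every field while the object is still private. */
    fdlocks->cfile = cfile;
    cfile->llist = fdlocks;
    cfile->tlink = tlink;

    /* Step 2: only now make it reachable from the shared list.
     * Real code would take the list's lock around these two lines. */
    assert(cfile->tlink != NULL);
    fdlocks->next = inode->locks;
    inode->locks = fdlocks;
    return cfile;
}
```

Anything that walks inode->locks after step 2 sees a cfile whose tlink is already set, which is the property the fix restores.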

28 Nov, 2019 1 commit

CIFS: fix a white space issue in cifs_get_inode_info() · 68464b88

Authored by Dan Carpenter via samba-technical on Nov 26, 2019

We accidentally messed up the indenting on this if statement.

Fixes: 16c696a6c300 ("CIFS: refactor cifs_get_inode_info()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

68464b88

27 Nov, 2019 8 commits

io_uring: make poll->wait dynamically allocated · e944475e

Authored by Jens Axboe on Nov 26, 2019

In the quest to bring io_kiocb down to 3 cachelines, this one does
the trick. Make the wait_queue_entry for the poll command come out
of kmalloc instead of embedding it in struct io_poll_iocb, as the
latter is the largest member of io_kiocb. Once we trim this down a
bit, we're back at a healthy 192 bytes for struct io_kiocb.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

e944475e
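
The size effect of moving an embedded member behind a pointer, as the commit above does for the poll wait entry, can be sketched in userspace. The structures below are made-up stand-ins (their fields and sizes do not match struct io_poll_iocb or the 192-byte figure), and poll_arm() and poll_done() are hypothetical helpers using calloc() where the kernel would use kmalloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a wait queue entry; the layout is invented for illustration. */
struct wait_entry {
    unsigned int flags;
    void *private_data;
    void (*func)(void *);
    void *prev;
    void *next;
};

/* Before: the wait entry is embedded, so every request pays for it. */
struct poll_embedded {
    void *file;
    void *head;
    unsigned int events;
    struct wait_entry wait;
};

/* After: only a pointer is stored; the entry is allocated when needed. */
struct poll_pointer {
    void *file;
    void *head;
    unsigned int events;
    struct wait_entry *wait;
};

static_assert(sizeof(struct poll_pointer) < sizeof(struct poll_embedded),
              "pointer variant should be smaller");

/* Allocate the wait entry on demand; returns 0 on success, -1 on failure. */
static int poll_arm(struct poll_pointer *p)
{
    p->wait = calloc(1, sizeof(*p->wait));
    return p->wait ? 0 : -1;
}

static void poll_done(struct poll_pointer *p)
{
    free(p->wait);
    p->wait = NULL;
}
```

The trade-off is an extra allocation and a possible allocation failure on the poll path in exchange for a smaller structure for every other request type.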

io-wq: shrink io_wq_work a bit · 6206f0e1

Authored by Jens Axboe on Nov 26, 2019

Currently we're using 40 bytes for the io_wq_work structure, and 16 of
those are the doubly linked list node. We don't need doubly linked lists,
we always add to tail to keep things ordered, and any other use case
is list traversal with deletion. For the deletion case, we can easily
support any node deletion by keeping track of the previous entry.

This shrinks io_wq_work to 32 bytes, and subsequently io_kiocb in
io_uring from 216 to 208 bytes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

6206f0e1
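
The idea in the commit above, a singly linked list with tail insertion and deletion via a tracked previous entry, can be sketched as follows. The names struct node, struct wq_list, list_add_tail() and list_del() are hypothetical and are not claimed to match the io-wq helpers.

```c
#include <assert.h>
#include <stddef.h>

/* One pointer per node instead of two for a doubly linked list. */
struct node {
    struct node *next;
};

struct wq_list {
    struct node *first;
    struct node *last;
};

/* Append to the tail so that entries stay in submission order. */
static void list_add_tail(struct wq_list *l, struct node *n)
{
    n->next = NULL;
    if (l->last)
        l->last->next = n;
    else
        l->first = n;
    l->last = n;
}

/* Remove n. prev must be the entry before n, or NULL if n is the head;
 * the caller learns prev while traversing the list. */
static void list_del(struct wq_list *l, struct node *n, struct node *prev)
{
    assert(prev ? prev->next == n : l->first == n);
    if (prev)
        prev->next = n->next;
    else
        l->first = n->next;
    if (l->last == n)
        l->last = prev;
    n->next = NULL;
}
```

On a 64-bit machine the node shrinks from two pointers (16 bytes) to one (8 bytes), which is consistent with the 40 to 32 byte reduction quoted in the message.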

io-wq: fix handling of NUMA node IDs · 3fc50ab5

Authored by Jann Horn on Nov 26, 2019

There are several things that can go wrong in the current code on NUMA
systems, especially if not all nodes are online all the time:

 - If the identifiers of the online nodes do not form a single contiguous
   block starting at zero, wq->wqes will be too small, and OOB memory
   accesses will occur e.g. in the loop in io_wq_create().
 - If a node comes online between the call to num_online_nodes() and the
   for_each_node() loop in io_wq_create(), an OOB write will occur.
 - If a node comes online between io_wq_create() and io_wq_enqueue(), a
   lookup is performed for an element that doesn't exist, and an OOB read
   will probably occur.

Fix it by:

 - using nr_node_ids instead of num_online_nodes() for the allocation size;
   nr_node_ids is calculated by setup_nr_node_ids() to be bigger than the
   highest node ID that could possibly come online at some point, even if
   those nodes' identifiers are not a contiguous block
 - creating workers for all possible CPUs, not just all online ones

This is basically what the normal workqueue code also does, as far as I can
tell.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3fc50ab5
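
The sizing problem described in the commit above (an array indexed by ID but sized by a count) can be shown with a small userspace sketch. id_space_size() and alloc_per_node() are hypothetical helpers; id_space_size() only plays the role that nr_node_ids plays in the kernel and is not how the kernel computes it.

```c
#include <assert.h>
#include <stdlib.h>

/* One past the highest ID in the set. IDs are assumed to be non-negative. */
static size_t id_space_size(const int *ids, size_t n)
{
    int max = -1;

    for (size_t i = 0; i < n; i++)
        if (ids[i] > max)
            max = ids[i];
    return (size_t)(max + 1);
}

/* Allocate a table that can be indexed directly by node ID and mark
 * every possible node. Sizing it by n (the count) would be too small
 * whenever the IDs are not a contiguous block starting at zero. */
static int *alloc_per_node(const int *possible_ids, size_t n)
{
    size_t slots = id_space_size(possible_ids, n);
    int *table = calloc(slots ? slots : 1, sizeof(*table));

    if (!table)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        assert((size_t)possible_ids[i] < slots);
        table[possible_ids[i]] = 1;
    }
    return table;
}
```

For the ID set {0, 2} the count is 2 but the highest index used is 2, so a table of two slots would be written out of bounds while a table of three slots is safe.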

io_uring: use kzalloc instead of kcalloc for single-element allocations · ad6e005c

Authored by Jann Horn on Nov 26, 2019

These allocations are single-element allocations, so don't use the array
allocation wrapper for them.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ad6e005c

io_uring: cleanup io_import_fixed() · 7d009165

Authored by Pavel Begunkov on Nov 25, 2019

Clean up the io_import_fixed() call site and make it return the proper type.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7d009165

io_uring: inline struct sqe_submit · cf6fd4bd

Authored by Pavel Begunkov on Nov 25, 2019

There is no point left in keeping struct sqe_submit. Inline it
into struct io_kiocb, so any req->submit.field is now just req->field.

- moves initialisation of ring_file into io_get_req()
- removes duplicated req->sequence.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cf6fd4bd

io_uring: store timeout's sqe->off in proper place · cc42e0ac

Authored by Pavel Begunkov on Nov 25, 2019

Timeouts' sequence offset (i.e. sqe->off) is stored in
req->submit.sequence under a false name. Keep it in timeout.data
instead. The unused space for sequence will be reclaimed in the
following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cc42e0ac

Revert "vfs: properly and reliably lock f_pos in fdget_pos()" · 2be7d348

Authored by Linus Torvalds on Nov 26, 2019

This reverts commit 0be0ee71.

I was hoping it would be benign to switch over entirely to FMODE_STREAM,
and we'd have just a couple of small fixups we'd need, but it looks like
we're not quite there yet.

While it worked fine on both my desktop and laptop, they are fairly
similar in other respects, and run mostly the same loads.  Kenneth
Crudup reports that it seems to break both his vmware installation and
the KDE upower service.  In both cases apparently leading to timeouts
due to waiting for the f_pos lock.

There are a number of character devices in particular that definitely
want stream-like behavior, but that currently don't get marked as
streams, and as a result get the exclusion between concurrent
read()/write() on the same file descriptor.  Which doesn't work well for
them.

The most obvious example of this is /dev/console and /dev/tty, which use
console_fops and tty_fops respectively (and ptmx_fops for the pty master
side).  It may be that it's just this that causes problems, but we
clearly weren't ready yet.

Because there's a number of other likely common cases that don't have
llseek implementations and would seem to act as stream devices:

  /dev/fuse		(fuse_dev_operations)
  /dev/mcelog		(mce_chrdev_ops)
  /dev/mei0		(mei_fops)
  /dev/net/tun		(tun_fops)
  /dev/nvme0		(nvme_dev_fops)
  /dev/tpm0		(tpm_fops)
  /proc/self/ns/mnt	(ns_file_operations)
  /dev/snd/pcm*		(snd_pcm_f_ops[])

and while some of these could be trivially automatically detected by the
vfs layer when the character device is opened by just noticing that they
have no read or write operations either, it often isn't that obvious.

Some character devices most definitely do use the file position, even if
they don't allow seeking: the firmware update code, for example, uses
simple_read_from_buffer() that does use f_pos, but doesn't allow seeking
back and forth.

We'll revisit this when there's a better way to detect the problem and
fix it (possibly with a coccinelle script to do more of the FMODE_STREAM
annotations).

Reported-by: Kenneth R. Crudup <kenny@panix.com>
Cc: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2be7d348

26 Nov, 2019 26 commits

io_uring: remove superfluous check for sqe->off in io_accept() · 8042d6ce

Authored by Hrvoje Zeba on Nov 25, 2019

This field contains a pointer to addrlen, so checking whether it is set
returns -EINVAL whenever the caller sets the addr and addrlen pointers.

Fixes: 17f2fe35 ("io_uring: add support for IORING_OP_ACCEPT")
Signed-off-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8042d6ce

io_uring: async workers should inherit the user creds · 181e448d

Authored by Jens Axboe on Nov 25, 2019

If we don't inherit the original task creds, then we can confuse users
like fuse that pass creds in the request header. See link below on
identical aio issue.

Link: https://lore.kernel.org/linux-fsdevel/26f0d78e-99ca-2f1b-78b9-433088053a61@scylladb.com/T/#u
Signed-off-by: Jens Axboe <axboe@kernel.dk>

181e448d

io-wq: have io_wq_create() take a 'data' argument · 576a347b

Authored by Jens Axboe on Nov 25, 2019

We currently pass in 4 arguments outside of the bounded size. In
preparation for adding one more argument, let's bundle them up in
a struct to make it more readable.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

576a347b
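
The refactoring in the commit above, bundling several creation arguments into one struct, can be sketched generically. Everything here is hypothetical: struct wq_data, its four fields, wq_create() and noop_work() are illustrative guesses and are not claimed to match the actual io_wq_create() signature.

```c
#include <assert.h>
#include <stddef.h>

typedef void (work_fn)(void *work);

/* The arguments that used to be passed one by one, bundled together.
 * The field names are assumptions made for this sketch. */
struct wq_data {
    void *mm;
    void *user;
    work_fn *get_work;
    work_fn *put_work;
};

struct wq {
    unsigned int bounded;
    struct wq_data data;
};

/* Placeholder callback so the example has something to pass in. */
static void noop_work(void *work)
{
    (void)work;
}

/* The size stays a separate argument; everything else comes via data.
 * Adding one more field later does not change this signature. */
static int wq_create(struct wq *wq, unsigned int bounded,
                     const struct wq_data *data)
{
    assert(wq && data);
    if (!bounded)
        return -1;
    wq->bounded = bounded;
    wq->data = *data;
    return 0;
}
```

A caller fills in a struct wq_data with designated initializers, which keeps the call site readable as the number of parameters grows.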

io_uring: fix dead-hung for non-iter fixed rw · 311ae9e1

Authored by Pavel Begunkov on Nov 24, 2019

Read/write requests to devices without implemented read/write_iter
using fixed buffers can cause a general protection fault, which totally
hangs a machine.

io_import_fixed() initialises the iov_iter with a bvec, but loop_rw_iter()
accesses it as an iovec, dereferencing a random address.

kmap() page by page in this case.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

311ae9e1

io_uring: add support for IORING_OP_CONNECT · f8e85cf2

Authored by Jens Axboe on Nov 23, 2019

This allows an application to call connect() in an async fashion. Like
other opcodes, we first try a non-blocking connect, then punt to async
context if we have to.

Note that we can still return -EINPROGRESS, and in that case the caller
should use IORING_OP_POLL_ADD to do an async wait for completion of the
connect request (just like for regular connect(2), except we can do it
async here too).

Signed-off-by: Jens Axboe <axboe@kernel.dk>

f8e85cf2

io_uring: only return -EBUSY for submit on non-flushed backlog · c4a2ed72

Authored by Jens Axboe on Nov 21, 2019

We return -EBUSY on submit when we have a CQ ring overflow backlog, but
that can be a bit problematic if the application is using pure userspace
poll of the CQ ring. For that case, if the ring briefly overflowed and
we have pending entries in the backlog, the submit flushes the backlog
successfully but still returns -EBUSY. If we're able to fully flush the
CQ ring backlog, let the submission proceed.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c4a2ed72
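
The behaviour change in the commit above can be modelled with a toy ring. This is a sketch of the decision logic only: struct ring, flush_backlog() and submit() are hypothetical, and the counters stand in for the real CQ ring and overflow list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy ring state: free CQ slots and completions parked in the backlog. */
struct ring {
    unsigned int cq_free;
    unsigned int backlog;
};

/* Move as many backlog entries into the CQ ring as fit.
 * Returns 1 if the backlog is now empty, 0 otherwise. */
static int flush_backlog(struct ring *r)
{
    unsigned int n = r->backlog < r->cq_free ? r->backlog : r->cq_free;

    r->backlog -= n;
    r->cq_free -= n;
    return r->backlog == 0;
}

/* Returns the number of entries accepted, or -EBUSY only if a backlog
 * still remains after trying to flush it. */
static int submit(struct ring *r, unsigned int to_submit)
{
    assert(r != NULL);
    if (r->backlog && !flush_backlog(r))
        return -EBUSY;
    return (int)to_submit;
}
```

An application that only polls the CQ ring from userspace therefore sees -EBUSY only while completions are still stuck in the backlog, not after a flush that succeeded.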

io_uring: only !null ptr to io_issue_sqe() · f9bd67f6

Authored by Pavel Begunkov on Nov 21, 2019

Pass only non-null @nxt to io_issue_sqe() and handle it at the caller's
side. And propagate it.

- kiocb_done() is only called from io_read() and io_write(), which are
only called from io_issue_sqe(), so it's @nxt != NULL

- io_put_req_find_next() is called either with explicitly non-null local
nxt, or from one of the functions in io_issue_sqe() switch (or their
callees).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f9bd67f6

io_uring: simplify io_req_link_next() · b18fdf71

Authored by Pavel Begunkov on Nov 21, 2019

"if (nxt)" is always true, as it was checked in the while's condition.
io_wq_current_is_worker() is unnecessary, as non-async callers don't
pass nxt, so io_queue_async_work() will be called for them anyway.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b18fdf71

io_uring: pass only !null to io_req_find_next() · 944e58bf

Authored by Pavel Begunkov on Nov 21, 2019

Make io_req_find_next() and io_req_link_next() accept only non-null
nxt, and handle it in callers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

944e58bf

io_uring: remove io_free_req_find_next() · 70cf9f32

Authored by Pavel Begunkov on Nov 21, 2019

There is only one one-liner user of io_free_req_find_next(). Inline it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

70cf9f32

io_uring: add likely/unlikely in io_get_sqring() · 9835d6fa

Authored by Pavel Begunkov on Nov 21, 2019

The number of SQEs to submit is specified by a user, so io_get_sqring()
in most of the cases succeeds. Hint compilers about that.

Checking ASM generated by gcc 9.2.0 for x64, there is one branch
misprediction.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9835d6fa
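
Branch hints of the kind the commit above adds can be sketched in userspace with the GCC/Clang builtin __builtin_expect(). The likely()/unlikely() macros below mirror the common kernel-style definitions; struct sq and sq_get() are hypothetical and much simpler than io_get_sqring().

```c
#include <assert.h>
#include <stddef.h>

/* Requires GCC or Clang: tell the compiler which outcome to expect. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Toy submission queue: a power-of-two ring of ints. */
struct sq {
    unsigned int head;
    unsigned int tail;
    unsigned int mask;
    const int *entries;
};

/* Fetch the next entry. Returns 1 on success, 0 if the queue is empty.
 * The caller asked for a number of entries it put there itself, so the
 * non-empty case is marked as the expected one. */
static int sq_get(struct sq *q, int *out)
{
    assert(q && out);
    if (likely(q->head != q->tail)) {
        *out = q->entries[q->head & q->mask];
        q->head++;
        return 1;
    }
    return 0;
}
```

The hint does not change what the function returns; it only influences how the compiler lays out the two branches.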

io_uring: rename __io_submit_sqe() · d732447f

Authored by Pavel Begunkov on Nov 21, 2019

__io_submit_sqe() is issuing requests, so call it as
such. Moreover, it ends by calling io_iopoll_req_issued().

Rename it and make terminology clearer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d732447f

io_uring: improve trace_io_uring_defer() trace point · 915967f6

Authored by Jens Axboe on Nov 21, 2019

We don't have shadow requests anymore, so get rid of the shadow
argument. Add the user_data argument, as that's often useful to easily
match up requests, instead of having to look at request pointers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

915967f6

io_uring: drain next sqe instead of shadowing · 1b4a51b6

Authored by Pavel Begunkov on Nov 21, 2019

There's an issue with the shadow drain logic in that we drop the
completion lock after deciding to defer a request, then re-grab it later
and assume that the state is still the same. In the meantime, someone
else completing a request could have found and issued it. This can cause
a stall in the queue, by having a shadow request inserted that nobody is
going to drain.

Additionally, if we fail allocating the shadow request, we simply ignore
the drain.

Instead of using a shadow request, defer the next request/link instead.
This also has the following advantages:

- removes semi-duplicated code
- doesn't allocate memory for shadows
- works better if only the head is marked for drain
- doesn't need complex synchronisation

On the flip side, it removes the shadow->seq ==
last_drain_in_in_link->seq optimization. That shouldn't be a common
case, and can always be added back, if needed.

Fixes: 4fe2c963 ("io_uring: add support for link with drain")
Cc: Jackie Liu <liuyun01@kylinos.cn>
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1b4a51b6

io_uring: close lookup gap for dependent next work · b76da70f

Authored by Jens Axboe on Nov 20, 2019

When we find new work to process within the work handler, we queue the
linked timeout before we have issued the new work. This can be
problematic for very short timeouts, as we have a window where the new
work isn't visible.

Allow the work handler to store a callback function for this in the work
item, and flag it with IO_WQ_WORK_CB if the caller has done so. If that
is set, then io-wq will call the callback when it has set up the new work
item.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b76da70f
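
The mechanism in the commit above, a flagged callback that runs only after the new work item is visible, can be sketched as follows. All names are hypothetical: WORK_CB merely plays the role of IO_WQ_WORK_CB, and struct work, work_set_cb(), wq_assign_current() and arm_timeout() are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define WORK_CB 0x1u  /* hypothetical flag: "this work item carries a callback" */

struct work;
typedef void (work_cb)(struct work *work, void *data);

struct work {
    unsigned int flags;
    work_cb *cb;
    void *cb_data;
};

/* The work handler stores a callback and flags the item accordingly. */
static void work_set_cb(struct work *w, work_cb *cb, void *data)
{
    assert(w && cb);
    w->cb = cb;
    w->cb_data = data;
    w->flags |= WORK_CB;
}

/* The queue first makes the item visible as the current work, and only
 * then runs the callback, so whatever the callback starts (for example a
 * timeout) can already find the item. */
static void wq_assign_current(struct work **current_work, struct work *w)
{
    *current_work = w;
    if (w->flags & WORK_CB)
        w->cb(w, w->cb_data);
}

/* Example callback: records that the "timeout" was armed. */
static void arm_timeout(struct work *work, void *data)
{
    (void)work;
    *(int *)data = 1;
}
```

Ordering the assignment before the callback is the point of the sketch: it removes the window in which a very short timeout could fire before the work is visible.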

io_uring: allow finding next link independent of req reference count · 4d7dd462

Authored by Jens Axboe on Nov 20, 2019

We currently try and start the next link when we put the request, and
only if we were going to free it. This means that the optimization to
continue executing requests from the same context often fails, as we're
not putting the final reference.

Add REQ_F_LINK_NEXT to keep track of this, and allow io_uring to find the
next request more efficiently.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

4d7dd462

io_uring: io_allocate_scq_urings() should return a sane state · eb065d30

Authored by Jens Axboe on Nov 20, 2019

We currently rely on ring destruction to clean things up in case of
failure, but io_allocate_scq_urings() can leave things half initialized
if only parts of it fail.

Be nice and return with either everything set up on success, or return an
error with things nicely cleaned up.

Reported-by: syzbot+0d818c0d39399188f393@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

eb065d30

io_uring: Always REQ_F_FREE_SQE for allocated sqe · bbad27b2

Authored by Pavel Begunkov on Nov 19, 2019

Always mark requests with allocated sqe and deallocate it in
__io_free_req(). It's easier to follow and doesn't add edge cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bbad27b2

io_uring: io_fail_links() should only consider first linked timeout · 5d960724

Authored by Jens Axboe on Nov 19, 2019

We currently clear the linked timeout field if we cancel such a timeout,
but we should only attempt to cancel if it's the first one we see.
Others should simply be freed like other requests, as they haven't
been started yet.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

5d960724

io_uring: Fix leaking linked timeouts · 09fbb0a8

Authored by Pavel Begunkov on Nov 19, 2019

Let's have a dependent link: REQ -> LINK_TIMEOUT -> LINK_TIMEOUT

1. submission stage: submission references for REQ and LINK_TIMEOUT
are dropped. So, references respectively (1,1,2)

2. io_put(REQ) + FAIL_LINKS stage: calls io_fail_links(), which for all
linked timeouts will call cancel_timeout() and drop 1 reference.
So, references after: (0,0,1). That's a leak.

Make it treat only the first linked timeout as such, and pass others
through __io_double_put_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

09fbb0a8
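
The reference arithmetic walked through in the commit above, (1,1,2) ending at (0,0,1), can be replayed with plain integers. This is only a model of the counts: the array indices and the functions fail_links_old() and fail_links_fixed() are hypothetical, and "dropping two references" stands in for what the message calls __io_double_put_req().

```c
#include <assert.h>

/* Chain from the message: REQ -> LINK_TIMEOUT -> LINK_TIMEOUT. */
enum { REQ, LT1, LT2, NREQ };

/* Old behaviour: every linked timeout is treated as a running timeout,
 * so cancelling it drops exactly one reference. */
static void fail_links_old(int refs[NREQ])
{
    refs[REQ] -= 1;  /* the put of REQ that triggers failing the links */
    refs[LT1] -= 1;
    refs[LT2] -= 1;  /* still holds its submission reference: leaked */
}

/* Fixed behaviour: only the first linked timeout is cancelled as such;
 * later ones were never started, so both of their references go. */
static void fail_links_fixed(int refs[NREQ])
{
    assert(refs[LT2] >= 2);
    refs[REQ] -= 1;
    refs[LT1] -= 1;
    refs[LT2] -= 2;
}
```

Starting from (1,1,2), the old path leaves one reference on the second timeout, while the fixed path brings all three counts to zero.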

io_uring: remove redundant check · f70193d6

Authored by Pavel Begunkov on Nov 19, 2019

Pass any IORING_OP_LINK_TIMEOUT request further, where it will
eventually fail in io_issue_sqe().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f70193d6

io_uring: break links for failed defer · d3b35796

Authored by Pavel Begunkov on Nov 19, 2019

If io_req_defer() failed, it needs to cancel a dependent link.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d3b35796

io-wq: remove extra space characters · b2e9c7d6

Authored by Dan Carpenter on Nov 19, 2019

These lines are indented an extra space character.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b2e9c7d6

io-wq: wait for io_wq_create() to setup necessary workers · b60fda60

Authored by Jens Axboe on Nov 19, 2019

We currently have a race where if setup is really slow, we can be
calling io_wq_destroy() before we're done setting up. This will cause
the caller to get stuck waiting for the manager to set things up, but
the manager already exited.

Fix this by doing a sync setup of the manager. This also fixes the case
where if we failed creating workers, we'd also get stuck.

In practice this race window was really small, as we already wait for
the manager to start. Hence someone would have to call io_wq_destroy()
after the task has started, but before it started the first loop. The
reported test case forked tons of these, which is why it became an
issue.

Reported-by: syzbot+0f1cc17f85154f400465@syzkaller.appspotmail.com
Fixes: 771b53d0 ("io-wq: small threadpool implementation for io_uring")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b60fda60

io_uring: request cancellations should break links · fba38c27

Authored by Jens Axboe on Nov 18, 2019

We currently don't explicitly break links if a request is cancelled, but
we should. Add explicit link breakage for all types of request
cancellations that we support.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

fba38c27

io_uring: correct poll cancel and linked timeout expiration completion · b0dd8a41

Authored by Jens Axboe on Nov 18, 2019

Currently a poll request fills a completion entry of 0, even if it got
cancelled. This is odd, and it makes it harder to support with chains.
Ensure that it returns -ECANCELED in the completion events if it got
cancelled, and furthermore ensure that the linked timeout that triggered
it completes with -ETIME if we did indeed trigger the completions
through a timeout.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

b0dd8a41

openeuler / Kernel synced successfully more than 1 year ago
