Commits · b29472ee7b53784f44011069fad15e539fd25bcf · openeuler / Kernel

18 Dec, 2019 8 commits

io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable · b29472ee

Committed by Jens Axboe 5 years ago

If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the timeout remove op into
the prep handling to make it safe for SQE reuse.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b29472ee

io_uring: make IORING_OP_CANCEL_ASYNC deferrable · fbf23849

Committed by Jens Axboe 5 years ago

If we defer this command as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the async cancel op into
the prep handling to make it safe for SQE reuse.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fbf23849

io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable · 0969e783

Committed by Jens Axboe 5 years ago

If we defer these commands as part of a link, we have to make sure that
the SQE data has been read upfront. Integrate the poll add/remove into
the prep handling to make it safe for SQE reuse.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0969e783

io_uring: make HARDLINK imply LINK · ffbb8d6b

Committed by Pavel Begunkov 5 years ago

The rule is as follows: if IOSQE_IO_HARDLINK is specified, then it's a
link and there is no need to set IOSQE_IO_LINK separately, though it
could be there. Add a proper check and ensure that IOSQE_IO_HARDLINK
implies IOSQE_IO_LINK.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ffbb8d6b
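
As a rough illustration of the rule described in the commit above, the check can be sketched in plain C. The `SKETCH_`-prefixed constants and both helper names are hypothetical and exist only for this example; the bit positions are assumed to mirror the io_uring uapi header, and this is not the kernel's actual code.

```c
#include <stdbool.h>

/* Illustrative values; real code should take these from <linux/io_uring.h>. */
#define SKETCH_IOSQE_IO_LINK     (1U << 2)
#define SKETCH_IOSQE_IO_HARDLINK (1U << 3)

/* A request belongs to a chain if either link flag is set. */
static bool sketch_is_link(unsigned int sqe_flags)
{
	return (sqe_flags & (SKETCH_IOSQE_IO_LINK | SKETCH_IOSQE_IO_HARDLINK)) != 0;
}

/* Normalise flags so that later checks only need to test the LINK bit. */
static unsigned int sketch_normalise_flags(unsigned int sqe_flags)
{
	if (sqe_flags & SKETCH_IOSQE_IO_HARDLINK)
		sqe_flags |= SKETCH_IOSQE_IO_LINK;
	return sqe_flags;
}
```

With this kind of normalisation, code that only tests the LINK bit also treats a hard-linked request as part of a chain.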

io_uring: any deferred command must have stable sqe data · 8ed8d3c3

Committed by Jens Axboe 5 years ago

We're currently not retaining sqe data for accept, fsync, and
sync_file_range. None of these commands need data outside of what
is directly provided, hence it can't go stale when the request is
deferred. However, it can get reused, if an application reuses
SQE entries.

Ensure that we retain the information we need and only read the sqe
contents once, off the submission path. Most of this is just moving
code into a prep and finish function.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8ed8d3c3

io_uring: remove 'sqe' parameter to the OP helpers that take it · fc4df999

Committed by Jens Axboe 5 years ago

We pass in req->sqe for all of them, no need to pass it in as the
request is always passed in. This is a necessary prep patch to be
able to clean up/fix the request prep path.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fc4df999

io_uring: fix pre-prepped issue with force_nonblock == true · b7bb4f7d

Committed by Jens Axboe 5 years ago

Some of these code paths assume that any force_nonblock == true issue
is not prepped, but that's not true if we did prep as part of link setup
earlier. Check if we already have an async context allocated before
setting up a new one.

Clean up the async context setup in general; we have a lot of duplicated
code there.

Fixes: 03b1230c ("io_uring: ensure async punted sendmsg/recvmsg requests copy data")
Fixes: f67676d1 ("io_uring: ensure async punted read/write requests copy iovec")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b7bb4f7d

io-wq: re-add io_wq_current_is_worker() · 525b305d

Committed by Jens Axboe 5 years ago

This reverts commit 8cdda87a; we now have several use cases for this
helper. Reinstate it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

525b305d

16 Dec, 2019 2 commits

io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG · 0b416c3e

Committed by Jens Axboe 5 years ago

If we have to punt the recvmsg to async context, we copy all the
context. But since the iovec used can be either on-stack (if small) or
dynamically allocated, if it's on-stack, then we need to ensure we reset
the iov pointer. If we don't, then we're reusing old stack data, and
that can lead to -EFAULTs if things get overwritten.

Ensure we retain the right pointers for the iov, and free it as well if
we end up having to go beyond UIO_FASTIOV number of vectors.

Fixes: 03b1230c ("io_uring: ensure async punted sendmsg/recvmsg requests copy data")
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0b416c3e
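
The pointer hazard described in the commit above can be sketched in userspace C. This is only an illustration under assumptions: `struct sketch_msg_ctx`, `SKETCH_FASTIOV` and `sketch_copy_ctx()` are hypothetical names, with `SKETCH_FASTIOV` standing in for the kernel's UIO_FASTIOV, and the code is not the io_uring implementation.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

#define SKETCH_FASTIOV 8	/* stand-in for the kernel's UIO_FASTIOV */

struct sketch_msg_ctx {
	struct iovec fast_iov[SKETCH_FASTIOV];	/* inline storage for small requests */
	struct iovec *iov;			/* points at fast_iov or at a heap array */
};

/* Copy src into dst so the request can be retried later from async context. */
static void sketch_copy_ctx(struct sketch_msg_ctx *dst,
			    const struct sketch_msg_ctx *src)
{
	assert(dst && src);
	memcpy(dst, src, sizeof(*dst));
	/*
	 * A plain copy leaves dst->iov aimed at src's inline array, which may
	 * live on a stack frame that is about to go away. Re-aim it at the
	 * copy's own inline array.
	 */
	if (src->iov == src->fast_iov)
		dst->iov = dst->fast_iov;
}
```

If the vectors were heap-allocated instead, the copied pointer stays valid, but the owner of the copy then becomes responsible for freeing it.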

io_uring: fix stale comment and a few typos · d195a66e

Committed by Brian Gianforcaro 5 years ago

- Fix a few typos found while reading the code.

- Fix stale io_get_sqring comment referencing s->sqe, the 's' parameter
  was renamed to 'req', but the comment still holds.
Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d195a66e

12 Dec, 2019 6 commits

io_uring: ensure we return -EINVAL on unknown opcode · 9e3aa61a

Committed by Jens Axboe 5 years ago

If we submit an unknown opcode and have fd == -1, io_op_needs_file()
will return true as we default to needing a file. Then when we go and
assign the file, we find the 'fd' invalid and return -EBADF. We really
should be returning -EINVAL for that case, as we normally do for
unsupported opcodes.

Change io_op_needs_file() to have the following return values:

0   - does not need a file
1   - does need a file
< 0 - error value

and use this to pass back the right value for this invalid case.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9e3aa61a
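
The three-way return convention described in the commit above can be sketched as follows. The opcode enum and both function names are hypothetical and chosen for this example only; the sketch shows the ordering of the checks, not the kernel's real opcode table.

```c
#include <errno.h>

/* Hypothetical opcodes, for this sketch only. */
enum sketch_op { SKETCH_OP_NOP, SKETCH_OP_READV, SKETCH_OP_TIMEOUT, SKETCH_OP_LAST };

/* Returns 0 (no file needed), 1 (file needed) or a negative errno. */
static int sketch_op_needs_file(int opcode)
{
	switch (opcode) {
	case SKETCH_OP_NOP:
	case SKETCH_OP_TIMEOUT:
		return 0;
	case SKETCH_OP_READV:
		return 1;
	default:
		return -EINVAL;	/* unknown opcode: report it before looking at fd */
	}
}

/* Caller: only validate fd once the opcode is known and wants a file. */
static int sketch_assign_file(int opcode, int fd)
{
	int ret = sketch_op_needs_file(opcode);

	if (ret <= 0)
		return ret;	/* either nothing to do, or the error value */
	return fd < 0 ? -EBADF : 0;
}
```

Because the opcode is classified first, an unknown opcode with fd == -1 yields -EINVAL rather than -EBADF.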

pipe: simplify signal handling in pipe_read() and add comments · d1c6a2aa

Committed by Linus Torvalds 5 years ago

There's no need to separately check for signals while inside the locked
region, since we're going to do "wait_event_interruptible()" right
afterwards anyway, and the error handling is much simpler there.

The check for whether we had already read anything was also redundant,
since we no longer do the odd merging of reads when there are pending
writers.

But perhaps more importantly, this adds commentary about why we still
need to wake up possible writers even though we didn't read any data,
and why we can skip all the finishing touches now if we get a signal (or
had a signal pending) while waiting for more data.

[ This is a split-out cleanup from my "make pipe IO use exclusive wait
  queues" thing, which I can't apply because it triggers a nasty bug in
  the GNU make jobserver   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d1c6a2aa

afs: Show volume name in /proc/net/afs/<cell>/volumes · 50559800

Committed by David Howells 5 years ago

Show the name of each volume in /proc/net/afs/<cell>/volumes to make it
easier to work out the name corresponding to a volume ID.  This makes it
easier to work out which mounts in /proc/mounts correspond to which volume
ID.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>

50559800

afs: Fix missing cell comparison in afs_test_super() · 106bc798

Committed by David Howells 5 years ago

Fix missing cell comparison in afs_test_super().  Without this, any pair
of volumes that have the same volume ID will share a superblock, no matter
the cell, unless they're in different network namespaces.

Normally, most users will only deal with a single cell and so they won't
see this.  Even if they do look into a second cell, they won't see a
problem unless they happen to hit a volume with the same ID as one they've
already got mounted.

Before the patch:

    # ls /afs/grand.central.org/archive
    linuxdev/  mailman/  moin/  mysql/  pipermail/  stage/  twiki/
    # ls /afs/kth.se/
    linuxdev/  mailman/  moin/  mysql/  pipermail/  stage/  twiki/
    # cat /proc/mounts | grep afs
    none /afs afs rw,relatime,dyn,autocell 0 0
    #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0
    #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0
    #grand.central.org:root.archive /afs/kth.se afs ro,relatime 0 0

After the patch:

    # ls /afs/grand.central.org/archive
    linuxdev/  mailman/  moin/  mysql/  pipermail/  stage/  twiki/
    # ls /afs/kth.se/
    admin/        common/  install/  OldFiles/  service/  system/
    bakrestores/  home/    misc/     pkg/       src/      wsadmin/
    # cat /proc/mounts | grep afs
    none /afs afs rw,relatime,dyn,autocell 0 0
    #grand.central.org:root.cell /afs/grand.central.org afs ro,relatime 0 0
    #grand.central.org:root.archive /afs/grand.central.org/archive afs ro,relatime 0 0
    #kth.se:root.cell /afs/kth.se afs ro,relatime 0 0

Fixes: ^1da177e4 ("Linux-2.6.12-rc2")
Reported-by: Carsten Jacobi <jacobi@de.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
cc: Todd DeSantis <atd@us.ibm.com>

106bc798

afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP · 1da4bd9f

Committed by David Howells 5 years ago

Fix the lookup method on the dynamic root directory such that creation
calls, such as mkdir, open(O_CREAT), symlink, etc. fail with EOPNOTSUPP
rather than failing with some odd error (such as EEXIST).

lookup() itself tries to create automount directories when it is invoked.
These are cached locally in RAM and not committed to storage.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Jonathan Billings <jsbillings@jsbillings.org>

1da4bd9f

afs: Fix mountpoint parsing · 158d5833

Committed by David Howells 5 years ago

Each AFS mountpoint has strings that define the target to be mounted.  This
is required to end in a dot that is supposed to be stripped off.  The
string can include suffixes of ".readonly" or ".backup" - which are
supposed to come before the terminal dot.  To add to the confusion, the "fs
lsmount" afs utility does not show the terminal dot when displaying the
string.

The kernel mount source string parser, however, assumes that the terminal
dot marks the suffix and that the suffix is always "" and is thus ignored.
In most cases, there is no suffix and this is not a problem - but if there
is a suffix, it is lost and this affects the ability to mount the correct
volume.

The command line mount command, on the other hand, is expected not to
include a terminal dot - so the problem doesn't arise there.

Fix this by making sure that the dot exists and then stripping it when
passing the string to the mount configuration.

Fixes: bec5eb61 ("AFS: Implement an autocell mount capability [ver #2]")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Jonathan Billings <jsbillings@jsbillings.org>

158d5833

11 Dec, 2019 9 commits

io_uring: add sockets to list of files that support non-blocking issue · 10d59345

Committed by Jens Axboe 5 years ago

In chasing a performance issue between using IORING_OP_RECVMSG and
IORING_OP_READV on sockets, tracing showed that we always punt the
socket reads to async offload. This is due to io_file_supports_async()
not checking for S_ISSOCK on the inode. Since sockets support the
O_NONBLOCK (or MSG_DONTWAIT) flag just fine, add sockets to the list
of file types that we can do a non-blocking issue to.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

10d59345

io_uring: only hash regular files for async work execution · 53108d47

Committed by Jens Axboe 5 years ago

We hash regular files to avoid having multiple threads hammer on the
inode mutex, but it should not be needed on other types of files
(like sockets).
Signed-off-by: Jens Axboe <axboe@kernel.dk>

53108d47

io_uring: run next sqe inline if possible · 4a0a7a18

Committed by Jens Axboe 5 years ago

One major use case of linked commands is the ability to run the next
link inline, if at all possible. This is done correctly for async
offload, but somewhere along the line we lost the ability to do so when
we were able to complete a request without having to punt it. Ensure
that we do so correctly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4a0a7a18

io_uring: don't dynamically allocate poll data · 392edb45

Committed by Jens Axboe 5 years ago

This essentially reverts commit e944475e. For high poll ops
workloads, like TAO, the dynamic allocation of the wait_queue
entry for IORING_OP_POLL_ADD adds considerable extra overhead.
Go back to embedding the wait_queue_entry, but keep the usage of
wait->private for the pointer stashing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

392edb45

io_uring: deferred send/recvmsg should assign iov · d9688565

Committed by Jens Axboe 5 years ago

Don't just assign it from the main call path; that can miss the case
when we're called from issue deferral.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d9688565

io_uring: sqthread should grab ctx->uring_lock for submissions · 8a4955ff

Committed by Jens Axboe 5 years ago

We use the mutex to guard against registered file updates, for instance.
Ensure we're safe in accessing that state against concurrent updates.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8a4955ff

io-wq: briefly spin for new work after finishing work · e995d512

Committed by Jens Axboe 5 years ago

To avoid going to sleep only to get woken shortly thereafter, spin
briefly for new work upon completion of work.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e995d512

io-wq: remove worker->wait waitqueue · 506d95ff

Committed by Jens Axboe 5 years ago

We only have one case of using the waitqueue to wake the worker; the
rest are using wake_up_process(). Since we can save some cycles by not
fiddling with the waitqueue in io_wqe_worker(), switch the work activation
to task wakeup and get rid of the now unused wait_queue_head_t in
struct io_worker.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

506d95ff

io_uring: allow unbreakable links · 4e88d6e7

Committed by Jens Axboe 5 years ago

Some commands will invariably end in a failure in the sense that the
completion result will be less than zero. One such example is timeouts
that don't have a completion count set, they will always complete with
-ETIME unless cancelled.

For linked commands, we sever links and fail the rest of the chain if
the result is less than zero. Since we have commands where we know that
will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever
regardless of the completion result. Note that the link will still sever
if we fail submitting the parent request, hard links are only resilient
in the presence of completion results for requests that did submit
correctly.

Cc: stable@vger.kernel.org # v5.4
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4e88d6e7
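
The severing rule described in the commit above can be sketched as a small decision function. The flag constants and the function name are hypothetical, and the sketch models only the completion-result rule stated in the message; it deliberately ignores submission failures, which the message says still sever the chain.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative values; real code should take these from <linux/io_uring.h>. */
#define SKETCH_IOSQE_IO_LINK     (1U << 2)
#define SKETCH_IOSQE_IO_HARDLINK (1U << 3)

/*
 * Decide whether the rest of a chain still runs after a request that was
 * submitted successfully completes with result `res`.
 */
static bool sketch_chain_continues(unsigned int flags, int res)
{
	if (!(flags & (SKETCH_IOSQE_IO_LINK | SKETCH_IOSQE_IO_HARDLINK)))
		return false;		/* nothing is linked behind this request */
	if (res >= 0)
		return true;		/* success never severs the chain */
	/* A negative result severs a normal link but not a hard link. */
	return (flags & SKETCH_IOSQE_IO_HARDLINK) != 0;
}
```

For example, a timeout that completes with -ETIME would end a chain built with the normal link flag, while a chain built with the hard link flag would carry on.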

10 Dec, 2019 8 commits

ceph: add more debug info when decoding mdsmap · da08e1e1

Committed by Xiubo Li 5 years ago

Show the laggy state.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

da08e1e1

ceph: switch to global cap helper · bd84fbcb

Committed by Xiubo Li 5 years ago

__ceph_is_any_caps is a duplicate helper.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

bd84fbcb

ceph: trigger the reclaim work once there has enough pending caps · bba1560b

Committed by Xiubo Li 5 years ago

The nr in ceph_reclaim_caps_nr() may well be larger than 1,
so we may miss it and the reclaim work could not be triggered as expected.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

bba1560b

ceph: show tasks waiting on caps in debugfs caps file · 3a3430af

Committed by Jeff Layton 5 years ago

Add some visibility of tasks that are waiting for caps to the "caps"
debugfs file. Display the tgid of the waiting task, inode number, and
the caps the task needs and wants.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

3a3430af

ceph: convert int fields in ceph_mount_options to unsigned int · ad8c28a9

Committed by Jeff Layton 5 years ago

Most of these values should never be negative, so convert them to
unsigned values. Add some sanity checking to the parsed values, and
clean up some unneeded casts.

Note that while caps_max should never be negative, this patch leaves
it signed, since this value ends up later being compared to a signed
counter. Just ensure that userland never passes in a negative value
for caps_max.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ad8c28a9

treewide: Use sizeof_field() macro · c593642c

Committed by Pankaj Bharadiya 5 years ago

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done
Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net

c593642c
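
For readers unfamiliar with the macro named in the commit above, a minimal userspace sketch follows. The macro body is assumed to have the same shape as the kernel's definition; `struct sketch_hdr` is a hypothetical type used only to exercise it.

```c
#include <stddef.h>

/* Size of a struct member, computed without needing an object of that type. */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))

/* Hypothetical structure, for this sketch only. */
struct sketch_hdr {
	unsigned char tag;
	unsigned int  len;
	char          name[16];
};

/* Compile-time checks: sizeof never evaluates its operand, so no pointer is dereferenced. */
_Static_assert(sizeof_field(struct sketch_hdr, name) == 16,
	       "name occupies 16 bytes");
_Static_assert(sizeof_field(struct sketch_hdr, len) == sizeof(unsigned int),
	       "len has the size of its declared type");
```

A typical use is sizing a buffer or validating a length against one member of a structure when no instance is at hand.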

btrfs: add Kconfig dependency for BLAKE2B · 78f926f7

Committed by David Sterba 5 years ago

Because the BLAKE2B code went through a different tree, it was not
available at the time the btrfs part was merged. Now that the Kconfig
symbol exists, add it to the list.
Signed-off-by: David Sterba <dsterba@suse.com>

78f926f7

afs: Fix SELinux setting security label on /afs · bcbccaf2

Committed by David Howells 5 years ago

Make the AFS dynamic root superblock R/W so that SELinux can set the
security label on it.  Without this, upgrades to, say, the Fedora
filesystem-afs RPM fail if afs is mounted on it because the SELinux label
can't be (re-)applied.

It might be better to make it possible to bypass the R/O check for LSM
label application through setxattr.

Fixes: 4d673da1 ("afs: Support the AFS dynamic root")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: selinux@vger.kernel.org
cc: linux-security-module@vger.kernel.org

bcbccaf2

09 Dec, 2019 1 commit

afs: Fix afs_find_server lookups for ipv4 peers · 9bd0160d

Committed by Marc Dionne 5 years ago

afs_find_server tries to find a server that has an address that
matches the transport address of an rxrpc peer.  The code assumes
that the transport address is always ipv6, with ipv4 represented
as ipv4 mapped addresses, but that's not the case.  If the transport
family is AF_INET, srx->transport.sin6.sin6_addr.s6_addr32[] will
be beyond the actual ipv4 address and will always be 0, and all
ipv4 addresses will be seen as matching.

As a result, the first ipv4 address seen on any server will be
considered a match, and the server returned may be the wrong one.

One of the consequences is that callbacks received over ipv4 will
only be correctly applied for the server that happens to have the
first ipv4 address on the fs_addresses4 list.  Callbacks over ipv4
from all other servers are dropped, causing the client to serve stale
data.

This is fixed by looking at the transport family, and comparing ipv4
addresses based on a sockaddr_in structure rather than a sockaddr_in6.

Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>

9bd0160d
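
The idea behind the fix described in the commit above, checking the address family before comparing, can be sketched in userspace C. This is an assumption-laden illustration: it uses `struct sockaddr_storage` rather than the kernel's rxrpc address type, and `sketch_addr_equal()` is a hypothetical helper, not the AFS code.

```c
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Compare two peer addresses, dispatching on the address family first. */
static bool sketch_addr_equal(const struct sockaddr_storage *a,
			      const struct sockaddr_storage *b)
{
	if (a->ss_family != b->ss_family)
		return false;

	switch (a->ss_family) {
	case AF_INET: {
		/* IPv4: read the address through sockaddr_in, never sockaddr_in6. */
		const struct sockaddr_in *a4 = (const struct sockaddr_in *)a;
		const struct sockaddr_in *b4 = (const struct sockaddr_in *)b;

		return a4->sin_addr.s_addr == b4->sin_addr.s_addr;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *a6 = (const struct sockaddr_in6 *)a;
		const struct sockaddr_in6 *b6 = (const struct sockaddr_in6 *)b;

		return memcmp(&a6->sin6_addr, &b6->sin6_addr,
			      sizeof(a6->sin6_addr)) == 0;
	}
	default:
		return false;	/* unknown family: never report a match */
	}
}
```

Reading an AF_INET address through the IPv6 layout would look past the four address bytes, which is how unrelated IPv4 peers could all appear to match.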

08 Dec, 2019 6 commits

smb3: improve check for when we send the security descriptor context on create · 231e2a0b

Committed by Steve French 5 years ago

We had cases in the previous patch where we were sending the security
descriptor context on SMB3 open (file create) in cases when we hadn't
mounted with the "modefromsid" mount option.

Add a check for that mount flag before calling add_sd_context in
open init.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>

231e2a0b

pipe: don't use 'pipe_wait() for basic pipe IO · 85190d15

Committed by Linus Torvalds 5 years ago

pipe_wait() may be simple, but since it relies on the pipe lock, it
means that we have to do the wakeup while holding the lock.  That's
unfortunate, because the very first thing the waked entity will want to
do is to get the pipe lock for itself.

So get rid of the pipe_wait() usage by simply releasing the pipe lock,
doing the wakeup (if required) and then using wait_event_interruptible()
to wait on the right condition instead.

wait_event_interruptible() handles races on its own by comparing the
wakeup condition before and after adding itself to the wait queue, so
you can use an optimistic unlocked condition for it.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

85190d15

pipe: remove 'waiting_writers' merging logic · a28c8b9d

Committed by Linus Torvalds 5 years ago

This code is ancient, and goes back to when we only had a single page
for the pipe buffers.  The exact history is hidden in the mists of time
(ie "before git", and in fact predates the BK repository too).

At that long-ago point in time, it actually helped to try to merge big
back-and-forth pipe reads and writes, and not limit pipe reads to the
single pipe buffer in length just because that was all we had at a time.

However, since then we've expanded the pipe buffers to multiple pages,
and this logic really doesn't seem to make sense.  And a lot of it is
somewhat questionable (ie "hmm, the user asked for a non-blocking read,
but we see that there's a writer pending, so let's wait anyway to get
the extra data that the writer will have").

But more importantly, it makes the "go to sleep" logic much less
obvious, and considering the wakeup issues we've had, I want to make for
less of those kinds of things.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a28c8b9d

pipe: fix and clarify pipe read wakeup logic · f467a6a6

Committed by Linus Torvalds 5 years ago

This is the read side version of the previous commit: it simplifies the
logic to only wake up waiting writers when necessary, and makes sure to
use a synchronous wakeup.  This time not so much for GNU make jobserver
reasons (that pipe never fills up), but simply to get the writer going
quickly again.

A bit less verbose commentary this time, if only because I assume that
the write side commentary isn't going to be ignored if you touch this
code.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f467a6a6

pipe: fix and clarify pipe write wakeup logic · 1b6b26ae

Committed by Linus Torvalds 5 years ago

The pipe rework ends up having been extra painful, partly because of
actual bugs with ordering and caching of the pipe state, but also
because of subtle performance issues.

In particular, the pipe rework caused the kernel build to inexplicably
slow down.

The reason turns out to be that the GNU make jobserver (which limits the
parallelism of the build) uses a pipe to implement a "token" system: a
parallel submake will read a character from the pipe to get the job
token before starting a new job, and will write a character back to the
pipe when it is done.  The overall job limit is thus easily controlled
by just writing the appropriate number of initial token characters into
the pipe.

But to work well, that really means that the old behavior of write
wakeups being synchronous (WF_SYNC) is very important - when the pipe
writer wakes up a reader, we want the reader to actually get scheduled
immediately.  Otherwise you lose the parallelism of the build.

The pipe rework lost that synchronous wakeup on write, and we had
clearly all forgotten the reasons and rules for it.

This rewrites the pipe write wakeup logic to do the required WF_SYNC
wakeups, but also clarifies the logic and avoids extraneous wakeups.

It also ends up adding a number of comments about what it does and why,
so that we hopefully don't end up forgetting about this next time we
change this code.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1b6b26ae
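
The token scheme that the commit above attributes to the GNU make jobserver can be sketched with an ordinary pipe. Everything here is a simplified assumption for illustration: the `sketch_jobserver_*` names are hypothetical, the token byte is arbitrary, and the read end is made non-blocking only so the sketch can report "no token available" instead of sleeping.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create the token pipe and preload it with `tokens` one-byte tokens. */
static int sketch_jobserver_init(int fds[2], int tokens)
{
	if (pipe(fds) < 0)
		return -1;
	if (fcntl(fds[0], F_SETFL, O_NONBLOCK) < 0)
		goto fail;
	for (int i = 0; i < tokens; i++)
		if (write(fds[1], "+", 1) != 1)
			goto fail;
	return 0;
fail:
	close(fds[0]);
	close(fds[1]);
	return -1;
}

/* Take one token: 1 on success, 0 if none is available, -1 on error. */
static int sketch_jobserver_try_acquire(int rfd)
{
	char tok;
	ssize_t n = read(rfd, &tok, 1);

	if (n == 1)
		return 1;
	if (n < 0 && errno == EAGAIN)
		return 0;
	return -1;
}

/* Hand the token back once the job has finished. */
static int sketch_jobserver_release(int wfd)
{
	return write(wfd, "+", 1) == 1 ? 0 : -1;
}
```

The number of tokens preloaded into the pipe bounds how many jobs can run at once, which is why the latency between a writer returning a token and a waiting reader being scheduled matters for build parallelism.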

pipe: fix poll/select race introduced by the pipe rework · ad910e36

Committed by Linus Torvalds 5 years ago

The kernel wait queues have a basic rule to them: you add yourself to
the wait-queue first, and then you check the things that you're going to
wait on.  That avoids the races with the event you're waiting for.

The same goes for poll/select logic: the "poll_wait()" goes first, and
then you check the things you're polling for.

Of course, if you use locking, the ordering doesn't matter since the
lock will serialize with anything that changes the state you're looking
at. That's not the case here, though.

So move the poll_wait() first in pipe_poll(), before you start looking
at the pipe state.

Fixes: 8cefc107 ("pipe: Use head and tail pointers for the ring, not cursor and length")
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ad910e36

openeuler / Kernel synced successfully more than 1 year ago