Commits · e59d3c64cba69b57263dff1d62838bc6a819ae37 · openeuler / Kernel

20 Dec, 2020 8 commits

epoll: eliminate unnecessary lock for zero timeout · e59d3c64

Committed by Soheil Hassas Yeganeh on Dec 18, 2020

We call ep_events_available() under lock when timeout is 0, and then call
it without locks in the loop for the other cases.

Instead, call ep_events_available() without lock for all cases.  For
non-zero timeouts, we will recheck after adding the thread to the wait
queue.  For zero timeout cases, by definition, user is opportunistically
polling and will have to call epoll_wait again in the future.

Note that this lock was kept in c5a282e9 because the whole loop was
historically under lock.

This patch results in a 1% CPU/RPC reduction in RPC benchmarks.

Link: https://lkml.kernel.org/r/20201106231635.3528496-9-soheil.kdev@gmail.com
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Cc: Guantao Liu <guantaol@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e59d3c64
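
Seen from userspace, the zero-timeout case in the commit above looks roughly like the sketch below: a caller passing a timeout of 0 only samples readiness and is expected to call epoll_wait() again later. This is a minimal illustration of the calling pattern, not the kernel change itself; `poll_once` and `watch_eventfd` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: register an eventfd for readability on epfd. */
static int watch_eventfd(int epfd, int efd)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
}

/*
 * Hypothetical helper: opportunistic, non-blocking check.
 * A timeout of 0 makes epoll_wait() return immediately with the number
 * of ready descriptors (0 if none), or -1 on error.
 */
static int poll_once(int epfd, struct epoll_event *ev)
{
	assert(ev != NULL);
	return epoll_wait(epfd, ev, 1, 0);
}
```

A caller would typically interleave `poll_once()` with other work and treat a return value of 0 as "nothing yet, try again later".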

epoll: replace gotos with a proper loop · 00b27634

Committed by Soheil Hassas Yeganeh on Dec 18, 2020

The existing loop is pointless, and the labels make it really hard to
follow the structure.

Replace that control structure with a simple loop that returns when there
are new events, there is a signal, or the thread has timed out.

Link: https://lkml.kernel.org/r/20201106231635.3528496-8-soheil.kdev@gmail.com
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Cc: Guantao Liu <guantaol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

00b27634

epoll: pull all code between fetch_events and send_event into the loop · e8c85328

Committed by Soheil Hassas Yeganeh on Dec 18, 2020

This is a no-op change which simplifies the follow up patches.

Link: https://lkml.kernel.org/r/20201106231635.3528496-7-soheil.kdev@gmail.com
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Cc: Guantao Liu <guantaol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e8c85328

epoll: simplify and optimize busy loop logic · 1493c47f

Committed by Soheil Hassas Yeganeh on Dec 18, 2020

ep_events_available() is called multiple times around the busy loop logic,
even though the logic is generally not used. ep_reset_busy_poll_napi_id()
is similarly always called, even when busy loop is not used.

Eliminate ep_reset_busy_poll_napi_id() and inline it inside
ep_busy_loop(). Make ep_busy_loop() return whether there are any events
available after the busy loop. This eliminates unnecessary loads and
branches, and simplifies the loop.

Link: https://lkml.kernel.org/r/20201106231635.3528496-6-soheil.kdev@gmail.com
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Cc: Guantao Liu <guantaol@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1493c47f

epoll: move eavail next to the list_empty_careful check · e411596d

Committed by Soheil Hassas Yeganeh on Dec 18, 2020

This is a no-op change that simply makes the code more coherent.

Link: https://lkml.kernel.org/r/20201106231635.3528496-5-soheil.kdev@gmail.com
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Cc: Guantao Liu <guantaol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e411596d

epoll: pull fatal signal checks into ep_send_events() · cccd29bf

Committed by Soheil Hassas Yeganeh on Dec 18, 2020

To simplify the code, pull in checking the fatal signals into
ep_send_events().  ep_send_events() is called only from ep_poll().

Note that, previously, we were always checking for fatal signals, but now
they are checked only if eavail is true.  This should be fine because the
goal of that check is to quickly return from epoll_wait() when there is a
pending fatal signal.

Link: https://lkml.kernel.org/r/20201106231635.3528496-4-soheil.kdev@gmail.com
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Cc: Guantao Liu <guantaol@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cccd29bf

epoll: simplify signal handling · 2efdaf76

Committed by Soheil Hassas Yeganeh on Dec 18, 2020

Check signals before locking ep->lock, and immediately return -EINTR if
there is any signal pending.

This saves a few loads, stores, and branches from the hot path and
simplifies the loop structure for follow up patches.

Link: https://lkml.kernel.org/r/20201106231635.3528496-3-soheil.kdev@gmail.com
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Cc: Guantao Liu <guantaol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2efdaf76

epoll: check for events when removing a timed out thread from the wait queue · 289caf5d

Committed by Soheil Hassas Yeganeh on Dec 18, 2020

Patch series "simplify ep_poll".

This patch series is a followup based on the suggestions and feedback by
Linus:
https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com

The first patch in the series is a fix for the epoll race in presence of
timeouts, so that it can be cleanly backported to all affected stable
kernels.

The rest of the patch series simplifies the ep_poll() implementation.  Some
of these simplifications result in minor performance enhancements as well.
We have kept these changes under self tests and internal benchmarks for a
few days, and there are minor (1-2%) performance enhancements as a result.

This patch (of 8):

After abc610e0 ("fs/epoll: avoid barrier after an epoll_wait(2)
timeout"), we break out of the ep_poll loop upon timeout, without checking
whether there are any new events available.  Prior to that patch series we
always called ep_events_available() after exiting the loop.

This can cause races and missed wakeups.  For example, consider the
following scenario reported by Guantao Liu:

Suppose we have an eventfd added using EPOLLET to an epollfd.

Thread 1: Sleeps for just below 5ms and then writes to an eventfd.
Thread 2: Calls epoll_wait with a timeout of 5 ms. If it sees an
          event of the eventfd, it will write back on that fd.
Thread 3: Calls epoll_wait with a negative timeout.

Prior to abc610e0, it is guaranteed that Thread 3 will be woken up either
by Thread 1 or Thread 2.  After abc610e0, Thread 3 can be blocked
indefinitely if Thread 2 sees a timeout right before the write to the
eventfd by Thread 1.  Thread 2 will be woken up from
schedule_hrtimeout_range and, with eavail 0, it will not call
ep_send_events().

To fix this issue:
1) Simplify the timed_out case as suggested by Linus.
2) While holding the lock, recheck whether the thread was woken up
   after its timeout was reached.

Note that (2) is different from Linus' original suggestion: it does not set
"eavail = ep_events_available(ep)" to avoid unnecessary contention (when
there are too many timed-out threads and a small number of events), as
well as races mentioned in the discussion thread.

This is the first patch in the series so that the backport to stable
releases is straightforward.

Link: https://lkml.kernel.org/r/20201106231635.3528496-1-soheil.kdev@gmail.com
Link: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com
Link: https://lkml.kernel.org/r/20201106231635.3528496-2-soheil.kdev@gmail.com
Fixes: abc610e0 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Guantao Liu <guantaol@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Guantao Liu <guantaol@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

289caf5d

05 Dec, 2020 1 commit

net: Remove the err argument from sock_from_file · dba4a925

Committed by Florent Revest on Dec 04, 2020

Currently, the sock_from_file prototype takes an "err" pointer that is
either not set or set to -ENOTSOCK IFF the returned socket is NULL. This
makes the error redundant and it is ignored by a few callers.

This patch simplifies the API by letting callers deduce the error based
on whether the returned socket is NULL or not.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Florent Revest <revest@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201204113609.1850150-1-revest@google.com

dba4a925
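
The calling convention described in the commit above can be sketched as follows. This is a standalone illustration using hypothetical stand-in types (`toy_file`, `toy_socket`) and functions, not the kernel's actual definitions: the lookup returns NULL on failure, and the caller derives -ENOTSOCK itself instead of reading it from an out-parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct socket / struct file. */
struct toy_socket {
	int id;
};

struct toy_file {
	struct toy_socket *sock;	/* NULL when the file is not a socket */
};

/* Lookup without an "err" out-parameter: NULL is the only failure signal. */
static struct toy_socket *toy_sock_from_file(const struct toy_file *file)
{
	assert(file != NULL);
	return file->sock;
}

/* A caller deduces the error code from the NULL return. */
static int toy_socket_id(const struct toy_file *file)
{
	struct toy_socket *sock = toy_sock_from_file(file);

	if (!sock)
		return -ENOTSOCK;
	return sock->id;
}
```

Callers that do not care about the specific error can simply test the returned pointer.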

01 Dec, 2020 2 commits

net: Add SO_BUSY_POLL_BUDGET socket option · 7c951caf

Committed by Björn Töpel on Nov 30, 2020

This option lets a user set a per socket NAPI budget for
busy-polling. If the option is not set, it will use the default of 8.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20201130185205.196029-3-bjorn.topel@gmail.com

7c951caf

net: Introduce preferred busy-polling · 7fd3253a

Committed by Björn Töpel on Nov 30, 2020

The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
option or system-wide using the /proc/sys/net/core/busy_read knob, is
opportunistic. That means that if the NAPI context is not
scheduled, it will poll it. If, after busy-polling, the budget is
exceeded, the busy-polling logic will schedule the NAPI onto the
regular softirq handling.

One implication of the behavior above is that a busy/heavily loaded NAPI
context will never enter/allow for busy-polling. Some applications
prefer that most NAPI processing would be done by busy-polling.

This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
in concert with the napi_defer_hard_irqs and gro_flush_timeout
knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
introduced in commit 6f8b12d6 ("net: napi: add hard irqs deferral
feature"), and allow a user to defer enabling interrupts and
instead schedule the NAPI context from a watchdog timer. When a user
enables SO_PREFER_BUSY_POLL, again with the other knobs enabled,
and the NAPI context is being processed by a softirq, the softirq NAPI
processing will exit early to allow the busy-polling to be performed.

If the application stops performing busy-polling via a system call,
the watchdog timer defined by gro_flush_timeout will time out, and
regular softirq handling will resume.

In summary: heavy-traffic applications that prefer busy-polling over
softirq processing should use this option.

Example usage:

  $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
  $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout

Note that the timeout should be larger than the userspace processing
window, otherwise the watchdog will time out and fall back to regular
softirq processing.

Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20201130185205.196029-2-bjorn.topel@gmail.com

7fd3253a
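
The final step of the example usage above ("Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket") might look like the following userspace sketch, which also sets the SO_BUSY_POLL_BUDGET option from the companion commit. The helper name `enable_preferred_busy_poll` is hypothetical. The fallback option numbers are an assumption taken from the generic socket header layout and may differ on some architectures, so a real program should rely on its system headers. Enabling these options may require CAP_NET_ADMIN, and older kernels are expected to reject them.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Fallbacks for older userspace headers (assumed asm-generic values). */
#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46
#endif
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/*
 * Hypothetical helper: enable busy-polling, prefer it over softirq
 * processing, and set a per-socket NAPI budget.
 * Returns 0 on success or a negative errno from the first failing call.
 */
static int enable_preferred_busy_poll(int fd, int busy_poll_usecs, int budget)
{
	int on = 1;

	assert(busy_poll_usecs >= 0);

	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
		       &busy_poll_usecs, sizeof(busy_poll_usecs)) < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
		       &on, sizeof(on)) < 0)
		return -errno;
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
		       &budget, sizeof(budget)) < 0)
		return -errno;
	return 0;
}
```

Returning the negative errno lets the caller distinguish a permission problem from a kernel that does not know the option.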

26 Oct, 2020 27 commits

epoll: take epitem list out of struct file · 319c1517

Committed by Al Viro on Oct 01, 2020

Move the head of epitem list out of struct file; for epoll ones it's
moved into struct eventpoll (->refs there), for non-epoll - into
the new object (struct epitem_head).  In place of ->f_ep_links we
leave a pointer to the list head (->f_ep).

->f_ep is protected by ->f_lock and it's zeroed as soon as the list
of epitems becomes empty (that can happen only in ep_remove() by
now).

The list of files for reverse path check is *not* going through
struct file now - it's a single-linked list going through epitem_head
instances.  It's terminated by ERR_PTR(-1) (== EP_UNACTIVE_POINTER),
so the elements of list can be distinguished by head->next != NULL.

epitem_head instances are allocated at ep_insert() time (by
attach_epitem()) and freed either by ep_remove() (if it empties
the set of epitems *and* epitem_head does not belong to the
reverse path check list) or by clear_tfile_check_list() when
the list is emptied (if the set of epitems is empty by that
point).  Allocations are done from a separate slab - minimal kmalloc()
size is too large on some architectures.

As a result, we trim struct file _and_ get rid of the games with
temporary file references.

Locking and barriers are interesting (aren't they always); see unlist_file()
and ep_remove() for details.  The non-obvious part is that ep_remove() needs
to decide if it will be the one to free the damn thing *before* actually
storing NULL to head->epitems.first - that's what smp_load_acquire is for
in there.  unlist_file() lockless path is safe, since we hit it only if
we observe NULL in head->epitems.first and whoever had done that store is
guaranteed to have observed non-NULL in head->next.  IOW, their last access
had been the store of NULL into ->epitems.first and we can safely free
the sucker.  OTOH, we are under rcu_read_lock() and both epitem and
epitem->file have their freeing RCU-delayed.  So if we see non-NULL
->epitems.first, we can grab ->f_lock (all epitems in there share the
same struct file) and safely recheck the emptiness of ->epitems; again,
->next is still non-NULL, so ep_remove() couldn't have freed head yet.
->f_lock serializes us wrt ep_remove(); the rest is trivial.

Note that once head->epitems becomes NULL, nothing can get inserted into
it - the only remaining reference to head after that point is from the
reverse path check list.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

319c1517
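
The sentinel-terminated check list described in the commit above can be illustrated in isolation. The sketch below uses hypothetical names (`toy_head`, `toy_check_list`) and omits all locking; it only shows why terminating a singly linked list with a non-NULL sentinel lets `next != NULL` double as an "is on the list" test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct epitem_head. */
struct toy_head {
	struct toy_head *next;	/* NULL means "not on the check list" */
};

/* Non-NULL end marker, playing the role of EP_UNACTIVE_POINTER. */
#define TOY_LIST_END ((struct toy_head *)-1L)

static struct toy_head *toy_check_list = TOY_LIST_END;

/* Membership test: even the last element has a non-NULL ->next. */
static int toy_on_list(const struct toy_head *h)
{
	assert(h != NULL);
	return h->next != NULL;
}

/* Push onto the list unless the element is already there. */
static void toy_list_add(struct toy_head *h)
{
	if (toy_on_list(h))
		return;
	h->next = toy_check_list;
	toy_check_list = h;
}

/* Empty the list, resetting ->next to NULL; returns the element count. */
static int toy_list_clear(void)
{
	int n = 0;

	while (toy_check_list != TOY_LIST_END) {
		struct toy_head *h = toy_check_list;

		toy_check_list = h->next;
		h->next = NULL;
		n++;
	}
	return n;
}
```

With a NULL terminator instead, the tail element would be indistinguishable from an element that was never added.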

epoll: massage the check list insertion · d9f41e3c

Committed by Al Viro on Oct 01, 2020

In the "non-epoll target" cases, do it in ep_insert() rather than
in do_epoll_ctl(), so that we do it only when some epitem is already
guaranteed to exist.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d9f41e3c

lift rcu_read_lock() into reverse_path_check() · b62d2706

Committed by Al Viro on Oct 01, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b62d2706

convert ->f_ep_links/->fllink to hlist · 44cdc1d9

Committed by Al Viro on Sep 27, 2020

we don't care about the order of elements there

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

44cdc1d9

ep_insert(): move creation of wakeup source past the fl_ep_links insertion · d1ec50ad

Committed by Al Viro on Sep 27, 2020

That's the beginning of preparations for taking f_ep_links out of struct file.
If insertion might fail, we will need a new failure exit. Having wakeup
source creation done after that point will simplify life there; ep_remove()
can (and commonly does) live with NULL epi->ws, so it can be used for
cleanup after ep_create_wakeup_source() failure. It can't be used before
the rbtree insertion, though, so if we are to unify all old failure exits,
we need to move that thing down. Then we would be free to do simple
kmem_cache_free() on the failure to insert into f_ep_links - no wakeup source
to leak on that failure exit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d1ec50ad

fold ep_read_events_proc() into the only caller · 2c0b71c1

Committed by Al Viro on Sep 26, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2c0b71c1

take the common part of ep_eventpoll_poll() and ep_item_poll() into helper · ad9366b1

Committed by Al Viro on Sep 26, 2020

The only reason why ep_item_poll() can't simply call ep_eventpoll_poll()
(or, better yet, call vfs_poll() in all cases) is that we need to tell
lockdep how deep into the hierarchy of ->mtx we are.  So let's add
a variant of ep_eventpoll_poll() that would take depth explicitly
and turn ep_eventpoll_poll() into wrapper for that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ad9366b1

ep_insert(): we only need tep->mtx around the insertion itself · 85353e91

Committed by Al Viro on Sep 26, 2020

We do need ep->mtx (and we are holding it all along), but that's
the lock on the epoll we are inserting into; locking of the
epoll being inserted is not needed for most of that work -
as a matter of fact, we only need it to provide barriers
for the fastpath check (for now).

Move taking and releasing it into ep_insert().  The caller
(do_epoll_ctl()) doesn't need to bother with that at all.
Moreover, that way we kill the kludge in ep_item_poll() - now
it's always called with tep unlocked.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

85353e91

ep_insert(): don't open-code ep_remove() on failure exits · e3e096e7

Committed by Al Viro on Sep 26, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e3e096e7

lift locking/unlocking ep->mtx out of ep_{start,done}_scan() · 57804b1c

Committed by Al Viro on Aug 31, 2020

get rid of depth/ep_locked arguments there and document
the kludge in ep_item_poll() that has led to ep_locked existence in
the first place

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

57804b1c

ep_send_events_proc(): fold into the caller · ff07952a

Committed by Al Viro on Aug 31, 2020

... and get rid of struct ep_send_events_data - not needed anymore.
The weird way of passing the arguments in (and real return value
out - nominal return value of ep_send_events_proc() is ignored)
was due to the signature forced on ep_scan_ready_list() callbacks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ff07952a

lift the calls of ep_send_events_proc() into the callers · 443f1a04

Committed by Al Viro on Aug 31, 2020

... and kill ep_scan_ready_list()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

443f1a04

lift the calls of ep_read_events_proc() into the callers · 1ec09974

Committed by Al Viro on Aug 31, 2020

Expand the calls of ep_scan_ready_list() that get ep_read_events_proc().
As a side benefit we can pass depth to ep_read_events_proc() by value
and not by address - the latter used to be forced by the signature
expected from ep_scan_ready_list() callback.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1ec09974

ep_scan_ready_list(): prepare to splitup · db502f8a

Committed by Al Viro on Aug 31, 2020

take the stuff done before and after the callback into separate helpers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

db502f8a

ep_loop_check_proc(): saner calling conventions · bde03c4c

Committed by Al Viro on Sep 26, 2020

1) 'cookie' argument is unused; kill it.
2) 'priv' one is always an epoll struct file, and we only care
about its associated struct eventpoll; pass that instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bde03c4c

get rid of ep_push_nested() · 6a3890c4

Committed by Al Viro on Sep 26, 2020

The only remaining user is loop checking.  But there we only need
to check that we have not walked into the epoll we are inserting
into - we are adding an edge to acyclic graph, so any loop being
created will have to pass through the source of that edge.

So we don't need that array of cookies - we have only one eventpoll
to watch out for.  RIP ep_push_nested(), along with the cookies
array.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6a3890c4
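
The graph argument in the commit above (a new edge added to an acyclic graph can only close a loop that passes through the edge's source) can be sketched outside the kernel as follows. All names here are hypothetical, and the adjacency matrix is a toy stand-in for the real epoll nesting structure; the point is that a single target node needs to be watched for, rather than an array of cookies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_NODES 8

/* edge[a][b] is true when (toy) epoll instance a watches instance b. */
struct toy_graph {
	bool edge[TOY_MAX_NODES][TOY_MAX_NODES];
};

/*
 * Depth-first walk from 'from'; true if 'target' can be reached.
 * The graph is assumed acyclic on entry, so the recursion terminates.
 */
static bool toy_reaches(const struct toy_graph *g, int from, int target)
{
	if (from == target)
		return true;
	for (int next = 0; next < TOY_MAX_NODES; next++)
		if (g->edge[from][next] && toy_reaches(g, next, target))
			return true;
	return false;
}

/* Add the edge src -> dst unless doing so would close a loop. */
static int toy_add_edge(struct toy_graph *g, int src, int dst)
{
	assert(src >= 0 && src < TOY_MAX_NODES);
	assert(dst >= 0 && dst < TOY_MAX_NODES);

	/* Any new loop must pass through src, so only src is checked for. */
	if (toy_reaches(g, dst, src))
		return -ELOOP;
	g->edge[src][dst] = true;
	return 0;
}
```

Because every successful insertion keeps the graph acyclic, the assumption that `toy_reaches()` relies on holds for the next call as well.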

ep_loop_check_proc(): lift pushing the cookie into callers · 56c428ca

Committed by Al Viro on Sep 26, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

56c428ca

clean reverse_path_check_proc() a bit · d16312a4

Committed by Al Viro on Sep 26, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d16312a4

reverse_path_check_proc(): don't bother with cookies · 0c320f77

Committed by Al Viro on Sep 25, 2020

We know there are no loops by the time we call it; the
only thing we care about is too deep reverse paths.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0c320f77

reverse_path_check_proc(): sane arguments · aebf15f0

Committed by Al Viro on Aug 22, 2020

no need to force its calling conventions to match the callback for
late unlamented ep_call_nested()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aebf15f0

untangling ep_call_nested(): and there was much rejoicing · 773318ed

Committed by Al Viro on Aug 22, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

773318ed

untangling ep_call_nested(): move push/pop of cookie into the callbacks · 99d84d43

Committed by Al Viro on Aug 22, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

99d84d43

untangling ep_call_nested(): take pushing cookie into a helper · 3b1688ef

Committed by Al Viro on Aug 22, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3b1688ef

untangling ep_call_nested(): it's all serialized on epmutex. · d01f0594

Committed by Al Viro on Aug 22, 2020

IOW,
	* no locking is needed to protect the list
	* the list is actually a stack
	* no need to check ->ctx
	* it can bloody well be a static 5-element array - nobody is
	  going to be accessing it in parallel.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d01f0594

untangling ep_call_nested(): get rid of useless arguments · 8677600d

Committed by Al Viro on Aug 22, 2020

ctx is always equal to current, ncalls - to &poll_loop_ncalls.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8677600d

epoll: get rid of epitem->nwait · 364f374f

Committed by Al Viro on Sep 02, 2020

we use it only to indicate allocation failures within queueing
callback back to ep_insert().  Might as well use epq.epi for that
reporting...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

364f374f

epoll: switch epitem->pwqlist to single-linked list · 80285b75

Committed by Al Viro on Sep 02, 2020

We only traverse it once to destroy all associated eppoll_entry at
epitem destruction time. The order of traversal is irrelevant there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

80285b75

25 Sep, 2020 1 commit

ep_create_wakeup_source(): dentry name can change under you... · 3701cb59

Committed by Al Viro on Sep 24, 2020

or get freed, for that matter, if it's a long (separately stored)
name.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3701cb59

11 Sep, 2020 1 commit

epoll: EPOLL_CTL_ADD: close the race in decision to take fast path · fe0a916c

Committed by Al Viro on Sep 10, 2020

Checking for the lack of epitems referring to the epoll we want to insert into
is not enough; we might have an insertion of that epoll into another one that
has already collected the set of files to recheck for excessive reverse paths,
but hasn't gotten to creating/inserting the epitem for it.

However, any such insertion in progress can be detected - it will update the
generation count in our epoll when it's done looking through it for files
to check.  That gets done under ->mtx of our epoll and that allows us to
detect that safely.

We are *not* holding epmutex here, so the generation count is not stable.
However, since both the update of ep->gen by loop check and (later)
insertion into ->f_ep_link are done with ep->mtx held, we are fine -
the sequence is
	grab epmutex
	bump loop_check_gen
	...
	grab tep->mtx		// 1
	tep->gen = loop_check_gen
	...
	drop tep->mtx		// 2
	...
	grab tep->mtx		// 3
	...
	insert into ->f_ep_link
	...
	drop tep->mtx		// 4
	bump loop_check_gen
	drop epmutex
and if the fastpath check in another thread happens for that
eventpoll, it can come
	* before (1) - in that case fastpath is just fine
	* after (4) - we'll see non-empty ->f_ep_link, slow path
	  taken
	* between (2) and (3) - loop_check_gen is stable,
	  with ->mtx providing barriers and we end up taking slow path.

Note that ->f_ep_link emptiness check is slightly racy - we are protected
against insertions into that list, but removals can happen right under us.
Not a problem - in the worst case we'll end up taking a slow path for
no good reason.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fe0a916c

openeuler / Kernel: synced successfully more than 1 year ago