Commits · 85353e919f6eb28ee4a797b06de8cc4c48dec2d7 · openeuler / Kernel

26 Oct, 2020 20 commits

ep_insert(): we only need tep->mtx around the insertion itself · 85353e91

Committed by Al Viro on Sep 26, 2020

We do need ep->mtx (and we are holding it all along), but that's
the lock on the epoll we are inserting into; locking of the
epoll being inserted is not needed for most of that work -
as a matter of fact, we only need it to provide barriers
for the fastpath check (for now).

Move taking and releasing it into ep_insert().  The caller
(do_epoll_ctl()) doesn't need to bother with that at all.
Moreover, that way we kill the kludge in ep_item_poll() - now
it's always called with tep unlocked.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

85353e91

ep_insert(): don't open-code ep_remove() on failure exits · e3e096e7

Committed by Al Viro on Sep 26, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e3e096e7

lift locking/unlocking ep->mtx out of ep_{start,done}_scan() · 57804b1c

Committed by Al Viro on Aug 31, 2020

get rid of depth/ep_locked arguments there and document
the kludge in ep_item_poll() that has led to ep_locked existence in
the first place
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

57804b1c

ep_send_events_proc(): fold into the caller · ff07952a

Committed by Al Viro on Aug 31, 2020

... and get rid of struct ep_send_events_data - not needed anymore.
The weird way of passing the arguments in (and real return value
out - nominal return value of ep_send_events_proc() is ignored)
was due to the signature forced on ep_scan_ready_list() callbacks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ff07952a

lift the calls of ep_send_events_proc() into the callers · 443f1a04

Committed by Al Viro on Aug 31, 2020

... and kill ep_scan_ready_list()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

443f1a04

lift the calls of ep_read_events_proc() into the callers · 1ec09974

Committed by Al Viro on Aug 31, 2020

Expand the calls of ep_scan_ready_list() that get ep_read_events_proc().
As a side benefit we can pass depth to ep_read_events_proc() by value
and not by address - the latter used to be forced by the signature
expected from ep_scan_ready_list() callback.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1ec09974

ep_scan_ready_list(): prepare to splitup · db502f8a

Committed by Al Viro on Aug 31, 2020

take the stuff done before and after the callback into separate helpers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

db502f8a

ep_loop_check_proc(): saner calling conventions · bde03c4c

Committed by Al Viro on Sep 26, 2020

1) 'cookie' argument is unused; kill it.
2) 'priv' one is always an epoll struct file, and we only care
about its associated struct eventpoll; pass that instead.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bde03c4c

get rid of ep_push_nested() · 6a3890c4

Committed by Al Viro on Sep 26, 2020

The only remaining user is loop checking.  But there we only need
to check that we have not walked into the epoll we are inserting
into - we are adding an edge to acyclic graph, so any loop being
created will have to pass through the source of that edge.

So we don't need that array of cookies - we have only one eventpoll
to watch out for.  RIP ep_push_nested(), along with the cookies
array.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6a3890c4
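Note: the argument in this commit message (when adding an edge to an acyclic graph, any new loop must pass through the source of that edge) can be illustrated with a small userspace sketch. This is not the kernel code; `struct graph`, `reaches()`, `add_edge_checked()` and the `MAX_NODES` bound are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8   /* hypothetical bound, for this sketch only */

/* adj[a][b] is true when node a watches node b (edge a -> b). */
struct graph {
	bool adj[MAX_NODES][MAX_NODES];
};

/*
 * Return true if 'target' is reachable from 'node'.
 * The graph is kept acyclic by add_edge_checked(), so the recursion
 * terminates without any array of already-visited cookies.
 */
static bool reaches(const struct graph *g, size_t node, size_t target)
{
	if (node == target)
		return true;
	for (size_t next = 0; next < MAX_NODES; next++)
		if (g->adj[node][next] && reaches(g, next, target))
			return true;
	return false;
}

/*
 * Add edge src -> dst only if the graph stays acyclic.
 * A new loop would have to use the new edge, so it is enough to ask
 * whether src is already reachable from dst: one node to watch out for.
 */
static bool add_edge_checked(struct graph *g, size_t src, size_t dst)
{
	assert(src < MAX_NODES && dst < MAX_NODES);
	if (reaches(g, dst, src))
		return false;
	g->adj[src][dst] = true;
	return true;
}
```

With a zero-initialized `struct graph`, adding 0 -> 1 and 1 -> 2 succeeds, while a later attempt to add 2 -> 0 is refused because node 2 is reachable from node 0.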

ep_loop_check_proc(): lift pushing the cookie into callers · 56c428ca

Committed by Al Viro on Sep 26, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

56c428ca

clean reverse_path_check_proc() a bit · d16312a4

Committed by Al Viro on Sep 26, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d16312a4

reverse_path_check_proc(): don't bother with cookies · 0c320f77

Committed by Al Viro on Sep 25, 2020

We know there's no loops by the time we call it; the
only thing we care about is too deep reverse paths.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0c320f77

reverse_path_check_proc(): sane arguments · aebf15f0

Committed by Al Viro on Aug 22, 2020

no need to force its calling conventions to match the callback for
late unlamented ep_call_nested()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aebf15f0

untangling ep_call_nested(): and there was much rejoicing · 773318ed

Committed by Al Viro on Aug 22, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

773318ed

untangling ep_call_nested(): move push/pop of cookie into the callbacks · 99d84d43

Committed by Al Viro on Aug 22, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

99d84d43

untangling ep_call_nested(): take pushing cookie into a helper · 3b1688ef

Committed by Al Viro on Aug 22, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3b1688ef

untangling ep_call_nested(): it's all serialized on epmutex. · d01f0594

Committed by Al Viro on Aug 22, 2020

IOW,
	* no locking is needed to protect the list
	* the list is actually a stack
	* no need to check ->ctx
	* it can bloody well be a static 5-element array - nobody is
going to be accessing it in parallel.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d01f0594

untangling ep_call_nested(): get rid of useless arguments · 8677600d

Committed by Al Viro on Aug 22, 2020

ctx is always equal to current, ncalls - to &poll_loop_ncalls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8677600d

epoll: get rid of epitem->nwait · 364f374f

Committed by Al Viro on Sep 02, 2020

we use it only to indicate allocation failures within queueing
callback back to ep_insert().  Might as well use epq.epi for that
reporting...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

364f374f

epoll: switch epitem->pwqlist to single-linked list · 80285b75

Committed by Al Viro on Sep 02, 2020

We only traverse it once to destroy all associated eppoll_entry at
epitem destruction time. The order of traversal is irrelevant there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

80285b75

25 Sep, 2020 1 commit

ep_create_wakeup_source(): dentry name can change under you... · 3701cb59

Committed by Al Viro on Sep 24, 2020

or get freed, for that matter, if it's a long (separately stored)
name.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3701cb59

11 Sep, 2020 1 commit

epoll: EPOLL_CTL_ADD: close the race in decision to take fast path · fe0a916c

Committed by Al Viro on Sep 10, 2020

Checking for the lack of epitems referring to the epoll we want to insert into
is not enough; we might have an insertion of that epoll into another one that
has already collected the set of files to recheck for excessive reverse paths,
but hasn't gotten to creating/inserting the epitem for it.

However, any such insertion in progress can be detected - it will update the
generation count in our epoll when it's done looking through it for files
to check.  That gets done under ->mtx of our epoll and that allows us to
detect that safely.

We are *not* holding epmutex here, so the generation count is not stable.
However, since both the update of ep->gen by loop check and (later)
insertion into ->f_ep_link are done with ep->mtx held, we are fine -
the sequence is
	grab epmutex
	bump loop_check_gen
	...
	grab tep->mtx		// 1
	tep->gen = loop_check_gen
	...
	drop tep->mtx		// 2
	...
	grab tep->mtx		// 3
	...
	insert into ->f_ep_link
	...
	drop tep->mtx		// 4
	bump loop_check_gen
	drop epmutex
and if the fastpath check in another thread happens for that
eventpoll, it can come
	* before (1) - in that case fastpath is just fine
	* after (4) - we'll see non-empty ->f_ep_link, slow path
taken
	* between (2) and (3) - loop_check_gen is stable,
with ->mtx providing barriers and we end up taking slow path.

Note that ->f_ep_link emptiness check is slightly racy - we are protected
against insertions into that list, but removals can happen right under us.
Not a problem - in the worst case we'll end up taking a slow path for
no good reason.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fe0a916c

10 Sep, 2020 2 commits

epoll: replace ->visited/visited_list with generation count · 18306c40

Committed by Al Viro on Sep 10, 2020

removes the need to clear it, along with the races.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

18306c40
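Note: the general idea of replacing a per-node "visited" flag with a generation count can be sketched in userspace C as follows. This is an illustration of the technique only, not the kernel implementation; `struct node`, `walk_gen`, `begin_walk()` and `visit_once()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct node {
	uint64_t gen;   /* generation in which this node was last visited */
};

/* Current traversal generation; plays a role similar to a global counter. */
static uint64_t walk_gen;

/* Start a new traversal: every node becomes "unvisited" in O(1). */
static void begin_walk(void)
{
	walk_gen++;
}

/* Return true only the first time a node is seen in the current traversal. */
static bool visit_once(struct node *n)
{
	assert(walk_gen != 0);   /* begin_walk() must have been called */
	if (n->gen == walk_gen)
		return false;
	n->gen = walk_gen;
	return true;
}
```

Because a stale mark simply stops matching once the counter is bumped, nothing has to be cleared after a traversal, which is the property the commit message refers to.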
epoll: do not insert into poll queues until all sanity checks are done · f8d4f44d

Committed by Al Viro on Sep 09, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f8d4f44d

02 Sep, 2020 1 commit

fix regression in "epoll: Keep a reference on files added to the check list" · 77f4689d

Committed by Al Viro on Sep 02, 2020

epoll_loop_check_proc() can run into a file already committed to destruction;
we can't grab a reference on those and don't need to add them to the set for
reverse path check anyway.
Tested-by: Marc Zyngier <maz@kernel.org>
Fixes: a9ed4a65 ("epoll: Keep a reference on files added to the check list")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

77f4689d

23 Aug, 2020 2 commits

do_epoll_ctl(): clean the failure exits up a bit · 52c47969

Committed by Al Viro on Aug 22, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

52c47969

epoll: Keep a reference on files added to the check list · a9ed4a65

Committed by Marc Zyngier on Aug 19, 2020

When adding a new fd to an epoll, and this new fd is itself an
epoll fd, we recursively scan the fds attached to it
to detect cycles, and add non-epoll files to a "check list"
that gets subsequently parsed.

However, this check list isn't completely safe when deletions
can happen concurrently. To sidestep the issue, make sure that
a struct file placed on the check list sees its f_count increased,
ensuring that a concurrent deletion won't result in the file
disappearing from under our feet.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a9ed4a65

15 May, 2020 1 commit

epoll: call final ep_events_available() check under the lock · 65759097

Committed by Roman Penyaev on May 13, 2020

There is a possible race when ep_scan_ready_list() leaves ->rdllist and
->ovflist empty for a short period of time although some events are
pending. It is quite likely that ep_events_available() observes empty
lists and goes to sleep.

Since commit 339ddb53 ("fs/epoll: remove unnecessary wakeups of
nested epoll") we are conservative in wakeups (there is only one place
for wakeup and this is ep_poll_callback()), thus ep_events_available()
must always observe correct state of two lists.

The easiest and correct way is to do the final check under the lock.
This does not impact the performance, since lock is taken anyway for
adding a wait entry to the wait queue.

The discussion of the problem can be found here:

https://lore.kernel.org/linux-fsdevel/a2f22c3c-c25a-4bda-8339-a7bdaf17849e@akamai.com/

In this patch barrierless __set_current_state() is used. This is safe
since waitqueue_active() is called under the same lock on wakeup side.

Short-circuit for fatal signals (i.e. fatal_signal_pending() check) is
moved to the line just before actual events harvesting routine. This is
fully compliant to what is said in the comment of the patch where the
actual fatal_signal_pending() check was added: c257a340 ("fs, epoll:
short circuit fetching events if thread has been killed").

Fixes: 339ddb53 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Reported-by: Jason Baron <jbaron@akamai.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200505145609.1865152-1-rpenyaev@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

65759097

08 May, 2020 2 commits

epoll: atomically remove wait entry on wake up · 412895f0

Committed by Roman Penyaev on May 07, 2020

This patch does two things:

 - fixes a lost wakeup introduced by commit 339ddb53 ("fs/epoll:
   remove unnecessary wakeups of nested epoll")

 - improves performance for events delivery.

The description of the problem is the following: if N (>1) threads are
waiting on ep->wq for new events and M (>1) events come, it is quite
likely that >1 wakeups hit the same wait queue entry, because there is
quite a big window between __add_wait_queue_exclusive() and the
following __remove_wait_queue() calls in ep_poll() function.

This can lead to lost wakeups, because the thread that was woken up may
not handle all the events in ->rdllist.  (The problem is described in more
detail here: https://lkml.org/lkml/2019/10/7/905)

The idea of the current patch is to use init_wait() instead of
init_waitqueue_entry().

Internally init_wait() sets autoremove_wake_function as a callback,
which removes the wait entry atomically (under the wq locks) from the
list, thus the next coming wakeup hits the next wait entry in the wait
queue, thus preventing lost wakeups.

Problem is very well reproduced by the epoll60 test case [1].

Wait entry removal on wakeup has also performance benefits, because
there is no need to take a ep->lock and remove wait entry from the queue
after the successful wakeup.  Here is the timing output of the epoll60
test case:

  With explicit wakeup from ep_scan_ready_list() (the state of the
  code prior to 339ddb53):

    real    0m6.970s
    user    0m49.786s
    sys     0m0.113s

 After this patch:

   real    0m5.220s
   user    0m36.879s
   sys     0m0.019s

The other testcase is the stress-epoll [2], where one thread consumes
all the events and other threads produce many events:

  With explicit wakeup from ep_scan_ready_list() (the state of the
  code prior to 339ddb53):

    threads  events/ms  run-time ms
          8       5427         1474
         16       6163         2596
         32       6824         4689
         64       7060         9064
        128       6991        18309

 After this patch:

    threads  events/ms  run-time ms
          8       5598         1429
         16       7073         2262
         32       7502         4265
         64       7640         8376
        128       7634        16767

 (number of "events/ms" represents event bandwidth, thus higher is
  better; number of "run-time ms" represents overall time spent
  doing the benchmark, thus lower is better)

[1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
[2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Heiher <r@hev.cc>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200430130326.1368509-2-rpenyaev@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

412895f0

eventpoll: fix missing wakeup for ovflist in ep_poll_callback · 0c54a6a4

Committed by Khazhismel Kumykov on May 07, 2020

In the event that we add to ovflist, before commit 339ddb53
("fs/epoll: remove unnecessary wakeups of nested epoll") we would be
woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback.

With that wakeup removed, if we add to ovflist here, we may never wake
up.  Rather than adding back the ep_scan_ready_list wakeup - which was
resulting in unnecessary wakeups, trigger a wake-up in ep_poll_callback.

We noticed that one of our workloads was missing wakeups starting with
339ddb53 and upon manual inspection, this wakeup seemed missing to me.
With this patch added, we no longer see missing wakeups.  I haven't yet
tried to make a small reproducer, but the existing kselftests in
filesystem/epoll passed for me with this patch.

[khazhy@google.com: use if/elif instead of goto + cleanup suggested by Roman]
  Link: http://lkml.kernel.org/r/20200424190039.192373-1-khazhy@google.com
Fixes: 339ddb53 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Heiher <r@hev.cc>
Cc: Jason Baron <jbaron@akamai.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200424025057.118641-1-khazhy@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0c54a6a4

08 Apr, 2020 1 commit

fs/epoll: make nesting accounting safe for -rt kernel · efcdd350

Committed by Jason Baron on Apr 06, 2020

Davidlohr Bueso pointed out that when CONFIG_DEBUG_LOCK_ALLOC is set
ep_poll_safewake() can take several non-raw spinlocks after disabling
interrupts.  Since a spinlock can block in the -rt kernel, we can't take a
spinlock after disabling interrupts.  So let's re-work how we determine
the nesting level such that it plays nicely with the -rt kernel.

Let's introduce a 'nests' field in struct eventpoll that records the
current nesting level during ep_poll_callback().  Then, if we nest again
we can find the previous struct eventpoll that we were called from and
increase our count by 1.  The 'nests' field is protected by
ep->poll_wait.lock.

I've also moved the visited field to reduce the size of struct eventpoll
from 184 bytes to 176 bytes on x86_64 for !CONFIG_DEBUG_LOCK_ALLOC, which
is typical for a production config.
Reported-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/1582739816-13167-1-git-send-email-jbaron@akamai.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

efcdd350

22 Mar, 2020 1 commit

epoll: fix possible lost wakeup on epoll_ctl() path · 1b53734b

Committed by Roman Penyaev on Mar 21, 2020

This fixes possible lost wakeup introduced by commit a218cc49.
Originally modifications to ep->wq were serialized by ep->wq.lock, but
in commit a218cc49 ("epoll: use rwlock in order to reduce
ep_poll_callback() contention") a new rw lock was introduced in order to
relax fd event path, i.e. callers of ep_poll_callback() function.

After the change ep_modify and ep_insert (both are called on epoll_ctl()
path) were switched to ep->lock, but ep_poll (epoll_wait) was using
ep->wq.lock on wqueue list modification.

The bug doesn't lead to any wqueue list corruptions, because wake up
path and list modifications were serialized by ep->wq.lock internally,
but the actual waitqueue_active() check prior to the wake_up() call can be
reordered with modifications of ep ready list, thus wake up can be lost.

And yes, can be healed by explicit smp_mb():

  list_add_tail(&epi->rdlink, &ep->rdllist);
  smp_mb();
  if (waitqueue_active(&ep->wq))
	wake_up(&ep->wq);

But let's make it simple, thus current patch replaces ep->wq.lock with
the ep->lock for wqueue modifications, thus wake up path always observes
activeness of the wqueue correctly.

Fixes: a218cc49 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Reported-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jes Sorensen <jes.sorensen@gmail.com>
Cc: <stable@vger.kernel.org>	[5.1+]
Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Bisected-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1b53734b
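Note: the ordering problem described in this commit message (publishing an event and then checking for waiters, versus registering as a waiter and then checking for events) can be modelled in userspace with C11 atomics. The sketch below shows the explicit-barrier variant that the message mentions as a possible cure, not the lock-based fix the patch actually applies. `ready_events`, `waiters`, `publish_event()` and `must_sleep()` are hypothetical names, and the seq_cst fences only stand in for `smp_mb()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int ready_events;   /* stand-in for "the ready list is non-empty" */
static atomic_int waiters;        /* stand-in for "the wait queue is active" */

/*
 * Producer side: publish an event, then decide whether a wakeup is needed.
 * Without the fence, the load of 'waiters' could be reordered before the
 * store to 'ready_events', and the wakeup could be skipped by mistake.
 */
static bool publish_event(void)
{
	atomic_store_explicit(&ready_events, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);   /* plays the role of smp_mb() */
	return atomic_load_explicit(&waiters, memory_order_relaxed) != 0;
}

/*
 * Waiter side: announce ourselves first, then check for events.
 * With a full fence on both sides, at least one of the two threads is
 * guaranteed to observe the other's store, so a wakeup cannot be lost.
 */
static bool must_sleep(void)
{
	atomic_store_explicit(&waiters, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&ready_events, memory_order_relaxed) == 0;
}
```

Doing both the list update and the wait-queue update under one lock, as the patch does, gives the same guarantee without having to reason about individual barriers.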

30 Jan, 2020 2 commits

eventpoll: support non-blocking do_epoll_ctl() calls · 39220e8d

Committed by Jens Axboe on Jan 08, 2020

Also make it available outside of epoll, along with the helper that
decides if we need to copy the passed in epoll_event.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

39220e8d

eventpoll: abstract out epoll_ctl() handler · 58e41a44

Committed by Jens Axboe on Jan 08, 2020

No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

58e41a44

05 Dec, 2019 2 commits

fs/epoll: remove unnecessary wakeups of nested epoll · 339ddb53

Committed by Heiher on Dec 04, 2019

Take the case where we have:

        t0
         | (ew)
        e0
         | (et)
        e1
         | (lt)
        s0

t0: thread 0
e0: epoll fd 0
e1: epoll fd 1
s0: socket fd 0
ew: epoll_wait
et: edge-trigger
lt: level-trigger

We remove unnecessary wakeups to prevent the nested epoll that is working in
edge-triggered mode from waking up continuously.

Test code:
 #include <unistd.h>
 #include <sys/epoll.h>
 #include <sys/socket.h>

 int main(int argc, char *argv[])
 {
 	int sfd[2];
 	int efd[2];
 	struct epoll_event e;

 	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sfd) < 0)
 		goto out;

 	efd[0] = epoll_create(1);
 	if (efd[0] < 0)
 		goto out;

 	efd[1] = epoll_create(1);
 	if (efd[1] < 0)
 		goto out;

 	e.events = EPOLLIN;
 	if (epoll_ctl(efd[1], EPOLL_CTL_ADD, sfd[0], &e) < 0)
 		goto out;

 	e.events = EPOLLIN | EPOLLET;
 	if (epoll_ctl(efd[0], EPOLL_CTL_ADD, efd[1], &e) < 0)
 		goto out;

 	if (write(sfd[1], "w", 1) != 1)
 		goto out;

 	if (epoll_wait(efd[0], &e, 1, 0) != 1)
 		goto out;

 	if (epoll_wait(efd[0], &e, 1, 0) != 0)
 		goto out;

 	close(efd[0]);
 	close(efd[1]);
 	close(sfd[0]);
 	close(sfd[1]);

 	return 0;

 out:
 	return -1;
 }

More tests:
 https://github.com/heiher/epoll-wakeup

Link: http://lkml.kernel.org/r/20191009060516.3577-1-r@hev.cc
Signed-off-by: hev <r@hev.cc>
Reviewed-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

339ddb53

epoll: simplify ep_poll_safewake() for CONFIG_DEBUG_LOCK_ALLOC · f6520c52

Committed by Jason Baron on Dec 04, 2019

Currently, ep_poll_safewake() in the CONFIG_DEBUG_LOCK_ALLOC case uses
ep_call_nested() in order to pass the correct subclass argument to
spin_lock_irqsave_nested().  However, ep_call_nested() adds unnecessary
checks for epoll depth and loops that are already verified when doing
EPOLL_CTL_ADD.  This mirrors a conversion that was done for
!CONFIG_DEBUG_LOCK_ALLOC in: commit 37b5e521 ("epoll: remove
ep_call_nested() from ep_eventpoll_poll()")

Link: http://lkml.kernel.org/r/1567628549-11501-1-git-send-email-jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Reviewed-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f6520c52

21 Aug, 2019 1 commit

PM / wakeup: Show wakeup sources stats in sysfs · c8377adf

Committed by Tri Vo on Aug 06, 2019

Add an ID and a device pointer to 'struct wakeup_source'. Use them to
expose wakeup sources statistics in sysfs under
/sys/class/wakeup/wakeup<ID>/*.
Co-developed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Co-developed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Tri Vo <trong@android.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

c8377adf

19 Jul, 2019 1 commit

proc/sysctl: add shared variables for range check · eec4844f

Committed by Matteo Croce on Jul 18, 2019

In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range.  This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.

On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer whose address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.

The special values 0, 1 and INT_MAX are very often used as range
boundary, leading to duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.

This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data                                         old     new   delta
    sysctl_vals                                    -      12     +12
    __kstrtab_sysctl_vals                          -      12     +12
    max                                           14      10      -4
    int_max                                       16       -     -16
    one                                           68       -     -68
    zero                                         128      28    -100
    Total: Before=20583249, After=20583085, chg -0.00%

[mcroce@redhat.com: tipc: remove two unused variables]
  Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
[akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
[arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
  Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
[akpm@linux-foundation.org: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

eec4844f
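Note: the shared-table approach described in this commit message can be sketched in userspace C roughly as follows. The array and macro names are modelled on the ones the message describes, but the macros here cast to `const void *` for this sketch, and `struct range_entry` and `store_in_range()` are hypothetical, cut-down stand-ins for `struct ctl_table` and `proc_dointvec_minmax()`, not the kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* One shared, read-only table instead of per-file zero/one/int_max copies. */
static const int sysctl_vals[] = { 0, 1, INT_MAX };

/* Convenience macros that point at the right array member. */
#define SYSCTL_ZERO    ((const void *)&sysctl_vals[0])
#define SYSCTL_ONE     ((const void *)&sysctl_vals[1])
#define SYSCTL_INT_MAX ((const void *)&sysctl_vals[2])

/* Hypothetical, simplified table entry: a value plus optional bounds. */
struct range_entry {
	int *data;            /* where an accepted value is stored */
	const void *extra1;   /* pointer to the minimum, or NULL for none */
	const void *extra2;   /* pointer to the maximum, or NULL for none */
};

/* Store 'value' only if it lies within [*extra1, *extra2]. */
static bool store_in_range(const struct range_entry *e, int value)
{
	assert(e != NULL && e->data != NULL);
	if (e->extra1 && value < *(const int *)e->extra1)
		return false;
	if (e->extra2 && value > *(const int *)e->extra2)
		return false;
	*e->data = value;
	return true;
}
```

An entry declared with `SYSCTL_ZERO` and `SYSCTL_ONE` as its bounds accepts 0 and 1 and rejects everything else, without the declaring file needing its own `zero` and `one` variables.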

17 Jul, 2019 1 commit

signal: simplify set_user_sigmask/restore_user_sigmask · b772434b

Committed by Oleg Nesterov on Jul 16, 2019

task->saved_sigmask and ->restore_sigmask are only used in the ret-from-
syscall paths.  This means that set_user_sigmask() can save ->blocked in
->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked
was modified.

This way the callers do not need 2 sigset_t's passed to set/restore and
restore_user_sigmask() renamed to restore_saved_sigmask_unless() turns
into the trivial helper which just calls restore_saved_sigmask().

Link: http://lkml.kernel.org/r/20190606113206.GA9464@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Eric Wong <e@80x24.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Laight <David.Laight@aculab.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b772434b

29 Jun, 2019 1 commit

signal: remove the wrong signal_pending() check in restore_user_sigmask() · 97abc889

Committed by Oleg Nesterov on Jun 28, 2019

This is the minimal fix for stable, I'll send cleanups later.

Commit 854a6ed5 ("signal: Add restore_user_sigmask()") introduced
the visible change which breaks user-space: a signal temporarily unblocked
by set_user_sigmask() can be delivered even if the caller returns
success or timeout.

Change restore_user_sigmask() to accept the additional "interrupted"
argument which should be used instead of signal_pending() check, and
update the callers.

Eric said:

: For clarity.  I don't think this is required by posix, or fundamentally to
: remove the races in select.  It is what linux has always done and we have
: applications who care so I agree this fix is needed.
:
: Further in any case where the semantic change that this patch rolls back
: (aka where allowing a signal to be delivered and the select like call to
: complete) would be advantage we can do as well if not better by using
: signalfd.
:
: Michael is there any chance we can get this guarantee of the linux
: implementation of pselect and friends clearly documented.  The guarantee
: that if the system call completes successfully we are guaranteed that no
: signal that is unblocked by using sigmask will be delivered?

Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
Fixes: 854a6ed5 ("signal: Add restore_user_sigmask()")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Eric Wong <e@80x24.org>
Tested-by: Eric Wong <e@80x24.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: <stable@vger.kernel.org>	[5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

97abc889
