Commits · 5ba97d2832f87943c43bb69cb1ef86dbc59df5bc · openeuler / raspberrypi-kernel

01 Jul 2015 · 2 commits

fs/file.c: __fget() and dup2() atomicity rules · 5ba97d28

Committed by Eric Dumazet on Jun 29, 2015

__fget() does a lockless fetch of the pointer from the descriptor
table, attempts to grab a reference and treats "it was already
zero" as "it's already gone from the table, we just hadn't
seen the store, let's fail".  Unfortunately, that breaks the
atomicity of dup2(): __fget() might see the old pointer,
notice that it has already been dropped and treat that as
"it's closed".  What we should be getting is either the
old file or the new one, depending on whether we come before or after
dup2().

Dmitry had the following test failing sometimes:

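#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int fd;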
void *Thread(void *x) {
  char buf;
  int n = read(fd, &buf, 1);
  if (n != 1)
    exit(printf("read failed: n=%d errno=%d\n", n, errno));
  return 0;
}

int main()
{
  fd = open("/dev/urandom", O_RDONLY);
  int fd2 = open("/dev/urandom", O_RDONLY);
  if (fd == -1 || fd2 == -1)
    exit(printf("open failed\n"));
  pthread_t th;
  pthread_create(&th, 0, Thread, 0);
  if (dup2(fd2, fd) == -1)
    exit(printf("dup2 failed\n"));
  pthread_join(th, 0);
  if (close(fd) == -1)
    exit(printf("close failed\n"));
  if (close(fd2) == -1)
    exit(printf("close failed\n"));
  printf("DONE\n");
  return 0;
}

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5ba97d28
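A minimal userspace sketch of the retry idea, assuming the fix's approach is for __fget() to reload the slot and try again when the reference grab fails, instead of reporting the descriptor as closed. It uses C11 atomics; `struct model_file`, `slot`, `inc_not_zero()` and `model_fget()` are hypothetical names, and RCU (which in the kernel keeps a dying struct file from being freed while it is inspected) is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: only the refcount matters here. */
struct model_file {
	atomic_long count;
};

/* Hypothetical one-slot descriptor table. */
static _Atomic(struct model_file *) slot;

/* Take a reference only if the count has not already dropped to zero,
 * in the spirit of atomic_long_inc_not_zero(). */
static bool inc_not_zero(atomic_long *v)
{
	long old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Lookup that retries instead of failing.  Assumes, as the commit message
 * does, that a zero count means the slot has already been updated, so a
 * reload sees the replacement file (or NULL). */
static struct model_file *model_fget(void)
{
	for (;;) {
		struct model_file *f = atomic_load(&slot);

		if (f == NULL || inc_not_zero(&f->count))
			return f;
		/* Lost a race with close() or dup2(): look at the slot again. */
	}
}
```

With a single thread the loop runs once: a live file in the slot comes back with its count bumped, and an empty slot yields NULL. The retry branch only matters when another thread replaces or empties the slot concurrently, as dup2() does in Dmitry's test.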

fs/file.c: don't acquire files->file_lock in fd_install() · 8a81252b

Committed by Eric Dumazet on Jun 30, 2015

Mateusz Guzik reported:

 Currently obtaining a new file descriptor results in locking fdtable
 twice - once in order to reserve a slot and second time to fill it.

Holding the spinlock in __fd_install() is needed in case a resize is
done, or to prevent a resize.

Mateusz provided an RFC patch and a microbenchmark:
  http://people.redhat.com/~mguzik/pipebench.c

A resize is an unlikely operation in a process lifetime,
as table size is at least doubled at every resize.

We can use RCU instead of the spinlock.

__fd_install() must wait if a resize is in progress.

The resize must block new __fd_install() callers from starting,
and wait for ongoing installs to finish (synchronize_sched()).

A resize should be attempted by a single thread, so as not to waste resources.

The rcu_sched variant is used, as __fd_install() and expand_fdtable() run
from process context.

It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R)
CPU E5-2696 v2 @ 2.50GHz

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Mateusz Guzik <mguzik@redhat.com>
Tested-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8a81252b
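The scheme described above can be sketched in userspace, assuming an atomic count of in-flight installers stands in for the RCU-sched read side and a spin with sched_yield() stands in for the kernel's wait queue. All names (`struct model_files`, `model_fd_install()`, `model_expand()` and the rest) are hypothetical, and the model ignores lockless fget()-style readers, which in the kernel are the reason the old table may only be freed after a grace period.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor table: an array of opaque file pointers. */
struct model_fdt {
	size_t max_fds;
	void **fd;
};

struct model_files {
	_Atomic(struct model_fdt *) fdt;
	atomic_bool resize_in_progress;
	atomic_int installers; /* crude stand-in for the RCU-sched read side */
};

static struct model_fdt *model_fdt_alloc(size_t max_fds)
{
	struct model_fdt *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->fd = calloc(max_fds, sizeof(void *));
	if (!t->fd) {
		free(t);
		return NULL;
	}
	t->max_fds = max_fds;
	return t;
}

/* Lock-free install into a slot the caller has already reserved. */
static void model_fd_install(struct model_files *files, size_t fd, void *file)
{
	for (;;) {
		atomic_fetch_add(&files->installers, 1);
		if (!atomic_load(&files->resize_in_progress))
			break;
		/* A resize is running: step aside and wait for it to finish. */
		atomic_fetch_sub(&files->installers, 1);
		while (atomic_load(&files->resize_in_progress))
			sched_yield();
	}
	struct model_fdt *fdt = atomic_load(&files->fdt);

	assert(fd < fdt->max_fds && fdt->fd[fd] == NULL);
	fdt->fd[fd] = file;
	atomic_fetch_sub(&files->installers, 1);
}

/* Grow the table.  One resizer at a time: block new installers, wait for
 * in-flight ones to drain (synchronize_sched()'s job in the kernel), copy. */
static bool model_expand(struct model_files *files, size_t new_max)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&files->resize_in_progress,
					    &expected, true))
		return false; /* another thread is already resizing */
	while (atomic_load(&files->installers) != 0)
		sched_yield();

	struct model_fdt *old = atomic_load(&files->fdt);
	struct model_fdt *nfdt = NULL;

	if (new_max > old->max_fds)
		nfdt = model_fdt_alloc(new_max);
	if (nfdt) {
		memcpy(nfdt->fd, old->fd, old->max_fds * sizeof(void *));
		atomic_store(&files->fdt, nfdt);
		/* Immediate free is safe only because this model has no
		 * lockless readers besides the installers drained above. */
		free(old->fd);
		free(old);
	}
	atomic_store(&files->resize_in_progress, false);
	return nfdt != NULL;
}
```

The sequentially consistent atomics carry the design: an installer bumps its count before checking the flag, and the resizer sets the flag before checking the count, so at least one of them always sees the other.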

17 Apr 2015 · 1 commit

mm: rcu-protected get_mm_exe_file() · 90f31d0e

Committed by Konstantin Khlebnikov on Apr 16, 2015

This patch removes mm->mmap_sem from mm->exe_file read side.
Also it kills dup_mm_exe_file() and moves exe_file duplication into
dup_mmap() where both mmap_sems are locked.

[akpm@linux-foundation.org: fix comment typo]

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

90f31d0e

11 Dec 2014 · 1 commit

fs/file.c: replace get_unused_fd() with get_unused_fd_flags(0) · 8d10a035

Committed by Yann Droneaud on Dec 10, 2014

This patch replaces calls to get_unused_fd() with equivalent calls to
get_unused_fd_flags(0) to preserve current behavior for existing code.

In a further patch, get_unused_fd() will be removed so that new code
starts using get_unused_fd_flags(), with the hope that O_CLOEXEC could be
used, either by default or chosen by userspace.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8d10a035
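From userspace, the choice this commit hopes for shows up as O_CLOEXEC on open() and as FD_CLOEXEC in the descriptor flags. A small POSIX sketch (the helper name is hypothetical) that opens a file both ways and reads the flag back with fcntl(F_GETFD):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Open 'path' read-only, with or without O_CLOEXEC, and report whether the
 * new descriptor has FD_CLOEXEC set: 1 if yes, 0 if no, -1 on error. */
static int opened_with_cloexec(const char *path, bool cloexec)
{
	int fd = open(path, O_RDONLY | (cloexec ? O_CLOEXEC : 0));

	if (fd < 0)
		return -1;
	int flags = fcntl(fd, F_GETFD);

	close(fd);
	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) != 0;
}
```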

09 Oct 2014 · 1 commit

missing annotation in fs/file.c · e983094d

Committed by Al Viro on Aug 31, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e983094d

08 Sep 2014 · 1 commit

rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops · bde6c3aa

Committed by Paul E. McKenney on Jul 01, 2014

RCU-tasks requires the occasional voluntary context switch
from CPU-bound in-kernel tasks.  In some cases, this requires
instrumenting cond_resched().  However, there is some reluctance
to countenance unconditionally instrumenting cond_resched() (see
http://lwn.net/Articles/603252/), so this commit creates a separate
cond_resched_rcu_qs() that may be used in place of cond_resched() in
locations prone to long-duration in-kernel looping.

This commit currently instruments only RCU-tasks.  Future possibilities
include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce
IPI usage.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

bde6c3aa

07 May 2014 · 1 commit

fs/file.c: don't open-code kvfree() · f6c0a192

Committed by Al Viro on Apr 23, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f6c0a192

02 Apr 2014 · 1 commit

get rid of files_defer_init() · 7f4b36f9

Committed by Al Viro on Mar 14, 2014

The only thing it does these days is calculate the
upper limit for the fs.nr_open sysctl, and that can be done
statically.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7f4b36f9
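The commit does not spell out the formula. A sketch of how such a bound can be computed as a compile-time constant, assuming it must fit in an int, must keep an array of that many pointers within size_t, and should be rounded down to a whole number of long-sized bitmap words; the macro and variable names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one bitmap word. */
#define MODEL_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Most descriptors whose pointer array still fits in size_t. */
#define MODEL_PTR_LIMIT (~(size_t)0 / sizeof(void *))

/* Hypothetical static cap for fs.nr_open: min(INT_MAX, MODEL_PTR_LIMIT),
 * rounded down to a whole number of bitmap words. */
static const size_t model_nr_open_max =
	(MODEL_PTR_LIMIT < (size_t)INT_MAX ? MODEL_PTR_LIMIT : (size_t)INT_MAX) &
	~(MODEL_BITS_PER_LONG - 1);
```

On an LP64 build this evaluates to 2147483584, which is INT_MAX rounded down to a multiple of 64.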

23 Mar 2014 · 1 commit

vfs: Don't let __fdget_pos() get FMODE_PATH files · 99aea681

Committed by Eric Biggers on Mar 16, 2014

Commit bd2a31d5 ("get rid of fget_light()") introduced the
__fdget_pos() function, which returns the resulting file pointer and
fdput flags combined in an 'unsigned long'.  However, it also changed the
behavior to return files with FMODE_PATH set, which shouldn't happen
because read(), write(), lseek(), etc. aren't allowed on such files.
This commit restores the old behavior.

This regression actually had no effect on read() and write() since
FMODE_READ and FMODE_WRITE are not set on file descriptors opened with
O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH
to fail with ESPIPE rather than EBADF.

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

99aea681
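The behavior restored here can be checked from userspace: on a kernel with this fix, lseek() on an O_PATH descriptor should fail with EBADF rather than ESPIPE. A Linux-only sketch; the helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open 'path' with O_PATH and return the errno lseek() reports on the new
 * descriptor: EBADF is expected, 0 means lseek() succeeded, and -1 means
 * the open itself failed. */
static int lseek_errno_on_o_path(const char *path)
{
	int fd = open(path, O_PATH);

	if (fd < 0)
		return -1;
	errno = 0;
	off_t r = lseek(fd, 0, SEEK_SET);
	int err = (r == (off_t)-1) ? errno : 0;

	close(fd);
	return err;
}
```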

10 Mar 2014 · 1 commit

get rid of fget_light() · bd2a31d5

Committed by Al Viro on Mar 04, 2014

Instead of returning the flags by reference, we can just have the
low-level primitive return them in the lower bits of an unsigned long,
with the struct file * derived from the rest.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bd2a31d5
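The packing trick relies on struct file pointers being aligned, so their low bits are always zero and can carry flags. A userspace sketch, assuming a pointer fits in an unsigned long (true on Linux targets) and that two flag bits are enough; the flag names are hypothetical and only loosely modeled on the kernel's fdput flags.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical flags carried in the two low bits of the packed value. */
#define MODEL_FDPUT_FPUT       1u
#define MODEL_FDPUT_POS_UNLOCK 2u
#define MODEL_FLAG_MASK        3u

/* Hypothetical file object, at least 4-byte aligned so that the two low
 * bits of its address are always zero. */
struct model_file {
	alignas(4) int dummy;
};

struct model_fd {
	struct model_file *file;
	unsigned int flags;
};

/* Combine a pointer and flags into one unsigned long. */
static unsigned long model_pack(struct model_file *f, unsigned int flags)
{
	uintptr_t p = (uintptr_t)f;

	assert((p & MODEL_FLAG_MASK) == 0 && (flags & ~MODEL_FLAG_MASK) == 0);
	return (unsigned long)(p | flags);
}

/* Split it again: flags from the low bits, pointer from the rest. */
static struct model_fd model_unpack(unsigned long v)
{
	struct model_fd fd = {
		.file = (struct model_file *)(uintptr_t)(v & ~(unsigned long)MODEL_FLAG_MASK),
		.flags = (unsigned int)(v & MODEL_FLAG_MASK),
	};

	return fd;
}
```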

18 Feb 2014 · 1 commit

fs: Substitute rcu_access_pointer() for rcu_dereference_raw() · add1f099

Committed by Paul E. McKenney on Feb 12, 2014

(Trivial patch.)

If the code is looking at the RCU-protected pointer itself, but not
dereferencing it, the rcu_dereference() functions can be downgraded to
rcu_access_pointer().  This commit makes this downgrade in __alloc_fd(),
which simply compares the RCU-protected pointer against NULL with no
dereferencing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

add1f099

11 Feb 2014 · 1 commit

fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem · 96c7a2ff

Committed by Eric W. Biederman on Feb 10, 2014

Recently, due to a spike in connections per second, memcached on 3
separate boxes triggered the OOM killer from accept.  At the time the
OOM killer was triggered there was 4GB out of 36GB free in zone 1.  The
problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
hold a bitmap, and there was sufficient fragmentation that the largest
page available was 8KiB.

I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
but I do agree that order 3 allocations are very likely to succeed.

There are always pathologies where order > 0 allocations can fail when
there are copious amounts of free memory available.  Using the pigeon
hole principle it is easy to show that it requires 1 page more than 50%
of the pages being free to guarantee an order 1 (8KiB) allocation will
succeed, 1 page more than 75% of the pages being free to guarantee an
order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
the pages being free to guarantee an order 3 allocation will succeed.

A server like memcached, churning memory with a lot of small requests
and replies, is a common case that, if anything, will skew the odds
against large pages being available.

Therefore let's not give external applications a practical way to kill
Linux server applications, and specify __GFP_NORETRY to the kmalloc in
alloc_fdmem.  Unless I am misreading the code, by the time the code
reaches should_alloc_retry in __alloc_pages_slowpath (where
__GFP_NORETRY becomes significant), we have already tried everything
reasonable to allocate a page and the only thing left to do is wait.  So
not waiting and falling back to vmalloc immediately seems like the
reasonable thing to do even if there wasn't a chance of triggering the
OOM killer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

96c7a2ff
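The pigeonhole argument above can be written out. Assume N free-or-used pages split into N/2^k naturally aligned blocks of 2^k pages, and that an order-k allocation needs one of those blocks entirely free (the buddy allocator's rule). The worst case pins one used page in every block, with F free pages and U = N - F used ones:

```latex
F_{\mathrm{worst}}(k) = N - \frac{N}{2^{k}} = N\left(1 - 2^{-k}\right),
\qquad
F \ge N\left(1 - 2^{-k}\right) + 1 \;\Longrightarrow\; U \le \frac{N}{2^{k}} - 1 .
```

With fewer used pages than blocks, at least one block holds no used page, so an order-k allocation can succeed. The threshold fractions 1 - 2^{-k} give 50%, 75% and 87.5% for k = 1, 2, 3, matching the figures quoted.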

25 Jan 2014 · 5 commits

fs: __fget_light() can use __fget() in slow path · e6ff9a9f

Committed by Oleg Nesterov on Jan 13, 2014

The slow path in __fget_light() can use __fget() to avoid the
code duplication. Saves 232 bytes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e6ff9a9f

fs: factor out common code in fget_light() and fget_raw_light() · ad461834

Committed by Oleg Nesterov on Jan 13, 2014

Apart from the FMODE_PATH check, fget_light() and fget_raw_light() are
identical; shift the code into the new helper, __fget_light(fd, mask).
Saves 208 bytes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ad461834

fs: factor out common code in fget() and fget_raw() · 1deb46e2

Committed by Oleg Nesterov on Jan 13, 2014

Apart from the FMODE_PATH check, fget() and fget_raw() are identical;
shift the code into the new simple helper, __fget(fd, mask). Saves
160 bytes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1deb46e2
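The shape of this refactoring can be sketched in userspace: one lookup helper takes a mask, fget() passes the O_PATH bit so such files are refused, and fget_raw() passes 0 so they are accepted. The table, the `MODEL_FMODE_PATH` bit and the function names are hypothetical, and reference counting is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode bit standing in for FMODE_PATH. */
#define MODEL_FMODE_PATH 0x1u

struct model_file {
	unsigned int f_mode;
};

/* Shared lookup: return the file at 'fd' unless the slot is empty, out of
 * range, or the file's mode has any bit from 'mask' set. */
static struct model_file *model_fget_mask(struct model_file **table,
					  size_t nr, size_t fd, unsigned int mask)
{
	if (fd >= nr || table[fd] == NULL)
		return NULL;
	if (table[fd]->f_mode & mask)
		return NULL;
	return table[fd];
}

/* fget()-like wrapper: O_PATH files are refused. */
static struct model_file *model_fget(struct model_file **t, size_t nr, size_t fd)
{
	return model_fget_mask(t, nr, fd, MODEL_FMODE_PATH);
}

/* fget_raw()-like wrapper: O_PATH files are accepted. */
static struct model_file *model_fget_raw(struct model_file **t, size_t nr, size_t fd)
{
	return model_fget_mask(t, nr, fd, 0);
}
```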

change close_files() to use rcu_dereference_raw(files->fdt) · ce08b62d

Committed by Oleg Nesterov on Jan 11, 2014

put_files_struct() and close_files() do rcu_read_lock() to make
rcu_dereference_check_fdtable() happy.

This looks a bit ugly: files_fdtable() just reads the pointer, so
we can simply use rcu_dereference_raw() to avoid the warning.

The patch also changes close_files() to return fdt, this avoids
another rcu_read_lock()/files_fdtable() in put_files_struct().

I think close_files() needs more cleanups:

	- we do not need xchg() exactly because we are the last
	  user of this files_struct

	- "if (file)" should be turned into WARN_ON(!file)

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ce08b62d

introduce __fcheck_files() to fix rcu_dereference_check_fdtable(), kill rcu_my_thread_group_empty() · a8d4b834

Committed by Oleg Nesterov on Jan 11, 2014

rcu_dereference_check_fdtable() looks very wrong:

1. rcu_my_thread_group_empty() was added by 844b9a87 "vfs: fix
   RCU-lockdep false positive due to /proc" but it doesn't really
   fix the problem. A CLONE_THREAD (without CLONE_FILES) task can
   hit the same race with get_files_struct().

   And on the other hand, rcu_my_thread_group_empty() can suppress the
   correct warning if the caller is the CLONE_FILES (without
   CLONE_THREAD) task.

2. The files->count == 1 check is not quite right either. Even if this
   files_struct is not shared, it is not safe to access it locklessly
   unless the caller is the owner.

   On the other hand, this check is suboptimal. files->count == 0 always
   means it is safe to use it locklessly even if files != current->files,
   but put_files_struct() has to take rcu_read_lock(). See the next
   patch.

This patch removes the buggy checks and turns fcheck_files() into
__fcheck_files(), which uses rcu_dereference_raw(); the "unshared"
callers, fget_light() and fget_raw_light(), can use it to avoid
the warning from RCU-lockdep.

fcheck_files() is trivially reimplemented as rcu_lockdep_assert()
plus __fcheck_files().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a8d4b834

02 May 2013 · 1 commit

don't bother with deferred freeing of fdtables · ac3e3c5b

Committed by Al Viro on Apr 28, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ac3e3c5b

19 Feb 2013 · 1 commit

locking: Various static lock initializer fixes · eece09ec

Committed by Thomas Gleixner on Jul 17, 2011

The static lock initializers want to be fed the proper name of the
lock and not some random string. In mainline random strings are
obfuscating the readability of debug output, but for RT they prevent
the spinlock substitution. Fix it up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

eece09ec

04 Jan 2013 · 1 commit

misc: remove __dev* attributes. · 6ae14171

Committed by Greg Kroah-Hartman on Dec 21, 2012

CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the last of the __dev* markings from the kernel, from
a variety of small, scattered places.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6ae14171

30 Nov 2012 · 1 commit

fix off-by-one in argument passed by iterate_fd() to callbacks · a77cfcb4

Committed by Al Viro on Nov 29, 2012

Noticed by Pavel Roskin; the thing in his patch I disagree with
was compensating for that shite in callbacks instead of fixing
it once in the iterator itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a77cfcb4

29 Nov 2012 · 1 commit

kill daemonize() · c4144670

Committed by Al Viro on Oct 02, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c4144670

12 Nov 2012 · 1 commit

kill bogus BUG_ON() in do_close_on_exec() · 5a847766

Committed by Al Viro on Nov 12, 2012

It can be legitimately triggered via procfs access.  Now, at least
2 of the 3 get_files_struct() callers in procfs are useless, but
when and if we get rid of those we can always add WARN_ON() here.
BUG_ON() at that spot is simply wrong.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5a847766

31 Oct 2012 · 1 commit

Return the right error value when dup[23]() newfd argument is too large · 08f05c49

Committed by Al Viro on Oct 31, 2012

Jack Lin reports that the error return from dup3() for the RLIMIT_NOFILE
case changed incorrectly after 3.6.

The culprit is commit f33ff992 ("take rlimit check to callers of
expand_files()"), which, when it moved the "return -EMFILE" out to the
caller, didn't notice that dup3() had special code to turn the
EMFILE return into EBADF.

The replace_fd() helper that got added later then inherited the bug too.

Reported-by: Jack Lin <linliangjie@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ Noted more bugs, wrote proper changelog, fixed up typos - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

08f05c49
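POSIX specifies EBADF when the new descriptor is at or above {OPEN_MAX}, which on Linux corresponds to the RLIMIT_NOFILE soft limit; that is the behavior this fix restores. A userspace check, assuming Linux and a readable /dev/null; the helper name is hypothetical. It targets exactly the soft limit, the first out-of-range descriptor number.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <sys/resource.h>
#include <unistd.h>

/* Ask dup2() for a target descriptor equal to the RLIMIT_NOFILE soft
 * limit and return the errno it reports (0 if dup2() succeeded).
 * Returns -2 if the limit does not fit in an int (nothing to test) and
 * -1 if setup fails. */
static int dup2_errno_at_nofile_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)INT_MAX)
		return -2;

	int fd = open("/dev/null", O_RDONLY);

	if (fd < 0)
		return -1;
	errno = 0;
	int r = dup2(fd, (int)rl.rlim_cur);
	int err = (r == -1) ? errno : 0;

	if (r >= 0)
		close(r);
	close(fd);
	return err;
}
```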

10 Oct 2012 · 1 commit

dup3: Return an error when oldfd == newfd. · aed97647

Committed by Richard W.M. Jones on Oct 09, 2012

I have tested the attached patch to fix the dup3 regression.

Rich.

From 0944e30e12dec6544b3602626b60ff412375c78f Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Tue, 9 Oct 2012 14:42:45 +0100
Subject: [PATCH] dup3: Return an error when oldfd == newfd.

The following commit:

  commit fe17f22d
  Author: Al Viro <viro@zeniv.linux.org.uk>
  Date:   Tue Aug 21 11:48:11 2012 -0400

    take purely descriptor-related stuff from fcntl.c to file.c

was supposed to be just code motion, but it dropped the following two
lines:

  if (unlikely(oldfd == newfd))
          return -EINVAL;

from the dup3 system call.  dup3 is not specified by POSIX, so Linux
can do what it likes.  However the POSIX proposal for dup3 [1] states
that it should return an error if oldfd == newfd.

[1] http://austingroupbugs.net/view.php?id=411

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aed97647
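The restored check is easy to observe from userspace: dup3() with equal descriptors fails with EINVAL, while dup2() with equal descriptors simply returns the descriptor. A Linux-only sketch (dup3() needs _GNU_SOURCE); the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Return the errno from dup3(fd, fd, 0), 0 if it unexpectedly succeeds,
 * or -1 if /dev/null cannot be opened. */
static int dup3_same_fd_errno(void)
{
	int fd = open("/dev/null", O_RDONLY);

	if (fd < 0)
		return -1;
	errno = 0;
	int r = dup3(fd, fd, 0);
	int err = (r == -1) ? errno : 0;

	close(fd);
	return err;
}

/* For contrast: dup2() with equal arguments is a no-op returning fd.
 * Returns 1 if so, 0 if not, -1 on setup failure. */
static int dup2_same_fd_returns_fd(void)
{
	int fd = open("/dev/null", O_RDONLY);

	if (fd < 0)
		return -1;
	int ok = dup2(fd, fd) == fd;

	close(fd);
	return ok;
}
```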

27 Sep 2012 · 15 commits

export fget_light · 4557c669

Committed by Al Viro on Aug 28, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4557c669

new helper: daemonize_descriptors() · 864bdb3b

Committed by Al Viro on Aug 22, 2012

Descriptor-related parts of daemonize(), done right.  As a
result, we simplify the locking rules for ->files: we
hold task_lock in *all* cases when we modify ->files.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

864bdb3b

new helper: iterate_fd() · c3c073f8

Committed by Al Viro on Aug 21, 2012

Iterates through the open files in a given descriptor table,
calling a supplied function; we stop once it returns non-zero.
The callback gets the struct file *, the descriptor number and the
const void * argument passed to the iterator.  It is called with
files->file_lock held, so it is not allowed to block.

tty_io, netprio_cgroup and selinux flush_unauthorized_files()
are converted to use it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c3c073f8
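The iterator's contract can be sketched in userspace: skip empty slots, hand each file and its descriptor number to the callback, and stop at the first non-zero return. The table layout, the callback type and both functions are hypothetical, and the locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: opaque argument, file, descriptor number. */
typedef int (*model_fd_cb)(const void *arg, void *file, unsigned int fd);

/* Walk slots from 'start' upward, skipping empty ones.  Returns the first
 * non-zero callback result, or 0 if every callback returned 0.  The real
 * iterator holds files->file_lock here; this sketch has no lock. */
static int model_iterate_fd(void **table, unsigned int max_fds, unsigned int start,
			    model_fd_cb cb, const void *arg)
{
	for (unsigned int n = start; n < max_fds; n++) {
		if (table[n] == NULL)
			continue;
		int res = cb(arg, table[n], n); /* pass this slot's own index */

		if (res)
			return res;
	}
	return 0;
}

/* Example callback: report fd + 1 for the first slot holding 'arg', so
 * that descriptor 0 is distinguishable from "not found". */
static int find_file(const void *arg, void *file, unsigned int fd)
{
	return file == arg ? (int)fd + 1 : 0;
}
```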

make expand_files() and alloc_fd() static · ad47bd72

Committed by Al Viro on Aug 21, 2012

No callers outside of fs/file.c are left.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ad47bd72

take __{set,clear}_{open_fd,close_on_exec}() into fs/file.c · b8318b01

Committed by Al Viro on Aug 21, 2012

Nobody uses those outside fs/file.c anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b8318b01

new helper: replace_fd() · 8280d161

Committed by Al Viro on Aug 21, 2012

An analog of dup2(), except that it takes a struct file * as the source.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8280d161

take purely descriptor-related stuff from fcntl.c to file.c · fe17f22d

Committed by Al Viro on Aug 21, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fe17f22d

take close-on-exec logics to fs/file.c, clean it up a bit · 6a6d27de

Committed by Al Viro on Aug 21, 2012

... and add cond_resched() there, while we are at it.  We can
get large latencies as is...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6a6d27de
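A userspace sketch of the usual shape of such a close-on-exec pass, assuming the flags live in a bitmap of unsigned long words and that the table size is a whole number of words: zero words are skipped cheaply, and the gap between words is a natural place for a reschedule. The bitmap layout and all names are hypothetical; the comment marks where cond_resched() would sit in the kernel.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical close hook supplied by the caller. */
typedef void (*model_close_fn)(void *ctx, unsigned int fd);

/* Walk the close-on-exec bitmap a word at a time.  Non-zero words are
 * cleared up front, then each set bit's descriptor is handed to
 * 'close_fd'.  Assumes max_fds is a multiple of MODEL_BITS_PER_LONG. */
static void model_close_on_exec(unsigned long *close_on_exec, unsigned int max_fds,
				model_close_fn close_fd, void *ctx)
{
	for (size_t i = 0; i * MODEL_BITS_PER_LONG < max_fds; i++) {
		unsigned long set = close_on_exec[i];
		unsigned int fd = (unsigned int)(i * MODEL_BITS_PER_LONG);

		if (!set)
			continue;
		close_on_exec[i] = 0;
		for (; set; fd++, set >>= 1) {
			if (set & 1)
				close_fd(ctx, fd);
		}
		/* cond_resched() would go here in the kernel */
	}
}

/* Example hook: count closes and remember the last descriptor. */
struct close_log {
	unsigned int count;
	unsigned int last;
};

static void log_close(void *ctx, unsigned int fd)
{
	struct close_log *log = ctx;

	log->count++;
	log->last = fd;
}
```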

take descriptor-related part of close() to file.c · 483ce1d4

Committed by Al Viro on Aug 19, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

483ce1d4

take fget() and friends to fs/file.c · 0ee8cdfe

Committed by Al Viro on Aug 15, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0ee8cdfe

expose a low-level variant of fd_install() for binder · f869e8a7

Committed by Al Viro on Aug 15, 2012

Similar situation to that of __alloc_fd(); do not use unless you
really have to.  You should not touch any descriptor table other
than your own; it's a sure sign of a really bad API design.

As with __alloc_fd(), you *must* use a first-class reference to
struct files_struct; something obtained by get_files_struct(some task)
(let alone direct task->files) will not do.  It must be either
current->files, or obtained by get_files_struct(current) by the
owner of that sucker and given to you.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f869e8a7

move put_unused_fd() and fd_install() to fs/file.c · 56007cae

Committed by Al Viro on Aug 15, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

56007cae

trim free_fdtable_rcu() · 1983e781

Committed by Al Viro on Aug 15, 2012

The embedded case isn't hit anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1983e781

don't bother with call_rcu() in put_files_struct() · b9e02af0

Committed by Al Viro on Aug 15, 2012

At that point nobody can see us anyway; everything that
looks at files_fdtable(files) is separated from the
guts of put_files_struct(files) - either since files is
current->files or because we fetched it under task_lock()
and hadn't dropped that yet, or because we'd bumped
files->count while holding task_lock()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b9e02af0

move files_struct-related bits from kernel/exit.c to fs/file.c · 7cf4dc3c

Committed by Al Viro on Aug 15, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7cf4dc3c