提交 · 5ae87e79ecb5baa65e9cf48be874098fafad0668 · openeuler / raspberrypi-kernel

23 9月, 2009 1 次提交

poll/select: avoid arithmetic overflow in __estimate_accuracy() · 5ae87e79

由 Guillaume Knispel 提交于 9月 22, 2009

__estimate_accuracy() was prone to integer overflow, for example if *tv ==
{2147, 483648000} on a 32 bit computer (or even for delays as small as
{429, 500000000} if the task is niced).

Because the result was already forced between 0 and 100ms, the effect of
the overflow was not too problematic, but the use of the hrtimer range
feature was not optimal in overflow cases.

This patch ensures that there can not be an integer overflow in this
function.
Signed-off-by: NGuillaume Knispel <gknispel@proformatique.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5ae87e79

16 8月, 2009 1 次提交

poll/select: initialize triggered field of struct poll_wqueues · b2add73d

由 Guillaume Knispel 提交于 8月 15, 2009

The triggered field of struct poll_wqueues introduced in commit
5f820f64 ("poll: allow f_op->poll to
sleep").

It was first set to 1 in pollwake() (now __pollwake() ), tested and
later set to 0 in poll_schedule_timeout(), but not initialized before.

As a result when the process needs to sleep, triggered was likely to be
non-zero even if pollwake() is not called before the first
poll_schedule_timeout(), meaning schedule_hrtimeout_range() would not be
called and an extra loop calling all ->poll() would be done.

This patch initialize triggered to 0 in poll_initwait() so the ->poll()
are not called twice before the process goes to sleep when it needs to.
Signed-off-by: NGuillaume Knispel <gknispel@proformatique.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b2add73d

17 6月, 2009 1 次提交

poll: avoid extra wakeups in select/poll · 4938d7e0

由 Eric Dumazet 提交于 6月 16, 2009

After introduction of keyed wakeups Davide Libenzi did on epoll, we are
able to avoid spurious wakeups in poll()/select() code too.

For example, typical use of poll()/select() is to wait for incoming
network frames on many sockets.  But TX completion for UDP/TCP frames call
sock_wfree() which in turn schedules thread.

When scheduled, thread does a full scan of all polled fds and can sleep
again, because nothing is really available.  If number of fds is large,
this cause significant load.

This patch makes select()/poll() aware of keyed wakeups and useless
wakeups are avoided.  This reduces number of context switches by about 50%
on some setups, and work performed by sofirq handlers.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NDavide Libenzi <davidel@xmailserver.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4938d7e0

14 1月, 2009 4 次提交

H
[CVE-2009-0029] System call wrappers part 32 · d4e82042
由 Heiko Carstens 提交于 1月 14, 2009
```
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
```
d4e82042
H
[CVE-2009-0029] System call wrappers part 23 · 5a8a82b1
由 Heiko Carstens 提交于 1月 14, 2009
```
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
```
5a8a82b1

[CVE-2009-0029] Make sys_pselect7 static · c9da9f21

由 Heiko Carstens 提交于 1月 14, 2009

Not a single architecture has wired up sys_pselect7 plus it is the
only system call with seven parameters. Just make it static and
rename it to do_pselect which will do the work for sys_pselect6.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>

c9da9f21

Fix timeouts in sys_pselect7 · 62568510

由 Bernd Schmidt 提交于 1月 13, 2009

Since we (Analog Devices) updated our Blackfin kernel to 2.6.28, we've
seen occasional 5-second hangs from telnet.  telnetd calls select with a
NULL timeout, but with the new kernel, the system call occasionally
returns 0, which causes telnet to call sleep (5).  This did not happen
with earlier kernels.

The code in sys_pselect7 looks a bit strange, in particular the variable
"to" is initialized to NULL, then changed if a non-null timeout was
passed in, but not used further.  It needs to be passed to
core_sys_select instead of &end_time.

This bug was introduced by 8ff3e8e8
("select: switch select() and poll() over to hrtimers").
Signed-off-by: NBernd Schmidt <bernd.schmidt@analog.com>
Reviewed-by: NUlrich Drepper <drepper@redhat.com>
Tested-by: NRobin Getz <rgetz@blackfin.uclinux.org>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

62568510

07 1月, 2009 1 次提交

poll: allow f_op->poll to sleep · 5f820f64

由 Tejun Heo 提交于 1月 06, 2009

f_op->poll is the only vfs operation which is not allowed to sleep.  It's
because poll and select implementation used task state to synchronize
against wake ups, which doesn't have to be the case anymore as wait/wake
interface can now use custom wake up functions.  The non-sleep restriction
can be a bit tricky because ->poll is not called from an atomic context
and the result of accidentally sleeping in ->poll only shows up as
temporary busy looping when the timing is right or rather wrong.

This patch converts poll/select to use custom wake up function and use
separate triggered variable to synchronize against wake up events.  The
only added overhead is an extra function call during wake up and
negligible.

This patch removes the one non-sleep exception from vfs locking rules and
is beneficial to userland filesystem implementations like FUSE, 9p or
peculiar fs like spufs as it's very difficult for those to implement
non-sleeping poll method.

While at it, make the following cosmetic changes to make poll.h and
select.c checkpatch friendly.

* s/type * symbol/type *symbol/		   : three places in poll.h
* remove blank line before EXPORT_SYMBOL() : two places in select.c

Oleg: spotted missing barrier in poll_schedule_timeout()
Davide: spotted missing write barrier in pollwake()
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Brad Boyer <flar@allandria.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roland McGrath <roland@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5f820f64

27 10月, 2008 1 次提交

select: deal with math overflow from borderline valid userland data · 4d36a9e6

由 Arjan van de Ven 提交于 10月 25, 2008

Some userland apps seem to pass in a "0" for the seconds, and several
seconds worth of usecs to select().  The old kernels accepted this just
fine, so the new kernels must too.

However, due to the upscaling of the microseconds to nanoseconds we had
some cases where we got math overflow, and depending on the GCC version
(due to inlining decisions) that actually resulted in an -EINVAL return.

This patch fixes this by adding the excess microseconds to the seconds
field.

Also with thanks to Marcin Slusarz for spotting some implementation bugs
in the diagnostics patches.
Reported-by: NCarlos R. Mafra <crmafra2@gmail.com>
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4d36a9e6

08 9月, 2008 2 次提交

hrtimer: fix signed/unsigned bug in slack estimator · 96d2ab48

由 Arjan van de Ven 提交于 9月 07, 2008

the slack estimator used unsigned math; however for very short delay it's
possible that by the time you calculate the timeout, it's already passed and
you get a negative time/slack... in an unsigned variable... which then gets
turned into a 100 msec delay rather than zero.

This patch fixes this by using a signed typee in the right places.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>

96d2ab48

hrtimer: incorporate feedback from Peter Zijlstra · 4ce105d3

由 Arjan van de Ven 提交于 9月 07, 2008

(based on  lkml review)
* use rt_task()
* task_nice() has a sign
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>

4ce105d3

06 9月, 2008 3 次提交

hrtimer: make select() and poll() use the hrtimer range feature · 90d6e24a

由 Arjan van de Ven 提交于 9月 01, 2008

This patch makes the select() and poll() hrtimers use the new range
feature and settings from the task struct.

In addition, this includes the estimate_accuracy() function that Linus
posted to lkml, but changed entirely based on other peoples lkml feedback.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>

90d6e24a

select: switch select() and poll() over to hrtimers · 8ff3e8e8

由 Arjan van de Ven 提交于 8月 31, 2008

With lots of help, input and cleanups from Thomas Gleixner

This patch switches select() and poll() over to hrtimers.

The core of the patch is replacing the "s64 timeout" with a
"struct timespec end_time" in all the plumbing.

But most of the diffstat comes from using the just introduced helpers:
	poll_select_set_timeout
	poll_select_copy_remaining
	timespec_add_safe
which make manipulating the timespec easier and less error-prone.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

8ff3e8e8

select: add poll_select_set_timeout() and poll_select_copy_remaining() helpers · b773ad40

由 Thomas Gleixner 提交于 8月 31, 2008

This patch adds 2 helpers that will be used for the hrtimer based select/poll:

poll_select_set_timeout() is a helper that takes a timeout (as a second, nanosecond
pair) and turns that into a "struct timespec" that represents the absolute end time.
This is a common operation in the many select() and poll() variants and needs various,
common, sanity checks.

poll_select_copy_remaining() is a helper that takes care of copying the remaining
time to userspace, as select(), pselect() and ppoll() do. This function comes in
both a natural and a compat implementation (due to datastructure differences).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>

b773ad40

23 6月, 2008 1 次提交

Fix performance regression on lmbench select benchmark · 55d85384

由 Linus Torvalds 提交于 6月 22, 2008

Christian Borntraeger reported that reinstating cond_resched() with
CONFIG_PREEMPT caused a performance regression on lmbench:

	For example select file 500:
	23 microseconds
	32 microseconds

and that's really because we totally unnecessarily do the cond_resched()
in the innermost loop of select(), which is just silly.

This moves it out from the innermost loop (which only ever loops ove the
bits in a single "unsigned long" anyway), which makes the performance
regression go away.
Reported-and-tested-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

55d85384

02 5月, 2008 2 次提交

[PATCH] split linux/file.h · 9f3acc31

由 Al Viro 提交于 4月 24, 2008

Initial splitoff of the low-level stuff; taken to fdtable.h
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9f3acc31

A
[PATCH] make osf_select() use core_sys_select() · a2dcb44c
由 Al Viro 提交于 4月 23, 2008
```
... instead of open-coding it
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
a2dcb44c

30 4月, 2008 2 次提交

signals: use HAVE_SET_RESTORE_SIGMASK · f3de272b

由 Roland McGrath 提交于 4月 30, 2008

Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
#ifdef HAVE_SET_RESTORE_SIGMASK.  If arch code defines it first, the generic
set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.
Signed-off-by: NRoland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3de272b

signals: add set_restore_sigmask · 4e4c22c7

由 Roland McGrath 提交于 4月 30, 2008

This adds the set_restore_sigmask() inline in <linux/thread_info.h> and
replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it.  No
change, but abstracts the details of the flag protocol from all the calls.
Signed-off-by: NRoland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4e4c22c7

22 4月, 2008 1 次提交

trivial: small cleanups · f5264481

由 Pavel Machek 提交于 4月 21, 2008

These are small cleanups all over the tree.

Trivial style and comment changes to
  fs/select.c, kernel/signal.c, kernel/stop_machine.c & mm/pdflush.c
Signed-off-by: NPavel Machek <pavel@suse.cz>
Signed-off-by: NJesper Juhl <jesper.juhl@gmail.com>

f5264481

07 2月, 2008 1 次提交

make sys_poll() wait at least timeout ms · 844fcc53

由 Karsten Wiese 提交于 2月 06, 2008

schedule_timeout(jiffies) waits for at least jiffies - 1.  Add 1 jiffie to
the timeout_jiffies calculated in sys_poll() to wait at least
timeout_msecs, like poll() manpage says.
Signed-off-by: NKarsten Wiese <fzu@wemgehoertderstaat.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

844fcc53

20 10月, 2007 1 次提交

fs/select, remove unused macros · 1276b103

由 Jiri Slaby 提交于 10月 18, 2007

fs/select, remove unused macros

this is due to preparation for global BIT macro
Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1276b103

17 10月, 2007 3 次提交

Use ERESTART_RESTARTBLOCK if poll() is interrupted by a signal · 3075d9da

由 Chris Wright 提交于 10月 16, 2007

Lomesh reported poll returning EINTR during suspend/resume cycle.  This is
caused by the STOP/CONT cycle that the freezer uses, generating a pending
signal for what in effect is an ignored signal.  In general poll is a
little eager in returning EINTR, when it could try not bother userspace and
simply restart the syscall.  Both select and ppoll do use ERESTARTNOHAND to
restart the syscall.  Oleg points out that simply using ERESTARTNOHAND will
cause poll to restart with original timeout value.  which could ultimately
lead to process never returning to userspace.  Instead use
ERESTART_RESTARTBLOCK, and restart poll with updated timeout value.
Inspired by Manfred's use ERESTARTNOHAND in poll patch.

[bunk@kernel.org: do_restart_poll() can become static]
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: "Agarwal, Lomesh" <lomesh.agarwal@intel.com>
Signed-off-by: NChris Wright <chrisw@sous-sol.org>
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3075d9da

do_poll: return -EINTR when signalled · 9bf084f7

由 Oleg Nesterov 提交于 10月 16, 2007

do_poll() checks signal_pending() but returns 0 when interrupted.  This means
the caller has to check signal_pending() again.

Change it to return -EINTR when signal_pending() and count == 0.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9bf084f7

do_sys_poll: simplify playing with on-stack data · 252e5725

由 Oleg Nesterov 提交于 10月 16, 2007

Cleanup. Lessens both the source and compiled code (100 bytes) and imho makes
the code much more understandable.

With this patch "struct poll_list *head" always points to on-stack stack_pps,
so we can remove all "is it on-stack" and "was it initialized" checks.

Also, move poll_initwait/poll_freewait and -EINTR detection closer to the
do_poll()'s callsite.

[akpm@linux-foundation.org: fix warning (size_t != uint)]
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Looks-good-to: Andi Kleen <ak@suse.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

252e5725

12 9月, 2007 1 次提交

Fix select on /proc files without ->poll · dd23aae4

由 Alexey Dobriyan 提交于 9月 11, 2007

Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit
786d7e16 aka "Fix rmmod/read/write races
in /proc entries" broke SBCL + SLIME combo.

The old code in do_select() used DEFAULT_POLLMASK, if couldn't find
->poll handler.  The new code makes ->poll always there and returns 0 by
default, which is not correct.  Return DEFAULT_POLLMASK instead.

Steps to reproduce:

	install emacs, SBCL, SLIME
	emacs
	M-x slime	in *inferior-lisp* buffer
	[watch it doing "Connecting to Swank on port X.."]

Please, apply before 2.6.23.

P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dd23aae4

09 5月, 2007 3 次提交

Style fix in fs/select.c · ccf6780d

由 WANG Cong 提交于 5月 09, 2007

Signed-off-by: NWANG Cong  <xiyou.wangcong@gmail.com>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

ccf6780d

ROUND_UP macro cleanup in fs/(select|compat|readdir).c · 022a1692

由 Milind Arun Choudhary 提交于 5月 08, 2007

ROUND_UP macro cleanup use,ALIGN or DIV_ROUND_UP where ever appropriate.
Signed-off-by: NMilind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

022a1692

header cleaning: don't include smp_lock.h when not used · e63340ae

由 Randy Dunlap 提交于 5月 08, 2007

Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e63340ae

11 12月, 2006 1 次提交

[PATCH] fdtable: Make fdarray and fdsets equal in size · bbea9f69

由 Vadim Lobanov 提交于 12月 10, 2006

Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets.  The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal.  This
patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
code becomes simpler.
Signed-off-by: NVadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

bbea9f69

30 9月, 2006 1 次提交

[PATCH] enforce RLIMIT_NOFILE in poll() · 4e6fd33b

由 Chris Snook 提交于 9月 29, 2006

POSIX states that poll() shall fail with EINVAL if nfds > OPEN_MAX. In
this context, POSIX is referring to sysconf(OPEN_MAX), which is the value
of current->signal->rlim[RLIMIT_NOFILE].rlim_cur in the linux kernel, not
the compile-time constant which happens to also be named OPEN_MAX. In the
current code, an application may poll up to max_fdset file descriptors,
even if this exceeds RLIMIT_NOFILE. The current code also breaks
applications which poll more than max_fdset descriptors, which worked circa
2.4.18 when the check was against NR_OPEN, which is 1024*1024. This patch
enforces the limit precisely as POSIX defines, even if RLIMIT_NOFILE has
been changed at run time with ulimit -n.

To elaborate on the rationale for this, there are three cases:

1) RLIMIT_NOFILE is at the default value of 1024

In this (default) case, the patch changes nothing. Calls with nfds > 1024
fail with EINVAL both before and after the patch, and calls with nfds <=
1024 pass the check both before and after the patch, since 1024 is the
initial value of max_fdset.

2) RLIMIT_NOFILE has been raised above the default

In this case, poll() becomes more permissive, allowing polling up to
RLIMIT_NOFILE file descriptors even if less than 1024 have been opened.
The patch won't introduce new errors here. If an application somehow
depends on poll() failing when it polls with duplicate or invalid file
descriptors, it's already broken, since this is already allowed below 1024,
and will also work above 1024 if enough file descriptors have been open at
some point to cause max_fdset to have been increased above nfds.

3) RLIMIT_NOFILE has been lowered below the default

In this case, the system administrator or the user has gone out of their
way to protect the system from inefficient (or malicious) applications
wasting kernel memory. The current code allows polling up to 1024 file
descriptors even if RLIMIT_NOFILE is much lower, which is not what the user
or administrator intended. Well-written applications which only poll
valid, unique file descriptors will never notice the difference, because
they'll hit the limit on open() first. If an application gets broken
because of the patch in this case, then it was already poorly/maliciously
designed, and allowing it to work in the past was a violation of POSIX and
a DoS risk on low-resource systems.

With this patch, poll() will permit exactly what POSIX suggests, no more,
no less, and for any run-time value set with ulimit -n, not just 256 or
1024. There are existing apps which which poll a large number of file
descriptors, some of which may be invalid, and if those numbers stradle
1024, they currently fail with or without the patch in -mm, though they
worked fine under 2.4.18.
Signed-off-by: NChris Snook <csnook@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4e6fd33b

26 6月, 2006 1 次提交

[PATCH] fs: sys_poll with timeout -1 bug fix · 04a3446c

由 Frode Isaksen 提交于 6月 25, 2006

If you do a poll() call with timeout -1, the wait will be a big number
(depending on HZ) instead of infinite wait, since -1 is passed to the
msecs_to_jiffies function.
Signed-off-by: NFrode Isaksen <frode.isaksen@gmail.com>
Acked-by: NNishanth Aravamudan <nacc@us.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

04a3446c

23 6月, 2006 1 次提交

[PATCH] Poll cleanups/microoptimizations · 4a4b69f7

由 Vadim Lobanov 提交于 6月 23, 2006

The "count" and "pt" variables are declared and modified by do_poll(), as
well as accessed and written indirectly in the do_pollfd() subroutine.

This patch pulls all handling of these variables into the do_poll()
function, thereby eliminating the odd use of indirection in do_pollfd().
This is done by pulling the "struct pollfd" traversal loop from do_pollfd()
into its only caller do_poll(). As an added bonus, the patch saves a few
clock cycles, and also adds comments to make the code easier to follow.
Signed-off-by: NVadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4a4b69f7

11 4月, 2006 2 次提交

[PATCH] select: don't overflow if (SELECT_STACK_ALLOC % sizeof(long) != 0) · b04eb6aa

由 Mitchell Blank Jr 提交于 4月 10, 2006

If SELECT_STACK_ALLOC is not a multiple of sizeof(long) then stack_fds[]
would be shorter than SELECT_STACK_ALLOC bytes and could overflow later in
the function. Fixed by simply rearranging the test later to work on
sizeof(stack_fds) Currently SELECT_STACK_ALLOC is 256 so this doesn't
happen, but it's nasty to have things like this hidden in the code. What
if later someone decides to change SELECT_STACK_ALLOC to 300?
Signed-off-by: NMitchell Blank Jr <mitch@sfgoth.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

b04eb6aa

[PATCH] select() warning fixes · 29ff2db5

由 Andrew Morton 提交于 4月 10, 2006

fs/select.c: In function `core_sys_select':
fs/select.c:339: warning: assignment from incompatible pointer type
fs/select.c:376: warning: comparison of distinct pointer types lacks a cast

By using a void* we can remove lots of casts rather than adding more.

Cc: Jes Sorensen <jes@trained-monkey.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

29ff2db5

01 4月, 2006 1 次提交

[PATCH] avoid unaligned access when accessing poll stack · 30c14e40

由 Jes Sorensen 提交于 3月 31, 2006

Commit 70674f95:

  [PATCH] Optimize select/poll by putting small data sets on the stack

resulted in the poll stack being 4-byte aligned on 64-bit architectures,
causing misaligned accesses to elements in the array.

This patch fixes it by declaring the stack in terms of 'long' instead
of 'char'.

Force alignment of poll and select stacks to long to avoid unaligned
access on 64 bit architectures.
Signed-off-by: NJes Sorensen <jes@sgi.com>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

30c14e40

29 3月, 2006 3 次提交

[PATCH] mark f_ops const in the inode · 99ac48f5

由 Arjan van de Ven 提交于 3月 28, 2006

Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.
Signed-off-by: NArjan van de Ven <arjan@infradead.org>
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

99ac48f5

[PATCH] use fget_light() in select/poll · e4a1f129

由 Eric Dumazet 提交于 3月 28, 2006

Cc: Andi Kleen <ak@muc.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e4a1f129

[PATCH] Optimize select/poll by putting small data sets on the stack · 70674f95

由 Andi Kleen 提交于 3月 28, 2006

Optimize select and poll by a using stack space for small fd sets

This brings back an old optimization from Linux 2.0. Using the stack is
faster than kmalloc. On a Intel P4 system it speeds up a select of a
single pty fd by about 13% (~4000 cycles -> ~3500)

It also saves memory because a daemon hanging in select or poll will
usually save one or two less pages. This can add up - e.g. if you have 10
daemons blocking in poll/select you save 40KB of memory.

I did a patch for this long ago, but it was never applied. This version is
a reimplementation of the old patch that tries to be less intrusive. I
only did the minimal changes needed for the stack allocation.

The cut off point before external memory is allocated is currently at
832bytes. The system calls always allocate this much memory on the stack.

These 832 bytes are divided into 256 bytes frontend data (for the select
bitmaps of the pollfds) and the rest of the space for the wait queues used
by the low level drivers. There are some extreme cases where this won't
work out for select and it falls back to allocating memory too early -
especially with very sparse large select bitmaps - but the majority of
processes who only have a small number of file descriptors should be ok.
[TBD: 832/256 might not be the best split for select or poll]

I suspect more optimizations might be possible, but they would be more
complicated. One way would be to cache the select/poll context over
multiple system calls because typically the input values should be similar.
Problem is when to flush the file descriptors out though.
Signed-off-by: NAndi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

70674f95

18 2月, 2006 1 次提交

[PATCH] select: time comparison fixes · 74910e6c

由 Andrew Morton 提交于 2月 17, 2006

I got all of these backwards.  We want to return

	min(input timeout, new timeout)

to userspace to prevent increasing the time-remaining value.

Thanks to Ernst Herzberg <earny@net4u.de> for reporting and diagnosing.
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

74910e6c