1. 14 Jan, 2009 3 commits
  2. 07 Jan, 2009 1 commit
    • poll: allow f_op->poll to sleep · 5f820f64
      Tejun Heo authored
      f_op->poll is the only vfs operation which is not allowed to sleep.  This
      is because the poll and select implementations used task state to
      synchronize against wake-ups, which no longer has to be the case now that
      the wait/wake interface can use custom wake-up functions.  The non-sleep
      restriction can be tricky because ->poll is not called from an atomic
      context, and the result of accidentally sleeping in ->poll only shows up
      as temporary busy looping when the timing is right, or rather wrong.
      
      This patch converts poll/select to use a custom wake-up function and a
      separate triggered variable to synchronize against wake-up events.  The
      only added overhead is an extra function call during wake-up, which is
      negligible.
      
      This patch removes the one non-sleep exception from the vfs locking rules
      and is beneficial to userland filesystem implementations like FUSE and 9p,
      and to peculiar filesystems like spufs, as it is very difficult for those
      to implement a non-sleeping poll method.
      
      While at it, make the following cosmetic changes to make poll.h and
      select.c checkpatch friendly.
      
      * s/type * symbol/type *symbol/		   : three places in poll.h
      * remove blank line before EXPORT_SYMBOL() : two places in select.c
      
      Oleg: spotted missing barrier in poll_schedule_timeout()
      Davide: spotted missing write barrier in pollwake()
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Brad Boyer <flar@allandria.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
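The triggered-variable scheme described in the poll commit above can be illustrated with a small userspace sketch. This is not the kernel code: `poll_state_sketch`, `pollwake_sketch` and `poll_should_sleep_sketch` are hypothetical names, and C11 atomics stand in for the kernel's barriers. The idea shown is only that a wake-up sets a flag instead of changing task state, so a wake-up that arrives while ->poll methods are still running (or sleeping) is not lost.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-call poll state; not the kernel struct. */
struct poll_state_sketch {
    atomic_int triggered;   /* set to 1 by the wake-up callback */
};

/* Custom wake-up callback: record the event instead of touching task state. */
static void pollwake_sketch(struct poll_state_sketch *ps)
{
    assert(ps != NULL);
    atomic_store_explicit(&ps->triggered, 1, memory_order_release);
}

/*
 * Called after all ->poll methods have run.  Returns true if the caller
 * should really block, false if a wake-up arrived in the meantime.
 * The flag is consumed either way, so the next round starts clean.
 */
static bool poll_should_sleep_sketch(struct poll_state_sketch *ps)
{
    assert(ps != NULL);
    return atomic_exchange_explicit(&ps->triggered, 0,
                                    memory_order_acq_rel) == 0;
}
```

Because the decision to sleep depends on the flag rather than on task state, a ->poll method in this model is free to sleep without the wake-up being missed.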
  3. 27 Oct, 2008 1 commit
  4. 08 Sep, 2008 2 commits
  5. 06 Sep, 2008 3 commits
  6. 23 Jun, 2008 1 commit
  7. 02 May, 2008 2 commits
  8. 30 Apr, 2008 2 commits
  9. 22 Apr, 2008 1 commit
  10. 07 Feb, 2008 1 commit
  11. 20 Oct, 2007 1 commit
  12. 17 Oct, 2007 3 commits
  13. 12 Sep, 2007 1 commit
    • Fix select on /proc files without ->poll · dd23aae4
      Alexey Dobriyan authored
      Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit
      786d7e16 aka "Fix rmmod/read/write races in /proc entries" broke the
      SBCL + SLIME combo.
      
      The old code in do_select() used DEFAULT_POLLMASK if it couldn't find a
      ->poll handler.  The new code makes ->poll always present and returns 0
      by default, which is not correct.  Return DEFAULT_POLLMASK instead.
      
      Steps to reproduce:
      
      	install emacs, SBCL, SLIME
      	emacs
      	M-x slime	in *inferior-lisp* buffer
      	[watch it doing "Connecting to Swank on port X.."]
      
      Please, apply before 2.6.23.
      
      P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Taneli Vahakangas <vahakang@cs.helsinki.fi>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
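The fallback described in the /proc fix above can be sketched in userspace C. The types and function names here (`file_ops_sketch`, `poll_or_default_sketch`) are hypothetical and only model the rule "no ->poll handler means report DEFAULT_POLLMASK, not 0"; the mask itself is built from the standard `<poll.h>` bits.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Readable and writable; mirrors the kernel's DEFAULT_POLLMASK bits. */
#define DEFAULT_POLLMASK_SKETCH (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM)

/* Hypothetical, simplified stand-ins for a file's operations table. */
typedef unsigned int (*poll_fn_sketch)(void *file);

struct file_ops_sketch {
    poll_fn_sketch poll;    /* may be NULL when the file has no handler */
};

/* Return the handler's mask, or the default mask when there is none. */
static unsigned int poll_or_default_sketch(const struct file_ops_sketch *ops,
                                           void *file)
{
    assert(ops != NULL);
    if (ops->poll == NULL)
        return DEFAULT_POLLMASK_SKETCH;
    return ops->poll(file);
}
```

Returning 0 in the no-handler case would tell select() that the file is never ready, which matches the hang described in the report.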
  14. 09 May, 2007 3 commits
  15. 11 Dec, 2006 1 commit
    • [PATCH] fdtable: Make fdarray and fdsets equal in size · bbea9f69
      Vadim Lobanov authored
      Currently, each fdtable supports three dynamically-sized arrays of data: the
      fdarray and two fdsets.  The code allows the number of fds supported by the
      fdarray (fdtable->max_fds) to differ from the number of fds supported by each
      of the fdsets (fdtable->max_fdset).
      
      In practice, it is wasteful for these two sizes to differ: whenever we hit a
      limit on the smaller-capacity structure, we will reallocate the entire fdtable
      and all the dynamic arrays within it, so any delta in the memory used by the
      larger-capacity structure will never be touched at all.
      
      Rather than hogging this excess, we shouldn't even allocate it in the first
      place, and keep the capacities of the fdarray and the fdsets equal.  This
      patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
      code becomes simpler.
      Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 30 Sep, 2006 1 commit
    • [PATCH] enforce RLIMIT_NOFILE in poll() · 4e6fd33b
      Chris Snook authored
      POSIX states that poll() shall fail with EINVAL if nfds > OPEN_MAX.  In
      this context, POSIX is referring to sysconf(OPEN_MAX), which is the value
      of current->signal->rlim[RLIMIT_NOFILE].rlim_cur in the Linux kernel, not
      the compile-time constant which happens to also be named OPEN_MAX.  In the
      current code, an application may poll up to max_fdset file descriptors,
      even if this exceeds RLIMIT_NOFILE.  The current code also breaks
      applications which poll more than max_fdset descriptors, which worked circa
      2.4.18 when the check was against NR_OPEN, which is 1024*1024.  This patch
      enforces the limit precisely as POSIX defines, even if RLIMIT_NOFILE has
      been changed at run time with ulimit -n.
      
      To elaborate on the rationale for this, there are three cases:
      
      1) RLIMIT_NOFILE is at the default value of 1024
      
      In this (default) case, the patch changes nothing.  Calls with nfds > 1024
      fail with EINVAL both before and after the patch, and calls with nfds <=
      1024 pass the check both before and after the patch, since 1024 is the
      initial value of max_fdset.
      
      2) RLIMIT_NOFILE has been raised above the default
      
      In this case, poll() becomes more permissive, allowing polling up to
      RLIMIT_NOFILE file descriptors even if fewer than 1024 have been opened.
      The patch won't introduce new errors here.  If an application somehow
      depends on poll() failing when it polls with duplicate or invalid file
      descriptors, it's already broken, since this is already allowed below 1024,
      and will also work above 1024 if enough file descriptors have been open at
      some point to cause max_fdset to have been increased above nfds.
      
      3) RLIMIT_NOFILE has been lowered below the default
      
      In this case, the system administrator or the user has gone out of their
      way to protect the system from inefficient (or malicious) applications
      wasting kernel memory.  The current code allows polling up to 1024 file
      descriptors even if RLIMIT_NOFILE is much lower, which is not what the user
      or administrator intended.  Well-written applications which only poll
      valid, unique file descriptors will never notice the difference, because
      they'll hit the limit on open() first.  If an application gets broken
      because of the patch in this case, then it was already poorly/maliciously
      designed, and allowing it to work in the past was a violation of POSIX and
      a DoS risk on low-resource systems.
      
      With this patch, poll() will permit exactly what POSIX suggests, no more,
      no less, and for any run-time value set with ulimit -n, not just 256 or
      1024.  There are existing apps which poll a large number of file
      descriptors, some of which may be invalid, and if those numbers straddle
      1024, they currently fail with or without the patch in -mm, though they
      worked fine under 2.4.18.
      Signed-off-by: Chris Snook <csnook@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
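The limit check argued for in the RLIMIT_NOFILE commit above amounts to comparing nfds against the soft limit. A minimal userspace sketch, assuming the soft limit is read with `getrlimit()`; the function names are hypothetical and this is not the kernel's code path:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>

/* Returns 0 if nfds is acceptable, -EINVAL if it exceeds the soft limit. */
static int check_poll_nfds_sketch(unsigned long nfds, const struct rlimit *rl)
{
    assert(rl != NULL);
    if (nfds > rl->rlim_cur)
        return -EINVAL;
    return 0;
}

/* Same check against the calling process's current RLIMIT_NOFILE. */
static int check_poll_nfds_current_sketch(unsigned long nfds)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -errno;
    return check_poll_nfds_sketch(nfds, &rl);
}
```

The check depends only on the soft limit, not on how many descriptors are actually open, which is what makes cases 2) and 3) in the commit message behave as described.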
  17. 26 Jun, 2006 1 commit
  18. 23 Jun, 2006 1 commit
    • [PATCH] Poll cleanups/microoptimizations · 4a4b69f7
      Vadim Lobanov authored
      The "count" and "pt" variables are declared and modified by do_poll(), as
      well as accessed and written indirectly in the do_pollfd() subroutine.
      
      This patch pulls all handling of these variables into the do_poll()
      function, thereby eliminating the odd use of indirection in do_pollfd().
      This is done by pulling the "struct pollfd" traversal loop from do_pollfd()
      into its only caller do_poll().  As an added bonus, the patch saves a few
      clock cycles, and also adds comments to make the code easier to follow.
      Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  19. 11 Apr, 2006 2 commits
  20. 01 Apr, 2006 1 commit
  21. 29 Mar, 2006 3 commits
    • [PATCH] mark f_ops const in the inode · 99ac48f5
      Arjan van de Ven authored
      Mark the f_ops members of inodes as const, as well as fix the
      ripple-through this causes by places that copy this f_ops and then "do
      stuff" with it.
      Signed-off-by: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] use fget_light() in select/poll · e4a1f129
      Eric Dumazet authored
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Optimize select/poll by putting small data sets on the stack · 70674f95
      Andi Kleen authored
      Optimize select and poll by using stack space for small fd sets
      
      This brings back an old optimization from Linux 2.0.  Using the stack is
      faster than kmalloc.  On an Intel P4 system it speeds up a select of a
      single pty fd by about 13% (~4000 cycles -> ~3500)
      
      It also saves memory because a daemon hanging in select or poll will
      usually use one or two fewer pages.  This can add up: e.g. if you have 10
      daemons blocking in poll/select, you save 40KB of memory.
      
      I did a patch for this long ago, but it was never applied.  This version is
      a reimplementation of the old patch that tries to be less intrusive.  I
      only did the minimal changes needed for the stack allocation.
      
      The cut-off point before external memory is allocated is currently at
      832 bytes.  The system calls always allocate this much memory on the stack.
      
      These 832 bytes are divided into 256 bytes of frontend data (for the select
      bitmaps or the pollfds) and the rest of the space for the wait queues used
      by the low-level drivers.  There are some extreme cases where this won't
      work out for select and it falls back to allocating memory too early,
      especially with very sparse large select bitmaps, but the majority of
      processes that only have a small number of file descriptors should be ok.
      [TBD: 832/256 might not be the best split for select or poll]
      
      I suspect more optimizations might be possible, but they would be more
      complicated.  One way would be to cache the select/poll context over
      multiple system calls because typically the input values should be similar.
      The problem is when to flush the file descriptors out, though.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
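The 832/256 split described in the stack-allocation commit above can be made concrete with a small size calculation. This is a sketch under stated assumptions: the constant and function names are hypothetical, and it assumes select() needs six bitmaps in total (three inputs plus three results), each rounded up to whole longs. It only answers whether a given nfds would fit in the 256-byte frontend area.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STACK_ALLOC_SKETCH      832  /* total on-stack budget (from the commit) */
#define FRONTEND_STACK_ALLOC_SKETCH 256  /* share for select bitmaps or pollfds */

#define BITS_PER_LONG_SKETCH (sizeof(long) * CHAR_BIT)

/* Bytes needed for one bitmap covering nfds descriptors, in whole longs. */
static size_t fds_bytes_sketch(int nfds)
{
    assert(nfds >= 0);
    size_t longs = ((size_t)nfds + BITS_PER_LONG_SKETCH - 1)
                   / BITS_PER_LONG_SKETCH;
    return longs * sizeof(long);
}

/* Assumption: six bitmaps (in/out/ex as passed in, plus three results). */
static bool select_fits_on_stack_sketch(int nfds)
{
    return 6 * fds_bytes_sketch(nfds) <= FRONTEND_STACK_ALLOC_SKETCH;
}
```

Under these assumptions a process selecting on a few dozen descriptors stays on the stack, while a full 1024-descriptor set needs 768 bytes of bitmaps and falls back to a heap allocation. This also shows why a sparse but high-numbered set hits the fallback early: the bitmap size depends on the highest descriptor, not on how many are set.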
  22. 18 Feb, 2006 1 commit
  23. 12 Feb, 2006 1 commit
    • [PATCH] select: fix returned timeval · 643a6545
      Andrew Morton authored
      With David Woodhouse <dwmw2@infradead.org>
      
      select() presently has a habit of increasing the value of the user's
      `timeout' argument on return.
      
      We were writing back a timeout larger than the original.  We _deliberately_
      round up, since we know we must wait at _least_ as long as the caller asks
      us to.
      
      The patch adds a couple of helper functions for magnitude comparison of
      timespecs and of timevals, and uses them to prevent the various poll and
      select functions from returning a timeout which is larger than the one which
      was passed in.
      
      The patch also fixes a bug in compat_sys_pselect7(): it was adding the new
      timeout value to the old one and was returning that.  It should just return
      the new timeout value.
      
      (We have various handy timespec/timeval-to-from-nsec conversion functions in
      time.h.  But this code open-codes it all).
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: george anzinger <george@mvista.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
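The magnitude-comparison helpers mentioned in the timeval commit above could look roughly like the following. This is a userspace sketch with hypothetical names, assuming both timevals are normalized (0 <= tv_usec < 1000000); it shows the comparison and the rule that the value written back never exceeds what the caller passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Returns <0, 0 or >0 when lhs is smaller than, equal to or larger than rhs. */
static int timeval_compare_sketch(const struct timeval *lhs,
                                  const struct timeval *rhs)
{
    assert(lhs != NULL && rhs != NULL);
    if (lhs->tv_sec != rhs->tv_sec)
        return lhs->tv_sec < rhs->tv_sec ? -1 : 1;
    if (lhs->tv_usec != rhs->tv_usec)
        return lhs->tv_usec < rhs->tv_usec ? -1 : 1;
    return 0;
}

/*
 * The remaining time may have been rounded up internally; never report
 * more than the caller originally requested.
 */
static struct timeval clamp_remaining_sketch(const struct timeval *requested,
                                             const struct timeval *remaining)
{
    if (timeval_compare_sketch(remaining, requested) > 0)
        return *requested;
    return *remaining;
}
```

The rounding up stays in place for the wait itself; only the value copied back to the user is clamped.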
  24. 08 Feb, 2006 1 commit
  25. 19 Jan, 2006 1 commit
    • [PATCH] Add pselect/ppoll system call implementation · 9f72949f
      David Woodhouse authored
      The following implementation of ppoll() and pselect() system calls
      depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the
      thread_info.
      
      These system calls have to change the signal mask during their
      operation, and signal handlers must be invoked using the new, temporary
      signal mask. The old signal mask must be restored either upon successful
      exit from the system call, or upon returning from the invoked signal
      handler if the system call is interrupted. We can't simply restore the
      original signal mask and return to userspace, since the restored signal
      mask may actually block the signal which interrupted the system call.
      
      The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit
      path to trap into do_signal() just as TIF_SIGPENDING does, and by
      causing do_signal() to use the saved signal mask instead of the current
      signal mask when setting up the stack frame for the signal handler -- or
      by causing do_signal() to simply restore the saved signal mask in the
      case where there is no handler to be invoked.
      
      The first patch implements the sys_pselect() and sys_ppoll() system
      calls, which are present only if TIF_RESTORE_SIGMASK is defined. That
      #ifdef should go away in time when all architectures have implemented
      it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC
      kernel (in the -mm tree), and the third patch then removes the
      arch-specific implementations of sys_rt_sigsuspend() and replaces them
      with generic versions using the same trick.
      
      The fourth and fifth patches, provided by David Howells, implement
      TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch
      adds the syscalls to the i386 syscall table.
      
      This patch:
      
      Add the pselect() and ppoll() system calls, providing core routines usable by
      the original select() and poll() system calls and also the new calls (with
      their semantics w.r.t timeouts).
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
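From userspace, the signal-mask behaviour described in the pselect/ppoll commit above can be observed with a short program fragment. This is only an illustration using the standard POSIX `pselect()` interface; the helper names are hypothetical. It blocks SIGUSR1, leaves one instance pending, and then lets `pselect()` install an empty mask for the duration of the call, so the handler runs under the temporary mask and the original mask is back in place afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static volatile sig_atomic_t got_signal_sketch;

static void on_sigusr1_sketch(int signo)
{
    (void)signo;
    got_signal_sketch = 1;
}

/*
 * Returns the errno seen when pselect() fails, 0 if it returned without
 * error (for example on timeout), or -1 if the setup itself failed.
 */
static int pselect_demo_sketch(void)
{
    struct sigaction sa;
    sigset_t block, old, empty;
    struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };

    got_signal_sketch = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1_sketch;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Block SIGUSR1 so that raising it only marks it pending. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    if (raise(SIGUSR1) != 0)
        return -1;
    assert(got_signal_sketch == 0);    /* still blocked, not yet delivered */

    /* pselect() swaps in the empty mask only while it is waiting. */
    sigemptyset(&empty);
    int rc = pselect(0, NULL, NULL, NULL, &timeout, &empty);
    int err = (rc == -1) ? errno : 0;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return err;
}
```

Doing the same with a separate `sigprocmask()` call followed by `select()` would leave a window in which the signal could arrive before the wait starts, which is the race the combined system call is meant to close.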
  26. 10 Sep, 2005 1 commit