1. 19 Jan, 2006 2 commits
    • [PATCH] Add pselect/ppoll system call implementation · 9f72949f
      David Woodhouse committed
      The following implementation of ppoll() and pselect() system calls
      depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the
      thread_info.
      
      These system calls have to change the signal mask during their
      operation, and signal handlers must be invoked using the new, temporary
      signal mask. The old signal mask must be restored either upon successful
      exit from the system call, or upon returning from the invoked signal
      handler if the system call is interrupted. We can't simply restore the
      original signal mask and return to userspace, since the restored signal
      mask may actually block the signal which interrupted the system call.
      
      The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit
      path to trap into do_signal() just as TIF_SIGPENDING does, and by
      causing do_signal() to use the saved signal mask instead of the current
      signal mask when setting up the stack frame for the signal handler -- or
      by causing do_signal() to simply restore the saved signal mask in the
      case where there is no handler to be invoked.
      
      The first patch implements the sys_pselect() and sys_ppoll() system
      calls, which are present only if TIF_RESTORE_SIGMASK is defined. That
      #ifdef should go away in time when all architectures have implemented
      it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC
      kernel (in the -mm tree), and the third patch then removes the
      arch-specific implementations of sys_rt_sigsuspend() and replaces them
      with generic versions using the same trick.
      
      The fourth and fifth patches, provided by David Howells, implement
      TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch
      adds the syscalls to the i386 syscall table.
      
      This patch:
      
      Add the pselect() and ppoll() system calls, providing core routines usable by
      the original select() and poll() system calls and also the new calls (with
      their semantics w.r.t. timeouts).
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vfs: *at functions: core · 5590ff0d
      Ulrich Drepper committed
      Here is a series of patches which introduce in total 13 new system calls
      which take a file descriptor/filename pair instead of a single file
      name.  These functions, openat etc, have been discussed on numerous
      occasions.  They are needed to implement race-free filesystem traversal,
      they are necessary to implement a virtual per-thread current working
      directory (think multi-threaded backup software), etc.
      
      We have in glibc today implementations of the interfaces which use the
      /proc/self/fd magic.  But this code is rather expensive.  Here are some
      results (similar to what Jim Meyering posted before).
      
      The test creates a deep directory hierarchy on a tmpfs filesystem.  Then
      rm -fr is used to remove all directories.  Without syscall support I get
      this:
      
      real    0m31.921s
      user    0m0.688s
      sys     0m31.234s
      
      With syscall support the results are much better:
      
      real    0m20.699s
      user    0m0.536s
      sys     0m20.149s
      
      The interfaces are for obvious reasons currently not much used.  But they'll
      be used.  coreutils (and Jeff's posixutils) are already using them.
      Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
      them.  I expect a patch to make to follow soon.  Every program which is walking
      the filesystem tree will benefit.
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 15 Jan, 2006 1 commit
  3. 09 Jan, 2006 1 commit
    • [PATCH] Fix overflow tests for compat_sys_fcntl64 locking · 2520f14c
      NeilBrown committed
      When making an fcntl locking call through compat_sys_fcntl64 (i.e. a 32bit
      app on a 64bit kernel), the syscall can return a locking range that is in
      conflict with the queried lock.
      
      If some aspect of this range does not fit in the 32bit structure, something
      needs to be done.
      
      The current code is wrong in several respects:
      
      - It returns data to userspace even if no conflict was found,
        i.e. it should check l_type for F_UNLCK.
      - It returns -EOVERFLOW too aggressively.  A lock range covering
        the last possible byte of the file (start = COMPAT_OFF_T_MAX,
        len = 1) should be possible, but is rejected with the current test.
      - An extra-long 'len' should not be a problem.  Only that part
        of the conflicting lock that would be visible to the 32bit
        app needs to be reported to the 32bit app anyway.
      
      This patch addresses those three issues and adds a comment to (hopefully)
      record it for posterity.
      
      Note: this patch mainly affects test-cases.  Real applications rarely if
      ever see the problems.
      
      This patch has been tested (LSB test suite), and works.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 05 Jan, 2006 1 commit
    • Relax the rw_verify_area() error checking. · e28cc715
      Linus Torvalds committed
      In particular, allow over-large read- or write-requests to be downgraded
      to a more reasonable range, rather than considering them outright errors.
      
      We want to protect lower layers from (the sadly all too common) overflow
      conditions, but prefer to do so by chopping the requests up, rather than
      just refusing them outright.
      
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  5. 23 Nov, 2005 1 commit
    • [PATCH] Fix error handling with put_compat_statfs() · 86e07ce7
      David Gibson committed
      In fs/compat.c, whenever put_compat_statfs() returns an error, the
      containing syscall returns -EFAULT.  This is presumably by analogy with the
      non-compat case, where any non-zero code from copy_to_user() should be
      translated into an EFAULT.  However, put_compat_statfs() can also return
      -EOVERFLOW.  The same applies for put_compat_statfs64().
      
      This bug can be observed with a statfs() on a hugetlbfs directory.
      hugetlbfs, when mounted without limits, reports available, free and total
      blocks as -1 (itself a bug, another patch coming).  statfs() will
      mysteriously return EFAULT although its parameters are perfectly valid
      addresses.
      
      This patch causes the compat versions of statfs() and statfs64() to
      correctly propagate the return values from put_compat_statfs() and
      put_compat_statfs64().
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 21 Nov, 2005 1 commit
  7. 30 Oct, 2005 1 commit
    • [PATCH] mm: update_hiwaters just in time · 365e9c87
      Hugh Dickins committed
      update_mem_hiwater has attracted various criticisms, in particular from those
      concerned with mm scalability.  Originally it was called whenever rss or
      total_vm got raised.  Then many of those callsites were replaced by a timer
      tick call from account_system_time.  Now Frank van Maarseveen reports that to
      be found inadequate.  How about this?  Works for Frank.
      
      Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
      update_hiwater_rss and update_hiwater_vm.  Don't attempt to keep
      mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
      by 1): those are hot paths.  Do the opposite, update only when about to lower
      rss (usually by many), or just before final accounting in do_exit.  Handle
      mm->hiwater_vm in the same way, though it's much less of an issue.  Demand
      that whoever collects these hiwater statistics do the work of taking the
      maximum with rss or total_vm.
      
      And there has been no collector of these hiwater statistics in the tree.  The
      new convention needs an example, so match Frank's usage by adding a VmPeak
      line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
      (High-Water-Mark or High-Water-Memory).
      
      There was a particular anomaly during mremap move, that hiwater_vm might be
      captured too high.  A fleeting such anomaly remains, but it's quickly
      corrected now, whereas before it would stick.
      
      What locking?  None: if the app is racy then these statistics will be racy,
      it's not worth any overhead to make them exact.  But whenever it suits,
      hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
      page_table_lock (for now) or with preemption disabled (later on): without
      going to any trouble, minimize the time between reading current values and
      updating, to minimize those occasions when a racing thread bumps a count up
      and back down in between.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 15 Sep, 2005 1 commit
  9. 10 Sep, 2005 2 commits
  10. 08 Sep, 2005 3 commits
  11. 13 Jul, 2005 1 commit
    • [PATCH] inotify · 0eeca283
      Robert Love committed
      inotify is intended to correct the deficiencies of dnotify, particularly
      its inability to scale and its terrible user interface:
      
              * dnotify requires the opening of one fd per each directory
                that you intend to watch. This quickly results in too many
                open files and pins removable media, preventing unmount.
              * dnotify is directory-based. You only learn about changes to
                directories. Sure, a change to a file in a directory affects
                the directory, but you are then forced to keep a cache of
                stat structures.
              * dnotify's interface to user-space is awful.  Signals?
      
      inotify provides a more usable, simple, powerful solution to file change
      notification:
      
              * inotify's interface is a system call that returns a fd, not SIGIO.
                You get a single fd, which is select()-able.
              * inotify has an event that says "the filesystem that the item
                you were watching is on was unmounted."
              * inotify can watch directories or files.
      
      Inotify is currently used by Beagle (a desktop search infrastructure),
      Gamin (a FAM replacement), and other projects.
      
      See Documentation/filesystems/inotify.txt.
      Signed-off-by: Robert Love <rml@novell.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 28 Apr, 2005 1 commit
  13. 19 Apr, 2005 1 commit
  14. 17 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds committed
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!