1. 03 Oct, 2006 1 commit
    • [PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers · afefdbb2
      Committed by David Howells
      These patches make the kernel pass 64-bit inode numbers internally when
      communicating to userspace, even on a 32-bit system.  They are required
      because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
      for example.  The 64-bit inode numbers are then propagated to userspace
      automatically where the arch supports it.
      
      Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
      number returned by stat64() or getdents64() to differentiate files, and
      failing because the 64-bit inode number space was compressed to 32-bits, and
      so overlaps occur.
      
      This patch:
      
      Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
      inode number so that 64-bit inode numbers can be passed back to userspace.
      
      The stat functions then return the full 64-bit inode number where
      available and where possible.  If it is not possible to represent the inode
      number supplied by the filesystem in the field provided by userspace, then
      error EOVERFLOW will be issued.
      
      Similarly, the getdents/readdir functions now pass the full 64-bit inode
      number to userspace where possible, returning EOVERFLOW instead when a
      directory entry is encountered that can't be properly represented.
      
      Note that this means that some inodes will not be stat'able on a 32-bit
      system with old libraries where they were before - but it does mean that
      there will be no ambiguity over what a 32-bit inode number refers to.
      
      Note similarly that directory scans may be cut short with an error on a
      32-bit system with old libraries where the scan would work before for the
      same reasons.
      
      It is judged unlikely that this situation will occur because modern glibc
      uses 64-bit capable versions of stat and getdents class functions
      exclusively, and because older systems are unlikely to encounter
      unrepresentable inode numbers anyway.
      
      [akpm: alpha build fix]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
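The overflow rule described in the commit above can be sketched in plain C. This is a userspace illustration, not the kernel code: `store_ino32` and its parameters are hypothetical names, and the only behavior assumed is what the message states, namely that EOVERFLOW is reported when a 64-bit inode number does not fit the field userspace provided.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy a 64-bit inode number into a 32-bit
 * field, as an old stat/getdents interface would have to.
 * Returns 0 on success, or -EOVERFLOW if the value does not fit.
 */
int store_ino32(uint64_t ino, uint32_t *out)
{
	uint32_t narrow = (uint32_t)ino;

	assert(out != NULL);

	/* If narrowing changed the value, the number is unrepresentable. */
	if ((uint64_t)narrow != ino)
		return -EOVERFLOW;

	*out = narrow;
	return 0;
}
```

A caller would pass the filesystem's inode number and hand the error straight back to userspace instead of silently truncating.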
  2. 02 Oct, 2006 1 commit
  3. 01 Oct, 2006 3 commits
  4. 26 Sep, 2006 1 commit
    • [PATCH] Check return value of copy_to_user in compat_sys_pselect7 · 75833345
      Committed by Andi Kleen
      Fix
      
      linux/fs/compat.c: In function compat_sys_pselect7
      linux/fs/compat.c:1869: warning: ignoring return value of copy_to_user, declared with attribute warn_unused_result
      
      To make it easier to handle, I changed the semantics to not try to
      write out a timespec if an error occurred. I hope that's ok.
      
      Cc: dwmw2@infradead.org
      Signed-off-by: Andi Kleen <ak@suse.de>
  5. 27 Jun, 2006 1 commit
  6. 23 Jun, 2006 1 commit
  7. 22 May, 2006 1 commit
    • [PATCH] NFS: fix error handling on access_ok in compat_sys_nfsservctl · d64b1c87
      Committed by Lin Feng Shen
      Functions compat_nfs_svc_trans, compat_nfs_clnt_trans,
      compat_nfs_exp_trans, compat_nfs_getfd_trans and compat_nfs_getfs_trans,
      which are called by compat_sys_nfsservctl(fs/compat.c), don't handle the
      return value of access_ok properly.  access_ok returns 1 when the addr is
      valid, and 0 when it's not, but these functions have the reversed
      understanding.  When the address is valid, they always return -EFAULT to
      compat_sys_nfsservctl.
      
      An example is running /usr/sbin/rpc.nfsd (a 32-bit program on Power5).  It
      doesn't function as expected.  strace shows that nfsservctl returns
      -EFAULT.
      
      The patch fixes this by correcting the error handling on the return value
      of access_ok in the five functions.
      Signed-off-by: Lin Feng Shen <shenlinf@cn.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Acked-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
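The inverted check described in the commit above can be illustrated with a small userspace sketch. `fake_access_ok`, `trans_buggy` and `trans_fixed` are hypothetical stand-ins, not the kernel functions; the only assumption taken from the message is that access_ok yields 1 for a valid range and 0 otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Stand-in for the kernel's access_ok(): returns 1 when the user
 * range looks valid and 0 when it does not. Here "valid" simply
 * means a non-NULL pointer; the real check is architecture-specific.
 */
int fake_access_ok(const void *addr, size_t size)
{
	(void)size;
	return addr != NULL;
}

/* Inverted test, as in the bug: a valid address yields -EFAULT. */
int trans_buggy(const void *uarg, size_t size)
{
	if (fake_access_ok(uarg, size))
		return -EFAULT;
	return 0;
}

/* Corrected test: fail only when the range is NOT accessible. */
int trans_fixed(const void *uarg, size_t size)
{
	if (!fake_access_ok(uarg, size))
		return -EFAULT;
	return 0;
}
```

With a valid pointer, `trans_buggy` reports -EFAULT while `trans_fixed` succeeds, which matches the rpc.nfsd symptom described in the message.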
  8. 16 May, 2006 1 commit
  9. 04 May, 2006 1 commit
  10. 02 May, 2006 1 commit
  11. 26 Apr, 2006 1 commit
  12. 29 Mar, 2006 1 commit
  13. 26 Mar, 2006 2 commits
  14. 24 Mar, 2006 1 commit
  15. 18 Feb, 2006 1 commit
  16. 12 Feb, 2006 1 commit
    • [PATCH] select: fix returned timeval · 643a6545
      Committed by Andrew Morton
      With David Woodhouse <dwmw2@infradead.org>
      
      select() presently has a habit of increasing the value of the user's
      `timeout' argument on return.
      
      We were writing back a timeout larger than the original.  We _deliberately_
      round up, since we know we must wait at _least_ as long as the caller asks
      us to.
      
      The patch adds a couple of helper functions for magnitude comparison of
      timespecs and of timevals, and uses them to prevent the various poll and
      select functions from returning a timeout which is larger than the one which
      was passed in.
      
      The patch also fixes a bug in compat_sys_pselect7(): it was adding the new
      timeout value to the old one and was returning that.  It should just return
      the new timeout value.
      
      (We have various handy timespec/timeval-to-from-nsec conversion functions in
      time.h.  But this code open-codes it all).
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: george anzinger <george@mvista.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
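The magnitude comparison and the "never return more than was passed in" rule from the commit above might look roughly like this. The names `ts_cmp` and `clamp_remaining` are hypothetical and are not the helpers the patch added; the sketch also assumes both values are normalized (0 <= tv_nsec < 1e9).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Returns <0, 0 or >0 if a is shorter than, equal to or longer than b. */
int ts_cmp(const struct timespec *a, const struct timespec *b)
{
	assert(a != NULL && b != NULL);

	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/*
 * Choose the timeout to write back: the remaining time, unless
 * rounding up made it larger than what the caller originally passed.
 */
struct timespec clamp_remaining(struct timespec remaining,
				struct timespec original)
{
	return ts_cmp(&remaining, &original) > 0 ? original : remaining;
}
```

Rounding up inside the wait stays as it is; only the value reported back to the caller is capped.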
  17. 02 Feb, 2006 2 commits
  18. 20 Jan, 2006 1 commit
  19. 19 Jan, 2006 2 commits
    • [PATCH] Add pselect/ppoll system call implementation · 9f72949f
      Committed by David Woodhouse
      The following implementation of ppoll() and pselect() system calls
      depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the
      thread_info.
      
      These system calls have to change the signal mask during their
      operation, and signal handlers must be invoked using the new, temporary
      signal mask. The old signal mask must be restored either upon successful
      exit from the system call, or upon returning from the invoked signal
      handler if the system call is interrupted. We can't simply restore the
      original signal mask and return to userspace, since the restored signal
      mask may actually block the signal which interrupted the system call.
      
      The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit
      path to trap into do_signal() just as TIF_SIGPENDING does, and by
      causing do_signal() to use the saved signal mask instead of the current
      signal mask when setting up the stack frame for the signal handler -- or
      by causing do_signal() to simply restore the saved signal mask in the
      case where there is no handler to be invoked.
      
      The first patch implements the sys_pselect() and sys_ppoll() system
      calls, which are present only if TIF_RESTORE_SIGMASK is defined. That
      #ifdef should go away in time when all architectures have implemented
      it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC
      kernel (in the -mm tree), and the third patch then removes the
      arch-specific implementations of sys_rt_sigsuspend() and replaces them
      with generic versions using the same trick.
      
      The fourth and fifth patches, provided by David Howells, implement
      TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch
      adds the syscalls to the i386 syscall table.
      
      This patch:
      
      Add the pselect() and ppoll() system calls, providing core routines usable by
      the original select() and poll() system calls and also the new calls (with
      their semantics w.r.t timeouts).
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vfs: *at functions: core · 5590ff0d
      Committed by Ulrich Drepper
      Here is a series of patches which introduce in total 13 new system calls
      which take a file descriptor/filename pair instead of a single file
      name.  These functions, openat etc, have been discussed on numerous
      occasions.  They are needed to implement race-free filesystem traversal,
      they are necessary to implement a virtual per-thread current working
      directory (think multi-threaded backup software), etc.
      
      We have in glibc today implementations of the interfaces which use the
      /proc/self/fd magic.  But this code is rather expensive.  Here are some
      results (similar to what Jim Meyering posted before).
      
      The test creates a deep directory hierarchy on a tmpfs filesystem.  Then
      rm -fr is used to remove all directories.  Without syscall support I get
      this:
      
      real    0m31.921s
      user    0m0.688s
      sys     0m31.234s
      
      With syscall support the results are much better:
      
      real    0m20.699s
      user    0m0.536s
      sys     0m20.149s
      
      The interfaces are for obvious reasons currently not much used.  But they'll
      be used.  coreutils (and Jeff's posixutils) are already using them.
      Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
      them.  I expect a patch to make follow soon.  Every program which is walking
      the filesystem tree will benefit.
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  20. 15 Jan, 2006 1 commit
  21. 09 Jan, 2006 1 commit
    • [PATCH] Fix overflow tests for compat_sys_fcntl64 locking · 2520f14c
      Committed by NeilBrown
      When making an fcntl locking call through compat_sys_fcntl64 (i.e.  a 32bit
      app on a 64bit kernel), the syscall can return a locking range that is in
      conflict with the queried lock.
      
      If some aspect of this range does not fit in the 32bit structure, something
      needs to be done.
      
      The current code is wrong in several respects:
      
      - It returns data to userspace even if no conflict was found,
        i.e. it should check l_type for F_UNLCK.
      - It returns -EOVERFLOW too aggressively.  A lock range covering
        the last possible byte of the file (start = COMPAT_OFF_T_MAX,
        len = 1) should be possible, but is rejected with the current test.
      - An extra-long 'len' should not be a problem.  Only that part
        of the conflicting lock that would be visible to the 32bit
        app needs to be reported to the 32bit app anyway.
      
      This patch addresses those three issues and adds a comment to (hopefully)
      record it for posterity.
      
      Note: this patch mainly affects test-cases.  Real applications rarely if
      ever see the problems.
      
      This patch has been tested (LSB test suite), and works.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
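The three points listed in the commit above can be sketched as one conversion routine. This is an illustration written from the message, not the kernel code: the struct layouts and `put_lock32` are hypothetical, and COMPAT_OFF_T_MAX is assumed here to be the largest value of a signed 32-bit offset.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

#define COMPAT_OFF_T_MAX INT32_MAX	/* assumed: 32-bit signed off_t */

/* Hypothetical native and 32-bit views of a lock description. */
struct lock64 { short type; int64_t start; int64_t len; };
struct lock32 { short type; int32_t start; int32_t len; };

/*
 * Convert a GETLK result for a 32-bit caller.
 * Returns 0 on success, -EOVERFLOW if the conflict cannot be described.
 */
int put_lock32(const struct lock64 *in, struct lock32 *out)
{
	int64_t len;

	assert(in != NULL && out != NULL);

	/* No conflicting lock: only the type is meaningful. */
	if (in->type == F_UNLCK) {
		out->type = F_UNLCK;
		return 0;
	}

	/* A start beyond the 32-bit range cannot be represented. */
	if (in->start > COMPAT_OFF_T_MAX)
		return -EOVERFLOW;

	/* An over-long length is truncated rather than rejected. */
	len = in->len;
	if (len > COMPAT_OFF_T_MAX)
		len = COMPAT_OFF_T_MAX;

	out->type = in->type;
	out->start = (int32_t)in->start;
	out->len = (int32_t)len;
	return 0;
}
```

Note that a lock on the last representable byte (start = COMPAT_OFF_T_MAX, len = 1) passes this test, which is the case the message says was wrongly rejected.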
  22. 05 Jan, 2006 1 commit
    • Relax the rw_verify_area() error checking. · e28cc715
      Committed by Linus Torvalds
      In particular, allow over-large read- or write-requests to be downgraded
      to a more reasonable range, rather than considering them outright errors.
      
      We want to protect lower layers from (the sadly all too common) overflow
      conditions, but prefer to do so by chopping the requests up, rather than
      just refusing them outright.
      
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
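The "downgrade instead of reject" idea in the commit above amounts to clamping the request size. The sketch below is illustrative only: `clamp_rw_count` and `SKETCH_MAX_RW_COUNT` are hypothetical names, and the cap (INT_MAX rounded down to a 4096-byte boundary) is an assumption chosen for the example, not a value taken from the message.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical cap: the largest request passed down in one go. */
#define SKETCH_MAX_RW_COUNT ((size_t)INT_MAX & ~(size_t)4095)

/*
 * Shrink an over-large read or write request to the cap instead of
 * failing it. The caller sees a short transfer and can issue the
 * remainder as a further request.
 */
size_t clamp_rw_count(size_t count)
{
	return count > SKETCH_MAX_RW_COUNT ? SKETCH_MAX_RW_COUNT : count;
}
```

Lower layers then never see a count that overflows a signed int, while well-behaved callers that loop on short reads and writes keep working.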
  23. 23 Nov, 2005 1 commit
    • [PATCH] Fix error handling with put_compat_statfs() · 86e07ce7
      Committed by David Gibson
      In fs/compat.c, whenever put_compat_statfs() returns an error, the
      containing syscall returns -EFAULT.  This is presumably by analogy with the
      non-compat case, where any non-zero code from copy_to_user() should be
      translated into an EFAULT.  However, put_compat_statfs() can also return
      -EOVERFLOW.  The same applies for put_compat_statfs64().
      
      This bug can be observed with a statfs() on a hugetlbfs directory.
      hugetlbfs, when mounted without limits, reports available, free and total
      blocks as -1 (itself a bug, another patch coming).  statfs() will
      mysteriously return EFAULT although its parameters are perfectly valid
      addresses.
      
      This patch causes the compat versions of statfs() and statfs64() to
      correctly propagate the return values from put_compat_statfs() and
      put_compat_statfs64().
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
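The difference between collapsing every failure into EFAULT and propagating the helper's own code, as described in the commit above, can be shown with a small sketch. All names and struct layouts here are hypothetical; a NULL destination stands in for a bad user pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical native and 32-bit statfs views, reduced to one field. */
struct statfs64_sketch { uint64_t blocks; };
struct statfs32_sketch { uint32_t blocks; };

/* Converter that can fail in two distinct ways. */
int put_statfs32(struct statfs32_sketch *ubuf,
		 const struct statfs64_sketch *k)
{
	assert(k != NULL);

	if (ubuf == NULL)
		return -EFAULT;		/* stand-in for a bad user address */
	if (k->blocks > UINT32_MAX)
		return -EOVERFLOW;	/* value does not fit in 32 bits */
	ubuf->blocks = (uint32_t)k->blocks;
	return 0;
}

/* Buggy caller: reports every failure as -EFAULT. */
int statfs_buggy(struct statfs32_sketch *ubuf,
		 const struct statfs64_sketch *k)
{
	if (put_statfs32(ubuf, k))
		return -EFAULT;
	return 0;
}

/* Fixed caller: passes the helper's error code through unchanged. */
int statfs_fixed(struct statfs32_sketch *ubuf,
		 const struct statfs64_sketch *k)
{
	return put_statfs32(ubuf, k);
}
```

With a block count of all ones, as in the hugetlbfs case from the message, the buggy caller reports EFAULT for a perfectly valid buffer while the fixed one reports EOVERFLOW.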
  24. 21 Nov, 2005 1 commit
  25. 30 Oct, 2005 1 commit
    • [PATCH] mm: update_hiwaters just in time · 365e9c87
      Committed by Hugh Dickins
      update_mem_hiwater has attracted various criticisms, in particular from those
      concerned with mm scalability.  Originally it was called whenever rss or
      total_vm got raised.  Then many of those callsites were replaced by a timer
      tick call from account_system_time.  Now Frank van Maarseveen reports that to
      be found inadequate.  How about this?  Works for Frank.
      
      Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
      update_hiwater_rss and update_hiwater_vm.  Don't attempt to keep
      mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
      by 1): those are hot paths.  Do the opposite, update only when about to lower
      rss (usually by many), or just before final accounting in do_exit.  Handle
      mm->hiwater_vm in the same way, though it's much less of an issue.  Demand
      that whoever collects these hiwater statistics do the work of taking the
      maximum with rss or total_vm.
      
      And there has been no collector of these hiwater statistics in the tree.  The
      new convention needs an example, so match Frank's usage by adding a VmPeak
      line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
      (High-Water-Mark or High-Water-Memory).
      
      There was a particular anomaly during mremap move, that hiwater_vm might be
      captured too high.  A fleeting such anomaly remains, but it's quickly
      corrected now, whereas before it would stick.
      
      What locking?  None: if the app is racy then these statistics will be racy,
      it's not worth any overhead to make them exact.  But whenever it suits,
      hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
      page_table_lock (for now) or with preemption disabled (later on): without
      going to any trouble, minimize the time between reading current values and
      updating, to minimize those occasions when a racing thread bumps a count up
      and back down in between.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
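The "update only when about to lower rss" convention from the commit above can be sketched as follows. `struct mm_sketch` and the helper functions are hypothetical userspace stand-ins; the patch itself uses macros on the real mm structure, and locking is left out entirely here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant mm fields, in pages. */
struct mm_sketch {
	unsigned long rss;
	unsigned long hiwater_rss;
};

/* Record the peak; meant to be called just before rss goes down. */
void update_hiwater_rss(struct mm_sketch *mm)
{
	assert(mm != NULL);
	if (mm->hiwater_rss < mm->rss)
		mm->hiwater_rss = mm->rss;
}

/* Hot path: raising rss does no high-water bookkeeping at all. */
void raise_rss(struct mm_sketch *mm, unsigned long pages)
{
	mm->rss += pages;
}

/* Rarer path: capture the peak first, then lower rss. */
void lower_rss(struct mm_sketch *mm, unsigned long pages)
{
	assert(pages <= mm->rss);
	update_hiwater_rss(mm);
	mm->rss -= pages;
}

/* Collector: hiwater_rss may lag, so take the maximum with rss. */
unsigned long peak_rss(const struct mm_sketch *mm)
{
	return mm->hiwater_rss > mm->rss ? mm->hiwater_rss : mm->rss;
}
```

The stored high-water mark can be stale between decreases, which is why the collector, and not the hot path, does the work of taking the maximum.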
  26. 15 Sep, 2005 1 commit
  27. 10 Sep, 2005 2 commits
  28. 08 Sep, 2005 3 commits
  29. 13 Jul, 2005 1 commit
    • [PATCH] inotify · 0eeca283
      Committed by Robert Love
      inotify is intended to correct the deficiencies of dnotify, particularly
      its inability to scale and its terrible user interface:
      
              * dnotify requires the opening of one fd per each directory
                that you intend to watch. This quickly results in too many
                open files and pins removable media, preventing unmount.
              * dnotify is directory-based. You only learn about changes to
                directories. Sure, a change to a file in a directory affects
                the directory, but you are then forced to keep a cache of
                stat structures.
              * dnotify's interface to user-space is awful.  Signals?
      
      inotify provides a more usable, simple, powerful solution to file change
      notification:
      
              * inotify's interface is a system call that returns a fd, not SIGIO.
                You get a single fd, which is select()-able.
              * inotify has an event that says "the filesystem that the item
                you were watching is on was unmounted."
              * inotify can watch directories or files.
      
      Inotify is currently used by Beagle (a desktop search infrastructure),
      Gamin (a FAM replacement), and other projects.
      
      See Documentation/filesystems/inotify.txt.
      Signed-off-by: Robert Love <rml@novell.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
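The single select()-able fd described in the commit above is used from userspace roughly as follows. This is a minimal sketch against the inotify interface as glibc exposes it today (inotify_init, inotify_add_watch, read), which may differ in detail from the interface at the time of the commit; `watch_path` and `next_event` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/inotify.h>
#include <sys/select.h>
#include <unistd.h>

/* Create an inotify instance watching one path; returns the fd or -1. */
int watch_path(const char *path, uint32_t mask)
{
	int ifd;

	assert(path != NULL);

	ifd = inotify_init();
	if (ifd < 0)
		return -1;
	if (inotify_add_watch(ifd, path, mask) < 0) {
		close(ifd);
		return -1;
	}
	return ifd;
}

/*
 * Wait up to timeout_ms for an event on ifd using select(), then read
 * it. Returns 1 and stores the first event's mask on success, 0 on
 * timeout, -1 on error.
 */
int next_event(int ifd, long timeout_ms, uint32_t *mask_out)
{
	/* Room for one event header plus a file name of up to 255 bytes. */
	char buf[sizeof(struct inotify_event) + 256]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
	fd_set rfds;
	ssize_t n;
	int ret;

	assert(mask_out != NULL && ifd >= 0 && ifd < FD_SETSIZE);

	FD_ZERO(&rfds);
	FD_SET(ifd, &rfds);
	ret = select(ifd + 1, &rfds, NULL, NULL, &tv);
	if (ret <= 0)
		return ret;

	n = read(ifd, buf, sizeof(buf));
	if (n < (ssize_t)sizeof(struct inotify_event))
		return -1;

	*mask_out = ((const struct inotify_event *)buf)->mask;
	return 1;
}
```

A program would call `watch_path()` once per file or directory of interest (or add more watches to the same fd) and then fold the fd into its existing select() loop, with no per-directory descriptors and no signals involved.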
  30. 28 Apr, 2005 1 commit
  31. 19 Apr, 2005 1 commit
  32. 17 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Committed by Linus Torvalds
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!