1. 23 6月, 2006 19 次提交
    • M
      [PATCH] vfs: add lock owner argument to flush operation · 75e1fcc0
      Miklos Szeredi 提交于
      Pass the POSIX lock owner ID to the flush operation.
      
      This is useful for filesystems which don't want to store any locking state
      in inode->i_flock but want to handle locking/unlocking POSIX locks
      internally.  FUSE is one such filesystem but I think it possible that some
      network filesystems would need this also.
      
      Also add a flag to indicate that a POSIX locking request was generated by
      close(), so filesystems using the above feature won't send an extra locking
      request in this case.
      Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      75e1fcc0
    • M
      [PATCH] locks: clean up locks_remove_posix() · ff7b86b8
      Miklos Szeredi 提交于
      locks_remove_posix() can use posix_lock_file() instead of doing the lock
      removal by hand.  posix_lock_file() now does exacly the same.
      
      The comment about pids no longer applies, posix_lock_file() takes only the
      owner into account.
      Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ff7b86b8
    • M
      [PATCH] locks: don't do unnecessary allocations · 39005d02
      Miklos Szeredi 提交于
      posix_lock_file() always allocates new locks in advance, even if it's easy to
      determine that no allocations will be needed.
      
      Optimize these cases:
      
       - FL_ACCESS flag is set
      
       - Unlocking the whole range
      Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      39005d02
    • M
      [PATCH] locks: don't unnecessarily fail posix lock operations · 0d9a490a
      Miklos Szeredi 提交于
      posix_lock_file() was too cautious, failing operations on OOM, even if they
      didn't actually require an allocation.
      
      This has the disadvantage, that a failing unlock on process exit could lead to
      a memory leak.  There are two possibilites for this:
      
      - filesystem implements .lock() and calls back to posix_lock_file().  On
      cleanup of files_struct locks_remove_posix() is called which should remove all
      locks belonging to files_struct.  However if filesystem calls
      posix_lock_file() which fails, then those locks will never be freed.
      
      - if a file is closed while a lock is blocked, then after acquiring
      fcntl_setlk() will undo the lock.  But this unlock itself might fail on OOM,
      again possibly leaking the lock.
      
      The solution is to move the checking of the allocations until after it is sure
      that they will be needed.  This will solve the above problem since unlock will
      always succeed unless it splits an existing region.
      Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0d9a490a
    • P
      [PATCH] read_mapping_page for address space · 090d2b18
      Pekka Enberg 提交于
      Add read_mapping_page() which is used for callers that pass
      mapping->a_ops->readpage as the filler for read_cache_page.  This removes
      some duplication from filesystem code.
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      090d2b18
    • A
      [PATCH] ext2 XIP won't build without MMU · 0c426f26
      Al Viro 提交于
      Disable Ext2 XIP if the kernel is configured in no-MMU mode as the former
      won't build.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0c426f26
    • A
      [PATCH] frv: binfmt_elf_fdpic __user annotations · 530018bf
      Al Viro 提交于
      Add __user annotations to binfmt_elf_fdpic.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      530018bf
    • J
      [PATCH] lsm: add task_setioprio hook · 03e68060
      James Morris 提交于
      Implement an LSM hook for setting a task's IO priority, similar to the hook
      for setting a tasks's nice value.
      
      A previous version of this LSM hook was included in an older version of
      multiadm by Jan Engelhardt, although I don't recall it being submitted
      upstream.
      
      Also included is the corresponding SELinux hook, which re-uses the setsched
      permission in the proccess class.
      Signed-off-by: NJames Morris <jmorris@namei.org>
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jens Axboe <axboe@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      03e68060
    • O
      [PATCH] writeback: fix range handling · 111ebb6e
      OGAWA Hirofumi 提交于
      When a writeback_control's `start' and `end' fields are used to
      indicate a one-byte-range starting at file offset zero, the required
      values of .start=0,.end=0 mean that the ->writepages() implementation
      has no way of telling that it is being asked to perform a range
      request.  Because we're currently overloading (start == 0 && end == 0)
      to mean "this is not a write-a-range request".
      
      To make all this sane, the patch changes range of writeback_control.
      
      So caller does: If it is calling ->writepages() to write pages, it
      sets range (range_start/end or range_cyclic) always.
      
      And if range_cyclic is true, ->writepages() thinks the range is
      cyclic, otherwise it just uses range_start and range_end.
      
      This patch does,
      
          - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
            -1 is usually ok for range_end (type is long long). But, if someone did,
      
      		range_end += val;		range_end is "val - 1"
      		u64val = range_end >> bits;	u64val is "~(0ULL)"
      
            or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
            things, and uses LLONG_MAX for range_end.
      
          - All callers of ->writepages() sets range_start/end or range_cyclic.
      
          - Fix updates of ->writeback_index. It seems already bit strange.
            If it starts at 0 and ended by check of nr_to_write, this last
            index may reduce chance to scan end of file.  So, this updates
            ->writeback_index only if range_cyclic is true or whole-file is
            scanned.
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: "Vladimir V. Saveliev" <vs@namesys.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      111ebb6e
    • C
      [PATCH] tightening hugetlb strict accounting · a43a8c39
      Chen, Kenneth W 提交于
      Current hugetlb strict accounting for shared mapping always assume mapping
      starts at zero file offset and reserves pages between zero and size of the
      file.  This assumption often reserves (or lock down) a lot more pages then
      necessary if application maps at none zero file offset.  libhugetlbfs is
      one example that requires proper reservation on shared mapping starts at
      none zero offset.
      
      This patch extends the reservation and hugetlb strict accounting to support
      any arbitrary pair of (offset, len), resulting a much more robust and
      accurate scheme.  More importantly, it won't lock down any hugetlb pages
      outside file mapping.
      Signed-off-by: NKen Chen <kenneth.w.chen@intel.com>
      Acked-by: NAdam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a43a8c39
    • K
      [PATCH] for_each_possible_cpu: xfs · 6f0419e0
      KAMEZAWA Hiroyuki 提交于
      for_each_cpu() actually iterates across all possible CPUs.  We've had mistakes
      in the past where people were using for_each_cpu() where they should have been
      iterating across only online or present CPUs.  This is inefficient and
      possibly buggy.
      
      We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
      future.
      
      This patch replaces for_each_cpu with for_each_possible_cpu.
      in xfs.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NNathan Scott <nathans@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6f0419e0
    • D
      [PATCH] XFS: Use the dentry passed to statfs() to limit the scope of the results · d6938d1b
      David Howells 提交于
      Enable XFS to limit the statfs() results to the project quota covering the
      dentry used as a base for call.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NNathan Scott <nathans@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d6938d1b
    • D
      [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry · 726c3342
      David Howells 提交于
      Give the statfs superblock operation a dentry pointer rather than a superblock
      pointer.
      
      This complements the get_sb() patch.  That reduced the significance of
      sb->s_root, allowing NFS to place a fake root there.  However, NFS does
      require a dentry to use as a target for the statfs operation.  This permits
      the root in the vfsmount to be used instead.
      
      linux/mount.h has been added where necessary to make allyesconfig build
      successfully.
      
      Interest has also been expressed for use with the FUSE and XFS filesystems.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Nathan Scott <nathans@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      726c3342
    • D
      [PATCH] VFS: Permit filesystem to override root dentry on mount · 454e2398
      David Howells 提交于
      Extend the get_sb() filesystem operation to take an extra argument that
      permits the VFS to pass in the target vfsmount that defines the mountpoint.
      
      The filesystem is then required to manually set the superblock and root dentry
      pointers.  For most filesystems, this should be done with simple_set_mnt()
      which will set the superblock pointer and then set the root dentry to the
      superblock's s_root (as per the old default behaviour).
      
      The get_sb() op now returns an integer as there's now no need to return the
      superblock pointer.
      
      This patch permits a superblock to be implicitly shared amongst several mount
      points, such as can be done with NFS to avoid potential inode aliasing.  In
      such a case, simple_set_mnt() would not be called, and instead the mnt_root
      and mnt_sb would be set directly.
      
      The patch also makes the following changes:
      
       (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
           pointer argument and return an integer, so most filesystems have to change
           very little.
      
       (*) If one of the convenience function is not used, then get_sb() should
           normally call simple_set_mnt() to instantiate the vfsmount. This will
           always return 0, and so can be tail-called from get_sb().
      
       (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
           dcache upon superblock destruction rather than shrink_dcache_anon().
      
           This is required because the superblock may now have multiple trees that
           aren't actually bound to s_root, but that still need to be cleaned up. The
           currently called functions assume that the whole tree is rooted at s_root,
           and that anonymous dentries are not the roots of trees which results in
           dentries being left unculled.
      
           However, with the way NFS superblock sharing are currently set to be
           implemented, these assumptions are violated: the root of the filesystem is
           simply a dummy dentry and inode (the real inode for '/' may well be
           inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
           with child trees.
      
           [*] Anonymous until discovered from another tree.
      
       (*) The documentation has been adjusted, including the additional bit of
           changing ext2_* into foo_* in the documentation.
      
      [akpm@osdl.org: convert ipath_fs, do other stuff]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      454e2398
    • A
      [PATCH] prune_one_dentry() tweaks · d702ccb3
      Andrew Morton 提交于
      - Add description of d_lock handling to comments over prune_one_dentry().
      
      - It has three callsites - uninline it, saving 200 bytes of text.
      
      Cc: Jan Blunck <jblunck@suse.de>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Olaf Hering <olh@suse.de>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d702ccb3
    • N
      [PATCH] Fix dcache race during umount · 0feae5c4
      NeilBrown 提交于
      The race is that the shrink_dcache_memory shrinker could get called while a
      filesystem is being unmounted, and could try to prune a dentry belonging to
      that filesystem.
      
      If it does, then it will call in to iput on the inode while the dentry is
      no longer able to be found by the umounting process.  If iput takes a
      while, generic_shutdown_super could get all the way though
      shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without
      ever waiting on this particular inode.
      
      Eventually the superblock gets freed anyway and if the iput tried to touch
      it (which some filesystems certainly do), it will lose.  The promised
      "Self-destruct in 5 seconds" doesn't lead to a nice day.
      
      The race is closed by holding s_umount while calling prune_one_dentry on
      someone else's dentry.  As a down_read_trylock is used,
      shrink_dcache_memory will no longer try to prune the dentry of a filesystem
      that is being unmounted, and unmount will not be able to start until any
      such active prune_one_dentry completes.
      
      This requires that prune_dcache *knows* which filesystem (if any) it is
      doing the prune on behalf of so that it can be careful of other
      filesystems.  shrink_dcache_memory isn't called it on behalf of any
      filesystem, and so is careful of everything.
      
      shrink_dcache_anon is now passed a super_block rather than the s_anon list
      out of the superblock, so it can get the s_anon list itself, and can pass
      the superblock down to prune_dcache.
      
      If prune_dcache finds a dentry that it cannot free, it leaves it where it
      is (at the tail of the list) and exits, on the assumption that some other
      thread will be removing that dentry soon.  To try to make sure that some
      work gets done, a limited number of dnetries which are untouchable are
      skipped over while choosing the dentry to work on.
      
      I believe this race was first found by Kirill Korotaev.
      
      Cc: Jan Blunck <jblunck@suse.de>
      Acked-by: NKirill Korotaev <dev@openvz.org>
      Cc: Olaf Hering <olh@suse.de>
      Acked-by: NBalbir Singh <balbir@in.ibm.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NBalbir Singh <balbir@in.ibm.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0feae5c4
    • M
      [PATCH] remove steal_locks() · c89681ed
      Miklos Szeredi 提交于
      This patch removes the steal_locks() function.
      
      steal_locks() doesn't work correctly with any filesystem that does it's own
      lock management, including NFS, CIFS, etc.
      
      In addition it has weird semantics on local filesystems in case tasks
      sharing file-descriptor tables are doing POSIX locking operations in
      parallel to execve().
      
      The steal_locks() function has an effect on applications doing:
      
      clone(CLONE_FILES)
        /* in child */
        lock
        execve
        lock
      
      POSIX locks acquired before execve (by "child", "parent" or any further
      task sharing files_struct) will after the execve be owned exclusively by
      "child".
      
      According to Chris Wright some LSB/LTP kind of suite triggers without the
      stealing behavior, but there's no known real-world application that would
      also fail.
      
      Apps using NPTL are not affected, since all other threads are killed before
      execve.
      
      Apps using LinuxThreads are only affected if they
      
        - have multiple threads during exec (LinuxThreads doesn't kill other
          threads, the app may do it with pthread_kill_other_threads_np())
        - rely on POSIX locks being inherited across exec
      
      Both conditions are documented, but not their interaction.
      
      Apps using clone() natively are affected if they
      
        - use clone(CLONE_FILES)
        - rely on POSIX locks being inherited across exec
      
      The above scenarios are unlikely, but possible.
      
      If the patch is vetoed, there's a plan B, that involves mostly keeping the
      weird stealing semantics, but changing the way lock ownership is handled so
      that network and local filesystems work consistently.
      
      That would add more complexity though, so this solution seems to be
      preferred by most people.
      Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Steven French <sfrench@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c89681ed
    • O
      [PATCH] Fix a race condition between ->i_mapping and iput() · 09d967c6
      OGAWA Hirofumi 提交于
      This race became a cause of oops, and can reproduce by the following.
      
          while true; do
      	dd if=/dev/zero of=/dev/.static/dev/hdg1 bs=512 count=1000 & sync
          done
      
      This race condition was between __sync_single_inode() and iput().
      
                cpu0 (fs's inode)                 cpu1 (bdev's inode)
                -----------------                 -------------------
                                             close("/dev/hda2")
                                             [...]
      __sync_single_inode()
         /* copy the bdev's ->i_mapping */
         mapping = inode->i_mapping;
      
                                             generic_forget_inode()
                                                bdev_clear_inode()
      					     /* restre the fs's ->i_mapping */
      				             inode->i_mapping = &inode->i_data;
      				          /* bdev's inode was freed */
                                                destroy_inode(inode);
      
         if (wait) {
            /* dereference a freed bdev's mapping->host */
            filemap_fdatawait(mapping);  /* Oops */
      
      Since __sync_single_inode() is only taking a ref-count of fs's inode, the
      another process can be close() and freeing the bdev's inode while writing
      fs's inode.  So, __sync_signle_inode() accesses the freed ->i_mapping,
      oops.
      
      This patch takes a ref-count on the bdev's inode for the fs's inode before
      setting a ->i_mapping, and the clear_inode() of the fs's inode does iput() on
      the bdev's inode.  So if the fs's inode is still living, bdev's inode
      shouldn't be freed.
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      09d967c6
    • A
      [PATCH] NTFS: Critical bug fix (affects MIPS and possibly others) · f893afbe
      Anton Altaparmakov 提交于
      Many thanks to Pauline Ng for the detailed bug report and analysis!
      Signed-off-by: NAnton Altaparmakov <aia21@cantab.net>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f893afbe
  2. 22 6月, 2006 1 次提交
  3. 20 6月, 2006 11 次提交
  4. 19 6月, 2006 7 次提交
  5. 18 6月, 2006 2 次提交