  1. 09 Feb, 2008 1 commit
  2. 20 Oct, 2007 5 commits
  3. 17 Oct, 2007 1 commit
    • writeback: fix periodic superblock dirty inode flushing · 0e0f4fc2
      Ken Chen committed
      The current -mm tree has a bucketful of bug fixes in the periodic writeback path.
      However, we still hit a glitch where dirty pages on a given inode aren't
      completely flushed to the disk, and the system will accumulate a large amount of
      dirty pages beyond what dirty_expire_interval is designed for.
      
      The problem is that __sync_single_inode() will move an inode to the sb->s_dirty
      list even when there are more pending dirty pages on that inode.  If there is
      another inode with a small number of dirty pages, we hit a case where the loop
      iteration in wb_kupdate() terminates prematurely because wbc.nr_to_write > 0.
      This leaves the inode that has a large amount of dirty pages behind, and it has
      to wait for another dirty_writeback_interval before we flush it again.  We
      effectively only write out MAX_WRITEBACK_PAGES every dirty_writeback_interval.
      If the rate of dirtying is sufficiently high, the system will start to
      accumulate a large number of dirty pages.
      
      So fix it by having another sb->s_more_io list on which to park the inode
      while we iterate through sb->s_io and to allow each dirty inode which resides
      on that sb to have an equal chance of flushing some amount of dirty pages.
      Signed-off-by: Ken Chen <kenchen@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 17 Jul, 2007 1 commit
    • hugetlbfs: handle empty options string · b4c07bce
      Lee Schermerhorn committed
      I was seeing a null pointer deref in fs/super.c:vfs_kern_mount().
      Some file system get_sb() handler was returning NULL mnt_sb with
      a non-negative return value.  I also noticed a "hugetlbfs: Bad
      mount option:" message in the log.
      
      Turns out that hugetlbfs_parse_options() was not checking for an
      empty option string after call to strsep().  On failure,
      hugetlbfs_parse_options() returns 1.  hugetlbfs_fill_super() just
      passed this return code back up the call stack where
      vfs_kern_mount() missed the error and proceeded with a NULL mnt_sb.
      
      Apparently introduced by patch:
      	hugetlbfs-use-lib-parser-fix-docs.patch
      
      The problem was exposed by this line in my fstab:
      
      none        /huge       hugetlbfs   defaults    0 0
      
      It can also be demonstrated by invoking mount of hugetlbfs
      directly with no options or a bogus option.
      
      This patch:
      
      1) adds the check for empty option to hugetlbfs_parse_options(),
      2) enhances the error message to bracket any unrecognized
         option with quotes,
      3) modifies hugetlbfs_parse_options() to return -EINVAL on any
         unrecognized option,
      4) adds a BUG_ON() to vfs_kern_mount() to catch any get_sb()
         handler that returns a NULL mnt->mnt_sb with a return value
         >= 0.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 09 May, 2007 1 commit
    • add filesystem subtype support · 79c0b2df
      Miklos Szeredi committed
      There's a slight problem with filesystem type representation in fuse
      based filesystems.
      
      From the kernel's view, there are just two filesystem types: fuse and
      fuseblk.  From the user's view there are lots of different filesystem
      types.  The user is not even much concerned if the filesystem is fuse based
      or not.  So there's a conflict of interest in how this should be
      represented in fstab, mtab and /proc/mounts.
      
      The current scheme is to encode the real filesystem type in the mount
      source.  So an sshfs mount looks like this:
      
        sshfs#user@server:/   /mnt/server    fuse   rw,nosuid,nodev,...
      
      This url-ish syntax works OK for sshfs and similar filesystems.  However
      for block device based filesystems (ntfs-3g, zfs) it doesn't work, since
      the kernel expects the mount source to be a real device name.
      
      A possibly better scheme would be to encode the real type in the type
      field as "type.subtype".  So fuse mounts would look like this:
      
        /dev/hda1       /mnt/windows   fuseblk.ntfs-3g   rw,...
        user@server:/   /mnt/server    fuse.sshfs        rw,nosuid,nodev,...
      
      This patch adds the necessary code to the kernel so that this can be
      correctly displayed in /proc/mounts.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 28 Apr, 2007 1 commit
  7. 13 Feb, 2007 1 commit
  8. 12 Jan, 2007 1 commit
  9. 09 Dec, 2006 1 commit
  10. 04 Dec, 2006 1 commit
  11. 12 Oct, 2006 1 commit
    • [PATCH] VFS: Destroy the dentries contributed by a superblock on unmounting · c636ebdb
      David Howells committed
      The attached patch destroys all the dentries attached to a superblock in one go
      by:
      
       (1) Destroying the tree rooted at s_root.
      
       (2) Destroying every entry in the anon list, one at a time.
      
       (3) Each entry in the anon list has its subtree consumed from the leaves
           inwards.
      
      This reduces the amount of work generic_shutdown_super() does, and avoids
      iterating through the dentry_unused list.
      
      Note that locking is almost entirely absent in the shrink_dcache_for_umount*()
      functions added by this patch.  This is because:
      
       (1) at the point the filesystem calls generic_shutdown_super(), it is not
           permitted to further touch the superblock's set of dentries, and nor may
           it remove aliases from inodes;
      
       (2) the dcache memory shrinker now skips dentries that are being unmounted;
           and
      
       (3) the superblock no longer has any external references through which the VFS
           can reach it.
      
      Given these points, the only locking we need to do is when we remove dentries
      from the unused list and the name hashes, which we do a directory's worth at a
      time.
      
      We also don't need to guard against reference counts going to zero unexpectedly
      and removing bits of the tree we're working on as nothing else can call dput().
      
      A cut down version of dentry_iput() has been folded into
      shrink_dcache_for_umount_subtree() function.  Apart from not needing to unlock
      things, it also doesn't need to check for inotify watches.
      
      In this version of the patch, the complaint about a dentry still being in use
      has been expanded from a single BUG_ON() and now gives much more information.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: NeilBrown <neilb@suse.de>
      Acked-by: Ian Kent <raven@themaw.net>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 01 Oct, 2006 2 commits
    • [PATCH] BLOCK: Make it possible to disable the block layer [try #6] · 9361401e
      David Howells committed
      Make it possible to disable the block layer.  Not all embedded devices require
      it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
      the block layer to be present.
      
      This patch does the following:
      
       (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
           support.
      
       (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
           an item that uses the block layer.  This includes:
      
           (*) Block I/O tracing.
      
           (*) Disk partition code.
      
           (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
      
           (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
               block layer to do scheduling.  Some drivers that use SCSI facilities -
               such as USB storage - end up disabled indirectly from this.
      
           (*) Various block-based device drivers, such as IDE and the old CDROM
               drivers.
      
           (*) MTD blockdev handling and FTL.
      
           (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
               taking a leaf out of JFFS2's book.
      
       (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
           linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
           however, still used in places, and so is still available.
      
       (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
           parts of linux/fs.h.
      
       (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
      
       (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
      
       (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
           is not enabled.
      
       (*) fs/no-block.c is created to hold out-of-line stubs and things that are
           required when CONFIG_BLOCK is not set:
      
           (*) Default blockdev file operations (to give error ENODEV on opening).
      
       (*) Makes some /proc changes:
      
           (*) /proc/devices does not list any blockdevs.
      
           (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
      
       (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
      
       (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
           given command other than Q_SYNC or if a special device is specified.
      
       (*) In init/do_mounts.c, no reference is made to the blockdev routines if
           CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.
      
       (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
           error ENOSYS by way of cond_syscall if so).
      
       (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
           CONFIG_BLOCK is not set, since they can't then happen.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • [PATCH] BLOCK: Move functions out of buffer code [try #6] · cf9a2ae8
      David Howells committed
      Move some functions out of the buffering code that aren't strictly buffering
      specific.  This is a precursor to being able to disable the block layer.
      
       (*) Moved some stuff out of fs/buffer.c:
      
           (*) The file sync and general sync stuff moved to fs/sync.c.
      
           (*) The superblock sync stuff moved to fs/super.c.
      
           (*) do_invalidatepage() moved to mm/truncate.c.
      
           (*) try_to_release_page() moved to mm/filemap.c.
      
       (*) Moved some related declarations between header files:
      
           (*) declarations for do_invalidatepage() and try_to_release_page() moved
               to linux/mm.h.
      
           (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 30 Sep, 2006 1 commit
  14. 07 Sep, 2006 1 commit
  15. 04 Jul, 2006 2 commits
  16. 01 Jul, 2006 1 commit
  17. 23 Jun, 2006 3 commits
    • [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry · 726c3342
      David Howells committed
      Give the statfs superblock operation a dentry pointer rather than a superblock
      pointer.
      
      This complements the get_sb() patch.  That reduced the significance of
      sb->s_root, allowing NFS to place a fake root there.  However, NFS does
      require a dentry to use as a target for the statfs operation.  This permits
      the root in the vfsmount to be used instead.
      
      linux/mount.h has been added where necessary to make allyesconfig build
      successfully.
      
      Interest has also been expressed for use with the FUSE and XFS filesystems.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Nathan Scott <nathans@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] VFS: Permit filesystem to override root dentry on mount · 454e2398
      David Howells committed
      Extend the get_sb() filesystem operation to take an extra argument that
      permits the VFS to pass in the target vfsmount that defines the mountpoint.
      
      The filesystem is then required to manually set the superblock and root dentry
      pointers.  For most filesystems, this should be done with simple_set_mnt()
      which will set the superblock pointer and then set the root dentry to the
      superblock's s_root (as per the old default behaviour).
      
      The get_sb() op now returns an integer as there's now no need to return the
      superblock pointer.
      
      This patch permits a superblock to be implicitly shared amongst several mount
      points, such as can be done with NFS to avoid potential inode aliasing.  In
      such a case, simple_set_mnt() would not be called, and instead the mnt_root
      and mnt_sb would be set directly.
      
      The patch also makes the following changes:
      
       (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
           pointer argument and return an integer, so most filesystems have to change
           very little.
      
       (*) If one of the convenience functions is not used, then get_sb() should
           normally call simple_set_mnt() to instantiate the vfsmount. This will
           always return 0, and so can be tail-called from get_sb().
      
       (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
           dcache upon superblock destruction rather than shrink_dcache_anon().
      
           This is required because the superblock may now have multiple trees that
           aren't actually bound to s_root, but that still need to be cleaned up. The
           currently called functions assume that the whole tree is rooted at s_root,
           and that anonymous dentries are not the roots of trees which results in
           dentries being left unculled.
      
           However, with the way NFS superblock sharing is currently set to be
           implemented, these assumptions are violated: the root of the filesystem is
           simply a dummy dentry and inode (the real inode for '/' may well be
           inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
           with child trees.
      
           [*] Anonymous until discovered from another tree.
      
       (*) The documentation has been adjusted, including the additional bit of
           changing ext2_* into foo_* in the documentation.
      
      [akpm@osdl.org: convert ipath_fs, do other stuff]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix dcache race during umount · 0feae5c4
      NeilBrown committed
      The race is that the shrink_dcache_memory shrinker could get called while a
      filesystem is being unmounted, and could try to prune a dentry belonging to
      that filesystem.
      
      If it does, then it will call in to iput on the inode while the dentry is
      no longer able to be found by the umounting process.  If iput takes a
      while, generic_shutdown_super could get all the way through
      shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without
      ever waiting on this particular inode.
      
      Eventually the superblock gets freed anyway and if the iput tried to touch
      it (which some filesystems certainly do), it will lose.  The promised
      "Self-destruct in 5 seconds" doesn't lead to a nice day.
      
      The race is closed by holding s_umount while calling prune_one_dentry on
      someone else's dentry.  As a down_read_trylock is used,
      shrink_dcache_memory will no longer try to prune the dentry of a filesystem
      that is being unmounted, and unmount will not be able to start until any
      such active prune_one_dentry completes.
      
      This requires that prune_dcache *knows* which filesystem (if any) it is
      doing the prune on behalf of so that it can be careful of other
      filesystems.  shrink_dcache_memory isn't called on behalf of any
      filesystem, and so is careful of everything.
      
      shrink_dcache_anon is now passed a super_block rather than the s_anon list
      out of the superblock, so it can get the s_anon list itself, and can pass
      the superblock down to prune_dcache.
      
      If prune_dcache finds a dentry that it cannot free, it leaves it where it
      is (at the tail of the list) and exits, on the assumption that some other
      thread will be removing that dentry soon.  To try to make sure that some
      work gets done, a limited number of dentries which are untouchable are
      skipped over while choosing the dentry to work on.
      
      I believe this race was first found by Kirill Korotaev.
      
      Cc: Jan Blunck <jblunck@suse.de>
      Acked-by: Kirill Korotaev <dev@openvz.org>
      Cc: Olaf Hering <olh@suse.de>
      Acked-by: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Balbir Singh <balbir@in.ibm.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  18. 09 Jun, 2006 2 commits
  19. 27 Mar, 2006 1 commit
    • [PATCH] sem2mutex: fs/ · 353ab6e9
      Ingo Molnar committed
      Semaphore to mutex conversion.
      
      The conversion was generated via scripts, and the result was validated
      automatically via a script as well.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
      Cc: Robert Love <rml@tech9.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Neil Brown <neilb@cse.unsw.edu.au>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  20. 26 Mar, 2006 1 commit
  21. 24 Mar, 2006 1 commit
    • [PATCH] vfs: MS_VERBOSE should be MS_SILENT · 9b04c997
      Theodore Ts'o committed
      The meaning of MS_VERBOSE is backwards; if the bit is set, it really means,
      "don't be verbose".  This is confusing and counter-intuitive.
      
      In addition, there is also no way to set the MS_VERBOSE flag in the
      mount(8) program in util-linux, but interestingly, it does define options
      which would do the right thing if MS_SILENT were defined, which
      unfortunately we do not:
      
      #ifdef MS_SILENT
        { "quiet",    0, 0, MS_SILENT    },   /* be quiet  */
        { "loud",     0, 1, MS_SILENT    },   /* print out messages. */
      #endif
      
      So the obvious fix is to deprecate the use of MS_VERBOSE and replace it
      with MS_SILENT.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  22. 23 Mar, 2006 3 commits
  23. 23 Feb, 2006 1 commit
    • Revert mount/umount uevent removal · fa675765
      Greg Kroah-Hartman committed
      This change reverts the 033b96fd commit
      from Kay Sievers that removed the mount/umount uevents from the kernel.
      Some older versions of HAL still depend on these events to detect when a
      new device has been mounted.  These events are not correctly emitted,
      and are broken by design, and so, should not be relied upon by any
      future program.  Instead, the /proc/mounts file should be polled to
      properly detect this kind of event.
      
      A feature-removal-schedule.txt entry has been added, noting when this
      interface will be removed from the kernel.
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  24. 08 Feb, 2006 1 commit
  25. 10 Jan, 2006 1 commit
  26. 09 Jan, 2006 1 commit
  27. 05 Jan, 2006 1 commit
  28. 08 Nov, 2005 1 commit
    • [PATCH] saner handling of auto_acct_off() and DQUOT_OFF() in umount · 7b7b1ace
      Al Viro committed
      The way we currently deal with quota and process accounting that might
      keep vfsmount busy at umount time is inherently broken; we try to turn
      them off just in case (not quite correctly, at that) and
      
        a) pray umount doesn't fail (otherwise they'll stay turned off)
        b) pray nobody does anything funny just as we turn quota off
      
      Moreover, LSM provides hooks for doing the same sort of broken logic.
      
      The proper way to deal with that is to introduce the second kind of
      reference to vfsmount.  Semantics:
      
       - when the last normal reference is dropped, all special ones are
         converted to normal ones and if there had been any, cleanup is done.
       - normal reference can be cloned into a special one
       - special reference can be converted to normal one; that's a no-op if
         we'd already passed the point of no return (i.e.  mntput() had
         converted special references to normal and started cleanup).
      
      The way it works: e.g. starting process accounting converts the vfsmount
      reference pinned by the opened file into special one and turns it back
      to normal when it gets shut down; acct_auto_close() is done when no
      normal references are left.  That way it does *not* obstruct umount(2)
      and it silently gets turned off when the last normal reference to
      vfsmount is gone.  Which is exactly what we want...
      
      The same should be done by LSM module that holds some internal
      references to vfsmount and wants to shut them down on umount - it should
      make them special and security_sb_umount_close() will be called exactly
      when the last normal reference to vfsmount is gone.
      
      quota handling is even simpler - we don't use normal file IO anymore, so
      there's no need to hold vfsmounts at all.  DQUOT_OFF() is done from
      deactivate_super(), where it really belongs.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  29. 07 Nov, 2005 1 commit