1. 16 1月, 2011 4 次提交
    • D
      Unexport do_add_mount() and add in follow_automount(), not ->d_automount() · ea5b778a
      David Howells 提交于
      Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
      added rather than calling do_add_mount() itself.  follow_automount() will then
      do the addition.
      
      This slightly complicates things as ->d_automount() normally wants to add the
      new vfsmount to an expiration list and start an expiration timer.  The problem
      with that is that the vfsmount will be deleted if it has a refcount of 1 and
      the timer will not repeat if the expiration list is empty.
      
      To this end, we require the vfsmount to be returned from d_automount() with a
      refcount of (at least) 2.  One of these refs will be dropped unconditionally.
      In addition, follow_automount() must get a 3rd ref around the call to
      do_add_mount() lest it eat a ref and return an error, leaving the mount we
      have open to being expired as we would otherwise have only 1 ref on it.
      
      d_automount() should also add the the vfsmount to the expiration list (by
      calling mnt_set_expiry()) and start the expiration timer before returning, if
      this mechanism is to be used.  The vfsmount will be unlinked from the
      expiration list by follow_automount() if do_add_mount() fails.
      
      This patch also fixes the call to do_add_mount() for AFS to propagate the mount
      flags from the parent vfsmount.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ea5b778a
    • D
      Allow d_manage() to be used in RCU-walk mode · ab90911f
      David Howells 提交于
      Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
      as when it is in Ref-walk mode.  This permits __follow_mount_rcu() to call
      d_manage() directly.  d_manage() needs a parameter to indicate that it is in
      RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
      -ECHILD instead).
      
      autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
      accesses it and otherwise request dropping back to ref-walk mode.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ab90911f
    • D
      Add a dentry op to allow processes to be held during pathwalk transit · cc53ce53
      David Howells 提交于
      Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
      sleep when it tries to transit away from one of that filesystem's directories
      during a pathwalk.  The operation is keyed off a new dentry flag
      (DCACHE_MANAGE_TRANSIT).
      
      The filesystem is allowed to be selective about which processes it holds and
      which it permits to continue on or prohibits from transiting from each flagged
      directory.  This will allow autofs to hold up client processes whilst letting
      its userspace daemon through to maintain the directory or the stuff behind it
      or mounted upon it.
      
      The ->d_manage() dentry operation:
      
      	int (*d_manage)(struct path *path, bool mounting_here);
      
      takes a pointer to the directory about to be transited away from and a flag
      indicating whether the transit is undertaken by do_add_mount() or
      do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.
      
      It should return 0 if successful and to let the process continue on its way;
      -EISDIR to prohibit the caller from skipping to overmounted filesystems or
      automounting, and to use this directory; or some other error code to return to
      the user.
      
      ->d_manage() is called with namespace_sem writelocked if mounting_here is true
      and no other locks held, so it may sleep.  However, if mounting_here is true,
      it may not initiate or wait for a mount or unmount upon the parameter
      directory, even if the act is actually performed by userspace.
      
      Within fs/namei.c, follow_managed() is extended to check with d_manage() first
      on each managed directory, before transiting away from it or attempting to
      automount upon it.
      
      follow_down() is renamed follow_down_one() and should only be used where the
      filesystem deliberately intends to avoid management steps (e.g. autofs).
      
      A new follow_down() is added that incorporates the loop done by all other
      callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
      and CIFS do use it, their use is removed by converting them to use
      d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
      also takes an extra parameter to indicate if it is being called from mount code
      (with namespace_sem writelocked) which it passes to d_manage().  follow_down()
      ignores automount points so that it can be used to mount on them.
      
      __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
      DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
      sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
      that determine whether to abort or not itself.  That would allow the autofs
      daemon to continue on in rcu-walk mode.
      
      Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
      required as every tranist from that directory will cause d_manage() to be
      invoked.  It can always be set again when necessary.
      
      ==========================
      WHAT THIS MEANS FOR AUTOFS
      ==========================
      
      Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
      trigger the automounting of indirect mounts, and both of these can be called
      with i_mutex held.
      
      autofs knows that the i_mutex will be held by the caller in lookup(), and so
      can drop it before invoking the daemon - but this isn't so for d_revalidate(),
      since the lock is only held on _some_ of the code paths that call it.  This
      means that autofs can't risk dropping i_mutex from its d_revalidate() function
      before it calls the daemon.
      
      The bug could manifest itself as, for example, a process that's trying to
      validate an automount dentry that gets made to wait because that dentry is
      expired and needs cleaning up:
      
      	mkdir         S ffffffff8014e05a     0 32580  24956
      	Call Trace:
      	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
      	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
      	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
      	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
      	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
      	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
      	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
      	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4
      
      versus the automount daemon which wants to remove that dentry, but can't
      because the normal process is holding the i_mutex lock:
      
      	automount     D ffffffff8014e05a     0 32581      1              32561
      	Call Trace:
      	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
      	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
      	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
      	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
      	 [<ffffffff8005d229>] tracesys+0x71/0xe0
      	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
      
      which means that the system is deadlocked.
      
      This patch allows autofs to hold up normal processes whilst the daemon goes
      ahead and does things to the dentry tree behind the automouter point without
      risking a deadlock as almost no locks are held in d_manage() and none in
      d_automount().
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Was-Acked-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cc53ce53
    • D
      Add a dentry op to handle automounting rather than abusing follow_link() · 9875cf80
      David Howells 提交于
      Add a dentry op (d_automount) to handle automounting directories rather than
      abusing the follow_link() inode operation.  The operation is keyed off a new
      dentry flag (DCACHE_NEED_AUTOMOUNT).
      
      This also makes it easier to add an AT_ flag to suppress terminal segment
      automount during pathwalk and removes the need for the kludge code in the
      pathwalk algorithm to handle directories with follow_link() semantics.
      
      The ->d_automount() dentry operation:
      
      	struct vfsmount *(*d_automount)(struct path *mountpoint);
      
      takes a pointer to the directory to be mounted upon, which is expected to
      provide sufficient data to determine what should be mounted.  If successful, it
      should return the vfsmount struct it creates (which it should also have added
      to the namespace using do_add_mount() or similar).  If there's a collision with
      another automount attempt, NULL should be returned.  If the directory specified
      by the parameter should be used directly rather than being mounted upon,
      -EISDIR should be returned.  In any other case, an error code should be
      returned.
      
      The ->d_automount() operation is called with no locks held and may sleep.  At
      this point the pathwalk algorithm will be in ref-walk mode.
      
      Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
      added to handle mountpoints.  It will return -EREMOTE if the automount flag was
      set, but no d_automount() op was supplied, -ELOOP if we've encountered too many
      symlinks or mountpoints, -EISDIR if the walk point should be used without
      mounting and 0 if successful.  The path will be updated to point to the mounted
      filesystem if a successful automount took place.
      
      __follow_mount() is replaced by follow_managed() which is more generic
      (especially with the patch that adds ->d_manage()).  This handles transits from
      directories during pathwalk, including automounting and skipping over
      mountpoints (and holding processes with the next patch).
      
      __follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
      automount point with nothing mounted on it.
      
      follow_dotdot*() does not handle automounts as you don't want to trigger them
      whilst following "..".
      
      I've also extracted the mount/don't-mount logic from autofs4 and included it
      here.  It makes the mount go ahead anyway if someone calls open() or creat(),
      tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
      or sticks a '/' on the end of the pathname.  If they do a stat(), however,
      they'll only trigger the automount if they didn't also say O_NOFOLLOW.
      
      I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
      inodes as automount points.  This flag is automatically propagated to the
      dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate().  This saves NFS and could
      save AFS a private flag bit apiece, but is not strictly necessary.  It would be
      preferable to do the propagation in d_set_d_op(), but that doesn't normally
      have access to the inode.
      
      [AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu()
      succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after
      that, rather than just returning with ungrabbed *path]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Was-Acked-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9875cf80
  2. 14 1月, 2011 1 次提交
  3. 07 1月, 2011 5 次提交
  4. 03 12月, 2010 1 次提交
  5. 02 12月, 2010 1 次提交
    • L
      Call the filesystem back whenever a page is removed from the page cache · 6072d13c
      Linus Torvalds 提交于
      NFS needs to be able to release objects that are stored in the page
      cache once the page itself is no longer visible from the page cache.
      
      This patch adds a callback to the address space operations that allows
      filesystems to perform page cleanups once the page has been removed
      from the page cache.
      
      Original patch by: Linus Torvalds <torvalds@linux-foundation.org>
      [trondmy: cover the cases of invalidate_inode_pages2() and
                truncate_inode_pages()]
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      6072d13c
  6. 14 8月, 2010 1 次提交
  7. 28 5月, 2010 2 次提交
    • N
      fs: introduce new truncate sequence · 7bb46a67
      npiggin@suse.de 提交于
      Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
      setattr > vmtruncate > truncate, have filesystems call their truncate sequence
      from ->setattr if filesystem specific operations are required. vmtruncate is
      deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
      previously should be used.
      
      simple_setattr is introduced for simple in-ram filesystems to implement
      the new truncate sequence. Eventually all filesystems should be converted
      to implement a setattr, and the default code in notify_change should go
      away.
      
      simple_setsize is also introduced to perform just the ATTR_SIZE portion
      of simple_setattr (ie. changing i_size and trimming pagecache).
      
      To implement the new truncate sequence:
      - filesystem specific manipulations (eg freeing blocks) must be done in
        the setattr method rather than ->truncate.
      - vmtruncate can not be used by core code to trim blocks past i_size in
        the event of write failure after allocation, so this must be performed
        in the fs code.
      - convert usage of helpers block_write_begin, nobh_write_begin,
        cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
        variants. These avoid calling vmtruncate to trim blocks (see previous).
      - inode_setattr should not be used. generic_setattr is a new function
        to be used to copy simple attributes into the generic inode.
      - make use of the better opportunity to handle errors with the new sequence.
      
      Big problem with the previous calling sequence: the filesystem is not called
      until i_size has already changed.  This means it is not allowed to fail the
      call, and also it does not know what the previous i_size was. Also, generic
      code calling vmtruncate to truncate allocated blocks in case of error had
      no good way to return a meaningful error (or, for example, atomically handle
      block deallocation).
      
      Cc: Christoph Hellwig <hch@lst.de>
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7bb46a67
    • C
      drop unused dentry argument to ->fsync · 7ea80859
      Christoph Hellwig 提交于
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7ea80859
  8. 23 4月, 2010 1 次提交
  9. 10 12月, 2009 1 次提交
  10. 16 9月, 2009 1 次提交
    • A
      HWPOISON: Define a new error_remove_page address space op for async truncation · 25718736
      Andi Kleen 提交于
      Truncating metadata pages is not safe right now before
      we haven't audited all file systems.
      
      To enable truncation only for data address space define
      a new address_space callback error_remove_page.
      
      This is used for memory_failure.c memory error handling.
      
      This can be then set to truncate_inode_page()
      
      This patch just defines the new operation and adds documentation.
      
      Callers and users come in followon patches.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      25718736
  11. 21 4月, 2009 1 次提交
  12. 10 1月, 2009 1 次提交
    • T
      filesystem freeze: add error handling of write_super_lockfs/unlockfs · c4be0c1d
      Takashi Sato 提交于
      Currently, ext3 in mainline Linux doesn't have the freeze feature which
      suspends write requests.  So, we cannot take a backup which keeps the
      filesystem's consistency with the storage device's features (snapshot and
      replication) while it is mounted.
      
      In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
      and it would be used to get the consistent backup.
      
      If Linux's standard filesystem ext3 has the freeze feature, we can do it
      without a commercial filesystem.
      
      So I have implemented the ioctls of the freeze feature.
      I think we can take the consistent backup with the following steps.
      1. Freeze the filesystem with the freeze ioctl.
      2. Separate the replication volume or create the snapshot
         with the storage device's feature.
      3. Unfreeze the filesystem with the unfreeze ioctl.
      4. Take the backup from the separated replication volume
         or the snapshot.
      
      This patch:
      
      VFS:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that they can return an error.
      Rename write_super_lockfs and unlockfs of the super block operation
      freeze_fs and unfreeze_fs to avoid a confusion.
      
      ext3, ext4, xfs, gfs2, jfs:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that write_super_lockfs returns an error if needed,
      and unlockfs always returns 0.
      
      reiserfs:
      Changed the type of write_super_lockfs and unlockfs from "void"
      to "int" so that they always return 0 (success) to keep a current behavior.
      Signed-off-by: NTakashi Sato <t-sato@yk.jp.nec.com>
      Signed-off-by: NMasayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
      Cc: <xfs-masters@oss.sgi.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4be0c1d
  13. 01 1月, 2009 2 次提交
  14. 31 10月, 2008 1 次提交
  15. 27 7月, 2008 1 次提交
  16. 07 5月, 2008 1 次提交
    • C
      [PATCH] kill ->put_inode · 33dcdac2
      Christoph Hellwig 提交于
      And with that last patch to affs killing the last put_inode instance we
      can finally, after many years of transition kill this racy and awkward
      interface.
      
      (It's kinda funny that even the description in
      Documentation/filesystems/vfs.txt was entirely wrong..)
      
      Also remove a very misleading comment above the defintion of
      struct super_operations.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      33dcdac2
  17. 09 2月, 2008 1 次提交
    • M
      mount options: add documentation · f84e3f52
      Miklos Szeredi 提交于
      This series addresses the problem of showing mount options in
      /proc/mounts.
      
      Several filesystems which use mount options, have not implemented a
      .show_options superblock operation.  Several others have implemented
      this callback, but have not kept it fully up to date with the parsed
      options.
      
      Q: Why do we need correct option showing in /proc/mounts?
      A: We want /proc/mounts to fully replace /etc/mtab.  The reasons for
         this are:
          - unprivileged mounters won't be able to update /etc/mtab
          - /etc/mtab doesn't work with private mount namespaces
          - /etc/mtab can become out-of-sync with reality
      
      Q: Can't this be done, so that filesystems need not bother with
         implementing a .show_mounts callback, and keeping it up to date?
      A: Only in some cases.  Certain filesystems allow modification of a
         subset of options in their remount_fs method.  It is not possible
         to take this into account without knowing exactly how the
         filesystem handles options.
      
      For the simple case (no remount or remount resets all options) the
      patchset introduces two helpers:
      
        generic_show_options()
        save_mount_options()
      
      These can also be used to emulate the old /etc/mtab behavior, until
      proper support is added.  Even if this is not 100% correct, it's still
      better than showing no options at all.
      
      The following patches fix up most in-tree filesystems, some have been
      compile tested only, some have been reviewed and acked by the
      maintainer.
      
      Table displaying status of all in-kernel filesystems:
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      legend:
      
        none - fs has options, but doesn't define ->show_options()
        some - fs defines ->show_options(), but some only options are shown
        good - fs shows all options
        noopt - fs does not have options
        patch - a patch will be posted
        merged - a patch has been merged by subsystem maintainer
      
      9p          good
      adfs        patch
      affs        patch
      afs         patch
      autofs      patch
      autofs4     patch
      befs        patch
      bfs         noopt
      cifs        some
      coda        noopt
      configfs    noopt
      cramfs      noopt
      debugfs     noopt
      devpts      patch
      ecryptfs    good
      efs         noopt
      ext2        patch
      ext3        good
      ext4        merged
      fat         patch
      freevxfs    noopt
      fuse        patch
      fusectl	    noopt
      gfs2        good
      gfs2meta    noopt
      hfs         good
      hfsplus     good
      hostfs      patch
      hpfs        patch
      hppfs       noopt
      hugetlbfs   patch
      isofs       patch
      jffs2       noopt
      jfs         merged
      minix       noopt
      msdos       ->fat
      ncpfs       patch
      nfs         some
      nfsd        noopt
      ntfs        good
      ocfs2       good
      ocfs2/dlmfs noopt
      openpromfs  noopt
      proc        noopt
      qnx4        noopt
      ramfs       noopt
      reiserfs    patch
      romfs       noopt
      smbfs       good
      sysfs       noopt
      sysv        noopt
      udf         patch
      ufs         good
      vfat        ->fat
      xfs         good
      
      mm/shmem.c                                    patch
      drivers/oprofile/oprofilefs.c                 noopt
      drivers/infiniband/hw/ipath/ipath_fs.c        noopt
      drivers/misc/ibmasm/ibmasmfs.c                noopt
      drivers/usb/core (usbfs)                      merged
      drivers/usb/gadget (gadgetfs)                 noopt
      drivers/isdn/capi/capifs.c                    patch
      kernel/cpuset.c                               noopt
      fs/binfmt_misc.c                              noopt
      net/sunrpc/rpc_pipe.c                         noopt
      arch/powerpc/platforms/cell/spufs             patch
      arch/s390/hypfs                               good
      ipc/mqueue.c                                  noopt
      security (securityfs)                         noopt
      security/selinux/selinuxfs.c                  noopt
      kernel/cgroup.c                               good
      security/smack/smackfs.c                      noopt
      
      in -mm:
      
      reiser4     some
      unionfs     good
      - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      
      This patch:
      
      Document the rules for handling mount options in the .show_options
      super operation.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f84e3f52
  18. 08 2月, 2008 1 次提交
    • D
      iget: remove iget() and the read_inode() super op as being obsolete · 12debc42
      David Howells 提交于
      Remove the old iget() call and the read_inode() superblock operation it uses
      as these are really obsolete, and the use of read_inode() does not produce
      proper error handling (no distinction between ENOMEM and EIO when marking an
      inode bad).
      
      Furthermore, this removes the temptation to use iget() to find an inode by
      number in a filesystem from code outside that filesystem.
      
      iget_locked() should be used instead.  A new function is added in an earlier
      patch (iget_failed) that is to be called to mark an inode as bad, unlock it
      and release it should the get routine fail.  Mark iget() and read_inode() as
      being obsolete and remove references to them from the documentation.
      
      Typically a filesystem will be modified such that the read_inode function
      becomes an internal iget function, for example the following:
      
      	void thingyfs_read_inode(struct inode *inode)
      	{
      		...
      	}
      
      would be changed into something like:
      
      	struct inode *thingyfs_iget(struct super_block *sp, unsigned long ino)
      	{
      		struct inode *inode;
      		int ret;
      
      		inode = iget_locked(sb, ino);
      		if (!inode)
      			return ERR_PTR(-ENOMEM);
      		if (!(inode->i_state & I_NEW))
      			return inode;
      
      		...
      		unlock_new_inode(inode);
      		return inode;
      	error:
      		iget_failed(inode);
      		return ERR_PTR(ret);
      	}
      
      and then thingyfs_iget() would be called rather than iget(), for example:
      
      	ret = -EINVAL;
      	inode = iget(sb, ino);
      	if (!inode || is_bad_inode(inode))
      		goto error;
      
      becomes:
      
      	inode = thingyfs_iget(sb, ino);
      	if (IS_ERR(inode)) {
      		ret = PTR_ERR(inode);
      		goto error;
      	}
      
      Note that is_bad_inode() does not need to be called.  The error returned by
      thingyfs_iget() should render it unnecessary.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12debc42
  19. 20 10月, 2007 1 次提交
  20. 17 10月, 2007 2 次提交
  21. 17 7月, 2007 2 次提交
  22. 09 5月, 2007 1 次提交
    • E
      VFS: delay the dentry name generation on sockets and pipes · c23fbb6b
      Eric Dumazet 提交于
      1) Introduces a new method in 'struct dentry_operations'.  This method
         called d_dname() might be called from d_path() to build a pathname for
         special filesystems.  It is called without locks.
      
         Future patches (if we succeed in having one common dentry for all
         pipes/sockets) may need to change prototype of this method, but we now
         use : char *d_dname(struct dentry *dentry, char *buffer, int buflen);
      
      2) Adds a dynamic_dname() helper function that eases d_dname() implementations
      
      3) Defines d_dname method for sockets : No more sprintf() at socket
         creation.  This is delayed up to the moment someone does an access to
         /proc/pid/fd/...
      
      4) Defines d_dname method for pipes : No more sprintf() at pipe
         creation.  This is delayed up to the moment someone does an access to
         /proc/pid/fd/...
      
      A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
      *nice* speedup on my Pentium(M) 1.6 Ghz :
      
      3.090 s instead of 3.450 s
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Acked-by: NChristoph Hellwig <hch@infradead.org>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c23fbb6b
  23. 21 2月, 2007 1 次提交
    • N
      [PATCH] fs: fix libfs data leak · 955eff5a
      Nick Piggin 提交于
      simple_prepare_write leaks uninitialised kernel data.  This happens because
      the it leaves an uninitialised "hole" over the part of the page that the
      write is expected to go to.  This is fine, but it then marks the page
      uptodate, which means a concurrent read can come in and copy the
      uninitialised memory into userspace before it written to.
      
      Fix it by simply marking it uptodate in simple_commit_write instead, after
      the hole has been filled in.  This could theoretically break an fs that
      uses simple_prepare_write and not simple_commit_write, and that relies on
      the incorrect simple_prepare_write behaviour.  Luckily, none of those
      exists in the tree.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      955eff5a
  24. 04 10月, 2006 1 次提交
  25. 01 10月, 2006 1 次提交
  26. 11 7月, 2006 1 次提交
  27. 23 6月, 2006 2 次提交
    • D
      [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry · 726c3342
      David Howells 提交于
      Give the statfs superblock operation a dentry pointer rather than a superblock
      pointer.
      
      This complements the get_sb() patch.  That reduced the significance of
      sb->s_root, allowing NFS to place a fake root there.  However, NFS does
      require a dentry to use as a target for the statfs operation.  This permits
      the root in the vfsmount to be used instead.
      
      linux/mount.h has been added where necessary to make allyesconfig build
      successfully.
      
      Interest has also been expressed for use with the FUSE and XFS filesystems.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Nathan Scott <nathans@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      726c3342
    • D
      [PATCH] VFS: Permit filesystem to override root dentry on mount · 454e2398
      David Howells 提交于
      Extend the get_sb() filesystem operation to take an extra argument that
      permits the VFS to pass in the target vfsmount that defines the mountpoint.
      
      The filesystem is then required to manually set the superblock and root dentry
      pointers.  For most filesystems, this should be done with simple_set_mnt()
      which will set the superblock pointer and then set the root dentry to the
      superblock's s_root (as per the old default behaviour).
      
      The get_sb() op now returns an integer as there's now no need to return the
      superblock pointer.
      
      This patch permits a superblock to be implicitly shared amongst several mount
      points, such as can be done with NFS to avoid potential inode aliasing.  In
      such a case, simple_set_mnt() would not be called, and instead the mnt_root
      and mnt_sb would be set directly.
      
      The patch also makes the following changes:
      
       (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
           pointer argument and return an integer, so most filesystems have to change
           very little.
      
       (*) If one of the convenience function is not used, then get_sb() should
           normally call simple_set_mnt() to instantiate the vfsmount. This will
           always return 0, and so can be tail-called from get_sb().
      
       (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
           dcache upon superblock destruction rather than shrink_dcache_anon().
      
           This is required because the superblock may now have multiple trees that
           aren't actually bound to s_root, but that still need to be cleaned up. The
           currently called functions assume that the whole tree is rooted at s_root,
           and that anonymous dentries are not the roots of trees which results in
           dentries being left unculled.
      
           However, with the way NFS superblock sharing are currently set to be
           implemented, these assumptions are violated: the root of the filesystem is
           simply a dummy dentry and inode (the real inode for '/' may well be
           inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
           with child trees.
      
           [*] Anonymous until discovered from another tree.
      
       (*) The documentation has been adjusted, including the additional bit of
           changing ext2_* into foo_* in the documentation.
      
      [akpm@osdl.org: convert ipath_fs, do other stuff]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      454e2398
  28. 11 4月, 2006 1 次提交