1. 16 1月, 2011 1 次提交
    • D
      Add a dentry op to allow processes to be held during pathwalk transit · cc53ce53
      David Howells 提交于
      Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
      sleep when it tries to transit away from one of that filesystem's directories
      during a pathwalk.  The operation is keyed off a new dentry flag
      (DCACHE_MANAGE_TRANSIT).
      
      The filesystem is allowed to be selective about which processes it holds and
      which it permits to continue on or prohibits from transiting from each flagged
      directory.  This will allow autofs to hold up client processes whilst letting
      its userspace daemon through to maintain the directory or the stuff behind it
      or mounted upon it.
      
      The ->d_manage() dentry operation:
      
      	int (*d_manage)(struct path *path, bool mounting_here);
      
      takes a pointer to the directory about to be transited away from and a flag
      indicating whether the transit is undertaken by do_add_mount() or
      do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.
      
      It should return 0 if successful and to let the process continue on its way;
      -EISDIR to prohibit the caller from skipping to overmounted filesystems or
      automounting, and to use this directory; or some other error code to return to
      the user.
      
      ->d_manage() is called with namespace_sem writelocked if mounting_here is true
      and no other locks held, so it may sleep.  However, if mounting_here is true,
      it may not initiate or wait for a mount or unmount upon the parameter
      directory, even if the act is actually performed by userspace.
      
      Within fs/namei.c, follow_managed() is extended to check with d_manage() first
      on each managed directory, before transiting away from it or attempting to
      automount upon it.
      
      follow_down() is renamed follow_down_one() and should only be used where the
      filesystem deliberately intends to avoid management steps (e.g. autofs).
      
      A new follow_down() is added that incorporates the loop done by all other
      callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
      and CIFS do use it, their use is removed by converting them to use
      d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
      also takes an extra parameter to indicate if it is being called from mount code
      (with namespace_sem writelocked) which it passes to d_manage().  follow_down()
      ignores automount points so that it can be used to mount on them.
      
      __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
      DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
      sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
      that determine whether to abort or not itself.  That would allow the autofs
      daemon to continue on in rcu-walk mode.
      
      Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
      required as every tranist from that directory will cause d_manage() to be
      invoked.  It can always be set again when necessary.
      
      ==========================
      WHAT THIS MEANS FOR AUTOFS
      ==========================
      
      Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
      trigger the automounting of indirect mounts, and both of these can be called
      with i_mutex held.
      
      autofs knows that the i_mutex will be held by the caller in lookup(), and so
      can drop it before invoking the daemon - but this isn't so for d_revalidate(),
      since the lock is only held on _some_ of the code paths that call it.  This
      means that autofs can't risk dropping i_mutex from its d_revalidate() function
      before it calls the daemon.
      
      The bug could manifest itself as, for example, a process that's trying to
      validate an automount dentry that gets made to wait because that dentry is
      expired and needs cleaning up:
      
      	mkdir         S ffffffff8014e05a     0 32580  24956
      	Call Trace:
      	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
      	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
      	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
      	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
      	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
      	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
      	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
      	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4
      
      versus the automount daemon which wants to remove that dentry, but can't
      because the normal process is holding the i_mutex lock:
      
      	automount     D ffffffff8014e05a     0 32581      1              32561
      	Call Trace:
      	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
      	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
      	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
      	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
      	 [<ffffffff8005d229>] tracesys+0x71/0xe0
      	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
      
      which means that the system is deadlocked.
      
      This patch allows autofs to hold up normal processes whilst the daemon goes
      ahead and does things to the dentry tree behind the automouter point without
      risking a deadlock as almost no locks are held in d_manage() and none in
      d_automount().
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Was-Acked-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cc53ce53
  2. 07 1月, 2011 1 次提交
  3. 20 12月, 2010 1 次提交
  4. 17 12月, 2010 1 次提交
  5. 26 10月, 2010 1 次提交
  6. 27 8月, 2010 1 次提交
  7. 10 8月, 2010 1 次提交
    • C
      pass a struct path to vfs_statfs · ebabe9a9
      Christoph Hellwig 提交于
      We'll need the path to implement the flags field for statvfs support.
      We do have it available in all callers except:
      
       - ecryptfs_statfs.  This one doesn't actually need vfs_statfs but just
         needs to do a caller to the lower filesystem statfs method.
       - sys_ustat.  Add a non-exported statfs_by_dentry helper for it which
         doesn't won't be able to fill out the flags field later on.
      
      In addition rename the helpers for statfs vs fstatfs to do_*statfs instead
      of the misleading vfs prefix.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ebabe9a9
  8. 31 7月, 2010 1 次提交
  9. 30 7月, 2010 1 次提交
  10. 28 7月, 2010 2 次提交
    • E
      fsnotify: pass a file instead of an inode to open, read, and write · 2a12a9d7
      Eric Paris 提交于
      fanotify, the upcoming notification system actually needs a struct path so it can
      do opens in the context of listeners, and it needs a file so it can get f_flags
      from the original process.  Close was the only operation that already was passing
      a struct file to the notification hook.  This patch passes a file for access,
      modify, and open as well as they are easily available to these hooks.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      2a12a9d7
    • J
      nfsd: bypass readahead cache when have struct file · fa0a2126
      J. Bruce Fields 提交于
      The readahead cache compensates for the fact that the NFS server
      currently does an open and close on every IO operation in the NFSv2 and
      NFSv3 case.
      
      In the NFSv4 case we have long-lived struct files associated with client
      opens, so there's no need for this.  In fact, concurrent IO's using
      trying to modify the same file->f_ra may cause problems.
      
      So, don't bother with the readahead cache in that case.
      
      Note eventually we'll likely do this in the v2/v3 case as well by
      keeping a cache of struct files instead of struct file_ra_state's.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      fa0a2126
  11. 23 7月, 2010 1 次提交
  12. 07 7月, 2010 1 次提交
  13. 02 6月, 2010 1 次提交
  14. 22 5月, 2010 1 次提交
  15. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  16. 23 3月, 2010 1 次提交
    • J
      nfsd: don't break lease while servicing a COMMIT · 91885258
      Jeff Layton 提交于
      This is the second attempt to fix the problem whereby a COMMIT call
      causes a lease break and triggers a possible deadlock.
      
      The problem is that nfsd attempts to break a lease on a COMMIT call.
      This triggers a delegation recall if the lease is held for a delegation.
      If the client is the one holding the delegation and it's the same one on
      which it's issuing the COMMIT, then it can't return that delegation
      until the COMMIT is complete. But, nfsd won't complete the COMMIT until
      the delegation is returned. The client and server are essentially
      deadlocked until the state is marked bad (due to the client not
      responding on the callback channel).
      
      The first patch attempted to deal with this by eliminating the open of
      the file altogether and simply had nfsd_commit pass a NULL file pointer
      to the vfs_fsync_range. That would conflict with some work in progress
      by Christoph Hellwig to clean up the fsync interface, so this patch
      takes a different approach.
      
      This declares a new NFSD_MAY_NOT_BREAK_LEASE access flag that indicates
      to nfsd_open that it should not break any leases when opening the file,
      and has nfsd_commit set that flag on the nfsd_open call.
      
      For now, this patch leaves nfsd_commit opening the file with write
      access since I'm not clear on what sort of access would be more
      appropriate.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      91885258
  17. 05 3月, 2010 1 次提交
    • C
      dquot: move dquot initialization responsibility into the filesystem · 907f4554
      Christoph Hellwig 提交于
      Currently various places in the VFS call vfs_dq_init directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the initialization.   For most metadata operations
      this is a straight forward move into the methods, but for truncate and
      open it's a bit more complicated.
      
      For truncate we currently only call vfs_dq_init for the sys_truncate case
      because open already takes care of it for ftruncate and open(O_TRUNC) - the
      new code causes an additional vfs_dq_init for those which is harmless.
      
      For open the initialization is moved from do_filp_open into the open method,
      which means it happens slightly earlier now, and only for regular files.
      The latter is fine because we don't need to initialize it for operations
      on special files, and we already do it as part of the namespace operations
      for directories.
      
      Add a dquot_file_open helper that filesystems that support generic quotas
      can use to fill in ->open.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      907f4554
  18. 04 3月, 2010 1 次提交
  19. 21 2月, 2010 1 次提交
    • B
      commit_metadata export operation replacing nfsd_sync_dir · f501912a
      Ben Myers 提交于
      - Add commit_metadata export_operation to allow the underlying filesystem to
      decide how to commit an inode most efficiently.
      
      - Usage of nfsd_sync_dir and write_inode_now has been replaced with the
      commit_metadata function that takes a svc_fh.
      
      - The commit_metadata function calls the commit_metadata export_op if it's
      there, or else falls back to sync_inode instead of fsync and write_inode_now
      because only metadata need be synced here.
      
      - nfsd4_sync_rec_dir now uses vfs_fsync so that commit_metadata can be static
      Signed-off-by: NBen Myers <bpm@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      f501912a
  20. 20 2月, 2010 1 次提交
  21. 07 2月, 2010 3 次提交
  22. 30 1月, 2010 1 次提交
  23. 13 1月, 2010 1 次提交
  24. 07 1月, 2010 1 次提交
  25. 17 12月, 2009 2 次提交
  26. 16 12月, 2009 5 次提交
  27. 15 12月, 2009 2 次提交
  28. 14 11月, 2009 1 次提交
  29. 29 9月, 2009 2 次提交
  30. 22 9月, 2009 1 次提交