1. 20 Jul, 2011 3 commits
    • make exec_permission(dir) really equivalent to inode_permission(dir, MAY_EXEC) · 4cf27141
      Al Viro committed
      capability overrides apply only to the default case; if fs has ->permission()
      that does _not_ call generic_permission(), we have no business doing them.
      Moreover, if it has ->permission() that does call generic_permission(), we
      have no need to recheck capabilities.
      
      Besides, the capability overrides should apply only if we got EACCES from
      acl_permission_check(); any other value (-EIO, etc.) should be returned
      to caller, capabilities or not capabilities.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
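The rule in the commit message above can be sketched in plain C. This is a minimal userspace model, not the kernel code: `struct sketch_inode`, `sketch_acl_check()` and `sketch_exec_permission()` are hypothetical names, and the capability test is reduced to a boolean argument.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; only what the sketch needs. */
struct sketch_inode {
	/* NULL models a filesystem without its own ->permission() */
	int (*permission)(const struct sketch_inode *inode);
	int acl_result;		/* what the default mode/ACL check returns */
};

/* Models acl_permission_check(): 0, -EACCES, or another error (-EIO...). */
static int sketch_acl_check(const struct sketch_inode *inode)
{
	return inode->acl_result;
}

/* Directory search check following the rules stated in the commit. */
static int sketch_exec_permission(const struct sketch_inode *dir, bool has_cap)
{
	int ret;

	/* The fs has its own method: trust it, no capability recheck. */
	if (dir->permission)
		return dir->permission(dir);

	ret = sketch_acl_check(dir);
	if (ret != -EACCES)
		return ret;	/* success or a real error: pass it through */

	/* Only a plain -EACCES may be overridden by a capability. */
	return has_cap ? 0 : -EACCES;
}
```

With `acl_result` set to `-EIO`, the function returns `-EIO` whether or not `has_cap` is true, which is the behaviour the last paragraph of the message asks for.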
    • fs: add a DCACHE_NEED_LOOKUP flag for d_flags · 44396f4b
      Josef Bacik committed
      Btrfs (and I'd venture most other fs's) stores its indexes in nice disk order
      for readdir, but unfortunately in the case of anything that stats the files in
      order that readdir spits back (like oh say ls) that means we still have to do
      the normal lookup of the file, which means looking up our other index and then
      looking up the inode.  What I want is a way to create dummy dentries when we
      find them in readdir so that when ls or anything else subsequently does a
      stat(), we already have the location information in the dentry and can go
      straight to the inode itself.  The lookup stuff just assumes that if it finds a
      dentry it is done, it doesn't perform a lookup.  So add a DCACHE_NEED_LOOKUP
      flag so that the lookup code knows it still needs to run i_op->lookup() on the
      parent to get the inode for the dentry.  I have tested this with btrfs and I
      went from something that looks like this
      
      http://people.redhat.com/jwhiter/ls-noreada.png
      
      To this
      
      http://people.redhat.com/jwhiter/ls-good.png
      
      That's a savings of 1300 seconds, or about 22 minutes.  That is a significant savings.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
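The idea of a dentry that is cached but still needs its filesystem lookup can be illustrated with a small userspace model. Everything here is an assumption made for illustration: `SKETCH_NEED_LOOKUP` does not use the kernel's flag value, and `struct sketch_dentry`, `sketch_fs_lookup()` and `sketch_cache_hit()` are hypothetical names.

```c
#define SKETCH_NEED_LOOKUP 0x1u	/* hypothetical value, not the kernel's */

struct sketch_dentry {
	unsigned int flags;
	unsigned long location;	/* on-disk location recorded at readdir time */
	int inode_loaded;	/* nonzero once the inode is attached */
	int lookups;		/* counts calls to the fs lookup hook */
};

/* Models i_op->lookup(): uses the cached location to attach the inode. */
static void sketch_fs_lookup(struct sketch_dentry *d)
{
	d->lookups++;
	d->inode_loaded = 1;
}

/* Models the cache-hit path: a found dentry is final unless flagged. */
static struct sketch_dentry *sketch_cache_hit(struct sketch_dentry *d)
{
	if (d->flags & SKETCH_NEED_LOOKUP) {
		sketch_fs_lookup(d);		/* finish the deferred lookup */
		d->flags &= ~SKETCH_NEED_LOOKUP;	/* later hits skip this */
	}
	return d;
}
```

A dentry created from readdir starts with the flag set; the first hit completes the lookup and later hits return the dentry directly.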
    • vfs: fix race in rcu lookup of pruned dentry · 59430262
      Linus Torvalds committed
      Don't update *inode in __follow_mount_rcu() until we've verified that
      there is a mountpoint there.  Kudos to Hugh Dickins for catching that
      one in the first place and eventually figuring out the solution (and
      catching a braino in the earlier version of patch).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 13 Jul, 2011 1 commit
  3. 20 Jun, 2011 2 commits
  4. 16 Jun, 2011 2 commits
    • VFS: Fix vfsmount overput on simultaneous automount · 8aef1884
      Al Viro committed
      [Kudos to dhowells for tracking that crap down]
      
      If two processes attempt to cause automounting on the same mountpoint at the
      same time, the vfsmount holding the mountpoint will be left with one too few
      references on it, causing a BUG when the kernel tries to clean up.
      
      The problem is that lock_mount() drops the caller's reference to the
      mountpoint's vfsmount in the case where it finds something already mounted on
      the mountpoint as it transits to the mounted filesystem and replaces path->mnt
      with the new mountpoint vfsmount.
      
      During a pathwalk, however, we don't take a reference on the vfsmount if it is
      the same as the one in the nameidata struct, but do_add_mount() doesn't know
      this.
      
      The fix is to make sure we have a ref on the vfsmount of the mountpoint before
      calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
      left with an extra ref on the mountpoint vfsmount which needs releasing.
      We can handle that in follow_managed() by not making assumptions about what
      we can and what we cannot get from lookup_mnt() as the current code does.
      
      The callers of follow_managed() expect that reference to path->mnt will be
      grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
      keep track of whether such reference has been grabbed and assume that it'll
      happen in those and only those cases that'll have us return with changed
      path->mnt.  That assumption is almost correct - it breaks in case of
      racing automounts and in even harder to hit race between following a mountpoint
      and a couple of mount --move.  The thing is, we don't need to make that
      assumption at all - after the end of the loop in follow_managed() we can check
      if path->mnt has ended up unchanged and do mntput() if needed.
      
      The BUG can be reproduced with the following test program:
      
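      	#include <stdio.h>
      	#include <sys/types.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <sys/wait.h>
      	int main(int argc, char **argv)
      	{
      		int pid, ws;
      		struct stat buf;
      		if (argc < 2) return 1;	/* need a path to stat */
      		pid = fork();
      		/* parent and child both stat the path at the same time */
      		stat(argv[1], &buf);
      		if (pid > 0) wait(&ws);	/* parent reaps the child */
      		return 0;
      	}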
      
      and the following procedure:
      
       (1) Mount an NFS volume that on the server has something else mounted on a
           subdirectory.  For instance, I can mount / from my server:
      
      	mount warthog:/ /mnt -t nfs4 -r
      
           On the server /data has another filesystem mounted on it, so NFS will see
           a change in FSID as it walks down the path, and will mark /mnt/data as
           being a mountpoint.  This will cause the automount code to be triggered.
      
           !!! Do not look inside the mounted fs at this point !!!
      
       (2) Run the above program on a file within the submount to generate two
           simultaneous automount requests:
      
      	/tmp/forkstat /mnt/data/testfile
      
       (3) Unmount the automounted submount:
      
      	umount /mnt/data
      
       (4) Unmount the original mount:
      
      	umount /mnt
      
           At this point the kernel should throw a BUG with something like the
           following:
      
      	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]
      
      Note that the bug appears on the root dentry of the original mount, not the
      mountpoint and not the submount because sys_umount() hasn't got to its final
      mntput_no_expire() yet, but this isn't so obvious from the call trace:
      
       [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
       [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
       [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
       [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
       [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
       [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
       [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
       [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
       [<ffffffff811862a8>] mntput+0x3b/0x44
       [<ffffffff81186d87>] release_mounts+0xa2/0xbf
       [<ffffffff811876af>] sys_umount+0x47a/0x4ba
       [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
       [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b
      
      as do_umount() is inlined.  However, you can see release_mounts() in there.
      
      Note also that it may be necessary to have multiple CPU cores to be able to
      trigger this bug.
      Tested-by: Jeff Layton <jlayton@redhat.com>
      Tested-by: Ian Kent <raven@themaw.net>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
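The reference-counting rule described in the message (take a reference before the mount attempt, then drop it again if path->mnt ended up unchanged) can be modelled in a few lines of C. This is a heavily simplified sketch under stated assumptions: `struct sketch_mount`, `struct sketch_path` and `sketch_follow()` are hypothetical, and the whole transit is collapsed into a single pointer argument.

```c
struct sketch_mount { int refcount; };
struct sketch_path { struct sketch_mount *mnt; };

static void sketch_mntget(struct sketch_mount *m) { m->refcount++; }
static void sketch_mntput(struct sketch_mount *m) { m->refcount--; }

/*
 * Models one walk step. 'transit' is the mount we cross into, or a
 * null pointer when no transit happens.
 */
static void sketch_follow(struct sketch_path *path, struct sketch_mount *transit)
{
	struct sketch_mount *orig = path->mnt;

	/* Take our own ref first, so a transit may safely consume it. */
	sketch_mntget(path->mnt);

	if (transit) {
		/* The transit drops the ref on the old mount and hands
		 * back the new mount with a reference of its own. */
		sketch_mntget(transit);
		sketch_mntput(path->mnt);
		path->mnt = transit;
	}

	/* No assumptions: if nothing changed, release the extra ref. */
	if (path->mnt == orig)
		sketch_mntput(path->mnt);
}
```

In both cases the original mount's count is balanced, and the caller holds a new reference only when `path->mnt` has changed, which is the contract the callers of follow_managed() rely on.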
    • fix wrong iput on d_inode introduced by e6bc45d6 · 50338b88
      Török Edwin committed
      Git bisection shows that commit e6bc45d6 causes
      BUG_ONs under high I/O load:
      
      kernel BUG at fs/inode.c:1368!
      [ 2862.501007] Call Trace:
      [ 2862.501007]  [<ffffffff811691d8>] d_kill+0xf8/0x140
      [ 2862.501007]  [<ffffffff81169c19>] dput+0xc9/0x190
      [ 2862.501007]  [<ffffffff8115577f>] fput+0x15f/0x210
      [ 2862.501007]  [<ffffffff81152171>] filp_close+0x61/0x90
      [ 2862.501007]  [<ffffffff81152251>] sys_close+0xb1/0x110
      [ 2862.501007]  [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b
      
      A reliable way to reproduce this bug is:
      Log in to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk,
      and apt-get remove openjdk-6-jdk.
      
      The buggy part of the patch is this:
      	struct inode *inode = NULL;
      .....
      -               if (nd.last.name[nd.last.len])
      -                       goto slashes;
                      inode = dentry->d_inode;
      -               if (inode)
      -                       ihold(inode);
      +               if (nd.last.name[nd.last.len] || !inode)
      +                       goto slashes;
      +               ihold(inode);
      ...
      	if (inode)
      		iput(inode);	/* truncate the inode here */
      
      If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken),
      and dentry->d_inode is non-NULL, then this code now does an additional iput on
      the inode, which is wrong.
      
      Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.
      
      Reference: https://lkml.org/lkml/2011/6/15/50
      Reported-by: Norbert Preining <preining@logic.at>
      Reported-by: Török Edwin <edwintorok@gmail.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Török Edwin <edwintorok@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
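The hold/put imbalance described above can be reproduced in a toy model. The sketch below is an assumption-laden illustration, not the kernel function: `struct toy_inode`, `toy_ihold()`, `toy_iput()` and `toy_unlink_tail()` are hypothetical, and the `fixed` argument switches between the buggy ordering and the one the fix describes.

```c
#include <stddef.h>

struct toy_inode { int count; };

static void toy_ihold(struct toy_inode *i) { i->count++; }
static void toy_iput(struct toy_inode *i) { i->count--; }

/*
 * 'trailing' is nonzero when the last component is followed by
 * slashes (the "goto slashes" case). Returns the net change in the
 * inode's reference count.
 */
static int toy_unlink_tail(struct toy_inode *d_inode, int trailing, int fixed)
{
	int before = d_inode ? d_inode->count : 0;
	struct toy_inode *inode = NULL;

	if (fixed) {
		/* Fixed: remember the inode only on the path that holds it. */
		if (!trailing && d_inode) {
			inode = d_inode;
			toy_ihold(inode);
		}
	} else {
		/* Buggy: inode is set before the slashes check. */
		inode = d_inode;
		if (!trailing && inode)
			toy_ihold(inode);
	}

	/* Common exit path: put whatever 'inode' points at. */
	if (inode)
		toy_iput(inode);

	return d_inode ? d_inode->count - before : 0;
}
```

With trailing slashes and a non-NULL inode, the buggy ordering ends one reference short, while the fixed ordering leaves the count unchanged.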
  5. 07 Jun, 2011 1 commit
  6. 30 May, 2011 1 commit
  7. 27 May, 2011 3 commits
  8. 26 May, 2011 11 commits
  9. 21 May, 2011 1 commit
  10. 14 May, 2011 1 commit
    • vfs: micro-optimize acl_permission_check() · 26cf46be
      Linus Torvalds committed
      It's a hot function, and we're better off not mixing types in the mask
      calculations.  The compiler just ends up mixing 16-bit and 32-bit
      operations, for no good reason.
      
      So do everything in 'unsigned int' rather than mixing 'unsigned int'
      masking with a 'umode_t' (16-bit) mode variable.
      
      This, together with the parent commit (47a150ed: "Cache user_ns in
      struct cred") makes acl_permission_check() much nicer.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
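The point about doing the mask arithmetic in a single type can be shown with a small sketch. It assumes a 16-bit mode value widened once into `unsigned int`; `toy_mode_check()` and the `TOY_MAY_*` names are hypothetical, and ownership and group membership are passed in as plain flags rather than looked up.

```c
#include <errno.h>

#define TOY_MAY_EXEC  0x1u
#define TOY_MAY_WRITE 0x2u
#define TOY_MAY_READ  0x4u

/* Mode-bit check done entirely in 'unsigned int'. */
static int toy_mode_check(unsigned short i_mode, int is_owner, int in_group,
			  unsigned int mask)
{
	unsigned int mode = i_mode;	/* widen once, then stay 32-bit */

	mask &= TOY_MAY_READ | TOY_MAY_WRITE | TOY_MAY_EXEC;

	if (is_owner)
		mode >>= 6;		/* use the owner rwx bits */
	else if (in_group)
		mode >>= 3;		/* use the group rwx bits */

	/* Every requested bit must be present in the selected triplet. */
	if ((mask & ~mode) == 0)
		return 0;
	return -EACCES;
}
```

For a mode of 0750, the owner passes a read/write/exec request, a group member passes read/exec but not write, and everyone else is refused.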
  11. 16 Apr, 2011 1 commit
  12. 31 Mar, 2011 1 commit
  13. 25 Mar, 2011 1 commit
  14. 24 Mar, 2011 2 commits
  15. 23 Mar, 2011 1 commit
  16. 18 Mar, 2011 2 commits
  17. 16 Mar, 2011 6 commits