• A
    VFS: Fix vfsmount overput on simultaneous automount · 8aef1884
    Al Viro 提交于
    [Kudos to dhowells for tracking that crap down]
    
    If two processes attempt to cause automounting on the same mountpoint at the
    same time, the vfsmount holding the mountpoint will be left with one too few
    references on it, causing a BUG when the kernel tries to clean up.
    
    The problem is that lock_mount() drops the caller's reference to the
    mountpoint's vfsmount in the case where it finds something already mounted on
    the mountpoint as it transits to the mounted filesystem and replaces path->mnt
    with the new mountpoint vfsmount.
    
    During a pathwalk, however, we don't take a reference on the vfsmount if it is
    the same as the one in the nameidata struct, but do_add_mount() doesn't know
    this.
    
    The fix is to make sure we have a ref on the vfsmount of the mountpoint before
    calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
    left with an extra ref on the mountpoint vfsmount which needs releasing.
    We can handle that in follow_managed() by not making assumptions about what
    we can and what we cannot get from lookup_mnt() as the current code does.
    
    The callers of follow_managed() expect that reference to path->mnt will be
    grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
    keep track of whether such reference has been grabbed and assume that it'll
    happen in those and only those cases that'll have us return with changed
    path->mnt.  That assumption is almost correct - it breaks in case of
    racing automounts and in even harder to hit race between following a mountpoint
    and a couple of mount --move.  The thing is, we don't need to make that
    assumption at all - after the end of loop in follow_manage() we can check
    if path->mnt has ended up unchanged and do mntput() if needed.
    
    The BUG can be reproduced with the following test program:
    
    	#include <stdio.h>
    	#include <sys/types.h>
    	#include <sys/stat.h>
    	#include <unistd.h>
    	#include <sys/wait.h>
    	int main(int argc, char **argv)
    	{
    		int pid, ws;
    		struct stat buf;
    		pid = fork();
    		stat(argv[1], &buf);
    		if (pid > 0) wait(&ws);
    		return 0;
    	}
    
    and the following procedure:
    
     (1) Mount an NFS volume that on the server has something else mounted on a
         subdirectory.  For instance, I can mount / from my server:
    
    	mount warthog:/ /mnt -t nfs4 -r
    
         On the server /data has another filesystem mounted on it, so NFS will see
         a change in FSID as it walks down the path, and will mark /mnt/data as
         being a mountpoint.  This will cause the automount code to be triggered.
    
         !!! Do not look inside the mounted fs at this point !!!
    
     (2) Run the above program on a file within the submount to generate two
         simultaneous automount requests:
    
    	/tmp/forkstat /mnt/data/testfile
    
     (3) Unmount the automounted submount:
    
    	umount /mnt/data
    
     (4) Unmount the original mount:
    
    	umount /mnt
    
         At this point the kernel should throw a BUG with something like the
         following:
    
    	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]
    
    Note that the bug appears on the root dentry of the original mount, not the
    mountpoint and not the submount because sys_umount() hasn't got to its final
    mntput_no_expire() yet, but this isn't so obvious from the call trace:
    
     [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
     [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
     [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
     [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
     [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
     [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
     [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
     [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
     [<ffffffff811862a8>] mntput+0x3b/0x44
     [<ffffffff81186d87>] release_mounts+0xa2/0xbf
     [<ffffffff811876af>] sys_umount+0x47a/0x4ba
     [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
     [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b
    
    as do_umount() is inlined.  However, you can see release_mounts() in there.
    
    Note also that it may be necessary to have multiple CPU cores to be able to
    trigger this bug.
    Tested-by: NJeff Layton <jlayton@redhat.com>
    Tested-by: NIan Kent <raven@themaw.net>
    Signed-off-by: NDavid Howells <dhowells@redhat.com>
    Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
    8aef1884
namei.c 81.4 KB