1. 03 Apr, 2015 3 commits
    • mnt: Don't propagate umounts in __detach_mounts · 8318e667
      Eric W. Biederman committed
      Invoking mount propagation from __detach_mounts is inefficient and
      wrong.
      
      It is inefficient because __detach_mounts already walks the list of
      mounts where something needs to be done, and mount propagation
      walks some subset of those mounts again.
      
      It is actively wrong because if the dentry that is passed to
      __detach_mounts is not part of the path to a mount, that mount should
      not be affected.
      
      change_mnt_propagation(p,MS_PRIVATE) modifies the mount propagation
      tree of a master mount so its slaves are connected to another master
      if possible.  This means that even removing a mount from the middle of a
      mount tree with __detach_mounts will not deprive any mount of propagated
      mount events.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • mnt: Improve the umount_tree flags · e819f152
      Eric W. Biederman committed
      - Remove the unneeded declaration from pnode.h
      - Mark umount_tree static as it has no callers outside of namespace.c
      - Define an enumeration of umount_tree's flags.
      - Pass umount_tree's flags in by name
      
      This removes the magic numbers 0, 1 and 2, making the code a little
      clearer, and makes it possible for there to be lazy unmounts that don't
      propagate, which is what __detach_mounts actually wants, for example.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
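The effect of replacing the magic numbers with named flags can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: the enumerator names, their values, and both helper functions are assumptions made for the example.

```c
#include <stdbool.h>

/* Assumed flag names and values, chosen for illustration only. */
enum umount_tree_flags {
	UMOUNT_SYNC      = 1,	/* unmount synchronously rather than lazily */
	UMOUNT_PROPAGATE = 2,	/* also unmount propagated copies */
};

/* True when the caller asked for a lazy (non-synchronous) unmount. */
static bool umount_is_lazy(enum umount_tree_flags how)
{
	return !(how & UMOUNT_SYNC);
}

/* True when the unmount should be propagated to other mounts. */
static bool umount_propagates(enum umount_tree_flags how)
{
	return (how & UMOUNT_PROPAGATE) != 0;
}
```

With independent named bits, a lazy unmount that does not propagate is expressed by passing neither flag.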
    • mnt: Use hlist_move_list in namespace_unlock · a3b3c562
      Eric W. Biederman committed
      Small cleanup to make the code more readable and maintainable.
      Signed-off-by: Eric Biederman <ebiederm@xmission.com>
  2. 23 Feb, 2015 1 commit
    • VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8
      David Howells committed
      Convert the following where appropriate:
      
       (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
      
       (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
      
       (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
           complicated than it appears as some calls should be converted to
           d_can_lookup() instead.  The difference is whether the directory in
           question is a real dir with a ->lookup op or whether it's a fake dir with
           a ->d_automount op.
      
      In some circumstances, we can subsume checks for dentry->d_inode not being
      NULL into this, provided the code isn't in a filesystem that expects
      d_inode to be NULL if the dirent really *is* negative (i.e. if we're going to
      use d_inode() rather than d_backing_inode() to get the inode pointer).
      
      Note that the dentry type field may be set to something other than
      DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
      manages the fall-through from a negative dentry to a lower layer.  In such a
      case, the dentry type of the negative union dentry is set to the same as the
      type of the lower dentry.
      
      However, if you know d_inode is not NULL at the call site, then you can use
      the d_is_xxx() functions even in a filesystem.
      
      There is one further complication: a 0,0 chardev dentry may be labelled
      DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
      intended for special directory entry types that don't have attached inodes.
      
      The following perl+coccinelle script was used:
      
      use strict;
      
      my @callers;
      open(my $fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
          die "Can't grep for S_ISDIR and co. callers";
      @callers = <$fd>;
      close($fd);
      unless (@callers) {
          print "No matches\n";
          exit(0);
      }
      
      my @cocci = (
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISLNK(E->d_inode->i_mode)',
          '+ d_is_symlink(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISDIR(E->d_inode->i_mode)',
          '+ d_is_dir(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISREG(E->d_inode->i_mode)',
          '+ d_is_reg(E)' );
      
      my $coccifile = "tmp.sp.cocci";
      open($fd, ">$coccifile") || die $coccifile;
      print($fd "$_\n") || die $coccifile foreach (@cocci);
      close($fd);
      
      foreach my $file (@callers) {
          chomp $file;
          print "Processing ", $file, "\n";
          system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
      	die "spatch failed";
      }
      
      [AV: overlayfs parts skipped]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
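The difference between the old and new style of check can be sketched in userspace C. The structures, the type constants and the helper `old_is_dir()` below are simplified, hypothetical stand-ins rather than the kernel definitions; the sketch only shows that a type cached in the dentry can be tested without dereferencing d_inode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct inode  { unsigned int i_mode; };
struct dentry { unsigned int d_flags; struct inode *d_inode; };

/* Assumed type bits cached in d_flags; the values are illustrative. */
#define DENTRY_TYPE_MASK 0x7u
#define DENTRY_TYPE_MISS 0x0u	/* negative dentry, no inode attached */
#define DENTRY_TYPE_DIR  0x1u
#define DENTRY_TYPE_REG  0x2u
#define DENTRY_TYPE_LNK  0x3u

/* Old style: dereferences d_inode, so NULL has to be ruled out first. */
static bool old_is_dir(const struct dentry *d)
{
	return d->d_inode && S_ISDIR(d->d_inode->i_mode);
}

/* New style: consults only the cached type, so a negative dentry is safe. */
static bool d_is_dir(const struct dentry *d)
{
	return (d->d_flags & DENTRY_TYPE_MASK) == DENTRY_TYPE_DIR;
}
```

This sketch leaves out the d_can_lookup() distinction between real directories and automount points that the commit message describes.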
  3. 14 Feb, 2015 1 commit
  4. 26 Jan, 2015 1 commit
  5. 19 Dec, 2014 1 commit
    • mnt: Fix a memory stomp in umount · c297abfd
      Eric W. Biederman committed
      While reviewing the code of umount_tree I realized that when we append
      to a preexisting unmounted list we do not change pprev of the former
      first item in the list.
      
      Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
      the former first item of the list will stomp unmounted.first leaving
      it set to some random mount point which we are likely to free soon.
      
      This isn't likely to hit, but if it does I don't know how anyone could
      track it down.
      
      [ This happened because we don't have all the same operations for
        hlist's as we do for normal doubly-linked lists. In particular,
        list_splice() is easy on our standard doubly-linked lists, while
        hlist_splice() doesn't exist and needs both start/end entries of the
        hlist.  And commit 38129a13 incorrectly open-coded that missing
        hlist_splice().
      
        We should think about making these kinds of "mindless" conversions
        easier to get right by adding the missing hlist helpers   - Linus ]
      
      Fixes: 38129a13 switch mnt_hash to hlist
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
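The pprev problem described above can be reproduced with a small userspace model of an hlist. The types mirror the usual next/pprev layout, but `hlist_splice_front()` is a hypothetical helper written for this sketch (the message notes that no hlist_splice() existed), not the code from the fix.

```c
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hlist_del_init(struct hlist_node *n)
{
	if (!n->pprev)
		return;			/* not on any list */
	*n->pprev = n->next;		/* writes through whatever pprev points at */
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Hypothetical helper: move every node of src to the front of dst. */
static void hlist_splice_front(struct hlist_head *src, struct hlist_head *dst)
{
	struct hlist_node *last = src->first;

	if (!last)
		return;
	while (last->next)
		last = last->next;
	last->next = dst->first;
	if (dst->first)
		dst->first->pprev = &last->next;	/* the update that was missing */
	dst->first = src->first;
	dst->first->pprev = &dst->first;
	src->first = NULL;
}
```

Without the marked update, the former first node of dst would still have pprev pointing at dst->first, so deleting it later would overwrite the list head instead of the predecessor's next pointer.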
  6. 11 Dec, 2014 1 commit
    • take the targets of /proc/*/ns/* symlinks to separate fs · e149ed2b
      Al Viro committed
      New pseudo-filesystem: nsfs.  Targets of /proc/*/ns/* live there now.
      It's not mountable (not even registered, so it's not in /proc/filesystems,
      etc.).  Files on it *are* bindable - we explicitly permit that in do_loopback().
      
      This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
      get_proc_ns() is a macro now (it's simply returning ->i_private; would
      have been an inline, if not for header ordering headache).
      proc_ns_inode() is an ex-parrot.  The interface used in procfs is
      ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).
      
      Dentries and inodes are never hashed; a non-counting reference to dentry
      is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
      if present.  See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
      of that mechanism.
      
      As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
      it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets
      from ns_get_path().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  7. 05 Dec, 2014 6 commits
  8. 03 Dec, 2014 6 commits
    • mnt: Clear mnt_expire during pivot_root · 4fed655c
      Eric W. Biederman committed
      When inspecting pivot_root and the current mount expiry logic I
      realized that pivot_root fails to clear mnt_expire like mount move does.
      
      Add the missing line in case someone does the interesting feat of
      moving an expirable submount.  This gives a strong guarantee that root
      of the filesystem tree will never expire.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • mnt: Carefully set CL_UNPRIVILEGED in clone_mnt · 381cacb1
      Eric W. Biederman committed
      old->mnt_expiry should be ignored unless CL_EXPIRE is set.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • mnt: Move the clear of MNT_LOCKED from copy_tree to its callers. · 8486a788
      Eric W. Biederman committed
      Clear MNT_LOCKED in the callers of copy_tree except copy_mnt_ns, and
      collect_mounts.  In copy_mnt_ns it is necessary to create an exact
      copy of a mount tree, so not clearing MNT_LOCKED is important.
      Similarly collect_mounts is used to take a snapshot of the mount tree
      for audit logging purposes and auditing using a faithful copy of the
      tree is important.
      
      This becomes particularly significant when we start setting MNT_LOCKED
      on rootfs to prevent it from being unmounted.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • umount: Do not allow unmounting rootfs. · da362b09
      Eric W. Biederman committed
      Andrew Vagin <avagin@parallels.com> writes:
      
      > #define _GNU_SOURCE
      > #include <sys/types.h>
      > #include <sys/stat.h>
      > #include <fcntl.h>
      > #include <sched.h>
      > #include <unistd.h>
      > #include <sys/mount.h>
      >
      > int main(int argc, char **argv)
      > {
      > 	int fd;
      >
      > 	fd = open("/proc/self/ns/mnt", O_RDONLY);
      > 	if (fd < 0)
      > 		return 1;
      > 	while (1) {
      > 		if (umount2("/", MNT_DETACH) ||
      > 		    setns(fd, CLONE_NEWNS))
      > 			break;
      > 	}
      >
      > 	return 0;
      > }
      >
      > root@ubuntu:/home/avagin# gcc -Wall nsenter.c -o nsenter
      > root@ubuntu:/home/avagin# strace ./nsenter
      > execve("./nsenter", ["./nsenter"], [/* 22 vars */]) = 0
      > ...
      > open("/proc/self/ns/mnt", O_RDONLY)     = 3
      > umount("/", MNT_DETACH)                 = 0
      > setns(3, 131072)                        = 0
      > umount("/", MNT_DETACH
      >
      causes:
      
      > [  260.548301] ------------[ cut here ]------------
      > [  260.550941] kernel BUG at /build/buildd/linux-3.13.0/fs/pnode.c:372!
      > [  260.552068] invalid opcode: 0000 [#1] SMP
      > [  260.552068] Modules linked in: xt_CHECKSUM iptable_mangle xt_tcpudp xt_addrtype xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison iptable_filter ip_tables x_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel binfmt_misc nfsd auth_rpcgss nfs_acl aesni_intel nfs lockd aes_x86_64 sunrpc fscache lrw gf128mul glue_helper ablk_helper cryptd serio_raw ppdev parport_pc lp parport btrfs xor raid6_pq libcrc32c psmouse floppy
      > [  260.552068] CPU: 0 PID: 1723 Comm: nsenter Not tainted 3.13.0-30-generic #55-Ubuntu
      > [  260.552068] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      > [  260.552068] task: ffff8800376097f0 ti: ffff880074824000 task.ti: ffff880074824000
      > [  260.552068] RIP: 0010:[<ffffffff811e9483>]  [<ffffffff811e9483>] propagate_umount+0x123/0x130
      > [  260.552068] RSP: 0018:ffff880074825e98  EFLAGS: 00010246
      > [  260.552068] RAX: ffff88007c741140 RBX: 0000000000000002 RCX: ffff88007c741190
      > [  260.552068] RDX: ffff88007c741190 RSI: ffff880074825ec0 RDI: ffff880074825ec0
      > [  260.552068] RBP: ffff880074825eb0 R08: 00000000000172e0 R09: ffff88007fc172e0
      > [  260.552068] R10: ffffffff811cc642 R11: ffffea0001d59000 R12: ffff88007c741140
      > [  260.552068] R13: ffff88007c741140 R14: ffff88007c741140 R15: 0000000000000000
      > [  260.552068] FS:  00007fd5c7e41740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      > [  260.552068] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > [  260.552068] CR2: 00007fd5c7968050 CR3: 0000000070124000 CR4: 00000000000406f0
      > [  260.552068] Stack:
      > [  260.552068]  0000000000000002 0000000000000002 ffff88007c631000 ffff880074825ed8
      > [  260.552068]  ffffffff811dcfac ffff88007c741140 0000000000000002 ffff88007c741160
      > [  260.552068]  ffff880074825f38 ffffffff811dd12b ffffffff811cc642 0000000075640000
      > [  260.552068] Call Trace:
      > [  260.552068]  [<ffffffff811dcfac>] umount_tree+0x20c/0x260
      > [  260.552068]  [<ffffffff811dd12b>] do_umount+0x12b/0x300
      > [  260.552068]  [<ffffffff811cc642>] ? final_putname+0x22/0x50
      > [  260.552068]  [<ffffffff811cc849>] ? putname+0x29/0x40
      > [  260.552068]  [<ffffffff811dd88c>] SyS_umount+0xdc/0x100
      > [  260.552068]  [<ffffffff8172aeff>] tracesys+0xe1/0xe6
      > [  260.552068] Code: 89 50 08 48 8b 50 08 48 89 02 49 89 45 08 e9 72 ff ff ff 0f 1f 44 00 00 4c 89 e6 4c 89 e7 e8 f5 f6 ff ff 48 89 c3 e9 39 ff ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 90 66 66 66 66 90 55 b8 01
      > [  260.552068] RIP  [<ffffffff811e9483>] propagate_umount+0x123/0x130
      > [  260.552068]  RSP <ffff880074825e98>
      > [  260.611451] ---[ end trace 11c33d85f1d4c652 ]--
      
      Which in practice is totally uninteresting.  Only the global root user can
      do it, and it is just a stupid thing to do.
      
      However that is no excuse to allow a silly way to oops the kernel.
      
      We can avoid this silly problem by setting MNT_LOCKED on the rootfs
      mount point and thus avoid needing any special cases in the unmount
      code.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • umount: Disallow unprivileged mount force · b2f5d4dc
      Eric W. Biederman committed
      Forced unmount affects not just the mount namespace but the underlying
      superblock as well.  Restrict forced unmount to the global root user
      for now.  Otherwise it becomes possible for a user in a less privileged
      mount namespace to force the shutdown of a superblock of a filesystem
      in a more privileged mount namespace, allowing a DoS attack on root.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount · 3e186641
      Eric W. Biederman committed
      Now that remount is properly enforcing the rule that you can't remove
      nodev, at least sandstorm.io is breaking when performing a remount.
      
      It turns out that there is an easy, intuitive solution: implicitly
      add nodev on remount when nodev was implicitly added on mount.
      Tested-by: Cedric Bosdonnat <cbosdonnat@suse.com>
      Tested-by: Richard Weinberger <richard@nod.at>
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  9. 24 Oct, 2014 1 commit
  10. 15 Oct, 2014 1 commit
  11. 09 Oct, 2014 8 commits
    • vfs: move getname() from callers to do_mount() · 5e6123f3
      Seunghun Lee committed
      It would make more sense to pass char __user * instead of
      char * in callers of do_mount() and do getname() inside do_mount().
      Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Seunghun Lee <waydi1@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: namespace: suppress 'may be used uninitialized' warnings · b8850d1f
      Tim Gardner committed
      The gcc version 4.9.1 compiler complains even though it isn't possible for
      these variables to not get initialized before they are used.
      
      fs/namespace.c: In function ‘SyS_mount’:
      fs/namespace.c:2720:8: warning: ‘kernel_dev’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags,
              ^
      fs/namespace.c:2699:8: note: ‘kernel_dev’ was declared here
        char *kernel_dev;
              ^
      fs/namespace.c:2720:8: warning: ‘kernel_type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags,
              ^
      fs/namespace.c:2697:8: note: ‘kernel_type’ was declared here
        char *kernel_type;
              ^
      
      Fix the warnings by simplifying copy_mount_string() as suggested by Al Viro.
      
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Add a function to lazily unmount all mounts from any dentry. · 80b5dce8
      Eric W. Biederman committed
      The new function detach_mounts comes in two pieces.  The first piece
      is a static inline test of d_mountpoint that returns immediately
      without taking any locks if d_mountpoint is not set.  In the common
      case when mountpoints are absent this allows the vfs to continue
      running with its same cacheline footprint.
      
      The second piece of detach_mounts, __detach_mounts, actually does the
      work.  It assumes that a mountpoint is present, so it is slow: it
      takes namespace_sem for write, and then locks the mount hash (aka
      mount_lock) after a struct mountpoint has been found.
      
      With those two locks held each entry on the list of mounts on a
      mountpoint is selected and lazily unmounted until all of the mounts
      have been lazily unmounted.
      
      v7: Wrote a proper change description and removed the changelog
          documenting deleted wrong turns.
      Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
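The two-piece structure described in this commit, a cheap inline test in front of an expensive worker, can be sketched as follows. Everything here is a simplified userspace assumption: the flag value, the `nr_mounts` counter, the `slow_path_calls` counter and the name `detach_mounts_slow()` are hypothetical, and the real slow path takes locks that this sketch only mentions in comments.

```c
#include <stdbool.h>

/* Hypothetical, simplified dentry: one flag bit plus a count of mounts. */
#define DENTRY_MOUNTED 0x1u	/* assumed flag value */
struct dentry { unsigned int d_flags; int nr_mounts; };

static int slow_path_calls;	/* counts how often the slow path runs */

/* Cheap test: reads one flag and takes no locks. */
static inline bool d_mountpoint(const struct dentry *d)
{
	return (d->d_flags & DENTRY_MOUNTED) != 0;
}

/* Slow path: in the kernel this is where namespace_sem and mount_lock are taken. */
static void detach_mounts_slow(struct dentry *d)
{
	slow_path_calls++;
	while (d->nr_mounts > 0)
		d->nr_mounts--;		/* stands in for lazily unmounting one mount */
	d->d_flags &= ~DENTRY_MOUNTED;
}

/* Fast path: return immediately when nothing is mounted on the dentry. */
static inline void detach_mounts(struct dentry *d)
{
	if (!d_mountpoint(d))
		return;
	detach_mounts_slow(d);
}
```

Calling `detach_mounts()` on a dentry without the flag never reaches the slow path, which is the property the commit relies on for the common case.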
    • vfs: factor out lookup_mountpoint from new_mountpoint · e2dfa935
      Eric W. Biederman committed
      I am shortly going to add a new user of struct mountpoint that
      needs to look up existing entries but does not want to create
      a struct mountpoint if one does not exist.  Therefore to keep
      the code simple and easy to read split out lookup_mountpoint
      from new_mountpoint.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Keep a list of mounts on a mount point · 0a5eb7c8
      Eric W. Biederman committed
      To spot any possible problems call BUG if a mountpoint
      is put when its list of mounts is not empty.
      
      AV: use hlist instead of list_head
      Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: Don't allow overwriting mounts in the current mount namespace · 7af1364f
      Eric W. Biederman committed
      In preparation for allowing mountpoints to be renamed and unlinked
      in remote filesystems and in other mount namespaces test if on a dentry
      there is a mount in the local mount namespace before allowing it to
      be renamed or unlinked.
      
      The primary motivation here is old versions of fusermount, whose unmount
      is not safe if a path can be renamed or unlinked while it is
      verifying the mount is safe to unmount.  More recent versions are simpler
      and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount
      in a directory owned by an arbitrary user.
      
      Miklos Szeredi <miklos@szeredi.hu> reports this approach is good
      enough to remove concerns about new kernels mixed with old versions
      of fusermount.
      
      A secondary motivation for restrictions here is that removing empty
      directories that have non-empty mount points on them appears to
      violate the rule that rmdir can not remove non-empty directories.  As
      Linus Torvalds pointed out this is useful for programs (like git) that
      test if a directory is empty with rmdir.
      
      Therefore this patch arranges to enforce the existing mount point
      semantics for local mount namespace.
      
      v2: Rewrote the test to be a drop in replacement for d_mountpoint
      v3: Use bool instead of int as the return type of is_local_mountpoint
      Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • delayed mntput · 9ea459e1
      Al Viro committed
      On final mntput() we want fs shutdown to happen before return to
      userland; however, the only case where we want it to happen right
      there (i.e. where task_work_add won't do) is MNT_INTERNAL victim.
      Those have to be fully synchronous - failure halfway through module
      init might count on having vfsmount killed right there.  Fortunately,
      final mntput on MNT_INTERNAL vfsmounts happens on shallow stack.
      So we handle those synchronously and do an analog of delayed fput
      logics for everything else.
      
      As the result, we are guaranteed that fs shutdown will always happen
      on shallow stack.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: Add a missing permission check to do_umount · a1480dcc
      Andy Lutomirski committed
      Accessing do_remount_sb should require global CAP_SYS_ADMIN, but
      only one of the two call sites was appropriately protected.
      
      Fixes CVE-2014-7975.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
  12. 31 Aug, 2014 2 commits
    • fix EBUSY on umount() from MNT_SHRINKABLE · 81b6b061
      Al Viro committed
      We need the parents of victims alive until namespace_unlock() gets to
      dput() of the (ex-)mountpoints.  However, that screws up the "is it
      busy" checks in case when we have shrinkable mounts that need to be
      killed.  Solution: go ahead and decrement refcounts of parents right
      in umount_tree(), increment them again just before dropping rwsem in
      namespace_unlock() (and let the loop in the end of namespace_unlock()
      finally drop those references for good, as we do now).  Parents can't
      get freed until we drop rwsem - at least one reference is kept until
      then, both in case when parent is among the victims and when it is
      not.  So they'll still be around when we get to namespace_unlock().
      
      Cc: stable@vger.kernel.org # 3.12+
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • get rid of propagate_umount() mistakenly treating slaves as busy. · 88b368f2
      Al Viro committed
      The check in __propagate_umount() ("has somebody explicitly mounted
      something on that slave?") is done *before* taking the already doomed
      victims out of the child lists.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  13. 12 Aug, 2014 1 commit
    • fix copy_tree() regression · 12a5b529
      Al Viro committed
      Since 3.14 we had copy_tree() get the shadowing wrong - if we had one
      vfsmount shadowing another (i.e. if A is a slave of B, C is mounted
      on A/foo, then D got mounted on B/foo creating D' on A/foo shadowed
      by C), copy_tree() of A would make a copy of D' shadow the copy of
      C, not the other way around.
      
      It's easy to fix, fortunately - just make sure that mount follows
      the one that shadows it in mnt_child as well as in mnt_hash, and when
      copy_tree() decides to attach a new mount, check if the last child
      it has added to the same parent should be shadowing the new one.
      And if it should, just use the same logics commit_tree() has - put the
      new mount into the hash and children lists right after the one that
      should shadow it.
      
      Cc: stable@vger.kernel.org [3.14 and later]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  14. 08 Aug, 2014 3 commits
    • death to mnt_pinned · 3064c356
      Al Viro committed
      Rather than playing silly buggers with vfsmount refcounts, just have
      acct_on() ask fs/namespace.c for internal clone of file->f_path.mnt
      and replace it with said clone.  Then attach the pin to original
      vfsmount.  Voila - the clone will be alive until the file gets closed,
      making sure that underlying superblock remains active, etc., and
      we can drop the original vfsmount, so that it's not kept busy.
      If the file lives until the final mntput of the original vfsmount,
      we'll notice that there's an fs_pin (one in bsd_acct_struct that
      holds that file) and mnt_pin_kill() will take it out.  Since
      ->kill() is synchronous, we won't proceed past that point until
      these files are closed (and private clones of our vfsmount are
      gone), so we get the same ordering warranties we used to get.
      
      mnt_pin()/mnt_unpin()/->mnt_pinned is gone now, and good riddance -
      it never became usable outside of kernel/acct.c (and racy wrt
      umount even there).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • make fs/{namespace,super}.c forget about acct.h · 8fa1f1c2
      Al Viro committed
      These externs belong in fs/internal.h.  Rename (they are not acct-specific
      anymore) and move them over there.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • acct: get rid of acct_list · 215752fc
      Al Viro committed
      Put these suckers on per-vfsmount and per-superblock lists instead.
      Note: right now it's still acct_lock for everything, but that's
      going to change.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  15. 07 Aug, 2014 1 commit
    • list: fix order of arguments for hlist_add_after(_rcu) · 1d023284
      Ken Helias committed
      All other add functions for lists have the new item as first argument
      and the position where it is added as second argument.  This was changed
      for no good reason in this function and makes using it unnecessarily
      confusing.
      
      The name was changed to hlist_add_behind() to cause unconverted code to
      generate a compile error instead of using the wrong parameter order.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Ken Helias <kenhelias@firemail.de>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	[intel driver bits]
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
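The argument order this commit settles on, new item first and position second, can be illustrated with a small userspace model. The node and head types are minimal stand-ins for the kernel's, and the sketch is meant to show the calling convention rather than reproduce the header.

```c
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* New node first, list head second. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* New node first, existing node it goes behind second. */
static void hlist_add_behind(struct hlist_node *n, struct hlist_node *prev)
{
	n->next = prev->next;
	prev->next = n;
	n->pprev = &prev->next;
	if (n->next)
		n->next->pprev = &n->next;
}
```

Because both helpers take the new node as their first argument, a call such as `hlist_add_behind(&b, &a)` reads the same way as `hlist_add_head(&a, &head)`.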
  16. 01 Aug, 2014 3 commits
    • mnt: Change the default remount atime from relatime to the existing value · ffbc6f0e
      Eric W. Biederman committed
      Since March 2009 the kernel has defaulted to relatime whenever no
      MS_...ATIME flags are passed.
      
      Defaulting to relatime instead of the existing atime state during a
      remount is silly, and causes problems in practice for people who don't
      specify any MS_...ATIME flags and expect to get the default filesystem atime
      setting.  Those users may encounter a permission error because the
      default atime setting does not work.
      
      A default that does not work and causes permission problems is
      ridiculous, so preserve the existing value to have a default
      atime setting that is always guaranteed to work.
      
      Using the default atime setting in this way is particularly
      interesting for applications built to run in restricted userspace
      environments without /proc mounted, as the existing atime mount
      options of a filesystem can not be read from /proc/mounts.
      
      In practice this fixes user space that uses the default atime
      setting on remount and is broken by the permission checks
      keeping less privileged users from changing more privileged users'
      atime settings.
      
      Cc: stable@vger.kernel.org
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
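The new default can be sketched as a small decision function. The flag names prefixed `REQ_`, all numeric values, and the function `resolve_atime()` are assumptions made for this illustration; the sketch also ignores nodiratime for brevity.

```c
/* Assumed request and mount flag values; they are not the kernel's. */
#define REQ_REMOUNT     0x01u
#define REQ_NOATIME     0x02u
#define REQ_RELATIME    0x04u
#define REQ_STRICTATIME 0x08u
#define REQ_ATIME_MASK  (REQ_NOATIME | REQ_RELATIME | REQ_STRICTATIME)

#define MNT_NOATIME     0x10u
#define MNT_RELATIME    0x20u
#define MNT_ATIME_MASK  (MNT_NOATIME | MNT_RELATIME)

/* Returns the atime bits the mount should end up with. */
static unsigned int resolve_atime(unsigned int req, unsigned int old_mnt_flags)
{
	/* Remount with no atime request: keep whatever the mount already has. */
	if ((req & REQ_REMOUNT) && !(req & REQ_ATIME_MASK))
		return old_mnt_flags & MNT_ATIME_MASK;
	if (req & REQ_NOATIME)
		return MNT_NOATIME;
	if (req & REQ_STRICTATIME)
		return 0;		/* strict atime: neither bit set */
	return MNT_RELATIME;		/* fresh mount default, or explicit relatime */
}
```

Under these assumptions a remount that passes no atime flag leaves a noatime mount as noatime, so it never asks for a change that a permission check might refuse.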
    • mnt: Correct permission checks in do_remount · 9566d674
      Eric W. Biederman committed
      While investigating the issue where "mount --bind -oremount,ro ..."
      would result in later "mount --bind -oremount,rw" succeeding even if
      the mount started off locked I realized that there are several
      additional mount flags that should be locked and are not.
      
      In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime
      flags in addition to MNT_READONLY should all be locked.  These
      flags are all per superblock, can all be changed with MS_BIND,
      and should not be changeable if set by a more privileged user.
      
      The following additions to the current logic are added in this patch.
      - nosuid may not be clearable by a less privileged user.
      - nodev  may not be clearable by a less privileged user.
      - noexec may not be clearable by a less privileged user.
      - atime flags may not be changeable by a less privileged user.
      
      The logic with atime is that always setting atime on access is a
      global policy and backup software and auditing software could break if
      atime bits are not updated (when they are configured to be updated),
      and serious performance degradation could result (DOS attack) if atime
      updates happen when they have been explicitly disabled.  Therefore an
      unprivileged user should not be able to mess with the atime bits set
      by a more privileged user.
      
      The additional restrictions are implemented with the addition of
      MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME
      mnt flags.
      
      Taken together these changes and the fixes for MNT_LOCK_READONLY
      should make it safe for an unprivileged user to create a user
      namespace and to call "mount --bind -o remount,... ..." without
      the danger of mount flags being changed maliciously.
      
      Cc: stable@vger.kernel.org
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
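The locking rules listed in this commit can be sketched as a single check. The flag names follow the commit message, but every numeric value and the function `check_remount()` are assumptions made for the example, not the kernel implementation.

```c
#include <errno.h>

/* Assumed bit values for illustration; they are not the kernel's. */
#define MNT_NOSUID        0x001u
#define MNT_NODEV         0x002u
#define MNT_NOEXEC        0x004u
#define MNT_READONLY      0x008u
#define MNT_ATIME_MASK    0x030u
#define MNT_LOCK_NOSUID   0x100u
#define MNT_LOCK_NODEV    0x200u
#define MNT_LOCK_NOEXEC   0x400u
#define MNT_LOCK_READONLY 0x800u
#define MNT_LOCK_ATIME    0x1000u

/* cur: flags on the existing mount; want: flags requested by the remount. */
static int check_remount(unsigned int cur, unsigned int want)
{
	if ((cur & MNT_LOCK_READONLY) && !(want & MNT_READONLY))
		return -EPERM;
	if ((cur & MNT_LOCK_NOSUID) && !(want & MNT_NOSUID))
		return -EPERM;
	if ((cur & MNT_LOCK_NODEV) && !(want & MNT_NODEV))
		return -EPERM;
	if ((cur & MNT_LOCK_NOEXEC) && !(want & MNT_NOEXEC))
		return -EPERM;
	/* Locked atime bits may not change in either direction. */
	if ((cur & MNT_LOCK_ATIME) &&
	    (cur & MNT_ATIME_MASK) != (want & MNT_ATIME_MASK))
		return -EPERM;
	return 0;
}
```

A locked flag can still be requested again, so a remount that repeats nodev on a mount with MNT_LOCK_NODEV passes, while one that omits it is refused.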
    • mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount · 07b64558
      Eric W. Biederman committed
      There are no races as locked mount flags are guaranteed to never change.
      
      Moving the test into do_remount makes it more visible, and ensures all
      filesystem remounts pass the MNT_LOCK_READONLY permission check.  This
      second case is not an issue today as filesystem remounts are guarded
      by capable(CAP_SYS_ADMIN) and thus will always fail in less privileged
      mount namespaces, but it could become an issue in the future.
      
      Cc: stable@vger.kernel.org
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>