1. 11 12月, 2020 1 次提交
  2. 25 9月, 2020 1 次提交
  3. 28 8月, 2020 1 次提交
  4. 15 8月, 2020 1 次提交
  5. 13 8月, 2020 3 次提交
    • K
      exec: move path_noexec() check earlier · 0fd338b2
      Kees Cook 提交于
      The path_noexec() check, like the regular file check, was happening too
      late, letting LSMs see impossible execve()s.  Check it earlier as well in
      may_open() and collect the redundant fs/exec.c path_noexec() test under
      the same robustness comment as the S_ISREG() check.
      
      My notes on the call path, and related arguments, checks, etc:
      
      do_open_execat()
          struct open_flags open_exec_flags = {
              .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
              .acc_mode = MAY_EXEC,
              ...
          do_filp_open(dfd, filename, open_flags)
              path_openat(nameidata, open_flags, flags)
                  file = alloc_empty_file(open_flags, current_cred());
                  do_open(nameidata, file, open_flags)
                      may_open(path, acc_mode, open_flag)
                          /* new location of MAY_EXEC vs path_noexec() test */
                          inode_permission(inode, MAY_OPEN | acc_mode)
                              security_inode_permission(inode, acc_mode)
                      vfs_open(path, file)
                          do_dentry_open(file, path->dentry->d_inode, open)
                              security_file_open(f)
                              open()
          /* old location of path_noexec() test */
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Link: http://lkml.kernel.org/r/20200605160013.3954297-4-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0fd338b2
    • K
      exec: move S_ISREG() check earlier · 633fb6ac
      Kees Cook 提交于
      The execve(2)/uselib(2) syscalls have always rejected non-regular files.
      Recently, it was noticed that a deadlock was introduced when trying to
      execute pipes, as the S_ISREG() test was happening too late.  This was
      fixed in commit 73601ea5 ("fs/open.c: allow opening only regular files
      during execve()"), but it was added after inode_permission() had already
      run, which meant LSMs could see bogus attempts to execute non-regular
      files.
      
      Move the test into the other inode type checks (which already look for
      other pathological conditions[1]).  Since there is no need to use
      FMODE_EXEC while we still have access to "acc_mode", also switch the test
      to MAY_EXEC.
      
      Also include a comment with the redundant S_ISREG() checks at the end of
      execve(2)/uselib(2) to note that they are present to avoid any mistakes.
      
      My notes on the call path, and related arguments, checks, etc:
      
      do_open_execat()
          struct open_flags open_exec_flags = {
              .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
              .acc_mode = MAY_EXEC,
              ...
          do_filp_open(dfd, filename, open_flags)
              path_openat(nameidata, open_flags, flags)
                  file = alloc_empty_file(open_flags, current_cred());
                  do_open(nameidata, file, open_flags)
                      may_open(path, acc_mode, open_flag)
      		    /* new location of MAY_EXEC vs S_ISREG() test */
                          inode_permission(inode, MAY_OPEN | acc_mode)
                              security_inode_permission(inode, acc_mode)
                      vfs_open(path, file)
                          do_dentry_open(file, path->dentry->d_inode, open)
                              /* old location of FMODE_EXEC vs S_ISREG() test */
                              security_file_open(f)
                              open()
      
      [1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Link: http://lkml.kernel.org/r/20200605160013.3954297-3-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      633fb6ac
    • A
      fix breakage in do_rmdir() · 24fb33d4
      Al Viro 提交于
      syzbot reported and bisected a use-after-free due to the recent init
      cleanups.
      
      The putname() should happen only after we'd *not* branched to retry,
      same as it's done in do_unlinkat().
      
      Reported-by: syzbot+bbeb1c88016c7db4aa24@syzkaller.appspotmail.com
      Fixes: e24ab0ef "fs: push the getname from do_rmdir into the callers"
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24fb33d4
  6. 31 7月, 2020 5 次提交
  7. 09 6月, 2020 2 次提交
    • L
      vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK · 63d72b93
      Linus Torvalds 提交于
      posix_acl_permission() does not care about MAY_NOT_BLOCK, and in fact
      the permission logic internally must not check that bit (it's only for
      upper layers to decide whether they can block to do IO to look up the
      acl information or not).
      
      But the way the code was written, it _looked_ like it cared, since the
      function explicitly did not mask that bit off.
      
      But it has exactly two callers: one for when that bit is set, which
      first clears the bit before calling posix_acl_permission(), and the
      other call site when that bit was clear.
      
      So stop the silly games "saving" the MAY_NOT_BLOCK bit that must not be
      used for the actual permission test, and that currently is pointlessly
      cleared by the callers when the function itself should just not care.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      63d72b93
    • L
      vfs: do not do group lookup when not necessary · 5fc475b7
      Linus Torvalds 提交于
      Rasmus Villemoes points out that the 'in_group_p()' tests can be a
      noticeable expense, and often completely unnecessary.  A common
      situation is that the 'group' bits are the same as the 'other' bits
      wrt the permissions we want to test.
      
      So rewrite 'acl_permission_check()' to not bother checking for group
      ownership when the permission check doesn't care.
      
      For example, if we're asking for read permissions, and both 'group' and
      'other' allow reading, there's really no reason to check if we're part
      of the group or not: either way, we'll allow it.
      
      Rasmus says:
       "On a bog-standard Ubuntu 20.04 install, a workload consisting of
        compiling lots of userspace programs (i.e., calling lots of
        short-lived programs that all need to get their shared libs mapped in,
        and the compilers poking around looking for system headers - lots of
        /usr/lib, /usr/bin, /usr/include/ accesses) puts in_group_p around
        0.1% according to perf top.
      
        System-installed files are almost always 0755 (directories and
        binaries) or 0644, so in most cases, we can avoid the binary search
        and the cost of pulling the cred->groups array and in_group_p() .text
        into the cpu cache"
      Reported-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5fc475b7
  8. 14 5月, 2020 1 次提交
    • M
      vfs: allow unprivileged whiteout creation · a3c751a5
      Miklos Szeredi 提交于
      Whiteouts, unlike real device node should not require privileges to create.
      
      The general concern with device nodes is that opening them can have side
      effects.  The kernel already avoids zero major (see
      Documentation/admin-guide/devices.txt).  To be on the safe side the patch
      explicitly forbids registering a char device with 0/0 number (see
      cdev_add()).
      
      This guarantees that a non-O_PATH open on a whiteout will fail with ENODEV;
      i.e. it won't have any side effect.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      a3c751a5
  9. 06 4月, 2020 1 次提交
  10. 02 4月, 2020 21 次提交
    • A
      lookup_open(): don't bother with fallbacks to lookup+create · 99a4a90c
      Al Viro 提交于
      We fall back to lookup+create (instead of atomic_open) in several cases:
      	1) we don't have write access to filesystem and O_TRUNC is
      present in the flags.  It's not something we want ->atomic_open() to
      see - it just might go ahead and truncate the file.  However, we can
      pass it the flags sans O_TRUNC - eventually do_open() will call
      handle_truncate() anyway.
      	2) we have O_CREAT | O_EXCL and we can't write to parent.
      That's going to be an error, of course, but we want to know _which_
      error should that be - might be EEXIST (if file exists), might be
      EACCES or EROFS.  Simply stripping O_CREAT (and checking if we see
      ENOENT) would suffice, if not for O_EXCL.  However, we used to have
      ->atomic_open() fully responsible for rejecting O_CREAT | O_EXCL
      on existing file and just stripping O_CREAT would've disarmed
      those checks.  With nothing downstream to catch the problem -
      FMODE_OPENED used to be "don't bother with EEXIST checks,
      ->atomic_open() has done those".  Now EEXIST checks downstream
      are skipped only if FMODE_CREATED is set - FMODE_OPENED alone
      is not enough.  That has eliminated the need to fall back onto
      lookup+create path in this case.
      	3) O_WRONLY or O_RDWR when we have no write access to
      filesystem, with nothing else objectionable.  Fallback is
      (and had always been) pointless.
      
      IOW, we don't really need that fallback; all we need in such
      cases is to trim O_TRUNC and O_CREAT properly.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      99a4a90c
    • A
      atomic_open(): no need to pass struct open_flags anymore · d489cf9a
      Al Viro 提交于
      argument had been unused since 1643b43f (lookup_open(): lift the
      "fallback to !O_CREAT" logics from atomic_open()) back in 2016
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d489cf9a
    • A
      ff326a32
    • A
      open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open() · b94e0b32
      Al Viro 提交于
      Currently path_openat() has "EEXIST on O_EXCL|O_CREAT" checks done on one
      of the ways out of open_last_lookups().  There are 4 cases:
      	1) the last component is . or ..; check is not done.
      	2) we had FMODE_OPENED or FMODE_CREATED set while in lookup_open();
      check is not done.
      	3) symlink to be traversed is found; check is not done (nor
      should it be)
      	4) everything else: check done (before complete_walk(), even).
      
      In case (1) O_EXCL|O_CREAT ends up failing with -EISDIR - that's
      	open("/tmp/.", O_CREAT|O_EXCL, 0600)
      Note that in the same conditions
      	open("/tmp", O_CREAT|O_EXCL, 0600)
      would have yielded EEXIST.  Either error is allowed, switching to -EEXIST
      in these cases would've been more consistent.
      
      Case (2) is more subtle; first of all, if we have FMODE_CREATED set, the
      object hadn't existed prior to the call.  The check should not be done in
      such a case.  The rest is problematic, though - we have
      	FMODE_OPENED set (i.e. it went through ->atomic_open() and got
      successfully opened there)
      	FMODE_CREATED is *NOT* set
      	O_CREAT and O_EXCL are both set.
      Any such case is a bug - either we failed to set FMODE_CREATED when we
      had, in fact, created an object (no such instances in the tree) or
      we have opened a pre-existing file despite having had both O_CREAT and
      O_EXCL passed.  One of those was, in fact caught (and fixed) while
      sorting out this mess (gfs2 on cold dcache).  And in such situations
      we should fail with EEXIST.
      
      Note that for (1) and (4) FMODE_CREATED is not set - for (1) there's nothing
      in handle_dots() to set it, for (4) we'd explicitly checked that.
      
      And (1), (2) and (4) are exactly the cases when we leave the loop in
      the caller, with do_open() called immediately after that loop.  IOW, we
      can move the check over there, and make it
      
      	If we have O_CREAT|O_EXCL and after successful pathname resolution
      FMODE_CREATED is *not* set, we must have run into a preexisting file and
      should fail with EEXIST.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b94e0b32
    • A
    • A
      f7bb959d
    • A
      take post-lookup part of do_last() out of loop · c5971b8c
      Al Viro 提交于
      now we can have open_last_lookups() directly from the loop in
      path_openat() - the rest of do_last() never returns a symlink
      to follow, so we can bloody well leave the loop first.
      
      Rename the rest of that thing from do_last() to do_open() and
      make it return an int.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c5971b8c
    • A
    • A
      __nd_alloc_stack(): make it return bool · 60ef60c7
      Al Viro 提交于
      ... and adjust the caller (reserve_stack()).  Rename to nd_alloc_stack(),
      while we are at it.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      60ef60c7
    • A
      reserve_stack(): switch to __nd_alloc_stack() · 4542576b
      Al Viro 提交于
      expand the call of nd_alloc_stack() into it (and don't
      recheck the depth on the second call)
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4542576b
    • A
      49055906
    • A
      pick_link(): more straightforward handling of allocation failures · aef9404d
      Al Viro 提交于
      pick_link() needs to push onto stack; we start with using two-element
      array embedded into struct nameidata and the first time we need
      more than that we switch to separately allocated array.
      
      Allocation can fail, of course, and handling of that would be simple
      enough - we need to drop 'link' and bugger off.  However, the things
      get more complicated in RCU mode.  There we must do GFP_ATOMIC
      allocation.  If that fails, we try to switch to non-RCU mode and
      repeat the allocation.
      
      To switch to non-RCU mode we need to grab references to 'link' and
      to everything in nameidata.  The latter done by unlazy_walk();
      the former - legitimize_path().  'link' must go first - after
      unlazy_walk() we are out of RCU-critical period and it's too
      late to call legitimize_path() since the references in link->mnt
      and link->dentry might be pointing to freed and reused memory.
      
      So we do legitimize_path(), then unlazy_walk().  And that's where
      it gets too subtle: what to do if the former fails?  We MUST
      do path_put(link) to avoid leaks.  And we can't do that under
      rcu_read_lock().  Solution in mainline was to empty then nameidata
      manually, drop out of RCU mode and then do put_path().
      
      In effect, we open-code the things eventual terminate_walk()
      would've done on error in RCU mode.  That looks badly out of place
      and confusing.  We could add a comment along the lines of the
      explanation above, but... there's a simpler solution.  Call
      unlazy_walk() even if legitimaze_path() fails.  It will take
      us out of RCU mode, so we'll be able to do path_put(link).
      
      Yes, it will do unnecessary work - attempt to grab references
      on the stuff in nameidata, only to have them dropped as soon
      as we return the error to upper layer and get terminate_walk()
      called there.  So what?  We are thoroughly off the fast path
      by that point - we had GFP_ATOMIC allocation fail, we had
      ->d_seq or mount_lock mismatch and we are about to try walking
      the same path from scratch in non-RCU mode.  Which will need
      to do the same allocation, this time with GFP_KERNEL, so it will
      be able to apply memory pressure for blocking stuff.
      
      Compared to that the cost of several lockref_get_not_dead()
      is noise.  And the logics become much easier to understand
      that way.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      aef9404d
    • A
      fold path_to_nameidata() into its only remaining caller · c99687a0
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c99687a0
    • A
      pick_link(): pass it struct path already with normal refcounting rules · 84f0cd9e
      Al Viro 提交于
      step_into() tries to avoid grabbing and dropping mount references
      on the steps that do not involve crossing mountpoints (which is
      obviously the majority of cases).  So it uses a local struct path
      with unusual refcounting rules - path.mnt is pinned if and only if
      it's not equal to nd->path.mnt.
      
      We used to have similar beasts all over the place and we had quite
      a few bugs crop up in their handling - it's easy to get confused
      when changing e.g. cleanup on failure exits (or adding a new check,
      etc.)
      
      Now that's mostly gone - the step_into() instance (which is what
      we need them for) is the only one left.  It is exposed to mount
      traversal and it's (shortly) seen by pick_link().  Since pick_link()
      needs to store it in link stack, where the normal rules apply,
      it has to make sure that mount is pinned regardless of nd->path.mnt
      value.  That's done on all calls of pick_link() and very early
      in those.  Let's do that in the caller (step_into()) instead -
      that way the fewer places need to be aware of such struct path
      instances.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      84f0cd9e
    • A
      fs/namei.c: kill follow_mount() · 19f6028a
      Al Viro 提交于
      The only remaining caller (path_pts()) should be using follow_down()
      anyway.  And clean path_pts() a bit.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      19f6028a
    • A
      non-RCU analogue of the previous commit · 2aa38470
      Al Viro 提交于
      new helper: choose_mountpoint().  Wrapper around choose_mountpoint_rcu(),
      similar to lookup_mnt() vs. __lookup_mnt().  follow_dotdot() switched to
      it.  Now we don't grab mount_lock exclusive anymore; note that the
      primitive used non-RCU mount traversals in other direction (lookup_mnt())
      doesn't bother with that either - it uses mount_lock seqcount instead.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2aa38470
    • A
      helper for mount rootwards traversal · 7ef482fa
      Al Viro 提交于
      The loops in follow_dotdot{_rcu()} are doing the same thing:
      we have a mount and we want to find out how far up the chain
      of mounts do we need to go.
      
      We follow the chain of mount until we find one that is not
      directly overmounting the root of another mount.  If such
      a mount is found, we want the location it's mounted upon.
      If we run out of chain (i.e. get to a mount that is not
      mounted on anything else) or run into process' root, we
      report failure.
      
      On success, we want (in RCU case) d_seq of resulting location
      sampled or (in non-RCU case) references to that location
      acquired.
      
      This commit introduces such primitive for RCU case and
      switches follow_dotdot_rcu() to it; non-RCU case will be
      go in the next commit.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7ef482fa
    • A
      follow_dotdot(): be lazy about changing nd->path · 165200d6
      Al Viro 提交于
      Change nd->path only after the loop is done and only in case we hadn't
      ended up finding ourselves in root.  Same for NO_XDEV check.
      
      That separates the "check how far back do we need to go through the
      mount stack" logics from the rest of .. traversal.
      
      NOTE: path_get/path_put introduced here are temporary.  They will
      go away later in the series.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      165200d6
    • A
      follow_dotdot_rcu(): be lazy about changing nd->path · efe772d6
      Al Viro 提交于
      Change nd->path only after the loop is done and only in case we hadn't
      ended up finding ourselves in root.  Same for NO_XDEV check.  Don't
      recheck mount_lock on each step either.
      
      That separates the "check how far back do we need to go through the
      mount stack" logics from the rest of .. traversal.
      
      Note that the sequence for d_seq/d_inode here is
      	* sample mount_lock seqcount
      ...
      	* sample d_seq
      	* fetch d_inode
      	* verify mount_lock seqcount
      The last step makes sure that d_inode value we'd got matches d_seq -
      it dentry is guaranteed to have been a mountpoint through the
      entire thing, so its d_inode must have been stable.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      efe772d6
    • A
      follow_dotdot{,_rcu}(): massage loops · 12487f30
      Al Viro 提交于
      The logics in both of them is the same:
      	while true
      		if in process' root	// uncommon
      			break
      		if *not* in mount root	// normal case
      			find the parent
      			return
      		if at absolute root	// very uncommon
      			break
      		move to underlying mountpoint
      	report that we are in root
      
      Pull the common path out of the loop:
      	if in process' root		// uncommon
      		goto in_root
      	if unlikely(in mount root)
      		while true
      			if at absolute root
      				goto in_root
      			move to underlying mountpoint
      			if in process' root
      				goto in_root
      			if in mount root
      				break;
      	find the parent	// we are not in mount root
      	return
      in_root:
      	report that we are in root
      
      The reason for that transformation is that we get to keep the
      common path straight *and* get a separate block for "move
      through underlying mountpoints", which will allow to sanitize
      NO_XDEV handling there.  What's more, the pared-down loops
      will be easier to deal with - in particular, non-RCU case
      has no need to grab mount_lock and rewriting it to the
      form that wouldn't do that is a non-trivial change.  Better
      do that with less stuff getting in the way...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      12487f30
    • A
      lift all calls of step_into() out of follow_dotdot/follow_dotdot_rcu · c2df1968
      Al Viro 提交于
      lift step_into() into handle_dots() (where they merge with each other);
      have follow_... return dentry and pass inode/seq to the caller.
      
      [braino fix folded; kudos to Qian Cai <cai@lca.pw> for reporting it]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c2df1968
  11. 14 3月, 2020 3 次提交