1. 07 4月, 2021 2 次提交
    • A
      LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late · 4f0ed93f
      Al Viro 提交于
      That (and traversals in case of umount .) should be done before
      complete_walk().  Either a braino or mismerge damage on queue
      reorders - either way, I should've spotted that much earlier.
      Fucked-up-by: NAl Viro <viro@zeniv.linux.org.uk>
      X-Paperbag: Brown
      Fixes: 161aff1d "LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()"
      Cc: stable@vger.kernel.org # v5.7+
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4f0ed93f
    • A
      Make sure nd->path.mnt and nd->path.dentry are always valid pointers · 7d01ef75
      Al Viro 提交于
      Initialize them in set_nameidata() and make sure that terminate_walk() clears them
      once the pointers become potentially invalid (i.e. we leave RCU mode or drop them
      in non-RCU one).  Currently we have "path_init() always initializes them and nobody
      accesses them outside of path_init()/terminate_walk() segments", which is asking
      for trouble.
      
      With that change we would have nd->path.{mnt,dentry}
      	1) always valid - NULL or pointing to currently allocated objects.
      	2) non-NULL while we are successfully walking
      	3) NULL when we are not walking at all
      	4) contributing to refcounts whenever non-NULL outside of RCU mode.
      
      Fixes: 6c6ec2b0 ("fs: add support for LOOKUP_CACHED")
      Reported-by: syzbot+c88a7030da47945a3cc3@syzkaller.appspotmail.com
      Tested-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7d01ef75
  2. 21 2月, 2021 1 次提交
    • A
      fix handling of nd->depth on LOOKUP_CACHED failures in try_to_unlazy* · eacd9aa8
      Al Viro 提交于
      After switching to non-RCU mode, we want nd->depth to match the number
      of entries in nd->stack[] that need eventual path_put().
      legitimize_links() takes care of that on failures; unfortunately,
      failure exits added for LOOKUP_CACHED do not.
      
      We could add the logics for that into those failure exits, both in
      try_to_unlazy() and in try_to_unlazy_next(), but since both checks
      are immediately followed by legitimize_links() and there's no calls
      of legitimize_links() other than those two...  It's easier to
      move the check (and required handling of nd->depth on failure) into
      legitimize_links() itself.
      
      [caught by Jens: ... and since we are zeroing ->depth here, we need
      to do drop_links() first]
      
      Fixes: 6c6ec2b0 "fs: add support for LOOKUP_CACHED"
      Tested-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      eacd9aa8
  3. 24 1月, 2021 9 次提交
  4. 05 1月, 2021 3 次提交
    • J
      fs: add support for LOOKUP_CACHED · 6c6ec2b0
      Jens Axboe 提交于
      io_uring always punts opens to async context, since there's no control
      over whether the lookup blocks or not. Add LOOKUP_CACHED to support
      just doing the fast RCU based lookups, which we know will not block. If
      we can do a cached path resolution of the filename, then we don't have
      to always punt lookups for a worker.
      
      During path resolution, we always do LOOKUP_RCU first. If that fails and
      we terminate LOOKUP_RCU, then fail a LOOKUP_CACHED attempt as well.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6c6ec2b0
    • A
      saner calling conventions for unlazy_child() · ae66db45
      Al Viro 提交于
      same as for the previous commit - instead of 0/-ECHILD make
      it return true/false, rename to try_to_unlazy_child().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ae66db45
    • J
      fs: make unlazy_walk() error handling consistent · e36cffed
      Jens Axboe 提交于
      Most callers check for non-zero return, and assume it's -ECHILD (which
      it always will be). One caller uses the actual error return. Clean this
      up and make it fully consistent, by having unlazy_walk() return a bool
      instead. Rename it to try_to_unlazy() and return true on success, and
      failure on error. That's easier to read.
      
      No functional changes in this patch.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e36cffed
  5. 04 1月, 2021 2 次提交
    • S
      fs/namei.c: Remove unlikely of status being -ECHILD in lookup_fast() · 26ddb45e
      Steven Rostedt (VMware) 提交于
      Running my yearly branch profiling code, it detected a 100% wrong branch
      condition in name.c for lookup_fast(). The code in question has:
      
      		status = d_revalidate(dentry, nd->flags);
      		if (likely(status > 0))
      			return dentry;
      		if (unlazy_child(nd, dentry, seq))
      			return ERR_PTR(-ECHILD);
      		if (unlikely(status == -ECHILD))
      			/* we'd been told to redo it in non-rcu mode */
      			status = d_revalidate(dentry, nd->flags);
      
      If the status of the d_revalidate() is greater than zero, then the function
      finishes. Otherwise, if it is an "unlazy_child" it returns with -ECHILD.
      After the above two checks, the status is compared to -ECHILD, as that is
      what is returned if the original d_revalidate() needed to be done in a
      non-rcu mode.
      
      Especially this path is called in a condition of:
      
      	if (nd->flags & LOOKUP_RCU) {
      
      And most of the d_revalidate() functions have:
      
      	if (flags & LOOKUP_RCU)
      		return -ECHILD;
      
      It appears that that is the only case that this if statement is triggered
      on two of my machines, running in production.
      
      As it is dependent on what filesystem mix is configured in the running
      kernel, simply remove the unlikely() from the if statement.
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      26ddb45e
    • A
      do_tmpfile(): don't mess with finish_open() · 1e8f44f1
      Al Viro 提交于
      use vfs_open() instead
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1e8f44f1
  6. 11 12月, 2020 1 次提交
  7. 10 12月, 2020 1 次提交
  8. 25 9月, 2020 1 次提交
  9. 28 8月, 2020 1 次提交
  10. 15 8月, 2020 1 次提交
  11. 13 8月, 2020 3 次提交
    • K
      exec: move path_noexec() check earlier · 0fd338b2
      Kees Cook 提交于
      The path_noexec() check, like the regular file check, was happening too
      late, letting LSMs see impossible execve()s.  Check it earlier as well in
      may_open() and collect the redundant fs/exec.c path_noexec() test under
      the same robustness comment as the S_ISREG() check.
      
      My notes on the call path, and related arguments, checks, etc:
      
      do_open_execat()
          struct open_flags open_exec_flags = {
              .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
              .acc_mode = MAY_EXEC,
              ...
          do_filp_open(dfd, filename, open_flags)
              path_openat(nameidata, open_flags, flags)
                  file = alloc_empty_file(open_flags, current_cred());
                  do_open(nameidata, file, open_flags)
                      may_open(path, acc_mode, open_flag)
                          /* new location of MAY_EXEC vs path_noexec() test */
                          inode_permission(inode, MAY_OPEN | acc_mode)
                              security_inode_permission(inode, acc_mode)
                      vfs_open(path, file)
                          do_dentry_open(file, path->dentry->d_inode, open)
                              security_file_open(f)
                              open()
          /* old location of path_noexec() test */
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Link: http://lkml.kernel.org/r/20200605160013.3954297-4-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0fd338b2
    • K
      exec: move S_ISREG() check earlier · 633fb6ac
      Kees Cook 提交于
      The execve(2)/uselib(2) syscalls have always rejected non-regular files.
      Recently, it was noticed that a deadlock was introduced when trying to
      execute pipes, as the S_ISREG() test was happening too late.  This was
      fixed in commit 73601ea5 ("fs/open.c: allow opening only regular files
      during execve()"), but it was added after inode_permission() had already
      run, which meant LSMs could see bogus attempts to execute non-regular
      files.
      
      Move the test into the other inode type checks (which already look for
      other pathological conditions[1]).  Since there is no need to use
      FMODE_EXEC while we still have access to "acc_mode", also switch the test
      to MAY_EXEC.
      
      Also include a comment with the redundant S_ISREG() checks at the end of
      execve(2)/uselib(2) to note that they are present to avoid any mistakes.
      
      My notes on the call path, and related arguments, checks, etc:
      
      do_open_execat()
          struct open_flags open_exec_flags = {
              .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
              .acc_mode = MAY_EXEC,
              ...
          do_filp_open(dfd, filename, open_flags)
              path_openat(nameidata, open_flags, flags)
                  file = alloc_empty_file(open_flags, current_cred());
                  do_open(nameidata, file, open_flags)
                      may_open(path, acc_mode, open_flag)
      		    /* new location of MAY_EXEC vs S_ISREG() test */
                          inode_permission(inode, MAY_OPEN | acc_mode)
                              security_inode_permission(inode, acc_mode)
                      vfs_open(path, file)
                          do_dentry_open(file, path->dentry->d_inode, open)
                              /* old location of FMODE_EXEC vs S_ISREG() test */
                              security_file_open(f)
                              open()
      
      [1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Link: http://lkml.kernel.org/r/20200605160013.3954297-3-keescook@chromium.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      633fb6ac
    • A
      fix breakage in do_rmdir() · 24fb33d4
      Al Viro 提交于
      syzbot reported and bisected a use-after-free due to the recent init
      cleanups.
      
      The putname() should happen only after we'd *not* branched to retry,
      same as it's done in do_unlinkat().
      
      Reported-by: syzbot+bbeb1c88016c7db4aa24@syzkaller.appspotmail.com
      Fixes: e24ab0ef "fs: push the getname from do_rmdir into the callers"
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24fb33d4
  12. 31 7月, 2020 5 次提交
  13. 09 6月, 2020 2 次提交
    • L
      vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK · 63d72b93
      Linus Torvalds 提交于
      posix_acl_permission() does not care about MAY_NOT_BLOCK, and in fact
      the permission logic internally must not check that bit (it's only for
      upper layers to decide whether they can block to do IO to look up the
      acl information or not).
      
      But the way the code was written, it _looked_ like it cared, since the
      function explicitly did not mask that bit off.
      
      But it has exactly two callers: one for when that bit is set, which
      first clears the bit before calling posix_acl_permission(), and the
      other call site when that bit was clear.
      
      So stop the silly games "saving" the MAY_NOT_BLOCK bit that must not be
      used for the actual permission test, and that currently is pointlessly
      cleared by the callers when the function itself should just not care.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      63d72b93
    • L
      vfs: do not do group lookup when not necessary · 5fc475b7
      Linus Torvalds 提交于
      Rasmus Villemoes points out that the 'in_group_p()' tests can be a
      noticeable expense, and often completely unnecessary.  A common
      situation is that the 'group' bits are the same as the 'other' bits
      wrt the permissions we want to test.
      
      So rewrite 'acl_permission_check()' to not bother checking for group
      ownership when the permission check doesn't care.
      
      For example, if we're asking for read permissions, and both 'group' and
      'other' allow reading, there's really no reason to check if we're part
      of the group or not: either way, we'll allow it.
      
      Rasmus says:
       "On a bog-standard Ubuntu 20.04 install, a workload consisting of
        compiling lots of userspace programs (i.e., calling lots of
        short-lived programs that all need to get their shared libs mapped in,
        and the compilers poking around looking for system headers - lots of
        /usr/lib, /usr/bin, /usr/include/ accesses) puts in_group_p around
        0.1% according to perf top.
      
        System-installed files are almost always 0755 (directories and
        binaries) or 0644, so in most cases, we can avoid the binary search
        and the cost of pulling the cred->groups array and in_group_p() .text
        into the cpu cache"
      Reported-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5fc475b7
  14. 14 5月, 2020 1 次提交
    • M
      vfs: allow unprivileged whiteout creation · a3c751a5
      Miklos Szeredi 提交于
      Whiteouts, unlike real device node should not require privileges to create.
      
      The general concern with device nodes is that opening them can have side
      effects.  The kernel already avoids zero major (see
      Documentation/admin-guide/devices.txt).  To be on the safe side the patch
      explicitly forbids registering a char device with 0/0 number (see
      cdev_add()).
      
      This guarantees that a non-O_PATH open on a whiteout will fail with ENODEV;
      i.e. it won't have any side effect.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      a3c751a5
  15. 06 4月, 2020 1 次提交
  16. 02 4月, 2020 6 次提交
    • A
      lookup_open(): don't bother with fallbacks to lookup+create · 99a4a90c
      Al Viro 提交于
      We fall back to lookup+create (instead of atomic_open) in several cases:
      	1) we don't have write access to filesystem and O_TRUNC is
      present in the flags.  It's not something we want ->atomic_open() to
      see - it just might go ahead and truncate the file.  However, we can
      pass it the flags sans O_TRUNC - eventually do_open() will call
      handle_truncate() anyway.
      	2) we have O_CREAT | O_EXCL and we can't write to parent.
      That's going to be an error, of course, but we want to know _which_
      error should that be - might be EEXIST (if file exists), might be
      EACCES or EROFS.  Simply stripping O_CREAT (and checking if we see
      ENOENT) would suffice, if not for O_EXCL.  However, we used to have
      ->atomic_open() fully responsible for rejecting O_CREAT | O_EXCL
      on existing file and just stripping O_CREAT would've disarmed
      those checks.  With nothing downstream to catch the problem -
      FMODE_OPENED used to be "don't bother with EEXIST checks,
      ->atomic_open() has done those".  Now EEXIST checks downstream
      are skipped only if FMODE_CREATED is set - FMODE_OPENED alone
      is not enough.  That has eliminated the need to fall back onto
      lookup+create path in this case.
      	3) O_WRONLY or O_RDWR when we have no write access to
      filesystem, with nothing else objectionable.  Fallback is
      (and had always been) pointless.
      
      IOW, we don't really need that fallback; all we need in such
      cases is to trim O_TRUNC and O_CREAT properly.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      99a4a90c
    • A
      atomic_open(): no need to pass struct open_flags anymore · d489cf9a
      Al Viro 提交于
      argument had been unused since 1643b43f (lookup_open(): lift the
      "fallback to !O_CREAT" logics from atomic_open()) back in 2016
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d489cf9a
    • A
      ff326a32
    • A
      open_last_lookups(): lift O_EXCL|O_CREAT handling into do_open() · b94e0b32
      Al Viro 提交于
      Currently path_openat() has "EEXIST on O_EXCL|O_CREAT" checks done on one
      of the ways out of open_last_lookups().  There are 4 cases:
      	1) the last component is . or ..; check is not done.
      	2) we had FMODE_OPENED or FMODE_CREATED set while in lookup_open();
      check is not done.
      	3) symlink to be traversed is found; check is not done (nor
      should it be)
      	4) everything else: check done (before complete_walk(), even).
      
      In case (1) O_EXCL|O_CREAT ends up failing with -EISDIR - that's
      	open("/tmp/.", O_CREAT|O_EXCL, 0600)
      Note that in the same conditions
      	open("/tmp", O_CREAT|O_EXCL, 0600)
      would have yielded EEXIST.  Either error is allowed, switching to -EEXIST
      in these cases would've been more consistent.
      
      Case (2) is more subtle; first of all, if we have FMODE_CREATED set, the
      object hadn't existed prior to the call.  The check should not be done in
      such a case.  The rest is problematic, though - we have
      	FMODE_OPENED set (i.e. it went through ->atomic_open() and got
      successfully opened there)
      	FMODE_CREATED is *NOT* set
      	O_CREAT and O_EXCL are both set.
      Any such case is a bug - either we failed to set FMODE_CREATED when we
      had, in fact, created an object (no such instances in the tree) or
      we have opened a pre-existing file despite having had both O_CREAT and
      O_EXCL passed.  One of those was, in fact caught (and fixed) while
      sorting out this mess (gfs2 on cold dcache).  And in such situations
      we should fail with EEXIST.
      
      Note that for (1) and (4) FMODE_CREATED is not set - for (1) there's nothing
      in handle_dots() to set it, for (4) we'd explicitly checked that.
      
      And (1), (2) and (4) are exactly the cases when we leave the loop in
      the caller, with do_open() called immediately after that loop.  IOW, we
      can move the check over there, and make it
      
      	If we have O_CREAT|O_EXCL and after successful pathname resolution
      FMODE_CREATED is *not* set, we must have run into a preexisting file and
      should fail with EEXIST.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b94e0b32
    • A
    • A
      f7bb959d