1. 21 Feb, 2021 1 commit
    • A
      fix handling of nd->depth on LOOKUP_CACHED failures in try_to_unlazy* · eacd9aa8
      Al Viro committed
      After switching to non-RCU mode, we want nd->depth to match the number
      of entries in nd->stack[] that need eventual path_put().
      legitimize_links() takes care of that on failures; unfortunately,
      failure exits added for LOOKUP_CACHED do not.
      
      We could add the logic for that into those failure exits, both in
      try_to_unlazy() and in try_to_unlazy_next(), but since both checks
      are immediately followed by legitimize_links() and there are no calls
      of legitimize_links() other than those two...  It's easier to
      move the check (and required handling of nd->depth on failure) into
      legitimize_links() itself.
      
      [caught by Jens: ... and since we are zeroing ->depth here, we need
      to do drop_links() first]
      
      Fixes: 6c6ec2b0 "fs: add support for LOOKUP_CACHED"
      Tested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      eacd9aa8
  2. 05 Jan, 2021 4 commits
    • J
      fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED · 99668f61
      Jens Axboe committed
      Now that we support non-blocking path resolution internally, expose it
      via openat2() in the struct open_how ->resolve flags. This allows
      applications using openat2() to limit path resolution to the extent that
      it is already cached.
      
      If the lookup cannot be satisfied in a non-blocking manner, openat2(2)
      will return -1/-EAGAIN.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      99668f61
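A minimal userspace sketch of how an application might use the flag described above: try a cached-only open first and fall back to an ordinary blocking open() when the kernel answers EAGAIN. The helper name `open_cached_first` is hypothetical, the fallback on EINVAL/ENOSYS/EPERM is an assumption about older kernels and sandboxes, and the fallback constants are taken from the kernel UAPI headers for systems whose libc headers predate them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>   /* struct open_how, RESOLVE_* flags */
#include <sys/syscall.h>
#include <unistd.h>

#ifndef RESOLVE_CACHED
#define RESOLVE_CACHED 0x20  /* value from the kernel UAPI header */
#endif
#ifndef SYS_openat2
#define SYS_openat2 437      /* assumption: unified syscall number */
#endif

/*
 * Hypothetical helper: attempt a cached-only open; if the lookup cannot be
 * satisfied without blocking (EAGAIN), or the running kernel or sandbox does
 * not offer openat2()/RESOLVE_CACHED, retry with a regular open().
 * *used_fallback is set to 1 when the blocking path was taken.
 */
static int open_cached_first(const char *path, int flags, int *used_fallback)
{
    struct open_how how = {
        .flags = (unsigned int)flags,
        .resolve = RESOLVE_CACHED,
    };

    long fd = syscall(SYS_openat2, AT_FDCWD, path, &how, sizeof(how));
    if (fd >= 0) {
        *used_fallback = 0;
        return (int)fd;
    }

    if (errno == EAGAIN || errno == EINVAL ||
        errno == ENOSYS || errno == EPERM) {
        *used_fallback = 1;
        return open(path, flags);
    }
    return -1; /* a genuine error such as ENOENT; errno is preserved */
}
```

A caller that must not block (an event loop, for example) would typically hand the fallback open() to a worker thread instead of calling it inline as this sketch does.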
    • J
      fs: add support for LOOKUP_CACHED · 6c6ec2b0
      Jens Axboe committed
      io_uring always punts opens to async context, since there's no control
      over whether the lookup blocks or not. Add LOOKUP_CACHED to support
      just doing the fast RCU based lookups, which we know will not block. If
      we can do a cached path resolution of the filename, then we don't have
      to always punt lookups to a worker.
      
      During path resolution, we always do LOOKUP_RCU first. If that fails and
      we terminate LOOKUP_RCU, then fail a LOOKUP_CACHED attempt as well.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6c6ec2b0
    • A
      saner calling conventions for unlazy_child() · ae66db45
      Al Viro committed
      same as for the previous commit - instead of 0/-ECHILD make
      it return true/false, rename to try_to_unlazy_child().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ae66db45
    • J
      fs: make unlazy_walk() error handling consistent · e36cffed
      Jens Axboe committed
      Most callers check for non-zero return, and assume it's -ECHILD (which
      it always will be). One caller uses the actual error return. Clean this
      up and make it fully consistent, by having unlazy_walk() return a bool
      instead. Rename it to try_to_unlazy() and return true on success, and
      false on error. That's easier to read.
      
      No functional changes in this patch.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e36cffed
  3. 04 Jan, 2021 2 commits
    • S
      fs/namei.c: Remove unlikely of status being -ECHILD in lookup_fast() · 26ddb45e
      Steven Rostedt (VMware) committed
      Running my yearly branch profiling code, it detected a 100% wrong branch
      condition in namei.c for lookup_fast(). The code in question has:
      
      		status = d_revalidate(dentry, nd->flags);
      		if (likely(status > 0))
      			return dentry;
      		if (unlazy_child(nd, dentry, seq))
      			return ERR_PTR(-ECHILD);
      		if (unlikely(status == -ECHILD))
      			/* we'd been told to redo it in non-rcu mode */
      			status = d_revalidate(dentry, nd->flags);
      
      If the status of the d_revalidate() is greater than zero, then the function
      finishes. Otherwise, if it is an "unlazy_child" it returns with -ECHILD.
      After the above two checks, the status is compared to -ECHILD, as that is
      what is returned if the original d_revalidate() needed to be done in a
      non-rcu mode.
      
      In particular, this path is only reached under the condition:
      
      	if (nd->flags & LOOKUP_RCU) {
      
      And most of the d_revalidate() functions have:
      
      	if (flags & LOOKUP_RCU)
      		return -ECHILD;
      
      On two of my machines, running in production, that appears to be the
      only case in which this if statement is triggered.
      
      As it is dependent on what filesystem mix is configured in the running
      kernel, simply remove the unlikely() from the if statement.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      26ddb45e
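For readers unfamiliar with the branch hints discussed above, here is a toy userspace model of the control flow. The `likely()`/`unlikely()` definitions mirror the usual `__builtin_expect` form; `revalidate_model()` and `lookup_fast_model()` are hypothetical stand-ins, not the kernel functions, and the unlazy step is assumed to succeed. The hints only affect code layout, never the result, which is why a wrong hint shows up in branch profiling rather than as a bug.

```c
#include <errno.h>

/* Userspace approximations of the kernel's branch-hint macros. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Toy stand-in for d_revalidate(): like many filesystems, it refuses to
 * work in RCU mode and asks the caller to retry by returning -ECHILD.
 */
static int revalidate_model(int rcu_mode)
{
    return rcu_mode ? -ECHILD : 1;
}

/*
 * Toy model of the sequence quoted in the commit message. *retries counts
 * how often the non-RCU retry branch is taken.
 */
static int lookup_fast_model(int *retries)
{
    int status = revalidate_model(1);      /* first attempt, RCU mode */

    if (likely(status > 0))
        return status;

    /* (dropping out of RCU mode is assumed to succeed here) */

    if (status == -ECHILD) {               /* no unlikely(): common case */
        ++*retries;
        status = revalidate_model(0);      /* redo it in non-RCU mode */
    }
    return status;
}
```

With a revalidate function of this shape, every call that fails the first check takes the -ECHILD branch, which matches the "100% wrong" prediction reported by the profiler.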
    • A
      do_tmpfile(): don't mess with finish_open() · 1e8f44f1
      Al Viro committed
      use vfs_open() instead
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      1e8f44f1
  4. 28 Dec, 2020 1 commit
  5. 23 Dec, 2020 5 commits
  6. 22 Dec, 2020 5 commits
  7. 21 Dec, 2020 4 commits
  8. 20 Dec, 2020 10 commits
  9. 19 Dec, 2020 4 commits
    • C
      close_range: unshare all fds for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC · fec8a6a6
      Christian Brauner committed
      After introducing CLOSE_RANGE_CLOEXEC syzbot reported a crash when
      CLOSE_RANGE_CLOEXEC is specified in conjunction with CLOSE_RANGE_UNSHARE.
      When CLOSE_RANGE_UNSHARE is specified the caller will receive a private
      file descriptor table in case their file descriptor table is currently
      shared.
      
      For the case where the caller has requested all file descriptors to be
      actually closed via e.g. close_range(3, ~0U, 0) the kernel knows that
      the caller does not need any of the file descriptors anymore and will
      optimize the close operation by only copying all files in the range from
      0 to 3 and no others.
      
      However, if the caller requested CLOSE_RANGE_CLOEXEC together with
      CLOSE_RANGE_UNSHARE the caller wants to still make use of the file
      descriptors so the kernel needs to copy all of them and can't optimize.
      
      The original patch didn't account for this and thus could cause oopses
      as evidenced by the syzbot report because it assumed that all fds had
      been copied. Fix this by handling the CLOSE_RANGE_CLOEXEC case.
      
      syzbot reported
      ==================================================================
      BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline]
      BUG: KASAN: null-ptr-deref in atomic64_read include/asm-generic/atomic-instrumented.h:837 [inline]
      BUG: KASAN: null-ptr-deref in atomic_long_read include/asm-generic/atomic-long.h:29 [inline]
      BUG: KASAN: null-ptr-deref in filp_close+0x22/0x170 fs/open.c:1274
      Read of size 8 at addr 0000000000000077 by task syz-executor511/8522
      
      CPU: 1 PID: 8522 Comm: syz-executor511 Not tainted 5.10.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       __kasan_report mm/kasan/report.c:549 [inline]
       kasan_report.cold+0x5/0x37 mm/kasan/report.c:562
       check_memory_region_inline mm/kasan/generic.c:186 [inline]
       check_memory_region+0x13d/0x180 mm/kasan/generic.c:192
       instrument_atomic_read include/linux/instrumented.h:71 [inline]
       atomic64_read include/asm-generic/atomic-instrumented.h:837 [inline]
       atomic_long_read include/asm-generic/atomic-long.h:29 [inline]
       filp_close+0x22/0x170 fs/open.c:1274
       close_files fs/file.c:402 [inline]
       put_files_struct fs/file.c:417 [inline]
       put_files_struct+0x1cc/0x350 fs/file.c:414
       exit_files+0x12a/0x170 fs/file.c:435
       do_exit+0xb4f/0x2a00 kernel/exit.c:818
       do_group_exit+0x125/0x310 kernel/exit.c:920
       get_signal+0x428/0x2100 kernel/signal.c:2792
       arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
       handle_signal_work kernel/entry/common.c:147 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x124/0x200 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x447039
      Code: Unable to access opcode bytes at RIP 0x44700f.
      RSP: 002b:00007f1b1225cdb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
      RAX: 0000000000000001 RBX: 00000000006dbc28 RCX: 0000000000447039
      RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000006dbc2c
      RBP: 00000000006dbc20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c
      R13: 00007fff223b6bef R14: 00007f1b1225d9c0 R15: 00000000006dbc2c
      ==================================================================
      
      syzbot has tested the proposed patch and the reproducer did not trigger any issue:
      
      Reported-and-tested-by: syzbot+96cfd2b22b3213646a93@syzkaller.appspotmail.com
      
      Tested on:
      
      commit:         10f7cddd selftests/core: add regression test for CLOSE_RAN..
      git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git vfs
      kernel config:  https://syzkaller.appspot.com/x/.config?x=5d42216b510180e3
      dashboard link: https://syzkaller.appspot.com/bug?extid=96cfd2b22b3213646a93
      compiler:       gcc (GCC) 10.1.0-syz 20200507
      
      Reported-by: syzbot+96cfd2b22b3213646a93@syzkaller.appspotmail.com
      Fixes: 582f1fb6 ("fs, close_range: add flag CLOSE_RANGE_CLOEXEC")
      Cc: Giuseppe Scrivano <gscrivan@redhat.com>
      Cc: linux-fsdevel@vger.kernel.org
      Link: https://lore.kernel.org/r/20201217213303.722643-1-christian.brauner@ubuntu.com
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      fec8a6a6
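The flag combination that triggered the crash can be exercised from userspace roughly as follows. This is a sketch, not the syzbot reproducer: the helper name `mark_cloexec_from` is hypothetical, the raw syscall is used because older C libraries lack a close_range() wrapper, and the fallback constants are assumed from the kernel UAPI headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_close_range
#define SYS_close_range 436          /* assumption: unified syscall number */
#endif
#ifndef CLOSE_RANGE_UNSHARE
#define CLOSE_RANGE_UNSHARE (1U << 1)
#endif
#ifndef CLOSE_RANGE_CLOEXEC
#define CLOSE_RANGE_CLOEXEC (1U << 2)
#endif

/*
 * Hypothetical helper: give the caller a private descriptor table and mark
 * every descriptor >= first as close-on-exec. The descriptors stay open and
 * usable, so the kernel has to copy all of them when unsharing.
 * Returns 0 on success, -1 with errno set otherwise.
 */
static int mark_cloexec_from(unsigned int first)
{
    return (int)syscall(SYS_close_range, first, ~0U,
                        CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC);
}
```

On kernels without CLOSE_RANGE_CLOEXEC the call is expected to fail with EINVAL, and without close_range() at all with ENOSYS, so callers should be prepared to set FD_CLOEXEC with fcntl() instead.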
    • P
      io_uring: fix 0-iov read buffer select · dd201662
      Pavel Begunkov committed
      Doing vectored buf-select read with 0 iovec passed is meaningless and
      utterly broken, forbid it.
      
      Cc: <stable@vger.kernel.org> # 5.7+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      dd201662
    • B
      Add SMB 2 support for getting and setting SACLs · 9541b813
      Boris Protopopov committed
      Fix passing of the additional security info via version
      operations. Force new open when getting SACL and avoid
      reuse of files that were previously open without
      sufficient privileges to access SACLs.
      Signed-off-by: Boris Protopopov <pboris@amazon.com>
      Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      9541b813
    • B
      SMB3: Add support for getting and setting SACLs · 3970acf7
      Boris Protopopov committed
      Add SYSTEM_SECURITY access flag and use with smb2 when opening
      files for getting/setting SACLs. Add "system.cifs_ntsd_full"
      extended attribute to allow user-space access to the functionality.
      Avoid multiple server calls when setting owner, DACL, and SACL.
      Signed-off-by: Boris Protopopov <pboris@amazon.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      3970acf7
  10. 18 Dec, 2020 4 commits
    • P
      io_uring: close a small race gap for files cancel · dfea9fce
      Pavel Begunkov committed
      The purpose of io_uring_cancel_files() is to wait for all requests
      matching ->files to go/be cancelled. We should first drop files of a
      request in io_req_drop_files() and only then make it undiscoverable for
      io_uring_cancel_files.
      
      First drop, then delete from list. It's ok to leave req->id->files
      dangling, because it's not dereferenced by cancellation code, only
      compared against. It would potentially go to sleep and be awakened by
      the subsequent wake_up() in io_req_drop_files().
      
      Fixes: 0f212204 ("io_uring: don't rely on weak ->files references")
      Cc: <stable@vger.kernel.org> # 5.5+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      dfea9fce
    • X
      io_uring: fix io_wqe->work_list corruption · 0020ef04
      Xiaoguang Wang committed
      The first time a req is punted to io-wq, we initialize io_wq_work's
      list to NULL, then insert the req into io_wqe->work_list. If this req is
      not inserted at the tail of io_wqe->work_list, this req's io_wq_work list
      will point to another req's io_wq_work. In the split bio case, this req
      may be inserted into io_wqe->work_list repeatedly; once we insert it at
      the tail of io_wqe->work_list for the second time, io_wq_work->list->next
      will be an invalid pointer, which then results in many strange errors:
      panic, kernel soft-lockup, rcu stall, etc.
      
      In my vm, the kernel does not have commit cc29e1bf ("block: disable
      iopoll for split bio"), below fio job can reproduce this bug steadily:
      [global]
      name=iouring-sqpoll-iopoll-1
      ioengine=io_uring
      iodepth=128
      numjobs=1
      thread
      rw=randread
      direct=1
      registerfiles=1
      hipri=1
      bs=4m
      size=100M
      runtime=120
      time_based
      group_reporting
      randrepeat=0
      
      [device]
      directory=/home/feiman.wxg/mntpoint/  # an ext4 mount point
      
      If we have commit cc29e1bf ("block: disable iopoll for split bio"),
      there will be no split bio case for polled io, but I think we still need
      to fix this list corruption; it should probably also go to the stable
      branches.
      
      To fix this corruption, if a req is inserted at the tail of
      io_wqe->work_list, initialize req->io_wq_work->list->next to NULL.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0020ef04
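The corruption and its fix can be illustrated with a toy singly linked list in userspace. This is a sketch of the idea only: `struct work_node`, `struct work_list` and the `*_model` helpers are hypothetical names, not the io-wq types. The key line is the one that clears `node->next` on tail insertion, so that a node re-queued after an earlier removal cannot drag a stale pointer back into the list.

```c
#include <stddef.h>

struct work_node { struct work_node *next; };
struct work_list { struct work_node *first, *last; };

/* Append a node; the tail node must never carry a stale ->next pointer. */
static void list_add_tail_model(struct work_node *node, struct work_list *list)
{
    node->next = NULL;               /* the fix described above */
    if (!list->first) {
        list->first = node;
        list->last = node;
    } else {
        list->last->next = node;
        list->last = node;
    }
}

/*
 * Remove and return the head. Deliberately leaves head->next untouched to
 * model a node whose link still points at another work item.
 */
static struct work_node *list_pop_head_model(struct work_list *list)
{
    struct work_node *head = list->first;

    if (!head)
        return NULL;
    list->first = head->next;
    if (!list->first)
        list->last = NULL;
    return head;
}

/* Count the nodes; terminates only if the list is properly NULL-terminated. */
static size_t list_length_model(const struct work_list *list)
{
    size_t n = 0;

    for (const struct work_node *p = list->first; p; p = p->next)
        n++;
    return n;
}
```

Without the `node->next = NULL` line, queuing `a`, `b`, popping `a`, and queuing `a` again would leave `a->next` pointing at `b` while `b->next` points at `a`, a cycle that any traversal follows forever.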
    • S
      cifs: Avoid error pointer dereference · 0bf1bafb
      Samuel Cabrero committed
      The patch 7d6535b7: "cifs: Simplify reconnect code when dfs
      upcall is enabled" leads to the following static checker warning:
      
      	fs/cifs/connect.c:160 reconn_set_next_dfs_target()
      	error: 'server->hostname' dereferencing possible ERR_PTR()
      
      Avoid dereferencing the error pointer by early returning on error
      condition.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Samuel Cabrero <scabrero@suse.de>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      0bf1bafb
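The class of bug flagged by the static checker can be shown with a userspace imitation of the kernel's error-pointer convention. The macros below are simplified re-creations for illustration (assuming a platform where `long` and pointers have the same width), and `get_hostname_model()` and `hostname_length_model()` are hypothetical; the point is the early return before the pointer is ever dereferenced.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error codes live in the last MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup that returns either a string or an encoded error. */
static const char *get_hostname_model(int fail)
{
    return fail ? ERR_PTR(-ENOMEM) : "server.example";
}

/* Returns the hostname length, or a negative errno on failure. */
static long hostname_length_model(int fail)
{
    const char *name = get_hostname_model(fail);

    if (IS_ERR(name))
        return PTR_ERR(name);        /* return early, never dereference */

    return (long)strlen(name);
}
```

Calling `strlen()` on the error pointer instead would read from an address near the top of the address space, which is the dereference the warning is about.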
    • D
      cifs: Re-indent cifs_swn_reconnect() · 0f2c66ae
      Dan Carpenter committed
      This code is slightly nicer if we flip the cifs_sockaddr_equal()
      around and pull all the code in one tab.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Samuel Cabrero <scabrero@suse.de>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      0f2c66ae