1. 28 Apr, 2021 1 commit
  2. 27 Apr, 2021 1 commit
  3. 24 Apr, 2021 2 commits
  4. 23 Apr, 2021 9 commits
  5. 18 Apr, 2021 1 commit
    • readdir: make sure to verify directory entry for legacy interfaces too · 0c93ac69
      Committed by Linus Torvalds
      This does the directory entry name verification for the legacy
      "fillonedir" (and compat) interface that goes all the way back to the
      dark ages before we had a proper dirent, and the readdir() system call
      returned just a single entry at a time.
      
      Nobody should use this interface unless you still have binaries from
      1991, but let's do it right.
      
      This came up during discussions about unsafe_copy_to_user() and proper
      checking of all the inputs to it, as the networking layer is looking to
      use it in a few new places.  So let's make sure the _old_ users do it
      all right and proper, before we add new ones.
      
      See also commit 8a23eb80 ("Make filldir[64]() verify the directory
      entry filename is valid") which did the proper modern interfaces that
      people actually use. It had a note:
      
          Note that I didn't bother adding the checks to any legacy interfaces
          that nobody uses.
      
      which this now corrects.  Note that we really don't care about POSIX and
      the presence of '/' in a directory entry, but verify_dirent_name() also
      ends up doing the proper name length verification which is what the
      input checking discussion was about.
      
      [ Another option would be to remove the support for this particular very
        old interface: any binaries that use it are likely a.out binaries, and
        they will no longer run anyway since we removed a.out binfmt support
        in commit eac61655 ("x86: Deprecate a.out support").
      
        But I'm not sure which came first: getdents() or ELF support, so let's
        pretend somebody might still have a working binary that uses the
        legacy readdir() case.. ]
      
      Link: https://lore.kernel.org/lkml/CAHk-=wjbvzCAhAtvG0d81W5o0-KT5PPTHhfJ5ieDFq+bGtgOYg@mail.gmail.com/
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0c93ac69
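The name checks described in the readdir commit above (length verification plus the '/' test) can be sketched in userspace C. This is a simplified illustration under stated assumptions, not the kernel source: `sketch_verify_dirent_name()` and `SKETCH_PATH_MAX` are hypothetical names, and the value 4096 is assumed to stand in for the kernel's path length limit.

```c
#include <errno.h>
#include <string.h>

/* Assumed stand-in for the kernel's path length limit; hypothetical name. */
#define SKETCH_PATH_MAX 4096

/*
 * Returns 0 if the entry name passes the checks, -EIO otherwise.
 * Rejects empty or over-long names and names containing '/'.
 */
static int sketch_verify_dirent_name(const char *name, int len)
{
	if (len <= 0 || len >= SKETCH_PATH_MAX)
		return -EIO;
	if (memchr(name, '/', (size_t)len))
		return -EIO;
	return 0;
}
```

A legacy single-entry callback would run such a check before copying the name to user space, so that a corrupted length never reaches the copy routine.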
  6. 15 Apr, 2021 1 commit
    • io_uring: fix early sqd_list removal sqpoll hangs · c7d95613
      Committed by Pavel Begunkov
      [  245.463317] INFO: task iou-sqp-1374:1377 blocked for more than 122 seconds.
      [  245.463334] task:iou-sqp-1374    state:D flags:0x00004000
      [  245.463345] Call Trace:
      [  245.463352]  __schedule+0x36b/0x950
      [  245.463376]  schedule+0x68/0xe0
      [  245.463385]  __io_uring_cancel+0xfb/0x1a0
      [  245.463407]  do_exit+0xc0/0xb40
      [  245.463423]  io_sq_thread+0x49b/0x710
      [  245.463445]  ret_from_fork+0x22/0x30
      
      The hang happens when the sqpoll thread goes to exit without having run
      park_task_work: the exiting user task may then remove the ctx from
      sqd_list, so the corresponding io_sq_thread() -> io_uring_cancel_sqpoll()
      is never executed. Fortunately, the thread just gets stuck in do_exit()
      in this case.
      
      Fixes: dbe1bdbb ("io_uring: handle signals for IO threads like a normal thread")
      Reported-by: Joakim Hassila <joj@mac.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c7d95613
  7. 10 Apr, 2021 3 commits
    • btrfs: zoned: move superblock logging zone location · 53b74fa9
      Committed by Naohiro Aota
      Moves the location of the superblock logging zones. The new locations of
      the logging zones are now determined based on fixed block addresses
      instead of on fixed zone numbers.
      
      The old placement method based on fixed zone numbers causes problems when
      one needs to inspect a file system image without access to the drive zone
      information. In such case, the super block locations cannot be reliably
      determined as the zone size is unknown. By locating the superblock logging
      zones using fixed addresses, we can scan a dumped file system image without
      the zone information since a super block copy will always be present at or
      after the fixed known locations.
      
      Introduce the following three pairs of zones containing fixed offset
      locations, regardless of the device zone size.
      
        - primary superblock: offset   0B (and the following zone)
        - first copy:         offset 512G (and the following zone)
        - second copy:        offset   4T (4096G, and the following zone)
      
      If a logging zone is outside of the disk capacity, we do not record the
      superblock copy.
      
      The first copy position is much larger than for a non-zoned filesystem,
      which is at 64M.  This is to avoid overlapping with the log zones for
      the primary superblock. This higher location is arbitrary but allows
      supporting devices with very large zone sizes, plus some space around in
      between.
      
      Such large zone size is unrealistic and very unlikely to ever be seen in
      real devices. Currently, SMR disks have a zone size of 256MB, and we are
      expecting ZNS drives to be in the 1-4GB range, so this limit gives us
      room to breathe. For now, we only allow zone sizes up to 8GB. The
      maximum zone size that would still fit in the space is 256G.
      
      The fixed location addresses are somewhat arbitrary, with the intent of
      maintaining superblock reliability for smaller and larger devices, with
      the preference for the latter. For this reason, there are two superblocks
      under the first 1T. This should cover use cases for physical devices and
      for emulated/device-mapper devices.
      
      The superblock logging zones are reserved for superblock logging and
      never used for data or metadata blocks. Note that we only reserve the
      two zones per primary/copy actually used for superblock logging. We do
      not reserve the ranges of zones possibly containing superblocks with the
      largest supported zone size (0-16GB, 512G-528GB, 4096G-4112G).
      
      The zones containing the fixed location offsets used to store
      superblocks on a non-zoned volume are also reserved to avoid confusion.
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      53b74fa9
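The fixed-offset placement described in the btrfs commit above can be illustrated with a small calculation: each superblock logging location is a fixed byte offset, and the zone holding it follows from the device's zone size. This is a sketch under stated assumptions, not the btrfs implementation: all `sketch_` names are hypothetical, and the rule that both zones of a pair must fit inside the device is an assumed reading of "outside of the disk capacity".

```c
#include <stdint.h>

#define SKETCH_GIB (1ULL << 30)

/* Fixed byte offsets of the three superblock logging locations. */
static const uint64_t sketch_sb_log_offset[3] = {
	0,                  /* primary superblock */
	512 * SKETCH_GIB,   /* first copy */
	4096 * SKETCH_GIB,  /* second copy (4T) */
};

/*
 * Stores the first zone number of the logging pair for `mirror` in *zone.
 * Returns 0 on success, -1 if the mirror index is invalid or the pair
 * does not fit on the device (no copy is recorded in that case).
 */
static int sketch_sb_log_zone(unsigned int mirror, uint64_t zone_size,
			      uint64_t capacity, uint64_t *zone)
{
	uint64_t first;

	if (mirror >= 3 || zone_size == 0)
		return -1;
	first = sketch_sb_log_offset[mirror] / zone_size;
	/* Assumption: the zone and the one following it must both fit. */
	if ((first + 2) * zone_size > capacity)
		return -1;
	*zone = first;
	return 0;
}
```

With a 256 MiB zone size, the first copy lands in zone 2048 and the second copy in zone 16384; on a 1 TiB device the second copy is skipped because its offset lies beyond the capacity.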
    • fs: direct-io: fix missing sdio->boundary · df41872b
      Committed by Jack Qiu
      I encountered a hung task issue rather than a performance one.  Running
      DIO on a device that requires contiguous LBAs (for example, an
      open-channel SSD) can hang in the following case:
      
        DIO:						Checkpoint:
        get addr A(at boundary), merge into BIO,
        no submit because boundary missing
      						flush dirty data(get addr A+1), wait IO(A+1)
      						writeback timeout, because DIO(A) didn't submit
        get addr A+2 fail, because checkpoint is doing
      
      dio_send_cur_page() may clear sdio->boundary, so prevent it from missing
      a boundary.
      
      Link: https://lkml.kernel.org/r/20210322042253.38312-1-jack.qiu@huawei.com
      Fixes: b1058b98 ("direct-io: submit bio after boundary buffer is added to it")
      Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      df41872b
    • ocfs2: fix deadlock between setattr and dio_end_io_write · 90bd070a
      Committed by Wengang Wang
      The following deadlock is detected:
      
        truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write).
      
        PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
         #0  __schedule at ffffffff818667cc
         #1  schedule at ffffffff81866de6
         #2  inode_dio_wait at ffffffff812a2d04
         #3  ocfs2_setattr at ffffffffc05f322e [ocfs2]
         #4  notify_change at ffffffff812a5a09
         #5  do_truncate at ffffffff812808f5
         #6  do_sys_ftruncate.constprop.18 at ffffffff81280cf2
         #7  sys_ftruncate at ffffffff81280d8e
         #8  do_syscall_64 at ffffffff81003949
         #9  entry_SYSCALL_64_after_hwframe at ffffffff81a001ad
      
      dio completion path is going to complete one direct IO (decrement
      inode->i_dio_count), but before that it hung at locking inode->i_rwsem:
      
         #0  __schedule+700 at ffffffff818667cc
         #1  schedule+54 at ffffffff81866de6
         #2  rwsem_down_write_failed+536 at ffffffff8186aa28
         #3  call_rwsem_down_write_failed+23 at ffffffff8185a1b7
         #4  down_write+45 at ffffffff81869c9d
         #5  ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
         #6  ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
         #7  dio_complete+140 at ffffffff812c873c
         #8  dio_aio_complete_work+25 at ffffffff812c89f9
         #9  process_one_work+361 at ffffffff810b1889
        #10  worker_thread+77 at ffffffff810b233d
        #11  kthread+261 at ffffffff810b7fd5
        #12  ret_from_fork+62 at ffffffff81a0035e
      
      Together, the two paths form an ABBA deadlock.  The same deadlock was
      mentioned in upstream commit 28f5a8a7 ("ocfs2: should wait dio before
      inode lock in ocfs2_setattr()").  It seems that commit only removed
      the cluster lock (the victim of the above deadlock) from the ABBA
      deadlock party.
      
      End-user visible effects: Process hang in truncate -> ocfs2_setattr path
      and other processes hang at ocfs2_dio_end_io_write path.
      
      This patch fixes the deadlock itself.  It removes the inode_lock() call
      from the dio completion path to break the deadlock, and takes the
      ip_alloc_sem lock in the setattr path to synchronize the inode
      modifications.
      
      [wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested]
        Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com
      
      Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      90bd070a
  8. 09 Apr, 2021 2 commits
  9. 08 Apr, 2021 4 commits
  10. 07 Apr, 2021 2 commits
    • LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late · 4f0ed93f
      Committed by Al Viro
      That (and traversals in case of umount .) should be done before
      complete_walk().  Either a braino or mismerge damage on queue
      reorders - either way, I should've spotted that much earlier.
      Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk>
      X-Paperbag: Brown
      Fixes: 161aff1d "LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()"
      Cc: stable@vger.kernel.org # v5.7+
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      4f0ed93f
    • Make sure nd->path.mnt and nd->path.dentry are always valid pointers · 7d01ef75
      Committed by Al Viro
      Initialize them in set_nameidata() and make sure that terminate_walk() clears them
      once the pointers become potentially invalid (i.e. we leave RCU mode or drop them
      in non-RCU one).  Currently we have "path_init() always initializes them and nobody
      accesses them outside of path_init()/terminate_walk() segments", which is asking
      for trouble.
      
      With that change we would have nd->path.{mnt,dentry}
      	1) always valid - NULL or pointing to currently allocated objects.
      	2) non-NULL while we are successfully walking
      	3) NULL when we are not walking at all
      	4) contributing to refcounts whenever non-NULL outside of RCU mode.
      
      Fixes: 6c6ec2b0 ("fs: add support for LOOKUP_CACHED")
      Reported-by: syzbot+c88a7030da47945a3cc3@syzkaller.appspotmail.com
      Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7d01ef75
  11. 03 Apr, 2021 1 commit
    • io_uring: fix !CONFIG_BLOCK compilation failure · e82ad485
      Committed by Jens Axboe
      kernel test robot correctly pinpoints a compilation failure if
      CONFIG_BLOCK isn't set:
      
      fs/io_uring.c: In function '__io_complete_rw':
      >> fs/io_uring.c:2509:48: error: implicit declaration of function 'io_rw_should_reissue'; did you mean 'io_rw_reissue'? [-Werror=implicit-function-declaration]
          2509 |  if ((res == -EAGAIN || res == -EOPNOTSUPP) && io_rw_should_reissue(req)) {
               |                                                ^~~~~~~~~~~~~~~~~~~~
               |                                                io_rw_reissue
          cc1: some warnings being treated as errors
      
      Ensure that we have a stub declaration of io_rw_should_reissue() for
      !CONFIG_BLOCK.
      
      Fixes: 230d50d4 ("io_uring: move reissue into regular IO path")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      e82ad485
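The stub approach mentioned in the !CONFIG_BLOCK commit above follows a common pattern: provide a trivial fallback definition when the option is off so callers compile unchanged. The sketch below illustrates that pattern only. `struct sketch_req`, its field, and `sketch_needs_reissue()` are hypothetical, the body of the CONFIG_BLOCK branch is a placeholder, and the stub returning false is an assumption about what such a stub would do.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's request structure. */
struct sketch_req {
	bool backed_by_block_file;
};

#ifdef CONFIG_BLOCK
/* Placeholder for the real check, available only with block support. */
static bool io_rw_should_reissue(struct sketch_req *req)
{
	return req->backed_by_block_file;
}
#else
/* Stub (assumed behaviour): never ask for a reissue without block support. */
static bool io_rw_should_reissue(struct sketch_req *req)
{
	(void)req;
	return false;
}
#endif

/* The caller is identical in both configurations. */
static bool sketch_needs_reissue(long res, struct sketch_req *req)
{
	return (res == -EAGAIN || res == -EOPNOTSUPP) &&
	       io_rw_should_reissue(req);
}
```

Because the fallback has the same signature, the call site that triggered the implicit-declaration error needs no `#ifdef` of its own.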
  12. 02 Apr, 2021 3 commits
    • io_uring: move reissue into regular IO path · 230d50d4
      Committed by Jens Axboe
      It's non-obvious how retry is done for block backed files, when it happens
      off the kiocb done path. It also makes it tricky to deal with the iov_iter
      handling.
      
      Just mark the req as needing a reissue, and handle it from the
      submission path instead. This makes it directly obvious that we're not
      re-importing the iovec from userspace past the submit point, and it means
      that we can just reuse our usual -EAGAIN retry path from the read/write
      handling.
      
      At some point in the future, we'll gain the ability to always reliably
      return -EAGAIN through the stack. A previous attempt on the block side
      didn't pan out and got reverted, hence the need to check for this
      information out-of-band right now.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      230d50d4
    • block: don't ignore REQ_NOWAIT for direct IO · f8b78caf
      Committed by Pavel Begunkov
      If IOCB_NOWAIT is set on submission, then that needs to get propagated to
      REQ_NOWAIT on the block side. Otherwise we completely lose this
      information, and any issuer of IOCB_NOWAIT IO will potentially end up
      blocking on, e.g., request allocation on the storage side.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f8b78caf
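The flag propagation described in the REQ_NOWAIT commit above amounts to copying one bit from the submission-side flags to the block-side flags. The following is a minimal sketch of that idea. The structures and the flag values are hypothetical (the kernel's real constants and types differ), and only the names IOCB_NOWAIT and REQ_NOWAIT come from the commit message.

```c
#include <stdint.h>

/* Hypothetical flag values; the kernel's real constants differ. */
#define SKETCH_IOCB_NOWAIT (1u << 0)
#define SKETCH_REQ_NOWAIT  (1u << 1)

/* Hypothetical stand-ins for the submission and block-layer objects. */
struct sketch_kiocb { uint32_t ki_flags; };
struct sketch_bio   { uint32_t bi_opf; };

/* Carry the caller's "do not block" request over to the block side. */
static void sketch_propagate_nowait(const struct sketch_kiocb *iocb,
				    struct sketch_bio *bio)
{
	if (iocb->ki_flags & SKETCH_IOCB_NOWAIT)
		bio->bi_opf |= SKETCH_REQ_NOWAIT;
}
```

Without this step the block side has no way to know the issuer asked not to block, which is the information loss the commit describes.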
    • file: fix close_range() for unshare+cloexec · 9b5b8722
      Committed by Christian Brauner
      syzbot reported a bug when putting the last reference to a task's file
      descriptor table. Debugging this showed we didn't recalculate the
      current maximum fd number for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC
      after we unshared the file descriptors table. So max_fd could exceed the
      current fdtable maximum causing us to set excessive bits. As a concrete
      example, let's say the user requested everything from fd 4 to ~0UL to be
      closed and their current fdtable size is 256 with their highest open fd
      being 4. With CLOSE_RANGE_UNSHARE the caller will end up with a new
      fdtable which has room for 64 file descriptors since that is the lowest
      fdtable size we accept. But now max_fd will still point to 255 and needs
      to be adjusted. Fix this by retrieving the correct maximum fd value in
      __range_cloexec().
      
      Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
      Fixes: 582f1fb6 ("fs, close_range: add flag CLOSE_RANGE_CLOEXEC")
      Fixes: fec8a6a6 ("close_range: unshare all fds for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Giuseppe Scrivano <gscrivan@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      9b5b8722
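The clamping step described in the close_range() commit above can be sketched as follows: the requested upper bound is limited to the last slot of the current fd table before any close-on-exec bits are set. This is an illustration under stated assumptions rather than the kernel code. `sketch_range_cloexec()` is a hypothetical name, and a plain array of 64-bit words stands in for the close-on-exec bitmap.

```c
#include <stdint.h>

/*
 * Mark fds in [fd, max_fd] close-on-exec in `bitmap`, which has room for
 * `table_size` descriptors. Returns the number of bits set.
 */
static unsigned int sketch_range_cloexec(uint64_t *bitmap,
					 unsigned int table_size,
					 unsigned int fd,
					 unsigned int max_fd)
{
	unsigned int i, count = 0;

	if (table_size == 0)
		return 0;
	/* Clamp the requested bound to the last slot of the current table. */
	if (max_fd > table_size - 1)
		max_fd = table_size - 1;
	for (i = fd; i <= max_fd; i++) {
		bitmap[i / 64] |= 1ULL << (i % 64);
		count++;
	}
	return count;
}
```

In the commit's example (range starting at fd 4, an unbounded upper limit, and a fresh 64-slot table), the clamp limits the range to fds 4 through 63, so 60 bits are set and nothing is written past the end of the bitmap.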
  13. 01 Apr, 2021 3 commits
  14. 31 Mar, 2021 2 commits
    • reiserfs: update reiserfs_xattrs_initialized() condition · 5e46d1b7
      Committed by Tetsuo Handa
      syzbot is reporting NULL pointer dereference at reiserfs_security_init()
      [1], for commit ab17c4f0 ("reiserfs: fixup xattr_root caching")
      is assuming that REISERFS_SB(s)->xattr_root != NULL in
      reiserfs_xattr_jcreate_nblocks() despite that commit made
      REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL
      case possible.
      
      I guess that commit 6cb4aff0 ("reiserfs: fix oops while creating
      privroot with selinux enabled") wanted to check xattr_root != NULL
      before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking
      about the xattr root.
      
        The issue is that while creating the privroot during mount
        reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which
        dereferences the xattr root. The xattr root doesn't exist, so we get
        an oops.
      
      Therefore, update reiserfs_xattrs_initialized() to check both the
      privroot and the xattr root.
      
      Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde # [1]
      Reported-and-tested-by: syzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Fixes: 6cb4aff0 ("reiserfs: fix oops while creating privroot with selinux enabled")
      Acked-by: Jeff Mahoney <jeffm@suse.com>
      Acked-by: Jan Kara <jack@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5e46d1b7
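The updated condition described in the reiserfs commit above (check both the privroot and the xattr root) can be sketched with a simplified structure. `struct sketch_sb_info` and `sketch_xattrs_initialized()` are hypothetical stand-ins. Only the field names priv_root and xattr_root come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the reiserfs per-superblock info. */
struct sketch_sb_info {
	void *priv_root;
	void *xattr_root;
};

/* Xattrs count as initialized only when both roots exist. */
static bool sketch_xattrs_initialized(const struct sketch_sb_info *sbi)
{
	return sbi->priv_root != NULL && sbi->xattr_root != NULL;
}
```

A check on priv_root alone would report success in the state the commit describes, where priv_root is set but xattr_root is still NULL, and a later dereference of xattr_root would oops.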
    • io_uring: drop sqd lock before handling signals for SQPOLL · 82734c5b
      Committed by Jens Axboe
      Don't call into get_signal() with the sqd mutex held, it'll fail if we're
      freezing the task and we'll get complaints on locks still being held:
      
      ====================================
      WARNING: iou-sqp-8386/8387 still has locks held!
      5.12.0-rc4-syzkaller #0 Not tainted
      ------------------------------------
      1 lock held by iou-sqp-8386/8387:
       #0: ffff88801e1d2470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread+0x24c/0x13a0 fs/io_uring.c:6731
      
       stack backtrace:
       CPU: 1 PID: 8387 Comm: iou-sqp-8386 Not tainted 5.12.0-rc4-syzkaller #0
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       Call Trace:
        __dump_stack lib/dump_stack.c:79 [inline]
        dump_stack+0x141/0x1d7 lib/dump_stack.c:120
        try_to_freeze include/linux/freezer.h:66 [inline]
        get_signal+0x171a/0x2150 kernel/signal.c:2576
        io_sq_thread+0x8d2/0x13a0 fs/io_uring.c:6748
      
      Fold the get_signal() case in with the parking checks, as we need to drop
      the lock in both cases, and since we need to be checking for parking when
      juggling the lock anyway.
      
      Reported-by: syzbot+796d767eb376810256f5@syzkaller.appspotmail.com
      Fixes: dbe1bdbb ("io_uring: handle signals for IO threads like a normal thread")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      82734c5b
  15. 29 Mar, 2021 2 commits
  16. 28 Mar, 2021 3 commits