1. 21 7月, 2019 1 次提交
    • D
      afs: Fix uninitialised spinlock afs_volume::cb_break_lock · fdfff855
      David Howells 提交于
      [ Upstream commit 90fa9b64523a645a97edc0bdcf2d74759957eeee ]
      
      Fix the cb_break_lock spinlock in afs_volume struct by initialising it when
      the volume record is allocated.
      
      Also rename the lock to cb_v_break_lock to distinguish it from the lock of
      the same name in the afs_server struct.
      
      Without this, the following trace may be observed when a volume-break
      callback is received:
      
        INFO: trying to register non-static key.
        the code is fine but needs lockdep annotation.
        turning off the locking correctness validator.
        CPU: 2 PID: 50 Comm: kworker/2:1 Not tainted 5.2.0-rc1-fscache+ #3045
        Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
        Workqueue: afs SRXAFSCB_CallBack
        Call Trace:
         dump_stack+0x67/0x8e
         register_lock_class+0x23b/0x421
         ? check_usage_forwards+0x13c/0x13c
         __lock_acquire+0x89/0xf73
         lock_acquire+0x13b/0x166
         ? afs_break_callbacks+0x1b2/0x3dd
         _raw_write_lock+0x2c/0x36
         ? afs_break_callbacks+0x1b2/0x3dd
         afs_break_callbacks+0x1b2/0x3dd
         ? trace_event_raw_event_afs_server+0x61/0xac
         SRXAFSCB_CallBack+0x11f/0x16c
         process_one_work+0x2c5/0x4ee
         ? worker_thread+0x234/0x2ac
         worker_thread+0x1d8/0x2ac
         ? cancel_delayed_work_sync+0xf/0xf
         kthread+0x11f/0x127
         ? kthread_park+0x76/0x76
         ret_from_fork+0x24/0x30
      
      Fixes: 68251f0a ("afs: Fix whole-volume callback handling")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      fdfff855
  2. 14 7月, 2019 4 次提交
  3. 10 7月, 2019 4 次提交
    • P
      nfsd: Fix overflow causing non-working mounts on 1 TB machines · 8129a10c
      Paul Menzel 提交于
      commit 3b2d4dcf71c4a91b420f835e52ddea8192300a3b upstream.
      
      Since commit 10a68cdf (nfsd: fix performance-limiting session
      calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with
      1 TB of memory cannot be mounted anymore. The mount just hangs on the
      client.
      
      The gist of commit 10a68cdf is the change below.
      
          -avail = clamp_t(int, avail, slotsize, avail/3);
          +avail = clamp_t(int, avail, slotsize, total_avail/3);
      
      Here are the macros.
      
          #define min_t(type, x, y)       __careful_cmp((type)(x), (type)(y), <)
          #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)
      
      `total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts
      the values to `int`, which for 32-bit integers can only hold values
      −2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1).
      
      `avail` (in the function signature) is just 65536, so that no overflow
      was happening. Before the commit the assignment would result in 21845,
      and `num = 4`.
      
      When using `total_avail`, it is causing the assignment to be
      18446744072226137429 (printed as %lu), and `num` is then 4164608182.
      
      My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the
      server thinks there is no memory available any more for this client.
      
      Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long`
      fixes the issue.
      
      Now, `avail = 65536` (before commit 10a68cdf `avail = 21845`), but
      `num = 4` remains the same.
      
      Fixes: c54f24e338ed (nfsd: fix performance-limiting session calculation)
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8129a10c
    • J
      f2fs: don't access node/meta inode mapping after iput · e2379b04
      Jaegeuk Kim 提交于
      [ Upstream commit 7c77bf7de1574ac7a31a2b76f4927404307d13e7 ]
      
      This fixes wrong access of address spaces of node and meta inodes after iput.
      
      Fixes: 60aa4d5536ab ("f2fs: fix use-after-free issue when accessing sbi->stat_info")
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e2379b04
    • N
      btrfs: Ensure replaced device doesn't have pending chunk allocation · fb814f21
      Nikolay Borisov 提交于
      commit debd1c065d2037919a7da67baf55cc683fee09f0 upstream.
      
      Recent FITRIM work, namely bbbf7243d62d ("btrfs: combine device update
      operations during transaction commit") combined the way certain
      operations are recoded in a transaction. As a result an ASSERT was added
      in dev_replace_finish to ensure the new code works correctly.
      Unfortunately I got reports that it's possible to trigger the assert,
      meaning that during a device replace it's possible to have an unfinished
      chunk allocation on the source device.
      
      This is supposed to be prevented by the fact that a transaction is
      committed before finishing the replace oepration and alter acquiring the
      chunk mutex. This is not sufficient since by the time the transaction is
      committed and the chunk mutex acquired it's possible to allocate a chunk
      depending on the workload being executed on the replaced device. This
      bug has been present ever since device replace was introduced but there
      was never code which checks for it.
      
      The correct way to fix is to ensure that there is no pending device
      modification operation when the chunk mutex is acquire and if there is
      repeat transaction commit. Unfortunately it's not possible to just
      exclude the source device from btrfs_fs_devices::dev_alloc_list since
      this causes ENOSPC to be hit in transaction commit.
      
      Fixing that in another way would need to add special cases to handle the
      last writes and forbid new ones. The looped transaction fix is more
      obvious, and can be easily backported. The runtime of dev-replace is
      long so there's no noticeable delay caused by that.
      Reported-by: NDavid Sterba <dsterba@suse.com>
      Fixes: 391cd9df ("Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      fb814f21
    • E
      fs/userfaultfd.c: disable irqs for fault_pending and event locks · 052b3181
      Eric Biggers 提交于
      commit cbcfa130a911c613a1d9d921af2eea171c414172 upstream.
      
      When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs
      and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock.
      
      This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by
      userfaultfd_ctx_read(), which in turn can be waiting for
      userfaultfd_ctx::fault_pending_wqh.lock or
      userfaultfd_ctx::event_wqh.lock.
      
      But elsewhere the fault_pending_wqh and event_wqh locks are taken with
      IRQs enabled.  Since the IRQ handler may take kioctx::ctx_lock, lockdep
      reports that a deadlock is possible.
      
      Fix it by always disabling IRQs when taking the fault_pending_wqh and
      event_wqh locks.
      
      Commit ae62c16e105a ("userfaultfd: disable irqs when taking the
      waitqueue lock") didn't fix this because it only accounted for the
      fd_wqh lock, not the other locks nested inside it.
      
      Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com
      Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com
      Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.19+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      052b3181
  4. 03 7月, 2019 4 次提交
  5. 25 6月, 2019 8 次提交
    • S
      SMB3: retry on STATUS_INSUFFICIENT_RESOURCES instead of failing write · 5293c79c
      Steve French 提交于
      commit 8d526d62db907e786fd88948c75d1833d82bd80e upstream.
      
      Some servers such as Windows 10 will return STATUS_INSUFFICIENT_RESOURCES
      as the number of simultaneous SMB3 requests grows (even though the client
      has sufficient credits).  Return EAGAIN on STATUS_INSUFFICIENT_RESOURCES
      so that we can retry writes which fail with this status code.
      
      This (for example) fixes large file copies to Windows 10 on fast networks.
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5293c79c
    • N
      btrfs: start readahead also in seed devices · c592b1c3
      Naohiro Aota 提交于
      commit c4e0540d0ad49c8ceab06cceed1de27c4fe29f6e upstream.
      
      Currently, btrfs does not consult seed devices to start readahead. As a
      result, if readahead zone is added to the seed devices, btrfs_reada_wait()
      indefinitely wait for the reada_ctl to finish.
      
      You can reproduce the hung by modifying btrfs/163 to have larger initial
      file size (e.g. xfs_io pwrite 4M instead of current 256K).
      
      Fixes: 7414a03f ("btrfs: initial readahead code and prototypes")
      Cc: stable@vger.kernel.org # 3.2+: ce7791ff: Btrfs: fix race between readahead and device replace/removal
      Cc: stable@vger.kernel.org # 3.2+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c592b1c3
    • A
      ovl: fix bogus -Wmaybe-unitialized warning · 0319ef1d
      Arnd Bergmann 提交于
      [ Upstream commit 1dac6f5b0ed2601be21bb4e27a44b0c3e667b7f4 ]
      
      gcc gets a bit confused by the logic in ovl_setup_trap() and
      can't figure out whether the local 'trap' variable in the caller
      was initialized or not:
      
      fs/overlayfs/super.c: In function 'ovl_fill_super':
      fs/overlayfs/super.c:1333:4: error: 'trap' may be used uninitialized in this function [-Werror=maybe-uninitialized]
          iput(trap);
          ^~~~~~~~~~
      fs/overlayfs/super.c:1312:17: note: 'trap' was declared here
      
      Reword slightly to make it easier for the compiler to understand.
      
      Fixes: 146d62e5a586 ("ovl: detect overlapping layers")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0319ef1d
    • M
      ovl: don't fail with disconnected lower NFS · 639e8c2f
      Miklos Szeredi 提交于
      [ Upstream commit 9179c21dc6ed1c993caa5fe4da876a6765c26af7 ]
      
      NFS mounts can be disconnected from fs root.  Don't fail the overlapping
      layer check because of this.
      
      The check is not authoritative anyway, since topology can change during or
      after the check.
      Reported-by: NAntti Antinoja <antti@fennosys.fi>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 146d62e5a586 ("ovl: detect overlapping layers")
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      639e8c2f
    • A
      ovl: detect overlapping layers · f1c5aa5e
      Amir Goldstein 提交于
      [ Upstream commit 146d62e5a5867fbf84490d82455718bfb10fe824 ]
      
      Overlapping overlay layers are not supported and can cause unexpected
      behavior, but overlayfs does not currently check or warn about these
      configurations.
      
      User is not supposed to specify the same directory for upper and
      lower dirs or for different lower layers and user is not supposed to
      specify directories that are descendants of each other for overlay
      layers, but that is exactly what this zysbot repro did:
      
          https://syzkaller.appspot.com/x/repro.syz?x=12c7a94f400000
      
      Moving layer root directories into other layers while overlayfs
      is mounted could also result in unexpected behavior.
      
      This commit places "traps" in the overlay inode hash table.
      Those traps are dummy overlay inodes that are hashed by the layers
      root inodes.
      
      On mount, the hash table trap entries are used to verify that overlay
      layers are not overlapping.  While at it, we also verify that overlay
      layers are not overlapping with directories "in-use" by other overlay
      instances as upperdir/workdir.
      
      On lookup, the trap entries are used to verify that overlay layers
      root inodes have not been moved into other layers after mount.
      
      Some examples:
      
      $ ./run --ov --samefs -s
      ...
      ( mkdir -p base/upper/0/u base/upper/0/w base/lower lower upper mnt
        mount -o bind base/lower lower
        mount -o bind base/upper upper
        mount -t overlay none mnt ...
              -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w)
      
      $ umount mnt
      $ mount -t overlay none mnt ...
              -o lowerdir=base,upperdir=upper/0/u,workdir=upper/0/w
      
        [   94.434900] overlayfs: overlapping upperdir path
        mount: mount overlay on mnt failed: Too many levels of symbolic links
      
      $ mount -t overlay none mnt ...
              -o lowerdir=upper/0/u,upperdir=upper/0/u,workdir=upper/0/w
      
        [  151.350132] overlayfs: conflicting lowerdir path
        mount: none is already mounted or mnt busy
      
      $ mount -t overlay none mnt ...
              -o lowerdir=lower:lower/a,upperdir=upper/0/u,workdir=upper/0/w
      
        [  201.205045] overlayfs: overlapping lowerdir path
        mount: mount overlay on mnt failed: Too many levels of symbolic links
      
      $ mount -t overlay none mnt ...
              -o lowerdir=lower,upperdir=upper/0/u,workdir=upper/0/w
      $ mv base/upper/0/ base/lower/
      $ find mnt/0
        mnt/0
        mnt/0/w
        find: 'mnt/0/w/work': Too many levels of symbolic links
        find: 'mnt/0/u': Too many levels of symbolic links
      
      Reported-by: syzbot+9c69c282adc4edd2b540@syzkaller.appspotmail.com
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f1c5aa5e
    • A
      ovl: make i_ino consistent with st_ino in more cases · a00f405e
      Amir Goldstein 提交于
      [ Upstream commit 6dde1e42f497b2d4e22466f23019016775607947 ]
      
      Relax the condition that overlayfs supports nfs export, to require
      that i_ino is consistent with st_ino/d_ino.
      
      It is enough to require that st_ino and d_ino are consistent.
      
      This fixes the failure of xfstest generic/504, due to mismatch of
      st_ino to inode number in the output of /proc/locks.
      
      Fixes: 12574a9f ("ovl: consistent i_ino for non-samefs with xino")
      Cc: <stable@vger.kernel.org> # v4.19
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a00f405e
    • A
      ovl: fix wrong flags check in FS_IOC_FS[SG]ETXATTR ioctls · d6623379
      Amir Goldstein 提交于
      [ Upstream commit 941d935ac7636911a3fd8fa80e758e52b0b11e20 ]
      
      The ioctl argument was parsed as the wrong type.
      
      Fixes: b21d9c435f93 ("ovl: support the FS_IOC_FS[SG]ETXATTR ioctls")
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d6623379
    • A
      ovl: support the FS_IOC_FS[SG]ETXATTR ioctls · 3cb5d7fa
      Amir Goldstein 提交于
      [ Upstream commit b21d9c435f935014d3e3fa6914f2e4fbabb0e94d ]
      
      They are the extended version of FS_IOC_FS[SG]ETFLAGS ioctls.
      xfs_io -c "chattr <flags>" uses the new ioctls for setting flags.
      
      This used to work in kernel pre v4.19, before stacked file ops
      introduced the ovl_ioctl whitelist.
      Reported-by: NDave Chinner <david@fromorbit.com>
      Fixes: d1d04ef8 ("ovl: stack file ops")
      Cc: <stable@vger.kernel.org> # v4.19
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3cb5d7fa
  6. 22 6月, 2019 3 次提交
  7. 19 6月, 2019 2 次提交
    • R
      f2fs: fix to avoid accessing xattr across the boundary · ae3787d4
      Randall Huang 提交于
      [ Upstream commit 2777e654371dd4207a3a7f4fb5fa39550053a080 ]
      
      When we traverse xattr entries via __find_xattr(),
      if the raw filesystem content is faked or any hardware failure occurs,
      out-of-bound error can be detected by KASAN.
      Fix the issue by introducing boundary check.
      
      [   38.402878] c7   1827 BUG: KASAN: slab-out-of-bounds in f2fs_getxattr+0x518/0x68c
      [   38.402891] c7   1827 Read of size 4 at addr ffffffc0b6fb35dc by task
      [   38.402935] c7   1827 Call trace:
      [   38.402952] c7   1827 [<ffffff900809003c>] dump_backtrace+0x0/0x6bc
      [   38.402966] c7   1827 [<ffffff9008090030>] show_stack+0x20/0x2c
      [   38.402981] c7   1827 [<ffffff900871ab10>] dump_stack+0xfc/0x140
      [   38.402995] c7   1827 [<ffffff9008325c40>] print_address_description+0x80/0x2d8
      [   38.403009] c7   1827 [<ffffff900832629c>] kasan_report_error+0x198/0x1fc
      [   38.403022] c7   1827 [<ffffff9008326104>] kasan_report_error+0x0/0x1fc
      [   38.403037] c7   1827 [<ffffff9008325000>] __asan_load4+0x1b0/0x1b8
      [   38.403051] c7   1827 [<ffffff90085fcc44>] f2fs_getxattr+0x518/0x68c
      [   38.403066] c7   1827 [<ffffff90085fc508>] f2fs_xattr_generic_get+0xb0/0xd0
      [   38.403080] c7   1827 [<ffffff9008395708>] __vfs_getxattr+0x1f4/0x1fc
      [   38.403096] c7   1827 [<ffffff9008621bd0>] inode_doinit_with_dentry+0x360/0x938
      [   38.403109] c7   1827 [<ffffff900862d6cc>] selinux_d_instantiate+0x2c/0x38
      [   38.403123] c7   1827 [<ffffff900861b018>] security_d_instantiate+0x68/0x98
      [   38.403136] c7   1827 [<ffffff9008377db8>] d_splice_alias+0x58/0x348
      [   38.403149] c7   1827 [<ffffff900858d16c>] f2fs_lookup+0x608/0x774
      [   38.403163] c7   1827 [<ffffff900835eacc>] lookup_slow+0x1e0/0x2cc
      [   38.403177] c7   1827 [<ffffff9008367fe0>] walk_component+0x160/0x520
      [   38.403190] c7   1827 [<ffffff9008369ef4>] path_lookupat+0x110/0x2b4
      [   38.403203] c7   1827 [<ffffff900835dd38>] filename_lookup+0x1d8/0x3a8
      [   38.403216] c7   1827 [<ffffff900835eeb0>] user_path_at_empty+0x54/0x68
      [   38.403229] c7   1827 [<ffffff9008395f44>] SyS_getxattr+0xb4/0x18c
      [   38.403241] c7   1827 [<ffffff9008084200>] el0_svc_naked+0x34/0x38
      Signed-off-by: NRandall Huang <huangrandall@google.com>
      [Jaegeuk Kim: Fix wrong ending boundary]
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ae3787d4
    • W
      fs/ocfs2: fix race in ocfs2_dentry_attach_lock() · 6b9aa7ac
      Wengang Wang 提交于
      commit be99ca2716972a712cde46092c54dee5e6192bf8 upstream.
      
      ocfs2_dentry_attach_lock() can be executed in parallel threads against the
      same dentry.  Make that race safe.  The race is like this:
      
                  thread A                               thread B
      
      (A1) enter ocfs2_dentry_attach_lock,
      seeing dentry->d_fsdata is NULL,
      and no alias found by
      ocfs2_find_local_alias, so kmalloc
      a new ocfs2_dentry_lock structure
      to local variable "dl", dl1
      
                     .....
      
                                          (B1) enter ocfs2_dentry_attach_lock,
                                          seeing dentry->d_fsdata is NULL,
                                          and no alias found by
                                          ocfs2_find_local_alias so kmalloc
                                          a new ocfs2_dentry_lock structure
                                          to local variable "dl", dl2.
      
                                                         ......
      
      (A2) set dentry->d_fsdata with dl1,
      call ocfs2_dentry_lock() and increase
      dl1->dl_lockres.l_ro_holders to 1 on
      success.
                    ......
      
                                          (B2) set dentry->d_fsdata with dl2
                                          call ocfs2_dentry_lock() and increase
      				    dl2->dl_lockres.l_ro_holders to 1 on
      				    success.
      
                                                        ......
      
      (A3) call ocfs2_dentry_unlock()
      and decrease
      dl2->dl_lockres.l_ro_holders to 0
      on success.
                   ....
      
                                          (B3) call ocfs2_dentry_unlock(),
                                          decreasing
      				    dl2->dl_lockres.l_ro_holders, but
      				    see it's zero now, panic
      
      Link: http://lkml.kernel.org/r/20190529174636.22364-1-wen.gang.wang@oracle.comSigned-off-by: NWengang Wang <wen.gang.wang@oracle.com>
      Reported-by: NDaniel Sobe <daniel.sobe@nxp.com>
      Tested-by: NDaniel Sobe <daniel.sobe@nxp.com>
      Reviewed-by: NChangwei Ge <gechangwei@live.cn>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b9aa7ac
  8. 15 6月, 2019 14 次提交
    • A
      ovl: support stacked SEEK_HOLE/SEEK_DATA · afec7068
      Amir Goldstein 提交于
      commit 9e46b840c7053b5f7a245e98cd239b60d189a96c upstream.
      
      Overlay file f_pos is the master copy that is preserved
      through copy up and modified on read/write, but only real
      fs knows how to SEEK_HOLE/SEEK_DATA and real fs may impose
      limitations that are more strict than ->s_maxbytes for specific
      files, so we use the real file to perform seeks.
      
      We do not call real fs for SEEK_CUR:0 query and for SEEK_SET:0
      requests.
      
      Fixes: d1d04ef8 ("ovl: stack file ops")
      Reported-by: NEddie Horng <eddiehorng.tw@gmail.com>
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afec7068
    • J
      ovl: check the capability before cred overridden · 22dac6cc
      Jiufei Xue 提交于
      commit 98487de318a6f33312471ae1e2afa16fbf8361fe upstream.
      
      We found that it return success when we set IMMUTABLE_FL flag to a file in
      docker even though the docker didn't have the capability
      CAP_LINUX_IMMUTABLE.
      
      The commit d1d04ef8 ("ovl: stack file ops") and dab5ca8f ("ovl: add
      lsattr/chattr support") implemented chattr operations on a regular overlay
      file. ovl_real_ioctl() overridden the current process's subjective
      credentials with ofs->creator_cred which have the capability
      CAP_LINUX_IMMUTABLE so that it will return success in
      vfs_ioctl()->cap_capable().
      
      Fix this by checking the capability before cred overridden. And here we
      only care about APPEND_FL and IMMUTABLE_FL, so get these information from
      inode.
      
      [SzM: move check and call to underlying fs inside inode locked region to
      prevent two such calls from racing with each other]
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22dac6cc
    • A
      nfsd: avoid uninitialized variable warning · 806e8395
      Arnd Bergmann 提交于
      [ Upstream commit 0ab88ca4bcf18ba21058d8f19220f60afe0d34d8 ]
      
      clang warns that 'contextlen' may be accessed without an initialization:
      
      fs/nfsd/nfs4xdr.c:2911:9: error: variable 'contextlen' is uninitialized when used here [-Werror,-Wuninitialized]
                                                                      contextlen);
                                                                      ^~~~~~~~~~
      fs/nfsd/nfs4xdr.c:2424:16: note: initialize the variable 'contextlen' to silence this warning
              int contextlen;
                            ^
                             = 0
      
      Presumably this cannot happen, as FATTR4_WORD2_SECURITY_LABEL is
      set if CONFIG_NFSD_V4_SECURITY_LABEL is enabled.
      Adding another #ifdef like the other two in this function
      avoids the warning.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      806e8395
    • J
      nfsd: allow fh_want_write to be called twice · b4330e4a
      J. Bruce Fields 提交于
      [ Upstream commit 0b8f62625dc309651d0efcb6a6247c933acd8b45 ]
      
      A fuzzer recently triggered lockdep warnings about potential sb_writers
      deadlocks caused by fh_want_write().
      
      Looks like we aren't careful to pair each fh_want_write() with an
      fh_drop_write().
      
      It's not normally a problem since fh_put() will call fh_drop_write() for
      us.  And was OK for NFSv3 where we'd do one operation that might call
      fh_want_write(), and then put the filehandle.
      
      But an NFSv4 protocol fuzzer can do weird things like call unlink twice
      in a compound, and then we get into trouble.
      
      I'm a little worried about this approach of just leaving everything to
      fh_put().  But I think there are probably a lot of
      fh_want_write()/fh_drop_write() imbalances so for now I think we need it
      to be more forgiving.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b4330e4a
    • K
      fuse: retrieve: cap requested size to negotiated max_write · ae35c325
      Kirill Smelkov 提交于
      [ Upstream commit 7640682e67b33cab8628729afec8ca92b851394f ]
      
      FUSE filesystem server and kernel client negotiate during initialization
      phase, what should be the maximum write size the client will ever issue.
      Correspondingly the filesystem server then queues sys_read calls to read
      requests with buffer capacity large enough to carry request header + that
      max_write bytes. A filesystem server is free to set its max_write in
      anywhere in the range between [1*page, fc->max_pages*page]. In particular
      go-fuse[2] sets max_write by default as 64K, wheres default fc->max_pages
      corresponds to 128K. Libfuse also allows users to configure max_write, but
      by default presets it to possible maximum.
      
      If max_write is < fc->max_pages*page, and in NOTIFY_RETRIEVE handler we
      allow to retrieve more than max_write bytes, corresponding prepared
      NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
      filesystem server, in full correspondence with server/client contract, will
      be only queuing sys_read with ~max_write buffer capacity, and
      fuse_dev_do_read throws away requests that cannot fit into server request
      buffer. In turn the filesystem server could get stuck waiting indefinitely
      for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK which is
      understood by clients as that NOTIFY_REPLY was queued and will be sent
      back.
      
      Cap requested size to negotiate max_write to avoid the problem.  This
      aligns with the way NOTIFY_RETRIEVE handler works, which already
      unconditionally caps requested retrieve size to fuse_conn->max_pages.  This
      way it should not hurt NOTIFY_RETRIEVE semantic if we return less data than
      was originally requested.
      
      Please see [1] for context where the problem of stuck filesystem was hit
      for real, how the situation was traced and for more involving patch that
      did not make it into the tree.
      
      [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
      [2] https://github.com/hanwen/go-fuseSigned-off-by: NKirill Smelkov <kirr@nexedi.com>
      Cc: Han-Wen Nienhuys <hanwen@google.com>
      Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ae35c325
    • A
      ovl: do not generate duplicate fsnotify events for "fake" path · 06382ad6
      Amir Goldstein 提交于
      [ Upstream commit d989903058a83e8536cc7aadf9256a47d5c173fe ]
      
      Overlayfs "fake" path is used for stacked file operations on underlying
      files.  Operations on files with "fake" path must not generate fsnotify
      events with path data, because those events have already been generated at
      overlayfs layer and because the reported event->fd for fanotify marks on
      underlying inode/filesystem will have the wrong path (the overlayfs path).
      
      Link: https://lore.kernel.org/linux-fsdevel/20190423065024.12695-1-jencce.kernel@gmail.com/Reported-by: NMurphy Zhou <jencce.kernel@gmail.com>
      Fixes: d1d04ef8 ("ovl: stack file ops")
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      06382ad6
    • Y
      configfs: fix possible use-after-free in configfs_register_group · 4dc146d4
      YueHaibing 提交于
      [ Upstream commit 35399f87e271f7cf3048eab00a421a6519ac8441 ]
      
      In configfs_register_group(), if create_default_group() failed, we
      forget to unlink the group. It will left a invalid item in the parent list,
      which may trigger the use-after-free issue seen below:
      
      BUG: KASAN: use-after-free in __list_add_valid+0xd4/0xe0 lib/list_debug.c:26
      Read of size 8 at addr ffff8881ef61ae20 by task syz-executor.0/5996
      
      CPU: 1 PID: 5996 Comm: syz-executor.0 Tainted: G         C        5.0.0+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xa9/0x10e lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       __list_add_valid+0xd4/0xe0 lib/list_debug.c:26
       __list_add include/linux/list.h:60 [inline]
       list_add_tail include/linux/list.h:93 [inline]
       link_obj+0xb0/0x190 fs/configfs/dir.c:759
       link_group+0x1c/0x130 fs/configfs/dir.c:784
       configfs_register_group+0x56/0x1e0 fs/configfs/dir.c:1751
       configfs_register_default_group+0x72/0xc0 fs/configfs/dir.c:1834
       ? 0xffffffffc1be0000
       iio_sw_trigger_init+0x23/0x1000 [industrialio_sw_trigger]
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f494ecbcc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 00007f494ecbcc70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f494ecbd6bc
      R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      
      Allocated by task 5987:
       set_track mm/kasan/common.c:87 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:497
       kmalloc include/linux/slab.h:545 [inline]
       kzalloc include/linux/slab.h:740 [inline]
       configfs_register_default_group+0x4c/0xc0 fs/configfs/dir.c:1829
       0xffffffffc1bd0023
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5987:
       set_track mm/kasan/common.c:87 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:459
       slab_free_hook mm/slub.c:1429 [inline]
       slab_free_freelist_hook mm/slub.c:1456 [inline]
       slab_free mm/slub.c:3003 [inline]
       kfree+0xe1/0x270 mm/slub.c:3955
       configfs_register_default_group+0x9a/0xc0 fs/configfs/dir.c:1836
       0xffffffffc1bd0023
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881ef61ae00
       which belongs to the cache kmalloc-192 of size 192
      The buggy address is located 32 bytes inside of
       192-byte region [ffff8881ef61ae00, ffff8881ef61aec0)
      The buggy address belongs to the page:
      page:ffffea0007bd8680 count:1 mapcount:0 mapping:ffff8881f6c03000 index:0xffff8881ef61a700
      flags: 0x2fffc0000000200(slab)
      raw: 02fffc0000000200 ffffea0007ca4740 0000000500000005 ffff8881f6c03000
      raw: ffff8881ef61a700 000000008010000c 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881ef61ad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8881ef61ad80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
      >ffff8881ef61ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                     ^
       ffff8881ef61ae80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8881ef61af00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 5cf6a51e ("configfs: allow dynamic group creation")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4dc146d4
    • C
      f2fs: fix to do checksum even if inode page is uptodate · 8d7ebdd1
      Chao Yu 提交于
      [ Upstream commit b42b179bda9ff11075a6fc2bac4d9e400513679a ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203221
      
      - Overview
      When mounting the attached crafted image and running program, this error is reported.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_07.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1279!
       RIP: 0010:read_node_page+0xcf/0xf0
       Call Trace:
        __get_node_page+0x6b/0x2f0
        f2fs_iget+0x8f/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_fchmodat+0x3e/0xa0
        __x64_sys_chmod+0x12/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      On below paths, we can have opportunity to readahead inode page
      - gc_node_segment -> f2fs_ra_node_page
      - gc_data_segment -> f2fs_ra_node_page
      - f2fs_fill_dentries -> f2fs_ra_node_page
      
      Unlike synchronized read, on readahead path, we can set page uptodate
      before verifying page's checksum, then read_node_page() will trigger
      kernel panic once it encounters a uptodated page w/ incorrect checksum.
      
      So considering readahead scenario, we have to do checksum each time
      when loading inode page even if it is uptodated.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8d7ebdd1
    • C
      f2fs: fix to do sanity check on valid block count of segment · 64024854
      Chao Yu 提交于
      [ Upstream commit e95bcdb2fefa129f37bd9035af1d234ca92ee4ef ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203233
      
      - Overview
      When mounting the attached crafted image and running program, following errors are reported.
      Additionally, it hangs on sync after running program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      cc poc_13.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Kernel messages
       F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
       kernel BUG at fs/f2fs/segment.c:2102!
       RIP: 0010:update_sit_entry+0x394/0x410
       Call Trace:
        f2fs_allocate_data_block+0x16f/0x660
        do_write_page+0x62/0x170
        f2fs_do_write_node_page+0x33/0xa0
        __write_node_page+0x270/0x4e0
        f2fs_sync_node_pages+0x5df/0x670
        f2fs_write_checkpoint+0x372/0x1400
        f2fs_sync_fs+0xa3/0x130
        f2fs_do_sync_file+0x1a6/0x810
        do_fsync+0x33/0x60
        __x64_sys_fsync+0xb/0x10
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      sit.vblocks and sum valid block count in sit.valid_map may be
      inconsistent, segment w/ zero vblocks will be treated as free
      segment, while allocating in free segment, we may allocate a
      free block, if its bitmap is valid previously, it can cause
      kernel crash due to bitmap verification failure.
      
      Anyway, to avoid further serious metadata inconsistence and
      corruption, it is necessary and worth to detect SIT
      inconsistence. So let's enable check_block_count() to verify
      vblocks and valid_map all the time rather than do it only
      CONFIG_F2FS_CHECK_FS is enabled.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      64024854
    • C
      f2fs: fix to use inline space only if inline_xattr is enable · 101e48fe
      Chao Yu 提交于
      [ Upstream commit 622927f3b8809206f6da54a6a7ed4df1a7770fce ]
      
      With below mkfs and mount option:
      
      MKFS_OPTIONS  -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
      MOUNT_OPTIONS -- -o noinline_xattr
      
      We may miss xattr data with below testcase:
      - mkdir dir
      - setfattr -n "user.name" -v 0 dir
      - for ((i = 0; i < 190; i++)) do touch dir/$i; done
      - umount
      - mount
      - getfattr -n "user.name" dir
      
      user.name: No such attribute
      
      The root cause is that we persist xattr data into reserved inline xattr
      space, even if inline_xattr is not enable in inline directory inode, after
      inline dentry conversion, reserved space no longer exists, so that xattr
      data missed.
      
      Let's use inline xattr space only if inline_xattr flag is set on inode
      to fix this iusse.
      
      Fixes: 6afc662e ("f2fs: support flexible inline xattr size")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      101e48fe
    • C
      f2fs: fix to avoid panic in dec_valid_block_count() · 45624f0e
      Chao Yu 提交于
      [ Upstream commit 5e159cd349bf3a31fb7e35c23a93308eb30f4f71 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203209
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after the this script.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_01.c
      ./run.sh f2fs
      sync
      
       kernel BUG at fs/f2fs/f2fs.h:1788!
       RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
       Call Trace:
        f2fs_truncate_blocks+0x36d/0x3c0
        f2fs_truncate+0x88/0x110
        f2fs_setattr+0x3e1/0x460
        notify_change+0x2da/0x400
        do_truncate+0x6d/0xb0
        do_sys_ftruncate+0xf1/0x160
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is dec_valid_block_count() will trigger kernel panic due to
      inconsistent count in between inode.i_blocks and actual block.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      45624f0e
    • C
      f2fs: fix to clear dirty inode in error path of f2fs_iget() · 47a92acf
      Chao Yu 提交于
      [ Upstream commit 546d22f070d64a7b96f57c93333772085d3a5e6d ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203217
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_test_05.c
      mkdir test
      mount -t f2fs tmp.img test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/inode.c:707!
       RIP: 0010:f2fs_evict_inode+0x33f/0x3a0
       Call Trace:
        evict+0xba/0x180
        f2fs_iget+0x598/0xdf0
        f2fs_lookup+0x136/0x320
        __lookup_slow+0x92/0x140
        lookup_slow+0x30/0x50
        walk_component+0x1c1/0x350
        path_lookupat+0x62/0x200
        filename_lookup+0xb3/0x1a0
        do_readlinkat+0x56/0x110
        __x64_sys_readlink+0x16/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      During inode loading, __recover_inline_status() can recovery inode status
      and set inode dirty, once we failed in following process, it will fail
      the check in f2fs_evict_inode, result in trigger BUG_ON().
      
      Let's clear dirty inode in error path of f2fs_iget() to avoid panic.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      47a92acf
    • C
      f2fs: fix to do sanity check on free nid · ca9fcbc5
      Chao Yu 提交于
      [ Upstream commit 626bcf2b7ce87211dba565f2bfa7842ba5be5c1b ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203225
      
      - Overview
      When mounting the attached crafted image and unmounting it, following errors are reported.
      Additionally, it hangs on sync after unmounting.
      
      The image is intentionally fuzzed from a normal f2fs image for testing.
      Compile options for F2FS are as follows.
      CONFIG_F2FS_FS=y
      CONFIG_F2FS_STAT_FS=y
      CONFIG_F2FS_FS_XATTR=y
      CONFIG_F2FS_FS_POSIX_ACL=y
      CONFIG_F2FS_CHECK_FS=y
      
      - Reproduces
      mkdir test
      mount -t f2fs tmp.img test
      touch test/t
      umount test
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:3073!
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
       Call Trace:
        f2fs_put_super+0xf4/0x270
        generic_shutdown_super+0x62/0x110
        kill_block_super+0x1c/0x50
        kill_f2fs_super+0xad/0xd0
        deactivate_locked_super+0x35/0x60
        cleanup_mnt+0x36/0x70
        task_work_run+0x75/0x90
        exit_to_usermode_loop+0x93/0xa0
        do_syscall_64+0xba/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
      
      NAT table is corrupted, so reserved meta/node inode ids were added into
      free list incorrectly, during file creation, since reserved id has cached
      in inode hash, so it fails the creation and preallocated nid can not be
      released later, result in kernel panic.
      
      To fix this issue, let's do nid boundary check during free nid loading.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ca9fcbc5
    • C
      f2fs: fix to avoid panic in f2fs_remove_inode_page() · f3aa313d
      Chao Yu 提交于
      [ Upstream commit 8b6810f8acfe429fde7c7dad4714692cc5f75651 ]
      
      As Jungyeon reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=203219
      
      - Overview
      When mounting the attached crafted image and running program, I got this error.
      Additionally, it hangs on sync after running the program.
      
      The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
      
      - Reproduces
      cc poc_06.c
      mkdir test
      mount -t f2fs tmp.img test
      cp a.out test
      cd test
      sudo ./a.out
      sync
      
      - Messages
       kernel BUG at fs/f2fs/node.c:1183!
       RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
       Call Trace:
        f2fs_evict_inode+0x2a3/0x3a0
        evict+0xba/0x180
        __dentry_kill+0xbe/0x160
        dentry_kill+0x46/0x180
        dput+0xbb/0x100
        do_renameat2+0x3c9/0x550
        __x64_sys_rename+0x17/0x20
        do_syscall_64+0x43/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is f2fs_remove_inode_page() will trigger kernel panic due to
      inconsistent i_blocks value of inode.
      
      To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
      give a hint to fsck for latter repairing of potential image corruption.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix build warning and add unlikely]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f3aa313d