1. 03 Jun, 2021 40 commits
    • P
      io_uring: fix overflows checks in provide buffers · 2b61578e
      Pavel Begunkov committed
      stable inclusion
      from stable-5.10.37
      commit cbbc13b115b8f18e0a714d89f87fbdc499acfe2d
      bugzilla: 51868
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 38134ada ]
      
      Colin previously reported possible overflow and sign extension problems in
      io_provide_buffers_prep(). As Linus pointed out, the previous attempt did
      nothing useful, see d81269fe ("io_uring: fix provide_buffers sign extension").

      Fix it with the help of the check_<op>_overflow helpers. Also fix the type
      of struct io_provide_buf::len, as it doesn't make much sense to keep it
      signed.
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Fixes: efe68c1c ("io_uring: validate the full range of provided buffers for access")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/46538827e70fce5f6cdb50897cff4cacc490f380.1618488258.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      2b61578e
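The overflow checks that the io_uring commit above describes can be illustrated in userspace. This is a minimal sketch, not the kernel code: `provided_range_ok` is a hypothetical name, and the GCC/Clang `__builtin_*_overflow` builtins stand in here for the kernel's check_<op>_overflow helpers. Using unsigned types for the length also avoids the sign extension concern mentioned in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: check that nbufs buffers of len bytes each,
 * starting at addr, describe an address range that does not wrap.
 * Returns true when both the total size and the end address fit.
 */
static bool provided_range_ok(uint64_t addr, uint32_t len, uint16_t nbufs)
{
	uint64_t size, end;

	/* size = len * nbufs; the builtin returns true on overflow */
	if (__builtin_mul_overflow((uint64_t)len, (uint64_t)nbufs, &size))
		return false;
	/* end = addr + size; reject ranges that wrap past the top */
	if (__builtin_add_overflow(addr, size, &end))
		return false;
	return true;
}
```

A caller would reject the request (for example with an error such as -EOVERFLOW) whenever the helper returns false.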
    • K
      seccomp: Fix CONFIG tests for Seccomp_filters · a1ef4953
      Kenta.Tada@sony.com committed
      stable inclusion
      from stable-5.10.37
      commit 7456cc7c9fd5e551f462287b0d105e8cd1ffc9ec
      bugzilla: 51868
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 64bdc024 ]
      
      Strictly speaking, seccomp filters are only used
      when CONFIG_SECCOMP_FILTER is enabled.
      This patch fixes the condition to enable "Seccomp_filters"
      in /proc/$pid/status.
      Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
      Fixes: c818c03b ("seccomp: Report number of loaded filters in /proc/$pid/status")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/OSBPR01MB26772D245E2CF4F26B76A989F5669@OSBPR01MB2677.jpnprd01.prod.outlook.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      a1ef4953
    • D
      afs: Fix updating of i_mode due to 3rd party change · f6e01777
      David Howells committed
      stable inclusion
      from stable-5.10.37
      commit 95f4e9f33b707787b990017cdfc9ff72cde7f3a5
      bugzilla: 51868
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 6e1eb04a ]
      
      Fix afs_apply_status() to mask off the irrelevant bits from status->mode
      when OR'ing them into i_mode.  This can happen when a 3rd party chmod
      occurs.
      
      Also fix afs_inode_init_from_status() to mask off the mode bits when
      initialising i_mode.
      
      Fixes: 260a9803 ("[AFS]: Add "directory write" support.")
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      f6e01777
    • O
      NFSv4.2: fix copy stateid copying for the async copy · 089b1364
      Olga Kornievskaia committed
      stable inclusion
      from stable-5.10.37
      commit 821ff1d44fe3c10db27834a97c1f93667a037a21
      bugzilla: 51868
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit e739b120 ]
      
      This patch fixes Dan Carpenter's report that the static checker
      found a problem where memcpy() was copying into too small of a buffer.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: e0639dc5 ("NFSD introduce async copy feature")
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Dai Ngo <dai.ngo@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      089b1364
    • C
      NFSD: Fix sparse warning in nfs4proc.c · 4c2a4d3d
      Chuck Lever committed
      stable inclusion
      from stable-5.10.37
      commit 74bcea1a608ec3818aafbcfcb9f18cba24474134
      bugzilla: 51868
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit eb162e17 ]
      
      linux/fs/nfsd/nfs4proc.c:1542:24: warning: incorrect type in assignment (different base types)
      linux/fs/nfsd/nfs4proc.c:1542:24:    expected restricted __be32 [assigned] [usertype] status
      linux/fs/nfsd/nfs4proc.c:1542:24:    got int
      
      Clean-up: The dup_copy_fields() function returns only zero, so make
      it return void for now, and get rid of the return code check.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      4c2a4d3d
    • D
      ovl: fix missing revert_creds() on error path · 5aa7eb7a
      Dan Carpenter committed
      stable inclusion
      from stable-5.10.37
      commit 06f414e5c9f0acaaffde67c07b4f672631c54861
      bugzilla: 51868
      CVE: NA
      
      --------------------------------
      
      commit 7b279bbf upstream.
      
      Smatch complains that the ovl_override_creds() call doesn't have a
      matching revert_creds() if the dentry is disconnected.  Fix this by
      moving the ovl_override_creds() call until after the disconnected check.
      
      Fixes: aa3ff3c1 ("ovl: copy up of disconnected dentries")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      5aa7eb7a
    • T
      io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers · d91ba9fd
      Thadeu Lima de Souza Cascardo committed
      stable inclusion
      from stable-5.10.37
      commit 7e916d0124e5f40d7912f93a633f5dee2c3ad735
      bugzilla: 51868
      CVE: NA
      
      --------------------------------
      
      commit d1f82808 upstream.
      
      Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on
      that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS.
      
      Truncate those lengths when doing io_add_buffers, so buffer addresses still
      use the uncapped length.
      
      Also, take the chance and change struct io_buffer len member to __u32, so
      it matches struct io_provide_buffer len member.
      
      This fixes CVE-2021-3491, also reported as ZDI-CAN-13546.
      
      Fixes: ddf0322d ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
      Reported-by: Billy Jheng Bing-Jhong (@st424204)
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      d91ba9fd
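The truncation rule in the io_uring commit above (cap each stored length, but keep advancing the address by the uncapped length) can be sketched like this. The names `sketch_buf`, `fill_buffers`, `SKETCH_PAGE_SIZE` and `SKETCH_MAX_RW_COUNT` are hypothetical, and a 4096-byte page is assumed; the cap mirrors the kernel's definition of MAX_RW_COUNT as INT_MAX rounded down to a page boundary.

```c
#include <limits.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumed page size */
/* INT_MAX rounded down to a page boundary, as MAX_RW_COUNT is defined. */
#define SKETCH_MAX_RW_COUNT ((uint32_t)(INT_MAX & ~(SKETCH_PAGE_SIZE - 1)))

struct sketch_buf {
	uint64_t addr;
	uint32_t len; /* unsigned, matching the type change in the commit */
};

/* Hypothetical helper: describe nbufs consecutive buffers of len bytes. */
static void fill_buffers(struct sketch_buf *bufs, int nbufs,
			 uint64_t addr, uint32_t len)
{
	for (int i = 0; i < nbufs; i++) {
		bufs[i].addr = addr;
		/* Stored length is capped so later reads stay within limits. */
		bufs[i].len = len < SKETCH_MAX_RW_COUNT ? len : SKETCH_MAX_RW_COUNT;
		/* The next buffer still starts after the uncapped length. */
		addr += len;
	}
}
```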
    • J
      ext4: Fix occasional generic/418 failure · db18ee72
      Jan Kara committed
      stable inclusion
      from stable-5.10.36
      commit 378a016271baef2368f84e824f50acc23fa1bd35
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 5899593f upstream.
      
      Eric has noticed that after the pagecache read rework, generic/418 is
      occasionally failing for ext4 when blocksize < pagesize. In fact, the
      pagecache rework just made a hard-to-hit race in ext4 more likely. The
      problem is that since the ext4 conversion of direct IO writes to the iomap
      framework (commit 378f32ba), we update the inode size after a direct IO
      write only after invalidating the page cache. Thus if a buffered read
      sneaks in at an unfortunate moment like:
      
      CPU1 - write at offset 1k                       CPU2 - read from offset 0
      iomap_dio_rw(..., IOMAP_DIO_FORCE_WAIT);
                                                      ext4_readpage();
      ext4_handle_inode_extension()
      
      the read will zero out the tail of the page, as it still sees the smaller
      inode size, and thus the page cache becomes inconsistent with the on-disk
      contents, with all the consequences.
      
      Fix the problem by moving inode size update into end_io handler which
      gets called before the page cache is invalidated.
      Reported-and-tested-by: Eric Whitney <enwlinux@gmail.com>
      Fixes: 378f32ba ("ext4: introduce direct I/O write using iomap infrastructure")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Link: https://lore.kernel.org/r/20210415155417.4734-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      db18ee72
    • T
      ext4: allow the dax flag to be set and cleared on inline directories · 4f3abc65
      Theodore Ts'o committed
      stable inclusion
      from stable-5.10.36
      commit 133e83b5b3b337591b3e35e79c607fbfe82e5e44
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 4811d992 upstream.
      
      This is needed to allow generic/607 to pass for file systems with the
      inline_data feature enabled, and it allows the use of file systems
      where the directories use inline_data, while the files are accessed
      via DAX.
      
      Cc: stable@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      4f3abc65
    • X
      ext4: fix error return code in ext4_fc_perform_commit() · 5fc141d2
      Xu Yihang committed
      stable inclusion
      from stable-5.10.36
      commit 72447c925ea90800da153413098c796e4ba6c150
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit e1262cd2 upstream.
      
      In the branch where ext4_fc_add_tlv() fails, an error return code is missing.
      
      Cc: stable@kernel.org
      Fixes: aa75f4d3 ("ext4: main fast-commit commit path")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Xu Yihang <xuyihang@huawei.com>
      Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20210408070033.123047-1-xuyihang@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      5fc141d2
    • Y
      ext4: fix ext4_error_err save negative errno into superblock · e2e0196c
      Ye Bin committed
      stable inclusion
      from stable-5.10.36
      commit bf4ba04f0161c8b17cfca9cd32118a6215593b30
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 6810fad9 upstream.
      
      Fix write_mmp_block() so that it returns -EIO instead of 1, so that
      the correct error gets saved into the superblock.
      
      Cc: stable@kernel.org
      Fixes: 54d3adbc ("ext4: save all error info in save_error_info() and drop ext4_set_errno()")
      Reported-by: Liu Zhi Qiang <liuzhiqiang26@huawei.com>
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Link: https://lore.kernel.org/r/20210406025331.148343-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      e2e0196c
    • F
      ext4: fix error code in ext4_commit_super · 17707bfe
      Fengnan Chang committed
      stable inclusion
      from stable-5.10.36
      commit 12905cf9e5c418fe8ad45e56d8d79848c6b11558
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit f88f1466 upstream.
      
      We should set the error code when the argument check in ext4_commit_super() fails.
      Found in code review.
      Fixes: c4be0c1d ("filesystem freeze: add error handling of write_super_lockfs/unlockfs").
      
      Cc: stable@kernel.org
      Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Link: https://lore.kernel.org/r/20210402101631.561-1-changfengnan@vivo.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      
      Conflicts:
      	fs/ext4/super.c
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      17707bfe
    • J
      ext4: annotate data race in jbd2_journal_dirty_metadata() · 6380d284
      Jan Kara committed
      stable inclusion
      from stable-5.10.36
      commit 346190959f9750f50ae56872c17e71cda1562688
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 83fe6b18 upstream.
      
      Assertion checks in jbd2_journal_dirty_metadata() are known to be racy
      but we don't want to be grabbing locks just for them.  We thus recheck
      them under b_state_lock only if it looks like they would fail. Annotate
      the checks with data_race().
      
      Cc: stable@kernel.org
      Reported-by: Hao Sun <sunhao.th@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210406161804.20150-2-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      6380d284
    • J
      ext4: annotate data race in start_this_handle() · 1db69e4b
      Jan Kara committed
      stable inclusion
      from stable-5.10.36
      commit 9aca313726cb657325eeb01ab00142a6572c2175
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 3b1833e9 upstream.
      
      Access to journal->j_running_transaction is not protected by appropriate
      lock and thus is racy. We are well aware of that and the code handles
      the race properly. Just add a comment and data_race() annotation.
      
      Cc: stable@kernel.org
      Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210406161804.20150-1-jack@suse.cz
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      1db69e4b
    • S
      smb3: do not attempt multichannel to server which does not support it · 952d0a63
      Steve French committed
      stable inclusion
      from stable-5.10.36
      commit d35c4c959eb48c0f14179dbeb28672bdd400b1c9
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 9c2dc11d upstream.
      
      We were ignoring CAP_MULTI_CHANNEL in the server response - if the
      server doesn't support multichannel we should not be attempting it.
      
      See MS-SMB2 section 3.2.5.2
      Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
      Reviewed-By: Tom Talpey <tom@talpey.com>
      Cc: <stable@vger.kernel.org> # v5.8+
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      952d0a63
    • S
      smb3: when mounting with multichannel include it in requested capabilities · 5a9940b0
      Steve French committed
      stable inclusion
      from stable-5.10.36
      commit 796b8263752890976b8df9692852ec8fcb36549a
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 679971e7 upstream.
      
      In the SMB3/SMB3.1.1 negotiate protocol request, we are supposed to
      advertise CAP_MULTICHANNEL capability when establishing multiple
      channels has been requested by the user doing the mount. See MS-SMB2
      sections 2.2.3 and 3.2.5.2
      
      Without setting it there is some risk that multichannel could fail
      if the server interpreted the field strictly.
      Reviewed-By: Tom Talpey <tom@talpey.com>
      Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
      Cc: <stable@vger.kernel.org> # v5.8+
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      5a9940b0
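The two smb3 commits above cover both directions of the capability negotiation: advertise multichannel when the user asked for it, and only use it if the server reports it back. A minimal sketch of that logic follows; `request_caps` and `may_use_multichannel` are hypothetical names, not functions from the cifs client, and the bit value is the SMB2_GLOBAL_CAP_MULTI_CHANNEL flag from MS-SMB2 section 2.2.3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability flag as defined in MS-SMB2 section 2.2.3. */
#define SMB2_GLOBAL_CAP_MULTI_CHANNEL 0x00000008u

/* Hypothetical helper: capabilities to place in the negotiate request. */
static uint32_t request_caps(uint32_t base_caps, bool multichannel_requested)
{
	if (multichannel_requested)
		base_caps |= SMB2_GLOBAL_CAP_MULTI_CHANNEL;
	return base_caps;
}

/*
 * Hypothetical helper: extra channels are attempted only when the user
 * requested them and the server's response carries the capability bit.
 */
static bool may_use_multichannel(bool multichannel_requested,
				 uint32_t server_caps)
{
	return multichannel_requested &&
	       (server_caps & SMB2_GLOBAL_CAP_MULTI_CHANNEL) != 0;
}
```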
    • H
      exfat: fix erroneous discard when clear cluster bit · 06b83bcd
      Hyeongseok Kim committed
      stable inclusion
      from stable-5.10.36
      commit 11e3ff7e164a69b8807a9c1066c1b6adbb6033e1
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 77edfc6e upstream.
      
      If mounted with the discard option, exFAT issues a discard command when
      it clears a cluster bit to remove a file. But the cluster number passed
      to the cluster-to-sector calculation is wrongly increased by the reserved
      cluster count, which is 2, so unrelated sectors belonging to cluster
      target+2 are discarded. While fixing this, remove the wrong comments in
      the set/clear/find bitmap functions.
      
      Fixes: 1e49a94c ("exfat: add bitmap operations")
      Cc: stable@vger.kernel.org # v5.7+
      Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com>
      Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
      Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      06b83bcd
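The arithmetic behind the exfat commit above can be shown with a small sketch. The struct and function names are hypothetical; the calculation assumes the usual exFAT layout in which cluster numbering starts at 2, so the conversion itself subtracts the reserved count and a caller must not add it again.

```c
#include <stdint.h>

#define EXFAT_RESERVED_CLUSTERS 2 /* cluster numbering starts at 2 */

/* Hypothetical, reduced view of the per-volume geometry. */
struct sketch_sbi {
	uint64_t data_start_sector;      /* first sector of the cluster heap */
	unsigned int sect_per_clus_bits; /* log2(sectors per cluster) */
};

/* Convert a cluster number (>= 2) to its first sector. */
static uint64_t cluster_to_sector(const struct sketch_sbi *sbi, uint32_t clu)
{
	return ((uint64_t)(clu - EXFAT_RESERVED_CLUSTERS)
		<< sbi->sect_per_clus_bits) + sbi->data_start_sector;
}
```

With 8 sectors per cluster, passing `clu + 2` instead of `clu` shifts the result by 16 sectors, which is exactly the "target+2 cluster" discard the commit message describes.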
    • V
      fuse: fix write deadlock · 258c2b71
      Vivek Goyal committed
      stable inclusion
      from stable-5.10.36
      commit 1c525c265668176301bac4f152dd49a3c51c7ac6
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 4f06dd92 upstream.
      
      There are two modes for write(2) and friends in fuse:
      
      a) write through (update page cache, send sync WRITE request to userspace)
      
      b) buffered write (update page cache, async writeout later)
      
      The write through method kept all the page cache pages locked that were
      used for the request.  Keeping more than one page locked is deadlock prone
      and Qian Cai demonstrated this with trinity fuzzing.
      
      The reason for keeping the pages locked is that concurrent mapped reads
      shouldn't try to pull possibly stale data into the page cache.
      
      For full page writes, the easy way to fix this is to make the cached page
      be the authoritative source by marking the page PG_uptodate immediately.
      After this the page can be safely unlocked, since mapped/cached reads will
      take the written data from the cache.
      
      Concurrent mapped writes will now cause data in the original WRITE request
      to be updated; this however doesn't cause any data inconsistency and this
      scenario should be exceedingly rare anyway.
      
      If the WRITE request returns with an error in the above case, currently the
      page is not marked uptodate; this means that a concurrent read will always
      read consistent data.  After this patch the page is uptodate between
      writing to the cache and receiving the error: there's a window where a cached
      read will read the wrong data.  While theoretically this could be a
      regression, it is unlikely to be one in practice, since this is normal for
      buffered writes.
      
      In case of a partial page write to an already uptodate page the locking is
      also unnecessary, with the above caveats.
      
      Partial write of a not uptodate page still needs to be handled.  One way
      would be to read the complete page before doing the write.  This is not
      possible, since it might break filesystems that don't expect any READ
      requests when the file was opened O_WRONLY.
      
      The other solution is to serialize the synchronous write with reads from
      the partial pages.  The easiest way to do this is to keep the partial pages
      locked.  The problem is that a write() may involve two such pages (one head
      and one tail).  This patch fixes it by only locking the partial tail page.
      If there's a partial head page as well, then split that off as a separate
      WRITE request.
      Reported-by: Qian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b55730829b.camel@lca.pw/
      Fixes: ea9b9907 ("fuse: implement perform_write")
      Cc: <stable@vger.kernel.org> # v2.6.26
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      258c2b71
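The last rule in the fuse commit above (split a partial head page off as its own WRITE request so that at most one partial page stays locked) can be illustrated with a simplified calculation. This is a sketch, not the fuse implementation: `first_request_len` is a hypothetical name, a 4096-byte page is assumed, and the sketch ignores whether the head page is already uptodate, so it splits more often than strictly needed.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/*
 * Hypothetical helper: number of bytes the first WRITE request should
 * carry for a write of len bytes at file offset pos. A partial head
 * page is sent alone whenever the write continues past it.
 */
static size_t first_request_len(uint64_t pos, size_t len)
{
	size_t head_off = (size_t)(pos % SKETCH_PAGE_SIZE);
	size_t head_room = SKETCH_PAGE_SIZE - head_off;

	/* Page-aligned start, or the write fits in the head page. */
	if (head_off == 0 || len <= head_room)
		return len;
	/* Partial head followed by more data: split the head off. */
	return head_room;
}
```

The remainder of the write then starts page-aligned, so the only page that can still be partial in the following request is the tail.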
    • J
      jffs2: Hook up splice_write callback · afc4b305
      Joel Stanley committed
      stable inclusion
      from stable-5.10.36
      commit 643243e318686a5179d80d1511ee883dbddb736f
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 42984af0 upstream.
      
      overlayfs using jffs2 as the upper filesystem would fail in some cases
      since moving to v5.10. The test case used was to run 'touch' on a file
      that exists in the lower fs, causing the modification time to be
      updated. It returns EINVAL when the bug is triggered.
      
      A bisection showed this was introduced in v5.9-rc1, with commit
      36e2c742 ("fs: don't allow splice read/write without explicit ops").
      Reverting that commit restores the expected behaviour.
      
      Some digging showed that this was due to jffs2 lacking an implementation
      of splice_write. (For unknown reasons the warn_unsupported that should
      trigger was not displaying any output).
      
      Adding this patch resolved the issue and the test now passes.
      
      Cc: stable@vger.kernel.org
      Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops")
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Lei YU <yulei.sh@bytedance.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      afc4b305
    • L
      jffs2: Fix kasan slab-out-of-bounds problem · 9b6397e2
      lizhe committed
      stable inclusion
      from stable-5.10.36
      commit 72c282b10951b20e83ebf16ed65e55e56f424552
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 960b9a8a upstream.
      
      KASAN reports a slab-out-of-bounds problem. The logs are listed below.
      It occurs because in jffs2_scan_dirent_node() we allocate "checkedlen+1"
      bytes for fd->name but check the CRC over length rd->nsize. If checkedlen
      is less than rd->nsize, this causes the slab-out-of-bounds access.
      
      jffs2: Dirent at *** has zeroes in name. Truncating to %d char
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in crc32_le+0x1ce/0x260 at addr ffff8800842cf2d1
      Read of size 1 by task test_JFFS2/915
      =============================================================================
      BUG kmalloc-64 (Tainted: G    B      O   ): kasan: bad access detected
      -----------------------------------------------------------------------------
      INFO: Allocated in jffs2_alloc_full_dirent+0x2a/0x40 age=0 cpu=1 pid=915
      	___slab_alloc+0x580/0x5f0
      	__slab_alloc.isra.24+0x4e/0x64
      	__kmalloc+0x170/0x300
      	jffs2_alloc_full_dirent+0x2a/0x40
      	jffs2_scan_eraseblock+0x1ca4/0x3b64
      	jffs2_scan_medium+0x285/0xfe0
      	jffs2_do_mount_fs+0x5fb/0x1bbc
      	jffs2_do_fill_super+0x245/0x6f0
      	jffs2_fill_super+0x287/0x2e0
      	mount_mtd_aux.isra.0+0x9a/0x144
      	mount_mtd+0x222/0x2f0
      	jffs2_mount+0x41/0x60
      	mount_fs+0x63/0x230
      	vfs_kern_mount.part.6+0x6c/0x1f4
      	do_mount+0xae8/0x1940
      	SyS_mount+0x105/0x1d0
      INFO: Freed in jffs2_free_full_dirent+0x22/0x40 age=27 cpu=1 pid=915
      	__slab_free+0x372/0x4e4
      	kfree+0x1d4/0x20c
      	jffs2_free_full_dirent+0x22/0x40
      	jffs2_build_remove_unlinked_inode+0x17a/0x1e4
      	jffs2_do_mount_fs+0x1646/0x1bbc
      	jffs2_do_fill_super+0x245/0x6f0
      	jffs2_fill_super+0x287/0x2e0
      	mount_mtd_aux.isra.0+0x9a/0x144
      	mount_mtd+0x222/0x2f0
      	jffs2_mount+0x41/0x60
      	mount_fs+0x63/0x230
      	vfs_kern_mount.part.6+0x6c/0x1f4
      	do_mount+0xae8/0x1940
      	SyS_mount+0x105/0x1d0
      	entry_SYSCALL_64_fastpath+0x1e/0x97
      Call Trace:
       [<ffffffff815befef>] dump_stack+0x59/0x7e
       [<ffffffff812d1d65>] print_trailer+0x125/0x1b0
       [<ffffffff812d82c8>] object_err+0x34/0x40
       [<ffffffff812dadef>] kasan_report.part.1+0x21f/0x534
       [<ffffffff81132401>] ? vprintk+0x2d/0x40
       [<ffffffff815f1ee2>] ? crc32_le+0x1ce/0x260
       [<ffffffff812db41a>] kasan_report+0x26/0x30
       [<ffffffff812d9fc1>] __asan_load1+0x3d/0x50
       [<ffffffff815f1ee2>] crc32_le+0x1ce/0x260
       [<ffffffff814764ae>] ? jffs2_alloc_full_dirent+0x2a/0x40
       [<ffffffff81485cec>] jffs2_scan_eraseblock+0x1d0c/0x3b64
       [<ffffffff81488813>] ? jffs2_scan_medium+0xccf/0xfe0
       [<ffffffff81483fe0>] ? jffs2_scan_make_ino_cache+0x14c/0x14c
       [<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
       [<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
       [<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
       [<ffffffff812d5d90>] ? kmem_cache_alloc_trace+0x10c/0x2cc
       [<ffffffff818169fb>] ? mtd_point+0xf7/0x130
       [<ffffffff81487dc9>] jffs2_scan_medium+0x285/0xfe0
       [<ffffffff81487b44>] ? jffs2_scan_eraseblock+0x3b64/0x3b64
       [<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
       [<ffffffff812da3e9>] ? kasan_unpoison_shadow+0x35/0x50
       [<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
       [<ffffffff812d57df>] ? __kmalloc+0x12b/0x300
       [<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
       [<ffffffff814a2753>] ? jffs2_sum_init+0x9f/0x240
       [<ffffffff8148b2ff>] jffs2_do_mount_fs+0x5fb/0x1bbc
       [<ffffffff8148ad04>] ? jffs2_del_noinode_dirent+0x640/0x640
       [<ffffffff812da462>] ? kasan_kmalloc+0x5e/0x70
       [<ffffffff81127c5b>] ? __init_rwsem+0x97/0xac
       [<ffffffff81492349>] jffs2_do_fill_super+0x245/0x6f0
       [<ffffffff81493c5b>] jffs2_fill_super+0x287/0x2e0
       [<ffffffff814939d4>] ? jffs2_parse_options+0x594/0x594
       [<ffffffff81819bea>] mount_mtd_aux.isra.0+0x9a/0x144
       [<ffffffff81819eb6>] mount_mtd+0x222/0x2f0
       [<ffffffff814939d4>] ? jffs2_parse_options+0x594/0x594
       [<ffffffff81819c94>] ? mount_mtd_aux.isra.0+0x144/0x144
       [<ffffffff81258757>] ? free_pages+0x13/0x1c
       [<ffffffff814fa0ac>] ? selinux_sb_copy_data+0x278/0x2e0
       [<ffffffff81492b35>] jffs2_mount+0x41/0x60
       [<ffffffff81302fb7>] mount_fs+0x63/0x230
       [<ffffffff8133755f>] ? alloc_vfsmnt+0x32f/0x3b0
       [<ffffffff81337f2c>] vfs_kern_mount.part.6+0x6c/0x1f4
       [<ffffffff8133ceec>] do_mount+0xae8/0x1940
       [<ffffffff811b94e0>] ? audit_filter_rules.constprop.6+0x1d10/0x1d10
       [<ffffffff8133c404>] ? copy_mount_string+0x40/0x40
       [<ffffffff812cbf78>] ? alloc_pages_current+0xa4/0x1bc
       [<ffffffff81253a89>] ? __get_free_pages+0x25/0x50
       [<ffffffff81338993>] ? copy_mount_options.part.17+0x183/0x264
       [<ffffffff8133e3a9>] SyS_mount+0x105/0x1d0
       [<ffffffff8133e2a4>] ? copy_mnt_ns+0x560/0x560
       [<ffffffff810e8391>] ? msa_space_switch_handler+0x13d/0x190
       [<ffffffff81be184a>] entry_SYSCALL_64_fastpath+0x1e/0x97
       [<ffffffff810e9274>] ? msa_space_switch+0xb0/0xe0
      Memory state around the buggy address:
       ffff8800842cf180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8800842cf200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff8800842cf280: fc fc fc fc fc fc 00 00 00 00 01 fc fc fc fc fc
                                                       ^
       ffff8800842cf300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8800842cf380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      
      Cc: stable@vger.kernel.org
      Reported-by: Kunkun Xu <xukunkun1@huawei.com>
      Signed-off-by: lizhe <lizhe67@huawei.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      9b6397e2
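The length mismatch behind the jffs2 KASAN report above can be reproduced in miniature: a buffer sized from the truncated name length must not be checksummed with the larger on-media length. The sketch below shows one way to keep the two consistent; `copy_name` and `sketch_sum` are hypothetical, and the simple checksum merely stands in for the kernel's crc32(). It is an illustration of the idea, not the jffs2 patch itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in checksum for the sketch; the kernel code uses crc32(). */
static uint32_t sketch_sum(const unsigned char *p, size_t n)
{
	uint32_t s = 0;

	while (n--)
		s = s * 31 + *p++;
	return s;
}

/*
 * Hypothetical helper: copy at most nsize name bytes, stopping at the
 * first NUL, and checksum only the bytes that were actually copied.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
static char *copy_name(const char *raw, size_t nsize, uint32_t *sum_out)
{
	const char *nul = memchr(raw, '\0', nsize);
	size_t checkedlen = nul ? (size_t)(nul - raw) : nsize;
	char *name = malloc(checkedlen + 1);

	if (!name)
		return NULL;
	memcpy(name, raw, checkedlen);
	name[checkedlen] = '\0';
	/* Passing nsize here would read past the checkedlen+1 allocation. */
	*sum_out = sketch_sum((const unsigned char *)name, checkedlen);
	return name;
}
```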
    • T
      NFSv4: Don't discard segments marked for return in _pnfs_return_layout() · 5bfc60d4
      Trond Myklebust committed
      stable inclusion
      from stable-5.10.36
      commit 2fafe7d5047f98791afd9a1d90d2afb70debc590
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit de144ff4 upstream.
      
      If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
      flag, then the assumption is that it has some reporting requirement
      to perform through a layoutreturn (e.g. flexfiles layout stats or error
      information).
      
      Fixes: 6d597e17 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args")
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      5bfc60d4
    • T
      NFS: Don't discard pNFS layout segments that are marked for return · 2b19cf50
      Trond Myklebust committed
      stable inclusion
      from stable-5.10.36
      commit 334165d9fb69f357fa8e1e4766f9c6600aa67e5d
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 39fd0186 upstream.
      
      If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
      flag, then the assumption is that it has some reporting requirement
      to perform through a layoutreturn (e.g. flexfiles layout stats or error
      information).
      
      Fixes: e0b7d420 ("pNFS: Don't discard layout segments that are marked for return")
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      2b19cf50
    • R
      NFS: fs_context: validate UDP retrans to prevent shift out-of-bounds · b93f1d48
      Randy Dunlap committed
      stable inclusion
      from stable-5.10.36
      commit 96fa26b74cdcf9f5c98996bf36bec9fb5b19ffe2
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit c09f11ef upstream.
      
      Fix shift out-of-bounds in xprt_calc_majortimeo(). This is caused
      by a garbage timeout (retrans) mount option being passed to nfs mount,
      in this case from syzkaller.
      
      If the protocol is XPRT_TRANSPORT_UDP, then 'retrans' is a shift
      value for a 64-bit long integer, so 'retrans' cannot be >= 64.
      If it is >= 64, fail the mount and return an error.
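
      A minimal sketch of the check described above, in stand-alone C. The
      helper names (check_udp_retrans, scale_timeout) are hypothetical and
      this is not the kernel's code; it only shows a shift count being
      rejected when it is >= 64 before the shift is performed.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Width in bits of the value being shifted (64 for unsigned long long). */
#define TIMEOUT_BITS (sizeof(unsigned long long) * CHAR_BIT)

/* Hypothetical validator: 0 if retrans is a safe shift count, -EINVAL otherwise. */
static int check_udp_retrans(unsigned int retrans)
{
    return retrans >= TIMEOUT_BITS ? -EINVAL : 0;
}

/* Hypothetical caller: scale an initial timeout by 2^retrans, or fail. */
static int scale_timeout(unsigned long long initval, unsigned int retrans,
                         unsigned long long *out)
{
    int err = check_udp_retrans(retrans);

    if (err)
        return err;                 /* the mount would be failed here */
    assert(retrans < TIMEOUT_BITS); /* the shift below is now well defined */
    *out = initval << retrans;
    return 0;
}
```

      Rejecting the value up front keeps the undefined shift out of every
      later user of the parameter, instead of guarding each shift site.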
      
      Fixes: 9954bf92 ("NFS: Move mount parameterisation bits into their own file")
      Reported-by: syzbot+ba2e91df8f74809417fa@syzkaller.appspotmail.com
      Reported-by: syzbot+f3a0fa110fd630ab56c8@syzkaller.appspotmail.com
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Anna Schumaker <anna.schumaker@netapp.com>
      Cc: linux-nfs@vger.kernel.org
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b93f1d48
    • C
      f2fs: fix to avoid out-of-bounds memory access · 1fcf6d1b
      Chao Yu committed
      stable inclusion
      from stable-5.10.36
      commit 9aa4602237d535b83c579eb752e8fc1c3e7e7055
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit b862676e upstream.
      
      butt3rflyh4ck <butterflyhuangxx@gmail.com> reported a bug found by
      syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]:
      
       dump_stack+0xfa/0x151 lib/dump_stack.c:120
       print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232
       __kasan_report mm/kasan/report.c:399 [inline]
       kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
       f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline]
       current_nat_addr fs/f2fs/node.h:213 [inline]
       get_next_nat_page fs/f2fs/node.c:123 [inline]
       __flush_nat_entry_set fs/f2fs/node.c:2888 [inline]
       f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991
       f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640
       f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807
       f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454
       __sync_filesystem fs/sync.c:39 [inline]
       sync_filesystem fs/sync.c:67 [inline]
       sync_filesystem+0x1b5/0x260 fs/sync.c:48
       generic_shutdown_super+0x70/0x370 fs/super.c:448
       kill_block_super+0x97/0xf0 fs/super.c:1394
      
      The root cause: if a NAT entry in the checkpoint journal area is
      corrupted, e.g. the nid of a journalled NAT entry exceeds the max nid
      value, then during checkpoint, when the NAT journal is flushed to the
      NAT area, get_next_nat_page() may access out-of-bounds memory on
      nat_bitmap because it uses the wrong nid value as the bitmap offset.
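
      The general shape of a guard against this class of bug can be sketched
      as follows. This is an illustration only: test_bit_checked is a
      hypothetical helper, the bit order within a byte is an assumption, and
      it is not the change made in f2fs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical guard: read bit idx from a bitmap holding nbits bits.
 * Returns 0 or 1 for the bit value, or -ERANGE when idx is out of range,
 * so a corrupted index never turns into an out-of-bounds read.
 * Assumption: bits are numbered LSB first within each byte.
 */
static int test_bit_checked(const unsigned char *bitmap, size_t nbits,
                            size_t idx)
{
    assert(bitmap != NULL);
    if (idx >= nbits)
        return -ERANGE;
    return (bitmap[idx / 8] >> (idx % 8)) & 1;
}
```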
      
      [1] https://lore.kernel.org/lkml/CAFcO6XOMWdr8pObek6eN6-fs58KG9doRFadgJj-FnF-1x43s2g@mail.gmail.com/T/#u

      Reported-and-tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      1fcf6d1b
    • E
      f2fs: fix error handling in f2fs_end_enable_verity() · 17030c18
      Eric Biggers committed
      stable inclusion
      from stable-5.10.36
      commit 39624749c52dc015d358f8c894bd2236702d07bb
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 3c031542 upstream.
      
      f2fs didn't properly clean up if verity failed to be enabled on a file:
      
      - It left verity metadata (pages past EOF) in the page cache, which
        would be exposed to userspace if the file was later extended.
      
      - It didn't truncate the verity metadata at all (either from cache or
        from disk) if an error occurred while setting the verity bit.
      
      Fix these bugs by adding a call to truncate_inode_pages() and ensuring
      that we truncate the verity metadata (both from cache and from disk) in
      all error paths.  Also rework the code to cleanly separate the success
      path from the error paths, which makes it much easier to understand.
      
      Finally, log a message if f2fs_truncate() fails, since it might
      otherwise fail silently.

      Reported-by: Yunlei He <heyunlei@hihonor.com>
      Fixes: 95ae251f ("f2fs: add fs-verity support")
      Cc: <stable@vger.kernel.org> # v5.4+
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      17030c18
    • G
      ubifs: Only check replay with inode type to judge if inode linked · 00a9aad1
      Guochun Mao committed
      stable inclusion
      from stable-5.10.36
      commit 50b0c0c3385d80ee23173dc1d9fd82803ed45aac
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 3e903315 upstream.
      
      Consider the following case: write a big file to flash, delete the
      file as soon as the write completes, and then power off promptly.
      At the next power-on, we get a replay list like:
      ...
      LEB 1105:211344 len 4144 deletion 0 sqnum 428783 key type 1 inode 80
      LEB 15:233544 len 160 deletion 1 sqnum 428785 key type 0 inode 80
      LEB 1105:215488 len 4144 deletion 0 sqnum 428787 key type 1 inode 80
      ...
      In the replay list, the data nodes have deletion 0 and the inode node
      has deletion 1. With the current logic, the file's dentry is removed,
      but the inode and the flash space it occupies are kept, so the user
      sees that much free space disappear.

      When judging whether the inode is still linked, we only need to check
      the deletion value of the following replay entries that are of inode
      type.
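
      A simplified model of that rule, as a sketch only. The struct and
      function names are hypothetical, the fallback for "no inode-type entry
      found" is an assumption, and this is not the UBIFS source. It scans the
      replay list from the newest entry backwards and lets only inode-type
      entries decide whether the inode is still linked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* "key type" values as printed in the replay list above. */
enum key_type { KEY_INO = 0, KEY_DATA = 1 };

/* Hypothetical, reduced replay entry. */
struct replay_entry {
    unsigned long long sqnum;
    unsigned int inum;
    enum key_type type;
    bool deletion;
};

/*
 * list is sorted by ascending sqnum. The newest inode-type entry for inum
 * decides; data-node entries for the same inode are ignored.
 */
static bool inode_still_linked_model(const struct replay_entry *list,
                                     size_t count, unsigned int inum)
{
    assert(list != NULL || count == 0);
    for (size_t i = count; i-- > 0; ) {
        if (list[i].inum == inum && list[i].type == KEY_INO)
            return !list[i].deletion;
    }
    return true; /* assumption of this sketch: no inode entry, keep it */
}
```

      With the three entries shown in the log, the data node at sqnum 428787
      no longer masks the inode deletion at sqnum 428785.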
      
      Fixes: e58725d5 ("ubifs: Handle re-linking of inodes correctly while recovery")
      Cc: stable@vger.kernel.org
      Signed-off-by: Guochun Mao <guochun.mao@mediatek.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      00a9aad1
    • L
      virtiofs: fix memory leak in virtio_fs_probe() · ff3705e5
      Luis Henriques committed
      stable inclusion
      from stable-5.10.36
      commit d19555ff225d0896a33246a49279e6d578095f15
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit c79c5e01 upstream.
      
      When accidentally passing the same tag twice to qemu, kmemleak ended up
      reporting a memory leak in virtiofs.  Also, looking at the log I saw the
      following error (that's when I noticed the duplicated tag):
      
        virtiofs: probe of virtio5 failed with error -17
      
      Here's the kmemleak log for reference:
      
      unreferenced object 0xffff888103d47800 (size 1024):
        comm "systemd-udevd", pid 118, jiffies 4294893780 (age 18.340s)
        hex dump (first 32 bytes):
          00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
          ff ff ff ff ff ff ff ff 80 90 02 a0 ff ff ff ff  ................
        backtrace:
          [<000000000ebb87c1>] virtio_fs_probe+0x171/0x7ae [virtiofs]
          [<00000000f8aca419>] virtio_dev_probe+0x15f/0x210
          [<000000004d6baf3c>] really_probe+0xea/0x430
          [<00000000a6ceeac8>] device_driver_attach+0xa8/0xb0
          [<00000000196f47a7>] __driver_attach+0x98/0x140
          [<000000000b20601d>] bus_for_each_dev+0x7b/0xc0
          [<00000000399c7b7f>] bus_add_driver+0x11b/0x1f0
          [<0000000032b09ba7>] driver_register+0x8f/0xe0
          [<00000000cdd55998>] 0xffffffffa002c013
          [<000000000ea196a2>] do_one_initcall+0x64/0x2e0
          [<0000000008f727ce>] do_init_module+0x5c/0x260
          [<000000003cdedab6>] __do_sys_finit_module+0xb5/0x120
          [<00000000ad2f48c6>] do_syscall_64+0x33/0x40
          [<00000000809526b5>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Luis Henriques <lhenriques@suse.de>
      Fixes: a62a8ef9 ("virtio-fs: add virtiofs filesystem")
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      ff3705e5
    • T
      fs: fix reporting supported extra file attributes for statx() · a029750c
      Theodore Ts'o committed
      stable inclusion
      from stable-5.10.36
      commit 1b41d4e5aa75675c915cbed09e2a7813f3fd2e49
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 5afa7e8b upstream.
      
      statx(2) notes that any attribute that is not indicated as supported
      by stx_attributes_mask has no usable value.  Commits 801e5237
      ("fs: move generic stat response attr handling to vfs_getattr_nosec")
      and 712b2698 ("fs/stat: Define DAX statx attribute") set
      STATX_ATTR_AUTOMOUNT and STATX_ATTR_DAX, respectively, without setting
      stx_attributes_mask, which can cause xfstests generic/532 to fail.
      
      Fix this in the same way as commit 1b9598c8 ("xfs: fix reporting
      supported extra file attributes for statx()")
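
      From the caller's side, the rule in the first sentence can be sketched
      as below. The helper and enum names are hypothetical; the two masks
      stand for the stx_attributes and stx_attributes_mask values returned by
      statx(2), passed in as plain integers so the sketch stays stand-alone.

```c
#include <assert.h>
#include <stdint.h>

enum attr_state { ATTR_UNKNOWN = -1, ATTR_CLEAR = 0, ATTR_SET = 1 };

/*
 * Hypothetical helper: interpret one attribute flag.
 * A flag that is not present in attributes_mask carries no usable value,
 * whatever the corresponding bit in attributes says.
 */
static enum attr_state query_attr(uint64_t attributes,
                                  uint64_t attributes_mask, uint64_t flag)
{
    assert(flag != 0);
    if (!(attributes_mask & flag))
        return ATTR_UNKNOWN;
    return (attributes & flag) ? ATTR_SET : ATTR_CLEAR;
}
```

      A caller following this rule treats a flag that is set in the
      attributes but missing from the mask as unknown, which is why setting
      the attribute without the mask bit is a bug.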
      
      Fixes: 801e5237 ("fs: move generic stat response attr handling to vfs_getattr_nosec")
      Fixes: 712b2698 ("fs/stat: Define DAX statx attribute")
      Cc: stable@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      a029750c
    • F
      btrfs: fix race when picking most recent mod log operation for an old root · 1226f171
      Filipe Manana committed
      stable inclusion
      from stable-5.10.36
      commit 1d852d6bb4d44baac57452be5c2857741139fc59
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit f9690f42 ]
      
      Commit dbcc7d57 ("btrfs: fix race when cloning extent buffer during
      rewind of an old root") fixed a race when we need to rewind the extent
      buffer of an old root. It was caused by picking a new mod log operation
      for the extent buffer while getting a cloned extent buffer with an outdated
      number of items (off by -1), because we cloned the extent buffer without
      locking it first.
      
      However there is still another similar race, but in the opposite direction.
      The cloned extent buffer has a number of items that does not match the
      number of tree mod log operations that are going to be replayed. This is
      because right after we got the last (most recent) tree mod log operation to
      replay and before locking and cloning the extent buffer, another task adds
      a new pointer to the extent buffer, which results in adding a new tree mod
      log operation and incrementing the number of items in the extent buffer.
      So after cloning we have a mismatch between the number of items in the extent
      buffer and the number of mod log operations we are going to apply to it.
      This results in hitting a BUG_ON() that produces the following stack trace:
      
         ------------[ cut here ]------------
         kernel BUG at fs/btrfs/tree-mod-log.c:675!
         invalid opcode: 0000 [#1] SMP KASAN PTI
         CPU: 3 PID: 4811 Comm: crawl_1215 Tainted: G        W         5.12.0-7d1efdf501f8-misc-next+ #99
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
         RIP: 0010:tree_mod_log_rewind+0x3b1/0x3c0
         Code: 05 48 8d 74 10 (...)
         RSP: 0018:ffffc90001027090 EFLAGS: 00010293
         RAX: 0000000000000000 RBX: ffff8880a8514600 RCX: ffffffffaa9e59b6
         RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8880a851462c
         RBP: ffffc900010270e0 R08: 00000000000000c0 R09: ffffed1004333417
         R10: ffff88802199a0b7 R11: ffffed1004333416 R12: 000000000000000e
         R13: ffff888135af8748 R14: ffff88818766ff00 R15: ffff8880a851462c
         FS:  00007f29acf62700(0000) GS:ffff8881f2200000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: 00007f0e6013f718 CR3: 000000010d42e003 CR4: 0000000000170ee0
         Call Trace:
          btrfs_get_old_root+0x16a/0x5c0
          ? lock_downgrade+0x400/0x400
          btrfs_search_old_slot+0x192/0x520
          ? btrfs_search_slot+0x1090/0x1090
          ? free_extent_buffer.part.61+0xd7/0x140
          ? free_extent_buffer+0x13/0x20
          resolve_indirect_refs+0x3e9/0xfc0
          ? lock_downgrade+0x400/0x400
          ? __kasan_check_read+0x11/0x20
          ? add_prelim_ref.part.11+0x150/0x150
          ? lock_downgrade+0x400/0x400
          ? __kasan_check_read+0x11/0x20
          ? lock_acquired+0xbb/0x620
          ? __kasan_check_write+0x14/0x20
          ? do_raw_spin_unlock+0xa8/0x140
          ? rb_insert_color+0x340/0x360
          ? prelim_ref_insert+0x12d/0x430
          find_parent_nodes+0x5c3/0x1830
          ? stack_trace_save+0x87/0xb0
          ? resolve_indirect_refs+0xfc0/0xfc0
          ? fs_reclaim_acquire+0x67/0xf0
          ? __kasan_check_read+0x11/0x20
          ? lockdep_hardirqs_on_prepare+0x210/0x210
          ? fs_reclaim_acquire+0x67/0xf0
          ? __kasan_check_read+0x11/0x20
          ? ___might_sleep+0x10f/0x1e0
          ? __kasan_kmalloc+0x9d/0xd0
          ? trace_hardirqs_on+0x55/0x120
          btrfs_find_all_roots_safe+0x142/0x1e0
          ? find_parent_nodes+0x1830/0x1830
          ? trace_hardirqs_on+0x55/0x120
          ? ulist_free+0x1f/0x30
          ? btrfs_inode_flags_to_xflags+0x50/0x50
          iterate_extent_inodes+0x20e/0x580
          ? tree_backref_for_extent+0x230/0x230
          ? release_extent_buffer+0x225/0x280
          ? read_extent_buffer+0xdd/0x110
          ? lock_downgrade+0x400/0x400
          ? __kasan_check_read+0x11/0x20
          ? lock_acquired+0xbb/0x620
          ? __kasan_check_write+0x14/0x20
          ? do_raw_spin_unlock+0xa8/0x140
          ? _raw_spin_unlock+0x22/0x30
          ? release_extent_buffer+0x225/0x280
          iterate_inodes_from_logical+0x129/0x170
          ? iterate_inodes_from_logical+0x129/0x170
          ? btrfs_inode_flags_to_xflags+0x50/0x50
          ? iterate_extent_inodes+0x580/0x580
          ? __vmalloc_node+0x92/0xb0
          ? init_data_container+0x34/0xb0
          ? init_data_container+0x34/0xb0
          ? kvmalloc_node+0x60/0x80
          btrfs_ioctl_logical_to_ino+0x158/0x230
          btrfs_ioctl+0x2038/0x4360
          ? __kasan_check_write+0x14/0x20
          ? mmput+0x3b/0x220
          ? btrfs_ioctl_get_supported_features+0x30/0x30
          ? __kasan_check_read+0x11/0x20
          ? __kasan_check_read+0x11/0x20
          ? lock_release+0xc8/0x650
          ? __might_fault+0x64/0xd0
          ? __kasan_check_read+0x11/0x20
          ? lock_downgrade+0x400/0x400
          ? lockdep_hardirqs_on_prepare+0x210/0x210
          ? lockdep_hardirqs_on_prepare+0x13/0x210
          ? _raw_spin_unlock_irqrestore+0x51/0x63
          ? __kasan_check_read+0x11/0x20
          ? do_vfs_ioctl+0xfc/0x9d0
          ? ioctl_file_clone+0xe0/0xe0
          ? lock_downgrade+0x400/0x400
          ? lockdep_hardirqs_on_prepare+0x210/0x210
          ? __kasan_check_read+0x11/0x20
          ? lock_release+0xc8/0x650
          ? __task_pid_nr_ns+0xd3/0x250
          ? __kasan_check_read+0x11/0x20
          ? __fget_files+0x160/0x230
          ? __fget_light+0xf2/0x110
          __x64_sys_ioctl+0xc3/0x100
          do_syscall_64+0x37/0x80
          entry_SYSCALL_64_after_hwframe+0x44/0xae
         RIP: 0033:0x7f29ae85b427
         Code: 00 00 90 48 8b (...)
         RSP: 002b:00007f29acf5fcf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
         RAX: ffffffffffffffda RBX: 00007f29acf5ff40 RCX: 00007f29ae85b427
         RDX: 00007f29acf5ff48 RSI: 00000000c038943b RDI: 0000000000000003
         RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f29acf60120
         R10: 00005640d5fc7b00 R11: 0000000000000246 R12: 0000000000000003
         R13: 00007f29acf5ff48 R14: 00007f29acf5ff40 R15: 00007f29acf5fef8
         Modules linked in:
         ---[ end trace 85e5fce078dfbe04 ]---
      
        (gdb) l *(tree_mod_log_rewind+0x3b1)
        0xffffffff819e5b21 is in tree_mod_log_rewind (fs/btrfs/tree-mod-log.c:675).
        670                      * the modification. As we're going backwards, we do the
        671                      * opposite of each operation here.
        672                      */
        673                     switch (tm->op) {
        674                     case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
        675                             BUG_ON(tm->slot < n);
        676                             fallthrough;
        677                     case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING:
        678                     case BTRFS_MOD_LOG_KEY_REMOVE:
        679                             btrfs_set_node_key(eb, &tm->key, tm->slot);
        (gdb) quit
      
      The following steps explain in more detail how it happens:
      
      1) We have one tree mod log user (through fiemap or the logical ino ioctl),
         with a sequence number of 1, so we have fs_info->tree_mod_seq == 1.
         This is task A;
      
      2) Another task is at ctree.c:balance_level() and we have eb X currently as
         the root of the tree, and we promote its single child, eb Y, as the new
         root.
      
         Then, at ctree.c:balance_level(), we call:
      
            ret = btrfs_tree_mod_log_insert_root(root->node, child, true);
      
      3) At btrfs_tree_mod_log_insert_root() we create a tree mod log operation
         of type BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING, with a ->logical field
         pointing to ebX->start. We only have one item in eb X, so we create
         only one tree mod log operation, and store in the "tm_list" array;
      
      4) Then, still at btrfs_tree_mod_log_insert_root(), we create a tree mod
         log element of operation type BTRFS_MOD_LOG_ROOT_REPLACE, ->logical set
         to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level
         set to the level of eb X and ->generation set to the generation of eb X;
      
      5) Then btrfs_tree_mod_log_insert_root() calls tree_mod_log_free_eb() with
         "tm_list" as argument. After that, tree_mod_log_free_eb() calls
         tree_mod_log_insert(). This inserts the mod log operation of type
         BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING from step 3 into the rbtree
         with a sequence number of 2 (and fs_info->tree_mod_seq set to 2);
      
      6) Then, after inserting the "tm_list" single element into the tree mod
         log rbtree, the BTRFS_MOD_LOG_ROOT_REPLACE element is inserted, which
         gets the sequence number 3 (and fs_info->tree_mod_seq set to 3);
      
      7) Back to ctree.c:balance_level(), we free eb X by calling
         btrfs_free_tree_block() on it. Because eb X was created in the current
         transaction, has no other references and writeback did not happen for
         it, we add it back to the free space cache/tree;
      
      8) Later some other task B allocates the metadata extent from eb X, since
         it is marked as free space in the space cache/tree, and uses it as a
         node for some other btree;
      
      9) The tree mod log user task calls btrfs_search_old_slot(), which calls
         btrfs_get_old_root(), and finally that calls tree_mod_log_oldest_root()
         with time_seq == 1 and eb_root == eb Y;
      
      10) The first iteration of the while loop finds the tree mod log element
          with sequence number 3, for the logical address of eb Y and of type
          BTRFS_MOD_LOG_ROOT_REPLACE;
      
      11) Because the operation type is BTRFS_MOD_LOG_ROOT_REPLACE, we don't
          break out of the loop, and set root_logical to point to
          tm->old_root.logical, which corresponds to the logical address of
          eb X;
      
      12) On the next iteration of the while loop, the call to
          tree_mod_log_search_oldest() returns the smallest tree mod log element
          for the logical address of eb X, which has a sequence number of 2, an
          operation type of BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and
          corresponds to the old slot 0 of eb X (eb X had only 1 item in it
          before being freed at step 7);
      
      13) We then break out of the while loop and return the tree mod log
          operation of type BTRFS_MOD_LOG_ROOT_REPLACE (eb Y), and not the one
          for slot 0 of eb X, to btrfs_get_old_root();
      
      14) At btrfs_get_old_root(), we process the BTRFS_MOD_LOG_ROOT_REPLACE
          operation and set "logical" to the logical address of eb X, which was
          the old root. We then call tree_mod_log_search() passing it the logical
          address of eb X and time_seq == 1;
      
      15) But before calling tree_mod_log_search(), task B locks eb X, adds a
          key to eb X, which results in adding a tree mod log operation of type
          BTRFS_MOD_LOG_KEY_ADD, with a sequence number of 4, to the tree mod
          log, and increments the number of items in eb X from 0 to 1.
          Now fs_info->tree_mod_seq has a value of 4;
      
      16) Task A then calls tree_mod_log_search(), which returns the most recent
          tree mod log operation for eb X, which is the one just added by task B
          at the previous step, with a sequence number of 4, a type of
          BTRFS_MOD_LOG_KEY_ADD and for slot 0;
      
      17) Before task A locks and clones eb X, task B adds another key to eb X,
          which results in adding a new BTRFS_MOD_LOG_KEY_ADD mod log operation,
          with a sequence number of 5, for slot 1 of eb X, increments the
          number of items in eb X from 1 to 2, and unlocks eb X.
          Now fs_info->tree_mod_seq has a value of 5;
      
      18) Task A then locks eb X and clones it. The clone has a value of 2 for
          the number of items and the pointer "tm" points to the tree mod log
          operation with sequence number 4, not the most recent one with a
          sequence number of 5, so there is a mismatch between the number of
          mod log operations that are going to be applied to the cloned version
          of eb X and the number of items in the clone;
      
      19) Task A then calls tree_mod_log_rewind() with the clone of eb X, the
          tree mod log operation with sequence number 4 and a type of
          BTRFS_MOD_LOG_KEY_ADD, and time_seq == 1;
      
      20) At tree_mod_log_rewind(), we set the local variable "n" with a value
          of 2, which is the number of items in the clone of eb X.
      
          Then in the first iteration of the while loop, we process the mod log
          operation with sequence number 4, which is targeted at slot 0 and has
          a type of BTRFS_MOD_LOG_KEY_ADD. This results in decrementing "n" from
          2 to 1.
      
          Then we pick the next tree mod log operation for eb X, which is the
          tree mod log operation with a sequence number of 2, a type of
          BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING and for slot 0, it is the one
          added in step 5 to the tree mod log tree.
      
          We go back to the top of the loop to process this mod log operation,
          and because its slot is 0 and "n" has a value of 1, we hit the BUG_ON:
      
              (...)
              switch (tm->op) {
              case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
                      BUG_ON(tm->slot < n);
                      fallthrough;
              (...)
      
      Fix this by checking for a more recent tree mod log operation after locking
      and cloning the extent buffer of the old root node, and use it as the first
      operation to apply to the cloned extent buffer when rewinding it.
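
      The effect of starting the rewind from a stale operation versus the
      most recent one can be reproduced with a small model of steps 15 to 20.
      This is a sketch under assumptions: the types and function names are
      hypothetical, only the two operation types from the scenario are
      modelled, and the item count is incremented after a removal is undone.
      It is not the btrfs code.

```c
#include <assert.h>
#include <stddef.h>

enum op_type { OP_KEY_ADD, OP_KEY_REMOVE_WHILE_FREEING };

/* Hypothetical, reduced tree mod log operation. */
struct mod_op {
    unsigned int seq;
    enum op_type type;
    int slot;
};

/* The operations logged for eb X in the steps above, oldest first. */
static const struct mod_op eb_x_ops[] = {
    { 2, OP_KEY_REMOVE_WHILE_FREEING, 0 },
    { 4, OP_KEY_ADD, 0 },
    { 5, OP_KEY_ADD, 1 },
};

/*
 * Rewind a clone holding nritems items, starting at ops[first] and walking
 * towards older operations while seq >= time_seq. Returns the resulting
 * item count, or -1 where the real code would hit BUG_ON(tm->slot < n).
 */
static int rewind_model(const struct mod_op *ops, size_t first, int nritems,
                        unsigned int time_seq)
{
    int n = nritems;
    size_t i = first + 1;

    assert(ops != NULL);
    while (i-- > 0 && ops[i].seq >= time_seq) {
        switch (ops[i].type) {
        case OP_KEY_ADD:
            n--;            /* undo an insertion */
            break;
        case OP_KEY_REMOVE_WHILE_FREEING:
            if (ops[i].slot < n)
                return -1;  /* the BUG_ON case */
            n++;            /* undo a removal */
            break;
        }
    }
    return n;
}
```

      Starting from the operation with sequence number 4 on a clone with 2
      items reaches the BUG_ON case, while starting from sequence number 5
      rewinds cleanly back to the single item eb X had as the old root.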
      
      Stable backport notes: due to moved code and renames, in <= 5.11 the
      change should be applied to ctree.c:get_old_root.

      Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Link: https://lore.kernel.org/linux-btrfs/20210404040732.GZ32440@hungrycats.org/
      Fixes: 834328a8 ("Btrfs: tree mod log's old roots could still be part of the tree")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      1226f171
    • J
      btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s · 5b953513
      Josef Bacik committed
      stable inclusion
      from stable-5.10.36
      commit 9c60c881d662a8aa3c70717d53eccbbe951c979f
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 7a9213a9 ]
      
      A few BUG_ON()'s in replace_path are purely to keep us from making
      logical mistakes, so replace them with ASSERT()'s.

      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      5b953513
    • J
      btrfs: do proper error handling in btrfs_update_reloc_root · 2893b014
      Josef Bacik committed
      stable inclusion
      from stable-5.10.36
      commit f32b84d7c977e1906a4781b93b3c93090b6cd675
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 592fbcd5 ]
      
      We call btrfs_update_root in btrfs_update_reloc_root, which can fail for
      all sorts of reasons, including IO errors.  Instead of panicking the box,
      let's return the error, now that all callers properly handle those
      errors.

      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      2893b014
    • J
      btrfs: do proper error handling in create_reloc_root · cc47c520
      Josef Bacik committed
      stable inclusion
      from stable-5.10.36
      commit 224c654a2eca6a29009b80c887bcf3ac4b2cab30
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 84c50ba5 ]
      
      We do memory allocations here, read blocks from disk, all sorts of
      operations that could easily fail at any given point.  Instead of
      panicking the box, simply return the error back up the chain; all
      callers at this point have proper error handling.

      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      cc47c520
    • F
      btrfs: fix race between transaction aborts and fsyncs leading to use-after-free · 8eff4d5b
      Filipe Manana committed
      stable inclusion
      from stable-5.10.36
      commit a4794be7b00b7eda4b45fffd283ab7d76df7e5d6
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 061dde82 upstream.
      
      There is a race between a task aborting a transaction during a commit,
      a task doing an fsync and the transaction kthread, which leads to a
      use-after-free of the log root tree. When this happens, it results in a
      stack trace like the following:
      
        BTRFS info (device dm-0): forced readonly
        BTRFS warning (device dm-0): Skipping commit of aborted transaction.
        BTRFS: error (device dm-0) in cleanup_transaction:1958: errno=-5 IO failure
        BTRFS warning (device dm-0): lost page write due to IO error on /dev/mapper/error-test (-5)
        BTRFS warning (device dm-0): Skipping commit of aborted transaction.
        BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0xa4e8 len 4096 err no 10
        BTRFS error (device dm-0): error writing primary super block to device 1
        BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e000 len 4096 err no 10
        BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e008 len 4096 err no 10
        BTRFS warning (device dm-0): direct IO failed ino 261 rw 0,0 sector 0x12e010 len 4096 err no 10
        BTRFS: error (device dm-0) in write_all_supers:4110: errno=-5 IO failure (1 errors while writing supers)
        BTRFS: error (device dm-0) in btrfs_sync_log:3308: errno=-5 IO failure
        general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b68: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        CPU: 2 PID: 2458471 Comm: fsstress Not tainted 5.12.0-rc5-btrfs-next-84 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
        RIP: 0010:__mutex_lock+0x139/0xa40
        Code: c0 74 19 (...)
        RSP: 0018:ffff9f18830d7b00 EFLAGS: 00010202
        RAX: 6b6b6b6b6b6b6b68 RBX: 0000000000000001 RCX: 0000000000000002
        RDX: ffffffffb9c54d13 RSI: 0000000000000000 RDI: 0000000000000000
        RBP: ffff9f18830d7bc0 R08: 0000000000000000 R09: 0000000000000000
        R10: ffff9f18830d7be0 R11: 0000000000000001 R12: ffff8c6cd199c040
        R13: ffff8c6c95821358 R14: 00000000fffffffb R15: ffff8c6cbcf01358
        FS:  00007fa9140c2b80(0000) GS:ffff8c6fac600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fa913d52000 CR3: 000000013d2b4003 CR4: 0000000000370ee0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         ? __btrfs_handle_fs_error+0xde/0x146 [btrfs]
         ? btrfs_sync_log+0x7c1/0xf20 [btrfs]
         ? btrfs_sync_log+0x7c1/0xf20 [btrfs]
         btrfs_sync_log+0x7c1/0xf20 [btrfs]
         btrfs_sync_file+0x40c/0x580 [btrfs]
         do_fsync+0x38/0x70
         __x64_sys_fsync+0x10/0x20
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x7fa9142a55c3
        Code: 8b 15 09 (...)
        RSP: 002b:00007fff26278d48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
        RAX: ffffffffffffffda RBX: 0000563c83cb4560 RCX: 00007fa9142a55c3
        RDX: 00007fff26278cb0 RSI: 00007fff26278cb0 RDI: 0000000000000005
        RBP: 0000000000000005 R08: 0000000000000001 R09: 00007fff26278d5c
        R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000340
        R13: 00007fff26278de0 R14: 00007fff26278d96 R15: 0000563c83ca57c0
        Modules linked in: btrfs dm_zero dm_snapshot dm_thin_pool (...)
        ---[ end trace ee2f1b19327d791d ]---
      
      The steps that lead to this crash are the following:
      
      1) We are at transaction N;
      
      2) We have two tasks with a transaction handle attached to transaction N.
         Task A and Task B. Task B is doing an fsync;
      
      3) Task B is at btrfs_sync_log(), and has saved fs_info->log_root_tree
         into a local variable named 'log_root_tree' at the top of
         btrfs_sync_log(). Task B is about to call write_all_supers(), but
         before that...
      
      4) Task A calls btrfs_commit_transaction(), and after it sets the
         transaction state to TRANS_STATE_COMMIT_START, an error happens before
         it waits for the transaction's 'num_writers' counter to reach a value
         of 1 (no one else attached to the transaction), so it jumps to the
         label "cleanup_transaction";
      
      5) Task A then calls cleanup_transaction(), where it aborts the
         transaction, setting BTRFS_FS_STATE_TRANS_ABORTED on fs_info->fs_state,
         setting the ->aborted field of the transaction and the handle to an
         errno value and also setting BTRFS_FS_STATE_ERROR on fs_info->fs_state.
      
         After that, at cleanup_transaction(), it deletes the transaction from
         the list of transactions (fs_info->trans_list), sets the transaction
         to the state TRANS_STATE_COMMIT_DOING and then waits for the number
         of writers to go down to 1, as it's currently 2 (1 for task A and 1
         for task B);
      
      6) The transaction kthread is running and sees that BTRFS_FS_STATE_ERROR
         is set in fs_info->fs_state, so it calls btrfs_cleanup_transaction().
      
         There it sees the list fs_info->trans_list is empty, and then proceeds
         into calling btrfs_drop_all_logs(), which frees the log root tree with
         a call to btrfs_free_log_root_tree();
      
      7) Task B calls write_all_supers() and, shortly after, under the label
         'out_wake_log_root', it dereferences the pointer stored in
         'log_root_tree', which was already freed in the previous step by the
         transaction kthread. This results in a use-after-free leading to a
         crash.
      
      Fix this by deleting the transaction from the list of transactions at
      cleanup_transaction() only after setting the transaction state to
      TRANS_STATE_COMMIT_DOING and waiting for all existing tasks that are
      attached to the transaction to release their transaction handles.
      This makes the transaction kthread wait for all the tasks attached to
      the transaction to be done with the transaction before dropping the
      log roots and doing other cleanups.
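      
      The effect of the reordering can be modelled in userspace. The
      following is a minimal single-threaded sketch, not the kernel code:
      struct fs_model and every function in it are hypothetical names, and
      the "wait for writers" step is split into two halves so that the
      interleaving with the transaction kthread is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct fs_model {
    bool trans_on_list;   /* transaction still on fs_info->trans_list */
    int  num_writers;     /* handles attached to the transaction */
    bool log_root_freed;  /* set once the kthread dropped the logs */
};

/* First half of the abort path: runs before waiting for writers. */
static void abort_before_wait(struct fs_model *fs, bool fixed_order)
{
    if (!fixed_order)
        fs->trans_on_list = false;   /* old order: unlink first */
}

/* A writer (e.g. the fsync task) releases its handle. */
static void writer_release(struct fs_model *fs)
{
    assert(fs->num_writers > 1);
    fs->num_writers--;
}

/* Second half: runs once only the aborting task is still attached. */
static void abort_after_wait(struct fs_model *fs)
{
    assert(fs->num_writers == 1);
    fs->trans_on_list = false;       /* fixed order: unlink last */
}

/* Transaction kthread: drops the logs only when it sees an empty list. */
static bool kthread_try_drop_logs(struct fs_model *fs)
{
    if (fs->trans_on_list)
        return false;
    fs->log_root_freed = true;
    return true;
}

/* Returns true if the log root was freed while a writer was attached. */
static bool log_freed_under_writer(bool fixed_order)
{
    struct fs_model fs = { .trans_on_list = true, .num_writers = 2,
                           .log_root_freed = false };
    bool raced;

    abort_before_wait(&fs, fixed_order);
    kthread_try_drop_logs(&fs);      /* kthread runs during the wait */
    raced = fs.log_root_freed && fs.num_writers > 1;
    writer_release(&fs);
    abort_after_wait(&fs);
    return raced;
}
```

      In this model log_freed_under_writer(false) reports the race and
      log_freed_under_writer(true) does not, because the kthread keeps
      seeing a non-empty list until the last writer is gone.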
      
      Fixes: ef67963d ("btrfs: drop logs when we've aborted a transaction")
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      8eff4d5b
    • F
      btrfs: fix metadata extent leak after failure to create subvolume · 951dad9b
      Filipe Manana committed
      stable inclusion
      from stable-5.10.36
      commit 97f30747b22c84e24243cd066b2c232084f64b8c
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 67addf29 upstream.
      
      When creating a subvolume we allocate an extent buffer for its root node
      after starting a transaction. We set up a root item for the subvolume that
      points to that extent buffer and then attempt to insert the root item into
      the root tree. However, if that fails, due to ENOMEM for example, we do
      not free the extent buffer previously allocated and we do not abort the
      transaction (as at that point we did nothing that cannot be undone).
      
      This means that we effectively do not return the metadata extent back to
      the free space cache/tree and we leave a delayed reference for it which
      causes a metadata extent item to be added to the extent tree, in the next
      transaction commit, without having backreferences. When this happens
      'btrfs check' reports the following:
      
        $ btrfs check /dev/sdi
        Opening filesystem to check...
        Checking filesystem on /dev/sdi
        UUID: dce2cb9d-025f-4b05-a4bf-cee0ad3785eb
        [1/7] checking root items
        [2/7] checking extents
        ref mismatch on [30425088 16384] extent item 1, found 0
        backref 30425088 root 256 not referenced back 0x564a91c23d70
        incorrect global backref count on 30425088 found 1 wanted 0
        backpointer mismatch on [30425088 16384]
        owner ref check failed [30425088 16384]
        ERROR: errors found in extent allocation tree or chunk allocation
        [3/7] checking free space cache
        [4/7] checking fs roots
        [5/7] checking only csums items (without verifying data)
        [6/7] checking root refs
        [7/7] checking quota groups skipped (not enabled on this FS)
        found 212992 bytes used, error(s) found
        total csum bytes: 0
        total tree bytes: 131072
        total fs tree bytes: 32768
        total extent tree bytes: 16384
        btree space waste bytes: 124669
        file data blocks allocated: 65536
         referenced 65536
      
      So fix this by freeing the metadata extent if btrfs_insert_root() returns
      an error.
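      
      The shape of that error path can be sketched in userspace C. This is
      only an illustration under assumptions: alloc_root_extent(),
      free_root_extent(), insert_root_item() and create_subvol_model() are
      hypothetical stand-ins, and a plain counter plays the role of the
      extent accounting.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not btrfs functions. */
static int live_extents;   /* extents allocated and not yet freed */

static void *alloc_root_extent(void)
{
    void *p = malloc(16);

    if (p)
        live_extents++;
    return p;
}

static void free_root_extent(void *p)
{
    free(p);
    live_extents--;
}

/* Simulated root item insertion that can be told to fail. */
static int insert_root_item(int fail)
{
    return fail ? -ENOMEM : 0;
}

static int create_subvol_model(int insert_fails, void **root_out)
{
    void *leaf = alloc_root_extent();
    int ret;

    if (!leaf)
        return -ENOMEM;

    ret = insert_root_item(insert_fails);
    if (ret) {
        /* Give the extent back instead of leaking it. */
        free_root_extent(leaf);
        return ret;
    }

    *root_out = leaf;
    return 0;
}
```

      With the free on the failure branch, a failed insertion leaves
      live_extents unchanged, which is the property the fix restores.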
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      951dad9b
    • Q
      btrfs: handle remount to no compress during compression · 35ee722e
      Qu Wenruo committed
      stable inclusion
      from stable-5.10.36
      commit dba16ca6f347266145606b7c1e5c056986fc0124
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 1d8ba9e7 upstream.
      
      [BUG]
      When running btrfs/071 with inode_need_compress() removed from
      compress_file_range(), we got the following crash:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000018
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
        RIP: 0010:compress_file_range+0x476/0x7b0 [btrfs]
        Call Trace:
         ? submit_compressed_extents+0x450/0x450 [btrfs]
         async_cow_start+0x16/0x40 [btrfs]
         btrfs_work_helper+0xf2/0x3e0 [btrfs]
         process_one_work+0x278/0x5e0
         worker_thread+0x55/0x400
         ? process_one_work+0x5e0/0x5e0
         kthread+0x168/0x190
         ? kthread_create_worker_on_cpu+0x70/0x70
         ret_from_fork+0x22/0x30
        ---[ end trace 65faf4eae941fa7d ]---
      
      This is already after the patch "btrfs: inode: fix NULL pointer
      dereference if inode doesn't need compression."
      
      [CAUSE]
      @pages is first allocated by kcalloc() in compress_file_range():
                      pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
      
      Then passed to btrfs_compress_pages() to be utilized there:
      
                      ret = btrfs_compress_pages(...
                                                 pages,
                                                 &nr_pages,
                                                 ...);
      
      btrfs_compress_pages() will initialize each page as output, in
      zlib_compress_pages() we have:
      
                              pages[nr_pages] = out_page;
                              nr_pages++;
      
      Normally this is completely fine, but there is a special case which
      is in btrfs_compress_pages() itself:
      
              switch (type) {
              default:
                      return -E2BIG;
              }
      
      In this case, we modify neither @pages nor @out_pages, so when we
      clean up the pages we can hit a NULL pointer dereference again:
      
              if (pages) {
                      for (i = 0; i < nr_pages; i++) {
                              WARN_ON(pages[i]->mapping);
                              put_page(pages[i]);
                      }
              ...
              }
      
      Since pages[i] are all initialized to zero, and btrfs_compress_pages()
      doesn't change them at all, accessing pages[i]->mapping leads to a
      NULL pointer dereference.
      
      This is not possible in the current kernel, as we check
      inode_need_compress() before allocating the pages.
      But if we're going to remove that inode_need_compress() check in
      compress_file_range(), then it's going to be a problem.
      
      [FIX]
      When btrfs_compress_pages() hits its default case, set @out_pages to
      0 to prevent this problem.
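      
      A minimal userspace sketch of the idea, under assumptions:
      compress_pages_model() and cleanup_model() are hypothetical names
      that only mirror the default branch and the cleanup loop quoted
      above, and an assert() stands in for the pages[i]->mapping access.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the dispatcher; only the default branch exists. */
static int compress_pages_model(int type, void **pages,
                                unsigned long *out_pages)
{
    (void)pages;   /* the default branch never fills the array */

    switch (type) {
    default:
        /*
         * No page was produced: report zero so the caller's
         * cleanup loop does not walk the still-zeroed array.
         */
        *out_pages = 0;
        return -E2BIG;
    }
}

/* Caller-side cleanup: returns how many slots it touched. */
static unsigned long cleanup_model(void **pages, unsigned long nr_pages)
{
    unsigned long touched = 0;
    unsigned long i;

    for (i = 0; i < nr_pages; i++) {
        /* The kernel would dereference pages[i] at this point. */
        assert(pages[i] != NULL);
        touched++;
    }
    return touched;
}
```

      Starting from a zeroed array and a count of 4, a call to
      compress_pages_model() resets the count to 0, so cleanup_model()
      never reaches a NULL slot.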
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212331
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      35ee722e
    • A
      smb2: fix use-after-free in smb2_ioctl_query_info() · 491b6804
      Aurelien Aptel committed
      stable inclusion
      from stable-5.10.36
      commit 5f2adf84624efe2fef48d2501bc3ccf660b280f1
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit ccd48ec3 upstream.
      
      * rqst[1,2,3] is allocated in vars
      * each rqst->rq_iov is also allocated in vars or using pooled memory
      
      SMB2_open_free, SMB2_ioctl_free, SMB2_query_info_free are iterating on
      each rqst after vars has been freed (use-after-free), and they are
      freeing the kvec a second time (double-free).
      
      How to trigger:
      
      * compile with KASAN
      * mount a share
      
      $ smbinfo quota /mnt/foo
      Segmentation fault
      $ dmesg
      
       ==================================================================
       BUG: KASAN: use-after-free in SMB2_open_free+0x1c/0xa0
       Read of size 8 at addr ffff888007b10c00 by task python3/1200
      
       CPU: 2 PID: 1200 Comm: python3 Not tainted 5.12.0-rc6+ #107
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
       Call Trace:
        dump_stack+0x93/0xc2
        print_address_description.constprop.0+0x18/0x130
        ? SMB2_open_free+0x1c/0xa0
        ? SMB2_open_free+0x1c/0xa0
        kasan_report.cold+0x7f/0x111
        ? smb2_ioctl_query_info+0x240/0x990
        ? SMB2_open_free+0x1c/0xa0
        SMB2_open_free+0x1c/0xa0
        smb2_ioctl_query_info+0x2bf/0x990
        ? smb2_query_reparse_tag+0x600/0x600
        ? cifs_mapchar+0x250/0x250
        ? rcu_read_lock_sched_held+0x3f/0x70
        ? cifs_strndup_to_utf16+0x12c/0x1c0
        ? rwlock_bug.part.0+0x60/0x60
        ? rcu_read_lock_sched_held+0x3f/0x70
        ? cifs_convert_path_to_utf16+0xf8/0x140
        ? smb2_check_message+0x6f0/0x6f0
        cifs_ioctl+0xf18/0x16b0
        ? smb2_query_reparse_tag+0x600/0x600
        ? cifs_readdir+0x1800/0x1800
        ? selinux_bprm_creds_for_exec+0x4d0/0x4d0
        ? do_user_addr_fault+0x30b/0x950
        ? __x64_sys_openat+0xce/0x140
        __x64_sys_ioctl+0xb9/0xf0
        do_syscall_64+0x33/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7fdcf1f4ba87
       Code: b3 66 90 48 8b 05 11 14 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 13 2c 00 f7 d8 64 89 01 48
       RSP: 002b:00007ffef1ce7748 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
       RAX: ffffffffffffffda RBX: 00000000c018cf07 RCX: 00007fdcf1f4ba87
       RDX: 0000564c467c5590 RSI: 00000000c018cf07 RDI: 0000000000000003
       RBP: 00007ffef1ce7770 R08: 00007ffef1ce7420 R09: 00007fdcf0e0562b
       R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000004018
       R13: 0000000000000001 R14: 0000000000000003 R15: 0000564c467c5590
      
       Allocated by task 1200:
        kasan_save_stack+0x1b/0x40
        __kasan_kmalloc+0x7a/0x90
        smb2_ioctl_query_info+0x10e/0x990
        cifs_ioctl+0xf18/0x16b0
        __x64_sys_ioctl+0xb9/0xf0
        do_syscall_64+0x33/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       Freed by task 1200:
        kasan_save_stack+0x1b/0x40
        kasan_set_track+0x1c/0x30
        kasan_set_free_info+0x20/0x30
        __kasan_slab_free+0xe5/0x110
        slab_free_freelist_hook+0x53/0x130
        kfree+0xcc/0x320
        smb2_ioctl_query_info+0x2ad/0x990
        cifs_ioctl+0xf18/0x16b0
        __x64_sys_ioctl+0xb9/0xf0
        do_syscall_64+0x33/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
       The buggy address belongs to the object at ffff888007b10c00
        which belongs to the cache kmalloc-512 of size 512
       The buggy address is located 0 bytes inside of
        512-byte region [ffff888007b10c00, ffff888007b10e00)
       The buggy address belongs to the page:
       page:0000000044e14b75 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7b10
       head:0000000044e14b75 order:2 compound_mapcount:0 compound_pincount:0
       flags: 0x100000000010200(slab|head)
       raw: 0100000000010200 ffffea000015f500 0000000400000004 ffff888001042c80
       raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
        ffff888007b10b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff888007b10b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       >ffff888007b10c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                          ^
        ffff888007b10c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ffff888007b10d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ==================================================================
      Signed-off-by: Aurelien Aptel <aaptel@suse.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      491b6804
    • S
      cifs: detect dead connections only when echoes are enabled. · 7b9f30f4
      Shyam Prasad N committed
      stable inclusion
      from stable-5.10.36
      commit 8a90058752e0b04b3159fa889a7f611b550bf816
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit f4916649 upstream.
      
      We can detect server unresponsiveness only if echoes are enabled.
      Echoes can be disabled under two scenarios:
      1. The connection is low on credits, so we've disabled echoes/oplocks.
      2. The connection has not seen any request till now (other than
      negotiate/sess-setup), which is when we enable these two, based on
      the credits available.
      
      So this fix checks for a dead connection only when echoes are enabled.
      Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
      CC: <stable@vger.kernel.org> # v5.8+
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      7b9f30f4
    • E
      cifs: fix out-of-bound memory access when calling smb3_notify() at mount point · ed8dbc7b
      Eugene Korenevsky committed
      stable inclusion
      from stable-5.10.36
      commit 23d7b4a8f77ae1252ac1a0c496ec3b603f85f593
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit a637f4ae upstream.
      
      If smb3_notify() is called at the mount point of a CIFS share,
      build_path_from_dentry() returns a pointer to kmalloc-ed memory that
      holds only a terminating zero (this is the empty FileName to be passed
      to the SMB2 CREATE request). This pointer is assigned to the `path`
      variable.
      Then `path + 1` (to skip the first backslash symbol) is passed to
      cifs_convert_path_to_utf16(). This is incorrect for an empty path and
      causes an out-of-bounds memory access.
      
      Get rid of this "increase by one". cifs_convert_path_to_utf16() already
      contains the check for leading backslash in the path.
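      
      Why an unconditional `path + 1` is unsafe while a checked skip is
      not can be shown in a few lines of userspace C. The helper name
      skip_leading_backslash() is hypothetical; it only illustrates the
      kind of check the text attributes to cifs_convert_path_to_utf16().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: skip one leading backslash only if present.
 * For an empty string path[0] is the terminating zero, so the
 * pointer is returned unchanged and never moves past the buffer.
 */
static const char *skip_leading_backslash(const char *path)
{
    assert(path != NULL);
    return (path[0] == '\\') ? path + 1 : path;
}
```

      For the empty path the helper returns the same one-byte buffer,
      whereas `path + 1` would already point past its terminating zero.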
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=212693
      CC: <stable@vger.kernel.org> # v5.6+
      Signed-off-by: Eugene Korenevsky <ekorenevsky@astralinux.ru>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      ed8dbc7b
    • P
      cifs: Return correct error code from smb2_get_enc_key · 9e4fe771
      Paul Aurich committed
      stable inclusion
      from stable-5.10.36
      commit aaa0faa5c28a91c362352d6b35dc3ed10df56fb0
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 83728cbf upstream.
      
      Avoid a warning if the error percolates back up:
      
      [440700.376476] CIFS VFS: \\otters.example.com crypt_message: Could not get encryption key
      [440700.386947] ------------[ cut here ]------------
      [440700.386948] err = 1
      [440700.386977] WARNING: CPU: 11 PID: 2733 at /build/linux-hwe-5.4-p6lk6L/linux-hwe-5.4-5.4.0/lib/errseq.c:74 errseq_set+0x5c/0x70
      ...
      [440700.397304] CPU: 11 PID: 2733 Comm: tar Tainted: G           OE     5.4.0-70-generic #78~18.04.1-Ubuntu
      ...
      [440700.397334] Call Trace:
      [440700.397346]  __filemap_set_wb_err+0x1a/0x70
      [440700.397419]  cifs_writepages+0x9c7/0xb30 [cifs]
      [440700.397426]  do_writepages+0x4b/0xe0
      [440700.397444]  __filemap_fdatawrite_range+0xcb/0x100
      [440700.397455]  filemap_write_and_wait+0x42/0xa0
      [440700.397486]  cifs_setattr+0x68b/0xf30 [cifs]
      [440700.397493]  notify_change+0x358/0x4a0
      [440700.397500]  utimes_common+0xe9/0x1c0
      [440700.397510]  do_utimes+0xc5/0x150
      [440700.397520]  __x64_sys_utimensat+0x88/0xd0
      
      Fixes: 61cfac6f ("CIFS: Fix possible use after free in demultiplex thread")
      Signed-off-by: Paul Aurich <paul@darkrain42.org>
      CC: stable@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      9e4fe771
    • G
      erofs: add unsupported inode i_format check · 89806ecc
      Gao Xiang committed
      stable inclusion
      from stable-5.10.36
      commit dbaf435ddf973cc90146c95e526bd6163843377b
      bugzilla: 51867
      CVE: NA
      
      --------------------------------
      
      commit 24a806d8 upstream.
      
      If any unknown i_format bits are set (for example by a new incompat
      inode feature), mark the inode as unsupported.
      
      This guards against incompat i_format fields added in the future.
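      
      Such a check is typically a mask test. The sketch below is an
      assumption-laden illustration rather than the erofs code: the mask
      value KNOWN_I_FORMAT_MASK, the function name check_i_format() and
      the choice of -EOPNOTSUPP as the error are all picked for the
      example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed for illustration: the low 4 bits are the known i_format bits. */
#define KNOWN_I_FORMAT_MASK 0x000fu

/* Reject an inode whose i_format carries any bit outside the known set. */
static int check_i_format(uint16_t i_format)
{
    if (i_format & ~KNOWN_I_FORMAT_MASK)
        return -EOPNOTSUPP;
    return 0;
}
```

      A value such as 0x0003 passes, while 0x0010 sets a bit outside the
      assumed mask and is rejected.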
      
      Link: https://lore.kernel.org/r/20210329003614.6583-1-hsiangkao@aol.com
      Fixes: 431339ba ("staging: erofs: add inode operations")
      Cc: <stable@vger.kernel.org> # 4.19+
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      89806ecc