1. 06 11月, 2019 23 次提交
    • T
      NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid() · 74001646
      Trond Myklebust 提交于
      commit 79cc55422ce99be5964bde208ba8557174720893 upstream.
      
      A typo in nfs4_refresh_delegation_stateid() means we're leaking an
      RCU lock, and always returning a value of 'false'. As the function
      description states, we were always supposed to return 'true' if a
      matching delegation was found.
      
      Fixes: 12f275cd ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.")
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      74001646
    • M
      fuse: truncate pending writes on O_TRUNC · 62e42369
      Miklos Szeredi 提交于
      commit e4648309b85a78f8c787457832269a8712a8673e upstream.
      
      Make sure cached writes are not reordered around open(..., O_TRUNC), with
      the obvious wrong results.
      
      Fixes: 4d99ff8f ("fuse: Turn writeback cache on")
      Cc: <stable@vger.kernel.org> # v3.15+
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62e42369
    • M
      fuse: flush dirty data/metadata before non-truncate setattr · 72c913fd
      Miklos Szeredi 提交于
      commit b24e7598db62386a95a3c8b9c75630c5d56fe077 upstream.
      
      If writeback cache is enabled, then writes might get reordered with
      chmod/chown/utimes.  The problem with this is that performing the write in
      the fuse daemon might itself change some of these attributes.  In such case
      the following sequence of operations will result in file ending up with the
      wrong mode, for example:
      
        int fd = open ("suid", O_WRONLY|O_CREAT|O_EXCL);
        write (fd, "1", 1);
        fchown (fd, 0, 0);
        fchmod (fd, 04755);
        close (fd);
      
      This patch fixes this by flushing pending writes before performing
      chown/chmod/utimes.
      Reported-by: NGiuseppe Scrivano <gscrivan@redhat.com>
      Tested-by: NGiuseppe Scrivano <gscrivan@redhat.com>
      Fixes: 4d99ff8f ("fuse: Turn writeback cache on")
      Cc: <stable@vger.kernel.org> # v3.15+
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72c913fd
    • C
      NFSv4: Fix leak of clp->cl_acceptor string · da24be88
      Chuck Lever 提交于
      [ Upstream commit 1047ec868332034d1fbcb2fae19fe6d4cb869ff2 ]
      
      Our client can issue multiple SETCLIENTID operations to the same
      server in some circumstances. Ensure that calls to
      nfs4_proc_setclientid() after the first one do not overwrite the
      previously allocated cl_acceptor string.
      
      unreferenced object 0xffff888461031800 (size 32):
        comm "mount.nfs", pid 2227, jiffies 4294822467 (age 1407.749s)
        hex dump (first 32 bytes):
          6e 66 73 40 6b 6c 69 6d 74 2e 69 62 2e 31 30 31  nfs@klimt.ib.101
          35 67 72 61 6e 67 65 72 2e 6e 65 74 00 00 00 00  5granger.net....
        backtrace:
          [<00000000ab820188>] __kmalloc+0x128/0x176
          [<00000000eeaf4ec8>] gss_stringify_acceptor+0xbd/0x1a7 [auth_rpcgss]
          [<00000000e85e3382>] nfs4_proc_setclientid+0x34e/0x46c [nfsv4]
          [<000000003d9cf1fa>] nfs40_discover_server_trunking+0x7a/0xed [nfsv4]
          [<00000000b81c3787>] nfs4_discover_server_trunking+0x81/0x244 [nfsv4]
          [<000000000801b55f>] nfs4_init_client+0x1b0/0x238 [nfsv4]
          [<00000000977daf7f>] nfs4_set_client+0xfe/0x14d [nfsv4]
          [<0000000053a68a2a>] nfs4_create_server+0x107/0x1db [nfsv4]
          [<0000000088262019>] nfs4_remote_mount+0x2c/0x59 [nfsv4]
          [<00000000e84a2fd0>] legacy_get_tree+0x2d/0x4c
          [<00000000797e947c>] vfs_get_tree+0x20/0xc7
          [<00000000ecabaaa8>] fc_mount+0xe/0x36
          [<00000000f15fafc2>] vfs_kern_mount+0x74/0x8d
          [<00000000a3ff4e26>] nfs_do_root_mount+0x8a/0xa3 [nfsv4]
          [<00000000d1c2b337>] nfs4_try_mount+0x58/0xad [nfsv4]
          [<000000004c9bddee>] nfs_fs_mount+0x820/0x869 [nfs]
      
      Fixes: f11b2a1c ("nfs4: copy acceptor name from context ... ")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      da24be88
    • J
      fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc() · c3689876
      Jia-Ju Bai 提交于
      [ Upstream commit 2abb7d3b12d007c30193f48bebed781009bebdd2 ]
      
      In ocfs2_info_scan_inode_alloc(), there is an if statement on line 283
      to check whether inode_alloc is NULL:
      
          if (inode_alloc)
      
      When inode_alloc is NULL, it is used on line 287:
      
          ocfs2_inode_lock(inode_alloc, &bh, 0);
              ocfs2_inode_lock_full_nested(inode, ...)
                  struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, inode_alloc is checked on line 286.
      
      This bug is found by a static analysis tool STCheck written by us.
      
      Link: http://lkml.kernel.org/r/20190726033717.32359-1-baijiaju1990@gmail.comSigned-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c3689876
    • J
      fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock() · 4de544b4
      Jia-Ju Bai 提交于
      [ Upstream commit 583fee3e12df0e6f1f66f063b989d8e7fed0e65a ]
      
      In ocfs2_write_end_nolock(), there are an if statement on lines 1976,
      2047 and 2058, to check whether handle is NULL:
      
          if (handle)
      
      When handle is NULL, it is used on line 2045:
      
      	ocfs2_update_inode_fsync_trans(handle, inode, 1);
              oi->i_sync_tid = handle->h_transaction->t_tid;
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, handle is checked before calling
      ocfs2_update_inode_fsync_trans().
      
      This bug is found by a static analysis tool STCheck written by us.
      
      Link: http://lkml.kernel.org/r/20190726033705.32307-1-baijiaju1990@gmail.comSigned-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4de544b4
    • J
      fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry() · c18d8604
      Jia-Ju Bai 提交于
      [ Upstream commit 56e94ea132bb5c2c1d0b60a6aeb34dcb7d71a53d ]
      
      In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to
      check whether loc->xl_entry is NULL:
      
          if (loc->xl_entry)
      
      When loc->xl_entry is NULL, it is used on line 2158:
      
          ocfs2_xa_add_entry(loc, name_hash);
              loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
              loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);
      
      and line 2164:
      
          ocfs2_xa_add_namevalue(loc, xi);
              loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len);
              loc->xl_entry->xe_name_len = xi->xi_name_len;
      
      Thus, possible null-pointer dereferences may occur.
      
      To fix these bugs, if loc-xl_entry is NULL, ocfs2_xa_prepare_entry()
      abnormally returns with -EINVAL.
      
      These bugs are found by a static analysis tool STCheck written by us.
      
      [akpm@linux-foundation.org: remove now-unused ocfs2_xa_add_entry()]
      Link: http://lkml.kernel.org/r/20190726101447.9153-1-baijiaju1990@gmail.comSigned-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c18d8604
    • J
      ocfs2: clear zero in unaligned direct IO · 2141f777
      Jia Guo 提交于
      [ Upstream commit 7a243c82ea527cd1da47381ad9cd646844f3b693 ]
      
      Unused portion of a part-written fs-block-sized block is not set to zero
      in unaligned append direct write.This can lead to serious data
      inconsistencies.
      
      Ocfs2 manage disk with cluster size(for example, 1M), part-written in
      one cluster will change the cluster state from UN-WRITTEN to WRITTEN,
      VFS(function dio_zero_block) doesn't do the cleaning because bh's state
      is not set to NEW in function ocfs2_dio_wr_get_block when we write a
      WRITTEN cluster.  For example, the cluster size is 1M, file size is 8k
      and we direct write from 14k to 15k, then 12k~14k and 15k~16k will
      contain dirty data.
      
      We have to deal with two cases:
       1.The starting position of direct write is outside the file.
       2.The starting position of direct write is located in the file.
      
      We need set bh's state to NEW in the first case.  In the second case, we
      need mapped twice because bh's state of area out file should be set to
      NEW while area in file not.
      
      [akpm@linux-foundation.org: coding style fixes]
      Link: http://lkml.kernel.org/r/5292e287-8f1a-fd4a-1a14-661e555e0bed@huawei.comSigned-off-by: NJia Guo <guojia12@huawei.com>
      Reviewed-by: NYiwen Jiang <jiangyiwen@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2141f777
    • A
      fs: cifs: mute -Wunused-const-variable message · 497fd98a
      Austin Kim 提交于
      [ Upstream commit dd19c106a36690b47bb1acc68372f2b472b495b8 ]
      
      After 'Initial git repository build' commit,
      'mapping_table_ERRHRD' variable has not been used.
      
      So 'mapping_table_ERRHRD' const variable could be removed
      to mute below warning message:
      
         fs/cifs/netmisc.c:120:40: warning: unused variable 'mapping_table_ERRHRD' [-Wunused-const-variable]
         static const struct smb_to_posix_error mapping_table_ERRHRD[] = {
                                                 ^
      Signed-off-by: NAustin Kim <austindh.kim@gmail.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      497fd98a
    • Z
      nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request · ca2cc4b4
      ZhangXiaoxu 提交于
      [ Upstream commit 33ea5aaa87cdae0f9af4d6b7ee4f650a1a36fd1d ]
      
      When xfstests testing, there are some WARNING as below:
      
      WARNING: CPU: 0 PID: 6235 at fs/nfs/inode.c:122 nfs_clear_inode+0x9c/0xd8
      Modules linked in:
      CPU: 0 PID: 6235 Comm: umount.nfs
      Hardware name: linux,dummy-virt (DT)
      pstate: 60000005 (nZCv daif -PAN -UAO)
      pc : nfs_clear_inode+0x9c/0xd8
      lr : nfs_evict_inode+0x60/0x78
      sp : fffffc000f68fc00
      x29: fffffc000f68fc00 x28: fffffe00c53155c0
      x27: fffffe00c5315000 x26: fffffc0009a63748
      x25: fffffc000f68fd18 x24: fffffc000bfaaf40
      x23: fffffc000936d3c0 x22: fffffe00c4ff5e20
      x21: fffffc000bfaaf40 x20: fffffe00c4ff5d10
      x19: fffffc000c056000 x18: 000000000000003c
      x17: 0000000000000000 x16: 0000000000000000
      x15: 0000000000000040 x14: 0000000000000228
      x13: fffffc000c3a2000 x12: 0000000000000045
      x11: 0000000000000000 x10: 0000000000000000
      x9 : 0000000000000000 x8 : 0000000000000000
      x7 : 0000000000000000 x6 : fffffc00084b027c
      x5 : fffffc0009a64000 x4 : fffffe00c0e77400
      x3 : fffffc000c0563a8 x2 : fffffffffffffffb
      x1 : 000000000000764e x0 : 0000000000000001
      Call trace:
       nfs_clear_inode+0x9c/0xd8
       nfs_evict_inode+0x60/0x78
       evict+0x108/0x380
       dispose_list+0x70/0xa0
       evict_inodes+0x194/0x210
       generic_shutdown_super+0xb0/0x220
       nfs_kill_super+0x40/0x88
       deactivate_locked_super+0xb4/0x120
       deactivate_super+0x144/0x160
       cleanup_mnt+0x98/0x148
       __cleanup_mnt+0x38/0x50
       task_work_run+0x114/0x160
       do_notify_resume+0x2f8/0x308
       work_pending+0x8/0x14
      
      The nrequest should be increased/decreased only if PG_INODE_REF flag
      was setted.
      
      But in the nfs_inode_remove_request function, it maybe decrease when
      no PG_INODE_REF flag, this maybe lead nrequests count error.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ca2cc4b4
    • K
      exec: load_script: Do not exec truncated interpreter path · 646e5c77
      Kees Cook 提交于
      [ Upstream commit b5372fe5dc84235dbe04998efdede3c4daa866a9 ]
      
      Commit 8099b047ecc4 ("exec: load_script: don't blindly truncate
      shebang string") was trying to protect against a confused exec of a
      truncated interpreter path. However, it was overeager and also refused
      to truncate arguments as well, which broke userspace, and it was
      reverted. This attempts the protection again, but allows arguments to
      remain truncated. In an effort to improve readability, helper functions
      and comments have been added.
      Co-developed-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Samuel Dionne-Riel <samuel@dionne-riel.com>
      Cc: Richard Weinberger <richard.weinberger@gmail.com>
      Cc: Graham Christensen <graham@grahamc.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      646e5c77
    • T
      ext4: disallow files with EXT4_JOURNAL_DATA_FL from EXT4_IOC_SWAP_BOOT · f251c83d
      Theodore Ts'o 提交于
      [ Upstream commit 6e589291f4b1b700ca12baec5930592a0d51e63c ]
      
      A malicious/clueless root user can use EXT4_IOC_SWAP_BOOT to force a
      corner casew which can lead to the file system getting corrupted.
      There's no usefulness to allowing this, so just prohibit this case.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f251c83d
    • R
      cifs: add credits from unmatched responses/messages · b73132b7
      Ronnie Sahlberg 提交于
      [ Upstream commit eca004523811f816bcfca3046ab54e1278e0973b ]
      
      We should add any credits granted to us from unmatched server responses.
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b73132b7
    • P
      CIFS: Respect SMB2 hdr preamble size in read responses · ee4d28a7
      Pavel Shilovsky 提交于
      [ Upstream commit bb1bccb60c2ebd9a6f895507d1d48d5ed773814e ]
      
      There are a couple places where we still account for 4 bytes
      in the beginning of SMB2 packet which is not true in the current
      code. Fix this to use a header preamble size where possible.
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ee4d28a7
    • T
      NFSv4: Ensure that the state manager exits the loop on SIGKILL · ce05beb3
      Trond Myklebust 提交于
      [ Upstream commit a1aa09be21fa344d1f5585aab8164bfae55f57e3 ]
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ce05beb3
    • F
      Btrfs: fix deadlock on tree root leaf when finding free extent · cb38a17c
      Filipe Manana 提交于
      [ Upstream commit 4222ea7100c0e37adace2790c8822758bbeee179 ]
      
      When we are writing out a free space cache, during the transaction commit
      phase, we can end up in a deadlock which results in a stack trace like the
      following:
      
       schedule+0x28/0x80
       btrfs_tree_read_lock+0x8e/0x120 [btrfs]
       ? finish_wait+0x80/0x80
       btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
       btrfs_search_slot+0xf6/0x9f0 [btrfs]
       ? evict_refill_and_join+0xd0/0xd0 [btrfs]
       ? inode_insert5+0x119/0x190
       btrfs_lookup_inode+0x3a/0xc0 [btrfs]
       ? kmem_cache_alloc+0x166/0x1d0
       btrfs_iget+0x113/0x690 [btrfs]
       __lookup_free_space_inode+0xd8/0x150 [btrfs]
       lookup_free_space_inode+0x5b/0xb0 [btrfs]
       load_free_space_cache+0x7c/0x170 [btrfs]
       ? cache_block_group+0x72/0x3b0 [btrfs]
       cache_block_group+0x1b3/0x3b0 [btrfs]
       ? finish_wait+0x80/0x80
       find_free_extent+0x799/0x1010 [btrfs]
       btrfs_reserve_extent+0x9b/0x180 [btrfs]
       btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs]
       __btrfs_cow_block+0x11d/0x500 [btrfs]
       btrfs_cow_block+0xdc/0x180 [btrfs]
       btrfs_search_slot+0x3bd/0x9f0 [btrfs]
       btrfs_lookup_inode+0x3a/0xc0 [btrfs]
       ? kmem_cache_alloc+0x166/0x1d0
       btrfs_update_inode_item+0x46/0x100 [btrfs]
       cache_save_setup+0xe4/0x3a0 [btrfs]
       btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs]
       btrfs_commit_transaction+0xcb/0x8b0 [btrfs]
      
      At cache_save_setup() we need to update the inode item of a block group's
      cache which is located in the tree root (fs_info->tree_root), which means
      that it may result in COWing a leaf from that tree. If that happens we
      need to find a free metadata extent and while looking for one, if we find
      a block group which was not cached yet we attempt to load its cache by
      calling cache_block_group(). However this function will try to load the
      inode of the free space cache, which requires finding the matching inode
      item in the tree root - if that inode item is located in the same leaf as
      the inode item of the space cache we are updating at cache_save_setup(),
      we end up in a deadlock, since we try to obtain a read lock on the same
      extent buffer that we previously write locked.
      
      So fix this by using the tree root's commit root when searching for a
      block group's free space cache inode item when we are attempting to load
      a free space cache. This is safe since block groups once loaded stay in
      memory forever, as well as their caches, so after they are first loaded
      we will never need to read their inode items again. For new block groups,
      once they are created they get their ->cached field set to
      BTRFS_CACHE_FINISHED meaning we will not need to read their inode item.
      Reported-by: NAndrew Nelson <andrew.s.nelson@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CAPTELenq9x5KOWuQ+fa7h1r3nsJG8vyiTH8+ifjURc_duHh2Wg@mail.gmail.com/
      Fixes: 9d66e233 ("Btrfs: load free space cache if it exists")
      Tested-by: NAndrew Nelson <andrew.s.nelson@gmail.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      cb38a17c
    • C
      f2fs: fix to recover inode->i_flags of inode block during POR · ce435543
      Chao Yu 提交于
      [ Upstream commit 0c093b590efb5c1ccdc835868dc2ae94bd2e14dc ]
      
      Testcase to reproduce this bug:
      1. mkfs.f2fs /dev/sdd
      2. mount -t f2fs /dev/sdd /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. sync
      5. chattr +a /mnt/f2fs/file
      6. xfs_io -a /mnt/f2fs/file -c "fsync"
      7. godown /mnt/f2fs
      8. umount /mnt/f2fs
      9. mount -t f2fs /dev/sdd /mnt/f2fs
      10. xfs_io /mnt/f2fs/file
      
      There is no error when opening this file w/o O_APPEND, but actually,
      we expect the correct result should be:
      
      /mnt/f2fs/file: Operation not permitted
      
      The root cause is, in recover_inode(), we recover inode->i_flags more
      than F2FS_I(inode)->i_flags, so fix it.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ce435543
    • C
      f2fs: fix to recover inode's i_gc_failures during POR · b619de07
      Chao Yu 提交于
      [ Upstream commit 7de36cf3e4087207f42a88992f8cb615a1bd902e ]
      
      inode.i_gc_failures is used to indicate that skip count of migrating
      on blocks of inode, we should guarantee it can be recovered in sudden
      power-off case.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b619de07
    • J
      f2fs: flush quota blocks after turnning it off · 6b2fbfac
      Jaegeuk Kim 提交于
      [ Upstream commit 0e0667b625cf64243df83171bff61f9d350b9ca5 ]
      
      After quota_off, we'll get some dirty blocks. If put_super don't have a chance
      to flush them by checkpoint, it causes NULL pointer exception in end_io after
      iput(node_inode). (e.g., by checkpoint=disable)
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6b2fbfac
    • Q
      btrfs: tracepoints: Fix wrong parameter order for qgroup events · d8ab4185
      Qu Wenruo 提交于
      [ Upstream commit fd2b007eaec898564e269d1f478a2da0380ecf51 ]
      
      [BUG]
      For btrfs:qgroup_meta_reserve event, the trace event can output garbage:
      
        qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2
      
      The diff should always be alinged to sector size (4k), so there is
      definitely something wrong.
      
      [CAUSE]
      For the wrong @diff, it's caused by wrong parameter order.
      The correct parameters are:
      
        struct btrfs_root, s64 diff, int type.
      
      However the parameters used are:
      
        struct btrfs_root, int type, s64 diff.
      
      Fixes: 4ee0d883 ("btrfs: qgroup: Update trace events for metadata reservation")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d8ab4185
    • Q
      btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents() · 6bcbe350
      Qu Wenruo 提交于
      [ Upstream commit 8702ba9396bf7bbae2ab93c94acd4bd37cfa4f09 ]
      
      [Background]
      Btrfs qgroup uses two types of reserved space for METADATA space,
      PERTRANS and PREALLOC.
      
      PERTRANS is metadata space reserved for each transaction started by
      btrfs_start_transaction().
      While PREALLOC is for delalloc, where we reserve space before joining a
      transaction, and finally it will be converted to PERTRANS after the
      writeback is done.
      
      [Inconsistency]
      However there is inconsistency in how we handle PREALLOC metadata space.
      
      The most obvious one is:
      In btrfs_buffered_write():
      	btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true);
      
      We always free qgroup PREALLOC meta space.
      
      While in btrfs_truncate_block():
      	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
      
      We only free qgroup PREALLOC meta space when something went wrong.
      
      [The Correct Behavior]
      The correct behavior should be the one in btrfs_buffered_write(), we
      should always free PREALLOC metadata space.
      
      The reason is, the btrfs_delalloc_* mechanism works by:
      - Reserve metadata first, even it's not necessary
        In btrfs_delalloc_reserve_metadata()
      
      - Free the unused metadata space
        Normally in:
        btrfs_delalloc_release_extents()
        |- btrfs_inode_rsv_release()
           Here we do calculation on whether we should release or not.
      
      E.g. for 64K buffered write, the metadata rsv works like:
      
      /* The first page */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=0
      total:		num_bytes=calc_inode_reservations()
      /* The first page caused one outstanding extent, thus needs metadata
         rsv */
      
      /* The 2nd page */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=calc_inode_reservations()
      total:		not changed
      /* The 2nd page doesn't cause new outstanding extent, needs no new meta
         rsv, so we free what we have reserved */
      
      /* The 3rd~16th pages */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=calc_inode_reservations()
      total:		not changed (still space for one outstanding extent)
      
      This means, if btrfs_delalloc_release_extents() determines to free some
      space, then those space should be freed NOW.
      So for qgroup, we should call btrfs_qgroup_free_meta_prealloc() other
      than btrfs_qgroup_convert_reserved_meta().
      
      The good news is:
      - The callers are not that hot
        The hottest caller is in btrfs_buffered_write(), which is already
        fixed by commit 336a8bb8 ("btrfs: Fix wrong
        btrfs_delalloc_release_extents parameter"). Thus it's not that
        easy to cause false EDQUOT.
      
      - The trans commit in advance for qgroup would hide the bug
        Since commit f5fef4593653 ("btrfs: qgroup: Make qgroup async transaction
        commit more aggressive"), when btrfs qgroup metadata free space is slow,
        it will try to commit transaction and free the wrongly converted
        PERTRANS space, so it's not that easy to hit such bug.
      
      [FIX]
      So to fix the problem, remove the @qgroup_free parameter for
      btrfs_delalloc_release_extents(), and always pass true to
      btrfs_inode_rsv_release().
      Reported-by: NFilipe Manana <fdmanana@suse.com>
      Fixes: 43b18595 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6bcbe350
    • F
      Btrfs: fix memory leak due to concurrent append writes with fiemap · 96b9b946
      Filipe Manana 提交于
      [ Upstream commit c67d970f0ea8dcc423e112137d34334fa0abb8ec ]
      
      When we have a buffered write that starts at an offset greater than or
      equals to the file's size happening concurrently with a full ranged
      fiemap, we can end up leaking an extent state structure.
      
      Suppose we have a file with a size of 1Mb, and before the buffered write
      and fiemap are performed, it has a single extent state in its io tree
      representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set.
      
      The following sequence diagram shows how the memory leak happens if a
      fiemap a buffered write, starting at offset 1Mb and with a length of
      4Kb, are performed concurrently.
      
                CPU 1                                                  CPU 2
      
        extent_fiemap()
          --> it's a full ranged fiemap
              range from 0 to LLONG_MAX - 1
              (9223372036854775807)
      
          --> locks range in the inode's
              io tree
            --> after this we have 2 extent
                states in the io tree:
                --> 1 for range [0, 1Mb[ with
                    the bits EXTENT_LOCKED and
                    EXTENT_DELALLOC_BITS set
                --> 1 for the range
                    [1Mb, LLONG_MAX[ with
                    the EXTENT_LOCKED bit set
      
                                                        --> start buffered write at offset
                                                            1Mb with a length of 4Kb
      
                                                        btrfs_file_write_iter()
      
                                                          btrfs_buffered_write()
                                                            --> cached_state is NULL
      
                                                            lock_and_cleanup_extent_if_need()
                                                              --> returns 0 and does not lock
                                                                  range because it starts
                                                                  at current i_size / eof
      
                                                            --> cached_state remains NULL
      
                                                            btrfs_dirty_pages()
                                                              btrfs_set_extent_delalloc()
                                                                (...)
                                                                __set_extent_bit()
      
                                                                  --> splits extent state for range
                                                                      [1Mb, LLONG_MAX[ and now we
                                                                      have 2 extent states:
      
                                                                      --> one for the range
                                                                          [1Mb, 1Mb + 4Kb[ with
                                                                          EXTENT_LOCKED set
                                                                      --> another one for the range
                                                                          [1Mb + 4Kb, LLONG_MAX[ with
                                                                          EXTENT_LOCKED set as well
      
                                                                  --> sets EXTENT_DELALLOC on the
                                                                      extent state for the range
                                                                      [1Mb, 1Mb + 4Kb[
                                                                  --> caches extent state
                                                                      [1Mb, 1Mb + 4Kb[ into
                                                                      @cached_state because it has
                                                                      the bit EXTENT_LOCKED set
      
                                                          --> btrfs_buffered_write() ends up
                                                              with a non-NULL cached_state and
                                                              never calls anything to release its
                                                              reference on it, resulting in a
                                                              memory leak
      
      Fix this by calling free_extent_state() on cached_state if the range was
      not locked by lock_and_cleanup_extent_if_need().
      
      The same issue can happen if anything else other than fiemap locks a range
      that covers eof and beyond.
      
      This could be triggered, sporadically, by test case generic/561 from the
      fstests suite, which makes duperemove run concurrently with fsstress, and
      duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set
      the leak is reported in dmesg/syslog when removing the btrfs module with
      a message like the following:
      
        [77100.039461] BTRFS: state leak: start 6574080 end 6582271 state 16402 in tree 0 refs 1
      
      Otherwise (CONFIG_BTRFS_DEBUG not set) detectable with kmemleak.
      
      CC: stable@vger.kernel.org # 4.16+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      96b9b946
    • F
      Btrfs: fix inode cache block reserve leak on failure to allocate data space · 692aa7d5
      Filipe Manana 提交于
      [ Upstream commit 29d47d00e0ae61668ee0c5d90bef2893c8abbafa ]
      
      If we failed to allocate the data extent(s) for the inode space cache, we
      were bailing out without releasing the previously reserved metadata. This
      was triggering the following warnings when unmounting a filesystem:
      
        $ cat -n fs/btrfs/inode.c
        (...)
        9268  void btrfs_destroy_inode(struct inode *inode)
        9269  {
        (...)
        9276          WARN_ON(BTRFS_I(inode)->block_rsv.reserved);
        9277          WARN_ON(BTRFS_I(inode)->block_rsv.size);
        (...)
        9281          WARN_ON(BTRFS_I(inode)->csum_bytes);
        9282          WARN_ON(BTRFS_I(inode)->defrag_bytes);
        (...)
      
      Several fstests test cases triggered this often, such as generic/083,
      generic/102, generic/172, generic/269 and generic/300 at least, producing
      stack traces like the following in dmesg/syslog:
      
        [82039.079546] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9276 btrfs_destroy_inode+0x203/0x270 [btrfs]
        (...)
        [82039.081543] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.081912] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.082673] RIP: 0010:btrfs_destroy_inode+0x203/0x270 [btrfs]
        (...)
        [82039.083913] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010206
        [82039.084320] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
        [82039.084736] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8dde29b34660
        [82039.085156] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.085578] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
        [82039.086000] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
        [82039.086416] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
        [82039.086837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.087253] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
        [82039.087672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.088089] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.088504] Call Trace:
        [82039.088918]  destroy_inode+0x3b/0x70
        [82039.089340]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
        [82039.089768]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
        [82039.090183]  ? wait_for_completion+0x65/0x1a0
        [82039.090607]  close_ctree+0x172/0x370 [btrfs]
        [82039.091021]  generic_shutdown_super+0x6c/0x110
        [82039.091427]  kill_anon_super+0xe/0x30
        [82039.091832]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.092233]  deactivate_locked_super+0x3a/0x70
        [82039.092636]  cleanup_mnt+0x3b/0x80
        [82039.093039]  task_work_run+0x93/0xc0
        [82039.093457]  exit_to_usermode_loop+0xfa/0x100
        [82039.093856]  do_syscall_64+0x162/0x1d0
        [82039.094244]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.094634] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.095876] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.096290] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.096700] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.097110] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.097522] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.097937] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.098350] irq event stamp: 0
        [82039.098750] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.099150] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.099545] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.099925] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.100292] ---[ end trace f2521afa616ddccc ]---
        [82039.100707] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9277 btrfs_destroy_inode+0x1ac/0x270 [btrfs]
        (...)
        [82039.103050] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.103428] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.104203] RIP: 0010:btrfs_destroy_inode+0x1ac/0x270 [btrfs]
        (...)
        [82039.105461] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010206
        [82039.105866] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
        [82039.106270] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8dde29b34660
        [82039.106673] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.107078] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
        [82039.107487] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
        [82039.107894] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
        [82039.108309] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.108723] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
        [82039.109146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.109567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.109989] Call Trace:
        [82039.110405]  destroy_inode+0x3b/0x70
        [82039.110830]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
        [82039.111257]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
        [82039.111675]  ? wait_for_completion+0x65/0x1a0
        [82039.112101]  close_ctree+0x172/0x370 [btrfs]
        [82039.112519]  generic_shutdown_super+0x6c/0x110
        [82039.112988]  kill_anon_super+0xe/0x30
        [82039.113439]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.113861]  deactivate_locked_super+0x3a/0x70
        [82039.114278]  cleanup_mnt+0x3b/0x80
        [82039.114685]  task_work_run+0x93/0xc0
        [82039.115083]  exit_to_usermode_loop+0xfa/0x100
        [82039.115476]  do_syscall_64+0x162/0x1d0
        [82039.115863]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.116254] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.117463] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.117882] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.118330] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.118743] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.119159] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.119574] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.119987] irq event stamp: 0
        [82039.120387] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.120787] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.121182] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.121563] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.121933] ---[ end trace f2521afa616ddccd ]---
        [82039.122353] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9278 btrfs_destroy_inode+0x1bc/0x270 [btrfs]
        (...)
        [82039.124606] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.125008] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.125801] RIP: 0010:btrfs_destroy_inode+0x1bc/0x270 [btrfs]
        (...)
        [82039.126998] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010202
        [82039.127399] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
        [82039.127803] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8dde29b34660
        [82039.128206] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.128611] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
        [82039.129020] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
        [82039.129428] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
        [82039.129846] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.130261] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
        [82039.130684] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.131142] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.131561] Call Trace:
        [82039.131990]  destroy_inode+0x3b/0x70
        [82039.132417]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
        [82039.132844]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
        [82039.133262]  ? wait_for_completion+0x65/0x1a0
        [82039.133688]  close_ctree+0x172/0x370 [btrfs]
        [82039.134157]  generic_shutdown_super+0x6c/0x110
        [82039.134575]  kill_anon_super+0xe/0x30
        [82039.134997]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.135415]  deactivate_locked_super+0x3a/0x70
        [82039.135832]  cleanup_mnt+0x3b/0x80
        [82039.136239]  task_work_run+0x93/0xc0
        [82039.136637]  exit_to_usermode_loop+0xfa/0x100
        [82039.137029]  do_syscall_64+0x162/0x1d0
        [82039.137418]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.137812] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.139059] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.139475] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.139890] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.140302] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.140719] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.141138] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.141597] irq event stamp: 0
        [82039.142043] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.142443] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.142839] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.143220] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.143588] ---[ end trace f2521afa616ddcce ]---
        [82039.167472] WARNING: CPU: 3 PID: 13167 at fs/btrfs/extent-tree.c:10120 btrfs_free_block_groups+0x30d/0x460 [btrfs]
        (...)
        [82039.173800] CPU: 3 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.174847] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.177031] RIP: 0010:btrfs_free_block_groups+0x30d/0x460 [btrfs]
        (...)
        [82039.180397] RSP: 0018:ffffac0b426a7dd8 EFLAGS: 00010206
        [82039.181574] RAX: ffff8de010a1db40 RBX: ffff8de010a1db40 RCX: 0000000000170014
        [82039.182711] RDX: ffff8ddff4380040 RSI: ffff8de010a1da58 RDI: 0000000000000246
        [82039.183817] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.184925] R10: ffff8de036404380 R11: ffffffffb8a5ea00 R12: ffff8de010a1b2b8
        [82039.186090] R13: ffff8de010a1b2b8 R14: 0000000000000000 R15: dead000000000100
        [82039.187208] FS:  00007f8db96d12c0(0000) GS:ffff8de036b80000(0000) knlGS:0000000000000000
        [82039.188345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.189481] CR2: 00007fb044005170 CR3: 00000002315cc006 CR4: 00000000003606e0
        [82039.190674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.191829] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.192978] Call Trace:
        [82039.194160]  close_ctree+0x19a/0x370 [btrfs]
        [82039.195315]  generic_shutdown_super+0x6c/0x110
        [82039.196486]  kill_anon_super+0xe/0x30
        [82039.197645]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.198696]  deactivate_locked_super+0x3a/0x70
        [82039.199619]  cleanup_mnt+0x3b/0x80
        [82039.200559]  task_work_run+0x93/0xc0
        [82039.201505]  exit_to_usermode_loop+0xfa/0x100
        [82039.202436]  do_syscall_64+0x162/0x1d0
        [82039.203339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.204091] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.206360] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.207132] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.207906] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.208621] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.209285] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.209984] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.210642] irq event stamp: 0
        [82039.211306] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.211971] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.212643] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.213304] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.213875] ---[ end trace f2521afa616ddccf ]---
      
      Fix this by releasing the reserved metadata on failure to allocate data
      extent(s) for the inode cache.
      
      Fixes: 69fe2d75 ("btrfs: make the delalloc block rsv per inode")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      692aa7d5
  2. 29 10月, 2019 7 次提交
    • F
      Btrfs: check for the full sync flag while holding the inode lock during fsync · 1b921b5b
      Filipe Manana 提交于
      commit ba0b084ac309283db6e329785c1dc4f45fdbd379 upstream.
      
      We were checking for the full fsync flag in the inode before locking the
      inode, which is racy, since at that that time it might not be set but
      after we acquire the inode lock some other task set it. One case where
      this can happen is on a system low on memory and some concurrent task
      failed to allocate an extent map and therefore set the full sync flag on
      the inode, to force the next fsync to work in full mode.
      
      A consequence of missing the full fsync flag set is hitting the problems
      fixed by commit 0c713cbab620 ("Btrfs: fix race between ranged fsync and
      writeback of adjacent ranges"), BUG_ON() when dropping extents from a log
      tree, hitting assertion failures at tree-log.c:copy_items() or all sorts
      of weird inconsistencies after replaying a log due to file extents items
      representing ranges that overlap.
      
      So just move the check such that it's done after locking the inode and
      before starting writeback again.
      
      Fixes: 0c713cbab620 ("Btrfs: fix race between ranged fsync and writeback of adjacent ranges")
      CC: stable@vger.kernel.org # 5.2+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b921b5b
    • F
      Btrfs: add missing extents release on file extent cluster relocation error · ac6bae2b
      Filipe Manana 提交于
      commit 44db1216efe37bf670f8d1019cdc41658d84baf5 upstream.
      
      If we error out when finding a page at relocate_file_extent_cluster(), we
      need to release the outstanding extents counter on the relocation inode,
      set by the previous call to btrfs_delalloc_reserve_metadata(), otherwise
      the inode's block reserve size can never decrease to zero and metadata
      space is leaked. Therefore add a call to btrfs_delalloc_release_extents()
      in case we can't find the target page.
      
      Fixes: 8b62f87b ("Btrfs: rework outstanding_extents")
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac6bae2b
    • Q
      btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() · 6cd5be98
      Qu Wenruo 提交于
      commit 4b654acdae850f48b8250b9a578a4eaa518c7a6f upstream.
      
      In btrfs_read_block_groups(), if we have an invalid block group which
      has mixed type (DATA|METADATA) while the fs doesn't have MIXED_GROUPS
      feature, we error out without freeing the block group cache.
      
      This patch will add the missing btrfs_put_block_group() to prevent
      memory leak.
      
      Note for stable backports: the file to patch in versions <= 5.3 is
      fs/btrfs/extent-tree.c
      
      Fixes: 49303381 ("Btrfs: bail out if block group has different mixed flag")
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cd5be98
    • P
      CIFS: Fix use after free of file info structures · 01332b03
      Pavel Shilovsky 提交于
      commit 1a67c415965752879e2e9fad407bc44fc7f25f23 upstream.
      
      Currently the code assumes that if a file info entry belongs
      to lists of open file handles of an inode and a tcon then
      it has non-zero reference. The recent changes broke that
      assumption when putting the last reference of the file info.
      There may be a situation when a file is being deleted but
      nothing prevents another thread to reference it again
      and start using it. This happens because we do not hold
      the inode list lock while checking the number of references
      of the file info structure. Fix this by doing the proper
      locking when doing the check.
      
      Fixes: 487317c99477d ("cifs: add spinlock for the openFileList to cifsInodeInfo")
      Fixes: cb248819d209d ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic")
      Cc: Stable <stable@vger.kernel.org>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01332b03
    • R
      CIFS: avoid using MID 0xFFFF · 71cf8816
      Roberto Bergantinos Corpas 提交于
      commit 03d9a9fe3f3aec508e485dd3dcfa1e99933b4bdb upstream.
      
      According to MS-CIFS specification MID 0xFFFF should not be used by the
      CIFS client, but we actually do. Besides, this has proven to cause races
      leading to oops between SendReceive2/cifs_demultiplex_thread. On SMB1,
      MID is a 2 byte value easy to reach in CurrentMid which may conflict with
      an oplock break notification request coming from server
      Signed-off-by: NRoberto Bergantinos Corpas <rbergant@redhat.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71cf8816
    • D
      fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c · 6ea856ef
      David Hildenbrand 提交于
      commit aad5f69bc161af489dbb5934868bd347282f0764 upstream.
      
      There are three places where we access uninitialized memmaps, namely:
      - /proc/kpagecount
      - /proc/kpageflags
      - /proc/kpagecgroup
      
      We have initialized memmaps either when the section is online or when the
      page was initialized to the ZONE_DEVICE.  Uninitialized memmaps contain
      garbage and in the worst case trigger kernel BUGs, especially with
      CONFIG_PAGE_POISONING.
      
      For example, not onlining a DIMM during boot and calling /proc/kpagecount
      with CONFIG_PAGE_POISONING:
      
        :/# cat /proc/kpagecount > tmp.test
        BUG: unable to handle page fault for address: fffffffffffffffe
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
        Oops: 0000 [#1] SMP NOPTI
        CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
        RIP: 0010:kpagecount_read+0xce/0x1e0
        Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
        RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
        RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
        RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
        R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
        FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
        Call Trace:
         proc_reg_read+0x3c/0x60
         vfs_read+0xc5/0x180
         ksys_read+0x68/0xe0
         do_syscall_64+0x5c/0xa0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      For now, let's drop support for ZONE_DEVICE from the three pseudo files
      in order to fix this.  To distinguish offline memory (with garbage
      memmap) from ZONE_DEVICE memory with properly initialized memmaps, we
      would have to check get_dev_pagemap() and pfn_zone_device_reserved()
      right now.  The usage of both (especially, special casing devmem) is
      frowned upon and needs to be reworked.
      
      The fundamental issue we have is:
      
      	if (pfn_to_online_page(pfn)) {
      		/* memmap initialized */
      	} else if (pfn_valid(pfn)) {
      		/*
      		 * ???
      		 * a) offline memory. memmap garbage.
      		 * b) devmem: memmap initialized to ZONE_DEVICE.
      		 * c) devmem: reserved for driver. memmap garbage.
      		 * (d) devmem: memmap currently initializing - garbage)
      		 */
      	}
      
      We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
      in place as that function is also used from memory failure.  We now no
      longer dump information about pages that are not in use anymore -
      offline.
      
      Link: http://lkml.kernel.org/r/20191009142435.3975-2-david@redhat.com
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e8]
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Reported-by: NQian Cai <cai@lca.pw>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
      Cc: Pankaj gupta <pagupta@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ea856ef
    • Y
      ocfs2: fix panic due to ocfs2_wq is null · 0d3ad773
      Yi Li 提交于
      commit b918c43021baaa3648de09e19a4a3dd555a45f40 upstream.
      
      mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters
      an error.  ocfs2_initialize_super() returns before allocating ocfs2_wq.
      ocfs2_dismount_volume() triggers the following panic.
      
        Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
        Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
        Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
        Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
        Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
        Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
        ------------[ cut here ]------------
        Oops: 0002 [#1] SMP NOPTI
        CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G  E
              4.14.148-200.ckv.x86_64 #1
        Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
        task: ffff967af0520000 task.stack: ffffa5f05484000
        RIP: 0010:mutex_lock+0x19/0x20
        Call Trace:
          flush_workqueue+0x81/0x460
          ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
          ocfs2_dismount_volume+0x84/0x400 [ocfs2]
          ocfs2_fill_super+0xa4/0x1270 [ocfs2]
          ? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
          mount_bdev+0x17f/0x1c0
          mount_fs+0x3a/0x160
      
      Link: http://lkml.kernel.org/r/1571139611-24107-1-git-send-email-yili@winhong.comSigned-off-by: NYi Li <yilikernel@gmail.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d3ad773
  3. 18 10月, 2019 9 次提交
    • A
      Fix the locking in dcache_readdir() and friends · 664ec2db
      Al Viro 提交于
      commit d4f4de5e5ef8efde85febb6876cd3c8ab1631999 upstream.
      
      There are two problems in dcache_readdir() - one is that lockless traversal
      of the list needs non-trivial cooperation of d_alloc() (at least a switch
      to list_add_rcu(), and probably more than just that) and another is that
      it assumes that no removal will happen without the directory locked exclusive.
      Said assumption had always been there, never had been stated explicitly and
      is violated by several places in the kernel (devpts and selinuxfs).
      
              * replacement of next_positive() with different calling conventions:
      it returns struct list_head * instead of struct dentry *; the latter is
      passed in and out by reference, grabbing the result and dropping the original
      value.
              * scan is under ->d_lock.  If we run out of timeslice, cursor is moved
      after the last position we'd reached and we reschedule; then the scan continues
      from that place.  To avoid livelocks between multiple lseek() (with cursors
      getting moved past each other, never reaching the real entries) we always
      skip the cursors, need_resched() or not.
              * returned list_head * is either ->d_child of dentry we'd found or
      ->d_subdirs of parent (if we got to the end of the list).
              * dcache_readdir() and dcache_dir_lseek() switched to new helper.
      dcache_readdir() always holds a reference to dentry passed to dir_emit() now.
      Cursor is moved to just before the entry where dir_emit() has failed or into
      the very end of the list, if we'd run out.
              * move_cursor() eliminated - it had sucky calling conventions and
      after fixing that it became simply list_move() (in lseek and scan_positives)
      or list_move_tail() (in readdir).
      
              All operations with the list are under ->d_lock now, and we do not
      depend upon having all file removals done with parent locked exclusive
      anymore.
      
      Cc: stable@vger.kernel.org
      Reported-by: N"zhengbin (A)" <zhengbin13@huawei.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      664ec2db
    • T
      NFS: Fix O_DIRECT accounting of number of bytes read/written · e9360f39
      Trond Myklebust 提交于
      commit 031d73ed768a40684f3ca21992265ffdb6a270bf upstream.
      
      When a series of O_DIRECT reads or writes are truncated, either due to
      eof or due to an error, then we should return the number of contiguous
      bytes that were received/sent starting at the offset specified by the
      application.
      
      Currently, we are failing to correctly check contiguity, and so we're
      failing the generic/465 in xfstests when the race between the read
      and write RPCs causes the file to get extended while the 2 reads are
      outstanding. If the first read RPC call wins the race and returns with
      eof set, we should treat the second read RPC as being truncated.
      Reported-by: NSu Yanjun <suyj.fnst@cn.fujitsu.com>
      Fixes: 1ccbad9f ("nfs: fix DIO good bytes calculation")
      Cc: stable@vger.kernel.org # 4.1+
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9360f39
    • J
      btrfs: fix uninitialized ret in ref-verify · e0805d7f
      Josef Bacik 提交于
      commit c5f4987e86f6692fdb12533ea1fc7a7bb98e555a upstream.
      
      Coverity caught a case where we could return with a uninitialized value
      in ret in process_leaf.  This is actually pretty likely because we could
      very easily run into a block group item key and have a garbage value in
      ret and think there was an errror.  Fix this by initializing ret to 0.
      Reported-by: NColin Ian King <colin.king@canonical.com>
      Fixes: fd708b81 ("Btrfs: add a extent ref verify tool")
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0805d7f
    • J
      btrfs: fix incorrect updating of log root tree · f7313de4
      Josef Bacik 提交于
      commit 4203e968947071586a98b5314fd7ffdea3b4f971 upstream.
      
      We've historically had reports of being unable to mount file systems
      because the tree log root couldn't be read.  Usually this is the "parent
      transid failure", but could be any of the related errors, including
      "fsid mismatch" or "bad tree block", depending on which block got
      allocated.
      
      The modification of the individual log root items are serialized on the
      per-log root root_mutex.  This means that any modification to the
      per-subvol log root_item is completely protected.
      
      However we update the root item in the log root tree outside of the log
      root tree log_mutex.  We do this in order to allow multiple subvolumes
      to be updated in each log transaction.
      
      This is problematic however because when we are writing the log root
      tree out we update the super block with the _current_ log root node
      information.  Since these two operations happen independently of each
      other, you can end up updating the log root tree in between writing out
      the dirty blocks and setting the super block to point at the current
      root.
      
      This means we'll point at the new root node that hasn't been written
      out, instead of the one we should be pointing at.  Thus whatever garbage
      or old block we end up pointing at complains when we mount the file
      system later and try to replay the log.
      
      Fix this by copying the log's root item into a local root item copy.
      Then once we're safely under the log_root_tree->log_mutex we update the
      root item in the log_root_tree.  This way we do not modify the
      log_root_tree while we're committing it, fixing the problem.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NChris Mason <clm@fb.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7313de4
    • D
      cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic · a8de7090
      Dave Wysochanski 提交于
      commit cb248819d209d113e45fed459773991518e8e80b upstream.
      
      Commit 487317c99477 ("cifs: add spinlock for the openFileList to
      cifsInodeInfo") added cifsInodeInfo->open_file_lock spin_lock to protect
      the openFileList, but missed a few places where cifs_inode->openFileList
      was enumerated.  Change these remaining tcon->open_file_lock to
      cifsInodeInfo->open_file_lock to avoid panic in is_size_safe_to_change.
      
      [17313.245641] RIP: 0010:is_size_safe_to_change+0x57/0xb0 [cifs]
      [17313.245645] Code: 68 40 48 89 ef e8 19 67 b7 f1 48 8b 43 40 48 8d 4b 40 48 8d 50 f0 48 39 c1 75 0f eb 47 48 8b 42 10 48 8d 50 f0 48 39 c1 74 3a <8b> 80 88 00 00 00 83 c0 01 a8 02 74 e6 48 89 ef c6 07 00 0f 1f 40
      [17313.245649] RSP: 0018:ffff94ae1baefa30 EFLAGS: 00010202
      [17313.245654] RAX: dead000000000100 RBX: ffff88dc72243300 RCX: ffff88dc72243340
      [17313.245657] RDX: dead0000000000f0 RSI: 00000000098f7940 RDI: ffff88dd3102f040
      [17313.245659] RBP: ffff88dd3102f040 R08: 0000000000000000 R09: ffff94ae1baefc40
      [17313.245661] R10: ffffcdc8bb1c4e80 R11: ffffcdc8b50adb08 R12: 00000000098f7940
      [17313.245663] R13: ffff88dc72243300 R14: ffff88dbc8f19600 R15: ffff88dc72243428
      [17313.245667] FS:  00007fb145485700(0000) GS:ffff88dd3e000000(0000) knlGS:0000000000000000
      [17313.245670] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [17313.245672] CR2: 0000026bb46c6000 CR3: 0000004edb110003 CR4: 00000000007606e0
      [17313.245753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [17313.245756] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [17313.245759] PKRU: 55555554
      [17313.245761] Call Trace:
      [17313.245803]  cifs_fattr_to_inode+0x16b/0x580 [cifs]
      [17313.245838]  cifs_get_inode_info+0x35c/0xa60 [cifs]
      [17313.245852]  ? kmem_cache_alloc_trace+0x151/0x1d0
      [17313.245885]  cifs_open+0x38f/0x990 [cifs]
      [17313.245921]  ? cifs_revalidate_dentry_attr+0x3e/0x350 [cifs]
      [17313.245953]  ? cifsFileInfo_get+0x30/0x30 [cifs]
      [17313.245960]  ? do_dentry_open+0x132/0x330
      [17313.245963]  do_dentry_open+0x132/0x330
      [17313.245969]  path_openat+0x573/0x14d0
      [17313.245974]  do_filp_open+0x93/0x100
      [17313.245979]  ? __check_object_size+0xa3/0x181
      [17313.245986]  ? audit_alloc_name+0x7e/0xd0
      [17313.245992]  do_sys_open+0x184/0x220
      [17313.245999]  do_syscall_64+0x5b/0x1b0
      
      Fixes: 487317c99477 ("cifs: add spinlock for the openFileList to cifsInodeInfo")
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NDave Wysochanski <dwysocha@redhat.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8de7090
    • P
      CIFS: Force reval dentry if LOOKUP_REVAL flag is set · 230b339a
      Pavel Shilovsky 提交于
      commit 0b3d0ef9840f7be202393ca9116b857f6f793715 upstream.
      
      Mark inode for force revalidation if LOOKUP_REVAL flag is set.
      This tells the client to actually send a QueryInfo request to
      the server to obtain the latest metadata in case a directory
      or a file were changed remotely. Only do that if the client
      doesn't have a lease for the file to avoid unneeded round
      trips to the server.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      230b339a
    • P
      CIFS: Force revalidate inode when dentry is stale · 0bc78de4
      Pavel Shilovsky 提交于
      commit c82e5ac7fe3570a269c0929bf7899f62048e7dbc upstream.
      
      Currently the client indicates that a dentry is stale when inode
      numbers or type types between a local inode and a remote file
      don't match. If this is the case attributes is not being copied
      from remote to local, so, it is already known that the local copy
      has stale metadata. That's why the inode needs to be marked for
      revalidation in order to tell the VFS to lookup the dentry again
      before openning a file. This prevents unexpected stale errors
      to be returned to the user space when openning a file.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bc78de4
    • P
      CIFS: Gracefully handle QueryInfo errors during open · d72c2115
      Pavel Shilovsky 提交于
      commit 30573a82fb179420b8aac30a3a3595aa96a93156 upstream.
      
      Currently if the client identifies problems when processing
      metadata returned in CREATE response, the open handle is being
      leaked. This causes multiple problems like a file missing a lease
      break by that client which causes high latencies to other clients
      accessing the file. Another side-effect of this is that the file
      can't be deleted.
      
      Fix this by closing the file after the client hits an error after
      the file was opened and the open descriptor wasn't returned to
      the user space. Also convert -ESTALE to -EOPENSTALE to allow
      the VFS to revalidate a dentry and retry the open.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d72c2115
    • I
      f2fs: use EINVAL for superblock with invalid magic · 95bcc0d9
      Icenowy Zheng 提交于
      [ Upstream commit 38fb6d0ea34299d97b031ed64fe994158b6f8eb3 ]
      
      The kernel mount_block_root() function expects -EACESS or -EINVAL for a
      unmountable filesystem when trying to mount the root with different
      filesystem types.
      
      However, in 5.3-rc1 the behavior when F2FS code cannot find valid block
      changed to return -EFSCORRUPTED(-EUCLEAN), and this error code makes
      mount_block_root() fail when trying to probe F2FS.
      
      When the magic number of the superblock mismatches, it has a high
      probability that it's just not a F2FS. In this case return -EINVAL seems
      to be a better result, and this return value can make mount_block_root()
      probing work again.
      
      Return -EINVAL when the superblock has magic mismatch, -EFSCORRUPTED in
      other cases (the magic matches but the superblock cannot be recognized).
      
      Fixes: 10f966bbf521 ("f2fs: use generic EFSBADCRC/EFSCORRUPTED")
      Signed-off-by: NIcenowy Zheng <icenowy@aosc.io>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      95bcc0d9
  4. 12 10月, 2019 1 次提交
    • E
      vfs: Fix EOVERFLOW testing in put_compat_statfs64 · effad578
      Eric Sandeen 提交于
      commit cc3a7bfe62b947b423fcb2cfe89fcba92bf48fa3 upstream.
      
      Today, put_compat_statfs64() disallows nearly any field value over
      2^32 if f_bsize is only 32 bits, but that makes no sense.
      compat_statfs64 is there for the explicit purpose of providing 64-bit
      fields for f_files, f_ffree, etc.  And f_bsize is always only 32 bits.
      
      As a result, 32-bit userspace gets -EOVERFLOW for i.e.  large file
      counts even with -D_FILE_OFFSET_BITS=64 set.
      
      In reality, only f_bsize and f_frsize can legitimately overflow
      (fields like f_type and f_namelen should never be large), so test
      only those fields.
      
      This bug was discussed at length some time ago, and this is the proposal
      Al suggested at https://lkml.org/lkml/2018/8/6/640.  It seemed to get
      dropped amid the discussion of other related changes, but this
      part seems obviously correct on its own, so I've picked it up and
      sent it, for expediency.
      
      Fixes: 64d2ab32 ("vfs: fix put_compat_statfs64() does not handle errors")
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      effad578