1. 29 10月, 2019 6 次提交
    • F
      Btrfs: add missing extents release on file extent cluster relocation error · ac6bae2b
      Filipe Manana 提交于
      commit 44db1216efe37bf670f8d1019cdc41658d84baf5 upstream.
      
      If we error out when finding a page at relocate_file_extent_cluster(), we
      need to release the outstanding extents counter on the relocation inode,
      set by the previous call to btrfs_delalloc_reserve_metadata(), otherwise
      the inode's block reserve size can never decrease to zero and metadata
      space is leaked. Therefore add a call to btrfs_delalloc_release_extents()
      in case we can't find the target page.
      
      Fixes: 8b62f87b ("Btrfs: rework outstanding_extents")
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac6bae2b
    • Q
      btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group() · 6cd5be98
      Qu Wenruo 提交于
      commit 4b654acdae850f48b8250b9a578a4eaa518c7a6f upstream.
      
      In btrfs_read_block_groups(), if we have an invalid block group which
      has mixed type (DATA|METADATA) while the fs doesn't have MIXED_GROUPS
      feature, we error out without freeing the block group cache.
      
      This patch will add the missing btrfs_put_block_group() to prevent
      memory leak.
      
      Note for stable backports: the file to patch in versions <= 5.3 is
      fs/btrfs/extent-tree.c
      
      Fixes: 49303381 ("Btrfs: bail out if block group has different mixed flag")
      CC: stable@vger.kernel.org # 4.9+
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cd5be98
    • P
      CIFS: Fix use after free of file info structures · 01332b03
      Pavel Shilovsky 提交于
      commit 1a67c415965752879e2e9fad407bc44fc7f25f23 upstream.
      
      Currently the code assumes that if a file info entry belongs
      to lists of open file handles of an inode and a tcon then
      it has non-zero reference. The recent changes broke that
      assumption when putting the last reference of the file info.
      There may be a situation when a file is being deleted but
      nothing prevents another thread to reference it again
      and start using it. This happens because we do not hold
      the inode list lock while checking the number of references
      of the file info structure. Fix this by doing the proper
      locking when doing the check.
      
      Fixes: 487317c99477d ("cifs: add spinlock for the openFileList to cifsInodeInfo")
      Fixes: cb248819d209d ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic")
      Cc: Stable <stable@vger.kernel.org>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01332b03
    • R
      CIFS: avoid using MID 0xFFFF · 71cf8816
      Roberto Bergantinos Corpas 提交于
      commit 03d9a9fe3f3aec508e485dd3dcfa1e99933b4bdb upstream.
      
      According to MS-CIFS specification MID 0xFFFF should not be used by the
      CIFS client, but we actually do. Besides, this has proven to cause races
      leading to oops between SendReceive2/cifs_demultiplex_thread. On SMB1,
      MID is a 2 byte value easy to reach in CurrentMid which may conflict with
      an oplock break notification request coming from server
      Signed-off-by: NRoberto Bergantinos Corpas <rbergant@redhat.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71cf8816
    • D
      fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c · 6ea856ef
      David Hildenbrand 提交于
      commit aad5f69bc161af489dbb5934868bd347282f0764 upstream.
      
      There are three places where we access uninitialized memmaps, namely:
      - /proc/kpagecount
      - /proc/kpageflags
      - /proc/kpagecgroup
      
      We have initialized memmaps either when the section is online or when the
      page was initialized to the ZONE_DEVICE.  Uninitialized memmaps contain
      garbage and in the worst case trigger kernel BUGs, especially with
      CONFIG_PAGE_POISONING.
      
      For example, not onlining a DIMM during boot and calling /proc/kpagecount
      with CONFIG_PAGE_POISONING:
      
        :/# cat /proc/kpagecount > tmp.test
        BUG: unable to handle page fault for address: fffffffffffffffe
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
        Oops: 0000 [#1] SMP NOPTI
        CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
        RIP: 0010:kpagecount_read+0xce/0x1e0
        Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
        RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
        RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
        RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
        R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
        FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
        Call Trace:
         proc_reg_read+0x3c/0x60
         vfs_read+0xc5/0x180
         ksys_read+0x68/0xe0
         do_syscall_64+0x5c/0xa0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      For now, let's drop support for ZONE_DEVICE from the three pseudo files
      in order to fix this.  To distinguish offline memory (with garbage
      memmap) from ZONE_DEVICE memory with properly initialized memmaps, we
      would have to check get_dev_pagemap() and pfn_zone_device_reserved()
      right now.  The usage of both (especially, special casing devmem) is
      frowned upon and needs to be reworked.
      
      The fundamental issue we have is:
      
      	if (pfn_to_online_page(pfn)) {
      		/* memmap initialized */
      	} else if (pfn_valid(pfn)) {
      		/*
      		 * ???
      		 * a) offline memory. memmap garbage.
      		 * b) devmem: memmap initialized to ZONE_DEVICE.
      		 * c) devmem: reserved for driver. memmap garbage.
      		 * (d) devmem: memmap currently initializing - garbage)
      		 */
      	}
      
      We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
      in place as that function is also used from memory failure.  We now no
      longer dump information about pages that are not in use anymore -
      offline.
      
      Link: http://lkml.kernel.org/r/20191009142435.3975-2-david@redhat.com
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e8]
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Reported-by: NQian Cai <cai@lca.pw>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
      Cc: Pankaj gupta <pagupta@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ea856ef
    • Y
      ocfs2: fix panic due to ocfs2_wq is null · 0d3ad773
      Yi Li 提交于
      commit b918c43021baaa3648de09e19a4a3dd555a45f40 upstream.
      
      mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters
      an error.  ocfs2_initialize_super() returns before allocating ocfs2_wq.
      ocfs2_dismount_volume() triggers the following panic.
      
        Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
        Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
        Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
        Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
        Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
        Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
        ------------[ cut here ]------------
        Oops: 0002 [#1] SMP NOPTI
        CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G  E
              4.14.148-200.ckv.x86_64 #1
        Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
        task: ffff967af0520000 task.stack: ffffa5f05484000
        RIP: 0010:mutex_lock+0x19/0x20
        Call Trace:
          flush_workqueue+0x81/0x460
          ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
          ocfs2_dismount_volume+0x84/0x400 [ocfs2]
          ocfs2_fill_super+0xa4/0x1270 [ocfs2]
          ? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
          mount_bdev+0x17f/0x1c0
          mount_fs+0x3a/0x160
      
      Link: http://lkml.kernel.org/r/1571139611-24107-1-git-send-email-yili@winhong.comSigned-off-by: NYi Li <yilikernel@gmail.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d3ad773
  2. 18 10月, 2019 9 次提交
    • A
      Fix the locking in dcache_readdir() and friends · 664ec2db
      Al Viro 提交于
      commit d4f4de5e5ef8efde85febb6876cd3c8ab1631999 upstream.
      
      There are two problems in dcache_readdir() - one is that lockless traversal
      of the list needs non-trivial cooperation of d_alloc() (at least a switch
      to list_add_rcu(), and probably more than just that) and another is that
      it assumes that no removal will happen without the directory locked exclusive.
      Said assumption had always been there, never had been stated explicitly and
      is violated by several places in the kernel (devpts and selinuxfs).
      
              * replacement of next_positive() with different calling conventions:
      it returns struct list_head * instead of struct dentry *; the latter is
      passed in and out by reference, grabbing the result and dropping the original
      value.
              * scan is under ->d_lock.  If we run out of timeslice, cursor is moved
      after the last position we'd reached and we reschedule; then the scan continues
      from that place.  To avoid livelocks between multiple lseek() (with cursors
      getting moved past each other, never reaching the real entries) we always
      skip the cursors, need_resched() or not.
              * returned list_head * is either ->d_child of dentry we'd found or
      ->d_subdirs of parent (if we got to the end of the list).
              * dcache_readdir() and dcache_dir_lseek() switched to new helper.
      dcache_readdir() always holds a reference to dentry passed to dir_emit() now.
      Cursor is moved to just before the entry where dir_emit() has failed or into
      the very end of the list, if we'd run out.
              * move_cursor() eliminated - it had sucky calling conventions and
      after fixing that it became simply list_move() (in lseek and scan_positives)
      or list_move_tail() (in readdir).
      
              All operations with the list are under ->d_lock now, and we do not
      depend upon having all file removals done with parent locked exclusive
      anymore.
      
      Cc: stable@vger.kernel.org
      Reported-by: N"zhengbin (A)" <zhengbin13@huawei.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      664ec2db
    • T
      NFS: Fix O_DIRECT accounting of number of bytes read/written · e9360f39
      Trond Myklebust 提交于
      commit 031d73ed768a40684f3ca21992265ffdb6a270bf upstream.
      
      When a series of O_DIRECT reads or writes are truncated, either due to
      eof or due to an error, then we should return the number of contiguous
      bytes that were received/sent starting at the offset specified by the
      application.
      
      Currently, we are failing to correctly check contiguity, and so we're
      failing the generic/465 in xfstests when the race between the read
      and write RPCs causes the file to get extended while the 2 reads are
      outstanding. If the first read RPC call wins the race and returns with
      eof set, we should treat the second read RPC as being truncated.
      Reported-by: NSu Yanjun <suyj.fnst@cn.fujitsu.com>
      Fixes: 1ccbad9f ("nfs: fix DIO good bytes calculation")
      Cc: stable@vger.kernel.org # 4.1+
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9360f39
    • J
      btrfs: fix uninitialized ret in ref-verify · e0805d7f
      Josef Bacik 提交于
      commit c5f4987e86f6692fdb12533ea1fc7a7bb98e555a upstream.
      
      Coverity caught a case where we could return with a uninitialized value
      in ret in process_leaf.  This is actually pretty likely because we could
      very easily run into a block group item key and have a garbage value in
      ret and think there was an errror.  Fix this by initializing ret to 0.
      Reported-by: NColin Ian King <colin.king@canonical.com>
      Fixes: fd708b81 ("Btrfs: add a extent ref verify tool")
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0805d7f
    • J
      btrfs: fix incorrect updating of log root tree · f7313de4
      Josef Bacik 提交于
      commit 4203e968947071586a98b5314fd7ffdea3b4f971 upstream.
      
      We've historically had reports of being unable to mount file systems
      because the tree log root couldn't be read.  Usually this is the "parent
      transid failure", but could be any of the related errors, including
      "fsid mismatch" or "bad tree block", depending on which block got
      allocated.
      
      The modification of the individual log root items are serialized on the
      per-log root root_mutex.  This means that any modification to the
      per-subvol log root_item is completely protected.
      
      However we update the root item in the log root tree outside of the log
      root tree log_mutex.  We do this in order to allow multiple subvolumes
      to be updated in each log transaction.
      
      This is problematic however because when we are writing the log root
      tree out we update the super block with the _current_ log root node
      information.  Since these two operations happen independently of each
      other, you can end up updating the log root tree in between writing out
      the dirty blocks and setting the super block to point at the current
      root.
      
      This means we'll point at the new root node that hasn't been written
      out, instead of the one we should be pointing at.  Thus whatever garbage
      or old block we end up pointing at complains when we mount the file
      system later and try to replay the log.
      
      Fix this by copying the log's root item into a local root item copy.
      Then once we're safely under the log_root_tree->log_mutex we update the
      root item in the log_root_tree.  This way we do not modify the
      log_root_tree while we're committing it, fixing the problem.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NChris Mason <clm@fb.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7313de4
    • D
      cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic · a8de7090
      Dave Wysochanski 提交于
      commit cb248819d209d113e45fed459773991518e8e80b upstream.
      
      Commit 487317c99477 ("cifs: add spinlock for the openFileList to
      cifsInodeInfo") added cifsInodeInfo->open_file_lock spin_lock to protect
      the openFileList, but missed a few places where cifs_inode->openFileList
      was enumerated.  Change these remaining tcon->open_file_lock to
      cifsInodeInfo->open_file_lock to avoid panic in is_size_safe_to_change.
      
      [17313.245641] RIP: 0010:is_size_safe_to_change+0x57/0xb0 [cifs]
      [17313.245645] Code: 68 40 48 89 ef e8 19 67 b7 f1 48 8b 43 40 48 8d 4b 40 48 8d 50 f0 48 39 c1 75 0f eb 47 48 8b 42 10 48 8d 50 f0 48 39 c1 74 3a <8b> 80 88 00 00 00 83 c0 01 a8 02 74 e6 48 89 ef c6 07 00 0f 1f 40
      [17313.245649] RSP: 0018:ffff94ae1baefa30 EFLAGS: 00010202
      [17313.245654] RAX: dead000000000100 RBX: ffff88dc72243300 RCX: ffff88dc72243340
      [17313.245657] RDX: dead0000000000f0 RSI: 00000000098f7940 RDI: ffff88dd3102f040
      [17313.245659] RBP: ffff88dd3102f040 R08: 0000000000000000 R09: ffff94ae1baefc40
      [17313.245661] R10: ffffcdc8bb1c4e80 R11: ffffcdc8b50adb08 R12: 00000000098f7940
      [17313.245663] R13: ffff88dc72243300 R14: ffff88dbc8f19600 R15: ffff88dc72243428
      [17313.245667] FS:  00007fb145485700(0000) GS:ffff88dd3e000000(0000) knlGS:0000000000000000
      [17313.245670] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [17313.245672] CR2: 0000026bb46c6000 CR3: 0000004edb110003 CR4: 00000000007606e0
      [17313.245753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [17313.245756] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [17313.245759] PKRU: 55555554
      [17313.245761] Call Trace:
      [17313.245803]  cifs_fattr_to_inode+0x16b/0x580 [cifs]
      [17313.245838]  cifs_get_inode_info+0x35c/0xa60 [cifs]
      [17313.245852]  ? kmem_cache_alloc_trace+0x151/0x1d0
      [17313.245885]  cifs_open+0x38f/0x990 [cifs]
      [17313.245921]  ? cifs_revalidate_dentry_attr+0x3e/0x350 [cifs]
      [17313.245953]  ? cifsFileInfo_get+0x30/0x30 [cifs]
      [17313.245960]  ? do_dentry_open+0x132/0x330
      [17313.245963]  do_dentry_open+0x132/0x330
      [17313.245969]  path_openat+0x573/0x14d0
      [17313.245974]  do_filp_open+0x93/0x100
      [17313.245979]  ? __check_object_size+0xa3/0x181
      [17313.245986]  ? audit_alloc_name+0x7e/0xd0
      [17313.245992]  do_sys_open+0x184/0x220
      [17313.245999]  do_syscall_64+0x5b/0x1b0
      
      Fixes: 487317c99477 ("cifs: add spinlock for the openFileList to cifsInodeInfo")
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NDave Wysochanski <dwysocha@redhat.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8de7090
    • P
      CIFS: Force reval dentry if LOOKUP_REVAL flag is set · 230b339a
      Pavel Shilovsky 提交于
      commit 0b3d0ef9840f7be202393ca9116b857f6f793715 upstream.
      
      Mark inode for force revalidation if LOOKUP_REVAL flag is set.
      This tells the client to actually send a QueryInfo request to
      the server to obtain the latest metadata in case a directory
      or a file were changed remotely. Only do that if the client
      doesn't have a lease for the file to avoid unneeded round
      trips to the server.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      230b339a
    • P
      CIFS: Force revalidate inode when dentry is stale · 0bc78de4
      Pavel Shilovsky 提交于
      commit c82e5ac7fe3570a269c0929bf7899f62048e7dbc upstream.
      
      Currently the client indicates that a dentry is stale when inode
      numbers or type types between a local inode and a remote file
      don't match. If this is the case attributes is not being copied
      from remote to local, so, it is already known that the local copy
      has stale metadata. That's why the inode needs to be marked for
      revalidation in order to tell the VFS to lookup the dentry again
      before openning a file. This prevents unexpected stale errors
      to be returned to the user space when openning a file.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bc78de4
    • P
      CIFS: Gracefully handle QueryInfo errors during open · d72c2115
      Pavel Shilovsky 提交于
      commit 30573a82fb179420b8aac30a3a3595aa96a93156 upstream.
      
      Currently if the client identifies problems when processing
      metadata returned in CREATE response, the open handle is being
      leaked. This causes multiple problems like a file missing a lease
      break by that client which causes high latencies to other clients
      accessing the file. Another side-effect of this is that the file
      can't be deleted.
      
      Fix this by closing the file after the client hits an error after
      the file was opened and the open descriptor wasn't returned to
      the user space. Also convert -ESTALE to -EOPENSTALE to allow
      the VFS to revalidate a dentry and retry the open.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d72c2115
    • I
      f2fs: use EINVAL for superblock with invalid magic · 95bcc0d9
      Icenowy Zheng 提交于
      [ Upstream commit 38fb6d0ea34299d97b031ed64fe994158b6f8eb3 ]
      
      The kernel mount_block_root() function expects -EACESS or -EINVAL for a
      unmountable filesystem when trying to mount the root with different
      filesystem types.
      
      However, in 5.3-rc1 the behavior when F2FS code cannot find valid block
      changed to return -EFSCORRUPTED(-EUCLEAN), and this error code makes
      mount_block_root() fail when trying to probe F2FS.
      
      When the magic number of the superblock mismatches, it has a high
      probability that it's just not a F2FS. In this case return -EINVAL seems
      to be a better result, and this return value can make mount_block_root()
      probing work again.
      
      Return -EINVAL when the superblock has magic mismatch, -EFSCORRUPTED in
      other cases (the magic matches but the superblock cannot be recognized).
      
      Fixes: 10f966bbf521 ("f2fs: use generic EFSBADCRC/EFSCORRUPTED")
      Signed-off-by: NIcenowy Zheng <icenowy@aosc.io>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      95bcc0d9
  3. 12 10月, 2019 7 次提交
  4. 08 10月, 2019 5 次提交
  5. 05 10月, 2019 13 次提交
    • E
      fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock · 5bead06b
      Eric Biggers 提交于
      [ Upstream commit 76e43c8ccaa35c30d5df853013561145a0f750a5 ]
      
      When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
      and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.
      
      This may have to wait for fuse_iqueue::waitq.lock to be released by one
      of many places that take it with IRQs enabled.  Since the IRQ handler
      may take kioctx::ctx_lock, lockdep reports that a deadlock is possible.
      
      Fix it by protecting the state of struct fuse_iqueue with a separate
      spinlock, and only accessing fuse_iqueue::waitq using the versions of
      the waitqueue functions which do IRQ-safe locking internally.
      
      Reproducer:
      
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <sys/mount.h>
      	#include <sys/stat.h>
      	#include <sys/syscall.h>
      	#include <unistd.h>
      	#include <linux/aio_abi.h>
      
      	int main()
      	{
      		char opts[128];
      		int fd = open("/dev/fuse", O_RDWR);
      		aio_context_t ctx = 0;
      		struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd };
      		struct iocb *cbp = &cb;
      
      		sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
      		mkdir("mnt", 0700);
      		mount("foo",  "mnt", "fuse", 0, opts);
      		syscall(__NR_io_setup, 1, &ctx);
      		syscall(__NR_io_submit, ctx, 1, &cbp);
      	}
      
      Beginning of lockdep output:
      
      	=====================================================
      	WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      	5.3.0-rc5 #9 Not tainted
      	-----------------------------------------------------
      	syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      	000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
      	000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
      	000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825
      
      	and this task is already holding:
      	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
      	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
      	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
      	which would create a new lock dependency:
      	 (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}
      
      	but this new dependency connects a SOFTIRQ-irq-safe lock:
      	 (&(&ctx->ctx_lock)->rlock){..-.}
      
      	[...]
      
      Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com
      Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.19+
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5bead06b
    • P
      CIFS: Fix oplock handling for SMB 2.1+ protocols · 4290a9e5
      Pavel Shilovsky 提交于
      commit a016e2794fc3a245a91946038dd8f34d65e53cc3 upstream.
      
      There may be situations when a server negotiates SMB 2.1
      protocol version or higher but responds to a CREATE request
      with an oplock rather than a lease.
      
      Currently the client doesn't handle such a case correctly:
      when another CREATE comes in the server sends an oplock
      break to the initial CREATE and the client doesn't send
      an ack back due to a wrong caching level being set (READ
      instead of RWH). Missing an oplock break ack makes the
      server wait until the break times out which dramatically
      increases the latency of the second CREATE.
      
      Fix this by properly detecting oplocks when using SMB 2.1
      protocol version and higher.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4290a9e5
    • M
      CIFS: fix max ea value size · a3a15089
      Murphy Zhou 提交于
      commit 63d37fb4ce5ae7bf1e58f906d1bf25f036fe79b2 upstream.
      
      It should not be larger then the slab max buf size. If user
      specifies a larger size, it passes this check and goes
      straightly to SMB2_set_info_init performing an insecure memcpy.
      Signed-off-by: NMurphy Zhou <jencce.kernel@gmail.com>
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3a15089
    • T
      ext4: fix punch hole for inline_data file systems · 091c754d
      Theodore Ts'o 提交于
      commit c1e8220bd316d8ae8e524df39534b8a412a45d5e upstream.
      
      If a program attempts to punch a hole on an inline data file, we need
      to convert it to a normal file first.
      
      This was detected using ext4/032 using the adv configuration.  Simple
      reproducer:
      
      mke2fs -Fq -t ext4 -O inline_data /dev/vdc
      mount /vdc
      echo "" > /vdc/testfile
      xfs_io -c 'truncate 33554432' /vdc/testfile
      xfs_io -c 'fpunch 0 1048576' /vdc/testfile
      umount /vdc
      e2fsck -fy /dev/vdc
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      091c754d
    • R
      ext4: fix warning inside ext4_convert_unwritten_extents_endio · 775e3e73
      Rakesh Pandit 提交于
      commit e3d550c2c4f2f3dba469bc3c4b83d9332b4e99e1 upstream.
      
      Really enable warning when CONFIG_EXT4_DEBUG is set and fix missing
      first argument.  This was introduced in commit ff95ec22 ("ext4:
      add warning to ext4_convert_unwritten_extents_endio") and splitting
      extents inside endio would trigger it.
      
      Fixes: ff95ec22 ("ext4: add warning to ext4_convert_unwritten_extents_endio")
      Signed-off-by: NRakesh Pandit <rakesh@tuxera.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      775e3e73
    • F
      Btrfs: fix race setting up and completing qgroup rescan workers · bacff03b
      Filipe Manana 提交于
      commit 13fc1d271a2e3ab8a02071e711add01fab9271f6 upstream.
      
      There is a race between setting up a qgroup rescan worker and completing
      a qgroup rescan worker that can lead to callers of the qgroup rescan wait
      ioctl to either not wait for the rescan worker to complete or to hang
      forever due to missing wake ups. The following diagram shows a sequence
      of steps that illustrates the race.
      
              CPU 1                                                         CPU 2                                  CPU 3
      
       btrfs_ioctl_quota_rescan()
        btrfs_qgroup_rescan()
         qgroup_rescan_init()
          mutex_lock(&fs_info->qgroup_rescan_lock)
          spin_lock(&fs_info->qgroup_lock)
      
          fs_info->qgroup_flags |=
            BTRFS_QGROUP_STATUS_FLAG_RESCAN
      
          init_completion(
            &fs_info->qgroup_rescan_completion)
      
          fs_info->qgroup_rescan_running = true
      
          mutex_unlock(&fs_info->qgroup_rescan_lock)
          spin_unlock(&fs_info->qgroup_lock)
      
          btrfs_init_work()
           --> starts the worker
      
                                                              btrfs_qgroup_rescan_worker()
                                                               mutex_lock(&fs_info->qgroup_rescan_lock)
      
                                                               fs_info->qgroup_flags &=
                                                                 ~BTRFS_QGROUP_STATUS_FLAG_RESCAN
      
                                                               mutex_unlock(&fs_info->qgroup_rescan_lock)
      
                                                               starts transaction, updates qgroup status
                                                               item, etc
      
                                                                                                                 btrfs_ioctl_quota_rescan()
                                                                                                                  btrfs_qgroup_rescan()
                                                                                                                   qgroup_rescan_init()
                                                                                                                    mutex_lock(&fs_info->qgroup_rescan_lock)
                                                                                                                    spin_lock(&fs_info->qgroup_lock)
      
                                                                                                                    fs_info->qgroup_flags |=
                                                                                                                      BTRFS_QGROUP_STATUS_FLAG_RESCAN
      
                                                                                                                    init_completion(
                                                                                                                      &fs_info->qgroup_rescan_completion)
      
                                                                                                                    fs_info->qgroup_rescan_running = true
      
                                                                                                                    mutex_unlock(&fs_info->qgroup_rescan_lock)
                                                                                                                    spin_unlock(&fs_info->qgroup_lock)
      
                                                                                                                    btrfs_init_work()
                                                                                                                     --> starts another worker
      
                                                               mutex_lock(&fs_info->qgroup_rescan_lock)
      
                                                               fs_info->qgroup_rescan_running = false
      
                                                               mutex_unlock(&fs_info->qgroup_rescan_lock)
      
      							 complete_all(&fs_info->qgroup_rescan_completion)
      
      Before the rescan worker started by the task at CPU 3 completes, if
      another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS
      because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at
      fs_info->qgroup_flags, which is expected and correct behaviour.
      
      However if other task calls btrfs_ioctl_quota_rescan_wait() before the
      rescan worker started by the task at CPU 3 completes, it will return
      immediately without waiting for the new rescan worker to complete,
      because fs_info->qgroup_rescan_running is set to false by CPU 2.
      
      This race is making test case btrfs/171 (from fstests) to fail often:
      
        btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad)
      #      --- tests/btrfs/171.out     2018-09-16 21:30:48.505104287 +0100
      #      +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad      2019-09-19 02:01:36.938486039 +0100
      #      @@ -1,2 +1,3 @@
      #       QA output created by 171
      #      +ERROR: quota rescan failed: Operation now in progress
      #       Silence is golden
      #      ...
      #      (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad'  to see the entire diff)
      
      That is because the test calls the btrfs-progs commands "qgroup quota
      rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes
      calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs"
      commands 'qgroup assign' and 'qgroup remove' often call the rescan start
      ioctl after calling the qgroup assign ioctl,
      btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait
      for a rescan worker to complete.
      
      Another problem the race can cause is missing wake ups for waiters,
      since the call to complete_all() happens outside a critical section and
      after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. In the sequence
      diagram above, if we have a waiter for the first rescan task (executed
      by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and
      if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and
      before it calls complete_all() against
      fs_info->qgroup_rescan_completion, the task at CPU 3 calls
      init_completion() against fs_info->qgroup_rescan_completion which
      re-initilizes its wait queue to an empty queue, therefore causing the
      rescan worker at CPU 2 to call complete_all() against an empty queue,
      never waking up the task waiting for that rescan worker.
      
      Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting
      fs_info->qgroup_rescan_running to false in the same critical section,
      delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the
      call to complete_all() in that same critical section. This gives the
      protection needed to avoid rescan wait ioctl callers not waiting for a
      running rescan worker and the lost wake ups problem, since setting that
      rescan flag and boolean as well as initializing the wait queue is done
      already in a critical section delimited by that mutex (at
      qgroup_rescan_init()).
      
      Fixes: 57254b6e ("Btrfs: add ioctl to wait for qgroup rescan completion")
      Fixes: d2c609b8 ("btrfs: properly track when rescan worker is running")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bacff03b
    • Q
      btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls · b5c42ef0
      Qu Wenruo 提交于
      commit d4e204948fe3e0dc8e1fbf3f8f3290c9c2823be3 upstream.
      
      [BUG]
      The following script can cause btrfs qgroup data space leak:
      
        mkfs.btrfs -f $dev
        mount $dev -o nospace_cache $mnt
      
        btrfs subv create $mnt/subv
        btrfs quota en $mnt
        btrfs quota rescan -w $mnt
        btrfs qgroup limit 128m $mnt/subv
      
        for (( i = 0; i < 3; i++)); do
                # Create 3 64M holes for latter fallocate to fail
                truncate -s 192m $mnt/subv/file
                xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
                xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
                sync
      
                # it's supposed to fail, and each failure will leak at least 64M
                # data space
                xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
                rm $mnt/subv/file
                sync
        done
      
        # Shouldn't fail after we removed the file
        xfs_io -f -c "falloc 0 64m" $mnt/subv/file
      
      [CAUSE]
      Btrfs qgroup data reserve code allow multiple reservations to happen on
      a single extent_changeset:
      E.g:
      	btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_1M);
      	btrfs_qgroup_reserve_data(inode, &data_reserved, SZ_1M, SZ_2M);
      	btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_4M);
      
      Btrfs qgroup code has its internal tracking to make sure we don't
      double-reserve in above example.
      
      The only pattern utilizing this feature is in the main while loop of
      btrfs_fallocate() function.
      
      However btrfs_qgroup_reserve_data()'s error handling has a bug in that
      on error it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED
      flag but doesn't free previously reserved bytes.
      
      This bug has a two fold effect:
      - Clearing EXTENT_QGROUP_RESERVED ranges
        This is the correct behavior, but it prevents
        btrfs_qgroup_check_reserved_leak() to catch the leakage as the
        detector is purely EXTENT_QGROUP_RESERVED flag based.
      
      - Leak the previously reserved data bytes.
      
      The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
      the last one fails, leaking space reserved in the previous ones.
      
      [FIX]
      Also free previously reserved data bytes when btrfs_qgroup_reserve_data
      fails.
      
      Fixes: 52472553 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5c42ef0
    • Q
      btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space · c521bfa8
      Qu Wenruo 提交于
      commit bab32fc069ce8829c416e8737c119f62a57970f9 upstream.
      
      [BUG]
      Under the following case with qgroup enabled, if some error happened
      after we have reserved delalloc space, then in error handling path, we
      could cause qgroup data space leakage:
      
      From btrfs_truncate_block() in inode.c:
      
      	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
      					   block_start, blocksize);
      	if (ret)
      		goto out;
      
       again:
      	page = find_or_create_page(mapping, index, mask);
      	if (!page) {
      		btrfs_delalloc_release_space(inode, data_reserved,
      					     block_start, blocksize, true);
      		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
      		ret = -ENOMEM;
      		goto out;
      	}
      
      [CAUSE]
      In the above case, btrfs_delalloc_reserve_space() will call
      btrfs_qgroup_reserve_data() and mark the io_tree range with
      EXTENT_QGROUP_RESERVED flag.
      
      In the error handling path, we have the following call stack:
      btrfs_delalloc_release_space()
      |- btrfs_free_reserved_data_space()
         |- btrsf_qgroup_free_data()
            |- __btrfs_qgroup_release_data(reserved=@reserved, free=1)
               |- qgroup_free_reserved_data(reserved=@reserved)
                  |- clear_record_extent_bits();
                  |- freed += changeset.bytes_changed;
      
      However due to a completion bug, qgroup_free_reserved_data() will clear
      EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)->io_failure_tree, other
      than the correct BTRFS_I(inode)->io_tree.
      Since io_failure_tree is never marked with that flag,
      btrfs_qgroup_free_data() will not free any data reserved space at all,
      causing a leakage.
      
      This type of error handling can only be triggered by errors outside of
      qgroup code. So EDQUOT error from qgroup can't trigger it.
      
      [FIX]
      Fix the wrong target io_tree.
      Reported-by: NJosef Bacik <josef@toxicpanda.com>
      Fixes: bc42bda2 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c521bfa8
    • N
      btrfs: Relinquish CPUs in btrfs_compare_trees · 067f82a0
      Nikolay Borisov 提交于
      commit 6af112b11a4bc1b560f60a618ac9c1dcefe9836e upstream.
      
      When doing any form of incremental send the parent and the child trees
      need to be compared via btrfs_compare_trees. This  can result in long
      loop chains without ever relinquishing the CPU. This causes softlockup
      detector to trigger when comparing trees with a lot of items. Example
      report:
      
      watchdog: BUG: soft lockup - CPU#0 stuck for 24s! [snapperd:16153]
      CPU: 0 PID: 16153 Comm: snapperd Not tainted 5.2.9-1-default #1 openSUSE Tumbleweed (unreleased)
      Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      pstate: 40000005 (nZcv daif -PAN -UAO)
      pc : __ll_sc_arch_atomic_sub_return+0x14/0x20
      lr : btrfs_release_extent_buffer_pages+0xe0/0x1e8 [btrfs]
      sp : ffff00001273b7e0
      Call trace:
       __ll_sc_arch_atomic_sub_return+0x14/0x20
       release_extent_buffer+0xdc/0x120 [btrfs]
       free_extent_buffer.part.0+0xb0/0x118 [btrfs]
       free_extent_buffer+0x24/0x30 [btrfs]
       btrfs_release_path+0x4c/0xa0 [btrfs]
       btrfs_free_path.part.0+0x20/0x40 [btrfs]
       btrfs_free_path+0x24/0x30 [btrfs]
       get_inode_info+0xa8/0xf8 [btrfs]
       finish_inode_if_needed+0xe0/0x6d8 [btrfs]
       changed_cb+0x9c/0x410 [btrfs]
       btrfs_compare_trees+0x284/0x648 [btrfs]
       send_subvol+0x33c/0x520 [btrfs]
       btrfs_ioctl_send+0x8a0/0xaf0 [btrfs]
       btrfs_ioctl+0x199c/0x2288 [btrfs]
       do_vfs_ioctl+0x4b0/0x820
       ksys_ioctl+0x84/0xb8
       __arm64_sys_ioctl+0x28/0x38
       el0_svc_common.constprop.0+0x7c/0x188
       el0_svc_handler+0x34/0x90
       el0_svc+0x8/0xc
      
      Fix this by adding a call to cond_resched at the beginning of the main
      loop in btrfs_compare_trees.
      
      Fixes: 7069830a ("Btrfs: add btrfs_compare_trees function")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      067f82a0
    • F
      Btrfs: fix use-after-free when using the tree modification log · b08344be
      Filipe Manana 提交于
      commit efad8a853ad2057f96664328a0d327a05ce39c76 upstream.
      
      At ctree.c:get_old_root(), we are accessing a root's header owner field
      after we have freed the respective extent buffer. This results in an
      use-after-free that can lead to crashes, and when CONFIG_DEBUG_PAGEALLOC
      is set, results in a stack trace like the following:
      
        [ 3876.799331] stack segment: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
        [ 3876.799363] CPU: 0 PID: 15436 Comm: pool Not tainted 5.3.0-rc3-btrfs-next-54 #1
        [ 3876.799385] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [ 3876.799433] RIP: 0010:btrfs_search_old_slot+0x652/0xd80 [btrfs]
        (...)
        [ 3876.799502] RSP: 0018:ffff9f08c1a2f9f0 EFLAGS: 00010286
        [ 3876.799518] RAX: ffff8dd300000000 RBX: ffff8dd85a7a9348 RCX: 000000038da26000
        [ 3876.799538] RDX: 0000000000000000 RSI: ffffe522ce368980 RDI: 0000000000000246
        [ 3876.799559] RBP: dae1922adadad000 R08: 0000000008020000 R09: ffffe522c0000000
        [ 3876.799579] R10: ffff8dd57fd788c8 R11: 000000007511b030 R12: ffff8dd781ddc000
        [ 3876.799599] R13: ffff8dd9e6240578 R14: ffff8dd6896f7a88 R15: ffff8dd688cf90b8
        [ 3876.799620] FS:  00007f23ddd97700(0000) GS:ffff8dda20200000(0000) knlGS:0000000000000000
        [ 3876.799643] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 3876.799660] CR2: 00007f23d4024000 CR3: 0000000710bb0005 CR4: 00000000003606f0
        [ 3876.799682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [ 3876.799703] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [ 3876.799723] Call Trace:
        [ 3876.799735]  ? do_raw_spin_unlock+0x49/0xc0
        [ 3876.799749]  ? _raw_spin_unlock+0x24/0x30
        [ 3876.799779]  resolve_indirect_refs+0x1eb/0xc80 [btrfs]
        [ 3876.799810]  find_parent_nodes+0x38d/0x1180 [btrfs]
        [ 3876.799841]  btrfs_check_shared+0x11a/0x1d0 [btrfs]
        [ 3876.799870]  ? extent_fiemap+0x598/0x6e0 [btrfs]
        [ 3876.799895]  extent_fiemap+0x598/0x6e0 [btrfs]
        [ 3876.799913]  do_vfs_ioctl+0x45a/0x700
        [ 3876.799926]  ksys_ioctl+0x70/0x80
        [ 3876.799938]  ? trace_hardirqs_off_thunk+0x1a/0x20
        [ 3876.799953]  __x64_sys_ioctl+0x16/0x20
        [ 3876.799965]  do_syscall_64+0x62/0x220
        [ 3876.799977]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [ 3876.799993] RIP: 0033:0x7f23e0013dd7
        (...)
        [ 3876.800056] RSP: 002b:00007f23ddd96ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [ 3876.800078] RAX: ffffffffffffffda RBX: 00007f23d80210f8 RCX: 00007f23e0013dd7
        [ 3876.800099] RDX: 00007f23d80210f8 RSI: 00000000c020660b RDI: 0000000000000003
        [ 3876.800626] RBP: 000055fa2a2a2440 R08: 0000000000000000 R09: 00007f23ddd96d7c
        [ 3876.801143] R10: 00007f23d8022000 R11: 0000000000000246 R12: 00007f23ddd96d80
        [ 3876.801662] R13: 00007f23ddd96d78 R14: 00007f23d80210f0 R15: 00007f23ddd96d80
        (...)
        [ 3876.805107] ---[ end trace e53161e179ef04f9 ]---
      
      Fix that by saving the root's header owner field into a local variable
      before freeing the root's extent buffer, and then use that local variable
      when needed.
      
      Fixes: 30b0463a ("Btrfs: fix accessing the root pointer in tree mod log functions")
      CC: stable@vger.kernel.org # 3.10+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b08344be
    • C
      btrfs: fix allocation of free space cache v1 bitmap pages · 4874c6fe
      Christophe Leroy 提交于
      commit 3acd48507dc43eeeb0a1fe965b8bad91cab904a7 upstream.
      
      Various notifications of type "BUG kmalloc-4096 () : Redzone
      overwritten" have been observed recently in various parts of the kernel.
      After some time, it has been made a relation with the use of BTRFS
      filesystem and with SLUB_DEBUG turned on.
      
      [   22.809700] BUG kmalloc-4096 (Tainted: G        W        ): Redzone overwritten
      
      [   22.810286] INFO: 0xbe1a5921-0xfbfc06cd. First byte 0x0 instead of 0xcc
      [   22.810866] INFO: Allocated in __load_free_space_cache+0x588/0x780 [btrfs] age=22 cpu=0 pid=224
      [   22.811193] 	__slab_alloc.constprop.26+0x44/0x70
      [   22.811345] 	kmem_cache_alloc_trace+0xf0/0x2ec
      [   22.811588] 	__load_free_space_cache+0x588/0x780 [btrfs]
      [   22.811848] 	load_free_space_cache+0xf4/0x1b0 [btrfs]
      [   22.812090] 	cache_block_group+0x1d0/0x3d0 [btrfs]
      [   22.812321] 	find_free_extent+0x680/0x12a4 [btrfs]
      [   22.812549] 	btrfs_reserve_extent+0xec/0x220 [btrfs]
      [   22.812785] 	btrfs_alloc_tree_block+0x178/0x5f4 [btrfs]
      [   22.813032] 	__btrfs_cow_block+0x150/0x5d4 [btrfs]
      [   22.813262] 	btrfs_cow_block+0x194/0x298 [btrfs]
      [   22.813484] 	commit_cowonly_roots+0x44/0x294 [btrfs]
      [   22.813718] 	btrfs_commit_transaction+0x63c/0xc0c [btrfs]
      [   22.813973] 	close_ctree+0xf8/0x2a4 [btrfs]
      [   22.814107] 	generic_shutdown_super+0x80/0x110
      [   22.814250] 	kill_anon_super+0x18/0x30
      [   22.814437] 	btrfs_kill_super+0x18/0x90 [btrfs]
      [   22.814590] INFO: Freed in proc_cgroup_show+0xc0/0x248 age=41 cpu=0 pid=83
      [   22.814841] 	proc_cgroup_show+0xc0/0x248
      [   22.814967] 	proc_single_show+0x54/0x98
      [   22.815086] 	seq_read+0x278/0x45c
      [   22.815190] 	__vfs_read+0x28/0x17c
      [   22.815289] 	vfs_read+0xa8/0x14c
      [   22.815381] 	ksys_read+0x50/0x94
      [   22.815475] 	ret_from_syscall+0x0/0x38
      
      Commit 69d24804 ("btrfs: use copy_page for copying pages instead of
      memcpy") changed the way bitmap blocks are copied. But allthough bitmaps
      have the size of a page, they were allocated with kzalloc().
      
      Most of the time, kzalloc() allocates aligned blocks of memory, so
      copy_page() can be used. But when some debug options like SLAB_DEBUG are
      activated, kzalloc() may return unaligned pointer.
      
      On powerpc, memcpy(), copy_page() and other copying functions use
      'dcbz' instruction which provides an entire zeroed cacheline to avoid
      memory read when the intention is to overwrite a full line. Functions
      like memcpy() are writen to care about partial cachelines at the start
      and end of the destination, but copy_page() assumes it gets pages. As
      pages are naturally cache aligned, copy_page() doesn't care about
      partial lines. This means that when copy_page() is called with a
      misaligned pointer, a few leading bytes are zeroed.
      
      To fix it, allocate bitmaps through kmem_cache instead of using kzalloc()
      The cache pool is created with PAGE_SIZE alignment constraint.
      Reported-by: NErhard F. <erhard_f@mailbox.org>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204371
      Fixes: 69d24804 ("btrfs: use copy_page for copying pages instead of memcpy")
      Cc: stable@vger.kernel.org # 4.19+
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ rename to btrfs_free_space_bitmap ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4874c6fe
    • M
      ovl: filter of trusted xattr results in audit · 934243a7
      Mark Salyzyn 提交于
      commit 5c2e9f346b815841f9bed6029ebcb06415caf640 upstream.
      
      When filtering xattr list for reading, presence of trusted xattr
      results in a security audit log.  However, if there is other content
      no errno will be set, and if there isn't, the errno will be -ENODATA
      and not -EPERM as is usually associated with a lack of capability.
      The check does not block the request to list the xattrs present.
      
      Switch to ns_capable_noaudit to reflect a more appropriate check.
      Signed-off-by: NMark Salyzyn <salyzyn@android.com>
      Cc: linux-security-module@vger.kernel.org
      Cc: kernel-team@android.com
      Cc: stable@vger.kernel.org # v3.18+
      Fixes: a082c6f6 ("ovl: filter trusted xattr for non-admin")
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      934243a7
    • D
      ovl: Fix dereferencing possible ERR_PTR() · e7265adc
      Ding Xiang 提交于
      commit 97f024b9171e74c4443bbe8a8dce31b917f97ac5 upstream.
      
      if ovl_encode_real_fh() fails, no memory was allocated
      and the error in the error-valued pointer should be returned.
      
      Fixes: 9b6faee0 ("ovl: check ERR_PTR() return value from ovl_encode_fh()")
      Signed-off-by: NDing Xiang <dingxiang@cmss.chinamobile.com>
      Cc: <stable@vger.kernel.org> # v4.16+
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7265adc