1. 16 9月, 2019 23 次提交
    • T
      ext4: don't perform block validity checks on the journal inode · 97fbf573
      Theodore Ts'o 提交于
      [ Upstream commit 0a944e8a6c66ca04c7afbaa17e22bf208a8b37f0 ]
      
      Since the journal inode is already checked when we added it to the
      block validity's system zone, if we check it again, we'll just trigger
      a failure.
      
      This was causing failures like this:
      
      [   53.897001] EXT4-fs error (device sda): ext4_find_extent:909: inode
      #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0)
      [   53.931430] jbd2_journal_bmap: journal block not found at offset 49 on sda-8
      [   53.938480] Aborting journal on device sda-8.
      
      ... but only if the system was under enough memory pressure that
      logical->physical mapping for the journal inode gets pushed out of the
      extent cache.  (This is why it wasn't noticed earlier.)
      
      Fixes: 345c0dbf3a30 ("ext4: protect journal inode's blocks using block_validity")
      Reported-by: NDan Rue <dan.rue@linaro.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      97fbf573
    • T
      NFSv4: Fix delegation state recovery · 652993a5
      Trond Myklebust 提交于
      [ Upstream commit 5eb8d18ca0e001c6055da2b7f30d8f6dca23a44f ]
      
      Once we clear the NFS_DELEGATED_STATE flag, we're telling
      nfs_delegation_claim_opens() that we're done recovering all open state
      for that stateid, so we really need to ensure that we test for all
      open modes that are currently cached and recover them before exiting
      nfs4_open_delegation_recall().
      
      Fixes: 24311f88 ("NFSv4: Recovery of recalled read delegations...")
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      652993a5
    • N
      pstore: Fix double-free in pstore_mkfile() failure path · 5e9a2ce6
      Norbert Manthey 提交于
      [ Upstream commit 4c6d80e1144bdf48cae6b602ae30d41f3e5c76a9 ]
      
      The pstore_mkfile() function is passed a pointer to a struct
      pstore_record. On success it consumes this 'record' pointer and
      references it from the created inode.
      
      On failure, however, it may or may not free the record. There are even
      two different code paths which return -ENOMEM -- one of which does and
      the other doesn't free the record.
      
      Make the behaviour deterministic by never consuming and freeing the
      record when returning failure, allowing the caller to do the cleanup
      consistently.
      Signed-off-by: NNorbert Manthey <nmanthey@amazon.de>
      Link: https://lore.kernel.org/r/1562331960-26198-1-git-send-email-nmanthey@amazon.de
      Fixes: 83f70f07 ("pstore: Do not duplicate record metadata")
      Fixes: 1dfff7dd ("pstore: Pass record contents instead of copying")
      Cc: stable@vger.kernel.org
      [kees: also move "private" allocation location, rename inode cleanup label]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      5e9a2ce6
    • J
      btrfs: correctly validate compression type · 1c13c9c4
      Johannes Thumshirn 提交于
      [ Upstream commit aa53e3bfac7205fb3a8815ac1c937fd6ed01b41e ]
      
      Nikolay reported the following KASAN splat when running btrfs/048:
      
      [ 1843.470920] ==================================================================
      [ 1843.471971] BUG: KASAN: slab-out-of-bounds in strncmp+0x66/0xb0
      [ 1843.472775] Read of size 1 at addr ffff888111e369e2 by task btrfs/3979
      
      [ 1843.473904] CPU: 3 PID: 3979 Comm: btrfs Not tainted 5.2.0-rc3-default #536
      [ 1843.475009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [ 1843.476322] Call Trace:
      [ 1843.476674]  dump_stack+0x7c/0xbb
      [ 1843.477132]  ? strncmp+0x66/0xb0
      [ 1843.477587]  print_address_description+0x114/0x320
      [ 1843.478256]  ? strncmp+0x66/0xb0
      [ 1843.478740]  ? strncmp+0x66/0xb0
      [ 1843.479185]  __kasan_report+0x14e/0x192
      [ 1843.479759]  ? strncmp+0x66/0xb0
      [ 1843.480209]  kasan_report+0xe/0x20
      [ 1843.480679]  strncmp+0x66/0xb0
      [ 1843.481105]  prop_compression_validate+0x24/0x70
      [ 1843.481798]  btrfs_xattr_handler_set_prop+0x65/0x160
      [ 1843.482509]  __vfs_setxattr+0x71/0x90
      [ 1843.483012]  __vfs_setxattr_noperm+0x84/0x130
      [ 1843.483606]  vfs_setxattr+0xac/0xb0
      [ 1843.484085]  setxattr+0x18c/0x230
      [ 1843.484546]  ? vfs_setxattr+0xb0/0xb0
      [ 1843.485048]  ? __mod_node_page_state+0x1f/0xa0
      [ 1843.485672]  ? _raw_spin_unlock+0x24/0x40
      [ 1843.486233]  ? __handle_mm_fault+0x988/0x1290
      [ 1843.486823]  ? lock_acquire+0xb4/0x1e0
      [ 1843.487330]  ? lock_acquire+0xb4/0x1e0
      [ 1843.487842]  ? mnt_want_write_file+0x3c/0x80
      [ 1843.488442]  ? debug_lockdep_rcu_enabled+0x22/0x40
      [ 1843.489089]  ? rcu_sync_lockdep_assert+0xe/0x70
      [ 1843.489707]  ? __sb_start_write+0x158/0x200
      [ 1843.490278]  ? mnt_want_write_file+0x3c/0x80
      [ 1843.490855]  ? __mnt_want_write+0x98/0xe0
      [ 1843.491397]  __x64_sys_fsetxattr+0xba/0xe0
      [ 1843.492201]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [ 1843.493201]  do_syscall_64+0x6c/0x230
      [ 1843.493988]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 1843.495041] RIP: 0033:0x7fa7a8a7707a
      [ 1843.495819] Code: 48 8b 0d 21 de 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 be 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ee dd 2b 00 f7 d8 64 89 01 48
      [ 1843.499203] RSP: 002b:00007ffcb73bca38 EFLAGS: 00000202 ORIG_RAX: 00000000000000be
      [ 1843.500210] RAX: ffffffffffffffda RBX: 00007ffcb73bda9d RCX: 00007fa7a8a7707a
      [ 1843.501170] RDX: 00007ffcb73bda9d RSI: 00000000006dc050 RDI: 0000000000000003
      [ 1843.502152] RBP: 00000000006dc050 R08: 0000000000000000 R09: 0000000000000000
      [ 1843.503109] R10: 0000000000000002 R11: 0000000000000202 R12: 00007ffcb73bda91
      [ 1843.504055] R13: 0000000000000003 R14: 00007ffcb73bda82 R15: ffffffffffffffff
      
      [ 1843.505268] Allocated by task 3979:
      [ 1843.505771]  save_stack+0x19/0x80
      [ 1843.506211]  __kasan_kmalloc.constprop.5+0xa0/0xd0
      [ 1843.506836]  setxattr+0xeb/0x230
      [ 1843.507264]  __x64_sys_fsetxattr+0xba/0xe0
      [ 1843.507886]  do_syscall_64+0x6c/0x230
      [ 1843.508429]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [ 1843.509558] Freed by task 0:
      [ 1843.510188] (stack is not available)
      
      [ 1843.511309] The buggy address belongs to the object at ffff888111e369e0
                      which belongs to the cache kmalloc-8 of size 8
      [ 1843.514095] The buggy address is located 2 bytes inside of
                      8-byte region [ffff888111e369e0, ffff888111e369e8)
      [ 1843.516524] The buggy address belongs to the page:
      [ 1843.517561] page:ffff88813f478d80 refcount:1 mapcount:0 mapping:ffff88811940c300 index:0xffff888111e373b8 compound_mapcount: 0
      [ 1843.519993] flags: 0x4404000010200(slab|head)
      [ 1843.520951] raw: 0004404000010200 ffff88813f48b008 ffff888119403d50 ffff88811940c300
      [ 1843.522616] raw: ffff888111e373b8 000000000016000f 00000001ffffffff 0000000000000000
      [ 1843.524281] page dumped because: kasan: bad access detected
      
      [ 1843.525936] Memory state around the buggy address:
      [ 1843.526975]  ffff888111e36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1843.528479]  ffff888111e36900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1843.530138] >ffff888111e36980: fc fc fc fc fc fc fc fc fc fc fc fc 02 fc fc fc
      [ 1843.531877]                                                        ^
      [ 1843.533287]  ffff888111e36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1843.534874]  ffff888111e36a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1843.536468] ==================================================================
      
      This is caused by supplying a too short compression value ('lz') in the
      test-case and comparing it to 'lzo' with strncmp() and a length of 3.
      strncmp() read past the 'lz' when looking for the 'o' and thus caused an
      out-of-bounds read.
      
      Introduce a new check 'btrfs_compress_is_valid_type()' which not only
      checks the user-supplied value against known compression types, but also
      employs checks for too short values.
      Reported-by: NNikolay Borisov <nborisov@suse.com>
      Fixes: 272e5326c783 ("btrfs: prop: fix vanished compression property after failed set")
      CC: stable@vger.kernel.org # 5.1+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1c13c9c4
    • Y
      ceph: use ceph_evict_inode to cleanup inode's resource · 81281039
      Yan, Zheng 提交于
      [ Upstream commit 87bc5b895d94a0f40fe170d4cf5771c8e8f85d15 ]
      
      remove_session_caps() relies on __wait_on_freeing_inode(), to wait for
      freeing inode to remove its caps. But VFS wakes freeing inode waiters
      before calling destroy_inode().
      
      Cc: stable@vger.kernel.org
      Link: https://tracker.ceph.com/issues/40102Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      81281039
    • P
      cifs: Properly handle auto disabling of serverino option · 987564c2
      Paulo Alcantara (SUSE) 提交于
      [ Upstream commit 29fbeb7a908a60a5ae8c50fbe171cb8fdcef1980 ]
      
      Fix mount options comparison when serverino option is turned off later
      in cifs_autodisable_serverino() and thus avoiding mismatch of new cifs
      mounts.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaulo Alcantara (SUSE) <paulo@paulo.ac>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NPavel Shilovsky <pshilove@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      987564c2
    • R
      cifs: add spinlock for the openFileList to cifsInodeInfo · acc07941
      Ronnie Sahlberg 提交于
      [ Upstream commit 487317c99477d00f22370625d53be3239febabbe ]
      
      We can not depend on the tcon->open_file_lock here since in multiuser mode
      we may have the same file/inode open via multiple different tcons.
      
      The current code is race prone and will crash if one user deletes a file
      at the same time a different user opens/create the file.
      
      To avoid this we need to have a spinlock attached to the inode and not the tcon.
      
      RHBZ:  1580165
      
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      acc07941
    • F
      Btrfs: fix race between block group removal and block group allocation · 1d064876
      Filipe Manana 提交于
      [ Upstream commit 8eaf40c0e24e98899a0f3ac9d25a33aafe13822a ]
      
      If a task is removing the block group that currently has the highest start
      offset amongst all existing block groups, there is a short time window
      where it races with a concurrent block group allocation, resulting in a
      transaction abort with an error code of EEXIST.
      
      The following diagram explains the race in detail:
      
            Task A                                                        Task B
      
       btrfs_remove_block_group(bg offset X)
      
         remove_extent_mapping(em offset X)
           -> removes extent map X from the
              tree of extent maps
              (fs_info->mapping_tree), so the
              next call to find_next_chunk()
              will return offset X
      
                                                         btrfs_alloc_chunk()
                                                           find_next_chunk()
                                                             --> returns offset X
      
                                                           __btrfs_alloc_chunk(offset X)
                                                             btrfs_make_block_group()
                                                               btrfs_create_block_group_cache()
                                                                 --> creates btrfs_block_group_cache
                                                                     object with a key corresponding
                                                                     to the block group item in the
                                                                     extent, the key is:
                                                                     (offset X, BTRFS_BLOCK_GROUP_ITEM_KEY, 1G)
      
                                                               --> adds the btrfs_block_group_cache object
                                                                   to the list new_bgs of the transaction
                                                                   handle
      
                                                         btrfs_end_transaction(trans handle)
                                                           __btrfs_end_transaction()
                                                             btrfs_create_pending_block_groups()
                                                               --> sees the new btrfs_block_group_cache
                                                                   in the new_bgs list of the transaction
                                                                   handle
                                                               --> its call to btrfs_insert_item() fails
                                                                   with -EEXIST when attempting to insert
                                                                   the block group item key
                                                                   (offset X, BTRFS_BLOCK_GROUP_ITEM_KEY, 1G)
                                                                   because task A has not removed that key yet
                                                               --> aborts the running transaction with
                                                                   error -EEXIST
      
         btrfs_del_item()
           -> removes the block group's key from
              the extent tree, key is
              (offset X, BTRFS_BLOCK_GROUP_ITEM_KEY, 1G)
      
      A sample transaction abort trace:
      
        [78912.403537] ------------[ cut here ]------------
        [78912.403811] BTRFS: Transaction aborted (error -17)
        [78912.404082] WARNING: CPU: 2 PID: 20465 at fs/btrfs/extent-tree.c:10551 btrfs_create_pending_block_groups+0x196/0x250 [btrfs]
        (...)
        [78912.405642] CPU: 2 PID: 20465 Comm: btrfs Tainted: G        W         5.0.0-btrfs-next-46 #1
        [78912.405941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [78912.406586] RIP: 0010:btrfs_create_pending_block_groups+0x196/0x250 [btrfs]
        (...)
        [78912.407636] RSP: 0018:ffff9d3d4b7e3b08 EFLAGS: 00010282
        [78912.407997] RAX: 0000000000000000 RBX: ffff90959a3796f0 RCX: 0000000000000006
        [78912.408369] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff909636b16860
        [78912.408746] RBP: ffff909626758a58 R08: 0000000000000000 R09: 0000000000000000
        [78912.409144] R10: ffff9095ff462400 R11: 0000000000000000 R12: ffff90959a379588
        [78912.409521] R13: ffff909626758ab0 R14: ffff9095036c0000 R15: ffff9095299e1158
        [78912.409899] FS:  00007f387f16f700(0000) GS:ffff909636b00000(0000) knlGS:0000000000000000
        [78912.410285] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [78912.410673] CR2: 00007f429fc87cbc CR3: 000000014440a004 CR4: 00000000003606e0
        [78912.411095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [78912.411496] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [78912.411898] Call Trace:
        [78912.412318]  __btrfs_end_transaction+0x5b/0x1c0 [btrfs]
        [78912.412746]  btrfs_inc_block_group_ro+0xcf/0x160 [btrfs]
        [78912.413179]  scrub_enumerate_chunks+0x188/0x5b0 [btrfs]
        [78912.413622]  ? __mutex_unlock_slowpath+0x100/0x2a0
        [78912.414078]  btrfs_scrub_dev+0x2ef/0x720 [btrfs]
        [78912.414535]  ? __sb_start_write+0xd4/0x1c0
        [78912.414963]  ? mnt_want_write_file+0x24/0x50
        [78912.415403]  btrfs_ioctl+0x17fb/0x3120 [btrfs]
        [78912.415832]  ? lock_acquire+0xa6/0x190
        [78912.416256]  ? do_vfs_ioctl+0xa2/0x6f0
        [78912.416685]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
        [78912.417116]  do_vfs_ioctl+0xa2/0x6f0
        [78912.417534]  ? __fget+0x113/0x200
        [78912.417954]  ksys_ioctl+0x70/0x80
        [78912.418369]  __x64_sys_ioctl+0x16/0x20
        [78912.418812]  do_syscall_64+0x60/0x1b0
        [78912.419231]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [78912.419644] RIP: 0033:0x7f3880252dd7
        (...)
        [78912.420957] RSP: 002b:00007f387f16ed68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [78912.421426] RAX: ffffffffffffffda RBX: 000055f5becc1df0 RCX: 00007f3880252dd7
        [78912.421889] RDX: 000055f5becc1df0 RSI: 00000000c400941b RDI: 0000000000000003
        [78912.422354] RBP: 0000000000000000 R08: 00007f387f16f700 R09: 0000000000000000
        [78912.422790] R10: 00007f387f16f700 R11: 0000000000000246 R12: 0000000000000000
        [78912.423202] R13: 00007ffda49c266f R14: 0000000000000000 R15: 00007f388145e040
        [78912.425505] ---[ end trace eb9bfe7c426fc4d3 ]---
      
      Fix this by calling remove_extent_mapping(), at btrfs_remove_block_group(),
      only at the very end, after removing the block group item key from the
      extent tree (and removing the free space tree entry if we are using the
      free space tree feature).
      
      Fixes: 04216820 ("Btrfs: fix race between fs trimming and block group remove/allocation")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1d064876
    • L
      cifs: smbd: take an array of reqeusts when sending upper layer data · 96b44c20
      Long Li 提交于
      [ Upstream commit 4739f2328661d070f93f9bcc8afb2a82706c826d ]
      
      To support compounding, __smb_send_rqst() now sends an array of requests to
      the transport layer.
      Change smbd_send() to take an array of requests, and send them in as few
      packets as possible.
      Signed-off-by: NLong Li <longli@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      96b44c20
    • T
      ext4: protect journal inode's blocks using block_validity · 2fd4629d
      Theodore Ts'o 提交于
      [ Upstream commit 345c0dbf3a30872d9b204db96b5857cd00808cae ]
      
      Add the blocks which belong to the journal inode to block_validity's
      system zone so attempts to deallocate or overwrite the journal due a
      corrupted file system where the journal blocks are also claimed by
      another inode.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2fd4629d
    • Z
      cifs: Fix lease buffer length error · 4061e662
      ZhangXiaoxu 提交于
      [ Upstream commit b57a55e2200ede754e4dc9cce4ba9402544b9365 ]
      
      There is a KASAN slab-out-of-bounds:
      BUG: KASAN: slab-out-of-bounds in _copy_from_iter_full+0x783/0xaa0
      Read of size 80 at addr ffff88810c35e180 by task mount.cifs/539
      
      CPU: 1 PID: 539 Comm: mount.cifs Not tainted 4.19 #10
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
                  rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
      Call Trace:
       dump_stack+0xdd/0x12a
       print_address_description+0xa7/0x540
       kasan_report+0x1ff/0x550
       check_memory_region+0x2f1/0x310
       memcpy+0x2f/0x80
       _copy_from_iter_full+0x783/0xaa0
       tcp_sendmsg_locked+0x1840/0x4140
       tcp_sendmsg+0x37/0x60
       inet_sendmsg+0x18c/0x490
       sock_sendmsg+0xae/0x130
       smb_send_kvec+0x29c/0x520
       __smb_send_rqst+0x3ef/0xc60
       smb_send_rqst+0x25a/0x2e0
       compound_send_recv+0x9e8/0x2af0
       cifs_send_recv+0x24/0x30
       SMB2_open+0x35e/0x1620
       open_shroot+0x27b/0x490
       smb2_open_op_close+0x4e1/0x590
       smb2_query_path_info+0x2ac/0x650
       cifs_get_inode_info+0x1058/0x28f0
       cifs_root_iget+0x3bb/0xf80
       cifs_smb3_do_mount+0xe00/0x14c0
       cifs_do_mount+0x15/0x20
       mount_fs+0x5e/0x290
       vfs_kern_mount+0x88/0x460
       do_mount+0x398/0x31e0
       ksys_mount+0xc6/0x150
       __x64_sys_mount+0xea/0x190
       do_syscall_64+0x122/0x590
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      It can be reproduced by the following step:
        1. samba configured with: server max protocol = SMB2_10
        2. mount -o vers=default
      
      When parse the mount version parameter, the 'ops' and 'vals'
      was setted to smb30,  if negotiate result is smb21, just
      update the 'ops' to smb21, but the 'vals' is still smb30.
      When add lease context, the iov_base is allocated with smb21
      ops, but the iov_len is initiallited with the smb30. Because
      the iov_len is longer than iov_base, when send the message,
      copy array out of bounds.
      
      we need to keep the 'ops' and 'vals' consistent.
      
      Fixes: 9764c02f ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)")
      Fixes: d5c7076b772a ("smb3: add smb3.1.1 to default dialect list")
      Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4061e662
    • P
      CIFS: Fix leaking locked VFS cache pages in writeback retry · 778d626c
      Pavel Shilovsky 提交于
      [ Upstream commit 165df9a080b6863ae286fa01780c13d87cd81076 ]
      
      If we don't find a writable file handle when retrying writepages
      we break of the loop and do not unlock and put pages neither from
      wdata2 nor from the original wdata. Fix this by walking through
      all the remaining pages and cleanup them properly.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      778d626c
    • P
      CIFS: Fix error paths in writeback code · fb2dabea
      Pavel Shilovsky 提交于
      [ Upstream commit 9a66396f1857cc1de06f4f4771797315e1a4ea56 ]
      
      This patch aims to address writeback code problems related to error
      paths. In particular it respects EINTR and related error codes and
      stores and returns the first error occurred during writeback.
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Acked-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      fb2dabea
    • D
      btrfs: init csum_list before possible free · 476ecc14
      Dan Robertson 提交于
      [ Upstream commit e49be14b8d80e23bb7c53d78c21717a474ade76b ]
      
      The scrub_ctx csum_list member must be initialized before scrub_free_ctx
      is called. If the csum_list is not initialized beforehand, the
      list_empty call in scrub_free_csums will result in a null deref if the
      allocation fails in the for loop.
      
      Fixes: a2de733c ("btrfs: scrub")
      CC: stable@vger.kernel.org # 3.0+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NDan Robertson <dan@dlrobertson.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      476ecc14
    • A
      btrfs: scrub: fix circular locking dependency warning · 936690bd
      Anand Jain 提交于
      [ Upstream commit 1cec3f27168d7835ff3d23ab371cd548440131bb ]
      
      This fixes a longstanding lockdep warning triggered by
      fstests/btrfs/011.
      
      Circular locking dependency check reports warning[1], that's because the
      btrfs_scrub_dev() calls the stack #0 below with, the fs_info::scrub_lock
      held. The test case leading to this warning:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /btrfs
        $ btrfs scrub start -B /btrfs
      
      In fact we have fs_info::scrub_workers_refcnt to track if the init and destroy
      of the scrub workers are needed. So once we have incremented and decremented
      the fs_info::scrub_workers_refcnt value in the thread, its ok to drop the
      scrub_lock, and then actually do the btrfs_destroy_workqueue() part. So this
      patch drops the scrub_lock before calling btrfs_destroy_workqueue().
      
        [359.258534] ======================================================
        [359.260305] WARNING: possible circular locking dependency detected
        [359.261938] 5.0.0-rc6-default #461 Not tainted
        [359.263135] ------------------------------------------------------
        [359.264672] btrfs/20975 is trying to acquire lock:
        [359.265927] 00000000d4d32bea ((wq_completion)"%s-%s""btrfs", name){+.+.}, at: flush_workqueue+0x87/0x540
        [359.268416]
        [359.268416] but task is already holding lock:
        [359.270061] 0000000053ea26a6 (&fs_info->scrub_lock){+.+.}, at: btrfs_scrub_dev+0x322/0x590 [btrfs]
        [359.272418]
        [359.272418] which lock already depends on the new lock.
        [359.272418]
        [359.274692]
        [359.274692] the existing dependency chain (in reverse order) is:
        [359.276671]
        [359.276671] -> #3 (&fs_info->scrub_lock){+.+.}:
        [359.278187]        __mutex_lock+0x86/0x9c0
        [359.279086]        btrfs_scrub_pause+0x31/0x100 [btrfs]
        [359.280421]        btrfs_commit_transaction+0x1e4/0x9e0 [btrfs]
        [359.281931]        close_ctree+0x30b/0x350 [btrfs]
        [359.283208]        generic_shutdown_super+0x64/0x100
        [359.284516]        kill_anon_super+0x14/0x30
        [359.285658]        btrfs_kill_super+0x12/0xa0 [btrfs]
        [359.286964]        deactivate_locked_super+0x29/0x60
        [359.288242]        cleanup_mnt+0x3b/0x70
        [359.289310]        task_work_run+0x98/0xc0
        [359.290428]        exit_to_usermode_loop+0x83/0x90
        [359.291445]        do_syscall_64+0x15b/0x180
        [359.292598]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [359.294011]
        [359.294011] -> #2 (sb_internal#2){.+.+}:
        [359.295432]        __sb_start_write+0x113/0x1d0
        [359.296394]        start_transaction+0x369/0x500 [btrfs]
        [359.297471]        btrfs_finish_ordered_io+0x2aa/0x7c0 [btrfs]
        [359.298629]        normal_work_helper+0xcd/0x530 [btrfs]
        [359.299698]        process_one_work+0x246/0x610
        [359.300898]        worker_thread+0x3c/0x390
        [359.302020]        kthread+0x116/0x130
        [359.303053]        ret_from_fork+0x24/0x30
        [359.304152]
        [359.304152] -> #1 ((work_completion)(&work->normal_work)){+.+.}:
        [359.306100]        process_one_work+0x21f/0x610
        [359.307302]        worker_thread+0x3c/0x390
        [359.308465]        kthread+0x116/0x130
        [359.309357]        ret_from_fork+0x24/0x30
        [359.310229]
        [359.310229] -> #0 ((wq_completion)"%s-%s""btrfs", name){+.+.}:
        [359.311812]        lock_acquire+0x90/0x180
        [359.312929]        flush_workqueue+0xaa/0x540
        [359.313845]        drain_workqueue+0xa1/0x180
        [359.314761]        destroy_workqueue+0x17/0x240
        [359.315754]        btrfs_destroy_workqueue+0x57/0x200 [btrfs]
        [359.317245]        scrub_workers_put+0x2c/0x60 [btrfs]
        [359.318585]        btrfs_scrub_dev+0x336/0x590 [btrfs]
        [359.319944]        btrfs_dev_replace_by_ioctl.cold.19+0x179/0x1bb [btrfs]
        [359.321622]        btrfs_ioctl+0x28a4/0x2e40 [btrfs]
        [359.322908]        do_vfs_ioctl+0xa2/0x6d0
        [359.324021]        ksys_ioctl+0x3a/0x70
        [359.325066]        __x64_sys_ioctl+0x16/0x20
        [359.326236]        do_syscall_64+0x54/0x180
        [359.327379]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [359.328772]
        [359.328772] other info that might help us debug this:
        [359.328772]
        [359.330990] Chain exists of:
        [359.330990]   (wq_completion)"%s-%s""btrfs", name --> sb_internal#2 --> &fs_info->scrub_lock
        [359.330990]
        [359.334376]  Possible unsafe locking scenario:
        [359.334376]
        [359.336020]        CPU0                    CPU1
        [359.337070]        ----                    ----
        [359.337821]   lock(&fs_info->scrub_lock);
        [359.338506]                                lock(sb_internal#2);
        [359.339506]                                lock(&fs_info->scrub_lock);
        [359.341461]   lock((wq_completion)"%s-%s""btrfs", name);
        [359.342437]
        [359.342437]  *** DEADLOCK ***
        [359.342437]
        [359.343745] 1 lock held by btrfs/20975:
        [359.344788]  #0: 0000000053ea26a6 (&fs_info->scrub_lock){+.+.}, at: btrfs_scrub_dev+0x322/0x590 [btrfs]
        [359.346778]
        [359.346778] stack backtrace:
        [359.347897] CPU: 0 PID: 20975 Comm: btrfs Not tainted 5.0.0-rc6-default #461
        [359.348983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
        [359.350501] Call Trace:
        [359.350931]  dump_stack+0x67/0x90
        [359.351676]  print_circular_bug.isra.37.cold.56+0x15c/0x195
        [359.353569]  check_prev_add.constprop.44+0x4f9/0x750
        [359.354849]  ? check_prev_add.constprop.44+0x286/0x750
        [359.356505]  __lock_acquire+0xb84/0xf10
        [359.357505]  lock_acquire+0x90/0x180
        [359.358271]  ? flush_workqueue+0x87/0x540
        [359.359098]  flush_workqueue+0xaa/0x540
        [359.359912]  ? flush_workqueue+0x87/0x540
        [359.360740]  ? drain_workqueue+0x1e/0x180
        [359.361565]  ? drain_workqueue+0xa1/0x180
        [359.362391]  drain_workqueue+0xa1/0x180
        [359.363193]  destroy_workqueue+0x17/0x240
        [359.364539]  btrfs_destroy_workqueue+0x57/0x200 [btrfs]
        [359.365673]  scrub_workers_put+0x2c/0x60 [btrfs]
        [359.366618]  btrfs_scrub_dev+0x336/0x590 [btrfs]
        [359.367594]  ? start_transaction+0xa1/0x500 [btrfs]
        [359.368679]  btrfs_dev_replace_by_ioctl.cold.19+0x179/0x1bb [btrfs]
        [359.369545]  btrfs_ioctl+0x28a4/0x2e40 [btrfs]
        [359.370186]  ? __lock_acquire+0x263/0xf10
        [359.370777]  ? kvm_clock_read+0x14/0x30
        [359.371392]  ? kvm_sched_clock_read+0x5/0x10
        [359.372248]  ? sched_clock+0x5/0x10
        [359.372786]  ? sched_clock_cpu+0xc/0xc0
        [359.373662]  ? do_vfs_ioctl+0xa2/0x6d0
        [359.374552]  do_vfs_ioctl+0xa2/0x6d0
        [359.375378]  ? do_sigaction+0xff/0x250
        [359.376233]  ksys_ioctl+0x3a/0x70
        [359.376954]  __x64_sys_ioctl+0x16/0x20
        [359.377772]  do_syscall_64+0x54/0x180
        [359.378841]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [359.380422] RIP: 0033:0x7f5429296a97
      
      Backporting to older kernels: scrub_nocow_workers must be freed the same
      way as the others.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      [ update changelog ]
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      936690bd
    • D
      btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex · ff55333f
      David Sterba 提交于
      [ Upstream commit 0e94c4f45d14cf89d1f40c91b0a8517e791672a7 ]
      
      The scrub context is allocated with GFP_KERNEL and called from
      btrfs_scrub_dev under the fs_info::device_list_mutex. This is not safe
      regarding reclaim that could try to flush filesystem data in order to
      get the memory. And the device_list_mutex is held during superblock
      commit, so this would cause a lockup.
      
      Move the alocation and initialization before any changes that require
      the mutex.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ff55333f
    • D
      btrfs: scrub: pass fs_info to scrub_setup_ctx · 8ba3169d
      David Sterba 提交于
      [ Upstream commit 92f7ba434f51e8e9317f1d166105889aa230abd2 ]
      
      We can pass fs_info directly as this is the only member of btrfs_device
      that's bing used inside scrub_setup_ctx.
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8ba3169d
    • Q
      btrfs: Use real device structure to verify dev extent · be77686f
      Qu Wenruo 提交于
      [ Upstream commit 1b3922a8bc74231f9a767d1be6d9a061a4d4eeab ]
      
      [BUG]
      Linux v5.0-rc1 will fail fstests/btrfs/163 with the following kernel
      message:
      
        BTRFS error (device dm-6): dev extent devid 1 physical offset 13631488 len 8388608 is beyond device boundary 0
        BTRFS error (device dm-6): failed to verify dev extents against chunks: -117
        BTRFS error (device dm-6): open_ctree failed
      
      [CAUSE]
      Commit cf90d884 ("btrfs: Introduce mount time chunk <-> dev extent
      mapping check") introduced strict check on dev extents.
      
      We use btrfs_find_device() with dev uuid and fs uuid set to NULL, and
      only dependent on @devid to find the real device.
      
      For seed devices, we call clone_fs_devices() in open_seed_devices() to
      allow us search seed devices directly.
      
      However clone_fs_devices() just populates devices with devid and dev
      uuid, without populating other essential members, like disk_total_bytes.
      
      This makes any device returned by btrfs_find_device(fs_info, devid,
      NULL, NULL) is just a dummy, with 0 disk_total_bytes, and any dev
      extents on the seed device will not pass the device boundary check.
      
      [FIX]
      This patch will try to verify the device returned by btrfs_find_device()
      and if it's a dummy then re-search in seed devices.
      
      Fixes: cf90d884 ("btrfs: Introduce mount time chunk <-> dev extent mapping check")
      CC: stable@vger.kernel.org # 4.19+
      Reported-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      be77686f
    • Q
      btrfs: volumes: Make sure no dev extent is beyond device boundary · a2790b99
      Qu Wenruo 提交于
      [ Upstream commit 05a37c48604c19b50873fd9663f9140c150469d1 ]
      
      Add extra dev extent end check against device boundary.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a2790b99
    • N
      btrfs: Fix error handling in btrfs_cleanup_ordered_extents · eb124aaa
      Nikolay Borisov 提交于
      [ Upstream commit d1051d6ebf8ef3517a5a3cf82bba8436d190f1c2 ]
      
      Running btrfs/124 in a loop hung up on me sporadically with the
      following call trace:
      
      	btrfs           D    0  5760   5324 0x00000000
      	Call Trace:
      	 ? __schedule+0x243/0x800
      	 schedule+0x33/0x90
      	 btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs]
      	 ? wait_woken+0xa0/0xa0
      	 btrfs_wait_ordered_range+0xbb/0x100 [btrfs]
      	 btrfs_relocate_block_group+0x1ff/0x230 [btrfs]
      	 btrfs_relocate_chunk+0x49/0x100 [btrfs]
      	 btrfs_balance+0xbeb/0x1740 [btrfs]
      	 btrfs_ioctl_balance+0x2ee/0x380 [btrfs]
      	 btrfs_ioctl+0x1691/0x3110 [btrfs]
      	 ? lockdep_hardirqs_on+0xed/0x180
      	 ? __handle_mm_fault+0x8e7/0xfb0
      	 ? _raw_spin_unlock+0x24/0x30
      	 ? __handle_mm_fault+0x8e7/0xfb0
      	 ? do_vfs_ioctl+0xa5/0x6e0
      	 ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
      	 do_vfs_ioctl+0xa5/0x6e0
      	 ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
      	 ksys_ioctl+0x3a/0x70
      	 __x64_sys_ioctl+0x16/0x20
      	 do_syscall_64+0x60/0x1b0
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This happens because during page writeback it's valid for
      writepage_delalloc to instantiate a delalloc range which doesn't belong
      to the page currently being written back.
      
      The reason this case is valid is due to find_lock_delalloc_range
      returning any available range after the passed delalloc_start and
      ignoring whether the page under writeback is within that range.
      
      In turn ordered extents (OE) are always created for the returned range
      from find_lock_delalloc_range. If, however, a failure occurs while OE
      are being created then the clean up code in btrfs_cleanup_ordered_extents
      will be called.
      
      Unfortunately the code in btrfs_cleanup_ordered_extents doesn't consider
      the case of such 'foreign' range being processed and instead it always
      assumes that the range OE are created for belongs to the page. This
      leads to the first page of such foregin range to not be cleaned up since
      it's deliberately missed and skipped by the current cleaning up code.
      
      Fix this by correctly checking whether the current page belongs to the
      range being instantiated and if so adjsut the range parameters passed
      for cleaning up. If it doesn't, then just clean the whole OE range
      directly.
      
      Fixes: 52427260 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      eb124aaa
    • N
      btrfs: Remove extent_io_ops::fill_delalloc · 1669d1d2
      Nikolay Borisov 提交于
      [ Upstream commit 5eaad97af8aeff38debe7d3c69ec3a0d71f8350f ]
      
      This callback is called only from writepage_delalloc which in turn is
      guaranteed to be called from the data page writeout path. In the end
      there is no reason to have the call to this function to be indrected via
      the extent_io_ops structure. This patch removes the callback definition,
      exports the function and calls it directly. No functional changes.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ rename to btrfs_run_delalloc_range ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1669d1d2
    • F
      Btrfs: fix deadlock with memory reclaim during scrub · 338a528b
      Filipe Manana 提交于
      [ Upstream commit a5fb11429167ee6ddeeacc554efaf5776b36433a ]
      
      When a transaction commit starts, it attempts to pause scrub and it blocks
      until the scrub is paused. So while the transaction is blocked waiting for
      scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
      otherwise we risk getting into a deadlock with reclaim.
      
      Checking for scrub pause requests is done early at the beginning of the
      while loop of scrub_stripe() and later in the loop, scrub_extent() and
      scrub_raid56_parity() are called, which in turn call scrub_pages() and
      scrub_pages_for_parity() respectively. These last two functions do memory
      allocations using GFP_KERNEL. Same problem could happen while scrubbing
      the super blocks, since it calls scrub_pages().
      
      We also can not have any of the worker tasks, created by the scrub task,
      doing GFP_KERNEL allocations, because before pausing, the scrub task waits
      for all the worker tasks to complete (also done at scrub_stripe()).
      
      So make sure GFP_NOFS is used for the memory allocations because at any
      time a scrub pause request can happen from another task that started to
      commit a transaction.
      
      Fixes: 58c4e173 ("btrfs: scrub: use GFP_KERNEL on the submission path")
      CC: stable@vger.kernel.org # 4.6+
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      338a528b
    • O
      Btrfs: clean up scrub is_dev_replace parameter · fac80347
      Omar Sandoval 提交于
      [ Upstream commit 32934280967d00dc2b5c4d3b63b21a9c8638326e ]
      
      struct scrub_ctx has an ->is_dev_replace member, so there's no point in
      passing around is_dev_replace where sctx is available.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      fac80347
  2. 10 9月, 2019 5 次提交
    • D
      afs: Fix leak in afs_lookup_cell_rcu() · 1a31b0d0
      David Howells 提交于
      [ Upstream commit a5fb8e6c02d6a518fb2b1a2b8c2471fa77b69436 ]
      
      Fix a leak on the cell refcount in afs_lookup_cell_rcu() due to
      non-clearance of the default error in the case a NULL cell name is passed
      and the workstation default cell is used.
      
      Also put a bit at the end to make sure we don't leak a cell ref if we're
      going to be returning an error.
      
      This leak results in an assertion like the following when the kafs module is
      unloaded:
      
      	AFS: Assertion failed
      	2 == 1 is false
      	0x2 == 0x1 is false
      	------------[ cut here ]------------
      	kernel BUG at fs/afs/cell.c:770!
      	...
      	RIP: 0010:afs_manage_cells+0x220/0x42f [kafs]
      	...
      	 process_one_work+0x4c2/0x82c
      	 ? pool_mayday_timeout+0x1e1/0x1e1
      	 ? do_raw_spin_lock+0x134/0x175
      	 worker_thread+0x336/0x4a6
      	 ? rescuer_thread+0x4af/0x4af
      	 kthread+0x1de/0x1ee
      	 ? kthread_park+0xd4/0xd4
      	 ret_from_fork+0x24/0x30
      
      Fixes: 989782dc ("afs: Overhaul cell database management")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1a31b0d0
    • L
      ceph: fix buffer free while holding i_ceph_lock in fill_inode() · b84817d9
      Luis Henriques 提交于
      [ Upstream commit af8a85a41734f37b67ba8ce69d56b685bee4ac48 ]
      
      Calling ceph_buffer_put() in fill_inode() may result in freeing the
      i_xattrs.blob buffer while holding the i_ceph_lock.  This can be fixed by
      postponing the call until later, when the lock is released.
      
      The following backtrace was triggered by fstests generic/070.
      
        BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
        in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: kworker/0:4
        6 locks held by kworker/0:4/3852:
         #0: 000000004270f6bb ((wq_completion)ceph-msgr){+.+.}, at: process_one_work+0x1b8/0x5f0
         #1: 00000000eb420803 ((work_completion)(&(&con->work)->work)){+.+.}, at: process_one_work+0x1b8/0x5f0
         #2: 00000000be1c53a4 (&s->s_mutex){+.+.}, at: dispatch+0x288/0x1476
         #3: 00000000559cb958 (&mdsc->snap_rwsem){++++}, at: dispatch+0x2eb/0x1476
         #4: 000000000d5ebbae (&req->r_fill_mutex){+.+.}, at: dispatch+0x2fc/0x1476
         #5: 00000000a83d0514 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: fill_inode.isra.0+0xf8/0xf70
        CPU: 0 PID: 3852 Comm: kworker/0:4 Not tainted 5.2.0+ #441
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
        Workqueue: ceph-msgr ceph_con_workfn
        Call Trace:
         dump_stack+0x67/0x90
         ___might_sleep.cold+0x9f/0xb1
         vfree+0x4b/0x60
         ceph_buffer_release+0x1b/0x60
         fill_inode.isra.0+0xa9b/0xf70
         ceph_fill_trace+0x13b/0xc70
         ? dispatch+0x2eb/0x1476
         dispatch+0x320/0x1476
         ? __mutex_unlock_slowpath+0x4d/0x2a0
         ceph_con_workfn+0xc97/0x2ec0
         ? process_one_work+0x1b8/0x5f0
         process_one_work+0x244/0x5f0
         worker_thread+0x4d/0x3e0
         kthread+0x105/0x140
         ? process_one_work+0x5f0/0x5f0
         ? kthread_park+0x90/0x90
         ret_from_fork+0x3a/0x50
      Signed-off-by: NLuis Henriques <lhenriques@suse.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b84817d9
    • L
      ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob() · 5cd1e355
      Luis Henriques 提交于
      [ Upstream commit 12fe3dda7ed89c95cc0ef7abc001ad1ad3e092f8 ]
      
      Calling ceph_buffer_put() in __ceph_build_xattrs_blob() may result in
      freeing the i_xattrs.blob buffer while holding the i_ceph_lock.  This can
      be fixed by having this function returning the old blob buffer and have
      the callers of this function freeing it when the lock is released.
      
      The following backtrace was triggered by fstests generic/117.
      
        BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
        in_atomic(): 1, irqs_disabled(): 0, pid: 649, name: fsstress
        4 locks held by fsstress/649:
         #0: 00000000a7478e7e (&type->s_umount_key#19){++++}, at: iterate_supers+0x77/0xf0
         #1: 00000000f8de1423 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: ceph_check_caps+0x7b/0xc60
         #2: 00000000562f2b27 (&s->s_mutex){+.+.}, at: ceph_check_caps+0x3bd/0xc60
         #3: 00000000f83ce16a (&mdsc->snap_rwsem){++++}, at: ceph_check_caps+0x3ed/0xc60
        CPU: 1 PID: 649 Comm: fsstress Not tainted 5.2.0+ #439
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x67/0x90
         ___might_sleep.cold+0x9f/0xb1
         vfree+0x4b/0x60
         ceph_buffer_release+0x1b/0x60
         __ceph_build_xattrs_blob+0x12b/0x170
         __send_cap+0x302/0x540
         ? __lock_acquire+0x23c/0x1e40
         ? __mark_caps_flushing+0x15c/0x280
         ? _raw_spin_unlock+0x24/0x30
         ceph_check_caps+0x5f0/0xc60
         ceph_flush_dirty_caps+0x7c/0x150
         ? __ia32_sys_fdatasync+0x20/0x20
         ceph_sync_fs+0x5a/0x130
         iterate_supers+0x8f/0xf0
         ksys_sync+0x4f/0xb0
         __ia32_sys_sync+0xa/0x10
         do_syscall_64+0x50/0x1c0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7fc6409ab617
      Signed-off-by: NLuis Henriques <lhenriques@suse.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      5cd1e355
    • L
      ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr() · dfb8712c
      Luis Henriques 提交于
      [ Upstream commit 86968ef21596515958d5f0a40233d02be78ecec0 ]
      
      Calling ceph_buffer_put() in __ceph_setxattr() may end up freeing the
      i_xattrs.prealloc_blob buffer while holding the i_ceph_lock.  This can be
      fixed by postponing the call until later, when the lock is released.
      
      The following backtrace was triggered by fstests generic/117.
      
        BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
        in_atomic(): 1, irqs_disabled(): 0, pid: 650, name: fsstress
        3 locks held by fsstress/650:
         #0: 00000000870a0fe8 (sb_writers#8){.+.+}, at: mnt_want_write+0x20/0x50
         #1: 00000000ba0c4c74 (&type->i_mutex_dir_key#6){++++}, at: vfs_setxattr+0x55/0xa0
         #2: 000000008dfbb3f2 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: __ceph_setxattr+0x297/0x810
        CPU: 1 PID: 650 Comm: fsstress Not tainted 5.2.0+ #437
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x67/0x90
         ___might_sleep.cold+0x9f/0xb1
         vfree+0x4b/0x60
         ceph_buffer_release+0x1b/0x60
         __ceph_setxattr+0x2b4/0x810
         __vfs_setxattr+0x66/0x80
         __vfs_setxattr_noperm+0x59/0xf0
         vfs_setxattr+0x81/0xa0
         setxattr+0x115/0x230
         ? filename_lookup+0xc9/0x140
         ? rcu_read_lock_sched_held+0x74/0x80
         ? rcu_sync_lockdep_assert+0x2e/0x60
         ? __sb_start_write+0x142/0x1a0
         ? mnt_want_write+0x20/0x50
         path_setxattr+0xba/0xd0
         __x64_sys_lsetxattr+0x24/0x30
         do_syscall_64+0x50/0x1c0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7ff23514359a
      Signed-off-by: NLuis Henriques <lhenriques@suse.com>
      Reviewed-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      dfb8712c
    • D
      vfs: fix page locking deadlocks when deduping files · ac3cc25f
      Darrick J. Wong 提交于
      [ Upstream commit edc58dd0123b552453a74369bd0c8d890b497b4b ]
      
      When dedupe wants to use the page cache to compare parts of two files
      for dedupe, we must be very careful to handle locking correctly.  The
      current code doesn't do this.  It must lock and unlock the page only
      once if the two pages are the same, since the overlapping range check
      doesn't catch this when blocksize < pagesize.  If the pages are distinct
      but from the same file, we must observe page locking order and lock them
      in order of increasing offset to avoid clashing with writeback locking.
      
      Fixes: 876bec6f ("vfs: refactor clone/dedupe_file_range common functions")
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ac3cc25f
  3. 06 9月, 2019 8 次提交
  4. 29 8月, 2019 4 次提交