1. 27 10月, 2018 7 次提交
  2. 23 10月, 2018 11 次提交
    • C
      f2fs: fix to keep project quota consistent · 78130819
      Chao Yu 提交于
      This patch does below changes to keep consistence of project quota data
      in sudden power-cut case:
      - update inode.i_projid and project quota atomically under lock_op() in
      f2fs_ioc_setproject()
      - recover inode.i_projid and project quota in recover_inode()
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      78130819
    • C
      f2fs: guarantee journalled quota data by checkpoint · af033b2a
      Chao Yu 提交于
      For journalled quota mode, let checkpoint to flush dquot dirty data
      and quota file data to guarntee persistence of all quota sysfile in
      last checkpoint, by this way, we can avoid corrupting quota sysfile
      when encountering SPO.
      
      The implementation is as below:
      
      1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
      cached dquot metadata changes in quota subsystem, and later checkpoint
      should:
       a) flush dquot metadata into quota file.
       b) flush quota file to storage to keep file usage be consistent.
      
      2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
      operation failed due to -EIO or -ENOSPC, so later,
       a) checkpoint will skip syncing dquot metadata.
       b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
          hint for fsck repairing.
      
      3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
      data updating is very heavy, it may cause hungtask in block_operation().
      To avoid this, if our retry time exceed threshold, let's just skip
      flushing and retry in next checkpoint().
      Signed-off-by: NWeichao Guo <guoweichao@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: avoid warnings and set fsck flag]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      af033b2a
    • S
      f2fs: cleanup dirty pages if recover failed · 26b5a079
      Sheng Yong 提交于
      During recover, we will try to create new dentries for inodes with
      dentry_mark. But if the parent is missing (e.g. killed by fsck),
      recover will break. But those recovered dirty pages are not cleanup.
      This will hit f2fs_bug_on:
      
      [   53.519566] F2FS-fs (loop0): Found nat_bits in checkpoint
      [   53.539354] F2FS-fs (loop0): recover_inode: ino = 5, name = file, inline = 3
      [   53.539402] F2FS-fs (loop0): recover_dentry: ino = 5, name = file, dir = 0, err = -2
      [   53.545760] F2FS-fs (loop0): Cannot recover all fsync data errno=-2
      [   53.546105] F2FS-fs (loop0): access invalid blkaddr:4294967295
      [   53.546171] WARNING: CPU: 1 PID: 1798 at fs/f2fs/checkpoint.c:163 f2fs_is_valid_blkaddr+0x26c/0x320
      [   53.546174] Modules linked in:
      [   53.546183] CPU: 1 PID: 1798 Comm: mount Not tainted 4.19.0-rc2+ #1
      [   53.546186] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   53.546191] RIP: 0010:f2fs_is_valid_blkaddr+0x26c/0x320
      [   53.546195] Code: 85 bb 00 00 00 48 89 df 88 44 24 07 e8 ad a8 db ff 48 8b 3b 44 89 e1 48 c7 c2 40 03 72 a9 48 c7 c6 e0 01 72 a9 e8 84 3c ff ff <0f> 0b 0f b6 44 24 07 e9 8a 00 00 00 48 8d bf 38 01 00 00 e8 7c a8
      [   53.546201] RSP: 0018:ffff88006c067768 EFLAGS: 00010282
      [   53.546208] RAX: 0000000000000000 RBX: ffff880068844200 RCX: ffffffffa83e1a33
      [   53.546211] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88006d51e590
      [   53.546215] RBP: 0000000000000005 R08: ffffed000daa3cb3 R09: ffffed000daa3cb3
      [   53.546218] R10: 0000000000000001 R11: ffffed000daa3cb2 R12: 00000000ffffffff
      [   53.546221] R13: ffff88006a1f8000 R14: 0000000000000200 R15: 0000000000000009
      [   53.546226] FS:  00007fb2f3646840(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
      [   53.546229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   53.546234] CR2: 00007f0fd77f0008 CR3: 00000000687e6002 CR4: 00000000000206e0
      [   53.546237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   53.546240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   53.546242] Call Trace:
      [   53.546248]  f2fs_submit_page_bio+0x95/0x740
      [   53.546253]  read_node_page+0x161/0x1e0
      [   53.546271]  ? truncate_node+0x650/0x650
      [   53.546283]  ? add_to_page_cache_lru+0x12c/0x170
      [   53.546288]  ? pagecache_get_page+0x262/0x2d0
      [   53.546292]  __get_node_page+0x200/0x660
      [   53.546302]  f2fs_update_inode_page+0x4a/0x160
      [   53.546306]  f2fs_write_inode+0x86/0xb0
      [   53.546317]  __writeback_single_inode+0x49c/0x620
      [   53.546322]  writeback_single_inode+0xe4/0x1e0
      [   53.546326]  sync_inode_metadata+0x93/0xd0
      [   53.546330]  ? sync_inode+0x10/0x10
      [   53.546342]  ? do_raw_spin_unlock+0xed/0x100
      [   53.546347]  f2fs_sync_inode_meta+0xe0/0x130
      [   53.546351]  f2fs_fill_super+0x287d/0x2d10
      [   53.546367]  ? vsnprintf+0x742/0x7a0
      [   53.546372]  ? f2fs_commit_super+0x180/0x180
      [   53.546379]  ? up_write+0x20/0x40
      [   53.546385]  ? set_blocksize+0x5f/0x140
      [   53.546391]  ? f2fs_commit_super+0x180/0x180
      [   53.546402]  mount_bdev+0x181/0x200
      [   53.546406]  mount_fs+0x94/0x180
      [   53.546411]  vfs_kern_mount+0x6c/0x1e0
      [   53.546415]  do_mount+0xe5e/0x1510
      [   53.546420]  ? fs_reclaim_release+0x9/0x30
      [   53.546424]  ? copy_mount_string+0x20/0x20
      [   53.546428]  ? fs_reclaim_acquire+0xd/0x30
      [   53.546435]  ? __might_sleep+0x2c/0xc0
      [   53.546440]  ? ___might_sleep+0x53/0x170
      [   53.546453]  ? __might_fault+0x4c/0x60
      [   53.546468]  ? _copy_from_user+0x95/0xa0
      [   53.546474]  ? memdup_user+0x39/0x60
      [   53.546478]  ksys_mount+0x88/0xb0
      [   53.546482]  __x64_sys_mount+0x5d/0x70
      [   53.546495]  do_syscall_64+0x65/0x130
      [   53.546503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   53.547639] ---[ end trace b804d1ea2fec893e ]---
      
      So if recover fails, we need to drop all recovered data.
      Signed-off-by: NSheng Yong <shengyong1@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      26b5a079
    • S
      f2fs: fix data corruption issue with hardware encryption · 1e78e8bd
      Sahitya Tummala 提交于
      Direct IO can be used in case of hardware encryption. The following
      scenario results into data corruption issue in this path -
      
      Thread A -                          Thread B-
      -> write file#1 in direct IO
                                          -> GC gets kicked in
                                          -> GC submitted bio on meta mapping
      				       for file#1, but pending completion
      -> write file#1 again with new data
         in direct IO
                                          -> GC bio gets completed now
                                          -> GC writes old data to the new
                                             location and thus file#1 is
      				       corrupted.
      
      Fix this by submitting and waiting for pending io on meta mapping
      for direct IO case in f2fs_map_blocks().
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1e78e8bd
    • C
      f2fs: fix to recover inode->i_flags of inode block during POR · 0c093b59
      Chao Yu 提交于
      Testcase to reproduce this bug:
      1. mkfs.f2fs /dev/sdd
      2. mount -t f2fs /dev/sdd /mnt/f2fs
      3. touch /mnt/f2fs/file
      4. sync
      5. chattr +a /mnt/f2fs/file
      6. xfs_io -a /mnt/f2fs/file -c "fsync"
      7. godown /mnt/f2fs
      8. umount /mnt/f2fs
      9. mount -t f2fs /dev/sdd /mnt/f2fs
      10. xfs_io /mnt/f2fs/file
      
      There is no error when opening this file w/o O_APPEND, but actually,
      we expect the correct result should be:
      
      /mnt/f2fs/file: Operation not permitted
      
      The root cause is, in recover_inode(), we recover inode->i_flags more
      than F2FS_I(inode)->i_flags, so fix it.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0c093b59
    • C
      f2fs: spread f2fs_set_inode_flags() · 9149a5eb
      Chao Yu 提交于
      This patch changes codes as below:
      - use f2fs_set_inode_flags() to update i_flags atomically to avoid
      potential race.
      - synchronize F2FS_I(inode)->i_flags to inode->i_flags in
      f2fs_new_inode().
      - use f2fs_set_inode_flags() to simply codes in f2fs_quota_{on,off}.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      9149a5eb
    • C
      f2fs: fix to spread clear_cold_data() · 2baf0781
      Chao Yu 提交于
      We need to drop PG_checked flag on page as well when we clear PG_uptodate
      flag, in order to avoid treating the page as GCing one later.
      Signed-off-by: NWeichao Guo <guoweichao@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2baf0781
    • J
      Revert "f2fs: fix to clear PG_checked flag in set_page_dirty()" · 164a63fa
      Jaegeuk Kim 提交于
      This reverts commit 66110abc.
      
      If we clear the cold data flag out of the writeback flow, we can miscount
      -1 by end_io, which incurs a deadlock caused by all I/Os being blocked during
      heavy GC.
      
      Balancing F2FS Async:
       - IO (CP:    1, Data:   -1, Flush: (   0    0    1), Discard: (   ...
      
      GC thread:                              IRQ
      - move_data_page()
       - set_page_dirty()
        - clear_cold_data()
                                              - f2fs_write_end_io()
                                               - type = WB_DATA_TYPE(page);
                                                 here, we get wrong type
                                               - dec_page_count(sbi, type);
       - f2fs_wait_on_page_writeback()
      
      Cc: <stable@vger.kernel.org>
      Reported-and-Tested-by: NPark Ju Hyung <qkrwngud825@gmail.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      164a63fa
    • J
      f2fs: account read IOs and use IO counts for is_idle · 5f9abab4
      Jaegeuk Kim 提交于
      This patch adds issued read IO counts which is under block layer.
      
      Chao modified a bit, since:
      
      Below race can cause reversed reference on F2FS_RD_DATA, there is
      the same issue in f2fs_submit_page_bio(), fix them by relocate
      __submit_bio() and inc_page_count.
      
      Thread A			Thread B
      - f2fs_write_begin
       - f2fs_submit_page_read
       - __submit_bio
      				- f2fs_read_end_io
      				 - __read_end_io
      				 - dec_page_count(, F2FS_RD_DATA)
       - inc_page_count(, F2FS_RD_DATA)
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5f9abab4
    • C
      f2fs: fix to account IO correctly for cgroup writeback · 78efac53
      Chao Yu 提交于
      Now, we have supported cgroup writeback, it depends on correctly IO
      account of specified filesystem.
      
      But in commit d1b3e72d ("f2fs: submit bio of in-place-update pages"),
      we split write paths from f2fs_submit_page_mbio() to two:
      - f2fs_submit_page_bio() for IPU path
      - f2fs_submit_page_bio() for OPU path
      
      But still we account write IO only in f2fs_submit_page_mbio(), result in
      incorrect IO account, fix it by adding missing IO account in IPU path.
      
      Fixes: d1b3e72d ("f2fs: submit bio of in-place-update pages")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      78efac53
    • C
      f2fs: fix to account IO correctly · 4c58ed07
      Chao Yu 提交于
      Below race can cause reversed reference on dirty count, fix it by
      relocating __submit_bio() and inc_page_count().
      
      Thread A				Thread B
      - f2fs_inplace_write_data
       - f2fs_submit_page_bio
        - __submit_bio
      					- f2fs_write_end_io
      					 - dec_page_count
        - inc_page_count
      
      Cc: <stable@vger.kernel.org>
      Fixes: d1b3e72d ("f2fs: submit bio of in-place-update pages")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4c58ed07
  3. 22 10月, 2018 4 次提交
  4. 20 10月, 2018 1 次提交
  5. 19 10月, 2018 4 次提交
  6. 18 10月, 2018 13 次提交
    • E
      fscache: Fix out of bound read in long cookie keys · fa520c47
      Eric Sandeen 提交于
      fscache_set_key() can incur an out-of-bounds read, reported by KASAN:
      
       BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
       Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615
      
      and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236
      
        BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
        BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
        Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466
      
      This happens for any index_key_len which is not divisible by 4 and is
      larger than the size of the inline key, because the code allocates exactly
      index_key_len for the key buffer, but the hashing loop is stepping through
      it 4 bytes (u32) at a time in the buf[] array.
      
      Fix this by calculating how many u32 buffers we'll need by using
      DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
      buffer to hold the index_key, then using that same count as the hashing
      index limit.
      
      Fixes: ec0328e4 ("fscache: Maintain a catalogue of allocated cookies")
      Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa520c47
    • D
      fscache: Fix incomplete initialisation of inline key space · 1ff22883
      David Howells 提交于
      The inline key in struct rxrpc_cookie is insufficiently initialized,
      zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
      bytes will end up hashing uninitialized memory because the memcpy only
      partially fills the last buf[] element.
      
      Fix this by clearing fscache_cookie objects on allocation rather than using
      the slab constructor to initialise them.  We're going to pretty much fill
      in the entire struct anyway, so bringing it into our dcache writably
      shouldn't incur much overhead.
      
      This removes the need to do clearance in fscache_set_key() (where we aren't
      doing it correctly anyway).
      
      Also, we don't need to set cookie->key_len in fscache_set_key() as we
      already did it in the only caller, so remove that.
      
      Fixes: ec0328e4 ("fscache: Maintain a catalogue of allocated cookies")
      Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
      Reported-by: NEric Sandeen <sandeen@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ff22883
    • A
      cachefiles: fix the race between cachefiles_bury_object() and rmdir(2) · 169b8033
      Al Viro 提交于
      the victim might've been rmdir'ed just before the lock_rename();
      unlike the normal callers, we do not look the source up after the
      parents are locked - we know it beforehand and just recheck that it's
      still the child of what used to be its parent.  Unfortunately,
      the check is too weak - we don't spot a dead directory since its
      ->d_parent is unchanged, dentry is positive, etc.  So we sail all
      the way to ->rename(), with hosting filesystems _not_ expecting
      to be asked renaming an rmdir'ed subdirectory.
      
      The fix is easy, fortunately - the lock on parent is sufficient for
      making IS_DEADDIR() on child safe.
      
      Cc: stable@vger.kernel.org
      Fixes: 9ae326a6 (CacheFiles: A cache that backs onto a mounted filesystem)
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      169b8033
    • C
      xfs: cancel COW blocks before swapext · 96987eea
      Christoph Hellwig 提交于
      We need to make sure we have no outstanding COW blocks before we swap
      extents, as there is nothing preventing us from having preallocated COW
      delalloc on either inode that swapext is called on.  That case can
      easily be reproduced by running generic/324 in always_cow mode:
      
      [  620.760572] XFS: Assertion failed: tip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap_util.c, line: 1669
      [  620.761608] ------------[ cut here ]------------
      [  620.762171] kernel BUG at fs/xfs/xfs_message.c:102!
      [  620.762732] invalid opcode: 0000 [#1] SMP PTI
      [  620.763272] CPU: 0 PID: 24153 Comm: xfs_fsr Tainted: G        W         4.19.0-rc1+ #4182
      [  620.764203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
      [  620.765202] RIP: 0010:assfail+0x20/0x28
      [  620.765646] Code: 31 ff e8 83 fc ff ff 0f 0b c3 48 89 f1 41 89 d0 48 c7 c6 48 ca 8d 82 48 89 fa 38
      [  620.767758] RSP: 0018:ffffc9000898bc10 EFLAGS: 00010202
      [  620.768359] RAX: 0000000000000000 RBX: ffff88012f14ba40 RCX: 0000000000000000
      [  620.769174] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828560d9
      [  620.769982] RBP: ffff88012f14b300 R08: 0000000000000000 R09: 0000000000000000
      [  620.770788] R10: 000000000000000a R11: f000000000000000 R12: ffffc9000898bc98
      [  620.771638] R13: ffffc9000898bc9c R14: ffff880130b5e2b8 R15: ffff88012a1fa2a8
      [  620.772504] FS:  00007fdc36e0fbc0(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
      [  620.773475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  620.774168] CR2: 00007fdc3604d000 CR3: 0000000132afc000 CR4: 00000000000006f0
      [  620.774978] Call Trace:
      [  620.775274]  xfs_swap_extent_forks+0x2a0/0x2e0
      [  620.775792]  xfs_swap_extents+0x38b/0xab0
      [  620.776256]  xfs_ioc_swapext+0x121/0x140
      [  620.776709]  xfs_file_ioctl+0x328/0xc90
      [  620.777154]  ? rcu_read_lock_sched_held+0x50/0x60
      [  620.777694]  ? xfs_iunlock+0x233/0x260
      [  620.778127]  ? xfs_setattr_nonsize+0x3be/0x6a0
      [  620.778647]  do_vfs_ioctl+0x9d/0x680
      [  620.779071]  ? ksys_fchown+0x47/0x80
      [  620.779552]  ksys_ioctl+0x35/0x70
      [  620.780040]  __x64_sys_ioctl+0x11/0x20
      [  620.780530]  do_syscall_64+0x4b/0x190
      [  620.780927]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  620.781467] RIP: 0033:0x7fdc364d0f07
      [  620.781900] Code: b3 66 90 48 8b 05 81 5f 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 28
      [  620.784044] RSP: 002b:00007ffe2a766038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  620.784896] RAX: ffffffffffffffda RBX: 0000000000000025 RCX: 00007fdc364d0f07
      [  620.785667] RDX: 0000560296ca2fc0 RSI: 00000000c0c0586d RDI: 0000000000000005
      [  620.786398] RBP: 0000000000000025 R08: 0000000000001200 R09: 0000000000000000
      [  620.787283] R10: 0000000000000432 R11: 0000000000000246 R12: 0000000000000005
      [  620.788051] R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000006
      [  620.788927] Modules linked in:
      [  620.789340] ---[ end trace 9503b7417ffdbdb0 ]---
      [  620.790065] RIP: 0010:assfail+0x20/0x28
      [  620.790642] Code: 31 ff e8 83 fc ff ff 0f 0b c3 48 89 f1 41 89 d0 48 c7 c6 48 ca 8d 82 48 89 fa 38
      [  620.793038] RSP: 0018:ffffc9000898bc10 EFLAGS: 00010202
      [  620.793609] RAX: 0000000000000000 RBX: ffff88012f14ba40 RCX: 0000000000000000
      [  620.794317] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828560d9
      [  620.795025] RBP: ffff88012f14b300 R08: 0000000000000000 R09: 0000000000000000
      [  620.795778] R10: 000000000000000a R11: f000000000000000 R12: ffffc9000898bc98
      [  620.796675] R13: ffffc9000898bc9c R14: ffff880130b5e2b8 R15: ffff88012a1fa2a8
      [  620.797782] FS:  00007fdc36e0fbc0(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
      [  620.798908] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  620.799594] CR2: 00007fdc3604d000 CR3: 0000000132afc000 CR4: 00000000000006f0
      [  620.800424] Kernel panic - not syncing: Fatal exception
      [  620.801191] Kernel Offset: disabled
      [  620.801597] ---[ end Kernel panic - not syncing: Fatal exception ]---
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      96987eea
    • B
      xfs: clear ail delwri queued bufs on unmount of shutdown fs · efc3289c
      Brian Foster 提交于
      In the typical unmount case, the AIL is forced out by the unmount
      sequence before the xfsaild task is stopped. Since AIL items are
      removed on writeback completion, this means that the AIL
      ->ail_buf_list delwri queue has been drained. This is not always
      true in the shutdown case, however.
      
      It's possible for buffers to sit on a delwri queue for a period of
      time across submission attempts if said items are locked or have
      been relogged and pinned since first added to the queue. If the
      attempt to log such an item results in a log I/O error, the error
      processing can shutdown the fs, remove the item from the AIL, stale
      the buffer (dropping the LRU reference) and clear its delwri queue
      state. The latter bit means the buffer will be released from a
      delwri queue on the next submission attempt, but this might never
      occur if the filesystem has shutdown and the AIL is empty.
      
      This means that such buffers are held indefinitely by the AIL delwri
      queue across destruction of the AIL. Aside from being a memory leak,
      these buffers can also hold references to in-core perag structures.
      The latter problem manifests as a generic/475 failure, reproducing
      the following asserts at unmount time:
      
        XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
      	file: fs/xfs/xfs_mount.c, line: 151
        XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
      	file: fs/xfs/xfs_mount.c, line: 132
      
      To prevent this problem, clear the AIL delwri queue as a final step
      before xfsaild() exit. The !empty state should never occur in the
      normal case, so add an assert to catch unexpected problems going
      forward.
      
      [dgc: add comment explaining need for xfs_buf_delwri_cancel() after
       calling xfs_buf_delwri_submit_nowait().]
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      efc3289c
    • C
      xfs: use offsetof() in place of offset macros for __xfsstats · 26ca3901
      Carlos Maiolino 提交于
      Most offset macro mess is used in xfs_stats_format() only, and we can
      simply get the right offsets using offsetof(), instead of several macros
      to mark the offsets inside __xfsstats structure.
      
      Replace all XFSSTAT_END_* macros by a single helper macro to get the
      right offset into __xfsstats, and use this helper in xfs_stats_format()
      directly.
      
      The quota stats code, still looks a bit cleaner when using XFSSTAT_*
      macros, so, this patch also defines XFSSTAT_START_XQMSTAT and
      XFSSTAT_END_XQMSTAT locally to that code. This also should prevent
      offset mistakes when updates are done into __xfsstats.
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      26ca3901
    • C
      xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat · 41657e55
      Carlos Maiolino 提交于
      The addition of FIBT, RMAP and REFCOUNT changed the offsets into
      __xfssats structure.
      
      This caused xqmstat_proc_show() to display garbage data via
      /proc/fs/xfs/xqmstat, once it relies on the offsets marked via macros.
      
      Fix it.
      
      Fixes: 00f4e4f9 xfs: add rmap btree stats infrastructure
      Fixes: aafc3c24 xfs: support the XFS_BTNUM_FINOBT free inode btree type
      Fixes: 46eeb521 xfs: introduce refcount btree definitions
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      41657e55
    • D
      xfs: fix use-after-free race in xfs_buf_rele · 37fd1678
      Dave Chinner 提交于
      When looking at a 4.18 based KASAN use after free report, I noticed
      that racing xfs_buf_rele() may race on dropping the last reference
      to the buffer and taking the buffer lock. This was the symptom
      displayed by the KASAN report, but the actual issue that was
      reported had already been fixed in 4.19-rc1 by commit e339dd8d
      ("xfs: use sync buffer I/O for sync delwri queue submission").
      
      Despite this, I think there is still an issue with xfs_buf_rele()
      in this code:
      
              release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
              spin_lock(&bp->b_lock);
              if (!release) {
      .....
      
      If two threads race on the b_lock after both dropping a reference
      and one getting dropping the last reference so release = true, we
      end up with:
      
      CPU 0				CPU 1
      atomic_dec_and_lock()
      				atomic_dec_and_lock()
      				spin_lock(&bp->b_lock)
      spin_lock(&bp->b_lock)
      <spins>
      				<release = true bp->b_lru_ref = 0>
      				<remove from lists>
      				freebuf = true
      				spin_unlock(&bp->b_lock)
      				xfs_buf_free(bp)
      <gets lock, reading and writing freed memory>
      <accesses freed memory>
      spin_unlock(&bp->b_lock) <reads/writes freed memory>
      
      IOWs, we can't safely take bp->b_lock after dropping the hold
      reference because the buffer may go away at any time after we
      drop that reference. However, this can be fixed simply by taking the
      bp->b_lock before we drop the reference.
      
      It is safe to nest the pag_buf_lock inside bp->b_lock as the
      pag_buf_lock is only used to serialise against lookup in
      xfs_buf_find() and no other locks are held over or under the
      pag_buf_lock there. Make this clear by documenting the buffer lock
      orders at the top of the file.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      37fd1678
    • A
      xfs: Add attibute remove and helper functions · 068f985a
      Allison Henderson 提交于
      This patch adds xfs_attr_remove_args. These sub-routines remove
      the attributes specified in @args. We will use this later for setting
      parent pointers as a deferred attribute operation.
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      068f985a
    • A
      xfs: Add attibute set and helper functions · 2f3cd809
      Allison Henderson 提交于
      This patch adds xfs_attr_set_args and xfs_bmap_set_attrforkoff.
      These sub-routines set the attributes specified in @args.
      We will use this later for setting parent pointers as a deferred
      attribute operation.
      
      [dgc: remove attr fork init code from xfs_attr_set_args().]
      [dgc: xfs_attr_try_sf_addname() NULLs args.trans after commit.]
      [dgc: correct sf add error handling.]
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2f3cd809
    • A
      xfs: Add helper function xfs_attr_try_sf_addname · 4c74a56b
      Allison Henderson 提交于
      This patch adds a subroutine xfs_attr_try_sf_addname
      used by xfs_attr_set.  This subrotine will attempt to
      add the attribute name specified in args in shortform,
      as well and perform error handling previously done in
      xfs_attr_set.
      
      This patch helps to pre-simplify xfs_attr_set for reviewing
      purposes and reduce indentation.  New function will be added
      in the next patch.
      
      [dgc: moved commit to helper function, too.]
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4c74a56b
    • A
      xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h · e2421f0b
      Allison Henderson 提交于
      This patch moves fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
      since xfs_attr.c is in libxfs.  We will need these later in
      xfsprogs.
      Signed-off-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e2421f0b
    • D
      xfs: issue log message on user force shutdown · 56668a5c
      Dave Chinner 提交于
      The kernel only issues a log message that it's been shut down when
      the filesystem triggers a shutdown itself. Hence there is no trace
      in the log when a shutdown is triggered manually from userspace.
      This can make it hard to see sequence of events in the log when
      things go wrong, so make sure we always log a message when a
      shutdown is run.
      
      While there, clean up the logic flow so we don't have to continually
      check if the shutdown trigger was user initiated before logging
      shutdown messages.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      56668a5c