- 16 5月, 2018 3 次提交
-
-
由 Dave Chinner 提交于
We recently had an oops reported on a 4.14 kernel in xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage and so the m_perag_tree lookup walked into lala land. Essentially, the machine was under memory pressure when the mount was being run, xfs_fs_fill_super() failed after allocating the xfs_mount and attaching it to sb->s_fs_info. It then cleaned up and freed the xfs_mount, but the sb->s_fs_info field still pointed to the freed memory. Hence when the superblock shrinker then ran it fell off the bad pointer. With the superblock shrinker problem fixed at teh VFS level, this stale s_fs_info pointer is still a problem - we use it unconditionally in ->put_super when the superblock is being torn down, and hence we can still trip over it after a ->fill_super call failure. Hence we need to clear s_fs_info if xfs-fs_fill_super() fails, and we need to check if it's valid in the places it can potentially be dereferenced after a ->fill_super failure. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
Similar to log_recovery_delay, this delay occurs between the VFS superblock being initialised and the xfs_mount being fully initialised. It also poisons the per-ag radix tree node so that it can be used for triggering shrinker races during mount such as the following: <run memory pressure workload in background> $ cat dirty-mount.sh #! /bin/bash umount -f /dev/pmem0 mkfs.xfs -f /dev/pmem0 mount /dev/pmem0 /mnt/test rm -f /mnt/test/foo xfs_io -fxc "pwrite 0 4k" -c fsync -c "shutdown" /mnt/test/foo umount /dev/pmem0 # let's crash it now! echo 30 > /sys/fs/xfs/debug/mount_delay mount /dev/pmem0 /mnt/test echo 0 > /sys/fs/xfs/debug/mount_delay umount /dev/pmem0 $ sudo ./dirty-mount.sh ..... [ 60.378118] CPU: 3 PID: 3577 Comm: fs_mark Tainted: G D W 4.16.0-rc5-dgc #440 [ 60.378120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 60.378124] RIP: 0010:radix_tree_next_chunk+0x76/0x320 [ 60.378127] RSP: 0018:ffffc9000276f4f8 EFLAGS: 00010282 [ 60.383670] RAX: a5a5a5a5a5a5a5a4 RBX: 0000000000000010 RCX: 000000000000001a [ 60.385277] RDX: 0000000000000000 RSI: ffffc9000276f540 RDI: 0000000000000000 [ 60.386554] RBP: 0000000000000000 R08: 0000000000000000 R09: a5a5a5a5a5a5a5a5 [ 60.388194] R10: 0000000000000006 R11: 0000000000000001 R12: ffffc9000276f598 [ 60.389288] R13: 0000000000000040 R14: 0000000000000228 R15: ffff880816cd6458 [ 60.390827] FS: 00007f5c124b9740(0000) GS:ffff88083fc00000(0000) knlGS:0000000000000000 [ 60.392253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 60.393423] CR2: 00007f5c11bba0b8 CR3: 000000035580e001 CR4: 00000000000606e0 [ 60.394519] Call Trace: [ 60.395252] radix_tree_gang_lookup_tag+0xc4/0x130 [ 60.395948] xfs_perag_get_tag+0x37/0xf0 [ 60.396522] xfs_reclaim_inodes_count+0x32/0x40 [ 60.397178] xfs_fs_nr_cached_objects+0x11/0x20 [ 60.397837] super_cache_count+0x35/0xc0 [ 60.399159] shrink_slab.part.66+0xb1/0x370 [ 60.400194] shrink_node+0x7e/0x1a0 [ 60.401058] try_to_free_pages+0x199/0x470 [ 60.402081] __alloc_pages_slowpath+0x3a1/0xd20 [ 60.403729] __alloc_pages_nodemask+0x1c3/0x200 [ 60.404941] cache_grow_begin+0x20b/0x2e0 [ 60.406164] fallback_alloc+0x160/0x200 [ 60.407088] kmem_cache_alloc+0x111/0x4e0 [ 60.408038] ? xfs_buf_rele+0x61/0x430 [ 60.408925] kmem_zone_alloc+0x61/0xe0 [ 60.409965] xfs_inode_alloc+0x24/0x1d0 ..... Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Darrick J. Wong 提交于
Rebuilding the reverse-mapping tree requires us to quiesce all inodes in the filesystem, so we must stop background reclamation of post-EOF and CoW prealloc blocks. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 10 5月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
It's just a connector between a transaction and a log item. There's a 1:1 relationship between a log item descriptor and a log item, and a 1:1 relationship between a log item descriptor and a transaction. Both relationships are created and terminated at the same time, so why do we even have the descriptor? Replace it with a specific list_head in the log item and a new log item dirtied flag to replace the XFS_LID_DIRTY flag. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> [darrick: fix up deferred agfl intent finish_item use of LID_DIRTY] Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 10 4月, 2018 1 次提交
-
-
由 Eric Sandeen 提交于
Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 26 3月, 2018 1 次提交
-
-
由 Brian Foster 提交于
Most of the generic data structures embedded in xfs_mount are dynamically initialized immediately after mp is allocated. A few fields are left out and initialized during the xfs_mountfs() sequence, after mp has been attached to the superblock. To clean this up and help prevent premature access of associated fields, refactor xfs_mount allocation and all dependent init calls into a new helper. This self-documents that all low level data structures (i.e., locks, trees, etc.) should be initialized before xfs_mount is attached to the superblock. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 12 3月, 2018 2 次提交
-
-
由 Darrick J. Wong 提交于
There's no point in allocating a transaction and locking the inode in preparation to clear cow blocks if there actually are any cow fork extents. Therefore, move the xfs_reflink_cancel_cow_range hunk to xfs_inactive and check the cow ifp first. This makes inode reclamation run faster. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Christoph Hellwig 提交于
Use the VFS dirty inode tracking for lazytime inodes only, and just log them in ->dirty_inode. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 27 2月, 2018 1 次提交
-
-
由 Chengguang Xu 提交于
When specifying string type mount option (e.g., logdev) several times in a mount, current option parsing may cause memory leak. Hence, call kfree for previous one in this case. Signed-off-by: NChengguang Xu <cgxu519@icloud.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 02 2月, 2018 3 次提交
-
-
由 Darrick J. Wong 提交于
Reverse mapping has had a while to soak, so remove the experimental tag. Now that we've landed space metadata cross-referencing in scrub, the feature actually has a purpose. Reject rmap filesystems with an rt device until the code to support it is actually implemented. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com>
-
由 Darrick J. Wong 提交于
We don't support realtime filesystems with reflink either, so fail those mounts. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com>
-
由 Darrick J. Wong 提交于
Now that reflink is no longer experimental, reject attempts to mount with DAX until that whole mess gets sorted out. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 29 1月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
But reject reflink + DAX file systems for now until the code to support reflinks on DAX is actually implemented. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> [darrick: port to 4.16] Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 09 1月, 2018 1 次提交
-
-
由 Richard Wareing 提交于
- Reports realtime device free blocks in statfs calls if (realtime) inheritance bit is set on the inode of directory, or realtime flag in the case of files. This is a bit more intuitive, especially for use-cases which are using a much larger device for the realtime device. - Add XFS_IS_REALTIME_MOUNT option to gate based on the existence of a realtime device on the mount, similar to the XFS_IS_REALTIME_INODE option. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NRichard Wareing <rwareing@fb.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 22 12月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
When we're remounting the filesystem readonly, remove all CoW preallocations prior to going ro. If the fs goes down after the ro remount, we never clean up the staging extents, which means xfs_check will trip over them on a subsequent run. Practically speaking, the next mount will clean them up too, so this is unlikely to be seen. Since we shut down the cowblocks cleaner on remount-ro, we also have to make sure we start it back up if/when we remount-rw. Found by adding clonerange to fsstress and running xfs/017. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 28 11月, 2017 1 次提交
-
-
由 Linus Torvalds 提交于
This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 10月, 2017 1 次提交
-
-
由 Matthew Garrett 提交于
[AV: in addition to the fix in previous commit] Signed-off-by: NMatthew Garrett <mjg59@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Reviewed-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 9月, 2017 1 次提交
-
-
由 Kenjiro Nakayama 提交于
xfs: Output warning message when discard option was enabled even though the device does not support discard In order to using discard function, it is necessary that not only xfs is mounted with discard option, but also the discard function is supported by the device. Current code doesn't output any message when users mount with discard option on unsupported device, so it is difficult to notice that it was not enabled actually. This patch adds the warning message to notice that discard option is not enabled due to unsupported device when the filesystem is mounted. Changes in v2 (Suggested by Brian Foster): - Move the unsupported device check into xfs_fs_fill_super(). - Clear the discard flag when device is unsupported. Signed-off-by: NKenjiro Nakayama <nakayamakenjiro@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 04 9月, 2017 1 次提交
-
-
由 Pan Bian 提交于
In function xfs_test_remount_options(), kfree() is used to free memory allocated by kmem_zalloc(). But it is better to use kmem_free(). Signed-off-by: NPan Bian <bianpan2016@163.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 01 9月, 2017 1 次提交
-
-
由 Dan Williams 提交于
The ->iomap_begin() operation is a hot path, so cache the fs_dax_get_by_host() result at mount time to avoid the incurring the hash lookup overhead on a per-i/o basis. Reported-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 17 7月, 2017 1 次提交
-
-
由 David Howells 提交于
Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
- 20 6月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 19 6月, 2017 1 次提交
-
-
由 NeilBrown 提交于
"flags" arguments are often seen as good API design as they allow easy extensibility. bioset_create_nobvec() is implemented internally as a variation in flags passed to __bioset_create(). To support future extension, make the internal structure part of the API. i.e. add a 'flags' argument to bioset_create() and discard bioset_create_nobvec(). Note that the bio_split allocations in drivers/md/raid* do not need the bvec mempool - they should have used bioset_create_nobvec(). Suggested-by: NChristoph Hellwig <hch@infradead.org> Reviewed-by: NChristoph Hellwig <hch@infradead.org> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 5月, 2017 1 次提交
-
-
由 Dan Williams 提交于
For configurations that do not enable DAX filesystems or drivers, do not require the DAX core to be built. Given that the 'direct_access' method has been removed from 'block_device_operations', we can also go ahead and remove the block-related dax helper functions from fs/block_dev.c to drivers/dax/super.c. This keeps dax details out of the block layer and lets the DAX core be built as a module in the FS_DAX=n case. Filesystems need to include dax.h to call bdev_dax_supported(). Cc: linux-xfs@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: NJan Kara <jack@suse.com> Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 04 4月, 2017 1 次提交
-
-
由 Brian Foster 提交于
The log covering background task used to be part of the xfssyncd workqueue. That workqueue was removed as of commit 5889608d ("xfs: syncd workqueue is no more") and the associated work item scheduled to the xfs-log wq. The latter is used for log buffer I/O completion. Since xfs_log_worker() can invoke a log flush, a deadlock is possible between the xfs-log and xfs-cil workqueues. Consider the following codepath from xfs_log_worker(): xfs_log_worker() xfs_log_force() _xfs_log_force() xlog_cil_force() xlog_cil_force_lsn() xlog_cil_push_now() flush_work() The above is in xfs-log wq context and blocked waiting on the completion of an xfs-cil work item. Concurrently, the cil push in progress can end up blocked here: xlog_cil_push_work() xlog_cil_push() xlog_write() xlog_state_get_iclog_space() xlog_wait(&log->l_flush_wait, ...) The above is in xfs-cil context waiting on log buffer I/O completion, which executes in xfs-log wq context. In this scenario both workqueues are deadlocked waiting on eachother. Add a new workqueue specifically for the high level log covering and ail pushing worker, as was the case prior to commit 5889608d. Diagnosed-by: NDavid Jeffery <djeffery@redhat.com> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 08 3月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
We only want to reclaim preallocations from our periodic work item. Currently this is archived by looking for a dirty inode, but that check is rather fragile. Instead add a flag to xfs_reflink_cancel_cow_* so that the caller can ask for just cancelling unwritten extents in the COW fork. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> [darrick: fix typos in commit message] Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 10 2月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
Instead we submit the discard requests and use another workqueue to release the extents from the extent busy list. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 09 12月, 2016 1 次提交
-
-
由 Dave Chinner 提交于
We always perform integrity operations now, so these mount options don't do anything. Deprecate them and mark them for removal in in a year. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 30 11月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
This patch drops the XFS-own i_iolock and uses the VFS i_rwsem which recently replaced i_mutex instead. This means we only have to take one lock instead of two in many fast path operations, and we can also shrink the xfs_inode structure. Thanks to the xfs_ilock family there is very little churn, the only thing of note is that we need to switch to use the lock_two_directory helper for taking the i_rwsem on two inodes in a few places to make sure our lock order matches the one used in the VFS. Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NJens Axboe <axboe@fb.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 10 10月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
There are a couple of places where we don't check the inode's reflink flag before calling into the reflink code. Fix those, and add some asserts so we don't make this mistake again. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reported-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 06 10月, 2016 5 次提交
-
-
由 Darrick J. Wong 提交于
Add the reflink feature flag to the set of recognized feature flags. This enables users to write to reflink filesystems. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Trim CoW reservations made on behalf of a cowextsz hint if they get too old or we run low on quota, so long as we don't have dirty data awaiting writeback or directio operations in progress. Garbage collection of the cowextsize extents are kept separate from prealloc extent reaping because setting the CoW prealloc lifetime to a (much) higher value than the regular prealloc extent lifetime has been useful for combatting CoW fragmentation on VM hosts where the VMs experience bursty write behaviors and we can keep the utilization ratios low enough that we don't start to run out of space. IOWs, it benefits us to keep the CoW fork reservations around for as long as we can unless we run out of blocks or hit inode reclaim. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
To gracefully handle the situation where a CoW operation turns a single refcount extent into a lot of tiny ones and then run out of space when a tree split has to happen, use the per-AG reserved block pool to pre-allocate all the space we'll ever need for a maximal btree. For a 4K block size, this only costs an overhead of 0.3% of available disk space. When reflink is enabled, we have an unfortunate problem with rmap -- since we can share a block billions of times, this means that the reverse mapping btree can expand basically infinitely. When an AG is so full that there are no free blocks with which to expand the rmapbt, the filesystem will shut down hard. This is rather annoying to the user, so use the AG reservation code to reserve a "reasonable" amount of space for rmap. We'll prevent reflinks and CoW operations if we think we're getting close to exhausting an AG's free space rather than shutting down, but this permanent reservation should be enough for "most" users. Hopefully. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> [hch@lst.de: ensure that we invalidate the freed btree buffer] Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Due to the way the CoW algorithm in XFS works, there's an interval during which blocks allocated to handle a CoW can be lost -- if the FS goes down after the blocks are allocated but before the block remapping takes place. This is exacerbated by the cowextsz hint -- allocated reservations can sit around for a while, waiting to get used. Since the refcount btree doesn't normally store records with refcount of 1, we can use it to record these in-progress extents. In-progress blocks cannot be shared because they're not user-visible, so there shouldn't be any conflicts with other programs. This is a better solution than holding EFIs during writeback because (a) EFIs can't be relogged currently, (b) even if they could, EFIs are bound by available log space, which puts an unnecessary upper bound on how much CoW we can have in flight, and (c) we already have a mechanism to track blocks. At mount time, read the refcount records and free anything we find with a refcount of 1 because those were in-progress when the FS went down. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
When destroying the inode, cancel all pending reservations in the CoW fork so that all the reserved blocks go back to the free pile. In theory this sort of cleanup is only needed to clean up after write errors. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 05 10月, 2016 3 次提交
-
-
由 Darrick J. Wong 提交于
Log recovery will iget an inode to replay BUI items and iput the inode when it's done. Unfortunately, if the inode was unlinked, the iput will see that i_nlink == 0 and decide to truncate & free the inode, which prevents us from replaying subsequent BUIs. We can't skip the BUIs because we have to replay all the redo items to ensure that atomic operations complete. Since unlinked inode recovery will reap the inode anyway, we can safely introduce a new inode flag to indicate that an inode is in this 'unlinked recovery' state and should not be auto-reaped in the drop_inode path. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Implement deferred versions of the inode block map/unmap functions. These will be used in subsequent patches to make reflink operations atomic. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Create bmbt update intent/done log items to record redo information in the log. Because we roll transactions multiple times for reflink operations, we also have to track the status of the metadata updates that will be recorded in the post-roll transactions in case we crash before committing the final transaction. This mechanism enables log recovery to finish what was already started. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 04 10月, 2016 2 次提交
-
-
由 Darrick J. Wong 提交于
Plumb in the upper level interface to schedule and finish deferred refcount operations via the deferred ops mechanism. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Create refcount update intent/done log items to record redo information in the log. Because we need to roll transactions between updating the bmbt mapping and updating the reverse mapping, we also have to track the status of the metadata updates that will be recorded in the post-roll transactions, just in case we crash before committing the final transaction. This mechanism enables log recovery to finish what was already started. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-