- 29 1月, 2018 10 次提交
-
-
由 Amir Goldstein 提交于
Commit 66f36464 ("xfs: remove if_rdev") moved storing of rdev value for special inodes to VFS inodes, but forgot to preserve the value of i_rdev when recycling a reclaimable xfs_inode. This was detected by xfstest overlay/017 with inodex=on mount option and xfs base fs. The test does a lookup of overlay chardev and blockdev right after drop caches. Overlayfs inodes hold a reference on underlying xfs inodes when mount option index=on is configured. If drop caches reclaim xfs inodes, before it relclaims overlayfs inodes, that can sometimes leave a reclaimable xfs inode and that test hits that case quite often. When that happens, the xfs inode cache remains broken (zere i_rdev) until the next cycle mount or drop caches. Fixes: 66f36464 ("xfs: remove if_rdev") Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Darrick J. Wong 提交于
Move all the inode and quota accounting updates out of xfs_bmap_btalloc in preparation for fixing some quota accounting problems with copy on write. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
Refactor inode verifier error reporting into a non-libxfs function so that we aren't encoding the message format in libxfs. This also changes the kernel dmesg output to resemble buffer verifier errors more closely. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Fix all the inode number formats to be consistently (0x%llx) in all trace point definitions. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Always zero the di_flags2 field when we free the inode so that we never end up with an on-disk record for an unallocated inode that also has the reflink iflag set. This is in keeping with the general principle that only files can have the reflink iflag set, even though we'll zero out di_flags2 if we ever reallocate the inode. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Ensure that we've attached all the necessary dquots before performing reflink operations so that quota accounting is accurate. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Shan Hai 提交于
Remove the extent size hint and realtime inode relevant code from the xfs_bmapi_reserve_delalloc since it is not called on the inode with extent size hint set or on a realtime inode. Signed-off-by: NShan Hai <shan.hai@oracle.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Carlos Maiolino 提交于
Now that buffer's b_fspriv has been split, just replace the current singly linked list of xfs_log_items, by the list_head infrastructure. Also, remove the xfs_log_item argument from xfs_buf_resubmit_failed_buffers(), there is no need for this argument, once the log items can be walked through the list_head in the buffer. Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> [darrick: minor style cleanups] Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Carlos Maiolino 提交于
By splitting the b_fspriv field into two different fields (b_log_item and b_li_list). It's possible to get rid of an old ABI workaround, by using the new b_log_item field to store xfs_buf_log_item separated from the log items attached to the buffer, which will be linked in the new b_li_list field. This way, there is no more need to reorder the log items list to place the buf_log_item at the beginning of the list, simplifying a bit the logic to handle buffer IO. This also opens the possibility to change buffer's log items list into a proper list_head. b_log_item field is still defined as a void *, because it is still used by the log buffers to store xlog_in_core structures, and there is no need to add an extra field on xfs_buf just for xlog_in_core. Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> [darrick: minor style changes] Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Carlos Maiolino 提交于
Take advantage of the rework on xfs_buf log items list, to get rid of ths typedef for xfs_buf_log_item. This patch also fix some indentation alignment issues found along the way. Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 18 1月, 2018 25 次提交
-
-
由 Darrick J. Wong 提交于
Fix compiler warning on non-debug build Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Currently, we don't check sb_agblocks or sb_agblklog when we validate the superblock, which means that we can fuzz garbage values into those values and the mount succeeds. This leads to all sorts of UBSAN warnings in xfs/350 since we can then coerce other parts of xfs into shifting by ridiculously large values. Once we've validated agblocks, make sure the agcount makes sense. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
Eryu Guan reported seeing occasional hangs when running generic/269 with a new fsstress that supports clonerange/deduperange. The cause of this hang is an infinite loop when we convert the CoW fork extents from unwritten to real just prior to writing the pages out; the infinite loop happens because there's nothing in the CoW fork to convert, and so it spins forever. The fundamental issue here is that when we go to perform these CoW fork conversions, we're supposed to have an extent waiting for us, but the low space CoW reaper has snuck in and blown them away! There are four conditions that can dissuade the reaper from touching our file -- no reflink iflag; dirty page cache; writeback in progress; or directio in progress. We check the four conditions prior to taking the locks, but we neglect to recheck them once we have the locks, which is how we end up whacking the writeback that's in progress. Therefore, refactor the four checks into a helper function and call it once again once we have the locks to make sure we really want to reap the inode. While we're at it, add an ASSERT for this weird condition so that we'll fail noisily if we ever screw this up again. Reported-by: NEryu Guan <eguan@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Tested-by: NEryu Guan <eguan@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
xfs_bmbt_irec.br_blockcount is declared as xfs_filblks_t, which is an unsigned 64-bit integer. Though the bmbt helpers will never set a value larger than 2^21 (since the underlying on-disk extent record has a length field that is only 21 bits wide), we should be a little defensive about checking that a bmbt record doesn't exceed what we're expecting or overflow into the next AG. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
A btree format inode fork with zero records makes no sense, so reject it if we see it, or else we can miscalculate memory allocations. Found by zeroes fuzzing {a,u3}.bmbt.numrecs in xfs/{374,378,412} with KASAN. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
In the attribute leaf verifier, we can check for obviously bad values of firstused and count so that later attempts at lasthash don't run off the end of the memory buffer. Found by ones fuzzing hdr.count in xfs/400 with KASAN. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
In xfs_scrub_dir_rec, we must walk through the directory block entries to arrive at the offset given by the hash structure. If we blindly trust the hash address, we can end up midway into a directory entry and stray outside the block. Found by lastbit fuzzing lents[3].address in xfs/390 with KASAN enabled. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Don't iunlock an unlocked inode, which can happen if the parent pointer scrubber bails out with sc->ip unlocked while trying to grab the parent directory inode. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
Whenever we load a buffer, explicitly re-call the structure verifier to ensure that memory isn't corrupting things. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Use an inode's block mappings to cross-reference inode block counters. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
While we're scrubbing various btrees, cross-reference the records with the other metadata. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
During metadata btree scrub, we should cross-reference with the reference counts. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Cross reference the refcount data with the rmap data to check that the number of rmaps for a given block match the refcount of that block, and that CoW blocks (which are owned entirely by the refcountbt) are tracked as well. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
When scrubbing various btrees, we should cross-reference the records with the reverse mapping btree and ensure that traversing the btree finds the same number of blocks that the rmapbt thinks are owned by that btree. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Cross-reference the inode btrees with the other metadata when we scrub the filesystem. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Scrub should make sure that each bnobt record has a corresponding cntbt record. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
When we're scrubbing various btrees, cross-reference the records with the bnobt to ensure that we don't also think the space is free. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Create some stubs that will be used to cross-reference metadata records. The actual cross-referencing will be filled in by subsequent patches. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
When scanning a metadata btree block, cross-reference the block location with the free space btree and the reverse mapping btree to ensure that the rmapbt knows about the block and the bnobt does not. Add a mechanism to defer checks when we happen to be scanning the bnobt/rmapbt itself because it's less efficient to repeatedly clone and destroy the cursor. This patch provides the framework to make btree block owner checks happen; the actual meat will be added in subsequent patches. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
There are a few places where we make a libxfs api call on behalf of some object other than the one we're scrubbing but inadvertently call the regular process_error function. When this happens we mark the object corrupt even though it was corruption in /some other/ object that actually produced the -EFSCORRUPTED code. The correct output flag for these situations is SCRUB_OFLAG_XFAIL, not SCRUB_OFLAG_CORRUPT, so fix this now that we also have a helper to set these. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Create some helper functions that we'll use later to deal with problems we might encounter while cross referencing metadata with other metadata. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Add a couple of functions to the refcount btrees that will be used to cross-reference metadata against the refcountbt. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Add a couple of functions to the rmap btrees that will be used to cross-reference metadata against the rmapbt. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Add a couple of functions to the inode btrees that will be used to cross-reference metadata against the inobt. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Add a couple of functions to the free space btrees that will be used to cross-reference metadata against the bnobt/cntbt, and a generic btree function that provides the real implementation. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 17 1月, 2018 1 次提交
-
-
由 Brian Foster 提交于
Chris Dunlop reports a problem where an xattr operation fails, reports the following error to syslog and hangs during unmount: ================================================ [ BUG: lock held when returning to user space! ] ... ------------------------------------------------ <PID> is leaving the kernel with locks still held! 1 lock held by <PID>: #0: (sb_internal){......}, at: [<ffffffffa07692a3>] xfs_trans_alloc+0xe3/0x130 [xfs] The failure/shutdown occurs during deferred ops processing which leads to an error return from xfs_defer_finish() via xfs_attr_leaf_addname(). While the root cause of the failure is unknown corruption, the cause of the subsequent BUG above and unmount hang is failure to cancel the transaction before returning to userspace. The transaction is not cancelled because the out_defer_cancel error handling paths in the xfs_attr_[leaf|node]_[add|remove]name() functions clear args.trans without releasing the transaction. The callers therefore lose the reference to the transaction and fail to cancel it. Since xfs_attr_[set|remove]() always cancel args.trans when != NULL and xfs_defer_finish()->...->xfs_trans_roll() should always return with a valid transaction, update the leaf/node xattr functions to not reset args.trans in the error path responsible for cancelling deferred ops. Reported-by: NChris Dunlop <chris@onthe.net.au> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 13 1月, 2018 4 次提交
-
-
由 Brian Foster 提交于
XFS started using the perag metadata reservation pool for free inode btree blocks in commit 76d771b4 ("xfs: use per-AG reservations for the finobt"). To handle backwards compatibility, finobt blocks are accounted against the pool so long as the full reservation is available at mount time. Otherwise the ->m_inotbt_nores flag is set and the filesystem falls back to the traditional per-transaction finobt reservation. This commit has two problems: - finobt blocks are always accounted against the metadata reservation on allocation, regardless of ->m_inotbt_nores state - finobt blocks are never returned to the reservation pool on free The first problem affects reflink+finobt filesystems where the full finobt reservation is not available at mount time. finobt blocks are essentially stolen from the reflink reservation, putting refcountbt management at risk of allocation failure. The second problem is an unconditional leak of metadata reservation whenever finobt is enabled. Update the finobt block allocation callouts to consider ->m_inotbt_nores and account blocks appropriately. Blocks should be consistently accounted against the metadata pool when ->m_inotbt_nores is false and otherwise tagged as RESV_NONE. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Colin Ian King 提交于
It appears that the check for versions 4 or more is incorrect and is off-by-one. Fix this. Detected by CoverityScan, CID#1463775 ("Logically dead code") Fixes: ac503a4c ("xfs: refactor the geometry structure filling function") Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Xiongwei Song 提交于
The mutex pag_ici_reclaim_lock of xfs_perag_t structure is initialized in xfs_initialize_perag. If happen errors in xfs_initialize_perag, or free resources in xfs_free_perag, wo need to destroy the mutex before free perag. Signed-off-by: NXiongwei Song <sxwjean@me.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Darrick J. Wong 提交于
Starting with commit 57e73442 ("vsprintf: refactor %pK code out of pointer"), the behavior of the raw '%p' printk format specifier was changed to print a 32-bit hash of the pointer value to avoid leaking kernel pointers into dmesg. For most situations that's good. This is /undesirable/ behavior when we're trying to debug XFS, however, so define a PTR_FMT that prints the actual pointer when we're in debug mode. Note that %p for tracepoints still prints the raw pointer, so in the long run we could consider rewriting some of these messages as tracepoints. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-