- 25 3月, 2015 14 次提交
-
-
由 Dave Chinner 提交于
Conflicts: fs/xfs/libxfs/xfs_bmap.c fs/xfs/xfs_inode.c
-
由 Joe Perches 提交于
added a positive error return value. This value filters up through the return layers and should be negative as the other return values are in the same function. Signed-off-by: NJoe Perches <joe@perches.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Byoungyoung Lee 提交于
xfs_mru_cache_insert() can be called from within transaction context during block allocation like so: write(2) .... xfs_get_blocks xfs_iomap_write_direct start transaction xfs_bmapi_write xfs_bmapi_allocate xfs_bmap_btalloc xfs_bmap_btalloc_filestreams xfs_filestream_new_ag xfs_filestream_pick_ag xfs_mru_cache_insert radix_tree_preload(GFP_KERNEL) In this case, GFP_KERNEL is incorrect and can potentially lead to deadlocks in memory reclaim. It should use GFP_NOFS allocations to avoid lock recursion problems. [dchinner: rewrote commit message] Signed-off-by: NByoungyoung Lee <blee@gatech.edu> Signed-off-by: NSanidhya Kashyap <sanidhya.gatech@gmail.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Scott Wood 提交于
Use %pS for actual addresses, otherwise you'll get bad output on arches like ppc64 where %pF expects a function descriptor. Signed-off-by: NScott Wood <scottwood@freescale.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Fabian Frederick 提交于
Use icnodehdr for struct xfs_da3_icnode_hdr instead of nodehdr (already declared above). Signed-off-by: NFabian Frederick <fabf@skynet.be> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Fabian Frederick 提交于
new_parent and src_is_directory are only used in 0/1 context. Signed-off-by: NFabian Frederick <fabf@skynet.be> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
If xfs_filestream_get_parent() fails, we have a null pip, goto out, and attempt to IRELE(NULL). This causes a null pointer dereference and BUG(). Fix this by directly returning NULLAGNUMBER in this case. Reported-by: NAdrien Nader <adrien@notk.org> Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
This code is redundant now that we have verifiers that sanity check the buffers as they are read from disk. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Conflicts: fs/xfs/xfs_inode.c
-
由 Dave Chinner 提交于
Whiteouts are used by overlayfs - it has a crazy convention that a whiteout is a character device inode with a major:minor of 0:0. Because it's not documented anywhere, here's an example of what RENAME_WHITEOUT does on ext4: # echo foo > /mnt/scratch/foo # echo bar > /mnt/scratch/bar # ls -l /mnt/scratch total 24 -rw-r--r-- 1 root root 4 Feb 11 20:22 bar -rw-r--r-- 1 root root 4 Feb 11 20:22 foo drwx------ 2 root root 16384 Feb 11 20:18 lost+found # src/renameat2 -w /mnt/scratch/foo /mnt/scratch/bar # ls -l /mnt/scratch total 20 -rw-r--r-- 1 root root 4 Feb 11 20:22 bar c--------- 1 root root 0, 0 Feb 11 20:23 foo drwx------ 2 root root 16384 Feb 11 20:18 lost+found # cat /mnt/scratch/bar foo # In XFS rename terms, the operation that has been done is that source (foo) has been moved to the target (bar), which is like a nomal rename operation, but rather than the source being removed, it have been replaced with a whiteout. We can't allocate whiteout inodes within the rename transaction due to allocation being a multi-commit transaction: rename needs to be a single, atomic commit. Hence we have several options here, form most efficient to least efficient: - use DT_WHT in the target dirent and do no whiteout inode allocation. The main issue with this approach is that we need hooks in lookup to create a virtual chardev inode to present to userspace and in places where we might need to modify the dirent e.g. unlink. Overlayfs also needs to be taught about DT_WHT. Most invasive change, lowest overhead. - create a special whiteout inode in the root directory (e.g. a ".wino" dirent) and then hardlink every new whiteout to it. This means we only need to create a single whiteout inode, and rename simply creates a hardlink to it. We can use DT_WHT for these, though using DT_CHR means we won't have to modify overlayfs, nor anything in userspace. Downside is we have to look up the whiteout inode on every operation and create it if it doesn't exist. - copy ext4: create a special whiteout chardev inode for every whiteout. This is more complex than the above options because of the lack of atomicity between inode creation and the rename operation, requiring us to create a tmpfile inode and then linking it into the directory structure during the rename. At least with a tmpfile inode crashes between the create and rename doesn't leave unreferenced inodes or directory pollution around. By far the simplest thing to do in the short term is to copy ext4. While it is the most inefficient way of supporting whiteouts, but as an initial implementation we can simply reuse existing functions and add a small amount of extra code the the rename operation. When we get full whiteout support in the VFS (via the dentry cache) we can then look to supporting DT_WHT method outlined as the first method of supporting whiteouts. But until then, we'll stick with what overlayfs expects us to be: dumb and stupid. Signed-off-by: NDave Chinner <dchinner@redhat.com>
-
由 Dave Chinner 提交于
Now that xfs_finish_rename() exists, there is no reason for xfs_cross_rename() to return to xfs_rename() to finish off the rename transaction. Drive the completion code into xfs_cross_rename() and handle all errors there so as to simplify the xfs_rename() code. Further, push the rename exchange target_ip check to early in the rename code so as to make the error handling easy and obviously correct. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Rather than use a jump label for the final transaction commit in the rename, factor it into a simple helper function and call it appropriately. This slightly reduces the spaghetti nature of xfs_rename. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The jump labels are ambiguous and unclear and some of the error paths are used inconsistently. Rules for error jumps are: - use out_trans_cancel for unmodified transaction context - use out_bmap_cancel on ENOSPC errors - use out_trans_abort when transaction is likely to be dirty. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
When doing RENAME_WHITEOUT, we now have to lock 5 inodes into the rename transaction. This means we need to update xfs_sort_for_rename() and xfs_lock_inodes() to handle up to 5 inodes. Because of the vagaries of rename, this means we could have anywhere between 3 and 5 inodes locked into the transaction.... While xfs_lock_inodes() does not need anything other than an assert telling us we are passing more inodes that we ever thought we should see, it could do with a logic rework to remove all the indenting. This is not a functional change - it just makes the code a lot easier to read. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 24 2月, 2015 10 次提交
-
-
由 Dave Chinner 提交于
-
由 Dave Chinner 提交于
Conflicts: fs/xfs/xfs_super.c
-
由 Dave Chinner 提交于
-
由 Eric Sandeen 提交于
We recently removed deprecated sysctls; may as well remove deprecated mount options as well, we've stated that they'd be gone by now in the docs. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Test generic/224 is failing with a corruption being detected on one of Michael's test boxes. Debug that Michael added is indicating that the minleft trimming is resulting in an underflow: ..... before fixup: rlen 1 args->len 0 after xfs_alloc_fix_len : rlen 1 args->len 1 before goto out_nominleft: rlen 1 args->len 0 before fixup: rlen 1 args->len 0 after xfs_alloc_fix_len : rlen 1 args->len 1 after fixup: rlen 1 args->len 1 before fixup: rlen 1 args->len 0 after xfs_alloc_fix_len : rlen 1 args->len 1 after fixup: rlen 4294967295 args->len 4294967295 XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_alloc.c, line: 1424 The "goto out_nominleft:" indicates that we are getting close to ENOSPC in the AG, and a couple of allocations later we underflow and the corruption check fires in xfs_alloc_ag_vextent_size(). The issue is that the extent length fixups comaprisons are done with variables of xfs_extlen_t types. These are unsigned so an underflow looks like a really big value and hence is not detected as being smaller than the minimum length allowed for the extent. Hence the corruption check fires as it is noticing that the returned length is longer than the original extent length passed in. This can be easily fixed by ensuring we do the underflow test on signed values, the same way xfs_alloc_fix_len() prevents underflow. So we realise in future that these casts prevent underflows from going undetected, add comments to the code indicating this. Reported-by: NMichael L. Semon <mlsemon35@gmail.com> Tested-by: NMichael L. Semon <mlsemon35@gmail.com> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
If xfs_trans_reserve fails we don't cancel the transaction, and we'll leak the allocated transaction pointer. Spotted by Coverity. Signed-off-by: NEric Sandeen <ssandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Wang Sheng-Hui 提交于
The error messages document the reason for the checks better than the comment and the comments about volume mounts date back to Irix and so aren't relevant any more. So just remove the old and redundant comment. Signed-off-by: NWang Sheng-Hui <shhuiw@foxmail.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
Today, when the "failing async writes" get ratelimited, we see: XFS:: 62836 callbacks suppressed Aside from the extra ":" it's not entirely clear which message is being suppressed, especially if other messages or ratelimits are happening at the same time. Clarify this as i.e.: XFS (dm-11): Failing async write on buffer block 0x140090. Retrying async write. XFS: Failing async write: 62836 callbacks suppressed Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
There are times, when doing triage and forensics, that we would like to know whether a filesystem was unmounted, or if the plug was pulled without a clean unmount. Log unmounts at the same level (NOTICE) as we log mounts. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
We shouldn't get here with RENAME_EXCHANGE set and no target_ip, but let's be defensive, because xfs_cross_rename() will dereference it. Spotted by Coverity. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 23 2月, 2015 16 次提交
-
-
由 Eric Sandeen 提交于
Today, if we hit an XFS_WANT_CORRUPTED_RETURN we don't print any information about which filesystem hit it. Passing in the mp allows us to print the filesystem (device) name, which is a pretty critical piece of information. Tested by running fsfuzzer 'til I hit some. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
Today, if we hit an XFS_WANT_CORRUPTED_GOTO we don't print any information about which filesystem hit it. Passing in the mp allows us to print the filesystem (device) name, which is a pretty critical piece of information. Tested by running fsfuzzer 'til I hit some. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Al Viro noticed a generic set of issues to do with filehandle lookup racing with dentry cache setup. They involve a filehandle lookup occurring while an inode is being created and the filehandle lookup racing with the dentry creation for the real file. This can lead to multiple dentries for the one path being instantiated. There are a host of other issues around this same set of paths. The underlying cause is that file handle lookup only waits on inode cache instantiation rather than full dentry cache instantiation. XFS is mostly immune to the problems discovered due to it's own internal inode cache, but there are a couple of corner cases where races can happen. We currently clear the XFS_INEW flag when the inode is fully set up after insertion into the cache. Newly allocated inodes are inserted locked and so aren't usable until the allocation transaction commits. This, however, occurs before the dentry and security information is fully initialised and hence the inode is unlocked and available for lookups to find too early. To solve the problem, only clear the XFS_INEW flag for newly created inodes once the dentry is fully instantiated. This means lookups will retry until the XFS_INEW flag is removed from the inode and hence avoids the race conditions in questions. THis also means that xfs_create(), xfs_create_tmpfile() and xfs_symlink() need to finish the setup of the inode in their error paths if we had allocated the inode but failed later in the creation process. xfs_symlink(), in particular, needed a lot of help to make it's error handling match that of xfs_create(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
A new fsync vs power fail test in xfstests indicated that XFS can have unreliable data consistency when doing extending truncates that require block zeroing. The blocks beyond EOF get zeroed in memory, but we never force those changes to disk before we run the transaction that extends the file size and exposes those blocks to userspace. This can result in the blocks not being correctly zeroed after a crash. Because in-memory behaviour is correct, tools like fsx don't pick up any coherency problems - it's not until the filesystem is shutdown or the system crashes after writing the truncate transaction to the journal but before the zeroed data in the page cache is flushed that the issue is exposed. Fix this by also flushing the dirty data in memory region between the old size and new size when we've found blocks that need zeroing in the truncate process. Reported-by: NLiu Bo <bo.li.liu@oracle.com> cc: <stable@vger.kernel.org> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Jan Kara 提交于
For filesystems without separate project quota inode field in the superblock we just reuse project quota file for group quotas (and vice versa) if project quota file is allocated and we need group quota file. When we reuse the file, quota structures on disk suddenly have wrong type stored in d_flags though. Nobody really cares about this (although structure type reported to userspace was wrong as well) except that after commit 14bf61ff (quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units) assertion in xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I was testing without XFS_DEBUG so I didn't notice when submitting the above commit). Fix the problem by properly resetting ddq->d_flags when running quotacheck for a quota file. CC: stable@vger.kernel.org Reported-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Extent swap operations are another extent manipulation operation that we need to ensure does not race against mmap page faults. The current code returns if the file is mapped prior to the swap being done, but it could potentially race against new page faults while the swap is in progress. Hence we should use the XFS_MMAPLOCK_EXCL for this operation, too. While there, fix the error path handling that can result in double unlocks of the inodes when cancelling the swapext transaction. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Now that truncate locks out new page faults, we no longer need to do special writeback hacks in truncate to work around potential races between page faults, page cache truncation and file size updates to ensure we get write page faults for extending truncates on sub-page block size filesystems. Hence we can remove the code in xfs_setattr_size() that handles this and update the comments around the code tha thandles page cache truncate and size updates to reflect the new reality. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Now we have the i_mmap_lock being held across the page fault IO path, we now add extent manipulation operation exclusion by adding the lock to the paths that directly modify extent maps. This includes truncate, hole punching and other fallocate based operations. The operations will now take both the i_iolock and the i_mmaplock in exclusive mode, thereby ensuring that all IO and page faults block without holding any page locks while the extent manipulation is in progress. This gives us the lock order during truncate of i_iolock -> i_mmaplock -> page_lock -> i_lock, hence providing the same lock order as the iolock provides the normal IO path without involving the mmap_sem. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Take the i_mmaplock over write page faults. These come through the ->page_mkwrite callout, so we need to wrap that calls with the i_mmaplock. This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock -> i_lock. Also, move the page_mkwrite wrapper to the same region of xfs_file.c as the read fault wrappers and add a tracepoint. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Take the i_mmaplock over read page faults. These come through the ->fault callout, so we need to wrap the generic implementation with the i_mmaplock. While there, add tracepoints for the read fault as it passes through XFS. This gives us a lock order of mmap_sem -> i_mmaplock -> page_lock -> i_lock. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Right now we cannot serialise mmap against truncate or hole punch sanely. ->page_mkwrite is not able to take locks that the read IO path normally takes (i.e. the inode iolock) because that could result in lock inversions (read - iolock - page fault - page_mkwrite - iolock) and so we cannot use an IO path lock to serialise page write faults against truncate operations. Instead, introduce a new lock that is used *only* in the ->page_mkwrite path that is the equivalent of the iolock. The lock ordering in a page fault is i_mmaplock -> page lock -> i_ilock, and so in truncate we can i_iolock -> i_mmaplock and so lock out new write faults during the process of truncation. Because i_mmap_lock is outside the page lock, we can hold it across all the same operations we hold the i_iolock for. The only difference is that we never hold the i_mmaplock in the normal IO path and so do not ever have the possibility that we can page fault inside it. Hence there are no recursion issues on the i_mmap_lock and so we can use it to serialise page fault IO against inode modification operations that affect the IO path. This patch introduces the i_mmaplock infrastructure, lockdep annotations and initialisation/destruction code. Use of the new lock will be in subsequent patches. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Now that there are no users of the bitfield based incore superblock modification API, just remove the whole damn lot of it, including all the bitfield definitions. This finally removes a lot of cruft that has been around for a long time. Credit goes to Christoph Hellwig for providing a great patch connecting all the dots to enale us to do this. This patch is derived from that work. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Introduce helper functions for modifying fields in the superblock into xfs_trans.c, the only caller of xfs_mod_incore_sb_batch(). We can then use these directly in xfs_trans_unreserve_and_mod_sb() and so remove another user of the xfs_mode_incore_sb() API without losing any functionality or scalability of the transaction commit code.. Based on a patch from Christoph Hellwig. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Add a new helper to modify the incore counter of free realtime extents. This matches the helpers used for inode and data block counters, and removes a significant users of the xfs_mod_incore_sb() interface. Based on a patch originally from Christoph Hellwig. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Now that the in-core superblock infrastructure has been replaced with generic per-cpu counters, we don't need it anymore. Nuke it from orbit so we are sure that it won't haunt us again... Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
XFS has hand-rolled per-cpu counters for the superblock since before there was any generic implementation. The free block counter is special in that it is used for ENOSPC detection outside transaction contexts for for delayed allocation. This means that the counter needs to be accurate at zero. The current per-cpu counter code jumps through lots of hoops to ensure we never run past zero, but we don't need to make all those jumps with the generic counter implementation. The generic counter implementation allows us to pass a "batch" threshold at which the addition/subtraction to the counter value will be folded back into global value under lock. We can use this feature to reduce the batch size as we approach 0 in a very similar manner to the existing counters and their rebalance algorithm. If we use a batch size of 1 as we approach 0, then every addition and subtraction will be done against the global value and hence allow accurate detection of zero threshold crossing. Hence we can replace the handrolled, accurate-at-zero counters with generic percpu counters. Note: this removes just enough of the icsb infrastructure to compile without warnings. The rest will go in subsequent commits. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-