- 07 6月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 22 5月, 2018 1 次提交
-
-
由 Dan Williams 提交于
When xfs is operating as the back-end of a pNFS block server, it prevents collisions between local and remote operations by requiring a lease to be held for remotely accessed blocks. Local filesystem operations break those leases before writing or mutating the extent map of the file. A similar mechanism is needed to prevent operations on pinned dax mappings, like device-DMA, from colliding with extent unmap operations. BREAK_WRITE and BREAK_UNMAP are introduced as two distinct levels of layout breaking. Layouts are broken in the BREAK_WRITE case to ensure that layout-holders do not collide with local writes. Additionally, layouts are broken in the BREAK_UNMAP case to make sure the layout-holder has a consistent view of the file's extent map. While BREAK_WRITE breaks can be satisfied be recalling FL_LAYOUT leases, BREAK_UNMAP breaks additionally require waiting for busy dax-pages to go idle while holding XFS_MMAPLOCK_EXCL. After this refactoring xfs_break_layouts() becomes the entry point for coordinating both types of breaks. Finally, xfs_break_leased_layouts() becomes just the BREAK_WRITE handler. Note that the unlock tracking is needed in a follow on change. That will coordinate retrying either break handler until both successfully test for a lease break while maintaining the lock state. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Reported-by: NDave Chinner <david@fromorbit.com> Reported-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 16 5月, 2018 1 次提交
-
-
由 Brian Foster 提交于
The changes to skip discards of speculative preallocation and unwritten extents introduced several new wrapper functions through the bunmapi -> extent free codepath to reduce churn in all of the associated callers. In several cases, these wrappers simply toggle a single flag to skip or not skip discards for the resulting blocks. The explicit _nodiscard() wrappers for such an isolated set of callers is a bit overkill. Kill off these wrappers and replace with the calls to the underlying functions in the contexts that need to control discard behavior. Retain the wrappers that preserve the original calling conventions to serve the original purpose of reducing code churn. This is a refactoring patch and does not change behavior. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 10 5月, 2018 1 次提交
-
-
由 Brian Foster 提交于
We've had reports of online discard operations being sent from XFS on write-only workloads. These discards occur as a result of eofblocks trims that can occur after a large file copy completes. These discards are slightly confusing for users who might be paying close attention to online discards (i.e., vdo) due to performance sensitivity. They also happen to be spurious because freed post-eof blocks by definition have not been written to during the current allocation cycle. Update xfs_free_eofblocks() to skip discards that are purely attributed to eofblocks trims. This cuts down the number of spurious discards that may occur on write-only workloads due to normal preallocation activity. Note that discards of post-eof extents can still occur from other codepaths that do not isolate handling of post-eof blocks from those within eof. For example, file unlinks and truncates may still cause discards for any file blocks affected by the operation. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 10 4月, 2018 1 次提交
-
-
由 Eric Sandeen 提交于
Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 03 4月, 2018 1 次提交
-
-
由 Chandan Rajendra 提交于
xfs_dir_ialloc() rolls the current transaction when allocation of a new inode required the space manager to perform an allocation and replinish the Inode btree. None of the callers of xfs_dir_ialloc() need to know if the transaction was committed. Hence this commit removes the "committed" argument of xfs_dir_ialloc. Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 16 3月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
This helper doesn't add any real value over just calling iomap_zero_range directly, so remove it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 29 1月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Refactor xfs_lock_two_inodes to take separate locking modes for each inode. Specifically, this enables us to take a SHARED lock on one inode and an EXCL lock on the other. The lock class (MMAPLOCK/ILOCK) must be the same for each inode. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 09 1月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Replace the current haphazard dir2 shortform verifier callsites with a centralized verifier function that can be called either with the default verifier functions or with a custom set. This helps us strengthen integrity checking while providing us with flexibility for repair tools. xfs_repair wants this to be able to supply its own verifier functions when trying to fix possibly corrupt metadata. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 21 12月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
The EOFBLOCKS/COWBLOCKS tags are totally separate things, so track them with separate i_flags. Right now we're abusing IEOFBLOCKS for both, which is totally bogus because we won't tag the inode with COWBLOCKS if IEOFBLOCKS was set by a previous tagging of the inode with EOFBLOCKS. Found by wiring up clonerange to fsstress in xfs/017. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 09 12月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
If we create a new file we will need an inode, and usually some metadata in the parent direction. Aiming for everything to go well despite the lack of a reservation leads to dirty transactions cancelled under a heavy create/delete load. This patch removes those nospace transactions, which will lead to slightly earlier ENOSPC on some workloads, but instead prevent file system shutdowns due to cancelling dirty transactions for others. A customer could observe assertations failures and shutdowns due to cancelation of dirty transactions during heavy NFS workloads as shown below: 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace: 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246] [<ffffffff81795daf>] dump_stack+0x63/0x81 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262] [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264] [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285] [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308] [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329] [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348] [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366] [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380] [<ffffffff81231de5>] vfs_create+0xd5/0x140 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390] [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396] [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401] [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416] [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423] [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427] [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432] [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd] 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438] [<ffffffff810c0d58>] kthread+0xd8/0xf0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451] [<ffffffff8179d962>] ret_from_fork+0x42/0x70 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0 2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]--- 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x4ee/0x7d0 [xfs] 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem 2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s) Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 27 10月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
We can simply use the i_rdev field in the Linux inode and just convert to and from the XFS dev_t when reading or logging/writing the inode. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 03 7月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
Switch to the iomap_seek_hole and iomap_seek_data helpers for implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the code that isn't needed any more. Based on patches from Andreas Gruenbacher <agruenba@redhat.com>. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 20 6月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 28 4月, 2017 1 次提交
-
-
由 Brian Foster 提交于
Inodes that are inserted into the perag tree but still under construction are flagged with the XFS_INEW bit. Most contexts either skip such inodes when they are encountered or have the ability to handle them. The runtime quotaoff sequence introduces a context that must wait for construction of such inodes to correctly ensure that all dquots in the fs are released. In anticipation of this, support the ability to wait on new inodes. Wake the appropriate bit when XFS_INEW is cleared. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 30 11月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
This patch drops the XFS-own i_iolock and uses the VFS i_rwsem which recently replaced i_mutex instead. This means we only have to take one lock instead of two in many fast path operations, and we can also shrink the xfs_inode structure. Thanks to the xfs_ilock family there is very little churn, the only thing of note is that we need to switch to use the lock_two_directory helper for taking the i_rwsem on two inodes in a few places to make sure our lock order matches the one used in the VFS. Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NJens Axboe <axboe@fb.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 10 11月, 2016 1 次提交
-
-
由 Brian Foster 提交于
Filesystem shutdown testing on an older distro kernel has uncovered an imbalanced locking pattern for the inode flush lock in xfs_reclaim_inode(). Specifically, there is a double unlock sequence between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the "reclaim:" label. This actually does not cause obvious problems on current kernels due to the current flush lock implementation. Older kernels use a counting based flush lock mechanism, however, which effectively breaks the lock indefinitely when an already unlocked flush lock is repeatedly unlocked. Though this only currently occurs on filesystem shutdown, it has reproduced the effect of elevating an fs shutdown to a system-wide crash or hang. As it turns out, the flush lock is not actually required for the reclaim logic in xfs_reclaim_inode() because by that time we have already cycled the flush lock once while holding ILOCK_EXCL. Therefore, remove the additional flush lock/unlock cycle around the 'reclaim:' label and update branches into this label to release the flush lock where appropriate. Add an assert to xfs_ifunlock() to help prevent future occurences of the same problem. Reported-by: NZorro Lang <zlang@redhat.com> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 06 10月, 2016 2 次提交
-
-
由 Darrick J. Wong 提交于
If the admin doesn't set a CoW extent size or a regular extent size hint, default to creating CoW reservations 32 blocks long to reduce fragmentation. Signed-off-by: NDarricK J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Create a per-inode extent size allocator hint for copy-on-write. This hint is separate from the existing extent size hint so that CoW can take advantage of the fragmentation-reducing properties of extent size hints without disabling delalloc for regular writes. The extent size hint that's fed to the allocator during a copy on write operation is the greater of the cowextsize and regular extsize hint. During reflink, if we're sharing the entire source file to the entire destination file and the destination file doesn't already have a cowextsize hint, propagate the source file's cowextsize hint to the destination file. Furthermore, zero the bulkstat buffer prior to setting the fields so that we don't copy kernel memory contents into userspace. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 05 10月, 2016 2 次提交
-
-
由 Darrick J. Wong 提交于
Introduce a new in-core fork for storing copy-on-write delalloc reservations and allocated extents that are in the process of being written out. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Log recovery will iget an inode to replay BUI items and iput the inode when it's done. Unfortunately, if the inode was unlinked, the iput will see that i_nlink == 0 and decide to truncate & free the inode, which prevents us from replaying subsequent BUIs. We can't skip the BUIs because we have to replay all the redo items to ensure that atomic operations complete. Since unlinked inode recovery will reap the inode anyway, we can safely introduce a new inode flag to indicate that an inode is in this 'unlinked recovery' state and should not be auto-reaped in the drop_inode path. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 04 10月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
Add new per-AG refcount btree definitions to the per-AG structures. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 19 9月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
For long growing file writes we will usually already have the eofblocks tag set when adding more speculative preallocations. Add a flag in the inode to allow us to skip the the fairly expensive AG-wide spinlocks and multiple radix tree operations in that case. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 03 8月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
Drop the compatibility shims that we were using to integrate the new deferred operation mechanism into the existing code. No new code. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 20 7月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
Now that we have the direct I/O kiocb flag there is no real need to sample the value inside of XFS, and the invis flag was always just partially used and isn't worth keeping this infrastructure around for. This also splits the read tracepoint into buffered vs direct as we've done for writes a long time ago. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 21 6月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
We'll want to use this code for large offsets now that we're skipping holes and unwritten extents efficiently. Also rename it to xfs_zero_range to be a bit more descriptive, and tell the caller if we actually did any zeroing. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 01 6月, 2016 1 次提交
-
-
由 Eric Sandeen 提交于
Al Viro noticed that xfs_lock_inodes should be static, and that led to ... a few more. These are just the easy ones, others require moving functions higher in source files, so that's not done here to keep this review simple. Signed-off-by: NEric Sandeen <sandeen@sandeen.net> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 06 4月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
In the next patch we'll set up different inode operations for inline vs out of line symlinks, for that we need to make sure the flags are already set up properly. [dchinner: added xfs_setup_iops() call to xfs_rename_alloc_whiteout()] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 09 2月, 2016 3 次提交
-
-
由 Dave Chinner 提交于
Move the di_mode value from the xfs_icdinode to the VFS inode, reducing the xfs_icdinode byte another 2 bytes and collapsing another 2 byte hole in the structure. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The VFS tracks the inode nlink just like the xfs_icdinode. We can remove the variable from the icdinode and use the VFS inode variable everywhere, reducing the size of the xfs_icdinode by a further 4 bytes. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We currently carry around and log an entire inode core in the struct xfs_inode. A lot of the information in the inode core is duplicated in the VFS inode, but we cannot remove this duplication of infomration because the inode core is logged directly in xfs_inode_item_format(). Add a new function xfs_inode_item_format_core() that copies the inode core data into a struct xfs_icdinode that is pulled directly from the log vector buffer. This means we no longer directly copy the inode core, but copy the structures one member at a time. This will be slightly less efficient than copying, but will allow us to remove duplicate and unnecessary items from the struct xfs_inode. To enable us to do this, call the new structure a xfs_log_dinode, so that we know it's different to the physical xfs_dinode and the in-core xfs_icdinode. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 08 2月, 2016 1 次提交
-
-
由 Eric Sandeen 提交于
Factor xfs_seek_hole_data into an unlocked helper which takes an xfs inode rather than a file for internal use. Also allow specification of "end" - the vfs lseek interface is defined such that any offset past eof/i_size shall return -ENXIO, but we will use this for quota code which does not maintain i_size, and we want to be able to SEEK_DATA past i_size as well. So the lseek path can send in i_size, and the quota code can determine its own ending offset. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 19 8月, 2015 1 次提交
-
-
由 Dave Chinner 提交于
Lockdep annotations are a maintenance nightmare. Locking has to be modified to suit the limitations of the annotations, and we're always having to fix the annotations because they are unable to express the complexity of locking heirarchies correctly. So, next up, we've got more issues with lockdep annotations for inode locking w.r.t. XFS_LOCK_PARENT: - lockdep classes are exclusive and can't be ORed together to form new classes. - IOLOCK needs multiple PARENT subclasses to express the changes needed for the readdir locking rework needed to stop the endless flow of lockdep false positives involving readdir calling filldir under the ILOCK. - there are only 8 unique lockdep subclasses available, so we can't create a generic solution. IOWs we need to treat the 3-bit space available to each lock type differently: - IOLOCK uses xfs_lock_two_inodes(), so needs: - at least 2 IOLOCK subclasses - at least 2 IOLOCK_PARENT subclasses - MMAPLOCK uses xfs_lock_two_inodes(), so needs: - at least 2 MMAPLOCK subclasses - ILOCK uses xfs_lock_inodes with up to 5 inodes, so needs: - at least 5 ILOCK subclasses - one ILOCK_PARENT subclass - one RTBITMAP subclass - one RTSUM subclass For the IOLOCK, split the space into two sets of subclasses. For the MMAPLOCK, just use half the space for the one subclass to match the non-parent lock classes of the IOLOCK. For the ILOCK, use 0-4 as the ILOCK subclasses, 5-7 for the remaining individual subclasses. Because they are now all different, modify xfs_lock_inumorder() to handle the nested subclasses, and to assert fail if passed an invalid subclass. Further, annotate xfs_lock_inodes() to assert fail if an invalid combination of lock primitives and inode counts are passed that would result in a lockdep subclass annotation overflow. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 23 2月, 2015 3 次提交
-
-
由 Dave Chinner 提交于
Al Viro noticed a generic set of issues to do with filehandle lookup racing with dentry cache setup. They involve a filehandle lookup occurring while an inode is being created and the filehandle lookup racing with the dentry creation for the real file. This can lead to multiple dentries for the one path being instantiated. There are a host of other issues around this same set of paths. The underlying cause is that file handle lookup only waits on inode cache instantiation rather than full dentry cache instantiation. XFS is mostly immune to the problems discovered due to it's own internal inode cache, but there are a couple of corner cases where races can happen. We currently clear the XFS_INEW flag when the inode is fully set up after insertion into the cache. Newly allocated inodes are inserted locked and so aren't usable until the allocation transaction commits. This, however, occurs before the dentry and security information is fully initialised and hence the inode is unlocked and available for lookups to find too early. To solve the problem, only clear the XFS_INEW flag for newly created inodes once the dentry is fully instantiated. This means lookups will retry until the XFS_INEW flag is removed from the inode and hence avoids the race conditions in questions. THis also means that xfs_create(), xfs_create_tmpfile() and xfs_symlink() need to finish the setup of the inode in their error paths if we had allocated the inode but failed later in the creation process. xfs_symlink(), in particular, needed a lot of help to make it's error handling match that of xfs_create(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
A new fsync vs power fail test in xfstests indicated that XFS can have unreliable data consistency when doing extending truncates that require block zeroing. The blocks beyond EOF get zeroed in memory, but we never force those changes to disk before we run the transaction that extends the file size and exposes those blocks to userspace. This can result in the blocks not being correctly zeroed after a crash. Because in-memory behaviour is correct, tools like fsx don't pick up any coherency problems - it's not until the filesystem is shutdown or the system crashes after writing the truncate transaction to the journal but before the zeroed data in the page cache is flushed that the issue is exposed. Fix this by also flushing the dirty data in memory region between the old size and new size when we've found blocks that need zeroing in the truncate process. Reported-by: NLiu Bo <bo.li.liu@oracle.com> cc: <stable@vger.kernel.org> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Right now we cannot serialise mmap against truncate or hole punch sanely. ->page_mkwrite is not able to take locks that the read IO path normally takes (i.e. the inode iolock) because that could result in lock inversions (read - iolock - page fault - page_mkwrite - iolock) and so we cannot use an IO path lock to serialise page write faults against truncate operations. Instead, introduce a new lock that is used *only* in the ->page_mkwrite path that is the equivalent of the iolock. The lock ordering in a page fault is i_mmaplock -> page lock -> i_ilock, and so in truncate we can i_iolock -> i_mmaplock and so lock out new write faults during the process of truncation. Because i_mmap_lock is outside the page lock, we can hold it across all the same operations we hold the i_iolock for. The only difference is that we never hold the i_mmaplock in the normal IO path and so do not ever have the possibility that we can page fault inside it. Hence there are no recursion issues on the i_mmap_lock and so we can use it to serialise page fault IO against inode modification operations that affect the IO path. This patch introduces the i_mmaplock infrastructure, lockdep annotations and initialisation/destruction code. Use of the new lock will be in subsequent patches. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 02 2月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
This logic is duplicated in xfs_file_fallocate and xfs_ioc_space, and we'll need another copy of it for pNFS block support. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 24 12月, 2014 1 次提交
-
-
由 Carlos Maiolino 提交于
Adds a new function named xfs_cross_rename(), responsible for handling requests from sys_renameat2() using RENAME_EXCHANGE flag. Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 28 11月, 2014 1 次提交
-
-
由 Christoph Hellwig 提交于
More consolidatation for the on-disk format defintions. Note that the XFS_IS_REALTIME_INODE moves to xfs_linux.h instead as it is not related to the on disk format, but depends on a CONFIG_ option. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 02 10月, 2014 1 次提交
-
-
由 Brian Foster 提交于
If we write to the maximum file offset (2^63-2), XFS fails to log the inode size update when the page is flushed. For example: $ xfs_io -fc "pwrite `echo "2^63-1-1" | bc` 1" /mnt/file wrote 1/1 bytes at offset 9223372036854775806 1.000000 bytes, 1 ops; 0.0000 sec (22.711 KiB/sec and 23255.8140 ops/sec) $ stat -c %s /mnt/file 9223372036854775807 $ umount /mnt ; mount <dev> /mnt/ $ stat -c %s /mnt/file 0 This occurs because XFS calculates the new file size as io_offset + io_size, I/O occurs in block sized requests, and the maximum supported file size is not block aligned. Therefore, a write to the max allowable offset on a 4k blocksize fs results in a write of size 4k to offset 2^63-4096 (e.g., equivalent to round_down(2^63-1, 4096), or IOW the offset of the block that contains the max file size). The offset plus size calculation (2^63 - 4096 + 4096 == 2^63) overflows the signed 64-bit variable which goes negative and causes the > comparison to the on-disk inode size to fail. This returns 0 from xfs_new_eof() and results in no change to the inode on-disk. Update xfs_new_eof() to explicitly detect overflow of the local calculation and use the VFS inode size in this scenario. The VFS inode size is capped to the maximum and thus XFS writes the correct inode size to disk. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-