- 25 3月, 2015 1 次提交
-
-
由 Dave Chinner 提交于
Whiteouts are used by overlayfs - it has a crazy convention that a whiteout is a character device inode with a major:minor of 0:0. Because it's not documented anywhere, here's an example of what RENAME_WHITEOUT does on ext4: # echo foo > /mnt/scratch/foo # echo bar > /mnt/scratch/bar # ls -l /mnt/scratch total 24 -rw-r--r-- 1 root root 4 Feb 11 20:22 bar -rw-r--r-- 1 root root 4 Feb 11 20:22 foo drwx------ 2 root root 16384 Feb 11 20:18 lost+found # src/renameat2 -w /mnt/scratch/foo /mnt/scratch/bar # ls -l /mnt/scratch total 20 -rw-r--r-- 1 root root 4 Feb 11 20:22 bar c--------- 1 root root 0, 0 Feb 11 20:23 foo drwx------ 2 root root 16384 Feb 11 20:18 lost+found # cat /mnt/scratch/bar foo # In XFS rename terms, the operation that has been done is that source (foo) has been moved to the target (bar), which is like a nomal rename operation, but rather than the source being removed, it have been replaced with a whiteout. We can't allocate whiteout inodes within the rename transaction due to allocation being a multi-commit transaction: rename needs to be a single, atomic commit. Hence we have several options here, form most efficient to least efficient: - use DT_WHT in the target dirent and do no whiteout inode allocation. The main issue with this approach is that we need hooks in lookup to create a virtual chardev inode to present to userspace and in places where we might need to modify the dirent e.g. unlink. Overlayfs also needs to be taught about DT_WHT. Most invasive change, lowest overhead. - create a special whiteout inode in the root directory (e.g. a ".wino" dirent) and then hardlink every new whiteout to it. This means we only need to create a single whiteout inode, and rename simply creates a hardlink to it. We can use DT_WHT for these, though using DT_CHR means we won't have to modify overlayfs, nor anything in userspace. Downside is we have to look up the whiteout inode on every operation and create it if it doesn't exist. - copy ext4: create a special whiteout chardev inode for every whiteout. This is more complex than the above options because of the lack of atomicity between inode creation and the rename operation, requiring us to create a tmpfile inode and then linking it into the directory structure during the rename. At least with a tmpfile inode crashes between the create and rename doesn't leave unreferenced inodes or directory pollution around. By far the simplest thing to do in the short term is to copy ext4. While it is the most inefficient way of supporting whiteouts, but as an initial implementation we can simply reuse existing functions and add a small amount of extra code the the rename operation. When we get full whiteout support in the VFS (via the dentry cache) we can then look to supporting DT_WHT method outlined as the first method of supporting whiteouts. But until then, we'll stick with what overlayfs expects us to be: dumb and stupid. Signed-off-by: NDave Chinner <dchinner@redhat.com>
-
- 23 2月, 2015 1 次提交
-
-
由 Dave Chinner 提交于
Al Viro noticed a generic set of issues to do with filehandle lookup racing with dentry cache setup. They involve a filehandle lookup occurring while an inode is being created and the filehandle lookup racing with the dentry creation for the real file. This can lead to multiple dentries for the one path being instantiated. There are a host of other issues around this same set of paths. The underlying cause is that file handle lookup only waits on inode cache instantiation rather than full dentry cache instantiation. XFS is mostly immune to the problems discovered due to it's own internal inode cache, but there are a couple of corner cases where races can happen. We currently clear the XFS_INEW flag when the inode is fully set up after insertion into the cache. Newly allocated inodes are inserted locked and so aren't usable until the allocation transaction commits. This, however, occurs before the dentry and security information is fully initialised and hence the inode is unlocked and available for lookups to find too early. To solve the problem, only clear the XFS_INEW flag for newly created inodes once the dentry is fully instantiated. This means lookups will retry until the XFS_INEW flag is removed from the inode and hence avoids the race conditions in questions. THis also means that xfs_create(), xfs_create_tmpfile() and xfs_symlink() need to finish the setup of the inode in their error paths if we had allocated the inode but failed later in the creation process. xfs_symlink(), in particular, needed a lot of help to make it's error handling match that of xfs_create(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 16 2月, 2015 2 次提交
-
-
由 Christoph Hellwig 提交于
Recall all outstanding pNFS layouts and truncates, writes and similar extent list modifying operations. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Add operations to export pNFS block layouts from an XFS filesystem. See the previous commit adding the operations for an explanation of them. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 24 12月, 2014 2 次提交
-
-
由 Carlos Maiolino 提交于
Adds a new function named xfs_cross_rename(), responsible for handling requests from sys_renameat2() using RENAME_EXCHANGE flag. Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Carlos Maiolino 提交于
To be able to support RENAME_EXCHANGE flag from renameat2() system call, XFS must have its inode_operations updated, exporting .rename2 method, instead of .rename. This patch just replaces the (now old) .rename method by .rename2, using the same infra-structure, but checking rename flags. Calls to .rename2 using RENAME_EXCHANGE flag, although now handled inside XFS, still return -EINVAL. RENAME_NOREPLACE is handled via VFS and we don't need to care about it inside xfs_vn_rename. Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 04 12月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
These functions are needed in userspace for repair and mkfs to do the right thing. Move them to libxfs so they can be easily shared. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 28 11月, 2014 3 次提交
-
-
由 Christoph Hellwig 提交于
More on-disk format consolidation. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
More on-disk format consolidation. A few declarations that weren't on-disk format related move into better suitable spots. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
More consolidatation for the on-disk format defintions. Note that the XFS_IS_REALTIME_INODE moves to xfs_linux.h instead as it is not related to the on disk format, but depends on a CONFIG_ option. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 23 9月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
On a sub-page sized filesystem, truncating a mapped region down leaves us in a world of hurt. We truncate the pagecache, zeroing the newly unused tail, then punch blocks out from under the page. If we then truncate the file back up immediately, we expose that unmapped hole to a dirty page mapped into the user application, and that's where it all goes wrong. In truncating the page cache, we avoid unmapping the tail page of the cache because it still contains valid data. The problem is that it also contains a hole after the truncate, but nobody told the mm subsystem that. Therefore, if the page is dirty before the truncate, we'll never get a .page_mkwrite callout after we extend the file and the application writes data into the hole on the page. Hence when we come to writing that region of the page, it has no blocks and no delayed allocation reservation and hence we toss the data away. This patch adds code to the truncate up case to solve it, by ensuring the partial page at the old EOF is always cleaned after we do any zeroing and move the EOF upwards. We can't actually serialise the page writeback and truncate against page faults (yes, that problem AGAIN) so this is really just a best effort and assumes it is extremely unlikely that someone is concurrently writing to the page at the EOF while extending the file. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 04 8月, 2014 1 次提交
-
-
由 Brian Foster 提交于
The offset and length parameters are converted from bytes to basic blocks by xfs_vn_fiemap(). The BTOBB() converter rounds the value up to the nearest basic block. This leads to unexpected behavior when unaligned offsets are provided to FIEMAP. Fix the conversions of byte values to block values to cover the provided offsets. Round down the start offset to the nearest basic block. Calculate the end offset based on the provided values, round up and calculate length based on the start block offset. Reported-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 25 6月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
Convert all the errors the core XFs code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 22 6月, 2014 1 次提交
-
-
由 Eric Sandeen 提交于
XFS_ERROR was designed long ago to trap return values, but it's not runtime configurable, it's not consistently used, and we can do similar error trapping with ftrace scripts and triggers from userspace. Just nuke XFS_ERROR and associated bits. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 15 5月, 2014 2 次提交
-
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 07 5月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
truncate_setsize() removes pages from the page cache, and hence requires page locks to be held. It is not valid to lock a page cache page inside a transaction context as we can hold page locks when we we reserve space for a transaction. If we do, then we expose an ABBA deadlock between log space reservation and page locks. That is, both the write path and writeback lock a page, then start a transaction for block allocation, which means they can block waiting for a log reservation with the page lock held. If we hold a log reservation and then do something that locks a page (e.g. truncate_setsize in xfs_setattr_size) then that page lock can block on the page locked and waiting for a log reservation. If the transaction that is waiting for the page lock is the only active transaction in the system that can free log space via a commit, then writeback will never make progress and so log space will never free up. This issue with xfs_setattr_size() was introduced back in 2010 by commit fa9b227e ("xfs: new truncate sequence") which moved the page cache truncate from outside the transaction context (what was xfs_itruncate_data()) to inside the transaction context as a call to truncate_setsize(). The reason truncate_setsize() was located where in this place was that we can't shouldn't change the file size until after we are in the transaction context and the operation will either succeed or shut down the filesystem on failure. However, block_truncate_page() already modifies the file contents before we enter the transaction context, so we can't really fulfill this guarantee in any way. Hence we may as well ensure that on success or failure, the in-memory inode and data is truncated away and that the application cleans up the mess appropriately. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 06 5月, 2014 1 次提交
-
-
由 Brian Foster 提交于
The current tmpfile handler does not initialize default ACLs. Doing so within xfs_vn_tmpfile() makes it roughly equivalent to xfs_vn_mknod(), which is already used as a common create handler. xfs_vn_mknod() does not currently have a mechanism to determine whether to link the file into the namespace. Therefore, further abstract xfs_vn_mknod() into a new xfs_generic_create() handler with a tmpfile parameter. This new handler calls xfs_create_tmpfile() and d_tmpfile() on the dentry when called via ->tmpfile(). Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 17 4月, 2014 1 次提交
-
-
由 Brian Foster 提交于
xfstests generic/004 reproduces an ilock deadlock using the tmpfile interface when selinux is enabled. This occurs because xfs_create_tmpfile() takes the ilock and then calls d_tmpfile(). The latter eventually calls into xfs_xattr_get() which attempts to get the lock again. E.g.: xfs_io D ffffffff81c134c0 4096 3561 3560 0x00000080 ffff8801176a1a68 0000000000000046 ffff8800b401b540 ffff8801176a1fd8 00000000001d5800 00000000001d5800 ffff8800b401b540 ffff8800b401b540 ffff8800b73a6bd0 fffffffeffffffff ffff8800b73a6bd8 ffff8800b5ddb480 Call Trace: [<ffffffff8177f969>] schedule+0x29/0x70 [<ffffffff81783a65>] rwsem_down_read_failed+0xc5/0x120 [<ffffffffa05aa97f>] ? xfs_ilock_attr_map_shared+0x1f/0x50 [xfs] [<ffffffff813b3434>] call_rwsem_down_read_failed+0x14/0x30 [<ffffffff810ed179>] ? down_read_nested+0x89/0xa0 [<ffffffffa05aa7f2>] ? xfs_ilock+0x122/0x250 [xfs] [<ffffffffa05aa7f2>] xfs_ilock+0x122/0x250 [xfs] [<ffffffffa05aa97f>] xfs_ilock_attr_map_shared+0x1f/0x50 [xfs] [<ffffffffa05701d0>] xfs_attr_get+0x90/0xe0 [xfs] [<ffffffffa0565e07>] xfs_xattr_get+0x37/0x50 [xfs] [<ffffffff8124842f>] generic_getxattr+0x4f/0x70 [<ffffffff8133fd9e>] inode_doinit_with_dentry+0x1ae/0x650 [<ffffffff81340e0c>] selinux_d_instantiate+0x1c/0x20 [<ffffffff813351bb>] security_d_instantiate+0x1b/0x30 [<ffffffff81237db0>] d_instantiate+0x50/0x70 [<ffffffff81237e85>] d_tmpfile+0xb5/0xc0 [<ffffffffa05add02>] xfs_create_tmpfile+0x362/0x410 [xfs] [<ffffffffa0559ac8>] xfs_vn_tmpfile+0x18/0x20 [xfs] [<ffffffff81230388>] path_openat+0x228/0x6a0 [<ffffffff810230f9>] ? sched_clock+0x9/0x10 [<ffffffff8105a427>] ? kvm_clock_read+0x27/0x40 [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0 [<ffffffff8123101a>] do_filp_open+0x3a/0x90 [<ffffffff817845e7>] ? _raw_spin_unlock+0x27/0x40 [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0 [<ffffffff8121e3ce>] do_sys_open+0x12e/0x210 [<ffffffff8121e4ce>] SyS_open+0x1e/0x20 [<ffffffff8178eda9>] system_call_fastpath+0x16/0x1b xfs_vn_tmpfile() also fails to initialize security on the newly created inode. Pull the d_tmpfile() call up into xfs_vn_tmpfile() after the transaction has been committed and the inode unlocked. Also, initialize security on the inode based on the parent directory provided via the tmpfile call. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 27 2月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
The change to add the IO lock to protect the directory extent map during readdir operations has cause lockdep to have a heart attack as it now sees a different locking order on inodes w.r.t. the mmap_sem because readdir has a different ordering to write(). Add a new lockdep class for directory inodes to avoid this false positive. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 10 2月, 2014 1 次提交
-
-
由 Christoph Hellwig 提交于
The VFS doesn't set the proper ATTR_CTIME and ATTR_MTIME values for truncate, so filesystems have to manually add them. The introduction of xfs_setattr_time accidentally broke this special case an caused a regression in generic/313. Fix this by removing the local mask variable in xfs_setattr_size so that we only have a single place to keep the attribute information. cc: <stable@vger.kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reported-by: NFengguang Wu <fengguang.wu@intel.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 26 1月, 2014 1 次提交
-
-
由 Christoph Hellwig 提交于
Also don't bother to set up a .get_acl method for symlinks as we do not support access control (ACLs or even mode bits) for symlinks in Linux, and create inodes with the proper mode instead of fixing it up later. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 25 1月, 2014 1 次提交
-
-
由 Al Viro 提交于
don't bother open-coding it... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 07 1月, 2014 1 次提交
-
-
由 Zhi Yong Wu 提交于
Add two functions xfs_create_tmpfile() and xfs_vn_tmpfile() to support O_TMPFILE file creation. In contrast to xfs_create(), xfs_create_tmpfile() has a different log reservation to the regular file creation because there is no directory modification, and doesn't check if an entry can be added to the directory, but the reservation quotas is required appropriately, and finally its inode is added to the unlinked list. xfs_vn_tmpfile() add one O_TMPFILE method to VFS interface and directly invoke xfs_create_tmpfile(). Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 17 12月, 2013 1 次提交
-
-
由 Jie Liu 提交于
For CRC enabled v5 super block, change a file's ownership can simply trigger an ASSERT failure at xfs_setattr_nonsize() if both group and project quota are enabled, i.e, [ 305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621 [ 305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable] [ 305.383939] Call Trace: [ 305.385536] [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs] [ 305.387142] [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs] [ 305.388727] [<ffffffff811ca388>] notify_change+0x1a8/0x350 [ 305.390298] [<ffffffff811ac39d>] chown_common+0xfd/0x110 [ 305.391868] [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110 [ 305.393440] [<ffffffff811ad760>] SyS_lchown+0x20/0x30 [ 305.394995] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f [ 305.399870] RIP [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs] This fix adjust the assertion to check if the super block support both quota inodes or not. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit 5a01dd54)
-
- 10 12月, 2013 1 次提交
-
-
由 Jie Liu 提交于
For CRC enabled v5 super block, change a file's ownership can simply trigger an ASSERT failure at xfs_setattr_nonsize() if both group and project quota are enabled, i.e, [ 305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621 [ 305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable] [ 305.383939] Call Trace: [ 305.385536] [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs] [ 305.387142] [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs] [ 305.388727] [<ffffffff811ca388>] notify_change+0x1a8/0x350 [ 305.390298] [<ffffffff811ac39d>] chown_common+0xfd/0x110 [ 305.391868] [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110 [ 305.393440] [<ffffffff811ad760>] SyS_lchown+0x20/0x30 [ 305.394995] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f [ 305.399870] RIP [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs] This fix adjust the assertion to check if the super block support both quota inodes or not. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 07 12月, 2013 2 次提交
-
-
由 Christoph Hellwig 提交于
Split out a xfs_setattr_time helper to share code between truncate and regular setattr similar to xfs_setattr_mode. I might also have another caller growing for this in the near future. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Christoph Hellwig 提交于
Remove the pointless tp argument, and properly align the local variable declarations. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 31 10月, 2013 3 次提交
-
-
由 Dave Chinner 提交于
Page cache allocation doesn't always go through ->begin_write and hence we don't always get the opportunity to set the allocation context to GFP_NOFS. Failing to do this means we open up the direct relcaim stack to recurse into the filesystem and consume a significant amount of stack. On RHEL6.4 kernels we are seeing ra_submit() and generic_file_splice_read() from an nfsd context recursing into the filesystem via the inode cache shrinker and evicting inodes. This is causing truncation to be run (e.g EOF block freeing) and causing bmap btree block merges and free space btree block splits to occur. These btree manipulations are occurring with the call chain already 30 functions deep and hence there is not enough stack space to complete such operations. To avoid these specific overruns, we need to prevent the page cache allocation from recursing via direct reclaim. We can do that because the allocation functions take the allocation context from that which is stored in the mapping for the inode. We don't set that right now, so the default is GFP_HIGHUSER_MOVABLE, which is effectively a GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so when we initialise an inode, set the mapping gfp mask appropriately. This makes the use of AOP_FLAG_NOFS redundant from other parts of the XFS IO path, so get rid of it. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The remaining non-vectorised code for the directory structure is the node format blocks. This is shared with the attribute tree, and so is slightly more complex to vectorise. Introduce a "non-directory" directory ops structure that is attached to all non-directory inodes so that attribute operations can be vectorised for all inodes. Once we do this, we can vectorise all the da btree operations. Because this patch adds more infrastructure than it removes the binary size does not decrease: text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2 789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3 789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4 789061 96802 1096 886959 d88af fs/xfs/xfs.o.p5 789733 96802 1096 887631 d8b4f fs/xfs/xfs.o.p6 Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Lots of the dir code now goes through switches to determine what is the correct on-disk format to parse. It generally involves a "xfs_sbversion_hasfoo" check, deferencing the superblock version and feature fields and hence touching several cache lines per operation in the process. Some operations do multiple checks because they nest conditional operations and they don't pass the information in a direct fashion between each other. Hence, add an ops vector to the xfs_inode structure that is configured when the inode is initialised to point to all the correct decode and encoding operations. This will significantly reduce the branchiness and cacheline footprint of the directory object decoding and encoding. This is the first patch in a series of conversion patches. It will introduce the ops structure, the setup of it and add the first operation to the vector. Subsequent patches will convert directory ops one at a time to keep the changes simple and obvious. Just this patch shows the benefit of such an approach on code size. Just converting the two shortform dir operations as this patch does decreases the built binary size by ~1500 bytes: $ size fs/xfs/xfs.o.orig fs/xfs/xfs.o.p1 text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 $ That's a significant decrease in the instruction cache footprint of the directory code for such a simple change, and indicates that this approach is definitely worth pursuing further. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 24 10月, 2013 4 次提交
-
-
由 Dave Chinner 提交于
Currently the xfs_inode.h header has a dependency on the definition of the BMAP btree records as the inode fork includes an array of xfs_bmbt_rec_host_t objects in it's definition. Move all the btree format definitions from xfs_btree.h, xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to xfs_format.h to continue the process of centralising the on-disk format definitions. With this done, the xfs inode definitions are no longer dependent on btree header files. The enables a massive culling of unnecessary includes, with close to 200 #include directives removed from the XFS kernel code base. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
xfs_trans.h has a dependency on xfs_log.h for a couple of structures. Most code that does transactions doesn't need to know anything about the log, but this dependency means that they have to include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header files and clean up the includes to be in dependency order. In doing this, remove the direct include of xfs_trans_reserve.h from xfs_trans.h so that we remove the dependency between xfs_trans.h and xfs_mount.h. Hence the xfs_trans.h include can be moved to the indicate the actual dependencies other header files have on it. Note that these are kernel only header files, so this does not translate to any userspace changes at all. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The on-disk format definitions for the directory and attribute structures are spread across 3 header files right now, only one of which is dedicated to defining on-disk structures and their manipulation (xfs_dir2_format.h). Pull all the format definitions into a single header file - xfs_da_format.h - and switch all the code over to point at that. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
All of the buffer operations structures are needed to be exported for xfs_db, so move them all to a common location rather than spreading them all over the place. They are verifying the on-disk format, so while xfs_format.h might be a good place, it is not part of the on disk format. Hence we need to create a new header file that we centralise these related definitions. Start by moving the bffer operations structures, and then also move all the other definitions that have crept into xfs_log_format.h and xfs_format.h as there was no other shared header file to put them in. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 22 10月, 2013 1 次提交
-
-
由 Christoph Hellwig 提交于
There is no reason to conditionally take the iolock inside xfs_setattr_size when we can let the caller handle it unconditionally, which just incrases the lock hold time for the case where it was previously taken internally by a few instructions. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 22 8月, 2013 1 次提交
-
-
由 Dave Chinner 提交于
Add support for the file type field in directory entries so that readdir can return the type of the inode the dirent points to to userspace without first having to read the inode off disk. The encoding of the type field is a single byte that is added to the end of the directory entry name length. For all intents and purposes, it appends a "hidden" byte to the name field which contains the type information. As the directory entry is already of dynamic size, helpers are already required to access and decode the direct entry structures. Hence the relevent extraction and iteration helpers are updated to understand the hidden byte. Helpers for reading and writing the filetype field from the directory entries are also added. Only the read helpers are used by this patch. It also adds all the code necessary to read the type information out of the dirents on disk. Further we add the superblock feature bit and helpers to indicate that we understand the on-disk format change. This is not a compatible change - existing kernels cannot read the new format successfully - so an incompatible feature flag is added. We don't yet allow filesystems to mount with this flag yet - that will be added once write support is added. Finally, the code to take the type from the VFS, convert it to an XFS on-disk type and put it into the xfs_name structures passed around is added, but the directory code does not use this field yet. That will be in the next patch. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 16 8月, 2013 1 次提交
-
-
由 Dwight Engen 提交于
Use uint32 from init_user_ns for xfs internal uid/gid representation in xfs_icdinode, xfs_dqid_t. Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDwight Engen <dwight.engen@oracle.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 13 8月, 2013 2 次提交
-
-
由 Jie Liu 提交于
With the new xfs_trans_res structure has been introduced, the log reservation size, log count as well as log flags are pre-initialized at mount time. So it's time to refine xfs_trans_reserve() interface to be more neat. Also, introduce a new helper M_RES() to return a pointer to the mp->m_resv structure to simplify the input. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
There are a few small helper functions in xfs_util, all related to xfs_inode modifications. Move them all to xfs_inode.c so all xfs_inode operations are consiolidated in the one place. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-