- 06 6月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
The directory code has a dependency on the struct xfs_mount to supply the directory block geometry. Block size, block log size, and other parameters are pre-caclulated in the struct xfs_mount or access directly from the superblock embedded in the struct xfs_mount. Extract all of this geometry information out of the struct xfs_mount and superblock and place it into a new struct xfs_da_geometry defined by the directory code. Allocate and initialise it at mount time, and attach it to the struct xfs_mount so it canbe passed back into the directory code appropriately rather than using the struct xfs_mount. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 15 5月, 2014 9 次提交
-
-
由 Dave Chinner 提交于
And we don't invert it properly when initialising the dquot lru list. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Invert it. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
And it should be negative. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 13 5月, 2014 5 次提交
-
-
由 Christoph Hellwig 提交于
And remove a very confused comment. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Replace xfs_attr_name_to_xname with a new xfs_attr_args_init helper that sets up the basic da_args structure without using a temporary xfs_name structure. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Also remove a useless ilock roundtrip for the first attr fork check, it's racy anyway and we redo it later under the ilock before we start the removal. Plus various minor style fixes to the new xfs_attr_remove. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
This allows doing an unlocked check if an attr for is present at all and slightly reduce the lock hold time if we actually do an attr get. Plus various minor style fixes to the new xfs_attr_get. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Plus various minor style fixes to the new xfs_attr_set. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 07 5月, 2014 3 次提交
-
-
由 Dave Chinner 提交于
Directory readahead can throw loud scary but harmless warnings when multiblock directories are in use a specific pattern of discontiguous blocks are found in the directory. That is, if a hole follows a discontiguous block, it will throw a warning like: XFS (dm-1): xfs_da_do_buf: bno 637 dir: inode 34363923462 XFS (dm-1): [00] br_startoff 637 br_startblock 1917954575 br_blockcount 1 br_state 0 XFS (dm-1): [01] br_startoff 638 br_startblock -2 br_blockcount 1 br_state 0 And dump a stack trace. This is because the readahead offset increment loop does a double increment of the block index - it does an increment for the loop iteration as well as increase the loop counter by the number of blocks in the extent. As a result, the readahead offset does not get incremented correctly for discontiguous blocks and hence can ask for readahead of a directory block from an offset part way through a directory block. If that directory block is followed by a hole, it will trigger a mapping warning like the above. The bad readahead will be ignored, though, because the main directory block read loop uses the correct mapping offsets rather than the readahead offset and so will ignore the bad readahead altogether. Fix the warning by ensuring that the readahead offset is correctly incremented. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Reports of a shutdown hang when fsyncing a directory have surfaced, such as this: [ 3663.394472] Call Trace: [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b If we trigger a shutdown in xlog_cil_push() from xlog_write(), we will never wake waiters on the current push sequence number, so anything waiting in xlog_cil_force_lsn() for that push sequence number to come up will not get woken and hence stall the shutdown. Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in the push abort handling, in the log shutdown code when waking all waiters, and adding a shutdown check in the sequence completion wait loops to ensure they abort when a wakeup due to a shutdown occurs. Reported-by: NBoris Ranto <branto@redhat.com> Reported-by: NEric Sandeen <esandeen@redhat.com> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
truncate_setsize() removes pages from the page cache, and hence requires page locks to be held. It is not valid to lock a page cache page inside a transaction context as we can hold page locks when we we reserve space for a transaction. If we do, then we expose an ABBA deadlock between log space reservation and page locks. That is, both the write path and writeback lock a page, then start a transaction for block allocation, which means they can block waiting for a log reservation with the page lock held. If we hold a log reservation and then do something that locks a page (e.g. truncate_setsize in xfs_setattr_size) then that page lock can block on the page locked and waiting for a log reservation. If the transaction that is waiting for the page lock is the only active transaction in the system that can free log space via a commit, then writeback will never make progress and so log space will never free up. This issue with xfs_setattr_size() was introduced back in 2010 by commit fa9b227e ("xfs: new truncate sequence") which moved the page cache truncate from outside the transaction context (what was xfs_itruncate_data()) to inside the transaction context as a call to truncate_setsize(). The reason truncate_setsize() was located where in this place was that we can't shouldn't change the file size until after we are in the transaction context and the operation will either succeed or shut down the filesystem on failure. However, block_truncate_page() already modifies the file contents before we enter the transaction context, so we can't really fulfill this guarantee in any way. Hence we may as well ensure that on success or failure, the in-memory inode and data is truncated away and that the application cleans up the mess appropriately. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 06 5月, 2014 2 次提交
-
-
由 Dave Chinner 提交于
Commit e461fcb1 ("xfs: remote attribute lookups require the value length") passes the remote attribute length in the xfs_da_args structure on lookup so that CRC calculations and validity checking can be performed correctly by related code. This, unfortunately has the side effect of changing the args->valuelen parameter in cases where it shouldn't. That is, when we replace a remote attribute, the incoming replacement stores the value and length in args->value and args->valuelen, but then the lookup which finds the existing remote attribute overwrites args->valuelen with the length of the remote attribute being replaced. Hence when we go to create the new attribute, we create it of the size of the existing remote attribute, not the size it is supposed to be. When the new attribute is much smaller than the old attribute, this results in a transaction overrun and an ASSERT() failure on a debug kernel: XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 331 Fix this by keeping the remote attribute value length separate to the attribute value length in the xfs_da_args structure. The enables us to pass the length of the remote attribute to be removed without overwriting the new attribute's length. Also, ensure that when we save remote block contexts for a later rename we zero the original state variables so that we don't confuse the state of the attribute to be removes with the state of the new attribute that we just added. [Spotted by Brain Foster.] Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
The current tmpfile handler does not initialize default ACLs. Doing so within xfs_vn_tmpfile() makes it roughly equivalent to xfs_vn_mknod(), which is already used as a common create handler. xfs_vn_mknod() does not currently have a mechanism to determine whether to link the file into the namespace. Therefore, further abstract xfs_vn_mknod() into a new xfs_generic_create() handler with a tmpfile parameter. This new handler calls xfs_create_tmpfile() and d_tmpfile() on the dentry when called via ->tmpfile(). Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 05 5月, 2014 5 次提交
-
-
由 From: Tuomas Tynkkynen 提交于
xfs_{compat_,}attrmulti_by_handle could return an errno with incorrect sign in some cases. While at it, make sure ENOMEM is returned instead of E2BIG if kmalloc fails. Signed-off-by: NTuomas Tynkkynen <tuomas.tynkkynen@iki.fi> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
group and project quota hints are currently stored on the user dquot. If we are attaching quotas to the inode, then the group and project dquots are stored as hints on the user dquot to save having to look them up again later. The thing is, the hints are not used for that inode for the rest of the life of the inode - the dquots are attached directly to the inode itself - so the only time the hints are used is when an inode first has dquots attached. When the hints on the user dquot don't match the dquots being attache dto the inode, they are then removed and replaced with the new hints. If a user is concurrently modifying files in different group and/or project contexts, then this leads to thrashing of the hints attached to user dquot. If user quotas are not enabled, then hints are never even used. So, if the hints are used to avoid the cost of the lookup, is the cost of the lookup significant enough to justify the hint infrstructure? Maybe it was once, when there was a global quota manager shared between all XFS filesystems and was hash table based. However, lookups are now much simpler, requiring only a single lock and radix tree lookup local to the filesystem and no hash or LRU manipulations to be made. Hence the cost of lookup is much lower than when hints were implemented. Turns out that benchmarks show that, too, with thir being no differnce in performance when doing file creation workloads as a single user with user, group and project quotas enabled - the hints do not make the code go any faster. In fact, removing the hints shows a 2-3% reduction in the time it takes to create 50 million inodes.... So, let's just get rid of the hints and the complexity around them. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
Coverity noticed that if we sent junk into xfs_qm_scall_trunc_qfiles(), we could get back an uninitialized error value. So sanitize the flags we will accept, and initialize error anyway for good measure. (This bug may have been introduced via c61a9e39). Should resolve Coverity CID 1163872. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
The Q_XQUOTARM quotactl was not working properly, because we weren't passing around proper flags. The xfs_fs_set_xstate() ioctl handler used the same flags for Q_XQUOTAON/OFF as well as for Q_XQUOTARM, but Q_XQUOTAON/OFF look for XFS_UQUOTA_ACCT, XFS_UQUOTA_ENFD, XFS_GQUOTA_ACCT etc, i.e. quota type + state, while Q_XQUOTARM looks only for the type of quota, i.e. XFS_DQ_USER, XFS_DQ_GROUP etc. Unfortunately these flag spaces overlap a bit, so we got semi-random results for Q_XQUOTARM; i.e. the value for XFS_DQ_USER == XFS_UQUOTA_ACCT, etc. yeargh. Add a new quotactl op vector specifically for the QUOTARM operation, since it operates with a different flag space. This has been broken more or less forever, AFAICT. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Acked-by: NJan Kara <jack@suse.cz> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We have had this code in the kernel for over a year now and have shaken all the known issues out of the code over the past few releases. It's now time to remove the experimental warnings during mount and fully support the new filesystem format in production systems. Remove the experimental warning, and add a version number to the initial "mounting filesystem" message to tell use what type of filesystem is being mounted. Also, remove the temporary inode cluster size output at mount time now we know that this code works fine. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 24 4月, 2014 11 次提交
-
-
由 Brian Foster 提交于
Add the finobt feature bit to the list of known features. As of this point, the kernel code knows how to mount and manage both finobt and non-finobt formatted filesystems. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
Define the XFS_FSOP_GEOM_FLAGS_FINOBT fs geometry flag and set the associated bit if the filesystem supports the free inode btree. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
Add finobt support to growfs. Initialize the agi root/level fields and the root finobt block. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
An inode free operation can have several effects on the finobt. If all inodes have been freed and the chunk deallocated, we remove the finobt record. If the inode chunk was previously full, we must insert a new record based on the existing inobt record. Otherwise, we modify the record in place. Create the xfs_difree_finobt() function to identify the potential scenarios and update the finobt appropriately. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
Refactor xfs_difree() in preparation for the finobt. xfs_difree() performs the validity checks against the ag and reads the agi header. The work of physically updating the inode allocation btree is pushed down into the new xfs_difree_inobt() helper. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
Replace xfs_dialloc_ag() with an implementation that looks for a record in the finobt. The finobt only tracks records with at least one free inode. This eliminates the need for the intra-ag scan in the original algorithm. Once the inode is allocated, update the finobt appropriately (possibly removing the record) as well as the inobt. Move the original xfs_dialloc_ag() algorithm to xfs_dialloc_ag_inobt() and fall back as such if finobt support is not enabled. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
A newly allocated inode chunk, by definition, has at least one free inode, so a record is always inserted into the finobt. Create the xfs_inobt_insert() helper from existing code to insert a record in an inobt based on the provided BTNUM. Update xfs_ialloc_ag_alloc() to invoke the helper for the existing XFS_BTNUM_INO tree and XFS_BTNUM_FINO tree, if enabled. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
Create the xfs_calc_finobt_res() helper to calculate the finobt log reservation for inode allocation and free. Update XFS_IALLOC_SPACE_RES() to reserve blocks for the additional finobt insertion on inode allocation. Create XFS_IFREE_SPACE_RES() to reserve blocks for the potential finobt record insertion on inode free (i.e., if an inode chunk was previously fully allocated). Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
Define the AGI fields for the finobt root/level and add magic numbers. Update the btree code to add support for the new XFS_BTNUM_FINOBT inode btree. The finobt root block is reserved immediately following the inobt root block in the AG. Update XFS_PREALLOC_BLOCKS() to determine the starting AG data block based on whether finobt support is enabled. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
Reserve a v5 read-only compatibility feature bit for the finobt and create the xfs_sb_version_hasfinobt() helper to determine whether an fs has the feature enabled. The finobt does not change existing on-disk structures, but must remain consistent with the ialloc btree. Modifications from older kernels would violate that constrant. Therefore, we restrict older kernels to read-only mounts of finobt-enabled filesystems. Note that this does not yet enable the ability to rw mount a finobt fs (by setting the feature bit in the XFS_SB_FEAT_RO_COMPAT_ALL mask). Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
The introduction of the free inode btree (finobt) requires that xfs_ialloc_btree.c handle multiple trees. Refactor xfs_ialloc_btree.c so the caller specifies the btree type on cursor initialization to prepare for addition of the finobt. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 23 4月, 2014 4 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
There is no good reason to create a filestream when a directory entry is created. Delay it until the first allocation happens to simply the code and reduce the amount of mru cache lookups we do. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
We only have very few of these around, and allocation isn't that much of a hot path. Remove the slab cache to simplify the code, and to not waste any resources for the usual case of not having any inodes that use the filestream allocator. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
In Linux we will always be able to find a parent inode for file that are undergoing I/O. Use this to simply the file stream allocator by only keeping track of parent inodes. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-