- 09 2月, 2010 1 次提交
-
-
由 Christoph Hellwig 提交于
When an inode has already be flushed delayed write, xfs_inode_clean() returns true and hence xfs_fs_write_inode() can return on a synchronous inode write without having written the inode. Currently these sycnhronous writes only come sync(1), unmount, a sycnhronous NFS export and cachefiles so should be relatively rare and out of common performance paths. Realistically, a synchronous inode write is not necessary here; we can avoid writing the inode by logging any non-transactional changes that are pending. This needs to be done with synchronous transactions, but it avoids seeking between the log and inode clusters as we do now. We don't force the log if the inode is pinned, though, so this differs from the fsync case. For normal sys_sync and unmount behaviour this is fine because we do a synchronous log force in xfs_sync_data which is called from the ->sync_fs code. It does however break the NFS synchronous export guarantees for now, but work is under way to fix this at a higher level or for the higher level to provide an additional flag in the writeback control to tell us that a log force is needed. Portions of this patch are based on work from Dave Chinner. Signed-off-by: NChristoph Hellwig <hch@infradead.org> Reviewed-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
- 02 2月, 2010 1 次提交
-
-
由 Christoph Hellwig 提交于
We always need to flush the disk write cache and can't skip it just because the no inode attributes have changed. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <david@fromorbit.com>
-
- 04 2月, 2010 1 次提交
-
-
由 Dave Chinner 提交于
dquots are never flushed asynchronously. Remove the flag and the async write support from the flush function. Make the default flush a delwri flush to make the inode flush code, which leaves the XFS_QMOPT_SYNC the only flag remaining. Convert that to use SYNC_WAIT instead, just like the inode flush code. V2: - just pass flush flags straight through Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 26 1月, 2010 2 次提交
-
-
由 Dave Chinner 提交于
xfs_qm_dqflock_pushbuf_wait() does a very similar trick to item pushing used to do to flush out delayed write dquot buffers. Change it to use the new promotion method rather than an async flush. Also, xfs_qm_dqflock_pushbuf_wait() can return without the flush lock held, yet the callers make the assumption that after this call the flush lock is held. Always return with the flush lock held. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
Currently when the xfsbufd writes delayed write buffers, it pushes them to disk in the order they come off the delayed write list. If there are lots of buffers ѕpread widely over the disk, this results in overwhelming the elevator sort queues in the block layer and we end up losing the posibility of merging adjacent buffers to minimise the number of IOs. Use the new generic list_sort function to sort the delwri dispatch queue before issue to ensure that the buffers are pushed in the most friendly order possible to the lower layers. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 02 2月, 2010 1 次提交
-
-
由 Dave Chinner 提交于
All buffers logged into the AIL are marked as delayed write. When the AIL needs to push the buffer out, it issues an async write of the buffer. This means that IO patterns are dependent on the order of buffers in the AIL. Instead of flushing the buffer, promote the buffer in the delayed write list so that the next time the xfsbufd is run the buffer will be flushed by the xfsbufd. Return the state to the xfsaild that the buffer was promoted so that the xfsaild knows that it needs to cause the xfsbufd to run to flush the buffers that were promoted. Using the xfsbufd for issuing the IO allows us to dispatch all buffer IO from the one queue. This means that we can make much more enlightened decisions on what order to flush buffers to disk as we don't have multiple places issuing IO. Optimisations to xfsbufd will be in a future patch. Version 2 - kill XFS_ITEM_FLUSHING as it is now unused. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 06 2月, 2010 2 次提交
-
-
由 Dave Chinner 提交于
We currently do background inode flush asynchronously, resulting in inodes being written in whatever order the background writeback issues them. Not only that, there are also blocking and non-blocking asynchronous inode flushes, depending on where the flush comes from. This patch completely removes asynchronous inode writeback. It removes all the strange writeback modes and replaces them with either a synchronous flush or a non-blocking delayed write flush. That is, inode flushes will only issue IO directly if they are synchronous, and background flushing may do nothing if the operation would block (e.g. on a pinned inode or buffer lock). Delayed write flushes will now result in the inode buffer sitting in the delwri queue of the buffer cache to be flushed by either an AIL push or by the xfsbufd timing out the buffer. This will allow accumulation of dirty inode buffers in memory and allow optimisation of inode cluster writeback at the xfsbufd level where we have much greater queue depths than the block layer elevators. We will also get adjacent inode cluster buffer IO merging for free when a later patch in the series allows sorting of the delayed write buffers before dispatch. This effectively means that any inode that is written back by background writeback will be seen as flush locked during AIL pushing, and will result in the buffers being pushed from there. This writeback path is currently non-optimal, but the next patch in the series will fix that problem. A side effect of this delayed write mechanism is that background inode reclaim will no longer directly flush inodes, nor can it wait on the flush lock. The result is that inode reclaim must leave the inode in the reclaimable state until it is clean. Hence attempts to reclaim a dirty inode in the background will simply skip the inode until it is clean and this allows other mechanisms (i.e. xfsbufd) to do more optimal writeback of the dirty buffers. As a result, the inode reclaim code has been rewritten so that it no longer relies on the ambiguous return values of xfs_iflush() to determine whether it is safe to reclaim an inode. Portions of this patch are derived from patches by Christoph Hellwig. Version 2: - cleanup reclaim code as suggested by Christoph - log background reclaim inode flush errors - just pass sync flags to xfs_iflush Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
A.K.A.: don't rely on xfs_iflush() return value in reclaim We have gradually been moving checks out of the reclaim code because they are duplicated in xfs_iflush(). We've had a history of problems in this area, and many of them stem from the overloading of the return values from xfs_iflush() and interaction with inode flush locking to determine if the inode is safe to reclaim. With the desire to move to delayed write flushing of inodes and non-blocking inode tree reclaim walks, the overloading of the return value of xfs_iflush makes it very difficult to determine the correct thing to do next. This patch explicitly re-adds the checks to the inode reclaim code, removing the reliance on the return value of xfs_iflush() to determine what to do next. It also means that we can clearly document all the inode states that reclaim must handle and hence we can easily see that we handled all the necessary cases. This also removes the need for the xfs_inode_clean() check in xfs_iflush() as all callers now check this first (safely). Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 09 2月, 2010 1 次提交
-
-
由 Eric Sandeen 提交于
This mangles the reserved blocks counts a little more. 1) add a helper function for the default reserved count 2) add helper functions to save/restore counts on ro/rw 3) save/restore reserved blocks on freeze/thaw 4) disallow changing reserved count while readonly V2: changed field name to match Dave's changes Signed-off-by: NEric Sandeen <sandeen@sandeen.net> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 26 1月, 2010 2 次提交
-
-
由 Dave Chinner 提交于
Because they cause warnings in static inline functions conditionally compiled into XFS from the VFS (e.g. fsnotify). Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
If we hold onto reserved blocks when doing a remount,ro we end up writing the blocks used count to disk that includes the reserved blocks. Reserved blocks are not actually used, so this results in the values in the superblock being incorrect. Hence if we run xfs_check or xfs_repair -n while the filesystem is mounted remount,ro we end up with an inconsistent filesystem being reported. Also, running xfs_copy on the remount,ro filesystem will result in an inconsistent image being generated. To fix this, unreserve the blocks when doing the remount,ro, and reserved them again on remount,rw. This way a remount,ro filesystem will appear consistent on disk to all utilities. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 22 1月, 2010 8 次提交
-
-
由 Christoph Hellwig 提交于
A "df" run on an NFS client of an exported XFS file system reports the wrong information for "available" blocks. When a block quota is enforced, the amount reported as free is limited by the quota, but the amount reported available is not (and should be). Reported-by: NGuk-Bong, Kwon <gbkwon@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
We use the KM_LARGE flag to make kmem_alloc and friends use vmalloc if necessary. As we only need this for a few boot/mount time allocations just switch to explicit vmalloc calls there. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Remove the XFS_LOG_FORCE argument which was always set, and the XFS_LOG_URGE define, which was never used. Split xfs_log_force into a two helpers - xfs_log_force which forces the whole log, and xfs_log_force_lsn which forces up to the specified LSN. The underlying implementations already were entirely separate, as were the users. Also re-indent the new _xfs_log_force/_xfs_log_force which previously had a weird coding style. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
This macro only obsfucates the log item type assignments, so kill it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Currently we define aliases for the buffer flags in various namespaces, which only adds confusion. Remove all but the XBF_ flags to clean this up a bit. Note that we still abuse XFS_B_ASYNC/XBF_ASYNC for some non-buffer uses, but I'll clean that up later. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Wire up quota_send_warning to send quota warnings over netlink. This is used by various desktops to show user quota warnings. Tested by running the quota_nld daemon while running the xfstest quota tests and observing the warnings. I'll see how I can get a more formal testcase for it written. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Move the error code selection after the goto label and fold the xfs_quota_error helper into it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
The option is unused and one of the few remaining users of xfs_bawrite, so let's get rid of it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 20 1月, 2010 10 次提交
-
-
由 Dave Chinner 提交于
gcc warns of an array subscript out of bounds in xfs_mod_sb(). The code is written in such a way that if the array subscript is out of bounds, then it will assert fail. Rearrange the code to avoid the bounds check warning. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
Initialise the xfs_bmalloca_t structure to zero to avoid uninitialised variable warnings. This is done by zeroing the arg structure rather than using the uninitialised_var() trick so we know for certain that the structure is correctly initialised as xfs_bmapi is a very complex function and it is difficult to prove warnings are spurious. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
The -fno-unsigned-char directive has no effect anymore as the XFs build is clean. However, the kernel build hides pointer sign differences so turn that back on so that we can clean up all the mismatches prior to a userspace code resync. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
We are now consistently using unsigned char strings for names so fix up the remaining warnings in the dir2 code to complete the cleanup. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
To be consistent with the directory code, the attr code should use unsigned names. Convert the names from the vfs at the highest level to unsigned, and ænsure they are consistenly used as unsigned down to disk. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
xfs_buf_iomove() uses xfs_caddr_t as it's parameter types, but it doesn't care about the signedness of the variables as it is just copying the data. Change the prototype to use void * so that we don't get sign warnings at call sites. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
For consistency with the result of the code. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
To be consistent across the codebase, convert the dirnameops to pass the directory names by unsigned char strings. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
dmops uses a signed char for it's namespace event. To be consistent with the rest of the code, convert them to unsigned char for the namespace string. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
Convert the struct xfs_name to use unsigned chars for the name strings to match both what is stored on disk (__uint8_t) and what the VFS expects (unsigned char). Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 16 1月, 2010 11 次提交
-
-
由 Christoph Hellwig 提交于
Move xfsbdstrat and xfs_bdstrat_cb from xfs_lrw.c and xfs_bioerror and xfs_bioerror_relse from xfs_rw.c into xfs_buf.c. This also means xfs_bioerror and xfs_bioerror_relse can be marked static now. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Fold XFS_bwrite into it's only caller, xfs_bwrite and move it into xfs_buf.c instead of leaving it as a fairly large inline function. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Don't bother using XFS_bwrite as it doesn't provide much code for our use case. Instead opencode it and fold xlog_bdstrat_cb into the new xlog_bdstrat helper. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Now that the perag structure is allocated memory rather than held in an array, we don't need to have the busy extent array external to the structure. Embed it into the perag structure to avoid needing an extra allocation when setting up. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Add proper error handling in case an error occurs while initializing new perag structures for a mount point. The mount structure is restored to its previous state by deleting and freeing any perag structures added during the call. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
The filestreams cache flush is not needed in the sync code as it does not affect data writeback, and it is now not used by the growfs code, either, so kill it. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Uninline xfs_perag_{get,put} so that tracepoints can be inserted into them to speed debugging of reference count problems. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Reference count the per-ag structures to ensure that we keep get/put pairs balanced. Assert that the reference counts are zero at unmount time to catch leaks. In future, reference counts will enable us to safely remove perag structures by allowing us to detect when they are no longer in use. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
The use of an array for the per-ag structures requires reallocation of the array when growing the filesystem. This requires locking access to the array to avoid use after free situations, and the locking is difficult to get right. To avoid needing to reallocate an array, change the per-ag structures to an allocated object per ag and index them using a tree structure. The AGs are always densely indexed (hence the use of an array), but the number supported is 2^32 and lookups tend to be random and hence indexing needs to scale. A simple choice is a radix tree - it works well with this sort of index. This change also removes another large contiguous allocation from the mount/growfs path in XFS. The growing process now needs to change to only initialise the new AGs required for the extra space, and as such only needs to exclusively lock the tree for inserts. The rest of the code only needs to lock the tree while doing lookups, and hence this will remove all the deadlocks that currently occur on the m_perag_lock as it is now an innermost lock. The lock is also changed to a spinlock from a read/write lock as the hold time is now extremely short. To complete the picture, the per-ag structures will need to be reference counted to ensure that we don't free/modify them while they are still in use. This will be done in subsequent patch. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Convert the remaining direct lookups of the per ag structures to use get/put accesses. Ensure that the loops across AGs and prior users of the interface balance gets and puts correctly. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Use xfs_perag_get() and xfs_perag_put() in the filestreams code. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-