- 12 10月, 2011 2 次提交
-
-
由 Christoph Hellwig 提交于
Remove the xfs_buf_relse from xfs_bwrite and let the caller handle it to mirror the delwri and read paths. Also remove the mount pointer passed to xfs_bwrite, which is superflous now that we have a mount pointer in the buftarg. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Unify the ways we add buffers to the delwri queue by always calling xfs_buf_delwri_queue directly. The xfs_bdwrite functions is removed and opencoded in its callers, and the two places setting XBF_DELWRI while a buffer is locked and expecting xfs_buf_unlock to pick it up are converted to call xfs_buf_delwri_queue directly, too. Also replace the XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue to make the explicit queuing/dequeuing more obvious. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 13 8月, 2011 1 次提交
-
-
由 Christoph Hellwig 提交于
Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the annoying subdirectories in the XFS source code. Besides the large amount of file rename the only changes are to the Makefile, a few files including headers with the subdirectory prefix, and the binary sysctl compat code that includes a header under fs/xfs/ from kernel/. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 26 7月, 2011 11 次提交
-
-
由 Chandra Seetharaman 提交于
Remove the definition and usages of the macro XFS_BUFTARG_NAME. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Chandra Seetharaman 提交于
Remove the definition and usages of the macro XFS_BUF_TARGET Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Chandra Seetharaman 提交于
Remove the macro XFS_BUF_SET_TARGET. hch: As all the buffer allocator already set ->b_target it should be safe to simply remove these calls. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Chandra Seetharaman 提交于
Replace the macro XFS_BUF_ISPINNED with an inline helper function xfs_buf_ispinned, and change all its usages. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Chandra Seetharaman 提交于
Remove the definition and usages of the macro XFS_BUF_SET_PTR. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Chandra Seetharaman 提交于
Remove the definition and usages of the macro XFS_BUF_PTR. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Chandra Seetharaman 提交于
Remove the definition and usage of the macro XFS_BUF_SET_START. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Chandra Seetharaman 提交于
Remove the definition and usage of the macro XFS_BUF_HOLD Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Chandra Seetharaman 提交于
Remove the definitions and uses of the macros XFS_BUF_BUSY, XFS_BUF_UNBUSY, and XFS_BUF_ISBUSY. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Chandra Seetharaman 提交于
Remove the definitions and usage of the macros XFS_BUF_ERROR, XFS_BUF_GETERROR and XFS_BUF_ISERROR. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Chandra Seetharaman 提交于
Remove the definition of the macro XFS_BUF_BFLAGS and its usage. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 13 7月, 2011 3 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Christoph Hellwig 提交于
Replace the typeless b_fspriv2 and the ugly macros around it with a properly typed transaction pointer. As a fallout the log buffer state debug checks are also removed. We could have kept them using casts, but as they do not have a real purpose we can as well just remove them. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 08 7月, 2011 3 次提交
-
-
由 Christoph Hellwig 提交于
Remove the unused and misnamed _XBF_RUN_QUEUES flag, rename XBF_LOG_BUFFER to the more fitting XBF_SYNCIO, and split XBF_ORDERED into XBF_FUA and XBF_FLUSH to allow more fine grained control over the bio flags. Also cleanup processing of the flags in _xfs_buf_ioapply to make more sense, and renumber the sparse flag number space to group flags by purpose. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Christoph Hellwig 提交于
Rename xfs_buf_cond_lock and reverse it's return value to fit most other trylock operations in the Kernel and XFS (with the exception of down_trylock, after which xfs_buf_cond_lock was modelled), and replace xfs_buf_lock_val with an xfs_buf_islocked for use in asserts, or and opencoded variant in tracing. remove the XFS_BUF_* wrappers for all the locking helpers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 20 5月, 2011 1 次提交
-
-
由 Dave Chinner 提交于
When we free a vmapped buffer, we need to ensure the vmap address and length we free is the same as when it was allocated. In various places in the log code we change the memory the buffer is pointing to before issuing IO, but we never reset the buffer to point back to it's original memory (or no memory, if that is the case for the buffer). As a result, when we free the buffer it points to memory that is owned by something else and attempts to unmap and free it. Because the range does not match any known mapped range, it can trigger BUG_ON() traps in the vmap code, and potentially corrupt the vmap area tracking. Fix this by always resetting these buffers to their original state before freeing them. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 26 3月, 2011 1 次提交
-
-
由 Dave Chinner 提交于
Now that the buffer cache has it's own LRU, we do not need to use the page cache to provide persistent caching and reclaim infrastructure. Convert the buffer cache to use alloc_pages() instead of the page cache. This will remove all the overhead of page cache management from setup and teardown of the buffers, as well as needing to mark pages accessed as we find buffers in the buffer cache. By avoiding the page cache, we also remove the need to keep state in the page_private(page) field for persistant storage across buffer free/buffer rebuild and so all that code can be removed. This also fixes the long-standing problem of not having enough bits in the page_private field to track all the state needed for a 512 sector/64k page setup. It also removes the need for page locking during reads as the pages are unique to the buffer and nobody else will be attempting to access them. Finally, it removes the buftarg address space lock as a point of global contention on workloads that allocate and free buffers quickly such as when creating or removing large numbers of inodes in parallel. This remove the 16TB limit on filesystem size on 32 bit machines as the page index (32 bit) is no longer used for lookups of metadata buffers - the buffer cache is now solely indexed by disk address which is stored in a 64 bit field in the buffer. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
- 12 1月, 2011 1 次提交
-
-
由 Christoph Hellwig 提交于
If we get an IO error on a synchronous superblock write, we attach an error release function to it so that when the last reference goes away the release function is called and the buffer is invalidated and unlocked. The buffer is left locked until the release function is called so that other concurrent users of the buffer will be locked out until the buffer error is fully processed. Unfortunately, for the superblock buffer the filesyetm itself holds a reference to the buffer which prevents the reference count from dropping to zero and the release function being called. As a result, once an IO error occurs on a sync write, the buffer will never be unlocked and all future attempts to lock the buffer will hang. To make matters worse, this problems is not unique to such buffers; if there is a concurrent _xfs_buf_find() running, the lookup will grab a reference to the buffer and then wait on the buffer lock, preventing the reference count from ever falling to zero and hence unlocking the buffer. As such, the whole b_relse function implementation is broken because it cannot rely on the buffer reference count falling to zero to unlock the errored buffer. The synchronous write error path is the only path that uses this callback - it is used to ensure that the synchronous waiter gets the buffer error before the error state is cleared from the buffer by the release function. Given that the only sychronous buffer writes now go through xfs_bwrite and the error path in question can only occur for a write of a dirty, logged buffer, we can move most of the b_relse processing to happen inline in xfs_buf_iodone_callbacks, just like a normal I/O completion. In addition to that we make sure the error is not cleared in xfs_buf_iodone_callbacks, so that xfs_bwrite can reliably check it. Given that xfs_bwrite keeps the buffer locked until it has waited for it and checked the error this allows to reliably propagate the error to the caller, and make sure that the buffer is reliably unlocked. Given that xfs_buf_iodone_callbacks was the only instance of the b_relse callback we can remove it entirely. Based on earlier patches by Dave Chinner and Ajeet Yadav. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reported-by: NAjeet Yadav <ajeet.yadav.77@gmail.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 02 12月, 2010 2 次提交
-
-
由 Dave Chinner 提交于
Now that the buffer reclaim infrastructure can handle different reclaim priorities for different types of buffers, reconnect the hooks in the XFS code that has been sitting dormant since it was ported to Linux. This should finally give use reclaim prioritisation that is on a par with the functionality that Irix provided XFS 15 years ago. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
Introduce a per-buftarg LRU for memory reclaim to operate on. This is the last piece we need to put in place so that we can fully control the buffer lifecycle. This allows XFS to be responsibile for maintaining the working set of buffers under memory pressure instead of relying on the VM reclaim not to take pages we need out from underneath us. The implementation introduces a b_lru_ref counter into the buffer. This is currently set to 1 whenever the buffer is referenced and so is used to determine if the buffer should be added to the LRU or not when freed. Effectively it allows lazy LRU initialisation of the buffer so we do not need to touch the LRU list and locks in xfs_buf_find(). Instead, when the buffer is being released and we drop the last reference to it, we check the b_lru_ref count and if it is none zero we re-add the buffer reference and add the inode to the LRU. The b_lru_ref counter is decremented by the shrinker, and whenever the shrinker comes across a buffer with a zero b_lru_ref counter, if released the LRU reference on the buffer. In the absence of a lookup race, this will result in the buffer being freed. This counting mechanism is used instead of a reference flag so that it is simple to re-introduce buffer-type specific reclaim reference counts to prioritise reclaim more effectively. We still have all those hooks in the XFS code, so this will provide the infrastructure to re-implement that functionality. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 30 11月, 2010 1 次提交
-
-
由 Dave Chinner 提交于
Before we introduce per-buftarg LRU lists, split the shrinker implementation into per-buftarg shrinker callbacks. At the moment we wake all the xfsbufds to run the delayed write queues to free the dirty buffers and make their pages available for reclaim. However, with an LRU, we want to be able to free clean, unused buffers as well, so we need to separate the xfsbufd from the shrinker callbacks. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
- 19 10月, 2010 7 次提交
-
-
由 Christoph Hellwig 提交于
Stop having two different names for many buffer functions and use the more descriptive xfs_buf_* names directly. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
pahole reports the struct xfs_buf has quite a few holes in it, so packing the structure better will reduce the size of it by 16 bytes. Also, move all the fields used in cache lookups into the first cacheline. Before on x86_64: /* size: 320, cachelines: 5 */ /* sum members: 298, holes: 6, sum holes: 22 */ After on x86_64: /* size: 304, cachelines: 5 */ /* padding: 6 */ /* last cacheline: 48 bytes */ Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
The buffer cache hash is showing typical hash scalability problems. In large scale testing the number of cached items growing far larger than the hash can efficiently handle. Hence we need to move to a self-scaling cache indexing mechanism. I have selected rbtrees for indexing becuse they can have O(log n) search scalability, and insert and remove cost is not excessive, even on large trees. Hence we should be able to cache large numbers of buffers without incurring the excessive cache miss search penalties that the hash is imposing on us. To ensure we still have parallel access to the cache, we need multiple trees. Rather than hashing the buffers by disk address to select a tree, it seems more sensible to separate trees by typical access patterns. Most operations use buffers from within a single AG at a time, so rather than searching lots of different lists, separate the buffer indexes out into per-AG rbtrees. This means that searches during metadata operation have a much higher chance of hitting cache resident nodes, and that updates of the tree are less likely to disturb trees being accessed on other CPUs doing independent operations. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Filesystem level managed buffers are buffers that have their lifecycle controlled by the filesystem layer, not the buffer cache. We currently cache these buffers, which makes cleanup and cache walking somewhat troublesome. Convert the fs managed buffers to uncached buffers obtained by via xfs_buf_get_uncached(), and remove the XBF_FS_MANAGED special cases from the buffer cache. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Each buffer contains both a buftarg pointer and a mount pointer. If we add a mount pointer into the buftarg, we can avoid needing the b_mount field in every buffer and grab it from the buftarg when needed instead. This shrinks the xfs_buf by 8 bytes. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
To avoid the need to use cached buffers for single-shot or buffers cached at the filesystem level, introduce a new buffer read primitive that bypasses the cache an reads directly from disk. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
xfs_buf_get_nodaddr() is really used to allocate a buffer that is uncached. While it is not directly assigned a disk address, the fact that they are not cached is a more important distinction. With the upcoming uncached buffer read primitive, we should be consistent with this disctinction. While there, make page allocation in xfs_buf_get_nodaddr() safe against memory reclaim re-entrancy into the filesystem by allowing a flags parameter to be passed. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
- 10 9月, 2010 1 次提交
-
-
由 Christoph Hellwig 提交于
Switch to the WRITE_FLUSH_FUA flag for log writes and remove the EOPNOTSUPP detection for barriers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 02 9月, 2010 1 次提交
-
-
由 Dave Chinner 提交于
When doing large parallel file creates on a 16p machines, large amounts of time is being spent in _xfs_buf_find(). A system wide profile with perf top shows this: 1134740.00 19.3% _xfs_buf_find 733142.00 12.5% __ticket_spin_lock The problem is that the hash contains 45,000 buffers, and the hash table width is only 256 buffers. That means we've got around 200 buffers per chain, and searching it is quite expensive. The hash table size needs to increase. Secondly, every time we do a lookup, we promote the buffer we find to the head of the hash chain. This is causing cachelines to be dirtied and causes invalidation of cachelines across all CPUs that may have walked the hash chain recently. hence every walk of the hash chain is effectively a cold cache walk. Remove the promotion to avoid this invalidation. The results are: 1045043.00 21.2% __ticket_spin_lock 326184.00 6.6% _xfs_buf_find A 70% drop in the CPU usage when looking up buffers. Unfortunately that does not result in an increase in performance underthis workload as contention on the inode_lock soaks up most of the reduction in CPU usage. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 27 7月, 2010 3 次提交
-
-
由 Christoph Hellwig 提交于
The b_strat callback is used by xfs_buf_iostrategy to perform additional checks before submitting a buffer. It is used in xfs_bwrite and when writing out delayed buffers. In xfs_bwrite it we can de-virtualize the call easily as b_strat is set a few lines above the call to xfs_buf_iostrategy. For the delayed buffers the rationale is a bit more complicated: - there are three callers of xfs_buf_delwri_queue, which places buffers on the delwri list: (1) xfs_bdwrite - this sets up b_strat, so it's fine (2) xfs_buf_iorequest. None of the callers can have XBF_DELWRI set: - xlog_bdstrat is only used for log buffers, which are never delwri - _xfs_buf_read explicitly clears the delwri flag - xfs_buf_iodone_work retries log buffers only - xfsbdstrat - only used for reads, superblock writes without the delwri flag, log I/O and file zeroing with explicitly allocated buffers. - xfs_buf_iostrategy - only calls xfs_buf_iorequest if b_strat is not set (3) xfs_buf_unlock - only puts the buffer on the delwri list if the DELWRI flag is already set. The DELWRI flag is only ever set in xfs_bwrite, xfs_buf_iodone_callbacks, or xfs_trans_log_buf. For xfs_buf_iodone_callbacks and xfs_trans_log_buf we require an initialized buf item, which means b_strat was set to xfs_bdstrat_cb in xfs_buf_item_init. Conclusion: we can just get rid of the callback and replace it with explicit calls to xfs_bdstrat_cb. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
The tracing code can't print flags defined as enums. Most flags that we want to print are defines as macros already, but move the few remaining ones over to make the trace output more useful. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Christoph Hellwig 提交于
Get rid of the xfs_buf_pin/xfs_buf_unpin/xfs_buf_ispin helpers and opencode them in their only callers, just like we did for the inode pinning a while ago. Also remove duplicate trace points - the bufitem tracepoints cover all the information that is present in a buffer tracepoint. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 19 5月, 2010 1 次提交
-
-
由 Jan Engelhardt 提交于
This allows to see in `ps` and similar tools which kthreads are allotted to which block device/filesystem, similar to what jbd2 does. As the process name is a fixed 16-char array, no extra space is needed in tasks. PID TTY STAT TIME COMMAND 2 ? S 0:00 [kthreadd] 197 ? S 0:00 \_ [jbd2/sda2-8] 198 ? S 0:00 \_ [ext4-dio-unwrit] 204 ? S 0:00 \_ [flush-8:0] 2647 ? S 0:00 \_ [xfs_mru_cache] 2648 ? S 0:00 \_ [xfslogd/0] 2649 ? S 0:00 \_ [xfsdatad/0] 2650 ? S 0:00 \_ [xfsconvertd/0] 2651 ? S 0:00 \_ [xfsbufd/ram0] 2652 ? S 0:00 \_ [xfsaild/ram0] 2653 ? S 0:00 \_ [xfssyncd/ram0] Signed-off-by: NJan Engelhardt <jengelh@medozas.de> Reviewed-by: NDave Chinner <david@fromorbit.com>
-
- 04 2月, 2010 1 次提交
-
-
由 Dave Chinner 提交于
There are no more users of this function left in the XFS code now that we've switched everything to delayed write flushing. Remove it. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-