- 19 10月, 2010 19 次提交
-
-
由 Christoph Hellwig 提交于
This header only provides one extern that isn't actually declared anywhere, and shadowed by a macro. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
It used to have a place when it contained an automatically generated CVS version, but these days it's entirely superflous. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Fail the mount if we can't allocate memory for the per-CPU counters. This is consistent with how we handle everything else in the mount path and makes the superblock counter modification a lot simpler. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
pahole reports the struct xfs_buf has quite a few holes in it, so packing the structure better will reduce the size of it by 16 bytes. Also, move all the fields used in cache lookups into the first cacheline. Before on x86_64: /* size: 320, cachelines: 5 */ /* sum members: 298, holes: 6, sum holes: 22 */ After on x86_64: /* size: 304, cachelines: 5 */ /* padding: 6 */ /* last cacheline: 48 bytes */ Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
The buffer cache hash is showing typical hash scalability problems. In large scale testing the number of cached items growing far larger than the hash can efficiently handle. Hence we need to move to a self-scaling cache indexing mechanism. I have selected rbtrees for indexing becuse they can have O(log n) search scalability, and insert and remove cost is not excessive, even on large trees. Hence we should be able to cache large numbers of buffers without incurring the excessive cache miss search penalties that the hash is imposing on us. To ensure we still have parallel access to the cache, we need multiple trees. Rather than hashing the buffers by disk address to select a tree, it seems more sensible to separate trees by typical access patterns. Most operations use buffers from within a single AG at a time, so rather than searching lots of different lists, separate the buffer indexes out into per-AG rbtrees. This means that searches during metadata operation have a much higher chance of hitting cache resident nodes, and that updates of the tree are less likely to disturb trees being accessed on other CPUs doing independent operations. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Memory reclaim via shrinkers has a terrible habit of having N+M concurrent shrinker executions (N = num CPUs, M = num kswapds) all trying to shrink the same cache. When the cache they are all working on is protected by a single spinlock, massive contention an slowdowns occur. Wrap the per-ag inode caches with a reclaim mutex to serialise reclaim access to the AG. This will block concurrent reclaim in each AG but still allow reclaim to scan multiple AGs concurrently. Allow shrinkers to move on to the next AG if it can't get the lock, and if we can't get any AG, then start blocking on locks. To prevent reclaimers from continually scanning the same inodes in each AG, add a cursor that tracks where the last reclaim got up to and start from that point on the next reclaim. This should avoid only ever scanning a small number of inodes at the satart of each AG and not making progress. If we have a non-shrinker based reclaim pass, ignore the cursor and reset it to zero once we are done. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Batch and optimise the per-ag inode lookup for reclaim to minimise scanning overhead. This involves gang lookups on the radix trees to get multiple inodes during each tree walk, and tighter validation of what inodes can be reclaimed without blocking befor we take any locks. This is based on ideas suggested in a proof-of-concept patch posted by Nick Piggin. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
With the reclaim code separated from the generic walking code, it is simple to implement batched lookups for the generic walk code. Separate out the inode validation from the execute operations and modify the tree lookups to get a batch of inodes at a time. Reclaim operations will be optimised separately. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
When doing read side inode cache walks, the code to validate and grab an inode is common to all callers. Split it out of the execute callbacks in preparation for batching lookups. Similarly, split out the inode reference dropping from the execute callbacks into the main lookup look to be symmetric with the grab. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
The reclaim walk requires different locking and has a slightly different walk algorithm, so separate it out so that it can be optimised separately. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
For RT and external log devices, we never use hashed buffers on them now. Remove the buftarg hash tables that are set up for them. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Filesystem level managed buffers are buffers that have their lifecycle controlled by the filesystem layer, not the buffer cache. We currently cache these buffers, which makes cleanup and cache walking somewhat troublesome. Convert the fs managed buffers to uncached buffers obtained by via xfs_buf_get_uncached(), and remove the XBF_FS_MANAGED special cases from the buffer cache. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Each buffer contains both a buftarg pointer and a mount pointer. If we add a mount pointer into the buftarg, we can avoid needing the b_mount field in every buffer and grab it from the buftarg when needed instead. This shrinks the xfs_buf by 8 bytes. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
To avoid the need to use cached buffers for single-shot or buffers cached at the filesystem level, introduce a new buffer read primitive that bypasses the cache an reads directly from disk. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
xfs_buf_get_nodaddr() is really used to allocate a buffer that is uncached. While it is not directly assigned a disk address, the fact that they are not cached is a more important distinction. With the upcoming uncached buffer read primitive, we should be consistent with this disctinction. While there, make page allocation in xfs_buf_get_nodaddr() safe against memory reclaim re-entrancy into the filesystem by allowing a flags parameter to be passed. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Under heavy multi-way parallel create workloads, the VFS struggles to write back all the inodes that have been changed in age order. The bdi flusher thread becomes CPU bound, spending 85% of it's time in the VFS code, mostly traversing the superblock dirty inode list to separate dirty inodes old enough to flush. We already keep an index of all metadata changes in age order - in the AIL - and continued log pressure will do age ordered writeback without any extra overhead at all. If there is no pressure on the log, the xfssyncd will periodically write back metadata in ascending disk address offset order so will be very efficient. Hence we can stop marking VFS inodes dirty during transaction commit or when changing timestamps during transactions. This will keep the inodes in the superblock dirty list to those containing data or unlogged metadata changes. However, the timstamp changes are slightly more complex than this - there are a couple of places that do unlogged updates of the timestamps, and the VFS need to be informed of these. Hence add a new function xfs_trans_ichgtime() for transactional changes, and leave xfs_ichgtime() for the non-transactional changes. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
When we start taking a reference to the per-ag for every cached buffer in the system, kernel lockstat profiling on an 8-way create workload shows the mp->m_perag_lock has higher acquisition rates than the inode lock and has significantly more contention. That is, it becomes the highest contended lock in the system. The perag lookup is trivial to convert to lock-less RCU lookups because perag structures never go away. Hence the only thing we need to protect against is tree structure changes during a grow. This can be done simply by replacing the locking in xfs_perag_get() with RCU read locking. This removes the mp->m_perag_lock completely from this path. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
XFS_IOC_ZERO_RANGE is the equivalent of an atomic XFS_IOC_UNRESVSP/ XFS_IOC_RESVSP call pair. It enabled ranges of written data to be turned into zeroes without requiring IO or having to free and reallocate the extents in the range given as would occur if we had to punch and then preallocate them separately. This enables applications to zero parts of files very quickly without changing the layout of the files in any way. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
While XFS passes ranges to operate on from the core code, the functions being called ignore the either the entire range or the end of the range. This is historical because when the function were written linux didn't have the necessary range operations. Update the functions to use the correct operations. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 07 10月, 2010 1 次提交
-
-
由 Johannes Weiner 提交于
When marking an inode reclaimable, a per-AG counter is increased, the inode is tagged reclaimable in its per-AG tree, and, when this is the first reclaimable inode in the AG, the AG entry in the per-mount tree is also tagged. When an inode is finally reclaimed, however, it is only deleted from the per-AG tree. Neither the counter is decreased, nor is the parent tree's AG entry untagged properly. Since the tags in the per-mount tree are not cleared, the inode shrinker iterates over all AGs that have had reclaimable inodes at one point in time. The counters on the other hand signal an increasing amount of slab objects to reclaim. Since "70e60ce7 xfs: convert inode shrinker to per-filesystem context" this is not a real issue anymore because the shrinker bails out after one iteration. But the problem was observable on a machine running v2.6.34, where the reclaimable work increased and each process going into direct reclaim eventually got stuck on the xfs inode shrinking path, trying to scan several million objects. Fix this by properly unwinding the reclaimable-state tracking of an inode when it is reclaimed. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: stable@kernel.org Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 10 9月, 2010 2 次提交
-
-
由 Dave Chinner 提交于
The workqueue implementation in 2.6.36-rcX has changed, resulting in the workqueues no longer having dedicated threads for work processing. This has caused severe livelocks under heavy parallel create workloads because the log IO completions have been getting held up behind metadata IO completions. Hence log commits would stall, memory allocation would stall because pages could not be cleaned, and lock contention on the AIL during inode IO completion processing was being seen to slow everything down even further. By making the log Io completion workqueue a high priority workqueue, they are queued ahead of all data/metadata IO completions and processed before the data/metadata completions. Hence the log never gets stalled, and operations needed to clean memory can continue as quickly as possible. This avoids the livelock conditions and allos the system to keep running under heavy load as per normal. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dan Rosenberg 提交于
The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12 bytes of uninitialized stack memory, because the fsxattr struct declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero) the 12-byte fsx_pad member before copying it back to the user. This patch takes care of it. Signed-off-by: NDan Rosenberg <dan.j.rosenberg@gmail.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 03 9月, 2010 1 次提交
-
-
由 Tao Ma 提交于
In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want to return fi_extent_max extents, but actually it won't work for a sparse file. The reason is that in xfs_getbmap we will calculate holes and set it in 'out', while out is malloced by bmv_count(fi_extent_max+1) which didn't consider holes. So in the worst case, if 'out' vector looks like [hole, extent, hole, extent, hole, ... hole, extent, hole], we will only return half of fi_extent_max extents. This patch add a new parameter BMV_IF_NO_HOLES for bvm_iflags. So with this flags, we don't use our 'out' in xfs_getbmap for a hole. The solution is a bit ugly by just don't increasing index of 'out' vector. I felt that it is not easy to skip it at the very beginning since we have the complicated check and some function like xfs_getbmapx_fix_eof_hole to adjust 'out'. Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 02 9月, 2010 2 次提交
-
-
由 Arkadiusz Mi?kiewicz 提交于
Currently on-disk structure is able to keep only 16bit project quota id, so disallow 32bit ones. This fixes a problem where parts of kernel structures holding project quota id are 32bit while parts (on-disk) are 16bit variables which causes project quota member files to be inaccessible for some operations (like mv/rm). Signed-off-by: NArkadiusz Mi?kiewicz <arekm@maven.pl> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
When doing large parallel file creates on a 16p machines, large amounts of time is being spent in _xfs_buf_find(). A system wide profile with perf top shows this: 1134740.00 19.3% _xfs_buf_find 733142.00 12.5% __ticket_spin_lock The problem is that the hash contains 45,000 buffers, and the hash table width is only 256 buffers. That means we've got around 200 buffers per chain, and searching it is quite expensive. The hash table size needs to increase. Secondly, every time we do a lookup, we promote the buffer we find to the head of the hash chain. This is causing cachelines to be dirtied and causes invalidation of cachelines across all CPUs that may have walked the hash chain recently. hence every walk of the hash chain is effectively a cold cache walk. Remove the promotion to avoid this invalidation. The results are: 1045043.00 21.2% __ticket_spin_lock 326184.00 6.6% _xfs_buf_find A 70% drop in the CPU usage when looking up buffers. Unfortunately that does not result in an increase in performance underthis workload as contention on the inode_lock soaks up most of the reduction in CPU usage. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 24 8月, 2010 4 次提交
-
-
由 Christoph Hellwig 提交于
If xfs_map_blocks returns EAGAIN because of lock contention we must redirty the page and not disard the pagecache content and return an error from writepage. We used to do this correctly, but the logic got lost during the recent reshuffle of the writepage code. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reported-by: NMike Gao <ygao.linux@gmail.com> Tested-by: NMike Gao <ygao.linux@gmail.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <dchinner@redhat.com>
-
由 Dave Chinner 提交于
When we need to cover the log, we issue dummy transactions to ensure the current log tail is on disk. Unfortunately we currently use the root inode in the dummy transaction, and the act of committing the transaction dirties the inode at the VFS level. As a result, the VFS writeback of the dirty inode will prevent the filesystem from idling long enough for the log covering state machine to complete. The state machine gets stuck in a loop issuing new dummy transactions to cover the log and never makes progress. To avoid this problem, the dummy transactions should not cause externally visible state changes. To ensure this occurs, make sure that dummy transactions log an unchanging field in the superblock as it's state is never propagated outside the filesystem. This allows the log covering state machine to complete successfully and the filesystem now correctly enters a fully idle state about 90s after the last modification was made. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Stuart Brodsky 提交于
Because of delayed updates to sb_icount field in the super block, it is possible to allocate over maxicount number of inodes. This causes the arithmetic to calculate a negative number of free inodes in user commands like df or stat -f. Since maxicount is a somewhat arbitrary number, a slight over allocation is not critical but user commands should be displayed as 0 or greater and never go negative. To do this the value in the stats buffer f_ffree is capped to never go negative. [ Modified to use max_t as per Christoph's comment. ] Signed-off-by: NStu Brodsky <sbrodsky@sgi.com> Signed-off-by: NDave Chinner <dchinner@redhat.com>
-
由 Dave Chinner 提交于
During data integrity (WB_SYNC_ALL) writeback, wbc->nr_to_write will go negative on inodes with more than 1024 dirty pages due to implementation details of write_cache_pages(). Currently XFS will abort page clustering in writeback once nr_to_write drops below zero, and so for data integrity writeback we will do very inefficient page at a time allocation and IO submission for inodes with large numbers of dirty pages. Fix this by only aborting the page clustering code when wbc->nr_to_write is negative and the sync mode is WB_SYNC_NONE. Cc: <stable@kernel.org> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 10 8月, 2010 5 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is equivalent to I_FREEING for almost all code looking at either; it's there to keep track of having called clear_inode() exactly once per inode lifetime, at some point after having set I_FREEING. I_CLEAR and I_FREEING never get set at the same time with the current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR instead of I_CLEAR without loss of information. As the result of such change, checks become simpler and the amount of code that needs to know about I_CLEAR shrinks a lot. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Convert XFS to the new truncate sequence. We still can have errors after updating the file size in xfs_setattr, but these are real I/O errors and lead to a transaction abort and filesystem shutdown, so they are not an issue. Errors from ->write_begin and write_end can now be handled correctly because we can actually get rid of the delalloc extents while previous the buffer state was stipped in block_invalidatepage. There is still no error handling for ->direct_IO, because doing so will need some major restructuring given that we only have the iolock shared and do not hold i_mutex at all. Fortunately leaving the normally allocated blocks behind there is not a major issue and this will get cleaned up by xfs_free_eofblock later. Note: the patch is against Al's vfs.git tree as that contains the nessecary preparations. I'd prefer to get it applied there so that we can get some testing in linux-next. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Move the call to vmtruncate to get rid of accessive blocks to the callers in preparation of the new truncate sequence and rename the non-truncating version to block_write_begin. While we're at it also remove several unused arguments to block_write_begin. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Move the call to vmtruncate to get rid of accessive blocks to the callers in prepearation of the new truncate calling sequence. This was only done for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid of blockdev_direct_IO_no_locking and its _newtrunc variant while at it as just opencoding the two additional paramters is shorted than the name suffix. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 27 7月, 2010 6 次提交
-
-
由 Christoph Hellwig 提交于
Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been commited. That means the aio_complete calls needs to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Christoph Hellwig 提交于
Our current handling of direct I/O completions is rather suboptimal, because we defer it to a workqueue more often than needed, and we perform a much to aggressive flush of the workqueue in case unwritten extent conversions happen. This patch changes the direct I/O reads to not even use a completion handler, as we don't bother to use it at all, and to perform the unwritten extent conversions in caller context for synchronous direct I/O. For a small I/O size direct I/O workload on a consumer grade SSD, such as the untar of a kernel tree inside qemu this patch gives speedups of about 5%. Getting us much closer to the speed of a native block device, or a fully allocated XFS file. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
If we write into an unwritten extent using AIO we need to complete the AIO request after the extent conversion has finished. Without that a read could race to see see the extent still unwritten and return zeros. For synchronous I/O we already take care of that by flushing the xfsconvertd workqueue (which might be a bit of overkill). To do that add iocb and result fields to struct xfs_ioend, so that we can call aio_complete from xfs_end_io after the extent conversion has happened. Note that we need a new result field as io_error is used for positive errno values, while the AIO code can return negative error values and positive transfer sizes. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been commited. That means the aio_complete calls needs to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
The b_strat callback is used by xfs_buf_iostrategy to perform additional checks before submitting a buffer. It is used in xfs_bwrite and when writing out delayed buffers. In xfs_bwrite it we can de-virtualize the call easily as b_strat is set a few lines above the call to xfs_buf_iostrategy. For the delayed buffers the rationale is a bit more complicated: - there are three callers of xfs_buf_delwri_queue, which places buffers on the delwri list: (1) xfs_bdwrite - this sets up b_strat, so it's fine (2) xfs_buf_iorequest. None of the callers can have XBF_DELWRI set: - xlog_bdstrat is only used for log buffers, which are never delwri - _xfs_buf_read explicitly clears the delwri flag - xfs_buf_iodone_work retries log buffers only - xfsbdstrat - only used for reads, superblock writes without the delwri flag, log I/O and file zeroing with explicitly allocated buffers. - xfs_buf_iostrategy - only calls xfs_buf_iorequest if b_strat is not set (3) xfs_buf_unlock - only puts the buffer on the delwri list if the DELWRI flag is already set. The DELWRI flag is only ever set in xfs_bwrite, xfs_buf_iodone_callbacks, or xfs_trans_log_buf. For xfs_buf_iodone_callbacks and xfs_trans_log_buf we require an initialized buf item, which means b_strat was set to xfs_bdstrat_cb in xfs_buf_item_init. Conclusion: we can just get rid of the callback and replace it with explicit calls to xfs_bdstrat_cb. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Since Linux 2.6.33 the kernel has support for real O_SYNC, which made the osyncisosync option a no-op. Warn the users about this and remove the mount flag for it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-