- 02 10月, 2014 5 次提交
-
-
由 Dave Chinner 提交于
xfs_buf_read_uncached() has two failure modes. If can either return NULL or bp->b_error != 0 depending on the type of failure, and not all callers check for both. Fix it so that xfs_buf_read_uncached() always returns the error status, and the buffer is returned as a function parameter. The buffer will only be returned on success. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
There is a lot of cookie-cutter code that looks like: if (shutdown) handle buffer error xfs_buf_iorequest(bp) error = xfs_buf_iowait(bp) if (error) handle buffer error spread through XFS. There's significant complexity now in xfs_buf_iorequest() to specifically handle this sort of synchronous IO pattern, but there's all sorts of nasty surprises in different error handling code dependent on who owns the buffer references and the locks. Pull this pattern into a single helper, where we can hide all the synchronous IO warts and hence make the error handling for all the callers much saner. This removes the need for a special extra reference to protect IO completion processing, as we can now hold a single reference across dispatch and waiting, simplifying the sync IO smeantics and error handling. In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and make it explicitly handle on asynchronous IO. This forces all users to be switched specifically to one interface or the other and removes any ambiguity between how the interfaces are to be used. It also means that xfs_buf_iowait() goes away. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
There is only one caller now - xfs_trans_read_buf_map() - and it has very well defined call semantics - read, synchronous, and b_iodone is NULL. Hence it's pretty clear what error handling is necessary for this case. The bigger problem of untangling xfs_trans_read_buf_map error handling is left to a future patch. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Currently the report of a bio error from completion immediately marks the buffer with an error. The issue is that this is racy w.r.t. synchronous IO - the submitter can see b_error being set before the IO is complete, and hence we cannot differentiate between submission failures and completion failures. Add an internal b_io_error field protected by the b_lock to catch IO completion errors, and only propagate that to the buffer during final IO completion handling. Hence we can tell in xfs_buf_iorequest if we've had a submission failure bey checking bp->b_error before dropping our b_io_remaining reference - that reference will prevent b_io_error values from being propagated to b_error in the event that completion races with submission. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We do some work in xfs_buf_ioend, and some work in xfs_buf_iodone_work, but much of that functionality is the same. This work can all be done in a single function, leaving xfs_buf_iodone just a wrapper to determine if we should execute it by workqueue or directly. hence rename xfs_buf_iodone_work to xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that need async processing. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 25 6月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
Convert all the errors the core XFs code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 06 6月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
Most of the callers are just calling ASSERT(!xfs_buf_geterror()) which means they are checking for bp->b_error == 0. If bp is null in this case, we will assert fail, and hence it's no different in result to oopsing because of a null bp. In some cases, errors have already been checked for or the function returning the buffer can't return a buffer with an error, so it's just a redundant assert. Either way, the assert can either be removed. The other two non-assert callers can just test for a buffer and error properly. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 14 4月, 2014 2 次提交
-
-
由 Eric Sandeen 提交于
Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 27 2月, 2014 2 次提交
-
-
由 Eric Sandeen 提交于
Many/most callers of xfs_update_cksum() pass bp->b_addr and BBTOB(bp->b_length) as the first 2 args. Add a helper which can just accept the bp and the crc offset, and work it out on its own, for brevity. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
Many/most callers of xfs_verify_cksum() pass bp->b_addr and BBTOB(bp->b_length) as the first 2 args. Add a helper which can just accept the bp and the crc offset, and work it out on its own, for brevity. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 25 1月, 2014 3 次提交
-
-
由 Eric Sandeen 提交于
Some time ago, mkfs.xfs started picking the storage physical sector size as the default filesystem "sector size" in order to avoid RMW costs incurred by doing IOs at logical sector size alignments. However, this means that for a filesystem made with i.e. a 4k sector size on an "advanced format" 4k/512 disk, 512-byte direct IOs are no longer allowed. This means that XFS has essentially turned this AF drive into a hard 4K device, from the filesystem on up. XFS's mkfs-specified "sector size" is really just controlling the minimum size & alignment of filesystem metadata. There is no real need to tightly couple XFS's minimal metadata size to the minimum allowed direct IO size; XFS can continue doing metadata in optimal sizes, but still allow smaller DIOs for apps which issue them, for whatever reason. This patch adds a new field to the xfs_buftarg, so that we now track 2 sizes: 1) The metadata sector size, which is the minimum unit and alignment of IO which will be performed by metadata operations. 2) The device logical sector size The first is used internally by the file system for metadata alignment and IOs. The second is used for the minimum allowed direct IO alignment. This has passed xfstests on filesystems made with 4k sectors, including when run under the patch I sent to ignore XFS_IOC_DIOINFO, and issue 512 DIOs anyway. I also directly tested end of block behavior on preallocated, sparse, and existing files when we do a 512 IO into a 4k file on a 4k-sector filesystem, to be sure there were no unexpected behaviors. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Eric Sandeen 提交于
In preparation for adding new members to the structure, give these old ones more descriptive names: bt_ssize -> bt_meta_sectorsize bt_smask -> bt_meta_sectormask Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Eric Sandeen 提交于
Clean up the xfs_buftarg structure a bit: - remove bt_bsize which is never used - replace bt_sshift with bt_ssize; we only ever shift it back Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 17 12月, 2013 2 次提交
-
-
由 Dave Chinner 提交于
If we are doing aysnc writeback of metadata, we can get write errors but have nobody to report them to. At the moment, we simply attempt to reissue the write from io completion in the hope that it's a transient error. When it's not a transient error, the buffer is stuck forever in this loop, and we cannot break out of it. Eventually, unmount will hang because the AIL cannot be emptied and everything goes downhill from them. To solve this problem, only retry the write IO once before aborting it. We don't throw the buffer away because some transient errors can last minutes (e.g. FC path failover) or even hours (thin provisioned devices that have run out of backing space) before they go away. Hence we really want to keep trying until we can't try any more. Because the buffer was not cleaned, however, it does not get removed from the AIL and hence the next pass across the AIL will start IO on it again. As such, we still get the "retry forever" semantics that we currently have, but we allow other access to the buffer in the mean time. Meanwhile the filesystem can continue to modify the buffer and relog it, so the IO errors won't hang the log or the filesystem. Now when we are pushing the AIL, we can see all these "permanent IO error" buffers and we can issue a warning about failures before we retry the IO. We can also catch these buffers when unmounting an issue a corruption warning, too. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Christoph Hellwig 提交于
The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that handles the case of a shut down filesystem. Most of the users have private, uncached buffers that can just be freed in this case, but the complex error handling in xfs_bioerror_relse messes up the case when it's called without a locked buffer. Remove xfsbdstrat and opencode the error handling in the callers. All but one can simply return an error and don't need to deal with buffer state, and the one caller that cares about the buffer state could do with a major cleanup as well, but we'll defer that to later. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 11 9月, 2013 2 次提交
-
-
由 Dave Chinner 提交于
In converting the buffer lru lists to use the generic code, the locking for marking the buffers as on the dispose list was lost. This results in confusion in LRU buffer tracking and acocunting, resulting in reference counts being mucked up and filesystem beig unmountable. To fix this, introduce an internal buffer spinlock to protect the state field that holds the dispose list information. Because there is now locking needed around xfs_buf_lru_add/del, and they are used in exactly one place each two lines apart, get rid of the wrappers and code the logic directly in place. Further, the LRU emptying code used on unmount is less than optimal. Convert it to use a dispose list as per a normal shrinker walk, and repeat the walk that fills the dispose list until the LRU is empty. Thi avoids needing to drop and regain the LRU lock for every item being freed, and allows the same logic as the shrinker isolate call to be used. Simpler, easier to understand. Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NGlauber Costa <glommer@openvz.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Chinner 提交于
Convert the buftarg LRU to use the new generic LRU list and take advantage of the functionality it supplies to make the buffer cache shrinker node aware. Signed-off-by: NGlauber Costa <glommer@openvz.org> Signed-off-by: NDave Chinner <dchinner@redhat.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 17 1月, 2013 1 次提交
-
-
由 Mark Tinguely 提交于
Commits starting at 77c1a08f introduced a multiple segment support to xfs_buf. xfs_trans_buf_item_match() could not find a multi-segment buffer in the transaction because it was looking at the single segment block number rather than the multi-segment b_maps[0].bm.bn. This results on a recursive buffer lock that can never be satisfied. This patch: 1) Changed the remaining b_map accesses to be b_maps[0] accesses. 2) Renames the single segment b_map structure to __b_map to avoid future confusion. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 18 12月, 2012 1 次提交
-
-
由 Mark Tinguely 提交于
Commits starting at 77c1a08f introduced a multiple segment support to xfs_buf. xfs_trans_buf_item_match() could not find a multi-segment buffer in the transaction because it was looking at the single segment block number rather than the multi-segment b_maps[0].bm.bn. This results on a recursive buffer lock that can never be satisfied. This patch: 1) Changed the remaining b_map accesses to be b_maps[0] accesses. 2) Renames the single segment b_map structure to __b_map to avoid future confusion. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 16 11月, 2012 3 次提交
-
-
由 Dave Chinner 提交于
To separate the verifiers from iodone functions and associate read and write verifiers at the same time, introduce a buffer verifier operations structure to the xfs_buf. This avoids the need for assigning the write verifier, clearing the iodone function and re-running ioend processing in the read verifier, and gets rid of the nasty "b_pre_io" name for the write verifier function pointer. If we ever need to, it will also be easier to add further content specific callbacks to a buffer with an ops structure in place. We also avoid needing to export verifier functions, instead we can simply export the ops structures for those that are needed outside the function they are defined in. This patch also fixes a directory block readahead verifier issue it exposed. This patch also adds ops callbacks to the inode/alloc btree blocks initialised by growfs. These will need more work before they will work with CRCs. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NPhil White <pwhite@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Add a callback to the buffer write path to enable verification of the buffer and CRC calculation prior to issuing the write to the underlying storage. If the callback function detects some kind of failure or error condition, it must mark the buffer with an error so that the caller can take appropriate action. In the case of xfs_buf_ioapply(), a corrupt metadta buffer willt rigger a shutdown of the filesystem, because something is clearly wrong and we can't allow corrupt metadata to be written to disk. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NPhil White <pwhite@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Add a verifier function callback capability to the buffer read interfaces. This will be used by the callers to supply a function that verifies the contents of the buffer when it is read from disk. This patch does not provide callback functions, but simply modifies the interfaces to allow them to be called. The reason for adding this to the read interfaces is that it is very difficult to tell fom the outside is a buffer was just read from disk or whether we just pulled it out of cache. Supplying a callbck allows the buffer cache to use it's internal knowledge of the buffer to execute it only when the buffer is read from disk. It is intended that the verifier functions will mark the buffer with an EFSCORRUPTED error when verification fails. This allows the reading context to distinguish a verification error from an IO error, and potentially take further actions on the buffer (e.g. attempt repair) based on the error reported. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NPhil White <pwhite@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 30 8月, 2012 1 次提交
-
-
由 Carlos Maiolino 提交于
While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with buffers from lru list), there is a possibility to have xfs_buf_stale() racing with it, and removing buffers from dispose list before xfs_buftarg_shrink() does it. This happens because xfs_buftarg_shrink() handle the dispose list without locking and the test condition in xfs_buf_stale() checks for the buffer being in *any* list: if (!list_empty(&bp->b_lru)) If the buffer happens to be on dispose list, this causes the buffer counter of lru list (btp->bt_lru_nr) to be decremented twice (once in xfs_buftarg_shrink() and another in xfs_buf_stale()) causing a wrong account usage of the lru list. This may cause xfs_buftarg_shrink() to return a wrong value to the memory shrinker shrink_slab(), and such account error may also cause an underflowed value to be returned; since the counter is lower than the current number of items in the lru list, a decrement may happen when the counter is 0, causing an underflow on the counter. The fix uses a new flag field (and a new buffer flag) to serialize buffer handling during the shrink process. The new flag field has been designed to use btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism. dchinner, sandeen, aquini and aris also deserve credits for this. Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 25 8月, 2012 1 次提交
-
-
由 Carlos Maiolino 提交于
While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with buffers from lru list), there is a possibility to have xfs_buf_stale() racing with it, and removing buffers from dispose list before xfs_buftarg_shrink() does it. This happens because xfs_buftarg_shrink() handle the dispose list without locking and the test condition in xfs_buf_stale() checks for the buffer being in *any* list: if (!list_empty(&bp->b_lru)) If the buffer happens to be on dispose list, this causes the buffer counter of lru list (btp->bt_lru_nr) to be decremented twice (once in xfs_buftarg_shrink() and another in xfs_buf_stale()) causing a wrong account usage of the lru list. This may cause xfs_buftarg_shrink() to return a wrong value to the memory shrinker shrink_slab(), and such account error may also cause an underflowed value to be returned; since the counter is lower than the current number of items in the lru list, a decrement may happen when the counter is 0, causing an underflow on the counter. The fix uses a new flag field (and a new buffer flag) to serialize buffer handling during the shrink process. The new flag field has been designed to use btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism. dchinner, sandeen, aquini and aris also deserve credits for this. Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 14 7月, 2012 2 次提交
-
-
由 Christoph Hellwig 提交于
xfs_bdstrat_cb only adds a check for a shutdown filesystem over xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down filesystem a little earlier. In addition the shutdown handling in xfs_bdstrat_cb is not very suitable for this caller. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Christoph Hellwig 提交于
xfs_bdstrat_cb only adds a check for a shutdown filesystem over xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down filesystem a little earlier. In addition the shutdown handling in xfs_bdstrat_cb is not very suitable for this caller. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 02 7月, 2012 3 次提交
-
-
由 Dave Chinner 提交于
With the internal interfaces supporting discontiguous buffer maps, add external lookup, read and get interfaces so they can start to be used. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
While the external interface currently uses separate blockno/length variables, we need to move internal interfaces to passing and parsing vector maps. This will then allow us to add external interfaces to support discontiguous buffer maps as the internal code will already support them. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
To support discontiguous buffers in the buffer cache, we need to separate the cache index variables from the I/O map. While this is currently a 1:1 mapping, discontiguous buffer support will break this relationship. However, for caching purposes, we can still treat them the same as a contiguous buffer - the block number of the first block and the length of the buffer - as that is still a unique representation. Also, the only way we will ever access the discontiguous regions of buffers is via bulding the complete buffer in the first place, so using the initial block number and entire buffer length is a sane way to index the buffers. Add a block mapping vector construct to the xfs_buf and use it in the places where we are doing IO instead of the current b_bn/b_length variables. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 15 5月, 2012 10 次提交
-
-
由 Dave Chinner 提交于
Rather than specifying XBF_MAPPED for almost all buffers, introduce XBF_UNMAPPED for the couple of users that use unmapped buffers. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK. This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to avoid recursion through memory reclaim back into the filesystem. All the blocking get calls in growfs occur inside a transaction, even though they are no part of the transaction, so all allocation will be GFP_NOFS due to the task flag PF_TRANS being set. The blocking read calls occur during log recovery, so they will probably be unaffected by converting to GFP_NOFS allocations. Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Buffers are always returned locked from the lookup routines. Hence we don't need to tell the lookup routines to return locked buffers, on to try and lock them. Remove XBF_LOCK from all the callers and from internal buffer cache usage. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
xfs_buf_btoc and friends are simple macros that do basic block to page index conversion and vice versa. These aren't widely used, and we use open coded masking and shifting everywhere else. Hence remove the macros and open code the work they do. Also, use of PAGE_CACHE_{SIZE|SHIFT|MASK} for these macros is now incorrect - we are using pages directly and not the page cache, so use PAGE_{SIZE|MASK|SHIFT} instead. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Now that we pass block counts everywhere, and index buffers by block number and length in units of blocks, convert the desired IO size into block counts rather than bytes. Convert the code to use block counts, and those that need byte counts get converted at the time of use. Rename the b_desired_count variable to something closer to it's purpose - b_io_length - as it is only used to specify the length of an IO for a subset of the buffer. The only time this is used is for log IO - both writing iclogs and during log recovery. In all other cases, the b_io_length matches b_length, and hence a lot of code confuses the two. e.g. the buf item code uses the io count exclusively when it should be using the buffer length. Fix these apprpriately as they are found. Also, remove the XFS_BUF_{SET_}COUNT() macros that are just wrappers around the desired IO length. They only serve to make the code shouty loud, don't actually add any real value, and are often used incorrectly. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Now that we pass block counts everywhere, and index buffers by block number, track the length of the buffer in units of blocks rather than bytes. Convert the code to use block counts, and those that need byte counts get converted at the time of use. Also, remove the XFS_BUF_{SET_}SIZE() macros that are just wrappers around the buffer length. They only serve to make the code shouty loud and don't actually add any real value. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Seeing as we pass block numbers around everywhere in the buffer cache now, it makes no sense to index everything by byte offset. Replace all the byte offset indexing with block number based indexing, and replace all uses of the byte offset with direct conversion from the block index. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The xfs_buf_get/read API is not consistent in the units it uses, and does not use appropriate or consistent units/types for the variables. Convert the API to use disk addresses and block counts for all buffer get and read calls. Use consistent naming for all the functions and their declarations, and convert the internal functions to use disk addresses and block counts to avoid need to convert them from one type to another and back again. Fix all the callers to use disk addresses and block counts. In many cases, this removes an additional conversion from the function call as the callers already have a block count. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
If we call xfs_buf_iowait() on a buffer that failed dispatch due to an IO error, it will wait forever for an Io that does not exist. This is hndled in xfs_buf_read, but there is other code that calls xfs_buf_iowait directly that doesn't. Rather than make the call sites have to handle checking for dispatch errors and then checking for completion errors, make xfs_buf_iowait() check for dispatch errors on the buffer before waiting. This means we handle both dispatch and completion errors with one set of error handling at the caller sites. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Christoph Hellwig 提交于
Queue delwri buffers on a local on-stack list instead of a per-buftarg one, and write back the buffers per-process instead of by waking up xfsbufd. This is now easily doable given that we have very few places left that write delwri buffers: - log recovery: Only done at mount time, and already forcing out the buffers synchronously using xfs_flush_buftarg - quotacheck: Same story. - dquot reclaim: Writes out dirty dquots on the LRU under memory pressure. We might want to look into doing more of this via xfsaild, but it's already more optimal than the synchronous inode reclaim that writes each buffer synchronously. - xfsaild: This is the main beneficiary of the change. By keeping a local list of buffers to write we reduce latency of writing out buffers, and more importably we can remove all the delwri list promotions which were hitting the buffer cache hard under sustained metadata loads. The implementation is very straight forward - xfs_buf_delwri_queue now gets a new list_head pointer that it adds the delwri buffers to, and all callers need to eventually submit the list using xfs_buf_delwi_submit or xfs_buf_delwi_submit_nowait. Buffers that already are on a delwri list are skipped in xfs_buf_delwri_queue, assuming they already are on another delwri list. The biggest change to pass down the buffer list was done to the AIL pushing. Now that we operate on buffers the trylock, push and pushbuf log item methods are merged into a single push routine, which tries to lock the item, and if possible add the buffer that needs writeback to the buffer list. This leads to much simpler code than the previous split but requires the individual IOP_PUSH instances to unlock and reacquire the AIL around calls to blocking routines. Given that xfsailds now also handle writing out buffers, the conditions for log forcing and the sleep times needed some small changes. The most important one is that we consider an AIL busy as long we still have buffers to push, and the other one is that we do increment the pushed LSN for buffers that are under flushing at this moment, but still count them towards the stuck items for restart purposes. Without this we could hammer on stuck items without ever forcing the log and not make progress under heavy random delete workloads on fast flash storage devices. [ Dave Chinner: - rebase on previous patches. - improved comments for XBF_DELWRI_Q handling - fix XBF_ASYNC handling in queue submission (test 106 failure) - rename delwri submit function buffer list parameters for clarity - xfs_efd_item_push() should return XFS_ITEM_PINNED ] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-