- 01 December 2019, 2 commits
-
-
Submitted by Brian Foster

[ Upstream commit efc3289cf8d39c34502a7cc9695ca2fa125aad0c ]

In the typical unmount case, the AIL is forced out by the unmount sequence before the xfsaild task is stopped. Since AIL items are removed on writeback completion, this means that the AIL ->ail_buf_list delwri queue has been drained.

This is not always true in the shutdown case, however. It's possible for buffers to sit on a delwri queue for a period of time across submission attempts if said items are locked or have been relogged and pinned since first added to the queue. If the attempt to log such an item results in a log I/O error, the error processing can shut down the fs, remove the item from the AIL, stale the buffer (dropping the LRU reference) and clear its delwri queue state. The latter bit means the buffer will be released from a delwri queue on the next submission attempt, but this might never occur if the filesystem has shut down and the AIL is empty.

This means that such buffers are held indefinitely by the AIL delwri queue across destruction of the AIL. Aside from being a memory leak, these buffers can also hold references to in-core perag structures. The latter problem manifests as a generic/475 failure, reproducing the following asserts at unmount time:

    XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
         file: fs/xfs/xfs_mount.c, line: 151
    XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
         file: fs/xfs/xfs_mount.c, line: 132

To prevent this problem, clear the AIL delwri queue as a final step before xfsaild() exit. The !empty state should never occur in the normal case, so add an assert to catch unexpected problems going forward.

[dgc: add comment explaining need for xfs_buf_delwri_cancel() after calling xfs_buf_delwri_submit_nowait().]

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
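A minimal sketch of the shape of the fix, not the verbatim upstream diff: xfs_buf_delwri_cancel() and ->ail_buf_list are named in the commit text, while the xfsaild() skeleton, the shutdown check and the ail_mount field are assumptions for illustration.

    static int
    xfsaild(
        void            *data)
    {
        struct xfs_ail  *ailp = data;

        while (!kthread_should_stop()) {
            /* normal AIL pushing work ... */
        }

        /*
         * A shut-down filesystem can leave cancelled buffers queued here
         * forever, so drop them before the task exits; a non-empty queue
         * is only expected after a shutdown.
         */
        ASSERT(list_empty(&ailp->ail_buf_list) ||
               XFS_FORCED_SHUTDOWN(ailp->ail_mount));
        xfs_buf_delwri_cancel(&ailp->ail_buf_list);
        return 0;
    }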
-
Submitted by Dave Chinner

[ Upstream commit 37fd1678245f7a5898c1b05128bc481fb403c290 ]

When looking at a 4.18 based KASAN use after free report, I noticed that racing xfs_buf_rele() calls may race on dropping the last reference to the buffer and taking the buffer lock. This was the symptom displayed by the KASAN report, but the actual issue that was reported had already been fixed in 4.19-rc1 by commit e339dd8d ("xfs: use sync buffer I/O for sync delwri queue submission").

Despite this, I think there is still an issue with xfs_buf_rele() in this code:

    release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
    spin_lock(&bp->b_lock);
    if (!release) {
    .....

If two threads race on the b_lock after both dropping a reference, and one of them drops the last reference so release = true, we end up with:

    CPU 0                                   CPU 1
    atomic_dec_and_lock()
                                            atomic_dec_and_lock()
                                            spin_lock(&bp->b_lock)
    spin_lock(&bp->b_lock)
    <spins>
                                            <release = true bp->b_lru_ref = 0>
                                            <remove from lists>
                                            freebuf = true
                                            spin_unlock(&bp->b_lock)
                                            xfs_buf_free(bp)
    <gets lock, reading and writing freed memory>
    <accesses freed memory>
    spin_unlock(&bp->b_lock) <reads/writes freed memory>

IOWs, we can't safely take bp->b_lock after dropping the hold reference because the buffer may go away at any time after we drop that reference. However, this can be fixed simply by taking the bp->b_lock before we drop the reference.

It is safe to nest the pag_buf_lock inside bp->b_lock as the pag_buf_lock is only used to serialise against lookup in xfs_buf_find() and no other locks are held over or under the pag_buf_lock there. Make this clear by documenting the buffer lock orders at the top of the file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
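A condensed sketch of the reordered locking the commit describes (the real xfs_buf_rele() also handles LRU re-insertion and uncached buffers, which are elided here); field and function names follow the commit text.

    void
    xfs_buf_rele(
        struct xfs_buf   *bp)
    {
        struct xfs_perag *pag = bp->b_pag;
        bool             release;
        bool             freebuf = false;

        /* take b_lock before the final hold-count drop, not after */
        spin_lock(&bp->b_lock);
        release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
        if (release) {
            /* last reference: remove from the perag hash/LRU, then free */
            freebuf = true;
            spin_unlock(&pag->pag_buf_lock);
        }
        spin_unlock(&bp->b_lock);

        if (freebuf)
            xfs_buf_free(bp);
    }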
-
- 12 August 2018, 1 commit
-
-
Submitted by Eric Sandeen

The old lock tracking infrastructure in xfs using the b_last_holder field seems to only be useful if you can get into the system with a debugger; the existing tracepoints are the way to go these days, and this old infrastructure can be removed.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 12 July 2018, 4 commits
-
-
Submitted by Brian Foster

Now that there is only one caller, fold the common submission helper into __xfs_buf_submit().

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Submitted by Brian Foster

The buffer I/O submission path consists of separate function calls per type. The buffer I/O type is already controlled via buffer state (XBF_ASYNC), however, so there is no real need for separate submission functions.

Combine the buffer submission functions into a single function that processes the buffer appropriately based on XBF_ASYNC. Retain an internal helper with a conditional wait parameter to continue to support the batched !XBF_ASYNC submission/completion required by delwri queues.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
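A minimal sketch of the submission shape the entry above describes, with simplified bodies; __xfs_buf_submit()/xfs_buf_submit() and XBF_ASYNC come from the series, while xfs_buf_iowait() as the sync-wait helper is an assumption.

    static int
    __xfs_buf_submit(
        struct xfs_buf  *bp,
        bool            wait)
    {
        /* map pages, build and queue the bio(s) for bp ... */

        if (wait)
            return xfs_buf_iowait(bp);  /* sync: wait here for completion */
        return 0;                       /* async: completion runs from the ioend workqueue */
    }

    int
    xfs_buf_submit(
        struct xfs_buf  *bp)
    {
        /* single entry point: synchronous unless the caller marked the buffer async */
        return __xfs_buf_submit(bp, !(bp->b_flags & XBF_ASYNC));
    }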
-
Submitted by Brian Foster

If a delwri queue occurs on a buffer that sits on a delwri queue wait list, the queue sets _XBF_DELWRI_Q without changing the state of ->b_list. This occurs, for example, if another thread beats the current delwri waiter thread to the buffer lock after I/O completion. Once the waiter acquires the lock, it removes the buffer from the wait list and leaves a buffer with _XBF_DELWRI_Q set but not populated on a list. This results in a lost buffer submission and in turn can result in assert failures due to _XBF_DELWRI_Q being set on buffer reclaim, or filesystem lockups if the buffer happens to cover an item in the AIL.

This problem has been reproduced by repeated iterations of xfs/305 on high CPU count (28xcpu) systems with limited memory (~1GB). Dirty dquot reclaim races with an xfsaild push of a separate dquot backed by the same buffer such that the buffer sits on the reclaim wait list at the time xfsaild attempts to queue it. Since the latter dquot has been flush locked but the underlying buffer not submitted for I/O, the dquot pins the AIL and causes the filesystem to livelock.

This race is essentially made possible by the buffer lock cycle involved with waiting on a synchronous delwri queue submission. Close the race by using synchronous buffer I/O for the respective delwri queue submission. This means the buffer remains locked across the I/O and so is inaccessible from other contexts while in the intermediate wait list state.

The sync buffer I/O wait mechanism is factored into a helper such that sync delwri buffer submission and serialization are batched operations.

Designed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
Submitted by Brian Foster

Sync and async buffer submission both do generally similar things, with a couple of odd exceptions. Refactor the core buffer submission code into a common helper to isolate buffer submission from completion handling of synchronous buffer I/O.

This patch does not change behavior. It is a step towards supporting synchronous buffer I/O via synchronous delwri queue submission.

Designed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 09 June 2018, 1 commit
-
-
Submitted by Dave Chinner

xfs_reflink_convert_cow() manipulates the incore extent list in GFP_KERNEL context in the I/O submission path whilst holding locked pages under writeback. This is a memory reclaim deadlock vector.

This code is not in a transaction, so any memory allocations it makes aren't protected via the memalloc_nofs_save() context that transactions carry. Hence we need to run this call under memalloc_nofs_save() context to prevent potential memory allocations from being run as GFP_KERNEL and deadlocking.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 07 June 2018, 1 commit
-
-
Submitted by Dave Chinner

Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/.

This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand; the rest were detected and modified by the following command:

    for f in `git grep -l "GNU General" fs/xfs/` ; do
        echo $f
        cat $f | awk -f hdr.awk > $f.new
        mv -f $f.new $f
    done

And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows:

    $ cat hdr.awk
    BEGIN {
        hdr = 1.0
        tag = "GPL-2.0"
        str = ""
    }

    /^ \* This program is free software/ {
        hdr = 2.0;
        next
    }

    /any later version./ {
        tag = "GPL-2.0+"
        next
    }

    /^ \*\// {
        if (hdr > 0.0) {
            print "// SPDX-License-Identifier: " tag
            print str
            print $0
            str=""
            hdr = 0.0
            next
        }
        print $0
        next
    }

    /^ \* / {
        if (hdr > 1.0)
            next
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
        next
    }

    /^ \*/ {
        if (hdr > 0.0)
            next
        print $0
        next
    }

    // {
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
    }

    END { }
    $

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 10 May 2018, 2 commits
-
-
Submitted by Dave Chinner

When looking at an event trace recently, I noticed that non-blocking buffer lookup attempts would fail on cached locked buffers and then run the slow cache-miss path. This means we are doing an xfs_buf allocation, lookup and free unnecessarily every time we avoid blocking on a locked buffer.

Fix this by changing _xfs_buf_find() to return an error status to the caller to indicate that we failed the lock attempt rather than just returning NULL. This allows the higher level code to discriminate between a cache miss and a cache hit that we failed to lock.

This also allows us to return a -EFSCORRUPTED state if we are asked to look up a block number outside the range of the filesystem in _xfs_buf_find(), which moves us one step closer to being able to handle such errors in a more graceful manner at the higher levels.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
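A hypothetical caller-side sketch of the new calling convention; -EFSCORRUPTED is named in the entry above, while the out-parameter form and the -EAGAIN/-ENOENT values are assumptions for illustration.

    struct xfs_buf  *bp = NULL;
    int             error;

    error = _xfs_buf_find(btp, map, nmaps, flags, NULL, &bp);
    switch (error) {
    case 0:
        /* cache hit: bp is returned locked and ready to use */
        break;
    case -EAGAIN:
        /* cache hit, but the non-blocking lock attempt failed */
        break;
    case -ENOENT:
        /* genuine cache miss: fall through to the allocation path */
        break;
    case -EFSCORRUPTED:
        /* block number outside the filesystem: report, don't allocate */
        break;
    }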
-
Submitted by Dave Chinner

Move xfs_buf_incore out of line and make it the only way to look up a buffer in the buffer cache from outside the buffer cache. Convert the external users of _xfs_buf_find() to xfs_buf_incore() and make _xfs_buf_find() static.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: actually rename xfs_incore -> xfs_buf_incore]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 10 April 2018, 1 commit
-
-
Submitted by Eric Sandeen

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 12 March 2018, 1 commit
-
-
Submitted by Vratislav Bendel

Due to an inverted logic mistake in xfs_buftarg_isolate(), xfs_buffers with zero b_lru_ref take another trip around the LRU, while buffers with a non-zero b_lru_ref are isolated. Additionally, those isolated buffers end up right back on the LRU once they are released, because b_lru_ref remains elevated.

Fix that circuitous route by leaving them on the LRU as originally intended.

Signed-off-by: Vratislav Bendel <vbendel@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
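A sketch of the corrected test described above; the isolate-callback skeleton and the dispose-list handling are abbreviated assumptions, while the atomic_add_unless() check on b_lru_ref is the point of the fix.

    static enum lru_status
    xfs_buftarg_isolate(
        struct list_head    *item,
        struct list_lru_one *lru,
        spinlock_t          *lru_lock,
        void                *arg)
    {
        struct xfs_buf      *bp = container_of(item, struct xfs_buf, b_lru);

        /*
         * Buffers that still hold an LRU reference are rotated (the decrement
         * happens here); only buffers whose b_lru_ref has reached zero are
         * isolated for reclaim. The bug was an inverted '!' on this test.
         */
        if (atomic_add_unless(&bp->b_lru_ref, -1, 0))
            return LRU_ROTATE;

        /* ... move bp to the dispose list ... */
        return LRU_REMOVED;
    }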
-
- 29 January 2018, 1 commit
-
-
Submitted by Carlos Maiolino

Now that the buffer's b_fspriv has been split, replace the current singly linked list of xfs_log_items with the list_head infrastructure.

Also, remove the xfs_log_item argument from xfs_buf_resubmit_failed_buffers(); there is no need for this argument once the log items can be walked through the list_head in the buffer.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: minor style cleanups]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 10 January 2018, 1 commit
-
-
Submitted by Darrick J. Wong

If a metadata I/O error happens, we report the location of the failed I/O request in units of daddrs. However, the printk message misleads people into thinking that the units are fs blocks, so fix the reported units.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
- 09 January 2018, 2 commits
-
-
Submitted by Darrick J. Wong

Increase the corrupt buffer dump to the first 128 bytes, since v5 filesystems have larger block headers than before.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
Submitted by Darrick J. Wong

Since all verification errors also mark the buffer as having an error, we can combine these two calls. Later we'll add an xfs_failaddr_t parameter to promote the idea of reporting corruption errors and the address of the failing check, to enable better debugging reports.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
- 29 November 2017, 1 commit
-
-
Submitted by Michal Hocko

The percpu_counter_init failure path doesn't clean up the &btp->bt_lru list. Call list_lru_destroy in that error path. Similarly, the register_shrinker error path is not handled.

While it is unlikely to trigger these error paths, it is not impossible, especially as the latter might fail on large NUMA machines. Let's handle the failures to make the code more robust.

Noticed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
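A minimal sketch of the error-path ordering this implies; the surrounding allocation code and the label names are assumptions, while the APIs shown are the standard kernel ones named above.

    if (list_lru_init(&btp->bt_lru))
        goto error_free;

    if (percpu_counter_init(&btp->bt_io_count, 0, GFP_KERNEL))
        goto error_lru;                 /* must now undo the LRU init */

    if (register_shrinker(&btp->bt_shrinker))
        goto error_pcpu;                /* and the percpu counter */

    return btp;

    error_pcpu:
        percpu_counter_destroy(&btp->bt_io_count);
    error_lru:
        list_lru_destroy(&btp->bt_lru);
    error_free:
        kmem_free(btp);
        return NULL;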
-
- 02 November 2017, 1 commit
-
-
Submitted by Darrick J. Wong

Move the error injection tag names into a libxfs header so that we can share it between kernel and userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
-
- 28 October 2017, 1 commit
-
-
Submitted by Brian Foster

Fix an unused variable warning on non-DEBUG builds introduced by commit 7561d27e ("xfs: buffer lru reference count error injection tag").

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 27 October 2017, 1 commit
-
-
Submitted by Brian Foster

XFS uses a fixed reference count for certain types of buffers in the internal LRU cache. These reference counts dictate how aggressively certain buffers are reclaimed vs. others. While the reference counts implement priority across different buffer types, all buffers (other than uncached buffers) are typically cached for at least one reclaim cycle.

We've had at least one bug recently that has been hidden by a released buffer sitting around in the LRU. Users hitting the problem were able to reproduce under enough memory pressure to cause aggressive reclaim in a particular window of time.

To support future xfstests cases, add an error injection tag to hardcode the buffer reference count to zero. When enabled, this bypasses caching of associated buffers and facilitates test cases that depend on this behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
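A sketch of where such an injection point could sit; the xfs_buf_set_ref() wrapper and the exact tag name are assumptions, while XFS_TEST_ERROR() is the standard XFS error-injection mechanism the series builds on.

    void
    xfs_buf_set_ref(
        struct xfs_buf  *bp,
        int             lru_ref)
    {
        /* when the injection tag is armed, pretend the caller asked for no caching */
        if (XFS_TEST_ERROR(false, bp->b_target->bt_mount, XFS_ERRTAG_BUF_LRU_REF))
            lru_ref = 0;

        atomic_set(&bp->b_lru_ref, lru_ref);
    }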
-
- 26 September 2017, 1 commit
-
-
Submitted by Colin Ian King

The variable total_nr_pages is initialized and then updated with the same value; the latter assignment is redundant and can be removed. This cleans up a clang build warning:

    Value stored to 'total_nr_pages' during its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 01 September 2017, 1 commit
-
-
Submitted by Dan Williams

The ->iomap_begin() operation is a hot path, so cache the fs_dax_get_by_host() result at mount time to avoid incurring the hash lookup overhead on a per-I/O basis.

Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 24 August 2017, 1 commit
-
-
Submitted by Christoph Hellwig

This way we don't need a block_device structure to submit I/O. The block_device has different lifetime rules from the gendisk and request_queue, and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping, we additionally need a partition index, which is used for said remapping in generic_make_request.

Note that all the block drivers generally want the request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 June 2017, 1 commit
-
-
Submitted by Darrick J. Wong

This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors:

    s/typedef\t__uint8_t/typedef __uint8_t\t/g
    s/typedef\t__uint/typedef __uint/g
    s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
    s/__uint8_t\t/__uint8_t\t\t/g
    s/__uint/uint/g
    s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
    s/__int/int/g
    /^typedef.*int[0-9]*_t;$/d

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 19 June 2017, 1 commit
-
-
Submitted by Brian Foster

Reclaim during quotacheck can lead to deadlocks on the dquot flush lock:

- Quotacheck populates a local delwri queue with the physical dquot buffers.
- Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and dirties all of the dquots.
- Reclaim kicks in and attempts to flush a dquot whose buffer is already queued on the quotacheck queue. The flush succeeds but queueing to the reclaim delwri queue fails as the backing buffer is already queued. The flush unlock is now deferred to I/O completion of the buffer from the quotacheck queue.
- The dqadjust bulkstat continues and dirties the recently flushed dquot once again.
- Quotacheck proceeds to the xfs_qm_flush_one() walk which requires the flush lock to update the backing buffers with the in-core recalculated values. It deadlocks on the redirtied dquot, as the flush lock was already acquired by reclaim, but the buffer resides on the local delwri queue which isn't submitted until the end of quotacheck.

This is reproduced by running quotacheck on a filesystem with a couple million inodes in low memory (512MB-1GB) situations. This is a regression as of commit 43ff2122 ("xfs: on-stack delayed write buffer lists"), which removed a trylock and buffer I/O submission from the quotacheck dquot flush sequence.

Quotacheck first resets and collects the physical dquot buffers in a delwri queue. Then, it traverses the filesystem inodes via bulkstat, updates the in-core dquots, flushes the corrected dquots to the backing buffers and finally submits the delwri queue for I/O. Since the backing buffers are queued across the entire quotacheck operation, dquot reclaim cannot possibly complete a dquot flush before quotacheck completes. Therefore, quotacheck must submit the buffer for I/O in order to cycle the flush lock and flush the dirty in-core dquot to the buffer.

Add a delwri queue buffer push mechanism to submit an individual buffer for I/O without losing the delwri queue status, and use it from quotacheck to avoid the deadlock. This restores quotacheck behavior to as before the regression was introduced.

Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 09 June 2017, 1 commit
-
-
Submitted by Christoph Hellwig

Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep around at least for now and thus propagate to a proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 08 June 2017, 1 commit
-
-
Submitted by Brian Foster

The 0-day kernel test robot reports assertion failures on !CONFIG_SMP kernels due to failed spin_is_locked() checks. As it turns out, spin_is_locked() is hardcoded to return zero on !CONFIG_SMP kernels and so this function cannot be relied on to verify spinlock state in this configuration.

To avoid this problem, replace the associated asserts with lockdep variants that do the right thing regardless of kernel configuration. Drop the one assert that checks for an unlocked lock, as there is no suitable lockdep variant for that case. This moves the spinlock checks from XFS debug code to lockdep, but generally provides the same level of protection.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
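The substitution in a nutshell (illustrative; the specific lock picked here, bp->b_lock, is an assumption):

    /*
     * before: an ASSERT() around spin_is_locked() can never pass on UP
     * kernels, because spin_is_locked() is hardcoded to return 0 there
     */
    ASSERT(spin_is_locked(&bp->b_lock));

    /* after: correct on UP and SMP, and a no-op unless lockdep is enabled */
    lockdep_assert_held(&bp->b_lock);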
-
- 31 May 2017, 1 commit
-
-
Submitted by Brian Foster

We've had user reports of unmount hangs in xfs_wait_buftarg() that analysis shows are due to btp->bt_io_count == -1. bt_io_count represents the count of in-flight asynchronous buffers and thus should always be >= 0. xfs_wait_buftarg() waits for this value to stabilize to zero in order to ensure that all untracked (with respect to the LRU) buffers have completed I/O processing before unmount proceeds to tear down in-core data structures.

The value of -1 implies an I/O accounting decrement race. Indeed, the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele() (where the buffer lock is no longer held) means that bp->b_flags can be updated from an unsafe context. While a user-level reproducer is currently not available, some intrusive hacks to run racing buffer lookups/ioacct/releases from multiple threads were used to successfully manufacture this problem.

Existing callers do not expect to acquire the buffer lock from xfs_buf_rele(). Therefore, we cannot safely update ->b_flags from this context. It turns out that we already have separate buffer state bits and associated serialization for dealing with buffer LRU state in the form of ->b_state and ->b_lock. Therefore, replace the _XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O accounting wrappers appropriately and make sure they are used with the correct locking. This ensures that buffer in-flight state can be modified at buffer release time without racing with modifications from a buffer lock holder.

Fixes: 9c7504aa ("xfs: track and serialize in-flight async buffers against unmount")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Libor Pechacek <lpechacek@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 04 May 2017, 1 commit
-
-
Submitted by Michal Hocko

kmem_zalloc_large and _xfs_buf_map_pages use the memalloc_noio_{save,restore} API to prevent reclaim recursion into the fs, because vmalloc can invoke unconditional GFP_KERNEL allocations and these functions might be called from NOFS contexts. memalloc_noio_save enforces a GFP_NOIO context, which is even weaker than GFP_NOFS, and that seems to be unnecessary. Let's use memalloc_nofs_{save,restore} instead, as it provides exactly what we need here - an implicit GFP_NOFS context.

Link: http://lkml.kernel.org/r/20170306131408.9828-6-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
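A sketch of what the scoping change looks like in _xfs_buf_map_pages(); the vm_map_ram() retry loop is abbreviated from the long-standing XFS code of that era, so treat the exact body as illustrative.

    unsigned nofs_flag;

    /*
     * vm_map_ram() can recurse into the filesystem via GFP_KERNEL
     * allocations, so mark this region as implicitly GFP_NOFS;
     * the previous GFP_NOIO scope was stronger than necessary.
     */
    nofs_flag = memalloc_nofs_save();
    do {
        bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count, -1,
                                PAGE_KERNEL);
        if (bp->b_addr)
            break;
        vm_unmap_aliases();
    } while (1);
    memalloc_nofs_restore(nofs_flag);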
-
- 26 April 2017, 1 commit
-
-
Submitted by Brian Foster

The quotacheck error handling of the delwri buffer list assumes the resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag on the buffers that are dequeued. This can lead to assert failures on buffer release and possibly other locking problems.

Move this code to a delwri queue cancel helper function to encapsulate the logic required to properly release buffers from a delwri queue. Update the helper to clear the delwri queue flag and call it from quotacheck.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
-
- 02 March 2017, 1 commit
-
-
Submitted by Ingo Molnar

Update the .c files that depend on these APIs.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 02 February 2017, 1 commit
-
-
Submitted by Jan Kara

blk_get_backing_dev_info() is now a simple dereference. Remove that function and simplify some code around that.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 26 January 2017, 1 commit
-
-
Submitted by Darrick J. Wong

If we try to allocate memory pages to back an xfs_buf that we're trying to read, it's possible that we'll be so short on memory that the page allocation fails. For a blocking read we'll just wait, but for readahead we simply dump all the pages we've collected so far.

Unfortunately, after dumping the pages we neglect to clear the _XBF_PAGES state, which means that the subsequent call to xfs_buf_free thinks that b_pages still points to pages we own. It then double-frees the b_pages pages. This results in screaming about negative page refcounts from the memory manager, which xfs oughtn't be triggering.

To reproduce this case, mount a filesystem where the size of the inodes far outweighs the available memory (a ~500M inode filesystem on a VM with 300MB memory did the trick here) and run bulkstat in parallel with other memory-eating processes to put a huge load on the system. The "check summary" phase of xfs_scrub also works for this purpose.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
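A sketch of the readahead failure path this describes, with the allocation loop, declarations and label placement abbreviated; the one-line flag clear is the essence of the fix, the rest is assumed context.

    out_free_pages:
        for (i = 0; i < bp->b_page_count; i++)
            __free_page(bp->b_pages[i]);
        /*
         * The fix: b_pages are no longer ours, so a later xfs_buf_free()
         * must not try to free them a second time.
         */
        bp->b_flags &= ~_XBF_PAGES;
        return error;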
-
- 09 December 2016, 1 commit
-
-
Submitted by Dave Chinner

There is no reason anymore for not issuing device integrity operations when the filesystem requires ordering or data integrity guarantees. We should always issue cache flushes and FUA writes where necessary and let the underlying storage optimise them as necessary for correct integrity operation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 07 December 2016, 1 commit
-
-
Submitted by Lucas Stach

On filesystems with a lot of metadata and in metadata-intensive workloads, xfs_buf_find() shows up at the top of the CPU cycles trace. Most of the CPU time is spent on CPU cache misses while traversing the rbtree.

As the buffer cache does not need any kind of ordering, but fast lookups, a hashtable is the natural data structure to use. The rhashtable infrastructure provides a self-scaling hashtable implementation and allows lookups to proceed while the table is going through a resize operation.

This reduces the CPU time spent on the lookups to 1/3 even for small filesystems with a relatively small number of cached buffers, with possibly much larger gains on more heavily loaded filesystems.

[dchinner: reduce minimum hash size to an acceptable size for large filesystems with many AGs with no active use.]
[dchinner: remove stale rbtree asserts.]
[dchinner: use xfs_buf_map for compare function argument.]
[dchinner: make functions static.]
[dchinner: remove redundant comments.]

Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
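An illustrative per-AG rhashtable setup matching that description; the struct rhashtable_params fields are the standard ones, but the specific sizes and the xfs_buf/xfs_perag member names should be read as assumptions.

    static const struct rhashtable_params xfs_buf_hash_params = {
        .min_size            = 32,    /* keep small FSes with many AGs cheap */
        .nelem_hint          = 16,
        .key_len             = sizeof(xfs_daddr_t),
        .key_offset          = offsetof(struct xfs_buf, b_bn),
        .head_offset         = offsetof(struct xfs_buf, b_rhash_head),
        .automatic_shrinking = true,
    };

    /* one self-scaling table per allocation group; no ordering needed, just fast lookup */
    int
    xfs_buf_hash_init(
        struct xfs_perag    *pag)
    {
        spin_lock_init(&pag->pag_buf_lock);
        return rhashtable_init(&pag->pag_buf_hash, &xfs_buf_hash_params);
    }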
-
- 01 November 2016, 1 commit
-
-
Submitted by Christoph Hellwig

Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 26 August 2016, 1 commit
-
-
Submitted by Brian Foster

xfs_wait_buftarg() waits for all pending I/O, drains the ioend completion workqueue and walks the LRU until all buffers in the cache have been released. This is traditionally an unmount operation, but the mechanism is also reused during filesystem freeze.

xfs_wait_buftarg() invokes drain_workqueue() as part of the quiesce, which is intended more for a shutdown sequence in that it indicates to the queue that new operations are not expected once the drain has begun. New work jobs after this point result in a WARN_ON_ONCE() and are otherwise dropped.

With filesystem freeze, however, read operations are allowed and can proceed during or after the workqueue drain. If such a read occurs during the drain sequence, the workqueue infrastructure complains about the queued ioend completion work item and drops it on the floor. As a result, the buffer remains on the LRU and the freeze never completes.

Despite the fact that the overall buffer cache cleanup is not necessary during freeze, fix up this operation such that it is safe to invoke during non-unmount quiesce operations. Replace the drain_workqueue() call with flush_workqueue(), which runs a similar serialization on pending workqueue jobs without causing new jobs to be dropped. This is safe for unmount, as unmount independently locks out new operations by the time xfs_wait_buftarg() is invoked.

cc: <stable@vger.kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
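The substitution in context, as a sketch; the workqueue field name m_buf_workqueue and the surrounding xfs_wait_buftarg() outline are assumptions based on the description above.

    void
    xfs_wait_buftarg(
        struct xfs_buftarg  *btp)
    {
        /* ... wait for the in-flight buffer count to stabilise ... */

        /*
         * flush_workqueue() waits for already-queued ioend work without
         * refusing new work, so a read that races in during a freeze is
         * completed instead of being WARNed about and dropped, as
         * drain_workqueue() would do.
         */
        flush_workqueue(btp->bt_mount->m_buf_workqueue);

        /* ... then walk the LRU and release all cached buffers ... */
    }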
-
- 17 August 2016, 1 commit
-
-
Submitted by Brian Foster

The buffer I/O accounting mechanism tracks async buffers under I/O. As an optimization, the buffer I/O count is incremented only once on the first async I/O for a given hold cycle of a buffer, and decremented once the buffer is released to the LRU (or freed).

xfs_buf_ioacct_dec() has an ASSERT() check for an XBF_ASYNC buffer, but we have one or two corner cases where a buffer can be submitted for I/O multiple times via different methods in a single hold cycle. If an async I/O occurs first, the I/O count is incremented. If a sync I/O occurs before the hold count drops, XBF_ASYNC is cleared by the time the I/O count is decremented.

Remove the async assert check from xfs_buf_ioacct_dec(), as this is a perfectly valid scenario. For the purposes of I/O accounting, we really only care about the buffer async state at I/O submission time.

Discovered-and-analyzed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
- 20 July 2016, 1 commit
-
-
Submitted by Brian Foster

Newly allocated XFS metadata buffers are added to the LRU once the hold count is released, which typically occurs after I/O completion. There is currently no other mechanism that tracks the existence or I/O state of a new buffer. Further, readahead I/O tends to be submitted asynchronously by nature, which means the I/O can remain in flight and actually complete long after the calling context is gone. This means that file descriptors or any other holds on the filesystem can be released, allowing the filesystem to be unmounted while I/O is still in flight. When I/O completion occurs, core data structures may have been freed, causing completion to run into invalid memory accesses and likely panic.

This problem is reproduced on XFS via directory readahead. A filesystem is mounted, a directory is opened/closed and the filesystem is immediately unmounted. The open/close cycle triggers a directory readahead that, if delayed long enough, runs buffer I/O completion after the unmount has completed.

To address this problem, add a mechanism to track all in-flight, asynchronous buffers using per-cpu counters in the buftarg. The buffer is accounted on the first I/O submission after the current reference is acquired and unaccounted once the buffer is returned to the LRU or freed. Update xfs_wait_buftarg() to wait on all in-flight I/O before walking the LRU list. Once in-flight I/O has completed and the workqueue has drained, all new buffers should have been released onto the LRU.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
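A sketch of the accounting wrappers this implies; the helper names xfs_buf_ioacct_inc/dec and the _XBF_IN_FLIGHT flag appear elsewhere in this log, but the bodies here are illustrative. Note that the 31 May 2017 entry above later moves the in-flight flag into ->b_state because updating b_flags from xfs_buf_rele() turned out to be racy.

    static inline void
    xfs_buf_ioacct_inc(
        struct xfs_buf  *bp)
    {
        /* only async I/O is tracked, and only once per hold cycle */
        if (!(bp->b_flags & XBF_ASYNC) || (bp->b_flags & _XBF_IN_FLIGHT))
            return;
        bp->b_flags |= _XBF_IN_FLIGHT;
        percpu_counter_inc(&bp->b_target->bt_io_count);
    }

    static inline void
    xfs_buf_ioacct_dec(
        struct xfs_buf  *bp)
    {
        /* called when the buffer goes back onto the LRU or is freed */
        if (!(bp->b_flags & _XBF_IN_FLIGHT))
            return;
        bp->b_flags &= ~_XBF_IN_FLIGHT;
        percpu_counter_dec(&bp->b_target->bt_io_count);
    }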
-