- 12 Sep 2017, 2 commits
-
-
Committed by NeilBrown
1/ Remove the 'start' and 'end' args from nfs_file_fsync_commit(). They aren't used.
2/ Make nfs_context_set_write_error() a "static inline" in internal.h so we can...
3/ Use nfs_context_set_write_error() instead of mapping_set_error() if nfs_pageio_add_request() fails before sending any request. NFS generally keeps errors in the open_context, not the mapping, so this is more consistent.
4/ If filemap_write_and_wait_range() reports any error, still check ctx->error. The value in ctx->error is likely to be more useful.
As part of this, NFS_CONTEXT_ERROR_WRITE is cleared slightly earlier, before nfs_file_fsync_commit() is called, rather than at the start of that function.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
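For reference, a minimal sketch of what such a "static inline" helper in internal.h looks like, based only on the description above: record the error in the open context and set NFS_CONTEXT_ERROR_WRITE so a later fsync can report it.

  static inline void nfs_context_set_write_error(struct nfs_open_context *ctx,
                                                 int error)
  {
          ctx->error = error;
          set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
  }

Point 4 above then amounts to: after filemap_write_and_wait_range(), test-and-clear NFS_CONTEXT_ERROR_WRITE and prefer ctx->error over the mapping error when reporting back to the caller.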
-
Committed by Chuck Lever
Tools like tcpdump and rpcdebug can be very useful. But there are plenty of environments where they are difficult or impossible to use. For example, we've had customers report I/O failures during workloads so heavy that collecting network traffic or enabling RPC debugging are themselves onerous. The kernel's static tracepoints are lightweight (less likely to introduce timing changes) and efficient (the trace data is compact). They also work in scenarios where capturing network traffic is not possible due to lack of hardware support (some InfiniBand HCAs) or where data or network privacy is a concern. Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT is initiated, and when it completes. Record the arguments and results of each operation, which are not shown by the existing sunrpc module's tracepoints. For instance, the recorded offset and count can be used to match an "initiate" event to a "done" event. If an NFS READ result returns fewer bytes than requested, or zero bytes, seeing the EOF flag can be probative. Seeing an NFS4ERR_BAD_STATEID result is also an indication of a particular class of problems. The timing information attached to each event record can often be useful as well.
Usage example:
  [root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done
  /sys/kernel/debug/tracing/events/nfs/*initiate*/filter
  /sys/kernel/debug/tracing/events/nfs/*done/filter
  Hit Ctrl^C to stop recording
  ^CKernel buffer statistics:
    Note: "entries" are the entries left in the kernel ring buffer and are not recorded in the trace data. They should all be zero.
  CPU: 0
  entries: 0
  overrun: 0
  commit overrun: 0
  bytes: 3680
  oldest event ts: 78.367422
  now ts: 100.124419
  dropped events: 0
  read events: 74
... and so on.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
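As a rough illustration of what such a static tracepoint declaration looks like (the real definitions live in fs/nfs/nfstrace.h with the usual trace-header boilerplate; the event name and field set below are simplified, not the exact in-tree ones):

  TRACE_EVENT(nfs_initiate_read,
          TP_PROTO(const struct inode *inode, loff_t offset, unsigned long count),
          TP_ARGS(inode, offset, count),
          TP_STRUCT__entry(
                  __field(u64, fileid)
                  __field(loff_t, offset)
                  __field(unsigned long, count)
          ),
          TP_fast_assign(
                  __entry->fileid = NFS_FILEID(inode);   /* which file */
                  __entry->offset = offset;              /* where the READ starts */
                  __entry->count  = count;               /* how many bytes requested */
          ),
          TP_printk("fileid=%llu offset=%lld count=%lu",
                  (unsigned long long)__entry->fileid,
                  (long long)__entry->offset, __entry->count)
  );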
-
- 10 Sep 2017, 4 commits
-
-
Committed by Trond Myklebust
If we skip a subrequest due to a zero refcount, we should still count the byte range that it covered so that we accurately reconstruct the original request size. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
That can deadlock if this is the last reference, since nfs_page_group_destroy() calls nfs_page_group_sync_on_bit(). Note that even if the page was removed from the subpage list, req->wb_head could still be pointing to the old head. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
It's pretty much a duplicate of nfs_scan_commit_list() that also clears the PG_COMMIT_TO_DS flag. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Since the commit list is not ordered, it is possible for nfs_scan_commit_list() to hold a request that nfs_lock_and_join_requests() is waiting for, while at the same time trying to grab a request that nfs_lock_and_join_requests() already holds. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 07 Sep 2017, 1 commit
-
-
Committed by NeilBrown
Commit fbe77c30 ("NFS: move rw_mode to nfs_pageio_header") reintroduced some pointless code that commit 518662e0 ("NFS: fix usage of mempools.") had recently removed. Remove it again. Cc: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 20 Aug 2017, 1 commit
-
-
Committed by Trond Myklebust
Now that the mirror allocation has been moved, the parameter can go. Also remove the redundant symbol export. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 15 Aug 2017, 23 commits
-
-
Committed by Trond Myklebust
If a request is on the commit list but is locked, we will currently skip it, which can lead to livelocking when the commit count doesn't reduce to zero. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Switch away from using the inode->i_lock for this, to avoid contention with other metadata manipulation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Rather than forcing us to take the inode->i_lock just in order to bump the number. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
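A minimal sketch of the pattern being described, assuming the counter in question is NFS_I(inode)->nrequests and that it becomes an atomic_long_t (the helper name below is illustrative):

  static void nfs_inode_add_request_count(struct inode *inode)
  {
          /* No need to take spin_lock(&inode->i_lock) just to bump the counter. */
          atomic_long_inc(&NFS_I(inode)->nrequests);
  }
-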
Committed by Trond Myklebust
The commit lists can get very large, so using the inode->i_lock can end up affecting general metadata performance. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Split out the two cases so that we can treat the locking differently. The issue is that the locking in the page swapcache case is closely linked to the commit list locking. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Hide the locking from nfs_lock_and_join_requests() so that we can separate out the requirements for swapcache pages. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Fix up the test in nfs_page_group_covers_page(). The simplest implementation is to check that we have a set of intersecting or contiguous subrequests that connect page offset 0 to nfs_page_length(req->wb_page). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
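A simplified sketch of that coverage test (illustrative helper, not the exact in-tree nfs_page_group_covers_page()): repeatedly look for a subrequest whose byte range covers the current position and advance it, until either the whole range 0..len is connected or a gap is found.

  static bool page_group_covers_range(struct nfs_page *head, unsigned int len)
  {
          unsigned int pos = 0;

          while (pos < len) {
                  struct nfs_page *req = head;
                  bool advanced = false;

                  do {    /* scan the whole group for a subrequest covering 'pos' */
                          if (req->wb_pgbase <= pos &&
                              req->wb_pgbase + req->wb_bytes > pos) {
                                  pos = req->wb_pgbase + req->wb_bytes;
                                  advanced = true;
                          }
                          req = req->wb_this_page;
                  } while (req != head);

                  if (!advanced)
                          return false;   /* gap found: the page is not fully covered */
          }
          return true;
  }
-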
Committed by Trond Myklebust
nfs_page_group_lock() is now always called with the 'nonblock' parameter set to 'false'. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
At this point, the only flags we ever expect to potentially see set on the subrequests are PG_REMOVE and PG_TEARDOWN. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Since nfs_page_group_destroy() does not take any locks on the requests to be freed, we need to ensure that we don't inadvertently free the request in nfs_destroy_unlinked_subrequests() while the last reference is being released elsewhere. Do this by:
1) Taking a reference to the request unless it is already being freed
2) Checking (under the page group lock) if PG_TEARDOWN is already set before freeing an unreferenced request in nfs_destroy_unlinked_subrequests()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
When locking the entire group in order to remove subrequests, the locks are always taken in order, with the page group lock being taken after the page head is locked. The intention is that:
1) The lock on the group head guarantees that requests may not be removed from the group (although new entries could be appended if we're not holding the group lock).
2) It is safe to drop and retake the page group lock while iterating through the list, in particular when waiting for a subrequest lock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
We should no longer need the inode->i_lock, now that we've straightened out the request locking. The locking schema is now:
1) Lock the page head request
2) Lock the page group
3) Lock the subrequests one by one
Note that there is a subtle race with nfs_inode_remove_request() due to the fact that the latter does not lock the page head when removing it from the struct page. Only the last subrequest is locked, hence we need to re-check that PagePrivate(page) is still set after we've locked all the subrequests.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
nfs_try_to_update_request() should be able to cope now. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Simplify the code, and avoid some flushes to disk. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Both nfs_destroy_unlinked_subrequests() and nfs_lock_and_join_requests() manipulate the inode flags when adjusting NFS_I(inode)->nrequests. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
We don't want nfs_lock_and_join_requests() to start fiddling with the request before the call to nfs_page_group_sync_on_bit(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Request offsets and sizes are not guaranteed to be stable unless you are holding the request locked. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
All other callers of nfs_page_group_lock() appear to already hold the page lock on the head page, so doing it in the opposite order here is inefficient, although not deadlock-prone since we roll back all locks on contention. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Yes, this is a situation that should never happen (hence the WARN_ON), but we should still ensure that we free up the locks and references to the faulty pages. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Micro-optimisation: move the lockless check into the for(;;) loop. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Add a lockless check for whether or not the page might be carrying an existing writeback before we grab the inode->i_lock. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
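A sketch of the kind of lockless pre-check being described (illustrative helper, not the exact in-tree code): a page that has no nfs_page attached and is not in the swap cache cannot be carrying an existing writeback, so there is no need to take inode->i_lock at all in that case.

  static bool nfs_page_may_have_writeback(struct page *page)
  {
          /* PagePrivate() means an nfs_page is attached; swapcache pages are tracked separately. */
          return PagePrivate(page) || PageSwapCache(page);
  }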
-
Committed by Trond Myklebust
We don't expect the page header lock to ever be held across I/O, so it should always be safe to wait for it, even if we're doing nonblocking writebacks. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 14 Jul 2017, 2 commits
-
-
Committed by Trond Myklebust
Now that the writes will schedule a commit on their own, we don't need nfs_write_inode() to schedule one if there are outstanding writes and we're being called in non-blocking mode. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Trond Myklebust
If the page cache is being flushed, then we want to ensure that we do start a commit once the pages are done being flushed. If we just wait until all I/O is done to that file, we can end up livelocking until the balance_dirty_pages() mechanism puts its foot down and forces I/O to stop. So instead we do more or less the same thing that O_DIRECT does, and set up a counter to tell us when the flush is done. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
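A generic sketch of that counter idea (the structure and names below are illustrative, not the ones actually added to fs/nfs/write.c): reference-count the pages handed to writeback during this flush, and kick off the COMMIT as soon as the last one completes, instead of waiting for all outstanding I/O to the file.

  struct flush_completion {
          refcount_t      pending;        /* pages from this flush still in flight */
          struct inode    *inode;         /* file whose flush we are tracking */
  };

  static void flush_completion_put(struct flush_completion *fc)
  {
          if (refcount_dec_and_test(&fc->pending)) {
                  nfs_commit_inode(fc->inode, 0); /* last page done: start the commit */
                  kfree(fc);
          }
  }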
-
- 09 May 2017, 1 commit
-
-
Committed by Olga Kornievskaia
Instead of messing with the commit path, which has been causing issues, add a COMMIT op after the COPY and ask for stable copies in the first place. It saves a round trip, since after the COPY the client sends a COMMIT anyway. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 27 Apr 2017, 1 commit
-
-
Committed by Trond Myklebust
If the client receives a fatal server error from nfs_pageio_add_request(), then we should always truncate the page on which the error occurred. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 26 Apr 2017, 1 commit
-
-
Committed by Trond Myklebust
If the server has already returned a fatal write error for this file that the user has not yet received, then don't write back the other pages. Instead, act as if they have been sent, and have returned with the same error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 21 Apr 2017, 4 commits
-
-
Committed by Jan Kara
Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Anna Schumaker <anna.schumaker@netapp.com> CC: linux-nfs@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Benjamin Coddington
Let's try to have it in a cacheline in nfs4_proc_pgio_rpc_prepare(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Fred Isaman
Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Fixes: 0bcbf039 ("nfs: handle request add failure properly") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by NeilBrown
When passed GFP flags that allow sleeping (such as GFP_NOIO), mempool_alloc() will never return NULL; it will wait until memory is available. This means that we don't need to handle failure, but that we do need to ensure one thread doesn't call mempool_alloc() twice on the one pool without queuing or freeing the first allocation. If multiple threads did this during times of high memory pressure, the pool could be exhausted and a deadlock could result. pnfs_generic_alloc_ds_commits() attempts to allocate from the nfs_commit_mempool while already holding an allocation from that pool. This is not safe. So change nfs_commitdata_alloc() to take a flag that indicates whether failure is acceptable. In pnfs_generic_alloc_ds_commits(), accept failure and handle it as we currently do. Elsewhere, do not accept failure, and do not handle it. Even when failure is acceptable, we want to succeed if possible. That means both:
- using an entry from the pool if there is one
- waiting for direct reclaim if there isn't.
We call mempool_alloc(GFP_NOWAIT) to achieve the first, then kmem_cache_alloc(GFP_NOIO|__GFP_NORETRY) to achieve the second. Each of these can fail, but together they do the best they can without blocking indefinitely. The objects returned by kmem_cache_alloc() will still be freed by mempool_free(). This is safe as mempool_alloc() uses exactly the same function to allocate objects (since the mempool was created with mempool_create_slab_pool()). The objects returned by mempool_alloc() and kmem_cache_alloc() are indistinguishable, so mempool_free() will handle both identically, either adding to the pool or calling kmem_cache_free(). Also, don't test for failure when allocating from nfs_wdata_mempool.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
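A sketch of the resulting allocation strategy, assuming a backing slab cache named nfs_cdata_cachep (the pool name nfs_commit_mempool comes from the text above): try the pool without blocking first, then fall back to direct reclaim, and only then report failure.

  struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail)
  {
          struct nfs_commit_data *p;

          if (never_fail) {
                  /* Sleeping allocation from the pool cannot fail. */
                  p = mempool_alloc(nfs_commit_mempool, GFP_NOIO);
          } else {
                  /* Don't block on the pool: we may already hold one of its objects. */
                  p = mempool_alloc(nfs_commit_mempool, GFP_NOWAIT);
                  if (!p)         /* allow direct reclaim, but don't retry forever */
                          p = kmem_cache_alloc(nfs_cdata_cachep,
                                               GFP_NOIO | __GFP_NORETRY);
                  if (!p)
                          return NULL;
          }

          memset(p, 0, sizeof(*p));
          INIT_LIST_HEAD(&p->pages);
          return p;
  }

As the message notes, objects obtained either way can still be released with mempool_free(), since the pool was created with mempool_create_slab_pool() over the same slab cache.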
-