- 23 8月, 2014 6 次提交
-
-
由 Weston Andros Adamson 提交于
Commit 6094f838 "nfs: allow coalescing of subpage requests" got rid of the requirement that requests cover whole pages, but it made some incorrect assumptions. It turns out that callers of this interface can map adjacent requests (by file position as seen by req_offset + req->wb_bytes) to different pages, even when they could share a page. An example is the direct I/O interface - iov_iter_get_pages_alloc may return one segment with a partial page filled and the next segment (which is adjacent in the file position) starts with a new page. Reported-by: NToralf Förster <toralf.foerster@gmx.de> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Adjacent requests that share the same page are allowed, but should only use one entry in the page vector. This avoids overruning the page vector - it is sized based on how many bytes there are, not by request count. This fixes issues that manifest as "Redzone overwritten" bugs (the vector overrun) and hangs waiting on page read / write, as it waits on the same page more than once. This also adds bounds checking to the page vector with a graceful failure (WARN_ON_ONCE and pgio error returned to application). Reported-by: NToralf Förster <toralf.foerster@gmx.de> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
This handles the 'nonblock=false' case in nfs_lock_and_join_requests. If the group is already locked and blocking is allowed, drop the inode lock and wait for the group lock to be cleared before trying it all again. This should fix warnings found in peterz's tree (sched/wait branch), where might_sleep() checks are added to wait.[ch]. Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Reviewed-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
__nfs_pageio_add_request was calling nfs_page_group_lock nonblocking, but this can return -EAGAIN which would end up passing -EIO to the application. There is no reason not to block in this path, so change the two calls to do so. Also, there is no need to check the return value of nfs_page_group_lock when nonblock=false, so remove the error handling code. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Reviewed-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
nfs_page_group_lock was calling wait_on_bit_lock even when told not to block. Fix by first trying test_and_set_bit, followed by wait_on_bit_lock if and only if blocking is allowed. Return -EAGAIN if nonblocking and the test_and_set of the bit was already locked. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Reviewed-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Flip the meaning of the second argument from 'wait' to 'nonblock' to match related functions. Update all five calls to reflect this change. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Reviewed-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 04 8月, 2014 1 次提交
-
-
由 Weston Andros Adamson 提交于
Return errors from wait_on_bit_lock from nfs_page_group_lock. Add a bool argument @wait to nfs_page_group_lock. If true, loop over wait_on_bit_lock until it returns cleanly. If false, return the error from wait_on_bit_lock. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 16 7月, 2014 2 次提交
-
-
由 NeilBrown 提交于
It is currently not possible for various wait_on_bit functions to implement a timeout. While the "action" function that is called to do the waiting could certainly use schedule_timeout(), there is no way to carry forward the remaining timeout after a false wake-up. As false-wakeups a clearly possible at least due to possible hash collisions in bit_waitqueue(), this is a real problem. The 'action' function is currently passed a pointer to the word containing the bit being waited on. No current action functions use this pointer. So changing it to something else will be a little noisy but will have no immediate effect. This patch changes the 'action' function to take a pointer to the "struct wait_bit_key", which contains a pointer to the word containing the bit so nothing is really lost. It also adds a 'private' field to "struct wait_bit_key", which is initialized to zero. An action function can now implement a timeout with something like static int timed_out_waiter(struct wait_bit_key *key) { unsigned long waited; if (key->private == 0) { key->private = jiffies; if (key->private == 0) key->private -= 1; } waited = jiffies - key->private; if (waited > 10 * HZ) return -EAGAIN; schedule_timeout(waited - 10 * HZ); return 0; } If any other need for context in a waiter were found it would be easy to use ->private for some other purpose, or even extend "struct wait_bit_key". My particular need is to support timeouts in nfs_release_page() to avoid deadlocks with loopback mounted NFS. While wait_on_bit_timeout() would be a cleaner interface, it will not meet my need. I need the timeout to be sensitive to the state of the connection with the server, which could change. So I need to use an 'action' interface. Signed-off-by: NNeilBrown <neilb@suse.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: David Howells <dhowells@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 NeilBrown 提交于
The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NNeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 7月, 2014 1 次提交
-
-
由 Trond Myklebust 提交于
Once we've started sending unstable NFS writes, we do not want to clear pg_moreio, or we may end up sending the very last request as a stable write if the commit lists are still empty. Do, however, reset pg_moreio in the case where we end up having to recoalesce the write if an attempt to use pNFS failed. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 13 7月, 2014 3 次提交
-
-
由 Weston Andros Adamson 提交于
Change nfs_find_and_lock_request so nfs_page_async_flush can handle multiple requests in a page. There is only one request for a page the first time nfs_page_async_flush is called, but if a write or commit fails, async_flush is called again and there may be multiple requests associated with the page. The solution is to merge all the requests in a page group into a single request before calling nfs_pageio_add_request. Rename nfs_find_and_lock_request to nfs_lock_and_join_requests and change it to first lock all requests for the page, then cancel and merge all subrequests into the head request. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
nfs_pages that aren't the the head of a group must take a reference on the head as long as ->wb_head is set to it. This stops the head from hitting a refcount of 0 while there is still an active nfs_page for the page group. This avoids kref warnings in the writeback code when the page group head is found and referenced. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Change the use of PG_INODE_REF - set it when taking extra reference on subrequests and take care to only release once for each request. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 25 6月, 2014 8 次提交
-
-
由 Anna Schumaker 提交于
inode is unused when CONFIG_SUNRPC_DEBUG=n. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
EXPORT_GPLs of nfs_pageio_add_request and nfs_pageio_complete aren't needed anymore. Suggested-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Clean up pnfs_read_done_resend_to_mds and pnfs_write_done_resend_to_mds: - instead of passing all arguments from a nfs_pgio_header, just pass the header - share the common code Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
The refcounting on nfs_pgio_header was related to there being (possibly) more than one nfs_pgio_data. Now that nfs_pgio_data has been merged into nfs_pgio_header, there is no reason to do this ref counting. Just call the completion callback on nfs_pgio_release/nfs_pgio_error. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Remove duplicate writeverf structure from merge of nfs_pgio_header and nfs_pgio_data and remove writeverf related flags and logic to handle more than one RPC per nfs_pgio_header. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
struct nfs_pgio_data only exists as a member of nfs_pgio_header, but is passed around everywhere, because there used to be multiple _data structs per _header. Many of these functions then use the _data to find a pointer to the _header. This patch cleans this up by merging the nfs_pgio_data structure into nfs_pgio_header and passing nfs_pgio_header around instead. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Rename "verf" to "writeverf" and "pages" to "page_array" to prepare for merge of nfs_pgio_data and nfs_pgio_header. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
nfs_rw_header was used to allocate an nfs_pgio_header along with an nfs_pgio_data, because a _header would need at least one _data. Now there is only ever one nfs_pgio_data for each nfs_pgio_header -- move it to nfs_pgio_header and get rid of nfs_rw_header. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 30 5月, 2014 2 次提交
-
-
由 Trond Myklebust 提交于
We cannot allow nfs_page_group_lock to use TASK_KILLABLE here, since the loop would cause a busy wait if somebody kills the task. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Handle the case where nfs_create_request() returns an error. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 29 5月, 2014 17 次提交
-
-
由 Weston Andros Adamson 提交于
Since the ability to split pages into subpage requests has been added, nfs_pgio_header->rpc_list only ever has one pgio data. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Use the newly added support for multiple requests per page for rsize/wsize < PAGE_SIZE, instead of having multiple read / write data structures per pageio header. This allows us to get rid of nfs_pgio_multi. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Remove check that the request covers a whole page. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Operations that modify state for a whole page must be syncronized across all requests within a page group. In the write path, this is calling end_page_writeback and removing the head request from an inode. Both of these operations should not be called until all requests in a page group have reached the point where they would call them. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Operations that modify state for a whole page must be syncronized across all requests within a page group. In the read path, this is calling unlock_page and SetPageUptodate. Both of these functions should not be called until all requests in a page group have reached the point where they would call them. This patch should have no effect yet since all page groups currently have one request, but will come into play when pg_test functions are modified to split pages into sub-page regions. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Add "page groups" - a circular list of nfs requests (struct nfs_page) that all reference the same page. This gives nfs read and write paths the ability to account for sub-page regions independently. This somewhat follows the design of struct buffer_head's sub-page accounting. Only "head" requests are ever added/removed from the inode list in the buffered write path. "head" and "sub" requests are treated the same through the read path and the rest of the write/commit path. Requests are given an extra reference across the life of the list. Page groups are never rejoined after being split. If the read/write request fails and the client falls back to another path (ie revert to MDS in PNFS case), the already split requests are pushed through the recoalescing code again, which may split them further and then coalesce them into properly sized requests on the wire. Fragmentation shouldn't be a problem with the current design, because we flush all requests in page group when a non-contiguous request is added, so the only time resplitting should occur is on a resend of a read or write. This patch lays the groundwork for sub-page splitting, but does not actually do any splitting. For now all page groups have one request as pg_test functions don't yet split pages. There are several related patches that are needed support multiple requests per page group. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
Call nfs_can_coalesce_requests for every request, even the first one. This is needed for future patches to give pg_test a way to inform add_request to reduce the size of the request. Now @prev can be null in nfs_can_coalesce_requests and pg_test functions. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
This is a step toward allowing pg_test to inform the the coalescing code to reduce the size of requests so they may fit in whatever scheme the pg_test callback wants to define. For now, just return the size of the request if there is space, or 0 if there is not. This shouldn't change any behavior as it acts the same as when the pg_test functions returned bool. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Weston Andros Adamson 提交于
@inode is passed but not used. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
At this point the read and write structures look identical, so combine them into something shared by both. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
What we have here is two functions that look identical. Let's share some more code! Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
Once again, these two functions look identical in the read and write case. Time to combine them together! Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
Most of this code is the same for both the read and write paths, so combine everything and use the rw_ops when necessary. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
These functions are almost identical on both the read and write side. FLUSH_COND_STABLE will never be set for the read path, so leaving it in the generic code won't hurt anything. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
At this point, the read and write versions of this function look identical so both should use the same function. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
Write adds a little bit of code dealing with flush flags, but since "how" will always be 0 when reading we can share the code. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
The read and write paths set up this struct in exactly the same way, so create a single shared struct. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-