- 01 6月, 2018 2 次提交
-
-
由 Anna Schumaker 提交于
Rather than doing this in the generic NFS client code. Let's put this with the other v4 stuff so it's all in one place. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
-
由 Anna Schumaker 提交于
This doesn't really need to be in the generic NFS client code, and I think it makes more sense to keep the v4 code in one place. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
-
- 11 4月, 2018 2 次提交
-
-
由 Trond Myklebust 提交于
When we've changed the file size, then ensure we declare it to be up to date in the inode attributes. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
Currently, if the NFS_INO_INVALID_ATTR flag is set, for instance by a call to nfs_post_op_update_inode_locked(), then it will not be cleared until all the attributes have been revalidated. This means, for instance, that NFSv4 writes will always force a full attribute revalidation. Track the ctime, mtime, size and change attribute separately from the other attributes so that we can have nfs_post_op_update_inode_locked() set them correctly, and later have the cache consistency bitmask be able to clear them. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 20 3月, 2018 1 次提交
-
-
由 Peter Zijlstra 提交于
The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 3月, 2018 1 次提交
-
-
由 Trond Myklebust 提交于
We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to ensure that all outstanding COMMIT requests to the inode in question are complete. Currently we may exit early from both nfs_commit_inode() and nfs_write_inode() even if there are COMMIT requests in flight, or unstable writes on the commit list. In order to get the right semantics w.r.t. sync_inode(), we don't need to have nfs_commit_inode() reset the inode dirty flags when called from nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that nfs_write_inode() leaves them in the right state if there are outstanding commits, or stable pages. Reported-by: NScott Mayhew <smayhew@redhat.com> Fixes: dc4fd9ab ("nfs: don't wait on commit in nfs_commit_inode()...") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 29 1月, 2018 1 次提交
-
-
由 Jeff Layton 提交于
For NFS, we just use the "raw" API since the i_version is mostly managed by the server. The exception there is when the client holds a write delegation, but we only need to bump it once there anyway to handle CB_GETATTR. Tested-by: NKrzysztof Kozlowski <krzk@kernel.org> Signed-off-by: NJeff Layton <jlayton@redhat.com>
-
- 15 1月, 2018 1 次提交
-
-
由 Trond Myklebust 提交于
The commit list can get very large, and so we need a cond_resched() in nfs_commit_release_pages() in order to ensure we don't hog the CPU for excessive periods of time. Reported-by: NMike Galbraith <efault@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 16 12月, 2017 1 次提交
-
-
由 Scott Mayhew 提交于
If there were no commit requests, then nfs_commit_inode() should not wait on the commit or mark the inode dirty, otherwise the following BUG_ON can be triggered: [ 1917.130762] kernel BUG at fs/inode.c:578! [ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1] [ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries [ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod [ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1 [ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000 [ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80 [ 1917.130816] REGS: c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64) [ 1917.130820] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI> CR: 22002822 XER: 20000000 [ 1917.130828] CFAR: c00000000011f594 SOFTE: 1 GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750 GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03 GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8 GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0 GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8 [ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350 [ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350 [ 1917.130873] Call Trace: [ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable) [ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270 [ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0 [ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs] [ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0 [ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180 [ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150 [ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100 [ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60 [ 1917.130919] Instruction dump: [ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0 [ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134 Signed-off-by: NScott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org # 4.5+ Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 18 11月, 2017 1 次提交
-
-
由 Markus Elfring 提交于
Add a jump target so that a bit of exception handling can be better reused at the end of this function. This issue was detected by using the Coccinelle software. Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 12 9月, 2017 2 次提交
-
-
由 NeilBrown 提交于
1/ remove 'start' and 'end' args from nfs_file_fsync_commit(). They aren't used. 2/ Make nfs_context_set_write_error() a "static inline" in internal.h so we can... 3/ Use nfs_context_set_write_error() instead of mapping_set_error() if nfs_pageio_add_request() fails before sending any request. NFS generally keeps errors in the open_context, not the mapping, so this is more consistent. 4/ If filemap_write_and_write_range() reports any error, still check ctx->error. The value in ctx->error is likely to be more useful. As part of this, NFS_CONTEXT_ERROR_WRITE is cleared slightly earlier, before nfs_file_fsync_commit() is called, rather than at the start of that function. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Chuck Lever 提交于
Tools like tcpdump and rpcdebug can be very useful. But there are plenty of environments where they are difficult or impossible to use. For example, we've had customers report I/O failures during workloads so heavy that collecting network traffic or enabling RPC debugging are themselves onerous. The kernel's static tracepoints are lightweight (less likely to introduce timing changes) and efficient (the trace data is compact). They also work in scenarios where capturing network traffic is not possible due to lack of hardware support (some InfiniBand HCAs) or where data or network privacy is a concern. Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT is initiated, and when it completes. Record the arguments and results of each operation, which are not shown by existing sunrpc module's tracepoints. For instance, the recorded offset and count can be used to match an "initiate" event to a "done" event. If an NFS READ result returns fewer bytes than requested or zero, seeing the EOF flag can be probative. Seeing an NFS4ERR_BAD_STATEID result is also indication of a particular class of problems. The timing information attached to each event record can often be useful as well. Usage example: [root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done /sys/kernel/debug/tracing/events/nfs/*initiate*/filter /sys/kernel/debug/tracing/events/nfs/*done/filter Hit Ctrl^C to stop recording ^CKernel buffer statistics: Note: "entries" are the entries left in the kernel ring buffer and are not recorded in the trace data. They should all be zero. CPU: 0 entries: 0 overrun: 0 commit overrun: 0 bytes: 3680 oldest event ts: 78.367422 now ts: 100.124419 dropped events: 0 read events: 74 ... and so on. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 10 9月, 2017 4 次提交
-
-
由 Trond Myklebust 提交于
If we skip a subrequest due to a zero refcount, we should still count the byte range that it covered so that we accurately reconstruct the original request size. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
That can deadlock if this is the last reference since nfs_page_group_destroy() calls nfs_page_group_sync_on_bit(). Note that even if the page was removed from the subpage list, the req->wb_head could still be pointing to the old head. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
It's pretty much a duplicate of nfs_scan_commit_list() that also clears the PG_COMMIT_TO_DS flag. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Since the commit list is not ordered, it is possible for nfs_scan_commit_list to hold a request that nfs_lock_and_join_requests() is waiting for, while at the same time trying to grab a request that nfs_lock_and_join_requests already holds. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 07 9月, 2017 1 次提交
-
-
由 NeilBrown 提交于
Commit fbe77c30 ("NFS: move rw_mode to nfs_pageio_header") reintroduced some pointless code that commit 518662e0 ("NFS: fix usage of mempools.") had recently removed. Remove it again. Cc: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 20 8月, 2017 1 次提交
-
-
由 Trond Myklebust 提交于
Now that the mirror allocation has been moved, the parameter can go. Also remove the redundant symbol export. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 15 8月, 2017 22 次提交
-
-
由 Trond Myklebust 提交于
If a request is on the commit list, but is locked, we will currently skip it, which can lead to livelocking when the commit count doesn't reduce to zero. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Switch from using the inode->i_lock for this to avoid contention with other metadata manipulation. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Rather than forcing us to take the inode->i_lock just in order to bump the number. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
The commit lists can get very large, so using the inode->i_lock can end up affecting general metadata performance. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Split out the 2 cases so that we can treat the locking differently. The issue is that the locking in the pageswapcache cache is highly linked to the commit list locking. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Hide the locking from nfs_lock_and_join_requests() so that we can separate out the requirements for swapcache pages. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Fix up the test in nfs_page_group_covers_page(). The simplest implementation is to check that we have a set of intersecting or contiguous subrequests that connect page offset 0 to nfs_page_length(req->wb_page). Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
nfs_page_group_lock() is now always called with the 'nonblock' parameter set to 'false'. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
At this point, we only expect ever to potentially see PG_REMOVE and PG_TEARDOWN being set on the subrequests. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Since nfs_page_group_destroy() does not take any locks on the requests to be freed, we need to ensure that we don't inadvertently free the request in nfs_destroy_unlinked_subrequests() while the last reference is being released elsewhere. Do this by: 1) Taking a reference to the request unless it is already being freed 2) Checking (under the page group lock) if PG_TEARDOWN is already set before freeing an unreferenced request in nfs_destroy_unlinked_subrequests() Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
When locking the entire group in order to remove subrequests, the locks are always taken in order, and with the page group lock being taken after the page head is locked. The intention is that: 1) The lock on the group head guarantees that requests may not be removed from the group (although new entries could be appended if we're not holding the group lock). 2) It is safe to drop and retake the page group lock while iterating through the list, in particular when waiting for a subrequest lock. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
We should no longer need the inode->i_lock, now that we've straightened out the request locking. The locking schema is now: 1) Lock page head request 2) Lock the page group 3) Lock the subrequests one by one Note that there is a subtle race with nfs_inode_remove_request() due to the fact that the latter does not lock the page head, when removing it from the struct page. Only the last subrequest is locked, hence we need to re-check that the PagePrivate(page) is still set after we've locked all the subrequests. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
nfs_try_to_update_request() should be able to cope now. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Simplify the code, and avoid some flushes to disk. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Both nfs_destroy_unlinked_subrequests() and nfs_lock_and_join_requests() manipulate the inode flags adjusting the NFS_I(inode)->nrequests. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
We don't want nfs_lock_and_join_requests() to start fiddling with the request before the call to nfs_page_group_sync_on_bit(). Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Request offsets and sizes are not guaranteed to be stable unless you are holding the request locked. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
All other callers of nfs_page_group_lock() appear to already hold the page lock on the head page, so doing it in the opposite order here is inefficient, although not deadlock prone since we roll back all locks on contention. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Yes, this is a situation that should never happen (hence the WARN_ON) but we should still ensure that we free up the locks and references to the faulty pages. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Micro-optimisation to move the lockless check into the for(;;) loop. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Add a lockless check for whether or not the page might be carrying an existing writeback before we grab the inode->i_lock. Reported-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-