- 04 4月, 2018 1 次提交
-
-
由 David Howells 提交于
Attach copies of the index key and auxiliary data to the fscache cookie so that: (1) The callbacks to the netfs for this stuff can be eliminated. This can simplify things in the cache as the information is still available, even after the cache has relinquished the cookie. (2) Simplifies the locking requirements of accessing the information as we don't have to worry about the netfs object going away on us. (3) The cache can do lazy updating of the coherency information on disk. As long as the cache is flushed before reboot/poweroff, there's no need to update the coherency info on disk every time it changes. (4) Cookies can be hashed or put in a tree as the index key is easily available. This allows: (a) Checks for duplicate cookies can be made at the top fscache layer rather than down in the bowels of the cache backend. (b) Caching can be added to a netfs object that has a cookie if the cache is brought online after the netfs object is allocated. A certain amount of space is made in the cookie for inline copies of the data, but if it won't fit there, extra memory will be allocated for it. The downside of this is that live cache operation requires more memory. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NAnna Schumaker <anna.schumaker@netapp.com> Tested-by: NSteve Dickson <steved@redhat.com>
-
- 28 3月, 2018 1 次提交
-
-
由 Kirill Tkhai 提交于
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 3月, 2018 2 次提交
-
-
由 Kirill Tkhai 提交于
These pernet_operations create and destroy per-net pipe and dentry, and they seem safe to be marked as async. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NAnna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kirill Tkhai 提交于
These pernet_operations look similar to rpcsec_gss_net_ops, they just create and destroy another cache. Also they create and destroy directory. So, they also look safe to be async. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NAnna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 3月, 2018 1 次提交
-
-
由 Peter Zijlstra 提交于
The old wait_on_atomic_t() is going to get removed, use the more flexible wait_var_event() API instead. No change in functionality. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 3月, 2018 1 次提交
-
-
由 Eric W. Biederman 提交于
On nfsv2 and nfsv3 the nfs server can export subsets of the same filesystem and report the same filesystem identifier, so that the nfs client can know they are the same filesystem. The subsets can be from disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no way to find the common root of all directory trees exported form the server with the same filesystem identifier. The practical result is that in struct super s_root for nfs s_root is not necessarily the root of the filesystem. The nfs mount code sets s_root to the root of the first subset of the nfs filesystem that the kernel mounts. This effects the dcache invalidation code in generic_shutdown_super currently called shrunk_dcache_for_umount and that code for years has gone through an additional list of dentries that might be dentry trees that need to be freed to accomodate nfs. When I wrote path_connected I did not realize nfs was so special, and it's hueristic for avoiding calling is_subdir can fail. The practical case where this fails is when there is a move of a directory from the subtree exposed by one nfs mount to the subtree exposed by another nfs mount. This move can happen either locally or remotely. With the remote case requiring that the move directory be cached before the move and that after the move someone walks the path to where the move directory now exists and in so doing causes the already cached directory to be moved in the dcache through the magic of d_splice_alias. If someone whose working directory is in the move directory or a subdirectory and now starts calling .. from the initial mount of nfs (where s_root == mnt_root), then path_connected as a heuristic will not bother with the is_subdir check. As s_root really is not the root of the nfs filesystem this heuristic is wrong, and the path may actually not be connected and path_connected can fail. The is_subdir function might be cheap enough that we can call it unconditionally. Verifying that will take some benchmarking and the result may not be the same on all kernels this fix needs to be backported to. So I am avoiding that for now. Filesystems with snapshots such as nilfs and btrfs do something similar. But as the directory tree of the snapshots are disjoint from one another and from the main directory tree rename won't move things between them and this problem will not occur. Cc: stable@vger.kernel.org Reported-by: NAl Viro <viro@ZenIV.linux.org.uk> Fixes: 397d425d ("vfs: Test for and handle paths that are unreachable from their mnt_root") Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 09 3月, 2018 3 次提交
-
-
由 Trond Myklebust 提交于
We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to ensure that all outstanding COMMIT requests to the inode in question are complete. Currently we may exit early from both nfs_commit_inode() and nfs_write_inode() even if there are COMMIT requests in flight, or unstable writes on the commit list. In order to get the right semantics w.r.t. sync_inode(), we don't need to have nfs_commit_inode() reset the inode dirty flags when called from nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that nfs_write_inode() leaves them in the right state if there are outstanding commits, or stable pages. Reported-by: NScott Mayhew <smayhew@redhat.com> Fixes: dc4fd9ab ("nfs: don't wait on commit in nfs_commit_inode()...") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Ensure that we hold a reference to the layout header when processing the pNFS return-on-close so that the refcount value does not inadvertently go to zero. Reported-by: NTigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.10+ Tested-by: NTigran Mkrtchyan <tigran.mkrtchyan@desy.de>
-
由 Trond Myklebust 提交于
The start offset needs to be of type loff_t. Fixed: 5fadeb47 ("nfs: count DIO good bytes correctly with mirroring") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 28 2月, 2018 1 次提交
-
-
由 Kirill Tkhai 提交于
These pernet_operations just create and destroy /proc entries and net_generic()->cb_ident_idr IDR. So, we are able to mark them async. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 2月, 2018 2 次提交
-
-
由 Colin Ian King 提交于
The structure nlmclnt_fl_close_lock_ops s local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: fs/nfs/nfs3proc.c:876:33: warning: symbol 'nlmclnt_fl_close_lock_ops' was not declared. Should it be static? Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Bill.Baker@oracle.com 提交于
nfs4_update_server unconditionally releases the nfs_client for the source server. If migration fails, this can cause the source server's nfs_client struct to be left with a low reference count, resulting in use-after-free. Also, adjust reference count handling for ELOOP. NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6 WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]() nfs_put_client+0xfa/0x110 [nfs] nfs4_run_state_manager+0x30/0x40 [nfsv4] kthread+0xd8/0xf0 BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8 nfs4_xdr_enc_write+0x6b/0x160 [nfsv4] rpcauth_wrap_req+0xac/0xf0 [sunrpc] call_transmit+0x18c/0x2c0 [sunrpc] __rpc_execute+0xa6/0x490 [sunrpc] rpc_async_schedule+0x15/0x20 [sunrpc] process_one_work+0x160/0x470 worker_thread+0x112/0x540 ? rescuer_thread+0x3f0/0x3f0 kthread+0xd8/0xf0 This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"), but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops") Reported-by: NHelen Chao <helen.chao@oracle.com> Fixes: 52442f9b ("NFS4: Avoid migration loops") Signed-off-by: NBill Baker <bill.baker@oracle.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 22 2月, 2018 1 次提交
-
-
由 Trond Myklebust 提交于
Passing a pointer to a unsigned integer to test_bit() is broken. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 01 2月, 2018 1 次提交
-
-
由 Goffredo Baroncelli 提交于
The function inode_cmp_iversion{+raw} is counter-intuitive, because it returns true when the counters are different and false when these are equal. Rename it to inode_eq_iversion{+raw}, which will returns true when the counters are equal and false otherwise. Signed-off-by: NGoffredo Baroncelli <kreijack@inwind.it> Signed-off-by: NJeff Layton <jlayton@redhat.com>
-
- 29 1月, 2018 2 次提交
-
-
由 Jeff Layton 提交于
For NFS, we just use the "raw" API since the i_version is mostly managed by the server. The exception there is when the client holds a write delegation, but we only need to bump it once there anyway to handle CB_GETATTR. Tested-by: NKrzysztof Kozlowski <krzk@kernel.org> Signed-off-by: NJeff Layton <jlayton@redhat.com>
-
由 Trond Myklebust 提交于
When locking the file in order to do O_DIRECT on it, we must unmap any mmapped ranges on the pagecache so that we can flush out the dirty data. Fixes: a5864c99 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.8+
-
- 28 1月, 2018 1 次提交
-
-
由 Trond Myklebust 提交于
We don't need to call unmap_mapping_range() prior to calling nfs_sync_mapping(). Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 26 1月, 2018 2 次提交
-
-
由 Benjamin Coddington 提交于
It's possible that the device map is smaller than the offset into the device for the I/O we're adding. Add a check for it and bail out, otherwise we risk botching the bio calculations that follow. Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NTrond Myklebust <trondmy@gmail.com>
-
由 Benjamin Coddington 提交于
Fixup the field types to match their use. Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NTrond Myklebust <trondmy@gmail.com>
-
- 22 1月, 2018 1 次提交
-
-
由 Eric Biggers 提交于
nfs_idmap_legacy_upcall() is supposed to be called with 'aux' pointing to a 'struct idmap', via the call to request_key_with_auxdata() in nfs_idmap_request_key(). However it can also be reached via the request_key() system call in which case 'aux' will be NULL, causing a NULL pointer dereference in nfs_idmap_prepare_pipe_upcall(), assuming that the key description is valid enough to get that far. Fix this by making nfs_idmap_legacy_upcall() negate the key if no auxdata is provided. As usual, this bug was found by syzkaller. A simple reproducer using the command-line keyctl program is: keyctl request2 id_legacy uid:0 '' @s Fixes: 57e62324 ("NFS: Store the legacy idmapper result in the keyring") Reported-by: syzbot+5dfdbcf7b3eb5912abbb@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> # v3.4+ Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NTrond Myklebust <trondmy@gmail.com>
-
- 19 1月, 2018 3 次提交
-
-
由 Jan Chochol 提交于
Since commit 57e62324 ("NFS: Store the legacy idmapper result in the keyring") nfs_idmap_cache_timeout changed units from jiffies to seconds. Unfortunately sysctl interface was not updated accordingly. As a effect updating /proc/sys/fs/nfs/idmap_cache_timeout with some value will incorrectly multiply this value by HZ. Also reading /proc/sys/fs/nfs/idmap_cache_timeout will show real value divided by HZ. Fixes: 57e62324 ("NFS: Store the legacy idmapper result in the keyring") Signed-off-by: NJan Chochol <jan@chochol.info> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Chuck Lever 提交于
Commit 8224b273 ("NFS: Add static NFS I/O tracepoints") had a hack to work around some odd behavior observed with __print_symbolic. I couldn't ever get it to display NFS_FILE_SYNC when using TRACE_DEFINE_ENUM macros to set up the enum values. I tracked down the actual bug that forced me to add the workaround. That issue will be addressed soon, so replace the hack with a proper implementation. Fixes: 8224b273 ("NFS: Add static NFS I/O tracepoints") Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Tigran Mkrtchyan 提交于
A pNFS server may return LAYOUTUNAVAILABLE error on LAYOUTGET for files which don't have any layout. In this situation pnfs_update_layout currently returns NULL. As this NULL is converted into ENOMEM, IO requests fails instead of falling back to MDS. Do not return ENOMEM on LAYOUTUNAVAILABLE and let client retry through MDS. Fixes 8d40b0f1. I will suggest to backport this fix to affected stable branches. Signed-off-by: NTigran Mkrtchyan <tigran.mkrtchyan@desy.de> [trondmy: Use IS_ERR_OR_NULL()] Fixes: 8d40b0f1 ("NFS filelayout:call GETDEVICEINFO after...") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 16 1月, 2018 2 次提交
-
-
由 J. Bruce Fields 提交于
If some of the WRITE calls making up an O_DIRECT write syscall fail, we neglect to commit, even if some of the WRITEs succeed. We also depend on the commit code to free the reference count on the nfs_page taken in the "if (request_commit)" case at the end of nfs_direct_write_completion(). The problem was originally noticed because ENOSPC's encountered partway through a write would result in a closed file being sillyrenamed when it should have been unlinked. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Arnd Bergmann 提交于
The only reference to the label got removed, so we now get a harmless compiler warning: fs/nfs/export.c: In function 'nfs_encode_fh': fs/nfs/export.c:58:1: error: label 'out' defined but not used [-Werror=unused-label] Fixes: aaa15008 ("nfs: remove dead code from nfs_encode_fh()") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 15 1月, 2018 15 次提交
-
-
由 Chuck Lever 提交于
After traversing a referral or recovering from a migration event, ensure that the server port reported in /proc/mounts is updated to the correct port setting for the new submount. Reported-by: NHelen Chao <helen.chao@oracle.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Chuck Lever 提交于
Helen Chao <helen.chao@oracle.com> noticed that when a user traverses a referral on an NFS/RDMA mount, the resulting submount always uses TCP. This behavior does not match the vers= setting when traversing a referral (vers=4.1 is preserved). It also does not match the behavior of crossing from the pseudofs into a real filesystem (proto=rdma is preserved in that case). The Linux NFS client does not currently support the fs_locations_info attribute. The situation is similar for all NFSv4 servers I know of. Therefore until the community has broad support for fs_locations_info, when following a referral: - First try to connect with RPC-over-RDMA. This will fail quickly if the client has no RDMA-capable interfaces. - If connecting with RPC-over-RDMA fails, or the RPC-over-RDMA transport is not available, use TCP. Reported-by: NHelen Chao <helen.chao@oracle.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Scott Mayhew 提交于
Currently when falling back to doing I/O through the MDS (via pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header without releasing the reference taken on the dreq via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init -> nfs_direct_pgio_init. It then takes another reference on the dreq via nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and as a result the requester will become stuck in inode_dio_wait. Once that happens, other processes accessing the inode will become stuck as well. Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean up correctly by calling hdr->completion_ops->completion() instead of calling hdr->release() directly. This can be reproduced (sometimes) by performing "storage failover takeover" commands on NetApp filer while doing direct I/O from a client. This can also be reproduced using SystemTap to simulate a failure while doing direct I/O from a client (from Dave Wysochanski <dwysocha@redhat.com>): stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }' Suggested-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NScott Mayhew <smayhew@redhat.com> Fixes: 1ca018d2 ("pNFS: Fix a memory leak when attempted pnfs fails") Cc: stable@vger.kernel.org Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Benjamin Coddington 提交于
PNFS block/SCSI layouts should gracefully handle cases where block devices are not available when a layout is retrieved, or the block devices are removed while the client holds a layout. While setting up a layout segment, keep a record of an unavailable or un-parsable block device in cache with a flag so that subsequent layouts do not spam the server with GETDEVINFO. We can reuse the current NFS_DEVICEID_UNAVAILABLE handling with one variation: instead of reusing the device, we will discard it and send a fresh GETDEVINFO after the timeout, since the lookup and validation of the device occurs within the GETDEVINFO response handling. A lookup of a layout segment that references an unavailable device will return a segment with the NFS_LSEG_UNAVAILABLE flag set. This will allow the pgio layer to mark the layout with the appropriate fail bit, which forces subsequent IO to the MDS, and prevents spamming the server with LAYOUTGET, LAYOUTRETURN. Finally, when IO to a block device fails, look up the block device(s) referenced by the pgio header, and mark them as unavailable. Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Benjamin Coddington 提交于
If there's an error doing I/O to block device, and the client resends the I/O to the MDS, the MDS must recall the layout from the client before processing the I/O. Let's preempt that exchange by returning the layout before falling back to the MDS when there's an error. Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Benjamin Coddington 提交于
The blocklayout module contains the client support for both block and SCSI layouts. Add a module alias for the SCSI layout type so that the module will be loaded for SCSI layouts. Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Benjamin Coddington 提交于
nfs_pgio_rpcsetup() is always called with an offset of 0, so we should be able to drop the arguement altogether. Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 NeilBrown 提交于
There are 2 comments in the NFSv4 code which suggest that SIGLOST should possibly be sent to a process. In these cases a lock has been lost. The current practice is to set NFS_LOCK_LOST so that read/write returns EIO when a lock is lost. So change these comments to code when sets NFS_LOCK_LOST. One case is when lock recovery after apparent server restart fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or NFS4ERRO_RECLAIM_CONFLICT. The other case is when a lock attempt as part of lease recovery fails with NFS4ERR_DENIED. In an ideal world, these should not happen. However I have a packet trace showing an NFSv4.1 session getting NFS4ERR_BADSESSION after an extended network parition. The NFSv4.1 client treats this like server reboot until/unless it get NFS4ERR_NO_GRACE, in which case it switches over to "nograce" recovery mode. In this network trace, the client attempts to recover a lock and the server (incorrectly) reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE. This leads to the ineffective comment and the client then continues to write using the OPEN stateid. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 NeilBrown 提交于
This code can never be used as the IS_AUTOMOUNT(inode) case has already been handled. So remove it to avoid confusion. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Support the query flags AT_STATX_FORCE_SYNC by forcing an attribute revalidation, and AT_STATX_DONT_SYNC by returning cached attributes only. Use the mask to optimise away server revalidation for attributes that are not being requested by the user. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
The LOOKUPP operation was inserted into the nfs4_procedures array rather than being appended, which put /proc/net/rpc/nfs out of whack, and broke the nfsstat utility. Fix by moving the LOOKUPP operation to the end of the array, and by ensuring that it keeps the same length whether or not NFSV4.1 and NFSv4.2 are compiled in. Fixes: 5b5faaf6 ("nfs4: add NFSv4 LOOKUPP handlers") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Convert CLOSE so that it specifies the correct stateid and inode for the error handling. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Convert CLOSE so that it specifies the correct stateid, state and inode for the error handling. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
The commit list can get very large, and so we need a cond_resched() in nfs_commit_release_pages() in order to ensure we don't hog the CPU for excessive periods of time. Reported-by: NMike Galbraith <efault@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-