- 15 8月, 2017 3 次提交
-
-
由 Trond Myklebust 提交于
Micro-optimisation to move the lockless check into the for(;;) loop. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Add a lockless check for whether or not the page might be carrying an existing writeback before we grab the inode->i_lock. Reported-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
We don't expect the page header lock to ever be held across I/O, so it should always be safe to wait for it, even if we're doing nonblocking writebacks. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 12 8月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
The blocklayout code does not compile cleanly for a 32-bit sector_t, and also has no reliable checks for devices sizes, which makes it unsafe to use with a kernel that doesn't support large block devices. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reported-by: NArnd Bergmann <arnd@arndb.de> Fixes: 5c83746a ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing") Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 10 8月, 2017 1 次提交
-
-
由 Trond Myklebust 提交于
If the call to TEST_STATEID returns NFS4ERR_OLD_STATEID, then it just means we raced with other calls to OPEN. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 09 8月, 2017 1 次提交
-
-
由 Weston Andros Adamson 提交于
The client was freeing the nfs4_ff_layout_ds, but not the contained nfs4_ff_ds_version array. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 02 8月, 2017 2 次提交
-
-
由 Trond Myklebust 提交于
rpc_clnt_add_xprt() expects the callback function to be synchronous, and expects to release the transport and switch references itself. Fixes: 04fa2c6b ("NFS pnfs data server multipath session trunking") Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was changed to be asynchronous by commit 8d89bd70. If we interrrupt the call to rpc_wait_for_completion_task(), we can therefore end up transmitting random stack contents in lieu of the verifier. Fixes: 8d89bd70 ("NFS setup async exchange_id") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 29 7月, 2017 1 次提交
-
-
由 Benjamin Coddington 提交于
nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the same region protected by the wait_queue's lock after checking for a notification from CB_NOTIFY_LOCK callback. However, after releasing that lock, a wakeup for that task may race in before the call to freezable_schedule_timeout_interruptible() and set TASK_WAKING, then freezable_schedule_timeout_interruptible() will set the state back to TASK_INTERRUPTIBLE before the task will sleep. The result is that the task will sleep for the entire duration of the timeout. Since we've already set TASK_INTERRUPTIBLE in the locked section, just use freezable_schedule_timout() instead. Fixes: a1d617d8 ("nfs: allow blocking locks to be awoken by lock callbacks") Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 27 7月, 2017 3 次提交
-
-
由 NeilBrown 提交于
posix_fallocate() will allocate space in an NFS file by considering the last byte of every 4K block. If it is before EOF, it will read the byte and if it is zero, a zero is written out. If it is after EOF, the zero is unconditionally written. For the blocks beyond EOF, if NFS believes its cache is valid, it will expand these writes to write full pages, and then will merge the pages. This results if (typically) 1MB writes. If NFS believes its cache is not valid (particularly if NFS_INO_INVALID_DATA or NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will send the individual 1-byte writes. This results in (typically) 256 times as many RPC requests, and can be substantially slower. Currently nfs_revalidate_mapping() is only used when reading a file or mmapping a file, as these are times when the content needs to be up-to-date. Writes don't generally need the cache to be up-to-date, but writes beyond EOF can benefit, particularly in the posix_fallocate() case. So this patch calls nfs_revalidate_mapping() when writing beyond EOF - i.e. when there is a gap between the end of the file and the start of the write. If the cache is thought to be out of date (as happens after taking a file lock), this will cause a GETATTR, and the two flags mentioned above will be cleared. With this, posix_fallocate() on a newly locked file does not generate excessive tiny writes. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 NeilBrown 提交于
Prior to commit ca0daa27 ("NFS: Cache aggressively when file is open for writing"), NFS would revalidate, or invalidate, the file size when taking a lock. Since that commit it only invalidates the file content. If the file size is changed on the server while wait for the lock, the client will have an incorrect understanding of the file size and could corrupt data. This particularly happens when writing beyond the (supposed) end of file and can be easily be demonstrated with posix_fallocate(). If an application opens an empty file, waits for a write lock, and then calls posix_fallocate(), glibc will determine that the underlying filesystem doesn't support fallocate (assuming version 4.1 or earlier) and will write out a '0' byte at the end of each 4K page in the region being fallocated that is after the end of the file. NFS will (usually) detect that these writes are beyond EOF and will expand them to cover the whole page, and then will merge the pages. Consequently, NFS will write out large blocks of zeroes beyond where it thought EOF was. If EOF had moved, the pre-existing part of the file will be over-written. Locking should have protected against this, but it doesn't. This patch restores the use of nfs_zap_caches() which invalidated the cached attributes. When posix_fallocate() asks for the file size, the request will go to the server and get a correct answer. cc: stable@vger.kernel.org (v4.8+) Fixes: ca0daa27 ("NFS: Cache aggressively when file is open for writing") Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Anna Schumaker 提交于
Commit bd8b2441 ("NFS: Store the raw NFS access mask in the inode's access cache") changed how the access results are stored after an access() call. An NFS v4 OPEN might have access bits returned with the opendata, so we should use the NFS4_ACCESS values when determining the return value in nfs4_opendata_access(). Fixes: bd8b2441 ("NFS: Store the raw NFS access mask in the inode's access cache") Reported-by: NEryu Guan <eguan@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Tested-by: NTakashi Iwai <tiwai@suse.de>
-
- 22 7月, 2017 1 次提交
-
-
由 Trond Myklebust 提交于
We must set fl->dsaddr once, and once only, even if there are multiple processes calling filelayout_check_deviceid() for the same layout segment. Reported-by: NOlga Kornievskaia <kolga@netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 21 7月, 2017 5 次提交
-
-
由 Trond Myklebust 提交于
When mapping a directory, we want the MAY_WRITE permissions to reflect whether or not we have permission to modify, add and delete the directory entries. MAY_EXEC must map to lookup permissions. On the other hand, for files, we want MAY_WRITE to reflect a permission to modify and extend the file. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Eryu Guan 提交于
Array size of mnt3_counts should be the size of array mnt3_procedures, not mnt_procedures, though they're same in size right now. Found this by code inspection. Fixes: 1c5876dd ("sunrpc: move p_count out of struct rpc_procinfo") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NEryu Guan <eguan@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 20 7月, 2017 5 次提交
-
-
由 Trond Myklebust 提交于
Doing the test without taking any locks is racy, and so really it makes more sense to do it in the flexfiles code (which is the only case that cares). Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
If the layout has expired due to a fencing event, then we should not attempt to commit to the DS. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
We must make sure that cinfo->ds->ncommitting is in sync with the commit list, since it is checked as part of pnfs_commit_list(). Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
We must make sure that cinfo->ds->nwritten is in sync with the commit list, since it is checked as part of pnfs_scan_commit_lists(). Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Steve Dickson 提交于
Doing this copy eliminates the "port=0" entry in the /proc/mounts entries Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=69241Signed-off-by: NSteve Dickson <steved@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 14 7月, 2017 17 次提交
-
-
由 Trond Myklebust 提交于
"perf lock" shows fairly heavy contention for the bit waitqueue locks when doing an I/O heavy workload. Use a bit to tell whether or not there has been contention for a lock so that we can optimise away the bit waitqueue options in those cases. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Peng Tao 提交于
This support for opening files on NFS by file handle, both through the open_by_handle syscall, and for re-exporting NFS (for example using a different version). The support is very basic for now, as each open by handle will have to do an NFSv4 open operation on the wire. In the future this will hopefully be mitigated by an open file cache, as well as various optimizations in NFS for this specific case. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> [hch: incorporated various changes, resplit the patches, new changelog] Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
"perf lock" shows fairly heavy contention for the bit waitqueue locks when doing an I/O heavy workload. Use a bit to tell whether or not there has been contention for a lock so that we can optimise away the bit waitqueue options in those cases. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Jeff Layton 提交于
This will be needed in order to implement the get_parent export op for nfsd. Signed-off-by: NJeff Layton <jeff.layton@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Peng Tao 提交于
This support for opening files on NFS by file handle, both through the open_by_handle syscall, and for re-exporting NFS (for example using a different version). The support is very basic for now, as each open by handle will have to do an NFSv4 open operation on the wire. In the future this will hopefully be mitigated by an open file cache, as well as various optimizations in NFS for this specific case. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> [hch: incorporated various changes, resplit the patches, new changelog] Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Peng Tao 提交于
This helper will allow to find an existing NFS inode by the file handle and fattr. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> [hch: split from a larger patch] Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Peng Tao 提交于
It's a trival change but follows knfsd export document that asks for d_splice_alias during lookup. Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Olga Kornievskaia 提交于
Return size of COPY is u64 but it was assigned to an "int" status. Signed-off-by: NOlga Kornievskaia <kolga@netapp.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Transparent State Migration copies a client's lease state from the server where a filesystem used to reside to the server where it now resides. When an NFSv4.1 client first contacts that destination server, it uses EXCHANGE_ID to detect trunking relationships. The lease that was copied there is returned to that client, but the destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to the client. This is because the lease was confirmed on the source server (before it was copied). When CONFIRMED_R is set, the client throws away the sequence ID returned by the server. During a Transparent State Migration, however there's no other way for the client to know what sequence ID to use with a lease that's been migrated. Therefore, the client must save and use the contrived slot sequence value returned by the destination server even when CONFIRMED_R is set. Note that some servers always return a seqid of 1 after a migration. Reported-by: NXuan Qi <xuan.qi@oracle.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NXuan Qi <xuan.qi@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Transparent State Migration copies a client's lease state from the server where a filesystem used to reside to the server where it now resides. When an NFSv4.1 client first contacts that destination server, it uses EXCHANGE_ID to detect trunking relationships. The lease that was copied there is returned to that client, but the destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to the client. This is because the lease was confirmed on the source server (before it was copied). Normally, when CONFIRMED_R is set, a client purges the lease and creates a new one. However, that throws away the entire benefit of Transparent State Migration. Therefore, the client must not purge that lease when it is possible that Transparent State Migration has occurred. Reported-by: NXuan Qi <xuan.qi@oracle.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NXuan Qi <xuan.qi@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 NeilBrown 提交于
If an NFS server returns a filehandle that we have previously seen, and reports a different type, then nfs_refresh_inode() will log a warning and return an error. nfs_fhget() does not check for this error and may return an inode with a different type than the one that the server reported. This is likely to cause confusion, and is one way that ->open_context() could return a directory inode as discussed in the previous patch. So if nfs_refresh_inode() returns and error, return that error from nfs_fhget() to avoid the confusion propagating. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 NeilBrown 提交于
A confused server could return a filehandle for an NFSv4 OPEN request, which it previously returned for a directory. So the inode returned by ->open_context() in nfs_atomic_open() could conceivably be a directory inode. This has particular implications for the call to nfs_file_set_open_context() in nfs_finish_open(). If that is called on a directory inode, then the nfs_open_context that gets stored in the filp->private_data will be linked to nfs_inode->open_files. When the directory is closed, nfs_closedir() will (ultimately) free the ->private_data, but not unlink it from nfs_inode->open_files (because it doesn't expect an nfs_open_context there). Subsequently the memory could get used for something else and eventually if the ->open_files list is walked, the walker will fall off the end and crash. So: change nfs_finish_open() to only call nfs_file_set_open_context() for regular-file inodes. This failure mode has been seen in a production setting (unknown NFS server implementation). The kernel was v3.0 and the specific sequence seen would not affect more recent kernels, but I think a risk is still present, and caution is wise. Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 NeilBrown 提交于
Since commit bafc9b75 ("vfs: More precise tests in d_invalidate") in v3.18, a return of '0' from ->d_revalidate() will cause the dentry to be invalidated even if it has filesystems mounted on or it or on a descendant. The mounted filesystem is unmounted. This means we need to be careful not to return 0 unless the directory referred to truly is invalid. So -ESTALE or -ENOENT should invalidate the directory. Other errors such a -EPERM or -ERESTARTSYS should be returned from ->d_revalidate() so they are propagated to the caller. A particular problem can be demonstrated by: 1/ mount an NFS filesystem using NFSv3 on /mnt 2/ mount any other filesystem on /mnt/foo 3/ ls /mnt/foo 4/ turn off network, or otherwise make the server unable to respond 5/ ls /mnt/foo & 6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack 7/ kill -9 $! # this results in -ERESTARTSYS being returned 8/ observe that /mnt/foo has been unmounted. This patch changes nfs_lookup_revalidate() to only treat -ESTALE from nfs_lookup_verify_inode() and -ESTALE or -ENOENT from ->lookup() as indicating an invalid inode. Other errors are returned. Also nfs_check_inode_attributes() is changed to return -ESTALE rather than -EIO. This is consistent with the error returned in similar circumstances from nfs_update_inode(). As this bug allows any user to unmount a filesystem mounted on an NFS filesystem, this fix is suitable for stable kernels. Fixes: bafc9b75 ("vfs: More precise tests in d_invalidate") Cc: stable@vger.kernel.org (v3.18+) Signed-off-by: NNeilBrown <neilb@suse.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Olga Kornievskaia 提交于
Upon receiving a stateid error such as BAD_STATEID, the client should retry the operation against the MDS before deciding to do stateid recovery. Previously, the code would initiate state recovery and it could lead to a race in a state manager that could chose an incorrect recovery method which would lead to the EIO failure for the application. Signed-off-by: NOlga Kornievskaia <kolga@netapp.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Olga Kornievskaia 提交于
Commit fabbbee0 "PNFS fix fallback to MDS if got error on commit to DS" moved the pnfs_set_lo_fail() to unhandled errors which was not correct and lead to a kernel oops on umount. Instead, fix the original EACCESS on commit to DS error by getting the new layout and re-doing the IO. Fixes: fabbbee0 ("PNFS fix fallback to MDS if got error on commit to DS") Signed-off-by: NOlga Kornievskaia <kolga@netapp.com> Cc: stable@vger.kernel.org # v4.12 Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Dan Carpenter 提交于
Static checkers have gotten clever enough to complain that "id_long" is uninitialized on the failure path. It's harmless, but simple to fix. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Tuo Chen Peng 提交于
nfs_show_stats() was incorrectly reading statistics for bytes when printing that for fsc. It caused files like /proc/self/mountstats to report incorrect fsc statistics for NFS mounts. Signed-off-by: NTuo Chen Peng <tpeng@nvidia.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-