- 01 Oct 2014, 1 commit
-
-
Committed by Anna Schumaker
The SEEK operation is used when an application makes an lseek call with either the SEEK_HOLE or SEEK_DATA flag set. I fall back on nfs_file_llseek() if the server does not have SEEK support.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
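For context, here is a minimal user-space sketch (not taken from the patch) of how an application reaches this code path: an lseek() call with SEEK_DATA or SEEK_HOLE, which an NFS v4.2 client can now translate into a SEEK request instead of handling it purely locally. The file path is an arbitrary example.

```c
/* User-space illustration only; the path and offsets are arbitrary. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/nfs/sparse-file", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Find the first data segment at or after offset 0 ... */
	off_t data = lseek(fd, 0, SEEK_DATA);
	if (data >= 0) {
		/* ... and the first hole at or after that segment. */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		printf("data at %lld, next hole at %lld\n",
		       (long long)data, (long long)hole);
	}
	close(fd);
	return 0;
}
```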
-
- 30 Sep 2014, 2 commits
-
-
Committed by Anna Schumaker
This patch adds server support for the NFS v4.2 operation SEEK, which returns the position of the next hole or data segment in a file.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Anna Schumaker
It's cleaner to introduce everything at once and have the server reply with "not supported" than it would be to introduce extra operations when implementing a specific one in the middle of the list.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 27 Sep 2014, 6 commits
-
-
Committed by Christoph Hellwig
Add a higher-level abstraction than the rpc_ops for callback operations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Christoph Hellwig
Split out initializing the nfs4_callback structure from using it. For the NULL callback this gets rid of tons of pointless re-initializations. Note that I don't quite understand what protects us from running multiple NULL callbacks at the same time, but at least this change doesn't make it worse.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Christoph Hellwig
Add a helper to queue up a callback. CB_NULL has a bit of special casing because it is special in the specification, but all other new callback operations will be able to share code with this and a few more changes to refactor the callback code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Christoph Hellwig
We can always get at the private data by using container_of, no need for a void pointer. Also introduce a little to_delegation helper to avoid open-coding the container_of everywhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
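A schematic of the idiom being described, with the structures trimmed to placeholders (the field layout shown here is an assumption, not a quote from the patch): the callback is embedded in the delegation, so container_of recovers the owner without any private void pointer.

```c
#include <linux/kernel.h>	/* container_of() */

/* Placeholder definitions for the sketch; real structs have many fields. */
struct nfsd4_callback {
	int cb_placeholder;		/* rpc task, ops, state, ... */
};

struct nfs4_delegation {
	int dl_placeholder;		/* stateid, file, type, ... */
	struct nfsd4_callback dl_recall;
};

/*
 * Recover the enclosing delegation from the embedded callback instead of
 * carrying a separate void *private pointer around.
 */
static inline struct nfs4_delegation *to_delegation(struct nfsd4_callback *cb)
{
	return container_of(cb, struct nfs4_delegation, dl_recall);
}
```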
-
Committed by Benny Halevy
This is incorrect when a callback has to be restarted, in which case the XDR decoding of the second iteration will see a NULL cb argument. [hch: updated description]

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Christoph Hellwig
For any error that is not EBADHANDLE or NFS4ERR_BAD_STATEID, nfsd4_cb_recall_done first marks the connection down, then retries until dl_retries hits zero, then marks the connection down again and sets cb_done. This changes the code to only retry for EBADHANDLE or NFS4ERR_BAD_STATEID, and factors setting cb_done into a single point in the function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
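A simplified control-flow sketch of the resulting logic, for kernel context only; mark_cb_down() and the dl_done field are assumed names, not copied from the patch. The point it illustrates: retry only the two "stale handle/stateid" errors, treat everything else as a dead callback channel, and record completion in exactly one place.

```c
/* Kernel-context sketch, not the literal patch. */
static void cb_recall_done_sketch(struct rpc_task *task,
				  struct nfs4_delegation *dp)
{
	bool done = true;

	switch (task->tk_status) {
	case 0:
		break;
	case -EBADHANDLE:
	case -NFS4ERR_BAD_STATEID:
		/* The client may still be processing the recall; retry. */
		if (dp->dl_retries--) {
			rpc_delay(task, 2 * HZ);
			task->tk_status = 0;
			rpc_restart_call_prepare(task);
			done = false;
			break;
		}
		/* retries exhausted: fall through */
	default:
		mark_cb_down(dp, task->tk_status);	/* assumed helper */
	}

	if (done)
		dp->dl_done = true;	/* single point that records completion */
}
```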
-
- 18 Sep 2014, 11 commits
-
-
Committed by J. Bruce Fields
The grace period is ended in two steps: first userland is notified that the grace period is now long enough that any clients who have not yet reclaimed can be safely forgotten, then we flip the switch that forbids reclaims and allows new opens. I had to think a bit to convince myself that the ordering was right here. Document it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by J. Bruce Fields
The attempt to automatically set a new grace period time at the end of the grace period isn't really helpful. We'll probably shut down and reboot before we actually make use of the new grace period time anyway. So we may as well leave it up to the init system to get this right. This just confuses people when they see /proc/fs/nfsd/nfsv4gracetime change from what they set it to.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Jeff Layton
In the case of v4.0 clients, we may call into the "create" client tracking operation multiple times (once for each openowner). Upcalling for each one of those is wasteful and slow, however. We can skip doing further "create" operations after the first one if we know that one has already been done. v4.1+ clients generally only call into this function once (on RECLAIM_COMPLETE), and we can't skip upcalling on the create even if the STABLE bit is set. Doing so would make it impossible for nfsdcltrack to lift the grace period early, since the timestamp has a different meaning in the case where the client is expected to issue a RECLAIM_COMPLETE.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Jeff Layton
The nfsdcltrack upcall doesn't utilize the NFSD4_CLIENT_STABLE flag, which basically results in an upcall every time we call into the client tracking ops. Change it to set this bit on a successful "check" or "create" request, and clear it on a "remove" request. Also, check to see if that bit is set before upcalling on a "check" or "remove" request, and skip upcalling appropriately, depending on its state.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
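A rough sketch of the flag discipline this describes. NFSD4_CLIENT_STABLE and cl_flags are the names the commit message itself uses; do_cltrack_upcall() is a hypothetical stand-in for the real upcall path.

```c
/* Kernel-context sketch, not the literal patch. */
static void cltrack_check_sketch(struct nfs4_client *clp)
{
	/* Record already known good on stable storage: skip the upcall. */
	if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
		return;

	if (do_cltrack_upcall(clp, "check") == 0)	/* hypothetical helper */
		set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags);
}

static void cltrack_remove_sketch(struct nfs4_client *clp)
{
	/* No stable record to remove: skip the upcall entirely. */
	if (!test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
		return;

	if (do_cltrack_upcall(clp, "remove") == 0)	/* hypothetical helper */
		clear_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags);
}
```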
-
Committed by Jeff Layton
In a later patch, we want to add a flag that will allow us to reduce the need for upcalls. In order to handle that correctly, we'll need to ensure that racing upcalls for the same client can't occur. In practice it should be rare for this to occur with a well-behaved client, but it is possible. Convert one of the bits in the cl_flags field to be an upcall bitlock, and use it to ensure that upcalls for the same client are serialized.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
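A minimal sketch of the serialization idea using the standard bit-wait primitives; the bit name and the upcall helper are placeholders, while cl_flags is the field named in the commit.

```c
/* Kernel-context sketch: one bit in cl_flags acts as a per-client lock. */
static void serialized_upcall_sketch(struct nfs4_client *clp)
{
	/* Sleep until the bit is clear, then set it atomically. */
	wait_on_bit_lock(&clp->cl_flags, NFSD4_CLIENT_UPCALL_LOCK,
			 TASK_UNINTERRUPTIBLE);

	do_cltrack_upcall(clp, "create");		/* hypothetical helper */

	/* Drop the "lock": clear the bit and wake any waiters. */
	clear_bit(NFSD4_CLIENT_UPCALL_LOCK, &clp->cl_flags);
	smp_mb__after_atomic();
	wake_up_bit(&clp->cl_flags, NFSD4_CLIENT_UPCALL_LOCK);
}
```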
-
Committed by Jeff Layton
In order to support lifting the grace period early, we must tell nfsdcltrack what sort of client the "create" upcall is for. We can't reliably tell if a v4.0 client has completed reclaiming, so we can only lift the grace period once all the v4.1+ clients have issued a RECLAIM_COMPLETE and if there are no v4.0 clients. Also, in order to lift the grace period, we have to tell userland when the grace period started so that it can tell whether a RECLAIM_COMPLETE has been issued for each client since then. Since this is all optional info, we pass it along in environment variables to the "init" and "create" upcalls. By doing this, we don't need to revise the upcall format. The UMH upcall can simply make use of this info if it happens to be present. If it's not, then it can just avoid lifting the grace period early.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
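A schematic of passing the optional data through the usermode helper's environment, in the spirit of the change; the environment variable name and the timestamp type are illustrative assumptions. Because the data rides in envp rather than argv, an older nfsdcltrack that doesn't know about it simply ignores it.

```c
/* Kernel-context sketch of an upcall that carries extra data in envp. */
static int cltrack_umh_sketch(char *cmd, char *arg, time_t grace_start)
{
	char timestr[40];
	char *argv[] = { "/sbin/nfsdcltrack", cmd, arg, NULL };
	char *envp[] = {
		"PATH=/sbin:/usr/sbin:/bin:/usr/bin",
		timestr,	/* e.g. "GRACE_START=<seconds since epoch>" */
		NULL
	};

	snprintf(timestr, sizeof(timestr), "GRACE_START=%ld",
		 (long)grace_start);

	/* Binaries that don't understand the variable can just ignore it. */
	return call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}
```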
-
Committed by Jeff Layton
Allow a privileged userland process to end the v4 grace period early. Writing "Y", "y", or "1" to the file will cause the v4 grace period to be lifted. The basic idea with this will be to allow the userland client tracking program to lift the grace period once it knows that no more clients will be reclaiming state.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
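A simplified sketch of what such a write handler has to check; the helper that actually ends the grace period is an assumed name, and error handling is trimmed to the essentials.

```c
/* Kernel-context sketch of the procfile write path. */
static ssize_t end_grace_write_sketch(const char __user *buf, size_t size,
				      struct net *net)
{
	char c;

	if (size < 1)
		return -EINVAL;
	if (copy_from_user(&c, buf, 1))
		return -EFAULT;

	/* Only "Y", "y" or "1" lifts the grace period. */
	if (c != 'Y' && c != 'y' && c != '1')
		return -EINVAL;

	end_v4_grace_period(net);	/* assumed helper doing the real work */
	return size;
}
```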
-
Committed by Jeff Layton
Add a new procfile that will allow a (privileged) userland process to end the NLM grace period early. The basic idea here will be to have sm-notify write to this file if it sent out no NOTIFY requests when it ran. In that situation, we can generally expect that there will be no reclaim requests, so the grace period can be lifted early.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Jeff Layton
As stated in RFC 5661, section 18.51.3: "Once a RECLAIM_COMPLETE is done, there can be no further reclaim operations for locks whose scope is defined as having completed recovery. Once the client sends RECLAIM_COMPLETE, the server will not allow the client to do subsequent reclaims of locking state for that scope and, if these are attempted, will return NFS4ERR_NO_GRACE." Ensure that we enforce that requirement.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Jeff Layton
Since it's stored in nfsd_net, we don't need to pass it in separately.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Jeff Layton
Currently, all of the grace period handling is part of lockd. Eventually, though, we'd like to be able to build v4-only servers, at which point we'll need to put all of this elsewhere. Move the code itself into fs/nfs_common and have it build a grace.ko module. Then rejigger the Kconfig options so that both nfsd and lockd enable it automatically.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
- 11 Sep 2014, 1 commit
-
-
Committed by Christoph Hellwig
This fixes a failure in xfstests generic/313 because nfs doesn't update mtime on a truncate. The protocol requires this to be done implicitly for a size-changing setattr.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 08 Sep 2014, 1 commit
-
-
Committed by Alexey Khoroshilov
Commit 0244756e ("ufs: sb mutex merge + mutex_destroy") introduces deadlocks in ufs_new_inode() and ufs_free_inode(). Most callers of those functions acquire the mutex by themselves, and ufs_{new,free}_inode() do that via lock_ufs(), i.e. we have an unavoidable double lock. The patch proposes to resolve the issue by making sure that ufs_{new,free}_inode() are not called with the mutex held. Found by the Linux Driver Verification project (linuxtesting.org).

Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 05 Sep 2014, 2 commits
-
-
Committed by Anton Altaparmakov
This patch changes sync_filesystem() to be EXPORT_SYMBOL(). The reason this is needed is that starting with the 3.15 kernel, due to Theodore Ts'o's commit 02b9984d ("fs: push sync_filesystem() down to the file system's remount_fs()"), all file systems that have dirty data to be written out need to call sync_filesystem() from their ->remount_fs() method when remounting read-only. As this is now a generically required function rather than an internal-only function, it should be EXPORT_SYMBOL() so that all file systems can call it.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
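For illustration, this is the pattern the export enables: a modular filesystem's ->remount_fs() method flushing dirty data with sync_filesystem() before a read-only transition. The filesystem name is made up; sync_filesystem() and the remount signature follow the 3.15-era VFS interface.

```c
#include <linux/fs.h>

/* Sketch of a module-built filesystem's remount hook. */
static int examplefs_remount(struct super_block *sb, int *flags, char *data)
{
	/*
	 * Required since 3.15: push dirty data out before the flags change.
	 * Calling this from a module is what makes the export necessary.
	 */
	sync_filesystem(sb);

	if (*flags & MS_RDONLY) {
		/* ... per-filesystem read-only transition work ... */
	}
	return 0;
}
```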
-
Committed by Gu Zheng
It seems that exit_aio() also needs to wait for all iocbs to complete (like io_destroy), but we missed the wait step in the current implementation, so fix it in the same way as we did in io_destroy.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
-
- 04 Sep 2014, 6 commits
-
-
Committed by Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Kinglong Mee
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 03 Sep 2014, 3 commits
-
-
Committed by Trond Myklebust
This fixes an Oopsable race when starting up the callback server.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust
This fixes an Oopsable race when starting lockd.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Jeff Moyer
We ran into a case on ppc64 running mariadb where io_getevents would return zeroed-out I/O events. After adding instrumentation, it became clear that there was some missing synchronization between reading the tail pointer and the events themselves. This small patch fixes the problem in testing. Thanks to Zach for helping to look into this, and suggesting the fix.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
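The ordering problem described above, sketched with the usual barrier pairing; this is a simplified illustration of the aio ring's consumer side, not the literal patch. The key point: the consumer must not read event payloads until after it has observed the tail update that published them.

```c
/* Kernel-context sketch of the consumer side of the aio completion ring. */
static int read_one_event_sketch(struct aio_ring *ring, unsigned head,
				 struct io_event *ev)
{
	unsigned tail = ring->tail;	/* producer publishes this last */

	/*
	 * Pair with the producer's write barrier: the payload loads below
	 * must not be reordered before the load of ring->tail above,
	 * otherwise we can copy a not-yet-filled (zeroed) event.
	 */
	smp_rmb();

	if (head == tail)
		return 0;		/* nothing new */

	*ev = ring->io_events[head];	/* payload is guaranteed visible now */
	return 1;
}
```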
-
- 02 Sep 2014, 7 commits
-
-
Committed by Chao Yu
Due to a race condition on the inode cache, the following scenario can appear:

[Thread a]                              [Thread b]
->f2fs_mkdir
  ->f2fs_add_link
    ->__f2fs_add_link
      ->init_inode_metadata failed here
                                        ->gc_thread_func
                                          ->f2fs_gc
                                            ->do_garbage_collect
                                              ->gc_data_segment
                                                ->f2fs_iget
                                                  ->iget_locked
                                                    ->wait_on_inode
  ->unlock_new_inode
                                                ->move_data_page
  ->make_bad_inode
  ->iput

When we fail in create/symlink/mkdir/mknod/tmpfile, the newly allocated inode should be set as bad to avoid being accessed by another thread. But the above scenario allows f2fs to access the invalid inode before it has been set as bad. This patch fixes the potential problem; the issue was found by code review.

change log from v1:
o Add condition judgment in gc_data_segment() suggested by Changman Lee.
o use iget_failed to simplify code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Committed by Brian Foster
xfs_collapse_file_space() currently writes back the entire file undergoing collapse range to settle things down for the extent shift algorithm. While this prevents changes to the extent list during the collapse operation, the writeback itself is not enough to prevent unnecessary collapse failures. The current shift algorithm uses the extent index to iterate the in-core extent list. If a post-eof delalloc extent persists after the writeback (e.g., a prior zero range op where the end of the range aligns with eof can separate the post-eof blocks such that they are not written back and converted), xfs_bmap_shift_extents() becomes confused over the encoded br_startblock value and fails the collapse. As with the full writeback, this is a temporary fix until the algorithm is improved to cope with a volatile extent list and avoid attempts to shift post-eof extents.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Dave Chinner
If we have delalloc extents on a file before we run a collapse range operation, we sync the range that we are going to collapse to convert delalloc extents in that region to real extents to simplify the shift operation. However, the shift operation then assumes that the extent list is not going to change as it iterates over the extent list moving things about. Unfortunately, this isn't true because we can't hold the ILOCK over all the operations. We can prevent new IO from modifying the extent list by holding the IOLOCK, but that doesn't prevent writeback from running.... And when writeback runs, it can convert delalloc extents in the range of the file prior to the region being collapsed, and this changes the indexes of all the extents in the file. That causes the collapse range operation to Go Bad. The right fix is to rewrite the extent shift operation not to be dependent on the extent list not changing across the entire operation, but this is a fairly significant piece of work to do. Hence, as a short-term workaround for the problem, sync the entire file before starting a collapse operation to remove all delalloc ranges from the file and so avoid the problem of concurrent writeback changing the extent list.

Diagnosed-and-Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Brian Foster
The file collapse mechanism uses xfs_bmap_shift_extents() to collapse all subsequent extents down into the specified, previously punched out, region. This function performs some validation, such as whether a sufficient hole exists in the target region of the collapse, then shifts the remaining extents downward. The exit path of the function currently logs the inode unconditionally. While we must log the inode (and abort) if an error occurs and the transaction is dirty, the initial validation paths can generate errors before the transaction has been dirtied. This creates an unnecessary filesystem shutdown scenario, as the caller will cancel a transaction that has been marked dirty. Modify xfs_bmap_shift_extents() to OR the logflags bits as modifications are made to the inode bmap. Only log the inode in the exit path if logflags has been set. This ensures we only have to cancel a dirty transaction if modifications have been made, and prevents an unnecessary filesystem shutdown otherwise.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
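A schematic of the "accumulate logflags, log only if dirty" discipline the patch describes; the validation helper is hypothetical, while the XFS logging call and flag names are the real ones.

```c
/* Kernel-context sketch, not the literal xfs_bmap_shift_extents(). */
static int shift_extents_sketch(struct xfs_trans *tp, struct xfs_inode *ip)
{
	int logflags = 0;
	int error;

	error = check_target_hole(ip);		/* hypothetical validation */
	if (error)
		goto out;	/* nothing modified, so nothing gets logged */

	/* Each modification ORs in whatever it dirtied. */
	logflags |= XFS_ILOG_CORE;
	/* ... shift extents; extent-format changes add XFS_ILOG_DEXT ... */

out:
	if (logflags)
		xfs_trans_log_inode(tp, ip, logflags);
	return error;
}
```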
-
Committed by Dave Chinner
Now that we are not doing silly things with dirtying buffers beyond EOF and are using invalidation correctly, we can finally reduce the ranges of writeback and invalidation used by direct IO to match that of the IO being issued. Bring the writeback and invalidation ranges back to match the generic direct IO code - this will greatly reduce the perturbation of cached data when direct IO and buffered IO are mixed, but still provide the same buffered vs direct IO coherency behaviour we currently have.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Dave Chinner
Similar to direct IO reads, direct IO writes are using truncate_pagecache_range to invalidate the page cache. This is incorrect due to the sub-block zeroing in the page cache that truncate_pagecache_range() triggers. This patch fixes things by using invalidate_inode_pages2_range instead. It preserves the page cache invalidation, but won't zero any pages.

Cc: stable@vger.kernel.org
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
-
Committed by Chris Mason
xfs is using truncate_pagecache_range to invalidate the page cache during DIO reads. This is different from the other filesystems, which only invalidate pages during DIO writes. truncate_pagecache_range is meant to be used when we are freeing the underlying data structs from disk, so it will zero any partial ranges in the page. This means a DIO read can zero out part of the page cache page, and it is possible the page will stay in cache. Buffered reads will then find an up-to-date page with zeros instead of the data actually on disk. This patch fixes things by using invalidate_inode_pages2_range instead. It preserves the page cache invalidation, but won't zero any pages. [dchinner: catch error and warn if it fails. Comment.]

Cc: stable@vger.kernel.org
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
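A sketch of the substitution made in this patch and the previous one, using the real page-cache APIs involved; the wrapper function itself is a simplified stand-in for the xfs direct IO read/write paths.

```c
#include <linux/fs.h>
#include <linux/pagemap.h>

/* Invalidate the cached range for a DIO without zeroing partial pages. */
static void dio_invalidate_sketch(struct inode *inode, loff_t pos, size_t count)
{
	struct address_space *mapping = inode->i_mapping;

	if (mapping->nrpages) {
		/*
		 * Unlike truncate_pagecache_range(), this drops whole pages
		 * (or fails) and never zeroes partial ranges, so a DIO read
		 * cannot leave zero-filled data in the page cache.
		 */
		int ret = invalidate_inode_pages2_range(mapping,
				pos >> PAGE_CACHE_SHIFT,
				(pos + count - 1) >> PAGE_CACHE_SHIFT);
		WARN_ON_ONCE(ret);
	}
}
```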
-