- 10 Sep, 2014: 10 commits
-
-
Committed by Jeff Layton
GFS2 and NFS have setlease routines that always just return -EINVAL. Turn that into a generic routine that can live in fs/libfs.c. Cc: <linux-nfs@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: <cluster-devel@redhat.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
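The idea of replacing per-filesystem copies with one shared helper can be sketched in userspace C. This is an illustrative model, not the kernel code: `struct file_ops`, `generic_nosetlease()` and the two ops tables are hypothetical names, and the real setlease hook takes more arguments than the one modeled here.

```c
#include <errno.h>

/* Hypothetical stand-ins for struct file and struct file_operations;
 * only the setlease hook is modeled. */
struct file;

struct file_ops {
    const char *name;
    int (*setlease)(struct file *filp, long arg);
};

/* One shared "leases not supported" helper that every filesystem
 * without lease support can point at, instead of each carrying its
 * own copy that just returns -EINVAL. */
static int generic_nosetlease(struct file *filp, long arg)
{
    (void)filp;
    (void)arg;
    return -EINVAL;
}

/* Two filesystems reuse the same helper. */
static const struct file_ops gfs2_like_ops = { "gfs2", generic_nosetlease };
static const struct file_ops nfs_like_ops  = { "nfs",  generic_nosetlease };
```

The design point is simply deduplication: both tables hold the same function pointer, so the behavior lives in one place.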
-
Committed by Jeff Layton
There are no callers of these functions. Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Jeff Layton
As Kinglong points out, the nlm_block->b_fl field is no longer used at all. Also, vfs_test_lock in the generic locking code will only return FILE_LOCK_DEFERRED if FL_SLEEP is set, and it isn't here. The only other place that returns that value is the DLM lock code, but it only does that in dlm_posix_lock, never in dlm_posix_get. Remove all of the deferred locking code from the testlock codepath, since it doesn't appear to ever be used anyway. I do have a small concern that this might cause a behavior change in the case where a block is already sitting on the list when the testlock request comes in, but that looks like it doesn't really work properly anyway. I think it's best to just pass that down to vfs_test_lock and let the filesystem report it, instead of trying to infer what's going on with the lock by looking at an existing block. Cc: cluster-devel@redhat.com Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
-
Committed by Kinglong Mee
v5: use nfs4_get_stateowner() instead of an inline function v3: update based on Jeff's comments v2: fix incorrect use of struct file_lock_operations for handling the owner Acked-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Kinglong Mee
v5: same as the first version Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Kinglong Mee
Commit d5b9026a ([PATCH] knfsd: locks: flag NFSv4-owned locks) uses the fl_lmops field in file_lock to check for an nfsd4 lockowner. However, commit 1a747ee0 (locks: don't call ->copy_lock methods on return of conflicting locks) causes the fl_lmops of a conflock to always be NULL. Likewise, commit 0996905f (lockd: posix_test_lock() should not call locks_copy_lock()) also causes the fl_lmops of a conflock to always be NULL. Make sure the private information is copied by fl_copy_lock() in struct file_lock_operations, and merge __locks_copy_lock() into fl_copy_lock(). Jeff's advice: "Set fl_lmops on conflocks, but don't set fl_ops. fl_ops are superfluous, since they are callbacks into the filesystem. There should be no need to bother the filesystem at all with info in a conflock. But, lock _ownership_ matters for conflocks and that's indicated by the fl_lmops. So you really do want to copy the fl_lmops for conflocks I think." v5: add missing call to locks_release_private() in nlmsvc_testlock() v4: only copy fl_lmops for conflock, don't copy fl_ops Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Kinglong Mee
NFSD or another lock manager may take a reference on the owner, so add two new operations for copying and releasing the owner. v5: change order from 2/6 to 3/6 v4: rename lm_copy_owner/lm_release_owner to lm_get_owner/lm_put_owner Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Kinglong Mee
Jeff's advice: "Right now __locks_copy_lock is only used to copy conflocks. It would be good to rename that to something more distinct (i.e. locks_copy_conflock), to make it clear that we're generating a conflock there." v5: change order from 3/6 to 2/6 v4: new patch, only renaming the function Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Joe Perches
This argument is always NULL, so don't pass it around. [jlayton: remove dependencies on previous patches in series] Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Jeff Layton
The argument to locks_unlink_lock can't be just any pointer to a pointer. It must be a pointer to the fl_next field in the previous lock in the list. Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
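A minimal sketch of the pointer-to-pointer contract, assuming a plain singly linked list. `struct lock`, `unlink_lock()` and `remove_by_id()` are hypothetical; only the `fl_next` field name comes from the message. Walking with `before = &(*before)->fl_next` guarantees that the unlink helper always receives either the head pointer or the previous lock's `fl_next` field.

```c
#include <stddef.h>

/* Hypothetical, minimal model of a singly linked lock list. */
struct lock {
    int id;
    struct lock *fl_next;
};

/* Unlink the lock that *before points to. 'before' must be the address
 * of the list head or of the fl_next field in the PREVIOUS lock, so
 * that writing through it splices the victim out of the list. */
static struct lock *unlink_lock(struct lock **before)
{
    struct lock *victim = *before;

    *before = victim->fl_next;
    victim->fl_next = NULL;
    return victim;
}

/* Walk with a pointer-to-pointer so 'before' always satisfies the
 * contract above; returns the unlinked lock, or NULL if not found. */
static struct lock *remove_by_id(struct lock **head, int id)
{
    struct lock **before;

    for (before = head; *before != NULL; before = &(*before)->fl_next) {
        if ((*before)->id == id)
            return unlink_lock(before);
    }
    return NULL;
}
```

Passing the address of some other pointer (for example a local copy of `head`) would still compile, but the write would update the copy instead of the list, which is the kind of misuse the message warns about.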
-
- 03 Sep, 2014: 2 commits
-
-
Committed by Trond Myklebust
This fixes an Oopsable race when starting up the callback server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust
This fixes an Oopsable race when starting lockd. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 29 Aug, 2014: 3 commits
-
-
Committed by J. Bruce Fields
The working group appears committed to keeping the protocol stable, and the code has gotten some use and seems to work OK. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Anna Schumaker
Recent NFS v4.2 drafts have removed NFS4ERR_METADATA_NOTSUPP and reassigned the error code to NFS4ERR_UNION_NOTSUPP. I also add the NFS4ERR_OFFLOAD_NO_REQS error code. We're not using any of these yet, so there's no harm done. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Kinglong Mee
locks_alloc_lock() has already initialized struct file_lock, so there is no need to re-initialize it here. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 19 Aug, 2014: 1 commit
-
-
Committed by Rajesh Ghanekar
One of our customers' applications only needs file names, not file attributes. With directories having 10K+ inodes (assuming the buffer cache has the directory blocks holding the file names cached, but the inode cache is limited and hence older cached inodes need to be evicted), older inodes are evicted periodically. So if they keep on doing readdir(2) from an NFS client on multiple directories, some directories' files are periodically removed from the inode cache, and hence a new readdir(2) on the same directory requires disk access to bring the inodes back into the inode cache. As a READDIRPLUS request also fetches attributes, doing a getattr on each file on the server, it causes unnecessary disk accesses. If READDIRPLUS on the NFS client is returned with -ENOTSUPP, the NFS client uses a READDIR request, which just gets the names of the files in a directory, not their attributes, hence avoiding disk accesses on the server. There's already a corresponding client-side mount option, but an export option reduces the need for configuration across multiple clients. This flag affects NFSv3 only. If it turns out it's needed for NFSv4 as well, then we may have to figure out how to extend the behavior to NFSv4, but it's not currently obvious how to do that. Signed-off-by: Rajesh Ghanekar <rajesh_ghanekar@symantec.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 18 Aug, 2014: 13 commits
-
-
Committed by J. Bruce Fields
As of 8c7424cf "nfsd4: don't try to encode conflicting owner if low on space", we permit the server to process a LOCK operation even if there might not be space to return the conflicting lockowner, because we've made returning the conflicting lockowner optional. However, the rpc server still wants to know the most we might possibly return, so we need to take the possible conflicting lockowner into account in the svc_reserve_space() call here. Symptoms were log messages like "RPC request reserved 88 but used 108". Fixes: 8c7424cf "nfsd4: don't try to encode conflicting owner if low on space" Reported-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by J. Bruce Fields
We do what Neil suggests now. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Ross Lagerwall
When creating, with O_EXCL, a file that already exists in a read-only directory, the NFSv3 server returns EACCES rather than EEXIST (which local files and the NFSv4 server return). Fix this by checking the MAY_CREATE permission only if the file does not exist. Since this already happens in do_nfsd_create, the check in nfsd3_proc_create can simply be removed. Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
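The ordering fix can be reduced to a tiny decision function. This is a hypothetical model, not nfsd code: `exclusive_create_result()` and its boolean inputs are illustrative, and it only captures that existence is tested before the create permission, so an exclusive create of an existing file in a read-only directory yields EEXIST.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the error precedence for an exclusive create. */
static int exclusive_create_result(bool exists, bool dir_writable)
{
    if (exists)
        return -EEXIST;   /* existence is checked first */
    if (!dir_writable)
        return -EACCES;   /* create permission matters only for new files */
    return 0;             /* the file would be created */
}
```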
-
Committed by Jeff Layton
Currently, we hold the state_lock when releasing the lease. That's potentially problematic in the future if we allow for setlease methods that can sleep. Move the nfs4_put_deleg_lease call out of the delegation unhashing routine (which was always a bit goofy anyway), and into the unlocked sections of the callers of unhash_delegation_locked. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Jeff Layton
Currently these fields are protected with the state_lock, but that doesn't really make a lot of sense. These fields are "private" to the nfs4_file, and can be protected with the more granular fi_lock. The fi_lock is already held when setting these fields. Make the code hold the fp->fi_lock when clearing the lease-related fields in the nfs4_file, and no longer require that the state_lock be held when calling into this function. To prevent lock inversion with the i_lock, we also move the vfs_setlease and fput calls outside of the fi_lock. This also sets us up for allowing vfs_setlease calls to block in the future. Finally, remove a redundant NULL pointer check: unhash_delegation_locked locks the fp->fi_lock prior to that check, so fp in that function must never be NULL. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust
We would normally expect the xid and the checksum to be the best discriminators. Check them before looking at the procedure number, etc. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
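A sketch of "check the best discriminators first", assuming a simplified request-cache key. The struct and its field names are hypothetical and do not reflect nfsd's actual layout; the point is only the comparison order, where xid and checksum reject almost every unrelated entry before the remaining fields are examined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified cache key; field names are illustrative. */
struct cache_key {
    uint32_t xid;
    uint32_t csum;      /* checksum over (part of) the request */
    uint32_t proc;
    uint32_t prot;
    uint32_t vers;
    uint32_t prog;
};

/* Compare the most discriminating fields first: unrelated requests
 * almost always differ in xid or checksum, so most mismatches are
 * rejected after one or two comparisons. */
static bool cache_key_match(const struct cache_key *a,
                            const struct cache_key *b)
{
    if (a->xid != b->xid || a->csum != b->csum)
        return false;
    return a->proc == b->proc && a->prot == b->prot &&
           a->vers == b->vers && a->prog == b->prog;
}
```

The result is the same for any comparison order; only the average amount of work per mismatch changes.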
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust
...so we can remove the spinlocking around it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust
Now that the LRU list is per-bucket, we don't need a second list for searches. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 15 Aug, 2014: 9 commits
-
-
Committed by Chris Mason
Truncates and renames are often used to replace old versions of a file with new versions. Applications often expect this to be an atomic replacement, even if they haven't done anything to make sure the new version is fully on disk. Btrfs has strict flushing in place to make sure that renaming over an old file with a new file will fully flush out the new file before allowing the transaction commit with the rename to complete. This ordering means the commit code needs to be able to lock file pages, and there are a few paths in the filesystem where we will try to end a transaction with the page lock held. It's rare, but these things can deadlock. This patch removes the ordered flushes and switches to a best-effort filemap_flush like ext4 uses. It's not perfect, but it should fix the deadlocks. Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Filipe Manana
Under rare circumstances we can end up leaving 2 versions of a checksum for the same file extent range. The reason for this is that after calling btrfs_next_leaf we process slot 0 of the leaf it returns, instead of processing the slot set in path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after btrfs_next_leaf() releases the path and before it searches for the next leaf, another task might cause a split of the next leaf, which migrates some of its keys to the leaf we were processing before calling btrfs_next_leaf(). In this case btrfs_next_leaf() returns the same leaf again, but with path->slots[0] set to the slot of the first new key it got, that is, a slot number that didn't exist before calling btrfs_next_leaf(), as the leaf now has more keys than it had before. So we must always process the returned leaf starting at path->slots[0], as it isn't always 0, and the key at slot 0 can have an offset much lower than our search offset/bytenr. For example, consider the following scenario, where we have:

    sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
    four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472

    Leaf N:

      slot = 0                           slot = btrfs_header_nritems() - 1
     |-------------------------------------------------------------------|
     | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
     |-------------------------------------------------------------------|

    Leaf N + 1:

      slot = 0                            slot = btrfs_header_nritems() - 1
     |--------------------------------------------------------------------|
     | [(CSUM CSUM 40161280), size 32] ... [(CSUM CSUM 40615936), size 8] |
     |--------------------------------------------------------------------|

Because we are at the last slot of leaf N, we call btrfs_next_leaf() to find the next highest key, which releases the current path and then searches for that next key. However, after releasing the path and before finding that next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore btrfs_next_leaf() returns a path again with leaf N, but with the slot pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N is then:

      slot = 0                          slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
     |----------------------------------------------------------------------------------------------------|
     | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] [(CSUM CSUM 40161280), size 32] |
     |----------------------------------------------------------------------------------------------------|

Incorrectly using slot 0 makes us set next_offset to 39239680, and we jump to the "insert:" label, which sets tmp to:

    tmp = min((sums->len - total_bytes) >> blocksize_bits,
              (next_offset - file_key.offset) >> blocksize_bits)
        = min((16384 - 0) >> 12, (39239680 - 40157184) >> 12)
        = min(4, (u64)-917504 = 18446744073708634112 >> 12)
        = 4

and ins_size = csum_size * tmp = 4 * 4 = 16 bytes. In other words, we insert a new csum item in the tree with key (CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums for all the data (4 blocks of 4096 bytes each = sums->len). This is wrong, because the item with key (CSUM CSUM 40161280) (the one that was moved from leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288 bytes of our data, and those old checksums won't get removed. So this leaves us with 2 different checksums for three 4kb blocks of data in the tree, and breaks the logical rule:

    Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover

An obvious bad effect of this is that a subsequent csum tree lookup to get the checksum of any of the blocks with logical offset 40161280, 40165376 or 40169472 (the last three 4kb blocks of file data) will get the old checksums. Cc: stable@vger.kernel.org Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
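The wrap-around in the `tmp` computation above can be checked with the exact numbers from the message. The helper below is a hypothetical reproduction of that one expression with `uint64_t` arithmetic; it is not btrfs code, and `csum_blocks_to_insert()` is an illustrative name.

```c
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Reproduces the tmp expression from the message. If next_offset is
 * below file_offset (what happens when slot 0 is used instead of
 * path->slots[0]), the subtraction wraps to a huge u64, so min()
 * silently picks the full remaining length instead of stopping at
 * the next existing checksum item. */
static uint64_t csum_blocks_to_insert(uint64_t sums_len, uint64_t total_bytes,
                                      uint64_t next_offset, uint64_t file_offset,
                                      unsigned int blocksize_bits)
{
    return min_u64((sums_len - total_bytes) >> blocksize_bits,
                   (next_offset - file_offset) >> blocksize_bits);
}
```

In this simplified model, calling it with next_offset = 39239680 gives 4 (16 bytes of checksums at csum_size 4), while next_offset = 40161280, the key that actually follows our range, gives 1.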
-
Committed by Takashi Iwai
We've got bug reports that btrfs crashes when quota is enabled on a 32-bit kernel, typically with an Oops like the one below:

 BUG: unable to handle kernel NULL pointer dereference at 00000004
 IP: [<f9234590>] find_parent_nodes+0x360/0x1380 [btrfs]
 *pde = 00000000
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S W 3.15.2-1.gd43d97e-default #1
 Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
 task: f1478130 ti: f147c000 task.ti: f147c000
 EIP: 0060:[<f9234590>] EFLAGS: 00010213 CPU: 0
 EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
 EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000
 ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690
 Stack:
 00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050
 00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000
 00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000
 Call Trace:
 [<f923564d>] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
 [<f9237bb1>] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
 [<f9206148>] normal_work_helper+0xc8/0x270 [btrfs]
 [<c025e38b>] process_one_work+0x11b/0x390
 [<c025eea1>] worker_thread+0x101/0x340
 [<c026432b>] kthread+0x9b/0xb0
 [<c0712a71>] ret_from_kernel_thread+0x21/0x30
 [<c0264290>] kthread_create_on_node+0x110/0x110

This indicates a NULL corruption in the prefs_delayed list. Further investigation and bisection pointed to the call of ulist_add_merge() as the cause of the corruption. ulist_add_merge() takes a u64 as aux and writes a 64-bit value into old_aux. The callers of this function in backref.c, however, pass a pointer to a pointer as old_aux. That is, the function writes a 64-bit value over a 32-bit pointer. This put a NULL in the adjacent variable, in this case prefs_delayed. Here is a quick attempt to band-aid over this: a new function, ulist_add_merge_ptr(), is introduced to properly pass and store a pointer value instead of a u64. There are still ugly void ** casts remaining in the callers because void ** cannot be taken implicitly, but it's safer than an explicit cast to u64 anyway. Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046 Cc: <stable@vger.kernel.org> [v3.11+] Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Chris Mason <clm@fb.com>
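Both the failure and the shape of the fix can be modeled in userspace C. Everything here is hypothetical: `struct frame32` stands in for a 32-bit stack frame where a 4-byte pointer is followed by a neighboring variable, `add_merge_u64()` stands in for a helper whose aux slot is a u64, and `add_merge_ptr()` follows the general idea of a pointer-typed wrapper (round-trip through a real u64 temporary). It is not the actual btrfs implementation.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;

/* Simulated 32-bit layout: a 4-byte pointer slot followed by a
 * neighboring variable. memcpy over the whole struct keeps this
 * well-defined C while showing that an 8-byte store aimed at the
 * slot also overwrites the neighbor. */
struct frame32 {
    uint32_t old_aux_ptr;   /* what a 32-bit void * occupies */
    uint32_t neighbor;      /* e.g. an adjacent list head */
};

static void buggy_store(struct frame32 *f, u64 value)
{
    memcpy(f, &value, sizeof(value));
}

/* Hypothetical merge helper whose aux slot is a u64. */
static u64 stored_aux;

static int add_merge_u64(u64 aux, u64 *old_aux)
{
    *old_aux = stored_aux;  /* always an 8-byte store */
    stored_aux = aux;
    return 1;
}

/* Pointer-safe wrapper: go through a real u64 temporary and convert
 * back, so only sizeof(void *) bytes are written to the caller's
 * pointer variable on any word size. */
static int add_merge_ptr(void *aux, void **old_aux)
{
    u64 old64 = (uintptr_t)*old_aux;
    int ret = add_merge_u64((uintptr_t)aux, &old64);

    *old_aux = (void *)(uintptr_t)old64;
    return ret;
}
```

Whichever half of the 8-byte value lands in `neighbor` depends on endianness, but either way the neighbor is clobbered, which matches the NULL that appeared in `prefs_delayed`.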
-
Committed by Liu Bo
When we fail to allocate space for the whole compressed extent, we fall back to uncompressed IO, but we've forgotten to redirty the pages that belong to this compressed extent. These 'clean' pages simply skip the 'submit' part and go to endio directly, so in the end we get data corruption because nothing is written. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Tested-By: Martin Steigerwald <martin@lichtvoll.de> Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Mark Fasheh
ulist_add() can return '1' on success, which qgroup_subtree_accounting() doesn't take into account. As a result, that value can be bubbled up to callers, causing an error to be printed. Fix this by only returning the value of ulist_add() when it indicates an error. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Chris Mason <clm@fb.com>
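A sketch of the caller-side fix, under assumptions: `set_add()` is a hypothetical fixed-size set that uses the convention 1 = newly added, 0 = already present, negative errno = failure. Only the "1 on success" part is stated in the message; the rest of the convention and all names here are illustrative.

```c
#include <errno.h>

#define MAX_ITEMS 8

static unsigned long long items[MAX_ITEMS];
static int nr_items;

/* Hypothetical set insert: 1 = added, 0 = already present,
 * negative errno = failure. */
static int set_add(unsigned long long val)
{
    for (int i = 0; i < nr_items; i++)
        if (items[i] == val)
            return 0;
    if (nr_items == MAX_ITEMS)
        return -ENOMEM;
    items[nr_items++] = val;
    return 1;
}

/* Caller pattern from the fix: propagate only real errors, so a
 * successful "1" is not bubbled up and mistaken for a failure. */
static int account_one(unsigned long long bytenr)
{
    int ret = set_add(bytenr);

    if (ret < 0)
        return ret;
    return 0;
}
```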
-
Committed by Mark Fasheh
During its tree walk, btrfs_drop_snapshot() will skip any shared subtrees it encounters. This is incorrect when we have qgroups turned on, as those subtrees need to have their contents accounted. In particular, the case we're concerned with is when removing our snapshot root leaves the subtree with only one root reference. In those cases we need to find the last remaining root and add each extent in the subtree to the corresponding qgroup exclusive counts. This patch implements the shared subtree walk and a new qgroup operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of this type is encountered during qgroup accounting, we search for any root references to that extent, and if we find only one reference left, we go ahead and do the math on its exclusive counts. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Filipe Manana
Before processing the extent buffer, acquire a read lock on it, so that we're safe against concurrent updates on the extent buffer. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Josef Bacik
Previously I extended the no_quota arg to btrfs_dec/inc_ref because I didn't understand how snapshot delete was using it and assumed that we needed the quota operations there. With Mark's work this has turned out not to be the case: we _always_ need to use no_quota for btrfs_dec/inc_ref, so just drop the argument and make __btrfs_mod_ref always call its process function with no_quota set. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by David Sterba
This has been discussed in the thread http://thread.gmane.org/gmane.comp.file-systems.btrfs/32528 and this patch implements this proposal: http://thread.gmane.org/gmane.comp.file-systems.btrfs/32536 It works fine for "clean" raid profiles, where the raid factor correction does the right job. Otherwise it's pessimistic and may show low space although there's still some left. The df numbers are slightly wrong in the case of mixed block groups, but this is not a major use case and can be addressed later. The RAID56 numbers are wrong in almost the same way as before and will be addressed separately. CC: Hugo Mills <hugo@carfax.org.uk> CC: cwillu <cwillu@cwillu.com> CC: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
-
- 14 Aug, 2014: 2 commits
-
-
Committed by Jeff Layton
There's no need to call locks_free_lock here while still holding the i_lock. Defer that until the lock has been dropped. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
-
Committed by Jeff Layton
In commit 72f98e72 (locks: turn lock_flocks into a spinlock), we moved from using the BKL to a global spinlock. With this change, we lost the ability to block in the fl_release_private operation. This is problematic for NFS (and probably some other filesystems as well). Add a new list_head argument to locks_delete_lock. If that argument is non-NULL, then queue any locks that we want to free to the list instead of freeing them. Then, add a new locks_dispose_list function that will walk such a list and call locks_free_lock on them after the i_lock has been dropped. Finally, change all of the callers of locks_delete_lock to pass in a list_head, except for lease_modify. That function can be called long after the i_lock has been acquired. Deferring the freeing of a lease until after unlocking in that function is non-trivial until we overhaul some of the spinlocking in the lease code. Currently, though, no filesystem that sets fl_release_private supports leases, so this is not currently a problem. We'll eventually want to make the same change in the lease code, but it needs a lot more work before we can reasonably do so. Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
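The queue-then-dispose pattern can be modeled in userspace C. This is a sketch under assumptions: `lock_held` is a flag standing in for holding the i_lock, `struct flock_entry`, `delete_lock()` and `dispose_list()` are hypothetical, and `dispose_list()` only mirrors the role the message gives locks_dispose_list(). The counters make it easy to check that nothing is freed while the simulated lock is held.

```c
#include <stdlib.h>

/* Hypothetical lock record on a singly linked dispose list. */
struct flock_entry {
    int id;
    struct flock_entry *next;
};

static int lock_held;         /* stands in for holding the i_lock */
static int frees_under_lock;  /* must stay 0: a release hook may sleep */
static int frees_total;

static void free_lock(struct flock_entry *fl)
{
    if (lock_held)
        frees_under_lock++;
    frees_total++;
    free(fl);
}

/* If dispose is non-NULL, queue the lock instead of freeing it. */
static void delete_lock(struct flock_entry *fl, struct flock_entry **dispose)
{
    if (dispose) {
        fl->next = *dispose;
        *dispose = fl;
    } else {
        free_lock(fl);
    }
}

/* Free everything queued; call only after the lock has been dropped. */
static void dispose_list(struct flock_entry **dispose)
{
    while (*dispose) {
        struct flock_entry *fl = *dispose;

        *dispose = fl->next;
        free_lock(fl);
    }
}
```

Passing NULL for `dispose` keeps the old free-immediately behavior, which corresponds to the one caller the message leaves unchanged (lease_modify).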
-