- 09 7月, 2014 19 次提交
-
-
由 Trond Myklebust 提交于
In the NFSv4 spec, lock stateids are per-file objects. Lockowners are not. This patch replaces the current list of lock owners in the open stateids with a list of lock stateids. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Trond Myklebust 提交于
Minor cleanup that should introduce no behavioral changes. Currently this function just unhashes the stateid and leaves the caller to do the work of the CLOSE processing. Change nfsd4_close_open_stateid so that it handles doing all of the work of closing a stateid. Move the handling of the unhashed stateid into it instead of doing that work in nfsd4_close. This will help isolate some coming changes to stateid handling from nfsd4_close. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
There's no need to confirm an openowner in v4.1 and above, so we can go ahead and set NFS4_OO_CONFIRMED when we create openowners in those versions. This will also be necessary when we remove the client_mutex, as it'll be possible for two concurrent opens to race in versions >4.0. Signed-off-by: NJeff Layton <jlayton@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Trond Myklebust 提交于
Move the slot return, put session etc into a helper in fs/nfsd/nfs4state.c instead of open coding in nfs4svc_encode_compoundres. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Trond Myklebust 提交于
Not technically a bugfix, since nothing tries to use the return pointer if this function doesn't return success, but it could be a problem with some coming changes. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NJeff Layton <jlayton@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
Currently, the maximum number of connections that nfsd will allow is based on the number of threads spawned. While this is fine for a default, there really isn't a clear relationship between the two. The number of threads corresponds to the number of concurrent requests that we want to allow the server to process at any given time. The connection limit corresponds to the maximum number of clients that we want to allow the server to handle. These are two entirely different quantities. Break the dependency on increasing threads in order to allow for more connections, by adding a new per-net parameter that can be set to a non-zero value. The default is still to base it on the number of threads, so there should be no behavior change for anyone who doesn't use it. Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NJeff Layton <jlayton@primarydata.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Trond Myklebust 提交于
Since nfsd_create_setattr strips the mode from the struct iattr, it is quite possible that it will optimise away the call to nfsd_setattr altogether. If this is the case, then we never call commit_metadata() on the newly created file. Also ensure that both nfsd_setattr() and nfsd_create_setattr() fail when the call to commit_metadata fails. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Commit db2e747b (vfs: remove mode parameter from vfs_symlink()) have remove mode parameter from vfs_symlink. So that, iattr isn't needed by nfsd_symlink now, just remove it. Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Trond Myklebust 提交于
Current code depends on the client_mutex to guarantee a single struct nfs4_file per inode in the file_hashtbl and make addition atomic with respect to lookup. Rely instead on the state_Lock, to make it easier to stop taking the client_mutex here later. To prevent an i_lock/state_lock inversion, change nfsd4_init_file to use ihold instead if igrab. That's also more efficient anyway as we definitely hold a reference to the inode at that point. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NJeff Layton <jlayton@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Christoph Hellwig 提交于
nfsd4_process_open2 will currently will get access to the file, and then call nfsd4_truncate to (possibly) truncate it. If that operation fails though, then the access references will never be released as the nfs4_ol_stateid is never initialized. Fix by moving the nfsd4_truncate call into nfs4_get_vfs_file, ensuring that the refcounts are properly put if the truncate fails. Signed-off-by: NJeff Layton <jlayton@primarydata.com> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_readv': >> fs/nfsd/nfs4xdr.c:3137:148: warning: comparison of distinct pointer types lacks a cast [enabled by default] thislen = min(len, ((void *)xdr->end - (void *)xdr->p)); Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Avoid an extra allocation for the tmpbuf struct itself, and stop ignoring some allocation failures. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
This is a not-that-useful kmalloc wrapper. And I'd like one of the callers to actually use something other than kmalloc. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
28e05dd8 "knfsd: nfsd4: represent nfsv4 acl with array instead of linked list" removed the last user that wanted a custom free function. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
The name of a link is currently stored in cr_name and cr_namelen, and the content in cr_linkname and cr_linklen. That's confusing. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Currently nfsd_symlink has a weird hack to serve callers who don't null-terminate symlink data: it looks ahead at the next byte to see if it's zero, and copies it to a new buffer to null-terminate if not. That means callers don't have to null-terminate, but they *do* have to ensure that the byte following the end of the data is theirs to read. That's a bit subtle, and the NFSv4 code actually got this wrong. So let's just throw out that code and let callers pass null-terminated strings; we've already fixed them to do that. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
It's simple enough for NFSv2 to null-terminate the symlink data. A bit weird (it depends on knowing that we've already read the following byte, which is either padding or part of the mode), but no worse than the conditional kstrdup it otherwise relies on in nfsd_symlink(). Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
An NFS operation that creates a new symlink includes the symlink data, which is xdr-encoded as a length followed by the data plus 0 to 3 bytes of zero-padding as required to reach a 4-byte boundary. The vfs, on the other hand, wants null-terminated data. The simple way to handle this would be by copying the data into a newly allocated buffer with space for the final null. The current nfsd_symlink code tries to be more clever by skipping that step in the (likely) case where the byte following the string is already 0. But that assumes that the byte following the string is ours to look at. In fact, it might be the first byte of a page that we can't read, or of some object that another task might modify. Worse, the NFSv4 code tries to fix the problem by actually writing to that byte. In the NFSv2/v3 cases this actually appears to be safe: - nfs3svc_decode_symlinkargs explicitly null-terminates the data (after first checking its length and copying it to a new page). - NFSv2 limits symlinks to 1k. The buffer holding the rpc request is always at least a page, and the link data (and previous fields) have maximum lengths that prevent the request from reaching the end of a page. In the NFSv4 case the CREATE op is potentially just one part of a long compound so can end up on the end of a page if you're unlucky. The minimal fix here is to copy and null-terminate in the NFSv4 case. The nfsd_symlink() interface here seems too fragile, though. It should really either do the copy itself every time or just require a null-terminated string. Reported-by: NJeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 23 6月, 2014 10 次提交
-
-
由 Jeff Layton 提交于
Currently rpc_pton() fails to handle the case where you echo an address into the file, as it barfs on the newline. Ensure that we NULL out the first occurrence of any newline. Signed-off-by: NJeff Layton <jlayton@primarydata.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
AFAICT, the only way to hit this error is to pass this function a bogus "who" value. In that case, we probably don't want to return -1 as that could get sent back to the client. Turn this into nfserr_serverfault, which is a more appropriate error for a server bug like this. Signed-off-by: NJeff Layton <jlayton@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
The filehandle structs all use host-endian values, but will sometimes stuff big-endian values into those fields. This is OK since these values are opaque to the client, but it confuses sparse. Add __force to make it clear that we are doing this intentionally. Signed-off-by: NJeff Layton <jlayton@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
The callers expect a __be32 return and the functions they call return __be32, so having these return int is just wrong. Also, nfsd_finish_read can be made static. Signed-off-by: NJeff Layton <jlayton@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
We currently hash the XID to determine a hash bucket to use for the reply cache entry, which is fed into hash_32 without byte-swapping it. Add __force to make sparse happy, and add some comments to explain why. Signed-off-by: NJeff Layton <jlayton@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
sparse complains that we're stuffing non-byte-swapped values into __be32's here. Since they're supposed to be opaque, it doesn't matter much. Just add __force to make sparse happy. Signed-off-by: NJeff Layton <jlayton@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Don't using cache_get besides export.h, using exp_get for export. Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
rq_usedeferral and rq_splice_ok are used as 0 and 1, just defined to bool. Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 20 6月, 2014 9 次提交
-
-
由 Miao Xie 提交于
The original bio might be submitted, so we shoud increase bi_remaining to account for it when we deal with the error that the device is missing or is not writeable, or we would skip the endio handle. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
The deadlock happened when we mount degraded filesystem, the reproduced steps are following: # mkfs.btrfs -f -m raid1 -d raid1 <dev0> <dev1> # echo 1 > /sys/block/`basename <dev0>`/device/delete # mount -o degraded <dev1> <mnt> The reason was that the counter -- bi_remaining was wrong. If the missing or unwriteable device was the last device in the mapping array, we would not submit the original bio, so we shouldn't increase bi_remaining of it in btrfs_end_bio(), or we would skip the final endio handle. Fix this problem by adding a flag into btrfs bio structure. If we submit the original bio, we will set the flag, and we increase bi_remaining counter, or we don't. Though there is another way to fix it -- decrease bi_remaining counter of the original bio when we make sure the original bio is not submitted, this method need add more check and is easy to make mistake. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
While running balance, scrub, fsstress concurrently we hit the following kernel crash: [56561.448845] BTRFS info (device sde): relocating block group 11005853696 flags 132 [56561.524077] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 [56561.524237] IP: [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs] [56561.524297] PGD 9be28067 PUD 7f3dd067 PMD 0 [56561.524325] Oops: 0000 [#1] SMP [....] [56561.527237] Call Trace: [56561.527309] [<ffffffffa038980e>] scrub_enumerate_chunks+0x24e/0x490 [btrfs] [56561.527392] [<ffffffff810abe00>] ? abort_exclusive_wait+0x50/0xb0 [56561.527476] [<ffffffffa038add4>] btrfs_scrub_dev+0x1a4/0x530 [btrfs] [56561.527561] [<ffffffffa0368107>] btrfs_ioctl+0x13f7/0x2a90 [btrfs] [56561.527639] [<ffffffff811c82f0>] do_vfs_ioctl+0x2e0/0x4c0 [56561.527712] [<ffffffff8109c384>] ? vtime_account_user+0x54/0x60 [56561.527788] [<ffffffff810f768c>] ? __audit_syscall_entry+0x9c/0xf0 [56561.527870] [<ffffffff811c8551>] SyS_ioctl+0x81/0xa0 [56561.527941] [<ffffffff815707f7>] tracesys+0xdd/0xe2 [...] [56561.528304] RIP [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs] [56561.528395] RSP <ffff88004c0f5be8> [56561.528454] CR2: 0000000000000078 This is because in btrfs_relocate_chunk(), we will free @bdev directly while scrub may still hold extent mapping, and may access freed memory. Fix this problem by wrapping freeing @bdev work into free_extent_map() which is based on reference count. Reported-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
When run scrub with balance, sometimes -ENOENT will be returned, since in scrub_enumerate_chunks() will search dev_extent in *COMMIT_ROOT*, but btrfs_lookup_block_group() will search block group in *MEMORY*, so if a chunk is removed but not committed, -ENOENT will be returned. However, there is no need to stop scrubbing since other chunks may be scrubbed without problem. So this patch changes the behavior to skip removed chunks and continue to scrub the rest. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
When we mounted the filesystem after the crash, we got the following message: BTRFS error (device xxx): block group xxxx has wrong amount of free space BTRFS error (device xxx): failed to load free space cache for block group xxx It is because we didn't update the metadata of the allocated space (in extent tree) until the file data was written into the disk. During this time, there was no information about the allocated spaces in either the extent tree nor the free space cache. when we wrote out the free space cache at this time (commit transaction), those spaces were lost. In fact, only the free space that is used to store the file data had this problem, the others didn't because the metadata of them is updated in the same transaction context. There are many methods which can fix the above problem - track the allocated space, and write it out when we write out the free space cache - account the size of the allocated space that is used to store the file data, if the size is not zero, don't write out the free space cache. The first one is complex and may make the performance drop down. This patch chose the second method, we use a per-block-group variant to account the size of that allocated space. Besides that, we also introduce a per-block-group read-write semaphore to avoid the race between the allocation and the free space cache write out. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
This patch makes the free space cache write out functions more readable, and beisdes that, it also reduces the stack space that the function -- __btrfs_write_out_cache uses from 194bytes to 144bytes. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Filipe Manana 提交于
The lock_wq wait queue is not used anywhere, therefore just remove it. On a x86_64 system, this reduced sizeof(struct extent_buffer) from 320 bytes down to 296 bytes, which means a 4Kb page can now be used for 13 extent buffers instead of 12. Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Chris Mason 提交于
The Btrfs tree trylock function is poorly named. It always takes the spinlock and backs off if the blocking lock is held. This can lead to surprising lockups because people expect it to really be a trylock. This commit makes it a pure trylock, both for the spinlock and the blocking lock. It also reworks the nested lock handling slightly to avoid taking the read lock while a spinning write lock might be held. Signed-off-by: NChris Mason <clm@fb.com>
-
- 18 6月, 2014 2 次提交
-
-
由 Kinglong Mee 提交于
Commit 561f0ed4 (nfsd4: allow large readdirs) introduces a bug about readdir the root of pseudofs. Call xdr_truncate_encode() revert encoded name when skipping. Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 NeilBrown 提交于
If nfsd needs to recall a delegation for some reason it implies that there is contention on the file, so further delegations should not be handed out. The current code fails to do so, and the result is effectively a live-lock under some workloads: a client attempting a conflicting operation on a read-delegated file receives NFS4ERR_DELAY and retries the operation, but by the time it retries the server may already have given out another delegation. We could simply avoid delegations for (say) 30 seconds after any recall, but this is probably too heavy handed. We could keep a list of inodes (or inode numbers or filehandles) for recalled delegations, but that requires memory allocation and searching. The approach taken here is to use a bloom filter to record the filehandles which are currently blocked from delegation, and to accept the cost of a few false positives. We have 2 bloom filters, each of which is valid for 30 seconds. When a delegation is recalled the filehandle is added to one filter and will remain disabled for between 30 and 60 seconds. We keep a count of the number of filehandles that have been added, so when that count is zero we can bypass all other tests. The bloom filters have 256 bits and 3 hash functions. This should allow a couple of dozen blocked filehandles with minimal false positives. If many more filehandles are all blocked at once, behaviour will degrade towards rejecting all delegations for between 30 and 60 seconds, then resetting and allowing new delegations. Signed-off-by: NNeilBrown <neilb@suse.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-