- 25 10月, 2016 1 次提交
-
-
由 Jeff Layton 提交于
Bruce was hitting some lockdep warnings in testing, showing that we could hit a deadlock with the new CB_NOTIFY_LOCK handling, involving a rather complex situation involving four different spinlocks. The crux of the matter is that we end up taking the nn->client_lock in the lm_notify handler. The simplest fix is to just declare a new per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 08 10月, 2016 4 次提交
-
-
由 Alexey Dobriyan 提交于
Current supplementary groups code can massively overallocate memory and is implemented in a way so that access to individual gid is done via 2D array. If number of gids is <= 32, memory allocation is more or less tolerable (140/148 bytes). But if it is not, code allocates full page (!) regardless and, what's even more fun, doesn't reuse small 32-entry array. 2D array means dependent shifts, loads and LEAs without possibility to optimize them (gid is never known at compile time). All of the above is unnecessary. Switch to the usual trailing-zero-len-array scheme. Memory is allocated with kmalloc/vmalloc() and only as much as needed. Accesses become simpler (LEA 8(gi,idx,4) or even without displacement). Maximum number of gids is 65536 which translates to 256KB+8 bytes. I think kernel can handle such allocation. On my usual desktop system with whole 9 (nine) aux groups, struct group_info shrinks from 148 bytes to 44 bytes, yay! Nice side effects: - "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing, - fix little mess in net/ipv4/ping.c should have been using GROUP_AT macro but this point becomes moot, - aux group allocation is persistent and should be accounted as such. Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.bySigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: Vasily Kulikov <segoon@openwall.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anna Schumaker 提交于
I only implemented the sync version of this call, since it's the easiest. I can simply call vfs_copy_range() and have the vfs do the right thing for the filesystem being exported. Signed-off-by: NAnna Schumaker <bjschuma@netapp.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Eric Sandeen reports that xfs can return this if filesystem corruption prevented completing the operation. Reported-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
No need to spam the logs here. The only drawback is losing information if we ever encounter two different unmapped errors, but in practice we've rarely see even one. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 28 9月, 2016 1 次提交
-
-
由 Deepa Dinamani 提交于
current_fs_time() uses struct super_block* as an argument. As per Linus's suggestion, this is changed to take struct inode* as a parameter instead. This is because the function is primarily meant for vfs inode timestamps. Also the function was renamed as per Arnd's suggestion. Change all calls to current_fs_time() to use the new current_time() function instead. current_fs_time() will be deleted. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 27 9月, 2016 7 次提交
-
-
由 J. Bruce Fields 提交于
A setclientid_confirm with (clientid, verifier) both matching an existing confirmed record is assumed to be a replay, but if the verifier doesn't match, it shouldn't be. This would be a very rare case, except that clients following https://tools.ietf.org/html/rfc7931#section-5.8 may depend on the failure. Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
NFSv4.1 has built-in trunking support that allows a client to determine whether two connections to two different IP addresses are actually to the same server. NFSv4.0 does not, but RFC 7931 attempts to provide clients a means to do this, basically by performing a SETCLIENTID to one address and confirming it with a SETCLIENTID_CONFIRM to the other. Linux clients since 05f4c350 "NFS: Discover NFSv4 server trunking when mounting" implement a variation on this suggestion. It is possible that other clients do too. This depends on the clientid and verifier not being accepted by an unrelated server. Since both are 64-bit values, that would be very unlikely if they were random numbers. But they aren't: knfsd generates the 64-bit clientid by concatenating the 32-bit boot time (in seconds) and a counter. This makes collisions between clientids generated by the same server extremely unlikely. But collisions are very likely between clientids generated by servers that boot at the same time, and it's quite common for multiple servers to boot at the same time. The verifier is a concatenation of the SETCLIENTID time (in seconds) and a counter, so again collisions between different servers are likely if multiple SETCLIENTIDs are done at the same time, which is a common case. Therefore recent NFSv4.0 clients may decide two different servers are really the same, and mount a filesystem from the wrong server. Fortunately the Linux client, since 55b9df93 "nfsv4/v4.1: Verify the client owner id during trunking detection", only does this when given the non-default "migration" mount option. The fault is really with RFC 7931, and needs a client fix, but in the meantime we can mitigate the chance of these collisions by randomizing the starting value of the counters used to generate clientids and verifiers. Reported-by: NFrank Sorenson <fsorenso@redhat.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
If we are using v4.1+, then we can send notification when contended locks become free. Inform the client of that fact. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
It's possible for a client to call in on a lock that is blocked for a long time, but discontinue polling for it. A malicious client could even set a lock on a file, and then spam the server with failing lock requests from different lockowners that pile up in a DoS attack. Add the blocked lock structures to a per-net namespace LRU when hashing them, and timestamp them. If the lock request is not revisited after a lease period, we'll drop it under the assumption that the client is no longer interested. This also gives us a mechanism to clean up these objects at server shutdown time as well. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
Create a new per-lockowner+per-inode structure that contains a file_lock. Have nfsd4_lock add this structure to the lockowner's list prior to setting the lock. Then call the vfs and request a blocking lock (by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED back, then we dequeue the block structure and free it. When the next lock request comes in, we'll look for an existing block for the same filehandle and dequeue and reuse it if there is one. When the lock comes free (a'la an lm_notify call), we dequeue it from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to inform the client that it should retry the lock request. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
Add the encoding/decoding for CB_NOTIFY_LOCK operations. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Vasily Averin 提交于
By design notifier can be registered once only, however nfsd registers the same inetaddr notifiers per net-namespace. When this happen it corrupts list of notifiers, as result some notifiers can be not called on proper event, traverse on list can be cycled forever, and second unregister can access already freed memory. Cc: stable@vger.kernel.org fixes: 36684996 ("nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain") Signed-off-by: NVasily Averin <vvs@virtuozzo.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 23 9月, 2016 1 次提交
-
-
由 Jeff Layton 提交于
nfserr is big-endian, so we should convert it to host-endian before printing it. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 22 9月, 2016 1 次提交
-
-
由 Jan Kara 提交于
inode_change_ok() will be resposible for clearing capabilities and IMA extended attributes and as such will need dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better relect that it does also some modifications in addition to checks. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 17 9月, 2016 2 次提交
-
-
由 Jeff Layton 提交于
We already have that info in the client pointer. No need to pass around a copy. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
We currently can hit a deadlock (of sorts) when trying to use flexfiles layouts with XFS. XFS will call break_layout when something wants to write to the file. In the case of the (super-simple) flexfiles layout driver in knfsd, the MDS and DS are the same machine. The client can get a layout and then issue a v3 write to do its I/O. XFS will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to be issued to the client. The client however can't return the layout until the v3 WRITE completes, but XFS won't allow the write to proceed until the layout is returned. Christoph says: XFS only cares about block-like layouts where the client has direct access to the file blocks. I'd need to look how to propagate the flag into break_layout, but in principle we don't need to do any recalls on truncate ever for file and flexfile layouts. If we're never going to recall the layout, then we don't even need to set the lease at all. Just skip doing so on flexfiles layouts by adding a new flag to struct nfsd4_layout_ops and skipping the lease setting and removal when that flag is true. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 13 8月, 2016 1 次提交
-
-
由 Jeff Layton 提交于
nfsd4_lock will take the st_mutex before working with the stateid it gets, but between the time when we drop the cl_lock and take the mutex, the stateid could become unhashed (a'la FREE_STATEID). If that happens the lock stateid returned to the client will be forgotten. Fix this by first moving the st_mutex acquisition into lookup_or_create_lock_state. Then, have it check to see if the lock stateid is still hashed after taking the mutex. If it's not, then put the stateid and try the find/create again. Signed-off-by: NJeff Layton <jlayton@redhat.com> Tested-by: NAlexey Kodanev <alexey.kodanev@oracle.com> Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively. Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 12 8月, 2016 1 次提交
-
-
由 Chuck Lever 提交于
When running LTP's nfslock01 test, the Linux client can send a LOCK and a FREE_STATEID request at the same time. The outcome is: Frame 324 R OPEN stateid [2,O] Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64 Frame 115008 R LOCK stateid [1,L] Frame 115012 C WRITE stateid [0,L] offset 672000 len 64 Frame 115016 R WRITE NFS4_OK Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64 Frame 115022 R LOCKU NFS4_OK Frame 115025 C FREE_STATEID stateid [2,L] Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64 Frame 115029 R FREE_STATEID NFS4_OK Frame 115030 R LOCK stateid [3,L] Frame 115034 C WRITE stateid [0,L] offset 672128 len 64 Frame 115038 R WRITE NFS4ERR_BAD_STATEID In other words, the server returns stateid L in a successful LOCK reply, but it has already released it. Subsequent uses of stateid L fail. To address this, protect the generation check in nfsd4_free_stateid with the st_mutex. This should guarantee that only one of two outcomes occurs: either LOCK returns a fresh valid stateid, or FREE_STATEID returns NFS4ERR_LOCKS_HELD. Reported-by: NAlexey Kodanev <alexey.kodanev@oracle.com> Fix-suggested-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NAlexey Kodanev <alexey.kodanev@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 11 8月, 2016 1 次提交
-
-
由 Josef Bacik 提交于
b44061d0 introduced a dentry ref counting bug. Previously we were grabbing one ref to dchild in nfsd_create(), but with the creation of nfsd_create_locked() we have a ref for dchild from the lookup in nfsd_create(), and then another ref in nfsd_create_locked(). The ref from the lookup in nfsd_create() is never dropped and results in dentries still in use at unmount. Signed-off-by: NJosef Bacik <jbacik@fb.com> Fixes: b44061d0 "nfsd: reorganize nfsd_create" Reported-by: Nkernel test robot <xiaolong.ye@intel.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 05 8月, 2016 8 次提交
-
-
由 Dan Carpenter 提交于
We changed this around in f135af1041f ('nfsd: reorganize nfsd_create') so "dchild" can't be an error pointer any more. Also, dchild can't be NULL here (and dput would already handle this even if it was). Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
We need an fh_verify to make sure we at least have a dentry, but actual permission checks happen later. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Minor cleanup, no change in behavior. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
vfs_{create,mkdir,mknod} each begin with a call to may_create(), which returns EEXIST if the object already exists. This check is therefore unnecessary. (In the NFSv2 case, nfsd_proc_create also has such a check. Contrary to RFC 1094, our code seems to believe that a CREATE of an existing file should succeed. I'm leaving that behavior alone.) Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
There's some odd logic in nfsd_create() that allows it to be called with the parent directory either locked or unlocked. The only already-locked caller is NFSv2's nfsd_proc_create(). It's less confusing to split out the unlocked case into a separate function which the NFSv2 code can call directly. Also fix some comments while we're here. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Create and other nfsd ops generally assume we can call lookup_one_len on inodes with S_IFDIR set. Al says that this assumption isn't true in general, though it should be for the filesystem objects nfsd sees. Add a check just to make sure our assumption isn't violated. Remove a couple checks for i_op->lookup in create code. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
lookup_one_len already has this check. The only effect of this patch is to return access instead of perm in the 0-length-filename case. I actually prefer nfserr_perm (or _inval?), but I doubt anyone cares. The isdotent check seems redundant too, but I worry that some client might actually care about that strange nfserr_exist error. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Oleg Drokin 提交于
When doing a create (mkdir/mknod) on a name, it's worth checking the name exists first before returning EACCES in case the directory is not writeable by the user. This makes return values on the client more consistent regardless of whenever the entry there is cached in the local cache or not. Another positive side effect is certain programs only expect EEXIST in that case even despite POSIX allowing any valid error to be returned. Signed-off-by: NOleg Drokin <green@linuxhacker.ru> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 16 7月, 2016 4 次提交
-
-
由 Jeff Layton 提交于
If the underlying filesystem supports multiple layout types, then there is little reason not to advertise that fact to clients and let them choose what type to use. Turn the ex_layout_type field into a bitfield. For each supported layout type, we set a bit in that field. When the client requests a layout, ensure that the bit for that layout type is set. When the client requests attributes, send back a list of supported types. Signed-off-by: NJeff Layton <jlayton@poochiereds.net> Reviewed-by: NWeston Andros Adamson <dros@primarydata.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
nfsd4_release_lockowner finds a lock owner that has no lock state, and drops cl_lock. Then release_lockowner picks up cl_lock and unhashes the lock owner. During the window where cl_lock is dropped, I don't see anything preventing a concurrent nfsd4_lock from finding that same lock owner and adding lock state to it. Move release_lockowner() into nfsd4_release_lockowner and hang onto the cl_lock until after the lock owner's state cannot be found again. Found by inspection, we don't currently have a reproducer. Fixes: 2c41beb0 ("nfsd: reduce cl_lock thrashing in ... ") Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
These values are all multiples of 4 already, so there's no change in behavior from this patch. But perhaps this will prevent mistakes in the future. Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Benjamin Coddington 提交于
Instead of creeping pnfs layout configuration into filesystems, move the definition of block-based export operations under a more abstract configuration. Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Acked-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 14 7月, 2016 6 次提交
-
-
由 Christophe JAILLET 提交于
Silent a few smatch warnings about indentation Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Oleg Drokin 提交于
Those are now defined in fs/nfsd/vfs.h Signed-off-by: NOleg Drokin <green@linuxhacker.ru> Reviewed-by: NJeff Layton <jlayton@poochiereds.net> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tom Haynes 提交于
Have a simple flex file server where the mds (NFSv4.1 or NFSv4.2) is also the ds (NFSv3). I.e., the metadata and the data file are the exact same file. This will allow testing of the flex file client. Simply add the "pnfs" export option to your export in /etc/exports and mount from a client that supports flex files. Signed-off-by: NTom Haynes <loghyr@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tom Haynes 提交于
Signed-off-by: NTom Haynes <loghyr@primarydata.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJeff Layton <jlayton@poochiereds.net> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Andrew Elble 提交于
This addresses the conundrum referenced in RFC5661 18.35.3, and will allow clients to return state to the server using the machine credentials. The biggest part of the problem is that we need to allow the client to send a compound op with integrity/privacy on mounts that don't have it enabled. Add server support for properly decoding and using spo_must_enforce and spo_must_allow bits. Add support for machine credentials to be used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN, and TEST/FREE STATEID. Implement a check so as to not throw WRONGSEC errors when these operations are used if integrity/privacy isn't turned on. Without this, Linux clients with credentials that expired while holding delegations were getting stuck in an endless loop. Signed-off-by: NAndrew Elble <aweits@rit.edu> Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Andrew Elble 提交于
Rename mach_creds_match() to nfsd4_mach_creds_match() and un-staticify Signed-off-by: NAndrew Elble <aweits@rit.edu> Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 25 6月, 2016 1 次提交
-
-
由 Ben Hutchings 提交于
Use set_posix_acl, which includes proper permission checks, instead of calling ->set_acl directly. Without this anyone may be able to grant themselves permissions to a file by setting the ACL. Lock the inode to make the new checks atomic with respect to set_acl. (Also, nfsd was the only caller of set_acl not locking the inode, so I suspect this may fix other races.) This also simplifies the code, and ensures our ACLs are checked by posix_acl_valid. The permission checks and the inode locking were lost with commit 4ac7249e, which changed nfsd to use the set_acl inode operation directly instead of going through xattr handlers. Reported-by: NDavid Sinquin <david@sinquin.eu> [agreunba@redhat.com: use set_posix_acl] Fixes: 4ac7249e Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 24 6月, 2016 1 次提交
-
-
由 Eric W. Biederman 提交于
Today what is normally called data (the mount options) is not passed to fill_super through mount_ns. Pass the mount options and the namespace separately to mount_ns so that filesystems such as proc that have mount options, can use mount_ns. Pass the user namespace to mount_ns so that the standard permission check that verifies the mounter has permissions over the namespace can be performed in mount_ns instead of in each filesystems .mount method. Thus removing the duplication between mqueuefs and proc in terms of permission checks. The extra permission check does not currently affect the rpc_pipefs filesystem and the nfsd filesystem as those filesystems do not currently allow unprivileged mounts. Without unpvileged mounts it is guaranteed that the caller has already passed capable(CAP_SYS_ADMIN) which guarantees extra permission check will pass. Update rpc_pipefs and the nfsd filesystem to ensure that the network namespace reference is always taken in fill_super and always put in kill_sb so that the logic is simpler and so that errors originating inside of fill_super do not cause a network namespace leak. Acked-by: NSeth Forshee <seth.forshee@canonical.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-