- 28 8月, 2011 9 次提交
-
-
由 J. Bruce Fields 提交于
This flag doesn't really buy us anything. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Share some common code, stop doing silly things like initializing a list head immediately before adding it to a list, etc. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
These appear to be generic (for both open and lock owners), but they're actually just for open owners. This has confused me more than once. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
For all the usual reasons. (Type safety, readability.) Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
The server is returning nfserr_resource for both permanent errors and for errors (like allocation failures) that might be resolved by retrying later. Save nfserr_resource for the former and use delay/jukebox for the latter. Cc: stable@kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Boaz Harrosh 提交于
Even if we fail to write a recovery record, we should still mark the client as having acquired its first state. Otherwise we leave 4.1 clients with indefinite ERR_GRACE returns. However, an inability to write stable storage records may cause failures of reboot recovery, and the problem should still be brought to the server administrator's attention. So, make sure the error is logged. These errors shouldn't normally be triggered on a corectly functioning server--this isn't a case where a misconfigured client could spam the logs. Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Move around some of this code, simplify a bit. Reviewed-by: NBoaz Harrosh <bharrosh@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Acked-by: NJim Rees <rees@umich.edu> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
A client that wants to execute a file must be able to read it. Read opens over nfs are therefore implicitly allowed for executable files even when those files are not readable. NFSv2/v3 get this right by using a passed-in NFSD_MAY_OWNER_OVERRIDE on read requests, but NFSv4 has gotten this wrong ever since dc730e17 "nfsd4: fix owner-override on open", when we realized that the file owner shouldn't override permissions on non-reclaim NFSv4 opens. So we can't use NFSD_MAY_OWNER_OVERRIDE to tell nfsd_permission to allow reads of executable files. So, do the same thing we do whenever we encounter another weird NFS permission nit: define yet another NFSD_MAY_* flag. The industry's future standardization on 128-bit processors will be motivated primarily by the need for integers with enough bits for all the NFSD_MAY_* flags. Reported-by: NLeonardo Borda <leonardoborda@gmail.com> Cc: stable@kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 27 8月, 2011 6 次提交
-
-
由 J. Bruce Fields 提交于
Userspace shouldn't have a use for these constants. Nothing here is used outside fs/nfsd. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
The nfsd4 code has a bunch of special exceptions for error returns which map nfserr_symlink to other errors. In fact, the spec makes it clear that nfserr_symlink is to be preferred over less specific errors where possible. The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink related error returns.", which claims that these special exceptions are represent an NFSv4 break from v2/v3 tradition--when in fact the symlink error was introduced with v4. I suspect what happened was pynfs tests were written that were overly faithful to the (known-incomplete) rfc3530 error return lists, and then code was fixed up mindlessly to make the tests pass. Delete these unnecessary exceptions. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Zero means "I don't care what kind of file this is". And that's probably what we want--acls are also settable at least on directories, and if the filesystem doesn't want them on other objects, leave it to it to complain. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Add some more comments, simplify logic, do & S_IFMT just once, name "type" more helpfully. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
We allow the fh_verify caller to specify that any object *except* those of a given type is allowed, by passing a negative type. But only one caller actually uses it. Open-code that check in the one caller. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
A slightly unconventional approach to make the code more compact I could live with, but let's give the poor reader *some* chance. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 20 8月, 2011 7 次提交
-
-
由 Eric Dumazet 提交于
Use NUMA aware allocations to reduce latencies and increase throughput. sunrpc kthreads can use kthread_create_on_node() if pool_mode is "percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can also take into account NUMA node affinity for memory allocations. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: "J. Bruce Fields" <bfields@fieldses.org> CC: Neil Brown <neilb@suse.de> CC: David Miller <davem@davemloft.net> Reviewed-by: NGreg Banks <gnb@fastmail.fm> [bfields@redhat.com: fix up caller nfs41_callback_up] Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
There's an incorrect comment here. Also clean up the logic: the "rdlease" and "wrlease" locals are confusingly named, and don't really add anything since we can make a decision as soon as we hit one of these cases. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
We currently use a bit in fl_flags to record whether a lease is being broken, and set fl_type to the type (RDLCK or UNLCK) that it will eventually have. This means that once the lease break starts, we forget what the lease's type *used* to be. Breaking a read lease will then result in blocking read opens, even though there's no conflict--because the lease type is now F_UNLCK and we can no longer tell whether it was previously a read or write lease. So, instead keep fl_type as the original type (the type which we enforce), and keep track of whether we're unlocking or merely downgrading by replacing the single FL_INPROGRESS flag by FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags. To get this right we also need to track separate downgrade and break times, to handle the case where a write-leased file gets conflicting opens first for read, then later for write. (I first considered just eliminating the downgrade behavior completely--nfsv4 doesn't need it, and nobody as far as I can tell actually uses it currently--but Jeremy Allison tells me that Windows oplocks do behave this way, so Samba will probably use this some day.) Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
F_INPROGRESS isn't exposed to userspace. To me it makes more sense in fl_flags.... Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Use a helper function, to simplify upcoming changes. Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
Without this, an attempt to open a device special file without first stat'ing it will fail. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
The set of errors here does *not* agree with the set of errors specified in the rfc! While we're there, turn this macros into a function, for the usual reasons, and move it to the one place where it's actually used. Cc: stable@kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 17 8月, 2011 1 次提交
-
-
由 Bernd Schubert 提交于
Fan Yong <yong.fan@whamcloud.com> noticed setting FMODE_32bithash wouldn't work with nfsd v4, as nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530 cookies have a 64 bit type and cookies are also defined as u64 in 'struct nfsd4_readdir'. So remove the test for >32-bit values. Cc: stable@kernel.org Signed-off-by: NBernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 08 8月, 2011 1 次提交
-
-
由 Linus Torvalds 提交于
Al points out that the do_follow_link() helper function really is misnamed - it's about whether we should try to follow a symlink or not, not about actually doing the following. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 8月, 2011 10 次提交
-
-
由 Ari Savolainen 提交于
After commit 3567866b: "RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached" posix_acl_permission is being called with an unsupported flag and the permission check fails. This patch fixes the issue. Signed-off-by: NAri Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Linus Torvalds 提交于
The inode structure layout is largely random, and some of the vfs paths really do care. The path lookup in particular is already quite D$ intensive, and profiles show that accessing the 'inode->i_op->xyz' fields is quite costly. We already optimized the dcache to not unnecessarily load the d_op structure for members that are often NULL using the DCACHE_OP_xyz bits in dentry->d_flags, and this does something very similar for the inode ops that are used during pathname lookup. It also re-orders the fields so that the fields accessed by 'stat' are together at the beginning of the inode structure, and roughly in the order accessed. The effect of this seems to be in the 1-2% range for an empty kernel "make -j" run (which is fairly kernel-intensive, mostly in filename lookup), so it's visible. The numbers are fairly noisy, though, and likely depend a lot on exact microarchitecture. So there's more tuning to be done. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Gcc tends to generate better code with small integers, including the DCACHE_xyz flag tests - so move the common ones to be first in the list. Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and DCACHE_AUTOFS_PENDING values, their users no longer exists in the source tree. And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the common case to be a nice straight-line fall-through. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Boaz Harrosh 提交于
Export everything from ore need exporting. Change Kbuild and Kconfig to build ore.ko as an independent module. Import ore from exofs Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
ORE stands for "Objects Raid Engine" This patch is a mechanical rename of everything that was in ios.c and its API declaration to an ore.c and an osd_ore.h header. The ore engine will later be used by the pnfs objects layout driver. * File ios.c => ore.c * Declaration of types and API are moved from exofs.h to a new osd_ore.h * All used types are prefixed by ore_ from their exofs_ name. * Shift includes from exofs.h to osd_ore.h so osd_ore.h is independent, include it from exofs.h. Other than a pure rename there are no other changes. Next patch will move the ore into it's own module and will export the API to be used by exofs and later the layout driver Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
Exofs raid engine was saving on memory space by having a single layout-info, single pid, and a single device-table, global to the filesystem. Then passing a credential and object_id info at the io_state level, private for each inode. It would also devise this contraption of rotating the device table view for each inode->ino to spread out the device usage. This is not compatible with the pnfs-objects standard, demanding that each inode can have it's own layout-info, device-table, and each object component it's own pid, oid and creds. So: Bring exofs raid engine to be usable for generic pnfs-objects use by: * Define an exofs_comp structure that holds obj_id and credential info. * Break up exofs_layout struct to an exofs_components structure that holds a possible array of exofs_comp and the array of devices + the size of the arrays. * Add a "comps" parameter to get_io_state() that specifies the ids creds and device array to use for each IO. This enables to keep the layout global, but the device-table view, creds and IDs at the inode level. It only adds two 64bit to each inode, since some of these members already existed in another form. * ios raid engine now access layout-info and comps-info through the passed pointers. Everything is pre-prepared by caller for generic access of these structures and arrays. At the exofs Level: * Super block holds an exofs_components struct that holds the device array, previously in layout. The devices there are in device-table order. The device-array is twice bigger and repeats the device-table twice so now each inode's device array can point to a random device and have a round-robin view of the table, making it compatible to previous exofs versions. * Each inode has an exofs_components struct that is initialized at load time, with it's own view of the device table IDs and creds. When doing IO this gets passed to the io_state together with the layout. While preforming this change. Bugs where found where credentials with the wrong IDs where used to access the different SB objects (super.c). As well as some dead code. It was never noticed because the target we use does not check the credentials. Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
ios.c will be moving to an external library, for use by the objects-layout-driver. Remove from it some exofs specific functions. Also g_attr_logical_length is used both by inode.c and ios.c move definition to the later, to keep it independent Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
In future raid code we will need to know the IO offset/length and if it's a read or write to determine some of the array sizes we'll need. So add a new exofs_get_rw_state() API for use when writeing/reading. All other simple cases are left using the old way. The major change to this is that now we need to call exofs_get_io_state later at inode.c::read_exec and inode.c::write_exec when we actually know these things. So this patch is kept separate so I can test things apart from other changes. Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Linus Torvalds 提交于
The CLOEXE bit is magical, and for performance (and semantic) reasons we don't actually maintain it in the file descriptor itself, but in a separate bit array. Which means that when we show f_flags, the CLOEXE status is shown incorrectly: we show the status not as it is now, but as it was when the file was opened. Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit array. Uli needs this in order to re-implement the pfiles program: "For normal file descriptors (not sockets) this was the last piece of information which wasn't available. This is all part of my 'give Solaris users no reason to not switch' effort. I intend to offer the code to the util-linux-ng maintainers." Requested-by: NUlrich Drepper <drepper@akkadia.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
WARN_ONCE() is very annoying, in that it shows the stack trace that we don't care about at all, and also triggers various user-level "kernel oopsed" logic that we really don't care about. And it's not like the user can do anything about the applications (sshd) in question, it's a distro issue. Requested-by: Andi Kleen <andi@firstfloor.org> (and many others) Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 8月, 2011 6 次提交
-
-
由 Jeff Layton 提交于
The loop around lookup_one_len doesn't handle the case where it might return a negative dentry, which can cause an oops on the next pass through the loop. Check for that and break out of the loop with an error of -ENOENT if there is one. Fixes the panic reported here: https://bugzilla.redhat.com/show_bug.cgi?id=727927Reported-by: NTR Bentley <home@trarbentley.net> Reported-by: NIain Arnell <iarnell@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: stable@kernel.org Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Jeff Layton 提交于
Regression from 2.6.39... The delimiters in the prefixpath are not being converted based on whether posix paths are in effect. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=727834Reported-and-Tested-by: NIain Arnell <iarnell@gmail.com> Reported-by: NPatrick Oltmann <patrick.oltmann@gmx.net> Cc: Pavel Shilovsky <piastryyy@gmail.com> Cc: stable@kernel.org Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Boaz Harrosh 提交于
In the general raid-group case the truncate was wrong in that it did not also fix the object length of the neighboring groups. There are two bad cases in the old code: 1. Space that should be freed was not. 2. If a file That was big is truncated small, then made bigger again, the holes would not contain zeros but could expose old data. (If the growing of the file expands to more than a full groups cycle + group size (> S + T)) Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
Small cleanup that unifies duplicated code used in both the error and success cases Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
Since the beginning we realloced the sbi structure when a bigger then one device table was specified. (I know that was really stupid). Then much later when "register bdi" was added (By Jens) it was registering the pointer to sbi->bdi before the realloc. We never saw this problem because up till now the realloc did not do anything since the device table was small enough to fit in the original allocation. But once we starting testing with large device tables (Bigger then 28) we noticed the crash of writeback operating on a deallocated pointer. * Avoid the all mess by allocating the device-table as a second array and get rid of the variable-sized structure and the rest of this mess. * Take the chance to clean near by structures and comments. * Add a needed dprint on startup to indicate the loaded layout. * Also move the bdi registration to the very end because it will only fail in a low memory, which will probably fail before hand. There are many more likely causes to not load before that. This way the error handling is made simpler. (Just doing this would be enough to fix the BUG) Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
由 Boaz Harrosh 提交于
Now that pnfs-osd has hit mainline we can remove exofs's private header. (And the FIXME comment) Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-