Commits · 28dde241cc65c9464b7627d9a9ed3a66e4df2586 · openeuler / raspberrypi-kernel

28 Aug, 2011 9 commits

Committed by J. Bruce Fields on Aug 22, 2011

This flag doesn't really buy us anything.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

28dde241

nfsd4: cleanup lock/stateowner initialization · ff194bd9

Committed by J. Bruce Fields on Aug 12, 2011

Share some common code, stop doing silly things like initializing a list
head immediately before adding it to a list, etc.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ff194bd9

nfsd4: name openowner data structures more clearly · 506f275f

Committed by J. Bruce Fields on Aug 11, 2011

These appear to be generic (for both open and lock owners), but they're
actually just for open owners. This has confused me more than once.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

506f275f

nfsd4: replace some macros by functions · ddc04c41

Committed by J. Bruce Fields on Jul 30, 2011

For all the usual reasons.  (Type safety, readability.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ddc04c41
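
The "type safety, readability" argument in ddc04c41 can be illustrated with a minimal sketch. The `owner_id` struct and both helpers are hypothetical and are not taken from the patch; they only contrast a macro with the equivalent inline function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type, for illustration only. */
struct owner_id {
    uint32_t boot_time;
    uint32_t id;
};

/* Macro form: accepts any operands that happen to have these fields. */
#define SAME_OWNER_MACRO(a, b) \
    ((a)->boot_time == (b)->boot_time && (a)->id == (b)->id)

/* Function form: the compiler checks both argument types. */
static inline bool same_owner(const struct owner_id *a,
                              const struct owner_id *b)
{
    assert(a && b);
    return a->boot_time == b->boot_time && a->id == b->id;
}
```

Passing a pointer to an unrelated struct to `same_owner()` draws a compiler diagnostic, while the macro would silently accept it as long as the field names match.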

nfsd4: stop using nfserr_resource for transitory errors · 3e772463

Committed by J. Bruce Fields on Aug 10, 2011

The server is returning nfserr_resource for both permanent errors and
for errors (like allocation failures) that might be resolved by retrying
later.  Save nfserr_resource for the former and use delay/jukebox for
the latter.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

3e772463
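
The split described in 3e772463 between permanent and transitory errors could look roughly like the sketch below. The status values are the ones assigned in RFC 3530; the helper name and the choice of `ENOMEM` as the only "retry later" case are assumptions made for illustration, not the patch's actual logic.

```c
#include <assert.h>
#include <errno.h>

/* Status values as assigned in RFC 3530. */
enum { NFS4_OK = 0, NFS4ERR_DELAY = 10008, NFS4ERR_RESOURCE = 10018 };

/*
 * Hypothetical helper: classify a local failure.
 * Assumption: ENOMEM stands in for "might succeed on retry";
 * every other error is treated as permanent for this sketch.
 */
static int sketch_status_for_errno(int err)
{
    assert(err >= 0);
    if (err == 0)
        return NFS4_OK;
    if (err == ENOMEM)
        return NFS4ERR_DELAY;     /* transient: ask the client to retry */
    return NFS4ERR_RESOURCE;      /* permanent: retrying will not help */
}
```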

nfsd4: fix failure to end nfsd4 grace period · 6577aac0

Committed by Boaz Harrosh on Aug 12, 2011

Even if we fail to write a recovery record, we should still mark the
client as having acquired its first state.  Otherwise we leave 4.1
clients with indefinite ERR_GRACE returns.

However, an inability to write stable storage records may cause failures
of reboot recovery, and the problem should still be brought to the
server administrator's attention.

So, make sure the error is logged.

These errors shouldn't normally be triggered on a correctly functioning
server--this isn't a case where a misconfigured client could spam the
logs.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

6577aac0

nfsd4: simplify recovery dir setting · 48483bf2

Committed by J. Bruce Fields on Aug 26, 2011

Move around some of this code, simplify a bit.
Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

48483bf2

nfsd: prettify NFSD_MAY_* flag definitions · 8e82fa8f

Committed by J. Bruce Fields on Aug 25, 2011

Acked-by: Jim Rees <rees@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

8e82fa8f

nfsd4: permit read opens of executable-only files · a043226b

Committed by J. Bruce Fields on Aug 25, 2011

A client that wants to execute a file must be able to read it.  Read
opens over nfs are therefore implicitly allowed for executable files
even when those files are not readable.

NFSv2/v3 get this right by using a passed-in NFSD_MAY_OWNER_OVERRIDE on
read requests, but NFSv4 has gotten this wrong ever since
dc730e17 "nfsd4: fix owner-override on
open", when we realized that the file owner shouldn't override
permissions on non-reclaim NFSv4 opens.

So we can't use NFSD_MAY_OWNER_OVERRIDE to tell nfsd_permission to allow
reads of executable files.

So, do the same thing we do whenever we encounter another weird NFS
permission nit: define yet another NFSD_MAY_* flag.

The industry's future standardization on 128-bit processors will be
motivated primarily by the need for integers with enough bits for all
the NFSD_MAY_* flags.
Reported-by: Leonardo Borda <leonardoborda@gmail.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a043226b
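
The idea in a043226b, an extra access flag that lets a read fall back to execute permission, can be sketched as follows. All names and bit values here are hypothetical stand-ins, not the kernel's `NFSD_MAY_*` definitions, and the check is far simpler than `nfsd_permission`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag and mode bits, not the kernel's definitions. */
#define SKETCH_MAY_READ          0x1u
#define SKETCH_MAY_READ_IF_EXEC  0x2u
#define SKETCH_MODE_R            04u
#define SKETCH_MODE_X            01u

static bool sketch_read_allowed(unsigned mode, unsigned access)
{
    assert(access & SKETCH_MAY_READ);
    if (mode & SKETCH_MODE_R)
        return true;
    /* Fall back to execute permission only when the caller opted in. */
    return (access & SKETCH_MAY_READ_IF_EXEC) && (mode & SKETCH_MODE_X);
}
```

Because the fallback is a separate flag, callers that must not grant it (anything other than a read open, in this sketch) simply leave it unset.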

27 Aug, 2011 6 commits

Remove include/linux/nfsd/const.h · c10bd39d

Committed by J. Bruce Fields on Aug 19, 2011

Userspace shouldn't have a use for these constants.  Nothing here is
used outside fs/nfsd.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

c10bd39d

nfsd4: it's OK to return nfserr_symlink · 75c096f7

Committed by J. Bruce Fields on Aug 15, 2011

The nfsd4 code has a bunch of special exceptions for error returns which
map nfserr_symlink to other errors.

In fact, the spec makes it clear that nfserr_symlink is to be preferred
over less specific errors where possible.

The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink
related error returns.", which claims that these special exceptions
represent an NFSv4 break from v2/v3 tradition--when in fact the symlink
error was introduced with v4.

I suspect what happened was pynfs tests were written that were overly
faithful to the (known-incomplete) rfc3530 error return lists, and then
code was fixed up mindlessly to make the tests pass.

Delete these unnecessary exceptions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

75c096f7

nfsd4: fix incorrect comment in nfsd4_set_nfs4_acl · e281d810

Committed by J. Bruce Fields on Aug 15, 2011

Zero means "I don't care what kind of file this is".  And that's
probably what we want--acls are also settable at least on directories,
and if the filesystem doesn't want them on other objects, leave it to it
to complain.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e281d810

nfsd: clean up nfsd_mode_check() · e10f9e14

Committed by J. Bruce Fields on Aug 15, 2011

Add some more comments, simplify logic, do & S_IFMT just once, name
"type" more helpfully.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e10f9e14

nfsd: open-code special directory-hardlink check · 7d818a7b

Committed by J. Bruce Fields on Aug 15, 2011

We allow the fh_verify caller to specify that any object *except* those
of a given type is allowed, by passing a negative type. But only one
caller actually uses it. Open-code that check in the one caller.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7d818a7b

nfsd4: clean up S_IS -> NF4 file type mapping · 3d2544b1

Committed by J. Bruce Fields on Aug 15, 2011

A slightly unconventional approach to make the code more compact I could
live with, but let's give the poor reader *some* chance.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

3d2544b1

20 Aug, 2011 7 commits

sunrpc: use better NUMA affinities · 11fd165c

Committed by Eric Dumazet on Jul 28, 2011

Use NUMA aware allocations to reduce latencies and increase throughput.

sunrpc kthreads can use kthread_create_on_node() if pool_mode is
"percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
also take into account NUMA node affinity for memory allocations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Neil Brown <neilb@suse.de>
CC: David Miller <davem@davemloft.net>
Reviewed-by: Greg Banks <gnb@fastmail.fm>
[bfields@redhat.com: fix up caller nfs41_callback_up]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

11fd165c

locks: setlease cleanup · c1f24ef4

Committed by J. Bruce Fields on Aug 19, 2011

There's an incorrect comment here.  Also clean up the logic: the
"rdlease" and "wrlease" locals are confusingly named, and don't really
add anything since we can make a decision as soon as we hit one of these
cases.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

c1f24ef4

locks: fix tracking of inprogress lease breaks · 778fc546

Committed by J. Bruce Fields on Jul 26, 2011

We currently use a bit in fl_flags to record whether a lease is being
broken, and set fl_type to the type (RDLCK or UNLCK) that it will
eventually have.  This means that once the lease break starts, we forget
what the lease's type *used* to be.  Breaking a read lease will then
result in blocking read opens, even though there's no conflict--because
the lease type is now F_UNLCK and we can no longer tell whether it was
previously a read or write lease.

So, instead keep fl_type as the original type (the type which we
enforce), and keep track of whether we're unlocking or merely
downgrading by replacing the single FL_INPROGRESS flag by
FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags.

To get this right we also need to track separate downgrade and break
times, to handle the case where a write-leased file gets conflicting
opens first for read, then later for write.

(I first considered just eliminating the downgrade behavior
completely--nfsv4 doesn't need it, and nobody as far as I can tell
actually uses it currently--but Jeremy Allison tells me that Windows
oplocks do behave this way, so Samba will probably use this some day.)
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

778fc546
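
The bookkeeping change in 778fc546 can be sketched as below: the lease keeps its original type, which is what conflict checks look at, and two separate pending flags record what the break will turn it into. The struct, enum and flag values are hypothetical and much simpler than `struct file_lock`.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_lease_type { SK_RDLCK, SK_WRLCK };       /* hypothetical */
#define SK_UNLOCK_PENDING    0x1u
#define SK_DOWNGRADE_PENDING 0x2u

struct sketch_lease {
    enum sketch_lease_type type;  /* original type, still enforced */
    unsigned flags;               /* what the break will turn it into */
};

/* A read open conflicts only with a write lease, broken or not. */
static bool sketch_read_open_blocks(const struct sketch_lease *l)
{
    assert(l);
    return l->type == SK_WRLCK;
}

static bool sketch_break_in_progress(const struct sketch_lease *l)
{
    assert(l);
    return (l->flags & (SK_UNLOCK_PENDING | SK_DOWNGRADE_PENDING)) != 0;
}
```

With this layout a read lease that is being broken still reports its type as a read lease, so a concurrent read open is not blocked by it.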

locks: move F_INPROGRESS from fl_type to fl_flags field · 710b7216

Committed by J. Bruce Fields on Jul 26, 2011

F_INPROGRESS isn't exposed to userspace.  To me it makes more sense in
fl_flags....
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

710b7216

locks: minor lease cleanup · ab83fa4b

Committed by J. Bruce Fields on Jul 26, 2011

Use a helper function, to simplify upcoming changes.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ab83fa4b

nfsd4: return nfserr_symlink on v4 OPEN of non-regular file · aadab6c6

Committed by J. Bruce Fields on Aug 15, 2011

Without this, an attempt to open a device special file without first
stat'ing it will fail.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

aadab6c6

nfsd4: fix seqid_mutating_error · 57616300

Committed by J. Bruce Fields on Aug 10, 2011

The set of errors here does *not* agree with the set of errors specified
in the rfc!

While we're there, turn this macro into a function, for the usual
reasons, and move it to the one place where it's actually used.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

57616300

17 Aug, 2011 1 commit

nfsd4: Remove check for a 32-bit cookie in nfsd4_readdir() · 832023bf

Committed by Bernd Schubert on Aug 08, 2011

Fan Yong <yong.fan@whamcloud.com> noticed setting
FMODE_32bithash wouldn't work with nfsd v4, as
nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530
cookies have a 64 bit type and cookies are also defined as u64 in
'struct nfsd4_readdir'. So remove the test for >32-bit values.

Cc: stable@kernel.org
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

832023bf

08 Aug, 2011 1 commit

vfs: rename 'do_follow_link' to 'should_follow_link' · 7813b94a

Committed by Linus Torvalds on Aug 07, 2011

Al points out that the do_follow_link() helper function really is
misnamed - it's about whether we should try to follow a symlink or not,
not about actually doing the following.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7813b94a

07 Aug, 2011 10 commits

Fix POSIX ACL permission check · 206b1d09

Committed by Ari Savolainen on Aug 06, 2011

After commit 3567866b: "RCUify freeing acls, let check_acl() go ahead in
RCU mode if acl is cached" posix_acl_permission is being called with an
unsupported flag and the permission check fails. This patch fixes the issue.
Signed-off-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

206b1d09

vfs: optimize inode cache access patterns · 3ddcd056

Committed by Linus Torvalds on Aug 06, 2011

The inode structure layout is largely random, and some of the vfs paths
really do care.  The path lookup in particular is already quite D$
intensive, and profiles show that accessing the 'inode->i_op->xyz'
fields is quite costly.

We already optimized the dcache to not unnecessarily load the d_op
structure for members that are often NULL using the DCACHE_OP_xyz bits
in dentry->d_flags, and this does something very similar for the inode
ops that are used during pathname lookup.

It also re-orders the fields so that the fields accessed by 'stat' are
together at the beginning of the inode structure, and roughly in the
order accessed.

The effect of this seems to be in the 1-2% range for an empty kernel
"make -j" run (which is fairly kernel-intensive, mostly in filename
lookup), so it's visible.  The numbers are fairly noisy, though, and
likely depend a lot on exact microarchitecture.  So there's more tuning
to be done.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3ddcd056

vfs: renumber DCACHE_xyz flags, remove some stale ones · 830c0f0e

Committed by Linus Torvalds on Aug 06, 2011

Gcc tends to generate better code with small integers, including the
DCACHE_xyz flag tests - so move the common ones to be first in the list.
Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
DCACHE_AUTOFS_PENDING values, their users no longer exist in the source
tree.

And add an "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
common case to be a nice straight-line fall-through.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

830c0f0e
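
The `unlikely()` annotation mentioned in 830c0f0e can be sketched in plain C as follows. This assumes GCC or Clang, which provide `__builtin_expect`; the flag value and the comparison function are hypothetical and only show the shape of a rare branch that is kept off the straight-line path.

```c
#include <assert.h>

/* Assumes GCC or Clang, which provide __builtin_expect. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag marking the rarely used custom comparison. */
#define SK_FLAG_RARE 0x0002u

static int sketch_compare(unsigned flags, int a, int b)
{
    if (sketch_unlikely(flags & SK_FLAG_RARE))
        return (a & 0xff) == (b & 0xff);   /* rare custom comparison */
    return a == b;                          /* straight-line common case */
}
```

The hint does not change the result of the test; it only tells the compiler which branch to lay out as the fall-through.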

ore: Make ore its own module · cf283ade

Committed by Boaz Harrosh on Aug 06, 2011

Export everything from ore that needs exporting. Change Kbuild and Kconfig
to build ore.ko as an independent module. Import ore from exofs.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

cf283ade

exofs: Rename raid engine from exofs/ios.c => ore · 8ff660ab

Committed by Boaz Harrosh on Aug 06, 2011

ORE stands for "Objects Raid Engine"

This patch is a mechanical rename of everything that was in ios.c
and its API declaration to an ore.c and an osd_ore.h header. The ore
engine will later be used by the pnfs objects layout driver.

* File ios.c => ore.c

* Declaration of types and API are moved from exofs.h to a new
  osd_ore.h

* All used types are prefixed by ore_ from their exofs_ name.

* Shift includes from exofs.h to osd_ore.h so osd_ore.h is
  independent, include it from exofs.h.

Other than a pure rename there are no other changes. The next patch
will move the ore into its own module and will export the API
to be used by exofs and later the layout driver.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

8ff660ab

exofs: ios: Move to a per inode components & device-table · 9e9db456

Committed by Boaz Harrosh on Aug 05, 2011

Exofs raid engine was saving on memory space by having a single layout-info,
single pid, and a single device-table, global to the filesystem. Then passing
a credential and object_id info at the io_state level, private for each
inode. It would also devise this contraption of rotating the device table
view for each inode->ino to spread out the device usage.

This is not compatible with the pnfs-objects standard, which demands that
each inode can have its own layout-info and device-table, and each object
component its own pid, oid and creds.

So: Bring exofs raid engine to be usable for generic pnfs-objects use by:

* Define an exofs_comp structure that holds obj_id and credential info.

* Break up exofs_layout struct to an exofs_components structure that holds a
  possible array of exofs_comp and the array of devices + the size of the
  arrays.

* Add a "comps" parameter to get_io_state() that specifies the ids creds
  and device array to use for each IO.

  This makes it possible to keep the layout global, but the device-table
  view, creds and IDs at the inode level. It only adds two 64-bit values to
  each inode, since some of these members already existed in another form.

* The ios raid engine now accesses layout-info and comps-info through the
  passed pointers. Everything is pre-prepared by the caller for generic
  access of these structures and arrays.

At the exofs Level:

* Super block holds an exofs_components struct that holds the device
  array, previously in layout. The devices there are in device-table
  order. The device array is twice as big and repeats the device-table
  twice, so each inode's device array can point to a random device
  and have a round-robin view of the table, making it compatible with
  previous exofs versions.

* Each inode has an exofs_components struct that is initialized at
  load time, with its own view of the device table IDs and creds.
  When doing IO this gets passed to the io_state together with the
  layout.

While performing this change, bugs were found where credentials with the
wrong IDs were used to access the different SB objects (super.c), as well
as some dead code. They were never noticed because the target we use does
not check the credentials.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

9e9db456
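
The doubled device array described in 9e9db456 can be sketched as below. The table size, its contents, and the use of `ino % SK_NUMDEVS` as the starting offset are all assumptions for illustration; the point is only that a pointer into a table stored twice gives every inode a contiguous, rotated view without a modulo on each access.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NUMDEVS 4   /* hypothetical table size */

/* Device table stored twice back to back: 0 1 2 3 0 1 2 3. */
static const int sk_devs[2 * SK_NUMDEVS] = { 0, 1, 2, 3, 0, 1, 2, 3 };

/*
 * Return this inode's view: a pointer into the doubled table.
 * Indexes 0..SK_NUMDEVS-1 of the view stay in bounds for any start
 * offset and wrap around the table in round-robin order.
 */
static const int *sk_inode_view(unsigned long ino)
{
    size_t start = ino % SK_NUMDEVS;

    assert(start + SK_NUMDEVS <= 2 * SK_NUMDEVS);
    return &sk_devs[start];
}
```

For example, an inode whose offset is 2 sees the devices in the order 2, 3, 0, 1.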

exofs: Move exofs specific osd operations out of ios.c · 85e44df4

Committed by Boaz Harrosh on May 16, 2011

ios.c will be moving to an external library, for use by the
objects-layout-driver. Remove from it some exofs specific functions.

Also g_attr_logical_length is used both by inode.c and ios.c;
move its definition to the latter, to keep it independent.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

85e44df4

exofs: Add offset/length to exofs_get_io_state · e1042ba0

Committed by Boaz Harrosh on Nov 16, 2010

In future raid code we will need to know the IO offset/length
and if it's a read or write to determine some of the array
sizes we'll need.

So add a new exofs_get_rw_state() API for use when
writing/reading. All other simple cases are left using the
old way.

The major change to this is that now we need to call
exofs_get_io_state later at inode.c::read_exec and
inode.c::write_exec when we actually know these things. So this
patch is kept separate so I can test things apart from other
changes.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

e1042ba0

vfs: show O_CLOEXE bit properly in /proc/<pid>/fdinfo/<fd> files · 1117f72e

Committed by Linus Torvalds on Aug 06, 2011

The CLOEXEC bit is magical, and for performance (and semantic) reasons we
don't actually maintain it in the file descriptor itself, but in a
separate bit array.  Which means that when we show f_flags, the CLOEXEC
status is shown incorrectly: we show the status not as it is now, but as
it was when the file was opened.

Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit
array.

Uli needs this in order to re-implement the pfiles program:

  "For normal file descriptors (not sockets) this was the last piece of
   information which wasn't available.  This is all part of my 'give
   Solaris users no reason to not switch' effort.  I intend to offer the
   code to the util-linux-ng maintainers."
Requested-by: Ulrich Drepper <drepper@akkadia.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1117f72e
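
The lookup described in 1117f72e, reporting the close-on-exec state from a per-descriptor bit array instead of the flags saved at open time, can be sketched as below. The struct, the flag constant and the fixed array size are hypothetical; the kernel's `fdt->close_on_exec` handling is more involved.

```c
#include <assert.h>
#include <limits.h>

#define SK_O_CLOEXEC 02000000u      /* hypothetical flag value */
#define SK_BITS (sizeof(unsigned long) * CHAR_BIT)

struct sk_fdtable {                  /* hypothetical, one bit per fd */
    unsigned long close_on_exec[4];
};

static unsigned sk_shown_flags(const struct sk_fdtable *fdt,
                               unsigned fd, unsigned f_flags)
{
    assert(fd < 4 * SK_BITS);
    /* Drop the stale open-time bit, then re-derive it from the bitmap. */
    f_flags &= ~SK_O_CLOEXEC;
    if (fdt->close_on_exec[fd / SK_BITS] & (1UL << (fd % SK_BITS)))
        f_flags |= SK_O_CLOEXEC;
    return f_flags;
}
```

The bitmap is the single source of truth here, so a descriptor whose close-on-exec state changed after open is reported with its current state.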

oom_ajd: don't use WARN_ONCE, just use printk_once · c2142704

Committed by Linus Torvalds on Aug 06, 2011

WARN_ONCE() is very annoying, in that it shows the stack trace that we
don't care about at all, and also triggers various user-level "kernel
oopsed" logic that we really don't care about.  And it's not like the
user can do anything about the applications (sshd) in question, it's a
distro issue.

Requested-by: Andi Kleen <andi@firstfloor.org> (and many others)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c2142704

05 Aug, 2011 6 commits

cifs: cope with negative dentries in cifs_get_root · 80975d21

Committed by Jeff Layton on Aug 05, 2011

The loop around lookup_one_len doesn't handle the case where it might
return a negative dentry, which can cause an oops on the next pass
through the loop. Check for that and break out of the loop with an
error of -ENOENT if there is one.

Fixes the panic reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=727927

Reported-by: TR Bentley <home@trarbentley.net>
Reported-by: Iain Arnell <iarnell@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

80975d21
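
The check added in 80975d21 can be sketched without any kernel APIs. Here a "dentry" is a hypothetical struct whose `inode` pointer is NULL when the dentry is negative, and the walk operates on components that are assumed to be already looked up; the real loop calls `lookup_one_len` at each step.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sk_dentry {                 /* hypothetical stand-in */
    const void *inode;             /* NULL means a negative dentry */
};

/* Walk pre-resolved components; stop at the first negative one. */
static int sk_walk(const struct sk_dentry *comps, size_t n)
{
    assert(comps || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (comps[i].inode == NULL)
            return -ENOENT;        /* do not descend through it */
    }
    return 0;
}
```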

cifs: convert prefixpath delimiters in cifs_build_path_to_root · f9e8c450

Committed by Jeff Layton on Aug 05, 2011

Regression from 2.6.39...

The delimiters in the prefixpath are not being converted based on
whether posix paths are in effect. Fixes:

    https://bugzilla.redhat.com/show_bug.cgi?id=727834

Reported-and-Tested-by: Iain Arnell <iarnell@gmail.com>
Reported-by: Patrick Oltmann <patrick.oltmann@gmx.net>
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

f9e8c450
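
The delimiter conversion that f9e8c450 restores for the prefixpath amounts to rewriting separators according to whether POSIX paths are in effect. A minimal sketch, with a hypothetical helper that is not the cifs function:

```c
#include <assert.h>
#include <string.h>

/* Rewrite every path delimiter in place to the one currently in effect. */
static void sk_convert_delimiters(char *path, char delim)
{
    assert(path != NULL);
    assert(delim == '/' || delim == '\\');
    for (char *p = path; *p; p++) {
        if (*p == '/' || *p == '\\')
            *p = delim;
    }
}
```

The caller passes `'/'` when POSIX paths are in effect and `'\\'` otherwise.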

exofs: Fix truncate for the raid-groups case · 16f75bb3

Committed by Boaz Harrosh on Aug 03, 2011

In the general raid-group case the truncate was wrong in that
it did not also fix the object length of the neighboring groups.

There are two bad cases in the old code:
1. Space that should be freed was not.
2. If a file that was big is truncated small, then made bigger
   again, the holes would not contain zeros but could expose old data.
   (If the growing of the file expands to more than a full
    groups cycle + group size (> S + T))
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

16f75bb3

exofs: Small cleanup of exofs_fill_super · 9ce73047

Committed by Boaz Harrosh on Aug 03, 2011

Small cleanup that unifies duplicated code used in both the
error and success cases.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

9ce73047

exofs: BUG: Avoid sbi realloc · 6d4073e8

Committed by Boaz Harrosh on Jul 27, 2011

Since the beginning we realloced the sbi structure when a device table
bigger than one was specified. (I know that was really stupid).

Then much later when "register bdi" was added (By Jens) it was
registering the pointer to sbi->bdi before the realloc.

We never saw this problem because up till now the realloc did not
do anything since the device table was small enough to fit in the
original allocation. But once we started testing with large device
tables (bigger than 28) we noticed the crash of writeback operating
on a deallocated pointer.

* Avoid the whole mess by allocating the device-table as a second array
  and get rid of the variable-sized structure and the rest of this
  mess.
* Take the chance to clean nearby structures and comments.
* Add a needed dprint on startup to indicate the loaded layout.
* Also move the bdi registration to the very end because it will
  only fail under low memory, which will probably cause a failure
  beforehand. There are many more likely causes to not load before that.
  This way the error handling is made simpler. (Just doing this would be
  enough to fix the BUG)
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

6d4073e8
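
The first fix listed in 6d4073e8, keeping the device table in its own allocation so the enclosing structure never moves, can be sketched in userspace C. The struct layout and helper are hypothetical; `realloc` stands in for the kernel allocation calls.

```c
#include <assert.h>
#include <stdlib.h>

struct sk_sbi {                    /* hypothetical superblock info */
    int bdi;                       /* other code keeps &sbi->bdi */
    unsigned numdevs;
    int *devs;                     /* separate allocation, may grow */
};

/* Grow only the device table; the sbi itself never moves. */
static int sk_resize_devs(struct sk_sbi *sbi, unsigned numdevs)
{
    assert(sbi && numdevs > 0);
    int *devs = realloc(sbi->devs, numdevs * sizeof(*devs));
    if (!devs)
        return -1;                 /* old table is still valid */
    sbi->devs = devs;
    sbi->numdevs = numdevs;
    return 0;
}
```

Any pointer registered earlier against a member of the structure, such as `&sbi->bdi` in this sketch, stays valid however large the table becomes.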

exofs: Remove pnfs-osd private definitions · 26ae93c2

Committed by Boaz Harrosh on Feb 02, 2010

Now that pnfs-osd has hit mainline we can remove exofs's
private header. (And the FIXME comment)
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

26ae93c2