Commits · 42d671c78f6486c932b68a50f88768c7b4e57ebf · openeuler / raspberrypi-kernel

19 Mar, 2009 · 35 commits

Inconsistent setattr behaviour · 0953e620

Committed by Sachin S. Prabhu on Feb 23, 2009

There is an inconsistency seen in the behaviour of nfs compared to other local
filesystems on linux when changing owner or group of a directory. If the
directory has SUID/SGID flags set, on changing owner or group on the directory,
the flags are stripped off on nfs. These flags are maintained on other
filesystems such as ext3.

To reproduce on an nfs share or a local filesystem, run the following commands:
mkdir test; chmod +s+g test; chown user1 test; ls -ld test

On the nfs share, the flags are stripped and the output seen is:
drwxr-xr-x 2 user1 root 4096 Feb 23  2009 test

On other local filesystems (e.g. ext3), the flags are not stripped and the
output seen is:
drwsr-sr-x 2 user1 root 4096 Feb 23 13:57 test

chown_common() called from sys_chown() will only strip the flags if the inode is
not a directory.
static int chown_common(struct dentry * dentry, uid_t user, gid_t group)
{
..
        if (!S_ISDIR(inode->i_mode))
                newattrs.ia_valid |=
                        ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
..
}

See: http://www.opengroup.org/onlinepubs/7990989775/xsh/chown.html

"If the path argument refers to a regular file, the set-user-ID (S_ISUID) and
set-group-ID (S_ISGID) bits of the file mode are cleared upon successful return
from chown(), unless the call is made by a process with appropriate privileges,
in which case it is implementation-dependent whether these bits are altered. If
chown() is successfully invoked on a file that is not a regular file, these
bits may be cleared. These bits are defined in <sys/stat.h>."

The behaviour as it stands does not appear to violate POSIX.  However the
actions performed are inconsistent when comparing ext3 and nfs.
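
As a rough user-space illustration of the rule quoted from chown_common() above (not the kernel code itself), the following sketch models which mode bits survive an ownership change. The helper name `mode_after_chown` is hypothetical, and the model is deliberately simplified: it clears both bits for every non-directory, whereas the kernel may apply further conditions before killing SGID.

```c
#define _XOPEN_SOURCE 700   /* expose S_IFDIR / S_IFREG under strict ISO modes */
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: the mode an inode would end up with after chown,
 * following the "strip only if not a directory" rule described above. */
static mode_t mode_after_chown(mode_t mode)
{
    if (!S_ISDIR(mode))
        mode &= ~(mode_t)(S_ISUID | S_ISGID);   /* non-directories lose the bits */
    return mode;                                 /* directories keep them */
}
```

With this model a directory with mode 06755 keeps its SUID/SGID bits, which matches the ext3 output shown above, while a regular file with the same mode drops to 0755.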
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: don't check ip address in setclientid · 026722c2

Committed by J. Bruce Fields on Mar 18, 2009

The spec allows clients to change IP address, so we shouldn't be
requiring that setclientid always come from the same address.  For
example, a client could reboot and get a new DHCP address, but still
present the same clientid to the server.  In that case the server should
revoke the client's previous state and allow it to continue, instead of
(as it currently does) returning a CLID_INUSE error.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

knfsd: add file to export stats about nfsd pools · 03cf6c9f

Committed by Greg Banks on Jan 13, 2009

Add /proc/fs/nfsd/pool_stats to export to userspace various
statistics about the operation of rpc server thread pools.

This patch is based on a forward-ported version of
knfsd-add-pool-thread-stats which has been shipping in the SGI
"Enhanced NFS" product since 2006 and which was previously
posted:

http://article.gmane.org/gmane.linux.nfs/10375

It has also been updated thus:

 * moved EXPORT_SYMBOL() to near the function it exports
 * made the new struct seq_operations const
 * used SEQ_START_TOKEN instead of ((void *)1)
 * merged fix from SGI PV 990526 "sunrpc: use dprintk instead of
   printk in svc_pool_stats_*()" by Harshula Jayasuriya.
 * merged fix from SGI PV 964001 "Crash reading pool_stats before
   nfsds are started".
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Harshula Jayasuriya <harshula@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

knfsd: remove the nfsd thread busy histogram · 8bbfa9f3

Committed by Greg Banks on Jan 13, 2009

Stop gathering the data that feeds the 'th' line in /proc/net/rpc/nfsd
because the questionable data provided is not worth the scalability
impact of calculating it.  Instead, always report zeroes.  The current
approach suffers from three major issues:

1. update_thread_usage() increments buckets by call service
   time or call arrival time...in jiffies.  On lightly loaded
   machines, call service times are usually < 1 jiffy; on
   heavily loaded machines call arrival times will be << 1 jiffy.
   So a large portion of the updates to the buckets are rounded
   down to zero, and the histogram is undercounting.

2. As seen previously on the nfs mailing list, the format in which
   the histogram is presented is cryptic, difficult to explain,
   and difficult to use.

3. Updating the histogram requires taking a global spinlock and
   dirtying the global variables nfsd_last_call, nfsd_busy, and
   nfsdstats *twice* on every RPC call, which is a significant
   scaling limitation.
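
The rounding problem in point 1 can be shown with a small arithmetic sketch. It is only an illustration: `whole_jiffies` is a hypothetical helper, and the tick rate of 250 Hz used in the note below is an assumed value, not one taken from the patch.

```c
/* Hypothetical conversion: whole jiffies contained in `usecs` microseconds
 * at a tick rate of `hz` ticks per second. Integer division truncates,
 * so any interval shorter than one tick counts as zero. */
static unsigned long whole_jiffies(unsigned long long usecs, unsigned int hz)
{
    return (unsigned long)(usecs * hz / 1000000ULL);
}
```

At an assumed 250 Hz one jiffy is 4000 microseconds, so a 500 microsecond service time contributes 0 to its bucket, and only intervals of 4000 microseconds or more contribute anything.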

Testing on a 4 CPU 4 NIC Altix using 4 IRIX clients each doing
1K streaming reads at full line rate, shows the stats update code
(inlined into nfsd()) takes about 1.7% of each CPU.  This patch drops
the contribution from nfsd() into the profile noise.

This patch is a forward-ported version of knfsd-remove-nfsd-threadstats
which has been shipping in the SGI "Enhanced NFS" product since 2006.
In that time, exactly one customer has noticed that the threadstats
were missing.  It has been previously posted:

http://article.gmane.org/gmane.linux.nfs/10376

and more recently requested to be posted again.
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: remove redundant check from nfsd4_open · 5cb031b0

Committed by J. Bruce Fields on Mar 14, 2009

Note that we already checked for this invalid case at the top of this
function.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: don't do lookup within readdir in recovery code · 05f4f678

Committed by J. Bruce Fields on Mar 13, 2009

The main nfsd code was recently modified to no longer do lookups from
within the readdir callback, to avoid locking problems on certain
filesystems.

This (rather hacky, and overdue for replacement) NFSv4 recovery code has
the same problem.  Fix it to build up a list of names (instead of
dentries) and do the lookups afterwards.

Reported symptoms were a deadlock in the xfs code (called from
nfsd4_recdir_load), with /var/lib/nfs on xfs.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reported-by: David Warren <warren@atmos.washington.edu>

nfsd4: support putpubfh operation · a1c8c4d1

Committed by J. Bruce Fields on Mar 09, 2009

Currently putpubfh returns NFSERR_OPNOTSUPP, which isn't actually
allowed for v4.  The right error is probably NFSERR_NOTSUPP.

But let's just implement it; though rarely seen, it can be used by
Solaris (with a special mount option), is mandated by the rfc, and is
trivial for us to support.

Thanks to Yang Hongyang for pointing out the original problem, and to
Mike Eisler, Tom Talpey, Trond Myklebust, and Dave Noveck for further
argument....
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

Short write in nfsd becomes a full write to the client · 31dec253

Committed by David Shaw on Mar 05, 2009

If a filesystem being written to via NFS returns a short write count
(as opposed to an error) to nfsd, nfsd treats that as a success for
the entire write, rather than the short count that actually succeeded.

For example, given an 8192-byte write, if the underlying filesystem
only writes 4096 bytes, nfsd will ack back to the nfs client that all
8192 bytes were written.  The nfs client does have retry logic for
short writes, but this is never called as the client is told the
complete write succeeded.

There are probably other ways it could happen, but in my case it
happened with a fuse (filesystem in userspace) filesystem which can
rather easily have a partial write.

Here is a patch to properly return the short write count to the
client.
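
The difference between the old and the fixed behaviour can be sketched in plain C. This is only an illustration under stated assumptions: `backend_write`, `reported_count_old` and `reported_count_fixed` are hypothetical names, not nfsd functions, and the backend is modelled as accepting at most `limit` bytes per call.

```c
#include <stddef.h>

/* Hypothetical backend: accepts at most `limit` bytes per call and
 * returns how many bytes it actually took (never an error here). */
static long backend_write(size_t len, size_t limit)
{
    return (long)(len < limit ? len : limit);
}

/* Old behaviour as described above: any non-error result is
 * reported to the client as a complete write. */
static long reported_count_old(size_t requested, size_t limit)
{
    long n = backend_write(requested, limit);
    return n < 0 ? n : (long)requested;
}

/* Fixed behaviour: pass the short count through, so the client's
 * own retry logic for short writes can run. */
static long reported_count_fixed(size_t requested, size_t limit)
{
    return backend_write(requested, limit);
}
```

For the 8192-byte example with a backend that only takes 4096 bytes, the old path reports 8192 and the fixed path reports 4096.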
Signed-off-by: David Shaw <dshaw@jabberwocky.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

NFSD: return nfsv4 error code nfserr_notsupp rather than nfsv[23]'s nfserr_opnotsupp · 1e685ec2

Committed by Benny Halevy on Mar 04, 2009

Thanks to Bill Baker at sun.com for catching this
at Connectathon 2009.

This bug was introduced in 2.6.27.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: move rpc_client setup to a separate function · a601caed

Committed by J. Bruce Fields on Feb 22, 2009

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: fix do_probe_callback errors · 418cd20a

Committed by J. Bruce Fields on Feb 22, 2009

The errors returned aren't used.  Just return 0 and make them available
to a dprintk().  Also, consistently use -ERRNO errors instead of nfs
errors.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-by: Benny Halevy <bhalevy@panasas.com>

nfsd4: remove use of mutex for file_hashtable · 8b671b80

Committed by J. Bruce Fields on Feb 22, 2009

As part of reducing the scope of the client_mutex, and in order to
remove the need for mutexes from the callback code (so that callbacks
can be done as asynchronous rpc calls), move manipulations of the
file_hashtable under the recall_lock.

Update the relevant comments while we're here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Alexandros Batsakis <batsakis@netapp.com>
Reviewed-by: Benny Halevy <bhalevy@panasas.com>

nfsd4: put_nfs4_client does not require state lock · d7fdcfe0

Committed by J. Bruce Fields on Feb 21, 2009

Since free_client() is guaranteed to only be called once, and to only
touch the client structure itself (not any common data structures), it
has no need for the state lock.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Alexandros Batsakis <batsakis@netapp.com>

nfsd4: rename io_during_grace_disallowed · 18f82731

Committed by J. Bruce Fields on Feb 21, 2009

Use a slightly clearer, more concise name.  Also remove the unused
argument.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: remove unused CHECK_FH flag · 6150ef0d

Committed by J. Bruce Fields on Feb 21, 2009

All users now pass this, so it's meaningless.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: fail when delegreturn gets a non-delegation stateid · 7e0f7cf5

Committed by J. Bruce Fields on Feb 21, 2009

Previous cleanup reveals an obvious (though harmless) bug: when
delegreturn gets a stateid that isn't for a delegation, it should return
an error rather than doing nothing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: separate delegreturn case from preprocess_stateid_op · 203a8c8e

Committed by J. Bruce Fields on Feb 21, 2009

Delegreturn is enough of a special case for preprocess_stateid_op to
warrant just open-coding it in delegreturn.

There should be no change in behavior here; we're just reshuffling code.

Thanks to Yang Hongyang for catching a critical typo.
Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: add a helper function to decide if stateid is delegation · 3e633079

Committed by J. Bruce Fields on Feb 21, 2009

Make this check self-documenting.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: remove some dprintk's · 819a8f53

Committed by J. Bruce Fields on Feb 21, 2009

I can't recall ever seeing these printk's used to debug a problem.  I'll
happily put them back if we see a case where they'd be useful.  (Though
if we do that the find_XXX() errors would probably be better
reported in find_XXX() functions themselves.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: remove unneeded local variable · fd03b099

Committed by J. Bruce Fields on Feb 21, 2009

We no longer need stidp.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: remove redundant "if" in nfs4_preprocess_stateid_op · dc9bf700

Committed by J. Bruce Fields on Feb 21, 2009

Note that we exit this first big "if" with stp == NULL if and only if we
took the first branch; therefore, the second "if" is redundant, and we
can just combine the two, simplifying the logic.
Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: move check_stateid_generation check · 0c2a498f

Committed by J. Bruce Fields on Feb 21, 2009

No change in behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: trivial preprocess_stateid_op cleanup · a4455be0

Committed by J. Bruce Fields on Feb 21, 2009

Remove a couple of redundant comments, adjust style; no change in behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd(v2/v3): fix the failure of creation from HPUX client · 4ac35c2f

Committed by wengang wang on Feb 10, 2009

Sometimes an HPUX nfs client sends a create request to a linux nfs server (v2/v3).
A dump of the request looks like this:
    obj_attributes
        mode: value follows
            set_it: value follows (1)
            mode: 00
        uid: no value
            set_it: no value (0)
        gid: value follows
            set_it: value follows (1)
            gid: 8030
        size: value follows
            set_it: value follows (1)
            size: 0
        atime: don't change
            set_it: don't change (0)
        mtime: don't change
            set_it: don't change (0)

Note that the mode is 00 (no rwx permission even for the owner) and that the
request asks to set the size to 0.

In the current nfsd (v2/v3) implementation, the server performs two main steps:
1) creates the file with the specified mode by calling vfs_create().
2) sets attributes for the file by calling nfsd_setattr().

Step 2) eventually calls the filesystem-specific setattr() function, which may
fail the permission check because changing the size needs WRITE permission,
and there is none since the mode is 000.

For this case, where a new file has just been created, we can simply ignore
the request to set the size to 0, so that WRITE permission is not needed and
the open succeeds.
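
A minimal sketch of the idea, following the description above rather than the actual change in vfs.c. All names here (`sk_iattr`, `SK_ATTR_MODE`, `SK_ATTR_SIZE`, `sk_drop_zero_size_on_create`) are hypothetical stand-ins for illustration only.

```c
/* Hypothetical attribute flags and request structure for this sketch. */
#define SK_ATTR_MODE (1u << 0)
#define SK_ATTR_SIZE (1u << 3)

struct sk_iattr {
    unsigned int valid;   /* which attributes the client asked to set */
    long long    size;    /* requested size, meaningful if SK_ATTR_SIZE is set */
};

/* On a freshly created file the size is already 0, so a request to set
 * it to 0 is a no-op and can be dropped before the setattr step runs. */
static void sk_drop_zero_size_on_create(struct sk_iattr *attr, int newly_created)
{
    if (newly_created && (attr->valid & SK_ATTR_SIZE) && attr->size == 0)
        attr->valid &= ~SK_ATTR_SIZE;
}
```

Dropping the flag only for newly created files keeps ordinary truncation requests on existing files subject to the usual permission check.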
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
--
 vfs.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

lockd: clean up blocking lock cases of nlmsvc_lock() · e33d1ea6

Committed by Miklos Szeredi on Feb 09, 2009

No change in behavior, just rearranging the switch so that we break out
of the switch if and only if we're in the wait case.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd: lock state around put client and delegation in nfsd4_cb_recall · e37da04e

Committed by Alexandros Batsakis on Dec 18, 2008

Not having the state locked before putting the client/delegation causes a bug.
Also remove the comment in the function header about the state already being locked.
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: use helper for copying delegation filehandle · 6c02eaa1

Committed by J. Bruce Fields on Feb 02, 2009

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: use helper for copying filehandles for replay · a4773c08

Committed by J. Bruce Fields on Feb 02, 2009

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: fix misplaced comment · 13024b7b

Committed by J. Bruce Fields on Feb 02, 2009

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd: clarify exclusive create bitmask result. · 99f88726

Committed by J. Bruce Fields on Feb 02, 2009

The use of |= is confusing--the bitmask is always initialized to zero in
this case, so we're effectively just doing an assignment here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd : Define NFSD only when FILE_LOCKING is enabled · 68666561

Committed by Manish Katiyar on Jan 29, 2009

Enable NFSD only when FILE_LOCKING is enabled, since we don't want to
support NFSD without FILE_LOCKING.
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

NFSD: cleanup for nfs3proc.c · 12214cb7

Committed by Qinghuang Feng on Jan 12, 2009

MSDOS_SUPER_MAGIC is defined in <linux/magic.h>,
so use MSDOS_SUPER_MAGIC directly.
Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: split open/lockowner release code · f044ff83

Committed by J. Bruce Fields on Jan 11, 2009

The caller always knows specifically whether it's releasing a lockowner
or an openowner, and the code is simpler if we use separate functions
(and the apparent recursion is gone).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: remove a forward declaration · f1d110ca

Committed by J. Bruce Fields on Jan 11, 2009

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: split lockstateid/openstateid release logic · 2283963f

Committed by J. Bruce Fields on Jan 11, 2009

The flags here attempt to make the code more general, but I find it
actually just adds confusion.

I think it's clearer to separate the logic for the open and lock cases
entirely.  And eventually we may want to separate the stateowner and
stateid types as well, as many of the fields aren't shared between the
lock and open cases.

Also move to eliminate forward references.

Start with the stateid's.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-by: Benny Halevy <bhalevy@panasas.com>

18 Mar, 2009 · 2 commits

NFSD: provide encode routine for OP_OPENATTR · 84f09f46

Committed by Benny Halevy on Mar 04, 2009

Although this operation is unsupported by our implementation
we still need to provide an encode routine for it to
merely encode its (error) status back in the compound reply.

Thanks to Bill Baker at sun.com for testing with Sun's
OpenSolaris client, finding, and reporting this bug at
Connectathon 2009.

This bug was introduced in 2.6.27.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

Avoid 64-bit "switch()" statements on 32-bit architectures · ee568b25

Committed by Linus Torvalds on Mar 17, 2009

Commit ee6f779b ("filp->f_pos not
correctly updated in proc_task_readdir") changed the proc code to use
filp->f_pos directly, rather than through a temporary variable.  In the
process, that caused the operations to be done on the full 64 bits, even
though the offset is never that big.

That's all fine and dandy per se, but for some unfathomable reason gcc
generates absolutely horrid code when using 64-bit values in switch()
statements.  To the point of actually calling out to gcc helper
functions like __cmpdi2 rather than just doing the trivial comparisons
directly the way gcc does for normal compares.  At which point we get
link failures, because we really don't want to support that kind of
crazy code.

Fix this by just casting the f_pos value to "unsigned long", which
is plenty big enough for /proc, and avoids the gcc code generation issue.
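
The shape of the fix can be sketched as follows. This is an illustrative stand-in, not the /proc code: `classify_pos` and the `entry_kind` values are hypothetical, and the case labels are chosen only to show the cast.

```c
#include <stdint.h>

enum entry_kind { ENTRY_DOT, ENTRY_DOTDOT, ENTRY_OTHER };

/* Hypothetical dispatcher on a 64-bit file position. */
static enum entry_kind classify_pos(int64_t f_pos)
{
    /* Narrow once to the native word size, then switch on that value,
     * so a 32-bit compiler never needs a 64-bit comparison helper. */
    unsigned long pos = (unsigned long)f_pos;

    switch (pos) {
    case 0:  return ENTRY_DOT;
    case 1:  return ENTRY_DOTDOT;
    default: return ENTRY_OTHER;
    }
}
```

The cast is only safe because, as the message says, the offset never grows beyond what an unsigned long can hold in this context.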
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Zhang Le <r0bertz@gentoo.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

17 Mar, 2009 · 1 commit

ext4: fix bb_prealloc_list corruption due to wrong group locking · d33a1976

Committed by Eric Sandeen on Mar 16, 2009

This is for Red Hat bug 490026: EXT4 panic, list corruption in
ext4_mb_new_inode_pa

ext4_lock_group(sb, group) is supposed to protect this list for
each group, and a common code flow to remove a pa from the list is like
this:

    ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
    ext4_lock_group(sb, grp);
    list_del(&pa->pa_group_list);
    ext4_unlock_group(sb, grp);

so it's critical that we get the right group number back for
this prealloc context, to lock the right group (the one
associated with this pa) and prevent concurrent list manipulation.

However, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a
comment, "-1 is to protect from crossing allocation group".

This makes sense for the group_pa, where pa_pstart is advanced
by the length which has been used (in ext4_mb_release_context()),
and when the entire length has been used, pa_pstart has been
advanced to the first block of the next group.

However, for inode_pa, pa_pstart is never advanced; it's just
set once to the first block in the group and not moved after
that.  So in this case, if we subtract one in ext4_mb_put_pa(),
we are actually locking the *previous* group, and opening the
race with the other threads which do not subtract off the extra
block.
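
The off-by-one can be checked with simple arithmetic. The sketch below assumes a simplified block-to-group mapping (plain division by the group size); `group_of_block` is a hypothetical helper, and the group size of 32768 blocks used in the note is an assumed value.

```c
#include <assert.h>

/* Hypothetical simplification of the block -> group mapping:
 * groups are consecutive runs of `blocks_per_group` blocks starting
 * at `first_data_block`. */
static unsigned long group_of_block(unsigned long long block,
                                    unsigned long long first_data_block,
                                    unsigned long blocks_per_group)
{
    assert(blocks_per_group > 0 && block >= first_data_block);
    return (unsigned long)((block - first_data_block) / blocks_per_group);
}
```

With 32768 blocks per group, a pa_pstart of 65536 (the first block of group 2) maps to group 2, but 65535 maps to group 1, so subtracting one from an inode_pa's start selects the previous group's lock.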
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

16 Mar, 2009 · 1 commit

filp->f_pos not correctly updated in proc_task_readdir · ee6f779b

Committed by Zhang Le on Mar 16, 2009

filp->f_pos only gets updated at the end of the function. Thus the d_off of
dirents in the middle will be 0, and this causes a problem in glibc's readdir
implementation, specifically an endless loop. When overflow occurs, f_pos is
set to the next dirent to read; however, it will be 0 unless the next one is
the last one. So it starts over again and again.

There is a sample program in man 2 getdents. This is the output of the program
running on a multithreaded program's task dir before this patch is applied:

  $ ./a.out /proc/3807/task
  --------------- nread=128 ---------------
  i-node#  file type  d_reclen  d_off   d_name
    506442  directory    16          1  .
    506441  directory    16          0  ..
    506443  directory    16          0  3807
    506444  directory    16          0  3809
    506445  directory    16          0  3812
    506446  directory    16          0  3861
    506447  directory    16          0  3862
    506448  directory    16          8  3863

This is the output after this patch is applied:

  $ ./a.out /proc/3807/task
  --------------- nread=128 ---------------
  i-node#  file type  d_reclen  d_off   d_name
    506442  directory    16          1  .
    506441  directory    16          2  ..
    506443  directory    16          3  3807
    506444  directory    16          4  3809
    506445  directory    16          5  3812
    506446  directory    16          6  3861
    506447  directory    16          7  3862
    506448  directory    16          8  3863
Signed-off-by: Zhang Le <r0bertz@gentoo.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

15 Mar, 2009 · 1 commit

block: fix memory leak in bio_clone() · 059ea331

Committed by Li Zefan on Mar 09, 2009

If bio_integrity_clone() fails, bio_clone() returns NULL without freeing
the newly allocated bio.
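
The pattern behind this kind of leak, and its fix, can be sketched in user space. Everything below is a hypothetical model (`fake_bio`, `fake_bio_alloc`, `fake_bio_put`, `fake_integrity_clone`, `fake_bio_clone`, and the `live_allocs` counter), not the block layer code.

```c
#include <stdlib.h>

static int live_allocs;   /* outstanding allocations in this sketch */

struct fake_bio { int has_integrity; };

static struct fake_bio *fake_bio_alloc(void)
{
    struct fake_bio *b = calloc(1, sizeof(*b));
    if (b)
        live_allocs++;
    return b;
}

static void fake_bio_put(struct fake_bio *b)
{
    if (b) {
        free(b);
        live_allocs--;
    }
}

/* Hypothetical stand-in for the integrity clone step; fails on request. */
static int fake_integrity_clone(struct fake_bio *b, int should_fail)
{
    if (should_fail)
        return -1;
    b->has_integrity = 1;
    return 0;
}

static struct fake_bio *fake_bio_clone(int integrity_fails)
{
    struct fake_bio *b = fake_bio_alloc();

    if (!b)
        return NULL;
    if (fake_integrity_clone(b, integrity_fails) < 0) {
        fake_bio_put(b);   /* release the new object before bailing out */
        return NULL;
    }
    return b;
}
```

Without the `fake_bio_put()` call on the failure path, the function would return NULL while `live_allocs` stayed at 1, which is the leak described above.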
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>