提交 · 50a737f86dbf99daf3a8dcbdf778a3be36bb2a39 · openeuler / raspberrypi-kernel

24 12月, 2008 11 次提交

NFS: "[no]resvport" mount option changes mountd client too · 50a737f8

由 Chuck Lever 提交于 12月 23, 2008

If the admin has specified the "noresvport" option for an NFS mount
point, the kernel's NFS client uses an unprivileged source port for
the main NFS transport.  The kernel's mountd client should use an
unprivileged port in this case as well.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

50a737f8

NFS: add "[no]resvport" mount option · d740351b

由 Chuck Lever 提交于 12月 23, 2008

The standard default security setting for NFS is AUTH_SYS.  An NFS
client connects to NFS servers via a privileged source port and a
fixed standard destination port (2049).  The client sends raw uid and
gid numbers to identify users making NFS requests, and the server
assumes an appropriate authority on the client has vetted these
values because the source port is privileged.

On Linux, by default in-kernel RPC services use a privileged port in
the range between 650 and 1023 to avoid using source ports of well-
known IP services.  Using such a small range limits the number of NFS
mount points and the number of unique NFS servers to which a client
can connect concurrently.

An NFS client can use unprivileged source ports to expand the range of
source port numbers, allowing more concurrent server connections and
more NFS mount points.  Servers must explicitly allow NFS connections
from unprivileged ports for this to work.

In the past, bumping the value of the sunrpc.max_resvport sysctl on
the client would permit the NFS client to use unprivileged ports.
Bumping this setting also changes the maximum port number used by
other in-kernel RPC services, some of which still required a port
number less than 1023.

This is exacerbated by the way source port numbers are chosen by the
Linux RPC client, which starts at the top of the range and works
downwards.  It means that bumping the maximum means all RPC services
requesting a source port will likely get an unprivileged port instead
of a privileged one.

Changing this setting effects all NFS mount points on a client.  A
sysadmin could not selectively choose which mount points would use
non-privileged ports and which could not.

Lastly, this mechanism of expanding the limit on the number of NFS
mount points was entirely undocumented.

To address the need for the NFS client to use a large range of source
ports without interfering with the activity of other in-kernel RPC
services, we introduce a new NFS mount option.  This option explicitly
tells only the NFS client to use a non-privileged source port when
communicating with the NFS server for one specific mount point.

This new mount option is called "resvport," like the similar NFS mount
option on FreeBSD and Mac OS X.  A sister patch for nfs-utils will be
submitted that documents this new option in nfs(5).

The default setting for this new mount option requires the NFS client
to use a privileged port, as before.  Explicitly specifying the
"noresvport" mount option allows the NFS client to use an unprivileged
source port for this mount point when connecting to the NFS server
port.

This mount option is supported only for text-based NFS mounts.

[ Sidebar: it is widely known that security mechanisms based on the
  use of privileged source ports are ineffective.  However, the NFS
  client can combine the use of unprivileged ports with the use of
  secure authentication mechanisms, such as Kerberos.  This allows a
  large number of connections and mount points while ensuring a useful
  level of security.

  Eventually we may change the default setting for this option
  depending on the security flavor used for the mount.  For example,
  if the mount is using only AUTH_SYS, then the default setting will
  be "resvport;" if the mount is using a strong security flavor such
  as krb5, the default setting will be "noresvport." ]
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
[Trond.Myklebust@netapp.com: Fixed a bug whereby nfs4_init_client()
was being called with incorrect arguments.]
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

d740351b

NFS: move nfs_server flag initialization · 542fcc33

由 Chuck Lever 提交于 12月 23, 2008

Make it possible for the NFSv4 mount set up logic to pass mount option
flags down the stack to nfs_create_rpc_client().

This is immediately useful if we want NFS mount options to modulate
settings of the underlying RPC transport, but it may be useful at some
later point if other parts of the NFSv4 mount initialization logic
want to know what the mount options are.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

542fcc33

NFS: expand flags passed to nfs_create_rpc_client() · 4a01b8a4

由 Chuck Lever 提交于 12月 23, 2008

The nfs_create_rpc_client() function sets up an RPC client for an NFS
mount point.  Add an option that allows it to set up an RPC transport
from an unprivileged port.

Instead of having nfs_create_rpc_client()'s callers retain local
knowledge about how to set up an RPC client, create a couple of flag
arguments to control the use of RPC_CLNT_CREATE flags.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

4a01b8a4

NFS: introduce nfs_mount_info struct for calling nfs_mount() · c5d120f8

由 Chuck Lever 提交于 12月 23, 2008

Clean up: convert nfs_mount() to take a single data structure argument to make
it simpler to add more arguments.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

c5d120f8

NFS: Move declaration of nfs_mount() to fs/nfs/internal.h · 146ec944

由 Chuck Lever 提交于 12月 23, 2008

Clean up:  The nfs_mount() function is not to be used outside of the
NFS client.  Move its public declaration to fs/nfs/internal.h.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

146ec944

NFS: rename nfs_path variable · 7b5d2b98

由 Chuck Lever 提交于 12月 23, 2008

Clean up: I'm about to move the declaration of nfs_mount into
fs/nfs/internal.h and include it in fs/nfs/nfsroot.c.  There's a
conflicting definition of nfs_path in fs/nfs/internal.h and
fs/nfs/nfsroot.c, so rename the private one.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

7b5d2b98

lockd: convert reclaimer thread to kthread interface · df94f000

由 Jeff Layton 提交于 12月 23, 2008

My understanding is that there is a push to turn the kernel_thread
interface into a non-exported symbol and move all kernel threads to use
the kthread API. This patch changes lockd to use kthread_run to spawn
the reclaimer thread.

I've made the assumption here that the extra module references taken
when we spawn this thread are unnecessary and removed them. I've also
added a KERN_ERR printk that pops if the thread can't be spawned to warn
the admin that the locks won't be reclaimed.

In the future, it would be nice to be able to notify userspace that
locks have been lost (probably by implementing SIGLOST), and adding some
good policies about how long we should reattempt to reclaim the locks.

Finally, I removed a comment about memory leaks that I believe is
obsolete and added a new one to clarify the result of sending a SIGKILL
to the reclaimer thread. As best I can tell, doing so doesn't actually
cause a memory leak.

I consider this patch 2.6.29 material.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

df94f000

T
LOCKD: Make lockd_up() and lockd_down() exported GPL-only · 2de59872
由 Trond Myklebust 提交于 12月 23, 2008
```
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
```
2de59872

SUNRPC: nfsacl_encode/nfsacl_decode should be exported as GPL-only · d716f0b8

由 Trond Myklebust 提交于 12月 23, 2008

Again, this has never been intended as a public abi for out-of-tree
modules.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

d716f0b8

nfs: remove redundant tests on reading new pages · 136221fc

由 Wu Fengguang 提交于 12月 23, 2008

aops->readpages() and its NFS helper readpage_async_filler() will only
be called to do readahead I/O for newly allocated pages. So it's not
necessary to test for the always 0 dirty/uptodate page flags.

The removal of nfs_wb_page() call also fixes a readahead bug: the NFS
readahead has been synchronous since 2.6.23, because that call will
clear PG_readahead, which is the reminder for asynchronous readahead.

More background: the PG_readahead page flag is shared with PG_reclaim,
one for read path and the other for write path. clear_page_dirty_for_io()
unconditionally clears PG_readahead to prevent possible readahead residuals,
assuming itself to be always called in the write path. However, NFS is one
and the only exception in that it _always_ calls clear_page_dirty_for_io()
in the read path, i.e. for readpages()/readpage().

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: NWu Fengguang <wfg@linux.intel.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

136221fc

20 12月, 2008 3 次提交

fs/9p: change simple_strtol to simple_strtoul · f1d9e458

由 Julia Lawall 提交于 12月 19, 2008

Since v9ses->uid is unsigned, it would seem better to use simple_strtoul that
simple_strtol.

A simplified version of the semantic patch that makes this change is as
follows: (http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r2@
long e;
position p;
@@

e = simple_strtol@p(...)

@@
position p != r2.p;
type T;
T e;
@@

e =
- simple_strtol@p
+ simple_strtoul
  (...)
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Acked-by: NEric Van Hensbergen <ericvh@gmail.com>

f1d9e458

9p: convert d_iname references to d_name.name · 7dd0cdc5

由 Wu Fengguang 提交于 12月 19, 2008

d_iname is rubbish for long file names.
Use d_name.name in printks instead.
Signed-off-by: NWu Fengguang <wfg@linux.intel.com>
Acked-by: NEric Van Hensbergen <ericvh@gmail.com>

7dd0cdc5

D
9p: Remove potentially bad parameter from function entry debug print. · 6ff23207
由 Duane Griffin 提交于 12月 19, 2008
```
Signed-off-by: NDuane Griffin <duaneg@dghda.com>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
```
6ff23207

18 12月, 2008 1 次提交

cifs: fix buffer overrun in parse_DFS_referrals · 331c3135

由 Jeff Layton 提交于 12月 17, 2008

While testing a kernel with memory poisoning enabled, I saw some warnings
about the redzone getting clobbered when chasing DFS referrals. The
buffer allocation for the unicode converted version of the searchName is
too small and needs to take null termination into account.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Acked-by: NSteve French <sfrench@us.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

331c3135

17 12月, 2008 2 次提交

ocfs2: Add JBD2 compat feature bit. · a9772189

由 Joel Becker 提交于 12月 16, 2008

Define the OCFS2_FEATURE_COMPAT_JBD2 bit in the filesystem header.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

a9772189

ocfs2: Always update xattr search when creating bucket. · 83099bc6

由 Tao Ma 提交于 12月 05, 2008

When we create xattr bucket during the process of xattr set, we always
need to update the ocfs2_xattr_search since even if the bucket size is
the same as block size, the offset will change because of the removal
of the ocfs2_xattr_block header.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

83099bc6

11 12月, 2008 5 次提交

KSYM_SYMBOL_LEN fixes · 9c246247

由 Hugh Dickins 提交于 12月 09, 2008

Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
to my 966c8c12 sprint_symbol(): use
less stack exposing a bug in slub's list_locations() -
kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
beyond the end of page provided.

The 100 slop which list_locations() allows at end of page looks roughly
enough for all the other stuff it might print after the symbol before
it checks again: break out KSYM_SYMBOL_LEN earlier than before.

Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they
need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
them.

[akpm@linux-foundation.org: ftrace.h needs module.h]
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc Miles Lane <miles.lane@gmail.com>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Acked-by: NSteven Rostedt <srostedt@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9c246247

inotify: fix IN_ONESHOT unmount event watcher · 6ee5a399

由 Dmitri Monakhov 提交于 12月 09, 2008

On umount two event will be dispatched to watcher:

1: inotify_dev_queue_event(.., IN_UNMOUNT,..)
2: remove_watch(watch, dev)
    ->inotify_dev_queue_event(.., IN_IGNORED, ..)

But if watcher has IN_ONESHOT bit set then the watcher will be released
inside first event.  Which result in accessing invalid object later.  IMHO
it is not pure regression.  This bug wasn't triggered while initial
inotify interface testing phase because of another bug in IN_ONESHOT
handling logic :)

  commit ac74c00e
  Author: Ulisses Furquim <ulissesf@gmail.com>
  Date:   Fri Feb 8 04:18:16 2008 -0800
    inotify: fix check for one-shot watches before destroying them
    As the IN_ONESHOT bit is never set when an event is sent we must check it
    in the watch's mask and not in the event's mask.

TESTCASE:
mkdir mnt
mount -ttmpfs none mnt
mkdir mnt/d
./inotify mnt/d&
umount mnt ## << lockup or crash here

TESTSOURCE:
/* gcc -oinotify inotify.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>

int main(int argc, char **argv)
{
        char buf[1024];
        struct inotify_event *ie;
        char *p;
        int i;
        ssize_t l;

        p = argv[1];
        i = inotify_init();
        inotify_add_watch(i, p, ~0);

        l = read(i, buf, sizeof(buf));
        printf("read %d bytes\n", l);
        ie = (struct inotify_event *) buf;
        printf("event mask: %d\n", ie->mask);
	return 0;
}
Signed-off-by: NDmitri Monakhov <dmonakhov@openvz.org>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Ulisses Furquim <ulissesf@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6ee5a399

pagemap: fix 32-bit pagemap regression · 49c50342

由 Matt Mackall 提交于 12月 09, 2008

The large pages fix from bcf8039e broke 32-bit pagemap by pulling the
pagemap entry code out into a function with the wrong return type.
Pagemap entries are 64 bits on all systems and unsigned long is only 32
bits on 32-bit systems.
Signed-off-by: NMatt Mackall <mpm@selenic.com>
Reported-by: NDoug Graham <dgraham@nortel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: <stable@kernel.org>		[2.6.26.x, 2.6.27.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

49c50342

revert "percpu_counter: new function percpu_counter_sum_and_set" · 02d21168

由 Andrew Morton 提交于 12月 09, 2008

Revert

    commit e8ced39d
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Fri Jul 11 19:27:31 2008 -0400

        percpu_counter: new function percpu_counter_sum_and_set

As described in

	revert "percpu counter: clean up percpu_counter_sum_and_set()"

the new percpu_counter_sum_and_set() is racy against updates to the
cpu-local accumulators on other CPUs.  Revert that change.

This means that ext4 will be slow again.  But correct.
Reported-by: NEric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>		[2.6.27.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

02d21168

revert "percpu counter: clean up percpu_counter_sum_and_set()" · 71c5576f

由 Andrew Morton 提交于 12月 09, 2008

Revert

    commit 1f7c14c6
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Thu Oct 9 12:50:59 2008 -0400

        percpu counter: clean up percpu_counter_sum_and_set()

Before this patch we had the following:

percpu_counter_sum(): return the percpu_counter's value

percpu_counter_sum_and_set(): return the percpu_counter's value, copying
that value into the central value and zeroing the per-cpu counters before
returning.

After this patch, percpu_counter_sum_and_set() has gone, and
percpu_counter_sum() gets the old percpu_counter_sum_and_set()
functionality.

Problem is, as Eric points out, the old percpu_counter_sum_and_set()
functionality was racy and wrong.  It zeroes out counters on "other" cpus,
without holding any locks which will prevent races agaist updates from
those other CPUS.

This patch reverts 1f7c14c6.  This means
that percpu_counter_sum_and_set() still has the race, but
percpu_counter_sum() does not.

Note that this is not a simple revert - ext4 has since started using
percpu_counter_sum() for its dirty_blocks counter as well.

Note that this revert patch changes percpu_counter_sum() semantics.

Before the patch, a call to percpu_counter_sum() will bring the counter's
central counter mostly up-to-date, so a following percpu_counter_read()
will return a close value.

After this patch, a call to percpu_counter_sum() will leave the counter's
central accumulator unaltered, so a subsequent call to
percpu_counter_read() can now return a significantly inaccurate result.

If there is any code in the tree which was introduced after
e8ced39d was merged, and which depends
upon the new percpu_counter_sum() semantics, that code will break.
Reported-by: NEric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

71c5576f

10 12月, 2008 1 次提交

tracehook: exec double-reporting fix · 85f33466

由 Roland McGrath 提交于 12月 09, 2008

The patch 6341c393 "tracehook: exec" introduced a small regression in
2.6.27 regarding binfmt_misc exec event reporting.  Since the reporting
is now done in the common search_binary_handler() function, an exec
of a misc binary will result in two (or possibly multiple) exec events
being reported, instead of just a single one, because the misc handler
contains a recursive call to search_binary_handler.

To add to the confusion, if PTRACE_O_TRACEEXEC is not active, the multiple
SIGTRAP signals will in fact cause only a single ptrace intercept, as the
signals are not queued.  However, if PTRACE_O_TRACEEXEC is on, the debugger
will actually see multiple ptrace intercepts (PTRACE_EVENT_EXEC).

The test program included below demonstrates the problem.

This change fixes the bug by calling tracehook_report_exec() only in the
outermost search_binary_handler() call (bprm->recursion_depth == 0).

The additional change to restore bprm->recursion_depth after each binfmt
load_binary call is actually superfluous for this bug, since we test the
value saved on entry to search_binary_handler().  But it keeps the use of
of the depth count to its most obvious expected meaning.  Depending on what
binfmt handlers do in certain cases, there could have been false-positive
tests for recursion limits before this change.

    /* Test program using PTRACE_O_TRACEEXEC.
       This forks and exec's the first argument with the rest of the arguments,
       while ptrace'ing.  It expects to see one PTRACE_EVENT_EXEC stop and
       then a successful exit, with no other signals or events in between.

       Test for kernel doing two PTRACE_EVENT_EXEC stops for a binfmt_misc exec:

       $ gcc -g traceexec.c -o traceexec
       $ sudo sh -c 'echo :test:M::foobar::/bin/cat: > /proc/sys/fs/binfmt_misc/register'
       $ echo 'foobar test' > ./foobar
       $ chmod +x ./foobar
       $ ./traceexec ./foobar; echo $?
       ==> good <==
       foobar test
       0
       $
       ==> bad <==
       foobar test
       unexpected status 0x4057f != 0
       3
       $

    */

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <sys/ptrace.h>
    #include <unistd.h>
    #include <signal.h>
    #include <stdlib.h>

    static void
    wait_for (pid_t child, int expect)
    {
      int status;
      pid_t p = wait (&status);
      if (p != child)
	{
	  perror ("wait");
	  exit (2);
	}
      if (status != expect)
	{
	  fprintf (stderr, "unexpected status %#x != %#x\n", status, expect);
	  exit (3);
	}
    }

    int
    main (int argc, char **argv)
    {
      pid_t child = fork ();

      if (child < 0)
	{
	  perror ("fork");
	  return 127;
	}
      else if (child == 0)
	{
	  ptrace (PTRACE_TRACEME);
	  raise (SIGUSR1);
	  execv (argv[1], &argv[1]);
	  perror ("execve");
	  _exit (127);
	}

      wait_for (child, W_STOPCODE (SIGUSR1));

      if (ptrace (PTRACE_SETOPTIONS, child,
		  0L, (void *) (long) PTRACE_O_TRACEEXEC) != 0)
	{
	  perror ("PTRACE_SETOPTIONS");
	  return 4;
	}

      if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0)
	{
	  perror ("PTRACE_CONT");
	  return 5;
	}

      wait_for (child, W_STOPCODE (SIGTRAP | (PTRACE_EVENT_EXEC << 8)));

      if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0)
	{
	  perror ("PTRACE_CONT");
	  return 6;
	}

      wait_for (child, W_EXITCODE (0, 0));

      return 0;
    }
Reported-by: NArnd Bergmann <arnd@arndb.de>
CC: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: NRoland McGrath <roland@redhat.com>

85f33466

09 12月, 2008 1 次提交

EXPORTFS: handle NULL returns from fh_to_dentry()/fh_to_parent() · a4f4d6df

由 J. Bruce Fields 提交于 12月 08, 2008

While 44003728 "[PATCH] switch all filesystems over to
d_obtain_alias" removed some cases where fh_to_dentry() and
fh_to_parent() could return NULL, there are still a few NULL returns
left in individual filesystems.  Thus it was a mistake for that commit
to remove the handling of NULL returns in the callers.

Revert those parts of 44003728 which removed the NULL handling.

(We could, alternatively, modify all implementations to return -ESTALE
instead of NULL, but that proves to require fixing a number of
filesystems, and in some cases it's arguably more natural to return
NULL.)

Thanks to David for original patch and Linus, Christoph, and Hugh for
review.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a4f4d6df

06 12月, 2008 1 次提交

Fix a race condition in FASYNC handling · 218d11a8

由 Jonathan Corbet 提交于 12月 05, 2008

Changeset a238b790 (Call fasync()
functions without the BKL) introduced a race which could leave
file->f_flags in a state inconsistent with what the underlying
driver/filesystem believes.  Revert that change, and also fix the same
races in ioctl_fioasync() and ioctl_fionbio().

This is a minimal, short-term fix; the real fix will not involve the
BKL.
Reported-by: NOleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: NJonathan Corbet <corbet@lwn.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

218d11a8

05 12月, 2008 1 次提交

[XFS] Fix hang after disallowed rename across directory quota domains · 576a488a

由 Dave Chinner 提交于 12月 04, 2008

When project quota is active and is being used for directory tree
quota control, we disallow rename outside the current directory
tree. This requires a check to be made after all the inodes
involved in the rename are locked. We fail to unlock the inodes
correctly if we disallow the rename when the target is outside the
current directory tree. This results in a hang on the next access
to the inodes involved in failed rename.
Reported-by: NArkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: NDave Chinner <david@fromorbit.com>
Tested-by: NArkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: NLachlan McIlroy <lachlan@sgi.com>

576a488a

04 12月, 2008 2 次提交

[PATCH 1/2] kill FMODE_NDELAY_NOW · fd4ce1ac

由 Christoph Hellwig 提交于 11月 05, 2008

Update FMODE_NDELAY before each ioctl call so that we can kill the
magic FMODE_NDELAY_NOW.  It would be even better to do this directly
in setfl(), but for that we'd need to have FMODE_NDELAY for all files,
not just block special files.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

fd4ce1ac

[PATCH] clean up blkdev_get a little bit · ebbefc01

由 Christoph Hellwig 提交于 11月 05, 2008

The way the bd_claim for the FMODE_EXCL case is implemented is rather
confusing.  Clean it up to the most logical style.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ebbefc01

02 12月, 2008 7 次提交

ntfs: don't fool kernel-doc · 03801553

由 Randy Dunlap 提交于 12月 01, 2008

kernel-doc handles macros now (it has for quite some time), so change the
ntfs_debug() macro's kernel-doc to be just before the macro instead of
before a phony function prototype.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

03801553

epoll: introduce resource usage limits · 7ef9964e

由 Davide Libenzi 提交于 12月 01, 2008

It has been thought that the per-user file descriptors limit would also
limit the resources that a normal user can request via the epoll
interface.  Vegard Nossum reported a very simple program (a modified
version attached) that can make a normal user to request a pretty large
amount of kernel memory, well within the its maximum number of fds.  To
solve such problem, default limits are now imposed, and /proc based
configuration has been introduced.  A new directory has been created,
named /proc/sys/fs/epoll/ and inside there, there are two configuration
points:

  max_user_instances = Maximum number of devices - per user

  max_user_watches   = Maximum number of "watched" fds - per user

The current default for "max_user_watches" limits the memory used by epoll
to store "watches", to 1/32 of the amount of the low RAM.  As example, a
256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
That should be enough to not break existing heavy epoll users.  The
default value for "max_user_instances" is set to 128, that should be
enough too.

This also changes the userspace, because a new error code can now come out
from EPOLL_CTL_ADD (-ENOSPC).  The EMFILE from epoll_create() was already
listed, so that should be ok.

[akpm@linux-foundation.org: use get_current_user()]
Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: NVegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7ef9964e

ocfs2: fix regression in ocfs2_read_blocks_sync() · d6b58f89

由 Mark Fasheh 提交于 11月 21, 2008

We're panicing in ocfs2_read_blocks_sync() if a jbd-managed buffer is seen.
At first glance, this seems ok but in reality it can happen. My test case
was to just run 'exorcist'. A struct inode is being pushed out of memory but
is then re-read at a later time, before the buffer has been checkpointed by
jbd. This causes a BUG to be hit in ocfs2_read_blocks_sync().
Reviewed-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

d6b58f89

ocfs2: fix return value set in init_dlmfs_fs() · 07d9a395

由 Coly Li 提交于 11月 17, 2008

In init_dlmfs_fs(), if calling kmem_cache_create() failed, the code will use return value from
calling bdi_init(). The correct behavior should be set status as -ENOMEM before going to "bail:".
Signed-off-by: NColy Li <coyli@suse.de>
Acked-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

07d9a395

ocfs2: fix wake_up in unlock_ast · 07f9eebc

由 David Teigland 提交于 11月 17, 2008

In ocfs2_unlock_ast(), call wake_up() on lockres before releasing
the spin lock on it.  As soon as the spin lock is released, the
lockres can be freed.
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

07f9eebc

ocfs2: initialize stack_user lvbptr · 66f502a4

由 David Teigland 提交于 11月 10, 2008

The locking_state dump, ocfs2_dlm_seq_show, reads the lvb on locks where it
has not yet been initialized by a lock call.
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

66f502a4

ocfs2: comments typo fix · 3b5da018

由 Coly Li 提交于 11月 05, 2008

This patch fixes two typos in comments of ocfs2.
Signed-off-by: NColy Li <coyli@suse.de>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

3b5da018

28 11月, 2008 1 次提交

udf: Fix BUG_ON() in destroy_inode() · 52b19ac9

由 Jan Kara 提交于 9月 23, 2008

udf_clear_inode() can leave behind buffers on mapping's i_private list (when
we truncated preallocation). Call invalidate_inode_buffers() so that the list
is properly cleaned-up before we return from udf_clear_inode(). This is ugly
and suggest that we should cleanup preallocation earlier than in clear_inode()
but currently there's no such call available since drop_inode() is called under
inode lock and thus is unusable for disk operations.
Signed-off-by: NJan Kara <jack@suse.cz>

52b19ac9

27 11月, 2008 1 次提交

[CIFS] fix regression in cifs_write_begin/cifs_write_end · a98ee8c1

由 Jeff Layton 提交于 11月 26, 2008

The conversion to write_begin/write_end interfaces had a bug where we
were passing a bad parameter to cifs_readpage_worker. Rather than
passing the page offset of the start of the write, we needed to pass the
offset of the beginning of the page. This was reliably showing up as
data corruption in the fsx-linux test from LTP.

It also became evident that this code was occasionally doing unnecessary
read calls. Optimize those away by using the PG_checked flag to indicate
that the unwritten part of the page has been initialized.

CC: Nick Piggin <npiggin@suse.de>
Acked-by: NDave Kleikamp <shaggy@us.ibm.com>
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

a98ee8c1

25 11月, 2008 3 次提交

NLM: client-side nlm_lookup_host() should avoid matching on srcaddr · a8d82d9b

由 Chuck Lever 提交于 11月 24, 2008

Since commit c98451bd, the loop in nlm_lookup_host() unconditionally
compares the host's h_srcaddr field to the incoming source address.
For client-side nlm_host entries, both are always AF_UNSPEC, so this
check is unnecessary.

Since commit 781b61a6, which added support for AF_INET6 addresses to
nlm_cmp_addr(), nlm_cmp_addr() now returns FALSE for AF_UNSPEC
addresses, which causes nlm_lookup_host() to create a fresh nlm_host
entry every time it is called on the client.

These extra entries will eventually expire once the server is
unmounted, so the impact of this regression, introduced with lockd
IPv6 support in 2.6.28, should be minor.

We could fix this by adding an arm in nlm_cmp_addr() for AF_UNSPEC
addresses, but really, nlm_lookup_host() shouldn't be matching on the
srcaddr field for client-side nlm_host lookups.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>

a8d82d9b

nfsd: use of unitialized list head on error exit in nfs4recover.c · e4625eb8

由 J. Bruce Fields 提交于 11月 24, 2008

Thanks to Matthew Dodd for this bug report:

A file label issue while running SELinux in MLS mode provoked the
following bug, which is a result of use before init on a 'struct list_head'.

In nfsd4_list_rec_dir() if the call to dentry_open() fails the 'goto
out' skips INIT_LIST_HEAD() which results in the normally improbable
case where list_entry() returns NULL.

Trace follows.

NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
SELinux:  Context unconfined_t:object_r:var_lib_nfs_t:s0 is not valid
(left unmapped).
type=1400 audit(1227298063.609:282): avc:  denied  { read } for
pid=1890 comm="rpc.nfsd" name="v4recovery" dev=dm-0 ino=148726
scontext=system_u:system_r:nfsd_t:s0-s15:c0.c1023
tcontext=system_u:object_r:unlabeled_t:s15:c0.c1023 tclass=dir
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c050894e>] list_del+0x6/0x60
*pde = 0d9ce067 *pte = 00000000
Oops: 0000 [#1] SMP
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs autofs4
sunrpc ipv6 dm_multipath scsi_dh ppdev parport_pc sg parport floppy
ata_piix pata_acpi ata_generic libata pcnet32 i2c_piix4 mii pcspkr
i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod BusLogic sd_mod
scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last
unloaded: microcode]

Pid: 1890, comm: rpc.nfsd Not tainted (2.6.27.5-37.fc9.i686 #1)
EIP: 0060:[<c050894e>] EFLAGS: 00010217 CPU: 0
EIP is at list_del+0x6/0x60
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: cd99e480
ESI: cf9caed8 EDI: 00000000 EBP: cf9caebc ESP: cf9caeb8
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rpc.nfsd (pid: 1890, ti=cf9ca000 task=cf4de580 task.ti=cf9ca000)
Stack: 00000000 cf9caef0 d0a9f139 c0496d04 d0a9f217 fffffff3 00000000
00000000
        00000000 00000000 cf32b220 00000000 00000008 00000801 cf9caefc
d0a9f193
        00000000 cf9caf08 d0a9b6ea 00000000 cf9caf1c d0a874f2 cf9c3004
00000008
Call Trace:
  [<d0a9f139>] ? nfsd4_list_rec_dir+0xf3/0x13a [nfsd]
  [<c0496d04>] ? do_path_lookup+0x12d/0x175
  [<d0a9f217>] ? load_recdir+0x0/0x26 [nfsd]
  [<d0a9f193>] ? nfsd4_recdir_load+0x13/0x34 [nfsd]
  [<d0a9b6ea>] ? nfs4_state_start+0x2a/0xc5 [nfsd]
  [<d0a874f2>] ? nfsd_svc+0x51/0xff [nfsd]
  [<d0a87f2d>] ? write_svc+0x0/0x1e [nfsd]
  [<d0a87f48>] ? write_svc+0x1b/0x1e [nfsd]
  [<d0a87854>] ? nfsctl_transaction_write+0x3a/0x61 [nfsd]
  [<c04b6a4e>] ? sys_nfsservctl+0x116/0x154
  [<c04975c1>] ? putname+0x24/0x2f
  [<c04975c1>] ? putname+0x24/0x2f
  [<c048d49f>] ? do_sys_open+0xad/0xb7
  [<c048d337>] ? filp_close+0x50/0x5a
  [<c048d4eb>] ? sys_open+0x1e/0x26
  [<c0403cca>] ? syscall_call+0x7/0xb
  [<c064007b>] ? init_cyrix+0x185/0x490
  =======================
Code: 75 e1 8b 53 08 8d 4b 04 8d 46 04 e8 75 00 00 00 8b 53 10 8d 4b 0c
8d 46 0c e8 67 00 00 00 5b 5e 5f 5d c3 90 90 55 89 e5 53 89 c3 <8b> 40
04 8b 00 39 d8 74 16 50 53 68 3e d6 6f c0 6a 30 68 78 d6
EIP: [<c050894e>] list_del+0x6/0x60 SS:ESP 0068:cf9caeb8
---[ end trace a89c4ad091c4ad53 ]---

Cc: Matthew N. Dodd <Matthew.Dodd@spart.com>
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>

e4625eb8

nfsd: clean up grace period on early exit · 2c5e7615

由 J. Bruce Fields 提交于 11月 20, 2008

If nfsd was shut down before the grace period ended, we could end up
with a freed object still on grace_list.  Thanks to Jeff Moyer for
reporting the resulting list corruption warnings.
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Tested-by: NJeff Moyer <jmoyer@redhat.com>

2c5e7615