1. 24 12月, 2008 11 次提交
    • C
      NFS: "[no]resvport" mount option changes mountd client too · 50a737f8
      Chuck Lever 提交于
      If the admin has specified the "noresvport" option for an NFS mount
      point, the kernel's NFS client uses an unprivileged source port for
      the main NFS transport.  The kernel's mountd client should use an
      unprivileged port in this case as well.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      50a737f8
    • C
      NFS: add "[no]resvport" mount option · d740351b
      Chuck Lever 提交于
      The standard default security setting for NFS is AUTH_SYS.  An NFS
      client connects to NFS servers via a privileged source port and a
      fixed standard destination port (2049).  The client sends raw uid and
      gid numbers to identify users making NFS requests, and the server
      assumes an appropriate authority on the client has vetted these
      values because the source port is privileged.
      
      On Linux, by default in-kernel RPC services use a privileged port in
      the range between 650 and 1023 to avoid using source ports of well-
      known IP services.  Using such a small range limits the number of NFS
      mount points and the number of unique NFS servers to which a client
      can connect concurrently.
      
      An NFS client can use unprivileged source ports to expand the range of
      source port numbers, allowing more concurrent server connections and
      more NFS mount points.  Servers must explicitly allow NFS connections
      from unprivileged ports for this to work.
      
      In the past, bumping the value of the sunrpc.max_resvport sysctl on
      the client would permit the NFS client to use unprivileged ports.
      Bumping this setting also changes the maximum port number used by
      other in-kernel RPC services, some of which still required a port
      number less than 1023.
      
      This is exacerbated by the way source port numbers are chosen by the
      Linux RPC client, which starts at the top of the range and works
      downwards.  It means that bumping the maximum means all RPC services
      requesting a source port will likely get an unprivileged port instead
      of a privileged one.
      
      Changing this setting effects all NFS mount points on a client.  A
      sysadmin could not selectively choose which mount points would use
      non-privileged ports and which could not.
      
      Lastly, this mechanism of expanding the limit on the number of NFS
      mount points was entirely undocumented.
      
      To address the need for the NFS client to use a large range of source
      ports without interfering with the activity of other in-kernel RPC
      services, we introduce a new NFS mount option.  This option explicitly
      tells only the NFS client to use a non-privileged source port when
      communicating with the NFS server for one specific mount point.
      
      This new mount option is called "resvport," like the similar NFS mount
      option on FreeBSD and Mac OS X.  A sister patch for nfs-utils will be
      submitted that documents this new option in nfs(5).
      
      The default setting for this new mount option requires the NFS client
      to use a privileged port, as before.  Explicitly specifying the
      "noresvport" mount option allows the NFS client to use an unprivileged
      source port for this mount point when connecting to the NFS server
      port.
      
      This mount option is supported only for text-based NFS mounts.
      
      [ Sidebar: it is widely known that security mechanisms based on the
        use of privileged source ports are ineffective.  However, the NFS
        client can combine the use of unprivileged ports with the use of
        secure authentication mechanisms, such as Kerberos.  This allows a
        large number of connections and mount points while ensuring a useful
        level of security.
      
        Eventually we may change the default setting for this option
        depending on the security flavor used for the mount.  For example,
        if the mount is using only AUTH_SYS, then the default setting will
        be "resvport;" if the mount is using a strong security flavor such
        as krb5, the default setting will be "noresvport." ]
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      [Trond.Myklebust@netapp.com: Fixed a bug whereby nfs4_init_client()
      was being called with incorrect arguments.]
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      d740351b
    • C
      NFS: move nfs_server flag initialization · 542fcc33
      Chuck Lever 提交于
      Make it possible for the NFSv4 mount set up logic to pass mount option
      flags down the stack to nfs_create_rpc_client().
      
      This is immediately useful if we want NFS mount options to modulate
      settings of the underlying RPC transport, but it may be useful at some
      later point if other parts of the NFSv4 mount initialization logic
      want to know what the mount options are.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      542fcc33
    • C
      NFS: expand flags passed to nfs_create_rpc_client() · 4a01b8a4
      Chuck Lever 提交于
      The nfs_create_rpc_client() function sets up an RPC client for an NFS
      mount point.  Add an option that allows it to set up an RPC transport
      from an unprivileged port.
      
      Instead of having nfs_create_rpc_client()'s callers retain local
      knowledge about how to set up an RPC client, create a couple of flag
      arguments to control the use of RPC_CLNT_CREATE flags.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      4a01b8a4
    • C
      NFS: introduce nfs_mount_info struct for calling nfs_mount() · c5d120f8
      Chuck Lever 提交于
      Clean up: convert nfs_mount() to take a single data structure argument to make
      it simpler to add more arguments.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      c5d120f8
    • C
      NFS: Move declaration of nfs_mount() to fs/nfs/internal.h · 146ec944
      Chuck Lever 提交于
      Clean up:  The nfs_mount() function is not to be used outside of the
      NFS client.  Move its public declaration to fs/nfs/internal.h.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      146ec944
    • C
      NFS: rename nfs_path variable · 7b5d2b98
      Chuck Lever 提交于
      Clean up: I'm about to move the declaration of nfs_mount into
      fs/nfs/internal.h and include it in fs/nfs/nfsroot.c.  There's a
      conflicting definition of nfs_path in fs/nfs/internal.h and
      fs/nfs/nfsroot.c, so rename the private one.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      7b5d2b98
    • J
      lockd: convert reclaimer thread to kthread interface · df94f000
      Jeff Layton 提交于
      My understanding is that there is a push to turn the kernel_thread
      interface into a non-exported symbol and move all kernel threads to use
      the kthread API. This patch changes lockd to use kthread_run to spawn
      the reclaimer thread.
      
      I've made the assumption here that the extra module references taken
      when we spawn this thread are unnecessary and removed them. I've also
      added a KERN_ERR printk that pops if the thread can't be spawned to warn
      the admin that the locks won't be reclaimed.
      
      In the future, it would be nice to be able to notify userspace that
      locks have been lost (probably by implementing SIGLOST), and adding some
      good policies about how long we should reattempt to reclaim the locks.
      
      Finally, I removed a comment about memory leaks that I believe is
      obsolete and added a new one to clarify the result of sending a SIGKILL
      to the reclaimer thread. As best I can tell, doing so doesn't actually
      cause a memory leak.
      
      I consider this patch 2.6.29 material.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      df94f000
    • T
    • T
      SUNRPC: nfsacl_encode/nfsacl_decode should be exported as GPL-only · d716f0b8
      Trond Myklebust 提交于
      Again, this has never been intended as a public abi for out-of-tree
      modules.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      d716f0b8
    • W
      nfs: remove redundant tests on reading new pages · 136221fc
      Wu Fengguang 提交于
      aops->readpages() and its NFS helper readpage_async_filler() will only
      be called to do readahead I/O for newly allocated pages. So it's not
      necessary to test for the always 0 dirty/uptodate page flags.
      
      The removal of nfs_wb_page() call also fixes a readahead bug: the NFS
      readahead has been synchronous since 2.6.23, because that call will
      clear PG_readahead, which is the reminder for asynchronous readahead.
      
      More background: the PG_readahead page flag is shared with PG_reclaim,
      one for read path and the other for write path. clear_page_dirty_for_io()
      unconditionally clears PG_readahead to prevent possible readahead residuals,
      assuming itself to be always called in the write path. However, NFS is one
      and the only exception in that it _always_ calls clear_page_dirty_for_io()
      in the read path, i.e. for readpages()/readpage().
      
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NWu Fengguang <wfg@linux.intel.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      136221fc
  2. 20 12月, 2008 3 次提交
  3. 18 12月, 2008 1 次提交
  4. 17 12月, 2008 2 次提交
  5. 11 12月, 2008 5 次提交
    • H
      KSYM_SYMBOL_LEN fixes · 9c246247
      Hugh Dickins 提交于
      Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
      to my 966c8c12 sprint_symbol(): use
      less stack exposing a bug in slub's list_locations() -
      kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was
      beyond the end of page provided.
      
      The 100 slop which list_locations() allows at end of page looks roughly
      enough for all the other stuff it might print after the symbol before
      it checks again: break out KSYM_SYMBOL_LEN earlier than before.
      
      Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they
      need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
      where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
      them.
      
      [akpm@linux-foundation.org: ftrace.h needs module.h]
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc Miles Lane <miles.lane@gmail.com>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: NSteven Rostedt <srostedt@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c246247
    • D
      inotify: fix IN_ONESHOT unmount event watcher · 6ee5a399
      Dmitri Monakhov 提交于
      On umount two event will be dispatched to watcher:
      
      1: inotify_dev_queue_event(.., IN_UNMOUNT,..)
      2: remove_watch(watch, dev)
          ->inotify_dev_queue_event(.., IN_IGNORED, ..)
      
      But if watcher has IN_ONESHOT bit set then the watcher will be released
      inside first event.  Which result in accessing invalid object later.  IMHO
      it is not pure regression.  This bug wasn't triggered while initial
      inotify interface testing phase because of another bug in IN_ONESHOT
      handling logic :)
      
        commit ac74c00e
        Author: Ulisses Furquim <ulissesf@gmail.com>
        Date:   Fri Feb 8 04:18:16 2008 -0800
          inotify: fix check for one-shot watches before destroying them
          As the IN_ONESHOT bit is never set when an event is sent we must check it
          in the watch's mask and not in the event's mask.
      
      TESTCASE:
      mkdir mnt
      mount -ttmpfs none mnt
      mkdir mnt/d
      ./inotify mnt/d&
      umount mnt ## << lockup or crash here
      
      TESTSOURCE:
      /* gcc -oinotify inotify.c */
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/inotify.h>
      
      int main(int argc, char **argv)
      {
              char buf[1024];
              struct inotify_event *ie;
              char *p;
              int i;
              ssize_t l;
      
              p = argv[1];
              i = inotify_init();
              inotify_add_watch(i, p, ~0);
      
              l = read(i, buf, sizeof(buf));
              printf("read %d bytes\n", l);
              ie = (struct inotify_event *) buf;
              printf("event mask: %d\n", ie->mask);
      	return 0;
      }
      Signed-off-by: NDmitri Monakhov <dmonakhov@openvz.org>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Robert Love <rlove@google.com>
      Cc: Ulisses Furquim <ulissesf@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ee5a399
    • M
      pagemap: fix 32-bit pagemap regression · 49c50342
      Matt Mackall 提交于
      The large pages fix from bcf8039e broke 32-bit pagemap by pulling the
      pagemap entry code out into a function with the wrong return type.
      Pagemap entries are 64 bits on all systems and unsigned long is only 32
      bits on 32-bit systems.
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Reported-by: NDoug Graham <dgraham@nortel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: <stable@kernel.org>		[2.6.26.x, 2.6.27.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49c50342
    • A
      revert "percpu_counter: new function percpu_counter_sum_and_set" · 02d21168
      Andrew Morton 提交于
      Revert
      
          commit e8ced39d
          Author: Mingming Cao <cmm@us.ibm.com>
          Date:   Fri Jul 11 19:27:31 2008 -0400
      
              percpu_counter: new function percpu_counter_sum_and_set
      
      As described in
      
      	revert "percpu counter: clean up percpu_counter_sum_and_set()"
      
      the new percpu_counter_sum_and_set() is racy against updates to the
      cpu-local accumulators on other CPUs.  Revert that change.
      
      This means that ext4 will be slow again.  But correct.
      Reported-by: NEric Dumazet <dada1@cosmosbay.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: <stable@kernel.org>		[2.6.27.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      02d21168
    • A
      revert "percpu counter: clean up percpu_counter_sum_and_set()" · 71c5576f
      Andrew Morton 提交于
      Revert
      
          commit 1f7c14c6
          Author: Mingming Cao <cmm@us.ibm.com>
          Date:   Thu Oct 9 12:50:59 2008 -0400
      
              percpu counter: clean up percpu_counter_sum_and_set()
      
      Before this patch we had the following:
      
      percpu_counter_sum(): return the percpu_counter's value
      
      percpu_counter_sum_and_set(): return the percpu_counter's value, copying
      that value into the central value and zeroing the per-cpu counters before
      returning.
      
      After this patch, percpu_counter_sum_and_set() has gone, and
      percpu_counter_sum() gets the old percpu_counter_sum_and_set()
      functionality.
      
      Problem is, as Eric points out, the old percpu_counter_sum_and_set()
      functionality was racy and wrong.  It zeroes out counters on "other" cpus,
      without holding any locks which will prevent races agaist updates from
      those other CPUS.
      
      This patch reverts 1f7c14c6.  This means
      that percpu_counter_sum_and_set() still has the race, but
      percpu_counter_sum() does not.
      
      Note that this is not a simple revert - ext4 has since started using
      percpu_counter_sum() for its dirty_blocks counter as well.
      
      Note that this revert patch changes percpu_counter_sum() semantics.
      
      Before the patch, a call to percpu_counter_sum() will bring the counter's
      central counter mostly up-to-date, so a following percpu_counter_read()
      will return a close value.
      
      After this patch, a call to percpu_counter_sum() will leave the counter's
      central accumulator unaltered, so a subsequent call to
      percpu_counter_read() can now return a significantly inaccurate result.
      
      If there is any code in the tree which was introduced after
      e8ced39d was merged, and which depends
      upon the new percpu_counter_sum() semantics, that code will break.
      Reported-by: NEric Dumazet <dada1@cosmosbay.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      71c5576f
  6. 10 12月, 2008 1 次提交
    • R
      tracehook: exec double-reporting fix · 85f33466
      Roland McGrath 提交于
      The patch 6341c393 "tracehook: exec" introduced a small regression in
      2.6.27 regarding binfmt_misc exec event reporting.  Since the reporting
      is now done in the common search_binary_handler() function, an exec
      of a misc binary will result in two (or possibly multiple) exec events
      being reported, instead of just a single one, because the misc handler
      contains a recursive call to search_binary_handler.
      
      To add to the confusion, if PTRACE_O_TRACEEXEC is not active, the multiple
      SIGTRAP signals will in fact cause only a single ptrace intercept, as the
      signals are not queued.  However, if PTRACE_O_TRACEEXEC is on, the debugger
      will actually see multiple ptrace intercepts (PTRACE_EVENT_EXEC).
      
      The test program included below demonstrates the problem.
      
      This change fixes the bug by calling tracehook_report_exec() only in the
      outermost search_binary_handler() call (bprm->recursion_depth == 0).
      
      The additional change to restore bprm->recursion_depth after each binfmt
      load_binary call is actually superfluous for this bug, since we test the
      value saved on entry to search_binary_handler().  But it keeps the use of
      of the depth count to its most obvious expected meaning.  Depending on what
      binfmt handlers do in certain cases, there could have been false-positive
      tests for recursion limits before this change.
      
          /* Test program using PTRACE_O_TRACEEXEC.
             This forks and exec's the first argument with the rest of the arguments,
             while ptrace'ing.  It expects to see one PTRACE_EVENT_EXEC stop and
             then a successful exit, with no other signals or events in between.
      
             Test for kernel doing two PTRACE_EVENT_EXEC stops for a binfmt_misc exec:
      
             $ gcc -g traceexec.c -o traceexec
             $ sudo sh -c 'echo :test:M::foobar::/bin/cat: > /proc/sys/fs/binfmt_misc/register'
             $ echo 'foobar test' > ./foobar
             $ chmod +x ./foobar
             $ ./traceexec ./foobar; echo $?
             ==> good <==
             foobar test
             0
             $
             ==> bad <==
             foobar test
             unexpected status 0x4057f != 0
             3
             $
      
          */
      
          #include <stdio.h>
          #include <sys/types.h>
          #include <sys/wait.h>
          #include <sys/ptrace.h>
          #include <unistd.h>
          #include <signal.h>
          #include <stdlib.h>
      
          static void
          wait_for (pid_t child, int expect)
          {
            int status;
            pid_t p = wait (&status);
            if (p != child)
      	{
      	  perror ("wait");
      	  exit (2);
      	}
            if (status != expect)
      	{
      	  fprintf (stderr, "unexpected status %#x != %#x\n", status, expect);
      	  exit (3);
      	}
          }
      
          int
          main (int argc, char **argv)
          {
            pid_t child = fork ();
      
            if (child < 0)
      	{
      	  perror ("fork");
      	  return 127;
      	}
            else if (child == 0)
      	{
      	  ptrace (PTRACE_TRACEME);
      	  raise (SIGUSR1);
      	  execv (argv[1], &argv[1]);
      	  perror ("execve");
      	  _exit (127);
      	}
      
            wait_for (child, W_STOPCODE (SIGUSR1));
      
            if (ptrace (PTRACE_SETOPTIONS, child,
      		  0L, (void *) (long) PTRACE_O_TRACEEXEC) != 0)
      	{
      	  perror ("PTRACE_SETOPTIONS");
      	  return 4;
      	}
      
            if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0)
      	{
      	  perror ("PTRACE_CONT");
      	  return 5;
      	}
      
            wait_for (child, W_STOPCODE (SIGTRAP | (PTRACE_EVENT_EXEC << 8)));
      
            if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0)
      	{
      	  perror ("PTRACE_CONT");
      	  return 6;
      	}
      
            wait_for (child, W_EXITCODE (0, 0));
      
            return 0;
          }
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      CC: Ulrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      85f33466
  7. 09 12月, 2008 1 次提交
    • J
      EXPORTFS: handle NULL returns from fh_to_dentry()/fh_to_parent() · a4f4d6df
      J. Bruce Fields 提交于
      While 44003728 "[PATCH] switch all filesystems over to
      d_obtain_alias" removed some cases where fh_to_dentry() and
      fh_to_parent() could return NULL, there are still a few NULL returns
      left in individual filesystems.  Thus it was a mistake for that commit
      to remove the handling of NULL returns in the callers.
      
      Revert those parts of 44003728 which removed the NULL handling.
      
      (We could, alternatively, modify all implementations to return -ESTALE
      instead of NULL, but that proves to require fixing a number of
      filesystems, and in some cases it's arguably more natural to return
      NULL.)
      
      Thanks to David for original patch and Linus, Christoph, and Hugh for
      review.
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a4f4d6df
  8. 06 12月, 2008 1 次提交
  9. 05 12月, 2008 1 次提交
  10. 04 12月, 2008 2 次提交
  11. 02 12月, 2008 7 次提交
  12. 28 11月, 2008 1 次提交
    • J
      udf: Fix BUG_ON() in destroy_inode() · 52b19ac9
      Jan Kara 提交于
      udf_clear_inode() can leave behind buffers on mapping's i_private list (when
      we truncated preallocation). Call invalidate_inode_buffers() so that the list
      is properly cleaned-up before we return from udf_clear_inode(). This is ugly
      and suggest that we should cleanup preallocation earlier than in clear_inode()
      but currently there's no such call available since drop_inode() is called under
      inode lock and thus is unusable for disk operations.
      Signed-off-by: NJan Kara <jack@suse.cz>
      52b19ac9
  13. 27 11月, 2008 1 次提交
    • J
      [CIFS] fix regression in cifs_write_begin/cifs_write_end · a98ee8c1
      Jeff Layton 提交于
      The conversion to write_begin/write_end interfaces had a bug where we
      were passing a bad parameter to cifs_readpage_worker. Rather than
      passing the page offset of the start of the write, we needed to pass the
      offset of the beginning of the page. This was reliably showing up as
      data corruption in the fsx-linux test from LTP.
      
      It also became evident that this code was occasionally doing unnecessary
      read calls. Optimize those away by using the PG_checked flag to indicate
      that the unwritten part of the page has been initialized.
      
      CC: Nick Piggin <npiggin@suse.de>
      Acked-by: NDave Kleikamp <shaggy@us.ibm.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      a98ee8c1
  14. 25 11月, 2008 3 次提交
    • C
      NLM: client-side nlm_lookup_host() should avoid matching on srcaddr · a8d82d9b
      Chuck Lever 提交于
      Since commit c98451bd, the loop in nlm_lookup_host() unconditionally
      compares the host's h_srcaddr field to the incoming source address.
      For client-side nlm_host entries, both are always AF_UNSPEC, so this
      check is unnecessary.
      
      Since commit 781b61a6, which added support for AF_INET6 addresses to
      nlm_cmp_addr(), nlm_cmp_addr() now returns FALSE for AF_UNSPEC
      addresses, which causes nlm_lookup_host() to create a fresh nlm_host
      entry every time it is called on the client.
      
      These extra entries will eventually expire once the server is
      unmounted, so the impact of this regression, introduced with lockd
      IPv6 support in 2.6.28, should be minor.
      
      We could fix this by adding an arm in nlm_cmp_addr() for AF_UNSPEC
      addresses, but really, nlm_lookup_host() shouldn't be matching on the
      srcaddr field for client-side nlm_host lookups.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      a8d82d9b
    • J
      nfsd: use of unitialized list head on error exit in nfs4recover.c · e4625eb8
      J. Bruce Fields 提交于
      Thanks to Matthew Dodd for this bug report:
      
      A file label issue while running SELinux in MLS mode provoked the
      following bug, which is a result of use before init on a 'struct list_head'.
      
      In nfsd4_list_rec_dir() if the call to dentry_open() fails the 'goto
      out' skips INIT_LIST_HEAD() which results in the normally improbable
      case where list_entry() returns NULL.
      
      Trace follows.
      
      NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
      SELinux:  Context unconfined_t:object_r:var_lib_nfs_t:s0 is not valid
      (left unmapped).
      type=1400 audit(1227298063.609:282): avc:  denied  { read } for
      pid=1890 comm="rpc.nfsd" name="v4recovery" dev=dm-0 ino=148726
      scontext=system_u:system_r:nfsd_t:s0-s15:c0.c1023
      tcontext=system_u:object_r:unlabeled_t:s15:c0.c1023 tclass=dir
      BUG: unable to handle kernel NULL pointer dereference at 00000004
      IP: [<c050894e>] list_del+0x6/0x60
      *pde = 0d9ce067 *pte = 00000000
      Oops: 0000 [#1] SMP
      Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs autofs4
      sunrpc ipv6 dm_multipath scsi_dh ppdev parport_pc sg parport floppy
      ata_piix pata_acpi ata_generic libata pcnet32 i2c_piix4 mii pcspkr
      i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod BusLogic sd_mod
      scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last
      unloaded: microcode]
      
      Pid: 1890, comm: rpc.nfsd Not tainted (2.6.27.5-37.fc9.i686 #1)
      EIP: 0060:[<c050894e>] EFLAGS: 00010217 CPU: 0
      EIP is at list_del+0x6/0x60
      EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: cd99e480
      ESI: cf9caed8 EDI: 00000000 EBP: cf9caebc ESP: cf9caeb8
        DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process rpc.nfsd (pid: 1890, ti=cf9ca000 task=cf4de580 task.ti=cf9ca000)
      Stack: 00000000 cf9caef0 d0a9f139 c0496d04 d0a9f217 fffffff3 00000000
      00000000
              00000000 00000000 cf32b220 00000000 00000008 00000801 cf9caefc
      d0a9f193
              00000000 cf9caf08 d0a9b6ea 00000000 cf9caf1c d0a874f2 cf9c3004
      00000008
      Call Trace:
        [<d0a9f139>] ? nfsd4_list_rec_dir+0xf3/0x13a [nfsd]
        [<c0496d04>] ? do_path_lookup+0x12d/0x175
        [<d0a9f217>] ? load_recdir+0x0/0x26 [nfsd]
        [<d0a9f193>] ? nfsd4_recdir_load+0x13/0x34 [nfsd]
        [<d0a9b6ea>] ? nfs4_state_start+0x2a/0xc5 [nfsd]
        [<d0a874f2>] ? nfsd_svc+0x51/0xff [nfsd]
        [<d0a87f2d>] ? write_svc+0x0/0x1e [nfsd]
        [<d0a87f48>] ? write_svc+0x1b/0x1e [nfsd]
        [<d0a87854>] ? nfsctl_transaction_write+0x3a/0x61 [nfsd]
        [<c04b6a4e>] ? sys_nfsservctl+0x116/0x154
        [<c04975c1>] ? putname+0x24/0x2f
        [<c04975c1>] ? putname+0x24/0x2f
        [<c048d49f>] ? do_sys_open+0xad/0xb7
        [<c048d337>] ? filp_close+0x50/0x5a
        [<c048d4eb>] ? sys_open+0x1e/0x26
        [<c0403cca>] ? syscall_call+0x7/0xb
        [<c064007b>] ? init_cyrix+0x185/0x490
        =======================
      Code: 75 e1 8b 53 08 8d 4b 04 8d 46 04 e8 75 00 00 00 8b 53 10 8d 4b 0c
      8d 46 0c e8 67 00 00 00 5b 5e 5f 5d c3 90 90 55 89 e5 53 89 c3 <8b> 40
      04 8b 00 39 d8 74 16 50 53 68 3e d6 6f c0 6a 30 68 78 d6
      EIP: [<c050894e>] list_del+0x6/0x60 SS:ESP 0068:cf9caeb8
      ---[ end trace a89c4ad091c4ad53 ]---
      
      Cc: Matthew N. Dodd <Matthew.Dodd@spart.com>
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      e4625eb8
    • J
      nfsd: clean up grace period on early exit · 2c5e7615
      J. Bruce Fields 提交于
      If nfsd was shut down before the grace period ended, we could end up
      with a freed object still on grace_list.  Thanks to Jeff Moyer for
      reporting the resulting list corruption warnings.
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      Tested-by: NJeff Moyer <jmoyer@redhat.com>
      2c5e7615