1. 03 Oct, 2006 11 commits
  2. 02 Oct, 2006 29 commits
    • [PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.h · 3f2e05e9
      Committed by David Howells
      Revert Andrew Morton's patch to temporarily hack around the lack of a
      declaration of sigset_t in linux/compat.h to make the block-disablement
      patches build on IA64.  This got accidentally pushed to Linus and should
      be fixed in a different manner.
      
      Also make linux/compat.h #include asm/signal.h to gain a definition of
      sigset_t so that it can externally declare sigset_from_compat().
      
      This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and
      ppc64 and run-tested on frv.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] replace cad_pid by a struct pid · 9ec52099
      Committed by Cedric Le Goater
      There are a few places in the kernel where the init task is signaled.  The
      ctrl+alt+del sequence is one of them.  It kills a task, usually init, using a
      cached pid (cad_pid).
      
      This patch replaces the pid_t by a struct pid to avoid the pid wraparound
      problem.  The struct pid is initialized at boot time in init() and can be
      modified through sysctl with
      
      	/proc/sys/kernel/cad_pid
      
      [ I haven't found any distro using it ? ]
      
      It also introduces a small helper routine kill_cad_pid() which is used
      where it seemed ok to use cad_pid instead of pid 1.
      
      [akpm@osdl.org: cleanups, build fix]
      Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
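The wraparound problem mentioned above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct pid`, `table`, `pid_attach()`, `pid_detach()`, `kill_by_nr()` and `kill_by_pid()` are hypothetical stand-ins, not the kernel's definitions, and the real struct pid is reference counted so that a cached pointer stays valid after the task exits.

```c
#include <assert.h>
#include <stddef.h>

#define PID_MAX 4                      /* tiny on purpose so numbers get reused */

/* Hypothetical model: one struct pid per process lifetime. */
struct pid {
    int nr;
    int alive;
};

struct pid *table[PID_MAX];            /* pid number -> current owner */

void pid_attach(struct pid *p, int nr)
{
    assert(nr >= 0 && nr < PID_MAX);
    p->nr = nr;
    p->alive = 1;
    table[nr] = p;
}

void pid_detach(struct pid *p)
{
    p->alive = 0;
    if (table[p->nr] == p)
        table[p->nr] = NULL;
}

/* Signal by cached number: reaches whoever owns the number now. */
struct pid *kill_by_nr(int nr)
{
    assert(nr >= 0 && nr < PID_MAX);
    return table[nr];
}

/* Signal by cached struct pid: reaches the original process or nobody. */
struct pid *kill_by_pid(struct pid *p)
{
    return p->alive ? p : NULL;
}
```

In this model, a caller that cached only the number can end up signaling an unrelated process once the number is reused, while a caller that cached the struct pid cannot.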
    • [PATCH] introduce get_task_pid() to fix unsafe get_pid() · 1a657f78
      Committed by Oleg Nesterov
      proc_pid_make_inode:
      
      	ei->pid = get_pid(task_pid(task));
      
      I think this is not safe.  get_pid() can be preempted after checking "pid
      != NULL".  Then the task exits, does detach_pid(), and RCU frees the pid.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
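The message describes the race but not the shape of the fix. A minimal user-space sketch of the idea in the title follows: look up the pid and take the reference inside a single RCU read-side section. The types and the `rcu_read_lock()`/`rcu_read_unlock()` stubs here are simplified assumptions (the stubs only count nesting, they do not defer freeing), so this shows the call structure rather than real RCU protection.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumptions). */
struct pid  { int count; int nr; };
struct task { struct pid *pid; };

int rcu_depth;                          /* > 0 while inside a read-side section */
void rcu_read_lock(void)   { rcu_depth++; }
void rcu_read_unlock(void) { rcu_depth--; }

/* Take a reference; tolerates NULL and returns its argument. */
struct pid *get_pid(struct pid *pid)
{
    if (pid)
        pid->count++;
    return pid;
}

/*
 * Read the task's pid and pin it within one read-side section, so the
 * pid cannot be freed between the NULL check and the increment.
 */
struct pid *get_task_pid(struct task *task)
{
    struct pid *pid;

    rcu_read_lock();
    assert(rcu_depth > 0);
    pid = get_pid(task->pid);
    rcu_read_unlock();
    return pid;
}
```

A caller such as proc_pid_make_inode would then use the helper in place of get_pid(task_pid(task)).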
    • [PATCH] remove remaining errno and __KERNEL_SYSCALLS__ references · 135ab6ec
      Committed by Arnd Bergmann
      The last in-kernel user of errno is gone, so we should remove the definition
      and everything referring to it.  This also removes the now-unused lib/execve.c
      file that was introduced earlier.
      
      Also remove every trace of __KERNEL_SYSCALLS__ that still remained in the
      kernel.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rename the provided execve functions to kernel_execve · 3db03b4a
      Committed by Arnd Bergmann
      Some architectures provide an execve function that does not set errno, but
      instead returns the result code directly.  Rename these to kernel_execve to
      get the right semantics there.  Moreover, there is no reason for any of these
      architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
      remove these right away.
      
      [akpm@osdl.org: build fix]
      [bunk@stusta.de: build fix]
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Andi Kleen <ak@muc.de>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
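The two calling conventions contrasted above can be sketched side by side. This is an illustrative user-space example only: `do_exec()` is a hypothetical stand-in for the real work, and the two wrappers are named for the convention they demonstrate, not copied from any architecture.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in: 0 on success, a negative error code on failure. */
int do_exec(const char *path)
{
    return (path && path[0] == '/') ? 0 : -ENOENT;
}

/* Library-style convention: return -1 and report the reason through errno. */
int execve_errno_style(const char *path)
{
    int ret = do_exec(path);

    if (ret < 0) {
        errno = -ret;
        return -1;
    }
    return 0;
}

/* Direct convention: hand the negative error code straight back. */
int kernel_execve_style(const char *path)
{
    return do_exec(path);
}
```

With the direct convention, the caller needs no global errno variable at all, which fits the removal of the in-kernel errno described in the neighboring commit.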
    • [PATCH] IPC namespace - utils · 73ea4130
      Committed by Kirill Korotaev
      This patch adds basic IPC namespace functionality to
      IPC utils:
      - init_ipc_ns
      - copy/clone/unshare/free IPC ns
      - /proc preparations
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] IPC namespace core · 25b21cb2
      Committed by Kirill Korotaev
      This patch set makes it possible to unshare IPC and have a private set of IPC
      objects (sem, shm, msg) inside a namespace.  Basically, it is another building
      block of container functionality.
      
      This patch implements core IPC namespace changes:
      - ipc_namespace structure
      - new config option CONFIG_IPC_NS
      - adds CLONE_NEWIPC flag
      - unshare support
      
      [clg@fr.ibm.com: small fix for unshare of ipc namespace]
      [akpm@osdl.org: build fix]
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] namespaces: utsname: implement CLONE_NEWUTS flag · 071df104
      Committed by Serge E. Hallyn
      Implement a CLONE_NEWUTS flag, and use it at clone and sys_unshare.
      
      [clg@fr.ibm.com: IPC unshare fix]
      [bunk@stusta.de: cleanup]
      Signed-off-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] namespaces: utsname: remove system_utsname · bf47fdcd
      Committed by Serge E. Hallyn
      The system_utsname isn't needed now that kernel/sysctl.c is fixed.
      Nuke it.
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] namespaces: utsname: implement utsname namespaces · 4865ecf1
      Committed by Serge E. Hallyn
      This patch defines the uts namespace and some manipulators.
      Adds the uts namespace to task_struct, and initializes a
      system-wide init namespace.
      
      It leaves a #define for system_utsname so sysctl will compile.
      This define will be removed in a separate patch.
      
      [akpm@osdl.org: build fix, cleanup]
      Signed-off-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] namespaces: utsname: switch to using uts namespaces · e9ff3990
      Committed by Serge E. Hallyn
      Replace references to system_utsname with the per-process uts namespace
      where appropriate.  This includes things like uname.
      
      Changes: Per Eric Biederman's comments, use the per-process uts namespace
      	for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c
      
      [jdike@addtoit.com: UML fix]
      [clg@fr.ibm.com: cleanup]
      [akpm@osdl.org: build fix]
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] namespaces: utsname: introduce temporary helpers · 0bdd7aab
      Committed by Serge E. Hallyn
      Define utsname() and init_utsname() which return &system_utsname.  Users of
      system_utsname will be changed to use these helpers, after which
      system_utsname will disappear.
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
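The temporary helpers described above can be sketched as two accessors that both return the address of the one global. This is a user-space illustration: the `struct new_utsname` here is a cut-down assumption with a single field, and `current_nodename()` is a hypothetical caller added to show the conversion.

```c
#include <assert.h>
#include <string.h>

/* Cut-down stand-in for the kernel structure (assumption: one field only). */
struct new_utsname {
    char nodename[65];
};

struct new_utsname system_utsname = { "localhost" };

/* For now both helpers return the single global. */
struct new_utsname *utsname(void)
{
    return &system_utsname;
}

struct new_utsname *init_utsname(void)
{
    return &system_utsname;
}

/* Hypothetical caller converted from system_utsname.nodename. */
const char *current_nodename(void)
{
    return utsname()->nodename;
}
```

Once every caller goes through the helpers, the body of utsname() can later be changed to return a per-process structure without touching the callers again.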
    • [PATCH] namespaces: incorporate fs namespace into nsproxy · 1651e14e
      Committed by Serge E. Hallyn
      This moves the mount namespace into the nsproxy.  The mount namespace count
      now refers to the number of nsproxies pointing to it, rather than the number
      of tasks.  As a result, the unshare_namespace() function in kernel/fork.c no
      longer checks whether it is being shared.
      Signed-off-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
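The change in what the namespace count means can be shown with a small reference-counting model. It is a sketch, not kernel code: the structures are reduced to their counters, and `task_share()` and `task_unshare_proxy()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced stand-ins (assumptions): only the reference counts are modeled. */
struct mnt_ns  { int count; };
struct uts_ns  { int count; };
struct nsproxy { int count; struct mnt_ns *mnt; struct uts_ns *uts; };
struct task    { struct nsproxy *nsproxy; };

/* A new task shares its parent's nsproxy: only the proxy count changes. */
void task_share(struct task *child, const struct task *parent)
{
    child->nsproxy = parent->nsproxy;
    child->nsproxy->count++;
}

/* Give a task a private proxy: each namespace gains one reference per proxy. */
int task_unshare_proxy(struct task *t)
{
    struct nsproxy *old = t->nsproxy;
    struct nsproxy *ns = malloc(sizeof(*ns));

    if (!ns)
        return -1;
    *ns = *old;            /* same namespaces, new proxy */
    ns->count = 1;
    ns->mnt->count++;
    ns->uts->count++;
    old->count--;
    t->nsproxy = ns;
    return 0;
}
```

In this model two tasks sharing one proxy leave the mount namespace count at 1; the count only reaches 2 once a second proxy points at the same namespace.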
    • [PATCH] namespaces: add nsproxy · ab516013
      Committed by Serge E. Hallyn
      This patch adds a nsproxy structure to the task struct.  Later patches will
      move the fs namespace pointer into this structure, and introduce a new utsname
      namespace into the nsproxy.
      
      The vserver and openvz functionality, then, would be implemented in large part
      by virtualizing/isolating more and more resources into namespaces, each
      contained in the nsproxy.
      
      [akpm@osdl.org: build fix]
      Signed-off-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] nfsd: lockdep annotation · 12fd3520
      Committed by Peter Zijlstra
      The lockdep warnings below were seen while doing a kernel make
      modules_install install over an NFS mount.
      
        =============================================
        [ INFO: possible recursive locking detected ]
        ---------------------------------------------
        nfsd/9550 is trying to acquire lock:
         (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f
      
        but task is already holding lock:
         (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f
      
        other info that might help us debug this:
        2 locks held by nfsd/9550:
         #0:  (hash_sem){..--}, at: [<cc895223>] exp_readlock+0xd/0xf [nfsd]
         #1:  (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f
      
        stack backtrace:
         [<c0103508>] show_trace_log_lvl+0x58/0x152
         [<c0103b8b>] show_trace+0xd/0x10
         [<c0103c2f>] dump_stack+0x19/0x1b
         [<c012aa57>] __lock_acquire+0x77a/0x9a3
         [<c012af4a>] lock_acquire+0x60/0x80
         [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
         [<c034c845>] mutex_lock+0x1c/0x1f
         [<c0162edc>] vfs_unlink+0x34/0x8a
         [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
         [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
         [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
         [<c033e84d>] svc_process+0x3a5/0x5ed
         [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
         [<c0101005>] kernel_thread_helper+0x5/0xb
        DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
        Leftover inexact backtrace:
         [<c0103b8b>] show_trace+0xd/0x10
         [<c0103c2f>] dump_stack+0x19/0x1b
         [<c012aa57>] __lock_acquire+0x77a/0x9a3
         [<c012af4a>] lock_acquire+0x60/0x80
         [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
         [<c034c845>] mutex_lock+0x1c/0x1f
         [<c0162edc>] vfs_unlink+0x34/0x8a
         [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
         [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
         [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
         [<c033e84d>] svc_process+0x3a5/0x5ed
         [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
         [<c0101005>] kernel_thread_helper+0x5/0xb
      
        =============================================
        [ INFO: possible recursive locking detected ]
        ---------------------------------------------
        nfsd/9580 is trying to acquire lock:
         (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f
      
        but task is already holding lock:
         (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f
      
        other info that might help us debug this:
        2 locks held by nfsd/9580:
         #0:  (hash_sem){..--}, at: [<cc89522b>] exp_readlock+0xd/0xf [nfsd]
         #1:  (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f
      
        stack backtrace:
         [<c0103508>] show_trace_log_lvl+0x58/0x152
         [<c0103b8b>] show_trace+0xd/0x10
         [<c0103c2f>] dump_stack+0x19/0x1b
         [<c012aa63>] __lock_acquire+0x77a/0x9a3
         [<c012af56>] lock_acquire+0x60/0x80
         [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
         [<c034cc1d>] mutex_lock+0x1c/0x1f
         [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
         [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
         [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
         [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
         [<c033ec1d>] svc_process+0x3a5/0x5ed
         [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
         [<c0101005>] kernel_thread_helper+0x5/0xb
        DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
        Leftover inexact backtrace:
         [<c0103b8b>] show_trace+0xd/0x10
         [<c0103c2f>] dump_stack+0x19/0x1b
         [<c012aa63>] __lock_acquire+0x77a/0x9a3
         [<c012af56>] lock_acquire+0x60/0x80
         [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
         [<c034cc1d>] mutex_lock+0x1c/0x1f
         [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
         [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
         [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
         [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
         [<c033ec1d>] svc_process+0x3a5/0x5ed
         [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
         [<c0101005>] kernel_thread_helper+0x5/0xb
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: make rpc threads pools numa aware · bfd24160
      Committed by Greg Banks
      Actually implement multiple pools.  On NUMA machines, allocate a svc_pool per
      NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool.  Enqueue
      sockets on the svc_pool corresponding to the CPU on which the socket bh is run
      (i.e.  the NIC interrupt CPU).  Threads have their cpu mask set to limit them
      to the CPUs in the svc_pool that owns them.
      
      This is the patch that allows an Altix to scale NFS traffic linearly
      beyond 4 CPUs and 4 NICs.
      
      Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
      Christoph Hellwig.
      Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
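The pool selection rule described above (one pool per NUMA node, else one per CPU on SMP, else a single global pool) can be sketched as a mapping from CPU number to pool index. Everything here is an illustrative assumption: `struct topology`, `choose_mode()`, `pool_count()` and `pool_for_cpu()` are hypothetical names, and the topology is passed in by hand rather than queried from the system.

```c
#include <assert.h>

enum pool_mode { POOL_GLOBAL, POOL_PERCPU, POOL_PERNODE };

/* Hypothetical topology description: cpu_to_node[i] is the node of CPU i. */
struct topology {
    int ncpus;
    int nnodes;
    const int *cpu_to_node;
};

/* NUMA machine: per node; SMP: per CPU; otherwise one global pool. */
enum pool_mode choose_mode(const struct topology *t)
{
    if (t->nnodes > 1)
        return POOL_PERNODE;
    if (t->ncpus > 1)
        return POOL_PERCPU;
    return POOL_GLOBAL;
}

int pool_count(const struct topology *t)
{
    switch (choose_mode(t)) {
    case POOL_PERNODE: return t->nnodes;
    case POOL_PERCPU:  return t->ncpus;
    default:           return 1;
    }
}

/* Pool on which to enqueue a socket whose bottom half ran on this CPU. */
int pool_for_cpu(const struct topology *t, int cpu)
{
    assert(cpu >= 0 && cpu < t->ncpus);
    switch (choose_mode(t)) {
    case POOL_PERNODE: return t->cpu_to_node[cpu];
    case POOL_PERCPU:  return cpu;
    default:           return 0;
    }
}
```

Threads belonging to a pool would then be restricted to the CPUs that map to that pool index.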
    • [PATCH] knfsd: add svc_set_num_threads · a7455442
      Committed by Greg Banks
      Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
      way of managing the list of all threads in a svc_serv.  Add
      svc_create_pooled() to allow creation of a svc_serv whose threads are managed
      by the sunrpc code.  Add svc_set_num_threads() to manage the number of threads
      in a service, either per-pool or globally across the service.
      Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: add svc_get · 9a24ab57
      Committed by Greg Banks
      Add svc_get() for those occasions when we need to temporarily bump up
      svc_serv->sv_nrthreads as a pseudo refcount.
      Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: split svc_serv into pools · 3262c816
      Committed by Greg Banks
      Split out the list of idle threads and pending sockets from svc_serv into a
      new svc_pool structure, and allocate a fixed number (in this patch, 1) of
      pools per svc_serv.  The new structure contains a lock which takes over
      several of the duties of svc_serv->sv_lock, which is now relegated to
      protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.
      
      The point is to move the hottest fields out of svc_serv and into svc_pool,
      allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
      This is a major step towards making the NFS server NUMA-friendly.
      Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: convert sk_reserved to atomic_t · 5685f0fa
      Committed by Greg Banks
      Convert the svc_sock->sk_reserved variable from an int protected by
      svc_serv->sv_lock, to an atomic.  This reduces (by 1) the number of places we
      need to take the (effectively global) svc_serv->sv_lock.
      Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
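The conversion from a lock-protected int to an atomic counter can be sketched in user space with C11 atomics. This is an analogy under stated assumptions: the kernel uses its own atomic_t API rather than <stdatomic.h>, and `struct svc_sock_sketch` with its three helpers is a hypothetical reduction that keeps only the counter.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical reduction of svc_sock: only the reserved-space counter. */
struct svc_sock_sketch {
    atomic_int sk_reserved;
};

/* Reserve reply space; no service-wide lock is needed for the update. */
void reserve_space(struct svc_sock_sketch *sk, int bytes)
{
    atomic_fetch_add(&sk->sk_reserved, bytes);
}

/* Give back space once the reply has been sent. */
void release_space(struct svc_sock_sketch *sk, int bytes)
{
    atomic_fetch_sub(&sk->sk_reserved, bytes);
}

/* Read the current reservation. */
int reserved_space(struct svc_sock_sketch *sk)
{
    return atomic_load(&sk->sk_reserved);
}
```

Each update is a single atomic read-modify-write, so concurrent threads no longer have to serialize on one shared lock just to adjust the counter.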
    • [PATCH] knfsd: use new lock for svc_sock deferred list · 1a68d952
      Committed by Greg Banks
      Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock
      instead of svc_serv->sv_lock.  Using the more fine-grained lock reduces the
      number of places we need to take the svc_serv lock.
      Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: convert sk_inuse to atomic_t · c45c357d
      Committed by Greg Banks
      Convert the svc_sock->sk_inuse counter from an int protected by
      svc_serv->sv_lock, to an atomic.  This reduces the number of places we need to
      take the (effectively global) svc_serv->sv_lock.
      Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: move tempsock aging to a timer · 36bdfc8b
      Committed by Greg Banks
      Following are 11 patches from Greg Banks which combine to make knfsd more
      NUMA-aware.  They reduce contention on 'global' data structures, and create
      some data structures that can be node-local.
      
      knfsd threads are bound to a particular node, and the thread to handle a new
      request is chosen from the threads that are attached to the node that received
      the interrupt.
      
      The distribution of threads across nodes can be controlled by a new file in
      the 'nfsd' filesystem, though the default approach of an even spread is
      probably fine for most sites.
      
      Some (old) numbers that show the efficacy of these patches: N == number of
      NICs == number of CPUs == number of clients.  Number of NUMA nodes == N/2.
      
      N    Throughput, MiB/s    CPU usage, % (max=N*100)
           Before    After      Before    After
      ---  ------    -----      ------    -----
      4    312       435        350       228
      6    500       656        501       418
      8    562       804        690       589
      
      This patch:
      
      Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
      a timer which uses a mark-and-sweep algorithm every 6 minutes.  This reduces
      the amount of work that needs to be done in the main RPC loop and the length
      of time we need to hold the (effectively global) svc_serv->sv_lock.
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
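The mark-and-sweep aging mentioned above can be sketched as a periodic pass over the temporary sockets. This is a simplified user-space model, assuming a plain array and boolean flags; `struct temp_sock`, `sock_touch()` and `age_temp_sockets()` are hypothetical names, and locking and the timer itself are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a temporary (per-connection) socket. */
struct temp_sock {
    bool old;      /* set by the sweep, cleared by activity */
    bool closed;
};

/* Called on the request path: any activity un-marks the socket. */
void sock_touch(struct temp_sock *s)
{
    s->old = false;
}

/*
 * One timer pass: close sockets still marked from the previous pass and
 * mark the rest.  Returns how many sockets were closed.
 */
size_t age_temp_sockets(struct temp_sock *socks, size_t n)
{
    size_t closed = 0;

    for (size_t i = 0; i < n; i++) {
        if (socks[i].closed)
            continue;
        if (socks[i].old) {
            socks[i].closed = true;
            closed++;
        } else {
            socks[i].old = true;
        }
    }
    return closed;
}
```

A socket is therefore closed only after it has stayed idle across two consecutive passes, and the request loop does nothing more than clear a flag.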
    • [PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process · 6fb2b47f
      Committed by NeilBrown
      It isn't needed as it is available in rqstp->rq_server, and dropping it allows
      some local vars to be dropped.
      
      [akpm@osdl.org: build fix]
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist' · b41b66d6
      Committed by NeilBrown
      Userspace should create and bind a socket (but not connect it) and write the
      'fd' to portlist.  This will cause the nfs server to listen on that socket.
      
      To close a socket, the name of the socket, as read from 'portlist', can be
      written to 'portlist' with a preceding '-'.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: define new nfsdfs file: portlist - contains list of ports · 80212d59
      Committed by NeilBrown
      This file will list all ports that nfsd has open.
      Default when TCP enabled will be
         ipv4 udp 0.0.0.0 2049
         ipv4 tcp 0.0.0.0 2049
      
      Later, the list of ports will be settable.
      
      'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
      non-mainline patches which created 'ports' with different semantics.
      
      [akpm@osdl.org: cleanups, build fix]
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
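The line format shown above (family, protocol, address, port) is easy to consume from user space. The parser below is a sketch that assumes exactly the four whitespace-separated fields from the example lines; `struct port_entry` and `parse_portlist_line()` are hypothetical names, and the location of the file is not given here, so reading it is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one 'portlist' line. */
struct port_entry {
    char family[8];   /* e.g. "ipv4" */
    char proto[8];    /* e.g. "udp" or "tcp" */
    char addr[46];    /* textual address */
    int  port;
};

/* Parse "family proto address port"; returns 0 on success, -1 otherwise. */
int parse_portlist_line(const char *line, struct port_entry *out)
{
    int n = sscanf(line, "%7s %7s %45s %d",
                   out->family, out->proto, out->addr, &out->port);

    if (n != 4 || out->port < 0 || out->port > 65535)
        return -1;
    return 0;
}
```

The field widths in the format string keep sscanf from writing past the fixed-size buffers.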
    • [PATCH] knfsd: remove nfsd_versbits as intermediate storage for desired versions · 6658d3a7
      Committed by NeilBrown
      We have an array 'nfsd_version' which lists the available versions of nfsd,
      and 'nfsd_versions' (poor choice there :-() which lists the currently active
      versions.
      
      Then we have a bitmap - nfsd_versbits which says which versions are wanted.
      The bits in this bitset cause content to be copied from nfsd_version to
      nfsd_versions when nfsd starts.
      
      This patch removes nfsd_versbits and moves information directly from
      nfsd_version to nfsd_versions when requests for version changes arrive.
      
      Note that this doesn't make it possible to change versions while the server is
      running.  This is because serv->sv_xdrsize is calculated when a service is
      created, and used when threads are created, and xdrsize depends on the active
      versions.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: be more selective in which sockets lockd listens on · 24e36663
      Committed by NeilBrown
      Currently lockd listens on UDP always, and TCP if CONFIG_NFSD_TCP is set.
      
      However, as lockd also performs services for the client, this is a problem.
      If CONFIG_NFSD_TCP is not set, and a tcp mount is used, the server will not be
      able to call back to lockd.
      
      So:
       - add an option to lockd_up saying which protocol is needed
       - Always open sockets for which an explicit port was given, otherwise
         only open a socket of the type required
       - Change nfsd to do one lockd_up per socket rather than one per thread.
      
      This
       - removes the dependency on CONFIG_NFSD_TCP
       - means that lockd may open sockets other than at startup
       - means that lockd will *not* listen on UDP if the only
         mounts are TCP mounts (and nfsd hasn't started).
      
      The last of these is the only one that concerns me at all - I don't know if
      this might be a problem with some servers.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: add a callback for when last rpc thread finishes · bc591ccf
      Committed by NeilBrown
      nfsd has some cleanup that it wants to do when the last thread exits, and
      there will shortly be some more.  So collect this all into one place and
      define a callback for an rpc service to call when the service is about to be
      destroyed.
      
      [akpm@osdl.org: cleanups, build fix]
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
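The callback arrangement described above can be sketched as a service structure that carries an optional shutdown hook, run once when the thread count drops to zero. All names are hypothetical (`struct svc_serv_sketch`, `svc_thread_exit_sketch()`, `nfsd_last_thread_sketch()`); the actual field and function names are not shown here, and the sketch ignores locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of an RPC service. */
struct svc_serv_sketch {
    int nrthreads;                                    /* running threads */
    void (*shutdown)(struct svc_serv_sketch *serv);   /* may be NULL */
    int cleaned_up;                                   /* for the demo only */
};

/* Service-specific cleanup gathered in one place. */
void nfsd_last_thread_sketch(struct svc_serv_sketch *serv)
{
    serv->cleaned_up++;
}

/* Called as each thread exits; runs the hook only for the last one. */
void svc_thread_exit_sketch(struct svc_serv_sketch *serv)
{
    assert(serv->nrthreads > 0);
    if (--serv->nrthreads > 0)
        return;
    if (serv->shutdown)
        serv->shutdown(serv);
}
```

Keeping the hook in the service structure lets the generic RPC code stay unaware of what each service needs to clean up.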