1. 03 10月, 2006 8 次提交
    • M
      [PATCH] Make number of IDE interfaces configurable · 83d7dbc4
      Matt Mackall 提交于
      Make IDE_HWIFS configurable if EMBEDDED
      
      This lets us lop as much as 16k off an x86 build.  It's a little ugly, but
      it's dead simple.  Note the fix for HWIFS < 2.
      
      Sizing interfaces dynamically unfortunately turns out to be pretty
      major surgery.
      
      add/remove: 0/1 grow/shrink: 0/11 up/down: 0/-16182 (-16182)
      function                                     old     new   delta
      ide_hwifs                                  16920    1692  -15228
      init_irq                                    1113     750    -363
      ideprobe_init                                283     138    -145
      ide_pci_setup_ports                         1329    1193    -136
      save_match                                    85       -     -85
      ide_register_hw_with_fixup                   367     287     -80
      ide_setup                                   1364    1308     -56
      is_chipset_set                                40       4     -36
      create_proc_ide_interfaces                   225     205     -20
      init_ide_data                                 84      67     -17
      ide_probe_for_cmd640x                       1198    1183     -15
      ide_unregister                              1452    1451      -1
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
      Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      83d7dbc4
    • S
      [PATCH] IDE: claim extra DMA ports regardless of channel · 020e322d
      Sergei Shtylylov 提交于
      - Claim extra DMA I/O ports regardless of what IDE channels are
        present/enabled.
      
      - Remove extra ports handling from ide_mapped_mmio_dma() since it's not
        applicable to the custom-mapping IDE drivers.
      Signed-off-by: NSergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      020e322d
    • S
      [PATCH] sched: cleanup sched_group cpu_power setup · 89c4710e
      Siddha, Suresh B 提交于
      Up to now sched group's cpu_power for each sched domain is initialized
      independently.  This made the setup code ugly as the new sched domains are
      getting added.
      
      Make the sched group cpu_power setup code generic, by using domain child
      field and new domain flag in sched_domain.  For most of the sched
      domains(except NUMA), sched group's cpu_power is now computed generically
      using the domain properties of itself and of the child domain.
      
      sched groups in NUMA domains are setup little differently and hence they
      don't use this generic mechanism.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      89c4710e
    • S
      [PATCH] sched: introduce child field in sched_domain · 1a848870
      Siddha, Suresh B 提交于
      Introduce the child field in sched_domain struct and use it in
      sched_balance_self().
      
      We will also use this field in cleaning up the sched group cpu_power
      setup(done in a different patch) code.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1a848870
    • F
      [PATCH] Create kallsyms_lookup_size_offset() · ffc50891
      Franck Bui-Huu 提交于
      Some uses of kallsyms_lookup() do not need to find out the name of a symbol
      and its module's name it belongs.  This is specially true in arch specific
      code, which needs to unwind the stack to show the back trace during oops
      (mips is an example).  In this specific case, we just need to retreive the
      function's size and the offset of the active intruction inside it.
      
      Adds a new entry "kallsyms_lookup_size_offset()" This new entry does
      exactly the same as kallsyms_lookup() but does not require any buffers to
      store any names.
      
      It returns 0 if it fails otherwise 1.
      Signed-off-by: NFranck Bui-Huu <vagabon.xyz@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ffc50891
    • D
      [PATCH] VFS: Make filldir_t and struct kstat deal in 64-bit inode numbers · afefdbb2
      David Howells 提交于
      These patches make the kernel pass 64-bit inode numbers internally when
      communicating to userspace, even on a 32-bit system.  They are required
      because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
      for example.  The 64-bit inode numbers are then propagated to userspace
      automatically where the arch supports it.
      
      Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
      number returned by stat64() or getdents64() to differentiate files, and
      failing because the 64-bit inode number space was compressed to 32-bits, and
      so overlaps occur.
      
      This patch:
      
      Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
      inode number so that 64-bit inode numbers can be passed back to userspace.
      
      The stat functions then returns the full 64-bit inode number where
      available and where possible.  If it is not possible to represent the inode
      number supplied by the filesystem in the field provided by userspace, then
      error EOVERFLOW will be issued.
      
      Similarly, the getdents/readdir functions now pass the full 64-bit inode
      number to userspace where possible, returning EOVERFLOW instead when a
      directory entry is encountered that can't be properly represented.
      
      Note that this means that some inodes will not be stat'able on a 32-bit
      system with old libraries where they were before - but it does mean that
      there will be no ambiguity over what a 32-bit inode number refers to.
      
      Note similarly that directory scans may be cut short with an error on a
      32-bit system with old libraries where the scan would work before for the
      same reasons.
      
      It is judged unlikely that this situation will occur because modern glibc
      uses 64-bit capable versions of stat and getdents class functions
      exclusively, and that older systems are unlikely to encounter
      unrepresentable inode numbers anyway.
      
      [akpm: alpha build fix]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      afefdbb2
    • A
      [PATCH] pid.h cleanup · 1d32849b
      Andrew Morton 提交于
      Make the pid.h macros look less revolting in an 80-col window.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1d32849b
    • L
      Add prototype for sigset_from_compat() · 0235497f
      Linus Torvalds 提交于
      Duh.  I screwed up editing David Howells patch in commit
      3f2e05e9, and the actual declaration for
      the sigset_from_compat() function went missing. My bad.
      
      Olaf Hering saved the day and noticed that I'm a moron.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0235497f
  2. 02 10月, 2006 32 次提交
    • D
      [PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.h · 3f2e05e9
      David Howells 提交于
      Revert Andrew Morton's patch to temporarily hack around the lack of a
      declaration of sigset_t in linux/compat.h to make the block-disablement
      patches build on IA64.  This got accidentally pushed to Linus and should
      be fixed in a different manner.
      
      Also make linux/compat.h #include asm/signal.h to gain a definition of
      sigset_t so that it can externally declare sigset_from_compat().
      
      This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and
      ppc64 and run-tested on frv.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3f2e05e9
    • C
      [PATCH] replace cad_pid by a struct pid · 9ec52099
      Cedric Le Goater 提交于
      There are a few places in the kernel where the init task is signaled.  The
      ctrl+alt+del sequence is one them.  It kills a task, usually init, using a
      cached pid (cad_pid).
      
      This patch replaces the pid_t by a struct pid to avoid pid wrap around
      problem.  The struct pid is initialized at boot time in init() and can be
      modified through systctl with
      
      	/proc/sys/kernel/cad_pid
      
      [ I haven't found any distro using it ? ]
      
      It also introduces a small helper routine kill_cad_pid() which is used
      where it seemed ok to use cad_pid instead of pid 1.
      
      [akpm@osdl.org: cleanups, build fix]
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9ec52099
    • O
      [PATCH] introduce get_task_pid() to fix unsafe get_pid() · 1a657f78
      Oleg Nesterov 提交于
      proc_pid_make_inode:
      
      	ei->pid = get_pid(task_pid(task));
      
      I think this is not safe.  get_pid() can be preempted after checking "pid
      != NULL".  Then the task exits, does detach_pid(), and RCU frees the pid.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1a657f78
    • A
      [PATCH] remove remaining errno and __KERNEL_SYSCALLS__ references · 135ab6ec
      Arnd Bergmann 提交于
      The last in-kernel user of errno is gone, so we should remove the definition
      and everything referring to it.  This also removes the now-unused lib/execve.c
      file that was introduced earlier.
      
      Also remove every trace of __KERNEL_SYSCALLS__ that still remained in the
      kernel.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      135ab6ec
    • A
      [PATCH] rename the provided execve functions to kernel_execve · 3db03b4a
      Arnd Bergmann 提交于
      Some architectures provide an execve function that does not set errno, but
      instead returns the result code directly.  Rename these to kernel_execve to
      get the right semantics there.  Moreover, there is no reasone for any of these
      architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so
      remove these right away.
      
      [akpm@osdl.org: build fix]
      [bunk@stusta.de: build fix]
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Andi Kleen <ak@muc.de>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3db03b4a
    • K
      [PATCH] IPC namespace - utils · 73ea4130
      Kirill Korotaev 提交于
      This patch adds basic IPC namespace functionality to
      IPC utils:
      - init_ipc_ns
      - copy/clone/unshare/free IPC ns
      - /proc preparations
      Signed-off-by: NPavel Emelianov <xemul@openvz.org>
      Signed-off-by: NKirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      73ea4130
    • K
      [PATCH] IPC namespace core · 25b21cb2
      Kirill Korotaev 提交于
      This patch set allows to unshare IPCs and have a private set of IPC objects
      (sem, shm, msg) inside namespace.  Basically, it is another building block of
      containers functionality.
      
      This patch implements core IPC namespace changes:
      - ipc_namespace structure
      - new config option CONFIG_IPC_NS
      - adds CLONE_NEWIPC flag
      - unshare support
      
      [clg@fr.ibm.com: small fix for unshare of ipc namespace]
      [akpm@osdl.org: build fix]
      Signed-off-by: NPavel Emelianov <xemul@openvz.org>
      Signed-off-by: NKirill Korotaev <dev@openvz.org>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      25b21cb2
    • S
      [PATCH] namespaces: utsname: implement CLONE_NEWUTS flag · 071df104
      Serge E. Hallyn 提交于
      Implement a CLONE_NEWUTS flag, and use it at clone and sys_unshare.
      
      [clg@fr.ibm.com: IPC unshare fix]
      [bunk@stusta.de: cleanup]
      Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      071df104
    • S
      [PATCH] namespaces: utsname: remove system_utsname · bf47fdcd
      Serge E. Hallyn 提交于
      The system_utsname isn't needed now that kernel/sysctl.c is fixed.
      Nuke it.
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bf47fdcd
    • S
      [PATCH] namespaces: utsname: implement utsname namespaces · 4865ecf1
      Serge E. Hallyn 提交于
      This patch defines the uts namespace and some manipulators.
      Adds the uts namespace to task_struct, and initializes a
      system-wide init namespace.
      
      It leaves a #define for system_utsname so sysctl will compile.
      This define will be removed in a separate patch.
      
      [akpm@osdl.org: build fix, cleanup]
      Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4865ecf1
    • S
      [PATCH] namespaces: utsname: switch to using uts namespaces · e9ff3990
      Serge E. Hallyn 提交于
      Replace references to system_utsname to the per-process uts namespace
      where appropriate.  This includes things like uname.
      
      Changes: Per Eric Biederman's comments, use the per-process uts namespace
      	for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c
      
      [jdike@addtoit.com: UML fix]
      [clg@fr.ibm.com: cleanup]
      [akpm@osdl.org: build fix]
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e9ff3990
    • S
      [PATCH] namespaces: utsname: introduce temporary helpers · 0bdd7aab
      Serge E. Hallyn 提交于
      Define utsname() and init_utsname() which return &system_utsname.  Users of
      system_utsname will be changed to use these helpers, after which
      system_utsname will disappear.
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0bdd7aab
    • S
      [PATCH] namespaces: incorporate fs namespace into nsproxy · 1651e14e
      Serge E. Hallyn 提交于
      This moves the mount namespace into the nsproxy.  The mount namespace count
      now refers to the number of nsproxies point to it, rather than the number of
      tasks.  As a result, the unshare_namespace() function in kernel/fork.c no
      longer checks whether it is being shared.
      Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1651e14e
    • S
      [PATCH] namespaces: add nsproxy · ab516013
      Serge E. Hallyn 提交于
      This patch adds a nsproxy structure to the task struct.  Later patches will
      move the fs namespace pointer into this structure, and introduce a new utsname
      namespace into the nsproxy.
      
      The vserver and openvz functionality, then, would be implemented in large part
      by virtualizing/isolating more and more resources into namespaces, each
      contained in the nsproxy.
      
      [akpm@osdl.org: build fix]
      Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ab516013
    • P
      [PATCH] nfsd: lockdep annotation · 12fd3520
      Peter Zijlstra 提交于
      while doing a kernel make modules_install install over an NFS mount.
      
        =============================================
        [ INFO: possible recursive locking detected ]
        ---------------------------------------------
        nfsd/9550 is trying to acquire lock:
         (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f
      
        but task is already holding lock:
         (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f
      
        other info that might help us debug this:
        2 locks held by nfsd/9550:
         #0:  (hash_sem){..--}, at: [<cc895223>] exp_readlock+0xd/0xf [nfsd]
         #1:  (&inode->i_mutex){--..}, at: [<c034c845>] mutex_lock+0x1c/0x1f
      
        stack backtrace:
         [<c0103508>] show_trace_log_lvl+0x58/0x152
         [<c0103b8b>] show_trace+0xd/0x10
         [<c0103c2f>] dump_stack+0x19/0x1b
         [<c012aa57>] __lock_acquire+0x77a/0x9a3
         [<c012af4a>] lock_acquire+0x60/0x80
         [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
         [<c034c845>] mutex_lock+0x1c/0x1f
         [<c0162edc>] vfs_unlink+0x34/0x8a
         [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
         [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
         [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
         [<c033e84d>] svc_process+0x3a5/0x5ed
         [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
         [<c0101005>] kernel_thread_helper+0x5/0xb
        DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
        Leftover inexact backtrace:
         [<c0103b8b>] show_trace+0xd/0x10
         [<c0103c2f>] dump_stack+0x19/0x1b
         [<c012aa57>] __lock_acquire+0x77a/0x9a3
         [<c012af4a>] lock_acquire+0x60/0x80
         [<c034c6c2>] __mutex_lock_slowpath+0xa7/0x20e
         [<c034c845>] mutex_lock+0x1c/0x1f
         [<c0162edc>] vfs_unlink+0x34/0x8a
         [<cc891d98>] nfsd_unlink+0x18f/0x1e2 [nfsd]
         [<cc89884f>] nfsd3_proc_remove+0x95/0xa2 [nfsd]
         [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
         [<c033e84d>] svc_process+0x3a5/0x5ed
         [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
         [<c0101005>] kernel_thread_helper+0x5/0xb
      
        =============================================
        [ INFO: possible recursive locking detected ]
        ---------------------------------------------
        nfsd/9580 is trying to acquire lock:
         (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f
      
        but task is already holding lock:
         (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f
      
        other info that might help us debug this:
        2 locks held by nfsd/9580:
         #0:  (hash_sem){..--}, at: [<cc89522b>] exp_readlock+0xd/0xf [nfsd]
         #1:  (&inode->i_mutex){--..}, at: [<c034cc1d>] mutex_lock+0x1c/0x1f
      
        stack backtrace:
         [<c0103508>] show_trace_log_lvl+0x58/0x152
         [<c0103b8b>] show_trace+0xd/0x10
         [<c0103c2f>] dump_stack+0x19/0x1b
         [<c012aa63>] __lock_acquire+0x77a/0x9a3
         [<c012af56>] lock_acquire+0x60/0x80
         [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
         [<c034cc1d>] mutex_lock+0x1c/0x1f
         [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
         [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
         [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
         [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
         [<c033ec1d>] svc_process+0x3a5/0x5ed
         [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
         [<c0101005>] kernel_thread_helper+0x5/0xb
        DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
        Leftover inexact backtrace:
         [<c0103b8b>] show_trace+0xd/0x10
         [<c0103c2f>] dump_stack+0x19/0x1b
         [<c012aa63>] __lock_acquire+0x77a/0x9a3
         [<c012af56>] lock_acquire+0x60/0x80
         [<c034ca9a>] __mutex_lock_slowpath+0xa7/0x20e
         [<c034cc1d>] mutex_lock+0x1c/0x1f
         [<cc892ad1>] nfsd_setattr+0x2c8/0x499 [nfsd]
         [<cc893ede>] nfsd_create_v3+0x31b/0x4ac [nfsd]
         [<cc8984a1>] nfsd3_proc_create+0x128/0x138 [nfsd]
         [<cc88f0d4>] nfsd_dispatch+0xc0/0x178 [nfsd]
         [<c033ec1d>] svc_process+0x3a5/0x5ed
         [<cc88f5ba>] nfsd+0x1a7/0x305 [nfsd]
         [<c0101005>] kernel_thread_helper+0x5/0xb
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      12fd3520
    • G
      [PATCH] knfsd: make rpc threads pools numa aware · bfd24160
      Greg Banks 提交于
      Actually implement multiple pools.  On NUMA machines, allocate a svc_pool per
      NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool.  Enqueue
      sockets on the svc_pool corresponding to the CPU on which the socket bh is run
      (i.e.  the NIC interrupt CPU).  Threads have their cpu mask set to limit them
      to the CPUs in the svc_pool that owns them.
      
      This is the patch that allows an Altix to scale NFS traffic linearly
      beyond 4 CPUs and 4 NICs.
      
      Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
      Christoph Hellwig.
      Signed-off-by: NGreg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bfd24160
    • G
      [PATCH] knfsd: add svc_set_num_threads · a7455442
      Greg Banks 提交于
      Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
      way of managing the list of all threads in a svc_serv.  Add
      svc_create_pooled() to allow creation of a svc_serv whose threads are managed
      by the sunrpc code.  Add svc_set_num_threads() to manage the number of threads
      in a service, either per-pool or globally across the service.
      Signed-off-by: NGreg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a7455442
    • G
      [PATCH] knfsd: add svc_get · 9a24ab57
      Greg Banks 提交于
      add svc_get() for those occasions when we need to temporarily bump up
      svc_serv->sv_nrthreads as a pseudo refcount.
      Signed-off-by: NGreg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9a24ab57
    • G
      [PATCH] knfsd: split svc_serv into pools · 3262c816
      Greg Banks 提交于
      Split out the list of idle threads and pending sockets from svc_serv into a
      new svc_pool structure, and allocate a fixed number (in this patch, 1) of
      pools per svc_serv.  The new structure contains a lock which takes over
      several of the duties of svc_serv->sv_lock, which is now relegated to
      protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.
      
      The point is to move the hottest fields out of svc_serv and into svc_pool,
      allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
       This is a major step towards making the NFS server NUMA-friendly.
      Signed-off-by: NGreg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3262c816
    • G
      [PATCH] knfsd: convert sk_reserved to atomic_t · 5685f0fa
      Greg Banks 提交于
      Convert the svc_sock->sk_reserved variable from an int protected by
      svc_serv->sv_lock, to an atomic.  This reduces (by 1) the number of places we
      need to take the (effectively global) svc_serv->sv_lock.
      Signed-off-by: NGreg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5685f0fa
    • G
      [PATCH] knfsd: use new lock for svc_sock deferred list · 1a68d952
      Greg Banks 提交于
      Protect the svc_sock->sk_deferred list with a new lock svc_sock->sk_defer_lock
      instead of svc_serv->sv_lock.  Using the more fine-grained lock reduces the
      number of places we need to take the svc_serv lock.
      Signed-off-by: NGreg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1a68d952
    • G
      [PATCH] knfsd: convert sk_inuse to atomic_t · c45c357d
      Greg Banks 提交于
      Convert the svc_sock->sk_inuse counter from an int protected by
      svc_serv->sv_lock, to an atomic.  This reduces the number of places we need to
      take the (effectively global) svc_serv->sv_lock.
      Signed-off-by: NGreg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c45c357d
    • G
      [PATCH] knfsd: move tempsock aging to a timer · 36bdfc8b
      Greg Banks 提交于
      Following are 11 patches from Greg Banks which combine to make knfsd more
      Numa-aware.  They reduce hitting on 'global' data structures, and create some
      data-structures that can be node-local.
      
      knfsd threads are bound to a particular node, and the thread to handle a new
      request is chosen from the threads that are attach to the node that received
      the interrupt.
      
      The distribution of threads across nodes can be controlled by a new file in
      the 'nfsd' filesystem, though the default approach of an even spread is
      probably fine for most sites.
      
      Some (old) numbers that show the efficacy of these patches: N == number of
      NICs == number of CPUs == nmber of clients.  Number of NUMA nodes == N/2
      
      N	Throughput, MiB/s	CPU usage, % (max=N*100)
      	Before	After		Before	After
      	---	------	----		-----	-----
      	4	312	435		350	228
      	6	500	656		501	418
      	8	562	804		690	589
      
      This patch:
      
      Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
      a timer which uses a mark-and-sweep algorithm every 6 minutes.  This reduces
      the amount of work that needs to be done in the main RPC loop and the length
      of time we need to hold the (effectively global) svc_serv->sv_lock.
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: NGreg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      36bdfc8b
    • N
      [PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process · 6fb2b47f
      NeilBrown 提交于
      It isn't needed as it is available in rqstp->rq_server, and dropping it allows
      some local vars to be dropped.
      
      [akpm@osdl.org: build fix]
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6fb2b47f
    • N
      [PATCH] knfsd: allow sockets to be passed to nfsd via 'portlist' · b41b66d6
      NeilBrown 提交于
      Userspace should create and bind a socket (but not connectted) and write the
      'fd' to portlist.  This will cause the nfs server to listen on that socket.
      
      To close a socket, the name of the socket - as read from 'portlist' can be
      written to 'portlist' with a preceding '-'.
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b41b66d6
    • N
      [PATCH] knfsd: define new nfsdfs file: portlist - contains list of ports · 80212d59
      NeilBrown 提交于
      This file will list all ports that nfsd has open.
      Default when TCP enabled will be
         ipv4 udp 0.0.0.0 2049
         ipv4 tcp 0.0.0.0 2049
      
      Later, the list of ports will be settable.
      
      'portlist' chosen rather than 'ports', to avoid unnecessary confusion with
      non-mainline patches which created 'ports' with different semantics.
      
      [akpm@osdl.org: cleanups, build fix]
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      80212d59
    • N
      [PATCH] knfsd: remove nfsd_versbits as intermediate storage for desired versions · 6658d3a7
      NeilBrown 提交于
      We have an array 'nfsd_version' which lists the available versions of nfsd,
      and 'nfsd_versions' (poor choice there :-() which lists the currently active
      versions.
      
      Then we have a bitmap - nfsd_versbits which says which versions are wanted.
      The bits in this bitset cause content to be copied from nfsd_version to
      nfsd_versions when nfsd starts.
      
      This patch removes nfsd_versbits and moves information directly from
      nfsd_version to nfsd_versions when requests for version changes arrive.
      
      Note that this doesn't make it possible to change versions while the server is
      running.  This is because serv->sv_xdrsize is calculated when a service is
      created, and used when threads are created, and xdrsize depends on the active
      versions.
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6658d3a7
    • N
      [PATCH] knfsd: be more selective in which sockets lockd listens on · 24e36663
      NeilBrown 提交于
      Currently lockd listens on UDP always, and TCP if CONFIG_NFSD_TCP is set.
      
      However as lockd performs services of the client as well, this is a problem.
      If CONFIG_NfSD_TCP is not set, and a tcp mount is used, the server will not be
      able to call back to lockd.
      
      So:
       - add an option to lockd_up saying which protocol is needed
       - Always open sockets for which an explicit port was given, otherwise
         only open a socket of the type required
       - Change nfsd to do one lockd_up per socket rather than one per thread.
      
      This
       - removes the dependancy on CONFIG_NFSD_TCP
       - means that lockd may open sockets other than at startup
       - means that lockd will *not* listen on UDP if the only
         mounts are TCP mount (and nfsd hasn't started).
      
      The latter is the only one that concerns me at all - I don't know if this
      might be a problem with some servers.
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      24e36663
    • N
      [PATCH] knfsd: add a callback for when last rpc thread finishes · bc591ccf
      NeilBrown 提交于
      nfsd has some cleanup that it wants to do when the last thread exits, and
      there will shortly be some more.  So collect this all into one place and
      define a callback for an rpc service to call when the service is about to be
      destroyed.
      
      [akpm@osdl.org: cleanups, build fix]
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bc591ccf
    • G
      [PATCH] cpumask: add highest_possible_node_id · 0f532f38
      Greg Banks 提交于
      cpumask: add highest_possible_node_id(), analogous to
      highest_possible_processor_id().
      
      [pj@sgi.com: fix typo]
      Signed-off-by: NGreg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0f532f38
    • B
      [PATCH] kretprobe spinlock deadlock patch · 99219a3f
      bibo,mao 提交于
      kprobe_flush_task() possibly calls kfree function during holding
      kretprobe_lock spinlock, if kfree function is probed by kretprobe that will
      incur spinlock deadlock.  This patch moves kfree function out scope of
      kretprobe_lock.
      Signed-off-by: Nbibo, mao <bibo.mao@intel.com>
      Signed-off-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      99219a3f
    • A
      [PATCH] Kprobes: Make kprobe modules more portable · 3a872d89
      Ananth N Mavinakayanahalli 提交于
      In an effort to make kprobe modules more portable, here is a patch that:
      
      o Introduces the "symbol_name" field to struct kprobe.
        The symbol->address resolution now happens in the kernel in an
        architecture agnostic manner. 64-bit powerpc users no longer have
        to specify the ".symbols"
      o Introduces the "offset" field to struct kprobe to allow a user to
        specify an offset into a symbol.
      o The legacy mechanism of specifying the kprobe.addr is still supported.
        However, if both the kprobe.addr and kprobe.symbol_name are specified,
        probe registration fails with an -EINVAL.
      o The symbol resolution code uses kallsyms_lookup_name(). So
        CONFIG_KPROBES now depends on CONFIG_KALLSYMS
      o Apparantly kprobe modules were the only legitimate out-of-tree user of
        the kallsyms_lookup_name() EXPORT. Now that the symbol resolution
        happens in-kernel, remove the EXPORT as suggested by Christoph Hellwig
      o Modify tcp_probe.c that uses the kprobe interface so as to make it
        work on multiple platforms (in its earlier form, the code wouldn't
        work, say, on powerpc)
      Signed-off-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
      Signed-off-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3a872d89