1. 17 Oct, 2008 1 commit
  2. 28 Jul, 2008 1 commit
    • binfmt_elf_fdpic: Magical stack pointer index, for NEW_AUX_ENT compat. · 9b14ec35
      Paul Mundt committed
      While implementing binfmt_elf_fdpic on SH it quickly became apparent
      that SH was the first platform to support both binfmt_elf_fdpic and
      binfmt_elf, as well as the only one of the FDPIC platforms to make use of
      the auxvt.
      
      Currently binfmt_elf_fdpic uses a special version of NEW_AUX_ENT() where
      the first argument is the entry displacement after csp has been adjusted,
      being reset after each adjustment. As we have no ability to sort this out
      through the platform's ARCH_DLINFO, this index needs to be managed
      entirely in create_elf_fdpic_tables(). Presently none of the platforms
      that set their own auxvt entries are able to do so through their
      respective ARCH_DLINFOs when using binfmt_elf_fdpic.
      
      In addition to this, binfmt_elf_fdpic has been looking at
      DLINFO_ARCH_ITEMS for the number of architecture-specific entries in the
      auxvt. This is legacy cruft, and is not defined by any platforms in-tree,
      even those that make heavy use of the auxvt. AT_VECTOR_SIZE_ARCH is
      always available, and contains the number that is of interest here, so we
      switch to using that unconditionally as well.
      
      As this has direct bearing on how much stack is used, platforms that have
      configurable (or dynamically adjustable) NEW_AUX_ENT calls need to either
      make AT_VECTOR_SIZE_ARCH more fine-grained, or leave it as a worst-case
      and live with some lost stack space if those entries aren't pushed (some
      platforms may also need to purposely sacrifice some space here for
      alignment considerations, as noted in the code -- although not an issue
      for any FDPIC-capable platform today).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Acked-by: David Howells <dhowells@redhat.com>
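The stack-space trade-off described in this commit can be illustrated with a small model. This is only a sketch, not the kernel's code: `build_auxv` and `auxv_reserved_words` are hypothetical helper names, and the sizing rule (two words per entry plus a terminator) is an assumption made for illustration. The `AT_*` values are the standard ones from `<elf.h>`.

```python
# Simplified model of auxiliary-vector layout; not the kernel implementation.
AT_NULL, AT_PHDR, AT_PAGESZ, AT_HWCAP = 0, 3, 6, 16  # standard <elf.h> values


def build_auxv(entries):
    """Lay out (type, value) pairs as a flat word list, terminated by AT_NULL."""
    words = []
    for a_type, a_val in entries:
        words += [a_type, a_val]  # each entry occupies two words
    words += [AT_NULL, 0]         # terminator entry
    return words


def auxv_reserved_words(n_base, n_arch):
    """Worst-case reservation: two words per possible entry, plus the terminator.

    n_arch plays the role of AT_VECTOR_SIZE_ARCH (assumed sizing rule).
    """
    return 2 * (n_base + n_arch + 1)


# Two generic entries are pushed; one architecture slot is reserved but unused.
pushed = build_auxv([(AT_PAGESZ, 4096), (AT_HWCAP, 0)])
reserved = auxv_reserved_words(n_base=2, n_arch=1)
unused = reserved - len(pushed)
print(pushed, reserved, unused)
```

In this model the two unused words are the "lost stack space" the message refers to when an architecture reserves a slot it does not fill.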
  3. 27 Jul, 2008 1 commit
  4. 26 Jul, 2008 2 commits
  5. 07 Jun, 2008 1 commit
  6. 29 Apr, 2008 1 commit
  7. 20 Oct, 2007 2 commits
    • pid namespaces: changes to show virtual ids to user · b488893a
      Pavel Emelyanov committed
      This is the largest patch in the set. Make all (I hope) of the places where
      a pid is shown to or obtained from the user operate on virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with pid_nr() call;
       - when seeking the task from kernel code with the stored id one
         should use find_task_by_pid() call that works with global pids;
       - when showing pid's numerical value to the user the virtual one
         should be used; however, when one shows a task's pid outside this
         task's namespace the global one is to be used;
       - when getting the pid from userspace one needs to consider this as
         the virtual one and use appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
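The global-versus-virtual distinction in the list above can be sketched with a toy model. The classes and method names here (`Namespace`, `Pid`, `global_nr`, `nr_in`, `shown_nr`) are hypothetical and exist only for illustration; they are not the kernel's API, and the display rule is one reading of the commit message.

```python
class Namespace:
    """A pid namespace; level 0 is the initial (global) namespace."""

    def __init__(self, name, level):
        self.name = name
        self.level = level


class Pid:
    """A pid with one number per namespace level it is visible in."""

    def __init__(self, numbers):
        # numbers[i] is a (namespace, nr) pair for nesting level i
        self.numbers = numbers

    def global_nr(self):
        """The number stored in kernel data structures (level 0)."""
        return self.numbers[0][1]

    def nr_in(self, ns):
        """The virtual number as seen from ns, or None if not visible there."""
        if ns.level < len(self.numbers) and self.numbers[ns.level][0] is ns:
            return self.numbers[ns.level][1]
        return None

    def shown_nr(self, viewer_ns):
        """Virtual id when visible to the viewer, otherwise the global one."""
        vnr = self.nr_in(viewer_ns)
        return vnr if vnr is not None else self.global_nr()


init_ns = Namespace("init", 0)
container = Namespace("container", 1)
sibling = Namespace("sibling", 1)

# A task that is pid 1 inside its container and pid 4711 globally.
pid = Pid([(init_ns, 4711), (container, 1)])
print(pid.global_nr(), pid.nr_in(container), pid.nr_in(sibling))
```

The lookup direction works the same way in reverse: a number received from userspace is interpreted as virtual in the caller's namespace before any task search is done.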
    • pid namespaces: round up the API · a47afb0f
      Pavel Emelianov committed
      The set of functions process_session, task_session, process_group and
      task_pgrp is confusing, as the names can be mixed with each other when looking
      at the code for a long time.
      
      The proposals are to
      * equip the functions that return the integer with _nr suffix to
        represent that fact,
      * and to make all functions work with task (not process) by making
        the common prefix of the same name.
      
      For consistency the routines signal_session() and set_signal_session() are
      replaced with task_session_nr() and set_task_session(), especially since they
      are only used with the explicit task->signal dereference.
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Acked-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 17 Oct, 2007 3 commits
    • core_pattern: ignore RLIMIT_CORE if core_pattern is a pipe · 7dc0b22e
      Neil Horman committed
      For some time /proc/sys/kernel/core_pattern has been able to set its output
      destination as a pipe, allowing a user space helper to receive and
      intelligently process a core.  This infrastructure however has some
      shortcomings which can be enhanced.  Specifically:
      
      1) The coredump code in the kernel should ignore RLIMIT_CORE limitation
         when core_pattern is a pipe, since file system resources are not being
         consumed in this case, unless the user application wishes to save the core,
         at which point the app is restricted by usual file system limits and
         restrictions.
      
      2) The core_pattern code should be able to parse and pass options to the
         user space helper as an argv array.  The real core limit of the uid of the
         crashing process should also be passable to the user space helper (since it
         is overridden to zero when called).
      
      3) Some miscellaneous bugs need to be cleaned up, specifically the
         recognition of a recursive core dump, should the user mode helper itself
         crash.  Also, the core dump code in the kernel should not wait for the user
         mode helper to exit, since the same context is responsible for writing to
         the pipe, and a read of the pipe by the user mode helper will result in a
         deadlock.
      
      This patch:
      
      Remove the check of RLIMIT_CORE if core_pattern is a pipe.  In the event that
      core_pattern is a pipe, the entire core will be fed to the user mode helper.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: <martin.pitt@ubuntu.com>
      Cc: <wwoods@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
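The behaviour change in this patch can be summarised with a small decision function. This is a simplified sketch, not the kernel's coredump code: `is_pipe_pattern` and `bytes_delivered` are hypothetical names, and treating the limit as a plain truncation of a file dump is an assumption made for illustration.

```python
def is_pipe_pattern(core_pattern):
    """A core_pattern beginning with '|' names a user space helper to pipe to."""
    return core_pattern.startswith("|")


def bytes_delivered(core_pattern, core_size, rlimit_core):
    """How much of the core reaches its destination in this simplified model.

    Pipe: the entire core is fed to the helper, ignoring RLIMIT_CORE.
    File: the dump is capped by RLIMIT_CORE (assumed truncation behaviour).
    """
    if is_pipe_pattern(core_pattern):
        return core_size
    return min(core_size, rlimit_core)


# With RLIMIT_CORE set to 0, a file dump yields nothing, a piped dump yields all.
print(bytes_delivered("core.%p", 5_000_000, 0))
print(bytes_delivered("|/usr/local/bin/helper %p", 5_000_000, 0))
```

The helper path `/usr/local/bin/helper` is a made-up example; any storage limit then becomes the helper's responsibility, as item 1 of the message notes.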
    • x86: replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE #define · 5b20cd80
      Mark Nelson committed
      Replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE in the coredump code which
      allows for more flexibility in the note type for the state of 'extended
      floating point' implementations in coredumps.  New note types can now be
      added with an appropriate #define.
      
      This does #define ELF_CORE_XFPREG_TYPE to be NT_PRXFPREG in all
      current users so there is no change in behaviour.
      
      This will let us use different note types on powerpc for the Altivec/VMX
      state that some PowerPC cpus have (G4, PPC970, POWER6) and for the SPE
      (signal processing extension) state that some embedded PowerPC cpus from
      Freescale have.
      Signed-off-by: Mark Nelson <markn@au1.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Andi Kleen <ak@suse.de>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • remove ZERO_PAGE · 557ed1fa
      Nick Piggin committed
      The commit b5810039 contains the note
      
        A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
        (and thus mapcounted and count towards shared rss).  These writes to
        the struct page could cause excessive cacheline bouncing on big
        systems.  There are a number of ways this could be addressed if it is
        an issue.
      
      And indeed this cacheline bouncing has shown up on large SGI systems.
      There was a situation where an Altix system was essentially livelocked
      tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
      This situation can be avoided in userspace, but it does highlight the
      potential scalability problem with refcounting ZERO_PAGE, and corner
      cases where it can really hurt (we don't want the system to livelock!).
      
      There are several broad ways to fix this problem:
      1. add back some special casing to avoid refcounting ZERO_PAGE
      2. per-node or per-cpu ZERO_PAGES
      3. remove the ZERO_PAGE completely
      
      I will argue for 3. The others should also fix the problem, but they
      result in more complex code than does 3, with little or no real benefit
      that I can see.
      
      Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
      false optimisation: if an application is performance critical, it would
      not be doing many read faults of new memory, or at least it could be
      expected to write to that memory soon afterwards. If cache or memory use
      is critical, it should not be working with a significant number of
      ZERO_PAGEs anyway (a more compact representation of zeroes should be
      used).
      
      As a sanity check -- measuring on my desktop system, there are never many
      mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
      increase much without it.
      
      When running a make -j4 kernel compile on my dual core system, there are
      about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
      ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
      is torn down without being COWed). So removing ZERO_PAGE will save 1,000
      page faults per second when running kbuild, while keeping it only saves
      less than 1 page clearing operation per second. 1 page clear is cheaper
      than a thousand faults, presumably, so there isn't an obvious loss.
      
      Neither the logical argument nor these basic tests give a guarantee of no
      regressions. However, this is a reasonable opportunity to try to remove
      the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
      we can reintroduce it and just avoid refcounting it.
      
      The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
      much use to them except on benchmarks.  All other users of ZERO_PAGE are
      converted just to use ZERO_PAGE(0) for simplicity. We can look at
      replacing them all and maybe ripping out ZERO_PAGE completely when we are
      more satisfied with this solution.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
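The kbuild arithmetic in this message can be worked through explicitly. The sketch below only restates the approximate per-second figures quoted in the commit; the function name `zero_page_tradeoff` is hypothetical, and "less than 1" is rounded up to 1 as an upper bound.

```python
def zero_page_tradeoff(mappings_per_s, cow_faults_per_s, never_written_per_s):
    """Compare per-second fault counts with and without ZERO_PAGE.

    With ZERO_PAGE: each read fault maps the shared zero page, and a later
    write takes a second (COW) fault.
    Without it: the read fault allocates a cleared page directly, so the COW
    fault disappears, at the cost of clearing pages that are never written.
    """
    faults_with = mappings_per_s + cow_faults_per_s
    faults_without = mappings_per_s
    return {
        "faults_saved": faults_with - faults_without,
        "extra_page_clears": never_written_per_s,
    }


# Approximate figures from the make -j4 measurement quoted above.
result = zero_page_tradeoff(
    mappings_per_s=1000, cow_faults_per_s=1000, never_written_per_s=1
)
print(result)
```

Under these figures roughly a thousand faults per second are traded for at most one extra page clear per second, which is the comparison the message draws.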
  9. 20 Jul, 2007 3 commits
  10. 09 May, 2007 1 commit
  11. 03 Apr, 2007 1 commit
  12. 24 Mar, 2007 1 commit
    • [PATCH] FDPIC: fix the /proc/pid/stat representation of executable boundaries · aa289b47
      David Howells committed
      Fix the /proc/pid/stat representation of executable boundaries.  It should
      show the bounds of the executable, but instead shows the bounds of the
      loader.
      
      Before the patch is applied, the bug can be seen by examining, say, inetd:
      
      	# ps | grep inetd
      	  610         root          0   S   /usr/sbin/inetd -i
      	# cat /proc/610/maps
      	c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      	c3180000-c31dede4 r-xs 00000000 00:0b 14582179  /lib/libuClibc-0.9.28.so
      	c328c000-c328ea00 rw-p 00008000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      	c3290000-c329b6c0 rw-p 00000000 00:00 0
      	c32a0000-c32c0000 rwxp 00000000 00:00 0
      	c32d4000-c32d8000 rw-p 00000000 00:00 0
      	c3394000-c3398000 rw-p 00000000 00:00 0
      	c3458000-c345f464 r-xs 00000000 00:0b 16384612  /usr/sbin/inetd
      	c3470000-c34748f8 rw-p 00004000 00:0b 16384612  /usr/sbin/inetd
      	c34cc000-c34d0000 rw-p 00000000 00:00 0
      	c34d4000-c34d8000 rw-p 00000000 00:00 0
      	c34d8000-c34dc000 rw-p 00000000 00:00 0
      	# cat /proc/610/stat
      	610 (inetd) S 1 610 610 0 -1 256 0 0 0 0 0 8 0 0 19 0 1 0 94392000718
      	950272 0 4294967295 3233480704 3233523592 3274440352 3274439976
       	3273467584 0 0 4096 90115 3221712796 0 0 17 0 0 0 0
      
      The code boundaries are 3233480704 to 3233523592, which are:
      
      	(gdb) p/x 3233480704
      	$1 = 0xc0bb0000
      	(gdb) p/x 3233523592
      	$2 = 0xc0bba788
      
      Which corresponds to this line in the maps file:
      
      	c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
      
      Which is wrong.  After the patch is applied, the maps file is pretty much
      identical (there's some minor shuffling of the location of some of the
      anonymous VMAs), but the stat file is now:
      
      	# cat /proc/610/stat
      	610 (inetd) S 1 610 610 0 -1 256 0 0 0 0 0 7 0 0 18 0 1 0 94392000722
      	950272 0 4294967295 3276111872 3276141668 3274440352 3274439976
      	3273467584 0 0 4096 90115 3221712796 0 0 17 0 0 0 0
      
      The code boundaries are then 3276111872 to 3276141668, which are:
      
      	(gdb) p/x 3276111872
      	$1 = 0xc3458000
      	(gdb) p/x 3276141668
      	$2 = 0xc345f464
      
      And these correspond to this line in the maps file instead:
      
      	c3458000-c345f464 r-xs 00000000 00:0b 16384612  /usr/sbin/inetd
      
      Which is now correct.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
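The manual gdb check in this message can also be scripted. The following is a minimal sketch, assuming the code boundaries are fields 26 and 27 of the stat line (counting from 1); `code_bounds` and `mapping_for` are hypothetical helper names, and the sample data is copied from the post-patch output above.

```python
def code_bounds(stat_line):
    """Return (start_code, end_code) from a /proc/<pid>/stat line.

    The command name is parenthesised and may contain spaces, so split after
    the last ')'. The remaining fields start at field 3 (state), which puts
    fields 26 and 27 at indexes 23 and 24.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[23]), int(rest[24])


def mapping_for(bounds, maps_text):
    """Find the path of the maps entry whose address range equals bounds."""
    wanted = "%08x-%08x" % bounds
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[0] == wanted:
            return fields[5] if len(fields) > 5 else ""
    return None


stat_after = (
    "610 (inetd) S 1 610 610 0 -1 256 0 0 0 0 0 7 0 0 18 0 1 0 94392000722 "
    "950272 0 4294967295 3276111872 3276141668 3274440352 3274439976 "
    "3273467584 0 0 4096 90115 3221712796 0 0 17 0 0 0 0"
)
maps = """\
c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157  /lib/ld-uClibc-0.9.28.so
c3290000-c329b6c0 rw-p 00000000 00:00 0
c3458000-c345f464 r-xs 00000000 00:0b 16384612  /usr/sbin/inetd
"""

start, end = code_bounds(stat_after)
print(hex(start), hex(end), mapping_for((start, end), maps))
```

Run against the pre-patch stat line, the same lookup lands on the `ld-uClibc` mapping, which is the bug being fixed.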
  13. 12 Feb, 2007 1 commit
  14. 27 Jan, 2007 1 commit
  15. 13 Dec, 2006 1 commit
  16. 09 Dec, 2006 2 commits
  17. 08 Dec, 2006 1 commit
  18. 30 Sep, 2006 1 commit
  19. 11 Jul, 2006 3 commits
  20. 23 Jun, 2006 1 commit
  21. 25 Mar, 2006 1 commit
  22. 11 Jan, 2006 1 commit
  23. 07 Nov, 2005 1 commit
  24. 30 Oct, 2005 1 commit
  25. 17 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds committed
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!