1. 20 Oct 2008, 1 commit
    • coredump_filter: add hugepage dumping · e575f111
      Committed by KOSAKI Motohiro
      Presently, a hugepage VMA has the VM_RESERVED flag set so that it is not
      swapped.  But a VM_RESERVED VMA isn't core dumped, because this flag is
      often used for some kernel VMAs (e.g. vmalloc, sound related).
      
      Thus hugepages are never dumped, and they can't be debugged easily.  Many
      developers want hugepages to be included in core dumps.
      
      However, we can't read generic VM_RESERVED areas, because such an area is
      often an I/O mapping area.  Reading these areas may change device state,
      which is definitely an undesirable side effect.
      
      So adding a hugepage-specific bit to the coredump filter is better.  It
      enables hugepage core dumping and doesn't cause any side effects on
      I/O devices.
      
      In addition, libhugetlbfs uses hugetlb private mapping pages as anonymous
      pages.  Hence, hugepage private mapping pages should be core dumped by
      default.
      
      Therefore, /proc/[pid]/coredump_filter has two new bits:
      
       - bit 5: whether hugetlb private mapping pages are dumped (default: yes)
       - bit 6: whether hugetlb shared mapping pages are dumped (default: no)
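      A minimal C sketch of the filter arithmetic, assuming only the bit
      positions given above (bit 5 private, bit 6 shared hugetlb) plus the
      standard meaning of bits 0 and 1 (anonymous private and shared
      mappings). The helper names are hypothetical. set_hugetlb_bits(0x3, 0, 1)
      yields 0x43, the value echoed in the test procedure.

```c
#include <stdio.h>

/* Bit positions of the two new hugetlb bits described above. */
#define FILTER_HUGETLB_PRIVATE (1UL << 5)
#define FILTER_HUGETLB_SHARED  (1UL << 6)

/* Return `filter` with the two hugetlb bits set or cleared. */
unsigned long set_hugetlb_bits(unsigned long filter, int dump_private,
			       int dump_shared)
{
	filter &= ~(FILTER_HUGETLB_PRIVATE | FILTER_HUGETLB_SHARED);
	if (dump_private)
		filter |= FILTER_HUGETLB_PRIVATE;
	if (dump_shared)
		filter |= FILTER_HUGETLB_SHARED;
	return filter;
}

/* Write `filter` to /proc/self/coredump_filter in hex; 0 on success. */
int write_coredump_filter(unsigned long filter)
{
	FILE *f = fopen("/proc/self/coredump_filter", "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fprintf(f, "0x%lx\n", filter) < 0)
		ret = -1;
	if (fclose(f) != 0)	/* the kernel may reject the value on flush */
		ret = -1;
	return ret;
}
```

      write_coredump_filter(set_hugetlb_bits(0x3, 0, 1)) is meant to be the
      programmatic equivalent of `echo 0x43 > /proc/self/coredump_filter`.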
      
      I tested it with the following method.
      
      % ulimit -c unlimited
      % ./crash_hugepage  50
      % ./crash_hugepage  50  -p
      % ls -lh
      % gdb ./crash_hugepage core
      %
      % echo 0x43 > /proc/self/coredump_filter
      % ./crash_hugepage  50
      % ./crash_hugepage  50  -p
      % ls -lh
      % gdb ./crash_hugepage core
      
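      #include <stdlib.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/mman.h>
      #include <string.h>
      
      #include "hugetlbfs.h"	/* libhugetlbfs: hugetlbfs_unlinked_fd(), gethugepagesize() */
      
      int main(int argc, char** argv){
      	char* p;
      	int ch;
      	int mmap_flags = MAP_SHARED;	/* default: shared hugetlb mapping */
      	int fd;
      	int nr_pages;
      	long hpage_size;
      
      	/* -p selects a private (MAP_PRIVATE) mapping instead */
      	while((ch = getopt(argc, argv, "p")) != -1) {
      		switch (ch) {
      		case 'p':
      			mmap_flags &= ~MAP_SHARED;
      			mmap_flags |= MAP_PRIVATE;
      			break;
      		default:
      			/* ignore unknown options */
      			break;
      		}
      	}
      	argc -= optind;
      	argv += optind;
      
      	if (argc == 0){
      		fprintf(stderr, "need # of pages\n");
      		exit(1);
      	}
      
      	nr_pages = atoi(argv[0]);
      	if (nr_pages < 2) {
      		fprintf(stderr, "nr_pages must be >= 2\n");
      		exit(1);
      	}
      
      	/* file on hugetlbfs that has already been unlinked */
      	fd = hugetlbfs_unlinked_fd();
      	if (fd < 0) {
      		perror("hugetlbfs_unlinked_fd");
      		exit(1);
      	}
      	hpage_size = gethugepagesize();
      	p = mmap(NULL, nr_pages * hpage_size,
      		 PROT_READ|PROT_WRITE, mmap_flags, fd, 0);
      	if (p == MAP_FAILED) {
      		perror("mmap");
      		exit(1);
      	}
      
      	sleep(2);
      
      	/* write to the second huge page; with -p this triggers a COW */
      	*(p + hpage_size) = 1;
      	sleep(2);
      
      	/* crash! (volatile keeps the compiler from dropping the store) */
      	*(volatile int *)0 = 1;
      
      	return 0;
      }
      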
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: William Irwin <wli@holomorphy.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 16 Oct 2008, 1 commit
  3. 27 Jul 2008, 1 commit
  4. 26 Jul 2008, 2 commits
  5. 25 Jul 2008, 1 commit
    • ELF loader support for auxvec base platform string · 483fad1c
      Committed by Nathan Lynch
      Some IBM POWER-based platforms have the ability to run in a
      mode which mostly appears to the OS as a different processor from the
      actual hardware.  For example, a Power6 system may appear to be a
      Power5+, which makes the AT_PLATFORM value "power5+".  This means that
      programs are restricted to the ISA supported by Power5+;
      Power6-specific instructions are treated as illegal.
      
      However, some applications (virtual machines, optimized libraries) can
      benefit from knowledge of the underlying CPU model.  A new aux vector
      entry, AT_BASE_PLATFORM, will denote the actual hardware.  For
      example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM
      will be "power5+" and AT_BASE_PLATFORM will be "power6".  The idea is
      that AT_PLATFORM indicates the instruction set supported, while
      AT_BASE_PLATFORM indicates the underlying microarchitecture.
      
      If the architecture has defined ELF_BASE_PLATFORM, copy that value to
      the user stack in the same manner as ELF_PLATFORM.
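      A user-space sketch of how an application might read both entries,
      assuming glibc 2.16 or later for getauxval(3); describe_platform() is a
      hypothetical helper. Where the architecture does not define
      ELF_BASE_PLATFORM, the entry is absent and getauxval() returns 0.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>

#ifndef AT_BASE_PLATFORM
#define AT_BASE_PLATFORM 24	/* value from <elf.h>, for older headers */
#endif

/* Return the string an auxv entry points to, or NULL if it is absent. */
static const char *auxv_string(unsigned long type)
{
	return (const char *)(uintptr_t)getauxval(type);
}

/* Format "isa=<AT_PLATFORM> uarch=<AT_BASE_PLATFORM>" into buf. */
void describe_platform(char *buf, size_t len)
{
	const char *isa = auxv_string(AT_PLATFORM);		/* instruction set */
	const char *uarch = auxv_string(AT_BASE_PLATFORM);	/* real hardware */

	snprintf(buf, len, "isa=%s uarch=%s",
		 isa ? isa : "unknown", uarch ? uarch : "unknown");
}
```

      On a Power6 in Power5+ compatibility mode, the commit text implies this
      would produce "isa=power5+ uarch=power6".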
      Signed-off-by: Nathan Lynch <ntl@pobox.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  6. 23 Jul 2008, 1 commit
    • execve filename: document and export via auxiliary vector · 65191087
      Committed by John Reiser
      The Linux kernel puts the filename argument of execve() into the new
      address space.  Many developers are surprised to learn this.  Those who
      know about it and could use it object: "But it's not documented."
      
      Those who want to use it dislike the expression
        (char *)(1+ strlen(env[-1+ n_env]) + env[-1+ n_env])
      because it requires locating the last original environment variable,
      and assumes that the filename follows the characters.
      
      This patch documents the insertion of the filename, and makes it easier
      to find by adding a new tag AT_EXECFN in the ElfXX_auxv_t; see <elf.h>.
      
      In many cases readlink("/proc/self/exe",) gives the same answer.  But if
      all the original pages get unmapped, then the kernel erases the symlink
      for /proc/self/exe.  This can happen when a program decompressor does a
      good job of cleaning up after uncompressing directly to memory, so that
      the address space of the target program looks the same as if compression
      had never happened.  One example is http://upx.sourceforge.net .
      
      One notable use of the underlying concept (what path containED the
      executable) is glibc expanding $ORIGIN in DT_RUNPATH.  In practice for
      the near term, it may be a good idea for user-mode code to use both
      /proc/self/exe and AT_EXECFN as fall-back methods for each other.
      /proc/self/exe can fail due to unmapping; AT_EXECFN can fail because it
      won't be present on older systems.  The auxvec or {AT_EXECFN}.d_val
      also can get overwritten, although in nearly all cases this would be the
      result of a bug.
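      A sketch of the suggested mutual fall-back, assuming getauxval(3)
      (glibc 2.16+); exec_path() is a hypothetical helper. Note that AT_EXECFN
      holds the filename exactly as passed to execve(), which may be relative,
      while /proc/self/exe resolves to an absolute path.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>
#include <unistd.h>

#ifndef AT_EXECFN
#define AT_EXECFN 31	/* value from <elf.h>, for older headers */
#endif

/*
 * Copy the executable's path into buf.  Try /proc/self/exe first and
 * fall back to AT_EXECFN; returns 0 on success, -1 if both fail.
 */
int exec_path(char *buf, size_t len)
{
	const char *fn;
	ssize_t n;

	if (len == 0)
		return -1;

	n = readlink("/proc/self/exe", buf, len - 1);
	if (n > 0 && (size_t)n < len - 1) {	/* n == len - 1 may be truncated */
		buf[n] = '\0';			/* readlink does not NUL-terminate */
		return 0;
	}

	/* Fallback: the execve() filename recorded in the aux vector. */
	fn = (const char *)(uintptr_t)getauxval(AT_EXECFN);
	if (fn && strlen(fn) < len) {
		strcpy(buf, fn);
		return 0;
	}
	return -1;
}
```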
      
      The runtime cost is one NEW_AUX_ENT using two words of stack space.  The
      underlying value is maintained already as bprm->exec; setup_arg_pages()
      in fs/exec.c slides it for stack_shift, etc.
      Signed-off-by: John Reiser <jreiser@BitWagon.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 17 Jun 2008, 1 commit
  8. 17 May 2008, 2 commits
  9. 29 Apr 2008, 2 commits
  10. 25 Apr 2008, 1 commit
    • [PATCH] sanitize handling of shared descriptor tables in failing execve() · fd8328be
      Committed by Al Viro
      * unshare_files() can fail; doing it after irreversible actions is wrong
        and de_thread() is certainly irreversible.
      * since we do it unconditionally anyway, we might as well do it in do_execve()
        and save ourselves the PITA in binfmt handlers, etc.
      * while we are at it, binfmt_som actually leaked files_struct on failure.
      
      As a side benefit, unshare_files(), put_files_struct() and reset_files_struct()
      become unexported.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  11. 05 Mar 2008, 1 commit
  12. 09 Feb 2008, 2 commits
    • Remove a.out interpreter support in ELF loader · d20894a2
      Committed by Andi Kleen
      Following the deprecation schedule, a.out ELF interpreter support is
      removed with this patch.  a.out ELF interpreters were a transition
      feature for moving a.out systems to ELF, but they're unlikely to still be
      needed.  Pure a.out systems will of course still work.  This allows the
      hairy ELF loader to be simplified.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUT · 7fa30315
      Committed by David Howells
      Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set.
      
      Not all architectures support the A.OUT binfmt, so the ELF binfmt should not
      be permitted to go looking for A.OUT libraries to load in such a case.  Not
      only that, but under such conditions A.OUT core dumps are not produced either.
      
      To make this work, this patch also does the following:
      
       (1) Makes the existence of the contents of linux/a.out.h contingent on
           CONFIG_ARCH_SUPPORTS_AOUT.
      
       (2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT
           core dumping code.
      
       (3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline.  This
           is then included only where needed.  This means that this bit of arch
           code will be stored in the appropriate A.OUT binfmt module rather than
           the core kernel.
      
       (4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not
           needed) and FRV.
      
      This patch depends on the previous patch to move STACK_TOP[_MAX] out of
      asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT
      format is available.
      
      [jdike@addtoit.com: uml: re-remove accidentally restored code]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Jeff Dike <jdike@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 07 Feb 2008, 1 commit
  14. 04 Feb 2008, 1 commit
  15. 30 Jan 2008, 6 commits
    • x86: remove iBCS support · 612a95b4
      Committed by Andi Kleen
      As far as I know, iBCS2 has never been supported on 2.6 kernels, and if
      it has, it must have been through an external patch.  Anyway, anybody
      applying an external patch could just as well re-add the iBCS checking
      code to the ELF loader in the same patch.  But there is no reason to
      keep this code running in all Linux kernels.  This saves at least
      two strcmp() calls on each ELF execution.
      
      No deprecation period because it could not have been used anyway.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • elf core dump: notes user_regset · 4206d3aa
      Committed by Roland McGrath
      This modifies the ELF core dump code under #ifdef CORE_DUMP_USE_REGSET.
      It changes nothing when this macro is not defined.  When it's #define'd
      by some arch header (e.g. asm/elf.h), the arch must support the
      user_regset (linux/regset.h) interface for reading thread state.
      
      This provides an alternate version of note segment writing that is based
      purely on the user_regset interfaces.  When CORE_DUMP_USE_REGSET is set,
      the arch need not define macros such as ELF_CORE_COPY_REGS and ELF_ARCH.
      All that information is taken from the user_regset data structures.
      The core dumps come out exactly the same if arch's definitions for its
      user_regset details are correct.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • elf core dump: notes reorg · 3aba481f
      Committed by Roland McGrath
      This pulls out the code for writing the notes segment of an ELF core dump
      into separate functions.  This cleanly isolates into one cluster of
      functions everything that deals with the note formats and the hooks into
      arch code to fill them.  The top-level elf_core_dump function itself now
      deals purely with the generic ELF format and the memory segments.
      
      This only moves code around into functions that can be inlined away.
      It should not change any behavior at all.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: PIE executable randomization, checkpatch fixes · bb1ad820
      Committed by Andrew Morton
      #39: FILE: arch/ia64/ia32/binfmt_elf32.c:229:
      +elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused)
      
      WARNING: no space between function name and open parenthesis '('
      #39: FILE: arch/ia64/ia32/binfmt_elf32.c:229:
      +elf32_map (struct file *filep, unsigned long addr, struct elf_phdr *eppnt, int prot, int type, unsigned long unused)
      
      WARNING: line over 80 characters
      #67: FILE: arch/x86/kernel/sys_x86_64.c:80:
      +			new_begin = randomize_range(*begin, *begin + 0x02000000, 0);
      
      ERROR: use tabs not spaces
      #110: FILE: arch/x86/kernel/sys_x86_64.c:185:
      + ^I        mm->cached_hole_size = 0;$
      
      ERROR: use tabs not spaces
      #111: FILE: arch/x86/kernel/sys_x86_64.c:186:
      + ^I^Imm->free_area_cache = mm->mmap_base;$
      
      ERROR: use tabs not spaces
      #112: FILE: arch/x86/kernel/sys_x86_64.c:187:
      + ^I}$
      
      ERROR: use tabs not spaces
      #141: FILE: arch/x86/kernel/sys_x86_64.c:216:
      + ^I^I/* remember the largest hole we saw so far */$
      
      ERROR: use tabs not spaces
      #142: FILE: arch/x86/kernel/sys_x86_64.c:217:
      + ^I^Iif (addr + mm->cached_hole_size < vma->vm_start)$
      
      ERROR: use tabs not spaces
      #143: FILE: arch/x86/kernel/sys_x86_64.c:218:
      + ^I^I        mm->cached_hole_size = vma->vm_start - addr;$
      
      ERROR: use tabs not spaces
      #157: FILE: arch/x86/kernel/sys_x86_64.c:232:
      +  ^Imm->free_area_cache = TASK_UNMAPPED_BASE;$
      
      ERROR: need a space before the open parenthesis '('
      #291: FILE: arch/x86/mm/mmap_64.c:101:
      +	} else if(mmap_is_legacy()) {
      
      WARNING: braces {} are not necessary for single statement blocks
      #302: FILE: arch/x86/mm/mmap_64.c:112:
      +	if (current->flags & PF_RANDOMIZE) {
      +		mm->mmap_base += ((long)rnd) << PAGE_SHIFT;
      +	}
      
      WARNING: line over 80 characters
      #314: FILE: fs/binfmt_elf.c:48:
      +static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int, unsigned long);
      
      WARNING: no space between function name and open parenthesis '('
      #314: FILE: fs/binfmt_elf.c:48:
      +static unsigned long elf_map (struct file *, unsigned long, struct elf_phdr *, int, int, unsigned long);
      
      WARNING: line over 80 characters
      #429: FILE: fs/binfmt_elf.c:438:
      +					   eppnt, elf_prot, elf_type, total_size);
      
      ERROR: need space after that ',' (ctx:VxV)
      #480: FILE: fs/binfmt_elf.c:939:
      +				elf_prot, elf_flags,0);
       				                   ^
      
      total: 9 errors, 7 warnings, 461 lines checked
      Your patch has style problems, please review.  If any of these errors
      are false positives report them to the maintainer, see
      CHECKPATCH in MAINTAINERS.
      
      Please run checkpatch prior to sending patches
      
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: PIE executable randomization · cc503c1b
      Committed by Jiri Kosina
      main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries
      onto a random address (in cases in which mmap() is allowed to perform a
      randomization).
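      Only ET_DYN executables are eligible for this placement. A small sketch
      for checking whether a file is ET_DYN, assuming the file's byte order
      matches the host's; is_et_dyn() is a hypothetical helper. It relies on
      e_type directly following e_ident in both Elf32_Ehdr and Elf64_Ehdr.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if the ELF file at `path` is ET_DYN (PIE executables and shared
 * objects), 0 for any other ELF type, -1 on error or non-ELF input.
 * Assumes the file uses the host's byte order.
 */
int is_et_dyn(const char *path)
{
	unsigned char ident[EI_NIDENT];
	Elf32_Half type;	/* e_type is 16 bits in both ELF classes */
	FILE *f = fopen(path, "rb");
	int ret = -1;

	if (!f)
		return -1;
	if (fread(ident, 1, EI_NIDENT, f) == EI_NIDENT &&
	    memcmp(ident, ELFMAG, SELFMAG) == 0 &&
	    fread(&type, sizeof(type), 1, f) == 1)
		ret = (type == ET_DYN);
	fclose(f);
	return ret;
}
```

      For example, is_et_dyn("/proc/self/exe") reports whether the running
      program was linked with -pie.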
      
      The code has been extracted from Ingo's exec-shield patch
      http://people.redhat.com/mingo/exec-shield/
      
      [akpm@linux-foundation.org: fix used-uninitialised warning]
      [kamezawa.hiroyu@jp.fujitsu.com: fixed ia32 ELF on x86_64 handling]
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: randomize brk · c1d171a0
      Committed by Jiri Kosina
      Randomize the location of the heap (brk) for i386 and x86_64.  The start
      of the heap is randomized within a range from the current brk location up
      to an offset of 0x02000000 on both architectures.  This, together with
      pie-executable-randomization.patch and
      pie-executable-randomization-fix.patch, should make the address space
      randomization on i386 and x86_64 complete.
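      One way to observe the effect from user space is to compare the program
      break with the end of the BSS across several runs. This sketch assumes
      glibc's sbrk() and the linker-provided `end` symbol described in end(3);
      the helper name is hypothetical, and the exact spread depends on the
      kernel version.

```c
#define _DEFAULT_SOURCE	/* exposes sbrk() in glibc */
#include <stddef.h>
#include <unistd.h>

extern char end;	/* first address past the BSS, see end(3) */

/*
 * Distance from the end of the BSS to the current program break.  With brk
 * randomization it changes from run to run; any heap already allocated
 * through brk is included as well.
 */
ptrdiff_t brk_offset_from_bss(void)
{
	return (char *)sbrk(0) - &end;
}
```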
      
      Arjan says:
      
      This is known to break older versions of some emacs variants, whose dumper
      code assumed that the last variable declared in the program is equal to the
      start of the dynamically allocated memory region.
      
      (The dumper is the code where emacs effectively dumps core at the end of its
      compilation stage; this coredump is then loaded as the main program during
      normal use)
      
      iirc this was 5 years ago or so; we found this way back when I was at RH and we
      first did the security stuff there (including this brk randomization).  It
      wasn't all variants of emacs, and it got fixed as a result (I vaguely remember
      that emacs already had code to deal with it for other archs/oses, just
      ifdeffed wrongly).
      
      It's a rare and wrong assumption as a general thing, just on x86 it mostly
      happened to be true (but to be honest, it'll break too if gcc does
      something fancy or if the linker does a non-standard order).  Still, it's
      something we should at least document.
      
      Note 2: afaik it only broke the emacs *build*.  I'm not 100% sure about that
      (it IS 5 years ago) though.
      
      [ akpm@linux-foundation.org: deuglification ]
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  16. 08 Jan 2008, 1 commit
  17. 20 Oct 2007, 2 commits
    • pid namespaces: changes to show virtual ids to user · b488893a
      Committed by Pavel Emelyanov
      This is the largest patch in the set.  Make all (I hope) the places where
      a pid is shown to or obtained from userspace operate on virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with the pid_nr() call;
       - when looking up a task from kernel code using a stored id, one
         should use the find_task_by_pid() call, which works with global pids;
       - when showing a pid's numerical value to the user, the virtual one
         should be used; however, when showing a task's pid outside that
         task's namespace, the global one is to be used;
       - when getting a pid from userspace, one needs to treat it as
         the virtual one and use the appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pid namespaces: round up the API · a47afb0f
      Committed by Pavel Emelianov
      The set of functions process_session, task_session, process_group and
      task_pgrp is confusing, as the names can be mixed with each other when looking
      at the code for a long time.
      
      The proposals are to
      * equip the functions that return the integer with _nr suffix to
        represent that fact,
      * and to make all functions work with task (not process) by making
        the common prefix of the same name.
      
      For uniformity, the routines signal_session() and set_signal_session() are
      replaced with task_session_nr() and set_task_session(), especially since they
      are only used with the explicit task->signal dereference.
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Acked-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 17 Oct 2007, 7 commits
    • Break ELF_PLATFORM and stack pointer randomization dependency · d68c9d6a
      Committed by Franck Bui-Huu
      Currently arch_align_stack() is used by fs/binfmt_elf.c to randomize
      stack pointer inside a page. But this happens only if ELF_PLATFORM
      symbol is defined.
      
      ELF_PLATFORM is normally set if the architecture wants ld.so to load
      implementation-specific libraries for optimization.  Currently, a
      lot of architectures just define this symbol as NULL.
      
      This is the case for the MIPS architecture, where ELF_PLATFORM is NULL but
      arch_align_stack() has been redefined to randomize the stack pointer
      within a page.  So in this case no randomization is actually done.
      
      This patch breaks this dependency, which seems to be useless, and allows
      platforms such as MIPS to do the randomization.
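      The alignment step itself can be modeled in user space. This sketch
      assumes an x86-style arch_align_stack() that moves the stack pointer down
      by up to 8 KiB and then aligns it to 16 bytes; both constants are
      assumptions, align_stack() is hypothetical, and `rnd` stands in for the
      kernel's random number.

```c
/*
 * Model of a per-exec stack-pointer randomization step: subtract up to
 * 8191 bytes of randomness (when enabled), then align down to 16 bytes.
 */
unsigned long align_stack(unsigned long sp, unsigned long rnd, int randomize)
{
	if (randomize)
		sp -= rnd % 8192;	/* random offset within two 4 KiB pages */
	return sp & ~0xfUL;		/* keep ABI-required 16-byte alignment */
}
```

      The point of the commit is that such a step should run whenever the
      architecture provides it, regardless of whether ELF_PLATFORM is NULL.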
      Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • increase AT_VECTOR_SIZE to terminate saved_auxv properly · 4f9a58d7
      Committed by Olaf Hering
      include/asm-powerpc/elf.h has 6 entries in ARCH_DLINFO.  fs/binfmt_elf.c
      has 14 unconditional NEW_AUX_ENT entries and 2 conditional NEW_AUX_ENT
      entries.  So in the worst case, saved_auxv does not get an AT_NULL entry at
      the end.
      
      The saved_auxv array must be terminated with an AT_NULL entry.  Make the
      size of mm_struct->saved_auxv arch-dependent, based on the number of
      ARCH_DLINFO entries.
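      The sizing arithmetic as a sketch: 14 unconditional, 2 conditional and 6
      ARCH_DLINFO entries give 22 (type, value) pairs, and the terminating
      AT_NULL pair brings the total to 2 * 23 = 46 words. The helper names are
      hypothetical.

```c
#include <stddef.h>

/* Words needed for `nr_entries` (type, value) pairs plus the AT_NULL pair. */
size_t auxv_words_needed(size_t nr_entries)
{
	return 2 * (nr_entries + 1);
}

/*
 * Return 1 if `auxv` (capacity `words` unsigned longs) contains an AT_NULL
 * (type 0) entry within bounds, so a consumer can safely walk it.
 */
int auxv_is_terminated(const unsigned long *auxv, size_t words)
{
	size_t i;

	for (i = 0; i + 1 < words; i += 2)
		if (auxv[i] == 0)	/* AT_NULL */
			return 1;
	return 0;
}
```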
      Signed-off-by: Olaf Hering <olh@suse.de>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Add MMF_DUMP_ELF_HEADERS · 82df3973
      Committed by Roland McGrath
      This adds the MMF_DUMP_ELF_HEADERS option to /proc/pid/coredump_filter.
      This dumps the first page (only) of a private file mapping if it appears to
      be a mapping of an ELF file.  Including these pages in the core dump may
      give sufficient identifying information to associate the original DSO and
      executable file images and their debugging information with a core file in
      a generic way just from its contents (e.g.  when those binaries were built
      with ld --build-id).  I expect this to become the default behavior
      eventually.  Existing versions of gdb can be confused by the core dumps it
      creates, so it won't be enabled by default for some time to come.  Soon many
      people will have systems with a gdb that handles these dumps, so they can
      arrange to set the bit at boot and have it inherited system-wide.
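      A sketch of reading the current filter and enabling the ELF-header bit,
      assuming MMF_DUMP_ELF_HEADERS corresponds to bit 4 of coredump_filter (as
      documented in core(5)) and that the file prints the mask in hex. The
      helper and macro names are hypothetical.

```c
#include <stdio.h>

#define DUMP_ELF_HEADERS_BIT 4	/* assumed bit number in coredump_filter */

/* Read /proc/self/coredump_filter (a hex mask); returns 0 on success. */
int read_coredump_filter(unsigned long *out)
{
	FILE *f = fopen("/proc/self/coredump_filter", "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lx", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Return `filter` with the ELF-headers bit set. */
unsigned long with_elf_headers(unsigned long filter)
{
	return filter | (1UL << DUMP_ELF_HEADERS_BIT);
}
```

      Writing the result back (for example from an init script) is what
      "set the bit at boot" would amount to; the value is inherited by
      children.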
      
      This also cleans up the checking of the MMF_DUMP_* flag bits, which did not
      need to be using atomic macros.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Deprecate a.out ELF interpreters · 8e9073ed
      Committed by Andi Kleen
      The Linux ELF loader is quite complicated and messy code (that could
      probably use a rewrite, but that's a different chapter).  One particularly
      messy part of it is the support for non-ELF a.out ld.sos.  This was
      originally added to make the transition from a.out to ELF easier, because an
      a.out ELF ld.so could still be built using an older a.out toolkit.  But by
      now that should be fully obsolete, and removing it would clean up
      binfmt_elf.c a bit.
      
      I propose to deprecate this support and remove it in 2.6.25.
      
      The drawback is that anyone still running their system with an a.out ld.so
      would need to update the ld.so when updating to a new kernel.
      
      This patch just adds an entry to the deprecation file and a printk
      warning users.
      
      [akpm@linux-foundation.org: better warning message]
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • core_pattern: ignore RLIMIT_CORE if core_pattern is a pipe · 7dc0b22e
      Committed by Neil Horman
      For some time /proc/sys/kernel/core_pattern has been able to set its output
      destination as a pipe, allowing a user space helper to receive and
      intelligently process a core.  This infrastructure, however, has some
      shortcomings which can be addressed.  Specifically:
      
      1) The coredump code in the kernel should ignore the RLIMIT_CORE limitation
         when core_pattern is a pipe, since file system resources are not being
         consumed in this case, unless the user application wishes to save the core,
         at which point the app is restricted by the usual file system limits and
         restrictions.
      
      2) The core_pattern code should be able to parse and pass options to the
         user space helper as an argv array.  The real core limit of the uid of the
         crashing process should also be passable to the user space helper (since it
         is overridden to zero when called).
      
      3) Some miscellaneous bugs need to be cleaned up (specifically the
         recognition of a recursive core dump, should the user mode helper itself
         crash).  Also, the core dump code in the kernel should not wait for the user
         mode helper to exit, since the same context is responsible for writing to
         the pipe, and a read of the pipe by the user mode helper will result in a
         deadlock.
      
      This patch:
      
      Remove the check of RLIMIT_CORE if core_pattern is a pipe.  In the event that
      core_pattern is a pipe, the entire core will be fed to the user mode helper.
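      With this change the helper receives the entire core on its standard
      input. A minimal sketch of the copy loop such a helper might use,
      assuming a core_pattern such as `|/usr/local/sbin/core-helper %p` (the
      path is hypothetical); copy_stream() is also hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Copy everything readable from `in` to `out` until EOF.  A core_pattern
 * pipe helper would call this with in = STDIN_FILENO.  Returns the number
 * of bytes copied, or -1 on error.
 */
long long copy_stream(int in, int out)
{
	char buf[65536];
	long long total = 0;

	for (;;) {
		ssize_t n = read(in, buf, sizeof buf);

		if (n == 0)
			return total;	/* EOF: the kernel finished writing */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		for (ssize_t off = 0; off < n; ) {	/* handle short writes */
			ssize_t w = write(out, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

      Because the kernel writes the core while the helper reads, the helper
      should drain stdin promptly rather than wait for the dump to finish.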
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: <martin.pitt@ubuntu.com>
      Cc: <wwoods@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE #define · 5b20cd80
      Committed by Mark Nelson
      Replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE in the coredump code which
      allows for more flexibility in the note type for the state of 'extended
      floating point' implementations in coredumps.  New note types can now be
      added with an appropriate #define.
      
      This patch #defines ELF_CORE_XFPREG_TYPE to be NT_PRXFPREG in all
      current users, so there is no change in behaviour.
      
      This will let us use different note types on powerpc for the Altivec/VMX
      state that some PowerPC cpus have (G4, PPC970, POWER6) and for the SPE
      (signal processing extension) state that some embedded PowerPC cpus from
      Freescale have.
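      The macro pattern can be sketched as a default definition that an arch
      header may override. This is a simplification: the commit defines the
      macro explicitly in each current user's headers rather than through a
      generic fallback, and xfpreg_note_type() is hypothetical.

```c
#include <elf.h>

/*
 * An arch header would normally provide this.  Without an override, fall
 * back to the historical note type so existing core dumps are unchanged.
 */
#ifndef ELF_CORE_XFPREG_TYPE
#define ELF_CORE_XFPREG_TYPE NT_PRXFPREG
#endif

/* Note type used to tag the extended floating-point register note. */
unsigned int xfpreg_note_type(void)
{
	return ELF_CORE_XFPREG_TYPE;
}
```

      An architecture wanting a distinct note for, say, Altivec state would
      define ELF_CORE_XFPREG_TYPE before this point.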
      Signed-off-by: Mark Nelson <markn@au1.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Andi Kleen <ak@suse.de>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • remove ZERO_PAGE · 557ed1fa
      Committed by Nick Piggin
      The commit b5810039 contains the note
      
        A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
        (and thus mapcounted and count towards shared rss).  These writes to
        the struct page could cause excessive cacheline bouncing on big
        systems.  There are a number of ways this could be addressed if it is
        an issue.
      
      And indeed this cacheline bouncing has shown up on large SGI systems.
      There was a situation where an Altix system was essentially livelocked
      tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
      This situation can be avoided in userspace, but it does highlight the
      potential scalability problem with refcounting ZERO_PAGE, and corner
      cases where it can really hurt (we don't want the system to livelock!).
      
      There are several broad ways to fix this problem:
      1. add back some special casing to avoid refcounting ZERO_PAGE
      2. per-node or per-cpu ZERO_PAGES
      3. remove the ZERO_PAGE completely
      
      I will argue for 3. The others should also fix the problem, but they
      result in more complex code than does 3, with little or no real benefit
      that I can see.
      
      Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
      false optimisation: if an application is performance critical, it would
      not be doing many read faults of new memory, or at least it could be
      expected to write to that memory soon afterwards. If cache or memory use
      is critical, it should not be working with a significant number of
      ZERO_PAGEs anyway (a more compact representation of zeroes should be
      used).
      
      As a sanity check -- measuring on my desktop system, there are never many
      mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
      increase much without it.
      
      When running a make -j4 kernel compile on my dual core system, there are
      about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
      ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
      is torn down without being COWed). So removing ZERO_PAGE will save 1,000
      page faults per second when running kbuild, while keeping it only saves
      less than 1 page clearing operation per second. 1 page clear is cheaper
      than a thousand faults, presumably, so there isn't an obvious loss.
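      A sketch of the read-then-write pattern the argument is about, assuming
      Linux mmap(2) with MAP_ANONYMOUS and getrusage(2); read_then_write() is
      hypothetical. It reports the change in minor faults but makes no claim
      about the counts, which depend on whether the running kernel maps a
      shared zero page on anonymous read faults.

```c
#define _DEFAULT_SOURCE	/* MAP_ANONYMOUS in glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Read the first byte of each page of fresh anonymous memory, then write to
 * it.  Stores the change in minor faults in *minflt and returns the sum of
 * the bytes read (0 for zero-filled memory), or -1 on mmap failure.
 */
long read_then_write(size_t npages, long *minflt)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = npages * pg;
	struct rusage before, after;
	volatile unsigned char *p;
	void *m;
	long sum = 0;

	m = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (m == MAP_FAILED)
		return -1;
	p = m;

	getrusage(RUSAGE_SELF, &before);
	for (size_t i = 0; i < npages; i++)
		sum += p[i * pg];	/* read fault on untouched memory */
	for (size_t i = 0; i < npages; i++)
		p[i * pg] = 1;		/* write fault; a COW if a zero page was mapped */
	getrusage(RUSAGE_SELF, &after);

	*minflt = after.ru_minflt - before.ru_minflt;
	munmap(m, len);
	return sum;
}
```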
      
      Neither the logical argument nor these basic tests give a guarantee of no
      regressions. However, this is a reasonable opportunity to try to remove
      the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
      we can reintroduce it and just avoid refcounting it.
      
      The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
      much use to them except on benchmarks.  All other users of ZERO_PAGE are
      converted just to use ZERO_PAGE(0) for simplicity. We can look at
      replacing them all and maybe ripping out ZERO_PAGE completely when we are
      more satisfied with this solution.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
  19. 19 Sep 2007, 1 commit
    • [POWERPC] spufs: Cleanup ELF coredump extra notes logic · e5501492
      Committed by Michael Ellerman
      To start with, arch_notes_size() etc. is a little too ambiguous a name for
      my liking, so change the function names to be more explicit.
      
      Calling through macros is ugly, especially with hidden parameters, so don't
      do that, call the routines directly.
      
      Use ARCH_HAVE_EXTRA_ELF_NOTES as the only flag, and based on it decide
      whether we want the extern declarations or the empty versions.
      
      Since we have empty routines, actually use them in the coredump code to
      save a few #ifdefs.
      
      We want to change the handling of foffset so that the write routine updates
      foffset as it goes, instead of using file->f_pos (so that writing to a pipe
      works).  So pass foffset to the write routine, and for now just set it to
      file->f_pos at the end of writing.
      
      It should also be possible for the write routine to fail, so change it to
      return int and treat a non-zero return as failure.
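      A user-space sketch of a write routine that advances a caller-supplied
      offset instead of relying on the file position, which a pipe does not
      have; it returns int with non-zero meaning failure, as the commit
      describes. The name and signature are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes to `fd`, advancing *foffset by the amount written.
 * Returns 0 on success, -1 on failure.
 */
int write_note_data(int fd, const void *data, size_t len, off_t *foffset)
{
	const char *p = data;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
		*foffset += n;	/* track the position ourselves; works for pipes */
	}
	return 0;
}
```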
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  20. 22 Jul 2007, 1 commit
  21. 20 Jul 2007, 2 commits
  22. 17 Jul 2007, 2 commits