1. 02 10月, 2006 1 次提交
    • S
      [PATCH] namespaces: utsname: use init_utsname when appropriate · 96b644bd
      Serge E. Hallyn 提交于
      In some places, particularly drivers and __init code, the init utsns is the
      appropriate one to use.  This patch replaces those with a the init_utsname
      helper.
      
      Changes: Removed several uses of init_utsname().  Hope I picked all the
      	right ones in net/ipv4/ipconfig.c.  These are now changed to
      	utsname() (the per-process namespace utsname) in the previous
      	patch (2/7)
      
      [akpm@osdl.org: CIFS fix]
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      96b644bd
  2. 01 10月, 2006 1 次提交
  3. 26 9月, 2006 3 次提交
  4. 29 7月, 2006 1 次提交
  5. 10 7月, 2006 1 次提交
  6. 01 7月, 2006 1 次提交
  7. 27 6月, 2006 2 次提交
  8. 01 4月, 2006 1 次提交
  9. 27 3月, 2006 1 次提交
    • B
      [PATCH] kretprobe instance recycled by parent process · c6fd91f0
      bibo mao 提交于
      When kretprobe probes the schedule() function, if the probed process exits
      then schedule() will never return, so some kretprobe instances will never
      be recycled.
      
      In this patch the parent process will recycle retprobe instances of the
      probed function and there will be no memory leak of kretprobe instances.
      Signed-off-by: Nbibo mao <bibo.mao@intel.com>
      Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c6fd91f0
  10. 23 3月, 2006 1 次提交
    • J
      [PATCH] i386: fix uses of user_mode() vs. user_mode_vm() · db753bdf
      Jan Beulich 提交于
      >commit 76381fee
      >Author: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      >Date:   Thu Jun 23 00:08:46 2005 -0700
      >
      >    [PATCH] xen: x86_64: use more usermode macro
      >
      >    Make use of the user_mode macro where it's possible.  This is useful for Xen
      >    because it will need only to redefine only the macro to a hypervisor call.
      
      I am of the opinion that the above changeset is incomplete, i.e.  it missed
      converting some previous uses of user_mode to user_mode_vm.  While most of
      them could be considered just cosmetical, at least the one in die_nmi
      doesn't appear to be.
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Cc: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      db753bdf
  11. 06 2月, 2006 1 次提交
  12. 13 1月, 2006 3 次提交
  13. 09 1月, 2006 1 次提交
    • M
      [PATCH] Make vm86 support optional · 64ca9004
      Matt Mackall 提交于
      This adds an option to remove vm86 support under CONFIG_EMBEDDED.  Saves
      about 5k.
      
      This version eliminates most of the #ifdefs of the previous version and
      instead uses function stubs in vm86.h.  Also, release_vm86_irqs is moved
      from asm-i386/irq.h to a more appropriate home in vm86.h so that the stubs
      can live together.
      
      $ size vmlinux-baseline vmlinux-novm86
         text    data     bss     dec     hex filename
      2920821  523232  190652 3634705  377611 vmlinux-baseline
      2916268  523100  190492 3629860  376324 vmlinux-novm86
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      64ca9004
  14. 07 1月, 2006 2 次提交
  15. 01 1月, 2006 1 次提交
  16. 24 11月, 2005 1 次提交
    • J
      [PATCH] kprobes: Fix return probes on sys_execve · 8bf1101b
      Jim Keniston 提交于
      Fix a bug in kprobes that can cause an Oops or even a crash when a return
      probe is installed on one of the following functions: sys_execve,
      do_execve, load_*_binary, flush_old_exec, or flush_thread.  The fix is to
      remove the call to kprobe_flush_task() in flush_thread().  This fix has
      been tested on all architectures for which the return-probes feature has
      been implemented (i386, x86_64, ppc64, ia64).  Please apply.
      
      BACKGROUND
      
      Up to now, we have called kprobe_flush_task() under two situations: when a
      task exits, and when it execs.  Flushing kretprobe_instances on exit is
      correct because (a) do_exit() doesn't return, and (b) one or more
      return-probed functions may be active when a task calls do_exit().  Neither
      is the case for sys_execve() and its callees.
      
      Initially, the mistaken call to kprobe_flush_task() on exec was harmless
      because we put the "real" return address of each active probed function
      back in the stack, just to be safe, when we recycled its
      kretprobe_instance.  When support for ppc64 and ia64 was added, this safety
      measure couldn't be employed, and was eventually dropped even for i386 and
      x86_64.  sys_execve() and its callees were informally blacklisted for
      return probes until this fix was developed.
      Acked-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NJim Keniston <jkenisto@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8bf1101b
  17. 09 11月, 2005 2 次提交
    • N
      [PATCH] sched: resched and cpu_idle rework · 64c7c8f8
      Nick Piggin 提交于
      Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
      confusion, and make their semantics rigid.  Improves efficiency of
      resched_task and some cpu_idle routines.
      
      * In resched_task:
      - TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
        and as we hold it during resched_task, then there is no need for an
        atomic test and set there. The only other time this should be set is
        when the task's quantum expires, in the timer interrupt - this is
        protected against because the rq lock is irq-safe.
      
      - If TIF_NEED_RESCHED is set, then we don't need to do anything. It
        won't get unset until the task get's schedule()d off.
      
      - If we are running on the same CPU as the task we resched, then set
        TIF_NEED_RESCHED and no further action is required.
      
      - If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
        after TIF_NEED_RESCHED has been set, then we need to send an IPI.
      
      Using these rules, we are able to remove the test and set operation in
      resched_task, and make clear the previously vague semantics of
      POLLING_NRFLAG.
      
      * In idle routines:
      - Enter cpu_idle with preempt disabled. When the need_resched() condition
        becomes true, explicitly call schedule(). This makes things a bit clearer
        (IMO), but haven't updated all architectures yet.
      
      - Many do a test and clear of TIF_NEED_RESCHED for some reason. According
        to the resched_task rules, this isn't needed (and actually breaks the
        assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
        held). So remove that. Generally one less locked memory op when switching
        to the idle thread.
      
      - Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
        most polling idle loops. The above resched_task semantics allow it to be
        set until before the last time need_resched() is checked before going into
        a halt requiring interrupt wakeup.
      
        Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
        can be always left set, completely eliminating resched IPIs when rescheduling
        the idle task.
      
        POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Con Kolivas <kernel@kolivas.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      64c7c8f8
    • N
      [PATCH] sched: disable preempt in idle tasks · 5bfb5d69
      Nick Piggin 提交于
      Run idle threads with preempt disabled.
      
      Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
      How did it ever work before?
      
      Might fix the CPU hotplugging hang which Nigel Cunningham noted.
      
      We think the bug hits if the idle thread is preempted after checking
      need_resched() and before going to sleep, then the CPU offlined.
      
      After calling stop_machine_run, the CPU eventually returns from preemption and
      into the idle thread and goes to sleep.  The CPU will continue executing
      previous idle and have no chance to call play_dead.
      
      By disabling preemption until we are ready to explicitly schedule, this bug is
      fixed and the idle threads generally become more robust.
      
      From: alexs <ashepard@u.washington.edu>
      
        PPC build fix
      
      From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      
        MIPS build fix
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NYoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5bfb5d69
  18. 27 9月, 2005 1 次提交
  19. 05 9月, 2005 4 次提交
    • Z
      [PATCH] x86: make IOPL explicit · a5201129
      Zachary Amsden 提交于
      The pushf/popf in switch_to are ONLY used to switch IOPL.  Making this
      explicit in C code is more clear.  This pushf/popf pair was added as a
      bugfix for leaking IOPL to unprivileged processes when using
      sysenter/sysexit based system calls (sysexit does not restore flags).
      
      When requesting an IOPL change in sys_iopl(), it is just as easy to change
      the current flags and the flags in the stack image (in case an IRET is
      required), but there is no reason to force an IRET if we came in from the
      SYSENTER path.
      
      This change is the minimal solution for supporting a paravirtualized Linux
      kernel that allows user processes to run with I/O privilege.  Other
      solutions require radical rewrites of part of the low level fault / system
      call handling code, or do not fully support sysenter based system calls.
      
      Unfortunately, this added one field to the thread_struct.  But as a bonus,
      on P4, the fastest time measured for switch_to() went from 312 to 260
      cycles, a win of about 17% in the fast case through this performance
      critical path.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a5201129
    • Z
      [PATCH] x86: more asm cleanups · f2ab4461
      Zachary Amsden 提交于
      Some more assembler cleanups I noticed along the way.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f2ab4461
    • Z
      [PATCH] i386: load_tls() fix · e7a2ff59
      Zachary Amsden 提交于
      Subtle fix: load_TLS has been moved after saving %fs and %gs segments to avoid
      creating non-reversible segments.  This could conceivably cause a bug if the
      kernel ever needed to save and restore fs/gs from the NMI handler.  It
      currently does not, but this is the safest approach to avoiding fs/gs
      corruption.  SMIs are safe, since SMI saves the descriptor hidden state.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e7a2ff59
    • Z
      [PATCH] i386: inline asm cleanup · 4bb0d3ec
      Zachary Amsden 提交于
      i386 Inline asm cleanup.  Use cr/dr accessor functions.
      
      Also, a potential bugfix.  Also, some CR accessors really should be volatile.
      Reads from CR0 (numeric state may change in an exception handler), writes to
      CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction
      re-ordering.  I did not add memory clobber to CR3 / CR4 / CR0 updates, as it
      was not there to begin with, and in no case should kernel memory be clobbered,
      except when doing a TLB flush, which already has memory clobber.
      
      I noticed that page invalidation does not have a memory clobber.  I can't find
      a bug as a result, but there is definitely a potential for a bug here:
      
      #define __flush_tlb_single(addr) \
      	__asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4bb0d3ec
  20. 28 7月, 2005 1 次提交
  21. 23 7月, 2005 1 次提交
    • L
      Fix up incorrect "unlikely()" on %gs reload in x86 __switch_to · b339a18b
      Linus Torvalds 提交于
      These days %gs is normally the TLS segment, so it's no longer zero.  As
      a result, we shouldn't just assume that %fs/%gs tend to be zero
      together, but test them independently instead.
      
      Also, fix setting of debug registers to use the "next" pointer instead
      of "current".  It so happens that the scheduler will have set the new
      current pointer before calling __switch_to(), but that's just an
      implementation detail.
      b339a18b
  22. 28 6月, 2005 1 次提交
    • A
      [PATCH] seccomp: tsc disable · ffaa8bd6
      Andrea Arcangeli 提交于
      I believe at least for seccomp it's worth to turn off the tsc, not just for
      HT but for the L2 cache too.  So it's up to you, either you turn it off
      completely (which isn't very nice IMHO) or I recommend to apply this below
      patch.
      
      This has been tested successfully on x86-64 against current cogito
      repository (i686 compiles so I didn't bother testing ;).  People selling
      the cpu through cpushare may appreciate this bit for a peace of mind.
      
      There's no way to get any timing info anymore with this applied
      (gettimeofday is forbidden of course).  The seccomp environment is
      completely deterministic so it can't be allowed to get timing info, it has
      to be deterministic so in the future I can enable a computing mode that
      does a parallel computing for each task with server side transparent
      checkpointing and verification that the output is the same from all the 2/3
      seller computers for each task, without the buyer even noticing (for now
      the verification is left to the buyer client side and there's no
      checkpointing, since that would require more kernel changes to track the
      dirty bits but it'll be easy to extend once the basic mode is finished).
      
      Eliminating a cold-cache read of the cr4 global variable will save one
      cacheline during the tlb flush while making the code per-cpu-safe at the
      same time.  Thanks to Mikael Pettersson for noticing the tlb flush wasn't
      per-cpu-safe.
      
      The global tlb flush can run from irq (IPI calling do_flush_tlb_all) but
      it'll be transparent to the switch_to code since the IPI won't make any
      change to the cr4 contents from the point of view of the interrupted code
      and since it's now all per-cpu stuff, it will not race.  So no need to
      disable irqs in switch_to slow path.
      Signed-off-by: NAndrea Arcangeli <andrea@cpushare.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ffaa8bd6
  23. 26 6月, 2005 3 次提交
    • L
      [PATCH] cpu state clean after hot remove · e1367daf
      Li Shaohua 提交于
      Clean CPU states in order to reuse smp boot code for CPU hotplug.
      
      Signed-off-by: Li Shaohua<shaohua.li@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e1367daf
    • L
      [PATCH] init call cleanup · 0bb3184d
      Li Shaohua 提交于
      Trival patch for CPU hotplug.  In CPU identify part, only did cleaup for intel
      CPUs.  Need do for other CPUs if they support S3 SMP.
      
      Signed-off-by: Li Shaohua<shaohua.li@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0bb3184d
    • Z
      [PATCH] i386 CPU hotplug · f3705136
      Zwane Mwaikambo 提交于
      (The i386 CPU hotplug patch provides infrastructure for some work which Pavel
      is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua
      <shaohua.li@intel.com> is doing)
      
      The following provides i386 architecture support for safely unregistering and
      registering processors during runtime, updated for the current -mm tree.  In
      order to avoid dumping cpu hotplug code into kernel/irq/* i dropped the
      cpu_online check in do_IRQ() by modifying fixup_irqs().  The difference being
      that on cpu offline, fixup_irqs() is called before we clear the cpu from
      cpu_online_map and a long delay in order to ensure that we never have any
      queued external interrupts on the APICs.  There are additional changes to s390
      and ppc64 to account for this change.
      
      1) Add CONFIG_HOTPLUG_CPU
      2) disable local APIC timer on dead cpus.
      3) Disable preempt around irq balancing to prevent CPUs going down.
      4) Print irq stats for all possible cpus.
      5) Debugging check for interrupts on offline cpus.
      6) Hacky fixup_irqs() to redirect irqs when cpus go off/online.
      7) play_dead() for offline cpus to spin inside.
      8) Handle offline cpus set in flush_tlb_others().
      9) Grab lock earlier in smp_call_function() to prevent CPUs going down.
      10) Implement __cpu_disable() and __cpu_die().
      11) Enable local interrupts in cpu_enable() after fixup_irqs()
      12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus.
      13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline.
      Signed-off-by: NZwane Mwaikambo <zwane@linuxpower.ca>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f3705136
  24. 24 6月, 2005 4 次提交
    • H
      [PATCH] kprobes: function-return probes · b94cce92
      Hien Nguyen 提交于
      This patch adds function-return probes to kprobes for the i386
      architecture.  This enables you to establish a handler to be run when a
      function returns.
      
      1. API
      
      Two new functions are added to kprobes:
      
      	int register_kretprobe(struct kretprobe *rp);
      	void unregister_kretprobe(struct kretprobe *rp);
      
      2. Registration and unregistration
      
      2.1 Register
      
        To register a function-return probe, the user populates the following
        fields in a kretprobe object and calls register_kretprobe() with the
        kretprobe address as an argument:
      
        kp.addr - the function's address
      
        handler - this function is run after the ret instruction executes, but
        before control returns to the return address in the caller.
      
        maxactive - The maximum number of instances of the probed function that
        can be active concurrently.  For example, if the function is non-
        recursive and is called with a spinlock or mutex held, maxactive = 1
        should be enough.  If the function is non-recursive and can never
        relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should
        be enough.  maxactive is used to determine how many kretprobe_instance
        objects to allocate for this particular probed function.  If maxactive <=
        0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 *
        NR_CPUS) else maxactive=NR_CPUS)
      
        For example:
      
          struct kretprobe rp;
          rp.kp.addr = /* entrypoint address */
          rp.handler = /*return probe handler */
          rp.maxactive = /* e.g., 1 or NR_CPUS or 0, see the above explanation */
          register_kretprobe(&rp);
      
        The following field may also be of interest:
      
        nmissed - Initialized to zero when the function-return probe is
        registered, and incremented every time the probed function is entered but
        there is no kretprobe_instance object available for establishing the
        function-return probe (i.e., because maxactive was set too low).
      
      2.2 Unregister
      
        To unregiter a function-return probe, the user calls
        unregister_kretprobe() with the same kretprobe object as registered
        previously.  If a probed function is running when the return probe is
        unregistered, the function will return as expected, but the handler won't
        be run.
      
      3. Limitations
      
      3.1 This patch supports only the i386 architecture, but patches for
          x86_64 and ppc64 are anticipated soon.
      
      3.2 Return probes operates by replacing the return address in the stack
          (or in a known register, such as the lr register for ppc).  This may
          cause __builtin_return_address(0), when invoked from the return-probed
          function, to return the address of the return-probes trampoline.
      
      3.3 This implementation uses the "Multiprobes at an address" feature in
          2.6.12-rc3-mm3.
      
      3.4 Due to a limitation in multi-probes, you cannot currently establish
          a return probe and a jprobe on the same function.  A patch to remove
          this limitation is being tested.
      
      This feature is required by SystemTap (http://sourceware.org/systemtap),
      and reflects ideas contributed by several SystemTap developers, including
      Will Cohen and Ananth Mavinakayanahalli.
      Signed-off-by: NHien Nguyen <hien@us.ibm.com>
      Signed-off-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NFrederik Deweerdt <frederik.deweerdt@laposte.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b94cce92
    • V
      [PATCH] xen: x86: Use more usermode macro · 717b594a
      Vincent Hanquez 提交于
      Use the user_mode macro where it's possible.
      Signed-off-by: NVincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      717b594a
    • V
      [PATCH] xen: x86: Use new macro for debugreg · 1cc6f12e
      Vincent Hanquez 提交于
      Make use of the 2 new macro set_debugreg and get_debugreg.
      Signed-off-by: NVincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1cc6f12e
    • A
      [PATCH] Remove i386_ksyms.c, almost. · 129f6946
      Alexey Dobriyan 提交于
      * EXPORT_SYMBOL's moved to other files
      * #include <linux/config.h>, <linux/module.h> where needed
      * #include's in i386_ksyms.c cleaned up
      * After copy-paste, redundant due to Makefiles rules preprocessor directives
        removed:
      
      	#ifdef CONFIG_FOO
      	EXPORT_SYMBOL(foo);
      	#endif
      
      	obj-$(CONFIG_FOO) += foo.o
      
      * Tiny reformat to fit in 80 columns
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      129f6946
  25. 06 5月, 2005 1 次提交
    • A
      [PATCH] x86 stack initialisation fix · f48d9663
      Alexander Nyberg 提交于
      The recent change fix-crash-in-entrys-restore_all.patch
      
       	childregs->esp = esp;
      
       	p->thread.esp = (unsigned long) childregs;
      -	p->thread.esp0 = (unsigned long) (childregs+1);
      +	p->thread.esp0 = (unsigned long) (childregs+1) - 8;
      
       	p->thread.eip = (unsigned long) ret_from_fork;
      
      introduces an inconsistency between esp and esp0 before the task is run the
      first time.  esp0 is no longer the actual start of the stack, but 8 bytes
      off.
      
      This shows itself clearly in a scenario when a ptracer that is set to also
      ptrace eventual children traces program1 which then clones thread1.  Now
      the ptracer wants to modify the registers of thread1.  The x86 ptrace
      implementation bases it's knowledge about saved user-space registers upon
      p->thread.esp0.  But this will be a few bytes off causing certain writes to
      the kernel stack to overwrite a saved kernel function address making the
      kernel when actually running thread1 jump out into user-space.  Very
      spectacular.
      
      The testcase I've used is:
      /* start with strace -f ./a.out */
      #include <pthread.h>
      #include <stdio.h>
      
      void *do_thread(void *p)
      {
      	for (;;);
      }
      
      int main()
      {
      	pthread_t one;
      	pthread_create(&one, NULL, &do_thread, NULL);
      	for (;;);
      	return 0;
      }
      
      So, my solution is to instead of just adjusting esp0 that creates an
      inconsitent state I adjust where the user-space registers are saved with -8
      bytes.  This gives us the wanted extra bytes on the start of the stack and
      esp0 is now correct.  This solves the issues I saw from the original
      testcase from Mateusz Berezecki and has survived testing here.  I think
      this should go into -mm a round or two first however as there might be some
      cruft around depending on pt_regs lying on the start of the stack.  That
      however would have broken with the first change too!
      
      It's actually a 2-line diff but I had to move the comment of why the -8 bytes
      are there a few lines up. Thanks to Zwane for helping me with this.
      Signed-off-by: NAlexander Nyberg <alexn@telia.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f48d9663