1. 19 April 2008: 5 commits
  2. 18 April 2008: 10 commits
    • ptrace_signal subroutine · 18c98b65
      Roland McGrath authored
      This breaks out the ptrace handling from get_signal_to_deliver into a
      new subroutine.  The actual code there doesn't change, and it gets
      inlined into nearly identical compiled code.  This makes the function
      substantially shorter and thus easier to read, and it nicely isolates
      the ptrace magic.
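
      A hedged sketch of the shape of the change (the helper name comes
      from the commit; the body is illustrative, not the kernel code):

        static int ptrace_signal(int signr, siginfo_t *info,
                                 struct k_sigaction *return_ka, sigset_t *mask)
        {
                if (!(current->ptrace & PT_PTRACED))
                        return signr;
                /* the ptrace stop, signal re-fetch and blocked-signal
                   handling formerly inlined in get_signal_to_deliver() */
                return signr;
        }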
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroup: fix a race condition in manipulating tsk->cg_list · 0e04388f
      Li Zefan authored
      When I ran a test program that forks many processes while
      concurrently running 'cat /cgroup/tasks', I got the following oops:
      
        ------------[ cut here ]------------
        kernel BUG at lib/list_debug.c:72!
        invalid opcode: 0000 [#1] SMP
        Pid: 4178, comm: a.out Not tainted (2.6.25-rc9 #72)
        ...
        Call Trace:
         [<c044a5f9>] ? cgroup_exit+0x55/0x94
         [<c0427acf>] ? do_exit+0x217/0x5ba
       [<c0427ed7>] ? do_group_exit+0x65/0x7c
         [<c0427efd>] ? sys_exit_group+0xf/0x11
         [<c0404842>] ? syscall_call+0x7/0xb
         [<c05e0000>] ? init_cyrix+0x2fa/0x479
        ...
        EIP: [<c04df671>] list_del+0x35/0x53 SS:ESP 0068:ebc7df4
        ---[ end trace caffb7332252612b ]---
        Fixing recursive fault but reboot is needed!
      
      After digging into the code and debugging, I finally found the race:
      
      				do_exit()
      				  ->cgroup_exit()
      				    ->if (!list_empty(&tsk->cg_list))
      				        list_del(&tsk->cg_list);
      
        cgroup_iter_start()
          ->cgroup_enable_task_cg_list()
            ->list_add(&tsk->cg_list, ..);
      
      In this case the list won't be deleted though the process has exited.
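
      A sketch of the kind of fix that closes this window, assuming
      css_set_lock is the lock guarding cg_list (my reading of the patch,
      not a quote of it):

        /* in cgroup_exit(): re-check under the lock so a concurrent
           cgroup_enable_task_cg_list() cannot race with the deletion */
        if (!list_empty(&tsk->cg_list)) {
                write_lock(&css_set_lock);
                if (!list_empty(&tsk->cg_list))
                        list_del(&tsk->cg_list);
                write_unlock(&css_set_lock);
        }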
      
      We got two bug reports in the past, which seem to be the same bug as
      this one:
      	http://lkml.org/lkml/2008/3/5/332
      	http://lkml.org/lkml/2007/10/17/224
      
      Sometimes I got the oops in list_del, sometimes in list_add, and by
      changing my test program a bit I could trigger other oopses.
      
      The patch has been tested both on x86_32 and x86_64.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kgdb: always use icache flush for sw breakpoints · 1a9a3e76
      Jason Wessel authored
      On the ppc 4xx architecture the instruction cache must be flushed as
      well as the data cache.  This patch just makes it generic for all
      architectures where CACHE_FLUSH_IS_SAFE is set to 1.
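
      Conceptually the change is one guarded call in the breakpoint
      set/remove paths; a hedged sketch (addr and BREAK_INSTR_SIZE as used
      by the kgdb core):

        /* after patching the breakpoint instruction in, flush the
           instruction cache too whenever flushing is safe at all */
        if (CACHE_FLUSH_IS_SAFE)
                flush_icache_range(addr, addr + BREAK_INSTR_SIZE);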
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kgdb: fix SMP NMI kgdb_handle_exception exit race · 56fb7093
      Jason Wessel authored
      Fix an NMI race condition in the kgdb handle_exception exit path,
      hit while trying to restore normal system operation.
      
      There was a small window after the master processor sets cpu_in_debug
      to zero but before it has set kgdb_active to zero where a
      non-master processor in an SMP system could receive an NMI and
      re-enter the kgdb_wait() loop.
      
      As long as the master processor sets cpu_in_debug before sending
      the CPU roundup, the cpu_in_debug variable can also be used to
      guard against the race condition.
      
      The kgdb_wait() function no longer needs to check kgdb_active,
      because that is done in the arch-specific code and handled along
      with the NMI traps at the low level.  This also allows kgdb_wait()
      to exit correctly if it was entered for some unknown reason, e.g. a
      spurious NMI that could not be handled by the arch-specific code.
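
      An illustrative sketch of the guard, reusing the variable names from
      this message (the real arch code differs in detail):

        /* arch NMI hook: a late roundup NMI that arrives after the
           master has cleared cpu_in_debug must not re-enter the wait
           loop, even though kgdb_active is still set */
        if (!atomic_read(&cpu_in_debug))
                return;         /* spurious or late NMI: resume */
        kgdb_wait(regs);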
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kgdb: fix several kgdb regressions · 737a460f
      Jason Wessel authored
      kgdb core fixes:
      - Check that mm->mmap_cache is not NULL before calling
        flush_cache_range(); otherwise on ARM it causes a fatal fault
        (see the sketch after these lists).
      
      - Breakpoints should only be restored if they are in the BP_ACTIVE
        state.
      
      - Fix a typo in the comments for "kgdb_register_io_module".
      
      x86 kgdb fixes:
      - Fix the x86 arch handler so that on a kill or detach the
        appropriate cleanup of the single-stepping flags gets run.
      
      - Add in the DIE_NMIWATCHDOG call for x86_64
      
      - Touch the NMI watchdog before returning the system to normal
        operation after performing any kind of kgdb operation; otherwise
        the watchdog may trigger.
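
      A hedged sketch of the first core fix above (the shape of the guard
      is inferred from the message, not quoted from the patch):

        /* only flush when there is a cached vma to flush against;
           passing a NULL mm->mmap_cache was fatal on ARM */
        if (current->mm && current->mm->mmap_cache)
                flush_cache_range(current->mm->mmap_cache,
                                  addr, addr + BREAK_INSTR_SIZE);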
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kgdb: fix optional arch functions and probe_kernel_* · b4b8ac52
      Jason Wessel authored
      Fix two regressions dealing with the kgdb core.
      
      1) kgdb_skipexception and kgdb_post_primary_code are optional
      functions that are only required on archs that need special exception
      fixups.
      
      2) The kernel address-space scope must be set for any probe_kernel_*
      function, or architectures such as ARCH=arm will not allow access to
      kernel memory.  For example, the full kernel address space must be
      allowed when you use the kernel debugger to inspect a system call.
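
      The classic pattern for widening the address-space scope around a
      kernel-memory access looked roughly like this (a sketch of the era's
      probe_kernel_read(); details may differ from the merged code):

        long probe_kernel_read(void *dst, void *src, size_t size)
        {
                long ret;
                mm_segment_t old_fs = get_fs();

                set_fs(KERNEL_DS);      /* accept kernel addresses */
                pagefault_disable();
                ret = __copy_from_user_inatomic(dst,
                                (__force const void __user *)src, size);
                pagefault_enable();
                set_fs(old_fs);

                return ret ? -EFAULT : 0;
        }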
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kgdb: add x86 HW breakpoints · 64e9ee30
      Jason Wessel authored
      Add HW breakpoints to the arch-specific portion of x86 kgdb.  In the
      current x86 kernel.org kernels, HW breakpoints are switched out in a
      lazy fashion because there is no infrastructure for changing them
      when switching to a kernel task or entering kernel mode via a system
      call.  This lazy approach means that if a user process uses HW
      breakpoints, kgdb will lose out.  This is an acceptable trade-off
      because the developer debugging the kernel is assumed to know what is
      going on system-wide and would be aware of it.
      
      There is a minor bug fix to the kgdb core so as to correctly call the
      hw breakpoint functions with a valid value from the enum.
      
      There is also a minor change to the x86_64 startup code when using
      early HW breakpoints.  When the debugger is connected, the CPU
      startup code must not zero out the HW breakpoint registers, or you
      can never hit the breakpoints you are interested in.
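
      A hedged sketch of both halves; set_debugreg()/get_debugreg() are the
      x86 accessors, while zero_debug_registers() is a hypothetical helper
      standing in for the startup code's register clearing:

        unsigned long dr7;

        /* install a HW breakpoint in slot 0 */
        get_debugreg(dr7, 7);
        set_debugreg(addr, 0);              /* DR0 = breakpoint address */
        set_debugreg(dr7 | (1UL << 1), 7);  /* DR7.G0 enables slot 0 */

        /* cpu startup: keep the registers if a debugger already
           planted early breakpoints */
        if (!kgdb_connected)
                zero_debug_registers();     /* hypothetical helper */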
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kgdb: print breakpoint removed on exception · 67baf94c
      Jason Wessel authored
      If kgdb removes a breakpoint that failed the recursion check, it
      should also print the address of the breakpoint.
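
      Roughly (a sketch, not the exact patch line):

        printk(KERN_ERR "KGDB: breakpoint remove failed: %lx\n", addr);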
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kgdb: clocksource watchdog · 7c3078b6
      Jason Wessel authored
      To avoid tripping the clocksource watchdog, kgdb must touch it on
      the return to the normal system run state.
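
      The resume path gains a single call; sketched, assuming the helper
      introduced for this purpose is named clocksource_touch_watchdog():

        /* on leaving the debugger, reset the watchdog's notion of time
           so the stopped interval does not mark the TSC unstable */
        clocksource_touch_watchdog();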
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kgdb: core · dc7d5527
      Jason Wessel authored
      kgdb core code. Handles the protocol and the arch details.
      
      [ mingo@elte.hu: heavily modified, simplified and cleaned up. ]
      [ xemul@openvz.org: use find_task_by_pid_ns ]
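
      The wire protocol the core speaks is GDB's remote serial protocol:
      each packet is "$" + payload + "#" + a two-hex-digit checksum, the
      sum of the payload bytes modulo 256.  A minimal sketch of the
      framing (helper name invented):

        /* frame a payload as a GDB remote-protocol packet */
        static void frame_packet(char *pkt, const char *payload)
        {
                unsigned char checksum = 0;
                const char *p;

                for (p = payload; *p; p++)
                        checksum += *p;
                sprintf(pkt, "$%s#%02x", payload, checksum);
        }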
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
  3. 17 April 2008: 13 commits
  4. 16 April 2008: 1 commit
  5. 14 April 2008: 1 commit
  6. 11 April 2008: 2 commits
    • cgroups: include hierarchy ids in /proc/<pid>/cgroup · b6c3006d
      Paul Menage authored
      Extend the /proc/<pid>/cgroup file to include the appropriate hierarchy ID on
      each line.
      
      Currently this ID isn't really needed since a hierarchy can be completely
      identified by the set of subsystems bound to it, but this is likely to change
      in the near future in order to support stateless subsystems and
      merging/rebinding of subsystems.  Getting this change into 2.6.25 reduces the
      need for an API change later.
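
      Each line becomes "hierarchy-id:subsystems:path"; illustrative
      output with invented IDs, subsystems and paths:

        $ cat /proc/self/cgroup
        2:memory:/
        1:cpu,cpuacct:/batch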
      Signed-off-by: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • asmlinkage_protect replaces prevent_tail_call · 54a01510
      Roland McGrath authored
      The prevent_tail_call() macro works around the problem of the compiler
      clobbering argument words on the stack, which for asmlinkage functions
      is the caller's (user's) struct pt_regs.  The tail/sibling-call
      optimization is not the only way that the compiler can decide to use
      stack argument words as scratch space, which we have to prevent.
      Other optimizations can do it too.
      
      Until we have new compiler support to make "asmlinkage" binding on
      the compiler's own use of the stack argument frame, we have to work
      around all the manifestations of this issue that crop up.
      
      More cases seem to be prevented by also keeping the incoming argument
      variables live at the end of the function.  This makes their original
      stack slots attractive places to leave those variables, so the
      compiler tends not to clobber them for something else.  It's still no
      guarantee, but it handles some observed cases that
      prevent_tail_call() did not.
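
      A hedged sketch of the x86 shape of the macro: an empty asm that
      names the return value and each argument as inputs, so the compiler
      believes their stack slots are still read at the end of the function
      (a reconstruction, not a quote of the patch):

        #define asmlinkage_protect(n, ret, args...) \
                __asmlinkage_protect##n(ret, ##args)
        #define __asmlinkage_protect_n(ret, args...) \
                __asm__ __volatile__ ("" : "=r" (ret) : "0" (ret), ##args)
        #define __asmlinkage_protect0(ret) \
                __asmlinkage_protect_n(ret)
        #define __asmlinkage_protect1(ret, arg1) \
                __asmlinkage_protect_n(ret, "g" (arg1))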
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 05 April 2008: 1 commit
    • cgroups: add cgroup support for enabling controllers at boot time · 8bab8dde
      Paul Menage authored
      The effects of cgroup_disable=foo are:
      
      - foo isn't auto-mounted if you mount all cgroups in a single hierarchy
      - foo isn't visible as an individually mountable subsystem
      
      As a result there will only ever be one call to foo->create(), at init time;
      all processes will stay in this group, and the group will never be mounted on
      a visible hierarchy.  Any additional effects (e.g.  not allocating metadata)
      are up to the foo subsystem.
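
      For example, booting with the memory controller compiled in but
      disabled is just a kernel command-line option:

        cgroup_disable=memory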
      
      This doesn't handle early_init subsystems (their "disabled" bit
      isn't set), but it could easily be extended to do so if any of the
      early_init subsystems wanted it; I think it would just involve some
      nastier parameter processing, since it would occur before the
      command-line argument parser has been run.
      
      Hugh said:
      
        Ballpark figures, I'm trying to get this question out rather than
        processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
        to the affected paths, booting with cgroup_disable=memory cuts that back to
        1% overhead (due to slightly bigger struct page).
      
        I'm no expert on distros, they may have no interest whatever in
        CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
        without it, or apply the cgroup_disable=memory patches.
      
      Unix bench's execl test result on x86_64 was
      
      == just after boot without mounting any cgroup fs.==
      mem_cgroup=off : Execl Throughput       43.0     3150.1      732.6
      mem_cgroup=on  : Execl Throughput       43.0     2932.6      682.0
      ==
      
      [lizf@cn.fujitsu.com: fix boot option parsing]
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 03 April 2008: 1 commit
    • markers: use synchronize_sched() · 6496968e
      Mathieu Desnoyers authored
      Markers do not mix well with CONFIG_PREEMPT_RCU because, for minimal
      intrusiveness, the marker code uses preempt_disable/enable() rather
      than rcu_read_lock/unlock().  We would need call_sched and
      sched_barrier primitives.
      
      Currently, the modification (connection and disconnection) of probes
      from markers requires changes to the data structure done RCU-style:
      a new data structure is created, the pointer is changed atomically, a
      quiescent state is reached and then the old data structure is freed.
      
      The quiescent state is reached once all the currently running
      preempt_disable regions are done running.  We use the call_rcu mechanism
      to execute kfree() after such quiescent state has been reached.
      However, the new CONFIG_PREEMPT_RCU version of call_rcu and rcu_barrier
      does not guarantee that all preempt_disable code regions have finished,
      hence the race.
      
      The "proper" way to do this is to use rcu_read_lock/unlock, but we don't
      want to use it to minimize intrusiveness on the traced system.  (we do
      not want the marker code to call into much of the OS code, because it
      would quickly restrict what can and cannot be instrumented, such as the
      scheduler).
      
      The temporary fix, until we get call_rcu_sched and rcu_barrier_sched
      in mainline, is to use synchronize_sched() before each call_rcu call,
      so we wait for the quiescent state in the system call code path.  It
      will slow down batch marker enable/disable, but it makes sure the
      race is gone.
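
      Sketched, the ordering the message describes (the rcu_head field and
      callback name are assumptions):

        /* wait until every preempt_disable() region currently running
           has finished, then queue the old structure for freeing */
        synchronize_sched();
        call_rcu(&old->rcu, free_old_closure);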
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 31 March 2008: 2 commits
  10. 29 March 2008: 2 commits
  11. 27 March 2008: 1 commit
  12. 26 March 2008: 1 commit