1. 07 Jun, 2012 4 commits
  2. 02 Jun, 2012 24 commits
  3. 01 Jun, 2012 12 commits
    • S
      ftrace/x86: Do not change stacks in DEBUG when calling lockdep · 5963e317
      Committed by Steven Rostedt
      When both DYNAMIC_FTRACE and LOCKDEP are set, the TRACE_IRQS_ON/OFF
      will call into the lockdep code. The lockdep code can call lots of
      functions that may be traced by ftrace. When ftrace is updating its
      code and hits a breakpoint, the breakpoint handler will call into
      lockdep. If lockdep happens to call a function that also has a breakpoint
      attached, it will jump back into the breakpoint handler resetting
      the stack to the debug stack and corrupt the contents currently on
      that stack.
      
      The 'do_sym' call that calls do_int3() is protected by modifying the
      IST table to point to a different location if another breakpoint is
      hit. But the TRACE_IRQS_OFF/ON are outside that protection, and if
      a breakpoint is hit from those, the stack will get corrupted, and
      the kernel will crash:
      
      [ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
      [ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff
      [ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0
      [ 1013.298832] Oops: 0010 [#1] PREEMPT SMP
      [ 1013.310600] CPU 2
      [ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
      [ 1013.401848]
      [ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30
      [ 1013.437943] RIP: 8eb8:[<ffff88014630a000>]  [<ffff88014630a000>] 0xffff880146309fff
      [ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408  EFLAGS: 00010046
      [ 1013.477909] RAX: 0000000000000001 RBX: ffffffff81104020 RCX: 0000000000000000
      [ 1013.499458] RDX: ffff880148008ea8 RSI: ffffffff8131ef40 RDI: ffffffff82203b20
      [ 1013.521612] RBP: ffffffff81005751 R08: 0000000000000000 R09: 0000000000000000
      [ 1013.543121] R10: ffffffff82cdc318 R11: 0000000000000000 R12: ffff880145cc0000
      [ 1013.564614] R13: ffff880148008eb8 R14: 0000000000000002 R15: ffff88014780cb40
      [ 1013.586108] FS:  0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000
      [ 1013.609458] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1013.627420] CR2: 0000000000000002 CR3: 0000000141f10000 CR4: 00000000001407e0
      [ 1013.649051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1013.670724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [ 1013.692376] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0)
      [ 1013.717028] Stack:
      [ 1013.724131]  ffff88014780f570 ffff880145cc0000 0000400000004000 0000000000000000
      [ 1013.745918]  cccccccccccccccc ffff88014780cca8 ffffffff811072bb ffffffff81651627
      [ 1013.767870]  ffffffff8118f8a7 ffffffff811072bb ffffffff81f2b6c5 ffffffff81f11bdb
      [ 1013.790021] Call Trace:
      [ 1013.800701] Code: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a <e7> d7 64 81 ff ff ff ff 01 00 00 00 00 00 00 00 65 d9 64 81 ff
      [ 1013.861443] RIP  [<ffff88014630a000>] 0xffff880146309fff
      [ 1013.884466]  RSP <ffff88014780f408>
      [ 1013.901507] CR2: 0000000000000002
      
      The solution was to reuse the NMI functions that change the IDT so that,
      when a breakpoint is hit in kernel mode, the handler keeps its current
      stack instead of resetting to the debug stack:
      
        call debug_stack_set_zero
        TRACE_IRQS_ON
        call debug_stack_reset
      
      If the TRACE_IRQS_ON happens to hit a breakpoint then it will keep the current stack
      and not crash the box.
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      x86: Allow nesting of the debug stack IDT setting · f8988175
      Committed by Steven Rostedt
      When the NMI handler runs, it checks if it preempted a debug handler
      and if that handler is using the debug stack. If it is, it changes the
      IDT table not to update the stack, otherwise it will reset the debug
      stack and corrupt the debug handler it preempted.
      
      Now that ftrace uses breakpoints to change functions from nops to
      callers, many more places may hit a breakpoint. Unfortunately this
      includes some of the calls that lockdep performs, which causes issues
      with the debug stack. It too needs to change the debug stack before
      tracing (if called from the debug handler).
      
      Allow the debug_stack_set_zero() and debug_stack_reset() to be nested
      so that the debug handlers can take advantage of them too.
      
      [ Used this_cpu_*() over __get_cpu_var() as suggested by H. Peter Anvin ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
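The nesting described in this commit can be modelled in user space with a simple depth counter. The following is a minimal sketch, not the kernel implementation: the function names come from the commit message, but the counter `nest_depth`, the flag `idt_keeps_stack`, and the helper `debug_stack_protected()` are hypothetical, and a boolean stands in for loading a real IDT on a single CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; the kernel keeps this state per CPU. */
static unsigned int nest_depth;   /* how many callers asked for protection */
static bool idt_keeps_stack;      /* true while the "keep current stack" IDT is active */

/* Enter a region where a breakpoint must not reset the debug stack. */
void debug_stack_set_zero(void)
{
	nest_depth++;
	idt_keeps_stack = true;   /* stands in for loading the alternate IDT */
}

/* Leave the region; only the outermost caller restores the normal IDT. */
void debug_stack_reset(void)
{
	assert(nest_depth > 0);   /* an unbalanced reset is a caller bug */
	if (--nest_depth == 0)
		idt_keeps_stack = false;   /* stands in for reloading the normal IDT */
}

/* Hypothetical query helper used only to observe the model. */
bool debug_stack_protected(void)
{
	return idt_keeps_stack;
}
```

With a counter, an NMI that nests inside a debug handler (or the reverse) cannot drop the protection early: the inner `debug_stack_reset()` only decrements the depth, and the normal IDT comes back when the outermost caller resets.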
    • S
      x86: Reset the debug_stack update counter · c0525a69
      Committed by Steven Rostedt
      When an NMI goes off and it sees that it preempted the debug stack,
      to keep the debug stack safe, it changes the IDT to point to one that
      does not modify the stack on breakpoint (to allow breakpoints in NMIs).
      
      But the variable that records the need to undo this on exit is never
      cleared. Thus, once it has been set, every later NMI resets the IDT on
      exit even when it does not need to be reset.
      
      [ Added H. Peter Anvin's suggestion to use this_cpu_read/write ]
      
      Cc: <stable@vger.kernel.org> # v3.3
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
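The bug and its fix can be illustrated with a small user-space model. This is a sketch under assumptions: `model_nmi_entry()`, `model_nmi_exit()`, `switch_idt()` and `idt_switch_count()` are hypothetical names, the flag is a plain global rather than a per-CPU variable, and a counter stands in for the actual IDT reloads.

```c
#include <stdbool.h>

static int idt_switches;          /* counts simulated IDT reloads */
static bool update_debug_stack;   /* models the "undo on exit" flag */

static void switch_idt(void)
{
	idt_switches++;
}

/* Called on NMI entry; the argument says whether the debug stack was preempted. */
void model_nmi_entry(bool preempted_debug_stack)
{
	if (preempted_debug_stack) {
		switch_idt();                /* use the IDT that keeps the stack */
		update_debug_stack = true;   /* remember to undo this on exit */
	}
}

/* Called on NMI exit. */
void model_nmi_exit(void)
{
	if (update_debug_stack) {
		switch_idt();                 /* restore the normal IDT */
		update_debug_stack = false;   /* the clear that the fix adds */
	}
}

/* Hypothetical helper to observe how often the IDT was reloaded. */
int idt_switch_count(void)
{
	return idt_switches;
}
```

Without the final assignment in `model_nmi_exit()`, the flag would stay set and every subsequent NMI would reload the IDT on exit, which is the needless work the commit removes.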
    • S
      ftrace: Use breakpoint method to update ftrace caller · 8a4d0a68
      Committed by Steven Rostedt
      On boot up and module load, it is fine to modify the code directly,
      without the use of breakpoints. This is because boot up modification
      is done before SMP is initialized, thus the modification is serial,
      and module load is done before the module executes.
      
      But after that we must use an SMP-safe method to modify running code.
      Otherwise, if we are running the function tracer and update its
      function (by starting off the stack tracer, or perf tracing)
      the change of the function called by the ftrace trampoline is done
      directly. If this is being executed on another CPU, that CPU may
      take a GPF and crash the kernel.
      
      The breakpoint method is used to change the nops at all the functions, but
      the change of the ftrace callback handler itself was still using a
      direct modification. If tracing was enabled and the function callback
      was changed then another CPU could fault if it was currently calling
      the original callback. This modification must use the breakpoint method
      too.
      
      Note, the direct method is still used for boot up and module load.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
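The ordering behind the breakpoint method can be sketched on a plain byte buffer. This is an illustration only, assuming a 5-byte x86 instruction slot: `patch_insn()` and `sync_cores()` are hypothetical names, the synchronization step is an empty placeholder, and a real kernel additionally needs a breakpoint handler that deals with CPUs that hit the temporary int3.

```c
#include <stddef.h>
#include <string.h>

#define INSN_LEN 5      /* size of an x86 call or 5-byte nop */
#define INT3     0xCC   /* x86 breakpoint opcode */

/* Placeholder: a real implementation would make every CPU serialize here. */
static void sync_cores(void)
{
}

/* Replace the instruction at 'site' with 'new_insn' in three ordered steps. */
void patch_insn(unsigned char *site, const unsigned char *new_insn)
{
	/* Step 1: put a breakpoint on the first byte so no CPU runs a torn instruction. */
	site[0] = INT3;
	sync_cores();

	/* Step 2: rewrite everything after the first byte. */
	memcpy(site + 1, new_insn + 1, INSN_LEN - 1);
	sync_cores();

	/* Step 3: replace the breakpoint with the first byte of the new instruction. */
	site[0] = new_insn[0];
	sync_cores();
}
```

At every point in this sequence a CPU fetching from `site` sees either the old instruction, a breakpoint, or the new instruction, never a mix of old and new bytes, which is what a direct multi-byte write cannot guarantee.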
    • S
      ftrace: Synchronize variable setting with breakpoints · a192cd04
      Committed by Steven Rostedt
      When the function tracer starts modifying the code via breakpoints
      it sets a variable (modifying_ftrace_code) to inform the breakpoint
      handler to call the ftrace int3 code.
      
      But there's no synchronization between setting this code and the
      handler, thus it is possible for the handler to be called on another
      CPU before it sees the variable. This will cause a kernel crash as
      the int3 handler will not know what to do with it.
      
      I originally added smp_mb()'s to force the visibility of the variable
      but H. Peter Anvin suggested that I just make it atomic.
      
      [ Added comments as suggested by Peter Zijlstra ]
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
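The idea of making the variable atomic can be sketched in user space with C11 atomics. This is a model under assumptions, not the kernel code: `modifying_code` stands in for modifying_ftrace_code, and `begin_code_update()`, `end_code_update()` and `breakpoint_is_ours()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Models modifying_ftrace_code; non-zero while code patching is in progress. */
static atomic_int modifying_code;

/* Announce that breakpoints are about to be planted. */
void begin_code_update(void)
{
	atomic_fetch_add(&modifying_code, 1);
}

/* Announce that all breakpoints have been removed again. */
void end_code_update(void)
{
	atomic_fetch_sub(&modifying_code, 1);
}

/* Breakpoint-handler side: should this trap go to the code patcher? */
bool breakpoint_is_ours(void)
{
	return atomic_load(&modifying_code) != 0;
}
```

The default C11 operations used here are sequentially consistent, so a handler on another thread that runs after `begin_code_update()` completes observes the non-zero value without explicit barriers at each call site.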
    • C
      syscalls, x86: add __NR_kcmp syscall · d97b46a6
      Committed by Cyrill Gorcunov
      While doing checkpoint-restore in user space, one needs to determine
      whether various kernel objects (like mm_struct-s or file_struct-s) are
      shared between tasks, and then restore this state.

      The 2nd step can be solved by using appropriate CLONE_ flags and the
      unshare syscall, while there is currently no way to solve the 1st one.

      One way to check whether two tasks share e.g. an mm_struct is to
      expose some mm_struct ID of a task in its proc file, but showing such
      info is considered not that good for security reasons.

      Thus, after some debate, we concluded that a dedicated 'comparison'
      syscall might be the best candidate.  So here it is: __NR_kcmp.

      It takes up to 5 arguments: the pids of the two tasks (whose
      characteristics should be compared), the comparison type and (in case of
      comparison of files) two file descriptors.

      Lookups for pids are done in the caller's PID namespace only.

      At the moment only x86 is supported and tested.
      
      [akpm@linux-foundation.org: fix up selftests, warnings]
      [akpm@linux-foundation.org: include errno.h]
      [akpm@linux-foundation.org: tweak comment text]
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Valdis.Kletnieks@vt.edu
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
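A user-space call of the five-argument form described in this commit might look like the sketch below. It assumes a Linux system whose headers provide `SYS_kcmp` and `<linux/kcmp.h>` (with the `KCMP_FILE` comparison type); the wrapper name `same_open_file()` is hypothetical, and the call can fail with -1 on kernels or sandboxes where kcmp is unavailable.

```c
#define _GNU_SOURCE
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Compare fd1 in process pid1 with fd2 in process pid2.
 * Returns 0 when both refer to the same open file, a positive value when
 * they differ, and -1 (with errno set) when the syscall fails.
 */
long same_open_file(pid_t pid1, int fd1, pid_t pid2, int fd2)
{
	return syscall(SYS_kcmp,
		       (unsigned long)pid1, (unsigned long)pid2,
		       (unsigned long)KCMP_FILE,
		       (unsigned long)fd1, (unsigned long)fd2);
}
```

For example, a descriptor and its `dup()` copy in the same process should compare as equal, while the two ends of a pipe should not; a checkpoint tool can use this to decide which descriptors to recreate with `dup()` on restore.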
    • A
      um: properly check all process' threads for a live mm · 2c922c51
      Committed by Anton Vorontsov
      kill_off_processes() might miss a valid process because checking
      for process->mm is not enough.  A process' main thread may exit or detach
      its mm via use_mm(), but other threads may still have a valid mm.
      
      To catch this we use find_lock_task_mm(), which walks up all threads and
      returns an appropriate task (with task lock held).
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
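The thread walk that find_lock_task_mm() performs can be modelled in user space as follows. This is a simplified sketch, not the kernel function: `struct task`, `struct mm`, the integer `locked` flag and the name `find_lock_task_mm_model()` are all hypothetical stand-ins, and the thread group is represented as a circular singly linked list.

```c
#include <stddef.h>

struct mm {
	int id;
};

struct task {
	struct mm *mm;              /* NULL once the thread has dropped its mm */
	int locked;                 /* stands in for the per-task lock */
	struct task *next_thread;   /* circular list of threads in the group */
};

static void task_lock(struct task *t)
{
	t->locked = 1;
}

static void task_unlock(struct task *t)
{
	t->locked = 0;
}

/*
 * Walk every thread of the group starting at 'p'. Return the first thread
 * that still has an mm, with its lock held, or NULL if none has one.
 */
struct task *find_lock_task_mm_model(struct task *p)
{
	struct task *t = p;

	do {
		task_lock(t);
		if (t->mm)
			return t;   /* caller must unlock when done with t->mm */
		task_unlock(t);
		t = t->next_thread;
	} while (t != p);

	return NULL;
}
```

Checking only the group leader would report "no mm" as soon as the main thread exits; walking the whole group finds a surviving worker thread, and returning it locked keeps its mm from disappearing while the caller uses it.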
    • A
      um: fix possible race on task->mm · 137d1a26
      Committed by Anton Vorontsov
      Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
      assigns NULL under task_lock(), so tasklist lock is not enough).
      
      We can't use get_task_mm()/mmput() pair as mmput() might sleep, so let's
      take the task lock while we care about its mm.
      
      Note that we should also use find_lock_task_mm() to check all process'
      threads for a valid mm, but for uml we'll do it in a separate patch.
      Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      um: should hold tasklist_lock while traversing processes · 9bd0a077
      Committed by Anton Vorontsov
      Traversing the tasks requires holding tasklist_lock, otherwise it is
      unsafe.
      
      p.s.  However, I'm not sure that calling os_kill_ptraced_process() in
      atomic context is correct.  It seems to work, but please take a closer
      look.
      Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      blackfin: fix possible deadlock in decode_address() · af1be5a5
      Committed by Anton Vorontsov
      Oleg Nesterov found an interesting deadlock possibility:
      
      > sysrq_showregs_othercpus() does smp_call_function(showacpu)
      > and showacpu() show_stack()->decode_address(). Now suppose that IPI
      > interrupts the task holding read_lock(tasklist).
      
      To fix this, blackfin should not grab the write_ variant of the
      tasklist lock, read_ one is enough.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      blackfin: a couple of task->mm handling fixes · 2214f707
      Committed by Anton Vorontsov
      The patch fixes two problems:
      
      1. Working with task->mm w/o getting the mm or grabbing the task lock is
         dangerous as ->mm might disappear (exit_mm() assigns NULL under
         task_lock(), so tasklist lock is not enough).

         We can't use get_task_mm()/mmput() pair as mmput() might sleep,
         so we have to take the task lock while handling its mm.
      
      2. Checking for process->mm is not enough because process' main
         thread may exit or detach its mm via use_mm(), but other threads
         may still have a valid mm.
      
         To catch this we use find_lock_task_mm(), which walks up all
         threads and returns an appropriate task (with task lock held).
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      sh: use clear_tasks_mm_cpumask() · 1198c8b9
      Committed by Anton Vorontsov
      Checking for process->mm is not enough because process' main thread may
      exit or detach its mm via use_mm(), but other threads may still have a
      valid mm.
      
      To fix this we would need to use find_lock_task_mm(), which would walk up
      all threads and return an appropriate task (with task lock held).
      
      clear_tasks_mm_cpumask() has the issue fixed, so let's use it.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>