1. 15 Jun, 2011 12 commits
    • tracing: Convert to kstrtoul_from_user · 22fe9b54
      Committed by Peter Huewe
      This patch replaces the code for getting an unsigned long from a
      userspace buffer with a simple call to kstrtoul_from_user.
      This makes the code easier to read and less error-prone.
      Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
      Link: http://lkml.kernel.org/r/1307476707-14762-1-git-send-email-peterhuewe@gmx.de
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
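The idea of the commit above (one strict helper instead of hand-rolled copy-and-parse code) can be illustrated with a small userspace model. This is only a sketch: `parse_ulong_from_buffer` is a hypothetical name, the 64-bit limit is an assumption, and the accepted syntax is deliberately simpler than the kernel helper's.

```python
ULONG_MAX = 2**64 - 1  # assumption: a 64-bit unsigned long


def parse_ulong_from_buffer(buf: bytes, base: int = 10) -> int:
    """Parse an unsigned integer from a raw buffer, rejecting malformed input."""
    text = buf.decode("ascii")
    # Tolerate one trailing newline, as produced by `echo 42 > file`.
    if text.endswith("\n"):
        text = text[:-1]
    # Reject empty input, signs, whitespace, prefixes, and stray characters.
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"[:base]
    if not text or any(ch not in digits for ch in text.lower()):
        raise ValueError(f"invalid unsigned integer: {buf!r}")
    value = int(text, base)
    if value > ULONG_MAX:
        raise OverflowError(f"value out of range: {buf!r}")
    return value


print(parse_ulong_from_buffer(b"42\n"))  # prints 42
```

Keeping validation, range checking, and newline handling in one place is what makes each call site shorter and harder to get wrong.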
    • tracing, function_graph: Add context-info support for function_graph tracer · 749230b0
      Committed by Jiri Olsa
      The function_graph tracer does not follow the global context-info option.
      Add a TRACE_ITER_CONTEXT_INFO trace_flags check so that it does.
      
      With the following commands:
      	# echo function_graph > ./current_tracer
      	# echo 0 > options/context-info
      	# cat trace
      
      This is what it looked like before:
      # tracer: function_graph
      #
      #     TIME        CPU  DURATION                  FUNCTION CALLS
      #      |          |     |   |                     |   |   |   |
       1)   0.079 us    |          } /* __vma_link_rb */
       1)   0.056 us    |          copy_page_range();
       1)               |          security_vm_enough_memory() {
      ...
      
      This is what it looks like now:
      # tracer: function_graph
      #
        } /* update_ts_time_stats */
        timekeeping_max_deferment();
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-6-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing, function_graph: Remove lock-depth from latency trace · 199abfab
      Committed by Jiri Olsa
      The lock_depth field was removed in commit
      e6e1e259 ("tracing: Remove lock_depth from event entry").
      
      Remove the lock_depth info from the function_graph latency header as well.
      
      With the following commands:
      	# echo function_graph > ./current_tracer
      	# echo 1 > options/latency-format
      	# cat trace
      
      This is what it looked like before:
      # tracer: function_graph
      #
      # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+
      # --------------------------------------------------------------------
      # latency: 0 us, #59756/311298, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
      #    -----------------
      #    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
      #    -----------------
      #
      #      _-----=> irqs-off
      #     / _----=> need-resched
      #    | / _---=> hardirq/softirq
      #    || / _--=> preempt-depth
      #    ||| / _-=> lock-depth
      #    |||| /
      # CPU|||||  DURATION                  FUNCTION CALLS
      # |  |||||   |   |                     |   |   |   |
       0)  ....  0.068 us    |    } /* __rcu_read_unlock */
      ...
      
      This is what it looks like now:
      # tracer: function_graph
      #
      # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+
      # --------------------------------------------------------------------
      # latency: 0 us, #59747/1744610, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
      #    -----------------
      #    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
      #    -----------------
      #
      #      _-----=> irqs-off
      #     / _----=> need-resched
      #    | / _---=> hardirq/softirq
      #    || / _--=> preempt-depth
      #    ||| /
      # CPU||||  DURATION                  FUNCTION CALLS
      # |  ||||   |   |                     |   |   |   |
       0)  ..s.  1.641 us    |  } /* __rcu_process_callbacks */
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-5-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing, function: Fix trace header to follow context-info option · f56e7f8e
      Committed by Jiri Olsa
      The header display of the function tracer does not follow
      the context-info option, so field names are displayed even
      if this option is off.
      
      Added a check for TRACE_ITER_CONTEXT_INFO in trace_flags.
      
      With the following commands:
      	# echo function > ./current_tracer
      	# echo 0 > options/context-info
      	# cat trace
      
      This is what it looked like before:
      # tracer: function
      #
      #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
      #              | |       |          |         |
      add_preempt_count <-schedule
      rcu_note_context_switch <-schedule
      ...
      
      This is what it looks like now:
      # tracer: function
      #
      _raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-4-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing, function_graph: Merge overhead and duration display functions · ffeb80fc
      Committed by Jiri Olsa
      The functions print_graph_overhead() and print_graph_duration() display
      data for a single field: DURATION.
      
      I merged them into a single function, print_graph_duration(),
      and added a way to display the empty parts of the field.
      
      This way the print_graph_irq() function can use this column to display
      the IRQ signs if needed, and the DURATION field details stay inside
      the print_graph_duration() function.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-3-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing, function_graph: Remove dependency of abstime and duration fields on latency · 321e68b0
      Committed by Jiri Olsa
      The display of the absolute time and duration fields is based on the
      latency field. This was added during the irqsoff/wakeup tracers
      graph support changes.
      
      It causes confusion about which fields will be displayed for the
      function_graph tracer itself. So I'm removing this dependency and
      adding the absolute time and duration fields to the preemptirqsoff,
      preemptoff, irqsoff, and wakeup tracers.
      
      With the following commands:
      	# echo function_graph > ./current_tracer
      	# cat trace
      
      This is what it looked like before:
      # tracer: function_graph
      #
      #     TIME        CPU  DURATION                  FUNCTION CALLS
      #      |          |     |   |                     |   |   |   |
       0)   0.068 us    |          } /* page_add_file_rmap */
       0)               |          _raw_spin_unlock() {
      ...
      
      This is what it looks like now:
      # tracer: function_graph
      #
      # CPU  DURATION                  FUNCTION CALLS
      # |     |   |                     |   |   |   |
       0)   0.068 us    |                } /* add_preempt_count */
       0)   0.993 us    |              } /* vfsmount_lock_local_lock */
      ...
      
      For the preemptirqsoff, preemptoff, irqsoff, and wakeup tracers,
      this is what it looked like before:
      SNIP
      #                       _-----=> irqs-off
      #                      / _----=> need-resched
      #                     | / _---=> hardirq/softirq
      #                     || / _--=> preempt-depth
      #                     ||| / _-=> lock-depth
      #                     |||| /
      # CPU  TASK/PID       |||||  DURATION                  FUNCTION CALLS
      # |     |    |        |||||   |   |                     |   |   |   |
       1)    <idle>-0    |  d..1  0.000 us    |  acpi_idle_enter_simple();
      ...
      
      This is what it looks like now:
      SNIP
      #
      #                                       _-----=> irqs-off
      #                                      / _----=> need-resched
      #                                     | / _---=> hardirq/softirq
      #                                     || / _--=> preempt-depth
      #                                     ||| /
      #     TIME        CPU  TASK/PID       ||||  DURATION                  FUNCTION CALLS
      #      |          |     |    |        ||||   |   |                     |   |   |   |
         19.847735 |   1)    <idle>-0    |  d..1  0.000 us    |  acpi_idle_enter_simple();
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-2-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • async: Fixed an include coding style issue · 84c15027
      Committed by Paul McQuade
      Added <linux/atomic.h> and <linux/ktime.h>, and removed <asm/atomic.h>.
      Added KERN_DEBUG to printk() calls.
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Paul McQuade <tungstentide@gmail.com>
      Link: http://lkml.kernel.org/r/4DE596B4.7030904@gmail.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Fixed an include coding style issue · bd38c0e6
      Committed by Paul McQuade
      Removed <asm/ftrace.h> because <linux/ftrace.h> was already included.
      Fixed the coding style of struct braces.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Paul McQuade <tungstentide@gmail.com>
      Link: http://lkml.kernel.org/r/4DE59711.3090900@gmail.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add disable_on_free option · cf30cf67
      Committed by Steven Rostedt
      Add a trace option to disable tracing on free. When this option is
      set, a write into the free_buffer file will not only shrink the
      ring buffer down to zero, but it will also disable tracing.
      
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add a proc file to stop tracing and free buffer · 4f271a2a
      Committed by Vaibhav Nagarnaik
      The proc file entry buffer_size_kb is used to set the size of tracing
      buffer. The memory to expand the buffer size is kernel memory. Consider
      a use case where tracing is handled by a user space utility, which acts
      as a gate keeper for tracing requests. In an OOM condition, tracing is
      considered a low priority task and if the utility gets killed the ring
      buffer memory cannot be released back to the kernel.
      
      This patch adds a proc file called "free_buffer" whose purpose is to
      stop tracing and free up the ring buffer when it is closed.
      
      The user space process can then set the desired size in the
      buffer_size_kb file and open an fd to the "free_buffer" file. Under an
      OOM condition, if the process gets killed, the kernel closes the file
      descriptor. The release handler stops the tracing and releases the
      kernel memory automatically.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/1308012717-11148-1-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
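The usage pattern described in this commit message could look roughly like the sketch below from the point of view of the user space utility. The file names buffer_size_kb and free_buffer come from the message; the function name `reserve_trace_buffer` and the directory argument are hypothetical, and the location of the tracing directory is an assumption that depends on where the tracing files are mounted.

```python
import os


def reserve_trace_buffer(tracing_dir: str, size_kb: int) -> int:
    """Set the ring buffer size and return an open fd on free_buffer.

    tracing_dir is an assumption (wherever the tracing files live on
    the target system); writing there normally requires root.
    """
    with open(os.path.join(tracing_dir, "buffer_size_kb"), "w") as f:
        f.write(str(size_kb))
    # Keep this descriptor open for as long as tracing should run. If the
    # process is killed, the kernel closes it, which triggers the release
    # handler described in the commit message.
    return os.open(os.path.join(tracing_dir, "free_buffer"), os.O_WRONLY)
```

The caller only has to hold on to the returned descriptor; no explicit cleanup path is needed for the OOM-kill case, which is the point of the design.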
    • tracing: Use NUMA allocation for per-cpu ring buffer pages · 7ea59064
      Committed by Vaibhav Nagarnaik
      The tracing ring buffer is a group of per-cpu ring buffers where
      allocation and logging is done on a per-cpu basis. The events that are
      generated on a particular CPU are logged in the corresponding buffer.
      This is to provide wait-free writes between CPUs and good NUMA node
      locality while accessing the ring buffer.
      
      However, the allocation routines consider NUMA locality only for buffer
      page metadata and not for the actual buffer page. This causes the pages
      to be allocated on the NUMA node local to the CPU where the allocation
      routine is running at the time.
      
      This patch fixes the problem by using a NUMA node specific allocation
      routine so that the pages are allocated from a NUMA node local to the
      logging CPU.
      
      I tested with the getuid_microbench from autotest. It is a simple binary
      that calls getuid() in a loop and measures the average time for the
      syscall to complete. The following command was used to test:
      $ getuid_microbench 1000000
      
      I compared the numbers from kernels with and without this patch and
      found that logging latency decreases by 30-50 ns/call:
      tracing with non-NUMA allocation - 569 ns/call
      tracing with NUMA allocation     - 512 ns/call
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
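For the two sample figures quoted in this commit message (569 and 512 ns/call), the saving can be worked out as follows. The helper name `latency_improvement` is hypothetical; only the two input numbers come from the message.

```python
def latency_improvement(before_ns, after_ns):
    """Return (absolute saving in ns/call, relative saving in percent)."""
    saved = before_ns - after_ns
    return saved, 100.0 * saved / before_ns


# Figures quoted in the commit message: non-NUMA vs. NUMA allocation.
saved_ns, saved_pct = latency_improvement(569, 512)
print(f"{saved_ns} ns/call saved ({saved_pct:.1f}%)")  # prints: 57 ns/call saved (10.0%)
```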
    • tracing: Schedule a delayed work to call wakeup() · e7e2ee89
      Committed by Vaibhav Nagarnaik
      When concurrent processes use syscall tracing, the wakeup() that is
      called in the event commit function causes contention on the spin lock
      of the waitqueue. I enabled the sys_enter_getuid and sys_exit_getuid
      tracepoints, and by running getuid_microbench from autotest in parallel
      I found that the contention causes an exponential latency increase in the
      tracing path.
      
      The autotest binary getuid_microbench calls getuid() in a tight loop for
      the given number of iterations and measures the average time required to
      complete a single invocation of syscall.
      
      The patch schedules delayed work to run 2 ms after an event commit
      requests a wakeup of the trace wait_queue. This removes the delay caused
      by contention on the spin lock in wakeup() and amortizes the wakeup()
      calls over the 2 ms period.
      
      In the following example, the script enables the sys_enter_getuid and
      sys_exit_getuid tracepoints and runs the getuid_microbench in parallel
      with the given number of processes. The output clearly shows the latency
      increase caused by contention.
      
      $ ~/getuid.sh 1
      1000000 calls in 0.720974253 s (720.974253 ns/call)
      
      $ ~/getuid.sh 2
      1000000 calls in 1.166457554 s (1166.457554 ns/call)
      1000000 calls in 1.168933765 s (1168.933765 ns/call)
      
      $ ~/getuid.sh 3
      1000000 calls in 1.783827516 s (1783.827516 ns/call)
      1000000 calls in 1.795553270 s (1795.553270 ns/call)
      1000000 calls in 1.796493376 s (1796.493376 ns/call)
      
      $ ~/getuid.sh 4
      1000000 calls in 4.483041796 s (4483.041796 ns/call)
      1000000 calls in 4.484165388 s (4484.165388 ns/call)
      1000000 calls in 4.484850762 s (4484.850762 ns/call)
      1000000 calls in 4.485643576 s (4485.643576 ns/call)
      
      $ ~/getuid.sh 5
      1000000 calls in 6.497521653 s (6497.521653 ns/call)
      1000000 calls in 6.502000236 s (6502.000236 ns/call)
      1000000 calls in 6.501709115 s (6501.709115 ns/call)
      1000000 calls in 6.502124100 s (6502.124100 ns/call)
      1000000 calls in 6.502936358 s (6502.936358 ns/call)
      
      After the patch, the latencies scale better (the same runs with 1 to 5 processes):
      1000000 calls in 0.728720455 s (728.720455 ns/call)
      
      1000000 calls in 0.842782857 s (842.782857 ns/call)
      1000000 calls in 0.883803135 s (883.803135 ns/call)
      
      1000000 calls in 0.902077764 s (902.077764 ns/call)
      1000000 calls in 0.902838202 s (902.838202 ns/call)
      1000000 calls in 0.908896885 s (908.896885 ns/call)
      
      1000000 calls in 0.932523515 s (932.523515 ns/call)
      1000000 calls in 0.958009672 s (958.009672 ns/call)
      1000000 calls in 0.986188020 s (986.188020 ns/call)
      1000000 calls in 0.989771102 s (989.771102 ns/call)
      
      1000000 calls in 0.933518391 s (933.518391 ns/call)
      1000000 calls in 0.958897947 s (958.897947 ns/call)
      1000000 calls in 1.031038897 s (1031.038897 ns/call)
      1000000 calls in 1.089516025 s (1089.516025 ns/call)
      1000000 calls in 1.141998347 s (1141.998347 ns/call)
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1305059241-7629-1-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
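The amortization effect of the 2 ms delayed work can be modelled with a toy counter. This is not the kernel code: `count_wakeups` is a hypothetical function, times are plain integers in microseconds, and the commit pattern (one commit every 10 us) is an invented example.

```python
def count_wakeups(commit_times_us, delay_us=None):
    """Count wakeup() calls for a sorted list of event commit times.

    delay_us=None models waking the waiters on every commit.
    Otherwise the first commit arms a timer, and commits arriving
    before it fires (delay_us later) are served by that one wakeup.
    """
    if delay_us is None:
        return len(commit_times_us)
    wakeups = 0
    timer_fires_at = None
    for t in commit_times_us:
        if timer_fires_at is None or t >= timer_fires_at:
            timer_fires_at = t + delay_us  # arm a new delayed wakeup
            wakeups += 1
    return wakeups


# Hypothetical load: 1000 commits, one every 10 us (10 ms in total).
commits = [i * 10 for i in range(1000)]
print(count_wakeups(commits))                 # prints 1000
print(count_wakeups(commits, delay_us=2000))  # prints 5
```

In this model the number of contended wakeup() calls depends on elapsed time rather than on the event rate, which is consistent with the flatter latencies reported after the patch.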
  2. 07 Jun, 2011 1 commit
  3. 04 Jun, 2011 1 commit
  4. 31 May, 2011 2 commits
  5. 30 May, 2011 1 commit
    • mm: Fix boot crash in mm_alloc() · 6345d24d
      Committed by Linus Torvalds
      Thomas Gleixner reports that we now have a boot crash triggered by
      CONFIG_CPUMASK_OFFSTACK=y:
      
          BUG: unable to handle kernel NULL pointer dereference at   (null)
          IP: [<c11ae035>] find_next_bit+0x55/0xb0
          Call Trace:
           [<c11addda>] cpumask_any_but+0x2a/0x70
           [<c102396b>] flush_tlb_mm+0x2b/0x80
           [<c1022705>] pud_populate+0x35/0x50
           [<c10227ba>] pgd_alloc+0x9a/0xf0
           [<c103a3fc>] mm_init+0xec/0x120
           [<c103a7a3>] mm_alloc+0x53/0xd0
      
      which was introduced by commit de03c72c ("mm: convert
      mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
      mm_init() vs mm_init_cpumask().
      
      Thomas wrote a patch to just fix the ordering of initialization, but I
      hate the new double allocation in the fork path, so I ended up instead
      doing some more radical surgery to clean it all up.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 29 May, 2011 10 commits
  7. 28 May, 2011 8 commits
    • rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state · cc3ce517
      Committed by Paul E. McKenney
      Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can
      result in softlockup warnings.  Because some of RCU's kthreads can
      legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE
      state in order to avoid those warnings.
      Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Remove waitqueue usage for cpu, node, and boost kthreads · 08bca60a
      Committed by Peter Zijlstra
      It is not necessary to use waitqueues for the RCU kthreads because
      we always know exactly which thread is to be awakened.  In addition,
      wake_up() only issues an actual wakeup when there is a thread waiting on
      the queue, which was why there was an extra explicit wake_up_process()
      to get the RCU kthreads started.
      
      Eliminating the waitqueues (and wake_up()) in favor of wake_up_process()
      eliminates the need for the initial wake_up_process() and also shrinks
      the data structure size a bit.  The wakeup logic is placed in a new
      rcu_wait() macro.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Avoid acquiring rcu_node locks in timer functions · 8826f3b0
      Committed by Paul E. McKenney
      This commit switches manipulations of the rcu_node ->wakemask field
      to atomic operations, which allows rcu_cpu_kthread_timer() to avoid
      acquiring the rcu_node lock.  This should avoid the following lockdep
      splat reported by Valdis Kletnieks:
      
      [   12.872150] usb 1-4: new high speed USB device number 3 using ehci_hcd
      [   12.986667] usb 1-4: New USB device found, idVendor=413c, idProduct=2513
      [   12.986679] usb 1-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
      [   12.987691] hub 1-4:1.0: USB hub found
      [   12.987877] hub 1-4:1.0: 3 ports detected
      [   12.996372] input: PS/2 Generic Mouse as /devices/platform/i8042/serio1/input/input10
      [   13.071471] udevadm used greatest stack depth: 3984 bytes left
      [   13.172129]
      [   13.172130] =======================================================
      [   13.172425] [ INFO: possible circular locking dependency detected ]
      [   13.172650] 2.6.39-rc6-mmotm0506 #1
      [   13.172773] -------------------------------------------------------
      [   13.172997] blkid/267 is trying to acquire lock:
      [   13.173009]  (&p->pi_lock){-.-.-.}, at: [<ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa
      [   13.173009]
      [   13.173009] but task is already holding lock:
      [   13.173009]  (rcu_node_level_0){..-...}, at: [<ffffffff810901cc>] rcu_cpu_kthread_timer+0x27/0x58
      [   13.173009]
      [   13.173009] which lock already depends on the new lock.
      [   13.173009]
      [   13.173009]
      [   13.173009] the existing dependency chain (in reverse order) is:
      [   13.173009]
      [   13.173009] -> #2 (rcu_node_level_0){..-...}:
      [   13.173009]        [<ffffffff810679b9>] check_prevs_add+0x8b/0x104
      [   13.173009]        [<ffffffff81067da1>] validate_chain+0x36f/0x3ab
      [   13.173009]        [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2
      [   13.173009]        [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c
      [   13.173009]        [<ffffffff815697f1>] _raw_spin_lock+0x36/0x45
      [   13.173009]        [<ffffffff81090794>] rcu_read_unlock_special+0x8c/0x1d5
      [   13.173009]        [<ffffffff8109092c>] __rcu_read_unlock+0x4f/0xd7
      [   13.173009]        [<ffffffff81027bd3>] rcu_read_unlock+0x21/0x23
      [   13.173009]        [<ffffffff8102cc34>] cpuacct_charge+0x6c/0x75
      [   13.173009]        [<ffffffff81030cc6>] update_curr+0x101/0x12e
      [   13.173009]        [<ffffffff810311d0>] check_preempt_wakeup+0xf7/0x23b
      [   13.173009]        [<ffffffff8102acb3>] check_preempt_curr+0x2b/0x68
      [   13.173009]        [<ffffffff81031d40>] ttwu_do_wakeup+0x76/0x128
      [   13.173009]        [<ffffffff81031e49>] ttwu_do_activate.constprop.63+0x57/0x5c
      [   13.173009]        [<ffffffff81031e96>] scheduler_ipi+0x48/0x5d
      [   13.173009]        [<ffffffff810177d5>] smp_reschedule_interrupt+0x16/0x18
      [   13.173009]        [<ffffffff815710f3>] reschedule_interrupt+0x13/0x20
      [   13.173009]        [<ffffffff810b66d1>] rcu_read_unlock+0x21/0x23
      [   13.173009]        [<ffffffff810b739c>] find_get_page+0xa9/0xb9
      [   13.173009]        [<ffffffff810b8b48>] filemap_fault+0x6a/0x34d
      [   13.173009]        [<ffffffff810d1a25>] __do_fault+0x54/0x3e6
      [   13.173009]        [<ffffffff810d447a>] handle_pte_fault+0x12c/0x1ed
      [   13.173009]        [<ffffffff810d48f7>] handle_mm_fault+0x1cd/0x1e0
      [   13.173009]        [<ffffffff8156cfee>] do_page_fault+0x42d/0x5de
      [   13.173009]        [<ffffffff8156a75f>] page_fault+0x1f/0x30
      [   13.173009]
      [   13.173009] -> #1 (&rq->lock){-.-.-.}:
      [   13.173009]        [<ffffffff810679b9>] check_prevs_add+0x8b/0x104
      [   13.173009]        [<ffffffff81067da1>] validate_chain+0x36f/0x3ab
      [   13.173009]        [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2
      [   13.173009]        [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c
      [   13.173009]        [<ffffffff815697f1>] _raw_spin_lock+0x36/0x45
      [   13.173009]        [<ffffffff81027e19>] __task_rq_lock+0x8b/0xd3
      [   13.173009]        [<ffffffff81032f7f>] wake_up_new_task+0x41/0x108
      [   13.173009]        [<ffffffff810376c3>] do_fork+0x265/0x33f
      [   13.173009]        [<ffffffff81007d02>] kernel_thread+0x6b/0x6d
      [   13.173009]        [<ffffffff8153a9dd>] rest_init+0x21/0xd2
      [   13.173009]        [<ffffffff81b1db4f>] start_kernel+0x3bb/0x3c6
      [   13.173009]        [<ffffffff81b1d29f>] x86_64_start_reservations+0xaf/0xb3
      [   13.173009]        [<ffffffff81b1d393>] x86_64_start_kernel+0xf0/0xf7
      [   13.173009]
      [   13.173009] -> #0 (&p->pi_lock){-.-.-.}:
      [   13.173009]        [<ffffffff81067788>] check_prev_add+0x68/0x20e
      [   13.173009]        [<ffffffff810679b9>] check_prevs_add+0x8b/0x104
      [   13.173009]        [<ffffffff81067da1>] validate_chain+0x36f/0x3ab
      [   13.173009]        [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2
      [   13.173009]        [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c
      [   13.173009]        [<ffffffff815698ea>] _raw_spin_lock_irqsave+0x44/0x57
      [   13.173009]        [<ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa
      [   13.173009]        [<ffffffff81032f3c>] wake_up_process+0x10/0x12
      [   13.173009]        [<ffffffff810901e9>] rcu_cpu_kthread_timer+0x44/0x58
      [   13.173009]        [<ffffffff81045286>] call_timer_fn+0xac/0x1e9
      [   13.173009]        [<ffffffff8104556d>] run_timer_softirq+0x1aa/0x1f2
      [   13.173009]        [<ffffffff8103e487>] __do_softirq+0x109/0x26a
      [   13.173009]        [<ffffffff8157144c>] call_softirq+0x1c/0x30
      [   13.173009]        [<ffffffff81003207>] do_softirq+0x44/0xf1
      [   13.173009]        [<ffffffff8103e8b9>] irq_exit+0x58/0xc8
      [   13.173009]        [<ffffffff81017f5a>] smp_apic_timer_interrupt+0x79/0x87
      [   13.173009]        [<ffffffff81570fd3>] apic_timer_interrupt+0x13/0x20
      [   13.173009]        [<ffffffff810bd51a>] get_page_from_freelist+0x2aa/0x310
      [   13.173009]        [<ffffffff810bdf03>] __alloc_pages_nodemask+0x178/0x243
      [   13.173009]        [<ffffffff8101fe2f>] pte_alloc_one+0x1e/0x3a
      [   13.173009]        [<ffffffff810d27fe>] __pte_alloc+0x22/0x14b
      [   13.173009]        [<ffffffff810d48a8>] handle_mm_fault+0x17e/0x1e0
      [   13.173009]        [<ffffffff8156cfee>] do_page_fault+0x42d/0x5de
      [   13.173009]        [<ffffffff8156a75f>] page_fault+0x1f/0x30
      [   13.173009]
      [   13.173009] other info that might help us debug this:
      [   13.173009]
      [   13.173009] Chain exists of:
      [   13.173009]   &p->pi_lock --> &rq->lock --> rcu_node_level_0
      [   13.173009]
      [   13.173009]  Possible unsafe locking scenario:
      [   13.173009]
      [   13.173009]        CPU0                    CPU1
      [   13.173009]        ----                    ----
      [   13.173009]   lock(rcu_node_level_0);
      [   13.173009]                                lock(&rq->lock);
      [   13.173009]                                lock(rcu_node_level_0);
      [   13.173009]   lock(&p->pi_lock);
      [   13.173009]
      [   13.173009]  *** DEADLOCK ***
      [   13.173009]
      [   13.173009] 3 locks held by blkid/267:
      [   13.173009]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8156cdb4>] do_page_fault+0x1f3/0x5de
      [   13.173009]  #1:  (&yield_timer){+.-...}, at: [<ffffffff810451da>] call_timer_fn+0x0/0x1e9
      [   13.173009]  #2:  (rcu_node_level_0){..-...}, at: [<ffffffff810901cc>] rcu_cpu_kthread_timer+0x27/0x58
      [   13.173009]
      [   13.173009] stack backtrace:
      [   13.173009] Pid: 267, comm: blkid Not tainted 2.6.39-rc6-mmotm0506 #1
      [   13.173009] Call Trace:
      [   13.173009]  <IRQ>  [<ffffffff8154a529>] print_circular_bug+0xc8/0xd9
      [   13.173009]  [<ffffffff81067788>] check_prev_add+0x68/0x20e
      [   13.173009]  [<ffffffff8100c861>] ? save_stack_trace+0x28/0x46
      [   13.173009]  [<ffffffff810679b9>] check_prevs_add+0x8b/0x104
      [   13.173009]  [<ffffffff81067da1>] validate_chain+0x36f/0x3ab
      [   13.173009]  [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2
      [   13.173009]  [<ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa
      [   13.173009]  [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c
      [   13.173009]  [<ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa
      [   13.173009]  [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
      [   13.173009]  [<ffffffff815698ea>] _raw_spin_lock_irqsave+0x44/0x57
      [   13.173009]  [<ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa
      [   13.173009]  [<ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa
      [   13.173009]  [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
      [   13.173009]  [<ffffffff81032f3c>] wake_up_process+0x10/0x12
      [   13.173009]  [<ffffffff810901e9>] rcu_cpu_kthread_timer+0x44/0x58
      [   13.173009]  [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
      [   13.173009]  [<ffffffff81045286>] call_timer_fn+0xac/0x1e9
      [   13.173009]  [<ffffffff810451da>] ? del_timer+0x75/0x75
      [   13.173009]  [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
      [   13.173009]  [<ffffffff8104556d>] run_timer_softirq+0x1aa/0x1f2
      [   13.173009]  [<ffffffff8103e487>] __do_softirq+0x109/0x26a
      [   13.173009]  [<ffffffff8106365f>] ? tick_dev_program_event+0x37/0xf6
      [   13.173009]  [<ffffffff810a0e4a>] ? time_hardirqs_off+0x1b/0x2f
      [   13.173009]  [<ffffffff8157144c>] call_softirq+0x1c/0x30
      [   13.173009]  [<ffffffff81003207>] do_softirq+0x44/0xf1
      [   13.173009]  [<ffffffff8103e8b9>] irq_exit+0x58/0xc8
      [   13.173009]  [<ffffffff81017f5a>] smp_apic_timer_interrupt+0x79/0x87
      [   13.173009]  [<ffffffff81570fd3>] apic_timer_interrupt+0x13/0x20
      [   13.173009]  <EOI>  [<ffffffff810bd384>] ? get_page_from_freelist+0x114/0x310
      [   13.173009]  [<ffffffff810bd51a>] ? get_page_from_freelist+0x2aa/0x310
      [   13.173009]  [<ffffffff812220e7>] ? clear_page_c+0x7/0x10
      [   13.173009]  [<ffffffff810bd1ef>] ? prep_new_page+0x14c/0x1cd
      [   13.173009]  [<ffffffff810bd51a>] get_page_from_freelist+0x2aa/0x310
      [   13.173009]  [<ffffffff810bdf03>] __alloc_pages_nodemask+0x178/0x243
      [   13.173009]  [<ffffffff810d46b9>] ? __pmd_alloc+0x87/0x99
      [   13.173009]  [<ffffffff8101fe2f>] pte_alloc_one+0x1e/0x3a
      [   13.173009]  [<ffffffff810d46b9>] ? __pmd_alloc+0x87/0x99
      [   13.173009]  [<ffffffff810d27fe>] __pte_alloc+0x22/0x14b
      [   13.173009]  [<ffffffff810d48a8>] handle_mm_fault+0x17e/0x1e0
      [   13.173009]  [<ffffffff8156cfee>] do_page_fault+0x42d/0x5de
      [   13.173009]  [<ffffffff810d915f>] ? sys_brk+0x32/0x10c
      [   13.173009]  [<ffffffff810a0e4a>] ? time_hardirqs_off+0x1b/0x2f
      [   13.173009]  [<ffffffff81065c4f>] ? trace_hardirqs_off_caller+0x3f/0x9c
      [   13.173009]  [<ffffffff812235dd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
      [   13.173009]  [<ffffffff8156a75f>] page_fault+0x1f/0x30
      [   14.010075] usb 5-1: new full speed USB device number 2 using uhci_hcd
      Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
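The "possible circular locking dependency" in the splat above boils down to a cycle in a directed graph of lock orderings. The sketch below is a minimal illustration of that idea, not lockdep's actual algorithm; `creates_cycle` is a hypothetical helper, and only the lock names and the dependency chain are taken from the report.

```python
def creates_cycle(edges, new_edge):
    """Return True if adding new_edge (held -> wanted) closes a cycle.

    edges maps a lock to the set of locks acquired while it is held.
    """
    held, wanted = new_edge
    # A cycle appears if `held` is already reachable from `wanted`.
    stack, seen = [wanted], set()
    while stack:
        lock = stack.pop()
        if lock == held:
            return True
        if lock not in seen:
            seen.add(lock)
            stack.extend(edges.get(lock, ()))
    return False


# Existing chain from the report: &p->pi_lock --> &rq->lock --> rcu_node_level_0
known = {
    "&p->pi_lock": {"&rq->lock"},
    "&rq->lock": {"rcu_node_level_0"},
}
# rcu_cpu_kthread_timer() held rcu_node_level_0 while try_to_wake_up()
# tried to take &p->pi_lock.
print(creates_cycle(known, ("rcu_node_level_0", "&p->pi_lock")))  # prints True
```

Avoiding the rcu_node lock in the timer function, as the commit does, removes the new edge, so the chain stays acyclic.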
    • perf: Fix SIGIO handling · f506b3dc
      Committed by Peter Zijlstra
      Vince noticed that unless we mmap() a buffer, SIGIO gets lost. So
      explicitly push the wakeup (including signals) when requested.
      Reported-by: Vince Weaver <vweaver1@eecs.utk.edu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/n/tip-2euus3f3x3dyvdk52cjxw8zu@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed · 1e1b6c51
      Committed by KOSAKI Motohiro
      The rule is that we have to update tsk->rt.nr_cpus_allowed if we change
      tsk->cpus_allowed. Otherwise the RT scheduler may get confused.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/4DD4B3FA.5060901@jp.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Fix ->min_vruntime calculation in dequeue_entity() · 1e876231
      Committed by Peter Zijlstra
      Dima Zavin <dima@android.com> reported:
      
      "After pulling the thread off the run-queue during a cgroup change,
      the cfs_rq.min_vruntime gets recalculated. The dequeued thread's vruntime
      then gets normalized to this new value. This can then lead to the thread
      getting an unfair boost in the new group if the vruntime of the next
      task in the old run-queue was way further ahead."
      Reported-by: Dima Zavin <dima@android.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Recalls-having-tested-once-upon-a-time-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1305674470-23727-1-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
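The unfair boost described in the report can be reproduced with a few numbers. This is a toy model under stated assumptions: `move_task` is a hypothetical function, the vruntime values are invented, the task is assumed to be the leftmost entity on its old run-queue, and the `normalize_first=True` ordering is only one way to avoid the effect (the message does not spell out the actual change).

```python
def move_task(task_vruntime, old_queue_next_vruntime, new_min_vruntime,
              normalize_first):
    """Return the task's vruntime after moving it to a new run-queue."""
    old_min = task_vruntime  # assumption: the task was the leftmost entity
    if not normalize_first:
        # Reported behaviour: min_vruntime is recalculated from the next
        # task on the old queue before the dequeued task is normalized.
        old_min = max(old_min, old_queue_next_vruntime)
    relative = task_vruntime - old_min
    return new_min_vruntime + relative


# Invented numbers: the next task on the old queue is far ahead (1000 vs 100).
print(move_task(100, 1000, 5000, normalize_first=False))  # prints 4100
print(move_task(100, 1000, 5000, normalize_first=True))   # prints 5000
```

With the reported ordering the task lands 900 units behind the new group's min_vruntime, which is the "unfair boost"; normalizing against the old value first places it level with the new group.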
    • sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW · d6aa8f85
      Committed by Peter Zijlstra
      Marc reported that e4a52bcb (sched: Remove rq->lock from the first
      half of ttwu()) broke his ARM-SMP machine. Now ARM is one of the few
      __ARCH_WANT_INTERRUPTS_ON_CTXSW users, so that exception in the ttwu()
      code was suspect.
      
      Yong found that the interrupt could hit after context_switch() changes
      current but before it clears p->on_cpu. If that interrupt were to
      attempt a wake-up of p, we would indeed find ourselves spinning in IRQ
      context.
      
      Fix this by reverting to the old behaviour for this situation and
      perform a full remote wake-up.
      
      Cc: Frank Rowand <frank.rowand@am.sony.com>
      Cc: Yong Zhang <yong.zhang0@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Marc Zyngier <Marc.Zyngier@arm.com>
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: More sched_domain iterations fixes · cd4ae6ad
      Committed by Xiaotian Feng
      sched_domain iterations now need to be protected by rcu_read_lock().
      This patch adds two more places that need the RCU lock, which were
      spotted by the following suspicious rcu_dereference_check() usage warnings.
      
      kernel/sched_rt.c:1244 invoked rcu_dereference_check() without protection!
      kernel/sched_stats.h:41 invoked rcu_dereference_check() without protection!
      Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1303469634-11678-1-git-send-email-dfeng@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 27 May, 2011 5 commits