1. 12 December 2009 (1 commit)
  2. 11 December 2009 (5 commits)
    • kgdb: Always process the whole breakpoint list on activate or deactivate · 7f8b7ed6
      Committed by Jason Wessel
      This patch fixes 2 edge cases in using kgdb in conjunction with gdb.
      
      1) kgdb_deactivate_sw_breakpoints() should process the entire array of
         breakpoints.  The failure to do so results in breakpoints that you
         cannot remove, because a breakpoint can only be removed if its
         state flag is set to BP_SET.
      
         The easy way to duplicate this problem is to plant a break point in
         a kernel module and then unload the kernel module.
      
      2) kgdb_activate_sw_breakpoints() should process the entire array of
         breakpoints.  The failure to do so results in missed breakpoints
         when a breakpoint cannot be activated (see the sketch below).
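      A minimal C sketch of the loop shape both fixes call for (the array
      size, type and helper names are illustrative, not the actual kgdb
      symbols): the walk covers every slot and skips entries it cannot
      handle instead of returning early.

        #define MAX_BREAKPOINTS 64

        enum bp_state { BP_UNDEFINED, BP_SET, BP_ACTIVE };

        struct sw_breakpoint {
                enum bp_state state;
                unsigned long addr;
        };

        static struct sw_breakpoint bp_list[MAX_BREAKPOINTS];

        /* stand-in for the arch-specific instruction patching */
        extern int remove_instruction(unsigned long addr);

        static void deactivate_all_sw_breakpoints(void)
        {
                int i;

                for (i = 0; i < MAX_BREAKPOINTS; i++) {
                        if (bp_list[i].state != BP_ACTIVE)
                                continue;  /* keep scanning, never return early */
                        if (remove_instruction(bp_list[i].addr))
                                continue;  /* a failed entry is skipped, not fatal */
                        bp_list[i].state = BP_SET;  /* BP_SET allows later removal */
                }
        }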
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      7f8b7ed6
    • kgdb: continue and warn on signal passing from gdb · d625e9c0
      Committed by Jason Wessel
      On some architectures for the segv trap, gdb wants to pass the signal
      back on continue.  For kgdb this is not the default behavior, because
      it can cause the kernel to crash if you arbitrarily pass back an
      exception outside of kgdb.
      
      Instead of causing instability, pass a message back to gdb about the
      supported kgdb signal passing and execute a standard kgdb continue
      operation.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      d625e9c0
    • kgdb: allow for cpu switch when single stepping · 028e7b17
      Committed by Jason Wessel
      The kgdb core should not assume that a single step operation of a
      kernel thread will complete on the same CPU.  The single step flag is
      set at the "thread" level and it is possible in a multi cpu system
      that a kernel thread can get scheduled on another cpu the next time it
      is run.
      
      As a further safety net in case a slave cpu is hung, the debug master
      cpu will try 100 times before giving up and assuming control of the
      slave cpus is no longer possible.  It is more useful to be able to get
      some information out of kgdb instead of spinning forever.
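      A rough sketch of that bounded wait (the retry count matches the text;
      the helper names are illustrative assumptions):

        #define MAX_ROUNDUP_TRIES 100

        /* returns nonzero once every other CPU has parked in the debugger */
        extern int all_slave_cpus_parked(void);
        extern void short_delay(void);

        static int round_up_slave_cpus(void)
        {
                int tries;

                for (tries = 0; tries < MAX_ROUNDUP_TRIES; tries++) {
                        if (all_slave_cpus_parked())
                                return 0;
                        short_delay();
                }
                /* give up, but still enter kgdb so state can be inspected */
                return -1;
        }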
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      028e7b17
    • kgdb: Read buffer overflow · 84667d48
      Committed by Jason Wessel
      Roel Kluin reported an error found with Parfait: we want to
      ensure that kgdb_info[-1] never gets accessed.
      
      Also check that any negative tid does not exceed the size of the
      shadow CPU array; otherwise report the critical debug context,
      because it indicates an internal kgdb failure.
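      A sketch of that bounds check (the mapping of negative ids to shadow
      slots and all names are simplified for illustration):

        #define NR_SHADOW_CPUS 8

        struct debug_ctx { int cpu; };
        static struct debug_ctx shadow_ctx[NR_SHADOW_CPUS];

        static struct debug_ctx *ctx_for_negative_tid(long tid)
        {
                long idx = -tid - 2;    /* simplified shadow-slot mapping */

                if (idx < 0 || idx >= NR_SHADOW_CPUS)
                        return (struct debug_ctx *)0;  /* internal kgdb failure */
                return &shadow_ctx[idx];
        }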
      Reported-by: Roel Kluin <roel.kluin@gmail.com>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      84667d48
    • perf_event: Fix variable initialization in other codepaths · 5e855db5
      Committed by Xiao Guangrong
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <4B20BAA6.7010609@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5e855db5
  3. 10 December 2009 (6 commits)
    • perf_event: Fix perf_swevent_hrtimer() variable initialization · 21140f4d
      Committed by Xiao Guangrong
      fix:
      
       [<c0477471>] ? printk+0x1d/0x24
       [<c01c98f9>] ? perf_prepare_sample+0x269/0x280
       [<c0149231>] warn_slowpath_common+0x71/0xd0
       [<c01c98f9>] ? perf_prepare_sample+0x269/0x280
       [<c01492aa>] warn_slowpath_null+0x1a/0x20
       [<c01c98f9>] perf_prepare_sample+0x269/0x280
       [<c016e9f3>] ? cpu_clock+0x53/0x90
       [<c01cc368>] __perf_event_overflow+0x2a8/0x300
       [<c01ccc3b>] perf_event_overflow+0x1b/0x30
       [<c01ccccf>] perf_swevent_hrtimer+0x7f/0x120
      
      This is because the 'data.raw' variable was not initialized.
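      The class of bug is a stack-allocated sample structure with an
      uninitialized pointer member; a minimal sketch of the fix, using a
      simplified stand-in for the real perf_sample_data:

        struct sample_data {
                unsigned long long period;
                void *raw;              /* must not be left uninitialized */
        };

        static void hrtimer_overflow_path(unsigned long long period)
        {
                struct sample_data data;

                data.raw = 0;           /* the fix: set to NULL before use */
                data.period = period;

                /* ... hand &data to the overflow/prepare_sample path ... */
        }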
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <4B208E93.1010801@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      21140f4d
    • tracing: Remove comparing of NULL to va_list in trace_array_vprintk() · f2942487
      Committed by Carsten Emde
      Olof Johansson stated the following:
      
        Comparing a va_list with NULL is bogus. It's supposed to be treated like
        an opaque type and only be manipulated with va_* accessors.
      
      Olof noticed that this code broke the ARM builds:
      
          kernel/trace/trace.c: In function 'trace_array_vprintk':
          kernel/trace/trace.c:1364: error: invalid operands to binary == (have 'va_list' and 'void *')
          kernel/trace/trace.c: In function 'tracing_mark_write':
          kernel/trace/trace.c:3349: error: incompatible type for argument 3 of 'trace_vprintk'
      
      This patch partly reverts c13d2f7c and
      re-installs the original mark_printk() mechanism.
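      The shape of the re-installed mechanism is a varargs wrapper, so a real
      va_list is always built and passed down rather than comparing one
      against NULL; roughly (the trace_vprintk() prototype is simplified):

        #include <stdarg.h>

        extern int trace_vprintk(unsigned long ip, const char *fmt, va_list args);

        static int mark_printk(const char *fmt, ...)
        {
                int ret;
                va_list args;

                va_start(args, fmt);            /* always a genuine va_list */
                ret = trace_vprintk(0, fmt, args);
                va_end(args);
                return ret;
        }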
      Reported-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Carsten Emde <C.Emde@osadl.org>
      LKML-Reference: <4B1BAB74.104@osadl.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f2942487
    • tracing: Fix function graph trace_pipe to properly display failed entries · be1eca39
      Committed by Jiri Olsa
      There is a case where the graph tracer might get confused and omit
      displaying a single record.  This applies mostly to the trace_pipe,
      since the trace_seq buffer is unlikely to overflow with the trace
      file.
      
      As the function_graph tracer goes through the trace entries keeping a
      pointer to the current record:
      
      current ->  func1 ENTRY
                  func2 ENTRY
                  func2 RETURN
                  func1 RETURN
      
      When a function ENTRY is encountered, it moves the pointer to the
      next entry to check if the function is a nested or leaf function.
      
                  func1 ENTRY
      current ->  func2 ENTRY
                  func2 RETURN
                  func1 RETURN
      
      If writing the rest of the function fills the trace_seq buffer, then
      the trace_pipe read will ignore this entry.  The next read will now
      start at the current location, but the first entry (func1) will have
      been discarded.
      
      This patch keeps a copy of the current entry in the iterator private
      storage and will keep track of when the trace_seq buffer fills. When
      the trace_seq buffer fills, it will reuse the copy of the entry in the
      next iteration.
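      A sketch of the idea (types and names are illustrative, not the exact
      fgraph data layout): the iterator's private data caches the entry
      being printed plus a "failed" flag, and the next pass replays the
      cached entry instead of consuming a fresh one.

        struct graph_ent { unsigned long func; int depth; };

        struct fgraph_priv {
                struct graph_ent saved_ent;  /* copy of the entry being printed */
                int failed;                  /* last write overflowed trace_seq */
        };

        extern struct graph_ent *next_record(void);  /* stand-in for the iterator */

        static struct graph_ent *get_entry(struct fgraph_priv *p)
        {
                struct graph_ent *ent;

                if (p->failed) {
                        p->failed = 0;
                        return &p->saved_ent;  /* retry the entry that did not fit */
                }
                ent = next_record();
                if (ent)
                        p->saved_ent = *ent;   /* keep a copy in case the write fails */
                return ent;
        }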
      
      [
        This patch has been largely modified by Steven Rostedt in order to
        clean it up and simplify it.  The original idea and concept were
        Jirka's, and for that this patch goes under his name to give him
        the credit he deserves.  But because it was modified by Steven
        Rostedt, anything wrong with the patch should be blamed on Steven.
      ]
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      be1eca39
    • tracing: Add full state to trace_seq · d184b31c
      Committed by Johannes Berg
      The trace_seq buffer might fill up, and right now one needs to check the
      return value of each printf into the buffer to check for that.
      
      Instead, have the buffer keep track of whether it is full or not, and
      reject more input if it is full or would have overflowed with an input
      that wasn't added.
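      A simplified userspace sketch of a sequence buffer that tracks its own
      "full" state, so callers no longer need to check every printf return
      value (the size and names are illustrative):

        #include <stdarg.h>
        #include <stdio.h>

        #define SEQ_SIZE 4096

        struct seq_buf {
                char buffer[SEQ_SIZE];
                unsigned int len;
                int full;               /* set once an input does not fit */
        };

        static int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
        {
                va_list args;
                int ret;

                if (s->full)
                        return 0;       /* reject further input outright */

                va_start(args, fmt);
                ret = vsnprintf(s->buffer + s->len, SEQ_SIZE - s->len, fmt, args);
                va_end(args);

                if (ret < 0 || (unsigned int)ret >= SEQ_SIZE - s->len) {
                        s->full = 1;    /* would have overflowed: drop it, mark full */
                        return 0;
                }
                s->len += ret;
                return ret;
        }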
      
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d184b31c
    • tracing: Buffer the output of seq_file in case of filled buffer · a63ce5b3
      Committed by Steven Rostedt
      If the seq_read fills the buffer it will call s_start again on the next
      iteration with the same position. This causes a problem with the
      function_graph tracer because it consumes the iteration in order to
      determine leaf functions.
      
      What happens is that the iterator stores the entry, and the function
      graph plugin will look at the next entry. If that next entry is a return
      of the same function and task, then the function is a leaf and the
      function_graph plugin calls ring_buffer_read which moves the ring buffer
      iterator forward (the trace iterator still points to the function start
      entry).
      
      The copying of the trace_seq to the seq_file buffer will fail if the
      seq_file buffer is full. The seq_read will not show this entry.
      The next read by userspace will cause seq_read to again call s_start
      which will reuse the trace iterator entry (the function start entry).
      But the function return entry was already consumed. The function graph
      plugin will think that this entry is a nested function and not a leaf.
      
      To solve this, the trace code now checks the return status of the
      seq_printf (trace_print_seq). If the writing to the seq_file buffer
      fails, we set a flag in the iterator (leftover) and we do not reset
      the trace_seq buffer. On the next call to s_start, we check the leftover
      flag, and if it is set, we just reuse the trace_seq buffer and do not
      call into the plugin print functions.
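      A schematic of the leftover handling (names simplified; the stand-ins
      below play the roles of trace_print_seq() and the plugin print path):

        struct trace_iter {
                void *seq;              /* buffered plugin output */
                int leftover;           /* previous copy into the seq_file failed */
        };

        extern int copy_to_seq_file(void *m, void *seq);   /* 0 on success */
        extern void format_next_entry(struct trace_iter *iter);
        extern void reset_seq(void *seq);

        static void show_entry(void *m, struct trace_iter *iter)
        {
                if (!iter->leftover)
                        format_next_entry(iter);  /* only consume a new entry then */

                iter->leftover = copy_to_seq_file(m, iter->seq) != 0;
                if (!iter->leftover)
                        reset_seq(iter->seq);     /* consumed: safe to reuse */
        }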
      
      Before this patch:
      
       2)               |      fput() {
       2)               |        __fput() {
       2)   0.550 us    |          inotify_inode_queue_event();
       2)               |          __fsnotify_parent() {
       2)   0.540 us    |          inotify_dentry_parent_queue_event();
      
      After the patch:
      
       2)               |      fput() {
       2)               |        __fput() {
       2)   0.550 us    |          inotify_inode_queue_event();
       2)   0.548 us    |          __fsnotify_parent();
       2)   0.540 us    |          inotify_dentry_parent_queue_event();
      
      [
        Updated the patch to fix a missing return 0 from the trace_print_seq()
        stub when CONFIG_TRACING is disabled.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      ]
      Reported-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      a63ce5b3
    • tracing: Only call pipe_close if pipe_close is defined · 29bf4a5e
      Committed by Steven Rostedt
      This fixes a cut-and-paste error that caused pipe_close to be called
      if pipe_open was defined (instead of checking pipe_close).
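      Roughly, the change amounts to testing the callback that is actually
      invoked (shown schematically against the tracing iterator):

        /* before: the presence of pipe_open gated the call */
        if (iter->trace->pipe_open)
                iter->trace->pipe_close(iter);

        /* after: check pipe_close itself */
        if (iter->trace->pipe_close)
                iter->trace->pipe_close(iter);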
      Reported-by: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      LKML-Reference: <20091209153204.F4CD.A69D9226@jp.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      29bf4a5e
  4. 09 December 2009 (6 commits)
    • tracing/kprobes: Fix field creation's bad error handling · 822a6961
      Committed by Frederic Weisbecker
      When we define the common event fields for kprobes, the error
      handling is inverted: we return immediately in case of success,
      omitting the definition of the kprobe-specific fields (ip and nargs)
      and the kretprobe-specific fields (func, ret_ip, nargs), and we only
      define them when creating the common fields fails.
      
      The most visible consequence is that we can't create filters for
      k(ret)probe-specific fields.
      
      This patch re-inverts the success/error handling to fix it.
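      Schematically, with simplified stand-ins for the field-definition
      helpers, the corrected flow bails out on failure and then always goes
      on to define the probe-specific fields:

        struct event_call;
        extern int define_common_fields(struct event_call *call);
        extern int define_field(struct event_call *call, const char *name);

        static int kprobe_define_fields_sketch(struct event_call *call)
        {
                int ret = define_common_fields(call);

                if (ret)
                        return ret;     /* fail fast on error, not on success */

                /* the kprobe-specific fields are now always defined */
                ret = define_field(call, "ip");
                if (ret)
                        return ret;
                return define_field(call, "nargs");
        }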
      Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <1260263815-5167-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      822a6961
    • perf_event: Cleanup for cpu_clock_perf_event_update() · ec89a06f
      Committed by Xiao Guangrong
      Using atomic64_xchg() instead of atomic64_read() and
      atomic64_set().
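      Schematically (the field names follow the perf code, surrounding
      context omitted), the read/set pair becomes a single exchange:

        /* before: two operations with a window between them */
        prev = atomic64_read(&event->hw.prev_count);
        atomic64_set(&event->hw.prev_count, now);

        /* after: one atomic exchange returns the old value */
        prev = atomic64_xchg(&event->hw.prev_count, now);

        atomic64_add(now - prev, &event->count);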
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <4B1F19DC.90204@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ec89a06f
    • perf_event: Allocate children's perf_event_ctxp at the right time · b93f7978
      Committed by Xiao Guangrong
      In the current code, a child task allocates memory for
      'child->perf_event_ctxp' whenever the parent is counted; we should do
      it only if the parent allows its children to inherit it.
      
      This saves memory and reduces overhead.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <4B1F19A8.5040805@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b93f7978
    • perf_event: Clean up __perf_event_init_context() · aa5452d7
      Committed by Xiao Guangrong
      Clean up the code a bit:
      
       - define 'perf_cpu_context' variable with 'static'
       - use kzalloc() instead of kmalloc() and memset()
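      The kzalloc() change follows the standard pattern (context around the
      allocation omitted):

        /* before: allocate, then zero in a second step */
        ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
        if (ctx)
                memset(ctx, 0, sizeof(*ctx));

        /* after: allocation and zeroing in one call */
        ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);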
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <4B1F194D.7080306@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      aa5452d7
    • hw-breakpoints: Modify breakpoints without unregistering them · 44234adc
      Committed by Frederic Weisbecker
      Currently, when ptrace needs to modify a breakpoint, like disabling
      it, changing its address, type or len, it calls
      modify_user_hw_breakpoint().  The latter performs the heavy and
      racy task of unregistering the old breakpoint and registering a new
      one.
      
      This is racy as someone else might steal the reserved breakpoint
      slot under us, which is undesired as the breakpoint is only
      supposed to be modified, sometimes in the middle of a debugging
      workflow. We don't want our slot to be stolen in the middle.
      
      So instead of unregistering/registering the breakpoint, just
      disable it while we modify its breakpoint fields and re-enable it
      after if necessary.
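      A sketch of the disable-modify-reenable flow (types and helpers are
      simplified stand-ins for the perf/hw_breakpoint interfaces):

        struct bp_attr { unsigned long addr; int type; int len; };
        struct bp { struct bp_attr attr; int enabled; };

        extern void disable_bp(struct bp *bp);          /* slot stays reserved */
        extern int  enable_bp(struct bp *bp);

        static int modify_bp(struct bp *bp, const struct bp_attr *new_attr)
        {
                int was_enabled = bp->enabled;

                disable_bp(bp);         /* never unregister: keep the slot */
                bp->attr = *new_attr;   /* change addr/type/len in place */
                return was_enabled ? enable_bp(bp) : 0;
        }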
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      LKML-Reference: <1260347148-5519-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      44234adc
    • trace-kprobe: Support delete probe syntax · a7c312be
      Committed by Masami Hiramatsu
      Support delete probe syntax. The syntax is "-:[group/]event".
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      LKML-Reference: <20091208220316.10142.39192.stgit@dhcp-100-2-132.bos.redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a7c312be
  5. 08 December 2009 (2 commits)
    • perf: hw_breakpoints: Fix percpu namespace clash · 6ab88863
      Committed by Stephen Rothwell
      Today's linux-next build failed with:
      
        kernel/hw_breakpoint.c:86: error: 'task_bp_pinned' redeclared as different kind of symbol
        ...
      
      Caused by commit dd17c8f7 ("percpu:
      remove per_cpu__ prefix") from the percpu tree interacting with
      commit 56053170 ("hw-breakpoints:
      Fix task-bound breakpoint slot allocation") from the tip tree.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20091208182515.bb6dda4a.sfr@canb.auug.org.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6ab88863
    • tracing: Add pipe_close interface · c521efd1
      Committed by Steven Rostedt
      An ftrace plugin can add a pipe_open interface that runs when the user
      opens trace_pipe.  But if the plugin allocates something within
      pipe_open, it cannot free it because there is no pipe_close.  The hook
      for opening the trace file has a corresponding close; the trace_pipe
      open hook should have one as well.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      c521efd1
  6. 07 December 2009 (1 commit)
    • hw-breakpoints: Fix task-bound breakpoint slot allocation · 56053170
      Committed by Frederic Weisbecker
      Whatever the context nature of a breakpoint, we always perform the
      following constraint checks before allocating it a slot:
      
      - Check the number of pinned breakpoints bound to the concerned CPUs
      - Check the max number of task-bound breakpoints belonging to any
        single task.
      - Add both and see if a slot remains for the new breakpoint
      
      This is the right thing to do when we are about to register a cpu-only
      bound breakpoint. But not if we are dealing with a task bound
      breakpoint. What we want in this case is:
      
      - Check the number of pinned breakpoints bound to the concerned CPUs
      - Check the number of breakpoints that already belong to the task
        to which the breakpoint being registered is bound.
      - Add both (see the sketch below)
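      A schematic of the corrected accounting for a task-bound breakpoint
      (the helpers are purely illustrative; HBP_NUM stands for the number
      of slots per CPU, e.g. 4 debug registers on x86):

        #define HBP_NUM 4

        struct task_struct;
        extern unsigned int pinned_cpu_bps(int cpu);            /* cpu-bound, pinned */
        extern unsigned int task_bps(struct task_struct *tsk);  /* already on this task */

        static int can_reserve_task_slot(int cpu, struct task_struct *tsk)
        {
                /* this task's breakpoints, not the per-cpu maximum over all tasks */
                return pinned_cpu_bps(cpu) + task_bps(tsk) + 1 <= HBP_NUM;
        }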
      
      This fixes a regression that makes the "firefox -g" command fail to
      register breakpoints once we deal with a secondary thread.
      Reported-by: Walt <w41ter@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      56053170
  7. 06 December 2009 (6 commits)
  8. 04 December 2009 (5 commits)
  9. 03 December 2009 (8 commits)
    • mutex: Fix missing conditions to build mutex_spin_on_owner() · c08f7829
      Committed by Frederic Weisbecker
      We don't need to build mutex_spin_on_owner() if we have
      CONFIG_DEBUG_MUTEXES or CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES as
      it won't be used under such configs.
      
      Use CONFIG_MUTEX_SPIN_ON_OWNER as it gathers all the necessary
      checks before building it.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1259783357-8542-2-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      c08f7829
    • mutex: Better control mutex adaptive spinning config · c0226027
      Committed by Frederic Weisbecker
      Introduce CONFIG_MUTEX_SPIN_ON_OWNER so that we can centralize
      in a single place the conditions that determine its definition
      and use.
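      On the C side, the effect can be pictured as a single guard around the
      spinning code instead of repeating the individual conditions (the
      prototype below is illustrative):

        #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
        /* mutex_spin_on_owner() and its callers are only built here */
        int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
        #endif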
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1259783357-8542-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      c0226027
    • rcu: Add expedited grace-period support for preemptible RCU · d9a3da06
      Committed by Paul E. McKenney
      Implement a synchronize_rcu_expedited() for preemptible RCU
      that actually is expedited.  This uses
      synchronize_sched_expedited() to force all threads currently
      running in a preemptible-RCU read-side critical section onto the
      appropriate ->blocked_tasks[] list, then takes a snapshot of all
      of these lists and waits for them to drain.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1259784616158-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d9a3da06
    • rcu: Enable fourth level of TREE_RCU hierarchy · cf244dc0
      Committed by Paul E. McKenney
      Enable a fourth level of rcu_node hierarchy for TREE_RCU and
      TREE_PREEMPT_RCU.  This is for stress-testing and experimental
      purposes only, although in theory this would enable 16,777,216
      CPUs on 64-bit systems, though only 1,048,576 CPUs on 32-bit
      systems. Normal experimental use of this fourth level will
      normally set CONFIG_RCU_FANOUT=2, requiring a 16-CPU system,
      though the more adventurous (and more fortunate) experimenters
      may wish to choose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even
      CONFIG_RCU_FANOUT=4 for 256-CPU systems.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846161257-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cf244dc0
    • rcu: Rename "quiet" functions · d3f6bad3
      Committed by Paul E. McKenney
      The number of "quiet" functions has grown recently, and the
      names are no longer very descriptive.  The point of all of these
      functions is to do some portion of the task of reporting a
      quiescent state, so rename them accordingly:
      
      o	cpu_quiet() becomes rcu_report_qs_rdp(), which reports a
      	quiescent state to the per-CPU rcu_data structure.  If this
      	turns out to be a new quiescent state for this grace period,
      	then rcu_report_qs_rnp() will be invoked to propagate the
      	quiescent state up the rcu_node hierarchy.
      
      o	cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports
      	a quiescent state for a given CPU (or possibly a set of CPUs)
      	up the rcu_node hierarchy.
      
      o	cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which
      	reports a full set of quiescent states to the global rcu_state
      	structure.
      
      o	task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports
      	a quiescent state due to a task exiting an RCU read-side critical
      	section that had previously blocked in that same critical section.
      	As indicated by the new name, this type of quiescent state is
      	reported up the rcu_node hierarchy (using rcu_report_qs_rnp()
      	to do so).
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846163698-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d3f6bad3
    • modules: don't export section names of empty sections via sysfs · 35dead42
      Committed by Helge Deller
      On the parisc architecture, each and every loaded kernel module triggers
      this kernel "badness warning":
        sysfs: cannot create duplicate filename '/module/ac97_bus/sections/.text'
        Badness at fs/sysfs/dir.c:487
      
      The reason is that on parisc all kernel modules have multiple .text
      sections due to the -ffunction-sections compiler flag, which is needed
      to reach all jump targets on this platform.
      
      An objdump on such a kernel module gives:
      Sections:
      Idx Name          Size      VMA       LMA       File off  Algn
        0 .note.gnu.build-id 00000024  00000000  00000000  00000034  2**2
                        CONTENTS, ALLOC, LOAD, READONLY, DATA
        1 .text         00000000  00000000  00000000  00000058  2**0
                        CONTENTS, ALLOC, LOAD, READONLY, CODE
        2 .text.ac97_bus_match 0000001c  00000000  00000000  00000058  2**2
                        CONTENTS, ALLOC, LOAD, READONLY, CODE
        3 .text         00000000  00000000  00000000  000000d4  2**0
                        CONTENTS, ALLOC, LOAD, READONLY, CODE
      ...
      Since the .text sections are empty (size of 0 bytes) and won't be
      loaded by the kernel module loader anyway, I don't see a reason
      why such sections need to be listed under
      /sys/module/<module_name>/sections/<section_name> either.
      
      This patch solves the issue by not exporting the names of empty
      sections.
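      A sketch of the resulting walk over the module's section headers,
      against a simplified view of the sysfs attribute creation
      (add_sect_attr() is a stand-in):

        #include <elf.h>

        extern void add_sect_attr(const char *name, unsigned long addr);

        static void export_sections(Elf64_Shdr *sechdrs, unsigned int nsections,
                                    const char *secstrings)
        {
                unsigned int i;

                for (i = 0; i < nsections; i++) {
                        if (sechdrs[i].sh_size == 0)
                                continue;  /* skip empty sections, e.g. the
                                              duplicate .text stubs on parisc */
                        add_sect_attr(secstrings + sechdrs[i].sh_name,
                                      sechdrs[i].sh_addr);
                }
        }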
      
      This fixes bugzilla http://bugzilla.kernel.org/show_bug.cgi?id=14703
      Signed-off-by: Helge Deller <deller@gmx.de>
      CC: rusty@rustcorp.com.au
      CC: akpm@linux-foundation.org
      CC: James.Bottomley@HansenPartnership.com
      CC: roland@redhat.com
      CC: dave@hiauly1.hia.nrc.ca
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      35dead42
    • sched, cputime: Introduce thread_group_times() · 0cf55e1e
      Committed by Hidetoshi Seto
      This is a real fix for the problem of utime/stime values decreasing
      described in the thread:
      
         http://lkml.org/lkml/2009/11/3/522
      
      Now cputime is accounted in the following way:
      
       - {u,s}time in task_struct are increased every time the thread
         is interrupted by a tick (timer interrupt).
      
       - When a thread exits, its {u,s}time are added to signal->{u,s}time,
         after being adjusted by task_times().
      
       - When all threads in a thread_group exit, the accumulated {u,s}time
         (and also c{u,s}time) in signal struct are added to c{u,s}time
         in signal struct of the group's parent.
      
      So {u,s}time in task struct are "raw" tick count, while
      {u,s}time and c{u,s}time in signal struct are "adjusted" values.
      
      And accounted values are used by:
      
       - task_times(), to get cputime of a thread:
         This function returns adjusted values that originate from the raw
         {u,s}time, scaled by the sum_exec_runtime accounted by CFS.
      
       - thread_group_cputime(), to get cputime of a thread group:
         This function returns sum of all {u,s}time of living threads in
         the group, plus the {u,s}time in the signal struct, which is the
         sum of the adjusted cputimes of all exited threads that belonged
         to the group.
      
      The problem is the return value of thread_group_cputime(),
      because it is a mixed sum of "raw" values and "adjusted" values:
      
        group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)
      
      This misbehavior can break {u,s}time monotonicity.
      Assume there is a thread whose raw values are greater than its
      adjusted values (e.g. interrupted by 1000Hz ticks 50 times but
      running for only 45ms); if it exits, cputime will decrease (e.g.
      by 5ms).
      
      To fix this, we could do:
      
        group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)
      
      But task_times() contains hard divisions, so applying it for
      every thread should be avoided.
      
      This patch fixes the above problem in the following way:
      
       - Modify thread's exit (= __exit_signal()) not to use task_times().
         This means {u,s}time in the signal struct accumulate raw values
         instead of adjusted values.  As a result, thread_group_cputime()
         returns a pure sum of "raw" values.
      
       - Introduce a new function thread_group_times(*task, *utime, *stime)
         that converts "raw" values of thread_group_cputime() to "adjusted"
         values, following the same calculation procedure as task_times().
      
       - Modify group's exit (= wait_task_zombie()) to use this introduced
         thread_group_times().  It makes c{u,s}time in the signal struct
         hold adjusted values, as before this patch.
      
       - Replace some thread_group_cputime() by thread_group_times().
         These replacements are applied only where the "adjusted" cputime
         is conveyed to users and where task_times() is already used nearby
         (i.e. sys_times(), getrusage(), and /proc/<PID>/stat); see the
         sketch below.
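      A simplified sketch of the introduced conversion: the scaling mirrors
      task_times(), splitting the group's CFS runtime in proportion to the
      raw utime/stime tick counts (types and helpers are simplified; the
      real patch additionally keeps prev_utime/prev_stime in signal_struct
      to guarantee monotonicity):

        typedef unsigned long long cputime_t;

        struct group_raw {
                cputime_t utime;              /* sum of raw tick-based utime */
                cputime_t stime;              /* sum of raw tick-based stime */
                cputime_t sum_exec_runtime;   /* CFS-accounted runtime */
        };

        /* one conversion per caller instead of one per thread, avoiding
         * the per-thread divisions mentioned above */
        static void group_times(const struct group_raw *g,
                                cputime_t *ut, cputime_t *st)
        {
                cputime_t total = g->utime + g->stime;
                cputime_t rtime = g->sum_exec_runtime;

                if (total)
                        *ut = (rtime * g->utime) / total;  /* proportional split */
                else
                        *ut = rtime;
                *st = rtime - *ut;
        }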
      
      This patch has a positive side effect:
      
       - Before this patch, if a group contained many short-lived threads
         (e.g. each runs 0.9ms and is never interrupted by a tick), the
         group's cputime could be invisible, since each thread's cputime
         was accumulated after being adjusted: imagine the adjustment
         function as adj(ticks, runtime), then
           {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
         After this patch that no longer happens, because the adjustment is
         applied after accumulation.
      
      v2:
       - remove if()s, put new variables into signal_struct.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0cf55e1e
    • sched, cputime: Cleanups related to task_times() · d99ca3b9
      Committed by Hidetoshi Seto
      - Remove the if({u,s}t) checks because no one calls it with NULL now.
      - Use cputime_{add,sub}().
      - Add ifndef-endif for prev_{u,s}time since they are used
        only when !VIRT_CPU_ACCOUNTING.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4B1624C7.7040302@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d99ca3b9