1. 01 Jul 2011: 9 commits
  2. 28 Jun 2011: 1 commit
    • taskstats: don't allow duplicate entries in listener mode · 26c4caea
      Authored by Vasiliy Kulikov
      Currently a single process may register exit handlers unlimited times.
      It may lead to a bloated listeners chain and very slow process
      terminations.
      
      E.g. after ten million ("10KK") TASKSTATS_CMD_ATTR_REGISTER_CPUMASK
      commands have been sent, ~300 MB of kernel memory is stolen for the
      handlers chain and "time id" shows 2-7 seconds instead of the normal
      0.003.  This makes it possible to exhaust all kernel memory and to eat
      much CPU time by triggering numerous exits on a single CPU.
      
      The patch limits the number of times a single process may register
      itself on a single CPU to one.
      
      One little issue is left unfixed: as taskstats_exit() is called before
      exit_files() in do_exit(), an orphaned listener entry (if it was not
      explicitly deregistered) is kept until some other process exits and
      triggers the implicit deregistration in send_cpu_listeners().  So, if a
      process that registered itself as a listener exits and the next spawned
      process gets the same pid, it would inherit taskstats attributes.
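      
      A minimal sketch of the duplicate check, assuming the simplified
      per-CPU listener bookkeeping described above (struct and variable
      names are illustrative, not necessarily the exact kernel ones):
      
          /* Sketch: register a pid as an exit listener on one CPU at
           * most once; skip the add if an entry already exists. */
          struct listener *s;
          bool exists = false;
      
          down_write(&listeners->sem);
          list_for_each_entry(s, &listeners->list, list) {
                  if (s->pid == pid) {    /* already registered here */
                          exists = true;
                          break;
                  }
          }
          if (!exists)
                  list_add(&new_listener->list, &listeners->list);
          up_write(&listeners->sem);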
      Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26c4caea
  3. 22 Jun 2011: 3 commits
    • alarmtimers: Return -ENOTSUPP if no RTC device is present · 1c6b39ad
      Authored by John Stultz
      Toralf Förster and Richard Weinberger noted that if there is
      no RTC device, the alarm timers core prints out an annoying
      "ALARM timers will not wake from suspend" message.
      
      This warning was removed in a previous patch; however, the issue
      still remains: the original idea was to support alarm timers even
      if there was no RTC device, as long as the system didn't go into
      suspend.
      
      However, after further consideration, communicating to the application
      that alarmtimers are not fully functional seems like the better
      solution.
      
      So this patch changes the code to return -ENOTSUPP for any posix
      _ALARM clockid calls if there is no backing RTC device on the system.
      
      Further, when there is no RTC device, we now check for one in
      clock_getres, clock_gettime, timer_create, and timer_nsleep
      instead of at suspend time.
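      
      A hedged sketch of what such a check looks like in one of those
      entry points (alarmtimer_get_rtcdev() as the lookup helper, with
      the rest of the function condensed):
      
          /* Sketch: fail posix _ALARM clock calls early when no RTC
           * device is available to fire the alarm. */
          static int alarm_clock_getres(const clockid_t which_clock,
                                        struct timespec *tp)
          {
                  if (!alarmtimer_get_rtcdev())
                          return -ENOTSUPP; /* no RTC, alarms unsupported */
      
                  /* ... report the underlying hrtimer resolution ... */
                  return 0;
          }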
      
      CC: Toralf Förster <toralf.foerster@gmx.de>
      CC: Richard Weinberger <richard@nod.at>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: Toralf Förster <toralf.foerster@gmx.de>
      Reported-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      1c6b39ad
    • alarmtimers: Handle late rtc module loading · c008ba58
      Authored by John Stultz
      The alarmtimers code currently picks an RTC device to use at
      late init time. However, if the RTC driver is loaded as a module,
      it may be registered after the alarmtimers late init code runs,
      leaving the alarmtimers nonfunctional.
      
      This patch moves the RTC device selection to the point where we
      actually try to use the device, allowing us to make use of RTC
      modules that may have been loaded at any point since bootup.
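      
      A hedged sketch of the lazy lookup; the real code needs locking
      and a proper device search, and "rtc0" is an assumed device name:
      
          /* Sketch: resolve the RTC device on first use instead of at
           * late_initcall time, so later-loaded modules are found. */
          static struct rtc_device *rtcdev;
      
          static struct rtc_device *alarmtimer_get_rtcdev(void)
          {
                  if (!rtcdev)
                          rtcdev = rtc_class_open("rtc0");
                  return rtcdev;
          }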
      
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Meelis Roos <mroos@ut.ee>
      Reported-by: Meelis Roos <mroos@ut.ee>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      c008ba58
    • PM: Free memory bitmaps if opening /dev/snapshot fails · 8440f4b1
      Authored by Michal Kubecek
      When opening /dev/snapshot device, snapshot_open() creates memory
      bitmaps which are freed in snapshot_release(). But if any of the
      callbacks called by pm_notifier_call_chain() returns NOTIFY_BAD, open()
      fails, snapshot_release() is never called and bitmaps are not freed.
      The next attempt to open /dev/snapshot then triggers the BUG_ON()
      check in create_basic_memory_bitmaps(). This happens e.g. when the
      vmwatchdog module is active on s390x.
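      
      A hedged sketch of the missing cleanup in snapshot_open(), with
      the error handling condensed:
      
          /* Sketch: if a PM notifier vetoes the open, free the bitmaps
           * that were just created instead of leaking them. */
          error = create_basic_memory_bitmaps();
          if (error)
                  goto Unlock;
      
          error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
          if (error)
                  free_basic_memory_bitmaps();    /* previously leaked */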
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: stable@kernel.org
      8440f4b1
  4. 18 Jun 2011: 1 commit
    • KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring · 87966996
      Authored by David Howells
      ____call_usermodehelper() now erases any credentials set by the
      subprocess_info::init() function.  The problem is that commit
      17f60a7d ("capabilites: allow the application of capability limits
      to usermode helpers") creates and commits new credentials with
      prepare_kernel_cred() after the call to the init() function.  This wipes
      all keyrings after umh_keys_init() is called.
      
      The best way to deal with this is to put the init() call just prior to
      the commit_creds() call, and pass the cred pointer to init().  That
      means that umh_keys_init() and suchlike can modify the credentials
      _before_ they are published and potentially in use by the rest of the
      system.
      
      The credential wiping prevents request_key() from working: it is
      prevented from passing the session keyring it set up with the
      authorisation token to /sbin/request-key, and so the latter can't
      assume the authority to instantiate the key.  This causes the
      in-kernel DNS resolver to fail with ENOKEY unconditionally.
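      
      A hedged sketch of the reordering described above, simplified from
      the surrounding helper code:
      
          /* Sketch: let init() adjust the new creds before they are
           * committed, so keyrings it installs are not wiped. */
          new = prepare_kernel_cred(current);
          if (!new)
                  goto fail;
      
          if (sub_info->init) {
                  retval = sub_info->init(sub_info, new); /* sees 'new' */
                  if (retval) {
                          abort_creds(new);
                          goto fail;
                  }
          }
          commit_creds(new);      /* publish creds only after init() ran */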
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      Tested-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      87966996
  5. 17 Jun 2011: 3 commits
  6. 16 Jun 2011: 3 commits
  7. 15 Jun 2011: 20 commits
    • sched: Check if lowest_mask is initialized in find_lowest_rq() · 0da938c4
      Authored by Steven Rostedt
      On system boot up, the lowest_mask is initialized with an
      early_initcall(). But RT tasks may be woken by other
      early_initcall() callers before the lowest_mask is initialized,
      causing a system crash.
      
      Commit "d72bce0e rcu: Cure load woes" was the first commit
      to wake up RT tasks in early init. Before this commit this bug
      should not happen.
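      
      A hedged sketch of the guard added to find_lowest_rq(), with the
      normal search logic elided:
      
          static int find_lowest_rq(struct task_struct *task)
          {
                  struct cpumask *lowest_mask = __get_cpu_var(local_cpu_mask);
      
                  /* The mask is allocated by an early_initcall(); an RT
                   * wakeup can get here before that has run. */
                  if (unlikely(!lowest_mask))
                          return -1;
      
                  /* ... normal lowest-priority CPU search ... */
          }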
      Reported-by: Andrew Theurer <habanero@linux.vnet.ibm.com>
      Tested-by: Andrew Theurer <habanero@linux.vnet.ibm.com>
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20110614223657.824872966@goodmis.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0da938c4
    • sched: Fix need_resched() when checking preempt · 8dd0de8b
      Authored by Hillf Danton
      The RT preempt check tests the wrong task if NEED_RESCHED is
      set. It currently checks the local CPU task. It is supposed to
      check the task that is running on the runqueue we are about to
      wake another task on.
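      
      An illustrative before/after of the check, with the context
      simplified from the actual scheduler code:
      
          /* before: need_resched() tests the *local* CPU's current task */
          if (p->prio == rq->curr->prio && !need_resched())
                  check_preempt_equal_prio(rq, p);
      
          /* after: test the task running on the target runqueue */
          if (p->prio == rq->curr->prio && !test_tsk_need_resched(rq->curr))
                  check_preempt_equal_prio(rq, p);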
      Signed-off-by: Hillf Danton <dhillf@gmail.com>
      Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20110614223657.450239027@goodmis.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8dd0de8b
    • tracing/kprobes: Fix kprobe-tracer to support stack trace · 1fd8df2c
      Authored by Masami Hiramatsu
      Fix the kprobe-tracer to support kernel stack traces correctly.
      Since the execution path of kprobe-based dynamic events is different
      from other tracepoint-based events, the normal ftrace_trace_stack()
      doesn't work correctly. To fix that, this introduces
      ftrace_trace_stack_regs(), which traces the stack via pt_regs instead
      of the current stack register (see the sketch after the example below).
      
      e.g.
      
       # echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events
       # echo 1 > /sys/kernel/debug/tracing/options/stacktrace
       # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
       # head -n 20 /sys/kernel/debug/tracing/trace
                  bash-2968  [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca)
                  bash-2968  [000] 10297.050247: <stack trace>
       => schedule_timeout
       => n_tty_read
       => tty_read
       => vfs_read
       => sys_read
       => system_call_fastpath
           kworker/0:1-2940  [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca)
           kworker/0:1-2940  [000] 10297.050266: <stack trace>
       => worker_thread
       => kthread
       => kernel_thread_helper
                  sshd-1132  [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca)
                  sshd-1132  [000] 10297.050365: <stack trace>
       => sysret_careful
      
      Note: Even with this fix, the first entry will be skipped
      if the probe is put on the function entry area before
      the frame pointer is set up (usually 4 bytes:
      push %bp; mov %sp,%bp on x86), because the stack unwinder
      depends on the frame pointer.
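      
      A hedged sketch of the new helper's shape; the signature follows
      the description above and the body is condensed:
      
          /* Sketch: like ftrace_trace_stack(), but unwinds from the
           * pt_regs captured at the probe hit instead of the current
           * stack pointer. */
          void ftrace_trace_stack_regs(struct ring_buffer *buffer,
                                       unsigned long flags,
                                       int skip, int pc,
                                       struct pt_regs *regs)
          {
                  /* ... reserve a stack event, then fill it via
                   * save_stack_trace_regs(regs, &trace) ... */
          }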
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      1fd8df2c
    • stack_trace: Add weak save_stack_trace_regs() · c624d33f
      Authored by Masami Hiramatsu
      Add a weak symbol for save_stack_trace_regs(), matching
      save_stack_trace_tsk(), since it is not yet implemented on any
      architecture except x86.
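      
      The stub mirrors the existing save_stack_trace_tsk() weak symbol;
      a hedged sketch:
      
          /* Architectures without a register-based unwinder get this
           * stub; x86 overrides it with a real implementation. */
          void __weak
          save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
          {
                  WARN_ONCE(1, "save_stack_trace_regs() not implemented yet.\n");
          }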
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Link: http://lkml.kernel.org/r/20110608070927.17777.37895.stgit@fedora15
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      c624d33f
    • ring-buffer: Set __GFP_NORETRY flag for ring buffer allocating process · d7ec4bfe
      Authored by Vaibhav Nagarnaik
      The tracing ring buffer is allocated from kernel memory. While
      allocating a large chunk of memory, OOM might happen which destabilizes
      the system. Thus random processes might get killed during the
      allocation.
      
      This patch adds the __GFP_NORETRY flag to the ring buffer allocation
      calls to make them fail more gracefully if the system is unable to
      complete the allocation request.
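      
      A hedged sketch of the flag change at one of the allocation sites,
      with the surrounding code simplified:
      
          /* Sketch: __GFP_NORETRY makes the allocator give up instead
           * of retrying hard and triggering the OOM killer. */
          struct page *page;
      
          page = alloc_pages_node(cpu_to_node(cpu),
                                  GFP_KERNEL | __GFP_NORETRY, 0);
          if (!page)
                  goto free_pages;        /* fail the request gracefully */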
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Link: http://lkml.kernel.org/r/1307491302-9236-1-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d7ec4bfe
    • tracing: Convert to kstrtoul_from_user · 22fe9b54
      Authored by Peter Huewe
      This patch replaces the code for getting an unsigned long from a
      userspace buffer with a simple call to kstrtoul_from_user.
      This makes it easier to read and less error prone.
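      
      The resulting pattern in the write handlers looks roughly like this
      (a sketch, with the surrounding handler code elided):
      
          /* Sketch: kstrtoul_from_user() replaces the old
           * copy_from_user() + NUL-terminate + strict_strtoul() dance. */
          unsigned long val;
          int ret;
      
          ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
          if (ret)
                  return ret;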
      Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
      Link: http://lkml.kernel.org/r/1307476707-14762-1-git-send-email-peterhuewe@gmx.de
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      22fe9b54
    • tracing, function_graph: Add context-info support for function_graph tracer · 749230b0
      Authored by Jiri Olsa
      The function_graph tracer does not follow the global context-info
      option. Add a TRACE_ITER_CONTEXT_INFO trace_flags check to honor it
      (a sketch follows the example below).
      
      With following commands:
      	# echo function_graph > ./current_tracer
      	# echo 0 > options/context-info
      	# cat trace
      
      This is what it looked like before:
      # tracer: function_graph
      #
      #     TIME        CPU  DURATION                  FUNCTION CALLS
      #      |          |     |   |                     |   |   |   |
       1)   0.079 us    |          } /* __vma_link_rb */
       1)   0.056 us    |          copy_page_range();
       1)               |          security_vm_enough_memory() {
      ...
      
      This is what it looks like now:
      # tracer: function_graph
      #
        } /* update_ts_time_stats */
        timekeeping_max_deferment();
      ...
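      
      A hedged sketch of the kind of check added to the header path:
      
          static void print_graph_headers(struct seq_file *s)
          {
                  /* Sketch: honor the global context-info option. */
                  if (!(trace_flags & TRACE_ITER_CONTEXT_INFO))
                          return; /* no TIME/CPU/DURATION header */
      
                  /* ... print the usual header lines ... */
          }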
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-6-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      749230b0
    • tracing, function_graph: Remove lock-depth from latency trace · 199abfab
      Authored by Jiri Olsa
      The lock_depth field was removed in commit
      e6e1e259 ("tracing: Remove lock_depth from event entry").
      
      Remove the lock_depth info from the function_graph latency header
      as well.
      
      With following commands:
      	# echo function_graph > ./current_tracer
      	# echo 1 > options/latency-format
      	# cat trace
      
      This is what it looked like before:
      # tracer: function_graph
      #
      # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+
      # --------------------------------------------------------------------
      # latency: 0 us, #59756/311298, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
      #    -----------------
      #    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
      #    -----------------
      #
      #      _-----=> irqs-off
      #     / _----=> need-resched
      #    | / _---=> hardirq/softirq
      #    || / _--=> preempt-depth
      #    ||| / _-=> lock-depth
      #    |||| /
      # CPU|||||  DURATION                  FUNCTION CALLS
      # |  |||||   |   |                     |   |   |   |
       0)  ....  0.068 us    |    } /* __rcu_read_unlock */
      ...
      
      This is what it looks like now:
      # tracer: function_graph
      #
      # function_graph latency trace v1.1.5 on 3.0.0-rc1-tip+
      # --------------------------------------------------------------------
      # latency: 0 us, #59747/1744610, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
      #    -----------------
      #    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
      #    -----------------
      #
      #      _-----=> irqs-off
      #     / _----=> need-resched
      #    | / _---=> hardirq/softirq
      #    || / _--=> preempt-depth
      #    ||| /
      # CPU||||  DURATION                  FUNCTION CALLS
      # |  ||||   |   |                     |   |   |   |
       0)  ..s.  1.641 us    |  } /* __rcu_process_callbacks */
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-5-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      199abfab
    • tracing, function: Fix trace header to follow context-info option · f56e7f8e
      Authored by Jiri Olsa
      The header display of the function tracer does not follow
      the context-info option, so field names are displayed even
      if this option is off.
      
      Add a check for the TRACE_ITER_CONTEXT_INFO trace_flags.
      
      With following commands:
      	# echo function > ./current_tracer
      	# echo 0 > options/context-info
      	# cat trace
      
      This is what it looked like before:
      # tracer: function
      #
      #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
      #              | |       |          |         |
      add_preempt_count <-schedule
      rcu_note_context_switch <-schedule
      ...
      
      This is what it looks like now:
      # tracer: function
      #
      _raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-4-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f56e7f8e
    • tracing, function_graph: Merge overhead and duration display functions · ffeb80fc
      Authored by Jiri Olsa
      The functions print_graph_overhead() and print_graph_duration()
      display data for a single field, DURATION.
      
      I merged them into a single function, print_graph_duration(),
      and added a way to display the empty parts of the field.
      
      This way the print_graph_irq() function can use this column to display
      the IRQ signs if needed, and the DURATION field details stay inside
      the print_graph_duration() function.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-3-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      ffeb80fc
    • tracing, function_graph: Remove dependency of abstime and duration fields on latency · 321e68b0
      Authored by Jiri Olsa
      The display of absolute time and duration fields is based on the
      latency field. This was added during the irqsoff/wakeup tracers
      graph support changes.
      
      It's causing confusion about which fields will be displayed for the
      function_graph tracer itself. So I'm removing this dependency, and
      adding the absolute time and duration fields to the preemptirqsoff,
      preemptoff, irqsoff and wakeup tracers.
      
      With following commands:
      	# echo function_graph > ./current_tracer
      	# cat trace
      
      This is what it looked like before:
      # tracer: function_graph
      #
      #     TIME        CPU  DURATION                  FUNCTION CALLS
      #      |          |     |   |                     |   |   |   |
       0)   0.068 us    |          } /* page_add_file_rmap */
       0)               |          _raw_spin_unlock() {
      ...
      
      This is what it looks like now:
      # tracer: function_graph
      #
      # CPU  DURATION                  FUNCTION CALLS
      # |     |   |                     |   |   |   |
       0)   0.068 us    |                } /* add_preempt_count */
       0)   0.993 us    |              } /* vfsmount_lock_local_lock */
      ...
      
      For preemptirqsoff preemptoff irqsoff wakeup tracers,
      this is what it looked like before:
      SNIP
      #                       _-----=> irqs-off
      #                      / _----=> need-resched
      #                     | / _---=> hardirq/softirq
      #                     || / _--=> preempt-depth
      #                     ||| / _-=> lock-depth
      #                     |||| /
      # CPU  TASK/PID       |||||  DURATION                  FUNCTION CALLS
      # |     |    |        |||||   |   |                     |   |   |   |
       1)    <idle>-0    |  d..1  0.000 us    |  acpi_idle_enter_simple();
      ...
      
      This is what it looks like now:
      SNIP
      #
      #                                       _-----=> irqs-off
      #                                      / _----=> need-resched
      #                                     | / _---=> hardirq/softirq
      #                                     || / _--=> preempt-depth
      #                                     ||| /
      #     TIME        CPU  TASK/PID       ||||  DURATION                  FUNCTION CALLS
      #      |          |     |    |        ||||   |   |                     |   |   |   |
         19.847735 |   1)    <idle>-0    |  d..1  0.000 us    |  acpi_idle_enter_simple();
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-2-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      321e68b0
    • async: Fixed an include coding style issue · 84c15027
      Authored by Paul McQuade
      Added <linux/atomic.h> and <linux/ktime.h>, and removed <asm/atomic.h>.
      Added KERN_DEBUG to printk() calls.
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Paul McQuade <tungstentide@gmail.com>
      Link: http://lkml.kernel.org/r/4DE596B4.7030904@gmail.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      84c15027
    • ftrace: Fixed an include coding style issue · bd38c0e6
      Authored by Paul McQuade
      Removed <asm/ftrace.h> because <linux/ftrace.h> was already included.
      Fixed the brace coding style of struct definitions.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Paul McQuade <tungstentide@gmail.com>
      Link: http://lkml.kernel.org/r/4DE59711.3090900@gmail.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      bd38c0e6
    • tracing: Add disable_on_free option · cf30cf67
      Authored by Steven Rostedt
      Add a trace option to disable tracing on free. When this option is
      set, a write into the free_buffer file will not only shrink the
      ring buffer down to zero, but it will also disable tracing.
      
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      cf30cf67
    • tracing: Add a proc file to stop tracing and free buffer · 4f271a2a
      Authored by Vaibhav Nagarnaik
      The proc file entry buffer_size_kb is used to set the size of tracing
      buffer. The memory to expand the buffer size is kernel memory. Consider
      a use case where tracing is handled by a user space utility, which acts
      as a gate keeper for tracing requests. In an OOM condition, tracing is
      considered a low priority task and if the utility gets killed the ring
      buffer memory cannot be released back to the kernel.
      
      This patch adds a proc file called "free_buffer" whose purpose is to
      stop tracing and free up the ring buffer when it is closed.
      
      The user space process can then set the desired size in the
      buffer_size_kb file and keep an fd to the "free_buffer" file open.
      Under OOM conditions, if the process gets killed, the kernel closes
      the file descriptor, and the release handler stops the tracing and
      releases the kernel memory automatically.
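      
      A hedged sketch of the release handler; the names approximate the
      tracing code, and the flag check ties in the disable_on_free option
      from the entry above:
      
          /* Sketch: closing the fd (voluntarily or via OOM kill) stops
           * tracing if requested and frees the ring buffer by resizing
           * it to zero. */
          static int
          tracing_free_buffer_release(struct inode *inode, struct file *filp)
          {
                  if (trace_flags & TRACE_ITER_STOP_ON_FREE)
                          tracing_off();          /* disable_on_free */
                  tracing_resize_ring_buffer(0);  /* free the buffer pages */
                  return 0;
          }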
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/1308012717-11148-1-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      4f271a2a
    • signal.c: fix kernel-doc notation · ada9c933
      Authored by Randy Dunlap
      Fix kernel-doc warnings in signal.c:
      
        Warning(kernel/signal.c:2374): No description found for parameter 'nset'
        Warning(kernel/signal.c:2374): Excess function parameter 'set' description in 'sys_rt_sigprocmask'
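      
      The fix renames the documented parameter to match the code; a sketch
      of the corrected kernel-doc block (the @oset and @sigsetsize
      descriptions here are assumptions based on the usual signature):
      
          /**
           *  sys_rt_sigprocmask - change the list of currently blocked signals
           *  @how: whether to add, remove, or set signals
           *  @nset: stores pending signals
           *  @oset: previous value of signal mask if non-null
           *  @sigsetsize: size of sigset_t type
           */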
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ada9c933
    • tracing: Use NUMA allocation for per-cpu ring buffer pages · 7ea59064
      Authored by Vaibhav Nagarnaik
      The tracing ring buffer is a group of per-cpu ring buffers where
      allocation and logging is done on a per-cpu basis. The events that are
      generated on a particular CPU are logged in the corresponding buffer.
      This is to provide wait-free writes between CPUs and good NUMA node
      locality while accessing the ring buffer.
      
      However, the allocation routines consider NUMA locality only for buffer
      page metadata and not for the actual buffer page. This causes the pages
      to be allocated on the NUMA node local to the CPU where the allocation
      routine is running at the time.
      
      This patch fixes the problem by using a NUMA node specific allocation
      routine so that the pages are allocated from a NUMA node local to the
      logging CPU.
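      
      A hedged sketch of the allocation change for the buffer page itself
      (bpage is the buffer-page metadata; the surrounding loop is elided):
      
          /* before: the page comes from the allocating CPU's node */
          addr = __get_free_page(GFP_KERNEL);
      
          /* after: the page comes from the logging CPU's node */
          struct page *page;
      
          page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
          if (!page)
                  goto free_pages;
          bpage->page = page_address(page);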
      
      I tested with the getuid_microbench from autotest. It is a simple binary
      that calls getuid() in a loop and measures the average time for the
      syscall to complete. The following command was used to test:
      $ getuid_microbench 1000000
      
      Comparing the numbers on kernels with and without this patch shows
      that logging latency decreases by 30-50 ns/call:
      tracing with non-NUMA allocation - 569 ns/call
      tracing with NUMA allocation     - 512 ns/call
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      7ea59064
    • tracing: Schedule a delayed work to call wakeup() · e7e2ee89
      Authored by Vaibhav Nagarnaik
      When syscall tracing is used by concurrent processes, the wakeup()
      called in the event commit function causes contention on the spin lock
      of the waitqueue. I enabled the sys_enter_getuid and sys_exit_getuid
      tracepoints, and by running getuid_microbench from autotest in parallel
      I found that the contention causes an exponential latency increase in
      the tracing path.
      
      The autotest binary getuid_microbench calls getuid() in a tight loop for
      the given number of iterations and measures the average time required to
      complete a single invocation of syscall.
      
      The patch schedules a delayed work to run 2 ms after an event commit
      requests a wakeup of the trace wait_queue. This removes the delay
      caused by contention on the spin lock in wakeup() and amortizes the
      wakeup() calls over the 2 ms period.
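      
      A hedged sketch of the deferred wakeup, close to the mechanism
      described above (trace_wait is the reader waitqueue):
      
          /* Sketch: batch reader wakeups behind a 2 ms delayed work so
           * that each event commit no longer takes the waitqueue lock. */
          static void wakeup_work_handler(struct work_struct *work)
          {
                  wake_up(&trace_wait);
          }
          static DECLARE_DELAYED_WORK(wakeup_work, wakeup_work_handler);
      
          void trace_wake_up(void)
          {
                  const unsigned long delay = msecs_to_jiffies(2);
      
                  /* scheduling an already pending delayed work is a
                   * no-op, so concurrent commits amortize to at most
                   * one wakeup per 2 ms */
                  schedule_delayed_work(&wakeup_work, delay);
          }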
      
      In the following example, the script enables the sys_enter_getuid and
      sys_exit_getuid tracepoints and runs the getuid_microbench in parallel
      with the given number of processes. The output clearly shows the latency
      increase caused by contentions.
      
      $ ~/getuid.sh 1
      1000000 calls in 0.720974253 s (720.974253 ns/call)
      
      $ ~/getuid.sh 2
      1000000 calls in 1.166457554 s (1166.457554 ns/call)
      1000000 calls in 1.168933765 s (1168.933765 ns/call)
      
      $ ~/getuid.sh 3
      1000000 calls in 1.783827516 s (1783.827516 ns/call)
      1000000 calls in 1.795553270 s (1795.553270 ns/call)
      1000000 calls in 1.796493376 s (1796.493376 ns/call)
      
      $ ~/getuid.sh 4
      1000000 calls in 4.483041796 s (4483.041796 ns/call)
      1000000 calls in 4.484165388 s (4484.165388 ns/call)
      1000000 calls in 4.484850762 s (4484.850762 ns/call)
      1000000 calls in 4.485643576 s (4485.643576 ns/call)
      
      $ ~/getuid.sh 5
      1000000 calls in 6.497521653 s (6497.521653 ns/call)
      1000000 calls in 6.502000236 s (6502.000236 ns/call)
      1000000 calls in 6.501709115 s (6501.709115 ns/call)
      1000000 calls in 6.502124100 s (6502.124100 ns/call)
      1000000 calls in 6.502936358 s (6502.936358 ns/call)
      
      After the patch, the latencies scale better.
      1000000 calls in 0.728720455 s (728.720455 ns/call)
      
      1000000 calls in 0.842782857 s (842.782857 ns/call)
      1000000 calls in 0.883803135 s (883.803135 ns/call)
      
      1000000 calls in 0.902077764 s (902.077764 ns/call)
      1000000 calls in 0.902838202 s (902.838202 ns/call)
      1000000 calls in 0.908896885 s (908.896885 ns/call)
      
      1000000 calls in 0.932523515 s (932.523515 ns/call)
      1000000 calls in 0.958009672 s (958.009672 ns/call)
      1000000 calls in 0.986188020 s (986.188020 ns/call)
      1000000 calls in 0.989771102 s (989.771102 ns/call)
      
      1000000 calls in 0.933518391 s (933.518391 ns/call)
      1000000 calls in 0.958897947 s (958.897947 ns/call)
      1000000 calls in 1.031038897 s (1031.038897 ns/call)
      1000000 calls in 1.089516025 s (1089.516025 ns/call)
      1000000 calls in 1.141998347 s (1141.998347 ns/call)
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1305059241-7629-1-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e7e2ee89
    • rcu: Use softirq to address performance regression · 09223371
      Authored by Shaohua Li
      Commit a26ac245 ("rcu: move TREE_RCU from softirq to kthread")
      introduced a performance regression. In an AIM7 test, this commit
      degraded performance by about 40%.
      
      The commit runs rcu callbacks in a kthread instead of softirq, and we
      observed a high rate of context switches as a result. Our test system
      has 64 CPUs and HZ is 1000, so we saw more than 64k context switches
      per second caused by RCU's per-CPU kthreads.  A trace showed that most of
      the time the RCU per-CPU kthread doesn't actually handle any callbacks,
      but instead just does a very small amount of work handling grace periods.
      This means that RCU's per-CPU kthreads are making the scheduler do quite
      a bit of work in order to allow a very small amount of RCU-related
      processing to be done.
      
      Alex Shi's analysis determined that this slowdown is due to lock
      contention within the scheduler.  Unfortunately, as Peter Zijlstra points
      out, the scheduler's real-time semantics require global action, which
      means that this contention is inherent in real-time scheduling.  (Yes,
      perhaps someone will come up with a workaround -- otherwise, -rt is not
      going to do well on large SMP systems -- but this patch will work around
      this issue in the meantime.  And "the meantime" might well be forever.)
      
      This patch therefore re-introduces softirq processing to RCU, but only
      for core RCU work.  RCU callbacks are still executed in kthread context,
      so that only a small amount of RCU work runs in softirq context in the
      common case.  This should minimize ksoftirqd execution, allowing us to
      skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
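      
      A hedged sketch of the split; the helper shape is condensed, and
      callback invocation stays in the kthread:
      
          /* Sketch: core grace-period work is driven from softirq
           * again, avoiding a kthread context switch per invocation. */
          static void invoke_rcu_core(void)
          {
                  raise_softirq(RCU_SOFTIRQ);
          }
      
          /* at init time, the softirq handler is registered: */
          open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);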
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Tested-by: "Alex,Shi" <alex.shi@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      09223371
    • rcu: Simplify curing of load woes · 9a432736
      Authored by Paul E. McKenney
      Make the functions creating the kthreads wake them up.  Leverage the
      fact that the per-node and boost kthreads can run anywhere, thus
      dispensing with the need to wake them up once the incoming CPU has
      gone fully online.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Daniel J Blueman <daniel.blueman@gmail.com>
      9a432736