1. 03 7月, 2013 3 次提交
    • Z
      uprobes: Fix return value in error handling path · fa44063f
      zhangwei(Jovi) 提交于
      When wrong argument is passed into uprobe_events it does not return
      an error:
      
      [root@jovi tracing]# echo 'p:myprobe /bin/bash' > uprobe_events
      [root@jovi tracing]#
      
      The proper response is:
      
      [root@jovi tracing]# echo 'p:myprobe /bin/bash' > uprobe_events
      -bash: echo: write error: Invalid argument
      
      Link: http://lkml.kernel.org/r/51B964FF.5000106@huawei.com
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: <srikar@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # 3.5+
      Signed-off-by: Nzhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      fa44063f
    • S
      tracing: Fix race between deleting buffer and setting events · 2a6c24af
      Steven Rostedt (Red Hat) 提交于
      While analyzing the code, I discovered that there's a potential race between
      deleting a trace instance and setting events. There are a few races that can
      occur if events are being traced as the buffer is being deleted. Mostly the
      problem comes with freeing the descriptor used by the trace event callback.
      To prevent problems like this, the events are disabled before the buffer is
      deleted. The problem with the current solution is that the event_mutex is let
      go between disabling the events and freeing the files, which means that the events
      could be enabled again while the freeing takes place.
      
      Cc: stable@vger.kernel.org # 3.10
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2a6c24af
    • S
      tracing: Add trace_array_get/put() to event handling · 8e2e2fa4
      Steven Rostedt (Red Hat) 提交于
      Commit a695cb58 "tracing: Prevent deleting instances when they are being read"
      tried to fix a race between deleting a trace instance and reading contents
      of a trace file. But it wasn't good enough. The following could crash the kernel:
      
       # cd /sys/kernel/debug/tracing/instances
       # ( while :; do mkdir foo; rmdir foo; done ) &
       # ( while :; do echo 1 > foo/events/sched/sched_switch 2> /dev/null; done ) &
      
      Luckily this can only be done by root user, but it should be fixed regardless.
      
      The problem is that a delete of the file can happen after the write to the event
      is opened, but before the enabling happens.
      
      The solution is to make sure the trace_array is available before succeeding in
      opening for write, and incerment the ref counter while opened.
      
      Now the instance can be deleted when the events are writing to the buffer,
      but the deletion of the instance will disable all events before the instance
      is actually deleted.
      
      Cc: stable@vger.kernel.org # 3.10
      Reported-by: NAlexander Lam <azl@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      8e2e2fa4
  2. 02 7月, 2013 13 次提交
  3. 20 6月, 2013 4 次提交
  4. 12 6月, 2013 6 次提交
  5. 09 6月, 2013 3 次提交
  6. 07 6月, 2013 1 次提交
    • S
      tracing: Use current_uid() for critical time tracing · f17a5194
      Steven Rostedt (Red Hat) 提交于
      The irqsoff tracer records the max time that interrupts are disabled.
      There are hooks in the assembly code that calls back into the tracer when
      interrupts are disabled or enabled.
      
      When they are enabled, the tracer checks if the amount of time they
      were disabled is larger than the previous recorded max interrupts off
      time. If it is, it creates a snapshot of the currently running trace
      to store where the last largest interrupts off time was held and how
      it happened.
      
      During testing, this RCU lockdep dump appeared:
      
      [ 1257.829021] ===============================
      [ 1257.829021] [ INFO: suspicious RCU usage. ]
      [ 1257.829021] 3.10.0-rc1-test+ #171 Tainted: G        W
      [ 1257.829021] -------------------------------
      [ 1257.829021] /home/rostedt/work/git/linux-trace.git/include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
      [ 1257.829021]
      [ 1257.829021] other info that might help us debug this:
      [ 1257.829021]
      [ 1257.829021]
      [ 1257.829021] RCU used illegally from idle CPU!
      [ 1257.829021] rcu_scheduler_active = 1, debug_locks = 0
      [ 1257.829021] RCU used illegally from extended quiescent state!
      [ 1257.829021] 2 locks held by trace-cmd/4831:
      [ 1257.829021]  #0:  (max_trace_lock){......}, at: [<ffffffff810e2b77>] stop_critical_timing+0x1a3/0x209
      [ 1257.829021]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810dae5a>] __update_max_tr+0x88/0x1ee
      [ 1257.829021]
      [ 1257.829021] stack backtrace:
      [ 1257.829021] CPU: 3 PID: 4831 Comm: trace-cmd Tainted: G        W    3.10.0-rc1-test+ #171
      [ 1257.829021] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
      [ 1257.829021]  0000000000000001 ffff880065f49da8 ffffffff8153dd2b ffff880065f49dd8
      [ 1257.829021]  ffffffff81092a00 ffff88006bd78680 ffff88007add7500 0000000000000003
      [ 1257.829021]  ffff88006bd78680 ffff880065f49e18 ffffffff810daebf ffffffff810dae5a
      [ 1257.829021] Call Trace:
      [ 1257.829021]  [<ffffffff8153dd2b>] dump_stack+0x19/0x1b
      [ 1257.829021]  [<ffffffff81092a00>] lockdep_rcu_suspicious+0x109/0x112
      [ 1257.829021]  [<ffffffff810daebf>] __update_max_tr+0xed/0x1ee
      [ 1257.829021]  [<ffffffff810dae5a>] ? __update_max_tr+0x88/0x1ee
      [ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
      [ 1257.829021]  [<ffffffff810dbf85>] update_max_tr_single+0x11d/0x12d
      [ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
      [ 1257.829021]  [<ffffffff810e2b15>] stop_critical_timing+0x141/0x209
      [ 1257.829021]  [<ffffffff8109569a>] ? trace_hardirqs_on+0xd/0xf
      [ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
      [ 1257.829021]  [<ffffffff810e3057>] time_hardirqs_on+0x2a/0x2f
      [ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
      [ 1257.829021]  [<ffffffff8109550c>] trace_hardirqs_on_caller+0x16/0x197
      [ 1257.829021]  [<ffffffff8109569a>] trace_hardirqs_on+0xd/0xf
      [ 1257.829021]  [<ffffffff811002b9>] user_enter+0xfd/0x107
      [ 1257.829021]  [<ffffffff810029b4>] do_notify_resume+0x92/0x97
      [ 1257.829021]  [<ffffffff8154bdca>] int_signal+0x12/0x17
      
      What happened was entering into the user code, the interrupts were enabled
      and a max interrupts off was recorded. The trace buffer was saved along with
      various information about the task: comm, pid, uid, priority, etc.
      
      The uid is recorded with task_uid(tsk). But this is a macro that uses rcu_read_lock()
      to retrieve the data, and this happened to happen where RCU is blind (user_enter).
      
      As only the preempt and irqs off tracers can have this happen, and they both
      only have the tsk == current, if tsk == current, use current_uid() instead of
      task_uid(), as current_uid() does not use RCU as only current can change its uid.
      
      This fixes the RCU suspicious splat.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f17a5194
  7. 30 5月, 2013 1 次提交
  8. 29 5月, 2013 4 次提交
    • S
      ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends · 1bb539ca
      Steven Rostedt 提交于
      As rcu_dereference_raw() under RCU debug config options can add quite a
      bit of checks, and that tracing uses rcu_dereference_raw(), these checks
      happen with the function tracer. The function tracer also happens to trace
      these debug checks too. This added overhead can livelock the system.
      
      Have the function tracer use the new RCU _notrace equivalents that do
      not do the debug checks for RCU.
      
      Link: http://lkml.kernel.org/r/20130528184209.467603904@goodmis.orgAcked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1bb539ca
    • J
      cgroup: warn about mismatching options of a new mount of an existing hierarchy · 2a0ff3fb
      Jeff Liu 提交于
      With the new __DEVEL__sane_behavior mount option was introduced,
      if the root cgroup is alive with no xattr function, to mount a
      new cgroup with xattr will be rejected in terms of design which
      just fine.  However, if the root cgroup does not mounted with
      __DEVEL__sane_hehavior, to create a new cgroup with xattr option
      will succeed although after that the EA function does not works
      as expected but will get ENOTSUPP for setting up attributes under
      either cgroup. e.g.
      
      setfattr: /cgroup2/test: Operation not supported
      
      Instead of keeping silence in this case, it's better to drop a log
      entry in warning level.  That would be helpful to understand the
      reason behind the scene from the user's perspective, and this is
      essentially an improvement does not break the backward compatibilities.
      
      With this fix, above mount attemption will keep up works as usual but
      the following line cound be found at the system log:
      
      [ ...] cgroup: new mount options do not match the existing superblock
      
      tj: minor formatting / message updates.
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reported-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      2a0ff3fb
    • Z
      timekeeping: Correct run-time detection of persistent_clock. · 0d6bd995
      Zoran Markovic 提交于
      Since commit 31ade306, timekeeping_init()
      checks for presence of persistent clock by attempting to read a non-zero
      time value. This is an issue on platforms where persistent_clock (instead
      is implemented as a free-running counter (instead of an RTC) starting
      from zero on each boot and running during suspend. Examples are some ARM
      platforms (e.g. PandaBoard).
      
      An attempt to read such a clock during timekeeping_init() may return zero
      value and falsely declare persistent clock as missing. Additionally, in
      the above case suspend times may be accounted twice (once from
      timekeeping_resume() and once from rtc_resume()), resulting in a gradual
      drift of system time.
      
      This patch does a run-time correction of the issue by doing the same check
      during timekeeping_suspend().
      
      A better long-term solution would have to return error when trying to read
      non-existing clock and zero when trying to read an uninitialized clock, but
      that would require changing all persistent_clock implementations.
      
      This patch addresses the immediate breakage, for now.
      
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NZoran Markovic <zoran.markovic@linaro.org>
      [jstultz: Tweaked commit message and subject]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      0d6bd995
    • G
      ntp: Remove unused variable flags in __hardpps · aa848233
      Geert Uytterhoeven 提交于
      kernel/time/ntp.c: In function ‘__hardpps’:
      kernel/time/ntp.c:877: warning: unused variable ‘flags’
      
      commit a076b214 ("ntp: Remove ntp_lock,
      using the timekeeping locks to protect ntp state") removed its users,
      but not the actual variable.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      aa848233
  9. 28 5月, 2013 2 次提交
    • S
      ring-buffer: Do not poll non allocated cpu buffers · 6721cb60
      Steven Rostedt (Red Hat) 提交于
      The tracing infrastructure sets up for possible CPUs, but it uses
      the ring buffer polling, it is possible to call the ring buffer
      polling code with a CPU that hasn't been allocated. This will cause
      a kernel oops when it access a ring buffer cpu buffer that is part
      of the possible cpus but hasn't been allocated yet as the CPU has never
      been online.
      Reported-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Tested-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6721cb60
    • T
      tick: Cure broadcast false positive pending bit warning · 2938d275
      Thomas Gleixner 提交于
      commit 26517f3e (tick: Avoid programming the local cpu timer if
      broadcast pending) added a warning if the cpu enters broadcast mode
      again while the pending bit is still set. Meelis reported that the
      warning triggers. There are two corner cases which have been not
      considered:
      
      1) cpuidle calls clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
         twice. That can result in the following scenario
      
         CPU0                    CPU1
                                 cpuidle_idle_call()
                                   clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                     set cpu in tick_broadcast_oneshot_mask
      
         broadcast interrupt
           event expired for cpu1
           set pending bit
      
                                   acpi_idle_enter_simple()
                                     clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                       WARN_ON(pending bit)
      
        Move the WARN_ON into the section where we enter broadcast mode so
        it wont provide false positives on the second call.
      
      2) safe_halt() enables interrupts, so a broadcast interrupt can be
         delivered befor the broadcast mode is disabled. That sets the
         pending bit for the CPU which receives the broadcast
         interrupt. Though the interrupt is delivered right away from the
         broadcast handler and leaves the pending bit stale.
      
         Clear the pending bit for the current cpu in the broadcast handler.
      Reported-and-tested-by: NMeelis Roos <mroos@linux.ee>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305271841130.4220@ionosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      2938d275
  10. 25 5月, 2013 1 次提交
  11. 24 5月, 2013 1 次提交
    • T
      cgroup: fix a subtle bug in descendant pre-order walk · 7805d000
      Tejun Heo 提交于
      When cgroup_next_descendant_pre() initiates a walk, it checks whether
      the subtree root doesn't have any children and if not returns NULL.
      Later code assumes that the subtree isn't empty.  This is broken
      because the subtree may become empty inbetween, which can lead to the
      traversal escaping the subtree by walking to the sibling of the
      subtree root.
      
      There's no reason to have the early exit path.  Remove it along with
      the later assumption that the subtree isn't empty.  This simplifies
      the code a bit and fixes the subtle bug.
      
      While at it, fix the comment of cgroup_for_each_descendant_pre() which
      was incorrectly referring to ->css_offline() instead of
      ->css_online().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Cc: stable@vger.kernel.org
      7805d000
  12. 23 5月, 2013 1 次提交
    • S
      tracing: Fix crash when ftrace=nop on the kernel command line · ca164318
      Steven Rostedt (Red Hat) 提交于
      If ftrace=<tracer> is on the kernel command line, when that tracer is
      registered, it will be initiated by tracing_set_tracer() to execute that
      tracer.
      
      The nop tracer is just a stub tracer that is used to have no tracer
      enabled. It is assigned at early bootup as it is the default tracer.
      
      But if ftrace=nop is on the kernel command line, the registering of the
      nop tracer will call tracing_set_tracer() which will try to execute
      the nop tracer. But it expects tr->current_trace to be assigned something
      as it usually is assigned to the nop tracer. As it hasn't been assigned
      to anything yet, it causes the system to crash.
      
      The simple fix is to move the tr->current_trace = nop before registering
      the nop tracer. The functionality is still the same as the nop tracer
      doesn't do anything anyway.
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ca164318