1. 12 Nov, 2008 1 commit
    • ring-buffer: buffer record on/off switch · a3583244
      Steven Rostedt committed
      Impact: enable/disable ring buffer recording API added
      
      Several kernel developers have requested that there be a way to stop
      recording into the ring buffers with a simple switch that can also
      be enabled from userspace. This patch adds a new kernel API to the
      ring buffers called:
      
       tracing_on()
       tracing_off()
      
      When tracing_off() is called, no ring buffer will be able to record
      into its buffer.
      
      tracing_on() will enable the ring buffers again.
      
      These two act like an on/off switch. That is, there is no counting of the
      number of times tracing_off or tracing_on has been called.
      
      A new file is added to the debugfs/tracing directory called
      
        tracing_on
      
      This allows for userspace applications to also flip the switch.
      
        echo 0 > /debugfs/tracing/tracing_on
      
      disables the tracing.
      
        echo 1 > /debugfs/tracing/tracing_on
      
      enables it.
      
      Note, this does not disable or enable any tracers. It only sets or clears
      a flag that needs to be set in order for the ring buffers to write to
      their buffers. It is a global flag, and affects all ring buffers.
      
      The buffers start out with tracing_on enabled.
      
      There are now three flags that control recording into the buffers:
      
       tracing_on: which affects all ring buffer tracers.
      
       buffer->record_disabled: which affects an allocated buffer, which may be set
           if an anomaly is detected, and tracing is disabled.
      
       cpu_buffer->record_disabled: which is set by tracing_stop() or if an
           anomaly is detected. tracing_start() cannot re-enable this if
           an anomaly occurred.
      
      The userspace debugfs/tracing/tracing_enabled is implemented with
      tracing_stop() but the user space code can not enable it if the kernel
      called tracing_stop().
      
      Userspace can enable the tracing_on even if the kernel disabled it.
      It is just a switch used to stop tracing if a condition was hit.
      tracing_on is not for protecting critical areas in the kernel nor is
      it for stopping tracing if an anomaly occurred. This is because userspace
      can reenable it at any time.
      
      Side effect: With this patch, I discovered a dead variable in ftrace.c
        called tracing_on. This patch removes it.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
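The gating described in this commit can be sketched as a small userspace model in C. This is only an illustration under stated assumptions: the `model_*` names and struct layouts are hypothetical and are not the kernel's implementation. It shows the global switch as a plain boolean with no call counting, checked together with the per-buffer and per-CPU `record_disabled` flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the three flags named in the commit message. */
static bool model_global_tracing_on = true;  /* buffers start out with the switch on */

struct model_cpu_buffer {
    int record_disabled;             /* models cpu_buffer->record_disabled */
};

struct model_buffer {
    int record_disabled;             /* models buffer->record_disabled */
    struct model_cpu_buffer cpu;     /* a single CPU is enough for the sketch */
};

/* A plain on/off switch: calling either function twice changes nothing more. */
static void model_tracing_on(void)  { model_global_tracing_on = true; }
static void model_tracing_off(void) { model_global_tracing_on = false; }

/* A write may proceed only if none of the three flags blocks it. */
static bool model_may_record(const struct model_buffer *b)
{
    assert(b != NULL);
    return model_global_tracing_on &&
           b->record_disabled == 0 &&
           b->cpu.record_disabled == 0;
}
```

Because the switch does not count calls, two calls to `model_tracing_off()` are undone by a single `model_tracing_on()`, which is the behaviour the message describes for tracing_off() and tracing_on().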
  2. 11 Nov, 2008 7 commits
    • sched: release buddies on yield · 2002c695
      Peter Zijlstra committed
      Clear buddies on yield, so that the buddy rules don't schedule them
      despite them being placed right-most.
      
      This fixed a performance regression with yield-happy binary JVMs.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Tested-by: Lin Ming <ming.m.lin@intel.com>
    • timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context · 5d5254f0
      Gautham R Shenoy committed
      Impact: fix incorrect locking triggered during hotplug-intense stress-tests
      
      While migrating the CB_IRQSAFE_UNLOCKED timers during a cpu-offline,
      we queue them on the cb_pending list, so that they won't go
      stale.
      
      Thus, when the callbacks of the timers run from the softirq context,
      they could run into potential deadlocks, since these callbacks
      assume that they're running with irq's disabled, thereby annoying
      lockdep!
      
      Fix this by emulating hardirq context while running these callbacks from
      the hrtimer softirq.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.27 #2
      --------------------------------
      inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
      ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes:
       (&rq->lock){++..}, at: [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
      {in-hardirq-W} state was registered at:
        [<c014103c>] __lock_acquire+0x549/0x121e
        [<c0107890>] native_sched_clock+0x88/0x99
        [<c013aa12>] clocksource_get_next+0x39/0x3f
        [<c0139abc>] update_wall_time+0x616/0x7df
        [<c0141d6b>] lock_acquire+0x5a/0x74
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c047ed45>] _spin_lock+0x1c/0x45
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c0121724>] scheduler_tick+0x3a/0x18d
        [<c012c436>] update_process_times+0x3a/0x44
        [<c013c044>] tick_periodic+0x63/0x6d
        [<c013c062>] tick_handle_periodic+0x14/0x5e
        [<c010568c>] timer_interrupt+0x44/0x4a
        [<c0150c9f>] handle_IRQ_event+0x13/0x3d
        [<c0151c14>] handle_level_irq+0x79/0xbd
        [<c0105634>] do_IRQ+0x69/0x7d
        [<c01041e4>] common_interrupt+0x28/0x30
        [<c047007b>] aac_probe_one+0x1a3/0x3f3
        [<c047ec2d>] _spin_unlock_irqrestore+0x36/0x39
        [<c01512b4>] setup_irq+0x1be/0x1f9
        [<c065d70b>] start_kernel+0x259/0x2c5
        [<ffffffff>] 0xffffffff
      irq event stamp: 50102
      hardirqs last  enabled at (50102): [<c047ebf4>] _spin_unlock_irq+0x20/0x23
      hardirqs last disabled at (50101): [<c047edc2>] _spin_lock_irq+0xa/0x4b
      softirqs last  enabled at (50088): [<c0128ba6>] do_softirq+0x37/0x4d
      softirqs last disabled at (50099): [<c0128ba6>] do_softirq+0x37/0x4d
      
      other info that might help us debug this:
      no locks held by ksoftirqd/0/4.
      
      stack backtrace:
      Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27 #2
       [<c013f6cb>] print_usage_bug+0x13e/0x147
       [<c013fef5>] mark_lock+0x493/0x797
       [<c01410b1>] __lock_acquire+0x5be/0x121e
       [<c0141d6b>] lock_acquire+0x5a/0x74
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c047ed45>] _spin_lock+0x1c/0x45
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
       [<c01210fd>] finish_task_switch+0x41/0xbd
       [<c0107890>] native_sched_clock+0x88/0x99
       [<c011dae6>] sched_rt_period_timer+0x0/0x1fc
       [<c0136dda>] run_hrtimer_pending+0x54/0xe5
       [<c011dae6>] sched_rt_period_timer+0x0/0x1fc
       [<c0128afb>] __do_softirq+0x7b/0xef
       [<c0128ba6>] do_softirq+0x37/0x4d
       [<c0128c12>] ksoftirqd+0x56/0xc5
       [<c0128bbc>] ksoftirqd+0x0/0xc5
       [<c0134649>] kthread+0x38/0x5d
       [<c0134611>] kthread+0x0/0x5d
       [<c0104477>] kernel_thread_helper+0x7/0x10
       =======================
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock · ad474cac
      Oleg Nesterov committed
      Impact: fix hang/crash on ia64 under high load
      
      This is ugly, but the simplest patch by far.
      
      Unlike other similar routines, account_group_exec_runtime() could be
      called "implicitly" from within the scheduler after exit_notify(). This
      means we can race with the parent doing release_task(), so we can't just
      check ->signal != NULL.
      
      Change __exit_signal() to do spin_unlock_wait(&task_rq(tsk)->lock)
      before __cleanup_signal() to make sure ->signal can't be freed under
      task_rq(tsk)->lock. Note that task_rq_unlock_wait() doesn't care
      about the case when tsk changes cpu/rq under us, this should be OK.
      
      Thanks to Ingo who nacked my previous buggy patch.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Reported-by: Doug Chapman <doug.chapman@hp.com>
    • ring-buffer: prevent infinite looping on time stamping · 4143c5cb
      Steven Rostedt committed
      Impact: removal of unnecessary looping
      
      The lockless part of the ring buffer allows for reentry into the code
      from interrupts. A timestamp is taken, a test is performed and if it
      detects that an interrupt occurred that did tracing, it tries again.
      
      The problem arises if the timestamp code itself causes a trace.
      The detection will detect this and loop again. The difference between
      this and an interrupt doing tracing, is that this will fail every time,
      and cause an infinite loop.
      
      Currently, we test if the loop happens 1000 times, and if so, it will
      produce a warning and disable the ring buffer.
      
      The problem with this approach is that it makes it difficult to perform
      some types of tracing (tracing the timestamp code itself).
      
      Each trace entry has a delta timestamp from the previous entry.
      If a trace entry is reserved but an interrupt occurs and traces before
      the previous entry is committed, the delta timestamp for that entry will
      be zero. This actually makes sense in terms of tracing, because the
      interrupt entry happened before the preempted entry was committed, so
      one may consider the two happening at the same time. The order is
      still preserved in the buffer.
      
      With this idea, instead of trying to get a new timestamp if an interrupt
      made it in between the timestamp and the test, the entry could simply
      make the delta zero and continue. This will prevent interrupts or
      tracers in the timer code from causing the above loop.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
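The idea of storing a zero delta instead of retrying can be sketched in C as follows. This is a simplified userspace model, not the ring buffer code: the `model_rb` struct, the `interrupted` flag and the function name are all assumptions made for illustration, and the real detection of a nested writer is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: only the timestamp of the last recorded entry is tracked. */
struct model_rb {
    uint64_t write_stamp;
};

/*
 * Compute the delta to store for a new entry.
 * 'interrupted' stands in for the ring buffer's own test that another
 * writer (an interrupt, or a tracer in the timestamp code) recorded an
 * entry between taking 'ts' and reserving space.
 */
static uint64_t model_entry_delta(struct model_rb *rb, uint64_t ts, bool interrupted)
{
    uint64_t delta;

    if (interrupted) {
        /* No retry loop: treat both entries as happening at the same time. */
        return 0;
    }

    assert(ts >= rb->write_stamp);
    delta = ts - rb->write_stamp;
    rb->write_stamp = ts;
    return delta;
}
```

Since the function never loops, a tracer that fires inside the timestamp code cannot make it spin, which is the failure mode the commit message describes.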
    • ftrace: disable tracing on resize · bf5e6519
      Steven Rostedt committed
      Impact: fix for bug on resize
      
      This patch addresses the bug found here:
      
       http://bugzilla.kernel.org/show_bug.cgi?id=11996
      
      When ftrace converted to the new unified trace buffer, the resizing of
      the buffer was not protected as much as it was originally. If tracing
      is performed while the resize occurs, then the buffer can be corrupted.
      
      This patch disables all ftrace buffer modifications before a resize
      takes place.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • nohz: disable tick_nohz_kick_tick() for now · ae99286b
      Thomas Gleixner committed
      Impact: nohz powersavings and wakeup regression
      
      commit fb02fbc1 (NOHZ: restart tick device from irq_enter())
      causes a serious wakeup regression.
      
      While the patch is correct it does not take into account that spurious
      wakeups happen on x86. A fix for this issue is available, but we just
      revert to the .27 behaviour and let long running softirqs screw
      themselves.
      
      Disable it for now.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • irq: call __irq_enter() before calling the tick_idle_check · ee5f80a9
      Thomas Gleixner committed
      Impact: avoid spurious ksoftirqd wakeups
      
      The tick idle check which is called from irq_enter() was run before
      the call to __irq_enter(), so the in_interrupt() bits in preempt_count
      were not yet set. As a result, raising a softirq woke up ksoftirqd for
      nothing, as the softirq was handled on return from interrupt.
      
      Call __irq_enter() before calling into the tick idle check code.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 10 Nov, 2008 1 commit
  4. 07 Nov, 2008 3 commits
  5. 06 Nov, 2008 3 commits
    • cpumask: introduce new API, without changing anything · 2d3854a3
      Rusty Russell committed
      Impact: introduce new APIs
      
      We want to deprecate cpumasks on the stack, as we are headed for
      gynormous numbers of CPUs.  Eventually, we want to head towards an
      undefined 'struct cpumask' so they can never be declared on stack.
      
      1) New cpumask functions which take pointers instead of copies.
         (cpus_* -> cpumask_*)
      
      2) Several new helpers to reduce requirements for temporary cpumasks
         (cpumask_first_and, cpumask_next_and, cpumask_any_and)
      
      3) Helpers for declaring cpumasks on or offstack for large NR_CPUS
         (cpumask_var_t, alloc_cpumask_var and free_cpumask_var)
      
      4) 'struct cpumask' for explicitness and to mark new-style code.
      
      5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
         not NR_CPUS for time efficiency and for smaller dynamic allocations
         in future.
      
      6) cpumask_copy() so we can allocate less than a full cpumask eventually
         (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask'
         definition eventually.
      
      7) work_on_cpu() helper for doing task on a CPU, rather than saving old
         cpumask for current thread and manipulating it.
      
      8) smp_call_function_many() which is smp_call_function_mask() except
         taking a cpumask pointer.
      
      Note that this patch simply introduces the new functions and leaves
      the obsolescent ones in place.  This is to simplify the transition
      patches.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
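The direction of this API change (pointers instead of on-stack copies, off-stack allocation, iteration bounded by a runtime CPU count) can be sketched as a userspace model in C. Everything prefixed `model_` is hypothetical and only mirrors the shape of the names in the commit message; the signatures are assumptions, not the kernel's.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for nr_cpu_ids: known at run time, not compile time. */
static unsigned int model_nr_cpu_ids = 8;

/* Off-stack mask: callers hold a pointer, never a full copy on the stack. */
typedef unsigned long *model_cpumask_var_t;

static size_t model_cpumask_longs(void)
{
    return (model_nr_cpu_ids + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG;
}

/* Allocate only as many words as the runtime CPU count needs; 1 on success. */
static int model_alloc_cpumask_var(model_cpumask_var_t *mask)
{
    *mask = calloc(model_cpumask_longs(), sizeof(unsigned long));
    return *mask != NULL;
}

static void model_free_cpumask_var(model_cpumask_var_t mask)
{
    free(mask);
}

static void model_cpumask_set_cpu(unsigned int cpu, unsigned long *mask)
{
    assert(cpu < model_nr_cpu_ids);
    mask[cpu / MODEL_BITS_PER_LONG] |= 1UL << (cpu % MODEL_BITS_PER_LONG);
}

static int model_cpumask_test_cpu(unsigned int cpu, const unsigned long *mask)
{
    return (int)((mask[cpu / MODEL_BITS_PER_LONG] >> (cpu % MODEL_BITS_PER_LONG)) & 1UL);
}

/*
 * First CPU set in both masks, found without building a temporary
 * "a AND b" mask. Returns model_nr_cpu_ids when there is none.
 */
static unsigned int model_cpumask_first_and(const unsigned long *a,
                                            const unsigned long *b)
{
    unsigned int cpu;

    for (cpu = 0; cpu < model_nr_cpu_ids; cpu++)
        if (model_cpumask_test_cpu(cpu, a) && model_cpumask_test_cpu(cpu, b))
            return cpu;
    return model_nr_cpu_ids;
}
```

The combined helper is the point of item 2 in the list above: the intersection is tested in place, so no temporary mask is needed.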
    • Add round_jiffies_up and related routines · 9c133c46
      Alan Stern committed
      This patch (as1158b) adds round_jiffies_up() and friends.  These
      routines work like the analogous round_jiffies() functions, except
      that they will never round down.
      
      The new routines will be useful for timeouts where we don't care
      exactly when the timer expires, provided it doesn't expire too soon.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
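The "never round down" property can be illustrated with a small C sketch. It is a deliberate simplification: `MODEL_HZ` and `model_round_jiffies_up()` are hypothetical names, the tick rate of 250 is an arbitrary assumption, and the sketch only rounds a tick count up to the next whole second, ignoring anything else the kernel routines may take into account.

```c
#include <assert.h>
#include <limits.h>

/* Assumed tick rate for the sketch; the kernel's HZ is a build-time setting. */
#define MODEL_HZ 250UL

/*
 * Round a tick count up to the next multiple of MODEL_HZ.
 * The result is never smaller than the input, so a timeout rounded
 * this way cannot expire earlier than requested.
 */
static unsigned long model_round_jiffies_up(unsigned long j)
{
    unsigned long rem = j % MODEL_HZ;

    if (rem == 0)
        return j;                       /* already on a whole second */

    assert(j <= ULONG_MAX - MODEL_HZ);  /* the sketch does not handle wraparound */
    return j + (MODEL_HZ - rem);
}
```

With these assumptions, a timeout of 251 ticks becomes 500, while a nearest-second rounding would have moved it down to 250 and let it expire too soon.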
    • generic-ipi: fix the smp_mb() placement · 561920a0
      Suresh Siddha committed
      smp_mb() is needed (to make the memory operations visible globally) before
      sending the ipi on the sender and the receiver (on Alpha at least) needs
      smp_read_barrier_depends() in the handler before reading the call_single_queue
      list in a lock-free fashion.
      
      On x86, x2apic mode register accesses for sending IPI's don't have serializing
      semantics. So the need for smp_mb() before sending the IPI becomes more
      critical in x2apic mode.
      
      Remove the unnecessary smp_mb() in csd_flag_wait(), as the presence of that
      smp_mb() doesn't mean anything on the sender, when the ipi receiver is not
      doing anything special (like memory fence) after clearing the CSD_FLAG_WAIT.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  6. 05 Nov, 2008 5 commits
  7. 03 Nov, 2008 3 commits
  8. 02 Nov, 2008 2 commits
  9. 31 Oct, 2008 7 commits
  10. 30 Oct, 2008 2 commits
  11. 29 Oct, 2008 2 commits
    • resources: fix x86info results ioremap.c:226 __ioremap_caller+0xf2/0x2d6() WARNINGs · d68612b2
      Suresh Siddha committed
      Impact: avoid false-positive WARN_ON()
      
      Andi Kleen reported:
      > When running x86info on a 2.6.27-git8 system I get
      >
      > resource map sanity check conflict: 0x9e000 0x9efff 0x10000 0x9e7ff System RAM
      > ------------[ cut here ]------------
      > WARNING: at /home/lsrc/linux/arch/x86/mm/ioremap.c:226 __ioremap_caller+0xf2/0x2d6()
      > ...
      
      Some of the pages below the 1MB ISA addresses will be shared typically by both
      BIOS and system usable RAM. For example:
      	BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
      	BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
      
      x86info reads the low physical address using /dev/mem, which internally
      uses ioremap() for accessing non RAM pages. ioremap() of such low
      pages conflicts with multiple resource entities leading to the
      above warning.
      
      Change the iomem_map_sanity_check() to allow mapping a page spanning multiple
      resource entities (minimum granularity that one can map is a page anyhow).
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
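One way to picture the change is to compare a byte-granular containment test with a page-granular one, using the addresses from the reported warning. The C sketch below is an assumption-laden model: the `model_*` names, the 4 KiB page size and the exact comparison are illustrative and are not taken from the patch.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12  /* assumed 4 KiB pages */

/* Hypothetical resource with inclusive start and end addresses. */
struct model_resource {
    unsigned long long start;
    unsigned long long end;
};

static unsigned long long model_pfn_down(unsigned long long addr)
{
    return addr >> MODEL_PAGE_SHIFT;
}

/* Byte-granular test: the request must lie entirely inside the resource. */
static int model_contains_bytes(const struct model_resource *r,
                                unsigned long long addr,
                                unsigned long long size)
{
    assert(size > 0);
    return r->start <= addr && r->end >= addr + size - 1;
}

/*
 * Page-granular test: compare page frame numbers instead of byte
 * addresses, so a resource that ends in the middle of the last
 * requested page still counts as containing the request.
 */
static int model_contains_pages(const struct model_resource *r,
                                unsigned long long addr,
                                unsigned long long size)
{
    assert(size > 0);
    return model_pfn_down(r->start) <= model_pfn_down(addr) &&
           model_pfn_down(r->end) >= model_pfn_down(addr + size - 1);
}
```

For the System RAM range 0x10000 to 0x9e7ff and a one-page request at 0x9e000, the byte-granular test reports a boundary crossing while the page-granular test does not. A two-page request at the same address is still rejected by both.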
    • ftrace: perform an initialization for ftrace to enable it · 0b6e4d56
      Frederic Weisbecker committed
      Impact: corrects a bug which made the non-dyn function tracer not functional
      
      With latest git, the non-dynamic function tracer didn't get any trace.
      
      The problem was that ftrace_enabled wasn't initialized to 1
      because ftrace has no init function when DYNAMIC_FTRACE is disabled.
      
      So when a tracer tries to register an ftrace_ops struct,
      __register_ftrace_function failed to set the hook.
      
      This patch corrects it by setting an init function to initialize
      ftrace during the boot.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 28 Oct, 2008 4 commits
    • ftrace: fix current_tracer error return · 60063a66
      Steven Rostedt committed
      The commit (in linux-tip) c2931e05
       ( ftrace: return an error when setting a nonexistent tracer )
      added useful code that would error when a bad tracer was written into
      the current_tracer file.
      
      But this had a bug if the amount written was more than the amount read by
      that code. The first iteration would set the tracer correctly, but since
      it did not consume the rest of what was written (usually whitespace), the
      userspace utility would continue to write what was not consumed. This
      second iteration would fail to find a tracer and return -EINVAL. Funny
      thing is that the tracer would have already been set.
      
      This patch just consumes all the data that is written to the file.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
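The failure mode described here, a write handler that reports fewer bytes consumed than were written so that userspace writes the remainder again, can be reproduced in a small C model. The tracer table, the `model_*` functions and the `consume_all` switch are all hypothetical; the sketch only mimics the behaviour the message describes and is not the ftrace code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of registered tracers and the currently selected one. */
static const char *model_tracers[] = { "nop", "function", "sched_switch" };
static const char *model_current = "nop";

/*
 * Model of the write handler. Returns the number of bytes consumed,
 * or -EINVAL if the name does not match a tracer.
 * consume_all == 0 models the bug: only the name itself is consumed.
 * consume_all == 1 models the fix: everything written is consumed.
 */
static long model_tracer_write(const char *buf, size_t count, int consume_all)
{
    char name[32];
    size_t len = 0;
    size_t i;

    assert(buf != NULL);

    /* Copy the tracer name, stopping at the first whitespace. */
    while (len < count && len < sizeof(name) - 1 &&
           buf[len] != ' ' && buf[len] != '\n') {
        name[len] = buf[len];
        len++;
    }
    name[len] = '\0';

    for (i = 0; i < sizeof(model_tracers) / sizeof(model_tracers[0]); i++) {
        if (strcmp(name, model_tracers[i]) == 0) {
            model_current = model_tracers[i];
            return (long)(consume_all ? count : len);
        }
    }
    return -EINVAL;
}

/* Model of a userspace utility that retries until all bytes are accepted. */
static long model_write_all(const char *buf, size_t count, int consume_all)
{
    size_t off = 0;

    while (off < count) {
        long n = model_tracer_write(buf + off, count - off, consume_all);

        if (n < 0)
            return n;
        off += (size_t)n;
    }
    return (long)count;
}
```

In the buggy mode, writing "function\n" sets the tracer on the first pass and then fails with -EINVAL when the leftover newline is written, which matches the "funny thing" noted in the message.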
    • lockdep: fix irqs on/off ip tracing · 6afe40b4
      Heiko Carstens committed
      Impact: fix lockdep lock-api-caller output when irqsoff tracing is enabled
      
      81d68a96 "ftrace: trace irq disabled critical timings" added wrappers around
      trace_hardirqs_on/off_caller. However these functions use
      __builtin_return_address(0) to figure out which function actually disabled
      or enabled irqs. The result is that we save the ips of trace_hardirqs_on/off
      instead of the real caller. Not very helpful.
      
      However since the patch from Steven the ip already gets passed. So use that
      and get rid of __builtin_return_address(0) in these two functions.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: minor fix for debug_show_all_locks() · 46fec7ac
      qinghuang feng committed
      When we failed to get tasklist_lock eventually (count equals 0),
      we should only print " ignoring it.\n", and not print
      " locked it.\n" needlessly.
      Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: fix a build error on alpha · 21798a84
      Frederic Weisbecker committed
      Impact: build fix on Alpha
      
      When tracing is enabled, some architectures include <linux/irqflags.h>
      in their <asm/system.h> but others, like alpha or m68k, don't.
      
      Build error on alpha:
      
      kernel/trace/trace.c: In function 'tracing_cpumask_write':
      kernel/trace/trace.c:2145: error: implicit declaration of function 'raw_local_irq_disable'
      kernel/trace/trace.c:2162: error: implicit declaration of function 'raw_local_irq_enable'
      
      Tested on Alpha through a cross-compiler (should correct a similar issue on m68k).
      Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>