1. 08 Mar 2011, 1 commit
    • unfuck proc_sysctl ->d_compare() · dfef6dcd
      Authored by Al Viro
      a) struct inode is not going to be freed under ->d_compare();
      however, the thing PROC_I(inode)->sysctl points to just might.
      Fortunately, it's enough to make freeing that sucker delayed,
      provided that we don't step on its ->unregistering, clear
      the pointer to it in PROC_I(inode) before dropping the reference
      and check if it's NULL in ->d_compare().
      
      b) I'm not sure that we *can* walk into NULL inode here (we recheck
      dentry->seq between verifying that it's still hashed / fetching
      dentry->d_inode and passing it to ->d_compare() and there are no
      negative hashed dentries in /proc/sys/*), but if we can walk into
      that, we really should not have ->d_compare() return 0 on it!
      That said, I really suspect that this check can simply be killed.
      Nick?
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      dfef6dcd
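
      A hedged sketch of the resulting comparison (assuming the 2.6.38-era
      ->d_compare() signature; sysctl_is_seen() is that era's check against
      ->unregistering):

          static int proc_sys_compare(const struct dentry *parent,
                  const struct inode *pinode,
                  const struct dentry *dentry, const struct inode *inode,
                  unsigned int len, const char *str, const struct qstr *name)
          {
              struct ctl_table_header *head;

              /* RCU-walk may hand us a NULL inode; never report a match on it. */
              if (!inode)
                  return 1;
              if (name->len != len || memcmp(name->name, str, len))
                  return 1;
              /* The pointer is cleared before the reference is dropped and the
               * header itself is freed via RCU, so this dereference is safe. */
              head = rcu_dereference(PROC_I(inode)->sysctl);
              return !head || !sysctl_is_seen(head);
          }
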
  2. 03 Mar 2011, 1 commit
    • blktrace: Remove blk_fill_rwbs_rq. · 2d3a8497
      Authored by Tao Ma
      If we enable trace events to trace block actions, we use
      blk_fill_rwbs_rq to decode the corresponding actions from the
      request's cmd_flags, but it only picks the low 2 bits, so most of
      the other flags (e.g. REQ_SYNC) are missing.
      For example, with a sync write we get:
      write_test-2409  [001]   160.013869: block_rq_insert: 3,64 W 0 () 258135 + 8 [write_test]
      
      Now that the flags of bio and request have been unified, it is safe
      to pass rq->cmd_flags directly to blk_fill_rwbs, and blk_fill_rwbs_rq
      is no longer needed.
      
      With this patch, after a sync write we get:
      write_test-2417  [000]   226.603878: block_rq_insert: 3,64 WS 0 () 258135 + 8 [write_test]
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      2d3a8497
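
      A hedged sketch of the call-site change (blk_fill_rwbs() in this era
      takes the flags word and a byte count):

          /* Before: the wrapper dropped most of rq->cmd_flags. */
          blk_fill_rwbs_rq(rwbs, rq);

          /* After: pass the unified flags straight through, so e.g. REQ_SYNC
           * survives into the "WS" shown in the trace above. */
          blk_fill_rwbs(rwbs, rq->cmd_flags, blk_rq_bytes(rq));
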
  3. 26 Feb 2011, 1 commit
    • clockevents: Prevent oneshot mode when broadcast device is periodic · 3a142a06
      Authored by Thomas Gleixner
      When the per-cpu timer is marked CLOCK_EVT_FEAT_C3STOP, we can only
      switch into oneshot mode when the backup broadcast device supports
      oneshot mode as well. Otherwise we would unconditionally try to
      switch the broadcast device into an unsupported mode. This went
      unnoticed so far because the currently available broadcast devices
      support oneshot mode. Seth unearthed this problem while debugging and
      working around HPET-related BIOS wreckage.
      
      Add the necessary check to tick_is_oneshot_available().
      Reported-and-tested-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <alpine.LFD.2.00.1102252231200.2701@localhost6.localdomain6>
      Cc: stable@kernel.org # .21 ->
      3a142a06
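
      A hedged sketch of the added check (shape of the 2.6.38-era function;
      tick_broadcast_oneshot_available() is the broadcast-side query
      introduced by this commit):

          int tick_is_oneshot_available(void)
          {
              struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);

              if (!dev || !(dev->features & CLOCK_EVT_FEAT_ONESHOT))
                  return 0;
              if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
                  return 1;
              /* The per-cpu timer stops in deep C-states: the broadcast device
               * must be able to take over in oneshot mode, or we stay periodic. */
              return tick_broadcast_oneshot_available();
          }
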
  4. 19 Feb 2011, 2 commits
    • genirq: Disable the SHIRQ_DEBUG call in request_threaded_irq for now · 6d83f94d
      Authored by Thomas Gleixner
      With CONFIG_SHIRQ_DEBUG=y we call a newly installed interrupt handler
      in request_threaded_irq().
      
      The original implementation (commit a304e1b8) called the handler
      _BEFORE_ it was installed, but that caused problems with handlers
      calling disable_irq_nosync(). See commit 377bf1e4.
      
      It's braindead in the first place to call disable_irq_nosync in shared
      handlers, but ....
      
      Moving this call to after the handler is installed looks innocent,
      but it is subtly broken on SMP.
      
      Interrupt handlers rely on the fact that the irq core prevents
      reentrancy.
      
      Now this debug call violates that promise because we run the handler
      w/o the IRQ_INPROGRESS protection - which we cannot apply here because
      that would result in a possibly forever masked interrupt line.
      
      A concurrent real hardware interrupt on a different CPU results in
      handler reentrancy and can lead to complete wreckage, which was
      unfortunately observed in reality and took a fricking long time to
      debug.
      
      Leave the code here for now. We want this debug feature, but that's
      not easy to fix. We really should get rid of those
      disable_irq_nosync() abusers and remove that function completely.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Anton Vorontsov <avorontsov@ru.mvista.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: stable@kernel.org # .28 -> .37
      6d83f94d
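
      A hedged sketch of the debug block in question and how it gets parked
      (the renamed guard symbol is illustrative; the point is that it is
      never selectable, so the block compiles out):

          #ifdef CONFIG_DEBUG_SHIRQ_FIXME        /* was CONFIG_DEBUG_SHIRQ */
              if (!retval && (irqflags & IRQF_SHARED)) {
                  unsigned long flags;

                  disable_irq(irq);
                  local_irq_save(flags);
                  /* Runs the handler without IRQ_INPROGRESS protection: a real
                   * interrupt on another CPU can re-enter the handler. */
                  handler(irq, dev_id);
                  local_irq_restore(flags);
                  enable_irq(irq);
              }
          #endif
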
    • genirq: Prevent access beyond allocated_irqs bitmap · c1ee6264
      Authored by Thomas Gleixner
      Lars-Peter Clausen pointed out:
      
         I stumbled upon this while looking through the existing archs using
         SPARSE_IRQ.  Even with SPARSE_IRQ the NR_IRQS is still the upper
         limit for the number of IRQs.
      
         Both PXA and MMP set NR_IRQS to IRQ_BOARD_START, with
         IRQ_BOARD_START being the number of IRQs used by the core.
      
         In various machine files the nr_irqs field of the ARM machine
         definition struct is then set to "IRQ_BOARD_START + NR_BOARD_IRQS".

         As a result "nr_irqs" will be greater than NR_IRQS, which in turn
         causes the "allocated_irqs" bitmap in the core irq code to be
         accessed beyond its size, overwriting unrelated data.
      
      The core code really misses a sanity check there.
      
      This went unnoticed so far because, by chance, the compiler/linker
      places data behind that bitmap which only gets initialized later on
      the affected platforms.
      
      So the obvious fix would be to add a sanity check in early_irq_init()
      and break all affected platforms. But that check would want to be
      backported to stable as well, which would require fixing all known
      problematic platforms first, and probably some not-yet-known ones
      too. Lots of churn.
      
      A way simpler solution is to allocate a slightly larger bitmap and
      avoid the whole churn w/o breaking anything. Add a few warnings when
      an arch returns utter crap.
      Reported-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org # .37
      Cc: Haojian Zhuang <haojian.zhuang@marvell.com>
      Cc: Eric Miao <eric.y.miao@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      c1ee6264
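
      A hedged sketch of the over-allocation plus sanity warnings (the exact
      amount of slack is illustrative):

          /* Leave room for archs whose machine definitions overshoot NR_IRQS. */
          #define IRQ_BITMAP_BITS        (NR_IRQS + 8192)

          static DECLARE_BITMAP(allocated_irqs, IRQ_BITMAP_BITS);

          int __init early_irq_init(void)
          {
              int initcnt = arch_probe_nr_irqs();

              /* Warn when the arch returns utter crap, then clamp instead of
               * scribbling past the end of the bitmap. */
              if (WARN_ON(nr_irqs > IRQ_BITMAP_BITS))
                  nr_irqs = IRQ_BITMAP_BITS;
              if (WARN_ON(initcnt > IRQ_BITMAP_BITS))
                  initcnt = IRQ_BITMAP_BITS;

              /* ... descriptor allocation continues as before ... */
              return 0;
          }
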
  5. 17 Feb 2011, 3 commits
  6. 16 Feb 2011, 1 commit
    • perf: Fix throttle logic · 4fe757dd
      Authored by Peter Zijlstra
      It was possible to call pmu::start() on an already running event. In
      particular this led to some wreckage, as the hrtimer events would
      re-initialize active timers.

      This was due to throttled events being activated again by scheduling.
      Scheduling in a context would add and force-start events, resulting
      in running events that could still carry throttle status. The next
      tick to hit that task would then try to unthrottle the event and call
      ->start() on an already running event.
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4fe757dd
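
      A hedged sketch of one plausible shape of the fix: drop the throttle
      status when an event is (force-)scheduled in, so the tick never calls
      ->start() on an event that is already running (field names per the
      era's perf core; treat this as illustrative, not the exact hunk):

          /* In the sched-in path, before the event is added to the PMU: */
          struct hw_perf_event *hwc = &event->hw;

          if (unlikely(hwc->interrupts == MAX_INTERRUPTS)) {
              /* We were throttled while descheduled; unthrottle here instead
               * of leaving it to the next tick, which would otherwise see a
               * running event and ->start() it a second time. */
              perf_log_throttle(event, 1);
              hwc->interrupts = 0;
          }
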
  7. 14 Feb 2011, 1 commit
  8. 12 Feb 2011, 2 commits
    • timer debug: Hide kernel addresses via %pK in /proc/timer_list · f5903085
      Authored by Kees Cook
      In the continuing effort to avoid kernel addresses leaking to
      unprivileged users, this patch switches to %pK for
      /proc/timer_list reporting.
      Signed-off-by: Kees Cook <kees.cook@canonical.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Dan Rosenberg <drosenberg@vsecurity.com>
      Cc: Eugene Teo <eugeneteo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110212032125.GA23571@outflux.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f5903085
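
      An illustrative (hedged) use of the specifier; with %pK the printed
      address is censored for unprivileged readers, subject to the
      kptr_restrict sysctl:

          /* Was "%p", which leaked the kernel address to any reader of
           * /proc/timer_list. */
          SEQ_printf(m, "  .base:       %pK\n", base);
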
    • ptrace: use safer wake up on ptrace_detach() · 01e05e9a
      Authored by Tejun Heo
      The wake_up_process() call in ptrace_detach() is spurious and not
      interlocked with the tracee state.  IOW, the tracee could be running or
      sleeping in any place in the kernel by the time wake_up_process() is
      called.  This can lead to the tracee waking up unexpectedly which can be
      dangerous.
      
      The wake_up is spurious and should be removed but for now reduce its
      toxicity by only waking up if the tracee is in TRACED or STOPPED state.
      
      This bug can possibly be used as an attack vector.  I don't think it
      will take too much effort to come up with an attack which triggers oops
      somewhere.  Most sleeps are wrapped in condition test loops and should
      be safe but we have quite a number of places where sleep and wakeup
      conditions are expected to be interlocked.  Although the window of
      opportunity is tiny, ptrace can be used by non-privileged users and with
      some loading the window can definitely be extended and exploited.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Roland McGrath <roland@redhat.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01e05e9a
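
      A hedged sketch of the narrowed wakeup; wake_up_state() only wakes a
      task currently sleeping in one of the given states:

          /* Was: wake_up_process(child); -- could kick a tracee sleeping
           * anywhere in the kernel. Only resume it if it is actually
           * stopped or traced. */
          wake_up_state(child, TASK_TRACED | TASK_STOPPED);
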
  9. 11 Feb 2011, 2 commits
  10. 10 Feb 2011, 1 commit
  11. 08 Feb 2011, 3 commits
  12. 04 Feb 2011, 1 commit
  13. 03 Feb 2011, 6 commits
    • tracing: Replace syscall_meta_data struct array with pointer array · 3d56e331
      Authored by Steven Rostedt
      Currently the syscall_meta structures for the syscall tracepoints are
      placed in the __syscall_metadata section, and at link time, the linker
      makes one large array of all these syscall metadata structures. On boot
      up, this array is read (much like the initcall sections) and the syscall
      data is processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are supposed to be in an array.
      
      A hack was previously used to force the alignment to 4, to pack the
      structures together. But this caused alignment issues on other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures'
      addresses are now put into the __syscall_metadata section. Since
      pointers always have natural alignment, gcc should always pack them
      tightly together (otherwise initcall, extable, etc. would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the entries without causing unnecessary alignment problems on
      other architectures, or depending on current gcc behaviour that will
      likely change in the future just to tick us kernel developers off a
      little more.
      
      The __syscall_metadata section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: David Miller <davem@davemloft.net>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      3d56e331
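
      A hedged sketch of the pointer-in-section technique (identifiers are
      abbreviated from the real SYSCALL_METADATA macro machinery):

          /* The structure itself lives wherever gcc puts it... */
          static struct syscall_metadata meta_sys_foo = {
              .name    = "sys_foo",
              .nb_args = 1,
          };

          /* ...but only its naturally aligned address goes into the section,
           * so the linker-built array of pointers has no padding surprises. */
          static struct syscall_metadata *p_meta_sys_foo
              __attribute__((section("__syscall_metadata"), used)) = &meta_sys_foo;
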
    • tracepoints: Fix section alignment using pointer array · 65498646
      Authored by Mathieu Desnoyers
      Make the tracepoints more robust, solid enough to survive compiler
      changes, by not relying on compiler-specific structure-alignment
      behavior. Implement an approach proposed by David Miller: use an
      array of const pointers to refer to the individual structures, and
      export this pointer array through the linker script rather than the
      structures per se. It consumes 32 extra bytes per tracepoint (24 for
      structure padding and 8 for the pointer), but is less likely to break
      due to compiler changes.
      
      History:
      
      commit 7e066fb8 ("tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()")
      added the aligned(32) type and variable attribute to the tracepoint
      structures to deal with gcc happily aligning statically defined
      structures on 32-byte multiples.
      
      One attempt was to use an 8-byte alignment for tracepoint structures
      by applying both the variable and type attribute to tracepoint
      structure definitions and declarations. It worked fine with gcc 4.5.1,
      but broke with gcc 4.4.4 and 4.4.5.
      
      The reason is that the "aligned" attribute only specifies the
      _minimum_ alignment for a structure, leaving both the compiler and
      the linker free to align on larger multiples. Because tracepoint.c
      expects the structures to be placed as an array within each section,
      up-alignment causes NULL-pointer exceptions due to the extra,
      unexpected padding.
      
      (this patch applies on top of -tip)
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      LKML-Reference: <20110126222622.GA10794@Krystal>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      65498646
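
      A hedged sketch of the consumer side: the linker script exports
      start/stop symbols around the pointer array, and tracepoint.c walks
      pointers instead of assuming a contiguous struct array:

          extern struct tracepoint * const __start___tracepoints_ptrs[];
          extern struct tracepoint * const __stop___tracepoints_ptrs[];

          static void for_each_tracepoint(void (*fct)(struct tracepoint *tp))
          {
              struct tracepoint * const *iter;

              for (iter = __start___tracepoints_ptrs;
                   iter < __stop___tracepoints_ptrs; iter++)
                  fct(*iter);    /* one dereference; no alignment assumptions */
          }
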
    • sched: Fix update_curr_rt() · 06c3bc65
      Authored by Peter Zijlstra
      cpu_stopper_thread()
        migration_cpu_stop()
          __migrate_task()
            deactivate_task()
              dequeue_task()
                dequeue_task_rq()
                  update_curr_rt()
      
      This will call update_curr_rt() on rq->curr, which at that time is
      rq->stop. The problem is that rq->stop's prio matches an RT prio and
      thus the code falsely assumes it's an rt_sched_class task.
      Reported-Debugged-Tested-Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Cc: stable@kernel.org # .37
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      06c3bc65
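
      A hedged sketch of the added guard; checking the class rather than the
      priority is the point, since rq->stop's prio lies in the RT range:

          static void update_curr_rt(struct rq *rq)
          {
              struct task_struct *curr = rq->curr;

              /* rq->curr may be the stop task here; its prio looks RT but it
               * is not an rt_sched_class task. */
              if (curr->sched_class != &rt_sched_class)
                  return;

              /* ... RT runtime accounting as before ... */
          }
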
    • perf: Fix reading in perf_event_read() · 542e72fc
      Authored by Peter Zijlstra
      It is quite possible for the event to have been disabled between
      perf_event_read() sending the IPI and the CPU servicing the IPI and
      calling __perf_event_read(), hence revalidate the state.
      Reported-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      542e72fc
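
      A hedged sketch of the revalidation inside the IPI handler (the
      cross-context early return is elided):

          static void __perf_event_read(void *info)
          {
              struct perf_event *event = info;
              struct perf_event_context *ctx = event->ctx;

              raw_spin_lock(&ctx->lock);
              if (ctx->is_active)
                  update_context_time(ctx);
              update_event_times(event);
              /* The event may have been disabled between sending the IPI and
               * running here; only touch the PMU if it is still active. */
              if (event->state == PERF_EVENT_STATE_ACTIVE)
                  event->pmu->read(event);
              raw_spin_unlock(&ctx->lock);
          }
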
    • tracing: Replace trace_event struct array with pointer array · e4a9ea5e
      Authored by Steven Rostedt
      Currently the trace_event structures are placed in the _ftrace_events
      section, and at link time, the linker makes one large array of all
      the trace_event structures. On boot up, this array is read (much like
      the initcall sections) and the events are processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are supposed to be in an array.
      
      A hack was previously used to force the alignment to 4, to pack the
      structures together. But this caused alignment issues on other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures'
      addresses are now put into the _ftrace_event section. Since pointers
      always have natural alignment, gcc should always pack them tightly
      together (otherwise initcall, extable, etc. would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the trace_events without causing unnecessary alignment
      problems on other architectures, or depending on current gcc
      behaviour that will likely change in the future just to tick us
      kernel developers off a little more.
      
      The _ftrace_event section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: David Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e4a9ea5e
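
      A hedged sketch of the linker-script side (following the
      include/asm-generic/vmlinux.lds.h conventions): the pointer section is
      collected between start/stop symbols and placed in .init.data, since
      it is only read at boot:

          #define FTRACE_EVENTS()    . = ALIGN(8);                          \
                  VMLINUX_SYMBOL(__start_ftrace_events) = .;                \
                  *(_ftrace_events)                                         \
                  VMLINUX_SYMBOL(__stop_ftrace_events) = .;

          /* FTRACE_EVENTS() is then expanded inside the INIT_DATA region. */
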
    • genirq: Prevent irq storm on migration · f1a06390
      Authored by Thomas Gleixner
      move_native_irq() masks and unmasks the interrupt line
      unconditionally, but the interrupt line might be masked due to a
      threaded oneshot handler in progress. Unmasking the line in that case
      can lead to interrupt storms. Observed on PREEMPT_RT.
      
      Originally-from: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
      f1a06390
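
      A hedged sketch of the fixed masking protocol (2.6.38-era irq_data and
      chip callbacks):

          void move_native_irq(int irq)
          {
              struct irq_desc *desc = irq_to_desc(irq);
              bool masked;

              if (likely(!(desc->status & IRQ_MOVE_PENDING)))
                  return;
              if (unlikely(desc->status & IRQ_DISABLED))
                  return;

              /* If a threaded oneshot handler still has the line masked,
               * leave it masked afterwards: blindly unmasking is what caused
               * the storm. */
              masked = desc->status & IRQ_MASKED;
              if (!masked)
                  desc->irq_data.chip->irq_mask(&desc->irq_data);
              move_masked_irq(irq);
              if (!masked)
                  desc->irq_data.chip->irq_unmask(&desc->irq_data);
          }
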
  14. 31 Jan 2011, 4 commits
  15. 28 Jan 2011, 1 commit
    • perf: Fix alloc_callchain_buffers() · 88d4f0db
      Authored by Eric Dumazet
      Commit 927c7a9e ("perf: Fix race in callchains") introduced
      a mismatch in the sizing of struct callchain_cpus_entries.
      
      nr_cpu_ids must be used instead of num_possible_cpus(), or we might
      get out-of-bounds memory accesses on some machines.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Stephane Eranian <eranian@google.com>
      CC: stable@kernel.org
      LKML-Reference: <1295980851.3588.351.camel@edumazet-laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      88d4f0db
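
      A hedged sketch of the sizing fix: CPU numbering may be sparse, so the
      flexible array must span nr_cpu_ids slots (highest possible CPU id
      plus one), not merely the count of possible CPUs:

          struct callchain_cpus_entries {
              struct rcu_head                  rcu_head;
              struct perf_callchain_entry     *cpu_entries[0];
          };

          /* Was sized with num_possible_cpus(); indexing by smp_processor_id()
           * could then run past the allocation on machines with sparse ids. */
          size = offsetof(struct callchain_cpus_entries, cpu_entries[nr_cpu_ids]);
          entries = kzalloc(size, GFP_KERNEL);
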
  16. 26 Jan 2011, 4 commits
  17. 25 Jan 2011, 1 commit
  18. 24 Jan 2011, 2 commits
  19. 22 Jan 2011, 1 commit
    • perf: perf_event_exit_task_context: s/rcu_dereference/rcu_dereference_raw/ · 806839b2
      Authored by Oleg Nesterov
      In theory, almost every user of task->child->perf_event_ctxp[]
      is wrong. find_get_context() can install a new context at any
      moment; we need read_barrier_depends().
      
      dbe08d82 "perf: Fix
      find_get_context() vs perf_event_exit_task() race" added
      rcu_dereference() into perf_event_exit_task_context() to make
      the precedent, but this makes __rcu_dereference_check() unhappy.
      Use rcu_dereference_raw() to shut up the warning.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: acme@redhat.com
      Cc: paulus@samba.org
      Cc: stern@rowland.harvard.edu
      Cc: a.p.zijlstra@chello.nl
      Cc: fweisbec@gmail.com
      Cc: roland@redhat.com
      Cc: prasad@linux.vnet.ibm.com
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <20110121174547.GA8796@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      806839b2
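
      A hedged sketch of the one-line change in perf_event_exit_task_context():

          /* No rcu_read_lock() is held here; the task is exiting and the race
           * with find_get_context() is handled elsewhere, so state the intent
           * explicitly instead of tripping __rcu_dereference_check(). */
          child_ctx = rcu_dereference_raw(child->perf_event_ctxp[ctxn]);
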
  20. 21 Jan 2011, 2 commits
    • perf: Annotate cpuctx->ctx.mutex to avoid a lockdep splat · 547e9fd7
      Authored by Peter Zijlstra
      Lockdep spotted:
      
      	loop_1b_instruc/1899 is trying to acquire lock:
      	 (event_mutex){+.+.+.}, at: [<ffffffff810e1908>] perf_trace_init+0x3b/0x2f7
      
      	but task is already holding lock:
      	 (&ctx->mutex){+.+.+.}, at: [<ffffffff810eb45b>] perf_event_init_context+0xc0/0x218
      
      	which lock already depends on the new lock.
      
      	the existing dependency chain (in reverse order) is:
      
      	-> #3 (&ctx->mutex){+.+.+.}:
      	-> #2 (cpu_hotplug.lock){+.+.+.}:
      	-> #1 (module_mutex){+.+...}:
      	-> #0 (event_mutex){+.+.+.}:
      
      But because the deadlock would be cpuhotplug (cpu-event) vs fork
      (task-event) it cannot, in fact, happen. We can annotate this by giving the
      perf_event_context used for the cpuctx a different lock class from those
      used by tasks.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      547e9fd7
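
      A hedged sketch of the annotation: a dedicated lock_class_key for the
      per-CPU context's mutex separates it from the class used by task
      contexts:

          static struct lock_class_key cpuctx_mutex;

          /* cpuctx->ctx.mutex cannot participate in the hotplug-vs-fork chain
           * the way a task context's mutex can; give it its own class so
           * lockdep stops conflating the two. */
          __perf_event_init_context(&cpuctx->ctx);
          lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
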
    • genirq: Remove __do_IRQ · 1c77ff22
      Authored by Thomas Gleixner
      All architectures are finally converted. Remove the cruft.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      1c77ff22