1. 21 October 2010, 3 commits
    • ring-buffer: Remove condition to add timestamp in fast path · 140ff891
      Committed by Steven Rostedt
      There's a condition in the fast path that checks whether we should
      add a time extend. But this condition is racy (only in the sense
      that we can add an unnecessary time extend; nothing can break). We
      later check, race-free, whether the event's time delta should be
      zero or hold real data, making this first check redundant.
      
      This check may save a little space once in a while, but it is
      really not worth the hassle of trying to save space that is
      consumed at most once every ~134 ms.
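      In rough terms, the removed fast-path test looked something like
      the following hedged sketch (illustrative names, not the actual
      ring_buffer.c code):

        /* Racy but harmless: would the 27-bit delta field overflow
         * and need a time extend event? */
        if (delta > (1ULL << 27) - 1)
                add_timestamp = 1;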
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Bind time extend and data events together · 69d1b839
      Committed by Steven Rostedt
      When the time between two timestamps is greater than 2^27
      nanoseconds (~134 ms), a time extend event is added that extends
      the time delta to 59 bits (~18 years). This is needed because
      events only have a 27-bit field to store the time delta.
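      The figures quoted above are easy to verify with a small,
      self-contained C program (plain user space, not kernel code):

        #include <stdio.h>

        int main(void)
        {
                unsigned long long d27 = 1ULL << 27; /* max 27-bit delta, ns */
                unsigned long long d59 = 1ULL << 59; /* max 59-bit delta, ns */

                printf("2^27 ns = %llu ns ~= %.0f ms\n", d27, d27 / 1e6);
                printf("2^59 ns ~= %.1f years\n",
                       d59 / 1e9 / 3600 / 24 / 365);
                return 0;
        }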
      
      Currently this time extend is a separate event. We add it just
      before the event data that is being written to the buffer. But the
      event data can still be discarded before it is committed (as in the
      case of filters), and because the time extend has already been
      committed, it stays in the buffer.
      
      If lots of events are being filtered and no event is being
      written, then every ~134 ms a time extend can be added to the
      buffer without any data attached. To keep time extends from
      filling the entire buffer, a time extend is never made the first
      event on a page, because the page timestamp can be used instead.
      Time extends can only fill the rest of a page that has some data
      at its beginning.
      
      This patch binds the time extend with the data. The difference here
      is that the time extend is not committed before the data is added.
      Instead, when a time extend is needed, the space reserved on
      the ring buffer is the time extend + the data event size. The
      time extend is added to the first part of the reserved block and
      the data is added to the second. The time extend event is passed
      back to the reserver, but since the reserver also uses a function
      to find the data portion of the reserved block, no changes to the
      ring buffer interface need to be made.
      
      When a commit is discarded, we now remove both the time extend and
      the event. With this approach, time extends can never appear back
      to back in the buffer: data must always follow a time extend.
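      A hedged sketch of the combined reservation described above
      (illustrative names; the real ring_buffer.c code differs in
      detail):

        /* Reserve one block covering the optional time extend plus data. */
        length = data_length;
        if (add_timestamp)
                length += RB_LEN_TIME_EXTEND;   /* extend event, 8 bytes */

        event = __rb_reserve_next(cpu_buffer, length);
        if (add_timestamp)
                /* Fill the first part with the extend and hand back a
                 * pointer to the data part, so callers are unchanged. */
                event = rb_add_time_stamp(event, delta);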
      
      Thanks to Mathieu Desnoyers for suggesting this idea.
      Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Pass delta by value and not by reference · f25106ae
      Committed by Steven Rostedt
      The delta between events is passed to the timestamp code by
      reference, and the timestamp code resets the value. But the caller
      can just as well reset it itself, so there is no need to pass it
      by reference.

      Changing the call to pass by value lets gcc optimize the code a
      bit more: it can keep the delta in a register and does not have to
      worry about writing back through the reference.
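      A hedged before/after sketch of the signature change (hypothetical
      helper name):

        /* Before: the callee writes the reset value back through
         * the pointer. */
        static int rb_handle_timestamp(struct ring_buffer_per_cpu *cpu_buffer,
                                       u64 *delta);

        /* After: the callee gets a copy, which gcc can keep in
         * a register. */
        static int rb_handle_timestamp(struct ring_buffer_per_cpu *cpu_buffer,
                                       u64 delta);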
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 20 October 2010, 2 commits
    • ring-buffer: Pass timestamp by value and not by reference · e8bc43e8
      Committed by Steven Rostedt
      The original ring buffer code had locations that modified the
      timestamp, and that change was used by the callers. Now the
      timestamp is not reused by the callers, so there is no reason to
      pass it by reference.

      Changing the call to pass by value lets gcc optimize the code a
      bit more: it can keep the timestamp in a register and does not
      have to worry about writing back through the reference.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Make write slow path out of line · 747e94ae
      Committed by Steven Rostedt
      Gcc inlines the slow path of the ring buffer write, which can hurt
      performance. This patch simply forces the slow-path function
      rb_move_tail() to always stay out of line.
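      A hedged sketch of the annotation involved (declaration shape is
      illustrative, not the exact ring_buffer.c prototype):

        /* Keep the cold tail-page handling out of the hot write path. */
        static noinline struct ring_buffer_event *
        rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
                     unsigned long length, unsigned long tail);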
      
      The ring_buffer_benchmark module with reader_disabled=1 shows that
      this patch changes the time to record an event from 135 ns to
      132 ns (a 3 ns, or 2.22%, improvement).
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 16 October 2010, 1 commit
  4. 14 October 2010, 1 commit
    • hrtimer: Preserve timer state in remove_hrtimer() · f13d4f97
      Committed by Salman Qazi
      The race is described as follows:
      
      CPU X                                 CPU Y
      remove_hrtimer
      // state & QUEUED == 0
      timer->state = CALLBACK
      unlock timer base
      timer->fn() // very long
                                        hrtimer_start
                                          lock timer base
                                          remove_hrtimer // no effect
                                          hrtimer_enqueue
                                          timer->state = CALLBACK |
                                                         QUEUED
                                          unlock timer base
                                        hrtimer_start
                                          lock timer base
                                          remove_hrtimer
                                              mode = INACTIVE
                                              // CALLBACK bit lost!
                                          switch_hrtimer_base
                                                  CALLBACK bit not set:
                                                          timer->base
                                                          changes to a
                                                          different CPU.
      lock this CPU's timer base
      
      The bug was introduced with commit ca109491 ("hrtimer: removing
      all ur callback modes") in 2.6.29.
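      A hedged sketch of the fix's shape (close to the actual change;
      see the tglx note below):

        /*
         * Preserve the CALLBACK bit, otherwise a concurrent
         * switch_hrtimer_base() can move the timer base.
         */
        unsigned long state = timer->state & HRTIMER_STATE_CALLBACK;
        __remove_hrtimer(timer, base, state, reprogram);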
      
      [ tglx: Feed new state via local variable and add a comment. ]
      Signed-off-by: Salman Qazi <sqazi@google.com>
      Cc: akpm@linux-foundation.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20101012142351.8485.21823.stgit@dungbeetle.mtv.corp.google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
  5. 13 October 2010, 1 commit
    • ring-buffer: Fix typo of time extends per page · d0134324
      Committed by Steven Rostedt
      Timestamps for the ring buffer are created from the difference
      between two events. Each page of the ring buffer holds a full
      64-bit timestamp. Each event has a 27-bit delta stamp from the
      previous event. The unit of time is nanoseconds, so 27 bits can
      hold ~134 milliseconds. If two events happen more than
      134 milliseconds apart, a time extend is inserted to add more bits
      to the delta. The time extend has 59 bits, which is good for
      ~18 years.
      
      Currently the time extend is committed separately from the event.
      If an event is discarded before it is committed, due to filtering,
      the time extend still exists. If all events are being filtered, then
      after ~134 milliseconds a new time extend will be added to the buffer.
      
      This can only happen until the end of the page is reached. Since
      each page holds a full timestamp, there is no reason to add a time
      extend at the beginning of a page. Time extends can only fill a
      page that has actual data at its beginning, so there is no fear
      that time extends will fill more than a page without any data.
      
      When reading an event, a loop is made to skip over time extends,
      since they are only used to maintain the timestamp and are never
      handed to the caller. As a paranoid check to keep the loop from
      running forever, and knowing that time extends can fill at most a
      page, the loop's iterations are counted; if the count exceeds the
      number of time extends that can fit in a page, a warning is
      printed and the ring buffer is disabled (taking all of ftrace down
      with it).
      
      There is another event type, called a TIMESTAMP, which could hold
      64 bits of data for the theoretical case of two events happening
      18 years apart. This code has not been implemented, but the name
      of this event exists, as does its structure. The size of a
      TIMESTAMP is 16 bytes, whereas a time extend is only 8 bytes. The
      macro that calculates how many time extends fit on a page used the
      TIMESTAMP size instead of the time extend size, cutting the count
      in half.
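      A hedged sketch of the macro bug (constant names are illustrative;
      only the 8 vs. 16 byte sizes come from the text above):

        #define RB_LEN_TIME_EXTEND 8    /* size of a time extend event */
        #define RB_LEN_TIME_STAMP  16   /* size of a TIMESTAMP event   */

        /* Buggy, halving the real count and triggering false warnings:
         *   #define TIME_EXTEND_PER_PAGE (BUF_PAGE_SIZE / RB_LEN_TIME_STAMP)
         * Fixed: */
        #define TIME_EXTEND_PER_PAGE (BUF_PAGE_SIZE / RB_LEN_TIME_EXTEND)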
      
      The following test case can easily trigger the warning since we only
      need to have half the page filled with time extends to trigger the
      warning:
      
       # cd /sys/kernel/debug/tracing/
       # echo function > current_tracer
       # echo 'common_pid < 0' > events/ftrace/function/filter
       # echo > trace
       # echo 1 > trace_marker
       # sleep 120
       # cat trace
      
      Enable the function tracer, then set the filter to trace only
      functions whose process id is negative (i.e. no events), clear the
      trace buffer to ensure it is empty, write to trace_marker to add
      an event at the beginning of a page, sleep for two minutes
      (35 seconds is probably enough, but this guarantees the bug), and
      finally read the trace, which triggers the bug.
      
      This patch fixes the typo and prevents the false positive of that warning.
      Reported-by: Hans J. Koch <hjk@linutronix.de>
      Tested-by: Hans J. Koch <hjk@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stable Kernel <stable@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  6. 12 October 2010, 1 commit
  7. 08 October 2010, 1 commit
  8. 07 October 2010, 1 commit
    • HWPOISON: Copy si_addr_lsb to user · a337fdac
      Committed by Andi Kleen
      The original hwpoison code added a new siginfo field, si_addr_lsb,
      to pass the granularity of the fault address to user space.
      Unfortunately this field was never copied to user space. Fix this
      here.

      I added explicit checks for the MCEERR codes to avoid having to
      patch all potential callers to initialize the field.
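      A hedged sketch of what such a copy looks like (modeled on the
      arch-specific copy_siginfo_to_user() helpers; details vary per
      architecture):

        err |= __put_user(from->si_addr, &to->si_addr);
        /* Only the machine-check error codes carry a meaningful lsb. */
        if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
                err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);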
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  9. 06 October 2010, 1 commit
    • modules: Fix module_bug_list list corruption race · 5336377d
      Committed by Linus Torvalds
      With all the recent module loading cleanups, we've minimized the code
      that sits under module_mutex, fixing various deadlocks and making it
      possible to do most of the module loading in parallel.
      
      However, that whole conversion totally missed the rather obscure
      code that adds a new module to the list for BUG() handling.  That
      code was doubly obscure because (a) the code itself lives in
      lib/bug.c (for dubious reasons) and (b) it gets called from the
      architecture-specific "module_finalize()" rather than from generic
      code.
      
      Calling it from arch-specific code makes no sense whatsoever to
      begin with, and is now actively wrong since that code isn't
      protected by the module loading lock any more.
      
      So this commit moves the "module_bug_{finalize,cleanup}()" calls
      away from the arch-specific code and into the generic code, and in
      the process protects them with the module_mutex so that the list
      operations are now safe.
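      A hedged sketch of where the calls land after the move (generic
      load_module() path; exact placement is illustrative):

        mutex_lock(&module_mutex);
        /* The BUG-list update now happens under the module loading lock. */
        module_bug_finalize(info.hdr, info.sechdrs, mod);
        list_add_rcu(&mod->list, &modules);
        mutex_unlock(&module_mutex);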
      
      Future fixups:
       - move the module list handling code into kernel/module.c where it
         belongs.
       - get rid of 'module_bug_list' and just use the regular list of modules
         (called 'modules' - imagine that) that we already create and maintain
         for other reasons.
      Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Adrian Bunk <bunk@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 02 October 2010, 1 commit
    • kfifo: fix scatterlist usage · 399f1e30
      Committed by Ira W. Snyder
      The kfifo_dma family of functions use sg_mark_end() on the last element in
      their scatterlist.  This forces use of a fresh scatterlist for each DMA
      operation, which makes recycling a single scatterlist impossible.
      
      Change the behavior of the kfifo_dma functions to match the usage of the
      dma_map_sg function.  This means that users must respect the returned
      nents value.  The sample code is updated to reflect the change.
      
      This bug is trivial to cause: call kfifo_dma_in_prepare() such that it
      prepares a scatterlist with a single entry comprising the whole fifo.
      This is the case when you map the entirety of a newly created empty fifo.
      This causes the setup_sgl() function to mark the first scatterlist entry
      as the end of the chain, no matter what comes after it.
      
      Afterwards, add and remove some data from the fifo such that another call
      to kfifo_dma_in_prepare() will create two scatterlist entries.  It returns
      nents=2.  However, due to the previous sg_mark_end() call, sg_is_last()
      will now return true for the first scatterlist element.  This causes the
      sample code to print a single scatterlist element when it should print
      two.
      
      By removing the call to sg_mark_end(), we make the API as similar as
      possible to the DMA mapping API.  All users are required to respect the
      returned nents.
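      A hedged usage sketch of the corrected convention (driver-side;
      process_sg_entry() is a hypothetical helper):

        struct scatterlist sg[2];
        unsigned int nents, i;

        sg_init_table(sg, ARRAY_SIZE(sg));
        nents = kfifo_dma_in_prepare(&fifo, sg, ARRAY_SIZE(sg), len);

        /* Respect nents, as with dma_map_sg(); don't trust sg_is_last(). */
        for (i = 0; i < nents; i++)
                process_sg_entry(&sg[i]);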
      Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
      Cc: Stefani Seibold <stefani@seibold.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 23 September 2010, 1 commit
  12. 21 September 2010, 1 commit
    • sched: Fix nohz balance kick · f6c3f168
      Committed by Suresh Siddha
      There's a situation where the nohz balancer will try to wake
      itself:

      cpu-x is idle and is also the ilb_cpu. It gets a scheduler tick
      during idle, and the nohz_kick_needed() check in
      trigger_load_balance() looks at rq_x->nr_running, which might not
      be zero (because someone woke a task on this rq, etc.). This leads
      to cpu-x sending a kick to itself.

      And this can cause a lockup.

      Avoid this by not marking ourselves eligible for kicking.
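      A hedged sketch of the guard (illustrative only; the actual fix
      adjusts how the ilb cpu is chosen and validated):

        int ilb_cpu = find_new_ilb(cpu);        /* hypothetical selector */

        if (ilb_cpu == smp_processor_id())
                return;                         /* never kick ourselves */
        smp_send_reschedule(ilb_cpu);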
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1284400941.2684.19.camel@sbsiddha-MOBL3.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 17 September 2010, 1 commit
  14. 15 September 2010, 2 commits
  15. 14 September 2010, 1 commit
  16. 13 September 2010, 1 commit
  17. 12 September 2010, 1 commit
    • PM / Hibernate: Avoid hitting OOM during preallocation of memory · 6715045d
      Committed by Rafael J. Wysocki
      There is a problem in hibernate_preallocate_memory() that it calls
      preallocate_image_memory() with an argument that may be greater than
      the total number of available non-highmem memory pages.  If that's
      the case, the OOM condition is guaranteed to trigger, which in turn
      can cause significant slowdown to occur during hibernation.
      
      To avoid that, make preallocate_image_memory() adjust its argument
      before calling preallocate_image_pages(), so that the total number
      of saveable non-highmem pages left is not less than the minimum
      size of a hibernation image.  Change hibernate_preallocate_memory()
      to try to allocate from highmem if the number of pages allocated
      by preallocate_image_memory() is too low.
      
      Modify free_unnecessary_pages() to take all possible memory
      allocation patterns into account.
      Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Tested-by: M. Vefa Bicakci <bicave@superonline.com>
  18. 11 September 2010, 1 commit
  19. 10 September 2010, 9 commits
    • generic-ipi: Fix deadlock in __smp_call_function_single · 27c379f7
      Committed by Heiko Carstens
      I just got my 6-way machine into a state where cpu 0 is stuck in
      an endless loop within __smp_call_function_single(). All other
      cpus are idle.
      
      The call trace on cpu 0 looks like this:
      
       __smp_call_function_single
       scheduler_tick
       update_process_times
       tick_sched_timer
       __run_hrtimer
       hrtimer_interrupt
       clock_comparator_work
       do_extint
       ext_int_handler
       ----> timer irq
       cpu_idle
      
      __smp_call_function_single() was called from nohz_balancer_kick()
      (inlined) with the remote cpu being 1, wait being 0, and the
      per-cpu variable remote_sched_softirq_cb (call_single_data) of the
      current cpu (0).

      It then loops forever trying to grab the lock of that
      call_single_data, since it is already locked and enqueued on
      cpu 0.
      
      My theory of how this could have happened: for some reason the
      scheduler decided to call __smp_call_function_single() on its own
      cpu and sent an IPI to itself. The interrupt stays pending since
      IRQs are disabled. If the hypervisor then schedules the cpu away,
      it can happen that both the IPI and the timer IRQ are pending when
      the cpu is rescheduled. Once interrupts are enabled again, it
      depends on which one gets delivered first. If the timer interrupt
      is delivered first, we end up with the local deadlock seen in the
      calltrace above.
      
      Let's make __smp_call_function_single() check whether the target
      cpu is the current cpu and, if so, execute the function
      immediately, just as smp_call_function_single() does. That should
      prevent at least the scenario described here.
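      A hedged sketch of the added check in __smp_call_function_single()
      (close to, but not verbatim, the actual fix):

        int this_cpu;
        unsigned long flags;

        this_cpu = get_cpu();
        if (cpu == this_cpu) {
                /* Run locally instead of IPI'ing ourselves with a csd
                 * that may already be locked and enqueued. */
                local_irq_save(flags);
                data->func(data->info);
                local_irq_restore(flags);
        } else {
                csd_lock(data);
                generic_exec_single(cpu, data, wait);
        }
        put_cpu();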
      
      It might also be that the scheduler is not supposed to call
      __smp_call_function_single with the remote cpu being the current
      cpu, but that is a different issue.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Jens Axboe <jaxboe@fusionio.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100910114729.GB2827@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread · df091625
      Committed by Chris Wright
      Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without
      having properly started the iterator to iterate the hash.  This case is
      degenerate and, as discovered by Robert Swiecki, can cause t_hash_show()
      to misuse a pointer.  This causes a NULL ptr deref with possible security
      implications.  Tracked as CVE-2010-3079.
      
      Cc: Robert Swiecki <swiecki@google.com>
      Cc: Eugene Teo <eugene@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • swap: revert special hibernation allocation · 910321ea
      Committed by Hugh Dickins
      Please revert 2.6.36-rc commit d2997b10 ("hibernation: freeze swap
      at hibernation").  It complicated matters by adding a second swap
      allocation path, just for hibernation, without in any way fixing
      the issue it was intended to address: page reclaim after fixing
      the hibernation image might free swap from a page already imaged
      as swapcache, letting its swap be reallocated to store a different
      page of the image, resulting in data corruption if the imaged page
      were freed as clean and then swapped back in.  Pages freed to
      si->swap_map were still in danger of being reallocated by the
      alternative allocation path.
      
      I guess it inadvertently fixed slow SSD swap allocation for
      hibernation, as reported by Nigel Cunningham, by skipping the
      discards that occur on the usual swap allocation path; but that
      was unintentional and needs a separate fix.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ondrej Zary <linux@rainbow-software.org>
      Cc: Andrea Gelmini <andrea.gelmini@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nigel Cunningham <nigel@tuxonice.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/groups.c: fix integer overflow in groups_search · 1c24de60
      Committed by Jerome Marchand
      gid_t is an unsigned int.  If group_info contains a gid greater
      than INT_MAX, the groups_search() function may look on the wrong
      side of the search tree.

      This solves some unfair "permission denied" problems.
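      A hedged sketch of the failure mode (shapes only; the fix compares
      the gids directly instead of subtracting):

        /* Buggy shape: the unsigned difference is truncated into a
         * signed int, so large gids can flip the comparison's sign. */
        int cmp = grp - GROUP_AT(group_info, mid);

        /* Fixed shape: compare as unsigned values. */
        gid_t x = GROUP_AT(group_info, mid);
        if (grp > x)
                left = mid + 1;
        else if (grp < x)
                right = mid;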
      Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: fix API thinko · 31583bb0
      Committed by Michael S. Tsirkin
      Add cgroup_attach_task_all()
      
      The existing cgroup_attach_task_current_cg() API is called by a thread to
      attach another thread to all of its cgroups; this is unsuitable for cases
      where a privileged task wants to attach itself to the cgroups of a less
      privileged one, since the call must be made from the context of the target
      task.
      
      This patch adds a more generic cgroup_attach_task_all() API that allows
      both the source task and to-be-moved task to be specified.
      cgroup_attach_task_current_cg() becomes a specialization of the more
      generic new function.
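      A hedged sketch of the resulting API shape (consistent with the
      changelog; the kernel's actual definitions may differ slightly):

        /* New, more general entry point: move tsk into all of
         * from's cgroups. */
        int cgroup_attach_task_all(struct task_struct *from,
                                   struct task_struct *tsk);

        /* The old API becomes a specialization: */
        int cgroup_attach_task_current_cg(struct task_struct *tsk)
        {
                return cgroup_attach_task_all(current, tsk);
        }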
      
      [menage@google.com: rewrote changelog]
      [akpm@linux-foundation.org: address reviewer comments]
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Ben Blum <bblum@google.com>
      Cc: Sridhar Samudrala <sri@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • gcov: fix null-pointer dereference for certain module types · 85a0fdfd
      Committed by Peter Oberparleiter
      The gcov-kernel infrastructure expects that each object file is loaded
      only once.  This may not be true, e.g.  when loading multiple kernel
      modules which are linked to the same object file.  As a result, loading
      such kernel modules will result in incorrect gcov results while unloading
      will cause a null-pointer dereference.
      
      This patch fixes these problems by changing the gcov-kernel infrastructure
      so that multiple profiling data sets can be associated with one debugfs
      entry.  It applies to 2.6.36-rc1.
      Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Reported-by: Werner Spies <werner.spies@thalesgroup.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched: Move sched_avg_update() to update_cpu_load() · da2b71ed
      Committed by Suresh Siddha
      Currently sched_avg_update() (which updates the rt_avg stats in
      the rq) is called from scale_rt_power() (in the load-balance
      context), which doesn't take rq->lock.

      Fix it by moving sched_avg_update() to the more appropriate
      update_cpu_load(), where the CFS load gets updated as well.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1282596171.2694.3.camel@sbsiddha-MOBL3>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Fix CPU hotplug · 5e11637e
      Committed by Peter Zijlstra
      Since we have UP_PREPARE, we should also have UP_CANCELED.
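      A hedged sketch of the symmetry being restored in the perf hotplug
      notifier (case labels are from the cpu hotplug API; the
      surrounding code is illustrative):

        switch (action & ~CPU_TASKS_FROZEN) {
        case CPU_UP_PREPARE:
                perf_event_init_cpu(cpu);       /* set up per-cpu state */
                break;
        case CPU_UP_CANCELED:                   /* bring-up failed: undo */
        case CPU_DOWN_PREPARE:
                perf_event_exit_cpu(cpu);
                break;
        }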
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus <paulus@samba.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, trace: Fix module leak · 9cb627d5
      Committed by Li Zefan
      Commit 1c024eca (perf, trace: Optimize tracepoints by using
      per-tracepoint-per-cpu hlist to track events) caused a module
      refcount leak.
      Reported-and-tested-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4C7E1F12.8030304@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 08 September 2010, 3 commits
  21. 09 September 2010, 1 commit
    • tracing: Do not allow llseek to set_ftrace_filter · 9c55cb12
      Committed by Steven Rostedt
      Reading the file set_ftrace_filter does three things.
      
      1) shows whether or not filters are set for the function tracer
      2) shows what functions are set for the function tracer
      3) shows what triggers are set on any functions
      
      Item 3 is independent of items 1 and 2.
      
      The way this file currently works is that it is a state machine,
      and as you read it, it may change state. But this assumption
      breaks when you use lseek() on the file: the state machine gets
      out of sync, and t_show() may use the wrong pointer, causing a
      kernel oops.
      
      Luckily, this will only kill the app that does the lseek, but the app
      dies while holding a mutex. This prevents anyone else from using the
      set_ftrace_filter file (or any other function tracing file for that matter).
      
      A real fix for this is to rewrite the code, but that is too much
      for an -rc release or stable. This patch simply disables llseek on
      the set_ftrace_filter file for now; we can do the proper fix in
      the next major release.
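      A hedged sketch of the interim fix's shape (handler names are
      illustrative; the point is replacing the lseek handler with
      no_llseek):

        static const struct file_operations ftrace_filter_fops = {
                .open    = ftrace_filter_open,
                .read    = seq_read,
                .write   = ftrace_filter_write,
                .llseek  = no_llseek,           /* disable lseek() for now */
                .release = ftrace_regex_release,
        };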
      Reported-by: Robert Swiecki <swiecki@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Eugene Teo <eugene@redhat.com>
      Cc: vendor-sec@lst.de
      Cc: <stable@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  22. 05 September 2010, 2 commits
  23. 03 September 2010, 1 commit
  24. 01 September 2010, 2 commits
    • lockup_detector: Sync touch_*_watchdog back to old semantics · 68d3f1d8
      Committed by Don Zickus
      During my rewrite, the semantics of touch_nmi_watchdog and
      touch_softlockup_watchdog changed enough to break some drivers
      (mostly over preemptible regions).

      These are cases where long delays on one CPU (due to print_delay,
      for example) can cause long delays on other CPUs, so we must
      'touch' the nmi_watchdog flag of those other CPUs as well.

      This change brings those touch_*_watchdog() functions back in line
      with how they used to work.
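      A hedged sketch of the restored semantics (per-cpu flag name as in
      the lockup detector; the loop shape is illustrative):

        void touch_nmi_watchdog(void)
        {
                int cpu;

                /* Touch every CPU's flag, not just our own, so a long
                 * stall here cannot trip the detector elsewhere. */
                for_each_present_cpu(cpu)
                        per_cpu(watchdog_nmi_touch, cpu) = true;

                touch_softlockup_watchdog();
        }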
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: peterz@infradead.org
      Cc: fweisbec@gmail.com
      LKML-Reference: <1283310009-22168-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • pid: make setpgid() system call use RCU read-side critical section · 950eaaca
      Committed by Paul E. McKenney
      [   23.584719]
      [   23.584720] ===================================================
      [   23.585059] [ INFO: suspicious rcu_dereference_check() usage. ]
      [   23.585176] ---------------------------------------------------
      [   23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
      [   23.585176]
      [   23.585176] other info that might help us debug this:
      [   23.585176]
      [   23.585176]
      [   23.585176] rcu_scheduler_active = 1, debug_locks = 1
      [   23.585176] 1 lock held by rc.sysinit/728:
      [   23.585176]  #0:  (tasklist_lock){.+.+..}, at: [<ffffffff8104771f>] sys_setpgid+0x5f/0x193
      [   23.585176]
      [   23.585176] stack backtrace:
      [   23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2
      [   23.585176] Call Trace:
      [   23.585176]  [<ffffffff8105b436>] lockdep_rcu_dereference+0x99/0xa2
      [   23.585176]  [<ffffffff8104c324>] find_task_by_pid_ns+0x50/0x6a
      [   23.585176]  [<ffffffff8104c35b>] find_task_by_vpid+0x1d/0x1f
      [   23.585176]  [<ffffffff81047727>] sys_setpgid+0x67/0x193
      [   23.585176]  [<ffffffff810029eb>] system_call_fastpath+0x16/0x1b
      [   24.959669] type=1400 audit(1282938522.956:4): avc:  denied  { module_request } for  pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas
      
      It turns out that the setpgid() system call fails to enter an RCU
      read-side critical section before doing a PID-to-task_struct translation.
      This commit therefore does rcu_read_lock() before the translation, and
      also does rcu_read_unlock() after the last use of the returned pointer.
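      A hedged sketch of the fix's shape in sys_setpgid() (simplified;
      error handling omitted):

        rcu_read_lock();
        p = find_task_by_vpid(pid);     /* PID-to-task_struct translation */
        if (p) {
                /* ... use p under RCU protection ... */
        }
        rcu_read_unlock();              /* after the last use of p */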
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: David Howells <dhowells@redhat.com>