1. 26 May 2011, 1 commit
    • ftrace: Add internal recursive checks · b1cff0ad
      Committed by Steven Rostedt
      Witold reported a reboot caused by the selftests of the dynamic function
      tracer. He sent me a config and I used ktest to do a config_bisect on it
      (as my config did not cause the crash). It pointed out that the problem
      config was CONFIG_PROVE_RCU.
      
      What happened was that when multiple callbacks are attached to the
      function tracer, it iterates a list of callbacks. Because the list is
      managed by synchronize_sched() and preempt_disable(), the pointers are
      accessed with rcu_dereference_raw().
      
      When PROVE_RCU is enabled, the rcu_dereference_raw() calls some
      debugging functions, which happen to be traced. The tracing of the debug
      function would then call rcu_dereference_raw() which would then call the
      debug function and then... well you get the idea.
      
      I first wrote two different patches to solve this bug.
      
      1) add a __rcu_dereference_raw() that would not do any checks.
      2) add notrace to the offending debug functions.
      
      Both of these patches worked.
      
      Talking with Paul McKenney on IRC, he suggested adding recursion
      detection instead. This seemed to be a better solution, so I decided to
      implement it. As the task_struct already has a trace_recursion field to
      detect recursion in the ring buffer, and the recursion depth it allows is
      very small, I decided to use that same variable to add flags that can
      detect recursion inside the infrastructure of the function tracer.
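
The following is a minimal userspace sketch of the recursion guard described above. It assumes a per-thread word that stands in for task_struct's trace_recursion, plus a hypothetical flag bit TRACE_INTERNAL_BIT whose real position in the kernel is not given here. tracer_entry() and traced_debug_helper() are illustrative names, not kernel functions. The helper plays the role of the traced PROVE_RCU debug check that calls back into the tracer.

```c
#include <assert.h>

/* Hypothetical per-thread stand-in for task_struct->trace_recursion. */
static _Thread_local unsigned long trace_recursion;

/* Hypothetical flag bit; the kernel's actual bit position may differ. */
#define TRACE_INTERNAL_BIT (1UL << 0)

static int tracer_body_runs;   /* how many times the tracer's real work ran */

static void traced_debug_helper(void);

/* Tracer entry point: invoked on entry to every traced function. */
static void tracer_entry(void)
{
    if (trace_recursion & TRACE_INTERNAL_BIT)
        return;                        /* re-entered from inside the tracer: stop */
    trace_recursion |= TRACE_INTERNAL_BIT;

    tracer_body_runs++;
    /* Walking the callback list hits a debug check that is itself traced,
     * much like rcu_dereference_raw() under CONFIG_PROVE_RCU. */
    traced_debug_helper();

    trace_recursion &= ~TRACE_INTERNAL_BIT;
}

/* A traced function: its mcount-style hook calls back into the tracer. */
static void traced_debug_helper(void)
{
    tracer_entry();
}
```

Without the bit check, the first call would recurse until the stack overflowed. With the check, the nested entry returns immediately, so the tracer's work runs once per top-level call.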
      
      I plan to change it so that the task struct bit can be checked in
      mcount, but as that requires changes to all archs, I will hold off on
      that until the next merge window.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1306348063.1465.116.camel@gandalf.stny.rr.com
      Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 23 May 2011, 1 commit
  3. 12 May 2011, 1 commit
  4. 25 April 2011, 1 commit
    • ptrace: Prepare to fix racy accesses on task breakpoints · bf26c018
      Committed by Frederic Weisbecker
      When a task is traced and is in a stopped state, the tracer
      may execute a ptrace request to examine the tracee state and
      get its task struct. Right after, the tracee can be killed
      and thus its breakpoints released.
      This can happen concurrently when the tracer is in the middle
      of reading or modifying these breakpoints, leading to dereferencing
      a freed pointer.
      
      Hence, to prepare the fix, create a generic breakpoint reference
      holding API. When a reference on the breakpoints of a task is
      held, the breakpoints won't be released until the last reference
      is dropped. After that, no more ptrace requests on the task's
      breakpoints can be serviced for the tracer.
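
The reference-holding scheme can be modeled as follows. This is a sketch under assumptions: struct task_bps, bps_get() and bps_put() are hypothetical names, not the API this commit adds. The task's own reference is assumed to be the initial count of 1, dropped when the task exits. The key detail is that a get must fail once the count has reached zero, so a tracer cannot resurrect breakpoints that are already being released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a task's breakpoint state. */
struct task_bps {
    atomic_int refcnt;   /* starts at 1: the task's own reference */
    bool released;       /* set once the breakpoints are freed */
};

static void task_bps_init(struct task_bps *t)
{
    atomic_init(&t->refcnt, 1);
    t->released = false;
}

/* Take a reference unless the count already reached zero ("inc not zero").
 * Returns true on success; false means the breakpoints are going away. */
static bool bps_get(struct task_bps *t)
{
    int old = atomic_load(&t->refcnt);
    while (old != 0) {
        if (atomic_compare_exchange_weak(&t->refcnt, &old, old + 1))
            return true;
        /* CAS failed: `old` now holds the current count, so retry. */
    }
    return false;
}

/* Drop a reference; whoever drops the last one releases the breakpoints. */
static void bps_put(struct task_bps *t)
{
    int old = atomic_fetch_sub(&t->refcnt, 1);
    assert(old > 0);
    if (old == 1)
        t->released = true;   /* stand-in for freeing the breakpoints */
}
```

In this model, a tracer that got its reference before the tracee exited keeps the breakpoints alive until it calls bps_put(). Any later bps_get() fails instead of touching freed memory.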
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: v2.6.33.. <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1302284067-7860-2-git-send-email-fweisbec@gmail.com
  5. 24 April 2011, 1 commit
  6. 15 April 2011, 1 commit
  7. 14 April 2011, 8 commits
  8. 11 April 2011, 4 commits
  9. 31 March 2011, 1 commit
  10. 24 March 2011, 1 commit
  11. 23 March 2011, 2 commits
  12. 10 March 2011, 1 commit
    • block: initial patch for on-stack per-task plugging · 73c10101
      Committed by Jens Axboe
      This patch adds support for creating a queuing context outside
      of the queue itself. This enables us to batch up pieces of IO
      before grabbing the block device queue lock and submitting them to
      the IO scheduler.
      
      The context is created on the stack of the process and assigned in
      the task structure, so that we can auto-unplug it if we hit a schedule
      event.
      
      The current queue plugging happens implicitly if IO is submitted to
      an empty device, yet callers have to remember to unplug that IO when
      they are going to wait for it. This is an ugly API and has caused bugs
      in the past. Additionally, it requires hacks in the vm (->sync_page()
      callback) to handle that logic. By switching to an explicit plugging
      scheme we make the API a lot nicer and can get rid of the ->sync_page()
      hack in the vm.
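
Below is a simplified, single-threaded model of explicit plugging. struct plug, start_plug(), finish_plug(), submit_io() and on_schedule() are hypothetical names, since the kernel's own interface is not shown in this commit message. The counter queue_lock_acquisitions stands in for taking the block device queue lock, to show how batching reduces lock round-trips.

```c
#include <assert.h>
#include <stddef.h>

#define PLUG_MAX 16   /* hypothetical batch limit */

/* Hypothetical on-stack plug: collects IO before touching the queue. */
struct plug {
    int pending[PLUG_MAX];
    size_t count;
};

/* Stand-in for the task_struct pointer to the task's active plug. */
static _Thread_local struct plug *current_plug;

static int queue_lock_acquisitions;   /* times the "queue lock" was taken */
static int dispatched;                /* requests handed to the IO scheduler */

/* Hand a batch to the IO scheduler under a single lock acquisition. */
static void dispatch_batch(const int *reqs, size_t n)
{
    (void)reqs;                       /* a real queue would insert each request */
    queue_lock_acquisitions++;
    dispatched += (int)n;
}

static void flush_plug(struct plug *p)
{
    if (p->count) {
        dispatch_batch(p->pending, p->count);
        p->count = 0;
    }
}

static void start_plug(struct plug *p)
{
    p->count = 0;
    current_plug = p;
}

static void finish_plug(struct plug *p)
{
    flush_plug(p);
    current_plug = NULL;
}

/* Submit one request: batch it if a plug is active, otherwise send it now. */
static void submit_io(int req)
{
    struct plug *p = current_plug;
    if (!p) {
        dispatch_batch(&req, 1);
        return;
    }
    if (p->count == PLUG_MAX)
        flush_plug(p);                /* plug full: push out what we have */
    p->pending[p->count++] = req;
}

/* Called before the task sleeps, so plugged IO is never left stranded. */
static void on_schedule(void)
{
    if (current_plug)
        flush_plug(current_plug);
}
```

Between start_plug() and finish_plug(), ten submissions cost one lock round-trip instead of ten. on_schedule() models the auto-unplug on a schedule event.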
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  13. 17 February 2011, 1 commit
  14. 03 February 2011, 3 commits
    • sched: Add yield_to(task, preempt) functionality · d95f4122
      Committed by Mike Galbraith
      Currently only implemented for fair class tasks.
      
      Add a yield_to_task() method to the fair scheduling class, allowing the
      caller of yield_to() to accelerate another thread in its thread group
      or task group.
      
      Implemented via a scheduler hint, using cfs_rq->next to encourage the
      target to be selected.  We can rely on pick_next_entity to keep things
      fair, so no one can accelerate a thread that has already used its fair
      share of CPU time.
      
      This also means callers should only call yield_to when they really
      mean it.  Calling it too often can result in the scheduler just
      ignoring the hint.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201095051.4ddb7738@annuminas.surriel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Use a buddy to implement yield_task_fair() · ac53db59
      Committed by Rik van Riel
      Use the buddy mechanism to implement yield_task_fair.  This
      allows us to skip onto the next highest priority se at every
      level in the CFS tree, unless doing so would introduce gross
      unfairness in CPU time distribution.
      
      We order the buddy selection in pick_next_entity to check
      yield first, then last, then next.  We need next to be able
      to override yield, because it is possible for the "next" and
      "yield" tasks to be different processes in the same sub-tree
      of the CFS tree.  When they are, we need to go into that
      sub-tree regardless of the "yield" hint, and pick the correct
      entity once we get to the right level.
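
The buddy ordering described above can be sketched as a pick function over entities sorted by vruntime. Several details are assumptions for illustration: the fairness test (a buddy may replace the leftmost entity only if its vruntime is at most WAKEUP_GRAN ahead), the constant's value, and all names. The real CFS code also walks group-scheduling levels and clears buddies after picking. Because the checks run yield, then last, then next, a fair-enough next buddy overrides the yield hint. A yield_to() target is one example of such a next buddy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified entity: only vruntime matters here. */
struct entity {
    long vruntime;
};

/* Hypothetical fairness allowance in vruntime units. */
#define WAKEUP_GRAN 4

/* Is running `cand` instead of the leftmost entity acceptably fair? */
static int fair_enough(const struct entity *cand, const struct entity *left)
{
    return cand->vruntime - left->vruntime <= WAKEUP_GRAN;
}

struct runqueue {
    struct entity **sorted;  /* by ascending vruntime, leftmost first; n >= 1 */
    size_t n;
    struct entity *yield;    /* task that asked to yield */
    struct entity *last;     /* recently preempted task */
    struct entity *next;     /* task someone wants to run */
};

/* Check yield first, then last, then next; later checks override earlier ones. */
static struct entity *pick_next(const struct runqueue *rq)
{
    struct entity *left = rq->sorted[0];
    struct entity *se = left;

    if (rq->yield == se && rq->n > 1 && fair_enough(rq->sorted[1], left))
        se = rq->sorted[1];          /* skip the yielding task */
    if (rq->last && fair_enough(rq->last, left))
        se = rq->last;
    if (rq->next && fair_enough(rq->next, left))
        se = rq->next;               /* next wins if it is not unfair */
    return se;
}
```

If the next buddy is too far ahead in vruntime, the hint is simply ignored. This mirrors the warning in the yield_to() commit that the scheduler may disregard a hint to keep things fair.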
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201095103.3a79e92a@annuminas.surriel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Cure task_oncpu_function_call() races · fe4b04fa
      Committed by Peter Zijlstra
      Oleg reported that on architectures with
      __ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI from
      task_oncpu_function_call() can land before perf_event_task_sched_in()
      and cause interesting situations for, e.g., perf_install_in_context().
      
      This patch reworks the task_oncpu_function_call() interface to give a
      more usable primitive, and reworks all its users to hopefully make them
      more obvious and to remove the races.
      
      While looking at the code, I also found a number of races against
      perf_event_task_sched_out(), which can flip contexts between tasks, so
      plug those too.
      Reported-and-reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 01 February 2011, 1 commit
  16. 31 January 2011, 1 commit
    • time: Provide xtime_update() · f0af911a
      Committed by Torben Hohn
      xtime_update() takes xtime_lock for writing and calls do_timer(). It
      is provided to replace the do_timer() calls in the architecture code.
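
Below is a single-threaded model of what such a helper wraps. The seqlock is a minimal stand-in: a mutex plus a sequence counter that is odd while a write is in progress. It omits the readers and memory barriers a real seqlock needs. do_timer() and jiffies_64 are simplified stand-ins that only advance a tick counter. The helper is assumed here to take a tick count, like do_timer().

```c
#include <assert.h>
#include <pthread.h>

/* Minimal seqlock model: writers serialize on a mutex and make `seq`
 * odd for the duration of the write (readers are not modeled). */
struct seqlock {
    unsigned seq;
    pthread_mutex_t lock;
};

static struct seqlock xtime_lock = { 0, PTHREAD_MUTEX_INITIALIZER };
static unsigned long long jiffies_64;   /* simplified tick counter */

static void write_seqlock(struct seqlock *sl)
{
    pthread_mutex_lock(&sl->lock);
    sl->seq++;                /* odd: a write is in progress */
}

static void write_sequnlock(struct seqlock *sl)
{
    sl->seq++;                /* even again: data is consistent */
    pthread_mutex_unlock(&sl->lock);
}

/* Stand-in for do_timer(): must run with xtime_lock held for writing. */
static void do_timer(unsigned long ticks)
{
    assert(xtime_lock.seq & 1);   /* caller holds the write side */
    jiffies_64 += ticks;
}

/* The helper: take the lock for writing, advance time, release. */
static void xtime_update(unsigned long ticks)
{
    write_seqlock(&xtime_lock);
    do_timer(ticks);
    write_sequnlock(&xtime_lock);
}
```

Architecture timer code then makes one call instead of repeating the lock/do_timer()/unlock sequence at every call site.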
      Signed-off-by: Torben Hohn <torbenh@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: johnstul@us.ibm.com
      Cc: yong.zhang0@gmail.com
      Cc: hch@infradead.org
      LKML-Reference: <20110127145910.23248.21379.stgit@localhost>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  17. 26 January 2011, 2 commits
  18. 14 January 2011, 4 commits
    • thp: khugepaged · ba76149f
      Committed by Andrea Arcangeli
      Add khugepaged to relocate fragmented pages into hugepages if new
      hugepages become available.  (This is independent of the defrag logic
      that will have to make new hugepages available.)
      
      The fundamental reason why khugepaged is unavoidable is that some memory
      can be fragmented and not everything can be relocated.  So when a virtual
      machine quits and releases gigabytes of hugepages, we want to use those
      freely available hugepages to create huge-pmd in the other virtual
      machines that may be running on fragmented memory, to maximize the CPU
      efficiency at all times.  The scan is slow and takes nearly zero CPU
      time, except when it copies data (in which case we definitely want to
      pay for that CPU time), so it seems a good tradeoff.
      
      In addition to the hugepages being released by other processes releasing
      memory, we strongly suspect that the performance impact of potentially
      defragmenting hugepages during or before each page fault could lead to
      more performance inconsistency than allocating small pages at first and
      having them collapsed into large pages later, if they prove themselves
      to be long-lived mappings.  (The khugepaged scan is slow, so short-lived
      mappings are much less likely than long-lived ones to be reached by
      khugepaged.)
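
The following is a toy model of one khugepaged-style scan pass. HPAGE_NR is shrunk to 4 for readability; it is 512 for 2 MiB huge pages over 4 KiB pages on x86-64. struct region and scan_and_collapse() are hypothetical, and the max_none parameter is loosely modeled on khugepaged's max_ptes_none tunable. Windows that are too sparse are skipped, and the pass stops once no free huge pages remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_NR 4      /* small pages per huge page (512 on x86-64; shrunk here) */
#define MAX_WINDOWS 4   /* huge-page-sized windows in the toy address space */

/* Hypothetical address-space model. */
struct region {
    bool mapped[MAX_WINDOWS * HPAGE_NR];  /* is each small page populated? */
    bool huge[MAX_WINDOWS];               /* window already backed by a huge page? */
    size_t windows;                       /* windows in use, <= MAX_WINDOWS */
};

/* One scan pass: collapse each aligned window with at most `max_none`
 * unpopulated small pages, while free huge pages remain.
 * Returns the number of windows collapsed. */
static size_t scan_and_collapse(struct region *r, size_t max_none, size_t *free_huge)
{
    size_t collapsed = 0;
    for (size_t w = 0; w < r->windows && *free_huge > 0; w++) {
        if (r->huge[w])
            continue;                     /* already a huge page */
        size_t none = 0;
        for (size_t i = 0; i < HPAGE_NR; i++)
            if (!r->mapped[w * HPAGE_NR + i])
                none++;
        if (none > max_none || none == HPAGE_NR)
            continue;                     /* too sparse to be worth collapsing */
        /* A real collapse copies the small pages into the huge page here;
         * holes become zero-filled parts of the huge page. */
        for (size_t i = 0; i < HPAGE_NR; i++)
            r->mapped[w * HPAGE_NR + i] = true;
        r->huge[w] = true;
        (*free_huge)--;
        collapsed++;
    }
    return collapsed;
}
```

A later pass picks up where free huge pages allow. This matches the idea of collapsing long-lived mappings after the fact instead of defragmenting at fault time.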
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • oom: allow a non-CAP_SYS_RESOURCE process to oom_score_adj down · dabb16f6
      Committed by Mandeep Singh Baines
      We'd like to be able to adjust a process's oom_score_adj up/down as it
      enters/leaves the foreground.  Currently, it is not possible to lower
      oom_adj without CAP_SYS_RESOURCE.  This patch allows a task to decrease
      its oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set
      it to, or to its inherited value at fork.  Assuming the thread that
      forked it has an oom_score_adj of 0, each process could decrease its
      value back to 0 upon activation, unless a CAP_SYS_RESOURCE thread had
      elevated it to something higher.
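
The permission rule can be expressed as a small check. This sketches the behavior the message describes, not the patch itself. struct proc_oom, set_oom_score_adj() and the cap_sys_resource flag are hypothetical. The floor (score_adj_min) is assumed to be updated whenever a CAP_SYS_RESOURCE writer sets the value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-process OOM state. */
struct proc_oom {
    int score_adj;       /* current oom_score_adj, in -1000..1000 */
    int score_adj_min;   /* floor: set by a privileged writer or inherited at fork */
};

/* Anyone may raise the value or lower it back down to the floor; only a
 * CAP_SYS_RESOURCE writer may go below the floor, and doing so moves it. */
static int set_oom_score_adj(struct proc_oom *p, int val, bool cap_sys_resource)
{
    if (val < -1000 || val > 1000)
        return -EINVAL;
    if (val < p->score_adj_min && !cap_sys_resource)
        return -EACCES;
    p->score_adj = val;
    if (cap_sys_resource)
        p->score_adj_min = val;
    return 0;
}
```

A process that inherited 0 can move between, say, 500 in the background and 0 in the foreground without privileges. If a privileged thread raises it to 300, it can no longer drop below 300 on its own.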
      
      Alternatives considered:
      
      * a setuid binary
      * a daemon with CAP_SYS_RESOURCE
      
      Since you don't want all processes to be able to reduce their oom_adj, a
      setuid or daemon implementation would be complex.  The alternatives also
      have much higher overhead.
      
      This patch was updated from the original based on feedback from David
      Rientjes.
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched: remove long deprecated CLONE_STOPPED flag · 43bb40c9
      Committed by Dave Jones
      This warning was added in commit bdff746a ("clone: prepare to recycle
      CLONE_STOPPED") three years ago.  2.6.26 came and went.  As far as I know,
      no-one is actually using CLONE_STOPPED.
      Signed-off-by: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: convert max_user_watches to long · 52bd19f7
      Committed by Robin Holt
      On a 16TB machine, max_user_watches has an integer overflow.  Convert it
      to use a long and handle the associated fallout.
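
To see why an int is too small, consider a limit computed as roughly 4% of low memory divided by a per-watch cost. The shape of this formula and the 128-byte ITEM_COST are assumptions for illustration; the kernel's per-watch cost depends on its internal structure sizes. The sketch also assumes an LP64 system, where long is 64 bits.

```c
#include <assert.h>
#include <limits.h>

#define ITEM_COST     128UL   /* hypothetical bytes per watch */
#define PAGE_SHIFT_4K 12      /* 4 KiB pages */

/* Per-user watch limit: 1/25 (4%) of low memory divided by the per-watch
 * cost, computed and returned as a 64-bit long. */
static long max_watches(unsigned long lowmem_pages)
{
    return (long)(((lowmem_pages / 25) << PAGE_SHIFT_4K) / ITEM_COST);
}
```

For 16 TiB of 4 KiB pages (2^32 pages), this gives 5,497,558,112 watches. That is well above INT_MAX (2,147,483,647), so storing the result in an int would overflow.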
      Signed-off-by: Robin Holt <holt@sgi.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 11 January 2011, 2 commits
  20. 07 January 2011, 1 commit
  21. 09 December 2010, 2 commits