1. 14 6月, 2008 1 次提交
  2. 13 6月, 2008 1 次提交
  3. 12 6月, 2008 3 次提交
    • L
      sched: 64-bit: fix arithmetics overflow · 7a232e03
      Lai Jiangshan 提交于
      (overflow means weight >= 2^32 here, because inv_weigh = 2^32/weight)
      
      A weight of a cfs_rq is the sum of weights of which entities
      are queued on this cfs_rq, so it will overflow when there are
      too many entities.
      
      Although, overflow occurs very rarely, but it break fairness when
      it occurs. 64-bits systems have more memory than 32-bit systems
      and 64-bit systems can create more process usually, so overflow may
      occur more frequently.
      
      This patch guarantees fairness when overflow happens on 64-bit systems.
      Thanks to the optimization of compiler, it changes nothing on 32-bit.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7a232e03
    • L
      sched: fair group: fix overflow(was: fix divide by zero) · 2e084786
      Lai Jiangshan 提交于
      I found a bug which can be reproduced by this way:(linux-2.6.26-rc5, x86-64)
      (use 2^32, 2^33, ...., 2^63 as shares value)
      
      # mkdir /dev/cpuctl
      # mount -t cgroup -o cpu cpuctl /dev/cpuctl
      # cd /dev/cpuctl
      # mkdir sub
      # echo 0x8000000000000000 > sub/cpu.shares
      # echo $$ > sub/tasks
      oops here! divide by zero.
      
      This is because do_div() expects the 2th parameter to be 32 bits,
      but unsigned long is 64 bits in x86_64.
      
      Peter Zijstra pointed it out that the sane thing to do is limit the
      shares value to something smaller instead of using an even more
      expensive divide.
      
      Also, I found another bug about "the shares value is too large":
      
      pid1 and pid2 are set affinity to cpu#0
      pid1 is attached to cg1 and pid2 is attached to cg2
      
      if cg1/cpu.shares = 1024 cg2/cpu.shares = 2000000000
      then pid2 got 100% usage of cpu, and pid1 0%
      
      if cg1/cpu.shares = 1024 cg2/cpu.shares = 20000000000
      then pid2 got 0% usage of cpu, and pid1 100%
      
      And a weight of a cfs_rq is the sum of weights of which entities
      are queued on this cfs_rq, so the shares value should be limited
      to a smaller value.
      
      I think that (1UL << 18) is a good limited value:
      
      1) it's not too large, we can create a lot of group before overflow
      2) it's several times the weight value for nice=-19 (not too small)
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2e084786
    • J
      ftrace: fix printout · 20764ff1
      Jiri Slaby 提交于
      Do not print loglevel before "entries of %ld bytes". Move it to the previous
      pr_info.
      Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      20764ff1
  4. 10 6月, 2008 7 次提交
    • A
      ftrace: disable tracing when current_tracer is set to "none" · 2b1bce17
      Ankita Garg 提交于
      Found that inspite of setting the current_tracer to "none", trace from
      the previous trace type continued to be collected. The patch below fixes
      this and causes the trace to be disabled when the "none" type is
      selected.
      
      Compile and boot tested the patch for functionality.
      Signed-off-by: NAnkita Garg <ankita@in.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2b1bce17
    • I
      sched: sched_clock() lockdep fix · 040ec23d
      Ingo Molnar 提交于
      Sitsofe Wheeler bisected the following commit to cause a lockdep to
      warn about itself and turn itself off:
      
      > commit c6531cce
      > Author: Ingo Molnar <mingo@elte.hu>
      > Date:   Mon May 12 21:21:14 2008 +0200
      >
      >     sched: do not trace sched_clock
      
      do not use raw irq flags in cpu_clock() as it causes lockdep to lose
      track of the true state of the IRQ flag.
      Reported-and-bisected-by: NSitsofe Wheeler <sitsofe@yahoo.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      040ec23d
    • A
      ftrace: prevent freeing of all failed updates · 34078a5e
      Abhishek Sagar 提交于
      Steven Rostedt wrote:
      > If we unload a module and reload it, will it ever get converted again?
      
      The intent was always to filter core kernel functions to prevent their freeing.
      Here's a fix which should allow re-recording of module call-sites.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      34078a5e
    • A
      ftrace: add debugfs entry 'failures' · eb9a7bf0
      Abhishek Sagar 提交于
      Identify functions which had their mcount call-site updates failed. This can
      help us track functions which ftrace shouldn't fiddle with, and are thus not
      being traced. If there is no race with any external agent which is modifying
      the mcount call-site, then this file displays no entries (normal case).
      Signed-off-by: NAbhishek Sagar <sagar.abhishek@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      eb9a7bf0
    • A
      ftrace: remove ftrace_ip_converted() · 1d74f2a0
      Abhishek Sagar 提交于
      Remove the unneeded function ftrace_ip_converted().
      Signed-off-by: NAbhishek Sagar <sagar.abhishek@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1d74f2a0
    • A
      ftrace: prevent freeing of all failed updates · 0eb96701
      Abhishek Sagar 提交于
      Prevent freeing of records which cause problems and correspond to function from
      core kernel text. A new flag, FTRACE_FL_CONVERTED is used to mark a record
      as "converted". All other records are patched lazily to NOPs. Failed records
      now also remain on frace_hash table. Each invocation of ftrace_record_ip now
      checks whether the traced function has ever been recorded (including past
      failures) and doesn't re-record it again.
      Signed-off-by: NAbhishek Sagar <sagar.abhishek@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0eb96701
    • O
      sched: fix TASK_WAKEKILL vs SIGKILL race · 16882c1e
      Oleg Nesterov 提交于
      schedule() has the special "TASK_INTERRUPTIBLE && signal_pending()" case,
      this allows us to do
      
      	current->state = TASK_INTERRUPTIBLE;
      	schedule();
      
      without fear to sleep with pending signal.
      
      However, the code like
      
      	current->state = TASK_KILLABLE;
      	schedule();
      
      is not right, schedule() doesn't take TASK_WAKEKILL into account. This means
      that mutex_lock_killable(), wait_for_completion_killable(), down_killable(),
      schedule_timeout_killable() can miss SIGKILL (and btw the second SIGKILL has
      no effect).
      
      Introduce the new helper, signal_pending_state(), and change schedule() to
      use it. Hopefully it will have more users, that is why the task's state is
      passed separately.
      
      Note this "__TASK_STOPPED | __TASK_TRACED" check in signal_pending_state().
      This is needed to preserve the current behaviour (ptrace_notify). I hope
      this check will be removed soon, but this (afaics good) change needs the
      separate discussion.
      
      The fast path is "(state & (INTERRUPTIBLE | WAKEKILL)) + signal_pending(p)",
      basically the same that schedule() does now. However, this patch of course
      bloats schedule().
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      16882c1e
  5. 07 6月, 2008 1 次提交
  6. 02 6月, 2008 2 次提交
    • S
      ftrace: user update and disable dynamic ftrace daemon · ad90c0e3
      Steven Rostedt 提交于
      In dynamic ftrace, the mcount function starts off pointing to a stub
      function that just returns.
      
      On start up, the call to the stub is modified to point to a "record_ip"
      function. The job of the record_ip function is to add the function to
      a pre-allocated hash list. If the function is already there, it simply is
      ignored, otherwise it is added to the list.
      
      Later, a ftraced daemon wakes up and calls kstop_machine if any functions
      have been recorded, and changes the calls to the recorded functions to
      a simple nop.  If no functions were recorded, the daemon goes back to sleep.
      
      The daemon wakes up once a second to see if it needs to update any newly
      recorded functions into nops.  Usually it does not, but if a lot of code
      has been executed for the first time in the kernel, the ftraced daemon
      will call kstop_machine to update those into nops.
      
      The problem currently is that there's no way to stop the daemon from doing
      this, and it can cause unneeded latencies (800us which for some is bothersome).
      
      This patch adds a new file /debugfs/tracing/ftraced_enabled. If the daemon
      is active, reading this will return "enabled\n" and "disabled\n" when the
      daemon is not running. To disable the daemon, the user can echo "0" or
      "disable" into this file, and "1" or "enable" to re-enable the daemon.
      
      Since the daemon is used to convert the functions into nops to increase
      the performance of the system, I also added that anytime something is
      written into the ftraced_enabled file, kstop_machine will run if there
      are new functions that have been detected that need to be converted.
      
      This way the user can disable the daemon but still be able to control the
      conversion of the mcount calls to nops by simply,
      
        "echo 0 > /debugfs/tracing/ftraced_enabled"
      
      when they need to do more conversions.
      
      To see the number of converted functions:
      
        "cat /debugfs/tracing/dyn_ftrace_total_info"
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ad90c0e3
    • A
      ftrace: distinguish kretprobe'd functions in trace logs · 76094a2c
      Abhishek Sagar 提交于
      Tracing functions via ftrace which have a kretprobe installed on them, can produce misleading output in their trace logs. E.g, consider the correct trace of the following sequence:
      
      do_IRQ()
      {
      ~
        irq_enter();
      ~
      }
      
      Trace log (sample):
      <idle>-0     [00] 4154504455.781616: irq_enter <- do_IRQ
      
      But if irq_enter() has a kretprobe installed on it, the return value stored on the stack at each invocation is modified to divert the return to a kprobe trampoline function called kretprobe_trampoline(). So with this the trace would (currently) look like:
      
      <idle>-0     [00] 4154504455.781616: irq_enter <- kretprobe_trampoline
      
      Now this is quite misleading to the end user, as it suggests something that didn't actually happen. So just to avoid such misinterpretations, the inlined patch aims to output such a log as:
      
      <idle>-0     [00] 4154504455.781616: irq_enter <- [unknown/kretprobe'd]
      Signed-off-by: NAbhishek Sagar <sagar.abhishek@gmail.com>
      Acked-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      76094a2c
  7. 01 6月, 2008 1 次提交
    • A
      capabilities: remain source compatible with 32-bit raw legacy capability support. · ca05a99a
      Andrew G. Morgan 提交于
      Source code out there hard-codes a notion of what the
      _LINUX_CAPABILITY_VERSION #define means in terms of the semantics of the
      raw capability system calls capget() and capset().  Its unfortunate, but
      true.
      
      Since the confusing header file has been in a released kernel, there is
      software that is erroneously using 64-bit capabilities with the semantics
      of 32-bit compatibilities.  These recently compiled programs may suffer
      corruption of their memory when sys_getcap() overwrites more memory than
      they are coded to expect, and the raising of added capabilities when using
      sys_capset().
      
      As such, this patch does a number of things to clean up the situation
      for all. It
      
        1. forces the _LINUX_CAPABILITY_VERSION define to always retain its
           legacy value.
      
        2. adopts a new #define strategy for the kernel's internal
           implementation of the preferred magic.
      
        3. deprecates v2 capability magic in favor of a new (v3) magic
           number. The functionality of v3 is entirely equivalent to v2,
           the only difference being that the v2 magic causes the kernel
           to log a "deprecated" warning so the admin can find applications
           that may be using v2 inappropriately.
      
      [User space code continues to be encouraged to use the libcap API which
      protects the application from details like this.  libcap-2.10 is the first
      to support v3 capabilities.]
      
      Fixes issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=447518.
      Thanks to Bojan Smojver for the report.
      
      [akpm@linux-foundation.org: s/depreciate/deprecate/g]
      [akpm@linux-foundation.org: be robust about put_user size]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NAndrew G. Morgan <morgan@kernel.org>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Bojan Smojver <bojan@rexursive.com>
      Cc: stable@kernel.org
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      ca05a99a
  8. 29 5月, 2008 8 次提交
  9. 28 5月, 2008 1 次提交
  10. 27 5月, 2008 10 次提交
  11. 25 5月, 2008 4 次提交
  12. 24 5月, 2008 1 次提交