1. 12 September 2007, 1 commit
    • futex_compat: fix list traversal bugs · 179c85ea
      Committed by Arnd Bergmann
      The futex list traversal on the compat side appears to have
      a bug.
      
      Its loop termination condition compares:
      
              while (compat_ptr(uentry) != &head->list)
      
      But that can't be right because "uentry" has the special
      "pi" indicator bit still potentially set at bit 0.  This
      is cleared by fetch_robust_entry() into the "entry"
      return value.
      
      What this seems to mean is that the list won't terminate
      when list iteration gets back to the head.  We'll also
      process the list head as if it were a normal entry, which
      could cause all kinds of problems.
      
      So we should check for equality with "entry".  That pointer
      is of the non-compat type so we have to do a little casting
      to keep the compiler and sparse happy.
      
      The same problem can in theory occur with the 'pending'
      variable, although that has not been reported from users
      so far.
      
      Based on the original patch from David Miller.
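      
      A minimal user-space sketch of the pointer-tagging pitfall
      (illustrative only: fetch_entry() stands in for the kernel's
      fetch_robust_entry(), and the list node is simplified):
      
          #include <stdio.h>
          #include <stdint.h>
          
          struct node { struct node *next; };
          
          /* Models fetch_robust_entry(): strips the PI flag from
           * bit 0 and returns the cleaned pointer. */
          static struct node *fetch_entry(uintptr_t raw, int *pi)
          {
                  *pi = raw & 1;
                  return (struct node *)(raw & ~(uintptr_t)1);
          }
          
          int main(void)
          {
                  struct node head = { &head };        /* empty circular list */
                  uintptr_t uentry = (uintptr_t)head.next | 1; /* PI bit set */
                  int pi;
                  struct node *entry = fetch_entry(uentry, &pi);
          
                  /* Buggy test: the tagged value never equals &head. */
                  printf("tagged == head? %d\n", (struct node *)uentry == &head);
                  /* Fixed test: compare the cleaned "entry" pointer. */
                  printf("entry  == head? %d\n", entry == &head);
                  return 0;
          }
      
      With the PI bit set, only the second comparison terminates the
      walk; the first spins past the head forever.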
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 11 September 2007, 1 commit
    • Fix spurious syscall tracing after PTRACE_DETACH + PTRACE_ATTACH · 7d941432
      Committed by Roland McGrath
      When PTRACE_SYSCALL is used and then PTRACE_DETACH is issued, the
      TIF_SYSCALL_TRACE flag is left set on the formerly-traced task.  This
      means that when a new tracer comes along and does PTRACE_ATTACH, it's
      possible he gets a syscall tracing stop even though he's never used
      PTRACE_SYSCALL.  This happens if the task was in the middle of a system
      call when the second PTRACE_ATTACH was done.  The symptom is an
      unexpected SIGTRAP when the tracer thinks that only SIGSTOP should have
      been provoked by his ptrace calls so far.
      
      A few machines already fixed this in ptrace_disable (i386, ia64, m68k).
      But all other machines do not, and still have this bug.  On x86_64, this
      constitutes a regression in IA32 compatibility support.
      
      Since all machines now use TIF_SYSCALL_TRACE for this, I put the
      clearing of TIF_SYSCALL_TRACE in the generic ptrace_detach code rather
      than adding it to every other machine's ptrace_disable.
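      
      A hedged sketch of where the generic fix lands (simplified from
      the kernel/ptrace.c of that era; locking details and the
      signal-on-detach handling are elided):
      
          int ptrace_detach(struct task_struct *child, unsigned int data)
          {
                  if (!valid_signal(data))
                          return -EIO;
          
                  /* Arch hook: e.g. clears the TF bit on x86. */
                  ptrace_disable(child);
                  /* The generic fix: stop syscall tracing here too, so a
                   * later PTRACE_ATTACH starts with a clean slate. */
                  clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);
          
                  write_lock_irq(&tasklist_lock);
                  if (child->ptrace)
                          __ptrace_detach(child, data);
                  write_unlock_irq(&tasklist_lock);
          
                  return 0;
          }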
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 05 September 2007, 8 commits
  4. 31 August 2007, 6 commits
  5. 28 August 2007, 7 commits
    • sched: clean up task_new_fair() · 9f508f82
      Committed by Ingo Molnar
      Cleanup: we already have the 'se' and 'curr' entity pointers,
      so there is no need to go through p->se and current->se.
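      
      The shape of the cleanup, sketched (the two call sites are
      hypothetical stand-ins, not the verbatim patch):
      
          struct sched_entity *se = &p->se, *curr = &current->se;
          
          /* before: re-derives the entities at each use */
          enqueue_entity_stats(cfs_rq, &p->se);   /* hypothetical helper */
          check_preempt(&current->se, &p->se);    /* hypothetical helper */
          
          /* after: reuse the locals we already have */
          enqueue_entity_stats(cfs_rq, se);
          check_preempt(curr, se);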
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: small schedstat fix · 213c8af6
      Committed by Ingo Molnar
      Small schedstat fix: the cfs_rq->wait_runtime 'sum of all runtimes'
      statistics counters missed newly forked tasks and thus had a constant
      negative skew. Fix this.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: fix wait_start_fair condition in update_stats_wait_end() · b77d69db
      Committed by Ingo Molnar
      Peter Zijlstra noticed the following bug in SCHED_FEAT_SKIP_INITIAL (which
      is disabled by default at the moment): it relies on se.wait_start_fair
      being 0 while update_stats_wait_end() did not recognize a 0 value,
      so instead of 'skipping' the initial interval we gave the new child
      a maximum boost of +runtime-limit ...
      
      (No impact on the default kernel, but nice to fix for completeness.)
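      
      A hedged sketch of the fix: teach update_stats_wait_end() to treat
      0 as the 'skip' marker (the surrounding accounting is elided):
      
          static void update_stats_wait_end(struct cfs_rq *cfs_rq,
                                            struct sched_entity *se)
          {
                  /* 0 means SCHED_FEAT_SKIP_INITIAL marked this entity:
                   * don't credit the initial interval as wait time. */
                  if (!se->wait_start_fair)
                          return;
          
                  /* ... normal wait-time accounting follows ... */
          }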
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: call update_curr() in task_tick_fair() · 7109c442
      Committed by Ting Yang
      Update the fair clock before using it for the key value.
      
      [ mingo@elte.hu: small cleanups. ]
      Signed-off-by: Ting Yang <tingy@cs.umass.edu>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    • sched: make the scheduler converge to the ideal latency · f6cf891c
      Committed by Ingo Molnar
      De-HZ-ification of the granularity defaults unearthed a pre-existing
      property of CFS: while it correctly converges to the granularity goal,
      it does not prevent run-time fluctuations in the range of
      [-gran ... 0 ... +gran].
      
      With the increase of the granularity due to the removal of HZ
      dependencies, this becomes visible in chew-max output (with 5 tasks
      running):
      
       out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
       out:  27 . 27. 32 | flu:  0 .  0 | ran:   17 .   13 | per:   44 .   40
       out:  27 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   36 .   40
       out:  29 . 27. 32 | flu:  2 .  0 | ran:   17 .   13 | per:   46 .   40
       out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
       out:  29 . 27. 32 | flu:  0 .  0 | ran:   18 .   13 | per:   47 .   40
       out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
      
      The average slice is the ideal 13 msecs and the period is picture-perfect 40
      msecs. But the 'ran' field fluctuates around 13.33 msecs and there's no
      mechanism in CFS to keep that from happening: it's a perfectly valid
      solution that CFS finds.
      
      To fix this, we add a granularity/preemption rule that knows about
      the "target latency" and makes tasks that run longer than the ideal
      latency run a bit less. The simplest approach is to decrease the
      preemption granularity when a task overruns its ideal latency. For
      this we have to track how much the task has executed since its last
      preemption.
      
      ( this adds a new field to task_struct, but we can eliminate that
        overhead in 2.6.24 by putting all the scheduler timestamps into an
        anonymous union. )
      
      With this change in place, chew-max output is fluctuation-free all
      around:
      
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40
      
      This patch has no impact on any fastpath or on any globally observable
      scheduling property (unless you have sharp enough eyes to see
      millisecond-level ripples in glxgears smoothness :-)
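      
      A hedged sketch of the rule (field and helper names here are
      illustrative stand-ins, not the verbatim patch; the actual change
      tracks runtime-since-last-preemption in task_struct):
      
          /* Shrink the effective preemption granularity for a task that
           * has already run past the ideal latency, so it is preempted
           * sooner and 'ran' converges instead of oscillating. */
          static unsigned long effective_granularity(struct task_struct *curr,
                                                     unsigned long gran,
                                                     unsigned long ideal_runtime)
          {
                  /* exec_since_preempt: stand-in for the new field */
                  if (curr->exec_since_preempt > ideal_runtime)
                          return 0;   /* overran its slice: preempt now */
                  return gran;
          }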
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: fix sleeper bonus limit · 5f01d519
      Committed by Mike Galbraith
      There is an Amarok song switch time increase (regression) under
      hefty load.
      
      What is happening is that sleeper_bonus is never consumed, and only
      rarely goes below runtime_limit, so for the most part, Amarok isn't
      getting any bonus at all.  We're keeping sleeper_bonus right at
      runtime_limit (sched_latency == sched_runtime_limit == 40ms) forever,
      i.e. we don't consume if we're lower than that, and don't add if we're
      above it.  One Amarok thread waking (or anybody else) will push us past the
      threshold, so the next thread waking gets nada, but will reap pain from
      the previous thread waking until we drop back to runtime_limit.  It
      looks to me like under load, some random task gets a bonus, and
      everybody else pays, whether deserving or not.
      
      This diff fixed the regression for me at any load rate.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    • fix bogus hotplug cpu warning · d243769d
      Committed by Hugh Dickins
      Fix a bogus DEBUG_PREEMPT warning on x86_64 when a CPU is brought
      online after bootup: current_is_keventd is right to note that its use
      of smp_processor_id is preempt-safe, but it should use
      raw_smp_processor_id to avoid the warning.
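      
      The essence of the fix, sketched (simplified from
      current_is_keventd() in kernel/workqueue.c of that era):
      
          int current_is_keventd(void)
          {
                  /* Preempt-safe by construction: keventd is per-cpu, so
                   * the raw accessor skips the DEBUG_PREEMPT check that
                   * smp_processor_id() would perform here. */
                  int cpu = raw_smp_processor_id();
                  struct cpu_workqueue_struct *cwq;
          
                  cwq = per_cpu_ptr(keventd_wq->cpu_wq, cpu);
                  return current == cwq->thread;
          }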
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 26 August 2007, 4 commits
  7. 25 August 2007, 8 commits
  8. 23 August 2007, 5 commits
    • sched: tweak the sched_runtime_limit tunable · 505c0efd
      Committed by Ingo Molnar
      Michael Gerdau reported CPU-usage weirdness with reniced tasks.
      Such symptoms can be caused by limit underruns, so double the
      sched_runtime_limit.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: skip updating rq's next_balance under null SD · f549da84
      Committed by Suresh Siddha
      While playing with sched_smt_power_savings/sched_mc_power_savings, I
      found that although the scheduler domains are reconstructed when sysfs
      settings change, rebalance_domains() can get triggered with a null
      domain on other cpus, which sets next_balance to jiffies + 60*HZ.
      The result is no idle/busy balancing for 60 seconds.
      
      Fix this.
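      
      A hedged sketch of the fix in rebalance_domains() (simplified; the
      flag follows the patch's intent, other details are elided):
      
          static void rebalance_domains(int cpu, enum cpu_idle_type idle)
          {
                  unsigned long next_balance = jiffies + 60*HZ;
                  int update_next_balance = 0;
                  struct sched_domain *sd;
          
                  for_each_domain(cpu, sd) {
                          /* ... work out this domain's interval, maybe
                           *     call load_balance() ... */
                          update_next_balance = 1;
                  }
          
                  /* If sd was NULL (e.g. domains mid-reconstruction) we
                   * walked nothing: leave rq->next_balance alone instead
                   * of pushing it out by 60 seconds. */
                  if (likely(update_next_balance))
                          cpu_rq(cpu)->next_balance = next_balance;
          }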
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix broken SMT/MC optimizations · f8700df7
      Committed by Suresh Siddha
      On a four-package system with HT, the HT load-balancing optimizations
      were broken.  For example, if two tasks end up running on two logical
      threads of one of the packages, the scheduler is not able to pull one
      of the tasks to a completely idle package.
      
      In this scenario, for nice-0 tasks, the imbalance calculated by the
      scheduler will be 512 and find_busiest_queue() will return 0 (as each
      cpu's load is 1024 > imbalance, and each has only one task running).
      
      Similarly, the MC scheduler optimizations also get fixed with this
      patch.
      
      [ mingo@elte.hu: restored fair balancing by increasing the fuzz and
                       adding it back to the power decision, without the /2
                       factor. ]
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix sysctl directory permissions · c57baf1e
      Committed by Eric W. Biederman
      There are two remaining gotchas:
      
      - The directories have impossible permissions: they are marked
        writeable, which makes no sense for a sysctl directory.
      
      - The ctl_name for the kernel directory is inconsistent with
        everything else.  It should be CTL_KERN.
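      
      A hedged sketch of the corrected entries, in the pre-2.6.24 sysctl
      style (illustrative, not the verbatim patch; sched_table is a
      hypothetical child table):
      
          static struct ctl_table sched_root[] = {
                  {
                          .ctl_name = CTL_KERN,   /* consistent ctl_name */
                          .procname = "kernel",
                          .mode     = 0555,       /* r-xr-xr-x: not writeable */
                          .child    = sched_table,
                  },
                  { .ctl_name = 0 }               /* terminator */
          };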
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: sched_clock_idle_[sleep|wakeup]_event() · 2aa44d05
      Committed by Ingo Molnar
      Construct a more or less wall-clock time out of sched_clock() by
      using ACPI idle's existing knowledge about how much time we spent
      idling. This allows the rq clock to work around TSC-stops-in-C2,
      TSC-gets-corrupted-in-C3 types of problems.
      
      ( Besides the scheduler's statistics, this also benefits blktrace
        and printk timestamps. )
      
      Furthermore, the precise before-C2/C3-sleep and after-C2/C3-wakeup
      callbacks allow the scheduler to get the most out of the period where
      the CPU has a reliable TSC. This results in slightly more precise
      task statistics.
      
      The ACPI bits were acked by Len.
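      
      A hedged sketch of how the hooks bracket a C2/C3 idle period
      (ACPI side, heavily simplified; ticks_elapsed_ns() is an
      illustrative conversion helper, not a real kernel function):
      
          u32 t1, t2;
          
          /* Before entering C2/C3: sched_clock() may lose the TSC. */
          sched_clock_idle_sleep_event();
          
          t1 = inl(acpi_gbl_FADT.xpm_timer_block.address); /* ACPI PM timer */
          /* ... enter and leave the C-state ... */
          t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
          
          /* After wakeup: report the idle span in nanoseconds so the rq
           * clock can compensate for a stopped or corrupted TSC. */
          sched_clock_idle_wakeup_event(ticks_elapsed_ns(t1, t2));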
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Len Brown <len.brown@intel.com>