1. 07 March 2006, 1 commit
    • Add early-boot-safety check to cond_resched() · 8ba7b0a1
      Committed by Linus Torvalds
      Just to be safe, we should not trigger a conditional reschedule during
      the early boot sequence.  We've historically done some questionable things
      early on, and the safety warnings in __might_sleep() are generally
      turned off during that period, so there might be problems lurking.
      
      This affects CONFIG_PREEMPT_VOLUNTARY, which takes over might_sleep() to
      cause a voluntary conditional reschedule.
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
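      For reference, a minimal sketch of the kind of check described above,
      assuming the 2.6-era helpers need_resched(), preempt_count() and the
      global system_state; the actual patch may differ in detail:

          int __sched cond_resched(void)
          {
                  /* Skip the voluntary reschedule until the boot sequence
                   * has reached SYSTEM_RUNNING; early boot code is not
                   * always in a state where scheduling is safe. */
                  if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
                      system_state == SYSTEM_RUNNING) {
                          __cond_resched();
                          return 1;
                  }
                  return 0;
          }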
  2. 03 March 2006, 2 commits
  3. 01 March 2006, 1 commit
  4. 21 February 2006, 4 commits
  5. 19 February 2006, 1 commit
  6. 18 February 2006, 4 commits
    • [PATCH] swsusp: fix breakage with swap on LVM · a8534adb
      Committed by Rafael J. Wysocki
      Restore compatibility with the older code and make it possible to
      suspend if the kernel command line doesn't contain the "resume=" argument.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Introduce CONFIG_DEFAULT_MIGRATION_COST · 4bbf39c2
      Committed by Ingo Molnar
      Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
      
        The boot sequence on s390 sometimes takes ages and we spend a very long
        time (up to one or two minutes) in calibrate_migration_costs.  The time
        spent there differs from boot to boot.  Also the calculated costs differ
        a lot.  I've seen differences by up to a factor of 15 (yes, factor not
        percent).  Also I doubt that making these measurements makes much sense on
        a completely virtualized architecture where you cannot tell how much cpu
        time you will get anyway.
      
      So introduce the CONFIG_DEFAULT_MIGRATION_COST method for an architecture
      to set the scheduler migration costs.  This turns off automatic detection
      of migration costs.  This makes sense on virtual platforms, where migration
      costs are hard to measure accurately.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
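      A rough sketch of the approach; migration_cost[] and MAX_DOMAIN_DISTANCE
      follow the 2.6 scheduler, and the exact wiring in the patch may differ:

          /* With CONFIG_DEFAULT_MIGRATION_COST the architecture supplies a
           * fixed cost (in ns) and boot-time calibration is skipped; -1
           * means "unknown, measure at boot". */
          #ifdef CONFIG_DEFAULT_MIGRATION_COST
          static unsigned long long migration_cost[MAX_DOMAIN_DISTANCE] =
                  { [0 ... MAX_DOMAIN_DISTANCE-1] = CONFIG_DEFAULT_MIGRATION_COST };
          #else
          static unsigned long long migration_cost[MAX_DOMAIN_DISTANCE] =
                  { [0 ... MAX_DOMAIN_DISTANCE-1] = -1LL };
          #endif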
    • [PATCH] Provide an interface for getting the current tick length · 726c14bf
      Committed by Paul Mackerras
      This provides an interface for arch code to find out how many
      nanoseconds are going to be added on to xtime by the next call to
      do_timer.  The value returned is a fixed-point number in 52.12 format
      in nanoseconds.  The reason for this format is that it gives the
      full precision that the timekeeping code is using internally.
      
      The motivation for this is to fix a problem that has arisen on 32-bit
      powerpc in that the value returned by do_gettimeofday drifts apart
      from xtime if NTP is being used.  PowerPC is now using a lockless
      do_gettimeofday based on reading the timebase register and performing
      some simple arithmetic.  (This method of getting the time is also
      exported to userspace via the VDSO.)  However, the factor and offset
      it uses were calculated based on the nominal tick length and weren't
      being adjusted when NTP varied the tick length.
      
      Note that 64-bit powerpc has had the lockless do_gettimeofday for a
      long time now.  It also had an extremely hairy routine that got called
      from the 32-bit compat routine for adjtimex, which adjusted the
      factor and offset according to what it thought the timekeeping code
      was going to do.  Not only was this only called if a 32-bit task did
      adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
      duplicating computations from kernel/timer.c and it wasn't clear that
      it was (still) correct.
      
      The simple solution is to ask the timekeeping code how long the
      current jiffy will be on each timer interrupt, after calling
      do_timer.  If this jiffy will be a different length from the last one,
      we then need to compute new values for the factor and offset used in
      the lockless do_gettimeofday.  In this way we can keep xtime and
      do_gettimeofday in sync, even when NTP is varying the tick length.
      
      Note that when adjtimex varies the tick length, it almost always
      introduces the variation from the next tick on.  The only case I could
      see where adjtimex would vary the length of the current tick is when
      an old-style adjtime adjustment is being cancelled.  (It's not clear
      to me why the adjustment has to be cancelled immediately rather than
      from the next tick on.)  Thus I don't see any real need for a hook in
      adjtimex; the rare case of an old-style adjustment being cancelled can
      be fixed up at the next tick.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
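      A sketch of how an architecture's timer interrupt might consume the new
      interface; current_tick_length() and the 52.12 fixed-point format come
      from the description above, while recompute_gtod_factors() is a
      hypothetical arch-side hook:

          void arch_timer_interrupt(struct pt_regs *regs)
          {
                  static u64 last_tick_len;
                  u64 tick_len;

                  do_timer(regs);

                  /* Nanoseconds (52.12 fixed point) the timekeeping code
                   * will add to xtime for this jiffy. */
                  tick_len = current_tick_length();
                  if (tick_len != last_tick_len) {
                          /* NTP changed the tick length: recompute the factor
                           * and offset used by the lockless do_gettimeofday. */
                          recompute_gtod_factors(tick_len);
                          last_tick_len = tick_len;
                  }
          }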
    • [PATCH] x86_64: Add boot option to disable randomized mappings and cleanup · a62eaf15
      Committed by Andi Kleen
      AMD SimNow!'s JIT doesn't like them at all in the guest.  For distribution
      installation it's easiest if it's a boot-time option.

      Also I moved the variable to a more appropriate place and made it
      independent of sysctl, and marked it __read_mostly, which it is.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
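      A sketch of how such a boot option is typically wired up in that era;
      the flag name randomize_va_space and the option name "norandmaps" are
      assumptions based on later kernels, not quoted from the patch:

          /* Read on every exec, almost never written. */
          int __read_mostly randomize_va_space = 1;

          static int __init disable_randmaps(char *str)
          {
                  randomize_va_space = 0;
                  return 1;
          }
          __setup("norandmaps", disable_randmaps);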
  7. 16 February 2006, 5 commits
    • [PATCH] swsusp: nuke noisy message · c8adb494
      Committed by Andrew Morton
      I get about 88 squillion of these when suspending an old ad450nx server.
      
      Cc: Pavel Roskin <proski@gnu.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: oops in exit on null cpuset fix · 06fed338
      Committed by Paul Jackson
      Fix a latent bug in cpuset_exit() handling.  If a task tried to allocate
      memory after calling cpuset_exit(), it oops'd in
      cpuset_update_task_memory_state() on a NULL cpuset pointer.
      
      So set the exiting task's cpuset to the root cpuset instead of to NULL.
      
      A distro kernel hit this with an added kernel package that had just such a
      hook (allocating memory) in the exit code path.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
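      A sketch of the fix described above, assuming the 2.6-era tsk->cpuset
      pointer and the usual name top_cpuset for the root cpuset; dropping the
      reference on the old cpuset is elided:

          void cpuset_exit(struct task_struct *tsk)
          {
                  struct cpuset *cs = tsk->cpuset;

                  /* Leave the task pointing at the root cpuset rather than
                   * NULL, so a late allocation in the exit path still finds
                   * a valid cpuset. */
                  tsk->cpuset = &top_cpuset;

                  /* ... release the reference held on 'cs' ... */
          }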
    • [PATCH] fix zap_thread's ptrace related problems · 5ecfbae0
      Committed by Oleg Nesterov
      1. The tracee can go from ptrace_stop() to do_signal_stop()
         after __ptrace_unlink(p).
      
      2. It is unsafe to __ptrace_unlink(p) while p->parent may wait
         for tasklist_lock in ptrace_detach().
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix kill_proc_info() vs fork() theoretical race · dadac81b
      Committed by Oleg Nesterov
      copy_process:
      
      	attach_pid(p, PIDTYPE_PID, p->pid);
      	attach_pid(p, PIDTYPE_TGID, p->tgid);
      
      What if kill_proc_info(p->pid) happens in between?
      
      copy_process() holds current->sighand.siglock, so we are safe
      in CLONE_THREAD case, because current->sighand == p->sighand.
      
      Otherwise, p->sighand is unlocked, the new process is already
      visible to find_task_by_pid(), but it still has a copy of the parent's
      'struct pid' in ->pids[PIDTYPE_TGID].
      
      This means that __group_complete_signal() may hang while doing
      
      	do ... while (next_thread() != p)
      
      We can solve this problem if we reverse these 2 attach_pid()s:
      
      	attach_pid() does wmb()
      
      	group_send_sig_info() calls spin_lock(), which
      	provides a read barrier. // Yes ?
      
      I don't think we can hit this race in practice, but still.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
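      For clarity, the reordering this describes would look roughly like the
      following in copy_process() (locking and the other attach_pid() calls
      elided):

          /* Attach the TGID link first: once the task is reachable via
           * find_task_by_pid(p->pid), its ->pids[PIDTYPE_TGID] must already
           * point at its own struct pid, not a copy of the parent's. */
          attach_pid(p, PIDTYPE_TGID, p->tgid);
          attach_pid(p, PIDTYPE_PID, p->pid);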
    • [PATCH] fix kill_proc_info() vs CLONE_THREAD race · 3f17da69
      Committed by Oleg Nesterov
      There is a window after copy_process() unlocks ->sighand.siglock
      and before it adds the new thread to the thread list.
      
      In that window __group_complete_signal(SIGKILL) will not see the
      new thread yet, so this thread will start running while the whole
      thread group was supposed to exit.
      
      I believe we have another good reason to place attach_pid(PID/TGID)
      under ->sighand.siglock. We can do the same for
      
      	release_task()->__unhash_process()
      
      	de_thread()->switch_exec_pids()
      
      After that we don't need tasklist_lock to iterate over the thread
      list, and we can simplify things, see for example do_sigaction()
      or sys_times().
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
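      Schematically, the ordering this implies in copy_process() (surrounding
      code elided):

          spin_lock(&current->sighand->siglock);
          /* The new thread is linked into the thread group and the PID/TGID
           * hashes while the siglock is held, so a concurrent
           * __group_complete_signal(SIGKILL) can no longer miss it. */
          attach_pid(p, PIDTYPE_TGID, p->tgid);
          attach_pid(p, PIDTYPE_PID, p->pid);
          spin_unlock(&current->sighand->siglock);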
  8. 15 February 2006, 3 commits
    • [PATCH] hrtimer: round up relative start time on low-res arches · 06027bdd
      Committed by Ingo Molnar
      CONFIG_TIME_LOW_RES is a temporary way for architectures to signal that
      they simply return xtime in do_gettimeoffset().  In this corner-case we
      want to round up by resolution when starting a relative timer, to avoid
      short timeouts.  This will go away with the GTOD framework.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
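      A sketch of the rounding in hrtimer_start(), assuming 2.6.16-era names
      (HRTIMER_REL, ktime_add(), base->resolution); the real code may differ
      slightly:

          if (mode == HRTIMER_REL) {
                  tim = ktime_add(tim, base->get_time());
          #ifdef CONFIG_TIME_LOW_RES
                  /* On low-resolution architectures round the relative expiry
                   * up by one resolution unit to avoid too-short timeouts. */
                  tim = ktime_add(tim, base->resolution);
          #endif
          }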
    • [PATCH] sched: revert "filter affine wakeups" · d6077cb8
      Committed by Chen, Kenneth W
      Revert commit d7102e95:
      
          [PATCH] sched: filter affine wakeups
      
      Apparently it caused a more than 10% performance regression for the aim7
      benchmark.  The setup in use is a 16-cpu HP rx8620, 64GB of memory and 12
      MSA1000s with 144 disks.  Each disk is 72GB with a single ext3 filesystem
      (courtesy of HP, who supplied benchmark results).
      
      The problem is, for aim7, the wake-up pattern is random, but it still needs
      load balancing action in the wake-up path to achieve best performance.  With
      the above commit, lack of load balancing hurts that workload.
      
      However, for workloads like database transaction processing, the requirement
      is exactly opposite.  In the wake up path, best performance is achieved with
      absolutely zero load balancing.  We simply wake up the process on the CPU
      where it previously ran.  Worst performance is obtained when we do load
      balancing at wake up.
      
      There isn't an easy way to auto-detect the workload characteristics.  Ingo's
      earlier patch that detects an idle CPU and decides whether or not to load
      balance doesn't help with aim7 either, since all CPUs are busy (it causes an
      even bigger perf regression).
      
      Revert commit d7102e95, which causes more
      than 10% performance regression with aim7.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] compound page: no access_process_vm check · 16bf1348
      Committed by Hugh Dickins
      The PageCompound check before access_process_vm's set_page_dirty_lock is no
      longer necessary, so remove it.  But leave the PageCompound checks in
      bio_set_pages_dirty, dio_bio_complete and nfs_free_user_pages: at least some
      of those were introduced as a little optimization on hugetlb pages.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  9. 11 February 2006, 2 commits
    • [PATCH] prevent recursive panic from softlockup watchdog · c22db941
      Committed by Jan Beulich
      When panic_timeout is zero, suppress triggering a nested panic due to soft
      lockup detection.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sched: remove smpnice · a2000572
      Committed by Nick Piggin
      I don't think the code is quite ready, which is why I asked for Peter's
      additions to also be merged before I acked it (although it turned out that
      it still isn't quite ready with his additions either).
      
      Basically I have had similar observations to Suresh in that it does not
      play nicely with the rest of the balancing infrastructure (and raised
      similar concerns in my review).
      
      The samples (group of 4) I got for "maximum recorded imbalance" on a 2x2
      SMP+HT Xeon are as follows:
      
                  | Following boot | hackbench 20        | hackbench 40
       -----------+----------------+---------------------+---------------------
       2.6.16-rc2 | 30,37,100,112  | 5600,5530,6020,6090 | 6390,7090,8760,8470
       +nosmpnice |  3, 2,  4,  2  |   28, 150, 294, 132 |  348, 348, 294, 347
      
      Hackbench raw performance is down around 15% with smpnice (but that in
      itself isn't a huge deal because it is just a benchmark).  However, the
      samples show that the imbalance passed into move_tasks is increased by
      about a factor of 10-30.  I think this would also go some way to explaining
      latency blips turning up in the balancing code (though I haven't actually
      measured that).
      
      We'll probably have to revert this in the SUSE kernel.
      
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: "Martin J. Bligh" <mbligh@aracnet.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a2000572
  10. 10 February 2006, 2 commits
  11. 08 February 2006, 11 commits
  12. 06 February 2006, 3 commits
  13. 04 February 2006, 1 commit