1. 24 May 2008, 3 commits
  2. 17 May 2008, 4 commits
  3. 15 May 2008, 2 commits
  4. 12 May 2008, 1 commit
    • Add new 'cond_resched_bkl()' helper function · c3921ab7
      Committed by Linus Torvalds
      It acts exactly like a regular 'cond_resched()', but will not get
      optimized away when CONFIG_PREEMPT is set.
      
      Normal kernel code is already preemptible in the presence of
      CONFIG_PREEMPT, so cond_resched() is optimized away (see commit
      02b67cc3 "sched: do not do
      cond_resched() when CONFIG_PREEMPT").
      
      But when you want to conditionally reschedule while holding a lock, you
      need to use "cond_resched_lock(lock)", and the new function is the BKL
      equivalent of that.
      
      Also make fs/locks.c use it.
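      
      As a usage illustration - a hedged sketch, not the actual fs/locks.c
      change - a long walk done while holding the BKL can now offer a
      reschedule point like this; lock_kernel(), unlock_kernel() and
      cond_resched_bkl() are the interfaces named above, while the list and
      the per-entry helper are made up for the example:
      
              lock_kernel();
              list_for_each_entry(entry, &some_bkl_protected_list, link) {
                      handle_entry(entry);    /* hypothetical per-entry work */
                      /*
                       * Unlike plain cond_resched(), this is not compiled
                       * away under CONFIG_PREEMPT; if it schedules, the BKL
                       * is dropped and re-taken across the context switch.
                       */
                      cond_resched_bkl();
              }
              unlock_kernel();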
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 11 May 2008, 2 commits
    • BKL: revert back to the old spinlock implementation · 8e3e076c
      Committed by Linus Torvalds
      The generic semaphore rewrite had a huge performance regression on AIM7
      (and potentially other BKL-heavy benchmarks) because the generic
      semaphores had been rewritten to be simple to understand and fair.  The
      latter, in particular, turns a semaphore-based BKL implementation into a
      mess of scheduling.
      
      The attempt to fix the performance regression failed miserably (see the
      previous commit 00b41ec2 'Revert
      "semaphore: fix"'), and so for now the simple and sane approach is to
      instead just go back to the old spinlock-based BKL implementation that
      never had any issues like this.
      
      This patch also has the advantage of fixing the regression completely,
      according to Yanmin Zhang, unlike the semaphore hack, which still left
      a regression of a couple of percentage points.
      
      As a spinlock, the BKL obviously has the potential to be a latency
      issue, but it's not really any different from any other spinlock in that
      respect.  We do want to get rid of the BKL asap, but that has been the
      plan for several years.
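      
      For reference, the shape of a spinlock-based BKL is roughly the sketch
      below (simplified and hedged - not the actual lib/kernel_lock.c, which
      also has to deal with preemption while spinning): a single global
      spinlock made recursive through the per-task lock_depth counter, which
      is -1 while the task does not hold the lock:
      
              static DEFINE_SPINLOCK(kernel_flag);
      
              void lock_kernel(void)
              {
                      int depth = current->lock_depth + 1;
      
                      if (likely(!depth))     /* first acquisition by this task */
                              spin_lock(&kernel_flag);
                      current->lock_depth = depth;
              }
      
              void unlock_kernel(void)
              {
                      BUG_ON(current->lock_depth < 0);
                      if (likely(--current->lock_depth < 0))  /* outermost unlock */
                              spin_unlock(&kernel_flag);
              }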
      
      These days, the biggest users are in the tty layer (open/release in
      particular) and Alan holds out some hope:
      
        "tty release is probably a few months away from getting cured - I'm
         afraid it will almost certainly be the very last user of the BKL in
         tty to get fixed as it depends on everything else being sanely locked."
      
      so while we're not there yet, we do have a plan of action.
      Tested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Alexander Viro <viro@ftp.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "semaphore: fix" · 00b41ec2
      Committed by Linus Torvalds
      This reverts commit bf726eab, as it has
      been reported to cause a regression with processes stuck in __down(),
      apparently because of a missing wakeup.
      
      Quoth Sven Wegener:
       "I'm currently investigating a regression that has showed up with my
        last git pull yesterday.  Bisecting the commits showed bf726e
        "semaphore: fix" to be the culprit, reverting it fixed the issue.
      
        Symptoms: During heavy filesystem usage (e.g.  a kernel compile) I get
        several compiler processes in uninterruptible sleep, blocking all i/o
        on the filesystem.  System is an Intel Core 2 Quad running a 64bit
        kernel and userspace.  Filesystem is xfs on top of lvm.  See below for
        the output of sysrq-w."
      
      See
      
      	http://lkml.org/lkml/2008/5/10/45
      
      for the full report.
      
      In the meantime, we can just fix the BKL performance regression by
      reverting back to the good old BKL spinlock implementation instead,
      since any sleeping lock will generally perform badly, especially if it
      tries to be fair.
      Reported-by: Sven Wegener <sven.wegener@stealer.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 09 May 2008, 3 commits
  7. 08 May 2008, 3 commits
    • sched: fix weight calculations · 46151122
      Committed by Mike Galbraith
      The conversion between virtual and real time is as follows:
      
        dvt = rw/w * dt <=> dt = w/rw * dvt
      
      Since we want the fair sleeper granularity to be in real time, we actually
      need to do:
      
        dvt = - rw/w * l
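      
      A hedged numeric check of these conversions (plain user-space C, not
      scheduler code; w is the task's weight, rw the total runqueue weight,
      and the weights and the real-time span l below are made-up example
      values):
      
              #include <stdio.h>
      
              int main(void)
              {
                      double rw = 3072.0;  /* assumed runqueue weight: three nice-0 tasks */
                      double w  = 1024.0;  /* assumed weight of one nice-0 task */
                      double l  = 20.0;    /* desired real-time sleeper credit, in ms */
      
                      double dvt = rw / w * l;    /* virtual time equivalent to l ms of real time */
                      double dt  = w / rw * dvt;  /* converting back recovers the real-time span */
      
                      printf("dvt = %.1f virtual ms, dt = %.1f real ms\n", dvt, dt);
                      return 0;
              }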
      
      This bug could be related to the regression reported by Yanmin Zhang:
      
      | Comparing with kernel 2.6.25, sysbench+mysql(oltp, readonly) has lots
      | of regressions with 2.6.26-rc1:
      |
      | 1) 8-core stoakley: 28%;
      | 2) 16-core tigerton: 20%;
      | 3) Itanium Montvale: 50%.
      Reported-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
    • semaphore: fix · bf726eab
      Committed by Ingo Molnar
      Yanmin Zhang reported:
      
      | Comparing with kernel 2.6.25, AIM7 (use tmpfs) has more th
      | regression under 2.6.26-rc1 on my 8-core stoakley, 16-core tigerton,
      | and Itanium Montecito. Bisect located the patch below:
      |
      | 64ac24e7 is first bad commit
      | commit 64ac24e7
      | Author: Matthew Wilcox <matthew@wil.cx>
      | Date:   Fri Mar 7 21:55:58 2008 -0500
      |
      |     Generic semaphore implementation
      |
      | After I manually reverted the patch against 2.6.26-rc1 while fixing
      | lots of conflicts/errors, aim7 regression became less than 2%.
      
      I reproduced the AIM7 workload and can confirm Yanmin's findings that
      2.6.26-rc1 regresses over 2.6.25 - by over 67% here.
      
      Looking at the workload I found and fixed what I believe to be the real
      bug causing the AIM7 regression: it was inefficient wakeup / scheduling
      / locking behavior of the new generic semaphore code, causing suboptimal
      performance.
      
      The problem comes from the following code. The new semaphore code does
      this on down():
      
              spin_lock_irqsave(&sem->lock, flags);
              if (likely(sem->count > 0))
                      sem->count--;
              else
                      __down(sem);
              spin_unlock_irqrestore(&sem->lock, flags);
      
      and this on up():
      
              spin_lock_irqsave(&sem->lock, flags);
              if (likely(list_empty(&sem->wait_list)))
                      sem->count++;
              else
                      __up(sem);
              spin_unlock_irqrestore(&sem->lock, flags);
      
      where __up() does:
      
              list_del(&waiter->list);
              waiter->up = 1;
              wake_up_process(waiter->task);
      
      and where __down() does this in essence:
      
              list_add_tail(&waiter.list, &sem->wait_list);
              waiter.task = task;
              waiter.up = 0;
              for (;;) {
                      [...]
                      spin_unlock_irq(&sem->lock);
                      timeout = schedule_timeout(timeout);
                      spin_lock_irq(&sem->lock);
                      if (waiter.up)
                              return 0;
              }
      
      The fastpath looks good and obvious, but note the following property of
      the contended path: if there's a task on the ->wait_list, the up() of
      the current owner will "pass over" ownership to that waiting task, in a
      wake-one manner, via the waiter->up flag and by removing the waiter from
      the wait list.
      
      That is all fine in principle, but as implemented in kernel/semaphore.c
      it also creates a nasty, hidden source of contention!
      
      The contention comes from the following property of the new semaphore
      code: the new owner owns the semaphore exclusively, even if it is not
      running yet.
      
      So if the old owner, even if just a few instructions later, does a
      down() [lock_kernel()] again, it will be blocked and will have to wait
      on the new owner to eventually be scheduled (possibly on another CPU)!
      Or if another task gets to lock_kernel() sooner than the "new owner" is
      scheduled, it will be blocked unnecessarily and for a very long time
      when there are 2000 tasks running.
      
      I.e. the new semaphore code implements wake-one and lock ownership in a
      very restrictive way - it does not allow opportunistic re-locking of the
      lock at all and keeps the scheduler from picking task order
      intelligently.
      
      This kind of scheduling, with 2000 AIM7 processes running, creates awful
      cross-scheduling between those 2000 tasks, causing reduced parallelism,
      a throttled runqueue length and a lot of idle time. With an increasing
      number of CPUs it causes exponentially worse behavior in AIM7, as it
      becomes less and less likely that a newly woken new-owner task will
      actually run anytime soon.
      
      Note that it takes just a tiny bit of contention for the 'new-semaphore
      catastrophe' to happen: the wakeup latencies get added to whatever small
      contention there is, and quickly snowball out of control!
      
      I believe Yanmin's findings and numbers support this analysis too.
      
      The best fix for this problem is to use the same scheduling logic that
      the kernel/mutex.c code uses: keep the wake-one behavior (that is OK and
      wanted because we do not want to over-schedule), but also allow
      opportunistic locking of the lock even if a wakee is already "in
      flight".
      
      The patch below implements this new logic. With this patch applied the
      AIM7 regression is largely fixed on my quad testbox:
      
        # v2.6.25 vanilla:
        ..................
        Tasks   Jobs/Min        JTI     Real    CPU     Jobs/sec/task
        2000    56096.4         91      207.5   789.7   0.4675
        2000    55894.4         94      208.2   792.7   0.4658
      
        # v2.6.26-rc1-166-gc0a18111 vanilla:
        ...................................
        Tasks   Jobs/Min        JTI     Real    CPU     Jobs/sec/task
        2000    33230.6         83      350.3   784.5   0.2769
        2000    31778.1         86      366.3   783.6   0.2648
      
        # v2.6.26-rc1-166-gc0a18111 + semaphore-speedup:
        ...............................................
        Tasks   Jobs/Min        JTI     Real    CPU     Jobs/sec/task
        2000    55707.1         92      209.0   795.6   0.4642
        2000    55704.4         96      209.0   796.0   0.4642
      
      i.e. a 67% speedup. We are now back to within 1% of the v2.6.25
      performance levels and have zero idle time during the test, as expected.
      
      Btw., interactivity also improved dramatically with the fix - for
      example, console-switching became almost instantaneous during this
      workload (which, after all, is running 2000 tasks at once!); without
      the patch it was stuck for a minute at times.
      
      Another nice side effect of this speedup patch is that the new generic
      semaphore code got even smaller:
      
         text    data     bss     dec     hex filename
         1241       0       0    1241     4d9 semaphore.o.before
         1207       0       0    1207     4b7 semaphore.o.after
      
      (because the waiter.up complication got removed.)
      
      Longer-term we should look into using the mutex code for the generic
      semaphore code as well - but it's not easy due to legacies, and it's
      outside the scope of v2.6.26 and outside the scope of this patch as
      well.
      Bisected-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Revert "relay: fix splice problem" · 75065ff6
      Committed by Jens Axboe
      This reverts commit c3270e57.
  8. 06 May 2008, 15 commits
  9. 05 May 2008, 3 commits
    • Removal of FUTEX_FD · 82af7aca
      Committed by Eric Sesterhenn
      Since FUTEX_FD was scheduled for removal in June 2007, let's remove it.
      
      Google Code search found no users of it, and NGPT was abandoned in 2003
      according to IBM.  futex.h is left untouched to make sure the id does
      not get reassigned.  Since queue_me() has no users left, it is commented
      out to avoid a warning; I didn't remove it completely since it is part
      of the internal API (matching unqueue_me()).
      Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed rest)
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kgdb: fix signedness mismatches, add statics, add declaration to header · 688b744d
      Committed by Harvey Harrison
      Noticed by sparse:
      arch/x86/kernel/kgdb.c:556:15: warning: symbol 'kgdb_arch_pc' was not declared. Should it be static?
      kernel/kgdb.c:149:8: warning: symbol 'kgdb_do_roundup' was not declared. Should it be static?
      kernel/kgdb.c:193:22: warning: symbol 'kgdb_arch_pc' was not declared. Should it be static?
      kernel/kgdb.c:712:5: warning: symbol 'remove_all_break' was not declared. Should it be static?
      
      Related to kgdb_hex2long:
      arch/x86/kernel/kgdb.c:371:28: warning: incorrect type in argument 2 (different signedness)
      arch/x86/kernel/kgdb.c:371:28:    expected long *long_val
      arch/x86/kernel/kgdb.c:371:28:    got unsigned long *<noident>
      kernel/kgdb.c:469:27: warning: incorrect type in argument 2 (different signedness)
      kernel/kgdb.c:469:27:    expected long *long_val
      kernel/kgdb.c:469:27:    got unsigned long *<noident>
      kernel/kgdb.c:470:27: warning: incorrect type in argument 2 (different signedness)
      kernel/kgdb.c:470:27:    expected long *long_val
      kernel/kgdb.c:470:27:    got unsigned long *<noident>
      kernel/kgdb.c:894:27: warning: incorrect type in argument 2 (different signedness)
      kernel/kgdb.c:894:27:    expected long *long_val
      kernel/kgdb.c:894:27:    got unsigned long *<noident>
      kernel/kgdb.c:895:27: warning: incorrect type in argument 2 (different signedness)
      kernel/kgdb.c:895:27:    expected long *long_val
      kernel/kgdb.c:895:27:    got unsigned long *<noident>
      kernel/kgdb.c:1127:28: warning: incorrect type in argument 2 (different signedness)
      kernel/kgdb.c:1127:28:    expected long *long_val
      kernel/kgdb.c:1127:28:    got unsigned long *<noident>
      kernel/kgdb.c:1132:25: warning: incorrect type in argument 2 (different signedness)
      kernel/kgdb.c:1132:25:    expected long *long_val
      kernel/kgdb.c:1132:25:    got unsigned long *<noident>
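      
      A minimal stand-alone illustration of that class of warning (plain
      user-space C with made-up names, not the kgdb code itself), plus the
      direction of fix the warnings point at:
      
              #include <stdio.h>
              #include <stdlib.h>
      
              static int hex2long(char **ptr, long *long_val)   /* old-style prototype */
              {
                      char *end;
      
                      *long_val = strtol(*ptr, &end, 16);
                      return (int)(end - *ptr);
              }
      
              int main(void)
              {
                      char buf[] = "c0ffee";
                      char *p = buf;
                      unsigned long addr;     /* callers hold addresses, i.e. unsigned values */
      
                      hex2long(&p, &addr);    /* sparse/gcc: pointer signedness mismatch,
                                                 the same class of warning listed above */
                      printf("%lx\n", addr);
                      return 0;
              }
      
      One natural fix - and presumably the direction taken, given the warning
      text - is to declare the parameter as "unsigned long *long_val" so the
      callers need neither a cast nor a signed temporary.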
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    • Make forced module loading optional · 826e4506
      Committed by Linus Torvalds
      The kernel module loader used to be much too happy to allow loading of
      modules for the wrong kernel version by default.  For example, if you
      had MODVERSIONS enabled, but tried to load a module with no version
      info, it would happily load it and taint the kernel - whether it was
      likely to actually work or not!
      
      Generally, such forced module loading should be considered a really
      really bad idea, so make it conditional on a new config option
      (MODULE_FORCE_LOAD), and make it default to off.
      
      If somebody really wants to force module loads, that's their problem,
      but we should not encourage it.  Especially as it happened to me by
      mistake (i.e. regular unversioned Fedora modules getting loaded), causing
      lots of strange behavior.
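      
      A hedged sketch of the gating this describes (the helper name and the
      messages are assumptions for illustration, not the actual kernel/module.c
      diff): with CONFIG_MODULE_FORCE_LOAD unset a forced load is simply
      refused, with it set the load proceeds but taints the kernel:
      
              #ifdef CONFIG_MODULE_FORCE_LOAD
              static int try_force_load(struct module *mod, const char *reason)
              {
                      printk(KERN_WARNING "%s: %s, kernel tainted.\n", mod->name, reason);
                      add_taint(TAINT_FORCED_MODULE);
                      return 0;               /* allow the load, but taint */
              }
              #else
              static int try_force_load(struct module *mod, const char *reason)
              {
                      printk(KERN_ERR "%s: %s not allowed.\n", mod->name, reason);
                      return -ENOEXEC;        /* forcing is compiled out */
              }
              #endif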
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 04 May 2008, 3 commits
  11. 03 May 2008, 1 commit
    • Make constants in kernel/timeconst.h fixed 64 bits · b9095fd8
      Committed by H. Peter Anvin
      Force constants in kernel/timeconst.h (except shift counts) to be 64 bits,
      using U64_C() constructor macros, and eliminate constants that cannot
      be represented at all in 64 bits.  This avoids warnings with some gcc
      versions.
      
      Drop generating 64-bit constants, since we have no real hope of
      getting a full set (operations on 64-bit values require a 128-bit
      intermediate result, which gcc only supports on 64-bit platforms, and
      only with libgcc support on some).  Note that the use of these
      constants does not depend on whether we are on a 32- or 64-bit
      architecture.
      
      This resolves Bugzilla 10153.
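      
      A small user-space sketch of the multiply-and-shift scheme these
      constants feed (macro names mirror timeconst.h, but the values are made
      up for an assumed HZ=250 and are not the generated output): keeping the
      multiplier a 64-bit constant is what stops the intermediate product from
      overflowing 32-bit arithmetic, while the shift count stays a plain int:
      
              #include <stdint.h>
              #include <stdio.h>
      
              #define U64_C(x)          x##ULL             /* stand-in for the kernel's U64_C() */
              #define HZ_TO_MSEC_MUL32  U64_C(0x400000000) /* assumed: 4 ms/jiffy scaled by 2^32 */
              #define HZ_TO_MSEC_SHR32  32                 /* shift count stays a plain int */
      
              static unsigned int jiffies_to_msecs_sketch(unsigned long j)
              {
                      return (unsigned int)((HZ_TO_MSEC_MUL32 * (uint64_t)j) >> HZ_TO_MSEC_SHR32);
              }
      
              int main(void)
              {
                      /* 250 jiffies at HZ=250 is one second, i.e. 1000 ms */
                      printf("%u\n", jiffies_to_msecs_sketch(250));
                      return 0;
              }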
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>