1. 09 May 2008: 27 commits
  2. 08 May 2008: 13 commits
    • sched: fix weight calculations · 46151122
      Committed by Mike Galbraith
      The conversion between virtual and real time is as follows:
      
        dvt = rw/w * dt <=> dt = w/rw * dvt
      
      Since we want the fair sleeper granularity to be in real time, we actually
      need to do:
      
        dvt = - rw/w * l
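
      To make the relation concrete, here is a minimal C sketch (illustrative
      only - not the actual patch, and the helper name is hypothetical): a
      real-time quantity such as the sleeper granularity 'l' must be scaled
      by rw/w before it can be applied to virtual time.

              /*
               * dvt = rw/w * dt: convert real time to virtual time.
               * 'rw' is the total runqueue load weight, 'w' is the task's
               * weight; a heavier runqueue makes virtual time advance
               * faster relative to real time.
               */
              static inline unsigned long long
              real_to_virtual(unsigned long long dt, unsigned long rw,
                              unsigned long w)
              {
                      return dt * rw / w;
              }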
      
      This bug could be related to the regression reported by Yanmin Zhang:
      
      | Comparing with kernel 2.6.25, sysbench+mysql(oltp, readonly) has lots
      | of regressions with 2.6.26-rc1:
      |
      | 1) 8-core Stoakley: 28%;
      | 2) 16-core Tigerton: 20%;
      | 3) Itanium Montvale: 50%.
      Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • semaphore: fix · bf726eab
      Committed by Ingo Molnar
      Yanmin Zhang reported:
      
      | Comparing with kernel 2.6.25, AIM7 (use tmpfs) has more than 40%
      | regression under 2.6.26-rc1 on my 8-core Stoakley, 16-core Tigerton,
      | and Itanium Montecito. Bisect located the patch below:
      |
      | 64ac24e7 is first bad commit
      | commit 64ac24e7
      | Author: Matthew Wilcox <matthew@wil.cx>
      | Date:   Fri Mar 7 21:55:58 2008 -0500
      |
      |     Generic semaphore implementation
      |
      | After I manually reverted the patch against 2.6.26-rc1 while fixing
      | lots of conflicts/errors, aim7 regression became less than 2%.
      
      I reproduced the AIM7 workload and can confirm Yanmin's findings:
      2.6.26-rc1 regresses over 2.6.25 by over 67% here.
      
      Looking at the workload, I found and fixed what I believe to be the real
      bug causing the AIM7 regression: inefficient wakeup / scheduling /
      locking behavior in the new generic semaphore code, causing suboptimal
      performance.
      
      The problem comes from the following code. The new semaphore code does
      this on down():
      
              spin_lock_irqsave(&sem->lock, flags);
              if (likely(sem->count > 0))
                      sem->count--;
              else
                      __down(sem);
              spin_unlock_irqrestore(&sem->lock, flags);
      
      and this on up():
      
              spin_lock_irqsave(&sem->lock, flags);
              if (likely(list_empty(&sem->wait_list)))
                      sem->count++;
              else
                      __up(sem);
              spin_unlock_irqrestore(&sem->lock, flags);
      
      where __up() does:
      
              list_del(&waiter->list);
              waiter->up = 1;
              wake_up_process(waiter->task);
      
      and where __down() does this in essence:
      
              list_add_tail(&waiter.list, &sem->wait_list);
              waiter.task = task;
              waiter.up = 0;
              for (;;) {
                      [...]
                      spin_unlock_irq(&sem->lock);
                      timeout = schedule_timeout(timeout);
                      spin_lock_irq(&sem->lock);
                      if (waiter.up)
                              return 0;
              }
      
      The fastpath looks good and obvious, but note the following property of
      the contended path: if there's a task on the ->wait_list, the up() of
      the current owner will "pass over" ownership to that waiting task, in a
      wake-one manner, via the waiter->up flag and by removing the waiter from
      the wait list.
      
      That is all fine in principle, but as implemented in kernel/semaphore.c
      it also creates a nasty, hidden source of contention!
      
      The contention comes from the following property of the new semaphore
      code: the new owner owns the semaphore exclusively, even if it is not
      running yet.
      
      So if the old owner, even if just a few instructions later, does a
      down() [lock_kernel()] again, it will be blocked and will have to wait
      on the new owner to eventually be scheduled (possibly on another CPU)!
      Or if another task gets to lock_kernel() sooner than the "new owner" is
      scheduled, it will be blocked unnecessarily, and for a very long time
      when there are 2000 tasks running.
      
      I.e., the new semaphore code implements wake-one and lock ownership in a
      very restrictive way - it does not allow opportunistic re-locking of the
      lock at all, and it keeps the scheduler from picking task order
      intelligently.
      
      This kind of scheduling, with 2000 AIM7 processes running, creates awful
      cross-scheduling between those 2000 tasks, causes reduced parallelism, a
      throttled runqueue length, and a lot of idle time. With an increasing
      number of CPUs the behavior in AIM7 gets exponentially worse, as it
      becomes less and less likely that a newly woken new-owner task will
      actually run anytime soon.
      
      Note that it takes just a tiny bit of contention for the 'new-semaphore
      catastrophe' to happen: the wakeup latencies get added to whatever small
      contention there is, and quickly snowball out of control!
      
      I believe Yanmin's findings and numbers support this analysis too.
      
      The best fix for this problem is to use the same scheduling logic that
      the kernel/mutex.c code uses: keep the wake-one behavior (that is OK and
      wanted because we do not want to over-schedule), but also allow
      opportunistic locking of the lock even if a wakee is already "in
      flight".
      
      The patch below implements this new logic. With this patch applied the
      AIM7 regression is largely fixed on my quad testbox:
      
        # v2.6.25 vanilla:
        ..................
        Tasks   Jobs/Min        JTI     Real    CPU     Jobs/sec/task
        2000    56096.4         91      207.5   789.7   0.4675
        2000    55894.4         94      208.2   792.7   0.4658
      
        # v2.6.26-rc1-166-gc0a18111 vanilla:
        ...................................
        Tasks   Jobs/Min        JTI     Real    CPU     Jobs/sec/task
        2000    33230.6         83      350.3   784.5   0.2769
        2000    31778.1         86      366.3   783.6   0.2648
      
        # v2.6.26-rc1-166-gc0a18111 + semaphore-speedup:
        ...............................................
        Tasks   Jobs/Min        JTI     Real    CPU     Jobs/sec/task
        2000    55707.1         92      209.0   795.6   0.4642
        2000    55704.4         96      209.0   796.0   0.4642
      
      I.e., a 67% speedup. We are now back to within 1% of the v2.6.25
      performance levels and have zero idle time during the test, as expected.
      
      Btw., interactivity also improved dramatically with the fix - for
      example, console switching became almost instantaneous during this
      workload (which, after all, is running 2000 tasks at once!), whereas
      without the patch it was stuck for a minute at times.
      
      There's another nice side-effect of this speedup patch: the new generic
      semaphore code got even smaller:
      
         text    data     bss     dec     hex filename
         1241       0       0    1241     4d9 semaphore.o.before
         1207       0       0    1207     4b7 semaphore.o.after
      
      (because the waiter.up complication got removed.)
      
      Longer-term we should look into using the mutex code for the generic
      semaphore code as well - but it's not easy due to legacies, and it's
      outside the scope of v2.6.26 and of this patch as well.
      Bisected-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Revert "relay: fix splice problem" · 75065ff6
      Committed by Jens Axboe
      This reverts commit c3270e57.
    • [ALSA] soc at91 minor bug fixes · e3a2efa6
      Committed by Patrik Sevallius
      Found these two bugs while browsing through the code.  The first one is
      a cut-and-paste bug: instead of disabling the clock when request_irq()
      fails, the code enabled it once more.  The second one fixes a debug
      printout: AT91_SSC_IER is write-only, while AT91_SSC_IMR is readable
      (and the printed string actually says imr).
      
      Frank Mandarino was busy so he asked me to send these to this list.
      
      /Patrik
      Signed-off-by: Patrik Sevallius <patrik.sevallius@enea.com>
      Acked-by: Frank Mandarino <fmandarino@endrelia.com>
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • [ALSA] soc - at91-pcm - Fix line wrapping · 30a717f7
      Committed by Mark Brown
      There's more checkpatch stuff to fix in the driver; this just fixes the
      minimum required for the following patch to be clean.
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • sparc: Fix SA_ONSTACK signal handling. · dc5dc7e6
      Committed by David S. Miller
      We need to be more liberal about the alignment of the buffer given to
      us by sigaltstack().  The user should not need to be mindful of all of
      the alignment constraints we have for the stack frame.
      
      This mirrors how we handle this situation in clone() as well.
      
      Also, we align the stack even in non-SA_ONSTACK cases so that signals
      due to bad stack alignment can be delivered properly.  This makes such
      errors easier to debug and recover from.
      
      Finally, add the sanity check x86 has to make sure we won't overflow
      the signal stack.
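
      A rough sketch of such a check (hypothetical, not the actual sparc
      code; on_sig_stack() is the kernel helper that tests whether an
      address lies within the sigaltstack region):

              /* The frame occupies [sp - framesize, sp).  If sp is on the
               * alternate stack but the frame's low end is not, the frame
               * would overflow the signal stack. */
              sp &= ~15UL;    /* enforce stack-frame alignment */
              if (on_sig_stack(sp) && !on_sig_stack(sp - framesize))
                      return (void __user *) -1L;     /* force SIGSEGV */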
      
      This fixes the glibc testcases nptl/tst-cancel20.c and
      nptl/tst-cancelx20.c.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · 3de2403e
      Committed by Linus Torvalds
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
        sparc: Fix fork/clone/vfork system call restart.
        sparc: Fix mmap VA span checking.
    • sparc: Fix fork/clone/vfork system call restart. · 1e38c126
      Committed by David S. Miller
      We clobber %i1 as well as %i0 for these system calls,
      because they give two return values.
      
      Therefore, on error, we have to restore %i1 properly
      or else the restart explodes since it uses the wrong
      arguments.
      
      This fixes glibc's nptl/tst-eintr1.c testcase.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MAINTAINERS] New maintainer for Intel ethernet adapters · e0164af6
      Committed by Auke Kok
      I'm handing over maintainership to Jeff Kirsher and moving on
      to other Linux/Open Source work within Intel. Good luck to Jeff ;)
      Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • IB/ehca: Wait for async events to finish before destroying QP · 12137c59
      Committed by Stefan Roscher
      This is necessary because, in a multicore environment, a race between
      the uverbs async event handler and destroy QP could occur.
      
      Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/ipath: Fix SDMA error recovery in absence of link status change · ab69b3cf
      Committed by John Gregor
      What's fixed:
      
          in ipath_cancel_sends()
      
              We need to unconditionally set ABORTING.  So, swap the tests
              so the set_bit() isn't shadowed by the && (see the sketch
              after this list).
      
              If we've disarmed the piobufs, then we need to unconditionally
              set DISARMED.  So, move it out from the overly protective if
              at the bottom.
      
          in sdma_abort_task()
      
              Abort_task was written knowing that the SDMA engine would always
              be reset (and restarted) on error.  A recent change broke that
              fundamental assumption by taking the restart portion and making
              it conditional on a link status change.  But, SDMA can go boom
              without a link status change in some conditions.
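
      A minimal sketch of the && short-circuit problem in ipath_cancel_sends()
      (the shapes and names here are illustrative, not the driver's exact
      code):

              /* before: when 'restore' is 0, && short-circuits and the
               * ABORTING bit is never set */
              if (restore && !test_and_set_bit(IPATH_SDMA_ABORTING, statp))
                      abort_sdma();
              /* after: test the bit first, so it is set unconditionally */
              if (!test_and_set_bit(IPATH_SDMA_ABORTING, statp) && restore)
                      abort_sdma();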
      Signed-off-by: John Gregor <john.gregor@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/ipath: Need to always request and handle PIO avail interrupts · e2ab41ca
      Committed by Dave Olson
      Now that we always use PIO for vl15 on 7220, we could get stuck forever
      if we happened to run out of PIO buffers from the verbs code, because
      the setup code wouldn't run; the interrupt was also ignored if SDMA was
      supported.  We also have to reduce the pio update threshold if we have
      fewer kernel buffers than the existing threshold.
      
      Clean up the initialization a bit to make the ordering safer and more
      sensible, and use the existing ipath_chg_kernavail call to do the init
      rather than doing it separately.
      
      Drop unnecessary clearing of pio buffer on pio parity error.
      
      Drop incorrect updating of pioavailshadow when exiting freeze mode
      (software state may not match chip state if a buffer has been allocated
      and not yet written).
      
      If we couldn't get a kernel buffer for a while, make sure we are
      in sync with hardware, mainly to handle the exiting-freeze case.
      Signed-off-by: Dave Olson <dave.olson@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/ipath: Fix count of packets received by kernel · 2889d1ef
      Committed by Michael Albaugh
      The loop in ipath_kreceive() that processes packets increments the
      loop-index 'i' once too often, because the exit condition does not
      depend on it, and is checked after the increment. By adding a check for
      !last to the iterator in the for loop, we correct that in a way that is
      not so likely to be re-broken by changes in the loop body.
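
      A minimal illustration of the off-by-one (process_packet() is a
      hypothetical stand-in for the body of the loop; 'i' is the 1-based
      packet count used after the loop):

              /* before: the iterator also runs on the final pass, after
               * 'last' is set, so i ends up one past the real count */
              for (i = 1; !last; i++)
                      last = process_packet();

              /* after: only advance the count when another packet follows */
              for (last = 0, i = 1; !last; i += !last)
                      last = process_packet();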
      Signed-off-by: Michael Albaugh <micheal.albaugh@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>