1. 21 September 2010, 1 commit
    • sched: Increment cache_nice_tries only on periodic lb · 58b26c4c
      Venkatesh Pallipadi authored
      The scheduler uses cache_nice_tries as an indicator for doing
      cache-hot and active load balancing when normal load balancing
      fails. Currently, this value is bumped on every failed load-balance
      attempt. That ends up being unkind to workloads that enter/exit
      idle often, as they do more frequent new_idle balancing, which
      pretty soon results in cache-hot tasks being pulled around.
      
      Making cache_nice_tries ignore failed new_idle balancing makes more
      sense. With that, only failures during periodic load balancing are
      accounted, and the rate at which cache_nice_tries accumulates no
      longer depends on idle entry/exit (short-running sleep-wakeup kinds
      of tasks). This reduces movement of cache-hot tasks.
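
      As a rough sketch of the idea (not the literal diff; it assumes the
      2.6.36-era load_balance() path, where sd->nr_balance_failed is what
      ages toward the cache_nice_tries threshold), the change amounts to
      making the failure accounting conditional on the balance type:

        /* Only periodic balancing contributes to the failure count;
         * newly-idle balancing no longer ages toward cache_nice_tries. */
        if (idle != CPU_NEWLY_IDLE)
                sd->nr_balance_failed++;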
      
      schedstat diff (after minus before) excerpt from a workload with a
      frequent, short wakeup-idle pattern (":2" in the cpu column below
      refers to the NEWIDLE index). This snapshot covers ~400 seconds.
      
      Without this change:
      domainstats:  domain0
       cpu     cnt      bln      fld      imb     gain    hgain  nobusyq  nobusyg
       0:2  306487   219575    73167  110069413    44583    19070     1172   218403
       1:2  292139   194853    81421  120893383    50745    21902     1259   193594
       2:2  283166   174607    91359  129699642    54931    23688     1287   173320
       3:2  273998   161788    93991  132757146    57122    24351     1366   160422
       4:2  289851   215692    62190  83398383    36377    13680      851   214841
       5:2  316312   222146    77605  117582154    49948    20281      988   221158
       6:2  297172   195596    83623  122133390    52801    21301      929   194667
       7:2  283391   178078    86378  126622761    55122    22239      928   177150
       8:2  297655   210359    72995  110246694    45798    19777     1125   209234
       9:2  297357   202011    79363  119753474    50953    22088     1089   200922
      10:2  278797   178703    83180  122514385    52969    22726     1128   177575
      11:2  272661   167669    86978  127342327    55857    24342     1195   166474
      12:2  293039   204031    73211  110282059    47285    19651      948   203083
      13:2  289502   196762    76803  114712942    49339    20547     1016   195746
      14:2  264446   169609    78292  115715605    50459    21017      982   168627
      15:2  260968   163660    80142  116811793    51483    21281     1064   162596
      
      With this change:
      domainstats:  domain0
       cpu     cnt      bln      fld      imb     gain    hgain  nobusyq  nobusyg
       0:2  272347   187380    77455  105420270    24975        1      953   186427
       1:2  267276   172360    86234  116242264    28087        6     1028   171332
       2:2  259769   156777    93281  123243134    30555        1     1043   155734
       3:2  250870   143129    97627  127370868    32026        6     1188   141941
       4:2  248422   177116    64096  78261112    22202        2      757   176359
       5:2  275595   180683    84950  116075022    29400        6      778   179905
       6:2  262418   162609    88944  119256898    31056        4      817   161792
       7:2  252204   147946    92646  122388300    32879        4      824   147122
       8:2  262335   172239    81631  110477214    26599        4      864   171375
       9:2  261563   164775    88016  117203621    28331        3      849   163926
      10:2  243389   140949    93379  121353071    29585        2      909   140040
      11:2  242795   134651    98310  124768957    30895        2     1016   133635
      12:2  255234   166622    79843  104696912    26483        4      746   165876
      13:2  244944   151595    83855  109808099    27787        3      801   150794
      14:2  241301   140982    89935  116954383    30403        6      845   140137
      15:2  232271   128564    92821  119185207    31207        4     1416   127148
      Signed-off-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1284167957-3675-1-git-send-email-venki@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      58b26c4c
  2. 16 September 2010, 1 commit
    • sched: Remove branch hints within context_switch() · 31915ab4
      Heiko Carstens authored
      With 710390d9 "sched: Optimize branch hint in context_switch()"
      the branch hint logic within context_switch() got inverted.
      
      In fact the hints "if (likely(!mm))" and "if (likely(!prev->mm))"
      mean that the next and previous tasks are likely to be kernel
      threads.
      
      That assumption is certainly counter-intuitive, but Tim has shown
      that at least with his workload it holds. Nevertheless, the truth
      is that it depends on the current workload, so just remove the
      annotations, which also improves readability.
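
      For context, an abbreviated sketch of the two tests in question
      (2.6.36-era context_switch() in kernel/sched.c), shown without the
      hints rather than with them inverted:

        if (!mm) {                      /* was: if (likely(!mm)) */
                next->active_mm = oldmm;
                atomic_inc(&oldmm->mm_count);
                enter_lazy_tlb(oldmm, next);
        } else
                switch_mm(oldmm, mm, next);

        if (!prev->mm) {                /* was: if (likely(!prev->mm)) */
                prev->active_mm = NULL;
                rq->prev_mm = oldmm;
        }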
      Reported-by: Tim Blechmann <tim@klingt.org>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <20100916124225.GA2209@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      31915ab4
  3. 15 September 2010, 1 commit
    • compat: Make compat_alloc_user_space() incorporate the access_ok() · c41d68a5
      H. Peter Anvin authored
      compat_alloc_user_space() expects the caller to independently call
      access_ok() to verify the returned area.  A missing call could
      introduce problems on some architectures.
      
      This patch incorporates the access_ok() check into
      compat_alloc_user_space() and also adds a sanity check on the length.
      The existing compat_alloc_user_space() implementations are renamed
      arch_compat_alloc_user_space() and are used as part of the
      implementation of the new global function.
      
      This patch assumes NULL will cause __get_user()/__put_user() to either
      fail or access userspace on all architectures.  This should be
      followed by checking the return value of compat_alloc_user_space()
      for NULL in the callers, at which time the access_ok() in the callers
      can also be removed.
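
      A sketch of what the resulting wrapper might look like (paraphrased,
      not the exact upstream code; COMPAT_ALLOC_MAX is an illustrative
      name for the sanity cap, not a real macro):

        void __user *compat_alloc_user_space(unsigned long len)
        {
                void __user *ptr;

                /* Sanity check on the length before touching the stack. */
                if (unlikely(len > COMPAT_ALLOC_MAX))
                        return NULL;

                ptr = arch_compat_alloc_user_space(len);

                /* The access_ok() the callers used to have to do themselves. */
                if (unlikely(!access_ok(VERIFY_WRITE, ptr, len)))
                        return NULL;

                return ptr;
        }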
      Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: <stable@kernel.org>
      c41d68a5
  4. 14 September 2010, 2 commits
  5. 13 September 2010, 1 commit
  6. 12 September 2010, 1 commit
    • PM / Hibernate: Avoid hitting OOM during preallocation of memory · 6715045d
      Rafael J. Wysocki authored
      The problem in hibernate_preallocate_memory() is that it calls
      preallocate_image_memory() with an argument that may be greater than
      the total number of available non-highmem memory pages.  If that is
      the case, the OOM condition is guaranteed to trigger, which in turn
      can cause a significant slowdown during hibernation.
      
      To avoid that, make preallocate_image_memory() adjust its argument
      before calling preallocate_image_pages(), so that the total number of
      saveable non-highmem pages left is not less than the minimum size of
      a hibernation image.  Change hibernate_preallocate_memory() to try to
      allocate from highmem if the number of pages allocated by
      preallocate_image_memory() is too low.
      
      Modify free_unnecessary_pages() to take all possible memory
      allocation patterns into account.
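
      A sketch of the argument clamping before preallocate_image_pages()
      is called (hedged: the avail_normal/alloc_normal bookkeeping shown
      here is illustrative, not the exact upstream accounting):

        static unsigned long preallocate_image_memory(unsigned long nr_pages,
                                                      unsigned long avail_normal)
        {
                unsigned long alloc;

                /* Never request more than would leave fewer saveable
                 * non-highmem pages than a minimal image needs. */
                if (avail_normal <= alloc_normal)
                        return 0;

                alloc = avail_normal - alloc_normal;
                if (nr_pages < alloc)
                        alloc = nr_pages;

                return preallocate_image_pages(alloc, GFP_IMAGE);
        }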
      Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Tested-by: M. Vefa Bicakci <bicave@superonline.com>
      6715045d
  7. 11 September 2010, 1 commit
  8. 10 September 2010, 10 commits
  9. 08 September 2010, 3 commits
  10. 09 September 2010, 1 commit
    • tracing: Do not allow llseek to set_ftrace_filter · 9c55cb12
      Steven Rostedt authored
      Reading the file set_ftrace_filter does three things.
      
      1) shows whether or not filters are set for the function tracer
      2) shows what functions are set for the function tracer
      3) shows what triggers are set on any functions
      
      Item 3 is independent of items 1 and 2.
      
      The way this file currently works is that it is a state machine,
      and as you read it, it may change state. That assumption breaks
      when you use lseek() on the file: the state machine gets out of
      sync, and t_show() may use the wrong pointer and cause a kernel
      oops.
      
      Luckily, this only kills the app that does the lseek, but the app
      dies while holding a mutex, which prevents anyone else from using
      the set_ftrace_filter file (or any other function tracing file, for
      that matter).
      
      A real fix would be to rewrite the code, but that is too much for
      a -rc release or stable. This patch simply disables llseek on the
      set_ftrace_filter file for now; the proper fix can be done for the
      next major release.
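
      A sketch of what disabling seeking amounts to in the file_operations
      of set_ftrace_filter (hedged; the other handlers listed here may not
      match the upstream struct exactly):

        static const struct file_operations ftrace_filter_fops = {
                .open    = ftrace_filter_open,
                .read    = seq_read,
                .write   = ftrace_filter_write,
                .llseek  = no_llseek,           /* was a real lseek handler */
                .release = ftrace_regex_release,
        };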
      Reported-by: Robert Swiecki <swiecki@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Eugene Teo <eugene@redhat.com>
      Cc: vendor-sec@lst.de
      Cc: <stable@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      9c55cb12
  11. 08 September 2010, 1 commit
  12. 05 September 2010, 2 commits
  13. 03 September 2010, 1 commit
  14. 01 September 2010, 3 commits
    • lockup_detector: Sync touch_*_watchdog back to old semantics · 68d3f1d8
      Don Zickus authored
      During my rewrite, the semantics of touch_nmi_watchdog and
      touch_softlockup_watchdog changed enough to break some drivers
      (mostly over preemptable regions).
      
      These are cases where long delays on one CPU (due to
      print_delay for example) can cause long delays on other
      CPUs - so we must 'touch' the nmi_watchdog flag of those
      other CPUs as well.
      
      This change brings those touch_*_watchdog() functions back in line
      with how they used to work.
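
      A hedged sketch of the restored semantics, where a touch marks every
      CPU's NMI watchdog rather than only the local one (the per-cpu flag
      name watchdog_nmi_touch is assumed here from the 2.6.36-era lockup
      detector):

        void touch_nmi_watchdog(void)
        {
                int cpu;

                /* Clear the 'stuck' suspicion on all CPUs, not just this one. */
                for_each_present_cpu(cpu)
                        per_cpu(watchdog_nmi_touch, cpu) = true;

                touch_softlockup_watchdog();
        }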
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: peterz@infradead.org
      Cc: fweisbec@gmail.com
      LKML-Reference: <1283310009-22168-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      68d3f1d8
    • pid: make setpgid() system call use RCU read-side critical section · 950eaaca
      Paul E. McKenney authored
      [   23.584719]
      [   23.584720] ===================================================
      [   23.585059] [ INFO: suspicious rcu_dereference_check() usage. ]
      [   23.585176] ---------------------------------------------------
      [   23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
      [   23.585176]
      [   23.585176] other info that might help us debug this:
      [   23.585176]
      [   23.585176]
      [   23.585176] rcu_scheduler_active = 1, debug_locks = 1
      [   23.585176] 1 lock held by rc.sysinit/728:
      [   23.585176]  #0:  (tasklist_lock){.+.+..}, at: [<ffffffff8104771f>] sys_setpgid+0x5f/0x193
      [   23.585176]
      [   23.585176] stack backtrace:
      [   23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2
      [   23.585176] Call Trace:
      [   23.585176]  [<ffffffff8105b436>] lockdep_rcu_dereference+0x99/0xa2
      [   23.585176]  [<ffffffff8104c324>] find_task_by_pid_ns+0x50/0x6a
      [   23.585176]  [<ffffffff8104c35b>] find_task_by_vpid+0x1d/0x1f
      [   23.585176]  [<ffffffff81047727>] sys_setpgid+0x67/0x193
      [   23.585176]  [<ffffffff810029eb>] system_call_fastpath+0x16/0x1b
      [   24.959669] type=1400 audit(1282938522.956:4): avc:  denied  { module_request } for  pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas
      
      It turns out that the setpgid() system call fails to enter an RCU
      read-side critical section before doing a PID-to-task_struct translation.
      This commit therefore does rcu_read_lock() before the translation, and
      also does rcu_read_unlock() after the last use of the returned pointer.
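
      The general shape of the fix in sys_setpgid() (a sketch of the
      pattern, not the exact diff; error handling abbreviated):

        rcu_read_lock();
        p = find_task_by_vpid(pid);     /* PID -> task_struct translation */
        if (!p) {
                err = -ESRCH;
                goto out_unlock;
        }
        /* ... use p only while still inside the RCU read-side section ... */
 out_unlock:
        rcu_read_unlock();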
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: David Howells <dhowells@redhat.com>
      950eaaca
    • tracing: Fix a race in function profile · 3aaba20f
      Li Zefan authored
      If someone disables function_profile while we are reading
      trace_stat/functionX, we can trigger this:
      
      	divide error: 0000 [#1] PREEMPT SMP
      	...
      	EIP is at function_stat_show+0x90/0x230
      	...
      
      This fix just takes the ftrace_profile_lock and checks if
      rec->counter is 0. If it's 0, we know the profile buffer
      has been reset.
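
      A sketch of the check in function_stat_show() (paraphrased; only the
      part relevant to the divide error is shown):

        mutex_lock(&ftrace_profile_lock);

        /* Raced with a profile reset: the record is empty, so bail out
         * before rec->counter gets used as a divisor. */
        if (unlikely(rec->counter == 0)) {
                ret = -EBUSY;
                goto out;
        }

        avg = rec->time;
        do_div(avg, rec->counter);      /* safe: counter is non-zero here */
        /* ... print the stats ... */
 out:
        mutex_unlock(&ftrace_profile_lock);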
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: stable@kernel.org
      LKML-Reference: <4C723644.4040708@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      3aaba20f
  15. 31 August 2010, 2 commits
  16. 30 August 2010, 1 commit
    • perf_events: Fix time tracking for events with pid != -1 and cpu != -1 · fa66f07a
      Stephane Eranian authored
      Per-thread events with a cpu filter, i.e., cpu != -1, were not
      reporting correct timings when the thread never ran on the
      monitored cpu. The time enabled was reported as a negative
      value.
      
      This patch fixes the problem by updating tstamp_stopped,
      tstamp_running in event_sched_out() for events with filters and
      which are marked as INACTIVE.
      
      The function group_sched_out() is modified to systematically
      call into event_sched_out(), to avoid duplicating the timing
      adjustment code.
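
      A hedged sketch of the timestamp adjustment in event_sched_out()
      (tstamp_stopped, tstamp_running and ctx->time are the 2.6.36-era
      field names; the event_filter_match() helper is an assumption used
      here to stand for "the event's cpu filter matches this cpu"):

        if (event->state == PERF_EVENT_STATE_INACTIVE &&
            !event_filter_match(event)) {       /* assumed helper */
                /* Keep the clocks of a filtered-out, inactive event in
                 * sync so time_enabled cannot end up negative. */
                delta = ctx->time - event->tstamp_stopped;
                event->tstamp_running += delta;
                event->tstamp_stopped  = ctx->time;
        }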
      
      With the patch, I now get:
      
      $ task_cpu -i -e unhalted_core_cycles,unhalted_core_cycles
      noploop 2 noploop for 2 seconds
      CPU0 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      CPU0 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      
      CPU1 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      CPU1 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      
      CPU2 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      CPU2 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      
      CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594)
      CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594)
      Signed-off-by: Stephane Eranian <eranian@gmail.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: davem@davemloft.net
      Cc: fweisbec@gmail.com
      Cc: perfmon2-devel@lists.sf.net
      Cc: eranian@google.com
      LKML-Reference: <4c76802d.aae9d80a.115d.70fe@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      fa66f07a
  17. 27 August 2010, 1 commit
  18. 25 August 2010, 4 commits
    • tracing/trace_stack: Fix stack trace on ppc64 · 151772db
      Anton Blanchard authored
      save_stack_trace() stores the instruction pointer, not the
      function descriptor. On ppc64 the trace stack code currently
      dereferences the instruction pointer and shows 8 bytes of
      instructions in our backtraces:
      
       # cat /sys/kernel/debug/tracing/stack_trace
              Depth    Size   Location    (26 entries)
              -----    ----   --------
        0)     5424     112   0x6000000048000004
        1)     5312     160   0x60000000ebad01b0
        2)     5152     160   0x2c23000041c20030
        3)     4992     240   0x600000007c781b79
        4)     4752     160   0xe84100284800000c
        5)     4592     192   0x600000002fa30000
        6)     4400     256   0x7f1800347b7407e0
        7)     4144     208   0xe89f0108f87f0070
        8)     3936     272   0xe84100282fa30000
      
      Since we aren't dealing with function descriptors, use %pS
      instead of %pF to fix it:
      
       # cat /sys/kernel/debug/tracing/stack_trace
              Depth    Size   Location    (26 entries)
              -----    ----   --------
        0)     5424     112   ftrace_call+0x4/0x8
        1)     5312     160   .current_io_context+0x28/0x74
        2)     5152     160   .get_io_context+0x48/0xa0
        3)     4992     240   .cfq_set_request+0x94/0x4c4
        4)     4752     160   .elv_set_request+0x60/0x84
        5)     4592     192   .get_request+0x2d4/0x468
        6)     4400     256   .get_request_wait+0x7c/0x258
        7)     4144     208   .__make_request+0x49c/0x610
        8)     3936     272   .generic_make_request+0x390/0x434
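
      The change itself is just the format specifier in the stack tracer's
      seq output, roughly (sketch, not the exact hunk; addr stands for the
      saved instruction pointer):

        /* addr is a raw instruction pointer, not a ppc64 function
         * descriptor, so symbolize it with %pS instead of %pF. */
        seq_printf(m, "%pS\n", (void *)addr);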
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: rostedt@goodmis.org
      Cc: fweisbec@gmail.com
      LKML-Reference: <20100825013238.GE28360@kryten>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      151772db
    • workqueue: fix cwq->nr_active underflow · 8a2e8e5d
      Tejun Heo authored
      cwq->nr_active keeps track of how many work items are active for the
      cpu workqueue, where 'active' means either pending on the global
      worklist or executing.  This is used to implement the max_active
      limit and workqueue freezing.  If a work item is queued after
      nr_active has already reached max_active, the work item doesn't
      increment nr_active; it is put on the delayed queue instead and gets
      activated later as previous active work items retire.
      
      try_to_grab_pending(), which is used in the cancellation path,
      unconditionally decremented nr_active whether the work item being
      cancelled was currently active or delayed, so cancelling a delayed
      work item made nr_active underflow.  This breaks max_active
      enforcement and triggers a BUG_ON() in destroy_workqueue() later on.
      
      This patch fixes the bug by adding a flag, WORK_STRUCT_DELAYED, which
      is set while a work item is on the delayed list, and by making
      try_to_grab_pending() decrement nr_active only if the work item is
      currently active.
      
      The addition of the flag enlarges cwq alignment to 256 bytes which is
      getting a bit too large.  It's scheduled to be reduced back to 128
      bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next
      devel cycle.
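
      A sketch of the accounting rule on the cancellation path (hedged;
      the real patch routes this through the existing in-flight
      bookkeeping helper rather than decrementing directly):

        /* Only work items that were actually counted as active give the
         * slot back; delayed items never took one. */
        if (!(*work_data_bits(work) & WORK_STRUCT_DELAYED))
                cwq->nr_active--;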
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Johannes Berg <johannes@sipsolutions.net>
      8a2e8e5d
    • PM QoS: Fix kzalloc() parameters swapped in pm_qos_power_open() · bac1e74d
      David Alan Gilbert authored
      sparse spotted that the kzalloc() in pm_qos_power_open() in Linus'
      current git tree had its parameters swapped.  Fix this.
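
      For reference, kzalloc() takes the size first and the GFP flags
      second, so the fix is simply restoring the argument order (the req
      variable here is illustrative):

        /* wrong: kzalloc(GFP_KERNEL, sizeof(*req));  -- flags where size goes */
        req = kzalloc(sizeof(*req), GFP_KERNEL);
        if (!req)
                return -ENOMEM;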
      Signed-off-by: David Alan Gilbert <linux@treblig.org>
      Acked-by: mark gross <markgross@thegnar.org>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      bac1e74d
    • workqueue: improve destroy_workqueue() debuggability · e41e704b
      Tejun Heo authored
      Now that the worklist is global, having works pending after wq
      destruction can easily lead to an oops, and destroy_workqueue() has
      several BUG_ON()s to catch these cases.  Unfortunately, BUG_ON()
      doesn't tell much about how the work became pending after the final
      flush_workqueue().
      
      This patch adds WQ_DYING which is set before the final flush begins.
      If a work is requested to be queued on a dying workqueue,
      WARN_ON_ONCE() is triggered and the request is ignored.  This clearly
      indicates which caller is trying to queue a work on a dying workqueue
      and keeps the system working in most cases.
      
      The locking rule comment is updated such that the 'I' rule includes
      modifying the field from the destruction path.
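
      A sketch of the two halves of the change (hedged paraphrase of the
      queueing-side check and of setting the flag in destroy_workqueue()):

        /* queueing side */
        if (WARN_ON_ONCE(wq->flags & WQ_DYING))
                return;         /* complain once, then ignore the request */

        /* destruction side, before the final flush */
        wq->flags |= WQ_DYING;
        flush_workqueue(wq);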
      Signed-off-by: Tejun Heo <tj@kernel.org>
      e41e704b
  19. 23 August 2010, 3 commits
    • workqueue: mark lock acquisition on worker_maybe_bind_and_lock() · 972fa1c5
      Namhyung Kim authored
      worker_maybe_bind_and_lock() actually grabs gcwq->lock but was
      missing the proper annotation. Add it, which removes the following
      sparse warnings:
      
       kernel/workqueue.c:1214:13: warning: context imbalance in 'worker_maybe_bind_and_lock' - wrong count at exit
       arch/x86/include/asm/irqflags.h:44:9: warning: context imbalance in 'worker_rebind_fn' - unexpected unlock
       kernel/workqueue.c:1991:17: warning: context imbalance in 'rescuer_thread' - unexpected unlock
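
      The annotation itself is a sparse context marker on the function
      definition, along these lines (sketch; the return type follows the
      2.6.36-era code):

        static bool worker_maybe_bind_and_lock(struct worker *worker)
        __acquires(&gcwq->lock)
        {
                /* ... bind to the gcwq's CPU, then take gcwq->lock ... */
        }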
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      972fa1c5
    • workqueue: annotate lock context change · 06bd6ebf
      Namhyung Kim authored
      Some internal functions called within gcwq->lock context release and
      re-grab the lock but were missing the proper annotations. Add them.
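
      A hedged example of the annotation pattern for such a helper (the
      function name here is illustrative, not one of the actual annotated
      functions):

        static void gcwq_helper_that_drops_lock(struct global_cwq *gcwq)
        __releases(&gcwq->lock)
        __acquires(&gcwq->lock)
        {
                spin_unlock_irq(&gcwq->lock);
                /* ... work that must run without the lock held ... */
                spin_lock_irq(&gcwq->lock);
        }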
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      06bd6ebf
    • mutex: Improve the scalability of optimistic spinning · 9d0f4dcc
      Tim Chen authored
      There is a scalability issue in the current implementation of
      optimistic mutex spinning in the kernel.  It was found on an 8-node,
      64-core Nehalem-EX system (HT mode).
      
      The intention of the optimistic mutex spin is to busy-wait on a
      mutex while its owner is running, in the hope that the mutex will be
      released soon and can be acquired without the acquiring thread going
      to sleep. However, with a large number of threads contending for the
      mutex, the mutex can be grabbed by another thread, then another, and
      so on, while we keep spinning, wasting cpu cycles and adding to the
      contention.  One possible fix is to quit spinning and put the
      current thread on the wait-list if the mutex switches to a new owner
      while we spin, since that indicates heavy contention (see the patch
      included).
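
      The shape of the idea in the spin loop of __mutex_lock_common() is
      roughly the following (a hedged sketch, not the actual diff; the
      helper names follow the 2.6.35-era mutex code and the lock-acquire
      attempt is elided):

        for (;;) {
                struct thread_info *owner;

                owner = ACCESS_ONCE(lock->owner);
                if (owner && !mutex_spin_on_owner(lock, owner))
                        break;  /* owner changed or stopped running: quit
                                 * spinning and queue on the wait-list */

                /* ... try to grab the lock, then cpu_relax() and retry ... */
        }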
      
      I did some testing on an 8-socket Nehalem-EX system with a total of
      64 cores. Using Ingo's test-mutex program that creates/deletes files
      with 256 threads (http://lkml.org/lkml/2006/1/8/50), I see the
      following speed-up after putting in the mutex spin fix:
      
       ./mutex-test V 256 10
                       Ops/sec
       2.6.34          62864
       With fix        197200
      
      Repeating the test with the Aim7 fserver workload, there is again a
      speed-up with the fix:
      
                       Jobs/min
       2.6.34          91657
       With fix        149325
      
      To look at the impact on the distribution of mutex acquisition time,
      I collected the mutex acquisition time on the Aim7 fserver workload
      with some instrumentation.  The average acquisition time is reduced
      by 48% and the number of contentions by 32%.
      
                       #contentions    Time to acquire mutex (cycles)
       2.6.34          72973           44765791
       With fix        49210           23067129
      
      The histogram of mutex acquisition time is listed below.  The
      acquisition time is in 2^bin cycles.  We see that without the fix,
      the acquisition time is mostly around 2^26 cycles.  With the fix,
      the distribution gets spread out a lot more towards the lower
      cycles, starting from 2^13.  However, there is an increase in the
      tail of the distribution with the fix at 2^28 and 2^29 cycles.  It
      seems a small price to pay for the reduced average acquisition time
      and for getting the cpu to do useful work.
      
       Mutex acquisition time distribution (acq time = 2^bin cycles):
               2.6.34                  With Fix
       bin     #occurrence     %       #occurrence     %
       11      2               0.00%   120             0.24%
       12      10              0.01%   790             1.61%
       13      14              0.02%   2058            4.18%
       14      86              0.12%   3378            6.86%
       15      393             0.54%   4831            9.82%
       16      710             0.97%   4893            9.94%
       17      815             1.12%   4667            9.48%
       18      790             1.08%   5147            10.46%
       19      580             0.80%   6250            12.70%
       20      429             0.59%   6870            13.96%
       21      311             0.43%   1809            3.68%
       22      255             0.35%   2305            4.68%
       23      317             0.44%   916             1.86%
       24      610             0.84%   233             0.47%
       25      3128            4.29%   95              0.19%
       26      63902           87.69%  122             0.25%
       27      619             0.85%   286             0.58%
       28      0               0.00%   3536            7.19%
       29      0               0.00%   903             1.83%
       30      0               0.00%   0               0.00%
      
      I've done similar experiments with the 2.6.35 kernel on smaller
      boxes as well.  One is a dual-socket Westmere box (12 cores total,
      with HT).  Another is an old dual-socket Core 2 box (4 cores total,
      no HT).
      
      On the 12-core Westmere box, I see a 250% increase for Ingo's mutex-test
      program with my mutex patch but no significant difference in aim7's
      fserver workload.
      
      On the 4-core Core 2 box, the differences with the patch for both
      mutex-test and aim7 fserver are negligible.
      
      So far, it seems like the patch has not caused regression on smaller
      systems.
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: <stable@kernel.org> # .35.x
      LKML-Reference: <1282168827.9542.72.camel@schen9-DESK>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9d0f4dcc