1. 09 Oct 2013, 2 commits
  2. 19 Apr 2013, 1 commit
    • mutex: Move mutex spinning code from sched/core.c back to mutex.c · 41fcb9f2
      Committed by Waiman Long
      As mentioned by Ingo, the SCHED_FEAT_OWNER_SPIN scheduler
      feature bit was really just an early hack to make mutex spinning
      testable with and without it, so it is no longer necessary.

      This patch removes the SCHED_FEAT_OWNER_SPIN feature bit and
      moves the mutex spinning code from kernel/sched/core.c back to
      kernel/mutex.c, where it belongs.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chandramouleeswaran Aswin <aswin@hp.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Norton Scott J <scott.norton@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1366226594-5506-2-git-send-email-Waiman.Long@hp.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 11 Dec 2012, 3 commits
    • mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node · 5bca2303
      Committed by Mel Gorman
      Because migrations are driven by the CPU a task is running on,
      there is no point in tracking NUMA faults until a task runs on a
      new node. This patch tracks the first node used by an address
      space. Until it changes, PTE scanning is disabled and no NUMA
      hinting faults are trapped. This should help workloads that are
      short-lived, do not care about NUMA placement or have bound
      themselves to a single node.

      This builds on the logic in "mm: sched: numa: Implement slow
      start for working set sampling" to delay when the checks are made.
      It benefits processes that set their CPU and node bindings early
      in their lifetime, and also gives any initial load balancing a
      chance to take place.
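
      The gating check can be sketched as follows (a minimal sketch: the
      field and constant names, e.g. first_nid, NUMA_PTE_SCAN_INIT and
      NUMA_PTE_SCAN_ACTIVE, follow the shape of the patch but are
      illustrative here):

       /* Called from scheduler context before queueing any PTE scan.
        * Returns true once this address space has been seen running on
        * a second node, i.e. once PTE scanning should actually start. */
       static bool numa_scan_allowed(struct mm_struct *mm)
       {
               /* Remember the first node this mm runs on. */
               if (mm->first_nid == NUMA_PTE_SCAN_INIT)
                       mm->first_nid = numa_node_id();

               if (mm->first_nid != NUMA_PTE_SCAN_ACTIVE) {
                       /* Still on the first node: skip scanning. */
                       if (numa_node_id() == mm->first_nid)
                               return false;
                       /* New node in use: enable scanning from now on. */
                       mm->first_nid = NUMA_PTE_SCAN_ACTIVE;
               }
               return true;
       }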
      Signed-off-by: Mel Gorman <mgorman@suse.de>
    • mm: sched: numa: Control enabling and disabling of NUMA balancing · 1a687c2e
      Committed by Mel Gorman
      This patch adds Kconfig options and kernel parameters to allow
      enabling and disabling of automatic NUMA balancing. The existence
      of such a switch was, and is, very important when debugging problems
      related to transparent hugepages, and we should have the same for
      automatic NUMA placement.
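
      A boot-time switch in this style can be sketched with the kernel's
      early_param() convention (a minimal sketch; set_numabalancing_state()
      is assumed here to flip the global enable flag):

       static int __init setup_numabalancing(char *str)
       {
               int ret = 0;

               if (!str)
                       goto out;

               if (!strcmp(str, "enable")) {
                       set_numabalancing_state(true);
                       ret = 1;
               } else if (!strcmp(str, "disable")) {
                       set_numabalancing_state(false);
                       ret = 1;
               }
       out:
               if (!ret)
                       pr_warn("Unable to parse numa_balancing=\n");
               return ret;
       }
       early_param("numa_balancing", setup_numabalancing);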
      Signed-off-by: Mel Gorman <mgorman@suse.de>
    • mm: numa: Add fault driven placement and migration · cbee9f88
      Committed by Peter Zijlstra
      NOTE: This patch is based on "sched, numa, mm: Add fault driven
      	placement and migration policy" but as it throws away all the policy
      	to just leave a basic foundation I had to drop the signed-offs-by.
      
      This patch creates a bare-bones method for marking PTEs pte_numa
      from scheduler context; when such a PTE faults later, the page can
      be placed on the node of the CPU taking the fault. By itself this
      does nothing useful, but any placement policy will fundamentally
      depend on receiving placement hints from fault context and doing
      something intelligent about it.
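
      The mechanism amounts to periodically making a task's pages trap on
      their next access. A rough sketch (simplified: the real worker scans
      VMAs in bounded batches rather than all at once, and the helper
      names here are illustrative):

       /* Periodic worker, run in the context of the task itself: mark
        * the task's migratable ranges pte_numa so the next access traps
        * into the NUMA hinting fault handler, which then knows which
        * CPU (and hence node) actually touched the page. */
       static void task_numa_work(struct task_struct *p)
       {
               struct mm_struct *mm = p->mm;
               struct vm_area_struct *vma;

               for (vma = mm->mmap; vma; vma = vma->vm_next) {
                       if (!vma_migratable(vma))
                               continue;
                       change_prot_numa(vma, vma->vm_start, vma->vm_end);
               }
       }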
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
  4. 16 Oct 2012, 1 commit
  5. 13 Sep 2012, 1 commit
  6. 04 Sep 2012, 1 commit
  7. 26 Apr 2012, 1 commit
  8. 07 Dec 2011, 1 commit
  9. 17 Nov 2011, 1 commit
  10. 14 Nov 2011, 1 commit
  11. 14 Aug 2011, 1 commit
  12. 21 Jul 2011, 1 commit
    • sched: Allow for overlapping sched_domain spans · e3589f6c
      Committed by Peter Zijlstra
      Allow for sched_domain spans that overlap by giving such domains their
      own sched_group list instead of sharing the sched_groups amongst
      each other.
      
      This is needed for machines with more than 16 nodes, because
      sched_domain_node_span() will generate a node mask from the
      16 nearest nodes without regard to whether these masks overlap.
      
      Currently sched_domains have a sched_group that maps to their child
      sched_domain span, and since there is no overlap we share the
      sched_group between the sched_domains of the various CPUs. If however
      there is overlap, we would need to link the sched_group list in
      different ways for each CPU, so sharing isn't possible.
      
      In order to solve this, allocate private sched_groups for each CPU's
      sched_domain but have the sched_groups share a sched_group_power
      structure such that we can uniquely track the power.
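
      The resulting data-structure shape, trimmed to the fields relevant
      here (a sketch, not the full definitions):

       /* Shared by every sched_group that spans the same CPUs, so that
        * capacity is tracked exactly once even when the per-CPU group
        * lists overlap. */
       struct sched_group_power {
               atomic_t ref;           /* groups sharing this structure */
               unsigned int power;     /* CPU power of the spanned CPUs */
       };

       struct sched_group {
               struct sched_group *next;       /* per-CPU circular list */
               struct sched_group_power *sgp;  /* shared power bookkeeping */
               unsigned long cpumask[0];       /* CPUs this group spans */
       };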
      Reported-and-tested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-08bxqw9wis3qti9u5inifh3y@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 14 Jul 2011, 1 commit
    • sched: adjust scheduler cpu power for stolen time · 095c0aa8
      Committed by Glauber Costa
      This patch makes update_rq_clock() aware of steal time.
      The mechanism of operation is no different from irq_time,
      and follows the same principles. This lives in a CONFIG
      option of its own, and can be compiled out independently of
      the rest of steal time reporting. The effect of disabling it
      is that the scheduler will still report steal time (which cannot be
      disabled), but won't use this information for cpu power adjustments.

      Every time update_rq_clock_task() is invoked, we query information
      about how much time was stolen since the last call, and feed it into
      sched_rt_avg_update().

      Although steal time reporting in account_process_tick() keeps
      track of the last time we read the steal clock, in prev_steal_time,
      this patch does it independently using another field,
      prev_steal_time_rq. Otherwise, information about time
      accounted in update_process_tick() would never reach us in update_rq_clock().
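
      The per-update step can be sketched like this (a minimal sketch; the
      real code folds this into update_rq_clock_task() alongside the
      irq_time handling):

       #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
       static void update_rq_steal_time(struct rq *rq)
       {
               /* Delta of the steal clock since the last clock update. */
               u64 steal = paravirt_steal_clock(cpu_of(rq));

               steal -= rq->prev_steal_time_rq;
               rq->prev_steal_time_rq += steal;

               /* Treat stolen time like irq time: it lowers effective
                * cpu power, so load balancing shifts work elsewhere. */
               sched_rt_avg_update(rq, steal);
       }
       #endif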
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Tested-by: Eric B Munson <emunson@mgebm.net>
      CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      CC: Anthony Liguori <aliguori@us.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  14. 14 Apr 2011, 1 commit
  15. 18 Nov 2010, 1 commit
  16. 19 Oct 2010, 1 commit
  17. 12 Mar 2010, 7 commits
  18. 09 Dec 2009, 1 commit
  19. 17 Sep 2009, 1 commit
    • sched: Add new wakeup preemption mode: WAKEUP_RUNNING · ad4b78bb
      Committed by Peter Zijlstra
      Create a new wakeup preemption mode that preempts towards tasks
      that run shorter on average. It sets the next buddy to make sure we
      actually run the task we preempted for.
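
      The check itself can be sketched like this (an illustrative sketch:
      avg_running stands for the per-entity average runtime the feature
      compares, and the function name is made up for the example):

       /* On wakeup of 'se' while 'curr' runs: with WAKEUP_RUNNING set,
        * preempt if the woken task runs shorter on average. */
       static int wakeup_preempt_running(struct sched_entity *curr,
                                         struct sched_entity *se)
       {
               if (sched_feat(WAKEUP_RUNNING) &&
                   se->avg_running < curr->avg_running) {
                       set_next_buddy(se);  /* be sure we actually run it */
                       return 1;            /* resched current */
               }
               return 0;
       }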
      
      Test results:
      
       root@twins:~# while :; do :; done &
       [1] 6537
       root@twins:~# while :; do :; done &
       [2] 6538
       root@twins:~# while :; do :; done &
       [3] 6539
       root@twins:~# while :; do :; done &
       [4] 6540
      
       root@twins:/home/peter# ./latt -c4 sleep 4
       Entries: 48 (clients=4)
      
       Averages:
       ------------------------------
              Max          4750 usec
              Avg           497 usec
              Stdev         737 usec
      
       root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features
      
       root@twins:/home/peter# ./latt -c4 sleep 4
       Entries: 48 (clients=4)
      
       Averages:
       ------------------------------
              Max            14 usec
              Avg             5 usec
              Stdev           3 usec
      
      Disabled by default - needs more testing.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <new-submission>
  20. 16 Sep 2009, 3 commits
  21. 15 Sep 2009, 5 commits
  22. 11 Sep 2009, 1 commit
    • sched: Disable NEW_FAIR_SLEEPERS for now · 3f2aa307
      Committed by Ingo Molnar
      Nikos Chantziaras and Jens Axboe reported that turning off
      NEW_FAIR_SLEEPERS improves desktop interactivity visibly.
      
      Nikos described his experiences the following way:
      
        " With this setting, I can do "nice -n 19 make -j20" and
          still have a very smooth desktop and watch a movie at
          the same time.  Various other annoyances (like the
          "logout/shutdown/restart" dialog of KDE not appearing
          at all until the background fade-out effect has finished)
          are also gone.  So this seems to be the single most
          important setting that vastly improves desktop behavior,
          at least here. "
      
      Jens described it the following way, referring to a 10-seconds
      xmodmap scheduling delay he was trying to debug:
      
        " Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
          I get:
      
          Performance counter stats for 'xmodmap .xmodmap-carl':
      
               9.009137  task-clock-msecs         #      0.447 CPUs
                     18  context-switches         #      0.002 M/sec
                      1  CPU-migrations           #      0.000 M/sec
                    315  page-faults              #      0.035 M/sec
      
          0.020167093  seconds time elapsed
      
          Woot! "
      
      So disable it for now. In perf trace output I can see weird
      delta timestamps:
      
        cc1-9943  [001]  2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]
      
      That nsec field is not supposed to be that large. More digging
      is needed - but let's turn it off while the real bug is found.
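
      For reference, the toggle lives in kernel/sched_features.h; turning
      the feature off by default amounts to flipping its default bit
      (a sketch of the entry):

       /* Each SCHED_FEAT() entry defines a feature bit and its default;
        * writing <name>/NO_<name> to /debug/sched_features flips it at
        * runtime. */
       SCHED_FEAT(NEW_FAIR_SLEEPERS, 0)   /* default now off */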
      Reported-by: Nikos Chantziaras <realnc@arcor.de>
      Tested-by: Nikos Chantziaras <realnc@arcor.de>
      Reported-by: Jens Axboe <jens.axboe@oracle.com>
      Tested-by: Jens Axboe <jens.axboe@oracle.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <4AA93D34.8040500@arcor.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 15 Jan 2009, 2 commits
    • sched: prefer wakers · e52fb7c0
      Committed by Peter Zijlstra
      Prefer tasks that wake other tasks to preempt quickly. This improves
      performance because more work is available sooner.
      
      The workload that prompted this patch was a kernel build over NFS4
      (for some curious and not yet understood reason we had to revert
      commit 18de9735 to make any progress at all).
      
      Without this patch a make -j8 bzImage (of x86-64 defconfig) would take
      3m30-ish, with this patch we're down to 2m50-ish.
      
      psql-sysbench/mysql-sysbench show a slight improvement in peak performance as
      well, tbench and vmark seemed to not care.
      
      It is possible to improve upon the build time (to 2m20-ish) but that seriously
      destroys other benchmarks (just shows that there's more room for tinkering).
      
      Many thanks to Mike, who put in a lot of effort to benchmark things
      and proved a worthy opponent with a competing patch.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • mutex: implement adaptive spinning · 0d66bf6d
      Committed by Peter Zijlstra
      Change mutex contention behaviour such that it will sometimes busy wait on
      acquisition - moving its behaviour closer to that of spinlocks.
      
      This concept got ported to mainline from the -rt tree, where it was originally
      implemented for rtmutexes by Steven Rostedt, based on work by Gregory Haskins.
      
      Testing with Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50)
      gave a 345% boost for VFS scalability on my testbox:
      
       # ./test-mutex-shm V 16 10 | grep "^avg ops"
       avg ops/sec:               296604
      
       # ./test-mutex-shm V 16 10 | grep "^avg ops"
       avg ops/sec:               85870
      
      The key criteria for the busy wait is that the lock owner has to be running on
      a (different) cpu. The idea is that as long as the owner is running, there is a
      fair chance it'll release the lock soon, and thus we'll be better off spinning
      instead of blocking/scheduling.
      
      Since regular mutexes (as opposed to rtmutexes) do not atomically track the
      owner, we add the owner in a non-atomic fashion and deal with the races in
      the slowpath.
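
      The core of the busy-wait can be sketched as follows (simplified:
      owner_running_on_cpu() is a stand-in for the real owner-on-CPU
      test, which goes through the owner's runqueue since the owner
      field itself is non-atomic):

       /* Spin while the current owner is actively running on another
        * CPU. Returns 1 if the lock was released while we spun (worth
        * retrying acquisition), 0 if we should block instead. */
       static int mutex_spin_on_owner(struct mutex *lock,
                                      struct task_struct *owner)
       {
               while (lock->owner == owner) {
                       if (need_resched())
                               return 0;  /* we must yield: stop spinning */
                       if (!owner_running_on_cpu(owner))
                               return 0;  /* owner blocked: better sleep */
                       cpu_relax();       /* busy wait a little longer */
               }
               return 1;                  /* owner released or changed */
       }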
      
      Furthermore, to ease testing of the performance impact of this new
      code, there is a means to disable this behaviour at runtime (without
      having to reboot the system), when scheduler debugging is enabled
      (CONFIG_SCHED_DEBUG=y), by issuing the following command:
      
       # echo NO_OWNER_SPIN > /debug/sched_features
      
      This command re-enables spinning again (this is also the default):
      
       # echo OWNER_SPIN > /debug/sched_features
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  24. 05 Nov 2008, 1 commit
    • sched: backward looking buddy · 4793241b
      Committed by Peter Zijlstra
      Impact: improve/change/fix wakeup-buddy scheduling
      
      Currently we only have a forward looking buddy; that is, we prefer to
      schedule the task we last woke up, under the presumption that it is
      going to consume the data we just produced, and therefore will have
      cache hot benefits.

      This allows co-waking producer/consumer task pairs to run ahead of the
      pack for a little while, keeping their cache warm. Without this, we
      would interleave all pairs, utterly thrashing the cache.
      
      This patch introduces a backward looking buddy; that is, suppose that
      in the above scenario the consumer preempts the producer before it
      can go to sleep. We then miss the wakeup from consumer to
      producer (it's already running, after all), breaking the cycle and
      reverting to the cache-thrashing interleaved schedule pattern.

      The backward buddy will try to schedule back to the task that woke us
      up in case the forward buddy is not available, under the assumption
      that, barring current, the last waker is the most cache-hot task
      around.
      
      This will basically allow a task to continue after it got preempted.
      
      In order to avoid starvation, we allow either buddy to get wakeup_gran
      ahead of the pack.
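
      The pick order after this change can be sketched as follows (close
      to the actual pick_next_entity() logic; wakeup_preempt_entity()
      returning less than 1 means the buddy has not run more than
      wakeup_gran past the leftmost task):

       static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
       {
               /* Leftmost task in the rbtree: the fair default. */
               struct sched_entity *se = __pick_next_entity(cfs_rq);

               /* Forward buddy: the task we last woke. */
               if (cfs_rq->next && wakeup_preempt_entity(cfs_rq->next, se) < 1)
                       return cfs_rq->next;

               /* Backward buddy: the task that woke us. */
               if (cfs_rq->last && wakeup_preempt_entity(cfs_rq->last, se) < 1)
                       return cfs_rq->last;

               return se;
       }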
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>